Hi gang. One issue worth thinking about: before you scrape stuff off the
web, check the terms of use of the web site. Twitter, for example, just
changed its terms of use in a way that seems to prevent you from
downloading tweets without the permission of Twitter or an authorized
reseller.
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS - Harvard University
GKing.Harvard.edu <http://gking.harvard.edu/> - King(a)Harvard.edu -
@kinggary <http://twitter.com/kinggary> - 617-500-7570 - Asst 495-9271 -
Fax 812-8581
On Wed, May 4, 2011 at 9:50 PM, Brandon Stewart <bstewart(a)fas.harvard.edu> wrote:
Thanks to everyone who came out for the text section tonight. Because the
files are really large, I've put them in the public folder of my dropbox.
You can grab them here:
http://dl.dropbox.com/u/12848660/Bundle.zip
Eventually they will be up on my website but until I have time to take care
of that I will leave this link active. The zipped folder is approximately
75MB and contains the following:
1) Tonight's Section Slides
2) The code to grab word counts from the NYT API
3) An R package you will need to install locally to use the code above
(RJSONIO also available from OmegaHat)
4) Some R code to grab tweets off twitter and do some basic count work with
them.
5) A Folder of Materials from a Presentation I gave at University of
Washington including:
5a) A Lab Handout walking through three clustering algorithms: k-means,
Latent Dirichlet Allocation, Grimmer's Expressed Agenda Model
5b) The slides from my presentation on clustering algorithms
5c) A references list including software, textbooks and articles for text
analysis
5d) The files needed to do the lab
5e) The files I used to prep the labs from the raw text
5f) The raw texts
The workshop website is here: http://toolsfortext.wordpress.com/ and if
you really can't get enough of me talking I think there's a video on there
somewhere of me giving the presentation.
If you guys have any questions about the material, feel free to shoot me an
email. Good luck with the end of semester rush!
Brandon
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l