Thanks Gary!  This reminds me of something else that bears repeating for those who weren't there last night.  In addition to terms of use, remember that scraping blogs and Twitter feeds bumps up against IRB rules.  You can probably get an exemption for most work of that sort, but you need to apply!  I know a lot of us aren't used to doing human subjects work, but this is an EXTREMELY important point if you are doing something with data like this.

Brandon

On Thu, May 5, 2011 at 4:51 PM, Gary King <king@harvard.edu> wrote:
Hi gang.  One issue worth thinking about: before you scrape anything off the web, check the site's terms of use.  Twitter, for example, just changed its terms of use in a way that seems to prevent you from downloading tweets without the permission of Twitter or an authorized reseller.
Gary
--
Gary King - Albert J. Weatherhead III University Professor - Director, IQSS - Harvard University
GKing.Harvard.edu - King@Harvard.edu - @kinggary - 617-500-7570 - Asst 495-9271 - Fax 812-8581



On Wed, May 4, 2011 at 9:50 PM, Brandon Stewart <bstewart@fas.harvard.edu> wrote:
Thanks to everyone who came out for the text section tonight.  Because the files are really large, I've put them in the public folder of my dropbox.  You can grab them here:

Eventually they will be up on my website but until I have time to take care of that I will leave this link active.  The zipped folder is approximately 75MB and contains the following:
1) Tonight's Section Slides
2) The code to grab word counts from the NYT API
3) The RJSONIO R package, which you will need to install locally to use the code above (RJSONIO is also available from Omegahat)
4) Some R code to grab tweets from Twitter and do some basic count work with them
5) A folder of materials from a presentation I gave at the University of Washington, including:
5a) A lab handout walking through three clustering algorithms: k-means, Latent Dirichlet Allocation, and Grimmer's Expressed Agenda Model
5b) The slides from my presentation on clustering algorithms
5c) A references list including software, textbooks and articles for text analysis
5d) The files needed to do the lab
5e) The files I used to prep the labs from the raw text
5f) The raw texts

The workshop website is here: http://toolsfortext.wordpress.com/ and if you really can't get enough of me talking, I think there's a video on there somewhere of me giving the presentation.

If you guys have any questions about the material, feel free to shoot me an email.  Good luck with the end-of-semester rush!

Brandon

_______________________________________________
gov2001-l mailing list
gov2001-l@lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l


