Thanks Gary! This reminds me of something else that bears repeating for
those not there last night. In addition to terms of use, remember that
scraping blogs and twitter feeds bumps up against IRB rules. You can
probably get an exemption for most work of that sort, but you need to apply!
I know a lot of us aren't used to doing human subjects work, but this is an
EXTREMELY important point if you are doing something with data like this.
Brandon
On Thu, May 5, 2011 at 4:51 PM, Gary King <king(a)harvard.edu> wrote:
Hi gang. One issue worth thinking about: before you scrape stuff off the
web, check the terms of use of the web site. Twitter, for example, just
changed its terms of use in a way that seems to prevent you from
downloading tweets without the permission of Twitter or an authorized
reseller.
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS - Harvard University
GKing.Harvard.edu <http://gking.harvard.edu/> - King(a)Harvard.edu -
@kinggary <http://twitter.com/kinggary> - 617-500-7570 - Asst 495-9271 -
Fax 812-8581
On Wed, May 4, 2011 at 9:50 PM, Brandon Stewart <bstewart(a)fas.harvard.edu> wrote:
Thanks to everyone who came out for the text section tonight. Because the
files are really large, I've put them in the public folder of my dropbox.
You can grab them here:
http://dl.dropbox.com/u/12848660/Bundle.zip
Eventually they will be up on my website but until I have time to take
care of that I will leave this link active. The zipped folder is
approximately 75MB and contains the following:
1) Tonight's Section Slides
2) The code to grab word counts from the NYT API
3) An R package you will need to install locally to use the code above
(RJSONIO also available from OmegaHat)
4) Some R code to grab tweets off twitter and do some basic count work
with them.
5) A Folder of Materials from a Presentation I gave at University of
Washington including:
5a) A Lab Handout walking through three clustering algorithms: k-means,
Latent Dirichlet Allocation, Grimmer's Expressed Agenda Model
5b) The slides from my presentation on clustering algorithms
5c) A references list including software, textbooks and articles for text
analysis
5d) The files needed to do the lab
5e) The files I used to prep the labs from the raw text
5f) The raw texts
The workshop website is here: http://toolsfortext.wordpress.com/ and if
you really can't get enough of me talking I think there's a video on there
somewhere of me giving the presentation.
If you guys have any questions about the material, feel free to shoot me
an email. Good luck with the end of semester rush!
Brandon
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l