[gov2001-l] Preliminary Abstract

24 Apr 2006

I agree with the comments from my colleagues.
I think you explain clearly what the Naive Bayes approach does. I
wonder, though, whether it is possible to explain in a sentence what a
"vanilla recursive hierarchical Dirichlet-multinomial mixture model"
does or is, or how the other "well-known algorithms" you utilize
differ from it.

Thomas

On Apr 24, 2006, at 10:18 PM, ghumphr(a)fas.harvard.edu wrote:

...
  Geoff Humphreys and Chris Long

 Classifying Political Documents

 In recent years, political methodologists have produced innumerable
 automated document classification systems. Many of these systems, such
 as those based on the well-known Naive Bayes algorithm, treat each word
 as a distinct entity, ignoring complex interactions between them. While
 this approach may appear reasonable for some applications, the precise
 arrangements of words in political documents often convey meanings which
 cannot be captured so easily. In this paper, we investigate the success
 of such naive algorithms by comparing Wordscores, a Naive Bayes
 derivative, to several well-known algorithms and a new classification
 system based on vanilla recursive hierarchical Dirichlet-multinomial
 mixture models, pointing out avenues for future advancement.
 Surprisingly, we find that, the assumptions of Wordscores
 notwithstanding, it shows dramatically increased performance, comparable
 to some of the latest developments in document classification, at
 carrying out a small number of carefully selected classifications on
 meticulously arranged collections of political documents. We also
 discuss its use in practical applications.

 _______________________________________________
 gov2001-l mailing list
 gov2001-l(a)lists.fas.harvard.edu
 http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l 
