I agree with the comments from my colleagues.
I think you explain clearly what the Naive Bayes approach does. I
wonder, though, whether it is possible to explain in a sentence what a
"vanilla recursive hierarchical Dirichlet-multinomial mixture model"
is or does, or how the other "well-known algorithms" you use differ
from it.
Thomas
On Apr 24, 2006, at 10:18 PM, ghumphr(a)fas.harvard.edu wrote:
Geoff Humphreys and Chris Long
Classifying Political Documents
In recent years, political methodologists have produced numerous
automated document classification systems. Many of these systems, such
as those based on the well-known Naive Bayes algorithm, treat each word
as a distinct entity, ignoring complex interactions between words.
While this approach may appear reasonable for some applications, the
precise arrangement of words in political documents often conveys
meanings that cannot be captured so easily. In this paper, we
investigate the success of such naive algorithms by comparing
Wordscores, a Naive Bayes derivative, to several well-known algorithms
and to a new classification system based on vanilla recursive
hierarchical Dirichlet-multinomial mixture models, and we point out
avenues for future advancement. Surprisingly, we find that, its
assumptions notwithstanding, Wordscores performs remarkably well,
comparable to some of the latest developments in document
classification, when carrying out a small number of carefully selected
classifications on meticulously arranged collections of political
documents. We also discuss its use in practical applications.
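To make the "each word as a distinct entity" assumption concrete, here
is a minimal sketch of a generic multinomial Naive Bayes classifier with
add-alpha smoothing. It is not Wordscores and not the authors' code. The
function names (train_naive_bayes, classify), the smoothing constant,
and the toy "left"/"right" documents are all hypothetical, chosen only
for illustration. The point it shows: a document's score is a sum of
independent per-word terms, so word order and word interactions never
enter the calculation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model (hypothetical helper).

    docs: list of token lists; labels: parallel list of class labels.
    alpha: add-alpha (Laplace) smoothing constant, an assumed default.
    """
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    # log P(class)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # log P(word | class), smoothed over the whole vocabulary
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                      for w in vocab}
    return log_prior, log_lik


def classify(doc, log_prior, log_lik):
    """Return the class with the highest log-posterior score.

    Each known word adds its own independent term; unseen words are
    skipped. Reordering the words cannot change the result.
    """
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
train_docs = [["cut", "taxes", "deregulate"],
              ["cut", "spending", "taxes"],
              ["expand", "welfare", "spending"],
              ["raise", "taxes", "welfare"]]
train_labels = ["right", "right", "left", "left"]

log_prior, log_lik = train_naive_bayes(train_docs, train_labels)
print(classify(["cut", "taxes"], log_prior, log_lik))  # prints right
```

Because the score is additive over words, ["cut", "taxes"] and
["taxes", "cut"] always receive the same label, which is exactly the
kind of information loss the abstract is concerned with.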
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l