[gov2001-l] Preliminary Abstract

24 Apr 2006

Geoff Humphreys and Chris Long

Classifying Political Documents

In recent years, political methodologists have produced numerous automated
document classification systems. Many of these systems, such as those based on
the well-known Naive Bayes algorithm, treat each word as a distinct entity and
ignore complex interactions between words. While this approach may appear
reasonable for some applications, the precise arrangement of words in political
documents often conveys meanings that cannot be captured so easily. In this
paper, we investigate the success of such naive algorithms by comparing
Wordscores, a Naive Bayes derivative, with several well-known algorithms and
with a new classification system based on vanilla recursive hierarchical
Dirichlet-multinomial mixture models, and we point out avenues for future
advancement. Surprisingly, we find that, its assumptions notwithstanding,
Wordscores shows dramatically improved performance, comparable to some of the
latest developments in document classification, when carrying out a small
number of carefully selected classifications on meticulously arranged
collections of political documents. We also discuss its use in practical
applications.
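To make the "each word as a distinct entity" assumption concrete, here is a
minimal sketch of the basic raw-score computation in Wordscores, following the
formulation of Laver, Benoit and Garry (2003). It is an illustration under
stated assumptions, not the implementation used in this paper: the function
names word_scores and score_virgin_text are hypothetical, texts are assumed to
be pre-tokenized lists of words, and the rescaling step applied to virgin-text
scores in the original method is omitted. Each reference text r has an a priori
position A_r; a word's score S_w averages those positions weighted by P(r | w);
a new text's score is the frequency-weighted mean of its words' scores.

```python
from collections import Counter

def word_scores(reference_texts, reference_positions):
    """Hypothetical helper: raw Wordscores-style score S_w for each word.

    reference_texts: list of token lists (assumed already tokenized).
    reference_positions: a priori position A_r for each reference text.
    """
    # Relative frequency F_wr of each word w within each reference text r.
    rel_freqs = []
    for tokens in reference_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        f = [rf.get(w, 0.0) for rf in rel_freqs]
        denom = sum(f)
        # P(r | w) = F_wr / sum_r F_wr; S_w = sum_r P(r | w) * A_r.
        scores[w] = sum((f_r / denom) * a_r
                        for f_r, a_r in zip(f, reference_positions))
    return scores

def score_virgin_text(tokens, scores):
    """Hypothetical helper: raw (unrescaled) score of a new text."""
    # Only words seen in the reference texts are scored. Each word
    # contributes independently, so word order cannot affect the result.
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("text contains no scoreable words")
    counts = Counter(scored)
    total = len(scored)
    return sum((c / total) * scores[w] for w, c in counts.items())

# Toy example with two invented reference texts at positions +1 and -1.
refs = [["tax", "cut", "tax", "market"], ["welfare", "spend", "tax", "state"]]
s = word_scores(refs, [1.0, -1.0])
print(score_virgin_text(["cut", "tax", "spend"], s))
print(score_virgin_text(["spend", "tax", "cut"], s))  # same value: order ignored
```

The two calls print the same value because the score depends only on word
frequencies, which is exactly the independence assumption the abstract
questions.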
