[gov2001-l] Abstract

7 May 2006

Last minute comments?

In recent years, political methodologists have produced innumerable automated
document ranking and classification systems. Many of these ignore word sequence
information, treating entire documents as mere collections of words. A subset of
these, including those based on the well-known Naive Bayes algorithm, assume
that word frequencies are independent of one another and that word sequence
information is unimportant \cite{domingos96}. A recently developed algorithm known as
Wordscores makes an even wider set of assumptions \cite{wordscores2003}. In
this paper, we compare Wordscores to several more moderate document ranking,
classification, and summarization algorithms. Surprisingly, we find that
Wordscores shows remarkably improved performance at carrying out a small number
of carefully selected classifications on meticulously arranged collections of
political documents. We also demonstrate its performance at gauging the effects
of news headlines on S\&P 500 daily securities prices and discuss its utility
in other applications.
