In recent years, corporate juggernauts such as Google have amassed enormous
fortunes by exploiting relatively simple document processing algorithms. A
huge body of literature detailing techniques for information
retrieval, document ranking and classification of large sequences and vectors
has sprung forth, and countless algorithms have been developed to take
advantage of rapid developments in this ever-changing realm. One such
algorithm is Wordscores, a simple document ranking algorithm developed recently
by Michael Laver, Kenneth Benoit and John Garry for extracting relative
political policy positions from documents. We find that, despite its
simplicity, Wordscores performs remarkably well at ranking political documents
when compared to a variety of classic algorithms. In this paper, we examine the
performance of Wordscores and illustrate its superiority at ranking political
texts, proposing extensions for improving its performance, detailing its
assumptions, and specifying conditions that must be met in order to use
Wordscores in making substantive claims.
Quoting ghumphr(a)fas.harvard.edu:
Ranking Documents with Wordscores
In recent years, corporate juggernauts such as Google, have amassed enormous
fortunes by making use of relatively simple document processing algorithms. A
huge body of literature has sprung forth detailing techniques for information
retrieval, document ranking and classification of large sequences and vectors,
and countless algorithms have been developed to take advantage rapid
developments in this ever-changing realm. One such algorithm is Wordscores, a
simple document ranking algorithm recently developed by Michael Laver, Kenneth
Benoit and John Garry for extracting relative political policy positions from
documents. Despite its simplicity Wordscores performs remarkably well at
ranking political documents when compared to a variety of classic document
ranking algorithms. In this paper, we examine the performance of Wordscores
and illustrate its superiority at ranking political texts, proposing extensions
for improving its performance, detailing its assumptions, and specifying
conditions which must be met in order to use it to make substantiative claims.