Regional Information Center for Science and Technology (RICeST) - Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

DocumentCode :

3189786

Title :

Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Author :

Nallapati, Ramesh ; Ahmed, Amr ; Cohen, William ; Xing, Eric

fYear :

2007

fDate :

28-31 Oct. 2007

Firstpage :

343

Lastpage :

348

Abstract :

Statistical topic models such as Latent Dirichlet Allocation (LDA) have emerged as an attractive framework to model, visualize and summarize large document collections in a completely unsupervised fashion. One of the limitations of this family of models is their assumption of exchangeability of words within documents, which results in a 'bag-of-words' representation for documents as well as topics. As a consequence, precious information that exists in the form of correlations between words is lost in these models. In this work, we adapt recent advances in sparse modeling techniques to the problem of modeling word correlations within topics and present a new algorithm called Sparse Word Graphs. Our experiments on the AP corpus reveal both long-distance and short-distance word correlations within topics that are semantically very meaningful. In addition, the new algorithm is highly scalable to large collections, as it captures only the most important correlations in a sparse manner.
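The abstract does not state which sparse modeling technique is adapted. As a rough illustration only, the sketch below uses one generic option, neighborhood selection with L1-penalized logistic regression: for each word, a sparse logistic model predicts that word's presence in a topic's documents from the presence of the other words, and every nonzero weight becomes an edge of the word graph. This is an assumption for illustration, not necessarily the authors' algorithm; the function names (`fit_l1_logistic`, `sparse_word_graph`, `toy_docs`), the penalty `lam`, the step size and the toy documents are all hypothetical.

```python
# Hypothetical sketch: word-correlation graph for ONE topic via neighborhood
# selection with L1-penalized logistic regression. This is a generic sparse
# modeling technique, not necessarily the algorithm used in the paper.
import math
from itertools import product


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    # Proximal operator of t*|x|: shrinks toward zero and returns exact zeros.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, iters=2000):
    """Fit L1-penalized logistic regression by proximal gradient (ISTA).

    X: rows of 0/1 features, y: 0/1 labels. The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        # Residuals p_i - y_i under the current model.
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        grad_b = sum(r) / n
        # Gradient step on the mean log-loss, then soft-threshold the weights.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(d)]
        b -= step * grad_b
    return w, b


def sparse_word_graph(docs, lam=0.15, tol=1e-8):
    """Regress each word's presence on all other words; nonzero weights are edges.

    docs: list of word sets, assumed to be the documents assigned to one topic.
    Uses the OR rule: keep an edge if either of its two regressions selects it.
    """
    vocab = sorted(set().union(*docs))
    rows = [[1 if v in doc else 0 for v in vocab] for doc in docs]
    edges = set()
    for j, word in enumerate(vocab):
        others = [k for k in range(len(vocab)) if k != j]
        X = [[row[k] for k in others] for row in rows]
        y = [row[j] for row in rows]
        w, _ = fit_l1_logistic(X, y, lam)
        for k, wk in zip(others, w):
            if abs(wk) > tol:
                edges.add(tuple(sorted((word, vocab[k]))))
    return edges


def toy_docs():
    # Hypothetical toy "topic": two co-occurring word pairs plus an
    # independent word, in every on/off combination.
    docs = []
    for a, c, r in product((0, 1), repeat=3):
        doc = set()
        if a:
            doc |= {"stock", "market"}
        if c:
            doc |= {"court", "judge"}
        if r:
            doc.add("rain")
        docs.append(doc)
    return docs


if __name__ == "__main__":
    print(sorted(sparse_word_graph(toy_docs())))
```

On this toy data, pairs that always co-occur (such as "stock" and "market") should be selected, while the independent word "rain" should get no edges. The soft-thresholding step is what keeps weak correlations at exactly zero, which loosely mirrors the abstract's point that capturing only the most important correlations keeps the method scalable.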

Keywords :

Conferences; Data mining; Educational programs; Hidden Markov models; Linear discriminant analysis; Machine learning; Machine learning algorithms; Subspace constraints; USA Councils; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Seventh IEEE International Conference on Data Mining Workshops, 2007 (ICDM Workshops 2007)

Conference_Location :

Omaha, NE

Print_ISBN :

978-0-7695-3019-2

Electronic_ISBN :

978-0-7695-3033-8

Type :

conf

DOI :

10.1109/ICDMW.2007.39

Filename :

4476689

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3189786