DocumentCode :
2193978
Title :
The PARIS Algorithm for Determining Latent Topics
Author :
Aharon, Michal ; Cohen, Ira ; Itskovitch, Arik ; Marhaim, Inbal ; Banner, Ron
Author_Institution :
Hewlett-Packard Israel Labs., Israel
fYear :
2010
fDate :
13 Dec. 2010
Firstpage :
1092
Lastpage :
1099
Abstract :
We introduce a new method for discovering latent topics in sets of objects, such as documents. Our method, which we call PARIS (for Principal Atoms Recognition In Sets), aims to detect principal sets of elements, representing latent topics in the data, that tend to appear frequently together. These latent topics, which we refer to as 'atoms', are used as the basis for clustering, classification, collaborative filtering, and more. We develop a target function that balances compression against low representation error, and an algorithm that minimizes this function. Optimization of the target function enables automatic discovery of both the number of atoms, representing the dimensionality of the data, and the atoms themselves, all in a single iterative procedure. We demonstrate PARIS's ability to discover latent topics, even when those are arranged hierarchically, on synthetic, document, and movie ranking data, showing improved performance compared to existing algorithms, such as LDA, on text analysis and collaborative filtering tasks.
Keywords :
data structures; document handling; iterative methods; optimisation; pattern clustering; set theory; PARIS algorithm; document clustering; iterative procedure; keyword extraction; latent topics determination; optimization; principal atoms recognition in sets; set representation; target function; automatic discovery; latent concepts; machine learning; topic extraction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
Type :
conf
DOI :
10.1109/ICDMW.2010.187
Filename :
5693416