Title :
Finding Hotspots in Document Collection
Author :
Peng, Wei ; Ding, Chris ; Li, Tao ; Sun, Tong
Author_Institution :
Florida Int. Univ., Miami
Abstract :
Given a document collection, it is often desirable to find the core subset of documents focusing on a specific topic. Document clustering methods partition document-term datasets into groups by optimizing certain objective functions. However, these methods are not suitable for finding hotspots, which are described by a small set of documents sharing a few tightly coupled terms. In this paper we propose a novel hotspot-finding algorithm for document collections, DCC (Dense Concept Clustering). DCC extracts distinct small topics together with their most representative documents and words simultaneously. The hotspots are dense bicliques in binary document-word matrices, and they can be discovered sequentially, one at a time, using the generalized Motzkin-Straus formalism. The representative documents and words are tightly correlated and together describe each concept. Experiments on real document datasets show the effectiveness of the proposed algorithm.
Keywords :
document handling; pattern clustering; binary document-word matrices; dense bicliques; dense concept clustering; document clustering; document collection; document-term datasets; generalized Motzkin-Straus formalism; hot spot finding algorithm; Artificial intelligence; Bioinformatics; Clustering algorithms; Clustering methods; Computer science; Concrete; Partitioning algorithms; Sun; Technological innovation; USA Councils;
Conference_Title :
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3015-4
DOI :
10.1109/ICTAI.2007.173