DocumentCode
468424
Title
Finding Hotspots in Document Collection
Author
Peng, Wei ; Ding, Chris ; Li, Tao ; Sun, Tong
Author_Institution
Florida Int. Univ., Miami
Volume
1
fYear
2007
fDate
29-31 Oct. 2007
Firstpage
313
Lastpage
320
Abstract
Given a document collection, it is often desirable to find the core subset of documents focusing on a specific topic. We propose a new algorithm for this task. Document clustering aims at partitioning document-term datasets into different groups by optimizing certain objective functions. However, clustering methods are not suitable for finding hotspots, which are described by a small set of documents with few tightly coupled terms. In this paper we propose a novel hotspot-finding algorithm for document collections, DCC (Dense Concept Clustering). DCC can extract distinct small topics with their most representative documents and words simultaneously. The hotspots are dense bicliques in binary document-word matrices, and they can be discovered sequentially, one at a time, using the generalized Motzkin-Straus formalism. The representative documents and words are tightly correlated for concept descriptions. Experiments on real document datasets show the effectiveness of the proposed algorithm.
Keywords
document handling; pattern clustering; binary document-word matrices; dense bicliques; dense concept clustering; document clustering; document collection; document-term datasets; generalized Motzkin-Straus formalism; hot spot finding algorithm; Artificial intelligence; Bioinformatics; Clustering algorithms; Clustering methods; Computer science; Concrete; Partitioning algorithms; Sun; Technological innovation; USA Councils;
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location
Patras
ISSN
1082-3409
Print_ISBN
978-0-7695-3015-4
Type
conf
DOI
10.1109/ICTAI.2007.173
Filename
4410300