• DocumentCode
    468424
  • Title
    Finding Hotspots in Document Collection
  • Author
    Peng, Wei; Ding, Chris; Li, Tao; Sun, Tong
  • Author_Institution
    Florida Int. Univ., Miami
  • Volume
    1
  • fYear
    2007
  • fDate
    29-31 Oct. 2007
  • Firstpage
    313
  • Lastpage
    320
  • Abstract
    Given a document collection, it is often desirable to find the core subset of documents focusing on a specific topic. Document clustering partitions document-term datasets into groups by optimizing certain objective functions, but such methods are not suitable for finding hotspots, which are described by a small set of documents with a few tightly coupled terms. In this paper we propose DCC (Dense Concept Clustering), a novel hotspot-finding algorithm for document collections. DCC extracts distinct small topics together with their most representative documents and words simultaneously. The hotspots are dense bicliques in binary document-word matrices, and they are discovered sequentially, one at a time, using the generalized Motzkin-Straus formalism. The representative documents and words of each hotspot are tightly correlated and together describe its concept. Experiments on real document datasets show the effectiveness of the proposed algorithm.
  • Keywords
    document handling; pattern clustering; binary document-word matrices; dense bicliques; dense concept clustering; document clustering; document collection; document-term datasets; generalized Motzkin-Straus formalism; hot spot finding algorithm; Artificial intelligence; Bioinformatics; Clustering algorithms; Clustering methods; Computer science; Concrete; Partitioning algorithms; Sun; Technological innovation; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
  • Conference_Location
    Patras
  • ISSN
    1082-3409
  • Print_ISBN
    978-0-7695-3015-4
  • Type
    conf
  • DOI
    10.1109/ICTAI.2007.173
  • Filename
    4410300