• DocumentCode
    2131295
  • Title

    A New Graph-Based Algorithm for Clustering Documents

  • Author

    Suarez, A.P. ; Trinidad, José Fco Martínez ; Ochoa, Jesús Ariel Carrasco ; Pagola, José E Medina

  • Author_Institution
    Adv. Technol. Applic. Center, La Habana
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    710
  • Lastpage
    719
  • Abstract
    In this paper, a new algorithm for document clustering, called CStar, is presented. This algorithm improves on recently developed algorithms such as the generalized star (GStar) and ACONS algorithms, which were originally proposed to reduce some drawbacks of previous Star-like algorithms. The CStar algorithm uses the condensed star-shaped sub-graph concept defined by ACONS, but defines a new heuristic that makes it possible to construct a new cover of the thresholded similarity graph and to reduce the drawbacks of the GStar and ACONS algorithms. Experiments on standard document collections show that our proposal outperforms previously defined algorithms and other related algorithms used for document clustering.
  • Keywords
    document handling; graph theory; pattern clustering; ACONS algorithm; CStar algorithm; GStar algorithm; clustering documents; condensed star-shaped subgraph concept; generalized star algorithm; graph-based algorithm; Astrophysics; Clustering algorithms; Conferences; Data mining; Filtering; Gas insulated transmission lines; Information retrieval; Optical filters; Parallel algorithms; Proposals; Clustering; Data Mining; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-0-7695-3503-6
  • Electronic_ISBN
    978-0-7695-3503-6
  • Type

    conf

  • DOI
    10.1109/ICDMW.2008.69
  • Filename
    4733997