• DocumentCode
    711884
  • Title
    Search Results Clustering Algorithm Based on the Suffix Tree
  • Author
    Dengwei Wang ; Libo Liu ; Jing Dong ; Jiao Zheng
  • Author_Institution
    Sch. of Math. & Comput. Sci., Ningxia Univ., Yinchuan, China
  • fYear
    2015
  • fDate
    24-26 April 2015
  • Firstpage
    456
  • Lastpage
    460
  • Abstract
    The suffix tree clustering (STC) algorithm clusters documents based on shared phrases and runs in linear time. To address shortcomings of the existing STC algorithm, such as the quality of its clustering results and the screening of cluster labels, the paper improves the algorithm in three respects: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for cluster labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that the improved algorithm performs better than the original and produces more readable, descriptive, and distinguishable cluster labels.
  • Keywords
    computational complexity; document handling; information retrieval; pattern clustering; trees (mathematics); STC algorithm; base cluster merging; clustering labels; descriptive clustering labels; distinguishable clustering labels; entropy; linear time algorithm; scoring function; search result clustering algorithm; shared phrases; similarity calculation formula; suffix tree; Algorithm design and analysis; Bismuth; Clustering algorithms; Data mining; Entropy; Mathematical model; Search engines; clustering algorithm; document clustering; search result clustering; suffix tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Control Engineering (ICISCE), 2015 2nd International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-6849-0
  • Type
    conf
  • DOI
    10.1109/ICISCE.2015.106
  • Filename
    7120646