DocumentCode
711884
Title
Search Results Clustering Algorithm Based on the Suffix Tree
Author
Dengwei Wang ; Libo Liu ; Jing Dong ; Jiao Zheng
Author_Institution
Sch. of Math. & Comput. Sci., Ningxia Univ., Yinchuan, China
fYear
2015
fDate
24-26 April 2015
Firstpage
456
Lastpage
460
Abstract
The suffix tree clustering (STC) algorithm clusters documents based on shared phrases and runs in linear time. To address shortcomings of the existing STC algorithm, such as the quality of its clustering results and the screening of its clustering labels, this paper improves the algorithm in three respects: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for clustering labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that, compared with the original algorithm, the improved algorithm achieves better clustering results and produces more readable, descriptive, and distinguishable clustering labels.
Keywords
computational complexity; document handling; information retrieval; pattern clustering; trees (mathematics); STC algorithm; base cluster merging; clustering labels; descriptive clustering labels; distinguishable clustering labels; entropy; linear time algorithm; scoring function; search result clustering algorithm; shared phrases; similarity calculation formula; suffix tree; Algorithm design and analysis; Bismuth; Clustering algorithms; Data mining; Entropy; Mathematical model; Search engines; clustering algorithm; document clustering; search result clustering; suffix tree;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Control Engineering (ICISCE), 2015 2nd International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4673-6849-0
Type
conf
DOI
10.1109/ICISCE.2015.106
Filename
7120646