DocumentCode
575010
Title
Research on the parallel text clustering algorithm based on the semantic tree
Author
Liu, Gangfeng ; Wang, Yunlan ; Zhao, Tianhai ; Li, Dongyang
Author_Institution
Center for High Performance Comput., Northwestern Polytech. Univ., Xi'an, China
fYear
2011
fDate
Nov. 29 2011-Dec. 1 2011
Firstpage
400
Lastpage
403
Abstract
Text clustering algorithms that use only word frequency ignore the semantic relationships between words, so their results are imprecise. In this paper, a text clustering algorithm based on a semantic tree built from WordNet is proposed. To reduce the time complexity, we adopt a parallel algorithm in a multi-process model, which starts several processes simultaneously. The master process is responsible for partitioning the data, sending and collecting information, and clustering the results. The slave processes are mainly responsible for word-frequency statistics, weight calculation, and retrieving the hypernyms of certain words from the semantic tree. Experimental results show that the algorithm achieves both higher precision and lower time complexity.
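The master/slave division of labour described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The `HYPERNYMS` table is a hypothetical one-level stand-in for WordNet. Tf-idf weighting and single-pass "leader" clustering with a cosine threshold are assumed choices because the abstract does not name the weighting scheme or the clustering method. `ThreadPoolExecutor` stands in for the multi-process model. Swapping in `ProcessPoolExecutor`, which has the same `map` interface, would run the slaves as separate processes, provided the call is placed under `if __name__ == "__main__":`. All function names are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical one-level "semantic tree" (word -> hypernym). A real system
# would query WordNet instead; this toy table is an assumption for the demo.
HYPERNYMS = {"cat": "animal", "dog": "animal",
             "car": "vehicle", "truck": "vehicle"}


def partition(docs, n_workers):
    """Master: split (doc_id, text) pairs into n round-robin chunks."""
    items = list(docs.items())
    return [items[i::n_workers] for i in range(n_workers)]


def count_terms(chunk):
    """Slave: word frequencies, replacing each word by its hypernym if known."""
    return {doc_id: Counter(HYPERNYMS.get(w, w) for w in text.lower().split())
            for doc_id, text in chunk}


def weigh_terms(chunk_counts, df, n_docs):
    """Slave: tf-idf weight for every term of the documents in one chunk."""
    return {doc_id: {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
            for doc_id, c in chunk_counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def leader_cluster(vectors, threshold):
    """Master: single-pass clustering; a document joins the first cluster
    whose leader (first member) is at least `threshold` similar to it."""
    clusters = []
    for doc_id, vec in vectors.items():
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def parallel_text_clustering(docs, n_workers=2, threshold=0.1):
    chunks = partition(docs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Round 1: slaves count hypernym-mapped terms; master collects them.
        partial_counts = list(pool.map(count_terms, chunks))
        # Master computes global document frequencies and sends them back.
        df = Counter(t for part in partial_counts
                     for c in part.values() for t in c)
        # Round 2: each slave weights its own chunk using the global df.
        weigh = partial(weigh_terms, df=df, n_docs=len(docs))
        vectors = {}
        for part in pool.map(weigh, partial_counts):
            vectors.update(part)
    # Master restores the original document order, then clusters.
    ordered = {d: vectors[d] for d in docs}
    return leader_cluster(ordered, threshold)
```

For example, `parallel_text_clustering({"d1": "cat sleeps", "d2": "dog barks", "d3": "car stalls", "d4": "truck honks"})` returns `[['d1', 'd2'], ['d3', 'd4']]`. No two documents share a surface word, so pure word-frequency vectors would leave all four apart. After hypernym substitution, "cat"/"dog" and "car"/"truck" each share a term, which illustrates the semantic effect the abstract describes. The 0.1 threshold is an arbitrary assumption.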
Keywords
computational complexity; parallel algorithms; pattern clustering; statistics; text analysis; trees (mathematics); word processing; WordNet; data partitioning; information collection; information sending; multiprocesses model; parallel algorithm; parallel text clustering algorithm; semantic tree; time complexity reduction; word frequency statistics; word hypernyms; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Parallel algorithms; Partitioning algorithms; Semantics; Parallel Algorithm; Semantic Tree; Text Clustering; WordNet;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Sciences and Convergence Information Technology (ICCIT), 2011 6th International Conference on
Conference_Location
Seogwipo
Print_ISBN
978-1-4577-0472-7
Type
conf
Filename
6316646