A Text Clustering Algorithm Based on Find of Density Peaks

Author

Peiyu Liu;Yingying Liu;Xiuyan Hou;Qingqing Li;Zhenfang Zhu

Author_Institution

Shandong Yingcai Univ., Jinan, China

Year

2015

Firstpage

348

Lastpage

352

Abstract

Text clustering is one of the core problems in text mining and information retrieval. Clustering algorithms fall into four categories: partitioning, hierarchical, density-based, and intelligent clustering algorithms. However, most clustering algorithms cannot meet the speed and self-adaptation demands of text clustering. This paper proposes a text clustering algorithm based on finding density peaks. The algorithm computes text distance and density from the similarity of text vectors. SVM is used to represent each text as a vector for the similarity calculation. Next, for each text, the local density and its distance from points of higher density are computed, noise points are removed, and cluster centers are selected. Each remaining point is assigned to the cluster represented by its nearest cluster center. Several sets of comparative experiments indicate that the density-based text clustering algorithm has advantages in reliability and robustness.
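
The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names SVM for text representation; this sketch instead assumes plain bag-of-words term counts compared by cosine distance. The function names, the `cutoff` and `min_density` parameters, the noise rule, and the choice of ranking candidate centers by density times distance are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (an assumed stand-in for the paper's text vectors)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def density_peaks(texts, n_clusters, cutoff=0.7, min_density=1):
    """Cluster texts; returns (labels, centers), with label -1 marking noise."""
    vecs = [tf_vector(t) for t in texts]
    n = len(vecs)
    dist = [[cosine_distance(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Local density: number of other texts closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]

    # delta: distance to the nearest text of higher density (ties broken by index).
    # The densest text gets its largest distance to any text instead.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])

    # Assumed noise rule: texts with fewer than min_density close neighbours.
    noise = {i for i in range(n) if rho[i] < min_density}

    # Assumed center rule: the n_clusters non-noise texts with largest rho * delta.
    candidates = [i for i in range(n) if i not in noise]
    centers = sorted(candidates, key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]

    # Each remaining text joins the cluster of its nearest center.
    labels = []
    for i in range(n):
        if i in noise:
            labels.append(-1)
        else:
            labels.append(min(range(len(centers)), key=lambda c: dist[i][centers[c]]))
    return labels, centers
```

Calling `density_peaks(texts, n_clusters=2)` on a list of strings returns one label per text plus the indices of the texts chosen as centers. The pairwise distance matrix makes this quadratic in the number of texts, which is acceptable for a sketch but would need a sparse or approximate neighbour search on a large corpus.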

Keywords

"Clustering algorithms","Partitioning algorithms","Clustering methods","Robustness","Text mining","Information retrieval","Algorithm design and analysis"

Publisher

IEEE

Conference_Title

Information Technology in Medicine and Education (ITME), 2015 7th International Conference on

Type

conf

DOI

10.1109/ITME.2015.103

Filename

7429163