Regional Information Center for Science and Technology - The research of text clustering algorithms based on frequent term sets

DocumentCode :

2329737

Title :

The research of text clustering algorithms based on frequent term sets

Author :

Liu, Xiang-Wei ; He, Pi-Lian ; Wang, Hui-Ying

Author_Institution :

Dept. of Comput. Sci., Tianjin Polytech. Univ., China

fYear :

2005

fDate :

18-21 Aug. 2005

Firstpage :

2352

Abstract :

In this paper, we present a text-clustering algorithm, frequent term set-based clustering (FTSC), which uses frequent term sets to cluster texts. The algorithm efficiently reduces the dimensionality of the text data and can thereby improve both the accuracy and the running speed of clustering. However, the clusters produced by FTSC cannot reflect overlap between text classes. Building on FTSC, we therefore present an improved algorithm, frequent term set-based hierarchical clustering (FTSHC). FTSHC determines the overlap of text classes from the overlap of frequent term sets and uses the frequent term sets to provide an understandable description of each discovered cluster. Experimental results show that FTSC and FTSHC outperform the K-Means algorithm in clustering performance.
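The abstract does not spell out the FTSC or FTSHC procedures, so the following Python sketch only illustrates the general idea it describes, under our own assumptions: each text is treated as a set of whitespace-separated terms, frequent term sets are mined Apriori-style with a hypothetical `min_support` threshold (a standard technique swapped in here, not necessarily the paper's), each chosen term set labels a cluster, and overlap between clusters is read off the documents their term sets share. The greedy selection rule and the names `frequent_term_sets`, `cluster_by_term_sets` and `overlaps` are hypothetical, and the sketch is flat: it does not build FTSHC's hierarchy.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=3):
    """Apriori-style search for term sets contained in at least
    `min_support` documents. `docs` is a list of term sets; returns
    {term_set: frozenset of ids of the documents containing it}."""
    result = {}
    candidates = {frozenset([t]) for d in docs for t in d}  # 1-term sets
    size = 1
    while candidates and size <= max_size:
        level = {}
        for cand in candidates:
            cover = frozenset(i for i, d in enumerate(docs) if cand <= d)
            if len(cover) >= min_support:
                level[cand] = cover
        result.update(level)
        # Join pairs of frequent k-term sets into (k+1)-term candidates.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size + 1}
        size += 1
    return result


def cluster_by_term_sets(texts, min_support=2, max_size=3):
    """Greedily choose frequent term sets as cluster labels.

    A chosen term set's cluster is its full cover, so one document may
    belong to several clusters; shared documents expose class overlap.
    Returns (list of (term_set, doc_ids), set of unclustered doc ids)."""
    docs = [set(t.lower().split()) for t in texts]
    fts = frequent_term_sets(docs, min_support, max_size)
    remaining = set(range(len(docs)))
    clusters = []
    while remaining:
        # Hypothetical rule: cover the most still-unclustered documents,
        # break ties toward longer (more specific) term sets, then by terms.
        best = max(fts,
                   key=lambda ts: (len(fts[ts] & remaining), len(ts), sorted(ts)),
                   default=None)
        if best is None or not fts[best] & remaining:
            break  # leftover documents share no frequent term set
        clusters.append((best, fts[best]))
        remaining -= fts[best]
    return clusters, remaining


def overlaps(clusters):
    """Documents shared by each pair of clusters, keyed by sorted labels."""
    return {(tuple(sorted(a)), tuple(sorted(b))): ca & cb
            for (a, ca), (b, cb) in combinations(clusters, 2) if ca & cb}
```

On a toy corpus of eight one-line texts in which a single text mentions both "python" and "banana" (with `min_support=2`), this sketch selects clusters labelled {python} and {banana}, and `overlaps` reports that shared text as their overlap. The term sets double as readable cluster descriptions, which is the property the abstract highlights; the exhaustive pairwise candidate join is quadratic and meant only for small examples.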

Keywords :

data mining; pattern clustering; text analysis; K-Means algorithm; frequent term set-based hierarchical clustering algorithm; text clustering algorithm; Clustering algorithms; Clustering methods; Computer science; Feature extraction; Frequency; Helium; Partitioning algorithms; Tagging; Text mining; Web mining; Text cluster; frequent term set-based clustering;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 2005 International Conference on Machine Learning and Cybernetics

Conference_Location :

Guangzhou, China

Print_ISBN :

0-7803-9091-1

Type :

conf

DOI :

10.1109/ICMLC.2005.1527337

Filename :

1527337

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2329737