DocumentCode
441840
Title
TCBLHT: a new method of hierarchical text clustering
Author
Xu, Jian-Suo ; Wang, Li
Author_Institution
Sch. of Economy & Manage., Henan Normal Univ., Xinxiang, China
Volume
4
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
2178
Abstract
This paper presents a new method of hierarchical text clustering, called TCBLHT, that combines latent semantic analysis (LSA) with hierarchical TGSOM. Traditional text clustering methods cannot reveal hierarchical structure in their results, even though that structure is very important in text clustering. The TCBLHT method achieves hierarchical text clustering automatically. It uses LSA theory to establish a vector space model (VSM) of term weights, so that semantic relations are included in the vector space model. Both theoretical analysis and experimental results confirm that the TCBLHT method reduces the number of vectors and improves the efficiency and precision of text clustering.
Keywords
computational linguistics; data mining; pattern clustering; text analysis; vectors; LSA; TCBLHT; hierarchical TGSOM; hierarchical text clustering; latent semantic analysis; vector space model; Clustering methods; Data mining; Frequency; Functional analysis; Machine learning; Matrix decomposition; Singular value decomposition; Statistics; Technology management; Text mining; Hierarchical TGSOM; Latent Semantic Analysis; Text Clustering; Vector Space Model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527306
Filename
1527306