DocumentCode
2331179
Title
Using semantic similarity matrix for defining operations involved in NTSO for clustering 20NewsGroups
Author
Jo, Taeho
Author_Institution
Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
6
Abstract
In this research, we propose a similarity-matrix-based version of NTSO as an approach to text clustering. To apply traditional approaches to text clustering, documents must first be encoded into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. To solve these problems, we propose encoding documents into string vectors and using NTSO (Neural Text Self Organization), a string-vector-based neural network, for text clustering. By encoding documents into this alternative form, we attempt to avoid both problems completely. For empirical validation, the proposed approach is compared with other approaches in terms of clustering performance and speed.
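The abstract contrasts numerical-vector encoding with string-vector encoding but does not spell out how NTSO builds or compares string vectors. The Python sketch below is a hypothetical illustration only, not the paper's method. It assumes that a string vector is the d most frequent words of a document, that the word-by-word similarity matrix is the Jaccard overlap of the documents containing each word, and that two string vectors are compared by symmetrized best-match averaging. All function names (`to_string_vector`, `build_similarity_matrix`, `string_vector_similarity`, `to_numerical_vector`) are invented for this example.

```python
from collections import Counter
from itertools import product


def to_numerical_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Bag-of-words term frequencies: one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocabulary]


def to_string_vector(text: str, d: int) -> list[str]:
    """Hypothetical string vector: the d most frequent words, ties broken alphabetically."""
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:d]]


def build_similarity_matrix(corpus: list[str]) -> dict[tuple[str, str], float]:
    """Hypothetical word-by-word similarity: Jaccard overlap of the sets of
    documents in which each pair of words occurs."""
    occurrence: dict[str, set[int]] = {}
    for doc_id, text in enumerate(corpus):
        for word in set(text.lower().split()):
            occurrence.setdefault(word, set()).add(doc_id)
    matrix: dict[tuple[str, str], float] = {}
    for a, b in product(occurrence, repeat=2):
        shared = occurrence[a] & occurrence[b]
        union = occurrence[a] | occurrence[b]  # never empty: each word occurs somewhere
        matrix[(a, b)] = len(shared) / len(union)
    return matrix


def string_vector_similarity(x: list[str], y: list[str],
                             matrix: dict[tuple[str, str], float]) -> float:
    """Assumed comparison: for each word, take its best match in the other
    vector, average, then symmetrize over both directions.
    Both string vectors must be non-empty."""
    def directed(u: list[str], v: list[str]) -> float:
        return sum(max(matrix.get((a, b), 0.0) for b in v) for a in u) / len(u)
    return (directed(x, y) + directed(y, x)) / 2


if __name__ == "__main__":
    corpus = ["apple banana apple", "banana cherry", "cherry apple cherry"]
    vocabulary = sorted({w for t in corpus for w in t.lower().split()})
    matrix = build_similarity_matrix(corpus)
    vectors = [to_string_vector(t, d=2) for t in corpus]
    print(vectors)
    print(to_numerical_vector(corpus[0], vocabulary))
    print(string_vector_similarity(vectors[0], vectors[2], matrix))
```

The numerical vector has one dimension per distinct word in the corpus, so on a collection such as 20NewsGroups it becomes very long and mostly zeros; the string vector keeps a fixed length d regardless of vocabulary size, and word-level relatedness is moved into the similarity matrix instead.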
Keywords
matrix algebra; neural nets; pattern clustering; text analysis; vectors; 20NewsGroups; NTSO; neural network; neural text self organization; numerical vector; semantic similarity matrix; string vector; text clustering; Artificial neural networks; Clustering algorithms; Encoding; Finite element methods; Semantics; Text categorization; Training;
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location
Barcelona
Print_ISBN
978-1-4244-6909-3
Type
conf
DOI
10.1109/CEC.2010.5586335
Filename
5586335