Title :
A clustering algorithm based on latent semantic model
Author :
Wang, Bu-Yu ; Li, Mei-An ; Wang, Yong-Jiang
Author_Institution :
Coll. of Comput. & Inf. Eng., Inner Mongolia Agric. Univ., Hohhot, China
Abstract :
To retrieve information about Chinese persons on the web accurately, and in particular to distinguish between people who share the same name, this paper proposes a clustering algorithm based on a latent semantic model. By building a central word library of person attributes, it establishes for every document a latent semantic model in the form of a sentence-word matrix based on central distance, central segment, document length, etc. It then clusters similar documents by means of a dynamic-extending clustering algorithm. Experiments show that the algorithm clusters documents with high accuracy while maintaining the coherence of each person's semantic information and highlighting the importance of semantic information under different sequences.
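The abstract gives no formulas, so the following Python sketch is only one plausible reading of the pipeline, not the authors' method. Everything in it is an assumption made for illustration: the contents of the central word library, the weight 1 / (1 + distance to the nearest central word), the normalization by sentence count as a stand-in for document length, and a simple threshold-based incremental clustering in place of the paper's dynamic-extending algorithm. It also works directly on the weighted matrix and does not include a latent decomposition step.

```python
import math
from collections import defaultdict

# Hypothetical central word library of person attributes (illustrative only).
CENTRAL_WORDS = frozenset({"professor", "university", "singer", "album"})


def sentence_word_matrix(sentences, central_words=CENTRAL_WORDS):
    """Return one {word: weight} row per sentence of a document.

    Assumed weighting: 1 / (1 + distance to the nearest central word in the
    same sentence). Sentences without a central word yield an empty row.
    """
    rows = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        centers = [i for i, tok in enumerate(tokens) if tok in central_words]
        row = defaultdict(float)
        if centers:
            for i, tok in enumerate(tokens):
                dist = min(abs(i - c) for c in centers)
                row[tok] = max(row[tok], 1.0 / (1 + dist))
        rows.append(dict(row))
    return rows


def document_vector(rows):
    """Sum the rows and divide by the sentence count (a proxy for document length)."""
    vec = defaultdict(float)
    for row in rows:
        for word, weight in row.items():
            vec[word] += weight
    n = max(len(rows), 1)
    return {word: weight / n for word, weight in vec.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(vectors, threshold=0.3):
    """Assign each document to its most similar cluster, or open a new one.

    A document joins the best cluster only if the cosine similarity to that
    cluster's centroid reaches the threshold; the centroid is then updated
    as the running mean of its members.
    """
    clusters = []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": dict(vec), "members": [idx]})
            continue
        best["members"].append(idx)
        n = len(best["members"])
        old = best["centroid"]
        best["centroid"] = {
            w: (old.get(w, 0.0) * (n - 1) + vec.get(w, 0.0)) / n
            for w in set(old) | set(vec)
        }
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    docs = [
        ["Wang is a professor at the university", "He teaches computing"],
        ["Wang the professor works at a university lab"],
        ["Wang is a singer with a new album"],
    ]
    vectors = [document_vector(sentence_word_matrix(d)) for d in docs]
    print(incremental_cluster(vectors))
```

In this toy input, the three documents mention the same name, and the grouping depends entirely on which attribute words surround it. The threshold and the weighting function are free parameters of the sketch and would need tuning on real data.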
Keywords :
Internet; document handling; information analysis; natural language processing; pattern clustering; Chinese person information; World Wide Web; central distance; central segment; central word library; document length; documents clustering; dynamic-extending clustering algorithm; latent semantic model; person attributes; sentence-word matrix; Agricultural engineering; Clustering algorithms; Coherence; Data mining; Educational institutions; Electronic mail; Engineering profession; Heuristic algorithms; Libraries; Stress; Latent semantic model; center word distance; central word position; central word set;
Conference_Title :
Apperceiving Computing and Intelligence Analysis, 2009. ICACIA 2009. International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5204-0
Electronic_ISBN :
978-1-4244-5206-4
DOI :
10.1109/ICACIA.2009.5361155