DocumentCode :
3599878
Title :
Word distributed representation based text clustering
Author :
Shan Feng ; Ruifang Liu ; Qinlong Wang ; Ruisheng Shi
Author_Institution :
Sch. of Inf. & Commun. Eng., BUPT, Beijing, China
fYear :
2014
Firstpage :
389
Lastpage :
393
Abstract :
The fast growth of Internet web documents has posed new challenges for efficiently and accurately managing and retrieving textual collections, and text clustering plays a significant role in meeting them. Traditional document clustering is an unsupervised categorization of a given document collection based on the vector space model, which represents each document as a highly sparse vector. In this paper, we propose a way to address this shortcoming using word vectors in a distributed representation obtained from a neural probabilistic language model. To improve the document vector representation and enhance the accuracy of text clustering, we first compute semantic similarities between words using their word embedding vectors, and then expand the keywords of each document. The experimental results show that the method can improve clustering accuracy.
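The keyword expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the threshold, or the vocabulary used, so cosine similarity, the 0.9 cutoff, the toy vectors, and the helper names `cosine` and `expand_keywords` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Toy word vectors (hypothetical values). A real system would load vectors
# trained by a neural probabilistic language model.
EMBEDDINGS = {
    "engineer": np.array([0.9, 0.1, 0.0]),
    "developer": np.array([0.8, 0.2, 0.1]),
    "salary": np.array([0.0, 0.9, 0.3]),
    "wage": np.array([0.1, 0.8, 0.4]),
    "beijing": np.array([0.1, 0.0, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors (assumed measure)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand_keywords(keywords, embeddings, threshold=0.9):
    """Return the keywords plus every vocabulary word whose similarity
    to at least one original keyword reaches the threshold."""
    expanded = list(keywords)
    for kw in keywords:
        if kw not in embeddings:
            continue  # out-of-vocabulary keywords are kept but not expanded
        for word, vec in embeddings.items():
            if word not in expanded and cosine(embeddings[kw], vec) >= threshold:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_keywords(["engineer", "salary"], EMBEDDINGS))
```

With these toy vectors, a document tagged with "engineer" and "salary" also picks up "developer" and "wage". This makes its vector less sparse before clustering, which is the effect the abstract attributes to keyword expansion.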
Keywords :
distributed processing; pattern clustering; text analysis; Internet Web documents; document vector representation; neural probabilistic language model; sparse vector; text clustering; textual collections; unsupervised categorization; vector space model; word distributed representation; Recruitment; Semantics; Keyword extension; NMF; Word distributed representation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
Type :
conf
DOI :
10.1109/CCIS.2014.7175766
Filename :
7175766