DocumentCode :
3727464
Title :
Topical Paragraph Vector learning
Author :
Qinlong Wang; Ruifang Liu; Hongqiao Li; Wenbin Guo
Author_Institution :
School of Information and Communication Engineering, Beijing University of Posts & Telecommunications, China, 100876
fYear :
2015
Firstpage :
182
Lastpage :
187
Abstract :
Word embeddings are distributed representations of word features. Despite their effectiveness, most word embeddings share a common limitation: each word is represented by a single vector, which fails to capture homonymy and polysemy. In this paper, we propose the Topical Paragraph Vector (TPV), whose training procedure is similar to that of standard word embeddings. We also use the ordering and semantics of words as features during training. In addition, we employ a latent topic model to assign a specific topic to each word given the context of the document. With the proposed TPV model, we implicitly obtain multiple embeddings for each word in the latent space, thereby overcoming the weakness of single word embeddings to a certain extent. Furthermore, our model combines the word embeddings within a document into a single vector, yielding a more semantically enriched document-level representation. Our experiments show that TPV outperforms the baseline model on a text classification task on the 20 Newsgroups corpus.
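The core idea described in the abstract — topic-conditioned word embeddings combined into a document vector — can be illustrated with a minimal sketch. This is not the authors' implementation: the topic assigner below is a hypothetical stand-in for a real latent topic model (e.g. LDA inference), and all names, dimensions, and the averaging scheme are illustrative assumptions.

```python
import random

random.seed(0)
DIM = 8          # embedding dimensionality (illustrative choice)
NUM_TOPICS = 2   # number of latent topics (illustrative choice)

def assign_topic(word, context):
    """Hypothetical stand-in for a latent topic model: in TPV a topic
    model assigns each word a topic given its document context; here we
    just hash the (word, context) pair to a topic id for illustration."""
    return hash((word, tuple(sorted(context)))) % NUM_TOPICS

embeddings = {}  # (word, topic) -> vector; one vector per word SENSE,
                 # not per word, which is how polysemy is captured

def vector(word, topic):
    """Look up (or lazily initialize) the topic-specific word vector."""
    key = (word, topic)
    if key not in embeddings:
        embeddings[key] = [random.uniform(-0.5, 0.5) for _ in range(DIM)]
    return embeddings[key]

def document_vector(tokens):
    """Combine the topic-tagged word vectors of a document into one
    document-level vector (simple averaging, as an assumption)."""
    vecs = [vector(w, assign_topic(w, tokens)) for w in tokens]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

doc = ["apple", "released", "a", "new", "phone"]
dv = document_vector(doc)
print(len(dv))  # document vector has the embedding dimensionality
```

In a real system the randomly initialized vectors would be trained (e.g. with a paragraph-vector-style objective), and the topic assignment would come from inference over the corpus rather than a hash.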
Keywords :
"Training","Context","Semantics","Vocabulary","Context modeling","Computational modeling","Text categorization"
Publisher :
ieee
Conference_Titel :
2015 11th International Conference on Natural Computation (ICNC)
Electronic_ISBN :
2157-9563
Type :
conf
DOI :
10.1109/ICNC.2015.7377987
Filename :
7377987