DocumentCode
118130
Title
Document classification with distributions of word vectors
Author
Chao Xing ; Dong Wang ; Xuewei Zhang ; Chao Liu
Author_Institution
Center for Speaker & Language Technol. (CSLL), Tsinghua Univ., Beijing, China
fYear
2014
fDate
9-12 Dec. 2014
Firstpage
1
Lastpage
5
Abstract
The word-to-vector (W2V) technique represents words as low-dimensional continuous vectors in such a way that semantically related words are close to each other. This produces a semantic space in which a word or a collection of words (e.g., a document) can be well represented, which lends itself to a multitude of applications, including document classification. Our previous study demonstrated that representations derived from word vectors are highly promising for document classification and can deliver better performance than the conventional LDA model. This paper extends that research and proposes to model the distributions of word vectors in documents or document classes. This goes beyond the naive approach of deriving document representations by average pooling of word vectors, and explores the possibility of modeling documents directly in the semantic space. Experiments on the Sohu text database show that the new approach may deliver better performance in document classification.
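The abstract contrasts two ways of using word vectors for classification: average pooling (one mean vector per document) and modeling the distribution of word vectors per document class. The sketch below illustrates both under stated assumptions. It is not the authors' method: the choice of a single diagonal Gaussian per class, equal class priors, the variance floor, and all function names (average_pool, fit_class_gaussians, log_likelihood, classify) are hypothetical, and the toy vectors stand in for real W2V embeddings.

```python
import math
from collections import defaultdict

def average_pool(doc, vectors):
    """Baseline: represent a document by the mean of its known word vectors."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in doc if w in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def fit_class_gaussians(docs, labels, vectors, var_floor=1e-3):
    """Assumed model: one diagonal Gaussian per class, fitted on all word
    vectors occurring in that class's training documents."""
    pooled = defaultdict(list)
    for doc, label in zip(docs, labels):
        pooled[label].extend(vectors[w] for w in doc if w in vectors)
    models = {}
    for label, vecs in pooled.items():
        if not vecs:
            continue  # skip classes with no known words
        n, dim = len(vecs), len(vecs[0])
        mean = [sum(v[i] for v in vecs) / n for i in range(dim)]
        # Maximum-likelihood variance, floored to avoid division by zero.
        var = [max(sum((v[i] - mean[i]) ** 2 for v in vecs) / n, var_floor)
               for i in range(dim)]
        models[label] = (mean, var)
    return models

def log_likelihood(doc, model, vectors):
    """Sum of per-word diagonal-Gaussian log densities (words treated as i.i.d.)."""
    mean, var = model
    total = 0.0
    for w in doc:
        if w not in vectors:
            continue
        for x, m, s2 in zip(vectors[w], mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total

def classify(doc, models, vectors):
    """Bayes decision with equal class priors (an assumption): highest likelihood wins."""
    return max(models, key=lambda c: log_likelihood(doc, models[c], vectors))

if __name__ == "__main__":
    # Toy 2-D "word vectors" (hypothetical, not trained W2V output).
    vectors = {"goal": [1.0, 0.1], "match": [0.9, 0.2], "team": [1.1, 0.0],
               "stock": [0.0, 1.0], "market": [0.1, 0.9], "price": [0.2, 1.1]}
    docs = [["goal", "match"], ["team", "goal"], ["stock", "market"], ["price", "stock"]]
    labels = ["sports", "sports", "finance", "finance"]
    models = fit_class_gaussians(docs, labels, vectors)
    print(average_pool(["match", "team"], vectors))
    print(classify(["match", "team"], models, vectors))
```

Average pooling discards how word vectors are spread around the document mean; the per-class distribution keeps a variance estimate for each dimension, which is one simple way to model documents in the semantic space rather than collapsing them to a point.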
Keywords
Bayes methods; document handling; pattern classification; word processing; LDA model; W2V technique; document classes; document classification; document modeling; document representations; low-dimensional continuous vectors; naive approach; semantic related words; semantic space; sohu text database; word vector distributions; word-to-vector technique; Computational modeling; Educational institutions; Semantics; Support vector machine classification; Training; Vectors;
fLanguage
English
Publisher
ieee
Conference_Titel
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location
Siem Reap
Type
conf
DOI
10.1109/APSIPA.2014.7041633
Filename
7041633