• DocumentCode
    118130
  • Title
    Document classification with distributions of word vectors
  • Author
    Chao Xing ; Dong Wang ; Xuewei Zhang ; Chao Liu
  • Author_Institution
    Center for Speech & Language Technol. (CSLT), Tsinghua Univ., Beijing, China
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The word-to-vector (W2V) technique represents words as low-dimensional continuous vectors in such a way that semantically related words are close to each other. This produces a semantic space in which a word or a word collection (e.g., a document) can be well represented, and thus lends itself to a multitude of applications, including document classification. Our previous study demonstrated that representations derived from word vectors are highly promising for document classification and can deliver better performance than the conventional LDA model. This paper extends that research and proposes to model the distributions of word vectors in documents or document classes. This goes beyond the naive approach of deriving document representations by average pooling and explores the possibility of modeling documents in the semantic space. Experiments on the Sohu text database confirmed that the new approach may produce better performance on document classification.
  • Keywords
    Bayes methods; document handling; pattern classification; word processing; LDA model; W2V technique; document classes; document classification; document modeling; document representations; low-dimensional continuous vectors; naive approach; semantic related words; semantic space; Sohu text database; word vector distributions; word-to-vector technique; Computational modeling; Educational institutions; Semantics; Support vector machine classification; Training; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041633
  • Filename
    7041633