Title of article :
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Author/Authors :
Khanijazani, I Computer Engineering and Information Technology Department - Amirkabir University of Technology - Tehran, Iran , Salami, D Computer Engineering and Information Technology Department - Amirkabir University of Technology - Tehran, Iran , Rahbar, A Computer Engineering and Information Technology Department - Amirkabir University of Technology - Tehran, Iran , Momtazi, S Computer Engineering and Information Technology Department - Amirkabir University of Technology - Tehran, Iran
Pages :
8
From page :
443
To page :
450
Abstract :
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as Term Frequency-Inverse Document Frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use semantic models for document vector representation. Latent Dirichlet Allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this work, we first studied the conceptual difference between the two models and showed that they behave differently and capture semantic features of texts from different perspectives. We then proposed a hybrid approach for document vector representation that benefits from the advantages of both models. The experimental results on the 20newsgroup dataset showed the superiority of the proposed model over each of the baselines on both text clustering and classification tasks. We achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification compared to the best baseline model.
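The abstract does not state how the LDA and doc2vec vectors are combined. As an illustration only, the sketch below assumes the simplest possible joint representation: each vector is L2-normalized and the two are concatenated. The function names (l2_normalize, joint_vector, cosine) and all input values are hypothetical and do not come from the paper; in practice the inputs would come from a trained topic model and a trained document embedding model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def joint_vector(topic_dist, doc_embedding):
    """Concatenate a topic distribution with a dense document embedding.

    Assumption: plain concatenation after normalizing each part, so that
    neither representation dominates purely because of its scale.
    """
    return l2_normalize(topic_dist) + l2_normalize(doc_embedding)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


# Hypothetical inputs: a 3-topic distribution and a 4-dimensional embedding
# for each of two documents (made-up numbers, not model output).
topics_a, embed_a = [0.7, 0.2, 0.1], [0.5, -1.0, 0.25, 2.0]
topics_b, embed_b = [0.1, 0.1, 0.8], [0.4, -0.9, 0.3, 1.8]

doc_a = joint_vector(topics_a, embed_a)
doc_b = joint_vector(topics_b, embed_b)

print(len(doc_a))            # 3 topic dimensions + 4 embedding dimensions = 7
print(cosine(doc_a, doc_b))  # similarity in the joint space
```

The resulting joint vectors could then be passed to any standard clustering or classification algorithm in place of TF-IDF features. Whether the authors normalize or weight the two parts this way is not stated in the abstract.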
Keywords :
Neural Document Embedding , Topic Modeling , Semantic Representation , Text Mining
Journal title :
Astroparticle Physics
Serial Year :
2019
Record number :
2453045