DocumentCode :
3661202
Title :
A word distributed representation based framework for large-scale short text classification
Author :
Di Yao; Jingping Bi; Jianhui Huang; Jin Zhu
Author_Institution :
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
7
Abstract :
With the development of the Internet, billions of short texts are generated each day. However, the accuracy of large-scale short text classification is poor because of data sparseness. Traditional methods use external datasets to enrich the document representation and alleviate the data sparsity problem, but an external dataset that matches the specific short texts is hard to find. In this paper, we propose a framework that solves the data sparsity problem without using an external dataset. Our framework handles large-scale short texts by making the most of the semantic similarity of words learned from the training short texts. First, we learn word distributed representations and measure word semantic similarity from the training short texts. Then, we propose a method that enriches the document representation using the word semantic similarity information. Finally, we build classifiers based on the enriched representation. We evaluate our framework on both a benchmark dataset (Stanford Sentiment Treebank) and a large-scale Chinese news title dataset that we collected ourselves. On the benchmark dataset, our framework improves classification accuracy by 3%. The results on the large-scale Chinese news title dataset show that our framework achieves better results as the training set size increases.
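The abstract does not spell out the exact enrichment rule, so the following is only a minimal sketch under stated assumptions. It assumes each word already has a distributed representation (here a hypothetical toy table `EMBEDDINGS`, not vectors learned from real data), measures word similarity with cosine similarity, and enriches a bag-of-words document by adding each word's nearest neighbors weighted by their similarity. The helper names `cosine`, `similar_words`, and `enrich`, and the parameters `top_k` and `threshold`, are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings for illustration only; the framework described
# above learns word vectors from the training short texts instead.
EMBEDDINGS = {
    "stock": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "shares": [0.85, 0.15, 0.05],
    "football": [0.0, 0.9, 0.3],
    "match": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_words(word, embeddings, top_k=2, threshold=0.9):
    """Return up to top_k (neighbor, similarity) pairs with similarity >= threshold."""
    if word not in embeddings:
        return []  # out-of-vocabulary words get no neighbors
    scored = [
        (other, cosine(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def enrich(tokens, embeddings, top_k=2, threshold=0.9):
    """Bag-of-words counts plus similarity-weighted neighbor words (assumed rule)."""
    features = defaultdict(float)
    for tok in tokens:
        features[tok] += 1.0  # the original word keeps a count of 1 per occurrence
        for neighbor, sim in similar_words(tok, embeddings, top_k, threshold):
            features[neighbor] += sim  # a related word contributes its similarity
    return dict(features)


if __name__ == "__main__":
    print(enrich(["stock", "the"], EMBEDDINGS))
```

With these toy vectors, the two-token text `["stock", "the"]` gains non-zero weights for `shares` and `market` even though neither word appears in it, which is the kind of sparsity reduction the abstract describes. The enriched dictionaries could then be vectorized and passed to any standard classifier; the choice of weighting, `top_k`, and `threshold` here is an illustrative assumption.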
Keywords :
"Training","Testing"
Publisher :
ieee
Conference_Titel :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN :
2161-4407
Type :
conf
DOI :
10.1109/IJCNN.2015.7280513
Filename :
7280513