Title :
Short Text Feature Selection for Micro-Blog Mining
Author :
Liu, Zitao ; Yu, Wenchao ; Chen, Wei ; Wang, Shuran ; Wu, Fengyi
Author_Institution :
Int. Sch. of Software, Wuhan Univ., Wuhan, China
Abstract :
Feature selection is the process of extracting, from an original feature set, the feature subset that best represents the original meaning. It greatly reduces text processing time and increases accuracy by removing some data outliers. With the rapid development of Web 2.0 and the continuing evolution of the Internet, short texts such as micro-blog posts play an important role in people's daily lives. However, existing feature selection methods cannot effectively extract features from such short texts, which greatly reduces short text classification and clustering performance. To address this, we propose a novel feature selection method based on part-of-speech and HowNet. According to the composition of the text, we use part-of-speech to choose the words that carry more information, and then expand the semantic features of these words using HowNet, so that the short text has more useful features. We use a test data set collected from Sina micro-blog and adopt the micro-average and macro-average of the F1-measure to evaluate short text classification. The results show that the short text features selected by our method are informative and yield good classification results.
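The abstract names the micro-average and macro-average of the F1-measure as its evaluation metrics. The following is a minimal sketch of how the two averages differ, using the standard definitions rather than the authors' own evaluation code; the function name `micro_macro_f1` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so every document weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy example (made-up labels): a classifier that always predicts "news".
y_true = ["news", "news", "news", "news", "sports"]
y_pred = ["news", "news", "news", "news", "news"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

On imbalanced data the two averages can diverge sharply: the micro-average is dominated by frequent classes, while the macro-average penalizes poor performance on rare classes. Reporting both gives a fuller picture of classification quality.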
Keywords :
Internet; Web sites; data mining; text analysis; HowNet; Web 2.0; feature selection; microblog mining; part-of-speech; short text classification; short text feature selection; Classification algorithms; Clustering algorithms; Concrete; Feature extraction; Semantics; Text categorization;
Conference_Title :
Computational Intelligence and Software Engineering (CiSE), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5391-7
Electronic_ISBN :
978-1-4244-5392-4
DOI :
10.1109/CISE.2010.5677015