Title :
Research on short text classification for web forum
Author :
Xiaochun He ; Conghui Zhu ; Tiejun Zhao
Author_Institution :
MOE-MS Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., Harbin, China
Abstract :
The unique characteristics of short text make short text classification quite different from traditional long text processing. The feature space of short text is so sparse that it is notoriously difficult to extract sufficient and effective features. In this paper, aiming to classify short texts on web forums accurately, a novel short-text-processing method based on semantic extension is introduced to enrich the content of the original short text, which effectively solves the feature sparsity problem. In addition, we put forward the concept of Key-Pattern (KP) and propose a new text feature representation approach based on KP, which extracts phrases carrying strong semantic information as text features. Traditional classifier models are applied to classify the texts, and experimental results show that the proposed method effectively improves the accuracy and recall of short text classification.
Keywords :
Internet; feature extraction; pattern classification; text analysis; Web forum; classifier model; feature sparse problem; key-pattern concept; long text processing; semantic extension; short text classification; short-text-processing method; text feature representation approach; Classification algorithms; Feature extraction; Noise measurement; Semantics; Text categorization; Key-Pattern; Semantic extension; Short text classification; Text representation; Web forum;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019652