Title :
Study on feature selection algorithm in topic tracking
Author :
Li, Shengdong ; Lv, Xueqiang ; Li, Yuqin ; Shi, Shuicai
Author_Institution :
Chinese Inf. Process. Res. Center, Inf. Sci. & Technol. Univ., Beijing, China
Abstract :
Text classification is the key technology for topic tracking, and the vector space model (VSM) is one of the simplest and most effective models for topic representation. Feature selection in the VSM is an important data pre-processing step: it reduces the dimension of the vector space and improves the generalization ability of the algorithm. Feature selection algorithms therefore warrant in-depth and extensive research. We study how the feature space dimension and the choice of feature selection algorithm affect topic tracking, derive how tracking performance varies with each, and summarize their optimal values for topic tracking. Finally, TDT evaluation methods show that the optimal topic tracking performance obtained with weight of evidence for text is 8.762% higher than that obtained with mutual information.
Keywords :
feature extraction; pattern classification; support vector machines; text analysis; tracking; vectors; SVM; TDT evaluation; data preprocessing; feature selection algorithm; support vector machine; text classification; topic detection; topic tracking; vector space model; Classification algorithms; Information processing; Information science; Information technology; Mutual information; Prototypes; Space technology; Support vector machine classification; Text categorization; feature selection
Conference_Title :
Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-7324-3
Electronic_ISBN :
978-89-88678-22-0