DocumentCode
2284326
Title
A New Method of Training Sample Selection in Text Classification
Author
Liao, Yixing ; Pan, Xuezeng
Author_Institution
Dept. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
Volume
1
fYear
2010
fDate
6-7 March 2010
Firstpage
211
Lastpage
214
Abstract
To address noise samples in the training dataset, this paper proposes a new method for reducing the size of the training dataset in text classification. The method describes the distribution of the training dataset according to each sample's representativeness score within the class it belongs to, thereby distinguishing representative samples from noise samples in each class. The method is applied to the Chinese text dataset provided by the Fudan Database Center. Experiments show that the proposed method reduces noise samples effectively, improves classification performance, and decreases computational cost.
Keywords
classification; natural language processing; text analysis; noise samples reduction; text classification; training dataset distribution; training sample selection; Computational efficiency; Computer science; Educational technology; Frequency; Iterative methods; Mutual information; Noise reduction; Paper technology; Probability; Text categorization; representativeness score; training dataset selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Education Technology and Computer Science (ETCS), 2010 Second International Workshop on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-6388-6
Electronic_ISBN
978-1-4244-6389-3
Type
conf
DOI
10.1109/ETCS.2010.621
Filename
5458972