Title :
Using Weight-Retouching and Under-Sampling SVM Approaches for Text Categorization on Imbalanced Data
Author_Institution :
Coll. of E-Bus., South China Univ. of Technol., Guangzhou, China
Abstract :
The growing volume of textual documents makes it increasingly difficult to manage text data and to retrieve useful information from document contents. Text categorization, an increasingly important and extensively studied field, is an important way to help resolve this problem. In this paper, we focus on the performance of the minority text class and attempt to improve its precision by using techniques for processing imbalanced data. Weight-retouching and under-sampling Support Vector Machine (SVM) approaches are considered. The results show that processing imbalanced text data with weight-retouching and under-sampling SVM improves the precision of the minority class without degrading global performance.
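The abstract names two ways of handling class imbalance but does not define them, so the following is only a rough sketch under stated assumptions, not the authors' method. `undersample` shows plain random under-sampling of the larger classes. `balanced_class_weights` uses the common inverse-frequency heuristic as a stand-in for per-class weighting; whether this matches the paper's "weight-retouching" is an assumption. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def undersample(docs, labels, seed=0):
    """Randomly drop documents from larger classes until every class
    has as many documents as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    target = min(len(group) for group in by_class.values())
    out_docs, out_labels = [], []
    for label, group in by_class.items():
        for doc in rng.sample(group, target):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency heuristic (an assumption, not the paper's formula):
    weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy imbalanced corpus: 8 majority documents, 2 minority documents.
docs = ["doc%d" % i for i in range(10)]
labels = ["major"] * 8 + ["minor"] * 2

bal_docs, bal_labels = undersample(docs, labels)
weights = balanced_class_weights(labels)
```

Either output could then be handed to an SVM trainer: the under-sampled set as balanced training data, or the weights as per-class misclassification costs, so that errors on the minority class are penalized more heavily.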
Keywords :
support vector machines; text analysis; document content; imbalanced text data processing; minority text class; support vector machine; text categorization; text data management; textual document; under-sampling SVM; weight-retouching; Appraisal; Computational efficiency; Content based retrieval; Content management; Data processing; Frequency conversion; Information retrieval; Support vector machine classification; Support vector machines; Text categorization;
Conference_Title :
E-Business and Information System Security, 2009. EBISS '09. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-2909-7
Electronic_ISBN :
978-1-4244-2910-3
DOI :
10.1109/EBISS.2009.5138143