Regional Information Center for Science and Technology - Bias analysis in text classification for highly skewed data

DocumentCode :

2866433

Title :

Bias analysis in text classification for highly skewed data

Author :

Tang, Lei ; Liu, Huan

Author_Institution :

Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA

Year :

2005

Date :

27-30 Nov. 2005

Abstract :

Feature selection is often applied to high-dimensional data as a preprocessing step in text classification. When dealing with highly skewed data, we observe that typical feature selection metrics like information gain or chi-squared are biased toward selecting features for the minor class, and the metric of bi-normal separation can select features for both minor and major classes. In this work, we investigate how these feature selection metrics affect the performance of frequently used classifiers such as decision trees, naive Bayes, and support vector machines via bias analysis for highly skewed data. The three types of bias are metric bias, class bias, and classifier bias. Extensive experiments are designed to understand how these biases can be employed in concert and efficiently to achieve good classification performance. We report our findings and present recommended approaches to text classification based on bias analysis and the empirical study.
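
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of their textbook two-class forms, computed from per-term document counts. It is not taken from the paper: the function names, the argument layout (tp and fp are the numbers of minor-class and major-class documents containing the term; pos and neg are the class sizes), and the clipping constant eps used for bi-normal separation are all illustrative assumptions.

```python
from math import log2
from statistics import NormalDist


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(tp, fp, pos, neg):
    """Class entropy minus class entropy conditioned on term presence."""
    fn, tn = pos - tp, neg - fp
    n = pos + neg
    h_cond = ((tp + fp) / n) * entropy([tp, fp]) \
        + ((fn + tn) / n) * entropy([fn, tn])
    return entropy([pos, neg]) - h_cond


def chi_squared(tp, fp, pos, neg):
    """Pearson chi-squared statistic of the 2x2 term/class table."""
    fn, tn = pos - tp, neg - fp
    n = pos + neg
    denom = (tp + fp) * (fn + tn) * pos * neg
    if denom == 0:
        return 0.0
    return n * (tp * tn - fp * fn) ** 2 / denom


def bi_normal_separation(tp, fp, pos, neg, eps=0.0005):
    """|F^-1(tpr) - F^-1(fpr)|, F being the standard normal CDF.

    Rates are clipped to [eps, 1 - eps] so the inverse CDF stays finite;
    the value of eps is an assumption, not a figure from the abstract.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))
```

Under these definitions, bi-normal separation gives the same score to a term whether it indicates the minor class or the major class, because swapping the two rates leaves the absolute difference unchanged. This is in line with the abstract's remark that the metric can select features for both classes.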

Keywords :

Bayes methods; decision trees; support vector machines; text analysis; bias analysis; binormal separation; chi squared method; class bias; classifier bias; decision trees; feature selection metrics; highly skewed data; information gain; metric bias; naive Bayes; support vector machines; text classification; Classification algorithms; Classification tree analysis; Computer science; Data engineering; Decision trees; Niobium compounds; Performance analysis; Support vector machine classification; Support vector machines; Text categorization;

Language :

English

Publisher :

IEEE

Conference_Title :

Data Mining, Fifth IEEE International Conference on

ISSN :

1550-4786

Print_ISBN :

0-7695-2278-5

Type :

conf

DOI :

10.1109/ICDM.2005.34

Filename :

1565781

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2866433