DocumentCode :
2530114
Title :
Majority filter-based minority prediction (MFMP): An approach for unbalanced datasets
Author :
Padmaja, T. Maruthi ; Krishna, P. Radha ; Bapi, Raju S.
Author_Institution :
IDRBT, Hyderabad
fYear :
2008
fDate :
19-21 Nov. 2008
Firstpage :
1
Lastpage :
6
Abstract :
In many data mining and machine learning applications, predicting minority class samples from skewed, unbalanced datasets is a crucial problem. To address it, we propose a majority filter-based minority prediction (MFMP) approach for unbalanced datasets. MFMP uses an unsupervised learning technique to select samples for supervised learning. The approach has two steps. In the first step, minority samples are clustered and majority class samples that lie outside the minority classification regions are identified; this improves the minority prediction rate. In the second step, majority samples are randomly selected within individual clusters, which enhances the majority prediction rate. We experimentally studied the behavior of the MFMP approach and compared it with the traditional random under-sampling approach on a synthetic dataset and three UCI repository datasets, using the following classifiers: decision tree, k-nearest neighbor, naive Bayes, and radial basis function network. Precision, recall, and F-measure are used to evaluate classifier performance. The experimental evidence suggests that the MFMP approach exhibits good prediction rates on both minority and majority classes for all classifiers. Furthermore, the proposed approach outperforms the traditional random under-sampling approach. MFMP applied to the decision tree gave better predictions than the other classifiers studied.
Keywords :
Bayes methods; data mining; learning (artificial intelligence); MFMP approach; decision tree; k-nearest neighbor; machine learning; majority filter-based minority prediction; majority samples; minority samples; naive Bayes method; unbalanced datasets; unsupervised learning technique; Classification algorithms; Classification tree analysis; Clustering algorithms; Costs; Decision trees; Insurance; Intrusion detection; Supervised learning; Classification; Clustering; Random Under-sampling; Selective Sampling; Unbalanced dataset
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2008 - 2008 IEEE Region 10 Conference
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4244-2408-5
Electronic_ISBN :
978-1-4244-2409-2
Type :
conf
DOI :
10.1109/TENCON.2008.4766705
Filename :
4766705