Regional Information Center for Science and Technology - Majority filter-based minority prediction (MFMP): An approach for unbalanced datasets

DocumentCode :

2530114

Title :

Majority filter-based minority prediction (MFMP): An approach for unbalanced datasets

Author :

Padmaja, T. Maruthi ; Krishna, P. Radha ; Bapi, Raju S.

Author_Institution :

IDRBT, Hyderabad

fYear :

2008

fDate :

19-21 Nov. 2008

Firstpage :

Lastpage :

Abstract :

For many data mining and machine learning applications, predicting minority class samples from skewed, unbalanced datasets is a crucial problem. To address this problem, we propose a majority filter-based minority prediction (MFMP) approach for unbalanced datasets. MFMP adopts an unsupervised learning technique to select samples for supervised learning. The approach consists of two steps. In the first step, minority samples are clustered and majority class samples that lie outside the minority classification regions are identified. This improves the minority prediction rate. In the second step, majority samples are randomly selected within individual clusters, which enhances the majority prediction rate. We studied the behavior of the MFMP approach experimentally and compared it with the traditional random under-sampling approach on a synthetic dataset and three UCI repository datasets using the following classifiers: decision tree, k-nearest neighbor, Naive Bayes, and radial basis function network. Precision, recall, and F-measure are used to evaluate classifier performance. The experimental evidence suggests that the MFMP approach exhibits good prediction rates for both minority and majority classes on all classifiers. Furthermore, the proposed approach outperforms the traditional random under-sampling approach. MFMP applied to the decision tree gave better predictions than the other classifiers studied.
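The two steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the clustering algorithm, how a "minority classification region" is defined, or how many majority samples are drawn, so everything below is an assumption made for illustration: plain k-means on the minority class, a region modelled as a ball around each centroid that reaches its farthest minority member, and a fixed `per_cluster` sample size. The names `kmeans`, `nearest`, and `mfmp_like_undersample` are hypothetical and do not come from the paper.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def mfmp_like_undersample(minority, majority, k=2, per_cluster=5, seed=0):
    """Select a subset of majority samples in two steps (assumed reading)."""
    rng = random.Random(seed)
    centroids = kmeans(minority, k, rng)

    # Assumption: a minority region is the ball around a centroid whose
    # radius is the distance to that cluster's farthest minority member.
    radii = [0.0] * k
    for p in minority:
        i = nearest(p, centroids)
        radii[i] = max(radii[i], math.dist(p, centroids[i]))

    # Step 1 (assumed): keep majority samples lying outside every minority
    # region, grouped by their nearest minority centroid.
    groups = [[] for _ in range(k)]
    for p in majority:
        if all(math.dist(p, c) > r for c, r in zip(centroids, radii)):
            groups[nearest(p, centroids)].append(p)

    # Step 2 (assumed): random selection within each group.
    selected = []
    for g in groups:
        selected.extend(rng.sample(g, min(per_cluster, len(g))))
    return selected


if __name__ == "__main__":
    minority = [(0, 0), (0, 1), (1, 0), (1, 1),
                (10, 10), (10, 11), (11, 10), (11, 11)]
    majority = [(0.5, 0.5), (10.5, 10.5)] + [(float(x), 5.0) for x in range(-10, 20)]
    print(mfmp_like_undersample(minority, majority))
```

The selected majority samples would then be combined with all minority samples to train a classifier such as a decision tree. A different region definition or sampling ratio may match the paper more closely.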

Keywords :

Bayes methods; data mining; learning (artificial intelligence); MFMP approach; data mining; decision tree; k-nearest neighbor; machine learning; majority filter-based minority prediction; majority samples; minority samples; naive Bayes method; unbalanced datasets; unsupervised learning technique; Classification algorithms; Classification tree analysis; Clustering algorithms; Costs; Data mining; Decision trees; Insurance; Intrusion detection; Machine learning; Supervised learning; Classification; Clustering; Random Under-sampling; Selective Sampling; Unbalanced dataset;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

TENCON 2008 - 2008 IEEE Region 10 Conference

Conference_Location :

Hyderabad

Print_ISBN :

978-1-4244-2408-5

Electronic_ISBN :

978-1-4244-2409-2

Type :

conf

DOI :

10.1109/TENCON.2008.4766705

Filename :

4766705

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2530114