Regional Information Center for Science and Technology (RICEST) - Comparison of Classifiers Efficiency on Missing Values Recovering: Application in a Marketing Database with Massive Missing Data

DocumentCode :

2724191

Title :

Comparison of Classifiers Efficiency on Missing Values Recovering: Application in a Marketing Database with Massive Missing Data

Author :

Nogueira, Bruno M. ; Santos, Tadeu R A ; Zárate, Luis E.

Author_Institution :

LICAP, Pontifical Catholic Univ. of Minas Gerais, Belo Horizonte

fYear :

2007

fDate :

March 1 2007-April 5 2007

Firstpage :

Lastpage :

Abstract :

Missing data in databases is considered one of the biggest problems faced in data mining applications. This problem can be aggravated when data is massively missing and the database is imbalanced. Several techniques, such as sample deletion, value imputation, value prediction through classifiers, and pattern approximation, have been proposed and compared, but these comparisons do not consider the adverse conditions found in real databases. This work presents a comparison of techniques used to classify records from a real imbalanced database with massive missing data, where the main objective is to pre-process the database so that completely filled records can be recovered and selected for the later application of other techniques. Algorithms such as clustering, decision trees, artificial neural networks, and a Bayesian classifier were compared, and their efficiency was expressed through ROC curves. The results show that characterizing the problem and understanding the database are essential steps for a correct comparison of techniques on a real problem. Artificial neural networks were observed to be an interesting alternative for this kind of problem, since they obtained satisfactory results even when dealing with real-world problems.
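
The abstract states that classifier efficiency was expressed through ROC curves but gives no implementation details. As a minimal sketch of that kind of comparison, the Python below builds ROC points for two classifiers and compares the area under each curve. The labels, the scores, and the names `roc_points`, `auc`, `scores_a`, and `scores_b` are all hypothetical and chosen for illustration only; they are not the paper's data, classifiers, or method.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier confidence that each record is positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the decision threshold from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Records with equal scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ground truth and scores from two illustrative classifiers.
labels = [1, 1, 0, 1, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
scores_b = [0.6, 0.4, 0.5, 0.3, 0.7, 0.2]

auc_a = auc(roc_points(labels, scores_a))
auc_b = auc(roc_points(labels, scores_b))
print(f"classifier A AUC = {auc_a:.3f}, classifier B AUC = {auc_b:.3f}")
```

A larger area indicates a classifier that ranks positive records above negative ones more consistently. On an imbalanced database this is often more informative than raw accuracy, which can look high even when the minority class is never predicted.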

Keywords :

data mining; database management systems; marketing data processing; pattern classification; data mining application; database preprocessing; imbalanced databases; marketing database; massive missing data; missing value recovery; record classification; record recovery; record selection; Artificial neural networks; Backpropagation algorithms; Bayesian methods; Classification tree analysis; Clustering algorithms; Computational intelligence; Data mining; Databases; Decision trees; Delta modulation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on

Conference_Location :

Honolulu, HI

Print_ISBN :

1-4244-0705-2

Type :

conf

DOI :

10.1109/CIDM.2007.368854

Filename :

4221278

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2724191