DocumentCode :
2251105
Title :
On the Use of Data Mining Tools for Data Preparation in Classification Problems
Author :
Goncalves, Paulo M. ; Barros, Roberto S. M. ; Vieira, Davi C. L.
Author_Institution :
Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
fYear :
2012
fDate :
May 30, 2012 to June 1, 2012
Firstpage :
173
Lastpage :
178
Abstract :
Data preparation is a critical phase of the KDD (Knowledge Discovery in Databases) process: if data is not correctly prepared, all subsequent phases of the process are compromised. DMPML is a framework that stores preprocessed data for different data mining algorithms in an XML document and uses an XSLT document to retrieve the encoding that a given data mining algorithm requires. This paper compares DMPML with three data mining applications that implement the directed graph approach (Weka, RapidMiner, and KNIME), measuring the time spent creating and executing the data preparation tasks for two data mining algorithms. The tests were executed on different types of data sets: numerical, categorical, and mixed. We observed that the scheme used by DMPML can simplify the usage of different data mining algorithms and significantly reduce the time spent creating the data preparation tasks.
Keywords :
XML; data mining; data preparation; directed graphs; pattern classification; DMPML; KDD process; XML document; XSLT document; classification problems; data mining algorithms; data mining tools; data preparation tasks; directed graph approach; knowledge discovery in databases; Communities; Computers; Educational institutions; Testing; Time measurement; Tools comparison
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-1536-4
Type :
conf
DOI :
10.1109/ICIS.2012.79
Filename :
6211093