DocumentCode
3136506
Title
Data pre-processing support for data mining
Author
Mikšovský, Petr ; Matousek, K. ; Kouba, Zdenek
Author_Institution
Fac. of Electr. Eng., Czech Tech. Univ., Prague, Czech Republic
Volume
5
fYear
2002
fDate
6-9 Oct. 2002
Abstract
It is well known that the success of every data mining algorithm depends strongly on the quality of data processing. In this context it is natural that data pre-processing can be a very complicated task. Sometimes, data pre-processing takes more than half of the total time spent solving the data mining problem. The paper describes a tool called SumatraTT, the goal of which is to make data pre-processing easier and faster. Basically, SumatraTT (Transformation Tool) is a metadata-driven, platform-independent, extensible, and universal data processing tool. These features have been achieved by building the tool as an interpreter of a transformation-oriented scripting language called SumatraScript. SumatraScript is a fully interpreted Java-like language combining data access, metadata access, and common programming constructs. Furthermore, it supports RAD (Rapid Application Development) technology by providing a library of reusable transformation templates. The second part of the paper presents a practical application of SumatraTT: a task aimed at predicting water consumption in a regional distribution network.
Keywords
authoring languages; data handling; data mining; meta data; software libraries; software reusability; very large databases; Java-like language; Rapid Application Development; SumatraScript; SumatraTT; Transformation Tool; data access; data mining; data pre-processing support; interpreter; metadata; regional distribution network; reusable transformation templates; transformation-oriented scripting language; universal data processing tool; water consumption prediction; Buildings; Data mining; Data preprocessing; Data processing; Databases; Decision making; Filtering; Java; Laboratories; Libraries;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2002 IEEE International Conference on
ISSN
1062-922X
Print_ISBN
0-7803-7437-1
Type
conf
DOI
10.1109/ICSMC.2002.1176327
Filename
1176327