• DocumentCode
    3136506
  • Title

    Data pre-processing support for data mining

  • Author

    Mikšovský, Petr ; Matousek, K. ; Kouba, Zdenek

  • Author_Institution
    Fac. of Electr. Eng., Czech Tech. Univ., Prague, Czech Republic
  • Volume
    5
  • fYear
    2002
  • fDate
    6-9 Oct. 2002
  • Abstract
    It is well known that the success of every data mining algorithm is strongly dependent on the quality of data processing. In this context it is natural that data pre-processing can be a very complicated task. Sometimes, data pre-processing takes more than half of the total time spent solving the data mining problem. The paper describes a tool called SumatraTT, the goal of which is to make data pre-processing easier and faster. Basically, SumatraTT (Transformation Tool) is a metadata-driven, platform-independent, extensible, and universal data processing tool. These features have been achieved by building the tool as an interpreter of a transformation-oriented scripting language called SumatraScript. SumatraScript is a fully interpreted Java-like language combining data access, metadata access, and common programming constructs. Furthermore, it supports RAD (Rapid Application Development) technology by providing a library of reusable transformation templates. The second part of the paper contains a practical application of SumatraTT: a task aimed at predicting water consumption in a regional distribution network.
  • Keywords
    authoring languages; data handling; data mining; meta data; software libraries; software reusability; very large databases; Java-like language; Rapid Application Development; SumatraScript; SumatraTT; Transformation Tool; data access; data mining; data pre-processing support; interpreter; metadata; regional distribution network; reusable transformation templates; transformation-oriented scripting language; universal data processing tool; water consumption prediction; Buildings; Data mining; Data preprocessing; Data processing; Databases; Decision making; Filtering; Java; Laboratories; Libraries;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2002 IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7437-1
  • Type

    conf

  • DOI
    10.1109/ICSMC.2002.1176327
  • Filename
    1176327