• DocumentCode
    2251997
  • Title
    SQL-based heuristics for selected KDD tasks over large data sets
  • Author
    Kowalski, Marcin ; Stawicki, Sebastian

  • Author_Institution
    Inst. of Math., Univ. of Warsaw, Warsaw, Poland
  • fYear
    2012
  • fDate
    9-12 Sept. 2012
  • Firstpage
    303
  • Lastpage
    310
  • Abstract
    We investigate how scripts of automatically generated, fast-performing analytic SQL statements can speed up the KDD-related tasks of attribute selection and decision tree induction. We base our framework on the entity-attribute-value (EAV) data model so that the required queries scale seamlessly with the number of attributes involved in a given task's specification. We note that both tasks can be handled heuristically using the same class of aggregation queries, in which the most promising attributes and splits are found by analyzing the diversity of aggregated results grouped by decision. We also outline our plans for creating a large-scale framework for evaluating the proposed heuristics against real-world data.
  • Keywords
    SQL; decision trees; formal specification; query processing; KDD task selection; SQL-based heuristics; attribute selection; automatically generated fast-performing analytic SQL statements; decision tree induction; entity-attribute-value data model; large data sets; large-scale framework; real-world data; task specification; Accuracy; Data mining; Databases; Engines; Feature extraction; Standards; EAV
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on
  • Conference_Location
    Wroclaw
  • Print_ISBN
    978-1-4673-0708-6
  • Electronic_ISBN
    978-83-60810-51-4
  • Type
    conf
  • Filename
    6354469
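
The abstract describes ranking attributes with a single class of aggregation queries over an entity-attribute-value (EAV) table, grouped by decision. The following is a minimal sketch of that idea, not the authors' implementation. The table layout `eav(obj, attr, val)`, the convention of storing the decision as an attribute named `decision`, the function name `rank_attributes`, and the use of weighted Gini impurity as the "diversity" measure are all assumptions made for illustration; the record does not specify any of them.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data in EAV form: (object id, attribute name, value).
# Attribute "a" determines the decision; attribute "b" carries no information.
SAMPLE_ROWS = [
    (1, "a", "x"), (1, "b", "p"), (1, "decision", "yes"),
    (2, "a", "x"), (2, "b", "q"), (2, "decision", "yes"),
    (3, "a", "y"), (3, "b", "p"), (3, "decision", "no"),
    (4, "a", "y"), (4, "b", "q"), (4, "decision", "no"),
]


def rank_attributes(rows, decision_attr="decision"):
    """Rank attributes by weighted Gini impurity (lower is more promising).

    Assumption: the decision is stored in the same EAV table under the
    attribute name given by decision_attr.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eav (obj INTEGER, attr TEXT, val TEXT)")
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

    # One aggregation query covers every attribute at once: the EAV layout
    # means the query text does not change as attributes are added.
    query = """
        SELECT e.attr, e.val, d.val, COUNT(*)
        FROM eav AS e
        JOIN eav AS d ON d.obj = e.obj AND d.attr = ?
        WHERE e.attr <> ?
        GROUP BY e.attr, e.val, d.val
    """
    # counts[attribute][value][decision] = number of objects
    counts = defaultdict(lambda: defaultdict(dict))
    for attr, val, dec, cnt in conn.execute(query, (decision_attr, decision_attr)):
        counts[attr][val][dec] = cnt
    conn.close()

    scores = {}
    for attr, by_val in counts.items():
        total = sum(sum(c.values()) for c in by_val.values())
        impurity = 0.0
        for dec_counts in by_val.values():
            n = sum(dec_counts.values())
            gini = 1.0 - sum((c / n) ** 2 for c in dec_counts.values())
            impurity += (n / total) * gini  # weight by group size
        scores[attr] = impurity

    # Most promising (least impure) attributes first.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_attributes(SAMPLE_ROWS)` returns a list of `(attribute, impurity)` pairs with the attribute that best separates the decision classes first. The same grouped counts could feed other diversity measures, such as entropy, without changing the SQL.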