Title :
SQL-based heuristics for selected KDD tasks over large data sets
Author :
Kowalski, Marcin ; Stawicki, Sebastian
Author_Institution :
Inst. of Math., Univ. of Warsaw, Warsaw, Poland
Abstract :
We investigate how scripts of automatically generated, fast-performing analytic SQL statements can speed up the KDD-related tasks of attribute selection and decision tree induction. We base our framework on the entity-attribute-value data model so that the required queries scale seamlessly with the number of attributes involved in a given task's specification. We note that both tasks can be handled heuristically using the same class of aggregation queries, in which the most promising attributes and splits are found by analyzing the diversity of aggregated results grouped by decision. We also outline our plans for creating a large-scale framework for evaluating the proposed heuristics against real-world data.
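The kind of aggregation query the abstract describes can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the table names eav and decisions, their columns, the toy data, and the use of weighted Gini impurity as the diversity measure. The point is only that one GROUP BY query over an entity-attribute-value table covers all attributes at once, however many there are.

```python
import sqlite3
from collections import defaultdict


def attribute_scores(rows, decisions):
    """Score attributes by weighted Gini impurity of the decision (lower is better).

    rows: (obj_id, attr, val) triples in entity-attribute-value form.
    decisions: (obj_id, decision) pairs. Hypothetical schema, for illustration only.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE eav (obj_id INTEGER, attr TEXT, val TEXT)")
    con.execute("CREATE TABLE decisions (obj_id INTEGER PRIMARY KEY, decision TEXT)")
    con.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    con.executemany("INSERT INTO decisions VALUES (?, ?)", decisions)

    # One aggregation query for every attribute: count objects per
    # (attribute, value, decision) combination.
    counts = con.execute(
        """
        SELECT e.attr, e.val, d.decision, COUNT(*) AS n
        FROM eav AS e
        JOIN decisions AS d ON d.obj_id = e.obj_id
        GROUP BY e.attr, e.val, d.decision
        """
    ).fetchall()
    con.close()

    # Collect the decision distribution inside each (attribute, value) group.
    per_value = defaultdict(dict)
    for attr, val, decision, n in counts:
        per_value[(attr, val)][decision] = n

    # Average the Gini impurity of the groups, weighted by group size.
    weighted = defaultdict(float)
    totals = defaultdict(int)
    for (attr, _val), dec_counts in per_value.items():
        size = sum(dec_counts.values())
        gini = 1.0 - sum((c / size) ** 2 for c in dec_counts.values())
        weighted[attr] += size * gini
        totals[attr] += size
    return {attr: weighted[attr] / totals[attr] for attr in weighted}


# Toy data (made up): "outlook" separates the decisions perfectly, "windy" does not.
rows = [
    (1, "outlook", "sunny"), (2, "outlook", "sunny"),
    (3, "outlook", "rain"), (4, "outlook", "rain"),
    (1, "windy", "yes"), (2, "windy", "no"),
    (3, "windy", "yes"), (4, "windy", "no"),
]
decisions = [(1, "no"), (2, "no"), (3, "yes"), (4, "yes")]
scores = attribute_scores(rows, decisions)
best_attribute = min(scores, key=scores.get)
```

Under these assumptions, the attribute with the lowest score is the most promising candidate for selection or for the next split; a decision tree inducer could rerun the same query on the subset of objects that reaches each node.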
Keywords :
SQL; decision trees; formal specification; query processing; KDD task selection; SQL-based heuristics; attribute selection; automatically generated fast-performing analytic SQL statements; decision tree induction; entity-attribute-value data model; large data sets; large-scale framework; real-world data; task specification; Accuracy; Data mining; Databases; Engines; Feature extraction; Standards; EAV
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on
Conference_Location :
Wroclaw
Print_ISBN :
978-1-4673-0708-6
Electronic_ISBN :
978-83-60810-51-4