DocumentCode
2251997
Title
SQL-based heuristics for selected KDD tasks over large data sets
Author
Kowalski, Marcin ; Stawicki, Sebastian
Author_Institution
Inst. of Math., Univ. of Warsaw, Warsaw, Poland
fYear
2012
fDate
9-12 Sept. 2012
Firstpage
303
Lastpage
310
Abstract
We investigate how to use scripts with automatically generated, fast-performing analytic SQL statements to speed up the KDD-related tasks of attribute selection and decision tree induction. We base our framework on the entity-attribute-value data model in order to seamlessly scale the required queries with respect to the number of attributes involved in the given task's specification. We note that the considered tasks can be heuristically handled using the same class of aggregation queries, where the most promising attributes and splits are searched by analyzing diversity of aggregated results grouped by decision. We also outline our plans with respect to creation of a large-scale framework for evaluating the proposed heuristics against real-world data.
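A minimal sketch of the idea in the abstract, assuming a toy two-table EAV schema (`eav` holding object/attribute/value triples and `decisions` holding each object's decision) and using weighted Gini impurity as the "diversity of aggregated results grouped by decision". The table names, the toy data, and the Gini scoring are hypothetical illustrative choices, not the authors' generated queries or heuristic. The point it shows is that a single `GROUP BY attr, val, decision` query serves every attribute at once, because adding attributes adds rows to `eav` rather than columns.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data: object id -> (attribute values, decision).
OBJECTS = {
    1: ({"outlook": "sunny", "windy": "no"}, "play"),
    2: ({"outlook": "sunny", "windy": "yes"}, "stay"),
    3: ({"outlook": "rain", "windy": "yes"}, "stay"),
    4: ({"outlook": "rain", "windy": "no"}, "play"),
}


def load_eav(conn, objects):
    """Store objects as EAV triples plus a separate decision table (assumed schema)."""
    conn.execute("CREATE TABLE eav (obj INTEGER, attr TEXT, val TEXT)")
    conn.execute("CREATE TABLE decisions (obj INTEGER PRIMARY KEY, label TEXT)")
    for obj, (attrs, label) in objects.items():
        conn.execute("INSERT INTO decisions VALUES (?, ?)", (obj, label))
        conn.executemany(
            "INSERT INTO eav VALUES (?, ?, ?)",
            [(obj, a, v) for a, v in attrs.items()],
        )


# One aggregation query covers all attributes: new attributes become new
# rows in eav, so this SQL text does not change as the attribute set grows.
COUNTS_SQL = """
SELECT e.attr, e.val, d.label, COUNT(*) AS n
FROM eav AS e JOIN decisions AS d ON e.obj = d.obj
GROUP BY e.attr, e.val, d.label
"""


def rank_attributes(conn):
    """Rank attributes by weighted Gini impurity of decisions within value groups."""
    groups = defaultdict(lambda: defaultdict(dict))  # attr -> val -> label -> count
    for attr, val, label, n in conn.execute(COUNTS_SQL):
        groups[attr][val][label] = n
    scores = {}
    for attr, by_val in groups.items():
        total = sum(sum(c.values()) for c in by_val.values())
        impurity = 0.0
        for counts in by_val.values():
            size = sum(counts.values())
            gini = 1.0 - sum((c / size) ** 2 for c in counts.values())
            impurity += size / total * gini
        scores[attr] = impurity
    # Lower impurity means decisions are less diverse inside each value group,
    # i.e. the attribute separates the decision classes better.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


conn = sqlite3.connect(":memory:")
load_eav(conn, OBJECTS)
print(rank_attributes(conn))  # [('windy', 0.0), ('outlook', 0.5)]
```

For decision tree induction, the same query could be reused at each node by restricting `e.obj` to the objects reaching that node (for example with an added `WHERE` clause), so candidate splits are scored from the same kind of aggregate; this is again an illustrative reading of the abstract, not the paper's exact procedure.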
Keywords
SQL; decision trees; formal specification; query processing; KDD task selection; SQL-based heuristics; attribute selection; automatically generated fast-performing analytic SQL statements; decision tree induction; entity-attribute-value data model; large data sets; large-scale framework; real-world data; task specification; Accuracy; Data mining; Databases; Decision trees; Engines; Feature extraction; Standards; EAV; SQL; attribute selection; decision trees
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on
Conference_Location
Wroclaw
Print_ISBN
978-1-4673-0708-6
Electronic_ISBN
978-83-60810-51-4
Type
conf
Filename
6354469