Title :
Index support for frequent itemset mining in a relational DBMS
Author :
Baralis, Elena ; Cerquitelli, Tania ; Chiusano, Silvia
Author_Institution :
Dipt. di Autom. e Inf., Politecnico di Torino, Italy
Abstract :
Many efforts have been devoted to coupling data mining activities with relational DBMSs, but true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique that represents transactions in a succinct form, suitable for tightly integrating frequent itemset mining into a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so the index can be reused for mining itemsets with any support threshold. Furthermore, the stored information is structured to allow selective access to only the index blocks needed for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments have been run on various datasets characterized by different data distributions. The execution time of frequent itemset extraction exploiting the index is always comparable with, and sometimes faster than, that of a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
Keywords :
SQL; data mining; database indexing; relational databases; tree data structures; PostgreSQL open source DBMS; data representation; frequent itemset mining index support; indexing technique; relational DBMS; Algorithm design and analysis; Buffer storage; Data analysis; Data structures; Indexing; Itemsets; Kernel; Knowledge management
Conference_Title :
Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
Print_ISBN :
0-7695-2285-8
DOI :
10.1109/ICDE.2005.80