DocumentCode
2649626
Title
Itemset Mining in Noisy Contexts: A Hybrid Approach
Author
Mouhoubi, Karima ; Létocart, Lucas ; Rouveirol, Céline
Author_Institution
LIPN, Univ. Paris 13, Villetaneuse, France
fYear
2011
fDate
7-9 Nov. 2011
Firstpage
33
Lastpage
40
Abstract
A common task in data mining consists of finding all rectangles of 1s in a Boolean matrix in which the order of the rows and columns is not important. However, most algorithms developed for this task cannot be adapted to real data, which may contain noise. Noise shatters relevant itemsets into many small, irrelevant itemsets, yielding an explosion in the number of resulting itemsets. Recently proposed algorithms that address this problem suffer from various limitations, such as a large number of results, a still very high execution time, and an inability to discover overlapping patterns. In this work, we propose a new heuristic approach based on a graph algorithm for the efficient extraction of itemset patterns in noisy binary contexts. The method uses maximum-flow/minimum-cut algorithms to find dense subgraphs of 1s in the graph associated with the Boolean data matrix. To evaluate our approach, we performed various experiments on both synthetic data and real datasets from bioinformatics applications. We compared our results with those of several other methods on various synthetic datasets and on a gene-expression dataset, and show that i) our method is efficient and ii) the patterns extracted by our algorithm are of better quality than those of the other methods.
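The abstract does not detail the authors' heuristic. As a hedged illustration of the general idea only (not the paper's algorithm), the sketch below applies Goldberg's classic max-flow/min-cut reduction for the densest-subgraph problem to the bipartite row/column graph of a small Boolean matrix. The function names `densest_rectangle` and `_min_cut_source_side`, the density measure (number of 1s in the selected rows x columns divided by rows + columns), and the toy matrix are all assumptions made for this example.

```python
from collections import deque
from fractions import Fraction


def _min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the nodes reachable from s in the final residual graph."""
    n = len(cap)
    flow = [[Fraction(0)] * n for _ in range(n)]
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: the visited nodes form the source side of a min cut.
            return {v for v in range(n) if parent[v] != -1}
        # Bottleneck residual capacity along the augmenting path.
        bottleneck, v = None, t
        while v != s:
            u = parent[v]
            r = cap[u][v] - flow[u][v]
            bottleneck = r if bottleneck is None else min(bottleneck, r)
            v = u
        # Push flow (skew-symmetric bookkeeping gives reverse residual edges).
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u


def densest_rectangle(matrix):
    """Hypothetical helper: Goldberg-style densest subgraph of the row/column graph of a 0/1 matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    n = n_rows + n_cols  # vertices: rows 0..R-1, columns R..R+C-1
    edges = [(i, n_rows + j) for i in range(n_rows) for j in range(n_cols) if matrix[i][j]]
    m = len(edges)
    if m == 0:
        return [], [], Fraction(0)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    s, t = n, n + 1
    low, high, best = Fraction(0), Fraction(m), set()
    # Distinct densities differ by at least 1/(n(n-1)), so this stopping rule is exact.
    tolerance = Fraction(1, n * (n - 1))
    while high - low >= tolerance:
        g = (low + high) / 2
        cap = [[Fraction(0)] * (n + 2) for _ in range(n + 2)]
        for v in range(n):
            cap[s][v] = Fraction(m)
            cap[v][t] = m + 2 * g - deg[v]
        for u, v in edges:
            cap[u][v] = cap[v][u] = Fraction(1)
        side = _min_cut_source_side(cap, s, t) - {s}
        if side:
            low, best = g, side  # some subgraph has density > g
        else:
            high = g  # no subgraph is denser than g
    rows = sorted(v for v in best if v < n_rows)
    cols = sorted(v - n_rows for v in best if v >= n_rows)
    ones = sum(matrix[i][j] for i in rows for j in cols)
    return rows, cols, Fraction(ones, len(rows) + len(cols))


# Toy matrix: a 4x4 block of 1s with one missing entry (noise) plus two stray 1s.
noisy = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
rows, cols, density = densest_rectangle(noisy)
print(rows, cols, density)  # [0, 1, 2, 3] [0, 1, 2, 3] 15/8
```

Despite the missing 1 at row 1, column 2, the noisy 4x4 block is still recovered as a single pattern rather than shattered into smaller rectangles. Exact `Fraction` arithmetic avoids floating-point tolerance issues in the binary search, and the dense-matrix Edmonds-Karp is only meant for toy inputs. This sketch returns one densest subgraph; it makes no attempt at the multiple, overlapping patterns the abstract targets.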
Keywords
Boolean algebra; bioinformatics; data mining; graph theory; matrix algebra; Boolean data matrix; data mining; gene-expression data; graph algorithm; item set pattern extraction; itemset mining; maximal flow cut algorithm; minimal cut algorithm; noisy binary contexts; synthetic datasets; Bioinformatics; Bipartite graph; Context; Data mining; Itemsets; Noise; Noise measurement; Data Mining; dense subgraphs; maximal flow/minimal cut; noisy datasets;
fLanguage
English
Publisher
IEEE
Conference_Title
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location
Boca Raton, FL
ISSN
1082-3409
Print_ISBN
978-1-4577-2068-0
Electronic_ISBN
1082-3409
Type
conf
DOI
10.1109/ICTAI.2011.14
Filename
6103303