• DocumentCode
    2649626
  • Title
    Itemset Mining in Noisy Contexts: A Hybrid Approach
  • Author
    Mouhoubi, Karima ; Létocart, Lucas ; Rouveirol, Céline
  • Author_Institution
    LIPN, Univ. Paris 13, Villetaneuse, France
  • fYear
    2011
  • fDate
    7-9 Nov. 2011
  • Firstpage
    33
  • Lastpage
    40
  • Abstract
    A general task in data mining consists of finding all rectangles of 1s in a Boolean matrix in which the order of the rows and columns is not important. However, most algorithms developed for this task cannot be adapted to real data, which may contain noise. Noise shatters relevant itemsets into a set of small, irrelevant itemsets, causing an explosion in the number of resulting itemsets. Recent algorithms proposed to address this problem suffer from various limitations, such as a large number of results, an execution time that remains very high, and an inability to discover overlapping patterns. In this work, we propose a new heuristic approach based on a graph algorithm for the efficient extraction of itemset patterns in noisy binary contexts. The method uses maximal flow/minimal cut algorithms to find dense subgraphs of 1s in the graph associated with the Boolean data matrix. To evaluate our approach, we performed experiments on both synthetic data and real datasets from bioinformatics applications. We compared our results on various synthetic datasets and a gene-expression dataset against various methods and demonstrate that (i) our method is quite efficient and (ii) the patterns extracted by our algorithm are of better quality than those of the other methods.
  • Keywords
    Boolean algebra; bioinformatics; data mining; graph theory; matrix algebra; Boolean data matrix; gene-expression data; graph algorithm; item set pattern extraction; itemset mining; maximal flow cut algorithm; minimal cut algorithm; noisy binary contexts; synthetic datasets; Bipartite graph; Context; Itemsets; Noise; Noise measurement; dense subgraphs; maximal flow/minimal cut; noisy datasets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
  • Conference_Location
    Boca Raton, FL
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4577-2068-0
  • Electronic_ISBN
    1082-3409
  • Type
    conf
  • DOI
    10.1109/ICTAI.2011.14
  • Filename
    6103303