• DocumentCode
    659484
  • Title

    Efficiently extracting frequent subgraphs using MapReduce

  • Author

    Wei Lu ; Gang Chen ; Tung, A.K.H. ; Feng Zhao

  • Author_Institution
    Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    639
  • Lastpage
    647
  • Abstract
    Frequent subgraph extraction from a large number of small graphs is a primitive operation for many data mining applications. To extract frequent subgraphs, existing techniques need to enumerate a large number of subgraphs which is superlinear with the cardinality of the dataset. Given the rapid growing volume of graph data, it is difficult to perform the frequent subgraph extraction on a centralized machine efficiently. In this paper, we investigate how to efficiently perform this extraction over very large datasets using MapReduce. Parallelizing existing techniques directly using MapReduce does not yield good performance as it is difficult to balance the workload among the compute nodes. We therefore propose a framework that adopts the breadth first search strategy to iteratively extract frequent subgraphs, i.e., all frequent size-(i+1) subgraphs are generated based on frequent size-i subgraphs at the ith iteration using a single MapReduce job. To efficiently extract frequent subgraphs, we propose an isomorphism-testing-free approach by properly maintaining how frequent subgraphs are mapped within each graph. Extensive experiments conducted on our in-house clusters demonstrate the superiority of our proposed solution in comparison with the baseline approach.
  • Keywords
    data mining; graph theory; tree searching; MapReduce; breadth first search strategy; data mining; frequent subgraph extraction; isomorphism-testing-free approach; Chemical compounds; Data mining; Educational institutions; Encoding; Indexes; Search problems; Testing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type

    conf

  • DOI
    10.1109/BigData.2013.6691633
  • Filename
    6691633