Title :
gSpan: graph-based substructure pattern mining
Author :
Yan, Xifeng ; Han, Jiawei
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
Abstract :
We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
Keywords :
data mining; tree searching; algorithm; canonical label; depth-first search strategy; frequent connected subgraph mining; frequent graph-based pattern mining; frequent substructure discovery; gSpan; graph datasets; graph-based substructure pattern mining; lexicographic order; performance study; unique minimum DFS code; Chemical compounds; Computer science; Costs; Data mining; Data structures; Graphics; Itemsets; Kernel; Testing; Tree graphs;
Conference_Titel :
Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
DOI :
10.1109/ICDM.2002.1184038