Title :
MMC-margin: Identification of maximum frequent subgraphs by metropolis Monte Carlo sampling
Author :
Muyi Liu;Michael Gribskov
Author_Institution :
Purdue University, Dept. of Biological Sciences, 240 S. Martin Jischke Drive, West Lafayette, IN, USA 47907-1971
Abstract :
Frequent Subgraph Mining is important in many scientific fields including molecular biology, genomics, and chemistry. Graphs are used to represent specific structures, and by finding common subgraphs one can infer common properties. Identification of maximum frequent subgraphs, therefore, provides an approach to reveal hidden knowledge shared among graphs, and classify graphs into informative categories. Most existing algorithms explicitly explore the frequent subgraph space by depth-first or breadth-first search, and traverse the entire lattice of subgraphs. However, explicit search for subgraph isomorphism is NP-complete, and limited to small graphs. We propose MMC-Margin, a Metropolis Monte Carlo approach that extends the idea of the Margin search space to greatly reduce time and space complexity. We apply our method to three domain examples: RNA structure graphs from actual biological structures, artificially constructed RNA graphs with known common structures, and structures of chemical mutagens. Our experiments show that MMC-Margin effectively identifies maximum frequent patterns in subgraph spaces where exact algorithms encounter time and space issues. Specifically, we find that MMC-Margin can find the maximum common subgraph in these datasets in as little as 120 seconds, whereas, a complete search with Margin takes hundreds or even thousands of CPU hours. In addition to time efficiency, the MCMC approach requires little memory. The MMC-margin approach described here allows the identification of common subgraphs in biological and chemical datasets and the expansion of using subgraph isomorphism in extracting latent information.
Keywords :
"RNA","Lattices","Complexity theory","Monte Carlo methods","Memory management","Silicon"
Conference_Titel :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363832