Title :
Efficiently Indexing Large Sparse Graphs for Similarity Search
Author :
Wang, Guoren ; Wang, Bin ; Yang, Xiaochun ; Yu, Ge
Author_Institution :
Northeastern Univ., Shenyang, China
fDate :
3/1/2012 12:00:00 AM
Abstract :
The graph structure is a very important means to model schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the Q-Gram idea. By decomposing graphs to small grams (organized by κ-Adjacent Tree patterns) and pairing-up on those κ-Adjacent Tree patterns, the lower bound estimation of their edit distance can be calculated for candidate filtering. Furthermore, we have developed a series of techniques for inverted index construction and online query processing. By building the candidate set for the query graph before the exact edit distance calculation, the number of graphs need to proceed into exact matching can be greatly reduced. Extensive experiments on real and synthetic data sets have been conducted to show the effectiveness and efficiency of the proposed indexing mechanism.
Keywords :
indexing; query processing; trees (mathematics); Q-Gram indexing mechanism; chemical compound; edit distance; graph structure; inverted index construction; k-adjacent tree pattern; knowledge query inferring system; protein-protein interaction network; query processing; road network; schemaless data; similarity search; sparse graph indexing; Buildings; Chemical compounds; Filtering; Indexing; Matched filters; Proteins; Query processing; Relational databases; Roads; Tree graphs; Graph indexing; kappa-adjacent tree.; similarity search;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2010.28