Author_Institution :
Sch. of Software, Tsinghua Univ., Beijing, China
Abstract :
Graph data management and mining have many diverse applications in real-world scientific research and business. Uniform path pattern query processing, one of the most basic operations on graph data, faces three major challenges. This paper addresses these challenges as follows. First, a new graph query language called G-Path is presented, which focuses on complex path pattern query processing over very large graphs. The design of a system called HDGL is also proposed; it is based on a BSP-like model together with the MapReduce model and can effectively handle distributed graph data operations and queries. Second, an implementation of HDGL on Hadoop, the de facto cloud platform, is described. The processing of a G-Path statement in HDGL is detailed, based on the concept of a distributed state machine. In addition, several query optimization techniques are applied to G-Path queries to dramatically improve query execution performance. Finally, extensive experiments on several graph data sets are conducted to show the usability of the G-Path query language and the effectiveness of HDGL.
Keywords :
data mining; graph theory; parallel processing; query processing; G-Path query language; HDGL; Hadoop; MapReduce model; de facto cloud platform; graph data management; graph data mining; large graphs; uniform path pattern query processing; Data models; Database languages; Indexes; Pattern matching; Writing; G-Path; path pattern query