DocumentCode
71996
Title
SociaLite: An Efficient Graph Query Language Based on Datalog
Author
Seo, Jiwon ; Guo, Stephen ; Lam, Monica S.
Author_Institution
Dept. of Comput. Sci., Stanford Univ., Stanford, CA, USA
Volume
27
Issue
7
fYear
2015
fDate
July 1 2015
Firstpage
1824
Lastpage
1837
Abstract
With the rise of social networks, large-scale graph analysis has become increasingly important. Because SQL lacks the expressiveness and performance needed for graph algorithms, lower-level, general-purpose languages are often used instead. For greater ease of use and efficiency, we propose SociaLite, a high-level graph query language based on Datalog. As a logic programming language, Datalog allows many graph algorithms to be expressed succinctly. However, its performance has not been competitive with that of low-level languages. With SociaLite, users can provide high-level hints on the data layout and evaluation order; they can also define recursive aggregate functions which, as long as they are meet operations, can be evaluated incrementally and efficiently. Moreover, recursive aggregate functions make it possible to implement additional graph algorithms that cannot be expressed in Datalog. We evaluated SociaLite on nine graph algorithms in total: eight for social network analysis (shortest paths, PageRank, hubs and authorities, mutual neighbors, connected components, triangles, clustering coefficients, and betweenness centrality) and one for biological network analysis (Eulerian cycles). The evaluation uses two real-life social graphs, LiveJournal and Last.fm, as well as one synthetic graph. The optimizations proposed in this paper speed up almost all the algorithms by 3 to 22 times. SociaLite even outperforms typical Java implementations by an average of 50 percent on the graph algorithms tested. Compared with highly optimized Java implementations, SociaLite programs are an order of magnitude more succinct and easier to write, while their performance remains competitive: the overhead is only 16 percent for the largest benchmark and 25 percent for the worst-case benchmark. Most importantly, because it is a query language, SociaLite enables many more users who are not proficient in software engineering to perform network analysis easily and efficiently.
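As a hedged illustration of why a recursive aggregate that is a meet operation (here, min) can be evaluated incrementally, the following Python sketch computes single-source shortest paths semi-naively. Each round joins only the facts whose minimum improved in the previous round (the "delta") with the edge relation. This is not SociaLite syntax and not the authors' implementation. The function name shortest_paths, the (source, target, weight) edge-list input, and the assumption of non-negative edge weights are all illustrative.

```python
from collections import defaultdict
import math


def shortest_paths(edges, source):
    """Hypothetical sketch: semi-naive evaluation of the recursive rules
         Path(t, min(d)) :- t = source, d = 0.
         Path(t, min(d)) :- Path(s, d1), Edge(s, t, w), d = d1 + w.
    Assumes non-negative weights so that the fixpoint is reached.
    Because min is a meet operation, joining only the improved facts
    (the delta) with Edge reaches the same fixpoint as re-joining all facts."""
    adj = defaultdict(list)
    for s, t, w in edges:
        adj[s].append((t, w))

    path = {source: 0}   # current min distance per node
    delta = {source: 0}  # facts whose min improved in the last round
    while delta:
        new_delta = {}
        for s, d1 in delta.items():
            for t, w in adj[s]:
                d = d1 + w
                # Keep d only if it improves both the stored minimum
                # and any better candidate already found this round.
                if d < path.get(t, math.inf) and d < new_delta.get(t, math.inf):
                    new_delta[t] = d
        path.update(new_delta)  # apply the meet (min) for improved nodes
        delta = new_delta
    return path
```

For example, shortest_paths([("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)], "a") returns {"a": 0, "b": 1, "c": 3, "d": 4}. Later rounds touch only nodes whose distance just dropped, rather than recomputing every Path fact.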
Keywords
DATALOG; Java; graph theory; pattern clustering; social networking (online); Datalog; Eulerian cycles; Java implementations; Last.fm; LiveJournal; PageRank; SociaLite; authorities; betweenness centrality; biological network analysis; clustering coefficients; connected components; data layout; evaluation order; general-purpose languages; high-level graph query language; high-level hints; hubs; large-scale graph analysis; logic programming language; lower-level languages; mutual neighbors; real-life social graph; recursive aggregate functions; shortest path; social networks; synthetic graph; triangles; Aggregates; Algorithm design and analysis; Arrays; Java; Optimization; Semantics; Social network services; Datalog; Graph Algorithms; Query Languages; graph algorithms; query languages; social network analysis;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2015.2405562
Filename
7045548