DocumentCode :
3165181
Title :
On Pattern Preserving Graph Generation
Author :
Hong-Han Shuai ; De-Nian Yang ; Philip S. Yu ; Chih-Ya Shen ; Ming-Syan Chen
Author_Institution :
National Taiwan University, Taipei, Taiwan
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
677
Lastpage :
686
Abstract :
Real datasets play an essential role in graph mining and analysis. However, most available real datasets today contain only millions of nodes. Therefore, the literature on Big Data analysis utilizes statistical graph generators to generate a massive graph (e.g., billions of nodes) for evaluating the scalability of an algorithm. Nevertheless, current popular statistical graph generators are designed to preserve only statistical metrics, such as the degree distribution, diameter, and clustering coefficient of the original social graphs. Recently, the importance of frequent graph patterns has been recognized in various works on graph mining, but this crucial criterion has not been considered in existing graph generators. To address this need, we make the first attempt to design a Pattern Preserving Graph Generation (PPGG) algorithm that generates a graph that includes all frequent patterns and preserves the three most popular statistical parameters: degree distribution, clustering coefficient, and average vertex degree. The experimental results show that PPGG, which we have released as a free download, is efficient and able to generate a billion-node graph in approximately 10 minutes, much faster than existing graph generators.
Keywords :
Big Data; data analysis; data mining; graph theory; pattern clustering; statistical analysis; Big Data analysis; PPGG algorithm; average vertex degree; clustering coefficient; degree distribution; frequent graph pattern; graph analysis; graph mining; pattern preserving graph generation; real datasets; statistical graph generators; statistical metrics; Algorithm design and analysis; Biology; Clustering algorithms; Data mining; Databases; Generators; Histograms; Algorithms; Graph Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.14
Filename :
6729552