DocumentCode :
3143068
Title :
On dimensionality reduction of massive graphs for indexing and retrieval
Author :
Aggarwal, Charu C. ; Wang, Haixun
Author_Institution :
IBM T. J. Watson Res. Center, Hawthorne, NY, USA
fYear :
2011
fDate :
11-16 April 2011
Firstpage :
1091
Lastpage :
1102
Abstract :
In this paper, we will examine the problem of dimensionality reduction of massive disk-resident data sets. Graph mining has become important in recent years because of its numerous applications in community detection, social networking, and web mining. Many graph data sets are defined on massive node domains in which the number of nodes in the underlying domain is very large. As a result, it is often difficult to store and hold the information necessary in order to retrieve and index the data. Most known methods for dimensionality reduction are effective only for data sets defined on modest domains. Furthermore, while the problem of dimensionality reduction is most relevant to the problem of massive data sets, these algorithms are inherently not designed for the case of disk-resident data in terms of the order in which the data is accessed on disk. This is a serious limitation which restricts the applicability of current dimensionality reduction methods. Furthermore, since dimensionality reduction methods are typically designed for database applications such as indexing, it is important to design the underlying data reduction method, so that it can be effectively used for such applications. In this paper, we will examine the difficult problem of dimensionality reduction of graph data in the difficult case in which the underlying number of nodes are very large and the data set is disk-resident. We will propose an effective sampling algorithm for dimensionality reduction and show how to perform the dimensionality reduction in a limited number of passes on disk. We will also design the technique to be highly interpretable and friendly for indexing applications. We will illustrate the effectiveness and efficiency of the approach on a number of real data sets.
Keywords :
data mining; graph theory; data reduction method; dimensionality reduction method; disk-resident data sets; graph indexing; graph mining; graph retrieval; Algorithm design and analysis; Bridges; Computational complexity; Data mining; Indexing; Partitioning algorithms; Social network services;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
ISSN :
1063-6382
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2011.5767834
Filename :
5767834
Link To Document :
بازگشت