• DocumentCode
    3143068
  • Title

    On dimensionality reduction of massive graphs for indexing and retrieval

  • Author

    Aggarwal, Charu C. ; Wang, Haixun

  • Author_Institution
    IBM T. J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    1091
  • Lastpage
    1102
  • Abstract
    In this paper, we will examine the problem of dimensionality reduction of massive disk-resident data sets. Graph mining has become important in recent years because of its numerous applications in community detection, social networking, and web mining. Many graph data sets are defined on massive node domains in which the number of nodes in the underlying domain is very large. As a result, it is often difficult to store and hold the information necessary in order to retrieve and index the data. Most known methods for dimensionality reduction are effective only for data sets defined on modest domains. Furthermore, while the problem of dimensionality reduction is most relevant to the problem of massive data sets, these algorithms are inherently not designed for the case of disk-resident data in terms of the order in which the data is accessed on disk. This is a serious limitation which restricts the applicability of current dimensionality reduction methods. Furthermore, since dimensionality reduction methods are typically designed for database applications such as indexing, it is important to design the underlying data reduction method, so that it can be effectively used for such applications. In this paper, we will examine the difficult problem of dimensionality reduction of graph data in the difficult case in which the underlying number of nodes are very large and the data set is disk-resident. We will propose an effective sampling algorithm for dimensionality reduction and show how to perform the dimensionality reduction in a limited number of passes on disk. We will also design the technique to be highly interpretable and friendly for indexing applications. We will illustrate the effectiveness and efficiency of the approach on a number of real data sets.
  • Keywords
    data mining; graph theory; data reduction method; dimensionality reduction method; disk-resident data sets; graph indexing; graph mining; graph retrieval; Algorithm design and analysis; Bridges; Computational complexity; Data mining; Indexing; Partitioning algorithms; Social network services;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2011.5767834
  • Filename
    5767834