• DocumentCode
    3144497
  • Title
    Computing structural statistics by keywords in databases
  • Author
    Qin, Lu ; Yu, Jeffrey Xu ; Chang, Lijun
  • Author_Institution
    Chinese Univ. of Hong Kong, Hong Kong, China
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    363
  • Lastpage
    374
  • Abstract
    Keyword search in relational databases (RDBs) has been studied extensively in recent years. Existing studies focus on finding all, or the top-k, interconnected tuple-structures that contain the keywords. In practice, the number of such interconnected tuple-structures for a keyword query can be large, which makes it very difficult for users to obtain any valuable information beyond the individual interconnected tuple-structures. It is also challenging to provide a mechanism similar to group-&-aggregate for those interconnected tuple-structures. In this paper, we study computing structural statistics keyword queries by extending the group-&-aggregate framework. We model an RDB as a large directed graph in which nodes represent tuples and edges represent the links among tuples. Instead of grouping individual tuples, we group rooted subgraphs. Such a rooted subgraph represents an interconnected tuple-structure in which some of the tuples contain keywords. The dimensions of the rooted subgraphs are determined by dimensional-keywords in a data-driven fashion. Two rooted subgraphs are placed in the same group if they are isomorphic with respect to the dimensions, that is, the dimensional-keywords. The score of a rooted subgraph is computed by a user-given score function if the subgraph contains some of the general keywords; the general keywords are used to compute scores rather than to determine dimensions. For every group, an aggregate is computed over these scores using an SQL aggregate function. We motivate the problem using a real dataset, propose new approaches to compute structural statistics keyword queries, and conduct extensive performance studies on two large real datasets and one large synthetic dataset, which confirm the effectiveness and efficiency of our approach.
  • Keywords
    SQL; directed graphs; query processing; relational databases; statistical analysis; SQL aggregate function; directed graph; group-&-aggregate framework; keyword query; keyword search; relational databases; rooted subgraphs; structural statistics; top-k interconnected tuple-structures; user-given score function; Aggregates; Cities and towns; Computers; Keyword search; Monitoring; Relational databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2011.5767900
  • Filename
    5767900
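
The group-&-aggregate idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the toy data (`TEXT`, `EDGES`), the function names, the fixed search radius, and the counting score function are all assumptions made for illustration, and subgraph isomorphism on the dimensions is reduced here to comparing the set of dimensional-keywords each rooted subgraph contains.

```python
from collections import defaultdict, deque

# Hypothetical toy database: node id -> tuple text, plus directed links among tuples.
TEXT = {
    "c1": "customer alice london",
    "c2": "customer bob london",
    "c3": "customer carol paris",
    "o1": "order laptop",
    "o2": "order phone",
    "o3": "order laptop",
    "o4": "order phone",
}
EDGES = {
    "c1": ["o1", "o2"], "c2": ["o3"], "c3": ["o4"],
    "o1": [], "o2": [], "o3": [], "o4": [],
}


def reachable(root, radius):
    """Return the nodes reachable from root within `radius` directed hops (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the radius
        for nxt in EDGES[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


def structural_stats(roots, dim_keywords, gen_keywords, radius=1, aggregate=sum):
    """Group rooted subgraphs by dimensional-keywords and aggregate their scores."""
    groups = defaultdict(list)
    for root in roots:
        words = [w for n in reachable(root, radius) for w in TEXT[n].split()]
        # Simplified grouping key: the dimensional-keywords found in the subgraph.
        key = tuple(sorted(set(words) & set(dim_keywords)))
        if not key:
            continue  # the subgraph matches no dimension, so it joins no group
        # Assumed score function: number of general-keyword occurrences.
        score = sum(1 for w in words if w in gen_keywords)
        groups[key].append(score)
    return {key: aggregate(scores) for key, scores in groups.items()}


result = structural_stats(["c1", "c2", "c3"], {"london", "paris"}, {"laptop"})
print(result)
```

Here `aggregate=sum` stands in for an SQL aggregate such as SUM; passing `max` or `len` instead would mimic MAX or COUNT over the same groups.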