• DocumentCode
    69089
  • Title

    Efficient Keyword Search on Uncertain Graph Data

  • Author

    Ye Yuan ; Guoren Wang ; Lei Chen ; Haixun Wang

  • Author_Institution
    Northeastern Univ., Shenyang, China
  • Volume
    25
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    2767
  • Lastpage
    2779
  • Abstract
    As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far, there is no work on keyword search over uncertain graph data even though the uncertain graphs have been widely used in many real applications, such as modeling road networks, influential detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following the similar answer definition for keyword search over deterministic graphs, we consider a subtree in the uncertain graph as an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem as stated in [1], [2], [3]. Due to the existence of uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve the search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based and tree-based probabilistic pruning phases, which filter out most false subtrees. In the verification, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.
  • Keywords
    data analysis; indexing; information filtering; probability; query processing; trees (mathematics); PKIndex; PPI network; data analysis; data retrieval; deterministic graph; documents; efficient keyword search; false subtree filtering; filtering-and-verification strategy; influential detection; keyword matching; keyword query; path-based probabilistic pruning; path-based top-k probability; probabilistic keyword index; relational database; road network modeling; sampling algorithm; search efficiency; search mechanism; social network; texts; top-k keyword search; tree-based probabilistic pruning; uncertain graph data; uncertain graph subtree; Data mining; Graphs; Information retrieval; Ontologies; Search methods; Text mining; Uncertainty; Database; algorithm; graph data; uncertain data;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2012.222
  • Filename
    6648594