• DocumentCode
    3013441
  • Title

    Counting twig matches in a tree

  • Author

    Chen, Zhiyuan ; Jagadish, H.V. ; Korn, Flip ; Koudas, Nick ; Muthukrishnan, S. ; Ng, Raymond ; Srivastava, Divesh

  • Author_Institution
    Cornell Univ., Ithaca, NY, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    595
  • Lastpage
    604
  • Abstract
    Describes efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e. a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, to provide query feedback and for cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e. small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets, and combining two complementary approaches: set hashing, used to estimate the number of matches of each query twiglet, and maximal overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches on query twiglets formed using variations on different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets, with a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space
  • Keywords
    database theory; hypermedia markup languages; query processing; tree data structures; XML data sets; XML queries; approximate frequency information representation; cost-based query optimization; data tree; estimation algorithms; hierarchical data; maximal overlap; node-labeled tree; query feedback; query twiglet estimates; set hashing; summary data structure; twig match counting; twig query; twiglet decomposition techniques; Books; Data structures; Database languages; Feedback; File systems; Frequency; Query processing; Robustness; Tree data structures; XML;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2001. Proceedings. 17th International Conference on
  • Conference_Location
    Heidelberg
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1001-9
  • Type

    conf

  • DOI
    10.1109/ICDE.2001.914874
  • Filename
    914874