• DocumentCode
    813668
  • Title

    Indexing useful structural patterns for XML query processing

  • Author

    Lian, Wang ; Mamoulis, Nikos ; Cheung, David Wai-Lok ; Yiu, S.M.

  • Author_Institution
    Fac. of Inf. Technol., Macao Univ. of Sci. & Technol., China
  • Volume
    17
  • Issue
    7
  • fYear
    2005
  • fDate
    7/1/2005
  • Firstpage
    997
  • Lastpage
    1009
  • Abstract
    Queries on semistructured data are hard to process due to the complex nature of the data and call for specialized techniques. Existing path-based indexes and query processing algorithms are not efficient for searching complex structures beyond simple paths, even when the queries are highly selective. We introduce the definition of minimal infrequent structures (MIS), which are structures that 1) exist in the data, 2) are not frequent with respect to a support threshold, and 3) have only frequent substructures. By indexing the occurrences of MIS, we can efficiently locate the highly selective substructures of a query, improving search performance significantly. We propose an efficient data mining algorithm that finds the minimal infrequent structures. Their occurrences in the XML data are then indexed by a lightweight data structure and used as a fast filter step in query evaluation. We validate the efficiency and applicability of our methods through experimentation on both synthetic and real data.
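    The three MIS conditions in the abstract can be illustrated with a toy sketch. It is not the paper's mining algorithm: it assumes, purely for illustration, that each document is flattened to a set of parent/child edge labels, that a "structure" is any set of such edges, and that a "substructure" is any non-empty proper subset. The names `support`, `minimal_infrequent_structures`, and the sample data are hypothetical.

```python
from itertools import combinations


def support(structure, documents):
    """Count the documents that contain every edge of `structure`."""
    return sum(1 for doc in documents if structure <= doc)


def minimal_infrequent_structures(documents, min_support):
    """Brute-force enumeration of MIS over edge sets (toy assumption)."""
    edges = sorted(set().union(*documents))
    result = []
    for size in range(1, len(edges) + 1):
        for combo in combinations(edges, size):
            s = frozenset(combo)
            sup = support(s, documents)
            # Condition 1: must occur in the data.
            # Condition 2: must be infrequent w.r.t. the threshold.
            if sup == 0 or sup >= min_support:
                continue
            # Condition 3: every substructure is frequent. Support can only
            # grow when an edge is removed, so checking the subsets that
            # drop exactly one edge is sufficient.
            subs = [s - {e} for e in s] if size > 1 else []
            if all(support(sub, documents) >= min_support for sub in subs):
                result.append(s)
    return result


# Hypothetical sample data: four documents as sets of edge labels.
docs = [
    frozenset({"book/title", "book/author"}),
    frozenset({"book/title", "book/author", "book/price"}),
    frozenset({"book/title", "book/price"}),
    frozenset({"book/author", "book/price", "book/isbn"}),
]
mis = minimal_infrequent_structures(docs, min_support=2)
```

    With a support threshold of 2, the rare edge `book/isbn` is an MIS on its own, and the combination of title, author, and price is an MIS because each pair occurs in two documents while all three occur together in only one. Pairs that include `book/isbn` are infrequent but not minimal, so they are excluded.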
  • Keywords
    XML; data mining; data structures; database indexing; query processing; RDF; XML query processing; XSL; data mining algorithm; document indexing; high-selective substructures; lightweight data structure; minimal infrequent structures; path-based indexes; query evaluation; semistructured data queries; structural pattern indexing; Computer Society; Databases; Filters; Indexes; Indexing; Information retrieval; XML/XSL/RDF; mining methods and algorithms
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2005.110
  • Filename
    1432707