• DocumentCode
    1345690
  • Title

    Discovering structural association of semistructured data

  • Author

    Wang, Ke ; Liu, Huiqing

  • Author_Institution
    Sch. of Comput. Sci., Nat. Univ. of Singapore, Singapore
  • Volume
    12
  • Issue
    3
  • fYear
    2000
  • Firstpage
    353
  • Lastpage
    371
  • Abstract
    Many semistructured objects are similarly, though not identically, structured. We study the problem of discovering “typical” substructures of a collection of semistructured objects. The discovered structures can serve the following purposes: 1) a “table-of-contents” for gaining general information about a source, 2) a road map for browsing and querying information sources, 3) a basis for clustering documents, 4) partial schemas for providing standard database access methods, and 5) user/customer interests and browsing patterns. The discovery task is affected by structural features of semistructured data in a nontrivial way, and traditional data mining frameworks are inapplicable. We define this discovery problem and propose a solution.
  • Keywords
    associative processing; data mining; information retrieval; trees (mathematics); very large databases; browsing patterns; discovered structures; discovery problem; document clustering; information source querying; partial schemas; road map; semistructured data; semistructured objects; standard database access methods; structural association discovery; structural features; traditional data mining frameworks; user/customer interests; Data mining; HTML; Image segmentation; Motion pictures; Roads; SGML; Software libraries; Spatial databases; Warehousing; Web mining
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.846290
  • Filename
    846290