  • DocumentCode
    3323976
  • Title
    Extracting Loosely Structured Data Records Through Mining Strict Patterns
  • Author
    Wu, Yipu ; Chen, Jing ; Li, Qing
  • Author_Institution
    Dept. of Comput. Sci., City Univ. of Hong Kong, Hong Kong
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1322
  • Lastpage
    1324
  • Abstract
    Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and book and news review analysis. Existing methods work well only for strongly structured DRs. In this paper, we address the problem of extracting loosely structured DRs by mining strict patterns. Our method uses both content features and tag tree features to recognize loosely structured DRs, and we propose a new approach that extracts the DRs automatically. An experimental study demonstrates that the method is both effective and robust in practice.
  • Keywords
    data mining; pattern recognition; blog data analysis; loosely structured data records; news review analysis; strict pattern mining; tag tree feature; Application software; Computer science; Data mining; HTML; Information services; Internet; Pattern recognition; Videos; Web pages; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497543
  • Filename
    4497543