• DocumentCode
    3325050
  • Title
    Automatically Extracting Form Labels
  • Author
    Nguyen, Hoa ; Kang, Eun Yong ; Freire, Juliana
  • Author_Institution
    Sch. of Comput., Univ. of Utah, Salt Lake City, UT
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1498
  • Lastpage
    1500
  • Abstract
    We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.
  • Keywords
    information retrieval; learning (artificial intelligence); Web form interface; attribute label extraction; hidden-Web crawler; machine learning; online databases; schema matching; Cities and towns; Crawlers; Data mining; Databases; Engines; HTML; Humans; Information retrieval; Partial response channels; Web search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497602
  • Filename
    4497602