• DocumentCode
    595010
  • Title
    Learning the characteristics of critical cells from web tables
  • Author
    Nagy, G.
  • Author_Institution
    Rensselaer Polytech. Inst., Troy, NY, USA
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    1554
  • Lastpage
    1557
  • Abstract
    Critical Cells (CCs) are identified to partition a web table into mutually exclusive regions of stub, column header, row header, data, and neutral cells. Every table cell (including titles and footnotes outside the table proper but usually within the HTML table tags) is classified into one of six classes based on cell features extracted from the target cell and its eight neighbors. Changing the domain of maximization over posteriors results in the assignment of exactly four CCs to each table. The average number of interactions required for error-free table data extraction can be reduced by more than 75% by alternating between graphic interaction and auto-assignment.
  • Keywords
    Internet; feature extraction; learning (artificial intelligence); CC; Web tables; auto-assignment; cell-feature extraction; column header; critical cell characteristics; error-free table data extraction; graphic interaction; learning; neutral cells; row header; stub; Algorithm design and analysis; Data mining; Feature extraction; HTML; Training; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460440
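
The abstract's remark about "changing the domain of maximization over posteriors" can be illustrated with a minimal sketch. It is one possible reading, not the paper's implementation: it assumes that four of the six classes are CC classes (indices 0 to 3 here), that each cell already has a vector of class posteriors, and that all names and numbers below are hypothetical. Maximizing over classes for each cell may yield any number of CCs, whereas maximizing over cells for each CC class yields exactly one cell per CC class.

```python
# Hypothetical toy posteriors: cell (row, col) -> six class posteriors.
# Assumption: classes 0..3 are the four CC classes, 4..5 are non-CC classes.
POSTERIORS = {
    (0, 0): [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
    (0, 1): [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
    (1, 0): [0.05, 0.05, 0.30, 0.10, 0.40, 0.10],
    (1, 1): [0.05, 0.05, 0.10, 0.20, 0.10, 0.50],
    (2, 2): [0.05, 0.05, 0.20, 0.15, 0.30, 0.25],
}
CC_CLASSES = [0, 1, 2, 3]


def argmax_over_classes(posteriors):
    """Conventional rule: each cell takes its most probable class."""
    return {
        cell: max(range(len(p)), key=p.__getitem__)
        for cell, p in posteriors.items()
    }


def argmax_over_cells(posteriors, cc_classes):
    """Alternative rule: each CC class takes its most probable cell.

    This always returns one cell per CC class. It does not force the
    chosen cells to be distinct; that would need an extra constraint.
    """
    return {
        k: max(posteriors, key=lambda cell: posteriors[cell][k])
        for k in cc_classes
    }


per_cell = argmax_over_classes(POSTERIORS)
per_class = argmax_over_cells(POSTERIORS, CC_CLASSES)

# Count how many cells the per-cell rule labels with each CC class.
cc_counts = {k: sum(1 for c in per_cell.values() if c == k) for k in CC_CLASSES}
print("per-cell rule, cells per CC class:", cc_counts)
print("per-class rule, chosen cell per CC class:", per_class)
```

In this toy data the per-cell rule labels two cells with CC class 0 and none with the other three CC classes, while the per-class rule selects one cell for each of the four.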