• DocumentCode
    1582154
  • Title
    Layout and language: exploring text block discovery in tables using linguistic resources
  • Author
    Hurst, Matthew
  • Author_Institution
    WhizBang!Labs, Pittsburgh, PA, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    523
  • Lastpage
    527
  • Abstract
    Identifying the textual content of table cells requires, in part, resolving ambiguities between multi-row cells and single-row cells, as well as other layout-based ambiguities. This paper investigates the application of linguistic resources to this problem and discusses algorithms that exploit both phrasal dictionaries and bigram language models to discover the content of cells in flat text files.
  • Keywords
    dictionaries; document image processing; linguistics; bigram language models; document representation; experiments; flat text files; linguistic resources; phrasal dictionaries; table cell textual content; table recognition; text block discovery; textual layout; Company reports; Computational linguistics; Dictionaries; Encoding; Investments; Security; Testing; Text recognition; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2001.953844
  • Filename
    953844