• DocumentCode
    3409352
  • Title
    Robust named entity detection in videotext using character lattices
  • Author
    Subramanian, Krishna ; Prasad, Rohit ; Macrostie, Ehry ; Natarajan, Prem
  • Author_Institution
    BBN Technol., Cambridge, MA
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    1241
  • Lastpage
    1244
  • Abstract
    Text in video sequences can provide key indexing information. In particular, videotext is rich in named entities (NEs), and detection of such entities is critical for search applications. Traditional approaches for detecting NEs in OCR output look for these NEs in the single-best recognition results. Due to the inevitable presence of recognition errors in the single-best output, such approaches usually result in low recall. Given that a lattice is more likely to contain the correct answer, we explore NE detection from character lattices produced by our videotext OCR system. Furthermore, we use an approximate match criterion that allows insertion of punctuation during lookup. Experimental results show a 50% relative improvement in NE recall using lattices over exact lookup in the 1-best hypothesis. Since the improvement in recall is accompanied by a large number of false positives, we present techniques for reducing false alarms. In addition, we describe efficient techniques for reducing the time for detecting NEs.
  • Keywords
    character recognition; image sequences; video signal processing; OCR; character lattices; entity detection; named entities; recognition errors; video sequences; videotext; Character generation; Engines; Feature extraction; Hidden Markov models; Indexing; Lattices; Optical character recognition software; Robustness; Text recognition; Video sequences; Character Lattices; Hidden Markov Models; Named Entities; Optical Character Recognition; Videotext
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4517841
  • Filename
    4517841