Title :
Robust named entity detection in videotext using character lattices
Author :
Subramanian, Krishna ; Prasad, Rohit ; Macrostie, Ehry ; Natarajan, Prem
Author_Institution :
BBN Technol., Cambridge, MA
fDate :
March 31 - April 4, 2008
Abstract :
Text in video sequences can provide key indexing information. In particular, videotext is rich in named entities (NEs), and detecting such entities is critical for search applications. Traditional approaches detect NEs in OCR output by looking for them in the single-best (1-best) recognition result. Because recognition errors are inevitable in the 1-best output, such approaches usually yield low recall. Since a lattice is more likely than the 1-best output to contain the correct answer, we explore NE detection from the character lattices produced by our videotext OCR system. Furthermore, we use an approximate match criterion that allows punctuation to be inserted during lookup. Experimental results show a 50% relative improvement in NE recall using lattices over exact lookup in the 1-best hypothesis. Because this gain in recall is accompanied by a large number of false positives, we present techniques for reducing false alarms. In addition, we describe efficient techniques for reducing NE detection time.
Keywords :
character recognition; image sequences; video signal processing; OCR; character lattices; entity detection; named entities; recognition errors; video sequences; videotext; Character generation; Engines; Feature extraction; Hidden Markov models; Indexing; Lattices; Optical character recognition software; Robustness; Text recognition; Optical Character Recognition
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4517841