DocumentCode
177872
Title
Text detection and recognition in natural scenes and consumer videos
Author
Jain, Abhishek ; Xujun Peng ; Xiaodan Zhuang ; Natarajan, Prem ; Huaigu Cao
Author_Institution
Speech, Language & Multimedia Bus. Unit, Raytheon BBN Technol., Cambridge, MA, USA
fYear
2014
fDate
4-9 May 2014
Firstpage
1245
Lastpage
1249
Abstract
We propose an end-to-end system for text detection and recognition in natural scenes and consumer videos. Maximally Stable Extremal Regions, which are robust to illumination and viewpoint variations, are selected as text candidates. Rich shape descriptors such as Histogram of Oriented Gradients, Gabor filter, corners, and geometrical features are used to represent the candidates, which are then classified using a support vector machine. Positively labeled candidates serve as anchor regions for word formation. We then group candidate regions based on geometric and color properties to form word boundaries. To speed up the system for practical applications, we use a Partial Least Squares approach for dimensionality reduction. The detected words are binarized, filtered, and passed to a hidden Markov model based Optical Character Recognition (OCR) system for recognition. We show significant improvement in text detection and recognition tasks over previous approaches on a large consumer video dataset. Furthermore, the event detection system built upon the OCR output of this approach outperformed multiple other OCR-only based submissions in the recently concluded NIST TRECVID 2013 multimedia event detection evaluations.
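The word-formation step in the abstract (grouping candidate regions by geometric properties into word boundaries) can be illustrated with a minimal sketch. This is not the authors' method: the `Box` type, the `group_words` function, the compatibility rules, and the thresholds `max_gap_ratio` and `max_height_ratio` are all hypothetical choices made for illustration, and the color cue mentioned in the abstract is omitted entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one candidate region, in pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def _compatible(a, b, max_gap_ratio, max_height_ratio):
    """Return True if two boxes plausibly belong to the same word.

    The three rules below are assumed heuristics, not taken from the paper.
    """
    tall, short = max(a.h, b.h), min(a.h, b.h)
    # Rule 1: characters in one word have similar heights.
    if tall > max_height_ratio * short:
        return False
    # Rule 2: the boxes share a text line (vertical overlap of at least
    # half the shorter box).
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    if overlap < 0.5 * short:
        return False
    # Rule 3: the horizontal gap is small relative to character height
    # (the gap is negative when the boxes overlap horizontally).
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    return gap <= max_gap_ratio * tall


def group_words(boxes, max_gap_ratio=0.6, max_height_ratio=1.5):
    """Merge compatible boxes into word boxes using union-find."""
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of compatible boxes; chains of neighbours end up
    # in the same set even if the end boxes are far apart.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _compatible(boxes[i], boxes[j], max_gap_ratio, max_height_ratio):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # The word boundary is the bounding box of each group's members.
    words = []
    for members in groups.values():
        x0 = min(b.x for b in members)
        y0 = min(b.y for b in members)
        x1 = max(b.x + b.w for b in members)
        y1 = max(b.y + b.h for b in members)
        words.append(Box(x0, y0, x1 - x0, y1 - y0))
    return sorted(words, key=lambda b: (b.y, b.x))
```

Union-find is used here so that a chain of adjacent characters merges into one word even when the first and last characters are too far apart to be linked directly. In a fuller pipeline the input boxes would come from the classifier-accepted candidate regions, which this sketch does not model.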
Keywords
Gabor filters; computational geometry; filtering theory; hidden Markov models; image classification; image colour analysis; object detection; optical character recognition; regression analysis; support vector machines; text analysis; video signal processing; Gabor filter; NIST TRECVID 2013 multimedia event detection evaluations; OCR system; OCR-only based submissions; color properties; consumer video dataset; dimensionality reduction; geometric properties; geometrical features; hidden Markov model based optical character recognition system; histogram-of-oriented gradients; illumination variation selection; maximally stable extremal regions; natural scenes; partial least squares approach; shape descriptors; support vector machine; text candidates; text detection; text recognition; viewpoint variation selection; Event detection; Feature extraction; Image edge detection; Optical character recognition software; Support vector machines; Text recognition; Videos; Partial Least Squares; consumer video; event detection; text detection and recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6853796
Filename
6853796