DocumentCode :
3023519
Title :
Character duration modeling for speed improvements in the BBN Byblos OCR system
Author :
Natarajan, Premkumar ; Sundaram, Ram ; Prasad, Rohit ; Macrostie, Ehry
Author_Institution :
BBN Technol., Cambridge, MA, USA
fYear :
2005
fDate :
29 Aug.-1 Sept. 2005
Firstpage :
1136
Abstract :
In this paper, we describe a recent enhancement to our HMM-based OCR system that significantly increases the speed of the system with no impact on recognition accuracy. Recognition speed is, in part, a function of the number of distinct HMMs that constitute the model set. As a result, recognition is much slower for ideographic scripts such as Chinese and Japanese, which contain thousands of glyphs, than for alphabetic scripts such as Latin and Arabic. Our current OCR system already uses methods such as sub-character modeling and Gaussian shortlists to reduce processing time. Here, we describe a simple character-based duration modeling technique that constrains the number of frames for which a character can stay active. Character durations were obtained from automatically labeled training data, and a probability mass function (histogram) was used to model them. The use of the duration model yielded a 37% improvement in speed with no loss in accuracy.
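The abstract does not specify how the duration constraint is wired into the decoder, so the following is only a minimal sketch under stated assumptions. It assumes per-character frame counts are available as `(character, num_frames)` pairs from an automatic (forced) alignment of the training data. It builds a histogram PMF per character and applies a simple pruning rule that rejects a hypothesis once a character has stayed active longer than a duration limit derived from that PMF. The function names and the `mass` cutoff parameter are hypothetical and illustrative, not the Byblos implementation.

```python
from collections import Counter, defaultdict


def estimate_duration_pmfs(labeled_segments):
    """Build a histogram PMF of durations (in frames) for each character.

    labeled_segments: iterable of (char, num_frames) pairs, assumed to come
    from automatically labeled (force-aligned) training data.
    """
    counts = defaultdict(Counter)
    for ch, frames in labeled_segments:
        counts[ch][frames] += 1
    pmfs = {}
    for ch, hist in counts.items():
        total = sum(hist.values())
        # Normalize raw counts so each character's probabilities sum to 1.
        pmfs[ch] = {d: n / total for d, n in hist.items()}
    return pmfs


def max_duration(pmf, mass=1.0):
    """Smallest duration d such that P(D <= d) >= mass (hypothetical cutoff)."""
    cumulative = 0.0
    for d in sorted(pmf):
        cumulative += pmf[d]
        if cumulative >= mass - 1e-12:  # tolerance for float rounding
            return d
    return max(pmf)


def allowed(pmfs, ch, frames_active, mass=1.0):
    """Return False if the hypothesis should be pruned for exceeding the limit.

    Characters never seen in training get no duration constraint.
    """
    if ch not in pmfs:
        return True
    return frames_active <= max_duration(pmfs[ch], mass)
```

For example, with training durations of 3, 4 and 4 frames for "a", `max_duration` returns 4 at full mass, so a hypothesis that keeps "a" active for a fifth frame is pruned. Lowering `mass` tightens the limit, trading potential accuracy for more pruning; where the real system sets this balance is not stated in the abstract.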
Keywords :
optical character recognition; BBN Byblos OCR system; HMM-based OCR system; alphabetic scripts; character duration modeling; character recognition; character-based duration modeling; histogram; ideographic scripts; probability mass function; Character recognition; Feature extraction; Handwriting recognition; Hidden Markov models; Histograms; Image recognition; Optical character recognition software; Speech recognition; Text recognition; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
ISSN :
1520-5263
Print_ISBN :
0-7695-2420-6
Type :
conf
DOI :
10.1109/ICDAR.2005.71
Filename :
1575721