DocumentCode :
1993987
Title :
Improving Chinese/English OCR performance by using MCE-based character-pair modeling and negative training
Author :
Huo, Qiang ; Feng, Zhi-Dan
Author_Institution :
Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
fYear :
2003
fDate :
3-6 Aug. 2003
Firstpage :
364
Abstract :
In the past several years, we have been developing a high-performance OCR engine for machine-printed Chinese/English documents. We have reported previously (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer: (1) using MCE-trained character-pair models to avoid error-prone character-level segmentation in some troublesome cases, and (2) performing MCE-based negative training to improve the ability of the recognition models to reject hypothesized garbage images during the recognition process. The efficacy of the proposed techniques is confirmed by experiments in a benchmark test.
Keywords :
error handling; minimisation; natural language interfaces; optical character recognition; Chinese OCR performance improvement; English OCR performance improvement; MCE-based character-pair modeling; OCR engine; benchmark testing; character modeling techniques; character recognition; confidence-guided progressive search; error-prone character-level segmentation; fast match techniques; hypothesized garbage images; minimum classification error training; negative training; recognition accuracy; recognition efficiency; rejection capability; Benchmark testing; Character generation; Character recognition; Computer science; Image recognition; Image segmentation; Information systems; Optical character recognition software; Robustness; Search engines
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
Type :
conf
DOI :
10.1109/ICDAR.2003.1227690
Filename :
1227690