Title :
An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)
Author :
Chaudhuri, B.B. ; Pal, U.
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
Abstract :
An OCR system is proposed that can read two Indian language scripts, Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. Because these scripts share an origin in the ancient Brahmi script, they have many features in common, and hence a single system can be modeled to recognize both. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, and grouping of characters into basic, modifier and compound categories are done for both scripts by the same set of algorithms. The feature sets and classification tree, as well as the knowledge base required for error correction (such as the lexicon), differ for Bangla and Devnagari. The system shows good performance on single-font scripts printed on clean documents.
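The abstract says that text line and word segmentation are shared by both scripts. The snippet below is a minimal sketch of those two steps using projection profiles on a binary image. Projection profiles are an assumed technique chosen for illustration, not necessarily the authors' method, and the names `runs`, `segment_lines` and `segment_words` are hypothetical. It also assumes, as is typical of printed Bangla and Devnagari, that the characters of a word touch through the headline (matra/shirorekha), so a blank column inside a text line is treated as a word gap.

```python
from typing import List, Tuple

# Binary page image: 1 = ink pixel, 0 = background (assumed representation).
Image = List[List[int]]
Span = Tuple[int, int]  # (start, end), end exclusive


def runs(profile: List[int]) -> List[Span]:
    """Return the maximal runs of indices whose profile value is non-zero."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i  # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, i))  # a run of ink ends
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run reaches the image edge
    return spans


def segment_lines(img: Image) -> List[Span]:
    """Text lines = runs of rows in the horizontal projection profile."""
    return runs([sum(row) for row in img])


def segment_words(img: Image, line: Span) -> List[Span]:
    """Words = runs of columns in the vertical profile of one text line.

    Hypothetical simplification: any fully blank column counts as a gap,
    relying on the headline to keep a word's characters connected.
    """
    top, bottom = line
    width = len(img[0])
    column_profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(column_profile)


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 1, 1, 0],  # headlines of two words
        [1, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0, 0],
        [0, 1, 0, 1, 0, 0, 0, 0],
    ]
    for line in segment_lines(page):
        print(line, segment_words(page, line))
```

On the toy page this prints `(1, 3) [(0, 3), (5, 7)]` and `(4, 6) [(1, 4)]`: two text lines, the first holding two words. A real page would need skew correction first and a noise threshold instead of the strict zero test used here.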
Keywords :
character sets; document image processing; image classification; image segmentation; optical character recognition; performance evaluation; Bangla; Brahmi script; Devnagari; Hindi; Indian language script; OCR system; character grouping; character segmentation; classification tree; compound character category; document digitization; error correction; feature sets; knowledge base; modifier; performance; single font scripts; skew detection; text line segmentation; word segmentation; zone separation; Character recognition; Cleaning; Computer vision; Error correction; Natural languages; Optical character recognition software; Pattern recognition; Switches; Text recognition; Writing;
Conference_Title :
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.620662