Regional Information Center for Science and Technology - An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)

DocumentCode :

2195382

Title :

An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)

Author :

Chaudhuri, B.B. ; Pal, U.

Author_Institution :

Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India

Volume :

fYear :

1997

fDate :

18-20 Aug 1997

Firstpage :

1011

Abstract :

An OCR system is proposed that can read two Indian language scripts: Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. These scripts, having the same origin in the ancient Brahmi script, have many features in common, and hence a single system can be modeled to recognize them. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, and character grouping into basic, modifier and compound character categories are done for both scripts by the same set of algorithms. The feature sets and classification tree, as well as the knowledge base required for error correction (such as a lexicon), differ for Bangla and Devnagari. The system shows good performance for single-font scripts printed on clear documents.
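The abstract lists line and word segmentation as steps shared by both scripts. The following is a minimal sketch of one common way to perform them, using horizontal and vertical projection profiles on a binarized page. It is an assumption for illustration only: this record does not say which algorithms the paper actually uses, and the function names (`ink_runs`, `segment_lines`, `headline_row`, `segment_words`) are hypothetical. It relies on a well-known property of both scripts: the characters of a word hang from a shared horizontal headline (often called matra or shirorekha), which produces a peak in the row profile and keeps the characters of a word connected.

```python
from typing import List, Tuple

# Hypothetical representation: a binarized page as rows of 0 (background) / 1 (ink).
Bitmap = List[List[int]]
Span = Tuple[int, int]  # (start, end), end exclusive


def ink_runs(profile: List[int]) -> List[Span]:
    """Return (start, end) pairs of consecutive non-zero profile entries."""
    runs: List[Span] = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Bitmap) -> List[Span]:
    """Split a page into text lines at rows that contain no ink."""
    return ink_runs([sum(row) for row in img])


def headline_row(img: Bitmap, line: Span) -> int:
    """Return the row with the most ink inside a text line.

    For Bangla and Devnagari this peak is usually the headline.
    """
    start, end = line
    profile = [sum(row) for row in img[start:end]]
    return start + max(range(len(profile)), key=profile.__getitem__)


def segment_words(img: Bitmap, line: Span) -> List[Span]:
    """Split a text line into words at ink-free columns.

    Because the headline joins characters within a word, column gaps
    inside a line mostly fall between words rather than characters.
    """
    start, end = line
    width = len(img[0]) if img else 0
    cols = [sum(img[r][c] for r in range(start, end)) for c in range(width)]
    return ink_runs(cols)


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1, 0],  # headline of the first line
        [1, 0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 1],  # headline of the second line
        [0, 1, 0, 0, 0, 0, 1],
    ]
    for line in segment_lines(page):
        print(line, headline_row(page, line), segment_words(page, line))
```

Under the same assumptions, the headline row found this way is also a natural boundary for zone separation: ink above it roughly belongs to the upper zone. Real pages would additionally need skew correction first and noise tolerance in the gap tests, which this sketch omits.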

Keywords :

character sets; document image processing; image classification; image segmentation; optical character recognition; performance evaluation; Bangla; Brahmi script; Devnagari; Hindi; Indian language script; OCR system; character grouping; character segmentation; classification tree; compound character category; document digitization; error correction; feature sets; knowledge base; modifier; performance; single font scripts; skew detection; text line segmentation; word segmentation; zone separation; Character recognition; Cleaning; Computer vision; Error correction; Natural languages; Optical character recognition software; Pattern recognition; Switches; Text recognition; Writing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997

Conference_Location :

Ulm

Print_ISBN :

0-8186-7898-4

Type :

conf

DOI :

10.1109/ICDAR.1997.620662

Filename :

620662

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2195382