Title :
An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents
Author :
Mohanty, Sanghamitra ; Dasbebartta, Himadri Nandini ; Behera, Tarun Kumar
Author_Institution :
Dept. of Comput. Sci. & Applic., Utkal Univ., Bhubaneswar
Abstract :
Recognition of documents containing multiple scripts is a challenging task that requires additional effort from OCR (optical character recognition) designers to improve the accuracy rate. Earlier OCR systems were developed only for single-script documents, mainly for English and for regional languages. Old documents, both single-script and multiscript, need to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, they still share some common features. For the purposes of this paper, we need to distinguish between the Roman script and the Oriya script. Most English (that is, Roman-script) characters are both linear and circular in shape, whereas Oriya characters are circular in shape. The two scripts therefore need to be separated, paragraph-wise or line-wise, on the basis of these features.
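The abstract does not say which features the authors use to separate the scripts. The following is a minimal sketch, not the authors' method. It assumes a single hypothetical "linearity" feature: the fraction of ink pixels that lie on long vertical runs in a binarized text line. Roman text, with its straight strokes, should score higher than the curved Oriya glyphs. The names `vertical_run_ratio` and `classify_line` and the parameters `min_run` and `threshold` are illustrative assumptions.

```python
# Hypothetical sketch: line-wise Roman/Oriya separation by a "linearity" feature.
# Assumption (not from the paper): Roman text contains more long straight vertical
# strokes than Oriya text, whose glyphs are predominantly curved.
from typing import List

Bitmap = List[List[int]]  # binarized text-line image: 1 = ink, 0 = background


def vertical_run_ratio(line: Bitmap, min_run: int) -> float:
    """Fraction of ink pixels belonging to vertical runs of length >= min_run."""
    if not line or not line[0]:
        return 0.0
    height, width = len(line), len(line[0])
    ink = sum(sum(row) for row in line)
    if ink == 0:
        return 0.0
    in_long_runs = 0
    for x in range(width):
        run = 0
        for y in range(height + 1):
            # A virtual background row at y == height closes the last run in the column.
            pixel = line[y][x] if y < height else 0
            if pixel:
                run += 1
            else:
                if run >= min_run:
                    in_long_runs += run
                run = 0
    return in_long_runs / ink


def classify_line(line: Bitmap, min_run: int = 4, threshold: float = 0.5) -> str:
    """Label a text line 'roman' or 'oriya'; min_run and threshold are illustrative."""
    return "roman" if vertical_run_ratio(line, min_run) >= threshold else "oriya"
```

For paragraph-wise separation, one could classify each line and take a majority vote over the paragraph. In practice, the threshold would have to be tuned on real scanned lines.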
Keywords :
natural language processing; optical character recognition; text analysis; English languages; English texts; Oriya texts; bilingual optical character recognition; multiscripts; printed documents; regional languages; Application software; Character recognition; Cleaning; Computer science; Image segmentation; Natural languages; Noise generators; Optical character recognition software; Optical design; Pattern recognition;
Conference_Title :
Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-3335-3
DOI :
10.1109/ICAPR.2009.49