Regional Information Center for Science and Technology (RICeST) - Identification of Japanese and English Script from a Single Document Page

DocumentCode :

2141439

Title :

Identification of Japanese and English Script from a Single Document Page

Author :

Chanda, S. ; Pal, U. ; Kimura, F.

Author_Institution :

Indian Statistical Institute, Kolkata

fYear :

2007

fDate :

16-19 Oct. 2007

Firstpage :

656

Lastpage :

661

Abstract :

In Japanese documents, a single text line may contain both Japanese and English script. For optical character recognition (OCR) of such a page, higher accuracy can be obtained by first identifying the Japanese and English portions and then applying a script-specific OCR to each identified portion. This paper proposes an automatic technique for identifying Japanese and English script portions within a single line of a printed document page. To the best of our knowledge, this is the first work of its kind. First, the document is segmented into lines, and the lines are then segmented into characters. In the proposed scheme, the scripts are identified using a combination of features derived from the structural shape of characters, pitch information, topological properties, the water reservoir concept, etc. In experiments on 11,304 characters, the proposed scheme achieved 98.79% identification accuracy.
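The abstract names the processing steps but not the exact algorithms. The following is a rough sketch under stated assumptions, not the authors' method. It assumes a binarized page stored as a NumPy array (1 = ink). It uses projection profiles for line and character segmentation, which is a common technique, although the abstract does not say which segmentation method the paper uses. It also approximates the "top water reservoir" feature by treating each column's topmost ink pixel as a surface and computing how much water would be trapped above it. All function names are hypothetical.

```python
import numpy as np

def split_runs(profile):
    """Return (start, end) pairs for runs of non-zero entries in a 1-D profile (end is exclusive)."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i                      # a run of ink begins
        elif v == 0 and start is not None:
            runs.append((start, i))        # a blank gap ends the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_lines(page):
    """Hypothetical helper: split a binary page into text lines using the horizontal projection profile."""
    return [page[r0:r1] for r0, r1 in split_runs(page.sum(axis=1))]

def segment_characters(line):
    """Hypothetical helper: split a text-line image into characters using the vertical projection profile."""
    return [line[:, c0:c1] for c0, c1 in split_runs(line.sum(axis=0))]

def top_reservoir_area(char):
    """Approximate the top water-reservoir area of a binary glyph.

    Assumption: water poured from above settles on the upper profile of the glyph
    (the topmost ink pixel in each column), so the area is computed with the
    classic trapping-rain-water method on the per-column surface heights.
    """
    h, w = char.shape
    heights = []
    for c in range(w):
        rows = np.flatnonzero(char[:, c])
        heights.append(h - rows[0] if rows.size else 0)   # 0 for an empty column
    heights = np.array(heights)
    left_max = np.maximum.accumulate(heights)               # tallest wall to the left (inclusive)
    right_max = np.maximum.accumulate(heights[::-1])[::-1]  # tallest wall to the right (inclusive)
    return int(np.sum(np.minimum(left_max, right_max) - heights))
```

With this approximation, a U-shaped glyph has a positive top-reservoir area. A closed loop such as "O" has an area of zero because poured water cannot get inside it. Splitting characters only by vertical projection will break multi-stroke Japanese characters such as 川 into several pieces, so a practical system would need an additional step to merge those fragments.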

Keywords :

natural language processing; optical character recognition; text analysis; English script identification; Japanese document page; Japanese script identification; document segmentation; printed document page text line; Character recognition; Computer vision; Optical character recognition software; Reservoirs; Structural shapes; Support vector machine classification; Support vector machines; Testing; Text recognition; Water resources

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on

Conference_Location :

Aizu-Wakamatsu, Fukushima

Print_ISBN :

978-0-7695-2983-7

Type :

conf

DOI :

10.1109/CIT.2007.109

Filename :

4385159

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2141439