DocumentCode :
2023272
Title :
SVM Based Scheme for Thai and English Script Identification
Author :
Chanda, S. ; Terrades, Oriol Ramos ; Pal, U.
Author_Institution :
Indian Stat. Inst., Kolkata
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
551
Lastpage :
555
Abstract :
In some Thai documents, a single text line of a document page may contain both Thai and English scripts. For optical character recognition (OCR) of such a page, it is better to first identify the Thai and English portions and then apply the OCR system for the respective script to each identified portion. This paper proposes an SVM-based method for word-wise identification of printed English and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of each character group by combining character features obtained from structural shape, profile, component overlapping information, topological properties, the water reservoir concept, etc. In an experiment on 6110 data samples, the proposed scheme achieved 99.36% script identification accuracy.
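The abstract names the segmentation order (page into lines, lines into character groups) but not the technique used. The following is a minimal sketch only, assuming a binarized page image and the common projection-profile approach; the paper itself may segment differently. All names here (`ink_runs`, `segment_lines`, `segment_words`, `min_gap`) are hypothetical and chosen for illustration. Feature extraction and the SVM classifier are not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input format)


def ink_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile > 0.

    Background gaps shorter than min_gap are bridged, so the runs on
    either side of such a gap are merged into one.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at index i
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Row ranges of text lines, taken from the horizontal projection profile."""
    return ink_runs([sum(row) for row in image])


def segment_words(image: Image, rows: Tuple[int, int],
                  min_gap: int = 3) -> List[Tuple[int, int]]:
    """Column ranges of character groups inside one text line.

    min_gap is an assumed threshold: narrower gaps are treated as
    spacing between characters of the same group.
    """
    top, bottom = rows
    width = len(image[0]) if image else 0
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return ink_runs(profile, min_gap)
```

Each column range returned by `segment_words` would then be cropped and passed to the feature extraction stage. The `min_gap` value would need tuning to the scan resolution and font size.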
Keywords :
document image processing; image segmentation; natural language processing; optical character recognition; support vector machines; English script identification; Thai script identification; document segmentation; support vector machine; Computer vision; Neural networks; Optical character recognition software; Optical network units; Pattern recognition; Reservoirs; Structural shapes; Support vector machine classification; Support vector machines; Water resources
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378770
Filename :
4378770