DocumentCode :
2208326
Title :
Zone-based structural feature extraction for script identification from Indian documents
Author :
Gopakumar, Rajesh ; Subbareddy, N.V. ; Makkithaya, Krishnamoorthi ; Acharya, Dinesh U.
Author_Institution :
Dept. of Comput. Sci. & Eng., Manipal Inst. of Technol., Manipal, India
fYear :
2010
fDate :
July 29 2010-Aug. 1 2010
Firstpage :
420
Lastpage :
425
Abstract :
Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, a zone-based structural feature extraction scheme for recognizing South-Indian scripts, along with English and Hindi, is proposed. The document images are segmented into lines, each line image is divided into different zones, and structural features are extracted from the zones. A total of 37 features are extracted at the first level and then reduced to an optimal number of features using wrapper and filter selection approaches. The K-nearest neighbor and support vector machine classifiers are used for classification and recognition. A classification accuracy of 100% is achieved on the optimal feature set.
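The line-then-zone pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how lines are segmented, how many zones are used, or which 37 structural features are computed. The sketch assumes a binary image (1 = ink), segments lines by looking for blank rows, splits each line into three equal-height zones, and uses ink density as a stand-in feature. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows containing ink."""
    lines = []
    start = None
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, r))       # a blank row ends the line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(image)))
    return lines


def split_zones(line: Image, n_zones: int = 3) -> List[Image]:
    """Split a line image into n_zones horizontal bands (assumed equal height)."""
    h = len(line)
    bounds = [i * h // n_zones for i in range(n_zones + 1)]
    return [line[bounds[i]:bounds[i + 1]] for i in range(n_zones)]


def ink_density(zone: Image) -> float:
    """Placeholder feature: fraction of ink pixels in the zone."""
    total = sum(len(row) for row in zone)
    return sum(map(sum, zone)) / total if total else 0.0


def zone_features(image: Image, n_zones: int = 3) -> List[List[float]]:
    """One feature vector per text line, one value per zone."""
    return [
        [ink_density(z) for z in split_zones(image[a:b], n_zones)]
        for a, b in segment_lines(image)
    ]


# Tiny synthetic page with two text lines separated by a blank row.
sample = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
features = zone_features(sample)
```

In a full system, vectors like these would be passed through feature selection and then to a k-nearest neighbor or support vector machine classifier, as the abstract states.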
Keywords :
document handling; feature extraction; optical character recognition; support vector machines; English; Hindi; Indian document image segmentation; K-nearest neighbor selection; multilingual documents; online search; optimal feature set; script identification; script specific OCR; South-Indian scripts recognition; support vector machine classifiers; zone-based structural feature extraction; Classification algorithms; Feature extraction; Image segmentation; Pixel; Skeleton; Support vector machine classification; Filter approach; Multilingual document; Script identification; Support Vector Machine; Wrapper subset selection; Zone-based structural features; k-Nearest Neighbor classifier
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Industrial and Information Systems (ICIIS), 2010 International Conference on
Conference_Location :
Mangalore
Print_ISBN :
978-1-4244-6651-1
Type :
conf
DOI :
10.1109/ICIINFS.2010.5578668
Filename :
5578668