DocumentCode :
2315593
Title :
Shape and Morphological Transformation Based Features for Language Identification in Indian Document Images
Author :
Hangarge, Mallikarjun ; Dhandra, B.V.
Author_Institution :
P.G.Dept. of Studies & Res. in Comput. Sci., Gulbarga Univ., Gulbarga
fYear :
2008
fDate :
16-18 July 2008
Firstpage :
1175
Lastpage :
1180
Abstract :
This paper describes a technique for language identification in document images that discriminates five major Indian languages: Hindi, Marathi, Sanskrit, Assamese and Bengali, which belong to the Devnagari and Bangla scripts. A text block of each language containing at least two text lines is selected and characterized using global and local features. Morphological transformations are used to decompose a text block in two directions at three levels to capture fine texture primitives. Shape features of connected components are used to retain the local properties of the text block. A combination of these features is then used to classify 500 text blocks of the five languages with a binary decision tree and a KNN classifier. The proposed method differs from reported methods for non-Indian languages, which are based on shape coding of characters and words and on document vectorization. The method captures word shapes directly, without segmentation, and is tolerant to variations in font style and size. The language identification results are encouraging.
Keywords :
decision trees; document image processing; natural language processing; text analysis; Indian document images; KNN classifier; binary decision tree; document vectorization; language identification; morphological transformation; non-Indian languages; shape coding; shape features; text block; Character recognition; Classification tree analysis; Decision trees; Frequency; Image recognition; Image segmentation; Natural languages; Optical character recognition software; Shape; Text recognition; Morphological Transformation; Shape; document image; language identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on
Conference_Location :
Nagpur, Maharashtra
Print_ISBN :
978-0-7695-3267-7
Electronic_ISBN :
978-0-7695-3267-7
Type :
conf
DOI :
10.1109/ICETET.2008.177
Filename :
4580082