DocumentCode :
603357
Title :
Skewness and Nearest Neighbour Based Approach for Historical Document Classification
Author :
Kavitha, A.S. ; Shivakumara, Palaiahnakote ; Kumar, G. Hemantha
Author_Institution :
Dept. of Studies in Comput. Sci., Univ. of Mysore, Mysore, India
fYear :
2013
fDate :
6-8 April 2013
Firstpage :
602
Lastpage :
606
Abstract :
Document classification is essential before OCR because no universal OCR system recognizes multiple scripts. Classifying ancient historical documents such as the Indus script is even more challenging: Indus texts survive as seals inscribed on durable surfaces such as stone and have no definite writing style. As a result, the same character may look different on different seals, and the spacing between text lines is non-uniform. In this paper, we therefore propose two approaches: a Skewness-based Approach (SA) that separates Indus documents from English and South Indian scripts, and a Nearest Neighbour-based Approach (NNA) that separates English from South Indian scripts. The SA exploits the fact that the skewness between components with respect to the x-axis is higher in Indus document images than in English and South Indian document images. The NNA uses the straightness and cursiveness of components to identify the presence or absence of modifiers, which are common in South Indian document images but do not occur in English document images. The method is evaluated on 600 document images, 100 of each document type. A comparative study shows that the proposed method achieves a higher classification rate than existing methods.
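The skewness cue behind the SA can be illustrated with a minimal sketch. It assumes connected components have already been extracted and reduced to centroid coordinates, and it measures skewness as the mean absolute angle, relative to the x-axis, between each component and its nearest neighbour to the right. This angle-based measure, the function names, and the 15-degree threshold are hypothetical choices for illustration, not the authors' exact formulation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one connected component


def mean_neighbour_skew_deg(centroids: List[Point]) -> float:
    """Hypothetical skewness measure: mean angle (degrees, 0..90) between each
    centroid and its nearest neighbour lying to its right, w.r.t. the x-axis."""
    angles = []
    for x1, y1 in centroids:
        # Only consider components strictly to the right, so that in
        # multi-line text the nearest neighbour usually lies on the same line.
        right = [(x2, y2) for x2, y2 in centroids if x2 > x1]
        if not right:
            continue
        x2, y2 = min(right, key=lambda p: math.hypot(p[0] - x1, p[1] - y1))
        # abs() makes the angle independent of whether y grows up or down.
        angles.append(math.degrees(math.atan2(abs(y2 - y1), x2 - x1)))
    return sum(angles) / len(angles) if angles else 0.0


def looks_like_indus(centroids: List[Point], threshold_deg: float = 15.0) -> bool:
    """Hypothetical decision rule: high inter-component skew suggests Indus."""
    return mean_neighbour_skew_deg(centroids) > threshold_deg


# Two straight text lines versus a steeply staggered layout.
straight = [(0, 10), (10, 10), (20, 10), (30, 10),
            (0, 40), (10, 40), (20, 40), (30, 40)]
staggered = [(0, 0), (4, 20), (8, 40), (12, 60)]
print(mean_neighbour_skew_deg(straight), looks_like_indus(straight))
print(round(mean_neighbour_skew_deg(staggered), 2), looks_like_indus(staggered))
```

Restricting the neighbour search to components on the right is a design choice that keeps well-aligned, multi-line English or South Indian text near 0 degrees, while irregularly placed components push the mean angle up.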
Keywords :
document image processing; image classification; natural language processing; English document image; English script; Indus document classification; South Indian script; ancient historical document; historical document classification; nearest neighbour based approach; skewness based approach; universal OCR; Educational institutions; Equations; Feature extraction; Image edge detection; Image segmentation; Optical character recognition software; Seals; Cursiveness; Indus document; Modifiers; Nearest neighbour; Skewness; Straightness;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communication Systems and Network Technologies (CSNT), 2013 International Conference on
Conference_Location :
Gwalior
Print_ISBN :
978-1-4673-5603-9
Type :
conf
DOI :
10.1109/CSNT.2013.129
Filename :
6524472