Title :
Script-based classification of hand-written text documents in a multilingual environment
Author :
Singhal, V. ; Navin, N. ; Ghosh, D.
Author_Institution :
Dept. of Electron. & Commun. Eng., Indian Inst. of Technol., Guwahati, India
Abstract :
Script-based text document classification is an important field of research in the context of multilingual textual document processing. However, none of the script identification techniques available in the literature so far consider hand-written documents. Variations in writing style, character size, inter-line and inter-word spacing, etc. make the recognition process difficult and unreliable when these script identification algorithms, in particular the visual appearance based approaches, are applied directly to hand-written documents. Therefore, in this paper, we propose to preprocess the input document images so as to compensate for the variations due to writing style and thereby make them suitable for analysis on the basis of their visual appearance. Accordingly, we apply denoising, thinning, pruning, m-connectivity and text size normalization in sequence. Multi-channel Gabor filtering is then used to extract texture features that characterize the visual appearance of the document images. Experimental results demonstrate the potential of the proposed script identification method for hand-written text document classification.
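The multi-channel Gabor filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the filter parameters or the exact texture features, so everything specific here is an assumption made for illustration: the function names (`gabor_kernel`, `filter_valid`, `texture_features`), the wavelengths, the number of orientations, the kernel size, the use of the real (even-symmetric) filter only, and the choice of the mean and standard deviation of each channel response as features. The input is assumed to be an already preprocessed grayscale image given as a list of rows.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even-symmetric) Gabor kernel as a list of rows; size must be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Correlate image with kernel, keeping only fully overlapping positions."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    out = []
    for i in range(height - k + 1):
        out_row = []
        for j in range(width - k + 1):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += image[i + u][j + v] * kernel[u][v]
            out_row.append(acc)
        out.append(out_row)
    return out


def texture_features(image, wavelengths=(4.0, 8.0), n_orientations=4, size=9):
    """Return [mean, std] of each channel response, one channel per
    (wavelength, orientation) pair. All parameter defaults are assumptions."""
    features = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            response = [v for row in filter_valid(image, kernel) for v in row]
            mean = sum(response) / len(response)
            var = sum((v - mean) ** 2 for v in response) / len(response)
            features.extend([mean, math.sqrt(var)])
    return features
```

With the default arguments, `texture_features(image)` returns a 16-element vector (2 wavelengths × 4 orientations × 2 statistics) that could be passed to any classifier. The pure-Python loops keep the sketch dependency-free; for real document images an array library would be the more practical choice.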
Keywords :
document image processing; grammars; handwriting recognition; image classification; image denoising; image thinning; text analysis; character recognition; character size; denoising; document images; hand-written text document classification; hand-written text documents; interline spacing; interword spacing; m-connectivity; multichannel Gabor filtering; multilingual environment; pruning; script identification algorithms; script-based classification; text size normalization; textual document processing; thinning; writing style; Context; Data mining; Data preprocessing; Feature extraction; Filtering; Image analysis; Natural languages; Noise reduction; Text analysis; Writing;
Conference_Title :
Research Issues in Data Engineering: Multi-lingual Information Management, 2003. RIDE-MLIM 2003. Proceedings. 13th International Workshop on
Print_ISBN :
0-7803-7868-7
DOI :
10.1109/RIDE.2003.1249845