Title :
Identification of Indic Scripts on Torn-Documents
Author :
Chanda, Sukalpa ; Franke, Katrin ; Pal, Umapada
Author_Institution :
Dept. of Comput. Sci. & Media Technol., Gjovik Univ. Coll., Gjovik, Norway
Abstract :
Questioned Document Examination often involves the analysis of torn documents. Automatic classification of the content type in torn documents can help a forensic expert sort similar fragments out of a pile of torn documents. One criterion of similarity is the script of the text. In this article we propose a method to identify the script in document fragments. Text in torn documents typically appears at arbitrary orientations. We therefore use a Zernike moment-based feature, which is rotation invariant, together with a Support Vector Machine (SVM) to classify the script type. We also use gradient features to compare the results of rotation-dependent and rotation-invariant feature types. We achieved an overall script-identification accuracy of 81.39% when dealing with 11 different scripts at character/connected-component level and 94.65% at word level.
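The rotation invariance that the abstract relies on can be illustrated with a minimal sketch. It assumes the standard definition of Zernike moments on the unit disk and uses only NumPy; the function names `radial_poly` and `zernike_magnitude` are hypothetical, and the abstract does not specify the moment orders, normalization, or image preprocessing the authors actually used. The sketch only shows that the magnitude of a Zernike moment is unchanged when a component image is rotated, which is why such magnitudes are suitable inputs for a classifier such as an SVM.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); needs |m| <= n and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for k in range((n - m) // 2 + 1):
        coef = ((-1) ** k * factorial(n - k)
                / (factorial(k)
                   * factorial((n + m) // 2 - k)
                   * factorial((n - m) // 2 - k)))
        out += coef * rho ** (n - 2 * k)
    return out


def zernike_magnitude(img, n, m):
    """Magnitude |Z_nm| of a square grayscale image mapped onto the unit disk."""
    size = img.shape[0]
    # Map pixel centres to [-1, 1], symmetric about the image centre.
    coords = (2 * np.arange(size) - (size - 1)) / (size - 1)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0  # only pixels inside the unit disk contribute
    basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    z = (n + 1) / np.pi * np.sum(img[mask] * basis[mask])
    return abs(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    component = rng.random((32, 32))   # stand-in for a character image
    rotated = np.rot90(component)      # exact 90-degree rotation
    for n, m in [(2, 2), (3, 1), (4, 0)]:
        print(n, m,
              np.isclose(zernike_magnitude(component, n, m),
                         zernike_magnitude(rotated, n, m)))
```

A 90-degree rotation is used here because it maps the pixel grid onto itself exactly; for arbitrary angles, resampling introduces small numerical differences, so the magnitudes agree only approximately.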
Keywords :
character recognition; document image processing; feature extraction; forensic science; natural language processing; support vector machines; Indic Scripts identification; SVM; Zernike moment-based feature; character-component level; connected-component level; content type automatic classification; document examination processes; document fragments; gradient features; rotation invariant feature type; script type classification; script-identification accuracy; support vector machine; torn documents; Accuracy; Feature extraction; Forensics; Kernel; Labeling; Support vector machines; Training; Computational Forensics; Gaussian Kernel SVM; Script Identification; Torn Document;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2011.149