Title : 
Word-wise Sinhala Tamil and English script identification using Gaussian kernel SVM
         
        
            Author : 
Chanda, Sukalpa ; Pal, Srikanta ; Pal, Umapada
         
        
            Author_Institution : 
Indian Statistical Institute, Kolkata, India
         
        
        
        
        
        
            Abstract : 
Many documents in Sri Lanka contain Sinhala, Tamil and English text on a single page. To develop OCR for such a page, it is better to first identify the scripts present and then feed each identified portion to the corresponding OCR module. This paper proposes an SVM-based technique for word-wise identification of Sinhala, Tamil and English scripts in a single document page. The technique relies mainly on structural features, topological features and features based on the water reservoir principle. The experiments gave encouraging results.
         
        
            Keywords : 
Gaussian processes; document image processing; feature extraction; optical character recognition; support vector machines; text analysis; Gaussian kernel SVM; OCR module; structural feature; topological feature; water reservoir principle-based feature; word-wise English script identification; word-wise Sinhala Tamil script identification; word-wise Sinhala script identification; Feeds; Kernel; Neural networks; Optical character recognition software; Reservoirs; Structural shapes; Support vector machine classification; Water resources; Water storage
         
        
        
        
            Conference_Title : 
19th International Conference on Pattern Recognition (ICPR 2008)
         
        
            Conference_Location : 
Tampa, FL
         
        
        
            Print_ISBN : 
978-1-4244-2174-9
         
        
            ISSN : 
1051-4651
         
        
        
            DOI : 
10.1109/ICPR.2008.4761823