Title :
Urdu text classification using decision trees
Author :
K. Khan;R. Ullah Khan;Ali Alkhalifah;N. Ahmad
Author_Institution :
Dept. of Information Engineering, University of Brescia, Italy
fDate :
12/1/2015 12:00:00 AM
Abstract :
This article reports the development and experimental analysis of an Urdu Optical Character Recognition (OCR) system. The proposed approach covers preprocessing, feature extraction, and classification of Urdu text. Three feature extraction techniques are used: Hu moments, Zernike moments, and Principal Component Analysis (PCA). The J-48 decision tree algorithm is used for classification. A medium-sized database of 441 characters was created, consisting of handwritten and machine-written Urdu characters. The best overall recognition accuracy, 92.06%, is achieved using Hu moments.
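As an illustration of the Hu-moment features named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation, which the abstract does not describe. It computes only the first two of the seven Hu invariants for a binary glyph image stored as a list of rows; the function names `raw_moment` and `hu_first_two` and the toy arrays `GLYPH` and `SHIFTED` are hypothetical and chosen for this example. Zernike moments, PCA, and the J-48 classifier are not shown.

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_first_two(img):
    """Return the first two Hu invariants (phi1, phi2) of a 2-D image.

    Assumes img is a list of equal-length rows of non-negative numbers
    (for example 0/1 pixels of a binarized character).
    """
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")

    # Centroid of the glyph.
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def mu(p, q):
        # Central moment: measured relative to the centroid,
        # which makes it independent of the glyph's position.
        return sum(((x - xc) ** p) * ((y - yc) ** q) * v
                   for y, row in enumerate(img)
                   for x, v in enumerate(row))

    def eta(p, q):
        # Normalized central moment: adds scale invariance.
        return mu(p, q) / (m00 ** (1 + (p + q) / 2))

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


# Toy L-shaped glyph and the same glyph shifted right and down.
GLYPH = [
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
SHIFTED = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(hu_first_two(GLYPH))
    print(hu_first_two(SHIFTED))
```

Because the moments are taken about the centroid, both calls should return the same pair of values up to floating-point rounding. This position independence is what makes such features usable as classifier inputs for characters that are not perfectly aligned.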
Keywords :
"Feature extraction","Optical character recognition software","Character recognition","Decision trees","Principal component analysis","Classification algorithms","Databases"
Conference_Titel :
High-Capacity Optical Networks and Enabling/Emerging Technologies (HONET), 2015 12th International Conference on
DOI :
10.1109/HONET.2015.7395445