Regional Information Center for Science and Technology - Urdu text classification using decision trees

DocumentCode :

3739041

Title :

Urdu text classification using decision trees

Author :

K. Khan;R. Ullah Khan;Ali Alkhalifah;N. Ahmad

Author_Institution :

Dept. of Information Engineering, University of Brescia, Italy

fYear :

2015

fDate :

12/1/2015

Firstpage :

Lastpage :

Abstract :

This article reports the development and experimental analysis of an Urdu Optical Character Recognition (OCR) system. The proposed approach covers the preprocessing, feature extraction and classification of Urdu text. Three feature extraction techniques are used: Hu moments, Zernike moments and Principal Component Analysis (PCA). The J-48 decision tree algorithm is used for classification. A medium-sized database of 441 characters is created, consisting of handwritten and machine-written Urdu characters. The best overall recognition accuracy, 92.06%, is achieved using the Hu moments.
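
As an illustration of the kind of feature the abstract refers to, the following minimal sketch computes the first two of the seven Hu moment invariants for a small binary glyph and checks that they do not change under translation or a 90-degree rotation. It is not the authors' implementation: the record gives no details of their preprocessing or code, and the names `central_moment`, `hu_first_two`, `rotate90`, `shift_right` and the toy `glyph` are hypothetical. In a full pipeline, a vector of such invariants per character image would be passed to a decision tree classifier.

```python
def central_moment(img, p, q):
    """Central moment mu_pq of a 2D image given as a list of rows.

    Assumes the image contains at least one non-zero pixel.
    """
    m00 = sum(sum(row) for row in img)
    # Centroid (x is the column index, y is the row index).
    xbar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    ybar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum(((x - xbar) ** p) * ((y - ybar) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_first_two(img):
    """Return the first two Hu invariants (h1, h2) of the image."""
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):
        # Scale-normalized central moment.
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return h1, h2


def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift_right(img, k):
    """Shift the image k columns to the right, padding with zeros."""
    return [[0] * k + row[:len(row) - k] for row in img]


# Toy L-shaped glyph standing in for a segmented character image.
glyph = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(hu_first_two(glyph))
    print(hu_first_two(shift_right(glyph, 2)))
    print(hu_first_two(rotate90(glyph)))
```

The three printed pairs should agree up to floating-point rounding, which is the property that makes such moments usable as position- and orientation-tolerant character features.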

Keywords :

"Feature extraction","Optical character recognition software","Character recognition","Decision trees","Principal component analysis","Classification algorithms","Databases"

Publisher :

IEEE

Conference_Title :

High-Capacity Optical Networks and Enabling/Emerging Technologies (HONET), 2015 12th International Conference on

Type :

conf

DOI :

10.1109/HONET.2015.7395445

Filename :

7395445

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3739041