Regional Information Center for Science and Technology

DocumentCode :

3022025

Title :

Recognition of printed Amharic documents

Author :

Meshesha, Million ; Jawahar, C.V.

Author_Institution :

Center for Visual Inf. Technol., Int. Inst. of Inf. Technol., Hyderabad, India

fYear :

2005

fDate :

29 Aug.-1 Sept. 2005

Firstpage :

784

Abstract :

In Africa, there are a number of languages with their own indigenous scripts. This paper presents an OCR system for the Amharic script. Amharic is the official and working language of Ethiopia. This is possibly the first attempt at developing an OCR system for Amharic. Research on recognizing Amharic script faces major challenges due to (i) the use of more than 300 characters in writing and (ii) the existence of a large set of visually similar characters. In this paper, we propose a two-stage feature extraction scheme using PCA and LDA, followed by a decision DAG classifier with SVMs as the nodes. Recognition results are presented to demonstrate performance across printing variations (fonts, styles and sizes) and on real-life degraded documents such as books, magazines and newspapers.
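The classification stage named in the abstract, a decision DAG over pairwise classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation. The pairwise SVM nodes are replaced by a nearest-centroid rule so the example stays dependency-free, the PCA and LDA feature extraction stage is omitted, and the class labels, toy feature vectors, and function names (`train_pairwise`, `ddag_predict`) are all hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def train_pairwise(samples_by_class):
    """Build one binary decision function per pair of classes.

    Assumption of this sketch: each node is a nearest-centroid rule
    standing in for a pairwise SVM.
    """
    centroids = {label: centroid(pts) for label, pts in samples_by_class.items()}

    def make_node(a, b):
        ca, cb = centroids[a], centroids[b]

        def decide(x):
            da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
            db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
            return a if da <= db else b

        return decide

    return {frozenset((a, b)): make_node(a, b)
            for a, b in combinations(sorted(centroids), 2)}


def ddag_predict(x, classes, nodes):
    """Walk the decision DAG: every node visited eliminates one class."""
    remaining = list(classes)
    evaluated = 0
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = nodes[frozenset((first, last))](x)
        # Drop whichever of the two candidates the node voted against.
        if winner == first:
            remaining.pop()
        else:
            remaining.pop(0)
        evaluated += 1
    return remaining[0], evaluated


# Toy 2-D feature vectors for three hypothetical character classes.
training = {
    "ha": [[0.0, 0.0], [0.2, 0.1]],
    "le": [[5.0, 5.0], [5.1, 4.9]],
    "me": [[0.0, 5.0], [0.1, 5.2]],
}
nodes = train_pairwise(training)
classes = sorted(training)
label, n_nodes = ddag_predict([4.8, 5.3], classes, nodes)
```

With N classes, a decision DAG of this kind evaluates only N - 1 nodes per input even though N(N - 1)/2 pairwise classifiers are trained. For 300 classes, for instance, that is 299 evaluations out of 44,850 classifiers, which suggests why the structure suits a script with a large character set.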

Keywords :

document handling; feature extraction; natural languages; optical character recognition; principal component analysis; support vector machines; Amharic scripts; LDA; decision DAG classifier; printed Amharic document recognition; Africa; Character recognition; Degradation; Face recognition; Linear discriminant analysis; Optical character recognition software; Printing; Writing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on

ISSN :

1520-5263

Print_ISBN :

0-7695-2420-6

Type :

conf

DOI :

10.1109/ICDAR.2005.198

Filename :

1575652

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3022025