Title :
An efficient restoration algorithm for the historic middle-age Persian (Pahlavi) manuscripts
Author :
Alirezaee, Shahpour ; Aghaeinia, Hassan ; Ahmadi, Majid ; Faez, Karim
Author_Institution :
Dept. of Electr. Eng., Amirkabir Univ. of Technol., Tehran, Iran
Abstract :
This paper presents a restoration algorithm for Pahlavi (Middle Persian) manuscripts, as a preliminary document-processing study in this area. The proposed algorithm uses mathematical morphology and connected component analysis to segment Pahlavi documents with overlapping lines, words, and characters, and to prepare the text for OCR. The algorithm was evaluated on 200 pages of Pahlavi documents and performed well on both restoration and segmentation. Numerical results indicate that it removes noise and degradation effects, achieving 99.14% accuracy on baseline detection, 97.35% accuracy on text line extraction with removal of overlaps from other lines, and 99.5% accuracy on segmenting the extracted text lines into their components.
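The abstract names connected components as one of the algorithm's two building blocks but gives no implementation details. The following is a minimal sketch of the general idea only, not the authors' method: it labels 8-connected foreground regions in a binarized page and discards regions below a size threshold as noise. The function names `label_components` and `remove_small_components`, the 0/1 list-of-lists image format, the choice of 8-connectivity, and the `min_size` threshold are all assumptions made for illustration.

```python
from collections import deque


def label_components(image):
    """Return the 8-connected foreground components of a binary image.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    Each component is a list of (row, col) pixel coordinates.
    This is an illustrative helper, not the paper's implementation.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_small_components(image, min_size):
    """Keep only components with at least `min_size` pixels (assumed noise rule)."""
    cleaned = [[0] * len(row) for row in image]
    for pixels in label_components(image):
        if len(pixels) >= min_size:
            for y, x in pixels:
                cleaned[y][x] = 1
    return cleaned


# Hypothetical 4x5 page fragment: two strokes plus one isolated speck.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
cleaned_page = remove_small_components(page, min_size=2)
```

In this toy input the isolated pixel at row 1, column 4 is dropped while both strokes survive. A real pipeline would more likely use a library routine for labeling and would choose the size threshold from the scan resolution.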
Keywords :
document image processing; history; image restoration; image segmentation; OCR application; Pahlavi manuscript document processing; connected component concept; document restoration algorithm; document segmentation; historic middle-age Persian manuscripts; mathematical morphology; morphological analysis; text line extraction; Character recognition; Entropy; Gray-scale; Handwriting recognition; Image restoration; Image segmentation; Morphology; Optical character recognition software; Testing; Text recognition; Connected component; Document restoration; Preprocessing; Segmentation;
Conference_Title :
Systems, Man and Cybernetics, 2005 IEEE International Conference on
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571461