DocumentCode :
2102559
Title :
Towards automatic transcription of Syriac handwriting
Author :
Clocksin, W.F.
Author_Institution :
Dept. of Comput., Oxford Brookes Univ., UK
fYear :
2003
fDate :
17-19 Sept. 2003
Firstpage :
664
Lastpage :
669
Abstract :
We describe a method implemented for the recognition of Syriac handwriting from historical manuscripts. The Syriac language has been a neglected area for handwriting recognition research, yet is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium for OCR research between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written in a cursive form from right to left, and letter shape depends on the position within the word. The method described does not need to find character strokes or contours. Both whole words and character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features of these shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features on characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th-century typeset documents. Recognition rates vary from 61% to 100%, depending on the algorithms used and the size and source of the data set.
Keywords :
feature extraction; handwritten character recognition; image classification; image segmentation; learning (artificial intelligence); natural languages; optical character recognition; probability; support vector machines; Syriac handwriting recognition; Syriac language; automatic transcription; classification trials; cross-validation; cursive form; discriminative support vector machine; probabilistic method; scribe-written manuscripts; segmentation probability; training data; typeset documents; Character recognition; Clocks; Handwriting recognition; Image quality; Image recognition; Image segmentation; Laboratories; Optical character recognition software; Shape; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Image Analysis and Processing, 2003. Proceedings. 12th International Conference on
Print_ISBN :
0-7695-1948-2
Type :
conf
DOI :
10.1109/ICIAP.2003.1234126
Filename :
1234126