Title :
Towards automatic transcription of Syriac handwriting
Author_Institution :
Dept. of Comput., Oxford Brookes Univ., UK
Abstract :
We describe a method for recognising Syriac handwriting in historical manuscripts. Syriac has been a neglected area of handwriting recognition research, yet it is of interest because its many scribe-written manuscripts offer a challenging but tractable medium for OCR research, lying between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written cursively from right to left, and the shape of a letter depends on its position within the word. The method does not need to locate character strokes or contours. Both whole words and individual character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features that tolerate variation in letter formation and image quality are extracted from these shapes. Each shape is then recognised individually by a discriminative support vector machine, evaluated with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and feature combinations on both characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th-century typeset documents. Recognition rates range from 61% to 100%, depending on the algorithms used and on the size and source of the data set.
Keywords :
feature extraction; handwritten character recognition; image classification; image segmentation; learning (artificial intelligence); natural languages; optical character recognition; probability; support vector machines; Syriac handwriting recognition; Syriac language; automatic transcription; classification trials; cross-validation; cursive form; discriminative support vector machine; feature extraction; image segmentation; probabilistic method; scribe-written manuscripts; segmentation probability; training data; typeset documents; Character recognition; Clocks; Handwriting recognition; Image quality; Image recognition; Image segmentation; Laboratories; Optical character recognition software; Shape; Support vector machines;
Conference_Title :
Image Analysis and Processing, 2003. Proceedings. 12th International Conference on
Print_ISBN :
0-7695-1948-2
DOI :
10.1109/ICIAP.2003.1234126