DocumentCode :
2151668
Title :
Word stretching for effective segmentation and classification of historical Arabic handwritten documents
Author :
Al Aghbari, Zaher ; Brook, Salama
Author_Institution :
Dept. of Comput. Sci., Univ. of Sharjah, Sharjah
fYear :
2009
fDate :
22-24 April 2009
Firstpage :
217
Lastpage :
224
Abstract :
There is a growing need to access historical Arabic handwritten manuscripts (HAH manuscripts) stored in large archives; tools for automatically searching, indexing, classifying, and retrieving HAH manuscripts are therefore required. The peculiar characteristics of Arabic handwriting add an extra challenge to developing such systems. This paper presents a novel holistic technique for segmenting and classifying HAH manuscripts. The classification of HAH manuscripts is performed in several steps. First, the HAH manuscript's image is segmented into words, and then each word is segmented into its connected parts. Because adjacent connected parts of a single word overlap, we developed a stretching algorithm to increase the gap between them and thus improve their segmentation. Second, several structural and statistical features, devised for Arabic text, are extracted from these connected parts and then combined to represent a word with one consolidated feature vector. Finally, a neural network is used to learn and classify the input vectors into word classes. Extracting the structural and statistical features from the individual connected parts, rather than from the whole word, improved the performance of the system significantly.
Keywords :
document image processing; image segmentation; information retrieval; natural language processing; neural nets; text analysis; automatic searching; historical Arabic handwritten documents; historical Arabic handwritten manuscripts; indexing; neural network; retrieval; word classification; word segmentation; word stretching; Application software; Buildings; Error correction; Laboratories; Mobile agents; Multiagent systems; Scattering; Software design; Software testing; System testing; Data mining of Arabic documents; Feature extraction of Arabic text; Historical Arabic handwriting; Segmentation of Historical Arabic handwritten documents; Word classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Research Challenges in Information Science, 2009. RCIS 2009. Third International Conference on
Conference_Location :
Fez
Print_ISBN :
978-1-4244-2864-9
Electronic_ISBN :
978-1-4244-2865-6
Type :
conf
DOI :
10.1109/RCIS.2009.5089285
Filename :
5089285