DocumentCode :
1632782
Title :
Enhanced Text Extraction from Arabic Degraded Document Images Using EM Algorithm
Author :
Boussellaa, Wafa ; Bougacha, Aymen ; Zahour, Abderrazak ; El Abed, Haikal ; Alimi, Adel
Author_Institution :
ENIS, Univ. of Sfax, Sfax, Tunisia
fYear :
2009
Firstpage :
743
Lastpage :
747
Abstract :
This paper presents a new, enhanced algorithm for extracting text from degraded document images, based on probabilistic models. The observed document image is modeled as a mixture of Gaussian densities that represent the foreground and background components of the image. The expectation-maximization (EM) algorithm is used to estimate and iteratively refine the parameters of the mixture densities. The initial parameters for EM are obtained by k-means clustering. After parameter estimation, the document image is partitioned into text and background classes by means of a maximum likelihood (ML) approach. The performance of the proposed approach is evaluated on a variety of degraded documents from the collections of the National Library of Tunisia.
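The pipeline described in the abstract (k-means initialization, EM fitting of a two-component Gaussian mixture, then maximum likelihood labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a grayscale image, exactly two components, dark text on a lighter background, and a per-pixel decision on intensity alone. The function names (`kmeans_1d`, `fit_gmm_em`, `extract_text_mask`) and the iteration counts are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian(x, mean, var):
    """Gaussian density of each sample under each component; shape (n, 2)."""
    return np.exp(-0.5 * (x[:, None] - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def kmeans_1d(x, iters=20):
    """Two-cluster k-means on intensities, used only to initialize EM."""
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    weights = np.array([(labels == k).mean() for k in range(2)])
    weights = np.clip(weights, 1e-6, None)
    weights /= weights.sum()
    variances = np.array([
        x[labels == k].var() + 1e-6 if np.any(labels == k) else 1.0
        for k in range(2)
    ])
    return weights, centers, variances

def fit_gmm_em(x, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    w, mu, var = kmeans_1d(x)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel.
        resp = w * gaussian(x, mu, var)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def extract_text_mask(image):
    """Return a boolean mask that is True where a pixel is labeled as text."""
    x = np.asarray(image, dtype=float).ravel()
    _, mu, var = fit_gmm_em(x)
    # ML decision: pick the component with the larger class-conditional
    # likelihood (mixture weights are ignored, unlike a MAP decision).
    labels = gaussian(x, mu, var).argmax(axis=1)
    text_class = mu.argmin()  # assumption: text is the darker component
    return (labels == text_class).reshape(np.shape(image))
```

Calling `extract_text_mask(image)` on a 2-D array of gray levels returns a mask of the same shape. Real degraded documents with stains, bleed-through, or uneven illumination may need more than two components or local modeling, which this sketch does not attempt.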
Keywords :
Gaussian processes; document image processing; expectation-maximisation algorithm; image representation; image segmentation; maximum likelihood detection; natural language processing; parameter estimation; probability; text analysis; Arabic degraded document image; EM algorithm; Gaussian mixture; ML algorithm; National library of Tunisia; enhanced text extraction algorithm; image representation; k-means clustering method; parameter estimation; probabilistic model; Algorithm design and analysis; Clustering algorithms; Clustering methods; Degradation; Image enhancement; Image segmentation; Maximum likelihood estimation; Parameter estimation; Partitioning algorithms; Text analysis; Arabic degraded document image; Maximum likelihood algorithm (ML); expectation-maximisation algorithm (EM); k-means clustering; segmentation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.220
Filename :
5277497