Regional Information Center for Science and Technology (RICeST) - Extracting text from degraded document image

DocumentCode :

3777153

Title :

Extracting text from degraded document image

Author :

Radhika Patel;Suman K. Mitra

Author_Institution :

Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India 382007

fYear :

2015

Firstpage :

Lastpage :

Abstract :

The current era of digitization is expected to digitize many important old documents that have degraded for various reasons. Binarizing degraded document images poses many challenges, such as intensity variation, background contrast variation, bleed-through, and text size variation. Many approaches to document image binarization are available, but none can handle all types of degradation at once. We propose an approach consisting of three stages: preprocessing, text-area detection, and post-processing. Preprocessing enhances the contrast of the image. The next stage identifies the text area. Post-processing handles false positives and false negatives based on the intensity values of the preprocessed and gray images. Performance is evaluated using various quantitative measures and compared with the method regarded as the best so far. The algorithm is also expected to be independent of the script, and is therefore tested on degraded Gujarati document images.
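
The three-stage structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithms used in each stage, so every technique below is a stand-in chosen for illustration and is not the authors' method: a percentile contrast stretch for preprocessing, a global Otsu threshold for text-area detection, and a simple intensity check against both the enhanced and the original gray image for post-processing. All function names, the 8-bit input assumption, and the `dark`/`bright` cutoffs are hypothetical.

```python
import numpy as np


def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Stage 1 stand-in: percentile-based linear contrast stretch to [0, 1]."""
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image, nothing to stretch
        return np.zeros_like(gray)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0)


def otsu_threshold(img, bins=256):
    """Global Otsu threshold for an image with values in [0, 1]."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(np.float64)  # pixel count up to each bin
    w1 = w0[-1] - w0                         # pixel count above each bin
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)              # mean of the darker class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)   # mean of the brighter class
    between = w0 * w1 * (m0 - m1) ** 2       # between-class variance (unnormalized)
    return edges[np.argmax(between) + 1]     # upper edge of the best split bin


def refine(mask, enhanced, gray, dark=0.25, bright=0.75):
    """Stage 3 stand-in: fix likely errors using both intensity images.

    Assumes `gray` is 8-bit (0..255); `dark` and `bright` are illustrative cutoffs.
    """
    gray = np.asarray(gray, dtype=np.float64) / 255.0
    refined = mask.copy()
    # Detected as text but bright in the original: likely false positive.
    refined[mask & (gray > bright)] = False
    # Missed but dark in both images: likely false negative.
    refined[~mask & (enhanced < dark) & (gray < dark)] = True
    return refined


def binarize(gray):
    """Run the three stages; returns a boolean mask where True marks text."""
    enhanced = stretch_contrast(gray)                # stage 1: preprocessing
    mask = enhanced < otsu_threshold(enhanced)       # stage 2: text-area detection
    return refine(mask, enhanced, gray)              # stage 3: post-processing
```

Calling `binarize` on an 8-bit grayscale array returns a boolean mask of the same shape. A global threshold is used here only to keep the sketch short; it would not cope with the uneven backgrounds and bleed-through that the abstract lists as challenges.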

Keywords :

"Image edge detection","Frequency modulation","Histograms","Image segmentation","Distortion measurement","Data mining","Degradation"

Publisher :

IEEE

Conference_Title :

Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2015 Fifth National Conference on

Type :

conf

DOI :

10.1109/NCVPRIPG.2015.7490017

Filename :

7490017

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3777153