Title :
Extracting text from degraded document image
Author :
Radhika Patel;Suman K. Mitra
Author_Institution :
Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India 382007
Abstract :
The current era of digitization is expected to digitize many important old documents that have degraded for various reasons. Binarizing degraded document images poses many challenges, such as intensity variation, background contrast variation, bleed-through, and text size variation. Many approaches to document image binarization are available, but none can handle all types of degradation at once. We propose an approach consisting of three stages: preprocessing, Text-Area detection, and post-processing. Preprocessing enhances the contrast of the image. The next stage identifies the Text-Area. Post-processing handles false positives and false negatives based on the intensity values of the preprocessed and grayscale images. Performance is evaluated using various quantitative measures and compared with the method regarded as the best so far. The algorithm is also expected to be independent of the script and is therefore tested on degraded Gujarati document images.
Keywords :
"Image edge detection","Frequency modulation","Histograms","Image segmentation","Distortion measurement","Data mining","Degradation"
Conference_Title :
Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2015 Fifth National Conference on
DOI :
10.1109/NCVPRIPG.2015.7490017