DocumentCode
3695090
Title
Machine-readable region identification from partially blurred document images
Author
Qinwen Wang;Yixue Wang;Chenyang Wang;Jufeng Yang;Tao Li;Kai Wang
Author_Institution
College of Computer and Control Engineering, Nankai University, Tianjin, China
fYear
2015
Firstpage
231
Lastpage
235
Abstract
Partial blur sometimes occurs in document images captured by a camera and degrades OCR performance even on the non-blurred text regions. This paper proposes a real-time method, named MRRI, to identify the machine-readable region in partially blurred document images. First, a reference image is generated by low-pass filtering the given document image. Second, a weight matrix is generated by calculating the structural similarity for each patch. Third, a cost function is minimized to identify the maximum machine-readable region that can be well recognized by OCR. In the experiments, two applications of the identified machine-readable region are considered. First, Tesseract-OCR is used for word recognition to build an index for a given document image; compared with applying OCR to the whole image, applying OCR to the identified region recognizes more words correctly. Second, the identified machine-readable region is used to assess the quality of a document image; compared with two other image quality assessment methods, the machine-readable-region-based method performs better. MRRI is also lightweight and fast, which meets the requirements of real-time applications.
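The three steps named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the filter, the patch size, or the cost function, so everything here is an assumption. The box filter, the 16-pixel patches, the use of 1 - SSIM as a sharpness weight, the threshold tau, and the rectangular-region cost are hypothetical stand-ins, and the function names box_blur, patch_ssim, sharpness_weights, and best_rectangle are invented for illustration. The underlying idea is that a sharp patch changes a lot under low-pass filtering (low similarity to the reference), while an already blurred patch changes little.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter (assumed): k x k mean filter with edge padding, k odd."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def patch_ssim(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Standard single-window SSIM between two equally sized patches."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def sharpness_weights(img, patch=16, k=5):
    """Weight per patch: 1 - SSIM(patch, low-pass reference patch)."""
    ref = box_blur(img, k)
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    w = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * patch, (i + 1) * patch),
                  slice(j * patch, (j + 1) * patch))
            w[i, j] = 1.0 - patch_ssim(img[sl].astype(float), ref[sl])
    return w

def best_rectangle(w, tau=0.5):
    """Hypothetical cost: minimize -sum(w - tau) over axis-aligned rectangles.

    Returns (top, bottom, left, right) in patch indices, inclusive,
    or None if no rectangle has positive total gain.
    """
    gain = w - tau
    rows, cols = gain.shape
    best_sum, best_rect = 0.0, None
    for top in range(rows):
        col_sum = np.zeros(cols)
        for bottom in range(top, rows):
            col_sum += gain[bottom]
            run, start = 0.0, 0
            for j in range(cols):  # 1D Kadane scan over column sums
                if run <= 0:
                    run, start = col_sum[j], j
                else:
                    run += col_sum[j]
                if run > best_sum:
                    best_sum, best_rect = run, (top, bottom, start, j)
    return best_rect
```

In this sketch the returned rectangle would be scaled by the patch size and cropped from the image before handing it to an OCR engine. The rectangular shape and the fixed threshold are simplifications chosen to keep the search exact and short; the paper's actual cost function may differ.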
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type
conf
DOI
10.1109/ICDAR.2015.7333758
Filename
7333758