Machine-readable region identification from partially blurred document images

Author

Qinwen Wang;Yixue Wang;Chenyang Wang;Jufeng Yang;Tao Li;Kai Wang

Author_Institution

College of Computer and Control Engineering, Nankai University, Tianjin, China

fYear

2015

Firstpage

231

Lastpage

235

Abstract

Partial blur sometimes occurs in document images captured by a camera, which affects OCR performance even on the non-blurred text regions. This paper proposes a real-time method, named MRRI, to identify the machine-readable region in partially blurred document images. First, a reference image is generated by low-pass filtering the given document image. Second, a weight matrix is generated by calculating the structural similarity for each patch. Third, a cost function is minimized to identify the maximum machine-readable region that can be well recognized by OCR. In the experiments, two applications of the identified machine-readable region are considered. On the one hand, Tesseract-OCR is used for word recognition to build an index for a given document image. Compared with applying OCR to the whole image, more words are correctly recognized when OCR is applied to the identified region. On the other hand, the identified machine-readable region is used to assess the quality of a document image. Compared with two other image quality assessment methods, the method based on the machine-readable region performs better. MRRI is also lightweight and fast, which meets the requirements of real-time applications.
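
The first two steps named in the abstract (low-pass reference image, per-patch structural similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the box filter used as the low-pass filter, the 16-pixel patch size, the standard SSIM constants, and the function names `box_blur`, `patch_ssim`, `weight_matrix`, and `readable_mask`. The abstract does not give the cost function, so the final thresholding step is only a simple stand-in for it. The sketch also assumes that a sharp patch changes a lot when blurred (low similarity to the reference), while an already blurred patch changes little (high similarity).

```python
import numpy as np


def box_blur(img, radius=2):
    """Low-pass filter (assumed): separable mean filter with edge padding."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; "valid" mode undoes the padding.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def patch_ssim(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM computed over one whole patch (no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def weight_matrix(img, patch=16, radius=2):
    """One SSIM value per non-overlapping patch: image vs. blurred reference."""
    img = img.astype(np.float64)
    ref = box_blur(img, radius)
    n_rows, n_cols = img.shape[0] // patch, img.shape[1] // patch
    w = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            sl = (slice(i * patch, (i + 1) * patch),
                  slice(j * patch, (j + 1) * patch))
            w[i, j] = patch_ssim(img[sl], ref[sl])
    return w


def readable_mask(weights, threshold=0.5):
    """Stand-in for the cost minimization: low similarity is treated as sharp."""
    return weights < threshold
```

Calling `readable_mask(weight_matrix(gray_image))` on a grayscale array with values in 0..255 yields a boolean grid in which `True` marks patches this sketch treats as sharp enough for OCR. A real implementation would replace the threshold with the paper's cost function, which selects a single maximal region instead of classifying patches independently.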

Publisher

IEEE

Conference_Titel

Document Analysis and Recognition (ICDAR), 2015 13th International Conference on

Type

conf

DOI

10.1109/ICDAR.2015.7333758

Filename

7333758