Author_Institution :
College of Computer and Control Engineering, Nankai University, Tianjin, China
Abstract :
Document images captured by a camera are sometimes partially blurred, which degrades OCR performance even on the non-blurred text regions. This paper proposes a real-time method, named MRRI, to identify the machine-readable region of a partially blurred document image. First, a reference image is generated by low-pass filtering the given document image. Second, a weight matrix is generated by calculating the structural similarity for each patch. Third, a cost function is minimized to identify the maximum machine-readable region, that is, the largest region that OCR can recognize well. In the experiments, two applications of the identified machine-readable region are considered. In the first, Tesseract-OCR is used for word recognition to build an index for a given document image; more words are correctly recognized when OCR is applied to the identified region than when it is applied to the whole image. In the second, the identified machine-readable region is used to assess the quality of a document image; compared with two other image quality assessment methods, the machine-readable-region-based method performs better. MRRI is also lightweight and fast, so it can meet the requirements of real-time applications.
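
The first two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filter, the patch size, or the cost function. The sketch assumes a 5 x 5 box filter as the low-pass filter, assumes that structural similarity is computed between each patch of the input and the same patch of the reference image, and uses the standard single-window SSIM formula with its usual constants. The names `box_blur`, `patch_ssim`, `weight_matrix`, and `readable_mask`, and the parameters `patch`, `k`, and `tau`, are hypothetical. The final thresholding step is only a stand-in for the cost-function minimization, which the abstract does not describe.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter (assumed): k x k mean filter with edge padding, k odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def patch_ssim(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM between two equally sized patches (8-bit range)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def weight_matrix(img, patch=16, k=5):
    """One SSIM value per non-overlapping patch, image vs. its blurred reference."""
    img = img.astype(np.float64)
    ref = box_blur(img, k)
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    w = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * patch, (i + 1) * patch),
                  slice(j * patch, (j + 1) * patch))
            w[i, j] = patch_ssim(img[sl], ref[sl])
    return w

def readable_mask(w, tau=0.5):
    """Stand-in for the cost minimization (assumption): keep patches whose
    SSIM to the reference is low, i.e. patches that low-pass filtering changes
    a lot and that are therefore presumed sharp."""
    return w < tau
```

The design rests on one observation: low-pass filtering barely changes a patch that is already smooth, so its similarity to the reference stays close to 1, whereas a sharp patch loses its fine detail and its similarity drops. In this sketch, calling `readable_mask(weight_matrix(img))` on a grayscale array therefore marks the patches presumed sharp; a real cost function would presumably also enforce a connected or rectangular region.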