Author/Authors :
Habeeb, I.Q. Engineering College - University of Information Technology and Communications, Baghdad, Iraq , Al-Zaydi, Z.Q. Biomedical Engineering - University of Technology, Baghdad, Iraq , Abdulkhudhur, H.N. Directorate of Second Karkh - Ministry of Education, Baghdad, Iraq
Abstract :
The multiple-output OCR approach is used to improve accuracy on images scanned at low
resolution. It combines information from multiple OCR outputs to improve the final OCR
output, and it includes a selection process that chooses the best resulting words among
those outputs. However, most existing selection techniques are not context-aware. This
research therefore proposes a selection technique that overcomes this drawback by using
sentence context information from an N-gram language model to improve the final OCR output.
The proposed selection technique was evaluated against three related existing techniques
using Character Error Rate (CER) and Word Error Rate (WER). Experiments showed relative
decreases of 18.26% in CER and 14.23% in WER compared with the best existing technique.
The proposed selection technique will result in better information extraction through the
automatic recognition of documents scanned at low resolution.
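The idea of context-aware selection and its evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the OCR outputs are already aligned word by word, swaps in a toy bigram model with add-one smoothing for the N-gram language model, and searches all candidate combinations exhaustively, which is only practical for short sentences. The names `train_bigram`, `log_prob`, `select`, `cer`, and `wer` are hypothetical, and the tiny corpus is invented for illustration.

```python
from collections import Counter
from itertools import product
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def train_bigram(corpus):
    """Count unigrams and bigrams; "<s>" marks the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab = len(unigrams)
    seq = ["<s>"] + list(words)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(seq, seq[1:])
    )


def select(candidates, unigrams, bigrams):
    """Pick one word per position so the whole sentence scores highest.

    candidates: one list of candidate words per word position, assumed to
    come from already-aligned OCR outputs (an assumption of this sketch).
    """
    best = max(product(*candidates), key=lambda s: log_prob(s, unigrams, bigrams))
    return " ".join(best)


# Toy data, invented for illustration only.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
unigrams, bigrams = train_bigram(corpus)
candidates = [["tne", "the"], ["cat", "cot"], ["sot", "sat"],
              ["on"], ["the", "tbe"], ["rnat", "mat"]]
selected = select(candidates, unigrams, bigrams)
```

Because each word is scored together with its left neighbour, a candidate is chosen for how well it fits the sentence rather than in isolation, which is the sense in which the selection is context-aware. The `cer` and `wer` functions can then compare `selected` with a ground-truth transcription.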
Keywords :
selection technique , low-resolution images , OCR errors , document recognition