Title of article :
An Approach to Skew Detection of Printed Documents
Author/Authors :
Brodić, Darko (University of Belgrade, Technical Faculty in Bor, Serbia); Mello, Carlos A. B. (Universidade Federal de Pernambuco, Centro de Informática, Brazil); Maluckov, Čedomir A. (University of Belgrade, Technical Faculty in Bor, Serbia); Milivojević, Zoran N. (College of Applied Technical Sciences Niš, Serbia)
Abstract :
In this paper, we propose an approach to estimating the text skew of printed documents. Skew estimation is an important step for preventing errors in later stages of an automatic document processing system, such as text segmentation. Our approach is based on a statistical analysis of the heights of the connected components. In a nutshell, our algorithm consists of four steps: (i) removal of redundant data; (ii) establishment of the connected components, which represent filled convex hulls around each text element; (iii) enlargement of these components using morphological erosion; (iv) extraction of the largest connected component to obtain a first estimate of the text skew. Guided by this estimate, the connected components are enlarged by oriented morphological erosion, and the longest of them is extracted. Statistical moments are then applied to this longest component to evaluate its orientation, which gives the global text skew of the document. Finally, the original document is rotated back by the calculated angle. The performance of the proposed algorithm is evaluated on a custom dataset, and the results support the robustness of our approach.
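The final stage described above (take the longest connected component and read its orientation from statistical moments) can be illustrated with a minimal sketch. The abstract does not give the moment formulas, so the code assumes the standard orientation of the major axis from second-order central moments, θ = ½·atan2(2μ11, μ20 − μ02). It also assumes a binary image given as a list of rows with text pixels equal to 1, uses 8-connectivity, and treats "longest" as the component with the largest horizontal extent. Redundant-data removal, filled convex hulls and oriented erosion are deliberately omitted. All function names are hypothetical; this is not the authors' implementation.

```python
import math
from collections import deque


def connected_components(img):
    """Label 8-connected foreground pixels (value 1); return a list of pixel lists."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this seed pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def horizontal_extent(pixels):
    """Width in pixels of the component's bounding box (assumed notion of 'longest')."""
    xs = [x for _, x in pixels]
    return max(xs) - min(xs) + 1


def moment_orientation_deg(pixels):
    """Major-axis orientation from second-order central moments, in degrees.

    Angles are counterclockwise from horizontal as seen on screen: image rows
    grow downward, so the row index is negated to make y point up.
    """
    n = len(pixels)
    xs = [float(x) for _, x in pixels]
    ys = [float(-y) for y, _ in pixels]
    xc, yc = sum(xs) / n, sum(ys) / n
    mu20 = sum((x - xc) ** 2 for x in xs)
    mu02 = sum((y - yc) ** 2 for y in ys)
    mu11 = sum((x - xc) * (y - yc) for x, y in zip(xs, ys))
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


def estimate_skew_deg(img):
    """Hypothetical simplified pipeline: pick the longest component, measure its tilt."""
    longest = max(connected_components(img), key=horizontal_extent)
    return moment_orientation_deg(longest)
```

For example, a 50-pixel digitized line drawn at 10° next to a small 2×2 noise blob yields an estimate close to 10°, because the blob is ignored as the shorter component. Deskewing would then rotate the page by the negative of the returned angle. Taking the principal axis from moments, rather than fitting a regression line, keeps the estimate symmetric in x and y.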
Keywords :
Document image analysis, Connected component analysis, Statistical analysis, Moment-based method, Skew estimation
Journal title :
Journal of Universal Computer Science (J.UCS)