Title :
Histogram-Based Fast Text Paragraph Image Detection
Author :
Devadeep Shyam;Yan Wang;Alex C. Kot
Author_Institution :
Rapid-Rich Object Search Lab., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Rumormongers often use long paragraphs to make slanderous stories more convincing to readers. Illegal or sensitive rumors uploaded to the internet can be embedded as text in images to bypass text filters. Such images can be detected by existing tools such as OCR, but the detection is very time-consuming. To curb the dissemination of such content, it is useful for governments and internet service providers to detect whether an image contains a substantial amount of text. We therefore focus on developing a fast pre-processing algorithm that detects images containing a substantial amount of text, so that text filters (e.g., OCR) only need to process the suspected images. In this paper, we propose a histogram-based fast detection method to determine whether an image contains paragraphs of text. Binary histograms are extracted from the binary images converted from the input. Because these histograms exhibit a periodic pattern, a step curve is designed and applied to their autocorrelation. The area under the curve is then used to differentiate images with paragraphs from those without. To simulate this scenario, we construct a new dataset of more than 2000 images with and without paragraphs. The results show the effectiveness of the proposed detection system, which achieves 99.5% accuracy at a speed of 15 milliseconds per image when implemented in C++.
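The pipeline outlined in the abstract (binarize, build a binary histogram, autocorrelate, fit a step curve, measure the area under it) can be illustrated with a minimal Python sketch. The abstract does not give the exact construction of the histogram or the step curve, so everything specific here is an assumption made for illustration: the function names, the thresholds `dark_threshold` and `min_dark_fraction`, the use of a row-wise histogram only, and the step curve defined as a non-increasing staircase envelope of the autocorrelation. It is not the authors' C++ implementation.

```python
def row_profile(image, dark_threshold=128, min_dark_fraction=0.02):
    """Binary row histogram: 1 if a row has enough dark pixels, else 0.

    `image` is a list of rows of grayscale values (0-255). Both thresholds
    are illustrative assumptions, not values from the paper.
    """
    profile = []
    for row in image:
        dark = sum(1 for px in row if px < dark_threshold)
        profile.append(1 if dark >= min_dark_fraction * len(row) else 0)
    return profile


def autocorrelation(signal):
    """Normalized autocorrelation of the mean-removed signal, lags 1..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:  # constant profile (e.g. a blank image): no periodicity
        return [0.0] * (n // 2)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(1, n // 2 + 1)
    ]


def step_curve(acf):
    """Assumed step curve: running maximum taken from the largest lag down,
    clamped at zero, which yields a non-increasing staircase envelope."""
    steps, best = [], 0.0
    for value in reversed(acf):
        best = max(best, value)
        steps.append(best)
    steps.reverse()
    return steps


def paragraph_score(image):
    """Mean height of the step curve, i.e. its area divided by the lag range."""
    curve = step_curve(autocorrelation(row_profile(image)))
    return sum(curve) / len(curve) if curve else 0.0


# Synthetic 60x40 test images: evenly spaced "text lines" vs. one dark block.
text_like = [[0 if r % 6 < 3 else 255] * 40 for r in range(60)]
single_block = [[0 if 10 <= r < 30 else 255] * 40 for r in range(60)]
blank = [[255] * 40 for _ in range(60)]

text_score = paragraph_score(text_like)
block_score = paragraph_score(single_block)
blank_score = paragraph_score(blank)
```

On these synthetic inputs the evenly spaced lines keep the staircase high across all lags, while the single block and the blank image let it fall quickly to zero, so a threshold on `paragraph_score` separates them. A real detector would need a binarization step and a threshold tuned on data such as the dataset described in the abstract.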
Keywords :
"Histograms","Correlation","Optical character recognition software","Algorithm design and analysis","Image recognition","Text recognition","Search problems"
Conference_Title :
Computational Intelligence, 2015 IEEE Symposium Series on
Print_ISBN :
978-1-4799-7560-0
DOI :
10.1109/SSCI.2015.76