DocumentCode :
595365
Title :
Character extraction in web image for text recognition
Author :
Bolan Su ; Shijian Lu ; Trung Quy Phan ; Chew Lim Tan
Author_Institution :
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
3042
Lastpage :
3045
Abstract :
Images containing text are frequently used on the Internet for different purposes. Automatic recognition of text in web images plays an important role in the extraction and retrieval of web information. However, web images are usually low resolution and contain artifacts and special effects, which makes word recognition a challenging task even after the text has been localized. In this paper, we propose a robust text recognition technique to efficiently convert web images into text format. The proposed technique first uses L0 norm smoothing to increase the edge contrast of the input web images. The images are then binarized on each color channel, and connected component analysis is applied to identify possible character components. Finally, the character candidates are recognized by an OCR engine after skew correction. Extensive experiments have been conducted on the latest ICDAR 2011 robust reading competition dataset for born-digital text. The experimental results show the superior performance of our proposed technique.
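The two middle stages named in the abstract, per-channel binarization and connected component analysis, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how each channel is binarized, so Otsu thresholding is swapped in here as an assumption. The 4-connectivity rule, the `min_area` filter, the `dark_text` flag, and all function names are likewise hypothetical. L0 norm smoothing, skew correction, and the OCR step are omitted.

```python
from collections import deque


def otsu_threshold(channel):
    """Otsu threshold for a 2-D list of 8-bit values (assumed binarizer)."""
    hist = [0] * 256
    for row in channel:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class "value <= t"
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def connected_components(mask):
    """Group foreground pixels of a non-empty mask using 4-connectivity."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def extract_candidates(image, min_area=2, dark_text=True):
    """Return (channel, (top, left, bottom, right)) boxes per color channel.

    `image` is a 2-D list of (r, g, b) tuples. `min_area` and `dark_text`
    are illustrative parameters, not values taken from the paper.
    """
    candidates = []
    for ch in range(3):
        channel = [[px[ch] for px in row] for row in image]
        t = otsu_threshold(channel)
        mask = [[(v <= t) if dark_text else (v > t) for v in row]
                for row in channel]
        for pixels in connected_components(mask):
            if len(pixels) < min_area:
                continue  # drop specks too small to be a character
            ys = [y for y, _ in pixels]
            xs = [x for _, x in pixels]
            candidates.append((ch, (min(ys), min(xs), max(ys), max(xs))))
    return candidates
```

Calling `extract_candidates(image)` on a small RGB grid yields one bounding box per surviving component in each channel. In a full pipeline, these boxes would be the character candidates passed on to skew correction and OCR.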
Keywords :
Internet; edge detection; image colour analysis; image retrieval; optical character recognition; smoothing methods; text analysis; text detection; ICDAR 2011 robust reading competition dataset; Internet; L0 norm smoothing; OCR engine; Web image; Web information extraction; Web information retrieval; automatic text recognition; born-digital text; character component identification; character extraction; color channel; connected component analysis; edge contrast; image binarization; low resolution images; robust text recognition technique; skew correction; text format; text localization; word recognition; Image color analysis; Image recognition; Optical character recognition software; Robustness; Smoothing methods; Testing; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460806