Regional Information Center for Science and Technology - Localizing and segmenting text in images and videos

DocumentCode :

1280369

Title :

Localizing and segmenting text in images and videos

Author :

Lienhart, Rainer ; Wernicke, Axel

Author_Institution :

Intel Corp., Santa Clara, CA, USA

Volume :

Issue :

fYear :

2002

fDate :

4/1/2002

Firstpage :

256

Lastpage :

268

Abstract :

Many images, especially those used for page design on Web pages, as well as videos contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, serving as a starting point for candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on white background. For videos, temporal redundancy is exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.
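
The abstract states that detector output at all scales and positions is integrated into a single text-saliency map, but it does not say how. The following is only a minimal sketch of one plausible scheme, assuming each scale yields a 2-D confidence map that is upsampled by nearest-neighbour lookup and summed. The function name `integrate_saliency`, its parameters, and the summation rule are hypothetical and are not taken from the paper.

```python
import numpy as np

def integrate_saliency(responses, out_shape):
    """Accumulate per-scale detector responses into one saliency map.

    responses: list of 2-D arrays, one per scale; each is a hypothetical
               text-confidence map computed on a downscaled image copy.
    out_shape: (height, width) of the original image.
    """
    height, width = out_shape
    saliency = np.zeros(out_shape, dtype=np.float64)
    for resp in responses:
        h, w = resp.shape
        # Nearest-neighbour upsampling: map each output pixel back to
        # the response cell that covers it at this scale.
        rows = np.arange(height) * h // height
        cols = np.arange(width) * w // width
        # Assumed integration rule: plain summation across scales.
        saliency += resp[np.ix_(rows, cols)]
    return saliency

# Toy input: a full-resolution map and a half-resolution map.
full = np.zeros((4, 4))
full[1, 2] = 1.0
half = np.zeros((2, 2))
half[0, 1] = 0.5
saliency = integrate_saliency([full, half], (4, 4))
```

In this toy input the coarse cell `half[0, 1]` covers output rows 0-1 and columns 2-3, so pixels where both scales respond accumulate a higher score than pixels seen at one scale only.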

Keywords :

feedforward neural nets; image classification; image retrieval; image segmentation; object detection; optical character recognition; video signal processing; MPEG-4 object encoding; OCR; binary image; high-level semantics; multilayer feed-forward network; multiresolution approach; text detection; text extraction; text localization; text recognition; text segmentation; video indexing; video segmentation; Encoding; Feedforward systems; Image resolution; Image segmentation; Indexing; Nonhomogeneous media; Pixel; Text recognition; Videos; Web pages

fLanguage :

English

Journal_Title :

Circuits and Systems for Video Technology, IEEE Transactions on

Publisher :

IEEE

ISSN :

1051-8215

Type :

jour

DOI :

10.1109/76.999203

Filename :

999203

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1280369