• DocumentCode
    1759569
  • Title

    A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video

  • Author

    Liang Wu; Palaiahnakote Shivakumara; Tong Lu; Chew Lim Tan

  • Author_Institution
    Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
  • Volume
    17
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 2015
  • Firstpage
    1137
  • Lastpage
    1152
  • Abstract
    Text detection and tracking in video is challenging due to contrast, resolution and background variations, and different orientations and text movements. In addition, the presence of both caption and scene texts in video aggravates the problem because these two text types differ significantly in their characteristics. This paper proposes a new technique for detecting and tracking video texts of any orientation by using spatial and temporal information, respectively. The technique explores gradient directional symmetry at component level for smoothing edge components before text detection. Spatial information is preserved by forming Delaunay triangulation in a novel way at this level, which results in text candidates. Text characteristics are then proposed in a different way for eliminating false text candidates, which results in potential text candidates. Then grouping is proposed for combining potential text candidates regardless of orientation based on the nearest neighbor criterion. To tackle the problems of multi-font and multi-sized texts, we propose multi-scale integration by a pyramid structure, which helps in extracting full text lines. Then, the detected text lines are tracked in video by matching the subgraphs of triangulation. Experimental results for text detection and tracking on our video dataset, the benchmark video datasets, and the natural scene image benchmark datasets show that the proposed method is superior to the state-of-the-art methods in terms of recall, precision, and F-measure.
  • Keywords
    graph theory; mesh generation; text detection; video signal processing; Delaunay triangulation; F-measure; benchmark video datasets; component level; edge components; gradient directional symmetry; multi-oriented scene text line detection; multifont texts; multiscale integration; multisized texts; nearest neighbor criterion; precision; pyramid structure; recall; scene image benchmark datasets; spatial information; subgraphs; temporal information; text candidates; text movements; text tracking; video detection; video tracking; Feature extraction; Histograms; Image color analysis; Image edge detection; Shape; Smoothing methods; Tracking; multi-oriented video text detection; multi-sized text detection
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2015.2443556
  • Filename
    7121019