• DocumentCode
    1096289
  • Title

    A Multimodal Scheme for Program Segmentation and Representation in Broadcast Video Streams

  • Author

    Wang, Jinqiao ; Duan, Lingyu ; Liu, Qingshan ; Lu, Hanqing ; Jin, Jesse S.

  • Author_Institution
    Chinese Acad. of Sci., Beijing
  • Volume
    10
  • Issue
    3
  • fYear
    2008
  • fDate
    4/1/2008 12:00:00 AM
  • Firstpage
    393
  • Lastpage
    408
  • Abstract
    With the advance of digital video recording and playback systems, there is an evident need to manage recorded TV programs efficiently so that users can readily locate and browse their favorite programs. In this paper, we propose a multimodal scheme to segment and represent TV video streams. The scheme aims to recover the temporal and structural characteristics of TV programs using visual, auditory, and textual information. For visual cues, we develop a novel concept named program-oriented informative images (POIM) to identify candidate points correlated with the boundaries of individual programs. For audio cues, a multiscale Kullback-Leibler (K-L) distance is proposed to locate audio scene changes (ASC), and the ASC are then aligned with video scene changes to represent candidate program boundaries. In addition, latent semantic analysis (LSA) is adopted to calculate the textual content similarity (TCS) between shots to model the inter-program similarity and intra-program dissimilarity in terms of speech content. Finally, we fuse the multimodal features of POIM, ASC, and TCS to detect the boundaries of programs, including individual commercials (spots). To support an effective program guide and attractive content browsing, we propose a multimodal representation of individual programs that summarizes each program using POIM images, key frames, and textual keywords. Extensive experiments are carried out on the open benchmark TRECVID 2005 corpus, and promising results have been achieved. Compared with the electronic program guide (EPG), our solution provides a more generic approach to determining the exact boundaries of diverse TV programs, including dramatic spots.
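    The abstract names a multiscale K-L distance for locating audio scene changes but does not give its formulation. The sketch below is only an illustration under stated assumptions: each window of a 1-D audio feature is modeled as a univariate Gaussian, the symmetric K-L distance is computed between the windows on either side of a candidate point, and the scores are averaged over several window sizes. The function names, the window sizes, and the averaging rule are hypothetical and are not taken from the paper.

```python
import math


def gaussian_stats(window):
    """Mean and variance of a 1-D feature window (variance floored for stability)."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return mean, max(var, 1e-8)


def symmetric_kl(left, right):
    """Symmetric K-L distance between univariate Gaussians fitted to two windows."""
    m1, v1 = gaussian_stats(left)
    m2, v2 = gaussian_stats(right)
    d2 = (m1 - m2) ** 2
    # KL(p||q) + KL(q||p) for univariate Gaussians; the log terms cancel.
    return 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1) - 1.0


def multiscale_kl(features, t, scales=(4, 8, 16)):
    """Average symmetric K-L distance around index t over several window sizes.

    The scales and the plain average are assumptions made for illustration.
    """
    scores = []
    for w in scales:
        if t - w < 0 or t + w > len(features):
            continue  # skip scales that do not fit around t
        scores.append(symmetric_kl(features[t - w:t], features[t:t + w]))
    return sum(scores) / len(scores) if scores else 0.0


# Synthetic feature track with an abrupt change at index 16.
features = [0.0, 0.1] * 8 + [5.0, 5.1] * 8
change_score = multiscale_kl(features, 16)
steady_score = multiscale_kl(features, 8)
```

    On this synthetic track the score at index 16 is far larger than the score at index 8, which is the behavior a scene-change detector would threshold on. Real audio features are multidimensional, so a practical version would use multivariate statistics.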
  • Keywords
    digital video broadcasting; video streaming; TV programs; audio scene changes; broadcast video streams; digital video recording; electronic program guide; latent semantic analysis; multimodal scheme; multiscale Kullback-Leibler distance; playback systems; program segmentation; program-oriented informative images; textual content similarity; Broadcast video; TV program segmentation; multimodal fusion
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2008.917362
  • Filename
    4469884