• DocumentCode
    3239865
  • Title

    A multi-modal video analysis system

  • Author

    Zhang, Shilin ; Li, Heping ; Zhang, Shuwu

  • Author_Institution
    High Technol. & Innovation Center, Chinese Acad. of Sci., Beijing, China
  • fYear
    2011
  • fDate
    27-29 May 2011
  • Firstpage
    176
  • Lastpage
    179
  • Abstract
    In this paper, we present a system for Chinese news program management based on cross-media video analysis. Audio, caption text and video frames are all important for a person to understand the meaning of a video. Given these facts, we devised a system integrating continuous Chinese speech recognition (ASR), video caption text recognition (VOCR) and object/scene recognition (OR). The news program is first segmented into a series of segments by anchor person detection. Then the ASR and VOCR recognition results are treated as two paragraphs of text, and we convert them into two bags of words to represent the original recognition results. By analysing the correspondence between the words in the ASR result and the VOCR result, we obtain a trusted set of words that describes the video content of a news program segment. In the last step, we implement object/scene classification based on keyframe analysis, aided by the recognized words above. Experiments show that our news management system is efficient.
  • Keywords
    image classification; image segmentation; information resources; object recognition; speech recognition; text analysis; video signal processing; ASR; Chinese news program management; VOCR; anchor person detection; caption text; continuous Chinese speech recognition; cross media video analysis; keyframes analysis; multimodal video analysis system; news program segmentation; object classification; object recognition; scene classification; scene recognition; text paragraphs; video caption text recognition; video content; video frames; Feature extraction; Monitoring; Speech recognition; Support vector machines; Text recognition; Visualization; ASR; BOW; HOG; SIFT; SVM; VOCR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-61284-485-5
  • Type

    conf

  • DOI
    10.1109/ICCSN.2011.6014699
  • Filename
    6014699
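
The abstract describes turning the ASR and VOCR outputs into two bags of words and keeping the words that correspond across both. The record does not say how that correspondence is computed, so the following is only a minimal sketch under stated assumptions: both recognition results are already segmented into whitespace-separated words, and "correspondence" is approximated by exact word match. The function names bag_of_words and trusted_words are hypothetical and do not come from the paper.

```python
from collections import Counter


def bag_of_words(text):
    """Turn whitespace-separated text into a bag (word -> count)."""
    return Counter(text.split())


def trusted_words(asr_text, vocr_text):
    """Return the words found in both recognition results.

    Assumption: exact string match stands in for the paper's
    unspecified correspondence analysis.
    """
    asr_bag = bag_of_words(asr_text)
    vocr_bag = bag_of_words(vocr_text)
    # Counter & Counter keeps each shared word with its minimum count.
    return asr_bag & vocr_bag
```

For example, calling trusted_words on an ASR transcript and a caption string for the same segment yields only the words both recognizers agree on, which could then serve as hints for the object/scene classification step.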