• DocumentCode
    1037544
  • Title
    Integrated Mining of Visual Features, Speech Features, and Frequent Patterns for Semantic Video Annotation
  • Author
    Tseng, Vincent S. ; Su, Ja-Hwung ; Huang, Jhih-Hong ; Chen, Chih-Jen
  • Author_Institution
    Nat. Cheng Kung Univ., Tainan
  • Volume
    10
  • Issue
    2
  • fYear
    2008
  • Firstpage
    260
  • Lastpage
    267
  • Abstract
    To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus on either the analysis of low-level features or simple semantic concepts, and they cannot reduce the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns existing in the video. The proposed method consists of two main phases: 1) construction of four kinds of predictive annotation models, namely speech-association, visual-association, visual-sequential, and statistical models, from annotated videos; and 2) fusion of these models to annotate un-annotated videos automatically. The main advantage of the proposed method is that visual features, speech features, and semantic patterns are all considered simultaneously. Moreover, the use of high-level rules can effectively compensate for the weakness of statistics-based methods in identifying complex and broad keywords in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to substantially improve annotation performance in terms of precision, recall, and F-measure.
  • Keywords
    data mining; feature extraction; information retrieval; multimedia communication; speech processing; statistical analysis; video signal processing; frequent semantic pattern; integrated mining; keyword identification; multimedia information retrieval; speech-association model; statistical model; video annotation; video content analysis; visual-association model; visual-sequential model; Content based retrieval; Image retrieval; Information analysis; Layout; NIST; Pattern analysis; Predictive models; Speech analysis; Association rule; frequent semantic patterns; sequential patterns
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2007.911832
  • Filename
    4432628