• DocumentCode
    2076238
  • Title

    Automatic Video Annotation by Mining Speech Transcripts

  • Author

    Velivelli, Atulya ; Huang, Thomas S.

  • Author_Institution
    University of Illinois at Urbana-Champaign
  • fYear
    2006
  • fDate
    17-22 June 2006
  • Firstpage
    115
  • Lastpage
    115
  • Abstract
    We describe a model for automatic prediction of text annotations for video data. The speech transcripts of videos are clustered using an aspect model, and keywords are extracted based on the aspect distribution. Thus we capture the semantic information available in the video data. This technique for automatic keyword vocabulary construction makes the labelling of video data a very easy task. We then build a video shot vocabulary by utilizing both static images and motion cues. We use a maximum entropy criterion to learn the conditional exponential model by defining constraint features over combinations of shot vocabulary and keyword vocabulary entries. Our method uses a maximum a posteriori estimate of the exponential model to predict the annotations. We evaluate the ability of our model to predict annotations in terms of mean negative log-likelihood and retrieval performance on the test set. A comparison of the exponential model with baseline methods indicates that the results are encouraging.
  • Keywords
    Content based retrieval; Data mining; Entropy; Information retrieval; Labeling; Predictive models; Speech; Streaming media; Testing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshop, 2006. CVPRW '06. Conference on
  • Print_ISBN
    0-7695-2646-2
  • Type

    conf

  • DOI
    10.1109/CVPRW.2006.39
  • Filename
    1640558