  • DocumentCode
    3670176
  • Title
    Multimodal object recognition from visual and audio sequences
  • Author
    Weipeng He;Haojun Guan;Jianwei Zhang
  • Author_Institution
    TAMS, Department of Informatics, University of Hamburg, Vogt-Kö
  • fYear
    2015
  • Firstpage
    133
  • Lastpage
    138
  • Abstract
    This paper describes a visual-audio object recognition system using hidden Markov models (HMMs). The system uses the bag-of-words model with scale-invariant feature transform (SIFT) descriptors as the visual feature and mel-frequency cepstral coefficients (MFCCs) as the audio feature. Objects are classified by computing their probabilities under the learned HMMs. Two different fusion methods are used in the system: feature fusion and decision fusion. The former learns a joint probability distribution with one HMM, while the latter learns a separate distribution for each modality and combines the two under the conditional independence assumption. Experiments on a dataset of 33 different household objects are carried out to evaluate the performance of the two fusion methods as well as unimodal approaches. The results show that both fusion methods outperform the unimodal methods, while the two fusion methods perform mostly comparably to each other.
  • Keywords
    "Hidden Markov models","Visualization","Object recognition","Joints","Feature extraction","Videos","Covariance matrices"
  • Publisher
    IEEE
  • Conference_Titel
    Multisensor Fusion and Integration for Intelligent Systems (MFI), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/MFI.2015.7295798
  • Filename
    7295798
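
The decision fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each modality's HMM has already produced a log-likelihood for every candidate object class; under conditional independence, the per-class scores are simply added in log space. The function name `fuse_decisions`, the class labels, and all numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def fuse_decisions(visual_loglik, audio_loglik, log_prior=None):
    """Combine per-class log-likelihoods from two modalities.

    Assumes the visual and audio observations are conditionally
    independent given the class, so the joint log-likelihood is the sum
    of the two. Returns the best label and normalized class posteriors.
    """
    if set(visual_loglik) != set(audio_loglik):
        raise ValueError("both modalities must score the same classes")

    scores = {}
    for label in visual_loglik:
        # A uniform prior contributes the same constant to every class.
        prior = 0.0 if log_prior is None else log_prior[label]
        scores[label] = visual_loglik[label] + audio_loglik[label] + prior

    # Normalize with log-sum-exp to avoid underflow on very small values.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    posterior = {label: math.exp(s - log_norm) for label, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical log-likelihoods, as two separately trained HMMs might report.
visual = {"cup": -120.0, "bottle": -118.0, "keys": -130.0}
audio = {"cup": -40.0, "bottle": -45.0, "keys": -39.0}

best, posterior = fuse_decisions(visual, audio)
print(best)
```

In this made-up example the visual scores alone favor "bottle" and the audio scores alone favor "keys", while the fused score favors "cup". Feature fusion, by contrast, would concatenate the two feature streams and train a single HMM on the joint observations.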