• DocumentCode
    1694091
  • Title

    Audio head pose estimation using the direct to reverberant speech ratio

  • Author

    Barnard, Mark ; Wang, Wenwu ; Kittler, Josef

  • Author_Institution
    Centre for Vision, Speech & Signal Processing, University of Surrey, Guildford, UK
  • fYear
    2013
  • Firstpage
    8056
  • Lastpage
    8060
  • Abstract
    Head pose is an important cue in many applications, such as speech recognition and face recognition. Most approaches to head pose estimation to date have used visual information to model and recognise a subject's head in different configurations. These approaches have a number of limitations, such as an inability to cope with occlusions, changes in the appearance of the head, and low-resolution images. We present here a novel method for determining coarse head pose orientation purely from audio information, exploiting the direct to reverberant speech energy ratio (DRR) within a highly reverberant meeting room environment. Our hypothesis is that a speaker facing towards a microphone will have a higher DRR and a speaker facing away from the microphone will have a lower DRR. This hypothesis is confirmed by experiments conducted on the publicly available AV16.3 database.
  • Keywords
    audio signal processing; microphones; pose estimation; reverberation; speech processing; AV16.3 database; DRR; audio head pose estimation; coarse head pose orientation; direct to reverberant speech energy ratio; hypothesis; microphone; reverberant speech ratio; Arrays; Estimation; Head; Microphones; Noise; Speech; Speech processing; Audio Head Pose; Direct to Reverberant Speech Ratio;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6639234
  • Filename
    6639234
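
The quantity named in the abstract, the direct to reverberant ratio (DRR), can be illustrated with a minimal sketch. This is not the estimator used in the paper, which works from recorded speech. It is the common textbook definition computed from a known room impulse response: energy in a short window around the direct-path peak divided by energy in everything after it, in dB. The function names `drr_db` and `toy_rir`, the 2.5 ms window, the synthetic decaying-noise tail, and the use of a smaller direct-path gain as a stand-in for a speaker facing away are all assumptions made for illustration.

```python
import numpy as np


def drr_db(rir, fs, direct_window_ms=2.5):
    """Direct to reverberant ratio in dB from a room impulse response.

    Assumption: the direct path is the largest-magnitude sample, and the
    direct sound is everything within +/- direct_window_ms of it.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    stop = min(peak + half + 1, rir.size)
    direct_energy = np.sum(rir[start:stop] ** 2)
    reverb_energy = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)


def toy_rir(direct_gain, fs=16000, rt60=0.8, seed=0):
    """Hypothetical impulse response: one direct spike plus a noise tail.

    The tail amplitude decays by 60 dB over rt60 seconds. Only the
    direct-path gain changes between calls that share a seed.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * rt60)
    t = np.arange(n) / fs
    rir = 0.02 * rng.standard_normal(n) * np.exp(-6.91 * t / rt60)
    delay = int(0.005 * fs)   # direct sound arrives after 5 ms
    rir[:delay] = 0.0         # nothing arrives before the direct path
    rir[delay] = direct_gain
    return rir


fs = 16000
facing = drr_db(toy_rir(direct_gain=1.0, fs=fs), fs)  # strong direct path
away = drr_db(toy_rir(direct_gain=0.3, fs=fs), fs)    # weak direct path
print(f"facing: {facing:.1f} dB, away: {away:.1f} dB")
```

In this toy setting the reverberant tail is identical in both cases, so the weaker direct path yields the lower DRR, which is the direction of the effect the abstract hypothesises for a speaker turned away from the microphone.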