  • DocumentCode
    3198441
  • Title
    LipActs: Efficient representations for visual speakers
  • Author
    Zavesky, Eric
  • Author_Institution
    AT&T Labs. Res., Middletown, NJ, USA
  • fYear
    2011
  • fDate
    11-15 July 2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Video-based lip activity analysis has been used successfully to assist speech recognition for almost a decade. Surprisingly, the same capability has not been widely used for near real-time visual speaker retrieval and verification, owing to tracking complexity, inadequate or difficult feature determination, and the need for large amounts of pre-labeled data for model training. This paper explores the performance of several solutions that use modern histogram of oriented gradients (HOG) features and several quantization techniques, and it analyzes the benefits of temporal sampling and spatial partitioning to derive a representation called LipActs. Two datasets are used for evaluation: one with 81 participants derived from YouTube content of varying quality, and one with 3 participants captured with a forward-facing mobile video camera in 10 environments with varied lighting and capture angles. On these datasets, LipActs with a moderate number of pooled temporal frames and multi-resolution spatial quantization offer a 37-73% improvement over raw features when optimizing for the lowest equal error rate (EER).
  • Keywords
    computational complexity; speech recognition; video signal processing; EER; HOG; LipActs; YouTube content; equal error rate; histogram of oriented gradients; mobile video camera; quantization techniques; spatial partitioning; speech recognition; temporal sampling; tracking complexity; video based lip activity analysis; visual speaker retrieval; Detectors; Face; Feature extraction; Histograms; Quantization; Visualization; Vocabulary; feature extraction; learning systems; verification; video analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Multimedia and Expo (ICME), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1945-7871
  • Print_ISBN
    978-1-61284-348-3
  • Electronic_ISBN
    1945-7871
  • Type
    conf
  • DOI
    10.1109/ICME.2011.6012102
  • Filename
    6012102