• DocumentCode
    67370
  • Title
    Convex Weighting Criteria for Speaking Rate Estimation
  • Author
    Jiao, Yishan; Berisha, Visar; Tu, Ming; Liss, Julie
  • Author_Institution
    Dept. of Speech & Hearing Sci., Arizona State Univ., Tempe, AZ, USA
  • Volume
    23
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 2015
  • Firstpage
    1421
  • Lastpage
    1430
  • Abstract
    Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose the speaking rate estimation problem as that of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal, and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Instead, the proposed method learns an optimal weighting function that can be applied directly to time-frequency features of a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting functions, as well as an adaptation strategy that customizes the approach to a particular speaker using minimal training. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech, the results show a high correlation between the estimated speaking rate and the ground-truth values.
  • Keywords
    speech processing; time-frequency analysis; ICSI Switchboard spontaneous speech corpus; TIMIT corpus; convex cost functions; convex weighting criteria; dysarthric speech corpus; optimal weighting function; speaking rate estimation problem; speech signal processing; speech waveform; temporal density function; time-frequency features; Feature extraction; Optimization; Speech; Speech processing; Training; Speaking rate estimation; convex optimization; dysarthria; speaker adaptation; vowel density function
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2434213
  • Filename
    7109110
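The abstract describes a temporal density whose integral over an interval gives the speaking rate, with a weighting function learned by minimizing a convex cost. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes the density is a linear weighting of frame-level time-frequency features, that training targets are per-utterance vowel (or syllable) counts, and that a ridge least-squares objective stands in for the two convex cost functions, which the abstract does not specify. The names `fit_weights`, `density`, and `rate_in_interval` are hypothetical.

```python
import numpy as np


def fit_weights(features, counts, lam=1e-3):
    # Integrate each utterance's frame-level features over time: the
    # integral of the density X @ w equals (sum of frames) @ w.
    S = np.stack([X.sum(axis=0) for X in features])  # shape (N, F)
    c = np.asarray(counts, dtype=float)              # shape (N,)
    n_feat = S.shape[1]
    # Closed-form minimizer of ||S w - c||^2 + lam * ||w||^2, a convex
    # ridge least-squares stand-in (assumption) for the paper's costs.
    return np.linalg.solve(S.T @ S + lam * np.eye(n_feat), S.T @ c)


def density(X, w):
    # Per-frame density: a weighted sum of the time-frequency features.
    return X @ w


def rate_in_interval(X, w, start, stop, frame_rate):
    # Integrate the density over frames [start, stop) and divide by the
    # interval duration in seconds to get units per second.
    count = density(X, w)[start:stop].sum()
    duration = (stop - start) / frame_rate
    return count / duration


if __name__ == "__main__":
    # Synthetic demo: random features and counts generated from a known w.
    rng = np.random.default_rng(0)
    w_true = np.array([0.02, 0.05, 0.01, 0.03])
    feats = [rng.random((int(rng.integers(50, 100)), 4)) for _ in range(50)]
    counts = [density(X, w_true).sum() for X in feats]
    w_hat = fit_weights(feats, counts)
    print(np.round(w_hat, 3))
    print(rate_in_interval(feats[0], w_hat, 0, 50, frame_rate=100.0))
```

Because the integral of a linear density over an utterance equals the summed feature vector multiplied by `w`, the fit in this sketch reduces to a convex quadratic problem with a closed-form solution. The paper's own criteria and features may differ.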