• DocumentCode
    6540
  • Title
    Intrinsic Spectral Analysis
  • Author
    Jansen, Aren ; Niyogi, P.
  • Author_Institution
    Univ. of Chicago, Chicago, IL, USA
  • Volume
    61
  • Issue
    7
  • fYear
    2013
  • fDate
    1-Apr-13
  • Firstpage
    1698
  • Lastpage
    1710
  • Abstract
    It has long been posited that the space of speech sounds is inherently low dimensional, the result of a relatively small number of degrees of freedom involved in the human vocal apparatus. We attempt to formalize this notion by analyzing a simple physical model of the vocal tract and demonstrating that it produces transfer functions whose spectra are restricted to low dimensional manifolds embedded in an infinite dimensional space of square integrable functions. While source convolution and channel distortion preclude analytic recovery of the articulatory configuration from the observed signal, we present a data-driven unsupervised learning algorithm called Intrinsic Spectral Analysis designed to recover from a stream of unannotated and unsegmented audio a set of nonlinear basis functions for the speech manifold. Projecting a traditional spectrogram onto this nonlinear basis defines a novel acoustic representation that is demonstrated to have phonological significance, improved phonetic separability, inherent speaker independence, and complementarity with standard acoustic front-ends.
  • Keywords
    speech processing; articulatory configuration; channel distortion; data-driven unsupervised learning algorithm; degree of freedom; human vocal apparatus; improved phonetic separability; infinite dimensional space; inherent speaker independence; intrinsic spectral analysis; low dimensional manifolds; phonological significance; source convolution; speech sound space; square integrable functions; transfer functions; unannotated audio; unsegmented audio; vocal tract physical model; Acoustics; Electron tubes; Manifolds; Signal processing algorithms; Speech; Speech recognition; Transfer functions; Manifold learning; speech processing; speech recognition; unsupervised learning
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1053-587X
  • Type
    jour
  • DOI
    10.1109/TSP.2013.2238931
  • Filename
    6409472