  • DocumentCode
    3430402
  • Title
    Towards unsupervised speech processing
  • Author
    Glass, James
  • Author_Institution
    MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA
  • fYear
    2012
  • fDate
    2-5 July 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, and requiring annotated training corpora consisting of parallel speech and text data. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer usually remains relatively static. While this approach has been effective for problems where there is adequate human expertise, and labelled corpora are available, it is challenged by less-supervised or unsupervised scenarios. It also contrasts sharply with human speech processing where learning is an inherent ability. In this paper, three alternative scenarios for speech recognition “training” are described, each requiring decreasing amounts of human expertise and annotated resources, and increasing amounts of unsupervised learning. A speech deciphering challenge is then suggested whereby speech recognizers must learn sub-word inventories and word pronunciations from unannotated speech, supplemented with only non-parallel text resources. It is argued that such a capability will help alleviate the language barrier that currently limits the scope of speech recognition capabilities around the world, and empower speech recognizers to continually learn and evolve through use.
  • Keywords
    speech recognition; unsupervised learning; acoustic model; automatic speech recognizer; language barrier; language model; lexicons; nonparallel text resource; parallel speech; phonetic inventory; speech deciphering; speech recognition; text data; unsupervised learning; unsupervised speech processing; Acoustics; Hidden Markov models; Humans; Speech; Speech processing; Speech recognition; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
  • Conference_Location
    Montreal, QC
  • Print_ISBN
    978-1-4673-0381-1
  • Electronic_ISBN
    978-1-4673-0380-4
  • Type
    conf
  • DOI
    10.1109/ISSPA.2012.6310546
  • Filename
    6310546