Towards unsupervised speech processing

Author

Glass, James

Author_Institution

MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA

fYear

2012

fDate

2-5 July 2012

Firstpage

1

Lastpage

4

Abstract

The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, and requiring annotated training corpora consisting of parallel speech and text data. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer usually remains relatively static. While this approach has been effective for problems where there is adequate human expertise, and labelled corpora are available, it is challenged by less-supervised or unsupervised scenarios. It also contrasts sharply with human speech processing where learning is an inherent ability. In this paper, three alternative scenarios for speech recognition “training” are described, each requiring decreasing amounts of human expertise and annotated resources, and increasing amounts of unsupervised learning. A speech deciphering challenge is then suggested whereby speech recognizers must learn sub-word inventories and word pronunciations from unannotated speech, supplemented with only non-parallel text resources. It is argued that such a capability will help alleviate the language barrier that currently limits the scope of speech recognition capabilities around the world, and empower speech recognizers to continually learn and evolve through use.

Keywords

speech recognition; unsupervised learning; acoustic model; automatic speech recognizer; language barrier; language model; lexicons; nonparallel text resource; parallel speech; phonetic inventory; speech deciphering; text data; unsupervised speech processing; Acoustics; Hidden Markov models; Humans; Speech; Speech processing; Training

fLanguage

English

Publisher

IEEE

Conference_Title

2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA)

Conference_Location

Montreal, QC

Print_ISBN

978-1-4673-0381-1

Electronic_ISBN

978-1-4673-0380-4

Type

conf

DOI

10.1109/ISSPA.2012.6310546

Filename

6310546