DocumentCode
3430402
Title
Towards unsupervised speech processing
Author
Glass, James
Author_Institution
MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA
fYear
2012
fDate
2-5 July 2012
Firstpage
1
Lastpage
4
Abstract
The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, and requiring annotated training corpora consisting of parallel speech and text data. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer usually remains relatively static. While this approach has been effective for problems where there is adequate human expertise, and labelled corpora are available, it is challenged by less-supervised or unsupervised scenarios. It also contrasts sharply with human speech processing where learning is an inherent ability. In this paper, three alternative scenarios for speech recognition “training” are described, each requiring decreasing amounts of human expertise and annotated resources, and increasing amounts of unsupervised learning. A speech deciphering challenge is then suggested whereby speech recognizers must learn sub-word inventories and word pronunciations from unannotated speech, supplemented with only non-parallel text resources. It is argued that such a capability will help alleviate the language barrier that currently limits the scope of speech recognition capabilities around the world, and empower speech recognizers to continually learn and evolve through use.
Keywords
speech recognition; unsupervised learning; acoustic model; automatic speech recognizer; language barrier; language model; lexicons; nonparallel text resource; parallel speech; phonetic inventory; speech deciphering; text data; unsupervised speech processing; Acoustics; Hidden Markov models; Humans; Speech; Speech processing; Speech recognition; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
Conference_Location
Montreal, QC
Print_ISBN
978-1-4673-0381-1
Electronic_ISBN
978-1-4673-0380-4
Type
conf
DOI
10.1109/ISSPA.2012.6310546
Filename
6310546