DocumentCode
3015366
Title
An investigation on the use of acoustic sub-word units for automatic speech recognition
Author
Wilpon, J.G. ; Juang, B.H. ; Rabiner, L.R.
Author_Institution
AT&T Bell Laboratories, Murray Hill, New Jersey
Volume
12
fYear
1987
fDate
31868
Firstpage
821
Lastpage
824
Abstract
An approach to automatic speech recognition is described which attempts to link together ideas from pattern recognition such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches. In this approach, the basic sub-word units are defined acoustically, but not necessarily phonetically. An algorithm was developed which automatically decomposed speech into multiple sub-word segments, based solely upon strict acoustic criteria, without any reference to linguistic content. By repeating this procedure on a large corpus of speech data we obtained an extensive pool of unlabeled sub-word speech segments. Then using well defined clustering techniques, a small set of representative acoustic sub-word units (e.g. an inventory of units) was created. This process is fast, easy to use, and required no human intervention. The interpretation of these sub-word units, in a linguistic sense, in the context of word decoding is an important issue which must be addressed for them to be useful in a large vocabulary system. We have not yet addressed this issue; instead a couple of simple experiments were performed to determine if these acoustic sub-word units had any potential value for speech recognition. For these experiments we used a connected digits database from a single female talker. A 25 sub-word unit codebook of acoustic segments was created from about 1600 segments drawn from 100 connected digit strings. A simple isolated digit recognition system, designed using the statistics of the codewords in the acoustic sub-word unit codebook had a recognition accuracy of 100%. In another experiment a connected digit recognition system was created with representative digit templates created by concatenating the sub-word units in an appropriate manner. The system had a string recognition accuracy of 96%.
Keywords
Automatic speech recognition; Clustering algorithms; Databases; Decoding; Hidden Markov models; Humans; Pattern recognition; Speech recognition; Statistics; Vocabulary;
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87.
Type
conf
DOI
10.1109/ICASSP.1987.1169589
Filename
1169589
Link To Document