Title :
A study of speech recognition for children and the elderly
Author :
Wilpon, Jay G. ; Jacobsen, Claus N.
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
Although children and the elderly have obvious needs for voice-operated interfaces, hardly anything is known about how current automatic speech recognition technology performs for these groups. In this paper we report the results of a thorough investigation into this question using a connected digit recognizer and a major telephone speech database. One would generally assume that recognizing speech from these groups is only a matter of having enough sufficiently representative training data. This turns out to be true only as long as the speakers are between 15 and approximately 70 years old. Outside this range the error rates increase dramatically, even with balanced amounts of training data. For males, the lower limit is very sharp and can be attributed to the change of pitch frequency during puberty. For females, the lower limit is gradual and is caused only by the slowly changing length of the vocal tract. For both genders, the upper limit is very gradual and can possibly be attributed to changes in the glottis area and the internal control loops of the human articulatory system. The paper presents some supporting evidence for the above assertions and gives results for various attempts to improve the performance. Recognition of speech from children and the elderly will require much more research if we are to fully understand the effect of these age groups' characteristics on current and future speech recognition systems.
Keywords :
speech recognition; speech synthesis; 15 to 70 yr; age range; automatic speech recognition technology; children; connected digit recognizer; elderly; error rates; females; glottis area; human articulatory system; internal control loop; males; performance; pitch frequency; puberty; telephone speech database; vocal tract length; voice operated interfaces; Automatic speech recognition; Control systems; Databases; Error analysis; Frequency; Humans; Senior citizens; Telephony; Training data
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.541104