DocumentCode
3437993
Title
A study of speech recognition for children and the elderly
Author
Wilpon, Jay G. ; Jacobsen, Claus N.
Author_Institution
AT&T Bell Labs., Murray Hill, NJ, USA
Volume
1
fYear
1996
fDate
7-10 May 1996
Firstpage
349
Abstract
Although children and the elderly have obvious needs for voice-operated interfaces, hardly anything is known about the performance of current automatic speech recognition technology with these people. In this paper we report the results of a thorough investigation into this field using a connected digit recognizer and a major telephone speech database. One would generally assume that the recognition of speech from these people would only be a matter of having enough, sufficiently representative training data. This turns out to be true only as long as the speakers belong to the age range 15 to approximately 70. Outside this range the error rates increase dramatically, even with balanced amounts of training data. For males, the lower limit is very sharp and can be attributed to the change of pitch frequency during puberty. For females, the lower limit is gradual and is caused only by the slowly changing vocal tract length. For both genders, the upper limit is very gradual and can possibly be attributed to changes in the glottis area and the internal control loops of the human articulatory system. The paper presents some supporting evidence for the above assertions and gives results for various attempts to improve the performance. Recognition of children and the elderly will require much more research if we are to fully understand the characteristics of these age groups on current and future speech recognition systems.
Keywords
speech recognition; speech synthesis; 15 to 70 yr; age range; automatic speech recognition technology; children; connected digit recognizer; elderly; error rates; females; glottis area; human articulatory system; internal control loop; males; performance; pitch frequency; puberty; speech recognition; telephone speech database; vocal tract length; voice operated interfaces; Automatic speech recognition; Control systems; Databases; Error analysis; Frequency; Humans; Senior citizens; Speech recognition; Telephony; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location
Atlanta, GA
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.541104
Filename
541104