Title :
Unsupervised and incremental speaker adaptation under adverse environmental conditions
Author :
Takagi, Keizaburo ; Shinoda, Kazuma ; Hattori, Hiroaki ; Watanabe, Takao
Author_Institution :
Inf. Technol. Res. Labs., NEC Corp., Kawasaki, Japan
Abstract :
A speaker adaptation method is described. In practical applications of speaker adaptation, the adaptation and testing environments can change significantly and are not known beforehand. In such cases, speaker adaptation fits the reference pattern to the adaptation utterances with respect to differences in both environment and speaker at the same time, so its performance degrades. To cope with this problem, the proposed method first eliminates the environmental differences between each input utterance and the reference pattern by using a rapid environment adaptation algorithm based on spectrum equalization (REALISE) (K. Takagi et al., 1995). It then applies unsupervised and incremental speaker adaptation with autonomous control using tree structure pdfs (ACTS) (K. Shinoda and T. Watanabe, 1995) to the environmentally adapted reference pattern. By combining these two methods, the resulting system is expected to perform well under adverse environmental conditions and to show a stable improvement regardless of the amount of adaptation data. Evaluation experiments were carried out for utterances under three vehicle speed conditions. Recognition rates for a 100-word Japanese word recognition task after 100-word adaptation improved from 92% (ACTS alone) to 95% (proposed method).
Keywords :
adaptive systems; natural languages; probability; speech processing; speech recognition; tree data structures; ACTS; Japanese word recognition task; REALISE; adaptation data; adaptation utterances; adverse environmental conditions; autonomous control; environmental differences; environmentally adapted reference pattern; incremental speaker adaptation; input utterance; rapid environment adaptation algorithm; reference pattern; speaker adaptation method; spectrum equalization; tree structure pdfs; unsupervised speaker adaptation; vehicle speed conditions; Additive noise; Degradation; Information technology; National electric code; Probability density function; Speech recognition; Testing; Tree data structures; Vehicles; Working environment noise;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607211