Segment-based speaker adaptation by neural network

Author

Fukuzawa, Keiji ; Sawai, Hidefumi ; Sugiyama, Masahide

Author_Institution

ATR Interpreting Telephony Res. Labs., Kyoto, Japan

fYear

1991

fDate

30 Sep-1 Oct 1991

Firstpage

442

Lastpage

451

Abstract

The authors propose a segment-to-segment speaker adaptation technique using a feed-forward neural network with a time-shifted sub-connection architecture. Differences in voice individuality exist in both the spectral and temporal domains. Frame-based speaker adaptation techniques are generally known to be unable to compensate for speaker individuality in the temporal domain, whereas segment-based speaker adaptation compensates for both the spectral and the temporal differences. Recognition experiments on 23 Japanese phonemes using a TDNN (time-delay neural network) show that the recognition rate with segment-based adaptation was 83.7%, 22.8% higher than the rate without adaptation.
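The abstract contrasts frame-based adaptation, which maps each frame independently, with segment-based adaptation, which can also compensate for temporal differences. The sketch below is one hypothetical reading of a "time-shifted sub-connection" layer. It is not the authors' published network. In this reading, each output frame is computed from input frames at several time shifts within the segment, and each shift has its own weight matrix. The names `frame_map` and `segment_map`, the single tanh layer, and the weight layout are all assumptions. Only the forward pass is shown; training is omitted.

```python
import math
from typing import Dict, List

# Hypothetical layout: rows are frames, columns are spectral coefficients.
Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a D_out x D_in weight matrix by a D_in feature vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def frame_map(segment: Matrix, w: Matrix) -> Matrix:
    """Frame-based mapping: output frame t sees only input frame t."""
    return [[math.tanh(v) for v in matvec(w, frame)] for frame in segment]


def segment_map(segment: Matrix, shifted_w: Dict[int, Matrix]) -> Matrix:
    """Segment-based mapping with time-shifted sub-connections (assumed form).

    shifted_w maps a time shift k to its own weight matrix. Output frame t
    sums the contributions of input frame t + k over every shift k. Shifts
    that fall outside the segment are skipped.
    """
    n_frames = len(segment)
    d_out = len(next(iter(shifted_w.values())))  # output feature dimension
    out = []
    for t in range(n_frames):
        acc = [0.0] * d_out
        for k, w in shifted_w.items():
            src = t + k
            if 0 <= src < n_frames:
                contrib = matvec(w, segment[src])
                acc = [a + c for a, c in zip(acc, contrib)]
        out.append([math.tanh(v) for v in acc])
    return out
```

When weights exist only at shift 0, `segment_map` reduces to `frame_map`. Adding a weight matrix at shift +1 lets output frame t draw on input frame t+1. A frame-by-frame mapping cannot represent that temporal relationship.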

Keywords

feedforward neural nets; speech analysis and processing; speech recognition; Japanese phoneme recognition; feed-forward neural network; segment-to-segment speaker adaptation; spectral domains; temporal domains; time shifted sub-connection architecture; Feedforward neural networks; Feedforward systems; Laboratories; Neural networks; Performance evaluation; Research and development; Speech recognition; Telephony;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks for Signal Processing [1991], Proceedings of the 1991 IEEE Workshop

Conference_Location

Princeton, NJ

Print_ISBN

0-7803-0118-8

Type

conf

DOI

10.1109/NNSP.1991.239497

Filename

239497