DocumentCode
394338
Title
Recent improvements to the IBM trainable speech synthesis system
Author
Eide, E. ; Aaron, A. ; Bakis, R. ; Cohen, P. ; Donovan, R. ; Hamza, W. ; Mathes, T. ; Picheny, M. ; Polkosky, M. ; Smith, M. ; Viswanathan, M.
Author_Institution
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
In this paper, we describe the current status of the trainable text-to-speech system at IBM. Recent algorithmic and database changes to the system have led to significant gains in output quality. On the algorithmic side, we have introduced statistical models for predicting pitch and duration targets, which replace the rule-based target generation previously employed. Additionally, we have changed the cost function and the search strategy, introduced a post-search pitch smoothing algorithm, and improved our method of preselection. Through the combined data and algorithmic contributions, we have been able to significantly improve (p < 0.0001) the mean opinion score (MOS) of our female voice, from 3.68 to 4.85 when heard over loudspeakers and to 5.42 when heard over the telephone (seven-point scale).
Keywords
frequency estimation; prediction theory; search problems; smoothing methods; speech synthesis; statistical analysis; IBM trainable speech synthesis system; algorithmic changes; cost function; database changes; duration; mean opinion score; output quality; pitch prediction; post-search pitch smoothing algorithm; preselection; search strategy; statistical models; text-to-speech system; Cost function; Databases; Decision trees; Knowledge based systems; Signal generators; Signal processing algorithms; Smoothing methods; Speech processing; Speech synthesis; Stress;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198879
Filename
1198879