DocumentCode
3485521
Title
Automatic detection of unnatural word-level segments in unit-selection speech synthesis
Author
Wang, William Yang ; Georgila, Kallirroi
fYear
2011
fDate
11-15 Dec. 2011
Firstpage
289
Lastpage
294
Abstract
We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features, namely, target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency Inverse Document Frequency (TF-IDF), and we report comparative results between different feature types and their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
Keywords
speech synthesis; support vector machines; CRF; SVM; TF-IDF; automatic detection; comprehensive error analysis; conditional random fields; delta term frequency inverse document frequency; language models; prosodic cues; random forests; selection speech synthesis; unit-selection speech synthesis; unnatural word-level segments; Acoustics; Feature extraction; Humans; Speech; Speech synthesis; Testing; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location
Waikoloa, HI
Print_ISBN
978-1-4673-0365-1
Electronic_ISBN
978-1-4673-0366-8
Type
conf
DOI
10.1109/ASRU.2011.6163946
Filename
6163946