• DocumentCode
    3485521
  • Title
    Automatic detection of unnatural word-level segments in unit-selection speech synthesis
  • Author
    Wang, William Yang; Georgila, Kallirroi
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    289
  • Lastpage
    294
  • Abstract
    We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features, namely, target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency Inverse Document Frequency (TF-IDF), and we report comparative results between different feature types and their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
  • Keywords
    speech synthesis; support vector machines; CRF; SVM; TF-IDF; automatic detection; comprehensive error analysis; conditional random fields; delta term frequency inverse document frequency; language models; prosodic cues; random forests; selection speech synthesis; unit-selection speech synthesis; unnatural word-level segments; Acoustics; Feature extraction; Humans; Speech; Speech synthesis; Testing; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type
    conf
  • DOI
    10.1109/ASRU.2011.6163946
  • Filename
    6163946