• DocumentCode
    591904
  • Title
    What makes this voice sound so bad? A multidimensional analysis of state-of-the-art text-to-speech systems
  • Author
    Hinterleitner, Florian ; Norrenbrock, C. ; Möller, Sebastian ; Heute, Ulrich
  • Author_Institution
    Quality & Usability Lab., Tech. Univ. Berlin, Berlin, Germany
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    240
  • Lastpage
    245
  • Abstract
    This paper presents research on the perceptual quality dimensions of synthetic speech. We generated 57 stimuli from 16 female and 19 male German text-to-speech (TTS) systems and asked listeners to judge the perceptual distances between them in a sorting task. Applying a multidimensional scaling algorithm to these judgments, we extracted three dimensions. Based on expert listening and a comparison with ratings gathered on 16 attribute scales, the three dimensions can be assigned to naturalness of voice, temporal distortions, and calmness. These dimensions are discussed in detail and compared to the perceptual quality dimensions found in previous multidimensional analyses. Moreover, the results are analyzed by type of TTS system. The identified dimensions will be used in future work to build a dimension-based quality predictor for synthetic speech.
  • Keywords
    natural language processing; speech synthesis; TTS; calmness; dimension-based quality predictor; female-male German text-to-speech systems; multidimensional analysis; multidimensional scaling algorithm; perceptual distances; perceptual quality dimensions; sorting task; state-of-the-art text-to-speech systems; synthetic speech; temporal distortions; voice naturalness; Correlation; Databases; Hidden Markov models; Rhythm; Sorting; Speech; Synthesizers; multidimensional scaling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type
    conf
  • DOI
    10.1109/SLT.2012.6424229
  • Filename
    6424229