• DocumentCode
    1108203
  • Title
    A systematic approach to the extraction of diphone elements from natural speech
  • Author
    Kaeslin, Hubert
  • Author_Institution
    Swiss Federal Institute of Technology, CH Zürich, Switzerland
  • Volume
    34
  • Issue
    2
  • fYear
    1986
  • fDate
    4/1/1986 12:00:00 AM
  • Firstpage
    264
  • Lastpage
    271
  • Abstract
    Synthetic speech can be generated with an unrestricted vocabulary by concatenating stored units such as diphone elements. When joining speech segments that were not adjacent in the original context they were taken from, discontinuities in the spectral envelope may arise that impair intelligibility. The method proposed here attempts to find optimum diphone boundaries in order to minimize these discontinuities. Steady-state zones of all phones carrying a diphone boundary are specified by means of a centroid vector. Based on the centroids and on an objective distance measure, hypothetical boundary cost functions are defined. Their minimization, together with the evaluation of a set of additional rules, determines the boundary locations. A rhyme test carried out with speech generated by concatenating diphone elements extracted according to this method yielded an intelligibility score of 96.7 percent for isolated words.
  • Keywords
    Acoustic testing; Cost function; Interpolation; Linear predictive coding; Natural languages; Speech processing; Speech synthesis; Stability; Steady-state; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Acoustics, Speech and Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0096-3518
  • Type
    jour
  • DOI
    10.1109/TASSP.1986.1164810
  • Filename
    1164810