  • DocumentCode
    846179
  • Title
    Automatic phonetic segmentation
  • Author
    Toledano, Doroteo Torre ; Gómez, Luis A Hernández ; Grande, Luis Villarrubia
  • Author_Institution
    Speech Technol. Group, Telefonica R&D, Madrid, Spain
  • Volume
    11
  • Issue
    6
  • fYear
    2003
  • Firstpage
    617
  • Lastpage
    625
  • Abstract
    This paper presents the results and conclusions of a thorough study on automatic phonetic segmentation. It starts with a review of the state of the art in this field. It then analyzes the most frequently used approach, which is based on a modified Hidden Markov Model (HMM) phonetic recognizer. For this approach, a statistical correction procedure is proposed to compensate for the systematic errors produced by context-dependent HMMs, and speaker adaptation techniques are considered as a way to increase segmentation precision. Finally, this paper explores the possibility of locally refining the boundaries obtained with these techniques. A general framework is proposed for the local refinement of boundaries, and the performance of several pattern classification approaches (fuzzy logic, neural networks, and Gaussian mixture models) is compared within this framework. The resulting phonetic segmentation scheme increased the performance of a baseline HMM segmentation tool from 27.12%, 79.27%, and 97.75% of automatic boundary marks with errors smaller than 5, 20, and 50 ms, respectively, to 65.86%, 96.01%, and 99.31% in speaker-dependent mode, which is a reasonably good approximation to manual segmentation.
  • Keywords
    fuzzy logic; hidden Markov models; neural nets; pattern classification; speaker recognition; speech processing; speech synthesis; Gaussian mixture model; HMM segmentation tool; automatic phonetic segmentation; hidden Markov model; neural networks; phonetic recognizer; segmentation precision; speaker adaptation techniques; speech analysis; statistical correction procedure; Error correction; Labeling; Research and development; Speech recognition
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2003.813579
  • Filename
    1255449