• DocumentCode
    3015215
  • Title

    A training procedure for a segment-based-network approach to isolated word recognition

  • Author

    Soong, F.K.

  • Author_Institution
    AT&T Bell Laboratories, Murray Hill, New Jersey
  • Volume
    12
  • fYear
    1987
  • fDate
    31868
  • Firstpage
    693
  • Lastpage
    696
  • Abstract
    In this paper, we propose a complete training procedure for creating a subword-based network and test it in an isolated word recognition experiment. We first hand-segment one training token per word into contiguous subword segments with the aid of an interactive program that can display and play back various acoustic features of an utterance. The subword segmental units adopted in this paper consist of four sound classes: stationary sounds, fast transitional sounds, slow transitional sounds plus consonant clusters and others. The hand-segmented token is used to initialize a subword-based word network, which is then refined using more training tokens. The refinement is carried out with a two-level dynamic programming (DP) procedure. At the first level, or the word level, an endpoint-relaxed DP algorithm is used to remove any possible endpointing errors and to mark tentative segment boundaries. Between the marked segment boundaries, another endpoint-relaxed DP algorithm is employed at the segment level to refine the segments extracted at the word level. A segment-based word network, which consists of serial and parallel branches, is generated from this training procedure. Serial branches are generated from acoustically similar segments aligned at the segment level, while parallel branches are created to accommodate different acoustic manifestations of the same sound class in different phonetic contexts or different pronunciations. A speaker-dependent isolated word recognition experiment was carried out. For a four-speaker (2 male and 2 female) English alphabet database, the segment-based network gives improved performance compared with a conventional word-template-based approach. The word error rate is reduced from 11.2% for the word-based recognizer to 7.7% for the network-based recognizer; correspondingly, the number of misrecognized words is reduced from 116 to 80 out of 1040 recognition trials.
  • Keywords
    Acoustic testing; Clustering algorithms; Computational complexity; Displays; Dynamic programming; Error analysis; Speech recognition; Switches; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87.
  • Type

    conf

  • DOI
    10.1109/ICASSP.1987.1169579
  • Filename
    1169579
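
  • Illustration
    The abstract names an endpoint-relaxed DP algorithm but does not give its details. The following is only a minimal sketch of one common way to relax endpoints in dynamic time warping, not the paper's procedure. The function names, the Euclidean frame distance, the symmetric step pattern, and the `relax` parameter (how many frames the alignment may skip at each end of the test utterance) are all assumptions made for illustration. The accumulated cost is not length-normalized, which a practical recognizer would likely want.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def endpoint_relaxed_dtw(reference, test, relax=2):
    """Align `reference` to `test`, letting the path begin within the first
    `relax` + 1 test frames and end within the last `relax` + 1 test frames.

    Returns (accumulated_cost, start_frame, end_frame) in test-frame indices.
    """
    m, n = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * n for _ in range(m)]
    start = [[-1] * n for _ in range(m)]  # test frame where the best path began

    for i in range(m):
        for j in range(n):
            d = frame_distance(reference[i], test[j])
            candidates = []
            if i == 0 and j <= relax:
                # Relaxed beginning: a path may start fresh at this test frame.
                candidates.append((0.0, j))
            if i > 0:
                candidates.append((cost[i - 1][j], start[i - 1][j]))
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + d
            start[i][j] = best_start

    # Relaxed ending: pick the cheapest end frame within the allowed margin.
    first_end = max(0, n - 1 - relax)
    end = min(range(first_end, n), key=lambda j: cost[m - 1][j])
    return cost[m - 1][end], start[m - 1][end], end


if __name__ == "__main__":
    # Toy one-dimensional "features": the test token has one spurious
    # frame before and after the material that matches the reference.
    reference = [[0.0], [1.0], [2.0]]
    test = [[9.0], [0.0], [1.0], [2.0], [9.0]]
    print(endpoint_relaxed_dtw(reference, test, relax=1))
```

    In this toy case the relaxed alignment skips the spurious first and last test frames, which is the kind of endpointing error the abstract says the word-level pass is meant to remove. Setting `relax=0` reduces the sketch to ordinary fixed-endpoint alignment.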