• DocumentCode
    454573
  • Title
    Automatic Speech Attribute Transcription (ASAT) - The Front End Processor
  • Author
    Hou, Jun ; Rabiner, Lawrence ; Dusan, Sorin
  • Author_Institution
    CAIP Center, Rutgers Univ., Piscataway, NJ
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    In this paper we discuss the design and implementation of the ASAT front end processing system, whose goal is to convert the speech waveform into a range of measurements and parameters, which are then combined to form probabilistic attributes. The ASAT front end processing module uses a range of spectral and temporal speech parameters as input to a set of neural network classifiers to create sets of attribute probability lattices, based on either single frames or blocks of frames (segments). We test this architecture by using the 14 Sound Patterns of English (SPE) features as speech attributes. Without balancing the training data, the detection accuracies of 4 of the SPE features are above 90%, 2 features obtain between 80% and 90% detection accuracy, and 8 features have detection accuracies below 80%. With a novel method of balancing the feature training data, the performance of the neural networks improved significantly, with 6 features having detection accuracies above 90% and the remaining 8 features having detection accuracies above 80%.
  • Keywords
    neural nets; speech recognition; ASAT front end processing system; Sound Patterns of English; attribute probability lattices; automatic speech attribute transcription; neural network classifiers; speech waveform; temporal speech parameters; Speech processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2006.1660025
  • Filename
    1660025