• DocumentCode
    2124358
  • Title
    Phone Segmentation for Japanese Triphthong Using Neural Networks
  • Author
    Banik, Manoj ; Hossain, Md Modasser ; Saha, Aloke Kumar ; Hassan, Foyzul ; Kotwal, Mohammed Rokibul Alam ; Huda, Mohammad Nurul
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ahsanullah Univ. of Sci. & Technol., Dhaka, Bangladesh
  • fYear
    2011
  • fDate
    11-13 April 2011
  • Firstpage
    470
  • Lastpage
    475
  • Abstract
    Context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems address this problem by using context-sensitive triphone models. However, these models require a large number of speech parameters and a large speech corpus. In this paper, we propose a technique to model the dynamic process of co-articulation and embed it into ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but its main drawback is that training a large network is slow. We introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN in the second stage, where the first MLN is expected to reduce the dynamics of the acoustic feature pattern and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out using Japanese triphthong data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture set of HMMs. Using MLNs instead of an RNN achieves a better context effect with less computation.
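    The two-stage MLN cascade described in the abstract can be illustrated with a minimal forward-pass sketch. It assumes a context window of frames t-3, t and t+3 feeding each stage, toy feature, DPF and hidden-layer sizes, sigmoid units and untrained random weights; all of these, and every function name below, are hypothetical and not taken from the paper. Stage 1 maps a window of acoustic frames to DPF estimates, and stage 2 maps a window of those DPF estimates to smoothed DPFs. The HMM segmentation step that consumes the output is not shown.

```python
import math
import random

# Hypothetical toy sizes; the paper's actual dimensions are not stated in this record.
N_ACOUSTIC = 4        # acoustic features per frame
N_DPF = 3             # distinctive phonetic features per frame
N_HIDDEN = 5          # hidden units per MLN
OFFSETS = (-3, 0, 3)  # assumed context window: frames t-3, t, t+3


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """One fully connected layer followed by a sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_mln(n_in, n_hidden, n_out, rng):
    """An untrained multilayer perceptron with one hidden layer."""
    return [random_layer(n_in, n_hidden, rng), random_layer(n_hidden, n_out, rng)]


def mln_forward(mln, x):
    for weights, bias in mln:
        x = dense(x, weights, bias)
    return x


def with_context(frames, t):
    """Concatenate frames at t + offset, clamping indices at the utterance edges."""
    last = len(frames) - 1
    out = []
    for off in OFFSETS:
        out.extend(frames[min(max(t + off, 0), last)])
    return out


def two_stage_dpf(frames, mln1, mln2):
    # Stage 1: acoustic context window -> DPF estimates for each frame.
    stage1 = [mln_forward(mln1, with_context(frames, t)) for t in range(len(frames))]
    # Stage 2: DPF context window -> smoothed DPF estimates for each frame.
    return [mln_forward(mln2, with_context(stage1, t)) for t in range(len(stage1))]


rng = random.Random(0)
mln1 = make_mln(N_ACOUSTIC * len(OFFSETS), N_HIDDEN, N_DPF, rng)
mln2 = make_mln(N_DPF * len(OFFSETS), N_HIDDEN, N_DPF, rng)
utterance = [[rng.gauss(0.0, 1.0) for _ in range(N_ACOUSTIC)] for _ in range(10)]
dpf = two_stage_dpf(utterance, mln1, mln2)
print(len(dpf), len(dpf[0]))  # 10 3
```

    Both stages are plain feed-forward networks applied frame by frame, so neither needs the recurrent training that makes a large RNN slow; the context dependence comes only from the stacked input windows.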
  • Keywords
    acoustic signal processing; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; recurrent neural nets; speech recognition; ASR system; HMM; Japanese triphthong data; MLN; RNN; acoustic feature pattern; automatic speech recognition; context information; context-sensitive triphone model; distinctive phonetic feature based feature extraction; hidden Markov model; multilayer neural network; neural network training; phone segmentation; recurrent neural network; speech corpus; speech parameters; Context; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Recurrent neural networks; Speech recognition; Distinctive Phonetic Feature; Hidden Markov Model; Local Features; Multi-Layer Neural Network; Recurrent Neural Network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Technology: New Generations (ITNG), 2011 Eighth International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-61284-427-5
  • Electronic_ISBN
    978-0-7695-4367-3
  • Type
    conf
  • DOI
    10.1109/ITNG.2011.88
  • Filename
    5945281