  • DocumentCode
    28419
  • Title
    Chinese-English Phone Set Construction for Code-Switching ASR Using Acoustic and DNN-Extracted Articulatory Features
  • Author
    Chung-Hsien Wu ; Han-Ping Shen ; Yan-Ting Yang
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    22
  • Issue
    4
  • fYear
    2014
  • fDate
    Apr-14
  • Firstpage
    858
  • Lastpage
    862
  • Abstract
    This study proposes a data-driven approach to phone set construction for code-switching automatic speech recognition (ASR). Acoustic and context-dependent cross-lingual articulatory features (AFs) are incorporated into the estimation of the distance between triphone units for constructing a Chinese-English phone set. The acoustic features of each triphone in the training corpus are extracted for constructing an acoustic triphone HMM. Furthermore, the articulatory features of the “last/first” state of the corresponding preceding/succeeding triphone in the training corpus are used to construct an AF-based GMM. The AFs, extracted using a deep neural network (DNN), are used for code-switching articulation modeling to alleviate the data sparseness problem due to the diverse context-dependent phone combinations in intra-sentential code-switching. The triphones are then clustered to obtain a Chinese-English phone set based on the acoustic HMMs and the AF-based GMMs using a hierarchical triphone clustering algorithm. Experimental results on code-switching ASR show that the proposed method for phone set construction outperformed other traditional methods.
  • Keywords
    Gaussian processes; computational linguistics; feature extraction; hidden Markov models; mixture models; natural language processing; neural nets; pattern clustering; smart phones; speech recognition; AF-based GMM; Chinese-English phone set construction; DNN extracted articulatory feature extraction; acoustic feature extraction; acoustic triphone HMM; automatic speech recognition; code switching ASR; code switching articulation modeling; context dependent cross-lingual articulatory feature; data driven approach; data sparseness problem; deep neural network; distance estimation; hierarchical triphone clustering algorithm; intrasentential code switching; training corpus; Acoustics; Feature extraction; Hidden Markov models; IEEE transactions; Speech; Speech processing; Training; Articulatory features; code-switching; phone set construction; speech recognition
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2310353
  • Filename
    6763085