• DocumentCode
    3124583
  • Title
    Acoustic space partition based on broad phonetic class for ensemble acoustic modeling
  • Author
    Lu, Xugang ; Tsao, Yu ; Matsuda, Shodai ; Hori, Chiori ; Kashioka, Hideki
  • Author_Institution
    Nat. Inst. of Inf. & Commun. Technol., Japan
  • fYear
    2012
  • fDate
    5-8 Dec. 2012
  • Firstpage
    311
  • Lastpage
    314
  • Abstract
    Ensemble acoustic modeling can be used to model the different factors that cause variability in the acoustic space, and to provide different model combinations that improve the performance of automatic speech recognition (ASR). One of the main concerns is how to partition the training data set into the subsets on which the ensemble models are trained. In this study, we focus on ensemble acoustic modeling of the acoustic variability caused by gender and accent for Chinese large vocabulary continuous speech recognition (LVCSR). Considering that gender and accent information may be encoded in local acoustic realizations of a few specific phonetic classes rather than in a global acoustic distribution, we proposed an acoustic space partition method based on broad phonetic class (BPC) modeling of speakers for ensemble acoustic modeling. Using principal component analysis (PCA) of the BPC-based speaker representation, we designed a two-level hierarchical data partition in the low-dimensional speaker factor space that is concerned with gender and accent information. Ensemble acoustic models were trained on the partitioned data sets at both levels. Speech recognition results showed that acoustic models trained on the first-level and second-level partitions achieved 9.73% and 32.29% relative reductions in character error rate, respectively.
  • Keywords
    acoustic signal processing; gender issues; learning (artificial intelligence); principal component analysis; speaker recognition; speech processing; vocabulary; ASR; BPC based speaker representation; Chinese large vocabulary continuous speech recognition; LVCSR; PCA; accent information; acoustic space partition method; acoustic variability; automatic speech recognition; broad phonetic class modeling; character error reduction rate; ensemble acoustic modeling; gender; local acoustic realization; low dimensional speaker factor space; principal component analysis; speaker modeling; subsets; training data partition; two level hierarchical data partitions; Abstracts; Data models; Decision support systems; Jacobian matrices; Speech; Speech recognition; Standards; Ensemble modeling; acoustic space partition; speaker clustering; speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
  • Conference_Location
    Kowloon
  • Print_ISBN
    978-1-4673-2506-6
  • Electronic_ISBN
    978-1-4673-2505-9
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2012.6423501
  • Filename
    6423501