• DocumentCode
    134188
  • Title
    Performance evaluation of deep bottleneck features for spoken language identification
  • Author
    Bing Jiang ; Yan Song ; Si Wei ; Meng-Ge Wang ; Ian McLoughlin ; Li-Rong Dai
  • Author_Institution
    Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    143
  • Lastpage
    147
  • Abstract
    Our previous work has shown that Deep Bottleneck Features (DBF), generated from a well-trained Deep Neural Network (DNN), can provide high-performance Language Identification (LID) when Total Variability (TV) modelling is used as the back-end. This may largely be attributed to the DNN's ability to find a frame-level representation that is robust to variation caused by different speakers, channels and background noise. However, the DBF in the previous work were extracted from a DNN trained on a large automatic speech recognition (ASR) dataset, and the optimal DBF parameters for LID may differ from those known to be optimal for ASR. This paper therefore investigates different DBF extractors, input layer window sizes and dimensionality, and bottleneck layer location. Additionally, principal component analysis (PCA) is used to decorrelate the DBF. Experiments based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM), operating on the NIST LRE 2009 database, are conducted to evaluate the system. The results allow comparison between different DBF extractor parameters and demonstrate that LID based on DBF can significantly outperform conventional shift delta cepstral (SDC) features.
  • Keywords
    Gaussian processes; learning (artificial intelligence); mixture models; natural language processing; neural nets; principal component analysis; signal representation; DNN training; GMM-UBM; Gaussian mixture model-universal background model; NIST LRE 2009 database; PCA; TV modelling; bottleneck layer location; decorrelation; deep-bottleneck features; deep-neural network training; frame-level representation; high-performance language identification; input layer window dimensionality; input layer window sizes; optimal LID DBF extractor parameters; performance evaluation; principal component analysis; spoken language identification; total variability modelling; Acoustics; Context; Feature extraction; Neural networks; Principal component analysis; Speech; Training; deep bottleneck feature; deep neural network; gaussian mixture model; language identification; shift delta cepstral;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936580
  • Filename
    6936580