DocumentCode :
134188
Title :
Performance evaluation of deep bottleneck features for spoken language identification
Author :
Bing Jiang ; Yan Song ; Si Wei ; Meng-Ge Wang ; Ian McLoughlin ; Li-Rong Dai
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
143
Lastpage :
147
Abstract :
Our previous work has shown that Deep Bottleneck Features (DBF), generated from a well-trained Deep Neural Network (DNN), can provide high-performance Language Identification (LID) when Total Variability (TV) modelling is used for the back-end. This may largely be attributed to the powerful capability of the DNN for finding a frame-level representation that is robust to variances caused by different speakers, channels and background noise. However, the DBF in the previous work were extracted from a DNN that was trained using a large ASR dataset. Optimal LID DBF parameters may differ from those that are known to be optimal for ASR. Thus, this paper focuses on investigating different DBF extractors, input layer window sizes and dimensionality, and bottleneck layer location. Additionally, principal component analysis (PCA) is used to decorrelate the DBF. Experiments, based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) operating on the NIST LRE 2009 database, are conducted to evaluate the system. The results allow comparison between different DBF extractor parameters and demonstrate that LID based on DBF can significantly outperform conventional shift delta cepstral (SDC) features.
Keywords :
Gaussian processes; learning (artificial intelligence); mixture models; natural language processing; neural nets; principal component analysis; signal representation; DNN training; GMM-UBM; Gaussian mixture model-universal background model; NIST LRE 2009 database; PCA; TV modelling; bottleneck layer location; decorrelation; deep-bottleneck features; deep-neural network training; frame-level representation; high-performance language identification; input layer window dimensionality; input layer window sizes; optimal LID DBF extractor parameters; performance evaluation; principal component analysis; spoken language identification; total variability modelling; Acoustics; Context; Feature extraction; Neural networks; Principal component analysis; Speech; Training; deep bottleneck feature; deep neural network; Gaussian mixture model; language identification; shift delta cepstral
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936580
Filename :
6936580