Title :
Improved language identification using deep bottleneck network
Author :
Yan Song ; Ruilian Cui ; Xinhai Hong ; McLoughlin, Ian ; Jiong Shi ; Lirong Dai
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., USTC, Hefei, China
Abstract :
Effective representation plays an important role in automatic spoken language identification (LID). Recently, several representations that employ a pre-trained deep neural network (DNN) as the front-end feature extractor have achieved state-of-the-art performance. However, performance is still far from satisfactory for dialect and short-duration utterance identification tasks, owing to deficiencies in existing representations. To address this issue, this paper proposes improved representations that exploit information extracted from different layers of the DNN structure. This is conceptually motivated by regarding the DNN as a bridge between low-level acoustic input features and high-level phonetic output features. Specifically, we employ a deep bottleneck network (DBN), a DNN with an internal bottleneck layer, acting as a feature extractor. We extract representations from two layers of this single network, namely DBN-TopLayer and DBN-MidLayer. Evaluations on the NIST LRE2009 dataset, as well as on the more specific dialect recognition task, show that each representation achieves an incremental performance gain. Furthermore, a simple fusion of the representations is shown to exceed current state-of-the-art performance.
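As a rough illustration of tapping two layers of one bottleneck network, the following minimal sketch runs a feed-forward pass and collects the activations of a narrow middle layer and of the output layer. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer sizes, the random (untrained) weights, the sigmoid and softmax activations, the helper names `init_layers` and `forward_with_taps`, and the choice of the bottleneck and output layers as stand-ins for the "MidLayer" and "TopLayer" representations.

```python
import numpy as np


def init_layers(sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward_with_taps(x, layers, taps):
    """Run a forward pass and return the activations of the tapped layers.

    `taps` maps a layer index to the name under which its output is stored.
    Hidden layers use a sigmoid; the last layer uses a softmax (assumption).
    """
    collected = {}
    h = x
    last = len(layers) - 1
    for i, (weights, bias) in enumerate(layers):
        z = h @ weights + bias
        if i == last:
            z = z - z.max(axis=1, keepdims=True)  # numerical stability
            e = np.exp(z)
            h = e / e.sum(axis=1, keepdims=True)
        else:
            h = 1.0 / (1.0 + np.exp(-z))
        if i in taps:
            collected[taps[i]] = h
    return collected


# Hypothetical sizes: 40-dim acoustic frames, a 16-unit bottleneck,
# and 100 phonetic output classes.
sizes = [40, 64, 64, 16, 64, 100]
layers = init_layers(sizes)

# 200 synthetic frames standing in for one utterance.
frames = np.random.default_rng(1).standard_normal((200, 40))

feats = forward_with_taps(frames, layers, {2: "mid", 4: "top"})
mid_feats = feats["mid"]  # bottleneck activations, one row per frame
top_feats = feats["top"]  # output-layer posteriors, one row per frame
```

In a real system the network would first be trained on phonetic targets, and the frame-level features from each tap would then feed a separate back-end; this sketch only shows how one forward pass can yield both representations.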
Keywords :
feature extraction; learning (artificial intelligence); speech processing; speech recognition; DBN-MidLayer; DBN-TopLayer; DNN structure; LID; NIST LRE2009 dataset; automatic spoken language identification; deep bottleneck network; dialect identification task; dialect recognition task; front-end feature extractor; high-level phonetic output feature; improved language identification; incremental performance gain; internal bottleneck layer; low-level acoustic input feature; pretrained deep neural network; short-duration utterance identification task; Acoustics; Feature extraction; Kernel; NIST; Speech; Speech processing; Training; Bottleneck Feature; Deep Neural Network; Language Identification; Representation Learning;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178762