Regional Information Center for Science and Technology - Improved language identification using deep bottleneck network

DocumentCode :

3428523

Title :

Improved language identification using deep bottleneck network

Author :

Yan Song ; Ruilian Cui ; Xinhai Hong ; McLoughlin, Ian ; Jiong Shi ; Lirong Dai

Author_Institution :

Nat. Eng. Lab. of Speech & Language Inf. Process., USTC, Hefei, China

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

4200

Lastpage :

4204

Abstract :

Effective representation plays an important role in automatic spoken language identification (LID). Recently, several representations that employ a pre-trained deep neural network (DNN) as the front-end feature extractor have achieved state-of-the-art performance. However, performance is still far from satisfactory for dialect and short-duration utterance identification tasks, owing to deficiencies in the existing representations. To address this issue, this paper proposes improved representations that exploit the information extracted from different layers of the DNN structure. This is conceptually motivated by regarding the DNN as a bridge between low-level acoustic input and high-level phonetic output features. Specifically, we employ a deep bottleneck network (DBN), a DNN with an internal bottleneck layer acting as a feature extractor. We extract representations from two layers of this single network, namely DBN-TopLayer and DBN-MidLayer. Evaluations on the NIST LRE2009 dataset, as well as on the more specific dialect recognition task, show that each representation achieves an incremental performance gain. Furthermore, a simple fusion of the representations is shown to exceed current state-of-the-art performance.

Keywords :

feature extraction; learning (artificial intelligence); speech processing; speech recognition; DBN-MidLayer; DBN-TopLayer; DNN structure; LID; NIST LRE2009 dataset; automatic spoken language identification; deep bottleneck network; dialect identification task; dialect recognition task; front-end feature extractor; high-level phonetic output feature; improved language identification; incremental performance gain; internal bottleneck layer; low-level acoustic input feature; pretrained deep neural network; short-duration utterance identification task; Acoustics; Feature extraction; Kernel; NIST; Speech; Speech processing; Training; Bottleneck Feature; Deep Neural Network; Language Identification; Representation Learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location :

South Brisbane, QLD

Type :

conf

DOI :

10.1109/ICASSP.2015.7178762

Filename :

7178762

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3428523