Regional Information Center for Science and Technology (RICeST) - Data Augmentation for Deep Neural Network Acoustic Modeling

DocumentCode :

79488

Title :

Data Augmentation for Deep Neural Network Acoustic Modeling

Author :

Cui, Xiaodong ; Goel, Vaibhava ; Kingsbury, Brian

Author_Institution :

IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

Issue :

Year :

2015

Date :

Sept. 2015

Firstpage :

1469

Lastpage :

1477

Abstract :

This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks (CNNs). The approaches are focused on increasing speaker and speech variations of the limited training data such that the acoustic models trained with the augmented data are more robust to such variations. In addition, a two-stage data augmentation scheme based on a stacked architecture is proposed to combine VTLP and SFM as complementary approaches. Experiments are conducted on Assamese and Haitian Creole, two development languages of the IARPA Babel program, and improved performance on automatic speech recognition (ASR) and keyword search (KWS) is reported.
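The abstract names vocal tract length perturbation (VTLP) without giving its formula. The sketch below illustrates the general idea only: it warps the frequency axis of a spectrogram with a piecewise-linear function and leaves the frame labels untouched, which is what makes the transformation label-preserving. The warp function is a commonly used formulation and is an assumption here, not necessarily the one used in the paper; the names `warp_frequencies` and `vtlp`, the boundary frequency `f_hi`, the sample rate, and the sampling range for the warp factor `alpha` are all hypothetical choices for illustration.

```python
import numpy as np

def warp_frequencies(freqs, alpha, f_hi=4800.0, nyquist=8000.0):
    """Piecewise-linear warp (assumed form): scale by alpha below a
    boundary, then continue linearly so Nyquist maps to itself."""
    freqs = np.asarray(freqs, dtype=float)
    knee = f_hi * min(alpha, 1.0)   # warped value at the boundary
    boundary = knee / alpha         # input frequency where the slope changes
    lower = freqs * alpha
    upper = nyquist - (nyquist - knee) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, lower, upper)

def vtlp(spectrogram, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp each frame of a (frames x bins) spectrogram along frequency.
    Frame labels are not touched, so the augmentation is label-preserving."""
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, spectrogram.shape[1])
    warped = warp_frequencies(freqs, alpha, f_hi, nyquist)
    # Energy at frequency f moves to warp(f); resample back onto the bin grid.
    return np.stack([np.interp(freqs, warped, frame) for frame in spectrogram])

# Illustrative use: one randomly warped copy of a toy spectrogram.
rng = np.random.default_rng(0)
toy_spectrogram = rng.random((5, 129))   # 5 frames, 129 frequency bins
alpha = rng.uniform(0.9, 1.1)            # assumed sampling range
augmented = vtlp(toy_spectrogram, alpha)
```

Drawing a fresh `alpha` per utterance or per speaker yields several warped replicas of the same labeled data, which is how this kind of augmentation increases speaker variation in a limited training set.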

Keywords :

acoustic signal processing; neural nets; speech recognition; ASR; Assamese language; CNN; DNN; Haitian Creole language; IARPA Babel program; KWS; SFM approach; VTLP approach; automatic speech recognition; convolutional neural networks; data augmentation; data sparsity; deep neural network acoustic modeling; keyword search; label-preserving transformation; speaker variation; speech variation; stochastic feature mapping approach; vocal tract length perturbation approach; Acoustics; Data models; Feature extraction; Neural networks; Speech; Training; Training data; deep neural networks; stochastic feature mapping

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2015.2438544

Filename :

7113823

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=79488