Building an ensemble of CD-DNN-HMM acoustic model using random forests of phonetic decision trees

Author

Tuo Zhao ; Yunxin Zhao ; Xin Chen

Author_Institution

Dept. of Comput. Sci., Univ. of Missouri, Columbia, MO, USA

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

98

Lastpage

102

Abstract

We propose an RF-PDT+CD-DNN approach that generates an ensemble of context-dependent pre-trained deep neural networks (CD-DNNs) using random forests of phonetic decision trees (RF-PDTs) and constructs a CD-DNN-HMM-based ensemble acoustic model (EAM). We present evaluation results on the TIMIT dataset and a telemedicine automatic captioning dataset, and demonstrate that the proposed RF-PDT+CD-DNN-based EAM significantly outperforms the CD-DNN-based single acoustic model (SAM) in phone and word recognition accuracy.

Keywords

decision trees; neural nets; speech recognition; telemedicine; CD-DNN-HMM acoustic model ensemble; EAM; RF-PDT+CD-DNN; SAM; TIMIT dataset; context-dependent pretrained deep neural networks; phone recognition accuracies; phonetic decision trees; random forests; random forests of phonetic decision trees; single acoustic model; telemedicine automatic captioning dataset; word recognition accuracies; deep neural network; discriminative pre-training; ensemble acoustic model; phonetic decision tree; random forest

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936680

Filename

6936680