Regional Information Center for Science and Technology (RICeST) - Random Forests of Phonetic Decision Trees for Acoustic Modeling in Conversational Speech Recognition

DocumentCode :

1036562

Title :

Random Forests of Phonetic Decision Trees for Acoustic Modeling in Conversational Speech Recognition

Author :

Xue, Jian ; Zhao, Yunxin

Author_Institution :

IBM, Yorktown Heights

Volume :

Issue :

fYear :

2008

fDate :

3/1/2008

Firstpage :

519

Lastpage :

528

Abstract :

In this paper, we present a novel technique for constructing phonetic decision trees (PDTs) for acoustic modeling in conversational speech recognition. We use random forests (RFs) to train a set of PDTs for each phone state unit and obtain multiple acoustic models accordingly. We investigate several methods of combining acoustic scores from the multiple models, including maximum-likelihood estimation of the weights of the different acoustic models from training data, as well as using confidence scores based on the p-value or relative entropy to obtain the weights dynamically from online data. Since computing acoustic scores from multiple models slows down the decoding search, we propose clustering methods to compact the RF-generated acoustic models. The conventional concept of PDT-based state tying is extended to RF-based state tying. On each RF tied state, we cluster the Gaussian density functions (GDFs) from the multiple acoustic models into classes and compute a prototype for each class to represent the original GDFs. In this way, the number of GDFs in each RF tied state is greatly decreased, which significantly reduces the time needed to compute acoustic scores. Experimental results on a telemedicine automatic captioning task demonstrate that the proposed RF-PDT technique leads to significant improvements in word recognition accuracy.
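The abstract describes combining acoustic scores from several models using per-model weights, but this record does not give the combination formula. The following is only a minimal sketch under the assumption that the combined score is a weighted linear combination of the per-model likelihoods, evaluated in the log domain; the function name `combine_scores` and its arguments are hypothetical and do not come from the paper.

```python
import math

def combine_scores(log_likelihoods, weights):
    """Return log(sum_k w_k * p_k) for per-model log-likelihoods log p_k.

    Assumption (not from the paper): a linear weighted combination of
    likelihoods, with the weights normalized to sum to one.
    """
    if len(log_likelihoods) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Add log of each normalized weight to its log-likelihood; skip zero weights.
    terms = [ll + math.log(w / total)
             for ll, w in zip(log_likelihoods, weights) if w > 0]
    # Log-sum-exp: factor out the largest term for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Two hypothetical models scoring the same frame, equally weighted.
frame_score = combine_scores([math.log(0.2), math.log(0.4)], [0.5, 0.5])
```

In this sketch the weights could be fixed values estimated from training data or values recomputed per utterance; how the paper actually estimates them is not stated in this record.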

Keywords :

Gaussian processes; decision trees; decoding; entropy; maximum likelihood estimation; speech coding; speech recognition; Gaussian density functions; RF-generated acoustic models; acoustic modeling; acoustic scores; conversational speech recognition; decoding search; multiple acoustic models; phonetic decision trees (PDTs); random forests (RFs); relative entropy; acoustic score combination; model clustering

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2007.913036

Filename :

4432283

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1036562