DocumentCode
2255258
Title
Phone clustering using the Bhattacharyya distance
Author
Mak, Brian ; Barnard, Etienne
Author_Institution
Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Portland, OR, USA
Volume
4
fYear
1996
fDate
3-6 Oct 1996
Firstpage
2005
Abstract
The authors study the use of the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian distributions that is equivalent to an upper bound on the optimal Bayesian classification error probability. It also has the desirable properties of being computationally simple and extensible to more Gaussian mixtures. Using the Bhattacharyya distance measure in a data-driven approach together with a novel a-level agglomerative hierarchical biphone clustering algorithm, generalized left/right biphones (BGBs) are derived. A neural-net based phone recognizer trained on the BGBs is found to have better frame-level phone recognition than one trained on generalized biphones derived from a set of commonly used broad categories (BCGBs). The authors further evaluate the new BGBs on an isolated-word recognition task of perplexity 40 and obtain a 16.2% error reduction over the BCGBs and a 41.8% error reduction over the monophones.
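The distance described in the abstract can be illustrated with the standard closed form for two Gaussians, D_B = (1/8)(mu2 - mu1)^T S^-1 (mu2 - mu1) + (1/2) ln(det S / sqrt(det S1 det S2)) with S = (S1 + S2)/2, and the associated error bound P_e <= sqrt(P1 P2) exp(-D_B). The following is a minimal sketch of that textbook formula, not the authors' implementation; the function names, the use of NumPy, and the equal default priors are assumptions made for illustration.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (textbook closed form).

    Hypothetical helper for illustration; not taken from the paper.
    """
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    # Average covariance S = (S1 + S2) / 2.
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1

    # Mean-separation term: (1/8) * diff^T S^-1 diff.
    term_mean = 0.125 * float(diff @ np.linalg.solve(cov, diff))

    # Covariance-mismatch term: (1/2) * ln(det S / sqrt(det S1 * det S2)),
    # computed from log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return term_mean + term_cov


def bayes_error_bound(d_b, prior1=0.5, prior2=0.5):
    """Upper bound on the two-class Bayes error: sqrt(P1 * P2) * exp(-D_B)."""
    return float(np.sqrt(prior1 * prior2) * np.exp(-d_b))


# Two unit-variance 1-D Gaussians whose means are 2 apart.
d = bhattacharyya_distance([0.0], [[1.0]], [2.0], [[1.0]])
bound = bayes_error_bound(d)
```

A larger distance gives a smaller error bound, which is why such a measure is a natural criterion for deciding which phone models are similar enough to merge during clustering.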
Keywords
Bayes methods; Gaussian distribution; errors; neural nets; pattern classification; speech recognition; Gaussian distributions; Gaussian mixtures; a-level agglomerative hierarchical biphone clustering algorithm; biphone clustering; broad categories; classification-based Bhattacharyya distance measure; data-driven approach; error reduction; frame-level phone recognition; generalized biphones; generalized left/right biphones; isolated-word recognition task; monophones; neural-net based phone recognizer; optimal Bayesian classification error probability; phone clustering; Bayesian methods; Clustering algorithms; Clustering methods; Decision trees; Error probability; Hidden Markov models; Natural languages; Upper bound
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607191
Filename
607191