Agglomerative vs. tree-based clustering for the definition of multilingual set of triphones

Author

B. Imperl; Z. Kacic; B. Horvat; A. Zgank

Author_Institution

Fac. of Electr. Eng. & Comput. Sci., Maribor Univ., Slovenia

Volume

3

fYear

2000

Firstpage

1273

Abstract

The paper addresses the problem of multilingual acoustic modelling for the design of multilingual speech recognisers. Two approaches to defining a multilingual set of triphones (bottom-up and top-down) are investigated. A new clustering algorithm for the definition of a multilingual set of triphones is proposed. The agglomerative (bottom-up) clustering algorithm is based on a distance measure for triphones, defined as a weighted sum of explicit estimates of context similarity at the monophone level. The monophone similarity estimation method is based on the algorithm of Houtgast. The second type of system uses tree-based (top-down) clustering with a common decision tree. The experiments were based on the SpeechDat II databases (Slovenian, Spanish and German 1000 FDB SpeechDat II). The experiments have shown that the agglomerative clustering algorithm significantly reduces the number of triphones with only minor degradation of word accuracy.
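
The bottom-up idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the monophone similarity values, the context weights, the complete-linkage rule and the stopping threshold are all assumptions chosen for illustration, and the Houtgast-based similarity estimation is replaced by a small hand-written table. The names `MONO_SIM`, `mono_sim`, `triphone_distance`, `cluster_distance` and `agglomerate` are hypothetical.

```python
from itertools import combinations

# Hypothetical monophone similarities in [0, 1]. Illustrative values only;
# the paper estimates these with a method based on Houtgast's algorithm.
MONO_SIM = {
    frozenset(("a", "a:")): 0.9,
    frozenset(("s", "z")): 0.7,
    frozenset(("t", "d")): 0.6,
}


def mono_sim(p, q):
    """Similarity of two monophones: 1.0 if identical, table lookup otherwise."""
    if p == q:
        return 1.0
    return MONO_SIM.get(frozenset((p, q)), 0.0)


def triphone_distance(t1, t2, weights=(0.25, 0.5, 0.25)):
    """Distance between triphones given as (left, centre, right) tuples.

    The similarity is a weighted sum of the monophone similarities of the
    left context, centre phone and right context. The weights are assumed.
    """
    sim = sum(w * mono_sim(p, q) for w, p, q in zip(weights, t1, t2))
    return 1.0 - sim


def cluster_distance(c1, c2):
    """Complete linkage (an assumption): the largest pairwise distance."""
    return max(triphone_distance(a, b) for a in c1 for b in c2)


def agglomerate(triphones, threshold):
    """Merge the closest pair of clusters until none is within the threshold."""
    clusters = [[t] for t in triphones]
    while len(clusters) > 1:
        (i, j), d = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        # i < j always holds here, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `agglomerate` on a list of triphone tuples pooled from several languages returns the merged clusters. A larger threshold yields fewer multilingual triphone classes, which mirrors the trade-off between model count and word accuracy mentioned in the abstract.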

Keywords

"Context modeling","Clustering algorithms","Speech recognition","Signal processing algorithms","Decision trees","Degradation","Laboratories","Digital signal processing","Computer science","Databases"

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.861809

Filename

861809