Regional Information Center for Science and Technology (RICeST) - Subspace-based phonotactic language recognition using multivariate dynamic linear models

DocumentCode :

1687185

Title :

Subspace-based phonotactic language recognition using multivariate dynamic linear models

Author :

Hung-Shin Lee ; Yu-Chin Shih ; Hsin-Min Wang ; Shyh-Kang Jeng

Author_Institution :

Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan

Year :

2013

Firstpage :

6870

Lastpage :

6874

Abstract :

Phonotactics, dealing with permissible phone patterns and their frequencies of occurrence in a specific language, is acknowledged to be related to spoken language recognition (SLR). With the assistance of phone recognizers, each speech utterance can be decoded into an ordered sequence of phone vectors filled with likelihood scores contributed by all possible phone models. In this paper, we propose a novel approach to dig the concealed phonotactic structure out of the phone-likelihood vectors through a kind of multivariate time series analysis: dynamic linear models (DLM). In these models, treating the generation of phone patterns in each utterance as a dynamic system, the relationship between adjacent vectors is linearly and time-invariantly modeled, and unobserved states are introduced to capture a temporal coherence intrinsic in the system. Each utterance expressed by the DLM is further transformed into a fixed-dimensional linear subspace so that well-developed distance measures between two subspaces can be applied to linear discriminant analysis (LDA) in a dissimilarity-based fashion. The results of SLR experiments on the OGI-TS corpus demonstrate that the proposed framework outperforms the well-known vector space modeling (VSM)-based methods and achieves comparable performance to our previous subspace-based method.
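The pipeline described in the abstract (model each utterance as a linear time-invariant system with hidden states, map it to a fixed-dimensional subspace, then compare subspaces) can be illustrated with a minimal sketch. The abstract does not give the estimation procedure or the distance measure, so everything below is an assumption: the state-space form x[t+1] = A x[t] + noise, y[t] = C x[t] + noise, the SVD-based fit, the use of an extended observability matrix as the subspace, the projection distance based on principal angles, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_dlm(Y, n_states):
    """Fit y[t] = C x[t], x[t+1] = A x[t] to a (T, D) sequence by SVD.

    Assumed estimator for illustration only; not the paper's procedure.
    """
    Y = np.asarray(Y, dtype=float)
    # SVD of the D x T observation matrix gives a low-rank factorization.
    U, s, Vt = np.linalg.svd(Y.T, full_matrices=False)
    C = U[:, :n_states]                        # D x n observation matrix
    X = s[:n_states, None] * Vt[:n_states]     # n x T estimated state sequence
    # Least-squares transition matrix: X[:, 1:] is approximated by A @ X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C

def observability_subspace(A, C, order):
    """Orthonormal basis of the span of [C; CA; ...; CA^(order-1)]."""
    blocks, M = [], C
    for _ in range(order):
        blocks.append(M)
        M = M @ A
    Q, _ = np.linalg.qr(np.vstack(blocks))     # (order * D) x n, orthonormal columns
    return Q

def subspace_distance(Q1, Q2):
    """Projection distance: sqrt of the sum of squared sines of principal angles."""
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cosines ** 2)))

# Two synthetic "utterances" of different lengths, each a sequence of
# 6-dimensional vectors standing in for phone-likelihood vectors.
rng = np.random.default_rng(0)
utt_a = rng.standard_normal((50, 6))
utt_b = rng.standard_normal((80, 6))

Q_a = observability_subspace(*fit_dlm(utt_a, n_states=3), order=2)
Q_b = observability_subspace(*fit_dlm(utt_b, n_states=3), order=2)
d_ab = subspace_distance(Q_a, Q_b)
```

Because every utterance is reduced to a basis of the same shape regardless of its length, pairwise distances such as `d_ab` can be collected into a dissimilarity representation and passed to a classifier such as LDA, which is the role the abstract assigns to the subspace distances.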

Keywords :

language translation; speaker recognition; time series; VSM-based methods; dissimilarity-based fashion; dynamic linear models; fixed-dimensional linear subspace; linear discriminant analysis; multivariate dynamic linear models; multivariate time series analysis; phone patterns; phone patterns generation; phone recognizers; phone-likelihood vectors; phonotactic structure; spoken language recognition; subspace-based method; subspace-based phonotactic language recognition; vector space modeling; Acoustics; Computational modeling; Decoding; Speech; Speech recognition; Support vector machine classification; Vectors; phonotactic language recognition;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6638993

Filename :

6638993

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1687185