DocumentCode
1887866
Title
Acoustic space analysis method utilizing statistical multidimensional scaling technique
Author
Shozakai, M. ; Nagino, G.
Author_Institution
Asahi Kasei Corp., Japan
fYear
2005
fDate
18-20 May 2005
Firstpage
37
Abstract
Summary form only given. To achieve sufficient improvement with speaker-adaptation techniques such as the MLLR method, it is essential to obtain an adequate number of samples of the user's voice, which makes such methods difficult to apply in practical environments. A library of highly precise acoustic models must therefore be developed in advance to ensure sufficiently high speech recognition performance from the outset of using the system. Analyzing the target acoustic space is important for designing an efficient acoustic model library. However, the analysis of multidimensional acoustic space is generally a difficult task. To support the analysis of acoustic space through the capability of human visual perception, we proposed the COSMOS (COmprehensive Space Map of Objective Signal, previously aCOustic Space Map Of Sound) method. It visualizes an aggregate of acoustic models based on stochastic models, such as HMM and GMM, as a two-dimensional map (called a COSMOS map) by using a statistical multidimensional scaling technique of nonlinear projection. First, the paper formulates the COSMOS method. Then, a quantitative analysis of a speaking-style COSMOS map is described. The error introduced by mapping from multidimensional space to two-dimensional space in the COSMOS map is investigated. Furthermore, it is suggested that there exist multiple radiated axes of acoustic feature continuity in the COSMOS map.
Keywords
Gaussian processes; acoustic signal processing; adaptive signal processing; hidden Markov models; multidimensional signal processing; speech recognition; statistical analysis; visual perception; GMM; HMM; acoustic feature continuity; acoustic model library; acoustic space analysis method; acoustic space map of sound; comprehensive space map of objective signal; human visual perception; nonlinear projection; speaker-adaptation techniques; speaking style; statistical multidimensional scaling; stochastic models; Aggregates; Hidden Markov models; Humans; Libraries; Maximum likelihood linear regression; Multidimensional systems; Signal analysis; Speech recognition; Visual perception; Visualization
fLanguage
English
Publisher
IEEE
Conference_Titel
Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip
Conference_Location
Sapporo
Print_ISBN
0-7803-9064-4
Type
conf
DOI
10.1109/NSIP.2005.1502287
Filename
1502287