Title :
A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition
Author :
Hutchinson, Brian ; Deng, Li ; Yu, Dong
Author_Institution :
EE Dept., Univ. of Washington, Seattle, WA, USA
Abstract :
We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), in which multiple blocks are stacked one on top of another and a bilinear mapping from hidden representations to the output of each block is used to incorporate higher-order statistics of the input features. We present a learning algorithm for the T-DSN that shifts the main parameter-estimation burden to a convex sub-problem with a closed-form solution. Using an efficient and scalable parallel implementation, we train a T-DSN to discriminate standard three-state monophones in the TIMIT database. The T-DSN outperforms an alternative pretrained Deep Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For continuous phonetic recognition, the T-DSN performs equivalently to a DNN but without the need for a hard-to-scale, sequential fine-tuning step.
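To make the bilinear block and the closed-form upper-layer estimate described above concrete, the following is a minimal NumPy sketch of a single T-DSN-style block. The sigmoid hidden units, the ridge regularizer lam, and all function and variable names (tdsn_block_forward, solve_upper_layer, W1, W2, U) are illustrative assumptions for this sketch, not details taken from the paper.

    import numpy as np

    def sigmoid(z):
        # Plain logistic nonlinearity; sufficient for a sketch.
        return 1.0 / (1.0 + np.exp(-z))

    def tdsn_block_forward(X, W1, W2, U):
        # X  : (d, n)     batch of n input column vectors
        # W1 : (d, k1)    lower-layer weights, hidden branch 1
        # W2 : (d, k2)    lower-layer weights, hidden branch 2
        # U  : (k1*k2, c) upper-layer weights on the bilinear features
        H1 = sigmoid(W1.T @ X)          # (k1, n) hidden representation 1
        H2 = sigmoid(W2.T @ X)          # (k2, n) hidden representation 2
        # Bilinear combination: per-example outer product of the two hidden
        # vectors, flattened into one feature column (column-wise Kronecker).
        B = np.einsum('in,jn->ijn', H1, H2).reshape(H1.shape[0] * H2.shape[0], -1)
        Y = U.T @ B                     # (c, n) linear output predictions
        return B, Y

    def solve_upper_layer(B, T, lam=1e-3):
        # Closed-form, L2-regularized least-squares estimate of U for fixed
        # W1, W2: U = (B B^T + lam*I)^{-1} B T^T, i.e., the convex sub-problem
        # with a closed-form solution mentioned in the abstract.
        # B : (k1*k2, n) bilinear hidden features
        # T : (c, n)     target codes (e.g., one-hot state labels)
        k = B.shape[0]
        return np.linalg.solve(B @ B.T + lam * np.eye(k), B @ T.T)  # (k1*k2, c)

In a full stack, one would typically feed each block's output (possibly concatenated with the raw input) into the next block, re-estimating the upper-layer weights in closed form while tuning the lower-layer weights by other means; the details of that procedure are given in the paper itself.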
Keywords :
convex programming; learning (artificial intelligence); neural nets; parameter estimation; speech recognition; tensors; DNN architecture; T-DSN; bilinear mapping; bilinear modeling; closed-form solution; convex subproblem; deep neural network architecture; hidden representations; learning algorithm; parameter estimation; phonetic recognition; sequential fine-tuning step; tensor deep stacking network; Computer architecture; Error analysis; Neural networks; Speech; Stacking; Tensile stress; Training; deep learning; higher-order statistics; phonetic classification and recognition; stacking model; tensors;
Conference_Title :
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Kyoto, Japan
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288994