Regional Information Center for Science and Technology - Multi-View and Multi-Objective Semi-Supervised Learning for HMM-Based Automatic Speech Recognition

DocumentCode :

1478994

Title :

Multi-View and Multi-Objective Semi-Supervised Learning for HMM-Based Automatic Speech Recognition

Author :

Cui, Xiaodong ; Huang, Jing ; Chien, Jen-Tzung

Author_Institution :

IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

Issue :

Year :

2012

Firstpage :

1923

Lastpage :

1935

Abstract :

Current hidden Markov acoustic modeling for large-vocabulary continuous speech recognition (LVCSR) relies heavily on the availability of abundant labeled transcriptions. Because speech labeling is both expensive and time-consuming, while large amounts of unlabeled data are now easily available, semi-supervised learning (SSL) from both labeled and unlabeled data, which aims to reduce the development cost of LVCSR, has become more important than ever. In this paper, a new SSL approach is proposed that exploits cross-view transfer learning for LVCSR through a committee machine consisting of multiple views learned from different acoustic features and randomized decision trees. In addition, a multi-objective learning scheme is developed in each view by maximizing a hybrid information-theoretic criterion established from the relative entropy between labeled data and their labels and the entropy of unlabeled data. The multi-objective scheme is then generalized to a unified SSL framework that can be interpreted as a variety of learning strategies under different weighting schemes. Experiments conducted on English Broadcast News using 50 hours of transcribed speech with 50 hours and 150 hours of untranscribed speech show the benefits of the proposed approaches.

Keywords :

acoustic signal processing; decision trees; entropy; hidden Markov models; learning (artificial intelligence); speech recognition; vocabulary; English broadcast news; HMM-based automatic speech recognition; LVCSR; SSL approach; SSL framework; acoustic features; cross-view transfer learning; development cost; hidden Markov acoustic modeling; hybrid information-theoretic criterion; large-vocabulary continuous speech recognition; learning strategy; multiobjective learning scheme; multiobjective semi-supervised learning; multiview semi-supervised learning; randomized decision tree; relative entropy; speech labeling; transcribed speech; unlabeled data; Acoustics; Adaptation models; Decision trees; Entropy; Hidden Markov models; Speech; Training; Acoustic modeling; automatic speech recognition; multi-objective learning; multi-view committee machine; semi-supervised learning (SSL);

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2191955

Filename :

6175108

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1478994