Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation

Author

Heracleous, Panikos ; Nakajima, Yoshiki ; Lee, Akinobu ; Saruwatari, Hiroshi ; Shikano, Kiyohiro

Author_Institution

Graduate Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Japan

fYear

2003

fDate

30 Nov.-3 Dec. 2003

Firstpage

73

Lastpage

76

Abstract

In previous work, we introduced a special device, the Non-Audible Murmur (NAM) microphone, which can detect very quietly uttered speech (murmur) that cannot be heard by listeners near the talker. Experimental results showed that the device is effective for NAM recognition. Using normal-speech monophone hidden Markov models (HMMs) retrained with NAM data from a specific speaker, we could recognize NAM with high accuracy. Although these results were very promising, HMM retraining is a serious drawback because it requires a large amount of training data. In this paper, we introduce a new method for NAM recognition that requires only a small amount of NAM training data. The proposed method is based on supervised adaptation. Its main difference from other adaptation approaches is that, instead of single-iteration adaptation, we use iterative adaptation (iterative supervised maximum likelihood linear regression, MLLR). Experiments demonstrate the effectiveness of the proposed method. Using clean normal-speech initial models and only 350 NAM adaptation utterances, we achieved a recognition accuracy of 88.62%, a very promising result. Thus, with a small amount of adaptation data, we were able to create accurate speaker-dependent HMMs. We also report experimental results showing the effects of the number of iterations, the amount of adaptation data, and the regression tree classes.
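The abstract names iterative supervised MLLR but gives no equations. As a rough illustration only, the Python sketch below implements standard mean-only MLLR with a single global regression class (the regression tree classes studied in the paper are omitted), using the usual diagonal-covariance row-by-row solution w_i = G_i^{-1} k_i with extended mean vectors [1, mu]. The iteration scheme is an assumption, not necessarily the authors' procedure: each pass recomputes occupation probabilities under the current adapted means and estimates a new transform seeded from them. All function names (`mllr_step`, `iterative_mllr`, `occupation`, etc.) and the toy frame format, in which each frame carries the Gaussian indices permitted by its known transcription, are hypothetical; a real system would obtain occupation probabilities from forced alignment of full HMMs.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical frame format: (observation vector, Gaussian indices allowed by the transcript).
Frame = Tuple[Vector, List[int]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def log_gauss(o: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(o, mu, var))


def occupation(o: Vector, cands: List[int], means: List[Vector],
               variances: List[Vector]) -> List[Tuple[int, float]]:
    """Posterior of each transcript-permitted Gaussian for one frame (equal priors)."""
    scores = [log_gauss(o, means[g], variances[g]) for g in cands]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [(g, w / total) for g, w in zip(cands, weights)]


def mllr_step(frames: List[Frame], means: List[Vector],
              variances: List[Vector]) -> Matrix:
    """Estimate one global mean transform W (d x (d+1)), one row at a time."""
    d = len(means[0])
    posts = [occupation(o, cands, means, variances) for o, cands in frames]
    w: Matrix = []
    for i in range(d):
        g_i = [[0.0] * (d + 1) for _ in range(d + 1)]
        k_i = [0.0] * (d + 1)
        for (o, _), post in zip(frames, posts):
            for g, gamma in post:
                xi = [1.0] + list(means[g])  # extended mean vector [1, mu]
                scale = gamma / variances[g][i]
                for r in range(d + 1):
                    k_i[r] += scale * o[i] * xi[r]
                    for c in range(d + 1):
                        g_i[r][c] += scale * xi[r] * xi[c]
        w.append(solve(g_i, k_i))  # w_i = G_i^{-1} k_i
    return w


def apply_transform(w: Matrix, means: List[Vector]) -> List[Vector]:
    """Adapted mean = W @ [1, mu] for every Gaussian (variances are not adapted)."""
    out = []
    for mu in means:
        xi = [1.0] + list(mu)
        out.append([sum(wr[c] * xi[c] for c in range(len(xi))) for wr in w])
    return out


def iterative_mllr(frames: List[Frame], means: List[Vector],
                   variances: List[Vector], iterations: int = 3) -> List[Vector]:
    """Repeat MLLR, seeding each pass with the previous pass's adapted means."""
    current = [list(mu) for mu in means]
    for _ in range(iterations):
        w = mllr_step(frames, current, variances)
        current = apply_transform(w, current)
    return current


if __name__ == "__main__":
    # Toy 1-D "normal speech" means; the toy "NAM" frames are 2 * mu + 1.
    means = [[0.0], [1.0], [3.0]]
    variances = [[1.0], [1.0], [1.0]]
    frames = [([2 * m[0] + 1], [g]) for g, m in enumerate(means)]
    print(iterative_mllr(frames, means, variances))
```

With these toy values the frames are an exact affine function of the original means, so the first pass recovers a bias of 1 and a scale of 2 and later passes leave the means unchanged. On real data, it is the recomputed occupation probabilities that allow additional iterations to change the estimate.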

Keywords

hidden Markov models; iterative methods; regression analysis; speech recognition; HMM retraining; NAM recognition; iterative supervised MLLR; iterative supervised adaptation; nonaudible murmur recognition; normal-speech clean initial models; recognition accuracy; regression tree classes; Head; Hidden Markov models; Iterative methods; Maximum likelihood linear regression; Microphones; Privacy; Regression tree analysis; Speech recognition; Training data; Working environment noise

fLanguage

English

Publisher

IEEE

Conference_Title

Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on

Print_ISBN

0-7803-7980-2

Type

conf

DOI

10.1109/ASRU.2003.1318406

Filename

1318406