Regional Information Center for Science and Technology - A Corpus-Based Approach to Speech Enhancement From Nonstationary Noise

DocumentCode :

1290794

Title :

A Corpus-Based Approach to Speech Enhancement From Nonstationary Noise

Author :

Ming, Ji ; Srinivasan, Ramji ; Crookes, Danny

Author_Institution :

Sch. of Electron., Electr. Eng., & Comput. Sci., Queen's Univ. Belfast, Belfast, UK

Volume :

Issue :

fYear :

2011

fDate :

5/1/2011

Firstpage :

822

Lastpage :

836

Abstract :

Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
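The longest-matching-segment idea in the abstract can be illustrated with a toy sketch. It is not the authors' method: the paper's statistical model and missing-feature noise compensation are replaced here by a plain squared-distance threshold, and the names `frames_match`, `longest_matching_segments`, `estimate_clean`, and the parameter `tol` are hypothetical, introduced only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]
Hit = Tuple[int, int, int]  # (segment length, corpus sentence index, corpus frame index)


def frames_match(a: Frame, b: Frame, tol: float) -> bool:
    """Assumed frame test: squared Euclidean distance within tol.

    This stands in for the paper's statistical matching, which is not reproduced.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) <= tol


def longest_matching_segments(
    noisy: Sequence[Frame], corpus: Sequence[Sequence[Frame]], tol: float
) -> List[Optional[Hit]]:
    """For each noisy frame, find the longest matching corpus segment covering it."""
    best: List[Optional[Hit]] = [None] * len(noisy)
    for s, sent in enumerate(corpus):
        for i in range(len(noisy)):
            for j in range(len(sent)):
                # Skip starts that merely continue a run begun one frame earlier.
                if i > 0 and j > 0 and frames_match(noisy[i - 1], sent[j - 1], tol):
                    continue
                # Extend the aligned run for as long as consecutive frames match.
                length = 0
                while (
                    i + length < len(noisy)
                    and j + length < len(sent)
                    and frames_match(noisy[i + length], sent[j + length], tol)
                ):
                    length += 1
                # Every noisy frame in the run keeps the longest segment seen so far.
                for k in range(length):
                    current = best[i + k]
                    if current is None or length > current[0]:
                        best[i + k] = (length, s, j + k)
    return best


def estimate_clean(
    noisy: Sequence[Frame], corpus: Sequence[Sequence[Frame]], tol: float
) -> List[List[float]]:
    """Replace each noisy frame with the corpus frame from its longest match."""
    estimate = []
    for frame, hit in zip(noisy, longest_matching_segments(noisy, corpus, tol)):
        if hit is None:
            estimate.append(list(frame))  # no match found: keep the noisy frame
        else:
            _, s, j = hit
            estimate.append(list(corpus[s][j]))
    return estimate
```

In this sketch a frame covered by both a 3-frame and a 2-frame match takes its estimate from the 3-frame segment, mirroring the abstract's argument that longer segments are identified more reliably than shorter ones.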

Keywords :

feature extraction; noise; pattern matching; speaker recognition; speech enhancement; statistical analysis; TIMIT database; clean speech sentence recognition; corpus sentence; corpus speech; fast-varying noise estimation; matching segment; missing-feature based noise compensation; nonstationary noise; speech estimation; statistical model; temporal dynamics; Corpus-based speech modeling; longest matching segment; speech separation

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2064312

Filename :

5545372

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1290794