Regional Information Center for Science and Technology - CLOSE—A Data-Driven Approach to Speech Separation

DocumentCode :

78988

Title :

CLOSE—A Data-Driven Approach to Speech Separation

Author :

Ji Ming ; Srinivasan, Rajagopalan ; Crookes, D. ; Jafari, Aghil

Author_Institution :

Sch. of Electron., Electr. Eng. & Comput. Sci., Queen's Univ. Belfast, Belfast, UK

Volume :

Issue :

fYear :

2013

fDate :

July 2013

Firstpage :

1355

Lastpage :

1368

Abstract :

This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
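
Illustrative sketch (not from the paper): the abstract's central idea, matching a mixed segment against a composite of two training segments and preferring the longest match, can be shown with a toy search. Everything here is an assumption made for illustration: the function names, the toy feature vectors, the log-max mixing approximation in `compose`, and the fixed tolerance `tol`. The paper itself describes a statistical approach to identifying the longest matching segments; this sketch substitutes a simple brute-force threshold test.

```python
def compose(frame_a, frame_b):
    # Assumed mixing model: in a log-spectral domain the louder
    # source dominates each bin (log-max approximation).
    return tuple(max(a, b) for a, b in zip(frame_a, frame_b))


def frame_distance(x, y):
    # Largest per-bin deviation between two frames.
    return max(abs(a - b) for a, b in zip(x, y))


def longest_match_from(mixture, t, train_a, train_b, tol):
    """Return (length, i, j): the longest run starting at mixture[t] that is
    matched by composing train_a[i:i+length] with train_b[j:j+length]."""
    best = (0, None, None)
    for i in range(len(train_a)):
        for j in range(len(train_b)):
            length = 0
            while (t + length < len(mixture)
                   and i + length < len(train_a)
                   and j + length < len(train_b)
                   and frame_distance(
                       mixture[t + length],
                       compose(train_a[i + length], train_b[j + length]),
                   ) <= tol):
                length += 1
            if length > best[0]:
                best = (length, i, j)
    return best


# Hypothetical toy data: two bins per frame, one speaker active in each bin.
train_a = [(2, 0), (9, 0), (2, 0), (3, 0), (4, 0)]
train_b = [(0, 5), (0, 1), (0, 7)]
mixture = [compose(a, b) for a, b in zip(train_a[2:5], train_b[0:3])]

length, i, j = longest_match_from(mixture, 0, train_a, train_b, tol=0.1)
print((length, i, j))  # (3, 2, 0)

# The matched training segments serve as the separated estimates.
estimate_a = train_a[i:i + length]
estimate_b = train_b[j:j + length]
```

In this toy data the first mixed frame alone is matched equally well by `train_a[0]` and `train_a[2]`, so a one-frame match is ambiguous. Extending the match to three frames leaves only the offset pair `(2, 0)`, which mirrors the abstract's point that longer matched segments reduce uncertainty about the constituent training segments.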

Keywords :

source separation; speech recognition; statistical analysis; CLOSE method; arbitrary temporal dynamics; composition of longest segment approach; data-driven approach; large-vocabulary continuous speech recognition; long-range temporal dynamic model; matching composite training segments; mixed speech segment; simultaneous large-vocabulary speech utterances; single-channel speech separation; speech signal separation; statistical approach; Grammar; Hidden Markov models; Psychoacoustic models; Speech; Speech recognition; Training; Vocabulary; Co-channel speech; longest matching segment; speaker identification; speech recognition; speech separation; temporal dynamics;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2013.2250959

Filename :

6473839

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=78988