DocumentCode :
78988
Title :
CLOSE—A Data-Driven Approach to Speech Separation
Author :
Ming, Ji ; Srinivasan, Rajagopalan ; Crookes, D. ; Jafari, Aghil
Author_Institution :
Sch. of Electron., Electr. Eng. & Comput. Sci., Queen's Univ. Belfast, Belfast, UK
Volume :
21
Issue :
7
fYear :
2013
fDate :
July 2013
Firstpage :
1355
Lastpage :
1368
Abstract :
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Keywords :
source separation; speech recognition; statistical analysis; CLOSE method; arbitrary temporal dynamics; composition of longest segment approach; data-driven approach; large-vocabulary continuous speech recognition; long-range temporal dynamic model; matching composite training segments; mixed speech segment; simultaneous large-vocabulary speech utterances; single-channel speech separation; speech signal separation; statistical approach; Grammar; Hidden Markov models; Psychoacoustic models; Speech; Speech recognition; Training; Vocabulary; Co-channel speech; longest matching segment; speaker identification; speech recognition; speech separation; temporal dynamics;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2250959
Filename :
6473839