Combining Machine Learning and Computational Auditory Scene Analysis to Separate Monaural Speech of Two-Talker

Author

Li, Peng ; Guan, Yong ; Liu, Wenju ; Xu, Bo

Author_Institution

Digital Media Content Technol. Res. Center, Chinese Acad. of Sci., Beijing

fYear

2007

fDate

Aug. 30 2007-Sept. 1 2007

Firstpage

280

Lastpage

284

Abstract

Monaural speech separation is one of the most difficult problems in speech signal processing. This paper proposes a new method that combines machine learning with computational auditory scene analysis (CASA) to separate monaural two-talker speech. Machine learning is used to learn the grouping cues from isolated, clean single-speaker data. Separation is then achieved by using a factorial-max vector quantization model (MAXVQ) to infer the masking signals needed for resynthesis. Experiments on a standard corpus show that the proposed method separates the mixed speech of two speakers well and clearly improves the SNR of the separated speech.
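
The mask inference step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only the general factorial-max idea, namely that a log-spectral frame of the mixture is approximated by the elementwise maximum of one codeword per speaker, and that the binary mask marks the bins where the first speaker's codeword dominates. The function name `maxvq_infer`, the toy codebooks, and the squared-error criterion are all hypothetical choices made for illustration.

```python
from itertools import product


def maxvq_infer(mix_frame, codebook_a, codebook_b):
    """Pick one codeword per speaker whose elementwise max best fits the frame.

    Assumption: frames and codewords are log-spectral vectors of equal length.
    Returns (index_a, index_b, mask), where mask[k] is 1 if speaker A's
    codeword is at least as large as speaker B's in frequency bin k.
    """
    best = None
    # Exhaustive search over all codeword pairs (fine for toy codebooks only).
    for (i, a), (j, b) in product(enumerate(codebook_a), enumerate(codebook_b)):
        predicted = [max(x, y) for x, y in zip(a, b)]
        error = sum((m - p) ** 2 for m, p in zip(mix_frame, predicted))
        if best is None or error < best[0]:
            best = (error, i, j)
    _, i, j = best
    mask = [1 if x >= y else 0 for x, y in zip(codebook_a[i], codebook_b[j])]
    return i, j, mask


# Hypothetical toy codebooks, two codewords per speaker, three frequency bins.
codebook_a = [[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]]
codebook_b = [[1.0, 0.0, 6.0], [2.0, 2.0, 2.0]]
mix = [5.0, 1.0, 6.0]

i, j, mask = maxvq_infer(mix, codebook_a, codebook_b)
print(i, j, mask)
```

In a full system the mask for each frame would be applied to the mixture's time-frequency representation before resynthesis; the exhaustive pair search shown here grows with the product of the codebook sizes and would need a more efficient search for codebooks of realistic size.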

Keywords

learning (artificial intelligence); speaker recognition; speech processing; vector quantisation; computational auditory scene analysis; factorial-max vector quantization model; machine learning; monaural speech separation; speech signal processing; Automation; Humans; Image analysis; Machine learning; Pattern recognition; Prototypes; Speech analysis; Speech coding; Speech processing; Timbre

fLanguage

English

Publisher

IEEE

Conference_Titel

Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-1610-3

Electronic_ISBN

978-1-4244-1611-0

Type

conf

DOI

10.1109/NLPKE.2007.4368044

Filename

4368044