Regional Information Center for Science and Technology (RICeST) - Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods

DocumentCode :

3752071

Title :

Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods

Author :

Naoaki Hashimoto;Kazumasa Yamamoto;Seiichi Nakagawa

Author_Institution :

Department of Computer Science and Engineering, Toyohashi University of Technology, Japan

Year :

2015

Firstpage :

Lastpage :

Abstract :

We investigated speech recognition methods for mixed speech and music that remove only the music, based on non-negative matrix factorization (NMF). In this paper, we introduced the Euclidean distance between logarithmic spectra, D_LOG, as a distance measure for source separation, which may correspond to the distance measure used in speech recognition, and compared it with traditional distance measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. We improved speech recognition performance by pooling the estimated speech, the mixed sound, and clean speech to train the acoustic model. For isolated word recognition with NMF using D_LOG, we obtained an improvement over the baseline. Using the Itakura-Saito divergence and the "clean, multi-condition and noise-adaptive training model", we reduced the word error rate by 54.7% relative to the "multi-condition training model" on average, raising the word recognition rate from 57.6% to 80.8%.
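
The quantities named in the abstract can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The Kullback-Leibler and Itakura-Saito forms below are the standard ones used as NMF cost functions. The abstract does not give the formula for D_LOG, so `log_spectral_distance` is only one plausible reading (squared Euclidean distance between log spectra). All function names and the example spectra are hypothetical. The last function checks the reported relative error reduction from the two recognition rates.

```python
import math


def kl_divergence(v, v_hat):
    """Generalized Kullback-Leibler divergence between two positive spectra."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(v, v_hat))


def is_divergence(v, v_hat):
    """Itakura-Saito divergence between two positive spectra."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(v, v_hat))


def log_spectral_distance(v, v_hat):
    """ASSUMPTION: squared Euclidean distance between log spectra.

    The abstract only names D_LOG; the exact definition in the paper
    may differ (for example in scaling or in the use of a square root).
    """
    return sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(v, v_hat))


def relative_error_reduction(rate_before, rate_after):
    """Relative reduction (in %) of the error rate, given recognition rates in %."""
    err_before = 100.0 - rate_before
    err_after = 100.0 - rate_after
    return 100.0 * (err_before - err_after) / err_before


# Hypothetical spectrum frame and a hypothetical NMF approximation of it.
v = [1.0, 2.0, 4.0]
v_hat = [1.0, 1.0, 8.0]

print(kl_divergence(v, v_hat))
print(is_divergence(v, v_hat))
print(log_spectral_distance(v, v_hat))

# Recognition rates taken from the abstract: 57.6% and 80.8%.
print(round(relative_error_reduction(57.6, 80.8), 1))  # → 54.7
```

Under this reading, the Itakura-Saito divergence and the log-spectral distance depend only on the ratio of the two spectra, so both are unchanged when the two spectra are scaled by the same factor, whereas the Kullback-Leibler divergence grows with the scale.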

Keywords :

Decision support systems

Publisher :

IEEE

Conference_Title :

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Type :

conf

DOI :

10.1109/APSIPA.2015.7415319

Filename :

7415319

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3752071