DocumentCode :
3752071
Title :
Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods
Author :
Naoaki Hashimoto;Kazumasa Yamamoto;Seiichi Nakagawa
Author_Institution :
Department of Computer Science and Engineering, Toyohashi University of Technology, Japan
fYear :
2015
Firstpage :
27
Lastpage :
30
Abstract :
We investigated speech recognition methods for mixed speech and music that remove only the music, based on non-negative matrix factorization (NMF). We introduce the Euclidean distance between logarithmic spectra (DLOG) as a distance measure for source separation, chosen because it may correspond to the distance measure used in speech recognition, and compare it with traditional measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. We further improved speech recognition performance by pooling the estimated speech, the mixed sound, and clean speech to train the acoustic model. For isolated word recognition, NMF with DLOG improved on the baseline. Using the Itakura-Saito divergence with the "clean, multi-condition and noise-adaptive training model", we reduced the word error rate by 54.7% relative to the "multi-condition training model", raising the average word recognition rate from 57.6% to 80.8%.
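The three distance measures named in the abstract, and the arithmetic behind the reported relative error reduction, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of DLOG, so the squared log-spectral difference used here is an assumption, and the function names and the `EPS` floor are hypothetical. The KL and Itakura-Saito forms are the standard ones used as NMF cost functions.

```python
import numpy as np

EPS = 1e-12  # hypothetical floor to avoid log(0) and division by zero


def kl_divergence(v, v_hat):
    """Generalized Kullback-Leibler divergence between non-negative spectra."""
    v = np.maximum(np.asarray(v, dtype=float), EPS)
    v_hat = np.maximum(np.asarray(v_hat, dtype=float), EPS)
    return float(np.sum(v * np.log(v / v_hat) - v + v_hat))


def is_divergence(v, v_hat):
    """Itakura-Saito divergence between non-negative spectra."""
    v = np.maximum(np.asarray(v, dtype=float), EPS)
    v_hat = np.maximum(np.asarray(v_hat, dtype=float), EPS)
    ratio = v / v_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def dlog_distance(v, v_hat):
    """Squared Euclidean distance between log spectra (assumed form of DLOG)."""
    v = np.maximum(np.asarray(v, dtype=float), EPS)
    v_hat = np.maximum(np.asarray(v_hat, dtype=float), EPS)
    return float(np.sum((np.log(v) - np.log(v_hat)) ** 2))


def relative_wer_reduction(acc_before, acc_after):
    """Relative word error rate reduction (%) from two recognition rates (%)."""
    wer_before = 100.0 - acc_before
    wer_after = 100.0 - acc_after
    return 100.0 * (wer_before - wer_after) / wer_before
```

With the figures quoted in the abstract, `relative_wer_reduction(57.6, 80.8)` evaluates to about 54.7: the word error rate falls from 42.4% to 19.2%, and 23.2 / 42.4 is roughly 0.547.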
Keywords :
Decision support systems
Publisher :
IEEE
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type :
conf
DOI :
10.1109/APSIPA.2015.7415319
Filename :
7415319