Title :
Voice-Melody Transcription Under a Speech Recognition Framework
Author :
Dan-ning Jiang; Michael Picheny; Yong Qin
Author_Institution :
IBM China Res. Lab, China
Abstract :
This paper presents a robust voice-melody transcription system built on a speech recognition framework. While many previous voice-melody transcription systems have used non-statistical approaches, statistical recognition technology can potentially achieve more robust results. A cepstrum-based acoustic model is employed to avoid the hard decisions required by explicit voiced-unvoiced segmentation and pitch extraction, and a key-independent 4-gram language model is employed to capture the prior probabilities of different melodic sequences. The system is evaluated on both note recognition error rate and end-to-end query-by-humming performance, and the results are compared with three other voice-melody transcription systems. Experiments show that the system is state-of-the-art: it is much more robust than the other systems on data containing noise, and close to the best of all the systems on the clean data set.
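The idea of a key-independent 4-gram language model over melodic sequences can be illustrated with a minimal sketch. The abstract does not say how key independence is obtained; the sketch below assumes one common option, representing a melody by the semitone intervals between successive notes, so that transposing the melody leaves the sequence unchanged. The function names, the MIDI note numbers, and the unsmoothed maximum-likelihood estimate are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def to_intervals(midi_notes):
    """Map absolute MIDI pitches to successive semitone intervals.

    Assumption: key independence is modelled by intervals; the paper's
    actual representation is not given in the abstract.
    """
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]


def train_ngrams(sequences, n=4):
    """Count n-grams and their (n-1)-symbol contexts over all sequences."""
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            ngram = tuple(seq[i:i + n])
            ngram_counts[ngram] += 1
            context_counts[ngram[:-1]] += 1
    return ngram_counts, context_counts


def ngram_probability(ngram, ngram_counts, context_counts):
    """Unsmoothed maximum-likelihood estimate of P(last symbol | context)."""
    context = tuple(ngram[:-1])
    if context_counts[context] == 0:
        return 0.0
    return ngram_counts[tuple(ngram)] / context_counts[context]


# The same five-note phrase in two keys (hypothetical training data).
melody_c = [60, 62, 64, 65, 67]
melody_d = [note + 2 for note in melody_c]

interval_sequences = [to_intervals(melody_c), to_intervals(melody_d)]
ngram_counts, context_counts = train_ngrams(interval_sequences, n=4)
prob = ngram_probability((2, 2, 1, 2), ngram_counts, context_counts)
```

Because both transpositions map to the same interval sequence, they contribute to the same 4-gram count. A practical model would also need smoothing for unseen 4-grams, which this sketch omits.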
Keywords :
acoustic signal processing; speech processing; speech recognition; statistics; cepstrum-based acoustic model; key-independent 4-gram language model; melodic sequences; pitch extraction; query-by-humming end-to-end performance; recognition error rate; speech recognition framework; statistical recognition technology; voice-melody transcription system; voiced-unvoiced segmentation; Cepstral analysis; Data mining; Databases; Error analysis; Hidden Markov models; Humans; Music information retrieval; Noise robustness; Probability; Speech recognition; Query-by-Humming; voice-melody transcription;
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.366988