Title :
A statistical multidimensional humming transcription using phone level hidden Markov models for query by humming systems
Author :
Shih, Hsuan-Huei ; Narayanan, Shrikanth S. ; Kuo, C.-C. Jay
Author_Institution :
Integrated Media Systems Center, University of Southern California, Los Angeles, CA, USA
Abstract :
A new phone-level hidden Markov model (HMM) approach to the transcription of human humming is proposed. A music note has two important attributes: pitch and duration. The proposed system generates multidimensional humming transcriptions that contain both pitch and duration information. Query by humming provides a natural means for content-based retrieval from music databases, and this research provides a robust front end for such an application. Each note segment in the humming waveform is modeled by phone-level HMMs. The duration of the note segment is then labeled by a duration model, and the pitch of the note is modeled by a pitch model based on a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained on data obtained from eight human subjects, and an overall correct recognition rate of around 84% is demonstrated.
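As a rough illustration of the pitch-labeling step only, the sketch below scores a note's fundamental frequency against one-dimensional Gaussian mixture models, one per candidate note, and picks the most likely label. It is not the authors' implementation: the note segmentation is assumed to be done already, durations are passed through unlabeled, and the names `hz_to_midi`, `gmm_log_likelihood`, `label_pitch`, `transcribe`, and every parameter in `PITCH_MODELS` are hypothetical values chosen for the example.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a fundamental frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def gmm_log_likelihood(x, components):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture.

    components is a list of (weight, mean, std) tuples whose weights sum to 1.
    """
    total = 0.0
    for weight, mean, std in components:
        z = (x - mean) / std
        total += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return math.log(total) if total > 0.0 else float("-inf")


def label_pitch(f0_hz, pitch_models):
    """Return the note name whose mixture model gives the highest likelihood."""
    x = hz_to_midi(f0_hz)
    return max(pitch_models, key=lambda name: gmm_log_likelihood(x, pitch_models[name]))


def transcribe(segments, pitch_models):
    """Map already-segmented notes, given as (f0_hz, duration_s), to (label, duration_s)."""
    return [(label_pitch(f0, pitch_models), dur) for f0, dur in segments]


# Hypothetical models: a narrow and a wide component centred on each note.
# In a trained system these parameters would be estimated from humming data.
PITCH_MODELS = {
    "A4": [(0.7, 69.0, 0.3), (0.3, 69.0, 0.8)],
    "B4": [(0.7, 71.0, 0.3), (0.3, 71.0, 0.8)],
    "C5": [(0.7, 72.0, 0.3), (0.3, 72.0, 0.8)],
}

if __name__ == "__main__":
    print(transcribe([(440.0, 0.50), (500.0, 0.25)], PITCH_MODELS))
```

Working on a semitone (MIDI) scale rather than raw Hz keeps the spacing between adjacent notes uniform, which is why the hypothetical mixture means sit at integer values.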
Keywords :
Gaussian processes; audio databases; audio signal processing; content-based retrieval; hidden Markov models; multidimensional systems; music; speaker recognition; Gaussian mixture model; content-based retrieval; duration; humming systems; humming waveform; music databases; music note; phone level hidden Markov models; pitch; query; real-time recognition; recognition rate; statistical multidimensional humming transcription; Content based retrieval; Databases; Electronic mail; Hidden Markov models; Humans; Instruments; Multidimensional signal processing; Multidimensional systems; Music information retrieval; Robustness;
Conference_Title :
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7965-9
DOI :
10.1109/ICME.2003.1220854