Title :
Modulation-scale analysis for content identification
Author :
Sukittanon, Somsak ; Atlas, Les E. ; Pitton, James W.
Author_Institution :
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA
Abstract :
For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed.
Keywords :
audio signal processing; distortion; frequency modulation; identification; music; signal classification; audio pairwise comparisons; content identification; content-based audio identification; frequency distortion; long-term feature analysis; modulation frequency perception; modulation-scale analysis; nonstationary signal classification; psychoacoustic results; time-frequency theory; Data mining; Feature extraction; Frequency; Humans; Information analysis; Multiple signal classification; Pattern analysis; Pattern classification; Signal analysis; Speech; 2-D features; Audio fingerprinting; audio identification; audio retrieval; auditory classification; feature normalization; long-term features; modulation features; modulation scale; modulation spectrum; pattern recognition; short-term features
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2004.833861