Title :
Audio Signal Representations for Indexing in the Transform Domain
Author :
Ravelli, Emmanuel ; Richard, Gaël ; Daudet, Laurent
Author_Institution :
Inst. Jean le Rond d'Alembert-LAM, Univ. Pierre et Marie Curie-Paris 6, Paris, France
fDate :
3/1/2010 12:00:00 AM
Abstract :
Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, without having to fully decode the signals. Here, we show that the representations used in standard transform-based audio codecs (e.g., MDCT for AAC, or hybrid PQF/MDCT for MP3) have a sufficient time resolution for some rhythmic features, but a poor frequency resolution, which prevents their use in tonality-related applications. Alternatively, a recently developed audio codec based on a sparse multi-scale MDCT transform has a good resolution both for time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification. We compare results obtained with this new audio codec and the two standard MP3 and AAC codecs, in terms of performance and computation time.
Keywords :
audio coding; audio databases; discrete cosine transforms; frequency-domain analysis; music; signal representation; signal resolution; time-domain analysis; AAC codec; MP3 codec; audio codec; audio signal indexing; audio signal representation; beat tracking; chord recognition; frequency-domain features; large signal database; lossy compression; musical genre classification; signal decoding; sparse multiscale MDCT transform; time-domain features; tonality-related application; transform domain; Audio databases; Code standards; Codecs; Decoding; Digital audio players; Frequency; Indexing; Signal representations; Signal resolution; Spatial databases; Audio coding; audio indexing; sparse representations; time–frequency representations;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2025099