DocumentCode :
1754808
Title :
Codebook-Based Audio Feature Representation for Music Information Retrieval
Author :
Vaizman, Yonatan ; McFee, Brian ; Lanckriet, Gert
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, San Diego, La Jolla, CA, USA
Volume :
22
Issue :
10
fYear :
2014
fDate :
Oct. 2014
Firstpage :
1483
Lastpage :
1493
Abstract :
Digital music has become prolific on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach an appropriate audience. When manual annotations and user preference data are lacking (e.g., for new artists), these systems must rely on content-based methods. Besides powerful machine learning tools for classification and retrieval, a key component for successful recommendation is the audio content representation. Good representations should capture informative musical patterns in the audio signal of songs. These representations should be concise, to enable efficient (low storage, easy indexing, fast search) management of huge music repositories, and should also be easy and fast to compute, to enable real-time interaction with a user supplying new songs to the system. Before designing new audio features, we explore the usage of traditional local features, while adding a stage of encoding with a pre-computed codebook and a stage of pooling to get compact vectorial representations. We experiment with different encoding methods, namely the LASSO, vector quantization (VQ) and cosine similarity (CS). We evaluate the representations' quality in two music information retrieval applications: query-by-tag and query-by-example. Our results show that concise representations can be used for successful performance in both applications. We recommend using top-τ VQ encoding, which consistently performs well in both applications, and requires much less computation time than the LASSO.
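The encode-then-pool pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that top-τ VQ assigns weight 1/τ to the τ codewords nearest (in Euclidean distance) to each local feature vector, and that pooling is a simple mean over time. The function names, the toy codebook, and the toy features are all hypothetical.

```python
import numpy as np

def top_tau_vq_encode(features, codebook, tau=2):
    """Encode each local feature vector against a codebook (illustrative only).

    features: (T, d) array of local feature vectors, one per time frame.
    codebook: (k, d) array of codewords.
    Assumption: each frame gets weight 1/tau on its tau nearest codewords.
    """
    # Squared Euclidean distance from every frame to every codeword: (T, k).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the tau nearest codewords for each frame: (T, tau).
    nearest = np.argsort(dists, axis=1)[:, :tau]
    codes = np.zeros((features.shape[0], codebook.shape[0]))
    rows = np.arange(features.shape[0])[:, None]
    codes[rows, nearest] = 1.0 / tau
    return codes

def mean_pool(codes):
    """Pool frame-level codes into one fixed-length vector (assumed mean pooling)."""
    return codes.mean(axis=0)

# Toy data: 4 codewords and 3 frames in a 2-dimensional feature space.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
features = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8]])

codes = top_tau_vq_encode(features, codebook, tau=2)
song_vector = mean_pool(codes)
print(song_vector)
```

Under these assumptions the pooled vector has one entry per codeword regardless of song length, which is what makes the representation compact and easy to index.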
Keywords :
audio coding; content-based retrieval; feature extraction; indexing; learning (artificial intelligence); music; query processing; recommender systems; vector quantisation; LASSO; audio content representation; audio encoding; automated recommendation systems; codebook-based audio feature representation; content based methods; cosine similarity; digital music; informative musical patterns; machine learning tools; manual annotations; music discovery; music information retrieval; music repository management; precomputed codebook; query-by-example; query-by-tag; real-time user interaction; song audio signal; top-τ VQ encoding; user preference data; vector quantization; vectorial representations; Dictionaries; Encoding; Hidden Markov models; Speech; Speech processing; Training; Vectors; Audio content representations; music information retrieval; music recommendation; sparse coding; vector quantization
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2337842
Filename :
6851913