DocumentCode
27697
Title
A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval
Author
Li Su; Chin-Chia Michael Yeh; Jen-Yu Liu; Ju-Chiang Wang; Yi-Hsuan Yang
Author_Institution
Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
Volume
16
Issue
5
fYear
2014
fDate
Aug. 2014
Firstpage
1188
Lastpage
1200
Abstract
There has been increasing attention to learning feature representations from complex, high-dimensional audio data for various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been used to represent music information as a term-document structure composed of elementary audio codewords. Despite the widespread use of this bag-of-frames (BoF) model, few attempts have been made to systematically compare its different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, this paper presents a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, considering different choices of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information at two levels of abstraction improves results on difficult tasks such as predominant instrument recognition; 2) tf-idf weighting and power normalization generally improve system performance; and 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
Keywords
information retrieval; music; unsupervised learning; BoF model; MIR; audio codewords; bag-of-frames representation; feature representations; music information modeling; music information retrieval problems; power normalization; tf-idf weighting; unsupervised feature learning; Frequency measurement; Matching pursuit algorithms; Mel frequency cepstral coefficient; Multiple signal classification; Music information retrieval; Spectrogram; Training; Bag-of-frames model; sparse coding
fLanguage
English
Journal_Title
IEEE Transactions on Multimedia
Publisher
IEEE
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2014.2311016
Filename
6763025