• DocumentCode
    27697
  • Title

    A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval

  • Author

    Li Su ; Chin-Chia Michael Yeh ; Jen-Yu Liu ; Ju-Chiang Wang ; Yi-Hsuan Yang

  • Author_Institution
    Res. Center for Inf. Technol. Innovation, Acad. Sinica, Taipei, Taiwan
  • Volume
    16
  • Issue
    5
  • fYear
    2014
  • fDate
    Aug. 2014
  • Firstpage
    1188
  • Lastpage
    1200
  • Abstract
    There has been increasing attention to learning feature representations from the complex, high-dimensional audio data used in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of this bag-of-frames (BoF) model, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, considering different methods of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information with two levels of abstraction improves results for difficult tasks such as predominant instrument recognition; 2) tf-idf weighting and power normalization improve system performance in general; 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
  • Keywords
    information retrieval; music; unsupervised learning; BoF model; MIR; audio codewords; bag-of-frames representation; feature representations; music information modeling; music information retrieval problems; power normalization; tf-idf weighting; unsupervised feature learning; Frequency measurement; Matching pursuit algorithms; Mel frequency cepstral coefficient; Multiple signal classification; Music information retrieval; Spectrogram; Training; Bag-of-frames model; music information retrieval; sparse coding; unsupervised feature learning;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2014.2311016
  • Filename
    6763025
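
The abstract names several stages of a BoF pipeline: codeword assignment, song-level pooling, tf-idf term weighting, and power normalization. The following is a minimal Python sketch of how such stages might be chained, not the authors' implementation. It assumes hard nearest-neighbor assignment, sum pooling, the textbook tf-idf formula log(N/df), and a power exponent of 0.5; none of these choices is stated in this record. All function names, the random codebook, and the random "frames" are hypothetical and serve only to make the example runnable.

```python
import numpy as np


def assign_codewords(frames, codebook):
    """Hard-assign each frame (row) to its nearest codeword (Euclidean)."""
    # Squared distances, shape (n_frames, n_codewords), via broadcasting.
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bof_histogram(frames, codebook):
    """Sum-pool the frame-level assignments into one count vector per song."""
    idx = assign_codewords(frames, codebook)
    return np.bincount(idx, minlength=codebook.shape[0]).astype(float)


def tfidf(counts):
    """Textbook tf-idf on a (n_songs, n_codewords) count matrix (assumed form)."""
    n_songs = counts.shape[0]
    # Term frequency: counts normalized by the number of frames in each song.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Document frequency: number of songs in which each codeword occurs.
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_songs / np.maximum(df, 1))
    return tf * idf


def power_normalize(x, alpha=0.5):
    """Signed power normalization; alpha = 0.5 is an assumed default."""
    return np.sign(x) * np.abs(x) ** alpha


# Toy data: 8 hypothetical codewords over 4-dimensional frame features,
# and 3 "songs" of 50 random frames each.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
songs = [rng.normal(size=(50, 4)) for _ in range(3)]

counts = np.stack([bof_histogram(s, codebook) for s in songs])
features = power_normalize(tfidf(counts))
```

Here `features` holds one row per song and one column per codeword. In practice the codebook would be learned from data (for example by k-means or sparse dictionary learning) rather than drawn at random, and the frames would be low-level audio features rather than noise.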