• DocumentCode
    2181620
  • Title
    Automatic Language Identification in music videos with low level audio and visual features
  • Author
    Chandrasekhar, Vijay ; Sargin, Mehmet Emre ; Ross, David A.
  • Author_Institution
    Google, Inc., Mountain View, CA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5724
  • Lastpage
    5727
  • Abstract
    Automatic Language Identification (LID) in music has received significantly less attention than LID in speech. Here, we study the problem of LID in music videos uploaded to YouTube. We use a "bag-of-words" approach based on state-of-the-art content-based audio-visual features and linear SVM classifiers for automatic LID. Our system obtains 48% accuracy on a corpus of 25,000 music videos spanning 25 languages.
  • Keywords
    Web sites; music; pattern classification; speech processing; support vector machines; video signal processing; YouTube; automatic language identification; bag-of-words approach; content based audio-visual features; linear SVM classifiers; music videos; spoken language processing; Accuracy; Histograms; Mel frequency cepstral coefficient; Pixel; Speech; Videos; Visualization; LID in music; audio-visual features
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947660
  • Filename
    5947660