DocumentCode
2181620
Title
Automatic Language Identification in music videos with low level audio and visual features
Author
Chandrasekhar, Vijay ; Sargin, Mehmet Emre ; Ross, David A.
Author_Institution
Google, Inc., Mountain View, CA, USA
fYear
2011
fDate
22-27 May 2011
Firstpage
5724
Lastpage
5727
Abstract
Automatic Language Identification (LID) in music has received significantly less attention than LID in speech. Here, we study the problem of LID in music videos uploaded to YouTube. We use a "bag-of-words" approach based on state-of-the-art content-based audio-visual features and linear SVM classifiers for automatic LID. Our system obtains 48% accuracy for a corpus of 25,000 music videos and 25 different languages.
Keywords
Web sites; music; pattern classification; speech processing; support vector machines; video signal processing; YouTube; automatic language identification; bag-of-words approach; content based audio-visual features; linear SVM classifiers; music videos; spoken language processing; Accuracy; Histograms; Mel frequency cepstral coefficient; Pixel; Speech; Videos; Visualization; LID in music; audio-visual features; automatic language identification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947660
Filename
5947660