DocumentCode :
2181620
Title :
Automatic Language Identification in music videos with low level audio and visual features
Author :
Chandrasekhar, Vijay ; Sargin, Mehmet Emre ; Ross, David A.
Author_Institution :
Google, Inc., Mountain View, CA, USA
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
5724
Lastpage :
5727
Abstract :
Automatic Language Identification (LID) in music has received significantly less attention than LID in speech. Here, we study the problem of LID in music videos uploaded on YouTube. We use a "bag-of-words" approach based on state-of-the-art content based audio-visual features and linear SVM classifiers for automatic LID. Our system obtains 48% accuracy for a corpus of 25000 music videos and 25 different languages.
Keywords :
Web sites; music; pattern classification; speech processing; support vector machines; video signal processing; YouTube; automatic language identification; bag-of-words approach; content based audio-visual features; linear SVM classifiers; music videos; spoken language processing; Accuracy; Histograms; Mel frequency cepstral coefficient; Pixel; Speech; Videos; Visualization; LID in music; audio-visual features
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947660
Filename :
5947660