DocumentCode
259404
Title
Speech/Music Classification of Short Audio Segments
Author
Hirvonen, Toni
Author_Institution
Dolby Labs., Inc., Stockholm, Sweden
fYear
2014
fDate
10-12 Dec. 2014
Firstpage
135
Lastpage
138
Abstract
Speech/music classification of digital audio has been both a popular research topic in academia and increasingly utilized in industry. Most common methods use carefully hand-crafted features with Gaussian mixture models. To achieve the best performance, some of these features require a long latency due to look-ahead and/or incur a long onset error. This paper takes a different approach to the problem by exploring some recent trends in machine learning that have led to improvements in other fields. Specifically, it is shown that comparable performance can be achieved by analyzing only segments on the order of tens of milliseconds, without using the preceding or following audio. This is done with a method that allows automatic generation of arbitrarily many features from preprocessed spectrograms.
Keywords
Gaussian processes; audio signal processing; feature extraction; learning (artificial intelligence); mixture models; signal classification; speech processing; Gaussian mixture model; automatic generation; digital audio segment; hand-crafted features; machine learning; music classification; spectrogram; speech classification; Accuracy; Encoding; Spectrogram; Speech; Speech recognition; Support vector machines; Training; audio classification; feature learning; sparse coding;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia (ISM), 2014 IEEE International Symposium on
Conference_Location
Taichung
Print_ISBN
978-1-4799-4312-8
Type
conf
DOI
10.1109/ISM.2014.27
Filename
7033009