  • DocumentCode
    259404
  • Title
    Speech/Music Classification of Short Audio Segments
  • Author
    Hirvonen, Toni
  • Author_Institution
    Dolby Labs., Inc., Stockholm, Sweden
  • fYear
    2014
  • fDate
    10-12 Dec. 2014
  • Firstpage
    135
  • Lastpage
    138
  • Abstract
    Speech/music classification of digital audio has been a popular research topic in academia and is increasingly used in industry. Most conventional methods combine carefully hand-crafted features with Gaussian Mixture Models (GMMs). To achieve the best performance, some of these features require a long latency due to look-ahead, a long onset error, or both. This paper takes a different approach to the problem by exploring recent trends in machine learning that have led to improvements in other fields. Specifically, it shows that comparable performance can be achieved by analyzing only segments on the order of tens of milliseconds, without using any preceding or following audio. This is done with a method that automatically generates arbitrarily many features from preprocessed spectrograms.
  • Keywords
    Gaussian processes; audio signal processing; feature extraction; learning (artificial intelligence); mixture models; signal classification; speech processing; Gaussian mixture model; automatic generation; digital audio segment; hand-crafted features; machine learning; music classification; spectrogram; speech classification; Accuracy; Encoding; Spectrogram; Speech; Speech recognition; Support vector machines; Training; audio classification; feature learning; sparse coding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia (ISM), 2014 IEEE International Symposium on
  • Conference_Location
    Taichung
  • Print_ISBN
    978-1-4799-4312-8
  • Type
    conf
  • DOI
    10.1109/ISM.2014.27
  • Filename
    7033009