• DocumentCode
    249363
  • Title
    Speech and Singing Discrimination for Audio Data Indexing
  • Author
    Wei-Ho Tsai ; Cin-Hao Ma
  • Author_Institution
    Dept. of Electron. Eng., Nat. Taipei Univ. of Technol., Taipei, Taiwan
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    276
  • Lastpage
    280
  • Abstract
    This study investigates techniques for automatically discriminating speech from singing voices for audio data indexing. We propose a discrimination system based on both timbre and pitch feature analyses. For the timbre features, voice recordings are converted into Mel-Frequency Cepstral Coefficients and their first derivatives and then analyzed using Gaussian mixture models. For the pitch features, we represent voice recordings as MIDI note sequences and then use bigram models to analyze how the notes change over time. Our experiments, conducted on a database that includes 600 test recordings from 10 subjects, show that the proposed system achieves 94.3% accuracy.
  • Keywords
    Gaussian processes; audio signal processing; cepstral analysis; feature extraction; indexing; mixture models; speech recognition; Gaussian mixture model; MIDI note sequences; audio data indexing; bigram model; dynamic change information; mel-frequency cepstral coefficients; pitch feature analyses; singing discrimination; singing voices; speech discrimination; timbre feature analyses; voice recordings; Accuracy; Feature extraction; Speech; Speech processing; Timbre; pitch; singing; speech; timbre; voice discrimination
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2014.138
  • Filename
    6906790
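
The pitch-based analysis mentioned in the abstract (MIDI note sequences scored with bigram models) can be illustrated with a minimal sketch. This is not the authors' implementation: the note vocabulary, the add-one smoothing, the function names (train_bigram, score, classify), and the toy note sequences are all assumptions made for illustration, and the record does not say how the paper trains or smooths its models.

```python
import math
from collections import Counter

# Hypothetical MIDI note vocabulary (C2..C6); the paper's range is not given here.
NOTE_RANGE = range(36, 85)


def train_bigram(sequences, vocab=NOTE_RANGE, alpha=1.0):
    """Return a function giving the smoothed log P(next note | previous note)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    size = len(vocab)

    def log_prob(prev, nxt):
        # Additive (add-alpha) smoothing is an assumption of this sketch.
        num = pair_counts[(prev, nxt)] + alpha
        den = prev_counts[prev] + alpha * size
        return math.log(num / den)

    return log_prob


def score(seq, log_prob):
    """Sum the bigram log-probabilities over consecutive note pairs."""
    return sum(log_prob(p, n) for p, n in zip(seq, seq[1:]))


def classify(seq, models):
    """Pick the class whose bigram model gives the highest log-likelihood."""
    return max(models, key=lambda label: score(seq, models[label]))


# Toy sequences invented for illustration only: held notes and small steps
# for singing, larger frame-to-frame jumps for speech.
singing_train = [
    [60, 60, 62, 62, 64, 64, 62, 62, 60, 60],
    [64, 64, 65, 65, 67, 67, 65, 65, 64, 64],
]
speech_train = [
    [55, 58, 54, 57, 53, 56, 52, 55, 51, 54],
    [57, 53, 56, 52, 58, 54, 57, 53, 55, 51],
]

models = {
    "singing": train_bigram(singing_train),
    "speech": train_bigram(speech_train),
}
```

With these toy models, classify([60, 60, 62, 62, 60, 60], models) selects the singing model and classify([55, 58, 54, 57, 53], models) selects the speech model. The abstract describes a system that uses both timbre and pitch analyses; this sketch covers only the pitch side.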