  • DocumentCode
    2701893
  • Title
    Segmental Modeling for Audio Segmentation
  • Author
    Aronowitz, Hagai
  • Author_Institution
    IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    Trainable speech/non-speech segmentation and music detection algorithms usually consist of a frame-based scoring phase combined with a smoothing phase. This paper suggests a framework in which both phases are explicitly unified in a segment-based classifier. We suggest a novel segment-based generative model in which audio segments are modeled as supervectors and each class (speech, silence, music) is modeled by a distribution over the supervector space. Segmental speech classes can then be modeled by generative models such as GMMs or can be classified by SVMs. Our suggested framework leads to a significant reduction in error rate.
  • Keywords
    Gaussian processes; audio signal processing; smoothing methods; speech processing; support vector machines; GMM; SVM; audio segmentation; detection algorithms; nonspeech segmentation; segment-based generative model; segmental modeling; smoothing phase; Broadcasting; Error analysis; Hidden Markov models; Mel frequency cepstral coefficient; Natural languages; Speaker recognition; Speech; Testing; GMM supervectors; Speech segmentation; music detection; voice activity detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP 2007)
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2007.366932
  • Filename
    4218120
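
The segment-level idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how supervectors are built, so this sketch assumes a simplified stand-in (per-dimension mean and standard deviation of a segment's frames), and it assumes a single diagonal Gaussian per class in place of a GMM or SVM. All function and class names, the synthetic features, and the variance floor are hypothetical.

```python
import numpy as np

def segment_supervector(frames):
    """Map a (n_frames, n_dims) segment to one fixed-length vector.

    Assumption: mean and standard deviation per dimension stand in for
    the supervector representation, which the abstract does not define.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

class DiagonalGaussian:
    """Distribution over the supervector space for a single class."""

    def __init__(self, vectors, var_floor=1e-3):
        self.mean = vectors.mean(axis=0)
        # Floor the variance so the log-likelihood stays finite.
        self.var = np.maximum(vectors.var(axis=0), var_floor)

    def log_likelihood(self, v):
        return -0.5 * np.sum(
            np.log(2.0 * np.pi * self.var) + (v - self.mean) ** 2 / self.var
        )

def train_class_models(segments_by_class):
    """Fit one Gaussian per class from lists of training segments."""
    return {
        label: DiagonalGaussian(
            np.stack([segment_supervector(s) for s in segments])
        )
        for label, segments in segments_by_class.items()
    }

def classify_segment(models, frames):
    """Score the whole segment at once; no separate smoothing pass."""
    v = segment_supervector(frames)
    return max(models, key=lambda label: models[label].log_likelihood(v))

# Synthetic 4-dimensional "features"; the class statistics are made up.
rng = np.random.default_rng(0)

def make_segments(mean, std, n_segments=20, n_frames=50, n_dims=4):
    return [rng.normal(mean, std, (n_frames, n_dims)) for _ in range(n_segments)]

models = train_class_models({
    "speech": make_segments(2.0, 1.0),
    "silence": make_segments(-2.0, 1.0),
    "music": make_segments(0.0, 3.0),
})
predicted = classify_segment(models, rng.normal(2.0, 1.0, (50, 4)))
print(predicted)
```

Because each decision is made on a whole segment rather than on individual frames, the scoring and smoothing steps described in the abstract collapse into a single classification step in this sketch.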