DocumentCode :
2701893
Title :
Segmental Modeling for Audio Segmentation
Author :
Aronowitz, Hagai
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
Trainable speech/non-speech segmentation and music detection algorithms usually consist of a frame-based scoring phase combined with a smoothing phase. This paper suggests a framework in which both phases are explicitly unified in a segment-based classifier. We suggest a novel segment-based generative model in which audio segments are modeled as supervectors and each class (speech, silence, music) is modeled by a distribution over the supervector space. Segmental speech classes can then be modeled by generative models such as GMMs or can be classified by SVMs. Our suggested framework leads to a significant reduction in error rate.
Keywords :
Gaussian processes; audio signal processing; smoothing methods; speech processing; support vector machines; GMM; SVM; audio segmentation; detection algorithms; nonspeech segmentation; segment based generative model; segmental modeling; smoothing phase; Broadcasting; Detection algorithms; Error analysis; Hidden Markov models; Mel frequency cepstral coefficient; Natural languages; Smoothing methods; Speaker recognition; Speech; Testing; GMM supervectors; Speech segmentation; music detection; segmental modeling; voice activity detection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.366932
Filename :
4218120