Title :
Pitch-based feature extraction for audio classification
Author :
Abu-El-Quran, Ahmad R. ; Goubran, Rafik A.
Author_Institution :
Dept. of Syst. & Comput. Eng., Carleton Univ., Ottawa, Ont., Canada
Abstract :
This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications and for talker location identification in audio conferencing systems equipped with microphone arrays. The proposed method splits the audio segment into small frames and detects the presence of pitch in each frame. The ratio of the number of frames in which pitch is detected to the total number of frames is defined as the pitch ratio and is used as the main feature to classify speech and non-speech segments. The performance of the proposed method is evaluated using a library of audio segments containing female and male speech, and non-speech segments such as computer fan noise, cocktail noise, footsteps, and traffic noise. It is shown that, for segments 0.5 seconds long, the proposed algorithm achieves a correct decision rate of 97% for speech segments and 98% for non-speech segments.
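The pitch-ratio feature described in the abstract can be illustrated with a minimal sketch. The abstract does not say which pitch detector, frame length, or decision threshold the authors use, so everything specific here is an assumption made for illustration: pitch detection by a peak in the normalized autocorrelation, a 30 ms frame, a 60 to 400 Hz pitch search range, and the hypothetical names `has_pitch`, `pitch_ratio`, `classify`, `threshold`, and `ratio_threshold`.

```python
def has_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, threshold=0.5):
    """Assumed detector: True if the normalized autocorrelation of the
    frame has a peak of at least `threshold` within the pitch-lag range."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy < 1e-12:                     # silent frame: no pitch
        return False
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        best = max(best, r)
    return best >= threshold


def pitch_ratio(signal, sample_rate, frame_ms=30.0):
    """Fraction of non-overlapping frames in which pitch is detected."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 2:
        return 0.0
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    pitched = sum(1 for f in frames if has_pitch(f, sample_rate))
    return pitched / len(frames)


def classify(signal, sample_rate, ratio_threshold=0.3):
    """Label a segment by thresholding its pitch ratio.
    `ratio_threshold` is an illustrative value, not taken from the paper."""
    ratio = pitch_ratio(signal, sample_rate)
    return "speech" if ratio >= ratio_threshold else "non-speech"
```

Under these assumptions, `signal` is a list of float samples for one segment (for example 0.5 seconds of audio), and `classify(signal, 8000)` returns a label for it. A strongly periodic segment should give a pitch ratio close to 1, while broadband noise should give a ratio close to 0; the thresholds would need tuning on real data.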
Keywords :
feature extraction; pattern classification; speech processing; speech recognition; speech synthesis; audio classification; audio segments; audio-conferencing system; microphone array; nonspeech segments; pitch ratio; pitch-based feature extraction; speech segments; Acoustic noise; Application software; Cameras; Feature extraction; Low pass filters; Microphone arrays; Noise measurement; Speech analysis; Speech enhancement; Systems engineering and theory;
Conference_Title :
Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings. The 2nd IEEE International Workshop on
Print_ISBN :
0-7803-8108-4
DOI :
10.1109/HAVE.2003.1244723