Title :
Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition
Author :
Do, Cong-Thanh ; Taghizadeh, Mohammad J. ; Garner, Philip N.
Author_Institution :
LIMSI, Orsay, France
Abstract :
This paper investigates the combination of cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. Test speech signals are recorded by a circular microphone array and are subsequently processed with superdirective beamforming and McCowan post-filtering. Training speech signals, from the multichannel overlapping Number corpus (MONC), are clean and non-overlapping. Cochlear implant-like speech processing, which is inspired by the speech processing strategy in cochlear implants, is applied to the training and test speech signals. Cepstral normalization, including cepstral mean and variance normalization (CMN and CVN), is applied to the training and test cepstra. Experiments show that applying either cepstral normalization or cochlear implant-like speech processing reduces the word error rates (WERs) of microphone array-based speech recognition. Combining cepstral normalization and cochlear implant-like speech processing further reduces the WERs when there is overlapping speech. Train/test mismatches are measured using the Kullback-Leibler divergence (KLD) between the global probability density functions (PDFs) of the training and test cepstral vectors. This measure reveals a reduction in train/test mismatch when either cepstral normalization or cochlear implant-like speech processing is used. It also reveals that combining these two processing steps further reduces the train/test mismatches as well as the WERs.
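The two quantities named in the abstract, cepstral mean and variance normalization and a KLD-based train/test mismatch measure, can be illustrated with a minimal sketch. The abstract does not say how the global PDFs are modeled or how the KLD is estimated, so the sketch below assumes a single diagonal-covariance Gaussian per data set and uses the closed-form KLD between two such Gaussians. The function names (column_stats, cmvn, diag_gaussian_kld), the synthetic data, and the choice to normalize each whole set as if it were one utterance are all hypothetical and are not taken from the paper.

```python
import math
import random


def column_stats(frames):
    """Per-coefficient mean and (population) variance over a list of frames."""
    n = len(frames)
    columns = list(zip(*frames))
    means = [sum(col) / n for col in columns]
    variances = [sum((x - m) ** 2 for x in col) / n
                 for col, m in zip(columns, means)]
    return means, variances


def cmvn(frames, eps=1e-8):
    """Cepstral mean and variance normalization over the given frames.

    Subtracting the mean is CMN; dividing by the standard deviation is CVN.
    """
    means, variances = column_stats(frames)
    stds = [math.sqrt(v) + eps for v in variances]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def diag_gaussian_kld(p_frames, q_frames, eps=1e-8):
    """KL(P || Q), with P and Q each fitted as a diagonal Gaussian.

    Assumption: a single Gaussian stands in for the "global PDF";
    the abstract does not specify the model actually used.
    """
    mu_p, var_p = column_stats(p_frames)
    mu_q, var_q = column_stats(q_frames)
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        vp, vq = vp + eps, vq + eps
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


# Synthetic stand-ins for cepstral vectors: 500 frames of 13 coefficients.
# The "test" set is shifted and scaled to mimic a channel/noise mismatch.
random.seed(0)
train = [[random.gauss(0.0, 1.0) for _ in range(13)] for _ in range(500)]
test = [[random.gauss(2.0, 3.0) for _ in range(13)] for _ in range(500)]

kld_before = diag_gaussian_kld(train, test)
kld_after = diag_gaussian_kld(cmvn(train), cmvn(test))
print(f"KLD before CMVN: {kld_before:.3f}")
print(f"KLD after CMVN:  {kld_after:.3e}")
```

Under this single-Gaussian assumption, normalization removes the first- and second-order mismatch by construction, so the measured KLD collapses toward zero. With real cepstra and a richer PDF model the reduction would be partial, which is consistent with the abstract reporting a reduction rather than an elimination of the mismatch.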
Keywords :
acoustic signal processing; array signal processing; cepstral analysis; cochlear implants; filtering theory; microphone arrays; probability; speech recognition; CMN; CVN; KLD; Kullback-Leibler divergence; MONC; McCowan postfiltering; PDF; WER; cepstral mean normalization; cepstral variance normalization; circular microphone array; cochlear implant-like speech processing; multichannel overlapping number corpus; overlapping speech; probability density function; speech signal recording; speech signal testing; speech signal training; superdirective beamforming; testing cepstral vector; training cepstral vector; Microphones; Speech; Speech processing; Testing; Training; Cepstral normalization; Microphone array speech recognition
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424211