DocumentCode :
1884292
Title :
Effective preprocessing of speech and acoustic features extraction for spoken language identification
Author :
Kumar, Abhijeet ; Hemani, H. ; Sakthivel, N. ; Chaturvedi, S.
Author_Institution :
Comput. Anal. Div., Bhabha Atomic Res. Center, Visakhapatnam, India
fYear :
2015
fDate :
6-8 May 2015
Firstpage :
81
Lastpage :
88
Abstract :
Language identification (LID) systems have become popular and indispensable in multilingual speech processing applications, where they serve as a preprocessing stage for both machine systems and human interfaces. Given a speech utterance, the system predicts the most likely language. The proposed LID system is based on Gaussian mixture models (GMMs): a language model is trained generatively on the acoustic features of each language. The acoustic approach requires only the digitized speech utterances and their language labels, which makes it computationally less expensive than alternative approaches that also require phonetic transcription of the speech. This paper investigates different preprocessing techniques for noise removal, speech activity detection (SAD), speaker normalization, and channel normalization. It also illustrates the extraction of cepstral features, which capture the phonetic characteristics of the signal. We give a comprehensive review of current trends in feature extraction and compare their results. Notably, shifted delta cepstral (SDC) features, a quintessential feature for LID systems derived from Mel frequency cepstral coefficients (MFCC), have been successfully tested with a GMM-based classifier. A comparative study of MFCC and SDC features in LID has been conducted and is presented.
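The abstract does not state how the SDC features are computed, so the following is only a minimal sketch of the commonly used definition, in which each output frame stacks k delta blocks, with block i equal to c(t + iP + d) - c(t + iP - d). The function name `shifted_delta_cepstra`, the default values d=1, P=3, k=7 (the widely cited 7-1-3-7 setup with 7 cepstral coefficients), and the edge-padding choice are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta blocks per frame (hypothetical helper).

    cepstra: array of shape (n_frames, n_coeffs), e.g. MFCC frames.
    Returns an array of shape (n_frames, n_coeffs * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]

    # Assumption: repeat the first/last frame at the edges so that every
    # index t + i*p - d and t + i*p + d stays inside the padded array.
    pad_before = d
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        plus_start = pad_before + i * p + d   # position of c(t + i*p + d) at t = 0
        minus_start = pad_before + i * p - d  # position of c(t + i*p - d) at t = 0
        plus = padded[plus_start:plus_start + n_frames]
        minus = padded[minus_start:minus_start + n_frames]
        blocks.append(plus - minus)

    # Concatenate the k delta blocks along the feature axis.
    return np.hstack(blocks)
```

With 7 cepstral coefficients per frame and the defaults above, each frame yields a 49-dimensional vector, which could then be fed to per-language GMMs. Because every block is a difference of two frames, a constant cepstral offset cancels out in the result.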
Keywords :
Gaussian processes; feature extraction; natural language processing; speech processing; GMM based LID system; Gaussian mixture model based LID system; Mel frequency cepstral features; acoustic feature extraction; channel normalization; digitized speech utterance; multilingual speech processing applications; noise removal; phonetic transcription; shifted delta cepstral; speaker normalization; speech activity detection; speech feature extraction; spoken language identification system; Feature extraction; Mel frequency cepstral coefficient; Noise; Speech; Training; Feature extraction; Gaussian mixture modeling; Language identification; Mel Frequency Cepstral Coefficient; Shifted delta cepstral; Speech activity detection; Vocal tract length normalization; cepstral mean subtraction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), 2015 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-9854-8
Type :
conf
DOI :
10.1109/ICSTM.2015.7225394
Filename :
7225394