A feature extraction method for speech recognition based on temporal tracking of clusters in spectro-temporal domain

Author

Esfandian, Nafiseh ; Razzazi, Farbod ; Behrad, Alireza

Author_Institution

Dept. of Electr. Eng., Islamic Azad Univ., Qaemshahr, Iran

fYear

2012

fDate

2-3 May 2012

Abstract

In this paper, a novel approach to secondary feature extraction is proposed, based on tracking clusters in the spectro-temporal domain. Because of the high dimensionality of the spectro-temporal feature space, this domain is unsuitable for practical speech recognition systems. To reduce the dimensionality of the feature space, the weighted K-means (WKM) clustering technique is applied to the spectro-temporal domain. The elements of the mean vectors and covariance matrices of the clusters form the feature vector of each frame. However, the cluster locations change gradually over time. The main idea is that the variations in cluster locations should be tracked temporally, frame by frame, and that the parameters of these variations should be included when extracting the secondary feature vector of each speech frame. Several models are used to register the clusters in each incoming frame. In addition, a new architecture is proposed that classifies the speech frames with a combined classifier using both tracked and non-tracked secondary features. The proposed feature vectors were evaluated on the classification of several subsets of phonemes from the TIMIT database. With tracked secondary feature vectors, the classification rate for voiced plosives improved to 77.4%, a relative improvement of 1.8% over non-tracked secondary feature vectors. The classification rates on the other subsets also improved.
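The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each frame is a set of 2-D points in the spectro-temporal domain with non-negative weights, and it approximates tracking by seeding each frame's clustering with the previous frame's centers, which is only one plausible reading of the registration models mentioned above. The names `weighted_kmeans`, `cluster_features`, and the toy frames are hypothetical.

```python
def weighted_kmeans(points, weights, init_centers, iters=20):
    """Weighted K-means on 2-D points; returns (centers, labels)."""
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: weighted mean of the points assigned to each cluster.
        new_centers = []
        for k in range(len(centers)):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            w = sum(weights[i] for i in idx)
            if w == 0:
                new_centers.append(centers[k])  # leave an empty cluster in place
                continue
            new_centers.append((
                sum(weights[i] * points[i][0] for i in idx) / w,
                sum(weights[i] * points[i][1] for i in idx) / w,
            ))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def cluster_features(points, weights, centers, labels):
    """Per cluster: weighted mean (2 values) and unique covariance terms (3 values)."""
    feats = []
    for k, (cx, cy) in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        w = sum(weights[i] for i in idx)
        if w == 0:
            feats += [cx, cy, 0.0, 0.0, 0.0]
            continue
        sxx = sum(weights[i] * (points[i][0] - cx) ** 2 for i in idx) / w
        sxy = sum(weights[i] * (points[i][0] - cx) * (points[i][1] - cy) for i in idx) / w
        syy = sum(weights[i] * (points[i][1] - cy) ** 2 for i in idx) / w
        feats += [cx, cy, sxx, sxy, syy]
    return feats


# Toy data (assumption): two compact clusters that shift by (1, 0) between frames.
frame_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
frame_b = [(x + 1.0, y) for x, y in frame_a]
weights = [1.0] * len(frame_a)

centers_a, labels_a = weighted_kmeans(frame_a, weights, [(0.0, 0.0), (10.0, 10.0)])
# Seeding with the previous centers keeps cluster k's identity across frames.
centers_b, labels_b = weighted_kmeans(frame_b, weights, centers_a)

# Non-tracked features: means and covariances of the current frame only.
feats_b = cluster_features(frame_b, weights, centers_b, labels_b)
# Tracked parameters (assumption): displacement of each center between frames.
shifts = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(centers_a, centers_b)]
tracked_feats_b = feats_b + [v for s in shifts for v in s]
```

Here `feats_b` corresponds to the non-tracked secondary features and `tracked_feats_b` appends the per-cluster displacements. A real system would need a more robust registration step when clusters merge, split, or cross.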

Keywords

covariance matrices; feature extraction; pattern clustering; set theory; signal classification; speech recognition; vectors; TIMIT database phonemes; WKM clustering technique; dimension reduction; mean vector; secondary feature vector extraction; spectrotemporal feature space; speech classification; subsets; temporal cluster tracking; voiced plosives classification; weighted K-means clustering technique; filter banks; sorting; spectrogram; speech; support vector machine classification; auditory system; clustering methods; image matching; speech processing

fLanguage

English

Publisher

IEEE

Conference_Titel

Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on

Conference_Location

Shiraz, Fars

Print_ISBN

978-1-4673-1478-7

Type

conf

DOI

10.1109/AISP.2012.6313709

Filename

6313709

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=573553