Title :
Fuzzy-Clustering-Based Decision Tree Approach for Large Population Speaker Identification
Author :
Yakun Hu ; Dapeng Wu ; Nucci, Antonio
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
fDate :
4/1/2013
Abstract :
In this paper, we address the problem of large population speaker identification under noisy conditions. The major techniques for speaker identification are based on Mel-Frequency Cepstral Coefficients (MFCC), the Gaussian Mixture Model (GMM), and the Universal Background Model (UBM); we refer to them as MFCC+GMM and MFCC+GMM+UBM. These approaches are known to perform very well for small population identification under low-noise conditions. However, their performance degrades under noisy conditions as the population size increases. To mitigate this limitation, we propose a fuzzy-clustering-based decision tree approach. The key idea of our approach is to 1) use a decision tree to hierarchically partition the whole population into small groups and determine which leaf-node speaker group a speaker under test belongs to, and 2) apply MFCC+GMM to the selected speaker group for speaker identification. The advantage of our approach is that we use features that are independent of MFCC to partition speakers into groups, and apply MFCC+GMM only to speaker groups at the leaf level. The key challenge in our design is how to achieve a low error probability in decision-tree-based classification. To address this, we adopt fuzzy clustering in constructing the tree for population partitioning, i.e., at each level, a speaker may belong to multiple groups. Such redundancy increases the probability of classifying a speaker under test into the correct group/node on the tree. Another novelty of this paper is that we use pitch and five vocal source features to construct a six-level decision tree. Experimental results demonstrate that our approach outperforms MFCC+GMM and MFCC+GMM+UBM, with higher accuracy and lower complexity, for large population identification under additive white Gaussian noise (AWGN) conditions.
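The two-stage idea in the abstract (overlapping partitions at each tree level, then a conventional classifier restricted to the selected leaf) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-way split at the median, the fixed `overlap` margin used to imitate fuzzy membership, the toy feature values, and the `score` callable that stands in for an MFCC+GMM likelihood. The abstract does not say which fuzzy clustering algorithm or membership rule the authors use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    speakers: List[str]                 # speakers assigned to this group
    level: int                          # index of the feature tested at this node
    threshold: Optional[float] = None   # None marks a leaf
    low: Optional["Node"] = None
    high: Optional["Node"] = None


def build_tree(features: Dict[str, List[float]], speakers: List[str],
               level: int = 0, overlap: float = 0.1, min_size: int = 2) -> Node:
    """Partition speakers level by level, one scalar feature per level.

    Speakers whose feature lies within `margin` of the threshold are placed
    in BOTH children (hypothetical stand-in for fuzzy membership).
    """
    node = Node(speakers=sorted(speakers), level=level)
    n_levels = len(next(iter(features.values())))
    if level >= n_levels or len(speakers) <= min_size:
        return node
    values = sorted(features[s][level] for s in speakers)
    threshold = values[len(values) // 2]          # median split (assumption)
    margin = overlap * (values[-1] - values[0])   # width of the overlap zone
    low = [s for s in speakers if features[s][level] <= threshold + margin]
    high = [s for s in speakers if features[s][level] >= threshold - margin]
    if len(low) == len(speakers) or len(high) == len(speakers):
        return node                               # split does not shrink the group
    node.threshold = threshold
    node.low = build_tree(features, low, level + 1, overlap, min_size)
    node.high = build_tree(features, high, level + 1, overlap, min_size)
    return node


def route(node: Node, test_features: List[float]) -> List[str]:
    """Walk a test utterance's features down the tree to a leaf group."""
    while node.threshold is not None:
        if test_features[node.level] < node.threshold:
            node = node.low
        else:
            node = node.high
    return node.speakers


def identify(tree: Node, test_features: List[float],
             score: Callable[[str], float]) -> str:
    """Apply the leaf-level classifier only to the routed candidate group."""
    return max(route(tree, test_features), key=score)


# Toy enrollment data: [pitch in Hz, a second made-up feature] per speaker.
enrolled = {
    "s1": [100.0, 0.10], "s2": [110.0, 0.90], "s3": [148.0, 0.20],
    "s4": [152.0, 0.80], "s5": [190.0, 0.15], "s6": [200.0, 0.85],
}
# A noisy test utterance from s3: noise has pushed the pitch estimate past
# the first-level threshold.
noisy_s3 = [155.0, 0.25]

fuzzy_tree = build_tree(enrolled, list(enrolled), overlap=0.1)
hard_tree = build_tree(enrolled, list(enrolled), overlap=0.0)


def toy_score(speaker: str) -> float:
    # Hypothetical similarity; a real system would use a GMM log-likelihood.
    f = enrolled[speaker]
    return -abs(f[0] - noisy_s3[0]) - 100.0 * abs(f[1] - noisy_s3[1])
```

With these toy values, `route(fuzzy_tree, noisy_s3)` still contains `s3` because the overlap zone placed that speaker in both first-level groups, whereas `route(hard_tree, noisy_s3)` does not. This is the redundancy argument from the abstract in miniature; the price is larger leaf groups, so the margin trades routing errors against leaf-level workload.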
Keywords :
AWGN; Gaussian processes; decision trees; fuzzy set theory; speaker recognition; Gaussian mixture model; Mel-frequency cepstral coefficients; additive white Gaussian noise; decision tree based classification; fuzzy clustering based decision tree; hierarchical partition; large population speaker identification; population partitioning; speaker under test; Decision trees; Feature extraction; Mel frequency cepstral coefficient; Sociology; Speech; Statistics; Testing; Gaussian Mixture Model (GMM); Large population speaker identification; Mel-Frequency Cepstral Coefficients (MFCC); fuzzy clustering; hierarchical decision tree;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2234113