Abstract:
Voiced-unvoiced-silence (V/UV/S) classification of speech sounds is important in automatic speech/speaker recognition, speech segmentation, speech signal compression, and speech analysis. Training-based classifiers suffer from a lack of training databases, or degrade when training and test statistics mismatch because of variations in speakers, languages, speaking styles, noise, transmission channels, etc. This paper proposes a novel voiced-unvoiced-silence classification method based on unsupervised learning. The class-dependent statistics needed for classification (the feature means, covariance matrices, and occurrence frequencies of the voiced, unvoiced, and silence classes) are estimated directly from the signal to be classified using Gaussian mixture models and the expectation-maximization algorithm. The method is evaluated on the NTIMIT database, and the results are encouraging: V/UV/S classification accuracy exceeds 91.15%, and voice activity detection accuracy exceeds 97.45%.
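The pipeline the abstract describes can be sketched as follows: extract frame-level features from the unlabeled signal, fit a three-component Gaussian mixture model with EM, and read the class-dependent statistics (means, variances, occurrence frequencies) off the fitted components. The numpy sketch below uses log-energy and zero-crossing rate as stand-in features and a hand-rolled diagonal-covariance EM; the actual features and model details in the paper may differ.

```python
import numpy as np

def frame_features(signal, frame_len=256):
    """Per-frame log-energy and zero-crossing rate, two simple V/UV/S cues."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([log_energy, zcr])

def em_gmm(X, k=3, iters=50):
    """Fit a diagonal-covariance GMM with EM.

    Returns component means, variances, weights (occurrence frequencies),
    and per-frame responsibilities; argmax responsibility gives the class.
    """
    n, d = X.shape
    # Deterministic farthest-point initialization of the means.
    mu = [X[0]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(mu)) ** 2).sum(-1), axis=1)
        mu.append(X[d2.argmax()])
    mu = np.array(mu)
    var = np.tile(X.var(axis=0), (k, 1)) + 1e-6
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0) + 1e-10
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return mu, var, w, r
```

After fitting, one plausible labeling is to take the component with the lowest mean energy as silence and, of the remaining two, the one with the higher mean zero-crossing rate as unvoiced; that heuristic mapping from components to classes is an assumption here, not something the abstract specifies.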
Keywords:
Gaussian mixture models; expectation-maximization algorithm; unsupervised learning; voiced-unvoiced-silence classification; signal classification; automatic speech/speaker recognition; speaker recognition; speech analysis; speech segmentation; speech coding; speech signal compression; data compression; statistical testing; audio databases