DocumentCode :
3421979
Title :
Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons
Author :
Qiao, Yu ; Shimomura, Naoya ; Minematsu, Nobuaki
Author_Institution :
Grad. Sch. of Frontier Sci., Tokyo Univ., Tokyo
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
3989
Lastpage :
3992
Abstract :
Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge of linguistic content or acoustic models, and thus poses a challenging problem. The essential question is what constitutes the optimal segmentation. This paper formulates the optimal segmentation problem in a probabilistic framework. Using statistical and information-theoretic analysis, we develop three objective functions: summation of square error (SSE), log determinant (LD), and rate distortion (RD). In particular, the RD function is derived from information rate distortion theory and can be related to the mechanism of human signal perception. We introduce a time-constrained agglomerative clustering algorithm to find the optimal segmentations, and we propose an efficient implementation of the algorithm using integration functions. We carry out experiments on the TIMIT database to compare the three objective functions. The results show that rate distortion achieves the best performance and indicate that our method outperforms recently published unsupervised segmentation methods.
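Illustrative sketch (not taken from the paper): the abstract names a time-constrained agglomerative clustering algorithm and an implementation based on integration functions, but gives no details. The Python below is one possible reading under stated assumptions: only the SSE objective is used, "integration functions" are taken to mean cumulative (prefix) sums of the frames and of their squared norms, only temporally adjacent segments may merge, and merging stops at a caller-supplied number of segments. The names prefix_sums, segment_sse and segment are hypothetical, and a greedy merge of this kind is not guaranteed to reach the global optimum.

```python
def prefix_sums(frames):
    """Cumulative sums of the frames and of their squared norms.

    Assumption: this is what "integration functions" refers to.
    """
    dim = len(frames[0])
    cum = [[0.0] * dim]
    cum_sq = [0.0]
    for x in frames:
        cum.append([c + v for c, v in zip(cum[-1], x)])
        cum_sq.append(cum_sq[-1] + sum(v * v for v in x))
    return cum, cum_sq


def segment_sse(cum, cum_sq, start, end):
    """Squared error of frames[start:end] around their mean.

    Uses sum ||x||^2 - ||sum x||^2 / n, so the cost is independent
    of the segment length once the prefix sums are available.
    """
    n = end - start
    total = [b - a for a, b in zip(cum[start], cum[end])]
    return (cum_sq[end] - cum_sq[start]) - sum(t * t for t in total) / n


def segment(frames, num_segments):
    """Greedy merging of adjacent segments under the SSE objective.

    Returns boundary indices; segment k covers
    frames[bounds[k]:bounds[k + 1]].
    """
    cum, cum_sq = prefix_sums(frames)
    bounds = list(range(len(frames) + 1))  # every frame starts alone
    while len(bounds) - 1 > num_segments:
        best_i, best_cost = None, float("inf")
        # Time constraint: only neighbouring segments are candidates.
        for i in range(1, len(bounds) - 1):
            left, mid, right = bounds[i - 1], bounds[i], bounds[i + 1]
            cost = (segment_sse(cum, cum_sq, left, right)
                    - segment_sse(cum, cum_sq, left, mid)
                    - segment_sse(cum, cum_sq, mid, right))
            if cost < best_cost:
                best_i, best_cost = i, cost
        del bounds[best_i]  # dropping a boundary merges its neighbours
    return bounds


# Toy one-dimensional "features" with three obvious plateaus.
toy = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0], [9.0], [9.1]]
print(segment(toy, 3))  # [0, 3, 6, 8]
```

In practice the frames would be acoustic feature vectors rather than toy scalars, and the LD or RD objective would replace segment_sse; those variants are not reproduced here because the abstract does not define them.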
Keywords :
probability; speech processing; speech recognition; speech synthesis; human signal perception mechanism; information rate distortion theory; integration functions; log determinant; probabilistic framework; rate distortion; summation of square error; time-constrained agglomerative clustering algorithm; unsupervised optimal phoneme segmentation; Acoustic distortion; Clustering algorithms; Error analysis; Information analysis; Information rates; Information theory; Rate-distortion; Speech recognition; Speech synthesis; Statistical analysis; Agglomerative clustering; Rate Distortion theory; Unsupervised phoneme segmentation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518528
Filename :
4518528