DocumentCode
134212
Title
Automatic speech data clustering with human perception based weighted distance
Author
Xixin Wu ; Zhiyong Wu ; Jia Jia ; Hsiang-Yun Meng ; Lianhong Cai ; Weifeng Li
Author_Institution
Shenzhen Key Lab. of Inf. Sci. & Technol., Tsinghua Univ., Shenzhen, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
216
Lastpage
220
Abstract
Speech data from the Internet contain different speaking styles relating to information genre, emotions, sentiments, speaker characters, etc. Automatic classification of such data remains challenging because it is difficult to define categories that clearly characterize different speaking styles. To address this problem, this paper proposes a method based on x-means clustering, an extension of k-means that does not require a fixed number of classes. X-means clusters the data according to a predefined distance measure computed over different features. Current distance measures focus only on the features themselves and ignore the impact of those features on human perception. To derive a more reasonable distance measure, this paper also proposes a human perception based weighted distance that captures the contribution of different acoustic features to human perception. In this way, automatic classification of Internet speech data makes use of prior knowledge of human perception while also capturing the speaking style characteristics of different datasets with varying categories. Listening-test experiments demonstrate that introducing prior knowledge of human perception into the distance measure is useful, and that the proposed method outperforms a baseline using conventional Euclidean distance, with a 10% improvement in classification accuracy.
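As a rough illustration of the weighted-distance idea described in the abstract, the sketch below computes a per-feature weighted Euclidean distance and uses it to assign a point to its nearest centroid, which is the assignment step shared by k-means and x-means. It is not the paper's implementation: the function names, the weight values, and the choice of a weighted Euclidean form are assumptions made here for illustration, and the abstract does not say how the perception-based weights are derived.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2).

    `weights` stands in for hypothetical per-feature perceptual weights;
    setting every weight to 1 gives the conventional Euclidean distance.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b, and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_centroid(point, centroids, weights):
    """Return the index of the centroid closest to `point` under the weighted distance."""
    return min(
        range(len(centroids)),
        key=lambda i: weighted_distance(point, centroids[i], weights),
    )
```

With all weights equal to 1 the assignment matches plain Euclidean clustering; down-weighting a feature reduces its influence on which cluster a sample joins, which is the effect the perception-based weights are meant to exploit.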
Keywords
Internet; pattern classification; pattern clustering; speech processing; automatic Internet speech data classification; automatic speech data clustering; human perception based weighted distance; predefined distance measurement; speaking style characteristics; speaking styles; x-means clustering; Accuracy; Acoustics; Distance measurement; Speech; Speech recognition; Text recognition; feature weights; human perception; speech clustering; x-means
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936604
Filename
6936604