• DocumentCode
    134212
  • Title
    Automatic speech data clustering with human perception based weighted distance
  • Author
    Xixin Wu; Zhiyong Wu; Jia Jia; Hsiang-Yun Meng; Lianhong Cai; Weifeng Li
  • Author_Institution
    Shenzhen Key Lab. of Inf. Sci. & Technol., Tsinghua Univ., Shenzhen, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    216
  • Lastpage
    220
  • Abstract
    Speech data from the Internet exhibit diverse speaking styles related to information genre, emotion, sentiment, speaker characteristics, and so on. Automatically classifying such data remains challenging because the categories that characterize different speaking styles are difficult to define clearly. To address this problem, this paper proposes a method based on x-means clustering, an extension of k-means that does not require a fixed number of classes. The x-means method clusters data according to a predefined distance measure over different features. Current distance measures focus only on the features themselves and ignore how these features affect human perception. To derive a more reasonable distance measure, this paper also proposes a human perception based weighted distance that captures the contribution of different acoustic features to human perception. In this way, automatic classification of Internet speech data can exploit prior knowledge of human perception while capturing the speaking-style characteristics of different datasets with varying categories. Listening-test experiments demonstrate that introducing human-perception prior knowledge into the distance measure is useful, and the proposed method outperforms a baseline using the conventional Euclidean distance, with a 10% improvement in classification accuracy.
  • Keywords
    Internet; pattern classification; pattern clustering; speech processing; automatic Internet speech data classification; automatic speech data clustering; human perception based weighted distance; predefined distance measurement; speaking style characteristics; speaking styles; x-means clustering; Accuracy; Acoustics; Distance measurement; Speech; Speech recognition; Text recognition; feature weights; human perception; speech clustering; x-means
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936604
  • Filename
    6936604
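The abstract describes clustering with x-means under a perceptually weighted distance but does not state how the weights are derived from listening tests. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the weights `w` are hypothetical fixed per-feature values, the distance is assumed to be a weighted Euclidean distance, and x-means is simplified to the common scheme (after Pelleg and Moore) of trying a 2-means split of each cluster and keeping it only when a Bayesian information criterion (BIC) score improves. All function names are illustrative. Because scaling feature i by sqrt(w_i) turns the weighted distance into a plain Euclidean one, `rescale` lets an unmodified clustering routine use the weights.

```python
import math


def weighted_distance(x, y, w):
    """Assumed weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)**2)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w)))


def rescale(points, w):
    """Scale feature i by sqrt(w_i) so that plain Euclidean distance on the
    result equals weighted_distance on the original features."""
    s = [math.sqrt(wi) for wi in w]
    return [[v * si for v, si in zip(p, s)] for p in points]


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def _two_means(points, iters=20):
    """Deterministic 2-means seeded with two mutually distant points.
    Returns two non-empty groups, or None if no split is possible."""
    c = _mean(points)
    a = max(points, key=lambda p: _sqdist(p, c))
    b = max(points, key=lambda p: _sqdist(p, a))
    centers = [a, b]
    groups = None
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearest = 0 if _sqdist(p, centers[0]) <= _sqdist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            return None
        new_centers = [_mean(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


def _bic(groups):
    """BIC (higher is better) of a hard-assigned spherical Gaussian mixture
    with one shared per-dimension variance. Caller ensures r > k."""
    r = sum(len(g) for g in groups)
    k = len(groups)
    m = len(groups[0][0])
    sse = 0.0
    for g in groups:
        mu = _mean(g)
        sse += sum(_sqdist(p, mu) for p in g)
    var = max(sse / (m * (r - k)), 1e-12)
    loglik = (sum(len(g) * math.log(len(g) / r) for g in groups)
              - r * m / 2 * math.log(2 * math.pi * var)
              - sse / (2 * var))
    n_params = (k - 1) + m * k + 1  # mixing weights, means, shared variance
    return loglik - n_params / 2 * math.log(r)


def x_means(points, k_max=8):
    """Simplified x-means: start from one cluster and split any cluster whose
    2-means split raises the BIC, until nothing splits or k_max is reached."""
    clusters = [list(points)]
    while len(clusters) < k_max:
        result, split = [], False
        for i, g in enumerate(clusters):
            # splitting g adds one cluster; allow it only while under k_max
            room = len(result) + (len(clusters) - i) < k_max
            children = _two_means(g) if room and len(g) >= 4 else None
            if children is not None and _bic(children) > _bic([g]):
                result.extend(children)
                split = True
            else:
                result.append(g)
        clusters = result
        if not split:
            break
    return clusters


w = [1.0, 1.0]  # hypothetical perceptual weights, one per acoustic feature
pts = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
       [10, 10], [10.1, 10], [10, 10.1], [10.1, 10.1]]
clusters = x_means(rescale(pts, w))
print(len(clusters))
```

Rescaling once up front keeps the weighting separate from the clustering code; a weighted distance that is not a quadratic form would instead have to be used directly in the assignment step.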