Title :
Improve K-means clustering for audio data by exploring a reasonable sampling rate
Author :
Chen, Gang ; Han, Bo
Author_Institution :
Int. Sch. of Software, Wuhan Univ., Wuhan, China
Abstract :
K-means clustering is sensitive to its starting points, and its time cost is high for large-scale data such as audio. Sampling is widely used to find "better" starting points and thereby speed up convergence of the clustering procedure. However, how to choose a reasonable sampling rate remains an open problem. In this paper, we report our initial exploration of locating reasonable sampling rates for different datasets. The procedure progressively increases the sampling rate and uses the cluster centers obtained in the previous stage as the starting points for the next clustering. The resulting curve of iteration number versus sampling rate shows a turning point, which is taken as the reasonable sampling rate. Experiments on two audio datasets show that the procedure clusters data more efficiently while maintaining similar clustering quality.
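The progressive procedure in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices here are hypothetical and do not come from the paper: the names `kmeans` and `progressive_kmeans`, nested random-prefix sampling (each larger sample contains the previous one), seeding the first stage with k random points, and treating "centers unchanged" as convergence. The sketch only records the (sampling rate, iterations) curve. Locating the turning point on that curve is left to inspection.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means from the given starting centers.
    Returns (centers, iterations needed until the centers stop changing)."""
    centers = [list(c) for c in centers]
    for it in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            groups[j].append(p)
        # Update step: each center moves to the mean of its group;
        # an empty group keeps its old center.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for c, g in zip(centers, groups)]
        if new == centers:  # assignments stable, so the means are unchanged
            return centers, it
        centers = new
    return centers, max_iter

def progressive_kmeans(data, k, rates, seed=0):
    """Cluster growing samples; each stage starts from the previous centers.
    Assumption: samples are nested prefixes of one random shuffle."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    centers = [list(p) for p in shuffled[:k]]  # hypothetical initial seeding
    curve = []
    for r in rates:
        sample = shuffled[:max(k, int(r * len(shuffled)))]
        centers, iters = kmeans(sample, centers)
        curve.append((r, iters))  # one point of the rate/iteration curve
    return centers, curve
```

For example, `progressive_kmeans(data, 2, [0.1, 0.5, 1.0])` returns the final centers together with a list such as `[(0.1, i1), (0.5, i2), (1.0, i3)]`. Plotting those iteration counts against the rates gives the kind of curve on which the abstract's turning point would be read off.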
Keywords :
data mining; pattern clustering; K-means clustering; audio data; data clustering; reasonable sampling-rate; Algorithm design and analysis; Clustering algorithms; Data mining; Presses; Shape; Software; Software algorithms; K-means; audio; clustering; sampling-rate; starting points;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
DOI :
10.1109/FSKD.2010.5569371