Improve K-means clustering for audio data by exploring a reasonable sampling rate

Author

Chen, Gang ; Han, Bo

Author_Institution

Int. Sch. of Software, Wuhan Univ., Wuhan, China

Volume

4

fYear

2010

fDate

10-12 Aug. 2010

Firstpage

1639

Lastpage

1642

Abstract

K-means clustering is sensitive to its starting points, and its time cost is high for large-scale data such as audio. Sampling is widely applied to find “better” starting points and thereby speed up convergence of the clustering. However, how to choose a reasonable sampling rate remains an open problem. In this paper, we report our initial exploration of locating reasonable sampling rates for different datasets. The procedure progressively increases the sampling rate and uses the cluster centers from the previous stage as the starting points for the next clustering. The resulting curve relating sampling rate to iteration number exhibits a turning point, which is taken as the reasonable sampling rate. On two experimental audio datasets, the procedure clusters data more efficiently while keeping similar clustering quality.
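
The progressive procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans` and `progressive_kmeans`, the schedule of rates, the random initial centers, the convergence tolerance, and the use of plain Lloyd iterations on generic feature vectors are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def kmeans(X, centers, max_iter=100, tol=1e-6):
    """Plain Lloyd's K-means from given starting centers.

    Returns (centers, labels, number of iterations run).
    """
    centers = centers.copy()
    for it in range(1, max_iter + 1):
        # Assign each point to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # centers stopped moving: converged
            break
    return centers, labels, it

def progressive_kmeans(X, k, rates=(0.05, 0.1, 0.2, 0.4, 1.0), seed=0):
    """Cluster samples of increasing size, warm-starting each stage.

    `rates` is an assumed schedule of sampling rates (fractions of the data).
    Returns the final centers and a list of (rate, iterations) pairs.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: the first stage starts from k randomly chosen points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    history = []
    for rate in rates:
        n = max(k, int(round(rate * len(X))))
        sample = X[rng.choice(len(X), size=n, replace=False)]
        # Centers from the previous stage seed the next clustering.
        centers, _, iters = kmeans(sample, centers)
        history.append((rate, iters))
    return centers, history
```

Plotting the `(rate, iterations)` pairs in `history` gives a curve of the kind the abstract refers to; how the turning point is identified on that curve is not stated in the abstract and is left out of this sketch.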

Keywords

data mining; pattern clustering; K-means clustering; audio data; data clustering; reasonable sampling-rate; Algorithm design and analysis; Clustering algorithms; Data mining; Presses; Shape; Software; Software algorithms; K-means; audio; clustering; sampling-rate; starting points;

fLanguage

English

Publisher

IEEE

Conference_Title

Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on

Conference_Location

Yantai, Shandong

Print_ISBN

978-1-4244-5931-5

Type

conf

DOI

10.1109/FSKD.2010.5569371

Filename

5569371