DocumentCode :
2941469
Title :
How many clusters to report: A recursive heuristic
Author :
Carlis, John ; Bruso, Kelsey
Author_Institution :
Comput. Sci. & Eng. Dept., Univ. of Minnesota, Minneapolis, MN, USA
fYear :
2010
fDate :
Aug. 31 2010-Sept. 4 2010
Firstpage :
1069
Lastpage :
1072
Abstract :
Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter when working within each of the three available frameworks where one thinks of clustering: as a Euclidean distance problem; as a statistical model problem; or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K_reported as a function of the attribute or item count, depending on attribute scales. We tested the heuristic on 226 widely-varying, but mostly scientific, studies, and found that the heuristic's K_best-predicted rounded to exactly K_reported in over half of the studies and was close in almost all of them. We claim that this strongly-supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.
Keywords :
bioinformatics; data analysis; data clustering; item count; recursive square root heuristic; Bayesian methods; Clustering algorithms; Complexity theory; Computational modeling; Presses; Shape; Spirals; Algorithms; Cluster Analysis; Data Interpretation, Statistical; Humans; Incidence; Proportional Hazards Models; Risk Assessment; Risk Factors; Schizophrenia
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
Conference_Location :
Buenos Aires
ISSN :
1557-170X
Print_ISBN :
978-1-4244-4123-5
Type :
conf
DOI :
10.1109/IEMBS.2010.5627287
Filename :
5627287