Title :
Modeling centre-based hard and soft clustering for Y chromosome short tandem repeats (YSTR) data
Author :
Seman, Ali; Bakar, Zainab Abu; Sapawi, Azizian Mohd.
Author_Institution :
Centre for Computer Science Studies, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450, Shah Alam, Malaysia
Abstract :
This paper models (1) Y-STR data and (2) hard and soft clustering of Y-STR data. The Y-STR models are extended and developed, then tested on three data sets of Y-STR haplogroup and Y-STR Surname data. The results show that both the hard clustering models and the soft clustering models have advantages and disadvantages. The soft k-Means model achieves a clustering accuracy of 99.62% on the Y-STR haplogroup data, whereas the hard k-Medoids model obtains the highest clustering accuracy, 99.90%, on the Y-STR Surname data. This suggests that both kinds of model have an equal chance of improving Y-STR clustering performance.
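The abstract contrasts hard and soft clustering of Y-STR data. A minimal sketch of that distinction follows, under stated assumptions: each haplotype is treated as a vector of allele repeat counts over a fixed ordered set of loci, dissimilarity is the simple matching count of differing loci, and only the assignment step is shown (centre updates such as means or medoids are omitted). The soft step uses the standard fuzzy-membership update with fuzzifier m (the form used in fuzzy k-modes). The function names and sample haplotypes are hypothetical and are not taken from the paper's models or data sets.

```python
# Hypothetical Y-STR haplotypes: allele repeat counts at the same ordered loci.
# Values are illustrative only, not data from the paper.
CENTRES = [(13, 24, 14, 11, 11, 14), (14, 23, 15, 10, 12, 13)]
HAPLOTYPES = [
    (13, 24, 14, 11, 11, 14),  # identical to the first centre
    (13, 24, 15, 11, 12, 14),  # 2 mismatches vs centre 0, 4 vs centre 1
    (14, 23, 15, 10, 12, 14),  # 5 mismatches vs centre 0, 1 vs centre 1
]


def mismatch(a, b):
    """Simple matching dissimilarity: number of loci whose alleles differ."""
    return sum(x != y for x, y in zip(a, b))


def hard_memberships(data, centres):
    """Hard (crisp) assignment: each haplotype belongs wholly to its nearest centre."""
    result = []
    for x in data:
        dists = [mismatch(x, c) for c in centres]
        best = dists.index(min(dists))  # ties go to the first nearest centre
        result.append([1.0 if j == best else 0.0 for j in range(len(centres))])
    return result


def soft_memberships(data, centres, m=2.0):
    """Soft (fuzzy) assignment: u_ij = 1 / sum_k (d_ij / d_ik) ** (1 / (m - 1))."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 1.0 / (m - 1.0)
    result = []
    for x in data:
        dists = [mismatch(x, c) for c in centres]
        zeros = [j for j, d in enumerate(dists) if d == 0]
        if zeros:
            # A haplotype identical to a centre gets full membership there,
            # split evenly if it matches several centres exactly.
            result.append([1.0 / len(zeros) if j in zeros else 0.0
                           for j in range(len(centres))])
            continue
        result.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                       for dj in dists])
    return result
```

With m = 2, the second haplotype is assigned wholly to the first centre by the hard rule, while the soft rule splits its membership roughly 2/3 and 1/3 between the two centres; every soft membership row sums to 1.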
Keywords :
Accuracy; Clustering algorithms; DNA; Data models; Equations; Mathematical model; Numerical models; Clustering models; Y-STR data; hard clustering; soft clustering;
Conference_Title :
Science and Social Research (CSSR), 2010 International Conference on
Conference_Location :
Kuala Lumpur, Malaysia
Print_ISBN :
978-1-4244-8987-9
DOI :
10.1109/CSSR.2010.5773869