Title :
Efficient soft relational clustering based on randomized search applied to selection of bio-basis for amino acid sequence analysis
Author :
Mahfouz, M.A. ; Ismail, Muhammad Ali
Author_Institution :
Dept. of Comput. & Syst. Eng., Univ. of Alexandria, Alexandria, Egypt
Abstract :
Protein sequence clustering aims to identify sets of homologous proteins in a protein database. In this paper, two efficient soft c-medoids clustering algorithms for prototype selection for protein sequences are presented. In the proposed techniques, patterns are considered to belong to some, but not necessarily all, clusters. The proposed algorithms judiciously integrate the principles of fuzzy sets, semi-fuzzy or soft clustering models, and the amino acid mutation matrix. Applying randomized search together with a soft clustering model to the fuzzy c-medoids algorithm enables efficient and effective selection of the minimum set of the most informative bio-bases. The efficiency and effectiveness of the proposed algorithms, along with a comparison with other algorithms, are demonstrated on different types of protein data sets.
Keywords :
biology computing; fuzzy set theory; matrix algebra; pattern clustering; proteins; search problems; amino acid mutation matrix; amino acid sequence analysis; bio-basis selection; fuzzy set; protein data set; protein sequence clustering; randomized search; semi-fuzzy model; soft c-medoids clustering algorithm; soft clustering model; soft relational clustering; Algorithm design and analysis; Amino acids; Clustering algorithms; Linear programming; Partitioning algorithms; Proteins; Runtime; Cluster Analysis; Data Mining; Fuzzy Clustering; Medoid-Based Clustering; Protein sequences; Relational Clustering; Unsupervised Learning;
Conference_Title :
Computer Engineering & Systems (ICCES), 2012 Seventh International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4673-2960-6
DOI :
10.1109/ICCES.2012.6408530