Title :
FPGA implementation of K-means algorithm for bioinformatics application: An accelerated approach to clustering Microarray data
Author :
Hussain, Hanaa M. ; Benkrid, Khaled ; Seker, Huseyin ; Erdogan, Ahmet T.
Author_Institution :
Sch. of Eng., Univ. of Edinburgh, Edinburgh, UK
Abstract :
The Microarray is a technique used by biologists to perform many genome experiments simultaneously, which produces very large datasets. Analysis of these datasets is a challenge for scientists, especially as the number of genome databases is increasing rapidly every year. K-means clustering is an unsupervised data mining technique used widely by bioinformaticians to analyze Microarray data. However, K-means can take from a few seconds to several days to process Microarray data, depending on the size of the datasets. This limits the complexity of the biological questions that bioinformaticians can ask, and hence may result in an incomplete solution to the problem. To overcome such problems, we propose a highly parallel hardware design to accelerate the K-means clustering of Microarray data by implementing the K-means algorithm in Field Programmable Gate Arrays (FPGA). Our implementation is particularly suitable for server solutions as it allows many different datasets to be processed simultaneously. We have designed and implemented five K-means cores on a Xilinx Virtex4 XC4VLX25 FPGA, and tested them on a sample of real Yeast Microarray data. Our design achieved about 51.7× speed-up when compared to a software model while being 206.8× more energy efficient.
Keywords :
bioinformatics; data mining; field programmable gate arrays; pattern clustering; FPGA implementation; K-means algorithm; K-means clustering; Microarray data; Xilinx Virtex4 XC4VLX25 FPGA; Yeast Microarray data; accelerated approach; bioinformatics application; biological problems complexity; clustering Microarray data; data mining technique; genome databases; genome experiments; parallel hardware design; Bioinformatics; Clocks; Clustering algorithms; Euclidean distance; Field programmable gate arrays; Hardware; Random access memory
Conference_Title :
Adaptive Hardware and Systems (AHS), 2011 NASA/ESA Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4577-0598-4
Electronic_ISBN :
978-1-4577-0597-7
DOI :
10.1109/AHS.2011.5963944