Title :
Incremental Fuzzy Mining of Gene Expression Data for Gene Function Prediction
Author :
Ma, Patrick C H ; Chan, Keith C C
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China
fDate :
5/1/2011 12:00:00 AM
Abstract :
Due to the complexity of the underlying biological processes, gene expression data obtained from DNA microarray technologies are typically noisy and very high-dimensional, which makes mining such data for gene function prediction very difficult. To tackle these difficulties, we propose an incremental fuzzy mining (IFM) technique. By transforming quantitative expression values into linguistic terms, such as highly or lowly expressed, IFM can effectively capture heterogeneity in expression data for pattern discovery. It does so by using a fuzzy measure to determine whether interesting association patterns exist between the linguistic gene expression levels. Based on these patterns, IFM can make accurate gene function predictions in which each gene is allowed to belong to more than one functional class with different degrees of membership. The gene function prediction problem can be formulated as both a classification and a clustering problem, and IFM can be used either as a classification technique or together with existing clustering algorithms to improve the discovered cluster groupings for greater prediction accuracy. IFM is also an incremental data mining technique: the discovered patterns can be continually refined based only on newly collected data, without retraining on the whole dataset. For performance evaluation, IFM has been tested with real expression datasets for both classification and clustering tasks. Experimental results show that it can effectively uncover hidden patterns for accurate gene function predictions.
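The transformation of quantitative expression values into linguistic terms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `fuzzify`, the three terms, the piecewise-linear membership shapes, and the breakpoints `low`, `mid`, and `high` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def fuzzify(value, low=-1.0, mid=0.0, high=1.0):
    """Map one expression value to membership degrees in three linguistic terms.

    Hypothetical sketch: the breakpoints and the piecewise-linear shapes
    are illustrative assumptions, not taken from the paper.
    """
    if value <= low:
        # At or below the lower breakpoint: fully "low".
        return {"low": 1.0, "medium": 0.0, "high": 0.0}
    if value >= high:
        # At or above the upper breakpoint: fully "high".
        return {"low": 0.0, "medium": 0.0, "high": 1.0}
    if value <= mid:
        # Between low and mid: membership shifts linearly from "low" to "medium".
        m = (value - low) / (mid - low)
        return {"low": 1.0 - m, "medium": m, "high": 0.0}
    # Between mid and high: membership shifts linearly from "medium" to "high".
    h = (value - mid) / (high - mid)
    return {"low": 0.0, "medium": 1.0 - h, "high": h}


if __name__ == "__main__":
    # Example expression values (made up for demonstration only).
    for v in (-2.0, -0.5, 0.0, 0.5, 3.0):
        print(v, fuzzify(v))
```

Under these assumptions, a value between two breakpoints belongs partly to two adjacent terms at once, which is the kind of graded membership the abstract refers to when it says a gene can belong to more than one class with different degrees of membership.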
Keywords :
DNA; bioinformatics; data mining; fuzzy set theory; pattern classification; pattern clustering; DNA microarray technologies; bioinformatics; classification problems; clustering problems; fuzzy measure; gene expression data; gene function prediction problem; incremental data mining technique; incremental fuzzy mining technique; linguistic gene expression levels; pattern discovery; Accuracy; Bioinformatics; Biological processes; Clustering algorithms; DNA; Data mining; Gene expression; Support vector machine classification; Support vector machines; Testing; Bioinformatics; fuzzy data mining; gene expression data analysis; gene function prediction; pattern discovery; Algorithms; Cluster Analysis; Computational Biology; Data Mining; Databases, Genetic; Fibroblasts; Fuzzy Logic; Gene Expression Profiling; Genes; Humans; Models, Statistical; Pattern Recognition, Automated;
Journal_Title :
Biomedical Engineering, IEEE Transactions on
DOI :
10.1109/TBME.2010.2047724