Title :
Noise-Resistant Bicluster Recognition
Author :
Huan Sun ; Gengxin Miao ; Xifeng Yan
Author_Institution :
CS Dept., Univ. of California, Santa Barbara, Santa Barbara, CA, USA
Abstract :
Biclustering is crucial for finding co-expressed genes and their associated conditions in gene expression data. While various biclustering algorithms (e.g., combinatorial, probabilistic modelling, and matrix factorization) have been proposed and steadily improved over the past decade, data noise and bicluster overlaps still make biclustering a challenging task. It has become difficult to further improve biclustering performance without resorting to a new approach. Inspired by recent progress in unsupervised feature learning using deep neural networks, we propose a novel model for biclustering, named Auto Decoder (AD), which relates biclusters to features and leverages a neural network that automatically learns features from the input data. To suppress the severe noise present in gene expression data, we introduce a non-uniform signal recovery mechanism: instead of reconstructing the whole input data to capture the bicluster patterns, AD weighs the zero and non-zero parts of the input data differently, which makes it more flexible in dealing with different types of noise. AD is also properly regularized to deal with bicluster overlaps. To the best of our knowledge, this is the first biclustering algorithm that leverages neural network techniques to recover overlapped biclusters hidden in noisy gene expression data. We compared our approach with four state-of-the-art biclustering algorithms on both synthetic and real datasets. On three of the four real datasets, AD significantly outperforms the other approaches. On controlled synthetic datasets, AD performs best when the noise level is beyond 15%.
Keywords :
biology computing; genetics; neural nets; pattern clustering; pattern recognition; unsupervised learning; AD; AutoDecoder; automatic feature learning; bicluster patterns; biclustering algorithms; co-expressed genes; deep neural networks; noise suppression; noise-resistant bicluster recognition; noisy gene expression data; nonuniform signal recovery mechanism; overlapped bicluster recovery; real datasets; synthetic datasets; unsupervised feature learning; Algorithm design and analysis; Gene expression; Neural networks; Neurons; Noise; Noise level; Robustness; Biclustering; Gene Expression; Neural Network
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.34