Title :
Sparse Group Selection on Fused Lasso Components for Identifying Group-Specific DNA Copy Number Variations
Author :
Ze Tian ; Huanan Zhang ; Rui Kuang
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Minnesota Twin Cities, Minneapolis, MN, USA
Abstract :
Detecting DNA copy number variations (CNVs) from arrayCGH or genotyping-array data and correlating them with cancer outcomes is crucial for understanding the molecular mechanisms underlying cancer. Previous methods focus either on detecting CNVs in each individual patient sample or on detecting common CNVs across all patient samples. These methods ignore the discrepancies caused by heterogeneity among patient samples: because of this heterogeneity, common CNVs may be shared only within some groups of samples rather than across all samples. In this paper, we propose a latent feature model that couples sparse sample group selection with fused lasso on CNV components to identify group-specific CNVs. Given a group structure on the patient samples derived from clinical information, sparse group selection on fused lasso (SGS-FL) identifies the optimal latent CNV components, each of which is specific to the samples in one or several groups. The groups selected for each CNV component are determined dynamically by an adaptive algorithm that achieves a desired level of sparsity. Simulation results show that SGS-FL identifies the latent CNV components more accurately when there is a reliable underlying group structure in the samples. In experiments on arrayCGH breast cancer and bladder cancer datasets, SGS-FL detected CNV regions that are more relevant to cancer and provided latent feature weights that can be used to improve sample classification.
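To make the two penalties named in the abstract concrete, the following is a minimal Python sketch, not the authors' formulation. It assumes a probes x samples data matrix X approximated by U V^T, where each column of U is a CNV component ordered by probe position and each column of V holds the per-sample weights of that component. It assumes the standard fused lasso penalty on each CNV component (sparsity plus piecewise constancy along the genome) and a standard group lasso penalty on each component's sample weights, with each group weighted by the square root of its size (a common convention, assumed here). The function names and the parameters lam1, lam2 and lam3 are hypothetical. The sketch only evaluates such an objective; it does not implement the paper's optimization or its adaptive group selection.

```python
import math


def fused_lasso_penalty(u, lam1, lam2):
    """Standard fused lasso penalty on one CNV component u (ordered by probe position):
    lam1 * sum |u_j| encourages few altered probes, and
    lam2 * sum |u_j - u_{j-1}| encourages piecewise-constant segments."""
    sparsity = sum(abs(x) for x in u)
    smoothness = sum(abs(u[j] - u[j - 1]) for j in range(1, len(u)))
    return lam1 * sparsity + lam2 * smoothness


def group_lasso_penalty(v, groups, lam):
    """Group lasso penalty on one component's sample weights v.
    `groups` maps a group label to the list of sample indices in that group.
    Each group adds sqrt(group size) * ||v_g||_2, so all weights in a group
    tend to be driven to zero together (group-level selection)."""
    total = 0.0
    for idx in groups.values():
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total


def objective(X, U, V, groups, lam1, lam2, lam3):
    """Hypothetical objective: half the squared error of reconstructing
    X (probes x samples) by U (probes x K) times V^T (K x samples),
    plus a fused lasso penalty on every column of U and a group lasso
    penalty on every column of V."""
    n_probes, n_samples = len(X), len(X[0])
    K = len(U[0])
    err = 0.0
    for p in range(n_probes):
        for s in range(n_samples):
            approx = sum(U[p][k] * V[s][k] for k in range(K))
            err += (X[p][s] - approx) ** 2
    pen = 0.0
    for k in range(K):
        u_k = [U[p][k] for p in range(n_probes)]
        v_k = [V[s][k] for s in range(n_samples)]
        pen += fused_lasso_penalty(u_k, lam1, lam2)
        pen += group_lasso_penalty(v_k, groups, lam3)
    return 0.5 * err + pen
```

For example, a component [0, 0, 1, 1, 0] has one contiguous altered segment, so with lam1 = lam2 = 1 its fused lasso penalty is 2 (two nonzero probes) plus 2 (two breakpoints), i.e. 4.0. Sample weights that are nonzero only in one clinical group incur a group lasso cost from that group alone, which is how a component can become specific to one or several groups.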
Keywords :
biology computing; cancer; learning (artificial intelligence); molecular biophysics; pattern classification; CNV component; adaptive algorithm; arrayCGH data; bladder cancer dataset; breast cancer dataset; cancer molecular mechanism; cancer outcome; clinical information; fused lasso component; genotyping-array data; group-specific DNA copy number variation; latent feature model; latent feature weight; patient sample; sample classification; sparse group selection; Arrays; Cancer; DNA; Feature extraction; Matrix decomposition; Optimization; Probes; DNA copy number variations; fused lasso; group lasso; sparse group learning;
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.35