DocumentCode
3153790
Title
Unraveling complex relationships between heterogeneous omics datasets using local principal components
Author
Alaydie, Noor ; Fotouhi, Farshad
Author_Institution
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
fYear
2011
fDate
3-5 Aug. 2011
Firstpage
136
Lastpage
141
Abstract
There is growing interest in studying the dependencies between multiple data sources. A common way to analyze the relationship between a pair of data sources based on their correlation is canonical correlation analysis (CCA), which seeks linear combinations of the variables in each dataset that maximize the correlation between them. However, in high-dimensional datasets, such as genomic data, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both datasets, CCA may not be applicable. In this paper, we present a novel method to extract common features from a pair of data sources using local principal components and Kendall's ranking. The results show that the proposed method outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed method yields meaningful results when the number of variables exceeds the number of observed units.
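The limitation of classical CCA mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' proposed method (the record gives no algorithmic details of it); it is only a plain NumPy computation of canonical correlations via orthonormal bases and an SVD. The function name `canonical_correlations` and the data shapes are assumptions made for illustration.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q).

    Illustrative sketch: center both blocks, take orthonormal bases
    from reduced QR factorizations, and read the correlations off the
    singular values of Qx^T Qy.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # n x min(n, p), orthonormal columns
    Qy, _ = np.linalg.qr(Yc)  # n x min(n, q), orthonormal columns
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against round-off pushing values slightly outside [0, 1].
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Many observations, few variables, independent blocks.
    X_low = rng.normal(size=(500, 3))
    Y_low = rng.normal(size=(500, 2))
    print("n > p:", canonical_correlations(X_low, Y_low))

    # More variables than observations, still independent blocks.
    X_high = rng.normal(size=(10, 30))
    Y_high = rng.normal(size=(10, 5))
    print("p > n:", canonical_correlations(X_high, Y_high))
```

With many more observations than variables, independent blocks give small canonical correlations. Once the number of variables in one block exceeds the number of observations, its columns generically span every centered vector, so the canonical correlations are numerically 1 even for unrelated data. This is the sense in which CCA "may not yield meaningful information" in that regime.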
Keywords
correlation methods; distributed databases; feature extraction; principal component analysis; canonical correlation analysis; heterogeneous omics datasets; local principal components; multiple data sources; Biological system modeling; Computational modeling; Correlation; Data models; Noise measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse and Integration (IRI), 2011 IEEE International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4577-0964-7
Electronic_ISBN
978-1-4577-0965-4
Type
conf
DOI
10.1109/IRI.2011.6009535
Filename
6009535