Title :
Supervised Multi-View Canonical Correlation Analysis (sMVCCA): Integrating Histologic and Proteomic Features for Predicting Recurrent Prostate Cancer
Author :
Lee, Gene ; Singanamalli, Asha ; Wang, Haibo ; Feldman, Michael D. ; Master, Stephen R. ; Shih, Natalie N. C. ; Spangler, Elaine ; Rebbeck, Timothy ; Tomaszewski, John E. ; Madabhushi, Anant
Author_Institution :
Dept. of Biomed. Eng., Case Western Reserve Univ., Cleveland, OH, USA
Abstract :
In this work, we present a new methodology to facilitate prediction of recurrent prostate cancer (CaP) following radical prostatectomy (RP) via the integration of quantitative image features and protein expression in the excised prostate. Creating a fused predictor from high-dimensional data streams is challenging because the classifier must 1) account for the “curse of dimensionality” problem, which hinders classifier performance when the number of features exceeds the number of patient studies, and 2) balance potential mismatches in the number of features across different channels to avoid classifier bias towards channels with more features. Our new data integration methodology, supervised Multi-view Canonical Correlation Analysis (sMVCCA), aims to integrate infinite views of high-dimensional data to provide more amenable data representations for disease classification. Additionally, we demonstrate sMVCCA using Spearman's rank correlation which, unlike Pearson's correlation, can account for nonlinear correlations and outliers. Forty CaP patients with pathological Gleason scores 6-8 were considered for this study. Of these men, 21 experienced biochemical recurrence (BCR) following RP, while 19 did not. For each patient, 189 quantitative histomorphometric attributes and 650 protein expression levels were extracted from the primary tumor nodule. The fused histomorphometric/proteomic representation via sMVCCA combined with a random forest classifier predicted BCR with a mean AUC of 0.74 and a maximum AUC of 0.9286. We found sMVCCA to perform statistically significantly (p < 0.05) better than comparative state-of-the-art data fusion strategies for predicting BCR. Furthermore, Kaplan-Meier analysis demonstrated improved BCR-free survival prediction for the sMVCCA-fused classifier as compared to histology or proteomic features alone.
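The contrast the abstract draws between Spearman's and Pearson's correlation can be illustrated with a minimal sketch. This is not the sMVCCA method and uses no data from the study: the arrays are synthetic, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical, chosen only for this illustration. It assumes a monotonic but nonlinear relationship with no tied values, the setting in which rank correlation stays at 1 while Pearson's coefficient falls, and falls further once an outlier is introduced.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties, for simplicity)."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic data (an assumption for illustration only).
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)          # monotonic in x, but strongly nonlinear

# Same data with one exaggerated outlier; it remains the largest value,
# so the rank ordering is unchanged.
y_out = y.copy()
y_out[-1] *= 50.0

print("clean:   pearson=%.3f spearman=%.3f" % (pearson(x, y), spearman(x, y)))
print("outlier: pearson=%.3f spearman=%.3f" % (pearson(x, y_out), spearman(x, y_out)))
```

Because Spearman's coefficient depends only on the ordering of the values, it is unaffected by the outlier here, whereas Pearson's coefficient drops. In practice a library routine that also handles ties would normally be preferred over the hand-rolled `ranks` helper.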
Keywords :
biomedical MRI; cancer; correlation methods; image representation; medical image processing; proteins; proteomics; tumours; BCR-free survival prediction; Kaplan-Meier analysis; Pearson's correlation; Spearman's rank correlation; biochemical recurrence; data fusion strategy; data representation; dimensionality curse problem; disease classification; high-dimensional data stream; histomorphometric representation; nonlinear correlation; primary tumor nodule; protein expression level; proteomic feature; proteomic representation; quantitative image feature; radical prostatectomy; random forest classifier; recurrent prostate cancer prediction; sMVCCA; supervised multiview canonical correlation analysis; Correlation; Feature extraction; Optimization; Prostate cancer; Proteins; Proteomics; Vectors; Data fusion; digital pathology; dimensionality reduction; mass spectrometry; prostate cancer; proteomics
Journal_Title :
Medical Imaging, IEEE Transactions on
DOI :
10.1109/TMI.2014.2355175