Abstract :
The National Human Genome Research Institute and the National Cancer Institute (National Institutes of Health, U.S. Department of Health and Human Services) have recently launched The Cancer Genome Atlas (TCGA) project, with the overarching goal of understanding the molecular basis of cancer in order to improve our ability to diagnose, treat and prevent it. The prognosis for many cancers could be improved dramatically if they could be detected while still at the microscopic disease stage. We are investigating the possibility of detecting microscopic disease using machine learning approaches based on features derived from gene expression levels and metabolic profiles. We use immunochemistry and QRT-PCR to measure the expression profiles of a number of antigens, including cyclin E, P27KIP1, FHIT, Ki-67, PCNA, Bax, Bcl-2, P53, Fas, FasL and hTERT, in several types of neuroendocrine tumors, namely pheochromocytomas and paragangliomas, as well as in adrenocortical carcinomas (ACC), adenomas (ACA), and hyperplasia (ACH) in Cushing's syndrome. We provide statistical evidence that higher expression levels of antigens such as hTERT, PCNA and Ki-67 are associated with a higher risk that a tumor is malignant or borderline, as opposed to benign. We also investigated whether higher expression levels of antigens such as P27KIP1 and FHIT are associated with a decreased risk of adrenomedullary tumors. While no significant difference was found in the expression of cell-arrest antigens such as P27KIP1 among malignant, borderline, and benign tumors, there was a significant difference between the expression levels of such antigens in normal adrenal medulla samples and in adrenomedullary tumors. A comprehensive statistical analysis indicates that antigens such as hTERT, PCNA and Ki-67 can be considered cancer markers, while another set of antigens, such as P27KIP1 and FHIT, are possible markers for normal tissue.
Because more than one marker must be considered to classify a sample as cancer or no-cancer and, if cancer, as malignant, borderline, or benign, we must develop an intelligent decision system using machine learning techniques, including variants of support vector machines, neural networks, decision trees, self-organizing feature maps (SOFM) and recursive maximum contrast trees (RMCT). The variants and algorithms we developed performed well, yielding an average accuracy that was generally in excess of 90%. Our framework focused not only on different classification schemes and feature selection algorithms but also on ensemble methods such as boosting and bagging, in an effort to improve upon the accuracy of the individual classifiers. Our results indicate that when machine learning and statistical learning techniques are combined appropriately into one integrated intelligent medical decision system, the predictive power can be enhanced significantly. This research has many potential applications, not only in providing an alternative diagnostic tool and a better understanding of the mechanisms involved in malignant transformation, but also in providing information that is useful for treatment planning and cancer prevention.
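As a rough illustration of the bagging idea mentioned above, the following minimal sketch trains decision stumps (one-split decision trees) on bootstrap resamples and combines them by majority vote. It is not the implementation used in this work: the data are synthetic, the two features stand in for hypothetical marker expression levels, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the (feature, threshold, label-above) split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):  # label predicted when the feature value exceeds t
                errors = sum(
                    (above if row[j] > t else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return j, t, above

def predict_stump(stump, row):
    j, t, above = stump
    return above if row[j] > t else 1 - above

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagging(stumps, row):
    """Majority vote over the stumps (an odd n_estimators avoids ties for two classes)."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]

# Synthetic data only: two hypothetical marker levels per sample,
# class 0 clustered near (1, 1) and class 1 clustered near (5, 5).
data_rng = random.Random(42)
X = [[data_rng.gauss(1.0, 0.5), data_rng.gauss(1.0, 0.5)] for _ in range(20)]
X += [[data_rng.gauss(5.0, 0.5), data_rng.gauss(5.0, 0.5)] for _ in range(20)]
y = [0] * 20 + [1] * 20

stumps = fit_bagging(X, y)
accuracy = sum(predict_bagging(stumps, row) == label for row, label in zip(X, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The same resample-and-vote pattern applies to any base classifier; stumps are used here only to keep the sketch self-contained. Accuracy on the training data, as printed here, overstates performance on unseen samples, so a held-out or cross-validated estimate would be needed in practice.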
Keywords :
biology computing; cancer; learning (artificial intelligence); patient diagnosis; trees (mathematics); tumours; Cushing's syndrome; Ki-67 antigen; PCNA antigen; QRT-PCR; TCGA project; The Cancer Genome Atlas project; adenomas; adrenocortical carcinomas; diagnosis; gene expression levels; hTERT antigen; hyperplasia; immunochemistry; machine learning; metabolic profiles; microscopic disease detection; neuroendocrine tumors; paragangliomas; pheochromocytomas; recursive maximum contrast trees; self-organizing feature maps; treatment planning