Title :
Classification of lung cancer subtypes by data mining technique
Author :
Dass, M. Venkat ; Rasheed, Mohammed Abdul ; Ali, Md Mortuza
Author_Institution :
Dept. of CSE UCE(A), Osmania Univ., Hyderabad, India
fDate :
Jan. 31 2014-Feb. 2 2014
Abstract :
Lung cancer is the leading cause of cancer-related deaths worldwide. Classification and characterization of cancer treatment strategies are essential in the current medical era. Gene mutations and altered gene expression are the basis of cancer development. This paper proposes analyzing these gene mutations and gene expression data for the phenotypic classification of lung cancer. Genomic and proteomic data sets (biomarkers) of Non-Small Cell Lung Cancer (NSCLC) and its two major subtypes, Squamous Cell Cancer (SCC) and adenocarcinoma (ADC), were analyzed in this study. The biomarkers included in the genomic and proteomic data sets are microRNAs, genes and their proteins. An integrated classification decision tree induction algorithm is applied to these NSCLC biomarkers to make predictions. The knowledge derived by the proposed algorithm has high classification accuracy and can predict the cancer type. Cross-validation is applied, which further enhances the classification accuracy of the J48 algorithm. Our contribution thus includes constructing a decision tree for lung cancer subtypes using the J48 Weka tool and predicting the lung cancer type for instances of unknown class. Second, we compare the outputs obtained using the J48 algorithm with those of the improved decision tree (J48). Through the construction of the decision tree, a total of ten top classification rules for predicting lung cancer are obtained using the Apriori algorithm (Weka tool). The average correct classification accuracy is nearly 99.7%, but many of the rules that are of interest to the user are pruned. The classification rules obtained by the improved decision tree depend on user decisions, which makes it possible to derive an unlimited number of rules based on the selection of attribute values. The improved decision tree shows a good improvement over the J48 algorithm. The findings can serve as helpful reference rules in the diagnosis of, and drug development for, SCC and ADC cancers.
Accurate differential diagnosis of lung cancer based on knowledge of biomarkers could reduce the pain of histopathological examination for patients.
Keywords :
RNA; cancer; cellular biophysics; data mining; decision trees; genetics; genomics; medical computing; patient diagnosis; patient treatment; pattern classification; proteins; ADC; J48 algorithm; J48 weka tool; NSCLC cancers; SCC; adenocarcinoma; apriori algorithm; attribute values; average correction classification accuracy; biomarkers; cancer development; cancer treatment strategies; cancer type prediction; cancer-related deaths; classification rules; cross-validation technique; data mining technique; differential diagnosis; drug development; gene expression data; gene mutations; genomic data sets; histopathological examination; integrated classification decision tree induction algorithm; lung cancer subtypes classification; medical era; microRNA; nonsmall cell lung cancer; phenotypic classification; proteins; proteomic data sets; squamous cell cancer; Biomarkers; Cancer; Classification algorithms; Data mining; Decision trees; Gene expression; Lungs; Non-small cell lung cancer; biomarker; data mining; improved decision tree induction; subtype classification;
Conference_Titel :
Control, Instrumentation, Energy and Communication (CIEC), 2014 International Conference on
Conference_Location :
Calcutta
DOI :
10.1109/CIEC.2014.6959151