Title :
Microarray Data Mining: A New Algorithm for Gene Selection Using Lorenz Curves & Gini Ratios
Author_Institution :
Lamar (Texas State) Univ., TX, USA
Abstract :
Gene selection is a challenging task in microarray data mining because a typical microarray dataset has only a small number of records but thousands of attributes. Such a dataset creates a high likelihood of finding spurious predictions that arise purely by chance. Finding the most relevant genes is often the key step in building an accurate classification model. Irrelevant and redundant attributes reduce the accuracy of classification algorithms. In this paper, we present a new method for gene selection that uses techniques from economics. We modify Lorenz curves and Gini coefficients to take into account the order of the classes and the order of a gene's discretized values, and we use the modified measures to select relevant genes. We believe our method is the first attribute-selection method that considers both the order of the classes and the order of an attribute's discretized values. We implemented the new method and compared it with SAM, one of the most popular gene selection methods. Experimental results with many different classification algorithms on the task of classifying lung adenocarcinomas from gene expression data show that (a) our method differs from SAM in that it finds very different sets of significant genes, and (b) our method selects genes that yield more accurate classification.
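For reference, the sketch below computes the classical (unmodified) Lorenz curve and Gini coefficient from economics, which is the starting point the abstract says the authors modify. It does not reproduce the paper's modification: the abstract does not specify how the order of classes and the order of discretized values enter the computation, so this sketch simply sorts values in ascending order as the textbook definition does. The function names `lorenz_curve` and `gini_coefficient` are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Classical Lorenz curve: points (cumulative population share,
    cumulative value share), with values sorted in ascending order.
    The paper's modified, order-aware variant is NOT implemented here."""
    if not values:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = float(sum(values))
    if total == 0:
        raise ValueError("values must not all be zero")
    ordered = sorted(values)  # textbook convention: poorest first
    n = len(ordered)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, v in enumerate(ordered, start=1):
        cumulative += v
        points.append((i / n, cumulative / total))
    return points


def gini_coefficient(values: Sequence[float]) -> float:
    """Gini coefficient = 1 - 2 * (area under the Lorenz curve),
    with the area computed by the trapezoid rule."""
    points = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area
```

With perfectly equal values such as `[1, 1, 1, 1]` the curve lies on the diagonal and the coefficient is 0; with `[1, 2, 3, 4]` it is 0.25; with `[0, 0, 0, 1]` it reaches the discrete maximum (n - 1)/n = 0.75. An order-aware variant along the lines the abstract describes would presumably replace the `sorted` step with an ordering given by the classes and the discretized gene values, but that detail is an assumption not confirmed by the abstract.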
Keywords :
biology computing; data mining; genetics; pattern classification; Gini ratios; Lorenz curves; classification algorithms; gene selection; microarray data mining; microarray dataset; Diseases; Economic forecasting; Gene expression; Information technology; Lungs; Medical diagnosis; Morphology; Neoplasms; Classification
Conference_Title :
Information Technology: New Generations (ITNG), 2010 Seventh International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-6270-4
DOI :
10.1109/ITNG.2010.228