Title :
Linear Separability of Gene Expression Data Sets
Author :
Unger, Giora ; Chor, Benny
Author_Institution :
Sch. of Comput. Sci., Tel-Aviv Univ., Tel-Aviv, Israel
Abstract :
We study simple geometric properties of gene expression data sets, where samples are taken from two distinct classes (e.g., two types of cancer). Specifically, the problem of linear separability for pairs of genes is investigated. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated with the phenomenon of the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomenon (e.g., a specific cancer). We developed and implemented novel, efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it both to an upper bound on the number expected by chance and to the numbers obtained empirically by randomly shuffling the sample labels. Seven out of these 10 data sets are highly separable. This phenomenon is statistically highly significant and very unlikely to occur at random. It is therefore reasonable to expect that it manifests a functional association between separating genes and the underlying phenotypic classes.
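As a rough illustration of the pairwise separability test and the label-shuffling comparison described in the abstract, the following Python sketch may help. It is not the authors' algorithm: it uses a simple brute-force check (all difference vectors between the two classes must lie in one open half-plane), and the names `linearly_separable`, `count_separating_pairs`, and `shuffled_counts`, as well as the data layout (one list of expression values per gene, labels 0 and 1), are assumptions made here for illustration only.

```python
import math
import random
from itertools import combinations


def linearly_separable(points_a, points_b, eps=1e-12):
    """Return True if some line strictly separates two finite 2D point sets.

    A direction w separates the sets iff w . (b - a) > 0 for every a, b,
    i.e. iff all difference vectors b - a fit in one open half-plane.
    """
    angles = []
    for ax, ay in points_a:
        for bx, by in points_b:
            dx, dy = bx - ax, by - ay
            if dx == 0 and dy == 0:
                return False  # a shared point rules out strict separation
            angles.append(math.atan2(dy, dx))
    if not angles:
        return True  # one class is empty: trivially separable
    angles.sort()
    # Circular gaps between consecutive directions; a gap larger than pi
    # means every direction fits in an arc shorter than pi.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return max(gaps) > math.pi + eps


def count_separating_pairs(expr, labels):
    """Count gene pairs whose joint expression separates the two classes.

    expr[g][s] is the expression of gene g in sample s (assumed layout);
    labels[s] is 0 or 1.
    """
    count = 0
    for g, h in combinations(range(len(expr)), 2):
        pts = list(zip(expr[g], expr[h]))
        class0 = [p for p, lab in zip(pts, labels) if lab == 0]
        class1 = [p for p, lab in zip(pts, labels) if lab == 1]
        if linearly_separable(class0, class1):
            count += 1
    return count


def shuffled_counts(expr, labels, trials=100, seed=0):
    """Separating-pair counts after randomly shuffling the sample labels."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        perm = list(labels)
        rng.shuffle(perm)
        counts.append(count_separating_pairs(expr, perm))
    return counts


# Toy data (made up for illustration): 3 genes, 4 samples, 2 per class.
toy_expr = [
    [1, 2, 8, 9],
    [5, 1, 4, 2],
    [3, 7, 2, 8],
]
toy_labels = [0, 0, 1, 1]
observed = count_separating_pairs(toy_expr, toy_labels)
null_counts = shuffled_counts(toy_expr, toy_labels, trials=20)
```

Comparing `observed` with the distribution in `null_counts` mirrors, in miniature, the empirical comparison against shuffled labels. The brute-force test costs time proportional to the product of the class sizes (plus a sort) per gene pair, so it is only suitable for small examples.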
Keywords :
DNA; bioinformatics; cancer; computational geometry; genetics; cancer data sets; computational geometric properties; functional association; gene expression data sets; linear separability; molecular mechanism; phenotypic classes; simple geometric properties; Bioinformatics (genome or protein) databases; Biology and genetics; DNA microarrays; Data mining; Gene expression analysis; Geometrical problems and computations; Heuristic methods; Information filtering; diagnosis; linear separation; Algorithms; Computational Biology; Databases, Genetic; Gene Expression Profiling; High-Throughput Screening Assays; Humans; Linear Models; Neoplasms; Oligonucleotide Array Sequence Analysis
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2008.90