DocumentCode :
244627
Title :
An algorithm for identifying differentially expressed genes in multiclass RNA-seq samples
Author :
Jaehyun An ; Kwangsoo Kim ; Sun Kim
Author_Institution :
Sch. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
fYear :
2014
fDate :
15-17 Jan. 2014
Firstpage :
40
Lastpage :
44
Abstract :
Gene expression across the whole cell can be routinely measured with microarray technologies or, more recently, with sequencing technologies. Using these technologies, identifying Differentially Expressed Genes (DEGs) among multiple phenotypes is one of the most important tasks in biology. Thus many methods for detecting DEGs between two groups have been developed. For example, the t-test and relative entropy are used to detect the difference between two probability distributions. When more than two phenotypes are considered, these methods are not applicable, and other methods such as the ANOVA F-test and the Kruskal-Wallis test are used to find DEGs in multiclass data. However, the ANOVA F-test assumes a normal distribution and is not designed to identify DEGs where genes are expressed distinctively in each of the phenotypes. The Kruskal-Wallis method, a non-parametric method, is more robust but sensitive to outliers. This paper proposes a non-parametric, information-theoretic approach for identifying DEGs in multiclass data that is less sensitive to outliers. In extensive experiments with simulated and real data, our method outperformed existing tools. In addition, a web service is implemented for the analysis of multi-class data: http://biohealth.snu.ac.kr/software/degselection.
Keywords :
RNA; biology computing; data analysis; genetics; lab-on-a-chip; normal distribution; ANOVA F-test; DEG; Kruskal-Wallis method; T-test; biology; differentially expressed genes identification; information theoretical approach; microarray technologies; multiclass RNA-seq samples; multiclass data analysis; multiple class data; multiple phenotypes; nonparametric method; normal distribution; probability distributions; real data; relative entropy; sequencing technologies; simulated data; Accuracy; Bioinformatics; Biological information theory; Breast cancer; Gene expression; Mutual information; Robustness; Bioinformatics; Differentially expressed genes; Multiclass problem; RNA-seq;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data and Smart Computing (BIGCOMP), 2014 International Conference on
Conference_Location :
Bangkok
Type :
conf
DOI :
10.1109/BIGCOMP.2014.6741402
Filename :
6741402