Author_Institution :
Dept. of Anatomy, Mahidol Univ., Bangkok, Thailand
Abstract :
CpG islands are clusters of CG-rich DNA sequence associated with the promoters of many human genes. Previous studies used the original CpG island criteria: length > 200 bp, %GC ≥ 50%, and ObsCpG/ExpCpG ≥ 0.60. Those studies showed that CpG islands overlap the promoters of all human housekeeping genes and of over half of all tissue-specific genes. The present study used the newer, widely accepted criteria: length > 500 bp, %GC ≥ 55%, and ObsCpG/ExpCpG ≥ 0.65. Under these criteria, CpG islands overlap the promoters of only ~60% of housekeeping genes, which suggests that the previous studies might have included Alu repeats in the promoter region. Artificial neural network classifiers (radial basis function (RBF), multilayer perceptron (MLP), and probabilistic neural network (PNN)) and a support vector machine (SVM) identified two important variables for classifying myocyte specificity. These were the ObsCpG/ExpCpG of 5′-region CpG islands and the ratio of exonic CpG island number to total CpGs. An ObsCpG/ExpCpG < 0.65 was characteristic of myocyte-specific genes. For classifying neuron specificity, the important variables were the %GC and ObsCpG/ExpCpG of 5′-region CpG islands.
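The following minimal Python sketch makes the two sets of island criteria concrete. It is an illustration, not the authors' pipeline. It assumes that ObsCpG/ExpCpG is computed with the common Gardiner-Garden and Frommer formula, (number of CpG × length) / (number of C × number of G), and that a single candidate window is tested rather than a whole genome being scanned. The names `cpg_stats`, `is_cpg_island`, `ORIGINAL_CRITERIA`, and `NEW_CRITERIA` are hypothetical.

```python
# Hypothetical thresholds taken from the abstract: (min length in bp, min %GC, min Obs/Exp)
ORIGINAL_CRITERIA = (200, 50.0, 0.60)
NEW_CRITERIA = (500, 55.0, 0.65)


def cpg_stats(seq: str) -> tuple:
    """Return (length, %GC, ObsCpG/ExpCpG) for one DNA sequence window."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # Count CG dinucleotides (CpG sites) by sliding over adjacent base pairs
    cpg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Assumed Obs/Exp formula: (CpG count * length) / (C count * G count)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_percent, obs_exp


def is_cpg_island(seq: str, min_len: int, min_gc: float, min_obs_exp: float) -> bool:
    """Check a window against length > min_len, %GC >= min_gc, Obs/Exp >= min_obs_exp."""
    n, gc_percent, obs_exp = cpg_stats(seq)
    return n > min_len and gc_percent >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    window = "CG" * 150  # 300 bp, fully CpG-rich
    print(is_cpg_island(window, *ORIGINAL_CRITERIA))  # passes the > 200 bp rule
    print(is_cpg_island(window, *NEW_CRITERIA))       # fails the > 500 bp rule
```

The 300 bp example shows how the stricter length cutoff alone can turn a region that counts as an island under the original criteria into one that no longer qualifies.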
Keywords :
bioinformatics; neural nets; CpG island distribution; DNA sequences; artificial neural networks; housekeeping genes; human housekeeping genes; human neuron genes; myocyte specific genes; Accuracy; DNA; Genomics; Humans; Support vector machines; CpG island; MLP; PNN; RBF; SVM; tissue-specific