Title :
InfoBarcoding: Selection of non-contiguous sites in molecular biomarker
Author :
Chiu, David K Y ; Xu, Peter S C
Author_Institution :
Dept. Comput. Sci., Univ. of Guelph, Guelph, ON, Canada
Abstract :
DNA barcoding has recently emerged as a method for fast taxonomic classification of species using molecular biomarkers. Unlike traditional classification schemes, DNA barcoding often involves a small number of samples in each class, which is likely to lead to overfitting. To evaluate the efficacy of a biomarker based on a given meaningful multiple sequence alignment, we use a metric-based information measure that identifies converging interdependence among statistically significant sites. Experiments show that, for the identified sites, when the convergent information between sites in the biomarker is low, its classification information is also low, whereas when the convergent information is high, the classification information is high. The correlation between these two types of pattern indicates that selecting informative sites is important if the biomarker is to be effective as an identification barcode.
Keywords :
DNA; bar codes; biology computing; classification; molecular biophysics; molecular configurations; DNA barcoding; InfoBarcoding; fast taxonomic classification; metric-based information measure; molecular biomarker; molecular biomarkers; multiple sequence alignment; noncontiguous site selection; Correlation; DNA; Molecular biomarkers; Statistical analysis; DNA barcode; biomarker refinement; convergent information; multiple sequence analysis;
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-61284-851-8
DOI :
10.1109/ICCABS.2011.5729944