Title :
A simple clustering approach for pathogenic strain identification based on local and global amino acid compositional signatures from genomic sequences: the Escherichia genus case
Author :
Promponas, Vasilis J.
Author_Institution :
Dept. of Biol. Sci., Univ. of Cyprus, Nicosia, Cyprus
Abstract :
Cluster analysis offers a suite of powerful unsupervised methods that are commonly used as exploratory data analysis tools. Such tools can prove especially useful when analyzing large data sets, where the goal is to gain intuitive insight into subtle correlations between data instances. In this work, we demonstrate that simple hierarchical clustering approaches, based on compositional features extracted from the amino acid sequences encoded in the complete genomic sequences of 25 species/strains belonging to the proteobacterial genus Escherichia, can be used to accurately discriminate between pathogenic and nonpathogenic strains of these bacteria.
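This record does not state which compositional features, distance measure, or linkage criterion the study used. The sketch below only illustrates the general idea under stated assumptions: a "global" signature is taken to be the relative frequency of the 20 standard amino acids pooled over all proteins encoded by one genome, genomes are compared by Euclidean distance, and they are grouped by naive average-linkage agglomerative clustering. The helper names (`composition`, `average_linkage`) and the toy protein sequences are hypothetical and are not the author's method or data.

```python
from itertools import combinations
from math import dist

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins):
    """Assumed 'global' signature: relative frequency of each standard residue,
    pooled over all protein sequences of one genome. Non-standard letters
    (e.g. X, B, *) are skipped."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for seq in proteins:
        for residue in seq.upper():
            if residue in counts:
                counts[residue] += 1
                total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def average_linkage(vectors, n_clusters):
    """Naive agglomerative clustering with average linkage on Euclidean
    distances: repeatedly merge the closest pair of clusters until
    n_clusters remain. Returns lists of input indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(dist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy "proteomes"; real input would be all proteins per genome.
    genomes = {
        "strain_A": ["MKKLLAAG", "MKALLG"],
        "strain_B": ["MKKLLAAGG", "MKALLA"],
        "strain_C": ["MWWYYFFC", "MYWFC"],
    }
    names = list(genomes)
    vectors = [composition(genomes[n]) for n in names]
    for cluster in average_linkage(vectors, n_clusters=2):
        print(sorted(names[i] for i in cluster))
```

On this toy input the two strains with similar residue usage (`strain_A`, `strain_B`) end up in one cluster and `strain_C` in the other. Average linkage was chosen here only because it is a common default for hierarchical clustering; a library routine such as SciPy's hierarchical clustering would be preferable on real genome-scale data.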
Keywords :
biology computing; cellular biophysics; feature extraction; genomics; microorganisms; molecular biophysics; pattern clustering; statistical analysis; Escherichia genus; amino acid compositional signature; amino acid sequences; cluster analysis; compositional feature extraction; exploratory data analysis tool; genomic sequences; hierarchical clustering; pathogenic strain identification; proteobacteria; Amino acids; Bioinformatics; Capacitive sensors; Data analysis; Genomics; Intestines; Microorganisms; Organisms; Pathogens; Proteins; Compositional signatures; bacterial pathogenicity; clustering; genome;
Conference_Title :
Information Technology and Applications in Biomedicine, 2009. ITAB 2009. 9th International Conference on
Conference_Location :
Larnaca
Print_ISBN :
978-1-4244-5379-5
Electronic_ISBN :
978-1-4244-5379-5
DOI :
10.1109/ITAB.2009.5394396