DocumentCode :
735870
Title :
A species clustering method based on variation of molecular data with the aid of variance proportion
Author :
Ghavidel, Abolfazl ; Rezaeian, Amin ; Rezaee, Mohammadreza
Author_Institution :
Dept. of Comput. Eng., Ferdowsi Univ. of Mashhad, Mashhad, Iran
fYear :
2015
fDate :
9-11 July 2015
Firstpage :
151
Lastpage :
156
Abstract :
To infer evolutionary relationships and reconstruct phylogenetic trees, evolutionary biologists generally use one of two approaches: character-based or distance-based. Because character-based methods can be computationally very expensive, researchers often resort to estimation methods with practical run times. In this context, distance-based methods are considerably faster because they operate on distance matrices. Sequence comparison, which aims to find similar sequences, is of fundamental importance in computational biology. Many techniques have been developed to compute a suitable distance measure between DNA sequences; however, they are used almost exclusively to build distance matrices, and they usually do not use models of evolution. In this paper, a novel technique based on variance calculation is proposed to quantify how similar the gene sequences in a group are. The method uses the mathematical formula for variance to obtain the average difference among all sequences of a specific set (called a cluster). All sequences whose variation is lower than a predefined variance are then clustered into groups, each of which contains a phylogenetic tree. We believe that the method, despite its simple design, can serve as a logical criterion for clustering DNA sequences, and that it may also prove useful as a simple technique for building distance-based phylogenetic networks, especially when the number of input sequences is large.
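The abstract does not give the exact formula or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes aligned, equal-length sequences, measures the difference between two sequences as the fraction of mismatched positions, takes the mean squared pairwise difference as the within-cluster variation, and uses a simple greedy pass. The names `p_distance`, `cluster_variation`, `greedy_cluster` and `max_variation` are all hypothetical.

```python
from itertools import combinations
from typing import List


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_variation(seqs: List[str]) -> float:
    """Mean squared pairwise distance within a cluster (assumed variance-like measure)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single sequence has no internal variation
    return sum(p_distance(a, b) ** 2 for a, b in pairs) / len(pairs)


def greedy_cluster(seqs: List[str], max_variation: float) -> List[List[str]]:
    """Add each sequence to the first cluster whose variation stays within the threshold."""
    clusters: List[List[str]] = []
    for s in seqs:
        for c in clusters:
            if cluster_variation(c + [s]) <= max_variation:
                c.append(s)
                break
        else:
            # No existing cluster can absorb s without exceeding the threshold.
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "CCCC", "CCCG"]
    print(greedy_cluster(toy, max_variation=0.1))
```

In this sketch the result depends on the input order and on the chosen threshold; a tree would then be built separately inside each resulting group.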
Keywords :
DNA; biology computing; evolutionary computation; genetics; matrix algebra; pattern clustering; DNA sequence clustering; DNA sequences; character-based approaches; computational biology; distance based methods; distance matrix; distance-based approaches; estimation methods; evolutionary relationships; gene sequences; logical criterion; mathematical variance calculation; molecular data variation; phylogenetic networks; phylogenetic trees; sequence comparison; species clustering method; variance proportion; Clustering algorithms; DNA; Genomics; Phylogeny; Time complexity; Vegetation; maximum parsimony; phylogenetic tree; species clustering algorithm; stepwise addition; variance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
Type :
conf
DOI :
10.1109/ReTIS.2015.7232869
Filename :
7232869