Regional Information Center for Science and Technology - A species clustering method based on variation of molecular data with the aid of variance proportion

DocumentCode :

735870

Title :

A species clustering method based on variation of molecular data with the aid of variance proportion

Author :

Ghavidel, Abolfazl ; Rezaeian, Amin ; Rezaee, Mohammadreza

Author_Institution :

Dept. of Comput. Eng., Ferdowsi Univ. of Mashhad, Mashhad, Iran

fYear :

2015

fDate :

9-11 July 2015

Firstpage :

151

Lastpage :

156

Abstract :

To infer evolutionary relationships and reconstruct phylogenetic trees, evolutionists generally employ one of two approaches: character-based or distance-based. Because character-based methods can be prohibitively expensive to compute, researchers must resort to estimation methods with practical run times. Distance-based methods are considerably faster because they operate on distance matrices. In computational biology, sequence comparison, which seeks to find similar sequences, is of fundamental importance. Many techniques have been developed to compute a suitable distance measure between DNA sequences; however, they are used almost exclusively to build distance matrices, and they usually do not use models of evolution. In this paper, a novel technique based on the mathematical calculation of variance is proposed to show how similar the gene sequences in a group are. In this strategy, the variance formula is used to obtain the average of the differences among all sequences in a specific set (called a cluster). Eventually, all sequences with variation lower than a predefined variance are clustered into groups, each of which contains a phylogenetic tree. We believe that our method, despite its simple design, can serve as a logical criterion for clustering DNA sequences, and that it could also prove useful as a simple technique for building distance-based phylogenetic networks, especially when the number of input sequences is large.
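The abstract does not give the variance formula or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pre-aligned sequences of equal length, uses the proportion of differing sites (p-distance) as the distance, takes the "variation" of a cluster to be the mean squared pairwise distance, and assigns sequences greedily. The names `p_distance`, `cluster_variation`, `cluster_by_variation`, and `max_variation` are hypothetical, and the per-cluster tree construction mentioned in the abstract is not shown.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_variation(seqs: list[str]) -> float:
    """Assumed variation measure: mean squared pairwise p-distance.

    Returns 0.0 for a cluster with fewer than two sequences.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) ** 2 for a, b in pairs) / len(pairs)


def cluster_by_variation(seqs: list[str], max_variation: float) -> list[list[str]]:
    """Greedy pass (an assumption): place each sequence in the first cluster
    whose variation stays below the threshold, else open a new cluster."""
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if cluster_variation(cluster + [seq]) < max_variation:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy input (made up for illustration, not data from the paper).
toy = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG"]
clusters = cluster_by_variation(toy, max_variation=0.05)
print(clusters)
```

With this toy input and threshold, the two near-identical pairs end up in separate clusters. The greedy assignment depends on input order, so a different ordering or threshold may group the sequences differently.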

Keywords :

DNA; biology computing; evolutionary computation; genetics; matrix algebra; pattern clustering; DNA sequence clustering; DNA sequences; character-based approaches; computational biology; distance based methods; distance matrix; distance-based approaches; estimation methods; evolutionary relationships; gene sequences; logical criterion; mathematical variance calculation; molecular data variation; phylogenetic networks; phylogenetic trees; sequence comparison; species clustering method; variance proportion; Clustering algorithms; DNA; Genomics; Phylogeny; Time complexity; Vegetation; maximum parsimony; phylogenetic tree; species clustering algorithm; stepwise addition; variance;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on

Conference_Location :

Kolkata

Type :

conf

DOI :

10.1109/ReTIS.2015.7232869

Filename :

7232869

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=735870