Regional Information Center for Science and Technology - Adaptive Weighting Distance for Feature Vectors of Biological Sequences

DocumentCode :

1931422

Title :

Adaptive Weighting Distance for Feature Vectors of Biological Sequences

Author :

Kuo, Huang-Cheng ; Jou, Pei-yuan ; Huang, Jen-Peng

Author_Institution :

Nat. Chiayi Univ., Chiayi

Volume :

fYear :

2007

fDate :

19-22 Aug. 2007

Firstpage :

2269

Lastpage :

2273

Abstract :

Similarity search in biological sequences has received substantial attention in the past decade. Sequence alignment is the essential task for similar-sequence search in bioinformatics. As biological sequence databases have grown larger over the past decade, finding sequences similar to a query sequence has become a time-consuming task. By transforming sequences into numeric feature vectors, we can quickly filter out sequences whose feature vectors are distant from the feature vector of the query sequence. We propose an adaptive weighting distance based on a feature vector that contains three groups of features of a DNA sequence: count, extended relative position dispersion (XRPD), and extended absolute position dispersion (XAPD). Each group has four dimensions, one each for A, C, T, and G. Euclidean distance and L1 distance are commonly used to compute the distance between two feature vectors; in this paper, we use a weighted L1 distance instead. We derive weights for the four nucleotides from the count group and apply them to both the XRPD and XAPD groups. In other words, if one kind of nucleotide appears much more frequently than the others, its weight in the XRPD and XAPD groups should also be large. Experiments show that this feature-vector distance helps reflect the distance between sequences.
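The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the XRPD or XAPD formulas or the exact weight derivation, so this sketch assumes the 12-dimensional feature vectors are already computed (laid out as count[A,C,T,G], XRPD[A,C,T,G], XAPD[A,C,T,G]) and only shows the distance step. The weighting rule (each nucleotide's relative frequency in the pooled counts of both sequences) and the choice to leave the count group unweighted are assumptions made for illustration, and all function names are hypothetical, not the authors' code.

```python
from typing import List, Sequence

NUCLEOTIDES = ("A", "C", "T", "G")  # dimension order used in the abstract


def count_features(seq: str) -> List[float]:
    """Count group: number of occurrences of A, C, T, G in a DNA sequence."""
    upper = seq.upper()
    return [float(upper.count(n)) for n in NUCLEOTIDES]


def nucleotide_weights(count_a: Sequence[float], count_b: Sequence[float]) -> List[float]:
    """Hypothetical weighting rule (an assumption, not from the paper):
    relative frequency of each nucleotide in the pooled counts of both sequences,
    so a nucleotide that appears more often receives a larger weight."""
    pooled = [a + b for a, b in zip(count_a, count_b)]
    total = sum(pooled)
    if total == 0:
        return [0.25] * 4  # no information: fall back to equal weights
    return [p / total for p in pooled]


def adaptive_weighted_l1(fv_a: Sequence[float], fv_b: Sequence[float]) -> float:
    """Weighted L1 distance between two 12-dim vectors laid out as
    count[0:4] + XRPD[4:8] + XAPD[8:12] (layout assumed for this sketch)."""
    if len(fv_a) != 12 or len(fv_b) != 12:
        raise ValueError("expected 12-dimensional feature vectors")
    count_a, count_b = fv_a[0:4], fv_b[0:4]
    w = nucleotide_weights(count_a, count_b)
    # Count group: plain (unweighted) L1, an assumption of this sketch.
    dist = sum(abs(a - b) for a, b in zip(count_a, count_b))
    # XRPD (offset 4) and XAPD (offset 8): the same per-nucleotide weights
    # derived from the count group are applied to both groups.
    for offset in (4, 8):
        for i in range(4):
            dist += w[i] * abs(fv_a[offset + i] - fv_b[offset + i])
    return dist
```

For example, with counts [3, 1, 0, 0] and [1, 1, 0, 2] the pooled weights are [0.5, 0.25, 0.0, 0.25], so a difference in the A dimension of XRPD or XAPD costs twice as much as the same difference in C or G, and T differences are ignored because T never occurs in either sequence.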

Keywords :

DNA; biology computing; search problems; sequences; DNA sequence; Euclidean distance; L1 distance; adaptive weighting distance; bioinformatics; biological sequences; extended absolute position dispersion; extended relative position dispersion; feature vectors; query sequence; sequence alignment; similarity search; Biology; Cities and towns; Computer science; Cybernetics; Information management; Machine learning; DNA Sequence; Feature Vector; Weight Assignment;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Cybernetics, 2007 International Conference on

Conference_Location :

Hong Kong

Print_ISBN :

978-1-4244-0973-0

Electronic_ISBN :

978-1-4244-0973-0

Type :

conf

DOI :

10.1109/ICMLC.2007.4370523

Filename :

4370523

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1931422