Regional Information Center for Science and Technology (RICeST) - An Improved Scoring Method for Protein Residue Conservation and Multiple Sequence Alignment

DocumentCode :

1425484

Title :

An Improved Scoring Method for Protein Residue Conservation and Multiple Sequence Alignment

Author :

Nguyen, Ken D. ; Pan, Yi

Author_Institution :

Dept. of Inf. Technol., Clayton State Univ., Morrow, GA, USA

Volume :

Issue :

fYear :

2011

Firstpage :

275

Lastpage :

285

Abstract :

One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). Optimally aligning multiple sequences is an intractable problem; nevertheless, MSA is a critical tool for biologists to identify the relationships between species and possibly to predict the structure and function of biological sequences. The most fundamental step in assembling an MSA result is identifying the best location to place the sequence residues, and the accuracy of the assembly depends heavily on the reliability of the scoring function used. With an appropriate scoring function, an MSA program can improve its alignment accuracy by up to 25%. In this study, we present a new, fast, and biologically reliable scoring method, hierarchical expected matching probability (HEP), for use in protein multiple sequence alignment. The new scoring method eliminates the burden of the gap cost selection process. It has consistently proven more biologically reliable than all other tested scoring methods in all tests on four theoretical and experimental benchmarks: Valdar's theoretical conservation benchmark, RT-OSM, BAliBASE3.0, and PREFAB4.0. An implementation of the new scoring method in a progressive multiple sequence alignment algorithm, resembling the alignment algorithms in PIMA, ClustalW, and T-COFFEE, has shown accuracy improvements of up to 7% on the BAliBASE3.0 benchmark and up to 5% on the PREFAB4.0 benchmark.

Keywords :

biology computing; molecular biophysics; molecular configurations; probability; proteins; BAliBASE3.0; ClustalW; HEP; PIMA; PREFAB4.0; RT-OSM; T-COFFEE; Valdar theoretical conservation benchmark; biological sequence analysis; gap cost selection; hierarchical expected matching probability; multiple sequence alignment; protein residue conservation; scoring method; Amino acids; Bioinformatics; Biological information theory; Protein sequence; Reliability; Sequences; Conservation scores; multiple sequence alignment; proteins; scoring functions; scoring methods; Algorithms; Reproducibility of Results; Research Design; Sequence Alignment; Sequence Analysis, Protein; Synteny

fLanguage :

English

Journal_Title :

NanoBioscience, IEEE Transactions on

Publisher :

IEEE

ISSN :

1536-1241

Type :

jour

DOI :

10.1109/TNB.2011.2179553

Filename :

6133485

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1425484