Title :
A Sequence Data Mining Protocol to Identify Best Representative Sequence for Protein Domain Families
Author :
Gowri, V.S. ; Shameer, Khader ; Reddy, Chilamakuri Chandra Sekhar ; Shingate, Prashant ; Sowdhamini, Ramanathan
Author_Institution :
Nat. Centre for Biol. Sci. (TIFR), Bangalore, India
Abstract :
Protein domains are compact, evolutionarily conserved units of proteins that can be used to associate functions with the large number of gene products identified by whole-genome sequencing projects. Homology, inferred from sequence similarity, is usually the basis for transferring function annotation from pre-existing domain families to gene products. To reduce computational time in such large-scale data mining, sequence analysis protocols rely on a reference sequence from each family as the query for homology searches. Because protein domain families are diverse, it is important to identify a single best representative sequence from each family using a well-defined, reproducible bioinformatics protocol. We report a new bioinformatics protocol that can be used to identify the best representative sequence (BRS) from protein domain families. The method is based on a “coverage analysis” score implemented using three different sequence search programs, and the trends obtained in reporting the best representative sequence are assessed. The highest average coverage for BRPs was 66% when searched using hidden Markov models. Further, it is crucial to select a BRS specific to the sequence search method when searching large sequence databases.
Keywords :
bioinformatics; data mining; genomics; hidden Markov models; proteins; best representative sequence; bioinformatics protocol; coverage analysis; gene product; genome sequencing project; hidden Markov model; protein domain families; sequence analysis protocol; sequence data mining protocol; protein domain; protein family; sequence analysis; sequence data mining
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.153