Title : 
A protein family classification method for analysis of large DNA sequences
         
        
            Author : 
Henikoff, Steven ; Henikoff, Jorja G.
         
        
            Author_Institution : 
Fred Hutchinson Cancer Research Center, Howard Hughes Medical Institute, Seattle, WA, USA
         
        
            Abstract : 
A method is described for identification and classification of proteins encoded in large DNA sequences. Previously, an automated system was introduced for the general detection of amino acid sequence motifs within diverse protein families. The system generated a database consisting of aligned sequence segments (blocks) that correspond to the most highly conserved regions of proteins. This database of blocks can be searched using protein queries for sensitive detection of homology based on the detection of both local and global similarities. We show that this database searching approach can also be used to detect distant relatives encoded in very large DNA sequences. The approach is illustrated by the detection of known and new relationships in the 315 kilobase sequence of yeast chromosome III.
         
        
            Keywords : 
DNA; biology computing; classification; database management systems; proteins; aligned sequence segments; amino acid sequence motifs; automated system; block database; database searching; distant relatives; global similarities; highly conserved regions; large DNA sequences; local similarities; protein family classification method; protein queries; sensitive homology detection; yeast chromosome III
         
        
            Conference_Title : 
Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, 1994
         
        
            Conference_Location : 
Wailea, HI, USA
         
        
            Print_ISBN : 
0-8186-5090-7
         
        
            DOI : 
10.1109/HICSS.1994.323570