Bioinformatic Approaches to Improve the Identification of Peptides from Proteomics Experiments

Author

Lau, King Wai ; Siepen, Jennifer

Author_Institution

Manchester Univ.

fYear

2006

Firstpage

Lastpage

Abstract

The accurate analysis of the proteome using mass spectrometry plays an important role in understanding many of the physiological processes that occur in an organism, and it has become a standard tool for identifying proteins. Protein identification is challenging and relies upon bioinformatics tools to characterize proteins via their proteolytic peptides, which are identified from the characteristic mass spectra generated after their ions undergo fragmentation in the gas phase within the mass spectrometer. An important problem in the accurate identification of peptides by mass spectrometry is whether a particular peptide is likely to be detected in a standard proteomics experiment. This can depend on a number of factors, including the physicochemical properties of the peptide itself as well as the mass spectrometer used in the experiment. We applied a machine learning approach to find peptide fragmentation patterns based on different properties of the peptide sequence, and we are able to predict which peptides are likely to be detected in a standard proteomics experiment. The task of protein identification is made even more challenging by partial enzymatic protein cleavage, which results in peptides with internal missed cleavage sites, as proteases frequently fail to digest proteins to their limit peptides. Typically, up to one of these "missed cleavages" is considered by bioinformatics search tools, usually after in silico digestion of the proteome by trypsin. Using rules derived from information theory, we were able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching with two different search engines.
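
The in silico digestion step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) and enumerates peptides with up to a chosen number of missed cleavages. The function names, the example sequence, and the `masked_sites` parameter are hypothetical and introduced only for illustration. The abstract does not state the information-theoretic rules themselves, so here a masked site is simply one supplied by the caller and treated as never cleaved.

```python
import re


def tryptic_sites(protein):
    """Return positions after which trypsin is assumed to cleave.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    A K or R at the very end of the sequence is not reported as a site.
    """
    return [m.end() for m in re.finditer(r"[KR](?!P)", protein)
            if m.end() < len(protein)]


def digest(protein, max_missed=1, masked_sites=frozenset()):
    """Enumerate peptides allowing up to `max_missed` missed cleavages.

    `masked_sites` (hypothetical) holds cleavage positions predicted to be
    missed; they are dropped so the flanking peptides are always joined.
    """
    sites = [s for s in tryptic_sites(protein) if s not in masked_sites]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages in the peptide
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


if __name__ == "__main__":
    seq = "MAKGGRPLLKDE"  # made-up sequence, not from the paper
    print(digest(seq, max_missed=0))
    print(digest(seq, max_missed=1))
    print(digest(seq, max_missed=0, masked_sites={3}))
```

In this sketch, masking a site removes it before digestion, so a search that allows no missed cleavages still yields the joined peptide without having to enumerate every one-missed-cleavage combination across the database.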

Keywords

biochemistry; biology computing; enzymes; learning (artificial intelligence); mass spectra; mass spectroscopic chemical analysis; molecular biophysics; bioinformatics; database searching; fragmentation; information theory; internal missed cleavage sites; machine learning; mass spectra; mass spectrometry; partial enzymatic protein cleavage; peptide identification; peptide sequence; physiochemical properties; physiological processes; proteases; protein identification; proteolytic peptides; proteome; proteomics; trypsin;

fLanguage

English

Publisher

IET

Conference_Titel

Signal Processing for Genomics, 2006. The Institution of Engineering and Technology Seminar on

Print_ISBN

0-86341-716-7

Type

conf

Filename

4126032

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3572563