• DocumentCode
    2736475
  • Title

    Modeling protein tandem mass spectrometry data with an extended linear regression strategy

  • Author

    Liu, Han ; Bonner, Anthony J. ; Emili, Andrew

  • Author_Institution
    Dept. of Comput. Sci., Toronto Univ., Ont., Canada
  • Volume
    2
  • fYear
    2004
  • fDate
    1-5 Sept. 2004
  • Firstpage
    3055
  • Lastpage
    3059
  • Abstract
    Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics, owing in part to robust spectral interpretation algorithms. The intensity patterns in mass spectra carry useful information for identifying peptides and proteins. However, widely used algorithms cannot predict peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large-scale modeling of protein expression profiles. We prove an important technical result: the regression coefficient vector is the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data. The extended regression problem can therefore be reduced to a singular value decomposition (SVD) problem, thereby gaining robustness and efficiency. To evaluate the performance of our model, we chose 2,859 of 60,960 spectra, those with high-confidence, non-redundant matches, as training data. For this specific problem, we derived several goodness-of-fit measures to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.
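    The link between the least eigenvalue and the SVD mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the space transformation or the exact regression model, so the example below assumes, purely for illustration, that the transformation is column centering and that the fit is a homogeneous least-squares problem (minimize ||Xc w|| subject to ||w|| = 1). The function name, the synthetic data, and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def smallest_eigvec_fit(X):
    """Unit vector w minimizing ||Xc @ w||, where Xc is X with centered columns.

    Assumption: column centering stands in for the paper's unspecified
    "space transformation". The minimizer is the right singular vector of Xc
    with the smallest singular value, which is also the eigenvector of
    Xc.T @ Xc with the least eigenvalue.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[-1], s[-1]


# Synthetic data (hypothetical): points near the plane n . x = c in 3-D.
rng = np.random.default_rng(0)
n = np.array([2.0, -1.0, 0.5])
n_unit = n / np.linalg.norm(n)
c = 3.0

free = rng.normal(size=(200, 2))                 # x0 and x1 chosen freely
x2 = (c - n[0] * free[:, 0] - n[1] * free[:, 1]) / n[2]
X = np.column_stack([free, x2])
X += 0.01 * rng.normal(size=X.shape)             # small measurement noise

w, s_min = smallest_eigvec_fit(X)

# w should be parallel (up to sign) to the true plane normal.
print("alignment with true normal:", abs(w @ n_unit))
```

    Working from the SVD of the centered data rather than forming Xc.T @ Xc explicitly avoids squaring the condition number, which is one common reason an SVD formulation is considered numerically robust.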
  • Keywords
    biochemistry; eigenvalues and eigenfunctions; mass spectra; mass spectroscopy; medical computing; molecular biophysics; proteins; regression analysis; singular value decomposition; SVD decomposition; eigenvector; extended linear regression strategy; least eigenvalue; mammalian proteomes; mass spectra; peptides identification; protein expression profile; proteins identification; proteomics; regression coefficient vector; spectral interpretation algorithm; tandem mass spectrometry; Computer science; Eigenvalues and eigenfunctions; Genetics; Large-scale systems; Linear regression; Mass spectroscopy; Peptides; Protein engineering; Proteomics; Robustness; expression profile; goodness of fit;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE
  • Conference_Location
    San Francisco, CA
  • Print_ISBN
    0-7803-8439-3
  • Type

    conf

  • DOI
    10.1109/IEMBS.2004.1403864
  • Filename
    1403864