• DocumentCode
    3602158
  • Title
    A Machine Learning Based Approach to de novo Sequencing of Glycans from Tandem Mass Spectrometry Spectrum
  • Author
    Kumozaki, Shotaro ; Sato, Kengo ; Sakakibara, Yasubumi
  • Author_Institution
    Dept. of Biosci. & Inf., Keio Univ., Yokohama, Japan
  • Volume
    12
  • Issue
    6
  • fYear
    2015
  • Firstpage
    1267
  • Lastpage
    1274
  • Abstract
    Glycomics has recently been studied actively, and technologies supporting it have developed rapidly. Tandem mass spectrometry (MS/MS) is currently one of the key experimental tools for identifying the structures of oligosaccharides. MS/MS spectra contain peaks of fragmented glycan ions, including cross-ring ions resulting from internal cleavages, which provide valuable information for inferring glycan structures. The aim of de novo sequencing of glycans is therefore to find the most probable assignment of observed MS/MS peaks to glycan substructures without relying on databases. However, few satisfactory algorithms exist for glycan de novo sequencing from MS/MS spectra. We present a machine learning based approach to de novo sequencing of glycans from MS/MS spectra. First, we build a suitable model for the fragmentation of glycans, including cross-ring ions, and implement a solver that employs Lagrangian relaxation with a dynamic programming technique. Then, to optimize the scores used by the algorithm, we introduce a machine learning technique called structured support vector machines, which enables us to learn parameters, including scores for cross-ring ions, from training data, i.e., known glycan mass spectra. Furthermore, we implement additional constraints for the core structures of well-known glycan types, including N-linked glycans and O-linked glycans. These constraints enable us to predict glycan structures more accurately when the glycan type of the given spectra is known. Computational experiments show that our algorithm performs accurate de novo sequencing of glycans. The implementation of our algorithm and the datasets are available at http://glyfon.dna.bio.keio.ac.jp/.
  • Keywords
    biology computing; dynamic programming; learning (artificial intelligence); mass spectroscopy; molecular biophysics; molecular configurations; organic compounds; support vector machines; Lagrangian relaxation; MS-MS spectra; N-linked glycans; O-linked glycans; computational experiments; core structures; cross-ring ions; glycan de novo sequencing; glycan fragmentation; glycan substructures; machine learning based approach; structured support vector machines; tandem mass spectrometry spectrum; Bioinformatics; Computational biology; Glycomics; Ions; Linear programming; Sequential analysis; Glycan structure; MS/MS spectrum; structured SVM
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2015.2430317
  • Filename
    7102732