• DocumentCode
    3496274
  • Title
    Sparse Bayesian prediction of disordered residues and disordered regions based on amino-acid composition
  • Author
    Cawley, Gavin C. ; Hayward, Steven ; Janacek, Gareth J. ; Moore, Geoff R.
  • fYear
    2011
  • fDate
    July 31, 2011 - Aug. 5, 2011
  • Firstpage
    1618
  • Lastpage
    1623
  • Abstract
    This paper presents some initial results of an investigation into the use of machine learning methods to detect natively disordered regions in proteins from sequence information. A committee of Relevance Vector Machines is used to select the optimal window size for residue-by-residue prediction of disordered regions, based on local amino-acid composition. The minimal error rate of approximately 15% is achieved using very long (205-residue) windows, with the classifier making little use of more local sequence information. This suggests that disorder arises principally from large-scale, diffuse changes in mean hydropathy and, to a lesser extent, mean charge. We also demonstrate that the proportion of proteins having long disordered regions in operational conditions cannot be reliably estimated using a classifier trained on a balanced dataset.
  • Keywords
    Bayes methods; biology computing; learning (artificial intelligence); pattern classification; proteins; disordered regions; disordered residues; local amino-acid composition; machine learning methods; optimal window size; relevance vector machines; residue-by-residue prediction; sparse Bayesian prediction; Amino acids; Bayesian methods; Bioinformatics; Genomics; Logistics; Proteins; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Neural Networks (IJCNN), The 2011 International Joint Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4244-9635-8
  • Type
    conf
  • DOI
    10.1109/IJCNN.2011.6033418
  • Filename
    6033418
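
The abstract describes residue-by-residue prediction from local amino-acid composition within a window (205 residues at the reported optimum). As a rough illustration only, the feature extraction step might be sketched as follows. The function names, the truncation of windows at the sequence ends, and the restriction to the 20 standard residues are assumptions made here for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes). Other symbols, such as X,
# are ignored in this sketch, so their share is simply missing from the vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_composition(sequence, center, window=205):
    """Fraction of each standard amino acid in a window centred on `center`.

    Assumption: the window is truncated at the sequence ends rather than
    padded; the abstract does not say how edges are handled.
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(sequence), center + half + 1)
    counts = Counter(sequence[start:end])
    length = end - start
    return [counts.get(aa, 0) / length for aa in AMINO_ACIDS]


def composition_features(sequence, window=205):
    """One 20-dimensional composition vector per residue in `sequence`."""
    return [window_composition(sequence, i, window) for i in range(len(sequence))]
```

Each per-residue vector could then be passed to any probabilistic classifier. The paper uses a committee of Relevance Vector Machines, which this sketch does not attempt to reproduce.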