• DocumentCode
    3322148
  • Title
    Detecting experimental noises in protein-protein interactions with iterative sampling and model-based clustering
  • Author
    Mamitsuka, Hiroshi

  • Author_Institution
    Inst. for Chem. Res., Kyoto Univ., Uji, Japan
  • fYear
    2003
  • fDate
    10-12 March 2003
  • Firstpage
    385
  • Lastpage
    392
  • Abstract
    One of the most important issues in current molecular biology is building exact networks of protein-protein interactions. Recently developed high-throughput experimental techniques have accumulated a vast amount of protein-protein interaction data, but it is well known that the reliability of these data has not reached a satisfactory level. In this paper we attempt to computationally detect the experimental errors, or noise, presumably contained in protein-protein interaction data, using an iterative sampling method that employs the learning of a stochastic model as its subroutine. The method alternates between two steps: selecting examples that can be regarded as non-noise, and training the component algorithm on the selected examples. Noise candidates are the examples with the smallest average likelihoods under the previously obtained stochastic models. We empirically compared the method with two other methods on both synthetic and real data sets. We examined the effect of noise and data size using medium- and large-sized synthetic data sets containing intentionally added noise. The results on the medium-sized synthetic data sets show that the significance of the performance difference between the method and the two other methods is more pronounced at higher noise ratios. Further experiments show that this finding also holds for a large-scale data set. The performance advantage of the method was further confirmed by experiments on a real protein-protein interaction data set.
  • Keywords
    iterative methods; molecular biophysics; noise; physiological models; proteins; stochastic processes; component algorithm; data size; experimental noises detection; iterative sampling; large-scale data set; large-sized synthetic data sets; medium-sized synthetic data sets; model-based clustering; noise candidates; protein-protein interactions; real protein-protein interaction data set; stochastic models; Biological system modeling; Biology computing; Iterative algorithms; Iterative methods; Large-scale systems; Noise level; Protein engineering; Sampling methods; Signal to noise ratio; Stochastic resonance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on
  • Print_ISBN
    0-7695-1907-5
  • Type
    conf
  • DOI
    10.1109/BIBE.2003.1188977
  • Filename
    1188977
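
The loop described in the abstract (select likely non-noise examples, train on them, score every example, repeat, then flag the examples with the smallest average likelihood) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the stochastic model or the selection rule, so a small latent-class co-occurrence model trained by EM stands in for the component algorithm, a random first subsample followed by "keep the highest-scoring examples" stands in for the sampling step, and every name and parameter (`detect_noise`, `train_aspect_model`, `keep_ratio`, `rounds`, `n_noise`) is hypothetical.

```python
import random


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def train_aspect_model(pairs, n_items, k, rng, n_iter=30, smooth=1e-2):
    """Fit p(x, y) = sum_z p(z) p(x|z) p(y|z) by EM.

    Assumption: this latent-class model is only a stand-in for the
    stochastic model, which the abstract does not specify.
    """
    pz = [1.0 / k] * k
    px = [normalize([rng.random() + 0.1 for _ in range(n_items)]) for _ in range(k)]
    py = [normalize([rng.random() + 0.1 for _ in range(n_items)]) for _ in range(k)]
    for _ in range(n_iter):
        # Expected counts, initialised with a small smoothing constant.
        cz = [smooth] * k
        cx = [[smooth] * n_items for _ in range(k)]
        cy = [[smooth] * n_items for _ in range(k)]
        for x, y in pairs:
            # E-step: posterior over the latent cluster z for this pair.
            joint = [pz[z] * px[z][x] * py[z][y] for z in range(k)]
            total = sum(joint)
            for z in range(k):
                r = joint[z] / total
                cz[z] += r
                cx[z][x] += r
                cy[z][y] += r
        # M-step: renormalise the expected counts.
        pz = normalize(cz)
        px = [normalize(row) for row in cx]
        py = [normalize(row) for row in cy]
    return pz, px, py


def likelihood(model, pair):
    """Likelihood of one interaction pair under a trained model."""
    pz, px, py = model
    x, y = pair
    return sum(pz[z] * px[z][x] * py[z][y] for z in range(len(pz)))


def detect_noise(pairs, n_items, k=2, rounds=5, keep_ratio=0.8, n_noise=3, seed=0):
    """Return (indices of noise candidates, average likelihood per example)."""
    rng = random.Random(seed)
    n = len(pairs)
    n_keep = max(1, int(keep_ratio * n))
    sums = [0.0] * n
    for t in range(rounds):
        if t == 0:
            # No model yet: start from a random subsample (assumption).
            selected = rng.sample(range(n), n_keep)
        else:
            # Keep the examples that earlier models scored highest (assumption).
            selected = sorted(range(n), key=lambda i: sums[i], reverse=True)[:n_keep]
        model = train_aspect_model([pairs[i] for i in selected], n_items, k, rng)
        # Score every example, including the ones left out of training.
        for i in range(n):
            sums[i] += likelihood(model, pairs[i])
    avg = [s / rounds for s in sums]
    # Noise candidates: smallest average likelihood over all rounds.
    candidates = sorted(range(n), key=lambda i: avg[i])[:n_noise]
    return candidates, avg


if __name__ == "__main__":
    # Toy data: two groups of items that interact internally, plus one
    # cross-group pair playing the role of an experimental error.
    group_a = [(a, b) for a in range(4) for b in range(4) if a < b]
    group_b = [(a, b) for a in range(4, 8) for b in range(4, 8) if a < b]
    toy_pairs = group_a + group_b + [(0, 7)]
    candidates, _ = detect_noise(toy_pairs, n_items=8, n_noise=2)
    print([toy_pairs[i] for i in candidates])
```

Averaging the likelihood over several models, rather than trusting a single fit, is what makes the ranking less sensitive to any one training subset. Which pairs the toy run flags depends on the random initialisation, so the output should be treated as illustrative only.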