• DocumentCode
    1784874
  • Title
    Pathogen host interaction prediction via matrix factorization
  • Author
    Li, Benjamin Y. S. ; Lam Fat Yeung ; Genke Yang
  • Author_Institution
    Dept. of Electron. Eng., City Univ. of Hong Kong, Hong Kong, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    357
  • Lastpage
    362
  • Abstract
    One of the goals in the study of infectious disease is to construct a reliable predictive model of the pathogen-host interactome. Conventional methods for constructing such a model treat the task as a binary classification problem. However, most databases contain only detected interactions and lack negative results. Thus, compared to binary classification, this situation is closer in nature to the collaborative filtering problem. In this paper, a commonly used collaborative filtering technique, matrix factorization, is applied to the prediction of pathogen-host interactions. However, in matrix factorization, the estimation of latent variables is highly dependent on the completeness of the dataset. If the dataset is incomplete, estimation of some latent vectors may be infeasible due to the lack of information. To relieve this issue, an extension of probabilistic matrix factorization is proposed in this paper. In the extended model, similarities between objects are taken into account as a basis for estimation. Experimental results show that as sparsity increases, compared to the conventional matrix factorization model and the probabilistic matrix factorization model, the similarity-based probabilistic matrix factorization model has the best goodness of fit and a high prediction accuracy.
  • Keywords
    diseases; matrix decomposition; medical computing; pattern classification; probability; binary classification problem; collaborative filtering technique; conventional matrix factorization model; conventional methods; databases; extended model; infectious disease; latent variable estimation; latent vectors; pathogen-host interaction prediction; pathogen-host interactome; prediction accuracy; reliable predictive model; similarity based probabilistic matrix factorization model; sparsity; Estimation; Pathogens; Predictive models; Probabilistic logic; Proteins; Sparse matrices; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999185
  • Filename
    6999185