• DocumentCode
    3259233
  • Title

    Sparse Logistic Classifiers for Interpretable Protein Homology Detection

  • Author

    Huang, Pai-Hsi ; Pavlovic, Vladimir

  • Author_Institution
    Dept. of Comput. Sci., Rutgers Univ.
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    99
  • Lastpage
    103
  • Abstract
    Computational classification of proteins using methods such as string kernels and Fisher-SVM has demonstrated great success. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In particular, some recent studies have postulated that a small subset of positions and residues in protein sequences may be sufficient to discriminate among different protein classes. In this work, we propose a hybrid setting for the classification task. A generative model is trained as a feature extractor, followed by a sparse classifier in the extracted feature space that determines the membership of the sequence while discovering the features relevant for classification. The set of sparse, biologically motivated features, together with the discriminative method, offers the desired biological interpretability. We apply the proposed method to a widely used dataset and show that the performance of our models is comparable to that of the state-of-the-art methods. The resulting models use fewer than 10% of the original features. At the same time, the sets of critical features discovered by the model appear to be consistent with confirmed biological findings.
  • Keywords
    DNA; biology computing; feature extraction; pattern classification; proteins; support vector machines; Fisher-SVM; biological findings; computational classification; feature extractor; interpretable protein homology detection; protein sequences; sparse logistic classifiers; string kernels; Biological system modeling; Computer science; Databases; Feature extraction; Hidden Markov models; Kernel; Logistics; Proteins; Sequences; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type

    conf

  • DOI
    10.1109/ICDMW.2006.152
  • Filename
    4063606