• DocumentCode
    983731
  • Title
    Frequent substructure-based approaches for classifying chemical compounds
  • Author
    Deshpande, Mukund ; Kuramochi, Michihiro ; Wale, Nikil ; Karypis, George
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Minnesota Univ., Minneapolis, MN, USA
  • Volume
    17
  • Issue
    8
  • fYear
    2005
  • Firstpage
    1036
  • Lastpage
    1050
  • Abstract
    Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases of the drug development process. These techniques are used to solve a number of classification problems, such as predicting whether a chemical compound has the desired biological activity or is toxic, and filtering out drug-like compounds from large compound libraries. This paper presents a substructure-based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set. The advantage of this approach is that all relevant substructures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 7 percent to 35 percent.
  • Keywords
    biochemistry; chemistry computing; computational geometry; data mining; drugs; feature extraction; graph theory; pattern classification; aggressive feature selection; chemical compound classification; drug development process; subgraph discovery algorithms; substructure discovery process; substructure-based classification algorithm; support vector machines; virtual screening; Biological system modeling; Chemical compounds; Classification algorithms; Computational intelligence; Drugs; Filtering; Libraries; Pharmaceuticals; Scalability; Solid modeling; classification; SVM; chemical compounds; graphs;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.127
  • Filename
    1458698