Title :
Frequent sub-structure-based approaches for classifying chemical compounds
Author :
Deshpande, Mukund ; Kuramochi, Michihiro ; Karypis, George
Author_Institution :
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
Abstract :
We study the problem of classifying chemical compound datasets. We present a substructure-based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset. The advantage of our approach is that all relevant substructures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 10% to 35%.
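The decoupled pipeline described in the abstract (label-independent substructure discovery, then feature selection, then classifier training) can be illustrated with a minimal sketch. Several assumptions apply: single labeled bonds stand in for the larger connected subgraphs a real frequent subgraph miner would return; the score |P(f | active) - P(f | inactive)| is a hypothetical selection criterion, not the one the abstract specifies; and a perceptron stands in for the support vector machine named in the keywords. All function names and the toy dataset are hypothetical, and both classes are assumed to be present in the training data.

```python
from collections import Counter


def edge_fragments(compound):
    """Stand-in substructure enumeration: each canonical labeled bond is one fragment.
    A compound is assumed (hypothetically) to be a list of (atom, bond, atom) triples."""
    return {(min(a, b), bond, max(a, b)) for a, bond, b in compound}


def frequent_fragments(dataset, min_support):
    """Step 1: keep fragments present in at least min_support (a fraction) of compounds.
    This step ignores class labels, mirroring the decoupling described above."""
    counts = Counter()
    for compound in dataset:
        counts.update(edge_fragments(compound))
    threshold = min_support * len(dataset)
    return sorted(f for f, c in counts.items() if c >= threshold)


def to_vector(compound, features):
    """Binary feature vector: 1 if the fragment occurs in the compound, else 0."""
    present = edge_fragments(compound)
    return [1 if f in present else 0 for f in features]


def select_features(vectors, labels, n_features, k):
    """Step 2 (hypothetical criterion): rank by |P(f | +1) - P(f | -1)|, keep top k indices."""
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == -1]

    def score(j):
        return abs(sum(v[j] for v in pos) / len(pos) - sum(v[j] for v in neg) / len(neg))

    ranked = sorted(range(n_features), key=lambda j: (-score(j), j))
    return sorted(ranked[:k])


def train_perceptron(vectors, labels, epochs=20):
    """Step 3: a simple linear classifier used here in place of an SVM."""
    w, b = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def build_classifier(dataset, labels, min_support=0.2, k=2):
    features = frequent_fragments(dataset, min_support)
    vectors = [to_vector(c, features) for c in dataset]
    keep = select_features(vectors, labels, len(features), k)
    chosen = [features[j] for j in keep]
    model = train_perceptron([[v[j] for j in keep] for v in vectors], labels)
    return chosen, model


# Hypothetical toy data: +1 = active, -1 = inactive.
dataset = [
    [("C", "single", "O"), ("C", "double", "C")],
    [("C", "single", "O"), ("C", "single", "C")],
    [("N", "single", "C"), ("C", "single", "C")],
    [("N", "single", "C"), ("C", "double", "C")],
]
labels = [1, 1, -1, -1]

chosen, model = build_classifier(dataset, labels)
print(chosen)  # [('C', 'single', 'N'), ('C', 'single', 'O')]
print(predict(model, to_vector([("O", "single", "C")], chosen)))  # 1
```

Because discovery runs without labels, the same mined feature set could be reused across several classification problems, with only the selection and training steps repeated per problem.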
Keywords :
chemical structure; graph theory; pattern classification; support vector machines; chemical compound dataset classification; geometric substructure; subgraph discovery algorithm; substructure discovery process; Biology computing; Chemical compounds; Classification algorithms; Computational intelligence; Computer displays; Computer science; Drugs; High temperature superconductors; Scalability; Solid modeling;
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1250900