Title :
Empirical comparison of graph classification algorithms
Author :
Ketkar, Nikhil S. ; Holder, Lawrence B. ; Cook, Diane J.
fDate :
March 30 2009-April 2 2009
Abstract :
The graph classification problem is learning to classify separate, individual graphs in a graph database into two or more categories. A number of algorithms have been introduced for the graph classification problem. We present an empirical comparison of the major approaches for graph classification introduced in literature, namely, SubdueCL, frequent subgraph mining in conjunction with SVMs, walk-based graph kernel, frequent subgraph mining in conjunction with AdaBoost and DT-CLGBI. Experiments are performed on five real world data sets from the mutagenesis and predictive toxicology domain which are considered benchmark data sets for the graph classification problem. Additionally, experiments are performed on a corpus of artificial data sets constructed to investigate the performance of the algorithms across a variety of parameters of interest. Our conclusions are as follows. In datasets where the underlying concept has a high average degree, walk-based graph kernels perform poorly as compared to other approaches. The hypothesis space of the kernel is walks and it is insufficient at capturing concepts involving significant structure. In datasets where the underlying concept is disconnected, SubdueCL performs poorly as compared to other approaches. The hypothesis space of SubdueCL is connected graphs and it is insufficient at capturing concepts which consist of a disconnected graph. FSG+SVM, FSG+AdaBoost, DT-CLGBI have comparable performance in most cases.
Keywords :
data mining; graph theory; support vector machines; AdaBoost; DT-CLGBI; SVM; SubdueCL; empirical comparison; frequent subgraph mining; graph classification algorithms; graph database; mutagenesis; predictive toxicology domain; walk-based graph kernel; Classification algorithms; Classification tree analysis; Databases; Kernel; Labeling; Machine learning; Support vector machine classification; Support vector machines; System testing; Toxicology;
Conference_Titel :
Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
Conference_Location :
Nashville, TN
Print_ISBN :
978-1-4244-2765-9
DOI :
10.1109/CIDM.2009.4938658