Title :
An ensemble approach to building Mercer Kernels with prior information
Author :
Srivastava, Ashok N. ; Schumann, Johann ; Fischer, Bernd
Author_Institution :
Intelligent Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
Abstract :
This paper presents a new methodology for automatic knowledge-driven data mining based on the theory of Mercer Kernels, which are highly nonlinear symmetric positive definite mappings from the original image space to a very high-dimensional, possibly infinite-dimensional, feature space. We describe a new method called Mixture Density Mercer Kernels (MDMK) that learns kernel functions directly from data, rather than using pre-defined kernels. These data-adaptive kernels can encode prior knowledge using a Bayesian formulation, thus allowing physical information to be encoded in the model. Specifically, we demonstrate the use of the algorithm in situations with extremely small samples of data. We compare the method with existing algorithms on data from the Sloan Digital Sky Survey (SDSS) and demonstrate its superior performance. The results show that the Mixture Density Mercer Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low-redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%. The code for these experiments has been generated with the AutoBayes tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library that contains templates for learning and knowledge discovery algorithms, such as different versions of EM, and for numeric optimization methods, such as conjugate gradient methods. The template instantiation is supported by symbolic-algebraic computations, which allow AutoBayes to find closed-form solutions and, where possible, to integrate them into the code.
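The abstract does not spell out how the kernel is constructed, so the following is only a minimal sketch under stated assumptions. It assumes the kernel entry for two points is the inner product of their mixture-component posterior probabilities, averaged over an ensemble of mixture models fitted to bootstrap resamples. The spherical Gaussian mixtures, the plain EM loop, the absence of Bayesian priors, and all function and parameter names (`fit_spherical_gmm`, `posteriors`, `mixture_density_kernel`, `n_models`, `n_components`) are illustrative choices, not the authors' AutoBayes-generated code.

```python
import numpy as np


def posteriors(X, means, variances, weights):
    """Return P(component | x) for each row of X under a spherical Gaussian mixture."""
    d = X.shape[1]
    # Squared distance from every point to every component mean: shape (n, k).
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_p = (np.log(weights)
             - 0.5 * d * np.log(2.0 * np.pi * variances)
             - 0.5 * sq / variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_spherical_gmm(X, n_components, n_iter, rng):
    """Fit a spherical Gaussian mixture with plain EM (no priors; illustrative only)."""
    n, d = X.shape
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.full(n_components, X.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        resp = posteriors(X, means, variances, weights)      # E-step
        nk = resp.sum(axis=0) + 1e-12                         # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
    return means, variances, weights


def mixture_density_kernel(X, n_models=10, n_components=3, n_iter=20, seed=0):
    """Average posterior inner products over an ensemble of mixture models.

    Assumed form: K[i, j] = (1 / M) * sum_m sum_c P_m(c | x_i) * P_m(c | x_j).
    Each term is a Gram matrix, so the average is symmetric positive semidefinite.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(n_models):
        boot = X[rng.integers(0, n, size=n)]      # bootstrap resample of the data
        params = fit_spherical_gmm(boot, n_components, n_iter, rng)
        phi = posteriors(X, *params)              # feature map: component posteriors
        K += phi @ phi.T
    return K / n_models
```

Because every ensemble member contributes a Gram matrix of posterior vectors, the averaged matrix satisfies the Mercer condition by construction; it could then be passed to any kernel method that accepts a precomputed kernel. Prior information, which the abstract says is encoded through a Bayesian formulation, is not modelled in this sketch.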
Keywords :
Bayes methods; data mining; learning (artificial intelligence); AutoBayes tool; Bayesian formulation; MDMK method; Mixture Density Mercer Kernels; abstract statistical model specifications; automatic knowledge driven data mining; conjugate gradient methods; documented C-C++ code; ensemble approach; knowledge discovery algorithm; learning algorithm; nonlinear symmetric positive definite mappings; numeric optimization methods; schema library; symbolic-algebraic computations; Bayesian methods; Classification tree analysis; Computer science; Data mining; Distribution functions; Extraterrestrial measurements; Intelligent structures; Intelligent systems; Kernel; NASA;
Conference_Title :
Systems, Man and Cybernetics, 2005 IEEE International Conference on
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571500