Title :
Continuous optimization of hyper-parameters
Author_Institution :
Dept. d'Inf. et Recherche Oper., Montreal Univ., Que., Canada
Abstract :
Many machine learning algorithms can be formulated as the minimization of a training criterion which involves a hyper-parameter. This hyper-parameter is usually chosen by trial and error with a model selection criterion. In this paper we present a methodology to optimize several hyper-parameters, based on the computation of the gradient of a model selection criterion with respect to the hyper-parameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyper-parameters is efficiently computed by back-propagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyper-parameter gradient involving second derivatives of the training criterion.
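For illustration, here is a minimal JAX sketch of the quadratic case described in the abstract. It is not the authors' implementation: the ridge-regression setup and the names train_weights and selection_criterion are assumptions made for the example. Training weights are obtained through a Cholesky solve of the regularized normal equations, a validation error serves as the model selection criterion, and automatic differentiation back-propagates through the Cholesky decomposition to give the gradient with respect to the regularization hyper-parameter.

```python
# Minimal sketch (not the paper's code): hyper-parameter gradient for ridge
# regression, where the quadratic training criterion is minimized via a
# Cholesky solve and a validation MSE plays the role of the model selection
# criterion. JAX back-propagates through the Cholesky decomposition.
import jax
import jax.numpy as jnp

def train_weights(log_lam, X_train, y_train):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 by solving the normal equations
    with a Cholesky factorization (hypothetical helper for this example)."""
    lam = jnp.exp(log_lam)                      # keep the hyper-parameter positive
    d = X_train.shape[1]
    A = X_train.T @ X_train + lam * jnp.eye(d)  # regularized Gram matrix
    b = X_train.T @ y_train
    L = jnp.linalg.cholesky(A)                  # A = L L^T
    # Two triangular solves give w = A^{-1} b.
    z = jax.scipy.linalg.solve_triangular(L, b, lower=True)
    return jax.scipy.linalg.solve_triangular(L.T, z, lower=False)

def selection_criterion(log_lam, X_train, y_train, X_val, y_val):
    """Validation mean squared error, used as the model selection criterion."""
    w = train_weights(log_lam, X_train, y_train)
    return jnp.mean((X_val @ w - y_val) ** 2)

# Gradient of the selection criterion w.r.t. the hyper-parameter: automatic
# differentiation back-propagates through the Cholesky solve.
hyper_grad = jax.grad(selection_criterion, argnums=0)

if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    X_tr = jax.random.normal(k1, (50, 5))
    w_true = jnp.arange(1.0, 6.0)
    y_tr = X_tr @ w_true + 0.1 * jax.random.normal(k2, (50,))
    X_va = jax.random.normal(k3, (30, 5))
    y_va = X_va @ w_true

    log_lam = jnp.array(0.0)
    for _ in range(100):  # simple gradient descent on the hyper-parameter itself
        log_lam = log_lam - 0.1 * hyper_grad(log_lam, X_tr, y_tr, X_va, y_va)
    print("optimized lambda:", float(jnp.exp(log_lam)))
```

In the more general case treated in the paper, the same hyper-parameter gradient is derived via the implicit function theorem and involves second derivatives of the training criterion; the sketch above only covers the quadratic case, where differentiating through the Cholesky solve suffices.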
Keywords :
gradient methods; learning (artificial intelligence); minimisation; neural nets; Cholesky decomposition; continuous optimization; hyper-parameter optimization; implicit function theorem; machine learning algorithms; model selection criterion gradient; quadratic training criterion minimization; Bayesian methods; Linear regression; Machine learning; Machine learning algorithms; Minimization methods; Optimization methods; Supervised learning;
Conference_Title :
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000)
Conference_Location :
Como, Italy
Print_ISBN :
0-7695-0619-4
DOI :
10.1109/IJCNN.2000.857853