• DocumentCode
    3643602
  • Title
    Efficient Levenberg-Marquardt minimization of the cross-entropy error function
  • Author
    Amar Sarić; Jing Xiao
  • fYear
    2011
  • fDate
    7/1/2011 12:00:00 AM
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The Levenberg-Marquardt algorithm is one of the most common choices for training medium-size artificial neural networks. Since it was designed to solve nonlinear least-squares problems, its application to neural network training has so far typically amounted to using simple regression even for classification tasks. In the classification setting, however, the cross-entropy function, which corresponds to the maximum likelihood estimate of the network weights when the sigmoid or softmax activation function is used in the output layer, is the natural choice of error function, and it is convex in the weights of the output layer. This convexity is an important property that leads to more robust convergence of any descent-based training method. By constructing and implementing a modified version of the Levenberg-Marquardt algorithm suitable for minimizing the cross-entropy function, we aim to close a gap in the existing literature on neural networks. Additionally, because the cross-entropy error measure yields a single error value per training pattern, our approach has lower memory requirements for multi-valued classification problems than the direct application of the algorithm.
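
    The following is a minimal sketch, not the authors' exact formulation, of a damped Gauss-Newton / Levenberg-Marquardt-style step for minimizing the cross-entropy of a single sigmoid output unit. All names (X, t, w, lam), the damping schedule, and the tiny synthetic data set are illustrative assumptions made here, not taken from the paper.

```python
# Sketch of one LM-style update for cross-entropy with a sigmoid output
# (assumed formulation; the paper's actual algorithm may differ).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, t, eps=1e-12):
    y = sigmoid(X @ w)
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

def lm_step(w, X, t, lam):
    """One damped step: for a sigmoid output the gradient is X^T (y - t) and
    the Gauss-Newton approximation of the Hessian is X^T diag(y(1-y)) X;
    lam is the Levenberg-Marquardt damping added to the diagonal."""
    y = sigmoid(X @ w)
    grad = X.T @ (y - t)
    R = y * (1.0 - y)                    # per-pattern curvature weights
    H = X.T @ (X * R[:, None])           # Gauss-Newton Hessian approximation
    delta = np.linalg.solve(H + lam * np.eye(len(w)), -grad)
    return w + delta

# Illustrative run with synthetic data: accept the step and shrink lam if the
# error decreases, otherwise reject it and grow lam (the usual LM schedule).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
t = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
w, lam = np.zeros(3), 1.0
for _ in range(20):
    w_new = lm_step(w, X, t, lam)
    if cross_entropy(w_new, X, t) < cross_entropy(w, X, t):
        w, lam = w_new, lam * 0.5
    else:
        lam *= 2.0
print("final cross-entropy:", cross_entropy(w, X, t))
```

    Note that, as the abstract points out, cross-entropy yields a single error value per training pattern, so the curvature term above is built from scalar per-pattern weights rather than a full residual vector per pattern.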
  • Keywords
    "Training","Minimization","Jacobian matrices","Mathematical model","Equations","Vectors","Classification algorithms"
  • Publisher
    ieee
  • Conference_Titel
    The 2011 International Joint Conference on Neural Networks (IJCNN)
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4244-9635-8
  • Electronic_ISSN
    2161-4407
  • Type
    conf
  • DOI
    10.1109/IJCNN.2011.6033191
  • Filename
    6033191