DocumentCode :
3326196
Title :
Gradient descent fails to separate
Author :
Brady, M. ; Raghavan, R. ; Slawny, J.
Author_Institution :
Lockheed Res. & Dev., Palo Alto, CA, USA
fYear :
1988
fDate :
24-27 July 1988
Firstpage :
649
Abstract :
In the context of neural network procedures, it is proved that gradient descent on a surface defined by a sum of squared errors can fail to separate families of vectors. Each output is assumed to be a differentiable monotone transformation (typically the logistic) of a linear combination of the inputs. Several examples are given of pairs of families of vectors for which a separating linear combination exists, yet the minimum-cost solution does not yield such a combination. The examples include several cases with no local minima, as well as a one-layer system exhibiting local minima with a large basin of attraction. In contrast to the perceptron convergence theorem, which guarantees that the perceptron learning procedure finds a separating set of weights whenever one exists, there is no convergence theorem for gradient descent that would guarantee correct classification. These results disprove the presumption, made in recent years, that barring local minima, gradient descent will find the best set of weights for a given problem.
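The following is a minimal sketch, not the paper's construction, of the training setup the abstract describes: a single output unit whose value is the logistic of a linear combination of the inputs, trained by gradient descent on a sum-of-squared-errors cost, followed by a check of whether the learned weights separate the two families. The data, learning rate, and iteration count are illustrative assumptions; on the families constructed in the paper, the separation check can fail even though a separating weight vector exists.
```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hypothetical families of vectors in R^2 (not the paper's examples),
# with a bias feature appended to each input.
A = np.array([[1.0, 0.2], [0.9, -0.1], [1.1, 0.0]])     # target output 1
B = np.array([[-1.0, 0.1], [-0.8, -0.2], [-1.2, 0.3]])  # target output 0
X = np.vstack([A, B])
X = np.hstack([X, np.ones((X.shape[0], 1))])            # bias term
t = np.array([1.0] * len(A) + [0.0] * len(B))

# Gradient descent on E(w) = sum_i (logistic(w . x_i) - t_i)^2
w = np.zeros(X.shape[1])
lr = 0.5
for _ in range(5000):
    y = logistic(X @ w)
    grad = 2.0 * X.T @ ((y - t) * y * (1.0 - y))  # dE/dw via the chain rule
    w -= lr * grad

# The paper's point: the minimiser of E need not separate the families,
# so on suitably chosen data this check can come out False even though
# a separating linear combination exists.
separated = np.all((X @ w > 0) == (t == 1))
print("weights:", w, "separates:", separated)
```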
Keywords :
neural nets; optimisation; differentiable monotone transformation; gradient descent; neural network; squared errors; Neural networks; Optimization methods;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
1988 IEEE International Conference on Neural Networks
Conference_Location :
San Diego, CA, USA
Type :
conf
DOI :
10.1109/ICNN.1988.23902
Filename :
23902