DocumentCode
402901
Title
The influence of the number of clusters on randomly expanded data sets
Author
Van Zyl, Jacobus ; Cloete, Ian
Author_Institution
Sch. of Inf. Technol., Int. Univ. in Germany, Bruchsal, Germany
Volume
1
fYear
2003
fDate
2-5 Nov. 2003
Firstpage
355
Abstract
Neural networks have been shown to be capable of learning arbitrary input-output mappings. However, like most machine learning algorithms, neural networks are adversely affected by sparse training sets, especially with respect to generalization performance. Several approaches to improving generalization performance when only sparse training data are available have been suggested, including adding noise to the training data or to the weight updates. One method, by Karystinos and Pados, first clusters the training data and then generates new training data using a probability density function estimated from the clusters. This paper investigates that method further, in particular its sensitivity to the clustering procedure. We investigate the sensitivity to the number of clusters used, the sensitivity to the clustering method (K-means) itself, and the use of minimum differential entropy as an indicator of a good cluster choice.
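A minimal sketch of the kind of cluster-based data expansion the abstract describes: partition the training set with K-means, fit a Gaussian density to each cluster, and sample new points from the resulting mixture. This is an illustrative reconstruction, not the exact Karystinos-Pados estimator; the function name `expand_dataset` and all parameter choices (number of clusters `k`, iteration count, covariance regularization) are assumptions for the example.

```python
import numpy as np

def expand_dataset(X, k=3, n_new=100, n_iter=20, seed=0):
    """Illustrative cluster-based data expansion (assumed interface):
    run a simple K-means on X, fit a Gaussian to each cluster, then
    draw n_new samples from the mixture, with mixture weights
    proportional to cluster sizes."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Plain K-means: random initial centroids, then alternate
    # assignment and centroid-update steps.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

    # Split the n_new samples across clusters in proportion to size.
    sizes = np.array([(labels == j).sum() for j in range(k)])
    counts = rng.multinomial(n_new, sizes / sizes.sum())

    # Sample from a Gaussian fitted to each non-empty cluster; the
    # small diagonal term keeps the covariance positive definite.
    new_points = []
    for j, c in enumerate(counts):
        if c == 0:
            continue
        pts = X[labels == j]
        mean = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        new_points.append(rng.multivariate_normal(mean, cov, size=c))
    return np.vstack(new_points)
```

The sensitivity the paper studies corresponds here to the choice of `k`: too few clusters over-smooths the estimated density, while too many fits spurious structure in the sparse data.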
Keywords
learning (artificial intelligence); minimum entropy methods; neural nets; set theory; statistical analysis; clustering procedure; machine learning algorithms; minimum differential entropy; neural networks; probability density function; randomly expanded data sets; sparse training data; sparse training sets; Clustering algorithms; Clustering methods; Covariance matrix; Entropy; Information technology; Jacobian matrices; Machine learning algorithms; Neural networks; Probability density function; Training data
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN
0-7803-8131-9
Type
conf
DOI
10.1109/ICMLC.2003.1264501
Filename
1264501
Link To Document