DocumentCode
402901
Title
The influence of the number of clusters on randomly expanded data sets
Author
Van Zyl, Jacobus ; Cloete, Ian
Author_Institution
Sch. of Inf. Technol., Int. Univ. in Germany, Bruchsal, Germany
Volume
1
fYear
2003
fDate
2-5 Nov. 2003
Firstpage
355
Abstract
Neural networks have been shown to be capable of learning arbitrary input-output mappings. However, like most machine learning algorithms, neural networks are adversely affected by sparse training sets, especially with respect to generalization performance. Several approaches to improving generalization performance when only sparse training data are available have been suggested, including adding noise to the training data or to the weight updates. One method, by Karystinos and Pados, first clusters the training data and then generates new training data using a probability density function estimated from the clusters. This paper investigates that method further, in particular its sensitivity to the clustering procedure. We investigate the sensitivity to the number of clusters used, the sensitivity to the clustering method (K-means) itself, and the use of minimum differential entropy as an indicator of a good cluster choice.
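A minimal sketch of the kind of cluster-based data expansion the abstract describes: partition the training set with K-means, fit a Gaussian density to each cluster, and sample new points from the resulting mixture. This is an illustrative reconstruction, not the exact Karystinos-Pados estimator; the function name `expand_dataset` and all parameter choices (number of clusters `k`, iteration count, covariance regularization) are assumptions for the example.

```python
import numpy as np

def expand_dataset(X, k=3, n_new=100, n_iter=20, seed=0):
    """Illustrative cluster-based data expansion (assumed interface):
    run a simple K-means on X, fit a Gaussian to each cluster, then
    draw n_new samples from the mixture, with mixture weights
    proportional to cluster sizes."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Plain K-means: random initial centroids, then alternate
    # assignment and centroid-update steps.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

    # Split the n_new samples across clusters in proportion to size.
    sizes = np.array([(labels == j).sum() for j in range(k)])
    counts = rng.multinomial(n_new, sizes / sizes.sum())

    # Sample from a Gaussian fitted to each non-empty cluster; the
    # small diagonal term keeps the covariance positive definite.
    new_points = []
    for j, c in enumerate(counts):
        if c == 0:
            continue
        pts = X[labels == j]
        mean = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        new_points.append(rng.multivariate_normal(mean, cov, size=c))
    return np.vstack(new_points)
```

The sensitivity the paper studies corresponds here to the choice of `k`: too few clusters over-smooths the estimated density, while too many fits spurious structure in the sparse data.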
Keywords
learning (artificial intelligence); minimum entropy methods; neural nets; set theory; statistical analysis; clustering procedure; machine learning algorithms; minimum differential entropy; neural networks; probability density function; randomly expanded data sets; sparse training data; sparse training sets; Clustering algorithms; Clustering methods; Covariance matrix; Entropy; Information technology; Jacobian matrices; Machine learning algorithms; Neural networks; Probability density function; Training data
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN
0-7803-8131-9
Type
conf
DOI
10.1109/ICMLC.2003.1264501
Filename
1264501
Link To Document