Title :
Learning from soft partitions of data: reducing the variance
Author :
Eschrich, Sebastian ; Hall, Lawrence O.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
Distributed machine learning can be realized using a divide-and-conquer methodology. One such divide-and-conquer method is learning from soft partitions of data. By examining the decomposition of classifier error into bias and variance terms, we see that learning from smaller partitions of data introduces higher variance. In this paper, we investigate the use of a particular variance reduction technique, randomized C4.5, when learning from soft partitions of data. This approach maintains the distributed nature of the learning algorithm while improving the overall classification accuracy. Experiments on six machine learning datasets demonstrate the accuracy gains obtained by reducing classifier variance. In particular, learning from soft partitions of data can produce more accurate classifiers than an ensemble of randomized decision trees constructed from the entire dataset, which in turn is more accurate than a single decision tree.
Keywords :
data mining; decision trees; divide and conquer methods; fuzzy set theory; learning (artificial intelligence); bias terms; classifier error decomposition; distributed machine learning; divide and conquer methodology; k-means clustering; localized bagging; randomized C4.5; soft partitions of data; variance reduction technique; variance terms; Bagging; Boosting; Classification tree analysis; Computer errors; Computer science; Decision trees; Learning systems; Machine learning; Neurons; Partitioning algorithms;
Conference_Title :
Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on
Print_ISBN :
0-7803-7810-5
DOI :
10.1109/FUZZ.2003.1209443
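
Illustrative sketch (not taken from the paper): the abstract refers to learning from soft partitions of data, where a training example may belong to more than one partition. The record does not say how the partitions are built, so the following is only one possible reading. It assumes cluster centers are already available (the keywords mention k-means clustering) and places each point in every partition whose center lies within a hypothetical `overlap` factor of the distance to its nearest center. The function name `soft_partition`, the `overlap` parameter, and the toy data are all assumptions made for illustration.

```python
import math

def soft_partition(points, centers, overlap=1.5):
    """Return, for each center, the indices of the points assigned to it.

    A point joins every partition whose center is no farther than
    `overlap` times the distance to the point's nearest center, so
    points near a boundary land in several partitions at once.
    With overlap=1.0 this reduces to a hard nearest-center split
    (except for exact ties).
    """
    partitions = [[] for _ in centers]
    for idx, p in enumerate(points):
        dists = [math.dist(p, c) for c in centers]
        nearest = min(dists)
        for k, d in enumerate(dists):
            if d <= overlap * nearest:
                partitions[k].append(idx)
    return partitions

# Toy data: six points on a line, two assumed cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.5, 0.0), (5.5, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

parts = soft_partition(points, centers, overlap=1.5)
hard_parts = soft_partition(points, centers, overlap=1.0)
```

In this toy setup the two middle points fall into both partitions under `overlap=1.5`, while `overlap=1.0` gives disjoint halves. A separate classifier (for example, an ensemble of randomized trees, as the abstract describes) could then be trained on each partition independently, which is what keeps the scheme distributed.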