Title :
Data mining using genetic programming: the implications of parsimony on generalization error
Author :
Cavaretta, Michael J. ; Chellapilla, Kumar
Author_Institution :
Comput. Aided Eng. Dept., Ford Motor Co., Dearborn, MI, USA
Abstract :
A common data mining heuristic is, “when choosing between models with the same training error, less complex models should be preferred as they perform better on unseen data”. This heuristic may not always hold. In genetic programming (GP), a preference for less complex models is implemented by (i) placing a limit on the size of the evolved program, (ii) penalizing more complex individuals, or both. The paper presents a GP variant with no limit on the complexity of the evolved program that generates highly accurate models on a common dataset.
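The two parsimony mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the nested-tuple program representation, the function names `tree_size` and `parsimony_fitness`, and the linear penalty form are all assumptions made for illustration. Fitness is treated as an error to be minimized.

```python
import math

def tree_size(expr):
    """Count nodes in a nested-tuple program, e.g. ('add', 'x', ('mul', 'x', 'x')).

    Hypothetical representation: a tuple is (operator, child, child, ...);
    anything else is a terminal and counts as one node.
    """
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in expr[1:])

def parsimony_fitness(training_error, expr, max_size=None, penalty=0.0):
    """Return a fitness to minimize, combining training error with parsimony pressure.

    max_size: hard size limit (mechanism i); None means no limit.
    penalty:  per-node cost added to the error (mechanism ii); 0.0 disables it.
    """
    size = tree_size(expr)
    if max_size is not None and size > max_size:
        return math.inf  # oversized programs are rejected outright
    return training_error + penalty * size

small = ('add', 'x', 1)                               # 3 nodes
large = ('add', ('mul', 'x', 'x'), ('sub', 'x', 1))   # 7 nodes

# Same training error, different treatment under each mechanism.
limited = parsimony_fitness(0.5, large, max_size=5)       # rejected by the limit
penalized = parsimony_fitness(0.5, large, penalty=0.01)   # error plus size cost
unconstrained = parsimony_fitness(0.5, large)             # error only
```

Calling `parsimony_fitness` with `max_size=None` corresponds to the "no limit on the complexity" setting described in the abstract; the abstract does not say how, or whether, the paper's variant penalizes size.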
Keywords :
computational complexity; data mining; generalisation (artificial intelligence); genetic algorithms; GP-variant; common dataset; data mining heuristic; generalization error; genetic programming; less complex models; program complexity; training error; unseen data; Computer aided engineering; Computer errors; Data mining; Decision trees; Genetics; Laboratories; Pattern recognition; Predictive models; Testing; Training data;
Conference_Title :
Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-5536-9
DOI :
10.1109/CEC.1999.782602