DocumentCode :
928788
Title :
Scoring levels of categorical variables with heterogeneous data
Author :
Tuv, Eugene ; Runger, George C.
Author_Institution :
Anal. & Control Technol., Intel Corp., Chandler, AZ, USA
Volume :
19
Issue :
2
fYear :
2004
Firstpage :
14
Lastpage :
19
Abstract :
Heterogeneous (mixed-type) data present significant challenges in both supervised and unsupervised learning. The situation is even more complicated when nominal variables have several levels (values) that make using indicator variables (for every categorical level) infeasible. With unsupervised learning, several fairly involved, computationally intensive, nonlinear multivariate techniques iteratively alternate data transformations with optimal scoring. These seek to optimize an objective on the basis of a covariance matrix. Our goal is to find a computationally efficient and flexible method for mapping categorical variables to numeric scores in mixed-type data. We attempt to go beyond optimizing second-order statistics (such as covariance) and enable distance-based methods by exploring mutual relationships or bumps of dependencies between variables. This is a new objective for a scoring method that's based on patterns learned from all the available variables.
Keywords :
distributed databases; optimisation; regression analysis; statistics; unsupervised learning; categorical variable; distance-based method; heterogeneous mixed-type data; nonlinear multivariate technique; scoring level; second-order statistics optimization; supervised learning; Classification tree analysis; Covariance matrix; Density functional theory; Function approximation; Independent component analysis; Multidimensional systems; Optimization methods; Regression tree analysis
fLanguage :
English
Journal_Title :
Intelligent Systems, IEEE
Publisher :
IEEE
ISSN :
1541-1672
Type :
jour
DOI :
10.1109/MIS.2004.1274906
Filename :
1274906