Title :
A Discretization Algorithm of Continuous Attributes Based on Supervised Clustering
Author :
Hua, Haiyang ; Zhao, Huaici
Author_Institution :
Shenyang Inst. of Autom., Chinese Acad. of Sci., Shenyang, China
Abstract :
Many machine learning algorithms can be applied only to data described by categorical attributes, so discretization of continuous attributes is an important preprocessing step in knowledge extraction. Traditional clustering-based discretization algorithms require a predetermined number of clusters k and are typically applied in an unsupervised learning framework. This paper describes SX-means (Supervised X-means), a new clustering-based algorithm for supervised discretization of continuous attributes. The algorithm dynamically modifies clusters using knowledge of the class distribution, and the procedure does not stop until a proper k is found. Because the number of clusters k is not predetermined by the user and the class distribution is taken into account, the randomness of the result is greatly reduced. Experimental evaluation of several discretization algorithms on six artificial data sets shows that the proposed algorithm is more efficient and generates a better discretization scheme. Comparing the C4.5 output, the resulting tree is smaller, with fewer classification rules and higher classification accuracy.
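The abstract does not give the details of SX-means, so the following is only a minimal Python sketch of the general idea it describes: cluster a continuous attribute, use the class distribution to decide whether a cluster needs further splitting, and let the number of intervals k emerge instead of being fixed in advance. The function names, the purity criterion, and the `min_purity` and `min_size` parameters are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def two_means_cut(values):
    """Run 1-D 2-means on sorted values; return a cut point, or None."""
    lo, hi = values[0], values[-1]
    if lo == hi:
        return None  # all values identical, nothing to split
    c1, c2 = lo, hi  # deterministic initial centers (assumption)
    for _ in range(100):
        mid = (c1 + c2) / 2.0
        left = [v for v in values if v <= mid]
        right = [v for v in values if v > mid]
        if not left or not right:
            return None
        n1, n2 = sum(left) / len(left), sum(right) / len(right)
        if (n1, n2) == (c1, c2):
            break  # centers stopped moving
        c1, c2 = n1, n2
    return (c1 + c2) / 2.0


def purity(labels):
    """Fraction of samples that belong to the majority class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def supervised_cuts(values, labels, min_purity=0.9, min_size=2):
    """Split intervals with 2-means until each one is class-pure enough.

    Hypothetical stopping rule: an interval is kept once its majority
    class reaches min_purity or it has fewer than min_size samples.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts, stack = [], [pairs]
    while stack:
        seg = stack.pop()
        if len(seg) < min_size or purity([c for _, c in seg]) >= min_purity:
            continue
        cut = two_means_cut([v for v, _ in seg])
        if cut is None:
            continue
        cuts.append(cut)
        stack.append([p for p in seg if p[0] <= cut])
        stack.append([p for p in seg if p[0] > cut])
    return sorted(cuts)


if __name__ == "__main__":
    vals = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1]
    labs = ["a", "a", "a", "b", "b", "b"]
    print(supervised_cuts(vals, labs))
```

In this sketch the number of intervals is one more than the number of returned cut points, so k is a by-product of the class-purity test and not a user input.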
Keywords :
category theory; knowledge acquisition; pattern clustering; unsupervised learning; SX-means; categorical attributes; class distribution; continuous attributes; discretization algorithm; discretization schema; knowledge extraction; machine learning algorithms; supervised X-means; supervised clustering; unsupervised learning framework; Automation; Classification tree analysis; Clustering algorithms; Data mining; Euclidean distance; Machine learning algorithms; Merging; Partitioning algorithms; Simulated annealing; Unsupervised learning;
Conference_Title :
Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4199-0
DOI :
10.1109/CCPR.2009.5344142