Title :
A Roughset Based Data Labeling Method for Clustering Categorical Data
Author :
Reddy, H. Venkateswara ; Raju, S. Viswanadha
Author_Institution :
Dept. of Comput. Sci. & Eng., Vardhaman Coll. of Eng., Hyderabad, India
Abstract :
Data mining is the process of extracting analytically useful information from huge databases. Clustering is one of the efficient techniques in data mining, and it is performed on the principle of similarity. Clustering a large database is a demanding and time-consuming task. For this reason, an approach called data labeling through sampling is used. Data labeling is the process of assigning the unsampled data objects to appropriate clusters. This approach makes clustering easier and improves its efficiency. In this method, a sample dataset is first chosen from the large database and clustered; when the initial clustering is completed, the unsampled data objects are compared with the resulting clusters. As a result, similar data objects are given the appropriate cluster labels and dissimilar ones are treated as outliers. Such data labeling methods are easier to execute on numerical data, but labeling is a complicated task for categorical data because no distance between data objects is naturally defined. In the proposed method, a new and efficient data labeling technique is used to cluster categorical data based on cluster entropy in rough set theory. The experimental results show that the proposed algorithm is efficient and produces higher-quality clusters than previous clustering methods.
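The labeling step described in the abstract (compare each unsampled object with the existing clusters, label the similar ones, and treat the rest as outliers) can be sketched as follows. This is a minimal illustration only: the abstract does not give the rough-set cluster-entropy measure, so the sketch substitutes a simple attribute-value frequency similarity. The function names, the toy data, and the `threshold` value of 0.5 are all assumptions made for the example and are not taken from the paper.

```python
from collections import Counter


def attribute_frequencies(cluster):
    """Count, per attribute position, how often each categorical value occurs."""
    n_attrs = len(cluster[0])
    return [Counter(obj[i] for obj in cluster) for i in range(n_attrs)]


def similarity(obj, freqs, size):
    """Mean relative frequency of the object's values inside one cluster.

    Stand-in measure (assumption): NOT the paper's rough-set entropy.
    """
    return sum(f[v] / size for f, v in zip(freqs, obj)) / len(obj)


def label_unsampled(clusters, unsampled, threshold=0.5):
    """Return a cluster index per object, or None when it looks like an outlier."""
    profiles = [(attribute_frequencies(c), len(c)) for c in clusters]
    labels = []
    for obj in unsampled:
        scores = [similarity(obj, f, n) for f, n in profiles]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Below the (hypothetical) threshold, no cluster is similar enough.
        labels.append(best if scores[best] >= threshold else None)
    return labels


# Toy clusters obtained from an initial clustering of the sample (made-up data).
clusters = [
    [("red", "small"), ("red", "medium"), ("red", "small")],
    [("blue", "large"), ("green", "large"), ("blue", "large")],
]
unsampled = [("red", "small"), ("blue", "large"), ("yellow", "tiny")]
labels = label_unsampled(clusters, unsampled)
print(labels)
```

In this toy run the first two objects receive the labels of the clusters whose attribute values they share, while the third shares no values with either cluster and is marked as an outlier. A cluster-entropy criterion such as the one the paper proposes would replace `similarity` while leaving the surrounding loop unchanged.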
Keywords :
data mining; pattern clustering; rough set theory; clustering categorical data; data labeling; initial clustering; numerical data; roughset based data labeling method; sampling technique; unsampled data objects; Algorithm design and analysis; Clustering algorithms; Databases; Entropy; Labeling; Rough sets; Categorical Data; Outlier
Conference_Title :
Eco-friendly Computing and Communication Systems (ICECCS), 2014 3rd International Conference on
Print_ISBN :
978-1-4799-7003-2
DOI :
10.1109/Eco-friendly.2014.86