DocumentCode
3289874
Title
An Improved CURD Algorithm for Source Code Mining
Author
Liu, Yangyang ; Zhang, Yang
Author_Institution
Coll. of Inf. Eng., Northwest A&F Univ., Yangling
Volume
4
fYear
2008
fDate
18-20 Oct. 2008
Firstpage
335
Lastpage
339
Abstract
A source code mining algorithm should be able to cope with large volumes of data and with nominal attributes, the two major characteristics of source code datasets. The k-means algorithm is not suitable for clustering source code because it is generally difficult for users to determine the number of clusters for a previously unknown dataset. The CURD clustering algorithm works efficiently; however, it cannot process nominal attributes. In this paper, we propose the NCURD algorithm for clustering source code, which makes CURD applicable to nominal attributes and improves its working efficiency. The experimental results show that NCURD achieves excellent clustering performance on source code.
Keywords
data mining; pattern clustering; software engineering; NCURD algorithm; improved CURD algorithm; k-means algorithm; source code clustering; source code dataset; source code mining; Clustering algorithms; Data mining; Educational institutions; Fuzzy systems; Knowledge engineering; Shape;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location
Jinan, Shandong
Print_ISBN
978-0-7695-3305-6
Type
conf
DOI
10.1109/FSKD.2008.479
Filename
4666408