An Improved CURD Algorithm for Source Code Mining

Author

Liu, Yangyang; Zhang, Yang

Author_Institution

Coll. of Inf. Eng., Northwest A&F Univ., Yangling

Volume

fYear

2008

fDate

18-20 Oct. 2008

Firstpage

335

Lastpage

339

Abstract

A source code mining algorithm should be able to cope with large volumes of data and with nominal attributes, which are two major characteristics of source code datasets. The k-means algorithm is not suitable for clustering source code because it is generally difficult for users to determine the number of clusters for a previously unknown dataset. The CURD clustering algorithm works efficiently; however, it cannot process nominal attributes. In this paper, we propose the NCURD algorithm for clustering source code, which makes CURD applicable to nominal attributes and improves its working efficiency. The experimental results show that the NCURD algorithm has excellent clustering performance on source code.

Keywords

data mining; pattern clustering; software engineering; NCURD algorithm; improved CURD algorithm; k-means algorithm; source code clustering; source code dataset; source code mining; Clustering algorithms; Data mining; Educational institutions; Fuzzy systems; Knowledge engineering; Shape

fLanguage

English

Publisher

IEEE

Conference_Titel

Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on

Conference_Location

Jinan, Shandong

Print_ISBN

978-0-7695-3305-6

Type

conf

DOI

10.1109/FSKD.2008.479

Filename

4666408

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3289874