• DocumentCode
    3289874
  • Title

    An Improved CURD Algorithm for Source Code Mining

  • Author

    Liu, Yangyang ; Zhang, Yang

  • Author_Institution
    Coll. of Inf. Eng., Northwest A&F Univ., Yangling
  • Volume
    4
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    335
  • Lastpage
    339
  • Abstract
    Source code mining algorithms should be able to cope with large volumes of data and with nominal attributes, the two major characteristics of source code datasets. The K-means algorithm is not well suited to clustering source code because users generally find it difficult to determine the number of clusters for a previously unknown dataset. The CURD clustering algorithm works efficiently; however, it cannot process nominal attributes. In this paper, we propose the NCURD algorithm for clustering source code, which makes CURD applicable to nominal attributes and improves its working efficiency. The experimental results show that the NCURD algorithm has excellent clustering performance on source code.
  • Keywords
    data mining; pattern clustering; software engineering; NCURD algorithm; improved CURD algorithm; k-means algorithm; source code clustering; source code dataset; source code mining; Clustering algorithms; Data mining; Educational institutions; Fuzzy systems; Knowledge engineering; Shape
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Jinan Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type

    conf

  • DOI
    10.1109/FSKD.2008.479
  • Filename
    4666408