  • DocumentCode
    1245764
  • Title
    Information-theoretic software clustering
  • Author
    Andritsos, Periklis; Tzerpos, Vassilios
  • Author_Institution
    Dept. of Comput. Sci., Toronto Univ., Ont., Canada
  • Volume
    31
  • Issue
    2
  • fYear
    2005
  • Firstpage
    150
  • Lastpage
    165
  • Abstract
    The majority of the algorithms in the software clustering literature utilize structural information to decompose large software systems. Approaches using other attributes, such as file names or ownership information, have also demonstrated merit. At the same time, existing algorithms commonly deem all attributes of the software artifacts being clustered as equally important, a rather simplistic assumption. Moreover, no method that can assess the usefulness of a particular attribute for clustering purposes has been presented in the literature. In this paper, we present an approach that applies information-theoretic techniques in the context of software clustering. Our approach allows weighting schemes that reflect the importance of various attributes to be applied. We introduce LIMBO, a scalable hierarchical clustering algorithm based on the minimization of information loss when clustering a software system. We also present a method that can assess the usefulness of any nonstructural attribute in a software clustering context. We applied LIMBO to three large software systems in a number of experiments. The results indicate that this approach produces clusterings that come close to decompositions prepared by system experts. Experimental results were also used to validate our usefulness assessment method. Finally, we experimented with well-established weighting schemes from information retrieval, Web search, and data clustering. We report results as to which weighting schemes show merit in the decomposition of software systems.
  • Keywords
    information retrieval; pattern clustering; reverse engineering; software architecture; software maintenance; software metrics; systems re-engineering; Web search; data clustering; software clustering; software system; Clustering algorithms; Computer Society; Computer architecture; Minimization methods; Software algorithms; Software engineering; Software systems; architecture reconstruction; clustering; information theory; reengineering
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2005.25
  • Filename
    1401930