Title :
Using n-grams to rapidly characterise the evolution of software code
Author :
Rainer, Austen ; Lane, Peter C R ; Malcolm, James A. ; Scholz, Sven-Bodo
Author_Institution :
Sch. of Comput. Sci., Univ. of Hertfordshire, Hatfield
Abstract :
Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret's implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret's output can be analysed to reveal several characteristics of software evolution, such as: the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
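As an illustration of the idea in the abstract, the following is a minimal sketch of comparing two versions of a file by their 3-token n-grams. It is not Ferret's implementation: the tokeniser, the use of the Jaccard ratio as the similarity score, the mean as the aggregation, and the names `tokens`, `trigrams`, `similarity` and `aggregate` are all assumptions made for this example.

```python
import re


def tokens(text):
    # Assumed tokeniser: identifiers and numbers are single tokens,
    # every other non-space character is a token of its own.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def trigrams(text):
    # Set of all runs of 3 consecutive tokens in the text.
    toks = tokens(text)
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}


def similarity(a, b):
    # Assumed score: shared trigrams divided by all distinct trigrams
    # (Jaccard ratio). Returns 0.0 when neither text has a trigram.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


def aggregate(pairs):
    # Assumed aggregation: mean score over (old, new) file pairs,
    # standing in for a measure of a larger part of the system.
    scores = [similarity(old, new) for old, new in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a score of 1.0 between successive versions of a file suggests no token-level change, while a score that drops and later returns towards an earlier version's trigram set could hint at a regression.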
Keywords :
software engineering; systems analysis; Ferret ability; software code evolution; text-based approaches; Application software; Cloning; Computer languages; Computer science; Educational institutions; Information retrieval; Performance analysis; Software measurement; Software performance; Software systems
Conference_Title :
Automated Software Engineering - Workshops, 2008. ASE Workshops 2008. 23rd IEEE/ACM International Conference on
Conference_Location :
L'Aquila
Print_ISBN :
978-1-4244-2776-5
DOI :
10.1109/ASEW.2008.4686320