Author :
Hayes, Jane Huffman ; Antoniol, Giuliano ; Gueheneuc, Yann-Gael
Abstract :
High-level software artifacts, such as requirements, domain-specific requirements, and so on, are an important source of information that is often neglected during the reverse- and re-engineering processes. We posit that domain-specific pre-requirements information (PRI) can be obtained by eliciting the stakeholders' understanding of generic systems or domains. We discuss the semi-automatic recovery of domain-specific PRI that can then be used during reverse- and re-engineering, for example, to recover traceability links or to assess the degree of obsolescence of a system with respect to competing systems and the clients' expectations. We present a method using partition around medoids and agglomerative clustering for obtaining, structuring, analyzing, and labeling textual PRI from a group of diverse stakeholders. We validate our method using PRI for the development of a generic Web browser provided by 22 different stakeholders. We show that, for a similarity threshold of about 0.36, about 55% of the PRI were common to two or more stakeholders and 42% were outliers. We automatically label the common and outlier PRI (82% correctly labeled), and obtain 74% accuracy for the similarity threshold of 0.36 (78% for a threshold of 0.5). We assess the recall and precision of the method, and compare the labeled PRI to a generic Web browser requirements specification.
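To make the role of the similarity threshold concrete, the following is a minimal sketch of threshold-based agglomerative clustering over short textual PRI statements. It is not the authors' implementation: the abstract does not state the similarity measure or linkage, so token-set Jaccard similarity and average linkage are assumptions, and the names `jaccard`, `average_link`, `cluster`, and the sample `pri_items` are hypothetical. The partition-around-medoids step is not shown.

```python
from itertools import combinations

def tokens(text):
    """Lower-case whitespace tokenization (an assumed, simplistic preprocessing)."""
    return set(text.lower().split())

def jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1]; an assumed stand-in measure."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def average_link(c1, c2, items):
    """Mean pairwise similarity between two clusters of item indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(items[i], items[j]) for i, j in pairs) / len(pairs)

def cluster(items, threshold=0.36):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), average_link(clusters[x], clusters[y], items))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

# Hypothetical PRI statements, invented for illustration only.
pri_items = [
    "open a web page",
    "open a web page quickly",
    "bookmark favorite sites",
    "print the page",
]

groups = cluster(pri_items, threshold=0.36)
common = [g for g in groups if len(g) > 1]     # statements that cluster together
outliers = [g for g in groups if len(g) == 1]  # statements left on their own
```

In this simplified reading, clusters with two or more statements play the role of "common" PRI and singletons play the role of outliers. The sketch does not track which stakeholder supplied each statement, which the described method would need in order to report PRI common to two or more stakeholders. Raising the threshold (for example to 0.5) makes merging stricter and tends to leave more statements as outliers.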
Keywords :
reverse engineering; software engineering; systems re-engineering; agglomerative clustering; cluster analysis; domain specific prerequirements information; generic Web browser; high-level software artifacts; reengineering; semiautomatic recovery; Cognitive science; Computer science; Information analysis; Information resources; Labeling; Linux; Software maintenance; Software systems; Terminology