Clustering Source Code Elements by Semantic Similarity Using Wikipedia

Author

Schindler, Mirco ; Fox, Oliver ; Rausch, Andreas

Author_Institution

Dept. of Inf. - Software Syst. Eng., Clausthal Univ. of Technol., Clausthal-Zellerfeld, Germany

fYear

2015

fDate

17 May 2015

Firstpage

13

Lastpage

18

Abstract

For humans, it is no problem to determine whether two words have high or low semantic similarity in a given context. But can semantic data extracted from source code support a software developer or architect in the same way that typical source code relations do? To answer this question, we developed an approach that computes semantic similarity using Wikipedia as a textual corpus. In a case study, we demonstrate this approach on a manageable software system. The results of using semantic similarities are compared with the outcome of using source code relations instead.

Keywords

Web sites; information retrieval; pattern clustering; semantic Web; software development management; source code (software); text analysis; word processing; Wikipedia; clustering source code element; manageable software system; semantic data extraction; semantic similarity; software developer; source code relations; textual corpus; Electronic publishing; Encyclopedias; Internet; Semantics; Software systems; Component Structure; Information Retrieval; Modularization; Semantic Similarity; Software Architecture; Spectral Clustering; Wikipedia

fLanguage

English

Publisher

IEEE

Conference_Titel

Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), 2015 IEEE/ACM 4th International Workshop on

Conference_Location

Florence

Type

conf

DOI

10.1109/RAISE.2015.10

Filename

7168326