Extracting, identifying and visualisation of the content in software projects

Author

Uhlar, M. ; Polasek, Ivan

Author_Institution

Fac. of Inf. & Inf. Technol., Slovak Univ. of Technol. in Bratislava, Bratislava, Slovakia

fYear

2012

fDate

5-9 Nov. 2012

Firstpage

72

Lastpage

78

Abstract

The paper proposes a method for extracting, identifying and visualising topics in software projects. In addition to standard information retrieval techniques, we use the abstract syntax tree (AST) and the WordNet ontology to enrich document vectors extracted from parsed source code, latent semantic indexing (LSI) to reduce their dimensionality, and swarm intelligence, in the form of bee-behaviour-inspired algorithms, to cluster the documents. We extract topics from the identified clusters and visualise them in a 3D graph. The goal is to give participants in the development process insight into software projects when analysing and reusing source code.
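
The first step named in the abstract, building document vectors from parsed source code, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python source parsed with the standard-library `ast` module, and the helper names `split_identifier`, `document_vector` and `cosine` are hypothetical. WordNet enrichment, LSI and the bee-inspired clustering are not shown.

```python
import ast
import math
import re
from collections import Counter


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in name.split("_"):
        # Acronyms (HTTP), capitalised or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def document_vector(source):
    """Count words taken from function, class, argument and variable names.

    Assumption: one source file is treated as one document.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            counts.update(split_identifier(node.name))
        elif isinstance(node, ast.Name):
            counts.update(split_identifier(node.id))
        elif isinstance(node, ast.arg):
            counts.update(split_identifier(node.arg))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical example documents.
doc_a = "def load_user_profile(userId):\n    return userId\n"
doc_b = "def render_chart(canvas):\n    return canvas\n"

vec_a = document_vector(doc_a)
vec_b = document_vector(doc_b)
similarity = cosine(vec_a, vec_b)
```

Vectors of this kind could then be stacked into a term-document matrix, which is the usual input to a dimensionality-reduction step such as LSI before clustering.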

Keywords

data visualisation; graph theory; information retrieval; ontologies (artificial intelligence); software engineering; source coding; vectors; 3D graph; AST; LSI; WordNet ontology; content extraction; content identification; content visualisation; document vectors; information retrieval; parsed source code; software projects; Clustering algorithms; Indexes; Large scale integration; Software; Software algorithms; Vectors; Visualization; AST; Bee Behaviour Inspired Algorithms; Latent Semantic Indexing; Software Project; Source Code; Swarm Intelligence; Topic Identification and Extraction; Visualisation; WordNet Ontology;

fLanguage

English

Publisher

IEEE

Conference_Titel

Nature and Biologically Inspired Computing (NaBIC), 2012 Fourth World Congress on

Conference_Location

Mexico City

Print_ISBN

978-1-4673-4767-9

Type

conf

DOI

10.1109/NaBIC.2012.6402242

Filename

6402242