Abstract:
Computer science needs standard ways of calculating the similarity of computer programs. Such measurements are useful in areas such as clone detection, refactoring, compiler optimization, and run-time optimization, and they are particularly important for uncovering plagiarism, trade secret theft, copyright infringement, and patent infringement. Other uses include locating open source code within a proprietary program and determining the authors of different programs. In a previous paper (R. Zeidman, 2006) I introduced the concept of source code correlation, presented a theoretical basis for such a measure, and described a program, CodeMatch®, that compares software source code and calculates correlation. That paper compared source code correlation against existing methods of comparing source code and found it to be significantly superior. This paper refines that definition of source code correlation and presents a new, more robust definition of multidimensional source code correlation.
Keywords:
computer science; copyright; patents; programming; security of data; software engineering; source coding; CodeMatch; computer programs; copyright infringement; multidimensional correlation; patent infringement; plagiarism; software source code; trade secret theft; Area measurement; Cloning; Multidimensional systems; Open source software; Optimizing compilers; Robustness; Runtime; Software measurement; Clone Detection; Correlation; Infringement; Intellectual Property; Patent; Refactoring; Source Code; Theft; Trade Secret