Title :
Batch source-code plagiarism detection using an algorithm for the bounded longest common subsequence problem
Author :
Campos, R.A.C. ; Martinez, F.J.Z.
Author_Institution :
Dept. de Sist., UAM Azcapotzalco, Mexico City, Mexico
Abstract :
Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments of programming courses. While being reasonably easy to fool, string-based comparisons offer a high degree of accuracy with almost no false positives and usually a good string similarity metric is the length of their longest common subsequence. In the case of two strings, the dynamic programming algorithm for this calculation unfortunately takes quadratic time even if the strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.
Keywords :
dynamic programming; programming theory; string matching; batch source code plagiarism detection; bounded longest common subsequence problem; branch-and-bound algorithm; dynamic programming algorithm; false positives; source code file preprocessing; string similarity metric; string-based comparisons; Algorithm design and analysis; Data preprocessing; Dynamic programming; Heuristic algorithms; Indexes; Plagiarism; Programming; Plagiarism detection; branch and bound; longest common subsequence; source-code;
Conference_Titel :
Electrical Engineering, Computing Science and Automatic Control (CCE), 2012 9th International Conference on
Conference_Location :
Mexico City
Print_ISBN :
978-1-4673-2170-9
DOI :
10.1109/ICEEE.2012.6421180