Title :
Nowhere to Hide: Finding Plagiarized Documents Based on Sentence Similarity
Author :
Gustafson, Nathaniel ; Pera, Maria Soledad ; Ng, Yiu-Kai
Author_Institution :
Comput. Sci. Dept., Brigham Young Univ., Provo, UT
Abstract :
Plagiarism is a serious problem: it infringes copyrighted documents and materials, is unethical, and reduces the economic incentive for the authors (owners) of the original works. The problem is growing with the increasing number of online publications on the Web, which makes information easier to locate and paraphrase. To address it, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity, computed using predefined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
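The idea in step (i) can be illustrated with a minimal Python sketch. It is not SimPaD itself: the abstract does not give the actual word-correlation factors or the scoring formulas, so the correlation table, the averaging rule in `sentence_sim`, the `threshold` value, and all function names below are hypothetical choices made only for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical word-correlation factors in [0, 1]. These toy values are
# invented for illustration and are NOT taken from the paper.
CORRELATION: Dict[FrozenSet[str], float] = {
    frozenset({"copy", "duplicate"}): 0.8,
    frozenset({"essay", "paper"}): 0.7,
}


def word_corr(a: str, b: str) -> float:
    """Return 1.0 for identical words, else the table value (0.0 if absent)."""
    if a == b:
        return 1.0
    return CORRELATION.get(frozenset({a, b}), 0.0)


def sentence_sim(s1: List[str], s2: List[str]) -> float:
    """Average, over the words of s1, of the best correlation with a word of s2.

    This averaging rule is an assumption, not the paper's formula.
    """
    if not s1 or not s2:
        return 0.0
    return sum(max(word_corr(w, v) for v in s2) for w in s1) / len(s1)


def resemblance(d1: List[List[str]], d2: List[List[str]],
                threshold: float = 0.6) -> float:
    """Fraction of sentences in d1 with a sufficiently similar sentence in d2.

    The threshold of 0.6 is an arbitrary illustrative value.
    """
    if not d1 or not d2:
        return 0.0
    matched = sum(
        1 for s in d1 if max(sentence_sim(s, t) for t in d2) >= threshold
    )
    return matched / len(d1)


# Two toy documents, each a list of tokenized sentences.
d1 = [["students", "copy", "the", "essay"], ["plagiarism", "is", "unethical"]]
d2 = [["students", "duplicate", "the", "paper"], ["the", "weather", "is", "nice"]]

score = resemblance(d1, d2)
print(f"resemblance(D1, D2) = {score:.2f}")
```

In this toy setup the first sentence of D1 matches its paraphrase in D2 through the correlated word pairs, while the second has no close counterpart. Note that the score as defined here is asymmetric (it measures how much of D1 reappears in D2); whether SimPaD's measure is symmetric is not stated in the abstract.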
Keywords :
computational linguistics; copyright; electronic publishing; SimPaD; copyrighted documents; economic incentive; graphical sentences view; online publications; plagiarism detection method; plagiarized documents; predefined word-correlation factors; sentence-to-sentence similarity; Computer science; Displays; Euclidean distance; Fingerprint recognition; Intelligent agent; Materials science and technology; Merging; Natural language processing; Plagiarism; Writing; Sentence Similarity; detection; word-correlation factors
Conference_Title :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
DOI :
10.1109/WIIAT.2008.16