Title :
Algorithm of the Text Copy Detection Based on Topic Bag
Author :
Wang Sen ; Wang Yu
Author_Institution :
Sch. of Manage., Dalian Univ. of Technol., Dalian, China
Abstract :
To address the serious problem of academic plagiarism in the web environment, this article proposes a text copy detection algorithm based on topic bags, which draws on the ideas of semantic clustering and multi-instance learning. First, a paper is represented as a three-layer tree: each leaf node denotes a sentence; each branch node represents a topic bag, formed by semantic clustering of several paragraphs; and the root node is the whole text. Second, the similarities between topic bags are calculated from the similarities between their sentences, and the similarity of two papers is then obtained from the similarities and weights of the topic bags. Experiments show that the proposed algorithm has higher accuracy.
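The bottom-up similarity computation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: sentence similarity is taken to be Jaccard overlap of word sets, a topic bag's similarity to another bag is the mean of its sentences' best matches, and each bag's weight is its share of the document's sentences. The semantic clustering step is not shown; topic bags are assumed to be already formed. The names `tokenize`, `sentence_similarity`, `bag_similarity`, and `document_similarity` are hypothetical and not taken from the paper.

```python
from typing import List, Set

Sentence = Set[str]        # leaf node: a sentence as a set of lowercase tokens
TopicBag = List[Sentence]  # branch node: a topic bag as a list of sentences
Document = List[TopicBag]  # root node: a text as a list of topic bags


def tokenize(text: str) -> Sentence:
    """Split on whitespace and lowercase (a stand-in for real preprocessing)."""
    return set(text.lower().split())


def sentence_similarity(a: Sentence, b: Sentence) -> float:
    """Assumed measure: Jaccard overlap of the two word sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def bag_similarity(x: TopicBag, y: TopicBag) -> float:
    """Assumed aggregation: mean, over sentences in x, of the best match in y."""
    if not x or not y:
        return 0.0
    best = [max(sentence_similarity(s, t) for t in y) for s in x]
    return sum(best) / len(x)


def document_similarity(p: Document, q: Document) -> float:
    """Weighted sum over p's topic bags of each bag's best match in q.

    Assumed weighting: a bag's weight is its share of p's sentences.
    """
    total = sum(len(bag) for bag in p)
    if total == 0 or not q:
        return 0.0
    score = 0.0
    for bag in p:
        weight = len(bag) / total
        score += weight * max(bag_similarity(bag, other) for other in q)
    return score


doc_a = [
    [tokenize("the cat sat"), tokenize("dogs bark loudly")],
    [tokenize("topic models cluster text")],
]
doc_b = [[tokenize("completely unrelated words here")]]

print(document_similarity(doc_a, doc_a))
print(document_similarity(doc_a, doc_b))
```

Under these assumptions the score is asymmetric: it measures how much of the first document is covered by the second, which suits the case of checking a suspect paper against a source. A symmetric variant could average the scores in both directions.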
Keywords :
Internet; pattern clustering; text analysis; Web environment; academic plagiarism; multiinstance learning; semantic clustering; text copy detection; topic bag; Copy Detection; Sentence; Similarity; Topic Bags; Web Environment;
Conference_Title :
Web Information Systems and Mining (WISM), 2010 International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-8438-6
DOI :
10.1109/WISM.2010.159