Title :
Original content extraction oriented to anti-plagiarism
Author :
Shen, Yang ; Cheng, Ming ; Yao, Xing ; Wei, Wei
Author_Institution :
Sch. of Inf. Manage., Wuhan Univ., Wuhan, China
Abstract :
To reduce the impact of citations and references on plagiarism detection in academic theses, and to extract the original content, the authors propose three ways to remove citations: 1) removal of normative citations by symbol features; 2) removal of tacit citations by a minimum-risk Bayesian method that uses thesis structure; 3) removal of common knowledge based on a domain public knowledge base. The results show that, during extraction of original content, precision decreases as the risk coefficient increases, while recall increases with the risk coefficient. Overall performance is best when the risk coefficient is 60. After the original content is extracted, the fault rate of plagiarism detection decreases from 9.09% to 4.52%.
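The abstract does not give the decision rule itself, so the following is only a minimal sketch of a generic two-class minimum-risk Bayes decision, not the authors' implementation. It assumes that the risk coefficient is the loss for wrongly removing an original sentence, relative to a loss of 1 for keeping a citation; that reading is consistent with the reported trend (higher coefficient, higher recall and lower precision for original content) but is not confirmed by the record. The function names and the posterior probability `p_citation` are hypothetical.

```python
def expected_risks(p_citation: float, risk_coefficient: float) -> tuple[float, float]:
    """Expected loss of each action for one sentence.

    p_citation: posterior probability that the sentence is a citation
                (assumed to come from some Bayesian classifier).
    risk_coefficient: ASSUMED loss for removing an original sentence,
                      relative to a loss of 1 for keeping a citation.
    """
    p_original = 1.0 - p_citation
    risk_remove = risk_coefficient * p_original  # we remove, but it was original
    risk_keep = 1.0 * p_citation                 # we keep, but it was a citation
    return risk_remove, risk_keep


def classify(p_citation: float, risk_coefficient: float) -> str:
    """Pick the action with the smaller expected loss."""
    risk_remove, risk_keep = expected_risks(p_citation, risk_coefficient)
    return "citation" if risk_remove < risk_keep else "original"


if __name__ == "__main__":
    # With a coefficient of 60, a sentence is removed only when the
    # posterior exceeds 60 / 61, so most borderline sentences are kept.
    for p in (0.50, 0.90, 0.99):
        print(p, classify(p, risk_coefficient=60))
```

Under this assumed loss layout, raising the coefficient raises the posterior needed to remove a sentence, so more text is kept as original: recall of original content goes up and precision goes down.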
Keywords :
belief networks; citation analysis; information retrieval; Bayesian method; content extraction; normative citations removal; plagiarism detection; removal tacit citations; Conference management; Content management; Data mining; Engineering management; Knowledge management; Plagiarism; Prototypes; Risk management; Software libraries; Web pages; Bayes; citation removal; plagiarism; thesis structure
Conference_Title :
Management Science and Engineering, 2009. ICMSE 2009. International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4244-3970-6
Electronic_ISBN :
978-1-4244-3971-3
DOI :
10.1109/ICMSE.2009.5317530