Title :
Experiments on the Indonesian plagiarism detection using latent semantic analysis
Author :
Soleman, Sidik ; Purwarianti, Ayu
Author_Institution :
Sch. of Electr. Eng. & Inf., Bandung Inst. of Technol., Bandung, Indonesia
Abstract :
Plagiarism detection is an important task because the number of plagiarism cases is increasing and plagiarism techniques are becoming harder to detect: there is not only literal plagiarism but also intelligence plagiarism. To handle intelligence plagiarism, we employed latent semantic analysis (LSA) as the term-document representation. LSA was used in both the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component. We conducted several experiments to compare token types, text segmentation schemes, and threshold values. The test data were prepared manually from the available Indonesian paper corpus. Experimental results showed that LSA outperformed the Vector Space Model (VSM), especially in test cases with intelligence plagiarism.
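The contrast between the two representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' HR/DA pipeline: the toy corpus, the rank k = 2, the raw term-count weighting, and the 0.8 threshold are all assumptions chosen for illustration, and the function names are hypothetical. It builds a term-document matrix, compares two documents by cosine similarity on raw term vectors (VSM), and then repeats the comparison in a truncated-SVD latent space (LSA).

```python
import numpy as np

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            A[index[tok], j] += 1.0
    return A

def lsa_document_vectors(A, k):
    """Project documents into a rank-k latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Document j is represented by column j of diag(s_k) @ Vt_k.
    return (s[:k, None] * Vt[:k, :]).T  # shape: (n_docs, k)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Toy corpus (assumption): two "vehicle" documents that differ by one
# near-synonym (mobil / otomobil), a third that links the two words,
# and two unrelated "fruit" documents.
docs = [
    "mobil mesin",
    "otomobil mesin",
    "mobil otomobil",
    "pisang buah",
    "buah",
]

A = term_document_matrix(docs)
vsm = A.T                            # VSM: raw term-count vectors
lsa = lsa_document_vectors(A, k=2)   # LSA: rank-2 latent vectors

vsm_sim = cosine(vsm[0], vsm[1])
lsa_sim = cosine(lsa[0], lsa[1])
cross_topic_sim = cosine(lsa[0], lsa[3])

THRESHOLD = 0.8  # hypothetical decision threshold, not taken from the paper
print(f"VSM similarity: {vsm_sim:.2f}, flagged: {vsm_sim >= THRESHOLD}")
print(f"LSA similarity: {lsa_sim:.2f}, flagged: {lsa_sim >= THRESHOLD}")
```

In this toy setting the first two documents share only one of their two terms, so the VSM cosine is 0.5 and falls below the assumed threshold. In the rank-2 latent space the co-occurrence of "mobil" and "otomobil" in the third document pulls the two onto the same axis, so the pair is flagged, while the unrelated fruit document stays orthogonal to them.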
Keywords :
data analysis; text analysis; Indonesian paper corpus; Indonesian plagiarism detection; LSA; VSM; intelligence plagiarism; latent semantic analysis; term-document representation; text segmentation; threshold value; token type; vector space model; Communications technology; Matrix decomposition; Plagiarism; Sections; Semantics; System performance; Vectors;
Conference_Title :
Information and Communication Technology (ICoICT), 2014 2nd International Conference on
Conference_Location :
Bandung
DOI :
10.1109/ICoICT.2014.6914098