Title :
Detecting Text Similarity over Chinese Research Papers Using MapReduce
Author :
Xu, Fan ; Zhu, Qiaoming ; Li, Peifeng
Author_Institution :
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Abstract :
This paper proposes a novel method to detect text similarity over Chinese research papers using the MapReduce paradigm. Our approach differs from state-of-the-art methods in two aspects. First, we extract key sentences from Chinese research papers using heuristic features and then generate 2-tuples, (document id, key phrase), to represent the documents. Second, we design a two-phase MapReduce algorithm to verify the effectiveness of the generated 2-tuples. For evaluation, we compare the proposed method with other approaches on a synthetic corpus. Experimental results show that our method substantially outperforms state-of-the-art methods in running time while maintaining the Jaccard similarity coefficient.
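The abstract does not spell out the two MapReduce phases, so the sketch below is only an illustration under stated assumptions, not the authors' algorithm. It assumes a common pairwise-similarity design: phase 1 inverts the (document id, key phrase) 2-tuples into a phrase -> document-list index, and phase 2 emits every document pair that shares a phrase, sums the shared-phrase counts, and turns them into Jaccard coefficients J(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|). The helper names `map_reduce` and `jaccard_all_pairs` are hypothetical, and the in-memory `map_reduce` function stands in for a real Hadoop job.

```python
from collections import defaultdict
from itertools import combinations


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def jaccard_all_pairs(tuples):
    """Assumed two-phase scheme over (doc_id, key_phrase) 2-tuples; returns {(a, b): Jaccard}."""
    tuples = sorted(set(tuples))  # treat each document as a set of key phrases
    sizes = defaultdict(int)      # |A| for every document, needed in the denominator
    for doc_id, _ in tuples:
        sizes[doc_id] += 1

    # Phase 1: invert (doc_id, phrase) into (phrase, sorted list of doc_ids).
    postings = map_reduce(
        tuples,
        mapper=lambda t: [(t[1], t[0])],
        reducer=lambda phrase, docs: [(phrase, sorted(docs))],
    )

    # Phase 2: each posting list emits a 1 per document pair sharing that phrase;
    # the reducer sums them to get |A ∩ B| for the pair.
    overlaps = map_reduce(
        postings,
        mapper=lambda p: [(pair, 1) for pair in combinations(p[1], 2)],
        reducer=lambda pair, ones: [(pair, sum(ones))],
    )

    # Pairs with no shared phrase are never emitted; their Jaccard value is 0.
    return {
        (a, b): shared / (sizes[a] + sizes[b] - shared)
        for (a, b), shared in overlaps
    }


if __name__ == "__main__":
    demo = [("d1", "mapreduce"), ("d1", "jaccard"), ("d1", "hadoop"),
            ("d2", "mapreduce"), ("d2", "jaccard"), ("d3", "parsing")]
    print(jaccard_all_pairs(demo))
```

With the toy input above, d1 and d2 share 2 of 3 distinct phrases, so their coefficient is 2/3, and d3 produces no pair at all. Emitting pairs only from posting lists is what keeps the work proportional to actual phrase overlap rather than to all n² document pairs, which is the kind of running-time saving the abstract reports.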
Keywords :
text analysis; Chinese research papers; MapReduce algorithm; document representation; text similarity detection; Algorithm design and analysis; Clustering algorithms; Electronic publishing; Encyclopedias; Feature extraction; Internet; Chinese Research Papers; Copy Detection; MapReduce; Parallel Algorithm; Similarity;
Conference_Title :
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2011 12th ACIS International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4577-0896-1
DOI :
10.1109/SNPD.2011.29