DocumentCode
2107013
Title
Detecting Text Similarity over Chinese Research Papers Using MapReduce
Author
Xu, Fan ; Zhu, Qiaoming ; Li, Peifeng
Author_Institution
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
fYear
2011
fDate
6-8 July 2011
Firstpage
197
Lastpage
202
Abstract
This paper proposes a novel method for detecting text similarity across Chinese research papers using the MapReduce paradigm. Our approach differs from state-of-the-art methods in two respects. First, we extract the key sentences from Chinese research papers using heuristic features and then generate 2-tuples of the form (document id, key phrase) as the representation of the documents. Second, we design a 2-phase MapReduce algorithm to verify the effectiveness of the generated 2-tuples. For evaluation, we compare the proposed method with other approaches on a synthetic corpus. Experimental results reveal that our method substantially outperforms state-of-the-art methods in running time while preserving the Jaccard similarity coefficient.
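The abstract does not spell out the two MapReduce phases, so the following is only a minimal single-process sketch of one plausible reading, not the authors' algorithm. It assumes the first phase groups document ids by key phrase and the second phase counts shared phrases per document pair to obtain the Jaccard coefficient. All function names are hypothetical, and a real deployment would run these steps as distributed map and reduce jobs.

```python
from collections import defaultdict
from itertools import combinations


def phase1_group_by_phrase(tuples):
    """Phase 1 (assumed): group document ids by key phrase, like a shuffle on the phrase key."""
    index = defaultdict(set)
    for doc_id, phrase in tuples:
        index[phrase].add(doc_id)
    return index


def phase2_pair_similarity(index, phrase_counts):
    """Phase 2 (assumed): count shared phrases per document pair, then compute Jaccard."""
    shared = defaultdict(int)
    for doc_ids in index.values():
        # Every pair of documents containing this phrase shares one more phrase.
        for a, b in combinations(sorted(doc_ids), 2):
            shared[(a, b)] += 1
    # Jaccard = |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|.
    return {
        (a, b): n / (phrase_counts[a] + phrase_counts[b] - n)
        for (a, b), n in shared.items()
    }


def jaccard_over_tuples(tuples):
    """Run both sketched phases over (document id, key phrase) 2-tuples."""
    tuples = set(tuples)  # drop duplicate (document id, key phrase) pairs
    phrase_counts = defaultdict(int)
    for doc_id, _ in tuples:
        phrase_counts[doc_id] += 1
    index = phase1_group_by_phrase(tuples)
    return phase2_pair_similarity(index, phrase_counts)
```

Only document pairs that share at least one key phrase appear in the result, which is one way such a scheme can avoid comparing every pair of documents.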
Keywords
text analysis; Chinese research papers; MapReduce algorithm; document representation; text similarity detection; Algorithm design and analysis; Clustering algorithms; Electronic publishing; Encyclopedias; Feature extraction; Internet; Chinese Research Papers; Copy Detection; MapReduce; Parallel Algorithm; Similarity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2011 12th ACIS International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4577-0896-1
Type
conf
DOI
10.1109/SNPD.2011.29
Filename
6063565