• DocumentCode
    2107013
  • Title

    Detecting Text Similarity over Chinese Research Papers Using MapReduce

  • Author

    Xu, Fan ; Zhu, Qiaoming ; Li, Peifeng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • fYear
    2011
  • fDate
    6-8 July 2011
  • Firstpage
    197
  • Lastpage
    202
  • Abstract
    This paper proposes a novel method for detecting text similarity over Chinese research papers using the MapReduce paradigm. Our approach differs from state-of-the-art methods in two respects. First, we extract the key sentences from Chinese research papers using heuristic features and then generate 2-tuples, (document id, key phrase), as the representation of the documents. Second, we design a two-phase MapReduce algorithm to verify the effectiveness of the generated 2-tuples. For evaluation, we compare the proposed method with other approaches on a synthetic corpus. Experimental results reveal that our method substantially outperforms the state-of-the-art ones in running time while preserving the Jaccard similarity coefficient.
  • Keywords
    text analysis; Chinese research papers; MapReduce algorithm; document representation; text similarity detection; Algorithm design and analysis; Clustering algorithms; Electronic publishing; Encyclopedias; Feature extraction; Internet; Chinese Research Papers; Copy Detection; MapReduce; Parallel Algorithm; Similarity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2011 12th ACIS International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4577-0896-1
  • Type

    conf

  • DOI
    10.1109/SNPD.2011.29
  • Filename
    6063565