  • DocumentCode
    3731325
  • Title
    A monolingual approach to detection of text reuse in Russian-English collection
  • Author
    Oleg Bakhteev; Rita Kuznetsova; Alexey Romanov; Anton Khritankov
  • Author_Institution
    Antiplagiat JSC, Moscow, Russia
  • fYear
    2015
  • Firstpage
    3
  • Lastpage
    10
  • Abstract
    In this paper we develop a method for cross-lingual (Russian and English) text reuse detection. The method follows the monolingual approach: texts are translated into one language, which reduces the task to a text similarity problem. We split texts into non-overlapping fragments and compare the fragments to each other using several metrics: BLEU (1-2), METEOR, cosine similarity between bag-of-words representations of the fragments, and cosine similarity between vectors obtained from a doc2vec-trained model. We explore how the choice of metric affects the quality of text reuse detection. We assess the quality of the method on a sample of one hundred scientific documents originally written in Russian and machine-translated into English. Preliminary findings demonstrate the feasibility of the approach.
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), 2015
  • Type
    conf
  • DOI
    10.1109/AINL-ISMW-FRUCT.2015.7382960
  • Filename
    7382960
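The abstract describes the monolingual pipeline only at a high level. What follows is a minimal sketch of its comparison step, assuming both texts have already been machine-translated into English. It shows two of the listed metrics: bag-of-words cosine similarity and a simplified BLEU over 1- and 2-grams. Several details are assumptions for illustration and do not come from the paper: the fixed fragment size, the whitespace tokenization, the unsmoothed BLEU variant, and the function names `split_fragments`, `cosine_bow`, `bleu_1_2` and `best_matches`, all of which are hypothetical. METEOR and doc2vec are omitted because they need external resources.

```python
import math
from collections import Counter


def split_fragments(text, size=50):
    """Split text into non-overlapping fragments of `size` whitespace tokens.
    The fixed word-count window is an illustrative assumption."""
    tokens = text.lower().split()
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine_bow(a, b):
    """Cosine similarity between bag-of-words count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)  # missing keys in a Counter count as 0
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_1_2(candidate, reference):
    """Simplified sentence-level BLEU with 1- and 2-grams (uniform weights,
    no smoothing): brevity penalty times the geometric mean of clipped precisions."""
    if not candidate or not reference:
        return 0.0
    log_sum = 0.0
    for n in (1, 2):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # an empty precision makes the geometric mean zero
        log_sum += 0.5 * math.log(clipped / total)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


def best_matches(suspicious, source, metric, size=50):
    """For each fragment of `suspicious`, find the most similar fragment of
    `source` under `metric`; returns (fragment index, source index, score)."""
    src_frags = split_fragments(source, size)
    results = []
    for i, frag in enumerate(split_fragments(suspicious, size)):
        scores = [metric(frag, s) for s in src_frags]
        if not scores:
            results.append((i, -1, 0.0))
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        results.append((i, j, scores[j]))
    return results
```

As a usage example, `best_matches(translated_doc, english_doc, bleu_1_2)` would flag fragment pairs whose score exceeds some threshold as candidate reuse. The threshold and fragment size would have to be tuned on labeled data; the abstract does not specify them. Passing the metric as a function makes it easy to swap metrics, which matches the paper's goal of comparing how the choice of metric affects detection quality.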