• DocumentCode
    30711
  • Title

    What You Submit Is Who You Are: A Multimodal Approach for Deanonymizing Scientific Publications

  • Author

    Payer, M. ; Ling Huang ; Gong, N.Z. ; Borgolte, K. ; Frank, M.

  • Author_Institution
    Purdue Univ., West Lafayette, IN, USA
  • Volume
    10
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 2015
  • Firstpage
    200
  • Lastpage
    212
  • Abstract
    The peer-review system of most academic conferences relies on the anonymity of both the authors and reviewers of submissions. In particular, with respect to the authors, the anonymity requirement is heavily disputed and pros and cons are discussed exclusively on a qualitative level. In this paper, we contribute a quantitative argument to this discussion by showing that it is possible for a machine to reveal the identity of authors of scientific publications with high accuracy. We attack the anonymity of authors using statistical analysis of multiple heterogeneous aspects of a paper, such as its citations, its writing style, and its content. We apply several multilabel, multiclass machine learning methods to model the patterns exhibited in each feature category for individual authors and combine them into a single ensemble classifier to deanonymize authors with high accuracy. To the best of our knowledge, this is the first approach that exploits multiple categories of discriminative features and uses multiple, partially complementing classifiers in a single, focused attack on the anonymity of the authors of an academic publication. We evaluate our author identification framework, deAnon, based on a real-world data set of 3894 papers. From these papers, we target 1405 productive authors that each have at least three publications in our data set. Our approach returns a ranking of probable authors for anonymous papers, an ordering for guessing the authors of a paper. In our experiments, following this ranking, the first guess corresponds to one of the authors of a paper in 39.7% of the cases, and at least one of the authors is among the top 10 guesses in 65.6% of all cases. Thus, deAnon significantly outperforms current state-of-the-art techniques for automatic deanonymization.
  • Keywords
    data mining; data privacy; electronic publishing; pattern classification; text analysis; academic conferences; academic publication; anonymity requirement; anonymous papers; author identification framework; author ranking; authors anonymity attack; citation analysis; content analysis; deAnon framework; deanonymized authors; discriminative features; ensemble classifier; feature category; multilabel multiclass machine learning methods; multimodal approach; multiple heterogeneous aspects; multiple-partially-complementing classifiers; pattern modelling; peer-review system; qualitative level; quantitative argument; real-world data set; scientific publication deanonymization; single-focused attack; statistical analysis; text mining; writing style; Accuracy; Feature extraction; Portable document format; Support vector machines; Training; Writing
  • fLanguage
    English
  • Journal_Title
    Information Forensics and Security, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1556-6013
  • Type

    jour

  • DOI
    10.1109/TIFS.2014.2368355
  • Filename
    6949149