• DocumentCode
    3765359
  • Title

    Are We Missing Labels? A Study of the Availability of Ground-Truth in Network Security Research

  • Author

    Sebastian Abt;Harald Baier

  • Author_Institution
    da/sec - Biometrics &
  • fYear
    2014
  • Firstpage
    40
  • Lastpage
    55
  • Abstract
    Network security is a long-standing field of research that constantly encounters new challenges. Research in this field is inherently data-driven; in particular, many approaches rely on supervised machine learning, which requires labelled input data. While various publicly available data sets exist, labelling information is sparse. To understand how our community deals with this lack of labels, we perform a systematic study of network security research accepted at top IT security conferences in 2009-2013. Our analysis reveals that 70% of the papers reviewed rely on manually compiled data sets. Furthermore, only 10% of the studied papers release the data sets after compilation. This shows that our community is facing a missing labelled data problem. To address this problem, we give a definition and discuss its crucial characteristics. Furthermore, we reflect on and discuss paths towards overcoming it by establishing ground-truth and fostering data sharing.
  • Keywords
    "Security","Communication networks","Internet","IP networks","Payloads","Labeling","Biometrics (access control)"
  • Publisher
    IEEE
  • Conference_Titel
    Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2014 Third International Workshop on
  • Print_ISBN
    978-1-4799-8308-7
  • Type

    conf

  • DOI
    10.1109/BADGERS.2014.11
  • Filename
    7446034