DocumentCode
3765359
Title
Are We Missing Labels? A Study of the Availability of Ground-Truth in Network Security Research
Author
Sebastian Abt;Harald Baier
Author_Institution
da/sec - Biometrics &
fYear
2014
Firstpage
40
Lastpage
55
Abstract
Network security is a long-standing field of research that constantly encounters new challenges. Research in this field is inherently data-driven. In particular, many approaches rely on supervised machine learning, which requires labelled input data. While various publicly available data sets exist, labelling information is sparse. To understand how our community deals with this lack of labels, we perform a systematic study of network security research accepted at top IT security conferences in 2009-2013. Our analysis reveals that 70% of the papers reviewed rely on manually compiled data sets. Furthermore, only 10% of the studied papers release their data sets after compilation. This shows that our community faces a missing labelled data problem. To address this problem, we give a definition and discuss its crucial characteristics. Furthermore, we reflect on and discuss paths towards overcoming it by establishing ground truth and fostering data sharing.
Keywords
"Security","Communication networks","Internet","IP networks","Payloads","Labeling","Biometrics (access control)"
Publisher
IEEE
Conference_Titel
Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2014 Third International Workshop on
Print_ISBN
978-1-4799-8308-7
Type
conf
DOI
10.1109/BADGERS.2014.11
Filename
7446034