DocumentCode
2420496
Title
Building a Better Similarity Trap with Statistically Improbable Features
Author
Roussev, Vassil
Author_Institution
Dept. of Comput. Sci., Univ. of New Orleans, New Orleans, LA
fYear
2009
fDate
5-8 Jan. 2009
Firstpage
1
Lastpage
10
Abstract
One of the persistent topics in digital forensic research has been the problem of finding all things similar. The tools developed for this purpose usually take the form of a similarity, or fuzzy, hash. In this paper, we present a generic empirical study of the problem of finding common features in binary data. Specifically, we study the problem of false positives and demonstrate that similarity tools work only as well as the underlying data allows them to and, therefore, must be aware of the basic properties of the input. We propose a new feature selection algorithm, which is based on the notion of statistically improbable features. We also show that the proposed method can be tuned to account for the type-specific distribution of false positives.
Keywords
security of data; statistical distributions; binary data; digital forensic research; false positives; feature selection algorithm; similarity trap; statistically improbable features; type-specific distribution; Computer science; Data mining; Digital forensics; Information retrieval; Pressing; Search engines; Web search;
fLanguage
English
Publisher
ieee
Conference_Titel
System Sciences, 2009. HICSS '09. 42nd Hawaii International Conference on
Conference_Location
Big Island, HI
ISSN
1530-1605
Print_ISBN
978-0-7695-3450-3
Type
conf
DOI
10.1109/HICSS.2009.97
Filename
4755788
Link To Document