DocumentCode :
134402
Title :
Locality-sensitive hashing optimizations for fast malware clustering
Author :
Oprisa, Ciprian ; Checiches, Marius ; Nandrean, Adrian
Author_Institution :
Bitdefender, Bucharest, Romania
fYear :
2014
fDate :
4-6 Sept. 2014
Firstpage :
97
Lastpage :
104
Abstract :
Large datasets, including malware collections, are difficult to cluster. Although we are mainly dealing with polynomial algorithms, their long running times make them difficult to use in practice. The main issue is that classical hierarchical algorithms need to compute the distance between each pair of items. This paper presents a faster approach for clustering large collections of malware samples using a technique called locality-sensitive hashing. This approach performs single-linkage clustering faster than state-of-the-art methods while producing clusters of similar quality. Although our proposed algorithm is still quadratic in theory, the coefficient of the quadratic term is several orders of magnitude smaller. Our experiments show that we can reduce this coefficient to under 0.02% and still produce clusters 99.9% similar to the ones produced by the single-linkage algorithm.
Keywords :
cryptography; invasive software; optimisation; pattern clustering; polynomials; locality-sensitive hashing optimization; malware clustering; polynomial algorithm; single-linkage clustering; Algorithm design and analysis; Approximation algorithms; Arrays; Clustering algorithms; Dictionaries; Equations; Malware;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on
Conference_Location :
Cluj Napoca
Print_ISBN :
978-1-4799-6568-7
Type :
conf
DOI :
10.1109/ICCP.2014.6936960
Filename :
6936960