• DocumentCode
    1620392
  • Title

    An improved clustering validity index for determining the number of malware clusters

  • Author

    Wang, Youyu ; Ye, Yanfang ; Che, Haishan ; Jiang, Qingshan

  • Author_Institution
    Software Sch., Xiamen Univ., Xiamen, China
  • fYear
    2009
  • Firstpage
    544
  • Lastpage
    547
  • Abstract
    With the development of malware writing techniques, the diversity and number of malware variants are constantly increasing, and the proliferation of this malware poses a major threat to networked computer users. However, variants of a malware family always share typical characteristics that reflect their origin and purpose. Categorizing malware into families is therefore a computer security topic of great interest. A fundamental and difficult problem in malware clustering is determining the "true" number of malware families in a real data collection. To address this issue, based on analysis of the instructions extracted from malware samples, we propose an improved clustering validity index, VNFS, for determining the number of malware clusters when a hierarchical clustering algorithm is used for malware categorization. VNFS is defined in terms of the compactness within each cluster and the distances between the cluster representatives. By using a novel function, proposed in this paper, for the total separation between clusters, VNFS deals more robustly with irregular data sets and data sets of different densities. A comprehensive experimental study on real daily collections of PE malware samples compares various clustering validation measures, such as RS-RMSSTD, PST2, VFS and VXB. Promising experimental results demonstrate that the proposed clustering validity index VNFS always leads to a better cluster number than the other clustering validity indices for hierarchical clustering, and that the number of malware clusters it recommends is almost the same as the number suggested by virus analysts.
  • Keywords
    invasive software; pattern classification; pattern clustering; clustering validity index; computer security; hierarchical clustering algorithm; malware categorization; malware cluster; malware writing technique; network computer user; Algorithm design and analysis; Clustering algorithms; Computer networks; Computer science; Computer security; Data mining; Performance analysis; Performance evaluation; Robustness; Writing; clustering validity; hierarchical clustering; malware clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Anti-counterfeiting, Security, and Identification in Communication, 2009. ASID 2009. 3rd International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-3883-9
  • Electronic_ISBN
    978-1-4244-3884-6
  • Type

    conf

  • DOI
    10.1109/ICASID.2009.5277000
  • Filename
    5277000