DocumentCode :
588861
Title :
The Effects of Different Representations on Malware Motif Identification
Author :
Narayanan, Arun ; Yi Chen ; Shaoning Pang ; Ban Tao
Author_Institution :
Sch. of Comput. & Math. Sci., Auckland Univ. of Technol., Auckland, New Zealand
fYear :
2012
fDate :
17-18 Nov. 2012
Firstpage :
86
Lastpage :
90
Abstract :
Sequence alignment is widely used in bioinformatics for revealing the genetic diversity of organisms and annotating gene functions by finding regions of similarity across biosequences. Such alignment requires sequences to be represented in the DNA or protein alphabet for tools such as Clustal to work. Previous work has demonstrated the feasibility of applying biosequence multiple alignment techniques to computer virus and worm signatures to find regions of similarity that can serve as malware 'motifs', or meta-signatures. However, it was not known how different ways of representing signatures in an appropriate biosequence alphabet would affect the alignment results. This paper investigates the effects of three different ways of representing malware signatures on sequence alignment and motif identification. The alignment results were checked with perceptrons, decision trees and logistic regression. The best-performing representation was used to derive rules in PRISM that give rise to 'motifs' that can perform the role of 'meta-signatures'. All analysis was undertaken with the publicly available data mining tool Weka (Waikato Environment for Knowledge Analysis: http://www.cs.waikato.ac.nz/ml/weka/).
Keywords :
DNA; bioinformatics; data mining; decision trees; digital signatures; invasive software; regression analysis; Clustal; PRISM; Waikato environment for knowledge analysis; Weka; biosequence multiple alignment techniques; computer viral signatures; data mining tool; decision tree; gene functions annotating; genetic organisms diversity; logistic regression; malware motif identification; meta-signatures; protein alphabet; worm signatures; Accuracy; Amino acids; Benchmark testing; Grippers; Logistics; Malware; Viruses (medical); Multiple sequence alignment; molecular visualisation; motifs; viral signatures; viruses; worms
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Security (CIS), 2012 Eighth International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4673-4725-9
Type :
conf
DOI :
10.1109/CIS.2012.27
Filename :
6405872