Malware Detection using Classification of Variable-Length Sequences

Original title (Persian)

آشكارسازي بدافزارها با استفاده از دسته‌بندي دنباله‌هاي با طول متغير

Authors

Hosseini, Fatemeh (Islamic Azad University, Science and Research Branch, Tehran - Department of Computer Engineering); Mirzarezaee, Mitra (Islamic Azad University, Science and Research Branch, Tehran - Department of Computer Engineering); Sharifi, Arash (Islamic Azad University, Science and Research Branch, Tehran - Department of Computer Engineering)

Number of pages

From page

137

To page

146

Keywords

Malware detection , graph-based methods , classifier combination , variable-length classification , support vector machine

Persian abstract

This paper proposes a graph-based method for extracting features from variable-length sequences. Without fixing the length of the sequences, the proposed method identifies the most frequent instructions and places the remaining instructions in an 'other' set, which saves time and memory. Each sample receives a score according to the similarity of its features, and the scores are used for classification. Two approaches are proposed to improve the results. In the first approach, features extracted by the scoring methods on opcodes, hexadecimal sequences, and system calls are combined at the classifier input. In the second approach, the outputs of different classifiers are combined and majority voting is used. The proposed approach not only detects metamorphic computer malware from the vxheaven set with 97% accuracy but also determines the malware category, whereas the SSD and HMM methods, under the same conditions, detected malware with accuracies of 84% and 80%, respectively.
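
The abstract does not say how the graph is built or how the score is computed, so the following is only a minimal sketch of the general idea: keep the most frequent instructions, fold everything else into an "other" node, and turn a sequence of any length into a fixed-size transition graph. The function names, the row normalisation, and the mean-absolute-difference score are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def top_opcodes(sequences, k):
    """Return the k most frequent opcodes across all sequences."""
    counts = Counter(op for seq in sequences for op in seq)
    return [op for op, _ in counts.most_common(k)]


def transition_graph(seq, vocab):
    """Build a row-normalised transition matrix over vocab plus one 'other' node.

    The matrix is always (len(vocab) + 1) x (len(vocab) + 1), whatever the
    length of seq, so no padding or truncation of the sequence is needed.
    """
    index = {op: i for i, op in enumerate(vocab)}
    other = len(vocab)  # every opcode outside vocab is mapped to this node
    n = other + 1
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        counts[index.get(a, other)][index.get(b, other)] += 1
    graph = []
    for row in counts:
        total = sum(row)
        graph.append([c / total if total else 0.0 for c in row])
    return graph


def graph_distance(g1, g2):
    """Mean absolute difference between two graphs (hypothetical scoring rule)."""
    n = len(g1)
    return sum(abs(g1[i][j] - g2[i][j]) for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    # Toy opcode sequences of different lengths (made-up data).
    samples = [
        ["mov", "push", "call", "mov", "pop", "ret"],
        ["mov", "push", "call", "xor", "nop", "mov", "push", "call", "ret"],
    ]
    vocab = top_opcodes(samples, k=3)
    graphs = [transition_graph(s, vocab) for s in samples]
    print(vocab, graph_distance(graphs[0], graphs[1]))
```

Because the node set is capped at k + 1, memory grows with k rather than with the number of distinct instructions or the sequence length, which is the saving the abstract attributes to the 'other' set.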

English abstract

This paper proposes a novel graph-based method that extracts features from variable-length sequences for classification. The method overcomes the difficulty that traditional graph techniques have with variable-length data: without fixing the length of the sequences, it identifies the most frequent instructions and places the remaining instructions in an "other" set, which saves time and memory. Each sample is given a score based on its features and their similarities, and the score is used for classification. To improve the results, the method is not used alone; it is combined with existing techniques in two approaches. In the first approach, which can be regarded as feature extraction, features extracted by three scoring techniques (hidden Markov model, simple substitution distance, and similarity graph) from opcode sequences, hexadecimal sequences, and system calls are combined at the classifier input. The second approach has two steps. In the first step, the scores obtained from each scoring technique are given to three support vector machines; their outcomes are combined according to the weight of each technique, and the final decision is made by majority vote. Giving a higher weight to the support vector machine for the similarity graph method (the proposed method) yields better results, because the similarity graph method is more accurate than the other two. In the second step, the outputs of several classifiers are combined by majority voting to exploit the strengths of each classifier. Three ensemble methods were tested: ensemble averaging, bagging, and boosting.
Ensemble averaging, which combines four classifiers (random forest, the support vector machine obtained in the first step, k-nearest neighbors, and naive Bayes) and makes the final decision by majority vote, is adopted as the proposed method. The proposed approach detects metamorphic malware from the Vxheaven set and also determines the malware category with 97% accuracy, whereas the SSD and HMM methods under the same conditions detect malware with accuracies of 84% and 80%, respectively.
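
The combination step described in the abstract, weighted outcomes followed by a majority vote, can be illustrated with a small sketch. The abstract gives no weight values or tie-breaking rule, so the weights, labels, and the function name `weighted_majority_vote` below are hypothetical and serve only to show the mechanism.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights=None):
    """Return the label whose votes carry the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier; equal weights if omitted.
    Ties go to the label that appears first in predictions.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Outputs of three classifiers fed with HMM, SSD, and similarity graph
    # scores, in that order (made-up labels and weights).
    svm_outputs = ["benign", "benign", "malware"]
    print(weighted_majority_vote(svm_outputs))                   # equal weights
    print(weighted_majority_vote(svm_outputs, [1.0, 1.0, 3.0]))  # favour the graph-based score

    # Plain majority vote over four classifiers, as in the second step.
    print(weighted_majority_vote(["trojan", "worm", "trojan", "virus"]))
```

With equal weights the two agreeing classifiers win; raising the weight of the third classifier lets it override them, which is the effect the abstract describes when the similarity graph method is weighted more heavily.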

Publication year

1398 (Solar Hijri calendar)

Journal title

Signal and Data Processing (پردازش علائم و داده ها)

PDF file

7755368

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1123214