• DocumentCode
    653229
  • Title
    Random Forest Classification for Detecting Android Malware
  • Author
    Alam, Md Shamsul ; Vuong, Son T.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of British Columbia, Vancouver, BC, Canada
  • fYear
    2013
  • fDate
    20-23 Aug. 2013
  • Firstpage
    663
  • Lastpage
    669
  • Abstract
    Internet-connected smartphone devices play a crucial role in the application domain of the Internet of Things. These devices are widely used for day-to-day activities such as remotely controlling lighting and heating at home, paying for parking, and, more recently, paying for goods with saved credit card information over Near Field Communication (NFC). Android is the most popular smartphone platform today. It is also the platform of choice for malware authors seeking to obtain secure and private data. In this paper we apply the Random Forest supervised classifier, an ensemble machine learning algorithm, exclusively to an Android feature dataset of 48919 points with 42 features each. Our goal was to measure the accuracy of Random Forest in using Android application behavior to classify applications as malicious or benign. Moreover, we wanted to examine how detection accuracy changes as the free parameters of the Random Forest algorithm, such as the number of trees, the depth of each tree, and the number of random features selected, are varied. Our experimental results, based on 5-fold cross validation of our dataset, show that Random Forest performs very well, with an accuracy of over 99 percent in general, an optimal Out-Of-Bag (OOB) error rate [3] of 0.0002 for forests with 40 trees or more, and a root mean squared error of 0.0171 for 160 trees.
  • Keywords
    Android (operating system); Internet of Things; data privacy; feature selection; invasive software; learning (artificial intelligence); pattern classification; smart phones; Android application behavior; Android feature dataset; Android malware detection; Internet connected smartphone devices; Internet of Things; application domain; machine learning ensemble learning algorithm; optimal out-of-bag error rate; random feature selection; random forest algorithm; random forest classification; random forest supervised classifier; root mean squared error; Androids; Error analysis; Humanoid robots; Malware; Smart phones; Vectors; Vegetation; Android; Machine Learning; Malware;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Green Computing and Communications (GreenCom), 2013 IEEE and Internet of Things (iThings/CPSCom), IEEE International Conference on and IEEE Cyber, Physical and Social Computing
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/GreenCom-iThings-CPSCom.2013.122
  • Filename
    6682136
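
The evaluation described in the abstract (a Random Forest whose number of trees, tree depth, and number of random features per split are varied, scored by Out-Of-Bag error and 5-fold cross validation) can be illustrated with a small sketch. This is not the authors' code or dataset: the data below is synthetic (400 rows of 6 binary features with a made-up labelling rule), every function name is hypothetical, and the trees are a simplified Gini-based variant restricted to binary features. It uses only the Python standard library.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, idx, max_depth, n_feats, rng):
    """Grow a tree on rows idx; each split considers a random feature subset."""
    labels = [y[i] for i in idx]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)  # leaf: predicted class
    # Only binary features that still take both values here can split the node.
    usable = [f for f in range(len(X[0])) if len({X[i][f] for i in idx}) == 2]
    if not usable:
        return majority(labels)
    best = None
    for f in rng.sample(usable, min(n_feats, len(usable))):
        left = [i for i in idx if X[i][f] == 0]
        right = [i for i in idx if X[i][f] == 1]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, f, left, right)
    _, f, left, right = best
    return (f,
            build_tree(X, y, left, max_depth - 1, n_feats, rng),
            build_tree(X, y, right, max_depth - 1, n_feats, rng))

def predict_tree(tree, row):
    """Walk from the root to a leaf; internal nodes are (feature, left, right)."""
    while isinstance(tree, tuple):
        tree = tree[2] if row[tree[0]] == 1 else tree[1]
    return tree

def fit_forest(X, y, idx, n_trees, max_depth, n_feats, rng):
    """Return (trees, bags); each tree is grown on a bootstrap sample of idx."""
    trees, bags = [], []
    for _ in range(n_trees):
        bag = [rng.choice(idx) for _ in idx]
        trees.append(build_tree(X, y, bag, max_depth, n_feats, rng))
        bags.append(set(bag))
    return trees, bags

def predict_forest(trees, row):
    """Majority vote over all trees."""
    return majority([predict_tree(t, row) for t in trees])

def oob_error(X, y, idx, trees, bags):
    """Error rate when each row is voted on only by trees that never saw it."""
    wrong = total = 0
    for i in idx:
        votes = [predict_tree(t, X[i]) for t, b in zip(trees, bags) if i not in b]
        if votes:
            total += 1
            wrong += majority(votes) != y[i]
    return wrong / total

def cross_val_accuracy(X, y, k, n_trees, max_depth, n_feats, rng):
    """k-fold cross validation: train on k-1 folds, score on the held-out fold."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        trees, _ = fit_forest(X, y, train, n_trees, max_depth, n_feats, rng)
        correct += sum(predict_forest(trees, X[i]) == y[i] for i in held_out)
    return correct / len(X)

# Synthetic stand-in data (assumption): label depends on features 0, 1 and 2.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(6)] for _ in range(400)]
y = [1 if (row[0] and row[1]) or row[2] else 0 for row in X]
rows = list(range(len(X)))

trees, bags = fit_forest(X, y, rows, n_trees=25, max_depth=6, n_feats=3, rng=rng)
oob = oob_error(X, y, rows, trees, bags)
acc = cross_val_accuracy(X, y, k=5, n_trees=25, max_depth=6, n_feats=3, rng=rng)
print(f"OOB error: {oob:.4f}, 5-fold accuracy: {acc:.4f}")
```

Re-running the last four lines with different values of `n_trees`, `max_depth`, and `n_feats` mirrors, in miniature, the kind of parameter sweep the abstract describes. The figures printed here come from toy data and say nothing about the results reported in the paper.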