  • DocumentCode
    703676
  • Title
    Ensemble learning for detection of malicious content embedded in PDF documents
  • Author
    Nath, Hiran V.; Mehtre, B.M.
  • Author_Institution
    Center for Information Assurance & Management (CIAM), Institute for Development & Research in Banking Technology (IDRBT), India
  • fYear
    2015
  • fDate
    19-21 Feb. 2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Portable Document Format (PDF) is used as a de facto standard for sharing documents. Even though PDF is a document description language, it has many features similar to those of a programming language. Support for JavaScript (a vehicle for malicious scripts) and the ability to embed any file in a PDF document together create great potential for disastrous cyber attacks. Since 2008, attackers have increasingly concentrated on embedding malicious code in PDF documents. Compared to PE (Portable Executable) files, PDF files pose a higher risk because the embedded content can be encrypted and/or encoded. Recently, multistage malware delivery has been used in APTs (advanced persistent threats) and targeted attacks. In such attacks, PDF documents are used to carry out one or more stages; in MiniDuke, for example, a PDF file was used for the first stage, and it went undetected for almost two years. Such files can be regarded as carriers of k-ary codes. In this paper, we bring out the importance of analyzing the data encoded in the stream tag along with other structural information. We give a proof of concept by embedding JavaScript into a PDF document; the embedded script is not detected by any of the existing PDF parsers. Finally, we propose an ensemble learning approach for detecting such PDF files.
  • Keywords
    Java; authoring languages; cryptography; document handling; invasive software; learning (artificial intelligence); APTs; JavaScript; PDF documents; data analysis; data encoding; disastrous cyber attacks; document description language; ensemble learning; k-ary codes; malicious codes; malicious content detection; malicious script; malware; portable document format; programming language; structural information; Encryption; Entropy; Feature extraction; Malware; Portable document format; Ensemble Learning; Malicious JavaScript; Multi-Stage Attack; Portable Document Format (PDF); Proof of Concept;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, Informatics, Communication and Energy Systems (SPICES), 2015 IEEE International Conference on
  • Conference_Location
    Kozhikode
  • Type
    conf
  • DOI
    10.1109/SPICES.2015.7091371
  • Filename
    7091371