• DocumentCode
    3273949
  • Title

    Analyze and detect malicious code for compound document binary storage format

  • Author

    Gao, Yu-xiang ; De-yu Qi

  • Author_Institution
    Res. Inst. of Comput. Syst., South China Univ. of Technol., Guangzhou, China
  • Volume
    2
  • fYear
    2011
  • fDate
    10-13 July 2011
  • Firstpage
    593
  • Lastpage
    596
  • Abstract
    Compared with traditional attacks, embedding malicious code in documents is becoming a more efficient and better-hidden attack vector. Attackers embed malicious code in a document according to its storage format so that the code activates covertly when the document is opened by third-party software. A simple double-click on the document can be disastrous for the user. By studying the structure of the compound file format, we focus mainly on Word documents and seek a method for detecting malicious code embedded in them. We use a Bloom filter together with the entropy rate of a Markov chain and achieve high accuracy. Embedded malicious code is detected by analyzing the embedded code itself, because it consists of machine instructions that must be executable by the CPU. The basic assumption is that machine instructions in a document differ from normal text, pictures, tables, and similar content. Detection therefore amounts to finding the regions of the document that differ, and we use the entropy rate as a measure to quantify this difference.
  • Keywords
    Markov processes; entropy; security of data; word processing; CPU; Markov chain; Word documents; Bloom filter; compound document binary storage format; document storage format; entropy rate; machine instructions; malicious attack; malicious code analysis; malicious code detection; third-party software; Accuracy; Compounds; Cybernetics; Jacobian matrices; Machine learning; Malicious codes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
  • Conference_Location
    Guilin
  • ISSN
    2160-133X
  • Print_ISBN
    978-1-4577-0305-8
  • Type

    conf

  • DOI
    10.1109/ICMLC.2011.6016767
  • Filename
    6016767
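
The abstract describes using the entropy rate of a Markov chain to find regions of a document that differ from ordinary content. The following is a minimal sketch of that idea, not the authors' implementation: it estimates a first-order entropy rate over bytes and scans a buffer with a sliding window. The function names, the byte-level state space, and the window and step sizes are all assumptions made for illustration; the abstract gives no parameters or thresholds, and the Bloom filter stage is not shown.

```python
import math
from collections import Counter


def markov_entropy_rate(data: bytes) -> float:
    """Empirical entropy rate (bits per byte) of a first-order Markov chain.

    Computes -sum_{a,b} p(a,b) * log2 p(b|a), with probabilities estimated
    from the byte-pair counts in `data`. This estimator is an assumption;
    the source does not specify how the entropy rate is estimated.
    """
    if len(data) < 2:
        return 0.0
    pair_counts = Counter(zip(data, data[1:]))  # transitions a -> b
    state_counts = Counter(data[:-1])           # how often a starts a transition
    total = len(data) - 1                       # number of transitions
    rate = 0.0
    for (a, _b), count in pair_counts.items():
        rate -= (count / total) * math.log2(count / state_counts[a])
    return rate


def windowed_entropy_rates(data: bytes, window: int = 256, step: int = 128):
    """Return (offset, entropy_rate) pairs for sliding windows over `data`.

    `window` and `step` are hypothetical defaults, not values from the paper.
    A buffer shorter than `window` yields a single window covering all of it.
    """
    last_start = max(len(data) - window, 0)
    return [
        (offset, markov_entropy_rate(data[offset:offset + window]))
        for offset in range(0, last_start + 1, step)
    ]
```

A caller would compare the per-window values against what is typical for the surrounding stream and flag windows that stand out. How to choose that baseline is left open here, since the abstract does not describe it.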