RedJsod: A Readable JavaScript Obfuscation Detector Using Semantic-based Analysis

Author

AL-Taharwa, Ismail Adel ; Lee, Hahn-Ming ; Jeng, Albert B. ; Wu, Kuo-Ping ; Mao, Ching-Hao ; Wei, Te-En ; Chen, Shyi-Ming

Author_Institution

Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ. of Sci. & Technol., Taipei, Taiwan

fYear

2012

fDate

25-27 June 2012

Firstpage

1370

Lastpage

1375

Abstract

JavaScript allows Web developers to hide the intent of their code inside scripts that look different from the original, known as obfuscated code. Automatic detection of obfuscated code is generally tackled from a readability perspective. Recent obfuscation, however, exhibits patterns that modify both syntactic and semantic characteristics while preserving readability. Readable obfuscation poses two problems: (1) it is difficult to locate, since it does not manipulate suspicious strings; and (2) it is a common and essential practice in both benign and malicious code. In this work, we first investigate why and how readable obfuscation can hinder the detection of maliciousness and prevent static analysis of suspicious scripts. Next, we propose a readable JavaScript obfuscation detector (RedJsod) to deal with this type of threat. RedJsod is a well-defined detector based on a variable length context-based feature extraction (VCLFE) scheme that takes advantage of the abstract syntax tree (AST) representation of a given JavaScript program to infer run-time behaviors statically. We applied RedJsod to three datasets collected from real-world Web pages to evaluate its effectiveness. We also tested RedJsod on well-known readable obfuscation samples cited in related work as a proof-of-concept illustration. Our experimental results indicate that RedJsod achieved very high detection accuracy (greater than 97%) and eliminated false negatives completely, while yielding very few false positives.
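
The abstract does not spell out how the VCLFE scheme works. As a rough illustration of what context-based features over an AST could look like, the sketch below counts node-type contexts of variable length (a node together with up to two of its ancestors). It is an assumption made for illustration and not the authors' algorithm: the hand-written tree, the `children` key, and the names `AST` and `context_features` are all hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written ESTree-style tree for a readable but obfuscated
# snippet such as:  var f = window["ev" + "al"]; f(payload);
# Only node types are kept; a real pipeline would obtain this from a JS parser.
AST = {"type": "Program", "children": [
    {"type": "VariableDeclaration", "children": [
        {"type": "VariableDeclarator", "children": [
            {"type": "Identifier"},
            {"type": "MemberExpression", "children": [
                {"type": "Identifier"},
                {"type": "BinaryExpression", "children": [
                    {"type": "Literal"},
                    {"type": "Literal"},
                ]},
            ]},
        ]},
    ]},
    {"type": "ExpressionStatement", "children": [
        {"type": "CallExpression", "children": [
            {"type": "Identifier"},
            {"type": "Identifier"},
        ]},
    ]},
]}


def context_features(node, max_len=3, ancestors=()):
    """Count node-type contexts of length 1..max_len that end at each node.

    A context is the tuple of node types on the path from an ancestor down
    to the current node. This is an illustrative stand-in, not VCLFE itself.
    """
    path = ancestors + (node["type"],)
    counts = Counter()
    # Emit every suffix of the root-to-node path, up to max_len types long.
    for k in range(1, min(max_len, len(path)) + 1):
        counts[path[-k:]] += 1
    # Recurse into children, carrying the path so they can see their ancestors.
    for child in node.get("children", []):
        counts.update(context_features(child, max_len, path))
    return counts


features = context_features(AST, max_len=3)
```

In this toy tree, a context such as `("MemberExpression", "BinaryExpression", "Literal")` captures a property name assembled from string pieces, a pattern that stays readable yet hides which member is accessed. Counts like these could then be fed to a classifier, which is again an assumption and not a detail stated in the abstract.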

Keywords

Internet; Java; abstract data types; feature extraction; invasive software; program diagnostics; programming language semantics; tree data structures; AST representation; RedJsod; VCLFE scheme; Web pages; abstract syntax tree; automatic obfuscated code detection; benign codes; malicious codes; maliciousness detection; malware; proof of concept; readability characteristics preservation; readable JavaScript obfuscation detector; run-time behaviors; semantic characteristics; semantic-based analysis; static analysis prevention; suspicious scripts; syntax characteristics; threat detection; variable length context-based feature extraction; Abstracts; Context; Context modeling; Detectors; Encoding; Feature extraction; Malware; AST representation; JavaScript malware; detection; encoding; feature-based; obfuscation; static analysis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Trust, Security and Privacy in Computing and Communications (TrustCom), 2012 IEEE 11th International Conference on

Conference_Location

Liverpool

Print_ISBN

978-1-4673-2172-3

Type

conf

DOI

10.1109/TrustCom.2012.235

Filename

6296140