Evading Malware Detection through Obfuscated Computer Code
Methodologies in Adapting Code to Defeat Customary Defenses
Obfuscating code is the process of making computer code very difficult to read and understand. Businesses and developers employ obfuscation to make their code challenging to read, copy, or steal, while attackers use it to evade malware detection services. At its core, obfuscation refers to taking various steps to mask what a program is and actually does. This article focuses on obfuscation used for the purpose of evading detection and defense mechanisms. Cybersecurity specialists must understand the tactics attackers use to circumvent detection systems to create more robust defensive measures. Before examining several different methods of obfuscation, here is a brief explication of some primary detection strategies.
***NOTE: Some of the tools listed herein may be considered “hacking tools” in certain jurisdictions, and could be subject to legal regulation or use restrictions.
Static Malware Analysis
Static Malware Analysis amounts to little more than inspecting code at a moment in time. Windows Portable Executable Format (“PE Format”) describes the structure of files such as .exe, .dll and .sys. PE format data structures are (generally) the same on-disk or in memory. These files contain the x86 instructions, data, and metadata needed to run the program. The PE file format structure is illustrated like this:
Source: Malware Data Science
The first header of significant interest is the PE header, which includes useful information such as the time the file was compiled. To find the offset of the PE header, one can examine the e_lfanew field in the DOS header. A valid PE header begins with the signature value 0x00004550 (the ASCII characters “PE” followed by two null bytes). In the Optional header, one finds the program’s entry point, which is the address of the first instruction the program runs. It also indicates the subsystem the program targets, among other information. Section headers provide details on each section of the file and organize items such as code, data, text strings, and images. These include critical evidence for malware analysis such as:
.text: Contains the executable code of the program that the operating system will run. This should be the only section with execute privileges and the only one that contains code.
.rdata: Contains the import and export information. It can also be used to store other read-only data used by the program. Import and export sections can be subdivided into .idata and .edata respectively.
.data: Contains the global data required by the program.
.rsrc: Contains the resources used by the application such as icons, images, menus and strings.
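To make the header walk described above concrete, here is a minimal sketch in Python that follows e_lfanew from the DOS header to the PE signature and then reads the compile timestamp and entry point. It uses only the standard library. The build_sample helper and its field values are hypothetical and exist only so the example is self-contained; in practice an analyst would read the bytes of a real file or use a library such as pefile.

```python
import struct
from datetime import datetime, timezone

PE_SIGNATURE = 0x00004550  # the bytes "PE\0\0" read as a little-endian DWORD


def parse_pe_headers(data: bytes) -> dict:
    """Read a few header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' magic")
    # e_lfanew is a 4-byte little-endian offset stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != PE_SIGNATURE:
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the signature.
    machine, num_sections, timestamp = struct.unpack_from("<HHI", data, e_lfanew + 4)
    # The Optional header follows; AddressOfEntryPoint sits 16 bytes into it.
    optional_offset = e_lfanew + 4 + 20
    (entry_point,) = struct.unpack_from("<I", data, optional_offset + 16)
    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "sections": num_sections,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "entry_point": entry_point,
    }


def build_sample() -> bytes:
    """Hypothetical header-only sample used purely for illustration."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE header starts right after the DOS header
    coff = struct.pack("<HHIIIHH", 0x014C, 1, 1_600_000_000, 0, 0, 224, 0x0102)
    optional = bytearray(224)
    struct.pack_into("<H", optional, 0, 0x010B)   # PE32 magic
    struct.pack_into("<I", optional, 16, 0x1000)  # AddressOfEntryPoint
    return bytes(dos) + b"PE\0\0" + coff + bytes(optional)


if __name__ == "__main__":
    for field, value in parse_pe_headers(build_sample()).items():
        print(field, value)
```

A compile timestamp or entry point that looks implausible is not proof of anything by itself, but it is the kind of detail that tells an analyst where to look next.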
The size of each section on disk should roughly match its virtual size in memory. If not, analysts should investigate the possibility of packing (see more on this below). Linked libraries provide information about the program’s functionality. Linking can be: Static, where all the library code resides in the executable itself; Runtime, where a library is loaded only when the program calls it; and Dynamic, where libraries are loaded when the program starts. Process Explorer, from Microsoft’s Sysinternals suite, allows examination of the DLLs a process has opened or loaded. Investigating the strings provides further information on the functionality of a program. Static analysis programs use keyword searches to detect suspicious strings.
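The two checks mentioned above, comparing on-disk and virtual section sizes and searching strings for keywords, can be sketched in a few lines of Python. The ratio threshold and the keyword list are assumptions chosen for illustration, not values taken from any particular scanner.

```python
import re

# Hypothetical keyword list; real tools ship far larger, curated lists.
SUSPICIOUS_KEYWORDS = (b"CreateRemoteThread", b"VirtualAllocEx", b"cmd.exe")


def extract_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


def suspicious_strings(data: bytes, keywords=SUSPICIOUS_KEYWORDS) -> list[bytes]:
    """Keep only the extracted strings that contain a keyword (case-insensitive)."""
    return [
        s for s in extract_strings(data)
        if any(k.lower() in s.lower() for k in keywords)
    ]


def size_mismatch(raw_size: int, virtual_size: int, ratio: float = 4.0) -> bool:
    """Flag a section whose in-memory size dwarfs its on-disk size.

    The ratio of 4.0 is an assumed threshold for this sketch.
    """
    if raw_size == 0:
        return virtual_size > 0
    return virtual_size / raw_size >= ratio


if __name__ == "__main__":
    blob = b"\x00\x01abc\x00kernel32.dll\x00\xffVirtualAllocEx\x00"
    print(extract_strings(blob))
    print(suspicious_strings(blob))
    print(size_mismatch(raw_size=512, virtual_size=40960))
```

A section that occupies almost nothing on disk but expands greatly in memory is a common sign that the real code is unpacked at runtime, which is also why the string search may come back nearly empty for packed samples.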
Static analysis provides a starting point for malware analysis, but is often ineffective against more sophisticated constructions, particularly those employing multipart obfuscation as discussed below.
x86 Disassembly
Malware code is typically written in a high-level language and then passed through a compiler to produce machine code that runs at the CPU level. x86 disassembly reverses that step by converting the binary back into assembly instructions an analyst can read. Many tools are available to support disassembly, such as pefile (for parsing PE files) and capstone (for disassembling the code they contain). Malware coders often use anti-disassembly methods to confuse analysts about the instructions contained in different blocks. Disassembly is further complicated by self-modifying code. A straightforward tactic is linear disassembly, which involves decoding the sequence of bytes in the PE file into x86 instructions one after another, in the order they appear. For a detailed discussion of this, see Baojian Hua’s Topics in Software Security, chapter 4, here.
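The idea of linear disassembly, and one way it can be misled, can be illustrated with a toy decoder. The sketch below recognizes only a handful of x86 opcodes (nop, ret, int3, push/pop of a 32-bit register, mov of a 32-bit immediate, and a short jmp) and is an assumption-laden teaching aid, not a substitute for a real disassembler such as capstone.

```python
REGS = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def linear_sweep(code: bytes, base: int = 0) -> list[tuple[int, str]]:
    """Decode bytes front to back, one instruction after another."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        addr = base + i
        if op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif op == 0xCC:
            text, size = "int3", 1
        elif 0x50 <= op <= 0x57:
            text, size = f"push {REGS[op - 0x50]}", 1
        elif 0x58 <= op <= 0x5F:
            text, size = f"pop {REGS[op - 0x58]}", 1
        elif 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            text, size = f"mov {REGS[op - 0xB8]}, {imm:#x}", 5
        elif op == 0xEB and i + 2 <= len(code):
            rel = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            text, size = f"jmp {addr + 2 + rel:#x}", 2
        else:
            text, size = f"db {op:#04x}", 1  # unknown byte, emit as data
        out.append((addr, text))
        i += size
    return out


if __name__ == "__main__":
    # Ordinary code: push ebp / mov eax, 1 / pop ebp / ret
    for addr, text in linear_sweep(bytes.fromhex("55b8010000005dc3")):
        print(f"{addr:04x}  {text}")
    # A jump over one junk byte (0xB8) hides the three nops and the ret
    # that actually execute at offset 3.
    for addr, text in linear_sweep(bytes.fromhex("eb01b8909090c3")):
        print(f"{addr:04x}  {text}")
```

In the second sample the CPU jumps to offset 3 and runs nop, nop, nop, ret, while the linear sweep treats the junk byte as the start of a five-byte mov and swallows those instructions. This is the kind of confusion anti-disassembly methods aim to create.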
Dynamic Malware Analysis
The simplest explanation of Dynamic Analysis is that it tells the analyst what malware does. Watching what a program writes to the disk enables the analyst to compare similar traits across other programs. This can help determine if malware was created by the same person or entity, or as a way to identify elements of an as-yet unknown program. The Cuckoo Sandbox is an open-source tool for conducting this process. For more analysis tools and services options, see here. Like Static Analysis, Dynamic Analysis examines file signatures for file properties, among other things. In addition, Dynamic Analysis helps identify the specific commands and procedures the program is attempting to perform. It might also show network hosts the program contacted, system objects it did or attempted to manipulate, whether it executed encryption, and any changes it made to the registry. Additionally, it can detect any API calls made by the program.
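One simple way to compare the traits observed for an unknown sample against those of known programs is a set-overlap score. The sketch below uses the Jaccard index, which is this example's choice rather than something the tools above are known to use, and the trait strings and family names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of traits two samples have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(unknown: set[str], known: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank known samples by how closely their behavior matches the unknown one."""
    scores = [(name, jaccard(unknown, traits)) for name, traits in known.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical behavior traits as a sandbox report might summarize them.
    known = {
        "family_a": {"write:temp/a.bin", "reg:Run", "dns:example.invalid"},
        "family_b": {"reg:Services"},
    }
    unknown = {"reg:Run", "dns:example.invalid", "write:temp/x.bin"}
    for name, score in most_similar(unknown, known):
        print(name, round(score, 2))
```

A high score does not prove common authorship, but it gives the analyst a ranked list of known samples worth comparing by hand.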
Sandboxed Dynamic Analysis has drawbacks. Some malicious programs are purposefully designed to evade sandbox protections. Some common evasion techniques include:
Extended Sleep – the program lies dormant for a period of time in the hope that it can “wait out” the computational or pre-set limitations of the sandbox environment
Sandbox detection – malware contains detection tools searching for VM processes and registry entries to trigger evasion tactics (such as commanding an extended sleep on positive detection)
User interaction – delays deployment of the malware payload until the user conducts prescribed interactions, which often indicate that automated malware detection processes have finished
Obfuscation
Cyber attackers tend to be lazy. With malware flooding the internets (both surface and dark webs), it is easy to procure malicious programs with sophistication levels correlating to the price the attacker is willing to pay. As such, cybersecurity experts formulate defenses based on an equal level of access to this now ubiquitous commodity. Advanced defense mechanisms search for all kinds of features in the code of these programs in order to quarantine, repel, or altogether shut them down before they can cause any significant damage to the protected systems. Moreover, specific aspects of the exploited vector that malware uses can also lead to identifying the attacker. To circumvent protections and detection, cyber attackers spend more time obfuscating acquired code than they do crafting it anew (the exception, generally, might be state-sponsored hackers). To create adequate defenses against code that is purposely altered to evade security, cybersecurity professionals must know how to detect and decipher obfuscated code.
Typically, to evade malware detection and identification, malware coders employ numerous methods. What follows are some examples of obfuscation techniques.
Signature Obfuscation
Many commercial anti-malware programs rely on signature analysis to identify malware. Cyber threat researchers collect and publish many elements of malware (these publications are sometimes publicly available, distributed only to law enforcement or other cleared entities, or specific to proprietary anti-virus or firewall software). Protection programs scan incoming files and compare them to these signature databases. A match indicates the presence of malware that the system then deals with accordingly. To build signature databases, researchers collect and analyze sample malware from various sources. Examples of signature elements might include plaintext commands, hexadecimal commands, or file types. The obvious disadvantage to signature analysis is that it only captures known malware.
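A signature scan of the kind described above can be reduced to a lookup against known values. The sketch below checks a whole-file hash and a couple of byte patterns. Every signature name and pattern in it is made up for illustration and does not correspond to any real malware or product.

```python
import hashlib

# Hypothetical signature database: name -> byte pattern expected in the file.
SIGNATURES = {
    "Demo.StringMarker": b"EVIL_MARKER_v1",
    "Demo.HexPattern": bytes.fromhex("deadbeef4142"),
}

# Hypothetical whole-file hashes of previously collected samples.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"known bad sample").hexdigest(): "Demo.KnownFile",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures that match the given file contents."""
    hits = []
    name = KNOWN_BAD_SHA256.get(hashlib.sha256(data).hexdigest())
    if name:
        hits.append(name)
    hits.extend(n for n, pattern in SIGNATURES.items() if pattern in data)
    return hits


if __name__ == "__main__":
    print(scan(b"known bad sample"))
    print(scan(b"header EVIL_MARKER_v1 trailer"))
    print(scan(b"something never seen before"))
```

The last call returns nothing, which is the limitation noted above: a sample that is not already in the database passes untouched.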
Nevertheless, even known malware can evade signature analysis. One such method is to insert dummy code. Attackers employing this method add vast amounts of code that does not change the overall functionality in any way. Instruction pattern transformation means adding complexity to otherwise simple instructions, often employing unusual commands that perform the same functions but require considerably more effort to decipher. Adding conditional variables to straightforward processes disguises what the code is doing by making it unclear what process each segment is meant to complete. Because many malware scanners take shortcuts to conserve memory, adding enormous amounts of dummy code or conditional variables between and among identifiable signature elements can help to evade detection.
In the opposite direction, attackers compress or pack code. With this method, recognizable signatures remain packed in formats that malware scanners tend to miss or cannot read, with the malware only unpacking or decompressing the code upon execution. This is particularly effective against static scanners. Even programs that supplement static scanning with unpacker technologies can be defeated if attackers use custom methods for compression. Some popular packing programs include UPX, ExeStealth, PESpin, Andromeda, and Obsidium. Removing metadata or superfluous code reduces the information available about the program, which makes the more heavily obscured sections harder to understand.
Encrypting code strings generally renders signature analysis ineffective. String encryption replaces plain text strings with encrypted values, forcing an analyst to decrypt each string before being able to understand its function. As signature analysis depends upon known code, encrypting specific attributes of malicious code makes it extremely difficult to detect, and more so if multiple methods of encryption are used within the same file.
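From the defender’s side, the effect of filler placed between signature elements can be shown with two matchers: one that expects the signature bytes to be contiguous, and one that tolerates a bounded gap. The byte patterns, the 64-byte gap limit, and the use of a regular expression are all assumptions made for this sketch.

```python
import re

# Hypothetical two-part signature.
PART_A = bytes.fromhex("deadbeef")
PART_B = bytes.fromhex("4142")

EXACT_SIGNATURE = PART_A + PART_B
# Same two parts, but allowing up to 64 arbitrary bytes in between.
GAP_SIGNATURE = re.compile(
    re.escape(PART_A) + rb".{0,64}?" + re.escape(PART_B), re.DOTALL
)


def matches_exact(data: bytes) -> bool:
    """True only when both parts appear back to back."""
    return EXACT_SIGNATURE in data


def matches_with_gap(data: bytes) -> bool:
    """True when the parts appear in order with at most 64 bytes between them."""
    return GAP_SIGNATURE.search(data) is not None


if __name__ == "__main__":
    original = b"\x00" * 8 + PART_A + PART_B + b"\x00" * 8
    padded = b"\x00" * 8 + PART_A + b"\x90" * 32 + PART_B + b"\x00" * 8
    print(matches_exact(original), matches_exact(padded))
    print(matches_with_gap(original), matches_with_gap(padded))
```

The gap-tolerant matcher catches the padded sample, but only up to the limit it was given. Enough filler pushes the second part out of range again, which is why this kind of padding works against scanners that bound their search to save resources.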
Polymorphic malware constantly changes its signatures to fool antivirus software into thinking it is not malicious. Metamorphic malware continuously rewrites its “internal structure, rewriting and reprogramming itself each time it infects a computing system.”
PowerShell Obfuscation
A common approach disguises the download and execution of a file via PowerShell using a basic binary-to-text encoding with padding, such as Base64; methods like this employ several techniques at once to obfuscate subsequent functions. To scramble potential signatures, attackers create variable assignments, which simply rename or hide the variables, and split strings using concatenation, or alter words with random spacing and capitalization. Attackers also adopt encryption. For PowerShell-based attacks specifically, they often use SecureString. See here for a detailed example. For more sophisticated evasion, attackers can make the signature invisible altogether (or, nonexistent, really) by using the Net.WebClient object to read the malware from a remote location and execute it using PowerShell’s Invoke-Expression. Invoke-Expression evaluates a string as a command, so code that is assembled or downloaded at runtime can be executed without ever being written to disk. Executing code this way creates “fileless malware,” which is basically malicious code attached to legitimate scripts that executes in memory contemporaneously with the legitimate program. Using a program such as Daniel Bohannon’s Invoke-Obfuscation tool, the code can be jumbled to hide obvious keywords from scanning. Michael Buckbee presents examples displaying the commands with the subsequent output from the Invoke-Obfuscation tool here.
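For an analyst, undoing the simpler tricks described above is often a matter of normalizing the script before searching it. The Python sketch below decodes a Base64 command (PowerShell’s encoded commands are Base64 over UTF-16LE text), joins concatenated string literals, collapses spacing, and lowercases everything before looking for keywords. The keyword list is a small illustrative assumption, and the sample commands are deliberately harmless.

```python
import base64
import re

# Illustrative keywords drawn from the techniques described above.
KEYWORDS = ("invoke-expression", "net.webclient")


def decode_encoded_command(b64_text: str) -> str:
    """Decode a Base64 PowerShell command, which is UTF-16LE underneath."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def normalize(script: str) -> str:
    """Undo string splitting, random spacing, and random capitalization."""
    joined = re.sub(r"['\"]\s*\+\s*['\"]", "", script)  # 'ab' + 'cd' -> 'abcd'
    collapsed = re.sub(r"\s+", " ", joined)
    return collapsed.lower()


def flagged_keywords(script: str) -> list[str]:
    """Return the keywords that appear once the script is normalized."""
    text = normalize(script)
    return [k for k in KEYWORDS if k in text]


if __name__ == "__main__":
    sample = "iNvOkE-ExPrEsSiOn   ('Write-' + 'Host hello')"
    print(normalize(sample))
    print(flagged_keywords(sample))
    encoded = base64.b64encode("Write-Host hello".encode("utf-16-le")).decode("ascii")
    print(decode_encoded_command(encoded))
```

Normalization of this kind only handles the surface-level tricks. Output from a tool such as Invoke-Obfuscation typically needs considerably more work, and encrypted content cannot be recovered this way at all.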
XOR Obfuscation
The exclusive-or operation (XOR) is used as a simple, reversible substitution cipher. Utilizing a 1-byte value as the key (ranging from 0 to 255), every byte of the data needing encoding is XORed with the selected key. As XOR detection programs easily defeat this kind of obfuscation, attackers will implement XOR encoding in a loop, or via multiple passes, essentially encrypting the original value, then encrypting again the previously encrypted value. See here for a variety of tools available to deobfuscate XORed code. For multi-pass or loop encoding, analysts may have to manually decode one or more layers before XOR decoding tools will work.
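Because a 1-byte key has only 256 possible values, single-byte XOR can be reversed by trying them all and keeping any result that contains text the analyst expects to find. The sketch below does exactly that; the expected text (the “crib”) and the sample URL are assumptions for the example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def brute_force_xor(data: bytes, crib: bytes) -> list[tuple[int, bytes]]:
    """Try all 256 keys and keep the outputs that contain the expected text."""
    results = []
    for key in range(256):
        candidate = xor_bytes(data, key)
        if crib in candidate:
            results.append((key, candidate))
    return results


if __name__ == "__main__":
    hidden = xor_bytes(b"http://example.invalid/payload", 0x5A)
    for key, plain in brute_force_xor(hidden, b"http"):
        print(hex(key), plain)
```

One detail worth knowing when facing layered encoding: two passes with fixed single-byte keys are equivalent to one pass with the two keys XORed together, so that particular layering falls to the same brute force. Layers become harder to peel when the key changes from byte to byte or when XOR is mixed with other transformations.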
Control-flow Flattening
During the execution of a program using this method, the control flow is commanded by a dispatch variable. Each function is broken down into blocks. The transition from block to block is determined by a control value residing at the end of each block. The control value redirects the flow through the dispatcher node that employs an artificial variable to determine which block executes next. At its basic level, the control flow is transformed through a series of conditional statements (if-then/else/while). Here is a basic example:
Source: Android Code Protection via Obfuscation Techniques: Past, Present and Future Directions
When analysts attempt to reverse engineer the program, the control flow is not obvious because all that is apparent is that each of the blocks is connected to the dispatcher. The relationship between the blocks themselves cannot be immediately determined. Each block represents its own state identified only by its variable. In order to recover the control flow, analysts must identify the variable and follow the dispatch logic to determine the subsequent block. The value in this for an attacker is the time it takes to fully resolve. Both dynamic and static analysis will eventually resolve control-flow flattened programs, but the program can become significantly more complex if the attacker employs indirect or call-based routines. For more on this, see here.
An illustration of control-flow flattening logic:
Source: Automated Detection of Control-flow Flattening
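The transformation is easier to recognize after seeing it applied to something trivial. The sketch below shows an ordinary loop and a hand-flattened equivalent in Python, where a state variable plays the role of the dispatch variable. It is a simplified illustration written for this article, not the output of any particular obfuscator.

```python
def sum_to_n(n: int) -> int:
    """Original function: the loop structure is visible at a glance."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


def sum_to_n_flattened(n: int) -> int:
    """Same logic, flattened: every block returns control to the dispatcher."""
    state = 0
    total = 0
    i = 0
    while True:                      # dispatcher loop
        if state == 0:               # block A: initialization
            total, i = 0, 1
            state = 1
        elif state == 1:             # block B: loop condition
            state = 2 if i <= n else 3
        elif state == 2:             # block C: loop body
            total += i
            i += 1
            state = 1
        elif state == 3:             # block D: exit
            return total


if __name__ == "__main__":
    print(sum_to_n(10), sum_to_n_flattened(10))
```

In the flattened version, the only way to learn that block C loops back to block B is to trace the values assigned to state. That is the work an analyst has to do for every block when recovering the original flow.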
Resource Obfuscation
Resource obfuscation is simply hiding the files or scripts the program calls. A basic method attackers use is to add a random byte to data sources (such as images or strings), which is then removed at runtime. Of course, any number or sequence of variables can be used—made even more complicated by encrypting the entire called-upon dataset along with the obfuscating variable(s). Decryption occurs at runtime, and the variable is simultaneously removed as commanded at execution. For a tool designed specifically for obfuscating resources, see here. Just as developers do, attackers can perform obfuscation with tools like this to replace file extensions with gibberish, rendering them very confusing to an analyst conducting a static examination. In addition, the same method can be used for malware using dynamically downloaded data, either by altering URL mapping definitions or dynamically generating obfuscated URLs as needed.
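When the only change is a few bytes added in front of a resource, an analyst can often recover it by looking for a known file signature a short distance into the data. The sketch below checks the first few offsets for a handful of well-known magic numbers. The list of formats and the 16-byte search window are assumptions for the example, and the approach does nothing against resources that are also encrypted.

```python
# A few well-known file signatures ("magic numbers").
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"%PDF-": "pdf",
    b"MZ": "pe",  # only two bytes long, so expect occasional false positives
}


def find_hidden_resource(blob: bytes, max_skip: int = 16):
    """Return (bytes_to_skip, file_type) for the first signature found, else None."""
    for skip in range(max_skip + 1):
        for magic, kind in MAGIC.items():
            if blob.startswith(magic, skip):
                return skip, kind
    return None


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n" + b"...image data..."
    print(find_hidden_resource(png))
    print(find_hidden_resource(b"\x7f" + png))
    print(find_hidden_resource(b"\x00" * 32))
```

A non-zero skip value on a file that claims to be something else is a useful lead, since legitimate resources rarely carry stray bytes in front of their headers.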
Conclusion
This is a very basic introduction to obfuscation techniques. As a cybersecurity professional or malware researcher, it is critical to understand obfuscation methodologies and how to search for them through every stage of malware analysis. There is an abundance of tools available including many free resources to assist in unpacking obfuscated malware, but there will be many occasions where the analyst will need to decipher such code manually. Toward that end, here is some recommended reading:
Surreptitious Software by Christian Collberg and Jasvir Nagra
Syntia: Synthesizing the Semantics of Obfuscated Code by Tim Blazytko, Moritz Contag, Cornelius Aschermann, Thorsten Holz
Protecting Software through Obfuscation: Can It Keep Pace with Progress in Code Analysis? by Sebastian Schrittwieser, Stefan Katzenbeisser, Johannes Kinder, Georg Merzdovnik, and Edgar Weippl
A Detection Framework for Semantic Code Clones and Obfuscated Code by Abdullah Sheneamer, Swarup Roy, and Jugal Kalita
Layered Obfuscation: a Taxonomy of Software Obfuscation Techniques for Layered Security by Hui Xu, Yangfan Zhou, Jiang Ming, and Michael Lyu
Detecting Malicious JavaScript Code Based on Semantic Analysis by Yong Fang, Cheng Huang, Yu Su, Yaoyao Qiu
Thumbnail source: SOC Experts - Anand Guru - Malware Analysis - 10 Obfuscation techniques
***
I am a Certified Forensic Computer Examiner, Certified Crime Analyst, Certified Fraud Examiner, and Certified Financial Crimes Investigator with a Juris Doctor and a Master’s degree in history. I spent 10 years working in the New York State Division of Criminal Justice as Senior Analyst and Investigator. Today, I teach Cybersecurity, Ethical Hacking, and Digital Forensics at Softwarica College of IT and E-Commerce in Nepal. In addition, I offer training on Financial Crime Prevention and Investigation. I am also Vice President of Digi Technology in Nepal, for which I have also created its sister company in the USA, Digi Technology America, LLC. We provide technology solutions for businesses or individuals, including cybersecurity, all across the globe. I was a firefighter before I joined law enforcement and now I currently run the EALS Global Foundation non-profit that uses mobile applications and other technologies to create Early Alert Systems for natural disasters for people living in remote or poor areas.