Memory snapshot dataset of a compromised host with malware using obfuscation evasion techniques

This article presents a dataset for studying the detection of obfuscated malware in volatile computer memory. Several obfuscated reverse remote shells were generated using Metasploit-Framework, Hyperion, and PEScrambler tools. After compromising the host, Memory snapshots of a Windows 10 virtual machine were acquired using the open-source Rekall's WinPmem acquisition tool. The dataset is complemented by memory snapshots of uncompromised virtual machines. The data includes a reference for all running processes as well as a mapping for the designated malware running inside the memory. The datasets are available in the article, for advancing research towards the detection of obfuscated malware from volatile computer memory during a forensic analysis.


Data
The dataset includes (4300 positive and 300 negatives) memory snapshots also a.k.a., memory dumps of a compromised Windows 10 virtual machine. The positive dataset consists of three groups according to the payloads employed to compromise the VM. We used the Advanced Forensics Format (AFF4) to store the memory snapshots. AFF4 is an open format for storing forensic disk images and the accompanying information about the data. For every positive memory dump, we extracted the list of all processes and stored it in a text file. In addition, we provided the memory map for the payload used to compromise every VM, and we saved it also in a text file. The memory map shows the virtual address of the page, the corresponding physical offset of the page, and the size of the page. We used several encoded/obfuscated reverse shell executable payloads to compromise the VM. We performed the encoding/obfuscation process using existing Metasploit encryption algorithms in addition to other tools such as Shellter, Hyperion, and PEScrambler.

Experimental design, materials, and methods
The proposed dataset aims at supporting security research that involves analyzing memory snapshots (forensic analysis). By doing so, we can have more accurate information about the applications running in the memory including the behavior of malware if present [1].
Specifications Table   Subject Cyber Security Specific subject area Detection of obfuscated malware from volatile computer memory. Type of data Memory snapshots of a compromised Windows 10 virtual machine. Three groups of memory snapshots were generated based on the following penetration tests: (1) reverse meterpreter shells, (2) Shellter shells, (3) Hyperion and PEScrambler shells. Each memory snapshot is provided with a list of running processes in the system and the memory map of the malicious process. How data were acquired The memory snapshots were acquired using Rekall's WinPmem acquisition tool. The list of all process and the mapping were generated by Rekall's "pslist" and "memmap" plugins. Data format Memory snapshots are in advanced forensics format (AFF4 Value of the data The dataset represents realistic memory snapshots of a Windows 10 virtual machine (VM) running either only benign or a mix of benign and malicious applications. The dataset can be used to train machine learning models to discriminate between benign and malicious activity in the volatile memory. The dataset can be used to validate malware VM detection techniques. The dataset can be used to examine the robustness of malware VM detection techniques against evasion techniques such as code obfuscation, data, and code encryption. It allows performing cross-obfuscation tests: training with one set of obfuscations and test performance with a disjoint set of obfuscations.
To collect these snapshots, we have used Oracle VM VirtualBox [2] with a Windows 10 host operating system to create two VMs, i.e., an attacker's machine (Kali-Linux) and a victim's machine (windows 10). For exploiting the victim machine, we have used the open source Metasploit Framework [3] The idea was to generate several encoded reverse shell executable payloads (32-bit) that implement a reverse TCP connection (Fig. 1). Reverse shells are very relevant in cybersecurity because they can allow an attacker to scan your network internally, install network sniffers, steal valuable information, change computer settings, including passwords and user credentials, perform DDoS attacks on other computers, and the like.
As one of the objectives of this dataset is to assess how detection techniques based on machine learning algorithms can detect obfuscated malware within a computer volatile memory. We have generated the payloads in three different steps as follows. First, we incorporated the encoding capabilities of the Metasploit framework, since the framework provides a different number of encoders for 32-bit executable payloads. Second, we re-encoded the payloads generated in 2.1 using Shellter [4] Third, we re-encoded the payloads generated in 2.1 using Hyperion [9] and then PEScrambler [5] We elaborate on these steps in the following sections.

Memory snapshots: metasploit encoded payloads
In this stage, we have generated the payloads using sixteen "32-bit" encoders (Table 1). Besides, for each encoder, we iterated over ten times. Hence, a total of 160 encoded payloads will be generated.
Although the framework provides other encoders, we have only selected compatible encoders and discarded non-compatible ones. Unselected encoders either yielded broken snapshots, or they did not work in the first place. We generated the payloads via a chain of commands as follows.
The chain of command was used for all encoders given in Table 1. The "shikata_ga_nai" was always used with other encoders because it is the only encoder with the rank of Excellent, a measure of reliability and stability of a module. Options used to generate the payloads are as follows: -p: What type of payload to create (in our case a meterpreter reverse TCP shell) LHOST: What IP address to connect back to LPORT: What TCP port to connect back to (in this case port 4444) -f: What file type to create (in our case windows executable) -e: The designated encoder to use (encoder_name) -i: The number of times to encode a payload (num ¼ 1; /; 10.) -o: Where to redirect the output (in this case to a file called metasploit_payload.exe) Once the payloads were generated, we zipped and transferred them to the victim machine. When the payload is executed on the victim machine, a meterpreter session is created between the attacker and the victim. The meterpreter session was created as follows: Here "LHOST" represented the victim machine. The customized "autoruncommands.rc" enabled us to simulate user's activities between both devices such as uploading files, downloading files, and taking screenshots. Once a payload was running, and a session was opened, snapshots were collected. For every payload, we collected 10 snapshots, while the time between every snapshot is between 2 and 4 minutes. To achieve this goal, we have used the windows memory acquisition tool is a.k.a., WinPmem (version: winpmem-2.1.post4) [6] This process can be performed as follows: Options used to generate the memory snapshots are as follows: -o: Write the output into snapshot.aff4 -t: Truncate the output file The snapshots were stored in "advanced forensic format" (AFF4) while the size of every snapshot is approximately 1 gigabyte. The AFF4 is a compressed format and therefore for extracting any valuable information, this image should be decompressed. Although we have already decompressed all memory dumps we did not provide such decompressed files as the file of each dump separately is about 5 gigabytes. For the decompression process, the Rekall (version: Version 1.7.3.dev54: Hurricane Ridge) Memory Forensic Framework was utilized [7] This process can be performed as follows: Following the decompression process, we extracted a list of all processes "pslist" for every image file as well as the memory map "memmap" for the employed payload (Figs. 2 and 3). These information act as labels to train the machine learning algorithm. The "pslist" is extracted as follows: The "memmap" is extracted as follows: Options used to extract the list of all processes and the memory map are as follows: -profile¼: The name of the profile to load (in our case Win10x64_17134) -f: The raw image to load -proc_regex: A regex to select a profile by name (in our case, these names would be "payload", "pescrambler_en", or "shellter-paylo"). &> where to redirect the output After validating the integrity of the memory dumps (i.e., removing any corrupted files), we ended up with 1530 AFF4 files. The folder containing these files along with their labels can be accessed at the following link: https://drive.google.com/open?id¼14csgcVl_fKjLWoDk0qU7pkF7u1nRxujz. rekall memmap -proc_regex payload_name -f snapshot.aff4.img -profile¼Win10x64_17134 &> memmap.txt rekall -f snapshot.aff4 imagecopy -output-image¼ snapshot.aff4.img rekall pslist -profile¼Win10x64_17134 -f snapshot.aff4.img &> pslist.txt Fig. 3. An example of the memory map for a payload with a process name such as "payload-x86-al".

Memory snapshots: "Shellter" metasploit encoded payloads
Here the payloads generated in 2.1 were re-encoded using "Shellter". It is a dynamic, shellcode injection tool. It can be used to inject shellcode into native Windows applications (32-bit only). "It takes advantage of the original structure of the PE file and doesn't apply any modification such as changing memory access permissions in sections (unless the user wants), adding an extra section with read, write, and execute access, and whatever would look dodgy under an AV scan". We re-encoded Metasploit encoded payloads as follows: Where "-a" refers to an auto mode, "-s" refers to a stealth mode. The auto mode enables Shellter to apply its own encoding. The encoding engine will use a random amount of "XOR", "ADD", "SUB", or "NOT" operation. The stealth mode feature preserves the original functionality of the application while it keeps all the benefits of dynamic PE infection. We followed the same steps mentioned in 2.1 to obtain the memory snapshots. After validating the integrity of the memory dumps, we ended up with 1520 AFF4 files. The folder containing these files along with their labels can be accessed at the following link: https://drive.google.com/open?id¼1MNDg7ntEY3k7wfPLxDq6y9vHG7aZro-Q.

Memory snapshots: "Hyperion & PEScrambler" metasploit encoded payloads
Here the payloads generated in 2.1 were re-encoded using "Hyperion" and then PEScrambler. Hyperion tool is a runtime crypter that can transform a Windows portable executables (PE) into an encrypted version that decrypts itself on startup and executes its original content. PEScrambler is a tool to obfuscate win32 binaries automatically [8] It can relocate portions of the code and protect them with anti-disassembly code. It also defeats static program flow analysis by re-routing all function call through a central dispatcher function [8] The re-encoding commands are performed as follows: Options used to generate the obfuscated payload are as follows: -i: Specify an executable input file (hyperion_payload.exe) -o: Specify an output executable file (pescrambler_payload.exe) After validating the integrity of the memory dumps, we ended up with 1250 AFF4 files. The folder containing these files along with their labels can be accessed at the following link: https:// drive.google.com/open?id¼1gYA7WyZY6MC5WyKI9_Q1uyPRnI0iufmM.
At last, the negative snapshots were collected with only trusted applications were only running in the memory. The folder containing these files along with their labels can be accessed at the following link: https://drive.google.com/open?id¼1J7T4ZRWChEiIBKkL4bq0IEeh2ZeL0NGN.