Neural Reverse Engineering of Stripped Binaries using Augmented Control Flow Graphs

doi:10.5281/zenodo.4099685

Published November 15, 2020 | Version v3

Dataset | Open Access

Affiliation: Technion, Israel

This dataset and the pre-trained models are released as a companion to our OOPSLA '20 publication, "Neural Reverse Engineering of Stripped Binaries using Augmented Control Flow Graphs".

The dataset file (nero_dataset_binaries.tar.gz) is composed of packages of binary executables created by compiling several GNU source-code packages. We used these executables to evaluate our approach, as implemented in our prototype "Nero", and to compare it to other approaches. All executables contain debug information, which serves as the ground truth for the procedure name predictions. The packages are split into three sets: training, validation and test.
1. Each executable file name has the structure "<compiler>-<compiler version>__O<optimization level (u for default)>__<package name>[-<optional package version>]__<executable name>". For example, "gcc-5__Ou__cssc__sccs" is the executable sccs from the cssc package, compiled with gcc 5 at the default optimization level.
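As an illustration, the naming scheme above can be parsed mechanically. The following Python sketch is not part of the released artifacts and its helper names are hypothetical. It assumes that the compiler name contains no hyphen and that an optional package version is a hyphen-separated suffix beginning with a digit; names that deviate from these assumptions may be split differently.

```python
import re
from typing import NamedTuple, Optional


class ExecutableName(NamedTuple):
    compiler: str
    compiler_version: str
    optimization: str          # "u" denotes the default optimization level
    package: str
    package_version: Optional[str]
    executable: str


# Hypothetical pattern for "<compiler>-<version>__O<opt>__<package>[-<ver>]__<exe>".
# Assumptions: the compiler name has no "-" or "_", and a package version is a
# "-"-separated suffix that starts with a digit.
_NAME_RE = re.compile(
    r"^(?P<compiler>[^-_]+)-(?P<compiler_version>[^_]+)"
    r"__O(?P<optimization>[^_]+)"
    r"__(?P<package>.+?)(?:-(?P<package_version>\d[^_]*))?"
    r"__(?P<executable>.+)$"
)


def parse_executable_name(name: str) -> ExecutableName:
    """Split an executable file name into its naming-scheme components."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized executable name: {name!r}")
    # Unmatched optional groups (no package version) come back as None.
    return ExecutableName(**m.groupdict())
```

For the example above, parse_executable_name("gcc-5__Ou__cssc__sccs") yields compiler "gcc", compiler version "5", optimization "u", package "cssc", no package version, and executable "sccs".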
The procedure representation file (procedure_representations.tar.gz) contains:
1. The raw representations of all the binary procedures in the above dataset. Each procedure is represented by one line in the file for its set (training.json, validation.json or test.json).
2. The above representations preprocessed for training.
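A minimal Python sketch for counting the procedures in each set, assuming the archive has been extracted into a local directory and that each non-empty line of training.json, validation.json and test.json is a standalone JSON value (JSON Lines). The function names and the directory layout are hypothetical, and the sketch does not access any field inside a record, since the record schema is not described on this page.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_procedures(path: Path) -> Iterator[Any]:
    """Yield one parsed record per non-empty line (assumed JSON Lines)."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_procedures(directory: Path) -> Dict[str, int]:
    """Count procedures per set; one line is assumed to be one procedure."""
    counts: Dict[str, int] = {}
    for split in ("training", "validation", "test"):
        counts[split] = sum(1 for _ in iter_procedures(directory / f"{split}.json"))
    return counts
```

Streaming line by line keeps memory use bounded, which matters if a set file is too large to load in one json.load call.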
The pre-trained model file (nero_gnn_model.tar.gz) was created using the above preprocessed dataset and contains:
1. Pre-trained model.
2. Training log.
3. Prediction results log.

For the code of the "Nero" prototype and more information about the above artifacts, see our GitHub repo.

Files

Files (149.7 MB)

Name	MD5	Size
nero_dataset_binaries.tar.gz	96ecb494acdee1f723fa5c350b0af846	36.0 MB
nero_gnn_model.tar.gz	2707fd61f9033632c28ac290c81f9fc7	98.7 MB
procedure_representations.tar.gz	61e27d9f9dbabccbb87f2af12890a4e2	15.1 MB
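The MD5 checksums in the table can be used to check the integrity of downloaded archives. A minimal Python sketch using the standard hashlib module, assuming the three archives were saved under their original names in a single directory (the helper names are hypothetical):

```python
import hashlib
from pathlib import Path
from typing import Dict

# MD5 checksums as listed in the Files table of this record.
EXPECTED_MD5 = {
    "nero_dataset_binaries.tar.gz": "96ecb494acdee1f723fa5c350b0af846",
    "nero_gnn_model.tar.gz": "2707fd61f9033632c28ac290c81f9fc7",
    "procedure_representations.tar.gz": "61e27d9f9dbabccbb87f2af12890a4e2",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives are not read at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(directory: Path) -> Dict[str, bool]:
    """Map each expected archive name to True if it exists and its MD5 matches."""
    results: Dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        p = directory / name
        results[name] = p.is_file() and md5_of(p) == expected
    return results
```

For example, verify_downloads(Path(".")) reports, per archive, whether a file with that name is present in the current directory and matches its listed checksum; a missing file is reported as False.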

Additional details

Is documented by: Conference paper: 10.1145/3428293 (DOI); Preprint: arXiv:1902.09122 (arXiv)

