Botnet dataset with simultaneous attack activity

The proposed dataset exhibits simultaneous botnet attack activity. The botnet traffic is recorded as sequentially interconnected bidirectional network flows (binetflow), mixed with normal activity. The dataset is generated by a simulation process that extracts botnet behavior patterns from the CTU-13 and NCC datasets. The extraction results serve as the basis for simulations that produce a new dataset with simultaneous botnet attack activities. The term "simultaneous attack activities" refers to attack activity that involves multiple botnets at the same time. The dataset contains several botnet types distributed over three detection sensors. Each dataset has 18 network header features, with a total recording duration of 8 h. Detecting such distributed bot attacks requires efficient processing, that is, detection based on parallel computation.

© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Value of the Data
• The dataset represents botnet attack activities, including sequentially associated activities that occur simultaneously and are detected on multiple parallel sensors. This addresses a gap in existing datasets, which lack these critical characteristics.
• The dataset helps network administrators and network security researchers analyze, evaluate, and develop new attack detection models based on simultaneous activity observed by several sensors at the same time.
• The dataset can be extended to a parallel detection model for detecting more complex botnet activity. Parallel detection handles distributed bot attacks, which often occur in real systems. Three sub-datasets are obtained from different sensors, each containing normal and various botnet activities, and a fourth sub-dataset combines all three. Together, they can serve as a knowledge base of botnet attack behavior patterns.
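The parallel detection idea can be sketched as one worker per sensor, with results collected by SensorId. This is a minimal sketch under stated assumptions: the `(SensorId, ActivityLabel, BotnetName)` tuple layout, `detect`, and `parallel_detect` are hypothetical, and `detect` merely counts labelled botnet flows as a stand-in for an actual detection model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory flows: one (SensorId, ActivityLabel, BotnetName) tuple per flow.
def detect(flows):
    """Placeholder for a real detection model: counts flows labelled as botnet (1)."""
    return sum(1 for _sensor, label, _name in flows if label == 1)

def parallel_detect(flows_by_sensor):
    """Run one detection job per sensor concurrently; results are keyed by SensorId."""
    workers = max(1, len(flows_by_sensor))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {sid: pool.submit(detect, flows) for sid, flows in flows_by_sensor.items()}
        return {sid: fut.result() for sid, fut in futures.items()}

# Illustrative input for three sensors (not taken from the dataset).
flows_by_sensor = {
    1: [(1, 0, "-"), (1, 1, "Rbot"), (1, 1, "Neris")],
    2: [(2, 1, "Menti")],
    3: [(3, 0, "-")],
}
print(parallel_detect(flows_by_sensor))  # {1: 2, 2: 1, 3: 0}
```

Threads keep the sketch simple; for a CPU-bound detector in CPython, `ProcessPoolExecutor` is the usual drop-in alternative.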

Data Description
The dataset simulates botnet attacks using botnet activities described in CTU-13 [1] and NCC [2]. The simulation extracts all scenarios from these two datasets to determine attack activities, attack phases, and the time difference between attacks and normal activities [3], yielding four scenarios represented by corresponding sub-datasets. The simulation assumes three sensors. This dataset is intended as a basis for developing distributed botnet detection models, which require data sources with simultaneous botnet attack activity, that is, attack activity carried out by more than one botnet type at the same time.
The first scenario consists of five botnet types, Rbot, Neris, Sogo, NSIS.ay, and Virut, detected by sensor 1. The second scenario has four botnet types, Rbot, Neris, Menti, and Virut, collected by sensor 2. The third scenario uses sensor 3 and contains five botnet types: Rbot, Neris, Murlo, NSIS.ay, and Virut. Lastly, the fourth scenario combines the outputs of these three sensors. An example of the output of sensor 1 is depicted in Fig. 1, and the scenarios are described in more detail in Table 1. Each sensor's output is stored in a folder containing three

The proposed dataset, shown in Table 2, comprises 18 network header features representing network traffic. Three of these features are new and are derived from the 15 existing features in the CTU-13 and NCC datasets: ActivityLabel, BotnetName, and SensorId, which describe the activity label, the botnet name, and the ID of the sensor recording the activity, respectively. ActivityLabel takes the value 0 for normal traffic and 1 for botnet activity. BotnetName contains the name of the attacking botnet; for normal activity, the field is blank (-). SensorId identifies the sensor that recorded the network traffic.
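The three added features make per-botnet summaries straightforward. The sketch below counts botnet flows per BotnetName; it assumes (not confirmed by the text) that the file is comma-separated and that the 18 columns are the 15 CTU-13 binetflow columns followed by ActivityLabel, BotnetName, and SensorId. `make_row` and its placeholder values are hypothetical and exist only to build a small test input.

```python
import csv
import io
from collections import Counter

# Assumed column order: the 15 CTU-13 binetflow columns plus the three added features.
HEADER = ["StartTime", "Dur", "Proto", "SrcAddr", "Sport", "Dir", "DstAddr",
          "Dport", "State", "sTos", "dTos", "TotPkts", "TotBytes", "SrcBytes",
          "Label", "ActivityLabel", "BotnetName", "SensorId"]

def botnet_counts(text):
    """Count botnet flows (ActivityLabel == 1) per BotnetName in a binetflow CSV string."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        if int(row["ActivityLabel"]) == 1:
            counts[row["BotnetName"]] += 1
    return counts

def make_row(label, name, sensor):
    """Hypothetical row builder: only the last three fields matter; the rest are placeholders."""
    placeholders = ["2022/01/01 00:00:00.000000", "0", "tcp", "10.0.0.1", "1025", "->",
                    "10.0.0.2", "80", "S_", "0", "0", "1", "60", "60", "flow"]
    return ",".join(placeholders + [str(label), name, str(sensor)])

sample = "\n".join([",".join(HEADER), make_row(0, "-", 1), make_row(1, "Rbot", 1),
                    make_row(1, "Neris", 1), make_row(1, "Rbot", 1)])
print(botnet_counts(sample))  # Counter({'Rbot': 2, 'Neris': 1})
```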
This dataset exhibits simultaneous botnet activity and has more demanding characteristics, in terms of attack intensity and the number of botnet types per sub-dataset, than the sporadic attacks in CTU-13 and the periodic attacks in NCC. With sporadic characteristics, botnet activity tends to peak at random times; with periodic characteristics, it peaks at relatively constant time intervals [4]. With simultaneous characteristics, several botnets attack at the same time, which is substantially more intense than sporadic or periodic attacks. Most detection systems, especially models based on clustering and deep learning, consume considerable resources, which becomes a problem when many detections must be carried out at the same time within a short time frame [5, 6]. Because the proposed dataset contains simultaneous attacks within short periods, a security system evaluated on it must cope with these resource constraints while handling botnet attacks. Different sensors detect the same bot types or attack behaviors in parallel during simultaneous activity, which the CTU-13 and NCC datasets do not capture. CTU-13 records botnet activity without considering the number of sensors, and each of its attack scenarios focuses on a single botnet type. As a result, CTU-13 cannot be used for parallel botnet activity detection models. NCC, on the other hand, is designed to evaluate only associated actions, focusing on the periodic activity of grouped botnets. In this research, these limitations of CTU-13 and NCC are addressed by combining both datasets to produce simultaneous datasets. Table 3 describes the proposed dataset with simultaneous attack characteristics, and Fig. 2 compares the proposed dataset with the CTU-13 and NCC datasets using a per-minute analysis.

Fig. 2(a). Comparison between the generated dataset, CTU-13, and NCC for scenario 3, per minute.
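A per-minute view like the one in Fig. 2 can also be used to flag simultaneous activity. The sketch below buckets botnet flows by minute and reports the minutes in which at least two distinct botnets are active. The "two or more botnets per minute" criterion and the function names are hypothetical, and the `%Y/%m/%d %H:%M:%S.%f` timestamp format is assumed from CTU-13's StartTime field.

```python
from collections import defaultdict
from datetime import datetime

def botnets_per_minute(events):
    """Map each minute to the set of botnets active in it.
    events: iterable of (StartTime string, BotnetName) pairs for botnet flows only."""
    buckets = defaultdict(set)
    for ts, name in events:
        t = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S.%f")
        buckets[t.replace(second=0, microsecond=0)].add(name)
    return buckets

def simultaneous_minutes(events, min_botnets=2):
    """Minutes in which at least `min_botnets` distinct botnets are active (hypothetical criterion)."""
    return sorted(m for m, names in botnets_per_minute(events).items()
                  if len(names) >= min_botnets)

# Illustrative events, not taken from the dataset.
events = [("2022/01/01 10:00:05.000000", "Rbot"),
          ("2022/01/01 10:00:40.500000", "Neris"),
          ("2022/01/01 10:01:10.000000", "Rbot")]
print(simultaneous_minutes(events))  # [datetime.datetime(2022, 1, 1, 10, 0)]
```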

Experimental Design, Materials and Methods
The topology of botnet attack activities is illustrated in Fig. 3. The simulation has three sensors, each observing four to five botnet types. Bot activity is obtained by extracting bot behavior from scenarios 1 to 13 of the CTU-13 and NCC datasets [1, 2]. The extraction process analyzes bot attack behavior and normal behavior [3]; the extraction path is shown in Fig. 4.
All scenarios in the CTU-13 and NCC datasets are extracted to obtain attack activity, attack phase, and the time difference between attack and normal activity. Normal data are recorded sequentially, while bot attack activities are analyzed further, starting with the time difference between attacks. Targeted attacks, such as DDoS, tend to flood network traffic for a specific period [7, 8]. Therefore, the time difference between attacks must be monitored to identify chained attack activity and follow-up attacks. Attacks such as spam behave differently: the interval between successive attacks is more constant. After the time gaps between attacks are examined, the next analysis identifies the attack source and target. This analysis is used to generate a new set of bot-to-target attacks. All findings are saved as a set of new attack stages based on the characteristics of each botnet [1]. Finally, the BotnetName feature is added to identify the botnet from which each attack activity originated.
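The time-gap analysis distinguishes bursty attacks (such as DDoS floods) from attacks with near-constant spacing (such as spam). One way to sketch this, which is our own illustrative choice rather than the authors' stated method, is the coefficient of variation of the inter-attack gaps; the 0.2 threshold, the labels, and the function names are hypothetical, and times are assumed to be in seconds.

```python
from statistics import mean, pstdev

def inter_attack_gaps(times):
    """Gaps (seconds) between consecutive attack start times."""
    ts = sorted(times)
    return [b - a for a, b in zip(ts, ts[1:])]

def gap_pattern(times, cv_threshold=0.2):
    """Label attack timing as 'constant' (spam-like) or 'bursty' (flood-like).
    Uses the coefficient of variation of the gaps; the threshold is hypothetical."""
    gaps = inter_attack_gaps(times)
    if len(gaps) < 2:
        return "insufficient"
    m = mean(gaps)
    if m == 0:
        return "bursty"  # every attack starts at the same instant
    cv = pstdev(gaps) / m
    return "constant" if cv <= cv_threshold else "bursty"

# Illustrative start times in seconds, not taken from the dataset.
spam_like = [0, 60, 120, 181, 240]
ddos_like = [0, 1, 2, 3, 300, 301, 302]
print(gap_pattern(spam_like), gap_pattern(ddos_like))  # constant bursty
```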
The simulation process accepts several parameters, as described in Table 4. The attack simulation starts by assigning particular botnets to each sensor. Each attack activity follows the previously extracted data and is organized according to the attack stage, the interval between attacks, and the attack's characteristics. Attack activities are simulated simultaneously, while normal activities are added sequentially.
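The per-sensor simulation step can be sketched as replaying each botnet's extracted inter-attack gaps on a shared timeline and interleaving them with sequential normal activity. This is a minimal sketch, not the authors' generator: the parameters of Table 4 are not reproduced here, so `botnets`, `duration`, `normal_step`, the random start offset within the first minute, and the `(time, SensorId, ActivityLabel, BotnetName)` event layout are all hypothetical, with times in seconds.

```python
import heapq
import random

def schedule_botnet(name, sensor_id, start, gaps):
    """Replay one botnet's extracted inter-attack gaps starting at `start` seconds."""
    t = start
    events = [(t, sensor_id, 1, name)]
    for g in gaps:
        t += g
        events.append((t, sensor_id, 1, name))
    return events

def simulate_sensor(sensor_id, botnets, duration, normal_step, seed=0):
    """Hypothetical generator. botnets maps BotnetName -> list of gaps (seconds).
    All botnets start within the first minute so that their activity overlaps."""
    rng = random.Random(seed)
    streams = [schedule_botnet(n, sensor_id, rng.uniform(0, 60), g)
               for n, g in botnets.items()]
    # Normal activity is added sequentially at a fixed step.
    normal = [(float(t), sensor_id, 0, "-") for t in range(0, duration, normal_step)]
    # Every stream is already time-sorted, so heapq.merge yields one ordered timeline.
    return [e for e in heapq.merge(normal, *streams) if e[0] < duration]

events = simulate_sensor(1, {"Rbot": [10, 10, 10], "Neris": [5, 5]},
                         duration=120, normal_step=30)
print(len(events))  # 11
```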
The simulation results are saved as bidirectional network flow (.binetflow) files, each representing the data obtained from a specific sensor. Finally, the three binetflow files from the different sensors are combined; all resulting datasets are shown in Fig. 5.
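Combining the three sensor files into one sub-dataset amounts to a time-ordered merge. The sketch below assumes, without confirmation from the text, that each per-sensor file is comma-separated, already sorted by StartTime, and uses a fixed-width, zero-padded timestamp so that string order matches time order. The function names and the two-column sample files are hypothetical.

```python
import csv
import heapq
import io

def read_flows(text):
    """Parse one per-sensor binetflow CSV string into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def combine_sensors(texts, key="StartTime"):
    """Merge already-sorted per-sensor flows into one time-ordered list of rows."""
    streams = [read_flows(t) for t in texts]
    return list(heapq.merge(*streams, key=lambda row: row[key]))

def to_csv(rows):
    """Serialize merged rows back to CSV text (empty string if there are no rows)."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Illustrative two-column files standing in for the three sensor outputs.
s1 = "StartTime,SensorId\n2022/01/01 10:00:01.000000,1\n2022/01/01 10:00:05.000000,1\n"
s2 = "StartTime,SensorId\n2022/01/01 10:00:03.000000,2\n"
s3 = "StartTime,SensorId\n2022/01/01 10:00:00.000000,3\n"
combined = combine_sensors([s1, s2, s3])
print([row["SensorId"] for row in combined])  # ['3', '1', '2', '1']
```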

Table 1
Simulation scenario for each sensor.

Table 2
Details of each feature identifying network traffic.

Table 3
Detailed activity recorded for each sensor.

Table 4
Parameters of dataset generator.