Boğaziçi University distributed denial of service dataset

Distributed Denial of Service (DDoS) attacks are among the most troublesome intrusions for online services on the internet. In general, DDoS attacks are divided into two categories: bandwidth depletion and resource depletion attacks. We generated resource depletion-type DDoS attacks on the campus network of Boğaziçi University and recorded the ongoing traffic from the backbone router's mirrored port. We generated TCP SYN and UDP flooding packets using the Hping3 traffic generator software. The dataset includes both attack-free user traffic and attack traffic, which makes it suitable for evaluating network-based DDoS detection methods. The attacks target one victim server connected to the backbone router of the campus. Attack packets have randomly generated spoofed source IP addresses. We removed the payloads of packets and anonymized the source IP addresses of legitimate users to protect their confidentiality.


Specifications
Computer Science: Computer Networks and Communications, Computer Vision and Pattern Recognition, Information Systems.

Specific subject area
Resource depletion type DDoS attacks, TCP SYN flood, and UDP flood. Attack-free data in the dataset is the traffic of more than 4000 real internet users. The payloads of packets are removed, and the IP addresses of the users are changed to protect the confidentiality of users and the university.

Description of data collection
Data were collected from a mirrored backbone switch port using Wireshark software in .pcap file format. These files were then converted to .csv format, and packet payloads were removed for anonymization.

Data source location
Boğaziçi University, İstanbul, Turkey.

• [10, 11], the BOUN dataset includes legitimate background internet traffic mixed with DDoS attack traffic. In addition, the BOUN datasets provide easier simulation and analysis because of small file sizes and fewer packets compared to other datasets [10, 11, 12].
• Attack and legitimate traffic packets can easily be separated from each other using the destination IP addresses of packets. Attack-free packets in the datasets can be used for traffic analysis, or combined methods with another attack dataset can be evaluated [13].
• Datasets are given in comma-separated file format, including header information of packets, to help researchers easily import the datasets into different research software platforms.

Data description
The design concept of network-based intrusion detection systems is to detect attacks from the network's end, on the router or on the backbone switch. This dataset was produced for the evaluation of network-based intrusion detection methods. In the network topology shown in Fig. 1, the traffic is taken from the campus router's port by mirroring. The mirroring operation on the router's interfaces provides our traffic recording server with exact copies of the incoming and outgoing packets flowing through the mirrored interface. Traffic is recorded and converted to .csv file format using Wireshark software.
The dataset includes two different attack scenarios. In both scenarios, randomly generated spoofed source IP addresses are used in a flooding manner. For TCP flood attacks, TCP port 80 is used as the destination port. Each recording lasts 8 min. In each of them, an 80 s waiting period is followed by a 20 s attack period. Different packet rates are used so that researchers can evaluate their detection methods at different attack intensities. The TCP SYN flood and UDP flood datasets include attack rates of 1000, 1500, 2000, and 2500 packets/second, respectively. The topology of the network used to obtain the attack datasets is given in Fig. 1. Both legitimate and DDoS attack traffic are mirrored to the recording server.
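The timing described above can be turned into a small timetable for checking a detector's alarms against the attack periods. This is a minimal sketch, not part of the dataset tooling. It assumes that each recording starts at t = 0 s with a waiting period, that the four rates occur in ascending order, and that the rates are nominal values; the function names are hypothetical.

```python
# Sketch of the attack timetable. Assumptions (not stated in the text):
# the recording starts at t = 0 s with a waiting period, and the four
# rates are used in ascending order, one per attack period.
WAIT_S = 80                              # waiting period before each attack
ATTACK_S = 20                            # duration of each attack period
RATES_PPS = [1000, 1500, 2000, 2500]     # nominal attack rates (packets/s)
RECORDING_S = 8 * 60                     # total recording length


def attack_windows(wait_s=WAIT_S, attack_s=ATTACK_S, rates=RATES_PPS):
    """Return a list of (start_s, end_s, rate_pps), one per attack period."""
    windows = []
    t = 0
    for rate in rates:
        start = t + wait_s
        end = start + attack_s
        windows.append((start, end, rate))
        t = end
    return windows


def is_attack_time(t_s, windows):
    """True if time t_s (seconds from start) falls inside an attack window."""
    return any(start <= t_s < end for start, end, _ in windows)


windows = attack_windows()
# Nominal number of attack packets per period: rate multiplied by duration.
nominal_packets = [rate * (end - start) for start, end, rate in windows]
```

Under these assumptions the last attack period ends at 400 s, which fits inside the 8 min (480 s) recording.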
Attack packets can be distinguished from attack-free packets using the destination IP address of packets. The victim IP address is 10.50.199.86. Fig. 1 shows the network topology used in the generation of the dataset. We carried out the TCP SYN flood and UDP flood attacks towards a server connected to the campus backbone. The traffic of over 4000 active internet users was flowing through the campus router at the same time as the attack traffic.
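Labeling by destination address can be sketched in a few lines. The example below is illustrative only: the column name `Destination_IP` is an assumption (check it against the header of the actual files), and the two sample rows are made up.

```python
import csv
import io

VICTIM_IP = "10.50.199.86"  # victim address stated in the text


def label_rows(rows, dest_col="Destination_IP"):
    """Yield (row, is_attack) pairs.

    dest_col is an assumed column name; adjust it to the real CSV header.
    """
    for row in rows:
        dest = (row.get(dest_col) or "").strip()
        yield row, dest == VICTIM_IP


# Made-up two-row sample standing in for one of the CSV files.
SAMPLE = """Source_IP,Destination_IP
10.1.2.3,10.50.199.86
10.9.8.7,10.20.30.40
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
labels = [is_attack for _, is_attack in label_rows(reader)]
```

With this rule, any legitimate packet addressed to the victim server would be labeled as attack traffic as well.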
We used the hping3 software installed on 3 computers for the attacks. Attack packets contain spoofed source IP addresses. Since the source IP addresses of the attack packets are generated randomly and uniquely, the attacks appear to come from many different sources when viewed from the router's port. In other words, the attack packets in the dataset appear to come from multiple sources.
Datasets are given as two tables in comma-separated value (csv) file format. The files are named BOUN_TCP_Anon.csv, corresponding to the TCP SYN flood attacks, and BOUN_UDP_Anon.csv, corresponding to the UDP flood attacks. The tables in the files of the dataset have the following columns:
6 Source_Port: Source TCP port of the packet. If it is not a TCP packet, this field is empty.
7 Destination_Port: Destination TCP port of the packet. If it is not a TCP packet, this field is empty.
8 SYN: This value is "Set" if the packet is a TCP packet and its SYN flag is equal to one; it is "Not Set" if the packet is a TCP packet and its SYN flag is equal to zero. If the packet is not a TCP packet, this field is empty.
9 ACK: This value is "Set" if the packet is a TCP packet and its ACK flag is equal to one; it is "Not Set" if the packet is a TCP packet and its ACK flag is equal to zero. If the packet is not a TCP packet, this field is empty.
10 RST: This value is "Set" if the packet is a TCP packet and its RST flag is equal to one; it is "Not Set" if the packet is a TCP packet and its RST flag is equal to zero. If the packet is not a TCP packet, this field is empty.
Table 1 and Table 2 give some statistics and information about the attacks in the datasets. Each attack dataset contains 4 attack instances. The columns of the tables are explained as follows:
• Attack Period: There are 4 attack periods for TCP SYN and UDP flood datasets.
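The three-valued flag encoding in the SYN, ACK, and RST columns ("Set", "Not Set", or empty) can be decoded as sketched below. The helper names are hypothetical and the sample rows are made up; only the column names and the flag strings come from the description above.

```python
def parse_flag(value):
    """Map "Set" -> True, "Not Set" -> False, empty -> None (not a TCP packet)."""
    text = (value or "").strip()
    if text == "Set":
        return True
    if text == "Not Set":
        return False
    if text == "":
        return None
    raise ValueError(f"unexpected flag value: {value!r}")


def is_bare_syn(row):
    """True for a TCP packet with SYN set and ACK not set (a connection request)."""
    return parse_flag(row.get("SYN")) is True and parse_flag(row.get("ACK")) is False


# Made-up rows illustrating the three cases.
rows = [
    {"SYN": "Set", "ACK": "Not Set", "RST": "Not Set"},  # connection request
    {"SYN": "Set", "ACK": "Set", "RST": "Not Set"},      # SYN-ACK reply
    {"SYN": "", "ACK": "", "RST": ""},                   # non-TCP packet
]
bare_syn_count = sum(is_bare_syn(r) for r in rows)
```

Keeping the empty case as `None` rather than `False` preserves the distinction between a TCP packet with a cleared flag and a packet that is not TCP at all.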

Experimental design, materials and methods
We used the same network topology shown in Fig. 1 to create the UDP and TCP SYN flood datasets. The setup differs only in the generated attack packets for UDP and TCP SYN flood attack datasets. We used hping3 software to generate attack packets with randomly generated spoofed source IP addresses.
Network-based intrusion detection systems aim to detect intrusions by monitoring traffic to and from all devices. They perform detection by analyzing all traffic passing through the gateway of the user networks. They are generally connected to the gateway of the network or the backbone router.
We produced the BOUN DDoS dataset to evaluate network-based intrusion detection approaches. We recorded the network traffic from the mirrored router port. Port mirroring on the backbone router sends a copy of all network packets seen on the mirrored router port to another interface for monitoring purposes.
Wireshark software running on a server with the Windows operating system was used to record the traffic. Traffic was initially saved in .pcap file format and then converted into .csv file format to make it usable in research software applications. Payloads of packets were deleted, and source IP addresses were replaced with class A virtual IP addresses using text editing software to preserve the confidentiality of end-users.
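The address replacement step can be illustrated with a consistent mapping from each real source address to an address in a class A range. This is only a sketch under assumptions: the text says the replacement was done with text editing software and does not describe the mapping scheme, so the sequential assignment, the 10.0.0.0/8 range, and the class name below are all hypothetical.

```python
import ipaddress


class SourceAnonymizer:
    """Map each distinct source IP to a stable address in a class A range.

    Illustrative only: the sequential scheme and the 10.0.0.0/8 default
    are assumptions, not the procedure used to build the dataset.
    """

    def __init__(self, network="10.0.0.0/8"):
        # hosts() yields usable addresses in order, starting at 10.0.0.1.
        self._hosts = iter(ipaddress.ip_network(network).hosts())
        self._mapping = {}

    def anonymize(self, real_ip):
        """Return the replacement address, reusing it for repeated inputs."""
        if real_ip not in self._mapping:
            self._mapping[real_ip] = str(next(self._hosts))
        return self._mapping[real_ip]


anonymizer = SourceAnonymizer()
first = anonymizer.anonymize("192.0.2.10")      # documentation-range example
second = anonymizer.anonymize("198.51.100.7")   # documentation-range example
repeat = anonymizer.anonymize("192.0.2.10")     # same input, same output
```

Reusing the same replacement for a repeated input keeps per-user traffic patterns intact while hiding the real address.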

Ethics statement
This work does not involve any human subjects or animal experiments. In addition, the data is anonymized, and the payload of the packets is removed in order to protect the confidentiality of users.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.