Optimization of a Depiction Procedure for an Artificial Intelligence-Based Network Protection System Using a Genetic Algorithm

The current demand for remote work, remote teaching and video conferencing has brought a surge not only in network traffic


Introduction
The unprecedented increase in demand for remote work, remote teaching and video conferencing has brought a surge not only in network traffic, but unfortunately, in the number of attacks as well. According to [1], these attacks have surged by 800%. Having reliable, safe and secure functionality of various network services has never been more important.
Cybersecurity covers three different fields: traditional network security, which deals with protecting physical communication links and networking devices; network monitoring and threat detection; and protection against cybercrimes, such as ransomware attacks and identity theft [2].

Communication Networks Threats
Attacks on communication networks are becoming so sophisticated that they now use artificial intelligence (AI) in combination with machine learning and automation. These technologies make attacks faster, broaden their impact and make them more difficult to detect, especially if masking is used to conceal the attack. Tools for launching AI-based attacks can be easily downloaded, and because their use requires no deep knowledge, such attacks can be initiated by practically anyone. Finally, modern Internet of Things (IoT)

Disadvantages of AI-Based Protection
AI-based cybersecurity solutions have disadvantages as well. First, achieving proper functionality requires rigorous training of the AI. Such training can be accomplished only on a large set of clean data. These data have to provide sufficient variability to allow optimal learning. For data to be clean, it must be possible to assume that they contain no unwanted traffic patterns that could be misused by an attacker. If such patterns were present, the AI could learn to treat them as legitimate, which could lead to a so-called AI-proof attack that would not be recognized by the protection solution. Unfortunately, the extent of the data does not allow manual checking and detailed verification of its content.
A second disadvantage consists of the immense demands placed upon computational hardware resources. AI uses demanding algorithms that require considerable capacity. Finally, the initial configuration and subsequent maintenance might require deeper knowledge and more work than do traditional network protection tools.

Summary
In conclusion, there exists a consensus [6][7][8] that AI works best when combined with traditional protection methods and when it is deployed to assist humans but not with the intention of replacing them completely.

Integration of AI-Based Cybersecurity Protection
The basic approach to integrating an AI-based cybersecurity protection system into a traditional communication network utilizes a dedicated general device, such as a server or networking device, based on traditional x86 CPU architecture and an open UNIX-based operating system. Such a device provides a flexible platform for implementing the system. As networking devices of traditional networks typically use proprietary software and custom hardware, they cannot in this case be used for the system integration.

Software-Defined Networking
Software-defined networking (SDN) is a viable paradigm that combines general networking devices with a software-based controller and allows integration of traditional networking. This concept is based on physically separating the forwarding plane, which remains on general networking devices, from the control plane, which is placed in a dedicated server in the form of a software application (termed an SDN controller).
The SDN controller manages all networking functions by delivering instructions to networking devices while providing centralized network management to system operators. The controller can be extended by custom-made applications to achieve a required functionality, such as cybersecurity.

Use of SDN to Integrate AI-Based Protection
The programmability of SDN allows extension of the controller with AI-based protection functionality, as demonstrated in [9]. This functionality does not require dedicated hardware devices and can be updated at any time. This leads to a more robust and comprehensive solution than the approach of traditional networks, which utilizes dedicated hardware devices and proprietary software [10]. SDN can be used for all areas of cybersecurity, from anomaly detection to threat mitigation.

Related Work
State-of-the-art research in the area of AI-based SDN protection can be classified into several fields: anomaly detection, protection against (D)DoS (Distributed Denial of Service), and performance.

Anomaly Detection
The key functionality of a protection system consists of the ability to detect anomalies and to provide effective network monitoring. A large amount of traffic requires the detection to be fully automated. Use of SDN offers a unique advantage in terms of traffic statistics. These statistics are automatically collected on all SDN-enabled networking devices and can be utilized for anomaly detection, as was presented in [11]. This approach eliminates the need to utilize dedicated devices for performing sampling-based detection, and therefore reduces cost and simplifies the network topology.
Collected statistics can be used to classify data flows and to perform filtering actions, such as blocking the traffic. This was researched in [12], where the authors developed a framework called ATLANTIC. That framework combined information theory and machine learning to calculate deviations in flow table entropy and to perform automated mitigation.
Collecting statistics on networking devices has two significant limitations. Firstly, the data are aggregated, and secondly, they are available only for lower networking layers (not the application layer). This flaw can be mitigated by forwarding the traffic via the SDN controller. This was demonstrated in the SDN-based firewall developed by Qiumei et al. [13]. The application layer firewall used supervised machine learning in combination with a typical binary classification problem to filter the traffic. The firewall achieved a relatively high detection accuracy of 96.79% and very low average latency of 0.2 ms.

Protection against (D)DoS
Availability is one of the most important aspects in relation to computer networks. Attacks on availability try to disrupt the service by overwhelming the device or software resources. This makes the service inaccessible for legitimate users. These attacks can be classified into traditional DoS attacks, where only a single attacker is pursuing the attack, and distributed versions ((D)DoS), where the attacker utilizes a large number of devices (often previously compromised devices, so-called zombies) to carry out the attack. Protection against these attacks is very complicated and typically does not provide 100% reliability.
The first step of every protection is to detect the attack. Bhushan et al. [14] described use of SDN in a cloud environment for this detection with very low communication and computation overheads. A more advanced detection of (D)DoS and port scan attacks was described in [10]. This solution used a combination of two detection methods: discrete wavelet transform and random forest.
The second step is to take an automated mitigating action, dramatically shortening what would otherwise be the slow response time attainable by a human operator. A combination of SDN together with network functions virtualization (NFV) was used for this purpose in [15]. Those authors used threshold-based classification, and on this basis a filtering action was taken such that the corresponding packets were dropped with a probability ranging from 10% to 100%. Another approach to automated protection was described in [16]. Those authors developed a framework called ArOMA that integrated traffic monitoring, anomaly detection and mitigation. This solution fully utilized the SDN advantage to integrate all these functions without the need for a dedicated hardware device or installation of specific software.
Efficient detection and mitigation of (D)DoS attacks will be especially important in future large-scale networks integrating IoT devices. As presented by the authors of [11], utilization of SDN and automatically collected statistics stored on networking devices is more efficient and can achieve greater detection accuracy than traditional sampling-based detection methods.

Performance
SDN is based on software processing, which, by its nature, is always slower than is hardware-accelerated processing of traditional networking devices. On the other hand, SDN provides features that can eliminate slower software processing. This includes proactive insertion of flow rules and use of hardware-accelerated tables for storing and using these rules.
The first issue was analyzed in [17], where the author determined that use of default reactive insertion of flow rules increases latency by 4 ms in typical network scenarios. Use of the proactive method reduced the overall latency to 0.1 ms.
Research on utilizing hardware-based tables for storing firewall rules on networking devices was reported in [18]. This approach achieved a 23-fold better performance over a typical software-based firewall. It is important to consider the limited capacity of these hardware-based flow tables, as this can be insufficient in large-scale networks such as IoT and clouds. This was addressed in [19], where the authors presented an algorithm for distribution of security policies in these scenarios.

Materials and Methods
As we specified in Section 1.3, the programmability of SDN allows us to extend the SDN controller with an AI-based protection functionality. One of the obvious applications of AI is the advanced traffic handling, and consequently, automated filtering during an SDN-based firewall's operation. The AI element functionality is often supposed to work as a decision element. In most cases, the AI element helps to determine one of the decision states according to incoming flow characteristics.
In this article, we introduce a specific neural network-based decision procedure that can be considered for application in any flow characteristic-based traffic handling controller. For the sake of being concise and transparent, we consider the decision element state space to be composed of the following items: allow, block, forward to selected ports, application layer inspection and four levels of quality of service (QoS) settings (low, normal, high and critical). For the same reason, we consider only L4 communication, as described in Section 3.2.1. Nevertheless, the decision element, which includes other possible communication protocols and an extended set of items in the state space, can be developed using the same procedure.
Therefore, in order to demonstrate the idea of this contribution, we aim to develop a decision procedure capable of making a decision based upon the incoming flow characteristics defined by both source and destination IPv4 addresses and also by the amount of traffic expressed as packets per second. Each IP address is composed of four octets, which are used as unique inputs. In addition, source and destination port numbers are considered as relevant inputs. Hence, 11 inputs specify the decision.
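The composition of the 11 inputs can be illustrated with a short sketch. The function name `flow_to_inputs`, the ordering of the inputs and the sample flow are assumptions made for illustration only; the text fixes only which quantities are used, not how they are ordered or scaled.

```python
from ipaddress import IPv4Address

def flow_to_inputs(src_ip, dst_ip, src_port, dst_port, packets_per_second):
    """Return the 11 decision inputs: 4 + 4 address octets, 2 ports, 1 rate."""
    src_octets = list(IPv4Address(src_ip).packed)  # four integers in 0..255
    dst_octets = list(IPv4Address(dst_ip).packed)
    # Assumed ordering: source octets, destination octets, ports, traffic rate.
    return src_octets + dst_octets + [src_port, dst_port, packets_per_second]

# Hypothetical flow used only to show the shape of the input vector.
inputs = flow_to_inputs("192.168.1.10", "10.0.0.5", 49152, 443, 120)
print(inputs)  # [192, 168, 1, 10, 10, 0, 0, 5, 49152, 443, 120]
```

Any scaling of these values before they are passed to a neural network is left out of the sketch.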
With an artificial neural network as the engine for the decision procedure, we briefly describe below three hypotheses for developing the traffic handling controller described above.

Feedforward Neural Network Applied Directly to Map Input-Output Dependency
During the past three decades, feedforward multilayer neural networks with dense layers have proven themselves to be specifically competent for input-output mapping problems. If a feedforward neural network meets specific conditions [20,21], it is able to solve any input-output problem to any degree of accuracy. Hence, in this case, a feedforward neural network is supposed to determine the decision state from the incoming flow characteristic as shown in Figure 1. This hypothesis is closely dealt with in the authors' previous works [9,22] and is used here for the purpose of comparing the results.

A Convolutional Neural Network and Depiction Applied to Map Input-Output Dependency
With current possibilities in hardware acceleration of parallel computing, convolutional neural networks (CNNs) are considered to constitute a leading topology among neural networks. In addition to dense layers of classical feedforward neural networks, CNNs include a convolutional layer that extracts features from the input signal. A good summary of CNNs can be found in [23]. A list of well-known CNN topologies is summarized in [24]. Therefore, in this case, a convolutional neural network is expected to determine the decision state from the incoming flow characteristic, as demonstrated in Figure 2. It is generally accepted that CNNs perform particularly efficiently when applied to multidimensional data processing. Image processing is one of the most recognizable examples [25]. Hence, it appears worthwhile to find an operation that transforms the 11 inputs (incoming flow characteristics) into a two- or three-dimensional structure, preferably a graphical figure. This operation is further referred to as depiction. A polar line chart is one of the suggested transformations, as demonstrated in Figure 3. This hypothesis, like the previous one, is analyzed in the authors' earlier work [22] and is used here for the purpose of comparing the results.

Convolutional Neural Network with Optimized Depiction Procedure Applied to Map Input-Output Dependency
This hypothesis is an expansion of the previous one and is the point of this contribution. We expect that the depiction procedure stated above strongly affects the accuracy of the decision determination. Therefore, it should be optimized in order to provide an ideal medium that supplies incoming flow characteristics to the CNN. In this contribution, we propose an approach for depiction procedure optimization as shown in Figure 4. The approach involves repeated CNN training utilizing the data obtained by various depiction procedures. The procedure itself, meanwhile, is tuned according to the performance of the trained CNN. This hypothesis is defined in detail in the following section. The testing experiments are then presented, and the results are then compared to those from other approaches at the end of the article.

Depiction Procedure Optimization
The key step of this contribution is to determine the optimal procedure for depiction. In our case, the process of depiction was supposed to transform 11 values of incoming flow characteristics into a 2D array suitable for processing by the CNN. In addition, the process was meant to be quick enough to be used in SDN. Therefore, after pilot testing of several approaches, we decided to apply a depiction procedure based on planar rotation of annuli of the original medium image, since this geometric transformation can be implemented in a very efficient way. To be specific, the medium image was divided into concentric annuli. The number of annuli equaled the number of parameters. Then, each annulus was rotated depending on the value of the corresponding parameter. An example with four parameters is depicted in Figure 5. The quality of depiction is strongly affected, however, by the content of the medium image. In other words, a correctly defined medium image will provide a much more readable input to the CNN than will a random medium image. On the other hand, the "correctness" of the medium image is not directly observable and must be determined by evaluating the performance of the whole decision element. Hence, the medium image pattern is the subject of optimization.
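The annulus rotation can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: the annuli have equal width, each parameter is already scaled to [0, 1] and rotates its annulus by that fraction of a full turn, pixels are fetched by nearest-neighbor lookup, and the corners outside the outermost annulus are left unchanged. The function name `depict` is hypothetical. Under the equal-width assumption, a 110 × 110 medium image with 11 parameters would give annuli 5 pixels wide.

```python
import math

def depict(medium, params):
    """Rotate concentric annuli of a square image, one annulus per parameter.

    medium: square list of lists of gray values.
    params: values scaled to [0, 1]; a value p rotates its annulus by p * 360 degrees.
    """
    size = len(medium)
    center = (size - 1) / 2.0
    ring_width = (size / 2.0) / len(params)  # assumption: equal-width annuli
    out = [row[:] for row in medium]
    for r in range(size):
        for c in range(size):
            dy, dx = r - center, c - center
            radius = math.hypot(dx, dy)
            ring = int(radius // ring_width)
            if ring >= len(params):
                continue  # corners outside the outermost annulus stay unchanged
            # Inverse mapping: fetch the source pixel that rotates onto (r, c).
            angle = math.atan2(dy, dx) - 2.0 * math.pi * params[ring]
            src_r = int(round(center + radius * math.sin(angle)))
            src_c = int(round(center + radius * math.cos(angle)))
            src_r = min(max(src_r, 0), size - 1)
            src_c = min(max(src_c, 0), size - 1)
            out[r][c] = medium[src_r][src_c]
    return out

# Demo: an 8x8 ramp image and four parameters already scaled to [0, 1].
medium = [[8 * r + c for c in range(8)] for r in range(8)]
unchanged = depict(medium, [0.0, 0.0, 0.0, 0.0])
half_turn = depict(medium, [0.5, 0.5, 0.5, 0.5])
```

With all parameters at zero the image is returned unchanged, and with all parameters at 0.5 every annulus is turned by 180 degrees. The cost is one lookup per pixel, in line with the efficiency argument above.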
As the image pattern, which affects the CNN, should be the result of the optimization process, it would be difficult to adapt classical techniques from mathematical optimization to provide the optimal solution. On the other hand, the family of stochastic population-based optimization techniques (evolutionary algorithms) seems to be a good fit, because these techniques do not require evaluating derivatives or a gradient of the objective function. To justify this selection, in the following paragraphs, we briefly review evolutionary algorithms applied in cybersecurity.

Work Related to Applications of Evolutionary Algorithms in Cybersecurity
Evolutionary algorithms, together with other soft computing techniques, have proven themselves to have great potential in the cybersecurity field. Even articles more than 10 years old describe evolutionary algorithms as successful and efficient tools, especially for intrusion detection [26][27][28]. In these works, evolutionary algorithms were used for deriving classification rules. In other cases, evolutionary algorithms were used instead to select optimal parameters of some core functions within which other methods were used to derive the rules [29].
In more recent works, there has been a focus especially on (D)DoS protection systems developed using evolutionary algorithms and other artificial intelligence techniques. The advantage of these systems lies in their ability to learn from current data. Hence, these systems are able to prevent attacks even if the attackers implement different traffic patterns [30,31].
A different domain is addressed in [32,33]. Dennis Garcia et al. provide a cybersecurity project for developing network defense strategies through modeling adversarial network attack and defense dynamics in peer-to-peer networks via coevolutionary algorithms.
Apart from the mentioned usages, evolutionary algorithms are used in addressing many of today's most recent issues within networking, such as routing, quality of service, load balancing, bandwidth allocation and channel assignment [34,35].

Genetic Algorithm for Medium Image Pattern
A genetic algorithm (GA) is probably the most commonly encountered member of the evolutionary algorithms family. It is a stochastic search method for finding a near optimum solution based on a natural selection process and genetics [36]. The GA uses a population of chromosomes representing possible solutions to the problem. In each generation, the GA creates a new set of possible solutions by selecting chromosomes according to their level of fitness. The selected chromosomes are then bred together using genetic operators, mainly crossover and mutation. This iterative process is expected to lead to better solutions.
The particular parts of the GA, and how we implement them to solve our problem, are summarized below. Note that each step depends on many tunable parameters. As it would be impossible to set each of them analytically, most of them are selected based on limited pilot studies performed during the experiments.

Solution Representation
As mentioned above, each chromosome represents a possible solution to the problem. We want to determine an optimal pattern of a medium image for the depiction procedure. Hence, the chromosome is represented by a 2D square array with 110 rows and columns. We believe that this size is a compromise reached by considering computational complexity and gains in accuracy. In addition, this size can be processed by all the selected CNNs (see Section 3.2). Each cell in a 2D array is filled by a value in the range <0, ..., 255>. Therefore, this array can be visualized as an 8-bit grayscale image.

Fitness Evaluation
Each chromosome in the population needs to be evaluated by its fitness level in order to perform selection. In our case, the chromosome represents a medium image for depiction procedure. This procedure affects the input into the CNN, and consequently, the performance of the CNN. Therefore, each individual was evaluated by the performance of the CNN trained using our dataset (see Section 3.2.1), transformed using a specific depiction procedure. The process of evaluation is shown in Figure 6. The final fitness was evaluated as the mean value of the best performances of each particular CNN. The performance is defined as a categorical cross entropy loss function.
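A minimal sketch of how such a fitness value could be aggregated is shown below. It reads "best performance" as the lowest loss among the repeated trainings of one CNN, which is an interpretation, and the loss values and the function name `fitness` are hypothetical. Three CNNs with five trainings each are used, matching the setup described in this section.

```python
from statistics import mean

def fitness(losses_per_cnn):
    """Mean over CNNs of the best (lowest) loss among the repeated trainings."""
    return mean(min(runs) for runs in losses_per_cnn.values())

# Hypothetical losses from five trainings of each of three CNNs.
losses = {
    "LeNet-5": [0.30, 0.25, 0.40, 0.28, 0.35],
    "AlexNet": [0.20, 0.22, 0.15, 0.30, 0.18],
    "VGG-16": [0.10, 0.12, 0.20, 0.11, 0.14],
}
chromosome_fitness = fitness(losses)  # mean of 0.25, 0.15 and 0.10
print(chromosome_fitness)
```

Taking the best run per network before averaging damps the run-to-run noise of training, while averaging over networks avoids tuning the medium image to a single architecture.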
Evidently, the fitness evaluation procedure is a highly stochastic process, especially due to the CNN's training. Therefore, we decided to train three well-established CNNs in one evaluation. Each CNN was trained five times. Hence, 15 training processes were performed during one fitness level evaluation. We expected this to be sufficient for suppressing the stochasticity of the procedure. As CNNs, we selected LeNet-5 [37,38], AlexNet [39] and VGG-16 net [40]. Note that because the fitness level is based on a loss function, lower fitness means a better chromosome. All the parameters are summarized in Table 1.

Population Initialization
Population initialization is the starting point of the GA. During initialization, all chromosomes in the initial population are set. We select the heuristic initialization method as follows:
• 20% of chromosomes set as an array of random integers in range <0, ..., 255>;
• 20% of chromosomes set as an array of random integers in range <0, ..., 63>;
• 20% of chromosomes set as an array of random integers in range <0, ..., 127>;
• 20% of chromosomes set as an array of random integers in range <128, ..., 255>;
• 20% of chromosomes set as an array of random integers in range <192, ..., 255>.
Each chromosome is then filtered by a Gaussian filter, where the standard deviation σ is set randomly in range <0, ..., 10>.
Some examples of initial chromosomes are shown in Figure 7.
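The initialization can be sketched in plain Python as follows. The five value ranges, the equal 20% shares and the random σ come from the description above; the separable Gaussian blur with a kernel truncated at 3σ, the edge handling by replication and all function names are assumptions made so that the sketch is self-contained. A small population of small images is used to keep the demo fast.

```python
import math
import random

RANGES = [(0, 255), (0, 63), (0, 127), (128, 255), (192, 255)]

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at three standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, replicating the edge pixels."""
    radius = len(kernel) // 2
    n = len(img[0])
    return [[sum(w * row[min(max(c + k - radius, 0), n - 1)] for k, w in enumerate(kernel))
             for c in range(n)] for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_filter(img, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    if sigma <= 0:
        return [row[:] for row in img]
    kernel = gaussian_kernel(sigma)
    blurred = transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))
    return [[int(round(v)) for v in row] for row in blurred]

def init_population(pop_size, size, rng):
    population = []
    for i in range(pop_size):
        low, high = RANGES[i * len(RANGES) // pop_size]  # 20% of chromosomes per range
        img = [[rng.randint(low, high) for _ in range(size)] for _ in range(size)]
        population.append(gaussian_filter(img, rng.uniform(0, 10)))
    return population

# Demo with 10 chromosomes of 16 x 16 pixels instead of 200 of 110 x 110.
population = init_population(pop_size=10, size=16, rng=random.Random(0))
```

Because the blur is a weighted average, every filtered chromosome stays within the value range it was drawn from.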

Selection
Selection is that step of the GA where individual chromosomes are selected from a population for breeding. We choose to use tournament selection. To be more specific, n tournaments are arranged, where n is the number of chromosomes within a population. Four chromosomes randomly selected from the population participate in each tournament. Eventually, the winner of each of n tournaments is selected for breeding.
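Tournament selection as described here can be sketched in a few lines. The sketch assumes that the four contestants of a tournament are distinct (sampling without replacement) and that the contestant with the lowest fitness wins, since fitness is a loss value; the function name and the fitness values are hypothetical.

```python
import random

def tournament_selection(fitnesses, rng, tournament_size=4):
    """Return the indices of the winners of n tournaments, n = population size."""
    n = len(fitnesses)
    winners = []
    for _ in range(n):
        contestants = rng.sample(range(n), tournament_size)  # distinct contestants
        winners.append(min(contestants, key=lambda i: fitnesses[i]))  # lowest loss wins
    return winners

# Hypothetical fitness values of a six-chromosome population.
fitnesses = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8]
winners = tournament_selection(fitnesses, random.Random(0))
print(winners)
```

A chromosome may win several tournaments and therefore appear more than once among the parents. With distinct contestants, the three worst chromosomes of a population can never win a tournament of size four.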

Crossover
Crossover is a genetic operator used to combine the genetic code of two or more parent chromosomes to provide new (offspring) chromosomes. Crossover provides stochasticity to the process by creating completely new chromosomes. In our case, where the chromosomes are represented by grayscale images, the process of crossover is defined as follows:
• get two parent chromosomes a and b from the population;
• set α to a random value in range <−0.5, ..., 1.5>;
• limit α to the range <0, ..., 1>;
• set the number of crossover points to an integer in range <1, ..., 10>;
• for each crossover point: divide both parents a and b along the crossover point into rectangles a1, a2, a3, a4 and respectively
• return a, b as the result of crossover.
Note that there is the same probability for α to be at its limits or between them. For better illustration of the crossover operation, some examples of crossovers are shown in Figures 8-10.
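The remark about α can be checked numerically. A value drawn uniformly from an interval of length 2 falls below 0 with probability 0.25 and above 1 with probability 0.25, so after limiting it sits at one of the limits about half of the time. The sketch below covers only the sampling of α and assumes a uniform draw; it does not implement the blending of the rectangles.

```python
import random

def sample_alpha(rng):
    """Draw alpha uniformly from [-0.5, 1.5] and limit it to [0, 1]."""
    return min(1.0, max(0.0, rng.uniform(-0.5, 1.5)))

rng = random.Random(1)
alphas = [sample_alpha(rng) for _ in range(100_000)]
at_limits = sum(a in (0.0, 1.0) for a in alphas) / len(alphas)
print(round(at_limits, 2))  # close to 0.5
```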

Mutation
Mutation is an operation in the GA to maintain genetic diversity within the population. Selection and crossover naturally reduce diversity of chromosomes, and this could lead the algorithm into an unwanted local optimum. Hence, mutation is an essential operator to keep the diversity sufficiently high.
We implement two types of mutation, each performed on every chromosome with probability 0.025. Both types of mutation are described below.
• Define a square subarray from a chromosome with random position and random size (maximum number of rows and columns is 40). Create a subarray of the same size by the initialization procedure defined in Section 2.6.3. Place the created subarray into the position of the original subarray.
• Filter the chromosome using the Gaussian filter, where standard deviation σ is set randomly in the range <0, ..., 10>.
Two examples of the mutation procedure are depicted in Figure 11.
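The first mutation type can be sketched as below. For brevity the new patch is filled with uniform random values in 0 to 255, which is a simplified stand-in for the full initialization procedure; the function name `mutate_patch` and the demo chromosome are hypothetical.

```python
import random

def mutate_patch(chromosome, rng, max_side=40, probability=0.025):
    """With the given probability, overwrite a random square patch with new values."""
    if rng.random() >= probability:
        return chromosome  # no mutation this time
    size = len(chromosome)
    side = rng.randint(1, min(max_side, size))
    top = rng.randint(0, size - side)
    left = rng.randint(0, size - side)
    mutated = [row[:] for row in chromosome]  # leave the original untouched
    for r in range(top, top + side):
        for c in range(left, left + side):
            # Simplified stand-in for the initialization procedure.
            mutated[r][c] = rng.randint(0, 255)
    return mutated

# Demo on a blank 50 x 50 chromosome; the probability is forced to 1 to show the effect.
blank = [[0] * 50 for _ in range(50)]
mutated = mutate_patch(blank, random.Random(0), probability=1.0)
changed_rows = [r for r, row in enumerate(mutated) if any(row)]
```

At the stated probability of 0.025, a population of 200 chromosomes would see about five such patch mutations per generation on average.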

Elitism
Elitism in the GA is a procedure allowing the best chromosomes from the current generation to migrate unaltered directly to the next generation. Elitism generally guarantees that the solution quality of the best chromosome will not decrease over generations. In our implementation, we simply migrated the best solution (one chromosome) from its current population to the next generation. Note that the best chromosome still can be selected for crossover and mutation, even if selected for elitism.
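Single-chromosome elitism can be sketched as follows. The sketch assumes that the elite takes one slot of the next generation and that the surplus offspring is simply dropped; which offspring gives way is not specified in the text, and the function name is hypothetical.

```python
def next_generation(population, fitnesses, offspring):
    """Copy the single best chromosome unaltered and fill the rest with offspring."""
    best_index = min(range(len(population)), key=lambda i: fitnesses[i])  # lowest loss
    elite = population[best_index]
    return [elite] + offspring[: len(population) - 1]

# Hypothetical three-chromosome population, labeled by letters for readability.
new_population = next_generation(["a", "b", "c"], [0.4, 0.1, 0.9], ["x", "y", "z"])
print(new_population)  # ['b', 'x', 'y']
```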

Optimization Flow
Considering the statements above, we design the optimization procedure in order to obtain the ideal medium image for the depiction procedure. Note that all the steps, especially fitness function evaluation, are computationally demanding, so we implemented several heuristics to improve the probability of being successful. Specifically, we designed the experiment as follows.
First, we initiated three encapsulated populations of 200 individual chromosomes. Each population was evolved for 67 generations. Then, the 50 best chromosomes from each encapsulated population, together with 50 freshly initiated chromosomes, were put together to create a new population. This new population was evolved for the next 33 generations. The optimization procedure is shown in Figure 12.
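The merging step and the resulting computational load can be sketched as follows. The merge keeps the 50 best chromosomes of each population and adds 50 fresh ones, as described above. The count of training runs additionally assumes that every chromosome is evaluated in every generation and that each evaluation costs 15 trainings (three CNNs trained five times each); chromosomes are represented by placeholder strings and all names are hypothetical.

```python
import random

TRAININGS_PER_EVALUATION = 3 * 5  # three CNNs, each trained five times

def merge_islands(islands, fresh, keep=50):
    """Combine the `keep` best chromosomes of each island with fresh chromosomes.

    Each island is a list of (fitness, chromosome) pairs; lower fitness is better.
    """
    merged = []
    for island in islands:
        ranked = sorted(island, key=lambda pair: pair[0])
        merged.extend(chromosome for _, chromosome in ranked[:keep])
    return merged + fresh

rng = random.Random(0)
islands = [[(rng.random(), f"island{k}-chromosome{i}") for i in range(200)] for k in range(3)]
fresh = [f"fresh-{i}" for i in range(50)]
merged = merge_islands(islands, fresh)

# Phase 1: three populations of 200 for 67 generations; phase 2: merged population for 33.
evaluations = 3 * 200 * 67 + len(merged) * 33
trainings = evaluations * TRAININGS_PER_EVALUATION
print(len(merged), evaluations, trainings)  # 200 46800 702000
```

Under these assumptions the experiment amounts to roughly 700,000 CNN training runs, which gives a sense of why the fitness evaluation dominates the cost.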

Results
In this section, we provide the process and results of the depiction procedure optimization, and afterwards, results of the whole decision procedure design.

Depiction Procedure Design
We performed the optimization experiment according to the procedure summarized in the previous section. Figure 13 shows the course of the fitness level for a mean chromosome and for the best chromosome from all populations. Several interesting points are worthy of note. Through the first 67 generations, the fitness of the best individual steadily declines. Then, after integration of all the populations, the fitness of the best chromosome falls steeply for several generations before eventually becoming constant. This course indicates that the optimization process is set suitably and the number of generations is sufficiently high. If we examine the course of the mean chromosome, we can observe that the mutation operator ensures a sufficiently high diversity in the population for the first 67 generations. The diversity then increases, evidently because of the freshly initiated chromosomes. The diversity in the last 20-25 generations is conspicuously lower (note the logarithmic scale on the y-axis). In addition, the stochastic nature of neural network training is still observable in both courses.
This optimization experiment could be run repeatedly for different parameters. However, as mentioned above, it is computationally a very demanding task. This one experiment ran for more than four months using three computers with hardware-accelerated parallel processing. To be more specific, we used computers with the following hardware specifications: processor: Intel Core i5-8600K (3.6 GHz); internal memory: 16 GB DDR4 (2666 MHz); video card: NVIDIA PNY Quadro P5000 16 GB GDDR5x PCIe 3.0 (2560 CUDA cores); SSD: SATA M.2 512 GB. The experiments were performed using Python 3.6 and TensorFlow 2.0.
The chromosome with the best fitness level at the end of the optimization process is shown in Figure 14. This pattern is used as the medium image for the depiction procedure.

Decision Procedure Design
In this section, we aim to develop a CNN-based decision element for a decision procedure according to Figure 2. This process is especially based on training and testing of the implemented CNN. Feedforward CNNs consist of multiple layers arranged in a feed-forward manner. The first layers (convolutional and max-pooling, typically combined with ReLU) perform feature extraction from the input data. Then, several dense layers are connected and the classification or decision is ensured by a soft-max activation function.
As the performance of the CNN is strongly affected by its structure, we decided to include several well-known architectures for testing. Namely, Net1 and Net2 are the simplest architectures. Both of these were adapted from [41]. In addition to these networks, the following more complex and widely accepted topologies were selected: LeNet-5 [37,38], AlexNet [39], VGG-16 net [40] and MobileNet [42]. Note that more advanced CNNs were not included into this selection because of the strong need for computational efficiency. More recent CNNs are generally much more computationally demanding.

Training Dataset
The dataset used simulates a highly utilized industrial network corresponding to an electrical substation network [43] with control and management applications such as SCADA and Distribution Management System. In order to make the solution reasonably coherent, lower-layer industrial protocols such as GOOSE were not considered. The traffic was generated by a custom developed application [9], which in turn generated the target decisions (ALLOW, BLOCK, INSPECTION, FORWARD, QOS EF, QOS AF13, and QOS AF41). Data traffic was generated for the following scenarios: normal traffic (random TCP or UDP ports and action: ALLOW), DoS attack (number of packets per second for a single flow distinctly high and action: BLOCK), HTTP traffic (TCP destination port 80 and action: application layer INSPECTION), HTTPS traffic (TCP destination port 443 and action: FORWARD to selected ports) and three types of QoS (critical priority for TCP destination port 5060; high priority for TCP destination port 37; and low priority for source or destination ports 20, 21, 69 and 115). IP addresses and number of packets per second were randomly generated from defined intervals. The traffic map generated for neural network training consisted of 80,000 unique data patterns.
The dataset was subdivided into a training set (70%), validation set (15%) and testing set (15%). The training set was used for neural network parameter adaptation during the training process, the validation set was used to identify the best network configuration during training and the testing set was used for final AI module evaluation.
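For the 80,000 patterns, this split yields 56,000 training, 12,000 validation and 12,000 testing samples. A minimal sketch of such a split is given below; shuffling before the split and the function name `split_dataset` are assumptions, and integers stand in for the actual data patterns.

```python
import random

def split_dataset(samples, rng, train=0.70, validation=0.15):
    """Shuffle the samples and cut them into training, validation and testing sets."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # the remainder is the testing set

train_set, val_set, test_set = split_dataset(list(range(80_000)), random.Random(0))
print(len(train_set), len(val_set), len(test_set))  # 56000 12000 12000
```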
Note that inputs in the dataset were transformed using the depiction procedure described in Figure 5 and using the medium image shown in Figure 14.

CNN Training and Results
The training of the selected architectures was performed in order to obtain a CNN-based decision element. The Adam optimization algorithm was chosen as the optimizer based on its generally acceptable performance [44]. Initial weights were set randomly, with Gaussian distribution (location = 0, scale = 0.05). The training was repeated 50 times. See Table 2 for all parameters of the training processes. The resulting values of the categorical cross entropy loss function computed over the testing set for each topology are shown in Figure 15. The categorical cross entropy loss function is calculated as follows:
$$E = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} t_{i,j} \log y_{i,j} \qquad (1)$$
where N is the number of samples in the testing set, K is the number of classes considered for classification, t_{i,j} is the label of the target class (0 or 1) for class j of sample i and y_{i,j} is the j-th scalar value in the neural network output for sample i (between 0 and 1).
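A small numerical sketch of the categorical cross entropy is given below. It assumes that the loss is averaged over the N samples, as is common in deep learning frameworks, and it clips the outputs at a small `eps` to avoid taking the logarithm of zero; the targets and outputs are made-up values.

```python
import math

def categorical_cross_entropy(targets, outputs, eps=1e-12):
    """Mean over samples of -sum_j t_ij * log(y_ij)."""
    total = 0.0
    for t_row, y_row in zip(targets, outputs):
        total -= sum(t * math.log(max(y, eps)) for t, y in zip(t_row, y_row))
    return total / len(targets)

# Two hypothetical samples with K = 3 classes and one-hot targets.
targets = [[1, 0, 0], [0, 1, 0]]
outputs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
loss = categorical_cross_entropy(targets, outputs)  # -(ln 0.9 + ln 0.7) / 2
print(round(loss, 4))
```

Only the output assigned to the target class contributes to the sum, so the loss approaches zero as that output approaches 1.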
In addition, accuracy of the decision element when using the data in the testing set is depicted in Figure 16. As training is a stochastic process, the results are depicted as box plots. Accuracy is defined as follows:
$$Acc = \frac{n}{N} \qquad (2)$$
where n is the number of correctly performed decisions and N is the number of all decisions, i.e., the number of samples in the testing set.
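The accuracy computation can be sketched as follows, assuming that the decision is taken as the class with the highest network output; the sample targets and outputs are made up for illustration.

```python
def accuracy(targets, outputs):
    """Fraction n / N of samples whose highest output matches the target class."""
    correct = 0
    for t_row, y_row in zip(targets, outputs):
        predicted = max(range(len(y_row)), key=lambda j: y_row[j])  # decision = arg max
        correct += t_row[predicted] == 1
    return correct / len(targets)

# Four hypothetical samples; the last one is decided incorrectly.
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
outputs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.3, 0.6, 0.1]]
acc = accuracy(targets, outputs)
print(acc)  # 0.75
```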
Figure 15. Final values of loss function (1) over testing set.
Figure 16. Final accuracy (2) of the decision element over testing set.
As mentioned in Section 2, we have tested three hypotheses in our research. The best final accuracies of the decision element designed in this contribution are compared to our previous results in Table 3. In addition, we include other important metrics (confusion matrix, precision, recall and F1-score) in Appendix A.

Discussion
The objective of the presented work was to develop a specific neural network-based decision procedure that may be applied to a flow characteristic-based traffic handling controller. Three hypotheses were formulated and tested in this and the authors' previous works. It has already been shown that a convolutional neural network in combination with a depiction procedure provides better accuracy of the decision element than a direct feedforward neural network [22]. In this contribution, however, we show that it is beneficial to optimize the depiction procedure itself. As the results demonstrate (Table 3), optimization of the depiction procedure improves the accuracy from 0.9501 to 0.9984 while preserving the same computational complexity. The other metrics presented in Appendix A also support this statement. For example, according to Table A11, the (D)DoS attack was correctly detected in 6204 of 6218 possible cases and no false detections were triggered.
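To make the Table A11 figures quoted above concrete, the following sketch derives precision, recall and F1-score for the (D)DoS class. It uses the standard textbook definitions of these metrics and interprets "no false detections" as zero false positives; both are assumptions of this example, and the function name is hypothetical.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Standard per-class metrics computed from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the text: 6204 correct detections out of 6218 attack samples.
true_pos = 6204
false_neg = 6218 - 6204  # 14 missed attacks
false_pos = 0            # assumed meaning of "no false detections"

precision, recall, f1 = precision_recall_f1(true_pos, false_pos, false_neg)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Under these assumptions the precision is 1 and the recall is roughly 0.9977, so the 14 missed attacks are the only source of error for this class.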
Although the optimization process is highly time-consuming, it was performed only once, during the development of the decision procedure, and it did not affect the implementation of the element in traffic handling control.
The proposed AI decision procedure can be used in many network security devices, such as firewalls, intrusion detection systems and intrusion prevention systems. In our protection system deployment, formerly introduced in [9], the functionality combines intrusion prevention and detection systems. The use of SDN relies on periodic collection of traffic data (commonly at one-second intervals) and subsequent processing on the SDN controller, which includes the AI subsystem. This processing is therefore done offline, as in traditional intrusion detection systems. Unlike those systems, however, our system can react to the AI subsystem result by inserting specific flow rules (such as blocking) into networking devices, thereby achieving the functionality of an intrusion prevention system, but with a latency of approximately 1 s. For testing purposes, we deployed the proposed decision procedure on an NVIDIA Jetson Nano [45], a single-board computer well suited to this purpose. The latency of the depiction procedure is 18.6 ms and the latency of MobileNet (which provides the best overall performance; see below) is 16.6 ms. We assume that the decision procedure could take approximately 2 ms to 100 ms, depending on the hardware and neural network architecture used. Therefore, it can be seamlessly applied in the SDN-based network protection system with a one-second interval.
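The timing argument above can be checked with a small calculation. The sketch assumes that the depiction procedure and the network inference run one after the other and that one decision is made per collection interval; the text does not state whether the measured latencies refer to a single flow or to a batch, so the constants and function names below are illustrative only.

```python
DEPICTION_MS = 18.6              # measured depiction latency from the text
MOBILENET_MS = 16.6              # measured MobileNet latency from the text
COLLECTION_INTERVAL_MS = 1000.0  # one-second SDN collection interval


def decision_latency_ms(depiction_ms, inference_ms):
    """Total latency, assuming the two stages run sequentially."""
    return depiction_ms + inference_ms


def interval_share(latency_ms, interval_ms):
    """Fraction of the collection interval consumed by one decision."""
    return latency_ms / interval_ms


total = decision_latency_ms(DEPICTION_MS, MOBILENET_MS)
print(round(total, 1), "ms per decision")
print(round(100 * interval_share(total, COLLECTION_INTERVAL_MS), 2), "% of the interval")
```

Even the assumed upper bound of 100 ms would occupy only a tenth of the one-second interval, which is consistent with the conclusion drawn in the text.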
In examining performance of the particular convolutional networks, we should emphasize especially the well-established VGG-16 and MobileNet, which feature good learning ability, result in small loss function values and deliver excellent performance with accuracies equal to 0.9976 and 0.9984, respectively. Surprisingly, LeNet-5's accuracy fails to exceed 0.95.
It is not possible to directly compare the presented results to other works. Although several authors propose artificial intelligence techniques for security handling using the software-defined networking paradigm, they mostly consider different aims, unmatched datasets and uneven conditions. As a rough illustration, we summarize some findings below.
The authors of [46] proposed an intrusion detection system for SDN based on a neural network approach and achieved an accuracy of 0.973 with their dataset. Oo Myo Myint et al. [47] introduced a (D)DoS attack detection method using the advanced support vector machine technique, with an accuracy between 0.970 and 1.000 depending on the ratio of training to testing data. They also used their own dataset. Fuzzy logic approaches can also be used for detection of (D)DoS attacks on SDN. The authors of [48] proposed an algorithm that applies multiple criteria for attack detection, and they demonstrated the ability to detect and filter 97% of the attack flows with a false positive rate of 5%. Moreover, a combination of a support vector machine and a decision tree approach was introduced in [49]. Based on experimental results with the KDD CUP99 dataset [50], their system showed an accuracy of 0.976. As a last example, Phan et al. [51] provided a novel approach that combines a self-organizing map with a support vector machine. Their results showed that this system was able to achieve an accuracy of 0.976 and a false positive rate of 3.85%. As mentioned, these results were obtained from different datasets and with different acquisition procedures. Despite this, the reported accuracies are roughly on the same level as our results, or worse. These findings support our approach.

Conclusions
In this contribution, we proposed a specific neural network-based decision procedure as a part of a traffic handling controller. Such an AI-based element can be straightforwardly integrated into a software-defined networking controller to provide all the advantages of a machine learning approach while presenting no particular demands for proprietary software or custom hardware.
The main contribution consists of the development and improvement of the depiction process using a genetic algorithm. We implemented a convolutional neural network as a decision element. Inasmuch as convolutional neural networks behave especially well when applied to multidimensional inputs, we proposed a novel depiction procedure to automatically transform incoming flow characteristics into a 2D array. The depiction procedure uses meta-learning to adaptively convert raw data into a new data representation (a suitable encoder) that can be processed by a convolutional neural network. The depiction procedure adds another layer of representational learning (one layer of representational learning is already contained in a deep neural network) and is optimized in a complex computational experiment based on a genetic algorithm.
As a result, we demonstrated that a convolutional network, in combination with an optimized depiction procedure, provides exceptionally high accuracy of the decision process. In the performed experiments, the proposed process of finding a suitable data representation and an effective depiction procedure significantly increased the accuracy of the classification of network packets. The presented method is, nevertheless, far from optimal. One important point is the size and bit depth of the medium image: these parameters are currently fixed once, without the possibility of change. It may be possible to find a size and depth more suitable for a particular neural network. The other point is the depiction process itself. Many well-accepted approaches for storing information in an image are known. Instead of using our procedure, one could adapt another one and obtain a better or less time-consuming depiction process.
Moreover, we believe that the proposed depiction procedure could be used much more generally, and that it could be applicable to other problems that do not yet have good enough results when using neural networks. Future research will focus on designing and exploring different types of depiction procedures and finding more general approaches that could increase the accuracy of neural networks and machine learning algorithms on selected problems.

Appendix A. Metrics of Designed Neural Networks
We present the accuracy of each neural network intended to be a decision element in Table 3. However, to provide comprehensive information about the process, we provide the other metrics here as an appendix. In the following tables, we present the confusion matrix, precision, recall and F1-score for the best representative of each neural network and each class. The metrics are defined as follows.