An Integrated Model Based on O-GAN and Density Estimation for Anomaly Detection

Anomaly detection is a classic and crucial problem in the field of artificial intelligence, which aims to find instances that deviate significantly from the main distribution of the data or differ from known instances. This paper explores how to combine advanced deep learning techniques with traditional probabilistic statistical methods for anomaly detection. We propose an effective and concise semi-supervised anomaly detection method named “ORGAN-KDE” based on the orthogonal generative adversarial network (O-GAN) and kernel density estimation. In the training phase, we use the encoder of O-GAN to learn the latent representation of normal data, namely the code of the normal data, and then use kernel density estimation to estimate the probability density function of the codes. The code of a normal sample obtained through the trained encoder receives a large probability value from the trained kernel density estimator, while the code of an anomalous sample receives a small probability value, thereby achieving anomaly detection. Compared with other GAN-based anomaly detection methods, our method has a very simple network structure, and experiments show that it performs well on both structured datasets and image datasets.


I. INTRODUCTION
When analyzing real-world data, one of the most important tasks is to find those instances that are significantly different from the remaining instances [1]. Depending on the domain, anomaly detection may also be referred to as outlier detection [2] or novelty detection [3]. Anomaly detection has extensive applications, including but not limited to credit card fraud detection [4], network intrusion detection [5], and medical anomaly detection [6]. Nowadays, with the increasing complexity of practical problems and the proliferation of data, anomalous samples of known or unknown types are inevitable, and it is very important to detect them promptly. Therefore, anomaly detection plays an irreplaceable role in modern data analysis and an important auxiliary role in solving practical problems.
There are numerous methods for anomaly detection, which can be summarized as follows: statistical and probabilistic methods, distance-based methods, domain-based methods, reconstruction-based methods, and information-theoretic methods [7]. In our opinion, these anomaly detection methods all amount to capturing the distribution of normal samples and then finding an effective criterion to correctly distinguish anomalous samples from normal samples. For example, most GAN-based anomaly detection methods use a GAN to capture the distribution of normal samples, and then use the idea of reconstruction to distinguish anomalous samples from normal samples.
Although there have been many exciting research results in the field of anomaly detection, there are still many challenging problems [1]. First of all, with increasingly diverse data types and increasingly complex data structures, the accuracy of traditional anomaly detection methods has declined across the board, because these methods cannot accurately capture the structure of complex data. Secondly, the boundary between normal data and anomalous data is often not clearly defined, so judgment conditions are not easy to specify, which is a very challenging problem for the field of anomaly detection. A better way to address this problem is to estimate the distribution of the normal samples and then determine whether a test sample is anomalous by measuring its deviation from that distribution. Deep learning is a subset of machine learning and has been shown to outperform traditional machine learning methods on many tasks. Deep learning can process data with extremely complex structures, which makes it increasingly popular in anomaly detection applications [1]. At the same time, because anomaly detection data is generally imbalanced, that is, the number of normal samples far exceeds the number of anomalous samples, traditional supervised classification methods do not perform well in this field. Therefore, semi-supervised or unsupervised methods are more prevalent in anomaly detection. The method proposed in this paper is a semi-supervised deep learning anomaly detection method.
GAN is an advanced generative model proposed by Goodfellow et al. [8]. In recent years, research on GANs has mushroomed. GANs are very suitable for modeling the high-dimensional, complex distributions of real-world data. Therefore, GANs have recently been used in the field of anomaly detection, with methods such as AnoGAN [9], EGBAD [13], GANomaly [11], and so on. Nevertheless, as research has deepened, the network structures of these GAN-based anomaly detection methods have become more and more complicated. Therefore, we draw on the research of Mao et al. [12] and propose an anomaly detection method based on O-GAN [10], which keeps the network structure as simple as possible while maintaining accuracy.
In this paper, a semi-supervised anomaly detection model based on O-GAN [10] and kernel density estimation [15] is proposed, referred to as ''ORGAN-KDE''. The network consists of an encoder, a generator, and a kernel density estimator. The encoder and generator constitute an O-GAN, where the encoder is a sub-network with a special role: it performs discrimination and coding at the same time. In the training phase, only normal samples are fed to the O-GAN, and through adversarial training the encoder learns a meaningful latent representation (i.e., features, also called codes) of the normal samples. The codes are then fed to the kernel density estimator to estimate their probability density function. In the anomaly detection phase, test samples are evaluated by the trained encoder and the kernel density estimator. Since the encoder is trained only on normal samples, it learns the latent representation of normal samples but not of abnormal ones. Consequently, the codes obtained from anomalous samples through the encoder do not belong to the code distribution of normal samples. When the code of an anomalous sample passes through the kernel density estimator, a small probability value is obtained, thereby achieving anomaly detection.
The idea of our method is to extract the important features of the data in high-dimensional space through the encoder of O-GAN, so as to achieve dimension reduction and improve the accuracy of anomaly detection. Our design is based on the following considerations. First, high-dimensional data such as images can be regarded as lying on a low-dimensional manifold embedded in the high-dimensional space, so redundant information can be removed through O-GAN while key information is retained. Second, nonparametric statistical methods suffer from the ''curse of dimensionality'', so reducing high-dimensional data to a low-dimensional space makes the kernel density estimation work better. Compared with previous anomaly detection methods, our method has the advantage of a simple network structure and makes up for the inability of traditional methods to model complex, high-dimensional data.
The rest of this paper is organized as follows: In Section II, we will briefly introduce the related works; in Section III, some background information will be briefly introduced; in Section IV, we will describe our proposed method in detail; details of the experiment will be shown in Section V; in the final section, conclusions and future work will be provided.

II. RELATED WORK
In this section, we will summarize previous research, including generative adversarial networks, density estimation, and anomaly detection.

A. GENERATIVE ADVERSARIAL NETWORK
Generative adversarial network (GAN) is an excellent generative model proposed by Goodfellow et al. [8]. It provides a new way of thinking for generative modeling, namely the adversarial idea, which gave birth to many subsequent models. Generally speaking, a GAN consists of two sub-networks: a generator and a discriminator. The adversarial game between the generator and the discriminator enables the generator to produce visually realistic images. At present, GANs have shown excellent performance in many fields such as super-resolution [14], image translation [16], [17], video generation [18], text generation [19], and so on, and have been applied extensively to anomaly detection [9], [11]-[13], [20]. Meanwhile, many scholars have conducted in-depth studies on the network framework and mathematical principles of GAN, which gave rise to a series of GAN variants, such as DCGAN [21], WGAN [22], LSGAN [23], BiGAN [24], and so on. According to existing studies, most GANs can obtain a powerful generator through alternating iterative training, but the discriminator is often discarded because, in theory, it degenerates during training (e.g., tending to a constant). Su [10] proposed a network that converts the discriminator into an encoder by only slightly modifying the original GAN. This network, called the orthogonal generative adversarial network (O-GAN), enables the discriminator to play a remarkable role without increasing complexity or time cost.
In this paper, we will apply O-GAN to anomaly detection.

B. KERNEL DENSITY ESTIMATION
Density estimation is a very simple concept and has a wide range of applications. There are many density estimation methods, among which the most famous and practical methods are the mixture model, such as Gaussian Mixtures [25], and neighbor-based methods, such as kernel density estimation [15]. Density estimation has been widely used in the field of anomaly detection.
In this paper, we will use kernel density estimation for anomaly detection.

C. ANOMALY DETECTION
Anomaly detection (also known as outlier detection or novelty detection) is a very important and challenging research field. The goal of anomaly detection is to find samples that deviate from the main distribution of the data or differ from known samples [1]. A more challenging task is to determine why the anomalous samples are abnormal. Anomaly detection has been widely applied in many fields, such as credit card fraud detection [4], network intrusion detection [5], medical anomaly detection [6], and so on. Scholars have conducted in-depth studies on anomaly detection methods from different perspectives, such as the distance-based nearest neighbor algorithm [26], the one-class support vector machine [27] based on a linear model, and the VAE [34] based on reconstruction. We refer to the review by Pimentel et al. [7] and summarize the currently popular anomaly detection methods from different perspectives in Table 1. Studies on anomaly detection are too numerous to cover exhaustively; therefore, in the remainder of this section, only GAN-based anomaly detection methods are introduced in detail.
In recent years, there has been an explosion of research on GANs, and anomaly detection methods based on GANs have emerged one after another. AnoGAN [9] is the first application of GAN to anomaly detection. AnoGAN is a two-stage anomaly detection algorithm. In the first stage, a conventional GAN (such as DCGAN) is trained using normal samples. In the second stage, an iterative method is used to invert the test image to the latent space to obtain its latent vector; the trained GAN is then used to reconstruct the test image, and the reconstruction error is taken as the anomaly score. However, acquiring the latent vector of the test image through iteration is time-consuming, so many subsequent methods add an encoder to the GAN to reduce the time cost. EGBAD [13] is an anomaly detection method based on BiGAN [24]. In this network, in addition to the generator and discriminator, there is an encoder whose purpose is to find the inverse mapping from the real sample to the latent space, which greatly improves efficiency compared with AnoGAN. Since then, many methods have added different sub-networks on top of the GAN to improve performance, such as GANomaly [11], OCGAN [20], MAD-GAN [38], GAN-AD [39], and so on. Although these methods excel on some specific problems, their network structures are very complex. Mao et al. proposed a very simple and effective anomaly detection method based on O-GAN [10], namely Dis-AE [12]. This method uses a relatively simple network structure and obtains good performance.
In this paper, based on Dis-AE, an effective anomaly detection method is proposed by combining O-GAN and kernel density estimation.

III. BACKGROUNDS
In this section, we will briefly introduce some background knowledge, including generative adversarial network, orthogonal generative adversarial network, and kernel density estimation.
A. GENERATIVE ADVERSARIAL NETWORK
GAN [8] is a generative model whose training process is a game. The main structure of a GAN can be divided into two parts: a generator and a discriminator. The generator is used to generate images visually similar to real-world images, and the discriminator is used to determine whether an input image comes from the dataset or was generated by the generator. We will refer to the research of Goodfellow et al. [8] and give a brief introduction to the background of GAN.
In the training process, first, a set of noise vectors is sampled from the prior distribution (such as a Gaussian distribution) and fake samples are generated by the generator. Then, with the generator fixed, the discriminator is trained to determine as accurately as possible whether an input sample is a real sample or a generated sample. After several iterations of this loop, the ideal final situation is that the samples generated by the generator are close to the real samples, and the discriminator cannot tell whether a sample comes from the generator or from the real data. The objective function to be optimized for GAN is as follows:

min_G max_D V(G, D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1)

where x ∈ R^{n_x} (n_x is the dimension of a real image), z ∈ R^{n_z} (n_z is the dimension of the noise), G : R^{n_z} → R^{n_x} is the generator, D : R^{n_x} → R is the discriminator, p_data is the real data distribution, and p_z is the prior distribution.
For the discriminator D, the work it does is binary classification, so equation (1) can be regarded as the common cross-entropy loss of binary classification. As a very important and practical concept, cross-entropy has been widely used in a variety of fields, and there are several highly cited papers on it [53]-[55].
For the generator G, the work it does is to generate samples that fool the discriminator as much as possible, so the generator aims to maximize the probability D(G(z)) of its generated samples in equation (1), which is equivalent to minimizing log(1 − D(G(z))).
In the actual training process, the generator and discriminator are trained alternately. The generator minimizes max_D V(G, D), that is, the maximum value of V(G, D) over D. For a fixed generator, the maximizer of V(G, D) can be found by setting the derivative to zero, which gives the optimal discriminator D*:

D*(x) = p_data(x) / (p_data(x) + p_G(x))    (2)

where p_G is the probability distribution defined implicitly by the generator G, that is, the distribution of the generated samples G(z) when z ∼ p_z(z). Substituting the optimal discriminator D* back into V(G, D), the objective function to be optimized by the generator is equivalent to minimizing the JS divergence between p_data(x) and p_G(x), namely:

C(G) = − log 4 + 2 · JS(p_data ‖ p_G)    (3)

In theory, at the global optimum, that is, when the generator and discriminator reach a Nash equilibrium, p_G = p_data. In other words, the discriminator cannot tell whether an input sample comes from the generator or from the real data, i.e., D*(x) = 1/2.
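The optimal-discriminator formula in equation (2) is easy to verify numerically. The following sketch (the function names and the two Gaussian densities are our illustrative choices, not from the paper) computes D*(x) for two 1-D Gaussians and checks that D* collapses to 1/2 everywhere when p_G = p_data:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def d_star(x, mu_data=0.0, mu_g=2.0, sigma=1.0):
    """Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_G(x))."""
    p_data = normal_pdf(x, mu_data, sigma)
    p_g = normal_pdf(x, mu_g, sigma)
    return p_data / (p_data + p_g)

# When p_G equals p_data (same mean), D* is exactly 1/2 at every point.
assert abs(d_star(0.7, mu_data=0.0, mu_g=0.0) - 0.5) < 1e-12
# When the distributions differ, D* favors the data mode.
assert d_star(0.0, mu_data=0.0, mu_g=2.0) > 0.5
```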

B. ORTHOGONAL GENERATIVE ADVERSARIAL NETWORK
O-GAN is a very effective and powerful adversarial network. O-GAN only slightly modifies the original GAN so that the discriminator is turned into an encoder; the resulting network has the ability of both generating and coding, with hardly any increase in training cost. We will refer to the research of Su [10] and give a brief introduction to the background of O-GAN. First of all, for a general GAN, the objective function can be written in the following form:

D* = argmin_D E_{x∼p_data(x)}[f(D(x))] + E_{z∼p_z(z)}[g(D(G(z)))]
G* = argmin_G E_{z∼p_z(z)}[h(D(G(z)))]    (4)

where f, g, h are some specific functions, f, g, h : R → R.
The work of the discriminator and that of the encoder are similar: the output of the discriminator D is a scalar, that is, D : R^{n_x} → R, while the output of the encoder E is a vector, that is, E : R^{n_x} → R^{n_z}. Therefore, the encoder alone cannot serve as the discriminator, but the encoder and discriminator can share a large number of parameters, with the discriminator having slightly more parameters than the encoder. The discriminator can then be rewritten as:

D(x) = T(E(x))    (5)

where T : R^{n_z} → R.
As is well known, when training a GAN, noise is sampled from a prior distribution p_z(z) (such as a Gaussian distribution) to generate realistic images. Theoretically, any noise sampled from this prior distribution can generate a realistic image through the trained GAN. In fact, the converse also holds: if G(z) is a realistic image, then z ∼ p_z(z). In this case, z can be regarded as a feature (or code) of the realistic image G(z). Then, an auxiliary term, namely the Pearson correlation coefficient, can be added to the objective function to make the feature E(G(z)) as correlated as possible with the noise z. At the same time, to simplify the network, T(·) can be replaced by the average. Therefore, the objective function of O-GAN is:

E* = argmin_E E_{x∼p_data(x)}[f(avg(E(x)))] + E_{z∼p_z(z)}[g(avg(E(G(z))))] − λ E_{z∼p_z(z)}[ρ(z, E(G(z)))]
G* = argmin_G E_{z∼p_z(z)}[h(avg(E(G(z))))] − λ E_{z∼p_z(z)}[ρ(z, E(G(z)))]    (6)

where avg(·) is the average of the components of a vector, ρ(·,·) is the Pearson correlation coefficient, and λ weights the auxiliary term.
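As a small illustration of the auxiliary term, the per-vector Pearson correlation ρ(z, E(G(z))) between a noise vector and its recovered code can be computed as follows (the function name is ours and this is a minimal sketch, not the paper's implementation):

```python
import numpy as np

def pearson_rho(z, e):
    """Pearson correlation between a noise vector z and a code vector e
    (e.g., E(G(z))): standardize both and take the mean of the product."""
    z = (z - z.mean()) / z.std()
    e = (e - e.mean()) / e.std()
    return float(np.mean(z * e))

rng = np.random.default_rng(1)
z = rng.normal(size=128)
# An affine function of z is perfectly correlated with z.
assert abs(pearson_rho(z, 2.0 * z + 3.0) - 1.0) < 1e-9
```

Maximizing this quantity pushes the encoder toward (an affine transform of) the inverse of the generator, which is what lets the trained encoder serve as a feature extractor.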
In this paper, we will use the encoder E of O-GAN to acquire the features of the real data and pass them to the kernel density estimator, so that the kernel density estimation can perform better.

C. KERNEL DENSITY ESTIMATION
Kernel density estimation is a nonparametric statistical method to solve the density function of random variables with given sample points. We will refer to the research of Wand and Jones [51] and give a brief introduction to the background of kernel density estimation.
Assume X_1, X_2, ..., X_n is a set of d-variate random samples with unknown density p(x), where X_i = (X_i1, X_i2, ..., X_id) ∈ R^d. The following formula can be used to estimate the unknown density:

p̂(x) = (1/n) Σ_{i=1}^{n} |H|^{−1/2} K(H^{−1/2}(x − X_i))    (7)

where K is a d-variate kernel function and H is the bandwidth matrix. A popular choice for K, and the one we use, is the standard d-variate Gaussian kernel:

K(x) = (2π)^{−d/2} exp(−x^T x / 2)    (8)
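With the common simplification H = h²I (a single scalar bandwidth h, which is also what we use later), the estimator in equations (7)-(8) reduces to a mean of isotropic Gaussian bumps centered at the samples. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def gaussian_kde(samples, h):
    """Return a density estimator p_hat(x) built from `samples` (shape [n, d])
    using an isotropic Gaussian kernel with bandwidth matrix H = h^2 * I."""
    n, d = samples.shape
    norm = (2 * np.pi * h ** 2) ** (-d / 2)  # |H|^{-1/2} * (2*pi)^{-d/2}

    def p_hat(x):
        # Mean of Gaussian bumps centered at each sample point.
        sq = np.sum((samples - x) ** 2, axis=1)
        return norm * np.mean(np.exp(-sq / (2 * h ** 2)))

    return p_hat

# Toy check: the estimated density is higher near the data than far away.
rng = np.random.default_rng(0)
codes = rng.normal(0.0, 1.0, size=(500, 2))
p = gaussian_kde(codes, h=1.0)
assert p(np.zeros(2)) > p(np.full(2, 6.0))
```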

IV. OUR APPROACH
In this section, we will elaborate on our proposed approach. First of all, we will have a precise definition of the problem to be studied. Then, we will look at the overall framework. Finally, the algorithm of training phase and anomaly detection phase will be explained in detail.

A. PROBLEM DEFINITION
First of all, we need to define what normal samples and anomalous samples are. Normal samples are typically predefined, known, or acceptable samples [12]. Anomalous samples are usually caused by an error or are previously unknown. A dataset containing only normal samples, D_1 = {x_1, x_2, ..., x_m}, is given for training the model (i.e., the O-GAN and the kernel density estimator). A dataset D_2 = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} containing both normal and abnormal samples, where y_i ∈ {0, 1} (0 represents a normal sample and 1 an abnormal sample), is given for testing the trained model.
Our goal is to model dataset D_1 so that the O-GAN learns the features of normal samples, and the kernel density estimator learns the probability density function of the normal-sample features. Then, in the anomaly detection phase, the anomaly score A(x) of a test sample is obtained by passing the sample through the trained O-GAN encoder and the density estimator in succession. Given a threshold α, when A(x) > α, the test sample is considered normal; when A(x) ≤ α, it is considered anomalous.

B. NETWORK FRAMEWORK
The proposed anomaly detection method consists of three sub-network structures: an encoder E, a generator G, and a kernel density estimator K . Our method has two phases: the training phase and the anomaly detection phase. The specific network structure flow chart is shown in Figure 1.
In the training phase, an O-GAN and a kernel density estimator need to be trained. The purpose of training the O-GAN is to enable its encoder to learn the features (also known as codes) of normal samples, and the purpose of training the kernel density estimator is to enable it to estimate the probability density function of the normal-sample features. The O-GAN consists of an encoder E and a generator G (as shown in Figure 1 on the left of the training phase). First, a set of noise vectors is sampled from a prior distribution (such as a Gaussian distribution) and a set of fake samples is generated by the generator G. Then, the normal real samples and the fake samples are fed to the encoder E. While the encoder E captures the features of the normal samples, avg(E(·)), acting as the discriminator output, should determine as accurately as possible whether the samples come from the real data or from the generator. After several iterations, E and G converge, and E is then fixed. When normal samples pass through the trained encoder E, a set of low-dimensional features is obtained and fed to the kernel density estimator K (as shown in Figure 1 on the right of the training phase), so that K can learn the probability density function of the low-dimensional features of the normal samples.
In the anomaly detection phase, the features of a test sample are obtained through the trained encoder E. Then, the probability value of the sample is obtained through the trained kernel density estimator K and is used to judge whether the test sample is an anomaly. If the sample is anomalous, its features will fall in a low-density region of the normal-sample feature probability density function; that is, the probability value given by the density estimator for the anomalous sample's features will be small.

C. TRAINING PHASE
In the training phase, the O-GAN and the density estimator need to be trained. The data used are all normal samples, and the pseudo-code of the algorithm is shown in Algorithm 1. When the O-GAN is trained, the objective function and hyperparameter settings are consistent with those of Su [10]. The three functions f, g, and h are set as: f(t) = h(t) = t, g(t) = −t. At the same time, a regularization term in differential form is added to the objective function of the encoder [40]. The weights of the Pearson correlation term and this regularization term in the final objective are controlled by the hyperparameters λ1 and λ2, respectively.

Algorithm 1 Training Phase
Input: Normal dataset {x_1, x_2, ..., x_m}, values of λ1 and λ2
Output: Trained encoder Ê, trained generator Ĝ, density function of the codes p̂(x)
Repeat until convergence:
    Draw a batch of samples from the normal dataset: X = {x_1, x_2, ..., x_n}
    Draw a batch of noise vectors from N(0, 1): Z = {z_1, z_2, ..., z_n}
    Compute the generated samples G(Z) and the codes E(X) and E(G(Z))
    Update the encoder parameters by minimizing the encoder objective function
    Update the generator parameters by minimizing the generator objective function
End loop
Train a kernel density estimator on the normal-sample codes to obtain the probability density function p̂(x)
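The second half of the training phase — freezing the trained encoder and fitting a KDE on the normal-sample codes — can be sketched with scikit-learn's KernelDensity, which the paper uses for the estimator. The `fake_encoder` below is only a stand-in for the trained E (an assumption for illustration; the real encoder is a convolutional network):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fake_encoder(x):
    """Stand-in for the trained O-GAN encoder E: here it simply keeps
    the first two components as the 'code'. Illustrative only."""
    return x[:, :2]

# Codes of normal samples, obtained from the (frozen) encoder.
rng = np.random.default_rng(2)
normal_x = rng.normal(0.0, 1.0, size=(300, 8))
codes = fake_encoder(normal_x)

# Fit the kernel density estimator on the codes (Gaussian kernel, h = 1).
kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(codes)

# score_samples returns log-densities; one value per code.
log_p = kde.score_samples(codes[:5])
assert log_p.shape == (5,)
```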

D. ANOMALY DETECTION PHASE
In the anomaly detection phase, we use the encoder of the O-GAN and the kernel density estimator trained in the training phase. The pseudo-code of the algorithm is shown in Algorithm 2. First, each test sample's features (or codes) are obtained through the trained encoder; the features are then passed to the density estimator to obtain the corresponding probability value, which is regarded as the anomaly score, i.e., p̂(E(x_i)).
Given a threshold α, if score > α, then the test sample is a normal sample; otherwise, it is an anomalous sample.
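The thresholding rule above can be written as a two-line decision function (a sketch; whether the score and α are raw densities or log-densities depends on the estimator implementation, so the function is agnostic to that):

```python
def detect(score, alpha):
    """Decision rule from the text: score > alpha -> normal (0),
    otherwise -> anomalous (1)."""
    return 0 if score > alpha else 1

assert detect(0.8, alpha=0.1) == 0   # high density value: normal
assert detect(0.01, alpha=0.1) == 1  # low density value: anomalous
```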

Algorithm 2 Anomaly Detection Phase
Input: Testing dataset {x̃_1, x̃_2, ..., x̃_n}, trained encoder Ê, trained density function p̂(x), threshold α
Output: Normal or anomaly label for each test sample
For i = 1 : n do:
    Compute the score p̂(Ê(x̃_i)); if the score is greater than α, label x̃_i as normal; otherwise, label it as anomalous
End for

V. EXPERIMENTS
In this section, we will detail the advantages of our proposed method from an experimental perspective. First, we will introduce the datasets and the baseline methods used in the experiments, and then demonstrate the advantages of our method from three aspects: performance, feature dimension comparison, and network structure differences.

A. DATASETS
In the experiments of this paper, we use the KDDCUP99 dataset [41], the MNIST dataset [42], the Fashion-MNIST dataset [43], and the CIFAR10 dataset [44]. KDDCUP99 is a structured dataset, and the remaining three are image datasets. In Figure 2, we present some representative samples of these image datasets. Next, we describe these four datasets in detail.
• KDDCUP99: The KDDCUP99 dataset is a nine-week network connection dataset collected from a simulated US Air Force LAN, which was used in the Third International Knowledge Discovery and Data Mining Tools Competition. The features in this dataset are either string-valued or numeric. Since there are far more anomalous samples in the full dataset than normal samples, it is not suitable for anomaly detection research as-is, so we used the SA version provided in scikit-learn [45].
• MNIST: The MNIST dataset is a handwritten digits dataset. It consists of 70,000 grayscale images of 28*28 handwritten digits, including 60,000 images in the training set and 10,000 images in the test set. Each image has a label from the 10 digit classes 0-9.
• Fashion-MNIST: The Fashion-MNIST dataset consists of 70,000 grayscale images of 28*28 fashion items (e.g., T-shirt, trouser, pullover, etc.), including 60,000 images in the training set and 10,000 images in the testing set. There are 10 categories in this dataset, each representing a different fashion item.
• CIFAR10: The CIFAR10 dataset consists of 60,000 small 32*32 color images of real objects, in which the training set contains 50,000 images and the testing set contains 10,000 images. The dataset has 10 categories that represent 10 different real objects (e.g., airplane, frog, truck, etc.).
For dividing the training set and testing set, we refer to the Protocol 2 of OCGAN [20], and the following are specific usage protocols of these datasets.
• KDDCUP99 dataset: We sampled 83,770 samples from all the normal samples in the dataset as the training set; the testing set includes the remaining 13,508 normal samples and all 3,377 anomalous samples, which account for 20% of the testing set.
• Image datasets: The strategy for dividing the training and testing sets is the same for all three image datasets. We treat each category in turn as the normal category and the remaining categories as the anomaly category, which yields ten child datasets. For example, the first child dataset of MNIST uses the digit 0 as the normal category and the remaining nine categories (i.e., digits 1-9) as the anomaly category. The training set of each child dataset contains all normal samples of the default training split, and the testing set contains 800 normal samples and 200 anomalous samples drawn from the default testing split, so that the anomalous samples account for 20% of the testing set.
The details of each dataset are shown in Table 2.
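The one-vs-rest split described above can be sketched as follows. The function and variable names are ours, and this is a simplified version of the protocol: it samples both test subsets at random from the same pool rather than from the default train/test splits.

```python
import numpy as np

def make_child_dataset(images, labels, normal_class, rng,
                       n_test_normal=800, n_test_anom=200):
    """One-vs-rest split: `normal_class` is normal, every other class is
    anomalous. Returns (train, test_x, test_y) with y=0 normal, y=1 anomaly."""
    train = images[labels == normal_class]
    anom_pool = images[labels != normal_class]
    test_normal = train[rng.choice(len(train), n_test_normal, replace=False)]
    test_anom = anom_pool[rng.choice(len(anom_pool), n_test_anom, replace=False)]
    test_x = np.concatenate([test_normal, test_anom])
    test_y = np.concatenate([np.zeros(n_test_normal), np.ones(n_test_anom)])
    return train, test_x, test_y

# Toy data: 5,000 samples, 10 balanced classes (500 each).
rng = np.random.default_rng(3)
images = rng.normal(size=(5000, 4))
labels = np.arange(5000) % 10
train, tx, ty = make_child_dataset(images, labels, normal_class=0, rng=rng,
                                   n_test_normal=80, n_test_anom=20)
assert train.shape[0] == 500      # all samples of the normal class
assert ty.mean() == 0.2           # anomalies are 20% of the test set
```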

B. BASELINE METHODS
To highlight the superiority of our proposed method, we compare it with several baseline methods. Here is a brief introduction to these baselines: • OCSVM [27]: OCSVM is an effective and frequently used anomaly detection method. In this method, the data samples are mapped to a high-dimensional feature space by a kernel function so that they aggregate better, and an optimal hyperplane is solved in the feature space to achieve the maximum separation of the target data from the origin.
• KDE [15]: Kernel density estimation is a non-parametric statistical method. The traditional method of anomaly detection using kernel density estimation is to directly estimate the density function of observed samples, while in our proposed method, it is to estimate the density of the low-dimensional features of normal samples.
• EGBAD [13]: EGBAD is an anomaly detection method based on BiGAN, with the idea of using reconstruction errors.
• GANomaly [11]: GANomaly is an anomaly detection method based on the adversarial idea and autoencoder, with the idea of using reconstruction errors.
• Dis-AE [12]: Dis-AE is an anomaly detection method based on O-GAN, and its detection criterion is the reconstruction error. Although our method is also based on O-GAN, we use probabilistic KDE as the detection criterion. To make our method comparable with these baselines, we made a considerable effort to keep the network layer settings, the hyperparameters, and the threshold selection criterion as identical as possible.

C. PARAMETERS SETTING
The parameters we need to set in advance include λ1, λ2, and the kernel function of the density estimator. For λ1 and λ2 in the objective function, we follow the settings of [10]: λ1 = 0.25, λ2 = 0.5. For the kernel function of the density estimator, the Gaussian kernel is selected and the bandwidth h is set to 1.
It is well known that the selection of the threshold α is very important in semi-supervised anomaly detection studies. However, the determination of the threshold was not a major focus of this study, so we followed the threshold settings used in [11], [12], [52].
For the structured dataset and the image datasets, we adopt different performance metrics. For the structured dataset (KDDCUP99), we use several metrics based on the confusion matrix, including accuracy, precision, recall, and F1 score. For the image datasets, AUROC is adopted.
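For the structured dataset, the confusion-matrix metrics can be computed as follows (anomalies taken as the positive class; a minimal sketch of the standard definitions, not tied to the paper's code):

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with anomalies (label 1)
    as the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])
acc, prec, rec, f1 = confusion_metrics(y_true, y_pred)
assert abs(acc - 4 / 6) < 1e-12
assert abs(prec - 2 / 3) < 1e-12
assert abs(rec - 2 / 3) < 1e-12
```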
The O-GAN is implemented in Keras [46], and the kernel density estimate is implemented in scikit-learn [45].
For more information on network structure, see the appendix.

D. PERFORMANCE COMPARISON
It is well known that in real-world problems, an algorithm's performance is crucial to whether it is adopted. In order to demonstrate the superiority of our method, we compare performance on the four datasets: KDDCUP99, MNIST, Fashion-MNIST, and CIFAR10. Table 3 and Figures 3-5 show the performance comparison results on these datasets, respectively. Table 3 shows the performance of the various methods on the KDDCUP99 dataset. Although OCSVM obtains a high precision, our method is the highest in accuracy, recall, and F1 score. Apparently, for structured data, our method is superior to the other GAN-based anomaly detection methods, because we use a detection criterion based on probabilistic statistics instead of one based on reconstruction. Therefore, if the data of a real-world problem is structured, our proposed method is an excellent choice for anomaly detection. Although our method did not achieve optimal performance on every child dataset, it performed well overall. By observing the average AUROC results (Figure 6), we can see that our method has the highest average AUROC on the three image datasets, and its average AUROC on the MNIST dataset is much higher than that of the other methods, which indicates that our method can also perform well in image anomaly detection.

E. FEATURE DIMENSION COMPARISON
The dimension of the feature plays a very important role in the effect of kernel density estimation: a dimension that is too high or too low may result in poor estimates. On the one hand, too low a dimension may not capture enough useful information; on the other hand, too high a dimension may make the information redundant. We therefore also investigated the influence of the output dimension of the O-GAN encoder on anomaly detection performance. We carried out experiments on the MNIST dataset and compared features of 32, 64, 128, 256, and 512 dimensions. The results are shown in Figure 7. We found that on the MNIST dataset, when the output dimension of the O-GAN encoder is 64, the kernel density estimation works best and the anomaly detection result is the most satisfactory. When the dimension is 32, the estimation is slightly worse than at 64. Beyond 64, the higher the output dimension of the encoder, the worse the performance of the kernel density estimation, and at 512 dimensions the result is unacceptable. Therefore, when using our method for anomaly detection, the feature dimension of the O-GAN encoder output should be chosen carefully. The best choice may differ from problem to problem, but it should not be too large.
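The deterioration at high dimensions reflects the curse of dimensionality in kernel density estimation, and the sweep itself is easy to reproduce in outline. The sketch below uses random codes as hypothetical stand-ins (in the real pipeline, each dimension would require retraining the O-GAN encoder with that output size); the bandwidth is illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
avg_lls = {}
for dim in (32, 64, 128, 256, 512):
    # Hypothetical stand-in codes; the real method would use O-GAN encoder
    # outputs of the given dimension here.
    codes = rng.normal(size=(400, dim))
    train, held_out = train_test_split(codes, test_size=0.25, random_state=0)
    kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(train)
    # Average held-out log-likelihood as a proxy for KDE quality.
    avg_lls[dim] = kde.score_samples(held_out).mean()
    print(dim, avg_lls[dim])
```

Even on this synthetic data, the held-out log-likelihood falls as the dimension grows, because a fixed number of training codes covers a high-dimensional space ever more sparsely; this is consistent with the degradation we observed at 256 and 512 dimensions.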

F. NETWORK STRUCTURE COMPARISON
Nowadays, numerous GAN-based anomaly detection methods have been developed, but most of them have very complex network structures. In order to show that our method (and other O-GAN-based methods [12]) is simpler and more efficient than other GAN-based methods, we compare the network structures of various GAN-based anomaly detection methods.
As shown in Table 4, GAN-based methods often have very complex network structures. For example, OCGAN has three generators or decoders, two discriminators, an encoder, and a classifier, which is overly complicated. Other GAN-based anomaly detection methods, such as f-AnoGAN, EGBAD, and MDGAN, add an encoder or autoencoder with an encoding function on top of a GAN, which also makes the network structure very complex. AnoGAN, the first GAN-based method used in the field of anomaly detection, only needs to train a conventional GAN (e.g., DCGAN), but it incurs considerable time overhead during the detection phase.
O-GAN provides a new idea: the encoder and the discriminator share most of their parameters, so that a conventional GAN can gain an encoding capability without major changes or additional training time. Dis-AE and our method ORGAN-KDE are preliminary explorations of applying O-GAN to anomaly detection. While maintaining accuracy, their network structures can also be very simple. At the same time, compared with Dis-AE, our method combines an advanced deep learning technique with a traditional probabilistic statistical method to improve performance.
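The parameter-sharing idea can be pictured as one feature trunk feeding two heads. The toy two-layer network below is a NumPy stand-in, not the actual convolutional architecture of O-GAN (all layer sizes are illustrative); it only shows why the encoder comes almost for free, since the bulk of the parameters (the trunk) serves both outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hidden, latent_dim = 784, 128, 64

# Shared trunk parameters, used by BOTH heads: this is the key to
# O-GAN-style encoding at near-zero extra cost.
W_trunk = rng.normal(scale=0.05, size=(in_dim, hidden))
# Discriminator head: a single real/fake score.
W_disc = rng.normal(scale=0.05, size=(hidden, 1))
# Encoder head: a latent code of dimension latent_dim.
W_enc = rng.normal(scale=0.05, size=(hidden, latent_dim))

def forward(x):
    h = np.maximum(x @ W_trunk, 0.0)   # shared features (ReLU)
    return h @ W_disc, h @ W_enc       # (discriminator score, latent code)

x = rng.normal(size=(2, in_dim))
score, code = forward(x)
print(score.shape, code.shape)         # (2, 1) and (2, 64)
```

Only the small head matrices are extra relative to a plain discriminator, which is why the encoding function adds little cost compared with attaching a full separate encoder network.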

VI. CONCLUSION
In this paper, we propose a semi-supervised anomaly detection integrated model based on O-GAN and kernel density estimation, referred to as ''ORGAN-KDE''. The method obtains the feature (or code) of a normal sample through the encoder of O-GAN and obtains the probability density function of normal-sample features through kernel density estimation. It thus combines one of the most promising generative models (i.e., GAN) with a traditional non-parametric statistical method (i.e., kernel density estimation) for anomaly detection. Experiments show that our method is effective for both structured data and image data.
Future work will consider how to integrate more emerging adversarial network structures with other excellent traditional machine learning or statistical methods, so as to achieve anomaly detection with a concise network structure.

APPENDIX
In this section, we present the network structure details of our proposed method used in the experiments, as shown in Tables 5-7.