Deep Reinforcement Learning Autoencoder with RA-GAN and GAN

ABSTRACT

The authors in [7] proposed a novel hybrid approach in which supervised training and reinforcement learning are applied to the receiver and the transmitter, respectively.
Notably, the above works consider only the least demanding AWGN channel case, and the systems were already burdened with a substantial amount of data transmission between the receiver and the transmitter. To overcome this issue, a number of channel-imitation schemes utilizing Deep Learning (DL) were introduced [10] [11] [12], in which additional modules are installed to imitate the channel in practice. In particular, DL models built on Deep Neural Networks (DNNs) can imitate arbitrary functions and are highly capable of solving large-scale and complex problems [13]. A DNN can either serve as a separate signal-processing module or be embedded in an existing module to improve its performance [14] [15] [16]. Additionally, it can replace a complex transceiver to remove the hardware burden when designers attempt to optimize the global performance of wireless networks from the end-to-end (E2E) viewpoint [17].
The received signal was imitated with a generative adversarial network (GAN) in [18]. A GAN has two components, a generator and a discriminator, both deployed as multi-layer DNNs. During training, the generator produces a fake received signal imitating the real one in order to establish and maintain the training process for the transmitter. Meanwhile, the discriminator trains the generator to ensure that the generated signal is as close as possible to the real one. Thereby, a bridge is built for backpropagation (BP) of the gradient to the transmitter. Papers [19] [20] [21] [22] [23] have shown that any arbitrary channel can be imitated with this method and that the hardware complexity of the transceiver can be significantly reduced. However, the method has two drawbacks: a possible gradient-vanishing problem with a multi-layer generator, and an over-fitting problem during iterative training. These problems affect the imitation accuracy of the GAN, which degrades the E2E learning and thus the system performance [24].
A wireless system is conventionally divided into separate modules, i.e., source and channel encoder/decoder, (de)modulator, etc. [17]. Therefore, the nonlinear behavior of the whole system cannot be expressed simply in closed form, and optimizing the individual modules does not guarantee optimization of the global system [25]. It was proven in [26] that designing modulation and coding separately is sub-optimal, and the same principle applies to the other components as well, which hinders the global optimization of conventional wireless networks. Thus, there is a need to change the design paradigm.
One approach to optimizing a communication system is to use end-to-end (E2E) learning. However, an E2E learning communication system can only be trained via BP when the channel is known [3]. Thus, it is natural to deploy a GAN, which is capable of imitating any arbitrary channel and effectively reduces the complexity of the transceiver hardware [27] [28] [29]. The GAN, however, faces its own inherent problems: gradient vanishing (when the calculation is passed through multi-layer NN structures) and over-fitting (when attempting to optimize several modules and their weights at once). Both problems degrade the system performance.
As a solution, we propose in this paper a residual-aided GAN (RA-GAN) scheme, in which the generator's structure is modified with the residual neural network (ResNet) studied in [30]. Instead of generating the received signal directly as in GAN, the generator in RA-GAN generates the difference between the signals that the system transmits and receives. Specifically, instead of a layer-by-layer path, a skip connection is deployed so that the input and output layers of the generator are connected. Moreover, the loss function of RA-GAN is modified by adding a regularizer so as to mitigate the over-fitting issue of GAN. The benefits of these modifications are summarized as the contributions of the study: • The generator's structure in the conventional GAN-based system is modified using ResNet to counter the gradient-vanishing and over-fitting problems.
• By employing a skip connection to link the input and output, it is possible to produce an extra gradient (through the difference between the signals that the system transmits and receives) as a counteraction to the gradient-vanishing issue. By adding a regularizer to the loss function of RA-GAN, which is not computationally complex, the representation ability of the training scheme is limited so that the over-fitting issue can be mitigated.
• In comparison with the GAN scheme, the fake signal that RA-GAN generates is closer to the real received signal, proving that the proposed residual generator outperforms the conventional one.
The rest of the paper is organized as follows. Following the Introduction in Section I, the GAN and RA-GAN training schemes for an E2E learning communication system are reported in Section II. Section III reports the simulation results. Finally, the paper is concluded in Section IV.
The E2E system consists of the transmitter T, an intermediate channel, and the receiver R. Two multi-layer NNs T and R are deployed with trainable weights θ_T and θ_R, respectively. At first, a message m input to T is mapped to a one-hot vector 1_m. Notably, 1_m is an M-dimensional vector taken from the message set M, in which the m-th element is 1 and the remaining M-1 elements are 0. Next, T is deployed as a mapping f_{θ_T}: M → C^n to map the one-hot vector to the signal x ∈ C^n so that it can be sent over the channel. Similarly, R implements a mapping f_{θ_R}: C^n → {p ∈ R_+^M | Σ_{i=1}^M p_i = 1}, which maps the received signal y ∈ C^n to a probability vector p ∈ R_+^M. Subsequently, we obtain the final decision m̂, corresponding to the input m, with m̂ being the index of the maximum value in the probability vector p. It should be noted that the transmitter's hardware imposes a power constraint on the signal x, that is, ||x||^2 = 1. The final aim is to recover m̂ as accurately as possible from the signal the system receives.
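The transmitter/receiver chain above can be sketched as follows. This is our own minimal NumPy illustration, not the authors' code: the weights, layer shapes, and the noiseless loop-back are all hypothetical, and complex signals are represented by stacked real and imaginary parts.

```python
import numpy as np

M, n = 16, 7  # message-set size and number of channel uses

rng = np.random.default_rng(0)
W_T = rng.normal(size=(M, 2 * n))   # toy transmitter weights (real+imag parts)
W_R = rng.normal(size=(2 * n, M))   # toy receiver weights

def one_hot(m, M=M):
    """Map message index m to the M-dimensional one-hot vector 1_m."""
    v = np.zeros(M)
    v[m] = 1.0
    return v

def transmit(m):
    """Map message m to a signal with unit power, ||x||^2 = 1."""
    x = one_hot(m) @ W_T
    return x / np.linalg.norm(x)    # enforce the power constraint

def receive(y):
    """Map received signal y to a probability vector p via softmax."""
    logits = y @ W_R
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = transmit(3)
p = receive(x)                       # noiseless loop-back for illustration
m_hat = int(np.argmax(p))            # final decision
```

In a trained system the argmax decision m_hat would recover the sent message; here the random weights only illustrate the shapes of the mappings.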

Method
The received signal is given by

y = h ⊙ x + n,   (1)

where h and n are the channel and the Gaussian noise, respectively, both drawn from the standard complex Gaussian distribution CN(0, 1). Following [9], we use the cross-entropy loss to compare the transmitted message m and the recovered probability vector p:

L(θ_T, θ_R; m) = -Σ_{i=1}^{M} [1_m]_i log(p_i).   (2)

To acquire the optimal weights θ_T* and θ_R* for T and R, respectively, we apply the backpropagation (BP) algorithm to the loss function L(θ_T, θ_R; m) in (2) to calculate the gradients. Nonetheless, from (2) we can only update θ_R, with the gradient

∇_{θ_R} L = (∂L/∂f_{θ_R}(y)) ∇_{θ_R} f_{θ_R}(y),   (3)

where L is the loss-function approximation derived from (2) and, for functions u ∈ R^a and v ∈ R^b of a common variable, ∂u/∂v denotes the corresponding Jacobian (gradient) matrix. To boost the performance of the system, we have to optimize θ_T as well [2]. Nevertheless, the gradient ∇_{θ_T} L cannot be obtained, as proven in [4], because the backpropagation is obstructed by the unknown channel:

∇_{θ_T} L = (∂L/∂y) (∂y/∂x) ∇_{θ_T} f_{θ_T}(1_m),   (5)

where the Jacobian ∂y/∂x is unavailable since the channel is unknown (for the pure AWGN case it reduces to the identity matrix I_n of size n).
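The cross-entropy loss in (2) can be illustrated numerically. The sketch below is ours (a real-valued AWGN toy with hypothetical dimensions, not the paper's setup): it draws a channel realization y = x + n and evaluates the loss between the one-hot label and a softmax probability vector.

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_entropy(one_hot_m, p):
    """L = -sum_i [1_m]_i * log(p_i): only the true-class term survives."""
    return -float(one_hot_m @ np.log(p + 1e-12))

M, n, sigma2 = 4, 2, 0.1          # toy message-set size, channel uses, noise var
m = 2
label = np.eye(M)[m]              # the one-hot vector 1_m

x = rng.normal(size=2 * n)
x /= np.linalg.norm(x)            # unit-power transmit signal
y = x + np.sqrt(sigma2) * rng.normal(size=2 * n)   # AWGN channel realization

logits = y @ rng.normal(size=(2 * n, M))           # untrained toy receiver
p = np.exp(logits - logits.max())
p /= p.sum()
loss = cross_entropy(label, p)
```

The loss is zero only when the receiver puts all probability mass on the true message, which is exactly what training θ_R via (3) drives toward.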

Generative Adversarial Network (GAN) scheme
To overcome this, a GAN, illustrated in Fig. 2, was used to generate a surrogate gradient to update θ_T [19]. In general, a GAN includes a generator G connected to a multi-layer NN discriminator D, with trainable weights symbolized by θ_G and θ_D, respectively. We use the generator g_{θ_G}: (x, z) → ŷ to produce the fake received signal ŷ from the transmitted signal x and a random noise vector z drawn from the standard Gaussian distribution. For simplicity, we consider z as a built-in variable of G. Meanwhile, the discriminator d_{θ_D}: y → (0, 1) is deployed in training to ensure that the difference between the fake signal and the real received signal is minimal.
On the other hand, D is used to distinguish the real received signal from the fake one. Specifically, if the input to D comes from the real received-signal distribution p(y|x), the output of D should be 1. Conversely, the output is expected to be 0 for inputs from the fake received-signal distribution p̂(ŷ|x).
As for G, to ensure the real and fake received signals are similar, it has to adjust its own output ŷ so that, after feeding ŷ into D, the output D(ŷ) is as close to 1 as possible.
With regard to the GAN training procedure, the weights θ_D of D are updated according to the loss function on the real and fake received inputs:

L(θ_D) = -log d_{θ_D}(y) - log(1 - d_{θ_D}(ŷ)).   (6)

Correspondingly, the weights θ_G of G are alternately updated according to the loss function on the fake received input ŷ:

L(θ_G) = -log d_{θ_D}(ŷ).   (7)

Consequently, we can calculate the gradients ∇_{θ_D} L(θ_D) and ∇_{θ_G} L(θ_G). The loss functions (6) and (7) can be minimized with the Adam gradient-descent algorithm [27], [28], [29]. Because G can be trained to reproduce the real received signal, we have to ensure that the surrogate gradient passed back through G,

∇_{θ_T} L ≈ (∂L/∂ŷ) (∂ŷ/∂x) ∇_{θ_T} f_{θ_T}(1_m),   (8)

is as close as possible to the true gradient in (5). However, as mentioned in [24], training instability limits the performance of GAN, resulting in a dramatic degradation of the whole system. As aforementioned, the causes of this are gradient vanishing and over-fitting, which are addressed by the proposed RA-GAN training scheme.
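The alternating losses (6) and (7) can be sketched as below. This is a deliberately simplified illustration of ours: the "discriminator" is a hypothetical hand-written scoring function rather than a trained NN, and no gradient updates are performed.

```python
import numpy as np

def d_loss(D, y, y_fake):
    """Discriminator loss (6): -log D(y) - log(1 - D(y_fake))."""
    return -np.log(D(y)) - np.log(1.0 - D(y_fake))

def g_loss(D, y_fake):
    """Generator loss (7): -log D(y_fake), minimized when D is fooled."""
    return -np.log(D(y_fake))

# A toy discriminator (for illustration only) that scores how close a
# signal is to the all-zeros "real" signal.
D = lambda s: 1.0 / (1.0 + np.linalg.norm(s))

y_real = np.zeros(3)
y_far = np.ones(3)              # a poor fake, far from the real signal
y_near = 0.1 * np.ones(3)       # a better fake, close to the real signal
```

In actual training the two losses are minimized alternately (e.g. with Adam): a fake that moves closer to the real signal lowers the generator loss, which is the mechanism the surrogate gradient (8) relies on.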

RA-GAN scheme
Because of the unknown channel, training the transmitter is a demanding task. Under the GAN training scheme, we can obtain a surrogate gradient using (8) to update θ_T. Nonetheless, the output distribution p̂(ŷ|x) deviates considerably from p(y|x) owing to the gradient-vanishing and over-fitting problems, which motivates the proposed RA-GAN training scheme.

Residual learning to alleviate the gradient vanishing issue
In the traditional GAN, the variables are fed forward through the layers of the multi-layer generator G and eventually output as fake samples. However, the more layers G has, the smaller the gradient becomes, because it is multiplied by the partial derivatives of the loss function as it is passed back through the layers in the BP algorithm. If a partial derivative is close to 0, the gradient vanishes, hampering the training process of G. Inspired by the residual learning in [20], a connection that skips the in-between layers and links the input and output of G is designed. Correspondingly, the residual generating function g_r: x → ŷ can be formulated as

ŷ = g_r(x) = x + g_{θ_G}(x),   (9)

where x and ŷ denote the transmitted and the generated signals, respectively. Additionally, the residual generator g_{θ_G}(x) is implemented to learn the difference between the transmitted and received signals according to the conditional input x.
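The residual generating function ŷ = x + g(x, z) can be sketched as follows. This is a hypothetical one-layer toy of ours (the real generator is a multi-layer NN); it only shows how the skip connection routes the input straight to the output.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(scale=0.01, size=(4, 4))   # tiny weights for a toy "generator" g

def g(x, z):
    """Residual branch: models the difference between received and sent."""
    return np.tanh((x + z) @ W)

def residual_generator(x, z):
    """y_hat = x + g(x, z): skip connection from input to output."""
    return x + g(x, z)

x = np.ones(4)
z = rng.normal(size=4)
y_hat = residual_generator(x, z)
# With near-zero generator weights the output stays close to x, so the
# gradient path through the skip connection never vanishes.
```

Because the identity path bypasses g entirely, the generator only has to learn the (usually small) channel-induced perturbation y - x rather than the full received signal.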
Then, the gradient for updating the transmitter weights θ_T in the RA-GAN scheme can be calculated as

∇_{θ_T} L ≈ (∂L/∂ŷ) (I_n + ∂g_{θ_G}(x)/∂x) ∇_{θ_T} f_{θ_T}(1_m),   (10)

where I_n is the identity matrix of size n contributed by the skip connection; this term provides the extra gradient that counteracts the vanishing.
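Why the identity term contributed by the skip connection helps can be seen numerically. In this toy check of ours (assumed uniform layer Jacobians, which real networks do not have), the product of many small plain-GAN Jacobians collapses to zero, while the residual products (I + J) stay well-scaled:

```python
import numpy as np

n, depth = 4, 30
J = 0.1 * np.eye(n)                       # each layer's (small) toy Jacobian

# Plain multi-layer generator: the backpropagated factor is J^depth.
plain = np.linalg.matrix_power(J, depth)

# Residual layers: the factor is (I + J)^depth, kept alive by the identity.
residual = np.linalg.matrix_power(np.eye(n) + J, depth)
```

Here the plain product is on the order of 1e-30 while the residual product remains larger than 1, which mirrors the role of the I_n term in the RA-GAN transmitter gradient.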

Regularization method to mitigate overfitting
The loss function is reconstructed to overcome the over-fitting problem in the E2E learning of the system. As aforementioned, when G and D are added to the system for GAN training, the number of additional trainable weights increases significantly, leading to an over-grown representation ability and thus over-fitting. To mitigate this, a regularizer can be added to the loss function. Unlike the GAN, this regularizer allows the RA-GAN scheme to recover the real received signal with the help of the regularization method. In particular, to restrict the representation ability, we add a weight penalty Ω(θ) to the loss function of the RA-GAN scheme:

L̃(θ) = L(θ) + λ Ω(θ),   (11)

where L̃(θ) and L(θ) denote the reconstructed and the original loss functions, respectively, and the hyper-parameter λ balances Ω(θ) against L(θ). The training process is separated into three primary steps: 1) initializing the weights; 2) generating the fake and real signals; 3) training the network weights iteratively. It is worth noting that iterative training is applied to T, R, G, and D in this system: the parameters of one module are not changed by the others until it has converged. Additionally, the actual signal transmission is only performed once the parameters of T and R are returned.
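The regularized loss can be sketched in a few lines. We assume here, for illustration, an L2 penalty Ω(θ) = ||θ||²; the paper does not spell out the exact form of Ω, so this choice is ours:

```python
import numpy as np

def regularized_loss(base_loss, weights, lam=0.01):
    """L~(theta) = L(theta) + lam * Omega(theta), with an assumed L2 penalty
    Omega(theta) = sum of squared weights, limiting representation ability."""
    omega = sum(float(np.sum(w ** 2)) for w in weights)
    return base_loss + lam * omega

weights = [np.ones((2, 2)), np.ones(3)]            # toy trainable weights
loss = regularized_loss(1.0, weights, lam=0.01)    # penalty 0.01 * (4 + 3)
```

Because the penalty only adds a sum of squares over the existing weights, it introduces negligible computational cost, consistent with the claim that the regularizer is not computationally complex.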

Results and Discussion
Here, we present the performance analysis of the RA-GAN scheme over the AWGN channel. To evaluate the data-transmission ability, we focus on the block error rate (BLER). Subsequently, we compare the RA-GAN with the conventional GAN scheme in [19], while taking the training method with a known channel as the optimal performance bound.
Specifically, in the optimal case, T is assumed to know the real channel, making the gradient ∇_{θ_T} L in (10) available for training. Moreover, the ability of the RA-GAN and the conventional GAN to generate fake received signals is compared. The simulation parameters are M = 16, n = 7, B = 320, N_train = 10000, λ = 0.01, and the noise variance σ² set as in [2]. The simulations are conducted in Matlab with the formulations derived previously in this paper.
We used a dataset of 100,000 randomized one-hot vectors to validate the BLER performance of the system trained at SNR = 5 dB. As can be observed in Fig. 4, the GAN-based scheme [19] performs significantly worse than the optimal case (with known channel), owing to the aforementioned gradient-vanishing and over-fitting problems during GAN training. In contrast, the RA-GAN scheme performs almost identically to the optimal training method. This shows how effectively the problems of the GAN system are mitigated by employing the skip connection in the generator and the regularizer in the loss function. In Fig. 5 and Fig. 6, we plot the loss functions against the training epoch for the AWGN channel at SNR = 5 dB. Fig. 5 shows that the GAN scheme cannot converge, while the RA-GAN in Fig. 6 can. It should be noted that, owing to training randomness, some bad points occur during training; however, they recover in the next epoch. Therefore, it can be concluded that the RA-GAN scheme is able to generate signals that better fit the real received one; in other words, the trained residual generator outperforms the conventional one in terms of generation performance. From the results, it can be observed that the proposed RA-GAN-based system performs remarkably better than the conventional GAN, which is confirmed by the sensitivity analysis that was conducted. Future work can consider different datasets to assess how consistent the RA-GAN scheme is. Moreover, it is suggested to investigate other deep-learning networks, such as deep ensemble learning [31], and compare their performance to the basic GAN and RA-GAN in this paper. Besides, a broader and more comprehensive view on the application of DL in wireless networks from both the software and hardware perspectives [32] can inspire future research.
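The BLER evaluation described above amounts to drawing random messages, running each through the trained chain, and counting message errors. The sketch below is our own toy version (the `decide` functions stand in for a hypothetical trained transmit-channel-receive pipeline):

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate_bler(decide, M, num_blocks):
    """BLER = fraction of messages whose decision m_hat differs from m."""
    msgs = rng.integers(0, M, size=num_blocks)
    errors = sum(decide(m) != m for m in msgs)
    return errors / num_blocks

perfect = lambda m: m               # a toy error-free "system"
noisy = lambda m: (m + 1) % 16      # a toy system that always errs

bler_perfect = estimate_bler(perfect, 16, 1000)
bler_noisy = estimate_bler(noisy, 16, 1000)
```

With M = 16 and 100,000 test vectors as in the paper, the same loop would produce the BLER points plotted in Fig. 4.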

Conclusion
Firstly, the definition of an E2E learning system was introduced together with its challenges, one of which is that the transmitter can only be trained with a known channel. Based on that, we improved the GAN scheme into the so-called RA-GAN scheme. The two primary problems associated with the conventional GAN, gradient vanishing and over-fitting, were solved: RA-GAN delivers more robust gradients (through the skip connection between the input and output of the generator) and helps to control the representation ability (through the regularizer added to the loss function). In terms of BLER, the simulation results show that the RA-GAN performs comparably to the optimal method, and both outperform the conventional GAN.
Declarations
Author contribution. All authors contributed equally to this paper. All authors read and approved the final paper.
Funding statement. None of the authors have received any funding or grants from any institution or funding body for this research.
Conflict of interest. The authors declare no conflict of interest.

Additional information.
No additional information is available for this paper.