Deep Learning–Based Energy Beamforming With Transmit Power Control in Wireless Powered Communication Networks

In this paper, we propose deep learning–based energy beamforming for a multi-antennae wireless powered communication network (WPCN). We consider a WPCN in which a hybrid access point (HAP) equipped with multiple antennae broadcasts an energy-bearing signal to wireless devices using energy beamforming. We investigate the joint optimization of the time allocation for wireless energy transfer (WET) and wireless information transfer (WIT) together with the design of the energy beams, while minimizing the transmit power at the HAP for efficient use of its available resources. However, this problem is non-convex and numerically intractable. The traditional approach in the literature is an iterative algorithm that incurs high computational and time complexity, which is infeasible for real-time applications. We study and analyze a deep neural network (DNN)-based scheme and propose a faster, more efficient approach that fairly approximates the near-optimal solution to this problem. To train the proposed DNN, we acquire training data samples from a sequential parametric convex approximation (SPCA)-based iterative algorithm. Although acquiring the data samples and training the DNN is computationally expensive, the training is performed offline, so the trained DNN can provide a fast solution to the real-time resource allocation problem. Through simulation results, we show that the proposed DNN scheme fairly approximates the traditional SPCA method with low computational and time complexity.


I. INTRODUCTION
In recent decades, radio frequency (RF)-enabled energy harvesting (EH) technology has become an attractive solution for battery-limited wireless devices. In wireless sensor networks (WSNs), embedded wireless sensor devices have a limited battery life, and battery replacement is infeasible due to its high cost. As a result, RF-enabled wireless power transfer (WPT) provides a promising solution for such networks, where wireless devices (WDs) are continuously powered by RF signals [1], [2]. Moreover, simultaneous wireless information and power transfer (SWIPT) and wireless powered communication networks (WPCNs) are the two main applications of WPT systems studied in the literature. In a SWIPT system [3], data and energy signals are transmitted simultaneously to low-power wireless devices using a power- or time-switching mode. In contrast, a WPCN follows the harvest-then-transmit (HTT) protocol, where a hybrid access point (HAP) transmits an RF energy signal to WDs on the forward link and, after harvesting energy, the WDs send information signals to the HAP on the reverse link by utilizing the harvested energy. Recently, WPCNs have been presented as an emerging technology for large-scale WSNs in low-power industrial Internet of Things (IIoT) systems [4].
Under the HTT protocol, we assume the WDs have no energy sources and are equipped with rechargeable batteries that need to replenish energy from the RF signal transmitted by the HAP. In time division duplex (TDD) mode, the HAP generally transmits an energy signal, and the WDs send information signals in allocated time slots. In order to maximize the efficiency of the WPCN, it is necessary to optimize the time allocation for wireless energy transfer (WET) and wireless information transfer (WIT), because the harvested energy and the achievable throughput rate of the WDs are functions of the time allocation [5], [6]. To realize WET practically, a large amount of energy must be transmitted to harvest only a small amount, because RF energy signals attenuate quickly over distance. However, many recent studies and advancements in designing highly efficient antennae improve the efficiency of WET [7], [8]. In addition, WET and WIT efficiencies can be enhanced by using multiple antennae at the HAP. With multiple antennae, we can properly design energy beams directed towards the WDs, called transmit beamforming or energy beamforming, which increases the energy and throughput efficiencies of the WPCN [9]-[11].
In this paper, we jointly optimize the time allocations and energy beams for multi-antennae WPCNs. Moreover, for efficient use of the available HAP resources, we also optimize the transmit power while meeting the minimum requirements for the harvested energy and the achievable throughput of each wireless device. WPCNs have been investigated for various scenarios in the literature that present efficient solutions for WET and WIT in wireless networks [5], [9], [10], [12]-[19]. These studies jointly optimized the resource allocations while maximizing WPCN efficiency. In [5], Ju and Zhang studied throughput maximization in a WPCN, for which they proposed a harvest-then-transmit protocol. They presented an optimal solution for throughput by optimizing the time allocations for each user in a WPCN. They also discussed the doubly near-far problem and proposed a solution by optimizing the common throughput. The authors of [9], [10], [13], [14] proposed energy beamforming solutions to maximize the efficiency of WET. They presented optimal solutions for resource allocation in multi-antennae WPCNs. Most of these solutions use numerical optimization to solve network optimization problems. In our previous work [18], we formulated the transmit power minimization problem by jointly optimizing the time allocations on the downlink and uplink, the power allocations on the uplink, and the WET energy beams on the downlink. However, this problem is non-convex and numerically intractable. We solved it by using an iterative algorithm based on sequential parametric convex approximation (SPCA). The SPCA iterative algorithm provides a near-optimal solution, but it is costly in terms of computational and time complexity. Moreover, it is difficult to implement such iterative algorithms in real-time applications.
Recently, deep learning-based approaches have been widely adopted to solve resource-management problems in wireless networks [20]-[29]. In [20], Sun et al. designed a deep neural network (DNN)-based algorithm for the power control problem in interference-limited wireless networks. They showed that DNNs can fairly approximate the weighted minimum mean square error (WMMSE) algorithm with high sum-rate performance and low computational complexity. In [21], Kang et al. proposed a deep autoencoder-based scheme to learn the channel parameters at the energy transmitter in energy harvesting networks. The proposed scheme performed better than the Gerchberg-Saxton method. For resource allocation in wireless device-to-device (D2D) communication, the authors of [22], [27] proposed deep learning-based networks to control the transmit power at the transmitters in order to maximize the spectral efficiency while keeping the interference at a low level. In [23], He et al. discussed a learning-based wireless powered secure network in which they maximize the effective secrecy throughput of the wireless powered system. Similarly, [24]-[28] presented deep learning-based schemes as faster and less computationally complex alternatives to conventional algorithms.
In this paper, we propose a deep learning-based scheme for optimal resource management in a WPCN. With a supervised deep learning technique, a DNN is trained to learn the mapping between the input and output variables of the optimization problem. To solve the network resource management optimization problem using supervised deep learning, a large data set is needed to train the DNN. For this, we acquired data samples by solving the network model using traditional methods. Then, we trained the DNN using data generated from a mathematical model of the network, as shown in Fig. 1. Acquiring a large training data set from the traditional optimization technique requires high computational complexity, but the data generation and DNN training can be performed offline. Thus, this complexity can be afforded, and the trained DNN can provide a less complex solution for real-time applications.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we describe a multi-antennae WPCN, as illustrated in Fig. 2. We consider a hybrid access point (HAP) equipped with multiple antennae, i.e., M > 1, that transmits energy signals on a forward link (FL) and receives information signals on a reverse link (RL). The considered network consists of K fixed wireless devices (WDs), denoted by {WD_1, WD_2, . . ., WD_K}, where each device is equipped with a single antenna; h_k ∈ C^{M×1} and g_k ∈ C^{M×1} denote the communication channels between the HAP and the k-th wireless device on the FL and RL, respectively. All the channel properties remain constant in one coherence interval. In addition, we assume that the channel properties are known at the transmitter, as is commonly assumed in several papers [5], [9], [30], [31]. All the WDs have embedded energy storage equipment in the form of rechargeable batteries to store the energy harvested from the energy signals. In our proposed WPCN system, we assume the WDs follow the harvest-then-transmit protocol. In addition, the WDs have no embedded energy sources, and send information signals only by utilizing the energy harvested from the HAP. The proposed WPCN operates in a time division duplex (TDD) manner, as shown in Fig. 3. At the start of the coherence interval, the HAP sends a control signal that comprises the resource allocation information for each wireless device. We omit further discussion of this control signal transmission (CST) phase because it is fixed and does not affect the other phases of the coherence interval. In the WET phase, all WDs harvest energy from the energy-carrying signal broadcast by the HAP on the FL in τ_0 T seconds. In the WIT phase, the WDs transmit information-bearing signals on the RL by utilizing the harvested energy in the K allocated time sub-slots, denoted τ_1, τ_2, . . ., τ_K in Fig. 3. We assume a small fixed guard interval between every two consecutive time slots to avoid multi-user interference [32].
Without loss of generality, the total coherence time is normalized to 1 for convenience. Since the proposed system operates in the TDD manner, channel reciprocity holds for the forward and reverse links.

A. WET ON FL
During the WET phase, the HAP transmits an arbitrary energy signal in the form of beams directed towards the WDs with transmit power P_A in time τ_0. We denote the energy signal as s_e = α s_0, where s_0 is an independent and identically distributed (i.i.d.) random signal with zero mean and unit variance, and α ∈ C^{M×1} denotes the energy beamforming vector at the HAP. There is a sum power constraint at the HAP: E[||s_e||^2] = ||α||^2 ≤ P_A. The energy signal received at the k-th wireless device on the FL is expressed as

y_k = h_k^H s_e + ω_k,    (1)

where ω_k is the noise at the k-th wireless device. We assume that the energy harvested from receiver noise is negligible compared to the energy harvested from the energy signal, so we ignore receiver noise in further problem formulations. Then, the amount of energy harvested by the k-th WD is expressed as

E_k = ρ_k τ_0 |h_k^H α|^2,    (2)

where ρ_k ∈ (0, 1) for k = 1, 2, 3, . . ., K denotes the energy-harvesting efficiency at the k-th WD.
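The FL energy model above can be sketched numerically. The following minimal numpy sketch evaluates (2) for a beam that satisfies the sum power constraint; the channel realizations, power budget, and time split are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative sketch of the WET model: E_k = rho_k * tau_0 * |h_k^H alpha|^2.
rng = np.random.default_rng(0)
M, K = 4, 2                    # antennae at the HAP, wireless devices (assumed)
P_A = 1.0                      # transmit power budget at the HAP (assumed)
tau_0 = 0.5                    # WET fraction of the normalized frame (assumed)
rho = 0.5                      # energy-harvesting efficiency

# Random i.i.d. CSCG channels, one row per device.
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# A simple feasible beam: match device 0's channel, scaled so ||alpha||^2 = P_A.
alpha = np.sqrt(P_A) * H[0] / np.linalg.norm(H[0])

def harvested_energy(h, alpha, tau_0, rho):
    """Energy harvested over the WET slot from beam alpha on channel h, per (2)."""
    return rho * tau_0 * np.abs(h.conj() @ alpha) ** 2

E = np.array([harvested_energy(H[k], alpha, tau_0, rho) for k in range(K)])
```

For the channel-matched device, the harvested energy reduces to rho * tau_0 * P_A * ||h||^2, the maximum allowed by the power budget.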

B. WIT ON RL
During the WIT phase, the WDs send information signals to the HAP in the time slots allocated to them by utilizing the energy harvested from the HAP. Every k-th WD sends an information-bearing signal, s_I = √(p_k) s_k, in allocated slot τ_k at power level p_k, where s_k denotes the information-bearing signal of the k-th WD, which is assumed to be an i.i.d. circularly symmetric complex Gaussian (CSCG) random variable with zero mean and unit variance. In the WIT phase, p_k is limited by the energy harvested from the HAP and the circuit energy consumption. The signal received by the HAP in allocated slot τ_k on the RL is expressed as

y_k = √(p_k) g_k s_k + z,    (3)

where z denotes the noise at the HAP with power N_0. The achievable throughput (in bits per second per hertz) for the k-th WD in the k-th time slot is expressed as

R_k = τ_k log_2 ( 1 + p_k |β_k^H g_k|^2 / (N_0 ||β_k||^2) ),    (4)

where β_k ∈ C^{M×1} denotes the receive beamforming vector for decoding the information signal at the HAP. Then, we consider maximum-ratio combining (MRC) beamforming at the HAP such that β_k = h_k / ||h_k||. Hence, with channel reciprocity, (4) can be simplified as follows:

R_k = τ_k log_2 ( 1 + p_k ||h_k||^2 / N_0 ).    (5)

In this paper, our main goal is to design the transmit beamforming vector while minimizing the transmit power. We observe from (2) and (5) that the energy harvested by the k-th WD and the achievable throughput are direct functions of the transmit power at the HAP. When we increase the transmit power, the WDs harvest more energy, and hence more power is available to send information signals on the RL, which increases the achievable throughput of the WDs. However, limited power is available at the HAP, so it is necessary to design the WPCN parameters for efficient use of the resources available at the HAP.
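The simplified MRC rate of (5) is straightforward to evaluate. The following numpy sketch computes the per-device throughput; the channels, sub-slot durations, powers, and noise power are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch of the RL throughput under MRC:
# R_k = tau_k * log2(1 + p_k * ||h_k||^2 / N0), as in (5).
rng = np.random.default_rng(1)
M, K = 4, 2                    # assumed dimensions
N0 = 1e-3                      # noise power at the HAP (assumed)
tau = np.array([0.3, 0.2])     # WIT sub-slots (assumed)
p = np.array([0.05, 0.08])     # device transmit powers (assumed)

H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

def throughput(tau_k, p_k, h_k, N0):
    """Achievable rate (bits/s/Hz) of one device in its allocated sub-slot."""
    return tau_k * np.log2(1.0 + p_k * np.linalg.norm(h_k) ** 2 / N0)

R = np.array([throughput(tau[k], p[k], H[k], N0) for k in range(K)])
```

As the text notes, the rate grows monotonically with p_k, which is why more harvested energy on the FL translates into higher RL throughput.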
Here, we jointly optimize the time and power allocations with an optimal design of the transmit beamforming vectors to minimize the transmit power at the HAP while keeping the harvested energy and achievable throughput above specified thresholds. We formulate the optimization problem as follows:

min_{α, τ_0, {τ_k}, {p_k}}  ||α||^2    (6a)
s.t.  E_k ≥ e_k, ∀k,    (6b)
      R_k ≥ Γ_k, ∀k,    (6c)
      τ_k p_k ≤ E_k − E_k^(c), ∀k,    (6d)
      τ_0 + Σ_{k=1}^{K} τ_k ≤ 1,    (6e)
      p_k ≥ 0, ∀k,    (6f)
      τ_0 ≥ 0, τ_k ≥ 0, ∀k,    (6g)

where Γ_k and e_k are the minimum requirements for achievable throughput and harvested energy, respectively. In (6), we jointly optimize the power and time allocations on the RL and FL, as well as the transmit beamforming vectors, while minimizing the transmit power at the HAP. Constraints (6b) and (6c) ensure that the harvested energy and achievable throughput are above the specified thresholds; (6d) shows that the power available to send information is limited by the energy harvested from the HAP, because the WDs have no other source of energy. We assume E_k^(c) = 0, ∀k, to focus on transmit power allocation for the wireless devices. Constraint (6e) normalizes the time over the total coherence interval.
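Although solving Problem (6) requires the iterative method discussed later, checking whether a candidate point is feasible is simple. The following numpy sketch verifies the power budget and constraints (6b)-(6e) for one hypothetical candidate; it uses the simplified MRC rate, assumes E_k^(c) = 0 as in the text, and all numerical values are illustrative:

```python
import numpy as np

def feasible(alpha, tau0, tau, p, H, N0, P_A, e_min, r_min, rho):
    """Check a candidate (alpha, tau0, {tau_k}, {p_k}) against Problem (6)."""
    E = rho * tau0 * np.abs(H.conj() @ alpha) ** 2          # harvested energy per WD
    R = tau * np.log2(1.0 + p * np.linalg.norm(H, axis=1) ** 2 / N0)
    return bool(np.linalg.norm(alpha) ** 2 <= P_A + 1e-9    # HAP power budget
                and (E >= e_min - 1e-12).all()              # (6b) energy thresholds
                and (R >= r_min).all()                      # (6c) rate thresholds
                and (tau * p <= E + 1e-12).all()            # (6d) energy causality
                and tau0 + tau.sum() <= 1.0 + 1e-9)         # (6e) time normalization

# Hypothetical candidate point (all values are assumptions for illustration).
rng = np.random.default_rng(5)
M, K = 4, 2
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P_A, N0, rho = 1.0, 1e-3, 0.5
alpha = np.sqrt(P_A) * H[0] / np.linalg.norm(H[0])
tau0, tau = 0.5, np.array([0.2, 0.2])
E = rho * tau0 * np.abs(H.conj() @ alpha) ** 2
p = 0.9 * E / tau                  # spend 90% of the harvested energy, so (6d) holds
ok = feasible(alpha, tau0, tau, p, H, N0, P_A, 0.0, 0.0, rho)
```

A checker like this is also a convenient sanity test on the labels produced by the SPCA data-generation step.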

III. DEEP LEARNING BASED ENERGY BEAMFORMING
In this section, we propose a deep learning-based algorithm to solve Problem (6) effectively. In particular, we discuss a supervised learning (SL)-based deep neural network (DNN) that learns the mapping from input variables to output optimization variables from the training data. From Problem (6), we specify the channel gains as the input features fed to the DNN, while the power allocation, time allocation, and beamforming vectors are the outputs of the proposed model.

A. NETWORK STRUCTURE
As shown in Fig. 4, we consider a fully connected DNN that consists of one input layer, multiple hidden layers, and multiple branch output layers. The output of every q-th hidden layer can be expressed as

x_q = o(θ_q x_{q−1} + b_q),

where {θ_q} and {b_q} are the weights and biases of the q-th layer. Learning is the problem of finding the weights, θ, within some feasible set that lead to the optimal solution of Problem (6). Here, o(·) denotes the activation function applied to each hidden layer to add non-linearity to the mapping from input to output variables. We adopt separate branches of the DNN for the magnitudes and angles of the beam vectors (i.e., {bm_k} and {ba_k}, respectively), since the outputs of a neural network must be real-valued scalars or vectors [33]. The numbers of neurons in the input layer and the output layer are MK and 2M + 2K + 1, respectively. However, the number of hidden layers and the number of neurons in each hidden layer are hyper-parameters that need to be tuned in order to improve the performance of the DNN. A well-trained DNN learns the mapping between input and output while configuring the optimal weights for its neurons.
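For concreteness, the layer recursion and the two kinds of output branches can be sketched with a single numpy forward pass. The layer sizes, random weights, and input vector below are illustrative assumptions, not the trained network:

```python
import numpy as np

# Minimal sketch of the forward pass x_q = o(theta_q @ x_{q-1} + b_q), with a
# sigmoid branch (e.g. powers and beam magnitudes/angles) and a softmax branch
# (time allocations). Dimensions and weights are hypothetical.
rng = np.random.default_rng(2)
M, K = 4, 2
in_dim, hidden = M * K, 32     # channel-gain inputs, hidden width (assumed)
out_sigmoid = 2 * M + K        # beam magnitudes, beam angles, powers
out_softmax = K + 1            # tau_0 and the K WIT sub-slots

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
def softmax(z):
    e = np.exp(z - z.max())    # shift for numerical stability
    return e / e.sum()

W1, b1 = 0.1 * rng.standard_normal((hidden, in_dim)), np.zeros(hidden)
Ws, bs = 0.1 * rng.standard_normal((out_sigmoid, hidden)), np.zeros(out_sigmoid)
Wt, bt = 0.1 * rng.standard_normal((out_softmax, hidden)), np.zeros(out_softmax)

x = rng.random(in_dim)         # normalized channel-gain input
h1 = relu(W1 @ x + b1)         # hidden-layer activation
y_sig = sigmoid(Ws @ h1 + bs)  # branch outputs confined to (0, 1)
y_tau = softmax(Wt @ h1 + bt)  # time allocations summing to 1
```

Note how the branch activations already enforce the output ranges: the sigmoid branch stays in (0, 1) for the normalized targets, and the softmax branch sums to one, which is how the time constraints are satisfied by construction.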

2) DATA PRE-PROCESSING
From Problem (6), we generate a large data set using the conventional method. Each data sample consists of {|h_mk|} as the inputs and {{p_k}, τ_0, {τ_k}, {bm_k}, {ba_k}} as the outputs. From the data samples, we observe that the inputs and outputs have different scales, which creates a problem in the weight-learning process. Therefore, we use a data normalization technique to rescale all the input/output variables. For this, we use the MinMaxScaler pre-processing technique, which transforms all the variables into the range [0, 1].
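The min-max rescaling above is a one-line transform per feature column. The following sketch applies it (equivalently to scikit-learn's MinMaxScaler) and inverts it, which is needed to map DNN predictions back to physical units; the sample matrix is illustrative:

```python
import numpy as np

# Column-wise min-max scaling to [0, 1]; the data matrix is a made-up example.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 300.0]])

X_min, X_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)   # each column now spans [0, 1]

def inverse_scale(Xs, X_min, X_max):
    """Map scaled values back to the original physical units."""
    return Xs * (X_max - X_min) + X_min
```

In practice the minima and maxima are computed on the training split only and reused for the validation and test splits, so no information leaks across splits.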

3) ACTIVATION FUNCTION
Activation functions add non-linearity to the mapping function between the input and output variables. The most widely used activation function in deep networks is ReLU, which makes convergence easier and faster [34], [35]. We apply the ReLU activation function to the output of every hidden layer. However, for the output layer, we use the Sigmoid activation function for {{p_k}, {bm_k}, {ba_k}} and Softmax for the time outputs. The Softmax activation function fulfills the time constraints (6e) and (6g).

4) DROPOUT
We use a dropout layer after each hidden layer to add regularization to the proposed DNN. Dropout helps the DNN overcome overfitting during training. The main idea behind dropout regularization is to randomly change the architecture of the neural network by ignoring a few neurons in each hidden layer for each new data sample during training. This randomization helps the model generalize, and reduces the overfitting problem.
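The mechanism can be sketched as inverted dropout, the form used by common deep learning frameworks: each activation is kept with probability 1 − p and rescaled by 1/(1 − p) so the expected activation is unchanged; the dropout rate here is an illustrative assumption:

```python
import numpy as np

# Illustrative inverted-dropout sketch; p_drop = 0.2 is an assumed rate.
rng = np.random.default_rng(3)
p_drop = 0.2

def dropout(h, p_drop, rng, training=True):
    if not training:
        return h                          # dropout is disabled at inference time
    keep = rng.random(h.shape) >= p_drop  # random binary mask over the neurons
    return h * keep / (1.0 - p_drop)      # rescale so E[output] == input

h = np.ones(1000)                         # a dummy hidden-layer activation
h_train = dropout(h, p_drop, rng)
```

Because a different mask is drawn per sample, each training step effectively trains a slightly different sub-network, which is the source of the regularization effect described above.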

B. LEARNING IN THE DNN MODEL
1) GENERATING THE TRAINING DATA
Training a DNN is the process of tuning the neural network parameters, i.e., the weights and biases of the neurons, in an SL fashion so that the DNN learns the mapping from input to output. For this, we need a large number of input and output data samples that describe Problem (6). However, Problem (6) is a non-convex and complex problem, and it is numerically intractable to find its optimal solution. Here, we exploit the sequential parametric convex approximation (SPCA) algorithm to generate data samples for Problem (6). SPCA is a general scheme for solving non-convex optimization problems, where a non-convex problem is divided into convex sub-problems and solved over several iterations. In each iteration, the non-convex feasible set is approximated by an inner convex approximation [36]. In [18], we proposed a near-optimal solution for Problem (6) by exploiting the SPCA iterative algorithm. We generated L data samples using the method discussed in Algorithm 3 of [18]. Each data sample consists of the channel gains and the corresponding optimal solution of the beamforming vectors and the power and time allocations, given by

{{|h_mk|}, {p*_k}, τ*_0, {τ*_k}, {bm*_k}, {ba*_k}}.

2) TRAINING THE DNN
The idea behind training the DNN is to optimize the weights {θ_q} and biases {b_q} in order to minimize the loss function. The loss function in DNN training is defined over the difference between the actual optimal values from the data samples, {{p*_k}, τ*_0, {τ*_k}, {bm*_k}, {ba*_k}}, and the predicted values from the output of the DNN, {{p_k}, τ_0, {τ_k}, {bm_k}, {ba_k}}. We use the mean squared error (MSE) as the loss function that should be minimized to find the optimal solution for the proposed network. We use the Adam optimizer to find the optimal weights when training the DNN.
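The training objective can be stated in two lines of numpy. The sketch below evaluates the MSE between SPCA labels and DNN predictions over one mini-batch; the label and prediction values are made-up illustrations:

```python
import numpy as np

# MSE between SPCA labels (y_true) and DNN outputs (y_pred) for one
# hypothetical mini-batch, e.g. normalized {tau_0, tau_1, tau_2} per sample.
y_true = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.6, 0.3]])
y_pred = np.array([[0.25, 0.45, 0.3],
                   [0.15, 0.55, 0.3]])

def mse(y_true, y_pred):
    """Mean squared error, averaged over every entry of the batch."""
    return np.mean((y_true - y_pred) ** 2)

loss = mse(y_true, y_pred)
```

During training, an optimizer such as Adam updates the weights along the negative gradient of this loss, driving the DNN outputs towards the SPCA labels.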

IV. PERFORMANCE EVALUATION
In this section, we present simulation results to show the comparison between the conventional SPCA approach and our proposed DNN-based scheme. We performed the SPCA simulations using the MATLAB 2020a CVX solver [37], and the DNN approach was implemented in Python 3.8.5 with TensorFlow 2.5.0. The simulations were carried out on a computer with an Intel Core i5-6500 CPU at 3.2 GHz with 8 GB of RAM.

3) WPCN SYSTEM PARAMETERS
We considered a multi-antennae WPCN system with M antennae at the HAP and K wireless devices. We generated D = 10000 data samples for each combination of K ∈ {2, 3, 4, 5, 6} and M ∈ {10, 15, 20, 25, 30, 35}. For the generation of data samples using the SPCA method, we set the noise power to N_0 = −100 dBm, the harvested-energy threshold to e_k = −30 dBm, and the minimum requirement of achievable throughput to Γ_k = 1 bit/s/Hz. We assumed the receiver efficiency of each wireless device when harvesting energy to be ρ_k = ρ = 0.5. The distance-dependent path loss is modeled as PL_k = 10^−3 d^−γ, in which d is the distance between the HAP and a wireless device, and γ is the path-loss exponent, set to 3 in our simulation setup. We assumed all wireless devices are uniformly distributed within the range [1 m, 2 m] from the HAP. Moreover, we modeled the communication link between the HAP and a wireless device with Rician fading:

h_k = √(PL_k) ( √(K_R/(1+K_R)) h_k^LOS + √(1/(1+K_R)) h_k^NLOS ),

where K_R is the Rician factor (set to 3), h_k^LOS denotes the deterministic LOS component, and h_k^NLOS denotes the standard Rayleigh fading component with zero mean and unit variance. For the LOS components, we considered the far-field uniform linear antenna array model [38], given as

h_k^LOS = [1, e^{j c_k}, e^{j 2 c_k}, . . ., e^{j (M−1) c_k}]^T,

where c_k = π sin(ω_k), {ω_k} are the angles of the directions from the wireless devices to the HAP, and the carrier wavelength is double the spacing between successive antenna elements at the HAP.
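The channel generation described above can be sketched directly in numpy. The sketch follows the stated parameters (K_R = 3, γ = 3, PL_k = 10^−3 d^−γ, half-wavelength antenna spacing); the drawn distance and angle are random illustrations:

```python
import numpy as np

# Illustrative draw of one Rician channel with a far-field ULA LOS component.
rng = np.random.default_rng(4)
M = 10                                  # antennae at the HAP
K_R, gamma = 3.0, 3.0                   # Rician factor and path-loss exponent

d = rng.uniform(1.0, 2.0)               # device distance in metres, U[1, 2]
omega = rng.uniform(0.0, 2.0 * np.pi)   # device direction seen from the HAP

PL = 1e-3 * d ** (-gamma)               # distance-dependent path loss
c = np.pi * np.sin(omega)               # phase step for half-wavelength spacing
h_los = np.exp(1j * c * np.arange(M))   # ULA steering vector [1, e^{jc}, ...]
h_nlos = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

h = np.sqrt(PL) * (np.sqrt(K_R / (1.0 + K_R)) * h_los
                   + np.sqrt(1.0 / (1.0 + K_R)) * h_nlos)
```

Averaged over the fading, the expected channel power is PL * M, split 3:1 between the LOS and NLOS parts for K_R = 3.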

4) DNN ARCHITECTURE AND HYPERPARAMETERS
We set the following DNN parameters for the simulations. We performed experiments for different numbers of hidden layers, and the network that showed the minimum MSE was chosen for all the experiments. The common network has three hidden layers, and each output branch consists of two hidden layers and one output layer. Similarly, we experimented with different numbers of hidden neurons, and chose [32, 64, 64, 64, 32] neurons for the hidden layers. We trained the DNN on 6000 data samples and validated the network with 2000 samples in all training epochs. The remaining 2000 data samples were used for testing the trained model. We performed experiments for different batch sizes, as shown in Fig. 5. We observed that a batch size of 100 had fast convergence and minimum training loss. This is because small batch sizes achieve the best training stability and generalization performance, for a given computational cost, across a wide range of data samples [39].

FIGURE 4. The proposed deep learning-based neural network consisting of one input layer and multiple output branches. There is a common network where the channel gain is the input to the DNN, and it is further divided into multiple output branches. Each output branch approximates the optimal solution of the optimization variables for Problem (6).

Fig. 6 shows the convergence of the training loss and validation loss of the proposed DNN when K = 2, 5. Here, we observe fast convergence in the proposed network. Moreover, when we increased the number of wireless devices, the training loss and validation loss increased for the same DNN architecture. The training loss can be decreased by tuning the hyper-parameters, but in practice we cannot tune the hyper-parameters for each case. In Fig. 7 and Fig. 8, we plot the transmit power at the HAP versus different numbers of wireless devices. We see that the proposed DNN approximates the SPCA-based solution for the energy beamformers quite well.
We observe from the plot that when we increase the number of wireless devices, the transmit power at the HAP increases. This is because more wireless devices harvest more energy from the HAP. From Fig. 7, we also see that a batch size of 100 performed better than larger batch sizes. Fig. 8 shows that a higher learning rate performed better in comparison with lower learning rates. However, the learning rate cannot be made too high, or the learning process might skip the global optimum point. Fig. 9 shows a comparison of the computational time of the conventional approach versus the proposed scheme for different numbers of wireless devices. We observe from the figure that the proposed scheme provides a faster solution than the conventional approach. Since the coherence interval in wireless communication networks is very short, high computational time may be impractical. The proposed deep learning-based scheme provides a faster and less complex solution.

V. CONCLUSION
In this paper, we proposed a supervised learning-based deep neural network that approximates the optimal solution for the energy beamforming vectors in a multi-antennae wireless powered communication network. The proposed deep learning-based scheme closely approximates the conventional SPCA approach. We generated data samples from the iterative SPCA algorithm to train and validate the proposed deep neural network. The trained deep neural network learned the mapping between the input and output variables of the optimization problem for energy beamforming with transmit power control. We fed the DNN with the channel gains from the WPCN, and found optimal solutions for the power and time allocations with an optimal design of the energy beams while minimizing the transmit power at the HAP. In order to maximize the performance of the DNN, we tuned the hyper-parameters of the proposed network. The simulation results show that the proposed solution is much faster than the conventional iterative SPCA method. We conclude that deep learning-based schemes can optimize networks and provide a faster solution with less complexity in comparison with conventional optimization algorithms.
VOLUME 9, 2021