Power Allocation in Cell-Free Massive MIMO: A Deep Learning Method

Massive multiple-input multiple-output (MIMO) is a key technology in 5G. It enables multiple users to be served in the same time-frequency block through precoding or beamforming techniques, thus increasing capacity, reliability and energy efficiency. A key issue in massive MIMO is the allocation of power to the individual antennas in order to achieve a specific objective, e.g., the maximization of the minimum capacity guaranteed to each user. This is a nondeterministic polynomial (NP)-hard problem that needs to be solved in a timely manner, since the state of the channels evolves in time and the power allocation should stay in tune with this state. Although several heuristics have been proposed to solve this problem, they entail considerable time complexity. As a result, with the present methods, it cannot be guaranteed that power allocation happens in time. To solve this problem, we propose a deep neural network (DNN). A DNN has a low time complexity, but requires an extensive, offline training process before it becomes operational. The DNN we propose is the combination of two convolutional layers and four fully connected layers. It takes the long-term fading information as input and outputs the power from each antenna element to each user. We limit ourselves to the case of time-division duplex (TDD) based sub-6GHz networks. Numerical results show that our DNN-based method approximates very closely the results of a commonly used heuristic based on the bisection algorithm.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) systems, where several antennas are deployed in both transmitter and receiver, have been studied during the last two decades and applied to many wireless standards [1]. While the initial work was focused on point-to-point links, multi-user MIMO (MU-MIMO) has attracted more attention in recent years and is a technique used in 5G. In MU-MIMO a base station (BS) simultaneously serves several users in the same time-frequency resource block. Benefiting from the spatial diversity and multiplexing, MU-MIMO increases capacity, enhances reliability, improves energy efficiency and reduces interference [2]. In order to achieve further capacity gains, massive MIMO has been introduced, where the number of antennas in a BS has been increased to hundreds, being at least one order of magnitude larger than the number of user equipment (UEs) served simultaneously. As the number of antennas in an antenna array grows, the channels become asymptotically deterministic; this is known as channel hardening. As a result, the capacity depends only on the long-term fading coefficients, which are stable over a time period of typically several seconds [3].
Recently, cell-free (CF) massive MIMO has been proposed [4], where several remote antenna arrays (RAA) are spread over a coverage area and are connected to a central controller (CC) via a fronthaul. The CC function, which in 5G is part of the BS (gNB), determines the power to be allocated for each UE at each individual antenna.
In a massive MIMO system, a UE in the coverage area can, in principle, be served by each and every antenna. In CF massive MIMO this is still true, but the antennas are now geographically distributed over the coverage area in clusters (sub-arrays). The difference with a cellular or small-cell system is that, in those, a UE can only be served by antennas belonging to a particular cluster, i.e., a particular BS. Hence, in CF massive MIMO the concept of a cell has disappeared [5] and all antennas coherently serve all UEs [6].
Compared to centralized massive MIMO (Fig.1), where all antennas are co-located in one BS, the merits of CF massive MIMO are: (1) Much smaller physical size of the RAAs compared to a centralized antenna array, which facilitates deployment, in particular in indoor scenarios. (2) Better performance for the same total number of antennas; e.g., in [7] it is shown that CF massive MIMO can have a significantly higher sum-rate than a centralized system, for the same precoding and transmission power. (3) Higher robustness against shadow fading due to the spatial diversity of the RAAs; this is particularly important for mmWave.
There are also some drawbacks of CF massive MIMO with respect to centralized massive MIMO: (1) Higher physical deployment costs due to the geographical spread of the RAAs and the need for a fronthaul. (2) Non-negligible propagation delay in the fronthaul, because the RAAs are spread over a potentially large area.
Since in massive MIMO, and in particular CF massive MIMO, different UEs are served simultaneously in the same time-frequency resources, controlling the inter-user interference is important. The power allocation plays a crucial role in controlling this interference and in optimizing the performance [8], [9]. In this paper we address this problem for CF massive MIMO. To ensure quality-of-service fairness among the UEs, we do this through max-min power-control optimization. We assume that the UEs have a single active antenna.
A key benefit of massive MIMO is that the short-term fading can be neglected due to channel hardening [10]. The time scale of the long-term fading determines the power-control interval, which puts a constraint on the validity of the power allocation decisions. Hence, it limits the time budget available in the CC for collecting the channel state information (CSI), computing the transmission power from each antenna to each UE, and sending the power settings to each RAA via the fronthaul.
The problem we are facing is that the time complexity of an exact solution of the max-min power optimization is excessive for the given time budget. Even the heuristics that are proposed in the literature, e.g., the bisection algorithm in [4], are too complex to meet the time constraints.
This led us to propose a deep neural network (DNN) to perform the task. The time-complexity of a DNN is low. The price one pays is the off-line training that needs to be performed. This can be lengthy and requires the generation of a large training data set. However, this is not a real problem since it is an activity which is performed off-line, before the DNN becomes operational.
Deep learning has become a popular method to solve complex problems [11]. The universal approximation theorem [12] proves that a DNN can approximate any continuous function. Deep learning has shown competitive performance, compared to non-machine-learning solutions, in communication-network problems such as multiple-access schemes [13], routing optimization and congestion reduction [14].

A. RELATED WORK
Many papers have studied massive MIMO, e.g., [15], [16] and [17]; however, most of these consider centralized massive MIMO. A few papers, in particular [7], [18] and [19], have addressed distributed massive MIMO. They found that a distributed system offers higher sum-rates than centralized systems, assuming perfect CSI and no interference. A realistic analysis, however, must account for imperfect CSI as well as interference between UEs. In [4] the max-min power allocation, based on imperfect CSI, is studied for the extreme case where each RAA has only one antenna. In [20] the same problem is studied for different RAA sizes, assuming fixed UE positions. However, neither of these papers ([4] and [20]) considers the real-time constraints of the power allocation, which is our main concern.
Recently the power of applying deep learning methods in the control of wireless communication systems has been shown in [21], [22] and, in particular for massive MIMO, in [23], [24] and [25]. In [21] a fully connected neural network mimics the processing of the weighted minimum mean square error (WMMSE) algorithm to manage interference in multi-cell networks. The numerical results show that a neural network can closely approximate the performance of WMMSE in much less time. In [23] and [24] it is shown that a DNN achieves near-optimal accuracy of signal detection with far fewer computations compared to the classical method. DNN-based channel estimators were proposed in [25], [26] and [27], outperforming traditional estimators in terms of computational cost. Specifically, a fast and flexible denoising convolutional neural network (FFDNet) is proposed to estimate the channel for cell-free mmWave massive MIMO in [27]; the authors claim that the proposed FFDNet achieves a faster training speed than existing DNN methods without sacrificing normalized mean square error performance.
Similarly, in [28] and [29] deep learning was used to solve power allocation in massive MIMO. In [28], a fully connected neural network and a recurrent neural network were proposed to maximize the spectral efficiency (SE) and to implement the max-min power policy, respectively. In [29] a two-layer DNN was used for power allocation to combat inter-cell interference. The state-of-the-art residual dense block (ResDense) method was applied in [30] to allocate power in multi-cell massive MIMO. Numerical results show encouraging performance of DNN-based power allocation; hence, it is expected that DNNs will replace traditional heuristics because of their lower time complexity.

B. CONTRIBUTIONS
In this work we consider the real-time power allocation problem in CF massive MIMO using time-division duplex (TDD) operation. Our contributions are the following: (1) We formulate the general max-min power allocation problem, then we propose a heuristic, consisting of a nonconvex iteration algorithm combined with the bisection method, to solve it.
(2) We construct a DNN to approximate our proposed heuristic.
(3) We analyze the performance of the DNN in terms of the accuracy with which it approximates the heuristic and its time complexity. Our results show that the DNN provides a very good approximation, while requiring significantly less computation time.
The paper is organized as follows: in Section II, we formalize the CF massive MIMO system and the max-min optimization problem. In Section III we propose a heuristic to optimize the max-min power allocation. Its purpose is to generate offline a data set to be used for training the DNN and to assess the results of our DNN approach. The proposed DNN is discussed in Section IV. Section V shows numerical results, which provide evidence that our DNN solution approximates the heuristic quite closely. Section VI summarizes our conclusions and indicates further research that is needed.
Notation: Boldface characters denote a matrix or a vector. $(\cdot)^{*}$ and $(\cdot)^{H}$ stand for conjugate and conjugate-transpose, respectively. $\|\cdot\|$ represents the Euclidean norm, and $\mathbb{E}\{\cdot\}$ is the expectation operator. We use UE and user interchangeably.

II. CF MASSIVE MIMO SYSTEM
Consider a CF massive MIMO system comprising N RAAs, each equipped with M antennas. K single-antenna UEs are served by these RAAs in a given coverage area. Furthermore, all RAAs are connected to a CC, where the power allocation is performed. We base our system model on the following common assumptions: (1) The system operates in the sub-6GHz frequency bands.
(3) The time-frequency resources are divided into coherence intervals, during which the channel can be regarded as constant [33].
(4) The channels between antenna m in RAA n and user k, for all m, n, and k, are i.i.d. with a Rayleigh fading distribution [9]: $g_{n,m}^{k} \sim \mathcal{CN}(0, \beta_{n,m}^{k})$, where $\beta_{n,m}^{k}$ is the long-term fading coefficient. [34] proposed that the channel model should be a combination of Rayleigh and Rician fading, because the LOS probabilities depend on the distance between the transmitters and receivers. In this work, we assume Rayleigh fading to simplify the model. (5) The propagation delay in the fronthaul is negligible. However, as argued in [4], the effects of the fronthaul should be quantified in future work, as is done in [35] and [36].

A. UPLINK CHANNEL ESTIMATION
Let $\tau_c$ be the coherence time expressed in number of modulation symbols. We assume pilot sequences of length K, where $K < \tau_c$. UE k is preassigned the pilot $\boldsymbol{\psi}_k$ with $\mathbb{E}\{\|\boldsymbol{\psi}_k\|^2\} = 1$ and $\boldsymbol{\psi}_k \boldsymbol{\psi}_{k'}^{H} = 0$ for $k \neq k'$. This guarantees that there is no intra-cell pilot contamination [37]. In general, pilot contamination is a very important issue in MIMO systems. It degrades the quality of the CSI, which in turn reduces the system capacity. References [38] and [39] examine how intra-cell pilot contamination can be decreased for multi-cell massive MIMO; [40] examines how to reduce the pilot contamination for cell-free massive MIMO.
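As a concrete illustration of the orthogonality condition above, the following numpy sketch builds K mutually orthogonal, unit-energy pilot sequences of length K from a scaled DFT matrix; the DFT construction is an illustrative choice, not the paper's specific pilot design.

```python
import numpy as np

def make_orthogonal_pilots(K: int) -> np.ndarray:
    """Return a K x K matrix whose rows are mutually orthogonal,
    unit-energy pilot sequences (rows of a scaled DFT matrix)."""
    n = np.arange(K)
    # Scale by 1/sqrt(K) so each row has total energy 1.
    return np.exp(-2j * np.pi * np.outer(n, n) / K) / np.sqrt(K)

pilots = make_orthogonal_pilots(5)
gram = pilots @ pilots.conj().T   # should be (numerically) the identity
```

The Gram matrix being the identity verifies both the unit-energy and the zero cross-correlation properties at once.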
During the channel estimation phase, all users send their pilots with full transmission power. The received signal at antenna m in RAA n is the superposition of these pilots:
$$\mathbf{y}_{n,m} = \sqrt{p_p} \sum_{k=1}^{K} g_{n,m}^{k} \boldsymbol{\psi}_k + \boldsymbol{\iota} + \mathbf{w},$$
where $p_p$ is the pilot power, $\mathbf{w} \in \mathbb{C}^{K}$ is an additive Gaussian white noise vector with element power $\sigma^2$ and $\boldsymbol{\iota} \in \mathbb{C}^{K}$ is the pilot contamination vector from other cells with element power $\eta^2$. Based on the received signal, the RAA performs conjugate operations to decode the desired signal from user k:
$$\check{y}_{n,m}^{k} = \boldsymbol{\psi}_k^{H} \mathbf{y}_{n,m}.$$
The Bayes estimator [41] will produce the estimated channel using the prior probability $g_{n,m}^{k} \sim \mathcal{CN}(0, \beta_{n,m}^{k})$:
$$\hat{g}_{n,m}^{k} = \frac{\sqrt{p_p}\, \beta_{n,m}^{k}}{p_p \beta_{n,m}^{k} + \eta^2 + \sigma^2}\, \check{y}_{n,m}^{k}.$$
See the details in Appendix A.
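The despread-and-scale estimation step can be sketched in a few lines of numpy. This is a minimal single-antenna toy (identity pilots, contamination folded into the noise term); the exact scaling constants are those of the paper's Appendix A, so the LMMSE coefficient below should be read as the standard textbook form rather than a verified reproduction.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmse_estimate(y, pilot, p_p, beta, sigma2, eta2):
    """Despread the received pilot block y with pilot k, then apply the
    LMMSE scaling for a CN(0, beta) prior on the channel coefficient."""
    y_k = pilot.conj() @ y                                  # projection onto pilot k
    c = np.sqrt(p_p) * beta / (p_p * beta + eta2 + sigma2)  # Bayes/LMMSE factor
    return c * y_k

# toy example: one antenna, K = 2 users with trivially orthogonal pilots
K, p_p, beta, sigma2, eta2 = 2, 1.0, 0.5, 0.1, 0.01
pilots = np.eye(K)
g = np.sqrt(beta / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y = np.sqrt(p_p) * pilots.T @ g + noise     # received pilot superposition
g_hat = mmse_estimate(y, pilots[0], p_p, beta, sigma2, eta2)
```

Note how the scaling factor shrinks the despread observation toward zero as the noise and contamination powers grow, which is the defining behavior of a Bayes estimator with a zero-mean prior.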

B. DOWNLINK TRANSMISSION
Based on the estimated channels, i.e., the CSI values, the CC determines the power with which each antenna in each RAA transmits data to the UEs. Let $q_k$ with $\mathbb{E}\{|q_k|^2\} = 1$ be the intended signal for user k; the transmitted signal of antenna m in RAA n is then:
$$x_{n,m} = \sum_{k=1}^{K} \sqrt{p_{n,m}^{k}}\, f_{n,m}^{k}\, q_k,$$
where $p_{n,m}^{k}$ is the downlink transmission power from antenna m in RAA n to user k and $f_{n,m}^{k}$ is an element of the precoding vector. Since max-ratio precoding can be performed locally at each RAA, and hence is very suitable for a CF massive MIMO architecture [42], we use max-ratio precoding as was also done in [9]:
$$f_{n,m}^{k} = \frac{\left(\hat{g}_{n,m}^{k}\right)^{*}}{\sqrt{\mathbb{E}\left\{\left|\hat{g}_{n,m}^{k}\right|^{2}\right\}}}.$$
Max-ratio precoding has lower complexity than zero-forcing and approaches the performance of the optimal dirty-paper precoding [43] as the number of antennas increases [44]. Finding a low-complexity precoding algorithm with good performance is still a challenge; in [45] a promising deep-learning-based algorithm is proposed, which, however, requires further research and analysis. UE k receives the superposition of the signals of all RAAs in the whole system:
$$r_k = \sum_{n=1}^{N} \sum_{m=1}^{M} g_{n,m}^{k}\, x_{n,m} + w_k, \qquad (7)$$
where $w_k$ is the thermal noise with power $\sigma^2$ at UE k.
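A minimal numpy sketch of max-ratio (conjugate) precoding for a single antenna is given below. The normalization by the RMS estimate magnitude stands in for the statistical normalization of the paper's precoder, and the power/symbol values are arbitrary illustrative numbers.

```python
import numpy as np

def max_ratio_precode(g_hat: np.ndarray) -> np.ndarray:
    """Conjugate precoding: each element is the conjugate of the channel
    estimate, normalized here by the RMS estimate magnitude (a stand-in
    for the expectation-based normalization in the paper)."""
    return np.conj(g_hat) / np.sqrt(np.mean(np.abs(g_hat) ** 2))

def transmit_signal(p, f, q):
    """x_{n,m} = sum_k sqrt(p^k_{n,m}) f^k_{n,m} q_k for one antenna."""
    return np.sum(np.sqrt(p) * f * q)

g_hat = np.array([1 + 1j, 2 - 1j, -0.5 + 0.3j])   # estimates for K = 3 users
f = max_ratio_precode(g_hat)
x = transmit_signal(np.array([0.1, 0.2, 0.0]), f, np.array([1.0, -1.0, 1.0]))
```

The defining property of conjugate precoding is visible in the product `f * g_hat`: it is real and positive for every user, i.e., the precoder aligns the phases so that each user's own signal adds coherently.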

C. ACHIEVABLE DOWNLINK DATA CAPACITY
We can decompose the right side of equation (7) into a sum of four terms: the desired signal, the fluctuation caused by the uncertain channel gains, the interference from other UEs, and the noise, as shown in (8). According to [9], the achievable downlink capacity for user k, $C_k$, is given by the use-and-then-forget (UatF) bound in (9), with the achievable signal-to-interference-plus-noise ratio (SINR) given by (10). Details are shown in Appendix B.

III. MAX-MIN POWER ALLOCATION
As we see in (10), the achievable SINR of user k is a function of the following variables: the long-term fading, the pilot power, the white noise and the downlink transmission power. In this section, we use a heuristic consisting of a combined bisection-plus-iteration algorithm for max-min power allocation. The max-min policy ensures quality-of-service fairness among all UEs.
Other power allocation policies, such as new fairness power allocation [46], max product SINR [30], or a given target SINR [31], could also be considered, which would lead to a different optimization objective function. The max-min power allocation policy can be formulated as:
$$\max_{\{p_{n,m}^{k}\}} \; \min_{k} \; \mathrm{SINR}_k \quad \text{s.t.} \quad \sum_{k=1}^{K} p_{n,m}^{k} \leq p_l, \quad p_{n,m}^{k} \geq 0, \qquad (11)$$
where $p_l$ is the downlink transmission power limit of each antenna, n = 1, 2, ..., N; m = 1, 2, ..., M and k = 1, 2, ..., K. An exact solution of this optimization problem is not feasible, since the time complexity increases exponentially as M, N and K increase linearly, i.e., the problem is nondeterministic polynomial (NP)-hard. Therefore, one has to resort to a heuristic to solve it. A widely adopted heuristic (see, e.g., [4]) is the bisection algorithm [47], which divides the problem into two sub-problems, namely the candidate value problem and the feasibility problem. In each loop of the bisection algorithm, a candidate value is chosen to determine the constraints of the subsequent feasibility problem. The challenge of this method is that we need to solve a nonconvex feasibility problem in each loop, which is a nonlinear-inequalities problem with K + NM constraints and NMK variables. There is no analytical solution; therefore we propose the numerical solution given by Algorithm 1. This, of course, will give us sub-optimal solutions.
The key idea of Algorithm 1 is that, after an initial power allocation, the user with the highest capacity always gives some of its allocated power to the user with the lowest capacity, provided that its own capacity does not drop below the candidate capacity that corresponds to SINR_candidate, where SINR_candidate = (SINR_max + SINR_min)/2 in our bisection algorithm. At the stopping point of the algorithm, the difference in SINR between the user with the highest and the one with the lowest capacity is no more than ε₁. If we set ε₁ small enough, the capacities of all users can be regarded as sufficiently equal, achieving fairness. The tolerances ε₂ and ε₃ are used to avoid endless loops.
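The outer bisection loop of the heuristic can be sketched as follows. The feasibility test, which in the paper is the nonconvex power-transfer iteration of Algorithm 1, is abstracted here as a caller-supplied oracle; the toy oracle at the bottom is purely illustrative.

```python
def bisection_max_min(feasible, sinr_min=0.0, sinr_max=1e4, tol=1e-3):
    """Bisection over the candidate SINR: pick the midpoint of the current
    interval, test whether every user can reach it (the feasibility
    sub-problem), and shrink the interval accordingly."""
    while sinr_max - sinr_min > tol:
        candidate = (sinr_max + sinr_min) / 2.0
        if feasible(candidate):
            sinr_min = candidate   # candidate achievable: search higher
        else:
            sinr_max = candidate   # candidate infeasible: search lower
    return (sinr_max + sinr_min) / 2.0

# toy oracle: suppose any SINR up to 7.5 is feasible
best = bisection_max_min(lambda s: s <= 7.5)
```

Each bisection step halves the search interval, which is where the log₂ factor in the heuristic's complexity comes from; the dominant cost per step is the feasibility iteration itself.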
The time complexity to solve (11) by our heuristic is O(log₂(MN)·M³N³K²), as shown in Appendix C. Even such a polynomial complexity can be too high when the solution has to be obtained within the time constraints. This was the motivation to try DNNs to solve the problem.

IV. DEEP LEARNING BASED POWER ALLOCATION
In this section we propose a DNN to perform the max-min power allocation. This method has low complexity: it only requires a number of layers of simple operations such as matrix or vector multiplications [21]. In addition, a DNN is expected to run on neural processing units (NPUs) [48], specifically designed to support machine learning, making it possible to parallelize parts of the computation. Hence it should be easy to meet the time constraints. Unlike traditional iterative methods of solving (11), the DNN-based power allocation operates as a nonlinear regression: given the input information (the long-term fading coefficients $\beta_{n,m}^{k}$), the DNN outputs the transmission power for each antenna. Since we use a supervised-learning method, a training data set is required. For a given CF massive MIMO network, the training should be based on the particular network configuration and the expected usage scenarios.
Let there be N RAAs in the coverage area. For each realization, we assume K UEs are uniformly and randomly distributed in the coverage area. The max-min power allocation is computed by the heuristic described in Section III, to generate a sufficiently large number of data points as training samples for the DNN. We record three elements of each realization to compose a training data point: the K × MN matrix of long-term fading coefficients β, the achievable SINR S, i.e., the final SINR_candidate, and the corresponding computed power allocation vector p. Multiple realizations are produced to generate the training dataset.
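The composition of one training data point can be sketched as below. The heuristic solver is replaced by a placeholder lambda (uniform power, arbitrary SINR), and M = 11 is taken for all RAAs for simplicity; only the shapes of the recorded triple matter here.

```python
import numpy as np

def make_training_point(beta, heuristic):
    """Package one realization as a training sample: the K x (M*N) long-term
    fading matrix beta, the achievable SINR S returned by the solver, and
    the flattened power-allocation vector p."""
    S, p = heuristic(beta)
    return {"beta": beta, "S": S, "p": p}

K, N, M = 5, 9, 11
rng = np.random.default_rng(1)
beta = rng.uniform(1e-8, 1e-6, size=(K, N * M))
# placeholder for the bisection-plus-iteration heuristic of Section III
sample = make_training_point(beta, lambda b: (1.0, np.full(b.size, 0.1)))
```

In the paper's pipeline the lambda would be the Section III heuristic, run offline over many random UE placements to build the dataset.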

A. DESIGN OF THE DNN
Several DNNs have been proposed in the literature to solve the power allocation problem. In [21], a fully connected DNN was proposed to control interference; its purpose was to approximate the WMMSE algorithm. Although it can be proved that arbitrary accuracy can be achieved with several layers, overfitting, i.e., good performance on training data but poor performance on test data, is an inherent shortcoming of a fully connected network. In [30] the residual dense block (ResDense), which consists of several convolutional layers, is used for power allocation in massive MIMO. Good performance was achieved by this powerful DNN; however, the objective function is different from ours: the sum spectral efficiency is maximized. The solution for this maximization problem is based on a greedy strategy, i.e., a user will always be served by its closest BS. This makes the problem easier to solve, as the solution is confined to a small feasible region. Moreover, since [30] studied multi-cell MIMO, based on a centralized MIMO setup, the system model is entirely different from ours. Since neither of these DNNs was suitable for solving our problem, we needed a different design.
As mentioned in Section III, our proposed algorithm for power allocation involves two parts: the candidate value problem and the feasibility problem. We note that the candidate value SINR_candidate is determined by a feasible solution, while the power allocation is based on SINR_candidate. This makes the problem more complicated, since we need to solve two sub-optimization problems. If one of the two is known or easily obtained, the optimization simplifies significantly: if the final SINR_candidate (S) is known, the number of loops in Algorithm 1 decreases; if, instead, the feasibility of SINR_candidate can be determined cheaply in each loop of Algorithm 1, a simple bisection computation yields the final SINR_candidate.
Based on the above analysis, and according to the universal approximation theory [12], we expect to get S by using a DNN with several layers. We use a convolutional neural network for the regression of S, because compared with the fully connected neural network, it can significantly decrease the number of parameters. After we get S, we need an iterative algorithm to solve the feasibility problem, i.e., the power allocation. Referring to [21], we can use several fully connected layers to approximate this iterative algorithm. So, we design the structure of our DNN as consisting of two stages: regression processing and allocation processing.
While we propose a specific DNN structure, it is worth pointing out that finding the best DNN structure and hyper-parameter values can be seen as optimization problems in their own right, requiring further research. Referring to the literature ([21], [49]), we tried several structures and hyper-parameters, including fully connected networks (from one to six layers) and traditional convolutional neural networks (two convolutional layers with a number of fully connected layers varying from one to four), and chose the best configuration, i.e., the one that gives us the lowest mean square error (MSE) on the training dataset.
Let us now discuss the two stages of the DNN in detail. For an explanation of the general concepts used in convolutional neural networks, we refer to [11] and [49]. Rectified linear units (ReLU) are used as the activation functions in all layers.
Regression: The objective of the regression part is to obtain S, the achievable SINR, from the input, i.e., from the long-term fading coefficient matrix β. We use two convolutional layers and two fully connected layers for this process (see Fig.2). Specifically, for the first convolutional layer, we use 5 × M × Q filters with stride [1, M] to operate on the input matrix. With zero padding of 2, this convolutional operation yields Q feature matrices of K × N elements. Note that we do not use a pooling operation after feature extraction in this layer. In the second layer, we use 5 × 5 × Q filters with stride [1,1] and zero padding of 2 to guarantee the same number of inputs and outputs in this layer. Then a max-pooling operation is used to decrease the number of parameters; here we use a 2 × 2 kernel with stride [2,2]. After that we adopt a two-layer fully connected network to obtain the output S. The numbers of neurons in these two fully connected layers are ⌈K/2⌉ × ⌈N/2⌉ × Q and ⌈K/2⌉ × ⌈N/2⌉ × ⌈Q/2⌉, respectively, where ⌈·⌉ represents the ceiling operation.
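The dimension bookkeeping of the regression stage can be checked with a few lines of arithmetic: the first convolution (stride [1, M]) maps the K × MN input to Q feature maps of K × N, and the 2 × 2 max pooling with stride [2,2] then fixes the fully connected layer sizes.

```python
import math

def regression_layer_sizes(K, N, Q):
    """Fully connected layer sizes implied by the regression stage: after
    pooling, each of the Q feature maps is ceil(K/2) x ceil(N/2), giving
    the two layer widths described in the text."""
    pooled = math.ceil(K / 2) * math.ceil(N / 2)
    return pooled * Q, pooled * math.ceil(Q / 2)

# with the simulation values K = 5, N = 9, Q = 60 this yields 900 and 450,
# matching the neuron counts reported in the configuration of Section V
sizes = regression_layer_sizes(K=5, N=9, Q=60)
```

This kind of shape check is a useful sanity test before implementing the network in a deep learning framework, since a mismatch between pooled feature-map size and the first dense layer is a common construction bug.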
Allocation: Once the intermediate S is obtained, the next step is to perform the power allocation. We derive from (10) that calculating the transmission power involves a multiplication of β and S; so Sβ is the input in this phase. Finally, two fully connected layers with 2KNM and KNM neurons are employed to describe the nonlinear relationship between the input Sβ and the output p.

B. TRAINING OF THE DNN
The DNN is trained in two phases: the input-output pair (β, S) is used to adjust the parameters of the regression stage (the convolutional filters and its two fully connected layers), while (Sβ, p) is used to train the two fully connected layers of the allocation stage. The MSE is used as loss function, see (12), where I is the number of training data points, and $(p_{n,m}^{k})_{pre,i}$, $(p_{n,m}^{k})_{target,i}$, $S_{pre,i}$, $S_{target,i}$ represent the power allocation output by the DNN, the target power allocation from the training data set, the intermediate value produced by the DNN, and the intermediate value in the training data set for sample i, respectively. The first term in (12) represents the average power-coefficient error, while the second one is the regression rate error of the intermediate S.
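A sketch of the two-term loss described above is given below. The equal weighting of the power term and the regression term is an assumption here, since the exact form of the paper's (12) is not reproduced.

```python
import numpy as np

def training_loss(p_pre, p_target, S_pre, S_target):
    """Two-term MSE loss: mean squared power-coefficient error plus mean
    squared regression error on the intermediate SINR S, averaged over
    the training samples. Equal weighting is an illustrative choice."""
    power_err = np.mean((p_pre - p_target) ** 2)
    rate_err = np.mean((S_pre - S_target) ** 2)
    return power_err + rate_err

loss = training_loss(
    p_pre=np.array([[0.1, 0.2], [0.3, 0.4]]),
    p_target=np.array([[0.1, 0.2], [0.3, 0.4]]),   # power exactly matched
    S_pre=np.array([1.0, 2.0]),
    S_target=np.array([1.0, 2.1]),                 # small S regression error
)
```

With the power term zero, the loss reduces to the S regression term, mirroring how the two error curves can be tracked separately during training (as in Fig.4).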
The DNN is trained using the adapted gradient descent method described in [50], which means that the learning rate is updated in each epoch: in the initial epochs the learning rate is large to guarantee that the loss function converges quickly, while in later epochs the learning rate becomes smaller to allow a sufficiently fine search. The training process comprises updating the weights between neurons and updating the bias in each layer via partial derivatives. More details of how the training process works are described in [11].
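The decaying-learning-rate idea can be illustrated on a scalar toy problem. The geometric decay rule below is an illustrative choice, not the schedule of [50].

```python
def sgd_with_decay(grad, x0, lr0=0.4, decay=0.9, epochs=50):
    """Gradient descent with a per-epoch decaying learning rate: large
    steps early for fast convergence, smaller steps later for a finer
    search around the optimum."""
    x, lr = x0, lr0
    for _ in range(epochs):
        x = x - lr * grad(x)
        lr *= decay          # shrink the step size each epoch
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
x_star = sgd_with_decay(lambda x: 2 * (x - 3.0), x0=0.0)
```

On this convex toy problem the iterate settles close to the minimizer x = 3; in the nonconvex DNN setting the same schedule trades early exploration speed against late-stage precision.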

C. TIME COMPLEXITY OF THE DNN
The time complexity of the DNN solution mainly lies in the training phase, which is not a problem for the operational phase of the system, since training can be done offline. The complexity of the DNN power allocation operation lies in the online forward propagation. Assuming a well-trained DNN, the time complexity is O(K²N²M²); see Appendix D for details. Compared to the complexity of the heuristic algorithm, O(log₂(MN)·M³N³K²) (Section III), the DNN power allocation eliminates the multiple iteration loops. The computation times of the two methods are compared in the next section.

V. COMPARISON OF DNN-BASED AND HEURISTIC-BASED POWER ALLOCATION
In this section, we show by simulations that the DNN-based power allocation can closely approximate the performance of the bisection heuristic with a much lower time complexity.

A. CONFIGURATION AND SCENARIO FOR SIMULATION
We consider a 200 × 200 m² square coverage area with a total of 100 antennas serving K = 5 users. There are 9 RAAs (N = 9) placed in a regular grid, as shown in Fig.3. Each RAA has 11 antennas (M = 11), except for the central one, which has 12. This has no particular significance, except that the simulations were originally done to study the effect of the degree of distribution of the antennas on the network capacity. Note that determining the optimal degree of distribution and geographical deployment of RAAs in CF massive MIMO is an open issue; finding the optimal configuration for a given number of antennas requires further research. We have used a sample configuration of a CF massive MIMO system to demonstrate the potential of the DNN-based solution. However, for a new configuration (e.g., a different deployment of RAAs), the DNN might need to be retrained. As in [4], the maximum power levels for each RAA antenna and each UE are 200 mW and 100 mW, respectively. The carrier frequency is 1.9 GHz and the available bandwidth is 20 MHz. We set the height of the RAAs to 15 m and that of the UEs to 1.65 m. We assume that the noise power is −94 dBm and that the standard deviation of the shadow fading is 8 dB. The number of modulation samples in each coherence interval is assumed to be 200. The pilot contamination from other cells is −80 dBm.
The mini batch size, i.e., the number of samples to process before the parameters are updated in the DNN, is 500. The initial learning rate is 0.002 and the maximum number of iterations is 800. The initial weights and bias are Gaussian random variables that have an N (0, 0.01) distribution. The number of filters in each convolutional layer is 60. For the first two fully connected layers, we set the number of neurons to 900 and 450 respectively; while for the latter fully connected layers, we select 1000 and 500 respectively. The simulation parameters are listed in Table 1.
We assume that a correlated shadowing path-loss model can be used, where the long-term fading coefficients $\beta_{n,m}^{k}$ are given by:
$$\beta_{n,m}^{k} = 10^{\frac{\mathrm{PL}_{n,m}^{k}}{10}} \cdot 10^{\frac{\sigma_{sh} z_{n,m}^{k}}{10}},$$
where $\mathrm{PL}_{n,m}^{k}$ is the path loss in dB, the second factor represents the shadow fading with standard deviation $\sigma_{sh}$, and $z_{n,m}^{k}$ is the shadow-fading coefficient defined as in [4], [51]:
$$z_{n,m}^{k} = \sqrt{\kappa}\, a_{n,m} + \sqrt{1-\kappa}\, b_k,$$
where $a_{n,m} \sim \mathcal{N}(0, 1)$ and $b_k \sim \mathcal{N}(0, 1)$ are independent random variables, and κ, with 0 ≤ κ ≤ 1, is a parameter. When κ = 0, the shadowing from a given user is the same to all RAAs, which means that the obstacle is near the UE; when κ = 1, the shadowing from a given RAA is the same for all users, which means that the obstacle is near the RAA. In our simulation, we set κ = 0.5 and adopt the covariance functions of $a_{n,m}$ and $b_k$ from [51]. The three-slope path-loss model [52] is formulated as in (15); $d_{n,m}^{k}$ is the distance between user k and antenna m of RAA n. L is defined in [53] as (16), where f is the carrier frequency, $h_{RAA}$ is the height of the RAAs and $h_u$ is the UE antenna height.
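The two-component shadowing construction can be sketched in numpy as follows. For simplicity the sketch draws a and b i.i.d., ignoring the spatial covariance functions of [51], and uses a single hypothetical path-loss value instead of the three-slope model.

```python
import numpy as np

def shadow_coefficients(K, NM, kappa, rng):
    """Correlated shadow-fading coefficients z^k_{n,m}: a per-antenna
    component a_{n,m} shared by all users and a per-user component b_k
    shared by all antennas, mixed by kappa."""
    a = rng.standard_normal(NM)            # component tied to each antenna
    b = rng.standard_normal(K)             # component tied to each user
    return np.sqrt(kappa) * a[None, :] + np.sqrt(1 - kappa) * b[:, None]

rng = np.random.default_rng(7)
z = shadow_coefficients(K=5, NM=100, kappa=0.5, rng=rng)

# long-term fading: path loss (dB) plus shadowing with sigma_sh = 8 dB;
# PL_dB is a single hypothetical value standing in for the three-slope model
PL_dB, sigma_sh = -100.0, 8.0
beta = 10 ** (PL_dB / 10) * 10 ** (sigma_sh * z / 10)
```

Setting kappa to 0 makes every row of z constant across antennas (obstacle near the UE), while kappa = 1 makes every column constant across users (obstacle near the RAA), matching the interpretation in the text.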

B. PERFORMANCE OF DNN BASED POWER ALLOCATION
We evaluate the DNN-based power allocation in two scenarios, namely fixed-position UEs and moving UEs.
The DNN must first be trained before it can be tested and used. We therefore generated a dataset containing 432000 data points, of which 430000 are used to train the DNN. Fig.4 shows how the power coefficient error, the regression rate error and the loss function evolve with the number of iterations.
Two hyper-parameters influence the loss function. The first one is the learning rate, i.e., the evolution rate of the DNN. A higher rate results in faster convergence but runs the risk of ending up in a local optimum, which means insufficient searching has been done; a low rate, on the other hand, may lead to a loss function that does not converge. The second influential hyper-parameter is the number of filters (Q) in each convolutional layer. More filters improve the prediction accuracy, i.e., the loss becomes smaller; however, the training time also becomes longer. It is worth pointing out that the number of filters dominates the loss function of the regression, while the learning rate has more influence on the loss function of the allocation. We tried different hyper-parameters, considering the trade-offs between the number of filters and the implementation complexity, and between the learning rate and the training time. We see from Fig.4 that after 800 iterations the loss function is close to 0; its value is actually 10⁻⁵. This value is expected to decrease further as the number of iterations grows.

1) FIXED-POSITION UEs
For the fixed-position UE scenario, we use the remaining 2000 data points for testing. Fig.5 compares the power allocation obtained by the DNN with the target determined by the bisection heuristic, by comparing the CDFs of the power allocation over the 2000 testing data points. Each data point consists of 500 transmit-receive pairs (MNK), so the data in Fig.5 is the result of 10⁶ pairs. A specific transmit-receive pair corresponds to a particular antenna of an RAA and the receiver of a UE. We found that the power allocated to around 30% of the channels is zero, which means that some antennas are not providing power to some users. We also observe that the power distributions generated by the DNN and the target almost overlap, which suggests that the two methods achieve similar results. However, the CDF only describes the results statistically: different transmit-receive pairs may get very similar transmission powers and thus produce similar CDFs. To clarify this, we chose the results of one data point, randomly taken from the 2000-point test set, see Fig.6. Fig.6 (a) shows the CDF of the power allocation for this randomly chosen data point; Fig.6 (b) shows, for the same data point, the target and the DNN power allocation for each of the 500 transmit-receive pairs. The errors between the DNN and the target (bisection) power allocation are mostly less than 1 mW. However, compared to the right side of Fig.6 (a), the errors on the left side are much larger. This is because a max-pooling operation is used after the second convolutional layer in the DNN, causing the information carried by small values to disappear. Fig.6 also demonstrates that our DNN-based power allocation does not perform as well as Fig.5 suggests: different transmit-receive pairs may get close or even identical allocated powers. This result leads us to make a statistical analysis of the errors over the test data.
We recorded the CDF of the errors for the 2,000 test data points (i.e., 10⁶ transmit-receive pairs) in Fig.7.
We observe that the relative error (|Target − DNN|/p_l) between the DNN allocation and the target is less than 0.01 for around 59% of the transmit-receive pairs and less than 0.05 for 97%. The average error over the whole test data set is 0.0123.
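The error statistics above can be reproduced with a few lines of NumPy. This is a sketch: the normalization constant p_l is abstracted here as the per-pair target power, and the arrays below are hypothetical stand-ins for the real allocations:

```python
import numpy as np

def error_stats(target, dnn, thresholds=(0.01, 0.05)):
    """Relative power-allocation error per transmit-receive pair.

    Pairs with zero target power (around 30% in our data) would need
    special handling; this sketch assumes strictly positive targets.
    """
    target = np.asarray(target, dtype=float)
    dnn = np.asarray(dnn, dtype=float)
    rel = np.abs(target - dnn) / target          # |Target - DNN| / p_l
    fracs = {t: float(np.mean(rel < t)) for t in thresholds}
    return float(rel.mean()), fracs

# Hypothetical allocations for three pairs (watts):
mean_err, fracs = error_stats([1.0, 2.0, 4.0], [1.005, 2.0, 3.0])
```

In the paper the same computation is performed over all 10⁶ pairs of the test set, and the CDF of `rel` is what Fig.7 plots.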
Even if the DNN achieves a power allocation very similar to that of the bisection algorithm, a question arises: what is the effect of the allocation error on the per-user data capacity? Fig.8 gives some insight. It compares the CDFs of the per-user data capacity for the DNN allocation and the target bisection heuristic allocation, based on the full test set of 2,000 data points. We use the same trained DNN to perform power allocation both with and without pilot contamination. We observe that the difference between the DNN and the target heuristic is almost the same in both cases, which means that the performance of our DNN is not noticeably affected by pilot contamination. This is because the power allocation depends on the long-term fading β; the pilot contamination only affects the performance of the channel estimation, causing a degradation of the per-user data capacity. Fig.8 also shows that the gap in both cases is no more than 4 Mbits/s, which maps to at least 94% approximation accuracy, measured as (1 − |Target − DNN|/Target). So, we can conclude that the DNN approximates the behavior of the bisection heuristic very closely in the fixed-position UE scenario.

2) MOVING UEs
For the moving-UE scenario, each UE has an initial position that is random and uniformly distributed over the coverage area. Each UE moves in a random direction (up, down, left or right) with a randomly chosen velocity, distributed uniformly between 0 and 5 m/s. It maintains its speed and direction for 1 s before selecting a new speed and direction. When a UE reaches the boundary of the coverage area, it reverses its direction of movement to stay within the coverage area. A trace of a realization of such a scenario, over the simulation period, for five UEs, is shown in Fig.9. We simulate a duration of 200 seconds from random initial UE positions; the UEs never leave the coverage area. The DNN-based power allocation is performed every second, after the CC has estimated the CSI from the uplink pilots. We compare again the DNN-based method and the bisection-algorithm based heuristic. Fig.10 shows the CDF of the power allocation for both methods over the 200 s period; the results thus come from 10⁵ transceiver pairs. From Fig.10 we can see that the DNN-based power allocation performs worse than in the fixed-position scenario: far more errors occur in the range from 1 to 70 mW. We also record the results of one random second out of the 200 seconds in Fig.11. Compared to Fig.6, Fig.11 shows that the DNN-based power allocation follows the trend of the target bisection method, but has larger errors with respect to the target than in the fixed-position scenario. The input to the DNN consists of the long-term fading coefficients; in a moving-UE scenario, the random moving directions, the random velocities of the UEs and the shadowing make this input data noisier. As in the fixed-UE scenario, we also record the statistical errors, in Fig.12. We note that 38.1% of the errors are less than 0.01 and 57.5% of the errors are less than 0.05. The average error is 0.0609.
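The mobility model above can be sketched as a simple per-second random walk. The coverage-area size and the seed are assumptions of this sketch, and the boundary rule is modeled as a reflection:

```python
import random

def simulate_ue(area=1000.0, duration=200, seed=0):
    """Trace one UE: uniform start, a new random direction (up, down,
    left or right) and speed (0-5 m/s) every second, reflected at the
    boundary so the UE never leaves the coverage area."""
    rng = random.Random(seed)
    x, y = rng.uniform(0.0, area), rng.uniform(0.0, area)
    trace = []
    for _ in range(duration):
        speed = rng.uniform(0.0, 5.0)
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx * speed
        y += dy * speed
        # Reverse direction at the boundary (one reflection suffices,
        # since the per-second step is at most 5 m).
        x = -x if x < 0 else (2 * area - x if x > area else x)
        y = -y if y < 0 else (2 * area - y if y > area else y)
        trace.append((x, y))
    return trace

trace = simulate_ue()  # one 200-second trace, as used for Fig.9
```

Running this once per UE, with independent seeds, produces traces of the kind shown in Fig.9.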
This larger error may be caused by underfitting, i.e., our DNN is not expressive enough to capture the variability in the input data. This could likely be solved by using a recurrent neural network, but this needs to be investigated in future research.
Finally, we calculate the final per-user capacity of moving UEs in Fig.13, the results come from a 200 second operation period.
The largest difference between the two methods occurs around 60 Mbits/s; the corresponding accuracy is around 85%. We also observe that the difference is largest in the low-capacity part, which implies that the DNN does not perform as well when UEs are in an unfavorable location, e.g., due to shadowing.

3) DIFFERENT DEPLOYMENTS OF RAAs
To see what impact a different deployment of the RAAs has on the effectiveness of the DNN method compared to the bisection heuristic, we consider the RAA configuration shown in Fig.14. Four RAAs, each equipped with 25 antennas, are placed in the coverage area, and 5 single-antenna UEs are served. The UEs are stationary.
We generate 173,800 data points using the bisection heuristic. Of these, 171,800 data points are used for continued training, i.e., in total 601,800 (171,800 + 430,000) data points have been used to train our DNN. The remaining 2,000 data points are used for testing. We show the CDF of the power allocation error (with respect to the bisection heuristic) in Fig.15. We observe that 48% of the errors are less than 0.01 and 87.5% are less than 0.05. The average error is 0.0209.
Finally, in Fig.16 we show the CDF of the per-user data capacity. The largest errors occur in the low-capacity part (from 0 to 60 Mbits/s), where the poor channels make the task harder for the DNN. For the rest of the range, the difference between the DNN and the bisection heuristic is no more than 6 Mbits/s, mapping to around 88% accuracy.

4) COMPARISON OF EXECUTION TIME
The time complexity of the DNN method is dramatically lower than that of the bisection heuristic. Nevertheless, it is difficult to make a fair comparison in terms of processing time on a real implementation. One important reason is that the implementations will be based on different hardware architectures: whereas the bisection heuristic is well suited to execution on a classical (multicore) CPU architecture, the DNN will likely run on a hardware architecture optimized for machine learning, e.g., an NPU. Nevertheless, it is revealing to see the huge difference when both are executed on the same hardware. In Table 2 we recorded 10 random samples from the 2,000 data points. We use the same platform, a 4-core Intel Core i5-7300 CPU running at 2.6 GHz, and both programs are written in Python 3.7.2.
From Table 2 it is obvious that the DNN-based power allocation requires much less processing time than the bisection heuristic and shows less variation. For the DNN, the number of calculations is constant; the fluctuation in processing time comes from operations on different floating-point numbers, the inaccuracy of reading the system time, and the CPU load. For the bisection heuristic, the time fluctuation mainly comes from the different initializations, i.e., a different starting point of the search can make a large difference in the time needed to find the optimum.
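The per-sample timings in Table 2 can be gathered with a small harness such as the following; the workload passed to it is a placeholder, standing in for the actual DNN and bisection implementations:

```python
import time
import statistics

def time_fn(fn, *args, reps=10):
    """Wall-clock timing of fn(*args) over several repetitions;
    returns the mean and (population) standard deviation in seconds."""
    samples = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples), statistics.pstdev(samples)

# Example with a trivial stand-in workload:
mean_s, std_s = time_fn(lambda: sum(range(10_000)), reps=10)
```

Using `time.perf_counter` rather than `time.time` reduces, but does not eliminate, the clock-reading inaccuracy mentioned above; the remaining spread in the samples reflects CPU load and floating-point-dependent work.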

VI. CONCLUSION
In this paper, we proposed a DNN to perform the power allocation in a CF massive MIMO system. The max-min power policy, which provides a fair quality of service for all users, was considered. We showed that this NP-hard problem, for which the available heuristics are too time-consuming to meet the time constraints imposed by the coherence time, can be solved by a well-trained DNN. The DNN approach has a low time complexity while exhibiting performance very close to the commonly used heuristic based on the bisection algorithm. The cost of using a DNN is the lengthy training it requires, but this should not be a problem in practice since the training is done offline, before the network becomes operational.
We demonstrated the qualities of the DNN solution using a particular network configuration and scenario. These qualities, i.e., the close approximation of the behavior of a heuristic, should in principle hold for any network and scenario, since DNNs have been proven, given enough training, to be capable of approximating any function arbitrarily closely.
However, there are some concerns that we did not address in this paper and that require further research. These include finding the most suitable DNN structure, possibly related to the configuration of the CF massive MIMO system, and determining the hyper-parameters, which is important for the (prediction) capacity of the DNN. Another issue is that we considered a microwave network. Extending this to the mmWave domain might be a more complex problem, since the UEs are likely to be equipped with multiple antenna arrays instead of the single antenna we assumed in this paper. Furthermore, more complex channel models, e.g., Rician or a combination of Rician and Rayleigh, should be considered.

B. ACHIEVABLE SINR OF UE
Reference [33] shows that the achievable data capacity of reliable data transmission [54] will not exceed the Shannon limit [55] in (9). The pre-log factor (1 − K/τ_c) accounts for the fraction of each coherence block available for payload transmission, as K modulation symbols are used for channel estimation. SINR_k is formulated as (B.1).

C. TIME COMPLEXITY OF THE PROPOSED HEURISTIC
To calculate the computational complexity of the bisection-based heuristic we should consider the worst case of the power allocation. The time complexity of the bisection algorithm is O(log₂(MN)) according to [47]. We calculate the complexity of Algorithm 1 as follows. SINR_max increases approximately linearly with MN, while SINR_min is 0 in the worst case. In each iteration of the loop, SINR_max and SINR_min are updated, where δ_1 and δ_2 are the updating steps for SINR_max and SINR_min, respectively. The calculation of δ_1 and δ_2 is very complicated because they depend on variable factors such as the actual deployment of the RAAs, the positions of the users, etc. Moreover, they change in every iteration of the loop. However, we can derive from the convergence of our proposed algorithm that the smallest values of δ_1 and δ_2 (the worst case) are unrelated to the scale of the problem. This means that we can use the smallest value of δ_1 and δ_2 to bound the worst-case number of iterations needed to meet the termination condition: (SINR_max − iter × min(δ_1, δ_2)) − (SINR_min + iter × min(δ_1, δ_2)) ≤ 1 (C.3)  The computational complexity of the DNN lies in the forward propagation. In the first convolutional layer, we use 5 × M × Q filters with stride [1, M]. The input is one K × MN matrix, so the output of the first convolutional layer is a K × N × Q matrix, which implies K × N × Q convolutional computations. The complexity of the first convolutional layer is therefore O(5KMNQ). Similarly, the complexity of the second convolutional layer is O(25KMNQ²). The first two fully connected layers in the regression processing have a complexity: The last fully connected layer in the optimization processing has a complexity of O(2K²N²M² + KNM). So, we can get the total time complexity as:
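The bisection search underlying the heuristic can be sketched as follows. The feasibility oracle (checking whether a given minimum-SINR target can be met by some power allocation) is abstracted as a callable, and the bounds and tolerance below are hypothetical:

```python
def bisect_max_min_sinr(feasible, sinr_min=0.0, sinr_max=100.0, tol=1.0):
    """Bisection on the max-min SINR target. Each iteration halves the
    [sinr_min, sinr_max] interval, so roughly
    log2((sinr_max - sinr_min) / tol) iterations are needed."""
    while sinr_max - sinr_min > tol:  # termination condition, cf. (C.3)
        mid = 0.5 * (sinr_min + sinr_max)
        if feasible(mid):
            sinr_min = mid  # target achievable: search higher
        else:
            sinr_max = mid  # target infeasible: search lower
    return sinr_min

# With a toy oracle that accepts any target up to 37.3:
best = bisect_max_min_sinr(lambda s: s <= 37.3)
```

In the real heuristic the oracle is itself a costly feasibility computation, which is what dominates the per-iteration cost and is absent from this sketch.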