Graph-colouring based pilot assignment to mitigate downlink pilot contamination for cell-free massive MIMO systems

In cell-free massive multiple-input multiple-output (CF-MMIMO) systems, a massive number of access points, coordinated by central processing units, are distributed over a coverage area to serve a much smaller number of user equipments (UEs) simultaneously over the same time/frequency resources. In contrast to centralized MMIMO, CF-MMIMO exhibits only a weak degree of channel hardening, so it is judicious to include downlink (DL) pilots so that the DL channel can be estimated. This paper considers DL pilot assignment for CF-MMIMO systems by defining a metric that involves the inter-user interference and gives insight into DL pilot contamination. A threshold is then defined to optimize the number of DL pilots so as to maximize the minimum per-user DL throughput. This approach yields a conflict graph in which each UE is regarded as a vertex. Colouring this graph is a combinatorial optimization problem that can be solved approximately using graph-colouring algorithms. Simulation results reveal that, in terms of per-user DL throughput, the proposed method outperforms existing methods such as statistical channel state information, orthogonal pilot assignment, and random pilot assignment in the DL training.

In contrast, distributed MMIMO, which upgrades distributed antenna systems [7,8] in terms of the number of antennas, is characterized by a massive number of service antennas geographically deployed over a large area within the cell. It reduces inter-cell interference through coherent cooperation between BSs via advanced backhaul. By exploiting diversity and cooperation, it offers a much higher coverage probability than co-located MMIMO [9]. However, it comes with its own complications: it increases backhaul requirements and deployment costs, and it needs a high-performance central processing unit (CPU). In particular, the channel state information (CSI) must be shared among the BSs in distributed MMIMO.
Recently, considering the very high interference at the cell edge and the benefits of distributed MMIMO, researchers have turned their attention to cell-free MMIMO (CF-MMIMO) [10][11][12]. It is a specific implementation of distributed MMIMO in which a large number of access points (APs), equipped with single or multiple antennas, are spread over a wide area without cell boundaries. Fronthaul connections link the APs to the CPUs, which are in turn linked via network backhaul if multiple CPUs exist. All APs coherently serve all UEs in the same time-frequency resource. More specifically, each AP applies beamforming, using local channel estimates obtained from the received uplink (UL) pilots, to send data to the UEs. The APs cooperate with each other through the CPUs. The latter are dedicated to the collection and distribution of payload data, power control and pilot assignment [13]. In order to minimize the backhaul requirements, the CPU sends to the APs the data to be transmitted to the UEs and receives sufficient statistics of the received data symbols from all the APs. Note that neither the channel estimates nor the beamformers are forwarded to the CPU; they are computed locally at the APs [14]. In practice, user-centric virtual-cell MMIMO [15] is a noteworthy special case of CF-MMIMO. Its specificity is that each UE is served only by a limited number of APs, which requires less backhaul overhead than CF-MMIMO [16].
The benefits of shifting to CF-MMIMO are: (i) improving spatial multiplexing by increasing the macro-diversity gain, as each UE receives the same signal transmitted from APs situated at different sites over channels with different characteristics, which improves the robustness against shadow fading and path-loss; (ii) reducing the inter-user interference (IUI): in a distributed topology, the APs are more likely to be close to the UEs, which improves the quality of the channel estimation and, as a consequence, the accuracy of the beamforming. More spatially focused and directive beams contribute to less IUI than in centralized MMIMO, especially for cell-edge UEs, which suffer from very high interference; and (iii) offering a very high coverage probability and improving energy efficiency [9]. However, its drawbacks are the higher fronthaul requirements and the need for distributed signal processing. Moreover, another feature characterizing CF-MMIMO is its low degree of channel hardening [17][18][19].
The channel hardening phenomenon makes the fading channel behave almost deterministically, that is, as if it were a non-fading channel: the random fluctuations are still present, but their effect on the communication is negligible. In conventional MMIMO, channel hardening eliminates the impact of small-scale fading owing to the high spatial diversity. The channel variations concentrate around the channel's statistical characterization, that is, the channel hardens more and more as the number of stochastic channels (in other terms, the number of service antennas) grows; this is the "law of large numbers". In conventional MMIMO, the UEs can therefore rely on long-term statistical CSI (sCSI) to decode data. As a consequence, no downlink (DL) pilots are needed to estimate the short-term instantaneous CSI (iCSI) [20].
In CF-MMIMO, by contrast, channel hardening is less pronounced owing to the distributed architecture. In practice, a specific UE is served only by a cluster of APs, especially the nearest ones, because the signals transmitted by the farthest APs are degraded by large path-loss. Thus, the number of APs that effectively serve a UE is small and the "law of large numbers" does not apply. UEs with a low degree of channel hardening therefore rely on iCSI obtained from beamformed DL pilots [13,17,21]. Besides, UEs whose channels vary quickly need beamformed DL pilots to acquire iCSI. Hence, the pilot utility metric per UE [17] is determined not only by the channel hardening degree but also by the speed of the UEs. Note that, even in conventional MMIMO, only UEs whose channels vary slowly can dispense with beamformed DL pilots; UEs whose channels vary quickly cannot. As a consequence, the use of DL pilots consumes the limited radio resources at the expense of data, and resources must be saved when assigning DL pilots to the UEs. Hence, pilot assignment approaches become necessary to mitigate the DL pilot contamination (PC) issue in CF-MMIMO.
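The hardening effect described above can be illustrated numerically. The following is a minimal sketch, assuming i.i.d. CN(0, 1) channel coefficients and using the normalized variance Var{||h||^2} / (E{||h||^2})^2 as the hardening measure; the function name, the number of trials and the seed are illustrative choices, not taken from the paper. For this fading model the ratio is expected to shrink roughly as the inverse of the number of antennas.

```python
import random


def hardening_ratio(num_antennas, trials=2000, seed=0):
    """Monte Carlo estimate of Var{||h||^2} / (E{||h||^2})^2 for i.i.d. CN(0, 1) entries."""
    rng = random.Random(seed)
    sigma = 0.5 ** 0.5  # real and imaginary parts each carry variance 1/2
    gains = []
    for _ in range(trials):
        gain = 0.0
        for _ in range(num_antennas):
            gain += rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
        gains.append(gain)
    mean = sum(gains) / trials
    variance = sum((g - mean) ** 2 for g in gains) / trials
    return variance / mean ** 2


if __name__ == "__main__":
    for m in (4, 16, 64):
        # A smaller ratio means a more deterministic (hardened) channel gain.
        print(m, hardening_ratio(m))
```

A UE that is effectively served by only a handful of nearby APs sits at the small-antenna end of this trend, which is the situation the text attributes to CF-MMIMO.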

Related works
A broad spectrum of methods has been applied to reduce pilot overhead and the IUI in communication systems. Among these methods, we may cite the pilot assignment, which can be formulated as a combinatorial problem, resulting in an excessive computational complexity [22]. This has boosted the design of suboptimal greedy pilot assignment algorithms, which exploit statistical information such as the large-scale fading to enhance the max-min fairness spectral efficiency and reduce UL PC between cells in MMIMO [23].
In CF-MMIMO, to reduce the pilot overhead and to guarantee a higher DL sum throughput, the authors in [17] considered a pilot utility metric to determine whether a UE requires an orthogonal DL pilot beamformed by each AP in order to acquire the DL channel gain, or can simply rely on statistical channel knowledge. However, in practice, since the coherence interval has limited length, the pilot overhead should be decreased by using non-orthogonal pilots or by reusing pilots. In that respect, many CF-MMIMO papers used random pilot assignment, that is, each UE is assigned at random one pilot sequence from a collection of orthogonal pilot sequences [11,24]. On the one hand, its positive aspects are its simplicity and its suitability for distributed networks. On the other hand, it does not account for the situation in which UEs in close vicinity share the same pilot, which leads to significant PC. To deal with this weakness, appropriate pilot assignment schemes are needed to mitigate PC and to reduce pilot overhead. The greedy pilot assignment algorithm in [11] first allocates UL pilots to all UEs at random. This assignment is then enhanced iteratively by choosing the UE with the minimum per-user DL rate, which updates its allocated UL pilot so that its PC effect is minimized. Moreover, to ensure the same quality of service throughout the network, [18] put forward a joint UL-DL pilot assignment algorithm, which minimizes a utility function depending on UL and DL PC. Another way to decrease the pilot overhead is "clustering pilot assignment". Among these pilot assignment schemes, a "regular pilot reuse" scheme is utilized in [25] so that geographically separated UEs share a pilot while the PC effect is kept at an acceptably low level.
Furthermore, to control the PC in CF-MMIMO, [26] proposed a dynamic cooperation clustering, a user-centric approach, where serving each UE is ensured by a cluster of APs offering the best channel conditions.
In addition, graph-colouring algorithms, being minimization techniques, have been applied to reduce interference in mobile communications [27,28]. Reference [29] proposes a pilot assignment algorithm based on the vertex graph-colouring problem to model the reuse of UL pilots between multiple cells in cellular networks. Precisely, in [29], UL PC occurs only between cells because a pilot can be used only once within a cell. The graph-theoretic approach was also used to assign DL training pilots in order to reduce the pilot overhead in centralized MMIMO systems [28]. Furthermore, it was used for UL pilot assignment in [27], by exploiting the large-scale fading coefficients in MMIMO systems. Reference [27] studied the UL interference that one UE can induce on another UE when both use the same UL pilot in MMIMO systems. In such systems, DL pilots are not necessary thanks to channel hardening and channel reciprocity. This holds under the assumption of low or moderate UE speed.

Contributions of the paper
Motivated by the improvements that graph colouring brings in mitigating UL PC in multi-cell MMIMO, we extend the study to CF-MMIMO systems and investigate under which conditions UEs can share the same DL pilot while mitigating the DL PC effect. To this end, as an improvement and an extension of our work in [30], we provide a detailed description and analysis of graph colouring for DL pilot assignment to mitigate the DL PC in CF-MMIMO. To study only the potential DL interference between UEs, we use non-orthogonal DL pilots for DL channel estimation, and we restrict ourselves to orthogonal UL pilots for UL channel estimation. Specifically, our contributions are as follows:
• We propose to measure the strength of the potential IUI between two UEs via a ratio of the interference channel strength to the desired channel strength, deduced from the per-user DL rate.
• We construct an interference (or conflict) graph to identify the potential IUI relationships among all UEs in CF-MMIMO. Two UEs for which the ratio of the interference channel strength to the desired channel strength exceeds a certain threshold are in conflict, that is, they are connected to each other.
• We develop a graph-colouring algorithm that greedily allocates different pilots to connected UEs and the same pilot to non-connected UEs in the conflict graph in CF-MMIMO.
• We study the performance gain of the proposed graph-colouring algorithm for DL pilot assignment in CF-MMIMO, compared to schemes based on sCSI, orthogonal pilot assignment and random pilot assignment in the DL training.
• We study the reduction of the DL pilot overhead ratio versus the number of UEs using the proposed scheme.

Paper outline and notation
The remaining parts of this paper are structured as follows. We introduce in Section 2 the non-orthogonal pilot assignment and we deal with the PC problem in MMIMO systems. Then, we present in Section 3 the considered system model and we devote Section 4 to the performance analysis. Section 5 presents the key piece of this paper, dealing with the graph-theoretic approach for DL pilot assignment, while Section 6 contains the numerical results. Finally, concluding observations are given. Boldface letters denote column vectors. The superscripts $(\cdot)^{*}$ and $(\cdot)^{H}$ stand for, respectively, the conjugate and the conjugate transpose. The expectation operator is denoted by $\mathbb{E}\{\cdot\}$. Finally, $z \sim \mathcal{CN}(\mu, \sigma^2)$ stands for a circularly symmetric complex Gaussian random variable (RV) $z$ with mean $\mu$ and variance $\sigma^2$.

Time-division duplexing transmission
Time-division duplexing (TDD) operation is recommended in MMIMO systems because the training overhead is independent of the total number of serving antennas in the BS array, unlike frequency-division duplexing (FDD) operation [4]. We assume half-duplex TDD so that only one end of the link is transmitting at any one time, either the BS or the UEs. The channel coherence interval is defined as the time-frequency interval during which the channel response can be considered almost constant. It is determined by the propagation environment, UE mobility and carrier frequency [4]. The TDD frame is generally designed to be smaller than or equal to the smallest channel coherence interval of all active UEs. For the sake of simplicity, throughout the paper, we assume that a TDD frame is the same as the channel coherence interval. In addition, the coherence interval is split into UL and DL subintervals. As in [4], two configurations of a TDD frame are available, differing only in whether DL pilots are included. The guard intervals between UL and DL transmissions are not shown. The first configuration, shown in Figure 1a, includes UL training, UL data, DL training and DL payload data transmission [4,31]. These phases are described in detail in the following paragraph. Compared to the first configuration, the second one, shown in Figure 1b, does not include DL pilots. Here, we assume a coherence interval of size $\tau_c$, where $\tau_c = T_c B_c$ and $T_c$ and $B_c$ stand for, respectively, the coherence time and the coherence bandwidth. Moreover, $\tau_c$ can be written as the sum of the lengths of the subintervals as follows
$$\tau_c = \tau_{\mathrm{ul,p}} + \tau_{\mathrm{ul,d}} + \tau_{\mathrm{dl,p}} + \tau_{\mathrm{dl,d}},$$
where $\tau_{\mathrm{ul,p}}$ and $\tau_{\mathrm{ul,d}}$ are, respectively, the number of symbols per TDD frame used in the UL to send pilots and data, and $\tau_{\mathrm{dl,p}}$ and $\tau_{\mathrm{dl,d}}$ are, respectively, the number of symbols per TDD frame used in the DL to send pilots and data.
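The frame budget above lends itself to a short worked example. The sketch below assumes that the coherence interval is the product of coherence time and coherence bandwidth and that the four subinterval lengths sum to it, as stated in the text; the function names and all numerical values are hypothetical and serve only to contrast the two frame configurations.

```python
def coherence_interval_length(coherence_time_s, coherence_bandwidth_hz):
    """tau_c = T_c * B_c, rounded to a whole number of samples."""
    return round(coherence_time_s * coherence_bandwidth_hz)


def dl_data_length(tau_c, tau_ul_p, tau_ul_d, tau_dl_p):
    """Samples left for DL payload once the other three phases are budgeted."""
    tau_dl_d = tau_c - tau_ul_p - tau_ul_d - tau_dl_p
    if tau_dl_d < 0:
        raise ValueError("pilot and UL data phases exceed the coherence interval")
    return tau_dl_d


# Hypothetical numbers: T_c = 1 ms, B_c = 200 kHz, 20 UL pilot and 80 UL data samples.
tau_c = coherence_interval_length(1e-3, 200e3)
with_dl_pilots = dl_data_length(tau_c, 20, 80, tau_dl_p=20)    # frame with DL pilots
without_dl_pilots = dl_data_length(tau_c, 20, 80, tau_dl_p=0)  # frame without DL pilots
```

Every sample spent on DL pilots is removed from the DL payload, which is why the number of DL pilots is worth minimizing.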
In a TDD system, the UL and DL channels use the same frequency spectrum, but different time slots. We assume that the UL and DL channels are reciprocal thanks to an accurate calibration of the hardware chains. Note that precoding requires high-quality CSI at the transmitter. The CSI is obtained differently in the UL and the DL:
• In the UL channel, precisely in the UL training phase (UL pilots), the UEs send known signals, the pilots, to the BS. The BS then estimates the UL CSI of all UEs from the received pilot signals. Assuming channel reciprocity, this CSI is used to detect the signals sent by the UEs in the UL (UL data) and to precode the transmit signals in the DL (DL data).
• In the DL channel, the DL CSI is needed by the BS and may be needed by the UE, depending on the channel hardening and the speed of the UE [17]. On the BS side, the DL CSI is needed to precode the transmitted signals. Thanks to reciprocity, the channel estimated in the UL is used to precode the DL data. On the UE side, the effective DL channel gain is used to decode the desired signals. To provide the DL channel information, the BS can beamform pilots (DL pilots). Based on the received signal, each UE can estimate the effective DL channel gain.
It is important to note the following points. First, the DL pilots are placed before the DL data in order to avoid the additional latency caused by switching the base station on and off. For the same reason, the UL data are placed before the DL data. Second, enough time must be left for UL channel estimation and precoding processing before the DL data are sent. As a consequence, DL data transmission (respectively UL data transmission), which takes place after the UL training, is based on precoding (respectively detection) with a good UL channel estimate.

Pilot reuse and PC
Whether in the UL or the DL, the quality of the channel information in conventional MMIMO or CF-MMIMO may be affected by the PC problem [32]. It occurs during UL/DL channel estimation, when two or more UEs send/receive non-orthogonal pilots or, more precisely, the same pilot sequences. Moreover, this problem persists even when the number of BS antennas grows without bound [1,31]. As explained in [5,33], the intuitive way to avoid this problem in the UL (or DL) is to assign each UE an orthogonal pilot in the UL (or DL), so that the channel estimates are not contaminated. However, the length of orthogonal pilot sequences must be at least the number of orthogonal pilots, that is, the number of UEs. Moreover, the channel coherence interval has limited length because the propagation environment changes, which limits the length of the orthogonal pilot sequences. Assigning orthogonal pilots to all UEs is therefore not always physically realistic. In addition, to decode its intended data, the UE may need to rely on beamformed DL pilots. If the latter are non-orthogonal, DL PC arises. In that respect, in Section 5, we propose a pilot reuse scheme based on graph theory to mitigate DL PC in CF-MMIMO.

CF-MMIMO SYSTEM MODEL
Here we employ the CF-MMIMO system model of [11,17,34]. Its key elements are $M$ single-antenna APs and $K$ single-antenna UEs, which are spread randomly over a large area without cell boundaries, with $K < M$. The APs are linked to one or more CPUs via a fronthaul network and the CPUs are connected via a backhaul network, as shown in Figure 2. The features of this system are as follows: the $M$ APs communicate at the same time with the $K$ UEs in the same time/frequency resource. The system operates in half-duplex TDD, which allows communication in one direction per time slot, either from the APs to the UEs (DL) or from the UEs to the APs (UL). As a consequence, the system takes advantage of channel reciprocity, that is, the channel responses in the UL and the DL are the same. This requires a calibration of the hardware chains. As a DL precoder, we use conjugate beamforming, also known as maximum-ratio transmission. This choice is justified by the following features: (i) it is well suited to deployment in a distributed system [31]; (ii) it avoids the exchange of CSI among APs; (iii) it reduces the fronthaul network load [13] because the precoding can be performed locally at each AP without exchanging information with other APs; and (iv) although it is not an optimal precoder, it has low complexity and can be implemented with inexpensive hardware components.
Using the same notation as [17], we suppose that the TDD frame length equals the coherence interval length. As a consequence, the channel is considered static within a frame and varies independently from frame to frame. The coefficient of the channel between the $k$th UE and the $m$th AP, $g_{mk}$, is given by
$$g_{mk} = \sqrt{\beta_{mk}}\, h_{mk},$$
where $h_{mk}$ stands for the small-scale fading component and $\beta_{mk}$ stands for the large-scale fading component, which incorporates path-loss and shadowing. We suppose that $h_{mk}$, $m = 1, \dots, M$, $k = 1, \dots, K$, are independent and identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ RVs. The hypothesis of independent small-scale fading is justified by the fact that the APs and the UEs are deployed over a large area, so that the scatterers seen by each AP and each UE are different. In addition, the small-scale fading is supposed to be static during every coherence interval and to vary independently from one coherence interval to another. The large-scale fading fluctuates much more slowly and remains constant over many coherence intervals. Hence, the coefficients $\{\beta_{mk}\}$ are supposed to be known.
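As an illustration of the channel model just described, the following minimal sketch draws one realization of the channel coefficients as the square root of the large-scale fading coefficient times an i.i.d. CN(0, 1) small-scale term. The function names and the example large-scale fading values are hypothetical; only the structure of the model is taken from the text.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: independent real and imaginary parts of variance 1/2."""
    sigma = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def draw_channels(beta, rng):
    """Return g[m][k] = sqrt(beta[m][k]) * h[m][k] for one coherence interval."""
    return [[math.sqrt(b) * complex_gaussian(rng) for b in row] for row in beta]


# Hypothetical large-scale fading coefficients for M = 2 APs and K = 3 UEs.
beta = [[1.0, 0.1, 0.01],
        [0.05, 0.8, 0.2]]
g = draw_channels(beta, random.Random(1))
```

Calling `draw_channels` again with the same `beta` models a new coherence interval: the small-scale fading is redrawn while the large-scale fading stays fixed.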

UL training
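In our case, since we assume that the channel coherence interval equals a TDD frame, we have $\tau_{\mathrm{ul,p}} < \tau_c$. We denote the UL pilot sequence of length $\tau_{\mathrm{ul,p}}$ samples sent by the $k$th UE by $\sqrt{\tau_{\mathrm{ul,p}}}\,\boldsymbol{\varphi}_k \in \mathbb{C}^{\tau_{\mathrm{ul,p}} \times 1}$, $k = 1, \dots, K$. During the training phase, we suppose that the pilot sequences are transmitted with full power and are mutually orthonormal. This requires $\tau_{\mathrm{ul,p}} \geq K$; that is, $\tau_{\mathrm{ul,p}} = K$ is the smallest number of symbols needed to generate $K$ orthogonal vectors. The $m$th AP receives the combination of the $K$ UL pilots from all UEs as follows
$$\mathbf{y}_{\mathrm{ul,p},m} = \sqrt{\tau_{\mathrm{ul,p}}\,\rho_{\mathrm{ul,p}}} \sum_{k=1}^{K} g_{mk}\, \boldsymbol{\varphi}_k + \mathbf{w}_{\mathrm{ul,p},m},$$
where $\rho_{\mathrm{ul,p}}$ is the normalized transmit signal-to-noise ratio (SNR) related to the pilot symbol and $\mathbf{w}_{\mathrm{ul,p},m}$ is the additive noise vector, whose elements are i.i.d. $\mathcal{CN}(0, 1)$ RVs. The procedure to estimate $g_{mk}$ is as follows: (i) the $m$th AP projects the received pilot signal $\mathbf{y}_{\mathrm{ul,p},m}$ onto $\boldsymbol{\varphi}_k^{H}$, resulting in $\tilde{y}_{\mathrm{ul,p},mk}$:
$$\tilde{y}_{\mathrm{ul,p},mk} = \boldsymbol{\varphi}_k^{H}\, \mathbf{y}_{\mathrm{ul,p},m} = \sqrt{\tau_{\mathrm{ul,p}}\,\rho_{\mathrm{ul,p}}}\; g_{mk} + \boldsymbol{\varphi}_k^{H}\, \mathbf{w}_{\mathrm{ul,p},m};$$
and (ii) the resulting signal $\tilde{y}_{\mathrm{ul,p},mk}$ is then used as prior information to perform the minimum mean square error (MMSE) estimation of $g_{mk}$, giving $\hat{g}_{mk}$, $k = 1, \dots, K$, as follows
$$\hat{g}_{mk} = c_{mk}\, \tilde{y}_{\mathrm{ul,p},mk},$$
where
$$c_{mk} \triangleq \frac{\sqrt{\tau_{\mathrm{ul,p}}\,\rho_{\mathrm{ul,p}}}\; \beta_{mk}}{\tau_{\mathrm{ul,p}}\,\rho_{\mathrm{ul,p}}\, \beta_{mk} + 1}.$$
The channel estimation error is given by $\tilde{g}_{mk} \triangleq g_{mk} - \hat{g}_{mk}$. Owing to the properties of linear MMSE estimation [35], $\hat{g}_{mk}$ and $\tilde{g}_{mk}$ are uncorrelated. Furthermore, the estimate and the estimation error are jointly Gaussian distributed, thus they are statistically independent. The variance of the UL channel estimate, $\gamma_{mk}$, is defined as follows
$$\gamma_{mk} \triangleq \mathbb{E}\{|\hat{g}_{mk}|^2\} = \sqrt{\tau_{\mathrm{ul,p}}\,\rho_{\mathrm{ul,p}}}\; \beta_{mk}\, c_{mk}.$$
The channel estimate and the estimation error in the UL are distributed, respectively, as $\hat{g}_{mk} \sim \mathcal{CN}(0, \gamma_{mk})$ and $\tilde{g}_{mk} \sim \mathcal{CN}(0, \beta_{mk} - \gamma_{mk})$.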
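The UL estimation step can be sketched in a few lines. The code below assumes the standard closed forms for orthonormal UL pilots and unit-variance noise, namely c = sqrt(tau rho) beta / (tau rho beta + 1) for the MMSE coefficient and gamma = sqrt(tau rho) beta c for the estimate variance; these expressions and all function names are assumptions made for illustration, since the source equations are not fully legible.

```python
import math


def mmse_coefficient(beta, tau_ul_p, rho_ul_p):
    """Scalar multiplying the projected pilot to obtain the MMSE channel estimate."""
    snr = tau_ul_p * rho_ul_p
    return math.sqrt(snr) * beta / (snr * beta + 1.0)


def estimate_variance(beta, tau_ul_p, rho_ul_p):
    """Variance gamma of the channel estimate; it never exceeds beta."""
    coefficient = mmse_coefficient(beta, tau_ul_p, rho_ul_p)
    return math.sqrt(tau_ul_p * rho_ul_p) * beta * coefficient


def estimate_channel(projected_pilot, beta, tau_ul_p, rho_ul_p):
    """MMSE estimate of the channel from the projected received pilot."""
    return mmse_coefficient(beta, tau_ul_p, rho_ul_p) * projected_pilot
```

Under these assumptions the estimate variance approaches the large-scale fading coefficient as the pilot SNR grows, so the estimation error variance (beta minus gamma) vanishes.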

DL payload data transmission
In the DL data transmission phase, each AP implements power control and uses the UL estimated CSI to precode, via conjugate beamforming, the signals to be transmitted to all $K$ UEs. Hence, as described on the AP side in Figure 3, the $m$th AP transmits to all UEs the following signal:
$$x_{\mathrm{dl,d},m} = \sqrt{\rho_{\mathrm{dl,d}}} \sum_{k=1}^{K} \sqrt{\eta_{mk}}\; \hat{g}^{*}_{mk}\, q_k, \quad (8)$$
where $q_k$ denotes the data symbol for the $k$th target UE, which satisfies $\mathbb{E}\{|q_k|^2\} = 1$. Furthermore, the symbols $q_k$ are zero-mean, unit-variance and uncorrelated. $\rho_{\mathrm{dl,d}}$ is the normalized transmit SNR related to the data symbol. Furthermore, $\eta_{mk}$, $m = 1, \dots, M$, $k = 1, \dots, K$, are the power control coefficients, which represent the power conveyed by the $m$th AP to the $k$th UE. They are determined under the average power limitation at each AP:
$$\mathbb{E}\{|x_{\mathrm{dl,d},m}|^2\} \leq \rho_{\mathrm{dl,d}}. \quad (9)$$
The power constraint (9) can be written as a function of $\eta_{mk}$, $m = 1, \dots, M$, $k = 1, \dots, K$, and $\gamma_{mk}$ by substituting (8) into (9) as follows
$$\sum_{k=1}^{K} \eta_{mk}\, \gamma_{mk} \leq 1, \quad \forall m.$$
The $k$th UE receives the sum of the data signals sent by all the APs in (8) as follows
$$y_{\mathrm{dl,d},k} = \sum_{m=1}^{M} g_{mk}\, x_{\mathrm{dl,d},m} + w_{\mathrm{dl,d},k} = \sqrt{\rho_{\mathrm{dl,d}}} \sum_{k'=1}^{K} a_{kk'}\, q_{k'} + w_{\mathrm{dl,d},k},$$
where each of the terms $a_{kk'} \triangleq \sum_{m=1}^{M} \sqrt{\eta_{mk'}}\; g_{mk}\, \hat{g}^{*}_{mk'}$, $k' = 1, \dots, K$, describes the effective DL channel gain seen by the $k$th target UE for the symbol sent to the $k'$th UE. Furthermore, $w_{\mathrm{dl,d},k}$ is additive $\mathcal{CN}(0, 1)$ noise at the $k$th UE. The $k$th UE must have sufficient knowledge of $a_{kk}$ to accurately decode $q_k$ [34], as shown in Figure 3, more precisely in the decoding block on the UE side. To this end, among the available methods, the $k$th UE can rely on channel hardening and assume that $a_{kk} \approx \mathbb{E}\{a_{kk}\}$. This method is appropriate for conventional MMIMO owing to the "law of large numbers". The situation is different in CF-MMIMO because of its distributed architecture: only a subset of APs, that is, the closest ones, can effectively serve a UE because the signals of the far ones are attenuated by larger path-loss. Therefore, in order to estimate $a_{kk}$, the $k$th UE utilizes DL beamformed pilots.
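The quantities of this subsection can be made concrete with a small sketch. It assumes that the per-AP power constraint reduces to the sum over UEs of the power control coefficient times the UL estimate variance being at most one, and it uses an equal-coefficient power allocation that meets this constraint with equality; that allocation rule, like the function names, is an illustrative assumption rather than the power control used in the paper.

```python
import math


def equal_power_coefficients(gamma):
    """Assumed rule: eta[m][k] = 1 / sum_k' gamma[m][k'], identical for all UEs at AP m."""
    return [[1.0 / sum(row)] * len(row) for row in gamma]


def satisfies_power_constraint(eta, gamma, tol=1e-9):
    """Check sum_k eta[m][k] * gamma[m][k] <= 1 at every AP m."""
    return all(
        sum(e * g for e, g in zip(eta_row, gamma_row)) <= 1.0 + tol
        for eta_row, gamma_row in zip(eta, gamma)
    )


def effective_gain(k, k_prime, g, g_hat, eta):
    """a_{kk'} = sum_m sqrt(eta[m][k']) * g[m][k] * conj(g_hat[m][k'])."""
    return sum(
        math.sqrt(eta[m][k_prime]) * g[m][k] * g_hat[m][k_prime].conjugate()
        for m in range(len(g))
    )
```

Here `g`, `g_hat`, `eta` and `gamma` are nested lists indexed as `[m][k]` (AP first, UE second); `effective_gain(k, k, ...)` is the desired gain, while the terms with a different second index are the interfering gains.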

DL training
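In order to determine the effective DL channel gain estimate $\hat{a}_{kk}$, the $m$th AP precodes, using conjugate beamforming, and beamforms to all UEs the DL pilots, defined by $\sqrt{\tau_{\mathrm{dl,p}}}\,\boldsymbol{\psi}_{k'} \in \mathbb{C}^{\tau_{\mathrm{dl,p}} \times 1}$, $k' = 1, \dots, K$. Thus, the $\tau_{\mathrm{dl,p}} \times 1$ pilot signal $\mathbf{x}^{\mathrm{CF}}_{\mathrm{dl,p},m}$ sent from the $m$th AP is given by
$$\mathbf{x}^{\mathrm{CF}}_{\mathrm{dl,p},m} = \sqrt{\tau_{\mathrm{dl,p}}\,\rho_{\mathrm{dl,p}}} \sum_{k'=1}^{K} \sqrt{\eta_{mk'}}\; \hat{g}^{*}_{mk'}\, \boldsymbol{\psi}_{k'},$$
where $\rho_{\mathrm{dl,p}}$ is the normalized transmit SNR per DL pilot symbol and any two of the DL pilot sequences $\boldsymbol{\psi}_1, \dots, \boldsymbol{\psi}_K$ can be either mutually orthonormal or identical, as follows
$$\boldsymbol{\psi}_k^{H} \boldsymbol{\psi}_{k'} = \begin{cases} 1, & \text{if the } k\text{th and } k'\text{th UEs use the same DL pilot}, \\ 0, & \text{otherwise}. \end{cases}$$
The $k$th UE receives a corresponding $\tau_{\mathrm{dl,p}} \times 1$ pilot vector as follows
$$\mathbf{y}_{\mathrm{dl,p},k} = \sqrt{\tau_{\mathrm{dl,p}}\,\rho_{\mathrm{dl,p}}} \sum_{k'=1}^{K} a_{kk'}\, \boldsymbol{\psi}_{k'} + \mathbf{w}_{\mathrm{dl,p},k},$$
where $\mathbf{w}_{\mathrm{dl,p},k}$ is a receiver noise vector, whose elements are i.i.d. $\mathcal{CN}(0, 1)$ RVs.
In order to estimate the effective DL channel gain $a_{kk}$, the procedure is as follows. The $k$th UE multiplies the received DL pilot by the pilot sequence $\boldsymbol{\psi}_k^{H}$ as follows
$$\tilde{y}_{\mathrm{dl,p},k} = \boldsymbol{\psi}_k^{H}\, \mathbf{y}_{\mathrm{dl,p},k} = \sqrt{\tau_{\mathrm{dl,p}}\,\rho_{\mathrm{dl,p}}}\; a_{kk} + \sqrt{\tau_{\mathrm{dl,p}}\,\rho_{\mathrm{dl,p}}} \sum_{k' \neq k}^{K} a_{kk'}\, \boldsymbol{\psi}_k^{H} \boldsymbol{\psi}_{k'} + n_{\mathrm{dl,p},k}, \quad (18)$$
where $n_{\mathrm{dl,p},k} \triangleq \boldsymbol{\psi}_k^{H}\, \mathbf{w}_{\mathrm{dl,p},k} \sim \mathcal{CN}(0, 1)$. The second term in (18) describes the DL PC effect, which arises because the DL pilot sequences may be reused. Based on [35], the resulting signal $\tilde{y}_{\mathrm{dl,p},k}$ is then used as prior information to perform MMSE estimation of $a_{kk}$. Consequently, the linear MMSE estimate $\hat{a}_{kk}$, $k = 1, \dots, K$, takes the standard form
$$\hat{a}_{kk} = \mathbb{E}\{a_{kk}\} + \frac{\mathrm{Cov}\{a_{kk}, \tilde{y}_{\mathrm{dl,p},k}\}}{\mathrm{Var}\{\tilde{y}_{\mathrm{dl,p},k}\}} \left( \tilde{y}_{\mathrm{dl,p},k} - \mathbb{E}\{\tilde{y}_{\mathrm{dl,p},k}\} \right).$$
$\hat{a}_{kk}$ is used by the $k$th UE to decode its intended data, as described on the UE side in Figure 3. The detailed derivation of $\hat{a}_{kk}$ is given in [18]. The estimation error is $\tilde{a}_{kk} = a_{kk} - \hat{a}_{kk}$.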
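The DL pilot contamination mechanism can be sketched as follows, under the assumption that the projected pilot is the scaled desired gain plus the scaled sum of the effective gains of all other UEs sharing the same pilot, plus noise. The function name, the integer pilot indices and the example gains are hypothetical.

```python
import math


def projected_dl_pilot(k, a_row, pilot_index, tau_dl_p, rho_dl_p, noise=0j):
    """Return (projected pilot, contamination term) seen by UE k.

    a_row[k'] is the effective gain a_{kk'}; pilot_index[k'] identifies the DL
    pilot of UE k'. Pilots with equal index are identical, others orthogonal.
    """
    scale = math.sqrt(tau_dl_p * rho_dl_p)
    desired = scale * a_row[k]
    contamination = scale * sum(
        a for k_prime, a in enumerate(a_row)
        if k_prime != k and pilot_index[k_prime] == pilot_index[k]
    )
    return desired + contamination + noise, contamination


# Hypothetical effective gains seen by UE 0 in a system with K = 3 UEs.
a_row = [2 + 0j, 0.5j, 0.1 + 0j]
reused, contamination = projected_dl_pilot(0, a_row, [0, 0, 1], tau_dl_p=4, rho_dl_p=1.0)
clean, no_contamination = projected_dl_pilot(0, a_row, [0, 1, 2], tau_dl_p=4, rho_dl_p=1.0)
```

With orthogonal pilots the contamination term vanishes; when UE 0 and UE 1 share a pilot, the gain intended for UE 1 leaks into the observation of UE 0.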

PERFORMANCE ANALYSIS
We consider two types of CSI, namely iCSI and sCSI [36]. With iCSI, the current (instantaneous) channel conditions are known. With sCSI, we have access to the statistical characterization of the channel. The CSI acquisition depends on the speed of channel variation as follows:
• In fast fading channels, where channel conditions vary rapidly, only sCSI is suitable.
• In slow fading channels, iCSI can be estimated, with some estimation errors, over much larger coherence intervals before the channel changes.
In practice, the CSI often lies in between sCSI and iCSI. For example, to decide whether the UE can rely on iCSI or on sCSI to decode data, [17] considered a pilot utility metric compared to a specific threshold. In what follows, we focus on studying the attainable DL rate under sCSI and iCSI.

Attainable DL rate under instantaneous CSI
Let us consider a fading channel with non-Gaussian effective noise, and UEs that have imperfect iCSI. Under these hypotheses, the UE relies on beamformed DL pilots to estimate the DL channel gain. The latter serves as side information for the capacity-bounding technique, following the same process as in Section 2.3.5 in [4]. As $\{a_{kk'}\}$, $k, k' = 1, \dots, K$, can be approximated as Gaussian random variables, the DL ergodic capacity lower bound can be approximated by the following proposition:
Proposition 1. A closed-form expression for an approximate attainable DL rate of the transmission from the APs to the $k$th UE, in a CF-MMIMO system, via conjugate beamforming, using orthogonal training sequences in the UL and non-orthogonal training sequences in the DL, for any finite $M$ and $K$, is provided by (20).
We note that in (20) we have $\varsigma_{kk'} = \sum_{m=1}^{M} \eta_{mk'}\, \beta_{mk}\, \gamma_{mk'}$, and $\kappa_k$ is the DL channel estimate variance, which incorporates the effects of the DL PC. It represents the DL counterpart of the term $\gamma_{mk}$. In what follows, we extract three terms from (20). These terms represent, respectively, the power of the desired signal, the IUI induced by the $k'$th UE and the variance of the DL channel estimation error $\tilde{a}_{kk}$.

Attainable DL rate under statistical channel at the receiver
As a hypothesis, the $k$th UE counts on sCSI. As a consequence, following Section 2.3.4 in [4], the attainable DL rate, denoted as $R^{\mathrm{sCSI,CF}}_k$, is dictated by (21). The term $\rho_{\mathrm{dl,p}}\,\varsigma_{kk'}$ stands for the beamforming gain uncertainty. It originates from the UEs' lack of iCSI. As shown in (25), the beamforming gain uncertainty not only represents the variance of the effective DL channel, but also indirectly quantifies the channel hardening. Precisely, it is inversely proportional to the channel hardening degree.

FIGURE 4 (a) Random pilot reuse: each UE is randomly assigned a pilot sequence chosen from a collection of orthogonal sequences. UEs coloured with the same colour share the same DL pilot. This may cause severe interference between the UEs, for example, the UEs in the red shadow zone. (b) Adjacency list. (c) Conflict graph of the proposed scheme: three pilots are used for six UEs, alleviating the PC. (d) Adjacency matrix: each of the 1's in the adjacency matrix is equivalent to an edge in the conflict graph

GRAPH-THEORETIC APPROACH FOR PILOT ASSIGNMENT
In this section, we develop the proposed graph-theoretic approach to allocate DL training resources to all UEs in CF-MMIMO. First, we explain the utility of the proposed approach and then we apply it to allocate DL pilots.
As shown in Equation (20), the attainable DL rate $R^{\mathrm{iCSI,CF}}_k$ is a function of $\kappa_k$, which is negatively impacted by the DL PC. To illustrate this problem, we consider the example in Figure 4a. The UEs painted with the same colour share the same beamformed DL pilot. Thus, the UEs in the red shadow zone suffer from severe PC. From (20), we consider that $\sum_{k' \neq k}^{K} W^{\mathrm{IUI}}_{kk'}$ represents the interference strength from the other UEs experienced by the $k$th UE. Hence, we define a symmetric matrix $\boldsymbol{\Theta} = [\theta_{kk'}]_{K \times K}$, whose entries measure the strength of the IUI that can be induced between the $k$th UE and the $k'$th UE. Each entry $\theta_{kk'}$ is defined as the ratio of the interference channel strength to the desired signal strength. A larger $\theta_{kk'}$ indicates more severe interference between the $k$th and the $k'$th UEs when the same DL pilot is allocated to them. Consequently, it is appropriate to define a threshold $\theta_{\mathrm{th}}$ that determines whether the same pilot can be allocated to two UEs, that is, $\theta_{kk'} < \theta_{\mathrm{th}}$, or not.

Construction of interference graph based pilot allocation
By employing the information about the dominant DL interference between UEs, we formulate the resource allocation as a graph-colouring problem. According to $\theta_{\mathrm{th}}$, we create an adjacency matrix $\mathbf{A} = [\alpha_{kk'}]_{K \times K}$ whose entries are as follows
$$\alpha_{kk'} = \begin{cases} 1, & \text{if } \theta_{kk'} > \theta_{\mathrm{th}} \text{ and } k \neq k', \\ 0, & \text{otherwise}. \end{cases}$$
Like the aforementioned matrix $\boldsymbol{\Theta}$, the matrix $\mathbf{A}$ is symmetric. The adjacency matrix is then used to build the adjacency list and, from it, the conflict (or interference) graph, in order to define the interference relationships among all UEs in the CF-MMIMO system. We illustrate the problem in Figure 4. We consider a system involving six UEs, that is, $K = 6$. Depending on $\mathbf{A}$, we generate the adjacency list of this system, represented in Figure 4b. The first line of the adjacency list reads as follows: UE 1 is in conflict with UE 2 and UE 3, which means that $\theta_{1,2}$ and $\theta_{1,3}$ exceed the predefined threshold $\theta_{\mathrm{th}}$. The other lines of the adjacency list are read in the same way. Then, an interference or conflict graph is constructed in Figure 4c to describe the relationships between UEs in terms of the severity of PC in this example. Every edge of the conflict graph represents a conflict between two vertices (UEs) and distinct vertex colours represent distinct (orthogonal) pilots. For this example, the adjacency matrix $\mathbf{A}$ is shown in Figure 4d.
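The construction of the conflict graph can be sketched as follows. The code assumes that two distinct UEs are in conflict exactly when their interference ratio exceeds the threshold, as the example in the text suggests; the function names, the 0-based UE indices and the matrix values are hypothetical.

```python
def adjacency_matrix(theta, threshold):
    """Entry (i, j) is 1 when distinct UEs i and j exceed the interference threshold."""
    size = len(theta)
    return [
        [1 if i != j and theta[i][j] > threshold else 0 for j in range(size)]
        for i in range(size)
    ]


def adjacency_list(adjacency):
    """Map each UE index to the list of UEs it is in conflict with."""
    return {i: [j for j, edge in enumerate(row) if edge] for i, row in enumerate(adjacency)}


# Hypothetical symmetric interference-ratio matrix for K = 4 UEs (0-based indices).
theta = [[0.0, 0.9, 0.7, 0.1],
         [0.9, 0.0, 0.2, 0.1],
         [0.7, 0.2, 0.0, 0.6],
         [0.1, 0.1, 0.6, 0.0]]
A = adjacency_matrix(theta, threshold=0.5)
conflicts = adjacency_list(A)
```

In this toy example the first UE is in conflict with the second and third, mirroring the first line of the adjacency list described for Figure 4b. Lowering the threshold adds edges and therefore tends to require more pilots.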

The proposed algorithm
To resolve the DL pilot allocation problem, a solution based on graph theory is put forward. It consists in colouring a conflict graph, where each coloured vertex in the conflict graph represents a distinct UE and each different colour represents a unique (orthogonal) DL pilot sequence. In particular, the colouring process must obey these rules: (i) two vertices (UEs) that are linked by an edge, that is, in conflict, are given different colours (pilots); and (ii) the number of colours (pilots) is minimized. Hence, mathematically, the optimization problem of vertex colouring can be described as follows
$$\min_{\{c_k\}} \; K_{\mathrm{dl,tr}} \quad \text{subject to} \quad c_k \neq c_{k'} \text{ for every pair of UEs } k, k' \text{ linked by an edge}, \quad c_k \in \{1, \dots, K_{\mathrm{dl,tr}}\}, \quad (28)$$
where $K_{\mathrm{dl,tr}}$ is the number of colours (DL training resources) required to colour the conflict graph and $c_k$ is the colour allocated to the $k$th UE, $k \in \{1, 2, \dots, K\}$. Throughout the paper, the DL pilot size $\tau_{\mathrm{dl,p}}$ is the same as the number of DL pilots. Furthermore, note that only $K$ DL pilots are available and that there is no limit on the number of times a pilot (colour) can be reused. The problem in (28) is known to be NP-hard, which leads to high complexity. Thus, to address this issue, we need to opt for heuristic, suboptimal algorithms. For further details, three constructive algorithms for the graph-colouring problem, namely the GREEDY, DSATUR, and RLF algorithms, are described in [37].

TABLE 1 Graph-colouring based pilot assignment algorithm in CF-MMIMO (GC-PA-CF-MMIMO)
Allocating the same DL pilot to every UE.
Step 1: Construction of the interference conflict graph based on $R^{\mathrm{iCSI,CF}}_k$, $k = 1, \dots, K$.
Step 2: Arranging the vertex set $\{\mathrm{UE}_1, \dots, \mathrm{UE}_K\}$ in descending order of the degree $d_k$, $k = 1, \dots, K$.
Step 3: Assign the colour $c_1 = 1$ to the highest-degree vertex.
Step 4: Loop (for) over the remaining vertices to colour them.
Here, we use a greedy algorithm to solve the graph-colouring problem sequentially. At every step, the greedy algorithm makes the choice that appears to be the best among the local alternatives, without considering the consequences that may follow and without backtracking.
We propose a graph-colouring based pilot assignment algorithm in CF-MMIMO (GC-PA-CF-MMIMO) in Table 1, whose instructions are elaborated step by step below. As input, the algorithm takes the CF-MMIMO system parameters $K$, $M$, $\{\beta_{mk}\}$, and $\theta_{\mathrm{th}}$. In addition, so that the IUI induced within the system can be compared across UEs, we allocate the same DL pilot to every UE.

Step 1: With the same DL pilot allocated to every UE, we generate the conflict graph by computing $R_k^{\mathrm{iCSI,CF}}$, and the associated adjacency matrix $\mathbf{A}$.

Step 2: We arrange the vertex set $\{\mathrm{UE}_1, \dots, \mathrm{UE}_K\}$ in descending order of the vertex degree, where the degree $d_k$ of the $k$th vertex $\mathrm{UE}_k$ stands for the number of edges associated with $\mathrm{UE}_k$, $k = 1, \dots, K$. Expressly, $d_k$ stands for the number of UEs in conflict with the vertex $\mathrm{UE}_k$, that is, $d_k = \sum_{k' \neq k} a_{kk'}$. The resulting arranged set is denoted by $V_{\mathrm{arranged}}$. Subsequently, $V_i$ denotes the $i$-indexed vertex of $V_{\mathrm{arranged}}$ to colour, $K_i$ is the number of colours registered after colouring the first $i$ vertices, and $c_i$ is the colour used to colour $V_i$.

Step 3: In this step, we start the colouring process with the first vertex in $V_{\mathrm{arranged}}$, $V_1$, by assigning any colour to it. By convention, the first colour assigned to $V_1$ is coded by one, that is, $c_1 = 1$. More generally, each new colour code is incremented by one with respect to the previous new colour. Once $V_1$ is coloured, the number of colours is fixed to $K_1 = 1$.

Step 4 (for loop): For each loop index $i$, $\forall i \geq 2$, we concentrate on colouring $V_i$. Beforehand, we must avoid any conflict or interference between $V_i$ and the previously coloured vertices, that is, those indexed by the set $\mathcal{I}_i^{\mathrm{c}} = \{1, 2, \dots, i-1\}$. Precisely, the colour chosen for $V_i$ must differ from the colour of every previously coloured neighbour $V_k$:

$$c_i \neq c_k, \quad \forall k \in \mathcal{I}_i^{\mathrm{c}} \ \text{such that } V_k \text{ and } V_i \text{ are linked by an edge}.$$

Let $\bar{\mathcal{C}}_i$ be the set of all colours assigned to vertices that are neighbours of $V_i$, and let $\mathcal{C}_i = \{1, 2, \dots, K_{i-1}\} - \bar{\mathcal{C}}_i$ be the set of colour codes authorized to colour $V_i$. When colouring $V_i$, depending on the conditions, we either use a new colour or reuse a specific colour, as follows:

• Case 1: Using a new colour
If the authorized colour set $\mathcal{C}_i$ is empty, every colour used so far is already taken by a neighbour of $V_i$. We then colour $V_i$ with a new colour whose code is incremented by one, that is, $c_i = K_{i-1} + 1$, and thus $K_i = K_{i-1} + 1$.

• Case 2: Reusing a specific colour
If the authorized colour set $\mathcal{C}_i$ is non-empty, then to colour $V_i$ we reuse a colour $k$, that is, $c_i = k$, with

$$k = \arg\min_{j \in \mathcal{C}_i} n_j,$$

where $n_j$ is the number of vertices that have previously been coloured by colour $j$. Expressly, the colour $k$ used to colour $V_i$ is determined in two steps: (i) we select the colours authorized to colour $V_i$, that is, those belonging to $\mathcal{C}_i$; and (ii) we count how many times these selected colours have been used previously, and we choose the least used colour to colour $V_i$. Thus, $K_i = K_{i-1} < i$. Consequently, as we reuse colours, this case is the sine qua non for limiting the excess use of colours (pilots). In essence, the key idea of this paper to mitigate DL PC relies on reusing DL pilots while avoiding conflicts between vertices, so as to reduce the number of DL pilots, that is, to shorten the DL pilot sequence length.
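Steps 2 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the exact implementation behind Table 1: the function name `greedy_pilot_colouring` is hypothetical, ties in the degree ordering are broken by UE index, and ties between equally used colours are broken by the smallest colour code, two choices that the description leaves open.

```python
def greedy_pilot_colouring(A):
    """Greedy colouring of the conflict graph given by adjacency matrix A.

    Returns (colours, n_colours), where colours[k] is the 1-based colour
    (DL pilot index) of UE k+1 and n_colours is the number of pilots used.
    """
    K = len(A)
    degree = [sum(row) for row in A]
    # Step 2: descending degree; Python's sort is stable, so ties keep UE order.
    order = sorted(range(K), key=lambda k: -degree[k])

    colour = {}  # vertex index -> colour code
    usage = {}   # colour code j -> n_j, number of vertices coloured with j
    for v in order:  # Step 3 is the first pass of this loop (colour 1 is created)
        # Colours already taken by coloured neighbours of v.
        forbidden = {colour[u] for u in range(K) if A[v][u] and u in colour}
        authorized = [c for c in usage if c not in forbidden]
        if authorized:
            # Reuse the least-used authorized colour (assumed tie-break: smallest code).
            c = min(authorized, key=lambda j: (usage[j], j))
        else:
            # No authorized colour: open a new colour, incremented by one.
            c = len(usage) + 1
        colour[v] = c
        usage[c] = usage.get(c, 0) + 1
    return [colour[k] for k in range(K)], len(usage)


# Hypothetical 4-UE conflict graph with edges 1-2, 1-3 and 3-4.
A_example = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
colours, n_colours = greedy_pilot_colouring(A_example)
print(colours, n_colours)
```

In this toy graph two pilots suffice for four UEs, whereas a fully connected graph would fall back to one pilot per UE.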
Remark 1. Our proposed algorithm for DL pilot assignment is performed at the CPU. To that end, only the large-scale fading coefficients are forwarded from all the APs to the CPU. Since the algorithm relies on large-scale fading alone, the signal processing at the CPU is simplified: the large-scale fading varies much more slowly than the small-scale fading and remains constant over many coherence intervals. This exchange is necessary to construct the conflict graph of our proposed scheme. Indeed, the conflict graph construction is based on the ratio of the interference channel strength to the desired signal strength (see Equations 23, 24, and 26), and that ratio is a function of the large-scale fading coefficients (see Equations 6 and 7). In addition, to design the conflict graph, it is necessary to assign the same DL pilot to all users. To that end, as this common DL pilot is chosen from the orthogonal matrix of DL pilot sequences, the CPU sends the index of that pilot to all the APs, which use it towards all the users.

Threshold method selection
To control the conflict among UEs, we choose a threshold value $\theta_{\mathrm{th}}$ accordingly. If $\theta_{kk'} > \theta_{\mathrm{th}}$, UE $k$ and UE $k'$ are considered in conflict and are therefore not authorized to reuse the same pilot. If $\theta_{kk'} \leq \theta_{\mathrm{th}}$, UE $k$ and UE $k'$ are not in conflict and are therefore authorized to reuse the same pilot. We first consider the minimum and the maximum of $\theta_{kk'}$, that is, $\theta_{\min} = \min\{\theta_{kk'}\}$ and $\theta_{\max} = \max\{\theta_{kk'}\}$. The conflict graph depends on the selected value of $\theta_{\mathrm{th}}$. From (27), a conflict requires $\theta_{kk'} > \theta_{\mathrm{th}}$, and we obviously have $\theta_{kk'} \geq \theta_{\min}$. Mathematically, this is compatible with $\theta_{\min} \geq \theta_{\mathrm{th}}$, that is, with $\theta_{\mathrm{th}} \in\, ]-\infty, \theta_{\min}]$. In our context of conflict graph construction, however, the threshold marks the value from which $\theta_{kk'}$ is considered predominant. Thus, the lowest limit of $\theta_{\mathrm{th}}$ is $\theta_{\min}$. Likewise, the highest limit of $\theta_{\mathrm{th}}$ is $\theta_{\max}$. Thus, $\theta_{\mathrm{th}} \in [\theta_{\min}, \theta_{\max}]$.
We consider the two extreme cases as follows. When we set the threshold $\theta_{\mathrm{th}}$ to $\theta_{\max}$, no conflict is identified among UEs. Conflicts among UEs appear as $\theta_{\mathrm{th}}$ is decreased. Thus, the proposed algorithm GC-PA-CF-MMIMO can mitigate the interference while using a reduced number of DL pilots (colours). Setting the threshold $\theta_{\mathrm{th}}$ to $\theta_{\min}$ induces a conflict among all UEs, and the number of DL pilots (colours) required is then $K$. In this case, the algorithm GC-PA-CF-MMIMO is equivalent to an orthogonal pilot allocation scheme.
In order to determine the near-optimal threshold $\theta_{\mathrm{th}}$, we employ an algorithm called "iterative grid search (IGS)" [27], where $T$ represents the number of iterations and $N$ denotes the total number of grid points per iteration. In short, to ensure a certain net throughput per user, the IGS algorithm chooses the threshold parameter $\theta_{\mathrm{th}}$ among $N$ grid points in such a way that the minimum per-user DL net throughput is maximized in each iteration, until $T$ iterations are completed. The algorithm is elaborated as follows:
• Iteration 1: We consider the grid set formed by the $N$ threshold grid points of the first iteration. We denote by $\theta_{\max}^{(1)}$ the element of this grid set that attains the maximum of the minimum per-user DL net throughput. A subinterval is then retained and used in the next iteration as the search interval for $\theta_{\max}^{(2)}$.
• Iteration $t$, $2 \leq t \leq T$: Once we find $\theta_{\max}^{(t)}$, we narrow the search subinterval and obtain the search interval of the next iteration. This step is performed $(T - 1)$ times. Finally, the near-optimal threshold is $\theta_{\mathrm{th}} = \theta_{\max}^{(T)}$.
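The iterations above can be sketched in Python as follows. The exact grid set and the exact narrowing rule of [27] are not restated here, so this sketch makes two labelled assumptions: each grid is uniform over the current search interval, and the next search interval is spanned by the two grid neighbours of the best point. The function name `iterative_grid_search` and the toy objective are hypothetical; in the setting of this section, the objective would be the minimum per-user DL net throughput obtained for a given threshold.

```python
def iterative_grid_search(objective, lo, hi, n_grid, n_iter):
    """Maximize objective(x) over [lo, hi] with T = n_iter rounds of N = n_grid points.

    Assumptions (not taken from the paper): uniform grids, and a new search
    interval spanned by the grid neighbours of the best point of each round.
    """
    assert n_grid >= 3 and n_iter >= 1 and hi > lo
    best_x, best_val = None, float("-inf")
    for _ in range(n_iter):
        step = (hi - lo) / (n_grid - 1)
        grid = [lo + i * step for i in range(n_grid)]
        values = [objective(x) for x in grid]
        i_best = max(range(n_grid), key=lambda i: values[i])
        if values[i_best] > best_val:
            best_x, best_val = grid[i_best], values[i_best]
        # Narrow the search interval around the best grid point.
        lo = grid[max(i_best - 1, 0)]
        hi = grid[min(i_best + 1, n_grid - 1)]
    return best_x, best_val


# Toy concave objective with its maximum at 0.37 (stands in for the
# min-user throughput as a function of the normalized threshold).
def toy(x):
    return -(x - 0.37) ** 2


x1, _ = iterative_grid_search(toy, 0.0, 1.0, n_grid=11, n_iter=1)
x3, _ = iterative_grid_search(toy, 0.0, 1.0, n_grid=11, n_iter=3)
print(x1, x3)
```

With a single iteration the result is limited by the grid spacing, while additional iterations refine the estimate at a cost of $N$ objective evaluations each.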

NUMERICAL RESULTS AND DISCUSSION
This section presents a quantitative study to assess the performance, in terms of per-user DL net throughput, of our proposed solution and of the existing solutions in the literature. The $M$ APs and $K$ UEs are uniformly and randomly distributed within a square of area 1 km$^2$. To simulate a cell-free topology, that is, to avoid cell-edge effects, the square simulation area is wrapped around with eight copies of itself placed as neighbouring squares, which emulates a network with infinite area (wrap-around technique).
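As an illustration of the wrap-around technique, the following Python sketch drops nodes uniformly in the square and computes the AP-to-UE distance as the shortest distance over the original square and its eight neighbouring copies. The function names, the seed, and the node counts in the example are hypothetical; the per-axis minimum used here is equivalent to checking all nine copies explicitly.

```python
import math
import random

SIDE_M = 1000.0  # side of the 1 km^2 square, in metres


def drop_uniform(n, rng, side=SIDE_M):
    """Draw n positions uniformly at random in the square [0, side) x [0, side)."""
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]


def wraparound_distance(p, q, side=SIDE_M):
    """Shortest distance between p and q over the square and its eight copies."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    # On each axis, going through the border may be shorter than going across.
    dx = min(dx, side - dx)
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)


rng = random.Random(0)           # fixed seed, only for reproducibility
aps = drop_uniform(60, rng)      # example: M = 60 APs
ues = drop_uniform(10, rng)      # example: K = 10 UEs
d = [[wraparound_distance(ap, ue) for ue in ues] for ap in aps]
print(len(d), len(d[0]))
```

Two nodes near opposite corners are therefore treated as close neighbours, which removes the cell-edge effect.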

Large-scale fading model
Considering the model used in (2), we focus on the large-scale fading coefficient $\beta_{mk}$, which models the path-loss and the shadow fading as follows

$$\beta_{mk} = \mathrm{PL}_{mk} \cdot 10^{\frac{\sigma_{\mathrm{sh}} z_{mk}}{10}},$$

where $\mathrm{PL}_{mk}$ stands for the path-loss and $10^{\frac{\sigma_{\mathrm{sh}} z_{mk}}{10}}$ describes the shadow fading with standard deviation $\sigma_{\mathrm{sh}}$ and $z_{mk} \sim \mathcal{N}(0, 1)$, that is, we assume uncorrelated shadow fading.
In the path-loss model, $f$ is the carrier frequency (in MHz), $h_{\mathrm{AP}}$ (in m) is the height of the AP antenna, and $h_{\mathrm{UE}}$ (in m) is the height of the UE antenna. Note that when $d_{mk} \leq d_1$, there is no shadowing, that is, the transmitters and receivers are not surrounded by common obstacles.
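The following Python sketch shows one way to generate $\beta_{mk}$. It assumes the three-slope path-loss model with a Hata-COST231 constant that is widely used in the cell-free literature, which may differ in detail from the model adopted here. The breakpoints $d_0 = 10$ m and $d_1 = 50$ m, the 8 dB shadowing standard deviation, the carrier frequency and the antenna heights are illustrative assumptions, not values taken from Table 2, and all function names are hypothetical.

```python
import math
import random


def hata_cost231_constant_db(f_mhz, h_ap_m, h_ue_m):
    """Distance-independent loss L (dB) of the assumed Hata-COST231 model."""
    lf = math.log10(f_mhz)
    return (46.3 + 33.9 * lf - 13.82 * math.log10(h_ap_m)
            - (1.1 * lf - 0.7) * h_ue_m + (1.56 * lf - 0.8))


def path_loss_db(d_km, L_db, d0_km=0.01, d1_km=0.05):
    """Assumed three-slope path-loss (dB); distances are in kilometres."""
    if d_km > d1_km:
        return -L_db - 35.0 * math.log10(d_km)
    if d_km > d0_km:
        return -L_db - 15.0 * math.log10(d1_km) - 20.0 * math.log10(d_km)
    return -L_db - 15.0 * math.log10(d1_km) - 20.0 * math.log10(d0_km)


def large_scale_fading(d_km, L_db, sigma_sh_db=8.0, rng=random, d1_km=0.05):
    """beta = PL * 10^(sigma_sh * z / 10), with no shadowing when d <= d1."""
    z = rng.gauss(0.0, 1.0) if d_km > d1_km else 0.0
    pl_db = path_loss_db(d_km, L_db, d1_km=d1_km)
    return 10.0 ** ((pl_db + sigma_sh_db * z) / 10.0)


# Illustrative parameters (assumptions): 1.9 GHz carrier, 15 m AP, 1.65 m UE.
L_db = hata_cost231_constant_db(1900.0, 15.0, 1.65)
beta = large_scale_fading(0.3, L_db, rng=random.Random(1))
print(round(L_db, 1), beta)
```

The three slopes are continuous at both breakpoints, and the shadowing term is switched off below $d_1$ in line with the remark above.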

Parameters and setup
In all examples, the overall simulation parameters are summarized in Table 2. The parameters $(\bar{\rho}_{\mathrm{ul,p}}, \bar{\rho}_{\mathrm{ul,d}}, \bar{\rho}_{\mathrm{dl,p}}, \bar{\rho}_{\mathrm{dl,d}})$ are, respectively, the transmit powers of the UL pilot, UL data, DL pilot, and DL data. The corresponding normalized SNRs $(\rho_{\mathrm{ul,p}}, \rho_{\mathrm{ul,d}}, \rho_{\mathrm{dl,p}}, \rho_{\mathrm{dl,d}})$, defined in Section 3, are computed by dividing the aforementioned radiated powers by the noise power, which is given by

$$\text{noise power} = B_{\mathrm{spect}} \times k_B \times T_0 \times \text{noise figure} \quad (\mathrm{W}),$$

where $B_{\mathrm{spect}}$ is the spectral bandwidth (Hz), $k_B = 1.381 \times 10^{-23}$ (Joule per Kelvin) is the Boltzmann constant, and $T_0 = 290$ (Kelvin) is the noise temperature. The noise figures in UL and DL are both equal to 9 dB, and $\tau_c = 200$ symbols, corresponding to a coherence bandwidth $B_c = 200$ kHz and a coherence time $T_c = 1$ ms. In what follows, $B_{\mathrm{spect}}$ is set to 20 MHz and the antenna gain to 0 dBi. Herein, we assume that $K \ll \tau_c$ and $\tau_{\mathrm{ul,p}}$ $\tau_{\mathrm{dl,p}} \geq K$. In addition, the channel estimation in UL and in DL takes place within the same coherence interval.
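As a quick numerical check of these quantities, the sketch below evaluates the noise power and the coherence interval length in Python. The function names and the 200 mW transmit power used in the last lines are hypothetical and only serve to show how a normalized SNR is obtained; they are not values taken from Table 2.

```python
import math

K_BOLTZMANN = 1.381e-23  # J/K, value quoted in the text


def noise_power_w(bandwidth_hz, noise_figure_db, temperature_k=290.0):
    """Noise power in watts: B x k_B x T_0 x noise figure (linear scale)."""
    return bandwidth_hz * K_BOLTZMANN * temperature_k * 10.0 ** (noise_figure_db / 10.0)


def normalized_snr(tx_power_w, noise_w):
    """Normalized SNR: radiated power divided by the noise power."""
    return tx_power_w / noise_w


noise_w = noise_power_w(20e6, 9.0)                # B_spect = 20 MHz, NF = 9 dB
noise_dbm = 10.0 * math.log10(noise_w / 1e-3)

# Coherence interval length: coherence bandwidth x coherence time.
tau_c = int(round(200e3 * 1e-3))                  # 200 kHz x 1 ms

# Hypothetical transmit power of 200 mW, for illustration only.
rho_example = normalized_snr(0.2, noise_w)
print(noise_dbm, tau_c, rho_example)
```

With the parameters of this section, the noise power is close to -92 dBm and the coherence interval contains 200 symbols.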
To have a fair comparison, we define the DL per-user net throughput $T_k$ (bit/s) [4], which is directly proportional to the DL per-user rate $R_k$ (bit/s/Hz), as follows

$$T_k = \xi_{\mathrm{DL}} \, B_{\mathrm{spect}} \left(1 - \frac{\tau_{\mathrm{po}}}{\tau_c}\right) R_k,$$

where $B_{\mathrm{spect}}$ denotes the spectral bandwidth and $\tau_{\mathrm{po}}$ is the pilot overhead, that is, the number of symbols spent per coherence interval in the UL and DL training phases. $T_k$ accounts only for the fraction of symbols in each coherence interval that are dedicated to the transmission of payload data, $\left(1 - \frac{\tau_{\mathrm{po}}}{\tau_c}\right)$, that is, we subtract the resources used for UL and DL pilot transmission. Besides, since the remaining symbols are shared between UL and DL data, the net spectral efficiency must be multiplied by a factor $\xi_{\mathrm{DL}}$ in order to consider only the fraction of useful symbols per coherence interval utilized for DL. For instance, for a symmetric TDD frame ($\tau_{\mathrm{ul,d}} = \tau_{\mathrm{dl,d}}$), that is, $\xi_{\mathrm{DL}} = \frac{1}{2}$, the per-user throughput becomes $T_k = \frac{B_{\mathrm{spect}}}{2} \left(1 - \frac{\tau_{\mathrm{po}}}{\tau_c}\right) R_k$.
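The net throughput computation can be illustrated with a small Python sketch. It assumes the form $T_k = \xi_{\mathrm{DL}} B_{\mathrm{spect}} (1 - \tau_{\mathrm{po}}/\tau_c) R_k$ with the pilot overhead taken as the sum of the UL and DL pilot lengths; the function name, the rate of 2 bit/s/Hz, and the choice of three DL pilots are hypothetical. The rate is held fixed across the two cases only to isolate the effect of the overhead, whereas in practice $R_k$ also changes with the pilot assignment.

```python
def net_throughput_bps(rate_bps_hz, tau_ul_p, tau_dl_p,
                       tau_c=200, bandwidth_hz=20e6, xi_dl=0.5):
    """Per-user DL net throughput (bit/s) for a given DL rate (bit/s/Hz)."""
    tau_po = tau_ul_p + tau_dl_p          # symbols spent on UL and DL training
    return xi_dl * bandwidth_hz * (1.0 - tau_po / tau_c) * rate_bps_hz


K = 10
rate = 2.0  # hypothetical DL rate in bit/s/Hz

# Orthogonal DL pilots: K UL pilots plus K DL pilots.
t_orth = net_throughput_bps(rate, tau_ul_p=K, tau_dl_p=K)

# Pilot reuse with 3 DL pilots (hypothetical colouring outcome).
t_reuse = net_throughput_bps(rate, tau_ul_p=K, tau_dl_p=3)

print(t_orth, t_reuse)
```

For the same rate, shortening the DL pilot from 10 to 3 symbols raises the payload fraction from 0.9 to 0.935, which is the overhead gain that pilot reuse targets.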

Results and discussions
In what follows, we deal with an example to highlight the advantage of our proposed solution compared to the state of the art. On the one hand, for UL pilot allocation, we use only orthogonal pilot assignment. On the other hand, for DL pilot allocation, our study is based on the following methods:
1. Statistical CSI (sCSI): no DL training is implemented and each UE uses the sCSI to decode its intended data. We allocate to each UE an orthogonal UL pilot. So, $\tau_{\mathrm{ul,p}} = K$, $\tau_{\mathrm{dl,p}} = 0$.
2. Instantaneous CSI (iCSI) with orthogonal DL pilots: we allocate to each UE an orthogonal UL pilot, and each UE receives an orthogonal DL pilot in order for the DL channel gain to be estimated. Hence, $\tau_{\mathrm{ul,p}} = \tau_{\mathrm{dl,p}} = K$.
3. Instantaneous CSI (iCSI) with DL random pilot assignment: we allocate to each UE an orthogonal UL pilot, and each UE receives a DL pilot chosen randomly from a collection of orthogonal sequences of length $\tau_{\mathrm{dl,p}}^{\mathrm{rand}}$. In order to compare the random pilot assignment and the colouring pilot assignment, whose pilot length is $\tau_{\mathrm{dl,p}}^{\mathrm{colour}}$, under the same conditions, we set $\tau_{\mathrm{dl,p}}^{\mathrm{rand}} = \tau_{\mathrm{dl,p}}^{\mathrm{colour}}$. With the random pilot assignment, two UEs may receive the same pilot by chance.
4. Instantaneous CSI (iCSI) with colouring pilot assignment: we allocate to each UE an orthogonal UL pilot, and each UE receives a DL pilot following the colouring algorithm GC-PA-CF-MMIMO. In particular, in line with (27), two UEs for which $\theta_{kk'} \leq \theta_{\mathrm{th}}$ can share the same DL pilot. On the contrary, two UEs for which $\theta_{kk'} > \theta_{\mathrm{th}}$ are not allowed to share the same DL pilot, as sharing would increase the interference between them.

FIGURE 5
The IGS algorithm used to determine $\theta_{\mathrm{th}}$, the near-optimal threshold value, with the system parameters $M = 60$, $K = 10$, and the IGS algorithm parameters $N = 20$ and $T = 1$: (a) for channel setup 1, and (b) for channel setup 2. Each of the channel setups 1 and 2 is generated from one realization of the following random variables: $z_{mk} \sim \mathcal{N}(0, 1)$ and $d_{mk}$, $m = 1, \dots, M$, $k = 1, \dots, K$. The distance $d_{mk}$ between the $k$th UE and the $m$th AP is generated by deploying the UEs and the APs at random positions within a square of side length 1000 m (as described in Table 2, the positions of the APs and the UEs follow a uniform random distribution), and $h_{mk}$, $m = 1, \dots, M$, $k = 1, \dots, K$, are independent and identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ RVs.
FIGURE 6
The average min-user DL net throughput against the number of APs, with $K = 10$ and $K = 20$

Moreover, to fix the power control coefficients $\eta_{mk}$ at the $m$th AP for the $k$th UE, we utilize an equal (uniform) power control policy with $\eta_{mk} = \left(\sum_{k'=1}^{K} \gamma_{mk'}\right)^{-1}$, $\forall k = 1, \dots, K$, $m = 1, \dots, M$, satisfying the equality condition in (10), that is, each AP uses full power. This technique, denoted by channel-dependent full power transmission (CD-FPT), is not optimal. However, it can be deployed in a distributed manner, and it reduces the computational complexity as it does not require solving an optimization problem. Figure 5a,b illustrate, for two channel setups respectively, how the IGS algorithm seeks the near-optimal threshold $\theta_{\mathrm{th}}$ with $M = 60$, $K = 10$, $N = 20$, and $T = 1$. The X-axis stands for the normalized threshold interval, and the Y-axis stands for the min-user DL net throughput of the proposed GC-PA-CF-MMIMO over the threshold grid set. As we are limited to $T = 1$, the near-optimal threshold $\theta_{\mathrm{th}} = \theta_{\max}^{(1)}$ can be determined once the first iteration is performed, by taking the maximum of the min-user DL net throughput of the proposed GC-PA-CF-MMIMO. We underline that the selection of $\theta_{\mathrm{th}}$ has an impact on the min-user DL net throughput of the proposed GC-PA-CF-MMIMO. We distinguish the two following cases: (i) when $\theta_{\mathrm{th}} = \theta_{\min}$, the proposed scheme is equivalent to an orthogonal pilot assignment scheme, for which we need $K$ orthogonal pilots; (ii) when $\theta_{\mathrm{th}} = \theta_{\max}^{(1)}$, we reuse pilots, using the minimum number of DL pilots and reducing the effect of DL PC in the DL pilot assignment, as (28) reveals. On the one hand, as a consequence, the pilot sequence length decreases compared to case (i). This increases the min-user DL net throughput.
On the other hand, the reduction of the pilot sequence length comes at the expense of a larger number of UEs sharing identical pilot sequences, hence an increase in the DL channel estimation error, which decreases the min-user DL net throughput. Thus, there is a trade-off on the pilot sequence length, which depends on the considered channel setup. Figure 6 represents the average min-user DL net throughput versus $M$ with $K = 10$ and $K = 20$. The performance, in terms of min-user DL net throughput, of the benchmark schemes and of the proposed scheme improves continually as the number of APs $M$ grows, thanks to the diversity gain. In particular, whatever the value of $M$, the proposed scheme outperforms the aforementioned prior art by about 1 Mbit/s. The curve of the iCSI with random pilot assignment is lower than that of the sCSI: with random DL pilot assignment, the DL pilots may be reused by UEs in conflict, which aggravates DL pilot contamination, so that $R_k^{\mathrm{iCSI,CF}}$ is diminished compared to $R_k^{\mathrm{sCSI,CF}}$. Using DL pilots reduces the channel gain uncertainty compared to sCSI, as reflected in the denominator of $R_k^{\mathrm{iCSI,CF}}$, but the problem of DL pilot contamination remains considerable. We emphasize that the DL orthogonal pilot assignment yields an additional gain compared to the DL random pilot assignment, in accordance with the UL case in [24], and that a further gain can be obtained with the proposed colouring algorithm, as it removes the potential IUI while reducing the number of DL training pilots via the colouring of the conflict graph. The proposed scheme thus also gives an additional gain compared to the DL random pilot assignment. For the sake of a fair comparison between the random pilot assignment and the proposed scheme, we consider $\tau_{\mathrm{dl,p}}^{\mathrm{rand}} = \tau_{\mathrm{dl,p}}^{\mathrm{colour}}$, where $\tau_{\mathrm{dl,p}}^{\mathrm{colour}}$ is equal to the minimum number of DL pilots, as (28) reveals.
In this way, we investigate only the effect of the pilot allocation strategy. The random pilot assignment policy, where every pilot is chosen randomly from a set of orthogonal sequences with a pilot sequence length $\tau_{\mathrm{dl,p}}^{\mathrm{rand}}$, is independent of the conflicts between UEs; therefore, UEs in close proximity may reuse the same DL pilot, which increases DL PC. The DL orthogonal pilot assignment performs better than the sCSI scheme. As the channel hardening in CF-MMIMO is not sufficiently pronounced, UEs use the beamformed DL pilots in order to estimate the DL channel accurately [18]. In addition, when the number of users $K$ increases, the average min-user DL net throughput of our proposed method still outperforms that of the aforementioned prior works. Figure 7 represents the average DL pilot overhead ratio, defined by $\tau_{\mathrm{dl,p}} / K$, against the number of UEs $K$ when $M = 60$. In other terms, the DL pilot overhead ratio quantifies the DL pilot size reduction achieved by the proposed scheme in comparison to the orthogonal DL pilot assignment scheme. Note that the number of orthogonal pilots used by the orthogonal pilot assignment is equal to the number of UEs, $K$. The DL overhead ratio of the orthogonal pilot assignment is thus constant and equal to 1, as $K$ orthogonal pilots are used for $K$ UEs. Similarly, the DL overhead ratio of sCSI is constant and equal to 0, as no DL pilots are used. For the sake of comparison, we consider $\tau_{\mathrm{dl,p}}^{\mathrm{rand}} = \tau_{\mathrm{dl,p}}^{\mathrm{colour}}$, where $\tau_{\mathrm{dl,p}}^{\mathrm{colour}}$ is equal to the minimum number of DL pilots, as (28) reveals. As a consequence, whatever the value of $K$, the number of DL pilots used for the random pilot assignment is equal to the number of DL pilots used for the proposed scheme.
Furthermore, we highlight that the average DL overhead ratio of the proposed scheme is smaller than that of the orthogonal pilot assignment, as in the proposed scheme we reuse DL pilots so that we reach the minimum DL pilot size, as (28) reveals. In addition, the DL overhead ratio of the proposed scheme decreases as the number of UEs grows. For the proposed scheme, when $K$ varies between 10 and 30, the DL pilot overhead ratio decreases from almost 0.25 to 0.1, and when $K$ varies between 30 and 50, the DL pilot overhead ratio is almost constant and equal to 0.1. The more UEs we have to serve, the more beamformed DL pilots are needed to mitigate the DL PC.

Complexity analysis
According to the proposed algorithm in Table 1, the complexity of the proposed GC-PA-CF-MMIMO scheme, based on selecting a threshold $\theta_{\mathrm{th}}$ through the IGS, can be computed step by step as follows:
- Step 1: The construction of the interference conflict graph requires $\mathcal{O}(K)$;
- Step 2: The sorting of the vertices requires $\mathcal{O}(K)$;
- Step 3/4: The colouring process requires, in the worst case, $\mathcal{O}(K^2)$.
Thus, the complexity required to run the whole proposed algorithm is $\mathcal{O}(K^2)$. In addition, according to the IGS procedure, the proposed algorithm is performed $NT$ times to select the near-optimal threshold $\theta_{\mathrm{th}}$. As a consequence, the total complexity is $\mathcal{O}(NTK^2)$, which is acceptable for a powerful CPU.

CONCLUSIONS
We put forward a graph-colouring based approach to reduce DL PC and to decrease the DL pilot overhead incurred by the DL pilots required for DL channel estimation in CF-MMIMO systems. The approach relaxes the orthogonality property of the pilots, which is what generates the DL PC effect. First, considering this effect, we selected an appropriate threshold. Then, in order to model the potential interference relationship among all UEs, we employed an interference or conflict graph. After that, an algorithm, GC-PA-CF-MMIMO, was proposed to assign DL pilots to the UEs while eliminating the predominant interference captured by the conflict graph. Precisely, this algorithm allows DL pilots to be reused, which makes them fewer and therefore shorter, while mitigating DL PC. Consequently, with the proposed algorithm, the min-user DL net throughput in CF-MMIMO is increased compared to the conventional pilot assignment techniques. We also observe that the DL pilot overhead ratio of the proposed scheme is reduced compared to that of the DL orthogonal pilot assignment.