Deployment of Clustered-Based Small Cells in Interference-Limited Dense Scenarios: Analysis, Design, and Trade-Offs

Network densification is one of the most promising solutions to address the high data rate demands in 5G and beyond (B5G) wireless networks while ensuring an overall adequate quality of service. In this scenario, most users experience significant interference levels from neighbouring mobile stations (MSs) and access points (APs) making the use of advanced interference management techniques mandatory. Clustered interference alignment (IA) has been widely proposed to manage the interference in densely deployed scenarios with a large number of users. Nonetheless, the setups considered in previous works are still far from the densification levels envisaged for 5G/B5G networks that are considered in this paper. Moreover, prior designs of clustered-IA systems relied on oversimplified channel models and/or enforced single-stream transmission. In this paper, we explore an ultradense deployment of small cells (SCs) to provide coverage in 5G/B5G wireless networks. A novel cluster design based on a size-restricted k-means algorithm to divide the SCs into different clusters is proposed taking into account path loss and shadowing effects, thus providing a more realistic solution than those available in the current literature. Unlike previous works, this clustering method can also cater for spatial multiplexing scenarios. Also, several design parameters such as the number of transmit antennas, multiplexed data streams, and deployed APs are analyzed in order to identify trade-offs between performance and complexity. The relationship between density of network elements per area unit and performance is investigated, thus allowing to illustrate that there is an optimal coverage area value over which the network resources should be distributed. 
Moreover, it is shown that the spectral-efficiency degradation due to the intercluster interference in ultradense networks (UDNs) points to the need of designing an interference management algorithm that accounts for both intracluster and intercluster interferences. Simulation results provide key insights for the deployment of small cells in interference-limited dense scenarios.


Introduction
5G and beyond (B5G) wireless communication networks will strive for a capacity and coverage that are beyond the capabilities of current standards. Therefore, new radio access techniques and deployment strategies are being implemented by several standard development organizations and research groups to meet the demanding requirements of key performance indicators (KPIs) related to spectral efficiency (SE), delay, or outage probability, to name a few. Network densification, achieved by deploying a large number of small cells (SCs) throughout the coverage area, is one of the most promising solutions to boost the performance of wireless networks. SCs are based on the use of low-power and reduced-size wireless access points (APs) that provide short-range connectivity to nearby mobile stations (MSs). It has been shown that a dense deployment of SCs has the potential to increase capacity, extend network coverage, and improve spectrum utilization with reduced power consumption [1]. However, several technical challenges need to be addressed to achieve the promised performance gains in an ultradense network (UDN) setting, including the efficient management of high levels of interference, the implementation of reliable and cost-effective backhaul links, and the design of efficient protocols for the management of initial access, mobility, and handover [1,2]. In the following, we concentrate on the first of these challenges, interference management, notwithstanding the importance of the remaining ones.
Advanced interference management techniques are required to combat the effects caused by the multiple concurrent transmissions taking place in a reduced area when considering a UDN scenario. Interference alignment (IA) has gained a lot of attention as one of the most promising interference management techniques for SCs given its capability to achieve optimal degrees of freedom (DoFs) [3,4]. IA has been applied in several scenarios such as cognitive radio (CR), device-to-device (D2D), and cellular and heterogeneous networks (HetNets) [5,6]. Although a closed-form expression to compute the precoder and decoder matrices according to the IA strategy is very difficult to be obtained in most scenarios, several linear iterative algorithms allow a practical implementation that exploits the availability of multiple antennas both at the transmitter and the receiver sides [7][8][9]. The main idea behind linear IA algorithms is to iteratively design the precoder and decoder matrices of all network users so that, for each user, the received interference signals are projected into certain subspaces orthogonal to the decoder matrix at the receiver side.
In a recent previous work [10], a performance comparison of different IA algorithms reported in scientific literature, such as minimum interference leakage (MIN-IL) [7], Gauss-Newton interference leakage (GN-IL) [9], maximum signal to interference and noise ratio (MAX-SINR) [7], and sum mean squared error (SUM-MSE) [8], is addressed. It is numerically shown that MAX-SINR achieves a good performance for low and medium signal to noise ratio (SNR) values, which is the scenario most often found in UDN settings.
The number of noninterfering data streams that can be simultaneously transmitted by using IA algorithms over an interference channel is directly related to the number of users in the network and the number of transmit and receive antennas [11]. An IA solution is only feasible when all the interference signals are cancelled and the desired data is properly recovered. Therefore, designing the precoder and decoder matrices of all the APs and MSs in the entire network by solving a single system-wide IA problem is not possible in practical densely deployed scenarios with a large number of SCs. In this context, the IA feasibility conditions are difficult to satisfy given the large number of antennas required at each AP/MS to align all the interference signals. In addition, a large amount of channel state information (CSI) must be fed back to compute a global IA solution, and this overhead decreases the efficiency of the network. To overcome these problems, several authors have proposed dividing the network into several clusters of reduced size, based on the fact that the strength of the received signals depends on propagation characteristics such as large-scale fading [12][13][14][15][16][17][18][19]. Towards this end, users are grouped in clusters such that the stronger interference links are captured in the same cluster, thus potentially generating a large amount of (intracluster) interference. These interfering transmissions can be efficiently suppressed by applying IA independently in each cluster. In contrast, the intercluster interference caused by geographically distant users is typically kept weak so that it can be safely ignored. Therefore, a proper cluster design strategy is essential to increase the capacity of the network.
The idea of clustered IA is introduced in [12] to evaluate the applicability of IA in the emerging cellular networks with a high population of users. With this aim, the average throughput gains provided by clustered IA over the traditional noncooperating scheme with frequency reuse are evaluated considering a scenario made of 37 hexagonal cells each with 1 km of radius and 7 transmitter/receiver pairs per cell, a numerology still far from the densification levels to be found in 5G/B5G networks. In a later work [13], the same authors also evaluate the performance gains in terms of outage probability for large ad hoc networks. Nonetheless, the cluster design strategy was not clearly described in these seminal papers. In order to fill this gap, different cluster design algorithms have been developed in [14][15][16][17][18][19].
In [14], the authors propose a novel clusterization method based on graph partitioning for a multiuser interference network. An adaptation of the MAX-SINR algorithm based on diagonal loading (MAX-SINR-DL) is also described to consider both intracluster and intercluster interferences in the IA formulation, leading to improved rate performance. Finally, a soft-clustering design relaxing the IA feasibility constraint is proposed achieving higher performance for a high SNR regime. Numerical results are provided for a maximum of 15 transmitter/receiver pairs uniformly distributed over a 10 × 10 km square area. A different clustering strategy, based on the hedonic coalition formation game, is developed in [15] for large-scale cellular networks. Nevertheless, the cluster design in these research works assumes the use of single data stream transmission, thus limiting its applicability to scenarios without spatial multiplexing. Furthermore, recent papers have applied resource allocation with clustered IA to also reduce the intercluster interference and hence to boost the achievable SE in UDNs [16,17]. However, the aforementioned research works propose cluster design algorithms that are based on oversimplified channel models that only take into account distance-dependent path loss propagation losses and do not consider the shadowing effect.
Clustered IA has also been exploited for D2D communications in [18,19], where three grouping schemes based on fuzzy C-Means are proposed. In these papers, however, the precoder/decoder pairs are computed based on IA with symbol extensions, which is not a practical design criterion because it requires exponentially long symbol extensions as the number of users grows and does not work properly under constant or slowly varying channels [3,5,14]. In this context, a partially connected IA is proposed in [6] for cellular and D2D communication networks to select the proper interference links to be aligned with IA while the residual weak interference, considered as noise, is managed by a power optimization method. However, a clustering algorithm is not considered, and the proposed method only applies IA to the most powerful interference links that can be dealt with using the available spatial dimensions. Therefore, this algorithm is not viable for UDNs with few antennas at each AP since the level of interference that cannot be handled will severely impair the performance of the network.
1.1. Contribution. According to the works described above, the relevance of the study of IA in a UDN context with 5G/B5G numerology and the design of new efficient radio access strategies becomes evident. In this paper, the capacity of SCs to provide coverage in 5G/B5G wireless networks is explored under the umbrella of the IA paradigm. A fully distributed topology as in [20][21][22] is considered, where there are no macro base stations and a large number of APs act as small base stations to provide communication services to a much smaller number of MSs. The system is modeled as a downlink K-user interference channel network, where each MS receives data from a single AP [6,7]. Moreover, each MS and its corresponding AP are considered an SC as in [20][21][22]. The transmission within an SC causes interference to the remaining users. We note that the network architecture is somewhat similar to that considered in the cell-free related literature (see [20][21][22]); however, the studied setup only requires for APs/MSs within a cluster to share their CSI whereas cell-free study requires the distribution of all users' data to all APs. Furthermore, we consider multiple antennas at the transmitter and receiver, thus enabling the incorporation of precoding/combining techniques at both sides to increase the achievable SE of the network. According to the deployment scenario considered in this paper, the proposed solution is divided into two general stages. In the first one, each MS is associated with the AP that exhibits the smallest large-scale channel loss. Then, in the second stage, a clustered-IA method is applied to cancel the interference signals. An IA iterative algorithm is implemented due to its optimality in terms of the number of DoFs. However, since in most practical scenarios the concurrent transmission links outnumber the amount of transmit and receive antennas, the fully connected IA feasibility conditions are not satisfied. 
Thus, a clusterization method is applied to divide the AP/MS pairs into different clusters taking into account similarities in propagation losses rather than proximity. The aim is to ensure that links causing the strongest interference levels belong to the same cluster and thus can be subsequently cancelled. Interference is only taken into account in the clustered-IA stage because considering the SINR to perform the AP/MS pair assignment and then to form the clusters would involve an iterative process with prohibitive complexity. Note that, in practical applications, the use of the SNR to decide the AP/MS pair is a common practice in the literature [20][21][22].
A novel cluster design is proposed based on the k-means algorithm since it is one of the most widely employed methods due to its speed, simplicity, and performance. The k-means algorithm groups a set of objects into different clusters such that the distance of each object to the centroid of its cluster is minimized [23]. In this paper, the set of objects is defined by the large-scale propagation losses of the channel matrix, including path loss and correlated shadowing, which guarantees that the users potentially causing the highest levels of interference to each other are located in the same cluster. Therefore, the proposed cluster design provides, to the best of our knowledge, a more realistic solution than those available in the literature that only consider the distance-dependent path loss. In the conventional k-means algorithm, a restriction on the number of elements inside a cluster is not considered in the formulation. Consequently, the IA feasibility conditions may not be satisfied in some clusters due to the limited number of available transmit and receive antennas. To overcome this issue, we design a size-restricted k-means algorithm to constrain the maximum number of AP/MS pairs inside the same cluster. The proposed constrained k-means algorithm is less complex than those described in [24,25].
A performance comparison is carried out between a network of SCs using singular value decomposition- (SVD-) based precoding/combining schemes and a network of SCs using IA algorithms based on either the traditional MAX-SINR approach or the MAX-SINR-DL strategy proposed in [14] to illustrate the advantage of using interference-targeting schemes that account for both intracluster and intercluster interferences. Note that SVD is the optimal linear beamforming strategy for a MIMO point-to-point transmission [26] and it is widely used in such circumstances (e.g., Wi-Fi standard IEEE 802.11ac/ax). However, its performance is severely degraded as infrastructure and user deployment becomes denser. It is therefore interesting to evaluate the relative merits of IA-based processing and SVD beamforming under different levels of densification, taking into account both performance and complexity. Furthermore, and in contrast to [12][13][14][15], we consider multiple data stream transmission based on the results in [11] for a linear iterative IA algorithm. The impact of the cluster size on the average achievable rate of the network is fully assessed, as well as that of the number of antennas at each communication end and of the network load. Also, the effect that the AP and/or MS density might have on the system performance is investigated, thus illustrating the relationship between network elements per area unit and performance. Numerical results show that there is an optimal coverage area value over which the network resources should be distributed.

Paper Organization. This paper is organized as follows. The system model and problem formulation are presented in Section 2. In Section 3, the clustered IA solution to cancel intracluster interference is fully described. The proposed cluster design is detailed in Section 4. Numerical results that serve to validate the proposed SC deployment are presented in Section 5. Finally, Section 6 concludes the paper and provides hints for further research.

Notation. $E[\cdot]$ stands for expectation. $\min(a, b)$ denotes the smallest element between $a$ and $b$, and $\lceil a \rceil$ indicates the smallest integer larger than or equal to $a$. $I_d$ is the $d \times d$ identity matrix.

System Model
Let us consider a network of densely deployed SCs similar to the one presented in [20][21][22], which can be considered illustrative of the scenarios to be found in B5G networks. In particular, we assume a network comprised of $M$ APs, each equipped with $N_t$ antennas, that simultaneously provide service, on the same time-frequency resource, to $K$ MSs equipped with $N_r$ antennas. Similar to [20][21][22], a scenario with a larger number of APs than MSs is considered, as illustrated in Figure 1. In addition, each AP is assumed to be connected to a base station controller (BSC) by means of a fronthaul link. Although unlimited capacity and error-free fronthaul links are considered in this paper, the use of finite-capacity fronthaul links will be considered in future works similar to what was done in [27].

Channel Model.
The classical channel propagation model considered in previous papers on clustered IA [14][15][16][17][18][19] oversimplifies the real radio environment. To deal with this issue, we assume the 3GPP Urban Microcell model described in [28], which takes into account the small-scale fading, the large-scale propagation losses (i.e., path loss and shadowing), and the possibility of the channel toggling between non-line-of-sight (NLOS) and line-of-sight (LOS) propagation conditions. The MIMO channel matrix describing the propagation between the $i$th AP and the $k$th MS is denoted by $H^{[k,i]} \in \mathbb{C}^{N_r \times N_t}$, with $i \in \{1, 2, \dots, M\}$ and $k \in \{1, 2, \dots, K\}$. It is modeled in terms of the scalar $\beta^{[k,i]}$, which represents the large-scale propagation losses, and the matrix $G^{[k,i]} \in \mathbb{C}^{N_r \times N_t}$ of small-scale fading coefficients. The large-scale fading is further decomposed as $\beta^{[k,i]} = \zeta^{[k,i]} \chi^{[k,i]}$, with $\chi^{[k,i]}$ corresponding to the shadowing component and $\zeta^{[k,i]}$ representing the distance-dependent path loss defined in [28]. The latter depends on $\zeta_0$, the path loss at a reference distance of 1 m, on the path loss exponent $\alpha$, and on $r^{[k,i]}$, the three-dimensional (3D) distance between the $i$th AP and the $k$th MS, which also takes into account the height of each communication node (AP or MS). Regarding the shadowing, $\chi^{[k,i]}$ follows the model described in ([20], (54)-(55)), and it is modelled as a spatially correlated log-normal random variable with variance $\sigma_\chi^2$. The parameters $\zeta_0$, $\alpha$, and $\sigma_\chi^2$ take different values depending on whether the link is in LOS or NLOS conditions. In particular, the channel between the $i$th AP and the $k$th MS is in LOS with a distance-dependent probability given in [29], which is parameterized by a reference distance $r_0$.

LOS propagation links are characterized by a Ricean K-factor $K^{[k,i]}$ such that $10 \log_{10}(K^{[k,i]}) \sim \mathcal{N}(\mu_K, \sigma_K^2)$. For NLOS links, it holds that $K^{[k,i]} = 0$. Finally, the small-scale fading terms in $G^{[k,i]}$ consist of independent and identically distributed (i.i.d.) complex Gaussian random variables distributed as $\mathcal{CN}(0, 1)$. The channel coefficients $H^{[k,i]}$ are assumed to be static throughout the coherence interval and then change independently (i.e., block fading). The channel model assumed in this paper is a more realistic representation than those proposed in previous works in the scientific literature since it considers both the path loss and correlated shadowing effects. We note at this point that the inclusion of spatial correlation effects of the antenna array is left for future work.

AP Selection. Similar to the SC setup introduced in [20][21][22], we assume that each AP can serve only one MS, and hence, an AP selection procedure is required. Each AP is randomly selected, on a per-MS basis, such that each MS is served only by the available AP with the smallest channel loss $\beta^{[k,i]}$. Once an AP is selected, it becomes unavailable, and when all MSs have been paired with an AP, the idle APs are turned into a sleep mode to save energy. We consider that each MS and its corresponding AP form a reduced SC composed of a single transmitter/receiver pair. However, to simplify the notation and based on the terminology used in [20][21][22], each AP/MS pair will be referred to as an SC instance.
Each selected AP is intended to transmit d data streams to its corresponding MS, with the transmission of the kth AP/MS pair causing interference to the other K − 1 MSs. Since we assume a densely deployed network with a high number of SCs, a clustered IA strategy is implemented to cancel the interference signals. A limited amount of AP/MS pairs showing the highest levels of mutual interference are grouped into the same cluster in such a way that the intercluster interference is as weak as possible. This strategy is reminiscent of the one introduced in [30] where users clustered together are assigned orthogonal pilot sequences, thus leading to a minimization of the intracluster interference. In our IA-based setup, intracluster interference is completely eliminated using a linear iterative IA algorithm whereas the intercluster interference is expected to be relatively small since it comes from geographically distant users.
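The greedy AP selection described in this section can be illustrated with a minimal sketch. It assumes that the MSs are visited in a random order and that a matrix `beta[k, i]` of large-scale losses (smaller is better) is available; the function name `select_aps` and the numerical values are hypothetical and only serve the illustration.

```python
import numpy as np

def select_aps(beta, rng):
    """Pair every MS with the available AP of smallest large-scale loss.

    beta[k, i] is the loss between MS k and AP i (hypothetical input).
    Returns serving[k], the index of the AP paired with MS k.
    """
    num_ms, num_ap = beta.shape
    if num_ap < num_ms:
        raise ValueError("need at least as many APs as MSs")
    available = np.ones(num_ap, dtype=bool)
    serving = np.full(num_ms, -1, dtype=int)
    # Assumption: the MSs are processed in a random order.
    for k in rng.permutation(num_ms):
        masked = np.where(available, beta[k], np.inf)  # hide APs already taken
        best = int(np.argmin(masked))
        serving[k] = best
        available[best] = False  # each AP serves a single MS
    return serving

rng = np.random.default_rng(0)
beta = rng.uniform(60.0, 120.0, size=(4, 10))  # illustrative losses, K = 4, M = 10
serving = select_aps(beta, rng)
idle_aps = np.setdiff1d(np.arange(beta.shape[1]), serving)  # candidates for sleep mode
```

The idle APs returned in `idle_aps` correspond to those that would be switched to sleep mode once every MS has been paired.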
As is typically postulated in most state-of-the-art wireless networks, downlink and uplink transmissions are organized in a time division duplex (TDD) operation assuming the reciprocity of the propagation channel, so that the reciprocal channel between the $k$th MS and the $i$th AP is the conjugate transpose of $H^{[k,i]}$ [7,31]. Global information about large-scale fading between all AP-to-MS links (i.e., $\beta^{[k,i]}, \forall k, i$) is required for the clustering stage. Similar to [14], this large-scale information is obtained at each AP by measuring the averaged signal power of the pilot signals transmitted by each MS in the uplink training phase. Then, this information is sent to the BSC to perform the cluster design. As is typically assumed in wireless communications, large-scale fading varies at a much slower pace than small-scale fading and, hence, remains virtually constant for many coherence periods [20,32]. Therefore, the cluster design only needs to be updated when the large-scale fading conditions change, thus reducing signal overhead and complexity. Finally, the signal overhead for instantaneous CSI estimation is limited to AP/MS pairs inside the same cluster. Subsequently, these estimated channels are used by the IA algorithm to compute the precoder/decoder matrices. For brevity of exposition, this paper focuses on the downlink only, notwithstanding the fact that most of the discussion also applies to the uplink segment. Furthermore, we assume that the estimation error is negligible, and hence, perfect CSI can be considered as in [12-15, 18, 19]. We note, however, that the design of robust IA algorithms to deal with channel estimation errors constitutes a promising line for further research.

Clustered Interference Alignment Algorithm
IA algorithms carefully design the precoder matrix $V^{[k]} \in \mathbb{C}^{N_t \times d}$ at the AP side and the decoder matrix $U^{[k]} \in \mathbb{C}^{N_r \times d}$ at the MS to avoid intracluster interference. $V^{[k]}$ is designed to constrain all the interference signals into a subspace orthogonal to $U^{[k]}$, while the desired signal is allocated into another subspace free from interference. IA algorithms have to satisfy the following feasibility conditions to suppress the interference and recover the desired signals [7]:
$$(U^{[k]})^{H} H^{[k,j]} V^{[j]} = 0, \quad \forall j \neq k,$$
$$\operatorname{rank}\left((U^{[k]})^{H} H^{[k,k]} V^{[k]}\right) = d, \quad \forall k,$$
where $j \in \{1, 2, \dots, K\}$ represents the indexes of the selected APs. The characterization of design parameters to ensure that IA algorithms meet feasibility conditions has been widely addressed in the scientific literature [11,[33][34][35]. The analysis provided in these papers shows that the ability of the IA algorithm to satisfy the previous constraints depends on the number of users $K$, the spatial dimension of the MIMO channel (number of antennas at the transmitter ($N_t$) and receiver ($N_r$) sides), and the number of data streams $d$ to be multiplexed for each AP/MS pair. Results in [33][34][35] only address a partial description of the problem, applicable to specific values of the parameters $K$, $d$, $N_t$, and $N_r$. Based on these former results, the authors in [11] provide a full characterization of the feasibility of linear IA for a symmetric MIMO channel with constant coefficients and no symbol extension. Equation (26) in [11] describes the maximum achievable DoFs ($d^*$) as a function of the number of users and the number of antennas at the transmitter and receiver as $d^* = f(\lambda(K), \gamma(N_t, N_r))$, where $f(\lambda, \gamma)$ is a function defined under the assumption that $N_r \le N_t$ (see [11] for the details). In our clustered IA context, this equation offers us the framework to set the cluster size constraint. Let $\mathcal{K} = \{1, 2, \dots, k, \dots, K\}$ be the set of indices of the $K$ AP/MS pairs.
Moreover, let $\mathcal{A} = \{A_1, A_2, \dots, A_c, \dots, A_{n_C}\}$ denote the set of clusters, where $A_c$ contains the indices of the AP/MS pairs that are in the $c$th cluster, $c \in \{1, 2, \dots, n_C\}$, and $n_C$ is the total number of clusters. For the given values of $d$, $N_t$, and $N_r$, the maximum number of SCs in each cluster, denoted as $S_{\max}$, is computed such that $d \le d^* = f(\lambda(S_{\max}), \gamma(N_t, N_r))$. Then, denoting the size of each cluster as $\mathcal{S} = \{S_1, S_2, \dots, S_c, \dots, S_{n_C}\}$, where each size represents the cardinality of the cluster (i.e., $S_c = |A_c|$), the cluster design has to satisfy the properties listed in [14], including the size constraint $S_c \le S_{\max}$ for every cluster, which is referred to as constraint (5d) in the following. Let $A_{c_k}$ denote the cluster where the $k$th AP/MS pair is located. Then, the received signal at the $k$th MS is given by
$$y^{[k]} = H^{[k,k]} V^{[k]} s^{[k]} + \sum_{j \in A_{c_k},\, j \neq k} H^{[k,j]} V^{[j]} s^{[j]} + \sum_{j \notin A_{c_k}} H^{[k,j]} V^{[j]} s^{[j]} + n^{[k]},$$
where the second and third terms are the intracluster and intercluster interference, respectively, $s^{[k]} \in \mathbb{C}^{d \times 1}$ is the $d$-stream transmitted symbol vector to user $k$, and $n^{[k]} \in \mathbb{C}^{N_r \times 1}$ is the circularly symmetric additive white Gaussian noise (AWGN) vector at the $k$th MS ($n^{[k]} \sim \mathcal{CN}(0, \sigma_n^2 I_{N_r})$). The transmitted signal power by the $j$th AP is given by $P^{[j]} = E[\|s^{[j]}\|^2]$, and it is subject to a per-AP power constraint $P^{[j]} \le P_T, \forall j$.
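To give a feeling for how a cluster-size limit follows from the antenna configuration, the sketch below replaces the full characterization of [11], which is not reproduced here, with the well-known counting (properness) condition $N_t + N_r \ge d(S + 1)$ for symmetric systems. This substitution is an assumption made only for illustration: the counting condition is necessary for feasibility, so the value it returns can be optimistic with respect to the limit obtained from [11]. The function names are hypothetical.

```python
import math

def max_cluster_size(n_t, n_r, d):
    """Largest S satisfying n_t + n_r >= d * (S + 1).

    Assumption: the counting condition stands in for the exact
    feasibility characterization used in the paper.
    """
    if d < 1 or d > min(n_t, n_r):
        raise ValueError("d must satisfy 1 <= d <= min(n_t, n_r)")
    return (n_t + n_r) // d - 1

def initial_num_clusters(num_pairs, s_max):
    """Initial number of clusters, ceil(K / S_max)."""
    return math.ceil(num_pairs / s_max)

s_max = max_cluster_size(n_t=4, n_r=4, d=2)   # 8 >= 2 * (3 + 1), so S = 3
n_clusters = initial_num_clusters(10, s_max)  # ceil(10 / 3) = 4
```

Increasing the number of streams $d$ for a fixed antenna configuration shrinks the admissible cluster size, which is the trade-off between spatial multiplexing and interference suppression discussed in the text.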
The SE of the $k$th transmitter-receiver pair is obtained by adding the per-stream terms $\log_2(1 + \mathrm{SINR}^{[k,l]})$ over the $d$ data streams [8,36], where $\mathrm{SINR}^{[k,l]}$ is the SINR at the $k$th receiver and $l$th data stream, $l \in \{1, 2, \dots, d\}$. This SINR is expressed in terms of two matrices computed before the decoding stage: the intracluster interference plus noise covariance matrix at the $k$th MS and $l$th data stream, and the intercluster interference covariance matrix at the $k$th MS. One of our aims in this paper is to analyze how the average SE achieved for a given AP/MS pair is affected by several practical design parameters.
3.1. Max-SINR. Owing to its superior performance at low and intermediate SNR values, the MAX-SINR IA algorithm is considered in this research work [7]. This is a linear iterative algorithm that considers the direct channel gain, the interference leakage, and the noise power to develop an alternating maximization of the SINR per transmitted data stream. Similar to [7], but considering the clustered IA scenario, the unit-norm decoder vector $U_l^{[k]}$ that maximizes $\mathrm{SINR}^{[k,l]}$ is computed from the intracluster interference plus noise covariance matrix and the desired channel. The transmission in the reciprocal network is then considered to compute $V^{[j]}$ in a similar way. In this case, the intracluster interference plus noise covariance matrix and the intercluster interference are evaluated over the reciprocal channels, and the unit-norm precoder vectors are obtained from them. This is an iterative process in which the precoder and decoder matrices are alternately updated until the algorithm converges or the number of iterations reaches a given limit. Unfortunately, due to the nonconvex nature of the optimization problem, the convergence to a global optimum is not guaranteed.
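The alternating updates described above can be sketched for a single cluster as follows. This is a simplified illustration in the spirit of the MAX-SINR algorithm of [7], not the exact formulation of this paper: it assumes equal power `p_stream` on every stream, ignores intercluster interference, and uses hypothetical names (`best_filters`, `max_sinr`, `stream_sinr`) and randomly drawn channels.

```python
import numpy as np

def random_unit_columns(rng, rows, cols):
    """Complex Gaussian matrix with each column scaled to unit norm."""
    m = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    return m / np.linalg.norm(m, axis=0)

def best_filters(H, F, p_stream, noise):
    """SINR-maximizing unit-norm receive filters for fixed transmit filters F.

    H[k][j] is the channel from transmitter j to receiver k (same cluster).
    """
    K, d = len(F), F[0].shape[1]
    n_rx = H[0][0].shape[0]
    W = []
    for k in range(K):
        # Covariance of all received streams plus noise at receiver k.
        total = noise * np.eye(n_rx, dtype=complex)
        for j in range(K):
            HF = H[k][j] @ F[j]
            total += p_stream * (HF @ HF.conj().T)
        Wk = np.zeros((n_rx, d), dtype=complex)
        for l in range(d):
            h = H[k][k] @ F[k][:, l]                      # desired stream l
            B = total - p_stream * np.outer(h, h.conj())  # interference plus noise
            w = np.linalg.solve(B, h)
            Wk[:, l] = w / np.linalg.norm(w)
        W.append(Wk)
    return W

def max_sinr(H, d, p_stream, noise, n_iter, rng):
    K, n_tx = len(H), H[0][0].shape[1]
    V = [random_unit_columns(rng, n_tx, d) for _ in range(K)]
    # Reciprocal network: H_rev[j][k] is the channel from MS k back to AP j.
    H_rev = [[H[k][j].conj().T for k in range(K)] for j in range(K)]
    for _ in range(n_iter):
        U = best_filters(H, V, p_stream, noise)      # decoders, forward network
        V = best_filters(H_rev, U, p_stream, noise)  # precoders, reciprocal network
    U = best_filters(H, V, p_stream, noise)          # decoders matched to the final V
    return U, V

def stream_sinr(H, U, V, p_stream, noise):
    """SINR of every stream, returned with shape (K, d)."""
    K, d = len(V), V[0].shape[1]
    out = np.zeros((K, d))
    for k in range(K):
        for l in range(d):
            u = U[k][:, l]
            signal = p_stream * abs(u.conj() @ H[k][k] @ V[k][:, l]) ** 2
            total = noise * np.linalg.norm(u) ** 2
            for j in range(K):
                total += p_stream * np.linalg.norm(u.conj() @ H[k][j] @ V[j]) ** 2
            out[k, l] = signal / (total - signal)
    return out

rng = np.random.default_rng(1)
K, n_r, n_t, d = 3, 2, 2, 1  # illustrative cluster of three 2x2 links
H = [[(rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
      for _ in range(K)] for _ in range(K)]
U, V = max_sinr(H, d, p_stream=1.0, noise=0.01, n_iter=50, rng=rng)
sinr = stream_sinr(H, U, V, p_stream=1.0, noise=0.01)
spectral_efficiency = np.log2(1.0 + sinr).sum(axis=1)  # bit/s/Hz per AP/MS pair
```

Each half-step solves a generalized Rayleigh quotient in closed form, which is why the final decoders are optimal for the final precoders even though the joint problem is nonconvex.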
Since MAX-SINR can only remove the intracluster interference, its performance can be severely degraded in highly dense scenarios. Consequently, a modification of the algorithm is proposed in [14] to also take into account the intercluster interference in the design of the precoding/decoding matrices. This method is known as MAX-SINR-DL, and it is based on treating the aggregate intercluster interference as white noise (i.e., a Gaussian hypothesis), whose power is then added to the diagonal of the intracluster interference covariance matrix. Therefore, the decoder in equation (11) is modified accordingly in [14], which yields equation (15). We note from the above formulations that the MAX-SINR method only requires global CSI knowledge of all links between APs and MSs inside the same cluster. Therefore, the CSI signal overhead is limited to the users inside the same cluster. A distributed implementation of MAX-SINR with only local knowledge of CSI is also proposed in [7]. Moreover, according to (15), the MAX-SINR-DL method also needs global information regarding the propagation losses of the channel (e.g., path loss and correlated shadowing) of all links between APs and MSs (i.e., $\beta^{[k,j]}, \forall k, j$). However, this information is also required to implement the clustering stage. Hence, the practical complexity of the system is left unaltered.
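The diagonal-loading idea can be sketched in a few lines. The example assumes that the aggregate intercluster interference power at a given MS is the transmit power times the sum of the linear large-scale power gains of the out-of-cluster links; if the large-scale quantity is available as a loss, its reciprocal would be used instead. The function name `load_diagonal` and the numbers are hypothetical, and the exact expression of [14] may differ in its scaling.

```python
import numpy as np

def load_diagonal(B_intra, out_of_cluster_gains, p_tx):
    """Add the aggregate intercluster power to the diagonal of B_intra.

    B_intra: intracluster interference plus noise covariance (n_rx x n_rx).
    out_of_cluster_gains: linear large-scale power gains of the links from
    APs outside the cluster (assumed input).
    """
    extra = p_tx * float(np.sum(out_of_cluster_gains))
    n_rx = B_intra.shape[0]
    return B_intra + extra * np.eye(n_rx, dtype=B_intra.dtype)

B_demo = np.eye(2, dtype=complex)                     # illustrative covariance
B_loaded = load_diagonal(B_demo, [0.1, 0.2], p_tx=2.0)  # adds 0.6 to the diagonal
```

Because only large-scale quantities enter the loading term, no instantaneous CSI from other clusters is needed, which is consistent with the overhead argument made in the text.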
Since intercluster interference is not perfectly eliminated with any of the described algorithms, the proper design of the cluster distribution will determine the performance of the system. The next section addresses the proposal to efficiently group the AP/MS pairs in different clusters to ensure that the residual intercluster interference has a minimal impact on the SE of the network.

Cluster Design: Size-Restricted k-Means
The cluster design objective is to partition the $K$ AP/MS pairs into $n_C$ clusters that satisfy the constraints above such that the intercluster interference is as weak as possible. This optimization problem is known to be NP-hard, and therefore, the optimal solution can only be found via an exhaustive search that becomes computationally infeasible even for moderate values of $K$. Therefore, we propose a suboptimal solution based on the k-means algorithm, which partitions a set of objects into a given number of clusters defined by their centroids to minimize a squared-error function [23]. We use the large-scale propagation losses, including the path loss and correlated shadowing effects, as the set of objects $X = \{x_1, x_2, \dots, x_K\}$ of the k-means clustering algorithm, such that the vector for the $j$th AP is given by $x_j = [\beta^{[1,j]}, \dots, \beta^{[K,j]}]$. This allows us to group inside the same cluster the APs with similar channel losses with respect to all MSs, which in turn are precisely those that cause the highest intracluster interference. We define $n_C = \lceil K/S_{\max} \rceil$ as the initial number of clusters, and their corresponding centroids $\mu_c$ are computed following the method introduced in [37] to increase speed and accuracy. Then, each AP/MS pair is associated to the nearest cluster centroid. Formally, this is expressed as
$$c_j = \arg\min_{c \in \{1, \dots, n_C\}} g(x_j, \mu_c), \tag{17}$$
where $c_j$ is the index of the cluster assigned to the $j$th pair and $g(x_j, \mu_c)$ is a function used to measure the distance between the propagation loss vector $x_j$ and the cluster centroid $\mu_c$, which can also be viewed as a square-error function. In this paper, we use the cosine distance $g(x_j, \mu_c) = 1 - x_j \mu_c^T / (\|x_j\| \|\mu_c\|)$. Clustering with this distance has been shown to yield roughly the same clusters as clustering the geographical positions of the APs/MSs with a Euclidean distance (i.e., clustering according to the AP/MS locations) [30]. The advantage of the clustering approach used in this paper is that it obviates the need to know the geographical locations of the communication nodes and instead relies on the already available large-scale propagation losses.
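The nearest-centroid assignment with a cosine distance can be sketched as follows. The standard definition of the cosine distance, one minus the cosine of the angle between two vectors, is assumed; the function names and the toy data are hypothetical.

```python
import numpy as np

def cosine_distance(X, centroids):
    """Pairwise cosine distance between the rows of X and the rows of centroids."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return 1.0 - Xn @ Cn.T

def assign(X, centroids):
    """Index of the nearest centroid for every object (row of X)."""
    return np.argmin(cosine_distance(X, centroids), axis=1)

# Toy loss vectors: the first two point mostly along the first axis,
# the last two mostly along the second one.
X = np.array([[1.0, 0.1], [2.0, 0.3], [0.1, 1.0], [0.2, 3.0]])
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = assign(X, centroids)
```

Note that the cosine distance is insensitive to the overall scale of a loss vector, so two pairs with proportional loss profiles are treated as close even if their absolute losses differ.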
In each iteration of the k-means algorithm, the centroids are recalculated based on the current objects within each cluster and a new object assignment is developed based on (17). The design of the clusters is iteratively updated until no redistribution of the objects occurs. K-means is reported as the most commonly used clustering strategy, providing a locally minimal mean squared error (local MMSE) value. However, a cluster size restriction is not considered in the basic formulation of the algorithm, and hence, there is no guarantee that this method satisfies constraint (5d). Several strategies to balance k-means have been studied to obtain both equal-size clusters and a local MMSE [24,25]. In these papers, the assignment phase is different from that of the original k-means algorithm. Instead of assigning the users to the nearest centroids, a linear programming algorithm [24] or a Hungarian algorithm [25] is applied to ensure equal-size clusters and a local MMSE at the cost of increasing the complexity. However, since we only need to limit the maximum number of SCs in each cluster instead of obtaining equal-size clusters as in [24,25], we propose a novel size-restricted k-means algorithm that is simpler than previous proposals.
The proposed algorithm is organized in two major steps. First, an initial cluster design is obtained by applying the k-means algorithm with $n_C^{\mathrm{ini}} = \lceil K/S_{\max} \rceil$ clusters. Then, if there is any cluster that does not satisfy constraint (5d), a size-restricted stage is implemented to ensure that each cluster has a maximum size $S_{\max}$. This step is based on splitting the large clusters into smaller ones at the cost of increasing the number of clusters and hence the intercluster interference. A formal description of this method is given in Algorithm 1. The number of clusters is increased by one in each iteration of the size-restricted stage. In this way, the convergence of the method to a cluster distribution that satisfies constraint (5d) is guaranteed.
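The two-step procedure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the helper `kmeans_cosine`, the random centroid initialization (used here in place of the method of [37]), the `max_rounds` safeguard, and the synthetic loss matrix are all hypothetical, and the loss vectors are assumed to have strictly positive entries.

```python
import numpy as np

def kmeans_cosine(X, n_clusters, rng, n_iter=100):
    """Plain k-means with cosine distance; returns one label per row of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: centroids start at randomly chosen objects (not the method of [37]).
    centroids = Xn[rng.choice(len(Xn), size=n_clusters, replace=False)]
    labels = np.full(len(Xn), -1)
    for _ in range(n_iter):
        new_labels = (1.0 - Xn @ centroids.T).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no object changed cluster
        labels = new_labels
        for c in range(n_clusters):
            members = Xn[labels == c]
            if len(members) > 0:
                mu = members.mean(axis=0)
                centroids[c] = mu / np.linalg.norm(mu)
    return labels

def size_restricted_kmeans(X, s_max, seed=0, max_rounds=1000):
    """Return a list of index arrays, each holding at most s_max objects."""
    rng = np.random.default_rng(seed)
    n_ini = int(np.ceil(len(X) / s_max))
    # Step 1: initial cluster design with ceil(K / s_max) clusters.
    labels = kmeans_cosine(X, n_ini, rng)
    clusters = [np.flatnonzero(labels == c) for c in range(n_ini)]
    clusters = [c for c in clusters if len(c) > 0]
    # Step 2: re-cluster only the objects of oversized clusters, with one more cluster.
    for _ in range(max_rounds):
        too_big = [c for c in clusters if len(c) > s_max]
        if not too_big:
            return clusters
        kept = [c for c in clusters if len(c) <= s_max]
        idx = np.concatenate(too_big)
        sub = kmeans_cosine(X[idx], len(too_big) + 1, rng)
        new = [idx[sub == c] for c in range(len(too_big) + 1)]
        clusters = kept + [c for c in new if len(c) > 0]
    raise RuntimeError("size restriction not met within max_rounds")

# Synthetic K x K loss matrix with positive entries (hypothetical data), K = 30.
X = np.random.default_rng(1).uniform(0.1, 1.0, size=(30, 30))
clusters = size_restricted_kmeans(X, s_max=4)
```

Clusters that already respect the size limit are left untouched in each round, so the re-clustering operates on a shrinking set of objects, which is the property the complexity discussion relies on.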

Wireless Communications and Mobile Computing
At every iteration of the size-restricted stage, the k-means algorithm is computed with a lower number of data objects in the observation set $X'$ and a lower number of clusters. This is because, once the initial cluster distribution has been obtained, only the objects that belong to the $n_C'$ clusters that do not satisfy constraint (5d) are used to compute an updated version of the cluster distribution. Note that the value of $n_C'$ is reduced after each iteration until the method converges. According to [23,38], the complexity of the initial cluster design in the first step of Algorithm 1 is $O(n_C^{\mathrm{ini}} K^2 t)$, where $t$ is the number of iterations required for the convergence of the k-means algorithm; note that the number of objects in the observation set and the dimensionality of the data are both equal to $K$. Therefore, the total complexity of the proposed size-restricted k-means algorithm is always lower than $O(T n_C^{\mathrm{ini}} K^2 t)$ for $T > 1$, where $T$ is the number of iterations required to guarantee the convergence of Algorithm 1, and exactly $O(n_C^{\mathrm{ini}} K^2 t)$ for $T = 1$. Since the final number of clusters $n_C$ is expected to be much lower than the dimension of the set of objects (i.e., $n_C \ll K$), our method has lower complexity than those reported in [24,25].
The cluster design is performed at the BSC using only information related to the propagation losses of the channel, instead of the complete channel matrix, to reduce the signaling overhead. Since the large-scale fading varies over time scales much longer than the channel coherence time, the cluster formation needs to be updated only a few times per second even in a high-mobility scenario [20], thus leading to a low-complexity design and a low signaling overhead in the network. In fact, the most costly step of our proposal is the design of the precoder and decoder matrices, which need to be updated at every coherence time. The computational complexity of the MAX-SINR algorithm is analyzed according to the number of complex multiplications as in [39]. For the direct channel, this complexity depends on the calculation of the intracluster interference plus noise covariance matrix in (9) and the computation of the decoder vector $U^{[k]}_l$ according to (11). This last operation involves the inversion of the covariance matrix. Similar operations then have to be computed for the reciprocal channel. According to [39,40], the multiplication of two complex matrices with sizes $N_r \times N_t$ and $N_t \times d$ requires $N_r N_t d$ complex multiplications.
Therefore, the calculation of the intracluster interference plus noise covariance matrix at the kth AP/MS pair for all the data streams (i.e., $l = \{1, 2, \ldots, d\}$) requires approximately a total of $S_{c_k} \cdot (N_r N_t d + N_r^2 d + N_r^2)$ complex multiplications in the direct channel and $S_{c_k} \cdot (N_r N_t d + N_t^2 d + N_t^2)$ in the reciprocal channel [39], where $S_{c_k}$ is the number of SCs in the cluster containing the kth AP/MS pair.

Input: $n_C^{\mathrm{ini}}$, $\beta$, $S_{\max}$, $K$
Output: $A$, $n_C$
// Step 1: Initial cluster design by applying k-means algorithm
Compute an initial cluster distribution by applying k-means algorithm with input arguments: $X$, $n_C^{\mathrm{ini}} = \lceil K/S_{\max} \rceil$ and output argument $A^{\mathrm{ini}}$;
$A = A^{\mathrm{ini}}$;
$n_C = n_C^{\mathrm{ini}}$
// Step 2: Size-restricted cluster distribution
Compute the size of each cluster $S$
Find the number of clusters $n_C'$ that violate property (5d), i.e., $S_c > S_{\max}$
$a = 1$;
$b = n_C - n_C'$;
while $n_C' > 0$ do
    Find the index set $A'$ of clusters that violate property (5d) such that $A' = \{\cdots, A^{\mathrm{ini}}_c, \cdots\}\ \forall c : S_c > S_{\max}$;
    Update the index set $A$ only with clusters that meet property (5d): $\{A_a, \cdots, A_b\} = A^{\mathrm{ini}} \setminus A'$;
    Find the observation set $X'$ of the AP/MS pairs that are located in the $n_C'$ clusters that violate property (5d) such that $X' = \{\cdots, x_j, \cdots\}\ \forall j : j \in A'_{c_j}$;
    Compute an updated cluster distribution by applying k-means algorithm with input arguments: $X'$, $n_C' + 1$ and output argument $A^{\mathrm{upd}}$;
    Compute the updated size $S^{\mathrm{upd}}$ of each cluster in the set $A^{\mathrm{upd}}$;
    Find the new number of clusters $n_C''$ that violate property (5d), i.e., $S^{\mathrm{upd}}_c > S_{\max}$;
Algorithm 1: Size-restricted k-means algorithm.
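The approximate multiplication counts given above for the covariance matrices can be evaluated numerically for a specific configuration. The sketch below uses $N_t = 8$, $N_r = 2$, and $d = 1$ from the simulation setup; taking a full cluster with $S_{c_k} = 9$ is an assumption made only for illustration, and the function name is hypothetical.

```python
def covariance_mults(s_ck, n_r, n_t, d, reciprocal=False):
    """Approximate complex multiplications for the covariance matrices of all d streams."""
    # The covariance matrix is N_r x N_r in the direct channel and N_t x N_t in the reciprocal one.
    n = n_t if reciprocal else n_r
    return s_ck * (n_r * n_t * d + n * n * d + n * n)

# N_t = 8, N_r = 2, d = 1 as in the simulation setup; S_ck = 9 assumes a full cluster.
direct = covariance_mults(s_ck=9, n_r=2, n_t=8, d=1)                   # 9 * (16 + 4 + 4)
recip = covariance_mults(s_ck=9, n_r=2, n_t=8, d=1, reciprocal=True)   # 9 * (16 + 64 + 64)
```

With more antennas at the AP than at the MS, the reciprocal channel dominates this part of the cost, since its covariance matrix has dimension $N_t \times N_t$.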

Numerical Results
We initially consider a scenario consisting of $M = 50$ APs and $K = 30$ MSs randomly and uniformly distributed within a square with an area of 1 km$^2$ (i.e., the length of the side of the square is $L = 1000$ m), as shown in Figure 1. Similar to [20], cell-edge effects are avoided by implementing a wrap-around technique in which the nominal square coverage area under evaluation is surrounded by eight neighboring identical square areas to imitate an infinite network. We assume a network where each AP is equipped with $N_t = 8$ antennas and transmits $d$ data streams to its corresponding MS, which is equipped with $N_r = 2$ antennas. In line with the literature on ultradense deployments, each AP transmits with a power of $P = 200$ mW, distributing the available power uniformly among the $d$ streams to be transmitted (i.e., uniform power allocation). Numerical results are obtained by averaging over 100 random locations of APs/MSs within the coverage area, and for each location, 100 channel realizations are obtained by generating different fast-fading realizations. The rest of the simulation parameters are detailed in Table 1 and chosen in accordance with the numerology typically used in UDNs as reported in [20,28]. Figure 2 illustrates the cluster design procedure for the same scenario shown in Figure 1. AP/MS pairs with the same color are those located in the same cluster. According to (26) in [11], for the above simulation parameters and assuming $d = 1$, the maximum number of users in each cluster is $S_{\max} = 9$, a constraint that is shown to be satisfied by the proposed size-restricted cluster design. Therefore, the fulfillment of the IA feasibility conditions is guaranteed. Figure 3 shows the average SE per user versus the number of MSs for the traditional MAX-SINR, the MAX-SINR-DL, and the SVD beamforming. Results are shown for three distinct coverage areas ($L = 1000$, 500, and 250 m).
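The wrap-around technique can be illustrated with a short sketch. The text does not spell out how the eight replicas enter the distance computation, so the version below assumes one common realization: the distance between two nodes is the shortest one over the nominal square and its eight copies. The function name and the example coordinates are hypothetical, and both points are assumed to lie inside the nominal square.

```python
import numpy as np

def wraparound_distance(p, q, side):
    """2D distance between p and q when the square is surrounded by eight copies of itself."""
    delta = np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    # Along each axis, the shortest offset is either the direct one or the one through the border.
    delta = np.minimum(delta, side - delta)
    return float(np.hypot(delta[0], delta[1]))

L = 1000.0  # side of the nominal square in metres
# Two nodes close to opposite borders are neighbours once the area is wrapped around.
d_edge = wraparound_distance((10.0, 500.0), (990.0, 500.0), L)
```

In this example the two nodes are 980 m apart inside the nominal square but only 20 m apart through the border, so a node near the edge sees interferers on all sides, as it would in an unbounded network.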
We consider the maximum number of data streams allowed for the SVD, that is, $d = 2$, while SE results for the IA algorithm have been obtained for $d = 1$ and $d = 2$. According to (26) in [11], the maximum number of AP/MS pairs in each cluster is $S_{\max} = 9$ for $d = 1$ and $S_{\max} = 4$ for $d = 2$. As expected, it can be observed that, irrespective of the coverage size, the network performance decreases as the number of MSs increases. This is because a higher value of $K$ demands a higher number of clusters to group all SCs. Therefore, higher intercluster interference levels are generated, thus limiting the performance of the network. The number of data streams multiplexed by the IA algorithm represents a trade-off between the amount of desired data transmitted in every SC and the number of users within the same cluster. For given values of $N_t$, $N_r$, and $K$, the higher the $d$, the lower the $S_{\max}$, and hence, the higher the intercluster interference. In the scenarios shown in Figures 3(a) and 3(b), a higher performance is achieved by transmitting more data streams. However, for an extreme densification case like the one shown in Figure 3(c), transmitting a single data stream achieves a higher average SE than transmitting with $d = 2$. The reason is that, in highly interference-limited systems, designing larger clusters with the aim of improving the interference cancellation is more efficient than multiplexing a higher number of data streams. Note that the shrinkage of the coverage area unavoidably results in lower spectral efficiencies, irrespective of the technique in use.
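The trade-off between $d$ and the cluster size can be made concrete with a small calculation. The sketch below takes $K = 30$, $P = 200$ mW, and the two $S_{\max}$ values quoted above, and computes the initial number of clusters $\lceil K/S_{\max} \rceil$ together with the per-stream power under uniform allocation; the function and variable names are hypothetical.

```python
import math

def initial_clusters(num_pairs, s_max):
    """Initial number of clusters, ceil(K / S_max)."""
    return math.ceil(num_pairs / s_max)

K = 30                # AP/MS pairs in the baseline scenario
P_MW = 200.0          # transmit power per AP in mW
S_MAX = {1: 9, 2: 4}  # maximum cluster size for d = 1 and d = 2, as quoted in the text

# For each d: (initial number of clusters, power per stream in mW).
summary = {d: (initial_clusters(K, s), P_MW / d) for d, s in S_MAX.items()}
```

Moving from $d = 1$ to $d = 2$ doubles the initial number of clusters from 4 to 8 and halves the power per stream from 200 mW to 100 mW, which is why the second stream only pays off when intercluster interference is not the dominant impairment.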
It is worth pointing out that under extreme densification (50 APs in a 250 × 250 m square) and heavy network load ($K > 40$), a reasonable condition since the amount of densification responds to the expected user load, the SVD-based beamforming performs remarkably well. This indicates that a useful strategy in this case is to concentrate the radiated energy towards the direction of maximum gain of the user-specific channel matrix. Since SVD does not require any exchange of CSI among APs, it seems to be a reasonable processing choice under these specific conditions. Overall, Figure 3 shows that MAX-SINR-DL is the IA-based scheme achieving the best performance in interference-limited systems, since it deals with both intracluster and intercluster interference. The performance improvement is achieved by considering the information regarding the propagation losses from all users in the network. Since MAX-SINR-DL reports higher values of SE than the traditional MAX-SINR in a clustered deployment of SCs, the remaining figures only illustrate the performance provided by this strategy. It can be observed that the performance increases as the number of APs grows while keeping the network load fixed. Note that the probability of having APs closer to each MS becomes higher for a larger number of APs, leading to an increase of the probability of LOS propagation within an SC, a reduction of the corresponding path loss, and a more efficient exploitation of the diversity against shadow fading. Therefore, a network deployment with a high number of distributed APs increases the coverage probability at the cost of higher complexity and a more expensive design.
A similar result is obtained in Figure 5, which shows a performance improvement as the APs are equipped with a larger antenna array. Higher values of $N_t$ imply the availability of a higher spatial dimension to align the interfering signals. Therefore, according to the analysis in Section 3, a larger number of AP/MS pairs can be grouped in the same cluster, thus reducing the intercluster interference. The performance curves obtained for the IA algorithm can be divided into two sections. For single data stream transmission with $N_t < 29$, there are several clusters, and hence, the growth of the SE is produced by the reduction of intercluster interference as the number of clusters decreases. For $N_t \geq 29$, there is only one cluster, which means that all interfering terms are dealt with in the interference cancellation process and the SE is significantly increased. In fact, as $N_t$ increases beyond 29 antennas, the SE keeps increasing but more slowly, because in this interval the growth is determined by how well the interference is cancelled, with the desired signal located in a subspace completely orthogonal to the interference subspace. In the case of $d = 2$, all SCs are grouped inside one single cluster for $N_t \geq 60$, which explains the abrupt change of the SE behaviour at $N_t = 60$. Finally, Figure 6 shows the relation between the average SE per user and the length of the square side ($L$) when fixing the number of APs and MSs in the network ($M = 100$, $K = 30$). We notice that, as a consequence of the realistic 3D channel model assumed in the paper, there is an interval where the SE grows with $L$ and another one where it decreases. This is because, in dense scenarios limited by interference, as $L$ increases the 3D distances between a given MS and the interfering APs increase faster than those between the MS and the selected AP.
Then, since the large-scale fading depends on the distance, the power of the desired signal decreases more slowly than the interference power, and hence, the SE increases. As $L$ keeps increasing and the spatial density of APs and MSs decreases, however, the aforementioned 3D distances start behaving as 2D distances, and the SE decreases with $L$ as the noise term becomes more and more significant with respect to the interference term and the desired signal power is drastically reduced. Indeed, this is also what causes the SVD beamforming solution to perform remarkably close to the best IA solution. This figure shows that, for the given values of $K$, $M$, $N_t$, $N_r$, $h_{AP}$, and $h_{MS}$, there is an optimal value of $L$ for which the SE is maximum. Hence, although it may seem surprising in an interference-limited system, distributing a certain amount of communication resources over a very small coverage area may imply a loss in system performance.
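The geometric argument behind the first regime can be checked with a short numerical sketch. The antenna heights and the 2D distances below are hypothetical placeholders (the actual values of $h_{AP}$ and $h_{MS}$ are those of Table 1), and scaling all horizontal distances by a common factor is used here as a stand-in for increasing $L$ with a fixed node layout.

```python
import math

# Hypothetical antenna heights in metres; the paper's values are listed in its Table 1.
H_AP, H_MS = 10.0, 1.5

def distance_3d(d2d, h_ap=H_AP, h_ms=H_MS):
    """3D distance from the horizontal (2D) distance and the height difference."""
    return math.hypot(d2d, h_ap - h_ms)

def interferer_to_serving_ratio(scale, d_serv=5.0, d_int=50.0):
    """Ratio of 3D distances (interfering AP over serving AP) when all 2D distances are scaled."""
    return distance_3d(scale * d_int) / distance_3d(scale * d_serv)

# Scaling the layout up mimics a larger side length L with the same node placement.
ratios = [interferer_to_serving_ratio(s) for s in (1, 4, 16)]
```

The ratio grows with the scale factor and saturates at the 2D ratio (here 10), which mirrors the text: for small areas the height difference dominates the serving link, so interferers recede faster than the serving AP, whereas for large areas the 3D distances behave as 2D distances and no further gain is obtained.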

Conclusion
This paper has considered the application of various flavours of IA in the context of UDNs within a 5G/B5G framework. In particular, an algorithm based on a clustered version of MAX-SINR IA has been shown to be an effective technique to achieve remarkably large spectral efficiencies in deployments of SCs with a large number of APs and MSs in the coverage area. Results reveal that a larger cluster size, effectively implemented by using more antennas at the APs, can drastically improve the network performance, but this comes at the cost of requiring the exchange of CSI among more network elements (all the APs/MSs forming the cluster). Also, the impact that modifying the densities of APs and/or MSs might have on the system performance has been investigated and has served to illustrate the relationship between the number of network elements per area unit and performance. According to the considered propagation model, given a certain number of APs and MSs equipped with a fixed number of antennas, there is an optimal coverage area value over which these resources should be distributed. Numerical results have provided valuable insight for making practical designs in UDNs that show a proper balance between performance and network complexity. Future work will seek to incorporate the effects that the availability of imperfect CSI and/or the use of spatially correlated antenna arrays might have on the network performance, as well as the implementation of more sophisticated power allocation techniques.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.