Data-Driven Self-Organization With Implicit Self-Coordination for Coverage and Capacity Optimization in Cellular Networks

Coverage and Capacity Optimization (CCO) and Inter-Cell Interference Coordination (ICIC) are two tightly coupled and conflicting Self-Organizing Network (SON) functions that are responsible for ensuring optimal coverage and capacity in any cellular network. While executing concurrently, these functions may modify the same RF and antenna parameters, resulting in severe performance degradation. In this context, a centralized optimization and coordination approach may be impractical considering the large sizes of network clusters and the dynamics involved between the several other defined SON use cases. In this work, an implicitly coordinated and scalable self-organizing architecture is followed such that when a carefully defined multi-objective utility function for CCO-ICIC joint optimization is optimized locally by each RAN node, a desired balance between the two conflicting network targets of coverage and capacity is ensured globally. Pareto analysis of three variants of the proposed Local Multi-Objective KPI (LMO KPI) has been conducted to implicitly coordinate the two SON functions in a distributed self-organized manner. In order to recommend appropriate network configurations dynamically and adapt quickly to changing network environments, two collaborative filtering-based Recommender Systems (RecSys), one using a Deep Autoencoder and another based on Singular Value Decomposition, have been employed along with a neural network regressor to improve recommendations for cold-start scenarios. The two proposed hybrid-RecSys-based SON coordination solutions, while adopting an appropriate LMO KPI, outperform previous work in coverage by 36% and in capacity by around 2% while reducing power consumption by more than 50%. The study demonstrates that the definition of the LMO KPI is crucial to the performance of this approach.
Altogether, the work shows that the adopted self-organization and implicit SON-coordination approach is not only feasible and performant but also scales well if implemented meticulously.


I. INTRODUCTION
THERE is an evident paradigm shift towards smart and cognitive networks that utilize vast amounts of network telemetry data for automating and optimizing several complex issues in the domain of wireless communications. The value of advanced Machine Learning (ML) algorithms for addressing the challenging problems of communication systems and making them adaptive and self-aware has been well recognized by the telecom industry [1], [2], [3]. One crucial area of application for Artificial Intelligence (AI) based network data analytics is cognitive network management of cellular mobile networks. 3GPP first introduced the notion of Self-Organizing Networks (SON) in Release 8 and has elaborated on the idea in several subsequent releases. In recent releases, dedicated network functions like Network Data Analytics Function (NWDAF) and Management Data Analytics Service (MDAS) have been introduced to actively promote the integration of AI-powered solutions in the field of telecommunication [4], [5], [6], [7].
In the context of Cognitive Network Management (CNM), employing zero-touch automated configuration of radio-access networks increases their efficiency and minimizes the need for expensive human engineering expertise. There are several SON functions introduced by the Standards Developing Organizations (SDOs) to individually address specific network goals like load balancing, handover optimization, interference management, coverage and capacity optimization, energy saving, outage compensation and some others. The SON functions, or CNM functions, are autonomously operating closed control loops that cognitively monitor different network contexts and strive to achieve their respective performance targets or technical objectives by regulating the required network configurable parameters. Generally, there are multiple concurrently executing SON functions with distinct operator-defined objectives. If the behaviour or targets of some SON functions are complementing, their gains can be further increased. However, on numerous occasions, the updates by one SON function could also negatively impact the performance of one or more SON functions. In order to manage these positive and negative couplings between the simultaneously operating SON functions, the concept of Self-Coordination has been introduced [8].
The task of modelling the dynamics between the co-executing SON functions in the environments of present and future cellular networks purely on the basis of human expert knowledge can be cumbersome and may still be inaccurate. The advancements in the domain of AI have already attracted researchers from various fields to employ advanced ML algorithms for modelling such complex relationships. The telco industry and academia have also recognized the potential of ML-based intelligent solutions that can be utilized for CNM [9], [10]. The availability of vast amounts of network telemetry data, the advances in the fields of storage and computation and the introduction of cloudification and softwarization in networking can act as major enablers for the adoption of ML for exhaustive network analytics and zero-touch operation and control in next-generation CNM solutions.
In this work, the emphasis is on the simultaneous optimization of coverage and capacity which are crucial as well as challenging CNM targets for wireless cellular networks. A dedicated SON function called Coverage and Capacity Optimization (CCO) has been defined by 3GPP that aims to take care of coverage holes and achieve a good trade-off with capacity. Another tightly coupled SON function, Inter-Cell Interference Coordination (ICIC), tries to mitigate ICI situations by reducing overlapping coverage. It is known that interference coordination, coverage and capacity optimization are contrasting network goals that cannot be individually realized without affecting the others [11]. These functions can thus conflict with the operation of each other when they operate concurrently in any network as they might alter the same network configurations with conflicting targets. Thus, the attempt in this work is to capture the dynamics between these two SON functions by deriving an implicit SON-coordination model that can establish good coverage without creating significant interference to neighbour cell sites, thereby ensuring optimal capacity.
Recommender Systems (RecSys), the sub-class of ML algorithms that is the focus of this work, are unsupervised information retrieval systems that are capable of analyzing huge volumes of data related to the knowledge of the users, items and the user-item interactions in order to predict the preference of a user among a set of items [12], [13]. The proposed SON-coordination solution is designed using hybrid collaborative filtering RecSys with the intent to exploit the similarities in the network environments observed by the cell sites to collaboratively and cooperatively determine suitable sets of parameter configurations for dynamically changing traffic scenarios, from a large space of possible combinations.
The functional architecture of the proposed hybrid SON solution comprises self-organized intelligent agents executing at each RAN Node (RN) that collect the relevant observations from their respective cell sites and enforce configuration updates in a distributed manner based on the recommendations from trained ML models. The targeted ML models can be trained centrally at an edge cloud or a public cloud in a cooperative manner based on the observations shared by the participating RNs. The idea is to achieve the targeted network goals in a decentralized manner such that instead of determining the near-optimal configurations for a large network cluster, the problem is broken down to each RN following a divide-and-conquer strategy. The absence of a central entity for determining suitable configurations for all the involved network entities lets the solution scale to any network size and also ensures that there is no single point of failure. Although this work focuses on two specific SON functions, the SON-coordination solution is designed so that the general requirements of any other conflicting and complementing SON functions are also catered for. The motivation is thus to propose a distributed, practical, scalable, resilient, self-organized, and self-coordinated solution that can be deployed for any network size and easily extended to any number of inter-dependent SON functions.
The rest of the paper is organized as follows. In Section II, a literature review is presented for the relevant works in this domain along with the key contributions of this work. The reasons for considering the application of Recommender Systems over the other existing ML-based solutions are discussed towards the end of that section. Section III gives an overview of the principles and design paradigms that have been taken into consideration for the solution architecture to effectuate self-organization and implicit self-coordination of SON functions. This section provides a high-level idea of how the proposed SON-coordination solution could be deployed in practical networks. Next, the conceptual design of formulating the SON-coordination problem in terms of conventional Recommender System models is discussed in Section IV. In Section V, the implementation details of the two proposed hybrid collaborative filtering RecSys-based SON-coordination solutions are elaborated. The simulation setup, results and observations are presented in Sections VI and VII, followed by some concluding remarks in Section VIII. For ease of reading, a list of the critical acronyms is included in Table IV.

II. LITERATURE REVIEW AND KEY CONTRIBUTIONS
The two SON functions, ICIC and CCO, have been individually studied in several works. The standalone designs mostly focus on the intrinsics involved in each of these individual closed-loop automation functions. When they are executed concurrently in the network, they tend to impact each other's performance and these interactions further increase the degree of optimization complexity. Nevertheless, it is still important to understand the central ideas of both these SON functions to model them jointly such that the operation of each can be optimized without compromising the performance of the other.
Most of the recent works on CCO are targeted at collectively optimizing the antenna and RF parameters of a group of RNs over a central controller. The authors in [14] have proposed a CCO approach in which cells with coverage issues are first heuristically recognized and clustered along with the impacted cells. Then the antenna azimuth and downtilt are jointly optimized for the complete cluster using Sequentially Unconstrained Maximization Technique. The heuristic approach for clustering cells with coverage issues requires human supervision and may still be sub-optimal and less practical for large network clusters. Another CCO work on similar lines is [15], where the proposed solution first attempts to determine the problem cells and group a set of neighbouring cells to centrally determine appropriate antenna azimuth and downtilts for all the cells in the group using a Differential Evolution approach. This evolutionary algorithm-based metaheuristic approach may not be able to figure out the optimal solution and the centralized search for suitable antenna configurations for all cells may not scale well for large clusters. Also, these works focus on under-coverage and over-coverage issues but the gains in terms of capacity are not evaluated.
In [16], a preliminary work on CCO has been evaluated where a patch of a network with multiple RNs is optimized in terms of coverage and capacity using a single controller. The transmission powers and antenna downtilts of all the RNs in the area of interest are collectively optimized using methods like Deep Deterministic Policy Gradient and Bayesian Optimization. The authors of this work acknowledge that such a centralized optimization approach may be infeasible for larger networks as the optimization space would explode, but it provides a good insight to understand the trade-off between coverage and capacity. Another work using Reinforcement Learning (RL) has been proposed in [17] where antenna downtilt is dynamically adjusted for handling CCO issues. Apart from these, there are several works [18], [19] in the literature where the attempt is to explore the impact of transmit power, antenna azimuth and downtilt. The authors in these works highlight that among the three configurable parameters, the effect of downtilt on coverage and capacity is the most significant and along with power control, the CCO gains can be further amplified.
The problem of ICIC has also been extensively studied in several works. The authors in [20] have investigated different frequency reuse schemes with various uniform and non-uniform user distributions and varying network loads. They have demonstrated the advantages of Soft Frequency Reuse (SFR) in terms of spectral efficiency, mean throughput and the availability of the complete spectrum for all cells (Reuse-1). Practical network scenarios are considered where it is not the case that all the UEs experience good radio conditions and network loads are relatively low. In [21], [22] and [23], the authors have explored genetic algorithms and RL-based methods to determine and adjust the most suitable sub-band power factor for centre UEs. In [22], even the configuration for the edge-to-centre boundary is considered for optimization to dynamically update the categorization of the users as cell-centre and cell-edge UEs.
An Inverse Reinforcement Learning based ICIC strategy is investigated in [24], where Wasserstein Generative Adversarial Networks and Double Deep Q Network have been used in combination for performing behaviour imitation with limited real training samples. This approach could be interesting in certain scenarios but would require the involvement of good human expertise or otherwise would lead to suboptimal solutions. Also, curation of a rich dataset may be required in this case as a noisy and limited training set may result in poor quality of generated synthetic data, and that in turn would impact the decision-making of the model.
These two SON functions are critical for mobile networks and their network goals are tightly intertwined, as it is practically impossible to enhance the coverage of a cell without impacting the nearby ICI situation and thereby the network capacity. For instance, increasing the transmission power or reducing the antenna downtilt may improve the coverage situation, but if the operation is not done carefully it would deteriorate the ICI situation and the overall network capacity could be compromised. In the literature, there are not many works that have attempted to resolve the conflicts between these two SON functions for achieving their overall network objectives. However, it is worth discussing the ideas and solutions that have been proposed to achieve self-coordination between other SON functions.
There are broadly two kinds of strategies used for coordinating the operation of conflicting SON functions. One of the approaches involves engaging an explicit coordinator function that operates on top of the SON functions in conflict [25], [26], [27]. The roles of the external coordinator are to limit or completely switch off the functions of one or several SON functions that are lower in priority. Another popular approach is to jointly implement and optimize the correlated SON functions such that the allowed range of configurable parameters is not restricted due to any kind of prioritization of SON functions [28], [29], [30], [31], [32]. In [33] also, the authors have promoted the approach of joint-optimization over external coordinator-based approaches as the chances of selecting suboptimal configuration parameters significantly reduce when there is no restriction in the space of candidate solutions. Additionally, the advancements in the fields of AI, computing capabilities and storage have made it absolutely viable to handle a huge space of possible configurations for jointly modelling several SON functions with varied and coupled objectives.
In [34], the application of Recommender Systems has been proposed for joint modelling of conflicting and complementing SON functions. The idea behind applying RecSys in this context is that it can recommend the most appropriate set of configuration parameters for any specific observed network environment such that the network goals of all the implemented SON functions are achieved with the best possible trade-off. Unlike most of the prior works that are based on variants of Reinforcement Learning approaches, RecSys are capable of handling the high-dimensional state-action space that may exist in joint-optimization-based SON-coordination problems. Therefore, the proposed RecSys-based framework can be easily scaled up for any number of related SON functions. A further benefit that RecSys-based solutions bring to the domain of network optimization is that they can be made efficient with minimal human supervision and can be trained over sparse real network data with few parameter configurations actually tried over the network [34]. This way the combinatorial explosion issue of RL-based approaches can be avoided. The specific contributions of this work are as follows:
• A hybrid Recommender System based decentralized self-organizing and self-coordinating SON solution is proposed and evaluated for two intertwined SON functions, CCO and ICIC. Two variants of hybrid collaborative filtering based recommender systems, one based on matrix factorization and another using a neural network, have been explored. The recommendations for the cold-start scenarios have also been improved for both the employed RecSys using an additional deep neural network-based regression model.
• Local Multi-Objective KPIs (LMO KPIs) have been designed as a rating system for the employed RecSys that can ensure implicit coordination between the addressed SON functions while facilitating self-organization in the network in a distributed fashion. These LMO KPIs, if maximized, should be able to achieve joint optimization of the targets of the given SON functions and also meet overall network goals by accomplishing the local objectives of the target nodes without compromising those of their neighbours.
• The selection of a suitable central tendency measure for the network KPIs involved in the LMO KPI is also important to aptly learn the performance of the different network configurable parameters over the given state of the environments. In this context, a Pareto analysis of three variants of the proposed LMO KPI is conducted for the joint implementation of CCO and ICIC. This evaluation shows the effect of the varying coverage-interference trade-off parameter on the selection of the radio parameters and finally on the overall network targets. This study is important for determining an operating range for Mobile Network Operators (MNOs) for achieving the best coverage-capacity trade-off.
• The shortcomings and strengths of the three adopted measures are compared in terms of control for establishing the desired trade-off between the network targets of coverage and capacity, and the most suitable among them is selected after exhaustive analysis. The closed-loop performance of the proposed hybrid RecSys is assessed while employing the selected LMO KPI variant and is finally compared with the baseline solution proposed in the previous work [34] in terms of relevant network KPIs.

III. SOLUTION OVERVIEW - PRINCIPLES AND DESIGN PARADIGMS
The size, density and complexity of the network clusters are increasing with every generation of mobile communication networks along with their diversity in terms of access technologies and topologies. The need of the hour is to come up with solutions that are easily scalable, adaptable and flexible according to the various sizes and topologies of network clusters. The first high-level idea of the proposed solution is to achieve self-organization in the network in terms of the different objectives addressed by several discrete network functions in a decentralized manner. The second important theme to be covered is the self-coordination between the SON functions with diverse objectives that are concurrently executed at the RNs.

A. Principles of Solution Design
In a nutshell, the design of the overall proposed solution is hybrid, where ML models are trained centrally using cell-wise network data collected from all RNs and the model predictions/inferences are used in a decentralized manner at each RN for their respective cells (see Fig. 1). The idea is that the partial knowledge of all cells can be used to cooperatively train a model that can be used finally by all the cells in the network. This way the knowledge is consistent and complete throughout. If an environment is observed by some cell at an earlier point in time, then the knowledge of the most suitable parameter configuration for such an environment can be used later for any cell that has observed the particular environment for the first time. Also, for any newly created cell, there is a baseline knowledge from where it can start rather than beginning from a clean slate. Furthermore, the employed ML models are trained cyclically at regular intervals. This allows the models to evolve according to the updates occurring in the network scenarios like construction, weather, technology and topology updates etc. The periodicity of model retraining is not a constraint in such a system as it can be flexibly chosen and dynamically adapted by the solution providers based on the availability of computation resources, deployment scenarios, e.g., urban/rural, geographical locations, or the demands (reactive/proactive) of the network operators.
In the existing networks, the already available user-level and cell-level telemetry data at the Operations, Administration and Management (OAM) systems, hosted over single or multiple datacentres, can be utilized to centrally train the employed ML models. The trained ML models can be then executed at the RNs (eNBs/gNBs) at near real-time loops for inferences and appropriate configuration recommendations such that the objectives of the targeted SON functions are addressed. The proposed solution is also compatible with the O-RAN architecture based next-generation mobile communication networks. In such a flexible and distributed architecture, the ML model training can be conducted at the Service Management and Orchestration (SMO) Framework hosted over the Non-Real-Time (Non-RT) RAN Intelligent Controller (RIC) and the trained models can be executed for inferences at the Near-RT RIC [35], [36].
To ensure that the global network goals are met in a distributed, self-organized fashion, each participating RN should try to optimize a utility function that maximizes its own targets while keeping in check the impact on its neighbours. In addition, this cell-level utility function should be able to jointly optimize the targets of the implemented SON functions. Therefore, to accomplish a truly self-organized and implicitly self-coordinated SON solution, cell-level Local Multi-Objective network KPIs (LMO KPIs) are defined and explored in the proposed solution. This LMO KPI is carefully designed to address the objectives of the targeted CCO and ICIC SON functions. Finally, with the help of the trained ML models, the values of LMO KPI are predicted. The expectation is that the selection of the values of configurable parameters that maximize the LMO KPI prediction shall be able to achieve the global network performance goals set by the MNOs. In this work, hybrid RecSys-based models have been explored for predicting the LMO KPI values for recommending appropriate network configurations according to the environments detected at the respective cell-sites. The details of these algorithms are discussed in Sections IV and V.

B. Design Paradigms for Self-Organization and SON-Coordination
The design of the solution is coherent with the four design paradigms for architecting self-organizing networks, as proposed by the authors in [37]. Paradigm #1 suggests that in a self-organized system, instead of a central entity being responsible for the entire organization, the tasks and behaviour of the local agents should be defined in a way that the desired global properties can be established. In this case, according to paradigm #1, the employed ML models are able to learn only the local behaviour at cell-level and thus promote a divide-and-conquer strategy (see Fig. 1).
According to paradigm #2, a self-organized system should not aim for perfectly conflict-free resource coordination between the participating entities as in the case of a centrally organized system. This kind of centralized coordination mechanism may require significant signalling overhead in a highly dynamic network. The recommendation is that it is better to tolerate some temporary localized conflicts if they can be easily detected and contained. The design should avoid applying any explicit coordination between the RNs but rather let them observe and communicate only in their neighbourhood to decipher the status of the network and react accordingly. This is adhered to in the proposed design as well.
Due to the locality in the design, there is no requirement for the global states of the network, and in this way paradigm #3 is fulfilled. It suggests that long-lived global state information may have inconsistencies in a dynamic system and that, with short-term local state information, the assumptions about the other nodes and the dependencies between them can be drastically minimized. This makes the networks more adaptive and resilient against updates and failures in the system. Paradigm #4 finally defines how the system should be adaptive at a local node level rather than involving any centralized entity for continuous monitoring and reaction. There can be three levels of adaptation. Level 1 adaptation is about a protocol that can modify the control settings according to the short-term regular changes in the environment. In the proposed solution, this is analogous to a trained hybrid RecSys that can recommend suitable network configurations based on the changing state of the local network environment. In Level 2, the system should be able to adapt to long-term behavioural changes to optimize system performance. In the proposed design, model retraining can accommodate such behavioural changes caused by the updates occurring in the network scenarios. Finally, Level 3 adaptation proposes that the system conducts a major adaptation when it realizes that the changes are so severe that the employed algorithm no longer converges. In such cases, a grid search for hyperparameter tuning can be performed in such ML-based SON solutions.
Once a scalable design for self-organization is achieved, another critical challenge is that there are several SON functions with varied objectives that may be executed concurrently at all RNs. The individual SON functions may organize and coordinate well with their own instances executing all over the network, but the different SON functions hosted at each RN may have several conflicting and complementing relationships between them. This calls for self-coordination solutions that can make the SON functions converge to meet the overall network targets in a stable and robust manner. In this context, the approach followed in this work is to use an appropriate LMO KPI for achieving implicit coordination between the targeted and tightly coupled SON functions. The defined LMO KPI includes an operator-tunable parameter that allows MNOs to configure the trade-off they desire between the several network performance goals. In conclusion, the distributed and localized design for self-organization and the implicit self-coordination approach make the solution scalable to any network size and any number of SON functions.

IV. PROBLEM FORMULATION
The task of achieving self-organization and self-coordination between SON functions is accomplished by extending the RecSys-based formulation proposed in [34]. RecSys are a sub-class of ML algorithms that are used as scalable information retrieval tools. They are capable of processing huge amounts of sparse and high-dimensional data and have been successfully employed for several real-world use cases such as friend suggestions and movie, news and product recommendations. The conventional idea of RecSys is to recommend appropriate items to users based on their history of interactions with the set of items. The historical user interactions are used to learn their preferences. Analogous to this idea, here the given problem of self-organization and self-coordination is formulated in terms of RecSys, where recommendations of the most appropriate combination of configuration parameters are generated according to the LMO KPIs recorded for different environments observed in the network. From this point on, the Environments are abbreviated as ENV and the Parameter Sets as PS.
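The analogy above (ENV as user, PS as item, LMO KPI as rating) can be illustrated with a minimal sketch. It uses a plain SGD matrix factorization as a stand-in for the SVD-based and autoencoder-based models evaluated in this work; the function names, hyperparameters and toy ratings are all assumptions made for illustration and are not taken from the paper.

```python
import random

def predict(env_f, ps_f, env, ps):
    """Predicted rating (LMO KPI) of parameter set `ps` in environment `env`."""
    return sum(a * b for a, b in zip(env_f[env], ps_f[ps]))

def factorize(ratings, n_env, n_ps, rank=1, epochs=1000, lr=0.05, reg=0.002, seed=0):
    """Fit latent factors to sparse {(env, ps): rating} observations with SGD."""
    rng = random.Random(seed)
    env_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_env)]
    ps_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_ps)]
    for _ in range(epochs):
        for (env, ps), rating in ratings.items():
            err = rating - predict(env_f, ps_f, env, ps)
            for f in range(rank):
                e, p = env_f[env][f], ps_f[ps][f]
                env_f[env][f] += lr * (err * p - reg * e)
                ps_f[ps][f] += lr * (err * e - reg * p)
    return env_f, ps_f

def recommend(env_f, ps_f, env):
    """Index of the parameter set with the highest predicted rating for `env`."""
    return max(range(len(ps_f)), key=lambda ps: predict(env_f, ps_f, env, ps))

# Toy data (hypothetical): 3 environments x 4 parameter sets, two cells unobserved.
env_scale = [1.0, 0.8, 0.5]
ps_quality = [0.2, 0.5, 0.9, 0.4]
ratings = {(e, p): env_scale[e] * ps_quality[p]
           for e in range(3) for p in range(4)
           if (e, p) not in {(1, 2), (2, 0)}}
env_f, ps_f = factorize(ratings, n_env=3, n_ps=4)
best_for_env_1 = recommend(env_f, ps_f, 1)
```

In this toy setup, environment 1 has never tried parameter set 2, yet the factors learned from the other environments allow a rating to be predicted for it. This is the collaborative effect that the formulation relies on when only few parameter configurations have actually been tried in the network.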

A. Observable Network Variables Set or Environment (ENV)
ENV refers to the state of the observable network variables observed at the cell level. For interference coordination, an SFR scheme is implemented such that the users are segregated into centre and edge UEs, served with orthogonal sub-bands. The Percentage of Centre UEs (PCU) gives a rough estimate of the distribution of users in a cell. This information can be useful for both CCO and ICIC. All these variables are monitored for each cell separately. This way the RNs can detect the coverage and signal quality of their respective cells and also the degree of interference they create for their neighbours. The DIC is communicated to them by their neighbouring RNs. Readers can refer to [34] for further details about these selected ENV variables and the rationale behind them.

B. Set of Configurable Network Variables or Parameter Set (PS)
Similar to the work in [34], three configurable network variables, namely the Power Factor for the centre UEs (PF_centre), the Edge to Centre Boundary (ECB) and the Antenna Downtilt (DT_ant), are considered in the configurable PS. The first two parameters are directly associated with the SFR scheme employed for coordinating ICI. The value of PF_centre can vary from 0.1 to 1.0 (maximum transmit power) with a resolution of 0.1. The ECB is used to control the proportion of centre and edge UEs as per the applied SFR scheme. Its value ranges from 0.7 to 1.4 and the adjustment resolution is 0.1. Unlike in [34], DT_ant can be varied from 5 to 18 degrees. These larger DT_ant values allow the transmission to be focused closer to the cell centre. Finally, the set of PSs comprises 10 × 8 × 27 = 2160 combinations of PF_centre, ECB and DT_ant.
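The size of the PS space can be checked with a short enumeration. This is a minimal sketch: the ranges for the power factor and ECB follow the text, whereas the 0.5-degree downtilt step is an assumption inferred from the stated count of 27 downtilt values between 5 and 18 degrees. The names `PF_CENTRE`, `ECB`, `DT_ANT` and `ps_index` are illustrative.

```python
from itertools import product

# Power factor for centre UEs: 0.1 .. 1.0 in steps of 0.1 (10 values, from the text).
PF_CENTRE = [round(0.1 * n, 1) for n in range(1, 11)]
# Edge-to-centre boundary: 0.7 .. 1.4 in steps of 0.1 (8 values, from the text).
ECB = [round(0.7 + 0.1 * n, 1) for n in range(8)]
# Antenna downtilt: 5 .. 18 degrees; the 0.5-degree step is an assumption
# that reproduces the 27 values stated in the text.
DT_ANT = [5.0 + 0.5 * n for n in range(27)]

# Every (PF_centre, ECB, DT_ant) combination is one recommendable "item".
PARAMETER_SETS = list(product(PF_CENTRE, ECB, DT_ANT))

def ps_index(pf_centre, ecb, dt_ant):
    """Map one parameter combination to its item index in PARAMETER_SETS."""
    return PARAMETER_SETS.index((pf_centre, ecb, dt_ant))
```

Indexing the combinations this way gives each PS a stable item identifier that a RecSys model can use as its item dimension.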

C. Local Multi-Objective Utility Function for Self-Organization and Self-Coordination
In a conventional RecSys framework, an explicit or implicit rating system is required that can provide an indication of the degree of suitability or desirability of a recommendable item for a given user. In a system where a human is involved, direct or indirect feedback about any item can be used as a rating to learn the preferences of the users and generate meaningful recommendations for them.
In this problem of network optimization, a measure is to be defined that can be used as a rating about how suitable is a PS configuration for a particular observed ENV in terms of the involved SON functions. In other words, in this context, a measure is to be chosen such that if its value is high for a given PS, it should be one of the most appropriate network configurations for achieving the desired trade-off between coverage and capacity. For this reason, a Local Multi-Objective Utility function or LMO KPI is defined that can be utilized to achieve joint-optimization or implicit coordination between the involved SON functions. In addition, this LMO KPI should be such that if optimized locally would also aid in accomplishing the global idea of decentralized self-organization.
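Assuming the cells in a network are denoted by i ∈ {1, . . . , I } and the UEs served by the i th cell are represented by k (i) ∈ {1, . . . , K (i) }, then the SIR observed by a UE k (i) can be expressed as:
$$\mathrm{SIR}_{k^{(i)}} = \frac{\mathrm{RSRP}^{(i)}_{k^{(i)}}}{\sum_{j \neq i} \mathrm{RSRP}^{(j)}_{k^{(i)}}} \tag{3}$$
where $\mathrm{RSRP}^{(j)}_{k^{(i)}}$ is the RSRP detected by the UE k (i) from the j th cell, so that the numerator is the serving-cell RSRP and the denominator is the ICI received from the neighbouring cells.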
The final network goal is to maximize the SIR while minimizing the coverage holes. According to Eq. (3), if for each UE in the network, the RSRP detected from its serving cell is maximized and the RSRP detected from its neighbour cells, i.e., the ICI caused by the neighbouring cells, is minimized then the overall network SIR should be optimized. In a distributed self-organized system, a cell individually cannot limit the ICI detected by its UEs from its neighbours but rather it can control the ICI it creates to its neighbouring cell UEs. This should also ensure that the above-mentioned objective is achieved globally.
In addition, the LMO KPI must be carefully designed as it is extremely critical for self-organization between the participating RNs and for achieving implicit self-coordination between the interrelated SON functions. It should ensure that the overall network goals are attained by maximizing the KPIs of the target cell without compromising those of its neighbours. Considering everything, the LMO KPI proposed for self-organization and self-coordination of the two targeted SON functions, ICIC and CCO, is as follows:
$$\mathrm{LMO\,KPI}_{i} = \alpha \cdot \mathrm{RSRP}_{\mathrm{cell},i} - (1 - \alpha) \cdot \mathrm{IFC}_{\mathrm{cell},i} \tag{4}$$
where RSRP cell of the i th cell is an aggregate of the serving-cell RSRP values reported by its UEs k (i), and IFC cell is the aggregated Interference Created by the i th cell to the UEs of the neighbouring cells [34]. α is the coverage-interference trade-off parameter that can be configured by the MNOs to achieve the desired coverage-capacity targets. Intuitively, an appropriate PS that maximizes the LMO KPI should maximize the cell RSRP (or coverage) without creating significant ICI to its neighbours. The values of RSRP cell and IFC cell are individually normalized between 0 and 1 to ensure that the effect of α is consistent.
In the utility function, it would not be a good idea to consider the sum of RSRP and IFC values reported by all UEs, as the value of the function would then depend on the number of UEs. So, choosing an appropriate central tendency measure to read the aggregate RSRP and IFC situation of a cell is critical. In [34], the mean values of RSRP cell and IFC cell were considered as a central tendency measure for optimization with α = 0.6. The observations presented in Section VII-A show that, depending on the distribution of UEs in a cell, mean values can become an inefficient target for optimization in several scenarios. Therefore, in this work, median and quartiles are also evaluated as candidate central tendency measures for optimization. In addition, the complete range of α is analyzed to learn its impact on the overall interference, coverage and capacity situation of the network. The three LMO KPIs explored in this work are as follows:
1) Mean RSRP cell and IFC cell : Generally, the mean values of KPIs are considered a preliminary indicator of their central tendency, and this impression can often be wrong. For instance, in this case, the RSRP cell can be increased by improving the RSRP of all the UEs or by increasing it significantly for some of the UEs. In the latter case, it would give a wrong impression of the coverage situation. Similarly, IFC cell can be reduced by generating a small amount of interference for a greater number of UEs. This can happen as an effect of keeping low antenna downtilts. Here, the mean as a central tendency measure would depend on the distribution of UEs and thus could be misleading in several scenarios.
$$\mathrm{LMO\,KPI}_{1,i} = \alpha \cdot \mathrm{RSRP}^{\mathrm{mean}}_{\mathrm{cell},i} - (1 - \alpha) \cdot \mathrm{IFC}^{\mathrm{mean}}_{\mathrm{cell},i} \tag{5}$$
2) Median RSRP cell and IFC cell : The median is another important statistical measure that can give a good idea about the data distribution. In many cases, it could even be a better choice than the mean as it is more robust against outliers. So, in the case of a few noisy samples or when distributions are skewed, the median could be a better measure of the central tendency. This second version of the LMO KPI is denoted by the equation below:
$$\mathrm{LMO\,KPI}_{2,i} = \alpha \cdot \mathrm{RSRP}^{\mathrm{median}}_{\mathrm{cell},i} - (1 - \alpha) \cdot \mathrm{IFC}^{\mathrm{median}}_{\mathrm{cell},i} \tag{6}$$

3) First Quartile RSRP cell and Third Quartile IFC cell :
The intuition behind the LMO KPI in the case of CCO and ICIC is to make sure that the RSRP cell is raised up to the point where the interference caused to the neighbouring edge UEs is not yet troublesome. Therefore, in this case, the first quartile of RSRP (RSRP Q1 cell ) is kept as a target for maximization, assuming the values above it would also increase. These low observations of RSRP would mostly be recorded by the UEs at the cell edges. Thus, if the situation for these worst-case scenarios is rectified, it could improve the coverage situation for most of the UEs of a cell. Then, in the case of IFC cell , if its third quartile value is chosen for minimization, it would mean that the lower IFC values detected by UEs below the third quartile would most probably reduce further as well. The higher values of interference would be detected by the immediate neighbour UEs, just outside the target cell boundary. In case IFC Q3 cell is selected for minimization, it should naturally reduce the ICI caused to the far-off neighbour UEs. Therefore, with this measure, the motivation is to determine whether maximizing the RSRP Q1 cell and minimizing the IFC Q3 cell can provide a better Pareto frontier for the coverage and capacity trade-off. Mathematically, it can be represented as:
$$\mathrm{LMO\,KPI}_{3,i} = \alpha \cdot \mathrm{RSRP}^{\mathrm{Q1}}_{\mathrm{cell},i} - (1 - \alpha) \cdot \mathrm{IFC}^{\mathrm{Q3}}_{\mathrm{cell},i} \tag{7}$$
The endeavour here is to maximize the SIR but it is also important to learn how it is done. It can either be maximized by simply improving the situation of some UEs (mostly close to the RNs) or by improving that of most of the UEs. The latter is more desirable when the target is to maximize coverage simultaneously.
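The difference between the three variants can be illustrated with a small sketch. It assumes that the utility takes the α-weighted form α · RSRP cell − (1 − α) · IFC cell with inputs already normalized to [0, 1]; the function name `lmo_kpi`, the sample values and the choice of the inclusive quartile method are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, median, quantiles

def lmo_kpi(rsrp, ifc, alpha, variant):
    """Alpha-weighted LMO KPI for one cell (sketch).

    rsrp: normalized (0..1) serving-cell RSRP samples of the cell's UEs.
    ifc:  normalized (0..1) interference samples the cell creates at
          neighbouring UEs.
    variant: "mean", "median" or "quartile" (Q1 of RSRP, Q3 of IFC).
    """
    if variant == "mean":
        r, f = mean(rsrp), mean(ifc)
    elif variant == "median":
        r, f = median(rsrp), median(ifc)
    elif variant == "quartile":
        # ASSUMPTION: inclusive quartile method; the text does not specify one.
        r = quantiles(rsrp, n=4, method="inclusive")[0]
        f = quantiles(ifc, n=4, method="inclusive")[2]
    else:
        raise ValueError(f"unknown variant: {variant}")
    return alpha * r - (1 - alpha) * f

# Hypothetical samples: one UE close to the RN reports a very high RSRP.
rsrp = [0.1, 0.2, 0.2, 0.3, 0.9]
ifc = [0.1, 0.1, 0.2, 0.2, 0.4]
kpis = {v: lmo_kpi(rsrp, ifc, alpha=0.5, variant=v)
        for v in ("mean", "median", "quartile")}
```

In this toy case the single strong UE lifts the mean-based KPI above the median-based and quartile-based ones, which is the kind of misleading impression discussed for the first variant.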

V. PROPOSED SOLUTION BASED ON HYBRID COLLABORATIVE FILTERING RECSYS
According to the hybrid design for self-organization, the observations collected by all the RN agents are shared with the OAM, which could be hosted on public or private datacentre(s). The proposed solution uses a Recommender System. The local data of all RNs collected over the cloud is first organized as a matrix with rows containing the actual LMO KPIs observed for a specific ENV ID for a particular set of PS IDs. This ENV ID-PS ID-LMO KPI (EPL) matrix is generally sparse, since most PS combinations are not explored in the real network for every possible ENV. The employed RecSys are trained using this consolidated sparse matrix data to predict the values of LMO KPIs for the unexplored PSs. Finally, the PSs with the highest predicted LMO KPIs are recommended for the corresponding ENVs.
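The data flow described above can be sketched as follows. The dictionary-of-dictionaries layout, the function names and the sample numbers are hypothetical; they only illustrate a sparse EPL structure and the final selection of the PS with the highest predicted LMO KPI.

```python
# Hypothetical logged observations: (env_id, ps_id, observed LMO KPI).
observations = [(0, 2, 0.31), (0, 5, 0.12), (1, 2, 0.40)]

def build_epl(observations):
    """Sparse EPL matrix as {env_id: {ps_id: lmo_kpi}}.

    PS IDs that are missing from a row are unexplored for that ENV.
    """
    epl = {}
    for env_id, ps_id, kpi in observations:
        epl.setdefault(env_id, {})[ps_id] = kpi
    return epl

def recommend(predicted_row):
    """Return the PS ID with the highest predicted LMO KPI for one ENV."""
    return max(predicted_row, key=predicted_row.get)

epl = build_epl(observations)
# Hypothetical model output for ENV 0, including the unexplored PS 7.
predicted_env0 = {2: 0.31, 5: 0.12, 7: 0.45}
best_ps = recommend(predicted_env0)
```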
Conventionally, RecSys encounter a problem known as the Cold Start problem [12], [38], where it becomes difficult to generate good recommendations for users (ENVs) with few actual observations for the set of items (PSs). This is because there is a lack of information to estimate the preferences of the particular user precisely. For such cases, in [34], a random PS from the list of top overall best-performing PSs is recommended until a sufficient number of observations is available for appropriate recommendations. In this work, to address the cold start situations, an additional Deep Neural Network based Regression (DNNR) model is trained that is able to predict the LMO KPI based on the unquantized values of the ENV and PS variables. The PS with the highest predicted LMO KPI is recommended for the ENVs facing the cold start situation (see Fig. 2). The details of the three discussed models, namely the SVD-RecSys, the DAE-RecSys and the DNNR model, are elaborated in the following subsections.

A. Singular Value Decomposition Based RecSys (SVD-RecSys)
Singular Value Decomposition (SVD) is one of the state-of-the-art collaborative filtering approaches that has gained a lot of popularity over the last few years. In this case, SVD is used to generate a fully-specified low-rank approximation of the sparse LMO KPI matrix. This way it can provide LMO KPI predictions for the unobserved PSs for any given ENV. For a PS p applied to an ENV e, the LMO KPI prediction is computed using Eq. (8) [39], [40]:
$$\hat{r}_{ep} = \mu_{\mathrm{LMO\,KPI}} + b_e + b_p + u_e^{T} v_p \tag{8}$$
where the elements of vector v p contain the latent features of p and the vector u e captures the degree of suitability that an ENV e possesses for each of the latent features. The bias parameters b e and b p account for the deviations in LMO KPI for ENV and PS respectively, from the overall mean LMO KPI, μ LMO KPI . The readers can refer to [34] for more details about how the SVD-RecSys model is used to generate PS recommendations for the observed ENVs based on the approximated LMO KPI predictions.
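The prediction described here, a global mean plus ENV and PS biases plus the inner product of the latent vectors, can be written as a short sketch. The function name and all numeric values are hypothetical; in practice the biases and latent vectors would come from a trained matrix-factorization model.

```python
def predict_lmo_kpi(mu, b_e, b_p, u_e, v_p):
    """Biased matrix-factorization prediction: mu + b_e + b_p + u_e . v_p."""
    if len(u_e) != len(v_p):
        raise ValueError("latent vectors must have the same length")
    return mu + b_e + b_p + sum(u * v for u, v in zip(u_e, v_p))

# Hypothetical learned quantities for one (ENV, PS) pair.
r_hat = predict_lmo_kpi(mu=0.2, b_e=0.05, b_p=-0.02,
                        u_e=[0.5, -0.1], v_p=[0.4, 0.3])
```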

B. Deep Autoencoder Based RecSys (DAE-RecSys)
Over the past few years, Deep Learning (DL) too has gained a lot of interest in the domain of RecSys due to its flexibility and capability to capture the nonlinear and nontrivial dynamics within the input data. Several DL-based RecSys have been explored in diverse domains [41] and one of the architectural paradigms involves the application of Autoencoders for such recommendation tasks [42], [43], [44], [45]. The authors in [46] even demonstrate the superiority of Autoencoders as collaborative filtering based RecSys over popular matrix factorization techniques like SVD. Therefore, in this work, a Deep Autoencoder-based RecSys (DAE-RecSys) has been additionally studied as a potential solution for self-organizing and self-coordinating network functions.
An autoencoder is an unsupervised learning based neural network architecture that is composed of three components: encoder, code (or bottleneck) and decoder. The aim of this network is to reconstruct the input in the output layer by trying to learn a function
$$h(r_e; \theta) \approx r_e \tag{9}$$
Here, r e = (r e1 , . . . , r en ) represents a sparse LMO KPI vector for any ENV e ∈ E = {1 · · · m}, whose p th element is the LMO KPI observed when the corresponding PS p ∈ P = {1 · · · n} has been configured for it. θ denotes the set of model parameters.
Generally, autoencoder-based RecSys can be exploited in two ways. It can either be used to capture low-dimensional feature representations at the bottleneck layer or to generate a fully specified prediction for the sparse input vector in the reconstruction layer. The latter case is used in this work. The output of the l th layer is denoted by z l e , which can be defined as
$$z_e^{l} = \sigma\left(W^{l} z_e^{l-1} + b^{l}\right) \tag{10}$$
where W l and b l are the weights and biases of the l th layer and σ stands for the sigmoid activation function, which is a nonlinear function represented by
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{11}$$
The model parameters are learned by minimizing the masked reconstruction error with an L2 penalty:
$$\min_{\theta} \sum_{e \in E} \left\lVert \left(r_e - h(r_e; \theta)\right) \odot m_e \right\rVert_2^2 + \lambda \lVert \theta \rVert_2^2 \tag{12}$$
where ⊙ represents the Hadamard product, m e = (m e1 , . . . , m en ) is a mask vector such that m ep = 1 if r ep is observed, else m ep = 0, and λ is the regularization rate for the L2 penalty.
Once the model is trained with parameters $\hat{\theta}$, a prediction of LMO KPI for ENV e for any PS p using the DAE-RecSys can be computed using Eq. (13):
$$\hat{r}_{ep} = h(r_e; \hat{\theta})_p \tag{13}$$
The PS that generates the highest $\hat{r}_{ep}$ is recommended for the target ENV (depicted with dark green colour in Fig. 3(a)). The neural network architecture for DAE-RecSys used for this work is as follows: 2160, 128, 256, 256, dp(0.65), 256, 128, 2160; which means four layers in the encoder (2160, 128, 256, 256), a coding or bottleneck layer of 256 and four layers in the decoder (256, 256, 128, 2160). A dropout layer with a drop probability of 0.65 is introduced at the output of the encoder for regularization and to avoid over-fitting. The Sigmoid activation function is used for all the layers except the last layer of the decoder, which is kept linear.
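The text describes a mask vector applied through a Hadamard product and an L2 penalty with rate λ. A minimal sketch of such a masked loss for a single ENV is given below; the function name, the flat `params` list and the exact scaling of the penalty term are assumptions made for illustration, not the authors' implementation.

```python
def masked_loss(r_e, r_hat, mask, params, lam):
    """Masked squared reconstruction error for one ENV plus an L2 penalty.

    Only positions with mask == 1 (observed LMO KPIs) contribute to the
    error; unobserved positions are ignored regardless of the prediction.
    """
    err = sum(m * (r - rh) ** 2 for r, rh, m in zip(r_e, r_hat, mask))
    # ASSUMPTION: penalty is lam * sum of squared parameters.
    return err + lam * sum(w * w for w in params)

r_e = [0.3, 0.0, 0.5]     # sparse input; the middle entry is unobserved
mask = [1, 0, 1]
loss_a = masked_loss(r_e, [0.2, 0.9, 0.5], mask, params=[], lam=0.0)
loss_b = masked_loss(r_e, [0.2, 0.1, 0.5], mask, params=[], lam=0.0)
```

Both calls yield the same loss because the two reconstructions differ only at the unobserved position, which is what allows the network to be trained on a sparse EPL row.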

C. Deep Neural Network Based Regression Model (DNNR)
In order to generate better recommendations for the ENVs for which there are fewer observations logged in the database, a Deep Neural Network based Regression model is employed in conjunction with the applied RecSys engine. It models the relationships between the discussed ENV and PS variables and generates corresponding predictions for LMO KPIs. The difference in the approach for predicting LMO KPIs as compared to the RecSys-based approaches is depicted in Fig. 3. Instead of abstracting the details of the observed ENV (with an ENV ID) and only looking at the sparse vector of actual LMO KPIs, in this case, the actual continuous values of the ENV and PS variables are used for predicting the LMO KPIs. Therefore, the DNNR model takes the set of ENV and PS variables as input and predicts the value of the corresponding LMO KPI. Once the LMO KPI predictions for all the PS combinations are available for the detected ENV, the PS corresponding to the highest predicted LMO KPI is recommended (depicted with dark green colour in Fig. 3(b)).
Several linear and non-linear regression methods have been explored for this exercise and, based on the model accuracies, the DNNR model is selected. Table I lists the test losses of the models trained with the respective regression algorithms. A grid search is performed to determine the best set of hyperparameters for all the ML-based regression algorithms. For comparing the accuracy of the evaluated models, one-third of the total collected data (test set), not used during model training, is used to compute the Root Mean Squared Error (RMSE) between actual and predicted LMO KPI values. In Table I, it can be observed that the test RMSE of the DNN regression model is the lowest among the various regression algorithms, and this model is therefore used in the hybrid RecSys for closed-loop online evaluation. The details of the explored regression algorithms are not in the scope of this paper but the readers can refer to [47] as a starting point.
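For reference, the RMSE used to compare the regression models on the held-out test set can be computed as in this short sketch; the function name is illustrative.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))
```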
For every ENV that is facing a cold-start situation, the DNNR model is used to predict the LMO KPI for each combination of PS. This is computationally more expensive than the RecSys but the probability of cold-start situations should reduce over time. In this case, any ENV with fewer than 10 actual observations in the EPL matrix is considered to be a cold-start case.
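The hybrid switching rule can be sketched as below. The two predictor callables are hypothetical stand-ins for the trained RecSys and DNNR models; for brevity both take an ENV ID here, whereas the DNNR described in the text operates on the continuous ENV and PS variables.

```python
COLD_START_THRESHOLD = 10  # from the text: fewer than 10 observations

def recommend_ps(env_id, epl, recsys_predict, dnnr_predict, ps_ids):
    """Pick the PS with the highest predicted LMO KPI for env_id.

    epl: sparse EPL matrix as {env_id: {ps_id: observed LMO KPI}}.
    recsys_predict, dnnr_predict: hypothetical callables
        (env_id, ps_id) -> predicted LMO KPI.
    """
    n_obs = len(epl.get(env_id, {}))
    # Cold-start ENVs fall back to the regression model.
    predict = dnnr_predict if n_obs < COLD_START_THRESHOLD else recsys_predict
    return max(ps_ids, key=lambda ps: predict(env_id, ps))
```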
The following neural network architecture is adopted for the DNNR model: 12, 64, 64, 1, where the Sigmoid activation function is used for all the layers except the output layer. The continuous values of the ENV and PS variables are used as inputs to the model. Instead of the derived ENV variable DIC, the number of UEs interfered with the corresponding levels of interference (L1/L2/L3/L4) are used. A normalization layer is introduced at the input as a pre-processing step. Normalization ensures that the scale of the output and that of the gradients are not affected by the difference in the scales of the inputs, making the training process more stable. The relevant model hyperparameters are listed in Table II.

VI. SIMULATION SETUP
A cellular network comprising 7 tri-sectored eNBs is simulated over a C++ libraries-based system-level, discrete-event LTE network simulator provided by Nokia Bell Labs Germany and the Institute of Communication Networks and Computer Engineering at the University of Stuttgart, Germany. The simulator comprises a wrap-around implementation around the target 21-cell network cluster to ensure better coverage and ICI approximations. A Python-based ML engine for training and executing the employed ML models is interfaced with the network simulator. The relevant simulation configuration parameters have been summarized in Table II.
Simulation campaigns are conducted with each simulation iteration consisting of 11 different traffic scenarios. The traffic scenarios are created with different combinations of UEs belonging to five mobility groups (refer to Table III). Each traffic scenario is simulated for 200 seconds and after every scenario, the network is reset and the UEs are randomly initialized according to the configurations of the respective scenarios. The simulation computations happen over snapshots of 100 milliseconds and the configured SON interval is 2 seconds. A warm-up period of 40 snapshots, i.e., 4 seconds of simulation time, is considered before the logging and closed-loop recommendations begin.
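The timing figures above relate to each other as in the following worked arithmetic. Integer milliseconds are used to avoid rounding; the last quantity is only an upper bound on the number of SON intervals per scenario, since the exact scheduling after the warm-up is not stated.

```python
SNAPSHOT_MS = 100          # simulation snapshot length
SON_INTERVAL_MS = 2_000    # configured SON interval
SCENARIO_MS = 200_000      # duration of one traffic scenario
WARMUP_SNAPSHOTS = 40      # warm-up before logging starts

snapshots_per_scenario = SCENARIO_MS // SNAPSHOT_MS            # 2000
snapshots_per_son_interval = SON_INTERVAL_MS // SNAPSHOT_MS    # 20
warmup_ms = WARMUP_SNAPSHOTS * SNAPSHOT_MS                     # 4000 ms = 4 s
# Upper bound on SON intervals that fit after the warm-up in one scenario.
son_intervals_after_warmup = (SCENARIO_MS - warmup_ms) // SON_INTERVAL_MS  # 98
```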
A homogeneous cellular network scenario of an urban macrocell, similar to a simulation scenario specified by 3GPP [48], is considered so that the evaluations and observations could be comparable to real network deployments. Since the simulation involves several different mathematical models for the channel, antennas, mobility etc. (refer to Table II), the data may be less noisy as compared to that collected from real deployments. The simulator implements a wrap-around around the original simulated layout such that the simulated network is like a patch from a continuous cellular network. In real networks, a significantly larger number of observations may be required to train ML models with similar accuracies. Although basic, the Random-Walk mobility model could be quite sufficient for coverage and capacity computations. The learning of the incorporated ML models should not be very different even with highly realistic mobility models. In order to cover most of the types of trivial mobility scenarios in mobile networks, several different traffic scenarios of low, medium and high mobility have been simulated. Moreover, the design of the solution ensures that the monitoring and configuration enforcement is strictly local, and it would be interesting to see the performance of such a distributed solution in a heterogeneous network too.
A random exploration phase is first conducted by randomly configuring the values of the PS variables over all the simulated traffic scenarios and the corresponding observations are logged in a database. The observations comprise the values of ENV variables, PS variables and the corresponding LMO KPI variables. The values of RSRP cell and IFC cell are computed by taking a time average of the next five snapshot samples after any PS configuration is updated. Two-thirds of the data collected after the random exploration phase is used to train the three discussed models and one-third is used for evaluation of the models. A grid search is conducted for each of the models and the most suitable hyperparameters used for their training are specified in Table II.
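The two data-preparation steps mentioned above can be sketched as follows. The interpretation of "next five snapshot samples" as the five samples immediately following the update, the random shuffling before the split and all names are assumptions made for illustration.

```python
import random

def time_average(samples, update_index, window=5):
    """Average the `window` snapshot samples that follow a PS update."""
    nxt = samples[update_index + 1 : update_index + 1 + window]
    if len(nxt) < window:
        raise ValueError("not enough snapshots after the update")
    return sum(nxt) / window

def split_two_thirds(records, seed=0):
    """Shuffle and split records into (train, test) with a 2:1 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # ASSUMPTION: random split
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]
```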
In order to benchmark the effect of the application of the three discussed LMO KPIs, the SVD RecSys proposed in [34] is used online with the network simulator. The SVD RecSys model is trained for different values of α for the three versions of the LMO KPI and global values of the relevant network KPIs are monitored for comparison. This way the impact of the local behaviour of the RecSys models over the overall network targets can be observed.
Finally, the SVD RecSys proposed in [34] is treated as a baseline and the performance of the two discussed hybrid RecSys is compared with it. In addition, a static configuration with SFR applied and with one of the overall best-performing PSs is also simulated to show the network performance when there is no learning and dynamic configuration. In this case, a PS with PF centre = 1.0, ECB = 0.9 and DT ant = 15° is considered as one of the best PSs for the simulated network layout. Intuitively, maximum transmit power for centre UEs with SFR in place, and ECB close to cell-centre can ensure good signal strength for most of the UEs in the cell. Apart from that, a downtilt of 15° is observed to be the most suitable configuration for inter-site distances of 500 m [17].

VII. RESULTS AND OBSERVATIONS
In this section, first, a Pareto analysis of the three evaluated LMO KPIs is demonstrated using the SVD-RecSys proposed in [34] where α is varied by a resolution of 0.1. Then, the performances of the two hybrid RecSys are compared with the baseline approach and also with the case with Static Best PS + SFR.

A. Pareto Analysis of the Evaluated LMO KPIs
To analyze the impact of the three variants of the LMO KPI on coverage, interference and capacity, the following four parameters are observed: RSRP Outage, Mean Network RSRP, Mean Interference Detected and Mean Network SINR. RSRP Outage is the percentage of UEs that detect serving cell RSRP below a certain threshold (refer to Table II). RSRP Outage and Mean Network RSRP are observed to infer the status of coverage, Mean Interference Detected indicates the detected levels of ICI and Mean Network SINR is used to evaluate the capacity of the network. These four parameters (or Network KPIs) have been normalized to visualize them together. The readers can refer to Table V for their absolute values.
Analytically, LMO KPI 1 (refer to Eq. (5)) denotes that at higher values of α, the objective is to maximize the mean RSRP of the cell and at lower values of α, the emphasis should be on minimizing the interference created by the target cell to its neighbours. Although recommendations are generated locally by each RN agent to maximize the respective LMO KPIs, eventually the interest should be to monitor the overall effect of all the local changes on the network-level KPIs. The plots for LMO KPI 1 are depicted in Fig. 4. In congruence with Eq. (5), it can be seen in Fig. 4a that when α is raised from 0 to 1, the mean network RSRP (green) increases and so does the mean interference detected (red); in the opposite direction, both reduce. In other words, the trends of these two KPIs are proportional. This is intuitive because an increment in the signal strength of a cell also increases the ICI caused to its neighbours. Thus, the effect on these parameters is aligned with the chosen LMO KPI. But it is also important to pay attention to the effect on RSRP outages (blue), which increase as the value of α increases. It can be inferred that at higher values of α, the optimization algorithm is trying to increase the mean RSRP by focusing the signal strength on a limited number of UEs, which results in higher values of RSRP outages. Consequently, the mean RSRP improves but not the situation of coverage holes, i.e., RSRP outages. This can also be inferred from the selection of higher values of PF centre and DT ant (see Fig. 4b and 4c respectively) with increasing values of α. Higher values of DT ant indicate that the signal strength is focused on the UEs close to the RN. Since the coverage at the cell edges is not good, the resulting ICI caused is also limited. This way the signal quality is improved for the UEs in coverage and the mean SINR (orange) values are persistently high at higher values of α.
Then, the plots of LMO KPI 2 (refer to Eq. (6)) are depicted in Fig. 5. In this case also, the trends of mean RSRP and mean interference created (see Fig. 5a) are quite similar to the previous case, i.e., mean interference created and mean RSRP increase with increasing values of α. But in contrast to LMO KPI 1 , it can be observed that there is a check on the RSRP outages as the value of α increases. This is also accordingly reflected in the configuration recommendations of PF centre (Fig. 5b) and DT ant (Fig. 5c). For increasing values of α, the occurrences of high values of these parameters are comparatively fewer than in the previous case. Furthermore, considering the SINR plot as well, this can be seen as a better trade-off between interference and coverage. The highest values of SINR are around the middle values of α, and beyond a point, the SINR starts dropping again. Intuitively, this makes sense as increasing the coverage overlap between cells increases the interference and deteriorates the SINR.
The plots for the third case, i.e., LMO KPI 3 (refer to Eq. (7)), are presented in Fig. 6. Here, the outages are very low at higher values of α but the aggregate signal strength (RSRP) and signal quality (SINR) are compromised (see Fig. 6a). This must be because of the low recommended DT ant (Fig. 6c) and PF centre values (see Fig. 6b) compared to both the previous cases. The overall trends of the considered network KPIs follow those of the second case (RSRP and interference created increase with increasing α, SINR peaks towards the centre and then starts dropping again, and RSRP outages reduce as α increases) but here they seem to be more sensitive to the varying values of α. The rises and drops are quite steep. The values of SINR drop significantly beyond a certain point and it is difficult to define a good range of operation in this case. Therefore, LMO KPI 3 can lead to an overreactive system and it may not be a reasonable choice for every occasion.
It could be observed that a good range of operation for α would be somewhere between 0.4 and 0.6. Considering α = 0.5, the distributions of network-level RSRP Outage and SINR values for the three evaluated LMO KPIs, observed over one complete simulation iteration, are also compared (see Fig. 7). These plots provide a closer look into the effect of the three LMO KPIs over the two significant parameters indicating the status of coverage and capacity. Clearly, a concentrated distribution around the bottom-right corner would be the best situation and among the four cases that are compared, the distribution for LMO KPI 2 seems to be the most convincing trade-off. In this case, there is a significant reduction in outages (or coverage holes) with a minimal compromise on SINR (capacity). Overall, the variance is also comparatively low, which indicates that the situation is consistent most of the time and also across multiple traffic scenarios.
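One generic way to read such a coverage-capacity trade-off is to keep only the non-dominated operating points. The sketch below is a plain Pareto filter over (outage, SINR) pairs and is not the authors' analysis procedure; the sample points are hypothetical placeholders and do not come from the paper's results.

```python
def pareto_front(points):
    """Return the non-dominated (outage, sinr) points.

    A point dominates another if its outage is <= and its SINR is >=,
    with at least one strict inequality (lower outage and higher SINR
    are both better).
    """
    front = []
    for i, (o_i, s_i) in enumerate(points):
        dominated = any(
            (o_j <= o_i and s_j >= s_i) and (o_j < o_i or s_j > s_i)
            for j, (o_j, s_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((o_i, s_i))
    return front

# Hypothetical normalized (RSRP outage, mean SINR) pairs, one per alpha value.
candidates = [(0.9, 0.8), (0.5, 0.7), (0.6, 0.6), (0.2, 0.3)]
front = pareto_front(candidates)
```

Here the point (0.6, 0.6) is dropped because (0.5, 0.7) offers both a lower outage and a higher SINR.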

B. Overall Closed-Loop Performance Comparison
The LMO KPI 2 is selected as the objective function for optimization and then the RecSys-based SON-coordination solution (the baseline from [34]) is executed again with the current simulation setup, and the performance of the hybrid RecSys solutions is compared against it in terms of Mean Network RSRP, Mean Network SINR, RSRP Outage percentage and Radio Link Failures. α = 0.5 is configured for all the cases. For the sake of completeness, the Static Best PS + SFR scenario is also executed to compare the performance of the solutions with a static configuration where there is no dynamic learning and adjustment.
As discussed in the previous sections, the implicitly coordinated CCO-ICIC solutions try to optimize the LMO KPI for each cell without being aware of the global network state. If the selection of the LMO KPI is justified for the given problem and if the operation of the local RN agents is truly self-organized, then the global effect of the distributed solution should be stable and visible at the network level. Therefore, the network-level distributions of the four KPIs are presented in Fig. 8, 9, 10 and 11.
It can be observed that there is a significant improvement in the RSRP outages and Radio Link Failures (RLFs), which marginally pulls down the detected average signal strength (RSRP). This is to be expected, but the simultaneous improvement in mean network SINR shows that the reduction in coverage holes does not come at the cost of reduced capacity. These distribution plots cover the observations logged over one complete simulation iteration with the mentioned 11 traffic scenarios. The overall summary with the ensemble average of the four network KPIs is depicted in Fig. 12. Additionally, the TX power consumption for centre UEs is also compared to determine the efficiency of the solution in terms of power savings. This demonstrates how the dynamic adaptation of transmit power along with antenna downtilt can save energy wherever possible.

VIII. CONCLUSION
The concept of Self-Organizing Networks (SON) for autonomous operations and management of cellular networks has existed for almost a decade but there are only a few practical solutions that are capable of meeting the expectations of network operators. In the past few years, many researchers and engineers have been trying to exploit the rapidly enhancing capabilities of measurement, processing, storage and advanced ML-based optimization algorithms in this direction. Along with self-organization, another important aspect that needs to be tackled during network optimization is to achieve coordination between the closely related and conflicting network functions with diverse use cases and goals. The realization of such solutions will depend on how well they scale to the needs of future mobile networks like dynamic updates in the number and topologies of cell sites, number of RAN features and parameters, traffic patterns changing over time and geographic locations and so on.
In this work, while taking into consideration the generality and scalability of the solution design, the focus is to propose and evaluate a data-driven approach to jointly achieve the goals of two SON functions, CCO and ICIC, by following the design principles of self-organization and implicit SON-coordination. The definition of the local utility function that is expected to be optimized at each of the RNs is of utmost importance in this design. It should be carefully defined such that the local objectives of each site are fulfilled without compromising those of its neighbours and the overall network targets are eventually achieved efficiently. Apart from that, the comprehension of the state of the environment is also extremely critical. The attempt has been to ensure that the design is distributed with localized interactions and inferences, and that it can finally demonstrate an emergent behaviour while achieving adaptability and scalability. The merits of implicit SON-coordination in such a truly self-organized architecture are clear, as it can easily scale to multiple other intertwined SON functions and to any size of the network.
In this study, a thorough analysis of the multiple explored versions of LMO KPIs is conducted over the complete range of the introduced coverage-interference trade-off parameter. The recommendations of different PS configurations with respect to the adopted LMO KPIs demonstrate how the optimal parameter configuration space is altered according to the respective local goals. Finally, the closed-loop performance of the hybrid architectures of two kinds of RecSys, one based on matrix factorization and another using a deep autoencoder, employing the most appropriate LMO KPI variant is compared against the baseline state-of-the-art solution. In the proposed hybrid RecSys-based SON-coordination solutions, a deep neural network based regression model is exploited for the cold-start scenarios.
Significant overall improvements of around 36% have been observed in terms of coverage along with an increment of approx. 0.2 dB in SINR, which translates to around 2% improvement in capacity. In addition, the gain in terms of power savings using the proposed algorithms, while configuring the considered configuration parameters as a combination, is also noteworthy. The balance between coverage overlap and ICI becomes more critical for future networks that should facilitate the stringent requirements of Ultra-Reliable Low-Latency Communication (URLLC) applications [49]. In the future, it would be interesting to advance the hybrid SON-coordination solution towards a fully-distributed architecture with support for decentralized model training using Federated Learning, so that the bandwidth required for the huge transfers of network management data can be minimized. This would make the solution more scalable in terms of data handling and more efficient with respect to the communication resources required for network management.
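To give a feel for how a 0.2 dB SINR increment can map to a capacity gain of roughly 2%, the sketch below uses the Shannon spectral-efficiency formula log2(1 + SINR). This mapping and the 10 dB baseline SINR are assumptions chosen for illustration; the text does not state how the capacity figure was derived, and the resulting percentage depends on the baseline.

```python
from math import log2

def capacity_gain_percent(base_sinr_db, delta_db):
    """Relative Shannon-capacity gain (%) for a SINR increase of delta_db."""
    base = 10 ** (base_sinr_db / 10)                 # dB -> linear
    improved = 10 ** ((base_sinr_db + delta_db) / 10)
    return 100 * (log2(1 + improved) / log2(1 + base) - 1)

# ASSUMPTION: hypothetical 10 dB baseline SINR.
gain = capacity_gain_percent(base_sinr_db=10.0, delta_db=0.2)
```

With this hypothetical baseline the gain comes out a little under 2%, which is of the same order as the figure quoted above.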

APPENDIX
The absolute values of the relevant network KPIs observed with the three variants of the LMO KPI have been tabulated in Table V.