A Novel Stochastic Learning Automata Based SON Interference Mitigation Framework for 5G HetNets

Long Term Evolution Advanced (LTE-A) Heterogeneous Networks (HetNets) are an important aspect of 5th generation mobile communication systems. They consist of high-power macrocells along with low-power cells, i.e. picocells and femtocells, that fill macrocell coverage gaps. HetNets permit users to deploy femtocells for added flexibility, but interference between neighbouring cells then has to be addressed, as all femtocells transmit on the same frequency channels. To mitigate this problem, the LTE-A standard offers two new features: carrier aggregation, in which Component Carriers (CC) form the basic aggregate units shared among cells, and enhanced Inter-Cell Interference Co-ordination (eICIC) through the X2 interface. The physical implementation of these features is left open to research. This paper investigates two distinct techniques for orthogonal CC selection through Stochastic Cellular Learning Automata (SCLA) to improve the QoS performance of femtocells. The first technique uses SCLA with user feedback, and the second uses SCLA with a central publishing server to which all cells upload their past CC vectors. SCLA methods are well suited to Self Organizing Networks (SON) as they do not require synchronized cell coordination, have low complexity and have good optimization characteristics. The simulation results show that the techniques considerably enhance cell edge performance.


Introduction
Recent years have seen a tremendous increase in data traffic volume and demand. In 2016 alone, more than 10 exabytes of traffic per month are expected to circulate across cellular networks, with more than 4 billion 4G wireless subscriptions [1], [2]. Mobile networks are striving to keep pace with this high traffic demand. As 4G is being deployed, 3GPP has already started work on 5G, as depicted by its 5G timeline shown in Fig. 1. The most viable way to enhance network capacity in 4G and beyond is to increase cell density, as radio link efficiency is already approaching its theoretical limits [3]. Therefore, small low-power cells such as picocells, femtocells and relay nodes [4] have been introduced in Heterogeneous Networks (HetNets) to enhance macrocell capacity and the Quality of Service (QoS) of users. These small cells provide an economical and flexible way to fill coverage gaps and boost network coverage in existing operational areas.
HetNets have their own set of challenges, and one of the major ones is the mitigation of inter-cell interference, as macrocells and small cells use the same frequency band [5]. The problem is further aggravated by the fact that small cells such as femtocells will be deployed by users, which makes their integration into the network more complex [6]. For a large number of arbitrarily deployed femtocells, there is a strong need for an interference control strategy that is fully distributed, self-configuring and self-optimizing [7]. The interference problem in cellular networks has been extensively researched in the literature, as it has a direct impact on the QoS offered by a network operator. In the Long Term Evolution (LTE) and Long Term Evolution-Advanced (LTE-A) standards, Inter-Cell Interference Co-ordination (ICIC) and enhanced Inter-Cell Interference Co-ordination (eICIC) schemes are available to address this problem, but the level of precision and coordination they require between cells makes them less viable for practical implementation [1], [8]. The best candidates for such scenarios are machine learning and automation techniques, which form the basis of Self Organizing Networks (SON) for the mobile environment [9][10][11]. SON makes planning, configuration, management, optimization and healing of mobile radio access networks simpler and faster, and it is a core component of 3GPP Release 8 and later releases [12].

Fig. 1. 3GPP '5G' timeline (milestones at Feb 16, Feb 17, Jun 17, Oct 17, Oct 18 and Jun 19). Source: http://www.3gpp.org/news-events/3gpp-news/1674-timeline_5g
Femtocells use the same physical layer technology as macrocells and support the new Carrier Aggregation (CA) feature of LTE-A [1], [13]. CA is also an important area in 3GPP Release 12 and the upcoming Release 13 [14]. CA combines Component Carriers (CC) to provide a wider bandwidth of up to 100 MHz for all cells. To minimize interference in the network, a power control strategy that maximizes frequency orthogonality among neighboring cells at cell edges generally gives the best results. Thus, the CC resources of a femtocell need to be intelligently selected and power controlled so that they produce the least interference while meeting the QoS level set for edge users. For users close to the femtocell, the selection of carrier groups is less important because a lower transmit power level can be allocated to them. QoS improvement of edge users at each femtocell results in an overall edge-user performance gain at the network level. The authors of [15] concluded that the best way to achieve this with minimum complexity and signalling overhead between network elements is through non-assisted self-learning and adjusting methods. One of the most recent studies of this methodology [16] uses a stochastic learning algorithm whereby femtocells independently learn and select the CCs that result in minimum interference to the network. Its authors point out the following advantages of the stochastic learning approach: (i) it is distributed, (ii) it interacts dynamically with the environment, (iii) it saves energy, (iv) it requires no inter-cell information exchange, and (v) it has low complexity. The concept of Stochastic Cellular Learning Automata (SCLA) stems from Cellular Automata (CA), whereby cells act like points in a lattice (Cellular) and follow a simple local rule (Automata), here combined with stochastic learning [17][18][19][20]. When stochastic learning is combined with automata, the result is termed Stochastic Learning Automata (SLA).
In this paper we adopt two distinct SCLA approaches for each femtocell to enhance the QoE of edge users in the network. First, we apply SCLA with SINR feedback from users to help femtocells self-learn their interference environment and then adjust their CC selection vectors so that they are maximally orthogonal to those of neighboring cells. In the second approach, all femtocells publish their current CC selection information to a central server, from which any cell can retrieve the past CC selection information of its neighbors and learn their CC selection patterns. A femtocell can then make an informed probabilistic estimate of the future CC selection vectors of its neighboring cells and select its own CC vector to be orthogonal to the neighborhood selection, resulting in maximum SINR for its own users. Neither technique requires direct communication between cells. The first technique applies to individual cells, and no information is exchanged with other cells or the core network. The second technique, however, requires signalling between each cell and the central publishing server, which is accessible through the core network, but still not among the cells themselves. As the link between cells and the core network is generally a reliable and fast medium, the impact of this additional signalling will be minimal. We apply both approaches independently and in combination in this paper.
Our SCLA approach differs from existing stochastic methods [16] by presenting a generic method of selecting orthogonal CC vectors using orthogonal matrices. The dimension of the orthogonal matrix is based on the number of CC resources an operator can spare for its edge users. The publishing approach is also different, in the sense that cells learn from the past history of neighboring cells' CC vector selections instead of relying on improved inter-cell coordination to select a CC vector that results in minimum neighbor interference. This method assumes that the future CC selection of neighbors is not unpredictable but is rather some function of past selections. This assumption holds in HetNets, considering that most users do not change their position radically within a few network time steps. Neighboring cells are selected based on a predefined inter-cell distance, set by the operator, to minimize the complexity of neighbor selection. Finally, the users experiencing the worst SINR are given orthogonal channel resources, whereas the remaining users are given resources from channels whose power is limited to a fraction of the maximum power to further reduce channel interference. The simulation scenario is taken from [13] so that the results are as practical as possible.
The paper is organized as follows: Sec. 1 is the introduction, Sec. 2 gives the system model and problem statement, Sec. 3 and Sec. 4 discuss the theory behind stochastic cellular learning automata and the proposed approach, Sec. 5 presents the results achieved, and Sec. 6 concludes the paper.

Design Overview
A full urban LTE-A environment has been considered, with a macrocell, several femtocells and mobile User Equipments (UE), as shown in Fig. 2. Macrocell transmission is considered uniform over the whole frequency bandwidth at maximum power. Femtocells use CC selection to transmit over the whole or part of the total bandwidth, depending on user throughput requirements and user density. To achieve CC orthogonality between cells, a generalized approach based on orthogonal matrices is applied. The rows of the matrix give the orthogonal vectors, whereas the number of columns equals the number of CCs reserved for orthogonal vector selection. We transmit at maximum power on the CC or CCs of the selected orthogonal vector and at reduced power on the other channels. An identity (unity) matrix offers the maximum number of orthogonal vector choices but only one CC per vector, whereas any other orthogonal matrix offers one or more CCs per selected orthogonal row vector. Stochastic Learning Automata (SLA) learn and select the orthogonal vector that records the minimum interference from neighboring cells.
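To make the orthogonal-matrix idea concrete, the following Python sketch builds candidate sets of binary CC vectors and checks that they are mutually orthogonal. It assumes, purely for illustration, that a candidate set is formed by splitting the M reserved CCs into disjoint groups, so that the identity matrix corresponds to one CC per group; the helper names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def is_orthogonal_set(rows):
    """Return True if every pair of binary CC vectors has zero dot product,
    i.e. no CC is shared between any two vectors."""
    return all(
        sum(a * b for a, b in zip(r1, r2)) == 0
        for r1, r2 in combinations(rows, 2)
    )


def identity_vectors(m):
    """Rows of the M x M identity matrix: M orthogonal vectors, one CC each."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def vectors_from_groups(groups, m):
    """Hypothetical helper: turn disjoint groups of CC indices into binary
    row vectors of length M (one vector per group)."""
    return [[1 if cc in group else 0 for cc in range(m)] for group in groups]


if __name__ == "__main__":
    y_identity = identity_vectors(3)                   # 3 vectors, 1 CC each
    y_grouped = vectors_from_groups([{0, 1}, {2}], 3)  # 2 vectors, one with 2 CCs
    print(y_identity, is_orthogonal_set(y_identity))
    print(y_grouped, is_orthogonal_set(y_grouped))
```

The identity case gives the most choices (M vectors) but only one CC per vector, while grouping CCs gives fewer vectors, at least one of which carries more than one CC.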
To achieve better performance for edge users, the orthogonal CCs are allocated to the users with the worst SINR. The remaining users are then given resources from the remaining available CCs, with the next CC having the highest measured SINR assigned to the next worst-case user. In our system model, the data rate observed by a user is taken as the QoS indicator. We calculate and adjust CC power based on the target user data rate set for a particular simulation case. A sample power selection of a cell is shown in Fig. 3.
In the second approach, all femtocells publish their GPS location and CC selection vectors for every time step on a server. New cells quickly identify their neighboring cells from this central server based on a predefined distance criterion set by the operator. This distance-only neighborhood criterion is not optimal, as in reality the neighborhood depends on the actual radio environment [12], but it reduces the complexity of identifying neighboring cells in a large network. Femtocells can then retrieve the past CC vector selections of their neighbors from the publishing server and use them to further learn and refine their own orthogonal CC vector selection. In our system model, the CC vector publishing server is positioned at the macrocell, as shown in Fig. 2, but it can be placed anywhere in the network as deemed feasible.
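A minimal sketch of the publishing step follows, assuming planar coordinates in metres (converted from GPS) and an in-memory store; it needs Python 3.8+ for `math.dist`. The `PublishingServer` class, its methods and the `neighbor_cc_usage` summary are hypothetical illustrations rather than the paper's implementation, and counting how often neighbors used each CC is just one simple way a femtocell might summarize their past selections.

```python
import math
from collections import Counter


class PublishingServer:
    """Hypothetical in-memory publishing server: stores each cell's location
    and its per-time-step CC selection vectors."""

    def __init__(self):
        self.locations = {}  # cell id -> (x, y) in metres
        self.history = {}    # cell id -> list of (t, CC vector tuple)

    def publish(self, cell, location, t, cc_vector):
        self.locations[cell] = location
        self.history.setdefault(cell, []).append((t, tuple(cc_vector)))

    def neighbors(self, cell, d_max):
        """Cells within the operator-defined distance d_max of `cell`."""
        here = self.locations[cell]
        return sorted(
            other for other, loc in self.locations.items()
            if other != cell and math.dist(here, loc) <= d_max
        )

    def past_vectors(self, cell, since):
        """CC vectors published by `cell` at time steps >= since."""
        return [v for t, v in self.history.get(cell, []) if t >= since]


def neighbor_cc_usage(server, cell, d_max, since):
    """Count how often each CC index was selected by neighbors since `since`."""
    usage = Counter()
    for nb in server.neighbors(cell, d_max):
        for vec in server.past_vectors(nb, since):
            usage.update(i for i, bit in enumerate(vec) if bit)
    return usage
```

For example, after cells at (0, 0), (30, 40) and (300, 0) publish, a 100 m threshold makes only the second cell (50 m away) a neighbor of the first, and only its history contributes to the CC usage counts.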
With this publishing approach, precise inter-cell coordination is not required, because the published CC vector information of neighboring cells refers to past LTE frames rather than the next one.

Mathematical Modeling
Consider B as the bandwidth allocated to each femtocell.
B is subdivided into L common aggregate channels (CCs), such that the selection vector $X^j_t = [x^j_t(1), x^j_t(2), x^j_t(3), \ldots, x^j_t(L)]$ defines the current selection of aggregate channels for femtocell j at time t. Each element of the vector is either 1 or 0, i.e. selected or not selected. At least one CC must be selected, even if there are no mobiles in the coverage area, so the all-zero vector is excluded and the complete set X of selection vectors for any femtocell has cardinality $2^L - 1$.
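The cardinality argument can be checked by brute-force enumeration. This short Python sketch (the function name is illustrative) lists every admissible selection vector for a given L and confirms the count $2^L - 1$.

```python
from itertools import product


def selection_vectors(L):
    """All binary CC selection vectors of length L except the all-zero one,
    since at least one CC must always be selected."""
    return [v for v in product((0, 1), repeat=L) if any(v)]


if __name__ == "__main__":
    X = selection_vectors(3)
    print(len(X))  # 7, i.e. 2**3 - 1
```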
Out of the set X, a subset Y of mutually orthogonal selection vectors (every pair has a zero dot product) is defined, whose elements are used to construct an orthogonal matrix of dimension M × M. The identity matrix provides the maximum number of orthogonal vectors, i.e. M. Any other orthogonal matrix provides fewer than M orthogonal vectors and has at least one orthogonal vector containing more than one CC. For a specific value of M, the CCs reserved for Y can be chosen from the L available CCs in $\binom{L}{M}$ different combinations. In this paper we limit our focus to the number of orthogonal CCs selected and not their different combinations. The selection of Y for M = 3 and the corresponding orthogonal matrix are shown in Fig. 4. So the overall selection vector X becomes
The purpose of defining the vectors Y and X is to give users with low SINR CC resources at full power ($p_{max}$) from Y, and the rest of the users CC resources from X with power limited to a maximum of $\alpha p_{max}$. α can take any value between 0 and 1, but in this paper it is fixed at α = 0.7. In the special case where a femtocell has no users or only one user, it transmits at maximum power on one randomly selected CC from Y, which acts as the pilot channel. After the CCs from Y have been consumed by low-SINR users, the rest of the users are given CC resources such that the user with the highest interference level is assigned the CC with the lowest observed interference. The total power transmitted by a femtocell at time t, which depends on how many CCs it is transmitting on, is then given by the following equation.
Let the distance of UE k in a Closed Subscriber Group (CSG) of K users from its associated femtocell j at time t be denoted as $d^j_{t,k}$. Let $R_k$ be the minimum data rate that any user in the model must achieve to satisfy a QoS requirement predefined by the operator. We can then define an association vector for femtocell j as $\mathbf{a}^j_t = [a^j_t(1), a^j_t(2), \ldots, a^j_t(K)]$, where $a^j_t(k) \Rightarrow j$ means that UE k is associated with femtocell j at time t. Consider that a user k can be served with one or more CCs l by its associated femtocell j, depending on the resources available and the user QoS requirement. To estimate the data throughput observed by UE k over all allocated CCs, we first consider the SINR on one CC l, given as
where $Q_{t,k,l}$ is the aggregate interference observed by UE k on CC l at time t, $G_{k,a^j_t(k)}$ is the gain of femtocell j towards user k, which depends on the path loss component and $d^j_{t,k}$, and $N_0$ is the thermal noise power.
For simplicity, handover is not considered for the users. The total interference $Q_{t,k,l}$ for user k at time t is the aggregate interference received from other cells, i.e. macrocells and other femtocells, and is given as
where $G_{t,k,j}$ is the gain of any femtocell j towards user k at time t.
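The following Python sketch computes the per-CC SINR and the aggregate interference under an assumed conventional form: received power (transmit power times gain) divided by aggregate interference plus thermal noise, with all quantities in linear units. The explicit transmit-power arguments (`p_serving_w` and the powers inside `interferers`) are assumptions of this illustration, since the text above defines only the gains, interference and noise.

```python
import math


def aggregate_interference(interferers):
    """Q_{t,k,l}: sum of received power from all other cells (macro and femto)
    on CC l. `interferers` is a list of (tx_power_w, gain) pairs."""
    return sum(p * g for p, g in interferers)


def sinr(p_serving_w, gain_serving, interferers, noise_w):
    """Assumed SINR of UE k on CC l: p * G / (Q + N0), all in linear units."""
    q = aggregate_interference(interferers)
    return p_serving_w * gain_serving / (q + noise_w)


def to_db(x):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(x)
```

With no interferers the expression reduces to a noise-limited SNR, which is the situation the orthogonal CC selection tries to approach for edge users.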
We now define the utility function U of UE k on CC l based on the modified Shannon capacity, as given in the equation below. The values of $BW_{eff}$ (bandwidth efficiency) and $SINR_{eff}$ (SINR efficiency) are set to 0.56 and 2, respectively, for the LTE environment [21].
Thus the utility function of user k over all allocated CCs l is the sum of its per-CC utilities, $U_{t,k} = \sum_{l} U_{t,k,l}$. Note that the utility function gives an estimate of the data rate received by user k. Our optimization objective is to maximize the overall utility $U_t$ of all edge users while meeting the minimum data rate requirement $R_k$ for all network users. We define a variable $0.7 < \rho \le 1.0$ such that users whose distance from their associated femtocell j satisfies $d^j_{t,k} \ge \rho D$, where D is the femtocell coverage radius, are treated as edge users. In this paper we assume that only one type of femtocell is deployed, so all femtocells have the same coverage area. Thus the problem statement can be framed as follows.
Problem Statement. To maximize the mean of the utility function for the edge users while meeting the desired data rate QoS for the rest of the users.
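The utility computation and edge-user test can be sketched in Python as follows. The modified-Shannon form $BW_{eff} \cdot BW \cdot \log_2(1 + SINR/SINR_{eff})$ is the usual one associated with [21], but passing a single per-CC bandwidth and reading D as the common femtocell coverage radius are assumptions made for this illustration.

```python
import math

BW_EFF = 0.56   # bandwidth efficiency for LTE, value quoted from [21]
SINR_EFF = 2.0  # SINR efficiency for LTE, value quoted from [21]


def cc_utility(sinr_linear, cc_bandwidth_hz):
    """Assumed modified-Shannon utility (bit/s) of one CC:
    BW_eff * bandwidth * log2(1 + SINR / SINR_eff)."""
    return BW_EFF * cc_bandwidth_hz * math.log2(1 + sinr_linear / SINR_EFF)


def user_utility(sinr_per_cc, cc_bandwidth_hz):
    """U_{t,k}: sum of per-CC utilities over all CCs allocated to the user."""
    return sum(cc_utility(s, cc_bandwidth_hz) for s in sinr_per_cc)


def is_edge_user(distance_m, coverage_radius_m, rho):
    """Edge user if d >= rho * D, with 0.7 < rho <= 1.0 (D assumed = radius)."""
    if not 0.7 < rho <= 1.0:
        raise ValueError("rho must satisfy 0.7 < rho <= 1.0")
    return distance_m >= rho * coverage_radius_m
```

As a quick check, with SINR equal to $SINR_{eff} = 2$ the logarithm is exactly 1, so a hypothetical 20 MHz CC yields 0.56 × 20 MHz = 11.2 Mbit/s.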

Application of Stochastic Cellular Learning Automata to HetNet

Cellular Learning Automata
Cellular Automata (CAs) are mathematical models for systems consisting of large numbers of simple identical components with local interactions. The simple components act together to produce complex emergent global behavior. A CA is called cellular because it is made up of cells, like points in a lattice, and automata because the cells follow a simple local rule [17]. Learning Automata (LAs), on the other hand, are by design "simple agents for doing simple things". In [18], CA and LA are combined into a new model called Cellular LA (CLA). This model is superior to a simple CA because of its ability to learn, and superior to a single LA because it consists of a collection of LAs that can interact with each other. CLAs are particularly suitable for modeling natural systems that can be described as massive collections of simple objects interacting locally with each other. Thus CLAs are also applicable to femtocells in a Self Organizing Network (SON) environment [20]. CLAs can be classified into two groups, synchronous CLA and asynchronous CLA (ACLA) [19], depending on whether the cells update their states synchronously from one global clock or independently. In a HetNet, femtocells can update their states from a finite set of states, synchronously or asynchronously, according to a local rule. This paper uses the synchronous CLA model.
Mathematically, an r-dimensional synchronous CLA environment with J femtocells is a structure $\Omega = (Z^r, \Phi, A, N, F)$. In this structure, $Z^r$ is a lattice of r-tuples of integers; for our 2D synchronous CLA system model, r = 2. Φ is a finite set of states. A is the set of LAs, each of which is assigned to one femtocell of the CLA environment. $N = \{c_1, c_2, c_3, \ldots, c_n\}$ is a finite subset of $Z^r$ called the neighborhood vector. $F: \Phi^n \to \beta$ is the local rule of the CLA, which maps the combined neighborhood action $\Phi^n$ to the reinforcement signal β [20]. The femtocell local rule computes the reinforcement signal for each LA from performance indicator feedback. Our performance indicators are the bit rate measured by individual UEs and the femtocell's own recorded throughput, which is simply the aggregate of the data rates achieved by the individual UEs in its coverage area. β is computed as a binary indicator with status either optimum or not-optimum: it is optimum when the feedback obtained from the environment favours our learning decision, and not-optimum otherwise.
The state of all femtocells in the CLA environment is described by a configuration. The local rule and the initial configuration of the CLA together specify its evolution, i.e. how the configuration changes at every step. The configuration of the complete CLA model at time step t is denoted by $p_t = (p_t(1), p_t(2), \ldots, p_t(J))$, where $p_t(j) = (p_t(j1), p_t(j2), \ldots, p_t(jn))$ is the action probability vector of LA $A_j$. A configuration p is deterministic if the action probability vector of each LA is a unit vector; otherwise it is probabilistic. Hence, the set of all deterministic configurations $K^*$ and the set of all probabilistic configurations K in the CLA are $K^* = \{p \mid p(j,y) \in \{0,1\}\ \forall y, j\}$ and $K = \{p \mid p(j,y) \in [0,1]\ \forall y, j\}$, respectively, where $\sum_y p(j,y) = 1$ for all j.
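The distinction between $K^*$ and $K - K^*$ can be checked mechanically. In this Python sketch a configuration is represented as a list of per-LA action probability vectors; the representation and function names are illustrative assumptions.

```python
import math


def is_valid_configuration(config, tol=1e-9):
    """Configuration in K: every LA's probabilities lie in [0, 1] and sum to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in probs)
        and math.isclose(sum(probs), 1.0, abs_tol=tol)
        for probs in config
    )


def is_deterministic(config):
    """Configuration in K*: every LA's probability vector is a unit vector."""
    return all(
        all(p in (0.0, 1.0) for p in probs) and sum(probs) == 1
        for probs in config
    )
```

Any valid configuration with at least one non-unit vector, such as one LA holding [0.5, 0.5, 0.0], lies in $K - K^*$, which is the starting region assumed by the steady-state theorems that follow.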
The CLA operates as follows at every time step t. At iteration t, each LA $A_j$ chooses one action from its set of actions $\phi_j$, whose cardinality is $n_j$. Applying the local rule to every cell transitions the CLA configuration to a new one. Then, based on the feedback for the action taken, every $A_j$ receives a reinforcement signal $b_j \in \beta$ derived from the set performance indicators. If $b_j$ is optimum, the action chosen by $A_j$ receives a positive reward.
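A single synchronous CLA step with linear reward-inaction ($L_{R-I}$) updates can be sketched in Python as below. The local rule used here (reward a cell whose chosen action differs from all of its neighbors' actions) is a hypothetical stand-in for the femtocell rule F, chosen because it mirrors the goal of orthogonal CC selection; the neighborhood dictionary and the learning parameter are also illustrative.

```python
import random


def lri_update(p, chosen, a, rewarded):
    """Linear reward-inaction (L_R-I): on reward, move probability mass toward
    the chosen action; on penalty, leave the vector unchanged."""
    if not rewarded:
        return list(p)
    return [pi + a * (1 - pi) if i == chosen else pi * (1 - a)
            for i, pi in enumerate(p)]


def local_rule(cell, actions, neighbors):
    """Hypothetical local rule F: reward if no neighbor chose the same action."""
    return all(actions[cell] != actions[n] for n in neighbors[cell])


def cla_step(probs, neighbors, a, rng):
    """One synchronous step: every LA picks an action, then all LAs are
    updated together from the same set of actions."""
    actions = {c: rng.choices(range(len(p)), weights=p)[0]
               for c, p in probs.items()}
    new_probs = {
        c: lri_update(p, actions[c], a, local_rule(c, actions, neighbors))
        for c, p in probs.items()
    }
    return new_probs, actions


if __name__ == "__main__":
    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # three mutually adjacent cells
    probs = {c: [1 / 3] * 3 for c in neighbors}     # uniform start, in K - K*
    rng = random.Random(1)
    for _ in range(3000):
        probs, actions = cla_step(probs, neighbors, 0.01, rng)
    print(actions, [round(max(p), 3) for p in probs.values()])
```

Because every LA samples its action before any LA is updated, the step is synchronous in the sense described above; an asynchronous CLA would instead update cells one at a time.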

Steady State Condition for Cellular Learning Automata.
The following two theorems state the steady-state behavior of a CLA when each cell uses the linear reward-inaction ($L_{R-I}$) algorithm. Proofs of these theorems can be found in [22].
Theorem 3.1. Suppose that there is a bounded differentiable function $D: \mathbb{R}^{n_1 + n_2 + \cdots + n_J} \to \mathbb{R}$ over the action probabilities of all LAs $A_j$ in the cells forming a CLA A such that, for some constant c > 0, $\frac{\partial D}{\partial p_{jv}}(p) = c\,\mu_{jv}(p)$ for all j and v, where p is the generic configuration of the CLA, $p_{jv}$ is the probability that LA $A_j$ selects action v, and $\mu_{jv}(p)$ is the average reward of that action for LA $A_j$. Then, for any initial configuration in $K - K^*$ and a sufficiently small learning parameter ($\max\{\tau\} \to 0$), the CLA always converges to a configuration that is stable and compatible, where $\tau_j$ is the learning parameter of LA $A_j$.
Theorem 3.2. A synchronous CLA that uses a uniform and commutative rule, starting from $p(0) \in K - K^*$ and with a sufficiently small learning parameter ($\max\{\tau\} \to 0$), always converges to a deterministic configuration that is stable and compatible.
If the CLA satisfies the sufficient conditions of Theorems 3.1 and 3.2, it converges to a compatible configuration; otherwise, convergence to a compatible configuration cannot be guaranteed and the CLA may exhibit limit-cycle behavior [23]. A configuration is compatible when no LA in the CLA has any reason to change its action.

Advantage of Stochastic Learning in CA
Stochastic learning automata (SLA) are finite state machines that can learn from both stationary and non-stationary environments to achieve better performance [24]. Modern urban environments are crowded and characterized by many skyscrapers and underground buildings, which results in a number of blind spots and low-coverage areas. Femtocells help alleviate this problem by filling these coverage gaps and allowing users to deploy cells as per their requirements. Thus, femtocells can enhance end-user satisfaction, but at the network level their transmissions need to be coordinated so that inter-cell interference is minimized. A model in which interference is controlled through coordination between femtocells can become complex and inefficient as cell density increases and cities grow. Using stochastic learning in CLA, i.e. Stochastic Cellular Learning Automata (SCLA), femtocells can self-learn the changing radio environment and adapt to minimize interference. An SLA learns from environment feedback and improves its selection. As there is no predetermined relationship between SLA actions and responses, no closed-form system model is required. Also, as an SLA operates at each femtocell, the overall SCLA design is fully distributed and dynamic. Thus SLA is appropriate for our HetNet system model, where cell deployment is user defined and not under operator control. The learning process is repeated iteratively until the SLA reaches a stable condition. We adjust the state probabilities stochastically in any $K - K^*$ CLA configuration such that $\sum_y p_{j,y} = 1$, and keep the learning rate of the algorithm of the order of $10^{-3}$ for every femtocell j. With these conditions satisfied, Theorems 3.1 and 3.2 indicate that the SCLA configuration will converge to a stable condition for the network.

Proposed SCLA Approach
In this section we present the proposed approach based on SCLA. The system model based on this approach is distributed, operates in real time and interacts dynamically with its environment. The design exhibits emergent behavior, as it can self-learn from the feedback received from users and neighboring femtocells and adapt accordingly. SCLA picks one CC from the M orthogonal vectors in Y according to the probability vector $q^j_t = [q^j_t(1), q^j_t(2), \ldots, q^j_t(M)]$ for femtocell j at time t. All UEs in the femtocell are then sorted by SINR, and the UE experiencing the maximum interference is allocated this orthogonal CC at maximum power P. For the remaining UEs, the next worst-case UE is allocated the CC with the minimum recorded aggregate interference among the remaining available CCs. For these UEs, the CC transmit power is calculated from the target user data rate $R_k$ using (4) and is limited to a maximum of αP, as highlighted in Sec. 2. The target value of $R_k$ corresponds to a QoS level predefined by the operator for every user k in the network. This process is repeated until all UEs have been satisfied.
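The allocation step described above can be sketched in Python as follows. It is a simplified illustration under stated assumptions: each UE receives a single CC, the orthogonal CC is excluded from `free_cc_interference`, UEs beyond the number of available CCs are left unallocated, and `required_power` is a hypothetical callable standing in for the power calculation of (4).

```python
def allocate(ue_sinr, orth_cc, free_cc_interference, p_max, alpha, required_power):
    """Return {ue: (cc, power)} for one femtocell at one time step.

    ue_sinr: UE id -> measured SINR (linear)
    orth_cc: CC drawn from the orthogonal set Y for this step
    free_cc_interference: CC -> recorded aggregate interference (orth_cc excluded)
    required_power: hypothetical callable, UE id -> power needed to reach R_k
    """
    order = sorted(ue_sinr, key=ue_sinr.get)  # worst SINR first
    allocation = {}
    if not order:
        return allocation
    # Worst-case UE: orthogonal CC at full power.
    allocation[order[0]] = (orth_cc, p_max)
    # Remaining CCs, least interfered first.
    cleanest_first = sorted(free_cc_interference, key=free_cc_interference.get)
    # Next worst UE gets the next cleanest CC, power capped at alpha * p_max.
    for ue, cc in zip(order[1:], cleanest_first):
        allocation[ue] = (cc, min(required_power(ue), alpha * p_max))
    return allocation
```

For example, with SINRs {a: 5, b: 1, c: 3}, UE b receives the orthogonal CC at full power, UE c the least-interfered remaining CC, and UE a the next one, each non-orthogonal power being capped at αP.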
Once the CC states have been selected for the femtocells, we update the probability vector using a pursuit algorithm [25]; specifically, we use the Discrete Pursuit Reward Inaction (DPRI) algorithm, which previous work has shown to have good convergence properties. SCLA learns from two types of environment feedback: the combined interference plus noise observed per CC by all users, named Scheme 'A', and the orthogonal vector selections of neighboring cells available from the publishing server, named Scheme 'B'. In Scheme A, a positive or negative reinforcement signal β is generated if the observed data rate of the worst user increases or decreases relative to the previous time slot. In Scheme B, a positive or negative reinforcement signal β is generated if the observed cell throughput increases or decreases relative to the previous time slot. The equation for β for Scheme A at any time slot t is given in (7):
$$\beta_t = \begin{cases} \text{optimum}, & U_k(t) \ge U_k(t-1) \\ \text{not-optimum}, & \text{otherwise} \end{cases} \qquad (7)$$
where k is the worst-case UE.
The equation for β for Scheme B at any time slot t is given in (8):
$$\beta_t = \begin{cases} \text{optimum}, & \sum_k U_k(t) \ge \sum_k U_k(t-1) \\ \text{not-optimum}, & \text{otherwise} \end{cases} \qquad (8)$$
where the sums run over all UEs k of the femtocell, i.e. they give the cell throughput.

Algorithm 1 Femtocell Stochastic Automata Orthogonal Component Carrier Selection
    …
    trialState ← y(m)
    Arrange cell UEs in order of increasing SINR
    worstCaseUE ← UE_SINR(1)
    Assign orthogonal CC and power to the worst-case UE:
        Channel(worstCaseUE) ← y(m)
    …
        Power(x(l)) = powerReq
    end if
    end for
    t ← t + 1
    Get feedback and update the q_t vector:
    if Scheme A is valid
        β ← (U_worstCaseUE(t) ≥ U_worstCaseUE(t − 1))
        update q_t using (9) with Δ_A
    if Scheme B is valid
        β ← (Σ_k U_k(t) ≥ Σ_k U_k(t − 1))
        update q_t using (9) with Δ_B
    until t = End Simulation

Thus the CC probability vector is updated according to the following equation in both cases.
where $\Delta_A$ and $\Delta_B$ are the learning rates for SCLA Schemes A and B, respectively. As the two SCLA schemes are independent of each other, the system model can use them separately or in unison. When the schemes are combined, a single CC probability vector is used for orthogonal CC selection.
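The reinforcement signals of Schemes A and B and a pursuit-style update can be sketched in Python as below. The update follows the textbook discretized pursuit reward-inaction form: on reward, every action other than the one with the highest estimated reward loses a fixed step Δ and that best action collects the remainder; on penalty, the probability vector is left unchanged. The paper's update equation (9) may differ in detail, so treat this as an illustrative assumption rather than the authors' exact rule; the class and function names are hypothetical.

```python
class DPRIAutomaton:
    """Discretized pursuit reward-inaction over the M orthogonal vectors in Y."""

    def __init__(self, m, delta):
        self.q = [1.0 / m] * m  # probability vector q_t, uniform start
        self.delta = delta      # step size, e.g. Delta_A or Delta_B
        self.rewards = [0] * m  # times each action was rewarded
        self.counts = [0] * m   # times each action was chosen

    def choose(self, rng):
        """Sample an action index according to q (rng: a random.Random)."""
        return rng.choices(range(len(self.q)), weights=self.q)[0]

    def update(self, chosen, beta):
        # Reward estimates are updated on every step...
        self.counts[chosen] += 1
        self.rewards[chosen] += int(beta)
        if not beta:
            return  # ...but q only changes on reward (inaction on penalty).
        est = [w / z if z else 0.0 for w, z in zip(self.rewards, self.counts)]
        best = max(range(len(est)), key=est.__getitem__)
        for i in range(len(self.q)):
            if i != best:
                self.q[i] = max(self.q[i] - self.delta, 0.0)
        self.q[best] = 1.0 - sum(p for i, p in enumerate(self.q) if i != best)


def beta_scheme_a(u_worst_now, u_worst_prev):
    """Scheme A: optimum if the worst-case UE's data rate did not decrease."""
    return u_worst_now >= u_worst_prev


def beta_scheme_b(ue_rates_now, ue_rates_prev):
    """Scheme B: optimum if the cell throughput (sum of UE rates) did not decrease."""
    return sum(ue_rates_now) >= sum(ue_rates_prev)
```

Starting from a uniform vector over M = 4 actions with Δ = 0.05, one rewarded choice of action 2 moves its probability to 0.4 and the other three to 0.2 each.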
As the simulation time step t progresses, each femtocell continues to learn and improve its orthogonal CC selection until it reaches an optimal level where the probability of the best CC vector almost reaches unity. Reaching this condition is desirable if the neighboring cells do not change their orthogonal CC vector selections. When such a network state has been reached, the system can be considered stable. However, the model retains its dynamic nature, as it can still respond to any new change in the environment.
To give the system enough time to explore the environment, we initialize the probability vector uniformly and allow each state to get at least one chance in E trials with a confidence level CL. This is achieved if E satisfies the condition given in the following equation.
Thus, we choose an appropriate value of E to satisfy the above condition.The pseudo-code of our system is given in Algorithm 1.
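One natural reading of this condition, assumed here for illustration, is that a specific state must be drawn at least once in E uniform trials with probability at least CL, i.e. $1 - (1 - 1/M)^E \ge CL$. Under that assumption E follows in closed form, as in this Python sketch (the function name is illustrative); requiring every state to appear with confidence CL would need a larger E.

```python
import math


def min_trials(m, cl):
    """Smallest E such that, with uniform selection over M states, a given
    state is chosen at least once in E trials with probability >= CL:
    1 - (1 - 1/M)**E >= CL  =>  E >= ln(1 - CL) / ln(1 - 1/M)."""
    if m < 2 or not 0.0 < cl < 1.0:
        raise ValueError("need M >= 2 and 0 < CL < 1")
    return math.ceil(math.log(1.0 - cl) / math.log(1.0 - 1.0 / m))


if __name__ == "__main__":
    print(min_trials(4, 0.95))  # 11
```

With M = 4 and CL = 0.95 this gives E = 11, since $0.75^{11} \approx 0.042 < 0.05$ while $0.75^{10} \approx 0.056$.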

Results
Simulation Environment. A downlink LTE-A scenario was simulated in MATLAB, as shown in Fig. 5. The simulation parameters used in this paper are given in Tab. 1 and are taken from [13]. Omnidirectional antennas are assumed for all cells. A full-buffer traffic model, with infinite data packets in the queue of each femtocell, is applied. Femtocell users are configured in closed subscriber group (CSG) mode, i.e. only users included in the CSG access control list of a femtocell are given CC resources.
Learning Rate. SCLA Scheme A with user feedback was taken as the reference for selecting a single learning rate for the simulated environment. $R_k$ was set to 40 Mbps. The performance of edge users was observed while varying the SCLA learning rate $\Delta_A$ from 0.0001 to 0.05. The results are presented in Fig. 6.
Fig. 6 shows that a learning rate of $\Delta_A$ = 0.005 gives the best overall performance, with the highest observed average edge-user data rate of 28.91 Mbps. The average data rate achieved for all users at this learning rate is also the highest, around 37 Mbps. Above and below this value, SCLA performance is not optimal. For very low learning rates, system performance deteriorates considerably, as the femtocells are unable to cope with changes in the environment. Higher learning rates (above 0.05) were not tested, because Theorems 3.1 and 3.2 guarantee convergence only for sufficiently small learning rates and larger values can lead to unstable behavior. Thus, 0.005 was selected as the learning rate for both SCLA schemes A and B for the rest of the simulations. Where schemes A and B are combined, a learning rate of 0.002 is used so that the combined effect can be compared with the cases where only one SCLA scheme is applied.
Cardinality of Orthogonal CC Set. The average data rate achieved by edge users and by all users for different sizes of the orthogonal set Y is shown in Fig. 7. The results show that reserving fewer CCs for the orthogonal set Y, i.e. around 3 to 5, gives better edge-user performance than reserving more. The best performance is achieved when M = 4. The value of M that gives the best edge-user result also gives the best result for all users, because a lower level of interference in the environment improves the SINR of all users.
Variable Target User Data Rate R. The simulation results for different target user data rates $R_k$ are shown in Fig. 8 and Fig. 9 for edge users and all users, respectively. The figures compare network performance for four cases: (1) non-learning, (2) SCLA with user feedback, (3) SCLA with cell publishing and (4) the combination of SCLA with user feedback and cell publishing. In both graphs, four distinct regions of achieved user data rate versus target data rate can be identified for the non-learning case and the three SCLA learning cases. The first region, labelled A, corresponds to the range where the measured average user data rate exceeds the required target data rate in all four cases. In this region, although the three SCLA learning cases perform better than the non-learning case, the required target data rate is met, so all cases have acceptable performance. The second region, labelled B, corresponds to the range where the non-learning case cannot meet the required target data rate, but the three SCLA learning cases meet or exceed it. The third region, labelled C, corresponds to the range where all four cases fail to meet the target data rate, but the three SCLA learning cases perform much better than the non-learning case. The fourth region, labelled D, is the range where none of the four cases meets the target data rate and all have almost equal performance. This is because, to satisfy the high target data rate, all femtocells have to give their users CC resources at the maximum possible power, so the overall interference level in the network becomes so high that learning gives no significant advantage. Comparing Fig. 8 and Fig. 9, the average data rate observed for all users (Fig. 9) continues to increase and match the target values, whereas the edge-user curve (Fig. 8) starts to fall around the middle of the range. This dip in edge-user performance is caused by the rising interference level in the environment as all femtocells start to operate at maximum capacity to match the target user data rate. This observation is similar to [26], [27], where performance degrades as traffic density and user demand increase. However, a combination of new disruptive 5G technologies such as directional antennas, massive MIMO and millimetre-wave spectrum can provide more frequency bandwidth and higher data rates while at the same time reducing interference, as pointed out in [28]. Hence, applying our SCLA techniques alongside such technologies may yield significant performance improvement at higher data rates. It can also be seen that region B is wider for edge users than for all users, which shows that SCLA learning benefits edge users more than users as a whole.
Comparing the three SCLA learning cases in Fig. 8 and Fig. 9, the SCLA case based only on user feedback generally performs better for edge users than SCLA with cell publishing and the combined SCLA case. This is because its learning is based solely on the performance of the worst users, who are located at the extreme cell edges. SCLA with cell publishing lags behind the other two cases in both figures because it ignores the actual interference situation in the coverage area. Instead, it learns only from the published CC vectors of neighbouring cells, with the neighbourhood defined by a simple distance parameter, and from the femtocell's own throughput. However, the combined SCLA case shows that adding the users' SINR feedback to the published CC vectors of neighbouring cells outperforms either individual SCLA case for all users (Fig. 9). This shows that improving edge-user performance benefits all users in the network through more refined orthogonal CC selection.
Performance Versus Femtocell Density. Fig. 10 and Fig. 11 show the average data rate achieved by edge users and by all users, respectively, as femtocell density increases, for the four cases: non-learning, SCLA with user feedback, SCLA with cell publishing and combined SCLA. Both figures show that as cell density increases, the performance of the non-learning and SCLA learning cases falls and converges. The reason is that higher cell density also brings more network users, which raises overall transmission and interference in the network. The point of convergence is around 0.001 cells/m² in both figures. Among the three SCLA cases, SCLA with user feedback performs best for edge users, whereas the combined SCLA case gives the best results for all users. These observations match those for variable target user data rate, because both situations affect the network in the same way: more traffic demand requires more network resources, which leads to more CC transmissions and higher network interference.
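The SCLA case with user feedback, compared above, learns which orthogonal CC choice to use from the outcomes experienced by the femtocell's worst users. This section does not give the authors' exact SCLA update rule. The sketch below therefore swaps in the textbook linear reward-inaction (L_R-I) automaton for a single femtocell. It ignores the cellular coupling between neighbouring automata. The names `lri_update` and `learn_cc_choice`, the learning rate `a`, and the per-option `success_prob` values are hypothetical. `success_prob` stands in for the real SINR feedback from users.

```python
import random


def lri_update(p, chosen, rewarded, a):
    """Linear reward-inaction (L_R-I) update of an action-probability vector.

    On reward, probability mass moves toward the chosen action; on penalty
    the vector is left unchanged (the "inaction" part). The sum stays 1.
    """
    if not rewarded:
        return p
    return [pi + a * (1.0 - pi) if i == chosen else (1.0 - a) * pi
            for i, pi in enumerate(p)]


def learn_cc_choice(success_prob, a=0.01, steps=5000, seed=0):
    """Toy femtocell learning which orthogonal CC option to transmit on.

    success_prob : hypothetical probability, per CC option, that the cell's
                   worst (edge) user meets its target rate; a stand-in for
                   real SINR feedback, not a value from the paper
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    k = len(success_prob)
    p = [1.0 / k] * k                    # start from a uniform choice
    for _ in range(steps):
        chosen = rng.choices(range(k), weights=p)[0]
        rewarded = rng.random() < success_prob[chosen]  # worst-user feedback
        p = lri_update(p, chosen, rewarded, a)
    return p
```

Calling `learn_cc_choice([0.2, 0.9, 0.3, 0.1])` should concentrate most of the probability on index 1, the least-interfered option in this toy environment. Reward-inaction is used here because it does not react to penalties, which makes it tolerant of noisy negative feedback. For a small learning rate it is known to be ε-optimal in a stationary environment.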
Empirical CDF Plots For Edge Users.The CDF plots for edge users at the distance of 5m from cell edges, with a target user data rate of 25 Mbps and cell deployment density of 0.0003 cells/m 2 is shown in Fig. 12.The graph clearly shows the advantage of combined SCLA learning case (SCLA scheme A + B) over no learning case.For almost all the observed edge user's data rates the probabilities of combined SCLA are higher as compared to the case where learning is not employed.For no learning case, the users beyond the target data rate of 25 Mbps is almost negligible whereas in the learning cases there are some users with data rates that are higher than the given target.This is mainly due to the reduction in the interference levels among neighboring cells because of SCLA.

Conclusions
The simulation results clearly show that (1) stochastic learning algorithms benefit edge users compared with non-learning cases when user target data rates are not high and femtocell deployment density is not intense; (2) using fewer orthogonal channels performs better than a scheme in which all CCs are used for orthogonal vector selection; (3) the new cell publishing scheme can be combined with user feedback using stochastic automata to further boost cell performance for generic users, although the neighbourhood area needs to be defined appropriately; and (4) the two stochastic automata schemes, based on user feedback and on cell publishing, can be used individually or in unison to benefit the performance of the cell.

Fig. 1. Tentative 3GPP timeline for 5G. [Timeline stages shown: initial submissions of proposals; requirements; evaluation; evaluation of solutions; RAN WG WI: specification of solutions.]

Fig. 2. HetNet environment model with deployed femtocells and a central server for femtocell CC vector publishing.

Fig. 3. Femtocell orthogonal CC selection with power distribution for all its users.
as edge users, where $D_{\max}$ is the maximum coverage range of femtocell $j$.

Fig. 6. User performance with different learning rates for SCLA with user feedback.

Fig. 7. Average user data rates with different numbers of orthogonal channels.

Fig. 8. Performance with variable data rate for edge users.

Fig. 9. Performance with variable data rate for all users.