Maximizing the Energy Efficiency of Virtualized C-RAN via Optimizing the Number of Virtual Machines

In cloud radio access networks (C-RAN), more accurate prediction of the number of virtual machines (VMs) one server can support would improve network capacity and energy efficiency (EE). In this paper, the problem of allocating an optimal number of VMs to the cloud server is introduced. Monte Carlo-based evolutionary algorithms [particle swarm optimization (PSO), quantum PSO, or genetic algorithm] are used to find the suboptimal number of VMs that optimizes the EE of C-RAN. To enable such evaluation, a power model is proposed to evaluate the power consumption of each unit within a virtualized server. This evaluation is carried out under an increasing number of hosted VMs and of processed resource blocks (RBs) at each VM. Moreover, power allocation methods are proposed to transmit the power from the base band unit pool to the remote radio heads (RRHs), and from the RRHs to the users (UEs). This allocation is based on a combination of one or more of RRH distance, RRH channel gain, UE distance, UE channel gain, and UE path loss. The EE problem is constrained by crucial quality of service indicators, including minimum UE data rate, number of allocated RBs, and the latency imposed by virtualization.


Raad S. Alhumaima, Riyadh Khlf Ahmed, and Hamed S. Al-Raweshidy

Index Terms-Virtualisation, optimisation, BBU pool, power allocation, cloud radio access networks, energy efficiency, virtual machines.

I. INTRODUCTION
DRIVEN by the need to provide at least 10 times higher spectral and energy efficiency (EE) in 5G networks, mobile operators have deployed a large number of small cells in heterogeneous networks. Whilst this has increased network capacity, it has also led to the consumption of more power. In order to reduce this consumption, the cloud radio access network (C-RAN) architecture has been proposed [1], [2]. In C-RAN, the base band unit (BBU) servers are responsible for processing the upper layers and most of the physical layer functions, including radio frequency (RF), base band digital signal processing, and wireless media access control (MAC) functions. These BBUs are cloudified in a centralised location, called a BBU pool. Consequently, the remote radio heads (RRHs) are left exceedingly simple at the cell site, with only optical to electrical conversions, amplifiers, and antennas [3]. C-RAN contrasts with the traditional long term evolution (LTE) system, where the BBU functions are processed in the evolved NodeB (eNodeB) at the cell site itself. Bringing these BBUs together in C-RAN has resulted in a paradigm that is capable of effectively implementing advanced cooperation algorithms, utilising the spectrum through dynamic bandwidth adaptation approaches, exploiting the load variation to reduce the required cooling and total power consumption (PC), and adopting new 5G enabling technologies [4]. Additionally, C-RAN diminishes operational expenditures (OPEX) and capital expenditures (CAPEX) due to reduced maintenance cost and fewer site visits [5]. Although the C-RAN paradigm greatly unleashes the network's potential regarding EE and cost of operation, increasing the number of deployed RRHs and active BBUs leads to a considerable increase in PC [3]. Hence, offering a highly efficient 5G architecture that is able to decrease this PC substantially is a challenge. In order to meet it, the research community has embraced the use of network function virtualisation (NFV) techniques in the cloud. In fact, both C-RAN and NFV represent key enabling technologies for the coming generations. C-RAN offers a low cost system and a higher level of efficient resource sharing, while virtualisation technology can provide major reductions in PC. Moreover, the service providers (SPs) have gained the ability to allocate network resources flexibly within the cloud. In addition, virtualisation brings automation of the virtualised server's operation and configuration, reduced maintenance cost, and support for a multi-tenancy mode of service. On top of this, NFV allows new services to be deployed and managed on the fly to fulfil the UEs' demands. It has also promoted the concept of hardware-software isolation in the virtualised servers [6], [7], which allows network functions to be executed purely in software, in the form of virtual machines (VMs). These VMs can run on general off-the-shelf servers, rather than on proprietary or dedicated appliances [8].
VMs are expected to reduce the operational cost in the core networks. Today, it is possible to operate fewer virtualised servers in the pool to run the whole network while fulfilling the UE quality of service (QoS) requirements [9]. Each VM is software that runs the BBU functions and shares the resources of the host server with other VMs in a time restricted manner. Running multiple VMs on a single hardware platform requires a hypervisor (HV), which is software that runs on the server's higher layer, thus allowing the host to be shared by multiple VMs [10]. That is, each VM can utilise the server's random access memory (RAM), central processing unit (CPU), network interface cards (NICs) and hard disk drive (HDD) by itself without obstructing other VMs. First, the HV collects the information of each VM regarding the number of UEs and their QoS indicators, and then schedules the available resource blocks (RBs) amongst these VMs. Afterwards, each VM can schedule its share of RBs amongst its UEs according to different QoS factors such as minimum data rate, received power, interference, etc. However, the presence of the HV within the host server increases RAM accesses, CPU functional complexity and HDD usage, which results in extra overhead within the virtualised server. Furthermore, the existence of VMs increases the host server's latency. Hence, a detailed comparison of the virtualised and bare servers is required to identify the advantages and disadvantages of using NFV in C-RAN.

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

A. NFV Trade-Offs
1) NFV is able to curtail the increase in PC in C-RAN that results from integrating new technologies and services such as software defined network (SDN) and load balancing appliances [3]. This reduction can be achieved by sharing the available server's resources/units while cascading multiple VMs in one operating server.
2) It was mentioned in [11] that a single virtualised server can host tens of VMs. However, this might only be possible when the server is running offline applications, whose delay requirements are relaxed. Because a virtualised server with 1 VM may take about 5 times longer to process a packet than its traditional counterparts [12], such delay cannot be avoided in online services. This delay originates from each VM only being able to own a scarce amount of the host server's resources, as it has to share these with other VMs. Clearly, if the server hosts more VMs, the resource share allocated to each VM is further reduced [13]. In this case, the VM is obliged to queue its load and wait for the HV to open a window again after some time. Hence, optimising the number of installed VMs in one host server is a prerequisite, so that the VMs can always meet their real time requirements and the host server is not overloaded.
3) It was measured in [14] that the execution time of a traditional base station's functions is convexly or linearly proportional to the number of processed RBs. This means that when the VM operates as a BBU, increasing the number of its allocated RBs will produce extra delay. Since the total number of VMs already generates a large amount of delay, the RBs allocated to each VM also have to be optimised.
4) Finally, the virtualised server itself incurs additional PC as its resources become fully utilised. This consumption is due to higher computation levels, generated I/O instructions, and compound accesses to the device resources by the aggregated VMs' applications. Consequently, a busy virtualised server may consume about 40% more power than its traditional counterparts [12]. This increment requires further investigation regarding modelling the PC of virtualised servers.
The above trade-offs motivate estimating the optimum number of VMs for sustaining the network's QoS. This simple question leads to further questions, such as: What is the amount of overhead these VMs draw upon? How does increasing the number of processed RBs at each VM affect the PC and latency of the host server? What is the highest latency that can be tolerated by the network? What is the optimal number of RBs allocated to each VM? What is the minimum QoS requirement for each UE served by a particular VM? These questions involve different network variables, including RBs, PC, delay and data rate. Hence, we have brought these parameters together in a single EE problem.

B. Main Contributions
1) The optimal number of VMs that maximises the EE of C-RAN has been estimated. The problem is solved using particle swarm optimisation (PSO), quantum PSO (QPSO) and genetic algorithm (GA) approaches.
2) We aim to account for the amount of traffic (number of UEs and RRHs, power allocation, channel gains, etc.) found in the area of interest, and to examine possible changes in the traffic volume prior to the optimisation process. For this purpose, a Monte Carlo method was embedded inside the PSO, QPSO and GA algorithms to consider a large number of possible network traffic situations. This method differs from what is found in the available literature, where the network is constantly adapted to a new solution each time the traffic volume or network behaviour changes.
3) In contrast to uniform distributions of the UEs and RRHs, which are based on hexagonal, circular or triangular shapes, in this work a Poisson point process (PPP) distribution has been used to reflect real deployments and practical resource assignments. This includes randomly generating the RRH and UE positions, power, resource block scheduling, etc., for each Monte Carlo iteration.
4) The way active VMs and utilised RBs increasingly affect the PC of the host servers is modelled. This modelling provides a realistic evaluation of the PC at the server's unit level, i.e., CPU, RAM, NIC and HDD. A well-known LTE parameter (i.e., RB) has been used as a main factor in this modelling. Consequently, this parameter affects both the sum rate and the PC during the optimisation process.
5) The baseband signals transmitted from the BBU pool towards the fibre/wireless connected RRHs are distributed according to the proposed power allocation methods: PAM1, PAM2 and direct power. These allocations are based on either UE distances to the RRH, both these distances and the channel gain, or equal power distribution to the RRHs. Note that the direct power method means the RRHs are allocated equal power from the pool. These allocations differ from those in the available research works, in which the sum data rate is directly influenced only by the resource assignments in the fronthaul (from RRHs to the UEs). In C-RAN, however, both fronthaul and backhaul (resource assignment from BBU pool to RRHs) should be considered. Accordingly, the proposed methods in (1) and (2) correlate the distances (from BBU pool to the RRHs) with the RRH's received power, such that the pool can have an influence upon the latter. Subsequently, these RRHs impact upon the UEs' received power and their data rates.

C. Related Work
In this section, both sides of the EE problem are discussed, namely, the workload management in virtualised data centres, and PC modelling. A comprehensive survey in [15] reviewed most of the available PC models to date, including virtualised and non-virtualised servers, and data centres as well as single servers. Generally, the available power models can be classified into hardware intrusive, software intrusive and machine learning models. This classification is based on the approach that has been used in the measurement. Intrusion based models require the installation of intrusive hardware tools and event counters, which makes the PC measurement expensive and complex. Moreover, these hardware counters add PC overhead to the actually measured PC. This runs against the general purpose of the research, which is to reduce the PC as much as possible. It was observed that the main parameter to be measured in these models is the utilisation ratio/level reached by the CPU, RAM or storage during operation. Measuring such a factor requires a server whose operating system exposes this value for monitoring, which clearly increases cost along with complexity. Kansal et al. [16] have proposed to track the VM's energy usage at each hardware unit by using HV-observable hardware power states. The software based models are similar to the first method, but they use monitoring software that is installed as an application on the server so that the VMs' power usage can be known. This process is also complex, and the installed software can itself cause more PC within the server. Furthermore, the tracking process cannot guarantee accurate measurement of the events that occur, because of the time response mismatch between these high frequency events and the time window opened to track them, see [17]. The third method is based on machine learning or heuristics and tolerates some error, as it is based on a random distribution of the solution candidates. Also, it requires repeating the optimisation process several times to guarantee the results; for example, see [18]. Furthermore, this method is time consuming and costly in power. In contrast, our proposed power model is much simpler and less costly than the available models, as it is based only on the number of hosted VMs, the bandwidth/RBs allocated to each VM, and the components' data sheets.
On the other hand, there have been several works that aim to optimise the EE in data centres. In [19] and [20], dynamic, on-demand VM migration based algorithms were proposed for distributed data centres. These works were devoted to reducing the carbon footprint based on a specified service level agreement (SLA). Zhani et al. [20] put forward a live migration technique amongst the cloud servers to adapt dynamically to load fluctuations. In [21], an energy efficient algorithm that reduces the operational costs in virtualised data centres was suggested. The algorithm consolidates VMs based on current CPU utilisation using a live migration technique. However, in these works, there was no evaluation of the migration power cost, not to mention the increased delay within the virtualised server. The power cost can reach up to 32 W in the source server and 10 W in the destination server for each migrated VM [22]. If these numbers are multiplied by the number of migrated VMs, such a price would militate against the deployment of these methods. In [23], a technique was proposed to reduce the electricity bill by allocating the incoming traffic amongst distributed, Internet based, non-virtualised data centres. Zhang et al. [24] were concerned with optimising the energy cost in data centres. Their proposed algorithm adapts to the traffic demand over time to reduce the power. This work places emphasis on service and infrastructure providers, and on their revenue when satisfying a certain SLA. Hatzopoulos et al. [25] suggested using both the traditional power grid and renewable energy to reduce the PC in data centres. Through a time varying, traffic adaptation based algorithm, they proposed allocating some of the network tasks to the renewable energy sources. However, the costs of renewable energy, such as maintenance, deployment and gain over traditional sources of energy, were not evaluated. Gao et al. [26] and Guo et al. [27] put forward an approach to reduce the carbon footprint by redirecting the traffic to cleaner geographical locations. Xin et al. [28] proposed an algorithm that splits the incoming traffic across different data centres instead of one destination, with the main objective being to balance the incoming workload prior to processing. A work highly related to our problem can be found in [29], where the number of VMs a server can support is experimentally assessed. Unfortunately, that work includes no UE resource allocation, no power model, and no mathematical representation that would allow the case to be generalised to a broader range of server types.

II. SYSTEM MODEL
The downlink multi-RRH, multi-UE C-RAN system includes a total number of RRHs ($M$). These RRHs are PPP distributed with intensity ($\lambda_1$). Similarly, the total number of UEs ($U$) are distributed with intensity ($\lambda_2$). Each RRH ($m$) is assumed to serve a subset of UEs ($U_m$). The UEs are attached to RRH $m$ on a nearest-distance basis and are located at coordinates ($x_u$, $y_u$), with small scale fading $h$ that is assumed to be Rayleigh fading. The noise power is assumed to be additive with a value of ($\sigma^2$).
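As an illustration of this deployment model, the following minimal sketch draws RRH and UE positions from a homogeneous PPP on a square region and attaches each UE to its nearest RRH. The square region, the intensity values, and the function names (`poisson_sample`, `ppp_points`, `attach_to_nearest`) are assumptions made for the example and are not taken from the paper.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's method, adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def ppp_points(intensity, side, rng):
    """Homogeneous PPP on a side x side square: Poisson count, uniform positions."""
    n = poisson_sample(intensity * side * side, rng)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]

def attach_to_nearest(ues, rrhs):
    """Return, for each UE, the index m of its nearest RRH."""
    return [min(range(len(rrhs)), key=lambda m: math.dist(ue, rrhs[m])) for ue in ues]

rng = random.Random(1)
# Hypothetical intensities (points per unit area); fall back to one RRH if none is drawn.
rrhs = ppp_points(intensity=5.0, side=1.0, rng=rng) or [(0.5, 0.5)]
ues = ppp_points(intensity=20.0, side=1.0, rng=rng)
serving = attach_to_nearest(ues, rrhs)
# Rayleigh fading: the power gain |h|^2 is exponentially distributed (unit mean assumed).
gains = [rng.expovariate(1.0) for _ in ues]
```

Each Monte Carlo iteration would redraw `rrhs`, `ues` and `gains` in this way before the resource assignment is evaluated.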

A. Optical Power Allocation Models
Two methods are proposed to distribute the power from the BBU pool over optical fibre to the RRHs, which are connected in a star topology. The first method or power model (PAM1) is a distance-based proportional allocation, where the RRH's received power relies on the distance $d_{m,o}$ to the BBU pool. That is, the closer RRH $m$ is to the BBU pool, the less power ($Pr_m$) it will receive as compared to other RRHs, as given in (1), where $OFL$ denotes the fibre losses and $P_{pool}$ denotes the total power transmitted by the BBU pool. In traditional or partially centralised networks, the need for such allocation (i.e., from BBU pool to RRHs) is ignored, because the BBU already resides within the eNodeB and transmits the signals to the UEs through the RF unit. In addition, the connection from the eNodeB to the core network is only logical via the transport links. However, with fully centralised C-RAN, the BBU is shifted to the BBU pool. Hence, the modulated signals are no longer generated at the eNodeB, but rather at the BBU pool. To describe this relocation, power allocation methods from the BBU pool to the RRHs are introduced.
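A distance-proportional split of this kind can be sketched as follows. This is only one plausible reading: it assumes each RRH's share of the pool power equals its distance divided by the sum of all RRH distances, and it treats the fibre loss as a single multiplicative factor. The function name `pam1_allocate` and the `fibre_loss` parameter are hypothetical.

```python
def pam1_allocate(distances, p_pool_w, fibre_loss=1.0):
    """Split the pool power across RRHs in proportion to their distance to the pool.

    Assumption: share_m = d_m / sum(d); fibre_loss is a multiplicative factor in (0, 1].
    Closer RRHs therefore receive less power than more distant ones.
    """
    total = sum(distances)
    return [fibre_loss * p_pool_w * d / total for d in distances]

# Two RRHs at 1 km and 3 km sharing 8 W, with no fibre loss.
powers = pam1_allocate([1.0, 3.0], p_pool_w=8.0)
```

With no fibre loss the allocated powers sum to the pool power, which matches the requirement that the total power received by the RRHs cannot exceed what the pool transmits.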
The second model (PAM2) is based on both the RRH-BBU pool distances and the channel gain received by the $U_m$ UEs. If the total channel gain of the UEs ($U_m$) increases and, at the same time, the RRH is more distant from the BBU pool than other RRHs, the power received ($Pr_m$) by the tagged RRH $m$ increases in comparison to other RRHs. Additionally, the RRH's received power can be further regulated through the power control variable ($\delta$), which sets the effectiveness of this proportional power distribution. This method is given in (2), where $u \in \{1, \ldots, U_m\}$ is the UE index, $\delta$ ($0 \le \delta \le 1$) is the power allocation effectiveness control factor, and $h_u^m = |h|^2$ is the signal attenuation of the $u$-th UE within the $m$-th RRH.
Consequently, the $u$-th UE can be allocated an amount of power based on one of three methods. The first allocation is based on the distance, where the $u$-th UE is allocated power according to its distance ($d_{m,u}$) relative to the distances of the other UEs within the $m$-th RRH. The second allocation is based on both the distance ($d_{m,u}$) and the received channel gain ($h_u^{m,n}$) relative to the other UEs within the $m$-th RRH, where $P_{m,u}^n$ and $h_u^{m,n}$ denote the UE's received power and channel gain from the $m$-th RRH served by the $n$-th VM. The third allocation is based on both the path loss ($r_u^m$) and the small scale fading ($h_u^{m,n}$) [30], where $r_u^m = (d_{m,u})^{-\alpha}$ indicates the path loss from RRH $m$ to UE $u$, and $\alpha$ is the path loss exponent. The sum data rate then depends on the following quantities: $RB_n$ represents the total number of RBs allocated to VM $n$ with bandwidth $B_o$, $P_{m,u,rb}^n$ is the allowed transmitted power on RB ($rb$), and $\sigma_{m,u,rb}^n$ represents the SINR of $rb$ served by VM $n$ and assigned to UE $u$ of RRH $m$, which accounts for the channel gain from each interfering RRH ($inf$) to UE $u$. It is worth mentioning that maximising the sum bit rate of all UEs does not guarantee an adequate bit rate for each UE individually. Hence, the bit rate of each UE is constrained to a threshold value, as presented in (10).
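To make the relation between path loss, SINR and per-RB rate concrete, the sketch below assumes the usual signal over interference-plus-noise form for the SINR and a Shannon-type rate $B_o \log_2(1 + \mathrm{SINR})$ per RB. The function names and all numerical values are illustrative assumptions, not the paper's simulation parameters.

```python
import math

def path_loss(distance, alpha):
    """Path loss r = d^(-alpha), as defined in the text."""
    return distance ** (-alpha)

def sinr(p_tx, gain, interferers, noise_power):
    """Assumed SINR form: desired power over noise plus summed interferer power.

    `interferers` is a list of (transmit power, channel gain) pairs, one per interfering RRH.
    """
    interference = sum(p * g for p, g in interferers)
    return p_tx * gain / (noise_power + interference)

def rb_rate(bandwidth_hz, sinr_value):
    """Shannon-type rate of one RB in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)

# Illustrative numbers: one serving RRH and one interferer on a 180 kHz RB.
g_serving = path_loss(100.0, alpha=3.0)
g_interf = path_loss(300.0, alpha=3.0)
rate = rb_rate(180e3, sinr(1.0, g_serving, [(1.0, g_interf)], noise_power=1e-9))
```

Summing `rb_rate` over the RBs of every UE, RRH and VM would give the sum data rate, while a per-UE minimum on the same quantity corresponds to the rate constraint.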
With regard to the PC, four major components contribute to the consumption of a server: RAM, CPU, NIC and HDD. It was mentioned in [31] and [32] that the PC of the virtualised server is exponentially or non-linearly proportional to the number of VMs. Hence, the PC of the virtualised server ($P_{srvr}$) can be expressed as $P_{srvr} = (P_{ram} + P_{cpu} + P_{nic} + P_{hdd}) \times e^{\varepsilon N}$, where $P_{ram}$, $P_{cpu}$, $P_{nic}$ and $P_{hdd}$ denote the initial PCs of the server's RAM, CPU, NIC and HDD, respectively. The term ($e^{\varepsilon N}$) is used to describe the dynamic PC of the server's units due to the existence of VMs, where $\varepsilon$ is a positive constant. Since each VM is serving several UEs, it is assumed that the dynamic load or bandwidth share is linearly proportional to the number of processed RBs [14]. This means that the more UEs served at each VM, the more RBs are processed, which increases the dynamic or traffic based consumption as a greater share of the finite server resources is demanded. Consequently, the total number of processed RBs in a server ($RB_T = \sum_{n=1}^{N} RB_n$) is added to $P_{srvr}$ to assemble the total consumption of a server ($P_{server} = P_{srvr} + e^{\vartheta_n} \ast RB_T$). $RB_n$ denotes the total number of RBs processed by each VM, and $\vartheta_n$ is the increment factor due to processing $RB_n$ by VM $n$ in any of the server resources. These RBs add an important decision weight to both sides of the EE formulation.
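The server power model above can be evaluated numerically with a short sketch. It reads the traffic term as linear in $RB_T$ (that is, $e^{\vartheta} \times RB_T$), in line with the stated linear proportionality, and uses a single $\vartheta$ for all VMs; both are assumptions. The component powers are the initial values reported later in the results (RAM 4 W, CPU 29.6 W, NIC 2 W, HDD 25 W), while the default `eps` and `theta` values are hypothetical tuning factors.

```python
import math

# Initial component powers in watts: RAM, CPU, NIC, HDD (values quoted in the results).
BASE_POWER_W = (4.0, 29.6, 2.0, 25.0)

def server_power(n_vms, rbs_per_vm, eps=0.02, theta=-3.0, base_w=BASE_POWER_W):
    """Total server PC: exponential growth in the VM count plus a traffic term in RB_T.

    eps and theta are hypothetical tuning factors; theta is shared by all VMs here.
    """
    p_srvr = sum(base_w) * math.exp(eps * n_vms)   # (P_ram + P_cpu + P_nic + P_hdd) * e^(eps*N)
    rb_total = sum(rbs_per_vm)                      # RB_T = sum of RB_n over the VMs
    return p_srvr + math.exp(theta) * rb_total

idle = server_power(0, [])
loaded = server_power(8, [100] * 8)
```

Tuning `eps` and `theta` against measurements for a particular server is what would adapt the model to other hardware.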
Another performance factor is the time it takes the VM to process these RBs. The execution time of the workload in a traditional BBU server increases linearly with both the number of RBs and the modulation coding scheme (MCS $\in \{9, 16, 25\}$) that is used to transmit/receive these RBs [14]. In a virtualisation environment, a single VM requires $\pi$ times more delay to process a packet compared to its traditional counterparts. This is due to increased access calls and interrupts between the VM and the HV, and between the HV and the server's units, where $\pi$ can reach up to 5 [12]. Modelling this concept requires introducing a factor called the MCS index ($mcs$) to describe the linear relationship between the RBs and the execution time in a bare BBU server ($\tau_{bare}$), where $\tau_{bare} = \tau_{init} + (mcs \ast RB_n)$ and $\tau_{init}$ denotes the initial BBU delay due to BBU functions other than MCS. Furthermore, the HV delay ($\pi$) is added to the above description to produce the execution time of the virtualised server ($\tau_v^n$) when 1 VM is found in the server, i.e., $\tau_v^n = \tau_{bare} + \pi$. Subsequently, the total execution time of all VMs ($\tau_{vms}$) is produced by adding the $\tau_v^n$ of all available VMs.
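The delay relations above can be written as a small sketch. It follows the additive form $\tau_v^n = \tau_{bare} + \pi$ and sums the per-VM delays to obtain $\tau_{vms}$; the function names and the example values are hypothetical and unit-free.

```python
def bare_delay(tau_init, mcs, rbs):
    """tau_bare = tau_init + mcs * RB_n for a non-virtualised BBU server."""
    return tau_init + mcs * rbs

def vm_delay(tau_init, mcs, rbs, hv_delay):
    """tau_v^n = tau_bare + pi, where hv_delay stands for the HV delay pi."""
    return bare_delay(tau_init, mcs, rbs) + hv_delay

def total_vm_delay(tau_init, mcs, rbs_per_vm, hv_delay):
    """tau_vms: sum of the per-VM delays over all VMs hosted on the server."""
    return sum(vm_delay(tau_init, mcs, rbs, hv_delay) for rbs in rbs_per_vm)

# Two VMs processing 3 and 4 RBs (illustrative values).
tau_vms = total_vm_delay(tau_init=1, mcs=2, rbs_per_vm=[3, 4], hv_delay=5)
```

Comparing `tau_vms` against a latency threshold is how a candidate number of VMs and RB split would be accepted or rejected.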

B. Gain of Virtualisation
With a 10 MHz bandwidth, there are 50 RBs available every 0.5 ms, or 100 RBs per transmission time interval (TTI), also called a subframe (1 ms). The minimum resource allocation to a single UE is 12 sub-carriers in one TTI, which is equivalent to 2 RBs in the time domain. Consequently, the BBU can serve up to 50 UEs each millisecond. If 100 UEs are connected, the scheduler takes at least 2 ms to serve them all. This logic is correct, but commercially it is difficult to design a scheduler that handles 50 UEs in 1 ms, because a minimum number of RBs is assigned to each UE in a certain TTI to guarantee its minimum QoS. This means there can be more than 2 RBs assigned to the UE in each TTI. In a virtualised server, the total number of scheduled RBs in one TTI can reach up to $N \times BB$, where $BB$ denotes the number of traditional BBU servers. This is because each VM performs as a separate traditional BBU device through a software abstraction. Amongst the VMs, the HV is responsible for managing the available RBs, where each VM $n$ is assigned a certain number of RBs ($RB_n$) according to its load, the UEs' channel conditions, the UEs' distances, etc. On the other hand, this increase in the number of RBs that must be processed in one TTI raises the question of whether any available server is capable of serving such a number (i.e., $N \times BB$). In answer to this, current advances in hardware manufacturing show that a single BBU server is capable of processing up to 900 LTE UEs [33]. In traditional network operation, this number can barely be reached, as in actual server performance not all UEs are active at the same time. In a virtualisation environment, however, this leaves some unutilised server resources that can be exploited by the VMs. Hence, such a situation allows each VM to process its allocated RBs on time. Ultimately, this facilitates a reduction in the total number of bare servers, which diminishes the consumed power and improves the EE without compromising the network performance.
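The scheduling arithmetic in this subsection can be checked with a few lines. The constants are the figures quoted in the text for a 10 MHz carrier; the function names are hypothetical.

```python
import math

RBS_PER_SLOT = 50      # RBs available every 0.5 ms with a 10 MHz bandwidth
SLOTS_PER_TTI = 2      # one 1 ms TTI spans two 0.5 ms slots
MIN_RBS_PER_UE = 2     # minimum allocation to one UE within a TTI

def ues_per_tti(rbs_per_ue=MIN_RBS_PER_UE):
    """Maximum number of UEs that can be served in one TTI."""
    return (RBS_PER_SLOT * SLOTS_PER_TTI) // rbs_per_ue

def ttis_needed(n_ues, rbs_per_ue=MIN_RBS_PER_UE):
    """Minimum number of TTIs (milliseconds) needed to serve n_ues once each."""
    return math.ceil(n_ues / ues_per_tti(rbs_per_ue))

print(ues_per_tti())         # 50 UEs per millisecond at the minimum allocation
print(ttis_needed(100))      # 2 ms for 100 connected UEs
print(ttis_needed(100, 4))   # 4 ms if each UE needs 4 RBs per TTI
```

The last line illustrates why serving 50 UEs per millisecond is hard in practice: as soon as UEs need more than the minimum allocation, fewer of them fit into a TTI.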

III. TOTAL POWER CONSUMPTION
The total PC of the virtualised server is also subject to other losses, such as AC-DC, DC-DC and cooling losses. These losses scale linearly with the PC of the other components and are approximated by using the loss factors ($\sigma_{DC}$, $\sigma_{AC}$, $\sigma_{cool}$) to represent DC-DC, AC-DC and cooling, respectively [34]. The total PC of virtualised C-RAN ($P_{vCRAN}$) is then modelled as the combination of the PC of the virtualised cloud BBU servers and the PC of the RRHs ($P_{RRH}$). Moreover, the RRH transmitted power ($Pt_m$) is equivalent to its received power ($Pr_m$) if no power gain is added. $P_{opt,P}$ and $P_{opt,R}$ denote the PC of the optical devices in the BBU pool and the RRH, respectively. $\sigma_{DC,R}$ and $\sigma_{MS,R}$ denote the RRH's DC and MS loss factors, respectively. Moreover, no cooling is provided to the RRH. Finally, $P_{RF}$ is the PC of the RF unit, $Pt_m / \eta_{PA}$ is the PC of the power amplifier (PA), and $\eta_{PA}$ is its efficiency.

IV. PROBLEM FORMULATION
The sum EE of the vC-RAN system is defined as the sum data rate that the UEs receive per watt of consumed power. The problem is formulated with the objective (9) subject to the constraints (10)-(16), where $sol/ind$ gives the speed of light inside the optical fibre, $sol$ is the speed of light, and $ind$ is the refractive index of the optical fibre, which is assumed to be identical for all RRH-BBU pool links. Through the third constraint (12), the latency of the server due to one VM ($\tau_v^n$) can be controlled; this latency, together with the fixed value of the initial HV delay $\pi$, controls the delay $\tau_{bare}$. The latter, in turn, affects the number of processed RBs ($RB_n$) of each VM. By substituting the third constraint into the second, the total latency of the system will not exceed the latency threshold ($\tau_{thr}$), where $\tau_{vms}^{thr}$ is the maximum latency threshold allowed for all VMs. In constraint (13), each VM $n$ cannot exceed its maximum number of allocated RBs $RB_n$. Accordingly, constraint (14) deals with the maximum number of RBs to be processed in the server by all VMs. Because the processing capability of LTE servers has recently become greater and more efficient, the maximum number of processed RBs ($RB_T$) is set to 800. This number is shared amongst the VMs, each exploiting its allocated share to assign the required number of RBs to each UE, while satisfying the other constraints. The sixth constraint (15) limits the power received by all UEs $U_m$ over the total RBs $RB_n$. Finally, constraint (16) indicates that the total power received by all RRHs cannot exceed the total transmitted power of the BBU pool ($P_{pool}$).
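A minimal sketch of the quantities involved in this formulation is given below: the EE as sum rate per watt, the propagation delay over a fibre link with refractive index $ind$, and a feasibility check covering the rate, RB and latency limits. The function names, the default refractive index of 1.5 and the threshold arguments are assumptions for illustration; only the limit of 800 RBs comes from the text.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum

def fibre_delay_s(distance_m, refractive_index=1.5):
    """Propagation delay over fibre: distance divided by (sol / ind)."""
    return distance_m * refractive_index / SPEED_OF_LIGHT

def energy_efficiency(sum_rate_bps, total_power_w):
    """Sum EE in bit/s per watt, i.e. bits per joule."""
    return sum_rate_bps / total_power_w

def feasible(ue_rates_bps, min_rate_bps, rbs_per_vm, total_delay_s,
             delay_threshold_s, rb_max=800):
    """Check the minimum-rate, total-RB and latency limits for one candidate solution."""
    return (all(r >= min_rate_bps for r in ue_rates_bps)
            and sum(rbs_per_vm) <= rb_max
            and total_delay_s <= delay_threshold_s)

ee = energy_efficiency(sum_rate_bps=1e6, total_power_w=100.0)
delay_20km = fibre_delay_s(20_000.0)
ok = feasible([2e5, 3e5], 1e5, [400, 300], total_delay_s=2e-3, delay_threshold_s=3e-3)
```

An optimiser would evaluate `energy_efficiency` only for candidates that pass `feasible`, or penalise those that do not.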
To solve our problem, PSO, QPSO and a GA are used to search the solution space for the sub-optimal number of VMs ($N$) that maximises the objective/fitness function (EE) of C-RAN. The predominant issue with such algorithms is the time needed to obtain the solution. However, this has been overcome in this work, because a huge amount of potential traffic is considered for the specific geographical area where the pool resides. Specifically, for each Monte Carlo iteration, new UEs, RRHs, channel conditions and RB assignments are established by using the PPP distribution. These iterations cover and examine the traffic situations expected in the area of interest on a daily basis or over long periods of time. Hence, repeating the optimisation process each time the traffic changes is no longer necessary. Another limitation is the sub-optimality of the given solution, which can also be tolerated because PSO, for example, yields a solution that is close to the optimal one. Since the required number of VMs is an integer, rounding the solution up or down mitigates this behaviour, which also holds true for QPSO and GA. Specifically, flooring the solution variable ($N$) is preferred in order to prevent server overload and thus ensure safe operation.
The reason for adopting PPP at each iteration, rather than a uniform deployment, is to reflect practical deployments and real life scenarios in the distribution of RRHs and UEs [35]. The Poisson distribution measures the probability that a certain number of events occur within a certain period of time. This stochastic process is one of the most important random processes in probability theory. It is widely used to model random points in time and space, and is an accurate way to model the spatial distribution of the geographical RRH locations [36]. As it places no constraint on the distances between adjacent RRHs, it provides a more realistic cell shape as well as more realistic SINR and EE measurements in comparison to the uniform distribution, as represented by hexagonal, circular, or triangular cell shapes. In PSO, the particles move with random speeds through the solution space, each being assessed by an objective/fitness function with a best stored particle solution (pbest) and a best stored overall solution (gbest). Based on the current position ($N_i$) of particle $i$, its speed ($v_i$), its past best position ($pbest_i$) and the best global position of all particles (gbest), each individual particle is updated iteratively. PSO first initialises its particles or generations, with each particle representing a possible sub-optimal solution (the potential number of VMs). This candidate then undergoes the process described in Algorithm 1, in which the possible particle solution ($N_i$) is subjected to the constraints.
Algorithm 1: PSO Main Algorithm
while (not terminating condition) do
    Evaluate each particle
    for particle i, i = 1, 2, ..., I do (update the best positions)
    end for, end for
    for particle i, i = 1, 2, ..., I do (generate the next generation)
    end for
end while

Furthermore, at each particle evaluation, the Monte Carlo inner loop performs the following: (i) randomly generating the RRH and UE assignments using PPP; (ii) repeating this step R times; and (iii) calculating the average sum EE of the UEs within the network, as shown in Algorithm 1. These steps are repeated as many times as the number of particles (I). The reason for placing a Monte Carlo inner loop with R iterations inside the main PSO algorithm with I particles is to ensure that the solution holds for a very large number of network realisations, i.e., (R × I). For example, if R = 1000 and I = 100, this produces 1000 × 100 = 100000 possible RRH and UE resource assignments, since there are R iterations for each particle i. Practically, this means that the resulting solution is valid for this number of network distributions. Indeed, as R increases, this possibility approaches unity, and the number of covered network scenarios grows towards infinity, which strengthens the efficiency of the solution. However, this comes with an increased execution time of the algorithm, which has previously been neglected. This complexity is expected to increase further when solving a scenario with a larger geographical area, owing to the larger number of RRHs and UEs.
PSO has a particular problem that arises from its structure as a continuous algorithm. Consider a set of points {A, B, C, D}: for a particle to move from A to D, it must pass through the intermediate points B and C. If these points are local minima, the particle can become trapped. As a solution, quantum PSO (QPSO) follows a purely probabilistic scheme in which the next position is drawn from a probability distribution, thus having a discrete nature. QPSO was first proposed in [37] as a method offering a good complexity-performance trade-off. It is based on both the physical principles of quantum mechanics and the social behaviour of swarms of various animals. In quantum theory, a qubit is the smallest unit of information, whose value lies in the range [0, 1]. Each particle (i) holds a quantum energy $q_i(t)$. The QPSO algorithm is similar to PSO: it stores the best position previously found by each particle (pbest) and the global best position (gbest). From these positions, the best global and individual quantum energy values are calculated in order to generate changes in the particle positions. The algorithm is applied to our problem as follows:
1) Generate the initial particles: each particle i, with energy $q_i(t)$, is randomly generated at position $N_i(t)$.
2) Evaluate $N_i(t)$ through the cost function (9); if a better position is found for particle i, the best individual and global positions are updated.
3) Change the energy $q_i(t)$ of each particle according to $q_i(t+1) = c_1 q_i(t) + c_2 q^{best}_i(t) + c_3 q^{best}_g(t)$, where $c_1$, $c_2$, $c_3$ denote the weight of each energy component, and $c_1 + c_2 + c_3 = 1$.
4) Return to step 3 until reaching the total number of iterations [38].
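The energy update of step 3 can be sketched as a weighted sum of the three energy terms. This is only an illustrative sketch: the function name `update_energy` is hypothetical, the weights in the usage example are made-up values (the paper takes its QPSO parameters from [38]), and the requirement that the weights be non-negative is an added assumption.

```python
def update_energy(q_i, q_best_i, q_best_g, c1, c2, c3):
    """Step 3: q_i(t+1) = c1*q_i(t) + c2*q_best_i(t) + c3*q_best_g(t).

    The weights must sum to one, as stated in the text. Requiring them
    to be non-negative is an assumption of this sketch.
    """
    if min(c1, c2, c3) < 0:
        raise ValueError("weights are assumed to be non-negative")
    if abs(c1 + c2 + c3 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy c1 + c2 + c3 = 1")
    return c1 * q_i + c2 * q_best_i + c3 * q_best_g


# Hypothetical weights and energies, for illustration only.
q_next = update_energy(0.2, 0.6, 1.0, c1=0.5, c2=0.3, c3=0.2)
```

With non-negative weights that sum to one, the update is a convex combination, so an energy that starts in [0, 1] stays in [0, 1].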

V. RESULTS AND ANALYSIS
To relate the findings of our problem to real-world scenarios, the parameters were selected from [32], [34], and [39]-[42], as shown in Table I. The experimental data on the PC of each server component show that the initial PC of the CPU is 29.6 W, of the RAM 4 W, of the NIC 2 W and of the HDD 25 W. The rest of the PC in the server results from the overhead, i.e., the AC-DC and DC-DC conversion and cooling. Moreover, the parameters in Table I lead to about a 40% PC increment within each virtualised server at maximum workload. This increment, measured in real time in [12], represents the cost of over-utilising the server due to the many VMs that share its units. However, the proposed model is not restricted to this percentage; rather, it is valid for any type of server through adjusting the model parameters. It is worth mentioning that different server specifications can affect the EE and the resulting N, because each server may have different manufacturing baselines and efficiencies, which must be accounted for through the initial PC values and tuning factors such as ε, ϑ_n, etc.
Fig. 1 shows the sum EE of C-RAN using only distance-based power allocation from the RRHs to the UEs (3). The power from the BBU pool towards the RRHs is distributed using three methods: PAM1 (1), PAM2 (2), and direct/equal power to all RRHs. In all cases, PSO outperforms the GA; more information about the GA operation can be found in [43].
Moreover, Fig. 2 shows a comparison of the sum EE using the same power allocation methods, i.e., PAM1, PAM2 and direct power, but with the RRH-UE power allocation based on the distance and channel gain of (4). Based on the distance, the UEs are classified into centre and edge UEs. The RRH-centre UEs are assumed to always have better channel conditions than the RRH-edge UEs. Hence, the distance-based technique allocates more power to the RRH-centre UEs, which results in maximising the EE.
Furthermore, Fig. 3 shows the comparison of the sum EE of C-RAN using PAM1, PAM2 and direct power, with the UEs having power allocated according to (5) [30]. Because this method includes the channel path loss, it yields lower EE than methods (4) and (3): the path loss degrades the power received by the UE, which degrades the SINR and, in turn, the EE.
Fig. 4 shows a comparison of the sum EE of C-RAN using the same power allocations as Fig. 2, but for a system that is not virtualised. Clearly, the non-virtualised case produces more PC; hence, the network EE is reduced. To examine this result further, the traditional server consumption is derived by removing the effect of N and the RBs from the $P_{vCRAN}$ formulation. Let $P_{bbu}$ denote the bare server's PC, where $P_{bbu} = P_{ram} + (P_{cpu} \times K) + (P_{nic} \times L) + P_{hdd}$, and K and L denote the total number of CPUs and NICs, respectively. The number of bare servers (BB) is then multiplied by $P_{bbu}$, and the quantity $[(BB \times P_{bbu}) + P_{RRH}]$ replaces the virtualised term (i.e., $P_{vCRAN}$) in Algorithm 1, which increases the PC.
Moreover, Fig. 5 shows the sum EE performance comparison amongst QPSO, PSO and GA using the UE power allocations of (4) and (5), while the RRH-BBU pool allocation is based on PAM2, i.e., (2). Because model (5) includes the channel path loss, model (4) consistently yields higher EE than model (5). At the same time, the QPSO algorithm performs better than the PSO and GA algorithms, but needs more time to converge owing to its more complex behaviour. However, all cases yielded values of N between 6 and 7. The selected tuning parameters of QPSO, PSO and GA are as follows. For PSO, the total number of particles I is 100, the inertia weight (w) is 0.9, the cognitive parameter ($c_1$) is 0.2, and the social parameter ($c_2$) is 1.2. For GA, the number of generations is 100, the population size is 100 and the crossover probability is 0.8. Finally, the QPSO parameters are selected according to [38]. All results were obtained after running the algorithms 20 times to mitigate the randomness of heuristic algorithms; the best-performing run of each case was then selected. In all cases, PSO converges fastest, in about 23 generations, while GA consistently needs more than 35 generations and QPSO more than 40 generations.
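Returning to the bare-server expression $P_{bbu}$ used for the non-virtualised comparison of Fig. 4, the short sketch below evaluates it numerically. It assumes, for illustration only, that the initial component values quoted at the start of this section (CPU 29.6 W, RAM 4 W, NIC 2 W, HDD 25 W) can stand in for the bare-server terms and that K = L = 1 by default. The function names and the RRH power figure in the usage line are hypothetical.

```python
def bare_server_power(p_ram=4.0, p_cpu=29.6, p_nic=2.0, p_hdd=25.0,
                      num_cpus=1, num_nics=1):
    """P_bbu = P_ram + P_cpu*K + P_nic*L + P_hdd, in watts.

    num_cpus and num_nics play the roles of K and L. The default wattages
    are the initial component values quoted in the text (an assumption
    about which numbers apply to a bare server).
    """
    return p_ram + p_cpu * num_cpus + p_nic * num_nics + p_hdd


def non_virtualised_power(num_servers, p_rrh, **server_kwargs):
    """(BB * P_bbu) + P_RRH: the term that replaces P_vCRAN in the text."""
    return num_servers * bare_server_power(**server_kwargs) + p_rrh


p_bbu = bare_server_power()                  # one CPU, one NIC
p_total = non_virtualised_power(2, 100.0)    # hypothetical: 2 servers, 100 W of RRH power
```

Under these assumptions a single bare server draws 4 + 29.6 + 2 + 25 = 60.6 W before any conversion or cooling overhead, which the formula does not include.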
To understand further how the different parameters influence the EE outcome, consider a simple example starting with a single UE. The UE's received power depends on the power the RRH receives from the pool. The RRH's received power affects the PA consumption, which in turn influences the server and total network consumption. Since the number of VMs (N) appears in the formulation of the total PC, this parameter is affected as well. Without constraints, N tends to zero, so that the PC is minimised and the EE maximised. However, two active constraints prevent this degenerate outcome. First, the total latency threshold bounds the execution time by the limits in (11) and (12), which in turn affects the resulting N. Second, the total number of RBs is involved in both the PC and the sum rate calculations. When running the algorithm, the RB allocation must increase the sum rate of (9) whilst at the same time keeping the PC down, because more processed RBs mean more PC, as described in Section II.

VI. CONCLUSION AND POTENTIAL DEVELOPMENTS
The EE maximisation problem in virtualised C-RAN has been presented in the context of estimating the number of VMs that one server can support without affecting the operating efficiency. To enable such an evaluation, a power model of a virtualised server has been proposed to reproduce real-time measurements. This model reflects the consequences of increasing the number of VMs hosted within the server and the number of RBs processed by each VM. In addition, the time constraint due to virtualisation technology is modelled, as well as the execution time of processing the RBs in bare servers. This formulation is integrated with the total C-RAN latency to participate in the optimisation process. The network EE is evaluated while considering all the possible assignments in an area of interest, using a PPP-oriented Monte Carlo method inside the main PSO, QPSO and GA algorithms. By adopting Monte Carlo, the need to repeat the optimisation process is avoided. At the same time, the long/short traffic variation problem is overcome.
Multiple comparisons can be established by changing the way the UEs receive their power, for example by using PSO, GA or pattern search algorithms instead of the power allocations of (3) and (4). However, the latter are proposed to reduce the run time. Finally, the mathematical representation provided in this work can readily be used to optimise the placement of the virtualised BBU pool. By exploiting the distances between the RRHs and the BBU pool ($d_{m,o}$), the coordinates x and y can be treated as extra optimisation variables, instead of using x = 0 and y = 0 as reference values as in this work. This can optimise the virtualised BBU pool position to reduce the average delay between the RRHs and the BBU pool, reduce the system PC and enhance the EE.
Furthermore, each UE (u) has a Euclidean distance ($d_{m,u}$) to the serving RRH m, where $d_{m,u} = \sqrt{(x_m - x_u)^2 + (y_m - y_u)^2}$. The RRHs are positioned at coordinates $(x_m, y_m)$, and each RRH is located at a Euclidean distance ($d_{m,o}$) from the BBU pool, where $d_{m,o} = \sqrt{(x_m - x)^2 + (y_m - y)^2}$. The pool in turn is positioned at (x = 0, y = 0), at the centre of the geographical area.
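The two distances defined above can be computed as in the following minimal sketch. The helper name `euclidean` and the example coordinates are hypothetical; only the pool position (0, 0) comes from the text.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


pool = (0.0, 0.0)        # BBU pool at the centre of the area, as in the text
rrh = (300.0, 400.0)     # hypothetical RRH coordinates (x_m, y_m)
ue = (303.0, 404.0)      # hypothetical UE coordinates (x_u, y_u)

d_mo = euclidean(rrh, pool)   # RRH to BBU pool distance d_{m,o}
d_mu = euclidean(rrh, ue)     # serving RRH to UE distance d_{m,u}
```

Treating the pool coordinates as variables rather than fixing them at (0, 0) only changes the second argument passed to `euclidean`.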
du is the aggregate interference from all other interfering RRHs (Inf), excluding the serving RRH m. Moreover, $r^{inf}_{u} = (R^{inf}_{u})^{-\alpha}$ stands for the path loss from the interferer RRH (inf) to the UE u, $R^{inf}_{u}$ is the distance from the interferer RRH inf to the UE u, and $h^{inf}_{u}$

m,rb $\le Pr_m$, $P^{n}_{m,u,rb} \ge 0$, $\forall\, u, rb$ (15)
$\sum_{m=1}^{M} Pr_m \le P_{pool}$, $\forall\, m$ (16)
where $Csum^{n}_{m,u,rb} = B_o \log_2(1 + P^{n}_{m,u,rb}\,\sigma^{n}_{m,u,rb})$ and $Csum_{thr}$ is the minimum QoS requirement. The second constraint (11) represents the round-trip latency restriction, where $\tau_u = 2 \times \arg\max(d_{m,u}/sol)$ is the round-trip signal latency of the most distant UE u served by RRH m. In addition, $\tau_m = 2 \times \arg\max(d_{m,o}/v)$ denotes the maximum round-trip latency of the signals travelling from RRH m to the BBU pool. Furthermore, v = sol

Fig. 1. EE comparison of virtualised C-RAN using PAM1, PAM2 and direct power allocation; the UEs are allocated power according to (3).

Fig. 3. EE evaluation of virtualised C-RAN using PAM1, PAM2 and direct power allocation; the UEs are allocated power according to (5).