Joint Optimization of Content Placement and User Association in Cache-Enabled Heterogeneous Cellular Networks Based on Flow-Level Models



Introduction
Driven by the proliferation of smart devices and abundant applications, the past decade has witnessed a sharp rise in mobile data traffic. It is predicted that global mobile data traffic will reach 49 exabytes per month by 2021 [1]. An effective approach to addressing the explosively growing data volume is to deploy many low-power small base stations (SBSs) together with traditional macro base stations (MBSs) to form a heterogeneous cellular network (HCN) [2,3]. The densely deployed SBSs can meet the huge demand for high-speed data traffic in hot-spots and fill coverage holes of macrocells. However, deploying high-speed backhaul links to connect massive numbers of SBSs to core networks brings about huge capital and operational expenditures, which are unaffordable for network operators.
Among this traffic, mobile video streaming is expected to account for 78% of the total data traffic in 2021. Many studies have shown that video streaming in wireless networks exhibits significant regularity [4][5][6]. In particular, a few popular contents are requested frequently by different users at different times, which is referred to as asynchronous content reuse [7]. Repetitive content transmission causes congestion in backhaul links and core networks, especially at peak hours, which increases the content retrieval delay and decreases the efficiency of content delivery. This issue is further aggravated by the limited-capacity backhaul links of SBSs. Borrowing the concept of information-centric networking in wired networks [8], caching at the wireless edge [9][10][11][12] has been proposed to reduce the backhaul usage by equipping BSs and mobile devices with low-cost cache units. In cache-enabled cellular networks, the majority of requested contents can be obtained directly from local storage, nearby devices, or BSs, which significantly alleviates backhaul congestion and reduces the download delay. The implementation of the content-centric networking paradigm in HCNs can unleash the potential of HCNs and is an important candidate technology for fifth generation communication systems [10,11].
1.1. Related Work. Many studies focus on the performance analysis of cache-enabled communication networks. In [13], the performance of a coded caching scheme is analyzed from the perspective of information theory. The work of [14] contrasts cache-enabled device-to-device (D2D) content delivery with other alternative approaches and demonstrates its superiority. The authors of [15] investigate the scaling law of link rates with respect to network size parameters in a cache-enabled wireless network and analyze the sustainability of this network. Using tools from stochastic geometry, [16] derives expressions for the outage probability and average delivery rate in a cache-enabled small cell network and analyzes the impacts of several network parameters on system performance. Likewise, Yang et al. [17] deduce the outage probability and average ergodic rate in cache-enabled HCNs. Reference [18] investigates the energy efficiency of cache-enabled wireless access networks and analyzes the effects of several factors on it.
Generally, content delivery in cache-enabled networks consists of two phases [13]: the content placement phase and the content delivery phase. In the content placement phase, strategic contents are prefetched via backhaul links and cached at BSs during off-peak hours. In the context of D2D networks, contents can also be predownloaded from BSs to devices and cached at devices. Content placement schemes are crucial to the performance of cache-enabled networks and have been studied extensively. In [19], the content placement problem is formulated as the maximization of a monotone submodular function over matroid constraints, and a greedy algorithm is proposed to solve it. Reference [20] proposes a distributed belief propagation algorithm to solve the content placement problem with the aim of minimizing the download latency. Taking into account BS cooperation and the propagation delay in backhaul links, Peng et al. [21] propose a low-complexity algorithm to optimize the caching placement strategy. The above-mentioned content placement schemes are all implemented in concrete network scenarios, and their outputs are the caching states of given contents at specific BSs or devices; such schemes are termed deterministic caching policies. Another line of work focuses on probabilistic caching policies, which optimize a caching distribution for a group of cache-enabled nodes. These studies often model the networks based on stochastic geometry and find the caching policies that optimize the derived performance metrics. In [22], an optimal probabilistic caching policy is proposed to maximize the content hit probability, which can be defined based on either a signal-to-interference-plus-noise ratio (SINR) model or a Boolean model. Reference [23] proposes tier-level content placement policies in HCNs. Taking millimeter-wave and full-duplex communications into consideration, [24] proposes a content dissemination mechanism based on proactive content fetching in cache-enabled full-duplex D2D networks and analyzes its performance from an evolutionary perspective.
After contents have been cached at BSs and/or devices, content delivery schemes specify how the requested contents are delivered to users, e.g., routing, resource allocation, and transmission schemes. In [25], the cache-aware user association problem is formulated as a one-to-many game whose objective is to minimize the backhaul usage at each SBS. By leveraging both physical and social characteristics, [26] jointly optimizes user pairing, channel allocation, and power control in cache-enabled D2D networks. Cheng et al. [27] propose three power allocation algorithms with different objectives in cache-aided small cell networks with limited backhaul. For a given caching situation, [28] proposes a distributed relaxing-rounding algorithm to jointly optimize user association and resource allocation in small cell networks. Reference [29] studies multicast scheduling in cache-enabled wireless networks; by formulating the optimization problem as a Markov decision process, the authors analyze the structure of the optimal policy and propose a low-complexity suboptimal policy.
The performance of the content-centric networking paradigm highly depends on the cooperation between content placement and content delivery. Content placement determines the upper bound of the performance of content delivery, and content delivery is implemented under a given content placement. The joint optimization of content placement and content delivery takes into account the interaction between these two aspects and can improve the performance dramatically compared with separate approaches. Reference [30] formulates the joint routing and caching problem as a variant of the facility location problem and proposes an approximation algorithm to solve this NP-hard problem. In [31], the joint optimization of request routing and content caching in a network with a given topology is investigated; the authors propose approximate solutions for this problem in congestion-insensitive and congestion-sensitive models, respectively. In [32], the authors develop the optimal cooperative content caching and delivery policy in a network where both BSs and devices have cache capacity. The work in [33] jointly optimizes caching, routing, and channel assignment in collaborative small cell networks, in which network coding is applied to enable multiple BSs to cooperatively transmit contents to a user. Reference [34] considers multicasting in cache-enabled HCNs and optimizes the content caching at different tiers to maximize the successful transmission probability. The authors of [35] design a probabilistic caching policy and a random scheduling policy to maximize the successful offloading probability in cache-enabled D2D networks.

1.2. Motivation, Contributions, and Organization.
Although the joint optimization of content placement and content delivery has been investigated extensively in the literature, three important issues have not been considered in state-of-the-art research, as specified in the following.
First, the timescale of content placement is much larger than that of content delivery. The rate of content placement should accord with the variation of the content library and the content popularity distribution over it, which can be deemed constant over a few days [7]. The timescale of content delivery, which is affected by user mobility and user activity, ranges from seconds to minutes. Existing works, such as [32,33,36,37], often optimize content placement at the timescale of content delivery; i.e., the content placement is designed based on a snapshot of a network with a given user distribution. When the user distribution changes, the previously derived content placement scheme is no longer optimal and must be updated based on the new user distribution. Frequent content replacement due to user mobility and user activity congests the backhaul links, and the advantages of caching diminish or even become negative. Moreover, because of the combinatorial nature of content placement problems, their solutions often have relatively high complexity and are not practicable in a highly dynamic scenario.
Second, the local content popularity often differs from the global content popularity. The content popularity distribution is a holistic statistical measure over a large area, and it may obscure meaningful differences in content popularity among small regions. References [38,39] have verified this point based on analysis of real YouTube datasets. Most current studies simply assume identical content request probabilities among all users and neglect the geographic locality of user interests.
Third, the spatial traffic distribution is uneven over a large area. Due to unequal population sizes in different regions and differences in user behavior, the spatial traffic distribution exhibits evident heterogeneity [40,41]. The traffic demand in hot-spots is much larger than that in suburbs. In existing works, especially those whose models are constructed based on stochastic geometry, such as [22-24, 34, 35], the traffic demand is assumed to be uniformly distributed. This setting differs from real situations, and the performance of these approaches degrades in practical applications.
To address the above challenges, in this paper we formulate a framework for the joint optimization of content placement and user association in cache-enabled HCNs based on flow-level models [42,43]. In flow-level models, networks are modeled as queuing systems, in which BSs correspond to servers and user requests correspond to flows to be served by these servers. Different from traditional snapshot models, flow-level models focus on the spatial traffic demand distribution during a time period instead of the locations and demands of individual users at a certain moment. With the help of flow-level models, we jointly optimize content placement and user association based on the aggregated traffic demand during a long time period rather than the instantaneous traffic demand. Two kinds of flow-level models have been proposed in the literature: the load-noncoupled (LNC) model [42] and the load-coupled (LC) model [44]. In the LNC model, the intercell interference is assumed to be static, while in the LC model the intercell interference interacts with the loads of the other BSs. In this paper, we propose a greedy content caching and content-level selective association (GCC-CSA) algorithm in the LNC and LC models, respectively, for the joint optimization of content placement and user association in cache-enabled HCNs. The content-level spatial traffic distribution is modeled to capture the differences in content popularity among regions, and the superposition of all the content-level spatial traffic distributions forms the overall spatial traffic distribution. The joint optimization problem is formulated as a mixed integer nonlinear programming (MINLP) problem whose objective is to minimize the average delay of a typical flow. The formulation takes into account the limited backhaul capacity of SBSs and its effect on the achievable data rate of each content. To tackle this problem, we decouple it into two interrelated subproblems. First, we propose a CSA algorithm to optimize user association under a given content placement. Requests for different contents at the same location are allowed to be served by different BSs, owing to the different caching states of these contents at nearby BSs. Second, we propose a GCC algorithm that adds to each BS the content yielding the maximum reduction in the value of the cost function under the given user association. These two algorithms are executed alternately until the caches of all BSs are filled to capacity. The derived content-level selective user association remains in effect until the content-level spatial traffic distribution changes, which means that user association is optimized at the timescale of content placement.
Our main contributions are summarized as follows.
(1) We formulate the joint optimization of content placement and user association based on flow-level models. In this formulation, the content-level spatial traffic distribution is modeled to capture the locality of content popularity and the heterogeneity of the spatial traffic distribution, and user association is optimized at the timescale of content placement. To the best of our knowledge, this is the first work that addresses all the issues summarized above.
(2) We jointly optimize content placement and user association in the LNC and LC models, respectively. The LNC and LC models are two typical network models. Most existing works based on flow-level models focus on the LNC model due to its favorable properties and elegant solution. In addition to the formulation and solution in the LNC model, we also propose the corresponding formulation and solution in the LC model.
(3) We propose the GCC-CSA algorithm to tackle the joint optimization problem. We decouple the complex MINLP problem into two interrelated subproblems. The CSA algorithm is proposed to find the optimal user association under a given content placement, and the GCC algorithm is proposed to update the content placement. Several properties of these algorithms are also proved.
The rest of this paper is organized as follows. Section 2 constructs the LNC and LC models for cache-enabled HCNs. Section 3 formulates the joint optimization of cache placement and user association as an MINLP problem. In Section 4, we present the GCC-CSA algorithm in the LNC and LC models, respectively. Section 5 defines two performance metrics, the average delay and the occupied backhaul data rates, to evaluate the performance of the GCC-CSA algorithm. In Section 6, we give implementation details and a complexity analysis. Section 7 compares the performance of the proposed algorithm with that of other schemes through simulations. Finally, Section 8 concludes this paper.

System Model
We consider the downlink transmission in a cache-enabled HCN. A geographic area L ⊂ R² is covered by a set of BSs M = {1, 2, ..., M}, which includes MBSs and SBSs. M_M and M_S denote the sets of MBSs and SBSs, respectively. To alleviate congestion in their limited backhaul links, each SBS m ∈ M_S is equipped with a storage unit with capacity C_m to cache strategic contents. Since MBSs are often equipped with high-capacity backhaul links, we do not consider caching at MBSs in this paper. The total transmission bandwidth is W. The transmit power and backhaul capacity of BS m ∈ M are denoted by P_m and B_m, respectively.
In a certain time period, all the contents possibly requested by users in L constitute a file library F = {1, 2, ..., N}. The size of content n ∈ F is denoted by V_n. According to statistical analysis [5,6], the probability that a certain content is requested in L can be modeled by the Zipf distribution. If the contents are sorted by popularity in descending order, the probability that the n-th content is requested is given by

p_n = n^(−γ) / ∑_{j=1}^{N} j^(−γ),

where γ ≥ 0 characterizes the skewness of the Zipf distribution. A large γ means that a small number of popular contents account for most content requests. In the sequel, contents and files are used interchangeably.
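As a small illustration of the popularity model above, the following sketch computes the Zipf request probabilities; the library size and skewness value used below are arbitrary example values, not parameters from the paper.

```python
def zipf_popularity(num_files: int, gamma: float) -> list[float]:
    """Zipf request probabilities p_n = n^-gamma / sum_j j^-gamma for files
    ranked 1..num_files by popularity. A larger gamma concentrates the
    requests on the few most popular files."""
    weights = [n ** (-gamma) for n in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: 1000 files, skewness 0.8 (illustrative values).
p = zipf_popularity(1000, 0.8)
```

The returned list sums to one and is nonincreasing in the popularity rank, matching the definition of p_n.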
In the flow-level model, requests for content n are assumed to follow an inhomogeneous Poisson point process with arrival rate per unit area λ_n(x) at location x ∈ L. We assume that the Poisson processes of different contents are mutually independent. By the superposition theorem, the process of all requests generated at x is also a Poisson process, and its intensity is λ(x) = ∑_{n∈F} λ_n(x). The probability that content n is requested at x is q_n(x) = λ_n(x)/λ(x), and the probability that n is requested in L is

q_{n,L} = ∫_L λ_n(x) dx / ∫_L λ(x) dx.

Obviously, {q_{n,L}} follow the Zipf distribution. The content request probabilities in a subset L′ ⊂ L also follow a Zipf distribution, but the order of the contents and/or the skewness parameter may change because of the locality of content popularity. The traffic density of n at x, δ_n(x), is defined as the average required data rate of n at x per unit area, and it is obtained as δ_n(x) = V_n λ_n(x). In other words, δ_n(x) is the average amount of data required for n at x per unit area per unit time, and it characterizes the content-level spatial traffic distribution. The overall traffic density at x is δ(x) = ∑_{n∈F} δ_n(x), and it captures the overall spatial traffic variability.
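The quantities above can be computed directly from the arrival-rate maps. The sketch below uses an illustrative toy instance (three contents, two pixels; all rates and sizes are assumptions, not values from the paper):

```python
import numpy as np

# lam[n, x]: request arrival rate per unit area of content n at pixel x
# sizes[n]: content size V_n in bits (illustrative values).
lam = np.array([[2.0, 0.5],
                [1.0, 1.0],
                [0.5, 2.0]])
sizes = np.array([1e6, 2e6, 4e6])

# Content-level traffic density delta_n(x) = V_n * lam_n(x).
delta_n = sizes[:, None] * lam
# Overall traffic density delta(x) = sum_n delta_n(x).
delta = delta_n.sum(axis=0)
# Local request probability q_n(x) = lam_n(x) / lam(x).
q = lam / lam.sum(axis=0, keepdims=True)
```

Each column of `q` sums to one, and the columns differ from pixel to pixel, which is exactly the locality of content popularity described above.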
According to Shannon's formula, the radio link data rate from BS m to a device located at x is r_m(x) = W log₂(1 + SINR_m(x)), where SINR_m(x) denotes the SINR experienced by a device at x with respect to BS m:

SINR_m(x) = P_m g_m(x) / (I_m(x) + σ²),

where g_m(x) is the channel gain from BS m to location x, I_m(x) is the interference received from BSs other than BS m, and σ² is the power of the background noise. Since the data rate is evaluated at the timescale of content placement, which is much larger than the coherence time of wireless channels, fast fading is not included in g_m(x).
When a data flow requesting content n at location x is served by BS m, the load density of BS m at x with respect to n is defined as

ϱ_{m,n}(x) = δ_n(x) / r_{m,n}(x),

where r_{m,n}(x) denotes the achievable data rate of n from BS m to x, and c_{m,n} is a binary variable indicating whether BS m stores n. When c_{m,n} = 1, content n is cached at BS m and can be transmitted to the receiver without using the backhaul link; in this case, the achievable data rate of n from BS m to x is the radio link data rate, r_{m,n}(x) = r_m(x). If c_{m,n} = 0, BS m does not store n and content n must be retrieved via the backhaul link; accordingly, the achievable data rate of n from BS m is limited by the backhaul capacity B_m, i.e., r_{m,n}(x) = min{r_m(x), B_m}. The physical meaning of ϱ_{m,n}(x) is the fraction of time required to deliver the traffic density δ_n(x) from BS m to x in unit time.
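The caching-dependent achievable rate and the resulting load density can be sketched in a few lines (the numeric rates in the usage note are illustrative assumptions):

```python
def achievable_rate(radio_rate: float, backhaul_rate: float, cached: bool) -> float:
    """Achievable delivery rate of a content from a BS to a location.
    A cached content is served at the radio-link rate; a non-cached content
    must be fetched over the backhaul, so its rate is additionally capped by
    the backhaul capacity of the serving BS."""
    return radio_rate if cached else min(radio_rate, backhaul_rate)

def load_density(traffic_density: float, radio_rate: float,
                 backhaul_rate: float, cached: bool) -> float:
    """Fraction of time per unit time the BS needs to carry this traffic."""
    return traffic_density / achievable_rate(radio_rate, backhaul_rate, cached)
```

For example, with a traffic density of 10, a radio rate of 100, and a backhaul cap of 20, the load density is 0.1 when the content is cached and 0.5 when it must cross the backhaul, which is the caching gain the formulation exploits.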
Let α_{m,n}(x) denote the probability that a data flow requesting content n at location x is routed to BS m. Naturally, α_{m,n}(x) ∈ [0, 1] and ∑_{m∈M} α_{m,n}(x) = 1 for any n and x. This definition allows content-level selective association, and it subsumes traditional user association policies that are insensitive to the requested contents. Based on {α_{m,n}(x)} and {ϱ_{m,n}(x)}, the load of BS m can be expressed as

ρ_m = min{ ∫_L ∑_{n∈F} ϱ_{m,n}(x) α_{m,n}(x) dx, 1 − ε },  (6)

where ε is an arbitrarily small positive constant introduced to avoid some intractable and trivial situations in the subsequent formulation and solution. According to this definition, ρ_m can be interpreted as the total fraction of time needed for BS m to serve all its associated flows in unit time, which cannot be larger than 1.
Given {α_{m,n}(x)}, the arrival process of data flows at BS m is a Poisson process with arrival rate λ_m = ∫_L ∑_{n∈F} λ_n(x) α_{m,n}(x) dx. If the flows associated with a BS are scheduled in a round-robin manner, the BS can be modeled as an M/G/1 multiclass processor sharing (MCPS) queue [45]. "Multiclass" means that users at different locations receive different data rates depending on channel conditions and caching states, and "processor sharing" means that the associated flows are scheduled in a round-robin manner.
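The two standard queueing facts used later (the M/G/1 processor-sharing mean queue length and Little's formula) can be written down directly:

```python
def mean_flows(load: float) -> float:
    """Mean number of flows in a stable M/G/1 processor-sharing queue:
    E[N] = rho / (1 - rho)."""
    assert 0.0 <= load < 1.0, "the queue is only stable for load < 1"
    return load / (1.0 - load)

def mean_delay(load: float, arrival_rate: float) -> float:
    """Little's formula: average flow-level delay T = E[N] / lambda."""
    return mean_flows(load) / arrival_rate
```

At load 0.5 there is on average one flow in the system, so with arrival rate 2 flows per second the average delay is 0.5 seconds; this is the per-BS delay that the cost function in Section 3 minimizes in aggregate.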

Load-Non-Coupled Model.
In the LNC model, the interference received by a device is assumed to be static and independent of the activity of the other BSs. This assumption is reasonable in a system with fractional frequency reuse or enhanced intercell interference cancellation: when these techniques are applied, the intercell interference and its variation are reasonably negligible [42]. In this model, I_m(x) can be calculated as I_m(x) = θ ∑_{k≠m} P_k g_k(x), where θ ∈ (0,1) characterizes the average received interference.
According to the definition in (6), the feasible set of ρ = [ρ₁, ρ₂, ..., ρ_M]^T in the LNC model is

P_NC = { ρ | ρ_m = min{ ∫_L ∑_{n∈F} ϱ_{m,n}(x) α_{m,n}(x) dx, 1 − ε }, α_{m,n}(x) ∈ [0,1], ∑_{m∈M} α_{m,n}(x) = 1, ∀m ∈ M, n ∈ F, x ∈ L }.  (7)

With given routing probabilities {α_{m,n}(x)}, the load of BS m is independent of the loads of the other BSs; thus this model is termed the load-noncoupled model. P_NC has the following property.

Property 1 (convexity). P_NC is a convex set.

Proof. Suppose ρ¹, ρ² ∈ P_NC, and let {α¹_{m,n}(x)} and {α²_{m,n}(x)} be the routing probabilities corresponding to ρ¹ and ρ², respectively. Let ρ be a convex combination of ρ¹ and ρ². Given ϑ ∈ [0, 1], for all m ∈ M, we have ρ_m = ϑρ¹_m + (1 − ϑ)ρ²_m, and the routing probabilities α_{m,n}(x) = ϑα¹_{m,n}(x) + (1 − ϑ)α²_{m,n}(x) satisfy all the conditions in (7). Thus ρ ∈ P_NC and P_NC is a convex set.

Load-Coupled Model.
In a cellular network with a frequency reuse factor of 1, all BSs operate on the same frequency band. In this scenario, the interference received by a user varies considerably depending on the activity of the other BSs; it varies at the timescale of flow dynamics, and accurately modeling these correlations is intractable [44]. To tackle this issue, Fehske et al. [46] propose to model the dynamic interference by the time-averaged interference. If the load of BS k is treated as the probability that BS k is transmitting, the SINR can be expressed as

SINR_m(x, ρ) = P_m g_m(x) / ( ∑_{k∈M, k≠m} ρ_k P_k g_k(x) + σ² ),

where ∑_{k∈M, k≠m} ρ_k P_k g_k(x) is the average interference over a long time period, and it replaces the instantaneous interference at any moment in this period. Note that SINR_m(x, ρ) depends on the loads of all BSs except BS m. The radio link data rate in the LC model is r_m(x, ρ) = W log₂(1 + SINR_m(x, ρ)), and the load density of BS m at x with respect to n is

ϱ_{m,n}(x, ρ) = δ_n(x) / r_{m,n}(x, ρ),

where r_{m,n}(x, ρ) ≜ c_{m,n} r_m(x, ρ) + (1 − c_{m,n}) min{r_m(x, ρ), B_m}. Similarly, the feasible set P_C of ρ in the LC model is defined analogously to (7), with ϱ_{m,n}(x) replaced by ϱ_{m,n}(x, ρ). Since ρ_m is derived based on SINR_m(x, ρ), it is coupled with the loads of all the other BSs. Let F_m(ρ) = min{ ∫_L ∑_{n∈F} ϱ_{m,n}(x, ρ) α_{m,n}(x) dx, 1 − ε } and F(ρ) = [F₁(ρ), F₂(ρ), ..., F_M(ρ)]^T; then a load vector ρ ∈ P_C must be a fixed point of F(·), i.e., ρ = F(ρ). For given routing probabilities {α_{m,n}(x)} and caching states {c_{m,n}}, a unique fixed point exists in [0, 1)^M according to Theorem 1 in [46].
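The load-coupled fixed point ρ = F(ρ) can be found by straightforward iteration. The toy sketch below (two BSs, two pixels; all powers, gains, traffic values and the association map are illustrative assumptions, not from the paper) shows the coupling: each BS's load enters the other BS's interference term.

```python
import numpy as np

P = np.array([10.0, 5.0])                # transmit powers of two BSs
g = np.array([[1e-3, 2e-4],              # g[m, x]: channel gain from BS m to pixel x
              [3e-4, 1.2e-3]])
noise, W_hz = 1e-6, 10e6                 # noise power and bandwidth
delta = np.array([2e6, 3e6])             # traffic density per pixel (bit/s)
assoc = np.array([0, 1])                 # pixel x is served by BS assoc[x]
eps = 1e-3

def F(rho):
    """One application of the load mapping F_m(rho)."""
    new = np.zeros_like(rho)
    for m in range(len(P)):
        load = 0.0
        for x in np.where(assoc == m)[0]:
            # Other cells interfere in proportion to their (time-averaged) loads.
            interf = sum(rho[k] * P[k] * g[k, x]
                         for k in range(len(P)) if k != m)
            rate = W_hz * np.log2(1.0 + P[m] * g[m, x] / (interf + noise))
            load += delta[x] / rate
        new[m] = min(load, 1.0 - eps)
    return new

rho = np.zeros(2)
for _ in range(100):                     # fixed-point iteration rho <- F(rho)
    rho_next = F(rho)
    if np.max(np.abs(rho_next - rho)) < 1e-9:
        break
    rho = rho_next
```

In this lightly loaded instance the iteration settles within a handful of steps, consistent with the existence of a unique fixed point in [0, 1)^M.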

Problem Formulation
We aim to find the optimal content placement scheme {c_{m,n}} and user association policy {α_{m,n}(x)} that minimize the average content delivery delay in a cache-enabled HCN. Although the routing probabilities {α_{m,n}(x)} describe user association in a probabilistic manner, the optimal routing probabilities are binary under given caching states, as shown in Section 4. In the M/G/1 MCPS queue, the average number of flows at BS m is E[N_m] = ρ_m/(1 − ρ_m) [47], and the total number of flows at all BSs is ∑_{m∈M} ρ_m/(1 − ρ_m). According to Little's formula, minimizing the average number of flows in a BS is equivalent to minimizing the average delay of a typical flow in this BS. To minimize the average delay of all flows, the joint optimization of content placement and user association can be formulated as

min_{ρ, D}  f(ρ) = ∑_{m∈M} ρ_m/(1 − ρ_m)
s.t.  ρ ∈ P,  ∑_{n∈F} c_{m,n} V_n ≤ C_m, ∀m ∈ M_S,  (13)

where minimizing the total number of flows is equivalent to minimizing the cost function f(ρ). P denotes P_NC or P_C depending on whether the loads are coupled, and D = [c_{m,n}]_{M×N} denotes the caching state of each file at each BS. The second constraint ensures that the total size of the cached files at each BS cannot exceed the corresponding cache capacity. Due to the correlation between ρ and {α_{m,n}(x)}, we can obtain the optimal content-level selective user association in the course of finding the optimal ρ; the details are presented in Section 4. Problem (13) is an MINLP problem and is NP-hard. If D is given, however, it degenerates into a user association problem similar to the ones studied in [42,44]. In the following section, we present the proposed GCC-CSA algorithm to solve problem (13) in the LNC model and the LC model, respectively.

GCC-CSA Algorithm
In this section, we present the GCC-CSA algorithm in the LNC model and the LC model, respectively. The GCC-CSA algorithm is an iterative algorithm, and each iteration is composed of two steps: CSA and GCC. In the CSA step, the optimal user association is derived under the given caching states. Based on the derived user association, the GCC step adds the file that yields the maximum cost reduction to each BS. Beginning with empty caches, these two steps are executed alternately until no cache can accommodate any more contents.

Load-Non-Coupled Model.
First, we find the optimal user association under given caching states. In the LNC model, given D, problem (13) simplifies to

min_{ρ∈P_NC}  f(ρ) = ∑_{m∈M} ρ_m/(1 − ρ_m).  (14)

Inspired by the work in [42], we first propose a CSA algorithm to solve problem (14), as shown in Algorithm 1, and then prove its optimality.
The following theorem states the convergence of Algorithm 1 and the optimality of its output.

Theorem 2. In the LNC model, the sequence {ρ^(t)} derived from Algorithm 1 converges to the fixed point of ρ = F(ρ), and it is the unique optimal solution of problem (14).
Although user association is defined probabilistically in (7), Algorithm 1 shows that the optimal user association is deterministic: each flow class connects to exactly one BS.

Algorithm 1: The CSA algorithm for solving problem (14).
Initialization: D, small positive constants ε and η with ε > η, stepsize β ∈ [0, 1), ρ^(0) ∈ (0, 1 − ε)^M, t = 0.
while Δ > η do
  for all locations x ∈ L and contents n ∈ F, the flow requesting n at x connects to BS m_n^(t)(x) = arg max_{m∈M} r_{m,n}(x)(1 − ρ_m^(t))²;
  for all BSs m ∈ M and contents n ∈ F, calculate the coverage area of BS m with respect to n: L_{m,n}^(t) = {x ∈ L | m = arg max_{k∈M} r_{k,n}(x)(1 − ρ_k^(t))²};
  for all BSs m ∈ M, calculate the new load F_m(ρ^(t)) = min{ ∑_{n∈F} ∫_{L_{m,n}^(t)} ϱ_{m,n}(x) dx, 1 − ε };
  ρ^(t+1) = βρ^(t) + (1 − β)F(ρ^(t));
  Δ = ‖ρ^(t+1) − ρ^(t)‖₂, t := t + 1;
end while
Outputs: the optimal load ρ* = ρ^(t) and the optimal coverage areas L*_{m,n} = L_{m,n}^(t) for all m ∈ M and n ∈ F.

Given a certain D, we can obtain the optimal load ρ* and the optimal coverage areas {L*_{m,n}} from Algorithm 1. Thus, for any BS m ∈ M, we have

ρ*_m = min{ ∑_{n∈F} ∫_{L*_{m,n}} ϱ_{m,n}(x) dx, 1 − ε }.

If a noncached content n is added to the cache of BS m, its achievable rate increases from min{r_m(x), B_m} to r_m(x). Since min{r_m(x), B_m} ≤ r_m(x), the new load density satisfies ϱ̃_{m,n}(x) ≤ ϱ*_{m,n}(x), so the load of BS m and hence the cost function also decrease. Based on this fact, we propose a GCC algorithm that performs content placement in an iterative manner, as shown in Algorithm 2. In the GCC algorithm, the content that achieves the maximum cost reduction is added to each BS at each iteration. After all the BSs cache new contents, the content placement scheme D is updated and Algorithm 1 is executed again to obtain the new cell coverage areas associated with the updated D. This process continues until no more contents can be cached at any BS.
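A single CSA pass is easy to simulate. The sketch below (two BSs, two pixels; rates, traffic and step size are illustrative assumptions, not from the paper) applies the association rule arg max r(1 − ρ)² and the damped load update under the LNC model, where the achievable rates are fixed:

```python
import numpy as np

r = np.array([[5e7, 1e7],                # r[m, x]: achievable rate of BS m at pixel x
              [2e7, 6e7]])               # (already capped by backhaul if not cached)
delta = np.array([1e6, 2e6])             # traffic density per pixel (bit/s)
eps, beta = 1e-3, 0.5                    # load cap margin and damping step size

def csa_step(rho):
    # Association rule: each pixel's flows join the BS maximising r * (1 - rho)^2,
    # trading off link rate against the candidate BS's current load.
    metric = r * (1.0 - rho)[:, None] ** 2
    assoc = metric.argmax(axis=0)
    new = np.zeros_like(rho)
    for x, m in enumerate(assoc):
        new[m] += delta[x] / r[m, x]     # load density summed over the coverage area
    new = np.minimum(new, 1.0 - eps)
    # Damped fixed-point update rho <- beta*rho + (1 - beta)*F(rho).
    return beta * rho + (1.0 - beta) * new

rho = np.zeros(2)
for _ in range(200):
    rho = csa_step(rho)
```

Here each pixel ends up associated with its strongest BS, and the loads converge geometrically to the corresponding fixed point, mirroring the deterministic association produced by Algorithm 1.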
In Algorithm 2, s_t = [s_{1,t}, s_{2,t}, ..., s_{M,t}]^T denotes the occupied cache capacities of the BSs at the beginning of the t-th iteration. F_{m,t} denotes the set of noncached contents of BS m at the beginning of the t-th iteration, G_{m,t} denotes the set of contents that can still be cached at BS m at the t-th iteration, and d_{m,t} denotes the m-th row vector of D_t. With given {L*_{m,n,t}}, the loads of a BS with respect to different contents are independent of each other; thus caching the selected content at BS m produces the maximum reduction in its load. Since each BS caches the content that produces the maximum reduction in its load, the cost function f(ρ) also achieves the maximum reduction in a single iteration. The following theorem gives the convergence property of Algorithm 2.

Load-Coupled Model. In the LC model, given D, problem (13) reduces to a problem (17) that shares the same form as problem (14), and its solution can also be derived from Algorithm 1 with the slight modification that r_{m,n}(x) and ϱ_{m,n}(x) are replaced by r_{m,n}(x, ρ^(t)) and ϱ_{m,n}(x, ρ^(t)), respectively. However, to prove the optimality of the solution derived from Algorithm 1 in the LC model, P_C must have the following property [44].

Theorem 3. The sequence of cost function values {f(ρ_t)} generated by Algorithm 2 is monotonically nonincreasing and convergent.
Property 4 (full convertibility). P_C is said to have the property of full convertibility if, for any ρ, ρ̃ ∈ P_C, there exist valid routing probabilities {α_{m,n}(x)} that make the following equation hold for all m ∈ M:

ρ_m = min{ ∫_L ∑_{n∈F} ϱ_{m,n}(x, ρ̃) α_{m,n}(x) dx, 1 − ε }.

With the property of full convertibility, the following theorem guarantees that the solution derived from Algorithm 1 is also optimal in the LC model.

Theorem 5. In the LC model, if P_C has the property of full convertibility, then the sequence {ρ^(t)} derived from Algorithm 1 converges to the fixed point of ρ = F(ρ), and it is the unique optimal solution of problem (17).
Proof. This proof is similar to the proof of Theorem 2. Since the optimal routing is α*_{m,n}(x) = 1{m = arg max_{k∈M} r_{k,n}(x, ρ*)(1 − ρ*_k)²}, we have ⟨∇f(ρ*), ρ − ρ*⟩ ≥ 0 for any ρ ∈ P_C, where the full convertibility property is applied. Unlike P_NC, P_C is not necessarily a convex set. Following the approach proposed in [44], consider the convex hull Conv(P_C). Since f(ρ) is a convex function and P_C ⊂ Conv(P_C), the minimum value of f(ρ) over P_C is also achieved at ρ*. Since f(ρ) is strictly convex, the optimal solution of (17) is unique, and so is the fixed point of ρ = F(ρ). Proving that {ρ^(t)} converges to ρ* follows the same steps as in the proof of Theorem 2, and they are omitted here.
Given D, the optimal load ρ* and the optimal cell coverage {L*_{m,n}} satisfy the fixed-point equation ρ* = F(ρ*). If a certain c_{m,n} is changed from 0 to 1 while the coverage {L*_{m,n}} is kept fixed, ρ*_m will generally change, which further influences the loads of the other BSs in the LC model. The new load vector that makes the system stable, ρ̃ = [ρ̃₁, ρ̃₂, ..., ρ̃_M]^T, is obtained from the following iterative formula, starting from ρ̃^(1) = ρ*:

ρ̃_m^(t+1) = min{ ∑_{n∈F} ∫_{L*_{m,n}} ϱ_{m,n}(x, ρ̃^(t)) dx, 1 − ε }.  (22)

For BS m, caching content n raises the achievable rate, so ρ̃_m^(1) ≤ ρ*_m. For any other BS k ≠ m, the data rates of its associated flows depend on the loads of the other BSs rather than on its own load; since ρ̃_m^(1) ≤ ρ*_m, the interference seen by the flows of BS k does not increase, and hence ρ̃_k^(2) ≤ ρ̃_k^(1). By induction, assuming ρ̃^(t) ≤ ρ̃^(t−1) for t = 2, 3, ..., the same argument applied to BS m and to every BS k ≠ m yields ρ̃^(t+1) ≤ ρ̃^(t). The sequence {ρ̃^(t)} is therefore nonincreasing and bounded below, and it converges to the new stable load vector.

Performance Metrics
As shown in (13), the objective of the GCC-CSA algorithm is to minimize the average content delivery delay in a cache-enabled HCN. However, the objective function f(ρ) in (13) corresponds to the total number of flows at all BSs, and it cannot be used directly to characterize delay in the flow-level models. To evaluate the performance of the GCC-CSA algorithm in terms of delay, in this section we define the average delay at a given location and the average delay in the whole area. In addition, we define the occupied backhaul data rates of BSs to demonstrate the advantage of the GCC-CSA algorithm in terms of decreasing the backhaul load.

Average Delay.
For ease of definition and computation, we partition the continuous area L into a large number of pixels, and the user association policy is derived for each pixel. The traffic densities and radio link parameters at the center of a pixel are taken as the average traffic densities and radio link parameters in this pixel. Let pixels be indexed by y, and let Y denote the set of all pixels. In an M/G/1 MCPS queue, the time-averaged throughput of a flow requesting content n at pixel y and associated with BS m_n(y) is r_{m_n(y),n}(y)(1 − ρ_{m_n(y)}) [49] (in the LC model, this expression is r_{m_n(y),n}(y, ρ)(1 − ρ_{m_n(y)})). Thus, the average delay for requesting any content at pixel y can be obtained as

T(y) = ∑_{n∈F} q_n(y) V_n / ( r_{m_n(y),n}(y)(1 − ρ_{m_n(y)}) ).
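The pixel-level delay above reduces to two small functions; the numeric values in the usage note are illustrative assumptions:

```python
def flow_delay(file_size: float, rate: float, load: float) -> float:
    """Average delay of a flow in an M/G/1 processor-sharing queue: the
    flow's time-averaged throughput is rate * (1 - load), so the delay is
    size / (rate * (1 - load))."""
    return file_size / (rate * (1.0 - load))

def average_delay_at_pixel(sizes, rates, loads, q) -> float:
    """Average delay over the contents requested at one pixel, weighted by
    the local request probabilities q_n(y)."""
    return sum(qn * flow_delay(v, r, rho)
               for qn, v, r, rho in zip(q, sizes, rates, loads))
```

For instance, a 1 Mb file served at 10 Mb/s by a BS with load 0.5 sees an effective throughput of 5 Mb/s and hence a 0.2 s delay.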

Occupied Backhaul Data Rates.
Backhaul usage can be significantly reduced by proper content placement and user association strategies. In the flow-level models constructed in this paper, the occupied backhaul data rate of BS m is obtained as

D_{u,m} = ∑_{n∈F} (1 − c_{m,n}) ∫_L δ_n(x) α_{m,n}(x) dx.

The average occupied backhaul data rates of MBSs and SBSs are calculated as D_{u,M} = ∑_{m∈M_M} D_{u,m}/|M_M| and D_{u,S} = ∑_{m∈M_S} D_{u,m}/|M_S|, respectively.
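Numerically, only the traffic of non-cached contents counts toward the backhaul. The arrays below are illustrative assumptions (one BS, two contents, two pixels), not values from the paper:

```python
import numpy as np

delta_n = np.array([[3e6, 1e6],          # delta_n[n, x]: traffic density of content n at pixel x
                    [2e6, 2e6]])
alpha = np.array([[1.0, 0.0],            # alpha[n, x]: fraction of that traffic routed to this BS
                  [1.0, 1.0]])
cached = np.array([1, 0])                # caching state c_{m,n} at this BS

# Occupied backhaul rate: sum over non-cached contents of the routed traffic.
backhaul_rate = ((1 - cached)[:, None] * delta_n * alpha).sum()
```

Here content 0 is cached, so only content 1's 4 Mb/s of routed traffic crosses the backhaul; caching content 1 instead would change the figure to 3 Mb/s, which is the kind of trade-off the GCC step evaluates.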

Implementation and Complexity
In this paper, the joint optimization of content placement and user association is formulated based on the content-level spatial traffic distribution δ_n(x) of each content n in a given area L. Time is divided into multiple periods, which range from several hours to several days. The content-level spatial traffic distribution is assumed to be static during each time period, and it changes when the next time period starts. Before a time period starts, {δ_n(x)} for the forthcoming period should be estimated in advance. Generally, {δ_n(x)} correlates with the content popularity distribution and the spatial traffic distribution. There have been many studies on the prediction of content popularity distributions [50][51][52] and the analysis of spatial traffic distributions [40,53]. With the aid of these methods, we can estimate {δ_n(x)} accurately.
Once {λ_f(y)} is obtained, it is fed into the GCC-CSA algorithm to derive the content placement and user association policy that takes effect during the next time period. Thus, the GCC-CSA algorithm is an offline algorithm. Although the CSA algorithm is essentially a distributed online algorithm [42,44], it must be executed at the centralized controller in our scheme. The user association policy b_f(y) of each content f at each pixel y can be calculated by parallel computing, e.g., with the NVIDIA CUDA toolkit, to reduce the running time. The simulations in this paper are conducted in this way.
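The per-pixel independence is what makes this step embarrassingly parallel. A CPU-side sketch of the structure follows (the paper uses CUDA; here a thread pool stands in, the inputs are hypothetical, and picking the rate-maximizing candidate BS is a simplification of the actual CSA rule):

```python
from concurrent.futures import ThreadPoolExecutor

def associate_pixel(task):
    # task: (pixel id, content id, {candidate BS: effective rate at this pixel})
    pixel, f, rates = task
    return pixel, f, max(rates, key=rates.get)  # best candidate for (pixel, f)

def associate_all(tasks, workers=4):
    # Each (pixel, content) pair is solved independently, so the map
    # distributes trivially across workers (or CUDA threads).
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return {(p, f): b for p, f, b in ex.map(associate_pixel, tasks)}
```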
The associated BS b_f(y) of a data flow depends on the requested file f and the location y. Contents can be identified by naming them at the network layer [8]. With positioning techniques in cellular networks [54], the locations of users can also be obtained easily. Based on these techniques, when a content request arrives, it is routed to the corresponding BS with limited signaling overhead.
In the following, we analyze the complexity of the GCC-CSA algorithm in the LC model; its complexity in the LNC model can be derived similarly. At each iteration of Algorithm 1, the number of operations for the first three steps is 3|Y|, and the number of operations for the last two steps is 2. Denote by I_max the number of iterations needed for Algorithm 1 to converge; then the total number of operations of Algorithm 1 is N_1 ≜ I_max(3|Y| + 2). Denote by J_max the number of iterations needed for equation (22) to converge; then the total number of operations of (22) is N_2 ≜ J_max|Y|. To capture the essence of the complexity of Algorithm 3, we make two assumptions: (1) all files have the same size; (2) each of the N BSs can cache at most K files. In the worst case, Algorithm 3 performs NK greedy placement steps, each evaluating at most NK candidate placements at a cost of about N_1 + N_2 operations, plus a final run of Algorithm 1. After some mathematical manipulations, the complexity of the GCC-CSA algorithm in the LC model is obtained as O(max{I_max, J_max}|Y|N²K²).

Simulation Results
In this section, we validate the performance of the proposed GCC-CSA algorithm based on the spatial traffic distribution derived from a real network. For comparison, we also evaluate two other schemes: (1) most popular caching with max-SINR association, denoted by MPC-MSA, and (2) most popular caching with content-level selective association, denoted by MPC-CSA. In MPC, all BSs cache the most popular contents of the given area, as considered in [16,17]. The MPC policy only considers the overall content popularity distribution and ignores the heterogeneity of user preferences over a large area. MSA is the user association method generally implemented in current networks, and it serves as the baseline association scheme.

Simulation Setup.
We consider a square area with side length 2 km, as shown in Figure 1. Seven MBSs and 10 SBSs are located in this area. The MBSs are deployed on a hexagonal grid with an intersite distance of 800 m, and the SBSs are deployed randomly. The minimum intersite distance between an SBS and another SBS (or an MBS) is 400 m. The bandwidth is 10 MHz, and the noise power spectral density is -174 dBm/Hz. The spatial traffic distribution during a certain time period, derived from a network operator [53], is also shown in Figure 1. The content library contains 50 files. To simulate the locality of content popularity, the area is partitioned into 9 regions with different content popularity distributions. All regions share the same skewness parameter but differ in the rank order of content popularity; for example, Content 1 is the most popular content in Region 1 but only the fifth most popular in Region 2. Two values of the skewness parameter, 0.8 and 1.2, are considered. All files are assumed to have the same size of 10 MB. For conducting the experiments on computers, the whole area is divided into 200 × 200 pixels. Other simulation parameters are given in Table 1.
Tables 2 and 3 give the average delay of the three schemes under various network configurations in the LNC model and the LC model, respectively. The SBS-area delay D̄_S, the MBS-area delay D̄_M, and the overall delay D̄ are defined in Section 5.1. We can draw the following conclusions from these tables.
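As a side note on the setup, the region-specific popularity distributions (same skewness, permuted rank order) could be generated as in this sketch, assuming a Zipf-type form; the function names and seeding are illustrative:

```python
import random

def zipf_popularity(num_files, skew):
    # Zipf-type law: probability of rank r proportional to 1 / r**skew.
    weights = [1.0 / (r ** skew) for r in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def regional_popularity(num_files, skew, num_regions, seed=0):
    # Same skewness in every region; only the rank order of contents differs.
    base = zipf_popularity(num_files, skew)
    rng = random.Random(seed)
    regions = []
    for _ in range(num_regions):
        ranks = list(range(num_files))
        rng.shuffle(ranks)  # content f takes the popularity of rank ranks[f]
        regions.append([base[ranks[f]] for f in range(num_files)])
    return regions
```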
(1) Under the same configuration, the overall average delay in GCC-CSA is smaller than that in MPC-MSA and MPC-CSA, especially when the SBS backhaul capacity is small. The proposed GCC algorithm always makes BSs cache the contents that produce the maximum reduction in average delay, and the CSA algorithm lets flows requesting different contents connect to different BSs, which reduces the average delay further. When the SBS backhaul capacity is small, the CSA algorithm prevents many flows from associating with SBSs that do not cache the requested files, so the loads of SBSs decrease tremendously. When the backhaul capacity is large, however, the caching states of SBSs have only a minor influence on the loads of SBSs according to (5), and the advantage of the GCC-CSA scheme over the other schemes diminishes.
(2) In the same scenario, the three kinds of average delay in all three schemes decrease as the SBS backhaul capacity increases (except the MBS-area delay in the MPC-MSA scheme in the LNC model). When the backhaul capacity increases, the achievable data rates from SBSs generally increase, and the loads of SBSs and the SBS-area delay decrease accordingly. In the GCC-CSA and MPC-CSA schemes, increased backhaul capacity expands the coverage areas of SBSs; the coverage areas of MBSs therefore shrink and the MBS-area delay decreases. In the LNC model, the radio link data rates provided by MBSs are independent of the loads of other BSs, so the MBS-area delay in MPC-MSA remains unchanged. In the LC model, the interference received by flows associated with MBSs is attenuated as the loads of SBSs decrease, and the MBS-area delay in MPC-MSA decreases accordingly.
(3) For a given SBS backhaul capacity, the three kinds of average delay in the GCC-CSA scheme decrease (or remain unchanged) as the SBS cache capacity or the skewness parameter increases. A larger cache capacity means that more contents can be cached at SBSs, and a larger skewness parameter implies that more requests target a few popular contents. Both increase the probability of finding the requested files at SBSs and thus reduce the average delay in the GCC-CSA scheme. Because of the locality of content popularity, the contents cached in the MPC-MSA and MPC-CSA schemes are not always popular in local areas, so increasing the cache capacity or the skewness parameter can even enlarge their average delay.
(4) In all scenarios, the SBS-area delay in GCC-CSA is smaller than that in MPC-MSA when the SBS backhaul capacity is small, but larger when the backhaul capacity is large. The CSA mechanism restricts the coverage areas of SBSs when the backhaul capacity is small, which yields a smaller SBS-area delay than the MSA mechanism. When the backhaul capacity becomes large, data flows that were originally associated with MBSs tend to transfer to SBSs to reduce the overall average delay in the GCC-CSA scheme. In this case, the SBS-area delay in GCC-CSA exceeds that in MPC-MSA, but the overall average delay in GCC-CSA is always smaller.
To visually show the advantage of the GCC-CSA scheme in reducing average delay, Figures 4 and 5 illustrate the three kinds of average delay in the LNC and LC models, respectively, when the SBS backhaul capacity is 10 Mbps. We can observe that the SBS-area delay in GCC-CSA is smaller than that in MPC-MSA and MPC-CSA, and the reduction relative to MPC-MSA becomes apparent when the SBS cache capacity is small. When the cache capacity is 5 and the skewness parameter is 0.8, the proposed scheme reduces the SBS-area delay by 19.4% and 36.1% compared with the MPC-CSA scheme in the LNC and LC models, respectively; when the cache capacity is 5 and the skewness parameter is 1.2, the reductions are 24.3% and 37.6%. Moreover, the MBS-area delays and the overall delays of these schemes are almost equal, because the coverage areas of MBSs are much larger than those of SBSs and the MBS-area delays are close to each other. When the SBS backhaul capacity is 100 Mbps, the average occupied backhaul data rate of MBSs in GCC-CSA is 2078.5 bit/s less than that in MPC-MSA, while the average occupied backhaul data rate of SBSs in GCC-CSA is only 320.0 bit/s greater. On the whole, the proposed scheme occupies less backhaul capacity than the MPC-MSA scheme. Furthermore, both average occupied backhaul data rates in GCC-CSA are always smaller than those in MPC-CSA, which demonstrates the advantage of the GCC algorithm. The same conclusions can be drawn in other scenarios in the LNC and LC models. The saved backhaul capacity can be used to provide other services, such as live streaming, video calls, and online games.

Optimality of GCC Algorithm.
We demonstrate the optimality of the GCC algorithm in a simple network, as shown in Figure 10. The simplified content library contains 10 files, and the cache capacity of each SBS is 2. The four regions separated by dashed lines share the same skewness parameter but differ in the rank order of content popularity. We compare the GCC algorithm with exhaustive search and the MPC policy, applying the CSA algorithm together with each content placement scheme. Table 4 lists the cost function values obtained by these schemes under various network configurations. In this simple network, GCC always finds the optimal content placement with much lower complexity than exhaustive search. All schemes obtain the optimal solution when the SBS backhaul capacity is 100 Mbps, because the influence of content placement at SBSs becomes weak when the backhaul capacity is large enough, according to (5).
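The shape of this comparison can be mimicked in a few lines. The cost function below is a stand-in, and the greedy rule is a generic marginal-gain heuristic rather than the exact GCC update:

```python
from itertools import combinations

def exhaustive_placement(files, cache_size, cost):
    # Brute force over all C(F, K) cache subsets: optimal but exponential.
    return min(combinations(files, cache_size), key=cost)

def greedy_placement(files, cache_size, cost):
    # Greedily add the file giving the largest cost reduction at each step.
    cached = ()
    for _ in range(cache_size):
        best = min((f for f in files if f not in cached),
                   key=lambda f: cost(cached + (f,)))
        cached += (best,)
    return cached
```

For additive (modular) costs such as the toy one in the test below, greedy and exhaustive search coincide, which mirrors the observation that GCC matches exhaustive search in the simple network.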

Conclusions
In this paper, we have proposed the GCC-CSA algorithm for the joint optimization of content placement and user association in cache-enabled HCNs based on flow-level models. By modeling cellular networks as queuing systems, we have taken into account the discrepancy between the timescales of content placement and user association, the locality of content popularity, and the heterogeneity of spatial traffic distribution, which are often neglected in the literature. The objective of the joint optimization is to minimize the average delay of data flows, and the problem is formulated as an MINLP problem in the LNC and LC models, respectively. Given the contents cached at BSs, we have proposed a CSA algorithm that allows data flows requesting different contents to connect to different BSs. A heuristic GCC algorithm is also proposed to tackle the content placement problem, and its convergence is proved. Simulation results show that the proposed GCC-CSA algorithm reduces the average content delivery delay compared with traditional approaches. In particular, when the backhaul capacity of SBSs is stringent, the proposed algorithm significantly decreases the average content delivery delay in the coverage areas of SBSs compared with the traditional MPC-MSA scheme. In addition, the proposed algorithm reserves larger backhaul capacities for the transmission of contents that are not reusable.

Algorithm 2: GCC-CSA algorithm for solving problem (13) in the LNC model (pseudocode box: it iterates Algorithm 1 under each candidate placement and outputs the content placement scheme D* and the corresponding optimal load ρ*).
L*_{j,f} indicates the coverage area of BS j with respect to content f; the coverage areas associated with different contents may differ from each other depending on the caching states and the backhaul capacity.
Define the set Y_M ≜ {y ∈ Y : b_f(y) ∈ M_M, ∀f ∈ F}, i.e., the set of locations that connect only to MBSs. Its complement, denoted Y_S, is the set of locations that may be associated with SBSs for some contents. Clearly, Y_M ∪ Y_S = Y and Y_M ∩ Y_S = ⌀. The average delays of flows at locations in Y_M and Y_S are calculated by D̄_M = ∑_{y∈Y_M} E[T | y]/|Y_M| and D̄_S = ∑_{y∈Y_S} E[T | y]/|Y_S|, respectively, and the average delay over all locations is D̄ = ∑_{y∈Y} E[T | y]/|Y|.
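A sketch of these set-level averages; the per-pixel delay map and association table are hypothetical inputs:

```python
def partition_delays(delay, assoc, mbs_set):
    # Y_M: pixels served by an MBS for EVERY content; Y_S: all other pixels.
    y_m = [y for y in assoc if all(b in mbs_set for b in assoc[y].values())]
    y_s = [y for y in assoc if y not in set(y_m)]
    d_m = sum(delay[y] for y in y_m) / len(y_m)  # average delay over Y_M
    d_s = sum(delay[y] for y in y_s) / len(y_s)  # average delay over Y_S
    d_all = sum(delay.values()) / len(delay)     # average delay over Y
    return d_m, d_s, d_all
```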

7.2. Illustration of Content-Level Selective Association.
Figures 2 and 3 illustrate the coverage areas associated with Content 1 and Content 2 derived from the GCC-CSA algorithm in the LNC model and the LC model, respectively. We can observe that the coverage areas of SBS1, SBS4, and SBS8 associated with Content 1 and Content 2 are quite different, especially in the LC model. The difference stems from the caching states of the two contents: in both models, SBS1 and SBS8 cache Content 1 but not Content 2, whereas SBS4 caches Content 2 but not Content 1; the caching states of the two contents are identical at the other SBSs. Taking SBS8 as an example, since it does not cache Content 2, its coverage area with respect to Content 2 must shrink to prevent heavy load.
7.3. Statistical Properties of Delay.
Figures 6 and 7 show the cumulative distribution functions (CDFs) of E[T | y] under a given network configuration in the LNC and LC models, respectively. The CDFs of E[T | y] over Y_M in the three schemes almost overlap. However, the distributions of E[T | y] over Y_S in GCC-CSA are shifted toward smaller delays compared with the other schemes.

Figure 4: Comparison of average delay in the LNC model when the SBS backhaul capacity is 10 Mbps.

Figure 5: Comparison of average delay in the LC model when the SBS backhaul capacity is 10 Mbps.
Figure 6: CDFs of E[T | y] in the LNC model.

Table 2: The average delay of the three schemes in the LNC model.

Table 3: The average delay of the three schemes in the LC model.
Figures 8 and 9 compare the average occupied backhaul data rates of MBSs and SBSs in these schemes in the LNC and LC models, respectively. For a given cache capacity and skewness parameter, the occupied backhaul rates in MPC-MSA do not change with the SBS backhaul capacity, because the MSA method is independent of backhaul capacity. In GCC-CSA and MPC-CSA, the MBS backhaul rate decreases and the SBS backhaul rate increases as the SBS backhaul capacity grows: some data flows that originally connected to MBSs become associated with SBSs to lower the overall average delay. We illustrate the advantage of the GCC-CSA scheme in backhaul usage using Figure 8(c); the other subfigures show the same trends. When the SBS backhaul capacity is 1 Mbps, the average occupied backhaul rate of SBSs in GCC-CSA is 1722.5 bit/s less than that in MPC-MSA, while the average occupied backhaul rate of MBSs in GCC-CSA is only 680.1 bit/s greater. When the SBS backhaul capacity is 10 Mbps, both average occupied backhaul rates in GCC-CSA are smaller than those in MPC-MSA.


Table 4: The cost function values of the three content placement schemes.