Link Stress Reduction against Bursty Arrivals of Content Requests

Content delivery networks are designed to extend the end-to-end transport capability of the Internet to cope with increases in video traffic. For further improvement, bursty request arrivals should be efficiently addressed. As opposed to previous approaches, in which the best client-server pair is selected individually (individual optimization), this paper proposes an algorithm for dealing with simultaneously arriving requests, in which client-server pairs are selected such that all requests receive good service (social optimization). The performance of the proposed algorithm is compared with that of the closest algorithm, an individual optimization algorithm, under the condition that a large number of requests arrive simultaneously. The evaluation criterion is the worst link stress, which is the largest number of streams per link. The numerical results show that the proposed algorithm is effective for large-scale networks and that the closest algorithm does not provide near-optimal solutions, especially when all requests arrive in a small part of the network or when there are many servers.


Introduction
Video traffic will be increasingly prevalent on the Internet. According to Cisco's traffic forecast for 2009-2014, global IP traffic is expected to increase by 34% per annum, and much of the increase is attributed to the delivery of video data [1]. Video traffic typically consumes a large amount of network bandwidth for a long time. Furthermore, some video content providers such as YouTube have begun to provide high-definition video streaming services. It is fully anticipated that even more efficient and scalable video delivery schemes will be required.
The video delivery approach based on peer-to-peer (P2P) networking is currently popular since it yields several advantages such as resource scalability, network path redundancy, and self-organization [2,3]. A large number of P2P-based video delivery techniques are now available, and some are used for commercial purposes [4,5]. Nevertheless, P2P systems still pose some challenges such as resilience, underlay awareness, and security [6,7]. Meanwhile, content delivery networks (CDNs) have evolved to improve the scalability and reliability of Web sites, and their focus has shifted to media delivery. CDNs extend the end-to-end transport capability of the Internet by employing techniques designed to optimize content delivery. Typical techniques are Web caching, server-load balancing, and request routing [8]. This paper focuses on the CDN server assignment scheme that prevents congestion when bursts of requests arrive.
Most commercial CDN providers, such as Akamai and Limelight Networks, follow the overlay approach, in which servers and caches distributed over the network manage content delivery. The underlay network components (e.g., routers) play no active role in content delivery. There are two classes of overlays: unstructured and structured [9]. Structured overlays, which are organized with specific topologies, are relatively complex, whereas their routing and searching operations tend to be efficient. Some frequently used overlay topologies are trees [10], rings [11], meshes [12,13], and hypercubes [14,15]. The hypercube overlay considered in this paper has attractive topological properties for video delivery: low node degree, small network diameter, recursive construction, and independent paths [14].
Previous server assignment approaches are classified as individual optimization, in which the best client-server pair is selected individually in terms of the number of hops, the round-trip time, and/or the server and network loads [16-19]. In the case of bursty request arrivals (flash crowds), however, individual optimization may not lead to social optimization, which provides good service quality for all requests. Some queueing models show that the two do not agree under heavy loads [20-22]. This paper formulates an optimization problem and then proposes a social optimization algorithm. The numerical results show that the proposed algorithm is effective, especially when all requests arrive in a small part of the network or when there are many servers.
This paper is organized as follows: Section 2 defines the routing rules in the hypercube overlays. Section 3 specifies the content delivery model and then formulates the server assignment problem. Section 4 proposes a heuristic algorithm for the problem. Section 5 compares the performances of the proposed algorithm and an individual optimization algorithm under the condition that a large number of requests arrive simultaneously. Finally, Section 6 presents the conclusions.

Hypercube Routing
This section defines the routing in the hypercube overlays. The K-dimensional hypercube has 2^K nodes and K * 2^(K-1) edges [23]. Each node corresponds to a K-bit binary string (node ID), and two nodes are linked with an edge if their node IDs differ in precisely one bit. As a consequence, each node is adjacent to K other nodes, one for each bit position, and the number of hops between any two nodes does not exceed K. Figure 1 illustrates a three-dimensional hypercube. Another important feature of the hypercube is independent routes [24]. Let i and j be any two nodes of a K-dimensional hypercube. There are K independent paths between i and j, and their lengths are less than or equal to H(i, j) + 2, where H(i, j) stands for the Hamming distance between nodes i and j.
Routing on the hypercube is simple and does not require routing tables because any two nodes whose node IDs differ in one bit are connected. In the case of a three-dimensional hypercube, for example, if node 000 needs to transmit packets to node 011, since nodes 010 and 001 are each directly connected to both nodes 000 and 011, there are two shortest-hop paths: 000-010-011 and 000-001-011. To fix a route for each pairing of source and destination, we assume that only the first path is used. If all nodes obey this routing rule, the routing paths from a source node to the other nodes are deterministically given. The binomial tree [25] in Figure 1 represents the routing paths from node 000 to all other nodes in the three-dimensional hypercube. The figure also shows that the number of hops does not exceed three along any path. Hereinafter, the term binomial tree refers to the routing paths from the root node to all other nodes. The binomial tree rooted at node 101 can be derived by XORing every node ID in Figure 1 with 101.
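As a concrete illustration, the deterministic routing rule can be sketched in Python. Since the text only fixes one of the shortest-hop paths, the bit-correction order used below (lowest-order differing bit first) is an assumption made for illustration, not necessarily the paper's exact rule:

```python
def route(src: int, dst: int, K: int) -> list[int]:
    """Return the node sequence of a fixed shortest-hop path from
    src to dst in a K-dimensional hypercube.  Differing node-ID bits
    are corrected one at a time; correcting the lowest-order bit
    first is an illustrative assumption."""
    path = [src]
    node = src
    for b in range(K):
        if (node ^ dst) & (1 << b):   # bit b of the IDs still differs
            node ^= 1 << b            # flip it: hop to that neighbor
            path.append(node)
    return path

# Example from the text: node 000 sending to node 011 in a 3-cube.
print(["{:03b}".format(n) for n in route(0b000, 0b011, 3)])
# → ['000', '001', '011']
```

The path length equals the Hamming distance between the two IDs, so no route exceeds K hops, matching the property stated above.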

Content Delivery Model
Let us consider a content delivery system consisting of origin servers and surrogate servers connected in a hypercube overlay. The origin servers have the definitive version of the content. The surrogate servers, which are located close to users and receive content requests, store a copy of the content. Through the interaction among the surrogates, one of the surrogates gives content to a user if possible; otherwise (i.e., for a cache miss), the requested content is delivered from an origin to the user via the surrogate that received the request.
In this model, at any instant in time, any node in the hypercube acts as one of the three types of servers: 1) An origin server, 2) A surrogate server that is relaying a stream from an origin to a user or that has just received a request which causes a cache miss, 3) A surrogate server that does not need to interact with origin servers.
Hereinafter, we refer to a server of the first type as a server, a server of the second type as a client, and a user request that causes a cache miss as a request (i.e., this paper regards a server of the second type as a client that is served by a server of the first type).
Let S and C denote the sets of servers and requests, respectively. Simultaneous request arrivals are dealt with under the condition that N > M > 1, where M = |S| and N = |C|. The request partitioning problem considered is to assign requests to multiple servers in such a way that the quality of the assignment is maximized, on the basis of the following assumptions: 1) The request sets C_i, i in S, are non-overlapping, where C_i is the request set assigned to server i.
2) Servers may have different processing capabilities (see Subsection 3.3).
3) For each request, one stream is delivered from a server to the user via the client that received the request.
4) Requests may be preassigned to a server (see Subsection 3.3).

Worst Link Stress
The quality of the assignment is measured using the worst link stress. Assume that the partition {C_i} is given and that all requests in C are receiving delivery service. Then the number of streams on each hypercube link can be determined. Let s_i(e) be the number of streams on link e that originate from server i for serving all requests in C_i. The link stress (LS) of link e represents the number of streams on the link and is given by

LS(e) = Σ_{i in S} s_i(e). (1)

The worst link stress (WLS) is the greatest link stress of all links in the hypercube. Strictly, WLS is given by

WLS = max_{e in E} LS(e), (2)

where E is the set of all links in the hypercube.
Let us calculate WLS using the binomial tree in Figure 1 under the condition that node 000 is the only server and two requests arrive simultaneously at nodes 101 and 111. In this case, M = 1 and N = 2. According to Figure 1, there are four links used for stream delivery; the link incident to the root carries both streams, so from (2) we have WLS = 2. The WLS indicates the degree of congestion, since congestion typically occurs at links where a large number of streams are flowing. In order to obtain a small WLS value, traffic concentration on any single link must be avoided. Table 1 lists the definitions of symbols used frequently in this paper.
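The WLS calculation can be sketched as follows. The routine below assumes the fixed shortest-hop routing of Section 2 (here, lowest-order differing bit corrected first, an illustrative assumption), and the assignment mapping from server node IDs to served client node IDs is an illustrative data structure:

```python
from collections import Counter

def route(src: int, dst: int, K: int) -> list[int]:
    """Fixed shortest-hop path (lowest-order differing bit first)."""
    path, node = [src], src
    for b in range(K):
        if (node ^ dst) & (1 << b):
            node ^= 1 << b
            path.append(node)
    return path

def worst_link_stress(assignment: dict[int, list[int]], K: int) -> int:
    """WLS per Equation (2): the largest number of streams carried by
    any single hypercube link, given each server's client nodes."""
    stress = Counter()
    for server, clients in assignment.items():
        for client in clients:
            nodes = route(server, client, K)
            for u, v in zip(nodes, nodes[1:]):
                stress[frozenset((u, v))] += 1  # links are undirected
    return max(stress.values(), default=0)

# The example above: one server at 000, requests at nodes 101 and 111.
print(worst_link_stress({0b000: [0b101, 0b111]}, 3))  # → 2
```

Both streams share the first link out of the root, which is exactly the traffic concentration the assignment algorithm tries to avoid.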

Optimization Problem
The request partitioning problem P is formulated as follows: find the partition {C_i} that minimizes WLS subject to Γ_i ⊆ C_i for every i in S, where Γ_i is the set of requests preassigned to server i.
When C has changed due to bursty arrivals, a new partition must be calculated. If server k is providing service for request c when a new partition must be calculated, the partition is obtained under the condition that c ∈ Γ_k, so that the ongoing delivery is not interrupted. The preassignment may also be used for reducing interdomain traffic.
Servers may not have the same processing capability. Let L_i be the number of requests assigned to server i (i.e., L_i = |C_i|) and let f_i be the processing capacity of server i. To balance the load among heterogeneous servers, L_i should increase with f_i.
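One simple way to realize capacity-proportional loads is to give each server a quota proportional to f_i. The largest-remainder rounding used below is an illustrative choice, not a rule taken from the paper:

```python
def proportional_quotas(capacities: list[float], N: int) -> list[int]:
    """Split N requests among servers so that the quota L_i grows
    with the processing capacity f_i.  Fractional shares are rounded
    with a largest-remainder rule (an illustrative choice)."""
    total = sum(capacities)
    shares = [N * f / total for f in capacities]
    quotas = [int(s) for s in shares]
    leftover = N - sum(quotas)
    # Hand any remaining requests to the largest fractional remainders.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas

print(proportional_quotas([1.0, 1.0, 2.0], 8))  # → [2, 2, 4]
```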

Proposed Algorithm
Each server k has two sets U and A, which are the set of requests not yet selected and the set of requests either selected by server k or preassigned to server k, respectively. Note that A depends on server k but U does not. The algorithm proposed for solving the optimization problem P is specified as follows: 1) At every server k in set S, the algorithm starts with the initial sets U and A (i.e., A = Γ_k, and U contains every request that has not been preassigned). 2) Each server k in turn selects one request from set U in an arbitrary order according to Algorithm 4. If server k selects request c, then Algorithm 4 updates U and A such that c is removed from U and added to A.
3) After the selection, server k informs the other servers about what has been selected so that each uses the information to update its own set U.
4) The algorithm ends at the server that selects the last request in U. Algorithm 4 is based on the following proposition:
Proposition 1. The WLS is the LS of a link connecting the root to a binomial subtree if M = 1.
Proof. Assume that the number of streams on a link within subtree B is greater than the number of streams on the link from the root to B. Then, at least one of the streams on that link does not stem from the root. This contradicts M = 1.
Figure 2 shows binomial subtrees B_0, B_1, B_2, B_3, where B_k (a binomial subtree of order k) is a k-dimensional hypercube rooted at a node that is directly connected to the root (node 0000). The proposition suggests that the WLS is produced by one of the four links connecting the root and the subtrees. Therefore, traffic load should be balanced among these links. The load balancing, however, is not easy, since a decision made by one server affects the other servers' decisions.
Algorithm 4 selects a request c from set U as follows: find the smallest n that satisfies condition (6); select request c uniformly from U_n; remove request c from U; and add request c to A.
The following explains Algorithm 4: Let X be the number of requests that server k is expected to select in the future. Tuple (U, A, X) is partitioned into K tuples (U_i, A_i, X_i) based on where requests come from (see Figure 3); that is, U = U_0 ∪ ... ∪ U_{K-1}, A = A_0 ∪ ... ∪ A_{K-1}, and X = X_0 + ... + X_{K-1}. Note that each server has K different tuples (U_i, A_i, X_i).
The expected number X_i is obtained as follows: Assume that server k uniformly selects X requests at a time. Then the number of selected requests from subtree B_i has a hypergeometric distribution with mean X|U_i|/|U|. We adopt this mean as X_i; that is, X_i = X|U_i|/|U| for i = 0, 1, ..., K-1. To balance the load, Algorithm 4 selects a request from U_n if

|A_n| + X_n <= |A_i| + X_i for all i = 0, 1, ..., K-1. (6)

The load balancing means that |A_i|, i = 0, 1, ..., K-1, are almost the same when the algorithm finishes execution. If (6) holds, the final |A_n| value is probably small; therefore, the server immediately raises |A_n| before U_n becomes an empty set. If two or more integers n satisfy (6), the smallest integer is used.
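A minimal sketch of one selection step follows. Condition (6) is read here as "pick the smallest subtree index n minimizing |A_n| + X_n among non-empty U_n"; since the original statement of (6) is garbled in the source, this reading is an assumption recovered from the load-balancing discussion above:

```python
import random

def select_request(U_parts: list[list], A_parts: list[list],
                   X_total: float):
    """One selection step of Algorithm 4 (a sketch).  U_parts[i] and
    A_parts[i] hold the unselected / selected requests arriving from
    binomial subtree B_i; X_total is the number of requests this
    server still expects to select in the future."""
    total_u = sum(len(u) for u in U_parts)
    # Hypergeometric mean: expected future selections from subtree i.
    X = [X_total * len(u) / total_u for u in U_parts]
    # Reconstructed condition (6): smallest n minimizing |A_n| + X_n
    # among subtrees that still contain unselected requests.
    candidates = [i for i, u in enumerate(U_parts) if u]
    n = min(candidates, key=lambda i: (len(A_parts[i]) + X[i], i))
    c = random.choice(U_parts[n])      # uniform choice within U_n
    U_parts[n].remove(c)
    A_parts[n].append(c)
    return c
```

Repeated calls keep the per-subtree counts |A_i| nearly equal, which by Proposition 1 balances the stress on the links leaving the root.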

Performance Comparisons
This section compares the performances of three algorithms: the random, closest, and proposed algorithms. In all of these algorithms, each server k in set S in turn selects one request in an arbitrary order until the number of requests that server k must serve reaches L_k. In the random algorithm, one request is selected uniformly from those not yet selected. In the closest algorithm, each server selects a request coming from the closest client, where the distance between server s and client c is measured by the number of hops from s to c. If there is more than one such request, a request from a client in the lowest-order subtree is selected.
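The two baseline policies can be sketched as below. The function and parameter names (`assign`, `client_of`, `quota`) are illustrative, the tie-break by client node ID is a simplification of the lowest-order-subtree rule, and the hop count is computed as the Hamming distance between node IDs:

```python
import random

def hamming(a: int, b: int) -> int:
    """Hop count between two hypercube nodes = Hamming distance."""
    return bin(a ^ b).count("1")

def assign(servers: list[int], client_of: dict[str, int],
           quota: int, policy: str = "closest") -> dict[int, list[str]]:
    """Round-robin assignment sketch: each server in turn takes one
    request until it reaches its quota.  client_of maps a request ID
    to the node ID of the client that received it (illustrative)."""
    unselected = list(client_of)
    chosen = {s: [] for s in servers}
    while unselected:
        progressed = False
        for s in servers:
            if not unselected or len(chosen[s]) >= quota:
                continue
            if policy == "random":
                c = random.choice(unselected)
            else:  # "closest": fewest hops; ties broken by node ID
                c = min(unselected,
                        key=lambda r: (hamming(s, client_of[r]),
                                       client_of[r]))
            unselected.remove(c)
            chosen[s].append(c)
            progressed = True
        if not progressed:       # all quotas are full; stop
            break
    return chosen

print(assign([0b000], {"r1": 0b101, "r2": 0b111}, 2))
# → {0: ['r1', 'r2']}  (r1 is closer: 2 hops vs. 3)
```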

The closest algorithm is an individual optimization algorithm since it yields the closest client-server pairs one by one. In this simultaneous arrival scenario, the term closest indicates not only the lowest hop count but also the lowest round-trip time, since there are no ongoing streams when the requests arrive.
Table 2 lists the default parameter values. For each parameter setting, 10,000 calculations are performed. All of the algorithms are impartially evaluated by using the same node IDs for the M servers and the N clients that receive user requests. These IDs are uniformly selected unless otherwise mentioned. Every server handles the same number of requests (i.e., L_k = N/M) and there are no preassigned requests (i.e., Γ_k = ∅ for all k).

Dimensionality
Figure 4 shows the frequency distributions of 10,000 calculated WLS values and Table 3 lists the means and standard deviations of the WLS distributions. These results demonstrate the effects of the dimensionality on the performance of the three algorithms. From Table 2, the percentages of the numbers of servers and clients in a hypercube are independent of the dimensionality K.
From the figure, the proposed algorithm outperforms the other two algorithms, regardless of K. From Table 3, the mean and standard deviation of the random algorithm increase with K, while those of the closest algorithm stay roughly the same. In contrast, by using the proposed algorithm, the mean and standard deviation decrease with K. As a result, the largest WLS of the proposed algorithm also decreases with dimensionality K, as shown in Figure 4. These results indicate that the proposed algorithm is effective for large-scale hypercubes.

Client Distribution
Let h(c) be the number of hops from node 00...0 to the client that receives request c. Let us consider three request sets C_1, C_2, and C_3 that differ in the distribution of h(c). All requests in C_3 arrive at clients that are located close to node 00...0.
Figure 5 shows the WLS distributions and Table 4 lists their means and standard deviations for the three sets. From the table, the proposed algorithm yields the smallest means and standard deviations for all request sets. By contrast, the WLS of the closest algorithm is highly sensitive to the request set. As shown in Table 4, when set C_3 is used, the means for the random and closest algorithms are very similar. Furthermore, as shown in Figure 5(b), the largest WLS of the closest algorithm is greater than that of the random algorithm. This result suggests that individual optimization strategies are vulnerable to spatially irregular arrivals.

Number of Servers
Figure 6 and Table 5 show the results when the number of servers M is varied. The results demonstrate that as M increases, the proposed algorithm becomes more useful than the closest algorithm. Note that all three algorithms are identical when M = 1. As M increases, the number of candidate solutions to problem P increases. The closest algorithm does not provide a near-optimal solution when a large number of candidates exist.

Resource Utilization
For efficient link resource utilization, the number of hops per stream should be as small as possible. We evaluate the three algorithms based on the average number of hops per stream (H_c), which is given by

H_c = (1/N) Σ_{i in S} Σ_{j in C_i} h(i, j), (7)

where h(i, j) denotes the number of hops from server i to the client that receives request j. The hop count H_c for the closest algorithm indicates the lower bound. From Table 6, the average hop counts of the proposed and random algorithms are greater than that of the closest algorithm by 2.12 and 3.41 hops, respectively. In other words, the number of resources used by the proposed or random algorithm is 1.82 or 2.32 times larger, respectively, than that used by the closest algorithm.
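Equation (7) can be sketched directly. Since the fixed hypercube routes are shortest paths, h(i, j) equals the Hamming distance between the server and client node IDs; the assignment mapping is an illustrative data structure:

```python
def average_hops(assignment: dict[int, list[int]]) -> float:
    """Average number of hops per stream, H_c, per Equation (7):
    the total hop count over all server-to-client streams divided
    by the total number of requests N."""
    total_hops = 0
    n = 0
    for server, clients in assignment.items():
        for client in clients:
            # h(i, j): Hamming distance = shortest-path hop count.
            total_hops += bin(server ^ client).count("1")
            n += 1
    return total_hops / n

print(average_hops({0b000: [0b011, 0b111]}))  # → 2.5
```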
Figure 7 shows the histogram of the average of 100 link stresses. Figure 7 and Table 6 are obtained under the same conditions; i.e., the same server and request sets. The number of low-stress links of the closest algorithm is significantly smaller than that of the proposed algorithm. This result indicates not only how the closest algorithm achieves the low average hop count H_c but also that the algorithm does not make good use of a large number of low-stress links.

Conclusions
Video traffic on the Internet is expected to continue to grow in the near future, making the development of more scalable video delivery schemes indispensable. In particular, bursty request arrivals should be efficiently addressed. Previous server assignment approaches in content delivery networks can be classified as individual optimization; i.e., the best client-server pair is selected individually. This paper considered the assignment problem with hypercube overlays from the viewpoint of social optimization, which provides good service quality for all simultaneously arriving requests. We first formulated an optimization problem and then derived a heuristic algorithm for the problem.
We compared the performances of three algorithms (the proposed, closest, and random algorithms) based on the worst link stress (WLS), which indicates the degree of network congestion. To clarify the advantages of the social optimization approach, we considered the case in which a large number of requests arrive simultaneously. In this arrival scenario, the closest algorithm is an individual optimization algorithm in terms of not only the hop count but also the round-trip time. The following results were obtained through the evaluations:
- The proposed algorithm was effective for large-scale networks because both the mean and standard deviation of the WLS distribution decreased as the hypercube dimensionality increased.
- The closest algorithm did not provide near-optimal solutions when all requests arrived in a small part of the network or when there were many servers.
- The number of low-stress links of the closest algorithm was significantly less than that of the proposed algorithm. This result indicates that the algorithm does not make good use of a large number of low-stress links.

Figure 3 .
Figure 3. Tuple (U, A, X) is partitioned into K tuples (U i , A i , X i ) based on where requests come from.

Figure 4.
Figure 4. Frequency distributions of 10,000 calculated WLS values for the three algorithms.

Figure 5 .
Figure 5. WLS distributions for three algorithms when K = 12 and the request set is (a) C 1 or (b) C 3 .

Figure 7 .
Figure 7. A histogram of the average link stress.

Table 1 . Symbols used frequently in this paper.
X: Expected number of requests selected in the future
LS: Number of streams on a link
WLS: Greatest LS of all links in a hypercube
H_c: Average number of hops per stream

Table 6 lists the averages of 100 values of H_c. For all three algorithms, the same server and request sets (S_i, C_i), i = 1, ..., 100, are used to obtain the average hop count H_c.

Table 6. Averages of 100 values of H_c when K = 12.