Packet Scheduling in High-speed Networks Using Improved Weighted Round Robin

A variety of applications with different QoS requirements are supported simultaneously in high-speed packet-switched networks, and packet scheduling algorithms play a critical role in guaranteeing the performance of routing and switching devices. This study presents a simple, fair, efficient and easy-to-implement scheduling algorithm called Successive Minimal-weight Round Robin (SMRR). In each round, SMRR gives every active data flow the same service opportunity, equal to the minimal weight of the current round. Using the concept of Latency-Rate (LR) servers, we derive upper bounds on the latency of SMRR and of WRR (Weighted Round Robin); the results indicate that SMRR significantly improves the latency bound compared with WRR. We also discuss the fairness and implementation complexity of SMRR, and the theoretical analysis shows that SMRR preserves the O(1) implementation complexity with respect to the number of flows and has better fairness than WRR.


INTRODUCTION
There are many kinds of services with different QoS requirements in the Internet. Packets belonging to different traffic flows often share links on their respective paths towards their destinations. Switches and routers must schedule these flows to support different service levels; therefore, the performance of routing and switching devices is tightly tied to the packet scheduling algorithms they use. Packet scheduling algorithms can be broadly classified into two categories: time-stamp based scheduling and round-robin based scheduling. Time-stamp based scheduling algorithms such as WFQ (Weighted Fair Queuing) (Demers et al., 1998), WF²Q (Worst-case Fair Weighted Fair Queuing) (Bennett and Zhang, 1996), VC (Virtual Clock) (Zhang, 1990), SCFQ (Self-Clocked Fair Queuing) (Golestani, 1994) and SFQ (Start-time Fair Queuing) (Goyal et al., 1997) maintain two time-stamps for each packet, indicating its start-serving time and its end-serving time, then sort these time-stamps and send the packet with the smallest end-serving time. They achieve good fairness and low latency. However, they are not very efficient, because computing and sorting the time-stamps of every packet is costly. On the other hand, in round-robin based scheduling algorithms such as WRR (Weighted Round-Robin) (Hemant and Madhow, 2003), DRR (Deficit Round-Robin) (Shreedhar and Varghese, 1996) and SRR (Smoothed Round Robin) (Guo, 2001), the scheduler simply serves all non-empty queues in a round-robin manner. These algorithms neither maintain a time-stamp for every flow nor sort packets, and most of them have O(1) complexity with respect to the number of flows. In fact, because of its simplicity, practicality and efficiency, round-robin is used in many fields besides packet scheduling, and research on and applications of round-robin algorithms are very broad (Hu et al., 2008; Lin et al., 2011; He et al., 2012).
Time-stamp based packet scheduling algorithms aim to approximate the ideal packet scheduling discipline, GPS (Generalized Processor Sharing) (Parekh and Gallager, 1992). However, their design complexity grows as they pursue better latency characteristics, so round-robin based packet scheduling algorithms are more attractive for implementation in high-speed packet networks. In this study, we present a new packet scheduling algorithm, termed Successive Minimal-weight Round Robin (SMRR), with better fairness and latency characteristics than Weighted Round Robin (WRR). In each round, every flow gains a service opportunity equal to the smallest weight of the current round, which prevents some flows from going without service for a long time. Generally, the service opportunity of a flow, namely the number of packets it is permitted to send to the output link in a round, is proportional to its weight. WRR sends all of a flow's packets for the round, whose number equals the flow's weight, in a single burst, so a flow with a large weight may keep the other flows waiting for a long time. In contrast, SMRR spreads a flow's weight-worth of packets over several visits; only the flow whose weight is the smallest sends all of its packets for the round in a single visit. Thus SMRR guarantees a service opportunity to every flow and balances their waiting latencies. WRR is an early scheduling algorithm that has been in use for a long time, and in recent years many queue scheduling algorithms based on WRR have been proposed. For example, Pan and Zhang (2012) dynamically set, according to the current queue length, the maximum number of bytes that can be sent during a service opportunity, in order to guarantee the latency characteristics and relative fairness of low-weight services. BSTLRR (Xiong and Zhang, 2012) schedules first the data flows with higher real-time requirements. TIIWRR (Zhang et al., 2011) improves UIWRR, also called WRR2 in (Hemant and Madhow, 2003), to solve the collision problem that arises when UIWRR computes the packet sending sequence. Based on the very limited buffering capability of optical packet switching networks, Cao et al. (2010) introduce an algorithm called United Weight Round Robin (UWRR) for scheduling multiple services in such networks. For a scheduling algorithm, three attributes are especially important: latency, fairness and complexity, and all three affect its practical application. Therefore, this study focuses on analyzing these three attributes of SMRR and comparing them with those of WRR.
Fig. 1: Pseudo-code for SMRR scheduling packets
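To make the contrast between WRR and SMRR concrete, here is a minimal Python sketch (not the authors' code) that lists the packet order of one round under each discipline. It assumes unit-size packets, integer weights and flows that stay backlogged for the whole round; the helper names `wrr_round` and `smrr_round` are hypothetical.

```python
def wrr_round(weights):
    """One WRR round: each flow sends its whole weight (in packets) back-to-back."""
    order = []
    for flow, w in enumerate(weights):
        order.extend([flow] * w)
    return order


def smrr_round(weights):
    """One SMRR primary round, split into secondary rounds.

    In each secondary round, every flow that still has surplus weight sends
    as many packets as the smallest surplus weight among those flows.
    """
    surplus = list(weights)
    order = []
    while any(s > 0 for s in surplus):
        w_min = min(s for s in surplus if s > 0)
        for flow, s in enumerate(surplus):
            if s > 0:
                order.extend([flow] * w_min)
                surplus[flow] -= w_min
    return order


weights = [1, 3, 6]
for name, order in (("WRR", wrr_round(weights)), ("SMRR", smrr_round(weights))):
    first = [order.index(f) for f in range(len(weights))]
    print(name, order, "first service at packet slot", first)
# WRR [0, 1, 1, 1, 2, 2, 2, 2, 2, 2] first service at packet slot [0, 1, 4]
# SMRR [0, 1, 2, 1, 1, 2, 2, 2, 2, 2] first service at packet slot [0, 1, 2]
```

With weights (1, 3, 6), every flow gets its first packet out within the first secondary round under SMRR, whereas under WRR the heaviest flow waits behind the full allocations of the others.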

SUCCESSIVE MINIMAL-WEIGHT ROUND ROBIN
A pseudo-code implementation of the SMRR scheduling algorithm is shown in Fig. 1; it consists of an Initialize Module, an Enqueue Module and a Dequeue Module. The Initialize Module initializes the SMRR packet scheduler when a network node needs to schedule packets. The Enqueue Module is called whenever a new packet of a flow arrives; the packet enters the corresponding queue and waits to be sent. The Dequeue Module schedules packets from the queues of the different flows.
We therefore define the scheduling framework of SMRR as follows. Suppose there are n data flows (n a positive integer) sharing one output link. Packets of each flow enter the corresponding queue before being sent to the output link, and the packet scheduling module uses the SMRR algorithm to schedule each packet so that the bandwidth of the output link is allocated fairly.
A data flow is defined as active if its queue is not empty or one of its packets is being scheduled. All active flows are kept in a list called the "Active Flow List". When a flow changes from active to inactive, it is removed from the Active Flow List; when a flow changes from inactive to active, it is appended to the end of the Active Flow List.
Every data flow is assigned a "weight" that reflects its priority, QoS requirements and so on. While a flow's weight (namely its service opportunity) in the current round has not been used up, we say that its "Surplus Weight" is greater than zero. All flows with a positive Surplus Weight are kept in a list called the "Surplus Flow List". When a flow's weight is used up, namely its Surplus Weight reaches 0, the flow is removed from the Surplus Flow List. Every flow in the Surplus Flow List necessarily belongs to the Active Flow List, but not vice versa.
There are two types of rounds in SMRR: primary rounds and secondary rounds. A primary round is the process during which the data flows included in the Active Flow List at a time instant $T_1$ ($T_1 > 0$) are visited by the packet scheduling module. Newly arriving flows, or flows that become active again, are appended to the end of the Active Flow List but are visited only in the next primary round. Similarly, a secondary round is the process during which the data flows included in the Surplus Flow List are visited by the packet scheduling module. A primary round may contain one or more secondary rounds.
To record how many data flows remain to be visited in a round, SMRR uses a counter called the "Visit Flow Count". At the beginning of a round, the Visit Flow Count equals the number of data flows in the Surplus Flow List. Each time the packet scheduling module finishes visiting a data flow, the Visit Flow Count is decremented by 1; when it reaches 0, the current round is over.
Another counter, called the "Packet Number Count", records the total number of packets that a data flow has sent in a round; every data flow has its own Packet Number Count. Its value is 0 before a round begins and is incremented by 1 each time the corresponding flow sends a packet.
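The data structures above can be tied together in a short simulation. The following Python class is a minimal sketch written from this prose description only (the pseudo-code of Fig. 1 is not reproduced here); it assumes unit-size packets and integer weights, and the class and method names are hypothetical.

```python
from collections import deque


class SMRRScheduler:
    """Illustrative SMRR scheduler (hypothetical names, not the Fig. 1 code).

    Assumes unit-size packets and integer weights >= 1, so a flow's weight
    is the number of packets it may send per primary round.
    """

    def __init__(self, weights):
        self.weights = dict(weights)                      # flow id -> weight w_i
        self.queues = {f: deque() for f in self.weights}  # per-flow FIFO queues
        self.active = {}             # Active Flow List (dict keeps insertion order)
        self.surplus = {}            # Surplus Weight of each flow in this primary round
        self.surplus_list = deque()  # Surplus Flow List
        self.packet_count = {f: 0 for f in self.weights}  # Packet Number Count
        self.visit_count = 0         # Visit Flow Count of the current secondary round
        self.w_min = 0               # packets each flow may send in this secondary round

    def enqueue(self, flow, packet):
        """Enqueue Module: store the packet; an idle flow joins the tail of the Active Flow List."""
        self.queues[flow].append(packet)
        if flow not in self.active:
            self.active[flow] = None

    def _start_round(self):
        """Begin a secondary round, opening a new primary round if the Surplus Flow List is empty."""
        if not self.surplus_list:
            if not self.active:
                return False
            for f in self.active:
                self.surplus[f] = self.weights[f]
            self.surplus_list = deque(self.active)
        self.w_min = min(self.surplus[f] for f in self.surplus_list)
        self.visit_count = len(self.surplus_list)
        return True

    def _finish_visit(self, flow):
        """End the current visit of `flow` and update the lists and counters."""
        self.surplus_list.popleft()
        self.visit_count -= 1
        self.surplus[flow] -= self.packet_count[flow]
        self.packet_count[flow] = 0
        if not self.queues[flow]:
            del self.active[flow]            # the flow became inactive
        elif self.surplus[flow] > 0:
            self.surplus_list.append(flow)   # revisit it in the next secondary round

    def dequeue(self):
        """Dequeue Module: return (flow, packet), or None when every queue is empty."""
        while True:
            if self.visit_count == 0 and not self._start_round():
                return None
            flow = self.surplus_list[0]
            if self.queues[flow] and self.packet_count[flow] < self.w_min:
                self.packet_count[flow] += 1
                return flow, self.queues[flow].popleft()
            self._finish_visit(flow)


sched = SMRRScheduler({"a": 1, "b": 3, "c": 6})
for flow in "abc":
    for k in range(10):
        sched.enqueue(flow, f"{flow}{k}")
print([sched.dequeue()[0] for _ in range(10)])
# ['a', 'b', 'c', 'b', 'b', 'c', 'c', 'c', 'c', 'c']
```

In this sketch, a flow that still has surplus weight is appended to the tail of the Surplus Flow List while the Visit Flow Count is still counting down; that is what separates consecutive secondary rounds, because the appended flow is reached only after the current round's visits are exhausted.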

SMRR LATENCY ANALYSIS
Stiliadis and Verma (1998) defined a general class of schedulers, called Latency-Rate (LR) servers, where a scheduler is a server or device that runs a specific scheduling algorithm. They also defined a notion of latency and determined upper bounds on the latency of a number of schedulers that belong to the class of LR servers. This notion of latency is based on the time it takes a new flow to begin receiving service at its guaranteed rate, and they pointed out that latency is directly related to the size of the playback buffers required in real-time streaming applications. In this study, the latency of SMRR is analyzed using the concept of LR servers.
Consider an output link of transmission rate $r$, access to which is controlled by the SMRR scheduler.
Let $n$ be the total number of flows, let $\rho_i$ be the scheduling rate of flow $i$ and let $\rho_{\min}$ be the lowest of these rates. Since all flows share the same output link, the sum of the scheduling rates is no more than the transmission rate of the link, namely $\sum_{j=1}^{n} \rho_j \le r$. So that each flow receives service in proportion to its guaranteed rate, the SMRR scheduler assigns a weight to each flow. The weight $w_i$ assigned to flow $i$ is given by:
$$w_i = \frac{\rho_i}{\rho_{\min}} \qquad (1)$$
Clearly, $w_i \ge 1$ for $1 \le i \le n$. Let $\Phi_i$ be the amount of data that flow $i$ is permitted to send during each round-robin service opportunity, and let $\Phi_{\min}$ be the amount permitted to the flow with the lowest scheduling rate. Then $\Phi_i = w_i \Phi_{\min}$, so the amounts of data permitted to the flows are proportional to their scheduling rates. In addition, let $A_i(\tau, t)$ be the number of packets of flow $i$ that arrive during the time interval $(\tau, t)$.
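As a worked example of the weight rule $w_i = \rho_i/\rho_{\min}$ (our reading of the definition above, consistent with $w_i \ge 1$ and $\Phi_i = w_i\Phi_{\min}$), the following Python sketch turns a set of hypothetical rate reservations into weights and per-opportunity quanta. It uses exact fractions; an implementation that counts whole packets would additionally need the rates to be integer multiples of $\rho_{\min}$, which is an assumption of this example.

```python
from fractions import Fraction


def assign_weights(rates, link_rate):
    """Return w_i = rho_i / rho_min for each flow.

    rates and link_rate are integers in a common unit (e.g. kb/s), so the
    weights come out as exact fractions.
    """
    if sum(rates) > link_rate:
        raise ValueError("sum of scheduling rates exceeds the link rate r")
    rho_min = min(rates)
    return [Fraction(rho, rho_min) for rho in rates]


rates = [2000, 6000, 12000]      # hypothetical reservations in kb/s
weights = assign_weights(rates, link_rate=100000)
phi_min = 1500                   # hypothetical Phi_min in bytes
print([str(w) for w in weights])              # ['1', '3', '6']
print([str(w * phi_min) for w in weights])    # ['1500', '4500', '9000']
print(sum(weights))                           # 10
```

The last line is the quantity $W$ of Definition 1 when all three flows are active.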
Definition 1: Define W as the sum of the weights of all active flows that are being served by the SMRR scheduler.
Definition 2: An active period of a flow is the maximal interval of time during which it has at least one packet awaiting service or in service. A flow remains continuously active during its active period.
Definition 3: A busy period of flow $i$ is the maximal interval of time $(\tau_1, \tau_2)$ such that, at any time $t \in (\tau_1, \tau_2)$, the accumulated arrivals of flow $i$ since the beginning of the interval are no less than the service the flow would have received during the interval at a rate of exactly $\rho_i$. That is:
$$A_i(\tau_1, t) \ge \rho_i (t - \tau_1) \qquad (2)$$
A flow is active throughout its busy period, but an active period differs from a busy period in that it reflects the actual behavior of the scheduler, because the instantaneous service offered to a flow varies with the number of active flows. Let $\mathrm{Sent}_i(t_1, t_2)$ denote the amount of service received by flow $i$ during the time interval $(t_1, t_2)$. Let the time instant $\alpha_i$ be the start of a busy period of flow $i$, and suppose $t > \alpha_i$ and flow $i$ is continuously busy during $(\alpha_i, t)$. Define $S_i(\alpha_i, t)$ as the number of packets of flow $i$ that arrive after time $\alpha_i$ and are scheduled during the interval $(\alpha_i, t)$. A busy period always lies within an active period, but an active period is not necessarily a busy period; therefore $S_i(\alpha_i, t)$ is not necessarily equal to $\mathrm{Sent}_i(\alpha_i, t)$.
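Definition 3 can be turned into a small procedure: starting from an arrival instant, keep absorbing arrivals as long as the cumulative arrivals stay ahead of a line of slope $\rho_i$. The Python sketch below does this for unit-size packets, with $\rho_i$ expressed in packets per time unit; the function name and the arrival list are hypothetical, and arrivals exactly at the start instant are counted in $A_i$.

```python
def busy_period_end(arrivals, start, rho):
    """End of the busy period of a flow that begins at `start` (Definition 3).

    arrivals: (time, packets) pairs sorted by time; rho: packets per time unit.
    The period lasts while A_i(start, t) >= rho * (t - start).
    """
    total = 0
    for time, packets in arrivals:
        if time < start:
            continue                  # arrivals before the period are ignored
        if time > start + total / rho:
            break                     # the rho-line overtook A_i before this arrival
        total += packets
    return start + total / rho


arrivals = [(0, 2), (1, 1), (5, 3)]
print(busy_period_end(arrivals, start=0, rho=1))    # 3.0
print(busy_period_end(arrivals, start=0, rho=0.5))  # 12.0
```

With $\rho_i = 1$ the third burst arrives after the busy period has already ended at $t = 3$; with the slower rate of 0.5 all three bursts fall inside one busy period.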

Definition 4: The latency of flow $i$ is the minimum non-negative constant $\Theta_i$ that satisfies the following for all possible busy periods of the flow:
$$S_i(\alpha_i, t) \ge \max\{0, \rho_i (t - \alpha_i - \Theta_i)\} \qquad (3)$$
As defined in Stiliadis and Verma (1998), a scheduler that satisfies Eq. (3) for some non-negative constant $\Theta_i$ is said to belong to the class of Latency-Rate (LR) servers.
Note that although the latency is defined in terms of flow busy periods, it is easier to analyze scheduling algorithms in terms of the active periods of a flow.
Lemma 1: Suppose flow $i$ becomes active at time instant $\tau_i$ under a scheduling server $S$, and let $t > \tau_i$ be any instant such that the flow is continuously active during $(\tau_i, t)$. Let $\Theta_i'$ be the smallest non-negative number such that, for all such $t$:
$$\mathrm{Sent}_i(\tau_i, t) \ge \max\{0, \rho_i (t - \tau_i - \Theta_i')\}$$
Then, even though $(\tau_i, t)$ may not be a continuously busy period of flow $i$, the latency defined by Eq. (3) is bounded by $\Theta_i'$.
Proof: Since every busy period lies within an active period, it suffices to show that any busy period $(\alpha_k, \beta_k)$ within an active period satisfies Eq. (3). Combining the resulting inequalities, $S_i(\alpha_k, t) \ge \max\{0, \rho_i (t - \alpha_k - \Theta_i')\}$. Hence $S$ is an LR server and its latency is at most $\Theta_i'$.
Lemma 1 allows us to determine the latency bound of a scheduler by considering only those periods during which a flow is continuously active.
Lemma 2: The latency experienced by flow $i$ during the active period $(\tau_i, t)$ reaches its upper bound $\Theta_i'$ only if the time instant $\tau_i$ at which flow $i$ becomes active is the start time of the service of another flow.
Proof: Assume that flow $j$ ($j \ne i$) receives its service opportunity during the time interval $(t_1, t_2)$, with $t_1 \le \tau_i < t_2$. If $\tau_i$ is not the instant at which $j$ starts its service opportunity, then $\tau_i > t_1$; that is, the interval $(t_1, \tau_i)$ is part of the service opportunity of flow $j$ but is not included in the latency experienced by flow $i$. On the other hand, when $\tau_i = t_1$, $\tau_i$ is the start of the service opportunity of flow $j$, so the time flow $i$ has to wait before receiving any service includes the whole interval $(t_1, t_2)$. Hence the latency experienced by flow $i$ is largest when $\tau_i$ is the start time of the service opportunity of another flow.
Lemma 3: The latency experienced by flow $i$ during the active period $(\tau_i, t)$ reaches its upper bound $\Theta_i'$ only if $t$ belongs to the set $T_i$ of all time instants at which the scheduler begins serving flow $i$.
Proof: Let $t_1$ and $t_2$ be any two consecutive time instants in $T_i$, and consider an instant $t$ with $t_1 < t < t_2$. If flow $i$ is receiving service at time $t$, the service it receives during $(t_1, t)$ is $r(t - t_1)$, where $r$ is the link rate. Since $r \ge \rho_i$, flow $i$ is receiving service at rate $\rho_i$ or higher, so its worst-case latency up to $t$ is no worse than that up to $t_1$. On the other hand, if some other flow is receiving service at time $t$, flow $i$ receives no service during $(t, t_2)$, so the latency experienced by flow $i$ keeps growing over $(t, t_2)$, and its worst-case latency up to $t$ is no worse than that up to $t_2$. These two cases show that the worst-case latency of a flow during $(t_1, t_2)$ is attained at either $t_1$ or $t_2$, which proves the lemma.
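Definition 4 together with Lemma 3 suggests a direct way to measure latency from a service trace: check the LR inequality only at the instants where the scheduler begins serving flow $i$. The Python sketch below does that; the function name and the trace are hypothetical, and it assumes $\mathrm{Sent}_i(\tau_i, t)$ is known at each such instant.

```python
def latency_from_starts(tau, rho, starts):
    """Smallest theta with Sent_i(tau, t) >= rho * (t - tau - theta) at every start.

    starts: (t, sent) pairs, where t is an instant at which the scheduler begins
    serving flow i and sent is the service flow i received in (tau, t).
    By Lemma 3 these instants are the only ones that need checking.
    """
    theta = 0.0
    for t, sent in starts:
        theta = max(theta, t - tau - sent / rho)
    return theta


# Hypothetical trace: r = 1 packet per time unit, W = 10, w_i = 1, so
# rho_i = r * w_i / W = 0.1; flow i waits behind 9 other packets every round.
starts = [(9, 0), (19, 1), (29, 2)]
print(latency_from_starts(tau=0, rho=0.1, starts=starts))  # 9.0
```

For this toy trace the result, 9, equals $(W - w_i)\Phi_{\min}/r$ with $\Phi_{\min} = r = 1$.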
As mentioned above, when flow $i$ receives service from the SMRR scheduler, rounds are either primary or secondary, and a primary round may contain one or more secondary rounds. In a primary round, flow $i$ is permitted to send the amount of data $\Phi_i$. The service in a secondary round equals the smallest remaining allocation (Surplus Weight) among the flows in that secondary round, namely $W_{\min}$ in the SMRR algorithm description, and the service in a primary round equals the sum of the services of all its nested secondary rounds. Let $k$ index primary rounds and $v$ index secondary rounds, so that $(k, v)$ denotes the $v$-th service a flow receives in the $k$-th primary round, and let $\tau_i^{(k,v)}$ be the time instant at which flow $i$ begins to receive its $(k, v)$-th service. Assume that flow $i$ becomes active at time instant $\tau_i$. Then, by Lemma 3, to determine the latency bound of SMRR we need only consider the time intervals $(\tau_i, \tau_i^{(k,v)})$ for all $(k, v)$ in which flow $i$ receives service.
To analyze the latency bound, we select the time interval $(\tau_i, \tau_i^{(k,v)})$ so that its length is the maximum possible. The start instant $\tau_i$ may or may not coincide with the start of a new primary round. Let $k_0$ be the primary round that is in progress at $\tau_i$ or that starts exactly at $\tau_i$, and let the time instant $t_h$ denote the start of the $(k_0 + h)$-th primary round. If $\tau_i$ does not coincide with $t_0$, the start of primary round $k_0$, i.e., $\tau_i > t_0$, then the interval $(t_0, \tau_i)$ is excluded from the interval under consideration. When $\tau_i$ coincides with $t_0$, the interval $(\tau_i, \tau_i^{(k,v)})$ is longest. Therefore, in line with Lemma 2, we assume that $\tau_i$ coincides with the start of the $k_0$-th primary round. Figure 2 illustrates the time interval under consideration when $(k, v) = (k_0 + s, v)$. In Fig. 2, PR($a$) denotes the $a$-th primary round and SR($a$, $b$) denotes the secondary round $(a, b)$ in the execution of the SMRR scheduler. In addition, the set $G_{FIN}$ contains the flows that have finished receiving their allocated service in the $(k_0 + s)$-th primary round by the time instant $\tau_i^{(k_0+s,v)}$.
Theorem 1: The SMRR scheduler belongs to the class of LR servers, with an upper bound on the latency $\Theta_i$ of flow $i$ given by:
$$\Theta_i = \frac{\Phi_{\min}}{r}\left[\sum_{j \in G_{FIN}} w_j + \left(n - 1 - |G_{FIN}|\right) w_i - \Psi\,\frac{W - w_i}{w_i}\right]$$
where the parameter $\Psi$ is defined as follows: $\Psi = 0$ when $w_i$ is the smallest of the $n$ weights, and $\Psi = 1$ otherwise.
Proof: By Lemma 1, the latency of an LR server can be estimated from its behavior during flow active periods, so we prove the theorem by showing that, for every $t$ in an active period of flow $i$ that starts at $\tau_i$:
$$\mathrm{Sent}_i(\tau_i, t) \ge \max\{0, \rho_i (t - \tau_i - \Theta_i)\}$$
Based on the discussion above, the time interval $(\tau_i, \tau_i^{(k_0+s,v)})$ can be split into two sub-intervals:
• $(\tau_i, t_s)$: This sub-interval includes $s$ primary rounds of the SMRR scheduler starting at primary round $k_0$. Assume that primary round $(k_0 + h)$ is in progress during the time interval $(t_h, t_{h+1})$. For all $n$ flows: Summing the above over $s$ rounds beginning with primary round $k_0$:
• $(t_s, \tau_i^{(k_0+s,v)})$: This sub-interval includes the part of the $(k_0 + s)$-th primary round before flow $i$ starts its service in the $v$-th secondary round. In the worst case, flow $i$ is the last flow to receive service. In that case, during this sub-interval flow $i$ has received the service of the first $v - 1$ secondary rounds, whereas the other $n - 1$ flows have completed the service of $v$ secondary rounds. If $v = 1$, flow $i$ receives no service in this sub-interval.
Let $\mathrm{Sent}_i(k, v)$ denote the total service received by flow $i$ from the start of the $k$-th primary round until the SMRR scheduler finishes the service of the $v$-th secondary round; then: Sorting all $n$ weights in ascending order and letting $w_{v-1}$ and $w_v$ denote the $(v-1)$-th and the $v$-th weights respectively, we have: Since $w_v \le w_i$, it follows that: Combining Eq. (7) and (10), we have: Solving for $s$: Note that the total data transmitted by flow $i$ during the time interval under consideration can be written as the sum:
$$\mathrm{Sent}_i\big(\tau_i, \tau_i^{(k_0+s,v)}\big) = \mathrm{Sent}_i(\tau_i, t_s) + \mathrm{Sent}_i\big(t_s, \tau_i^{(k_0+s,v)}\big)$$
As explained earlier, $\mathrm{Sent}_i(t_s, \tau_i^{(k_0+s,v)})$ is the same as $\mathrm{Sent}_i(k_0+s, v-1)$, and $\mathrm{Sent}_i(\tau_i, t_s)$ is obtained by summing the data sent by flow $i$ over the $s$ primary rounds starting at primary round $k_0$. We get: Using (12) to substitute for $s$ in (14), we have: In (15), if $v = 1$, then $w_i$ is the smallest of all weights and $w_{v-1} = 0$; if $v > 1$, then $w_{v-1} \ge 1$. We therefore introduce a parameter $\Psi$, with $\Psi = 0$ when $w_i$ is the smallest of the $n$ weights and $\Psi = 1$ otherwise, so that: Now, since the sum of the scheduling rates is no more than the link rate $r$, combining this with (1) gives: Using Eq. (17) in (16), we get: And then: This proves the theorem.
Now we analyze the latency of SMRR under two boundary conditions.
Case I: $\rho_i \ll \rho_j$ for all $j \ne i$, $1 \le j \le n$. In this case, $w_i = 1$ and $w_i \ll w_j$. Also $W \gg w_i$, the set $G_{FIN}$ is empty and $\Psi = 0$. Hence:
Case II: $\rho_i \gg \rho_j$ for all $j \ne i$, $1 \le j \le n$. In this case, $w_i \gg w_j$, $\Psi = 1$ and the set $G_{FIN}$ contains all $n - 1$ flows other than flow $i$. Hence:
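Most intermediate equations of this proof were lost in extraction. The following is one way to fill in the chain, offered as a reconstruction rather than the authors' exact equations. It assumes $w_i = \rho_i/\rho_{\min}$, that every primary round lasts at most $W\Phi_{\min}/r$, that each flow outside $G_{FIN}$ has received at most $w_v\Phi_{\min} \le w_i\Phi_{\min}$ in the $(k_0+s)$-th primary round by $\tau_i^{(k_0+s,v)}$, and it writes $\Delta = \tau_i^{(k_0+s,v)} - \tau_i$:

```latex
\begin{aligned}
t_s - \tau_i &\le \frac{s\,W\,\Phi_{\min}}{r},\\
\tau_i^{(k_0+s,v)} - t_s &\le \frac{\Phi_{\min}}{r}\Big[\sum_{j\in G_{FIN}} w_j + \big(n-1-|G_{FIN}|\big)\,w_i + w_{v-1}\Big],\\
\mathrm{Sent}_i\big(\tau_i,\tau_i^{(k_0+s,v)}\big) &= s\,w_i\,\Phi_{\min} + w_{v-1}\,\Phi_{\min}\\
&\ge \frac{r\,w_i}{W}\,\Delta
   - \frac{w_i\,\Phi_{\min}}{W}\Big[\sum_{j\in G_{FIN}} w_j + \big(n-1-|G_{FIN}|\big)\,w_i\Big]
   + \frac{w_{v-1}\,\Phi_{\min}\,(W-w_i)}{W}\\
&\ge \frac{r\,w_i}{W}\,\big(\Delta - \Theta_i\big) \;\ge\; \rho_i\,\big(\Delta - \Theta_i\big)
   \quad\text{whenever } \Delta \ge \Theta_i,\\
\Theta_i &= \frac{\Phi_{\min}}{r}\Big[\sum_{j\in G_{FIN}} w_j + \big(n-1-|G_{FIN}|\big)\,w_i - \Psi\,\frac{W-w_i}{w_i}\Big].
\end{aligned}
```

The fourth line eliminates $s$ using the first two; the fifth uses $w_{v-1} \ge \Psi$ and $\rho_i = w_i\rho_{\min} \le r\,w_i/W$ (because $W\rho_{\min} \le r$). Under the two boundary conditions this reading gives $\Theta_i = (n-1)\Phi_{\min}/r$ for Case I and $\Theta_i = (W - w_i)(w_i - 1)\Phi_{\min}/(w_i r)$ for Case II.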

THE LATENCY BOUND OF WRR
Theorem 2: The WRR scheduler belongs to the class of LR servers, with an upper bound on the latency $\Theta_i$ of flow $i$ given by:
$$\Theta_i = \frac{(W - w_i)\,\Phi_{\min}}{r}$$
Proof: By Lemma 1, the latency of an LR server can be estimated from its behavior during flow active periods, so we prove the theorem by showing that, for every $t$ in an active period of flow $i$ that starts at $\tau_i$:
$$\mathrm{Sent}_i(\tau_i, t) \ge \max\{0, \rho_i (t - \tau_i - \Theta_i)\}$$
Assume that flow $i$ becomes active at time instant $\tau_i$, and let $\tau_i^{s}$ be the start time of the $s$-th service opportunity of flow $i$. By Lemma 3, to determine the upper bound on the latency of WRR we need only consider the time intervals $(\tau_i, \tau_i^{s})$ for all $s$. Figure 3 illustrates the time interval under consideration for a given $s$. Note that $\tau_i$ may or may not coincide with the end of a round and the start of the next one. Let $k_0$ be the round that is in progress at $\tau_i$ or that ends exactly at $\tau_i$, and let the time instant $t_h$ mark the end of round $(k_0 + h - 1)$ and the start of the next round. As shown in Fig. 3, assume that the instant at which flow $i$ becomes active coincides with the instant at which some flow $g$ is about to start its service opportunity in the $k_0$-th round. Let $G_i^{+}$ denote the set of flows that receive service after flow $i$ becomes active, i.e., during $(\tau_i, t_1)$, and let $G_i^{-}$ denote the set of flows served by the WRR scheduler before flow $i$ becomes active, i.e., during $(t_0, \tau_i)$. Flow $i$ belongs to neither set, since it receives its first service opportunity only in the $(k_0 + 1)$-th round. If $\tau_i$ coincides with $t_1$, which marks the end of the $k_0$-th round and the start of the $(k_0 + 1)$-th round, then $G_i^{+}$ is empty and all $n - 1$ other flows belong to $G_i^{-}$. In this case, flow $i$ is the last to receive service in the $(k_0 + 1)$-th round and in all subsequent rounds of the interval under consideration.
The time interval $(\tau_i, \tau_i^{s})$ can be split into the following three sub-intervals:
• $(\tau_i, t_1)$: This sub-interval includes the part of the $k_0$-th round during which all flows in $G_i^{+}$ are served by the WRR scheduler, so that:
• $(t_1, t_s)$: This sub-interval includes $s - 1$ rounds of the WRR scheduler starting from round $(k_0 + 1)$. Assuming that round $(k_0 + h)$ is in progress during $(t_h, t_{h+1})$, we get: Summing Eq. (23) over the $s - 1$ rounds beginning with round $k_0 + 1$:
• $(t_s, \tau_i^{s})$: This sub-interval includes the part of the $(k_0 + s)$-th round during which all flows in $G_i^{-}$ are served by the WRR scheduler, so that:
Combining Eq. (22), (24) and (25), we have: Solving for $s - 1$: Note that during the interval under consideration, $(\tau_i, \tau_i^{s})$, flow $i$ receives service in $s - 1$ rounds starting at round $(k_0 + 1)$. We get: Using (27) to substitute for $(s - 1)$ in (28), we have: We get: And then: The theorem is proved.
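The equations of this proof are likewise missing from the source. Assuming flow $i$ is served last, every round lasts at most $W\Phi_{\min}/r$, and $w_i = \rho_i/\rho_{\min}$, one way to fill in the chain is the following sketch (a reconstruction, not the authors' exact equations):

```latex
\begin{aligned}
t_1 - \tau_i &\le \frac{\Phi_{\min}}{r}\sum_{j\in G_i^{+}} w_j, \qquad
t_s - t_1 \le \frac{(s-1)\,W\,\Phi_{\min}}{r}, \qquad
\tau_i^{s} - t_s \le \frac{\Phi_{\min}}{r}\sum_{j\in G_i^{-}} w_j,\\
\tau_i^{s} - \tau_i &\le \frac{\Phi_{\min}}{r}\big[(s-1)\,W + (W - w_i)\big]
\qquad \big(G_i^{+}\cup G_i^{-} \text{ holds the other } n-1 \text{ flows}\big),\\
\mathrm{Sent}_i(\tau_i,\tau_i^{s}) &= (s-1)\,w_i\,\Phi_{\min}
\ge \frac{r\,w_i}{W}\Big[\tau_i^{s} - \tau_i - \frac{(W-w_i)\,\Phi_{\min}}{r}\Big]
\ge \rho_i\Big[\tau_i^{s} - \tau_i - \frac{(W-w_i)\,\Phi_{\min}}{r}\Big].
\end{aligned}
```

The last step uses $\rho_i \le r\,w_i/W$ and assumes the bracket is non-negative (otherwise the bound $\max\{0,\cdot\}$ holds trivially), which yields $\Theta_i = (W - w_i)\Phi_{\min}/r$.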

COMPARATIVE ANALYSIS OF LATENCY UPPER BOUND
First, we compare the latency upper bound of WRR derived in Theorem 2 with the bound derived in (Stiliadis and Verma, 1998). Let $\Theta_i^{new}$ denote the latency bound derived in Theorem 2. Thus: Let $\Theta_i^{old}$ denote the latency bound of WRR derived in (Stiliadis and Verma, 1998), namely: In Eq. (33), $F$ denotes the size of a WRR frame, which equals $W\Phi_{\min}$, the sum of the data that all active flows are permitted to send, and $L_c$ is the size of an ATM cell. Substituting for $F$ in Eq. (33), we get: Comparing Eq. (32) and (34), we have $\Theta_i^{new} < \Theta_i^{old}$, which shows that the upper bound on the latency of WRR derived in Theorem 2 is tighter.
Next, we compare the latency bound of SMRR derived above with that of WRR. The latencies of SMRR under the two boundary conditions are given by Eq. (19) and (20). Both are clearly less than $(W - w_i)\Phi_{\min}/r$, so the latency characteristics of SMRR are better than those of WRR. Note the special case in which all $n$ flows have the same weight: SMRR and WRR then have the same latency upper bound.
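As a rough numerical check of this comparison, the following Python sketch builds fully backlogged schedules and measures the LR latency of each flow directly, checking only the instants at which the flow starts service (Lemma 3). It assumes unit packets, $r = 1$, $\Phi_{\min} = 1$, $\sum_j \rho_j = r$ (so $\rho_i = w_i/W$) and the worst-case placement used in the proofs, with flow $i$ served last; the helper names are hypothetical, and three rounds suffice here because the schedule is periodic.

```python
from fractions import Fraction


def wrr_round(weights):
    """One WRR round: flow k sends weights[k] unit packets back-to-back."""
    return [k for k, w in enumerate(weights) for _ in range(w)]


def smrr_round(weights):
    """One SMRR primary round: repeated secondary rounds of the minimal surplus weight."""
    surplus, order = list(weights), []
    while any(surplus):
        w_min = min(s for s in surplus if s > 0)
        for k, s in enumerate(surplus):
            if s > 0:
                order += [k] * w_min
                surplus[k] -= w_min
    return order


def lr_latency(order, flow, weights, rounds=3):
    """Empirical LR latency of `flow` (r = 1, unit packets, rho_i = w_i / W)."""
    rho = Fraction(weights[flow], sum(weights))
    schedule = order * rounds
    theta, sent = Fraction(0), 0
    for t, f in enumerate(schedule):
        if f == flow:
            if t == 0 or schedule[t - 1] != flow:  # a service opportunity starts here
                theta = max(theta, t - sent / rho)
            sent += 1
    return theta


def worst_case(round_fn, weights):
    """Latency of each flow when it is placed last in the service order."""
    result = []
    for i in range(len(weights)):
        perm = [j for j in range(len(weights)) if j != i] + [i]
        order = [perm[k] for k in round_fn([weights[j] for j in perm])]
        result.append(lr_latency(order, i, weights))
    return result


weights = [1, 3, 6]
print("WRR ", [str(x) for x in worst_case(wrr_round, weights)])   # ['9', '7', '4']
print("SMRR", [str(x) for x in worst_case(smrr_round, weights)])  # ['2', '2', '10/3']
```

For weights (1, 3, 6), WRR reaches $W - w_i$ for every flow, while SMRR gives 2 for the two lighter flows and 10/3 for the heaviest. These happen to equal $(n-1)\Phi_{\min}/r$ for the lightest flow and $(W - w_i)(w_i - 1)\Phi_{\min}/(w_i r)$ for the heaviest; with equal weights the two schedules coincide.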

FAIRNESS
For fairness analysis, the fairness metric proposed in Golestani (1994) is commonly employed. This metric, known as the Relative Fairness Bound (RFB), is defined as the maximum difference in the service received by any two flows over all possible intervals of time.
Definition 5: For any two flows $i$ and $j$ that have packets continuously queued during the time interval $(t_1, t_2)$, let $FM(t_1, t_2)$ be the maximum value of formula (35) over all such pairs. $FM$ is defined as the maximum value of $FM(t_1, t_2)$ over all possible time intervals $(t_1, t_2)$: If a scheduling algorithm has $FM = 0$, it clearly has ideal relative fairness; for example, GPS (Generalized Processor Sharing) (Parekh and Gallager, 1992) is proven to have this property. However, no packet-by-packet algorithm can meet this condition, since each packet must be served exclusively. For such algorithms, we can only require that FM be bounded by a constant. Therefore, a scheduling algorithm running on server $S$ is considered close to fair if its FM is bounded by a constant; in particular, $FM(t_1, t_2)$ should not depend on the length of the time interval. This constant is called $F_S$ in (Stiliadis and Verma, 1998), where it is termed the fairness of server $S$. Clearly, $FM \le F_S$.
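Formula (35) itself did not survive extraction. Assuming it is the weight-normalized service difference $|\mathrm{Sent}_i(t_1,t_2)/w_i - \mathrm{Sent}_j(t_1,t_2)/w_j|$ (an assumption; normalizing by $\rho_i$ instead only rescales by $\rho_{\min}$), the Python sketch below computes FM over all packet-boundary intervals of a trace in one pass. The maximum over intervals equals the spread between the largest and smallest values of the running difference. The function name is hypothetical and both flows are assumed backlogged throughout the trace.

```python
from fractions import Fraction


def fairness_measure(order, i, j, weights):
    """max over intervals of |Sent_i / w_i - Sent_j / w_j| for a unit-packet trace.

    Assumes flows i and j stay backlogged for the whole trace, so every
    (t1, t2) inside it is admissible. The maximum over intervals equals the
    spread (max - min) of the running difference d.
    """
    d = lo = hi = Fraction(0)
    for f in order:
        if f == i:
            d += Fraction(1, weights[i])
        elif f == j:
            d -= Fraction(1, weights[j])
        lo, hi = min(lo, d), max(hi, d)
    return hi - lo


weights = [1, 3, 6]
wrr = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2] * 3    # WRR rounds for these weights
smrr = [0, 1, 2, 1, 1, 2, 2, 2, 2, 2] * 3   # SMRR primary rounds for these weights
print(fairness_measure(wrr, 1, 2, weights))   # 1
print(fairness_measure(smrr, 1, 2, weights))  # 5/6
```

Interleaving the two heavier flows in SMRR's first secondary round is what lowers their normalized service gap from 1 to 5/6 in this example.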
Proof (fairness of SMRR): Consider any two flows $i$ and $j$ that are continuously backlogged during a time interval $(t_1, t_2)$. Since the service opportunities of $i$ and $j$ differ by at most one secondary round, assume without loss of generality that both flows have passed through $k$ primary rounds and that flows $i$ and $j$ have passed through $v - 1$ and $v$ secondary rounds, respectively. We have: The following cases arise:
Proof (fairness of WRR): Consider any two flows $i$ and $j$ that are continuously backlogged during a time interval $(t_1, t_2)$. Let $k$ be the number of service opportunities given to flow $i$ in $(t_1, t_2)$ and let $k'$ be the number given to flow $j$ in the same interval; then $|k - k'| \le 1$, namely $k' \ge k - 1$ and $k \ge k' - 1$. Thus:
Note that, in the special case mentioned above in which all $n$ flows have the same weight, SMRR and WRR schedulers have the same FM. When flows have different weights, however, the SMRR scheduler has better fairness than WRR, as can be seen by comparing (37)-(39) with (40)-(41). For example, in (38), we have: When …, we have: The result of (42) is strictly less than the corresponding value for WRR.

IMPLEMENTATION COMPLEXITY
Besides latency and fairness, implementation complexity is an important criterion for evaluating a scheduling algorithm. Because scheduling algorithms are generally intended for high-speed networks, their implementation complexity directly affects their practical use. Ideally, the complexity should not depend on the number of active flows.
When an SMRR scheduler serves $n$ flows, its implementation complexity is defined as the order, with respect to $n$, of the time needed to enqueue a packet and then dequeue it for transmission.
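One way to see that the per-packet work of SMRR need not grow with $n$ is an amortized count. This assumes $W_{\min}$ is found by a linear scan of the Surplus Flow List at the start of each secondary round, which is an assumption, since the text does not state how it is obtained. A secondary round with $m$ listed flows costs $O(m)$ flow visits but sends $m \cdot W_{\min} \ge m$ packets, so the visits never exceed the packets sent. The Python sketch below (hypothetical function name) counts both quantities for one primary round.

```python
import random


def smrr_round_cost(weights):
    """Run one SMRR primary round; return (packets sent, flow visits made).

    A flow visit is one look at a flow in the Surplus Flow List: comparing its
    surplus weight for W_min and updating it afterwards (constant work each).
    """
    surplus = [w for w in weights if w > 0]
    packets = visits = 0
    while surplus:
        visits += len(surplus)               # scan for W_min and update each flow once
        w_min = min(surplus)
        packets += w_min * len(surplus)      # every listed flow sends W_min packets
        surplus = [s - w_min for s in surplus if s > w_min]
    return packets, visits


random.seed(1)
for n in (10, 100, 1000):
    weights = [random.randint(1, 20) for _ in range(n)]
    packets, visits = smrr_round_cost(weights)
    print(n, packets == sum(weights), visits <= packets)   # prints "<n> True True"
```

This gives an amortized O(1) bound per packet rather than a worst-case bound on every individual dequeue; a single dequeue that opens a secondary round still performs the scan.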

Table 1: Comparison of latency bounds