HSUPA Transport Network Congestion Control

The introduction of high speed uplink packet access (HSUPA) greatly improves achievable uplink bitrate but it presents new challenges to be solved in the radio access network. In the transport network, bandwidth reservation for HSUPA is not efficient and TCP cannot efficiently resolve congestion because of lower layer retransmissions. This paper proposes an HSUPA transport network flow control algorithm that handles congestion situations efficiently and supports quality of service differentiation. In the radio network controller (RNC), transport network congestion is detected. Relying on the standardized control frame the RNC notifies the Node B about transport network congestion. In case of transport network congestion the Node B part of the HSUPA flow control instructs the air interface scheduler to reduce the bitrate of the flow to eliminate congestion. The performance analysis concentrates on transport network limited scenarios. It is shown that TCP cannot provide efficient congestion control. The proposed algorithm can achieve high end-user perceived throughput, while maintaining low delay, loss and good fairness in the transport network.


I. INTRODUCTION
Recent deployment of High Speed Downlink Packet Access (HSDPA) in operational 3G networks increases the downlink system capacity providing improved end-user experience by higher download speed and reduced round trip times [1]. The demand for uplink performance improvement is addressed by introducing Enhanced Dedicated Channel (E-DCH) in 3GPP Release 6 [2]. The E-DCH is further improved with the possibility of higher-order modulation in Release 7 [3]. The Release 6 and 7 improvements allow Layer 1 peak rates up to 5.7 Mbps and 11 Mbps in uplink. New Medium Access Control layers (MAC-e/es) were introduced to support the new features of the High Speed Uplink Packet Access (HSUPA), i.e. fast Hybrid Automatic Repeat Request (HARQ) with soft combining, reduced (2 ms) TTI length and fast scheduling.
In spite of the fact that similar features have been introduced for HSDPA and HSUPA, there are several essential differences [2]. In case of HSDPA the High Speed Downlink Shared Channel (HS-DSCH) is shared in time domain among all users, whereas for HSUPA the E-DCH is dedicated to a user. For HSDPA the transmission power is kept more or less fixed and rate adaptation is used. However, this is not possible for HSUPA since the uplink is non-orthogonal, therefore fast power control is needed for fast link adaptation. Soft handover is not supported by HSDPA, while for HSUPA soft handover is used to decrease the interference from neighboring cells and to have macro diversity gain. Consequently, for HSDPA the shared resources are the transmission power and the code space of the shared channel, but for HSUPA the shared resource is the interference headroom.
‡ Zoltán Nagy has left Ericsson Research since the work described in this paper was completed.
As with HSDPA [4], the Iub and Iur transport network links (Iub is between Node B and RNC; Iur is between Drift RNC and Serving RNC, SRNC) could be a bottleneck in the radio access network for HSUPA, since the increased air interface (Uu) capacity does not always come with similarly increased transport network capacity in practice. The cost of transport links is still high and not expected to decrease dramatically [5]. A possible congestion situation over a transport link cannot be solved efficiently by Transmission Control Protocol (TCP) because of lower layer retransmissions. It has been identified in 3GPP that an HSUPA flow control can resolve these congestion situations if transport network congestion detection functionality is available. For this purpose a new control frame and a new Information Element for the E-DCH Iub/Iur data frame were introduced in [6]. The requirements and principles of congestion control are summarized in [7].
In [4], the authors give an overview of the various congestion control algorithms developed for different networks, e.g. TCP and Asynchronous Transfer Mode (ATM) flow control solutions. An overview of HSDPA flow control solutions is also given, and an HSDPA flow control algorithm is proposed which addresses not only the efficient usage of the air interface, but also congestion on the transport network. In [8], a transport network overload control algorithm for Best-Effort DCH traffic is introduced. It is shown that already in the case of DCHs this improves transport network utilization. In [9], the authors made an HSUPA performance analysis assuming congested transport, but without using any transport network congestion control solution. To the best of the authors' knowledge, no HSUPA flow control solution has been proposed so far in the literature.
We propose an HSUPA flow control algorithm that supports scenarios where the transport network is the limiting factor. This algorithm uses the flow control framework standardized by 3GPP [6].
The rest of the paper is structured as follows. Section II gives a system overview. Section III describes the proposed HSUPA flow control algorithm. The performance of this flow control algorithm is evaluated in Section IV. Finally, Section V concludes the paper.

II. SYSTEM OVERVIEW
The nodes and protocol layers involved in the HSUPA flow control (FC) are depicted in Fig. 1 [10]. The figure also shows the location of the FC related functionalities. The task of the FC is to regulate the transfer of MAC-es Protocol Data Units (PDUs) on the Iub/Iur Transport Network (TN) towards SRNC. In the rest of the article flow denotes this MAC-es PDU flow. Several of these flows may share the same air interface or TN bottleneck. Note that the regulation provided by FC is needed only when the TN limits the performance.
When HSUPA is carrying moderate-speed Quality of Service (QoS) sensitive traffic, QoS can be guaranteed by TN bandwidth reservation by means of TN admission control and FC is not used. For best-effort traffic, bandwidth reservation is not efficient and FC is used instead.
A User Equipment (UE) can be in Soft Handover (SHO), which means that its transmission is received by more than one cell. One of these cells, usually the one with the best radio connection, is called the serving cell and the rest are called non-serving cells. When a UE is in SHO, it has as many flows over the TN as the number of Node Bs it is connected to.
The TN provides transmission between the Node B and SRNC. ATM/AAL2 (ATM Adaptation Layer 2) or UDP/IP (User Datagram Protocol/Internet Protocol) is used as the transport protocol. The TN bottleneck and the associated bottleneck buffer can be in the network at a point of aggregation and also in the nodes on the interface cards. The TN may support Transport Network Layer (TNL) QoS differentiation, which allows different flow controlled flows to receive different service over the TN based on e.g. subscription or service. Different flows of the same Node B may experience a bottleneck at different parts of the network, not only due to different TNL QoS levels, but also due to e.g. some flows being transmitted over Iur or over parallel Iub links. Additionally, flows must be able to efficiently use the changing TN capacity remaining from high priority flows to ensure efficient utilization of the TN. The FC must be capable of regulating the flows in this changing environment and must maintain high end-user throughput and fairness while maintaining low end-to-end delay for delay sensitive applications (e.g. gaming over best effort HSUPA).
The HSUPA air interface scheduler (Uu scheduler) operates by sending scheduling grants to UE and receiving scheduling requests from UE [2]. Only the scheduling framework is standardized, the scheduling algorithm itself is not. There are two types of scheduling grants, Absolute Grant (AG) and Relative Grant (RG). AGs can be sent only by the serving cell and transmitted over the E-DCH Absolute Grant Channel (E-AGCH), which is a shared resource among all users of the cell. The AG defines how many bits can be transmitted every TTI and thus a maximum limit of the data rate, and the AG is valid until a new scheduling grant is received. The RG can modify this rate up/down in the serving cell, or only down in the non-serving cell. The UE indicates by a flag called Happy Bit whether it would benefit from a higher rate grant or not.
The MAC-e/es protocol layers in the UE are responsible for HARQ and the transport format selection according to the scheduling grants. The created MAC-e PDU is transmitted over the air interface to the Node B. The MAC-e protocol layer in the Node B demultiplexes the MAC-e PDUs to MAC-es PDUs which are transmitted over the TN to the SRNC. The MAC-es protocol layer in the RNC handles the effect of the SHO by reordering, duplicate removal and macro combining to ensure in-sequence-delivery for the Radio Link Control (RLC) protocol layer.
While a connected UE may have several (MAC-es) flows multiplexed in one MAC-e flow, only one AG is assigned to the UE. This makes the congestion control challenging when some flows belonging to the same UE experience TN congestion while others do not. In this case, as a simplification, the whole MAC-e flow can be treated as congested.
RLC Acknowledged Mode (AM), which is a Selective Repeat Automatic Repeat Request protocol, is used between the UE and SRNC to correct the residual HARQ failures and to provide seamless channel switching [11]. RLC AM does not include congestion control functionality, because it assumes that RLC PDUs are transmitted by the MAC-d layer according to the available capacity. The RLC status messages, which are sent regularly, trigger retransmission of all missing PDUs. This may result in unnecessary retransmissions because new status messages are sent before the retransmitted PDUs arrive, especially in case of long round trip time. Several unsuccessful retransmissions trigger an RLC reset and the whole RLC window (maximum 80 KByte) is discarded. The end-user IP packets never get lost in the TN (unless the congestion causes an RLC reset), and therefore TCP cannot detect TN congestion based on duplicate acknowledgements. TCP slow start rapidly increases the TCP window size to its maximum, and the window is normally kept at maximum during the whole transmission (unless a bottleneck other than the TN is experienced) because of the lack of IP packet loss and the large enough RLC Service Data Unit (SDU) buffer. Too many retransmissions of the same PDU usually cause a TCP timeout that degrades the TCP efficiency significantly. Consequently, TCP cannot control TN congestion efficiently and a system specific congestion control solution is needed.
Frame loss and the resulting RLC retransmissions shall be minimized because they significantly increase the delay variation experienced by the end-user. The TN delay shall be kept low because of delay sensitive applications over HSUPA and to minimize the control loop delay for FC and RLC. The delay target for MAC-es PDUs over TN is typically in the order of 100 ms. This requirement is a compromise between performance and achievable utilization.
FC related Iub/Iur data and control frames are standardized in [6] and define the HSUPA FC framework. The FC algorithm itself is not standardized; each vendor can implement its own solution. The Iub/Iur E-DCH UL data frame (DF) contains the user data, the Frame Sequence Number (FSN), the Connection Frame Number (CFN) and the Subframe Number. CFN and Subframe Number are used for reordering, but can also be used to calculate a Delay Reference Time (DRT) which defines when the DF was sent from Node B. FSN and DRT can be used for TN congestion detection. Apart from congestion detection based on DF fields, transport protocol specific congestion detection techniques can also be used. The TNL Congestion Indication Control Frame (TCI CF) is used for reporting the congestion detected in SRNC. The TCI contains a congestion status field, which can indicate no congestion, congestion due to delay build-up or congestion due to frame loss.
While the purpose of HSDPA flow control [4] and HSUPA flow control is similar, there are significant differences. Firstly, for HSUPA only the TN bottleneck has to be regulated, while for HSDPA there are also Uu scheduler queues in the Node B to be regulated (called MAC-hs Priority Queues [4]). This also means that HSDPA FC must deal with both the Uu and the TN bottleneck, but in case of HSUPA FC the Uu bottleneck is completely handled by the Uu scheduler. Secondly, HSUPA can be in SHO, while HSDPA cannot. This means that for the same UE there can be several (one serving and zero or more non-serving) flows to be controlled.

III. FLOW CONTROL ALGORITHM DESCRIPTION
In this section, we introduce a rate-based per flow FC solution. A rate-based solution is chosen because it is well aligned with the standardized 3GPP framework. A per flow solution supports different TN bottlenecks for the flows of the same Node B and TNL QoS differentiation among the flows. An aggregated solution would require detailed information about the TN bottleneck(s) and QoS solution, and it would also have to support aggregated TN connections, where flows of several Node Bs can experience a bottleneck. While such a solution is not impossible, its complexity would be too high compared to the achievable gains. The FC algorithm architecture is depicted in Fig. 2.

Fig. 2. Flow control architecture
The FC is designed to provide fair throughput sharing among the flows sharing the same TN bottleneck, when the TN is limiting the throughput. Behavior of flows is regulated by the Uu scheduler until a TN congestion is detected. The reason for this is that as long as the TN is not a bottleneck it is the task of the Uu scheduler to utilize the air interface as much as possible and to provide fairness among the flows. The Uu scheduler increases the granted bitrate with a reasonable speed to avoid large interference peaks. This also ensures that sudden overload of the TN is avoided.
When TN congestion is detected the FC dominates the behavior. During this time the flows are regulated according to an algorithm which conforms to the additive increase multiplicative decrease (AIMD) property. In [12], it is shown that AIMD guarantees convergence to fairness; all flows converge to an equal share of resources in steady state, where no flows join or leave. A multiplication with a coefficient provides the multiplicative decrease and a constant increase rate after reduction provides the additive increase property. The AIMD property is met only for the serving cell behavior. However, a MAC-e PDU is normally received in the serving cell with a higher probability, thus the end-user fairness is dominated by the serving cell behavior.
The presented algorithm has several parameters, which were set according to typical achievable Uu and TN bitrates, TN buffer sizes and propagation delays. For a system where these typical values are significantly different from the values used in Section IV, the parameter setting has to be reconsidered.
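The convergence property cited from [12] can be illustrated with a toy simulation. The sketch below is not the paper's simulator: it assumes synchronous feedback (every flow sees the congestion signal in the same round), and the function names, capacity, decrease factor and increase step are all hypothetical values chosen for illustration.

```python
def aimd_step(rates, capacity, decrease=0.5, increase=10.0):
    """One synchronous AIMD round over all flows sharing one bottleneck.

    Assumption: all flows get the same binary congestion feedback.
    """
    if sum(rates) > capacity:
        # Multiplicative decrease: the gap between flows shrinks by `decrease`.
        return [r * decrease for r in rates]
    # Additive increase: the gap between flows is unchanged.
    return [r + increase for r in rates]


def simulate(rates, capacity, rounds):
    """Apply `rounds` AIMD rounds and return the final per-flow rates."""
    for _ in range(rounds):
        rates = aimd_step(rates, capacity)
    return rates


if __name__ == "__main__":
    # Two flows starting from a very unfair split of a 1000 kbps bottleneck.
    print(simulate([900.0, 100.0], capacity=1000.0, rounds=2000))
```

Each multiplicative decrease scales the difference between two flows by the decrease factor while additive increases leave it unchanged, so the rates approach an equal share.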

A. TN congestion detection in SRNC
The TN congestion detection and notification part of the algorithm is performed whenever a DF arrives at the SRNC. Two different congestion detection methods are used:
• FSN gap detection. The 4-bit FSN in the DF can be used to detect lost DFs.
• Dynamic Delay Detection (DDD). The Node B DRT is compared to a similar reference counter in SRNC when the DF is received. The difference between the two counters increases when the TN bottleneck buffer is built up. Congestion is detected when this difference increases too much compared to the minimum difference.
When performing DDD, the severity of the congestion is differentiated. In case of moderate dynamic delay increase (t_soft, e.g. 40 ms) it is soft congestion and in case of large increase (t_hard, e.g. 60 ms) it is hard congestion. A detected FSN gap is also reported as hard congestion.
978-1-4244-3062-8/08/$25.00 ©2008 IEEE
The detected congestion and its severity are reported to the Node B by a TCI CF, if no TCI CF was sent for a given time (t_TCI). The purpose of t_TCI is to avoid reacting twice to the same congestion event, and its value is based on TN dynamics.
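The detection logic of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, timestamps are plain milliseconds without counter wrap-around, the t_TCI default of 100 ms is an assumed value (the paper gives none), and thresholds are compared with "greater than or equal".

```python
from dataclasses import dataclass
from typing import Optional

FSN_MODULO = 16  # the FSN is a 4-bit counter


@dataclass
class CongestionDetector:
    t_soft_ms: float = 40.0   # example value from the text
    t_hard_ms: float = 60.0   # example value from the text
    t_tci_ms: float = 100.0   # assumed value, not given in the text
    last_fsn: Optional[int] = None
    min_delay_ms: Optional[float] = None
    last_tci_ms: Optional[float] = None

    def on_data_frame(self, fsn: int, drt_ms: float, arrival_ms: float) -> Optional[str]:
        """Return 'soft' or 'hard' if a TCI should be sent for this DF, else None."""
        status = None

        # FSN gap detection: a skipped sequence number means a lost DF.
        if self.last_fsn is not None and fsn != (self.last_fsn + 1) % FSN_MODULO:
            status = "hard"
        self.last_fsn = fsn

        # Dynamic delay detection: compare the current delay with the minimum
        # seen so far, so a constant clock offset between the nodes cancels out.
        delay = arrival_ms - drt_ms
        if self.min_delay_ms is None or delay < self.min_delay_ms:
            self.min_delay_ms = delay
        increase = delay - self.min_delay_ms
        if increase >= self.t_hard_ms:
            status = "hard"
        elif increase >= self.t_soft_ms and status is None:
            status = "soft"

        # Send at most one TCI per t_TCI interval.
        if status is None:
            return None
        if self.last_tci_ms is not None and arrival_ms - self.last_tci_ms < self.t_tci_ms:
            return None
        self.last_tci_ms = arrival_ms
        return status
```

Tracking the minimum observed difference rather than an absolute delay is what lets the comparison work without synchronized clocks in Node B and SRNC.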

B. Flow control in Node B
Whenever a TCI is received by the Node B, it triggers a congestion action by the FC entity. Depending on the severity of the congestion a reduce request with a coefficient Q is issued. Different coefficients are applied in case of soft and hard congestion: Q_soft (e.g. 90%) and Q_hard (e.g. 50%). Depending on whether it is a serving cell flow or a non-serving cell flow, the rate reduce request is issued to the Uu scheduler or to the Frame dropping functionality.
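The dispatch described above might look like the sketch below. The function name, the status strings and the target labels are hypothetical; only the example coefficient values come from the text.

```python
Q_SOFT = 0.9  # example value from the text
Q_HARD = 0.5  # example value from the text


def on_tci(status, is_serving_cell):
    """Map a received TCI to a (target, Q) reduce request, or None if no action.

    `status` is assumed to be one of "none", "soft" or "hard".
    """
    if status == "none":
        return None
    q = Q_HARD if status == "hard" else Q_SOFT
    # Serving cell flows are slowed down via the Uu scheduler; non-serving
    # cell flows are handled by dropping part of the received frames.
    target = "uu_scheduler" if is_serving_cell else "frame_dropper"
    return (target, q)
```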

C. Congestion Action in the Serving cell
Until the first rate reduce request is received, the Uu scheduler behavior is not affected by FC at all. Based on air interface conditions, hardware resources and Happy Bit information, the Uu scheduler decides the granted bitrate represented by the AG. Upon receiving a rate reduce request, the scheduler decreases the granted bitrate by sending a new AG according to the received Q. Additionally, once a rate reduce request has been issued for a flow, the scheduler does not increase the absolute grant of that flow by more than a predefined rate r_linIncr (e.g. 20-200 kbps/s). The value of r_linIncr is determined based on the typical TN bitrate and the typical number of parallel flows. The Uu scheduler used in the studied system does not send any RGs in the serving cell.
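The serving cell behavior can be summarized in a few lines. This is a sketch under assumptions: grants are modeled as continuous bitrates in kbps (a real AG is a quantized value), the function names are hypothetical, and the 10 ms update interval and 100 kbps/s slope are arbitrary picks, the latter from within the example range in the text.

```python
def reduce_grant(current_kbps, q):
    """Multiplicative decrease applied when a rate reduce request arrives."""
    return current_kbps * q


def next_grant(current_kbps, desired_kbps, fc_active, r_lin_incr=100.0, dt_s=0.01):
    """Grant for the next interval.

    `desired_kbps` is what the Uu scheduler would grant on its own. Before the
    first reduce request (`fc_active` False) it is used unchanged; afterwards
    the grant may grow by at most r_lin_incr kbps per second.
    """
    if not fc_active:
        return desired_kbps
    return min(desired_kbps, current_kbps + r_lin_incr * dt_s)


if __name__ == "__main__":
    grant = reduce_grant(1000.0, 0.5)      # hard congestion: 1000 -> 500 kbps
    for _ in range(100):                   # one second of 10 ms intervals
        grant = next_grant(grant, 1440.0, fc_active=True)
    print(grant)
```

Together, the two functions give the multiplicative decrease and the bounded linear increase that make up the AIMD behavior of a serving cell flow.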

D. Congestion Action in Non-serving cell
A TCI received in the non-serving cell shall not trigger rate reduction by RG, because a MAC-e PDU is received in the best cell (usually the serving cell) with a higher probability. Consequently, if we reduced the bitrate due to TN limitations in the non-serving cell, we might reduce the bitrate of the end-user unnecessarily. However, congestion action still needs to be taken, thus part of the received MAC-e PDUs are dropped. If these PDUs are not received in the serving cell, then RLC AM still retransmits these missing PDUs.
A forwarding coefficient, Q_forw, determines the probability that a received MAC-e PDU is forwarded. It is 1 at initialization and each received reduce request decreases Q_forw through multiplication by Q_soft or Q_hard. Q_forw is gradually increased to 1 afterwards.
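A minimal sketch of the frame dropping functionality follows. The text only states that Q_forw is "gradually increased"; the linear recovery and its rate are assumptions, and the class and method names are hypothetical.

```python
import random


class FrameDropper:
    """Probabilistic forwarding of MAC-e PDUs for a non-serving cell flow."""

    def __init__(self, recovery_per_s=0.05, rng=None):
        self.q_forw = 1.0                     # forward everything at start
        self.recovery_per_s = recovery_per_s  # assumed linear recovery rate
        self.rng = rng or random.Random()

    def on_reduce_request(self, q):
        """Scale the forwarding probability by Q_soft or Q_hard."""
        self.q_forw *= q

    def tick(self, dt_s):
        """Let the forwarding probability recover towards 1 over time."""
        self.q_forw = min(1.0, self.q_forw + self.recovery_per_s * dt_s)

    def forward(self):
        """Return True if the received PDU should be sent over the TN."""
        return self.rng.random() < self.q_forw
```

For example, one hard and one soft reduce request in a row leave Q_forw at 0.5 × 0.9 = 0.45, so a little under half of the received PDUs are forwarded until recovery sets in.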

IV. PERFORMANCE ANALYSIS
The FC and the Uu scheduler algorithms introduced in Section III were implemented in a WCDMA/HSPA system simulator. (Uu scheduling is more complex than the behavior described in Section III. However, for the simulations described in this section, the flows are TNL limited in most of the cases and scheduler behavior is dominated by the algorithm described in Section III.) The simulator contains the introduced HSUPA related protocol layers. The multicell radio environment consists of standard models for distance attenuation, shadow fading and multi-path fading, based on the 3GPP typical urban channel model, see [13]. The simulator supports 10 ms TTI length for E-DCH and the user equipment was an E-DCH Category 3 terminal [14], which supports approx. 1.44 Mbps peak rate on L1. The radio network used by the simulator consists of an RNC and a Node B with 3 cells.
To illustrate the need for system specific congestion control, one uploading user is simulated when the TN is limiting and the TN capacity is varying. Fig. 3 shows the 1 s average IP level throughput of the user with or without FC. The usage of FC provides high IP level throughput and reacts to the TN capacity changes very fast and accurately. When relying only on TCP (i.e. no FC) the performance is seriously degraded. In the beginning, TCP throughput is increased until the 200 ATM cells long TN buffer becomes filled. RLC PDUs then start being lost and retransmitted. Retransmission further increases the load and the PDU loss ratio, and thus causes further retransmissions. During the simulation the TN loss ratio is ∼20% and 71% of all sent RLC PDUs are retransmissions, which results in much lower throughput. However, after several retransmissions the IP packets still reach the RNC, so there is no IP packet loss or gap. Thus the congestion is not visible to TCP until an RLC reset and a consequent TCP timeout happen at ∼60 s.
In the rest of the section we evaluate the performance of the proposed FC. In the investigated cases the TN capacity was 1, 2 or 4 Mbps and there was no DCH and HSDPA traffic. A traffic model which can load the TN and is simple enough to evaluate the system behavior in detail was implemented. This model has three parameters: number of users attached to the Node B, object size uploaded by the users (10 Mbyte)
We use the following performance measures to investigate the performance and potential protocol problems: average total IP level throughput, average E-DCH Iub/Iur data frame delay and Jain's fairness index [12] of IP level throughput calculated for 1 s long intervals. The simulations were run for 600 s to evaluate these measures. Note that due to the protocol overheads the maximum achievable IP level throughput is ∼3 Mbps in case of 4 Mbps TN capacity.
Fig. 4 compares the average IP throughput as a function of the number of users over the TN link using different TN buffer sizes, i.e. 500 and 200 ATM cells (solid and dashed lines), and different TN capacities, i.e. 1, 2 and 4 Mbps. Table I shows the buffer sizes in ms for different TN capacities. The introduced FC uses the TN bottleneck efficiently (in case of 4 Mbps TN capacity and 1 or 2 users the throughput is limited by the Uu peak rate) and the throughput is hardly dependent on the TN buffer size. In case of 4 Mbps TN capacity, however, there is a small throughput difference between the different TN buffer sizes. This is because the 200 ATM cells long TN buffer is only 21.2 ms long, which is less than the soft congestion limit, hence only frame loss based congestion detection could be used. In case of the other TN capacities DDD can also be used.
Fig. 5 shows the average DF delay for different TN capacities and TN buffer sizes. The more users were connected to the Node B, the higher the measured average delay; however, these values are far below the 100 ms target. Note that the TN buffer length (Table I) is the upper limit of the measured delay.
The MAC-es PDU loss ratio was also investigated, but it was less than 1% even when the TN buffer was small and only frame loss based congestion detection could be used.
Fig. 6 shows the average fairness index with 2 Mbps TN capacity for different TN buffer sizes. In case of the 500 ATM cells long buffer the fairness provided by FC is more than 0.9. Lower fairness was measured in case of the 200 ATM cells long TN buffer (i.e. 42.4 ms). This is because the soft congestion region is very small, hence in case of congestion only a few users got soft congestion and the others got hard congestion. The difference between the decrease rates (Q_soft, Q_hard) caused the lower fairness.
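Two quantities used in this section are easy to reproduce. The sketch below converts a TN buffer size in ATM cells into a drain time and computes Jain's fairness index from per-user throughputs. The helper names are hypothetical. Counting the full 53-byte ATM cell and taking 1 Mbps as 10^6 bit/s are assumptions, chosen because they reproduce the 21.2 ms and 42.4 ms figures quoted in the text.

```python
ATM_CELL_BITS = 53 * 8  # full ATM cell, header included (assumption)


def buffer_length_ms(cells, capacity_mbps):
    """Time needed to drain a full buffer of `cells` ATM cells, in ms."""
    return cells * ATM_CELL_BITS / (capacity_mbps * 1e6) * 1e3


def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), from 1/n up to 1."""
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        return 1.0  # convention chosen here for the all-zero case
    return total * total / (len(throughputs) * squares)


if __name__ == "__main__":
    for cells in (200, 500):
        for mbps in (1, 2, 4):
            print(cells, "cells at", mbps, "Mbps:", buffer_length_ms(cells, mbps), "ms")
    print(jain_index([1.0, 1.0, 1.0, 1.0]), jain_index([1.0, 0.0, 0.0, 0.0]))
```

An index of 1 means all users get the same throughput, while a single user taking everything among n users gives 1/n.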

V. CONCLUSIONS
With steadily increasing air interface throughput, the efficient utilization of the often limiting transport network has become more important. To meet this demand a per-flow HSUPA transport network flow control algorithm has been proposed. The need for transport network congestion control was shown and transport network congestion detection and avoidance techniques were described. The introduced algorithm can support quality of service differentiation among HSUPA flows as well as different transport network bottlenecks for the flows of the same Node B. It was shown by simulations that the proposed algorithm can maintain high transport network utilization and good fairness among the flows while also keeping the delay and loss in the transport network low. The solution has been compared to a scenario relying only on TCP congestion control, and it has been shown that the lack of HSUPA flow control causes serious performance degradation in the system when the transport network capacity is limiting the throughput.