Delay allocation between source buffering and interleaving for wireless video

One fundamental tradeoff in the cross-layer design of a communications system is delay allocation. We study delay budget partitioning in a wireless multimedia system between two of the main components of delay: the queuing delay in the source encoder output buffer and the delay caused by the interleaver. In particular, we discuss how to apportion the fixed delay budget between the source encoder and the interleaver given the channel characteristics, the video motion, the delay constraint, and the channel bit rate.


Introduction
Delay partitioning is a fundamental tradeoff problem in the cross-layer design of a communications system. This problem is especially important in real-time video communications such as video conferencing or video telephony, which impose a tight end-to-end delay constraint. For example, interactive video telephony should have an end-to-end delay of no more than about 300 ms. Once the receiver begins displaying the received video, the display process must continue without stalling. In other words, to be useful, frame data entering the source encoder at time $t$ must be displayed at the decoder by time $(t + T)$, where $T$ is the delay constraint, that is, an upper bound on the end-to-end delay of the system. In addition, the data rate available on the channel is limited by the available bandwidth.
In [1] and [2], the design of rate-control schemes for low-delay video transmission was studied for a noiseless channel. In [3] and [4], the efficient design of an interleaver for a fading channel was investigated. In [5], specific tandem and joint source-channel coding strategies with complexity and delay constraints were analyzed and compared. In [6][7][8], delay-constrained wireless video transmission schemes were proposed for different application scenarios. In [9] and [10], tradeoffs between delay and video compression efficiency were discussed for a motion-compensated temporal filtering (MCTF) video codec and for hierarchical bi-directional (B-frame) schemes, respectively. In [11], the tradeoff between the long-term average transmission power and the average buffer delay incurred by the traffic was analyzed mathematically over a block-fading channel with delay constraints. Finally, in [12], the tradeoff between network capacity and end-to-end queuing delay was studied for a mobile ad hoc network.
*Correspondence: xinwangharry@mail.com; 4 VMware Inc., Palo Alto, CA 94304, USA. Full list of author information is available at the end of the article.
In these works, either design strategies with delay constraints were investigated without considering any tradeoff, or tradeoff problems with delay constraints were studied for purposes and contexts different from those of this paper. In this paper, we study delay partitioning for video communications over a Rayleigh fading channel. In particular, we focus on the delay allocation between the source encoder buffer and the interleaver as we vary parameters such as the motion of the video content, the rate of variation of the channel, the end-to-end delay constraint, the channel bit rate, and the channel code rate.
The system model we study is shown in Fig. 1. Typically, video frames arrive at the video encoder at a constant frame rate. The frames are compressed into a variable-rate bitstream and passed to the video encoder output buffer, from which bits are drained at a constant rate. To protect against channel errors, forward error correction (FEC) coding is applied to the compressed bitstream coming out of the encoder buffer. This is followed by interleaving to provide robustness against channel fading. Finally, the bitstream coming out of the interleaver is modulated and sent over the wireless channel. At the receiver, the bitstream is demodulated, de-interleaved, decoded, and then passed to the video decoder input buffer (henceforth called the decoder buffer). The video decoder extracts bits from the decoder buffer at a variable rate to display each frame at its correct time and at the same constant frame rate at which the frames were available to the video encoder. A rate-control mechanism is used at the video encoder to control the number of bits allotted to each frame so that the encoder buffer and the decoder buffer never overflow or underflow, while maintaining acceptable video quality at all times. Note that we assume there is no video encoder input buffer and no video decoder output buffer; hence, the video encoder output buffer and the video decoder input buffer are called the encoder buffer and the decoder buffer, respectively, throughout this paper.
The paper is organized as follows. In Section 2, the system model is introduced in detail. In Section 3, we formulate the delay partitioning problem mathematically and derive a relationship among the source encoder buffer delay, the interleaving delay, and the channel decoding delay under a delay constraint. Simulation results for the tradeoff between the source encoder buffer and the interleaver are presented and analyzed in Section 4 for different video sequences over Rayleigh fading channels. In particular, we study how the tradeoff is affected by the motion of the video content, the rate of variation of the channel, the delay constraint, and the channel bit rate. Finally, Section 5 concludes the paper.

System model with delay constraint
In this section, we will discuss the components in Fig. 1 in detail.

Source coding
In real-time video communications, the end-to-end delay for transmitting video data needs to be very small, particularly for interactive two-way applications such as video conferencing and gaming. Video data enters the source encoder at a constant rate of $f$ frames per second (fps), where it first undergoes block-based motion-compensated (MC) prediction, followed by a DCT of the residual block. The DCT coefficients are quantized by appropriately choosing the quantization parameter, and the quantized values are then run-length and Huffman coded. Assume the transmission bit rate is $R_B$ bits per second (bps), and the source-coded bitstream leaves the encoder buffer at $r_s$ bps.
Whenever a frame occupies more than $r_s/f$ bits, bits accumulate in the source encoder buffer and increase the encoder buffer delay experienced by the incoming bits. If this trend continues for several frames, the buffer may fill up because the buffer size is limited. When the number of bits in the buffer exceeds a predetermined threshold, frames are skipped, as discussed later. On the other hand, whenever a frame occupies fewer than $r_s/f$ bits, the encoder buffer fullness decreases. If this trend continues for several frames, the encoder buffer may run empty, thereby wasting channel bandwidth.
By sensing the buffer fullness and keeping an estimate of the available bit budget, the rate control chooses the quantization step size and seeks to prevent buffer overflow and underflow while maintaining acceptable video quality. If either the remaining bit budget is small or the buffer is getting full, the rate control resorts to coarse quantization. If either the remaining bit budget is large or the buffer is getting empty, the quantization step size is reduced (i.e., fine quantization). A large delay budget for the source encoder allows the use of a large encoder buffer, which tends to result in higher-quality video because the rate control has more freedom. Typically, the increased number of bits resulting from finely encoding a complex scene can be easily accommodated in the large buffer. However, when tight delay constraints exist, the system must operate with a small encoder delay budget, or equivalently a small encoder buffer, which tends to reduce the quality of the video, as the functioning of the rate control is more constrained. In extreme cases, the encoder buffer may fill up several times, leading to loss of data through repeated frame skipping.
On the decoder side, the incoming stream of video data is buffered in a source decoder buffer. Once the source decoder starts displaying the frames, the delay constraint becomes operational. If $T$ denotes the upper bound on the end-to-end delay of the system, a frame entering the encoder at time $t$ must be displayed at the decoder at time $(t + T)$, and all the video data corresponding to this frame must be available at the decoder by then. A video frame that does not meet its delay constraint is useless and is considered lost. We assume that the source decoder knows which frames were skipped by the source encoder, and that it repeats the immediately preceding displayed frame in place of each skipped frame.
In H.263, the rate control performs bit allocation by selecting the encoder's quantization parameter for each block of 16 × 16 pixels. We choose the test model number 8 (TMN-8) rate control [1,2], recommended for low-delay applications. The TMN-8 rate control is a two-step approach: a frame-layer control first selects a target bit count for the current frame, followed by a macroblock (MB) layer rate control that selects the quantization step size for each MB in the frame. The TMN-8 rate control has a threshold for frame skipping. Whenever the number of bits in the encoder buffer exceeds this threshold, typically one frame is skipped so that the number of bits in the buffer falls below the threshold. For each skipped frame, the buffer fullness decreases by $r_s/f$ bits. We assume the first frame in the video sequence is coded as an I frame and all subsequent frames as P frames, since this is a common strategy for video communications with a tight delay budget. We also assume the I frame is transmitted error free to the decoder and the decoder does not start the display until the first I frame is completely buffered. The rate control starts with the first P frame. Once the I frame is displayed, the delay constraint becomes operational and all subsequent frames must meet their delay constraint.
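To make the buffer dynamics described above concrete, here is a minimal frame-level sketch in Python. It is not the TMN-8 algorithm: the per-frame bit counts are taken as given instead of being adapted by a quantizer, and frames are skipped while the buffer fullness exceeds the frame-skipping threshold `S`, each skip lowering the fullness by the $r_s/f$ bits drained during that frame interval. The function `simulate_encoder_buffer` and its argument names are hypothetical.

```python
from typing import List, Tuple


def simulate_encoder_buffer(
    frame_bits: List[float], r_s: float, f: float, S: float
) -> Tuple[List[float], List[int], int]:
    """Toy frame-level model of the source encoder output buffer.

    frame_bits: bits each frame would produce if encoded (taken as given;
                a real rate control would adapt them via the quantizer).
    r_s: constant rate at which bits leave the buffer, in bits/s.
    f:   frame rate, in frames/s.
    S:   frame-skipping threshold in bits.

    Returns the fullness after each frame interval, the indices of
    skipped frames, and the number of intervals with buffer underflow.
    """
    drain = r_s / f  # bits sent to the channel encoder per frame interval
    fullness = 0.0
    trace: List[float] = []
    skipped: List[int] = []
    underflows = 0
    for i, bits in enumerate(frame_bits):
        if fullness > S:
            # Above threshold: skip frame i, so no new bits enter the buffer.
            skipped.append(i)
            arrivals = 0.0
        else:
            arrivals = bits
        level = fullness + arrivals - drain
        if level < 0.0:
            # Buffer ran empty: part of the channel capacity is wasted.
            underflows += 1
            level = 0.0
        fullness = level
        trace.append(fullness)
    return trace, skipped, underflows
```

For example, with $r_s = 48$ kbps and $f = 10$ fps (4800 bits drained per frame interval), a single 20,000-bit frame against a threshold of 10,000 bits forces the next two frames to be skipped before the fullness falls back below the threshold.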
From the point of view of the system engineer, the parameter of interest is the threshold for frame skipping, denoted by $S_t$. For the hardware engineer, however, the buffer size, denoted by $S_b$, is more important. These two quantities are closely related, as explained now. As shown in Fig. 2, we modify the rate control such that, while a frame is being encoded, if the buffer fullness exceeds $S_t$, the remaining MBs of the frame are all skipped. If a particular sequence of $N_{skip}$ bits is used to inform the decoder of this situation, the required buffer size is $S_b = S_t + N_{skip}$. Because $N_{skip}$ is usually much smaller than $S_t$ (e.g., $N_{skip} = 24$ in our system, while $S_t$ is at least several thousand), for simplicity we assume that the threshold for frame skipping and the buffer size are equal, i.e., $S_b = S_t = S$.

Channel coding
The information bitstream coming out of the source encoder buffer is channel coded using a rate-compatible punctured convolutional (RCPC) code with rate $r_c$ and constraint length $\nu$ [13]. At the receiver, the Viterbi algorithm is used to find the most likely path through the trellis for the received bitstream. The delays incurred in the channel encoder and decoder are called the channel encoding delay and the channel decoding delay, respectively; together, they make up the delay budget of channel coding. When a convolutional code with constraint length $\nu$ is used, the channel decoding delay is approximately the decision depth of the Viterbi decoder, which is about $5\nu$ bits. The decision depth for punctured convolutional codes is generally longer: if the puncturing period of the RCPC code is $P$, the decision depth can be bounded by $5P\nu$ [14]. We note, however, that with channel coding schemes that require iterative decoding at the receiver, such as turbo codes, the channel coding delay may use up a significant portion of the overall delay budget.
Bandwidth is a major resource shared between source coding and channel coding. A bandwidth constraint limits the available rate on the channel. Allocating more bandwidth to the source encoder allows more information from the source to be transmitted, resulting in better-quality video. However, the bandwidth available for channel coding is then reduced, leaving more residual errors after channel decoding and thus a lower probability of achieving high video quality. Let $R_B$ bps be the total available rate on the channel, and let $r_s$ and $r_c$ be the average source coding rate and the channel code rate, respectively. Then the bandwidth constraint is expressed as [15,16]
$$\frac{r_s}{r_c} \le R_B. \tag{1}$$
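Eq. (1) and the decision-depth rule for RCPC codes can be written as two small helpers. This is a sketch: the function names are hypothetical, and the puncturing period $P = 8$ used in the usage note is an assumed value, since the text does not specify $P$ for the codes it uses.

```python
def source_rate(R_B: float, r_c: float) -> float:
    """Largest average source rate r_s (bits/s) allowed by Eq. (1),
    r_s / r_c <= R_B."""
    return r_c * R_B


def decoding_delay(P: int, nu: int, R_B: float) -> float:
    """Approximate Viterbi decoding latency in seconds for an RCPC code
    with puncturing period P and constraint length nu: a decision depth
    of about 5 * P * nu channel bits, transmitted at R_B bits/s."""
    return 5 * P * nu / R_B
```

With $R_B = 144$ kbps and $r_c = 1/3$ (the values used in Section 4), `source_rate` gives $r_s = 48$ kbps; with the assumed $P = 8$ and $\nu = 6$, `decoding_delay` gives $240/144000 \approx 1.7$ ms, which is small compared with delay budgets of 100 ms or more.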

Interleaving and fading channel model
We consider coherent BPSK over a flat fading channel, where flat fading means that the channel gain is constant across the bandwidth of the received signal. Therefore, the effect of the channel is a multiplicative gain on the received signal. We use the channel model suggested by Jakes [17], in which the envelope of the fading process is assumed to be Rayleigh distributed. The Doppler spectrum is given by
$$S(f) = \frac{1}{\pi f_D \sqrt{1 - (f/f_D)^2}}, \qquad |f| < f_D,$$
and $S(f) = 0$ otherwise, where $f_D$ is the Doppler frequency, given by $f_D = f_c v / c$, where $f_c$ is the carrier frequency, $v$ is the mobile velocity, and $c$ is the speed of light. The covariance function of the fading process for this channel model can be shown to be given by the zeroth-order Bessel function of the first kind, namely
$$R(\tau) = J_0(2\pi f_D \tau),$$
where $\tau$ is the time separation between the two instants at which the channel is sampled. Thus, the correlation between two consecutive symbols with separation $T_s$ is
$$\rho = J_0(2\pi f_D T_s),$$
where $T_s$ is the symbol time. The product $f_D T_s$ is usually called the normalized Doppler frequency.
Error control coding works well when the code symbols used in the decoding process are affected by independent channel conditions. Correlated fading is one of the sources of channel memory on the land mobile channel. Interleaving is used to break up channel memory, and it is an essential element in the design of error control coding techniques for the land mobile channel. A block interleaver formats the encoded data in a rectangular array of $N_1$ rows and $N_2$ columns. The code symbols are written in row-by-row and read out column-by-column. On the decoder side, the received symbols are first de-interleaved before they enter the decoder. As a result of this reordering, the fading samples of two consecutive symbols entering the decoder are actually $N_1 T_s$ apart in time, and the correlation between two consecutive channel instances is now $J_0(2\pi f_D N_1 T_s)$. The parameter $N_1$ is often referred to as the depth of the interleaver.
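The write-by-rows, read-by-columns rule can be sketched as follows (all function names are hypothetical). `channel_slot` maps an input symbol index to its position on the channel, which makes it easy to check that two symbols adjacent within a row are transmitted $N_1$ slots apart, so their fading samples are $N_1 T_s$ apart.

```python
from typing import List, Sequence, TypeVar

Sym = TypeVar("Sym")


def channel_slot(n: int, n1: int, n2: int) -> int:
    """Channel position of input symbol n: it is written at (row, col)
    = divmod(n, n2) and read out in column-major order."""
    row, col = divmod(n, n2)
    return col * n1 + row


def interleave(symbols: Sequence[Sym], n1: int, n2: int) -> List[Sym]:
    """Block interleaver with n1 rows (the depth) and n2 columns:
    write row-by-row, read column-by-column."""
    if len(symbols) != n1 * n2:
        raise ValueError("block length must equal n1 * n2")
    return [symbols[row * n2 + col] for col in range(n2) for row in range(n1)]


def deinterleave(received: Sequence[Sym], n1: int, n2: int) -> List[Sym]:
    """Inverse of interleave(): input symbol n was sent in slot channel_slot(n)."""
    if len(received) != n1 * n2:
        raise ValueError("block length must equal n1 * n2")
    return [received[channel_slot(n, n1, n2)] for n in range(n1 * n2)]
```

In a small $3 \times 4$ example, input symbols 0 and 1 occupy channel slots 0 and 3, i.e., $N_1 = 3$ slots apart, and de-interleaving restores the original order.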
The inverse of the normalized Doppler frequency, $N_{coh} = 1/(f_D T_s)$, roughly equals the coherence time of the channel in bits, and is a measure of the number of consecutive bits over which the channel remains correlated. The amount of interleaving required depends on the channel: if the channel varies more slowly, the coherence time is longer and consequently a larger interleaver is required. When there is no limit on the size of the interleaver, perfect interleaving can be achieved for mobile channels, which ensures that the fading envelopes are uncorrelated. However, both interleaving and de-interleaving introduce delay into the system, called the interleaving delay; each of these two delays equals $N_1 N_2 T_s$ seconds. In a practical system, the interleaving delay budget is constrained not only by the overall delay budget but also by the delay budgets needed for the robust functioning of source coding and channel coding.
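The coherence time, the de-interleaved correlation $J_0(2\pi f_D N_1 T_s)$, and the interleaving delay can be evaluated numerically. The sketch below is dependency-free and uses hypothetical function names; it computes $J_0$ from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\theta)\,d\theta$ with the trapezoidal rule (a library routine such as `scipy.special.j0` would serve equally well), and it assumes $T_s = 1/R_B$, i.e., one BPSK symbol per coded bit.

```python
import math


def bessel_j0(x: float, n: int = 200) -> float:
    """J0(x), the zeroth-order Bessel function of the first kind, from
    J0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) d(theta).
    The integrand is smooth and pi-periodic, so an n-point trapezoidal
    rule over one period converges very quickly."""
    return sum(math.cos(x * math.sin(k * math.pi / n)) for k in range(n)) / n


def coherence_bits(fd_ts: float) -> float:
    """Coherence time in bits, N_coh = 1 / (f_D T_s)."""
    return 1.0 / fd_ts


def deinterleaved_correlation(fd_ts: float, n1: int) -> float:
    """Fading correlation between consecutive decoder inputs after a
    depth-n1 block interleaver under Jakes' model: J0(2 pi f_D n1 T_s)."""
    return bessel_j0(2.0 * math.pi * fd_ts * n1)


def interleaving_delay(n1: int, n2: int, R_B: float) -> float:
    """Delay of the interleaver alone, N1 * N2 * T_s seconds, assuming
    T_s = 1 / R_B; the de-interleaver adds the same delay again."""
    return n1 * n2 / R_B
```

For instance, $f_D T_s = 0.005$ gives $N_{coh} = 200$ bits. Note that $J_0$ oscillates rather than decaying monotonically, so, unlike under an exponential correlation model, the magnitude of the residual correlation after de-interleaving is not a monotone function of $N_1$.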
For convolutionally coded systems, the dimensions of the interleaver are chosen to maximize the interleaving depth $N_1$, which should ideally be $N_{coh}$ to ensure nearly independent fading for consecutive symbols. More importantly, $N_2$ should be large enough to avoid the wrap-around effect [18][19][20], which occurs when the length of an error event exceeds the number of columns in the interleaver. In that case, more than one symbol of the error event is affected by virtually the same channel conditions, which degrades performance. As a rule of thumb, the number of columns is chosen slightly larger than the length of the shortest error event of the code.
Interleaving, in conjunction with FEC, is a mechanism to achieve time diversity: by transmitting consecutive symbols sufficiently separated in time, nearly independent fading is ensured. As with any diversity technique, the performance improvement shows diminishing returns as the diversity order increases. Note that the effective diversity order is a nondecreasing function of $N_1$. Various rules of thumb are available in the literature for choosing an interleaver depth sufficient to achieve nearly the performance of independent fading [3,4].
In [3], simulations were used to demonstrate that fully interleaved performance is approximately achieved for BPSK over exponentially correlated channels when the interleaver depth satisfies $f_D T_s N_1 > 0.1$, i.e., $N_1 > 0.1 N_{coh}$. This rule, however, does not apply to correlated fading channels with other autocorrelation functions, such as Jakes' model. In [4], a simple figure of merit for evaluating the interleaver depth was obtained for Rician channels and a variety of channel autocorrelation functions. However, as shown in our simulations, this figure of merit does not hold for Jakes' fading model on Rician channels with a low κ factor (κ is the ratio of the signal energy in the direct and diffuse components) or in the limiting Rayleigh fading case.

The delay constraint formulation
The end-to-end delay constraint of each frame, $T$, is the upper bound on the delay that a frame may experience and still be displayed on time, where delay means the time difference between when the video frame is captured for encoding and when it reaches the video decoder. Consider frame $i$ captured at time $t$; without loss of generality, we assume $t = 0$. Further, we assume that each frame has the same number of MBs, denoted by $M$ (e.g., for video in QCIF format, $M = 99$). We denote the MB index by $k$ ($k = 0, 1, 2, \dots, M - 1$) and let $b_i(k)$ be the number of bits in the $k$th MB of the $i$th frame.
Frames arrive at the video encoder at some constant frame rate, and thus, the processor has to process each frame in the same amount of time because we assume there is no video encoder input buffer. Each frame has the same number of MBs, and we assume each MB has to be processed in the same amount of time. At the video decoder, frames are displayed at a constant frame rate, and we assume there is no video decoder output buffer.
For frame $i$ to meet its delay constraint, the decoding of the $k$th MB must begin at time $T - (M - k)T_d$, where $T_d$ is the time required to decode an MB (source decoding only, i.e., excluding FEC decoding), assumed to be the same for all MBs. Also, the $k$th MB becomes available for encoding only after time $kT_e$, where $T_e$ is the encoding time of an MB (source encoding only, i.e., excluding FEC encoding), also assumed to be the same for all MBs. Thus, if the $k$th MB is to meet its decoding deadline, the following must hold:
$$T_{eb}(k) + T_{enc}(k) + T_{int}(k) + T_c(k) + T_{CH} + T_{dein}(k) + T_{dec}(k) + T_{db}(k) \le T - (M - k)T_d - kT_e, \tag{4}$$
where $T_{eb}(k)$ is the encoder buffer delay, i.e., the time the $k$th MB waits in the encoder buffer before it starts moving out to the channel encoder; $T_{enc}(k)$ is the FEC encoding delay for the $k$th MB; $T_{int}(k)$ is the delay caused by interleaving for the $k$th MB; $T_c(k)$ is the transmission time for the $k$th MB; $T_{CH}$ is the channel propagation delay, assumed to be a known constant; $T_{dein}(k)$ is the delay caused by de-interleaving for the $k$th MB; $T_{dec}(k)$ is the channel decoding delay; and $T_{db}(k)$ is the decoder buffer delay for the $k$th MB, i.e., the time it waits in the decoder buffer before its decoding for display begins.
A few simplifications can be made. We have explained above why the video encoding time $T_e$ and the video decoding time $T_d$ are assumed to be the same for all MBs. We further assume that they are equal to each other, which is equivalent to saying that the MBs arrive at the encoder buffer and depart from the decoder buffer as a stream with consecutive MBs spaced $T_{MB}$ seconds apart, where $T_{MB} = 1/(Mf)$ and $f$ is the frame rate. With $T_e = T_d = T_{MB}$, the right-hand side of Eq. (4) becomes $T - M T_{MB} = T - 1/f$, which is independent of $k$. We also ignore the delay caused by channel encoding (i.e., $T_{enc}(k) \approx 0$), because it is negligible compared with the delay caused by channel decoding and the delay caused by source encoding.
For Viterbi decoding of RCPC codes with constraint length $\nu$ and puncturing period $P$, the decoder has a latency of approximately $T_{dec}(k) = 5P\nu/R_B$ [13,14]. Also, since we assume a rate-$r_c$ channel code and a fixed channel rate of $R_B$ bps, the $b_i(k)$ source bits of the $k$th MB become $b_i(k)/r_c$ coded bits, and the transmission time for the $k$th MB is $T_c(k) = b_i(k)/(r_c R_B) = b_i(k)/r_s$, using Eq. (1) with equality. We assume that each MB has enough bits to span the width of the interleaver at least once, i.e., $b_i(k) \ge N_2$. Writing $N = N_1 N_2$ for the interleaver size and $T_s = 1/R_B$, the sum of the interleaving and de-interleaving delays is then approximately $T_{int}(k) + T_{dein}(k) \approx 2N/R_B$. Incorporating all these simplifications, Eq. (4) can be written as
$$T_{eb}(k) + \frac{2N}{R_B} + \frac{b_i(k)}{r_s} + T_{CH} + \frac{5P\nu}{R_B} + T_{db}(k) \le T - \frac{1}{f}. \tag{5}$$
Furthermore, the term $b_i(k)/r_s$ is typically on the order of a few milliseconds. For example, with $r_s = 48$ kbps, $f = 10$ fps, and $M = 99$, the average number of bits per MB is $b_i(k) \approx 48000/(10 \times 99) \approx 50$, and thus $b_i(k)/r_s \approx 1$ ms. Because the delay budgets in the multimedia applications we study are typically 100 ms or more, the term $b_i(k)/r_s$ can be neglected. Assuming a constant channel propagation delay $T_{CH}$, and noting that we need $T_{db}(k) > 0$ to guarantee that the source decoder buffer does not run empty, Eq. (5) can be rewritten as
$$T_{eb}(k) < C - \frac{2N}{R_B} - \frac{5P\nu}{R_B}, \tag{6}$$
where
$$C = T - \frac{1}{f} - T_{CH}.$$
The encoder buffer delay experienced by each MB in each frame must satisfy this inequality for the corresponding frame to meet its display deadline. As explained previously, the maximum number of source-coded bits in the source encoder buffer is $S$, and they leave the buffer at a rate of $r_s$ bps; thus, $T_{eb}(k) \le S/r_s$. As a result, Eq. (6) always holds whenever
$$\frac{S}{r_s} + \frac{2N}{R_B} + \frac{5P\nu}{R_B} < C, \tag{7}$$
where $S/r_s$ can be viewed as the delay budget for source coding, $2N/R_B$ as the delay budget for interleaving, and $5P\nu/R_B$ as the delay budget for channel decoding. The delay partitioning problem is therefore to allocate the delay budget $C$ among these three components, subject to constraint (7), such that the overall distortion of the video transmission is minimized.
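The three terms of Eq. (7) can be evaluated for a candidate allocation with the following sketch (the class, function, and constant names are hypothetical). The puncturing period `P_ASSUMED = 8` is an assumption for illustration only, since the text does not state $P$; the interleaver fractions checked below do not depend on it.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """The three delay terms of Eq. (7), in seconds."""
    source: float        # S / r_s: source encoder buffer
    interleaving: float  # 2 N1 N2 / R_B: interleaver plus de-interleaver
    decoding: float      # 5 P nu / R_B: Viterbi decision depth

    @property
    def total(self) -> float:
        return self.source + self.interleaving + self.decoding


def delay_budget(S: float, r_s: float, n1: int, n2: int,
                 P: int, nu: int, R_B: float) -> DelayBudget:
    """Evaluate the delay terms for one candidate allocation (N1, S)."""
    return DelayBudget(
        source=S / r_s,
        interleaving=2 * n1 * n2 / R_B,
        decoding=5 * P * nu / R_B,
    )


def satisfies_eq7(budget: DelayBudget, C: float) -> bool:
    """Eq. (7): the three terms together must stay below the budget C."""
    return budget.total < C


P_ASSUMED = 8  # hypothetical puncturing period; not given in the text
```

With the Foreman operating point from Section 4 ($S = 5500$, $r_s = 48$ kbps, $N_1 = 151$, $N_2 = 16$, $R_B = 144$ kbps), the interleaver term is $4832/144000 \approx 33.6$ ms, about 22.4% of $C = 150$ ms, and the source term is about 114.6 ms; with the assumed $P = 8$ the decoding term adds about 1.7 ms, for a total just under 150 ms. The same interleaver term reproduces the 30.0% and 16.0% fractions reported for $N_1 = 135$ at $C = 100$ ms and $N_1 = 180$ at $C = 250$ ms.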
In the following section, simulation results are presented to study the three main components of the delay budget. A possible direction for future research is to develop analytical models, with a suitable utility function of the three delay budget components, so that the delay budget tradeoffs can be resolved analytically under specific conditions; this is beyond the scope of this paper.

The effect of interleaver depth on system performance
An interleaver is important for removing channel memory when error control codes designed for memoryless channels are applied to channels with memory. Before we consider the delay allocation tradeoff in wireless multimedia, we first study the effect of interleaver design without a delay budget restriction. The performance of an interleaver is governed by its interleaving depth $N_1$. As mentioned in Subsection 2.3, simulation results in [3] demonstrated that fully interleaved performance is approximately achieved for BPSK over exponentially correlated channels when $N_1 \ge 0.1 N_{coh}$, and [4] further extended this result to Rician channels and a variety of channel autocorrelation functions by proposing a simple figure of merit for evaluating interleaver depth. Our simulations confirm this result for Jakes' fading model on Rician channels with a high κ factor. In Fig. 3, we show simulation results for a system with a channel code of rate $r_c = 1/2$ and minimum distance $d_{min} = 10$, an interleaver with $N_2 = 100$ columns, and Jakes' fading spectrum with $f_D T_s = 0.01$ (i.e., $N_{coh} = 100$). The two bottom dashed lines correspond to the Rician channel with κ = 5 (about 7 dB), with interleaver depth $N_1 = 14$ and with ideal interleaving (i.e., $N_1 = \infty$). These results match those in Fig. 4 of [4], which illustrates that $N_1 = 14$, slightly larger than $0.1 N_{coh} = 10$, gives performance close to that of ideal (infinite) interleaving.
However, further simulations show that this figure of merit does not hold for Jakes' fading model on Rician channels with a low κ factor. Lowering the κ factor of the Rician channel makes the fading more severe, and the channel becomes Rayleigh when κ = 0 (−∞ dB), where the direct signal component is completely absent. As κ decreases, the performance clearly degrades and a larger interleaver depth may be required. Fig. 3 also shows the performance for κ = 0 with interleaver depths $N_1 = 14$, $N_1 = 0.7 N_{coh} = 70$, $N_1 = N_{coh} = 100$, and infinite interleaving. As seen from the four top plots, substantial gains over $N_1 = 14$ are achieved, with an improvement of an order of magnitude, especially at medium and high SNR. On the other hand, although the performance improves significantly from $N_1 = 14$ to $N_1 = 70$, there is little further gain beyond $N_1 = 70$. This is the typical behavior of any diversity system: as the diversity order increases, the improvement in performance shows diminishing returns. Figure 4 further illustrates this point by showing bit error performance versus interleaver depth for a convolutional channel code with $r_c = 1/3$, constraint length $\nu = 6$, and $d_{min} = 14$ [13,21], over a Rayleigh fading channel with $f_D T_s = 0.005$ (i.e., $N_{coh} = 200$). $N_2$ is fixed at 16, which is slightly greater than $d_{min}$ [18,19]. We again note the sharp fall in bit error rate (BER) as $N_1$ increases from 0 to 80, and that the performance begins to flatten out from around $N_1 = 140$ onward, which is again the depth corresponding to $0.7 N_{coh}$.
Our simulation results therefore suggest the following: for Rician channels with a high κ factor, fully interleaved performance is approximately achieved when the interleaver depth satisfies $N_1 \ge 0.1 N_{coh}$, whereas for Rician channels with a low κ factor, in particular for a Rayleigh fading channel, fully interleaved performance is approximately achieved when $N_1 \ge 0.7 N_{coh}$. In addition, the number of columns $N_2$ should be greater than the minimum distance $d_{min}$ of the channel code to avoid the wrap-around effect.
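The two rules of thumb above can be packaged as a small helper. This is a sketch, not a design procedure from the paper: the boolean `rayleigh` flag is a simplification, because the text does not give a κ value separating "high" from "low", and `d_min + 1` is only the smallest column count satisfying $N_2 > d_{min}$ (the simulations above use $N_2 = 16$ for $d_{min} = 14$). The function name is hypothetical, and the returned depth is a target for near-fully-interleaved performance that the delay budget of Eq. (7) may not permit.

```python
import math
from typing import Tuple


def interleaver_dims(n_coh: float, d_min: int,
                     rayleigh: bool = True) -> Tuple[int, int]:
    """Rule-of-thumb interleaver dimensions (N1, N2) from the simulations
    above: N1 >= 0.7 N_coh for Rayleigh (low-kappa Rician) fading,
    N1 >= 0.1 N_coh for high-kappa Rician fading, and N2 > d_min."""
    factor = 0.7 if rayleigh else 0.1
    n1 = math.ceil(factor * n_coh - 1e-9)  # tolerance absorbs float round-off
    n2 = d_min + 1  # smallest column count strictly greater than d_min
    return n1, n2
```

For the Rayleigh channel with $N_{coh} = 200$ and the $d_{min} = 14$ code, it returns $(N_1, N_2) = (140, 15)$.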

Delay allocation between the source encoder buffer and the interleaver, for fixed delay budget, channel bit rate, and FEC code
We discuss the delay allocation between the source encoder buffer and the interleaver in this and the next subsection. In all our simulations, we encoded QCIF video sequences at $f = 10$ fps. For all comparisons, we kept the ratio of the energy per coded bit to the noise power spectral density, $E_s/N_0$, constant at 3 dB. For each set of system and channel parameters, we ran 10,000 realizations of the time-correlated Rayleigh fading channel, generated using Jakes' model [17]. We computed the cumulative distribution function (CDF) of the average peak signal-to-noise ratio (PSNR), where the PSNR is calculated by first averaging the mean square error (MSE) over the entire decoded video sequence and then converting it to PSNR.
The system performance can be gauged once the CDF curves for each possible set of parameters in the [...]. Fig. 5 illustrates what the CDF curves could look like. Whenever two CDF curves do not intersect (e.g., curves $C_1$ and $C_3$ in Fig. 5), the lower curve is superior because it always has a higher probability of achieving any given average PSNR. When two curves cross (e.g., curves $C_1$ and $C_2$ in Fig. 5), one curve may be superior for one application but not for another. Comparison between the curves may then involve criteria such as minimizing the area under the curve, perhaps with some weighting. In this paper, as shown in Fig. 5, we evaluate system performance with the criterion from [22] of minimum area under the CDF curve to the left of a threshold $x_h$ (defined later in the paper), i.e., the value
$$A = \int_{-\infty}^{x_h} F(x)\,dx,$$
where $F(x)$ is the CDF of the average PSNR.
In this subsection, we analyze the delay partition between the source encoder buffer and the interleaver for a fixed delay budget $C$, a given channel bit rate $R_B$, and a fixed RCPC code with rate $r_c$. As explained in Section 2, the delay budget of the source encoder is determined by the threshold for frame skipping $S$.
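The PSNR averaging and the minimum-area criterion described above can be sketched as follows (function names are hypothetical; a peak value of 255, i.e., 8-bit samples, is assumed). For an empirical CDF built from the per-realization PSNRs, each sample contributes a step of height $1/n$, so the area to the left of $x_h$ equals the mean of $\max(x_h - x, 0)$ over the samples, which avoids explicit numerical integration.

```python
import math
from typing import Sequence


def sequence_psnr(frame_mse: Sequence[float], peak: float = 255.0) -> float:
    """Average PSNR of one decoded sequence: average the per-frame MSE
    first, then convert to dB (peak = 255 assumes 8-bit samples)."""
    mean_mse = sum(frame_mse) / len(frame_mse)
    return 10.0 * math.log10(peak ** 2 / mean_mse)


def cdf_area(psnrs: Sequence[float], x_h: float) -> float:
    """Area under the empirical CDF of per-realization PSNRs, to the left
    of x_h. Each sample x adds a step of height 1/len(psnrs) at x, whose
    integral up to x_h is max(x_h - x, 0) / len(psnrs); smaller is better."""
    return sum(max(x_h - x, 0.0) for x in psnrs) / len(psnrs)
```

One consequence worth noting: when $x_h$ is at least as large as every observed PSNR, as with the choice of $x_h$ as the maximum observed PSNR, the area reduces to $x_h$ minus the sample mean of the PSNRs.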
Given $R_B$ and an RCPC code with rate $r_c$, the source coding rate $r_s$ is determined by (1), and the channel decoding delay, roughly $5P\nu/R_B$, is also fixed. In this scenario, increasing the delay budget of the source encoder comes at the cost of reducing the interleaver delay budget, i.e., using a smaller interleaver. In general, given the total delay budget $C$ and the channel bit rate $R_B$, the choice of $S$ is affected by the source encoding rate $r_s$ and the video content, and the choice of interleaver depth $N_1$ depends on the channel fading characteristics ($N_{coh}$ and the channel model) and the video content. We therefore focus on how this tradeoff is affected by the motion of the video content, the rate of variation of the channel, the delay constraint, and the channel bit rate.
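Because $r_s$ and the decoding term are fixed in this scenario, each interleaver depth $N_1$ leaves a well-defined maximum frame-skipping threshold $S$ under Eq. (7). The hypothetical helper below enumerates such candidate pairs, in the spirit of the candidate allocations tested in Section 4; as before, the puncturing period $P = 8$ in the usage note is an assumed value.

```python
import math
from typing import Iterable, List, Tuple


def candidate_allocations(
    C: float, R_B: float, r_c: float, n1_values: Iterable[int],
    n2: int, P: int, nu: int,
) -> List[Tuple[int, int, float]]:
    """For each interleaver depth N1, return (N1, S_max, interleaver share
    of C), where S_max is the largest integer frame-skipping threshold with
        S / r_s + 2 N1 N2 / R_B + 5 P nu / R_B < C,   r_s = r_c R_B.
    Depths that leave no room for the encoder buffer are dropped."""
    r_s = r_c * R_B                 # Eq. (1) with equality
    t_dec = 5 * P * nu / R_B        # channel decoding term, fixed here
    rows = []
    for n1 in n1_values:
        t_int = 2 * n1 * n2 / R_B   # interleaving plus de-interleaving
        slack = C - t_int - t_dec   # time left for the encoder buffer
        if slack <= 0:
            continue
        s_max = math.ceil(slack * r_s) - 1  # strict inequality in Eq. (7)
        rows.append((n1, s_max, t_int / C))
    return rows
```

With $C = 150$ ms, $R_B = 144$ kbps, $r_c = 1/3$, $N_2 = 16$, $\nu = 6$, and the assumed $P = 8$, depth $N_1 = 151$ leaves room for $S$ up to 5509 bits, consistent with the allocation $(N_1 = 151, S = 5500)$ reported in Section 4; deeper interleavers leave less room for $S$.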
In the following simulations, we used the rate $r_c = 1/3$ RCPC code with $\nu = 6$ and $d_{min} = 14$ [13,21] for channel coding, and $N_2$ was fixed at 16. We ran the simulations with different parameters, for example, video sequences with high, medium, or low motion; channels with fast, medium, or slow fading; delay constraints that are tight, medium, or loose; and different channel bit rates.
First, we assume a delay constraint $C = 150$ ms and a channel bit rate $R_B = 144$ kbps (thus $r_s = 48$ kbps). We simulated the system for the medium-motion QCIF sequence "Foreman" over a medium fading channel with normalized Doppler frequency $f_D T_s = 0.005$ ($N_{coh} = 200$ bits). The candidate delay allocations we tested, calculated from Eq. (7), are summarized in Table 1. Figure 6 shows the CDF curves of the PSNRs for these delay allocations, and the areas under the CDF curves are plotted as the solid line in Fig. 7, where the x-axis is the interleaver delay budget expressed as a fraction of the total delay budget. As the interleaver delay budget increases from $N_1 = 67$, the system performance initially improves because of the increased diversity gain. However, the diversity gain shows diminishing returns, and at some point the reduction in the source encoder delay budget starts to dominate, and the system performance degrades. Among the tested candidates, $(N_1 = 151, S = 5500)$ is the optimal delay allocation for this case, where $N_1$ is about $\frac{3}{4} N_{coh}$.
To see the effect of the motion of the video content, we also simulated the very high motion QCIF sequence "Mobile" and the very low motion QCIF sequence "Akiyo", with the other parameters unchanged ($C = 150$ ms, $R_B = 144$ kbps, and $N_{coh} = 200$). The system performances, measured by the areas under the CDF curves, are plotted and compared in Fig. 7, where the threshold $x_h$ was set to the maximum PSNR observed among all realizations in the test for that individual video sequence. For example, in Fig. 6, the largest PSNR achieved by any of the systems is 33.01 dB, so to generate the curve for Foreman QCIF in Fig. 7, we compute the areas under the CDF curves in Fig. 6 to the left of $x_h = 33.01$ dB.
Because different $x_h$ values were used for the three curves corresponding to the three video sequences, the performance comparison (i.e., the y-axis values) is meaningful only within a curve, not between curves. Given the above parameters, a higher-motion video sequence requires a larger source encoder buffer size $S$, at the cost of a smaller interleaver depth. For example, Fig. 7 shows that the optimal choices of $N_1$ are 170, 151, and 140 for Akiyo, Foreman, and Mobile, respectively. In compressing video, some frames may need more bits than others because of fine detail. In addition, in a high-motion video, some frames may need significantly more bits than others to represent the motion well, and the quality may degrade more severely when a skipped frame is concealed. As a result, a larger source encoder buffer is needed. To further illustrate this point, in Fig. 8 we assumed an unconstrained encoder buffer size and recorded the number of bits accumulated in the buffer for the three video sequences at a source rate of $r_s = 48$ kbps. Note that, although the buffer size is unlimited here, the number of accumulated bits is not infinite because the system is still subject to rate control. As expected, Fig. 8 shows that a higher-motion sequence usually needs a larger buffer than a lower-motion sequence.
We also simulated the system for different channel variation rates, with the same C = 150 ms and $R_B$ = 144 kbps. Figures 9 and 11 show the PSNR CDF curves for Foreman QCIF over Rayleigh fading channels with $f_D T_s = 0.0035$ and $f_D T_s = 0.01$, respectively. Figures 10 and 12 plot the corresponding areas under the CDF curves for all three video sequences. Again, the $x_h$ values were set to the maximal PSNRs observed for the corresponding video sequences. Given the same set of system parameters, a larger $N_1$ is preferable for a slowly fading channel, in order to break the channel memory. A smaller $N_1$ is preferable for a fast fading channel, to free up more of the delay budget for the source encoder buffer.
Next, we simulated the system for different delay budgets and different channel bit rates. Figure 13 shows the system performance for Foreman QCIF at $R_B$ = 144 kbps and $f_D T_s = 0.005$, with a tight delay constraint C = 100 ms, a medium constraint C = 150 ms, and a very loose constraint C = 250 ms. To compare the performance not only along each curve in Fig. 13 but also across curves, the same $x_h$ value was used for all the area calculations. This value was set to the maximal PSNR observed in all the simulations for Fig. 13. For the three constraints, the optimal choices of $N_1$ are 135, 151, and 180, respectively, and the corresponding optimal ratios of the interleaver delay budget to the total delay budget are 30.0%, 22.4%, and 16.0%. In other words, as the delay budget C increases, the optimal interleaver depth $N_1$ increases because more resources are available. The ratio of the interleaver delay to the total delay budget decreases because of the diminishing returns of the diversity gain. In addition, the system performance with the best ($N_1$, S) choice improves as C increases, i.e., the area (y-axis value) becomes smaller. Similar trends occur when the channel bit rate $R_B$ increases with the other system parameters held constant. Figure 14 plots the system performance for Foreman QCIF at C = 150 ms and $f_D T_s = 0.005$ with different channel bit rates. The optimal choices of $N_1$ are 135, 151, and 170 for $R_B$ = 96, 144, and 168 kbps, respectively, and the corresponding ratios of the interleaver delay budget to the total delay budget are 30.0%, 22.4%, and 21.6%. Again, the system performance with the best ($N_1$, S) choice improves as $R_B$ increases.
Fig. 8 The number of bits accumulated in a source encoder buffer with unlimited size versus the frame number, for different video sequences at the source coding rate $r_s$ = 48 kbps
Fig. 10 System performance, as measured by the areas under the CDF curves, versus the fraction of the interleaver delay budget, for different video sequences, Rayleigh fading channel with $f_D T_s = 0.0035$, delay budget C = 150 ms, and channel bit rate $R_B$ = 144 kbps. The curve for Foreman QCIF is derived from Fig. 9 with $x_h$ = 32.82
Fig. 12 System performance, as measured by the areas under the CDF curves, versus the fraction of the interleaver delay budget, for different video sequences, Rayleigh fading channel with $f_D T_s = 0.01$, delay budget C = 150 ms, and channel bit rate $R_B$ = 144 kbps. The curve for Foreman QCIF is derived from Fig. 11 with $x_h$ = 34.81
Fig. 13 System performance, as measured by the areas under the CDF curves, versus the fraction of the interleaver delay budget, for delay budgets C = 100 ms, C = 150 ms, and C = 250 ms, Foreman QCIF, Rayleigh fading channel with $f_D T_s = 0.005$, and channel bit rate $R_B$ = 144 kbps. All the areas are calculated with $x_h$ = 34.16, and the curve for C = 150 ms is derived from Fig. 6
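The reported percentages can be reproduced from the interleaver-delay ratio used in this article, $2N_1N_2/(R_B C)$. The interleaver delay is $2N/R_B$ with $N = N_1N_2$. The value $N_2 = 16$ below is an assumption inferred from the reported percentages, not a value stated in this section, and the helper name is hypothetical.

```python
def interleaver_fraction(n1, n2, r_b, c):
    """Fraction of the delay budget C spent on interleaving plus
    deinterleaving: (2 * N1 * N2 / R_B) / C."""
    return 2 * n1 * n2 / (r_b * c)


N2 = 16  # assumption: inferred from the reported ratios, not stated here

# (N1, R_B in bit/s, C in s) for the optima reported with Figs. 13 and 14.
cases = {
    "C=100 ms": (135, 144_000, 0.100),
    "C=150 ms": (151, 144_000, 0.150),
    "C=250 ms": (180, 144_000, 0.250),
    "R_B=96 kbps": (135, 96_000, 0.150),
    "R_B=168 kbps": (170, 168_000, 0.150),
}
for label, (n1, r_b, c) in cases.items():
    print(f"{label}: {100 * interleaver_fraction(n1, N2, r_b, c):.1f} %")
```

With $N_2 = 16$, the script prints 30.0, 22.4, and 16.0% for the three delay budgets, and 30.0 and 21.6% for $R_B$ = 96 and 168 kbps. These match the values quoted above.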
Examining the results in Figs. 6 to 12, as well as our other simulation results, we observe the following trends.

First, the normalized Doppler frequency is the key parameter in the delay partitioning, and a system operating over a fast fading channel prefers a smaller interleaver depth $N_1$. In all the above simulation results, about $0.7N_{coh}$ (more precisely, between $0.6N_{coh}$ and $0.9N_{coh}$) appears to be a safe choice for $N_1$. This is consistent with our conclusion in Subsection 4.1, which showed that the maximum gain from the interleaver is approximately achieved when $N_1 \geq 0.7N_{coh}$ in a Rayleigh fading channel.

Second, the video content also affects the delay partitioning: a sequence with higher motion content usually prefers a larger source encoder buffer, and thus a smaller $N_1$.

Third, faster fading, a larger total delay budget C, or a larger channel bit rate $R_B$ each improves the system performance on average, with the other parameters held the same. For example, Figs. 6, 9, and 11 show that, for a given set of system parameters, the highest PSNR achieved rises from about 32.8 dB (Fig. 9) to about 34.8 dB (Fig. 11) as the channel varies more rapidly. The improvement for a larger C or a larger $R_B$ comes from the additional resources available to the system. The improvement for fast fading comes from additional channel diversity. However, the conclusion for fast fading holds only under accurate channel estimation.

Lastly, the gaps between the performance of the optimal delay allocation and those of sub-optimal allocations shrink as the channel varies faster. For example, in Fig. 10 (a slowly fading channel), the performance of the optimal allocation and those of the other allocations differ by up to a factor of 10. In Fig. 12 (a fast fading channel), the differences are limited to a factor of 1.2. The delay allocation is therefore more important when the channel varies slowly. When the channel varies fast enough, the choice of allocation has a smaller effect on performance.
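The rule of thumb can be written as a small helper. This sketch uses a hypothetical function name. It encodes only the empirical range observed in these simulations (Rayleigh fading, QCIF sequences), not a general design rule. It uses integer arithmetic so that the integer parts of $0.6N_{coh}$, $0.7N_{coh}$, and $0.9N_{coh}$ are computed exactly.

```python
def suggested_interleaver_depth(n_coh):
    """Return (safe choice, (low, high)) for the interleaver depth N1.

    safe choice: integer part of 0.7 * N_coh
    range:       integer parts of 0.6 * N_coh and 0.9 * N_coh
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    return n_coh * 7 // 10, (n_coh * 6 // 10, n_coh * 9 // 10)


safe, (low, high) = suggested_interleaver_depth(200)  # f_D T_s = 0.005
# Optimal N1 values reported for N_coh = 200 (Figs. 7, 13, and 14).
reported = [170, 151, 140, 135, 180, 135, 170]
print(safe, low, high, all(low <= n1 <= high for n1 in reported))
```

For $N_{coh}$ = 200, this prints `140 120 180 True`: every reported optimum falls inside the range.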

Bandwidth allocation and delay allocation
In this subsection, we vary the channel coding rate $r_c$ to analyze the bandwidth partition between source coding and channel coding. We analyze it together with the delay partition between the source encoder buffer and the interleaver, for a fixed delay budget C and a given channel bit rate $R_B$.
Again, the rates $r_s$ and $r_c$ must satisfy bandwidth constraint (1). Delay constraint (7) also shows that, for a fixed $R_B$, neither the interleaver delay, $2N/R_B$, nor the channel decoding delay, $5P\nu/R_B$, changes when $r_s$ changes. Because the source encoder buffer delay is $S/r_s$, increasing S in proportion to $r_s$ maintains the same delay allocation. However, maintaining the same delay allocation is not necessarily desirable: when $r_s$ and $r_c$ change, the optimal delay allocation may also change.
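The following is a minimal sketch of the delay bookkeeping. It assumes Eq. (7) is the additive budget $S/r_s + 2N_1N_2/R_B + 5P\nu/R_B \le C$, built from the three terms named in this subsection with $N = N_1N_2$. The function names are hypothetical. The values $P = 8$, $\nu = 6$, and $N_2 = 16$ in the example are illustrative assumptions, not values given in this section. The check shows that scaling S and $r_s$ by the same factor leaves every delay term unchanged.

```python
def delay_terms(s_bits, r_s, n1, n2, r_b, p, nu):
    """Delay components (seconds) under the assumed additive form of Eq. (7).

    source buffer:    S / r_s
    interleaving:     2 * N1 * N2 / R_B   (interleaver plus deinterleaver)
    channel decoding: 5 * P * nu / R_B
    """
    return s_bits / r_s, 2 * n1 * n2 / r_b, 5 * p * nu / r_b


def fits_budget(c, s_bits, r_s, n1, n2, r_b, p, nu):
    """True if the summed delay components do not exceed the budget C."""
    return sum(delay_terms(s_bits, r_s, n1, n2, r_b, p, nu)) <= c


R_B, N2, P, NU = 144_000, 16, 8, 6  # N2, P, NU are illustrative assumptions
base = delay_terms(5500, 48_000, 151, N2, R_B, P, NU)
scaled = delay_terms(2 * 5500, 2 * 48_000, 151, N2, R_B, P, NU)  # S and r_s doubled
print([f"{1000 * d:.2f} ms" for d in base], base == scaled)
print(fits_budget(0.150, 5500, 48_000, 151, N2, R_B, P, NU))
```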
Assume there are $N_c$ candidate channel codes with rates $\{r_c\}$. The optimal bandwidth and delay partitions, i.e., the best ($r_c$, $r_s$, $N_1$, S) 4-tuple, can be determined by a two-step optimization method:
Step I: For each candidate channel code with rate $r_c$, calculate the corresponding $r_s$ from Eq. (1). For each ($r_c$, $r_s$) pair, find the delay partition pair ($N_1$, S) among the candidates that minimizes the area under the CDF curve for this bandwidth allocation, as illustrated in Section 4.2. This yields $N_c$ 4-tuples, each with a corresponding PSNR CDF curve.
Step II: Among the $N_c$ 4-tuples, find the one with the smallest area under its CDF curve, using a common threshold value $x_h$. This ($r_c$, $r_s$, $N_1$, S) 4-tuple gives the best bandwidth and delay allocations.
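The two-step procedure can be sketched as a nested search. This is a minimal sketch, not the authors' implementation. `simulate` stands in for a hypothetical simulator that returns PSNR samples for one ($r_c$, $r_s$, $N_1$, S) configuration. `candidate_pairs` stands in for a hypothetical generator of ($N_1$, S) pairs that satisfy Eq. (7). Eq. (1) is assumed to hold with equality as $r_s = r_c R_B$, which matches the source rates quoted in this subsection. The area is computed as the mean PSNR shortfall below $x_h$, assuming the CDF is integrated from at or below the smallest sample. The toy simulator at the end exists only to exercise the search.

```python
def cdf_area(psnrs, x_h):
    """Area under the empirical PSNR CDF to the left of x_h (smaller is better)."""
    return sum(max(x_h - x, 0.0) for x in psnrs) / len(psnrs)


def two_step_search(code_rates, r_b, candidate_pairs, simulate):
    """Return the best (r_c, r_s, N1, S) 4-tuple.

    candidate_pairs(r_s) -> iterable of (N1, S) pairs satisfying Eq. (7)
    simulate(r_c, r_s, n1, s) -> list of PSNR samples (hypothetical simulator)
    """
    finalists = []
    # Step I: for each channel code, keep the delay partition with the smallest area.
    for r_c in code_rates:
        r_s = r_c * r_b  # assumed form of Eq. (1), taken with equality
        runs = {(n1, s): simulate(r_c, r_s, n1, s) for n1, s in candidate_pairs(r_s)}
        x_local = max(max(p) for p in runs.values())  # threshold for this curve
        n1, s = min(runs, key=lambda pair: cdf_area(runs[pair], x_local))
        finalists.append(((r_c, r_s, n1, s), runs[(n1, s)]))
    # Step II: compare the N_c finalists with one common threshold x_h.
    x_common = max(max(p) for _, p in finalists)
    best, _ = min(finalists, key=lambda f: cdf_area(f[1], x_common))
    return best


def toy_simulate(r_c, r_s, n1, s):
    # Purely illustrative: a lower code rate and N1 near 150 give higher PSNR.
    penalty = 3 * r_c + abs(n1 - 150) / 50
    return [32.0 - penalty, 33.0 - penalty]


def toy_pairs(r_s):
    # Hypothetical candidates; a real list would come from Eq. (7).
    return [(n1, int(0.1 * r_s)) for n1 in (130, 150, 170)]


print(two_step_search([1 / 3, 2 / 5], 144_000, toy_pairs, toy_simulate))
```

Step I uses a per-curve threshold because areas are only compared within one curve there. Step II must use a single threshold so that areas from different codes are comparable.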
To illustrate this procedure, we simulated the system for different channel codes in the same RCPC family, with rates 1/3, 4/11, 2/5, and 4/9 [13], for Foreman QCIF at $f_D T_s = 0.005$, $R_B$ = 144 kbps, and C = 150 ms. From Eq. (1), the corresponding source coding rates are 48, 52.4, 57.6, and 64 kbps, respectively. For each ($r_c$, $r_s$) pair, different ($N_1$, S) pairs satisfying Eq. (7) were simulated, and the pair that minimized the area under the CDF curve was selected. For example, the pair ($N_1$ = 151, S = 5500) was selected for the bandwidth allocation ($r_c$ = 1/3, $r_s$ = 48 kbps), using the CDF areas from Fig. 7. Figure 15 shows, for the four ($r_c$, $r_s$) allocations, the CDF curve for the corresponding best ($N_1$, S) pair. These pairs are ($N_1$ = 151, S = 5500), ($N_1$ = 170, S = 5780), ($N_1$ = 190, S = 6110), and ($N_1$ = 217, S = 6400) for $r_c$ = 1/3, 4/11, 2/5, and 4/9, respectively. The best bandwidth and delay partition 4-tuple was then selected among these four candidates. Figure 16 plots the areas under all the CDF curves, all calculated with the same threshold $x_h$ = 35.78. The 4-tuple ($r_c$ = 1/3, $r_s$ = 48 kbps, $N_1$ = 151, S = 5500) yields the best overall performance.
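The quoted source rates and the delay used by the four selected tuples can be checked with a few lines of arithmetic. This sketch assumes Eq. (1) holds with equality as $r_s = r_c R_B$, which reproduces 48, 52.4, 57.6, and 64 kbps. It takes $N_2 = 16$, a value inferred from the reported interleaver-delay ratios rather than stated here. The channel decoding term is left out because P and $\nu$ are not given in this section.

```python
from fractions import Fraction

R_B = 144_000   # channel bit rate, bit/s
N2 = 16         # assumption: inferred, not stated in this section

# (r_c, N1, S) for the best delay partition of each RCPC code (Fig. 15).
selected = [
    (Fraction(1, 3), 151, 5500),
    (Fraction(4, 11), 170, 5780),
    (Fraction(2, 5), 190, 6110),
    (Fraction(4, 9), 217, 6400),
]

for r_c, n1, s in selected:
    r_s = r_c * R_B                       # source rate from Eq. (1), exact
    source_delay = s / r_s                # S / r_s in seconds (a Fraction)
    interleaver_delay = Fraction(2 * n1 * N2, R_B)
    used = source_delay + interleaver_delay
    print(f"r_c={r_c}: r_s={float(r_s) / 1000:.1f} kbps, "
          f"buffer+interleaver delay={float(used) * 1000:.1f} ms")
```

Each selected tuple uses about 148 ms of the 150 ms budget for the source buffer and the interleaver. Under the assumed form of Eq. (7), the remaining 2 ms or so would cover the channel decoding delay.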
Figure 16 shows that, with all other parameters the same, increasing $r_c$ (and thus $r_s$, in accordance with Eq. (1)) increases the optimal ratio of the interleaver delay to the total delay budget. It also increases both the optimal interleaver depth $N_1$ and the optimal source buffer size S. There are two reasons. First, channel coding and interleaving are both used to combat channel fading and protect the information sequence. When a channel code with a higher $r_c$ is used, the system therefore favors a larger $N_1$ to compensate for the weaker channel code. Second, as $r_s$ increases, the source encoder needs a larger buffer. As shown in Eq. (7), for a fixed $R_B$, the channel decoding delay $5P\nu/R_B$ is the same for all the codes in an RCPC family. This is because they are all derived from the same mother code with the same period P and constraint length $\nu$. When $r_c$ increases, the source encoding delay $S/r_s$ becomes smaller for the same S, because $r_s$ increases with $r_c$ according to Eq. (1). This freed delay is shared by the source encoder and the interleaver, both of which benefit from a larger delay budget. The best selection turns out to have both a larger S and a larger $N_1$. Further, the optimal ratio of the interleaver delay to the total delay budget, $2N_1N_2/(R_B C)$, also increases, because $N_1$ increases while C, $R_B$, and $N_2$ are kept constant.
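The freed delay described above can be made concrete. The sketch below again assumes $r_s = r_c R_B$ and the inferred $N_2 = 16$. It first holds S at 5500 bits to show how much source-buffer delay a higher $r_c$ releases relative to $r_c$ = 1/3. It then shows how the reported optimal tuples split that released delay between the buffer and the interleaver. The variable names are illustrative only.

```python
from fractions import Fraction

R_B, N2, S_FIXED = 144_000, 16, 5500   # N2 is an inferred assumption

selected = [  # (r_c, N1, S) reported for Fig. 15
    (Fraction(1, 3), 151, 5500),
    (Fraction(4, 11), 170, 5780),
    (Fraction(2, 5), 190, 6110),
    (Fraction(4, 9), 217, 6400),
]

base_r_c, base_n1, base_s = selected[0]
base_src = base_s / (base_r_c * R_B)          # source delay at r_c = 1/3
base_int = Fraction(2 * base_n1 * N2, R_B)    # interleaver delay at r_c = 1/3

for r_c, n1, s in selected[1:]:
    r_s = r_c * R_B
    freed = base_src - S_FIXED / r_s                 # delay released at fixed S
    to_interleaver = Fraction(2 * n1 * N2, R_B) - base_int
    to_buffer = s / r_s - S_FIXED / r_s              # delay spent on a larger S
    print(f"r_c={r_c}: freed {float(freed) * 1e3:.1f} ms -> "
          f"interleaver +{float(to_interleaver) * 1e3:.1f} ms, "
          f"buffer +{float(to_buffer) * 1e3:.1f} ms")
```

In each case, the extra interleaver delay and the extra buffer delay together roughly equal the freed delay. This is consistent with both S and $N_1$ growing as $r_c$ increases.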
Lastly, Fig. 16 shows that the system performance with the best delay partition degrades as $r_c$ increases. For example, the relative performance gaps between the optimal delay allocation for $r_c$ = 1/3 and those for $r_c$ = 4/11, 2/5, and 4/9 are about 0.34, 2.50, and 8.81, respectively. In the scenario studied here, the system therefore prefers the strongest channel code, probably because the $E_s/N_0$ of 3 dB is relatively low. Under better channel conditions, a higher-rate RCPC code would likely be preferred.
Fig. 16 System performance, as measured by the areas under the CDF curves, versus the fraction of the interleaver delay budget, for different channel coding rates $\{r_c\}$, Foreman QCIF, Rayleigh fading channel with $f_D T_s = 0.005$, delay budget C = 150 ms, and channel bit rate $R_B$ = 144 kbps. All the areas are calculated with $x_h$ = 35.78, and the optimal performance points, corresponding to the minimal areas on the respective curves, are derived from Fig. 15

Conclusions
We analyzed the performance of a wireless video communication system operating over a fading channel, under both an end-to-end delay constraint and a bandwidth constraint. We showed that the main delay components in the system include the queuing delay in the source encoder output buffer, the delay caused by interleaving and deinterleaving, and the delay caused by channel decoding. The relationship among these three components, restricted by the delay constraint, was derived mathematically. We then focused on the delay partitioning between the source encoding and the interleaving.
We compared simulation results for the tradeoff between the delay of the source encoder buffer and that of the interleaver. In particular, we studied how this tradeoff is affected by parameters such as the Doppler frequency of the fading channel, the motion of the video content, the delay constraint, the channel bit rate, and the channel code rate.
It was shown that the normalized Doppler frequency of the fading channel (equivalently, $N_{coh}$) is the key parameter in the delay partitioning. With other parameters held constant, a system operating over a fast fading channel prefers a smaller interleaver depth $N_1$, and thus a smaller ratio of the interleaver delay to the total delay budget. Our results covered various QCIF sequences over a Rayleigh fading channel with different bandwidth and delay constraints. In these results, the optimal interleaver depth $N_1$ ranged from $\lfloor 0.6N_{coh} \rfloor$ to $\lfloor 0.9N_{coh} \rfloor$, and in general $\lfloor 0.7N_{coh} \rfloor$ is a safe choice for $N_1$. We also showed that the system performance is more sensitive to the delay partitioning over a slow fading channel.
Other system parameters also affect the delay partitioning between source encoding and interleaving. In general, a sequence with higher motion content has a larger variation in the number of bits used to describe each frame. For such a sequence, a larger source encoder buffer size S and a smaller interleaver depth $N_1$ are preferable, and thus a smaller ratio of the interleaver delay to the total delay budget. For a system with a larger total delay budget C or a larger channel bit rate $R_B$, the additional resources make both a larger S and a larger $N_1$ preferable. Our results indicate that the corresponding ratio of the interleaver delay to the total delay budget becomes smaller. Lastly, consider a system with a higher channel code rate (i.e., a weaker channel code). Because of the higher source rate and the reduced error-correction capability, both a larger S and a larger $N_1$ are again preferable. In this case, however, our results indicate that the corresponding ratio of the interleaver delay to the total delay budget becomes larger.
We also showed that a larger total delay budget C, a larger channel bit rate $R_B$, or faster fading (i.e., a smaller $N_{coh}$) each improves the system performance on average, with the other parameters held the same. The conclusion for fast fading holds only under accurate channel estimation. In addition, we proposed a two-step procedure to determine the optimal bandwidth and delay partitions from a finite set of RCPC codes. The best allocation depends on both the channel conditions and the video content.
Finally, we mention several directions in which this work can be extended. We used a video encoder with single-frame prediction. One could use more sophisticated source encoding strategies, such as hierarchical bi-directional prediction (B-pictures) and long-term frame prediction with pulsed quality, which are more efficient but introduce additional source coding delay. Also, the channel codes we studied are from a family of RCPC codes. One could instead use codes based on iterative decoding, such as turbo codes and low-density parity-check (LDPC) codes, which are more powerful but can result in a larger delay. Additionally, our analysis assumed perfect channel estimation. One can relax this assumption and study the effect on the delay allocation when noisy channel estimates are used. Lastly, we studied the tradeoffs of the delay partitioning problem based on simulation results. Analytical models appropriate for specific scenarios could be adopted to study the influence of the different delay components. The optimization problem could then be solved by suitable algorithms under restricted conditions.