Modulation level allocation for MGS streaming over a multihop wireless channel

This article introduces a method for efficiently transmitting medium grain scalable video packets over a transmission path consisting of multiple wireless links. Medium grain scalability provides bit rate adaptation according to the available bit rate by dropping a number of video packets in the compressed bit stream. In other words, rate-distortion control can be achieved by means of packet transmission control. The available bit rate and the spectral efficiency are determined by the bandwidth and the modulation level, respectively. Accordingly, the number of packets available for transmission is affected by the modulation level of the packets. However, if we consider modulation levels with higher spectral efficiency in order to increase the number of packets and reduce the expected video distortion, the packet error rate of the transmitted packets can also be increased because the spectrally efficient modulation levels are sensitive to channel noise. This is another reason for the increment in expected video distortion, because the erroneous received packets cannot be used for video reconstruction. Therefore, this article considers the minimization of expected video distortion by the optimization of two factors--packet extraction for transmission and modulation level allocation for the extracted packets. Packet extraction is optimized for the path between the source and destination nodes, whereas the modulation level for each extracted packet is optimized for each link along the transmission path.


Introduction
Scalable video coding (SVC), as standardized by the Joint Video Team of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) [1], is a video compression method that can bandwidth-efficiently support multiple spatial-temporal resolutions for a single video. It also supports a multi-bit rate feature that can be adapted to network or channel variations. These standard SVC properties can be utilized in diverse applications, such as multi-user video streaming services, distributed video streaming over multihop networks, or scalable on-demand services, as discussed in [2]. This article considers medium grain scalability (MGS), which is one of the standard bit rate scalable coding methods [3]. MGS provides network abstraction layer (NAL) packets (MGS packets in this article) that can be dropped without causing a decoding violation. To efficiently utilize the multi-bit rate feature, unequal protection (UEP) strategies can be considered for MGS packets, as each packet has a different priority in terms of its rate-distortion (RD) attribute. For example, a priority index was developed in [3,4] to indicate the priority that can be used for UEP. UEP has been considered in video transmission systems, as in [5-15]. In [5-7], an application layer resource type, such as parity data, was considered. In contrast, the studies [8-15] considered physical layer (PHY) optimization for SVC or MGS video streams, as in the proposed method. In [8], multiple code division multiple access (CDMA) channels were proposed, with a different processing gain for the SVC quality layers. The suggested optimization problem can be simplified by transmitting each SVC layer over a separate CDMA channel, so that as many CDMA channels as SVC layers are required to fully utilize the method.
In [9,10], frequency diversity was utilized by orthogonal frequency division multiple access systems. Modulation and channel coding were designed to guarantee the same target bit error rate (BER) in [9,10], whereas [11,12] considered a flexible packet error rate (PER) according to the RD attribute of the video packets. The algorithms were designed to find the transmission modes of multi-rate transmitters that minimize the expected video distortion. In other words, studies [11,12] jointly addressed UEP by only transmitting video packets with a higher priority, and by allocating more transmission time to higher priority packets. This article considers the same approach for a wireless transmission path consisting of multiple links. For multihop wireless channels, the cross-layer optimization (CLO) designs of [13-15] were introduced for a video streaming service. These designs, including optimal path selection, assume a sufficient number of intermediate nodes, so that if the quality of a link in one path degrades, an alternative path can be substituted. A predefined transmission time is reserved for each node, and the remaining time for each node is an important factor for these CLO designs. However, where the number of intermediate nodes is too small and only one feasible transmission path is available, this path selection diversity cannot be achieved. Therefore, the flexible allocation of transmission time for links on the selected path must be considered. In this article, such time allocation is achieved by allocating adaptive modulation levels to the links. This article also assumes that the available transmission power for each link is limited, in order to prevent interference to surrounding communication systems. Therefore, we focus on the modulation levels of the links according to their channel state information (CSI) in terms of the received signal-to-noise ratio (SNR).
The rest of this article is organized as follows. Section 2 introduces the proposed system, outlines the problem statements, and formulates an optimization problem. Section 3 provides three levels of algorithm for solving this problem. The performance of the proposed method is demonstrated in Section 4, and we present our conclusions in Section 5.

Configuration of the proposed system
The proposed method efficiently transmits MGS packets over a wireless transmission path connected by multiple links. It is assumed that the density of the nodes is sufficiently low that there exists only one feasible path between the source and destination nodes. A predefined transmission time for the path is allocated prior to the proposed optimization, and the total time for the path can flexibly be distributed between the links on the path. Therefore,

$$\sum_{i=0}^{P-1} \sum_{h=0}^{\mathcal{H}-1} \tau_{i,h} \le T$$

where $\tau_{i,h}$ is the time required to transmit the ith packet over the hth link, T is the predefined time that is determined according to the required frame rate of the video for real-time streaming, and P and ℋ are the number of video packets and links, respectively. We assume that the transmission power is fixed over the nodes in a path to prevent interference to surrounding communication systems. The proposed method is designed to optimize packet transmission and modulation level allocation according to the quality of each link. We assume that the CSI of each link is fed back to the source node via a backward control channel. The source node extracts those packets available for transmission, and finds the modulation level of every link for each packet scheduled to be transmitted. These modulation levels are signaled to the corresponding intermediate nodes before the packet is transmitted. Each intermediate node demodulates the received packet, remodulates it according to the modulation level information, and forwards it to the next node.

Problem statements

2.2.1. Expected distortion analysis
Video distortion of decoded frames in a group of pictures (GOP) [1] is affected by the combination of packets available for the video reconstruction. Therefore, it is necessary to predict the combination at the destination node in order to control and reduce the distortion. The combination at the destination node for each packet in a GOP results from two factors: the packet drop rate (PDR), which is decided by the transmitter, and the PER, which is influenced by channel noise. We define the packet loss rate (PLR), which is the probability that the packet is not available at the destination node, as $j = 1 - (1 - X)(1 - Y)$, where X and Y are the PER and PDR, respectively. For P packets in a GOP, the number of combinations is $2^P$, as two cases (that of being used and unused for decoding) can be considered for each packet. Therefore, the expected distortion of the kth frame is

$$\bar{d}_k = \sum_{c=0}^{2^P - 1} \Phi_c \, d_{k,c} \qquad (1)$$

where $\Phi_c$ and $d_{k,c}$ are the probability that the cth combination occurs at the destination node and the distortion of the kth frame in the cth combination, respectively. Equation (1) implies that $2^P$ decoding simulations are required to calculate $\bar{d}_k$, because $d_{k,c}$ must be measured for $0 \le c < 2^P$ by the decoding simulations. $\Phi_c$ can be expressed in terms of the PLR of the P packets ($j_i$ for $0 \le i < P$) as

$$\Phi_c = \prod_{i=0}^{P-1} (1 - j_i)^{\alpha_{i,c}} \, j_i^{\,1 - \alpha_{i,c}}$$

where $\alpha_{i,c}$ denotes whether $P_i$ is to be used ($\alpha_{i,c} = 1$) or not ($\alpha_{i,c} = 0$).

Video distortion is largely affected by the reference structure, which can be established in various ways. A multiple reference structure [1] is considered to improve coding efficiency and error resilience. However, according to [16], such improvements are largely dependent on the temporal characteristics of the videos, and, in most cases, the improvements are not especially large compared to the complexity of this type of structure. In addition, both the video encoder and decoder require a large amount of memory to store the multiple reference frames.
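The brute-force form of Equation (1) can be sketched in a few lines of Python. This is only an illustration: the function names and the `distortion_of` callback (which stands in for one decoding simulation per combination) are assumptions, not part of the original method.

```python
from itertools import product


def packet_loss_rate(per, pdr):
    """PLR j = 1 - (1 - X)(1 - Y) for PER X and PDR Y."""
    return 1.0 - (1.0 - per) * (1.0 - pdr)


def combination_probability(alpha, plr):
    """Probability of one combination; alpha[i] is 1 if packet i is used."""
    prob = 1.0
    for used, j in zip(alpha, plr):
        prob *= (1.0 - j) if used else j
    return prob


def expected_distortion(plr, distortion_of):
    """Sum of Phi_c * d_c over all 2^P combinations.

    distortion_of(alpha) is a hypothetical callback returning the measured
    distortion of one frame for the combination alpha.
    """
    return sum(
        combination_probability(alpha, plr) * distortion_of(alpha)
        for alpha in product((0, 1), repeat=len(plr))
    )
```

Because the loop visits every one of the $2^P$ combinations, the cost grows exponentially with the number of packets, which is the motivation for the reductions discussed in the rest of the article.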
Therefore, this article considers hierarchical B, with its single-reference coding structure. In this case, the number of decoding simulations can be reduced from $2^P$, as discussed at the end of Section 2.2.1. Furthermore, by considering only combinations with a high probability of occurrence, the number of simulations can be reduced to a practical level.
A. Simple examples of expected distortion

Let us assume that there are three frames, each of which is coded to one packet, as depicted in Figure 1. In this figure, the packet reference [1] is expressed by the arrows. $F_k$ is the kth frame in temporal order, and $P_k$ is its coded packet. In this article, we define the number of effective packets (NEP) for each frame. The resulting distortions of the three frames are decided by the combination of NEP (CNEP) of the frames. For this coding structure, four CNEPs can be considered, as shown in Table 1. If a referenced packet is unavailable, any packet referencing it is also unavailable. For example, if $P_0$ is unavailable, $P_1$ becomes unavailable, so that $P_2$ also becomes unavailable. Therefore, if the NEP for $F_0$ is 0, all of the packets of $F_1$ and $F_2$ become unavailable for decoding. Hence, the NEPs for $F_1$ and $F_2$ also become 0, as shown by the CNEP of 0. CNEP = 1 denotes the case where $P_0$ is available and $P_1$ is unavailable. Therefore, the NEP for $F_2$ is also 0, because $P_2$ becomes ineffective for distortion.

The probability of occurrence of each CNEP is shown in the table in terms of the PLR of the packets. As CNEP = 0 whenever $P_0$ is unavailable, the probability of CNEP = 0 is the same as the PLR of $P_0$, $j_0$. As CNEP = 1 in the case where $P_0$ is available and $P_1$ is unavailable, the probability of CNEP = 1 is $(1 - j_0) j_1$. In this way, the probability of the four CNEPs can be calculated, so that the expected distortion of the three frames can be obtained from (1). As the distortion of $F_0$ is not affected by the quality of the other frames ($d_{0,1} = d_{0,2} = d_{0,3}$), the expected distortion of $F_0$ is

$$\bar{d}_0 = \sum_{c=0}^{3} \Phi_c \, d_{0,c} = j_0 \, d_{0,0} + (1 - j_0) \, d_{0,1}$$

In the same way, with $d_{1,2} = d_{1,3}$, the expected distortion of $F_1$ can be written as

$$\bar{d}_1 = \sum_{c=0}^{3} \Phi_c \, d_{1,c} = j_0 \, d_{1,0} + (1 - j_0) j_1 \, d_{1,1} + (1 - j_0)(1 - j_1) \, d_{1,2}$$

If we assume SVC, we can consider multiple layers [1]. Therefore, as shown in Figure 2, each frame can be coded to multiple packets, where each packet of a frame represents a spatial or quality layer.
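The CNEP probabilities of the chain structure in Figure 1 can be computed with a short routine. The sketch below assumes a pure chain in which each packet references only its predecessor; the function name is illustrative.

```python
def chain_cnep_probabilities(plr):
    """Probabilities of CNEP 0..len(plr) for a chain P_0 <- P_1 <- ...

    CNEP = c means that the first c packets are available and packet c is
    lost (or that all packets are available when c == len(plr)).
    """
    probs = []
    survive = 1.0  # probability that every packet so far is available
    for j in plr:
        probs.append(survive * j)
        survive *= 1.0 - j
    probs.append(survive)
    return probs


# Three frames with hypothetical PLRs j_0, j_1, j_2
probs = chain_cnep_probabilities([0.1, 0.2, 0.3])
```

For three packets this yields $j_0$, $(1 - j_0) j_1$, $(1 - j_0)(1 - j_1) j_2$, and $(1 - j_0)(1 - j_1)(1 - j_2)$, which sum to 1.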
In this article, we focus on quality scalability and consider MGS coding. Therefore, in the rest of this article, the term "packet" means base quality layer packet or MGS packet. In Figure 2, $P_{k,l}$ is the lth quality layer packet of $F_k$, where the 0th quality layer means the base layer. The associated CNEPs are given in Table 2. In the same way as for the previous coding structure, the expected distortion of $F_0$ can be obtained as

$$\bar{d}_0 = \sum_{c=0}^{6} \Phi_c \, d_{0,c} = j_{0,0} \, d_{0,0} + (1 - j_{0,0}) j_{0,1} \, d_{0,1} + (1 - j_{0,0})(1 - j_{0,1}) \, d_{0,4} \qquad (2)$$

where $d_{0,1} = d_{0,2} = d_{0,3}$ and $d_{0,4} = d_{0,5} = d_{0,6}$. In this way, the expected distortion of each frame can be expressed in terms of PLRs. The CNEPs in Table 2 can be categorized into three groups according to the number of frames requiring error concealment, where a number of error concealment techniques [17] can be considered for any frame whose NEP is 0. For example, CNEP = 0 requires concealment for both of the frames, whereas CNEP = 1 or 4 requires concealment for $F_1$. Neither frame requires any error concealment for the remaining four CNEPs. Therefore, C (the number of CNEPs) for Figure 2 is $2^0 + 2^1 + 2^2 = 7$. C can be generalized as

$$C = \sum_{t=0}^{T} Q^t \qquad (3)$$

where T and Q are the number of frames and quality layers, respectively. This analysis can be extended to more complicated coding structures, such as the hierarchical B structure [1].

Figure 1 Reference structure of three frames.
Table 1 Possible CNEPs of the reference structure in Figure 1 (columns: CNEP, NEP, Probability).

B. Expected distortion of hierarchical B structure

For further coding efficiency and temporal scalability, SVC is designed to provide the hierarchical B structure shown in Figure 3, where four temporal layers are presented. To simplify the distortion analysis, this article defines reference groups (RGs), so that the hierarchical B structure in Figure 3 can be analyzed as depicted in Figure 4. In the figure, frames in an RG are independent of each other.
Therefore, one RG can be considered as one frame in order to simplify the analysis of the reference structure and calculate the video distortion (a detailed discussion is given with Table 3). Although some of the references in Figure 3 are omitted in Figure 4, this representation can sufficiently describe the reference relations. For example, we can see from Figure 4 that $F_4$ cannot be decoded if $F_0$ is not decoded, although the reference arrow from $F_0$ to $F_4$ is omitted in the figure. The CNEP of a frame unit was introduced previously. In this section, the CNEP of an RG unit is introduced in order to analyze the expected distortion of the hierarchical B coding structure. If the number of quality layers (including the base layer) is 3, the number of CNEPs is 364 according to (3), as T (the number of RGs in this case) is 5. Therefore, 364 decoding simulations, as listed in Table 3, must be accomplished to obtain $\bar{d}_k$ for $0 \le k \le 8$, where the NEPs of frames in an RG are the same as the NEP of the RG. For example, CNEP = 362 means that 2 packets are available for every frame in RG 4 ($F_1$, $F_3$, $F_5$, and $F_7$) and 3 packets are available for the remaining frames. As each of the frames in an RG is independent of other frames in the same RG, these frames can be analyzed independently. For example, in order to calculate $\bar{d}_1$ (the expected distortion of $F_1$), the probability of each CNEP must be calculated. For this purpose, the NEPs of RG 0, RG 1, RG 2, RG 3, and RG 4 can be considered as the NEPs of $F_0$, $F_8$, $F_4$, $F_2$, and $F_1$, respectively. The probability of each CNEP is then calculated based only on the PLRs of packets in $F_0$, $F_8$, $F_4$, $F_2$, and $F_1$, because $\bar{d}_1$ is not affected by the PLRs of packets in the other frames. The distortion of the kth frame of the cth CNEP, $d_{k,c}$, can be calculated from the simulation of the cth CNEP.
For example, to calculate $d_{1,362}$, one packet (the highest quality layer packet) for each frame in RG 4 is eliminated, and all of the remaining packets are decoded. The resulting distortion of $F_1$ can be obtained for $d_{1,362}$. Note that the NEPs of frames $F_3$, $F_5$, and $F_7$ in the same RG do not affect $d_{1,362}$. However, performing 364 decoding simulations to optimize 9 frames is impractical. Therefore, a more efficient version of this expected distortion analysis is required, as discussed in Section 3.5.

Increment of expected distortion
The increment in the expected distortion due to every packet loss must be obtained in order to minimize the expected distortion, as discussed in Section 3. As we have mentioned, the expected distortion can be expressed in terms of the PLR. Consequently, the increment can also be expressed in terms of the PLR. In Figure 2, for example, if the PLR of $P_{0,0}$ increases, the expected distortion of $F_0$ ($\bar{d}_0$ in (2)) increases according to

$$\frac{\partial \bar{d}_0}{\partial j_{0,0}} = d_{0,0} - j_{0,1} \, d_{0,1} - (1 - j_{0,1}) \, d_{0,4}$$

On the other hand, the PLR of $P_{1,0}$, $j_{1,0}$, is irrelevant to the expected distortion $\bar{d}_0$ given by (2), as no packet in $F_0$ references $P_{1,0}$. Therefore,

$$\frac{\partial \bar{d}_0}{\partial j_{1,0}} = 0$$

as (2) does not contain $j_{1,0}$. In this way, the increment of the expected distortion can also be calculated for the hierarchical B structure, where the structure can be simplified by using the reference group concept discussed in Section 2.2.1.

Figure 2 Reference structure of two frames with two quality layers.
Table 2 Possible CNEPs of the reference structure in Figure 2 (columns: CNEP, NEP, Probability).
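As a small numerical illustration, the expected distortion of $F_0$ for the two-frame, two-layer structure of Figure 2 and its increment with respect to $j_{0,0}$ can be sketched as follows. The closed form used here assumes that $F_0$ has three distinguishable outcomes (no packet, base layer only, both layers); the function and argument names are hypothetical.

```python
def expected_distortion_f0(j00, j01, d_none, d_base, d_full):
    """Expected distortion of F_0 from the PLRs of P_{0,0} and P_{0,1}.

    d_none, d_base, d_full are the distortions of F_0 when no packet, only
    the base layer, or both layers are available (illustrative values).
    """
    return (j00 * d_none
            + (1.0 - j00) * j01 * d_base
            + (1.0 - j00) * (1.0 - j01) * d_full)


def increment_wrt_j00(j01, d_none, d_base, d_full):
    """Partial derivative of the expression above with respect to j00."""
    return d_none - j01 * d_base - (1.0 - j01) * d_full
```

Because the expected distortion is linear in each PLR, the increment does not depend on the current value of $j_{0,0}$ itself, only on the PLRs of the other packets of the frame.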

Expected delay
For $P_i$, the number of bits that can be transmitted per second over the hth link is $r \log_2(\mu_{i,h})$, where r and $\mu_{i,h}$ are the bandwidth and the modulation level allocated to $P_i$ over the hth link, respectively. Therefore, the expected time required to transmit $P_i$ over the hth link is

$$\bar{\tau}_{i,h} = \frac{(1 - Y_i) \, L_i}{r \log_2(\mu_{i,h})} \qquad (4)$$

where $L_i$ and $Y_i$ are the number of bits and the PDR of $P_i$, respectively. Note that $\bar{\tau}_{i,h}$ is an expected value, as $Y_i$ is a probability. Therefore, the delay constraint is

$$\sum_{i=0}^{N-1} \sum_{h=0}^{\mathcal{H}-1} \bar{\tau}_{i,h} \le T \qquad (5)$$

where $\mathbf{P} = \{P_i \mid i = 0, 1, \ldots, N - 1\}$ is the set of packets in a GOP ($N = F \times Q$ is the number of packets in a GOP, where F and Q are the numbers of frames and quality layers employed).
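The delay model can be checked numerically with a small sketch. It assumes the expected link time takes the form $(1 - Y_i) L_i / (r \log_2 \mu_{i,h})$ implied by the definitions above; the function names and the tuple layout of `packets` are illustrative.

```python
import math


def expected_link_time(bits, pdr, bandwidth_hz, modulation_level):
    """Expected time (s) to send one packet over one link."""
    rate = bandwidth_hz * math.log2(modulation_level)  # bits per second
    return (1.0 - pdr) * bits / rate


def total_expected_delay(packets, bandwidth_hz):
    """Sum of expected link times over all packets and links.

    packets is a list of (bits, pdr, [modulation level per link]) tuples.
    """
    return sum(
        expected_link_time(bits, pdr, bandwidth_hz, mu)
        for bits, pdr, levels in packets
        for mu in levels
    )


# One 6000-bit packet sent over two 150-kHz links with 16-QAM and 4-QAM
delay = total_expected_delay([(6000, 0.0, [16, 4])], 150e3)
```

The delay constraint is then simply `delay <= T` for the allowed time T.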

Optimization
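The purpose of the proposed method is to minimize the average of the expected distortion of each frame in a GOP. Therefore, the quantity to be minimized is

$$\bar{d} = \frac{1}{\mathcal{F}} \sum_{k=0}^{\mathcal{F}-1} \bar{d}_k$$

where ℱ is the number of frames in a GOP. The PLR of packet $P_i$ in $\mathbf{P}$ is

$$j_i = 1 - (1 - X_i)(1 - Y_i) \qquad (6)$$

where $X_i$ is the PER of $P_i$. Here, the Lagrange optimization formula can be written as

$$(\mu^*, Y^*, \lambda^*) = \arg\min_{\mu, Y, \lambda} \; \bar{d} + \lambda (\bar{\tau} - T) \qquad (7)$$

where $\bar{\tau}$ and T are the required and allowed delay in transmitting $\mathbf{P}$, respectively, and λ is the Lagrange multiplier. As shown in (7), the optimized modulation level set μ*, PDR set Y*, and Lagrange multiplier λ* must be found in order to minimize $\bar{d} + \lambda (\bar{\tau} - T)$.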

Resource distortion attribution
As discussed in [3], each additional MGS coded video packet drop results in an increment in the received video distortion and a decrement in the required bit rate. For each MGS packet, the study [3] defines an RD attribute from these two quantities. For bit rate control, the packets are prioritized according to the RD attribute. In this article, we modify this to consider the required time resource (delay). Any increment in the PDR or modulation level results in an increment in the expected distortion and a decrement in the required delay. Therefore, this article defines the resource-distortion (RsD) attribute, which is the distortion increment/delay decrement, as

$$\lambda^{PHY}_{i,h} = -\frac{\partial \bar{d} / \partial \mu_{i,h}}{\partial \bar{\tau} / \partial \mu_{i,h}}, \qquad \lambda^{MAC}_{i} = -\frac{\partial \bar{d} / \partial Y_i}{\partial \bar{\tau} / \partial Y_i} \qquad (8)$$

According to [18], the BER $x_{i,h}$ can be approximated as

$$x_{i,h} \approx 0.2 \exp\left( -\frac{1.6 \, s_{i,h}}{\mu_{i,h} - 1} \right) \qquad (10)$$

for multi-level quadrature amplitude modulation (MQAM), where $s_{i,h}$ is the SNR of the hop. In (9), $X_i$ is the PER at the destination node, which is determined by the BERs of $P_i$ over the ℋ hops in the path. Substituting these expressions into (9), and using (4) for the divisor in (8), gives $\lambda^{PHY}_{i,h}$ in (13). Note that if $Y_i$ is 1, both the dividend and divisor of $\lambda^{PHY}_{i,h}$ are 0, so that $\lambda^{PHY}_{i,h}$ cannot be specified. In other words, $\lambda^{PHY}_{i,h}$ can take any value. On the other hand, the dividend of $\lambda^{MAC}_{i}$ in (8) can be written using (6) and (12), and its divisor follows from (4), which gives $\lambda^{MAC}_{i}$ in (14). Note that $Y_i$ is a probability, and can be controlled to take an edge value of 0 or 1 by the transmitter. If it is one of these two edge values, neither the dividend nor the divisor of $\lambda^{MAC}_{i}$ can be specified. Consequently, $\lambda^{MAC}_{i}$ cannot be specified, which means it can take any value.

Figure 3 Hierarchical B structure with four temporal layers.
Figure 4 Reference group representation of Figure 3.
Table 3 CNEPs of the RG structure in Figure 4 (columns: CNEP, NEP).
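The relation between modulation level, BER, and end-to-end PER can be sketched as follows. The exponential BER form with constants 0.2 and 1.6 is inferred from the constants $C_1$ and $C_2$ quoted later in the article; the PER expressions additionally assume independent bit errors and a linear (not dB) SNR. All function names are illustrative.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def mqam_ber(snr_linear, mu):
    """Approximate BER of mu-ary QAM at the given linear SNR (assumed form)."""
    return 0.2 * math.exp(-1.6 * snr_linear / (mu - 1.0))


def link_per(ber, bits):
    """PER of one link, assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** bits


def path_per(bers, bits):
    """PER at the destination: the packet must survive every hop."""
    ok = 1.0
    for ber in bers:
        ok *= (1.0 - ber) ** bits
    return 1.0 - ok


snr = db_to_linear(15.0)
bers = [mqam_ber(snr, 4), mqam_ber(snr, 16)]  # two hops, different levels
per = path_per(bers, 5000)
```

A higher modulation level shortens the transmission time but raises the BER of that hop, and a single noisy hop dominates the end-to-end PER.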

Algorithm I-searching continuous modulation level and PDR
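If μ* and Y* minimize $\bar{d} + \lambda (\bar{\tau} - T)$ in the optimization formula (7), the formula differentiated with respect to any element $\mu_{i,h}$ in μ* and $Y_i$ in Y* becomes 0, so that

$$\frac{\partial \bar{d}}{\partial \mu_{i,h}} + \lambda \frac{\partial \bar{\tau}}{\partial \mu_{i,h}} = 0, \qquad \frac{\partial \bar{d}}{\partial Y_i} + \lambda \frac{\partial \bar{\tau}}{\partial Y_i} = 0$$

The equations above can be rearranged and expressed in terms of the RsD attributes as

$$\lambda = \lambda^{PHY}_{i,h}, \qquad \lambda = \lambda^{MAC}_{i} \qquad (15)$$

As $Y_i$ affects $\lambda^{PHY}_{i,h}$ and $\lambda^{MAC}_{i}$, as shown in (13) and (14), the following three settings can be considered in order to satisfy (15).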
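• Setting 1: Set μ i, h and Y i (with 0 < Y i < 1) to satisfy both λ = λ PHY i, h and λ = λ MAC i .
• Setting 2: Set μ i, h to satisfy λ = λ PHY i, h and Y i to 0.
• Setting 3: Set Y i to 1.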
The target service of the proposed method is real-time streaming, which requires a more stringent delay constraint than that on the expected delay given in (5). However, $Y_i$ is not 0 or 1 for Setting 1, which means that $P_i$ may or may not be sent. Therefore, by excluding Setting 1 from our consideration, Equation (5) can be modified to

$$\sum_{i=0}^{N-1} \sum_{h=0}^{\mathcal{H}-1} \frac{(1 - Y_i) \, L_i}{r \log_2(\mu_{i,h})} \le T, \qquad Y_i \in \{0, 1\} \qquad (16)$$

Therefore, Algorithm I considers only Settings 2 and 3. Algorithm I is developed to search for the $\mu_{i,h}$ that makes $\lambda^{PHY}_{i,h}$ approach λ. From (10), $\mu_{i,h}$ can be evaluated in terms of $x_{i,h}$ as $1 - 1.6 \, s_{i,h} / \ln(5 x_{i,h})$. Therefore, the $\mu_{i,h}$ satisfying (15) can be determined by obtaining the $x_{i,h}$ satisfying (15). $x_{i,h}$ is the BER value, which is usually close to 0. Thus, it is convenient to consider the logarithmic values $b_{i,h} = \log_{10}(x_{i,h})$ and $\Lambda_{i,h}$, the logarithm of $\lambda^{PHY}_{i,h}$, given by (17), where $\psi_{i,h}$ in (18) and $\Psi_{i,h}$ are the term independent of $b_{i,h}$ and its logarithm, respectively. Figure 5 provides an example of $\Lambda_{i,h}$, where $L_i$ and $\Psi_{i,h}$ are set to 5000 and 6, respectively.
To calculate $x_{i,h}$ according to (10), $s_{i,h}$ is set to 15 dB. If the input Λ (the value that $\Lambda_{i,h}$ should approach) is $G_1$, the solution of $b_{i,h}$ is $g_1$. However, if the input Λ is $G_2$, there are two candidate solutions for $b_{i,h}$, which are $g_{2,1}$ and $g_{2,2}$. In this case, the lower value $g_{2,1}$ is selected, as the purpose of the algorithm is to minimize δ. To find the intersection, Algorithm I is developed as shown in Figure 6. The second term in (17) is ln(μ i, h ) + (L i -1) ln (1-x i, h ) + 2 ln (ln(5x i, h )) + 2 ln(ln(μ i, h )). Therefore, $\Delta_{i,h}$ (the slope of $\Lambda_{i,h}$) in Algorithm I is given by (19), which is obtained by differentiating each term in (17) with respect to $b_{i,h}$. Note that $\Psi_{i,h}$ vanishes, as it is independent of $b_{i,h}$.

First, Algorithm I checks whether or not $\psi_{i,h}$ is 0, whereas Algorithm II calculates $\theta_{i,h}$ and $\delta_i$ according to (11) and as discussed in Section 2.2.2, respectively, then inputs those values to Algorithm I. If at least one of the packets referenced by $P_i$ is dropped, $\delta_i$ becomes 0 because a change in $j_i$ cannot contribute to $\bar{d}$. Consequently, $\psi_{i,h}$ becomes 0 and $\Lambda_{i,h}$ becomes -∞, regardless of $b_{i,h}$, according to (18). This means that a value of $b_{i,h}$ satisfying (15) does not exist. Therefore, Algorithm I sets $Y_i$ to 1, i.e., Setting 3, in order to drop the packet and satisfy (15) regardless of $b_{i,h}$. Otherwise, if $\delta_i$ is not 0, $\Psi_{i,h}$ is calculated from $\psi_{i,h}$ according to (18) in order to obtain the intersection shown in Figure 5. It can be seen from Figure 5 that the intersection of the curve $\Lambda_{i,h}$ and the solution, e.g., $g_1$ or $g_{2,1}$ (rather than $g_{2,2}$), must be somewhere at which $\Delta_{i,h} \ge 0$ (note that the slope of $\Lambda_{i,h}$ at $g_{2,2}$ is negative, and that the solution $g_{2,2}$ maximizes the video distortion, which is not desired). Therefore, $b_{i,h}$ should initially be set to a sufficiently small value.
We found that $b_{i,h} = -10$ ($x_{i,h} = 10^{-10}$) was sufficiently small to make the slope positive for various settings of $L_i$, $\Psi_{i,h}$, and $s_{i,h}$. To reduce the computational power, the algorithm utilizes $\Delta_{i,h}$. If $\Delta_{i,h}$ is negative, $Y_i$ is set to 1 and the process is terminated. For small $b_{i,h}$, the slope of $\Lambda_{i,h}$ is close to 1, as in Figure 5. This is due to the second and third terms in (19) becoming very small compared to 1 as $x_{i,h}$ approaches 0. However, if the input Λ is higher than the possible maximum of $\Lambda_{i,h}$, e.g., Λ > $G_1$ in Figure 5, the solution does not exist. In this case, $\Delta_{i,h}$ becomes negative while the $\Delta_{i,h}$ and $b_{i,h}$ calculations are repeated. Table 4 shows an example where the input Λ is set to 6 for Figure 5. As shown in Table 4, $b_{i,h}$ was initially set to -10, as depicted in Algorithm I in Figure 6. In the first iteration of Algorithm I, $\Lambda_{i,h}$ and $\Delta_{i,h}$ are calculated to be -0.6443 and 0.9935 according to (17) and (19), respectively. $\Lambda_{i,h} = -0.6443$ is not close to Λ = 6, so Algorithm I continues to update $b_{i,h}$ and $\Delta_{i,h}$ until it finds that $\Delta_{i,h}$ is negative. Algorithm I is then terminated by setting the PDR $Y_i$ to 1, which means that $P_i$ will be dropped.

Algorithm II-refining parameters
In Algorithm I, $\delta_i$ and $\theta_{i,h}$ are used to calculate $\psi_{i,h}$ according to (18). $\delta_i$ is affected by the PLRs of packets referenced by $P_i$ and the PLRs of packets referencing $P_i$. In addition, $\theta_{i,h}$ is affected by the PERs of $P_i$ over the other links, according to (11). However, δ and θ (which are the sets of $\delta_i$ and $\theta_{i,h}$ for $\mathbf{P}$, respectively) are not available unless Algorithm I has already been completed for the related packets and links. Therefore, this article proposes Algorithm II, which initializes the PDR and PER of every packet over every link to 0 in order to initialize δ and θ. Algorithm I can then calculate μ and Y over the packets. However, if μ and Y do not satisfy (16), a real-time streaming service cannot be guaranteed. Therefore, Algorithm II drops packets with Setting 2 until (16) is satisfied, as shown in Figure 6. Equation (14) can be used as the criterion for deciding which packets will be dropped, as it quantifies the attribute of transmitting $P_i$. Therefore, it drops packets with a lower $\lambda^{MAC}_{i}$. In this way, the set Y obtained by Algorithm I over $\mathbf{P}$ is modified according to the packet dropping procedure. Using the obtained PDR set Y and the PERs that can be calculated from μ, the algorithm can renew δ and θ. μ and Y are then calculated again via the same procedure. If every element in the previously calculated μ and Y is close to the corresponding element in the new μ and Y, we consider (15) to be satisfied, and Algorithm II is terminated. Otherwise, the algorithm repeats this procedure until μ and Y become stable.

The expected distortion $\bar{d}$ is calculated in terms of mean squared error (MSE). The allowed delay T in (16) is set to 0.2 s. Using the example of Figure 7, this section discusses Algorithm III, which is designed to find the value of Λ that maximizes EPSNR, where the EPSNR performance of Algorithm II for various input Λ is shown in Figure 7c. The reason for the local maxima of EPSNR in Figure 7c is as follows.
If Λ = 2.6, Algorithm II allocates 19 packets to transmit and a 0.194-s transmission time (by the modulation setting for the 19 packets), as shown in Figure 7a, b. As the allowed delay is set to 0.2 s, more transmission time can be allocated by adjusting Λ. By reducing Λ from 2.6, the BER for each transmitted packet is lowered (as shown in Figure 5), which means that a lower modulation level is allocated. Therefore, the transmission time becomes larger by reducing Λ from 2.6, as shown in Figure 7b. As Λ approaches 2.4, the transmission time approaches its maximum of 0.2 s. However, if Λ is less than 2.4, the number of transmitted packets is reduced to 18, as shown in Figure 7a, to keep the transmission time below 0.2 s. Thus, the transmission time is reduced discontinuously, as shown in Figure 7b. These discontinuities in radio resource (transmission time) result in the discontinuities and local EPSNR maxima observed in Figure 7c.

Figure 6 Algorithms for finding μ and Y for a given Λ.
Table 4 Calculations for Λ i, h to approach Λ = 6 in Figure 5 (first column: Iteration).

Algorithm III is designed to find the Λ value that maximizes EPSNR. In the rest of this section, Algorithm III is explained by reference to the EPSNR performance of Figure 7c. In order to avoid the local maxima in the figure, Algorithm III iteratively reduces the search range. In the first iteration (Iteration 1 in Figure 7c)

Practical expected distortion and increment
The expected distortion and its increment are discussed in Section 2. The expected distortion increment $\delta_i$ is utilized to find the optimal packet extraction and modulation level allocation, and is updated while Algorithm II is performed. To obtain the exact amount of the expected distortion, the number of error patterns given by (3) must be simulated, which is impractical as T and Q grow. Therefore, the number of simulations must be reduced for practical implementation. For example, we can simulate only the $TQ + 1$ CNEPs that are most likely to occur at the destination node. Table 5 shows an example of the CNEPs where the numbers of temporal layers and quality layers are both 2. If the NEP in the temporal level immediately before the current level is the same as or 1 greater than that of the current level, the CNEP is chosen in order to calculate δ (for Algorithm II) and $\bar{d}$ (for Algorithm III). In this case, T = 3, so that the number of required CNEPs is $7 = TQ + 1$, as in Table 5. In the case of T = 5 and Q = 3, it is 16, which is much more practical than the 364 simulations discussed in Section 2.2.1.
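The saving can be made concrete by comparing the two counts. The sketch below assumes that the full count is the geometric sum of powers of Q up to T, which reproduces the values 7 and 364 quoted in the text; the function names are illustrative.

```python
def full_cnep_count(num_groups, num_quality_layers):
    """Number of CNEPs when every pattern is simulated: sum of Q^t, t = 0..T."""
    return sum(num_quality_layers ** t for t in range(num_groups + 1))


def reduced_cnep_count(num_groups, num_quality_layers):
    """Number of CNEPs kept by the practical rule: T * Q + 1."""
    return num_groups * num_quality_layers + 1


for t, q in [(2, 2), (3, 2), (5, 3)]:
    print(t, q, full_cnep_count(t, q), reduced_cnep_count(t, q))
```

The full count grows exponentially with the number of frames or RGs, whereas the reduced count grows only linearly.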

Complexity of the proposed method
The proposed method consists of three algorithm levels.
As shown in Figure 6, Algorithm II runs Algorithm I as many as HP times until the convergence criterion for Algorithm II is achieved. Therefore, the complexity of Algorithm II is $\kappa_2 = HP(\kappa_1 + \kappa_{MAC}) \eta_2$, where $\kappa_1$ and $\kappa_{MAC}$ are the complexity of Algorithm I and of the calculation of (14) for the packet dropping module in Algorithm II, respectively, and $\eta_2$ is the number of iterations required for convergence. The complexity of Algorithm I is $\kappa_1 = \kappa_0 \eta_1$, where $\kappa_0$ is the complexity of calculating (17) and (19) and $\eta_1$ is the number of calculations. As $\Psi_i$ is fixed for each iteration of Algorithm I, the calculation of the second term in (17) is the main cause of the complexity. The computational power required to calculate this term is mainly dependent on calculating $10^{\beta_{i,h}}$ to obtain $x_{i,h}$, $\ln(5 x_{i,h})$, $\ln(\mu_{i,h})$, and $(x^{*}_{i,h})^{L_i - 1}$. Once its components have been calculated, Equation (19) can easily be obtained. As $\kappa_{MAC}$ is trivial compared to $\kappa_1$, $\kappa_2$ can be considered as $HP \kappa_1 \eta_2$. Consequently, the total complexity of the proposed method for optimizing $\mathbf{P}$ is $\kappa = HP \kappa_0 \eta_1 \eta_2 \eta_3$, where $\eta_3$ is the predefined number of samples taken to find the optimal Λ, as discussed in Section 3.4. Therefore, κ can be considered as O(HP).

Algorithm I-discrete-searching discrete modulation level and PDR
Section 3.2 introduced Algorithm I for finding a continuous modulation level and PDR for a packet. Algorithm I finds a continuous modulation level satisfying $\lambda = \lambda^{PHY}_{i,h}$. However, it is not practical to realize this continuous modulation value. Therefore, this section considers discrete modulation levels affordable by MQAM. For instance, five modulations, such as 4-, 16-, 64-, 256-, and 1024-QAM, can be considered. By rearranging (10), $b_{i,h}$ can be written as

$$b_{i,h} = C_1 - C_2 \frac{s_{i,h}}{\mu_{i,h} - 1}, \qquad \mu_{i,h} \in M$$

where the constants $C_1$ and $C_2$ are $\log_{10}[0.2]$ and $1.6 / \ln[10]$, respectively. As M = {4, 16, 64, 256, 1024}, the number of possible $b_{i,h}$, and consequently the number of possible $\Lambda_{i,h}$, is also five. Hence, it is difficult to satisfy $\lambda = \lambda^{PHY}_{i,h}$ exactly. Therefore, an alternative method (Algorithm I-Discrete) is considered, which finds the $\Lambda_{i,h}$ closest to Λ from among the five possible candidates. Specifically, modulation levels with $\Delta_{i,h} \ge 0$ are selected from the five candidates, and the level whose $\Lambda_{i,h}$ value is closest to Λ is found. However, if $\Delta_{i,h}$ is less than 0 for each of the five candidates, none of the modulation levels can be considered adequate. In this case, the algorithm sets the PDR to 1, as Algorithm I does for a negative $\Delta_{i,h}$, and terminates. By substituting Algorithm I (which is called by Algorithm II, as in Figure 6) with Algorithm I-Discrete, an efficient packet extraction and discrete modulation level scheme can be obtained by Algorithm III.
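The candidate selection described above can be sketched as follows. Because the exact forms of (17) and (19) are not reproduced here, they enter only through two hypothetical callbacks, `score` (returning $\Lambda_{i,h}$) and `slope` (returning $\Delta_{i,h}$); the SNR is assumed to be a linear ratio, and returning `None` stands for dropping the packet.

```python
import math

QAM_LEVELS = (4, 16, 64, 256, 1024)
C1 = math.log10(0.2)
C2 = 1.6 / math.log(10)


def log10_ber(snr_linear, mu):
    """b = C1 - C2 * s / (mu - 1): log10 of the approximate MQAM BER."""
    return C1 - C2 * snr_linear / (mu - 1.0)


def pick_level(target, snr_linear, score, slope):
    """Return the QAM level whose score is closest to target, or None.

    score(b, mu) and slope(b, mu) are caller-supplied stand-ins for the
    article's Lambda and Delta; levels with a negative slope are skipped.
    """
    best = None
    for mu in QAM_LEVELS:
        b = log10_ber(snr_linear, mu)
        if slope(b, mu) < 0:
            continue
        gap = abs(score(b, mu) - target)
        if best is None or gap < best[0]:
            best = (gap, mu)
    return None if best is None else best[1]
```

With only five candidates, an exhaustive scan of this kind replaces the iterative search of the continuous algorithm.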

Transmission path configured with a single link
Prior to a discussion of multi-link cases, the performance of the proposed method for a single link is discussed. The proposed method is designed to find the optimal set of transmitted packets, and the modulation level of each transmitted packet, for a transmission path configured with multiple links. Therefore, if the proposed method is applied to a single link, it operates similarly to the method in [12], which is designed to determine the optimal set of transmitted packets and transmission modes. In the rest of this article, the two methods are termed Single Link Optimization (SLO) and Multi-Link Optimization (MLO). In this section, JSVM 9.19.7 [17] is considered for the simulations. For the simulations of Sections 4.1 and 4.2, Mobile common intermediate format (CIF) is tested at 30 frames per second (FPS) with a bandwidth of 150 kHz for each link, where the GOP size is 8 (which results in three temporal layers (TLs) in the hierarchical B structure, as in Figure 3) and the number of quality layers (QLs) is 3. Figure 8 shows the total MSE increment of the first GOP by excluding each of the 27 packets configuring 9 frames and 3 QLs (QL 0, QL 1, and QL 2), where QL 0 is the base quality layer and the ninth frame is shared by the next GOP. If we consider an adaptive modulation strategy to guarantee a fixed BER level for every link, the number of transmitted packets for the limited transmission time (0.2 s in this section) is determined by the BER level. If the BER level is lowered to reduce channel error, the bit rate is reduced, so that the number of packets transmitted is reduced. Figure 9 shows the number of bits according to the number of transmitted packets, where the 27 packets of Figure 9 are prioritized by RD attribution, as in [3]. Figure 10 shows the number of the transmitted packets with respect to the channel SNR, where the target BER is set to either 10 -7 , 10 -8 , or 10 -9 . 
As shown in Figure 10, more packets can be transmitted by allowing a higher BER level. The transmission times for the three BER levels of the adaptive modulation strategy are shown in Figure 11a. If a BER level is fixed for the packets, the transmission time is determined solely by the number of packets. Therefore, the allowed time of 0.2 s is not fully utilized in most cases, as shown in Figure 11a. Consequently, it is unfair to compare the three fixed-BER adaptive modulations with the proposed method, as the proposed method is designed to fully utilize the given transmission time. Therefore, this article considers a Flexible BER Decision (FBeD) method, which finds the single BER level that minimizes the expected video distortion when applied to all transmitted packets. FBeD is considered to provide a performance upper bound for the adaptive modulation method designed to guarantee a fixed BER level, and it shows better performance than the three fixed-BER methods in terms of EPSNR, as shown in Figure 11b. Furthermore, as FBeD can fully utilize the given transmission time, as shown in Figure 11a, it is suitable for comparison with the proposed method. Meanwhile, the proposed method individually determines the BER level for each packet according to its distortion increment (MSE increment) and bit length. Figure 12 shows the allocated BER of each packet and its MSE increment for FBeD and SLO with channel SNRs of 12 and 14 dB. The cumulative transmission time for each transmitted packet using the two methods is shown in Figure 13, where the cumulative time is plotted to show the total transmission time reaching 0.2 s. In Figure 13, the transmitted packets are sorted in decreasing order of MSE increment, so that it can be conveniently compared with Figure 12. For a 12 dB channel SNR, both FBeD and SLO transmit 17 packets.
In the 14 dB case, FBeD and SLO transmit 19 and 22 packets, respectively. As shown in Figure 12a, SLO provides higher protection for 4 of the 17 packets, and more time is allocated to the packets with a high MSE increment, as shown in Figure 13a, where the number of transmitted packets is the same for both methods. As shown in Figure 12b, SLO allocates a higher BER to 19 of the 22 packets, so that it allocates less time to packets with a low MSE increment, as shown in Figure 13b. Hence, SLO can transmit three more packets than FBeD while providing a lower BER level for three packets with a high MSE increment. For both values of the channel SNR, SLO shows better performance, as confirmed by Figure 14, which shows the number of transmitted packets and EPSNR for various channel SNRs.
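The FBeD baseline used in this comparison, a single BER level chosen to minimize the expected distortion, can be sketched in Python as a grid search. This is an illustrative sketch under several assumptions that are not taken from the article: the MQAM approximation BER ≈ 0.2 exp(-1.6 γ/(M - 1)), independent bit errors (so that a packet of n bits is lost with probability 1 - (1 - BER)^n), additive MSE increments without decoding dependencies, and a hypothetical candidate grid of one BER level per decade.

```python
import math


def bit_budget(snr_db, ber, bandwidth_hz=150e3, time_s=0.2):
    """Bits that fit into the time limit (assumed MQAM BER approximation)."""
    snr = 10 ** (snr_db / 10)
    m = 1.0 + 1.6 * snr / math.log(0.2 / ber)
    return bandwidth_hz * math.log2(m) * time_s


def expected_distortion(packets, snr_db, ber):
    """Expected MSE increment for one BER level applied to all packets.

    packets: (bits, mse_increment) pairs in decreasing priority.
    Packets that no longer fit contribute their full MSE increment.
    """
    budget = bit_budget(snr_db, ber)
    used, total, sending = 0.0, 0.0, True
    for bits, mse_inc in packets:
        # Once a packet does not fit, all later packets are dropped as well.
        sending = sending and used + bits <= budget
        if sending:
            used += bits
            # Packet error rate under independent bit errors (assumption).
            per = -math.expm1(bits * math.log1p(-ber))
            total += per * mse_inc
        else:
            total += mse_inc
    return total


def fbed(packets, snr_db, exponents=range(-9, -1)):
    """Return the candidate BER level with the lowest expected distortion."""
    candidates = [10.0 ** e for e in exponents]
    return min(candidates,
               key=lambda ber: expected_distortion(packets, snr_db, ber))
```

The sketch captures the trade-off described above: a lower BER protects each packet better but shrinks the bit budget, so fewer packets are sent. SLO differs in that it chooses a BER level per packet, which this sketch does not attempt.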

Transmission path configured with two links
Whereas SLO finds the optimal transmission packet set and BER (modulation level) for a single link, MLO finds these two parameters for multiple links. In the following simulations, a transmission path configured with multiple links is considered, where the channel SNR of one link on the path changes. In Section 4.2, a transmission path configured with two links (Links 0 and 1) is considered, in which the channel SNR of Link 1 is 25 dB and that of Link 0 varies. Figure 15 shows the transmission time used by each link to transmit all of the packets. In Figure 15, it can be seen that less time is allocated to the link with the relatively high SNR, so that the other link can use more time to protect the packets from channel error. For SLO, however, an equal time of 0.1 s is assumed to be allocated to each link, as it is designed for a single link. For multiple links, FBeD is adjusted to allocate the same BER level to all links, so that the transmission time can be flexibly allocated to each link, as in MLO. Figure 16 shows the number of transmitted packets and EPSNR with respect to the varying channel SNR of Link 0. When the SNRs of the links are the same, MLO allocates the same amount of time to each link, as shown in Figure 15, and therefore performs similarly to SLO, as shown in Figure 16b. If the SNR difference between the two links is relatively small (i.e., the SNR of Link 0 is 23-33 dB), the EPSNR performance of SLO and MLO is almost identical, and both EPSNRs are higher than that of FBeD. However, as the link with the lower SNR cannot occupy more time in SLO, the EPSNR of SLO does not improve further when the channel SNR of Link 0 rises above 33 dB. Likewise, when Link 0 has an SNR below 23 dB, it cannot occupy more time in SLO, so that the EPSNR of SLO degrades to below that of FBeD.
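The flexible time allocation of the multi-link FBeD variant (the same BER level on every link) can be sketched in Python as follows. This is a minimal sketch under assumptions not stated in this section: the MQAM approximation BER ≈ 0.2 exp(-1.6 γ/(M - 1)), a symbol rate equal to the link bandwidth, and links that transmit one after another so that their transmission times add up to the shared time limit.

```python
import math


def link_rate_bps(snr_db, ber, bandwidth_hz=150e3):
    """Bit rate of one link at the common BER (assumed MQAM approximation)."""
    snr = 10 ** (snr_db / 10)
    m = 1.0 + 1.6 * snr / math.log(0.2 / ber)
    return bandwidth_hz * math.log2(m)


def per_link_times(total_bits, link_snrs_db, ber):
    """Time each link needs to forward the same bits at the common BER."""
    return [total_bits / link_rate_bps(s, ber) for s in link_snrs_db]


def max_bits_for_path(link_snrs_db, ber, time_s=0.2):
    """Largest payload whose per-link times sum to the time limit.

    From sum_h bits / R_h <= T it follows that bits <= T / sum_h (1 / R_h).
    """
    return time_s / sum(1.0 / link_rate_bps(s, ber) for s in link_snrs_db)
```

With Link 1 at 25 dB and Link 0 at a lower SNR, `per_link_times` assigns the larger share of the time limit to Link 0, which matches the behavior described for Figure 15. When both SNRs are equal, the shares are equal, which is the case in which the fixed 0.1 s split assumed for SLO coincides with the flexible split.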

Transmission path configured with three links
In Figure 17, a path with three links (Links 0, 1, and 2) is considered, where the channel SNRs of Links 1 and 2 are 25 and 30 dB, respectively. Because the EPSNR performance of SLO is sensitive to an imbalance in the channel SNRs of the links, SLO becomes more inadequate as the number of links grows. Therefore, the performance of SLO has been omitted from Figure 17. This figure shows the test results for three quarter CIF (QCIF) videos at 15 FPS and three CIF videos at 30 FPS, where the bandwidth for each link was set to 50 kbps for QCIF videos and 200 kbps for CIF videos. The GOP size was set to 4 for QCIF videos and to 8 for CIF videos. In addition to the two methods with continuous modulation levels (FBeD (CNT) and MLO (CNT)), the performance of the methods with discrete modulation levels (FBeD (DSC) and MLO (DSC), as discussed in Section 3.7) is also shown. In Figure 17, MLO (CNT) shows better performance than FBeD (CNT). In both the QCIF and CIF cases, Foreman reaches acceptable video quality (EPSNR higher than 30 dB) at a lower channel SNR than Mobile and Football for FBeD (CNT) and MLO (CNT). This is because the visual and motion characteristics of Foreman are relatively simple compared to those of Mobile and Football, so it can be compressed more. Therefore, a lower modulation level can be allocated for Foreman while transmitting as many packets as required for acceptable video quality. However, as shown in Figure 17c, modulation levels below those supported by discrete MQAM ($\mu_{i,h}$ less than 4 in (20)) are obtained by FBeD and MLO for low channel SNR, so that FBeD (DSC) and MLO (DSC) degrade severely as the channel SNR decreases. Nevertheless, it is observed that MLO (DSC) shows improvement over FBeD (DSC) for the six video sequences.

Conclusions
This article proposed a method to jointly exploit the bit rate and channel adaptation provided by MGS and adaptive modulation over a transmission path consisting of multiple wireless links. The proposed algorithms found the optimal packet transmission scheme by extracting packets according to an RD attribute quantified as the distortion increment over the delay decrement. The proposed algorithms also found the optimal modulation allocation for transmitting the extracted packets. The two factors of packet extraction and modulation allocation were optimized simultaneously by solving an optimization problem to minimize the total distortion.