Cross-Layer QoS Support for Multimedia Delivery over Wireless Internet

Delivering multimedia over wireless Internet is a very challenging task. Multimedia delivery inherently has strict quality-of-service (QoS) requirements on bandwidth, delay, and delay jitter. However, the current Internet can only support best-effort service, which imposes varying network conditions during multimedia delivery. The advent of wireless networks further exacerbates the variance of network conditions and brings greater challenges for multimedia delivery. To improve the media quality perceived by end users over wireless Internet, QoS support can be addressed in different layers, including the application layer, transport layer, link layer, and so forth. This paper presents a framework that provides QoS support for multimedia delivery over wireless Internet across different layers. To provide efficient QoS support for different types of media over best-effort networks, we first propose a cross-layer architecture, which combines application-level, transport-layer, and link-layer controls, and then review recent advances in each individual component. Specifically, dynamic estimation of varying channel and network conditions, adaptive and energy-efficient application and link-level error control, efficient congestion control, header compression, adaptive automatic repeat request (ARQ) and priority-based scheduling, as well as QoS-adaptive proxy caching technologies are explicitly reviewed in this paper.


INTRODUCTION
With the explosive growth of the Internet and the dramatic increase in wireless access, there is a tremendous demand for multimedia delivery over wireless Internet. The third generation (3G) wireless networks, foreseen to be the enabling technology for multimedia services with up to 384 kbps outdoor and 2 Mbps indoor bandwidth, make multimedia communication over the wireless link feasible [1]. Moreover, the proliferation of 802.11 systems, which can provide up to 100 Mbps bandwidth, has extended the role of the traditional Internet to support media streaming services in the air [2]. However, multimedia over wireless Internet poses many challenges, as follows.

Different QoS requirements for different types of media
In general, different types of media have different characteristics. Specifically, real-time media such as video and audio are delay-sensitive but capable of tolerating a certain degree of errors. Non-real-time media such as Web data are less delay-sensitive but require reliable transmission. In addition, due to scalable media encoding technologies, different parts of real-time media are of different importance.

High packet loss rate and bit error rate
In wireline networks, packet losses are usually caused by congestion in intermediate routers. Meanwhile, wireless channels have a higher bit error rate (BER) due to fading and multipath effects. The resulting packet losses and bit errors can have devastating effects on multimedia quality.

Bandwidth limitation and fluctuation
Network conditions and characteristics in the current Internet, such as bandwidth, packet loss ratio, delay, and delay jitter, vary from time to time. Meanwhile, the capacity of wireless networks also fluctuates with the changing environment.

Low performance for traditional transport-layer protocol
Traditional transport-layer protocols assume congestion to be the primary cause of packet losses and unusual delay in the network, and they decrease the transmission rate when packets are lost. Unfortunately, in wireless networks, packets may also be dropped due to channel errors, so this reaction results in an unnecessary reduction in end-to-end throughput.

Limited battery life
Compared with fixed nodes, mobile devices are constrained by battery lifetime. In general, maintaining good media quality and minimizing average power consumption, including processing power and transmission power at mobile devices, are in conflict with each other. From the multimedia-coding point of view, achieving better media quality usually consumes more processing power in the source coder. From the network point of view, multipath fading and multiple-access interference (MAI) in wireless networks necessitate the use of high transmission power.

Heterogeneity among users and networks
Receivers in multimedia delivery systems are quite different in terms of latency requirements, visual quality requirements, processing capabilities, power limitations, and bandwidth constraints. Moreover, multimedia may traverse different types of networks such as wireline networks, 3G, and wireless local area network (WLAN) systems, each of which has different characteristics such as reliability, delay, jitter, bandwidth, and medium access control (MAC) mechanisms.
To handle the above challenges, many studies have been performed from different aspects. More specifically, link-layer, transport-layer, and application-layer solutions have been proposed, respectively.
Considering the limited bandwidth of wireless systems, the most important target in the link layer is to increase link utilization. It is known that RTP/UDP/IP and TCP/IP suffer from large header overhead on bandwidth-limited links. Header compression has been proven to be efficient for using those protocols. Unfortunately, existing header compression schemes [3,4] do not work well on noisy links, especially those with high BER and long round-trip time (RTT). The Internet Engineering Task Force (IETF) has set up a working group (WG), called robust header compression (ROHC), to address the header compression issue.
To handle the severe bandwidth and delay fluctuation in wireless Internet, estimation of network conditions and congestion control are key issues that need to be addressed in the transport layer. Throughput calculation, packet-pair, and packet-train bandwidth probing are several popular techniques for bandwidth measurement [5]. Other network information such as packet error rate, delay, and delay jitter is also quite useful for high-quality media delivery.
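As an illustration of packet-pair probing, the following is a minimal sketch, assuming the commonly described principle that two back-to-back packets of equal size leave the bottleneck link separated by the time needed to transmit one of them. The function name, the use of the median gap to reduce the influence of cross traffic, and the sample values are assumptions made for this example, not details taken from [5].

```python
def packet_pair_bandwidth(packet_size_bytes, arrival_gaps_s):
    """Estimate bottleneck bandwidth in bits per second.

    packet_size_bytes: size of each probe packet (hypothetical probe format).
    arrival_gaps_s: receiver-side gaps between the two packets of each pair.
    """
    gaps = sorted(g for g in arrival_gaps_s if g > 0)
    if not gaps:
        raise ValueError("need at least one positive inter-arrival gap")
    # The median is an assumed filtering choice; other filters are possible.
    median_gap = gaps[len(gaps) // 2]
    return packet_size_bytes * 8 / median_gap


if __name__ == "__main__":
    # Three probe pairs of 1500-byte packets with gaps around 10 ms.
    print(packet_pair_bandwidth(1500, [0.012, 0.010, 0.011]))
```

A packet train generalizes the same idea by sending more than two packets and averaging over the whole burst.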
Different congestion and rate control schemes can be performed so that multimedia such as video and audio can adapt to the estimated network information in a smooth way [6].
There are many studies in the application layer to improve media delivery quality. Error protection, power saving, and proxy management are several hot topics. To overcome the packet loss and residual bit errors in wireless Internet, error control techniques such as forward error correction (FEC) and automatic repeat request (ARQ) are necessary to maintain high-quality media delivery. Unequal error control [7] can be adopted if the different importance of different types/parts of media is further taken into account. To resolve the power-quality dilemma, power control and joint source-channel coding (JSCC) are two effective approaches. Power control is conducted from the group point of view by controlling transmission power and spreading gain for a group of users so as to reduce interference [8], while JSCC is conducted from the individual user's point of view to effectively combat the errors that occur during transmission by allocating bits between source and channel coding [9]. The heterogeneous networks and different requirements of receivers call for an efficient proxy-caching mechanism to satisfy the different characteristics of receivers. Traditional proxy servers were designed to serve web requests for noncontinuous media, such as textual and image objects. With the advent of video and audio streaming applications, continuous-media caching has been studied in [10]. However, the varying wireless Internet conditions and different media characteristics impose challenges on how to efficiently cache both continuous and noncontinuous media.
Figure 1 depicts a general architecture for multimedia delivery over wireless Internet. In this architecture, a multimedia server, a base station (BS) (gateway) with a media proxy, and heterogeneous mobile clients are deployed. The solutions in different layers mentioned above have been incorporated in this architecture. More specifically, application-layer, transport-layer, and link-layer control mechanisms are all taken into account in order to achieve good end-to-end quality of multimedia services. In the later sections, recent advances for those components are reviewed.

CROSS-LAYER ARCHITECTURE FOR MULTIMEDIA DELIVERY OVER WIRELESS INTERNET
As mentioned above, different layers have different impacts on the media delivery quality and, meanwhile, different layers have different approaches to improving the media delivery quality. Figure 2 depicts the user plane protocol stack in the Universal Mobile Telecommunications System (UMTS), highlighting the layers with significant impact on system performance. As shown in Figure 2, the application is transmitted via TCP or UDP in the Internet part based on the traffic characteristics. IP packets arriving in the downlink to the UMTS network are transported to the radio network controller (RNC). Necessary header compression techniques are applied to the packets in the packet data convergence protocol (PDCP) layer. Then the corresponding packets are transferred to the radio link control (RLC) layer, where they are segmented into smaller RLC protocol data units (PDUs). Diverse RLC ARQ schemes can be used to achieve the required reliability. The RLC PDU queues of a particular IP connection are served by the MAC layer. In deterministic transmission time intervals (TTIs), the MAC layer entities ask the corresponding RLC layer entities for a certain number of RLC PDUs, which are then transferred through the radio interface in MAC frames.
Considering the three key components in wireless Internet architecture, that is, multimedia server, BS (gateway), and mobile hosts, the cross-layer architecture should fulfill the functionalities as shown in Figure 3.

Dynamic wireless Internet condition estimation
Network estimation in different layers on the server, BS, and mobile-host side works together to track the varying wireless Internet conditions.

Network condition adaptation
Adaptively adjust the amount of wireless Internet resources used (i.e., the bandwidth) according to the varying network conditions. This is fulfilled by the congestion control module in the multimedia server and BS.

Network-aware media adaptation
In response to the changing network conditions, media encoding mechanisms and different parts of media can be adaptively adjusted or tailored to maximize the system efficiency and perceived end-to-end quality.

Power efficiency and error robustness
Application and link-layer error control schemes can be used together for error robustness. Meanwhile, the overall power consumption in the mobile station (MS) should be minimized.

Efficient network utilization
To improve the network utilization, especially in wireless channels, header compression is performed in both BS and mobile hosts.

Multiservice support
Priority-based scheduling is an efficient way to support multiservices.

Network and client heterogeneity
Heterogeneity among networks and clients can be supported by QoS-adaptive proxy caching.
In the following sections, we review various QoS-support technologies for media streaming over wireless networks from the application-layer, transport-layer, and link-layer aspects, respectively.

APPLICATION-LAYER MULTIMEDIA TRANSMISSION CONTROL
There are many topics that need to be addressed at the application level to improve media delivery quality. More specifically, recent progress on adaptive media codecs, error protection schemes, power saving approaches, and resource allocation solutions is reviewed in this section.

Network-aware media codec
In general, a media codec can dynamically change the coding rate and other coding parameters according to the varying network conditions (bandwidth, loss, delay, etc.). Scalable coding techniques are introduced to realize this type of media adaptation. The major technique to achieve scalability is layered coding, which divides multimedia information into several layers such that the incremental reception of layers increases the media fidelity. In video coding techniques that utilize DCT-based transforms, such as H.263 and MPEG-4, layered coding techniques can be categorized into three classes: temporal, spatial, and signal-to-noise ratio (SNR) scalability [11]. Video codecs with temporal (spatial) scalability encode a video sequence into one base layer and multiple enhancement layers. The base layer is encoded by itself with the lowest temporal (spatial) resolution, and the enhancement layers are coded based upon temporal (spatial) prediction from lower layers. SNR scalable codecs encode a video sequence into several layers with the same temporal and spatial fidelity. The base layer is coded by itself to provide the basic quality, and enhancement layers are coded to enhance the video quality when added back to the base layer [12]. However, a traditional layered codec requires the enhancement layers to be fully received before decoding; otherwise, they will not provide any quality improvement at all. Fine granularity scalability (FGS), which adopts a method called bit-plane coding, can overcome this problem, that is, the enhancement layers can be truncated anywhere while maintaining some degree of quality improvement [11]. Further enhancements based upon FGS, such as PFGS [13], improve the coding efficiency by using as many predictions as possible from the same and immediate lower layers in the current and previous frames. Another kind of video codec divides a video sequence into multiple spatial-temporal subbands via a 3D wavelet transform. Each of these subbands can be coded into multiple bit planes, which can be further divided into several coding passes. Thus each layer of the coded video consists of some specific coding passes among all the subbands to achieve the scalability [14].
Speech coding is quite different from video coding, in that speech signals can be characterized by some particular model. There are generally three types of speech codec, that is, waveform, model-based, and hybrid codecs. A waveform codec quantizes the amplitude of the signal at each point and improves the coding efficiency via an adaptive prediction filter which captures the correlation among the signal samples. A well-known example of this codec is adaptive differential pulse-code modulation (ADPCM). A model-based codec encodes speech signals based on a specific speech model. The parameters of the model, instead of the original signal samples, are quantized and transmitted. The majority of modern speech codecs are based upon the hybrid method, which combines the waveform and model-based codecs. Code-excited linear prediction (CELP) coding is the most widely used hybrid method [15]. ITU G.729 and GSM-EFR are two standards based on CELP. All the above codecs encode speech signals into a voice stream with a fixed rate. The prevalent speech adaptation technique is the adaptive multirate (AMR) codec [16], which is a mandatory standard for 3G systems. AMR allows dynamic rate adaptation, which is controlled by an in-band signaling procedure. In contrast to speech, audio codecs cannot benefit from general models of the audio signal. The adaptation of audio codecs is achieved similarly to that of scalable video codecs, which also have a layered structure [17,18].
Error resilience and concealment techniques are employed in video/audio coding to prevent or minimize the quality impairment caused by packet losses/errors during media transmission. Common error resilience and concealment approaches include data partitioning, insertion of synchronization marks to prevent drifting errors, concealing lost information from temporally/spatially adjacent regions, and so forth. A good review of robustness techniques for streaming audio can be found in [19]. An introduction to the most recent error concealment/resilience tools in video coding can be found in [20], and overviews of existing error resilience and error concealment techniques in video coding can be found in [21,22], respectively.

Network-adaptive error control
Besides the varying network conditions, there are also packet losses and bit errors in wireless Internet. Thus, an efficient error protection scheme is essential for improving end-to-end media quality. As mentioned above, ARQ and FEC are two basic error correction mechanisms. FEC is a channel coding technique that protects the source data at the expense of adding redundant data during transmission. FEC has been commonly suggested for applications with strict delay requirements such as voice communications [23]. In the case of media transmission where delay requirements are not that strict or the round-trip delay is small (e.g., video/audio delivery over a single wireless channel), ARQ is applicable and usually acts as a complement to FEC. In [24], a hybrid FEC/ARQ method was adopted, and the delay bound can be achieved by limiting the number of retransmissions. A similar technology is also used in [25,26,27,28]. Targeting video streaming over WLAN, in [29], hybrid ARQ/FEC was adopted in unicast and FEC was used in multicast. A similar work in WLAN can also be found in [30], where unequal error protection (UEP) and ARQ are applied, taking the characteristics of WLAN into account. To reduce the bursty effect of packet losses, packet interleaving can also be adopted in conjunction with FEC and ARQ [31]. As mentioned in Section 3.1, a layered scalable media codec usually divides media into a base layer and multiple enhancement layers. Since the correct decoding of enhancement layers depends on the errorless receipt of the base layer, the base layer is more important than the enhancement layers. Therefore, it is natural to adopt UEP for layered scalable media. Specifically, stronger FEC protection can be applied to the base and lower layer data, while a weaker channel coding protection level is applied to the higher layer parts.
To keep the residual error/loss rate under the targeted level, it is efficient to adjust the FEC protection level in response to the underlying changing network conditions. For example, Global System for Mobile Communications (GSM) systems can dynamically distribute the voice data and channel coding within the overall bandwidth to achieve the best possible voice quality. Moreover, how to add FEC codes to layered scalable media has recently attracted great interest [7,26,27,31]. In [7,31], the protection level of every layer in the video bitstream is dynamically adjusted according to the changing network conditions. In [26], a channel-adaptive application-level error control scheme utilizing UEP and delay-constrained ARQ has been proposed for scalable video streaming. Current and estimated RTTs are used to maximize the number of retransmissions while meeting the delay requirements. Similar approaches are also applicable to layered audio streams [27].
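The idea of delay-constrained ARQ can be sketched as follows. This is only an illustrative model, not the scheme of [24] or [26]: it assumes that the first delivery takes half an RTT, that every retransmission round costs one further RTT, and that packet losses are independent with a fixed probability. The function names and the numbers in the example are hypothetical.

```python
import math


def max_retransmissions(delay_budget_s, rtt_s):
    """Largest number of retransmissions that still meets the delay budget.

    Assumed timing model: rtt/2 for the first delivery, plus one full RTT
    for each retransmission (loss feedback followed by the resend).
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    spare = delay_budget_s - rtt_s / 2
    return max(0, math.floor(spare / rtt_s))


def residual_loss(loss_prob, retransmissions):
    """Loss rate left after ARQ, assuming independent losses per attempt."""
    return loss_prob ** (retransmissions + 1)


if __name__ == "__main__":
    n = max_retransmissions(delay_budget_s=0.4, rtt_s=0.1)
    print(n, residual_loss(0.1, n))
```

Under such a model, whatever loss remains after the permitted retransmissions is what FEC has to cover, which is one way to see why the two mechanisms are combined and why the FEC level may be adapted as the RTT estimate changes.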

Bit allocation between source and channel coding
Considering the limited resources in the media delivery system, an important question is how to decide the distribution between source codes and channel codes and, specifically for layered codecs, to what extent a specific layer should be protected. This is generally known as the bit allocation problem. To answer the above questions, an analytic model describing the relation between media quality and source/channel parameters should be developed. The most common metric to evaluate media quality is the expected end-to-end distortion $D_T$, where $D_T$ consists of source distortion $D_S$ and channel distortion $D_C$. Source distortion is caused during media source encoding. Various encoding methods such as quantization, motion estimation in video coding, linear prediction in voice coding, and rate control can be the cause of source distortion. Channel distortion occurs when fragments of the media stream are lost due to network congestion, or incorrectly received due to wireless channel noise. Therefore, the bit allocation problem can be formulated as the optimization problem

$$\min_{R_S,\,R_C} D_T = D_S + D_C \quad \text{s.t.} \quad R_S + R_C \le R_T,$$

where $R_T$ is the total available bandwidth, and $R_S$ and $R_C$ are the rates for source coding and channel coding, respectively.
Based upon the analytical model above, the optimal bit allocation can be solved by numerical methods such as dynamic programming, penalty functions, or Lagrange multipliers. Several bit allocation schemes have been developed according to the above model, taking different kinds of scalable media codecs and channel models into account [26,27,31].
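To make the bit allocation problem concrete, the following minimal sketch searches exhaustively over discrete splits of a total rate budget between source and channel coding and keeps the split with the smallest total distortion. The two distortion functions are toy models invented for this example (they are not taken from [26,27,31]); in practice they would come from a rate-distortion model of the codec and a model of the channel.

```python
def allocate_bits(total_rate, source_distortion, channel_distortion):
    """Return (r_s, r_c, d_t) minimizing D_S + D_C with r_s + r_c = total_rate.

    Exhaustive search over integer rate units; for many layers or parameters,
    dynamic programming or Lagrangian methods would replace this loop.
    """
    best = None
    for r_c in range(total_rate + 1):
        r_s = total_rate - r_c
        d_t = source_distortion(r_s) + channel_distortion(r_s, r_c)
        if best is None or d_t < best[2]:
            best = (r_s, r_c, d_t)
    return best


# Hypothetical toy models: source distortion falls as the source rate grows,
# channel distortion halves with every unit of redundancy.
def toy_source_distortion(r_s):
    return 100.0 / (1 + r_s)


def toy_channel_distortion(r_s, r_c):
    return 50.0 * 0.5 ** r_c


if __name__ == "__main__":
    print(allocate_bits(10, toy_source_distortion, toy_channel_distortion))
```

With these toy models the minimum lies strictly inside the range: spending nothing on channel coding leaves a large channel distortion, while spending too much starves the source coder.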
In wireless Internet scenario, the packet losses consist of those due to network congestion and those caused by wireless transmission errors, which in turn may have different loss patterns.Since different loss patterns lead to different perceived QoS at the application level [32], Yang et al. [33] proposed a loss differentiated rate-distortion based bit allocation scheme which minimizes the end-to-end video distortion taking the above different loss patterns into account.
From the above rate-distortion analytical model, we can see that both source coding and channel coding parameters can affect the final media quality. JSCC schemes are thus proposed to achieve the optimal end-to-end quality by adjusting the source and channel coding parameters simultaneously. From the source-coding point of view, adjusting quantization parameters or the entropy coder to control the source rate [34], and selecting the inter- or intracoding mode to trade off coding efficiency against error resilience [35], are several key issues that can be jointly considered with channel coding. In [28], an integrated JSCC scheme has been proposed to study the performance of FEC/ARQ, in which the joint effect of FEC/ARQ and error-resilient source coding is also considered. Because of the large number of source and network parameters that could be jointly adjusted, the computational complexity of searching for the optimal solution is extraordinarily high. Some works limit the dependencies among the parameters so that the problem can be solved by dynamic programming [36]. Another method is to use a locally optimal solution instead of the global one [37].

Energy-efficient bit allocation
In addition to optimizing the quality of media streaming, mobile users in wireless Internet also need to consider the constraints imposed by limited battery power. How to achieve good user-perceived QoS while minimizing the power consumption is yet another challenge. As mentioned in Section 1, there is a tradeoff between maintaining good media quality and minimizing power consumption. In order to maintain a certain transmission quality, a larger transmission rate in wireless channels inherently needs more power, and more power also allows adopting more complicated media encoding algorithms, which can achieve better coding efficiency. Therefore, an efficient way to obtain optimal media quality is to jointly consider source-channel coding and power consumption issues. In end systems, the power consumption of media streaming over wireless mainly consists of transmission power and processing power.
Traditional joint source coding and power control schemes are mainly targeted at minimizing the power consumed by a single user. For example, a low-power communication system for image transmission has been investigated in [38]. In [39], quantization and mode selection have been discussed together with transmission power consumption. Moreover, rate adaptation could be further taken into account [40]. In addition to transmission power, processing power, which further consists of the power consumed by source coding and channel coding, has been considered in [41]. In [41], the analytical model characterizing the relation among power consumption, source coding, and channel coding can be denoted as

$$\min_{\{P_S,\,P_C,\,P_t\}} P_T \quad \text{s.t.} \quad D_T \le D_0$$

or

$$\min_{\{P_S,\,P_C,\,P_t\}} D_T \quad \text{s.t.} \quad P_T \le P_0,$$

where $P_T$ is the total power consumption, $P_S$, $P_C$, and $P_t$ are the power required by source coding, channel coding, and transmission, respectively, and $D_0$ and $P_0$ are system- or user-specified distortion and power thresholds.
It is worth noting that power-quality optimization for a single user would potentially increase the interference to other mobile users in the interference range, which reduces the QoS of those users who had already reached the optimized state. In order to regain the optimal status, those interfered users may adjust their transmission powers, which will in turn introduce interference to all the other users. Therefore, global power optimization should be studied from the group point of view [39,41].

QoS-adaptive proxy caching for multimedia delivery over wireless Internet
Multimedia applications usually have stronger QoS requirements than best-effort services, which brings great challenges to the Internet and unreliable wireless networks. Moreover, the heterogeneity among different devices in different networks implies that their demands are different in terms of delay, bandwidth, and visual quality. Deploying multimedia proxies at the edge of the Internet, connecting both remote servers and end clients, is an efficient way to satisfy the heterogeneous requirements of end users. On the proxy-server side, the backbone network between proxy and server is a best-effort network, that is, network conditions such as bandwidth, packet loss ratio, delay, and jitter vary from time to time. On the proxy-client side, two types of clients, namely Internet clients and wireless clients, access the proxy via different networks. By caching popular media content and treating end users from different networks differently, multimedia proxies can also alleviate network congestion and reduce the latency and the workload on multimedia servers.
In order to provide efficient streaming service for both Internet and wireless clients, media proxy should deal with the following issues: (1) providing high quality video streaming service for both Internet clients and various wireless clients; (2) managing limited cache resource in proxy so as to provide optimal performance for heterogeneous users; (3) evaluating and selecting multimedia replicas from the servers in the Internet to relay streaming for end clients.
Traditional proxy servers are designed to serve web requests for noncontinuous media such as texts and images. In contrast to these objects, continuous media have very different characteristics, such as high sensitivity to delay and bandwidth and tolerance of moderate data loss. Moreover, within a scalable media stream, different parts of the data usually have different impacts on media quality. Caching real-time traffic should also take the varying conditions in the Internet and wireless networks into account. All these call for a different design for multimedia proxies.
The cache replacement policy is one of the key components in proxy design. Traditional cache replacement schemes designed for web data can be roughly classified as recency-based and frequency-based. Recency-based schemes such as least recently used (LRU) [42] exploit temporal locality among cache objects, or recency of reference, that is, objects which have been referenced recently are more likely to be referenced again in the near future. Frequency-based policies, for example, least frequently used (LFU) [43], make cache replacement decisions according to the popularity of the content in the cache. Some other solutions, such as LRU-k and LRFU [44,45], have been proposed to balance the frequency-based and recency-based algorithms. To support continuous media, Rejaie et al. introduced a replacement policy for layered media [46]. Tewari et al. proposed a resource-based caching (RBC) policy for both continuous and noncontinuous media, balancing the usage of disk space and I/O [47]. All the above-mentioned works use hit rate as the performance metric. In [48], Yu et al. proposed a QoS-adaptive replacement policy for mixed media. In this scheme, different priorities among and within media, along with the network conditions, are considered, and the goal of cache management is to maximize the hit rate of noncontinuous media and the perceived QoS for real-time continuous media. Q. Zhang et al. further proposed a unified cost metric to measure cache performance, balancing the issues of network, latency, and media distortion [49].
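The difference between recency-based and frequency-based replacement can be seen in a small sketch. The two classes below are generic textbook-style versions of LRU and LFU written for this illustration (object-count capacity of at least one, no object sizes, no layered media), not the policies of [42,43] or of the QoS-adaptive schemes discussed above.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the object whose last reference is oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop least recently used


class LFUCache:
    """Evicts the object with the fewest references so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.hits = {}

    def get(self, key):
        if key not in self.items:
            return None
        self.hits[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = min(self.hits, key=self.hits.get)
            del self.items[victim]
            del self.hits[victim]
        self.items[key] = value
        self.hits[key] = self.hits.get(key, 0) + 1


if __name__ == "__main__":
    for cache in (LRUCache(2), LFUCache(2)):
        cache.put("a", 1)
        cache.put("b", 2)
        cache.get("b")
        cache.get("b")
        cache.get("a")
        cache.put("c", 3)  # forces one eviction
        print(type(cache).__name__, sorted(cache.items))
```

For the same reference sequence, LRU keeps the object touched last and LFU keeps the object touched most often, which is why hybrid policies try to weigh both signals.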
Prefetching between proxies and servers is another effective technique in proxy design, provided that the user access pattern can be accurately estimated [10,50,51]. Since continuous media are more likely to be accessed sequentially, Sen et al. proposed a proxy prefetching scheme for multimedia streams [10]. In [48,52], the QoS-adaptive prefetching schemes took the network conditions into account. Other caching techniques such as batching and merging were incorporated in [52]. In order to handle heterogeneous users with different QoS requirements, multimedia proxies are also assigned tasks such as transcoding [53], rate control [54], or any of the network-adaptive techniques mentioned in Section 3.2. If a proxy can reach multiple servers, selecting a proper server to achieve load balance while maximizing media quality is another issue [49].

TRANSPORT-LAYER MULTIMEDIA TRANSMISSION CONTROL
To efficiently deliver multimedia over wireless Internet, it is important to estimate the status of the underlying networks so that multimedia applications can adapt accordingly. The IETF has developed several standards, such as the real-time transport protocol and its control protocol (RTP/RTCP), real-time streaming protocol (RTSP), session initiation protocol (SIP), session description protocol (SDP), and stream control transmission protocol (SCTP), to monitor and control the media streaming process. However, how well these protocols can work to achieve a desirable media streaming quality relies on the accuracy of the estimation of the network conditions. One of the most important issues in the estimation of network conditions is to detect the currently available bandwidth and perform efficient congestion control. A proper congestion control scheme should maximize the bandwidth utilization and at the same time avoid overusing network resources, which may cause network collapse. Since TCP is the dominant transport protocol in the Internet, it is very important for multimedia streaming applications to be TCP-friendly, which means that the long-term throughput of a multimedia stream is close to that of a TCP flow under similar network conditions [55]. Generally there are two kinds of TCP-friendly streaming control protocol: window-based and model-based. Window-based congestion control schemes perform additive increase and multiplicative decrease (AIMD) rate adjustment, which is similar to TCP [56].
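The AIMD rule itself is compact enough to sketch. The step sizes below (add one unit per loss-free round, halve on loss) mirror the usual description of TCP congestion avoidance and are assumptions of this example; the function name and the minimum rate are likewise hypothetical.

```python
def aimd_update(rate, loss_detected, increase=1.0, decrease=0.5, min_rate=1.0):
    """One AIMD step: additive increase without loss, multiplicative decrease on loss."""
    if loss_detected:
        return max(min_rate, rate * decrease)
    return rate + increase


if __name__ == "__main__":
    rate = 10.0
    # One loss event in the third round of five.
    for lost in (False, False, True, False, False):
        rate = aimd_update(rate, lost)
        print(rate)
```

The resulting sawtooth is what makes pure AIMD uncomfortable for streaming media, since each loss event halves the sending rate abruptly.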
The rate-adaptive protocol (RAP) [57] mimics the behavior of the TCP congestion window, and an acknowledgement is triggered by every incoming packet on the receiver side to measure packet loss and RTT. TCP emulation at receivers (TEAR) [58] maintains a "virtual" congestion window at the receiver and tries to derive from the incoming packets whether the congestion window should increase or decrease; note that the window adjustment also follows the AIMD manner. Model-based congestion control algorithms model TCP throughput by packet loss rate, RTT, and retransmission timeout (RTO) [59]. They use the derived model and the currently measured network parameters to determine the currently available bandwidth for streaming protocols [60]. TFRC is a well-known streaming protocol based upon this kind of model [61].
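A model-based rate computation can be sketched with the throughput equation given in the TFRC specification, here in its simplified form that assumes one packet is acknowledged per ACK. Setting the retransmission timeout to four times the RTT is a common simplification and is an assumption of this example, as are the function name and the sample parameters.

```python
import math


def tcp_friendly_rate(packet_size, rtt, loss_rate, rto=None):
    """Approximate TCP-friendly sending rate in bytes per second.

    packet_size: bytes per packet; rtt and rto in seconds;
    loss_rate: loss event rate, which must be strictly between 0 and 1.
    """
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    if rto is None:
        rto = 4 * rtt  # assumed simplification
    p = loss_rate
    denom = (rtt * math.sqrt(2 * p / 3)
             + rto * 3 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2))
    return packet_size / denom


if __name__ == "__main__":
    print(tcp_friendly_rate(packet_size=1000, rtt=0.1, loss_rate=0.01))
```

Because the rate is a smooth function of the measured loss rate and RTT, a sender driven by such a model changes its rate more gradually than an AIMD sender does.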

Packet loss differentiation and estimation
To design a streaming protocol for wireless Internet, several issues need to be considered.We will discuss these issues in the remaining sections.The most important one is congestion-loss estimation.In wireless environment, packet losses can be caused by either network congestion or transmission errors in wireless channels.TCP and TCP-friendly streaming protocols treat any packet loss as a signal of network congestion and consequently reduce the transmission rate.However, this rate reduction is not necessary when the loss is due to wireless errors, which in turn underutilizes the network resource.
Generally, there are two types of solutions for discriminating between the losses: split-connection and end-to-end methods [62]. In the former, a proxy is located at the boundary between the wired and wireless networks to handle the two types of network separately [6,63,64]. However, this type of solution raises deployment and cost issues for network operators. Among end-to-end methods, one solution is to use explicit congestion notification (ECN) to detect whether the network is congested [65] and to ignore packet losses as a congestion signal. This method requires ECN to be enabled at every intermediate router. Another type of end-to-end method uses heuristics to differentiate congestion losses from error losses. In [66], Biaz and Vaidya used packet interarrival time to differentiate the cause of losses. In [67], Cen et al. extended this idea and further incorporated relative one-way trip time (ROTT) as an additional metric to discriminate between the losses. In [68], a packet pair, that is, two packets sent back to back, was used in conjunction with a hidden Markov model to achieve loss differentiation. Tsaoussidis and C. Zhang further proposed a technique called packet "wave," a series of back-to-back packets, to detect the cause of losses [69].
All the heuristic methods expect a packet to exhibit a certain behavior under network congestion or error losses. However, the specific behavior of a packet in the wireless Internet reflects the joint effect of several factors. Moreover, the traffic pattern in the Internet is itself a complicated research topic, and using a simple pattern to predict the behavior of packets is risky. In [33], Yang et al. proposed an end-to-end loss differentiation method that utilizes link information from the wireless channel.
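As an illustration of an interarrival-time heuristic, the sketch below follows the rule as it is commonly described for the scheme in [66]: when the wireless link is the bottleneck and packets arrive back to back, n packets lost to transmission errors leave a gap of roughly (n + 1) minimum interarrival times. The function and parameter names are hypothetical, and the exact thresholds should be read as an assumption rather than as the authors' definition.

```python
def classify_loss(gap, min_interarrival, n_lost):
    """Guess the cause of a loss burst from the receiver-side arrival gap.

    gap              -- time between the last in-order packet and the packet
                        that reveals the loss, in seconds
    min_interarrival -- smallest interarrival time observed so far, in seconds
    n_lost           -- number of packets missing between the two arrivals
    """
    if n_lost < 1 or min_interarrival <= 0:
        raise ValueError("need at least one lost packet and a positive interarrival time")
    lower = (n_lost + 1) * min_interarrival
    upper = (n_lost + 2) * min_interarrival
    # A gap that matches the "missing slots" suggests the packets were sent
    # over the link but corrupted; anything else is treated as congestion.
    return "wireless" if lower <= gap < upper else "congestion"
```

The rule depends on a single timing pattern, which is exactly the kind of simple assumption that the discussion above warns may not hold when several factors act jointly.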

Available bandwidth estimation
Another way to avoid the packet loss ambiguity is to bypass this problem, that is, use metrics other than packet loss as a signal of network congestion.
Packet pair is one of the effective methods for measuring bandwidth. TCP-Westwood measures ACK pairs to derive the short-term bandwidth and counts the amount of acknowledged data during a time period to obtain a relatively long-term bandwidth estimate [70]. In [71], Wu et al. also used packet pairs to probe the current bottleneck of the network, and adopted the interpacket arrival time to measure the available bandwidth. By feeding these two values into an exponentially weighted moving average (EWMA) filter, the bandwidth estimate can be stable when possible and agile when necessary.
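A minimal sketch of an EWMA filter that is "stable when possible and agile when necessary" is given below. Switching between a small and a large gain according to how far a sample deviates from the current estimate is one simple way to obtain this behavior; the class name, the gains, and the switching threshold are assumptions for illustration and are not the parameters used in [71].

```python
class BandwidthEwma:
    """EWMA bandwidth filter with a stable gain and an agile gain."""

    def __init__(self, stable_gain=0.1, agile_gain=0.6, jump_ratio=0.5):
        self.stable_gain = stable_gain  # weight of a new sample in calm periods
        self.agile_gain = agile_gain    # weight of a new sample after a large change
        self.jump_ratio = jump_ratio    # relative deviation that counts as "large"
        self.estimate = None

    def update(self, sample):
        """Fold one bandwidth sample into the estimate and return the estimate."""
        if self.estimate is None:
            self.estimate = float(sample)
            return self.estimate
        if self.estimate == 0:
            gain = self.agile_gain
        else:
            deviation = abs(sample - self.estimate) / self.estimate
            gain = self.agile_gain if deviation > self.jump_ratio else self.stable_gain
        self.estimate = (1 - gain) * self.estimate + gain * sample
        return self.estimate
```

With these illustrative settings, a sample within 50% of the estimate moves it by only a tenth of the difference, while a larger jump is followed much more quickly.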
Another method of estimating the available bandwidth is based on delay. TCP-Vegas is the best-known solution based upon this idea [72]. TCP-Vegas maintains the minimum observed RTT as the "base RTT" and compares the currently measured RTT with the base RTT. If the difference between them is small, TCP-Vegas deduces that the network is not congested and increases the transmission rate. Otherwise, it reduces the rate to avoid congestion. Based upon this idea, TCP-Veno proposed a packet loss differentiation method which assumes that only packet losses occurring while the RTT varies greatly are congestion losses [73]. A serious open problem for delay-based available bandwidth measurement is how to achieve fairness with TCP.
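The delay-based decision can be sketched as follows. The comparison of expected and actual throughput is expressed as an estimated backlog of packets queued in the network; the function name and the two thresholds are illustrative assumptions, not values prescribed by [72].

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the next congestion window (in packets) for a Vegas-style sender.

    cwnd     -- current congestion window in packets
    base_rtt -- minimum RTT observed so far, in seconds
    rtt      -- most recent RTT measurement, in seconds
    alpha    -- backlog (packets) below which the window grows (assumed value)
    beta     -- backlog (packets) above which the window shrinks (assumed value)
    """
    expected = cwnd / base_rtt                 # throughput with empty queues
    actual = cwnd / rtt                        # throughput actually achieved
    backlog = (expected - actual) * base_rtt   # estimated packets sitting in queues
    if backlog < alpha:
        return cwnd + 1          # little queuing: probe for more bandwidth
    if backlog > beta:
        return max(cwnd - 1, 1)  # queues are building: back off before loss occurs
    return cwnd
```

Because the decision is driven by RTT alone, any delay variation that is unrelated to congestion, such as link-layer retransmission, is interpreted as queuing.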

Delay variation and estimation
In order to alleviate packet losses due to transmission errors in wireless networks, ARQ is often adopted in modern wireless systems. ARQ introduces large delay variation in data transmission, which may cause inaccurate estimation of RTT and RTO. This in turn degrades the performance of window-based congestion control solutions [74]. Chan and Ramjee [75] proposed using the receiver's window field to convey the current wireless channel conditions to the TCP sender, together with an ACK regulator that manages (postpones) the release of ACKs to the sender in order to absorb the delay variation. Model-based streaming protocols adjust the sending rate based on the estimated packet loss ratio and RTT. To reduce reverse-path traffic, many streaming protocols send only a single acknowledgement back during a predefined period of time to measure the RTT. However, in a wireless environment, the aforementioned delay fluctuation causes a dramatic variation in the measured RTT. As a result, a rate estimate based on RTT may be inaccurate and fluctuate greatly, which is unfavorable to delay-sensitive real-time multimedia streaming applications. Yang et al. [33] proposed a streaming protocol that collects as many RTT observations as possible during a period of time without increasing the reverse traffic. This method smooths the rate variation of the congestion control behavior.
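The effect of delay variation on RTT and RTO estimation can be illustrated with the standard smoothed estimator used by TCP (RFC 6298). This sketch is not the protocol of [33] or [75]; it only shows how jitter inflates the timeout. The class name is hypothetical, and the RFC's minimum-RTO clamp is deliberately omitted to keep the effect visible.

```python
class RttEstimator:
    """Smoothed RTT and RTO estimator in the style of RFC 6298."""

    def __init__(self, alpha=0.125, beta=0.25, k=4):
        self.alpha = alpha   # gain for the smoothed RTT
        self.beta = beta     # gain for the RTT variation
        self.k = k           # multiplier of the variation term in the RTO
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Fold one RTT sample (seconds) into the state and return the RTO."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.srtt + self.k * self.rttvar
```

Feeding the estimator a constant 100 ms RTT drives the RTO toward 100 ms, whereas alternating 50 ms and 150 ms samples with the same mean keeps the RTO several times higher.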

Channel estimation
To date, a large number of wireless multimedia studies have focused on robust media delivery over wireless channels [76,77,78]. Most of these works assume perfect knowledge of the fluctuating wireless channel conditions, such as BER, delay, and bandwidth. However, it is very complicated to convert the physical channel QoS parameters into QoS requirements that multimedia applications can understand. For example, the raw physical-layer data rate is not equal to the rate obtained at the link layer once header and modulation overheads and channel decoding efficiency are taken into account. Moreover, it is not obvious how to achieve end-to-end optimality for multimedia delivery even when a single layer (e.g., the physical or link layer) can reach its optimum. The physical wireless channel can be characterized by large-scale loss and small-scale fading models [79]. Large-scale loss models predict the physical channel variation caused by user location and background interference level, while small-scale fading models statistically describe the fluctuations in radio signal strength over very short time durations or very short travel distances [79]. The physical channel state can be characterized by a finite-state Markov chain, which classifies the physical status into several states in terms of different BERs or data rates [80,81]. A series of works by Zorzi et al. [81,82] shows that a Markov model is a good approximation for block transmission over fading wireless channels.
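The simplest instance of such a finite-state Markov chain has two states, "good" and "bad," with a transition probability in each direction. The sketch below simulates this two-state chain and computes its stationary probability of being in the bad state; the function names and the example transition probabilities are assumptions for illustration and are not taken from [80,81,82].

```python
import random


def stationary_bad_probability(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of time a two-state Markov channel spends in the bad state."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


def simulate_channel(n_frames, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans, one per frame, True when the channel is bad."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    bad = False                # start in the good state
    states = []
    for _ in range(n_frames):
        # Probability of leaving the current state during this frame.
        leave = p_bad_to_good if bad else p_good_to_bad
        if rng.random() < leave:
            bad = not bad
        states.append(bad)
    return states
```

For example, with a good-to-bad probability of 0.05 and a bad-to-good probability of 0.2 per frame, the chain spends about 20% of the frames in the bad state, and bad frames cluster in bursts with a mean length of five frames.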
Because different layers use different QoS metrics, as discussed above, researchers have proposed moving the physical channel models up to the link layer, that is, converting physical QoS parameters into QoS metrics that applications can understand. Effective capacity (EC) theory [83] models a wireless link by two EC functions, the probability of a nonempty buffer and the QoS exponent of a connection, which characterize the queuing behavior at the link layer. The EC model is therefore a powerful tool that can readily provide multimedia QoS metrics such as delay bound and available bandwidth [84]. Following the results of [82], Q. Zhang et al. adopted a first-order Markov process to model the RLC-layer frames in 3G wireless channels [26]. Based upon the characteristics of 3G wireless channels [82,85], and considering the interaction between UDP and the RLC protocol, they derived the available UDP throughput from physical channel parameters.

Header compression
As stated above, the IETF has set up the ROHC working group (WG) to address header compression issues. The goal of the ROHC WG is to develop header compression schemes that perform well over links with high error rates and long link RTTs. In the ROHC framework, relevant information from past packets is maintained in a context. The context information is used to compress (decompress) subsequent packets. The compressor and decompressor update their contexts upon certain events.
Impairment events may lead to inconsistencies between the contexts of the compressor and the decompressor, which in turn may cause incorrect decompression. Thus, an ROHC scheme needs mechanisms for avoiding context inconsistencies, as well as mechanisms for restoring consistency once the contexts have diverged.
Owing to the limited packet loss robustness of the existing real-time traffic compression scheme, CRTP, and the demands of the cellular industry for an efficient way of transporting voice over IP over wireless, the ROHC WG has designed a scheme for IP/UDP/RTP headers [86], which are large, especially compared with the payloads often carried by packets with such headers. This scheme has been accepted as IETF RFC 3095. ROHC-RTP has become a very efficient, robust, and capable compression scheme, able to compress the headers down to a total size of only one octet. Transparency is also preserved to a very great extent, even when residual bit errors are present in compressed headers delivered to the decompressor.
Work on TCP header compression, called ROHC-TCP, has recently started in the ROHC WG. Compared with previous works such as CTCP (RFC 1144) and IPHC (RFC 2507), ROHC-TCP focuses on robust header compression over lossy links. In addition, ROHC-TCP tries to improve compression efficiency by taking optional fields (e.g., timestamp, SACK) into account [87]. The TCP-aware robust header compression (TAROC) scheme [88] can significantly improve compression efficiency over unidirectional links by using a congestion window tracking mechanism and a window-based least significant bit (LSB) encoding technique.
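The idea behind LSB encoding can be sketched as follows: the compressor sends only the k low-order bits of a header field, and the decompressor recovers the full value by choosing the unique candidate inside an interpretation interval anchored at a reference value from its context. The sketch uses a single reference value and follows the interval definition in RFC 3095; window-based LSB, as used in TAROC, additionally maintains a window of candidate references, which is not shown. Function names are hypothetical.

```python
def lsb_encode(value, k):
    """Compressor side: keep only the k least significant bits of a field."""
    return value & ((1 << k) - 1)


def lsb_decode(bits, reference, k, offset=0):
    """Decompressor side: recover the field from k bits and a context reference.

    The interpretation interval is
    [reference - offset, reference - offset + 2**k - 1].
    Exactly one value in it has the received low-order bits.
    """
    lower = reference - offset
    mask = (1 << k) - 1
    # Distance from the lower edge to the first value with matching low bits.
    return lower + ((bits - lower) & mask)
```

With a reference of 1000 and k = 4, a sequence number of 1005 is sent as the four bits 1101 and decoded correctly. A value of 1020 lies outside the interval and would be decoded as 1004, which illustrates why the compressor must pick k according to how far the field may have moved from the decompressor's context.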

Application-adaptive ARQ and priority-based scheduling
Error control can be performed both at the application/transport layer and at the link layer. In general, link-layer ARQ is more efficient than ARQ at the application/transport layer. First, ARQ across a single link has a shorter control loop than ARQ at an upper layer, and can thus recover lost data more quickly. Second, link-layer ARQ can operate on frames that are much smaller than IP datagrams, and is therefore more efficient in terms of error/loss recovery. Third, a link-layer ARQ procedure is able to use local knowledge that is not available to end hosts [89]. However, optimal performance can hardly be achieved with link-layer ARQ alone. Running ARQ both at the link layer and end to end can lead to undesirable competition in data retransmission. Such "contention" between retransmissions at different layers results in severe performance degradation in transport protocols [64], and may cause two or more copies of one packet to reside in intermediate routers at the same time.
Link ARQ schemes can be classified, according to their willingness to retransmit lost frames to ensure reliable data delivery, into perfectly persistent, high-persistent, and low-persistent ARQ. These schemes differ in delay and reliability, which has motivated upper-layer-aware link ARQ for applications with different QoS requirements. The idea is that applications signal their QoS requirements to each link along the path on a per-packet basis. Link-layer ARQ can then adapt its behavior to the different QoS requirements [26]. The effects of the adaptive ARQ are implicitly fed back to applications through packet drops or delay.
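A minimal sketch of per-packet ARQ persistence is given below. Each frame carries a retransmission limit derived from its QoS class, and the link gives up once that limit is exhausted. The class names and the numeric limits are hypothetical and are not taken from [26]; the per-attempt outcomes are supplied by the caller so that the example is deterministic.

```python
# Hypothetical mapping from QoS class to retransmission limit (None = unlimited).
PERSISTENCE = {"perfect": None, "high": 6, "low": 1}


def link_arq(attempt_outcomes, max_retransmissions):
    """Try to deliver one frame and report (delivered, attempts_used).

    attempt_outcomes    -- iterable of booleans, True when an attempt succeeds
    max_retransmissions -- retransmissions allowed after the first attempt,
                           or None for perfectly persistent ARQ
    """
    attempts = 0
    for ok in attempt_outcomes:
        attempts += 1
        if ok:
            return True, attempts
        if max_retransmissions is not None and attempts > max_retransmissions:
            return False, attempts  # limit reached: drop the frame
    return False, attempts          # ran out of transmission opportunities
```

For a frame whose first two attempts fail and whose third succeeds, the low-persistent setting drops it after two attempts, while the other two settings deliver it after three, trading extra delay for reliability.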
In addition to adaptive ARQ, it is well known that priority-based packet scheduling can also support differentiated QoS services. In priority-based schedulers, packets are grouped into several classes with different priorities according to their QoS requirements. Packets in a class with higher priority are more likely to be transmitted first, and packets in the same class are served in FIFO order. With this priority scheduling mechanism, each QoS class receives some form of statistical QoS guarantee. Traditional priority packet-scheduling algorithms based upon the generalized processor-sharing (GPS) fluid model, such as weighted fair queueing (WFQ) [90], inherently couple delay bound and bandwidth requirement, and therefore lack flexibility in QoS provisioning. Liao and Zhu [91] proposed a priority packet-scheduling algorithm that relaxes the packet service order. In [84], the authors employed the simplest strict (nonpreemptive) prioritized scheduling policy and derived the rate constraints for video substreams with different QoS requirements according to EC theory [83]. EC theory can also be applied to a QoS-provisioning scheme that exploits multiuser diversity [92]. The advantage of this scheme is that it can achieve capacity gain under strict QoS requirements, where traditional multiuser diversity scheduling cannot be applied directly.
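The strict (nonpreemptive) priority policy with FIFO service inside each class can be sketched with a heap keyed on priority and arrival order. The class and method names are hypothetical, and lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools


class StrictPriorityScheduler:
    """Serve the highest-priority class first and FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def enqueue(self, packet, priority):
        """Queue a packet; a smaller priority number is served earlier."""
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A brief usage example, with made-up packet labels:

```python
scheduler = StrictPriorityScheduler()
for label, priority in [("b1", 2), ("v1", 0), ("b2", 2), ("v2", 0), ("a1", 1)]:
    scheduler.enqueue(label, priority)
order = [scheduler.dequeue() for _ in range(5)]  # v1, v2, a1, b1, b2
```

Under strict priority a busy high-priority class can starve the lower classes, which is one reason rate constraints per class, such as those derived in [84], are needed.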

CONCLUSION
This paper presents a framework with a cross-layer architecture for multimedia delivery over wireless Internet. We review various media delivery techniques at the application layer, transport layer, and link layer that aim to achieve good user-perceived quality of multimedia data. More specifically, network-aware adaptive media source coding, dynamic estimation of the varying channel, adaptive and energy-efficient application and link-level error control, efficient congestion control, header compression, adaptive ARQ and priority-based scheduling, as well as QoS-adaptive proxy caching are explicitly reviewed within the architecture.
Cross-layer design for multimedia delivery over heterogeneous wireless Internet presents many challenges and opportunities, and many issues still need further investigation. This paper mainly focuses on QoS support for unicast media streaming. Efficient QoS provisioning for multicast media streaming is an area that requires considerable further effort [29]. Mobility also has a significant impact on perceived QoS during multimedia streaming. How to maintain acceptable media quality when handoff happens is another research direction [93]. Enabling media streaming over ad hoc networks is more challenging than over traditional wireless networks, where mobile hosts communicate with a base station (BS). In wireless ad hoc networks, dynamically changing topology and interference result in even greater QoS fluctuation. Recently, multipath media streaming and QoS-aware MAC design have emerged as two promising cross-layer approaches to providing QoS support for ad hoc networks [94,95].

Figure 1: A general architecture for multimedia delivery over wireless Internet.

Figure 3: The cross-layer architecture for multimedia delivery over wireless Internet.