Adaptive access and rate control of CSMA for energy, rate, and delay optimization

In this article, we present a cross-layer adaptive algorithm that dynamically maximizes the average utility of a link. A per-stage utility function is defined for each link of a carrier-sense multiple-access-based wireless network as a weighted concave function of energy consumption, smoothed rate, and smoothed queue size; by selecting the weights, we can therefore control the trade-off among these quantities. Using dynamic programming, the utility is maximized by dynamically adapting channel access, modulation, and coding according to the queue size and the quality of the time-varying channel. We show that the optimal transmission policy has a threshold structure in the channel state: the optimal decision is to transmit when the wireless channel state is better than a threshold. We also provide a queue management scheme in which the arrival rate is controlled based on the link state. Numerical results illustrate the characteristics of the proposed adaptation scheme and highlight the trade-off among energy consumption, smoothed data rate, and link delay.


Introduction
In wireless networks, mobile devices are usually battery powered with a limited amount of energy. Therefore, minimization of energy consumption while maintaining the quality of service in the network is crucial. This must be accomplished by adapting the transmission parameters to the system dynamics and to the time-varying channel of the links. In this article, we present a cross-layer adaptive algorithm that dynamically maximizes the average utility function of a carrier sense multiple access (CSMA)-based wireless link.
The benefits of such adaptation schemes have been shown in prior works in terms of energy efficiency [1][2][3][4][5][6][7][8], where various control algorithms trade off among goals such as energy consumption, average delay, packet dropping probability, and bit error rate, and dynamically adapt the transmission parameters to the channel and system state. The aforementioned works assume point-to-point links with dedicated channels. However, in data transmission networks, where data are generated at random time instances, random access schemes are used to exploit channel resources efficiently. In such systems, there are more users than available channels, and at any given time only a subset of users can access the channels; the optimality of the channel access decision is therefore crucial in random access networks. Random access is widely used in ad hoc networks as it can be implemented in a distributed manner. Wireless local area networks (WLANs) and practical personal or sensor networks usually use random access control in their ad hoc operation mode [9,10]. Moreover, it has recently been shown that CSMA protocols can achieve maximum stable throughput [11] while keeping queuing delay bounded [12], and that they can achieve a collision-free WLAN [13].
Optimization of random access networks was first proposed in order to achieve single-hop proportional fairness in slotted ALOHA networks [14]. Different types of fairness are also considered, and random access control is modeled as a utility maximization problem, in [15]. In addition, the cross-layer optimization of random access control and the transmission control protocol is solved as a network utility maximization problem in [16]. Newton-like algorithms are provided for energy and throughput optimization with an end-to-end delay constraint in multi-hop random access networks [17]. However, the aforementioned articles use static transmission probabilities and ignore the opportunity for time-varying, adaptive control.
On the other hand, queue-based random access algorithms were studied in [18], where access probabilities are adapted based on queue sizes. Stability of the proposed algorithms was verified, and their delay performance was shown to surpass that of fixed optimization algorithms. A heuristic differential queue-based scheduling algorithm was also proposed in [19] and shown, through experimental results, to outperform 802.11. However, such queue-based algorithms are inappropriate for fading channels and prioritize links with low channel quality, which results in low energy efficiency [20].
In this article, we propose cross-layer adaptive algorithms, derived from dynamic programming, for distributed optimization of the links in CSMA-based wireless networks operating in mobile environments. As a performance metric, we define the per-stage utility of the link as a weighted concave function of energy consumption, smoothed data rate, and smoothed queue size in the link, where the weights are assigned based on the desired trade-off among them. The algorithms maximize the average utility by dynamically adapting the channel access decision and the transmit data rate (by selecting different modulation and coding schemes) according to the queue size of the link and the availability and quality of the time-varying channel (the channel state is assumed to be known at the transmitter). Both finite-time-horizon (FTH) and infinite-time-horizon (ITH) problems are considered. In the first case, the utility sum is maximized over a finite time period, whereas in the second case, the long-term average utility is maximized.
We consider a mobile environment with a frequency-flat time-varying channel response. This requires suitable models of the wireless channel dynamics. Here, we use finite-state Markov chains (FSMC) to model the channel dynamics, so that the channel time correlation at the network links is partially exploited by the proposed algorithms. Although the physical wireless channel is inherently non-Markovian, it has been shown that stationary Markov chains can capture the essence of the channel dynamics [21]. Many transmission adaptation algorithms are based on first-order Markov channel models [1,2]. Here, we consider first- and second-order Markov chains to model the characteristics of network links.
The numerical simulations show the benefits of the proposed adaptation algorithms in terms of energy efficiency, and highlight the trade-off among energy consumption, smoothed data rate, and delay in the links of a CSMA network. They also show that using a suitable Markov model for the wireless channel improves the performance of the adaptation algorithm, mainly for slowly fading channels. Algorithms based on uncorrelated, first- and second-order Markov models are considered, and their performance is compared through simulations.
The rest of the article is organized as follows. Section 2 presents the system model; in particular, it describes the model of the network links as well as the wireless channel models. In Section 3, the per-stage utility of the links is defined. The utility sum maximization over a finite time period is then formulated as an optimal finite-horizon control problem; similarly, the long-term average utility maximization is formulated as an optimal infinite-horizon control problem. Section 4 uses dynamic programming to compute the optimal adaptation policies for the problems formulated in Section 3. Section 5 investigates structural properties of the optimal solution. Numerical results and comparisons are described in Section 6. Finally, Section 7 concludes the article.

System model
In this section, we describe the model of the random access links as well as wireless channel models.

Link model
We consider an ad hoc network whose links use a CSMA protocol similar to the one provided in [22], which prevents collisions among links and also resolves the hidden and exposed node problems of wireless networks [23]. As shown in Figure 1, we assume a slotted transmission model where each timeslot, of duration T s , contains both a data slot and a number of control mini-slots. When the link has a packet to transmit, it waits for a random number W of control mini-slots, and if no other link has reserved the channel earlier, it sends a short request-to-send packet to reserve the channel. Then, the potential receiver, which also perceives that the channel is idle, responds with a clear-to-send (CTS) packet that allows the transmitter to transmit and informs possible interfering nodes that the channel will be used. Once the transmitter receives the CTS, it sends its packet in the data slot.
Timeslot k is defined as the time interval [(k - 1)T s , kT s ). We use I k to denote the channel access decision, where I k = 1 indicates that the link has decided to access the channel in the kth timeslot. The control policy adapts I k in each slot based on the system and channel state. Also, B k = 1 indicates that the link should delay its transmission because the channel is already occupied by another link. We model B k as a Bernoulli process with P B = Pr{B k = 1} the channel occupancy probability. The Bernoulli distribution is widely used to model the statistics of B k in CSMA networks [24].
The link has a queue of maximum size L. Let q k denote the number of packets in the queue at the kth timeslot, which is assumed to be known at the transmitter. Obviously, I k = 0 when q k = 0. r k denotes the controlled number of packets that arrive at the queue in slot k, which we will call the arrival rate hereafter. The value of r k should be chosen both to provide a suitable rate for the source data and to prevent delay due to backlog, by adapting the source rate to the link state [25]. To avoid buffer overflow, the arrival rate is constrained by r k ≤ (L - q k ). The queue update equation is

q k+1 = q k - I k (1 - B k ) min(C k , q k ) + r k , (1)

where C k indicates the maximum number of packets that can be transmitted during the kth data slot. C k depends on the channel state, and it is assumed to be known at the transmitter at the beginning of each timeslot. We call the data that the physical layer transmits in one timeslot a frame, and the link consumes a constant energy e for the transmission of a frame in the data slot. Thus, the energy consumed in the kth timeslot is E k = eI k (1 - B k ).
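As an illustration, the per-slot dynamics above can be sketched in a few lines of Python; the rule that min(C k , q k ) packets leave the queue on a successful access is our reading of the model, and `step` is a hypothetical helper, not code from the article:

```python
def step(q, I, B, C, r, L=12):
    """One timeslot of the link model: q packets queued, access decision I,
    occupancy B, channel capacity C (packets/frame), controlled arrivals r."""
    assert 0 <= r <= L - q          # arrival-rate constraint prevents overflow
    sent = I * (1 - B) * min(C, q)  # packets leave only on a successful access
    q_next = q - sent + r
    e = 1.0                          # energy per frame (constant e)
    E = e * I * (1 - B)              # energy consumed in this slot
    return q_next, E
```

Note that when the channel is occupied (B = 1), a transmission attempt neither drains the queue nor spends energy.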
We also consider the exponentially weighted moving averages (EWMA) of the queue occupancy, q̄ k , and of the arrival rate, r̄ k , as link state variables, defined as

q̄ k+1 = θ q q̄ k + (1 - θ q )q k , (2)

r̄ k+1 = θ r r̄ k + (1 - θ r )r k . (3)

Note that q̄ k and r̄ k can be viewed as "smoothed" measures of the delay and data rate in the link. The parameters θ q and θ r determine the time scale over which the smoothing is performed: the smaller the value of θ r or θ q , the shorter the time period of the moving average (smoothing). The values of θ r and θ q are determined based on the tolerance of the applications to delay and data rate variations in the link. The random early detection protocol uses the EWMA of the delay (q̄ k ) as a criterion for congestion control [26]. In addition, the EWMA of the rate (or smoothed rate), r̄ k , has been used in [27,28] as a measure of the quality of service. EWMA is also used as a metric in statistical quality control [29].
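A minimal sketch of the EWMA update (the placement of θ versus 1 - θ is our assumption, chosen so that a smaller θ indeed gives a shorter smoothing window):

```python
def ewma(prev, sample, theta):
    """EWMA update: theta weights the history, (1 - theta) the new sample,
    so a smaller theta forgets the past faster (shorter smoothing window)."""
    return theta * prev + (1 - theta) * sample
```

With θ = 0.7, for instance, a constant input is approached geometrically at rate 0.7 per slot.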

Channel model
We consider a frequency-flat block-fading channel, where the channel remains constant during each timeslot and can change between consecutive timeslots. We assume that the duration of each timeslot (T s ) is less than the coherence time of the channel; hence, channel responses at different timeslots can be correlated. The channel power gain at the kth timeslot is denoted by g k . Since we assume constant transmit power, the received signal-to-noise ratio (SNR) in the link for the kth timeslot is proportional to g k . The fading range 0 ≤ g is partitioned into M disjoint regions, where the jth region is defined as R j = {g : A j ≤ g < A j+1 }, with A 1 = 0 and A M+1 = ∞. The channel for the kth timeslot is in state j if g k ∈ R j . The values of A j are selected according to the adaptive modulation and coding as follows. Consider that the transmitter has a set of modulation and coding schemes {Q 1 , Q 2 , ... , Q M } to select from in each timeslot. We select A j , j = 2, ... , M, such that if the channel is in state j, the transmitter can use Q j and ensure that the frames transmitted with this scheme have error probability less than FER th , a target threshold for the frame error rate (FER). Let C = {Ĉ j |j = 1, ..., M} denote the set of numbers of packets that can be transmitted in the corresponding channel states; if g k ∈ R j then C k = Ĉ j , where C k is the number of packets that can be transmitted in the kth timeslot. Note that the packet error rate will be below the same threshold, i.e., PER th = FER th , since (a) the adaptive algorithm applies the different Q i schemes so that the transmitter ensures the same error threshold for all frames, and (b) if a frame transmission is unsuccessful, all packets in the frame are lost. Therefore, the ratio of lost packets to the total number of packets equals the ratio of erroneous frames to the total number of frames, regardless of the channel state.
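The gain-to-state quantization can be sketched as follows; the boundary values and packet counts are the ones used in the numerical section later in the article, and the helper names are ours:

```python
import bisect

A = [0.0, 3.8, 7.77, 33.1]   # region boundaries A_1..A_M (A_{M+1} = infinity)
C_HAT = [0, 1, 2, 4]         # packets per frame in each channel state

def channel_state(g):
    """Map the channel power gain g (proportional to SNR) to its region index."""
    return bisect.bisect_right(A, g) - 1

def capacity(g):
    """Number of packets transmittable in one frame at gain g."""
    return C_HAT[channel_state(g)]
```

For example, a gain in [7.77, 33.1) falls in the third region, where two packets fit in a frame.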
Pr{packet error in a frame of size Ĉ j } = FER(Ĉ j ) = FER th . Subsequently, we consider three models for the random process C k , with diverse degrees of complexity.

Uncorrelated model
In this model, the channel responses at different timeslots are assumed uncorrelated, so

Pr{C k+1 = Ĉ j |C k } = Pr{C k+1 = Ĉ j } = P j ,

where P j is the probability of channel state R j . This simple model may be accurate for fading channels that exhibit high time variability. It is also the appropriate model when there is no prior information about the channel time correlation.

First-order Markov model
To model the time correlation of the channel, we use an M-state FSMC [30] with time discretized to T s and transition probabilities

P i,j = Pr{g k+1 ∈ R j |g k ∈ R i }. (4)

Accordingly, the random process C k is modeled with the same M-state FSMC, so

Pr{C k+1 = Ĉ j |C k = Ĉ i } = P i,j . (5)

The transition probabilities depend on the normalized Doppler frequency f d T s , which determines the rate of variation of the channel with respect to the timeslot duration, where f d is the channel Doppler frequency. Although the physical wireless channel is inherently non-Markovian, it has been shown that an FSMC can capture the essence of the channel dynamics when the number of regions/states (M) is low and the channel fades slowly enough (see for example [21] and references therein). Note that the uncorrelated model can be viewed as a particular case of the FSMC with P i,j = P j , ∀i.
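A quick way to see the time correlation an FSMC introduces is to sample a trajectory of the channel state from a transition matrix; the two-state matrix below is a toy example of ours, not from the article:

```python
import random

def simulate_fsmc(P, n, s0=0, seed=1):
    """Sample a state trajectory of length n from a first-order FSMC
    with transition matrix P (rows sum to one)."""
    random.seed(seed)
    s, traj = s0, []
    for _ in range(n):
        traj.append(s)
        u, acc = random.random(), 0.0
        for j, p in enumerate(P[s]):
            acc += p
            if u < acc:
                s = j
                break
    return traj

# A slowly varying two-state chain: states persist, so C_k is time correlated.
P = [[0.95, 0.05],
     [0.05, 0.95]]
```

With these persistence probabilities, consecutive states agree about 95% of the time, mimicking a slowly fading channel (small f d T s ).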

Second-order Markov model
In order to model the dynamics of C k more accurately, we also consider second-order FSMC channel models, in which C k+1 depends on both C k and C k-1 .
In this article, we use the so-called Cartesian product method [21] for the second-order models. We will investigate the effect of the FSMC order on the performance of the resulting algorithm through numerical results. Note that the first-order Markov model can be considered as a special case of the second-order model with P l,i,j = P i,j for all l.
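The Cartesian product construction can be sketched by lifting the second-order probabilities to a first-order chain on pair states (C k-1 , C k ); this is a generic illustration of ours, not the specific construction of [21]:

```python
from itertools import product

def lift_to_first_order(P2, M):
    """Turn second-order transition probs P2[l][i][j] = Pr{C+ = j | C- = l, C = i}
    into a first-order chain on pair states (C-, C) (Cartesian product method)."""
    pairs = list(product(range(M), repeat=2))
    idx = {p: n for n, p in enumerate(pairs)}
    Q = [[0.0] * len(pairs) for _ in pairs]
    for (l, i) in pairs:
        for j in range(M):
            Q[idx[(l, i)]][idx[(i, j)]] = P2[l][i][j]  # (l, i) -> (i, j)
    return Q
```

A pair state (l, i) can only move to pair states whose first component is i, which is what makes the lifted chain equivalent to the second-order one.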

Problem formulation
We consider a wireless link in a CSMA network that aims to optimize its transmission rate, energy consumption, and delay. We distinguish two dynamic optimization problems: the FTH and the ITH problem. In the FTH problem, the performance of the link is optimized over a finite number of timeslots, whereas in the ITH problem the link performance is optimized over an infinite number of timeslots. Next, they are formulated as dynamic programming problems.

Finite time horizon
We define a utility maximization problem over N timeslots or stages as follows:

maximize over π: E[ Σ_{k=1}^{N} g(s k , μ k ) + g N+1 (s N+1 ) ], (6)

where the expectation is taken over the random processes C k and B k . The function g(s k , μ k ) is the utility per stage and is a measure of the quality of service of the link in each timeslot. It depends on the action vector μ k = (I k , r k ) and on the system state vector. We consider a second-order Markov model for C k and include the component C k-1 in the state vector s k = (q k , q̄ k , r̄ k , C k , C k−1 ). Note that the first-order model can be considered as a special case with s k = (q k , q̄ k , r̄ k , C k ). Considering ϕ(·) as the state update function, we can write s k+1 = ϕ(s k , μ k , C k+1 , B k ). In (6), g N+1 is the final-stage utility, which depends only on the final state of the system, s N+1 , and can include limitations or penalties on the final state of the system.
Here we consider a special form of the per-stage utility function in order to clarify how it controls the system performance:

g(s k , μ k ) = U(r̄ k ) − aV(q̄ k ) − bE k , (7)

where U(·) and −V(·) are suitable continuous, concave functions, and the parameters a and b control the trade-off between rate, energy, and delay in the utility function. A similar formulation of the per-stage utility is used in [27,28] for multi-period utility maximization, but queue management, and thus queue sizes, were not considered there.
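With the concrete choices used later in the numerical section (U = log(ε + r̄), V = q̄², and weights a = 0.005, be = 1), the per-stage utility can be sketched as follows; the function name and the exact arrangement of the terms of (7) are our reconstruction:

```python
import math

def utility(r_bar, q_bar, I, B, a=0.005, be=1.0, eps=1e-3):
    """Per-stage utility (7) with U = log(eps + r_bar), V = q_bar**2,
    and energy term be * I * (1 - B) (be is the product b*e)."""
    return math.log(eps + r_bar) - a * q_bar**2 - be * I * (1 - B)
```

It is increasing in the smoothed rate, decreasing in the smoothed queue size, and pays the energy price only when the link actually transmits on a free channel.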
The number of packets remaining in the queue at the final stage can be penalized with a price h as follows:

g N+1 (s N+1 ) = U(r̄ N+1 ) − aV(q̄ N+1 ) − h q N+1 . (8)

Infinite time horizon
In this case, we maximize the average utility per stage, i.e.,

maximize over π: lim N→∞ (1/N) E[ Σ_{k=1}^{N} g(s k , μ k ) ], (9)

where the action and state vectors as well as the per-stage utility function are defined as in the FTH problem. We consider both the first- and second-order models for the channel state by applying the appropriate form of s k .

Optimal adaptive control
To maximize the FTH or ITH utility functions, the controller should decide the optimal actions μ * k (s k ) at the beginning of each timeslot as a function of the system state s k . Note that the decision must be causal, since future system states are unknown due to the randomness of the channel state (C k ) and occupancy (B k ). In this section, using the dynamic programming (DP) algorithm [31], we derive algorithms that compute the optimal control functions for the FTH and ITH problems. It is important to remark that the resulting optimal control functions are computed and stored offline. Then, they are used online to dynamically adapt the actions to the system state. As described earlier, the system state definition can support uncorrelated, first- and second-order channel models, so we do not limit the solution to any specific channel model.

Per stage adaptation to maximize FTH utility
The optimal control policy π* = (μ * 1 (·), ..., μ * N (·)) is the sequence of control functions (one for each timeslot) that maximizes (6). Note that the control functions provide the optimal action for each of the possible system states at the different stages. Using the DP algorithm, the optimal policy π* is obtained from the following backward recursion for k = N, N - 1, ..., 1:

J N+1 (s N+1 ) = g N+1 (s N+1 ),

J k (s k ) = max over μ k of { g(s k , μ k ) + E C k+1 ,B k [ J k+1 (s k+1 ) ] }.

The function J k (s k ) is the maximum expected accumulated utility, achieved under optimal decisions, when the system is in state s k at the kth stage. Thus, J 1 (s 1 ) is the expected total utility for N stages when the initial state is s 1 .
The application of the DP algorithm requires the computation of J k (s k ) for all possible system states s k at each stage and necessitates a finite system state space. Since the state components r̄ k and q̄ k take values from continuous spaces, we discretize them using the finite grids {q̄ d m |m = 1, 2, ..., M q } and {r̄ d m |m = 1, 2, ..., M r }. Then, we can express each non-grid value as a linear interpolation of the nearby grid values:

q̄ = Σ_{m=1}^{M q } w m q (q̄) q̄ d m , r̄ = Σ_{m=1}^{M r } w m r (r̄) r̄ d m ,

where w m q and w m r are non-negative weights with Σ_{m} w m q (q̄) = Σ_{m} w m r (r̄) = 1. It can be shown that if the Lipschitz condition holds for the functions w m q (q̄), w m r (r̄), and g(s k , μ k ), and for the state update functions (1)-(3), the DP solution of the discretized problem converges to the optimal policy of the original continuous problem as the density of the grid increases [32]. For the problem at hand, the utility and state update functions are continuous and thus satisfy the Lipschitz condition. We select w m q and w m r as suitable continuous functions of the state variables, chosen on the basis of geometric considerations as suggested in [31], so that each state is described by the two nearby discrete states. The DP algorithm is then applied over the discretized state space, where the estimate J̃ k+1 (s k+1 ) of J k+1 (s k+1 ) is obtained from its values at the discretized states:

J̃ k+1 (s k+1 ) = Σ_{m=1}^{M q } Σ_{l=1}^{M r } w m q (q̄ k+1 ) w l r (r̄ k+1 ) J k+1 (s d k+1 ).

Furthermore, q k+1 , q̄ k+1 , and r̄ k+1 are given by (1), (2), and (3), respectively. We consider the second-order Markov model by using both C k-1 and C k in the state vector; the other channel models can be considered as its special cases. The solution provided in Equations (17)-(19) is valid for any concave and continuous utility function g.
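The interpolation step can be sketched as follows; `interp_weights` is a hypothetical helper returning the two nearby grid points and their non-negative weights, which sum to one and reproduce the non-grid value:

```python
def interp_weights(x, grid):
    """Weights of the two nearby grid points describing a non-grid value x,
    so that x = sum(w_m * grid[m]) and the weights sum to one."""
    if x <= grid[0]:
        return {0: 1.0}
    if x >= grid[-1]:
        return {len(grid) - 1: 1.0}
    for m in range(len(grid) - 1):
        lo, hi = grid[m], grid[m + 1]
        if lo <= x <= hi:
            w_hi = (x - lo) / (hi - lo)
            return {m: 1.0 - w_hi, m + 1: w_hi}
```

Values outside the grid are clamped to the nearest grid point, which keeps the weights non-negative everywhere.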
Next, we replace g(s k , μ k ) in (17) and (18) with the form provided in (7) and calculate the expectation E C k+1 ,B k [·] using the channel transition probabilities Pr{C k+1 = Ĉ j |C k , C k−1 } and the channel occupancy probability Pr{B k = 1}. The expected accumulated utility and the optimal control functions for k = N, N - 1, ..., 1 become

J k (s d k ) = U(r̄ d k ) − aV(q̄ d k ) + max over μ k of E C k+1 ,B k [ −bE k + J̃ k+1 (s k+1 ) ], (20)

μ * k (s d k ) = arg max over μ k of E C k+1 ,B k [ −bE k + J̃ k+1 (s k+1 ) ]. (21)

Since r̄ d k and q̄ d k are independent of the decision in the kth timeslot, U(r̄ d k ) − aV(q̄ d k ) does not affect the maximization in (21). The expectations in (20) and (21) are sums over all M channel states and the two possible channel occupancy conditions.
The discrete DP algorithm can be executed offline and the resulting optimal policy can be stored in a look-up table available at the transmitter. Then, it will be used online to dynamically adapt the action to the system state.
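To make the backward recursion concrete, here is a deliberately stripped-down version of it: the state keeps only (q, C k ) (the EWMA components are dropped), the channel is uncorrelated, arrivals are fixed at r = 1, and all constants are toy values of ours, not the article's. On this toy model, the structural properties proved in Section 5 (monotonicity in the queue size and a channel-state threshold) can be checked numerically:

```python
import itertools

# Toy constants (ours): queue cap, packets per channel state, uncorrelated
# channel-state probabilities, occupancy probability, and weights.
L_MAX, C_HAT, P_C, P_B = 6, [0, 1, 2], [0.2, 0.5, 0.3], 0.1
A_W, BE, H = 0.05, 1.0, 5.0

def backward_dp(N):
    """Backward recursion J_k from k = N down to 1 over states (q, c)."""
    states = list(itertools.product(range(L_MAX + 1), range(len(C_HAT))))
    J = {s: -H * s[0] for s in states}       # final stage: penalize leftovers
    for _ in range(N):
        Jn, mu = {}, {}
        for (q, c) in states:
            best = None
            for I in ([0] if q == 0 else [0, 1]):
                val = -A_W * q * q           # queue cost (EWMA dropped)
                for B, pb in ((0, 1 - P_B), (1, P_B)):
                    sent = I * (1 - B) * min(C_HAT[c], q)
                    qn = min(q - sent + 1, L_MAX)      # fixed arrivals r = 1
                    cont = sum(pc * J[(qn, cn)] for cn, pc in enumerate(P_C))
                    val += pb * (-BE * I * (1 - B) + cont)
                if best is None or val > best[0]:
                    best = (val, I)
            Jn[(q, c)], mu[(q, c)] = best
        J = Jn
    return J, mu                             # J_1 and the stage-1 policy
```

Running `backward_dp(5)` yields a J 1 that is decreasing in q and a stage-1 policy in which transmitting at some channel state implies transmitting at every better channel state.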

State-based adaptive control to optimize average utility per stage
To solve the ITH problem of (9), we first define the average utility per stage when using policy π and starting from the initial state s as

J π (s) = lim N→∞ (1/N) E[ Σ_{k=1}^{N} g(s k , μ k ) | s 1 = s ]. (22)

We denote the optimal policy by π*, which produces the maximum average utility per stage J*. Both π* and J* are independent of the initial state, since the influence of the utility of the early stages on the average utility vanishes as N → ∞. Moreover, since the utility per stage, the transition probabilities (4), and the state update Equations (1)-(3) are all stationary, the optimal policy is stationary (it does not change from stage to stage). Therefore, it is a single function, μ*(s), that maps system states to actions regardless of the stage.
J*, together with the so-called relative value function h*(s), should satisfy the following Bellman fixed-point equation [31] for every state:

J* + h*(s) = max over μ of { g(s, μ) + E[ h*(s + ) ] }, (23)

where s + denotes the successor of the current state s; considering ϕ(·) as the state update function, s + = ϕ(s, μ, C + , B). The expectation in (23) is over the random processes {B k } and {C k }.
We use a modified relative value iteration algorithm to solve the ITH problem [31]. First, we define a variant of the Bellman operator over any function f as

(B τ f )(s) = (1 − τ )f (s) + τ max over μ of { g(s, μ) + E[ f (s + ) ] }, (24)

where the parameter τ ∈ (0, 1) is a scalar. Then, the following iterative algorithm is used to calculate h (n+1) (s) for all states of the state space in iteration (n + 1):

h (n+1) (s) = (B τ h (n) )(s) − (B τ h (n) )(s'), (25)

where s' is some fixed state. We initialize this algorithm with h (0) (s) = g(s, I = 0, r = 0). Convergence of (25) is guaranteed since the queue and channel states are recurrent [31]. The decision is also updated and finally converges to the optimal decision as n → ∞:

μ (n) (s) = arg max over μ of { g(s, μ) + E[ h (n) (s + ) ] }. (26)

The practical application of (24) requires the state space to be discrete, so we use the same discretization procedure as in Section 4.1. This results in the following modified Bellman operator:

(B τ f )(s) = (1 − τ )f (s) + τ max over μ of { g(s, μ) + E[ Σ_{m=1}^{M q } Σ_{l=1}^{M r } w m q (q̄ + ) w l r (r̄ + ) f (s +d ) ] }. (27)

Therefore, we apply (27) and compute h (n) (s d ) for all possible discrete states. For the uncorrelated and first-order channel models there are (L × M q × M r × M) discrete states, and for the second-order channel model this number is multiplied by M.
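The iteration can be illustrated on a tiny average-utility MDP (two states, two actions; all numbers are ours). The damped operator B τ f = (1 − τ)f + τ T f, with T the usual Bellman operator, matches our reading of (24), and the average utility J* can be read off at the fixed reference state s′:

```python
def relative_vi(g, P, tau=0.5, iters=300, ref=0):
    """Modified relative value iteration: h <- B_tau h - (B_tau h)(ref).
    g[s][a]: per-stage utility; P[s][a][t]: transition probabilities."""
    n_s = len(g)
    h = [0.0] * n_s
    j_star = 0.0
    for _ in range(iters):
        def T(s):  # undamped Bellman operator at state s
            return max(g[s][a] + sum(P[s][a][t] * h[t] for t in range(n_s))
                       for a in range(len(g[s])))
        bt = [(1 - tau) * h[s] + tau * T(s) for s in range(n_s)]
        j_star = bt[ref] / tau   # at the fixed point, (B_tau h)(ref) = tau * J*
        h = [bt[s] - bt[ref] for s in range(n_s)]
    return j_star, h

# Toy chain: in state 0, action 0 stays and earns 1; everything else earns 0.
g = [[1.0, 0.0], [0.0, 0.0]]
P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
```

For this chain, the optimal policy stays in state 0, so J* converges to 1 and the relative value of state 1 converges to -1 (one lost reward behind).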

Structural properties of the optimal solution
In the previous section, we provided DP algorithms that can be applied to find optimal decisions through numerical calculations. In this section, we investigate some structural properties of the solution. We use the following practical assumptions throughout this section.
Assumption 1: The per-stage utility function has the form of (7) and (8), with U(·) and V(·) increasing functions.
Assumption 2: Consider (C − = Ĉ a − , C = Ĉ a ) as the channel states in the previous and current slot, respectively, and (Ĉ b − , Ĉ b ) as another possible pair of channel states in these two slots, where Ĉ a ≤ Ĉ b and Ĉ a − ≤ Ĉ b − . We assume that for every j the following inequality holds for the channel transition probabilities:

Σ_{i=1}^{j} P b − ,b,i ≤ Σ_{i=1}^{j} P a − ,a,i , (28)

where P a − ,a,i is the probability of going from channel states (Ĉ a − , Ĉ a ) to the next state C + = Ĉ i , as defined for the second-order Markov model. Assumption 2 is valid in practice for Markov channels, since (Ĉ a − , Ĉ a ) is supposed to be lower than (Ĉ b − , Ĉ b ), and each side of the inequality is the probability of going to the first j states, i.e., those with the lowest rates. For example, if P b − ,b,1 ≤ P a − ,a,1 , then the inequality holds for j = 1. If the inequality holds for every value of j, then the assumption holds. Based on this assumption, we provide the following lemma.

Lemma 1: If f(C) and g(C) are two increasing functions with f(C) ≤ g(C), C + is the next channel state, and, as in Assumption 2, Ĉ a ≤ Ĉ b and Ĉ a − ≤ Ĉ b − , then

E[ f (C + ) | Ĉ a − , Ĉ a ] ≤ E[ g(C + ) | Ĉ b − , Ĉ b ].

The proof is provided in the Appendix.

Structural properties of FTH solution
The following theorem indicates monotonicity of J k (s k ) versus the state variables.
Theorem 1: J k (s k ) is a decreasing function of q k and q̄ k , and an increasing function of r̄ k and C k , for all values of k.
Proof: We show through induction that, for k = N + 1, ..., 1, we have J k (s k + Δ) ≤ J k (s k ) for any vector Δ that increases q k and q̄ k and decreases r̄ k and C k .
We define μ * k,Δ as the optimal decision for state s k + Δ at stage k, and define

G k (s k , μ k ) = g(s k , μ k ) + E C k+1 ,B k [ J k+1 (s k+1 ) ], (29)

so that J k (s k + Δ) = G k (s k + Δ, μ * k,Δ ). However, μ * k,Δ is not necessarily an optimal decision for state s k , so we have

G k (s k , μ * k,Δ ) ≤ J k (s k ). (30)

Using Assumption 1, g is a monotonic function of the state variables, so

g(s k + Δ, μ * k,Δ ) ≤ g(s k , μ * k,Δ ). (31)

Considering ϕ(·) as the state update function, we define the two possible next states s * k+1,Δ = ϕ(s k + Δ, μ * k,Δ , C k+1 , B k ) and s # k+1 = ϕ(s k , μ * k,Δ , C k+1 , B k ). For known values of C k+1 and B k , using (1)-(3) it is easy to show that s * k+1,Δ and s # k+1 differ by a vector of the same type as Δ, so by the induction hypothesis J k+1 (s * k+1,Δ ) ≤ J k+1 (s # k+1 ). Since B k is independent of the system state, and since J k+1 (·) is an increasing function of C, the functions f (C k+1 ) = E B k [J k+1 (s * k+1,Δ )] and g(C k+1 ) = E B k [J k+1 (s # k+1 )] are increasing with f (C k+1 ) ≤ g(C k+1 ). Applying Lemma 1 with the channel components of s k + Δ and s k in the roles of (Ĉ a − , Ĉ a ) and (Ĉ b − , Ĉ b ), respectively, yields

E C k+1 ,B k [ J k+1 (s * k+1,Δ ) ] ≤ E C k+1 ,B k [ J k+1 (s # k+1 ) ]. (32)

Combining (31) and (32) with the definition of G k in (29), we get

G k (s k + Δ, μ * k,Δ ) ≤ G k (s k , μ * k,Δ ). (33)

Equation (30) together with (33) proves the theorem by showing J k (s k + Δ) ≤ J k (s k ). ■

Assuming an uncorrelated channel model, the following theorem establishes the "threshold structure" of the optimal transmission policy versus the channel state.
Theorem 2: If the optimal access decision in state s k = (q k , q̄ k , r̄ k , C k ) is I * k (s k ) = 1, then for another possible state s ′ k = (q k , q̄ k , r̄ k , C ′ k ) in the same slot with an improved channel state C ′ k ≥ C k , we have I * k (s ′ k ) = 1.

Proof:
Assume μ * k (s k ) = (I k = 1, r k ) and μ * k (s ′ k ) = (I k = 0, r k ) as the optimal decisions for s k and s ′ k , respectively. According to the definition of G k in the proof of Theorem 1, μ * k (s ′ k ) maximizes G k (s ′ k , μ k ), so we have

G k (s ′ k , μ * k (s k )) ≤ G k (s ′ k , μ * k (s ′ k )). (34)

On the other hand, since s k and s ′ k differ only in the channel state, we have g(s k , μ * k (s ′ k )) = g(s ′ k , μ * k (s ′ k )), and by using (I k = 0, r k ) in both states the queue evolves identically for s k and s ′ k , which results in the same next state, s k+1 = s ′ k+1 . Also, for the uncorrelated channel model, the averaging over the next channel state does not depend on the current state; thus

G k (s ′ k , μ * k (s ′ k )) = G k (s k , μ * k (s ′ k )) ≤ G k (s k , μ * k (s k )). (35)

By applying the decision μ * k (s k ) = (I k = 1, r k ) in both states, transmission with the better channel state decreases q and q̄ further, which increases J k+1 according to Theorem 1; therefore

G k (s k , μ * k (s k )) ≤ G k (s ′ k , μ * k (s k )). (36)

Combining (34), (35), and (36) results in

G k (s ′ k , μ * k (s ′ k )) ≤ G k (s ′ k , μ * k (s k )), (37)

which contradicts the optimality of μ * k (s ′ k ) = (I k = 0, r k ). Thus, we should have μ * k (s ′ k ) = (I k = 1, r k ). ■

Note that Theorem 2 may not hold when the channel state is time correlated. For example, consider two possible channel states C k and C ′ k with C k < C ′ k , and assume that the optimal decision is to transmit in the state with channel C k . Also assume that the probability of going from C ′ k to an even better channel state, and from C k to a worse one, is high. Heuristically, in such conditions it may be optimal to transmit when the channel is in state C k (before it degrades) but to defer transmission in state C ′ k (and wait for a better channel).

Structural properties of ITH solution
We provide structural properties of ITH solution in this section through the following theorems. First we show that relative value function, h*(s), is a monotonic function in Theorem 3 and then prove the threshold structure of access decision versus channel state in Theorem 4.
Theorem 3: h*(s) is a decreasing function of q and q̄, and an increasing function of r̄ and C.
Proof: We define Δ = (δ 1 , δ 2 , −δ 3 , −δ 4 , −δ 5 ) with δ i ≥ 0 and show that h*(s + Δ) ≤ h*(s). We also define, for any function f,

G f (s, μ) = g(s, μ) + E[ f (s + ) ],

so that, with μ* the decision that maximizes G f (s, ·), the Bellman operator (24) can be written as

(B τ f )(s) = (1 − τ )f (s) + τ G f (s, μ*). (38)

Taking into account h*(s) = lim n→∞ h (n) (s), we prove through induction that, for every iteration n, h (n) (s + Δ) ≤ h (n) (s). For n = 0 we have h (0) (s) = g(s, I = 0, r = 0), and by Assumption 1 it is clear that h (0) (s + Δ) ≤ h (0) (s). We now assume that h (n) (s) is monotonic and show that h (n+1) (s) is also monotonic. First, we show that (B τ h (n) )(s) is monotonic. Using (38) for states s and s + Δ, with μ* and μ * Δ , respectively, as the maximizing actions, and noting from the definition of g that

g(s + Δ, μ * Δ ) ≤ g(s, μ * Δ ), (39)

we can use an approach similar to the proof of Theorem 1 and apply Lemma 1 to show that

G h (n) (s + Δ, μ * Δ ) ≤ G h (n) (s, μ * Δ ) ≤ G h (n) (s, μ*). (40)

Combining (38) and (40) with the induction hypothesis h (n) (s + Δ) ≤ h (n) (s), we find that (B τ h (n) )(s + Δ) ≤ (B τ h (n) )(s). Using Equation (25) and taking into account that (B τ h (n) )(s') is independent of the state vector, it follows that h (n+1) (s) is also monotonic. ■

Assuming an uncorrelated channel model, the following theorem establishes the existence of a channel-state threshold such that the link should transmit whenever the channel state is better than or equal to that threshold.
Theorem 4: There exists a threshold C th such that for s th = (q = C th , q̄, r̄, C = C th ), with any q̄ and r̄, we have I*(s th ) = 1. Moreover, for any s with C ≥ C th and q ≥ C th , we have I*(s) = 1.
Proof: Assume that in timeslot k we have C k = C max and q k = C max . Transmission at this time has an energy cost of be, but it reduces q by C max , which reduces q̄ by (1 − θ q )C max and also reduces the future costs related to the queue size. However, the transmission of these C max packets at any later timeslot requires the same amount of energy. Thus, it is better to transmit these packets in state s th to reduce the queue size as early as possible and thereby reduce the future queue-related costs. We conclude that if C = C max and q = C max then I*(s) = 1, which proves the existence of C th .
In order to prove the second part of the theorem, we assume s = (q ≥ C th , q̄, r̄, C ≥ C th ) and consider the optimal decisions μ*(s th ) = (I*(s th ), r*(s th )) and μ*(s) = (I*(s), r*(s)) for states s th and s, respectively. If I*(s) = 0, we can show, similarly to the proof of Theorem 2, that this cannot be an optimal transmission policy. ■

Numerical results
For the numerical analysis of the adaptive control algorithms provided in Section 4, we consider a lightweight sensor in a wireless network that may transmit its status using a few bits. In each timeslot, the sensor may send its own packet or forward packets of other sensors. We assume a Rayleigh flat-fading channel and use a set of simple modulation and coding schemes. Note that our adaptive algorithm only requires the FSMC model, which can be found for many practical fading channels [21], and does not depend on the Rayleigh fading assumption or on particular modulation schemes. In this section, however, we consider the modulation schemes (Q 1 , Q 2 , Q 3 , Q 4 ) combined with Reed-Solomon (RS) coding. Note that in each timeslot one frame is transmitted, and the time duration of the frame is identical for the different schemes. Figure 2 illustrates the FER of the aforementioned schemes. Setting 0.01 as the FER threshold, we find the SNR thresholds A j of the fading regions which ensure the required FER limit for (Q 1 , Q 2 , Q 3 , Q 4 ) as {0, 3.8, 7.77, 33.1, ∞}. For example, we use Q 3 while the SNR is between 7.77 (8.9 dB) and 33.1 (15.2 dB). Assuming a Rayleigh fading channel, we use the Markov model proposed in [21,30] to obtain the transition probabilities for a given normalized Doppler frequency. Unless otherwise indicated, we assume the average SNR at the receiver is SNR = 10 dB and the normalized Doppler frequency is f d T s = 0.02. Each frame transmitted in the data slot contains one coded block. Considering a packet length of 47 bits, 0, 1, 2, or 4 packets can be transmitted in a frame depending on the channel state, so C = {0, 1, 2, 4}.
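For reference, the stationary state probabilities and adjacent-state transition probabilities of the Rayleigh FSMC can be computed from the level crossing rate; the formulas below are our reading of the construction cited as [21,30], with the thresholds, average SNR, and f d T s quoted above:

```python
import math

def rayleigh_fsmc(A, snr_avg, fd_ts):
    """Stationary probs P_j and adjacent transition probs for a Rayleigh FSMC.
    A: region boundaries A_1..A_M on a linear SNR scale."""
    bounds = list(A) + [math.inf]
    # Stationary probability of each SNR region under Rayleigh fading
    P = [math.exp(-bounds[j] / snr_avg) - math.exp(-bounds[j + 1] / snr_avg)
         for j in range(len(A))]

    def lcr(x):  # level crossing rate at threshold x, normalized by the slot time
        return math.sqrt(2 * math.pi * x / snr_avg) * fd_ts * math.exp(-x / snr_avg)

    up = [lcr(bounds[j + 1]) / P[j] for j in range(len(A) - 1)]    # P_{j,j+1}
    down = [lcr(bounds[j]) / P[j] for j in range(1, len(A))]       # P_{j,j-1}
    return P, up, down

P, up, down = rayleigh_fsmc([0.0, 3.8, 7.77, 33.1], snr_avg=10.0, fd_ts=0.02)
```

For f d T s = 0.02, the resulting up/down probabilities are small, consistent with a slowly fading channel in which the FSMC mostly stays in its current state.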
Regarding the per stage utility function (7), we use U(r̄ k ) = log(ε + r̄ k ) and V(q̄ k ) = (q̄ k )² to avoid very small rates and large queue sizes. The logarithmic utility is used to provide proportional fairness in the network [14] and to prevent selfish rate maximization of the link; it also provides fairness among multiple flows over a single link [27]. Recently it has been shown, based on experimental results and the Weber-Fechner psychophysical law, that user experience and satisfaction follow logarithmic laws, and quality of experience (QoE) versus rate has been formulated as QoE(r) = log(ar + b) [33]. In the utility function, energy and queue size enter with negative weights in order to minimize energy consumption and delay. Recall that the energy consumed in the kth timeslot is E k = eI k (1 - B k ). Unless otherwise indicated, we use the following parameters in the simulations: θ q = θ r = 0.7, L = 12, a = 0.005, b e = 1, P B = 0.1, ε = 0.001. Using these parameters, Figure 3 shows the selected per stage utility, which is an increasing function of r̄ k and a decreasing function of q̄ k . With the selected parameters the utility function is negative; however, its behavior versus the system state is such that maximizing it maximizes the rate while minimizing energy and delay.
As described in Section 4, the continuous state variables r̄ k and q̄ k must be discretized in order to obtain a finite-state system and a dynamic programming solution. We set M r = 21 and M q = 13 for the discretization; our simulations show that the enhancement achieved by selecting greater values of M r and M q is insignificant. The maximum queue size is L = 12, and the number of packets arriving in each stage, r k , is limited to 4.
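The EWMA smoothing and grid discretization described above can be sketched as follows. The text gives θ = 0.7, M r = 21, M q = 13, and L = 12; the uniform grid spacing, the rate range, and the exact EWMA update form are our assumptions.

```python
def ewma(prev, new, theta=0.7):
    """Exponentially weighted moving average used to smooth the rate
    and queue-size processes; larger theta means longer averaging."""
    return theta * prev + (1.0 - theta) * new

def discretize(x, x_max, m):
    """Map a continuous value in [0, x_max] onto one of m uniform grid
    points, returning the grid index."""
    idx = round(x / x_max * (m - 1))
    return min(max(idx, 0), m - 1)

# The smoothed rate lives on a 21-point grid; the smoothed queue size,
# bounded by L = 12, lives on a 13-point grid.
r_bar = ewma(prev=0.1, new=2.0)            # one EWMA update of the rate
r_idx = discretize(r_bar, x_max=4.0, m=21)  # rate range [0, 4] assumed
q_idx = discretize(6.0, x_max=12.0, m=13)   # queue of 6 out of L = 12
```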

FTH results
As indicated earlier, for the FTH problem we use (8) as the final stage utility function, with h = 5 as the price for packets remaining in the queue, U(r̄ k ) = log(ε + r̄ k ), and r̄ 1 = 0.1. We assume that the initial state of the link is q 1 = q̄ 1 = 0 and consider a flat Rayleigh fading channel. The recursive algorithm (16)-(18) is used to obtain the optimal control policy (over the discretized state space) for different values of N. The transition probabilities of the Markov models are computed as described earlier.
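The recursive algorithm (16)-(18) is not reproduced here, but it follows the standard finite-horizon backward dynamic-programming pattern, which can be sketched generically as below. The state space, action set, per-stage utility, and transition model are placeholders standing in for the discretized link model of the text.

```python
def finite_horizon_dp(states, actions, utility, transition, n_stages, terminal):
    """Generic backward DP: J_N equals the terminal utility, then
    J_k(s) = max_a [ utility(s, a) + E[ J_{k+1}(s') ] ].
    `transition(s, a)` returns (next_state, probability) pairs."""
    J = {s: terminal(s) for s in states}
    policy = []
    for _ in range(n_stages):
        J_new, mu = {}, {}
        for s in states:
            best_a, best_v = None, float("-inf")
            for a in actions(s):
                v = utility(s, a) + sum(p * J[s2] for s2, p in transition(s, a))
                if v > best_v:
                    best_a, best_v = a, v
            J_new[s], mu[s] = best_v, best_a
        J = J_new
        policy.append(mu)  # stage policies, ordered from last stage to first
    return J, policy
```

As a sanity check, a toy chain with utility equal to the action and self-loop transitions accumulates utility 1 per stage under the optimal policy.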
The optimal control policies are then used in Monte Carlo simulations, over the channel response g k and the channel occupation B k processes, to maximize J 1 (s 1 ), the sum of the utilities of the N stages. Figure 4 illustrates J 1 (s 1 ) as a function of the number of timeslots N, for different channel correlation models and two values of average SNR. The figure shows that performance is enhanced by exploiting the channel correlation through the FSMC models, mainly for large values of N; it also shows that the use of the second-order FSMC is not worthwhile in these cases. For N > 50, J 1 (s 1 ) grows almost linearly with N, with a slope that depends on the channel correlation model and the average SNR.
We have also investigated the dependence of performance on the initial state of the link. Figure 5 illustrates the average utility, J 1 (s 1 )/N, versus the initial EWMA rate r̄ 1 for different values of N. As expected, the higher the initial EWMA rate, the higher the average utility. The figure also shows that sensitivity to the initial state decreases as the number of slots increases.
The size of the grid used for discretization, M q and M r , can affect the performance of the system. Table 1 shows the FTH performance of the algorithm for different values of M q and M r with N = 40. We can see that the enhancement achieved by selecting values greater than M r = 21 and M q = 13 is negligible.

ITH results
We use the modified relative value iteration algorithm (25), with τ = 0.9, to find the optimal control policies for the infinite time horizon problem. Unless otherwise indicated, the following results consider the first-order FSMC channel model. In each iteration the algorithm computes new values of I (n) (s) and r (n) (s) for all possible states, and it finally converges to the optimal control policy (for the discretized problem) μ*(s) = (I*(s), r*(s)). Figure 6 illustrates the convergence of the iterative algorithm by showing the percentage of decisions (I (n) (s), r (n) (s)) that are modified in each iteration with respect to the previous one. Apart from the optimal control policy, the algorithm also provides the optimal relative value function h*(s). Figure 7a,b illustrates r*(s) and h*(s) versus some elements of the state vector while fixing the others. They show interesting properties of the optimal policy and relative value function with respect to the system state. For example, Figure 7a shows that the arrival rate should be reduced as the channel goes into a fade state or as the EWMA of the rate increases, and Figure 7b indicates that the relative value function decreases as the queue size increases or as the channel goes into a fade state. Figure 8 demonstrates the optimal actions for a particular realization of the channel process over a period of 200 timeslots. Note that when the channel goes into a deep fade during timeslots 231 to 282, the link does not access the channel (I k = 0), so there is no energy consumption in this period (E k = 0). Also, new packet arrivals are reduced to prevent a high queue backlog, but kept at a minimum rate to prevent log(ε + r̄ k ) from taking very negative values. After the deep fade ends, the link starts to transmit the backlogged packets while keeping a slow arrival rate until timeslot 288.
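The modified relative value iteration (25) is not reproduced here, but its general shape can be sketched as follows, with a damping factor τ = 0.9 as in the text. The state space, utility, transition model, and the exact form of the damped update are placeholders and assumptions, not the paper's equation.

```python
def relative_value_iteration(states, actions, utility, transition,
                             tau=0.9, ref_state=None, tol=1e-6, max_iter=10000):
    """Average-utility MDP sketch: iterate a Bellman-style update on the
    relative value function h.  Subtracting the value at a reference
    state and damping with tau keep the iteration bounded, in the spirit
    of relative value iteration.  Returns (h, policy)."""
    ref = ref_state if ref_state is not None else states[0]
    h = {s: 0.0 for s in states}
    for _ in range(max_iter):
        T, mu = {}, {}
        for s in states:
            vals = {a: utility(s, a) + sum(p * h[s2] for s2, p in transition(s, a))
                    for a in actions(s)}
            mu[s] = max(vals, key=vals.get)  # greedy action for current h
            T[s] = vals[mu[s]]
        # damped relative update (assumed form)
        h_new = {s: tau * (T[s] - T[ref]) + (1.0 - tau) * h[s] for s in states}
        if max(abs(h_new[s] - h[s]) for s in states) < tol:
            return h_new, mu
        h = h_new
    return h, mu
```

On a toy two-state chain where state 1 yields utility 1 and the action selects the next state, the iteration converges to a relative value gap h(1) − h(0) of 1 and a policy that always moves to state 1.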
Based on the selected form of the per stage utility (7), we can reduce the energy consumption by increasing b. However, this is achieved at the cost of reducing the transmission rate and increasing the delay, as shown in Figure 9; in other words, the figure shows the trade-off among energy, rate, and delay as a function of b. Here, the average delay is calculated using Little's law: D = Σ k q k / Σ k r k [34]. Figure 10 shows the performance of the optimal policies, obtained from the different channel correlation models, as a function of the time variability of the channel. In particular, it shows the resulting average utility per stage, as a function of the normalized Doppler frequency f d T s , for the first- and second-order FSMC models and different values of the EWMA parameters for packet arrivals and queue occupancy. The average utility is higher for fading channels with higher f d T s , since the channel then remains in deep fades for a shorter time. For θ = θ q = θ r = 0.7, which corresponds to a longer averaging time of the rate and queue size, both channel models exhibit similar performance. However, for θ = θ q = θ r = 0.3, the more accurate second-order FSMC model enhances the performance of the link compared to the first-order model.
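The Little's-law delay estimate above can be computed directly from simulated queue and rate traces. The trace values below are illustrative, not from the paper's simulations.

```python
def little_delay(queue_trace, rate_trace):
    """Average delay via Little's law, D = sum(q_k) / sum(r_k):
    time-averaged queue occupancy divided by time-averaged rate."""
    return sum(queue_trace) / sum(rate_trace)

# Illustrative traces: an average queue of 3 packets and an average
# rate of 1.5 packets/slot give an average delay of 2 slots.
D = little_delay([2, 3, 4, 3], [1, 2, 1, 2])
```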

Conclusions
We addressed the problem of optimal channel access and rate adaptation in the links of CSMA wireless networks. We defined a utility function that trades off the energy consumption and the average packet transmission rate and delay. Using dynamic programming, we derived algorithms and optimal policies that maximize the average utility by adapting the arrival packet rate and channel access as functions of the queue occupancy, channel state, and smoothed rate. The optimal policies can be computed and stored offline and then used online for dynamic access control and queue management of the link. The proposed algorithms exploit the time correlation of the channel by means of different FSMC models. Both the FTH and ITH problems were addressed: in the first case the average utility is optimized over a finite time period, whereas in the second case the long-term average utility is maximized. Structural properties of the optimal solution were investigated, and it was shown that the optimal transmission policy has a threshold structure versus the channel state. For the ITH problem we proved the existence of a channel state such that the link should always transmit when the channel is in that state or a better one. Numerical results show that the overall performance of the link can be enhanced by increasing the order of the FSMC channel model, at the cost of increased algorithm complexity and memory required to store the optimal policies.

Proof of Lemma 1
The difference between the right- and left-hand sides of inequality (28) can be calculated using the channel transition probabilities. We partition the summation and rewrite it accordingly. The first inequality is a result of g(Ĉ i ) − f (Ĉ i ) ≥ 0, and the second one uses g(Ĉ j ) ≤ g(Ĉ i ) for i = j + 1, ..., M, and f (Ĉ j ) ≥ f (Ĉ i ) for i = 1, ..., j,
where the second inequality is a result of Assumption 2. ■