Joint Optimization of Data Freshness and Fidelity for Selection Combining-Based Transmissions

Motivated by big data applications in the Internet of Things (IoT), where abundant information arrives at the fusion center (FC) to be processed, it is of great significance to ensure data freshness and fidelity simultaneously. We consider a wireless sensor network (WSN) in which several sensor nodes observe a common metric and transmit their observations to the FC using a selection combining (SC) scheme. We adopt the age of information (AoI) and the minimum mean square error (MMSE) to measure data freshness and fidelity, respectively. Explicit expressions for the average AoI and the MMSE are derived. We then jointly optimize the two metrics by adjusting the number of sensor nodes, and propose a closed-form sub-optimal number of sensor nodes that achieves the best freshness-fidelity tradeoff with negligible error. Numerical results show that the proposed node-number designs effectively improve the freshness and fidelity of the transmitted data.


Introduction
With the emergence of the Internet of Things (IoT) and the continuous development of communication techniques, more and more facilities are connected to the Internet. At the same time, a flood of information swarms into the fusion center (FC) waiting to be processed.
A wireless sensor network (WSN) is a typical data-driven application in the IoT, where multiple sensor nodes are deployed in a distributed manner to observe an information source. The observed information is then transmitted through wireless channels to the FC for further data gathering and recovery. Typical examples of WSNs include forest salinity monitoring, smart parking, intelligent transportation safety supervision, and so on.
In a WSN, the freshness of the data is essential for taking prompt actions. For example, in a pedestrian detection system, timely data from in-vehicle and infrastructure sensors is critical to avoid collisions and ensure the safety of pedestrians. In smart parking, the occupancy of the parking lot is vital information for the users. In environment supervision, fresh data from the sensors is needed to better monitor air pollution conditions [1].
These examples show the significance of the freshness of the received information in wireless sensor networks, since this information is used to operate, supervise, and monitor the systems.
Aside from the freshness, ensuring the fidelity, or accuracy, of the received data is also of paramount importance. In wireless communication systems, diversity techniques are useful tools for combating fading effects and improving the quality of the received signal [2][3][4][5][6]. The most fundamental diversity combining method is selection combining (SC), which comes in two main variants: the signal-to-noise ratio (SNR)-based scheme [3], which selects the branch providing the largest SNR among the diversity branches, and the log-likelihood ratio (LLR)-based scheme [5], which selects the branch with the largest LLR. This approach can effectively improve the fidelity of the ultimately received data.
Based on this background, we can conclude that the two metrics pull in opposite directions. On the one hand, better freshness can be achieved by waiting for observations from fewer sensor nodes, yet this degrades fidelity since there may be insufficient observations. On the other hand, higher fidelity can be obtained when the FC receives more observations from the sensors, but the data may become stale because collecting enough observations takes longer. Motivated by this opposite trend of the two metrics with respect to the number of sensor nodes, we aim to find the optimal number of sensor nodes by carefully characterizing the data freshness and fidelity using the age of information (AoI) and minimum mean square error (MMSE) metrics.

Contributions
Contributions of this paper can be summarized as follows.

1.
We establish a theoretical model that simultaneously accounts for data freshness and fidelity, enabling the joint optimization of the two metrics through the number of sensor nodes. Explicit characterizations of freshness and fidelity are given by applying the average AoI and MMSE metrics.

2.
By assigning a weighting factor to the average AoI and the MMSE, we formulate the problem of optimizing the number of sensor nodes. Through appropriate approximations of both metrics, we derive a closed-form sub-optimal number of sensor nodes that jointly minimizes the weighted sum of the average AoI and the MMSE.

3.
Numerical results validate the correctness of the proposed solution, showing that it achieves the best freshness-fidelity weighted-sum tradeoff with negligible error.

Organizations
The remainder of this paper is organized as follows. The related works are reviewed in Section 2. In Section 3, we introduce the system model. In Section 4, we formulate the problem by introducing the freshness and fidelity metrics. A closed-form sub-optimal solution based on appropriate approximations is given in Section 5. Numerical results are provided in Section 6. Finally, conclusions are drawn in Section 7.

Related Works
In this section, we provide a thorough overview of current academic and state-of-the-art works, to offer a broader perspective and a deeper understanding of our work.
In 2011, a metric termed the age of information (AoI) was introduced to characterize the freshness of data delivered in a communication system. It is defined as the time elapsed since the generation of the latest received update [7]. Various systems characterized by different queueing models, such as M/M/1, D/M/1, M/D/1, M/G/1, and G/G/1/1, adopted the AoI to evaluate the freshness of the received data [7][8][9]. The authors in [10] considered a multi-source preemptive queueing model and investigated the optimal generation rate of each source for achieving the best information freshness. The optimal update generation rate that minimizes the violation probability was found in [11]. Apart from the average AoI, the metric of peak AoI (PAoI) was introduced to capture the maximum value of the AoI [12,13]. Based on the violation probabilities of the AoI and the peak AoI, the authors in [14] analyzed the optimal arrival rate of status updates from the perspective of asymptotic optimality. The authors in [15] proposed a new metric termed age upon decisions (AuD) to evaluate the freshness of received updates at decision epochs.
In WSNs, the mean square error (MSE) metric is widely applied to characterize the fidelity of the system. In [16], techniques were presented for designing robust precoders and combiners for the linear decentralized estimation of unknown vector parameters in a coherent multiple-input multiple-output (MIMO) network with multiple sensors under imperfect channel state information (CSI). The proposed techniques minimize the MSE of the estimated signal at the FC subject to a total network power constraint. The authors in [17] proposed a joint collaboration-compression framework for the sequential estimation of a random vector parameter in a WSN. By alternately minimizing the sequential MMSE, they designed near-optimal linear compression strategies under power constraints. Following [16], the same authors developed optimal precoders minimizing the sum-MSE for the scenario of transmitting quantized observations in [18].
There have been typical works focusing on freshness and fidelity metrics in WSN [19][20][21][22]. An adaptive monitoring framework was introduced in [19] to achieve a balance between efficiency and accuracy on Internet-enabled physical devices. The authors in [20] determined the age-optimal policies for the update request and processing times subject to a maximum allowed distortion constraint on the updates. In [21], it was proved that there is an optimal number of quantization bits that optimizes the tradeoff of AoI and the MMSE. In [22], the AoI and MSE metrics were explicitly characterized by deriving the closed-form expressions for the block lengths and accuracy levels, after which they optimized the coding schemes by demonstrating a reachable region of AoI and MSE.
The above-mentioned works focused on the impact of the request time, the number of quantization bits, and the block length, respectively, on system freshness and fidelity. Our work is distinguished from them in that we optimize system performance by adjusting the number of sensor nodes, which is of paramount significance at the initial deployment phase.

System Model
We consider a wireless sensor network consisting of N, N ∈ ℕ, sensor nodes, which observe a common metric X (state update information such as temperature, humidity, or speed) characterized by a Gaussian process with zero mean and variance σ_X^2 (this assumption is based on the pervasiveness of normally distributed phenomena, as also assumed in [23][24][25]), as illustrated in Figure 1. Each node's observation is transmitted through an orthogonal channel that follows block Rayleigh fading. For ease of manipulation, we assume that the observations experience independent channel gains |h_i|^2 ∼ exp(1) and channel noise Z_i with zero mean and variance σ_{Z_i}^2. The received signal of the i-th node, Y_i, is expressed as

Y_i = h_i X + Z_i,  i = 1, 2, . . . , N,  (1)

where h_i can be obtained by transmitting a pilot signal used for channel estimation. The observations are then transmitted to the FC. Considering that the signal processing, collecting, and transmission capabilities differ among devices, we assume the transmission time from a sensor node to the FC obeys an exponential distribution with mean 1/v and is independent and identically distributed (i.i.d.) across the sensor nodes. Specifically, the transmission is organized in rounds: in each round the FC waits for observations from all N sensors, and a new round begins at the exact moment the previous round completes. At the FC, selection combining (SC) is employed to better control the fidelity of the received data.
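As a concrete sketch, one round of this model can be simulated in a few lines of Python. The parameter values below are illustrative assumptions (not from the paper), and the complex fading gain h_i is replaced by its real magnitude √|h_i|² for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed for this sketch)
N = 4            # number of sensor nodes
sigma2_X = 1.0   # source variance
sigma2_Z = 1.0   # per-node noise variance
v = 6.0          # transmission rate: per-node delays have mean 1/v

# All N nodes observe the same Gaussian source sample X
X = rng.normal(0.0, np.sqrt(sigma2_X))
h2 = rng.exponential(1.0, size=N)               # |h_i|^2 ~ exp(1) (Rayleigh fading)
Z = rng.normal(0.0, np.sqrt(sigma2_Z), size=N)  # per-branch noise
Y = np.sqrt(h2) * X + Z                         # received signal on each branch

# i.i.d. exponential transmission delays; the FC waits for all N
# observations, so a round lasts as long as the slowest node.
delays = rng.exponential(1.0 / v, size=N)
round_time = delays.max()
```

The round duration `round_time` is exactly the random variable whose statistics drive the AoI analysis in the next section.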

Data Freshness Metric
We adopt the AoI at the FC as the freshness metric of the system. It is defined as the difference between the current time and the generation time of the most recently received observation. Formally, at a given time t, the AoI is expressed as

Δ(t) = t − u_t,  (2)

where u_t represents the generation time of the most recently received observation. Accordingly, as described in [7], the average AoI over the interval (0, T) is given as

Δ_T = (1/T) ∫_0^T Δ(t) dt.  (3)

As illustrated in Figure 2b, the area under the age curve accumulated in round j can be obtained as the difference between two isosceles right triangles, i.e., Q_j = (1/2)(R_N^(j) + R_N^(j+1))^2 − (1/2)(R_N^(j+1))^2, where R_N^(j) denotes the duration of the j-th round. Specifically, considering I rounds of sensing in total, the time-average AoI is

Δ = lim_{I→∞} ( Σ_{j=1}^{I} Q_j ) / ( Σ_{j=1}^{I} R_N^(j) ).

Notice that the statistical characteristics of R_N^(j) are identical across rounds and depend only on the number of sensor nodes N, so we remove the superscript j and write the round-duration random variable as R_N. Then, since the rounds are i.i.d., the average AoI is

Δ(N) = E[R_N^2] / (2 E[R_N]) + E[R_N].  (4)

Data Fidelity Metric
We use selection combining at the FC, i.e., we select the branch with the largest channel gain |h_i| among all sensor nodes, denoted as |h_max| = max{|h_i|, 1 ≤ i ≤ N}. This results in

Y = h_max X + Z.  (5)

Accordingly, the distortion between the source X and the received signal Y, denoted as D, is expressed as

D = E[(X − kY)^2],  (6)

where k is the signal estimating coefficient.
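The coefficient k that minimizes the distortion D = E[(X − kY)^2] is the standard linear MMSE coefficient. For a single realization of the selected channel gain, the closed form can be checked against a grid search (a quick sketch; the values of h, σ_X^2, and σ_Z^2 are illustrative assumptions):

```python
import numpy as np

# Illustrative values (not from the paper)
sigma2_X, sigma2_Z = 2.0, 1.0
h = 1.3   # one realization of the selected gain |h_max|

# Closed-form linear MMSE coefficient and distortion for Y = h*X + Z:
#   k* = h*sigma2_X / (h^2*sigma2_X + sigma2_Z)
#   D* = sigma2_X*sigma2_Z / (h^2*sigma2_X + sigma2_Z)
k_star = h * sigma2_X / (h**2 * sigma2_X + sigma2_Z)
D_star = sigma2_X * sigma2_Z / (h**2 * sigma2_X + sigma2_Z)

# Grid search over k of D(k) = E[(X - kY)^2] = sigma2_X*(1 - k*h)^2 + k^2*sigma2_Z,
# which expands directly from Y = h*X + Z with X and Z uncorrelated.
ks = np.linspace(0.0, 1.0, 100_001)
D = sigma2_X * (1 - ks * h) ** 2 + ks**2 * sigma2_Z
```

The grid minimum coincides with the closed-form pair (k*, D*), which is the per-realization distortion that the MMSE analysis below averages over the fading distribution.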

Problem Formulation
In this section, we first give explicit characterizations of the two metrics. After that, the problem of jointly optimizing the data freshness and fidelity of the sensing system is formulated.

Average AoI of the Proposed Model
Lemma 1. The average AoI observed at the FC is

Δ(N) = 3H_N/(2v) + G_N/(2vH_N),  (7)

where

H_N = Σ_{l=1}^{N} 1/l,  G_N = Σ_{l=1}^{N} 1/l^2.  (8)

Proof of Lemma 1. Recall that the general expression for the average AoI derived in Equation (4) requires E[R_N] and E[R_N^2], and that the transmission time from a sensor node to the FC obeys an exponential distribution with mean 1/v, i.i.d. across the sensor nodes. By order statistics [26], the mathematical expectation and variance (denoted D[·]) of the time difference T_1 between the first arriving observation and the start of the round are

E[T_1] = 1/(Nv),  D[T_1] = 1/(Nv)^2.  (9)

The mathematical expectation and variance of the time difference T_2 between the second and the first arriving observation are

E[T_2] = 1/((N−1)v),  D[T_2] = 1/((N−1)v)^2,  (10)

since this observation can be treated as the first arriving one among the remaining N − 1 sensors. Applying mathematical induction to Equations (9) and (10) up to the N-th arriving observation, the round duration R_N = Σ_{l=1}^{N} T_l satisfies

E[R_N] = (1/v) Σ_{l=1}^{N} 1/l = H_N/v,  D[R_N] = (1/v^2) Σ_{l=1}^{N} 1/l^2 = G_N/v^2.  (11)

Note that E[R_N^2] can be expressed as

E[R_N^2] = D[R_N] + (E[R_N])^2 = (G_N + H_N^2)/v^2.  (12)

Substituting Equations (11) and (12) into Equation (4) gives Δ(N) = (G_N + H_N^2)/(2vH_N) + H_N/v = 3H_N/(2v) + G_N/(2vH_N), which completes the proof. □
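Lemma 1 can be checked numerically: the closed form 3H_N/(2v) + G_N/(2vH_N) should match a Monte Carlo estimate built directly from the renewal expression Δ = E[R^2]/(2E[R]) + E[R], with R the maximum of N i.i.d. exp(v) delays. A quick sanity-check sketch (not part of the paper):

```python
import numpy as np

def avg_aoi(N, v):
    """Closed-form average AoI from Lemma 1: 3*H_N/(2v) + G_N/(2v*H_N)."""
    l = np.arange(1, N + 1)
    H, G = (1.0 / l).sum(), (1.0 / l**2).sum()
    return 3 * H / (2 * v) + G / (2 * v * H)

def avg_aoi_mc(N, v, rounds=200_000, seed=0):
    """Monte Carlo via the renewal form Delta = E[R^2]/(2E[R]) + E[R],
    where R is one round's duration: the max of N i.i.d. exp(v) delays."""
    rng = np.random.default_rng(seed)
    R = rng.exponential(1.0 / v, size=(rounds, N)).max(axis=1)
    return (R**2).mean() / (2 * R.mean()) + R.mean()
```

For N = 1 and v = 1, the closed form gives exactly 3/2 + 1/2 = 2, the familiar average AoI of a single exponential server with unit rate.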

MMSE of the Proposed Model
Recall that in our model each node transmits its observation through a block Rayleigh fading channel, where the channel gain remains constant within a block and varies independently across blocks. Based on this model, we use the MMSE metric to evaluate the data fidelity of the system.

Lemma 2. The MMSE D(N) achieved by the SC technique is

D(N) = N Σ_{i=1}^{N} (−1)^{i−1} C(N−1, i−1) e^{i/γ_0} Γ(0, i/γ_0),  (13)

where C(·,·) denotes the binomial coefficient and Γ(a, b) = ∫_b^∞ u^{a−1} e^{−u} du represents the upper incomplete Gamma function.
Proof of Lemma 2. First, one can establish the objective problem of minimizing the distortion in Equation (6), which is

min_k E[(X − kY)^2] = min_k E[(X − k(h_max X + Z))^2],  (14)

where E[X^2] = σ_X^2 and E[Z_i^2] = σ_{Z_i}^2. Noticing that the source X and the noise Z_i are uncorrelated, the objective problem can be further elaborated as

min_k (1 − k|h_max|)^2 σ_X^2 + k^2 σ_Z^2.  (15)

With some manipulations (setting the derivative with respect to k to zero), the distortion, i.e., the optimal value of Equation (15), can be written as

D = σ_X^2 σ_Z^2 / (σ_Z^2 + |h_max|^2 σ_X^2).  (16)

In our model, we assume the noise power is the same among all sensor nodes, i.e., σ_{Z_i}^2 = σ_Z^2 = 1. Denote the SNR of each node as γ_0 = σ_X^2/σ_Z^2; then the distortion is

D = σ_X^2 / (1 + γ_0 |h_max|^2).  (17)

Recall that |h_i|^2 ∼ exp(1) and max{|h_i|, 1 ≤ i ≤ N} = |h_max|. We now need the probability density function (pdf) of |h_max|^2 to calculate the MMSE. For conciseness, denote |h_max|^2 and its pdf as g and p(g), respectively. Given N i.i.d. exponential random variables with parameter 1/θ, the pdf of their maximum is

p(g) = (N/θ) e^{−g/θ} (1 − e^{−g/θ})^{N−1},  (18)

where in our work 1/θ = 1. The MMSE can then be computed as

D(N) = ∫_0^∞ [σ_X^2/(1 + γ_0 g)] N e^{−g} (1 − e^{−g})^{N−1} dg
     = σ_X^2 N Σ_{i=1}^{N} (−1)^{i−1} C(N−1, i−1) ∫_0^∞ e^{−ig}/(1 + γ_0 g) dg
     (a)= σ_X^2 N Σ_{i=1}^{N} (−1)^{i−1} C(N−1, i−1) (e^{i/γ_0}/γ_0) Γ(0, i/γ_0),

where (a) follows from the substitution ω = (g + 1/γ_0) i, dω = i dg. Since σ_X^2 = γ_0 σ_Z^2 = γ_0, this coincides with Equation (13), completing the proof. □
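The closed form of Lemma 2 (in the normalized case σ_Z^2 = 1, so σ_X^2 = γ_0) can be verified against a direct Monte Carlo simulation of the selection-combining distortion. The sketch below uses `scipy.special.exp1`, which computes Γ(0, x); the test parameters are arbitrary:

```python
import numpy as np
from scipy.special import exp1, comb

def mmse_closed(N, gamma0):
    """Lemma 2 with sigma_Z^2 = 1, sigma_X^2 = gamma0:
    D(N) = N * sum_i (-1)^(i-1) * C(N-1, i-1) * e^(i/gamma0) * Gamma(0, i/gamma0)."""
    i = np.arange(1, N + 1)
    return N * np.sum((-1.0) ** (i - 1) * comb(N - 1, i - 1)
                      * np.exp(i / gamma0) * exp1(i / gamma0))

def mmse_mc(N, gamma0, samples=200_000, seed=1):
    """Monte Carlo: D = E[gamma0 / (1 + gamma0*g)], g = max of N exp(1) gains."""
    rng = np.random.default_rng(seed)
    g = rng.exponential(1.0, size=(samples, N)).max(axis=1)
    return (gamma0 / (1 + gamma0 * g)).mean()
```

Besides matching the simulation, the closed form also exhibits the monotone decrease in N that the monotonicity analysis below relies on.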

Tradeoff between Data Freshness and Fidelity
Following the analyses in the above two subsections, we can see that the number of sensor nodes N is essential in shaping both the freshness and the fidelity metric. Based on the results derived so far, the objective problem is formulated by introducing a weighting factor α ∈ (0, 1), to jointly optimize the two metrics through the number of sensor nodes:

min_{N ∈ ℕ} f(N) = α Δ(N) + (1 − α) D(N),  (19)

where Δ(N) and D(N) are given by Equations (7) and (13), respectively.
When the weighting factor α is large, the average AoI term dominates. This setting applies to scenarios such as daily news updates and music hit rankings, where timeliness is more important. On the contrary, when α is relatively small, the fidelity of the system becomes the major concern while the freshness requirement is relaxed. Systems such as fuel consumption reporting and indoor temperature monitoring fall into this case, since their state often changes slowly over time.
In order to find the optimal N*, we first analyze the monotonicity of Δ(N), D(N), and the tradeoff f(N), respectively.

Monotonicity of Average AoI
As derived from Lemma 1, the monotonicity of Δ(N) can be determined by calculating the difference between Δ(N) and Δ(N − 1):

Δ(N) − Δ(N − 1) = 3/(2vN) + (G_N H_{N−1} − G_{N−1} H_N)/(2v H_N H_{N−1}).  (20)

It can be verified that Equation (20) is positive, which indicates that Δ(N) monotonically increases w.r.t. the number of sensor nodes N.
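The positivity of the difference Δ(N) − Δ(N−1) = 3/(2vN) + (G_N H_{N−1} − G_{N−1} H_N)/(2v H_N H_{N−1}) can be checked numerically against the closed form of Lemma 1 (a sanity-check sketch; v = 4 is an arbitrary example value):

```python
import numpy as np

def H(N):
    # Partial harmonic sum H_N = sum_{l=1}^{N} 1/l
    return (1.0 / np.arange(1, N + 1)).sum()

def G(N):
    # G_N = sum_{l=1}^{N} 1/l^2
    return (1.0 / np.arange(1, N + 1) ** 2).sum()

def aoi(N, v):
    # Closed-form average AoI from Lemma 1
    return 3 * H(N) / (2 * v) + G(N) / (2 * v * H(N))

def diff_closed(N, v):
    # The difference Delta(N) - Delta(N-1) written out term by term
    return (3 / (2 * v * N)
            + (G(N) * H(N - 1) - G(N - 1) * H(N)) / (2 * v * H(N) * H(N - 1)))
```

For every N the difference expression agrees with the direct subtraction of the closed form and stays strictly positive, confirming the monotone increase of Δ(N).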

Monotonicity of D(N)
It is not difficult to conclude from Equation (13) that D(N) monotonically decreases w.r.t. N: adding one more sensor node can only enlarge the selected channel gain |h_max|^2 = max_i |h_i|^2, and the distortion σ_X^2/(1 + γ_0 |h_max|^2) decreases accordingly.

Monotonicity of the Weighted-Sum Tradeoff
Since the average AoI Δ(N) monotonically increases w.r.t. N and D(N) monotonically decreases w.r.t. N, the weighted sum f(N) is, in most cases, a unimodal function that first decreases and then increases. Even when f(N) occasionally fluctuates, we can still treat the first local optimum as the global optimum, since stopping there saves the system cost of deploying additional sensors. This monotonicity analysis paves the way for finding a sub-optimal number of sensor nodes, which we elaborate on in the next section.

Sub-Optimal Number of Sensor Nodes
In this section, we derive a closed-form sub-optimal number of sensor nodes that achieves the best system timeliness and fidelity, obtained by applying proper approximations to the two metrics.

Theorem 1. The sub-optimal number of sensor nodes is approximately given by

Ñ = exp( √(2v(1 − α)/(3α)) − 1/γ_0 ).  (21)

Proof of Theorem 1. First, recall that Δ(N) = 3H_N/(2v) + G_N/(2vH_N). The latter term G_N/(2vH_N) is small enough to omit, so Δ(N) ≈ 3H_N/(2v). For the MMSE, noting that D(N) = E[1/(g + 1/γ_0)] when σ_Z^2 = 1, we approximate it by 1/E[g + 1/γ_0]; since E[g] = H_N for the maximum of N unit-mean exponential gains, that is

D(N) ≈ 1/(H_N + 1/γ_0).  (22)

The objective problem can then be reformulated as

min_N 3αH_N/(2v) + (1 − α)/(H_N + 1/γ_0).  (23)

Searching for the zero point of its first-order derivative (with dH_N/dN ≈ 1/N) and omitting the unnecessary root, the sub-optimal number of sensor nodes can be figured out through

(H_N + 1/γ_0)^2 = 2v(1 − α)/(3α).  (24)

Noticing that H_N ≈ ln N, Equation (24) yields

3α ln^2 N + (6α/γ_0) ln N + 3α/γ_0^2 − 2v(1 − α) = 0.  (25)

After some manipulations, Ñ = exp( √(2v(1 − α)/(3α)) − 1/γ_0 ) is obtained, where the other (extraneous) root is omitted. Finally, by comparing f(⌊Ñ⌋) with f(⌈Ñ⌉), the solution N* is obtained by choosing the integer that results in the smaller freshness-fidelity weighted sum. □
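The closed form Ñ = exp(√(2v(1−α)/(3α)) − 1/γ_0), as reconstructed above, can be exercised numerically under the same approximations used in the proof. The parameter values below are illustrative (α = 0.6, v = 8, SNR 15.5 dB):

```python
import numpy as np

def f_approx(N, alpha, v, gamma0):
    """Approximate objective: 3*alpha*H_N/(2v) + (1-alpha)/(H_N + 1/gamma0)."""
    H = (1.0 / np.arange(1, N + 1)).sum()
    return 3 * alpha * H / (2 * v) + (1 - alpha) / (H + 1 / gamma0)

def n_subopt(alpha, v, gamma0):
    """Continuous relaxation (H_N ~ ln N):
    N~ = exp(sqrt(2v(1-alpha)/(3alpha)) - 1/gamma0)."""
    return np.exp(np.sqrt(2 * v * (1 - alpha) / (3 * alpha)) - 1 / gamma0)

# Illustrative parameters: alpha = 0.6, v = 8 updates/slot, SNR = 15.5 dB (linear)
alpha, v, gamma0 = 0.6, 8.0, 10 ** 1.55
Nt = n_subopt(alpha, v, gamma0)

# Round to the better of floor/ceil, as in the final step of the proof
cands = (max(int(np.floor(Nt)), 1), int(np.ceil(Nt)))
N_star = min(cands, key=lambda n: f_approx(n, alpha, v, gamma0))
```

The floor/ceil comparison is exactly the last step of the proof: the continuous relaxation rarely lands on an integer, so both neighbors must be evaluated.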
One can conclude from Theorem 1 that the sub-optimal number of sensor nodes is strongly affected by the SNR γ_0, the weighting factor α, and the arrival rate v.

Numerical Results
In this section, we first perform simulations to validate the correctness of the theoretical results. After that, we investigate the effects of the system parameters on the overall system freshness and fidelity. The realistic setup is shown in Table 1.

Figure 3 shows the simulation and analysis results of the average AoI Δ(N) when the arrival rates of the updates are v = 4 and v = 6, i.e., in each time slot, 4 and 6 updates arrive at the FC, respectively. As can be seen, the average AoI increases as the number of sensor nodes N grows, which accords with the analysis in Section 4.3.1. It is also validated that the average AoI is overall larger when the arrival rate v is smaller (cf. Equation (7)). Most importantly, by comparing the curves with the markers, we can conclude that our analytical result on the average AoI (cf. Lemma 1) matches the simulation, in which we generated 100,000 groups of exponentially distributed samples with parameter v to calculate the average AoI.

Figure 4 shows the simulation and analysis results of the MMSE D(N). The realistic setup in this simulation is as follows. The signal power σ_X^2 is set to 10 dBm (10 mW) and 13 dBm (20 mW), respectively; the channel noise power σ_Z^2 is N_0·B with N_0 = −173 dBm/Hz and B = 10^6 Hz. For the path loss model, we adopt 35.5 + 37.6 lg(d_k) in dB, where the distance between the sensors and the FC is d_k = 100 m. The corresponding SNRs γ_0 are then 12.5 dB and 15.5 dB, respectively. First, it can be seen from the figure that our theoretical analysis of the MMSE (Lemma 2) accords with the simulation, in which we generated 100,000 groups of random variables obeying |h_i|^2 ∼ exp(1) and chose the largest among them as the received-signal channel gain to calculate the MMSE. Second, the MMSE declines as the number of sensor nodes N grows, which echoes the analysis in Section 4.3.2.
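The quoted SNR values follow from a simple link budget. The sketch below reproduces the arithmetic for the 10 dBm case, assuming the garbled noise figure in the text means N_0 = −173 dBm/Hz over B = 1 MHz; the result comes out near 12.3 dB, close to the quoted 12.5 dB, with the small gap presumably due to rounding in unstated constants:

```python
import math

P_tx_dBm = 10.0                              # signal power: 10 dBm (10 mW)
d_k = 100.0                                  # sensor-to-FC distance in metres
path_loss_dB = 35.5 + 37.6 * math.log10(d_k) # path-loss model from the text
noise_dBm = -173.0 + 10 * math.log10(1e6)    # N0*B: N0 = -173 dBm/Hz, B = 1 MHz (assumed)
snr_dB = P_tx_dBm - path_loss_dB - noise_dBm # link-budget SNR in dB
```

With these numbers, the path loss is 110.7 dB, the noise floor is −113 dBm, and the resulting SNR is about 12.3 dB.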
Figure 5 shows the varying trend of the data freshness-fidelity weighted-sum tradeoff f(N) with respect to the number of sensor nodes N under different SNRs γ_0 and weighting factors α. The realistic setup of the system is the same as in Figure 4, and we consider a fixed arrival rate v = 8 (eight updates arrive at the FC in each time slot). Overall, all curves exhibit the expected monotonicity of the weighted-sum tradeoff, which decreases first and then goes up. Comparing the circular dotted curve with the circular solid curve, which differ in the SNR γ_0, we conclude that a smaller SNR γ_0 results in a smaller MMSE and hence a smaller tradeoff, whereas the optimal number of sensor nodes N = 4 remains unchanged. Comparing the circular solid curve with the square solid curve, a larger weighting factor α leads to a smaller optimal number of sensor nodes; this can be ascribed to the fact that a larger α puts more weight on the AoI, so the curve begins its upward trend earlier. By further reducing the weighting factor to α = 0.45, the optimal number of sensor nodes becomes even larger, reaching N = 12 (see the dotted square curve). Most importantly, our proposed sub-optimal number of sensor nodes N* coincides with the optimal N, which validates the effectiveness of the proposed solution.

Figure 6 depicts the relationship between the data freshness-fidelity weighted-sum tradeoff f(N) and the number of sensor nodes N under different weighting factors α and arrival rates v. In the simulation, the signal power σ_X^2 is fixed to 10 dBm (10 mW) and the noise power remains the same, so the MMSE distortion is unchanged. Observing the circular solid curve and the square solid curve, we see that under the same weighting factor α = 0.6, a larger arrival rate v yields a smaller tradeoff.
However, the optimal number of sensor nodes N can be comparatively larger: when v = 10, N = 8; when v = 5, N = 5. This is expected, since a larger arrival rate v leads to a smaller AoI, meaning the curve starts climbing later. The improvement in overall system freshness and fidelity brought by the proposed sub-optimal solution N* is also validated.

Conclusions
Motivated by the contradictory relationship between the freshness and fidelity of the received data, we studied the joint optimization of the two metrics by adjusting the number of sensor nodes. Explicit expressions for the average AoI and the MMSE were derived, based on which a closed-form sub-optimal solution was obtained via feasible approximations. Numerical results validate that our proposed sub-optimal number of sensor nodes is accurate and achieves the best data freshness and fidelity tradeoff with negligible errors.