Distributed Channel-Aware Quantization Based on Maximum Mutual Information

In distributed sensing systems with constrained communication capabilities, sensors' noisy measurements must be quantized locally before transmission to the fusion centre. When the same parameter is observed by a number of sensors, the local quantization rules must be jointly designed to optimize a global objective function. In this work we jointly design the local quantizers using the mutual information as the optimization criterion, so that the quantized measurements carry the most information about the unknown parameter. A low-complexity iterative approach is suggested for finding the local quantization rules. With the mutual information as the design criterion, we can easily integrate the effect of communication channels into the design and consequently obtain channel-aware quantization rules. We observe that the optimal design depends on both the measurement and channel noises. Moreover, our algorithm can be used to design quantizers that can be deployed in different applications. We demonstrate the success of our technique by simulating estimation and detection applications, where our method achieves estimation and detection errors as low as those of designs targeting those specific purposes.


Introduction
A random source with continuous amplitude requires an infinite number of bits to be described exactly. However, due to practical constraints in communication systems, for example, limited storage or channel capacity, only a finite number of bits can be accommodated. To compress a continuous-amplitude random source x into a limited amount of information, its amplitude must be quantized to Q(x), which takes values from a discrete finite set. A side effect of this quantization is some loss of information about x, which depends on the quantizer's quality and the compression rate.
Quantization theory has a long history [1, 2]. The rate-distortion theory [3] describes the relation between the amount of distortion caused by the quantization and the rate at which the quantized source can be represented. The theoretical limits described by the rate-distortion function can only be achieved asymptotically by optimal source encoding. For designing optimal quantizers, a practical method was investigated by Lloyd and Max [1, 4]. They propose an iterative algorithm for finding the quantization rule that achieves the lowest distortion for a random source, where the distortion measure is the mean squared error (MSE). Using the Lloyd-Max algorithm, the optimal (it must be mentioned that all iterative algorithms for quantization design find a locally optimal solution, which depends on the initial quantization rules) M-level quantization rule that minimizes the MSE can be found for a random scalar distributed according to a probability density function (pdf) p(x). The joint quantization of multiple variables has been studied under vector quantization [5, 6].
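The Lloyd-Max iteration alternates two necessary optimality conditions: each breakpoint sits midway between adjacent reconstruction levels, and each level is the conditional mean (centroid) of its cell. A minimal sample-based sketch of this idea is given below; the function name, quantile initialization, and iteration count are our own choices, not taken from [1, 4]:

```python
import numpy as np

def lloyd_max(samples, M, iters=50):
    """Sample-based Lloyd-Max design: alternate between midpoint
    breakpoints and centroid (conditional-mean) level updates."""
    # initialize the M reconstruction levels at sample quantiles
    levels = np.quantile(samples, (np.arange(M) + 0.5) / M)
    for _ in range(iters):
        # breakpoints are midpoints between adjacent levels
        breaks = (levels[:-1] + levels[1:]) / 2
        idx = np.searchsorted(breaks, samples)  # assign each sample to a cell
        # centroid update: each level becomes the mean of its cell
        levels = np.array([samples[idx == j].mean() for j in range(M)])
    return breaks, levels

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)          # unit-variance Gaussian source
breaks, levels = lloyd_max(x, M=4)
mse = np.mean((x - levels[np.searchsorted(breaks, x)]) ** 2)
```

For a 4-level quantizer of an N(0, 1) source this converges near the known optimum (middle breakpoint at 0, MSE about 0.12).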
A more interesting quantization scenario arises when the continuous-amplitude source is not directly observable and only a noisy version of it can be measured as y = x + n, where n is some random noise. In this scenario the observed quantity is quantized as Q(y); however, the goal is to achieve the best representation of x. It has been shown in [7-9] that the optimal quantizer in this case can be obtained using the generalized Lloyd-Max algorithm, where the distortion measure is modified to include both the quantization and measurement noises.

International Journal of Distributed Sensor Networks
A relatively recent and more challenging problem appears in distributed sensing systems, for example, sensor networks. The problem can be described as distributed noisy source quantization, also referred to as the multiterminal source coding or CEO problem [10, 11]. In a distributed sensing system, the same unknown source x is observed by N different measurement devices, each having a noisy observation y_k = x + n_k, 1 ≤ k ≤ N, where n_k is the measurement noise of the kth observation. Each observation has to be quantized according to a local quantization rule Q_k(y_k) and sent to the fusion centre (FC). At the FC, the quantized values are used for estimation, detection, or classification purposes, depending on the application of the sensor network. Since for an optimal solution the local quantization rules have to be jointly optimized, this problem is more challenging than a centralized quantization design. The rate-distortion bound is analytically intractable in this case; however, upper and lower bounds have been derived in [12].
Design of the optimal distributed quantizer for the above scenario has been considered by the authors of [13-18], who suggest cyclic algorithms based on alternating minimization [19] to find the optimal quantization rules. The algorithm starts with initial guesses for the quantizers, that is, Q_1^0, ..., Q_N^0. During each iteration t, for each 1 ≤ k ≤ N, the best quantization rule that optimizes a performance criterion, for example, MSE [13, 14], Fisher information [15], or Ali-Silvey distances [16], is found by fixing the other N − 1 quantizers and is assigned to Q_k^t.
Compared with the previous design algorithms for distributed quantization, that is, [13-16], in this work we use the mutual information (MI) as the optimization criterion for the distributed quantization design. We jointly design quantizers that maximize the MI between the quantized data and the unknown parameter. Our motivation for using the MI is that it is a fundamental measure of how much information one variable contains about another. We design a set of quantizers for the noisy measurements such that the quantized variables contain the most information about the unknown parameter.
A theory of designing quantizers based on optimizing mutual information measures has been discussed as the information bottleneck method [17]. The information bottleneck method has mostly been used in clustering and classification applications. In this work we take the channel noise into account and design distributed quantizers that are optimal in the presence of imperfect communication channels.
The MI measure has the following benefits. It allows designing the quantizers independently of the choice of decoder or estimator at the FC. Also, as we will discuss later, when using the MI measure the global optimization criterion can be broken down into smaller criteria. Finally, it allows incorporating the effect of communication channels into the design of optimal quantizers, and hence obtaining the optimal distributed channel-aware quantizer. By maximizing the MI between the received data at the FC and the unknown parameter, we observe that, depending on the channel noise, the optimal quantizers can differ from the channel-unaware quantizers.
Performance evaluation through simulating different scenarios shows strong results for our MI-based distributed quantizer design. This is evaluated for two applications, namely, estimation and detection. We will show that the quantization rules obtained by maximizing the MI achieve the same, and in some cases better, performance when compared with quantizers specifically designed for the estimation or the detection purpose, that is, using MSE or Ali-Silvey distances as the optimization criteria.
The paper is organized around two cases. First, we assume perfect communication channels between the sensors and the FC and develop our method; then we apply the method to the general case where the channels are not perfect. In Section 2, we first justify the choice of MI as the design criterion. In Section 3, the problem is defined and formulated based on MI. A design algorithm is then devised in Section 4 assuming ideal communication channels. In Section 5, the algorithm is modified to include the channel effect. Finally, the numerical results are presented and discussed in Section 6.

Mutual Information as the Optimization Criterion
Most of the literature on optimal quantizer design has used distortion measures, such as MSE, to design the optimal quantizer [1, 4, 7-9, 13]. However, other measures have also been used as design criteria; among them are Ali-Silvey distances [16, 20, 21], the Cramer-Rao lower bound, and Fisher information [15, 22, 23]. The motivation for using these measures is that they work better in some applications. For example, Ali-Silvey distance measures are shown to yield better quantizers for detection applications [16, 20]. A fundamental measure of how much information about the unknown is conveyed in the quantized data is the MI between the unknown and the quantized data. Therefore, in this work, we base the design of distributed quantizers on maximizing the MI and will show, in Section 6, that the MI criterion results in quantizers with the same and in some cases higher performance than other measures, including distortion measures and Fisher information. The choice of MI as the optimization measure also has computational benefits in solving the joint optimization of quantizers, as it allows breaking the larger global problem down into smaller ones, as explained in Section 3. Keeping the number of quantization levels per sensor constant, we achieve the highest information rate I(x; u_1, ..., u_N) by properly designing the quantizers. To the best of our knowledge, the MI criterion has not previously been studied for distributed quantizer design in the literature.
A benefit of using the MI is that it makes the quantizer design independent of the estimation method or decoder. In design solutions based on distortion measures, such as squared error or Hamming error [3], the estimation method is fixed, for example, minimum mean squared error (MMSE) or maximum likelihood, and the optimization of the quantizers depends on the estimator type. Using the MI measure, however, an estimation or detection method can be developed at the FC after the quantizers are designed, based on each specific application. This enables us to design a quantizer that is useful for estimation, detection, classification, or feature extraction. Specifically for estimation purposes, the optimal quantizers designed by minimizing the MSE (the Lloyd-Max algorithm) are also those with high MI [1, 24]. This makes sense, because when the quantized data carry more information about the unknown parameter, the FC has a better representation of the unknown; hence, it can estimate it more accurately. The performance of our MI-based algorithm in estimation and detection applications is discussed in Sections 6.1 and 6.2, respectively. Using the MI measure in the distributed quantization design also enables breaking an N-sensor quantization problem down into smaller problems. In fact, since the MI can be recursively decomposed using the chain rule of mutual information, a simpler suboptimal solution can be derived by maximizing each component. The related formulations are discussed in the following section.
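The chain-rule decomposition that underlies this suboptimal approach can be checked numerically. The toy sketch below (the random joint table and all names are our own, for illustration only) computes I(x; u_1, u_2) from a joint probability table and verifies that it equals I(x; u_1) + I(x; u_2 | u_1):

```python
import numpy as np

def mi(pab):
    """Mutual information (in nats) from a joint probability table p(a, b)."""
    pa = pab.sum(1, keepdims=True)
    pb = pab.sum(0, keepdims=True)
    nz = pab > 0
    return float((pab[nz] * np.log(pab[nz] / (pa @ pb)[nz])).sum())

rng = np.random.default_rng(1)
p = rng.random((3, 2, 2))
p /= p.sum()                       # joint p(x, u1, u2) on a 3x2x2 alphabet

I_joint = mi(p.reshape(3, 4))      # I(x; u1, u2), treating (u1, u2) as one symbol
I_1 = mi(p.sum(2))                 # I(x; u1), marginalizing out u2
# I(x; u2 | u1) = sum over a of p(u1 = a) * I(x; u2 | u1 = a)
I_2g1 = sum(p[:, a, :].sum() * mi(p[:, a, :] / p[:, a, :].sum())
            for a in range(2))
```

The chain rule I(x; u_1, u_2) = I(x; u_1) + I(x; u_2 | u_1) holds exactly, which is what licenses designing Q_1 first and Q_2 afterwards.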

Problem Formulation Based on Mutual Information
The distributed quantization problem addressed in this work is defined as follows. Suppose x (for brevity of notation, we use the same symbol for a random variable and its value) is a random scalar which takes values in R with pdf p(x). A number of noisy measurements of x are observed at distributed locations as y_k = h_k(x, n_k), 1 ≤ k ≤ N, where h_k is the measurement function and n_k is the measurement noise. The measurement noise at different sensors may be correlated, but it is assumed that the distribution p(y_1, ..., y_N | x) is known. Due to communication constraints, the continuous-amplitude measurements have to be quantized before transmission. Therefore, y_k is encoded to u_k ∈ L_k = {1, 2, ..., M_k} using a local quantization rule Q_k : R → L_k; that is, u_k = Q_k(y_k). A quantization rule Q_k is defined by a set of real-valued numbers called breakpoints, that is, Γ_k, that divide R into M_k partitions and assign a value from L_k to each partition. Each quantized piece of data u_k, 1 ≤ k ≤ N, is then transmitted over a communication channel, and the received symbol at the FC is called v_k. The complete problem model is shown in Figure 1.
Let the variables associated with the kth sensor form the Markov chain

x → y_k → u_k → v_k, 1 ≤ k ≤ N. (1)

We will use this property to develop the design algorithm in Sections 4 and 5. We first consider ideal communication channels between the sensors and the FC and derive an algorithm for designing the quantization rules in Section 4. In Section 5, we extend the algorithm to consider the channel effect and design channel-aware quantization rules. From this point until Section 5, the communication channels are assumed to be ideal. The goal is to derive Q_1(⋅), ..., Q_N(⋅) so that, on average, the random variables u_1, ..., u_N together are a better representation of x. To achieve that, we maximize the MI between x and the quantized data over the local quantization rules; that is,

{Q_1*, ..., Q_N*} = arg max_{Q_1, ..., Q_N} I(x; u_1, ..., u_N). (2)

Due to the stepwise characteristic of the quantization rules, finding an analytical solution to the problem in (2) is difficult. Therefore, in this work we resort to approximations and numerical methods. The MI in (2) can be written recursively using the chain rule of mutual information [3]:

I(x; u_1, ..., u_N) = Σ_{k=1}^{N} I(x; u_k | u_1, ..., u_{k−1}). (3)

Hence, a suboptimal solution can be derived from this recursive breakdown as

Q_k* = arg max_{Q_k} I(x; u_k | u_1, ..., u_{k−1}), 1 ≤ k ≤ N, (4)

where in the kth line the quantized data u_1, ..., u_{k−1} are generated by the previously found Q_1*, ..., Q_{k−1}*, respectively. Finding the kth quantization rule from (4) is less complex than finding all quantization rules jointly from (2); therefore, this suboptimal approach has lower complexity. In the following section, we develop a method to solve the maximization problem in (4).

Design Algorithm
To find Q_k, 1 ≤ k ≤ N, one should solve the maximization in the kth line of (4). However, since Q_k is a discrete-level function, the optimization is not analytically tractable. In this section we provide a numerical method to find a locally optimal solution for the kth quantization rule. In (4), assume that Q_1*, ..., Q_{k−1}* are known. To find the kth quantization rule Q_k*, according to (4) we need to maximize I(x; u_k | u_1, u_2, ..., u_{k−1}). When the k − 1 previous quantization rules are fixed, this is equivalent to the following optimization (note that u_{1:k} = {u_1, ..., u_k}):

Q_k* = arg max_{Q_k} I(x; u_k | u_{1:k−1}) (5)
     = arg max_{Q_k} [H(x | u_{1:k−1}) − H(x | u_{1:k})]. (6)

In (6), the entropy term involving only x and u_{1:k−1} is independent of the choice of Q_k. Therefore, we can reduce the optimization problem to (we have dropped the terms that do not depend on Q_k)

Q_k* = arg max_{Q_k} Σ_{u_{1:k}} ∫ p(x) p(u_{1:k} | x) log p(x | u_{1:k}) dx, (7)

where the last equation is a consequence of the Markov chain property in (1). Note that since, for each 1 ≤ k ≤ N, u_k depends solely on y_k, we have

p(u_k | y_k, u_{1:k−1}, x) = p(u_k | y_k). (8)

Therefore, (7) can also be written as

Q_k* = arg max_{p(u_k | y_k)} Σ_{u_{1:k}} ∫∫ p(x) p(y_k, u_{1:k−1} | x) p(u_k | y_k) log p(x | u_{1:k}) dy_k dx. (9)

Note that, in the above formula, the maximization is over p(u_k | y_k). It is straightforward to see that the probability function p(u_k | y_k) is just another way of defining the kth quantization rule. Since, for all y_k ∈ R, Q_k maps y_k to a value j ∈ L_k such that j = Q_k(y_k), we can write

p(u_k = j | y_k) = δ_{j, Q_k(y_k)},

where δ is the Kronecker delta function, equal to one when j = Q_k(y_k) and zero elsewhere. Note that, in (9), p(x | u_{1:k}) also depends on the quantization rule, that is, on p(u_k | y_k).
To solve the optimization problem in (9), motivated by [25] we use a double-maxima approach, converting (9) into a larger maximization problem. The maximization in (9) can be achieved following the next three steps, as proven in the Appendix.
(i) The maximum of the objective function in (9), namely, I*, can be written as

I* = max_q max_φ F(q, φ),

where q is shorthand for p(u_k | y_k) and φ is shorthand for an arbitrary conditional pdf φ(x | u_{1:k}).

(ii) For a fixed q, F(q, φ) is maximized by the posterior, that is, φ(x | u_{1:k}) = p(x | u_{1:k}).

(iii) For a fixed φ, F(q, φ) is maximized by the deterministic rule that maps each y_k to the level j ∈ L_k = {1, 2, ..., M_k} maximizing the expected value of log φ(x | u_{1:k−1}, u_k = j).

It is shown in the Appendix that these steps find I*. Using (i), (ii), and (iii), an iterative algorithm to find the optimal kth quantization rule can be derived as Algorithm 1. In Algorithm 1, ε determines the condition to stop the iterations. Since the mutual information I(x; u_{1:k}) increases during each iteration and is upper-bounded, the algorithm converges to a local maximum.
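For the first sensor (k = 1), steps (ii) and (iii) admit a compact grid-based implementation. The sketch below (the discretization, function names, and toy model are our own assumptions, not the paper's Algorithm 1) alternates the posterior update with the reassignment step on a discretized joint p(x, y); as argued above, I(x; u) is nondecreasing over the iterations:

```python
import numpy as np

def mi_of(pxy, u, M):
    """I(x; u) induced by assigning the y-grid cells to levels via u."""
    pxu = np.stack([pxy[:, u == m].sum(1) for m in range(M)], axis=1)
    px = pxu.sum(1, keepdims=True)
    pu = pxu.sum(0, keepdims=True)
    nz = pxu > 0
    return float((pxu[nz] * np.log(pxu[nz] / (px @ pu)[nz])).sum())

def design_quantizer(pxy, M, iters=20):
    """Alternate step (ii) (phi = posterior p(x | u)) and step (iii)
    (map each y-cell to the level maximizing E[log phi | y])."""
    ny = pxy.shape[1]
    u = np.arange(ny) * M // ny                 # initial contiguous partition
    for _ in range(iters):
        pxu = np.stack([pxy[:, u == m].sum(1) for m in range(M)], axis=1)
        pu = pxu.sum(0)
        phi = pxu / np.where(pu > 0, pu, 1.0)   # step (ii): posterior per level
        pxgy = pxy / pxy.sum(0)                 # p(x | y) for each y-cell
        score = pxgy.T @ np.log(phi + 1e-12)    # step (iii) objective, (ny, M)
        u = score.argmax(1)
    return u

# Toy model: x ~ N(0, 1), y = x + N(0, 1) noise, both on finite grids
xs = np.linspace(-3, 3, 61)
ys = np.linspace(-4, 4, 81)
px = np.exp(-xs ** 2 / 2); px /= px.sum()
lik = np.exp(-(ys[None, :] - xs[:, None]) ** 2 / 2)
lik /= lik.sum(1, keepdims=True)
pxy = px[:, None] * lik

M = 4
u0 = np.arange(len(ys)) * M // len(ys)
u_opt = design_quantizer(pxy, M)
I0, I1 = mi_of(pxy, u0, M), mi_of(pxy, u_opt, M)
```

The monotonicity follows from the alternating-maximization argument: each of steps (ii) and (iii) can only increase F(q, φ), and F evaluated at the posterior equals I(x; u) up to the constant H(x).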

Channel-Aware Optimal Quantizers
The discussions up to this point have assumed ideal communication channels between the sensors and the FC. In real distributed sensing systems, due to nonideal communication channels, the quantized data generated by the sensors might not be received correctly at the FC, which affects the overall performance of the system. Hence, considering the channel effect when designing the quantizers is crucial [26]. For centralized quantization, [27-29] revise the MSE to include the channel effect and then jointly optimize the source encoders and the reconstruction levels at the receiver by minimizing this new MSE. For distributed quantization, channel-optimized quantizer design has been developed for hypothesis testing by minimizing the Bayesian cost [30, 31]. Recently, distributed channel-aware quantizer design for multiple correlated sources has been addressed in [32, 33], where source encoders are designed to quantize correlated sources in the presence of noisy communication channels. Reference [34] discusses the problem under a total power constraint and designs the quantizers by minimizing the signal distortion at the receiver. For multimedia applications in distributed networks, multiple description coding has been used to combat channel loss [33]. References [35-37] address the effect of imperfect transmission channels on multiple description coding algorithms for distributed video transmission. In this section, we incorporate the channel into our quantizer design in order to recover the signal more accurately at the destination.
Here we design optimal channel-aware quantizers for the distributed quantization of a noisy source using the MI measure. We assume that the communication channels between the sensors and the FC are independent of one another. In the presence of these noisy channels, we optimize the quantizers by maximizing the MI between the unknown parameter and the channel outputs. We use the Markov chain property in (1), and to solve the optimization problem we follow an approach similar to that of Section 4.
Due to channel errors, the received symbol at the FC, v_k ∈ L_k, might not be the same as the transmitted symbol u_k. We assume that the channel transition probabilities, that is, p(v_k = j | u_k = i), 1 ≤ i, j ≤ M_k, are known. Based on these transition probabilities, we can write the MI between x and the received symbols at the channels' outputs v_1, v_2, ..., v_N, that is, I(x; v_1, v_2, ..., v_N), or in short I(x; v_{1:N}). We then maximize I(x; v_{1:N}) to find the optimal channel-aware quantizers. Arguments similar to those preceding (5) are applicable here. Hence, the kth optimal channel-aware quantization rule is obtained as

Q_k* = arg max_{Q_k} I(x; v_{1:k}).

By the Markov chain property in (1), p(v_{1:k} | x) can be expressed in terms of the quantization rules and the channel transition probabilities, and therefore

Q_k* = arg max_{p(u_k | y_k)} Σ_{v_{1:k}} Σ_{u_k} ∫∫ p(x) p(y_k, v_{1:k−1} | x) p(u_k | y_k) p(v_k | u_k) log p(x | v_{1:k}) dy_k dx,

assuming that the channel between each sensor and the FC is independent of the other channels. As in Section 4, this maximization can be converted into a double-maxima problem of the form max_q max_φ F(q, φ), where q is shorthand for p(u_k | y_k) and φ is shorthand for an arbitrary conditional pdf φ(x | v_{1:k}), with v_k taking values in L_k = {1, 2, ..., M_k}. Finally, an iterative solution similar to Algorithm 1 can be proposed for finding the channel-aware quantization rules.
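Numerically, the channel is folded in by post-multiplying the joint p(x, u) with the transition matrix p(v | u). The received symbols can never carry more information than the transmitted ones (data processing), which is exactly why a design that ignores the channel overestimates what reaches the FC. A toy sketch, with a joint table and crossover probability of our own choosing:

```python
import numpy as np

def mi(pab):
    """Mutual information (nats) from a joint probability table p(a, b)."""
    pa = pab.sum(1, keepdims=True)
    pb = pab.sum(0, keepdims=True)
    nz = pab > 0
    return float((pab[nz] * np.log(pab[nz] / (pa @ pb)[nz])).sum())

# assumed joint p(x, u) of the source and the 4-level quantized data
pxu = np.array([[0.20, 0.04, 0.01, 0.00],
                [0.01, 0.20, 0.04, 0.00],
                [0.00, 0.04, 0.20, 0.01],
                [0.00, 0.01, 0.04, 0.20]])
pxu /= pxu.sum()

# 2-bit labels sent over a BSC with crossover eps, acting on each bit independently
eps = 0.05
bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
T = np.kron(bsc, bsc)        # T[i, j] = p(v = j | u = i) for the 2-bit word

pxv = pxu @ T                # joint p(x, v) after the channel
I_u, I_v = mi(pxu), mi(pxv)
```

Here I_v < I_u, so the channel-aware objective I(x; v_{1:k}) rewards quantizers whose levels remain distinguishable after the channel, not merely before it.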

Simulation Results
In this section the performance of our proposed algorithm is demonstrated using computer simulations and compared with other methods. In particular, we examine the performance of our MI-based quantization design for estimation and detection applications in Sections 6.1 and 6.2, respectively. The effect of nonideal channels on the optimal quantization rules is investigated in Section 6.3.

Estimation Application.
For a distributed sensing system with estimation purposes, the quantized values are used at the FC to estimate the unknown. To compare with [15], where the quantization rules are obtained by minimizing the MSE, we use a similar simulation scenario. The unknown parameter is distributed according to p(x) = N(0, 1). Two sensors are involved; that is, N = 2. The measurement noises n_1 and n_2 are additive Gaussian noises with correlation ρ and marginal distribution N(0, 1). The number of quantization levels for both sensors is M. At the FC we use the MMSE estimator to estimate x from the quantized measurements u_1 and u_2. Similar to [15], the initial quantization breakpoints are chosen from the optimal quantization rules of the Lloyd-Max algorithm [4]; that is, Γ_1^0 = Γ_2^0 = {−2.5, 0, 2.5}. Our algorithm finds the optimal quantizers Q_1* and Q_2* by maximizing the MI I(x; u_1, u_2). According to (4), I(x; u_1, u_2) can be broken down as I(x; u_1) + I(x; u_2 | u_1).
Based on Algorithm 1, first I(x; u_1) is maximized to find Q_1*, and then I(x; u_2 | u_1) is maximized to find Q_2*. Owing to this decomposition, the MI I(x; u_1, u_2), which is the sum of the two components, is maximized in two steps. Figure 2 shows the value of the MI at each iteration of the algorithm, for M = 4 and ρ = 0.
At each iteration of the algorithm, the current quantization rules are used to quantize the measurements y_1 and y_2 to u_1 and u_2, respectively. These values are then used to estimate x using the MMSE estimator; that is, x̂ = E{x | u_1, u_2}. The estimation performance at each iteration, in terms of MSE, is shown in Figure 3. It can be seen from Figures 2 and 3 that maximizing the MI decreases the estimation MSE.
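The MMSE estimate from quantized data reduces to a posterior-weighted average over a grid. A sketch under assumed breakpoints (illustrative values, not the designed rules of Table 1) and independent N(0, 1) measurement noises:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

xs = np.linspace(-5, 5, 2001)                 # grid for the prior p(x) = N(0, 1)
px = np.exp(-xs ** 2 / 2); px /= px.sum()
edges = np.array([-np.inf, -0.98, 0.0, 0.98, np.inf])  # assumed 4-level rule

def cell_prob(j, x):
    """p(u = j | x) for y = x + N(0, 1) noise and the assumed breakpoints."""
    return Phi(edges[j + 1] - x) - Phi(edges[j] - x)

def mmse_estimate(u1, u2):
    """x_hat = E[x | u1, u2] with conditionally independent sensor noises."""
    w = px * np.array([cell_prob(u1, x) * cell_prob(u2, x) for x in xs])
    return float((w * xs).sum() / w.sum())
```

For example, two top-level observations pull the estimate well above zero, while contradictory extreme levels (u1 = 0, u2 = 3) cancel to an estimate near zero by symmetry.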
The optimal quantization rules at the end of the iterations are represented by the sets of breakpoints Γ_1 and Γ_2 in Table 1. For different simulation scenarios, the final quantization rules and the final MSE are shown and compared with the results of the algorithm of [15]. Compared with [15], the final quantization rules are different, but the MSE performances are essentially the same.

Detection Application.
In a distributed sensing system with detection purposes, the FC uses the quantized data to perform a hypothesis test. We use our method of maximizing the MI to find the optimal quantization rules for the detection scenario and compare the performance with that of the algorithm of Poor [20], where Ali-Silvey distances [38] are used as the optimization criterion.
To simulate the detection scenario, we assume that the unknown x is a Bernoulli random variable which represents the absence (H_0) or presence (H_1) of the signal s; that is, p(x = s) = π and p(x = 0) = 1 − π. Each sensor makes an observation of x in additive Gaussian noise and sends the quantized observation to the FC. We use the algorithm in Section 4 to design the optimal rules for quantizing the measurements. Note that since x takes its values from the finite set {s, 0}, the integral over x in all equations becomes a summation over this set. At the FC, the Neyman-Pearson method is used to test the hypotheses.
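At the FC, the Neyman-Pearson test reduces to thresholding the log-likelihood ratio of the received quantization levels. A sketch with assumed breakpoints and signal amplitude (illustrative values, not the designed rules):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# p(u = j | x) for a 4-level quantizer of y = x + N(0, 1); breakpoints assumed
edges = [-np.inf, -0.5, 0.5, 1.5, np.inf]
def level_prob(j, x):
    return Phi(edges[j + 1] - x) - Phi(edges[j] - x)

s = 1.0  # assumed signal amplitude under H1

def llr(levels):
    """Log-likelihood ratio of H1 (x = s) vs H0 (x = 0), i.i.d. sensor noises."""
    return float(sum(np.log(level_prob(j, s) / level_prob(j, 0.0))
                     for j in levels))

# Neyman-Pearson: decide H1 when llr(received levels) exceeds a threshold
# chosen to meet the target false-alarm probability.
```

Higher levels are more likely under H1 than H0, so the per-level log-likelihood ratio is increasing in the level index for this rule.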
To compare with Poor [20], we assume equally likely H_0 and H_1; that is, π = 1/2. The measurement noises are i.i.d. N(0, 1). The probability of detection error is shown in Table 2 for two different signal energies. Our method is indicated by "MI" in the table; the results based on the Matsushita distance and divergence criteria from [20] are also indicated. It can be seen from the error probabilities that the detection performance is similar to, and in some cases better than, that of [20].

Channel Effect.
The presence of a nonideal communication channel between each sensor and the FC affects the design of the optimal local quantizers. Using the design algorithm developed in Section 5, we find the channel-aware local quantizers. The simulation results confirm that the optimal quantizers assuming ideal channels are different from the optimal quantizers in the presence of nonideal channels.
To compare the channel-aware and channel-unaware quantization schemes, we consider an estimation application. We assume that sensor k's quantized data u_k, 1 ≤ k ≤ N, is mapped to a binary word of size log_2 M and sent to the FC via a binary symmetric channel (BSC) with crossover probability p_e. The received symbol at the FC is v_k, and the transition probabilities, that is, p(v_k = j | u_k = i), 1 ≤ i, j ≤ M, are derived from p_e. In the simulation examples we assume two sensors with identical crossover probabilities for all channels. The design algorithm is run for different values of p_e, and the final quantizers are given in Table 3. It can be seen from Table 3 that the optimal quantization solution changes depending on the channel error probability. Consequently, if, for instance, one deploys the quantizers designed for p_e = 0 in a scenario where p_e = 0.05, the MSE will be 0.497, while using the optimal quantizers designed for p_e = 0.05 gives an MSE of 0.485. For the detection application over nonideal communication channels, we have compared our results with those of [31] for different problem setups. Reference [31] develops an iterative algorithm that minimizes the error probability at the fusion centre after finding, in each iteration, the optimal fusion rule for the quantizers of that iteration. For all setups, we choose N = 2, M = 4, s = 1, and i.i.d. measurement noises with pdf N(0, 1); the signal prior and the channel crossover probability vary across scenarios. The optimized quantizers and the final error probability are given in Table 4. It can be seen that for all cases our results are very close to, and in some cases slightly better than, those of [31].
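The M × M transition matrix induced by sending log_2 M-bit labels over a BSC follows directly from the Hamming distance between labels. A short sketch (natural binary labeling assumed):

```python
import numpy as np

def bsc_word_matrix(M, pe):
    """Transition matrix p(v = j | u = i) when the log2(M)-bit label of u
    crosses a BSC with crossover probability pe, independently per bit."""
    b = int(np.log2(M))
    T = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            d = bin(i ^ j).count("1")   # Hamming distance between the labels
            T[i, j] = pe ** d * (1 - pe) ** (b - d)
    return T

T = bsc_word_matrix(4, 0.05)
```

For M = 4 and p_e = 0.05, a symbol survives intact with probability 0.95^2 and flips both bits with probability 0.05^2; label assignment matters, since levels whose labels differ in one bit are confused far more often than levels differing in two.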

Conclusion
In this paper, we proposed an algorithm based on maximizing the MI measure for jointly designing optimal channel-aware local quantization rules in a distributed sensing system. The MI allows us to design general-purpose quantizers that can later be deployed for different applications, for example, estimation or detection. We have shown that the performance of the optimal quantizers based on the MI is essentially the same as that of optimal quantizers from other methods that specifically target the estimation or detection application. We also observed that the optimal local quantizers in the presence of nonideal channels differ from the local quantizers optimized without considering the channel effect.