An Information-Theoretic Approach to Joint Sensing and Communication

A communication setup is considered where a transmitter wishes to convey a message to a receiver and simultaneously estimate the state of that receiver through a common waveform. The state is estimated at the transmitter by means of generalized feedback, i.e., strictly causal channel outputs, and the known waveform. The scenario at hand is motivated by joint radar and communication, which aims to co-design radar sensing and communication over a shared spectrum and hardware. For the case of memoryless single-receiver channels with i.i.d. time-varying state sequences, we fully characterize the capacity-distortion tradeoff, defined as the largest achievable rate below which a message can be conveyed reliably while satisfying some distortion constraints on state sensing. We propose a numerical method to compute the optimal input that achieves the capacity-distortion tradeoff. Then, we address memoryless state-dependent broadcast channels (BCs). For physically degraded BCs with i.i.d. time-varying state sequences, we characterize the capacity-distortion tradeoff region as a rather straightforward extension of single-receiver channels. For general BCs, we provide inner and outer bounds on the capacity-distortion region, as well as a sufficient condition under which this capacity-distortion region equals the product of the capacity region and the set of achievable distortions. A number of illustrative examples demonstrate that the optimal co-design schemes outperform conventional schemes that split the resources between sensing and communication.


I. INTRODUCTION
Future generation wireless networks are expected to support a number of autonomous and intelligent applications that strongly rely on accurate sensing and localization techniques [3]. The key enabler of these applications is the ability to continuously sense the dynamically changing environment, hereafter called the state, and to react accordingly by exchanging information between nodes. Although most current systems have separate implementations of sensing and communication systems, the high cost of spectrum and hardware encourages integrating both tasks via a single waveform and a single hardware platform (see e.g. [4], [5] and references therein). A typical example is joint radar sensing and communication, where a transmitter equipped with a colocated radar receiver wishes to convey a message to an (already detected) receiver and simultaneously estimate state parameters of that receiver. The scenario at hand has been extensively studied in the literature (see e.g. [6], [7] and references therein). In particular, a number of joint sensing and communication schemes, or co-design schemes, have been proposed to optimize performance metrics capturing some tension between the two functions [8]-[12]. Although these works provide system guidelines or propose waveforms suitable to some specific scenarios, none has addressed the fundamental performance limits above which a joint sensing and communication system cannot operate, irrespective of computational complexities, choices of state parameters, or further assumptions.
Such an observation motivates us to study the fundamental limit of joint sensing and communication from an information-theoretic perspective. To this end, we build on a simple single-user communication model where a transmitter exploits strictly causal channel outputs, or generalized feedback, for state sensing while the receiver has perfect state knowledge [1]. The proposed model captures two underlying assumptions used in radar signal processing. On the one hand, generalized feedback captures the inherently passive nature of the backscattered signal observed at the transmitter, which cannot be controlled but is determined by its surrounding environment. On the

B. Organization
The rest of this paper is organized as follows. Section II describes the model, formulates the joint sensing and communication problem in a single-receiver channel, and provides the corresponding capacity-distortion-cost tradeoff. Section III extends the obtained results to two-user broadcast channels. Finally, Section IV concludes the paper.

C. Notation
We use calligraphic letters to denote sets, e.g., X. The sets of real and nonnegative real numbers are denoted by R and R_0^+, respectively. Random variables are denoted by uppercase letters, e.g., X, and their realizations by lowercase letters, e.g., x. For vectors we use boldface notation, i.e., lowercase boldface letters such as x for deterministic vectors.
We use [1 : X] to denote the set {1, · · · , X}. We use X^n for the tuple of random variables (X_1, · · · , X_n). We abbreviate independent and identically distributed as i.i.d. and probability mass function as pmf. Logarithms are taken with respect to base 2.

A. System Model
Consider the point-to-point communication scenario depicted in Fig. 1, where a transmitter wishes to communicate a message to a receiver over a memoryless state-dependent channel and simultaneously estimate the state from generalized feedback. In order to formulate the joint sensing and communication problem, we consider a state-dependent memoryless channel such that the channel output at the receiver Y_i and the feedback signal Z_i at a given time i are generated according to the stationary channel law P_{YZ|XS}(·, ·|x_i, s_i) given the time-i channel input X_i = x_i and state realization S_i = s_i, irrespective of the past inputs, outputs, and state signals. Except for some Gaussian examples, we assume that the channel states S_i, inputs X_i, outputs Y_i, and feedback signals Z_i take values in given finite sets S, X, Y, and Z, respectively. The state sequence {S_i}_{i≥1} is assumed i.i.d. according to a given state distribution P_S(·) and perfectly known to the receiver.
A (2^{nR}, n) code for the state-dependent memoryless channel (SDMC) consists of 1) a discrete message set W of size |W| ≥ 2^{nR}; 2) a sequence of encoding functions φ_i : W × Z^{i−1} → X, for i = 1, 2, . . . , n; 3) a decoding function g : S^n × Y^n → W; 4) a state estimator h : X^n × Z^n → Ŝ^n, where Ŝ denotes a given finite reconstruction alphabet. For a given code, the random message W is uniformly distributed over the message set W and the inputs are obtained as X_i = φ_i(W, Z^{i−1}), for i = 1, . . . , n. The corresponding channel outputs Y_i and Z_i at time i are obtained from the state S_i and the input X_i according to the SDMC transition law P_{YZ|SX}. Let Ŝ^n := (Ŝ_1, · · · , Ŝ_n) = h(X^n, Z^n) denote the state estimate at the transmitter and Ŵ = g(S^n, Y^n) the decoded message at the receiver.
The quality of the state estimates is measured by the expected average per-block distortion

∆^{(n)} := E[ (1/n) ∑_{i=1}^n d(S_i, Ŝ_i) ],     (4b)

where d : S × Ŝ → R_0^+ is a given bounded distortion function: max_{(s,ŝ)∈S×Ŝ} d(s, ŝ) < ∞.
In practical communication systems, we typically impose an expected cost constraint on the channel inputs, such as an average or peak power constraint. These cost constraints can often be expressed as

(1/n) ∑_{i=1}^n E[b(X_i)] ≤ B,

for some given cost function b : X → R_0^+.

Definition 1. A rate-distortion-cost tuple (R, D, B) is said to be achievable if there exists a sequence (in n) of (2^{nR}, n) codes that simultaneously satisfy

lim_{n→∞} P_e^{(n)} = 0,   limsup_{n→∞} ∆^{(n)} ≤ D,   limsup_{n→∞} (1/n) ∑_{i=1}^n E[b(X_i)] ≤ B,

for P_e^{(n)} := Pr(Ŵ ≠ W). The capacity-distortion-cost tradeoff C(D, B) is the largest rate R such that the rate-distortion-cost tuple (R, D, B) is achievable.
The main result of this section is an exact characterization of C(D, B). We begin by describing the optimal estimator h, which is independent of the choice of encoding and decoding functions and operates on a symbol-by-symbol basis, i.e., it computes the estimate Ŝ_i only as a function of X_i and Z_i, not of the other inputs and feedback signals.
Lemma 1. Irrespective of the choice of encoding and decoding functions, the distortion ∆^{(n)} in (4b) is minimized by the estimator

ŝ*(x, z) := argmin_{s' ∈ Ŝ} ∑_{s ∈ S} P_{S|XZ}(s|x, z) d(s, s'),     (5)

where ties can be broken arbitrarily, and which only depends on the SDMC channel law P_{YZ|SX} and the state distribution P_S.
Proof: See Appendix A.

Lemma 1 implies that we can focus without loss of generality on a symbol-by-symbol deterministic estimator. Based on (5), we define the estimation cost c(x) of the optimal estimator as

c(x) := E[ d(S, ŝ*(X, Z)) | X = x ].     (8)

Now we are ready to present the capacity-distortion-cost tradeoff.
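The estimator of Lemma 1 and the cost c(x) in (8) are directly computable from P_S, the feedback marginal P_{Z|SX}, and d. The following sketch illustrates the computation; the array layouts are our own, and the test case is the binary channel of Example 1 (Y = SX, perfect feedback, Hamming distortion):

```python
import numpy as np

# Optimal symbol-by-symbol estimator of Lemma 1 and its cost c(x), computed
# from the state pmf P_S, the feedback marginal P_{Z|SX}, and the distortion d.
# Alphabets are index sets {0, ..., |.|-1}.  Example values: S ~ Bernoulli(q),
# Y = S*X, perfect feedback Z = Y, Hamming distortion.
q = 0.4
P_S = np.array([1 - q, q])                      # P_S[s]
P_Z_SX = np.zeros((2, 2, 2))                    # P_Z_SX[z, s, x]
for s in range(2):
    for x in range(2):
        P_Z_SX[s * x, s, x] = 1.0               # Z = Y = S*X (deterministic)
d = np.array([[0, 1], [1, 0]], dtype=float)     # Hamming: d[s, s_hat]

def optimal_estimator(P_S, P_Z_SX, d):
    """Return s_hat[x, z] minimizing E[d(S, s') | X=x, Z=z], and cost c[x]."""
    nZ, nS, nX = P_Z_SX.shape
    joint = P_Z_SX * P_S[None, :, None]               # P(s, z | x), [z, s, x]
    P_Z_X = joint.sum(axis=1)                         # P(z | x), [z, x]
    s_hat = np.zeros((nX, nZ), dtype=int)
    c = np.zeros(nX)
    for x in range(nX):
        for z in range(nZ):
            if P_Z_X[z, x] == 0:
                continue                              # pair (x, z) never occurs
            post = joint[z, :, x] / P_Z_X[z, x]       # posterior P_{S|XZ}(.|x,z)
            exp_d = post @ d                          # E[d(S, s')] for each s'
            s_hat[x, z] = int(np.argmin(exp_d))
            exp_cost = exp_d[s_hat[x, z]]
            c[x] += P_Z_X[z, x] * exp_cost            # accumulate E[d | X = x]
    return s_hat, c

s_hat, c = optimal_estimator(P_S, P_Z_SX, d)
print(s_hat, c)   # c = [min(q, 1-q), 0]: x=1 reveals S, x=0 reveals nothing
```

For this channel the sketch reproduces the costs derived later in Example 1: c(1) = 0 since the feedback reveals the state, and c(0) = min{q, 1−q}, the best constant-estimator error.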

B. Capacity-Distortion-Cost Tradeoff
In order to characterize some useful properties of the capacity-distortion-cost function, we define the following sets:

P_D := { P_X : E[c(X)] ≤ D },   P_B := { P_X : E[b(X)] ≤ B }.     (9a), (9b)

Then, the minimum distortion for a given cost B is given by

D_min(B) := min_{P_X ∈ P_B} E[c(X)].     (10)

Definition 2. Define the information-theoretic tradeoff function

C_inf(D, B) := max_{P_X} I(X; Y | S),     (11)

where (X, S, Y, Z) ∼ P_X P_S P_{YZ|SX} and the maximum is over all P_X satisfying both the distortion and cost constraints (9a) and (9b).
Lemma 2. Given an SDMC P_{YZ|SX} with state distribution P_S, the tradeoff function C_inf(D, B) has the following properties: it is nondecreasing and concave in (D, B), and for every B it saturates at the channel capacity:

C_inf(D, B) = C_NoEst(B)   for all D ≥ D_max(B),

where C_NoEst(B) := max_{P_X ∈ P_B} I(X; Y | S) denotes the classical channel capacity of the SDMC for a given cost B, and D_max(B) denotes the corresponding distortion for P_{X_max} := argmax_{P_X ∈ P_B} I(X; Y | S).
Proof: The proof is a straightforward extension of [14, Corollary 1] to the case of two cost functions and the state-dependent channel. The nondecreasing property follows immediately from the definition (11), because P_{D_1} ⊆ P_{D_2} and P_{B_1} ⊆ P_{B_2} for any D_1 ≤ D_2 and B_1 ≤ B_2.
In order to verify the concavity of C_inf(D, B) with respect to (D, B), we consider time-sharing between two input distributions, denoted by P_X^{(1)} and P_X^{(2)}, that achieve C_inf(D_1, B_1) and C_inf(D_2, B_2), respectively. To make the dependence of the mutual information on the input distribution explicit, we adopt the following notation: for any pmf P_X over the input alphabet X, let I(P_X, P_{Y|XS} | P_S) := I(X; Y | S) for (X, S, Y) ∼ P_X P_S P_{Y|XS}. For any θ ∈ (0, 1), we have:

θ C_inf(D_1, B_1) + (1 − θ) C_inf(D_2, B_2)
  =(a) θ I(P_X^{(1)}, P_{Y|XS} | P_S) + (1 − θ) I(P_X^{(2)}, P_{Y|XS} | P_S)
  ≤(b) I(θ P_X^{(1)} + (1 − θ) P_X^{(2)}, P_{Y|XS} | P_S)
  ≤(c) C_inf(θ D_1 + (1 − θ) D_2, θ B_1 + (1 − θ) B_2),

where (a) follows by definition, (b) follows from the concavity of the mutual information functional with respect to the input distribution, and (c) follows by the linearity of the constraints and because for any k = 1, 2 the pmf P_X^{(k)} has expected cost no larger than B_k and expected distortion no larger than D_k, so the mixture is feasible for the pair (θD_1 + (1−θ)D_2, θB_1 + (1−θ)B_2). This establishes the concavity of C_inf(D, B).

We now state the main result of this section.

Theorem 1. The capacity-distortion-cost tradeoff of an SDMC P_{YZ|SX} with state distribution P_S is:

C(D, B) = C_inf(D, B).

Proof: See Appendix B.

Combining Lemma 2 and Theorem 1, we can conclude that the capacity-distortion-cost tradeoff C(D, B) is nondecreasing and concave in D ≥ D_min and B ≥ 0, and for any B ≥ 0 it saturates at the channel capacity C_NoEst(B).
For many channels, given B ≥ 0, the tradeoff C(D, B) is strictly increasing in D until it reaches C_NoEst(B). However, for SDMCs and costs B ≥ 0 where the capacity-achieving input distribution P_{X_max} := argmax_{P_X ∈ P_B} I(X; Y | S) also achieves minimum distortion D_min(B), the capacity-distortion tradeoff is constant, C(D, B) = C_NoEst(B), irrespective of the allowed distortion D. This is in particular the case when the expected distortion E[d(S, ŝ*(X, Z))] does not depend on the input distribution P_X. The following corollary identifies a set of SDMCs P_{YZ|SX} and state distributions P_S where this holds for all costs B ≥ 0.
Corollary 1. Assume that there exists a function ψ(·) with domain X × Z such that, irrespective of the input distribution P_X, the following two Markov chain conditions hold for (S, X, Z) ∼ P_S P_X P_{Z|SX}. In this case, for any given B, the tradeoff function C(D, B) is constant over D ≥ D_min and equal to the channel capacity of the SDMC:

C(D, B) = C_NoEst(B).

Proof: See Appendix C.
The following state-dependent erasure channel satisfies the conditions of the above corollary. Let S be Bernoulli-p, and let Y equal the erasure symbol "?" when S = 1 and Y = X when S = 0. Moreover, assume perfect output feedback, i.e., Y = Z. For the choice ψ(X, Z) = 1{Z = "?"} = S, both Markov chains in Corollary 1 are trivially satisfied because S and X are independent.
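For this erasure channel, a quick Monte Carlo sanity check (our own construction, not from the paper) confirms the no-tradeoff property: the estimator ψ recovers S exactly, so zero distortion is achieved under every input distribution:

```python
import random

# Sanity check for the erasure-channel example: with perfect output feedback
# Z = Y, the estimator psi(x, z) = 1{z == '?'} recovers S exactly, so the
# achievable distortion is 0 for *any* input distribution P_X.
random.seed(0)
p = 0.3                                   # S ~ Bernoulli(p)
for p_x1 in (0.1, 0.5, 0.9):              # several input distributions
    errors = 0
    for _ in range(10_000):
        s = int(random.random() < p)
        x = int(random.random() < p_x1)
        y = '?' if s == 1 else x          # output erased exactly when S = 1
        z = y                             # perfect output feedback
        s_hat = int(z == '?')             # psi(x, z)
        errors += (s_hat != s)
    assert errors == 0                    # zero Hamming distortion
print("distortion 0 for every tested P_X")
```

Since the sensing distortion is zero regardless of P_X, the input can be chosen purely to maximize the rate, which is exactly the product-region behavior stated in Corollary 1.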
C. Blahut-Arimoto Type Algorithm to Evaluate Theorem 1

Through simple time-sharing arguments, it can be shown that for a given feasible B, the set of achievable (R, D) pairs is convex. Its boundary is thus characterized by solving the following parameterized optimization problem for each µ ≥ 0:

max_{P_X ∈ P_B} [ I(X; Y | S) − µ E[c(X)] ].     (19)

Notice that the conditional mutual information functional can explicitly be written as

I(X; Y | S) = ∑_{s,x,y} P_S(s) P_X(x) P_{Y|XS}(y|x, s) log ( P_{Y|XS}(y|x, s) / ∑_{x'} P_X(x') P_{Y|XS}(y|x', s) ),

for the state pmf P_S and the SDMC transition law P_{YZ|XS}. Notice that for µ = 0 we simply obtain the capacity of the SDMC under the input cost constraint (disregarding the distortion constraint), while for µ → ∞ we obtain the minimum possible distortion subject to the same input cost constraint. We remark that the parameterized optimization problem above differs from the standard Blahut-Arimoto algorithm with cost constraints [22, Section IV] only in that 1) the objective function (19) includes an additional penalty term, and 2) the mutual information functional is I(X; Y | S) instead of I(X; Y), which reflects the state-dependent channel and the state knowledge at the receiver. Since the penalty term is additive and linear in P_X, all concavity properties desired for a Blahut-Arimoto type algorithm remain valid. The following Theorem 2 can then be proved by standard alternating optimization techniques, in analogy to the proof of the Blahut-Arimoto algorithm [21], [22].
By letting Q_{X|YS} denote a general conditional pmf of X given (Y, S), we consider the function:

J_µ(P_X, P_S, P_{YZ|XS}, Q_{X|YS}) := ∑_{s,x,y} P_S(s) P_X(x) P_{Y|XS}(y|x, s) log ( Q_{X|YS}(x|y, s) / P_X(x) ) − µ ∑_x P_X(x) c(x).

Theorem 2. Let the state pmf P_S and the SDMC transition law P_{YZ|XS} be given. The following statements hold:
a) For each µ ≥ 0, denote by L_µ(B) the optimal value of (19). Then,

L_µ(B) = max_{P_X ∈ P_B} max_{Q_{X|YS}} J_µ(P_X, P_S, P_{YZ|XS}, Q_{X|YS}).

b) Fix P_X ∈ P_B. Then, J_µ(P_X, P_S, P_{YZ|XS}, Q_{X|YS}) is maximized by choosing Q_{X|YS} as

Q*_{X|YS}(x|y, s) = P_X(x) P_{Y|XS}(y|x, s) / ∑_{x'} P_X(x') P_{Y|XS}(y|x', s).

c) Fix Q_{X|YS}. Then, J_µ(P_X, P_S, P_{YZ|XS}, Q_{X|YS}) is maximized by choosing P_X ∈ P_B as

P_X(x) = (1/Z) exp( ∑_{s,y} P_S(s) P_{Y|XS}(y|x, s) log Q_{X|YS}(x|y, s) − µ c(x) − λ b(x) ),     (24)

where Z normalizes P_X to sum to one, and λ ≥ 0 is chosen so that ∑_{x∈X} P_X(x) b(x) ≤ B holds with equality when evaluated for P_X in (24).
Based on Theorem 2, we propose the algorithm summarized in Algorithm 1. The algorithm yields a pair of capacity-distortion values (C_µ(B), D_µ(B)) for any fixed input cost B and µ. Letting P_{X,µ}^∞ denote the convergent input distribution produced by the algorithm for given B and µ, we have

C_µ(B) = I(P_{X,µ}^∞, P_{Y|XS} | P_S),   D_µ(B) = ∑_x P_{X,µ}^∞(x) c(x).

By varying µ, we obtain the capacity-distortion tradeoff for fixed input cost B. Moreover, by varying the input cost B, we obtain the family of all such tradeoffs, and therefore the whole boundary of the achievable capacity-distortion-cost tradeoff region.
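A minimal sketch of the alternating updates b) and c) of Theorem 2, with the cost constraint inactive (λ = 0), run on the binary multiplicative-state channel of Example 1; the array layouts and the fixed iteration budget are our own choices, not the paper's:

```python
import numpy as np

# Blahut-Arimoto type iteration maximizing I(X;Y|S) - mu*E[c(X)] over P_X.
# Channel: Y = S*X with S ~ Bernoulli(q), perfect feedback, Hamming
# distortion, for which the estimation cost of Lemma 1 is c = [min(q,1-q), 0].
# All logs are natural (nats); the rate is converted to bits at the end.
q, mu = 0.4, 2.0
P_S = np.array([1 - q, q])
W = np.zeros((2, 2, 2))                     # W[y, x, s] = P_{Y|XS}(y|x,s)
for x in range(2):
    for s in range(2):
        W[s * x, x, s] = 1.0                # Y = S*X, deterministic
c = np.array([min(q, 1 - q), 0.0])          # estimation cost c(x)

P_X = np.array([0.5, 0.5])                  # initial input pmf
for _ in range(500):
    # Update b): Q(x|y,s) = P_X(x) W(y|x,s) / sum_x' P_X(x') W(y|x',s)
    num = P_X[None, :, None] * W
    den = num.sum(axis=1, keepdims=True)
    Q = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Update c): P_X(x) ∝ exp( sum_{s,y} P_S(s) W(y|x,s) log Q(x|y,s) - mu c(x) )
    logQ = np.log(Q, out=np.zeros_like(Q), where=Q > 0)
    expo = np.einsum('s,yxs,yxs->x', P_S, W, logQ) - mu * c
    P_X = np.exp(expo - expo.max())         # subtract max for stability
    P_X /= P_X.sum()

# Operating point on the tradeoff curve: rate (in bits) and distortion.
P_Y_S = np.einsum('x,yxs->ys', P_X, W)      # P_{Y|S}(y|s)
logr = np.log(np.where(W > 0, W / np.maximum(P_Y_S[:, None, :], 1e-300), 1.0))
R_bits = np.einsum('s,x,yxs->', P_S, P_X, W * logr) / np.log(2)
D = float(P_X @ c)
print(P_X, R_bits, D)
```

For this particular channel the fixed point can be checked by hand: the update reduces to P_X(x) ∝ P_X(x)^{1−q} e^{−µ c(x)}, whose fixed point is P_X(0) = 1/(1 + e^µ), so sweeping µ traces out the (R, D) curve of Corollary 2.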

D. Examples
Before presenting our examples, we present two baseline schemes.

1) Baseline Schemes: We consider two baseline schemes that time-share (TS) between two operating modes. The first baseline scheme, termed Basic TS scheme, is unable to simultaneously perform the sensing and communication tasks and splits its resources (time or bandwidth) between the following two modes:

• Sensing mode without communication (achieves the rate-distortion pair (0, D_min(B))): The input pmf P_X is chosen to minimize the distortion:

P_{X_min} := argmin_{P_X ∈ P_B} E[c(X)],     (30)

and thus the minimum distortion D_min(B) defined in (10) is achieved. Due to the lack of communication capability, the communication rate is zero.
• Communication mode without sensing (achieves (C_NoEst(B), D_trivial(B))): The input pmf P_X is chosen to maximize the rate:

P_{X_max} := argmax_{P_X ∈ P_B} I(X; Y | S),     (31)

and this mode thus communicates at a rate equal to the channel capacity C_NoEst(B). Due to the lack of proper sensing capabilities, the estimator is set to a constant value regardless of the feedback and the input signals. The mode thus achieves distortion

D_trivial(B) := min_{s' ∈ Ŝ} E[d(S, s')].

The second baseline scheme is called Improved TS scheme and can simultaneously perform the communication and sensing tasks. This scheme time-shares between the following modes.
• Sensing mode with communication (achieves (R_min(B), D_min(B))): The input pmf P_X is chosen according to (30) to achieve the minimum distortion. The chosen pmf P_{X_min} can achieve the communication rate

R_min(B) := I(X; Y | S)   evaluated for X ∼ P_{X_min}.

• Communication mode with sensing (achieves (C_NoEst(B), D_max(B))): The input pmf P_{X_max} is chosen as in (31) to maximize the communication rate. The mode thus communicates at the capacity C_NoEst(B) of the channel. Sensing is performed by means of the optimal estimator in (5). The mode thus achieves distortion

D_max(B) := E[c(X)]   evaluated for X ∼ P_{X_max}.

It is worth noticing that for any cost B ≥ 0, the two operating points of the two modes in the Improved TS scheme, (R_min(B), D_min(B)) and (C_NoEst(B), D_max(B)), also lie on the capacity-distortion-cost tradeoff curve C(D, B) presented in Theorem 1. These two points are thus also operating points of any optimal co-design scheme. As the following examples show, however, all other operating points of the Improved TS scheme are typically suboptimal compared to an optimal co-design scheme.
2) Example 1: Binary Channel with Multiplicative Bernoulli State: Consider a channel Y = SX with binary alphabets X = S = Y = {0, 1} and where the state S is Bernoulli-q, for q ∈ (0, 1). We assume perfect output feedback to the transmitter, Y = Z, and consider the Hamming distortion measure d(s, ŝ) = s ⊕ ŝ. No cost constraint is imposed.
The following corollary specializes Theorem 1 to this example.
Corollary 2. The capacity-distortion tradeoff of a binary channel with multiplicative Bernoulli state is given by

C(D) = q H_b( min{ D / min{q, 1 − q}, 1/2 } ),

where H_b(p) denotes the binary entropy function. In other words, the curve C(D) is parameterized as

(R, D) = ( q H_b(p), p · min{q, 1 − q} ),   p ∈ [0, 1/2],

with p := P_X(0).

Proof: Since Y is deterministic given (S, X), and in particular equals 0 whenever S = 0, we have:

I(X; Y | S) = P_S(1) I(X; Y | S = 1) = q H(X).

Setting p := P_X(0), we obtain I(X; Y | S) = q H_b(p). To calculate the distortion, we notice that the optimal estimator ŝ*(·, ·) in Lemma 1 sets

ŝ*(1, z) = z,   ŝ*(0, z) = argmax_{s∈{0,1}} P_S(s).

In fact, whenever x = 1 the transmitter acquires full state knowledge because z = y = s. In this case c(x = 1) = 0. For x = 0, the transmitter does not receive any useful information about the state and hence uses the best constant estimator, irrespective of the feedback z. In this case,

c(x = 0) = min{q, 1 − q},     (40)

where we used the independence of S and X. The expected distortion of the optimal estimator thus evaluates to:

E[c(X)] = p · min{q, 1 − q}.     (41)

The capacity-distortion tradeoff of Corollary 2 is illustrated in Fig. 2 for state parameter q = 0.4. The figure also compares the performances of the two baseline TS schemes. We observe a significant gain of an optimal co-design scheme over the two TS baseline schemes. We conclude this example with a derivation of the parameters of the TS schemes.
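Writing the tradeoff in closed form as C(D) = q·H_b(min{D/min{q, 1−q}, 1/2}) (equivalent to the parameterization R = q H_b(p), D = p·min{q, 1−q} in the proof), a brute-force check over p confirms the expression; the grid resolution and test values are our own:

```python
import numpy as np

# Brute-force check of Corollary 2: maximize q*H_b(p) subject to
# p*min(q,1-q) <= D (with p = P_X(0)) and compare with the closed form
#   C(D) = q * H_b( min(D/min(q,1-q), 1/2) ).
def H_b(p):
    """Binary entropy in bits, clipped away from 0 and 1 for stability."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

q = 0.4
m = min(q, 1 - q)
ps = np.linspace(0.0, 1.0, 100_001)          # fine grid over p = P_X(0)
for D in (0.05, 0.1, 0.15, m / 2, m):
    brute = q * H_b(ps[ps * m <= D]).max()   # definition, brute force
    closed = q * H_b(min(D / m, 0.5))        # closed form of Corollary 2
    assert abs(brute - closed) < 1e-3
print("closed form matches brute force")
```

The check also makes the saturation visible: once D ≥ min{q, 1−q}/2, the feasible set contains p = 1/2 and C(D) stays at the capacity q.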
To determine the performance of the basic TS scheme, we recall that the best constant estimator (which ignores the feedback) is ŝ_const = argmax_{s∈{0,1}} P_S(s), which allows us to conclude that D_trivial = min{q, 1 − q}. The basic TS scheme thus achieves all rate-distortion pairs on the line connecting the points (0, 0) and (q, min{q, 1 − q}).

3) Example 2: Real Gaussian Channel with Rayleigh Fading: We consider the real Gaussian channel with Rayleigh fading

Y_i = S_i X_i + N_i,

with i.i.d. states S_i ∼ N(0, 1) and i.i.d. Gaussian noise N_i, where X_i is the channel input satisfying the average power constraint lim_{n→∞} (1/n) ∑_{i=1}^n E[X_i^2] ≤ B, and where the transmitter observes a feedback signal corrupted by the noise sequence {N_fb,i}, which is i.i.d. zero-mean Gaussian of variance σ_fb^2 ≥ 0. We consider the quadratic distortion measure d(s, ŝ) = (s − ŝ)^2.
First, we characterize the two operating points achieved by the Improved TS baseline scheme. The capacity of this channel is achieved with a Gaussian input X_max ∼ N(0, B), and thus the communication mode with sensing achieves the rate-distortion pair (C_NoEst(B), D_max(B)); the numerical values in Fig. 3 are obtained for σ_fb^2 = 1 and B = 10 dB. Minimum distortion D_min is achieved by 2-ary pulse amplitude modulation (PAM), and thus the sensing mode with communication achieves the rate-distortion pair (R_min(B), D_min(B)), where the numerical values again correspond to σ_fb^2 = 1 and B = 10 dB. Next, we characterize the performance of the basic TS baseline scheme. The best constant estimator for this channel is ŝ = 0, and the communication mode without sensing achieves the rate-distortion pair (C_NoEst(B), D_trivial(B) = 1). The sensing mode without communication achieves the rate-distortion pair (0, D_min(B)). In Fig. 3, we compare the rate-distortion tradeoffs achieved by these two TS baseline schemes with a numerical approximation of the capacity-distortion-cost tradeoff C(D, B) of this channel. As previously explained, C(D, B) also passes through the two end points (R_min(B), D_min(B)) and (C_NoEst(B), D_max(B)) of the Improved TS scheme. We use the Blahut-Arimoto type Algorithm 1 to obtain a numerical approximation of the points on C(D, B) between these two operating points. Specifically, the input alphabet is quantized to an M = 16-ary PAM constellation, the Gaussian noise N is quantized with a centered, equally-spaced 50-point alphabet, and the state S is quantized by applying an equally-spaced 8000-point quantizer to the chi-square distributed random variable S^2. Denoting the quantized input, noise, and state by X_q, N_q, and S_q, we keep our multiplicative-state, additive-noise channel model to generate the channel outputs used to run Algorithm 1 and obtain the numerical approximations.

III. MULTIPLE RECEIVERS
In this section, we consider joint sensing and communication over two-receiver broadcast channels. Consider the two-receiver broadcast channel scenario depicted in Fig. 4. The model comprises a two-dimensional memoryless state sequence {(S_{1,i}, S_{2,i})}_{i≥1} whose samples at any given time i are distributed according to a given joint law P_{S_1S_2} over the state alphabets S_1 × S_2. Receiver 1 observes state sequence {S_{1,i}} and Receiver 2 observes state sequence {S_{2,i}}. The transmitter communicates with both receivers over a state-dependent memoryless broadcast channel (SDMBC), where given time-i input X_i = x and state realizations S_{1,i} = s_1 and S_{2,i} = s_2, the time-i outputs Y_{1,i} and Y_{2,i} observed at the receivers and the transmitter's feedback signal Z_i are distributed according to the stationary channel transition law P_{Y_1Y_2Z|S_1S_2X}(·, ·, ·|s_1, s_2, x). We again assume that all alphabets are finite.

A. System Model
The goal of the transmitter is to convey a common message W_0 to both receivers and individual messages W_1 and W_2 to Receivers 1 and 2, respectively, while estimating the state sequences {S_{1,i}} and {S_{2,i}} within some target distortions. For simplicity, the input cost constraint is omitted.
A (2^{nR_0}, 2^{nR_1}, 2^{nR_2}, n) code for an SDMBC thus consists of 1) three message sets W_0, W_1, W_2 of sizes |W_k| ≥ 2^{nR_k}, for k = 0, 1, 2; 2) a sequence of encoding functions φ_i : W_0 × W_1 × W_2 × Z^{i−1} → X, for i = 1, 2, . . . , n; 3) for each k = 1, 2 a decoding function g_k : S_k^n × Y_k^n → W_0 × W_k; 4) for each k = 1, 2 a state estimator h_k : X^n × Z^n → Ŝ_k^n, where Ŝ_1 and Ŝ_2 are given reconstruction alphabets. For a given code, we let the random messages W_0, W_1, and W_2 be uniform over the message sets W_0, W_1, and W_2, and the inputs are obtained as X_i = φ_i(W_0, W_1, W_2, Z^{i−1}). The outputs Y_{1,i}, Y_{2,i} and the feedback signal Z_i are obtained from the states S_{1,i} and S_{2,i} and the input X_i according to the SDMBC transition law P_{Y_1Y_2Z|S_1S_2X}. Further, for k = 1, 2, let Ŝ_k^n := (Ŝ_{k,1}, · · · , Ŝ_{k,n}) = h_k(X^n, Z^n) be the transmitter's estimate of the state sequence S_k^n, and (Ŵ_{0,k}, Ŵ_k) = g_k(S_k^n, Y_k^n) the messages decoded by Receiver k. The quality of the state estimates Ŝ_k^n is again measured by bounded per-symbol distortion functions d_k : S_k × Ŝ_k → R_0^+. Our interest is in the two expected average per-block distortions

∆_k^{(n)} := E[ (1/n) ∑_{i=1}^n d_k(S_{k,i}, Ŝ_{k,i}) ],   k = 1, 2,

and the joint probability of error

P_e^{(n)} := Pr( (Ŵ_{0,1}, Ŵ_1) ≠ (W_0, W_1) or (Ŵ_{0,2}, Ŵ_2) ≠ (W_0, W_2) ).

Definition 4. The capacity-distortion region CD is given by the closure of the union of all achievable rate-distortion tuples (R_0, R_1, R_2, D_1, D_2). In the remainder of the section, we present bounds on the capacity-distortion region CD. As in the single-receiver case, one can easily determine the optimal estimator functions h_1 and h_2, which are independent of the encoding and decoding functions and operate on a symbol-by-symbol basis.
Lemma 3. Irrespective of the choice of encoding and decoding functions, the distortions ∆_k^{(n)} are minimized by the estimators

ŝ*_k(x, z) := argmin_{s' ∈ Ŝ_k} ∑_{s ∈ S_k} P_{S_k|XZ}(s|x, z) d_k(s, s'),   k = 1, 2,     (51)

where ties can be broken arbitrarily.

Proof: See Appendix A.

Analogously to the definition in (8), we can then define the optimal estimation cost for each input symbol:

c_k(x) := E[ d_k(S_k, ŝ*_k(X, Z)) | X = x ],   k = 1, 2.

Characterizing the capacity-distortion region is very challenging in general, because even the capacity regions of the SDMBC with and without feedback are unknown to date. We first present the exact capacity-distortion region for the class of physically degraded SDMBCs and then provide bounds for general SDMBCs. We shall also compare our results on the capacity-distortion regions to the performances achieved by simple TS baseline schemes, in analogy to the single-receiver setup.
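The per-receiver estimators of Lemma 3 are computed exactly as in the single-receiver case, except that the posterior marginalizes out the other receiver's state. A sketch of this two-receiver computation follows; the toy channel with feedback Z = S_1 is our own illustration, not one of the paper's examples:

```python
import numpy as np

# Per-receiver optimal estimators of Lemma 3: s_hat_k(x, z) minimizes
# E[d_k(S_k, s') | X = x, Z = z], where the posterior on S_k marginalizes
# out the other state.  Toy channel (assumption for illustration): the
# feedback reveals S1 exactly, so c_1 = 0 while c_2 depends on P_{S2|S1}.
q, g = 0.4, 0.5                          # S1 ~ B(q); S2 = S1 * B(g)
P_S1S2 = np.array([[1 - q, 0.0],         # P_S1S2[s1, s2]
                   [q * (1 - g), q * g]])
P_Z_SSX = np.zeros((2, 2, 2, 2))         # P_Z_SSX[z, s1, s2, x]
for s1 in range(2):
    P_Z_SSX[s1, s1, :, :] = 1.0          # feedback Z = S1 (for every s2, x)
d = np.array([[0, 1], [1, 0]], float)    # Hamming distortion for both k

def estimators(P_S1S2, P_Z_SSX, d, k):
    """s_hat_k[x, z] and estimation cost c_k[x] for receiver k in {0, 1}."""
    joint = P_Z_SSX * P_S1S2[None, :, :, None]        # P(s1, s2, z | x)
    joint_k = joint.sum(axis=2 - k)                   # keep receiver-k state
    nZ, nS, nX = joint_k.shape
    P_Z_X = joint_k.sum(axis=1)                       # P(z | x)
    s_hat = np.zeros((nX, nZ), int)
    c = np.zeros(nX)
    for x in range(nX):
        for z in range(nZ):
            if P_Z_X[z, x] == 0:
                continue                              # (x, z) never occurs
            exp_d = (joint_k[z, :, x] / P_Z_X[z, x]) @ d
            s_hat[x, z] = int(np.argmin(exp_d))
            c[x] += P_Z_X[z, x] * exp_d[s_hat[x, z]]
    return s_hat, c

s1_hat, c1 = estimators(P_S1S2, P_Z_SSX, d, 0)
s2_hat, c2 = estimators(P_S1S2, P_Z_SSX, d, 1)
print(c1, c2)    # c1 = [0, 0]; c2 = q*min(g, 1-g) for every x
```

As expected, c_1(x) = 0 for every x (the feedback reveals S_1), while c_2(x) = q·min{g, 1−g}: when S_1 = 1 the transmitter can only guess S_2 from its conditional distribution.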
Specifically, we again have a basic TS baseline scheme that performs either sensing or communication at a time, and an improved TS baseline scheme that is able to perform both functions simultaneously via a common waveform by prioritizing either sensing or communication. Analogously to the single-receiver setup, each of the two baseline schemes time-shares between a sensing mode and a communication mode. However, since we now have two distortions and three rates, the choice of the "optimal" pmf P_X for each mode is not necessarily unique, but rather a continuum, depending on which function of the two distortions or the three rates one wishes to optimize. For a fixed input pmf, the difference between the communication mode without sensing (employed by the basic TS scheme) and the communication mode with sensing (employed by the improved TS scheme) lies in the choice of the estimators. In the former mode, the transmitter applies the best constant estimators for the two state sequences, irrespective of its inputs and feedback outputs. In the latter mode, it applies the optimal estimators in Lemma 3, which depend on the input and the feedback output. Similarly, the difference between the sensing modes without and with communication is that in the former all rates are zero, whereas in the latter the chosen input pmf P_X is simultaneously used for communication at positive rates.

B. Capacity-Distortion Region for Physically Degraded SDMBCs
This section characterizes the capacity-distortion region for physically degraded SDMBCs and evaluates it for two binary examples.
Definition 5. An SDMBC P_{Y_1Y_2Z|S_1S_2X} with state pmf P_{S_1S_2} is called physically degraded if there are conditional laws P_{Y_1|XS_1} and P_{S_2Y_2|S_1Y_1} such that

P_{Y_1Y_2|S_1S_2X} P_{S_1S_2} = P_{S_1} P_{Y_1|S_1X} P_{S_2Y_2|S_1Y_1}.     (54)

That means for any arbitrary input P_X, the tuple (X, S_1, S_2, Y_1, Y_2) ∼ P_X P_{S_1S_2} P_{Y_1Y_2|S_1S_2X} satisfies the Markov chain

X −− (S_1, Y_1) −− (S_2, Y_2).

Theorem 3. The capacity-distortion region CD of a physically degraded SDMBC is given by the closure of the set of all tuples (R_0, R_1, R_2, D_1, D_2) for which there exists a joint law P_{UX} so that the tuple (U, X, S_1, S_2, Y_1, Y_2, Z) ∼ P_{UX} P_{S_1S_2} P_{Y_1Y_2Z|S_1S_2X} satisfies the two rate constraints

R_0 + R_2 ≤ I(U; Y_2 | S_2),   R_1 ≤ I(X; Y_1 | U, S_1),

and the distortion constraints

E[d_k(S_k, ŝ*_k(X, Z))] ≤ D_k,   k = 1, 2.

Proof: The achievability can be proved by standard superposition coding and using the optimal estimators in Lemma 3. The converse also follows from standard steps; the details are provided in Appendix D.
In what follows, we evaluate the theorem for two examples.
1) Example 3: Binary BC with Multiplicative Bernoulli States: Consider the binary SDMBC with outputs

Y_1 = S_1 X,   Y_2 = S_2 X,     (59)

with the joint state pmf

P_{S_1S_2}(1, 1) = γq,   P_{S_1S_2}(1, 0) = (1 − γ)q,   P_{S_1S_2}(0, 0) = 1 − q,   P_{S_1S_2}(0, 1) = 0,     (60)

for γ, q ∈ [0, 1]. Notice that S_2 is a degraded version of S_1, which together with the transition law (59) ensures the Markov chain X −− (S_1, Y_1) −− (S_2, Y_2) and the physical degradedness of the SDMBC. We consider output feedback Z = (Y_1, Y_2) and set the common rate R_0 = 0 for simplicity. In this SDMBC, zero distortions D_1 = D_2 = 0 can be achieved by deterministically choosing X = 1, exactly as in the single-receiver case. This choice however cannot achieve any positive communication rates, i.e., R_1 = R_2 = 0.
In the sensing mode with and without communication, we thus have:

(R_1, R_2, D_1, D_2) = (0, 0, 0, 0).     (62)

The input X_max ∼ B(1/2) simultaneously maximizes both communication rates R_1 and R_2 and is therefore the optimal input distribution for communication.
In the communication mode without sensing, the transmitter applies the optimal constant estimator for each state, namely ŝ_{k,const} = argmax_{s∈{0,1}} P_{S_k}(s) for k = 1, 2, and thus achieves all tuples

(R_1, R_2, D_1, D_2) = (rq, (1 − r)γq, D_{1,max}, D_{2,max}),     (65)

where D_{1,max} := min{q, 1 − q} and D_{2,max} := min{γq, 1 − γq}, and r ∈ [0, 1] denotes the time-sharing parameter between the two communication rates.
In the communication mode with sensing, the same input X_max is used. The transmitter however applies the optimal estimator for k = 1, 2:

ŝ*_k(1, (y_1, y_2)) = y_k,   ŝ*_k(0, (y_1, y_2)) = argmax_{s∈{0,1}} P_{S_k}(s),     (66)

and achieves the tuple

(R_1, R_2, D_1, D_2) = (rq, (1 − r)γq, D_{1,max}/2, D_{2,max}/2),     (67)

where r again denotes the time-sharing parameter between the two communication rates. The basic and improved TS baseline schemes achieve the time-sharing lines between points (62) and (65) and between points (62) and (67), respectively. The following corollary evaluates Theorem 3 to obtain the performance of the optimal co-design scheme.
Corollary 3. The capacity-distortion region CD of the binary SDMBC with multiplicative Bernoulli states (59)-(60) and output feedback is the set of all tuples (R_0, R_1, R_2, D_1, D_2) satisfying

R_1 ≤ r q H_b(p),   R_0 + R_2 ≤ (1 − r) γq H_b(p),
D_1 ≥ p min{q, 1 − q},   D_2 ≥ p min{γq, 1 − γq},

for some choice of the parameters r, p ∈ [0, 1], where p := P_X(0).

Proof: Evaluating the rate constraints of Theorem 3, together with I(X; Y_k | S_k) = P_{S_k}(1) H(X), directly leads to the desired rate constraints. The distortion constraints are obtained from the optimal estimators in (66). Following the same steps as in the single-receiver case, i.e., (40) and (41), we obtain

E[d_k(S_k, ŝ*_k(X, Z))] = p min{P_{S_k}(1), P_{S_k}(0)},   k = 1, 2,

which concludes the proof.
Notice that the above Corollary 3 reduces to Corollary 2 in the special case of R_0 = R_2 = 0 and D_2 = ∞, i.e., when we ignore Receiver 2. Fig. 5 shows in red colour the boundary of the projection of the tradeoff region CD of this example onto the 3-dimensional space (R_1, R_2, D_1), for parameters γ = 0.5 and q = 0.6. The tradeoff with D_2 is omitted for simplicity and because D_2 is a scaled version of D_1. The figure also shows the boundaries of the basic and improved TS baseline schemes. We again notice a significant gain for an optimal co-design scheme compared to the TS baseline schemes.
So far, there was no tradeoff between the two distortion constraints D 1 and D 2 . This is different in the next example, which otherwise is very similar.
2) Example 4: Binary BC with Multiplicative Bernoulli States and Flipping Inputs: Reconsider the same state pmf P_{S_1S_2} as in the previous example, but now an SDMBC with a transition law that flips the input for Receiver 2:

Y_1 = S_1 X,   Y_2 = S_2 (1 ⊕ X).     (70)

As in the previous example, we consider output feedback Z = (Y_1, Y_2).
Corollary 4. The capacity-distortion region CD of the binary SDMBC with flipping inputs in (70) and output feedback is the set of all tuples (R 0 , R 1 , R 2 , D 1 , D 2 ) satisfying for some choice of the parameters r, p ∈ [0, 1].
The capacity-distortion region expression above captures the tradeoff between the two rates through the parameter r, between the rates and the distortions through the parameter p, and between the two distortions likewise through the parameter p.
Comparing the above Corollary 4 to the previous Corollary 3, we remark the identical rate constraints and the relaxed distortion constraints for both D_1 and D_2 in Corollary 4. The reason is that the flipping input allows the transmitter to perfectly estimate S_1 from (X, Y_1, Y_2) not only when X = 1, but also when X = 0 and Y_2 = 1, because these imply that S_2 = 1 and by (60) also S_1 = 1.

C. Capacity-Distortion Region for General SDMBCs
In the remainder of this section, we reconsider general SDMBCs, for which we present bounds on CD. We start with a simple outer bound.
Theorem 4 (Outer Bound on CD). If (R_0, R_1, R_2, D_1, D_2) lies in CD for a given SDMBC P_{Y_1Y_2Z|S_1S_2X} with state pmf P_{S_1S_2}, then for each k = 1, 2 there exists a conditional pmf P_{U_k|X} such that the random tuple (U_k, X, S_1, S_2, Y_1, Y_2, Z) ∼ P_{U_k|X} P_X P_{S_1S_2} P_{Y_1Y_2Z|S_1S_2X} satisfies the rate constraints

R_0 + R_k ≤ I(U_k; Y_k | S_k),   k = 1, 2,

and the average distortion constraints

E[d_k(S_k, ŝ*_k(X, Z))] ≤ D_k,   k = 1, 2,

where the functions ŝ*_k(·, ·) are defined in (51).

Proof: See Appendix D.
Proof: Similar to [16] and omitted.

In analogy to Corollary 1 for the single-receiver case, for some SDMBCs there is no tradeoff between the achievable distortions and communication rates. In this case, for the BC, the capacity-distortion region is given by the Cartesian product between the SDMBC's capacity region C and its distortion region D, the set of achievable distortion pairs (D_1, D_2).

Proposition 2 (No Rate-Distortion Tradeoff). Consider an SDMBC P_{Y_1Y_2Z|S_1S_2X} with state pmf P_{S_1S_2} for which there exist functions ψ_1 and ψ_2 with domain X × Z so that, irrespective of the input distribution P_X, the relations (80) and (81) hold for (S_1, S_2, X, Z) ∼ P_{S_1} P_{S_2} P_X P_{Z|XS_1S_2}. The capacity-distortion region of this SDMBC is the product of the capacity region and the distortion region:

CD = C × D.

Proof: Analogous to the proof of Corollary 1. Specifically, the proof is obtained from Appendix C by replacing (S, Ŝ, ψ, Y, T) with (S_k, Ŝ_k, ψ_k, Y_k, T_k), for k = 1, 2.

D. Example 5: Erasure BC with Noisy Feedback
Our first example satisfies Conditions (80) and (81) in Proposition 2 for an appropriate choice of ψ 1 and ψ 2 , and its capacity-distortion region is thus given by the product of the capacity region and the distortion region.
Let (E 1 , S 1 , E 2 , S 2 ) ∼ P E1S1E2S2 over {0, 1} 4 be given but arbitrary. Consider the state-dependent erasure BC in (83), where the feedback signal Z = (Z 1 , Z 2 ) is given by (84). The described SDMBC satisfies the conditions in Proposition 2, thus yielding the following corollary.
Corollary 5. The capacity-distortion region of the state-dependent erasure BC with noisy feedback in (83)-(84) is the Cartesian product between the capacity region of the SDMBC and its distortion region. When P E1S1E2S2 = P E1S1 P E2S2 , the distortion region is given by (87). The capacity region C of this SDMBC is unknown even with perfect feedback.
Proof: The state can be perfectly estimated (S k = 0) with zero distortion if (S k , E k ) = (0, 0). Otherwise, the feedback is Z k = ? and provides no information. The optimal estimator is then given by the best constant estimator, which immediately yields the distortion constraint in (87).
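The estimator argument in this proof is easy to evaluate numerically. A minimal sketch (the function name and pmf layout are ours, not from the paper), assuming the model described in the proof: the transmitter learns S_k = 0 exactly when (S_k, E_k) = (0, 0), and otherwise observes the erasure symbol '?':

```python
def min_distortion_k(P_ES):
    """Minimum expected Hamming distortion on state S_k.
    P_ES[e][s] = Pr[E_k = e, S_k = s]."""
    # (E_k, S_k) = (0, 0): the feedback reveals S_k = 0, zero distortion.
    p_s1 = P_ES[0][1] + P_ES[1][1]   # Pr[S_k = 1]: always hidden behind '?'
    p_s0_hidden = P_ES[1][0]         # S_k = 0 but erased: also looks like '?'
    # On '?' the best constant estimate is 0 or 1, whichever errs less often.
    return min(p_s1, p_s0_hidden)

# Independent E_k ~ B(0.2) and S_k ~ B(0.3):
P = [[0.8 * 0.7, 0.8 * 0.3], [0.2 * 0.7, 0.2 * 0.3]]
print(min_distortion_k(P))   # min(0.3, 0.14) ≈ 0.14
```

For the independent case this reproduces the structure of the distortion constraint in (87): the achievable distortion is the smaller of the state probability and the probability that the state is hidden by an erasure.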

E. Example 6: State-Dependent Dueck's BC with Multiplicative Bernoulli States
Consider the state-dependent version of Dueck's BC [25] in Figure 6, with input X = (X 0 , X 1 , X 2 ) ∈ {0, 1} 3 , i.i.d. Bernoulli states S 1 , S 2 ∼ B(q) for q ∈ [0, 1], and outputs as shown in Figure 6, where the noise N ∼ B(1/2) is independent of the inputs and the states. The feedback signal comprises the channel outputs, and for simplicity we again ignore the common rate R 0 .
We notice that only X 1 and X 2 are corrupted by the state and the noise. Since X 0 is received without any state or noise, it is completely useless for sensing. In fact, the optimal estimator of Lemma 3 is given in (92) (see Appendix F-A), where we slightly abuse notation by omitting the argument x 0 from the estimator ŝ * k because the latter does not depend on x 0 .
For a given input pmf with probability t := Pr[X 1 = X 2 ], the expected distortion achieved by the optimal estimators in (92) is derived in Appendix F-B. We observe different cases: i) for q ∈ [0, 1/2], both minima are achieved by q; ii) for q ∈ [1/2, 2 − √2], the first and second minima are achieved by 1 − q and q, respectively; iii) for q ∈ [2 − √2, 1], the first and second minima are achieved by (1 − q) and (1 − q)(2 − q), respectively. The distortion constraint (76) thus evaluates to (94). We notice that for q ∈ [0, 1/2], the distortion constraint is independent of t and thus of P X , and the minimum expected distortions are D min,1 = D min,2 = q/2. For q ∈ [1/2, 2 − √2], the minimum expected distortions are achieved for t = 1, and the same holds for q ∈ [2 − √2, 2/3]. For q ∈ [2/3, 1], the distortions are minimized for t = 0. We thus have D min,1 = D min,2 = D min , as given in (95). This yields a characterization of the distortion region in (96). The private-messages capacity region is given in (97). The converse and achievability proofs are provided in Appendices F-C and F-D, respectively.
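The claim that for q ∈ [0, 1/2] the distortion is independent of t, with minimum q/2, can be checked by exhaustive enumeration. A minimal sketch, where the channel law Y k = S k · (X k ⊕ N) is our assumption (it is consistent with the case analysis in Appendix F-A but is not stated explicitly here), as is the restriction to the representative inputs (x 1 , x 2 ) ∈ {(0, 0), (0, 1)}:

```python
from itertools import product

def expected_distortion_S1(q, t):
    """Expected Hamming distortion on S_1 under the optimal estimator,
    for Pr[X_1 = X_2] = t, by exhaustive enumeration over (S_1, S_2, N).
    Assumed channel model: Y_k = S_k * (X_k XOR N), with N ~ B(1/2)."""
    def dist(x1, x2):
        post = {}  # observation (y1, y2) -> accumulated mass per S_1 value
        for s1, s2, n in product((0, 1), repeat=3):
            pr = (q if s1 else 1 - q) * (q if s2 else 1 - q) * 0.5
            obs = (s1 * (x1 ^ n), s2 * (x2 ^ n))
            post.setdefault(obs, [0.0, 0.0])[s1] += pr
        # the optimal estimator picks the likelier S_1 value per observation,
        # so the residual distortion is the smaller posterior mass
        return sum(min(mass) for mass in post.values())
    # by symmetry only whether x1 = x2 matters
    return t * dist(0, 0) + (1 - t) * dist(0, 1)

for t in (0.0, 0.5, 1.0):
    print(round(expected_distortion_S1(0.3, t), 6))   # 0.15 = q/2 for every t
```

Under this assumed model the expected distortion for q = 0.3 equals q/2 = 0.15 regardless of t, matching the no-tradeoff behavior stated above for q ∈ [0, 1/2].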
Reconsider now the case q ∈ [0, 1/2]. As explained above, the distortion is then independent of the input distribution, and the capacity-distortion region CD degenerates to the product of the capacity and distortion regions.
Corollary 6 (No Rate-Distortion Tradeoff). For the above state-dependent Dueck example with q ∈ [0, 1/2]: CD = C × D.
For the general case, we only have bounds on the capacity-distortion region CD. We first present our outer bound, which is based on Theorem 4 and proved in Appendix F-C.
Corollary 7 (Outer Bound). The capacity-distortion region CD (without common message) of Dueck's state-dependent BC is included in the set of tuples (R 1 , R 2 , D 1 , D 2 ) that, for some choice of the parameter t ∈ [0, 1], satisfy the rate constraints and the distortion constraints in (94).
The inner bound is based on Proposition 1; see Appendix F-D. Together with the outer bound in Corollary 7, it characterizes both the distortion region D and the capacity region C in (96) and (97).
Corollary 8 (Inner Bound). The capacity-distortion region CD of the state-dependent Dueck BC includes all rate-distortion tuples (R 1 , R 2 , D 1 , D 2 ) that for some choice of t ∈ [0, 1] satisfy (94) and the associated rate constraints, as well as the convex hull of all these tuples.
Sensing mode with and without communication: In the sensing mode with communication, one can choose an arbitrary pmf for X 0 , e.g., X 0 ∼ B(1/2), because this input does not affect the sensing. From (95), the minimum distortions D min,1 = D min,2 = 5/32 are achieved by setting X 1 = X 2 with probability 1. For X 1 = X 2 the sum-rate cannot exceed 1 because Y 1 and Y 2 are corrupted by the Bernoulli-1/2 noise N . On the other hand, any rate pair (R 1 , R 2 ) of sum-rate R 1 + R 2 = 1 is trivially achievable by communicating only over the noiseless X 0 -input.
We conclude that the sensing mode with communication achieves the rate-distortion tuples (R 1 , R 2 , D 1 , D 2 ) satisfying R 1 + R 2 ≤ 1 and D k ≥ 5/32, k = 1, 2. (103) If the transmitter cannot perform the communication and sensing tasks simultaneously, the same minimum distortions are achieved but the rates are trivially zero.

IV. CONCLUSION
Motivated by the paradigm of integrated sensing and communication systems, we studied joint sensing and communication in memoryless state-dependent channels. We fully characterized the capacity-distortion tradeoff for single-user channels as well as for physically degraded broadcast channels. For general broadcast channels, we presented inner and outer bounds on the capacity-distortion region. Through a number of illustrative examples, we demonstrated that the optimal co-design scheme offers non-negligible gains compared to the basic time-sharing scheme that performs either sensing or communication, as well as compared to the improved time-sharing scheme that prioritizes one of these two tasks. Interestingly, there are ideal situations where the capacity is achieved without compromising the sensing performance. While for memoryless channels the absence of a tradeoff appears to be rather exceptional, we expect this situation to be more common for channels with memory. This is because for temporally correlated channels, a good state estimate at the transmitter not only yields good sensing performance but is also beneficial for communication. In fact, in some preliminary examples that we considered, we can achieve rate-distortion points that are close to the ideal point where the rate equals the capacity and the distortion is minimal. The capacity-distortion tradeoff characterization for channels with memory remains an interesting direction for future work.

ACKNOWLEDGEMENTS
The authors would like to thank Gerhard Kramer for his support on an early version of this paper.

APPENDIX A PROOF OF LEMMA 1
Recall that Ŝ n = h(X n , Z n ), and write for each i = 1, · · · , n: where (a) holds by the indicated Markov chain. Summing over all i = 1, . . . , n, we thus obtain the stated bound, which yields the desired conclusion.
APPENDIX B PROOF OF THEOREM 1 1) Converse: Fix a sequence (in n) of (2 nR , n) codes such that the limits in (4) hold. By Fano's inequality, there exists a sequence ε n → 0 as n → ∞ so that: where (a) holds because conditioning can only reduce entropy; and (b) holds because (W, Y i−1 , S i−1 , S n i+1 ) − (S i , X i ) − Y i forms a Markov chain. We continue as: where (c) holds by the definition of C inf (D, B), and (d) and (e) hold by Lemma 2.
2) Achievability: Fix P X (·) and estimation functions ĥ(x, z) that achieve C(D/(1 + ε), B), where D is the desired distortion and B is the target cost, for a small positive number ε > 0. We define the joint pmf P SXY := P S P X P Y |SX . a) Codebook generation: Generate 2 nR sequences {x n (w), w = 1, . . . , 2 nR } by randomly and independently drawing each entry according to P X . This defines the codebook C = {x n (w)}, which is revealed to the encoder and the decoder.
b) Encoding: To send a message w ∈ W, the encoder transmits x n (w). c) Decoding: Upon observing the outputs Y n = y n and the state sequence S n = s n , the decoder looks for an index ŵ such that (s n , x n (ŵ), y n ) ∈ T (n) (P SXY ).
If exactly one such index exists, it declares Ŵ = ŵ. Otherwise, it declares an error. d) Estimation: Assuming that it sent the input sequence X n = x n and observed the feedback signal Z n = z n , the encoder computes the reconstruction sequence as: Ŝ n = (ŝ * (x 1 , z 1 ), ŝ * (x 2 , z 2 ), . . . , ŝ * (x n , z n )).
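The estimation step d) applies the same per-symbol rule ŝ * (x, z) in every position. A minimal sketch of how this rule is tabulated from the model, assuming finite alphabets represented as index sets; the function name and array layout are ours, not from the paper:

```python
import numpy as np

def optimal_estimator(P_S, P_Z_given_SX, d):
    """Tabulate the per-symbol estimator s*(x, z) that minimizes the
    posterior expected distortion E[d(S, s') | X = x, Z = z].
    P_S:           shape (|S|,)         state pmf
    P_Z_given_SX:  shape (|S|,|X|,|Z|)  feedback channel
    d:             shape (|S|,|S|)      distortion matrix d[s, s']"""
    nS, nX, nZ = P_Z_given_SX.shape
    est = np.zeros((nX, nZ), dtype=int)
    for x in range(nX):
        for z in range(nZ):
            # unnormalized posterior of S given (x, z); normalization
            # does not change the argmin below
            joint = P_S * P_Z_given_SX[:, x, z]
            est[x, z] = int((joint @ d).argmin())   # best estimate s'
    return est

# Toy check: binary state S ~ B(0.3); the feedback reveals S when x = 1
# and is uninformative (always 0) when x = 0; Hamming distortion.
P_S = np.array([0.7, 0.3])
P_Z = np.zeros((2, 2, 2))
P_Z[0, 1, 0] = P_Z[1, 1, 1] = 1.0   # x = 1: z = s
P_Z[0, 0, 0] = P_Z[1, 0, 0] = 1.0   # x = 0: z = 0 regardless of s
d = 1.0 - np.eye(2)
est = optimal_estimator(P_S, P_Z, d)
print(est[1, 0], est[1, 1], est[0, 0])   # 0 1 0
```

In the toy check, the estimator copies the feedback when x = 1 and otherwise falls back to the most likely state a priori, exactly the behavior the examples in Section III exhibit.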
e) Analysis: We start by analyzing the probability of error and the distortion averaged over the random code construction. Given the symmetry of the code construction, we can condition on the event W = 1.
We then notice that the decoder makes an error, i.e., declares an error or Ŵ ≠ 1, if and only if one or both of the following events occur: Thus, by the union bound: The first term goes to zero as n → ∞ by the weak law of large numbers. The second term also tends to zero as n → ∞ if R < I(X; Y |S), by the independence of the codewords and the packing lemma [19, Lemma 3.1]. Therefore, P e (n) tends to zero as n → ∞ whenever R < I(X; Y |S). The expected distortion (averaged over the random codebook, the states, and the channel noise) can be upper bounded as follows. In the event of correct decoding, i.e., Ŵ = 1, and since Ŝ i = ŝ * (X i , Z i ), also (S n , X n (1), Ŝ n ) ∈ T (n) (P SXŜ ), where P SXŜ denotes the joint marginal pmf of P SXZŜ (s, x, z, ŝ) := P S (s)P X (x)P Z|SX (z|s, x)1{ŝ = ŝ * (x, z)}. Then, the distortion bound holds for (S, Ŝ) following the marginal of the pmf P SXZŜ defined above. Assuming that R < I(X; Y |S), and thus P e → 0 as n → ∞, we obtain from (123) and (126): Taking finally ε ↓ 0, we conclude that the error probability and distortion constraints (4a), (4b) hold (averaged over the random code constructions, the random states, and the channel noise) whenever R < I(X; Y |S). Notice that the cost constraint (4c) is fulfilled by construction. By standard arguments it can then be shown that there must exist at least one sequence of deterministic codebooks C n so that all constraints in (4) hold.
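The random-coding and packing-lemma argument above can be made concrete with a small simulation. A toy sketch (ours, not from the paper) that drops the state for brevity and uses a binary symmetric channel with minimum-distance decoding, which is maximum-likelihood for crossover probability below 1/2; with the rate well below capacity, decoding errors are already rare at short blocklengths:

```python
import numpy as np

rng = np.random.default_rng(0)
n, R, p = 48, 0.25, 0.05            # blocklength, rate, BSC crossover prob.
M = int(2 ** (n * R))               # 2^{nR} = 4096 messages

def trial():
    codebook = rng.integers(0, 2, size=(M, n))   # i.i.d. uniform codewords
    w = rng.integers(M)                          # message to send
    y = codebook[w] ^ (rng.random(n) < p)        # BSC output
    # minimum-distance (ML) decoding over the whole codebook
    w_hat = (codebook ^ y).sum(axis=1).argmin()
    return int(w_hat == w)

trials = 50
print(sum(trial() for _ in range(trials)) / trials)   # close to 1.0
```

Here R = 0.25 is far below the BSC capacity 1 − H b (0.05) ≈ 0.71, so almost every trial decodes correctly, illustrating why the second error term vanishes for R < I(X; Y |S).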

APPENDIX C PROOF OF COROLLARY 1
It suffices to show that under the described conditions, the distortion constraint (4b) does not depend on P X . To this end, we define T = ψ(X, Z) and rewrite the expected distortion as: where (a) holds by the definition of ŝ * (x, z) and the law of total probability; and (b) by the Markov chain S − T − (X, Z), see (17), and because T is a function of (X, Z). The independence of the pair (T, S) from X in (16), together with the above expression, implies that the expected distortion does not depend on the choice of the input distribution P X . Hence, we conclude that for any given B ≥ 0, the rate-distortion tradeoff function C(D, B) is constant over all D ≥ D min and coincides with the capacity of the SDMC, C NoEst (B).
APPENDIX D
Following similar steps, we obtain: where we defined Y 1 := Y 1,T and S 1 := S 1,T ; and where (b) holds by the physical degradedness of the SDMBC, which implies the indicated Markov chain, and (c) holds because S 1 ∼ P S1 is independent of (U, X).
Recall that we assume the optimal estimators (51) of Lemma 3. Using the definitions of T , X, and S k above, and defining Z := Z T , we can write the average expected distortions as: Combining (134), (136), and (137) and letting n → ∞, we conclude that there exists a limiting pmf P U X such that the tuple (U, X, S 1 , S 2 , Y 1 , Y 2 , Z) ∼ P U X P S1S2 P Y1Y2Z|S1S2X satisfies the rate constraints and the distortion constraints. This completes the proof.

APPENDIX E PROOF OF THEOREM 4
Fix a sequence (in n) of (2 nR0 , 2 nR1 , 2 nR2 , n) codes satisfying (50). Fix then a blocklength n and consider an enhanced SDMBC where Receiver 1 observes the pair of states S̃ 1 = (S 1 , S 2 ) and the pair of outputs Ỹ 1 = (Y 1 , Y 2 ). The enhanced SDMBC is clearly physically degraded because for any input pmf P X the corresponding Markov chain holds. Following the steps in the previous Appendix D, we can conclude the analogous rate and distortion bounds. Consider next a reversely enhanced SDMBC where Receiver 1 observes only (Y 1 , S 1 ) but Receiver 2 observes both state sequences S̃ 2 := (S 1 , S 2 ) and both outputs Ỹ 2 := (Y 1 , Y 2 ). Following again the steps in the previous Appendix D, but now with exchanged indices 1 and 2, we obtain the corresponding bounds. Combining all these inequalities and letting first n → ∞ and then ε ↓ 0 establishes the desired converse result.

APPENDIX F
A. Optimal Estimator of Lemma 3
We first derive the optimal estimator ŝ * k (x 1 , x 2 , y 1 , y 2 ) of Lemma 3 for this example.
Case y 1 = y 2 = 1: In this case, S 1 = S 2 = 1 deterministically, and thus both estimators output 1.
Case y 1 = 1 and y 2 = 0: In this case, S 1 = 1 deterministically. To derive the optimal estimator for state S 2 , we notice that y 1 = 1 implies x 1 ⊕ N = 1, i.e., N = x 1 ⊕ 1. As a consequence, for x 2 = x 1 we have y 2 = S 2 = 0, and the optimal estimator sets Ŝ 2 = 0. Instead, for x 2 ≠ x 1 , the feedback output is y 2 = 0 irrespective of the state S 2 . The optimal estimator then is the constant estimator ŝ * 2 .
Case y 1 = 0 and y 2 = 1: Symmetric to the previous case y 1 = 1, y 2 = 0. The optimal estimators are as in (150) and (151), but with exchanged indices 1 and 2.

B. Minimum Distortion
We evaluate the expected distortion of the optimal estimators in (92) for a given input pmf P X0X1X2 . Let t := Pr[X 1 = X 2 ]. We first consider the distortion on state S 2 : where (a) follows by the definition of the function ŝ * 2 . In the previous Subsection F-A, we argued that for y 2 = 1, or for (y 2 = 0, y 1 = 1, x 1 = x 2 ), the state S 2 is deterministic (S 2 = 1 in the former case and S 2 = 0 in the latter), and thus min ŝ∈{0,1} P S2|X1X2Y1Y2 (ŝ|x 1 , x 2 , y 1 , y 2 ) = 0. We further argued that for (y 1 = 1, y 2 = 0, x 1 ≠ x 2 ) the transmitter learns nothing about state S 2 , which is thus still distributed according to P S . Based on these observations, we continue from (157) as: where in (b) we used (152)-(155) and the case distinction on whether X 1 = X 2 , which occurs with probability t.

D. Proof of Achievability Results
We evaluate Proposition 1 for different choices of the involved random variables. Since we ignore the common rate R 0 , bound (77d) is inactive and can be dropped.
The presented choice of parameters can thus achieve all rate-distortion tuples (R 0 , R 1 , R 2 , D 1 , D 2 ) satisfying the distortion constraints in (94) (which only depend on the probability t := Pr[X 1 = X 2 ]) and the corresponding rate constraints. 2) Second choice: Same as the first choice except that V 0 = X 2 ⊕ Y 2 . Following symmetric arguments, we conclude that for this choice the constraints in (77) evaluate to the analogous expressions with exchanged indices. 3) Combining the Choices and Time-Sharing: From the two previous choices, we conclude that for any t ∈ [0, 1] a rate-distortion tuple (R 0 , R 1 , R 2 , D 1 , D 2 ) is achievable if it satisfies (94) and (183). As previously discussed, for q ≤ 1/2 the distortion constraints (94) do not depend on t, and thus without loss of optimality one can set t = 1/2 in (183), which results in a sum-rate constraint. Combined with (183a) and (183b), this sum-rate bound establishes the achievability of the capacity region in (97). For q > 1/2, the distortion constraints (94) are either increasing or decreasing in t. The set of achievable rate-distortion tuples is then obtained by varying t either over [0, 1/2] or over [1/2, 1]. Numerical results indicate that the set obtained in this way is not convex; the convex hull is obtained by considering convex combinations between different values of t > 0 and t = 0 for q ∈ [2/3, 1], and t = 1 for q ∈ [1/2, 2/3].