Channel Discord and Distortion

Discord, originally notable as a signature of bipartite quantum correlation, can in fact be nonzero classically, i.e., it can arise from noisy measurements by one of the two parties. Here we redefine classical discord to quantify channel distortion, in contrast to the previous restriction of classical discord to a state, and we then show a monotonic relationship between classical (channel) discord and channel distortion. Specifically, we show that classical discord is equivalent to (doubly stochastic) channel distortion by numerically discovering a monotonic relation between discord and total-variation distance for a bipartite protocol in which one party has a noiseless channel and the other party a noisy channel. Our numerical method involves randomly generating doubly stochastic matrices for noisy channels and averaging over a uniform measure of input messages. Connecting discord with distortion establishes discord as a signature of classical, not quantum, channel distortion.


I. INTRODUCTION
Discord is often touted as a quantifier of quantum correlations in a state, with nonzero discord said to imply that observed correlations transcend non-quantum (i.e., 'classical') limits [1,2], akin to, but different from, violation of a Bell inequality [3]. Treated as a quantum resource operationalized by state merging [4], discord in quantum-computing protocols [5,6] is believed by many to deliver a quantum advantage to some protocols [7]. However, this quantum nature of discord has been challenged by stochastic information theory, which shows that discord can be due to noisy measurement by one of the two parties [8]. Essentially, discord can be understood in terms of a protocol amenable to a stochastic-information interpretation, and this interpretation fails if and only if (iff) the two parties share bipartite entanglement [8]. Discord thus serves as a fascinating starting point for studying stochastic information.
Previous work analyzed state discord in the context of classical states [8], i.e., it analyzed discord as a signature of classical rather than quantum correlations. Here we introduce the concept of classical channel discord, which is based on averaging over all allowed input states for a given channel, with the previous state-based definition used to assess how much discord the given channel adds to a state. Our expression for channel discord is amenable to numerical evaluation, which shows that channel discord is monotonic with respect to channel distortion. We augment this numerical analysis by solving analytically the small but nontrivial two-bit case (each of two parties holds one bit), both to confirm our numerics for this case and to establish a path for proving discord-distortion monotonicity, which, as we show, is a challenging calculation. Our monotonicity result establishes that channel discord and distortion are essentially equivalent.
Mathematically, discord in a two-party state refers to an apparent discrepancy between two expressions for mutual information obtained by two parties, named here Alice (A) and Bob (B): one quantity depends on the joint probability and the other on the conditional probability. The usual explanation for quantum discord in a state is that conditional information must be adapted to the quantum case by introducing measurement, and this incompatibility gives rise to nonzero discord [2]. Quantum discord in a state has been shown to be equivalent to classical discord if and only if entanglement between the two parties is zero, and this equivalence has been explained by showing that, in the absence of entanglement, discord represents one party, namely Bob, suffering from noisy measurement whereas Alice's measurements are ideal [8]. Our goal is to show that classical channel discord, obtained by averaging channel-added discord over all allowed channel input states, is equivalent to channel distortion by establishing a monotonic relation between discord and total-variation, or Kolmogorov, distance, which quantifies channel distortion. We analyze the general case numerically and solve the two-bit case analytically.
Our article is structured as follows. In §II we summarise essential background on stochastic information, discord, total-variation distance and doubly stochastic channels. In §III we describe our approach, elaborating on our model of a noisy protocol for creating channel discord; there we also present our notation and mathematical expressions, together with our methods for solving these expressions numerically. Subsequently, in §IV, we present our numerical results and explain the plots, and we discuss the results thoroughly in §V.
In §VI, we summarise our claims and provide an outlook. In Appendices B and A we introduce convenient notation for probability vectors and what we call Hadamard calculus, respectively, both of which are fundamental to our approach.

II. BACKGROUND
In this section we discuss the background and context for our work. In §II A we discuss informational states, including what we call stochastic information, which is a probabilistic mixture of definite informational states; specifics regarding probabilistic information states are explained in Appendix B, which is based on the Hadamard notation explained in Appendix A. As part of this discussion, we review the notions of entropy, shared information between parties, and mutual information, all in the elegant Hadamard notation elaborated in Appendix A, which we apply for the first time to this application. Then, in §II B, we explain mappings of information states in terms of channels, with special emphasis on doubly stochastic channels; in this subsection we also discuss ways to quantify how stochastic a channel is. Finally, in §II C, we review the notion of classical discord for states and the concept of total-variation distance for stochastic-information states; quantum discord is explained in Appendix C.

A. Stochastic Information
In this subsection we review the concept and mathematical framework of stochastic information, based on the probability-vector representation elaborated in Appendix B, which uses the Hadamard calculus introduced in Appendix A. We then discuss known concepts concerning the entropy of a stochastic-information state, again using Hadamard calculus. Finally, we review bipartite stochastic information, including conditional entropy and mutual information.
The joint probability of messages shared between Alice, whose message size is $M_A$, and Bob, whose message size is $M_B$, is the bipartite matrix
$$p^{AB} = \sum_{m,m'} p_{mm'}\,\delta^{AB}_{mm'} \in \mathrm{mat}_{M_A\times M_B}([0,1]), \qquad \lVert p^{AB} \rVert = 1, \qquad (1)$$
using the notation that $\mathrm{mat}_{M_A\times M_B}(R)$ refers to matrices with $M_A$ rows and $M_B$ columns whose entries are from any ring $R$; the norm is defined by Eq. (A4). Here we let $\delta^{AB}_{mm'}$ denote a versor for the message pair, as discussed in Appendix B.
In concordance with quantum-information nomenclature, we refer to $\delta_m$ as a 'pure state' [9]. Impure, or 'mixed', states are probabilistic mixtures of pure states. We discuss mixed and pure bipartite stochastic-information states in Appendix B. Now we discuss the entropy of the probability vector representing the mixed message. Mixedness of a state $p \in \mathbb{R}^M$ is quantified by the entropy [10]
$$0 \le H(p) := -p \cdot \log p \le \log M \qquad (2)$$
using the Hadamard notation explicated in Appendix A. A state is pure iff its entropy is zero, which follows from $p \circ \log p = 0$ for a versor. The lower bound of the entropy (2), $H(p) = 0$, is attained by a pure state; the upper bound, $H(p) = \log M$, is attained by the uniformly mixed state, with $M$ the size of $p$. A high-entropy state is one whose entropy is close to this upper bound. The joint state (1) has joint entropy $H^{AB}$ (2) and total message size $M$. In our analysis, we always assume, without loss of generality, that $M_A \le M_B$. The joint entropy of the bipartite versor (B5) is zero, as required for a pure state. The matrix representation of $p^{AB}$ is an $M_A \times M_B$ matrix with nonnegative real entries summing to one. A marginal distribution is obtained by ignoring the other party's share of the mixed state; hence, Alice's marginal distribution $p^A$ is the probability vector, with unit one-norm, obtained by summing over Bob's degree of freedom, and similarly we construct the marginal distribution $p^B$ by summing over Alice's degree of freedom.
Alice's state conditioned on Bob's state is
$$p^{A|B} = p^{AB} \oslash p^B, \qquad (5)$$
with the Hadamard division $\oslash$ explained in Appendix A. The last term of Eq. (5) displays row-vector elements $p^{A|B}_{m}$ obtained by element-wise division of each matrix element $p^{AB}_{mm'}$ by the respective column-vector element $p^B_{m'}$. Similarly, the conditional probability distribution for Bob is $p^{B|A} = p^{AB} \oslash p^A$, analogous to (5).
The entropy of the conditional probability distribution $p^{A|B}$ is the conditional entropy $H(p^{A|B})$ (6).
Fact 1. A bipartite stochastic state $p^{AB}$, which decomposes to $p^{AB} \oslash p^B$ and to $p^{AB} \oslash p^A$, is conditionally pure iff
$$H(p^{A|B}) = 0 = H(p^{B|A}). \qquad (7)$$
Consequently, $p^{AB}$ is conditionally pure iff it is permutationally equivalent to a diagonal matrix, i.e., to an element of $\mathrm{diag}([0,1])$, which refers to the set of diagonal matrices whose entries are each in the real-number interval $[0,1]$; the equivalent diagonal matrix (8) has size $\min\{M_A, M_B\} \times \min\{M_A, M_B\}$. Furthermore, conditional probability distributions $p^{A|B}$ and $p^{B|A}$, which are obtained from a bipartite stochastic pure information state $p^{AB}$, are necessarily pure. Operationally speaking, a bipartite state is conditionally pure only if Alice's pure state can be known by Bob after he measures his share of the joint stochastic-information state, and vice versa. Consequently, the bipartite versor (B5) used in Eq. (1) is conditionally pure.
Mutual information quantifies correlation between the two parties, Alice and Bob,
$$I^{A;B}(p^{AB}) := H(p^A) + H(p^B) - H(p^{AB}), \qquad (9)$$
with the last part of this expression written in a novel way using Hadamard arithmetic. An equivalent, alternative definition of mutual information is
$$J^{A;B}(p^{AB}) := H(p^A) - H(p^{A|B}). \qquad (10)$$
Consequently, as $H(p^{AB}) = H(p^B) + H(p^{A|B})$, the quantities $I^{A;B}(p^{AB})$ (9) and $J^{A;B}(p^{AB})$ (10) are equal.
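The equality of the two mutual-information expressions is easy to check numerically. The following sketch (function names are ours, not from the text) computes $I$ per Eq. (9) and $J$ per Eq. (10) for an arbitrary joint matrix:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_m p_m log2 p_m, with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def mutual_info_I(pAB):
    """Eq. (9): I = H(p_A) + H(p_B) - H(p_AB)."""
    return entropy(pAB.sum(axis=1)) + entropy(pAB.sum(axis=0)) - entropy(pAB)

def mutual_info_J(pAB):
    """Eq. (10): J = H(p_A) - H(p_{A|B}), using H(A|B) = H(AB) - H(B)."""
    H_A_given_B = entropy(pAB) - entropy(pAB.sum(axis=0))
    return entropy(pAB.sum(axis=1)) - H_A_given_B

pAB = np.array([[0.4, 0.1],
                [0.1, 0.4]])  # correlated two-bit joint state
```

For any valid joint distribution the two quantities agree to machine precision, illustrating the identity stated above.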

B. Stochastic map and stochastic matrix
A noisy channel is any mapping that changes the entropy (or noise, which is monotonically related) of a state in a non-decreasing way and adds noise to at least one state [11]. We are specifically interested in noisy channels that can be represented as stochastic matrices that map probability vectors representing states [8].
Under the action of a channel represented by the matrix $E$, the state, represented by $p$, maps to $Ep$. We require that $E$ be a square matrix with nonnegative entries whose columns each sum to one; hence, the norm of the state $p$ is unchanged by the stochastic map. A doubly stochastic matrix is a stochastic matrix whose rows and columns both sum to one; for doubly stochastic $E$, the entropy of the state after passing through the channel satisfies $H(Ep) \ge H(p)$.
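These two properties, norm preservation and, for a doubly stochastic matrix, entropy non-decrease, can be checked numerically; the construction below (variable names are illustrative) builds a random doubly stochastic matrix as a convex mixture of permutation matrices:

```python
import numpy as np
from itertools import permutations

def entropy(p):
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(0)

# Random doubly stochastic 3x3 matrix: convex mixture of the 3! = 6
# permutation matrices, so rows and columns each sum to one.
perm_mats = [np.eye(3)[list(s)] for s in permutations(range(3))]
w = rng.random(len(perm_mats))
w /= w.sum()
E = sum(wi * P for wi, P in zip(w, perm_mats))

p = rng.random(3)
p /= p.sum()          # a random state (probability vector)
q = E @ p             # state after the channel
```

Here `q` remains a normalized probability vector, and its entropy is at least that of `p`, as guaranteed by majorization for doubly stochastic maps.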
In the bipartite setting, an identity mapping by Alice concomitant with a stochastic map $E$ by Bob yields the resultant bipartite state $p^{AB}E$, with the trivial identity map $1$ on Alice's side and the noise matrix $E$ acting only on Bob's share. By the Perron-Frobenius theorem, $E$ being stochastic or doubly stochastic implies that this mapping has at least one stationary vector with all entries positive real numbers, with this vector corresponding to the largest eigenvalue of the matrix representing the mapping [12]. Although we discuss discord and total-variation distance in terms of measurement described by a noisy measurement channel represented by a stochastic matrix, we focus on doubly stochastic matrices due to the abundance of mathematical properties that we can exploit for generating and understanding our results. For stochastic information theory, doubly stochastic matrices are the non-quantum analogue of quantum completely positive trace-preserving maps [13].
Here we present key background on doubly stochastic matrices needed for our study; specifically, we define doubly stochastic matrices and connect them with the Birkhoff polytope, also known as a permutahedron. Birkhoff's theorem says that any doubly stochastic matrix can be written as a convex combination of permutation matrices, the set of which is known as the Birkhoff polytope [14]. A doubly stochastic matrix thus represents a random permutation of the message basis. Furthermore, for each strictly positive matrix $A$, exactly one doubly stochastic matrix $T_A$ exists such that $T_A = D_1 A D_2$, with the diagonal matrices $D_1$ and $D_2$ having positive diagonal elements and themselves unique up to a scalar factor [15,16]. A permutation $\sigma$ is represented by a permutation matrix $\Pi_\sigma$, whose entries are all zeroes and ones such that exactly one instance of one appears in each row and column. For $\wp$ a length-$M!$ probability vector, a permutahedron $E$ is the convex sum
$$E = \wp \cdot \Pi = \sum_{\sigma} \wp_\sigma \Pi_\sigma.$$
Note that $E\Pi_\sigma$ and $\Pi_\sigma E$ are again doubly stochastic.
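The Sinkhorn-type normalization behind the $T_A = D_1 A D_2$ statement can be sketched by alternately normalizing rows and columns of a strictly positive matrix; `sinkhorn` is our illustrative name, and the iteration count is an arbitrary choice:

```python
import numpy as np

def sinkhorn(A, iters=1000):
    """Alternate row/column normalization of a strictly positive matrix;
    the iterates converge to the unique doubly stochastic T = D1 A D2."""
    T = np.array(A, dtype=float)
    for _ in range(iters):
        T /= T.sum(axis=1, keepdims=True)  # make rows sum to one
        T /= T.sum(axis=0, keepdims=True)  # make columns sum to one
    return T

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
T = sinkhorn(A)
```

The iteration converges quickly for strictly positive matrices, yielding a matrix whose rows and columns each sum to one up to numerical tolerance.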

C. Classical discord and distortion for a state
In Appendix C, we summarise quantum discord; in this subsection, we summarise classical discord for stochastic-information states [8]. Whereas quantum discord is the discrepancy $\Delta^{A;B}$ between mutual information $I$ (9) and $J$ (10) for quantum states, classical discord (11) is zero in the ideal case. However, analogous to the quantum case, which optimizes over all possible measurements, classical discord is nonzero when it involves noisy measurements.
Alice's measurements are treated as ideal whereas Bob's measurements are treated as noisy, described by a stochastic or doubly stochastic mapping $E$ acting on the state (12) [8]. The resultant state, after Bob's noisy measurement, is $p^{AB}E$ (12). Whereas the mutual information $I$ (9) is known, the alternative mutual information (10) is modified to include the effect of Bob's noisy measurements and is consequently described by
$$J^{A;B}_E(p^{AB}) := J^{A;B}(p^{AB}E), \qquad (14)$$
with the subscript $E$ referring to Bob's noisy channel, as we always treat Alice's channel as ideal: $1$. In other words, the conditional information inherent in inferring the alternative mutual information (14) involves Bob announcing his results to Alice, and Bob's measurement apparatus is noisy, described by stochastic or doubly stochastic channels as in §II B, prior to ideal measurement and announcement by Bob. Following this definition of alternative mutual information involving noisy measurement (14), discord for stochastic information is [8]
$$\Delta^{A;B}_E(p^{AB}) := I^{A;B}(p^{AB}) - J^{A;B}_E(p^{AB}) \qquad (15)$$
for a specified noisy channel $E$, which affects only $J$ and not $I$. State discord (15) is considered classical because states are distributions and channels are stochastic maps; i.e., all the mathematical objects are distributions and their mappings, and hence no Hilbert-space description is required. This $E$-dependent discord is necessarily nonnegative due to the data-processing inequality [11]. By analogy with quantum discord, which minimizes over all measurements, classical discord corresponds to minimizing state discord (15) over all allowed channels $\{E\}$ [8]. For shared stochastic-information states, classical discord quantifies how much stochasticity is added by a noisy measurement process.
If this noise is described by a doubly stochastic channel, then, following Birkhoff's theorem, the noise corresponds to random permutations, i.e., to instances of measuring some messages incorrectly as other messages, with the identity permutation corresponding to measuring all messages correctly. Non-zero discord can be interpreted as quantifying stochasticity added by measurement only if entanglement is zero; otherwise a quantum model is required to describe the correlations [8]. Analogous to quantum state merging operationalizing quantum discord [4], stochastic-information state merging operationalizes classical discord [8].
Channel distortion is used in rate-distortion theory [11], which quantifies the minimum number of bits per symbol required over a channel so that the input signal can be approximately reconstructed at the output without exceeding a given expected distortion. Mathematically, in rate-distortion theory, distortion functions quantify the cost of representing a symbol by an approximate symbol. Typical distortion functions include Hamming distortion, squared-error distortion and total variation.
The total-variation, or Kolmogorov, distance between probability distributions $p$ and $p'$ (B2), namely [17]
$$D(p, p') := \tfrac12 \lVert p - p' \rVert_1 = \tfrac12 \sum_m |p_m - p'_m|, \qquad (16)$$
has been widely used for extremum problems, such as controlling uncertain stochastic systems [18], approximating a family of probability distributions by a given probability distribution, maximizing or minimizing entropy subject to total-variation distance constraints, quantifying uncertainty of probability distributions, stochastic minimax control, and many problems of information theory, decision theory and minimax theory [19], as well as testing for scale families [20] and quantifying distortion of channels [11]. Thus, total-variation distance is well studied and valuable across a broad spectrum of applications, including, for us, comparing total-variation distance to discord.
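As a concrete reference point, total-variation distance is half the one-norm of the difference of the two distributions (we assume the standard 1/2 normalization here):

```python
import numpy as np

def total_variation(p, q):
    """Total-variation (Kolmogorov) distance: half the 1-norm of p - q."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return 0.5 * np.abs(p - q).sum()

# Distance between a fair coin and a biased coin
d = total_variation([0.5, 0.5], [0.8, 0.2])
```

With this normalization the distance lies in $[0, 1]$, reaching $1$ only for distributions with disjoint support.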

III. APPROACH
In this section, we begin by explaining our model, which involves three agents: Alice and Bob, who share messages, and Charlie, who provides random messages from a distribution. After describing the model, we develop in §III B the mathematics required to analyse the effect of noisy measurement in terms of average discord and average distortion. Finally, in §III C, we elaborate on our methods for solving these expressions and explain what we plot.

A. Model
We describe our model for discord as a three-agent protocol involving Charlie, Alice and Bob. By describing the tasks performed by each of the three agents, we fully specify the protocol and the pertinent quantifiers of discord and distortion. Although this model is implicit in a previous study of classical discord [8], we make the agents of this protocol and their actions explicit for clarity in our study of channel discord.
Charlie generates joint distributions with prior $Q(p^{AB})$ and then computes $I^{A;B}(p^{AB})$ (9) for each $p^{AB}$. For given $p^{AB}$ (1), Charlie generates a length-$\varsigma$ sequence of pairs of integers by sampling from $p^{AB}$. In each instance, the first integer message $m_A$ is sent to Alice and the second integer message $m_B$ is sent to Bob. As Alice and Bob can experience noise in their readout, the resultant messages, $m'_A$ and $m'_B$, can differ from the original messages, $m_A$ and $m_B$. Alice and Bob send this noisy pair, $m'_A$ and $m'_B$, back to Charlie. At the end of this part of the protocol, Charlie has stored the length-$\varsigma$ sequence $\{(m'_A, m'_B)\}$. Charlie then infers the distribution $p^{A'B'}$ from these data, with this inferred state represented by $\hat p^{A'B'}$.
As Alice's instrument is assumed noiseless, $m'_A \equiv m_A$, and Charlie's procedure is greatly simplified: he does not send Alice the message but just stores it. The message pair is thus $(m_A, m'_B)$, and the inferred state approximates the actual state $p^{AB}E$ obtained from Eq. (1) after Bob's noisy measurement. Charlie computes all permutations of the state $p^{AB}E$, with each permuted state denoted $p^{AB}E\Pi_\sigma$.
He thence estimates the alternative mutual information $J^{A;B}_\sigma$ (10), with the subscript $\sigma$ indicating which of the permuted states $p^{AB}E\Pi_\sigma$ is being considered. With these results at hand, Charlie estimates the discord for each $p^{AB}E\Pi_\sigma$ and computes the minimum discord over all $\sigma$. Then he averages the results over many generated states $p^{AB}$ to obtain an estimate of average discord for the specific channel corresponding to Bob's noisy measurement.
For each estimate $p^{AB}E\Pi_\sigma$, he computes the distortion, which he quantifies by the total-variation distance $D^{A;B}_\sigma(p^{AB}E\Pi_\sigma)$ between $p^{AB}$ and the estimate $p^{AB}E\Pi_\sigma$. Charlie repeats this task for all permutations $\sigma$ to obtain the minimum total-variation distance and then averages over all states to obtain average distortion. The mathematical description of this procedure is in §III B 4.
In each instance, Alice receives the noiseless message $m_A$, i.e., the versor $\delta_{m_A}$, which she reads and sends back to Charlie unchanged. Thus, for our mathematical analysis, Alice's role is superfluous and hence neglected in our protocol.
In each instance, Bob receives message $m_B$. His measurement is noisy, which we describe by a doubly stochastic channel $E$ as discussed in §II B. This noise corresponds to permutations of the message basis, so some messages are incorrectly read as other messages, except in the case of the identity permutation $1$, which corresponds to reading the message correctly.
To elucidate our model, we consider the specific case of a two-bit channel. Thus, we assume that Charlie generates a single bit each for Alice and Bob; i.e., $M_A = M_B = 2$ and $m_A, m_B \in \{0, 1\}$. Let the noisy channel map 0 to 0, i.e., $0 \to 0$, with probability $2/3$. Then, by the doubly stochastic property, $0 \to 1$ with probability $1/3$, $1 \to 1$ with probability $2/3$ and $1 \to 0$ with probability $1/3$, so the matrix describing this mapping,
$$E = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix},$$
is doubly stochastic. Bob sends the message $m'_B$ obtained from his measurement back to Charlie. Note that, in executing this protocol, the same stochastic matrix, which describes Bob's measurement noise, is applied once per instance, i.e., each time Bob reports a measurement outcome.
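This two-bit readout noise is easy to simulate; the sampling routine below is our sketch, with the channel matrix taken directly from the example above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Bob's doubly stochastic readout: correct bit with prob 2/3, flip with 1/3
E = np.array([[2/3, 1/3],
              [1/3, 2/3]])

def noisy_readout(bit):
    """Sample Bob's reported bit; column `bit` of E gives outcome probs."""
    return rng.choice(2, p=E[:, bit])

# Send the message 0 many times; the empirical flip rate approaches 1/3
reports = np.array([noisy_readout(0) for _ in range(30000)])
flip_rate = reports.mean()
```

Repeated sampling like this mirrors how Charlie would infer the noisy state from the length-$\varsigma$ record of reported messages.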

B. Mathematics
In this subsection, we describe in §III B 1 how Charlie generates states as randomly chosen joint distributions to be sent to Alice and Bob, and then we describe in §III B 2 how random channels are generated for Bob. In §III B 3 we describe mathematically how the channel is applied to the joint state. Permutations are applied to states, and both discord and distortion are minimized over all permutations, as described in §III B 4. Finally, also in §III B 4, we explain how average discord and average distortion are estimated.

Generating joint distributions
In this subsubsection, we explain mathematically how Charlie generates $p^{AB}$. Charlie constructs a prior $Q(p^{AB})$ that is heavily weighted over high-entropy states, meaning that the state entropy (2) is close to its upper bound. By sampling this prior, Charlie obtains states with high, low and medium entropy via linear interpolation between states drawn randomly from $Q(p^{AB})$ and the state represented by the identity matrix, denoted $1$. This linear interpolation generates a continuum of interpolated states for each of the $N_{\mathrm{rand}}$ states. Thus, Charlie generates random states from which he draws messages to send to Alice and Bob.
To sample from $Q(p^{AB})$, we generate a random $p \in \mathrm{mat}_M$ and then normalize it such that the sum of its diagonal elements is one. As for general states, such conditionally pure states also tend to have high entropies.
Sampling either $Q(p^{AB})$ or $Q(p^{AB})_{\mathrm{cp}}$ yields a candidate state $p^{AB}_{\mathrm{cp}}$, which is then mapped to a family according to Eq. (19), where $B \gg 1$ ensures sufficiently many medium- and low-entropy states. The channel representing Bob's noisy measurement then acts on Bob's message share; we explain how to generate these channels in the next subsection.
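A minimal sketch of this state-generation step follows, under our reading of the interpolation (19) as a normalized mixture of the identity (weight $a$) and a random candidate state (weight $b = (1 - a)B$); since the exact form of Eq. (19) is not reproduced here, treat the mixing rule as an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2  # message size per party

def random_candidate():
    """Sample a random joint state: nonnegative matrix with unit sum."""
    p = rng.random((M, M))
    return p / p.sum()

def interpolated_state(a, B=99):
    """Assumed reading of Eq. (19): mix the identity (weight a) with a
    random candidate state (weight b = (1 - a) B), then renormalize."""
    b = (1 - a) * B
    mix = a * np.eye(M) + b * random_candidate()
    return mix / mix.sum()

# A continuum of interpolated states for one random draw per value of a
family = [interpolated_state(a) for a in np.linspace(0.0, 1.0, 11)]
```

Every member of the family is a valid joint distribution, and sweeping $a$ moves between the random (high-entropy) candidate and the conditionally pure identity state.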

Generating channels
In this subsubsection, we explain how to generate a random doubly stochastic channel for Bob. As a doubly stochastic channel is a permutahedron, discussed in §II B, generating a random channel is equivalent to constructing the length-$M_B!$ weight, or probability, vector $\wp$, whose entries are the probability coefficients for the reverse lexicographic ordering of permutations $\{\sigma\}$, matching the reverse lexicographic ordering of the vector of permutation matrices (B4). A given weight vector $\wp$ has associated entropy $H(\wp)$ (20), which is the entropy of the corresponding channel $E$. Now we explain how to generate $\wp$ from a distribution $P$ that is an equal weighting of a uniform prior $P_\uparrow$, yielding high-entropy weight vectors $\wp_\uparrow$ whose entropy $H(\wp_\uparrow)$ (2) is the maximum entropy $\log M_B!$, and another prior $P_\downarrow$ that generates weight vectors $\wp_\downarrow$ with low entropy $H(\wp_\downarrow)$ (2). To generate $\wp_\uparrow$, we first set each entry of $\wp_\uparrow$ to 1 and then normalize this length-$M_B!$ weight vector by dividing each element by $\lVert \wp \rVert_1$. In contrast, we generate $\wp_\downarrow$ by drawing its first element, $\wp^\downarrow_1$, uniformly from the interval $[0, 1]$; the next element, $\wp^\downarrow_2$, is drawn uniformly from the interval $[0, 1 - \wp^\downarrow_1]$, and we continue according to the rule that $\wp^\downarrow_k$ is drawn uniformly from $[0, 1 - \sum_{j<k} \wp^\downarrow_j]$. After randomly generating all $M_B!$ elements, we normalise this weight vector to obtain $\wp_\downarrow$. Now that we have generated an instance of a low-entropy weight vector $\wp_\downarrow$ and a high-entropy weight vector $\wp_\uparrow$, we generate numerous vectors in between by linear interpolation, thereby sampling the continuous set
$$\{\wp(\wp_\downarrow, a) := a\,\wp_\uparrow + (1 - a)\,\wp_\downarrow : a \in [0, 1]\}. \qquad (22)$$
This interpolation (22) yields medium-entropy channels to round out the sampling.
As we have generated many random instances of $\wp$ (22), we can construct the corresponding descriptions of doubly stochastic channels. The elements of $\wp$ are coefficients of the reverse lexicographically ordered permutation matrices, and this weighted sum is the permutahedron that describes the random doubly stochastic channel $E$, with entropy given by the entropy of its representative weight vector. Mathematically, the matrix description of the channel is
$$E = \wp \cdot \Pi = \sum_{\sigma \in S_{M_B}} \wp_\sigma \Pi_\sigma \qquad (23)$$
for $\wp$ and $\Pi$ length-$M_B!$ vectors of real numbers and permutation matrices (B4), respectively, with $\sigma \in S_{M_B}$ drawn in reverse lexicographic order.
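The construction of a channel from a weight vector over permutations can be sketched as follows; we use `itertools.permutations`, sorted in reverse, as a stand-in for the reverse lexicographic convention, and the function name is ours:

```python
import numpy as np
from itertools import permutations

def channel_from_weights(w, M):
    """Sketch of Eq. (23): E = sum_sigma w_sigma Pi_sigma over all M!
    permutation matrices of size M x M, with w a probability vector."""
    perms = sorted(permutations(range(M)), reverse=True)
    assert len(w) == len(perms)
    E = np.zeros((M, M))
    for w_s, s in zip(w, perms):
        E += w_s * np.eye(M)[list(s)]  # permutation matrix Pi_sigma
    return E

# An arbitrary normalized weight vector over the 3! = 6 permutations of S_3
w6 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
w6 /= w6.sum()
E = channel_from_weights(w6, 3)
```

Any probability weight vector produces a valid doubly stochastic matrix, since each $\Pi_\sigma$ is doubly stochastic and the set is closed under convex combination.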

Applying the channel to the joint distributions
To describe Bob's noisy measurement mathematically, we apply the generated random channel, discussed in §III B 2, to Bob's share of the entire message $p^{AB}$ sent by Charlie. First suppose that Alice and Bob each have noisy measurements described by channels $E^A$ and $E^B$, respectively. Then the state sent back to Charlie from Alice and Bob is
$$E^A p^{AB} E^B, \qquad (24)$$
corresponding to one application of a noisy channel for each of Alice's and Bob's noisy measurements but noiseless transmission back to Charlie. As Alice's measurement is noiseless, we assign $E^A = 1$, so the returned state is $p^{AB}E$ with $E := E^B$.
Remark 1. In §III A, Charlie obtains an estimate of $p^{AB}E$ by repeated sampling; here, in the mathematical description, we work with exact descriptions of the state (24).

Estimating discord and distortion
After Charlie receives $p^{AB}E$ (24), he computes all permutations of this matrix. Charlie generates each of the $M_B!$ instances of $M_B \times M_B$ permutation matrices $\Pi_\sigma$ (B3), for each $\sigma$ drawn from the permutation group $S_{M_B}$ in reverse lexicographic order. For each instance $\sigma$, Charlie obtains the permuted state by multiplying $p^{AB}E$ by the permutation matrix $\Pi_\sigma$.
Here we redefine average discord to quantify channel distortion, in contrast to the earlier definition of classical discord in terms of fluctuations for a state [8], and we define average distortion. This redefinition of average discord in terms of channels allows us to show numerically the monotonic relationship between classical (channel) discord and channel distortion. Average discord is obtained by first minimizing discord (15) over all state permutations and then averaging over all states according to the prior $Q(p^{AB})$ of §III B 1. Similarly, average distortion is obtained by averaging state-dependent distortion (16) over the prior of states $Q(p^{AB})$.
For average discord, we first extend state-dependent discord (15) to its minimum over all permutations of the state,
$$\Delta^{A;B}(E, p^{AB}) := \min_{\sigma \in S_{M_B}} \Delta^{A;B}_{E\Pi_\sigma}(p^{AB}). \qquad (26)$$
Averaging over the prior $Q(p^{AB})$ yields the channel discord
$$\Delta^{A;B}(E) := \int \Delta^{A;B}(E, p^{AB})\, \mathrm{d}Q(p^{AB}), \qquad (27)$$
which quantifies the discord due to noisy measurement in a state-independent, but of course prior-dependent, way. Average discord is obtained by sampling the integral (27) to obtain the estimate $\hat\Delta^{A;B}(E)$. Similarly, we extend the definition of distortion $D$ (16), first by minimizing over permutations and then by averaging over the prior $Q(p^{AB})$. The state- and channel-dependent distortion is
$$D^{A;B}_E(p^{AB}) := D(p^{AB}, p^{AB}E), \qquad (28)$$
and its minimization over all permutations is
$$D^{A;B}(E, p^{AB}) := \min_{\sigma \in S_{M_B}} D^{A;B}_{E\Pi_\sigma}(p^{AB}). \qquad (29)$$
Then, analogous to average discord (27), we integrate to obtain the channel distortion
$$D^{A;B}(E) := \int D^{A;B}(E, p^{AB})\, \mathrm{d}Q(p^{AB}) \qquad (30)$$
for a given channel. Average distortion is obtained by sampling the integral (30) to obtain the estimate $\hat D^{A;B}(E)$.
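The minimization and Monte-Carlo averaging for distortion can be sketched directly; the uniform random-matrix prior below is a stand-in for $Q(p^{AB})$, not the exact prior of §III B 1:

```python
import numpy as np
from itertools import permutations

def tv(p, q):
    """Total-variation distance between two joint states."""
    return 0.5 * np.abs(np.ravel(p) - np.ravel(q)).sum()

def min_distortion(pAB, E):
    """Sketch of Eq. (29): minimum over output relabelings sigma of the
    total-variation distance between p_AB and p_AB E Pi_sigma."""
    M = E.shape[0]
    return min(tv(pAB, pAB @ E @ np.eye(M)[list(s)])
               for s in permutations(range(M)))

def avg_distortion(E, n_states=200, seed=3):
    """Sketch of Eq. (30): Monte-Carlo average over random joint states."""
    rng = np.random.default_rng(seed)
    M = E.shape[0]
    total = 0.0
    for _ in range(n_states):
        p = rng.random((M, M))
        total += min_distortion(p / p.sum(), E)
    return total / n_states
```

The noiseless channel gives zero average distortion, because the identity permutation already matches the original state exactly, while any genuinely noisy doubly stochastic channel gives a strictly positive average.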

Example: Two-bit channel
We now elucidate our model by considering the special case where Charlie generates a single bit for Alice and a single bit for Bob: $M_A = M_B = 2$. Thus, Eq. (1) implies that the joint probability of messages shared between Alice and Bob is specified by the matrix
$$p^{AB} = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}, \quad p_{ij} \ge 0, \quad p_{00} + p_{01} + p_{10} + p_{11} = 1. \qquad (31)$$
The two-bit state (31) is subjected to a distortion channel whose form is dictated by Eq. (23), which implies
$$E(\mu) = \mu \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + (1 - \mu) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \mu & 1 - \mu \\ 1 - \mu & \mu \end{pmatrix}, \quad \mu \in [0, 1]. \qquad (32)$$
Consequently, the entropy is
$$H(\wp) = -\mu \log \mu - (1 - \mu) \log(1 - \mu) \qquad (33)$$
according to Eq. (20).
Applying the channel to the two-bit state yields
$$p^{AB} E(\mu) = \begin{pmatrix} \mu p_{00} + (1 - \mu) p_{01} & (1 - \mu) p_{00} + \mu p_{01} \\ \mu p_{10} + (1 - \mu) p_{11} & (1 - \mu) p_{10} + \mu p_{11} \end{pmatrix}. \qquad (34)$$
We then compute the state discord per Eq. (26), and we could perform a similar calculation for state distortion per Eq. (16). The explicit expression for discord (26) contains the terms
$$-(p_{00} + p_{10}) \log(p_{00} + p_{10}) - (p_{01} + p_{11}) \log(p_{01} + p_{11}) + p_{00} \log p_{00} + p_{01} \log p_{01} + p_{10} \log p_{10} + p_{11} \log p_{11},$$
together with the corresponding $\mu$-dependent terms for the noisy state (34), as a function of the state $p^{AB}$ and the channel parameter $\mu$.
If we could integrate discord and distortion over all possible joint states, i.e., over the formal variables $p_{00}, p_{01}, p_{10}, p_{11}$ with respect to an appropriate prior distribution, then we could obtain channel discord and distortion via Eqs. (27) and (30), respectively; note that the resulting expressions depend only on $\mu$. On the other hand, if discord is monotonic with respect to entropy for all states $\{p^{AB}\}$, i.e., for all points of the probability tetrahedron, then discord would be monotonic with respect to this average.
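For numerical exploration of this two-bit case, the discord reduces to a one-parameter function of $\mu$; the sketch below adopts the simplifying reading that $J_E$ is the mutual information of the noisy state, which uses $I = J$ for classical states:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def mutual_info(m):
    return entropy(m.sum(1)) + entropy(m.sum(0)) - entropy(m)

def two_bit_discord(pAB, mu):
    """Discord under the channel E(mu) of Eq. (32), taking J_E as the
    mutual information of the noisy state (a simplifying assumption)."""
    E = np.array([[mu, 1 - mu],
                  [1 - mu, mu]])
    return mutual_info(pAB) - mutual_info(pAB @ E)

pAB = np.array([[0.4, 0.1],
                [0.1, 0.4]])
# mu = 1: noiseless, zero discord; mu = 1/2: all correlation destroyed
```

At $\mu = 1$ the channel is the identity and the discord vanishes; at $\mu = 1/2$ the channel erases Bob's information entirely, so the discord equals the full mutual information of the input state.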

C. Methods
In this subsection we discuss how we compute average discord and average distortion for many randomly chosen channels, with each channel corresponding to a different noisy measurement process implemented by Bob. First, we explain in §III C 1 how many instances of states we generate and how we create those instances. Then, in §III C 2, we explain how we generate all permutation matrices and thence all permuted states. Next, we explain how we generate random channels $\{E\}$ numerically, and how many, in §III C 3. Finally, in §III C 4, we explain how we calculate the states sent back to Charlie, how we use these states to compute average discord and average distortion, and how we then study their relation.

Numerically generating states
We begin by generating joint states from either prior $Q(p^{AB})$ or $Q(p^{AB})_{\mathrm{cp}}$ as described in §III B 1. In practice, we generate random states as follows. First we choose $B = 99$ in Eq. (19), as we have discovered empirically that this value of $B$ yields a good spread of low-, medium- and high-entropy states. Then we step through values of the linear-interpolation parameter $a$ in step sizes that grow quadratically: the coefficient $a$ of $1$ increases in steps of $([\ell - 1] \times 0.0101)^2$ for $\ell \in [100]$.
For each choice of $a$, and fixing $b = (1 - a)B$, we choose a new random instance of $p^{AB}$ according to the random-matrix construction method described in §III B 1. We insert $a$, $b$ and $p^{AB}$ into Eq. (19) to obtain one instance of a state for calculating average discord and average distortion.

Numerically generating permutation matrices

We apply perms from MATLAB to the initial vector, and the output is the $M_B! \times M_B$ matrix $\Theta$, whose elements are column indices for the nonzero entries of the permutation matrices $\Pi_\sigma$. For each of the $M_B!$ rows, labelled by $\sigma$ in reverse lexicographic order, we construct the corresponding $M_B \times M_B$ permutation matrix $\Pi_\sigma$, whose entries are all zeroes and ones such that only one instance of one appears in each row or column, as described in and around Eq. (B3).
Specifically, to generate $\Pi_\sigma$, we pick row $\sigma$ from the matrix $\Theta$, denoted as the vector $\Theta_\sigma$. The value of the entry in the first column of $\Theta_\sigma$ indicates which element of the first row of $\Pi_\sigma$ is one, with the rest of the elements in that row being zero. We then proceed to the second entry of the vector $\Theta_\sigma$, whose value indicates which element of the second row of $\Pi_\sigma$ is one. We continue for all $M_B$ rows of $\Pi_\sigma$ and then repeat for all $\sigma \in S_{M_B}$. In this way we construct the full set of permutation matrices for message $m_B$.
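In Python, the analogue of MATLAB's perms output and the row-by-row construction of each $\Pi_\sigma$ can be sketched as:

```python
import numpy as np
from itertools import permutations

M_B = 3

# Analogue of MATLAB's perms: each row of Theta lists, for one sigma,
# the column index of the single one in each row of Pi_sigma.
# itertools.permutations is lexicographic, so reverse for the paper's order.
Theta = sorted(permutations(range(M_B)), reverse=True)

def perm_matrix(theta_row):
    """Build Pi_sigma row by row: entry theta_row[r] marks the one in row r."""
    M = len(theta_row)
    Pi = np.zeros((M, M))
    for row, col in enumerate(theta_row):
        Pi[row, col] = 1.0
    return Pi

Pis = [perm_matrix(t) for t in Theta]  # full set of M_B! permutation matrices
```

This yields all $M_B! = 6$ distinct permutation matrices for $M_B = 3$, each with exactly one unit entry per row and per column.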

Numerically generating random channels
We generate 6000 doubly stochastic channels, each represented by some weight vector $\wp(\wp_\downarrow, a)$ (22), regardless of message size. Construction of each of these weight vectors proceeds according to the mathematical description in §III B 2. We generate 6000 instances of random channels because we allow for 100 equally spaced values of $a$ in Eq. (22) and 60 randomly chosen $\wp_\downarrow$ for each $a$.
A doubly stochastic channel is a point in the permutahedron whose vertices are the permutation matrices Π σ , σ ∈ S M B , which we generated in §III C 2. Specifically, we generate random weight vectors according to Eq. (22), thereby yielding low-, medium- and high-entropy (20) weight vectors. The resultant set of 6000 randomly generated weight vectors faithfully represents 6000 randomly generated channels, and our interpolation (22) ensures good sampling of a wide range of channel entropies. In addition, we manually add the noiseless channel 1 to our simulations to include the instance of zero average discord and zero average distortion.
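As a minimal sketch of this sampling in Python: any probability vector over the permutations yields a doubly stochastic matrix as a convex combination of permutation matrices. The Dirichlet draw below is a hypothetical stand-in for the paper's weight-vector construction (22); the a-interpolation is not reproduced here.

```python
from itertools import permutations

import numpy as np

def random_doubly_stochastic(M, rng):
    """Random doubly stochastic channel as a convex combination of
    permutation matrices (Birkhoff-von Neumann form).

    The Dirichlet weight vector is a hypothetical stand-in for the
    weight construction of Eq. (22)."""
    perms = list(permutations(range(M)))[::-1]  # reverse lexicographic
    w = rng.dirichlet(np.ones(len(perms)))      # random weight vector
    E = np.zeros((M, M))
    for w_sigma, sigma in zip(w, perms):
        for r, c in enumerate(sigma):
            E[r, c] += w_sigma                  # accumulate w_sigma * Pi_sigma
    return E

E = random_doubly_stochastic(4, np.random.default_rng(0))
```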

Relating average discord to average distortion
Now that state p AB is generated numerically according to the procedure described in §III C 1, for both Q p AB corresponding to random initial states and Q p AB cp for random conditionally pure states, we calculate the corresponding p AB E for each permutation σ. These permuted returned states {p AB EΠ σ } are used to calculate both average discord and average distortion. Mathematical expressions for average discord and average distortion are given in §III B 4.
We begin with how we calculate average discord ∆ A;B (E) (27). First we calculate each ∆ A;B E p AB Π σ for each p AB and then minimize over all σ ∈ S M B according to Eq. (26), thereby obtaining the minimized ∆ A;B (E). The next step is to average over all p AB . As explained in §III C 1, we generate a random state p AB for each choice of linear interpolation parameter a (22), which suffices to sample the integral (27) fairly and thus obtain a good estimate ∆̃ A;B (E) of the actual average discord ∆ A;B (E).
The procedure for calculating average distortion D A;B (E) (30) is similar. First we calculate each D A;B E p AB Π σ for each p AB and then minimize over all σ ∈ S M B according to Eq. (29), thereby obtaining the minimized D A;B (E). The next step is to average over all p AB . As explained in §III C 1, we generate a random state p AB for each choice of linear interpolation parameter a (22), which suffices to sample the integral (30) fairly and thus obtain a good estimate D̃ A;B (E) of the actual average distortion D A;B (E).
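The permutation-minimized total-variation distortion can be sketched in a simplified one-party form (the paper's definitions (26)-(30) act on the joint state p AB; here, as an illustrative assumption, we act on a single distribution):

```python
from itertools import permutations

import numpy as np

def tv_distance(p, q):
    """Total-variation distance between two probability vectors."""
    return 0.5 * float(np.abs(p - q).sum())

def min_distortion(E, p):
    """Distortion of channel E on input p, minimized over message
    relabellings (permutations), in the spirit of Eq. (29).
    Simplified sketch: acts on one distribution, not the joint state."""
    out = E @ p
    M = len(p)
    # Relabelling outputs by sigma corresponds to applying Pi_sigma.
    return min(tv_distance(out[list(sigma)], p)
               for sigma in permutations(range(M)))

p = np.array([0.7, 0.2, 0.1])
noiseless = np.eye(3)
```

A noiseless channel gives zero distortion, and a pure relabelling (a permutation matrix) is also undone by the minimization, which is the point of minimizing over σ.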
Finally, we relate estimated average discord to estimated average distortion by plotting ∆̃ A;B (E) against D̃ A;B (E). Specifically, we choose sufficiently large yet tractable message sizes, namely M B ∈ {6, 7}, and make distinct plots for each M B . For each randomly chosen channel, the resultant single point on the graph, corresponding to ∆̃ A;B (E) and D̃ A;B (E), is marked, and we thereby obtain a scatter plot. We create plots for two cases, random initial states and random initial conditionally pure states, and compare these two cases.

Example: Two-bit channel
We explicitly analyze the relationship between average channel discord (27) and channel distortion (30) in the case of a two-bit channel as described in Section III B 5. By doing so, we establish the monotonicity of discord as a function of channel entropy. Our approach follows.
1. We first recognize that the set of possible channels E can be parametrized by a single number 0 ≤ µ ≤ 1 per Eq. (32). Hence, the channel discord can be expressed as a function of µ. Furthermore, the channel discord is defined to be an integral with respect to some measure over an integrand that itself depends on µ. If we prove that the integrand is monotonic in µ over some interval, we will also have proved that the integral is monotonic in µ over the same interval.
2. We can therefore assess the monotonicity of the channel discord (26), which we write here simply as ∆, as a function of H (33) by examining the derivative of ∆ with respect to H and applying the chain rule: d∆/dH = (d∆/dµ)/(dH/dµ). This expression is well defined except when µ = 1/2, where dH/dµ vanishes, but this point corresponds to the maximum value of H and so is not relevant for monotonicity arguments. Thus, monotonicity of ∆ with respect to H can be proven by demonstrating monotonicity on the intervals 0 < µ < 1/2 and 1/2 < µ < 1, which in turn follows from monotonicity of the integrand as described in the previous point.
The above two points imply that we can prove the monotonicity of channel discord as a function of channel entropy by showing that the state discord is a monotonically increasing (decreasing) function of µ for 0 < µ < 1/2 (1/2 < µ < 1). We prove that this is so in §IV D.
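The chain-rule step above can be made explicit if the channel entropy (33) is the binary entropy of the weight vector (µ, 1 − µ); this identification is an assumption of the following sketch, not a quotation of the paper's Eq. (33):

```latex
% Sketch, assuming H is the binary entropy of (\mu, 1-\mu).
\frac{d\Delta}{dH} = \frac{d\Delta/d\mu}{dH/d\mu},
\qquad
H(\mu) = -\mu \log \mu - (1-\mu)\log(1-\mu),
\qquad
\frac{dH}{d\mu} = \log\frac{1-\mu}{\mu}.
```

Since dH/dµ > 0 for 0 < µ < 1/2 and dH/dµ < 0 for 1/2 < µ < 1, vanishing only at µ = 1/2, establishing the sign of d∆/dµ on each half-interval fixes the sign of d∆/dH there.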

IV. RESULTS
In this section we present our results, which are numerical in nature. We choose tractable message-size values, namely M B ∈ {6, 7}, to study the relation between average discord ∆̃ A;B (E) and average distortion D̃ A;B (E). Specifically, we plot ∆̃ A;B (E) vs D̃ A;B (E) for many generated channels, as described in §III C 3, averaged over randomly generated states. We plot two cases: randomly generated joint distributions in §IV A and randomly generated conditionally pure states in §IV B. Then we explain our best-fit quadratic relation between ∆̃ A;B (E) and D̃ A;B (E) in §IV C. Our methods for generating these plots are described in §III C.

A. Plots for randomly generated joint states
In Fig. 1, we have plotted estimated average discord ∆̃ A;B (E) (27) and estimated average distortion D̃ A;B (E) (30) for the two cases of total message length M B ∈ {6, 7} as discussed in §III C 4. This scatter plot represents 6001 instances of randomly chosen channels for Bob and randomly chosen initial states by Charlie, and the points are colour-coded by the Shannon entropy of the weight vector representing the channel (20).
The origin of the plot corresponds to zero average discord and zero average distortion and arises for Bob's measurement being noiseless, i.e., for a zero-entropy weight vector ℘. We observe a monotonic trend of increasing average discord with respect to increasing average distortion. This monotonicity inference is reinforced in §IV C, where we explain the best-fit curve, which is certainly monotonic. Furthermore, based on the colour-coded heat map in Fig. 1, we see a monotonic increase in all three quantities: average discord, average distortion and channel entropy. The scatter plot shows more features. The highest point of the curve has the maximum allowed entropy log M B ! (20) for the channel. For the two chosen message sizes, the maximum entropies are log 6! = 9.492 and log 7! = 12.299 (37), respectively. Also, the scatter plot is narrow for low- and high-entropy channels and wide for medium-entropy channels. We have provided Fig. 1(a) and Fig. 1(b) showing scatter plots for M B = 6 and M B = 7, respectively. The two scatter plots are similar. The differences are that the maximum entropy for the second scatter plot is higher due to the larger message size, with the increase in maximum entropy given by the ratio of the numbers in (37). Both estimated average discord and estimated average distortion increase slightly for the increased message size.

B. Plots for randomly generated conditionally pure states
In this subsection we obtain scatter plots of estimated average discord vs estimated average distortion for the case that Charlie generates random conditionally pure states (8) instead of random joint states (1) as was done in §IV A. Other than using conditionally pure states here, we follow exactly the same procedure used to obtain Fig. 1. The purpose of this subsection is to determine whether the two cases of initial random joint distributions and initial random conditionally pure states show the same features.
The scatter plot of estimated average discord vs estimated average distortion is shown in Fig. 2 for initial conditionally pure states. Similarly to Fig. 1, the scatter shows monotonically increasing average discord with respect to average distortion, monotonicity of both with respect to channel entropy represented by the heat map, and wider scatter for medium entropy compared to the narrow scatter width for low and high entropy. The only differences are attributable to the randomness of the generated channels, suggesting that Figs. 1 and 2 are identical up to random-sampling variability.
C. Quadratic best fit to the plots
Now we explain how we fit a curve to the scatter plots of Figs. 1 and 2. Our numerical results fit well with a quadratic curve ∆̃ A;B (E) = t 2 D̃ A;B (E)² + t 1 D̃ A;B (E) + t 0 , with the coefficients {t ı } chosen differently for each plot to minimize root-mean-square error (RMSE). In all four cases, RMSE ≈ 0.1, which indicates a good fit as the RMSE is much smaller than the range of ∆̃ A;B (E).
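A minimal sketch of such a quadratic least-squares fit in Python with `numpy.polyfit`; the data below are synthetic placeholders, not the paper's scatter points.

```python
import numpy as np

# Synthetic stand-in data for (average distortion, average discord);
# the actual points come from the scatter plots of Figs. 1 and 2.
D = np.linspace(0.0, 1.0, 50)
Delta = 0.8 * D**2 + 0.1 * D          # hypothetical quadratic trend

t2, t1, t0 = np.polyfit(D, Delta, 2)  # best-fit coefficients {t_i}
fit = np.polyval([t2, t1, t0], D)
rmse = float(np.sqrt(np.mean((Delta - fit) ** 2)))
```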

D. Monotonicity of two-bit channel discord as function of channel entropy
Now we show that the discord of a two-bit noisy channel varies monotonically with the entropy of that channel. As we explain in Section III C 5, it suffices to show that the state discord ∆ for an arbitrary two-bit state (31) varies monotonically in the parameter µ that specifies the channel (32). This monotonicity relation can in turn be proven by showing d∆/dµ ≶ 0 for µ ≷ 1/2 by exploiting Eq. (36). For ease of calculation, we substitute µ → (1 + α)/2. Thus, we accomplish our aim of proving d∆/dµ ≶ 0 when µ ≷ 1/2 by instead proving that d∆/dα ≶ 0 when α ≷ 0 for −1 < α < 1. First we derive an expression for d∆/dα. A tedious-but-straightforward calculation starting from Eq.
We now show that f α is concave, linear or convex according to whether sgn α = −1, 0 or 1. It is easy to see that f 0 ≡ 0, which is a linear function, so we focus instead on the case α ≠ 0. In that case, the identity ½ log[(1 + y)/(1 − y)] = arctanh y shows that f α (x) = g(αx)/(2α), where g(y) := y arctanh y. Hence we need only show that g(y) is convex for −1 < y < 1. This convexity can be seen directly from the Maclaurin series for arctanh, which makes g a sum of convex even powers of y and hence convex. Thus, f α is convex or concave according to the sign of α. Hence, d∆/dα has the appropriate sign; hence, d∆/dH is monotonic as required.
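The convexity of g(y) = y arctanh y can be spot-checked numerically via discrete second differences on a uniform grid:

```python
import numpy as np

# g(y) = y * arctanh(y) on (-1, 1); convexity of a smooth function
# implies positive discrete second differences on a uniform grid.
y = np.linspace(-0.95, 0.95, 381)
g = y * np.arctanh(y)
second_diff = np.diff(g, n=2)
is_convex = bool(np.all(second_diff > 0))
```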

V. DISCUSSION
We have developed a full classical (i.e., non-quantum) theory of channel discord, which establishes the meaning of nonzero discord in the context of stochastic information theory. Although classical discord and stochastic information theory were introduced in 2015 [8], key notions were sketched rather than fully developed. Here we have given a detailed theory of non-quantum discord: we set up a three-party protocol with noisy measurement, built in Hadamard notation to make expressions clear and elegant, and established an unprecedented connection between average channel discord, average channel distortion and the entropy of the noisy measurement (noisy channel), including monotonic relations between them. These relations show that channel discord, in a classical setting, is a form of channel distortion arising from one party's noisy measurement.
Scatter plots of average discord vs average distortion in Figs. 1 and 2 show this monotonic relation between average discord and average distortion and, through the heat maps, also the monotonicity between average distortion and channel entropy. These results are purely numerical but exhibit a simple quadratic relationship for two choices of message size. A general mathematical relation connecting average discord to average distortion is beyond the scope of this work; our analysis has focused on developing the protocol, making the problem clear, defining appropriate quantities and tackling them numerically. Mathematically proving the general case is challenging, so we instead solve the special two-bit case, i.e., the case that each of Alice and Bob holds one bit and Bob's measurement is noisy, both to illustrate how analytical results can be obtained and to lend support to our numerically based conjectures; there we prove that channel discord is a monotonic function of channel entropy. Thus, the numerical results are backed up by a closed-form analysis, and, furthermore, this analysis points the way to general proofs, likely using the Hadamard calculus elaborated in Appendix A.
The plots in §IV display a high level of scatter for medium-entropy cases and much less scatter for low- and high-entropy cases. Although the spread is large, monotonicity and quadratic scaling are clearly evident in these plots, and the root-mean-square error of each fit, hovering around 0.1, is testament to the quality of the quadratic fit and hence to the inference of monotonicity.
Our analysis has focused only on noise represented by doubly stochastic maps, which correspond to permutahedrons. In other words, we have concentrated on noise that would arise from random permutations of classical message measurements, i.e., some messages being incorrectly read as other messages. In the spirit of quantum discord, our average-discord and average-distortion calculations are built on minimizing over all such permutations. Future work should involve generalization from doubly stochastic to stochastic maps; in the quantum context, this generalization would be akin to extending from completely positive trace-preserving maps to completely positive maps.

VI. CONCLUSIONS
Discord has emerged as one of the most significant quantum resources [7], but not without controversy [8,21]. Separating quantum and non-quantum aspects of discord is vital for determining when discord is a genuine quantum resource and when it is not. The connection between state discord and entanglement is known, but discord for channels has been unexplored under the treatment of noisy measurement as manifested by a noisy channel. Here we establish and elucidate connections between classical channel discord, channel distortion and entropy. To this end, we have developed a protocol, a mathematical framework and a numerical analysis of average channel discord, with averaging over random shared message states (for random initial joint distributions and, to check consistency, over conditionally pure states) and over random doubly stochastic channels representing noisy measurement by one party. Note that we have defined classical channel discord to quantify channel 'fluctuations' (i.e., 'noise'), in contrast to earlier work on classical discord, which quantifies fluctuations for a state [8]. Our notion of classical channel discord then leads to our numerical demonstration of monotonicity between classical (channel) discord and channel distortion. Thus, our results show numerically that average discord, in the non-quantum setting, is equivalent to average distortion of a channel, with channel distortion based on total-variation distance. Furthermore, we show numerically that this distortion measure is monotonic in channel entropy, which builds confidence that total-variation distance is a reasonable way to quantify distortion.
Akin to quantum discord, we have incorporated minimization of average discord and average distortion over all permutations, with permutations here referring to permuting messages. The identity permutation corresponds to reading each message correctly, and other permutations cause some messages to be read as other messages. Given that the noisy measurement is modelled as a doubly stochastic channel, which is a permutahedron, the idea of minimizing over all permutations is to identify which permutation minimizes channel discord and channel distortion averaged over all states. This minimization is key to connecting our notion of classical distortion to the quantum version.
We have created a full framework for studying the connection between average discord and average distortion for a noisy channel and have shown numerically a monotonic relation between the two. This monotonic relation is satisfying, as we can now regard discord, in the classical setting, as an alternative measure of channel distortion, manifested as noisy readout by one party. We augment our numerical analysis of channel discord vs channel distortion by studying the two-bit example analytically and providing results for discord corresponding to certain special but important two-bit states. Analytic methods are challenging, even in the two-bit case, but our analytical study shows a path forward for general analytical work, which benefits from the Hadamard calculus we discuss in Appendix A.
For a (A1) restricted by 0 < a ı 1 ...ı t ≤ 1 (A5), and introducing J := (1) as the tensor of equal size to a with every entry being 1 (in contrast to I, the matrix whose diagonal entries are 1 and whose off-diagonal entries are all 0), the element-wise logarithm is log ∘ a = −∑_{ℓ=1}^∞ (J − a)^∘ℓ /ℓ. Here we use the notation •^∘ℓ to refer to the ℓ-fold element-wise product of the tensor • with itself. For constructing conditional states, we employ Hadamard division [23]. Hadamard division of two same-dimensional tensors (including vectors and matrices) is simply their element-wise division. Another definition of Hadamard division applies for a matrix divided by a vector whose length equals the number of rows (or columns) of the matrix; in this case Hadamard division of the matrix by the vector corresponds to division of the row (or column) vectors of the matrix by the elements of the vector.
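Both senses of Hadamard division map directly onto NumPy's element-wise division and broadcasting; a small sketch with arbitrary array values:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 2.0])

elementwise = A / A          # same-dimensional Hadamard division
by_rows = A / v[:, None]     # divide row i of A by v[i]
by_cols = A / v[None, :]     # divide column j of A by v[j]
```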
with the sequence of σ drawn from the permutation group in reverse lexicographical order.
A specific message of interest is a versor, which is a vector whose entries are all zero except one element whose entry is one [27]. Any permutation of a versor corresponds to just replacing a given message by a new message, so a permutation of a versor is just another versor. A versor is equivalent to a stochastic-information state p m = δ m m̂ for the given message. We write this versor, corresponding to specific message m̂, as δ m̂ . Versors form a basis for stochastic-information states, which we call the message basis. A permutation matrix (B3) is actually a tensor product of a versor and a coversor, with a coversor defined to be the covector version of a versor.
The Cartesian product of versors forms a basis for bipartite stochastic-information states, from which a joint distribution can be constructed. Suppose the two parties, Alice and Bob, each hold an information state δ A m and δ B m′ , respectively, where we use superscripts A and B to denote who owns which of the vector spaces in the tensor product. Alice's and Bob's joint state is then the bipartite versor δ A m ⊗ δ B m′ .
Discordant states can encode information that is only accessible by coherent quantum interactions [39]. A flexible two-photon setup has realized a three-qubit system with programmable degrees of initial correlations, measurement interaction and characterization processes, thereby demonstrating local observation in an activation protocol for converting discord into distillable entanglement [40]. A trapped-ion experiment has shown that quantum-discord inference of open-system dynamics detects system-environment quantum correlations without accessing the environment [41].
In contrast to stochastic-information states, which are probability distributions (B2), or joint distributions for the bipartite case (1), the quantum state is a positive trace-class operator ρ on Hilbert space H , or on the tensor product H ⊗ H for the bipartite case [9]. The quantum state's entropy is H(ρ) = − tr (ρ log ρ), with tr the trace operation. In quantum information theory, measurement is described by positive operator-valued measures [9], but, for quantum discord, only projection-valued measures {P } [9], which are sets of self-adjoint projections on H , are considered. Each projection-valued measure P comprises a set of projection operators P ı with P ı P ȷ = P ı δ ıȷ . Measurement of a state yields a real-valued outcome, and the state is subsequently described by the j-th projection P j of ρ corresponding to that outcome. This projection is expressed as a conjugation of ρ in the literature [2], but this way of expressing it is superfluous for our purposes and hence not employed.
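The entropy H(ρ) = −tr(ρ log ρ) is computed from the eigenvalues of ρ; a minimal sketch (base-2 logarithm assumed, and `vn_entropy` is an illustrative name):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy H(rho) = -tr(rho log rho) via eigenvalues,
    using the base-2 logarithm and the convention 0 log 0 = 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero eigenvalues (0 log 0 = 0)
    return float(-(evals * np.log2(evals)).sum())

maximally_mixed = np.eye(2) / 2   # one bit of entropy
pure = np.diag([1.0, 0.0])        # zero entropy
```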
The conditional quantum state, first defined by Cerf and Adami [42], is now typically defined as [2] ρ A|B j := (1 ⊗ P j ) ρ AB (1 ⊗ P j )/p j , p j := tr (1 ⊗ P j ) ρ AB (1 ⊗ P j ) (C1) with p j the probability of Bob obtaining the j-th outcome after he has applied the projection-valued measure (PVM) P . The conditional quantum entropy [42] is H A|B (ρ AB ) := ∑ j p j H(ρ A|B j ), which is a probability-weighted average of conditional entropy. Mutual information I A;B (ρ AB ) is the same as for Eq. (9) except that H now corresponds to the quantum entropy, and the last term is replaced according to H A;B → H(ρ AB ). The alternative mutual information replaces the conditional entropy by its infimum over Bob's measurements; i.e., it is the supremum over all Bob's PVMs of the resulting mutual information. Quantum discord, analogous to classical discord (11), which was actually defined later than quantum discord [8], is the difference between these two mutual-information quantities. Quantum discord is interpreted as the correlations that remain after classical correlations are subtracted from total correlations, is recognised to quantify the non-classical correlations in a quantum system, including entanglement, and is therefore identified as a quantum resource [30]. The primary feature encapsulated by its quantum property is how a state is affected by local measurements; discord is seen as a form of classical correlation aided by quantum coherence (superposition) at the level of individual subsystems [21,28,43]. The presence of discord in quantum computing protocols [5,6] motivates the assertion that discord is a quantum resource, operationalized by state merging [4].