Super-additivity in communication of classical information through quantum channels from a quantum parameter estimation perspective

We point out the contrasting role that entanglement plays in communication and estimation scenarios. In the former it brings noticeable benefits at the measurement stage (output super-additivity), whereas in the latter it is the entanglement of the input probes that enables a significant performance enhancement (input super-additivity). We identify a weak estimation regime in which a strong connection is demonstrated between concepts central to the two fields: the accessible information and the Holevo quantity on one side, and quantities related to the quantum Fisher information on the other. This allows us to shed new light on the problem of super-additivity in communication using the concepts of quantum estimation theory.


Introduction
The greatest achievement of classical communication theory is the realization that, when a noisy communication channel can be used many times, one can encode, transmit and decode a message in an error-free way at a non-zero asymptotic rate, referred to as the capacity of the channel. A classical channel is described by a conditional probability distribution relating input and output symbols, from which the capacity of the channel can be directly calculated [1,2]. In a quantum setting [3,4], two additional elements that affect the amount of classical information that can be transmitted through the channel need to be considered. The first is the family of quantum states used to carry the encoded information, and the second is the measurement that provides the read-out. Only once both are specified can the corresponding conditional probability be evaluated and the capacity be calculated using the classical formula.
There is more to it, however. In a quantum scenario the states entering the inputs of different uses of a channel may be entangled. This may in principle lead to an advantage in communication capacity compared with a strategy where only separable states are allowed, see figure 1. This potential gain due to entanglement of the input states is referred to as super-additivity of the quantum channel capacity, and its actual existence has been the subject of a long and heated debate in the quantum communication community [5-12]. In this paper we will refer to this concept as input super-additivity. Moreover, even if we do not employ entangled states at the input, we are still left with the possibility of performing collective measurements at the output, i.e. measuring the states arriving at different channel outputs coherently. This may, and in many cases indeed does, provide a benefit in the form of increased capacity, and we will refer to this effect as output super-additivity [13-16].
Quantum parameter estimation theory was largely developed before the quantum communication field reached maturity. The two fields share a common element: both seek the measurement that is optimal for extracting classical information encoded in quantum states. In a communication problem the quantum channel is given, but the character of the information and the way it is encoded in the quantum states are arbitrary, and are ideally chosen so as to maximize the final information transfer. In an estimation problem, on the other hand, the information is encoded in quantum states in a particular way, either directly or via the action of a parameter-dependent channel, see figure 1. In this sense the estimation problem may formally be regarded as a restricted communication problem, even though the figures of merit traditionally used in the estimation and communication approaches are usually different [14]. A central problem of both quantum communication and quantum estimation theory is to identify the potential benefits of exploiting entanglement in the input states as well as at the measurement stage of the protocols. Interestingly, unlike in the communication problem, there is a great number of examples demonstrating significant gains from the use of entangled input probes in quantum estimation protocols, with applications ranging from optical and atomic interferometry, via magnetometry, to spectroscopy and the stabilization of atomic clocks [17-23]. At the same time, in a typical estimation protocol utilizing unentangled input probes, collective measurements are irrelevant. Thus a contrasting picture emerges: for the capacities of quantum channels the gains from utilizing entanglement arise at the measurement stage, whereas in estimation scenarios it is entanglement at the input that makes a difference.

Figure 1. General schemes illustrating the role of entangled inputs (input super-additivity) and collective measurements (output super-additivity) in communication (a) and estimation (b) protocols. Performance is quantified by the channel capacity $C$ for communication and by the quantum Fisher information $F_Q$ for estimation problems; the superscripts indicate whether entanglement is utilized ($\infty$) or not ($1$) at the input and output stages, respectively. While in communication scenarios super-additivity appears naturally at the measurement stage and the practical role of input super-additivity is debatable, in parameter estimation it is entanglement at the input that offers a significant precision enhancement. Note that an estimation scenario can be regarded as a special case of a communication task in which the parameter encoding is fixed by the parameter dependence of the channel $\Lambda_x$.
The goal of this paper is to better understand the connections between the two fields from the point of view of the super-additivity issue, understood here as the general question of the utility of entanglement in estimation and communication protocols. We show that in the weak estimation regime, where the amount of information extractable about the parameter is very small compared to the prior information, i.e. the estimation error is large compared with the variance of the prior distribution, the accessible information as well as the Holevo quantity can be expressed using Fisher-information-like concepts. This allows us to discuss the utility of entanglement in communication using the properties of quantities well understood within the field of quantum estimation. We also point out that in a communication problem encoding a large number of independent parameters is

Communication
The main goal of classical communication theory is to understand the limits of sending reliable information through noisy channels. For this purpose the sender needs to encode the message appropriately and the receiver needs to decode it in such a way that the message is not corrupted by the noise of the channel. Mathematically, a classical communication channel is modeled by a probabilistic map connecting input ($X$) and output ($Y$) random variables via a conditional probability distribution $p(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. According to the Channel Coding Theorem [1,2], the maximal number of bits that can be correctly transmitted per channel use, referred to as the capacity $C$ of the channel, reads:
$$ C = \max_{p(x)} I(X:Y) = \max_{p(x)} \left[ H(Y) - H(Y|X) \right], \tag{1} $$
where $H(Y) = -\sum_y p(y)\log p(y)$, with $p(y) = \sum_x p(y|x)p(x)$, is the Shannon entropy of the output and $H(Y|X) = -\sum_{x,y} p(y|x)p(x)\log p(y|x)$ is the Shannon conditional entropy. In this paper all logarithms are in base 2. It is important to stress that even though the symbols sent through different independent channels may be correlated, the formula for the capacity is given in terms of the relation between the input and output variables of a single channel. This automatically implies that the capacity $C_N$ of a channel constructed by grouping $N$ individual independent channels into a single entity fulfills the additivity property $C_N = NC$.
Quantum communication [4,26] is concerned with sending messages encoded in quantum states through quantum channels, see figure 1(a). In this paper we only consider the problem of sending classical messages through quantum channels, and ignore the problem of faithfully transmitting quantum states themselves, as only this aspect of communication can be expected to have some relation with the estimation problem, which ultimately deals with the extraction of a classical parameter encoded in quantum states. Mathematically, a quantum channel is a completely positive trace preserving (CPTP) map $\Lambda$ [3] acting on quantum states represented as density operators.
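Formula (1) is a concave maximization over the input distribution $p(x)$. The text does not say how to evaluate it; a standard numerical route, swapped in here purely for illustration, is the Blahut-Arimoto iteration. The sketch below assumes a finite channel given as a row-stochastic table `W[x][y] = p(y|x)`; all function names are hypothetical. It also illustrates the additivity $C_N = NC$ for two independent uses of a binary symmetric channel.

```python
import math


def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Capacity in bits per use of a classical channel W[x][y] = p(y|x).

    Maximizes I(X:Y) over p(x), as in formula (1), with the Blahut-Arimoto
    iteration. Returns the capacity estimate and the optimizing p(x).
    """
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution p(y) = sum_x p(x) p(y|x).
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # D[x] = D(p(.|x) || p(.)) in nats.
        D = [sum(W[x][y] * math.log(W[x][y] / q[y])
                 for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        weights = [p[x] * math.exp(D[x]) for x in range(nx)]
        Z = sum(weights)
        lower, upper = math.log(Z), max(D)  # lower <= C <= upper (nats)
        p = [w / Z for w in weights]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


def product_channel(W1, W2):
    """Two independent uses: p(y1, y2 | x1, x2) = p(y1|x1) p(y2|x2)."""
    return [[a * b for a in row1 for b in row2] for row1 in W1 for row2 in W2]


def binary_entropy(e):
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)


if __name__ == "__main__":
    eps = 0.1
    bsc = [[1 - eps, eps], [eps, 1 - eps]]
    C1, _ = blahut_arimoto(bsc)
    C2, _ = blahut_arimoto(product_channel(bsc, bsc))
    print(C1, 1 - binary_entropy(eps), C2 / 2)
```

For the binary symmetric channel the result should coincide with the textbook value $1 - h(\epsilon)$, and the two-use channel should give exactly twice that, as the additivity argument predicts.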
The communication performance of the channel crucially depends on the states $\{\rho^{\rm in}_x\}_{x\in\mathcal{X}}$ in which we encode the input symbols $x$, as well as on the operators $\{\Pi_y\}_{y\in\mathcal{Y}}$ representing the final measurement. To keep full generality one typically allows for general positive operator valued measures (POVMs) [3], so that the only conditions on the measurement operators are $\Pi_y \geq 0$ and $\sum_{y\in\mathcal{Y}}\Pi_y = \mathbb{1}$. With the family of input states as well as the measurement operators fixed, the conditional probability distribution relating input and output symbols reads $p(y|x) = \mathrm{Tr}(\rho_x\Pi_y)$, where $\rho_x = \Lambda(\rho^{\rm in}_x)$ are the input states after they have been transmitted through the channel $\Lambda$. Using now the classical formula for the channel capacity, (1), we get the corresponding formula for the capacity of a quantum channel:
$$ C^{(1,1)} = \max_{p(x),\,\{\rho^{\rm in}_x\},\,\{\Pi_y\}} I(X:Y), \qquad p(y|x) = \mathrm{Tr}(\rho_x\Pi_y), \tag{2} $$
where the superscript $(1,1)$ indicates that entanglement is involved neither at the input nor at the output stage of the protocol. Unlike in the classical scenario, the additivity of the capacity of a quantum channel is far from obvious. We may both perform collective measurements involving multiple output states and send states that are entangled across different channel inputs. This makes the classical additivity arguments invalid in general, as the full conditional probability relating the input and output symbols of multiple channels no longer factorizes into single-channel quantities. Indeed, when collective measurements are allowed, the capacity is in general larger than the one given in (2) [13-16,27] and is expressed via the so-called Holevo quantity $\chi$:
$$ C^{(1,\infty)} = \max_{p(x),\,\{\rho^{\rm in}_x\}} \chi, \tag{3} $$
where
$$ \chi = S\Big(\sum_x p(x)\rho_x\Big) - \sum_x p(x)S(\rho_x), \tag{4} $$
with $S(\rho) = -\mathrm{Tr}(\rho\log\rho)$ being the von Neumann entropy. The replacement of 1 with $\infty$ in the right superscript represents the possibility of collectively measuring the outputs of an arbitrary number of channels. Apart from covering a more general scenario, the above formula also has a clear advantage over (2), as it no longer requires optimization over measurements.
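The gap between the Holevo quantity $\chi = S(\sum_x p(x)\rho_x) - \sum_x p(x)S(\rho_x)$ and the mutual information of a fixed single-copy measurement can be seen on the smallest possible example. The sketch below assumes two equiprobable pure qubit states $\cos\theta|0\rangle \pm \sin\theta|1\rangle$ and, as one natural choice, a projective measurement in the $|\pm\rangle$ basis; the angle, the measurement and all function names are illustrative assumptions. Since the two states do not commute, Holevo's bound is strict, so no single-copy measurement can reach $\chi$ for this ensemble.

```python
import numpy as np


def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues (0 log 0 = 0)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-15]
    return float(-np.sum(evals * np.log2(evals)))


def holevo_quantity(probs, states):
    """chi = S(sum_x p(x) rho_x) - sum_x p(x) S(rho_x)."""
    rho_bar = sum(p * r for p, r in zip(probs, states))
    return von_neumann_entropy(rho_bar) - sum(
        p * von_neumann_entropy(r) for p, r in zip(probs, states))


def mutual_information(probs, states, povm):
    """I(X:Y) in bits for p(y|x) = Tr(rho_x Pi_y)."""
    pyx = np.array([[np.real(np.trace(r @ P)) for P in povm] for r in states])
    px = np.asarray(probs)
    py = px @ pyx
    I = 0.0
    for x in range(len(px)):
        for y in range(len(py)):
            if pyx[x, y] > 1e-15:
                I += px[x] * pyx[x, y] * np.log2(pyx[x, y] / py[y])
    return float(I)


def ket_to_dm(v):
    v = np.asarray(v, dtype=complex).reshape(-1, 1)
    return v @ v.conj().T


theta = np.pi / 8  # assumed half-angle between the two signal states
states = [ket_to_dm([np.cos(theta), np.sin(theta)]),
          ket_to_dm([np.cos(theta), -np.sin(theta)])]
probs = [0.5, 0.5]
s = 1 / np.sqrt(2)
povm = [ket_to_dm([s, s]), ket_to_dm([s, -s])]  # projectors onto |+>, |->

chi = holevo_quantity(probs, states)
I = mutual_information(probs, states, povm)
print(f"chi = {chi:.4f} bits, I(X:Y) = {I:.4f} bits")
```

With these assumptions $\chi \approx 0.60$ bits while the chosen measurement yields $I(X:Y) \approx 0.40$ bits.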
When the input states are additionally allowed to be entangled, one can also formally write a formula for the capacity using the regularization of the Holevo quantity [5],
$$ C^{(\infty,\infty)} = \lim_{N\to\infty}\frac{1}{N}C^{(1,\infty)}\left(\Lambda^{\otimes N}\right), \tag{5} $$
which is, however, infeasible to deal with, since it requires considering entangled quantum states of an arbitrarily large number of subsystems. For commonly encountered quantum channels it is proven, or at least strongly expected based on numerical investigations, that $C^{(\infty,\infty)} = C^{(1,\infty)}$ [28-30]. The overall picture is more complicated, however, due to the example of Hastings [10], who gave a construction of two channels for which the Holevo quantity is demonstrated to be strictly super-additive. The construction is probabilistic and deals with channels of potentially very high dimension, and as a result it is hard to assess the quantitative impact of the super-additivity thus demonstrated on practical communication scenarios. To the best of the authors' knowledge, up till now there has been no explicit example of a low-dimensional channel relevant for communication purposes for which input super-additivity has been demonstrated. Therefore, in this paper we will write $C^{(\infty,\infty)} \approx C^{(1,\infty)}$, which is meant to represent the fact that, while the Holevo quantity does have input super-additive properties, up till now they have not been demonstrated to be relevant in practical communication scenarios.
To summarize this section, we may therefore write the following relation:
$$ C^{(1,1)} \leq C^{(1,\infty)} \approx C^{(\infty,\infty)}, \tag{6} $$
pointing to the presence of output super-additivity (the first inequality being strict for many channels) and the practical lack of input super-additivity when thinking of the classical capacity of quantum channels.

Estimation
Classical estimation theory provides methods to optimally estimate the value of a parameter $x$ based on observations $y$ that are known to be distributed according to a probability distribution $p(y|x)$, which represents the probabilistic model for the problem considered. For this purpose one looks for the optimal estimator function $\tilde{x}(y)$ that minimizes the deviation of the estimated parameter from its true value. Identifying the optimal estimator is non-trivial, and its form in general critically depends on the prior knowledge available. Nevertheless, assuming the estimator is unbiased, $\langle\tilde{x}\rangle = \sum_y p(y|x)\tilde{x}(y) = x$, so that on average it returns the true value, the Cramér-Rao (CR) inequality [31] provides a lower bound on the estimator variance:
$$ \Delta^2\tilde{x} \geq \frac{1}{F}, \qquad F = \sum_y\frac{\dot{p}(y|x)^2}{p(y|x)}, \tag{7} $$
where $\Delta^2\tilde{x} = \sum_y p(y|x)(\tilde{x}(y) - x)^2$, the dot denotes differentiation with respect to $x$, and $F$ is the Fisher information (FI). If one identifies an estimator saturating the above bound, one is sure to have found the optimal one. Even though saturation of the bound is possible only for a very limited class of probability distributions, the so-called exponential family [31,32], the situation is much clearer in the asymptotic regime where one registers many observations $y_i$, $i\in\{1,\dots,N\}$, independently and identically distributed according to $p(y_i|x)$. In this case the joint probability distribution $p(y_1,\dots,y_N|x) = \prod_{i=1}^N p(y_i|x)$ is a product and hence, thanks to the additivity of the FI on independent probability distributions, the corresponding FI for $p(y_1,\dots,y_N|x)$ equals $F_N = NF$. As a result, the estimation variance based on $N$ observations is bounded according to (7) as:
$$ \Delta^2\tilde{x} \geq \frac{1}{NF}. \tag{8} $$
Most importantly, the above bound is saturable in the asymptotic limit $N\to\infty$, and the optimal estimator is the maximum-likelihood estimator [31,32].
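A concrete instance of (7) and (8), assuming a hypothetical Bernoulli model $p(1|x) = x$, $p(0|x) = 1 - x$, whose FI is $1/[x(1-x)]$. The Bernoulli model belongs to the exponential family, so the sample mean $k/N$ is unbiased and attains (8) exactly for every $N$, not only asymptotically; the sketch computes its variance exactly by summing over the binomial distribution of $k$. Names are illustrative.

```python
import math


def fisher_information(p, dp):
    """F = sum_y dp(y|x)^2 / p(y|x), as in (7); outcomes with p = 0 are skipped."""
    return sum(d * d / q for q, d in zip(p, dp) if q > 0)


def sample_mean_variance(x, N):
    """Exact variance sum_k P(k) (k/N - x)^2 of the estimator k/N,
    where k ~ Binomial(N, x) counts successes in N Bernoulli(x) trials."""
    total = 0.0
    for k in range(N + 1):
        prob = math.comb(N, k) * x**k * (1 - x) ** (N - k)
        total += prob * (k / N - x) ** 2
    return total


x, N = 0.3, 50
# Single observation: p(0|x) = 1 - x, p(1|x) = x, so dp/dx = (-1, +1).
F = fisher_information([1 - x, x], [-1.0, 1.0])
print(F, 1 / (N * F), sample_mean_variance(x, N))
```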
The saturability of the CR bound for large $N$ is intimately related to the local asymptotic normality theorem [33], which shows that, in the limit of large $N$ and after a suitable reparametrization, the probability distribution $\prod_{i=1}^N p(y_i|x)$ can be viewed as a Gaussian distribution with mean shifted by $\sqrt{N}x$ and variance equal to $1/F$. Since a Gaussian distribution with its mean as the parameter to be estimated is a member of the exponential family, for which the CR bound is saturated, this proves the claim.
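In a typical quantum estimation problem we are given a family of states $\{\rho_x\}$, with the task of learning the parameter $x$. Apart from finding the optimal estimator $\tilde{x}(y)$, we also need to find the optimal measurement $\{\Pi_y\}$, which yields the actual conditional probability distribution of the observed results, $p(y|x) = \mathrm{Tr}(\rho_x\Pi_y)$. The quantum generalization of the CR inequality yields a lower bound on the achievable variance irrespective of the measurement applied [34]:
$$ \Delta^2\tilde{x} \geq \frac{1}{F_Q(\rho_x)}, \qquad F_Q(\rho_x) = \mathrm{Tr}\left(\rho_x L_x^2\right), \tag{9} $$
where $F_Q$ is the quantum Fisher information (QFI) and $L_x$ is the symmetric logarithmic derivative (SLD) operator, implicitly defined by:
$$ \dot{\rho}_x = \frac{1}{2}\left(\rho_x L_x + L_x\rho_x\right). \tag{10} $$
When written explicitly in terms of the eigenvalues and eigenvectors of $\rho_x = \sum_{n=1}^d p_n|n\rangle\langle n|$, the QFI reads:
$$ F_Q(\rho_x) = \sum_{n:\,p_n>0}\frac{\dot{p}_n^2}{p_n} + 2\sum_{n\neq m:\,p_n+p_m>0}\frac{(p_n - p_m)^2}{p_n + p_m}|\langle m|\dot{n}\rangle|^2, \tag{11} $$
where both the eigenvalues and the eigenvectors in general depend on $x$. In the case of pure state estimation, $\rho_x = |\psi_x\rangle\langle\psi_x|$, the above formula simplifies to
$$ F_Q(|\psi_x\rangle) = 4\left(\langle\dot{\psi}_x|\dot{\psi}_x\rangle - |\langle\psi_x|\dot{\psi}_x\rangle|^2\right). \tag{12} $$
The QFI is additive on product states, so that $F_Q(\rho_x^{\otimes N}) = NF_Q(\rho_x)$. Hence, given $N$ copies of the state we get:
$$ \Delta^2\tilde{x} \geq \frac{1}{NF_Q(\rho_x)}. \tag{13} $$
Moreover, when the measurement is chosen to be a projective measurement in the SLD eigenbasis, the corresponding FI equals the QFI. Therefore, applying the arguments from the classical case, the above quantum CR inequality is also asymptotically saturable in the limit of large $N$.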
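The eigen-decomposition formula for the QFI, the SLD defined by $\dot{\rho}_x = \frac{1}{2}(\rho_x L_x + L_x\rho_x)$, and the pure-state formula $4(\langle\dot\psi|\dot\psi\rangle - |\langle\psi|\dot\psi\rangle|^2)$ can be cross-checked numerically. The sketch below assumes a hypothetical qubit family whose Bloch vector of length $r$ rotates in the $xy$-plane, for which $F_Q = r^2$; derivatives are taken by central finite differences. It uses the equivalent matrix-element form $F_Q = 2\sum_{n,m}|\langle n|\dot{\rho}_x|m\rangle|^2/(p_n + p_m)$, which reproduces both terms of the eigen-decomposition formula.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rho(x, r):
    """Hypothetical qubit family: Bloch vector of length r rotated by x in the xy-plane."""
    return 0.5 * (I2 + r * (np.cos(x) * SX + np.sin(x) * SY))


def psi(x):
    """Pure-state member of the same family (r = 1): (|0> + e^{ix}|1>) / sqrt(2)."""
    return np.array([1.0, np.exp(1j * x)]) / np.sqrt(2)


def qfi(rho_fn, x, h=1e-5, eps=1e-12):
    """QFI from the eigen-decomposition of rho_x, plus the SLD.

    F_Q = 2 sum_{n,m} |<n|drho|m>|^2 / (p_n + p_m) over pairs with p_n + p_m > 0;
    the SLD has eigenbasis matrix elements L_nm = 2 <n|drho|m> / (p_n + p_m).
    """
    p, V = np.linalg.eigh(rho_fn(x))
    drho = (rho_fn(x + h) - rho_fn(x - h)) / (2 * h)  # central difference
    D = V.conj().T @ drho @ V                          # <n|drho|m>
    S = p[:, None] + p[None, :]
    mask = S > eps                                     # skip kernel-kernel pairs
    L_eig = np.zeros_like(D)
    L_eig[mask] = 2 * D[mask] / S[mask]
    F = float(np.sum(2 * np.abs(D[mask]) ** 2 / S[mask]))
    return F, V @ L_eig @ V.conj().T


def qfi_pure(psi_fn, x, h=1e-5):
    """Pure-state formula: 4 (<dpsi|dpsi> - |<psi|dpsi>|^2)."""
    v = psi_fn(x)
    dv = (psi_fn(x + h) - psi_fn(x - h)) / (2 * h)
    return float(4 * (np.vdot(dv, dv).real - abs(np.vdot(v, dv)) ** 2))


F, L = qfi(lambda t: rho(t, 0.6), 0.3)
print(F, np.trace(rho(0.3, 0.6) @ L @ L).real)  # both should be close to r^2 = 0.36
print(qfi(lambda t: rho(t, 1.0), 0.3)[0], qfi_pure(psi, 0.3))
```

The second printed pair compares the general formula, applied to the rank-one member of the family, with the pure-state formula; both should give 1.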
If the family of states is not given, but the parameter to be estimated is instead encoded in the action of a channel $\Lambda_x$, we are additionally challenged to find the optimal probe state $\rho^{\rm in}$ that allows the parameter $x$ to be estimated with the smallest possible uncertainty by measuring the output state $\rho_x = \Lambda_x(\rho^{\rm in})$. When the probe state is sent into the inputs of $N$ copies of the channel $\Lambda_x$, one can again ask whether entangled input states and collective measurements offer any advantage compared to uncorrelated strategies. The answer to this question is the key to understanding the benefits of quantum-enhanced estimation scenarios, see figure 1(b). If only product input states are allowed, the maximal QFI per channel use reads:
$$ F_Q^{(1,\infty)} = F_Q^{(1,1)} = \max_{\rho^{\rm in}}F_Q\left[\Lambda_x(\rho^{\rm in})\right], \tag{14} $$
where the equality $F_Q^{(1,\infty)} = F_Q^{(1,1)}$ arises thanks to the additivity of the QFI on product states and the fact that there always exists a local measurement for which the FI equals the corresponding QFI [35]. Therefore, unlike in the communication case, there is no benefit from applying collective measurements when product input states are used. However, when inspecting the QFI for entangled input probes, there is a plethora of examples where entanglement at the input increases the resulting QFI. In particular, for unitary parameter estimation the QFI may increase at a rate proportional to $N^2$ rather than linearly in $N$ [18-20], and even though decoherence typically reduces the asymptotic scaling back to a linear one, the entanglement-enhanced benefit remains in terms of a larger multiplicative constant [36-38]. Hence, in general the QFI is input super-additive and we can write:
$$ F_Q^{(1,1)} = F_Q^{(1,\infty)} \leq F_Q^{(\infty,\infty)}, \tag{15} $$
where $F_Q^{(\infty,\infty)}$ denotes the QFI optimized over all entangled states at the input, the inequality being strict in many cases of interest. This contrasts with the analogous relation for communication capacities given in (6). It is important to keep in mind, however, that for entangled input states the optimal detection strategy may in some instances be collective [39].
Note that we adopt the convention in which $F_Q^{(\infty,\infty)}$ denotes the QFI per channel use, to make it more similar to the capacity concept introduced before. The problem of finding $F_Q^{(\infty,\infty)}$ has been addressed in [36,37,40,41].
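The $N$ versus $N^2$ behaviour mentioned above can be reproduced in a few lines for noiseless unitary encoding. The sketch assumes the encoding $e^{-ixG}$ with the hypothetical collective generator $G = \frac{1}{2}\sum_i\sigma_z^{(i)}$ and pure probes; for such states $|\dot\psi_x\rangle = -iG|\psi_x\rangle$, so the pure-state QFI reduces to $4\,\mathrm{Var}(G)$. It compares the product probe $|+\rangle^{\otimes N}$ with the GHZ state $(|0\dots0\rangle + |1\dots1\rangle)/\sqrt{2}$.

```python
import math


def generator_eigenvalues(N):
    """Eigenvalues of G = (1/2) sum_i sigma_z^(i) on the 2^N computational basis states."""
    return [(N - 2 * bin(b).count("1")) / 2 for b in range(2 ** N)]


def qfi_unitary_pure(amplitudes, g):
    """F_Q = 4 Var(G) for a pure state evolving as exp(-i x G)|psi>; G is diagonal."""
    probs = [abs(a) ** 2 for a in amplitudes]
    mean = sum(p * gi for p, gi in zip(probs, g))
    return 4 * sum(p * (gi - mean) ** 2 for p, gi in zip(probs, g))


def product_plus_state(N):
    """|+>^N: equal amplitudes on all basis states."""
    return [2 ** (-N / 2)] * (2 ** N)


def ghz_state(N):
    amps = [0.0] * (2 ** N)
    amps[0] = amps[-1] = 1 / math.sqrt(2)
    return amps


for N in range(1, 7):
    g = generator_eigenvalues(N)
    F_prod = qfi_unitary_pure(product_plus_state(N), g)
    F_ghz = qfi_unitary_pure(ghz_state(N), g)
    print(N, F_prod, F_ghz, F_ghz / N)  # last column: GHZ QFI per channel use
```

Per channel use the product probe gives a constant QFI while the GHZ probe gives one growing linearly in $N$, which is the input super-additivity expressed by (15) in this idealized, decoherence-free setting.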
Up till now we have based our whole discussion of the quantum estimation problem on the analysis of the QFI. When the communication and estimation approaches are to be related, however, it is more natural to adopt the Bayesian perspective on estimation, as the prior distributions of the parameters to be estimated naturally translate into input symbol probability distributions in a communication problem. Taking the quadratic cost as the figure of merit, the optimal Bayesian estimation of a parameter $x$, given the family of states $\rho_x$ distributed according to the prior $p(x)$, is the one that minimizes the average variance:
$$ \overline{\Delta^2\tilde{x}} = \int dx\,p(x)\sum_y\mathrm{Tr}(\rho_x\Pi_y)\left(\tilde{x}(y) - x\right)^2 \tag{16} $$
over the choice of measurement operators $\{\Pi_y\}$ and estimators $\tilde{x}(y)$. A general solution to this problem is known, and the minimal achievable variance equals [42,43]:
$$ \overline{\Delta^2\tilde{x}}_{\min} = \Delta_0^2 - \mathrm{Tr}\left(\bar{\rho}L^2\right), \tag{17} $$
where $\bar{x} = \int dx\,p(x)x$ is the mean and $\Delta_0^2 = \int dx\,p(x)(x - \bar{x})^2$ the variance of the prior, whereas $L$ is implicitly defined via the relation:
$$ \bar{\rho}' = \frac{1}{2}\left(\bar{\rho}L + L\bar{\rho}\right), \tag{18} $$
with $\bar{\rho}$, $\bar{\rho}'$ defined as
$$ \bar{\rho} = \int dx\,p(x)\rho_x, \qquad \bar{\rho}' = \int dx\,p(x)(x - \bar{x})\rho_x. \tag{19} $$
The apparent similarity of (18) to the SLD formula (10) becomes even stronger when the prior $p(x)$ is assumed to be Gaussian, in which case $\bar{\rho}' = \Delta_0^2\,\partial\bar{\rho}/\partial\bar{x}$ and (17) becomes:
$$ \overline{\Delta^2\tilde{x}}_{\min} = \Delta_0^2\left[1 - \Delta_0^2F_Q(\bar{\rho})\right], \tag{20} $$
with $F_Q(\bar{\rho})$ being the QFI for the problem of estimating the prior mean $\bar{x}$ given the averaged state $\bar{\rho}$. The Bayesian perspective is indeed adopted in papers that use rate-distortion theory to derive bounds on estimation precision with communication tools [24,25], see section 4. Fortunately, the conclusions on the role of entanglement in quantum estimation discussed above using the QFI remain qualitatively unchanged when the Bayesian methodology is applied [20]. Hence it is often enough to study the properties of the QFI, which is easier to analyze. QFI-related quantities will also prove useful in Sec. 5, where it is demonstrated that they play an important role in analyzing communication performance in the weak estimation regime.
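The Gaussian-prior result $\overline{\Delta^2\tilde{x}}_{\min} = \Delta_0^2[1 - \Delta_0^2F_Q(\bar{\rho})]$ can be sanity-checked in a commuting, effectively classical case where everything is known in closed form. The sketch assumes a hypothetical Gaussian noise model $y \sim \mathcal{N}(x, \sigma^2)$ with prior $\mathcal{N}(\bar{x}, \Delta_0^2)$; the minimal mean squared error is then the standard conjugate-Gaussian posterior variance $\Delta_0^2\sigma^2/(\Delta_0^2 + \sigma^2)$. The averaged distribution $\mathcal{N}(\bar{x}, \Delta_0^2 + \sigma^2)$ plays the role of $\bar{\rho}$, and its FI with respect to $\bar{x}$ is computed numerically. All names and parameter values are illustrative.

```python
import math


def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fisher_of_location(pdf, mean, lo, hi, n=20001, h=1e-4):
    """FI of a 1D density with respect to its location parameter:
    F = int (d pdf/d mean)^2 / pdf dy, by central differences and the trapezoid rule."""
    dy = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * dy
        p = pdf(y, mean)
        dp = (pdf(y, mean + h) - pdf(y, mean - h)) / (2 * h)
        w = 0.5 if i in (0, n - 1) else 1.0
        if p > 0:
            total += w * dp * dp / p
    return total * dy


x_bar, var0, sigma2 = 0.0, 0.5, 2.0  # prior mean, prior variance, noise variance (assumed)
# Averaged output distribution: prior convolved with the noise.
p_bar = lambda y, m: gaussian_pdf(y, m, var0 + sigma2)
F_bar = fisher_of_location(p_bar, x_bar, -30.0, 30.0)

bayes_from_formula = var0 * (1 - var0 * F_bar)   # Delta0^2 [1 - Delta0^2 F(p_bar)]
posterior_var = var0 * sigma2 / (var0 + sigma2)  # closed-form minimal MSE
print(F_bar, bayes_from_formula, posterior_var)
```

Both routes should agree, since here $F(\bar{p}) = 1/(\Delta_0^2 + \sigma^2)$ and $\Delta_0^2[1 - \Delta_0^2/(\Delta_0^2 + \sigma^2)] = \Delta_0^2\sigma^2/(\Delta_0^2 + \sigma^2)$.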

Rate-distortion theory
A first natural place to look for relations between estimation and communication problems is rate-distortion theory [2,44]. The main objective of rate-distortion theory is to quantify how much information must be transmitted to achieve a given level of error, and vice versa. In particular, viewing the estimation protocol as a communication channel from the input symbol $x$ to its estimator $\tilde{x}$, it is possible to lower bound the corresponding mutual information $I(X:\tilde{X})$ via [45]
$$ I(X:\tilde{X}) \geq H(X) - \frac{1}{2}\log\left(2\pi e\,\overline{\Delta^2\tilde{x}}\right), \tag{21} $$
where $\overline{\Delta^2\tilde{x}}$ is the average estimation variance and, for continuous random variables, $H(X)$ denotes the differential entropy. Intuitively, this relation reflects the fact that the better the estimation precision, the higher the communication rate. Or, stated the other way round, one needs to communicate a lot in order to estimate very precisely. Note that here we refer to the estimation problem from the Bayesian perspective, as we explicitly take into account the form of the prior distribution. Recall also that throughout the paper we focus on transmitting classical information encoded in quantum systems, and therefore use results of classical rate-distortion theory, abstracting from the more general quantum rate-distortion theory [46], where the faithful communication of quantum states themselves is considered. Using (21) together with the fact that $I(X:\tilde{X})$ is upper bounded by the Holevo quantity $\chi$, one obtains a lower bound on the achievable estimation variance [24,25,47]:
$$ \overline{\Delta^2\tilde{x}} \geq \frac{1}{2\pi e}\,2^{2[H(X) - \chi]}. \tag{22} $$
Thinking of the states $\rho_x$ as the outputs of a parameter-dependent channel, $\rho_x = \Lambda_x(\rho^{\rm in})$, we may obtain a lower bound on $\overline{\Delta^2\tilde{x}}$ valid for arbitrary input probe states, provided we are able to upper bound the corresponding Holevo quantity. A natural candidate is the capacity $C^{(1,\infty)}$, (3), of the channel, which is obviously an upper bound for $\chi$. The problem is that the formula for the capacity is typically not easily obtained, and the resulting bound may also not be very informative.
For example, for a single-mode lossy bosonic channel with effective transmission $\eta$, it is known that if the average number of photons at the input is upper bounded by $\bar{n}$, the capacity of the channel reads:
$$ C^{(1,\infty)} = g(\eta\bar{n}), \qquad g(x) = (x+1)\log(x+1) - x\log x. \tag{23} $$
When plugged into (22), this yields in the large $\bar{n} \gg 1$ limit, where $g(\eta\bar{n}) \approx \log(e\eta\bar{n})$:
$$ \overline{\Delta^2\tilde{x}} \gtrsim \frac{2^{2H(X)}}{2\pi e^3\eta^2\bar{n}^2}. \tag{24} $$
Thinking now of phase estimation, the bound is reasonably tight for the lossless case $\eta = 1$. However, in the presence of losses ($\eta < 1$) the bound is highly unsatisfactory, since it is known that the Heisenberg limit is lost [20] and the achievable asymptotic scaling of the phase estimation variance is $1/\bar{n}$ rather than $1/\bar{n}^2$. This is related to the fact that the optimal encoding that saturates the capacity of the channel is not the phase encoding characteristic of the phase estimation problem. Therefore, instead of plugging in the capacity of the channel itself, it may be more reasonable to insert a tighter bound on $\chi$ obtained for the encoding present in a given estimation problem. Following this line of reasoning, a much more informative bound has been derived for the case of unitary parameter encoding, $\rho_x = e^{-ixG}\rho^{\rm in}e^{ixG}$:
$$ \chi \leq H(G|\rho^{\rm in}), \tag{25} $$
where $H(G|\rho^{\rm in})$ is the Shannon entropy of the measurement statistics corresponding to measuring $\rho^{\rm in}$ in the eigenbasis of the generator $G$. This approach yielded useful precision bounds in the case of decoherence-free nonlinear quantum metrology [25] and lossy optical estimation [24], where the correct $1/\bar{n}$ scaling of the phase estimation variance in the presence of losses has been recovered. The results summarized above used connections between the estimation and communication fields to obtain original results in estimation theory. In this paper we focus on a complementary goal: we aim to gain better insight into the communication aspects of quantum channels, benefiting from our understanding of estimation-related quantities.
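The chain from the lossy-channel capacity to a variance bound can be evaluated directly. The sketch plugs $\chi \leq g(\eta\bar{n})$, with $g(x) = (x+1)\log(x+1) - x\log x$, into $\overline{\Delta^2\tilde{x}} \geq 2^{2[H(X)-\chi]}/(2\pi e)$ and compares it with the large-$\bar{n}$ form $2^{2H(X)}/(2\pi e^3\eta^2\bar{n}^2)$. The uniform phase prior on $[0, 2\pi)$, giving $H(X) = \log 2\pi$, and the value $\eta = 0.5$ are assumptions made only for this example; function names are hypothetical.

```python
import math


def g(x):
    """g(x) = (x + 1) log2(x + 1) - x log2(x), with g(0) = 0."""
    return 0.0 if x == 0 else (x + 1) * math.log2(x + 1) - x * math.log2(x)


def variance_bound(H_X, chi_max):
    """Average variance >= 2^(2[H(X) - chi]) / (2 pi e), for any upper bound chi_max on chi."""
    return 2 ** (2 * (H_X - chi_max)) / (2 * math.pi * math.e)


def large_nbar_bound(H_X, eta, n_bar):
    """Asymptotic form, using g(eta * n_bar) ~ log2(e * eta * n_bar)."""
    return 2 ** (2 * H_X) / (2 * math.pi * math.e**3 * eta**2 * n_bar**2)


H_X = math.log2(2 * math.pi)  # assumption: phase prior uniform on [0, 2*pi)
eta = 0.5                     # assumed transmission
for n_bar in (10, 100, 1000, 10_000):
    exact = variance_bound(H_X, g(eta * n_bar))
    print(n_bar, exact, large_nbar_bound(H_X, eta, n_bar), exact / large_nbar_bound(H_X, eta, n_bar))
```

The ratio in the last column should approach 1 as $\bar{n}$ grows, and the bound falls off as $1/\bar{n}^2$, which is the Heisenberg-like scaling that makes this bound uninformative in the lossy case.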

Weak estimation regime
In this section we identify a regime in which a connection can be established between the Shannon and Holevo quantities on one side and QFI-related quantities on the other. We refer to this situation as the weak estimation regime. The precise conditions will be given later in this section, but intuitively the regime of interest corresponds to a situation in which the knowledge about the parameter gained from measuring the output state is small compared to the prior knowledge. More formally, we can state this condition as the assumption that
$$ \Delta_0^2 \ll \frac{1}{F(\bar{x})}, $$
where $\Delta_0^2$ is the variance of the prior distribution and $F(\bar{x})$ is the Fisher information of the conditional probability distribution $p(y|x)$ defining the channel, evaluated at the prior mean $\bar{x}$.
This regime is of physical interest in some important instances of communication, especially communication over large distances, where the power of the incoming signal is weak; we analyze a particular example of such a case in Sec. 7. Importantly, note that the weak estimation regime usually does not apply to a situation in which the same symbol is sent many times, since then we can learn a lot about the parameter. This is reflected in the fact that the Fisher information of the total conditional distribution increases proportionally to the number of channel uses, which eventually violates the condition $\Delta_0^2 \ll 1/F(\bar{x})$. We specifically consider this opposite regime in Sec. 6.2.
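Our main results relate the mutual information and the Holevo quantity to the Fisher information and its quantum counterparts. In the classical case we show that in the weak estimation regime [48,49]
$$ I(X:Y) \approx \frac{\Delta_0^2}{2\ln 2}F(\bar{x}), $$
whereas in the quantum case we have
$$ \chi \approx \frac{\Delta_0^2}{2\ln 2}\underline{J}(\rho_{\bar{x}}) + \frac{\Delta_0^2}{4}\sum_{n=r+1}^d F_n(\rho_{\bar{x}})\log\frac{4e}{\Delta_0^2F_n(\rho_{\bar{x}})}, $$
where $r$ is the rank of $\rho_{\bar{x}}$ and $d$ the dimension of the Hilbert space, $\underline{J}$ is a quantity analogous to the QFI but with a slightly different operational meaning, and $F_n$ can be directly related to the QFI. The exact definitions of $\underline{J}$ and $F_n$, the proofs of the above equations and their discussion are given further in this section.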

Classical case
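Let us first discuss the classical case and write the mutual information between the sender and the receiver as
$$ I(X:Y) = H(Y) - H(Y|X) = -\sum_y p(y)\log p(y) + \int dx\,p(x)\sum_y p(y|x)\log p(y|x). \tag{27} $$
Assuming $p(y|x)$ is sufficiently smooth in $x$ and the prior $p(x)$ is sufficiently narrow, we approximate $p(y|x)$ by its expansion around the prior mean $\bar{x}$ up to second order,
$$ p(y|x) \approx p(y|\bar{x}) + \dot{p}(y|\bar{x})(x - \bar{x}) + \frac{1}{2}\ddot{p}(y|\bar{x})(x - \bar{x})^2, $$
where the dots denote derivatives with respect to $x$ taken at $x = \bar{x}$. Taking the expectation value of this expression with respect to the prior $p(x)$, we obtain $p(y) \approx p(y|\bar{x}) + \frac{1}{2}\Delta_0^2\ddot{p}(y|\bar{x})$, where $\Delta_0^2$ is the prior variance. Using this approximation, the first term in (27) reads:
$$ H(Y) \approx -\sum_y\left[p(y|\bar{x}) + \tfrac{1}{2}\Delta_0^2\ddot{p}(y|\bar{x})\right]\log\left[p(y|\bar{x}) + \tfrac{1}{2}\Delta_0^2\ddot{p}(y|\bar{x})\right]. \tag{28} $$
We now expand the logarithm and keep the leading-order terms in $\Delta_0^2$, arriving at:
$$ H(Y) \approx -\sum_y p(y|\bar{x})\log p(y|\bar{x}) - \frac{\Delta_0^2}{2}\sum_y\ddot{p}(y|\bar{x})\log p(y|\bar{x}), \tag{29} $$
where we have used $\sum_y\ddot{p}(y|\bar{x}) = 0$, which follows from normalization. In doing so we have made the implicit assumption that $p(y|\bar{x}) > 0$, since otherwise the expansion would not be possible. Within our order of approximation, ignoring the contribution of terms for which $p(y|\bar{x}) = 0$ is justified provided that in this case $\dot{p}(y|\bar{x}) = \ddot{p}(y|\bar{x}) = 0$ as well. This is a technical assumption which intuitively means that events which are impossible when $x = \bar{x}$ do not gain probability too rapidly when moving away from $\bar{x}$. Moving on to the second term, we expand the conditional entropy around $\bar{x}$ up to second order in $(x - \bar{x})$:
$$ -\sum_y p(y|x)\log p(y|x) \approx -\sum_y p(y|\bar{x})\log p(y|\bar{x}) - (x - \bar{x})\sum_y\dot{p}(y|\bar{x})\log p(y|\bar{x}) - \frac{(x - \bar{x})^2}{2}\sum_y\left[\ddot{p}(y|\bar{x})\log p(y|\bar{x}) + \frac{\dot{p}(y|\bar{x})^2}{p(y|\bar{x})\ln 2}\right]. \tag{30} $$
Taking now the average of the above expression over the prior $p(x)$, the linear term vanishes and the result reads:
$$ H(Y|X) \approx -\sum_y p(y|\bar{x})\log p(y|\bar{x}) - \frac{\Delta_0^2}{2}\sum_y\left[\ddot{p}(y|\bar{x})\log p(y|\bar{x}) + \frac{\dot{p}(y|\bar{x})^2}{p(y|\bar{x})\ln 2}\right]. \tag{31} $$
Subtracting (31) from (29) we arrive at:
$$ I(X:Y) \approx \frac{\Delta_0^2}{2\ln 2}F(\bar{x}), \tag{32} $$
where $F(\bar{x})$ is the FI of $p(y|x)$ evaluated at $\bar{x}$. Note that in the above derivation, while expanding the logarithm in the expression for the Shannon entropy $H(Y)$, we have assumed that $\Delta_0^2|\ddot{p}(y|\bar{x})| \ll p(y|\bar{x})$. In order to expand the logarithm in the expression for the conditional entropy $H(Y|X)$, we additionally assumed that $\Delta_0^2 \ll 1/F(\bar{x})$, meaning that the prior variance is much smaller than the variance dictated by the CR bound. Intuitively, this means that for the approximation (32) to hold, the knowledge about the parameter gained from the observed data must be small compared to the prior knowledge.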
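A quick numerical sanity check of the weak-estimation formula $I(X:Y) \approx \Delta_0^2F(\bar{x})/(2\ln 2)$, assuming a hypothetical binary channel $p(y=1|x) = (1 + \sin x)/2$ around $\bar{x} = 0$ (where $F(\bar{x}) = 1$) and a Gaussian prior. The exact mutual information is evaluated with the trapezoid rule; the ratio to the approximation should tend to 1 as $\Delta_0$ decreases, with a leading correction of relative order $\Delta_0^2$ for this channel. All names are illustrative.

```python
import math


def h2(q):
    """Binary entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def gaussian_average(f, mean, sd, n=4001, width=10.0):
    """E[f(x)] for x ~ N(mean, sd^2), trapezoid rule on [mean - width*sd, mean + width*sd]."""
    lo = mean - width * sd
    dx = 2 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * f(x) * math.exp(-0.5 * ((x - mean) / sd) ** 2)
    return total * dx / (sd * math.sqrt(2 * math.pi))


def p1(x):
    """Hypothetical binary channel p(y=1|x) = (1 + sin x) / 2."""
    return 0.5 * (1 + math.sin(x))


def mutual_information(mean, sd):
    """Exact I(X:Y) = H(Y) - H(Y|X) for a Gaussian prior."""
    return h2(gaussian_average(p1, mean, sd)) - gaussian_average(lambda x: h2(p1(x)), mean, sd)


def fisher(x, h=1e-6):
    """F(x) = sum_y pdot^2 / p for the binary channel (central difference)."""
    dp = (p1(x + h) - p1(x - h)) / (2 * h)
    return dp**2 / p1(x) + dp**2 / (1 - p1(x))


def ratio_to_approximation(mean, sd):
    """Exact mutual information divided by sd^2 F(mean) / (2 ln 2)."""
    approx = sd**2 * fisher(mean) / (2 * math.log(2))
    return mutual_information(mean, sd) / approx


for sd in (0.2, 0.1, 0.05):
    print(sd, ratio_to_approximation(0.0, sd))
```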
Equation (32) is reminiscent of a known relation between the FI and the relative entropy. The relative entropy $D(p\|q) = \sum_y p(y)\log[p(y)/q(y)]$ is a natural measure of the difference between two probability distributions. For two neighboring probability distributions $p(y|x)$ and $p(y|x+dx)$, the relative entropy is approximated by the FI up to second order in $dx$ [50]:
$$ D\left(p(y|x)\,\|\,p(y|x+dx)\right) \approx \frac{F(x)}{2\ln 2}dx^2. \tag{33} $$
On the other hand, the mutual information may be expressed via the relative entropy as:
$$ I(X:Y) = \int dx\,p(x)\,D\left(p(y|x)\,\|\,p(y)\right). \tag{34} $$
Had we replaced $p(y)$ in the above formula with $p(y|\bar{x})$ and expanded the relative entropy around $\bar{x}$ using (33), we would indeed have obtained (32). The validity of this replacement hinges on the assumption that $p(y) \approx p(y|\bar{x})$, i.e. that learning that the input parameter equals the prior mean does not alter the output distribution substantially. This again intuitively corresponds to the weak estimation regime, but it is hard to justify formally without a more detailed analysis such as the one presented above.
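Both relations can be checked on toy examples. The sketch below assumes a hypothetical Bernoulli family $p(1|x) = x$ (FI $= 1/[x(1-x)]$) for the local expansion $D \approx F\,dx^2/(2\ln 2)$, and a hypothetical three-point prior for the identity $I(X:Y) = \sum_x p(x)D(p(y|x)\|p(y))$, which holds exactly.

```python
import math


def kl_bits(p, q):
    """Relative entropy D(p||q) = sum_y p(y) log2[p(y)/q(y)]."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bernoulli(x):
    return [1 - x, x]


# Local expansion for the Bernoulli family, whose FI is 1 / (x (1 - x)).
x, dx = 0.3, 1e-3
F = 1 / (x * (1 - x))
local_ratio = kl_bits(bernoulli(x), bernoulli(x + dx)) / (F * dx**2 / (2 * math.log(2)))

# Mutual information two ways, for an assumed discrete prior over three inputs.
xs, px = [0.2, 0.5, 0.7], [0.3, 0.4, 0.3]
py = [sum(pk * bernoulli(xk)[y] for pk, xk in zip(px, xs)) for y in range(2)]
I_direct = (-sum(q * math.log2(q) for q in py)
            + sum(pk * sum(v * math.log2(v) for v in bernoulli(xk)) for pk, xk in zip(px, xs)))
I_via_kl = sum(pk * kl_bits(bernoulli(xk), py) for pk, xk in zip(px, xs))
print(local_ratio, I_direct, I_via_kl)
```

The local ratio deviates from 1 only at first order in $dx$, while the two mutual-information values agree to machine precision.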

Quantum case
Moving on to the quantum world, we ask for a generalization of (32) that would provide a connection between quantum communication and quantum estimation concepts. The most natural step would be to replace the FI appearing in (32) with the QFI. Indeed, this is the right approach provided we use product input states and no collective measurements, i.e. the $(1,1)$ scenario. In this case we simply replace the single-channel probabilities $p(y|x)$ with $\mathrm{Tr}(\rho_x\Pi_y)$. Since there always exists a measurement for which the corresponding FI equals the QFI, this implies that when communicating using the states $\rho_x$ with a prior variance much smaller than $1/F_Q(\rho_{\bar{x}})$, the mutual information obtained with such an optimal measurement may be approximated as:
$$ I(X:Y) \approx \frac{\Delta_0^2}{2\ln 2}F_Q(\rho_{\bar{x}}). \tag{35} $$
As a side remark, note that using inequality (21), assuming a Gaussian prior and making use of the explicit relation (20) between the Bayesian cost and the QFI, leads to:
$$ I(X:\tilde{X}) \geq H(X) - \frac{1}{2}\log\left\{2\pi e\,\Delta_0^2\left[1 - \Delta_0^2F_Q(\bar{\rho})\right]\right\}. \tag{36} $$
Since for a Gaussian prior $H(X) = \frac{1}{2}\log(2\pi e\Delta_0^2)$, we get:
$$ I(X:\tilde{X}) \geq -\frac{1}{2}\log\left[1 - \Delta_0^2F_Q(\bar{\rho})\right] \approx \frac{\Delta_0^2}{2\ln 2}F_Q(\bar{\rho}), \tag{37} $$
where the right-hand side of the above inequality differs from (35) only by the replacement of $\rho_{\bar{x}}$ with $\bar{\rho}$. This makes sense, as mixing a state cannot increase the QFI, and hence the above inequality is indeed in agreement with our approximation.
Let us now consider the Holevo quantity given by (4), which describes the communication capabilities when collective measurements are allowed. First note that the Holevo quantity may be expressed as
$$ \chi = \int dx\,p(x)\,S(\rho_x\|\bar{\rho}), \tag{38} $$
where $\bar{\rho} = \int dx\,p(x)\rho_x$ is the average state and
$$ S(\rho\|\sigma) = \mathrm{Tr}\left[\rho\left(\log\rho - \log\sigma\right)\right] \tag{39} $$
is the quantum relative entropy [3]. Interestingly, when expanding the quantum relative entropy for neighboring quantum states up to second order, we get [51]
$$ S(\rho_x\|\rho_{x+dx}) \approx \frac{J(\rho_x)}{2\ln 2}dx^2, \tag{40} $$
where
$$ J(\rho_x) = \sum_{n=1}^d\frac{\dot{p}_n^2}{p_n} + 2\sum_{n,m=1}^d(p_n - p_m)\ln p_n\,|\langle m|\dot{n}\rangle|^2, \tag{41} $$
with $p_n$ and $|n\rangle$ being the eigenvalues and eigenvectors of $\rho_x$ and $d$ the dimension of the Hilbert space. For full-rank $\rho_x$ the second sum can be symmetrized to $\sum_{n\neq m}(p_n - p_m)(\ln p_n - \ln p_m)|\langle m|\dot{n}\rangle|^2$. Comparing the above equation with (11), it is clear that $J(\rho_x)$ is in general not equal to the QFI, and we will refer to it as the relative entropy quantum Fisher information (REQFI).
In fact, it upper bounds the respective QFI, $J(\rho_x) \geq F_Q(\rho_x)$, with equality for density matrices that are diagonal in a parameter-independent basis [51]. Moreover, the REQFI gives meaningful results only for mixed states, being infinite for pure states. This last fact is a counterpart of the infiniteness of the quantum relative entropy for pure states.
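The inequality $J \geq F_Q$ and the expansion $S(\rho_x\|\rho_{x+dx}) \approx J\,dx^2/(2\ln 2)$ can be illustrated on full-rank qubits. The sketch assumes two hypothetical families: a Bloch vector of length $r = 0.6$ rotating in the $xy$-plane (for which $F_Q = r^2$ and $J = r\,\mathrm{artanh}\,r$), and the commuting family $\mathrm{diag}(x, 1-x)$, where the two quantities coincide. For full-rank states it evaluates both quantities from the matrix elements of $\dot\rho_x$ in the eigenbasis, with weights $2/(p_n + p_m)$ for the QFI and $(\ln p_n - \ln p_m)/(p_n - p_m)$ for the REQFI (the symmetrized form of $J$); derivatives use central differences.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rho_rot(x, r=0.6):
    """Full-rank qubit: Bloch vector of length r < 1 rotating in the xy-plane."""
    return 0.5 * (I2 + r * (np.cos(x) * SX + np.sin(x) * SY))


def rho_diag(x):
    """Commuting (diagonal) family diag(x, 1 - x)."""
    return np.diag([x, 1 - x]).astype(complex)


def log2m(rho):
    """Matrix log2 of a full-rank Hermitian matrix via its eigen-decomposition."""
    p, V = np.linalg.eigh(rho)
    return V @ np.diag(np.log2(p)) @ V.conj().T


def rel_entropy(rho, sigma):
    """Quantum relative entropy S(rho||sigma) in bits; full-rank inputs assumed."""
    return float(np.real(np.trace(rho @ (log2m(rho) - log2m(sigma)))))


def metric(rho_fn, x, kind, h=1e-5):
    """F_Q (kind='sld') or REQFI J (kind='re') from <n|drho|m>, full-rank rho assumed."""
    p, V = np.linalg.eigh(rho_fn(x))
    D = V.conj().T @ ((rho_fn(x + h) - rho_fn(x - h)) / (2 * h)) @ V
    total = 0.0
    for n in range(len(p)):
        for m in range(len(p)):
            if kind == "sld":
                c = 2 / (p[n] + p[m])
            elif abs(p[n] - p[m]) < 1e-12:
                c = 1 / p[n]
            else:
                c = (np.log(p[n]) - np.log(p[m])) / (p[n] - p[m])
            total += c * abs(D[n, m]) ** 2
    return float(total)


x, dx = 0.3, 1e-3
F, J = metric(rho_rot, x, "sld"), metric(rho_rot, x, "re")
S = rel_entropy(rho_rot(x), rho_rot(x + dx))
print(F, J, S / (J * dx**2 / (2 * np.log(2))))
print(metric(rho_diag, x, "sld"), metric(rho_diag, x, "re"))
```

For the rotating family one should see $J \approx 0.416 > F_Q \approx 0.36$ and a relative-entropy ratio close to 1; for the diagonal family both numbers should equal $1/[x(1-x)]$.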
Proceeding by analogy with the classical case, one might attempt to replace $\bar{\rho}$ with $\rho_{\bar{x}}$ in (38), plug in the expansion (40), and arrive at an approximate formula for the Holevo quantity as in (35), but with the QFI replaced by the REQFI. Instead, we provide below a general derivation of the approximate formula for the Holevo quantity, which shows that the above heuristic argument only works for full-rank states (or states that are effectively full rank, in the sense that their kernel can be trivially removed from the considerations) and is not justified in general. Intuitively, this is related to the fact that $\bar{\rho}$, being a probabilistic mixture, may in general have a higher rank than $\rho_{\bar{x}}$, which has a significant impact on the Holevo quantity.
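The two limiting cases of the derivation that follows can be checked numerically. For a full-rank family the Holevo quantity should approach $\Delta_0^2J(\rho_{\bar{x}})/(2\ln 2)$, while for a pure-state family, where $J$ diverges and the heuristic above breaks down, it should approach $(\Delta_0^2F_Q/4)\log[4e/(\Delta_0^2F_Q)]$. A minimal sketch, assuming a hypothetical qubit whose Bloch vector of length $r$ rotates in the $xy$-plane (for this family $F_Q = r^2$ and $J = r\,\mathrm{artanh}\,r$) and a Gaussian prior evaluated with the trapezoid rule; all names are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rho_rot(x, r):
    """Hypothetical qubit family: Bloch vector of length r rotated by x in the xy-plane."""
    return 0.5 * (I2 + r * (np.cos(x) * SX + np.sin(x) * SY))


def entropy_bits(rho):
    """Von Neumann entropy in bits; eigenvalues below 1e-15 are treated as zero."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))


def holevo_gaussian_prior(rho_fn, x_bar, sd, n=2001, width=10.0):
    """chi = S(rho_bar) - E[S(rho_x)] for x ~ N(x_bar, sd^2), trapezoid rule."""
    xs = np.linspace(x_bar - width * sd, x_bar + width * sd, n)
    w = np.exp(-0.5 * ((xs - x_bar) / sd) ** 2)
    w[0] *= 0.5
    w[-1] *= 0.5
    w /= w.sum()  # normalized quadrature weights
    rho_bar = sum(wi * rho_fn(xi) for wi, xi in zip(w, xs))
    mean_entropy = sum(wi * entropy_bits(rho_fn(xi)) for wi, xi in zip(w, xs))
    return entropy_bits(rho_bar) - mean_entropy


def approx_full_rank(sd, J):
    """Full-rank limit: chi ~ sd^2 J / (2 ln 2)."""
    return sd**2 * J / (2 * np.log(2))


def approx_pure(sd, F):
    """Pure-state limit: chi ~ (sd^2 F / 4) log2[4 e / (sd^2 F)]."""
    a = sd**2 * F / 4
    return a * np.log2(np.e / a)


x_bar, sd, r = 0.3, 0.05, 0.6
J = r * np.arctanh(r)  # REQFI of the full-rank family (its QFI would be r^2)
chi_mixed = holevo_gaussian_prior(lambda t: rho_rot(t, r), x_bar, sd)
chi_pure = holevo_gaussian_prior(lambda t: rho_rot(t, 1.0), x_bar, sd)  # F_Q = 1
print(chi_mixed / approx_full_rank(sd, J), chi_pure / approx_pure(sd, 1.0))
```

Both printed ratios should be close to 1 for this small prior width; note the logarithmic factor in the pure-state case, which has no counterpart in the full-rank formula.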
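Taking the Holevo quantity,
$$ \chi = S(\bar{\rho}) - \int dx\,p(x)S(\rho_x), \tag{42} $$
we expand $\rho_x$ around the prior mean $\bar{x}$ up to second order and get $\rho_x \approx \rho_{\bar{x}} + \dot{\rho}_{\bar{x}}(x - \bar{x}) + \frac{1}{2}\ddot{\rho}_{\bar{x}}(x - \bar{x})^2$. The average state at the output is therefore $\bar{\rho} = \int dx\,p(x)\rho_x \approx \rho_{\bar{x}} + \frac{1}{2}\Delta_0^2\ddot{\rho}_{\bar{x}}$. Let $p_n$ denote the eigenvalues and $|n\rangle$ the eigenbasis of $\rho_{\bar{x}}$, which in the case of degeneracy is further specified so as to diagonalize $\ddot{\rho}_{\bar{x}}$ on each degenerate subspace. To calculate the first term in (42) we only need the eigenvalues $\bar{p}_n$ of $\bar{\rho}$. Treating the term $\frac{1}{2}\Delta_0^2\ddot{\rho}_{\bar{x}}$ as a small perturbation added to $\rho_{\bar{x}}$, we use standard perturbation theory and get, up to the first-order correction, $\bar{p}_n \approx p_n + \frac{1}{2}\Delta_0^2(\ddot{\rho}_{\bar{x}})_{nn}$, where $(\ddot{\rho}_{\bar{x}})_{nn} = \langle n|\ddot{\rho}_{\bar{x}}|n\rangle$, and hence
$$ S(\bar{\rho}) \approx -\sum_{n=1}^d\left[p_n + \tfrac{1}{2}\Delta_0^2(\ddot{\rho}_{\bar{x}})_{nn}\right]\log\left[p_n + \tfrac{1}{2}\Delta_0^2(\ddot{\rho}_{\bar{x}})_{nn}\right]. \tag{43} $$
We make an assumption analogous to that of the classical derivation, namely that eigenvalues that are zero at $x = \bar{x}$ do not grow too rapidly when moving away from $\bar{x}$; hence we assume that if $p_n = 0$ then also $\dot{p}_n = \ddot{p}_n = 0$. Still, the situation we face is significantly different from the classical case. Even with these assumptions, we are not entitled to neglect the terms for which $p_n = 0$, since $\dot{p}_n = \ddot{p}_n = 0$ does not imply that $(\ddot{\rho}_{\bar{x}})_{nn} = 0$. This is a crucial point, related to an intrinsically quantum feature of the transformation of the states: the unitary rotation of their eigenbasis. Let $r \leq d$ be the rank of $\rho_{\bar{x}}$. We split (43) into two parts depending on whether $p_n$ is strictly positive ($1 \leq n \leq r$) or zero ($r+1 \leq n \leq d$), and expand the logarithm whenever $p_n$ is positive:
$$ S(\bar{\rho}) \approx -\sum_{n=1}^r\left[p_n\log p_n + \frac{\Delta_0^2}{2}(\ddot{\rho}_{\bar{x}})_{nn}\left(\log p_n + \frac{1}{\ln 2}\right)\right] - \frac{\Delta_0^2}{2}\sum_{n=r+1}^d(\ddot{\rho}_{\bar{x}})_{nn}\log\left[\frac{\Delta_0^2}{2}(\ddot{\rho}_{\bar{x}})_{nn}\right]. \tag{44} $$
Moving on to the second term in (42), note that $S(\rho_x)$ only depends on the eigenvalues of $\rho_x$, and its expansion is identical to that in the classical case. Hence, after averaging over $p(x)$ we get
$$ \int dx\,p(x)S(\rho_x) \approx -\sum_{n=1}^r\left[p_n\log p_n + \frac{\Delta_0^2}{2}\left(\ddot{p}_n\left(\log p_n + \frac{1}{\ln 2}\right) + \frac{\dot{p}_n^2}{p_n\ln 2}\right)\right], \tag{45} $$
where the sum is over the non-zero $p_n$.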
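Subtracting (45) from (43) we get: Thanks to trace preservation, $\sum_{n=1}^{r}\ddot p_n = 0$ and $\sum_{n=1}^{d}(\ddot\rho_{\bar x})_{nn} = 0$ (note that the upper summation limit is d), and as a result: Writing $(\ddot\rho_{\bar x})_{nn}$ explicitly we have: Note that $\langle\dot n|\dot n\rangle = \sum_{k=1}^{d}|\langle\dot n|k\rangle|^2$. Since $|k\rangle$ and $|n\rangle$ are orthonormal eigenvectors, $\partial_x\langle n|k\rangle = 0$, so we can replace $\langle\dot n|k\rangle$ with $-\langle n|\dot k\rangle$, arriving at: Plugging this formula into (48) and recalling the definition of the REQFI, (41), we arrive at the final expression: Here the underline in $\underline{J}$ indicates that the sum in the definition of J is restricted to $n\le r$, avoiding zero $p_n$. Thus, unlike J, which may sometimes be infinite, $\underline{J}$ is always finite. The $F_n$ quantities appearing in the second term read: To interpret them, note that differentiating the definition of the SLD, (10), gives $\ddot\rho_{\bar x} = \frac12\left(\dot L_{\bar x}\rho_{\bar x} + L_{\bar x}\dot\rho_{\bar x} + \dot\rho_{\bar x}L_{\bar x} + \rho_{\bar x}\dot L_{\bar x}\right)$. Using (10) again yields $\ddot\rho_{\bar x} = \frac12\left(\dot L_{\bar x}\rho_{\bar x} + \rho_{\bar x}\dot L_{\bar x}\right) + \frac14\left(L_{\bar x}^2\rho_{\bar x} + 2L_{\bar x}\rho_{\bar x}L_{\bar x} + \rho_{\bar x}L_{\bar x}^2\right)$. Sandwiching with $|n\rangle$ outside the support of $\rho_{\bar x}$ ($n\ge r+1$, so that $\rho_{\bar x}|n\rangle = 0$) leaves $\langle n|\ddot\rho_{\bar x}|n\rangle = \frac12\langle n|L_{\bar x}\rho_{\bar x}L_{\bar x}|n\rangle$. Plugging this into (52), we arrive at $F_n(\rho_{\bar x}) = \langle n|L_{\bar x}\rho_{\bar x}L_{\bar x}|n\rangle$.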
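Compare this with the definition of the QFI, (9), which can be rewritten as $F_Q(\rho_{\bar x}) = \mathrm{Tr}(\rho_{\bar x}L_{\bar x}^2) = \sum_{n=1}^{d}\langle n|L_{\bar x}\rho_{\bar x}L_{\bar x}|n\rangle$. We see that the $F_n$ represent contributions to the QFI from the subspace lying outside the support of $\rho_{\bar x}$, i.e. from the kernel of $\rho_{\bar x}$. Recall that the eigenbasis $|n\rangle$ outside the support of $\rho_{\bar x}$ is not arbitrary: it was chosen to diagonalize $\ddot\rho_{\bar x}$. The approximate expression for the Holevo quantity, Eq. (51), simplifies in two special cases. When $\rho_{\bar x}$ is full rank, or $\bar\rho$ lives on the support of $\rho_{\bar x}$, the second term in Eq. (51) vanishes and the Holevo quantity depends only on J:
$$\chi \approx \frac{\Delta_0^2}{2}J(\rho_{\bar x})\log e. \tag{57}$$
At the other extreme, if $\rho_{\bar x} = |\psi_{\bar x}\rangle\langle\psi_{\bar x}|$ is pure, the SLD can be written explicitly as $L_{\bar x} = 2\left(|\dot\psi_{\bar x}\rangle\langle\psi_{\bar x}| + |\psi_{\bar x}\rangle\langle\dot\psi_{\bar x}|\right)$, and so $L_{\bar x}|\psi_{\bar x}\rangle = 2|\dot\psi_{\bar x}\rangle + 2\langle\dot\psi_{\bar x}|\psi_{\bar x}\rangle|\psi_{\bar x}\rangle$. The identity $\langle\dot\psi_{\bar x}|\psi_{\bar x}\rangle + \langle\psi_{\bar x}|\dot\psi_{\bar x}\rangle = 0$ gives $\langle\psi_{\bar x}|L_{\bar x}\rho_{\bar x}L_{\bar x}|\psi_{\bar x}\rangle = |\langle\psi_{\bar x}|L_{\bar x}|\psi_{\bar x}\rangle|^2 = 0$. Hence the whole contribution to the QFI comes from the kernel of $\rho_{\bar x}$. Let $P_0$ be the projector onto the kernel of $\rho_{\bar x}$; then $F_Q(\rho_{\bar x}) = \langle\psi_{\bar x}|L_{\bar x}P_0L_{\bar x}|\psi_{\bar x}\rangle = 4\langle\dot\psi_{\bar x}|P_0|\dot\psi_{\bar x}\rangle$. Let us write $|\dot\psi_{\bar x}\rangle = a|\psi_{\bar x}\rangle + b|\psi^\perp_{\bar x}\rangle$, where $|\psi^\perp_{\bar x}\rangle$ is a normalized vector orthogonal to $|\psi_{\bar x}\rangle$. It is now clear that $|\psi^\perp_{\bar x}\rangle$ is a proper choice of eigenvector in the kernel, one that makes $\ddot\rho_{\bar x}$ diagonal on this subspace. Moreover, it is the only vector that yields any contribution to the QFI. There is therefore only one nonzero $F_n$ ($n = r+1 = 2$), which reads $F_2 = 4|b|^2$ and is equal to the QFI, see (12). Summarizing, for pure-state protocols the Holevo quantity can be approximated using only the QFI as
$$\chi \approx -\frac{\Delta_0^2F_Q}{4}\log\frac{\Delta_0^2F_Q}{4e}. \tag{62}$$

Manifestations of super-additivity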

Weak estimation regime
The approximate formulas for the mutual information, (35), and for the Holevo quantity, (51), provide interesting insight into the issue of super-additivity. Let us first assume that no entanglement is used at the input and that the output states of the individual channels are $\rho_x = \Lambda_x(\rho_{\rm in})$. Let $C_{p(x),\Lambda_x}$ denote the "capacity" of the channel under a fixed encoding, specified by the prior and by the parameter dependence of the channel, $\{p(x),\Lambda_x\}$. When talking about capacity, we implicitly consider a scenario in which this encoding is repeated independently over N channels. By this we mean that the N channels act as $\Lambda_{x_1}\otimes\cdots\otimes\Lambda_{x_N}$, with each $x_i$ drawn independently from the prior. Assume the prior is narrow enough for our approximations to hold, and restrict attention to individual measurements at the output. The approximate formula (35) for the mutual information then yields
$$C^{(1,1)}_{p(x),\Lambda_x} \approx \frac{\Delta_0^2}{2}\log e\,\max_{\rho_{\rm in}}F_Q[\Lambda_{\bar x}(\rho_{\rm in})]. \tag{63}$$
If, however, collective measurements are allowed, then utilizing (51) we get: where the rank r here refers to the rank of $\rho_{\bar x} = \Lambda_{\bar x}(\rho_{\rm in})$. Comparing the two formulas, one can easily appreciate the advantages of collective measurements. A natural measure of output super-additivity is the ratio of the two capacities,
$$\gamma^{(1,\infty)} = \frac{C^{(1,\infty)}_{p(x),\Lambda_x}}{C^{(1,1)}_{p(x),\Lambda_x}}. \tag{65}$$
For clarity, we focus on the two extreme cases of full-rank and pure output states, in which the simple approximate formulas (57) and (62) for the Holevo quantity are valid. The super-additivity measure then reads
$$\gamma^{(1,\infty)} \approx \begin{cases}\dfrac{J[\Lambda_x]}{F_Q[\Lambda_x]}, & \text{full-rank outputs},\\ -\dfrac12\ln\dfrac{\Delta_0^2F_Q[\Lambda_x]}{4e}, & \text{pure outputs},\end{cases}\tag{66}$$
where $F_Q[\Lambda_x]$ and $J[\Lambda_x]$ denote the maximal QFI and REQFI of the channel $\Lambda_x$, i.e. $F_Q[\Lambda_x] = \max_{\rho_{\rm in}}F_Q[\Lambda_{\bar x}(\rho_{\rm in})]$, and similarly for the REQFI. In the full-rank case, the measure of output super-additivity equals the ratio of the REQFI J to the QFI $F_Q$. We have stressed before that in general $J \ge F_Q$. We can now fully appreciate that the gap between these two quantities is responsible for output super-additivity in the weak communication regime.
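To make the gap between J and $F_Q$ tangible, the sketch below evaluates both quantities from the spectral decomposition of a full-rank state. It uses the standard SLD expression for $F_Q$. It also assumes that the REQFI of (41) is the relative-entropy (Kubo-Mori) Fisher information, $J=\sum_{n,m}|\langle n|\dot\rho|m\rangle|^2\frac{\ln p_n-\ln p_m}{p_n-p_m}$, where the ratio is replaced by $1/p_n$ when $p_n=p_m$. Under these assumptions it reproduces the single-channel values $F_Q=\eta^2$ and $J=\frac{\eta}{2}\ln\frac{1+\eta}{1-\eta}$ quoted for the dephased qubit in the examples below. The helper names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)

def spectral_fisher_pair(rho, drho):
    """Return (F_Q, J) for a full-rank state rho with parameter derivative drho.

    F_Q: SLD quantum Fisher information, sum_{n,m} 2|d_nm|^2 / (p_n + p_m).
    J:   relative-entropy (Kubo-Mori) Fisher information,
         sum_{n,m} |d_nm|^2 (ln p_n - ln p_m) / (p_n - p_m), ratio -> 1/p_n if p_n = p_m.
    """
    p, vecs = np.linalg.eigh(rho)
    d = vecs.conj().T @ drho @ vecs              # derivative in the eigenbasis of rho
    w = np.abs(d) ** 2
    fq = np.sum(2 * w / (p[:, None] + p[None, :]))
    dp = p[:, None] - p[None, :]
    dlog = np.log(p)[:, None] - np.log(p)[None, :]
    same = np.isclose(dp, 0.0)
    ratio = np.where(same, 1 / p[:, None], dlog / np.where(same, 1.0, dp))
    return fq, np.sum(w * ratio)

def dephased_qubit(eta):
    """rho_phi and d rho / d phi at phi = 0 for an equatorial input and dephasing eta."""
    rho = 0.5 * (np.eye(2) + eta * SX)
    return rho, 0.5 * eta * SY

for eta in (0.3, 0.9, 0.99):
    fq, j = spectral_fisher_pair(*dephased_qubit(eta))
    print(eta, fq, j, j / fq)     # j / fq is the full-rank ratio J / F_Q in (66)
```

The ratio printed in the last column stays above one and grows as η approaches 1, where the output states become pure and J diverges.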
On the other hand, for pure states $\gamma^{(1,\infty)}$ is determined solely by the QFI and diverges in the limit $\Delta_0^2 \to 0$. This indicates that collective measurements can increase the communication potential by more than any constant factor.
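The divergence can be seen on the decoherence-free qubit phase model treated in the examples. The sketch below assumes $|\psi_\phi\rangle = e^{-i\phi\sigma_z/2}|+\rangle$, a zero-mean Gaussian prior and base-2 logarithms. It obtains $F_Q$ from the pure-state expression $4(\langle\dot\psi|\dot\psi\rangle - |\langle\psi|\dot\psi\rangle|^2)$ and compares the exact Holevo quantity with the pure-state approximation $-\frac{\Delta_0^2F_Q}{4}\log\frac{\Delta_0^2F_Q}{4e}$. It then divides the latter by $\frac{\Delta_0^2}{2}F_Q\log e$, which is taken here as the individual-measurement benchmark. The function names are illustrative.

```python
import numpy as np

PLUS = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)

def psi(phi):
    """|psi_phi> = exp(-i phi sigma_z / 2)|+>, a pure equatorial qubit state."""
    return np.exp(-0.5j * phi * np.array([1.0, -1.0])) * PLUS

def pure_state_qfi(phi, h=1e-5):
    """F_Q = 4(<dpsi|dpsi> - |<psi|dpsi>|^2), derivative by central differences."""
    d = (psi(phi + h) - psi(phi - h)) / (2 * h)
    s = psi(phi)
    return 4 * (np.vdot(d, d).real - abs(np.vdot(s, d)) ** 2)

def binary_entropy(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def holevo_exact(var):
    """Outputs are pure; the averaged state has eigenvalues (1 +- exp(-var/2)) / 2."""
    return binary_entropy(-np.expm1(-var / 2) / 2)

def holevo_pure_approx(var, fq):
    return -(var * fq / 4) * np.log2(var * fq / (4 * np.e))

def gamma_pure(var, fq):
    """Pure-state Holevo approximation divided by (var / 2) F_Q log2(e)."""
    return holevo_pure_approx(var, fq) / (0.5 * var * fq * np.log2(np.e))

fq = pure_state_qfi(0.0)
for var in (1e-1, 1e-2, 1e-3, 1e-4):
    print(var, holevo_exact(var), holevo_pure_approx(var, fq), gamma_pure(var, fq))
```

The last column equals $\frac12\ln\frac{4e}{\Delta_0^2F_Q}$ and keeps growing as the prior narrows, while the approximation approaches the exact Holevo quantity.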
Note that we can also consider output super-additivity for a fixed input state $\rho_{\rm in}$, which results in a fixed set of output states $\rho_x$. In such an instance, the measure of super-additivity can be defined as in (66), but with the QFI and REQFI of the specific state $\rho_{\bar x}$ rather than those of the channel $\Lambda_x$. This is important in practical applications, where the set of message states $\rho_x$ is usually dictated by the laboratory apparatus.
Turning now to input super-additivity, let us consider a scenario where the N channels are divided into k-channel subgroups, $\Lambda_x^{\otimes k}$, that accept k-partite entangled probes at their inputs. The action of all channels is then described by $\Lambda^{\otimes k}_{x_1}\otimes\cdots\otimes\Lambda^{\otimes k}_{x_{N/k}}$. Note that all channels within one subgroup encode the same value of the parameter, and this is repeated independently for the other subgroups. The corresponding "capacity" reads: Provided we stay in the regime where our weak communication assumption holds, the Holevo quantity is expressible using $F_Q$ and J. The issue of input super-additivity is therefore related to the super-additivity of the QFI and the REQFI. Super-additivity of the QFI means that the optimal QFI per channel attainable with entangled k-channel inputs, $F_Q^{(k,\infty)}$, exceeds its single-channel value. This property has already been discussed in Sec. 3. It is a typical feature of most quantum channel estimation problems, with the exception of a very narrow class, e.g. loss estimation, where no entanglement at the input is needed to reach the optimal performance [52]. For example, in ideal unitary parameter estimation $F_Q^{(k,\infty)}$ grows linearly in k. When plugged into the pure-state approximate formula for the Holevo quantity (62), this gives the "capacity" (67) a further boost from input super-additivity.
Super-additivity of the REQFI is much less studied, but its relevance is clear from the above analysis: noisy channels produce output states $\rho_{\bar x}$ that are likely to be highly mixed and to satisfy the conditions under which the simpler approximation (57) for the Holevo quantity is valid. The tools for studying the QFI in noisy channels are highly developed and allow one to draw immediate conclusions on, e.g., the maximal gain offered by entangled probes. Analogous studies have not been pursued for the REQFI. Being an upper bound on the QFI, it typically provides looser bounds in quantum estimation problems, so apart from mathematical investigations it has rarely been appreciated in more practically oriented studies. We hope that the present work will boost interest in the REQFI as an element connecting estimation and communication problems. For the time being, we provide here an example of reasoning that shows how super-additivity of the REQFI may be analyzed using tools developed in quantum estimation.
One of the simplest ideas for finding the limit on the maximal QFI achievable with entangled probes is the classical simulation of the channel [37,53]. In this approach, the parameter-dependent quantum channel is written as a mixture of parameter-independent channels, weighted by a parameter-dependent classical probability distribution. The QFI is then upper-bounded by the classical FI of this distribution. The derivation uses two basic properties of the QFI: it does not increase under parameter-independent channels, and it reduces to the classical FI for diagonal density matrices. The REQFI enjoys the same properties, so the known upper bounds on the QFI derived with the classical simulation method apply immediately to the REQFI. Now suppose the bound on the QFI is asymptotically saturable. Since $J \ge F_Q$, the bound is then saturable for J as well. In this way one obtains an asymptotic formula for J optimized over entangled input states of a large number of probes, which proves in particular its input super-additivity. We present quantitative results obtained with this kind of reasoning in Sec. 7.
It is therefore clear from the discussion above that in the weak estimation regime we enjoy both aspects of quantum super-additivity. The first is at the level of measurements (output super-additivity). It is formally reflected by replacing $F_Q$ in the approximate mutual information with J, or, for pure outputs, with $-\frac12F_Q\ln\frac{\Delta_0^2F_Q}{4e}$, in the approximate Holevo quantity. The second is at the level of input probes and is related to the input super-additivity of $F_Q$ and J.
We should note, however, that the requirements of our approximation do not allow us to increase k too much, let alone consider $k \to \infty$. This is because the weak estimation assumption requires that what we learn at the output is small compared with the prior knowledge. Increasing k increases the information available about the input parameter, so at some point the approximation must break down. In the next subsection, for completeness, we discuss in more detail the opposite regime $k\to\infty$, which we refer to as the strong estimation regime, and contrast it with the weak estimation regime discussed so far.

Strong estimation regime
We start with a classical scenario. Consider k independent repetitions of an experiment governed by the same conditional probability distribution $p(y_i|x)$. The joint probability distribution of the k results $\mathbf y_k = \{y_1,\dots,y_k\}$ then reads $p(\mathbf y_k|x) = \prod_{i=1}^{k}p(y_i|x)$.
Note that the value x is the same for all experiments. As mentioned in Sec. 3, the local asymptotic normality theorem [33] tells us that in the limit of large k the problem can be translated into that of estimating x from a Gaussian distribution with mean $\sqrt k\,x$ and variance $1/F(x)$. Less formally, we can think of it as estimating the shift x from a Gaussian distribution whose variance narrows as $1/(kF(x))$. Treating the protocol as a communication task of transmitting x, this yields an asymptotic formula for the mutual information [32,54]:
$$I \approx H(p) - \int dx\,p(x)\,\frac12\log\frac{2\pi e}{kF(x)} + o(1), \tag{68}$$
Here $H(p) = -\int dx\,p(x)\log p(x)$ is the differential entropy of the prior, and the second term is the average entropy of a Gaussian distribution with variance $1/(kF(x))$. The term $o(1)$ vanishes as $k\to\infty$; it appears because perfect Gaussianity is achieved only asymptotically. The above formula can be rewritten in a more appealing form,
$$I \approx \frac12\log\frac{k}{2\pi e} - \int dx\,p(x)\log\frac{p(x)}{\sqrt{F(x)}}, \tag{69}$$
which clearly shows that the communicated information increases only logarithmically with k. It also shows that this information is maximized for the Jeffreys prior $p(x)\propto\sqrt{F(x)}$ [32]. The Jeffreys prior is often considered the least informative prior and is therefore used to represent a complete lack of knowledge about the parameter [55]. Because the mutual information increases only logarithmically, the strong estimation regime is unlikely to be interesting for communication purposes. Given a total of N available channel uses, it is preferable to keep k relatively small and repeat the communication procedure N/k times with independently distributed input parameters $x_i$. Each parameter will then not be estimated particularly well. However, the resulting mutual information will scale like N/k, quickly surpassing the $\log N$ behaviour we would be left with had we set k to its maximal value N in order to perform the most precise parameter estimation possible.
In other words, for the purposes of communication it is much better to have an increasing number of independent parameters transmitted under fixed noise, rather than a fixed number of parameters transmitted under decreasing noise.
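A classical toy model makes both regimes, and the preference for many independent parameters, concrete. The sketch assumes a hypothetical additive Gaussian channel $y_i = x + z_i$ with $x\sim\mathcal N(0,\Delta_0^2)$ and $z_i\sim\mathcal N(0,1/F)$. Here k repetitions with the same x give exactly $I_k = \frac12\log(1+k\Delta_0^2F)$ bits. For small $k\Delta_0^2F$ this reduces to the narrow-prior form $\frac{k\Delta_0^2F}{2}\log e$. For large k it approaches the strong-regime expression $H(p) - \int dx\,p(x)\frac12\log\frac{2\pi e}{kF}$, which for this prior equals $\frac12\log(k\Delta_0^2F)$. Splitting a budget of N uses into N/k blocks gives $\frac{N}{k}I_k$.

```python
import numpy as np

def exact_info(k, var, fisher):
    """I(x; y_1..y_k) in bits for y_i = x + z_i, x ~ N(0, var), z_i ~ N(0, 1/fisher)."""
    return 0.5 * np.log2(1 + k * var * fisher)

def weak_regime(k, var, fisher):
    """Narrow-prior form (k var F / 2) log2(e), valid while k var F << 1."""
    return 0.5 * k * var * fisher * np.log2(np.e)

def strong_regime(k, var, fisher):
    """Prior entropy minus the entropy of a Gaussian of variance 1/(kF), in bits."""
    h_prior = 0.5 * np.log2(2 * np.pi * np.e * var)
    h_post = 0.5 * np.log2(2 * np.pi * np.e / (k * fisher))
    return h_prior - h_post

var, fisher, n_uses = 1e-2, 1.0, 10_000
print(exact_info(1, var, fisher), weak_regime(1, var, fisher))
print(exact_info(n_uses, var, fisher), strong_regime(n_uses, var, fisher))
# Total information from a budget of N uses split into N/k independent blocks of size k
for k in (1, 10, 100, n_uses):
    print(k, n_uses / k * exact_info(k, var, fisher))
```

Since $\log(1+ka)/k$ decreases with k for any $a>0$, the printed totals fall monotonically from k = 1 to k = N. This is the trade-off described above.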
Consider the quantum counterpart of the above scenario with no entanglement at the input, i.e. the receiver obtains the state $\rho_x^{\otimes k}$, and no collective measurements at the output. We immediately get an analogue of (69) by substituting $F(x)$ with $F_Q(\rho_x)$. This is because one can always find a local measurement for which $F = F_Q$.
On the other hand, if we allow collective measurements, the conditional probability distribution no longer has a product structure, so we cannot directly apply the results of classical local asymptotic normality theory. Fortunately, the quantum generalization of local asymptotic normality [56,57] helps. It states that estimating a parameter from a product state $\rho_x^{\otimes k}$ in the limit of a large number of copies $k\to\infty$ is equivalent to estimating the displacement of a particular Gaussian quantum state, whose variance in the direction of the shift is determined by the QFI. More precisely, there exist CP maps that translate one problem into the other asymptotically in trace norm. Thanks to the continuity of the von Neumann entropy with respect to the trace norm [58,59], we may therefore argue that the Holevo quantity can also be calculated using the Gaussian states. We do not attempt a rigorous proof here. By analogy, however, these observations suggest the conjecture that a good approximation to the Holevo quantity in the strong estimation regime is
$$\chi \approx \frac12\log\frac{k}{2\pi e} - \int dx\,p(x)\log\frac{p(x)}{\sqrt{F_Q(\rho_x)}}. \tag{70}$$
This intuition therefore suggests that, unlike in the weak estimation regime, collective measurements offer no additional advantage here. Note that one can also derive another generalized version of (69), important for quantum data compression and communication tasks [60,61]. We do not use it here, since we deal with classical communication.
Turning now to entangled input probes in the strong estimation scenario, one should not expect a relation like the one above to hold in general. Some fundamental incompatibilities between QFI-based and entropy-based (communication) approaches for entangled input probes were pointed out in [25,47]. The most striking is the example of phase estimation using NOON states. While the QFI grows as $k^2$, the communicated information never exceeds a single bit, because the output state is restricted to a two-dimensional subspace. This is consistent with the fact that, despite their large QFI, NOON states are not useful in practical estimation [62,63]. Something relatively general can nevertheless be said. When the channels $\Lambda_x$ are noisy, the QFI at the output for optimally entangled input probes generically scales linearly with k, and the quantum enhancement amounts to a constant-factor improvement [38]. Moreover, as argued in [62,64], one can achieve almost optimal performance using states in which the k probes are divided into groups of g particles, with entanglement present only among particles belonging to the same group. In such a scenario the input state is a product state of many groups; as $k\to\infty$ their number tends to infinity while their size stays constant. This again makes it possible to apply the reasoning based on quantum local asymptotic normality. One can then argue that for this class of states a formula analogous to (70) holds, with the QFI of the entangled groups in place of the single-channel QFI, which represents a performance gain thanks to the use of entangled input probes.
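The NOON example can be made concrete with a short computation. Assume, for illustration, a k-photon NOON state whose relative phase after the channels is $k\phi$, and a zero-mean Gaussian phase prior of variance $\Delta_0^2$. The output states then span a two-dimensional subspace, the QFI is $k^2$, and the exact Holevo quantity is $h\big((1-e^{-k^2\Delta_0^2/2})/2\big)$, with h the binary entropy. The sketch compares this with the narrow-prior pure-state expression $-\frac{\Delta_0^2F_Q}{4}\log\frac{\Delta_0^2F_Q}{4e}$ at $F_Q=k^2$. The two agree while $k^2\Delta_0^2 \ll 1$, which is the weak-regime boost discussed above. Outside that regime the approximation breaks down (it even turns negative), while the exact value saturates at one bit no matter how large the QFI becomes.

```python
import numpy as np

def binary_entropy(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def noon_holevo(k, var):
    """Exact Holevo quantity (bits) for NOON outputs (|N,0> + e^{i k phi}|0,N>)/sqrt(2)
    with a zero-mean Gaussian phase prior: the outputs are pure and the averaged state
    has eigenvalues (1 +- exp(-k^2 var / 2)) / 2 inside a two-dimensional subspace."""
    return binary_entropy(-np.expm1(-k**2 * var / 2) / 2)

def pure_state_approx(var, fq):
    """Narrow-prior pure-state expression -(var F_Q / 4) log2(var F_Q / (4e))."""
    return -(var * fq / 4) * np.log2(var * fq / (4 * np.e))

var = 1e-3
for k in (1, 3, 10, 100, 1000):
    fq = k**2                       # QFI of the NOON state with respect to phi
    print(k, fq, noon_holevo(k, var), pure_state_approx(var, fq))
```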
In short, in the weak estimation/communication regime we have observed the effects of both input and output super-additivity. In the strong estimation regime, collective measurements seem to offer no gain, while one may still observe benefits from entanglement present in the input states.

Figure 2. Output super-additivity measure $\gamma^{(1,\infty)} = C^{(1,\infty)}_{p(\phi),\Lambda_\phi}/C^{(1,1)}_{p(\phi),\Lambda_\phi}$ (black) for the qubit dephasing channel as a function of the dephasing parameter η, compared with the case where entanglement between two channel inputs is additionally allowed, $\gamma^{(2,\infty)} = C^{(2,\infty)}_{p(\phi),\Lambda_\phi}/C^{(1,1)}_{p(\phi),\Lambda_\phi}$ (red). The prior variance is $\Delta_0^2 = 2\times10^{-2}$. The dashed curves represent the same quantities calculated using the narrow-prior approximate formulas (74). The inset shows the validity of our approximation through the ratio of the exact Holevo quantity (73) to the approximate expressions, as a function of the prior variance. Curves are shown for the decoherence-free case η = 1 (black, dotted) and for dephasing η = 0.9 with product (black, solid) and optimally entangled two-channel inputs (red, solid).

Examples
In this section we provide two examples of communication for which one can easily analyze issues of input and output super-additivity using the knowledge of the behavior of the QFI and the REQFI and study the validity of our approximation by comparing it to rigorous calculations.

Qubit dephasing channel
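As a first, illustrative example, let us consider a phase-encoding qubit channel in the presence of partial dephasing:
$$\Lambda_\phi(\rho) = \sum_{i=0}^{1}K_iU_\phi\rho U_\phi^\dagger K_i^\dagger,$$
where $U_\phi = e^{-i\phi\sigma_z/2}$ is the unitary phase-encoding operator, with $\sigma_z$ the Pauli z operator. The operators $K_0 = \sqrt{\frac{1+\eta}{2}}\,\mathbb 1$ and $K_1 = \sqrt{\frac{1-\eta}{2}}\,\sigma_z$ are the Kraus operators of the dephasing map, and η is the dephasing parameter.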

We choose a Gaussian prior $p(\phi) = \frac{1}{\sqrt{2\pi\Delta_0^2}}e^{-\phi^2/2\Delta_0^2}$. For small variances $\Delta_0^2$, which is the regime we are interested in, it can be treated as a valid probability distribution on the circle $\phi\in[-\pi,\pi]$. The Holevo quantity for this model is maximized for input states lying on the equator of the Bloch ball and can be easily calculated, yielding
$$\chi = h\!\left(\frac{1-\eta e^{-\Delta_0^2/2}}{2}\right) - h\!\left(\frac{1-\eta}{2}\right), \tag{73}$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ denotes the binary entropy function. In the limit of a narrow prior, $\Delta_0^2\ll1$, the above formula becomes
$$\chi \approx \begin{cases}\dfrac{\Delta_0^2}{4}\log\dfrac{4e}{\Delta_0^2}, & \eta = 1,\\ \dfrac{\eta\Delta_0^2}{4}\log\dfrac{1+\eta}{1-\eta}, & \eta<1.\end{cases}\tag{74}$$
Let us now analyze this model with the ideas developed in this paper. From (63) we know that $C^{(1,1)}$ can be expressed through the QFI. Phase estimation in the presence of dephasing is a well-understood problem. The QFI is again maximized for states lying on the equator and reads $\max_{\rho_{\rm in}}F_Q[\Lambda_0(\rho_{\rm in})] = \eta^2$. Hence $C^{(1,1)}_{p(\phi),\Lambda_\phi} \approx \frac{\eta^2\Delta_0^2}{2}\log e$. Knowing the maximal QFI is also sufficient to calculate the approximate $C^{(1,\infty)}_{p(\phi),\Lambda_\phi}$ in the decoherence-free case (η = 1) with the help of (62), and the result agrees with (74). In the presence of decoherence, η < 1, an approximate expression for $C^{(1,\infty)}_{p(\phi),\Lambda_\phi}$ instead requires calculating the REQFI and optimizing it over input states. The resulting expression reads $\max_{\rho_{\rm in}}J[\Lambda_0(\rho_{\rm in})] = \frac{\eta}{2}\ln\frac{1+\eta}{1-\eta}$. Since the output state $\Lambda_0(\rho_{\rm in})$ is supported on the whole Hilbert space, we may use (57), which gives the same expression as in (74). In this case the output super-additivity measure is $\gamma^{(1,\infty)} = \frac{1}{2\eta}\ln\frac{1+\eta}{1-\eta} \ge 1$. This proves the generic advantage of collective measurements in this communication protocol. Note, however, that for strong dephasing, $\eta\ll1$, we have $\gamma^{(1,\infty)}\to1$ and the super-additive behavior of the capacity is lost, see Fig. 2.
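The narrow-prior expression for η < 1 is easy to check against the exact Holevo quantity. The sketch assumes the closed form $\chi = h\big(\frac{1-\eta e^{-\Delta_0^2/2}}{2}\big) - h\big(\frac{1-\eta}{2}\big)$ for an equatorial input. This form follows from the Bloch-vector length $\eta e^{-\Delta_0^2/2}$ of the averaged output and the length η of each individual output. The sketch compares it with $\frac{\Delta_0^2}{2}J\log e$, using $J=\frac{\eta}{2}\ln\frac{1+\eta}{1-\eta}$, and evaluates $\gamma^{(1,\infty)} = J/\eta^2$. Logarithms are base 2 and the function names are illustrative.

```python
import numpy as np

def binary_entropy(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def holevo_dephasing(eta, var):
    """Exact Holevo quantity (bits) for equatorial inputs, 0 < eta < 1."""
    averaged = binary_entropy((1 - eta * np.exp(-var / 2)) / 2)
    return averaged - binary_entropy((1 - eta) / 2)

def holevo_full_rank_approx(eta, var):
    """(var / 2) J log2(e), with the single-channel REQFI of the dephased qubit."""
    j = 0.5 * eta * np.log((1 + eta) / (1 - eta))
    return 0.5 * var * j * np.log2(np.e)

def gamma_1_inf(eta):
    """Output super-additivity J / F_Q with F_Q = eta^2."""
    return np.log((1 + eta) / (1 - eta)) / (2 * eta)

var = 2e-2
for eta in (0.1, 0.5, 0.9):
    print(eta, holevo_dephasing(eta, var), holevo_full_rank_approx(eta, var), gamma_1_inf(eta))
```

At $\Delta_0^2 = 2\times10^{-2}$ the approximation is already within a few percent for these η. Its accuracy degrades as η approaches 1, where the required condition becomes $\Delta_0^2 \ll 1-\eta$.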
To analyze input super-additivity, we need to study the QFI and the REQFI for phase estimation in the presence of dephasing when input states are allowed to be entangled. In the decoherence-free case the QFI exhibits Heisenberg-limited scaling, $F_Q^{(N,\infty)} = N$ per channel. Consequently, the optimal QFI for phase estimation is super-additive, and the capacity for decoherence-free communication therefore also exhibits input super-additivity. If, on the other hand, decoherence is present in our setup, we should consider the REQFI instead of the QFI. It was shown in [36,37] that asymptotically, for a large number N of two-level particles, the QFI per particle is bounded by $F_Q^{(N,\infty)} \le \frac{\eta^2}{1-\eta^2}$. The bound is tight in the sense that for large N one can find an entangled input state and a measurement that saturate the inequality. Moreover, as stated in Sec. 6, the REQFI must obey the same classical simulation bound as the QFI. Since the bound is asymptotically saturated by the QFI and $J\ge F_Q$, it necessarily has to be tight for the REQFI as well. We may therefore apply the same reasoning as in the decoherence-free case, this time concluding that $J^{(N,\infty)} > J^{(1,1)}$.

Figure 3. Output super-additivity measure $\gamma^{(1,\infty)} = C^{(1,\infty)}_{p(\alpha),\Lambda_\alpha}/C^{(1,1)}_{p(\alpha),\Lambda_\alpha}$ for optical communication utilizing coherent states with a narrow Gaussian amplitude distribution via a thermal lossy channel. It is plotted as a function of the transmission coefficient η for average input photon number $\bar n = 0.01$ and thermal photon numbers $\bar n_{\rm th} = 0.1$ (red, solid) and $\bar n_{\rm th} = 1$ (black, solid). The dashed lines represent the same quantities calculated using our approximation. The inset depicts the ratio of the exact Holevo quantity to the approximate one as a function of the average input photon number, which plays the role of the width of the input parameter distribution. The inset curves are for $\bar n_{\rm th} = 0$, η = 0.9 (black, dotted); $\bar n_{\rm th} = 0.1$, η = 0.5 (red, solid); and $\bar n_{\rm th} = 1$, η = 0.99 (black, solid).
This implies that, also in the presence of nonzero dephasing, sending entangled input states can improve the capacity, in addition to the gains already obtained from collective measurements. See figure 2, which depicts the benefits of entangling two inputs and makes clear that $C^{(2,\infty)}_{p(\phi),\Lambda_\phi} > C^{(1,\infty)}_{p(\phi),\Lambda_\phi}$.
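The inequality $J^{(N,\infty)} > J^{(1,1)}$ can be made quantitative. The sketch below assumes a local classical simulation in the spirit of [37]: a mixture of two fixed rotations by $\pm\vartheta$ with $\cos\vartheta = \eta$ and weights $p_\pm(\phi) = \frac12\left(1 \pm \frac{\eta\phi}{\sin\vartheta}\right)$. To first order in φ, this mixture reproduces the dephased rotation, since $p_+e^{i\vartheta}+p_-e^{-i\vartheta} = \eta + i\eta\phi$. The classical FI of these weights gives the bound $\frac{\eta^2}{1-\eta^2}$. The sketch then compares the bound with the single-channel value $J^{(1,1)} = \frac{\eta}{2}\ln\frac{1+\eta}{1-\eta} = \eta\,\mathrm{artanh}\,\eta$. The function names are hypothetical.

```python
import numpy as np

def j_single(eta):
    """Single-channel REQFI for dephased phase estimation (value quoted in the text)."""
    return 0.5 * eta * np.log((1 + eta) / (1 - eta))

def simulation_weights(phi, eta):
    """Assumed local classical simulation: mixture of rotations by +-theta with
    cos(theta) = eta; weights match the dephased rotation to first order in phi."""
    sin_theta = np.sqrt(1 - eta**2)
    p_plus = 0.5 * (1 + eta * phi / sin_theta)
    return np.array([p_plus, 1 - p_plus])

def classical_fisher(eta, h=1e-6):
    """Classical FI sum_i (dp_i/dphi)^2 / p_i of the simulation weights at phi = 0."""
    p = simulation_weights(0.0, eta)
    dp = (simulation_weights(h, eta) - simulation_weights(-h, eta)) / (2 * h)
    return np.sum(dp**2 / p)

for eta in (0.5, 0.9, 0.99):
    bound = eta**2 / (1 - eta**2)
    print(eta, j_single(eta), classical_fisher(eta), bound, bound / j_single(eta))
```

The gap is strict for every $0<\eta<1$, because $\mathrm{artanh}\,\eta = \sum_{k\ge0}\frac{\eta^{2k+1}}{2k+1} < \sum_{k\ge0}\eta^{2k+1} = \frac{\eta}{1-\eta^2}$.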

Bosonic thermal channel
The utility of our approximate formulas in the qubit example above may be questioned. They are only valid in the weak estimation regime of a narrow prior phase distribution, and it is not clear what practical motivation might justify such a narrow input phase encoding for communication purposes. To show that this limit is of genuine physical interest, our second example is a model of optical communication through a thermal channel, the quantum analogue of the classical additive white Gaussian noise channel. As clarified below, in this case the weak estimation limit appears naturally in practical applications. It corresponds to the regime of small input light intensities, which is highly relevant and has been extensively investigated in the optical communication literature [15,16,65,66].
Intuitively, the evolution of an input state of light through the thermal channel may be described as mixing a single-mode input state, on a beamsplitter with transmissivity η, with a thermal state $\rho_{\bar n_{\rm th}} = \sum_{n=0}^{\infty}\frac{\bar n_{\rm th}^n}{(\bar n_{\rm th}+1)^{n+1}}|n\rangle\langle n|$ with average photon number $\bar n_{\rm th}$. Note that in the extreme case $\bar n_{\rm th} = 0$ the thermal channel describes pure photon loss. We assume that the information is encoded in the amplitude $\alpha\in\mathbb R$ of a coherent state with a Gaussian prior $p(\alpha) = \frac{1}{\sqrt{2\pi\bar n}}e^{-\alpha^2/2\bar n}$. Here $\bar n$ is the average number of photons per channel use in our communication procedure, and it simultaneously plays the role of the variance of the amplitude random variable. The encoding is realized by a displacement operator acting on the input vacuum state, $|\alpha\rangle = D(\alpha)|0\rangle$, where $D(\alpha) = e^{\alpha\hat a^\dagger - \alpha^*\hat a}$ and $\hat a$, $\hat a^\dagger$ are the annihilation and creation operators of the input bosonic mode. Since the coherent and thermal states, as well as the evolution, are Gaussian, the output state is easy to express. Applying the methods of [67], we get displaced thermal states at the output,
$$\rho_\alpha = D(\sqrt\eta\alpha)\,\rho_{(1-\eta)\bar n_{\rm th}}\,D^\dagger(\sqrt\eta\alpha) = \sum_{n=0}^{\infty}\frac{[(1-\eta)\bar n_{\rm th}]^n}{[(1-\eta)\bar n_{\rm th}+1]^{n+1}}\,D(\sqrt\eta\alpha)|n\rangle\langle n|D^\dagger(\sqrt\eta\alpha).$$
This state is already written in its eigenbasis, so we can use (11) to calculate the QFI and the REQFI for the problem of displacement parameter estimation. They read $F_Q = \frac{4\eta}{1+2(1-\eta)\bar n_{\rm th}}$ and $J = 2\eta\ln\frac{1+(1-\eta)\bar n_{\rm th}}{(1-\eta)\bar n_{\rm th}}$, respectively. We can now use these results to write the communication rate in the regime of a small average number of photons, $\bar n\ll1$, which is exactly the regime of narrow prior distributions. First, in the absence of a thermal environment, $\bar n_{\rm th} = 0$, i.e. for the lossy channel, the output states are pure, $\rho_\alpha = |\sqrt\eta\alpha\rangle\langle\sqrt\eta\alpha|$. According to (62) we then have
$$\chi \approx \eta\bar n\log\frac{e}{\eta\bar n}. \tag{77}$$
If even a small number of thermal photons is present, however, the output state is mixed, and by (57) the rate is reduced to
$$\chi \approx \eta\bar n\log\frac{1+(1-\eta)\bar n_{\rm th}}{(1-\eta)\bar n_{\rm th}}. \tag{78}$$
These expressions agree with the small-photon-number expansion of the exact Holevo quantity for the thermal channel, which is given by
$$\chi = f\!\left(\sqrt{\beta(\beta+2\eta\bar n)}\right) - f(\beta),$$
where $f(x) = \left(x+\tfrac12\right)\log\left(x+\tfrac12\right) - \left(x-\tfrac12\right)\log\left(x-\tfrac12\right)$ and $\beta = \tfrac12 + (1-\eta)\bar n_{\rm th}$ is the diagonal element of the covariance matrix of the thermal state $\rho_{(1-\eta)\bar n_{\rm th}}$ [68]. The convergence of the approximate and exact formulas for a small average number of photons can be seen in the inset of figure 3.
The above results imply that, in the limit of a small average number of photons, the behavior of the Holevo quantity changes drastically depending on whether there are thermal photons in the environment. If there are, then according to (78) the rate scales linearly with the average number of photons. If, however, the environment is in the vacuum state, i.e. we are dealing with a purely lossy channel, then (77) shows that the rate scales as $\bar n\log(1/\bar n)$. This growth is super-linear in the sense that its ratio to $\bar n$ diverges as $\bar n\to0$. The presence of a thermal environment can therefore reduce the rate significantly in the regime of weak signal power.
From the form of the QFI and the REQFI, it is also evident that the information transmission rates in the considered setup exhibit output super-additivity. In both cases, the pure lossy channel and the thermal channel, the Holevo quantity is larger than the respective accessible information. In the weak estimation limit this is clearly visible from (66) and the fact that here $J > F_Q$. This super-additive behavior is depicted in figure 3. Note, however, that large thermal noise reduces the gain from collective measurements. Asymptotically, for $\bar n_{\rm th}\gg1$, super-additive behavior disappears: $\lim_{\bar n_{\rm th}\to\infty}\gamma^{(1,\infty)} = 1$. On the other hand, in the absence of thermal photons, output super-additivity still provides an advantage in the regime of a small average number of signal photons, irrespective of the losses present.
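The weak-signal expressions can be compared numerically with the exact Holevo quantity. The sketch assumes the closed form $\chi = f\big(\sqrt{\beta(\beta+2\eta\bar n)}\big) - f(\beta)$. This corresponds to averaging the displaced thermal states over the Gaussian amplitude prior, which inflates the variance of the displaced quadrature from β to $\beta+2\eta\bar n$, with vacuum variance ½. With this convention the closed form reproduces (77) and (78) to leading order in $\bar n$. The sketch also evaluates $\gamma^{(1,\infty)} = J/F_Q = \frac{1+2m}{2}\ln\frac{1+m}{m}$, with $m = (1-\eta)\bar n_{\rm th}$, for the mixed case. Logarithms are base 2 and the function names are illustrative.

```python
import numpy as np

def f(x):
    """f(x) = (x+1/2) log2(x+1/2) - (x-1/2) log2(x-1/2), with 0 log 0 = 0."""
    a, b = x + 0.5, x - 0.5
    return a * np.log2(a) - (b * np.log2(b) if b > 0 else 0.0)

def holevo_exact(eta, nbar, nth):
    beta = 0.5 + (1 - eta) * nth
    return f(np.sqrt(beta * (beta + 2 * eta * nbar))) - f(beta)

def holevo_weak(eta, nbar, nth):
    if nth == 0:                      # pure (lossy) outputs, pure-state formula
        return eta * nbar * np.log2(np.e / (eta * nbar))
    m = (1 - eta) * nth               # thermal photons reaching the output
    return eta * nbar * np.log2((1 + m) / m)

def gamma_1_inf(eta, nth):
    """J / F_Q for displaced thermal outputs (mixed case, nth > 0)."""
    m = (1 - eta) * nth
    fq = 4 * eta / (1 + 2 * m)
    j = 2 * eta * np.log((1 + m) / m)
    return j / fq

for nth in (0.0, 0.1, 1.0):
    print(nth, holevo_exact(0.9, 1e-3, nth), holevo_weak(0.9, 1e-3, nth))
print([gamma_1_inf(0.9, nth) for nth in (0.1, 1.0, 100.0)])
```

The last line illustrates the loss of output super-additivity as thermal noise grows: the ratio decreases towards 1.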
Finally, let us also point out that (77) and (78) agree with the asymptotic expansions of the capacities of the lossy and thermal channels, respectively [69,70], in the limit of a small average number of signal photons. Based on our approximation, we can therefore conclude that in the regime of weak signal power it is sufficient, for optimal communication performance, to encode information using just the displacement in a single quadrature with a Gaussian prior. In this example the encoding of the estimation problem happens to be the optimal encoding for the unrestricted capacity optimization. The results may therefore be related directly to the actual channel capacity formulas, and not only to channel "capacities" under sub-optimal encodings.

Conclusions
We have highlighted a connection between communication concepts, such as the mutual information and the Holevo quantity, on the one hand, and Fisher-information-related quantities used in quantum estimation theory on the other. The presented approach allows one to trace super-additivity both at the input and at the output stages of communication protocols. This requires operating in the weak estimation regime, where the amount of information learned from the measurements is small compared with the prior knowledge. This regime is particularly relevant in optical communication with weak light beams. The main message of the paper is that in this regime input super-additivity can be linked to input super-additivity of the QFI $F_Q$ and the REQFI J, whereas output super-additivity is intimately related to the fact that J upper-bounds $F_Q$. Our results also provide a new operational interpretation of J, which appears naturally in the approximate formula for the Holevo quantity for full-rank output states. They likewise give one for $F_Q$, which determines the communication performance for pure output states. The symbol encodings in the considered communication protocols were restricted to those appearing in the corresponding estimation schemes. Our statements on communication super-additivity are therefore valid only for these particular encodings. Still, as the example of optical communication in the weak power regime demonstrates, in some cases simple estimation-relevant encodings lead to optimal communication performance. In such cases our approach can address the fundamental channel capacity as well.