Quantum collapse rules from the maximum relative entropy principle

We show that the von Neumann--Lueders collapse rules in quantum mechanics always select the unique state that maximises the quantum relative entropy with respect to the premeasurement state, subject to the constraint that the postmeasurement state has to be compatible with the knowledge gained in the measurement. This way we provide an information theoretic characterisation of quantum collapse rules by means of the maximum relative entropy principle.


INTRODUCTION
The dynamics of quantum states in the orthodox (von Neumann's) foundations of quantum mechanics consists of two different prescriptions: the unitary evolution and the so-called 'collapse' of a quantum state to a subspace encoding the knowledge gained in the outcome of a measurement. The mappings (rules) describing this collapse were originally formulated by von Neumann [1], and later improved by Lüders [2]. There are two different forms of collapse. When one knows only that a measurement corresponding to an observable (a self-adjoint operator with a discrete spectrum) $O$ has taken place, the 'weak' rule applies. It is defined as $\rho \to \sum_{i\in I} P_i \rho P_i$, where $\rho$ is an original quantum state (in general, a density operator), while $O = \sum_{i\in I} \lambda_i P_i$ is a spectral decomposition with some countable index set $I$ (hence, $\sum_{i\in I} P_i = \mathbb{I}$, $P_i P_j = P_i \delta_{ij}$, and $\lambda_i \in \mathbb{R}$ $\forall i, j \in I$). If a measurement corresponding to $O$ has resulted in a specific value $\lambda_k \in \{\lambda_i \mid i \in I\}$ associated to a projector $P_k \in \{P_i \mid i \in I\}$, then the 'strong' rule, $\rho \to P_k \rho P_k / \operatorname{tr}(\rho P_k)$, is applied.
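As a minimal numerical sketch of the two rules, assuming NumPy and a hypothetical qutrit example (the projectors `P1` and `P2`, the state `rho`, and the function names are illustrative and do not come from the original text), one could write:

```python
import numpy as np

def weak_collapse(rho, projectors):
    """Weak rule: rho -> sum_i P_i rho P_i."""
    return sum(P @ rho @ P for P in projectors)

def strong_collapse(rho, P):
    """Strong rule: rho -> P rho P / tr(rho P); needs tr(rho P) > 0."""
    prob = np.trace(rho @ P).real
    if prob <= 0:
        raise ValueError("outcome has zero probability in the state rho")
    return P @ rho @ P / prob

# Hypothetical observable with one doubly degenerate eigenvalue.
P1 = np.diag([1.0, 1.0, 0.0])   # projector onto the degenerate eigenspace
P2 = np.diag([0.0, 0.0, 1.0])   # projector onto the remaining eigenvector

# Pure state with coherences both inside and between the blocks.
psi = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
rho = np.outer(psi, psi.conj())

rho_weak = weak_collapse(rho, [P1, P2])
rho_strong = strong_collapse(rho, P1)
```

In this example the weak rule removes the coherences between the two blocks but keeps those inside the degenerate block, while the strong rule returns a state for which the outcome associated with `P1` has probability 1.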
The negative of Umegaki's quantum relative entropy [3,4], $D(\rho, \sigma) = -S(\rho, \sigma) := \operatorname{tr}(\rho \ln \rho - \rho \ln \sigma) \in [0, \infty]$, can be used as a measure of distinguishability, or relative information content, of the quantum state $\sigma$ from the state $\rho$. The use of $D$ instead of $S$ follows Wiener's idea that the «amount of information is the negative of the quantity defined as entropy» [5]. Note that we call $S = -D$ the relative entropy, following the convention of [6] that makes the Gibbs-Shannon and von Neumann entropies the special cases of $S$, after adding a constant: $S_{\mathrm{vN}}(\rho) = S(\rho, \mathbb{I}/n) + \ln n$.
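The definition of $D$ and its relation to the von Neumann entropy can be checked numerically. The following is a sketch only, assuming NumPy, the convention $0 \ln 0 := 0$, and a small numerical tolerance `tol`; the function names and the example state are hypothetical.

```python
import numpy as np

def relative_information(rho, sigma, tol=1e-12):
    """D(rho, sigma) = tr(rho ln rho - rho ln sigma), with 0 ln 0 := 0.

    Returns np.inf when rho has weight on the kernel of sigma.
    """
    r = np.linalg.eigvalsh(rho)
    s, V = np.linalg.eigh(sigma)
    # tr(rho ln rho), skipping (numerically) zero eigenvalues of rho
    term_rho = sum(x * np.log(x) for x in r if x > tol)
    # weights <v_k| rho |v_k> of rho on the eigenvectors v_k of sigma
    w = np.einsum("ik,ij,jk->k", V.conj(), rho, V).real
    term_sigma = 0.0
    for wk, sk in zip(w, s):
        if wk > tol:
            if sk <= tol:
                return np.inf
            term_sigma += wk * np.log(sk)
    return term_rho - term_sigma

def von_neumann_entropy(rho, tol=1e-12):
    """S_vN(rho) = -tr(rho ln rho)."""
    r = np.linalg.eigvalsh(rho)
    return -sum(x * np.log(x) for x in r if x > tol)

rho = np.diag([0.5, 0.3, 0.2])   # hypothetical example state
n = rho.shape[0]
d = relative_information(rho, np.eye(n) / n)
s_vn = von_neumann_entropy(rho)
# With S = -D, the relation in the text reads s_vn == -d + ln(n).
```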
The function $D$ can be considered as a nonsymmetric distance: in general, $D(\rho, \sigma) \neq D(\sigma, \rho)$. If a given state is $\sigma$ and we believe it to be $\rho$, it can be easier or harder to find our error than if their roles were reversed. Say, $\sigma = P/\operatorname{tr}(P)$ with $P$ some projector and $\rho = \mathbb{I}/n$. If we measure the property corresponding to $\mathbb{I} - P$, a single measurement can tell us that the state is not $\sigma$, whereas no single measurement could reveal the same of $\rho$. See e.g. [7,8] for an overview of reasons for using $D(\rho, \sigma)$ as a measure of distinguishability and relative information content.
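The asymmetry can be made concrete for the example of a normalised projector and the maximally mixed state. Since these two states commute, $D$ reduces to a sum over their eigenvalues in a shared eigenbasis; the sketch below relies on that assumption, and the function name and the choice $n = 3$ with a rank 2 projector are hypothetical.

```python
import math

def relative_information_diag(p, q):
    """D for two commuting states, given as eigenvalue lists in a shared eigenbasis."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0.0:
            continue            # convention 0 ln 0 := 0
        if qk == 0.0:
            return math.inf     # first state has weight where the second has none
        total += pk * (math.log(pk) - math.log(qk))
    return total

n = 3
rho = [1.0 / n] * n         # maximally mixed state I/n
sigma = [0.5, 0.5, 0.0]     # normalised rank 2 projector P / tr(P)

d_rho_sigma = relative_information_diag(rho, sigma)   # infinite
d_sigma_rho = relative_information_diag(sigma, rho)   # finite, equals ln(3/2)
```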
A key information theoretic property of the strong collapse rule is that the probability of measuring the value $\lambda_k$ again, after having measured it once, is 1, which follows from $\operatorname{tr}\left(P_k \frac{P_k \rho P_k}{\operatorname{tr}(\rho P_k)}\right) = 1$. Repeated measurements add no new information. Clearly, the state $P_k \rho P_k / \operatorname{tr}(\rho P_k)$ is not the only state that has this property. What we demonstrate in this letter is that, among all states that have this property, the strong collapse rule selects the state that is least distinguishable from the initial state $\rho$, that is, it has the minimum relative information $D(\rho, \cdot)$, in a suitably regularised sense. This allows for an information theoretic characterisation of the strong collapse rule: the state after measurement is the state that is least distinguishable from the previous state, while being compatible with the new information gained by the measurement.
In order to derive the strong collapse rule, we will need two intermediate results. First we will show that the weak collapse rule produces the least distinguishable state among the block diagonal states. We then show that a weighted version of the strong collapse rule, $\rho \to \sum_i p_i P_i \rho P_i / \operatorname{tr}(\rho P_i)$, is the least distinguishable amongst the states with blocks of fixed trace. This rule can be interpreted as corresponding to a measurement where we believe that the result $P_i$ occurred with probability $p_i$. This intermediate step regularises the problem of a strong collapse, which is then obtained as a limiting case, by taking $p_i \to \delta_{ik}$ with $k = 1$.
Our derivation of the collapse rules from the constrained maximisation of Umegaki's quantum relative entropy is of special importance in the context of epistemic and information theoretic approaches to the foundations of quantum theory. In this context, collapse rules have been considered as analogues of the Bayes-Laplace rule [9-12]. This analogy rested on mathematical and conceptual similarity, but was not derived from any single unifying principle. In the meantime, the Bayes-Laplace rule has been shown to be a special case of the constrained maximisation of the Kullback-Leibler relative entropy [13-16]. Our result provides the missing piece of the puzzle. Both the Bayes-Laplace and von Neumann-Lüders rules are special cases of a single epistemic principle of inductive inference (or, in other words, information theoretic state updating). This issue will be discussed in more detail in the last section.

THE SETUP
We will consider the finite dimensional case. Hence, quantum states will be identified with non-negative matrices of trace 1, which form a convex set $\mathcal{D}$ in the space of all Hermitian $n \times n$ complex matrices.
The function $D(\cdot, \cdot)$ is jointly convex in both arguments [17], which implies that $D(\rho, \cdot)$ is convex on $\mathcal{D}$ for all $\rho \in \mathcal{D}$. Due to the finite dimensionality of the problem, we can use the first order condition for the existence of a minimum of a convex function $f$ on a convex set $V$. For a function differentiable at $x$ this condition states that if $x$ is in the interior of $V$ then the derivatives of $f$ need to vanish. If $x$ belongs to some stratum of the boundary of $V$ then all tangential derivatives need to vanish, whereas derivatives in inward transversal directions need to be non-negative. In our minimisation problem we have a subset $V \subset \mathcal{D}$ of density matrices that is defined by a linear equation and thus is convex. The function $D(\rho, \cdot)$ restricts to a convex and differentiable function on $V$ and we want to find its minimum. Thus we simply differentiate in the directions preserving $V$ and require the derivatives to be non-negative. We will denote this condition by
$$\partial_V D(\rho, \sigma) \geq 0.$$
The next two sections will be concerned with evaluating this set of equations.

WEAK COLLAPSE
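In the case of a weak collapse due to the measurement of $O = \sum_i \lambda_i P_i$, the constraint set is given by the block diagonal density matrices,
$$V_w := \{\sigma \in \mathcal{D} \mid [P_i, \sigma] = 0 \ \forall i\}. \qquad (2)$$
For $\sigma \in \mathcal{D}$, the condition (2) is equivalent to $\sigma = \sum_i P_i \sigma P_i$, as well as to $[O, \sigma] = 0$ (see [19] for a discussion).
We can parametrise $V_w$ in terms of the spectral decomposition of $\sigma$. Every element of $V_w$ is of the form
$$\sigma = U \Lambda U^*,$$
with $\Lambda$ a trace 1 diagonal matrix with positive entries and $U$ a unitary that is a product $U = \prod_i U_i$ of unitaries $U_i$, each acting nontrivially only in the block $P_i$. Writing $\sigma_i := P_i \sigma P_i$ and $\rho_i := P_i \rho P_i$ for the $i$-th blocks, we have for instance $\ln \sigma = \sum_i \ln \sigma_i$, with each logarithm taken within its block; that is, functions (in the sense of the functional calculus) act blockwise on the space $V_w$.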
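Let us first consider the variation $\partial_{V_w} \operatorname{tr}(\rho \ln(\cdot)) = 0$ in the directions parametrised by the $U_i$. Given a function $f(U)$ on a Lie group, we can take a directional derivative by looking at the parameter derivative along a one parameter group of diffeomorphisms of the group. As multiplication in a Lie group is differentiable, we can pick the one parameter group of diffeomorphisms generated by left multiplication with the one dimensional subgroup $\exp(tL)$,
$$\varphi_t(U) = \exp(tL)\, U.$$
We then define the directional derivative in the direction $L$ as the derivative of the pushforward of $f$ along $\varphi_t$,
$$\partial_L f(U) := \frac{d}{dt} \varphi_t^\sharp f(U) \Big|_{t=0} = \frac{d}{dt} f(\exp(tL)\, U) \Big|_{t=0}.$$
For a function that is the trace of $U$ in a particular representation this can be easily evaluated:
$$\frac{d}{dt} \varphi_t^\sharp \operatorname{tr}(U) \Big|_{t=0} = \operatorname{tr}(LU).$$
A straightforward calculation shows that we further have
$$\frac{d}{dt} \varphi_t^\sharp \operatorname{tr}(A U B U^*) \Big|_{t=0} = \operatorname{tr}(A L U B U^*) - \operatorname{tr}(A U B U^* L).$$
Note that $[L_i, P_j] = 0$, and in particular $L_i P_j = \delta_{ij} L_i$. With $A = \rho$ and $U B U^* = \ln \sigma$, the derivative then takes the form
$$\partial_{L_i} \operatorname{tr}(\rho \ln \sigma) = \operatorname{tr}(\rho L_i \ln \sigma) - \operatorname{tr}(\rho \ln \sigma\, L_i) = \operatorname{tr}([\ln \sigma_i, \rho_i] L_i),$$
where $\sigma_i = P_i \sigma P_i$ and $\rho_i = P_i \rho P_i$. We thus see that if $\sigma$ and $\sum_i P_i \rho P_i$ are simultaneously diagonalisable, the above expression vanishes. In fact, since $[\ln \sigma_i, \rho_i]$ is traceless and $\{L_i, iL_i\}$ spans the space of all traceless matrices in the $i$-th matrix block, this is also a necessary condition.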
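The derivative formula for $\operatorname{tr}(A U B U^*)$ can be compared against a finite difference. This is only a sanity check under stated assumptions: NumPy, randomly drawn matrices with a fixed seed, an anti-Hermitian generator $L$, and a hypothetical helper `exp_antihermitian` that exponentiates $L$ through the eigendecomposition of the Hermitian matrix $-iL$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def random_complex():
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def exp_antihermitian(L, t):
    """exp(tL) for anti-Hermitian L, using L = iH with H = -iL Hermitian."""
    h, V = np.linalg.eigh(-1j * L)
    return (V * np.exp(1j * t * h)) @ V.conj().T

A, B = random_complex(), random_complex()
U, _ = np.linalg.qr(random_complex())   # a random unitary
M = random_complex()
L = M - M.conj().T                      # anti-Hermitian generator

def f(t):
    """tr(A g U B U* g*) with g = exp(tL), i.e. U replaced by exp(tL) U."""
    g = exp_antihermitian(L, t)
    return np.trace(A @ g @ U @ B @ U.conj().T @ g.conj().T)

eps = 1e-6
numeric = (f(eps) - f(-eps)) / (2 * eps)   # central difference at t = 0
analytic = (np.trace(A @ L @ U @ B @ U.conj().T)
            - np.trace(A @ U @ B @ U.conj().T @ L))
```

The two values `numeric` and `analytic` should agree up to the finite difference error.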
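Let us next consider the variation in the direction of the spectrum, that is, the direction of $\Lambda$. We are interested in the case where $\sigma$ and $\sum_i P_i \rho P_i$ are simultaneously diagonalisable. Let $\kappa^\sigma_k$ and $\kappa^\rho_k$ be the eigenvalues of $\sigma$ and $\sum_i P_i \rho P_i$ respectively. If $\kappa^\rho_i \neq 0$ and $\kappa^\sigma_i = 0$ then $D(\rho, \sigma) = \infty$, so this cannot be the minimum if a state with finite relative entropy exists, and we can disregard this case here. Let us first consider the case that all $\kappa^\rho_i \neq 0$. We have the condition
$$\partial_{\Lambda_\sigma} \sum_k \kappa^\rho_k \ln \kappa^\sigma_k = 0.$$
The derivatives $\partial_{\Lambda_\sigma}$ have to preserve the trace. An overcomplete basis of such derivatives is given by $\partial_{\kappa^\sigma_k} - \partial_{\kappa^\sigma_l}$. Thus,
$$\frac{\kappa^\rho_k}{\kappa^\sigma_k} - \frac{\kappa^\rho_l}{\kappa^\sigma_l} = 0 \quad \forall k, l. \qquad (10)$$
So, the ratios of the eigenvalues of $\sum_i P_i \rho P_i$ and $\sigma$ are fixed. As both have trace 1, this implies they are the same.
If some of the $\kappa^\rho_i$ are zero, then the above condition cannot be satisfied. However, we can still look for the minimum on the boundary. Let us assume that $\kappa^\rho_i = 0$ for $i \in I_0$, with $I_0$ the set of the corresponding indices. The function $D$ is independent of $\kappa^\sigma_i$ for $i \in I_0$. We will prove that the directional derivatives of $D$ at the point $\kappa^\sigma_j = \kappa^\rho_j$ are non-negative. For that we only have to check the derivatives with positive $+\partial_{\kappa^\sigma_i}$, rather than also those with negative signs, as the latter point out of the space of density matrices, making the eigenvalues negative. We thus have the derivatives $\partial_{\kappa^\sigma_i} - \partial_{\kappa^\sigma_l}$ for $i \in I_0$ and $l \notin I_0$, and $\partial_{\kappa^\sigma_k} - \partial_{\kappa^\sigma_l}$ for $k, l \notin I_0$. These are given by
$$(\partial_{\kappa^\sigma_i} - \partial_{\kappa^\sigma_l}) D(\rho, \sigma) = \frac{\kappa^\rho_l}{\kappa^\sigma_l} = 1 > 0, \qquad (\partial_{\kappa^\sigma_k} - \partial_{\kappa^\sigma_l}) D(\rho, \sigma) = -\frac{\kappa^\rho_k}{\kappa^\sigma_k} + \frac{\kappa^\rho_l}{\kappa^\sigma_l} = 0, \qquad (11)$$
and we see this is a global minimum. Recall that if $\kappa^\rho_i \neq 0$ when $\kappa^\sigma_i = 0$ then $D(\rho, \sigma) = \infty$. We now also need to consider the case that $\kappa^\rho_i = 0$ when $\kappa^\sigma_i \neq 0$. In that case we would get the full derivatives in the $i$ direction, thus the equations (10) apply, which cannot be satisfied unless all $\kappa^\rho_j = 0$, which cannot occur in $\mathcal{D}$.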
Combining this with the above we have the following result: the state $\sigma = \sum_i P_i \rho P_i$ is the only state $\sigma$ satisfying $\partial_{V_w} D(\rho, \sigma) \geq 0$. The set $V_w$ is convex, so from (11) and convexity of $D(\rho, \cdot)$, this is the unique global minimum.
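The statement can be probed numerically by comparing the weak collapse of a random state with randomly sampled block diagonal states. This is an illustration rather than a proof; it assumes NumPy, a fixed random seed, faithful (full rank) states so that all logarithms are finite, and a hypothetical block structure of sizes 2 and 1 in dimension 3. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
projectors = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])]

def random_state():
    """Random full rank density matrix (normalised Wishart matrix)."""
    m = rng.normal(size=(n, 2 * n)) + 1j * rng.normal(size=(n, 2 * n))
    a = m @ m.conj().T
    return a / np.trace(a).real

def log_pd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_info(rho, sigma):
    """D(rho, sigma) = tr(rho ln rho - rho ln sigma) for faithful states."""
    return np.trace(rho @ (log_pd(rho) - log_pd(sigma))).real

def pinch(tau):
    """Weak collapse: tau -> sum_i P_i tau P_i (a block diagonal state)."""
    return sum(P @ tau @ P for P in projectors)

rho = random_state()
d_weak = rel_info(rho, pinch(rho))

# D(rho, sigma) - D(rho, weak collapse of rho) for random block diagonal sigma
gaps = [rel_info(rho, pinch(random_state())) - d_weak for _ in range(200)]
```

Every entry of `gaps` should be non-negative up to rounding, in line with the minimality of the weak collapse on $V_w$.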

STRONG COLLAPSE
The conditions defining 'strong' collapse that were specified in the Introduction lead us to a troubling situation, because for such nonfaithful states the relative entropy is almost always infinite. We will overcome the problem by deriving a generalised version of the strong collapse rule that is a quantum counterpart of Jeffrey's rule. The ordinary strong collapse rule will then be obtained by a limiting procedure.
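Consider a constraint set given in terms of $p_i \in \mathbb{R}$ such that $\sum_i p_i = 1$ by
$$V_s := \{\sigma \in \mathcal{D} \mid [P_i, \sigma] = 0,\ \operatorname{tr}(P_i \sigma) = p_i \ \forall i\}, \qquad (12)$$
where $\{P_i \mid i \in I\}$ is again determined by the spectral decomposition of an observable $O = \sum_{i\in I} \lambda_i P_i$. The set (12) can be interpreted as encoding the knowledge that the measurement outcome $\lambda_i$ corresponding to a projection $P_i$ occurs with a probability $p_i$.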
Here we encounter a problem. If we have a $p_i$ nonzero but $\operatorname{tr}(\rho P_i) = 0$, then every state in $V_s$ will have relative entropy $-\infty$ to $\rho$. Moreover, even if we subtract the infinite constant, we find that the regularised distance does not depend on the state in the block $P_i$ and there is no unique minimum. We thus will always assume that $\operatorname{tr}(\rho P_i) \neq 0$ for $p_i \neq 0$.
The variation in the $U_i$ direction goes through as before. However, the variation in the direction of the spectrum changes in that a basis is now given in terms of $\partial_{\kappa^{\sigma_i}_k} - \partial_{\kappa^{\sigma_i}_l}$, with $\kappa^{\sigma_i}_k$ and $\kappa^{\sigma_i}_l$ belonging to the same block $P_i$ and thus being eigenvalues of $\sigma_i$. Thus only the ratios of eigenvalues within each block are fixed. This implies that the eigenvalues of $\sigma_i$ are uniformly scaled relative to the eigenvalues of $\rho_i$. The condition $\sum_k \kappa^{\sigma_i}_k = p_i$ fixes $\sigma_i$ to be $p_i \rho_i / \operatorname{tr}(\rho_i)$.
This shows the following result: the state $\sigma = \sum_i p_i \frac{P_i \rho P_i}{\operatorname{tr}(P_i \rho P_i)}$ is the only state $\sigma$ satisfying $\partial_{V_s} D(\rho, \sigma) \geq 0$.
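A numerical illustration of this result, under the same kind of assumptions as before (NumPy, fixed seed, faithful states, a hypothetical block structure of sizes 2 and 1, and hypothetical weights `p`), compares the weighted rule with randomly sampled block diagonal states whose blocks have trace $p_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
projectors = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])]
p = [0.7, 0.3]   # hypothetical outcome probabilities p_i

def random_state():
    """Random full rank density matrix (normalised Wishart matrix)."""
    m = rng.normal(size=(n, 2 * n)) + 1j * rng.normal(size=(n, 2 * n))
    a = m @ m.conj().T
    return a / np.trace(a).real

def log_pd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_info(rho, sigma):
    """D(rho, sigma) = tr(rho ln rho - rho ln sigma) for faithful states."""
    return np.trace(rho @ (log_pd(rho) - log_pd(sigma))).real

def weighted_collapse(tau, weights):
    """tau -> sum_i p_i P_i tau P_i / tr(tau P_i); blocks get trace p_i."""
    return sum(w * (P @ tau @ P) / np.trace(tau @ P).real
               for P, w in zip(projectors, weights))

rho = random_state()
sigma_star = weighted_collapse(rho, p)
d_star = rel_info(rho, sigma_star)

# Random competitors: block diagonal states with tr(P_i sigma) = p_i
gaps = [rel_info(rho, weighted_collapse(random_state(), p)) - d_star
        for _ in range(200)]
```

As in the weak case, all entries of `gaps` should be non-negative up to rounding.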
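The strong collapse is a limiting case of the above projection, with all $p_i$ going to zero except for one, $p_1$, corresponding to a projection $P_1$ that, in turn, corresponds to a measurement result given by an eigenvalue $\lambda_1$. We obtain this by taking the weak continuous limit,
$$\lim_{p_1 \to 1} \sum_i p_i \frac{P_i \rho P_i}{\operatorname{tr}(P_i \rho P_i)} = \frac{P_1 \rho P_1}{\operatorname{tr}(P_1 \rho P_1)}.$$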
Note that in the finite dimensional case that we consider here the weak topology and norm topology coincide.
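The limiting procedure can be followed numerically. The sketch below assumes NumPy and a hypothetical faithful qutrit state; it evaluates the weighted rule for weights $(1 - \epsilon, \epsilon)$ and measures the trace norm distance to the strong collapse as $\epsilon$ decreases.

```python
import numpy as np

P1, P2 = np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])

# Hypothetical faithful state: a pure state mixed with the maximally mixed state.
psi = np.array([1.0, 2.0, 2.0]) / 3.0
rho = 0.9 * np.outer(psi, psi) + 0.1 * np.eye(3) / 3.0

def weighted_collapse(rho, projectors, weights):
    """rho -> sum_i p_i P_i rho P_i / tr(rho P_i)."""
    return sum(p * (P @ rho @ P) / np.trace(rho @ P)
               for P, p in zip(projectors, weights))

strong = P1 @ rho @ P1 / np.trace(rho @ P1)

eps_values = [1e-1, 1e-3, 1e-6]
distances = [
    np.linalg.norm(weighted_collapse(rho, [P1, P2], [1.0 - e, e]) - strong, ord="nuc")
    for e in eps_values
]
```

The distances shrink in proportion to $\epsilon$, so the weighted states approach the strong collapse in norm, consistent with the coincidence of the weak and norm topologies noted above.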

THE FOUNDATIONAL VIEW
In the orthodox formulation of quantum mechanics the 'collapse rules' are postulated. Thus, they are not deduced from any other more fundamental principle. They can be derived from several different conditions, see [20,21] for a review, but none of these conditions possesses the status of a fundamental principle of quantum theory. The weak collapse rule can be derived by taking the tensor product with an auxiliary state, followed by unitary evolution and a partial trace. This may serve as a derivation independent of interpretational issues (when this procedure is interpreted as an interaction with some ontic environment, it is usually considered as an instance of decoherence). However, no such construction exists for the strong rule. This fact, as well as the unclear relationship between the strong collapse rule and unitary evolution, renders the orthodox mathematical foundations conceptually insufficient, asking for further insights.
In general, an ontic interpretation of the quantum state leads to considering quantum collapse as a change of the "state of being" of some "material object/thing". On the other hand, an epistemic interpretation leads to considering quantum collapse as a change of the "state of information" of some "experiencing user/agent". (There is also a corresponding difference in the meaning of the term 'measurement'.) In particular, the dynamical reduction approach of [22] belongs to the former class, providing an ontic explanation by means of a general dynamical principle from which the quantum collapse rule is derived. On the other hand, an epistemic interpretation of collapse rules as quantum mechanical analogues of the Bayes-Laplace rule $p(x) \to p(x)p(b|x)/p(b)$ was proposed in [9-12]. However, no epistemic explanation, understood as a derivation from some fundamental principle of information theory (or statistical inference theory), has been offered. Our paper (as well as the closely related paper [23]) provides such a derivation.
Following the postulates of [24,25] (which aim at reapproaching the foundations of quantum theory in the spirit of [26-29]), we demonstrated that the mapping to the unique solution of constrained minimisation of the relative information $D$,
$$\rho \mapsto \arg\inf_{\sigma \in Q} D(\rho, \sigma), \qquad (15)$$
can serve as the general principle of quantum state change due to the acquisition of new information (represented by the constraints $Q$). This amounts to selecting the quantum state that is the least distinguishable from the original state among all states that are in a strict agreement with the new knowledge (represented by the constraints).
In order to derive the quantum collapse rules from the principle (15), we needed to identify the information theoretic constraints that define the situations of weak and strong collapse. The 'weak' collapse amounts to encoding the information that a specific observable $O$ has been subjected to measurement. A quantum state $\sigma$ that carries such information has to be compatible with the possibility of measuring all eigenvalues of $O$ precisely. Such a situation can be characterised by the condition $[\sigma, O] = 0$ (or, equivalently, $[P_i, \sigma] = 0$ $\forall P_i$). The 'strong' collapse should additionally result in a state that would reproduce the result of measurement of a particular eigenvalue with certainty (that is, with probability equal to 1). That is, given a projector $P$ encoding the outcome $\lambda$ of the measurement, the post-collapse density operator $\sigma$ should satisfy the condition of a 'weak' collapse, as well as $\operatorname{tr}(P \sigma) = 1$.
This way our results can be considered as a quantum counterpart of derivations [13-16] of the Bayes-Laplace rule from the constrained maximisation of the Kullback-Leibler relative entropy [30], $S(p, q) := -\int_{\mathcal{X}} \mu(x)\, p(x) \log(p(x)/q(x))$, where $x \in \mathcal{X}$, while $p$ and $q$ are densities of probability measures with respect to a measure $\mu$ on $\mathcal{X}$. The functional $S(p, q)$ is a special case of Umegaki's quantum relative entropy $S(\sigma, \rho)$ for discrete $\mathcal{X}$ and $[\sigma, \rho] = 0$. This strengthens the analogy between the Bayes-Laplace and the von Neumann-Lüders rules: they are just two special cases of a single general principle of inductive inference, given by (15). From the Bayesian perspective, the state $\rho$ is a prior, while $\sigma$, satisfying the constraints and maximising $S(\rho, \sigma)$, is a posterior.
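The commuting special case can be illustrated directly. The sketch below assumes NumPy, a hypothetical four-point sample space with prior `prior`, and an event `b` encoded by a diagonal projector; all names are illustrative. On diagonal density matrices, the weighted rule with weights $(1, 0)$ reproduces Bayes-Laplace conditioning on the event, and general weights give Jeffrey's rule.

```python
import numpy as np

def weighted_collapse(rho, projectors, weights):
    """rho -> sum_i p_i P_i rho P_i / tr(rho P_i), skipping blocks with p_i = 0."""
    out = np.zeros_like(rho)
    for P, p in zip(projectors, weights):
        if p > 0:
            out = out + p * (P @ rho @ P) / np.trace(rho @ P)
    return out

prior = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical prior p(x)
in_b = np.array([1.0, 1.0, 0.0, 0.0])    # p(b|x): indicator of the event b

rho = np.diag(prior)                     # prior as a diagonal density matrix
P_b, P_not_b = np.diag(in_b), np.diag(1.0 - in_b)

# Strong collapse as the weights (1, 0): conditioning on b
posterior = np.diag(weighted_collapse(rho, [P_b, P_not_b], [1.0, 0.0]))
bayes = prior * in_b / np.sum(prior * in_b)   # p(x) p(b|x) / p(b)

# General weights: Jeffrey's rule with probability 0.8 assigned to b
jeffrey = np.diag(weighted_collapse(rho, [P_b, P_not_b], [0.8, 0.2]))
```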

REMARKS
All earlier results on derivation of weak and strong collapse rules from minimisation of two point functionals on the space of quantum states [19,31-38] were obtained for (various) symmetric quantum information distances. The importance of our result stems from the importance of (the negative of) Umegaki's relative entropy in quantum information theory as opposed to symmetric quantum information distances, which do not carry a similar semantic significance.
After finishing this paper, we were informed about reference [39], where it is shown that a state $\sigma = \sum_i P_i \rho P_i$, where $P_i$ are rank 1 projectors, minimises the functional $D(\rho, \sigma)$. This is a special case of our result for the weak collapse rule. The generalisation to our result is stated without proof in [40].
A closely related paper [23] deals with the same type of problem as here, but using a different mathematical approach, allowing for treatment of the infinite dimensional case. Further conceptual and mathematical discussion associated with the results of both papers is carried out there and in [25].

ACKNOWLEDGMENTS
We would like to thank Carlos S. Guedes for many important and insightful discussions throughout the development of this result. We also thank Patrick Coles for informing us about [39,40]. This research was supported in part by Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation. This research was also partially financed by the National Science Center of the Republic of Poland (Narodowe Centrum Nauki) through the grant number DEC2011/01/N/HS3/03273.