The Quantum Bayes Rule and Generalizations from the Quantum Maximum Entropy Method

The recent article "Entropic Updating of Probability and Density Matrices" [1] derives and demonstrates the inferential origins of both the standard and quantum relative entropies in unison. Operationally, the standard and quantum relative entropies are shown to be designed for the purpose of inferentially updating probability distributions and density matrices, respectively, when faced with incomplete information. We call the inferential updating procedure for density matrices the "quantum maximum entropy method". Standard inference techniques in probability theory can be criticized for lacking concrete physical consequences in physics; but here, because we are updating quantum mechanical density matrices, the quantum maximum entropy method has direct physical and experimental consequences. The present article gives a new derivation of the Quantum Bayes Rule, and some generalizations, using the quantum maximum entropy method, while discussing some of the limitations the quantum maximum entropy method puts on the measurement process in Quantum Mechanics.


Introduction
The recent article "Entropic Updating of Probability and Density Matrices" [1] derives and demonstrates the inferential origins of both the standard and quantum relative entropies in unison. The derivations of the standard and quantum relative entropies in [1] were not rudimentary; rather, a set of inferentially guided design criteria were proposed to design a function capable of accurately updating probability distributions when faced with incomplete information. The solution has the functional form of the standard relative entropy, and thus the standard relative entropy is the functional designed for the purpose of probability updating. Similar (design) derivations exist [2,3,4,5,6,7,8], but the number of required design criteria was reduced in [1]. What is particularly pleasant in [1] is the equal implementation of the same design criteria to design a functional capable of updating density matrices. This parallel derivation shows the quantum relative entropy is designed to update density matrices, formulating an inferentially oriented quantum maximum entropy method. Not only was the quantum relative entropy found in [1], but we also learned how to use it. This discussion saturates the previous article. Here we provide a new derivation of the Quantum Bayes Rule (QBR), discuss the physical implications entropic methods put on the measurement process in Quantum Mechanics (QM), and briefly discuss how the quantum maximum entropy method provides some simple generalizations of the QBR.
In QM the wavefunction has two modes of evolution [9,10]: one is the continuous unitary evolution given by the dynamical Schrödinger equation, while the other is the discrete collapse of the wavefunction that occurs when a detection is made. The collapse postulate is generally implemented ad hoc to empirically represent the effect of detection, and more recently the Quantum Bayes Rule [11,12,13] (also known as the fundamental theorem of quantum measurement [14] or the positive operator-valued measure (POVM) formalism [15,16,17,18]) quantifies collapse under a POVM measurement, Â†_x Â_x, where ∑_x Â†_x Â_x = 1, which is a generalization of Lüders Rule [10]. Here we will derive the QBR from entropic arguments, which we claim eliminates the need for ad hoc collapse postulates in QM. Our result further perpetuates the interpretation that entropy may be used to inferentially collapse the wavefunction using projectors [19,20], and our result is generalized to the QBR using POVM's (and a weak QBR). Rather than appealing to group theoretic arguments [19,20], our derivations are seemingly simpler, as they require solving the Lagrange multiplier problem following [1] (the derivation parallels [21,22] but uses density matrices), and they naturally avoid the infinite entropy problem that appears in the "strong Lüders rule" derivation of [19]. The Lagrange multiplier technique is used in the maximum entropy method community ([23,24,25] and the works and conferences that have followed) for updating probability distributions, so we refer to the method of inference capable of updating density matrices as the quantum maximum entropy method.
As both forms of the standard and quantum relative entropy resemble one another, they inevitably share analogous solutions and face similar limitations; however, because we are dealing with density matrices, these limitations have physical consequences. In standard probability theory, there is a phrase, "The maximum entropy method cannot fix flawed information" [2], and a similar theme permeates the inference procedure for density matrices. Because the entropy was designed to update from a prior density matrix φ to a posterior density matrix ρ, the form of φ must accurately describe the prior state of knowledge of the system if ρ is going to objectively represent the updated state of knowledge for that quantum system. For instance, if our prior knowledge tells us that a particle is located within a certain interval, it makes no sense to impose that the particle have an average position anywhere but within that interval. The quantum mechanical analog of this is that a prior density matrix cannot be updated to regions not originally spanned by the prior density matrix. We derive this type of incompatibility for the quantum maximum entropy method, which we name the Prior Density Matrix Theorem (PDMT). The PDMT sheds light on some of the nontrivial notions of quantum measurement and QM in general.
A special case of the PDMT insists that the detection of an observable from a pure state (the collapse) is impossible without first decohering (or partially decohering) the pure state. This is a rediscovery of Lüders' notion [10] that the action of a measurement device is to project the pure state into a mixed state, ρ → ∑_i P̂_i ρ P̂_i, except our argument is from purely entropic and thus inferential arguments. This concept is not as foreign or as objectionable as it may seem if we consider the well known results of the quantum two slit experiment. If a "which slit" detection of the particle is made, then the resulting probability distribution is a decohered sum of Gaussians on the screen (after many trials), whereas omitting this detection allows for interference effects. Decoherence of the pure state was required for a which slit inference. The PDMT further insists that once the particle hits the screen, to detect its state, it must first decohere (potentially again) on the detection screen. This imprints a mixed state realization of the incoming pure or decohered state on the screen, ρ → ∑_i P̂_i ρ P̂_i, which may be detected and collapse the state. In this sense, "collapse of the wavefunction" is better stated as "collapse of the mixed state", which then, as we will see, is nothing more than standard probability updating.
In preparation for the derivation of the Quantum Bayes Rule using the quantum maximum entropy method, the derivation of Bayes Rule using the standard maximum entropy method will be reviewed [1,21]. We will introduce the PDMT and apply the quantum maximum entropy method to derive the aforementioned cases of interest.

Maximum Entropy Method
Here we will loosely refer to ρ(x) as a probability distribution rather than a probability density, with the understanding that the probability of an event is actually ρ(x) dx. E. T. Jaynes is the originator of the maximum entropy method [23,24,25], but over the years his insights have been refined [2].
The relative entropy and quantum relative entropy were designed for the purpose of making inference in [1] by implementing a set of design criteria. The design criteria are guided by the "Principle of Minimum Updating" (PMU), which is the claim that a probability distribution should only be updated to the extent required by the information, while information itself is defined operationally as being what causes probability distributions to change. This pragmatic principle enforces objectivity in this method of inference. The functional form remaining after implementing the design criteria takes the form of a relative entropy,

S(ρ, ϕ) = −|A| ∫ dx ρ(x) log(ρ(x)/ϕ(x)) + C[ϕ],

where C[ϕ] is a constant independent of ρ and |A| ≠ 0 is an arbitrary but positive constant such that we are really maximizing the entropy. Maximizing this entropy with respect to a set of expectation value constraints ⟨A_i⟩ = ∫ dx ρ(x) A_i(x) and normalization via the Lagrange multiplier method is setting the variation

δ( S(ρ, ϕ) + λ[∫ dx ρ(x) − 1] + ∑_i α_i[∫ dx ρ(x) A_i(x) − ⟨A_i⟩] ) = 0,

where {λ, α_i} is the set of Lagrange multipliers that enforce their corresponding expectation value constraints. The probability distribution that maximizes the entropy for arbitrary variations in ρ(x) occurs when the terms within the parentheses vanish, and therefore one finds canonical-like solutions in general. As |A| is a nonzero constant, it may be absorbed into the Lagrange multipliers (λ, {α_i}), so we may let it equal unity without loss of generality. Writing the normalization Lagrange multiplier in the suggestive form Z = e^{−λ} (after absorbing the residual constant into λ) gives

ρ(x) = (1/Z) ϕ(x) exp( ∑_i α_i A_i(x) ),   with   Z = ∫ dx ϕ(x) exp( ∑_i α_i A_i(x) ).

The Lagrange multipliers are solved by evaluating their corresponding expectation value constraints, usually employing standard methods from Statistical Mechanics, ⟨A_i⟩ = ∂/∂α_i log(Z). One of the design criteria (DC1') used to derive the relative entropy in [1] enforces the Principle of Minimum Updating by stating, "in the absence of new information, the posterior distribution ρ is equal to the prior distribution ϕ". This is indeed the case if either no expectation value constraints are imposed, or if the imposed expectation value constraints are already satisfied by ϕ, in which case the introduced Lagrange multipliers are zero.
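The canonical solution and the multiplier equation ⟨A_i⟩ = ∂/∂α_i log(Z) can be checked numerically. The following is a minimal sketch (our illustration, not from [1]), assuming NumPy, with an illustrative uniform prior over five microstates and a single mean-value constraint whose multiplier is solved by bisection:

```python
import numpy as np

def maxent_update(phi, A, target, lo=-50.0, hi=50.0):
    """Return rho(x) = phi(x) exp(alpha A(x)) / Z with alpha chosen by
    bisection so that the constraint sum_x A(x) rho(x) = target holds."""
    def mean(alpha):
        w = phi * np.exp(alpha * A)
        return np.sum(A * w) / np.sum(w)   # <A> is monotone increasing in alpha
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    w = phi * np.exp(alpha * A)
    return w / np.sum(w), alpha

x = np.arange(5.0)                        # microstates 0, 1, ..., 4
phi = np.full(5, 0.2)                     # uniform prior with <x>_phi = 2
rho, alpha = maxent_update(phi, x, 3.0)   # impose the new datum <x> = 3
```

When the imposed expectation value already equals the prior mean, the solved multiplier is zero and ρ = ϕ, which is exactly the DC1' (PMU) statement above.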
A comment on biased priors: Entropic inference of this nature is only as useful as we are objective about our subjectivity. One should be careful not to apply nonsensical constraints, i.e. attempting to impose impossible expectation values. In such a case, the maximum entropy method provides "no solution" to the optimization problem due to its irrationality. If a set of microstates x ∈ D0 in a domain D0 is assigned a zero prior probability, ϕ(x ∈ D0|s) = 0, given some situation s, then it is impossible to update the posterior distribution to anything but ρ(x ∈ D0|s) = 0 for any amount of new information (as can be seen in (6)). In the same way, a delta function prior distribution ϕ(x|s) ∼ δ(x − x0), which claims complete certainty, cannot be updated. We call distributions that cannot be updated due to having poor priors "biased", as any amount of new information does not change the prescribed state of knowledge, ϕ(x|s) → ρ(x|s) = ϕ(x|s). A biased state of knowledge pertaining to a situation s does not imply bias for a new situation s′, so a realization that a nonzero (or non-infinite) probability should be assigned to the region D0 admits that the system is now in a new situation s′. An example of this from Statistical Mechanics (and also QM) occurs if the distance between the walls of an infinite potential box is enlarged such that previously zero probability regions now gain possibility. In this sense, and others, entropic updates are purely epistemic. These notions extend to density matrices as we will see later.
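The zero-prior and delta-prior remarks can be verified directly from the canonical form ρ(x) ∝ ϕ(x) e^{αA(x)}: wherever ϕ(x) = 0, the posterior is pinned to zero for every value of the multiplier. A short NumPy sketch (ours, with illustrative numbers):

```python
import numpy as np

A = np.array([5.0, 1.0, 2.0])
phi = np.array([0.0, 0.5, 0.5])      # zero prior probability on microstate 0

# rho(x) = phi(x) exp(alpha A(x)) / Z: no multiplier revives the zero region
w = phi * np.exp(10.0 * A)
rho = w / np.sum(w)                  # rho[0] is exactly 0 for any alpha

delta = np.array([0.0, 1.0, 0.0])    # "delta function" prior: complete certainty
wd = delta * np.exp(3.0 * A)
rho_delta = wd / np.sum(wd)          # identical to the prior; it cannot be updated
```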

Maximum Entropy and Bayes
When the information provided is in the form of data, entropic updating is consistent with Bayes Rule,

ρ(θ) = ϕ(θ|x′) = ϕ(θ) ϕ(x′|θ)/ϕ(x′),

where Bayes Rule is the first equal sign and Bayes Theorem is the second equal sign [2]. This leads to the realization that Bayesian and entropic inference methods are consistent with one another [21,22]. The posterior distribution ρ(θ) can only be realized once the data about the x's has been processed. This implies the state space of interest is the product space X × Θ with a joint prior ϕ(x, θ). Suppose we collect data and observe the value x′. The data constrains the joint posterior distribution ρ(x, θ) to reflect the fact that the value of x is known to be x′, that is, ρ(x) = ∫ dθ ρ(x, θ) = δ(x − x′); however, this data constraint is not enough to specify the full joint posterior distribution, because ρ(θ|x) is not determined.
As there are many distributions that satisfy this data constraint, we rank the distributions using the relative entropy. Note that the data constraint (8) in principle constrains each x in ρ(x, θ), so a Lagrange multiplier α(x) is required to tie down each x ∈ X of the marginal distribution ρ(x). Maximizing the entropy with respect to this constraint and normalization, where λ is the Lagrange multiplier imposing normalization, leads to the following joint posterior distribution,

ρ(x, θ) = (1/Z) ϕ(x, θ) e^{α(x)}.

The Lagrange multiplier Z is found by imposing normalization, Z = ∫ dx dθ ϕ(x, θ) e^{α(x)}, and the Lagrange multiplier α(x) is found by considering the data constraint, e^{α(x)} = Z δ(x − x′)/ϕ(x). Substituting this solution into the joint posterior distribution gives ρ(x, θ) = δ(x − x′) ϕ(θ|x). Integrating over x gives the marginalized posterior distribution ρ(θ) = ϕ(θ|x′) = ϕ(θ) ϕ(x′|θ)/ϕ(x′), which is Bayes Rule. Generalizations of Bayes Rule, such as Jeffrey's Rule when the data (and constraint) is uncertain, ∫ dθ ρ(x, θ) = ρ_D(x), are also consistent with the method of maximum entropy (further review can be found in [2,21,22]). The universality of this entropic inference method is emphasized by its consistency with other forms of inference like Bayesian inference.
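For discrete variables, the chain of substitutions above reduces to picking out the x′ row of the joint prior and renormalizing, which can be checked against Bayes Rule directly. A small NumPy sketch (ours) with an illustrative 2×2 joint prior:

```python
import numpy as np

# joint prior phi(x, theta); rows index x, columns index theta
phi = np.array([[0.10, 0.20],
                [0.30, 0.40]])
x_obs = 1                                    # the observed datum x'

# entropic update with rho(x) = delta_{x x'}: keep the x' row, renormalize
rho = np.zeros_like(phi)
rho[x_obs] = phi[x_obs] / phi[x_obs].sum()   # rho(x', theta) = phi(theta|x')
rho_theta = rho.sum(axis=0)                  # marginal posterior over theta

# Bayes Rule directly: rho(theta) = phi(theta) phi(x'|theta) / phi(x')
phi_theta = phi.sum(axis=0)
bayes = phi_theta * (phi[x_obs] / phi_theta) / phi[x_obs].sum()
```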

Quantum Maximum Entropy Method
The derivation of the quantum relative entropy parallels the derivation of the standard relative entropy because the same design criteria were applied in both cases, but this time to the ranking of density matrices [1]. The form of the quantum relative entropy that saturates the design criteria is

S(ρ, φ) = −|A| S_U(ρ, φ) + C[φ],

where S_U(ρ, φ) = Tr(ρ log ρ − ρ log φ) is Umegaki's form of the quantum relative entropy. Similarly, maximizing this entropy with respect to a set of expectation values of Hermitian operators {Â_i} (i.e. Tr(Â_i ρ) = ⟨Â_i⟩) and normalization is setting the variation

δ( S(ρ, φ) + λ[Tr(ρ) − 1] + ∑_i α_i[Tr(Â_i ρ) − ⟨Â_i⟩] ) = 0.

Considering arbitrary variations of ρ and using the cyclic property of the trace, after absorbing |A| into the Lagrange multipliers again, gives the solution

ρ = (1/Z) exp( ∑_i α_i Â_i + log(φ) ),   with   Z = Tr exp( ∑_i α_i Â_i + log(φ) ),

where normalization may be factored out of the exponential due to the universal commutativity of the identity operator. The remaining problem is to solve for the n Lagrange multipliers using their n associated expectation value constraints. In principle their solution is found by computing Z, using standard methods from Statistical Mechanics, and inverting to find α_i = α_i(⟨Â_i⟩). Using these methods, the relevant thermodynamic and quantum information theoretic results in [26] that stem from the quantum relative entropy may be reproduced and rephrased as applications of inference. Between the Zassenhaus formula and Horn's inequality, in general the solutions to (20) lack a certain calculational elegance due to the difficulty of expressing the eigenvalues of Ĉ = log(φ) + ∑_i α_i Â_i in simple terms of the eigenvalues of the Â_i's and φ when the matrices do not commute. The solution requires solving the eigenvalue problem for Ĉ, such that the exponential of Ĉ may be taken and evaluated in terms of the eigenvalues of the α_i Â_i's and the prior density matrix φ. It is at this point that the review of the relevant material has concluded.
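The quantum solution ρ = exp(∑_i α_i Â_i + log φ)/Z and the relation ⟨Â_i⟩ = ∂/∂α_i log(Z) can be verified numerically even when the constraint does not commute with the prior. A minimal sketch (ours), assuming NumPy, using an illustrative σ_z-diagonal prior and a σ_x constraint:

```python
import numpy as np

def expm_h(C):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.exp(w)) @ V.conj().T

phi = np.diag([0.75, 0.25])               # prior, diagonal in the sigma_z basis
sx = np.array([[0.0, 1.0], [1.0, 0.0]])   # sigma_x constraint; [phi, sx] != 0
log_phi = np.diag(np.log(np.diag(phi)))

def posterior(alpha):
    R = expm_h(log_phi + alpha * sx)      # rho = exp(alpha sx + log phi) / Z
    return R / np.trace(R), np.trace(R)

alpha = 0.3
rho, Z = posterior(alpha)

# check <sigma_x> = d log(Z) / d alpha by central finite differences
h = 1e-6
dlogZ = (np.log(posterior(alpha + h)[1]) - np.log(posterior(alpha - h)[1])) / (2 * h)
```

Setting α = 0 returns ρ = φ, consistent with DC1'.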

Prior density matrices
If the prior density matrix φ = φ² is a projector, then we consider it to be a "biased" density matrix, because no amount of information can update the posterior density matrix, i.e. φ → ρ = φ, using entropic methods. An example using spin is discussed below to introduce the notion.
Consider the biased prior density matrix φ = |+⟩⟨+|, the positive spin-z eigenstate. To perform calculations with any rigor using this biased prior, we must unbias it slightly by considering something like φ = lim_{ǫ→0} [(1 − ǫ)|+⟩⟨+| + ǫ|−⟩⟨−|] ≡ lim_{ǫ→0} φ_ǫ. We will use φ_ǫ for the prior, and then take the limit ǫ → 0 when appropriate. In attempting to force the issue, consider maximizing the relative entropy with respect to ⟨b · σ⟩ such that ρ requires nonzero components along spin down |−⟩⟨−|, in contrast to φ. Maximizing the entropy with respect to this constraint, normalization, and the biased prior gives ρ_ǫ = (1/Z) exp(Ĉ_ǫ). The Lagrange multiplier which imposes normalization may be found by diagonalizing the exponent Ĉ_ǫ → Λ_ǫ, suggesting a convenient representation of the posterior density matrix using the diagonalizing unitary Û_ǫ. In the limit ǫ → 0, the respective eigenvalues of Ĉ_ǫ, λ_±, approach 0 and −∞ while their respective eigenvectors straighten out, |λ_±⟩_ǫ → |±⟩, and Û_ǫ → 1. Therefore the posterior density matrix ρ = lim_{ǫ→0} ρ_ǫ = φ is equal to the biased prior density matrix and has not updated, as φ_ǫ → φ. Because the pure state fails to update, it is biased, analogous to a delta function probability distribution. The prior density matrix takes precedence as it does in the standard relative entropy case. Below we will discuss the general case and its implications.

Consider an Mth order biased prior represented in its eigenbasis, φ = ∑_{n=1}^{M} ϕ_n|ϕ_n⟩⟨ϕ_n| + ∑_{n=M+1}^{N} 0_n|ϕ_n⟩⟨ϕ_n|, in an N dimensional Hilbert space (M = 1 is a pure state). Given an N × N dimensional constraint Â (however the analysis holds for Â of any rank), the prescription is to add and subtract some ǫ's from φ such that φ → φ_ǫ spans N, and log(φ_ǫ) has N − M diagonal log(ǫ) terms. Because density matrices are Hermitian and have a sum representation, ρ = ∑_{ij} ρ_{ij}|i⟩⟨j|, they may always be rearranged and relabeled into the form above without loss of generality. Note that this construction may not have Tr(φ_ǫ) = Tr(φ), but there is no formal issue because equality holds in the limit ǫ → 0. Because log(φ_ǫ) may always be reorganized as above, in general it may be written as log(φ_ǫ) = log(φ_M) ⊕ log(ǫ) 1_{N−M}, where log(φ_M) is the first M × M block of log(φ_ǫ) and log(ǫ) 1_{N−M} is the remaining block proportional to log(ǫ). Expressing the N × N constraint matrix Â = ∑_i α_i Â_i in the eigenbasis of φ_ǫ and summing gives Ĉ_ǫ = log(φ_ǫ) + Â, which is a general representation of the matrix residing in the exponential of a posterior density matrix, ρ_ǫ = (1/Z) exp(Ĉ_ǫ), having an Mth order biased prior density matrix φ_ǫ. If we similarly partition Ĉ_ǫ by letting Ĉ_M be its first M × M block, then the characteristic polynomial equation of Ĉ_ǫ has the form (log(ǫ) − λ)^{N−M} det(Ĉ_M − λ 1_M) + ∑_n c_n λ^n = 0, where the c_n's are the remaining coefficients of the characteristic polynomial. For any finite λ, we may divide the characteristic equation by the leading (log(ǫ) − λ)^{N−M} ≈ log(ǫ)^{N−M} term, which in the limit ǫ → 0 reduces the characteristic equation to the M × M block characteristic equation, det(Ĉ_M − λ 1_M) = 0, for all finite λ. The eigenvectors associated to these M finite eigenvalues span the M × M vector space. As this is true for all finite eigenvalues, the remaining N − M eigenvalues are not finite and indeed are all equal to negative infinity, due to the log(ǫ)'s as ǫ → 0.

The remaining eigenvectors with the associated infinite eigenvalues therefore span the remaining (N − M) × (N − M) vector space, but are not unique because they have degenerate eigenvalues. The eigenvectors for the finite and infinite eigenvalues are disjoint, and may be partitioned by a direct sum λ = λ_M ⊕ λ_{N−M}; therefore so are the unitary matrices which diagonalize them, Û = Û_M ⊕ Û_{N−M}, as the unitary operators consist of columns of their associated eigenvectors. This disjointness is independence in the sense that the unitary operator Û is block diagonal. The posterior density matrix is therefore completely independent of the Â − Â_M pieces of the constraints in Â, and φ_M = φ is the original Mth order biased prior. This means the expectation values used to constrain ρ should really be independent of the Â − Â_M pieces to guarantee a logical solution. The lack of updating of biased priors is not a failure of the method of maximum entropy, but rather a failure to choose appropriate constraints given an Mth order biased prior density matrix. Essentially, this constraint and prior density matrix conflict and have no solution unless we change Â → Â_M such that Tr(Âρ) = ⟨A⟩ → Tr(Â_M ρ) = ⟨A_M⟩.
In general, any prior density matrix that does not span the entire Hilbert space is an Mth order biased prior density matrix. This insists on the following, which we state as a theorem: Prior Density Matrix Theorem (PDMT): An Mth order biased prior density matrix φ can only be inferentially updated in the eigenspace that it spans. Regions not spanned by the Mth order biased prior density matrix remain zero.
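The PDMT can be illustrated numerically by regularizing a pure-state prior as in the spin example above: as ǫ → 0, the posterior weight outside the prior's support vanishes no matter the constraint strength. A short NumPy sketch (ours, with an illustrative multiplier α = 1):

```python
import numpy as np

def expm_h(C):
    # matrix exponential of a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.exp(w)) @ V.conj().T

sx = np.array([[0.0, 1.0], [1.0, 0.0]])   # constraint pushing toward |-><-|
alpha = 1.0
leak = []
for eps in [1e-10, 1e-20, 1e-30]:
    # phi_eps = (1 - eps)|+><+| + eps|-><-| in the z basis
    log_phi = np.diag([np.log1p(-eps), np.log(eps)])
    R = expm_h(log_phi + alpha * sx)
    rho = R / np.trace(R)
    leak.append(rho[1, 1])                # posterior weight outside the support
```

The leaked weight shrinks toward zero as ǫ → 0, recovering ρ = |+⟩⟨+| = φ.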
The immediate consequence of the PDMT is that entropic updating can only cause epistemic and inferential changes to ρ. The inability to update a pure state, as in the pure state spin |+⟩⟨+| example, shows just this. The only way to change the state of |+⟩⟨+| is to physically rotate the state by applying dynamical unitary operators Û(t′, t) via the Schrödinger equation, because no inferential entropic update is possible. Once the Quantum Bayes Rule (general collapse) is derived using entropy, we will see that the Schrödinger equation and the quantum maximum entropy method complement one another in QM: the first is responsible for continuous dynamical "physical" changes to the system, and the second for discontinuous inferential updates, which cannot be explained by unitary evolution of the Schrödinger equation. In some sense, the answer to questions like, "What is the probability a spin up particle along z is up along x?", is zero (at that time) unless it is further specified that something like a Stern-Gerlach device has been used to spatially separate (and decohere) the spin ±x values such that they may be detected at a later time (in which case the answer is ρ(±x|t′) = 1/2). The Born Rule ρ(x) = |Ψ(x)|² seems to carry a lot of linguistic and experimental baggage if it is to be interpreted correctly. This is because detection, collapse, and entropic inference can only occur if first the pure state is projected into a mixed state, ρ → ∑_i P̂_i ρ P̂_i, by the appropriate measurement device. While it is possible for ρ and ∑_i P̂_i ρ P̂_i to have (in some basis) an identical probability spectrum, the two density matrices may evolve differently in time, and in that sense, represent different physical situations.
If one is serious about the assignment of a biased prior density matrix, then the following realization is needed: "Because the prior density matrix is biased, the quantum maximum entropy method cannot update to a new posterior at that time". If however your prior density matrix is changed physically by the addition of new microstates via interaction, allowing it to decohere [27] (and the references therein), or change by some other process, then at a later time one could employ a method similar to [28,29], that is, apply φ^{−1/2} φ′^{1/2}(t) and its transpose on either side of φ ≡ φ(0), to represent a new prior density matrix that in general may be decohered, represent a new experimental configuration, or result from a unitary evolution. Now, if the prior is unbiased, it is possible to inferentially update it non-trivially.
There are a few things to take away from this section. The quantum maximum entropy method only updates a density matrix inferentially, as can be seen by its inability to rotate biased priors into non-biased states or other biased prior states. This is exactly what we expect, as the problem of biased priors exists in standard probability theory. The solution to the biased prior problem is, if appropriate: to change the constraint, change the prior, or perhaps both. This reasoning guides us in choosing appropriate priors in subsequent derivations throughout this paper.

The Quantum Bayes Rule

Notationally, we denote a density matrix living in a Hilbert space H_x ⊗ H_θ by ρ_{x,θ}. Density matrices may of course be expressed in any basis within these Hilbert spaces. We find it convenient to denote the x′, x′ block matrix of ρ_{x,θ} with an equal sign, such that ⟨x′|ρ_{x,θ}|x′⟩ ≡ ρ_{x=x′,θ} and similarly ⟨θ′|ρ_{x,θ}|θ′⟩ ≡ ρ_{x,θ=θ′}. Also, a tilde above a density matrix will represent a mixed representation of the density matrix in question, φ_θ → ϕ̃_θ. We introduce the Quantum Bayes Rule and then derive it using the quantum maximum entropy method, as well as some generalizations.
Introduction - Quantum Bayes Rule: Following [14], consider a prior density matrix φ_θ which is entangled with an ancilla such that φ_θ → φ_{x,θ}. The system and the ancilla are entangled in the following way: given an initial state of the ancilla |0⟩⟨0|_x, the joint system is entangled with a unitary operator Û whose sub-block matrices are Â_{x′x} ≡ ⟨x′|Û|x⟩ [14]. The prior density matrix of the joint system is therefore

φ_{x,θ} = Û(|0⟩⟨0|_x ⊗ φ_θ)Û† = ∑_{x′,x″} |x′⟩⟨x″| ⊗ Â_{x′} φ_θ Â†_{x″},

where Â_{x′0} ≡ Â_{x′} are defined as the measurement operators of the POVM Ê_x = Â†_x Â_x. Due to Neumark's theorem, making a projective measurement of the ancilla x is a positive operator valued measurement (POVM) on φ_θ [30]. Projecting the ancilla (which is more or less collapse in the sense of Lüders) requires the action φ_{x,θ} → (|x′⟩⟨x′| ⊗ 1) φ_{x,θ} (|x′⟩⟨x′| ⊗ 1), which implies the new state of the system is, after normalizing,

ρ_θ = Â_{x′} φ_θ Â†_{x′} / Tr(Â†_{x′} Â_{x′} φ_θ),

which is known as the Quantum Bayes Rule (QBR) [11,12,13], the fundamental theorem of quantum measurement [14], or the POVM formalism [15,16,17,18]. In the remainder of this section we will derive the Quantum Bayes Rule and other inference rules using the quantum maximum entropy method.
Simple Collapse: This entropic update is a special case of (36) when the Â_x's are all projectors rather than a more general POVM. As we are simply doing a projective measurement on φ_x, an(other) ancilla is not needed to generate the POVM; however, a projective measurement on the x's requires entangling φ_x with detector states and letting them decohere within the detector. For concreteness we may imagine that φ_x represents the pure state density matrix of a particle that went through a two slit apparatus (no which slit measurement has been made) and is impinging onto a screen, CCD array, or the like. The pure state evolves with the detector states (as above), and tracing over the detector states {|d_i⟩} to represent projective measurement on x (before detection but after interaction with the measurement device) gives a mixed state realization of the original two slit pure state (ϕ(x) = ϕ̃(x)). This is in no way original and may be obtained following [10] using projectors, φ → ϕ̃_x(t) = ∑_x P̂_x φ_x P̂_x, or more directly [27].
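The decohering action of the which-slit device, φ → ∑_x P̂_x φ_x P̂_x, can be made concrete: for the two-slit superposition, the projectors kill the off-diagonal (interference) terms while leaving the diagonal probabilities intact. A minimal NumPy sketch (ours):

```python
import numpy as np

psi = np.array([1.0, 1.0]) / np.sqrt(2)   # (|1> + |2>)/sqrt(2), two-slit superposition
phi = np.outer(psi, psi)                  # pure-state prior, phi = phi^2

# action of the which-slit device: phi -> sum_x P_x phi P_x
projs = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
phi_mixed = sum(P @ phi @ P for P in projs)
```

The slit probabilities on the diagonal are unchanged, but ϕ̃ is no longer pure, so by the PDMT it can now be inferentially updated.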
In principle, when the detection of the result of a projective measurement (ϕ̃_x) is made, the state of the system is known with certainty. This is represented by the following constraint on the posterior probability distribution, ρ(x) = Tr(|x⟩⟨x| ρ_x) = δ_{xx′}, which is an expectation value on the posterior density matrix ρ_x, stating that the system was detected in the x′ state with certainty. Because this constraint must be imposed for every x, there is one Lagrange multiplier α_x for each x. Maximizing the quantum relative entropy with respect to this constraint and normalization gives the posterior,

ρ_x = (1/Z) exp( ∑_x α_x|x⟩⟨x| + log(ϕ̃_x) ).

Because the constraint and prior commute, the posterior density matrix takes a simple form, ρ_x = (1/Z) ∑_x e^{α_x} ϕ̃(x)|x⟩⟨x|. The normalization constraint gives Z = ∑_x e^{α_x} ϕ̃(x), and the expectation value constraint (39) gives e^{α_x} ϕ̃(x)/Z = δ_{xx′}. The final form of the posterior density matrix is found by substituting for α_x, and the result is a collapsed state, ρ_x = |x′⟩⟨x′|, as suspected. Written in a suggestive "Bayes update" or "projective collapse" form, ρ_x = P̂_{x′} ϕ̃_x P̂_{x′}/Tr(P̂_{x′} ϕ̃_x), it perhaps better meshes Lüders' strong collapse rule and the QBR. Note that the tilde on ϕ̃_x indicates that it is the appropriate prior for inference, as the state has decohered in the detector. Although ϕ̃_{x=x′} and φ_{x=x′} are numerically equal, substitution of the latter above is incorrect because φ_x has yet to decohere and cannot be inferentially updated due to the PDMT. Although this is perhaps a bit fussy, it provides another reason why secure channels exist in quantum cryptography: the statistics and dynamics of a quantum system change when it is measured (φ_x → ϕ̃_x) because the state must decohere before it is inferentially updated (ϕ̃_x → ρ_x) due to the PDMT. Above is the special case of the Quantum Bayes Rule (1) when the measurements are projective. Note that this derivation does not require first solving for the "weak" collapse and taking the limit, as is done in [19] to avoid infinite relative entropies. This is because [1] gives the general solution for ρ (equation (19)) while also providing the quantum maximum entropy method for making inferential updates of density matrices.

Simple Weak Collapse:
A form of weak collapse may be found by considering a system that has a certain probability of being in one state or another (perhaps due to measurement uncertainty) after detection. Given the same prior density matrix ϕ̃_x, we maximize the entropy with respect to a set of constraints {ρ(x) = Tr(|x⟩⟨x|ρ_x)} to codify a lack of certainty in the state (perhaps a narrow Gaussian distribution rather than the exact knowledge in (39)). Maximizing the entropy with respect to these constraints and normalization again gives the posterior, ρ_x = (1/Z) ∑_x e^{α_x} ϕ̃(x)|x⟩⟨x|, because all the matrices and projectors commute. The normalization constraint gives Z = ∑_x e^{α_x} ϕ̃(x). Satisfying the remaining expectation value constraint (ρ(x) = Tr(|x⟩⟨x|ρ_x)) gives e^{α_x} = Z ρ(x)/ϕ(x) for each x, and therefore ρ_x = ∑_x ρ(x)|x⟩⟨x|, which is a weak collapse, or perhaps a quantum Jeffrey's rule, in agreement with [19].
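The weak collapse rule can be checked by direct substitution of e^{α_x} = Zρ(x)/ϕ(x): since everything commutes, the exponential acts entrywise on the diagonal, and the posterior diagonal is exactly the imposed distribution ρ(x). A short NumPy sketch (ours, with illustrative numbers):

```python
import numpy as np

phi_diag = np.array([0.7, 0.3])           # decohered prior: diagonal of phi_x
target = np.array([0.4, 0.6])             # uncertain data: the desired rho(x)

# e^{alpha_x} = Z rho(x)/phi(x); with Z absorbed, the commuting exponent is
exponent = np.log(target / phi_diag) + np.log(phi_diag)
rho = np.diag(np.exp(exponent))           # everything commutes: exp acts entrywise
rho = rho / np.trace(rho)
```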
The Appropriate Prior for the QBR: The problem at hand requires knowledge of the correct prior density matrix for inference. Notice that if φ_θ is an Mth order biased prior, then the entangled prior φ_{x,θ} = Û(|0⟩⟨0|_x ⊗ φ_θ)Û† is also an Mth order biased prior, meaning that φ_{x,θ} can only be inferentially updated in that subspace (which may or may not be desirable). This is especially problematic if M = 1, such that φ_{x,θ} = φ²_{x,θ} is a pure state, because it cannot be updated at all.
We therefore follow the intuition given by the PDMT: if we are going to make inferences on the basis of detection, the prior density matrix should appropriately reflect the fact that it has interacted with a measurement device. This interaction will be modeled by entangling the ancilla with detector states {|d_y⟩}, which act as local environment states within the detector, via a unitary evolution (following [27] and the notation in [14], but a simple projection argument from Lüders on the ancilla states of φ_{x,θ} would also suffice). We define a good detector as one in which the |x⟩th ancilla state only entangles with the local state of the detector |d_x⟩, which is an argument for the sub-block matrices to take a simple form. The entangled density matrix becomes

φ_{d,x,θ} = ∑_{x,x′} |d_x⟩⟨d_{x′}| ⊗ |x⟩⟨x′| ⊗ Â_x φ_θ Â†_{x′}.

These local environment detector states in which the ancilla resides are traced over, as we do not keep track of their evolution. An example of this would be an ancilla which terminates on a photosensitive sheet; we obviously do not keep track of the state of the sheet. This is to say, a small period of time after the projective measurement has been made, the ancilla states transition to a mixed state, which gives a standard (classical) probability distribution of the ancilla states over the detector. The prior density matrix after a projective measurement has been made is thus a block diagonal sum of states,

ϕ̃_{x,θ} = ∑_x |x⟩⟨x| ⊗ Â_x φ_θ Â†_x,

which we claim is the appropriate density matrix for POVM inference. This form of the prior is no longer biased, even if φ_θ is itself biased. If all of the Â_x's are projectors, then this prior represents the resulting mixed state from a detector interacting with a potentially entangled pure state such as |Ψ⟩ = ∑_x c_x|x, θ_x⟩. As is done in [10], the action of the measurement device causes φ_{x,θ} → ϕ̃_{x,θ}, but here this change of state is required to make inferential updates due to the PDMT, complementing Lüders.
The constraints leading to QBR: Detecting the (exact) result of a projective measurement on the ancilla state x puts the posterior ancilla into a collapsed state |x′⟩. This is represented by a posterior probability distribution (data) expectation value constraint, Tr(ρ_{x,θ} (|x⟩⟨x| ⊗ 1)) = ρ(x), for the case when the final state of the ancilla is known. Notice that this information alone is not enough to fully constrain ρ_{x,θ}, as there are many ρ_{x,θ} which satisfy that constraint. We therefore employ the quantum maximum entropy method and impose normalization and this data constraint, with respect to the appropriate prior ϕ̃_{x,θ}. The expectation value constraint forces the multipliers to satisfy e^{α_x} = Zρ(x). In the case the data is known exactly, ρ(x) = δ_{xx′}, which fixes the Lagrange multiplier. Substituting in for the Lagrange multipliers gives the final form of the posterior density matrix ρ_{x,θ}, such that the marginal posterior ϱ_x is interpreted as the posterior density matrix of the ancilla after a complementary Â_{θ′} measurement operator has been applied and θ′ has been detected. The conditional priors agree, and because ϕ(x) = ϑ(x) and ϕ(θ) = ϑ(θ), the joint posterior density matrices ϑ̃_{x,θ} and ϕ̃_{x,θ}, and their posterior marginals, differ in how they will evolve in time. It is also possible to make inferences on a prior state in which both Hilbert spaces have decohered, φ_{xθ} → Σ_{x,θ} ϕ(x, θ)|x, θ⟩⟨x, θ|, which has correlations due to the previous entanglement but is no longer entangled. Because the use of appropriate measurement devices leads to ϕ(x, θ) = ϑ(x, θ), there is no interpretational issue in the delayed choice experiment: collapse only occurs after detection and decoherence of both the ancilla x and the system of interest θ. The time order of the decoherence becomes irrelevant because the joint probabilities are equal; a similar argument is given in [31]. Essentially what has happened in the delayed choice experiment is that you do not know whether you have performed a "which slit" measurement or not, which is like having a "mixed state of measurement outcomes", but this is precisely what a POVM measurement represents.
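The exact-data case can be sketched numerically under illustrative assumptions (a randomly generated prior, not the paper's operators): once the prior has been block-diagonalized by the measurement device, detecting x′ with certainty, ρ(x) = δ_{xx′}, reduces the maximum-entropy posterior to the renormalized x′ block, i.e. Lüders' projective update.

```python
import numpy as np

# A generic full-rank prior on (ancilla ⊗ system), 2x2 blocks; illustrative only.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
phi = A @ A.conj().T
phi /= np.trace(phi).real

# Projectors P_x = |x><x| ⊗ 1 onto the ancilla states.
I2 = np.eye(2)
P = [np.kron(np.diag(e), I2) for e in ([1.0, 0.0], [0.0, 1.0])]

# Measurement-device interaction (PDMT): block-diagonal POVM prior.
phi_tilde = sum(Px @ phi @ Px for Px in P)

# Exact datum rho(x) = delta_{x,x'} with x' = 1: the quantum-Bayes posterior
# is the renormalized x' block of the prior.
xp = 1
rho = P[xp] @ phi_tilde @ P[xp] / np.trace(P[xp] @ phi_tilde).real
```

The posterior has unit trace and support only in the detected ancilla block, as the collapse rule requires.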
Weak collapse via thermal baths: Rather than detecting the result of a projective measurement on the ancilla state, we consider the weak measurement POVM one would obtain if the ancilla is sent into a thermal box, as it can be naturally generated in the quantum maximum entropy method. Here we let the Hilbert space H_x of the ancilla be spanned by {|n⟩}, the energy eigenstates of the ancilla in the thermal box with Hamiltonian Ĥ = Σ_n ǫ_n |n⟩⟨n|. The joint prior density matrix is prepared similarly to the above. The energy expectation value ⟨Ĥ⟩ is used to represent the constraint of an ancilla in a thermal box. Again, notice that this information alone is not enough to fully constrain ρ_{n,θ}, as there are many ρ_{n,θ} which satisfy that constraint. We therefore require the quantum maximum entropy method; that is, we maximize the quantum relative entropy with respect to normalization, this constraint, and the POVM prior. The expectation value constraint determines the multiplier, meaning one can solve β = β(⟨Ĥ⟩) by inverting the constraint equation after computing Z, as is done in Statistical Mechanics. The marginal posterior is a realization of the "weak" collapse rule using thermalization, in which the outcome state ρ_θ of the system may be controlled (in the usual sense) by forcing the ancilla into a box with inverse temperature β or β′, as this changes the statistics of the distant weak POVM: ρ(θ) = Tr(|θ⟩⟨θ| ρ_θ).
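The inversion β = β(⟨Ĥ⟩) proceeds exactly as in statistical mechanics. A small sketch with an illustrative four-level spectrum (the energies and the target expectation value are placeholders, not from the paper): compute Z(β), form ⟨Ĥ⟩(β), and invert by bisection, which works because ⟨Ĥ⟩ is monotonically decreasing in β.

```python
import numpy as np

# Illustrative ancilla spectrum eps_n for the thermal-box constraint.
eps = np.array([0.0, 1.0, 2.0, 3.0])

def Z(beta):
    """Partition function Z(beta) = sum_n exp(-beta * eps_n)."""
    return np.exp(-beta * eps).sum()

def mean_energy(beta):
    """<H>(beta) under the Gibbs weights exp(-beta * eps_n) / Z."""
    p = np.exp(-beta * eps) / Z(beta)
    return (p * eps).sum()

# Invert <H> = 1.2 for beta by bisection; d<H>/dbeta = -Var(H) < 0, so
# mean_energy is strictly decreasing and the root is unique.
target, lo, hi = 1.2, 1e-6, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > target:
        lo = mid          # energy too high: need a larger beta (colder)
    else:
        hi = mid
beta = 0.5 * (lo + hi)
```

This is the same computation one performs for a canonical ensemble; here it fixes the Lagrange multiplier of the weak-collapse posterior.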

Generalizations:
General inferences of ρ on the basis of a prior state of knowledge φ and arbitrary expectation value constraints {⟨Âᵢ⟩} give the following general updating rule, ρ = (1/Z) exp(Σᵢ αᵢ Âᵢ + log φ), from φ → ρ in light of new information about {⟨Âᵢ⟩}. This is of course the general solution to the quantum maximum entropy method, but now it is clear it may be interpreted as the solution for general purpose inference when applied correctly. As commutation was used in the previous QBR and QJR, inferences involving expectation values of non-commuting operators generalize these rules; for instance, "simultaneously" imposing ⟨x̂⟩ and ⟨p̂⟩ gives ρ = (1/Z) exp(α x̂ + β p̂ + log φ). The solution is found by diagonalizing the exponential and using the methods from Statistical Mechanics.
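A minimal numerical sketch of this general solution, with σ̂_x and σ̂_z standing in for the text's x̂ and p̂ so that the exponent is finite-dimensional, and with hand-picked multipliers rather than ones solved from given expectation values (all choices here are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm, logm

# General update rho = (1/Z) exp(sum_i alpha_i A_i + log phi), here with two
# non-commuting constraint operators sigma_x and sigma_z.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
phi = np.diag([0.7, 0.3])            # full-rank prior, so log(phi) is defined

alpha, beta = 0.4, -0.2              # illustrative multipliers, chosen by hand
M = expm(alpha * sx + beta * sz + logm(phi))
rho = M / np.trace(M)

# In practice alpha and beta would be tuned (after diagonalizing the
# exponent, as in Statistical Mechanics) until Tr(rho sx) and Tr(rho sz)
# match the imposed expectation values.
```

The output ρ is automatically a valid density matrix (Hermitian, positive, unit trace) for any real multipliers, which is one of the practical virtues of the exponential-family form.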
An odd prescription might be: given a prior density matrix which has decohered, or is being measured, in the Â basis (due to the PDMT), impose an expectation value constraint ⟨B̂⟩ on an operator that need not commute with the prior; the resulting constraint equation may be inverted to find α = α(⟨B̂⟩), and thus the posterior ρ.
Perhaps the simplest example of such a situation is to start from a mixed prior density matrix in spin-z and then maximize the quantum relative entropy with respect to an expectation value in spin-x to infer the posterior density matrix. The quantum maximum entropy method reproduces the well known solution to this problem, which is usually reasoned by appropriately weighting the eigenvalues of ρ (in the x basis) such that ⟨σ̂_x⟩ is satisfied (the solution is completely determined due to normalization). The quantum maximum entropy method may be extended to cases in which the system of equations is under-determined, e.g. normalization together with a single expectation value constraint on a Gell-Mann operator λ̂ᵢ that does not commute with its 3 × 3 (mixed) prior density matrix.
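For concreteness, here is a sketch of the simplest instance of this example, assuming the maximally mixed prior φ = 1/2 (diagonal in spin-z and chosen so a closed form exists; the target value m is a placeholder). Since log φ is then proportional to the identity, the multiplier solves in closed form to α = arctanh(m), and the posterior reproduces the familiar eigenvalue weighting (1 ± m)/2 in the x basis.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
m = 0.6                                # target <sigma_x>; illustrative value

# rho = exp(alpha*sx + log(I/2)) / Z; log(I/2) commutes with everything,
# so <sigma_x> = tanh(alpha) and alpha = arctanh(m) in closed form.
alpha = np.arctanh(m)
M = expm(alpha * sx) / 2.0             # exp(alpha*sx) * exp(log(I/2))
rho = M / np.trace(M)

# The posterior is the usual eigenvalue-weighted answer (I + m*sx)/2.
print(np.allclose(rho, (np.eye(2) + m * sx) / 2.0))   # True
```

A biased diagonal prior (p ≠ 1/2) no longer commutes with σ̂_x, and the multiplier must then be found numerically as in the general case above.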

Conclusions:
In this article we applied the quantum maximum entropy method and derived the Lüders collapse (and weak collapse) rules, the QBR, the QJR, and also a method for computing inferential generalizations when expectation values do not commute. In doing so we eliminated ad hoc collapse postulates in QM by using the quantum maximum entropy method [1]. As is demonstrated by the arguments leading up to the PDMT, because Mth order biased priors may only be inferentially updated (full or partial collapse) within the M dimensional Hilbert subspace they span, the phrase "collapse of the wavefunction" should be replaced by "collapse of the mixed state". A simple consequence: because an M = 1 biased prior (a pure state) cannot be updated inferentially, as given by the PDMT, entropic methods provide a reason why pure states are "secure channels": any eavesdropper would have to decohere the pure state to make inferences, and thereby change the statistics of the original state. In essence, the PDMT is a rediscovery of Lüders' notion that the application of a measurement device is to mix the incoming state, ρ → Σᵢ P̂ᵢ ρ P̂ᵢ, except here it was derived purely from inferential and entropic arguments. The quantum maximum entropy method and the PDMT have made rigorous some notions and applications of quantum measurement, such that future applications have a more fully developed representation in Quantum Theory.

Since ϑ(θ|x) = ϕ(θ|x), and likewise ϑ(x|θ) = ϕ(x|θ), we see all of the probability relationships hold and may be used interchangeably. It should also be noted that in general ϑ̃_{x|θ′} ≠ ϕ̃_{x|θ′}, because their off-diagonal components may differ; however, you may express ϑ̃_{x|θ′} in terms of the ϕ probability distributions and vice versa.