Semi-Density Matrices and Quantum Statistical Inference

In this paper, inspired by the ‘Minimum Description Length Principle’ in classical statistics, we introduce a new method for predicting the outcomes of performing quantum measurements and for estimating the state of quantum systems.


Introduction
Nearly all of our current physical knowledge rests on quantum theory, so characterizing quantum systems and obtaining information about them is an increasingly important problem. Quantum Statistical Inference (QSI), the quantum version of classical statistical inference, is a key tool for this problem: it enables us to obtain information about quantum systems from the outcomes of quantum measurements. The subject was initiated in the middle of the 1960s; its pioneers include Holevo, Yuen, Kennedy, Belavkin, and others. Since then, many researchers have studied the subject and extended it in different directions. Among other things, QSI contains quantum estimation and quantum prediction, which are considered in this paper. The only tool at our disposal for treating these problems is performing measurements. Since quantum theory is statistical in nature, we have to perform the same quantum measurement in the same state of the quantum system many times. But, as is well known, performing a measurement on a quantum system changes the state of the system drastically. To overcome this difficulty, we usually assume that there are $n$ quantum systems described by the same Hilbert space $\mathcal{H}$ and prepared independently and identically in the same state $\rho$ (a density matrix on $\mathcal{H}$), and we perform the same quantum measurement on each of them. In this way, we obtain a data set $D = (x_1, x_2, \ldots, x_n)$. By quantum estimation we mean techniques for finding an approximation of the state $\rho$ with the help of the data set $D$, and by prediction we mean characterizing the probability of the outcome $x_{n+1}$ given the previous outcomes $x \in D$.
An appropriate method for solving these problems is to choose a set $\mathcal{M}$ of density matrices on $\mathcal{H}$ containing $\rho$, called a quantum model, and to try to find the state $\rho$ by methods such as Maximum Likelihood Estimation (MLE). To be able to act in this way, we have to parameterize the set $\mathcal{M}$ in a differentiable manner. Unfortunately, ML estimation, which has been used by several authors, gives rise to overfitting. Moreover, in general, we do not know whether the state $\rho$ is in the model $\mathcal{M}$ or not. Inspired by the works of J. Rissanen [1,2], P. Grünwald [3], and others on the Minimum Description Length (MDL) principle in classical statistics, one of our goals in this paper is to remedy this difficulty. Their work on the use of 2-part codes [3] in MDL guided us to use sets of semi-density matrices in addition to quantum models and to call them generalized quantum models (for more detail see the beginning of section 3). As in classical MDL, we base our work on universal sources associated with quantum models. We will show that in all interesting cases universal quantum sources exist. It will be evident that the use of universal sources automatically protects against overfitting. Moreover, we prove different versions of the consistency theorem showing that when the state $\rho$ is in the chosen model $\mathcal{M}$, the selected universal quantum source is asymptotically equivalent to it.
The paper is organized as follows. In section 2 we introduce the notion of Q-projection, which in this work plays the role of a projective quantum measurement. At the end of section 2, it is proved that, in quantum theory, all results concerning prediction and estimation proved in this paper are true for general quantum measurements. In section 3, after some explanations about the MDL principle and the way we have quantized the most important notions involved in MDL, we define fundamental concepts such as (generalized) quantum models, universal quantum sources (the core concept of this work), quantum sources and quantum strategies. We also prove some important facts about them. At the end of the same section, we introduce the notion of a good quantum estimator and a large class of such estimators. Section 4 is about quantum prediction and quantum estimation. In section 5 we introduce the notion of consistency and prove some theorems about it. In section 6, we give examples that indicate the efficiency of this method.
We emphasize that, with the help of the trace function, one can reduce the problems treated here to problems in classical MDL and solve them classically. But in doing so, the operator nature of important concepts such as the universal quantum source associated with a quantum model, the quantum strategy, and the conditional density matrix conditioned on a density matrix is lost. Even worse, one cannot see that these concepts are operators. Moreover, treating the problems in the realm of operator theory is more natural and simpler. In the same vein, nearly all notations, definitions and conventions used in the paper are directly inspired by their classical counterparts in [3], so that comparison of the classical and quantum frameworks should be straightforward.

Q-Projection
Given a separable Hilbert space $\mathcal{H}$, in general infinite dimensional, with inner product $\langle \cdot | \cdot \rangle$, the set $\{|k\rangle \mid k \in \mathbb{N}\}$ will denote an orthonormal basis of $\mathcal{H}$ and its dual basis will be denoted by the set $\{\langle k| \mid k \in \mathbb{N}\}$. The set of all bounded operators (resp. self-adjoint bounded operators) on $\mathcal{H}$ will be denoted by $B(\mathcal{H})$ (resp. by $B_H(\mathcal{H})$) and the set of all positive operators (resp. density matrices) on $\mathcal{H}$ will be denoted by $B^+(\mathcal{H})$ (resp. by $D(\mathcal{H})$). Finally, the Hilbert space generated by trace class operators of $\mathcal{H}$ with the following inner product will be denoted by [...] and it is called a density matrix if $\mathrm{Tr}(T) = 1$. The mapping which sends each nonzero semi-density matrix $T$ to its associated density matrix $T$ [...]

[...] be a finite subset of $\pi(\mathcal{H})$. The combination of elements of [...] is [...] The set of all maximally connected unions of the family $(X_j)_{j \in J}$ will be denoted by $\bigsqcup_{j \in J} X_j$. Clearly, $\bigsqcup_{j \in J} X_j$ is a partition of $X$.

For each
The set of all minimally connected intersections of the family $(X_j)_{j \in J}$ is evidently a partition of $X$ and will be denoted by $\bigsqcap_{j \in J} X_j$. Now assume that $X$ is an arbitrary non-empty set. Let the set of all partitions of $X$ be denoted by $\mathcal{P}(X)$. Let $P$ and $Q$ be in $\mathcal{P}(X)$. We say that $Q$ is finer than $P$, and we write $P \preceq Q$, if each element of $P$ is the union of some elements of $Q$. It is evident that the set $\mathcal{P}(X)$ with the order relation $P \preceq Q$ is a partially ordered set. Assume [...] is the greatest lower bound (resp. the least upper bound) of the partially ordered set [...] and will be denoted by $\bigwedge_{k \in K} P_k$ (resp. $\bigvee_{k \in K} P_k$).
1. If the set is consistent, it has a least upper bound and a greatest lower bound.
2. If the set is finite and commutative, then it is consistent.
Proof.
1. Assume that the set  is consistent then it has an upper bound = R r r , , , then there exists Î r R such that  q r and rR q =0 which is a contradiction. Hence, = q R q . Therefore for each Î  Q , each Î q Q is the sum of some elements of R. Let the order preserving mapping  Q Qfrom p  ( )into R ( ) P be defined as follows, for each , where q is the set of all summands of the projection q. Notice that q is the sum of some elements of R. Now it is clear that under this mapping we have the following bijective maps.
We have seen above that $\bigvee_{k \in K} P_k$ (resp. $\bigwedge_{k \in K} P_k$) is the least upper bound (resp. the greatest lower bound) of the set [...] will be called the $Q$-projection of $T$ (see also [4]). The set of all $Q$-projections of elements of $B(\mathcal{H})$ will be denoted by $B_Q(\mathcal{H})$ and for each [...] be Hilbert spaces. Let $P = \{p_1, p_2, \ldots\}$ and $Q = \{q_1, q_2, \ldots\}$ be complete sets of mutually orthogonal projections of the Hilbert spaces $\mathcal{H}$ and $\mathcal{H}'$. Then [...] is a complete set of mutually orthogonal projections on $\mathcal{H} \otimes \mathcal{H}'$. Let $T$ (resp. $S$) be a bounded operator on $\mathcal{H}$ (resp. $\mathcal{H}'$). Then: [...] Lemma 2.
1. The mapping $T \mapsto T_Q$ is trace preserving.
2. If $T$ is self-adjoint, then $T_Q$ is also self-adjoint.
3. A necessary and sufficient condition for $T$ to be positive is that for each [...]
4. [...] Then, $T_Q$ is always normal.
Proof.
4. Since in this case $B_Q(\mathcal{H})$ is a commutative algebra, the proof is clear.
Proof.
Since any $T \in B(\mathcal{H})$ can be written as a combination of two self-adjoint elements [...] $Q$ is continuous. Proof.

Assume that
On the other hand, for each $q \in Q$ and each $p \in P$ we have [...], the proof of the second part is clear.

Proof. From Lemma 3 and the fact that $qT = Tq$ implies [...]
Lemma 7. Assume that $T_P$ is a pseudo-spectral decomposition of the operator $T$. Then for each $S \in B(\mathcal{H})$, we have $(ST)_P = S_P T$, $(TS)_P = T S_P$ and $\mathrm{Tr}(TS) = \mathrm{Tr}(T S_P)$.
[...] The proof of the second equality is the same. The third equality is evident. The previous lemmas lead to the following result.

A necessary and sufficient condition for $B_Q(\mathcal{H})$ to be commutative is that $Q$ be a complete set of mutually orthogonal minimal projections.

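The statements of this section can be checked numerically on small matrices. The sketch below assumes that the $Q$-projection is the usual pinching map $T_Q = \sum_{q \in Q} q T q$; this formula is an assumption made for illustration, and the helper names `block_projections` and `pinch` are hypothetical. Under that assumption it tests trace preservation, preservation of self-adjointness, the identities of Lemma 7 for an operator that commutes with every projection, and commutativity of the image when the projections are minimal.

```python
import numpy as np

def block_projections(sizes):
    """Complete set of mutually orthogonal diagonal projections with the given ranks."""
    dim = sum(sizes)
    projections, start = [], 0
    for size in sizes:
        p = np.zeros((dim, dim))
        p[start:start + size, start:start + size] = np.eye(size)
        projections.append(p)
        start += size
    return projections

def pinch(T, Q):
    """Assumed Q-projection (pinching): sum of q T q over q in Q."""
    return sum(q @ T @ q for q in Q)

rng = np.random.default_rng(0)
Q = block_projections([2, 1, 1])

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = A + A.conj().T  # self-adjoint test operator
T_Q = pinch(T, Q)
trace_preserved = bool(np.isclose(np.trace(T_Q), np.trace(T)))
stays_self_adjoint = bool(np.allclose(T_Q, T_Q.conj().T))

# Lemma 7 style identities for an operator C that commutes with every q in Q.
C = sum((k + 1.0) * q for k, q in enumerate(Q))
S = rng.normal(size=(4, 4))
product_rule = bool(np.allclose(pinch(S @ C, Q), pinch(S, Q) @ C))
trace_rule = bool(np.isclose(np.trace(C @ S), np.trace(C @ pinch(S, Q))))

# With minimal (rank-one) projections every pinched operator is diagonal,
# so any two of them commute.
Q_min = block_projections([1, 1, 1, 1])
X, Y = pinch(A, Q_min), pinch(T, Q_min)
commutative = bool(np.allclose(X @ Y, Y @ X))

print(trace_preserved, stays_self_adjoint, product_rule, trace_rule, commutative)
```

Random matrices only give evidence, not a proof, but a failed check would immediately signal that the assumed formula does not match the intended definition.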
Let $S$ and $T$ be in [...]. This fact motivates the following definition.
Remark 1. Any two elements of $B(\mathcal{H})$ always weakly commute. For some relations, being true and being weakly true are equivalent. For example, if $T \leq S$ then clearly this relation is weakly true. Conversely, assume that for each [...] The relation weakly equal will be denoted by $=_w$. Let $\rho \in D(\mathcal{H})$ be a diagonal matrix. Clearly, we can consider $\rho$ as a classical probability distribution function. But if the density matrix $\rho$ is not diagonal, we cannot interpret it in this way. The following definition serves to discriminate these two cases. [...] and let $\rho$ be a density matrix on $\mathcal{H}$. Assume that we perform the quantum measurement described by the set $Q$ of measurement operators on the quantum system with state space $\mathcal{H}$ in the state $\rho$. Then, as is well known, the probability of the outcome associated with $q_i$ is $\mathrm{Tr}(q_i \rho q_i)$. Now, assume that $Q \preceq P$. Then, as we have seen earlier, $q_i$ can be written as a sum of some elements of $P$.
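$$\mathrm{Tr}(\rho q_i) = \mathrm{Tr}\Big(\rho \sum_{j} p_j\Big) = \sum_{j} \mathrm{Tr}(\rho p_j p_j) = \sum_{j} \mathrm{Tr}(p_j \rho p_j),$$
where the sums run over those $p_j \in P$ whose sum is $q_i$.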
In this work, in many cases we use only $Q \in \pi_0(\mathcal{H})$. Nevertheless, interpreted in quantum theory, as is evident from the above fact, our results concerning prediction and estimation will be true for all $Q \in \pi(\mathcal{H})$. Moreover, as is well known [5], the outcomes of a general measurement on the quantum system represented by the Hilbert space $\mathcal{H}$ can be realized by a projective measurement on the tensor product of $\mathcal{H}$ and another Hilbert space $\mathcal{H}_0$. So, our results will be true for general quantum measurement systems.
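The additivity of outcome probabilities under coarse-graining can be illustrated with a small example. In this sketch the qutrit state `rho` and the two measurements are hypothetical choices made for illustration: `P` consists of the three rank-one projections onto the standard basis, and `Q` merges the first two of them.

```python
import numpy as np

def outcome_probabilities(rho, projections):
    """Pr(i) = Tr(q_i rho q_i) for each projection q_i of a projective measurement."""
    return np.array([np.trace(q @ rho @ q).real for q in projections])

# Hypothetical qutrit density matrix (positive, trace one), not taken from the paper.
rho = np.array([[0.5, 0.2, 0.0],
                [0.2, 0.3, 0.1],
                [0.0, 0.1, 0.2]])

basis = np.eye(3)
P = [np.outer(basis[j], basis[j]) for j in range(3)]  # fine measurement
Q = [P[0] + P[1], P[2]]                               # coarser measurement

fine = outcome_probabilities(rho, P)
coarse = outcome_probabilities(rho, Q)

# The probability of a coarse outcome is the sum of the fine ones it contains.
print(fine, coarse)
```

Here the first coarse probability equals the sum of the first two fine probabilities, which is the content of the identity above.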

Quantum model, quantum source and quantum strategy
As we said in the introduction, our work in this paper, inspired by the Minimum Description Length Principle, is based on universal quantum sources associated with quantum models. In this section, we define several versions of universal quantum sources associated with a quantum model and investigate some of their properties. We also prove the existence of universal quantum sources and give a constructive way to build them. Finally, we define quantum strategies and treat their relation to universal quantum sources.
Before going further in this section let us give some comments on the use of semi-density matrices and on our definition of universal quantum sources.
The minimum description length principle is a powerful tool in statistical (inductive) inference. It is essentially based on two important notions:

2-part coding
Estimation by a 2-part code can be considered a mathematical formulation of Occam's Razor, which says that among different descriptions of a data set, the simplest is the best. Assume that these descriptions are encoded in such a way that their code-lengths reflect their complexities. Then the description with the shortest code-length is the best.
More precisely, let $\mathcal{M}$ be a nonempty set of probability density (mass) functions on a set $\mathcal{X}$ and let $D \subset \mathcal{X}^n$ be an i.i.d. data set generated by an element of $\mathcal{M}$. Assume that the elements of $\mathcal{M}$ are encoded. For each $p \in \mathcal{M}$, the length of its associated code-word will be denoted by $L(p)$, and $-\log_2 p(D)$ will be denoted by $L(D \mid p)$.
[...] is the length of an encoded description of the data set $D$ and $\ddot{p}$ is chosen according to Occam's Razor.
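A small classical example may help fix ideas. The sketch below assumes the usual two-part MDL rule, in which the selected element minimizes the total length (code-word length of $p$ plus $-\log_2 p(D)$). The Bernoulli candidates and their code lengths are hypothetical: the fair coin gets a 1-bit code-word, and each of eight biased coins gets a 4-bit code-word, so the lengths satisfy the Kraft inequality.

```python
import math

def data_code_length(theta, k, n):
    """-log2 p(D) for a Bernoulli(theta) source and data with k ones out of n."""
    return -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))

def two_part_estimate(candidates, k, n):
    """Return (theta, total_bits) minimising code-word length plus data length."""
    return min(
        ((theta, bits + data_code_length(theta, k, n)) for theta, bits in candidates),
        key=lambda pair: pair[1],
    )

# Hypothetical code: 1 bit for the fair coin, 1 + log2(8) = 4 bits for each biased coin.
biased = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
candidates = [(0.5, 1.0)] + [(t, 1.0 + math.log2(len(biased))) for t in biased]

small_sample = two_part_estimate(candidates, k=6, n=10)
large_sample = two_part_estimate(candidates, k=70, n=100)
print(small_sample, large_sample)
```

With 6 ones out of 10 the short code-word of the fair coin wins, while with 70 ones out of 100 the extra bits spent on describing a biased coin are repaid by the shorter description of the data.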

universal coding
Under the above assumptions on $\mathcal{M}$ and $\mathcal{X}$, assume that for each $n \in \mathbb{N}$, $\bar{p}^{(n)}$ is a probability density (mass) function on $\mathcal{X}^n$. The sequence $\bar{p} = (\bar{p}^{(n)})_{n \in \mathbb{N}}$ of probability density (mass) functions will be called universal with respect to $\mathcal{M}$ if for each [...] For more details see [3]. Now let us explain briefly the way we have gone through to quantize these two notions.
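Before the quantization, the classical notion of universality can be made concrete. The sketch below assumes the standard reading of universality in classical MDL, namely that the extra code-length per outcome of the universal code, compared with the best element of the model, vanishes as $n$ grows. It uses the Bernoulli model with the uniform-prior Bayesian mixture, whose probability for a sequence with $k$ ones is $k!(n-k)!/(n+1)!$; the function names are illustrative only.

```python
import math

def mixture_code_length(k, n):
    """-log2 of the uniform-prior Bernoulli mixture k!(n-k)!/(n+1)!."""
    log_p = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    return -log_p / math.log(2)

def best_code_length(k, n):
    """-log2 of the maximised Bernoulli likelihood, with 0 * log 0 = 0."""
    bits = 0.0
    for count in (k, n - k):
        if count > 0:
            bits -= count * math.log2(count / n)
    return bits

def per_outcome_regret(k, n):
    """Extra bits per outcome paid by the mixture code."""
    return (mixture_code_length(k, n) - best_code_length(k, n)) / n

sample_sizes = (10, 100, 1000, 10000)
regrets = [per_outcome_regret(3 * n // 10, n) for n in sample_sizes]
print(regrets)
```

For this mixture the total regret is at most $\log_2(n+1)$ bits, so the per-outcome regret printed above decreases to zero.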
Let the Hilbert space $\mathcal{H}$ be the state space of a quantum system $A$, which is prepared in an unknown state $\rho_0$, a density matrix on $\mathcal{H}$, and let [...] where $O$ is the set of outcomes, be a projective quantum measurement system. Assume that $\mathcal{M}$ is a nonempty set of density matrices on $\mathcal{H}$ and $D \in O^n$ is the set of outcomes of performing the $Q$-measurement on $n$ quantum systems identical to $A$ and prepared in the same state $\rho_0$. In performing the $Q$-measurement on the quantum system $A$ in an arbitrary state $\rho$ the probability of outcome $m$ is [...]

2-part coding ⟶ semi-density matrix
Let the elements of $\mathcal{M}$ be somehow encoded and for each $\rho \in \mathcal{M}$ let $L(\rho)$ be the length of the code-word associated with $\rho$ and let [...] But the function $\log_2$ is increasing and [...] is also increasing with respect to the semi-density matrices $2^{-L(\rho)} \rho^{(n)}$, [...] as in the above classical case is an estimation of $\rho_0$ according to Occam's Razor. Notice that [...]
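The role of the semi-density matrices $2^{-L(\rho)} \rho^{(n)}$ can be sketched numerically. The code below assumes that $\rho^{(n)}$ is the $n$-fold tensor power of $\rho$, that the observed data correspond to the product projection $q^{(n)} = q_{x_1} \otimes \cdots \otimes q_{x_n}$, and that the estimate is the model element maximizing $\mathrm{Tr}(2^{-L(\rho)} \rho^{(n)} q^{(n)})$. The three-element qubit model, its code lengths and the function names are hypothetical.

```python
import numpy as np
from functools import reduce

def tensor(operators):
    """Tensor (Kronecker) product of a list of matrices."""
    return reduce(np.kron, operators)

def semi_density_score(rho, code_bits, projections, outcomes):
    """Tr( 2^{-L(rho)} rho^{(n)} q^{(n)} ) for the observed outcome sequence."""
    rho_n = tensor([rho] * len(outcomes))
    q_n = tensor([projections[x] for x in outcomes])
    return 2.0 ** (-code_bits) * np.trace(rho_n @ q_n).real

def two_part_state_estimate(model, projections, outcomes):
    """Index of the model element with the largest semi-density score."""
    scores = [semi_density_score(rho, bits, projections, outcomes)
              for rho, bits in model]
    return int(np.argmax(scores))

q = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # projective measurement
model = [
    (np.eye(2) / 2, 1.0),         # maximally mixed state, 1-bit code-word
    (np.diag([0.9, 0.1]), 2.0),   # 2-bit code-word
    (np.diag([0.1, 0.9]), 2.0),   # 2-bit code-word
]

short_data = two_part_state_estimate(model, q, [0, 1])
longer_data = two_part_state_estimate(model, q, [0, 0, 1, 0, 0, 0])
print(short_data, longer_data)
```

Taking $-\log_2$ of the score recovers the sum of the code-word length and the data code-length, so maximizing the score is the same as minimizing a two-part description length.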

Universal coding ⟶ universal density matrix
Let $\rho^{(n)}$ and $\rho'^{(n)}$ be two density matrices on [...]. In the following all tensor products of Hilbert spaces are topological tensor products. The $n$-times tensor product of a Hilbert space $\mathcal{H}$ with itself will be denoted by $\mathcal{H}^{(n)}$ and in general, for each [...]. From now on semi-density matrices on $\mathcal{H}^{\mathbb{N}}$ will be denoted by $\bar{\rho} = (\rho^{(n)})_{n \in \mathbb{N}}$. The semi-density matrix $\bar{\rho} = (\rho^{(n)})_{n \in \mathbb{N}}$ will be called 1. simple if [...] In this work $\ln$ denotes the natural logarithm and $\log$ denotes the logarithm in base 2.
Definition 7. Let $\rho$ and $\rho'$ be density matrices. Then the quantum relative entropy of $\rho$ and $\rho'$ is $D(\rho \| \rho') = \mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \rho')$ [...] Therefore, $\rho$ is universal relative to $\mathcal{M}$. Example 3. Let $\mathcal{M}$ be a quantum model and let $\rho$ be a universal density matrix relative to $\mathcal{M}$ and $U$ be a unitary operator. Then $U \rho U^{-1}$ is a universal density matrix relative to $\mathcal{M}$ [...] is a universal quantum source relative to $\mathcal{M}$.
The proof is evident.
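For finite-dimensional density matrices, the quantum relative entropy of Definition 7 can be evaluated directly. The sketch below uses the standard formula $\mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \rho')$ with base-2 logarithms, assumes that the second argument has full rank so that its logarithm is defined, and uses the convention $0 \log 0 = 0$ for the first argument. The function name and test states are illustrative.

```python
import numpy as np

def quantum_relative_entropy(rho, sigma):
    """Tr(rho log2 rho) - Tr(rho log2 sigma), assuming sigma has full rank."""
    # First term from the eigenvalues of rho, with 0 * log 0 = 0.
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    first = float(np.sum(eigenvalues * np.log2(eigenvalues)))
    # Matrix logarithm of sigma through its spectral decomposition.
    w, v = np.linalg.eigh(sigma)
    log_sigma = (v * np.log2(w)) @ v.conj().T
    second = np.trace(rho @ log_sigma).real
    return first - second

pure = np.diag([1.0, 0.0])
mixed = np.eye(2) / 2
generic = np.array([[0.7, 0.2],
                    [0.2, 0.3]])

print(quantum_relative_entropy(pure, mixed))
print(quantum_relative_entropy(generic, generic))
print(quantum_relative_entropy(generic, mixed))
```

The relative entropy of a state with itself vanishes, and a pure qubit state is exactly one bit away from the maximally mixed state.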
Lemma 11. $S_{\mathcal{M}}$, the set of all universal quantum sources relative to the quantum model $\mathcal{M}$, is convex.
Proof. Let $\rho_1$ and $\rho_2$ be two universal quantum sources relative to the quantum model $\mathcal{M}$. Let $\rho \in \mathcal{M}$ and $\varepsilon > 0$ be given. Then there exists $n_0 \in \mathbb{N}$ such that for $k = 1, 2$ and $n \geq n_0$ we have:
Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces. Let $\rho$ be a density matrix on the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$, $\rho_1 = \mathrm{Tr}_2(\rho)$ and [...].
[...] is called the conditional semi-density matrix of $q$ conditioned on $q^{(n)}$ under $\rho$.
Definition 9. Let $\mathcal{H}$ be a separable Hilbert space and let $\hat{\rho} = (\rho^{(n)})_{n \in \mathbb{N}}$ be a positive operator on $\mathcal{H}^{\mathbb{N}}$ and [...] is good.
Example 4. Let $\mathcal{M}$ be the following quantum model. [...]
where $\rho_\theta$ is a $2 \times 2$ density matrix defined as follows [...] For simplicity we omit the index $Q$. Assume that $q^{(n)} \in Q^{(n)}$ consists of $k$ times $q_1$ and $(n-k)$ times $q_2$. Then for each $0 \leq \theta \leq 1$ we have [...] It is straightforward to see that the maximum likelihood estimator for $q^{(n)}$ is [...] Clearly $\mathcal{M}$ is a Bayesian quantum model. As we have proved earlier, its associated universal quantum source is [...] One can compute the above integral by partial integration and see that [...] The density matrix [...]
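The integral in this example can be worked out exactly under explicit assumptions. The sketch below assumes that $\rho_\theta = \mathrm{diag}(\theta, 1 - \theta)$, that $q_1$ and $q_2$ are the projections onto the two basis vectors, and that the Bayesian prior on $\theta$ is uniform on $[0, 1]$; these choices are assumptions for illustration. Then the probability of a sequence with $k$ times $q_1$ is $\theta^k (1-\theta)^{n-k}$, its average over the prior is $k!(n-k)!/(n+1)!$, and the conditional state for the next outcome is diagonal with entries $(k+1)/(n+2)$ and $(n-k+1)/(n+2)$.

```python
from fractions import Fraction
from math import factorial

def mixture_probability(k, n):
    """Integral of theta^k (1 - theta)^(n - k) over [0, 1], i.e. k!(n-k)!/(n+1)!."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def predictive_diagonal(k, n):
    """Diagonal of the assumed conditional density matrix for outcome n + 1."""
    p1 = mixture_probability(k + 1, n + 1) / mixture_probability(k, n)
    return p1, 1 - p1

def maximum_likelihood_diagonal(k, n):
    """Diagonal of the maximum likelihood state, theta = k / n."""
    return Fraction(k, n), Fraction(n - k, n)

k, n = 3, 10
print(mixture_probability(k, n))
print(predictive_diagonal(k, n))
print(maximum_likelihood_diagonal(k, n))
```

The predictive state never assigns probability zero to an outcome, even when $k = 0$ or $k = n$, whereas the maximum likelihood state does; its distance from the maximum likelihood state shrinks as $n$ grows.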

Quantum prediction and quantum estimation
As we said in the introduction, quantum prediction and quantum estimation are the most important subjects of quantum statistical inference. Following the classical works on the MDL principle, our method of statistical inference is in general based on a universal quantum source and its use for quantum prediction and quantum estimation. [...] is the maximum likelihood $Q$-quantum strategy associated with $\mathcal{M}$. Unfortunately, it is not good. But in many cases (see the above example), a modified version of the maximum likelihood $Q$-quantum strategy, which is very close to the unmodified one and whose difference from it tends rapidly to zero, is a good one.

Quantum version of classical MDL prediction and estimation
This good $Q$-quantum strategy enables us to predict the next outcome given the data [...]. If the maximum is achieved by more than one $\rho$, we choose the one with the maximum trace. And if there is still more than one $\rho$, there is no further preference. More precisely, let us suppose that $\mathcal{M}$ is a compact Riemannian sub-manifold of the Hilbert space [...] If there is more than one $\rho_0$ in $Z'$, we do not have any further preference among them. (For more information about finding extremum points see [7].) In the next section we will show that, given the outcome $q_I^{(n)}$, it is an estimator of the state of the system.

Consistency and convergence
Consistency is a very important property of methods of statistical (inductive) inference. Let us explain briefly what we mean by it. Assume that $\mathcal{H}$ is a separable Hilbert space and $\mathcal{M}$ is a quantum model on $\mathcal{H}$. We say that a method of quantum statistical inference is consistent with respect to $\mathcal{M}$ if, for $\rho_0 \in \mathcal{M}$ and $Q \in \pi_0(\mathcal{H})$, as we perform the quantum measurement $Q$ on the quantum system $\mathcal{H}$ in the state $\rho_0$ repeatedly and obtain more and more data, the state yielded by the method gets closer and closer to the state $\rho_0$ in some sense.
In this section we investigate different approaches to consistency and convergence.

Consistency based on distinguishability
Let $\mathcal{H}$ be a separable Hilbert space, let $T$ and $S$ be in $B_H(\mathcal{H})$ and let $\lambda$ be a complex number. Assume that $S \neq 0$, $T = \lambda S$ and $p$ is the orthogonal projection onto the image of $S$. Then, we put $\frac{T}{S} = \lambda p$. Let $\mathcal{H}$ be a separable Hilbert space and $Q \in \pi_0(\mathcal{H})$. Let $\bar{\rho} = (\rho^{(n)})_{n \in \mathbb{N}}$ be a quantum source on $\mathcal{H}^{\mathbb{N}}$. For each $n \in \mathbb{N}$ let $P_n$ be a unary relation on [...] [3].) Definition 12. Let $\rho^*$ and $\rho$ be quantum sources with $\rho^*$ simple, and let [...] be their associated quantum strategies. Then, the standard KL-risk of $\rho^{*(n)}$ with respect to $\rho^{(n)}$ is [...] The proof is a consequence of the definition of $Q$-universal source and Theorem 6. Lemma 13. Let $f$ and $F$ be two increasing positive real functions defined on $\mathbb{R}^+$. If the function $f/F$ is decreasing and [...] Proof. Assume that there exists $c > 0$ such that for $n$ large enough [...]
[...] be a differentiable and integrable decreasing function. Assume that [...] The sequence $u_n$ is defined as follows: $u_0 = 0$ and for all [...] is a sequence of non-negative real numbers. Then [...] is convergent. Therefore, the sequence $a_n$ is also convergent. In the following, we write $\rho_n$ instead of [...].
For each state, we simulated datasets with varying numbers of repetitions, $n = 10, 50, 100, 250, 500$. Table 2 shows the number of times (out of 1000 samples) that the quantum version of classical two-part code estimation chose correctly. As expected, for small sample sizes $n$, the quantum version of classical two-part code estimation may select the wrong model because it has a built-in preference for 'simple' models, but for all large $n$ it selects the correct model. Even for small $n$, it performs far better than classical methods such as AIC and BIC. In the case of the pure state, because of the appropriate choice of weight, it never missed and always chose correctly. It also avoids overfitting and did well for the mixed states. AIC and BIC make mistakes even for large $n$. A comparison of Tables 1 and 2 shows the difference between using semi-density matrices and common traditional models.
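For reference, the two classical criteria mentioned above can be sketched on a toy problem. The code uses the standard definitions AIC $= 2k - 2\ln\hat{L}$ and BIC $= k \ln n - 2\ln\hat{L}$, where $k$ is the number of free parameters and $\hat{L}$ the maximized likelihood. The toy comparison (a fixed fair coin against a one-parameter Bernoulli family) and all function names are hypothetical; it does not reproduce the simulation reported in the tables.

```python
import math

def bernoulli_log_likelihood(theta, k, n):
    """Natural-log likelihood of k ones in n trials, with 0 * log 0 = 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(theta)
    if n - k > 0:
        total += (n - k) * math.log(1 - theta)
    return total

def aic(log_likelihood, num_params):
    """Akaike information criterion (smaller is better)."""
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood, num_params, n):
    """Bayesian information criterion (smaller is better)."""
    return num_params * math.log(n) - 2 * log_likelihood

def select(k, n):
    """Choice of AIC and of BIC between the fixed coin and the free parameter."""
    fixed = bernoulli_log_likelihood(0.5, k, n)
    free = bernoulli_log_likelihood(k / n, k, n)
    by_aic = "fixed" if aic(fixed, 0) <= aic(free, 1) else "free"
    by_bic = "fixed" if bic(fixed, 0, n) <= bic(free, 1, n) else "free"
    return by_aic, by_bic

print(select(6, 10), select(59, 100), select(70, 100))
```

The middle case shows that the two criteria can disagree on the same data, because BIC charges $\ln n$ per parameter while AIC charges a constant 2.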
In the next example, we give a concrete calculation of a universal quantum source and predict the $(n+1)$-th outcome by a quantum strategy.