A note on the asymptotics of random density matrices

We show in this note that, under proper scaling, the asymptotic spectral distribution, as well as the location and fluctuation law of the largest eigenvalue, of a large class of random density matrices coincide with those of Wishart-type random matrices. As an application, we show that the asymptotic entropy production rate is logarithmic. These results generalize those of Nechita, and of Sommers and Życzkowski.


Introduction
Density matrices are fundamental tools of quantum mechanics and quantum information theory for describing the state of a quantum system ([1,14,19]). While the theory of deterministic density matrices is well developed, random density matrices have received comparatively little attention ([13,17,21]). These matrices are particularly useful when the state of the system is unknown or only partially known. Random (density) matrices also appear in tomography ([10]), where the matrix elements come from measurements, although in this case positive semi-definiteness may fail. Due to the randomness, quantities like the entropy or the entanglement cannot be computed exactly; they have to be estimated, and to this end a probability measure has to be introduced on the set of density matrices. While there is a uniquely defined uniform distribution on the set of pure states, since these are the rays of a Hilbert space, for density matrices, i.e. mixed states, there is no canonical candidate measure.

There are two main classes of probability measures on the set of density matrices (described in more detail in [21]). The first class consists of metric measures, which are generated by metrics on the set of density matrices, e.g. the Bures distance defined by d(ρ, σ) = √2 arccos Tr(ρ^{1/2} σ ρ^{1/2})^{1/2}. The second class consists of the induced measures, where density matrices are obtained by partially tracing a random pure state of a larger system. This note is confined to random density matrices of the second class.

To describe the second class, assume a quantum system is in some pure state |X⟩ ∈ H ⊗ K, where H is a p-dimensional Hilbert space of the observer and K is an n-dimensional Hilbert space representing the (unknown) environment. The state of the system in the observer's space is given by ρ = Tr_K |X⟩⟨X|, i.e. the partial trace of |X⟩⟨X| with respect to K.
It can be shown that ρ has the form XX†/Tr(XX†) = XX†/‖X‖²_HS, where X is a p × n matrix, ‖·‖_HS denotes the Hilbert-Schmidt norm, and X† denotes the conjugate transpose of X (for more details on the tensor analytic and matrix algebraic description see [8]). Since K is unknown, there is a degree of freedom in choosing the distribution of |X⟩. If the distribution of |X⟩ is invariant under unitary conjugation, it can be shown that the elements of the matrix X are independent, normally distributed complex random variables.

By analyzing the asymptotic behavior of density matrices we gain useful insight into large quantum systems, i.e. when H ⊗ K is of large dimension. Since the aforementioned random density matrices are functions of the generating X, or more precisely of XX†, it is natural to analyze the spectral asymptotics of XX†. The asymptotic theory of positive semi-definite matrices of the form XX† is well established: under quite general conditions, after proper scaling, the limit of the spectral distribution is given by the compactly supported Marchenko-Pastur law, and the limit distribution of the largest eigenvalue is governed by the Tracy-Widom law (see e.g. [3,4,6,9,11,12,15]). Furthermore, after proper scaling, the largest and smallest eigenvalues converge to the respective edges of the support with probability one. Some results of this theory can be translated to random density matrices of the type above. Nechita showed in [13] that, after proper scaling, and under the assumption that the elements of X are independent with standard complex normal distribution, the limit laws coincide with those of XX†. In Section 2 we generalize Nechita's results to a larger class of random density matrices, while Section 3 contains the proofs of the generalized theorems.
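The construction ρ = XX†/Tr(XX†) is easy to realize on a computer. The following Python/NumPy sketch (the function name and the sizes are ours, purely illustrative) samples from the induced measure with standard complex Gaussian entries and verifies that the result is Hermitian, positive semi-definite, and has unit trace:

```python
import numpy as np

def random_density_matrix(p, n, seed=None):
    """Sample rho = X X^dagger / Tr(X X^dagger), where X is p x n with
    IID standard complex Gaussian entries (the induced-measure construction)."""
    rng = np.random.default_rng(seed)
    X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    W = X @ X.conj().T                      # positive semi-definite p x p matrix
    return W / np.trace(W).real             # normalize to unit trace

rho = random_density_matrix(4, 16, seed=0)
# rho is a valid density matrix: Hermitian, positive semi-definite, unit trace
assert np.allclose(rho, rho.conj().T)
assert np.min(np.linalg.eigvalsh(rho)) > -1e-12
assert np.isclose(np.trace(rho).real, 1.0)
```

The normalization by the trace is exactly the division by ‖X‖²_HS mentioned above, since Tr(XX†) = Σ|x_kl|².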
Sommers and Życzkowski computed the two-point correlation functions of a random density matrix of the second class from the invariant ensemble. They also derived asymptotic results for the mean of the von Neumann entropy for a special class of random density matrices ([17]). Section 4 generalizes their results.

Spectral asymptotics of general random density matrices
First, for the sake of completeness, let us introduce some notation and definitions.
Definition 1. The matrix ρ ∈ C^{n×n} is a density matrix if it is positive semi-definite (denoted by ρ ≥ 0) and Tr ρ = 1.
Given any x ∈ R, denote by δ x the Dirac measure concentrated at x.
For a given set A ⊂ R we will denote its indicator function by I A (x).
Given any Hermitian matrix A we will denote its j-th largest eigenvalue by λ_j(A).
We note that a density matrix is in close relation with a density operator, i.e. a linear, bounded operator on a Hilbert space with trace equal to 1. It can be shown that in finite dimensional Hilbert spaces the two objects are equivalent.

Now let us recall Nechita's results ([13]) about the asymptotics of random density matrices. Let X = (x_kl)_{1≤k,l} be a family of independent, identically distributed (from this point on abbreviated as IID) random variables with standard complex Gaussian distribution N_C(0, 1). Assume p = p(n) is such that lim_n p(n)/n = c and consider the empirical distribution

µ_n := (1/p(n)) Σ_{j=1}^{p(n)} δ_{cnλ_j(ρ_n)},   (1)

where ρ_n = X_n X_n†/Tr(X_n X_n†) and X_n := (x_kl)_{1≤k≤p(n), 1≤l≤n}. Then µ_n converges weakly to ν_c with probability one, where ν_c is the Marchenko-Pastur distribution with parameter c (defined rigorously in the next section). Furthermore we also have lim_n cnλ_1(ρ_n) = (√c + 1)² with probability one, and

n^{2/3} (cnλ_1(ρ_n) − (√c + 1)²) / ((1 + √c)(1 + 1/√c)^{1/3}) → F_2 in distribution,

where F_2 denotes the Tracy-Widom distribution with parameter 2. (For more details on the Tracy-Widom law see e.g. [18].) While working with density matrices it is also interesting to analyze the asymptotic behavior of the entropy. As an application of the main results we show in Section 4 that the asymptotic entropy rate is logarithmic.
We now state the main results.

Theorem 1. Let {x_kl, 1 ≤ k, l} be a family of IID complex random variables with E[x_kl] = 0 and E[|x_kl|²] = 1, and assume that lim_n p(n)/n = c. Then µ_n converges weakly to ν_c with probability one (2), where µ_n denotes the same measure as in equation (1) and E denotes the expectation functional.

Theorem 2. Under the assumptions of Theorem 1, lim_n cnλ_1(ρ_n) = (√c + 1)² with probability one. If, in addition, E[|x_kl|⁴] < ∞, then n^{2/3}(cnλ_1(ρ_n) − (1 + √(p/n))²), normalized as in the Wishart case, converges in distribution to F_2.
Note that when determining the asymptotic distribution of λ_1(ρ_n), the quantity √(p/n) in the centering cannot be replaced by √c, as the convergence of p/n can be arbitrarily slow.

Figures 1a and 1b show numerical evidence for Theorems 1 and 2. Both simulations were done using matrices with IID elements uniformly distributed on the set {(±1 ± i)/√2}. The other parameters were chosen as n = 2000, c = 1/2, and the sample size was 5000. The density function of the Tracy-Widom law with parameter 2 was computed with the routine dtw(x, beta=2) of the R package "RMTstat", while the eigenvalue statistics were computed in Julia.

Theorem 3 (Marchenko-Pastur [11]). Suppose {x_kl, 1 ≤ k, l} is a family of IID complex random variables such that E[x_kl] = 0 and E[|x_kl|²] = 1, and suppose p = p(n) with lim_n p(n)/n = c > 0. Let X_n := (x_kl)_{1≤k≤p(n), 1≤l≤n}, W_n := (1/n) X_n X_n† and µ′_n := (1/p(n)) Σ_{j=1}^{p(n)} δ_{λ_j(W_n)}. Then µ′_n converges weakly to ν_c with probability one, where ν_c denotes the Marchenko-Pastur distribution with parameter c.
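The convergence in Theorem 3, and through the scaling (1) also in Theorem 1, can be probed numerically via moments of the empirical measure. The sketch below uses Python/NumPy instead of the Julia/R setup of the figures, with illustrative sizes; it relies on the fact that the first two moments of ν_c are 1 and 1 + c:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 200, 400                  # c = p/n = 1/2
c = p / n
X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
W = X @ X.conj().T
rho_eigs = np.linalg.eigvalsh(W) / np.trace(W).real   # eigenvalues of rho_n
scaled = c * n * rho_eigs                             # the cn*lambda_j(rho_n) scaling of (1)

# First two moments of the Marchenko-Pastur law nu_c: m1 = 1, m2 = 1 + c.
# The first moment of the scaled spectrum is exactly cn/p = 1 since Tr(rho_n) = 1.
assert abs(scaled.mean() - 1.0) < 1e-6
assert abs((scaled ** 2).mean() - (1 + c)) < 0.1
```

The tolerances account for finite-size fluctuations; they shrink as n grows.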
Note that this is a more general and more concisely phrased version of the original theorem; a proof can be found in [2,11]. Given a measure µ supported on R_+, we introduce the notation S(ε, µ). We will also need the following lemma (for a proof see the Appendix).
Proof of Theorem 1. According to Lemma 1, equation (2) is equivalent to P(S(ε, µ_n) → S(ε, ν_c) for all ε > 0) = 1, hence it is sufficient to prove that S(ε, µ_n) → S(ε, ν_c) with probability one for each fixed ε > 0. (3) It can be easily checked that the atoms of µ_n satisfy cnλ_j(ρ_n) = (cn²/Tr(X_n X_n†)) λ_j(W_n), so µ_n is a dilation of the measure µ′_n of Theorem 3 by a random factor. By assumption the elements of X are IID, thus Tr(X_n X_n†)/(np(n)) → 1 with probability one, since the Strong Law of Large Numbers (SLLN) is applicable and lim_n p(n)/n = c; consequently the dilation factor cn²/Tr(X_n X_n†) = (cn/p(n)) · np(n)/Tr(X_n X_n†) → 1 with probability one. Also, µ being a finite measure, S(ε, µ) is a continuous function of ε, which implies (3).
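The trace concentration Tr(X_n X_n†)/(np(n)) → 1 invoked above is easy to verify numerically; a minimal Python sketch (sizes and tolerance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 400):
    p = n // 2                                  # p(n)/n -> c = 1/2
    X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    # Tr(X X^dagger) = sum of |x_kl|^2 over np IID terms with mean 1,
    # so the SLLN gives Tr(X X^dagger)/(np) -> 1
    ratio = np.sum(np.abs(X) ** 2) / (n * p)
    assert abs(ratio - 1.0) < 0.08              # fluctuations are O(1/sqrt(np))
```
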
Proof of Theorem 2. The first part of the theorem is straightforward. Geman showed in [7] that under the assumptions of the present theorem λ_1(X_n X_n†)/n → (1 + √c)² with probability one, and hence cnλ_1(ρ_n) = (cn²/Tr(X_n X_n†)) λ_1(X_n X_n†)/n → (1 + √c)² with probability one, according to the SLLN.
To justify the second part we have to compare the largest eigenvalue of X_n X_n† with that of ρ_n. According to the results of Bao et al. in [4], the suitably normalized quantity n^{2/3}(λ_1(X_n X_n†)/n − (1 + √(p/n))²) converges in distribution to F_2, which means that it is sufficient to show that n^{2/3}(λ_1(X_n X_n†)/n − cnλ_1(ρ_n)) → 0 in distribution. Since cnλ_1(ρ_n) = cn λ_1(X_n X_n†)/Tr(X_n X_n†), and λ_1(X_n X_n†)/n → (1 + √c)² with probability one, it remains to prove that

n^{2/3}(1 − np/Tr(X_n X_n†)) → 0 in probability. (5)

For arbitrary but fixed p and n let S_{p,n} := Σ_{1≤k≤p, 1≤l≤n} |x_kl|²; then E[S_{p,n}] = np, Var(S_{p,n}) = np(E[|x_11|⁴] − 1), and S_{p,n}/(np) → 1 with probability one if p = p(n) and n → ∞. Since γ_1 < p(n)/n < γ_2 for all large n, it is enough to show that n^{2/3}(1 − np/S_{p,n}) → 0 in probability for every such sequence p. Furthermore, let A_{p,n} := {S_{p,n} ∈ [np(1 − β), np(1 + β)]} for some small β > 0, and for any fixed ε > 0 let B_{ε,p,n} := {n^{2/3} |1 − np/S_{p,n}| ≥ ε}; then

P(B_{ε,p,n}) ≤ P(B_{ε,p,n} ∩ A_{p,n}) + P(Ω \ A_{p,n}),

since B_{ε,p,n} = (B_{ε,p,n} ∩ A_{p,n}) ∪ (B_{ε,p,n} ∩ (Ω \ A_{p,n})) and B_{ε,p,n} ∩ (Ω \ A_{p,n}) ⊂ Ω \ A_{p,n}. First, note that P(Ω \ A_{p,n}) → 0 for fixed p as n → ∞, where Fatou's lemma and the SLLN are used. Switching the roles of p and n we obtain the same for fixed n and p → ∞. This proves that P(Ω \ A_{p(n),n}) → 0 as n → ∞.
On the event A_{p,n} we have n^{2/3}|1 − np/S_{p,n}| ≤ n^{2/3}|S_{p,n} − np|/(np(1 − β)), hence by Chebyshev's inequality

P(B_{ε,p,n} ∩ A_{p,n}) ≤ Var(S_{p,n}) n^{4/3}/(ε np(1 − β))² = (E[|x_11|⁴] − 1) n^{1/3}/(ε² p (1 − β)²),

and the quantity on the right-hand side tends to 0 whenever p = p(n) and n → ∞. This proves P(B_{ε,p(n),n} ∩ A_{p(n),n}) → 0 as n → ∞ for any fixed ε > 0, therefore (5) holds true. Since convergence in probability implies convergence in distribution, the proof is complete.
In the case of the smallest eigenvalue, Feldheim and Sodin showed in [5] the analogous Tracy-Widom limit at the lower edge of the support. The proof of the corresponding statement for ρ_n is essentially the same as in the previous case, thus it is left to the reader.
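Both almost-sure edge limits, (1 + √c)² for the largest and (1 − √c)² for the smallest scaled eigenvalue, are already visible at moderate matrix sizes; a short Python check (sizes and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 250, 1000                 # c = 1/4
c = p / n
X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
W = X @ X.conj().T
# scaled spectrum cn * lambda_j(rho_n), sorted in ascending order by eigvalsh
eigs = c * n * np.linalg.eigvalsh(W) / np.trace(W).real

# edges of the Marchenko-Pastur support: (1 -+ sqrt(c))^2
assert abs(eigs[-1] - (1 + np.sqrt(c)) ** 2) < 0.15
assert abs(eigs[0] - (1 - np.sqrt(c)) ** 2) < 0.15
```

The deviations at the edges are of order n^{-2/3}, consistent with the Tracy-Widom scaling.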

Application: Asymptotic entropy
In this section we investigate the von Neumann entropy of random density matrices of the previously discussed type. We prove that it exhibits a Strong Law of Large Numbers type of behavior for large systems. As the entropy is meant to characterize the disorder present in a system, the results of this section show how much disorder is to be expected in the observed system H after tracing out the environment K. We will also see that, not surprisingly, the asymptotic entropy depends on the ratio of the dimensions of the observation space H and the environment K. First let us define the von Neumann entropy of a density matrix.
Definition 3. Let ρ denote a density matrix on a finite dimensional Hilbert space H. The quantity H(ρ) := −Tr(ρ log ρ) = −Σ_j λ_j(ρ) log λ_j(ρ) is called the von Neumann (also known as Shannon) entropy of ρ. In case 0 is an eigenvalue define 0 log 0 := 0.
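The definition is straightforward to evaluate from the spectrum. The following Python sketch (the function name is ours) implements H(ρ) with the 0 log 0 := 0 convention and checks the two extreme cases: a pure state has zero entropy, while the maximally mixed state has entropy equal to the logarithm of the dimension.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -sum_j lambda_j log lambda_j, with the convention 0 log 0 := 0."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 0]                 # drops zero eigenvalues: 0 log 0 := 0
    return float(-np.sum(lam * np.log(lam)))

pure = np.zeros((4, 4)); pure[0, 0] = 1.0    # a pure state |0><0|
mixed = np.eye(4) / 4                        # the maximally mixed state I/4
assert abs(von_neumann_entropy(pure)) < 1e-12
assert abs(von_neumann_entropy(mixed) - np.log(4)) < 1e-12
```
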
Sommers and Życzkowski derived asymptotic results for the mean von Neumann entropy in [17], for ρ_n = XX†/Tr(XX†) with X an n × n Gaussian random matrix with IID elements. Our next proposition generalizes their result.

Proposition 1. Under the assumptions of Theorem 1, H(ρ_n) − log n converges with probability one to a finite constant; in particular, H(ρ_n)/log n → 1 with probability one.

Proof. Let η(x) := −x log x and η_K(x) := max{η(x), −K} for x ≥ 0; the truncated function η_K is bounded and continuous. Now let ε > 0 be arbitrary. Since µ_n converges to ν_c weakly with probability one, and η_K is a bounded, continuous function, I_2^{(n)} < ε/3 if n is large enough, with probability one. Due to the definition of η_K we have lim_{K→∞} η_K(x) = η(x) and |η_K(x)| ≤ |η(x)| for every x ≥ 0. Moreover, supp ν_c being compact implies ∫|η(x)| dν_c(x) < ∞. According to the dominated convergence theorem, I_3^{(n)} < ε/3 if K is large enough. It can be easily checked that |η(x) − η_K(x)| ≤ x² for x ≥ K and K ≥ 1, which means that I_1^{(n)} ≤ ∫_K^∞ x² dµ_n(x). Weak convergence of µ_n with probability one implies ∫ x² dµ_n(x) → ∫ x² dν_c(x) with probability one. By splitting x² at the level K it follows that ∫_K^∞ x² dµ_n(x) → 0 with probability one. Weak convergence also implies µ_n((K, ∞)) → 0 for any fixed K > 0 with probability one, meaning that I_1^{(n)} < ε/3 with probability one if n is large enough. Summarizing the above arguments yields that the quantity in (8) is less than ε. After subtracting log n from H(ρ_n) and taking the limit n → ∞ we obtain

lim_n (H(ρ_n) − log n) = log c − (1/(2πc)) ∫_{(1−√c)²}^{(1+√c)²} log(x) √(((1+√c)² − x)(x − (1−√c)²)) dx

with probability one, due to the assumption lim_n p/n = c. The second part of the proposition is a consequence of the first part and can be easily proved using equation (7).

Remark 1. Usually the entropy rate of a stochastic process {X_n, n ∈ N} is defined as lim_n H_n/n, with H_n = −∫ p_n(x_1, …, x_n) log p_n(x_1, …, x_n) dx_1 … dx_n for continuous, and H_n = −Σ_{x_1,…,x_n} P(X_1 = x_1, …, X_n = x_n) log P(X_1 = x_1, …, X_n = x_n) for discrete random variables X_1, X_2, …. In the setting of this paper there is no obvious way, if any, to define a stochastic process {X_n, n ∈ N} such that H(X_1, …, X_n) = H(ρ_n).
The most natural way would be to define (X_1, …, X_n) so that they follow the same distribution as (λ_1(ρ_n), …, λ_n(ρ_n)). If F_n(x_1, …, x_n) denotes the distribution function of (X_1, …, X_n) for any n ≥ 1, then the following strong compatibility condition has to be satisfied for all k ≥ 0 and all x_1, …, x_n:

∫ dF_{n+k}(x_1, …, x_n, dx_{n+1}, …, dx_{n+k}) = F_n(x_1, …, x_n).
It can be checked that this fails to happen even in the case of (Gaussian) Wishart matrices.

Conclusion
Nechita showed in [13] that the spectral asymptotics of random density matrices of the form XX†/Tr(XX†) coincide with those of XX† after proper scaling, where the elements of X are independent with standard complex Gaussian distribution. In this paper these results are generalized to the same type of random density matrices, but with X drawn from a larger class of random matrices. Since the formula XX†/Tr(XX†) provides a very simple way of simulating random density matrices, these results can be used to approximate properties such as the spectral distribution, the location and distribution of the largest eigenvalue, and the von Neumann entropy of large dimensional random density matrices.
In the application section we generalized results of Sommers and Życzkowski by showing that random density matrices generate infinite entropy in the limit, but the production rate is logarithmic and, surprisingly, independent of the parameter c.
An interesting further generalization of these results would be to consider random density matrices where the columns of the generating matrix X are independent, but the elements within a column are not. Yaskov showed in [20] that, assuming X consists of independent copies of an isotropic p-dimensional (real) vector x_p, the Marchenko-Pastur theorem is equivalent to a concentration property of the quadratic form of the resolvent of (1/n)XX†. In light of this result, it would be interesting to show whether or not the IID condition can be relaxed in Theorems 1 and 2.

Appendix

This yields P(S(ε, µ_n) → S(ε, µ) for all ε > 0) = 1, implying (Theorem 2.2 and Remark 2.3 in [16]) that µ_n converges to µ vaguely on every compact subset of [0, ∞] with probability one. For a finite measure ν on R_{≥0} a transformed measure ν̃ is defined. The function f_z(x) = (x + 1)/(x − z), with f_z(∞) = 1, is continuous on [0, ∞] for all z ∈ C with Im z > 0, hence ∫ f_z dµ̃_n → ∫ f_z dµ̃ with probability one. By the standard Stieltjes continuity theorem (Theorem B.9 on page 515 in [2]) this implies that µ_n converges to µ vaguely. For probability measures, vague convergence is equivalent to weak convergence.