Random State Technology

We review and extend, in a self-contained way, the mathematical foundations of numerical simulation methods that are based on the use of random states. The power and versatility of this simulation technology is illustrated by calculations of physically relevant properties such as the density of states of large single-particle systems, and the specific heat, current-current correlations, density-density correlations, and electron spin resonance spectra of many-body systems. We explore a new field of applications of the random state technology by showing that it can be used to analyze numerical simulations and experiments that aim to realize quantum supremacy on a noisy intermediate-scale quantum processor. Additionally, we show that concepts of the random state technology prove useful in quantum information theory.


I. INTRODUCTION
Random numbers are ubiquitous in computational and theoretical physics. A prime example is random matrix theory, whose application in physics emerged when Eugene Wigner introduced the notion of random matrices to model the nuclei of heavy atoms, explaining the level spacing in the spectra of the nuclei [1]. Since then, random matrix theory has found applications in many other fields such as quantum chaos [2], quantum optics [3], and quantum computing [4], as well as in other disciplines such as engineering and finance [5]. In quantum statistical mechanics, a way to study eigenstate thermalization is to introduce a small perturbation in the form of a random matrix [6]. The resulting model is assumed to be applicable to physical situations, in particular to the study of equilibration of local observables [7,8].
In the past two decades, the concept of so-called canonical typicality has been developed [9][10][11][12][13][14][15]. In essence, the idea is that the reduced density matrix of a system is a thermal state if the whole system is in a pure state drawn from the overwhelming majority of wave functions, that is, a random state on the subspace defined by a small energy interval containing the energy of the microcanonical ensemble. Similar ideas appear in the works of Bocchieri and Loinger [16] and of Tasaki [17], although these authors did not emphasize the role of the random state. Gemmer et al. studied the process of equilibration of a two-level system coupled to a special many-level bath and derived a rate equation for the two levels by Hilbert space averaging [18]. The derivation implicitly uses the notion of the bath being in a pure random state. The thermal pure quantum state formulation of statistical mechanics was given in Refs. [19,20], some elements of which were already contained in S. Lloyd's PhD thesis of 1988 [21]. This thesis was barely noticed in the typicality community until the relevant chapter was made available on arXiv in 2013 [21].
In the numerical-simulation arena, (pseudo-)random numbers are key to all Monte Carlo methods [22][23][24]. The focus of this paper is on the theory and application of a numerical method for computing properties of quantum systems, based on the use of a random state, which in essence is just a collection of random amplitudes defining the state vector of the quantum system. We refer to this numerical tool for calculating various static and dynamic properties of quantum systems as random state technology. In the literature, this technology is also referred to as quantum dynamical typicality [25][26][27][28].
Calculating traces of square matrices is one of the basic computational problems in quantum statistical physics. If calculating each matrix element ⟨i|X|i⟩ for i = 1, ..., D takes O(D) arithmetic operations, computing Tr X = Σ_{i=1}^{D} ⟨i|X|i⟩ requires O(D²) arithmetic operations. In general, D is the dimension of the Hilbert space containing the state vectors describing the quantum system. D grows exponentially with the system size; for instance, for a system of n qubits, D = 2^n. Random state technology makes calculations feasible that would otherwise be out of reach with present-day supercomputers because it changes the operation count from O(D²) to O(D), a significant reduction if D is large.
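To make the O(D²) → O(D) reduction concrete, the following sketch (our own illustration, not taken from the paper; the test matrix, seed, and sizes are arbitrary choices) estimates Tr X from a single random phase state:

```python
import numpy as np

def trace_estimate(X, rng):
    """One-sample estimate of Tr X using a random phase state,
    i.e. amplitudes c_j = exp(i*phi_j)/sqrt(D) with uniform phases."""
    D = X.shape[0]
    c = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, D)) / np.sqrt(D)
    # D * <Phi|X|Phi>; the random phase state is normalized by construction
    return D * np.vdot(c, X @ c)

rng = np.random.default_rng(1)
D = 1024
B = rng.standard_normal((D, D))
X = B @ B.T / D                       # positive semi-definite test matrix
estimate = trace_estimate(X, rng).real
exact = np.trace(X)
rel_error = abs(estimate - exact) / exact   # typically a few percent at D = 1024
```

The single matrix-vector product costs O(D²) only because X is stored as a dense matrix here; for a sparse Hamiltonian, applying X to the state costs O(D), which is the point of the method.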
To the best of our knowledge, the basic idea of the random state technology is due to Alben et al., who used a random phase vector to reduce the numerical work of calculating the density of states (DOS) for a particle moving on a lattice [29]. Similar ideas have been used in the quantum transfer matrix Monte Carlo method [30], the estimation of the eigenvalue spectrum (by calculating the moments of the distribution of eigenvalues) [31], the calculation of the DOS and other properties of the Anderson model and of the spin-1/2 Heisenberg model [32][33][34], and the calculation of the DOS [35][36][37] and linear response functions [38][39][40][41]. A rigorous justification of the working principle of the random state technology, and applications to the calculation of the eigenvalues of very large matrices, the DOS, and static properties of spin models, were given by Hams and De Raedt [42].
One area of applications for which the random state technology proves very useful is the study of decoherence, thermalization, and dissipation of a quantum system interacting with a bath, that is, an open quantum system [43]. A simple but approximate method to treat the dynamics of an open quantum system is to use a master equation for the reduced density matrix of the quantum system [43]. A direct, approximation-free simulation of a relatively large system (quantum system + bath) becomes possible [44][45][46][47][48][49][50][51][52][53][54][55] if we use random state technology to represent the bath state by a projected random state, the so-called thermal pure quantum state [19,20]. Another important class of applications is the calculation of various static and dynamic properties, such as the DOS, the specific heat, the diffusion constant, the optical conductivity, etc., for systems such as the spin-1/2 Heisenberg model [27,[56][57][58][59][60][61], the Hubbard model [62,63], and tight-binding models of 2D materials [64][65][66][67].
For the efficient calculation of dynamic properties, the random state technology has to be supplemented by an efficient method for computing the result of applying the time-evolution operator to the state vector. Methods most often used are Suzuki-Trotter product formula algorithms [68,69], the Chebyshev polynomial algorithm [70][71][72], a Lanczos-iteration based method [73,74], and the Runge-Kutta method [27,75].
The paper is structured as follows. In Sec. II we present an extensive, rigorous analysis of the random state technology using basic tools of calculus only. We derive general results for the mean and variance for different random states, and use Markov's lemma to derive bounds. Then, we discuss various applications of the random state technology such as the calculation of the DOS (Sec. III), the specific heat (Sec. IV), current-current and density-density correlations, and electron spin resonance (ESR) spectra (Sec. V). Sections VI and VII apply the ideas of the random state technology to analyze numerical simulations and experiments that aim to establish quantum supremacy on a noisy intermediate-scale quantum processor [76] and generalize a statement in quantum information theory [77,78]. Sec. VIII summarizes the material presented in this paper.

II. THEORETICAL BACKGROUND
The underlying idea of the random state approach is that accurate approximations to Tr X may be obtained by computing ⟨Φ|X|Φ⟩ with the pure state |Φ⟩ defined by

|Φ⟩ = Σ_{j=1}^{D} c_j |j⟩ ,    (1)

where {|j⟩ | j = 1, ..., D} denotes (any) complete set of orthonormal basis states spanning the D-dimensional Hilbert space and where the complex-valued amplitudes {c_j = c_j(ξ) | j = 1, ..., D} are functions of complex-valued random variables ξ = (ξ_1, ..., ξ_D), i.e. ξ_j = a_j + i b_j with real-valued a_j and b_j, distributed according to a probability density p(ξ). We denote the expectation and variance of a random variable z = z(ξ) with respect to p(ξ) by

E[z] = ∫ z(ξ) p(ξ) dξ ,   Var[z] = E[|z|²] − |E[z]|² .    (2)

We focus on the case where p(ξ) = p(a_1, b_1, ..., a_D, b_D) is a symmetric function of the a's and b's, meaning that its value does not change if we interchange its arguments. Furthermore, we impose the constraint that p(ξ) is an even function of each of the a's and b's and that both the real and the imaginary part of c_j(ξ) = c_j(a_1, b_1, ..., a_D, b_D) are odd functions of each of the a's and b's. Exploiting these symmetries, it is easy to see that

E[c_j* c_k] = δ_{j,k} E[|c|²] ,    (3)

E[c_j* c_k c_l* c_m] = (δ_{j,k} δ_{l,m} (1 − δ_{j,l}) + δ_{j,m} δ_{k,l} (1 − δ_{j,k})) E[|c|² |c̃|²] + δ_{j,k} δ_{k,l} δ_{l,m} E[|c|⁴] ,    (4)

where we used the symmetries of p(ξ) to drop the subscripts in the argument of E[.] and introduced c̃ to keep track of the fact that |c|² and |c̃|² correspond to two different random variables ξ_j and ξ_k with j ≠ k. In this paper, we limit the discussion to the random states listed in Table I. We refer to states generated by the methods implied by Cases A and B as Gaussian random states and to those generated by the method implied by Case C as random phase states [79]. Salient features of these random states are that in practice they are easy to generate and that exact results for their statistical properties can be derived. In Appendix D, we present two different algorithms for Case A and show that the Gaussian random state is drawn from a uniform distribution on the D-dimensional sphere (Haar measure).
The Gaussian random state generated by Case B does not have this property by itself; however, as we normalize the state vector in most applications of quantum theory, Case B is very similar to Case A. Generating the random phase state is close to trivial.

TABLE I. Overview of the combinations of probability densities p(ξ) and amplitudes c_j(ξ) that we consider in this paper. Columns four to six give the moments that appear in Eqs. (3) and (4). Columns seven and eight give the exact expressions (see Appendix A for the proof) of the average and the variance of D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩, respectively.
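In code, all three cases amount to a few lines. The sketch below (NumPy; since Table I is not reproduced here, the Case B scaling that makes E[⟨Φ|Φ⟩] = 1 is our assumed convention) generates one state of each type:

```python
import numpy as np

def case_a(D, rng):
    """Case A: Gaussian amplitudes, explicitly normalized; the state is
    uniform on the unit sphere in C^D (Haar measure), cf. Appendix D."""
    z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
    return z / np.linalg.norm(z)

def case_b(D, rng):
    """Case B: Gaussian amplitudes, not normalized; scaled here so that
    E[<Phi|Phi>] = 1 (an assumed convention)."""
    z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
    return z / np.sqrt(2.0 * D)

def case_c(D, rng):
    """Case C: random phase state, |c_j| = 1/sqrt(D) exactly."""
    phi = rng.uniform(0.0, 2.0 * np.pi, D)
    return np.exp(1j * phi) / np.sqrt(D)

rng = np.random.default_rng(0)
D = 512
psi_a, psi_b, psi_c = case_a(D, rng), case_b(D, rng), case_c(D, rng)
```

Cases A and C are normalized by construction; the norm of the Case B state fluctuates around one with a spread of order 1/√D.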
In the following, we first prove general results using the symmetries of the probability density only. Then, if no further progress can be made, we use Table I to obtain explicit expressions for Cases A, B, and C. Using Eq. (3), we find

E[⟨Φ|X|Φ⟩] = Σ_{j,k=1}^{D} E[c_j* c_k] ⟨j|X|k⟩ = E[|c|²] Tr X .    (5)

Evaluating Eq. (5) for the special case X = 1 yields E[|c|²] = E[⟨Φ|Φ⟩]/D. Using this to eliminate E[|c|²], we have

Tr X = D E[⟨Φ|X|Φ⟩] / E[⟨Φ|Φ⟩] .    (6)

A further, significant reduction of work is possible if we can obtain accurate estimates of the averages in Eq. (6) by using only one "realization" of the random vector ξ. Therefore, in practice, we generate a random vector ξ, construct the pure state |Φ⟩, compute

Tr X ≈ D ⟨Φ|X|Φ⟩ / ⟨Φ|Φ⟩ ,    (7)

and expect that this yields an accurate estimate of Tr X. Markov's lemma [80] (see below) tells us that if the variance of D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩ vanishes, the probability that D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩ is equal to Tr X approaches one. Therefore, in order to assess the usefulness of the "one random state" approximation, we have to compute the variance of D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩. In Appendix A, we derive the exact expressions for the average and the variance of the right-hand side of Eq. (7). These expressions are listed in the seventh and eighth columns of Table I, respectively. From column seven, it follows directly that on average D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩ is equal to Tr X. Note that in Cases A and B, the variance does not depend on the choice of the basis, whereas in Case C, it does. Therefore, in Case C, a proper choice (depending on X) of the basis may help to reduce the variance [79].
At this stage of the discussion, X can be any matrix with entries of any size. In order to have a numerically meaningful measure of the statistical fluctuations, we define the relative variance

rVar[z] ≡ Var[z] / |E[z]|² .    (8)

Assuming for simplicity that X is a normal matrix with eigenvalues λ_j, for Cases A and B we have (up to the irrelevant replacement of D + 1 by D in Case B)

rVar[D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩] = (1/(D + 1)) (⟨|λ|²⟩ − |⟨λ⟩|²)/|⟨λ⟩|² ,    (9)

and for Case C we have

rVar[D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩] ≤ ⟨|λ|²⟩/(D |⟨λ⟩|²) ,    (10)

where ⟨λ^k⟩ ≡ D^{−1} Σ_{j=1}^{D} λ_j^k denotes the average with respect to the index j. The last factor in Eq. (9) is the relative mean square deviation of the eigenvalues of X. If we assume that this number increases more slowly than D + 1, a reasonable assumption for physically relevant problems, then rVar[D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩] vanishes as D → ∞. Similarly, if ⟨λ²⟩/⟨λ⟩² increases more slowly than D, then the left-hand side of Eq. (10) vanishes as D → ∞.
The vanishing of the relative variances as D → ∞ is the key idea behind any successful, practical application of the random state technology. Indeed, once it has been shown that the relative variances vanish as D → ∞, for large D (which is a very large number in most cases of interest) it suffices to use only one realization of a random state |Φ⟩ to obtain an accurate estimate of Tr X, implying that the computational burden has been reduced by a factor D. This can be shown by the following argument. Appealing to a variant of Markov's inequality (Chebyshev's inequality) [80],

P(|A − E[A]| ≥ ε) ≤ Var[A]/ε² ,    (11)

we see that if the variance of a random variable A is sufficiently small, the probability that the value of this random variable deviates from its average value will also be small. We use this basic result in the form

P(|D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩ − Tr X| ≥ ε̃ |Tr X|) ≤ rVar[D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩]/ε̃² .    (12)

This shows that if (D Tr XX† − |Tr X|²)/|Tr X|² (Cases A and B) or D Tr XX†/|Tr X|² (Case C) increases more slowly than D, the probability that the right-hand side of Eq. (7), computed from one realization of a random state, deviates from its expectation value Tr X vanishes as D increases.

It is instructive to estimate the expected error of using Eq. (7) in a different, somewhat more general manner. Let us multiply both sides of Eq. (7) by ⟨Φ|Φ⟩ and define

Δ_X ≡ D ⟨Φ|X|Φ⟩ − ⟨Φ|Φ⟩ Tr X .    (13)

Obviously, Eq. (6) implies E[Δ_X] = 0. The mean square deviation is given by

Var[Δ_X] = E[|Δ_X|²] .    (14)

By straightforward application of Eqs. (2)-(4) we find, with M ≡ D X − (Tr X) 1,

Var[Δ_X] = E[|c|²|c̃|²] (Tr M M† − 2 Σ_j |M_{j,j}|²) + E[|c|⁴] Σ_j |M_{j,j}|²    (15)
         = E[|c|²|c̃|²] (D² Tr XX† − D |Tr X|²) + (E[|c|⁴] − 2 E[|c|²|c̃|²]) Σ_j |D X_{j,j} − Tr X|² ,    (16)

and therefore

Var[Δ_X] = (D Tr XX† − |Tr X|²)/(D + 1)   (Case A),
         = (D Tr XX† − |Tr X|²)/D        (Case B),
         = Tr XX† − Σ_j |X_{j,j}|²       (Case C),    (17)

where the last line follows by substituting the corresponding values of E[|c|²], E[|c|²|c̃|²], and E[|c|⁴] given in Table I. Except for Case B, where there is an irrelevant difference (D + 1 instead of D) in the denominator, the final expressions given by Eq. (17) agree with the exact expressions for the variance given in Table I. Therefore, the expressions in Eq. (17) for the mean square deviation Var[Δ_X] lead to the same conclusion as the one obtained for the exact expressions listed in Table I.
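The decay of the relative variance with D is easy to observe numerically. The following sketch (our own illustration; the model matrix, trial counts, and sizes are arbitrary choices) repeats the one-random-state estimate of Tr X for growing D and records the relative spread of the estimates:

```python
import numpy as np

def one_sample(X, rng):
    """Single random-phase-state (Case C) estimate of Tr X."""
    D = X.shape[0]
    c = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, D)) / np.sqrt(D)
    return (D * np.vdot(c, X @ c)).real

rng = np.random.default_rng(2)
rel_std = []
for D in (64, 256, 1024):
    B = rng.standard_normal((D, D))
    X = B @ B.T / D                    # PSD test matrix; Tr X grows like D
    samples = [one_sample(X, rng) for _ in range(200)]
    rel_std.append(np.std(samples) / np.trace(X))
# rel_std shrinks roughly like 1/sqrt(D) for this family of matrices
```

For this class of test matrices the relative spread decreases roughly as 1/√D, in line with the vanishing relative variances discussed above.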

III. APPLICATION: DENSITY OF STATES
To our knowledge, the first application of the random state technology was the calculation of the density of states (DOS) of an alloy [29]. In this section, we discuss this kind of application from a more general perspective. An example implementation that can be used to reproduce the figures presented in this section and to compute the DOS of many other lattices is available online at [81]. In the following, we set ℏ = 1 for convenience.
Given a state |Ψ⟩, the local density of states LDOS_Ψ(ω) of a system described by a Hamiltonian H is defined by

LDOS_Ψ(ω) = (1/2π) ∫_{−T}^{+T} dt e^{iωt} ⟨Ψ|e^{−itH}|Ψ⟩ ,    (18)

where [−T, T] denotes the relevant time interval (see below for a criterion on how to choose T given H). In practice, T is limited by the amount of computational work it takes to calculate the matrix element in Eq. (18). If |Ψ⟩ is taken to be a random state |Φ⟩, applying Eq. (5) yields

E[LDOS_Φ(ω)] = DOS(ω) ,    (19)

where DOS(ω) = Σ_{n=1}^{D} δ(ω − E_n)/D is, by definition, the density of states, E_n denoting the eigenvalues of H. For numerical work, problems with large D are demanding, but by using the random state technology we can reduce the computational burden by writing

DOS(ω) ≈ (1/2π) ∫_{−T}^{+T} dt e^{iωt} ⟨Φ|e^{−itH}|Φ⟩ .    (20)

In practice, we perform the integration over time in Eq. (18) by means of the Discrete Fourier Transform (DFT). To this end we rewrite Eq. (18) as

LDOS_Ψ(ω) ≈ (τ/2π) Σ_{n=−N}^{N−1} e^{iωnτ} ⟨Ψ|e^{−inτH}|Ψ⟩ ,    (21)

where τ = T/N is the time step at which we sample the function ⟨Ψ|e^{−itH}|Ψ⟩. Accordingly, we obtain DOS(ω) at the frequencies ω = kπ/T for k = −N, ..., N − 1. As before, we see that the random state technology reduces the amount of computational work from O(D²) to O(D).
In practice, the most efficient method to compute ⟨Ψ|e^{−itH}|Ψ⟩ for t = τ, 2τ, ..., Nτ is to solve the time-dependent Schrödinger equation (TDSE) using a suitable method such as the Suzuki-Trotter product formula algorithm [68,69], the Chebyshev polynomial algorithm [70][71][72], a Lanczos-iteration based method [73,74], or the fourth-order Runge-Kutta method [27,75]. The values of ⟨Ψ|e^{−itH}|Ψ⟩ for t = −τ, −2τ, ..., −Nτ are obtained by using ⟨Ψ|e^{+itH}|Ψ⟩ = ⟨Ψ|e^{−itH}|Ψ⟩*. In numerical work, the matrix H is bounded. Denoting by ‖H‖ the maximum absolute value of the eigenvalues of H (i.e., the spectral norm or 2-norm of H), the function LDOS_Ψ(ω) = 0 if |ω| > ‖H‖, that is, this function is band limited. Then Nyquist's sampling theorem tells us that we should sample with a time step τ < π/‖H‖ in order to cover the full range of eigenvalues and avoid aliasing [24]. As τ = T/N, this means that the time interval should be chosen such that T < Nπ/‖H‖. Eigenvalues that differ by less than π/T cannot be resolved.
In summary, the number of samples 2N determines the resolving power of the method, and the time interval 2T has to satisfy T < Nπ/‖H‖ to cover the full range of eigenvalues. For many problems of interest, ‖H‖ is not known, but it can be replaced by a bound such as ‖H‖ ≤ ‖H‖₁ = max_{1≤j≤D} Σ_{i=1}^{D} |H_{i,j}| (i.e., the maximum absolute column sum, which is closely related to the concept of the Gershgorin disk [82]), which is easy to compute.
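For a concrete lattice, both the column-sum bound and the resulting Nyquist step are one-liners. The sketch below (our own illustration) builds the hopping matrix of a one-dimensional ring with hopping v = 1 and compares the bound with the true spectral norm:

```python
import numpy as np

def column_sum_norm(H):
    """||H||_1: maximum absolute column sum, an easily computed
    upper bound on the spectral norm of H."""
    return np.max(np.sum(np.abs(H), axis=0))

# Hopping matrix of a 1D ring with L sites and hopping v = 1
L = 64
H = np.zeros((L, L))
for i in range(L):
    H[i, (i + 1) % L] = H[(i + 1) % L, i] = 1.0

bound = column_sum_norm(H)             # every column has two unit entries -> 2
tau_max = np.pi / bound                # Nyquist limit on the sampling step
spectral_norm = np.max(np.abs(np.linalg.eigvalsh(H)))   # also 2 for this ring
```

For this ring the bound is tight (the eigenvalues are 2cos(2πk/L), so ‖H‖ = 2); in general ‖H‖₁ only overestimates ‖H‖, which merely makes the chosen τ smaller than strictly necessary.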
We illustrate the application of the random state-based technique by computing the DOS for a single particle hopping on a one-dimensional ring, a square lattice, a graphene lattice, and a kagome lattice. The Hamiltonian is given by

H = Σ_{⟨m,n⟩} v_{m,n} (a⁺_m a_n + a⁺_n a_m) + Σ_m w_m a⁺_m a_m ,    (22)

where a⁺_m (a_n) is the fermion creation (annihilation) operator of a particle at site m (n), v_{m,n} are the hopping integrals, and w_m is an on-site potential. The sum over ⟨m,n⟩ runs over all nearest-neighbor bonds of the lattice with m < n. Note that because of the restriction to a one-particle problem, it does not matter whether we use fermion or boson operators. Being a one-particle problem, we can compute all physical properties if we can diagonalize the quadratic form defined by the matrix V with elements v_{m,n} + δ_{m,n} w_m. In some cases (see below), V can be diagonalized analytically for any size or shape of the lattice. If that is not possible, we can diagonalize V numerically if its dimension is not too large, in practice limiting this approach to matrices with a dimension in the range of 10 000 to 100 000. The random state technology can handle (much) larger matrices [42].
The numerical procedure consists of three steps (see also the demonstration available at [81]):

1. Generate a Gaussian random state |Φ⟩ = Σ_m c_m a⁺_m |0⟩ and a copy: |Ψ⟩ = |Φ⟩.

2. Use the first-order real-space Suzuki-Trotter formula [68] to construct the second-order approximation U_2(τ) = U_1^T(τ/2) U_1(τ/2). The order in which the matrix exponentials in Eq. (25) appear can be chosen freely but, once chosen, this order has to be kept fixed. Solve the TDSE by repeating |Ψ(t + τ)⟩ ← (U_2(δτ))^l |Ψ(t)⟩, where δτ = τ/l and l ≥ 1 controls the accuracy of the product formula approximation.

3. Compute the DFT of f(t) g(t), where f(t) = ⟨Φ|Ψ(t)⟩ and g(t) is a Gaussian window function.
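A minimal end-to-end sketch of the three steps (our own illustration; for a small ring we replace the Suzuki-Trotter propagation of step 2 by exact evolution in the eigenbasis, which is possible only because the example is tiny):

```python
import numpy as np

# Hopping matrix of a 1D ring (v = 1), small enough to diagonalize exactly
L = 128
H = np.zeros((L, L))
for i in range(L):
    H[i, (i + 1) % L] = H[(i + 1) % L, i] = 1.0
E, U = np.linalg.eigh(H)

# Step 1: Gaussian random state
rng = np.random.default_rng(3)
z = rng.standard_normal(L) + 1j * rng.standard_normal(L)
a = U.conj().T @ (z / np.linalg.norm(z))     # amplitudes in the eigenbasis

# Step 2: f(t) = <Phi|exp(-itH)|Phi>, here evaluated exactly
N = 64
tau = 0.5 * np.pi / np.max(np.abs(E))        # safely below the Nyquist limit
t = tau * np.arange(-N, N)
f = (np.abs(a) ** 2) @ np.exp(-1j * np.outer(E, t))

# Step 3: DFT of f(t) g(t) with a Gaussian window g(t)
g = np.exp(-0.5 * (t / (N * tau / 3.0)) ** 2)
dos = np.abs(np.fft.fftshift(np.fft.fft(np.fft.ifftshift(f * g))))
omega = 2.0 * np.pi * np.fft.fftshift(np.fft.fftfreq(2 * N, tau))
# the reconstructed DOS is concentrated in the band |omega| <= 2
```

The window width and the number of samples are arbitrary choices that trade frequency resolution against broadening; the spectrum stays confined to the band [−2, 2] of the ring.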
In Fig. 1, we present the (well-known) results for the DOS of the four mentioned lattices with v mn = v and w m = 0, as obtained by the above procedure.
For problems for which all eigenvalues of V are known in analytical or numerical form, such as the examples for a particle moving on one of the lattices considered above, there exist much more efficient methods to compute the (L)DOS than the one based on solving the TDSE. For instance, simply counting the number of times that the energies fall in bins [E, E + ∆E] gives the DOS in the form of a histogram. However, the method based on random states in combination with the solution of the TDSE excels when the calculation of all the eigenvalues of H is prohibitive. We illustrate this point by Fig. 2 where we present DOS results for the two-dimensional Anderson model of localization and a square lattice with rather exotic anisotropic hopping integrals.

IV. APPLICATION: QUANTUM STATISTICAL PHYSICS
An important class of problems for which the random state technology can be put to good use is the calculation of thermal equilibrium averages of observables of a quantum system, described in terms of its Hamiltonian H = H†. If Y = Y† represents the matrix of such an observable, the task is to compute

⟨Y⟩ = Tr e^{−βH} Y / Tr e^{−βH} = Tr e^{−βH/2} Y e^{−βH/2} / Tr e^{−βH} ,    (26)

where β denotes the inverse temperature in units of 1/k_B and we used the invariance of the trace under cyclic permutation (Tr AB = Tr BA) to bring Y into a form that is most suited for the application of the random state technology. According to the general recipe laid out in Section II, the idea is to replace the calculation of the traces by the calculation of only one matrix element. The first step in the procedure is to generate a random state |Φ⟩, by using for instance one of the methods given in Appendix D. Next, we compute the so-called random thermal state defined by

|Φ_β⟩ = e^{−βH/2} |Φ⟩ ,    (27)

whereby it is implicitly understood that one has an efficient and accurate algorithm to compute e^{−βH/2}|Φ⟩. In practice, the Chebyshev polynomial representation of e^{−βH/2} can be used to compute e^{−βH/2}|Φ⟩ to almost machine precision with close to optimal efficiency [38,70,71,83]. The final computational step is then to estimate the thermal equilibrium expectation value according to

⟨Y⟩ ≈ ⟨Φ_β|Y|Φ_β⟩ / ⟨Φ_β|Φ_β⟩ = ⟨Φ|X|Φ⟩ / ⟨Φ|Z|Φ⟩ ,    (28)

where X = e^{−βH/2} Y e^{−βH/2} and Z = e^{−βH}. As before, in order to show that the replacement of the traces by single matrix elements makes sense, one has to show that the variance of the right-hand side of Eq. (28) is small. Because of the presence of Z in the denominator, we are unable to calculate the average or the variance of ⟨Φ|X|Φ⟩/⟨Φ|Z|Φ⟩ exactly, except if β = 0, in which case the problem reduces to the one treated in Section II. Therefore, we have to resort to the approximate treatment based on Eqs. (B1) and (B2). To calculate the first three terms in Eqs. (B1) and (B2), we only need the first and second moments of ⟨Φ|X|Φ⟩ and ⟨Φ|Z|Φ⟩, which follow from Eqs. (3) and (4). It is then straightforward to use the expressions for the moments given in Table I and obtain the expressions for the average and the variance of ⟨Φ|X|Φ⟩/⟨Φ|Z|Φ⟩. For example, in Case B we find that both the average (Eq. (30)) and the variance (Eq. (31)) consist of the exact result plus correction terms proportional to Tr Z²/(Tr Z)². The absolute values of the contributions in the curly brackets of these expressions are readily shown to be bounded by 2‖Y‖ (Eq. (30)) and 4‖Y‖² (Eq. (31)). In practice, we are only interested in observables Y for which ‖Y‖ = O(log D). Therefore, the magnitude of the correction to, and of the variance of, the thermal average of Y is primarily determined by the factor Tr Z²/(Tr Z)². Writing F(β) = −β^{−1} ln Tr e^{−βH} for the free energy, we have, in general,

e^{−2β(F(2β)−F(β))} = Tr Z²/(Tr Z)² ≥ 1/D ,    (32)

where the inequality follows from applying the Schwarz inequality to the inner product Tr A†B (with A = e^{−βH} and B = 1). Furthermore,

Tr Z² = Σ_n e^{−2βE_n} ≤ (Σ_n e^{−βE_n})² = (Tr Z)² .    (33)

From the left-hand side of Eq. (32) and Eq. (33), it follows immediately that F(2β) ≥ F(β). This is in concert with the second law of thermodynamics. Furthermore, since the free energy is an extensive quantity, Tr Z²/(Tr Z)² vanishes exponentially with increasing system size. If we assume that ‖Y‖ is at most proportional to the system size then, using the fact that the terms in the curly brackets are bounded and their prefactors vanish exponentially with increasing system size, the correction to the average and the variance itself vanish in the same manner. Equation (32) also suggests that the efficiency of the random state approach is lost when β → ∞. Indeed, when β → ∞, the random thermal state turns into the ground state |0⟩ of H, that is, lim_{β→∞} |Φ_β⟩ → |0⟩, but we cannot recommend calculating the ground state by projection with e^{−βH/2} because, for large β, the efficiency of this projection is inferior to that of, for instance, the Lanczos method.
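The recipe of projecting a random state with e^{−βH/2} and then forming the ratio of matrix elements can be sketched as follows (our own illustration on a tiny XX chain of our choosing; for this small system we apply e^{−βH/2} exactly via the eigendecomposition, whereas in production one would use a Chebyshev expansion):

```python
import numpy as np

def xx_chain(n):
    """H = sum_i (S^x_i S^x_{i+1} + S^y_i S^y_{i+1}) for an open chain
    of n spins-1/2 (a small test model, not one of the paper's lattices)."""
    sx = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
    sy = 0.5 * np.array([[0.0, -1.0j], [1.0j, 0.0]])
    def site(op, i):
        out = np.eye(1)
        for j in range(n):
            out = np.kron(out, op if j == i else np.eye(2))
        return out
    H = sum(site(sx, i) @ site(sx, i + 1) + site(sy, i) @ site(sy, i + 1)
            for i in range(n - 1))
    return H.real

n, beta = 10, 1.0
H = xx_chain(n)
D = 2 ** n
E, U = np.linalg.eigh(H)

# Random thermal state |Phi_beta> = exp(-beta*H/2) |Phi>
rng = np.random.default_rng(4)
z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
phi = z / np.linalg.norm(z)
phi_beta = U @ (np.exp(-0.5 * beta * E) * (U.conj().T @ phi))

# One-random-state estimate of <H> versus the exact thermal average
energy_est = np.vdot(phi_beta, H @ phi_beta).real / np.vdot(phi_beta, phi_beta).real
w = np.exp(-beta * E)
energy_exact = (w @ E) / w.sum()
```

With D = 1024 the single-realization estimate typically agrees with the exact thermal energy to within the statistical error discussed above.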
It is instructive to illustrate these general features through a very simple example. To this end, we compute e^{−2β(F(2β)−F(β))} = Tr Z²/(Tr Z)² for a system of N non-interacting spins, described by the Hamiltonian H = −Ω Σ_{i=1}^{N} σ_i^z, where σ_i^z is the z-component of the Pauli matrix of spin i, having eigenvalues ±1. A simple calculation yields

e^{−2β(F(2β)−F(β))} = [cosh(2βΩ) / (2 cosh²(βΩ))]^N ,    (34)

which shows that e^{−2β(F(2β)−F(β))} smoothly increases from D^{−1} = 2^{−N} to one as β increases from zero to infinity. For β < ∞, Eq. (34) vanishes exponentially with increasing N.
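Our reading of the closed form for Eq. (34) is Tr Z²/(Tr Z)² = [cosh(2βΩ)/(2cosh²(βΩ))]^N; the sketch below cross-checks this expression against a direct evaluation over the 2^N eigenvalues of H (which come with binomial multiplicities):

```python
import numpy as np
from math import comb

def ratio_closed_form(beta, omega, N):
    """[cosh(2*beta*omega) / (2*cosh(beta*omega)**2)]**N,
    our reading of Eq. (34)."""
    return (np.cosh(2 * beta * omega) / (2 * np.cosh(beta * omega) ** 2)) ** N

def ratio_direct(beta, omega, N):
    """Tr e^{-2 beta H} / (Tr e^{-beta H})^2 for H = -omega * sum_i sigma^z_i;
    the eigenvalue -omega*(N - 2k) occurs with multiplicity C(N, k)."""
    k = np.arange(N + 1)
    mult = np.array([comb(N, int(j)) for j in k], dtype=float)
    E = -omega * (N - 2 * k)
    z1 = mult @ np.exp(-beta * E)
    z2 = mult @ np.exp(-2 * beta * E)
    return z2 / z1 ** 2

N = 10
vals = [ratio_closed_form(b, 1.0, N) for b in (0.0, 0.5, 2.0)]
```

At β = 0 the ratio equals 2^{−N} = 1/D, and it increases monotonically toward one with growing β, as stated in the text.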
In summary, the factor Tr Z²/(Tr Z)² determines the rate at which the variance in Eq. (31) vanishes with the system size, but this factor can, depending on the temperature 1/β, vary from 1/D to close to one. In the latter case, the random state technology loses its efficiency.

A. Specific heat
In this section, we use the random state technology to calculate the specific heat of frustrated and non-frustrated spin-1/2, nearest-neighbor, antiferromagnetic Heisenberg models. The Hamiltonian is defined by

H = −J Σ_{⟨i,j⟩} S_i · S_j ,    (35)

where S_i is the spin-1/2 operator at the i-th site, ⟨i,j⟩ refers to nearest-neighbor sites on the lattice, and J = −1 is the antiferromagnetic coupling. The specific heat is calculated from the fluctuation of the energy, i.e., as C = β²(⟨H²⟩ − ⟨H⟩²). Numerically, there are only two important operations involved. The first is the preparation of the random thermal state (projection of a random state, Eq. (27)) and the second is the operation H|Φ_β⟩. There are several ways to do the projection, e.g., by the power method [84], a product formula algorithm [68,69], or the Chebyshev polynomial algorithm [70][71][72]. These methods only need storage for a few state vectors, and the numerical errors are well under control. Another method, widely used in the solid-state physics community, is the finite-temperature Lanczos method [73]. This method uses the eigenvalues and eigenstates obtained from the standard Lanczos method to calculate the matrix element of the operator Y. It requires storage of a number of wave functions proportional to the Lanczos iteration order M and reproduces the results of the high-temperature expansion up to order M. According to the theory given in the previous section, the errors due to the use of the random state are bounded (simply replace Y by H or H²) and vanish as D → ∞. Figure 3 shows the results for the specific heat for three different lattices, namely, the square, triangular, and kagome lattice. For all cases, the lattice size is N = 36 and periodic boundaries are used. For both the triangular and the kagome lattice, the shape of the lattice is a rhombus. Within the range of temperatures covered, we clearly see that for the square and triangular lattice, there is only one peak in the specific heat, located around T/J = 0.6 and T/J = 0.2, respectively.
The specific heat for the kagome lattice shows multiple peaks. The first peak is around T /J = 0.6. The second peak is around T /J = 0.1, shoulder-like and greatly reduced compared to results obtained for system sizes less than N = 36 (data not shown).
Our results are consistent with those obtained from exact diagonalization [85], the finite-temperature Lanczos method (FTLM) [86,87], and transfer-matrix quantum Monte Carlo methods [88]. Shimokawa and Kawamura used the random state technology to calculate the finite-temperature properties of the antiferromagnetic Heisenberg model on the kagome lattice up to size 36 [89]. Sugiura and Shimizu developed the same type of method and demonstrated its use by calculating the specific heat of the same model up to size 30 [20]. At even lower temperatures, a third peak appears; see Ref. [86], in which the finite-temperature Lanczos method was used to calculate the thermal properties of a kagome lattice with up to N = 42 spins.
As the temperature decreases further, the factor Tr Z²/(Tr Z)² approaches one. The corresponding loss of statistical accuracy can only be compensated for by averaging over different realizations of the random states [42], see also Appendix C. As this paper focuses on the basic principles of the random state technology, we do not pursue the aspect of averaging over different realizations of the random states here and, therefore, we do not calculate the specific heat for T/J < 0.01. Instead, as an illustration, we also include in Fig. 3 the results obtained by using different random states, for temperatures down to T/J = 0.1. Figure 3 shows that the data for the kagome lattice, obtained using two different initial random states, agree with each other down to T/J = 0.1, while the results for the other two lattices only agree down to temperatures of about T/J = 1. This can be understood as follows. At sufficiently low temperatures, Tr Z²/(Tr Z)² is controlled by the factor exp[−β(E_1 − E_0)], where E_1 − E_0 is the gap between the ground state and the first excited state. Because this gap is much smaller for the kagome lattice than for the square and triangular lattices, the temperature at which Tr Z²/(Tr Z)² approaches one (and the variance may become large) is about 30 times higher for the square and triangular lattice than for the kagome lattice. In other words, if we want to compute temperature-dependent averages with approximately the same accuracy and exp[−β(E_1 − E_0)] becomes small, it is necessary to average over more than one realization of the random state.
For extremely low temperatures, the specific heat can be calculated from the exact low-lying eigenvalues, which can be obtained by several methods, such as Lanczos-based algorithms [90] and the Sakurai-Sugiura method [91]. For moderately low temperatures, Morita and Tohyama proposed two algorithms that improve on the FTLM by utilizing the exact low-lying eigenvalues and eigenvectors [92]. The first, so-called replaced FTLM, algorithm replaces the energies in the FTLM by the exact low-lying energies. The second, so-called orthogonalized FTLM, algorithm starts the FTLM from random states orthogonal to all the exact low-lying eigenvectors. The effectiveness of these algorithms is illustrated by results for the specific heat and the structure factor of Kitaev-Heisenberg models on kagome and triangular lattices for systems with up to N = 36 spins. The second algorithm can easily be adapted to the projection method used to calculate the specific heat in the present paper.
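The two operations singled out at the beginning of this subsection (projection onto the random thermal state, then applying H) can be sketched as follows; our illustration uses a small open Heisenberg chain of our choosing and exact exponentiation in place of the Chebyshev or product-formula projection:

```python
import numpy as np

def heisenberg_chain(n):
    """Open spin-1/2 antiferromagnetic Heisenberg chain,
    H = sum_i S_i . S_{i+1} (energy scale |J| = 1)."""
    sz = np.diag([0.5, -0.5])
    sp = np.array([[0.0, 1.0], [0.0, 0.0]])      # S^+ ; sp.T is S^-
    def site(op, i):
        out = np.eye(1)
        for j in range(n):
            out = np.kron(out, op if j == i else np.eye(2))
        return out
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n - 1):
        H += site(sz, i) @ site(sz, i + 1)
        H += 0.5 * (site(sp, i) @ site(sp.T, i + 1)
                    + site(sp.T, i) @ site(sp, i + 1))
    return H

n, beta = 10, 0.5
H = heisenberg_chain(n)
D = 2 ** n
E, U = np.linalg.eigh(H)

rng = np.random.default_rng(5)
z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
phi_beta = U @ (np.exp(-0.5 * beta * E) * (U.conj().T @ (z / np.linalg.norm(z))))

norm = np.vdot(phi_beta, phi_beta).real
e1 = np.vdot(phi_beta, H @ phi_beta).real / norm          # <H>
e2 = np.vdot(H @ phi_beta, H @ phi_beta).real / norm      # <H^2>
C_est = beta ** 2 * (e2 - e1 ** 2)                        # C = beta^2 (<H^2> - <H>^2)

w = np.exp(-beta * E)
C_exact = beta ** 2 * ((w @ E ** 2) / w.sum() - ((w @ E) / w.sum()) ** 2)
```

Note that C_est is non-negative by construction (it is a variance with respect to the projected state) and, at this moderate β, agrees with the exact specific heat to within the single-realization statistical error.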

V. APPLICATION: QUANTUM DYNAMICS
Random state technology can also be used for the calculation of the expectation values of time- or frequency-dependent observables. In this section, we illustrate this fact by showing calculations of the current-current correlation, the density-density correlation, and the ESR spectrum for spin-1/2 models. For examples of the application of the random state technology to the calculation of the dynamic properties of 2D materials, see Refs. [64,65].

A. Current-current correlation

In this subsection we focus on the current-current autocorrelation function defined by

C(t) = ⟨j(t) j(0)⟩ = Tr e^{−βH} j(t) j(0) / Tr e^{−βH} ,    (36)

where j is the (problem-dependent) current operator and j(t) = e^{itH} j e^{−itH}. If we set Y = j(t) j(0), it is obvious that the random state technology can be applied to calculate the current-current correlation, Eq. (31) guaranteeing that the variance of C(t) vanishes exponentially with increasing system size. In numerical work, we introduce the two auxiliary states

|ψ_1(t)⟩ = e^{−itH} j e^{−βH/2} |Φ⟩ ,    (37)
|ψ_2(t)⟩ = e^{−itH} e^{−βH/2} |Φ⟩ ,    (38)

and obtain the autocorrelation function C(t) by computing

C(t) = ⟨ψ_2(t)| j |ψ_1(t)⟩ / ⟨Φ|e^{−βH}|Φ⟩ .    (39)

As already mentioned, the action of the time evolution operator e^{−itH} on a state vector can be computed by algorithms such as the fourth-order Runge-Kutta scheme [27,75], the Suzuki-Trotter product formula algorithm [68,69], and the Chebyshev polynomial algorithm [70][71][72]. We take as an example the 1D spin-1/2 Heisenberg XXZ ring defined by

H = −J Σ_{i=1}^{N} (S_i^x S_{i+1}^x + S_i^y S_{i+1}^y + Δ S_i^z S_{i+1}^z) ,    (40)

where J = −1 sets the energy scale and Δ = 1.5 is the anisotropy. From the lattice version of the continuity equation, it follows that the spin current operator j is given by

j = J Σ_{i=1}^{N} (S_i^x S_{i+1}^y − S_i^y S_{i+1}^x) .    (41)

Spin transport properties can be obtained by Fourier transforming the current-current correlation function. Figure 4 shows the results for the current-current autocorrelation function for two system sizes N = 37, 38 obtained by using the random state technology. The Suzuki-Trotter product formula algorithm [68,69] was used to compute the time evolution of the two state vectors. For simplicity, we choose β = 0.
The data for the two different system sizes follow the same curve up to tJ ≈ 15 and then start to deviate from each other. This suggests that the current-current autocorrelation function shows strong finite-size effects, except at relatively short times. The data presented here are in concert with earlier results [93][94][95]. A way to alleviate this effect is to apply the idea of a numerical linked cluster expansion, which allows the accurate estimation of C(t) for much longer times [96,97].
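At β = 0 the two auxiliary states reduce to |ψ_1(t)⟩ = e^{−itH} j|Φ⟩ and |ψ_2(t)⟩ = e^{−itH}|Φ⟩. The sketch below (our own illustration; a small ring with exact time evolution replaces the Suzuki-Trotter propagation used in the paper) computes C(t) this way and checks C(0) against the exact trace Tr j²/D:

```python
import numpy as np

sx = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
sy = 0.5 * np.array([[0.0, -1.0j], [1.0j, 0.0]])
sz = np.diag([0.5, -0.5])

def site(op, i, n):
    out = np.eye(1)
    for j in range(n):
        out = np.kron(out, op if j == i else np.eye(2))
    return out

def xxz_ring(n, delta):
    """Spin-1/2 XXZ ring; at beta = 0 the overall sign of H is irrelevant."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n):
        k = (i + 1) % n
        H += site(sx, i, n) @ site(sx, k, n) + site(sy, i, n) @ site(sy, k, n)
        H += delta * (site(sz, i, n) @ site(sz, k, n))
    return H

def spin_current(n):
    """j = sum_i (S^x_i S^y_{i+1} - S^y_i S^x_{i+1})  (units of |J| = 1)."""
    j = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n):
        k = (i + 1) % n
        j += site(sx, i, n) @ site(sy, k, n) - site(sy, i, n) @ site(sx, k, n)
    return j

n, delta = 8, 1.5
D = 2 ** n
H, jop = xxz_ring(n, delta), spin_current(n)
E, U = np.linalg.eigh(H)

rng = np.random.default_rng(6)
z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
phi = z / np.linalg.norm(z)

def C_of_t(t):
    """C(t) = <psi_2(t)| j |psi_1(t)> with |psi_1(t)> = e^{-itH} j |Phi>
    and |psi_2(t)> = e^{-itH} |Phi>  (beta = 0)."""
    evo = U @ np.diag(np.exp(-1j * E * t)) @ U.conj().T
    return np.vdot(evo @ phi, jop @ (evo @ (jop @ phi)))

C0_est = C_of_t(0.0).real
C0_exact = np.trace(jop @ jop).real / D    # Tr j^2 / D; equals n/8 for this j
```

For this current operator Tr j²/D = n/8 exactly, which provides a sharp check, and the Cauchy-Schwarz inequality guarantees |C(t)| ≤ C(0) up to statistical noise.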

B. Density-density correlation
The density-density correlation is intimately related to the current-current correlation via the continuity equation, but is of interest by itself. It is defined by

C(t) = ⟨n(t) n(0)⟩ = Tr e^{−βH} n(t) n(0) / Tr e^{−βH} ,    (42)

where n is the density operator and n(t) = e^{itH} n(0) e^{−itH}. We only have to set Y = n(t)n(0) to see that the random state technology can safely be applied to the density-density correlation function as well. Numerically, we can calculate this quantity by exactly the same technique as the one used for the current-current correlation, namely from the time evolution of two projected state vectors. For β = 0, we can eliminate one of the time evolutions. For a local density operator, it is always possible to find its square root, either analytically or, if necessary, numerically. Then, it is straightforward to show that

⟨n(t) n(0)⟩ = ⟨ψ(t)| n |ψ(t)⟩ ,    (43)

where |ψ(t)⟩ = e^{−itH} √n(0) |Φ⟩ is an unnormalized state vector. Note that if the density operator has negative eigenvalues, the square root is obtained after shifting the eigenvalues by the lowest eigenvalue, and the above formula should be changed accordingly. We illustrate the method by considering once more the Heisenberg XXZ model, Eq. (40). As the density operator, we take the local occupation number n_l = S_l^z + 1/2. Note that √n_l = n_l here. The density-density correlation function at β = 0 is given by

p_l(t) = ⟨ψ(t)| n_l |ψ(t)⟩ / ⟨ψ(t)|ψ(t)⟩ ,    (44)

where |ψ(t)⟩ = e^{−itH} n_{L/2} |Φ⟩, ⟨ψ(t)|ψ(t)⟩ = 1/2, and p_l is the expectation value of n_l (after proper normalization of the state vector). As only one time evolution of the state vector |ψ(t)⟩ is needed, using the same resources as in the case of the current-current correlations allows us to simulate a system with one extra spin, that is, the 1D XXZ model with N = 39 spins. Figure 5 shows the results for p_l. The initial state produces a peak in the center of the chain, i.e., p_20 = 1. As a function of time, this peak spreads over the neighboring sites and the density profile shows a tendency to become stationary.
Within the maximum time tJ = 30 shown, the profile does not reach the boundary yet, indicating that up to this time finite-size effects are not yet relevant. The width of the profile clearly grows slowly; in fact, it fits well to a square-root increase, after an initial linear increase for short time scales tJ ≤ 1. The data provide clear evidence for diffusion in the integrable XXZ model with large anisotropy ∆ > 1. Our results agree with results obtained from time-dependent density matrix renormalization group calculations [98]. Further detailed discussions about diffusion and data for other models can be found in Refs. [58,60,61,63].
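The procedure just described can be sketched in a few lines for a chain small enough to diagonalize exactly. The chain length, coupling values, and all variable names below are illustrative choices of ours, not the production setup of the text:

```python
import numpy as np

# Minimal sketch: the infinite-temperature density profile p_l(t) of a
# spin-1/2 XXZ chain, computed from |psi(t)> = exp(-itH) n_{N/2} |Phi>.
rng = np.random.default_rng(7)
N, J, Delta = 8, 1.0, 2.0                  # small chain, large anisotropy Delta > 1
D = 2**N
bits = (np.arange(D)[:, None] >> np.arange(N)) & 1   # n_l in each basis state

# XXZ Hamiltonian H = J sum_l [ (S+_l S-_{l+1} + h.c.)/2 + Delta Sz_l Sz_{l+1} ]
H = np.zeros((D, D))
sz = bits - 0.5
for l in range(N - 1):
    H += np.diag(J * Delta * sz[:, l] * sz[:, l + 1])
    flip = np.arange(D) ^ ((1 << l) | (1 << (l + 1)))  # exchange bits l, l+1
    H[np.arange(D), flip] += 0.5 * J * (bits[:, l] != bits[:, l + 1])

E, V = np.linalg.eigh(H)                   # exact diagonalization (D = 256 here)

phi = rng.normal(size=D) + 1j * rng.normal(size=D)
phi /= np.linalg.norm(phi)                 # Gaussian random state |Phi> (case A)
psi0 = bits[:, N // 2] * phi               # n_{N/2}|Phi>: n_{N/2} is diagonal

def profile(t):
    psi_t = V @ (np.exp(-1j * E * t) * (V.conj().T @ psi0))
    w = np.abs(psi_t)**2
    return (w[:, None] * bits).sum(axis=0) / w.sum()   # p_l(t)

p0 = profile(0.0)
print(np.round(p0, 3))                     # p_{N/2} = 1 at t = 0, by construction
```

Here ⟨ψ(t)|ψ(t)⟩ ≈ 1/2, as stated in the text, the total density ∑_l p_l(t) is conserved, and the spreading of the central peak can be followed by evaluating profile(t) on a grid of times.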

C. Electron spin resonance
According to linear response theory, the ESR signal is related to the Fourier transform of the magnetization autocorrelation function [99]. The ESR signal averaged over a period 2T is proportional to [99]

C(ω, T) = (1/2T) ∫_{−T}^{T} dt e^{iωt} ⟨M^x(t) M^x(0)⟩,

where M^x(t) = e^{itH} M^x(0) e^{−itH} and M^x is the x-component of the total magnetization. Setting Y equal to the corresponding operator X(ω) whose expectation value with respect to |Φ⟩ yields C(ω, T), and noting that X(ω) = X†(ω), we can use Eq. (31) to show that the variance of C(ω, T) vanishes exponentially with increasing system size.
Again, we take the XXZ model as an example, except that now we add an extra term to Eq. (40), given by H_m = −h ∑_i S^z_i, to represent the external magnetic field. For simplicity, we do not include the dipole-dipole interaction term [100]. The parameters are taken to be J = −1, ∆ = 0.84, and h = 5, and we adopt open boundary conditions. Figure 6 presents the results for the ESR spectra obtained from simulating the given XXZ model up to times tJ = 4096 for N = 30 and tJ = 2048 for N = 32 and 34. As the scale of the x-axis of Fig. 6 indicates, this particular calculation requires a high resolution in the frequency domain. Accordingly, we need simulation data for a very large time interval, effectively limiting the system sizes we can study within a reasonable amount of elapsed time. Therefore, we computed data for system sizes N = 30, 32, 34 and β = 0 only.
As Fig. 6 shows, the ESR line shape clearly displays a four-peak structure, which fits well to a sum of four Lorentzians [101]. The separation between the two central peaks seems to decrease as N increases from N = 30 to N = 34. This may suggest that the central double-peak structure disappears in the thermodynamic limit. However, on the basis of numerical data, it is difficult to draw a definite conclusion about the vanishing of the central double-peak structure. In fact, this is a subtle issue and we refer the interested reader to Refs. [102][103][104] for additional data on the size and temperature dependence of the ESR line shape.
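As a hedged illustration of this procedure (with N = 6 instead of 30-34, a short time grid, and a window function of our own choosing, so everything runs in seconds), one can compute g(t) = ⟨Φ|M^x(t)M^x|Φ⟩ for the XXZ chain with the Zeeman term and Fourier transform it:

```python
import numpy as np

# Toy ESR line shape: Fourier transform of the M^x autocorrelation
# <Phi| M^x(t) M^x |Phi> estimated from a single random state at beta = 0.
rng = np.random.default_rng(1)
N, J, Delta, h = 6, -1.0, 0.84, 5.0        # couplings from the text, tiny N
D = 2**N
bits = (np.arange(D)[:, None] >> np.arange(N)) & 1
sz = bits - 0.5

H = np.zeros((D, D))                       # XXZ chain plus Zeeman term -h sum_i Sz_i
for l in range(N - 1):
    H += np.diag(J * Delta * sz[:, l] * sz[:, l + 1])
    flip = np.arange(D) ^ ((1 << l) | (1 << (l + 1)))
    H[np.arange(D), flip] += 0.5 * J * (bits[:, l] != bits[:, l + 1])
H -= np.diag(h * sz.sum(axis=1))

Mx = np.zeros((D, D))                      # M^x = sum_l Sx_l flips one spin
for l in range(N):
    Mx[np.arange(D), np.arange(D) ^ (1 << l)] += 0.5

E, V = np.linalg.eigh(H)
phi = rng.normal(size=D) + 1j * rng.normal(size=D)
phi /= np.linalg.norm(phi)

# g(t) = <Phi(t)| Mx |chi(t)> from two time-evolved vectors in the eigenbasis
a = V.conj().T @ phi
b = V.conj().T @ (Mx @ phi)
dt, n = 0.05, 2048
g = np.array([np.vdot(V @ (np.exp(-1j * E * t) * a),
                      Mx @ (V @ (np.exp(-1j * E * t) * b)))
              for t in dt * np.arange(n)])

# Windowed Fourier transform; the line is expected near the Zeeman frequency h
S = np.abs(np.fft.fft(g * np.hanning(n)))
w = 2 * np.pi * np.fft.fftfreq(n, d=dt)
print(abs(w[np.argmax(S)]))
```

Because total S^z is conserved, M^x only connects sectors differing by one unit of magnetization, so the spectral weight concentrates near |ω| ≈ h, shifted and split by the exchange terms.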

VI. APPLICATION: QUANTUM SUPREMACY
Recently, Gaussian random states (see Table I, case A) found new applications in the field of quantum information processing. In this rather long section, we scrutinize the possibility of letting a universal quantum computer generate Gaussian random states. We address the conceptual differences of realizing such states on a conventional digital computer and a hypothetical, universal quantum computer. We also discuss the difficulties that arise in designing and testing quantum circuits which are tailored to generate states that exhibit features characteristic of Gaussian random states.
A demonstration that a programmable quantum device can perform a (not necessarily useful) task which is out of reach for present-day and near-future supercomputers is called quantum supremacy [4,76,105]. Assessing the potential of such a quantum device involves performing a series of established tests, so-called benchmarks. In this section and in section VII, the term "benchmarking" refers to measuring the quality of a quantum information processor as a computing device by comparing experimental data produced by the processor with results obtained from (a computer simulation on a digital computer of) the quantum theoretical model for the device. More specifically, this section discusses benchmarking of the Sycamore superconducting processor [76] using the cross entropy [4] (see section VI C) as a measure. In section VII, we present theoretical results related to the theory of "randomized" benchmarking [106,107], which involves averaging the fidelity [77] of experiments performed with different, randomly chosen, quantum circuits.
We start by giving an outline of the content of this section. In subsection VI A, we analyze the general problem of sampling from a probability distribution that is specified through the amplitudes of a pure quantum state such as the Gaussian random state of Table I, case A. We compare the resources that are needed to perform this sampling on a digital computer and on the theoretical model of a quantum computer. We also address the complexity of constructing and validating quantum gate circuits that are designed to generate states that exhibit features characteristic of Gaussian random states.
Subsection VI B discusses general aspects of the recent quantum supremacy experiment with the Sycamore superconducting processor [76]. Subsection VI C introduces the measure, the cross entropy, that is used to discriminate between outcomes that are distributed uniformly over all possibilities and outcomes reflecting the quantum gate sequence that the device was instructed to carry out.
In subsection VI D, we use the maximum entropy method and the knowledge of the measured cross entropy to establish a relation between the theoretically expected probability distribution and the unknown probability distribution that describes the observed outcomes.
In subsection VI E, we adopt the Gaussian random state (see Table I, case A) as the model of the theoretically expected probability distribution. We show that, given a numerical estimate of the cross entropy, the maximum entropy method [108] yields a unique solution for the probability distribution that describes the observed outcomes. This solution indeed allows us to discriminate between a device that generates outcomes reflecting the applied quantum gate sequence and a device that produces uniformly distributed outcomes.
Finally, in section VI F, we combine supercomputer simulations with the experimental data produced by the Sycamore superconducting processor [76] to assess the claim that quantum supremacy has been demonstrated.
A. Using a quantum computer to sample from a random state

To date, applications of the random state technology employ digital computers to solve problems. Obviously, for this purpose, we need to be able to store at least one pure state |Φ⟩ = ∑_{j=0}^{D−1} c_j |j⟩ in the computer's memory. This means that the computer should have enough memory to store the D complex numbers c_j. A supercomputer with 1 PiB (2^50 bytes) of random access memory (RAM) can store 2^46 complex double-precision coefficients c_j. Translating this into how many qubits (or S = 1/2 spins) we can represent with such a large amount of RAM, we find the disappointingly low number L = 46. Every time we add a qubit to the system, we have to double the amount of memory, just to be able to store the state |Φ⟩. The required amount of memory D = 2^L grows exponentially with the number of qubits L. Clearly, the cost of such a digital computer, its power consumption, and its speed of operation severely limit the number of quantum objects one is able to study by application of the otherwise very efficient random state technology.
Random states |Φ that are drawn from the uniform distribution on the 2D-dimensional unit sphere, Case A of Table I, are not only very convenient for many applications of the random state technology but, as shown in section VII and will become clear later in this section, they are important for the theory of randomized benchmarking of real quantum processors too. Therefore, in this section, we limit the discussion to random states that are drawn from the uniform distribution on the 2D-dimensional unit sphere and, to simplify the writing, will refer to such states as random states in the following.
Applications of the random state technology generically involve the calculation of

⟨X⟩ = ⟨Φ| X |Φ⟩,   (47)

that is, the expectation value of a matrix X for a random state |Φ⟩. Let us simplify the discussion by considering matrices X which are diagonal with respect to the (computational) basis states {|0⟩, . . . , |D − 1⟩}, X|j⟩ = x_j |j⟩. Using this, Eq. (47) becomes

⟨X⟩ = ∑_{j=0}^{D−1} |c_j|² x_j ≈ (1/m) ∑_{j∈J} x_j,   (48)

where J is a set of m ≪ D states drawn with probability |c_j|², and we have used the law of large numbers [80] to approximate the exact expression by a sum of m instead of D terms. The problem of computing ⟨X⟩ has now been replaced by the problem of sampling integers j with probabilities |c_j|² (recall that ∑_{j=0}^{D−1} |c_j|² = 1). The generic way to perform this sampling on a digital computer is to compute the cumulative probability distribution P(j) = ∑_{i=0}^{j} |c_i|² for all j = 0, . . . , D − 1, generate a uniform pseudo-random number 0 < r < 1, and output the smallest value of j for which r ≤ P(j).
Let us now ask ourselves how difficult, in terms of memory and computation time, it is to compute and store the cumulative probability distribution P(j). It may not be too far off to assume that the total amount of storage available in the world at this time is of the order of 10 EiB (≈ 2^63 bytes), although it is not easy to find a reliable number. For a value of D of that order, the calculation of P(j) needs to be performed with 8-byte floating-point arithmetic. This implies that with 10 EiB of storage, we may be able to store 2^59 values of the P(j)'s; the corresponding quantum system is described in terms of 59 qubits.
Using the algorithm described in section D 2, we can generate c_0, c_1, . . . , c_{D−1} independently. Thus, to fill the table of P(j)'s, we can simply generate c_j and set P(j) = P(j − 1) + |c_j|² for j running from 0 to D − 1 (with P(−1) = 0). This takes of the order of D floating-point operations. To reduce the elapsed time, we may distribute this calculation over many processors. Suppose that we can really get exclusive access to the huge amount of storage that our application needs; can we then sample from the cumulative distribution P(j)? Disregarding the fact that this storage is distributed over the whole world and access times to memory locations may be relatively long, we probably can, because for a given random number r, we can find j by binary search, that is, in a number of steps proportional to L, not to D = 2^L. The upshot of these considerations is that the limit on the value of D of the random state from which we would like to sample is dictated by the amount of available memory.
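The sampling procedure described above can be sketched for a toy size L = 12 (all sizes and names below are our illustrative choices): build the cumulative distribution P(j) of a Gaussian random state once, then draw samples by binary search.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 12
D = 2**L
c = rng.normal(size=D) + 1j * rng.normal(size=D)   # Gaussian amplitudes c_j
p = np.abs(c)**2
p /= p.sum()                                       # |c_j|^2 with sum_j |c_j|^2 = 1
P = np.cumsum(p)                                   # P(j): of the order of D operations

def sample(m):
    r = rng.random(m)                              # uniform pseudo-random 0 <= r < 1
    return np.searchsorted(P, r)                   # binary search: ~L steps per sample

# Estimate <X> for a diagonal X with x_j = j/D from m << D samples
m = 200_000
drawn = sample(m)
estimate = (drawn / D).mean()                      # (1/m) sum_{j in J} x_j
exact = (p * (np.arange(D) / D)).sum()
print(estimate, exact)                             # agree to ~1/sqrt(m)
```

Note that np.searchsorted performs exactly the binary search mentioned in the text, so each sample costs O(L) comparisons once the O(D) table P(j) has been built.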
Can a gate-based quantum computer do better than a digital computer in terms of problem size? According to the mathematical model of gate-based quantum computing, the state of an L-qubit quantum computer is described by a linear superposition of the D = 2 L computational basis states [109]. As the state of the quantum computer changes, all D = 2 L coefficients change in parallel [109]. Thus, the mathematical model of a quantum computer provides us with a machine that exhibits an unprecedented level of intrinsic parallelism and a tremendous amount of memory (with each additional qubit, we double the amount of coefficients describing the state). The relevant question then is "can we exploit these features in practice?".
Returning to the conceptually simple task of sampling from a random state, with a quantum computer at our disposal we would proceed as follows:
1. Design a gate circuit C, that is, a sequence of unitary operations [109], that changes the state |j = 0⟩ into the random state |Φ⟩ = ∑_{j=0}^{D−1} c_j |j⟩.
2. Apply C to the initial state |j = 0⟩.
3. With the quantum computer in the state |Φ⟩ = ∑_{j=0}^{D−1} c_j |j⟩, perform a measurement of all the qubits. This measurement yields values q_k = 0, 1 for each of the k = 0, . . . , L − 1 qubits. The sequence of values q_{L−1} · · · q_0 is called a bitstring and is equivalent to the integer j = ∑_{k=0}^{L−1} 2^k q_k, corresponding to the index of the state |j⟩ ∈ {|0⟩, . . . , |D − 1⟩}. By Born's rule, the probability for this event is given by |c_j|².
4. Store the m values of j, obtained by m repetitions of steps 2 and 3, to form the set M.
5. Compute m^{−1} ∑_{j∈M} x_j to obtain the desired approximation to ⟨X⟩.
Note that designing such circuits C is by no means trivial, as C should act on all D coefficients to create the random state. To the best of our knowledge, there is no rigorous proof that such circuits can be constructed with a depth polynomial in the number of qubits. Instead, Boixo et al. constructed so-called random circuits from single- and two-qubit gates [4]. For those circuits which can be simulated numerically, they demonstrated that the circuits have a depth which grows polynomially with the number of qubits and produce states that yield the Porter-Thomas distribution, a direct consequence of the state being a Gaussian random state [4]. We make the plausible assumption that the length/depth of the circuit C and the time it takes to measure all qubits are of order L, not of order D. Then, in the mathematical realm, the gate-based quantum computer can sample from a random state of dimension D for values of D which are out of reach for a digital computer. Memory is not an issue.
Although the mathematical model holds great promise for very fast and very large computation, the technological hurdles to build a quantum information processor that realizes (part of) these promises seem enormous. In fact, there is still a wide gap between the mathematical model and its physical realizations. For instance, experiments with publicly accessible quantum processors show that they are not yet capable of reliably performing simple computational tasks such as adding two small integers [110]. Moreover, the performance of the current generation of gate-based quantum information processors is adversely affected by various sources of noise. For this reason, they are often referred to as Noisy Intermediate-Scale Quantum (NISQ) devices. The prospects of building a fully error-corrected quantum processor with NISQ technology are rather dim. Therefore, in order for quantum information processing to become a practical reality in the short term, it is necessary to (i) characterize the performance of NISQ devices by a well-defined procedure and (ii) develop algorithms for solving practically relevant, nontrivial problems using these error-prone NISQ devices.

B. Quantum supremacy: general aspects
As mentioned earlier, current NISQ devices have difficulties performing, e.g., simple arithmetic operations. However, because of their noisy operation, they excel at producing "random" output. The latter feature may be put to good use to demonstrate that a NISQ device can perform a computational task which is prohibitive for state-of-the-art digital (super)computers. Such a demonstration is coined "quantum supremacy". Quoting Boixo et al.: "we propose the task of sampling from the output distribution of random quantum circuits as a demonstration of quantum supremacy" [4]. The output distribution Boixo et al. have in mind corresponds to a realization of a state drawn from the uniform distribution on the 2D-dimensional unit sphere, Case A of Table I, that is, a random state. The Case A distribution is exceptional among multidimensional distributions because for any D, many of its properties can be studied analytically through closed-form expressions. For instance, one can show (see the derivation of Eq. (A11) in Appendix A) that the exact expression for the probability density p(z) of "the probability z = |c_j|² to observe the bitstring j" is given by p(z) = (D − 1)(1 − z)^{D−2}. For large D, p(z) ≈ D e^{−Dz}, a result which is often referred to as the Porter-Thomas law [111].
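The Porter-Thomas law quoted above is easy to check empirically for a Gaussian random state; the sizes below are our illustrative choices. The density p(z) = (D − 1)(1 − z)^{D−2} implies the survival probability Pr(z > s) ≈ exp(−Ds) for large D:

```python
import numpy as np

rng = np.random.default_rng(42)
L = 14
D = 2**L
c = rng.normal(size=D) + 1j * rng.normal(size=D)
z = np.abs(c)**2
z /= z.sum()                        # bitstring probabilities of one random state

for s in (0.5 / D, 1.0 / D, 2.0 / D):
    print((z > s).mean(), np.exp(-D * s))   # empirical vs Porter-Thomas survival
```

The empirical fraction of probabilities exceeding s tracks exp(−Ds) to within the statistical fluctuations of a single realization, which vanish as D grows.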
Referring to item 1 in section VI A, the first step is to design circuits C that perform the desired task. To this end, Boixo et al. use what they call a "random circuit", denoted by R in the following. These R's are specifically designed to generate states that are close to random states. Heuristics and simulation are used to guide the construction of the R's [4,76,112]. Boixo et al. have constructed a large number of such R's [4,76].
The quantum supremacy demonstration [4,76] has four different components: (a) Design of a random circuit R, operating on L qubits, for which the output distribution is known through simulation of the same (or approximately the same) circuit on a digital computer.
(b) A real quantum processor that can execute the random circuit R and can produce a set of measured bitstrings M .
(c) A measure for the correlation between the set of observed bitstrings and the random circuit executed on the ideal quantum computer.
(d) An error model to account for the NISQ character of the quantum processor and a procedure to extrapolate the results to values of L that cannot be simulated by a state-of-the-art digital computer.
Regarding (b), recall that simulating an L-qubit circuit C on a digital computer requires resources of the order of D. At the time of writing, this prohibits the simulation (with an accuracy of 10 digits or better) of a universal quantum computer with L > 45 [113]. Therefore, merely executing C on a real quantum processor having, say, L = 53 qubits is a trivial kind of "quantum supremacy". One has to demonstrate that the output distribution is indeed the one expected for the circuit C. In principle, this requires taking of the order of S × D samples, where S is the number of repetitions (say, of the order of S = 10000) required to estimate each |c_j|² with sufficient statistical accuracy. Obviously, for large L (even for L = 45), this sampling task is prohibitive, even with a noise-free quantum computer. Therefore, in order to proceed, one has to be less demanding and be content with showing that the measured set of bitstrings M complies with the hypothesis that these bitstrings are samples drawn from the distribution corresponding to R. This then requires the specification of one or more measures for the correlation, as mentioned in item (c), which is the topic of the following subsection. Item (d) is essential for the interpretation of the data produced in the quantum supremacy experiment [76] but is outside the scope of this paper.

C. Cross entropy
Consider an ideal quantum computer with L qubits executing a gate circuit U represented by a unitary matrix U. This quantum computer generates states labeled by j = 0, . . . , D − 1 (D = 2^L) with probabilities p_U(j). Assume that we have a physical realization of the quantum computer. As discussed above, currently only NISQ devices are able to execute the gate circuit U. This physical device produces bitstrings j with a-priori unknown probabilities denoted by p_V(j). We denote the collection of bitstrings (represented by integers) generated by this device by J.

The key question is now
To what extent are the probability distributions p U and p V similar?
If we had enough data to build a histogram that approximates p_V and we knew p_U, we could resort to standard statistical tests for comparing the two distributions [24]. However, even for small L (say L = 20, D = 2^L = 1048576), a large number of bitstrings (say m ≳ 100D) is required in order to estimate the histogram with some confidence, rendering this approach useless for practical purposes if L is large. Boixo et al. [4] proposed to circumvent this "sampling problem" by using the cross entropy

C(V, U) = −∑_{j=0}^{D−1} p_V(j) log p_U(j),   (49)

estimated from the elements of the set J (i.e., the m bitstrings generated by the device) as

c_U = −(1/m) ∑_{j∈J} log p_U(j),   (50)

which approximates the sum over all j in Eq. (49). This circumvents the problem of not being able, in practice, to collect enough data to estimate all the p_V(j)'s. For completeness, we mention that the recent quantum supremacy demonstration used, in addition to Eq. (49), also the "linearized" cross entropy [76]. For the purpose of discussing the ideas behind these recent quantum supremacy experiments, it does not seem to matter whether one uses the cross entropy or its linearized version [76]. Therefore, in the remainder of this section, we only consider the cross entropy C(V, U).

For a fixed p_U, c_U is a positive number, obtained by computing the cross entropy estimate Eq. (50) using a data set of bitstrings J generated by a device. The key question can now be reformulated as

Given p_U and the numerical value c_U > 0, what can be said about the unknown distribution p_V?
In order to make a statement, one has to adopt a model. In the next subsection we appeal to the principle of maximum entropy, a recipe to obtain a probability distribution with specified averages of functions of variables (and nothing else) and which otherwise is as uniform as possible [108].
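The behavior of the cross-entropy estimate c_U = −(1/m) ∑_{j∈J} log p_U(j) can be illustrated with two simulated "devices" (sizes and names below are our choices): one that samples from p_U itself and one that outputs uniformly distributed bitstrings. For a Porter-Thomas-distributed p_U, the two cases give log D + γ − 1 and log D + γ, respectively, where γ is Euler's constant:

```python
import numpy as np

rng = np.random.default_rng(3)
L, m = 14, 200_000
D = 2**L
gamma = 0.5772156649                        # Euler's constant

c = rng.normal(size=D) + 1j * rng.normal(size=D)
pU = np.abs(c)**2
pU /= pU.sum()                              # Porter-Thomas-distributed p_U

J_ideal = rng.choice(D, size=m, p=pU)       # device sampling from p_U
J_unif = rng.integers(0, D, size=m)         # device producing uniform bitstrings

c_ideal = -np.log(pU[J_ideal]).mean()       # Eq. (50) for the ideal device
c_unif = -np.log(pU[J_unif]).mean()         # Eq. (50) for the noisy device
print(c_ideal, np.log(D) + gamma - 1.0)
print(c_unif, np.log(D) + gamma)
```

The gap of exactly one between the two values is the signal exploited by cross-entropy benchmarking.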

D. Maximum entropy method
We follow Jaynes (Ref. [108], section 11.6) and search for the extrema of

F = −∑_{j=0}^{D−1} p_V(j) log p_V(j) + λ (∑_{j=0}^{D−1} p_V(j) − 1) + µ (∑_{j=0}^{D−1} p_V(j) log p_U(j) + c_U),   (51)

where the Lagrange multipliers λ and µ account for the constraints ∑_{j=0}^{D−1} p_V(j) = 1 and c_U = −(1/m) ∑_{j∈J} log p_U(j), respectively. There is only one extremum, namely the maximum of F [108]. Hence, we solve the problem of maximizing the first term in Eq. (51), which is the entropy (or Shannon information), subject to the named constraints. The solution for the maximum is given by p_V(j) = exp(λ − 1 + µ log p_U(j)). Using the named constraints, we find exp(1 − λ) = ∑_{j=0}^{D−1} p_U^µ(j) and

p_V(j) = p_U^µ(j) / ∑_{k=0}^{D−1} p_U^µ(k).   (52)

We can find µ by solving

c_U = − ∑_{j=0}^{D−1} p_U^µ(j) log p_U(j) / ∑_{k=0}^{D−1} p_U^µ(k).   (53)

In summary, if we combine the knowledge that the cross entropy C(V, U) = c_U with the principle of maximum entropy, there is a definite relation between the known probabilities p_U(j) and the unknown probabilities p_V(j) from which the samples J are drawn. Given the value of c_U and the probabilities p_U(j), the maximally noncommittal probability distribution is given by Eq. (52), with µ being the solution of Eq. (53). Phrased differently, there are many more probabilities to be assigned than there are constraints (the normalization and the value of c_U), and the maximum entropy principle yields the broadest distribution that is compatible with these constraints. Therefore, Eq. (52) may be viewed as a minimally prejudiced assignment complying with the named constraints. Assume that an experiment with a device yields a value of c_U for which the solution of Eq. (53) yields µ = 1. Then, from Eq. (52), it follows that p_V(j) = p_U(j). In other words, excluding all other knowledge we might have, the principle of maximum entropy suggests (but does not guarantee) that the device is working properly, meaning that it generates bitstrings according to the probabilities p_U(j) that correspond to the circuit U. Similarly, if the device produces a value of c_U for which the solution of Eq. (53) yields µ = 0, the principle of maximum entropy suggests (but does not guarantee) that p_V(j) = 1/D, that is, the states are sampled as from a uniform superposition of basis states (see Case C in Table I).
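The solution of Eqs. (52) and (53) is easy to obtain numerically; the toy distribution p_U and all names below are our choices. Since the cross entropy in Eq. (53) is monotonically decreasing in µ (its derivative is minus a variance), simple bisection suffices:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 4096
pU = rng.dirichlet(np.ones(D))              # a generic normalized distribution

def c_of_mu(mu):
    pV = pU**mu
    pV /= pV.sum()                          # maximum-entropy p_V of Eq. (52)
    return -(pV * np.log(pU)).sum()         # cross entropy, Eq. (53)

def solve_mu(cU, lo=0.0, hi=5.0):
    # c_of_mu is monotonically decreasing in mu, so bisection suffices
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if c_of_mu(mid) > cU else (lo, mid)
    return 0.5 * (lo + hi)

print(solve_mu(c_of_mu(1.0)))               # recovers mu = 1 (p_V = p_U)
print(solve_mu(c_of_mu(0.0)))               # recovers mu = 0 (p_V uniform)
```

The two printed values reproduce the limiting cases discussed in the text: µ = 1 (device working properly) and µ = 0 (uniformly distributed bitstrings).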
Up to this point, the discussion has been entirely general, independent of a particular choice of the gate sequence U. Clearly, in combination with the maximum entropy principle, the cross entropy estimate Eq. (50) is a useful measure for the correspondence between the observed bitstrings J produced by a quantum device and the ideal gate sequence U that one would like this device to carry out.

E. Quantum supremacy: theory
A key advantage of using the cross entropy is that it can be estimated without having to perform a prohibitively large number of measurements. To be useful, the only requirement is that p_U(j) is known, through simulation on a digital computer [76,113,114] or through a model that can be treated analytically. For an arbitrary sequence U, memory requirements limit the calculation of the p_U(j)'s to fewer than 50 qubits (D < 2^50) on current supercomputers [76,113]. To demonstrate that a quantum processor can perform a task which cannot be performed by a digital computer, one has to resort to circuits U for which the cumulative probability distribution cannot be determined analytically and is not amenable to computation by a digital computer.
It is instructive to first consider the case in which the circuit E generates a set of bitstrings E for the uniform superposition state by applying a Hadamard gate to each qubit [109], that is Case C of Table I (with all a j 's equal to zero) for which p E ( j) = 1/D and, from Eq. (50), c E = log D. Then Eq. (53) is an identity for arbitrary µ and Eq. (52) yields p V ( j) = p E ( j) = 1/D. Thus, assuming the device to work properly, the circuit E is expected to generate m bitstrings E for which c E = log D (up to statistical fluctuations).
Next, we consider somewhat less trivial states, namely those that are distributed uniformly on the 2D-dimensional unit sphere. These are the states A of Table I that are at the heart of the random state technology. In the recent quantum supremacy experiment [76], the quantum processor executes a so-called random circuit R and produces sets of bitstrings R for each instance of the circuit R. The basic idea is that the state produced by R is an instance of the uniform distribution on the 2D-dimensional unit sphere. Each sample R of m bitstrings carries a "fingerprint" of the particular circuit R that produced these bitstrings. In symbols, for a particular random circuit R, the state of the ideal quantum computer executing R reads |Ψ_R⟩ = ∑_{j=0}^{D−1} z_j |j⟩, where (z_0, . . . , z_{D−1}) is a sample drawn from the uniform distribution over the 2D-dimensional unit sphere. If we assume that the state generated by such a circuit is indeed an instance of the uniform random state |Φ⟩, we can use the random state technology to explore its features analytically.
In the remainder of this subsection, we consider the case that the quantum processor executes the circuit R without producing any error and that the resulting state can be regarded as having been drawn from the uniform distribution over the 2D-dimensional unit sphere. The first issue to address is whether, for the application to cross-entropy benchmarking, we can obtain accurate estimates from one realization of the state |Φ⟩ if D is large. From Eq. (A13) it follows that

⟨( −∑_{j=0}^{D−1} p_R(j) log p_R(j) − (log D + γ − 1) )²⟩ = O(1/D),   (54)

where γ ≈ 0.577 is Euler's constant and ⟨·⟩ denotes the average over realizations of |Φ⟩, such that also in the case of random sampling from |Φ⟩, we can expect an accurate estimate of the entropy from one realization (if D is large). Similarly, it follows that for large D, the variance of the difference between the entropy and its estimate computed from the set R having J elements is given by

⟨( −(1/J) ∑_{j∈R} log p_R(j) + ∑_{j=0}^{D−1} p_R(j) log p_R(j) )²⟩ ≈ (π²/6 − 1)/J.   (55)

As log D ≈ 0.69L, Eq. (55) shows that if J ≫ L², any subset R will yield an estimate of the entropy which, on average, will be an accurate approximation to the entropy −∑_{j=0}^{D−1} p_R(j) log p_R(j). Finally, we consider the solution of the maximum entropy problem Eq. (53). According to Eq. (54), if D is large, we may replace the calculation based on one realization of the random state by the average over all equivalent random states. This amounts to performing integrals over multidimensional Gaussians, which is straightforward. It then follows from Eq. (A12) that we have to find the value of µ that satisfies the equation

log D − c_R = PolyGamma(0, µ + 1).   (56)

The function PolyGamma(0, µ + 1) is the logarithmic derivative of the Gamma function, that is, PolyGamma(0, µ + 1) = Γ′(µ + 1)/Γ(µ + 1). This function is monotonically increasing for µ > −1, with a divergence at µ = −1; it is negative for −1 < µ < 0.46163 and positive for µ > 0.46163. For large µ, we have PolyGamma(0, µ + 1) = log µ + 1/2µ + O(1/µ²). Therefore, Eq. (56) has a unique solution for µ. Instead of solving Eq. (56) for µ, it is easier to compute, for a chosen value of µ, the difference between the value c_E = log D of the uniform distribution p_E(j) = 1/D and the theoretical value c_R given by Eq. (56). We find

γ + c_E − c_R = γ + PolyGamma(0, µ + 1) = { 0, µ = 0 ← uniform distribution over all D states; 1, µ = 1 ← uniform distribution on the 2D-dimensional unit sphere; 3/2, µ = 2 }.   (57)
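The special values in Eq. (57) and the uniqueness of the solution of Eq. (56) can be verified numerically. Below, PolyGamma(0, ·) is implemented by us via the standard recurrence and asymptotic series, so no external special-function library is needed:

```python
import numpy as np

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x+1) - 1/x and the asymptotic series
    r = 0.0
    while x < 10.0:
        r -= 1.0 / x
        x += 1.0
    return r + np.log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

gamma = -digamma(1.0)                       # Euler's constant, ~0.5772

# Eq. (57): gamma + PolyGamma(0, mu+1) = 0, 1, 3/2 for mu = 0, 1, 2
for mu in (0, 1, 2):
    print(mu, gamma + digamma(mu + 1.0))

# Eq. (56): given c_R, solve log D - c_R = PolyGamma(0, mu+1) by bisection,
# exploiting that digamma is monotonically increasing
def solve_mu(c_R, logD, lo=-0.99, hi=50.0):
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if digamma(mid + 1.0) < logD - c_R else (lo, mid)
    return 0.5 * (lo + hi)

logD = 53 * np.log(2.0)                     # L = 53 qubits
print(solve_mu(logD + gamma - 1.0, logD))   # c_R of a random state -> mu = 1
```

For the theoretical random-state value c_R = log D + γ − 1, the solver returns µ = 1, as expected for a state drawn uniformly from the 2D-dimensional unit sphere.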
In the next subsection, we use the analytical results of this subsection to interpret the experimental results of a recent quantum supremacy experiment [76].

TABLE II. Results for the α's defined by Eqs. (58) and (59). The probabilities p_R(j) for the circuit R have been obtained by JUQCS-E [113]. The corresponding sets of m = 500000 bitstrings M have been obtained from experiments [76]. In the first column, the letter in brackets identifies the instance of the different random circuits R used in the experiments [76]. The results obtained by using the circuit 39[b] and the measured data generated by the circuit indicated in the corresponding row are listed as α_{39[b],M}. R: pseudo-random circuit; R: sampled states, obtained by executing R on the simulator JUQCS-E; M: sampled states, produced by the Sycamore processor executing R [76]; E: states sampled from a uniform distribution.

F. Quantum supremacy: experiment

In this subsection, we scrutinize the results for the cross entropy, obtained by combining the bitstrings M produced by the 53-qubit Sycamore superconducting processor [76] with the probabilities p_R(j) calculated with the universal quantum computer simulator JUQCS-E [113][114][115].
For a given quantum circuit R, designed to generate a random state, JUQCS-E [113] executes R and computes the probability distribution p_R(j) for each quantum state j ∈ {0, . . . , D − 1}. JUQCS-E also computes the cumulative distribution function P_R(k) = ∑_{j=0}^{k} p_R(j) and samples states from this distribution, yielding a set of bitstrings R. All these calculations are numerically exact (up to at least 10 digits). A feature of JUQCS-E, not documented in Ref. [113], allows the user to specify a set M of m bitstrings for which JUQCS-E calculates p_R(j) for all j ∈ M. The latter feature allows us to compute the estimate −(1/m) ∑_{j∈M} log p_R(j) for the cross entropy.
Following the methodology for cross-entropy benchmarking [4,76], the quantities of interest are

α_{R,R} = log D + γ + ∑_{j=0}^{D−1} p_R(j) log p_R(j),   (58)

α_{R,X} = log D + γ + (1/m) ∑_{j∈X} log p_R(j),   (59)

where X = R, M, E, and R and M are sets of bitstrings generated by JUQCS-E and by the Sycamore processor [76], respectively. E is a set of bitstrings sampled from the uniform distribution (p_E(j) = 1/D). From the theoretical model presented in section VI E, it follows that α_{R,M} ≈ 0 if a quantum processor produces bitstrings that are distributed uniformly. The larger the value of α_{R,M}, the more likely it is that the quantum processor has been sampling from the correct distribution. In this sense, the uniform distribution corresponding to α_{R,M} ≈ 0 provides a baseline for the assessment of the quality of a NISQ device. If m is sufficiently large (m = 500000 for the experimental data sets [76]), we may expect that the sampled estimate α_{R,R} of Eq. (59) is close to the value α_{R,R} defined in Eq. (58). If the circuit R produces a genuine random state, averaging over all such R's yields ⟨α_{R,R}⟩_R = 1, see Eq. (57). Note that the last term in Eq. (58) (Eq. (59)) is equal to minus the cross entropy C(R, R) (C(R, X)).
In Table II, we present results for the α's defined by Eqs. (58) and (59). Most of these results were included in the original report on the quantum supremacy experiment [76]. Note that if we use Eq. (55) to estimate the standard deviation of α_{R,X}, we find that for m = 500000 samples, the standard deviation is about 0.0014. The second and third columns show that the results produced by JUQCS-E are in excellent agreement with the theoretical prediction Eq. (57) for µ = 1. Recall that the latter has been obtained by averaging over all states distributed uniformly over the 2D-dimensional unit sphere, whereas the former is obtained from a simulation with a single instance of the random circuits which have been "engineered" [76] to generate instances of such a state. The fact that α_{R,R} ≈ 1 may suggest, but is not a proof, that the circuit R, executed by JUQCS-E, produces a random state of type A (see Table I). In the fourth column, we present the results obtained by using the sets of bitstrings M measured in the recent quantum supremacy experiment [76]. The fifth column shows two results for α_{39[b],M}, obtained by using the JUQCS-E data for p_R(j) computed for circuit 39[b] and the experimental data generated by circuits 39[c] and 39[d], respectively. The α's obtained by replacing the measured bitstrings M by bitstrings E sampled from a uniform distribution are given in the sixth column of Table II. The numerical results presented in Table II can be summarized as follows: (i) the values of α_{R,M} (fourth column) are small compared to one; (ii) the values of |α_{R,E}| (sixth column) are at least one order of magnitude smaller than those of |α_{R,M}| (fourth column); and (iii) the values of |α_{39[b],M}| obtained by combining circuit 39[b] with bitstrings measured for a different circuit (fifth column) are much smaller than those obtained with matching circuit and bitstrings. On the basis of (i) alone, it seems that there is little evidence in support of the hypothesis that the states of the set M have been sampled from the distribution that is characteristic for the circuit R. In fact, one might even be tempted to conclude that the Sycamore processor executing circuit R samples bitstrings from a uniform distribution.
However, even though the numerical values of α R,M are small, they are still significantly larger than the values of α R,E obtained by sampling from the uniform distribution, see (ii). We can make this tentative conclusion more quantitative by using the model derived by the maximum entropy method and by performing a hypothesis test. First, given a value of c R , the solution of Eq. (56) is unique. Therefore, α R,M ≈ 0 implies that µ ≈ 0. From Eq. (52) it then follows that p V (j) ≈ 1/D. Second, following Jaynes (Ref. [108], Section 9.11), we consider the expression ψ X , where n j is the number of times a bitstring j has been observed in m repetitions of the experiment and we have omitted constants that are irrelevant for the present purpose. The larger the value of ψ X , the less weight the hypothesis X has relative to any alternative hypothesis belonging to the same Bernoulli class (represented by probabilities for the bitstrings that are independent and stationary) [108]. Thus, according to this hypothesis test, the data give more weight to hypothesis R than to hypothesis E if ψ R < ψ E . Therefore, we should opt for hypothesis R if α R,M > γ, an inequality which is clearly not supported by the data (since α R,M ≈ 0). Note that this test does not rule out that there are other, better hypotheses than the two, E and R, that we have considered. At any rate, on the basis of the values of α R,M alone, we should conclude that the processor is more likely to execute E than R. However, this conclusion is premature because the preceding analysis focuses on the sampling and hypothesized distributions without taking into account what the circuit R actually does when it is executed by the Sycamore processor. Indeed, according to (ii), feeding JUQCS-E with the bitstrings E , generated on a digital computer using a uniform distribution, does not support this conclusion: the values of |α R,E | (column six) are at least one order of magnitude smaller than those of |α R,M | (column four).
Furthermore, from (iii) it follows that the |α R,M |'s obtained by using bitstrings M produced by another circuit R' are also much smaller than those obtained by using bitstrings M produced by circuit R itself. This observation strongly suggests that the bitstrings M produced by the Sycamore processor carry a definite (albeit weak) signature of the circuit R that was used to generate them. An important element missing from the above analysis is the one mentioned in (d) in Sec. VI B, namely the fact that the NISQ processor is error-prone. In fact, these errors are substantial [76]. For a detailed discussion of an analysis that incorporates an error model, see Ref. [76].
To refute a claim that quantum supremacy has been demonstrated, the relevant question is then: "for each R, can we construct an approximation that uses only a (non-exponential) fraction of the D-dimensional Hilbert space and yields values of α R,M similar to those shown in Table II?" A recent paper suggests that the answer to this question may be affirmative, although the results presented in that paper do not yet refute quantum supremacy [116]. By restricting the state vectors of the quantum computer to a class of matrix-product states, the paper shows that it is possible to generate states that carry signatures (expressed through fidelities rather than cross entropies) of the generating circuit with a quality similar to that observed in the quantum supremacy experiments [76]. However, an approximation based on matrix-product states allows computation on a digital computer at a cost that does not increase exponentially with the number of qubits [116]. Clearly, the last word about a quantum processor surpassing the power of a digital computer for a specific computational task has not been said yet.

VII. APPLICATION: QUANTUM INFORMATION THEORY
It has become common practice to characterize NISQ devices through what is called randomized benchmarking [106,107]. In essence, randomized benchmarking characterizes a quantum processor in terms of averages over many different (random) instances of sequences of gates.
In this section, we use elements of the random state technology to address some theoretical aspects of characterizing NISQ devices through randomized benchmarking. The result will be a generalization of a well-known formula for the average fidelity [77,78] to non-trace-preserving quantum operations.
Let ρ and σ denote the density matrices describing the state of the mathematical idealization of a quantum processor and of a real device, respectively. The fidelity F(ρ, σ) defined in Ref. [117] is a measure of the difference between the density matrices ρ and σ. In the case that ρ = |ψ⟩⟨ψ| is a pure state, we have ρ = ρ^2 and hence √ρ = ρ, and the fidelity simplifies to the overlap, that is, F(ρ, σ) = ⟨ψ|σ|ψ⟩. A mathematically convenient way to discuss the result of a sequence of quantum gate operations applied to the initial state |ψ⟩ is through the language of linear maps and quantum operations [43,109]. In short, a quantum operation transforms an initial density matrix ρ as ρ' = E(ρ) = ∑_α E_α ρ E_α^†, where the E_α are the so-called operation elements (also known as Kraus operators; not to be confused with gate operations) [109]. The number of different E_α's is at most D^2 [109]. In general, ρ → E(ρ) defines a completely positive map [43,109] which need not be trace-preserving, meaning that ∑_α E_α^† E_α ≤ I and hence Tr E(ρ) may be less than one [43,109]. In the special case that E is trace-preserving, we have Tr E(ρ) = Tr ρ = 1.
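As a concrete illustration of the operation-element language, the following sketch applies the standard amplitude-damping channel (a textbook example, not taken from this paper; the damping probability is an arbitrary choice) to a single-qubit density matrix and checks the trace-preservation condition ∑_α E_α^† E_α = I:

```python
import numpy as np

# Standard amplitude-damping channel with two Kraus operators (textbook example)
g = 0.3  # damping probability, chosen arbitrarily for illustration
E0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - g)]])
E1 = np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])

def apply_channel(rho, kraus):
    """rho -> E(rho) = sum_a E_a rho E_a^dagger."""
    return sum(E @ rho @ E.conj().T for E in kraus)

rho = np.array([[0.25, 0.25], [0.25, 0.75]])  # some single-qubit density matrix
rho_out = apply_channel(rho, [E0, E1])

# Trace preservation follows from sum_a E_a^dagger E_a = I
completeness = E0.conj().T @ E0 + E1.conj().T @ E1
```

Dropping one of the Kraus operators from the sum would make the map non-trace-preserving, the case covered by the generalization derived below.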
In this language, an ideal quantum gate operation corresponding to a unitary matrix U is described by the map E_id(ρ) = U ρ U^†. We imagine that a real quantum device, prepared in a pure state ρ = |ψ⟩⟨ψ|, actually carries out a slightly different operation, denoted by the linear map E_ac(ρ). We consider the quantum operation E(ρ) = E_ac(E_id^{-1}(ρ)) = E_ac(U^† ρ U). If the device's implementation of U were perfect, the operation E would be equal to the identity operation. The fidelity of the operation is given by Eq. (62). Clearly, the fidelity Eq. (62) is a numerical measure of how well a quantum processor performs the gate sequence represented by the unitary matrix U. If the fidelity is equal to one (zero), the quantum processor is performing perfectly (producing output that has no relation to U). Instead of estimating the fidelities of NISQ devices for a collection of different states |ψ⟩, randomized benchmarking is motivated by drawing a sample of |ψ⟩'s from a uniform distribution on the unit sphere and computing the average of the fidelity Eq. (62), see Eq. (63); the integral in Eq. (63) is over all pure states or, equivalently, over all points on the surface of a unit sphere of dimension 2D. These integrals can be evaluated by the same method that we have used before. Writing |ψ⟩ = ∑_{j=1}^{D} c_j |j⟩ (see Eq. (1)), we obtain Eq. (64), where the integral is over all complex-valued c's such that ∑_{j=1}^{D} c_j^* c_j = 1. Using Eq. (4) and the results given in the first row of Table I, we have E[c_j^* c_k c_l^* c_m] = (δ_{j,k} δ_{l,m} + δ_{j,m} δ_{k,l})/D(D + 1), and Eq. (64) becomes Eq. (65), where F_ent(E) = ∑_α |Tr E_α|^2/D^2 is the so-called entanglement fidelity [109,118]. In the case that E is trace-preserving, we have ∑_α Tr E_α^† E_α = D and Eq. (65) reduces to [119] F_avg(E) = (D F_ent(E) + 1)/(D + 1).
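The trace-preserving relation F_avg(E) = (D F_ent(E) + 1)/(D + 1) can be checked by Monte Carlo integration over random pure states. The sketch below does this for a unitary channel (a single Kraus operator E_0 = U); the unitary U is just a random matrix chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
D = 8

# A unitary channel E(rho) = U rho U^dagger (single Kraus operator E_0 = U)
A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
U, _ = np.linalg.qr(A)   # QR of a complex Gaussian matrix yields a unitary

F_ent = np.abs(np.trace(U))**2 / D**2     # entanglement fidelity
F_avg_pred = (D * F_ent + 1) / (D + 1)    # trace-preserving case of Eq. (65)

# Monte Carlo estimate of F_avg: average of <psi|E(|psi><psi|)|psi> = |<psi|U|psi>|^2
# over states |psi> drawn uniformly from the unit sphere (normalized Gaussians)
n = 200000
psi = rng.normal(size=(n, D)) + 1j * rng.normal(size=(n, D))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
amp = np.einsum('nd,nd->n', psi.conj(), psi @ U.T)  # <psi|U|psi> per sample
F_avg_mc = np.mean(np.abs(amp)**2)
```

The Monte Carlo average agrees with the closed-form prediction to within sampling noise, illustrating that the uniform-sphere integral behind Eq. (63) can also be evaluated by the normalized-Gaussian sampling discussed in the appendices.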
The relation Eq. (65) generalizes an earlier result [77,78] to the case of non-trace-preserving quantum operations and also yields simple expressions for the average and entanglement fidelity in terms of the operation elements E α .

VIII. FINAL REMARKS
As mentioned in the introduction, for several decades many scientists have used random states in their simulation work. In this paper, we have presented a systematic and rigorous analysis of the mathematical foundation of the random state technology as it is being used in numerical simulation. Applications of this technology in areas of quantum statistical physics, quantum dynamics, and quantum information processing have been given, with the primary aim to illustrate the power and versatility of the random state technology.
In essence, the random state technology as applied in the numerical simulation arena reduces the computational burden from O(D^2) to O(D), where D is the dimension of the Hilbert space used to describe the quantum system. For such applications, there are rigorous bounds on the errors and statistical fluctuations that result from the use of random states.
We have also shown how the random state technology can help to analyze numerical simulations and experiments that aim to establish quantum supremacy on a NISQ processor, as well as to prove a statement about non-trace-preserving maps in quantum information theory.
In view of the computational power, flexibility and versatility of the random state technology and the fact that it is based on solid principles, we may expect to see many new applications in the near future.
We first consider the Gaussian random states listed in the first two rows of Table I. Since the expressions of interest involve averages of ⟨Φ|X|Φ⟩/⟨Φ|Φ⟩ only, both choices for c_j(ξ) yield the same expressions for ⟨Φ|X|Φ⟩/⟨Φ|Φ⟩. In other words, the exact expressions to be derived apply to both choices of c_j(ξ). Some of the integrals used below can also be found in Ref. [121].
Using the symmetries of the probability densities and the fact that both the real and imaginary parts of c_j(ξ) = c_j(a_1, b_1, ..., a_D, b_D) are odd functions of each of the a's and b's, averages of odd products of the coefficients vanish. Using ∫_{S_p} dΩ_p = 2π^{p/2}/Γ(p/2) and making the change of variables r = x cos θ and R = x sin θ, we obtain the required averages, where we used the identity ∫_0^{π/2} cos^{2p−1}θ sin^{2q−1}θ dθ = Γ(p)Γ(q)/2Γ(p + q) and made the substitution y = x^2.
Similar manipulations yield the remaining moments. In the case of the random phase state (Case C in Table I), ⟨Φ|Φ⟩ = 1 by construction. Then, from Eq. (5), it follows directly that E[D⟨Φ|X|Φ⟩/⟨Φ|Φ⟩] = Tr X. Using Eq. (17), ⟨Φ|Φ⟩ = 1, and the expressions for the moments given in the third line of Table I, we find the corresponding variance. Note that, unlike in the case of the Gaussian random states, for the random phase state the variance depends on the choice of the basis states {|j⟩ | j = 1, ..., D}.
Given a real number z, we may ask for the probability density p(z|D) that the j-th basis state occurs with probability p(j) = z. For a given random state, we have p(j) = (a_j^2 + b_j^2)/(a_1^2 + b_1^2 + ... + a_D^2 + b_D^2). By symmetry, all basis states are equivalent and it suffices to consider only j = 1, for instance. Because 0 ≤ p(j) ≤ 1, we have p(z|D) = 0 if z < 0 or z > 1. For 0 ≤ z ≤ 1 we can use the same tricks as before and, by some elementary algebra, obtain Eq. (A9), p(z|D) = (D − 1)(1 − z)^{D−2}. More generally, we can ask for the probability density that p(j_1), ..., p(j_k) (all j's different from each other) take the values z_1, ..., z_k, respectively. A calculation similar to the one used to obtain Eq. (A9) yields, for 1 ≤ k < D, Eq. (A10), p(z_1, ..., z_k|D) = [Γ(D)/Γ(D − k)] (1 − z_1 − ... − z_k)^{D−k−1}. The results of Eqs. (A3), (A4), and (A5) are now readily obtained by calculating ∫_0^1 dz z p(z|D), ∫_0^1 dz z^2 p(z|D), and ∫_0^1 ∫_0^1 dz_1 dz_2 z_1 z_2 p(z_1, z_2|D), respectively.
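A quick numerical consistency check of the density p(z|D) = (D − 1)(1 − z)^{D−2}: sampling Gaussian random states and comparing the first two empirical moments of p(1) with the values implied by this density, E[z] = 1/D and E[z^2] = 2/D(D + 1) (these moments are computed here by direct integration of the density; the sample sizes and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D, n = 16, 100000

# n Gaussian random states of dimension D; p[:, j] = probability of basis state j
c = rng.normal(size=(n, D)) + 1j * rng.normal(size=(n, D))
p = np.abs(c)**2
p /= p.sum(axis=1, keepdims=True)
z = p[:, 0]   # by symmetry it suffices to look at a single basis state

# Moments implied by p(z|D) = (D-1)(1-z)^(D-2):
#   E[z] = 1/D  and  E[z^2] = 2/(D(D+1))
m1, m2 = z.mean(), (z**2).mean()
```

The empirical moments match the analytic values to within statistical fluctuations, consistent with Eqs. (A3) and (A4).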
If D is large and zD is not, we have p(z|D) ≈ D e^{−zD}, which is known as the Porter-Thomas distribution [111]. Note that, for evaluating averages over the unit hypersphere, there is no advantage in using the (approximate) Porter-Thomas distribution in place of the exact distribution Eq. (A9). With Eqs. (A9) and (A10), it is straightforward to compute the averages, denoted by ⟨.⟩, over all random states |Φ⟩. In general, we have ⟨F(p(j))⟩ = ∫_0^1 dz F(z) p(z|D), ⟨F(p(j)) G(p(k ≠ j))⟩ = ∫_0^1 ∫_0^1 dz_1 dz_2 F(z_1) G(z_2) p(z_1, z_2|D), etc. Specifically, we find the average of the entropy, in which γ, Euler's constant, appears. For large D, the variance of the entropy is found to be ≪ 1. Eqs. (B1) and (B2) give the mean and variance, where x and y can be any functions of the random variables ξ.
In Section II, we gave explicit expressions for the expectation values of x, y, etc., in terms of the moments listed in Table I. Table III shows that these expressions are consistent with the moments given in Table I and that the expressions for the variances agree with those given by Eq. (17).

Appendix C: Sampling over random states
In the main text, we focus on the calculation of the trace by using only one random state. Here we consider the case in which further averaging over R > 1 different random states is required to produce results with good statistics, as is typically the case for small systems or at very low temperatures.
We assume that samples of pairs of variables (x_i, y_i) for i = 1, ..., R have been obtained from different, independent realizations of the random variable ξ. Obviously, for i ≠ j, E[x_i y_j] = E[x_i] E[y_j]. There are two ways to average over the R samples [20,74], given by Eqs. (C1) and (C2). Before we use Eqs. (B1) and (B2) to analyze the mean and variance for the two different ways of computing the average of the R samples, we first recall the well-known formulas for the ensemble mean and variance. For the first way of averaging, Eq. (C1), we obtain Eq. (C8), whereas for the second way of averaging, Eq. (C2), we obtain Eq. (C9). Comparing Eqs. (C8) and (C9), it is clear that the variances show the same 1/R dependence (as usual for independent samples). However, the presence of the factor 1/R in the second and third terms of the expression for the mean Eq. (C9), obtained using Eq. (C2), implies that these correction terms vanish as R → ∞, unlike in the case where the mean Eq. (C8) is computed according to Eq. (C1). In other words, if possible, we should use Eq. (C2) to compute the average over independent realizations of the random states.
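Since Eqs. (C1) and (C2) are not reproduced here, the sketch below assumes the usual convention (cf. Ref. [20]) that Eq. (C1) averages the ratios x_i/y_i while Eq. (C2) divides the sum of the x_i by the sum of the y_i, with x_i = ⟨Φ_i|X|Φ_i⟩ and y_i = ⟨Φ_i|Φ_i⟩. Both estimate Tr X/D for Gaussian random states; the observable X and all parameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
D, R = 64, 2000
X = np.diag(rng.normal(size=D))   # a simple Hermitian observable
target = np.trace(X).real / D     # both estimators should approach Tr X / D

# R independent (unnormalized) Gaussian random states
phi = rng.normal(size=(R, D)) + 1j * rng.normal(size=(R, D))
x = np.einsum('rd,rd->r', phi.conj(), phi @ X).real   # <Phi_i|X|Phi_i>
y = np.einsum('rd,rd->r', phi.conj(), phi).real       # <Phi_i|Phi_i>

est_ratio_avg = np.mean(x / y)         # assumed Eq. (C1): average of the ratios
est_avg_ratio = np.sum(x) / np.sum(y)  # assumed Eq. (C2): ratio of the sums
```

For the ratio-of-sums estimator the correction terms to the mean decay as 1/R, which is why the text recommends that form of averaging.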
where 0 ≤ θ_1 ≤ 2π, 0 ≤ θ_j ≤ π for j > 1, and the condition x_1^2 + ... + x_n^2 = r^2 is automatically satisfied. The Jacobian |J| of this transformation [123] shows that, because of the independence of r and the θ's, p(r) is independent of the direction of the vector x = (x_1, ..., x_n). Therefore, the distribution of the points x/‖x‖ is uniform over the sphere.
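Muller's method in practice amounts to normalizing a vector of i.i.d. Gaussian random numbers, which yields a point distributed uniformly over the sphere. The following sketch verifies this through simple moment checks (dimension, sample size, and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200000

# Muller's method: normalize a vector of i.i.d. Gaussians to obtain a
# uniformly distributed point on the (d-1)-sphere
g = rng.normal(size=(n, d))
pts = g / np.linalg.norm(g, axis=1, keepdims=True)

# For a uniform distribution, each coordinate has mean 0 and second moment 1/d
coord_means = pts.mean(axis=0)
coord_sq_means = (pts**2).mean(axis=0)
```

For random states, the same construction is applied to 2D real Gaussians (or, equivalently, D complex ones), as in the Gaussian random states of Table I.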

An alternative to Muller's method
The following is based on material presented in Ref. [124], pages 111 and 182. In contrast to the representation Eq. (D2), where we use 2D spherical coordinates to encode the pairs of real numbers that determine the amplitude of a basis state, the representation that we adopt in this section uses D − 1 spherical coordinates to encode the square roots of the probabilities of the basis states and another D random numbers {ν_1, ..., ν_D} to encode the phases. We represent the coefficients of the state vector by {x_1 e^{iφ_1}, ..., x_D e^{iφ_D}} where, for all 1 ≤ j ≤ D, 0 ≤ φ_j ≤ 2π, x_j ≥ 0, and x_1^2 + ... + x_D^2 = r^2. We will set r = 1 later. The x_j's and φ_j's are called octant coordinates [124]. In terms of the spherical coordinates, we have Eq. (D11), where 0 ≤ θ_j ≤ π/2 and the conditions 0 ≤ x_j and x_1^2 + ... + x_D^2 = r^2 are automatically satisfied. We can find the volume element of the 2D-dimensional sphere in these coordinates as follows. First consider a single complex number z = u + iv = x e^{iφ} and compute the Jacobian for the transformation (u, v) → (x, φ) to find that du dv = x dx dφ. For D complex variables, the volume element in Cartesian coordinates is du_1 dv_1 ... du_D dv_D = x_1 ... x_D dx_1 ... dx_D dφ_1 ... dφ_D. Next, we use the spherical coordinates (restricted to the first octant) for x_1, ..., x_D, see Eq. (D11). The Jacobian of this transformation is given by Eq. (D3) with n = D. Collecting all sines and cosines, we find Eq. (D14), where y_k = sin θ_k. Integrating Eq. (D14) over 0 ≤ r ≤ R, 0 ≤ y_k ≤ 1, and 0 ≤ φ_k ≤ 2π yields the corresponding normalization. From Eq. (D14), it is easy to find the probability density for any of the x's. For instance, the probability that x_D^2 is less than z is given by an integral from which, by differentiation with respect to z, we find p(z|D) = (D − 1)(1 − z)^{D−2}, see Eq. (A9). Unlike with the standard spherical coordinates, see Eq. (D8), the expression Eq. (D14) allows us to sample the angles θ_k or, equivalently, the y_k's independently.
The probability density and distribution function of y_k are given by p(y_k) = 2k y_k^{2k−1} and P(y_k ≤ y) = y^{2k}, respectively, from which it follows that, in order to generate a random number Y_k with the correct distribution, we simply have to generate a uniform random number r_k and put Y_k = r_k^{1/2k}. In practice, k can be very large and, numerically, we should use Y_k^2 = exp(log(r_k)/k) ≈ 1 + log(r_k)/k if |log(r_k)/k| is very small. For numerical purposes, we find that Muller's method is more convenient, in particular if D is large.
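The inverse-CDF recipe Y_k = r_k^{1/2k} can be checked directly: for the density p(y) = 2k y^{2k−1} on [0, 1], the distribution function is P(Y_k ≤ y) = y^{2k}. A minimal sketch (the value of k and the test point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
k, n = 10, 200000

# Inverse-CDF sampling for the density p(y) = 2k y^(2k-1) on [0, 1]:
# the CDF is P(Y <= y) = y^(2k), so Y = r^(1/(2k)) with r uniform on [0, 1]
r = rng.random(n)
Y = r**(1.0 / (2 * k))

# The empirical CDF at a test point should match y0^(2k)
y0 = 0.9
emp_cdf = np.mean(Y <= y0)
```

For large k the samples crowd toward 1, which is the numerical-precision issue that motivates evaluating exp(log(r_k)/k) rather than the direct power.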