Determinantal Processes and Independence

We give a probabilistic introduction to determinantal and permanental point processes. Determinantal processes arise in physics (fermions, eigenvalues of random matrices) and in combinatorics (nonintersecting paths, random spanning trees). They have the striking property that the number of points in a region $D$ is a sum of independent Bernoulli random variables, with parameters which are eigenvalues of the relevant operator on $L^2(D)$. Moreover, any determinantal process can be represented as a mixture of determinantal projection processes. We give a simple explanation for these known facts, and establish analogous representations for permanental processes, with geometric variables replacing the Bernoulli variables. These representations lead to simple proofs of existence criteria and central limit theorems, and unify known results on the distribution of absolute values in certain processes with radially symmetric distributions.


Introduction
Determinantal point processes were first studied in 1975 by Macchi [22], who was motivated by fermions in quantum mechanics. Determinantal processes arise naturally in several other settings, including eigenvalues of random matrices, random spanning trees and nonintersecting paths; see, e.g., Burton and Pemantle [4], Soshnikov [27], Lyons [20], Lyons and Steif [21], Shirai and Takahashi [25], Johansson [14], Borodin, Okounkov and Olshanski [3], and Diaconis [8]. A determinantal point process, on a locally compact Polish space $\Lambda$ with a Radon reference measure $\mu$, is determined by a kernel $K(x,y)$: the joint intensities of the process can be written as $\det(K(x_i,x_j))$. The kernel defines an integral operator $\mathcal{K}$ acting on $L^2(\Lambda)$ that is assumed to be self-adjoint, non-negative and locally trace class, i.e., for every compact $D$ the eigenvalues $\lambda_i^D$ of the operator $\mathcal{K}$ restricted to $D$ satisfy $\sum_i \lambda_i^D < \infty$. Determinantal point processes have a special property (Shirai and Takahashi [26], Proposition 2.8) that seems to have been used only in special cases ([1], [23]): in a determinantal process, the number of points that fall in a compact set $D \subset \Lambda$ has the same distribution as a sum of independent Bernoulli$(\lambda_i^D)$ random variables.
The proof is immediate from well known formulas for the generating function of particle counts. However, we give a proof starting from first principles and avoid the use of Fredholm determinants.
Permanental processes are defined analogously and are the counterpart of determinantal processes for modeling bosons (see Macchi [22]). They fall into the more general class known as Cox processes [6]. In this case we have: in a permanental process, the number of points that fall in a compact set $D \subset \Lambda$ has the same distribution as a sum of independent geometric$\big(\frac{\lambda_i^D}{1+\lambda_i^D}\big)$ random variables.

For the examples of interest to us, operator-theoretic nuances are not essential. The reader will not miss anything significant by keeping in mind just the following cases.
1. $\Lambda$ is a finite set, $K$ is a Hermitian non-negative definite $|\Lambda| \times |\Lambda|$ matrix, and $\mu$ is the counting measure on $\Lambda$.
2. $\Lambda$ is an open set in $\mathbb{R}^d$, $\mu$ is Lebesgue measure, and $K(x,y)$ is a continuous function defining a self-adjoint non-negative integral operator $\mathcal{K}$ on $L^2(\Lambda)$.
A point process in a locally compact Polish space $\Lambda$ is a random integer-valued positive Radon measure $\mathcal{X}$ on $\Lambda$. (Recall that a Radon measure is a Borel measure which is finite on compact sets.) If $\mathcal{X}$ almost surely assigns at most measure 1 to singletons, it is a simple point process; in this case $\mathcal{X}$ can be identified with a random discrete subset of $\Lambda$, and $\mathcal{X}(D)$ represents the number of points of this set that fall in $D$.
The distribution of a point process can, in most cases, be described by its joint intensities (also known as correlation functions).

Definition 1. The joint intensities of a point process $\mathcal{X}$ with respect to $\mu$ are functions (if they exist) $\rho_k : \Lambda^k \to [0,\infty)$ such that for any family of mutually disjoint subsets $D_1,\dots,D_k$ of $\Lambda$,
$$\mathbf{E}\Big[\prod_{i=1}^k \mathcal{X}(D_i)\Big] = \int_{\prod_i D_i} \rho_k(x_1,\dots,x_k)\,d\mu(x_1)\cdots d\mu(x_k). \tag{1}$$
Remark 2. For overlapping sets, the situation is more complicated. Restricting attention to simple point processes, $\rho_k$ is not the intensity measure of $\mathcal{X}^k$, but that of $\mathcal{X}^{\wedge k}$, the set of ordered $k$-tuples of distinct points of $\mathcal{X}$. Indeed, (1) implies (see [18, 19, 23]) that for any Borel set $B \subset \Lambda^k$ we have
$$\mathbf{E}\big[\#\big\{(x_1,\dots,x_k) \in \mathcal{X}^{\wedge k} : (x_1,\dots,x_k) \in B\big\}\big] = \int_B \rho_k(x_1,\dots,x_k)\,d\mu(x_1)\cdots d\mu(x_k). \tag{2}$$
When $B = \prod_{i=1}^r D_i^{k_i}$ for a mutually disjoint family of subsets $D_1,\dots,D_r$ of $\Lambda$, and $k = \sum_{i=1}^r k_i$, the left hand side becomes
$$\mathbf{E}\Big[\prod_{i=1}^r \frac{\mathcal{X}(D_i)!}{(\mathcal{X}(D_i)-k_i)!}\Big]. \tag{3}$$
For a general point process $\mathcal{X}$, observe that it can be identified with a simple point process $\mathcal{X}^*$ on $\Lambda \times \{1,2,3,\dots\}$ such that $\mathcal{X}^*(D \times \{1,2,3,\dots\}) = \mathcal{X}(D)$ for Borel $D \subset \Lambda$. Thus, if $\mathcal{X}(D)$ has exponential tails for all compact $D \subset \Lambda$, then the joint intensities determine the law of $\mathcal{X}$; see [18, 19]. In particular, Theorems 7 and 10 below imply that this is the case for determinantal and permanental processes governed by a trace class operator, since a convergent sum of Bernoulli (or geometric) variables always has exponential tails.
Definition 3. A point process $\mathcal{X}$ on $\Lambda$ is said to be a determinantal process with kernel $K$ if it is simple and its joint intensities satisfy
$$\rho_k(x_1,\dots,x_k) = \det\big(K(x_i,x_j)\big)_{1 \le i,j \le k}$$
for every $k \ge 1$ and $x_1,\dots,x_k \in \Lambda$.
Remark 4. We postulate determinantal processes to be simple because we have adopted equation (1) as the definition of joint intensities. If, instead, we start with the slightly more restrictive definition of joint intensities explained above in Remark 2, it follows that the process must be simple. Nevertheless, postulating simplicity is more in tune with the conventions of physicists, who often consider a determinantal process with $k$ points not as a random counting measure but as a random point in $\Lambda^k/\text{Diagonals}$, where "Diagonals" denotes the subset of points of $\Lambda^k$ with at least two coordinates equal. This viewpoint, together with a postulate on the behaviour of quantum amplitudes under exchange of particles, leads naturally to determinantal and permanental processes (and additionally to "fractional statistics" when $\Lambda$ is two-dimensional; see for instance [16]).
Consider a kernel K that defines a self-adjoint trace-class operator K. Macchi [22] and Soshnikov [27] showed that there exists a determinantal point process with kernel K if and only if all eigenvalues of K are in the interval [0, 1]. In Section 2 we shall give a probabilistic proof of this fact.
Remark 5. When we speak of a kernel $K$ on $\Lambda^2$, a priori it is defined only almost everywhere w.r.t. $\mu \times \mu$, and thus quantities like $\int K(x,x)\,d\mu(x)$ that appear in the definition of joint intensities are not well defined. This can be made sense of as follows.
The kernel $K$ defines a self-adjoint integral operator $\mathcal{K}$ with eigenfunctions $\varphi_k$ and eigenvalues $\lambda_k$, where the $\varphi_k$ are orthogonal in $L^2(\mu)$. In particular, there is a set $\Lambda_1 \subset \Lambda$ such that the $\varphi_k$ are all defined pointwise on $\Lambda_1$ and $\mu(\Lambda_1^c) = 0$. At least in the case when $\mathcal{K}$ has finite rank, this shows that $K(x,y) = \sum_k \lambda_k \varphi_k(x)\overline{\varphi_k(y)}$ is well defined on $\Lambda_1 \times \Lambda_1$, and that is sufficient to define $\int K(x,x)\,d\mu(x)$, etc. For more details on this point, see Lemmas 1 and 2 of the survey paper of Soshnikov [27].
Recall that the permanent of an $n \times n$ matrix $M$ is defined as
$$\operatorname{per}(M) = \sum_{\pi \in S_n} \prod_{i=1}^n M_{i,\pi(i)}.$$

Hough, Krishnapur, Peres and Virág / Determinantal Processes and Independence

Definition 6. A point process $\mathcal{X}$ on $\Lambda$ is said to be a permanental process with kernel $K$ if its joint intensities satisfy
$$\rho_k(x_1,\dots,x_k) = \operatorname{per}\big(K(x_i,x_j)\big)_{1 \le i,j \le k}$$
for every $k \ge 1$ and $x_1,\dots,x_k \in \Lambda$.
For any kernel $K$ that defines a self-adjoint non-negative definite operator $\mathcal{K}$ (i.e., all eigenvalues of $\mathcal{K}$ are non-negative), there exists a permanental point process with kernel $K$. We shall give a proof of this known fact in Section 4. Now we state the main theorems. The most common application of the following theorem is to describe the behavior of a determinantal process restricted to a subset.
Theorem 7. Suppose $\mathcal{X}$ is a determinantal process with a trace-class kernel $K$. Write
$$K(x,y) = \sum_{k=1}^n \lambda_k\,\varphi_k(x)\overline{\varphi_k(y)},$$
where $\varphi_k$ are normalized eigenfunctions of $\mathcal{K}$ with eigenvalues $\lambda_k \in [0,1]$. (Here $n = \infty$ is allowed.) Let $I_k$, $1 \le k \le n$, be independent random variables with $I_k \sim$ Bernoulli$(\lambda_k)$. Set
$$K_I(x,y) = \sum_{k=1}^n I_k\,\varphi_k(x)\overline{\varphi_k(y)};$$
$K_I$ is a random analogue of the kernel $K$. Let $\mathcal{X}_I$ be the determinantal process with kernel $K_I$ (i.e., first choose the $I_k$'s and then independently sample a discrete set that is determinantal with kernel $K_I$). Then
$$\mathcal{X} \overset{d}{=} \mathcal{X}_I. \tag{7}$$
In particular, the total number of points in the process $\mathcal{X}$ has the distribution of a sum of independent Bernoulli$(\lambda_k)$ random variables.
In the special case of random spanning trees of a finite graph, Bapat [1] was the first to observe the last fact stated above. Namely, he proved that the number of edges of the spanning tree falling in a subset of edges of the given graph has the distribution of a sum of independent Bernoullis.
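This Bernoulli structure is easy to check numerically in the finite setting, using the well-known formula $\mathbf{E}\,z^{\mathcal{X}(D)} = \det(I + (z-1)K_D)$ for the generating function of the particle count. The sketch below is our illustration (the matrix size and the subset are arbitrary choices): it builds a random Hermitian kernel with eigenvalues in $[0,1]$, restricts it to a subset $D$, and compares the determinant with the product of Bernoulli generating functions over the eigenvalues of the restricted kernel.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random Hermitian kernel on a 6-point space with eigenvalues in [0, 1].
U = np.linalg.qr(rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6)))[0]
lam = rng.uniform(0, 1, size=6)
K = (U * lam) @ U.conj().T

D = [0, 2, 3]                      # a subset of the space
KD = K[np.ix_(D, D)]               # restricted kernel
lamD = np.linalg.eigvalsh(KD)      # eigenvalues of the restriction

# E[z^{X(D)}] two ways: as det(I + (z-1) K_D), and as the generating
# function of a sum of independent Bernoulli(lam_i^D) variables.
for z in [0.0, 0.5, 2.0]:
    lhs = np.linalg.det(np.eye(len(D)) + (z - 1) * KD).real
    rhs = np.prod(1 + (z - 1) * lamD)
    assert abs(lhs - rhs) < 1e-10
```

Matching generating functions on an interval of $z$ values identifies the distribution of $\mathcal{X}(D)$ with the Bernoulli sum.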
When the $D_i$'s are related in a special way, there exists a simple probabilistic description of the joint distribution of the counts $\mathcal{X}(D_i)$.

Definition 8. Call mutually disjoint subsets $D_1,\dots,D_r$ of $\Lambda$ simultaneously observable for the kernel $K$ if the operators obtained by restricting $\mathcal{K}$ to the $D_i$'s have a common set of eigenfunctions (restricted to the respective $D_i$'s). The motivation for this terminology comes from quantum mechanics, where two physical quantities can be simultaneously measured if the corresponding operators commute.

Proposition 9. Under the assumptions of Theorem 7, let $D_i \subset \Lambda$, $1 \le i \le r$, be mutually disjoint and simultaneously observable. Let $e_i$ be the standard basis vectors in $\mathbb{R}^r$. Denote by $\varphi_k$ the common eigenfunctions of $\mathcal{K}$ on the $D_i$'s and by $\lambda_{k,i}$ the corresponding eigenvalues. Write $\lambda_k = \sum_i \lambda_{k,i}$ and note that $\lambda_k \le 1$. Then
$$\big(\mathcal{X}(D_1),\dots,\mathcal{X}(D_r)\big) \overset{d}{=} \sum_{k=1}^n \vec{\xi}_k, \tag{8}$$
where $\vec{\xi}_k = (\xi_{k,1},\dots,\xi_{k,r})$ are independent for different values of $k$, with $\mathbf{P}(\vec{\xi}_k = e_i) = \lambda_{k,i}$ for $1 \le i \le r$ and $\mathbf{P}(\vec{\xi}_k = 0) = 1 - \lambda_k$. In words, $(\mathcal{X}(D_1),\dots,\mathcal{X}(D_r))$ has the same distribution as the vector of counts in $r$ cells if we pick $n$ balls and assign the $k$th ball to the $i$th cell with probability $\lambda_{k,i}$ (there may be a positive probability of not assigning it to any of the cells).
Theorem 10. Suppose $\mathcal{X}$ is a permanental process in $\Lambda$ with a trace-class kernel
$$K(x,y) = \sum_{k=1}^n \lambda_k\,\varphi_k(x)\overline{\varphi_k(y)},$$
where $\varphi_k$ are normalized eigenfunctions of $\mathcal{K}$ with eigenvalues $\lambda_k$ ($n = \infty$ is allowed). Let $\vec{\alpha} = (\alpha_1,\dots,\alpha_n)$, where the $\alpha_i$ are non-negative integers such that $\ell = \ell(\vec{\alpha}) = \alpha_1 + \cdots + \alpha_n < \infty$, and let $Z_{\vec{\alpha}}$ be the random vector of length $\ell$ with density (with respect to $\mu^\ell$)
$$p_{\vec{\alpha}}(z_1,\dots,z_\ell) = \frac{1}{\ell!\,\prod_{i=1}^n \alpha_i!}\,\Big|\operatorname{per}\big(\varphi_{k_j}(z_m)\big)_{1\le j,m\le \ell}\Big|^2, \tag{9}$$
where the index $k_j$ runs through each value $i$ exactly $\alpha_i$ times. Let $\vec{\gamma} = (\gamma_1,\dots,\gamma_n)$, where the $\gamma_k$ are independent with $\gamma_k \sim$ geometric$\big(\frac{\lambda_k}{\lambda_k+1}\big)$. Then the point process obtained by forgetting the order of the coordinates of $Z_{\vec{\gamma}}$ has the same distribution as $\mathcal{X}$; that is, $\mathcal{X}$ is a mixture of the processes $Z_{\vec{\alpha}}$ with mixing weights $\mathbf{P}(\vec{\gamma} = \vec{\alpha})$. In particular, $\mathcal{X}(\Lambda)$ has the distribution of a sum of independent geometric$\big(\frac{\lambda_k}{\lambda_k+1}\big)$ random variables.

Remark 11. The density given in (9) has physical significance. Interpreting the functions $\varphi_k$ as eigenstates of a one-particle Hamiltonian, (9) gives the distribution for $\ell$ non-interacting bosons in a common potential, given that $\alpha_i$ of them lie in the eigenstate $\varphi_i$. This density is the exact analogue of the density
$$\frac{1}{\ell!}\,\Big|\det\big(\varphi_{i_j}(z_m)\big)_{1\le j,m\le\ell}\Big|^2, \tag{10}$$
which appears in Theorem 7 and gives the distribution for $\ell$ non-interacting fermions in a common potential, given that one fermion lies in each of the eigenstates $\varphi_{i_1},\dots,\varphi_{i_\ell}$. The fact that (10) vanishes if a row is repeated illustrates Pauli's exclusion principle, which states that multiple fermions cannot occupy the same eigenstate. See [9] for more details.
Theorem 12. Under the assumptions of Theorem 10, suppose $D_1,\dots,D_r$ are simultaneously observable as in Definition 8. Denote by $\varphi_k$ the common eigenfunctions of $\mathcal{K}$ on the $D_i$'s and by $\lambda_{k,i}$ the corresponding eigenvalues. Then
$$\big(\mathcal{X}(D_1),\dots,\mathcal{X}(D_r)\big) \overset{d}{=} \Big(\sum_k \eta_{k,1},\dots,\sum_k \eta_{k,r}\Big),$$
where $(\eta_{k,1},\dots,\eta_{k,r})$ are independent for different values of $k$; for each $k$, the sum $\eta_k = \sum_i \eta_{k,i}$ has a geometric distribution with mean $\lambda_k := \sum_i \lambda_{k,i}$, and given $\sum_i \eta_{k,i} = N$, the vector $(\eta_{k,1},\dots,\eta_{k,r})$ has a multinomial$\big(N;\frac{\lambda_{k,1}}{\lambda_k},\dots,\frac{\lambda_{k,r}}{\lambda_k}\big)$ distribution.

Determinantal processes
We begin with a few important examples of determinantal processes.
Example 13 (Non-intersecting random walks). Consider $n$ independent simple symmetric random walks on $\mathbb{Z}$ started from $i_1 < i_2 < \dots < i_n$, where all the $i_j$'s are even. Let $P_{i,j}(t)$ be the $t$-step transition probabilities. Karlin and McGregor [15] show that the probability that the random walks are at $j_1 < j_2 < \dots < j_n$ at time $t$ and have mutually disjoint paths is
$$\det\big(P_{i_k,\,j_l}(t)\big)_{1 \le k,l \le n}.$$
It follows easily that if $t$ is even and we also condition the walks to return to $i_1,\dots,i_n$ at time $t$, then the positions of the walks at time $t/2$ are determinantal with a Hermitian kernel. See Johansson [14] for this and more general results.
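The Karlin–McGregor formula can be verified by brute force in a tiny case. The following sketch is our illustration (the horizon $t = 4$ and the start/end points are arbitrary choices): it enumerates all pairs of $\pm 1$ step sequences exactly, using `fractions` to avoid rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb

def P(i, j, t):
    """t-step transition probability of simple symmetric random walk on Z."""
    if (t + j - i) % 2 or abs(j - i) > t:
        return Fraction(0)
    return Fraction(comb(t, (t + j - i) // 2), 2 ** t)

t, starts, ends = 4, (0, 2), (0, 2)

# Exact enumeration over all 2^t x 2^t pairs of step sequences.
hits = 0
for s1, s2 in product(product((-1, 1), repeat=t), repeat=2):
    x1, x2 = starts
    ok = True
    for a, b in zip(s1, s2):
        x1 += a; x2 += b
        if x1 == x2:          # same parity: the walks meet before crossing
            ok = False
            break
    if ok and (x1, x2) == ends:
        hits += 1
prob = Fraction(hits, 2 ** (2 * t))

det = P(0, 0, t) * P(2, 2, t) - P(0, 2, t) * P(2, 0, t)
assert prob == det            # Karlin-McGregor: 20/256 in this case
```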
Example 14 (Uniform spanning trees). Let $G$ be a finite undirected graph and let $E$ be the set of oriented edges (each undirected edge appears in $E$ with both orientations). Let $T$ be uniformly chosen from the set of spanning trees of $G$. For each directed edge $e = vw$, let $\chi_e := 1_{vw} - 1_{wv}$ denote the unit flow along $e$. Define the star space and the cycle space
$$\star := \operatorname{span}\Big\{\sum_{w:\,vw \in E} \chi_{vw} : v \text{ a vertex}\Big\}, \qquad \diamondsuit := \operatorname{span}\Big\{\sum_{i=1}^n \chi_{e_i} : e_1,\dots,e_n \text{ is an oriented cycle}\Big\},$$
viewed as subspaces of the space $H$ of antisymmetric functions on $E$ (i.e., $f(vw) = -f(wv)$). It is easy to see that $H = \star \oplus \diamondsuit$. Now define $I^e := K_\star \chi_e$, the orthogonal projection of $\chi_e$ onto $\star$. Then the set of edges in $T$ forms a determinantal process with kernel $K(e,f) := (I^e, I^f)$ with respect to counting measure on the set of edges. This was proved by Burton and Pemantle [4], who represented $K(e,f)$ as the current flowing through $f$ when a unit of current is sent from the tail to the head of $e$. The Hilbert space formulation above is from BLPS [2].
Example 15 (Ginibre ensemble). Let $Q$ be an $n \times n$ matrix with i.i.d. standard complex normal entries. Ginibre [10] proved that the eigenvalues of $Q$ form a determinantal process in $\mathbb{C}$ with the kernel
$$K_n(z,w) = \frac{1}{\pi}\,e^{-(|z|^2+|w|^2)/2} \sum_{k=0}^{n-1} \frac{(z\bar{w})^k}{k!}$$
with respect to Lebesgue measure. As $n \to \infty$, we get a determinantal process in the plane with the kernel
$$K(z,w) = \frac{1}{\pi}\,e^{-(|z|^2+|w|^2)/2}\,e^{z\bar{w}}.$$

Example 16 (Zero set of a Gaussian analytic function). The power series $f_1(z) := \sum_{n=0}^\infty a_n z^n$, where the $a_n$ are i.i.d. standard complex normals, defines a random analytic function in the unit disk (almost surely). Peres and Virág [23] show that the zero set of $f_1$ is a determinantal process in the disk with the Bergman kernel
$$K(z,w) = \frac{1}{\pi\,(1 - z\bar{w})^2}$$
with respect to Lebesgue measure in the unit disk.

Determinantal projection processes: motivation and construction
The most general determinantal processes are mixtures of determinantal projection processes, i.e., processes whose kernel $K_H$ defines a projection operator $\mathcal{K}_H$ onto a subspace $H \subset L^2(\Lambda)$ or, equivalently,
$$K_H(x,y) = \sum_{k=1}^n \varphi_k(x)\overline{\varphi_k(y)}$$
for an orthonormal basis $\{\varphi_k\}$ of $H$.

Lemma 17. Suppose $\mathcal{X}$ is a determinantal process whose kernel $K$ defines a projection operator onto an $n$-dimensional subspace $H \subset L^2(\Lambda)$. Then the number of points in $\mathcal{X}$ is equal to $n$, almost surely.
Proof. The conditions imply that the matrix $(K(x_i,x_j))_{1\le i,j\le k}$ has rank at most $n$ for any $k \ge 1$. From (3), we see that $\mathbf{E}\big[\mathcal{X}(\Lambda)(\mathcal{X}(\Lambda)-1)\cdots(\mathcal{X}(\Lambda)-k+1)\big] = 0$ for $k > n$. This shows that $\mathcal{X}(\Lambda) \le n$ almost surely. However,
$$\mathbf{E}\,\mathcal{X}(\Lambda) = \int_\Lambda K(x,x)\,d\mu(x) = \sum_{k=1}^n \|\varphi_k\|^2 = n.$$
Therefore $\mathcal{X}(\Lambda) = n$, almost surely.
Despite the fact that determinantal processes arise naturally and many important statistics can be computed, the standard Definition 3 is lacking in direct probabilistic intuition. Below we present an algorithm that is somewhat more natural from a probabilist's point of view, and can also be used for modeling determinantal processes.
In the discrete case, the projection operator $\mathcal{K}_H$ can be applied to the delta function at a point, and we have $\mathcal{K}_H\delta_x(\cdot) = K(\cdot,x)$. In the general case we take this as the definition of $\mathcal{K}_H\delta_x$. Let $\|\cdot\|$ denote the norm of $L^2(\mu)$. The intensity measure of the process is given by
$$d\mu_H(x) = K(x,x)\,d\mu(x) = \|\mathcal{K}_H\delta_x\|^2\,d\mu(x). \tag{12}$$
When $\mu$ is supported on countably many points, we have $\mu_H(\{x\}) = \|\mathcal{K}_H\delta_x\|^2\,\mu(\{x\})$. Since $\mu_H(\Lambda) = n$, the measure $\mu_H/n$ is a probability measure on $\Lambda$. We construct the determinantal process as follows.

Algorithm 18. Start with $n = \dim(H)$ and $H_n = H$.
• Pick a random point $X_n$ from the probability measure $\mu_{H_n}/n$.
• Let $H_{n-1} \subset H_n$ be the orthocomplement of the function $\mathcal{K}_{H_n}\delta_{X_n}$ in $H_n$. In the discrete case, $H_{n-1} = \{f \in H_n : f(X_n) = 0\}$. Note that $\dim(H_{n-1}) = n-1$ a.s.
• Decrease $n$ by 1 and iterate.
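For a finite $\Lambda$, the algorithm is a few lines of linear algebra. The sketch below is our illustration (the function name and the SVD-based basis update are our choices, not from the original): $H \subset \mathbb{C}^N$ is represented by a matrix with orthonormal columns, each step samples a point from the diagonal of the current projection kernel, and the basis is then restricted to the functions vanishing at that point.

```python
import numpy as np

def sample_projection_dpp(Phi, rng):
    """Sample a determinantal projection process on {0, ..., N-1}.

    Phi: (N, n) matrix with orthonormal columns spanning H.
    Each step picks X_k with probability K(x, x)/k and then replaces H
    by the subspace of functions vanishing at X_k.
    """
    Phi = np.array(Phi, dtype=complex)
    points = []
    while Phi.shape[1] > 0:
        k = Phi.shape[1]
        probs = (np.abs(Phi) ** 2).sum(axis=1) / k   # K(x, x) / k
        probs /= probs.sum()
        x = rng.choice(len(probs), p=probs)
        points.append(int(x))
        # Coordinates (in the current basis) of K_H delta_x.
        w = Phi[x].conj()
        w /= np.linalg.norm(w)
        # Orthonormal basis of the orthocomplement of w in C^k:
        # the rank-(k-1) projection I - w w* has that space as its range.
        P = np.eye(k) - np.outer(w, w.conj())
        U = np.linalg.svd(P)[0][:, : k - 1]
        Phi = Phi @ U                                 # basis of H_{k-1}
    return points

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(7, 3)))[0]          # a 3-dim H inside R^7
pts = sample_projection_dpp(Q, rng)
assert len(pts) == 3 and len(set(pts)) == 3           # exactly n distinct points
```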
Proposition 19. The points (X 1 , . . . , X n ) constructed by Algorithm 18 are distributed as a uniform random ordering of the points in a determinantal process X with kernel K.
Proof. Let $\psi_j = \mathcal{K}_H\delta_{x_j}$. Projecting to $H_j$ is equivalent to first projecting to $H$ and then to $H_j$, and it is easy to check that $\mathcal{K}_{H_j}\delta_{x_j} = \mathcal{K}_{H_j}\psi_j$. Thus, by (12), the density of the random vector $(X_1,\dots,X_n)$ constructed by the algorithm equals
$$p(x_1,\dots,x_n) = \prod_{j=1}^n \frac{1}{j}\,\big\|\mathcal{K}_{H_j}\delta_{x_j}\big\|^2 = \frac{1}{n!}\prod_{j=1}^n \big\|\mathcal{K}_{H_j}\psi_j\big\|^2.$$

Note that $H_j = H \cap \{\psi_{j+1},\dots,\psi_n\}^\perp$, and therefore $V = \prod_{j=1}^n \|\mathcal{K}_{H_j}\psi_j\|$ is exactly the repeated "base times height" formula for the volume of the parallelepiped determined by the vectors $\psi_1,\dots,\psi_n$ in the finite-dimensional vector space $H \subset L^2(\Lambda)$. It is well known that $V^2$ equals the determinant of the Gram matrix whose $(i,j)$ entry is the scalar product of $\psi_i$ and $\psi_j$, that is,
$$V^2 = \det\big((\psi_i,\psi_j)\big)_{1\le i,j\le n} = \det\big(K(x_i,x_j)\big)_{1\le i,j\le n},$$
so the random variables $X_1,\dots,X_n$ are exchangeable. Viewed as a point process, the $n$-point joint intensity of $\{X_j\}_{j=1}^n$ is $n!\,p(x_1,\dots,x_n)$, which agrees with that of the determinantal process $\mathcal{X}$. The claim now follows by Lemma 17.
Example 20 (Uniform spanning trees). We continue the discussion of Example 14. Let $G_{n+1}$ be an undirected graph on $n+1$ vertices. For every edge $e$, the effective resistance when a unit of current is sent along $e$ is $R(e) = (I^e, I^e)$. To use our algorithm to choose a uniform spanning tree, proceed as follows:
• If $n = 0$, stop.
• Take $X_n$ to be a random edge, chosen so that $\mathbf{P}(X_n = e_i) = \frac{1}{n}R(e_i)$.
• Construct $G_n$ from $G_{n+1}$ by contracting the edge $X_n$, and update the effective resistances $\{R(e)\}$.
• Decrease $n$ by one and iterate.
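For small graphs the procedure can be implemented directly with the graph Laplacian, computing $R(e)$ from its pseudoinverse. The sketch below is our illustration (the function name and the contraction bookkeeping are our choices).

```python
import numpy as np

def sample_ust(num_vertices, edges, rng):
    """Sample a spanning tree by the effective-resistance algorithm above.

    edges: list of (u, v) pairs; returns indices into `edges`.
    """
    labels = list(range(num_vertices))          # vertex -> contracted class
    chosen = []
    alive = list(range(len(edges)))
    n = num_vertices - 1                        # edges still to pick
    while n > 0:
        verts = sorted(set(labels))
        idx = {v: i for i, v in enumerate(verts)}
        L = np.zeros((len(verts), len(verts)))  # Laplacian of contracted graph
        for e in alive:
            u, v = idx[labels[edges[e][0]]], idx[labels[edges[e][1]]]
            L[u, u] += 1; L[v, v] += 1
            L[u, v] -= 1; L[v, u] -= 1
        Lp = np.linalg.pinv(L)
        # R(e) = (e_u - e_v)^T L^+ (e_u - e_v); by Foster's theorem the
        # resistances of a connected graph sum to n, so R(e)/n sums to 1.
        R = np.array([Lp[(u := idx[labels[edges[e][0]]]), u]
                      + Lp[(v := idx[labels[edges[e][1]]]), v]
                      - 2 * Lp[u, v] for e in alive])
        p = R / R.sum()
        e = alive[rng.choice(len(alive), p=p)]
        chosen.append(e)
        # Contract e: merge its endpoints' classes, drop self-loops.
        a, b = labels[edges[e][0]], labels[edges[e][1]]
        labels = [a if l == b else l for l in labels]
        alive = [f for f in alive
                 if labels[edges[f][0]] != labels[edges[f][1]]]
        n -= 1
    return chosen

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]        # triangle plus a pendant edge
tree = sample_ust(4, edges, rng)
assert len(tree) == 3 and 3 in tree              # pendant edge (2, 3) is forced
```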
For sampling uniform spanning trees, more efficient algorithms are known; but for general determinantal processes, the procedure above is the most efficient one we are aware of.
We shall need the following lemmas.
Lemma 21. Suppose $\{\varphi_k\}_{k=1}^n$ is an orthonormal set in $L^2(\Lambda)$. Then there exists a determinantal process with kernel $K(x,y) = \sum_{k=1}^n \varphi_k(x)\overline{\varphi_k(y)}$.

Proof. For any $x_1,\dots,x_n$ we have $(K(x_i,x_j))_{1\le i,j\le n} = AA^*$, where $A_{i,k} = \varphi_k(x_i)$. Therefore $\det(K(x_i,x_j))$ is non-negative. Moreover,
$$\int_{\Lambda^n} \det\big(K(x_i,x_j)\big)_{1\le i,j\le n}\,d\mu(x_1)\cdots d\mu(x_n) = \sum_{\pi,\tau \in S_n} \operatorname{sgn}(\pi)\operatorname{sgn}(\tau) \prod_{k=1}^n \int_\Lambda \varphi_{\pi(k)}(x_k)\overline{\varphi_{\tau(k)}(x_k)}\,d\mu(x_k).$$
In the sum, if $\pi(k) \ne \tau(k)$, then $\int_\Lambda \varphi_{\pi(k)}(x_k)\overline{\varphi_{\tau(k)}(x_k)}\,d\mu(x_k) = 0$, and when $\pi(k) = \tau(k)$, this integral is 1. Thus only the terms with $\pi = \tau$ contribute. We get
$$\int_{\Lambda^n} \det\big(K(x_i,x_j)\big)_{1\le i,j\le n}\,d\mu(x_1)\cdots d\mu(x_n) = n!,$$
which, along with the non-negativity of $\det(K(x_i,x_j))_{1\le i,j\le n}$, shows that $\frac{1}{n!}\det(K(x_i,x_j))_{1\le i,j\le n}$ is a probability density on $\Lambda^n$. If we look at the resulting random vector as a set of unlabeled points in $\Lambda$, we get the desired $n$-point joint intensity $\rho_n$.
Lower joint intensities are obtained by integrating over some of the $x_i$'s:
$$\rho_k(x_1,\dots,x_k) = \frac{1}{(n-k)!}\int_{\Lambda^{n-k}} \rho_n(x_1,\dots,x_n)\,d\mu(x_{k+1})\cdots d\mu(x_n). \tag{13}$$
We caution that (13) is valid only for a point process that has $n$ points almost surely. In general, there is no way to get lower joint intensities from higher ones. We now show how to get $\rho_{n-1}$. The others can be found in exactly the same way, or inductively. Set $k = n-1$ in (13) and expand $\rho_n(x_1,\dots,x_n) = \det(K(x_i,x_j))_{1\le i,j\le n}$ as we did before, to get
$$\rho_{n-1}(x_1,\dots,x_{n-1}) = \sum_{\pi,\tau \in S_n} \operatorname{sgn}(\pi)\operatorname{sgn}(\tau) \prod_{k=1}^{n-1} \varphi_{\pi(k)}(x_k)\overline{\varphi_{\tau(k)}(x_k)} \int_\Lambda \varphi_{\pi(n)}(x_n)\overline{\varphi_{\tau(n)}(x_n)}\,d\mu(x_n).$$
If $\pi(n) \ne \tau(n)$, the integral vanishes. And if $\pi(n) = \tau(n) = j$, then $\pi$ and $\tau$ map $\{1,\dots,n-1\}$ to $\{1,2,\dots,n\} \setminus \{j\}$ (and the product of the signs of these "permutations" is the same as $\operatorname{sgn}(\pi)\operatorname{sgn}(\tau)$, because $\pi(n) = \tau(n)$). This gives us
$$\rho_{n-1}(x_1,\dots,x_{n-1}) = \sum_{j=1}^n \sum_{\substack{\pi,\tau:\\ \pi(n)=\tau(n)=j}} \operatorname{sgn}(\pi)\operatorname{sgn}(\tau) \prod_{k=1}^{n-1} \varphi_{\pi(k)}(x_k)\overline{\varphi_{\tau(k)}(x_k)}.$$
We must show that this quantity is equal to $\det(K(x_i,x_j))_{i,j\le n-1}$. For this, note that
$$\det(AB) = \sum_{i_1 < \dots < i_m} \det A[i_1,\dots,i_m]\,\det B[i_1,\dots,i_m], \tag{14}$$
where we abuse notation and let $A[i_1,\dots,i_m]$ stand for the matrix formed by taking the columns of $A$ numbered $i_1,\dots,i_m$, and $B[i_1,\dots,i_m]$ for the matrix formed by the corresponding rows of $B$ (here $A$ is $m \times n$ and $B$ is $n \times m$). The identity (14), the Cauchy–Binet formula, can be proved by observing that both sides are multi-linear in the rows of $A$ and in the columns of $B$.
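The Cauchy–Binet identity (14) is easy to test numerically; the following check is our illustration (the matrix sizes are arbitrary).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(n, m))

# det(AB) versus the sum over m-subsets of column/row minors.
lhs = np.linalg.det(A @ B)
rhs = sum(np.linalg.det(A[:, list(S)]) * np.linalg.det(B[list(S), :])
          for S in combinations(range(n), m))
assert abs(lhs - rhs) < 1e-10
```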
We now prove Theorem 7. Before that, we remark that in many examples the kernel $K$ defines a projection operator, i.e., $\lambda_k = 1$ for all $k$. Then $I_k = 1$ for all $k$, almost surely, and the theorem is trivial. The theorem is applicable to the restriction of the process $\mathcal{X}$ to $D$ for any Borel set $D \subset \Lambda$, as the restricted process is determinantal with kernel the restriction of $K$ to $D \times D$.
Proof of Theorem 7. First assume that $\mathcal{K}$ is a finite-dimensional operator, i.e., $K(x,y) = \sum_{k=1}^n \lambda_k\varphi_k(x)\overline{\varphi_k(y)}$ for some finite $n$. We show that the processes on the left and right sides of (7) have the same joint intensities. By (3), this implies that these processes have the same distribution.
Note that the process $\mathcal{X}_I$ exists by Lemma 21. For $m > n$, the $m$-point joint intensities of both $\mathcal{X}$ and $\mathcal{X}_I$ are clearly zero. Now consider $m \le n$ and $x_1,\dots,x_m \in \Lambda$. We claim that
$$\mathbf{E}\big[\det\big(K_I(x_i,x_j)\big)_{1\le i,j\le m}\big] = \det\big(K(x_i,x_j)\big)_{1\le i,j\le m}. \tag{15}$$
To prove (15), note that
$$\big(K_I(x_i,x_j)\big)_{1\le i,j\le m} = AB,$$
where $A$ is the $m \times n$ matrix with $A_{i,k} = I_k\varphi_k(x_i)$ and $B$ is the $n \times m$ matrix with $B_{k,j} = \overline{\varphi_k(x_j)}$. Apply the Cauchy–Binet formula (14) to the $A, B$ defined above and take expectations. Observe that $B[i_1,\dots,i_m]$ is nonrandom and
$$\mathbf{E}\,\det A[i_1,\dots,i_m] = \det C[i_1,\dots,i_m],$$
where $C$ is the $m \times n$ matrix with $C_{i,k} = \lambda_k\varphi_k(x_i)$. Applying the Cauchy–Binet formula in the reverse direction to $C$ and $B$, we obtain (15) and hence also (7). By Lemma 17, given $\{I_k\}_{k\ge 1}$, the process $\mathcal{X}_I$ has $\sum_k I_k$ points, almost surely. Therefore,
$$\mathcal{X}_I(\Lambda) = \sum_k I_k \quad \text{almost surely.} \tag{16}$$
So far we assumed that the operator $\mathcal{K}$ determined by the kernel $K$ is finite dimensional. Now suppose $\mathcal{K}$ is a general trace class operator. Then $\sum_k \lambda_k < \infty$ and hence, almost surely, $\sum_k I_k < \infty$. Therefore the process $\mathcal{X}_I$ is well defined and (16) is valid by the same reasoning. Taking expectations and observing that the summands in the Cauchy–Binet formula are non-negative, we obtain
$$\mathbf{E}\big[\det\big(K_I(x_i,x_j)\big)_{1\le i,j\le m}\big] = \sum_{i_1 < \dots < i_m} \det C[i_1,\dots,i_m]\,\det B[i_1,\dots,i_m],$$
where $C$ is the same as before. To conclude that the right hand side is just $\det(K(x_i,x_j))_{1\le i,j\le m}$, we first apply the Cauchy–Binet formula to the finite approximations obtained by truncating the sum defining $K$ at a finite rank, and then let the rank tend to infinity, as was required to show. (In short, the proof for the infinite case is exactly the same as before, only we cautiously avoided applying the Cauchy–Binet formula to the product of two infinite rectangular matrices.) Now we give a probabilistic proof of the following criterion for a Hermitian integral kernel to define a determinantal process.
Theorem 22 (Macchi [22], Soshnikov [27]). Let K determine a self-adjoint integral operator K on L 2 (Λ) that is locally trace class. Then K defines a determinantal process on Λ if and only if all the eigenvalues of K are in [0, 1].
Proof. We can assume that K is trace class, since it suffices to construct a determinantal process on compact subsets of Λ with kernel the restriction of K.
Sufficiency: If K is a projection operator, this is precisely Lemma 21. If the eigenvalues are {λ k }, then as in the proof of Theorem 7 we construct the process X I . The proof there shows that X I is determinantal with kernel K.
Necessity: Suppose that $\mathcal{X}$ is determinantal with kernel $K$. Since the joint intensities of $\mathcal{X}$ are non-negative, $K$ must be non-negative definite. Now suppose that the largest eigenvalue of $\mathcal{K}$ is $\lambda > 1$. Let $\mathcal{X}_1$ be the process obtained by first sampling $\mathcal{X}$ and then independently deleting each point of $\mathcal{X}$ with probability $1 - \frac{1}{\lambda}$. Computing the joint intensities shows that $\mathcal{X}_1$ is determinantal with kernel $\frac{1}{\lambda}K$. Now $\mathcal{X}$ has finitely many points (we assumed that $\mathcal{K}$ is trace class) and $\lambda > 1$. Hence, $\mathbf{P}[\mathcal{X}_1(\Lambda) = 0] > 0$. However, $\frac{1}{\lambda}\mathcal{K}$ has all eigenvalues in $[0,1]$, with at least one eigenvalue equal to 1, whence by Theorem 7, $\mathbf{P}[\mathcal{X}_1(\Lambda) \ge 1] = 1$, a contradiction.
Example 23 (Non-measurability of the Bernoullis). A natural question that arises from Theorem 7 is whether, given a realization of the determinantal process X , we can determine the values of the I k 's. This is not always possible, i.e., the I k 's are not measurable w.r.t. the process X in general.
Consider the kernel $K = \frac{1}{2}I$ on the two-point space $\Lambda = \{1,2\}$, with the eigenfunctions chosen to be $\varphi_{1,2} = \frac{1}{\sqrt{2}}(e_1 \pm e_2)$, each with eigenvalue $\frac{1}{2}$. Given $(I_1,I_2) = (1,0)$ or $(0,1)$, the process $\mathcal{X}_I$ consists of a single point, located at 1 or 2 with probability $\frac{1}{2}$ each; hence $\mathbf{P}\big(\mathcal{X} = \{1\},\,(I_1,I_2) = (1,0)\big) = \frac{1}{8}$. On the other hand, each of the four possible values of $\mathcal{X}$ has probability $\frac{1}{4}$, so every event measurable with respect to $\mathcal{X}$ has probability a multiple of $\frac{1}{4}$. If the $I_k$'s were measurable with respect to $\mathcal{X}$, the displayed joint event would be such an event, of probability $\frac{1}{8}$; it follows that the Bernoullis cannot be measurable.
Theorem 7 gives us the distribution of the number of points $\mathcal{X}(D)$ in any subset of $\Lambda$. Given several regions $D_1,\dots,D_r$, can we find the joint distribution of $\mathcal{X}(D_1),\dots,\mathcal{X}(D_r)$? It seems that a simple probabilistic description of the joint distribution exists only when the $D_i$'s are simultaneously observable, as in Proposition 9.
Proof of Proposition 9. At first we make the following assumptions:

• $\bigcup_i D_i = \Lambda$.
• $K$ defines a finite-dimensional projection operator, i.e., $K(x,y) = \sum_{k=1}^n \varphi_k(x)\overline{\varphi_k(y)}$ for $x,y \in \Lambda$, where $\{\varphi_k\}$ is an orthonormal set in $L^2(\Lambda)$.
Note that by our assumption, the $\varphi_k$ are also orthogonal on $D_i$ for every $1 \le i \le r$. Moreover, it is clear that $\lambda_{k,i} = \int_{D_i} |\varphi_k|^2\,d\mu$.
We write  In particular, since by Lemma 17, a determinantal process whose kernel defines a rank-n projection operator has exactly n points, almost surely. Thus, we have Any term with σ = τ vanishes upon integrating. Indeed, if σ(m) = τ (m) for some m, then where j(m) is the index for which Therefore,

Now consider (8) and set $M_i = \sum_k \xi_{k,i}$ for $1 \le i \le r$. Our goal is to compute $\mathbf{P}[M_1 = k_1,\dots,M_r = k_r]$. This problem is the same as putting $n$ balls into $r$ cells, where the probability for the $j$th ball to fall in cell $i$ is $\lambda_{j,i}$. To have $k_i$ balls in cell $i$ for each $i$, we first take a permutation $\sigma$ of $\{1,2,\dots,n\}$ and then put the $\sigma(m)$th ball into cell $j(m)$ if $k_1 + \cdots + k_{j(m)-1} < m \le k_1 + \cdots + k_{j(m)}$. However, this counts each assignment of balls $\prod_{i=1}^r k_i!$ times. This implies that
$$\mathbf{P}[M_1 = k_1,\dots,M_r = k_r] = \frac{1}{\prod_i k_i!} \sum_{\sigma \in S_n} \prod_{m=1}^n \lambda_{\sigma(m),j(m)},$$
which is precisely what we wanted to show. Now we deal with the two assumptions that we made at the beginning. If $\bigcup_i D_i \ne \Lambda$, we can restrict the point process to $\bigcup_i D_i$; we still have a determinantal process. Then, if the kernel does not define a projection, apply Theorem 7 to write $\mathcal{X}$ as a mixture of determinantal projection processes. Applying (19) to each component in the mixture, we obtain the proposition. The finite rank assumption can be relaxed in the same way as in Theorem 7.

Applications
As an application of Theorem 7, we can derive the following central limit theorem for determinantal processes, due to Costin and Lebowitz [5] in the case of the sine kernel and to Soshnikov [28] for general determinantal processes.
Theorem 24. Let $\mathcal{X}_n$ be a sequence of determinantal processes on $\Lambda$ with kernels $K_n$. Let $D_n$ be a sequence of measurable subsets of $\Lambda$ such that $\operatorname{Var}(\mathcal{X}_n(D_n)) \to \infty$ as $n \to \infty$. Then
$$\frac{\mathcal{X}_n(D_n) - \mathbf{E}\,\mathcal{X}_n(D_n)}{\sqrt{\operatorname{Var}(\mathcal{X}_n(D_n))}} \xrightarrow{\;d\;} N(0,1).$$
Proof. By Theorem 7, $\mathcal{X}_n(D_n)$ has the same distribution as a sum of independent Bernoullis with parameters the eigenvalues of the integral operator associated with $K_n$ restricted to $D_n$. A straightforward application of the Lindeberg–Feller CLT for triangular arrays gives the result.
Remark 25. Existing proofs of results of the kind of Theorem 24 ([5], [28]) use the moment generating function for particle counts. Indeed, one standard way to prove central limit theorems (including the Lindeberg–Feller theorem) uses generating functions. The advantage of our proof is that the reason for the validity of the CLT is more transparent, and a repetition of well-known computations is avoided. Moreover, by applying the classical theory of sums of independent variables, local limit theorems, large deviation principles and extreme value asymptotics follow without any extra effort.
Radially symmetric processes on the complex plane

Proposition 9 implies that when a determinantal process with kernel $K$ has the form $K(z,w) = \sum_n c_n (z\bar{w})^n$ with respect to a radially symmetric measure $\mu$, then the absolute values of the points are independent. More precisely, we have the following.

Theorem 26. Let $\mathcal{X}$ be a determinantal process with kernel $K$ with respect to a radially symmetric measure $\mu$ on $\mathbb{C}$. Write $K(z,w) = \sum_k \lambda_k a_k^2 (z\bar{w})^k$, where $a_k z^k$, $0 \le k \le n-1$, are the normalized eigenfunctions of $\mathcal{K}$. The following construction describes the distribution of $\{|z|^2 : z \in \mathcal{X}\}$.
• Let $Q_0$ have density $a_0^2$ with respect to $\nu$, the push-forward of $\mu$ under $z \mapsto |z|^2$.
• For $1 \le k \le n-1$, let $Q_k$ be an independent size-biased version of $Q_{k-1}$ (i.e., $Q_k$ has density $f_k(q) = \frac{a_k^2}{a_{k-1}^2}\,q$ with respect to the law of $Q_{k-1}$).
• Form the point process in which each point $Q_k$ is included with probability $\lambda_k$, independently of everything else.
When $\mu$ has density $\varphi(|z|)$ with respect to Lebesgue measure, $Q_k$ has density
$$f_{Q_k}(q) = \pi\,a_k^2\,q^k\,\varphi(\sqrt{q}), \qquad q > 0. \tag{21}$$
Theorem 26 (and its higher-dimensional analogues) provides the only kind of example that we know of with interesting simultaneously observable counts.
Proof. Let $\nu$ be the push-forward of $\mu$ under $z \mapsto |z|^2$, i.e., the measure of the squared modulus of a point picked from $\mu$. In particular, if $\mu$ has density $\varphi(|z|)$, then $d\nu(q) = \pi\varphi(\sqrt{q})\,dq$.
For $1 \le i \le r$, let $D_i$ be mutually disjoint open annuli centered at 0 with inner and outer radii $r_i$ and $R_i$ respectively. Since the functions $z^k$ are orthogonal on any annulus centered at zero, it follows that the $D_i$'s are simultaneously observable. To compute the eigenvalues, we integrate these functions against the restricted kernel; clearly all terms but one cancel, and we get that for $z \in D_i$,
$$z^k \lambda_{k,i} = \int_{D_i} \lambda_k a_k^2 (z\bar{w})^k\,w^k\,d\mu(w), \quad \text{and so} \quad \lambda_{k,i} = \lambda_k a_k^2 \int_{D_i} |w|^{2k}\,d\mu(w).$$
As $r_i, R_i$ change, the last expression remains proportional to the probability that the $k$-times size-biased random variable $Q_k$ falls in $(r_i^2, R_i^2)$. When we set $(r_i, R_i) = (0,\infty)$, the result is $\lambda_k$, because $a_k w^k$ has norm 1. Thus the constant of proportionality equals $\lambda_k$. The theorem now follows from Proposition 9.
Example 27 (Ginibre ensemble revisited). Recall that the $n$th Ginibre ensemble described in Example 15 is the determinantal process $\mathcal{G}_n$ on $\mathbb{C}$ with kernel $K_n(z,w) = \sum_{k=0}^{n-1} \lambda_k a_k^2 (z\bar{w})^k$ with respect to the complex Gaussian measure $d\mu = \frac{1}{\pi}e^{-|z|^2}\,dz$, where $a_k^2 = 1/k!$ and $\lambda_k = 1$. The modulus-squared of a standard complex Gaussian is a gamma$(1,1)$ random variable, and its $k$-times size-biased version has the gamma$(k+1,1)$ distribution (see (21)). Theorem 26 immediately yields the following.
Theorem 28 (Kostlan [17]). The set of absolute values of the points of $\mathcal{G}_n$ has the same distribution as $\{Y_1,\dots,Y_n\}$, where the $Y_i$ are independent and $Y_i^2 \sim$ gamma$(i,1)$.
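Kostlan's theorem can be probed by simulation. The sketch below is our illustration (sample size and tolerance are arbitrary choices): by the theorem, $\mathbf{E}\sum_i |z_i|^2 = \sum_{i=1}^n i = n(n+1)/2$ for the $n$th Ginibre ensemble, since the squared moduli are gamma$(i,1)$ variables.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 4, 20000
total = 0.0
for _ in range(trials):
    # Standard complex normal entries: variance 1/2 per real part.
    Q = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    z = np.linalg.eigvals(Q)
    total += (np.abs(z) ** 2).sum()
mean = total / trials

# Sum of gamma(i, 1) means: 1 + 2 + 3 + 4 = 10 for n = 4.
assert abs(mean - n * (n + 1) / 2) < 0.2
```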
All of the above holds for $n = \infty$ as well, in which case we have a determinantal process with kernel $e^{z\bar{w}}$ with respect to $d\mu = \frac{1}{\pi}e^{-|z|^2}\,dz$. This case is also of interest because $\mathcal{G}_\infty$ is a translation-invariant process in the plane.
Example 29 (Zero set of a Gaussian analytic function). Recall that the zero set of $f_1(z) := \sum_{n=0}^\infty a_n z^n$ is a determinantal process in the disk with the Bergman kernel with respect to Lebesgue measure in the unit disk. Theorem 26 applies with $a_k^2 = (k+1)/\pi$ and $\lambda_k = 1$ (to make $K$ trace class, we first restrict it to the disk of radius $r < 1$ and let $r \to 1$). From (21) we immediately see that $Q_k$ has the beta$(k+1,1)$ distribution. Equivalently, we get the following.
Theorem 30 (Peres and Virág [23]). The set of absolute values of the points in the zero set of $f_1$ has the same distribution as $\{U_k^{1/2k} : k \ge 1\}$, where the $U_k$ are i.i.d. uniform on $[0,1]$.

High powers of complex polynomial processes
Rains [24] showed that sufficiently high powers of eigenvalues of a random unitary matrix are independent.
Theorem 31 (Rains [24]). Let $\{z_1,\dots,z_n\}$ be the set of eigenvalues of a random unitary matrix chosen according to Haar measure on $\mathcal{U}(n)$. Then for every $k \ge n$, the set $\{z_1^k,\dots,z_n^k\}$ has the same distribution as a set of $n$ points chosen independently according to uniform measure on the unit circle in the complex plane.
We point out that this theorem holds whenever the angular distribution of the points is a trigonometric polynomial.
Proposition 32. Let $(z_1,\dots,z_n)$ be distributed on $(S^1)^n$ with density $P(z_1,\dots,z_n,\bar{z}_1,\dots,\bar{z}_n)$ w.r.t. uniform measure on $(S^1)^n$, where $P$ is a polynomial of degree $d$ or less in each variable. Then for every $k > d$ the vector $(z_1^k,\dots,z_n^k)$ has the distribution of $n$ points chosen independently according to uniform measure on $S^1$.
Proof. Fix $k > d$ and consider any joint moment
$$\mathbf{E}\Big[\prod_{i=1}^n (z_i^k)^{m_i}(\bar{z}_i^k)^{\ell_i}\Big] = \int_{(S^1)^n} \prod_{i=1}^n z_i^{k m_i}\bar{z}_i^{k\ell_i}\,P\,d\lambda,$$
where $\lambda$ denotes the uniform measure on $(S^1)^n$. If $m_i \ne \ell_i$ for some $i$, then the integral vanishes. To see this, note that the average of a monomial over $(S^1)^n$ is either 1 or 0, depending on whether the exponent of every $z_i$ matches that of $\bar{z}_i$. Suppose without loss of generality that $m_1 > \ell_1$. Then in each term we have an excess of $z_1^k$ which cannot be matched by an equal power of $\bar{z}_1$, because $P$ has degree less than $k$ as a polynomial in $\bar{z}_1$.
We conclude that the joint moments are zero unless $m_i = \ell_i$ for all $i$. If $m_i = \ell_i$ for all $i$, then the expectation equals 1. Thus the joint moments of $(z_1^k,\dots,z_n^k)$ are the same as those of $n$ i.i.d. points chosen uniformly on the unit circle. This proves the proposition.
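A one-variable instance of Proposition 32 can be checked by quadrature. The example below is our illustration (the degree-1 density $1 + \cos\theta$ is an arbitrary choice): all moments of $z^2$ vanish, so $z^2$ is uniform on the circle, while $z$ itself is not.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 20001)[:-1]   # uniform quadrature grid on S^1
w = (1 + np.cos(theta)) / len(theta)            # density P = 1 + (z + zbar)/2
z = np.exp(1j * theta)

# z itself is not uniform: E[z] = 1/2.
assert abs((z * w).sum() - 0.5) < 1e-6
# For k = 2 > d = 1, every moment E[(z^2)^m] vanishes.
for m in (1, 2, 3):
    assert abs(((z ** (2 * m)) * w).sum()) < 1e-6
```

The Riemann sum is exact (to machine precision) for trigonometric polynomials of frequency below the grid size, so these are genuine identities, not approximations.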
More generally, by conditioning on the absolute values, Proposition 32 extends to random points in $\mathbb{C}$ whose angular distribution, given the moduli, has a density that is a polynomial of degree at most $d$ in each variable (Corollary 33). Corollary 33 applies to powers of points of determinantal processes with kernels of the form $K(z,w) = \sum_{k=0}^d c_k (z\bar{w})^k$ w.r.t. a radially symmetric measure $\mu$ on the complex plane. Combining this observation with our earlier results on the independence of the absolute values of the points yields an analogous description for high powers of the points of such processes.

Permanental processes
In this section we prove analogous theorems for permanental processes. We begin with the following known representation of permanental processes, which can be found in Macchi [22].
Proposition 35. Let $F$ be a complex Gaussian process on $\Lambda$. Given $F$, let $\mathcal{X}$ be a Poisson process in $\Lambda$ with intensity $|F|^2$. Then $\mathcal{X}$ is a permanental process with kernel $K(x,y) = \mathbf{E}\big[F(x)\overline{F(y)}\big]$.
Remark 36. Since any non-negative definite Hermitian kernel is the covariance kernel of a complex Gaussian process, it follows that all permanental processes are of the above form.
Corollary 37. If $K$ determines a self-adjoint non-negative definite locally trace-class integral operator $\mathcal{K}$, then there exists a permanental process with kernel $K$.

Now we prove Theorem 10 using the representation in Proposition 35. We need the following simple fact.

Fact 38. Let $\mathcal{Y}$ be a Poisson process on $\Lambda$ with intensity measure $\nu$. Assume that $\nu(\Lambda) < \infty$ and $\nu$ is absolutely continuous with respect to $\mu$. Let $\vec{Y}$ be the random vector of length $\mathcal{Y}(\Lambda)$ obtained from a uniform random ordering of the points of $\mathcal{Y}$. For $k \ge 1$, the law of $\vec{Y}$ on the event $\{\mathcal{Y}(\Lambda) = k\}$ is a subprobability measure on $\Lambda^k$ with density
$$\frac{e^{-\nu(\Lambda)}}{k!}\prod_{i=1}^k \frac{d\nu}{d\mu}(y_i)$$
with respect to $\mu^k$; its total mass is $\mathbf{P}[\mathcal{Y}(\Lambda) = k]$.

Proof of Theorem 10. We use the construction in Proposition 35 with $F(z) = \sum_{k=1}^n \sqrt{\lambda_k}\,a_k\varphi_k(z)$, where the $a_k$ are independent standard complex Gaussian random variables. Let $\vec{X}$ be the random vector obtained from a uniform random ordering of the points of $\mathcal{X}$. If we first condition on $F$, then by Fact 38 the joint density of the random vector $\vec{X}$ on the event $\{\mathcal{X}(\Lambda) = k\}$ is given by
$$\frac{e^{-\int_\Lambda |F|^2 d\mu}}{k!}\prod_{i=1}^k |F(z_i)|^2,$$
which is a subprobability measure with total weight $\mathbf{P}\big[\mathcal{X}(\Lambda) = k \,\big|\, F\big]$. Integrating over the distribution of $F$, we get that on the event $\{\mathcal{X}(\Lambda) = k\}$ the density of $\vec{X}$ is
$$\frac{1}{k!}\,\mathbf{E}\Big[e^{-\int_\Lambda |F|^2 d\mu}\prod_{i=1}^k |F(z_i)|^2\Big], \tag{23}$$
which is also a subprobability measure, with total weight $\mathbf{P}[\mathcal{X}(\Lambda) = k]$. We now expand the product inside the expectation (23) as a sum indexed by ordered set partitions $(S_1,\dots,S_n)$ and $(T_1,\dots,T_n)$ of $\{1,2,\dots,k\}$. The set partitions corresponding to a summand $q$ are constructed by letting $S_\ell$ be the set of indices $i$ for which $q$ contains the term $\sqrt{\lambda_\ell}\,a_\ell\varphi_\ell(z_i)$, and $T_\ell$ the set of indices $i$ for which $q$ contains the term $\overline{\sqrt{\lambda_\ell}\,a_\ell\varphi_\ell(z_i)}$. The summand corresponding to the partitions $(S_\ell), (T_\ell)$ is thus
$$\prod_{\ell=1}^n \lambda_\ell^{(|S_\ell|+|T_\ell|)/2}\,a_\ell^{|S_\ell|}\,\bar{a}_\ell^{|T_\ell|}\prod_{i \in S_\ell}\varphi_\ell(z_i)\prod_{i \in T_\ell}\overline{\varphi_\ell(z_i)},$$
whose expectation clearly vanishes unless $|S_\ell| = |T_\ell|$ for every $\ell$. Also note that, for a standard complex normal random variable $a$,
$$\mathbf{E}\big[e^{-\lambda|a|^2}|a|^{2m}\big] = \frac{m!}{(1+\lambda)^{m+1}}.$$
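The geometric count distribution can be seen directly from Proposition 35 in the finite case: conditionally on $F$, $\mathcal{X}(\Lambda)$ is Poisson with mean $\int_\Lambda |F|^2\,d\mu = \sum_k \lambda_k |a_k|^2$, and a Poisson variable whose mean is $\lambda_k$ times an Exp$(1)$ variable mixes to a geometric$\big(\frac{\lambda_k}{\lambda_k+1}\big)$ variable. The simulation below is our illustration (the eigenvalues, sample size and tolerances are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.array([0.5, 1.0, 2.0])       # eigenvalues of a rank-3 kernel
trials = 200000

# N | F ~ Poisson(sum_k lam_k |a_k|^2), with |a_k|^2 ~ Exp(1).
absa2 = rng.exponential(size=(trials, lam.size))
N = rng.poisson((lam * absa2).sum(axis=1))

# Each coordinate mixes to a geometric(lam/(lam+1)) variable, so
# E[N] = sum lam and Var[N] = sum lam(1 + lam).
assert abs(N.mean() - lam.sum()) < 0.05
assert abs(N.var() - (lam * (1 + lam)).sum()) < 0.3
```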
By the orthogonality of the $\varphi_j$'s, this term vanishes upon integration unless $\pi^{-1}(I_j) = \sigma^{-1}(I_j)$ for every $1 \le j \le n$. For a given $\pi$, there are $\prod_{j=1}^n m_j!$ choices of $\sigma$ that satisfy this. For each such $\sigma$, we get 1 upon integration over the $z_i$'s. Summing over all $k!$ choices for $\pi$, we get
$$\int_{\Lambda^k} j_k\,d\mu^k = \mathbf{P}\big[\mathcal{X}(\Lambda) = k\big] = \sum_{(m_1,\dots,m_n):\,\sum_i m_i = k}\;\prod_{i=1}^n \frac{\lambda_i^{m_i}}{(1+\lambda_i)^{m_i+1}}, \tag{25}$$
which proves the claim about the number of points in $\Lambda$, since $\lambda_i^{m_i}/(1+\lambda_i)^{m_i+1} = \mathbf{P}[\gamma_i = m_i]$ for a geometric$\big(\frac{\lambda_i}{\lambda_i+1}\big)$ variable $\gamma_i$. Thus by (25), $\mathcal{X}$ is a mixture of the processes $Z_{\vec{m}}$, with weights given by $\prod_{i=1}^n \mathbf{P}[\gamma_i = m_i]$, where $\vec{m} = (m_1,\dots,m_n)$ with the $m_i$ non-negative integers. This is what we wanted to prove.

Now we prove Theorem 12. As before, we remark that it is applicable to the restriction of $\mathcal{X}$ to $D$ for any Borel set $D \subset \Lambda$.