Exchangeable pairs and Poisson approximation

This is a survey paper on Poisson approximation using Stein's method of exchangeable pairs. We illustrate using Poisson-binomial trials and many variations on three classical problems of combinatorial probability: the matching problem, the coupon collector's problem, and the birthday problem. While many details are new, the results are closely related to a body of work developed by Andrew Barbour, Louis Chen, Richard Arratia, Lou Gordon, Larry Goldstein, and their collaborators. Some comparison with these other approaches is offered.


Introduction
Charles Stein has introduced a general method for proving limit theorems with explicit error terms, the essential idea of which is the notion of a characterizing operator. Given a probability measure $P_o$ on a space $\Omega$ with expectation $E_o$, a characterizing operator $T_o$ is an operator on a suitable function space on $\Omega$ with the properties that:
1. $E_o T_o \equiv 0$.
2. If $E$ is expectation for a different probability on $\Omega$ and $E T_o \equiv 0$, then $E = E_o$.
The idea is then to prove that $E \doteq E_o$ by showing that $E T_o \doteq 0$. To do this, Stein has introduced a method which he calls the method of exchangeable pairs.
In this survey, we specialize his general approach to Poisson approximation. Very roughly, the idea is to show that a random variable $W$ has an approximate Poisson distribution by studying how a small stochastic change affects the law of $W$. An appropriate such change is often easily found by constructing a reversible Markov chain resulting in an exchangeable pair $(W, W')$. Stein's general approach is developed for the Poisson setting in section 2. This development is somewhat technical; as motivation for the machinery, we offer the following rough overview. The aim is to prove that $W = \sum_i X_i$ has an approximate Poisson distribution, where the $X_i$ are (perhaps dependent) indicators. First, a probabilistic construction is used to create a second random variable $W'$ such that $(W, W')$ is exchangeable. This construction is often one step in a reversible Markov chain with the law of $W$ as stationary distribution. Then the machinery from section 2 bounds the error of the Poisson approximation in terms of the conditional probabilities that $W' = W \pm 1$, with $c$ a parameter chosen to minimize the error terms. Thus the error will be small provided
$$P(W' = W + 1 \mid \{X_i\}) \approx \frac{\lambda}{c} \quad \text{and} \quad P(W' = W - 1 \mid \{X_i\}) \approx \frac{W}{c}.$$
In applications, $W = \sum_i X_i$ with the $X_i$ indicators of rare events. The $X_i$ are thus mostly zero and $W$ of them are one. The example to keep in mind is when a point of the sequence is chosen at random, and that $X_i$ is changed to its opposite, giving $W'$. Heuristically, since most of the $X_i$ are 0, there are constants $a$ and $b$ such that
$$P(W' = W + 1 \mid \{X_i\}) \approx a \quad \text{and} \quad P(W' = W - 1 \mid \{X_i\}) \approx bW.$$
By the symmetry of the exchangeable pair, $P(W' = W + 1) = P(W' = W - 1)$, so taking expectations gives $a \approx b\lambda$. Thus choosing $c = \frac{\lambda}{a}$ gives $b = \frac{1}{c}$, and makes the error terms small. In many cases, just changing one $X_i$ to its opposite doesn't quite work to produce an exchangeable pair, but most of our exchangeable pairs are constructed in ways which are rather similar to this. The reader will see rigorous versions of these heuristics starting in section 3.
The contents of the rest of the paper are as follows. Section 2 develops a general bound. In section 3, we give a first example of using Stein's method: Poisson-binomial trials. The method is shown to give close to optimal results with explicit error bounds. In sections 4-6, the method is applied to three classical probability problems: the matching problem, the birthday problem, and the coupon-collector's problem.
Let $W$ be an $\mathbb{N}$-valued random variable on a probability space $(\Omega, \mathcal{A}, P)$ with expectation $E$, and suppose that $E(W) = \lambda < \infty$. Let $X$ be the bounded measurable functions on $\Omega$. Let $X_o$ be the bounded real-valued functions on $\mathbb{N} = \{0, 1, 2, \ldots\}$, and let $E_o : X_o \to \mathbb{R}$ be expectation with respect to $\mathrm{Poi}_\lambda$ measure on $\mathbb{N}$. The random variable $W$ allows us to define a map $\beta : X_o \to X$ by $\beta f(\omega) = f(W(\omega))$.
Heuristically, $W$ has an approximate $\mathrm{Poi}_\lambda$ distribution if $E_o(f) \doteq E(f(W))$; i.e., $E_o \doteq E\beta$. This is equivalent to saying that the following diagram approximately commutes: Stein constructs a symmetric probability $Q$ on $\Omega \times \Omega$ with margins $P$ (i.e., $Q(A, B) = Q(B, A)$ and $Q(A, \Omega) = P(A)$), which gives an exchangeable pair. It is used to define the following enlarged diagram: Stein's lemma, developed and proved in lemma (2) below, shows in a precise sense that if the left square approximately commutes, then the triangle approximately commutes. This leads to explicit bounds on Poisson approximation.
We begin by constructing the top row of the diagram. We define the characterizing or Stein operator $T_o$ for the $\mathrm{Poi}_\lambda$ distribution by
$$T_o f(j) = \lambda f(j + 1) - j f(j),$$
and we let $F_o \subseteq X_o$ be the functions $f : \mathbb{N} \to \mathbb{R}$ such that $T_o f$ is a bounded function. Note that $F_o$ contains all of the functions $f$ on $\mathbb{N}$ such that $f(n) = 0$ eventually, thus it is rich enough for our purposes.
Remark: $T_o$ is called characterizing because it also has the property that if $p$ is a probability on $\mathbb{N}$ with the property that $\sum_{j=0}^{\infty} (\lambda f(j + 1) - j f(j))\, p(j) = 0$ for every bounded function $f : \mathbb{N} \to \mathbb{R}$, then $p$ is the Poisson distribution with parameter $\lambda$. To see this, let $f = \delta_k$. This yields the equation $\lambda p(k - 1) = k p(k)$, which leads to a recursion relation that describes $\mathrm{Poi}_\lambda$.
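The characterizing property in the remark can be checked numerically. The following is a minimal sketch, not part of the original development: the test function `f`, the truncation point 200, and the Binomial(10, 0.3) comparison law are illustrative assumptions. It evaluates the sum of $(\lambda f(j+1) - j f(j))\,p(j)$ for a Poisson law and for a non-Poisson law with the same mean.

```python
import math

def poisson_pmf(lam, n_terms):
    """Return [P(Z = j) for j < n_terms] for Z ~ Poisson(lam), computed iteratively."""
    pmf = [math.exp(-lam)]
    for j in range(1, n_terms):
        pmf.append(pmf[-1] * lam / j)
    return pmf

def stein_expectation(pmf, lam, f):
    """Sum of (lam * f(j + 1) - j * f(j)) * p(j) over the support of pmf."""
    return sum((lam * f(j + 1) - j * f(j)) * p for j, p in enumerate(pmf))

lam = 3.0
f = lambda j: 1.0 / (1.0 + j)   # an arbitrary bounded test function (assumption)

# Poisson(3), truncated far into the tail: the sum vanishes up to rounding.
poisson_value = stein_expectation(poisson_pmf(lam, 200), lam, f)

# Binomial(10, 0.3) also has mean 3, but the sum does not vanish.
binom_pmf = [math.comb(10, j) * 0.3**j * 0.7**(10 - j) for j in range(11)]
binomial_value = stein_expectation(binom_pmf, lam, f)

print(poisson_value, binomial_value)
```

The size of the second value is a rough indication of how far the binomial law is from $\mathrm{Poi}_\lambda$, which is the quantity the method sets out to bound.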
Next we will define a map U o , which is almost an inverse to T o . Define: It is easy to check that Thus U o is inverse to T o on ker(E o ). The following lemma, proved by Barbour and Eagleson in [7], gives bounds on expressions involving U o .
A reviewer has pointed to a sharper bound in [38]. We have not seen this and will use lemma (1) in what follows. This lemma and equation (6) show that U o f ∈ F o for every f ∈ X o . To complete the top row, let i denote the map R → X o which associates to each constant c the map on N which takes the value c for each n.
To construct the bottom row of diagram (3), define Using a probability Q on Ω × Ω as discussed above, define the operator T by: Observe that T f is a bounded function on Ω for f ∈ F. Further, where the first equality is by the symmetry of Q and the second is by the anti-symmetry of f . Thus ET = 0. Finally, define α : F o → F to be any linear map, for example, αf (ω, ω ′ ) = f (W (ω))− f (W (ω ′ )). Stein's lemma is true regardless of what α is, so we choose α to work well with the problem at hand. In applications, α is often a localized version of the example given above; see proposition (3) below for an example.
We can now state and prove: Lemma 2 (Stein). Suppose that in the following diagram of linear spaces and linear maps, Then Proof. We have:  (2) and the bounds to follow, it is only required that W and W ′ are exchangeable. While this is most easily achieved by making Q symmetric, examples of Fulman [28] and Rinott-Rotar [45] show that other constructions are possible.

Remarks
The following proposition gives us a more workable form for the error term.
Proposition 3. Let W be a random variable on (Ω, A, P) and Q a symmetric probability on Ω × Ω with margins P. Let W = W (ω) and W ′ = W (ω ′ ). Then Remarks: For this choice of α, the error term from Stein's lemma will be small if the exchangeable pair can be chosen so that for some c. One then uses this c in defining α to obtain cancellation in the error term, and then uses the bounds in lemma (1) to bound the error term. The examples which follow show that (11) and (12) often hold for natural choices of Q, for some c. We observe that if (11) and (12) hold, the ratio method proposed by Stein [52] is also a way to approach Poisson approximation.

Poisson-Binomial Trials
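Let $X_1, X_2, \ldots, X_n$ be independent $\{0, 1\}$-valued random variables with $P(X_i = 1) = p_i$ and $P(X_i = 0) = 1 - p_i$, and let $\lambda = \sum_{i=1}^n p_i$. To put this example into the framework of section (2), let $\Omega = \{0, 1\}^n$, with $P(\omega_1, \ldots, \omega_n) = \prod_{i=1}^n p_i^{\omega_i} (1 - p_i)^{1 - \omega_i}$, and let $X$ be the bounded functions on $\Omega$. Let $W(\omega) = \sum_{i=1}^n \omega_i$. One way to build a symmetric probability $Q$ on $\Omega \times \Omega$ is the following probabilistic construction: choose an index $I$ uniformly in $\{1, \ldots, n\}$, and let $\epsilon_I = 1$ with probability $p_I$ and $\epsilon_I = 0$ with probability $1 - p_I$. Given $\omega \in \Omega$, set $\omega'_I = \epsilon_I$ and $\omega'_i = \omega_i$ for $i \ne I$. This constructs a probability $Q$ on $\Omega \times \Omega$: assign to the pair $(\omega, \omega')$ the probability of choosing $\omega$ from $\Omega$ according to $P$ and then going from $\omega$ to $\omega'$ by the process described. It is clear from the construction that $Q$ is symmetric and has margins $P$. From these definitions,
$$Q(W' = W - 1 \mid \omega) = \frac{1}{n} \sum_{i=1}^n \omega_i (1 - p_i) = \frac{W - \sum_i p_i \omega_i}{n}, \qquad Q(W' = W + 1 \mid \omega) = \frac{1}{n} \sum_{i=1}^n (1 - \omega_i) p_i = \frac{\lambda - \sum_i p_i \omega_i}{n}.$$
Combining these calculations with (10) and choosing $c = n$ gives As we consider $f = U_o g$ for functions $g$ with $0 \le g \le 1$, lemma (1) yields the following:
Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent $\{0, 1\}$-valued random variables with $P(X_i = 1) = p_i$, and let $\lambda = \sum_{i=1}^n p_i$ and $W = \sum_{i=1}^n X_i$. Then
Remark: The factor of 2 in the denominator arises because
$$\|\mu - \nu\|_{TV} = \sup_A |\mu(A) - \nu(A)| = \frac{1}{2} \sup_f \left| \int f \, d\mu - \int f \, d\nu \right|,$$
where the first supremum is taken over all measurable sets $A$ and the second is over all measurable functions $f$ such that $\|f\|_\infty \le 1$. This bound was derived by a different argument by Barbour and Hall. A comparison with other available bounds and approximations for the i.i.d. case is in Kennedy and Quine [35], where there are also extensive references given. The bound that we give is sharp for small $\lambda$ and can be improved for large $\lambda$.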
Here, the factor of λ in the denominator saves the day.
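The construction of $Q$ for Poisson-binomial trials can be verified by brute force on a small example. The sketch below is illustrative only: the probabilities `p` are hypothetical, and the closed forms checked in the last lines are what one obtains by summing over the resampled index $I$ in the construction described above.

```python
from itertools import product
import math

p = [0.1, 0.25, 0.4]          # hypothetical success probabilities p_i
n = len(p)
lam = sum(p)
omega_space = list(product([0, 1], repeat=n))

def prob(omega):
    """Product measure P on {0,1}^n."""
    return math.prod(pi if w else 1 - pi for pi, w in zip(p, omega))

def step(omega, omega_new):
    """Chance of moving omega -> omega_new: pick I uniformly, resample coordinate I."""
    total = 0.0
    for i in range(n):
        if all(omega[j] == omega_new[j] for j in range(n) if j != i):
            total += (p[i] if omega_new[i] else 1 - p[i]) / n
    return total

def Q(omega, omega_new):
    """Joint law of the pair (omega, omega')."""
    return prob(omega) * step(omega, omega_new)

# Q(omega, omega') = Q(omega', omega) for every pair, so (W, W') is exchangeable.
symmetric = all(math.isclose(Q(a, b), Q(b, a), abs_tol=1e-15)
                for a in omega_space for b in omega_space)

def up_down(omega):
    """Return (Q(W' = W + 1 | omega), Q(W' = W - 1 | omega))."""
    W = sum(omega)
    up = sum(step(omega, o) for o in omega_space if sum(o) == W + 1)
    down = sum(step(omega, o) for o in omega_space if sum(o) == W - 1)
    return up, down

for omega in omega_space:
    up, down = up_down(omega)
    s = sum(pi * w for pi, w in zip(p, omega))
    # Compare with (lam - sum p_i w_i) / n and (W - sum p_i w_i) / n.
    print(omega, up - (lam - s) / n, down - (sum(omega) - s) / n)
```

With $c = n$, both $\lambda - cQ(W' = W + 1 \mid \omega)$ and $W - cQ(W' = W - 1 \mid \omega)$ reduce to $\sum_i p_i \omega_i$, which is small when the $p_i$ are small.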
Remarks: Under the conditions of theorem (4), there are methods for computing the exact distribution of W and a host of approximations that differ from the Poisson. See Stein [49] or Percus and Percus [42] for further discussion. An extensive collection of random variables which can be represented as sums of independent binary indicators is in Pitman [43].
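For small $n$ the exact distribution of $W$ is easy to compute, so the quality of the Poisson approximation can be inspected directly. The sketch below is an illustration under stated assumptions: the $p_i$ are hypothetical, total variation is taken as $\sup_A |\mu(A) - \nu(A)|$, and the comparison quantity $(1 - e^{-\lambda})\lambda^{-1}\sum_i p_i^2$ is the standard form of the Barbour and Hall bound, which need not match the exact constant stated in theorem (4).

```python
import math

def poisson_binomial_pmf(p):
    """Exact law of W = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for pi in p:
        new = [0.0] * (len(pmf) + 1)
        for w, mass in enumerate(pmf):
            new[w] += mass * (1 - pi)
            new[w + 1] += mass * pi
        pmf = new
    return pmf

def tv_to_poisson(pmf, lam):
    """sup_A |P(W in A) - Poi_lam(A)|: half the l1 distance, Poisson tail included."""
    total, pois, pois_mass = 0.0, math.exp(-lam), 0.0
    for j, mass in enumerate(pmf):
        total += abs(mass - pois)      # pois holds the Poisson mass at j
        pois_mass += pois
        pois *= lam / (j + 1)          # advance to the Poisson mass at j + 1
    return 0.5 * (total + (1.0 - pois_mass))

p = [0.02 * (i + 1) for i in range(10)]   # hypothetical p_i = 0.02, ..., 0.20
lam = sum(p)
tv = tv_to_poisson(poisson_binomial_pmf(p), lam)
barbour_hall = (1 - math.exp(-lam)) / lam * sum(pi * pi for pi in p)
print(tv, barbour_hall)
```

Rerunning with larger $p_i$ shows how both quantities grow with $\sum_i p_i^2$.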

The Matching Problem
Because of its appearance in Montmort [18], the matching problem is one of the oldest problems in probability. It asks for the distribution of the number of fixed points in a random permutation. Takács [53] gives an extensive history. To put this problem into our set up, let $S_n$ be all $n!$ permutations of $n$ objects with $P(\sigma) = \frac{1}{n!}$, and let $W(\sigma) = |\{i : \sigma(i) = i\}| = \sum_{i=1}^n X_i(\sigma)$, where $X_i(\sigma)$ indicates whether $i$ is a fixed point of $\sigma$. This approach of writing a random variable as a sum of $\{0, 1\}$-valued random variables, called the method of indicators, is one of our main tools for computing expectations of $\mathbb{N}$-valued random variables. Using this representation of $W$, it is easy to see that $EW = 1$. Build an exchangeable pair $(\sigma, \sigma')$ by choosing $\sigma \in S_n$ according to $P$ and then choosing $\sigma'$ given $\sigma$ by following $\sigma$ with a random transposition $\tau$ chosen uniformly among the non-trivial transpositions in $S_n$. Then
$$P(W' = W - 1 \mid \sigma) = \frac{2W(\sigma)(n - W(\sigma))}{n(n - 1)}, \qquad P(W' = W + 1 \mid \sigma) = \frac{2(n - W(\sigma) - 2a_2(\sigma))}{n(n - 1)},$$
where $a_2(\sigma)$ is the number of transpositions which occur when $\sigma$ is written as a product of disjoint cycles. The first expression is calculated by observing that if $W' = W - 1$, then $\tau$ must choose one of the fixed points and switch it with something which is not a fixed point. The second calculation is done by similar considerations. Define $\alpha$ to be the localized operator as in proposition (3) and choose $c = \frac{n - 1}{2}$.
Then, for $g : \mathbb{N} \to [0, 1]$, Using the method of indicators, it is easy to check that $E(W^2) = 2$ and $E(2a_2) = 1$. Further, for $\|g\|_\infty \le 1$, $\|U_o g\|_\infty \le 1$ by lemma (1). Putting all of this together proves
Theorem 5. Let $W$ be the number of fixed points in a randomly chosen permutation on $n$ letters. Then
Remarks:
1. In this example, the bound is not sharp. As is well known, $\|\mathcal{L}(W) - \mathrm{Poi}_1\|_{TV} \le \frac{2^n}{n!}$. It is interesting to speculate on just where sharp control is lost. We have proceeded by equality through (13).
2. The super exponential bound of remark (1) is an algebraic accident. This can be seen by considering the number of fixed points in only part of the permutation, that is, considering $W_k(\sigma) = |\{1 \le i \le k : \sigma(i) = i\}|$ for some $k < n$. Essentially the same argument shows that $W_k$ is approximately $\mathrm{Poi}_{k/n}$ distributed. In section 8, we show that the fixed point process (scaled to [0,1]) converges to a Poisson process of rate 1. Here, the error may be seen to be of order $\frac{1}{n}$.
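The counting behind the exchangeable pair for the matching problem can be checked exhaustively for small $n$. The sketch below is an illustration, with permutations stored as tuples on $\{0, \ldots, n-1\}$ (an implementation choice). It confirms that, among the $\binom{n}{2}$ transpositions, exactly $W(n - W)$ lower the number of fixed points by one and exactly $n - W - 2a_2$ raise it by one; dividing by $\binom{n}{2}$ gives conditional probabilities $\frac{2W(n-W)}{n(n-1)}$ and $\frac{2(n - W - 2a_2)}{n(n-1)}$.

```python
from itertools import permutations, combinations

def fixed_points(sigma):
    """W(sigma): number of i with sigma(i) = i."""
    return sum(1 for i, s in enumerate(sigma) if s == i)

def two_cycles(sigma):
    """a_2(sigma): number of transpositions in the disjoint cycle decomposition."""
    return sum(1 for i, s in enumerate(sigma) if s != i and sigma[s] == i) // 2

def follow_with_transposition(sigma, a, b):
    """Return tau o sigma, where tau swaps the values a and b."""
    swap = {a: b, b: a}
    return tuple(swap.get(s, s) for s in sigma)

def check(n):
    """Verify the up/down counts for every permutation of n letters."""
    pairs = list(combinations(range(n), 2))
    for sigma in permutations(range(n)):
        W, a2 = fixed_points(sigma), two_cycles(sigma)
        changes = [fixed_points(follow_with_transposition(sigma, a, b)) - W
                   for a, b in pairs]
        if changes.count(-1) != W * (n - W):
            return False
        if changes.count(+1) != n - W - 2 * a2:
            return False
    return True

print(all(check(n) for n in range(2, 6)))
```

With $c = \frac{n-1}{2}$, the error terms become $\lambda - cP(W' = W + 1 \mid \sigma) = \frac{W + 2a_2}{n}$ and $W - cP(W' = W - 1 \mid \sigma) = \frac{W^2}{n}$, using $\lambda = 1$.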

Generalized matching
Consider now the more general problem of fixed points of permutations of the set $A = \{1, 1, \ldots, 1, 2, \ldots, 2, \ldots, k, \ldots, k\}$ where the number $i$ appears $l_i$ times. For example, if two ordinary decks of cards are shuffled, placed on a table and turned up simultaneously, one card at a time, a match is counted if two cards of the same number appear at the same time, without regard to suits. This is the matching problem with $k = 13$ and $l_i = 4$ for each $i$. Let $|A| = n$ and let $S_n$ be the set of all permutations of the elements of $A$ with $P(\sigma) = \frac{1}{n!}$ for each $\sigma$. Let $W(\sigma)$ be the number of fixed points of $\sigma$ and Build an exchangeable pair in this situation in the same way as in the previous situation: follow a permutation $\sigma$ by a random transposition of the $n$ set elements. Then To calculate the first expression, make use of the fact that for fixed $i$, $\sum_{j \ne i} W_{ji} = l_i - W_i$. The first line follows as one determines the probability in question by, for each fixed $i$ and each $j \ne i$, counting the number of ways to choose a symbol $i$ with $\sigma(j) = i$ and a symbol $k \ne i$, $k \ne j$ with $\sigma(i) = k$. These will then be switched by a transposition, and the number of fixed points will have been increased by 1. The second calculation is done similarly.
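The two-deck example can be simulated directly. The sketch below is a rough illustration, not part of the original argument: the seed and number of trials are arbitrary, and the mean $\sum_i l_i^2 / n$ is what the method of indicators gives, since a position holding symbol $i$ is matched with probability $l_i / n$.

```python
import random

def mean_matches(counts, trials, seed=0):
    """Monte Carlo estimate of E(W) for the multiset with counts[i] copies of symbol i."""
    rng = random.Random(seed)
    deck = [symbol for symbol, l in enumerate(counts) for _ in range(l)]
    shuffled = deck[:]
    total = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # A match occurs where the shuffled deck shows the same number as the fixed one.
        total += sum(a == b for a, b in zip(deck, shuffled))
    return total / trials

counts = [4] * 13                            # k = 13 values, l_i = 4 of each
n = sum(counts)
expected = sum(l * l for l in counts) / n    # sum of l_i^2 / n
estimate = mean_matches(counts, trials=20000)
print(expected, estimate)
```

For the card example the expected number of matches is 4, so the natural Poisson approximation has parameter 4 rather than 1.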
In order to bound the error term, the following moments are needed: .

S. Chatterjee et al./Exchangeable pairs and Poisson approximation
Now choose (in analogy with the previous case) $c = \frac{n - 1}{2}$ and make use of proposition (1) to estimate: For the other half of the error term, Putting these two estimates together with proposition (3) proves

Remarks:
1. In particular, if l i = l for each i, then the theorem gives:

2. A celebrated paper of Kaplansky [33] first showed that $W$ had an approximate Poisson distribution under suitable restrictions on the $l_i$.
3. Stein's method may be used to prove Poisson approximation in the matching problem for some non-uniform distributions $\mu$ on $S_n$. In outline, the technique is similar: construct $Q$ on $S_n \times S_n$ by choosing $\sigma$ according to $\mu$ and then choosing $\sigma'$ by making a random transposition and using the Metropolis algorithm to ensure that $\sigma'$ is also $\mu$-distributed.
4. Essentially the same techniques can be used to study approximate matches.

Generalizing the problem in remark 4, one may study $\sum_{i=1}^n A_{i\pi(i)}$, where $A = (A_{ij})$ is an $n \times n$ matrix with entries in $\{0, 1\}$. Similarly, one may study the number of permutations which violate a given set of restrictions at most $k$ times with this method, if the restrictions are given by a $\{0, 1\}$-matrix (that is, $\pi(i) = j$ is allowed if $A_{ij} = 1$). Classically, these problems are solved using rook theory or the extensive development connected to the permanent (see [21] for extensive references).

The Birthday Problem
For the classical birthday problem, k balls are dropped independently into n boxes, such that any ball is equally likely to fall into any of the n boxes. How many balls should there be to have some specified chance of having a box with more than one ball in it? Let us set up the problem in the framework of Stein's method. Let [n] = {1, . . . , n}, and let Ω = [n] k . An element ω = (b 1 , . . . , b k ) ∈ Ω will represent an arrangement of the k balls in the n boxes: b i is the number of the box into which we put ball i. Thus the b i are i.i.d. and uniformly chosen from [n]. Denote this probability measure on Ω by P. Let M m be the number of boxes containing exactly m balls, and let M m+ be the number of boxes containing m or more balls. We will show that if k = θ √ n, then W = M 2+ is approximately since any box which contains more than two balls must contain one such triplet as in the sum. Thus, Build an exchangeable pair (ω, ω ′ ) as follows: choose ω ∈ Ω according to P and then choose an index I ∈ [k] uniformly. Let b * I be i.i.d. with the {b i }, and let ω ′ = (b 1 , . . . , b * I , . . . , b k ). Thus the exchangeable pair is formed by first distributing the k balls independently and uniformly in the n boxes, and then choosing a ball at random and choosing a new box for it at random. As before, let W = W (ω), W ′ = W (ω ′ ), and compute: The first equality holds as in order to make W go down by 1, we must choose a ball from a box that contains exactly two balls and put it into any box except either the one it came from or any of the boxes which started with exactly one ball. The second computation is similar.
By proposition (3) and choosing $c = \frac{k}{2}$, the error term for Poisson approximation is given by: Observe that $\theta = k n^{-1/2}$ implies that Finally, $\sqrt{n}$, as shown before. Applying lemma (1) proves
Theorem 7. If we drop $k = \theta\sqrt{n}$ balls independently and uniformly into $n$ boxes, and let $W$ be the number of boxes containing at least two balls, then
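One consequence of the Poisson approximation can be checked exactly: the event $W = 0$ (no box holds two or more balls) has a closed-form probability. The sketch below is illustrative; it uses the Poisson mean $\binom{k}{2}/n$, which is asymptotic to $\theta^2/2$ when $k = \theta\sqrt{n}$, and the particular $(n, k)$ pairs are arbitrary choices.

```python
import math

def prob_no_shared_box(k, n):
    """Exact P(W = 0): all k balls land in distinct boxes."""
    prob = 1.0
    for i in range(k):
        prob *= (n - i) / n
    return prob

def poisson_zero(k, n):
    """Poisson approximation to P(W = 0), with mean C(k, 2) / n (assumed parameter)."""
    return math.exp(-k * (k - 1) / (2 * n))

for n, k in [(365, 23), (10_000, 100), (1_000_000, 1000)]:
    print(n, k, prob_no_shared_box(k, n), poisson_zero(k, n))
```

The gap between the two columns shrinks as $n$ grows with $k/\sqrt{n}$ held fixed, in line with an error that vanishes for fixed $\theta$.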

Triple matches
We next consider a variation of the birthday problem: triple matches. As before, let Ω be the space of possible arrangements of k balls in n boxes with probability P given by independent and uniform placement of the balls. This time, the random variable W (ω) will denote the number of triple matches in ω; i.e., and from this representation, Let M i (ω) be the number of boxes containing exactly i balls, thus W = k l=3 l 3 M l . Let X l i be the indicator that box i contains exactly l balls. Then Construct an exchangeable pair in this case exactly as in the previous case: choose a ball at random and then choose a new box at random to put it in. We have: This follows as the only ways to make the number of triples sharing a box go up by exactly one are either to choose a ball from a box with only one ball and move it to any of the boxes with two balls, or to choose a ball from a box with two balls and put it into one of the other boxes with two balls.
Using lemma (1) and proposition (3) together with the triangle inequality, we will need to estimate: for some choice of the parameter c.
Consider (14) first. Define new random variables $\tilde{M}_i = M_i - \mu_i$, where $\mu_i = E M_i$, and write everything in terms of these $\tilde{M}_i$. We have: The deterministic part of this is and if $k = o(n)$, then the top order term of the numerator is $k(k-1)(k-2)n^{2k-3}$. We thus choose the parameter $c$ to be $c = \frac{k}{3}$. Then in expression (14), use the triangle inequality to estimate the deterministic part of the sum separately. It is asymptotic to $\frac{k^4}{3n^3}$, thus if $k = o(n^{3/4})$, this part of (14) goes to 0. Recall that $EW = \frac{k(k-1)(k-2)}{6n^2}$, so to get a nontrivial limiting distribution, the case we are interested in is $k = \theta n^{2/3}$ for some fixed $\theta$, so limiting our considerations in a way that still allows for this case is no loss. We will now estimate each term of the non-deterministic part of (14) separately. Using the method of indicators gives Now to estimate (14), use Cauchy-Schwarz: In particular, if $k = \theta n^{2/3}$, this expression goes to 0 as $n \to \infty$.
The same kind of analysis is used to bound (15). Again, recentering all the random variables about their means and using our choice of c, the expression (15) turns into: The deterministic part approaches 0: In order to estimate the rest, the following moments are needed: Looking back at (16) and using the triangle inequality and Cauchy-Schwarz on most of the terms, the following go to zero: We have already dealt with the deterministic part. Next, consider the term which is the remaining µ 0 M 3 term of (16) and the first summand of Finally, we have to deal with E In particular, if k = θn 2/3 , then W has a non-trivial limiting Poisson distribution.
Remark: It is instructive to compare the bound in theorem 8 to the more general bound for multiple matches in the birthday problem that appears in theorem 6B of the book of Barbour, Holst, and Janson. First, their bound gives an explicit inequality instead of the $O\left(\frac{k^4}{n^3}\right)$ of theorem 8. More importantly, in the critical case where $k = \theta n^{2/3}$, theorem 8 gives a bound of order $n^{-1/3}$ for the error. Theorem 6B gives a bound of order $n^{-2/3}$. This suggests a more careful look at the analysis above and a healthy respect for the coupling approach.

The Coupon-Collector's Problem
In its simplest version, the coupon-collector's problem is as follows: drop k balls independently and uniformly into n boxes. How large should k be so that there is a prescribed chance (e.g. 95%) that every box contains at least one ball? This is also an old problem: Laplace (1780) gave an effective solution. Feller [26] and David-Barton [17] present substantial developments.
Write $W$ for the number of empty boxes, and use $\theta$ to write $k = n \log n + \theta n$. We introduce the new notation $k_i$ for the number of balls in box $i$, and use $N_i$ to denote the number of boxes with $i$ balls. Note that $E(W) = n(1 - n^{-1})^k \sim e^{-\theta}$. Hence we take $\lambda = e^{-\theta}$. Make an exchangeable pair by choosing a ball at random, choosing a box at random, and putting the chosen ball into the chosen box. As before, compute: Taking $c = n$ gives the error for Poisson approximation as By the method of indicators one can compute $E N_1 W = \sum_{i \ne j} P(k_i = 1 \wedge k_j = 0)$, where $k_i$ is the number of balls in box $i$. Hence, We have
$$C = \frac{e^{-\theta}\left(n \log n + \theta n - n \log n + (W + 1)\log n\right)}{k} = \frac{e^{-\theta}\left(\theta n + (W + 1)\log n\right)}{k}$$
and $E(W) \le e^{-\theta}$. Hence
$$E|C| \le \frac{e^{-\theta}\left(\theta n + (e^{-\theta} + 1)\log n\right)}{n \log n + \theta n} \sim \frac{\theta e^{-\theta}}{\log n}.$$
So what remains is to bound $E|D|$. Observe that
$$E|D| \le \frac{n \log n}{k}\, E\left| e^{-\theta} - \frac{N_1}{\log n} \right|.$$
So if $\frac{N_1}{\log n}$ concentrates at its expectation, there is a bound on the error term.
Let p = P(ξ 1 = 1) = k 1 Then $\mathrm{Var}(\xi_1) = np(1 - p) \le np$ and $\mathrm{Cov}(\xi_1, \xi_2) = n(n - 1)(\rho - p^2)$. Now Putting everything together shows
Theorem 9. Drop $k$ balls independently and uniformly into $n$ boxes. If $k = n \log n + \theta n$, and $W$ denotes the number of empty boxes, then
Remark: The analysis above can be sharpened to give an error of order $\frac{\log n}{n}$ in theorem 9. See the remarks to example 2 in section 9.2. It is instructive to compare this bound with the results of theorem 6D in the book by Barbour, Holst, and Janson. In the critical case where $k = n \log n + \theta n$, theorem 6D gives an explicit inequality which leads to an error of order $\frac{\log n}{n}$ as well.
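The choice $\lambda = e^{-\theta}$ in the coupon-collector's problem rests on the asymptotics of $E(W) = n(1 - n^{-1})^k$ with $k = n \log n + \theta n$. The sketch below simply tabulates both sides; the value of $\theta$, the values of $n$, and the rounding of $k$ to an integer are illustrative assumptions.

```python
import math

def expected_empty_boxes(n, k):
    """E(W) = n (1 - 1/n)^k, by the method of indicators."""
    return n * (1 - 1 / n) ** k

theta = 0.5
for n in [100, 1_000, 10_000, 100_000]:
    k = round(n * math.log(n) + theta * n)   # k = n log n + theta n, rounded
    print(n, k, expected_empty_boxes(n, k), math.exp(-theta))
```

The two columns agree to within a relative error of roughly $\frac{\log n}{n}$, the same order as the sharpened bound mentioned in the remark.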

Multivariate Poisson Approximation
This section gives useful bounds on the approximation of a vector of integer-valued random variables by a vector with independent Poisson coordinates. As an example, we treat the joint distribution of the number of fixed points and the number of times $i$ goes to $i + 1$ in a random permutation.
, any choice of the {c k }, and Proof. Without loss of generality, assume that the Z's and the W 's are defined on the same space and are independent.
Lemma (1) shows that f k ∞ ≤ min(1, 1. From the definition of f k given in (19), Now, for A k and B k as defined above, define for some c k , and note that by antisymmetry, E(T k ) = 0. If F denotes the σ-field generated by Z 1 , . . . , Z d , W 1 , . . . , W d , then As E(T k ) = 0, the expression (21) can be inserted into the summand in (20), yielding From (22), the claim (18) follows from lemma (1).
Create an exchangeable pair $(W_1', W_2')$ and $(W_1, W_2)$ by following $\sigma$ with a randomly chosen transposition. The sets $A_k$, $B_k$ are Then The calculations for $P(A_2 \mid \sigma)$ and $P(B_2 \mid \sigma)$ are analogous to those carried out in section 4. Choosing $c_k = \frac{n - 1}{2}$ for $k = 1, 2$ in proposition (10) then yields the estimate where $(Z_1, Z_2)$ are independent Poisson random variables, each with mean 1.

Introduction
We begin with an example: consider, as in section (4), a random permutation $\sigma$ on $n$ letters. Let $W_j = |\{1 \le i \le j : \sigma(i) = i\}|$ be the number of matches up to time $j$. In the same way as in section (4), one can show that $W_j$ has an approximate Poisson distribution with parameter $\frac{j}{n}$. In this section we show that the point process $Y_t = W_{\lfloor tn \rfloor}$ for $0 \le t \le 1$ converges to a Poisson process of rate 1. Similar results can be derived for the birthday problem: the number of birthday matches with $\lfloor t\sqrt{n} \rfloor$ people converges to a Poisson process of rate 1. For the coupon-collector's problem, the number of new coupons collected from time $n \log n$ to time $n \log n + tn$ converges to a Poisson process of rate $e^{-t}$.
Recall that if Y is a complete separable metric space and µ is a measure on Y which is finite on bounded sets, a Poisson process of rate µ is a random discrete measure on Y with the following properties: • The number of points N (A) in a bounded set A has a Poisson distribution with parameter µ(A).
• For disjoint bounded sets $A_1, \ldots, A_k$, the counts $N(A_1), \ldots, N(A_k)$ are independent.
Useful introductions to Poisson processes are found in Kingman [36] and Resnick [44]. The first use of Stein's method in Poisson process approximation was in 1988 in the paper [5] by Barbour, and the last chapter of Barbour, Holst, and Janson [8] has a wonderful development of Stein's method for Poisson process approximation, using dependency graphs and the coupling approach. We show here how the method of exchangeable pairs can be used. Again, there are other approaches to proving Poisson process approximation. We find the papers [39] and [40] by Kurtz particularly helpful. For a connection with Palm theory, see [15].
Returning to our example, once we have the Poisson process approximation in place, one has various 'off the shelf' results, e.g.
• If there are k matches, their location is distributed as k random points in [0, 1]. • One has the limiting distribution of the minimum (or maximum) distance between two matching times (see Gilbert and Pollak [29], Feller [26]).
Furthermore, one can harness some standard machinery for transformations of Poisson processes. Perhaps the three most important such results are the following: Let a Poisson process of rate µ be given on Y.
• (Mapping) If $T : Y \to X$ is a proper map then the image process is Poisson on $X$ with rate $\mu T^{-1}$.
• (Spraying) Let $K(y, dx)$ be a stochastic kernel on $Y \times F_X$. Replace each point $y$ in the $Y$ process by a point chosen from $K(y, dx)$. The new process is Poisson with rate $\int K(y, dx)\mu(dy)$.
• (Thinning) Retain each point $y$ of the $Y$ process independently with probability $P(y)$, and delete it otherwise. The new process is Poisson with rate $\int_A P(y)\mu(dy)$.
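The thinning property lends itself to a quick simulation. The sketch below is a rough illustration under stated assumptions: the base process is homogeneous on $[0, 1]$ with a hypothetical rate 10, each point $y$ is kept with probability $P(y) = y$ (so $P$ is read as a retention probability), and the seed and number of trials are arbitrary. The number of surviving points should then be Poisson with mean $\int_0^1 10\, y \, dy = 5$, so its sample mean and variance should both be near 5.

```python
import random

def poisson_process(rate, rng):
    """Points of a homogeneous Poisson process of the given rate on [0, 1]."""
    points, t = [], rng.expovariate(rate)
    while t <= 1.0:
        points.append(t)
        t += rng.expovariate(rate)   # exponential gaps between successive points
    return points

def thin(points, keep_prob, rng):
    """Keep each point y independently with probability keep_prob(y)."""
    return [y for y in points if rng.random() < keep_prob(y)]

rng = random.Random(1)
rate, trials = 10.0, 4000
counts = [len(thin(poisson_process(rate, rng), lambda y: y, rng))
          for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
print(mean, var)
```

Matching mean and variance is only a necessary check for a Poisson count; a fuller test would compare the whole empirical distribution with the $\mathrm{Poi}_5$ probabilities.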
With these examples as our motivation, we turn to a rigorous account. Let I be a finite index set, and let {X i } i∈I be binary indicators with some joint distribution and with P(X i = 1) = p i . Let {W i } i∈I be independent Poisson random variables with parameters p i . We give a development which allows us to bound the distance between the processes. This depends on the construction of an exchangeable pair (X, X ′ ). In the matching problem, X is constructed from a random permutation σ and X ′ is constructed by making a random transposition in σ.

The basic set-up
Let $I$ be a finite set, and let $\xi = \sum_{i \in I} x_i \delta_i$ be a configuration on $I$. Thus $\xi$ is a measure on $I$ putting mass $x_i$ at $i$, with $x_i \in \mathbb{N}$. In fact, in our applications we will typically consider the case $x_i \in \{0, 1\}$. Let $X_o$ be the bounded measurable functions on configurations. Thus if $f : \mathbb{N} \to \mathbb{R}$ is given, then $\xi \mapsto \sum_i f(i) x_i = \int f \, d\xi$ is in $X_o$, as are $\max\{i : x_i > 0\}$ and $\iint f(i, j) \, d\xi \, d\xi$ for $f : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$. Let $E_o$ be the Poisson expectation operator on $X_o$. In other words, consider a probability space $(\Omega, \mathcal{A}, P)$ and an induced probability on the space of configurations on $I$ given by writing $\xi = \sum_i x_i(\omega)\delta_i$ where the $x_i \sim \mathrm{Poi}(p_i)$. Let $X$ be the space of bounded measurable functions on $\Omega$ with corresponding expectation operator $E$. As in section 2, there will be a map $\beta : X_o \to X$ assigning a random variable to a function on configurations: In the matching example, the probability space is $S_n$ with the uniform distribution, and the association $\sigma \longleftrightarrow \xi_\sigma = \sum_i x_i(\sigma)\delta_i$ gives rise to the map $\beta T(\sigma) = T(\xi_\sigma)$ for $T \in X_o$. The aim is to show that Eβ and $F_o$ the subset of elements $h \in X_o$ with $T_o h$ bounded. As explained in Barbour, Holst, and Janson, this $T_o$ is the generator of an immigration and death process $Z(t)$ on $I$ where particles are born at site $i$ at exponential rate $p_i$ and die at rate one. This process has the Poisson process of rate $\{p_i\}$ as a stationary distribution. This fact (or a straightforward direct computation) shows that $E_o T_o = 0$. Note that for $|I| = 1$ the operator is slightly different from the usual Stein operator for a Poisson process. This $T_o$ is a "second difference" operator. We use it because of already available bounds on its inverse, but also as an introduction to this line of development. The inverse can be given a stochastic representation where $Z(t)$ is the immigration and death process started at $\xi$. Barbour, Holst, and Janson (pg. 209) show that They also show that if $f(\xi) = \delta_A(\xi)$ for some set of configurations $A$, then As before, let $F$ be the bounded anti-symmetric functions on $\Omega \times \Omega$, and construct $T$ via a symmetric probability $Q$ with margins $P$: We may choose $\alpha$ to be any linear map α : To proceed, we need to make an intelligent choice of $\alpha$, make explicit the difference $T\alpha - \beta T_o$, and use this expression to bound the remainder term.
Because of the different form of the Stein operator, the choice of α which seems most useful is: Thus, In our application, h = U o f , thus h has bounded differences. It then follows that if c and the probability Q can be chosen so that then the error in the Poisson approximation is bounded by Of course, in any application, some work is required. Here is a full version of the matching problem.
Theorem 11. Let $X_i(\sigma)$, $1 \le i \le n$, indicate whether or not $i$ is a fixed point of the random permutation $\sigma$. Let $Y_i$ be independent Poisson random variables, each with mean $\frac{1}{n}$. Then Proof. It is easy to see that and that where $Z_i(\sigma)$ indicates whether $\sigma(\sigma(i))$ is $i$ or not, and $W = \sum_i X_i$. Choose $c = \frac{n - 1}{2}$, and note that $p_i = \frac{1}{n}$ for each $i$. We thus have that It follows that if $A$ is any set of configurations,

Remark:
We have not treated the closely related area of compound Poisson approximation; see Erhardsson [25] and the references therein.

The coupling approach to Poisson approximation
Stein's method for Poisson approximation has been very actively developed along two lines, both quite different from the present account. The connections between the various approaches are not clear. In this section we give a brief development of the coupling method. In section 10 we give a brief development of the dependency graph method. The book-length account of Barbour, Holst, and Janson treats both coupling and dependency graphs in detail. The main reasons for including brief descriptions here are twofold: (a) as illustrated above, the many analytic tools developed by previous workers can easily be adapted for use with the present method of exchangeable pairs; (b) while the three approaches are similar, there are also real differences. Sometimes one or another approach is easier to apply or yields better bounds.
The basics of the coupling method are laid out in section 9.1. Section 9.2 treats our familiar examples: Poisson-binomial trials, the birthday problem, the coupon-collector's problem, and the matching problem. Section 9.3 illustrates one of the triumphs of the coupling approach: effortless bounds for negatively dependent summands. Our development leans heavily on lectures of Charles Stein.

The basics
From section 2, we know that a random variable $W$ has a Poisson distribution with parameter $\lambda$ if and only if, for each bounded measurable function $f$,
$$E(W f(W)) = \lambda E(f(W + 1)).$$
This suggests that one approach to the question of whether $W$ is approximately Poisson-distributed is to compare $E(W f(W))$ and $\lambda E(f(W + 1))$. The following simple lemma motivates a slightly mysterious construction to come. It appears in Stein [51]. Proof.
The lemma suggests comparing W + 1 and a random variable W * having the distribution of W , given that the randomly chosen coordinate X I is one. Many examples in which W * is constructed explicitly appear below. The following result gives a concrete bound on the total variation distance between the law of W and P oi λ in terms of the difference between W + 1 and W * .
Proposition 13. Let X, I, and W be defined as in lemma (12). Let W * be defined on the same probability space with P(W * = w) = P(W = w|X I = 1).
Proof. Given A ⊆ N ∪ {0}, let g(j) = δ A (j) − P oi λ (A) and set f = U o g, for U o as defined in (5). Then, from lemma (12) and equation (6), Now, for any integers x < y, and from lemma (1), From these pieces, we get a bound for Poisson approximation by constructing a coupling (W, W * ) with W * close to W + 1. Stein [51] and Barbour, Holst, and Janson [8] give many examples leading to good bounds in a host of problems.
We next give a brief treatment of the basic probability problems treated in previous sections.

Examples
Example 0 (Poisson-binomial trials): Proof. We construct $W^*$ 'backwards'. Pick $X$ from the product measure, and independently pick $I^*$ with $P(I^* = i) = \frac{p_i}{\lambda}$. Set $X^*_{I^*} = 1$ and $X^*_i = X_i$ for $i \ne I^*$. We claim that Indeed, if $x_i = 0$, then both sides are 0. If $x_i = 1$, then Note that the claim shows $P(W^* = j) = P(W = j \mid X_I = 1)$. To complete the proof, observe that $W + 1 - W^* = X_{I^*}$, and that $P(X_{I^*} = 1) = \sum_i \frac{p_i^2}{\lambda}$.
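The backwards construction can be verified by exact enumeration when $n$ is small. The sketch below is illustrative only: the $p_i$ are hypothetical, $W^*$ is taken to be $\sum_i X^*_i$, and the index $I$ in the conditioning event is assumed to be uniform on the coordinates and independent of $X$. It checks that $W^*$ has the law of $W$ given $X_I = 1$ and that $P(W + 1 - W^* = 1) = \sum_i p_i^2 / \lambda$.

```python
from itertools import product
import math

p = [0.1, 0.25, 0.4]        # hypothetical success probabilities p_i
n, lam = len(p), sum(p)
states = list(product([0, 1], repeat=n))

def prob(x):
    """Product measure of the configuration x."""
    return math.prod(pi if xi else 1 - pi for pi, xi in zip(p, x))

# Law of W given X_I = 1, with I uniform and independent of X (assumption).
cond = [0.0] * (n + 1)
for x in states:
    for i in range(n):
        if x[i] == 1:
            cond[sum(x)] += prob(x) / n
norm = sum(cond)            # P(X_I = 1) = lam / n
cond = [c / norm for c in cond]

# Law of W* from the backwards construction, and of W + 1 - W* = X_{I*}.
star = [0.0] * (n + 1)
gap = 0.0
for x in states:
    for i_star in range(n):
        weight = prob(x) * p[i_star] / lam   # X and I* are independent
        w_star = sum(x) - x[i_star] + 1      # coordinate I* is forced to 1
        star[w_star] += weight
        gap += weight * x[i_star]

print(cond, star, gap, sum(pi * pi for pi in p) / lam)
```

Because $W + 1 - W^*$ takes only the values 0 and 1 here, its expected absolute value is exactly the probability printed last.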

Remarks:
1. Observe that the bound of this example is exactly the same as the bound given by the method of exchangeable pairs.
2. The backwards construction of this example seems magical when you first encounter it. The method of exchangeable pairs seems much more straightforward (at least to us). One benefit of the coupling approach is the clean bound of proposition (13).

Then if $a_2$ is the number of transpositions in π, $W + 1 - W^*$ is 1 with probability $W/n$, $-1$ with probability $2a_2/n$, and 0 otherwise.

Example 2 (The coupon-collector's problem): Here k balls are dropped into n boxes and $X_i$ is one or zero as box i is empty or not. Then $W = \sum_i X_i$ is zero if all of the boxes are covered. To construct a coupling, pick a box uniformly and simply distribute each ball in it at random into one of the other boxes. We claim that To prove this, let A be the number of balls in the box selected. Observe that $W + 1 - W^* \geq 0$ and that
• If A = 0, then $W + 1 - W^* = 1$.
• If A > 0, then $W + 1 - W^*$ is the number of the A balls that fall into empty boxes.
From this, Combining these bounds proves the claim.
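The coupon-collector coupling is easy to simulate. The following Python sketch is an illustration under our own naming (the functions `coupled_empty_boxes` and `estimate_gap` and the seeded generator are not from the source): it drops k balls into n boxes, empties a uniformly chosen box into the other boxes, and records W and $W^*$, so that $E|W + 1 - W^*|$ can be estimated by Monte Carlo.

```python
import random

def coupled_empty_boxes(n, k, rng):
    """Return (W, W*) for one draw of the coupling; requires n >= 2."""
    counts = [0] * n
    for _ in range(k):
        counts[rng.randrange(n)] += 1
    w = counts.count(0)
    # Pick a box uniformly and move each of its balls to a random other box.
    chosen = rng.randrange(n)
    moved = counts[chosen]
    new_counts = list(counts)
    new_counts[chosen] = 0
    others = [b for b in range(n) if b != chosen]
    for _ in range(moved):
        new_counts[rng.choice(others)] += 1
    w_star = new_counts.count(0)   # the chosen box is now empty and is counted
    return w, w_star

def estimate_gap(n, k, trials, seed=0):
    """Monte Carlo estimate of E|W + 1 - W*|."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w, w_star = coupled_empty_boxes(n, k, rng)
        total += abs(w + 1 - w_star)
    return total / trials

print(estimate_gap(n=50, k=250, trials=2000))
```

In every draw the difference $W + 1 - W^*$ is non-negative, in line with the observation above; the estimate is only a numerical illustration and not a substitute for the bound in the text.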
Remark: If $k = n \log n + cn$ for c fixed and n large, then proposition (13) shows that the number of empty boxes is well approximated by a Poisson distribution with error of order $\frac{\log n}{n}$. This is better than the bound we obtained in section 6 using exchangeable pairs. In preliminary work we have shown that if the proof of section 6 is carried through by conditioning only on W, we get an error term of order $\frac{\log n}{n}$ as well.

For $E(|W + 1 - W^*| \mid S^c)$ we may, by symmetry, consider the absolute difference between the number of balls in the first two boxes, given that each contains at least one ball. This is just the absolute difference between the number of balls in the first and second boxes when $k - 2$ balls are dropped into n boxes. By bounding this by twice the expected number of balls in the first box, namely $\frac{2k}{n}$, and bounding $P(S^c)$ by 1, we get

Remark: If $k = \theta \sqrt{n}$ with θ fixed and n large, then proposition (13) shows that the number of matching pairs is approximately Poisson with mean $\frac{\theta^2}{2}$, with error of order $\frac{1}{\sqrt{n}}$.

Automatic coupling
One of the many contributions of the book-length treatment of Poisson approximation by Barbour, Holst, and Janson (1992) is a soft theory which leads to a very neat result. Recall that a collection $\{X_i\}_{i \in I}$ of random variables is called negatively associated if for all disjoint $A, B \subseteq I$ and all functions f, g that are monotone in the same direction, $E\big(f(X_i; i \in A)\, g(X_j; j \in B)\big) \leq E\big(f(X_i; i \in A)\big)\, E\big(g(X_j; j \in B)\big)$.

If $X_i$ is one or zero as $N_i$ is bounded above by m or not (for fixed m ≥ 0), then the $X_i$ are negatively associated.

In Bose-Einstein allocation, k balls are dropped into n boxes so that all configurations are equally likely. If the $Y_i$ are independent geometric random variables with $P(Y_i = j) = q p^j$ for $0 \leq j < \infty$, then the distribution of $(Y_i)$ given $\sum_i Y_i = k$ is Bose-Einstein. These variables are thus negatively associated.
Many further examples and tools are given in [8].
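The claim that independent geometric variables conditioned on their sum give the Bose-Einstein allocation can be verified exactly for small cases. The Python sketch below is illustrative (the function name and the parameter values are our own choices): it enumerates all box-count vectors with total k, weights each by the product of geometric probabilities $q p^{y_i}$, normalizes, and confirms that every configuration receives the same conditional probability.

```python
from fractions import Fraction
from itertools import product

def conditioned_geometric_law(n, k, p):
    """Law of (Y_1, ..., Y_n) given Y_1 + ... + Y_n = k, with P(Y_i = j) = q p^j."""
    q = 1 - p
    weights = {}
    for y in product(range(k + 1), repeat=n):
        if sum(y) == k:
            prob = Fraction(1)
            for yi in y:
                prob *= q * p ** yi
            weights[y] = prob        # equals q^n p^k for every such y
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

law = conditioned_geometric_law(n=3, k=4, p=Fraction(1, 3))
print(len(law), set(law.values()))
```

Each configuration has unconditional weight $q^n p^k$, which does not depend on the configuration; this is why the conditional law is uniform regardless of the value of p.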
The motivation for this set-up is the following elegant theorem of Barbour, Holst, and Janson.

Remarks:
1. Of course, the mean equals the variance for a Poisson random variable. The theorem shows that if the mean is close to the variance, then the law is close to Poisson.
2. Mixtures of negatively associated variables are negatively associated. Diaconis and Holmes [22] apply these ideas to study the birthday and coupon-collector's problems when balls are dropped into boxes with probabilities $\{p_i\}$ and a prior is put on $p_i$.
3. For variables presented as conditional given a sum, a host of analytic techniques are available for higher order expansions and large deviations [20].
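To make the first remark concrete, one can compare the mean and the variance of the number of empty boxes when k balls are dropped uniformly into n boxes. The Python sketch below is an illustration only: it assumes the uniform allocation model and uses the elementary formulas $E W = n(1 - 1/n)^k$ and $E\,W(W - 1) = n(n - 1)(1 - 2/n)^k$; the function name is ours, and no particular form of the theorem's bound is assumed.

```python
from fractions import Fraction

def empty_box_moments(n, k):
    """Exact mean and variance of the number of empty boxes (uniform allocation)."""
    mean = n * Fraction(n - 1, n) ** k
    second_factorial = n * (n - 1) * Fraction(n - 2, n) ** k   # E[W(W - 1)]
    variance = second_factorial + mean - mean ** 2
    return mean, variance

for n, k in [(20, 60), (100, 460), (500, 3100)]:
    mean, variance = empty_box_moments(n, k)
    print(n, k, float(mean), float(mean - variance))
```

The gap between mean and variance is always non-negative here, and in the regime where k is of order n log n it is small compared with the mean.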

The dependency graph approach
In this section we give a brief description of the dependency graph approach to Stein's method. This method is often useful when a natural local dependence structure exists; in these cases, this approach may be easier to apply than constructing an exchangeable pair or a coupling. Following a description of the approach, an example of using a dependency graph to solve a variation of the birthday problem is given.
Let $\{X_i\}_{i \in I}$ be a set of binary random variables, with $P(X_i = 1) = p_i$. A dependency graph for $\{X_i\}$ is a graph with vertex set $I = \{1, \ldots, n\}$ and edge set E, such that if $I_1, I_2$ are disjoint subsets of I with no edges connecting them, then $\{X_i\}_{i \in I_1}$ and $\{X_i\}_{i \in I_2}$ are independent. Let $N_i$ denote the neighborhood of i in the graph; that is, $N_i = \{i\} \cup \{j \in I : (i, j) \in E\}$. This framework yields the following bound, first proved in [4]:

Theorem 15. Let $\{X_i\}_{i \in I}$ be a finite collection of binary random variables with dependency graph (I, E); suppose $P(X_i = 1) = p_i$ and $P(X_i = 1, X_j = 1) = p_{ij}$. Let $\lambda = \sum_i p_i$ and $W = \sum_i X_i$. Then
$$\|\mathcal{L}(W) - \mathrm{Poi}_\lambda\|_{TV} \leq \min(1, \lambda^{-1}) \Big[ \sum_{i \in I} \sum_{j \in N_i \setminus \{i\}} p_{ij} + \sum_{i \in I} \sum_{j \in N_i} p_i p_j \Big].$$

Proof. Let $A \subseteq \mathbb{N}$, and let $f = U_o \delta_A$ for $U_o$ as in section 2. Let $Z \sim \mathrm{Poi}_\lambda$. Then Let $W_i = W - X_i$, and $V_i = \sum_{j \notin N_i} X_j$. Note that $X_i$ and $V_i$ are independent and that $X_i f(W) = X_i f(W_i + 1)$.
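The bound of theorem (15) is straightforward to evaluate once a dependency graph is written down. As a minimal sketch, the Python code below evaluates $\min(1, \lambda^{-1})\big[\sum_i \sum_{j \in N_i \setminus \{i\}} p_{ij} + \sum_i \sum_{j \in N_i} p_i p_j\big]$ for the classical birthday problem with k people and n equally likely birthdays. The set-up is our own illustration and not necessarily the variation treated by the authors: indicators are indexed by pairs of people, two pairs are neighbors when they share a person, and the function names are hypothetical.

```python
from itertools import combinations

def dependency_graph_bound(p, p_pair, neighbors):
    """Evaluate the dependency graph bound.

    p: dict index -> p_i; p_pair(i, j): P(X_i = 1, X_j = 1);
    neighbors: dict index -> list of indices in N_i (including i itself).
    """
    lam = sum(p.values())
    joint_term = sum(p_pair(i, j) for i in p for j in neighbors[i] if j != i)
    product_term = sum(p[i] * p[j] for i in p for j in neighbors[i])
    return min(1.0, 1.0 / lam) * (joint_term + product_term)

def birthday_bound(k, n):
    """Bound for the number of pairs among k people sharing one of n birthdays."""
    pairs = list(combinations(range(k), 2))
    p = {e: 1.0 / n for e in pairs}
    # N_e holds e itself and every pair that shares a person with e.
    neighbors = {e: [f for f in pairs if set(e) & set(f)] for e in pairs}
    # Two distinct overlapping pairs both match iff three people share a birthday.
    return dependency_graph_bound(p, lambda e, f: 1.0 / n ** 2, neighbors)

print(birthday_bound(k=23, n=365))
```

Pairs with no person in common involve disjoint sets of independent birthdays, so this graph satisfies the independence requirement in the definition of a dependency graph.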
Remark: Here we have insisted that $X_i$ be independent of $\{X_j\}_{j \notin N_i}$. The following theorem, stated and proved in Barbour, Holst, and Janson (1992), is a more general result in which it is only required that $X_i$ not depend too strongly on $\{X_j\}_{j \notin N_i}$. The proof is very similar to that of theorem (15).
where $\eta_i$ is any quantity satisfying $|E(X_i g(Y_i + 1)) - p_i E g(Y_i + 1)| \leq \eta_i$ for every $g : \mathbb{N} \cup \{0\} \to [0, 1]$.

a variety of metrics on the permutation group. For example, one may take Spearman's ρ, which boils down to the sum of squared differences $\sum_i (\sigma(i) - \sigma'(i))^2$. Fixing a metric d, one can define a probability distribution on permutations by $P_\theta(\sigma) = z^{-1} \theta^{d(\sigma, \sigma_o)}$, $0 < \theta < 1$, where z is a normalizing constant and $\sigma_o$ is the center of the distribution. These Mallows models have been widely applied; for extensive references, see Fligner and Verducci [27] or Diaconis and Ram [24]. The problem is this: for θ, $\sigma_o$, and d fixed, pick σ from $P_\theta$ and look at the distribution of the number of fixed points. We conjecture that this will be Poisson under very mild conditions. To use Stein's method, an exchangeable pair must be formed. One simple way to do this is to pick σ from $P_\theta$ and then make a random transposition τ. Use the Metropolis algorithm to get the second coordinate distribution to be $P_\theta$ as well. Alternatively, the Gibbs sampler may be used. Of course, the problems outlined in the first two items above may be studied under these non-uniform models. Diaconis and Holmes [22] give some examples and further motivation.
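The Metropolis construction described above can be sketched in a few lines. The Python code below is an illustration under stated assumptions, not the authors' implementation: it uses the sum of squared differences as the metric d, takes the identity as the center $\sigma_o$, proposes a uniformly random transposition of two positions, and accepts with probability $\min(1, \theta^{d(\sigma', \sigma_o) - d(\sigma, \sigma_o)})$. The starting permutation is obtained by running the same chain for a burn-in period, so it is only approximately distributed as $P_\theta$, and the resulting pair is only approximately exchangeable. All function names are hypothetical.

```python
import random

def spearman_rho(sigma, center):
    """Sum of squared coordinate differences between two permutations."""
    return sum((a - b) ** 2 for a, b in zip(sigma, center))

def metropolis_step(sigma, center, theta, rng):
    """One Metropolis move targeting weights theta ** d(sigma, center)."""
    n = len(sigma)
    i, j = rng.sample(range(n), 2)          # random transposition of two positions
    proposal = list(sigma)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    proposal = tuple(proposal)
    # The proposal is symmetric, so the acceptance ratio is theta ** (change in d).
    delta = spearman_rho(proposal, center) - spearman_rho(sigma, center)
    if rng.random() < min(1.0, theta ** delta):
        return proposal
    return sigma

def fixed_points(sigma):
    return sum(1 for i, s in enumerate(sigma) if s == i)

def approximate_pair(n, theta, burn_in, rng):
    """Return (W, W') = fixed points before and after one Metropolis step."""
    center = tuple(range(n))
    sigma = center
    for _ in range(burn_in):                # approximate draw from P_theta
        sigma = metropolis_step(sigma, center, theta, rng)
    sigma_prime = metropolis_step(sigma, center, theta, rng)
    return fixed_points(sigma), fixed_points(sigma_prime)

rng = random.Random(0)
print([approximate_pair(8, 0.9, 500, rng) for _ in range(5)])
```

Because a single transposition changes at most two coordinates, W and W′ differ by at most 2 in this construction.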

Variants of the coupon-collector's problem
1. Non-uniform allocations. One can drop k balls into n boxes with the probability of dropping a ball into box i equal to p i . This is studied by classical arguments (and without error terms) in Rosén [47]. Diaconis and Holmes [22] study Bose-Einstein allocation. They also mention the problem of putting a prior on (p 1 , . . . , p n ) and studying the Bayes distribution.

2. Complexes. The balls may be dropped in j at a time such that within each group of balls, all must land in different boxes. This actually arises in collecting baseball cards. Kolchin et al. [37] give some classical results.
3. Coverage problems. A sweeping generalization occurs in the world of coverage processes. Let X be a compact metric space. Pick points $x_1, \ldots, x_k$ from a probability measure µ on X. For a fixed ε > 0, what is the chance that the union of the ε-balls about the $x_i$ covers X? Classical examples are dropping arcs on a circle (see Feller [26]) or caps on a sphere. Hall [31] and Janson [32] give many results. Aldous [1] has further literature. The widely studied k-SAT problem of computer science theory can be seen as a special case. Here, X is the hypercube $\mathbb{Z}_2^d$, k points are chosen randomly, and one wants to understand the chance that Hamming balls of a given radius cover the cube.

The birthday problem
Many natural variations have been studied: one may fix a graph, color the vertices with k colors chosen with probabilities $p_i$ for $1 \leq i \leq k$, and ask how many edges have both endpoints of the same color. The same kind of question may be asked for hypergraphs. Some of these problems are treated by Aldous [1], Arratia-Gordon-Goldstein [4], or Barbour-Holst-Janson [8]. We mention here a lesser-known development. Camarri and Pitman [12] have shown that limits other than the Poisson can arise if non-uniform probabilities are used. The limiting measures that arise there are natural (related to the Mittag-Leffler function) and well worth further development. It is also natural to study how these limiting regimes occur in the graph variants discussed above. Developing Stein's method for these cases seems like a natural project.