The Divergence Borel-Cantelli Lemma revisited

Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. The classical Borel-Cantelli Lemma states that for any sequence of $\mu$-measurable sets $E_i$ ($i=1,2,3,\dots$), if the sum of their measures converges then the corresponding $\limsup$ set $E_\infty$ is of measure zero. In general the converse statement is false. However, it is well known that the divergence counterpart is true under various additional 'independence' hypotheses. In this paper we revisit these hypotheses and establish both sufficient and necessary conditions for $E_\infty$ to have either positive or full measure.


Introduction
The Borel-Cantelli Lemma is a result in probability theory with wide-reaching applications to various areas of mathematics. To some extent, this note is motivated by its deep applications to number theory, in particular to metric number theory; see for example [5,6,9,28,29,46] and references within. Loosely speaking, metric number theory is concerned with the arithmetic properties of almost all numbers and many key results in the theory are underpinned by variants of the divergence part of the Borel-Cantelli Lemma (see Lemma DBC below). The divergence part is also known as the second Borel-Cantelli Lemma and it naturally shows up (in some form) in the proof of the notorious Duffin-Schaeffer Conjecture [24] recently given by Koukoulopoulos & Maynard [35] and its higher dimensional generalisation proved two decades earlier by Pollington & Vaughan [41]. Indeed, the divergence Borel-Cantelli Lemma is very much at the heart of numerous other recent advances on topical problems in metric number theory, such as those in the theory of multiplicative and inhomogeneous Diophantine approximation and Diophantine approximation on manifolds and more generally on fractals, see for example [4,8,18,19,20,21,33,43,44,49]. In a nutshell, our goal is to revisit the Borel-Cantelli Lemma and to establish both sufficient and necessary conditions that guarantee either positive or full measure.

Background and Motivation
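To set the scene, let $(\Omega, \mathcal{A}, \mu)$ be a probability space and let $E_i$ ($i \in \mathbb{N}$) be a family of measurable subsets (events) of $\Omega$. Also, let $E_\infty := \limsup_{i\to\infty} E_i := \bigcap_{t=1}^{\infty}\bigcup_{i=t}^{\infty} E_i$; i.e. $E_\infty$ is the set of $x \in \Omega$ such that $x \in E_i$ for infinitely many $i \in \mathbb{N}$.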
Determining the measure of $E_\infty$ turns out to be one of the fundamental problems considered within the framework of classical probability theory; see for example [14, Chp.1 §4] and [42, Chp.47] for general background and further details. With this in mind, the following convergence Borel-Cantelli Lemma provides a beautiful and truly simple criterion for zero measure.
Lemma CBC (Convergence Borel-Cantelli). Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and let $\{E_i\}_{i\in\mathbb{N}}$ be a sequence of subsets (events) in $\mathcal{A}$. Suppose that $\sum_{i=1}^{\infty}\mu(E_i) < \infty$. Then $\mu(E_\infty) = 0$.
This powerful lemma, which is also known as the first Borel-Cantelli Lemma, has applications in numerous disciplines. In particular, within the context of number theory it is very much at the heart of Borel's proof that almost all numbers are normal [16].
In view of Lemma CBC, it is natural to ask whether or not there is a sufficient condition that enables us to deduce that the measure of E ∞ is positive or possibly even full; that is to say that µ(E ∞ ) = µ(Ω) = 1 .
The divergence of the measure sum $\sum_{i=1}^{\infty}\mu(E_i)$ is clearly necessary but certainly not enough as the following simple example demonstrates.
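One standard example of this kind, given here as an illustration with sets of our own choosing (the example intended in the text may differ in detail), takes Lebesgue measure on the unit interval and nested intervals shrinking to the empty set:

```latex
\[
\Omega = [0,1], \qquad \mu = \text{Lebesgue measure}, \qquad
E_i := \bigl(0, \tfrac{1}{i}\bigr) \quad (i \in \mathbb{N}).
\]
% The measure sum is the harmonic series and so diverges:
\[
\sum_{i=1}^{\infty} \mu(E_i) = \sum_{i=1}^{\infty} \frac{1}{i} = \infty .
\]
% No point x > 0 lies in E_i once i \ge 1/x, so the limsup set is empty:
\[
E_\infty = \bigcap_{t=1}^{\infty} \bigcup_{i=t}^{\infty} \bigl(0, \tfrac{1}{i}\bigr)
         = \bigcap_{t=1}^{\infty} \bigl(0, \tfrac{1}{t}\bigr) = \emptyset ,
\qquad \text{so } \mu(E_\infty) = 0 .
\]
```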
The problem in the above example is that the building blocks $E_i$ of the lim sup set under consideration overlap 'too much'; in fact they are nested. The upshot is that in order to have $\mu(E_\infty) > 0$, we not only need the sum of the measures to diverge but also that the sets $E_i$ are in 'some sense' independent; that is, we need to control overlaps! Indeed, Borel & Cantelli showed that mutual independence in the classical probabilistic sense, which means that for every $n \in \mathbb{N}$
$\mu\bigl(\bigcap_{t=1}^{n} E_{i_t}\bigr) = \prod_{t=1}^{n}\mu(E_{i_t})$ for any indices $i_1 < \dots < i_n$, (1)
implies that $\mu(E_\infty) = 1$. This full measure statement, often referred to as the second Borel-Cantelli Lemma, led to a flurry of activity with the aim of relaxing the mutual independence condition. Notable progress in this quest included replacing mutual independence by pairwise independence; this corresponds to (1) being fulfilled with $n = 2$ rather than every $n \in \mathbb{N}$. In turn, pairwise independence was replaced by the upper bound condition on the overlaps
$\mu(E_s \cap E_t) \le \mu(E_s)\,\mu(E_t)$ for all $s \neq t$. (2)
Undoubtedly, verifying (2) is significantly easier than (1). However, in many applications, we rarely have (2) let alone mutual independence as in the original statement of the second Borel-Cantelli Lemma. What is much more useful is the following variant which these days is often referred to as the divergence Borel-Cantelli Lemma.
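Lemma DBC (Divergence Borel-Cantelli). Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and let $\{E_i\}_{i\in\mathbb{N}}$ be a sequence of subsets (events) in $\mathcal{A}$. Suppose that $\sum_{i=1}^{\infty}\mu(E_i) = \infty$ and that there exists a constant $C > 0$ such that
$\sum_{s,t=1}^{Q}\mu(E_s \cap E_t) \le C\bigl(\sum_{s=1}^{Q}\mu(E_s)\bigr)^{2}$ (3)
holds for infinitely many $Q \in \mathbb{N}$. Then $\mu(E_\infty) \ge C^{-1}$. In particular, if $C = 1$ then $\mu(E_\infty) = 1$.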
We refer the reader to [28,29,42,46] for the proof of the lemma which is essentially a consequence of the Cauchy-Schwarz inequality. As pointed out by Harman [29], the basic idea goes back to the works of Paley & Zygmund [38,39] from the nineteen thirties.
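The role of the Cauchy-Schwarz inequality can be sketched as follows. This is a standard outline written in our own notation ($N_Q$ and $A_Q$ are symbols introduced here for illustration), assuming the overlap condition of Lemma DBC in the form $\sum_{s,t\le Q}\mu(E_s\cap E_t)\le C\bigl(\sum_{s\le Q}\mu(E_s)\bigr)^2$; for the complete argument see the references above.

```latex
% N_Q counts how many of E_1, ..., E_Q contain x.
\[
N_Q(x) := \sum_{s=1}^{Q} \mathbf{1}_{E_s}(x), \qquad
\int_\Omega N_Q \, d\mu = \sum_{s=1}^{Q}\mu(E_s), \qquad
\int_\Omega N_Q^2 \, d\mu = \sum_{s,t=1}^{Q}\mu(E_s\cap E_t).
\]
% N_Q vanishes off A_Q := E_1 \cup \dots \cup E_Q, so Cauchy-Schwarz gives
\[
\Bigl(\sum_{s=1}^{Q}\mu(E_s)\Bigr)^{2}
 = \Bigl(\int_\Omega N_Q \,\mathbf{1}_{A_Q}\, d\mu\Bigr)^{2}
 \le \mu(A_Q)\,\sum_{s,t=1}^{Q}\mu(E_s\cap E_t)
 \le \mu(A_Q)\; C\,\Bigl(\sum_{s=1}^{Q}\mu(E_s)\Bigr)^{2},
\]
% and hence \mu(E_1 \cup \dots \cup E_Q) \ge C^{-1} for each admissible Q.
```

Passing from finite unions to the lim sup set requires the same estimate for the tails $E_G, E_{G+1}, \dots$; this is where the divergence of the measure sum enters, since it makes the contribution of the first $G$ sets negligible. That step is omitted from the sketch.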
Remark 1. To the best of our knowledge, the 'in particular' part of Lemma DBC first explicitly appears in the work of Erdős & Rényi [25, Lemma C] from the late fifties. Lamperti [36] in the early sixties established the weaker form of the above lemma in which  [14, §24]) that the lim sup set $E_\infty$ satisfies a zero-one law; namely that $\mu(E_\infty) = 0$ or $1$.
Within the context of metric number theory, the existence of such a law for the lim sup set of well approximable real numbers is due to Cassels [17] and Gallagher [27] and it plays a key role in the recent proof of the Duffin-Schaeffer Conjecture [35]. For further details and higher dimensional generalisation of their zero-one laws see [7,10] and references within. Alternatively, without the presence of a general zero-one law, if we are willing to impose a little more structure on the probability space, we can guarantee full measure if the measure sum diverges locally and quasi-independence on average holds locally in the presence of an appropriate topological structure on Ω. In short, by locally we mean that the conditions under consideration hold for E i ∩ A where A is an arbitrary open set with positive measure. For the precise statement see Lemma LBC below.
In short, the purpose of the present paper is to determine whether or not Lemma DBC is best possible. In other words, is it the case that the pairwise quasi-independence on average condition (3) cannot be replaced by a weaker condition? Recall that in view of Lemma CBC, the divergence sum condition within Lemma DBC is not negotiable; it has to be present. We show that within a reasonably general framework, given any lim sup set $E_\infty := \limsup_{i\to\infty} E_i$ with $\mu(E_\infty) > 0$ the sets $E_i$ can be appropriately manipulated or rather "trimmed" in such a manner that the resulting subsets $E_i^*$ are quasi-independent on average and the sum of the measures $\mu(E_i^*)$ diverges. Thus, up to "trimming" the divergence Borel-Cantelli Lemma is best possible. Moreover, we show that quasi-independence on average for the trimmed sets is in fact not only equivalent to full measure but to three other useful properties which are of independent interest especially within the context of applications. We conclude the paper with a couple of examples that demonstrate the versatility and power of our results.

Statement of results
Throughout, $(\Omega, \mathcal{A}, \mu, d)$ will be a metric measure space equipped with a Borel probability measure $\mu$. In what follows, supp $\mu$ will denote the support of the measure $\mu$ and given $x \in \Omega$ and $r > 0$, $B = B(x, r)$ will denote the ball centred at $x$ of radius $r$. Also, given a real number $a > 0$, we denote by $aB$ the ball $B$ scaled by a factor $a$; i.e. $aB := B(x, ar)$. Most of the time we will assume that $\mu$ is doubling. Recall that $\mu$ is said to be doubling if there are constants $\lambda \ge 1$ and $r_0 > 0$ such that for any $x \in$ supp $\mu$ and $0 < r < r_0$
$\mu(B(x, 2r)) \le \lambda\,\mu(B(x, r))$. (5)
The doubling condition allows us to blow up a given ball by a constant factor without drastically affecting its measure. The metric measure space $(\Omega, \mathcal{A}, \mu, d)$ is also said to be doubling if $\mu$ is doubling [30]. Note that the doubling property is imposed only on the measure of balls centred in supp $\mu$. However, in many instances the doubling property can be effectively used on balls that are not necessarily centred in supp $\mu$ provided that they contain 'enough' of the support. In particular, when working with a given sequence of balls $\{B_i\}$ in $\Omega$ we will often impose the following weaker version of doubling: there exist constants $a > 1$ and $b > 0$ such that
$\mu(aB_i) \le b\,\mu(B_i)$ for all $i \in \mathbb{N}$. (6)
Condition (6) is not particularly restrictive and ensures that whenever $\mu(B_i) > 0$ the support of $\mu$ within $B_i$ is not concentrated too close to the boundary of $B_i$. Indeed, if for some $\varepsilon > 0$ the ball $(1 - \varepsilon)B_i$ contains points in supp $\mu$, then the inequality in (6) holds with any $a > 1$ and $b = \lambda^k$ where $k := \lceil \log_2 \frac{a+1-\varepsilon}{\varepsilon} \rceil$. Note that if the centre of $B_i$ is in supp $\mu$ then the inequality in (6) trivially holds with $a = 2$ and $b = \lambda$.
Restricting our attention to lim sup sets of balls, we have the following 'if and only if' statement for full measure.
and for infinitely many $Q \in \mathbb{N}$
$\sum_{s,t=1}^{Q}\mu(L_{s,B} \cap L_{t,B}) \le \frac{C}{\mu(B)}\bigl(\sum_{s=1}^{Q}\mu(L_{s,B})\bigr)^{2}$. (8)
It is important to note that the constant $C > 0$ appearing in (8) is independent of the arbitrary ball $B$.
The following is a strengthening of Theorem 1 to lim sup sets of open sets. As we shall see, the proof will follow the same line of argument as that of Theorem 1.
(7) and (8) for infinitely many $Q \in \mathbb{N}$.
The upshot is that for a lim sup set $E_\infty$ to have full measure we must be able to locally "trim" the associated sets $E_i$ so that the resulting trimmed subsets are quasi-independent on average and the sum of their measures diverges.
It turns out that the sufficiency part of Theorem 2 can be made more general. In particular, the doubling condition can be dropped altogether. The following is a local variant of the (standard) divergence Borel-Cantelli Lemma which allows us to deduce full measure rather than just positive measure.
Lemma LBC (Local Borel-Cantelli) . Let (Ω, A, µ, d) be a metric measure space equipped with a Borel probability measure µ and let {E i } i∈N be a sequence of Borel subsets of Ω. Suppose there exists an increasing function f : and for infinitely many Q ∈ N Q s,t=1 Then Moreover, if in addition µ is doubling and f (x) = cx for some constant 0 < c ≤ 1, it suffices to take A in the above to be an arbitrary ball of sufficiently small radius centred in supp µ.
Remark 3. In the case $f(x) = cx$ for some constant $0 < c \le 1$ and $A = B$ is a ball, condition (10) becomes the same as (8) with $C = c^{-1}$. Given a measurable set $A$ with $\mu(A) > 0$, let $\mu_A$ denote the conditional probability measure given by $\mu_A(E) := \mu(E \cap A)/\mu(A)$ for $E \in \mathcal{A}$. In other words, $\mu_A$ is the re-normalised $\mu$-measure restricted to $A$. Then it is easily seen that on replacing $\mu$ by $\mu_A$, the divergence condition (9) and the overlap condition (10) with $f(x) = cx$ coincide with those of Lemma DBC. For obvious reasons, the independence condition (10) with $f(x) = cx$ is often referred to as local quasi-independence on average.
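To see the rescaling at work, here is a short check written under an assumption: we take the local overlap condition in the form $\sum_{s,t\le Q}\mu(L_s\cap L_t)\le \frac{1}{c\,\mu(A)}\bigl(\sum_{s\le Q}\mu(L_s)\bigr)^2$ for sets $L_s \subset A$ (this display is our reading of condition (10) with $f(x) = cx$, and the symbols $L_s$ are used for illustration), and we write $\mu_A(E) = \mu(E\cap A)/\mu(A)$.

```latex
% Since L_s \subset A, we have \mu_A(L_s) = \mu(L_s)/\mu(A) and
% \mu_A(L_s \cap L_t) = \mu(L_s \cap L_t)/\mu(A).
\begin{align*}
\sum_{s,t=1}^{Q}\mu_A(L_s\cap L_t)
  &= \frac{1}{\mu(A)}\sum_{s,t=1}^{Q}\mu(L_s\cap L_t) \\
  &\le \frac{1}{\mu(A)}\cdot\frac{1}{c\,\mu(A)}\Bigl(\sum_{s=1}^{Q}\mu(L_s)\Bigr)^{2}
   = \frac{1}{c}\Bigl(\sum_{s=1}^{Q}\mu_A(L_s)\Bigr)^{2}.
\end{align*}
% This is the overlap condition of Lemma DBC for \mu_A with C = c^{-1}.
% Lemma DBC would then give \mu_A(\limsup L_s) \ge c, that is
% \mu(A \cap \limsup L_s) \ge c\,\mu(A).
```

The divergence condition is unaffected by the rescaling, since $\sum_s \mu_A(L_s) = \mu(A)^{-1}\sum_s \mu(L_s)$.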
Theorem 1 will follow from a more general statement that provides three more necessary and sufficient conditions for full measure. To be more precise, Theorem 1 is the equivalence between (A) and (E) within the following statement with $C = \kappa^{-2}$.
Throughout, we use the standard notation • to denote that the union of sets under consideration is disjoint.
Then, the following statements are equivalent: (C) For any ball $B$ in $\Omega$ centred in supp $\mu$ and any $G \in \mathbb{N}$, there is a finite sub- where $\lambda$ is as in (5), $k := \max\{1, \lceil \log_2 \frac{6}{a-1} \rceil\}$ and $a$, $b$ are as in (6).
consisting of a finite union of disjoint balls from for any subsequence (G i ) i∈N of natural numbers, and, with κ is as in (12), for any pair of natural numbers G and G ′ and, with κ is as in (12), for infinitely many Q ∈ N Q s,t=1 Remark 4. It will become apparent in the proof that we can take the subset E G,B in (D) to be the union of balls in the sub-collection K G,B associated with (C).
We now turn our attention to an 'if and only if' statement for positive measure for lim sup sets of balls.
and for infinitely many The following is the 'positive measure' analogue of Proposition 1 and it clearly implies Theorem 3.
Proposition 2. Let $(\Omega, \mathcal{A}, \mu, d)$ be a metric measure space equipped with a doubling Borel probability measure $\mu$. Let $\{B_i\}_{i\in\mathbb{N}}$ be a sequence of balls in $\Omega$ such that (6) holds. Let $E_\infty := \limsup_{i\to\infty} B_i$. Then, the following statements are equivalent: where $\lambda$ is as in (5), $k := \max\{1, \lceil \log_2 \frac{6}{a-1} \rceil\}$ and $a$, $b$ are as in (6).
for any subsequence $(G_i)_{i\in\mathbb{N}}$ of natural numbers, and, with $\kappa$ as in (17), for any pair of natural numbers $G$ and $G'$ and, with $\kappa$ as in (17), for infinitely many

Proof of results

Preliminaries
We will make multiple use of the following basic covering lemma, see for example [30,37].
Lemma 1 (The 5r covering lemma). Every family $\mathcal{F}$ of balls of uniformly bounded diameter in a metric space $(\Omega, d)$ contains a disjoint subfamily $\mathcal{G}$ such that $\bigcup_{B\in\mathcal{F}} B \subset \bigcup_{B\in\mathcal{G}} 5B$.
The following measure theoretic result, which is an extension of Proposition 1 in [6, §8], provides a mechanism for establishing full measure statements.
Lemma 2. Let $(\Omega, \mathcal{A}, \mu, d)$ be a metric measure space equipped with a Borel doubling probability measure $\mu$. Let $E$ be a Borel subset of $\Omega$. Assume that there are constants $r_0, c > 0$ such that for any ball $B$ centred in supp $\mu$ with $r(B) < r_0$, we have that
$\mu(E \cap B) \ge c\,\mu(B)$. (20)
Then, $\mu(E) = 1$.
The lemma is a standard corollary of the Lebesgue density theorem or more generally the Lebesgue differentiation theorem for doubling metric measure spaces (see for example [30, Theorem 1.8]). A slightly weaker version of this lemma can also be found in [6]. In short, the version of this lemma established as Proposition 1 in [6, §8] requires that (20) holds for arbitrary balls centred in $\Omega$ rather than just in supp $\mu$ and the proof uses covering arguments rather than the Lebesgue density theorem.
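The deduction from the Lebesgue density theorem can be sketched as follows, assuming the hypothesis of Lemma 2 reads $\mu(E\cap B)\ge c\,\mu(B)$ for all sufficiently small balls $B$ centred in supp $\mu$ (the set $F$ below is notation introduced for this sketch).

```latex
% Argue by contradiction: suppose the complement F := \Omega \setminus E
% has positive measure.
\[
\mu(F) > 0
\;\Longrightarrow\;
\exists\, x \in F \cap \operatorname{supp}\mu :\quad
\lim_{r \to 0} \frac{\mu\bigl(F \cap B(x,r)\bigr)}{\mu\bigl(B(x,r)\bigr)} = 1
\quad\text{(Lebesgue density theorem).}
\]
% For such an x the hypothesis fails for all small r, because
\[
\frac{\mu\bigl(E \cap B(x,r)\bigr)}{\mu\bigl(B(x,r)\bigr)}
 = 1 - \frac{\mu\bigl(F \cap B(x,r)\bigr)}{\mu\bigl(B(x,r)\bigr)}
 \;\longrightarrow\; 0 < c \qquad (r \to 0).
\]
% Hence \mu(F) = 0, that is, \mu(E) = 1.
```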
Remark 5. Note that the doubling assumption in Lemma 2 can be weakened by requiring instead that (Ω, A, µ, d) is a Vitali space as defined in [30, p.6]. Furthermore it is also possible to remove the doubling assumption altogether from Lemma 2 at the price of requiring a lower bound on µ (E ∩ B) for an arbitrary open set B as opposed to an arbitrary ball of sufficiently small radius. We will state this version formally as it will be required in the proof of Lemma LBC.
The following "obvious" but useful statement relates the standard doubling property (5) for balls centred in supp µ with the weaker property corresponding to (6) in which the centre can be anywhere. where k ∈ N satisfies 2 k ≥ (1 + s)/(a − 1) and λ is as in (5).
Proof. Let $x \in B \cap$ supp $\mu$ and $B'$ be the ball centred at $x$ of radius $(a - 1)r(B)$. Then clearly $B' \subset aB$. Since $B'$ is centred in the support of $\mu$, the doubling property (5) is applicable to it and we have that for every $k \in \mathbb{N}$ $\mu(2^k B') \le \lambda^k \mu(B')$.
Since $B' \subset aB$ and $\mu(aB) \le b\,\mu(B)$, we therefore have that
$\mu(2^k B') \le \lambda^k \mu(aB) \le b\,\lambda^k \mu(B)$. (21)
It remains to observe that $sB \subset 2^k B'$ provided that $(a - 1)2^k \ge 1 + s$ and so (21) implies the required inequality.
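The two inclusions used in this proof are direct triangle-inequality checks. As a sketch, write $B = B(y, r)$ for an open ball (the centre $y$ is named here only for this check) and recall that $B' = B(x, (a-1)r)$ with $x \in B$:

```latex
% Inclusion sB \subset 2^k B':
\[
z \in sB
\;\Longrightarrow\;
d(z,x) \le d(z,y) + d(y,x) < s\,r + r = (1+s)\,r \le 2^{k}(a-1)\,r ,
\]
% where the last inequality is exactly the condition (a-1)2^k \ge 1+s.
% Thus z \in B(x, 2^k (a-1) r) = 2^k B'.
% Inclusion B' \subset aB:
\[
z \in B'
\;\Longrightarrow\;
d(z,y) \le d(z,x) + d(x,y) < (a-1)\,r + r = a\,r .
\]
```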

Proof of Lemma LBC
Let A be any open subset of Ω and {L i,A } i∈N be the sequence of sets as in Lemma LBC. In particular, by definition, L i,A ⊂ E i ∩ A for every i ∈ N and therefore lim sup On applying Lemma DBC (the standard divergent Borel-Cantelli Lemma) it follows that Also recall that E i is a Borel set for every i ∈ N and therefore E ∞ := lim sup i→∞ E i is a Borel subset of Ω. Then, applying Lemma 3 with E = E ∞ implies that µ(E ∞ ) = 1 as desired.
If µ is a doubling measure and f (x) = cx for some constant 0 < c ≤ 1, then the 'moreover' part of Lemma LBC follows on applying Lemma 2 instead of Lemma 3.

Proof of Proposition 1
• Step 1: (A) =⇒ (B). This is obvious since µ is a probability measure. In view of the 5r covering lemma (Lemma 1), there exists a disjoint sub-family G such that However, since G is a disjoint collection of balls, which have non-empty intersection with supp µ, we have that where k := max{1, log 2 6 a−1 }. Thus, If G is infinite, the sum in (22) is convergent and therefore there exists some j 0 > G for which Obviously, this is also true if G is finite. Now let K G,B := {B i : B i ∈ G, i < j 0 }. Clearly, this is a finite sub-collection of {B i : i ≥ G}. Moreover, in view of (22) and (23) the collection K G,B satisfies the desired properties.
It follows from (12) that for any pair of natural numbers G and G ′ . Thus, the sets E G,B satisfy the desired properties.
• Step 4: (D) =⇒ (E). Let B be any ball centred in supp µ and for any G ∈ N let E G,B ⊂ B be as in (D), and let K G,B be a finite collection of disjoint balls from {B i : i ≥ G} that constitute E G,B , that is (24) holds. Observe that for any pair of natural numbers G and G ′ Let G 1 = 1 and fix the collection K G 1 ,B . Define G 2 = t + 1 where t is the largest index such that B t ∈ K G 1 ,B . Since K G 1 ,B is finite this is clearly possible. With G 2 defined, we can fix the collection K G 2 ,B and proceed by induction as follows. Suppose the integers G 1 , . . . , G n and the corresponding collections K G 1 ,B ,. . . ,K Gn,B have been determined. Define G n+1 = t + 1 where t is the largest index such that B t ∈ K Gn,B . With G n+1 defined, we can fix the collection K G n+1 ,B and we are done. Now, let {L s,B } s∈N be the sequence of balls contained in B obtained by placing the balls from {K G i ,B : i ∈ N} in the same order as in {B i } i∈N . In view of the choice of the integers G i , the sequence

This together with the fact that
shows that the sequence {L s,B } s∈N satisfies the desired properties.

Proof of Proposition 2
The proof is very similar to that of Proposition 1 and so we will simply provide a sketch.
and the same argument leading to (22) shows that where k is the same integer as in (22). Furthermore, the same argument leading to (23) shows that there exists some j 0 > G for which Then, in view of (26) and (27) the finite sub-collection K G := {B i : B i ∈ G, i < j 0 } of {B i : i ≥ G} satisfies the desired properties.
• Step 2: (B) =⇒ (C). For any G ∈ N, let K G be the finite sub-collection of disjoint balls associated with (B) and define It follows from (17) that µ(E G ) ≥ κ which in turn implies that ∞ i=1 µ(E G i ) = ∞ for any subsequence (G i ) i∈N of natural numbers, and that for any pair of natural numbers G and G ′ . Thus, the sets E G satisfy the desired properties.
• Step 3: (C) =⇒ (D). For any G ∈ N let E G ⊂ Ω be as in (C), and let K G be a finite collection of disjoint balls from {B i : i ≥ G} that constitute E G , that is (28) holds. Observe that the same argument leading to (25) shows that for any pair of natural numbers G and G ′ Now, let {L s } s∈N be the sub-sequence of {B i } i∈N balls corresponding to the subcollections {K G i : i ∈ N}, where the sequence of natural numbers G 1 , G 2 , . . . is defined in the same way as within Step 4 of the proof of Proposition 1. For M ∈ N, Then, the same argument used within Step 4 of the proof of Proposition 1, shows that the sequence {L s } s∈N satisfies the desired properties.

Proof of Theorem 2
The sufficiency side of Theorem 2 is an immediate consequence of the "moreover" part of Lemma LBC. Thus we only have to prove the necessity side. This would clearly follow on mimicking the proof of Proposition 1 if we could establish the analogue of Part (C) from (11) which trivially follows from our working assumption that µ(E ∞ ) = 1. Thus, with this in mind, let B be any ball in Ω centred in supp µ. In particular, we have that In view of the 5r covering lemma (Lemma 1), there exists a disjoint sub-family G such that

It follows that
However, since $\mathcal{G}$ is a disjoint collection of balls centred in supp $\mu$, we have that The sum in (22) is convergent and therefore there exists a finite sub-collection $K_{G,B} \subset \mathcal{G}$ for which In view of (30) and (31) the collection $K_{G,B}$ satisfies (12) with $\kappa = \frac{1}{2\lambda^3}$. As already mentioned above, to complete the proof we simply replicate Steps 3 & 4 in the proof of Proposition 1. It remains to note that within the sequence of balls arising at Step 4 there may (and most likely will) be finite disjoint collections of balls arising from the same set $E_i$. These can be grouped together in an obvious manner to form the sequence $(L_{s,B})_{s\in\mathbb{N}}$ as required in the statement of Theorem 2.

Examples of applications
In this section we will provide two basic examples showing the conclusions of our results in action. We wish to emphasize that the applications we discuss in this section are not new -they have been chosen to demonstrate the key principles in a relatively simple format. New interesting recent applications can be found, for example, in [23]. We start with an explicit application utilising the power of trimming within a proof of Khintchine's theorem. The proof we provide is not entirely new but, to the best of our knowledge, is simpler, due to some technical simplifications, than the existing published proofs. At the same time it leads to a slightly stronger statement than the standard one. For obvious reasons, W (ψ) is usually referred to as the set of ψ-well approximable numbers. Khintchine's fundamental theorem [31] in the theory of metric Diophantine approximation dates back to 1924 and it provides an elegant criterion for the 'size' of W (ψ) expressed in terms of one-dimensional Lebesgue measure λ.
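Theorem K. Let $\psi : \mathbb{N} \to (0, +\infty)$ be such that $\psi(q)/q$ is monotonically decreasing. Then
$\lambda(W(\psi)) = \begin{cases} 0 & \text{if } \sum_{q=1}^{\infty}\psi(q) < \infty, \\ 1 & \text{if } \sum_{q=1}^{\infty}\psi(q) = \infty. \end{cases}$
Remark 6. The above statement of Khintchine's Theorem is in fact a slightly stronger form of the standard modern version [6] in which $\psi(q)$ is assumed to be monotonically decreasing.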
The convergence part of Khintchine's theorem is an immediate consequence of Lemma CBC on noting that $\lambda(E_q) \le 2\psi(q)$. It does not require the monotonicity assumption or indeed any other additional assumptions. In turn, the modern-day proofs of the divergence part of Khintchine's theorem exploit the principles set out in the main theorems of this paper. For $q \in \mathbb{N}$ and $p \in \mathbb{Z}$ with $0 \le p \le q$ define the balls (intervals) in $\mathbb{R}$ Clearly, $E_q = \bigcup_{p=0}^{q} E_{q,p}$ and so $W(\psi)$ is the lim sup set of the intervals $E_{q,p}$. In view of Cassels' zero-one law [17], $\lambda(W(\psi)) = 1$ if and only if $\lambda(W(\psi)) > 0$. In turn, by Theorem 3, $\lambda(W(\psi)) > 0$ if and only if there exists a subsequence $(L_i)_i$ of $(E_{q,p})_{q\in\mathbb{N},\,0\le p\le q}$ satisfying (15) and (16). The upshot of this is that establishing Khintchine's theorem boils down to finding the "trimmed" subsequence $(L_i)_{i\in\mathbb{N}}$. This can be done in several ways but probably the easiest is to impose the explicit condition that the rational fractions $p/q$ under consideration are reduced; that is, $\gcd(p, q) = 1$. For completeness we present an argument showing the validity of (15) and (16) for this "trimmed" subsequence, a version of which can be found in [46, §I.3].
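To verify (15), we start by observing that there are exactly $\varphi(q)$ (the Euler function) positive integers $p \le q$ such that $\gcd(p, q) = 1$, and therefore we have that
$\sum_{0\le p\le q,\ \gcd(p,q)=1} \lambda(E_{q,p}) = \frac{2\psi(q)}{q}\,\varphi(q)$.
We shall use the following well-known partial summation formula:
$\sum_{q=1}^{T} a_q b_q = \sum_{q=1}^{T}(a_q - a_{q+1})(b_1 + \dots + b_q) + a_{T+1}(b_1 + \dots + b_T)$,
where $(a_q)_{q\in\mathbb{N}}$ and $(b_q)_{q\in\mathbb{N}}$ are any two sequences of real numbers, and the following well-known asymptotics for the average order of Euler's function:
$\sum_{q=1}^{T}\varphi(q) = \frac{3}{\pi^2}T^2 + O(T\log T)$. (32)
Let $0 < C_1 < 3/\pi^2$. Then, using the fact that $\psi(q)/q$ is decreasing, (32) and the trivial estimate $1 + \dots + q \le q^2$, by the partial summation formula with $a_q = 2\psi(q)/q$, $b_q = \varphi(q)$, we have that for sufficiently large $T$
$\sum_{q=1}^{T}\ \sum_{0\le p\le q,\ \gcd(p,q)=1} \lambda(E_{q,p}) \ge C_1\Bigl(\sum_{q=1}^{T}(a_q - a_{q+1})(1 + \dots + q) + a_{T+1}(1 + \dots + T)\Bigr)$.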
And again by the partial summation formula, this time with $a_q = 2\psi(q)/q$ and $b_q = q$, we get that the above equals $C_1\sum_{q=1}^{T} 2\psi(q)$ for sufficiently large $T$. In particular, this implies (15).
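The partial summation formula used twice above, $\sum_{q=1}^{T}a_qb_q=\sum_{q=1}^{T}(a_q-a_{q+1})(b_1+\dots+b_q)+a_{T+1}(b_1+\dots+b_T)$, can be checked by a short telescoping argument. The partial sums $S_q$ below are notation introduced for this check.

```latex
% Partial sums of (b_q), with the convention S_0 := 0.
\[
S_q := b_1 + \dots + b_q, \qquad b_q = S_q - S_{q-1}.
\]
\begin{align*}
\sum_{q=1}^{T} a_q b_q
  &= \sum_{q=1}^{T} a_q S_q - \sum_{q=1}^{T} a_q S_{q-1}
   = \sum_{q=1}^{T} a_q S_q - \sum_{q=1}^{T-1} a_{q+1} S_q \\
  &= \sum_{q=1}^{T} (a_q - a_{q+1})\, S_q + a_{T+1} S_T .
\end{align*}
% With a_q = 2\psi(q)/q and b_q = q this reads
% \sum_{q \le T} 2\psi(q)
%   = \sum_{q \le T} (a_q - a_{q+1})(1+\dots+q) + a_{T+1}(1+\dots+T).
```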
To verify (16), first observe that if $q < m$, $1 \le p \le q$ and $1 \le k \le m$ then $\lambda(E_{q,p} \cap E_{m,k}) \le \lambda(E_{m,k}) \le 2\psi(m)/m$. Then, for fixed $q < m$ we get that Further, if $x \in E_{q,p} \cap E_{m,k}$ then trivially $|qx - p| < \psi(q)$ and $|mx - k| < \psi(m)$, whence $|pm - qk| \le m|qx - p| + q|mx - k| < m\psi(q) + q\psi(m) \le 2m\psi(q)$, where the last step uses that $\psi(q)/q$ is decreasing. Also, since the fractions $p/q$ and $k/m$ are reduced and different (for we assumed that $m > q$) we must have that $|pm - qk| \ge 1$. Thus the number of $(p, k)$ in the right hand side of (34) is less than or equal to the number of integer points $(p, k)$ satisfying If all such points $(p, k)$ lie on a line, then from the last inequality of (35) we immediately get that their number is $\le 4m\psi(q)$. Otherwise, assuming such points exist, the set of these points has rank 2 and, by (35), lies in the convex body given by $|p| \le q$, $|k| \le m$, $|pm - qk| \le 2m\psi(q)$ which has volume $\le 16m\psi(q)$. In this case, the number of such points is bounded by $32m\psi(q) + 2 \le 36m\psi(q)$ as a consequence of Blichfeldt's theorem [15]. Either way, the right hand side of (34) is bounded by $72\psi(m)\psi(q)$. Clearly, the same holds when $q > m$. Therefore, in view of the divergence sum condition for sufficiently large $T$. Together with (33) this verifies (16) with $C = 73/(4C_1^2)$.
Remark 7. The question regarding the relevance of monotonicity in Khintchine's theorem remained a prominent open problem in probabilistic number theory for nearly 80 years. Indeed, in 1941 Duffin & Schaeffer showed that the monotonicity could not be removed (by providing a counterexample) and they formulated an alternative statement. This attracted much work (by Erdős, Vaaler, Pollington, Vaughan and Harman amongst others) and was eventually proved by Koukoulopoulos & Maynard [35]. All these works used trimming as the basis for their approaches very much in line with the outline above. Of course, the process and implementation of trimming are significantly more sophisticated.
Remark 8. The above example makes use of the power of trimming within the context of Theorem 3, a statement dealing with positive measure. In turn, the "ubiquity" technique [6] represents an example of the power of trimming within the context of Theorem 1, a statement dealing with full measure. In short, the theory of ubiquitous systems provides a general framework for deducing full measure statements for a large class of lim sup sets and in view of Theorem 1, it is not at all surprising that "trimming" plays a central role when developing the theory.
Returning to Theorem K, note that the convergence part implies that $\lambda(W(\tau)) = 0$ for any $\tau > 1$, where for any $\tau > 0$ we write $W(\tau)$ for $W(\psi : q \mapsto q^{-\tau})$. The set $W(\tau)$ is usually referred to as the set of $\tau$-well approximable numbers. The upshot of the above is that for any $\tau > 1$, the set of $\tau$-well approximable numbers is of measure zero and we cannot obtain any further information regarding the 'size' of $W(\tau)$ in terms of Lebesgue measure; it is always zero. Intuitively, the 'size' of $W(\tau)$ should decrease as $\tau$ increases. In short, we require a more delicate notion of 'size' than simply Lebesgue measure. The appropriate notion of 'size' best suited for describing the finer measure theoretic structures of $W(\tau)$ and indeed $W(\psi)$ is that of Hausdorff measures. Let $(\Omega, d)$ be a metric space and let $X$ be a subset of $\Omega$. For $\rho > 0$, a countable collection $\{B_i\}$ of balls in $\Omega$ of radius $r_i \le \rho$ for each $i$ such that $X \subset \bigcup_i B_i$ is called a $\rho$-cover for $X$. Let $s$ be a non-negative number and define $\mathcal{H}^s_\rho(X) := \inf \sum_i r_i^s$, where the infimum is taken over all possible $\rho$-covers of $X$. For further details concerning Hausdorff measure and dimension see [26,30,37].
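For orientation, the standard way of passing from the quantity at scale $\rho$ to the Hausdorff measure and dimension, as found in the references just cited, is recorded below. The notation $\mathcal{H}^s_\rho$ for the quantity at scale $\rho$ is an assumption of this sketch.

```latex
% \mathcal{H}^s_\rho(X) increases as \rho decreases, so the limit exists in [0, \infty].
\[
\mathcal{H}^s_\rho(X) := \inf\Bigl\{ \sum_i r_i^{\,s} : \{B_i\} \text{ is a } \rho\text{-cover of } X \Bigr\},
\qquad
\mathcal{H}^s(X) := \lim_{\rho\to 0}\mathcal{H}^s_\rho(X) = \sup_{\rho>0}\mathcal{H}^s_\rho(X).
\]
% The Hausdorff dimension is the critical exponent at which \mathcal{H}^s(X)
% drops from infinity to zero.
\[
\dim_{\mathrm H} X := \inf\{\, s \ge 0 : \mathcal{H}^s(X) = 0 \,\}.
\]
```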
The following statement is a Hausdorff measure analogue of Khintchine's Theorem. It provides an elegant criterion for the 'size' of the set W (ψ) expressed in terms of the measure H s . The convergent part is an immediate consequence of the natural generalization of Lemma CBC to Hausdorff measures (see for example [13,Lemma 3.10]). As with Khintchine's theorem, the main substance is very much the divergence part.
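Theorem K-J. Let $\psi : \mathbb{N} \to (0, +\infty)$ be such that $\psi(q)/q$ is monotonically decreasing and let $s \in (0, 1]$. Then
$\mathcal{H}^s(W(\psi)) = \begin{cases} 0 & \text{if } \sum_{q=1}^{\infty} q^{1-s}\psi(q)^s < \infty, \\ \mathcal{H}^s([0,1]) & \text{if } \sum_{q=1}^{\infty} q^{1-s}\psi(q)^s = \infty. \end{cases}$
Recall that $\mathcal{H}^1 = \frac{1}{2}\lambda$ and so when $s = 1$ the above reduces to Theorem K. When $s < 1$, the above Hausdorff measure statement is essentially due to Jarník and dates back to 1931. Note that in this case $\mathcal{H}^s([0, 1]) = \infty$ and Jarník's Theorem (i.e. Theorem K-J with $s < 1$) implies that $\dim_{\mathrm H} W(\tau) = \frac{2}{1+\tau}$ for $\tau > 1$. Hence the 'size' of $W(\tau)$ decreases as $\tau$ increases which is in line with our intuition. For further details and a gentle introduction to the theory of metric Diophantine approximation see [9].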
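As a worked instance of how a dimension statement follows from a criterion of this type, take $\psi(q) = q^{-\tau}$ with $\tau > 1$ and assume the criterion is governed by the convergence or divergence of $\sum_q q^{1-s}\psi(q)^s$, which is the sum appearing in the deduction of Jarník's Theorem later in the paper:

```latex
\[
\sum_{q=1}^{\infty} q^{1-s}\,\psi(q)^{s}
 = \sum_{q=1}^{\infty} q^{\,1-s-\tau s}
 \quad\text{diverges}
 \iff 1 - s(1+\tau) \ge -1
 \iff s \le \frac{2}{1+\tau}.
\]
% Note that 2/(1+\tau) < 1 because \tau > 1, so these values of s lie in (0,1).
% Hence \mathcal{H}^s(W(\tau)) = \infty for s \le 2/(1+\tau) and
% \mathcal{H}^s(W(\tau)) = 0 for s > 2/(1+\tau), which gives
\[
\dim_{\mathrm H} W(\tau) = \frac{2}{1+\tau} \qquad (\tau > 1).
\]
```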
The second application of our results constitutes the key element of the so-called Mass Transference Principle which enables us to deduce Theorem K-J from Theorem K. At first glance this seems rather odd since Hausdorff measures are regarded as a natural refinement of Lebesgue measure.

The power of full measure: Mass Transference Principle
The second key example exhibits the power of full measure. To set the scene, let $(\Omega, \mathcal{A}, \mu, d)$ be a locally compact metric measure space equipped with a Borel regular probability measure $\mu$. Without loss of generality we will assume that $\Omega$ is the support of $\mu$. With this in mind, suppose there exist constants $\delta > 0$, $0 < a \le 1 \le b < \infty$ and $r_0 > 0$ such that
$a\,r^{\delta} \le \mu(B) \le b\,r^{\delta}$ (36)
for any ball $B = B(x, r)$ with $x \in \Omega$ and radius $r \le r_0$. Such a measure is said to be Ahlfors $\delta$-regular. It is well known that if $\Omega$ supports an Ahlfors $\delta$-regular measure $\mu$, then $\dim_{\mathrm H} \Omega = \delta$ and moreover that $\mu$ is strongly equivalent to $\delta$-dimensional Hausdorff measure $\mathcal{H}^\delta$; see [26,30,37] for details. The latter simply means that there exists a constant $C \ge 1$ such that $C^{-1}\mu(E) \le \mathcal{H}^\delta(E) \le C\,\mu(E)$ for every $\mu$-measurable subset $E$ of $\Omega$ and so (36) is equally valid with $\mu$ replaced by $\mathcal{H}^\delta$, possibly with different constants. Also note that it is easily verified that an Ahlfors $\delta$-regular measure is a doubling measure. Finally, throughout this section, given $s > 0$ and a ball $B = B(x, r)$ we define the scaled ball $B^s := B(x, r^{s/\delta})$. Note, by definition $B^\delta = B$ and if $r < 1$ and $s < \delta$ then $B^s$ is a scaled up ball.
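The doubling property mentioned above can be verified in one line, assuming the Ahlfors regularity condition has the usual two-sided form $a\,r^\delta \le \mu(B(x,r)) \le b\,r^\delta$ for $r \le r_0$ (this is our reading of (36)):

```latex
\[
\mu\bigl(B(x,2r)\bigr) \;\le\; b\,(2r)^{\delta}
 \;=\; \frac{2^{\delta} b}{a}\; a\,r^{\delta}
 \;\le\; \frac{2^{\delta} b}{a}\;\mu\bigl(B(x,r)\bigr)
 \qquad (0 < r \le r_0/2).
\]
% So \mu is doubling with constant \lambda = 2^{\delta} b / a \ge 1.
% For a scaled ball B^s := B(x, r^{s/\delta}) (notation assumed here) with
% r^{s/\delta} \le r_0, the same two-sided bound gives
\[
a\,r^{s} \;\le\; \mu(B^{s}) \;\le\; b\,r^{s}.
\]
```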
Let {B i } i∈N be a sequence of balls in Ω with radius r(B i ) → 0 as i → ∞ and suppose that In view of Lemma CBC, it follows that However, now suppose there exists some s > 0 such that the lim sup set associated with the scaled up balls B s i has full measure; that is It turns out that knowing such a full measure statement for the "scaled up" balls enables us to deduce an analogous statement for the original balls. Indeed, the following Mass Transference Principle [11, Theorem 3] allows us to transfer H δ -measure theoretic statements for lim sup subsets of Ω to general H s -measure theoretic statements. Remark 9. Note that by the definition of Hausdorff dimension, Theorem MTP implies that dim H lim sup n→∞ B n ≥ s , and moreover that H s (lim sup n→∞ B n ) = ∞ if s < δ.
With reference to Proposition 1, the key towards establishing the Mass Transference Principle is to make use of the fact that the full measure statement (A) implies the existence of the finite sub-collection K G,B of balls satisfying (C). In [11], this implication is explicitly the subject of Section 4. In short it provides deep information regarding the local distribution of the centres of the balls under consideration. This is very much at the heart of the "optimal" Cantor construction carried out in [11,Section 5] that enables one to show that H s (lim sup n→∞ B n ) = ∞ (= H s (Ω)) if s < δ. The Cantor construction itself is more technical rather than innovative -the existence of the collection K G,B is the crux! Remark 10. There have been a steady series of works [1,2,12,32,40,47,48,50] that extend the Mass Transference Principle in numerous directions, such as to systems of linear forms, iterated function schemes and large intersection sets. For an overview of the first ten years after Theorem MTP, we refer the reader to the review article [3]. The more recent work of Wang & Wu [47] is particularly notable in that it deals with lim sup sets defined via rectangles rather than simply balls. It is well worth stressing that all the above cited variants of Theorem MTP have at their heart a common feature. In one form or another, they all exploit the fact that any full measure statement such as (A) in Proposition 1 implies the existence of the finite sub-collection K G,B of balls satisfying (C).
We bring this section to a close by using Theorem MTP to show that within the world of classical metric Diophantine approximation as described in §3.1, the Lebesgue theory of lim sup sets underpins the general Hausdorff theory. This is rather surprising since the latter theory is regarded to be a subtle refinement of the former.
The claim is that in view of the Mass Transference Principle we have that Khintchine's Theorem $\Longrightarrow$ Jarník's Theorem; i.e., Theorem K (which is of course Theorem K-J with $s = 1$) implies Theorem K-J for all $s \in (0, 1)$. First of all let us dispose of the case that $\psi(q)/q \not\to 0$ as $q \to \infty$. Then trivially, $W(\psi) = [0, 1]$ and the result is obvious. Thus, without loss of generality, assume that $\psi(q)/q \to 0$ as $q \to \infty$. With respect to the Mass Transference Principle, let $\Omega = [0, 1]$, $d$ be the supremum norm, $\delta = 1$ and $s \in (0, 1)$. We are given that $\psi(q)/q$ is monotonically decreasing and that $\sum_{q=1}^{\infty} q^{1-s}\psi(q)^s = \infty$. Let $\theta(q) := q^{1-s}\psi(q)^s$. Then it follows that $\theta(q)/q$ is monotonically decreasing and $\sum_{q=1}^{\infty}\theta(q) = \infty$. Thus, Khintchine's Theorem implies that $\mathcal{H}^1(W(\theta)) = \mathcal{H}^1([0, 1])$. It now follows via the Mass Transference Principle that $\mathcal{H}^s(W(\psi)) = \mathcal{H}^s([0, 1]) = \infty$ and this completes the proof of the divergence part of Jarník's Theorem, the main substance of Theorem K-J. As mentioned in §3.1, the convergence part of Theorem K-J is a straightforward consequence of Lemma CBC for Hausdorff measures [13, Lemma 3.10].
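The choice of $\theta$ is dictated by the scaling of balls in the Mass Transference Principle. A short check, assuming that $W(\psi)$ is the lim sup set of intervals of radius $\psi(q)/q$ centred at the rationals $p/q$ and that the scaled ball is $B^s = B(x, r^{s/\delta})$ (both as we read the earlier sections):

```latex
% With \delta = 1, a ball of radius r = \psi(q)/q is scaled up to radius r^s:
\[
r = \frac{\psi(q)}{q}
\;\longmapsto\;
r^{s} = \frac{\psi(q)^{s}}{q^{s}} = \frac{q^{1-s}\,\psi(q)^{s}}{q} = \frac{\theta(q)}{q}.
\]
% So the scaled-up intervals are exactly those defining W(\theta). Moreover,
\[
\frac{\theta(q)}{q} = \Bigl(\frac{\psi(q)}{q}\Bigr)^{s}
\quad\text{is decreasing because } t \mapsto t^{s} \text{ is increasing on } (0,\infty),
\qquad
\sum_{q=1}^{\infty}\theta(q) = \sum_{q=1}^{\infty} q^{1-s}\psi(q)^{s} = \infty .
\]
```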