Global well-posedness of the Maxwell-Klein-Gordon equation below the energy norm

We show that the Maxwell-Klein-Gordon equations in three dimensions are globally well-posed in $H^s_x$ in the Coulomb gauge for all $s>\sqrt{3}/2 \approx 0.866$. This extends previous work of Klainerman-Machedon \cite{kl-mac:mkg} on finite energy data $s \geq 1$, and Eardley-Moncrief \cite{eardley} for still smoother data. We use the method of almost conservation laws, sometimes called the "I-method", to construct an almost conserved quantity based on the Hamiltonian, but at the regularity of $H^s_x$ rather than $H^1_x$. One then uses Strichartz, null form, and commutator estimates to control the development of this quantity. The main technical difficulty (compared with other applications of the method of almost conservation laws) is at low frequencies, because of the poor control on the $L^2_x$ norm. In an appendix, we demonstrate the equations' relative lack of smoothing, a property that presents serious difficulties for studying rough solutions using other known methods.

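We can define the curvature $F_{\alpha\beta}$ of the connection $A$ as the real anti-symmetric tensor $$F_{\alpha\beta} := \partial_\alpha A_\beta - \partial_\beta A_\alpha.$$ The (massless) Maxwell-Klein-Gordon equations for a complex field $\phi$ and a one-form $A_\alpha$ are given by $$\partial^\beta F_{\alpha\beta} = \mathrm{Im}(\phi\,\overline{D_\alpha\phi}); \qquad D_\alpha D^\alpha \phi = 0,$$ which are the Euler-Lagrange equations for the Lagrangian $$\int_{\mathbb{R}^{1+3}} \frac{1}{2} D_\alpha\phi\,\overline{D^\alpha\phi} + \frac{1}{4} F_{\alpha\beta}F^{\alpha\beta}\ dx\,dt.$$ Throughout this paper we follow the convention that repeated indices are summed over their range. (For example, here $\partial^\beta F_{\alpha\beta} := \sum_{\beta=0}^{3} \partial^\beta F_{\alpha\beta}$ for each $\alpha$.) We split $A$ into the temporal component $A_0$ and the spatial component $A := (A_1, A_2, A_3)$. We similarly split the covariant spacetime gradient $D_\alpha$ into the covariant time derivative $D_0 = \partial_t + iA_0$ and the covariant spatial gradient $D := (D_1, D_2, D_3) = \nabla_x + iA$.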
The Maxwell-Klein-Gordon system of equations has the gauge invariance $\phi \to e^{i\chi}\phi$; $A_\alpha \to A_\alpha - \partial_\alpha\chi$ for any (smooth) potential function $\chi : \mathbb{R}^{1+3} \to \mathbb{R}$. From this and some elementary Hodge theory, one can place this system of equations in the Coulomb gauge $\nabla_x \cdot A = 0$. In this gauge the Maxwell-Klein-Gordon equations become the following overdetermined elliptic-hyperbolic system of equations (see [24]): here $\Box$ is the standard d'Alembertian $\Box := \partial_\alpha\partial^\alpha = -\partial_t^2 + \Delta$ and $P := \Delta^{-1} d^* d$ is the spatial Leray projection onto divergence-free vector fields. Observe that $P$ can be written as a polynomial combination of Riesz transforms. We refer to the system (3)-(7) collectively as (MKG-CG). We shall write $\Phi := (A_0, A, \phi)$ to denote the entire collection of fields in (MKG-CG), and use $\Phi := (A, \phi)$ to isolate the "hyperbolic" or "dynamic" component of these fields. (From (3) we see that $A_0$ obeys an elliptic equation rather than a hyperbolic one.) We can study the Cauchy problem for (MKG-CG) by specifying the initial data$^1$ $\Phi[0]$. Although we specify initial data for $A_0$, it is essentially redundant (assuming some mild decay conditions on $A_0$ at infinity) since by (3) In Section 3 we show how these conditions allow $A_0$ to be reconstructed from $\Phi$.
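The gauge invariance stated above can be checked directly from the definition $D_\alpha = \partial_\alpha + iA_\alpha$. The computation below is a sketch of that verification. It assumes the usual curvature formula $F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha$, and the primed symbols $\phi'$, $A'_\alpha$, $D'_\alpha$, $F'_{\alpha\beta}$ are notation introduced here for the transformed fields.

```latex
% Transformed fields (notation introduced for this sketch):
%   \phi' := e^{i\chi}\phi,   A'_\alpha := A_\alpha - \partial_\alpha\chi.
\begin{align*}
D'_\alpha \phi'
  &= (\partial_\alpha + iA_\alpha - i\,\partial_\alpha\chi)\,(e^{i\chi}\phi) \\
  &= e^{i\chi}\bigl(\partial_\alpha\phi + i(\partial_\alpha\chi)\phi
       + iA_\alpha\phi - i(\partial_\alpha\chi)\phi\bigr) \\
  &= e^{i\chi} D_\alpha\phi, \\
F'_{\alpha\beta}
  &= \partial_\alpha(A_\beta - \partial_\beta\chi)
     - \partial_\beta(A_\alpha - \partial_\alpha\chi)
   = \partial_\alpha A_\beta - \partial_\beta A_\alpha
   = F_{\alpha\beta}.
\end{align*}
```

Covariant derivatives of $\phi$ therefore only pick up the phase $e^{i\chi}$, and the curvature is unchanged, since mixed partial derivatives of $\chi$ commute.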
Remark 1.2. If we ignore the "elliptic part" of the equations, then heuristically the system (MKG-CG) takes the schematic form$^2$ (9) $\Box\Phi = Ø(\Phi\nabla_{t,x}\Phi) + Ø(\Phi\Phi\Phi)$ although this caricature does not capture the full structure of the equation. Indeed, in [19] it was observed that the interplay between (7) and the bilinear terms in (5), (6) allows us to write the most important components of the quadratic portion $Ø(\Phi\nabla_{t,x}\Phi)$ of the nonlinearity in terms of the null forms $Q_{jk}(\varphi, \psi) := \partial_j\varphi\,\partial_k\psi - \partial_k\varphi\,\partial_j\psi$. We shall return to this point in Section 6.
$^1$ Here and in the sequel we use $\phi[t]$ as short-hand for $(\phi(t), \phi_t(t))$.
$^2$ The Ø() notation is made precise in the second paragraph of Section 6 below. For now, the notation can be taken to mean "terms that look like".
For any$^3$ $s > 1/2$, define the norm $[H^s]$ on initial data$^4$ by where $H^s_x$ is the usual inhomogeneous Sobolev norm $\|u\|_{H^s_x} := \|\langle\nabla\rangle^s u\|_{L^2_x}$, not to be confused with the homogeneous Sobolev norm $\|u\|_{\dot H^s_x}$. Here and in the sequel we use $\langle x\rangle$ as short-hand for $(1 + |x|^2)^{1/2}$. We refer to $[H^1]$ in particular as the energy class.
Remark 1.4. The energy class $[H^1]$ is almost the $H^1_x \times L^2_x$ norm of $\Phi[0]$, a difference being that we do not place $A_0$ in $L^2_x$. Indeed, even if $\Phi$ is smooth and compactly supported, we see from (3) and the fundamental solution of the Laplacian that $A_0$ might only decay as fast as $O(1/|x|)$ at infinity, which is not in $L^2_x$. Thus we see that the non-local nature of the Coulomb gauge causes some difficulties$^5$ with the low frequency component of $A_0$. Although these difficulties will cause much technical inconvenience, they are not the main enemy in the low regularity theory, and we recommend that the reader ignore all mention of low frequency issues at a first reading. In particular, the reader should initially ignore the technical distinctions between the inhomogeneous Sobolev norm $H^s_x$ and the homogeneous counterpart $\dot H^s_x$.
1.5. Prior results. The following local and global well-posedness results are known. Regarding global solutions, if the initial data $\Phi[0]$ is smooth and obeys the compatibility conditions (8), then there is a unique smooth global solution of (MKG-CG) with initial data $\Phi[0]$; see [11]. Furthermore, one has global well-posedness in the energy class and above:
$^3$ We remark that the condition $s > 1/2$, besides being natural from scaling considerations, is also important for making sense of the non-linearity $Ø(\Phi\nabla_{t,x}\Phi)$ and the compatibility conditions (8), since when $s < 1/2$ one cannot make sense of a product of an $H^s_x$ function and an $H^{s-1}_x$ function, even in the sense of distributions. We do not consider here the delicate issue of what happens at the critical regularity $s = 1/2$.
$^4$ As is usual for wave equations, regularity in time and regularity in space are essentially equivalent, so we always expect $\partial_t\Phi$ to have one lower degree of spatial regularity than $\Phi$.
$^5$ On the other hand, we do not have these issues with the time derivative $\partial_t A_0$. Indeed, if $\Phi(0) \in H^1_x$ then from (4) and some Sobolev embedding we see that $\nabla_x\partial_t A_0 \in L^{6/5}$, and hence that $\partial_t A_0 \in L^2_x$.
This Hamiltonian is clearly non-negative. In the Coulomb gauge $\nabla_x \cdot A = 0$ the Hamiltonian turns out to be roughly equivalent to $\|\Phi[t]\|^2_{[H^1]}$; see Section 4. Theorem 1.6 includes also local well-posedness in $[H^s]$ in the range $s \geq 1$. This condition for local well-posedness has been lowered to $s > 3/4$ by Cuccagna [10] (see also Theorem 5.1 below), and down to the near-optimal value of $s > 1/2$ in [30] (see also a similar result for a model problem in [25], and [27], [34] for analogous results in higher dimensions). Our results here shall rely primarily on the local theory in [10] and not on the more sophisticated techniques in [30].
1.7. Main result. The purpose of this paper is to consider the corresponding question of global well-posedness below the energy class. Our main result is
Theorem 1.8. Theorem 1.6 also holds in the range $1 > s > \sqrt{3}/2$.
Remark 1.9. This result was announced previously by the first and third authors in the smaller range $1 > s > 7/8$. Prior to the finalization of this paper, we had announced this result for the improved range $s > 5/6$, but some of the estimates used in that argument turned out to be incorrect.
These results can be compared with the theory for the nonlinear wave equation (compare with (9)). This equation has the same scaling (10) as (MKG-CG) but is simpler due to the lack of derivatives in the nonlinearity. For this equation one has local well-posedness all the way down to the critical regularity $s \geq 1/2$ (though at $s = 1/2$ the time of existence depends on the profile of the data, and not just its norm), and global well-posedness for small $\dot H^{1/2}$ data, see e.g. [29,14,36]. For large data global well-posedness is known for $s \geq 13/18$ [31], extending previous work that had gotten to $s > 3/4$ [17], [13], [2]. Since the local well-posedness theory for (MKG-CG) has been improved in [30] to nearly match that for (NLW), one might therefore hope to improve the results here to, say, $s > 3/4$, although this is by no means automatic and we will not do so here given that the argument is already quite lengthy.
Our proof proceeds by the method of almost conservation laws, sometimes called the "I-method", introduced in [16] and in the earliest versions of the present work (see e.g. [3] for use of the method in a more straightforward context than the present one). The basic idea is to introduce a special smoothing operator $I = I_N$ of order $1 - s$ depending on a large parameter $N$, and then consider the quantity $H[I\Phi[t]]$, which turns out to be finite (but large) for $\Phi[t] \in [H^s]$. When $s = 1$ then $I$ is the identity and $H[I\Phi[t]]$ is exactly conserved. When $s < 1$ we do not have exact conservation, but we will be able to show (using a modified local well-posedness theory) that $H[I\Phi[t]]$ is "almost conserved" in that its derivative is very small (indeed, it will be bounded by a negative power of $N$). This will allow us to control the solution for long times (a positive power of $N$). Letting $N$ go to infinity we obtain the result. Unfortunately the operator $I$ maps $H^s_x$ to $H^1_x$ with a large operator norm (like $O(N^{1-s})$), and when $s$ is small this loss of $N^{1-s}$ can overwhelm the almost-conservation of $H[I\Phi[t]]$, which is why we have the rather artificial restriction $s > \sqrt{3}/2$. Subsequent refinements of the "I-method" in [5]-[9] suggest that this restriction can be lowered by adding additional "damping correction" terms to $H[I\Phi[t]]$ to reduce the size of the derivative, but we will not pursue these matters here. Certainly we do not expect $\sqrt{3}/2$ to be the sharp threshold of global well-posedness. (For instance, many, though unfortunately not all, of the components of the argument are also valid in the regime $s > 5/6$, and some parts are even valid in the range $s > 3/4$.) These results are similar to those of the earlier work of Bourgain [1] and later authors in obtaining global well-posedness for nonlinear wave, Schrödinger and KdV-type equations below the energy norm. 
However the methods are slightly different; instead of using a smoothing operator I, the method of Bourgain relies on truncating the solution u at frequency N into a low frequency component and a high frequency component, and controlling the evolution of the two components separately (except for some periodic adjustments at regular intervals). This approach gives much better control on the solution (for instance, the method shows the high frequencies behave almost like the linear flow), but requires "extra smoothing estimates" on the nonlinear component of the solution, in particular placing that component in the energy class even when the solution is in a rougher Sobolev space. For the equation (MKG-CG) these extra smoothing estimates are not available for the worst term in the nonlinearity, namely P(A) · ∇ x φ, mainly because of the derivative ∇ x ; indeed in the appendix we will give an argument that shows that this extra smoothing fails for [H s ] solutions for any s < 1. Fortunately, the I-method can circumvent this problem by using commutator estimates as a substitute for extra smoothing estimates. See [38, §3.9] for some further discussion.
1.10. Organisation of the paper. After setting some general notation in Section 2, we describe some useful elliptic estimates for $A_0$ in Section 3. We are then able to investigate the Hamiltonian in Section 4, and show that this Hamiltonian largely controls the $[H^1]$ norm of $\Phi$. This allows us to begin the proof of Theorem 1.8 in Section 5, where we reduce matters to showing a standard local well-posedness result (Theorem 5.1) as well as an almost conservation law (Proposition 5.4) for the modified Hamiltonian $H[I\Phi]$. To achieve these tasks, we rewrite the (MKG-CG) equation in Section 6 in a more schematic form which will be more convenient to manipulate. After recalling some $H^{s,b}$ theory in Sections 7-8, we are then quickly able to establish the local well-posedness result in Section 9. To prove the almost conservation law, we will then need a modified local well-posedness result (Proposition 10.1), established in Section 10. We then differentiate the Hamiltonian in Section 11, leading to a number of commutator terms we need to control. The terms arising from cubic nonlinearities are relatively easy and are dealt with in Section 12. The terms arising from bilinear nonlinearities are rather complicated and we shall deal with them en masse using some specialized notation in Section 13, which allows us to deal with the non-null form bilinear terms in Section 14 and the null form terms in Section 15.
Finally, in the appendix we demonstrate why the system (MKG-CG) does not have the smoothing effect necessary for Bourgain's Fourier truncation method [1] to be applicable.
1.11. Acknowledgements. The first named author was supported by the NSF, and the Sloan and McKnight Foundations. The third author is supported by a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and by the NSF Waterman award. We thank the referee for a careful reading of the paper.

Notation
We fix an exponent $s$, which will usually be in the range $\sqrt{3}/2 < s < 1$. We use $C$ to denote various large constants depending on $s$ and on some other quantities which we will indicate in the sequel. We use $A \lesssim B$ (or $A = O(B)$) to denote the estimate $A \leq CB$ and $A \ll B$ to denote the estimate $A \leq C^{-1}B$. We use $a+$ and $a-$ to denote expressions of the form $a + \varepsilon$ and $a - \varepsilon$, where $0 < \varepsilon = \varepsilon(s) \ll 1$ denotes a small number; the implicit constants $C$ referred to above are allowed to depend on $\varepsilon$. Note that $a$ may be negative; thus for instance $-\frac{1}{2}+ = -(\frac{1}{2}-)$ is a number slightly larger than $-\frac{1}{2}$.
Given any Banach space $X$ and any injective linear operator $T : X \to Y$, let $TX$ denote the Banach space $\{Tu : u \in X\}$ with norm $\|Tu\|_{TX} := \|u\|_X$. If $X$ is a Banach space, we shall use $B(X)$ to denote the unit ball $B(X) := \{f \in X : \|f\|_X \leq 1\}$. Thus $rB(X) = B(rX)$ is the ball of radius $r$, is the unit ball of the Banach space $X + Y$, $TB(X) = B(TX)$ is the unit ball of $TX$, etc. We use the embedding notation $X \subseteq Y$ to denote the estimate $B(X) \subseteq CB(Y)$, or equivalently that $f \in Y$ and $\|f\|_Y \lesssim \|f\|_X$ for all $f \in X$. We will need this rather unusual notation due to our reliance on compound spaces $X + Y$ in some of our later arguments.
We useφ to denote the spatial Fourier transform and define fractional derivative operators in the usual manner: We recall the Sobolev multiplication laws on R 3 . Specifically, we have x whenever s 1 + s 2 ≥ 0, s ≤ min(s 1 , s 2 ) and s < s 1 + s 2 − 3 2 . See e.g. [37]. Of course, the implicit constant here depends on s 1 , s 2 , s. A special case of this inequality is Ifŵ is supported in the region |ξ| 1 thenḢ 1 x is equivalent to H 1 x and the claim follows from (14) (since s > 3/4). Thus we assume thatŵ is supported on the region |ξ| ≪ 1.
Taking the fractional derivative ∇ x 1−s and applying the fractional Leibnitz rule (taking Fourier transforms and assumingû,ŵ to be real and non-negative if desired) we reduce to showing that Since w only has low frequencies, the operator ∇ x 1−s is harmless when applied to w, and it will suffice to prove the latter inequality. But we can use Hölder to measure ∇ x 1−s u in L 3 x and w in L 6 x , and the claim follows from Sobolev and the assumption 3/4 < s ≤ 1.
We shall also use the Sobolev embeddingsḢ x extremely frequently in the sequel. Of course these homogeneous Sobolev embeddings imply various inhomogeneous Sobolev embeddings, e.g. that H s x ⊆ L 3 x whenever s ≥ 1/2.
As mentioned in the introduction, one of the technical difficulties with (MKG-CG) is that it is not always possible to control the low frequency portion$^6$ of the fields $(A_0, \Phi)$ satisfactorily in $L^2_x$ norm. To get around this we shall estimate the low frequencies in other $L^p_x$ norms. In this section we develop some of the theory of frequency-localized Lebesgue spaces.
$^6$ In the introduction, it was only $A_0$ which had difficulty getting into the $L^2$ norm, and the $[H^s]$ norm allowed us to control $\Phi$ in $L^2$. However, for the global existence argument we shall need to rescale the fields $(A_0, \Phi)$ by a large dilation factor $\lambda$. This rescaling is needed to make the (subcritical) Hamiltonian small, but it also makes the (supercritical) $L^2_x$ norm large. While it is possible to continue using the $L^2_x$ norm, it leads to inferior numerology (in particular, the range of possible $s$ is greatly reduced) so we shall avoid doing so, using other (non-supercritical) Lebesgue spaces such as $L^6_x$ as substitutes.
Definition 2.2. If $1 < p \leq \infty$ and $R > 0$, we define the space $L^p_R$ to be the subspace of $L^p(\mathbb{R}^3)$ consisting of those functions whose Fourier support is contained in the ball $|\xi| \leq R$. (We keep the $L^p$ norm structure on this subspace $L^p_R$.) We will use very specific instances of these spaces such as $L^6_1$, $L^3_2$, and $L^\infty_{10}$.
Observe that if R is bounded, then derivatives are bounded on L p R : This is clear since ∇ is equivalent to a standard symbol of order 0 on frequencies |ξ| ≤ R. From this and Sobolev embedding (or Bernstein's inequality) we see that The functions in L p R are thus very smooth (in fact, they are analytic). The p exponent thus does not measure regularity, but instead controls the decay at infinity.
From Hölder's inequality we have (17) $L^p_R \cdot L^q_{R'} \subseteq L^r_{R+R'}$ whenever $1/r = 1/p + 1/q$, since the frequency support of a product is contained in the sum of the frequency supports of the factors.
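As a concrete instance of (17), one can combine two of the specific spaces mentioned in Definition 2.2. The choice of exponents and radii below is ours, purely for illustration.

```latex
% p = 6, R = 1 and q = 3, R' = 2, so 1/r = 1/6 + 1/3 = 1/2 and R + R' = 3.
\[
  L^6_1 \cdot L^3_2 \subseteq L^2_3,
  \qquad
  \|fg\|_{L^2_x} \le \|f\|_{L^6_x}\,\|g\|_{L^3_x},
  \qquad
  \operatorname{supp}\widehat{fg}
    \subseteq \operatorname{supp}\hat f + \operatorname{supp}\hat g
    \subseteq \{|\xi| \le 3\}.
\]
```

The Fourier support statement is just the fact that the Fourier transform of a product is (up to a normalising constant) the convolution of the Fourier transforms.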
In particular, if R is bounded, then functions in L p R are bounded, and so L p R · L 2 ⊆ L 2 . From (15) and the Leibnitz rule we thus have (18) L p R · H s x ⊆ H s x for all integer s ≥ 0. By duality this also holds for integer s ≤ 0. By complex interpolation this thus holds for all real s.
Finally, we prove an "energy estimate" for the $L^p_R$ spaces. Let us restrict spacetime to a slab $[t_0 - \delta, t_0 + \delta] \times \mathbb{R}^3$ for some $0 < \delta \ll 1$ and some time $t_0$. Suppose $u$ is such that Indeed, this follows from the Duhamel formula and the fact that $\nabla_x$, $\cos((t - t_0)\sqrt{-\Delta})$ and $\sin((t - t_0)\sqrt{-\Delta})/\sqrt{-\Delta}$ are equivalent to symbols of order 0 for frequencies $\leq R$ and times $t \in [t_0 - \delta, t_0 + \delta]$.

The elliptic theory of A 0
In this section we develop some elliptic theory for how the connection component A 0 depends on φ and A. We shall establish a smoothing effect that allows us to The equations (3), (4) for a fixed time t can be rewritten as We view (20) as a linear elliptic system for two unknown fields A 0 , A 0,t in terms of data φ, φ t , A; the t subscript here should be viewed as simply a label, thus A 0,t and φ t are not being interpreted here as the time derivatives of A 0 or φ.
Our main result here is as follows.
(we do not assume here that φ, A, φ t solve (MKG-CG) or obey any compatibility conditions). Then there exists unique A 0 ∈ H 1 x and A 0,t ∈ L 2 x obeying (20), and we also have the bounds , and A ′ 0 , A ′ 0,t are the associated solutions to (20), then we have the local Lipschitz bound x ≪ 1) then we can iterate away the linear term A 0 |φ| 2 , and controlling A 0 in terms of φ is straightforward. However we do not assume any smallness condition on φ, and so we must proceed with some care. In particular, we must augment perturbation theory with some variational methods 7 .
Proof. The Schrödinger operator −∆ + |φ| 2 maps H 1 x to H −1 x (using (14)), and is clearly positive definite. From Lemma 2.1 we have Im(φφ t ) ∈ H −1 x . From (20) we thus conclude that A 0 is unique as claimed. The uniqueness for A 0,t is obvious.
To prove the remaining claims in the proposition, it suffices by the usual density arguments to verify the case when φ, φ t , A are smooth and rapidly decreasing in space, which we shall now assume throughout.
From (20) and standard Euler-Lagrange theory, we see that A 0 can be now be constructed as the unique minimizer in H 1 x of the convex functional 8 This gives existence of A 0 . The existence of A 0,t is clear from Hodge theory, since the right-hand side of the second equation in (20) is curl-free. We shall now use this variational formulation to establish the bounds (22), (23).
At first glance it seems we are in trouble when s < 1 because (25) cannot be controlled by the H s In particular, sinceL φ,φt (A 0 ) ≤ 0, we have so by Cauchy-Schwarz we have (21). From Lemma 2.1 see that A 0 verifies (22). The claim for A 0,t is much simpler, following easily from (3), Lemma 2.1, Sobolev and Hölder.
It remains to establish (23). We fix φ, A, φ t , φ ′ , A ′ , φ ′ t (and hence A 0 , A ′ 0 ). From (22) we have (26) A 0 Ḣ1 If we write A 0 = A ′ 0 + h and A 0,t = A ′ 0,t + h t , our task is to show that (27) h Ḣ1 We begin with the estimation of h Ḣ1 x . From the variational characterisation of ) ≤ 0 and thus by the triangle inequality we thus see that it suffices to show that The expression (25) can also be interpreted as the component of the Hamiltonian (13) which depends on A 0 . and similarly for A 0 replaced by A ′ 0 . Using the definition ofL, we may estimate the left-hand side of (29) by Splitting φφ t − φ ′ φ ′ t as a sum of two differences and using (24) and Lemma 2.1, we obtain Also, from (26) and Sobolev we have and from Sobolev we have H s x and the claim (29) follows. This yields the desired bound (27) for h Ḣ1 x . The analogous claim for h t follows from (3), Hölder, Sobolev, and Lemma 2.1 as before. This gives (27) and hence (23) as desired.
Remark 3.3. Heuristically, Proposition 3.1 allows us to eliminate A 0 from (MKG-CG), and think of this system as an evolution purely in Φ. Indeed from the above analysis one morally has A 0 ≈ ∆ −1 (Φ∇ x Φ). However we shall keep A 0 explicit in our computations.
Remark 3.4. Proposition 3.1 asserts that A 0 is somewhat smoother than Φ: it is inḢ 1 x even though φ is merely in H s x . However we cannot place A 0 in H 1 x or even in L 2 x because of the slow decay of A 0 at infinity mentioned earlier.  norm by a smooth Φ ′ [0] which still obeys the divergence-free condition ∇ x · A ′ [0] = 0. Then we construct A ′ 0 (0) as above, and ∂ t A ′ 0 (0) by (4). From (23)

Fixed-time Hamiltonian estimates
Using the elliptic theory for A 0 and the machinery of frequency-localized spaces, we are now ready to understand the Hamiltonian (13).
From (13) and the triangle inequality we have x . From Hölder's inequality and the Sobolev embeddingḢ 1 x ⊆ L 6 x we thus have (30) H  [24], however in that paper some L 2 x control on φ and A was also assumed at time zero. We will not be able to use such control as the L 2 x norm is supercritical 9 and so will behave badly with respect to a rescaling argument which we will use later. Fortunately, we can still obtain good control on Φ without the L 2 x norm, although some odd things happen at low frequencies.
Then we have the estimates Informally, control of the Hamiltonian allows one to place most of A 0 (t) and Φ [t] in H 1 x × L 2 x , except for the low frequency component, which is only in L 6 x or L 3 x . The hypothesis that A 0 , Φ are [H 1 ] functions is a purely qualitative hypothesis; the constants C do not depend on the [H 1 ] norms of these functions.

Proof. From the hypothesis and (13) we have
From (35) and the hypothesis ∇ x ·A(t) = 0 we have A(t) ∈ CB(Ḣ 1 x ). Also, by taking divergence-free and curl-free components of (34) using the hypothesis ∇ . Combining these estimates together we obtain (33). Using the embeddingḢ 1 x ⊆ H 1 x + L 6 1 which comes from applying Sobolev embedding to the low frequencies oḟ H 1 x , we thus see that A 0 and A satisfy the required estimates (31), (32).
It remains to show the corresponding estimates for φ. We begin with the pointwise identity for j = 1, 2, 3. In particular we have the "diamagnetic inequality" |∂ j |φ(t)|| ≤ |D j φ(t)|. 9 One might consider adding a mass term to (MKG-CG) and posing the same global wellposedness questions. It seems likely that one has similar results for the massive (MKG-CG), however the argument would be technically more complicated due to the Schrödinger-like behaviour of low frequencies. Also, the mass term is still supercritical and so this does not solve the difficulties of using the L 2 norm.
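The diamagnetic inequality follows from a short pointwise computation. The sketch below assumes only that $A_j$ is real-valued and that $D_j = \partial_j + iA_j$, as in the introduction. It is offered as a reconstruction of the standard argument, not as a quotation of the identity used in the paper.

```latex
% Since A_j is real, Re(\overline{\phi}\, iA_j\phi) = Re(iA_j|\phi|^2) = 0,
% so the ordinary derivative may be replaced by the covariant one.
\begin{align*}
\partial_j |\phi|^2
  &= 2\,\mathrm{Re}\bigl(\overline{\phi}\,\partial_j\phi\bigr)
   = 2\,\mathrm{Re}\bigl(\overline{\phi}\,D_j\phi\bigr), \\
2|\phi|\,\bigl|\partial_j|\phi|\bigr|
  &= \bigl|\partial_j|\phi|^2\bigr|
   \le 2|\phi|\,|D_j\phi|,
\end{align*}
% hence |\partial_j|\phi|| \le |D_j\phi| wherever \phi \neq 0.
```

On the zero set of $\phi$ the inequality is understood almost everywhere, where $\partial_j|\phi|$ vanishes.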
From (37), the Sobolev embeddingḢ 1 x ⊆ L 6 x , and the trivial observation that |φ(t)| and φ(t) have the same L 6 x norm we thus have (38) φ(t) ∈ CB(L 6 x ). Also, from our estimates on the A j and A 0 and Sobolev embedding we have (39) A 0 (t), A j (t) ∈ CB(L 6 x ). By Hölder we thus have . On the other hand, if we take the divergence of (37) using the hypothesis ∇ (39) and Hölder we have x . From this and the previous we thus have ∆φ(t) ∈ CB(H −1 x ). We now divide φ(t) smoothly into a low frequency component supported on |ξ| ≤ 1, and the remainder supported on |ξ| ≥ 1/2. From (38) and the above equation we see that the low frequency part is in L 6 1 and the remainder is in H 1 x , so φ(t) obeys (31).

Global well-posedness: preliminary reduction
We now begin the proof of Theorem 1.8. Fix √ 3/2 < s < 1, and fix the initial data Φ[0] obeying the hypotheses of the theorem. Let T * denote the maximal time of existence for which one can construct a solution Φ in [H s ]; our objective is to show that T * is infinite.
In Section 9 we shall prove the following local well-posedness result (essentially due to Cuccagna [10]): Remark 5.2. In view of the work of Cuccagna [10], the local existence theorem here should in fact extend to the range s > 3/4, and a possibly weakened version of this local existence theorem should also hold in the range s > 1/2 thanks to the work of Machedon and Sterbenz [30]. However, to avoid technicalities we will restrict ourselves to the case s > 5/6 (which covers the range s > √ 3/2 that our main theorem covers).
Assume this theorem for the moment. Then we have T * > 0. Furthermore, if T * is finite, then Theorem 5.1 forces one to have lim t→T * Φ[t] [H s ] = +∞. Thus if we can prove the polynomial growth bound (12) for t < T * , we will have obtained global well-posedness.
By another application of Theorem 5.1 and a standard limiting argument (using Remark 3.5) we may assume that Φ[0] is smooth and [H 1 ], in which case we have a global smooth and [H 1 ] solution from the results in the introduction ( [11,24]). Thus it will suffice to prove (12) for global smooth solutions.
Henceforth our constants C are allowed to depend on Fix the time T in (12). In view of Theorem 5.1 we may assume T 1. As is usual in applications of the I-method, we will need to rescale the equation using (10), replacing Φ by the rescaled solution for some large λ = λ(T ) ≫ 1 to be chosen later. Note that Φ (λ) also solves (MKG-CG). In order to obtain (12) at time T we will need to control Φ (λ) at time λT .
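The effect of the rescaling on the relevant norms can be made explicit. The display (10) is not reproduced in this section, so the sketch below assumes the standard scaling for wave equations of this type, $\Phi^{(\lambda)}(t,x) := \lambda^{-1}\Phi(t/\lambda, x/\lambda)$, and likewise for $A_0$. This form is an assumption on our part, though it is consistent with the critical regularity $s = 1/2$ and with the need to control $\Phi^{(\lambda)}$ at time $\lambda T$.

```latex
% Assumed scaling at fixed time: u^{(\lambda)}(x) := \lambda^{-1} u(x/\lambda) on R^3.
\begin{align*}
\widehat{u^{(\lambda)}}(\xi)
  &= \lambda^{2}\,\hat u(\lambda\xi), \\
\|u^{(\lambda)}\|_{\dot H^s_x}^2
  &= \int_{\mathbb{R}^3} |\xi|^{2s}\,\lambda^{4}\,|\hat u(\lambda\xi)|^2\,d\xi
   = \lambda^{1-2s}\,\|u\|_{\dot H^s_x}^2, \\
\|u^{(\lambda)}\|_{\dot H^s_x}
  &= \lambda^{\frac{1}{2}-s}\,\|u\|_{\dot H^s_x}.
\end{align*}
% s = 1/2: scale-invariant (critical).
% s = 1:   factor \lambda^{-1/2} (sub-critical, small for large \lambda).
% s = 0:   factor \lambda^{+1/2} (super-critical, large for large \lambda).
```

Under this assumption a quantity that is quadratic in $\dot H^1_x$-type norms, such as the leading part of the Hamiltonian, shrinks like $\lambda^{-1}$, while the $L^2_x$ norm grows like $\lambda^{1/2}$.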
We would like to use the Hamiltonian H[Φ (λ) [t]] defined in (13). Unfortunately we do not have enough regularity on Φ to ensure this Hamiltonian is finite since s < 1.
(On the other hand, A 0 has enough regularity thanks to (22).) To get around this difficulty we shall use the method of almost conservation laws.
We pick a large number $N = N(T) \gg 1$ to be specified later. Let $m(\xi)$ be a smooth radial positive symbol such that $m(\xi) = 1$ for $|\xi| \leq N$ and $m(\xi) = |\xi|^{s-1}/N^{s-1}$ for $|\xi| > 2N$, and let $I$ be the Fourier multiplier with symbol $m$, that is, $\widehat{If}(\xi) := m(\xi)\hat f(\xi)$. Thus $I$ is the identity for bounded$^{10}$ frequencies $|\xi| \ll N$ and is smoothing of order $1 - s$ for high frequencies $|\xi| \gtrsim N$. Observe that the convolution kernel of $I$ is integrable, thus $I$ is bounded on every translation-invariant Banach space. Furthermore, we have the smoothing estimates
$^{10}$ We have two frequency cutoffs in our argument, one at 1 and one at $N$. To avoid confusion as to what "low" and "high" frequency are, we refer to frequencies $|\xi| \lesssim 1$ as low, frequencies $1 \ll |\xi| \ll N$ as medium, and frequencies $|\xi| \gtrsim N$ as high. We will also refer to frequencies $|\xi| \ll N$ as bounded, and frequencies $|\xi| \gg 1$ as local.
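The size of the loss incurred by $I$ can be read off from the symbol $m$. The following sketch assumes in addition that $0 < m \leq 1$ on the transition region $N \leq |\xi| \leq 2N$ (a normalisation not stated explicitly above), and that $N \geq 1$ and $s < 1$.

```latex
% Pointwise comparison of the symbols m(\xi)<\xi> and <\xi>^s.
\begin{align*}
|\xi| \le 2N:&\quad
  m(\xi)\langle\xi\rangle
  \le \langle\xi\rangle^{1-s}\langle\xi\rangle^{s}
  \le \langle 2N\rangle^{1-s}\langle\xi\rangle^{s}
  \lesssim N^{1-s}\langle\xi\rangle^{s}, \\
|\xi| > 2N:&\quad
  m(\xi)\langle\xi\rangle
  = N^{1-s}\,|\xi|^{s-1}\langle\xi\rangle
  \lesssim N^{1-s}\langle\xi\rangle^{s}, \\
\text{hence}&\quad
  \|Iu\|_{H^1_x} \lesssim N^{1-s}\,\|u\|_{H^s_x}.
\end{align*}
```

This is the $O(N^{1-s})$ operator norm of $I$ from $H^s_x$ to $H^1_x$ mentioned in the introduction.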
] as a substitute for the Hamiltonian. Unfortunately, the loss of N 1−s in (41) would make the modified Hamiltonian large. However, the scaling parameter λ (combined with the fact that the energy regularity H 1 x is sub-critical) can be used to rescale the Hamiltonian to be small again. More precisely, we have Proof. Observe that The former estimate is easy, in fact by (42), (22) we have For the latter estimate we use (42): Thus we can make the modified Hamiltonian small at time zero. In order to control the modified Hamiltonian at later times we use the following almost conservation law for the modified Hamiltonian: Proposition 5.4 (Almost conservation law). Let (A 0 , Φ) be a global smooth solution to (MKG-CG), and suppose t 0 is a time such that Then we have Remark 5.5. The error of O 1 N (s−1/2)− corresponds to the restriction s > √ 3/2, but is not optimal. In particular it seems feasible that one could improve this error to O(N −1/2+ ), which would in principle allow us to obtain global well-posedness for s > 5/6. For the equation (NLW), an error of O(N −1+ ) is attainable, which corresponds to the regularity s > 3/4 (cf. [17]). By combining this conservation law with additional techniques, the global well posedness of (NLW) was extended to the range s > 13/18 in [31], with a further gain to s > 7/10 in [32] (in the spherically symmetric case).
The proof of Proposition 5.4 is rather lengthy and (together with Theorem 5.1) will occupy Sections 10-15. For now, we see how this proposition implies (12) and hence Theorem 1.8.
A little algebra shows that we can choose λ ≫ 1, N ≫ 1 so that (47) and (43) simultaneously hold, so long as s > √ 3/2. Furthermore, both λ and N are at most polynomial in T .
To finish up we use an integration in time argument inspired by a similar argument from [19]. From (46) and Lemma 4.1 we have that . Using the fundamental theorem of calculus . Splitting Φ smooth into low frequencies |ξ| ≤ 4 and a remainder term |ξ| ≥ 2 and using (48) we obtain . On the other hand, from Lemma 4.1 we have . Multiplying the two using the Sobolev embedding H 1 x ). Applying the fundamental theorem of calculus and (49) again we obtain Combining this with (48) again we obtain Combining this with (22) we obtain . Undoing the scaling we thus obtain (12) as desired. This proves Theorem 1.8.
It remains to show Theorem 5.1 and Proposition 5.4. This will occupy the remainder of the paper.

A caricature for MKG-CG
The system (MKG-CG) may appear excessively complicated, due to vector structures, Riesz transforms, complex conjugates, and constants such as 2i. To clean up some of the clutter we shall adopt some notational conventions to reduce (MKG-CG) to a "caricature" form, which we will then use to prove both Theorem 5.1 and Proposition 5.4.
We adopt the convention that if A is a scalar-, vector-, or tensor-valued quantity, then Ø(A) denotes an expression which is schematically of the form A, or more precisely a finite linear combination of expressions of the form T i Re(A i ) and T ′ i Im(A i ), where A i denotes the various components of A and T i , T ′ i either denote constants or Riesz transforms (which arise due to the presence of the Leray projection P). We recall the well-known fact from Calderón-Zygmund theory (see e.g. [35]) that these operators are bounded on L p x for every 1 < p < ∞. We can then define quadratic schematic expressions Ø(AB) and cubic ones Ø(ABC) by using the convention that AB denotes the tensor product of A and B (viewed as real tensors rather than complex, thus for instance Re(A)Im(B) = Ø(AB)), etc. For example, we have and we can therefore rewrite (MKG-CG) in the caricature form where the bilinear and trilinear nonlinearities N 0 , N 1 , N 2 , N 3 are defined as the tensors Remark 6.1. The cubic nonlinearity N 3 is relatively easy to deal with. The nonlinearity N 2 would be dangerous if it were present in the "hyperbolic" equation for 2Φ, but fortunately only affects the "elliptic" equation for A 0 , which has better smoothing effects with which to handle this nonlinearity. The nonlinearity N 1 is tractable due to the high regularity of A 0 . The null form N 0 = N 0 (Φ, Φ) is perhaps the most interesting. It is a special case of the more general quadratic form N 2 , or more precisely However, one can express N 0 more carefully as See [24] for more details.
Remark 6.2. The equation is sometimes used as a simplified model for (MKG-CG) (and also for Yang-Mills equations in the Coulomb gauge); see e.g. [25], [27]. However we will not use this model equation here.
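To make the role of the Riesz transforms in the Ø(·) notation concrete, the following sketch records the Fourier symbol of the Leray projection P. The normalization R_j := ∂_j |∇_x|^{-1} is an assumption made here for illustration; other sign conventions are in common use.

```latex
% Assumed convention (not fixed in the text): R_j := \partial_j |\nabla_x|^{-1},
% so that R_j has Fourier symbol i \xi_j / |\xi| and R_j R_k has symbol -\xi_j \xi_k / |\xi|^2.
\begin{align*}
\widehat{(PA)_j}(\xi) &= \Big( \delta_{jk} - \frac{\xi_j \xi_k}{|\xi|^2} \Big) \widehat{A_k}(\xi), \\
(PA)_j &= A_j + R_j R_k A_k .
\end{align*}
% Consistency checks:
%   \xi_j \widehat{(PA)_j} = (\xi_k - \xi_k) \widehat{A_k} = 0, so \nabla_x \cdot PA = 0;
%   if \nabla_x \cdot A = 0 then \xi_k \widehat{A_k} = 0, hence PA = A.
```

In particular each component of PA is a finite linear combination of constants and (compositions of) Riesz transforms applied to the components of A, which is what allows one to write PA = Ø(A) up to bounded operators on L^p_x, 1 < p < ∞.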

Function spaces
We now recall some notation for the function spaces we shall use to control the nonlinear expressions N 0 , N 1 , N 2 , N 3 properly, which will be useful both for proving Theorem 5.1 and Proposition 5.4.
Given a spacetime function φ : R × R 3 → C, we use φ̃ to denote the spacetime Fourier transform of φ. Of course, the spacetime Fourier transform only makes sense if φ is defined globally on R × R 3 (as opposed to a spacetime slab such as [0, T ] × R 3 ). In practice this difficulty is avoided by using the spacetime Fourier transform to define global function spaces, and then defining their local counterparts by restriction.
If X is a Banach space of functions on R 3 , we use L q t X to denote the space of functions whose norm is finite, with the usual modifications when q = ∞; we also let C 0 t X be the space of bounded continuous functions from R to X with the supremum norm. In particular, we have the mixed Lebesgue spaces L q t L r x and the energy spaces x . These spaces localise to spacetime slabs I × R 3 in the obvious manner.
For any s, b ∈ R, we denote the space 11 We now formalize the well-known fact that H s,1/2+ functions are "averages" of free H s x solutions to the wave equation (see e.g. [33,Proposition 7] or [38, Lemma 2.9]). Lemma 7.1 (H s,b decomposes into free solutions). Let φ ∈ H s,b for some b > 1/2. Then for each λ ∈ R there exists a global solution φ λ to the free wave equation 1 for all t, and a co-efficient a(λ) ∈ R, such that for all t, and such that where the implicit constant can depend on b.
Proof. Without loss of generality we may assume that the spacetime Fourier transformφ is supported on the upper half-space {(τ, ξ) : τ ≥ 0}. We then write where δ is the Dirac delta. If we then define we see that all the relevant properties are easily verified except perhaps for the L 1 λ bound on a, which we compute using Cauchy-Schwarz: As a particular consequence of this lemma, we see that if one can imbed H s x × H s−1 x free solutions in a spacetime Banach space X which is invariant under time modulations φ(t) → φ(t)e itλ , then one can also imbed H s,1/2+ solutions into the same space. In particular we have for any s ∈ R. Also, from Strichartz' estimate (see e.g. [15], [36], and the references therein) and Lemma 7.1 we have where the estimate is known to fail (see [19]). If time is localized to an interval, one can also use Hölder in time to lower the q index.
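The Cauchy-Schwarz step in the proof of Lemma 7.1 can be sketched as follows. The normalization of the H^{s,b} norm and the particular choice of the coefficient a(λ) below are assumptions made for illustration (one natural choice, not necessarily the one intended in the text); \tilde\phi denotes the spacetime Fourier transform.

```latex
% Assumed normalization:
%   \|\phi\|_{H^{s,b}} := \| \langle\xi\rangle^s \langle |\tau|-|\xi| \rangle^b \tilde\phi \|_{L^2_{\tau,\xi}},
% with \tilde\phi supported on \tau \ge 0, and \lambda := \tau - |\xi|.
% Hypothetical choice of coefficient:
\[
a(\lambda) := \Big( \int_{\mathbb{R}^3} \langle\xi\rangle^{2s} \, |\tilde\phi(\lambda + |\xi|, \xi)|^2 \, d\xi \Big)^{1/2} .
\]
% Cauchy-Schwarz in \lambda, followed by the change of variables \tau = \lambda + |\xi|:
\begin{align*}
\int_{\mathbb{R}} |a(\lambda)| \, d\lambda
 &\le \Big( \int_{\mathbb{R}} \langle\lambda\rangle^{-2b} \, d\lambda \Big)^{1/2}
      \Big( \int_{\mathbb{R}} \langle\lambda\rangle^{2b} a(\lambda)^2 \, d\lambda \Big)^{1/2} \\
 &= C_b \, \|\phi\|_{H^{s,b}},
 \qquad C_b^2 := \int_{\mathbb{R}} \langle\lambda\rangle^{-2b} \, d\lambda < \infty \iff b > 1/2 .
\end{align*}
```

This is where the hypothesis b > 1/2 enters, and it explains why the implicit constant depends on b.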
Finally, we recall I whenever the right-hand side is finite. The implied constant of course depends on σ, b, s.
Note the factor of |I| σ/2 on the right-hand side of (60); this factor will be very convenient for the large data theory. The fact that σ is allowed to be as large as 1−b (rather than 1/2) allows us to reach s > √ 3/2 ≈ .866 rather than s > 7/8 = .875.

Bilinear estimates
To obtain local well-posedness in H s We are now ready to prove the main bilinear estimates needed to handle the nonlinear expressions N 0 , N 1 , N 2 . Proposition 8.1 (Bilinear estimates). Let 3/4 < s < 1. Then we have the estimates where η is any bump function and s ≤ s ′ , s ′′ ≤ 1 are exponents which are not both equal to 1, and A 0 , φ, ψ are arbitrary functions for which the left-hand side makes sense. (Of course, the implicit constants depend on s, η, s ′ , s ′′ .) These estimates were essentially proven in [10], but we sketch a proof here based on the bilinear estimates in [12]. For local existence we need to take s ′ = s ′′ = s, but for global existence we will also need one of s ′ , s ′′ to equal 1 instead.
x and the claim follows from Lemma 7.1. If we then apply similar reasoning to ψ we obtain To establish (61) in the case where ψ has Fourier support in the region |ξ| 1, one applies (66) with ψ replaced by ∇ x ∆ −1 ∇ t,x ψ and then takes traces. The final remaining case of (61) to check is when ψ has Fourier support in the region |ξ| 1. In that case, the norm ∇ t, Again this estimate follows from the previous when φ and ψ both have Fourier support in the region |ξ| 1 (note that multiplication by η(t) is bounded on any H s,b space). Now suppose φ has Fourier support on |ξ| 1. Crudely writing N 0 (φ, ψ) = Ø(φ∇ x ψ) and estimating the H s−1,s−1 norm by the L 2 t H s−1 x norm, we argue as with (61), the only difference being that −1/2+ has been replaced by s−1.
Finally, we prove (62), (63). From the embeddings H s, 3 (57)) and the trivial embedding H s−1,s−1 ⊆ L 2 t H s−1 x , it suffices to show the spatial product estimates The first estimate follows directly from (14). To prove the second we use duality to convert it to If f has Fourier support in the region |ξ| 1 then this again follows from (14), so we may assume f has support in the region |ξ| 1. But then this follows by applying the fractional Leibnitz rule for ∇ x 1−s and the Sobolev embedding

Local existence
We are now finally ready to prove the local existence result, Theorem 5.1.
Remark 9.1. This result is essentially in [10], but for our application we need local well-posedness for large data as well as small. One cannot simply rescale large data to be small because the L 2 x norm is supercritical in (MKG-CG). By localizing in time and modifying the "b" index of the H s,b norms we can control the "hyperbolic" component of the large data evolution for short times. However for the "elliptic" component of the evolution localizing in time does not help. One might try localizing in space, but this is tricky because the non-local Coulomb gauge has destroyed finite speed of propagation, and one would probably be forced to use local Coulomb gauges, cf. [20] and [39]. Fortunately, the variational estimates in Section 3 will allow us to avoid these difficulties.
on this slab; the Theorem then follows by a standard limiting argument, using the remarks at the end of Section 3 to approximate rough data by smooth data.
Define the norm Φ X on the slab [−T, T ] × R 3 by 1 is a small number to be chosen later. We shall show that which will imply (68) by (57) and (23).
To simplify the exposition we shall just prove the bound but the reader may verify that the arguments below can be easily adapted to differences.
We begin with the Φ component of the X norm. By (60) and (50) we have .
From Proposition 8.1 (restricted to [−T, T ] in the obvious fashion) we can control the N 0 , N 1 nonlinearities: On the other hand, from Strichartz (58) we have 14

Also, we have by Sobolev and (22)
Combining these estimates we can control the cubic nonlinearity N 3 : Putting all of this together we obtain By (50) we thus have
Footnote 14. It is here that we crucially make use of the hypothesis s > 5/6. It is likely that the methods of Cuccagna [10] can control the cubic terms N 3 by more sophisticated estimates than Strichartz estimates in the larger range s > 3/4, but we will not need to do so here. We thank the referee for these points.
We estimate the second term using Proposition 8.1 (observing that s+s−2 > −1/2+ since s > 3/4), and we estimate the third term by the computations in (72), to obtain Combining this with our previous bound for Φ we obtain If we now choose ε = ε(M ) sufficiently small, and T = T (ε, M ) sufficiently small, we thus see that there is an absolute constant C such that By standard continuity arguments this implies that Φ X ≤ CM , as desired. The adaptation of this scheme to differences is routine (using (23) instead of (22)) and is left to the reader.

Modified local well-posedness
To finish the proof of Theorem 1.8, it only remains to prove Proposition 5.4. Fix From a continuity argument we may assume a priori that Our objective is to control the modified Hamiltonian at times t close to t 0 . In order to do this we must obtain estimates on Φ away from t 0 . One obvious possibility is to combine (75) and Lemma 4.1; this for instance will give the estimate x on the slab [t 0 − δ/2, t 0 + δ/2]. However these types of estimates (which basically place IΦ in C 0 t [H 1 ]) will not be the right type of estimates (except for the low frequency component) for estimating the change in the Hamiltonian 15 ; it turns out that we need estimates in H s,b spaces, which are not directly controlled by the Hamiltonian.
One might hope to apply Theorem 5.1, since the regularity (44) should be enough to put Φ in [H s ]. However this is inefficient (basically because there is a significant loss in using (41)) and in addition there are some low frequency issues, because of the error terms in Lemma 4.1. So we shall instead require a modified local well-posedness result which is adapted to the estimates arising from Lemma 4.1.
Footnote 15. Basically, the problem is that Sobolev embedding in three dimensions does not allow C 0 t [H 1 ] to control L ∞ x norms, so that nonlinearities such as Φ∇ x Φ cannot be placed in L 2 x norms. To get around this we must use Strichartz embeddings and null form estimates, which in turn necessitates the use of H s,b spaces. We remark that in one dimension one can rely purely on Sobolev embedding and obtain global well-posedness results for nonlinear wave equations below H 1 without using H s,b or Strichartz norms; see [1].
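The obstruction described in the footnote on Sobolev embedding can be checked with a short exponent count. The block below is a sketch using only the standard Sobolev embedding and Hölder's inequality; σ and p are illustrative symbols.

```latex
% Standard fact on R^d: H^\sigma_x \subseteq L^p_x when 0 \le \sigma < d/2 and 1/p = 1/2 - \sigma/d,
% while H^\sigma_x \subseteq L^\infty_x requires \sigma > d/2.
\begin{align*}
d = 3,\ \sigma = 1: &\quad \tfrac1p = \tfrac12 - \tfrac13 = \tfrac16,
   \quad H^1_x \subseteq L^6_x, \quad H^1_x \not\subseteq L^\infty_x \ \ (1 < \tfrac32); \\
\text{H\"older}: &\quad \|\Phi \nabla_x \Phi\|_{L^{3/2}_x} \le \|\Phi\|_{L^6_x} \|\nabla_x \Phi\|_{L^2_x},
   \quad \tfrac16 + \tfrac12 = \tfrac23 ; \\
d = 1,\ \sigma = 1: &\quad 1 > \tfrac12, \quad H^1_x \subseteq L^\infty_x,
   \quad \|\Phi \partial_x \Phi\|_{L^2_x} \le \|\Phi\|_{L^\infty_x} \|\partial_x \Phi\|_{L^2_x} .
\end{align*}
```

So in three dimensions energy-class control only places Φ∇ x Φ in L^{3/2}_x rather than L^2_x, whereas in one dimension the same product lands directly in L^2_x.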

From (44) and Lemma 4.1 we thus have
(recall that I is the identity on low frequencies). We now extend this control at time t 0 to control on [t 0 − δ, t 0 + δ] with the following proposition, which is the main result of this section.
Proposition 10.1 (Spacetime control on Φ). Adopt the assumptions of Proposition 5.4, and suppose in addition K ≫ 1 is a sufficiently large constant. We conclude there is a δ with c(K)N 0− ≤ δ ≤ C(K), such that Remark 10.2. Note that the "b" index is now s− instead of 3/4+. This extra regularity in the b index will turn out to be helpful when proving (45). These types of estimates morally follow from the local well-posedness theory (or more precisely, the multilinear estimates underlying that theory) using tools such as [8, Lemma 12.1], but for various technical reasons it is not feasible to do so directly, and so we have chosen instead the following more pedestrian argument.
Proof. All our spacetime norms here will be on the slab [t 0 − δ, t 0 + δ]. Many of the unpleasant technicalities in the following argument will arise from the low frequency terms L p R , and the reader is advised to ignore all the contributions from these terms in a first reading as they are not the essential difficulty.
By the continuity method and the smoothness of Φ it will suffice to prove this under the a priori assumption (since this will imply that the space of δ for which (79) holds is both open and closed, if δ is restricted to be sufficiently small).
To prove this, we begin by estimating Φ in various auxiliary norms. Split Φ smoothly into a low frequency component Φ low supported on |ξ| ≤ 2, and a local component Φ local supported on |ξ| ≥ 1. From (80) we have . Also, from Sobolev embedding, (57) and the low frequency restriction we have ∂ t Φ low ∈ CKB(C 0 t L 3 2 ) while from (77) and Sobolev embedding we have Φ low (t 0 ) ∈ CB( L 6 1 + L 3 2 ) so by the fundamental theorem of calculus we have . Combining these estimates we obtain (58) and (16) we have in particular that Φ ∈ CKB(L 6 t L 6 x ).
Our next task is to obtain estimates on A 0 . From (81), (57), (41) and Sobolev embedding we have . From Lemma 2.1, (18), and (17) we thus have x ). Combining this with our L 6 t L 6 x bound on Φ we thus have Φ ∈ CK 2 B(L 6 t L 6 x ). In particular we can control the cubic nonlinearity N 3 :
Proof. We first prove (88). By an appropriate Fourier decomposition 16 it will suffice to prove the estimate in the following four cases:
• (bounded-bounded interaction) the Fourier transforms of φ, ψ are supported on the region |ξ| ≤ N/2.
• (high-high interaction) the Fourier transforms of φ, ψ are supported on |ξ| > N/10.
• (bounded-high interaction) the Fourier transform of φ is supported on |ξ| < N/5 and that of ψ is supported on |ξ| > N/4.
Footnote 16. For instance, one can decompose the Fourier transforms of both φ and ψ into three components with frequency support ≪ N , ∼ N , ≫ N respectively, in such a way that each of the nine interactions falls into one of the four categories described above.
Similarly for ψ. This allows us to deal with the high-high, high-bounded, and bounded-high cases. In the bounded-bounded case, all the I operators are the identity, so it suffices to show where we have used the N 0+ to gain an epsilon regularity on the bounded frequency function φ. We crudely estimate the H 0,s−1 norm by the L 2 t L 2 x norm and crudely write the null form N 0 (φ, ψ) as Ø(φ∇ x ψ). The claim then follows from the Strichartz embeddings H 1+,s− ⊆ L 2+ t L ∞ x and H 0,s− ⊆ C 0 t L 2 x from (58).
Now we prove (86), (87). In the bounded-high and high-high cases this follows from (62), (63) by the same arguments as before (indeed, we may even weaken the H 1/2+,0 norm to I −1 H 1/2+,0 ). It remains to consider the bounded-bounded and high-bounded cases. We begin with the bounded-bounded case. Estimating the H 0,s−1 norm by the L 2 t L 2 x norm, it suffices to show For the first estimate we use the Sobolev embedding H 1/2+,0 ⊆ L 2 t L 3 x and the Strichartz estimate H 1,s− ⊆ C 0 t L 6 x from (58). For the second we use the Sobolev embedding $\|A_0\|_{L^2_t L^\infty_x} \lesssim \|\nabla_{x,t} A_0\|_{H^{1/2+,0}}$ and the Strichartz estimate H 0,s− ⊆ C 0 t L 2 x . The same argument also deals with the high-bounded case; one can use the theory of paraproducts to cancel the factors of I.
Finally we prove (85). In the high-high case we compute using (61) with s ′ = s ′′ = s: which is acceptable in the high-high case. In the high-bounded case we use (61) with s ′ = s and s ′′ = 1: which is acceptable. For the bounded-high case we similarly use (61) with s ′ = 1 and s ′′ = s. Now we turn to the bounded-bounded case. Here we take advantage of the additional N 0+ factor; it suffices to prove But then this follows from the Strichartz embeddings H 1+,s− ⊆ L 2+ t L ∞ x and H 0,s− ⊆ C 0 t L 2 x from (58).
We can now control N 2 . Indeed we claim To prove this we multiply (80) and (81). To multiply the two H s,b spaces we use 17 (85). To multiply the L p R spaces we use (17), (16) (indeed we can get into H 0,0 = L 2 t L 2 x for these terms).
It remains to handle the cross terms when H s,b is multiplied against an L p R . By (80) and (81) it suffices to prove the embedding x which easily follows from (18) and a decomposition into high and bounded frequencies.
To control the A 0 ∂ t Φ component of N 1 , we use (80). The 2KB(C 0 t L 3 2 ) component of ∂ t Φ will be acceptable from (83), as this places this contribution to A 0 ∂ t Φ in C 0 t L 2 x , which easily embeds into I −2 H 1,0 [t0−δ,t0+δ] . So it suffices to show that Split A 0 into low frequencies |ξ| ≤ 2 and local frequencies |ξ| ≥ 1. For local frequencies we can use (92) and (87) (observing that I −2 H 1,0 ⊆ H 1/2+,0 ). For low frequencies we have A 0 ∈ CK 2 B(C 0 t L 6 2 ) from (82), and the claim will follow from (90). Now we control the ∂ t A 0 Φ component of N 1 . To do this we multiply (92) with (81). The product of I −2 H 1,0 and I −1 H 1,s− is acceptable from (86) (again observing (17)), while the product of L 2 t L 6 1 and C 0 t L 6 1 is similarly in L 2 t L 3 2 . For the cross
Footnote 17. The estimates there were phrased for cutoff functions η centered at the origin, but it is clear from time translation invariance that one can also use cutoff functions centered at t 0 .
Finally we consider Φ low ∇ x,t Φ low . As mentioned in the previous paragraphs we have Φ low ∈ CKB(C 0 t L 6 4 ) and ∇ x,t Φ low ∈ CKB(C 0 t L 3 4 ). The claim then follows from (17).
We apply the above Proposition with a fixed K sufficiently large, i.e. with K an absolute constant, following our conventions for such constants. In particular henceforth all implicit constants are allowed to depend on K. As a corollary of the above argument (specifically (79), (81), (92), (83)) we have the estimates In other words, ignoring the technical low frequency issues, IΦ lives in H 1,s− and I 2 A 0 lives in H 2,0 , and similarly for the time derivatives (but with one lower order of regularity, of course).
In the remainder of the paper we use the estimates (94)-(97) to obtain (45).

Differentiating the Hamiltonian
Having obtained control on A 0 , Φ on the interval [t 0 − δ, t 0 + δ], we are now ready to begin the proof of (45). We shall use the real inner product $\langle u, v \rangle := \mathrm{Re} \int_{\mathbb{R}^3} u(x) \overline{v(x)}\, dx$ throughout this section. Since m is real and symmetric we observe that I is self-adjoint: $\langle Iu, v \rangle = \langle u, Iv \rangle$. Similarly for I −1 .
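The self-adjointness of I is a one-line consequence of Plancherel's theorem. The sketch below assumes that I is the Fourier multiplier with symbol m, i.e. \widehat{Iu} = m \hat u, and writes c > 0 for the normalization constant in Plancherel's identity (which depends on the Fourier transform convention).

```latex
% Assumptions: \widehat{Iu}(\xi) = m(\xi) \hat u(\xi) with m real-valued;
% \langle u, v \rangle = \mathrm{Re} \int u \bar v \, dx = c \, \mathrm{Re} \int \hat u \, \overline{\hat v} \, d\xi.
\begin{align*}
\langle Iu, v \rangle
 &= c \, \mathrm{Re} \int_{\mathbb{R}^3} m(\xi) \hat u(\xi) \, \overline{\hat v(\xi)} \, d\xi \\
 &= c \, \mathrm{Re} \int_{\mathbb{R}^3} \hat u(\xi) \, \overline{m(\xi) \hat v(\xi)} \, d\xi
  = \langle u, Iv \rangle .
\end{align*}
% Replacing m by 1/m (assuming m > 0) gives the same identity for I^{-1}.
```

Only the reality of m is used in this computation; the symmetry m(−ξ) = m(ξ) ensures in addition that I maps real-valued functions to real-valued functions.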
Fix T ∈ [t 0 − δ/2, t 0 + δ/2]. By the fundamental theorem of calculus it suffices to show that Our next task is to expand the expression If the I were not present then (99) would vanish. With the I present, (99) does not vanish completely, but we will be able to express (99) in terms of commutators of I and other operators.
Before we do so, let us begin with a heuristic discussion, ignoring the elliptic term A 0 and the null structure. Since (MKG-CG) is roughly of the form (100) 2Φ = Ø(Φ∇ t,x Φ) + Ø(ΦΦΦ) and the Hamiltonian (13) is roughly of the form it seems reasonable to expect an identity roughly of the form for arbitrary Φ (not necessarily solving (MKG-CG)), since we know in advance that the Hamiltonian must be preserved by the flow (MKG-CG). In particular we expect On the other hand, by applying I to (100), we have Inserting this into the previous equation, we expect to split ∂ t H[IΦ] as two commutators: We now begin the rigorous argument. The rigorous form of (101) is Lemma 11.1 (First variation of Hamiltonian). If Φ is arbitrary (not necessarily solving (MKG-CG)), then where D α and F αβ were defined in (1), (2).
Observe that this quantity vanishes (as expected) if Φ solves (MKG), and in particular if it solves (MKG-CG).
Proof. We recall the stress-energy tensor where η αβ is the Minkowski metric. From (13) we see that To compute the integrand we observe that Using this we can expand ∂ α T α0 as Collecting terms and relabeling (using the anti-symmetry of F ), we can rewrite the above as The first term vanishes from the Bianchi identity dF = ddA = 0. The last term can be simplified as [D α , D 0 ] = iF α 0 . After a little more collecting of terms and relabeling, we obtain ∂ α T α0 = (∂ α F α µ + Im(φD µ φ))F 0µ + Re(D α D α φD 0 φ) and the Lemma follows.
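The commutator identity used in the proof of Lemma 11.1 can be verified directly. The block below is a sketch assuming the usual definition F_{αβ} := ∂_α A_β − ∂_β A_α of the curvature, together with D_α = ∂_α + iA_α as in the introduction.

```latex
% Assumption: F_{\alpha\beta} := \partial_\alpha A_\beta - \partial_\beta A_\alpha, \quad D_\alpha := \partial_\alpha + i A_\alpha.
\begin{align*}
D_\alpha D_\beta \phi
 &= (\partial_\alpha + i A_\alpha)(\partial_\beta \phi + i A_\beta \phi) \\
 &= \partial_\alpha \partial_\beta \phi + i (\partial_\alpha A_\beta) \phi
    + i A_\beta \partial_\alpha \phi + i A_\alpha \partial_\beta \phi - A_\alpha A_\beta \phi , \\
[D_\alpha, D_\beta] \phi
 &= i (\partial_\alpha A_\beta - \partial_\beta A_\alpha) \phi
  = i F_{\alpha\beta} \phi .
\end{align*}
% The terms \partial_\alpha\partial_\beta\phi, \; i(A_\beta\partial_\alpha + A_\alpha\partial_\beta)\phi
% and A_\alpha A_\beta \phi are symmetric in (\alpha,\beta) and cancel in the commutator.
```

Taking β = 0 recovers the identity [D α , D 0 ] = iF α 0 quoted in the proof.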
In the Coulomb gauge (7), we can use this lemma to rewrite (99) as On the other hand, by applying I to (MKG) we have Thus one can write (99) as a linear combination of the commutator expressions This should be compared with (102).
We now break up (103), (104), (105) further. We introduce the nonlinear commutators We also define the "mollified time derivative" In later sections we shall prove the estimates We begin with (103). Consider the contribution of ∂ j IA 0 . We write this term crudely as ∇ x IA 0 , Ø([I, N 2 ]) + Ø([I, N 3 ]) which is acceptable by (107). Now consider the contribution of ∂ t IA j . By (7) we may freely insert a projection P on the left term of the inner product, and hence on the right by self-adjointness. From the definition of the null form N 0 we can thus write this contribution as which is acceptable by (106). It remains to prove (106), (107). We shall do so in later sections, but for now we give some estimates on D 0 Φ.
The low term is acceptable from (76) and Sobolev, so we consider the local term. We split D 0 Iφ = Iφ t + iIA 0 Iφ. The Iφ t component is acceptable from (95), so it remains to control the local frequency component (IA 0 Iφ) local of IA 0 Iφ. Because this is a lower order term, regularity will not be a major problem, but there will be some other technical issues related to the time truncation.

From (96), (97) and Sobolev we have the crude bound
x ) while from (94), (57) and Sobolev we have . The first two norms of (109) are then bounded by Hölder (since I is bounded on all the above spaces). For the last norm we instead use (96), (16), and Sobolev to obtain x ) while from (95), (57) and Sobolev we have . The claim again follows from Hölder.
The only remaining task is to establish the estimates (106), (107) for various values of k.

The cubic commutator [I, N 3 ]
We first prove the estimates (106), (107) for the cubic commutator [I, N 3 ], which is the easiest to handle as there are no derivatives. Indeed we will not need the full strength of the commutator structure here, and we can use very crude Lebesgue space estimates. In this case we will obtain a decay of N −1/2+ instead of just $1/N^{(s-1/2)-}$. Smoothly divide Φ := Φ bounded + Φ high , where Φ high has frequency support on |ξ| > N/10 and Φ bounded is supported on |ξ| < N/5.
We need the following Strichartz-type estimates.
Proof. The bound on D 0 Φ comes from Lemma 11.2 and the crude estimate H 0,s− ⊆ C 0 t L 2 x . The bound on ∇ x IA 0 comes from (96) and the crude embedding I −1 H 1,0 ⊆ L 2 t L 9/2 x arising from Sobolev embedding (since s > 5/6).
For A 0 , we argue differently. The low frequency component is acceptable from (96), (16). For the medium and high frequency components, the estimate (96) and Sobolev gives A 0 ∈ CB(L 2 t L ∞ x ), while (97) and the fundamental theorem of calculus and Sobolev embedding gives . The claim then follows by interpolation.
For Φ, the bounds in (112) come from (94), the observation that , and the Strichartz embeddings (from (58)) For A 0 , we have from (96) and Sobolev that while from (97), the fundamental theorem of calculus we have The claims then follow from interpolation.
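The role of the hypothesis s > 5/6 in the embedding I −1 H 1,0 ⊆ L 2 t L 9/2 x used above can be seen from the Sobolev exponent count. This is a sketch relying on the standard Sobolev embedding, and on the assumption (standard for the I-method, but not restated in this section) that the H^s_x norm of u is controlled by the H^1_x norm of Iu.

```latex
% Standard fact on R^3: H^s_x \subseteq L^p_x for 2 \le p < \infty whenever 1/p \ge 1/2 - s/3.
% Assumed property of I: \|u\|_{H^s_x} \lesssim \|Iu\|_{H^1_x}, so that I^{-1} H^1_x \subseteq H^s_x.
\begin{align*}
p = \tfrac92 : \qquad
\tfrac12 - \tfrac{s}{3} \le \tfrac29
 \iff \tfrac{s}{3} \ge \tfrac{5}{18}
 \iff s \ge \tfrac56 .
\end{align*}
% Taking the L^2 norm in time then gives I^{-1} H^{1,0} \subseteq L^2_t H^s_x \subseteq L^2_t L^{9/2}_x.
```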
We now prove (106), (107). From (110) and Hölder it will suffice to prove that . We expand out [I, N 3 ] as the sum of eight terms When a = b = c = bounded then I acts like the identity everywhere, and the summand vanishes. Thus it will suffice to show that x ) whenever at least one of a, b, c is equal to high. By symmetry we may assume c = high. But then the claim follows from (111), (112) and Hölder. (The operator I and the projections Φ → Φ bounded , Φ → Φ high are bounded on every translation-invariant Banach space. To get L 2 t L 9/7 x , one places two factors in L 4 t L 6 x , and the last factor in an interpolant of C 0 t L 2 x and C 0 t L 3 x .) This completes the proof of (106), (107) for the cubic commutator [I, N 3 ].
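The Hölder exponents in the parenthetical remark above can be checked explicitly. In the sketch below, r and θ are illustrative symbols for the Lebesgue exponent of the third factor and the interpolation parameter.

```latex
% Two factors in L^4_t L^6_x, one factor in C^0_t L^r_x, target L^2_t L^{9/7}_x.
\begin{align*}
\text{time:} \quad & \tfrac14 + \tfrac14 + 0 = \tfrac12 , \\
\text{space:} \quad & \tfrac16 + \tfrac16 + \tfrac1r = \tfrac79
   \ \Longrightarrow\ \tfrac1r = \tfrac49 , \quad r = \tfrac94 \in (2, 3) , \\
\text{interpolation:} \quad & \tfrac1r = \tfrac{1-\theta}{2} + \tfrac{\theta}{3} = \tfrac12 - \tfrac{\theta}{6} = \tfrac49
   \ \Longrightarrow\ \theta = \tfrac13 .
\end{align*}
% Duality check: 7/9 + 2/9 = 1 and 1/2 + 1/2 = 1, so L^2_t L^{9/7}_x pairs with L^2_t L^{9/2}_x into L^1_t L^1_x.
```

Thus the third factor sits one third of the way from C 0 t L 2 x to C 0 t L 3 x , and the resulting space L 2 t L 9/7 x is exactly dual to the space L 2 t L 9/2 x in which ∇ x IA 0 was placed.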
Remark 12.2. It is in fact possible to use more Strichartz estimates to improve the estimate for [I, N 3 ] even further to N −1+ ; this would be consistent with the results for the cubic nonlinear wave equation in [17]. The numerology is as follows. As we saw above, it suffices to put the three factors Φ in N 3 = Φ 3 in L 3 t L 6 x . The Strichartz embedding (58) allows this if Φ is in H 2/3,1/2+ . But Φ is in H 1,s− (for medium frequencies at least), so there is 1/3 of a derivative to spare. Since there are three factors of Φ, we thus see that there is about a full derivative of surplus regularity in [I, N 3 ]. From (115) one then expects to extract a gain 18 of N −1+ , in principle at least.
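The numerology of Remark 12.2 can be spelled out as follows. The block assumes the standard scaling and admissibility conditions for Strichartz estimates for the wave equation in three space dimensions; the precise hypotheses of (58) may be stated differently.

```latex
% Assumed form of the Strichartz embedding in R^{1+3}:
%   H^{\sigma,1/2+} \subseteq L^q_t L^r_x  when  1/q + 3/r = 3/2 - \sigma  and  1/q + 1/r \le 1/2,  (q,r) \ne (2,\infty).
\begin{align*}
(q, r) = (3, 6): \quad & \sigma = \tfrac32 - \tfrac13 - \tfrac36 = \tfrac23 ,
   \qquad \tfrac13 + \tfrac16 = \tfrac12 \le \tfrac12 ; \\
\text{H\"older:} \quad & \tfrac13 + \tfrac13 + \tfrac13 = 1 , \quad \tfrac16 + \tfrac16 + \tfrac16 = \tfrac12 ,
   \quad \text{so } \Phi^3 \in L^1_t L^2_x ; \\
\text{surplus:} \quad & 3 \times \big( 1 - \tfrac23 \big) = 1 \ \text{derivative in total} .
\end{align*}
```

This is the "full derivative of surplus regularity" referred to in the remark, which (115) would convert into a gain of roughly N −1+ .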

Frequency interactions of bilinear commutators
In the remainder of the paper we will prove the estimates (106) or (107) for the bilinear commutators [I, N 0 ], [I, N 1 ], [I, N 2 ]. Ignoring derivatives and null forms (which will have no bearing on the discussion in this section), all the expressions on the left-hand side have the form On the other hand, the functions u, v, w which appear here have different behavior at low, medium and high frequencies. The purpose of this section is to decompose the above trilinear expressions in terms of these three frequency components.
Footnote 18. Admittedly, in the above argument only one of the three factors Φ could be assumed to be high frequency, however one should still be able to obtain the full gain of N −1+ by playing around with the Strichartz exponents (e.g. putting the high frequency factor in L 2 t L ∞− x and the other two in C 0 t L 2+ x ), or perhaps by using commutator estimates as we do with the bilinear commutators below.
We smoothly split u = u low +u med +u high , whereû low is supported on |ξ| < 20,û med is supported on 10 < |ξ| < N/5, andû high is supported on |ξ| > N/10. Similarly decompose v and w. We can then split (113) into 27 terms of the form This may look like a lot of terms, but fortunately most of these terms are zero. For instance, if neither of b or c is high, then I acts like the identity and (114) vanishes. So we may assume at least one of b, c is high.
Next, we claim that if one of a, b, c is low frequency, then (114) vanishes unless the other two indices are high frequency. To see this, suppose (for instance) that a was low frequency and b was low or med frequency. Then we can integrate by parts and rewrite the above as From this discussion we see that of the 27 terms in the decomposition, only 9 are non-zero, and they are listed in Figure 1.
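The count of nine surviving terms can be reproduced by applying the two rules from the discussion above to all 27 triples (a, b, c). The table below is only a bookkeeping sketch of that count; its layout is not taken from Figure 1.

```latex
% Indices a, b, c \in \{\mathrm{low}, \mathrm{med}, \mathrm{high}\}. Rules taken from the text:
%   (i)  at least one of b, c is high;
%   (ii) if one of a, b, c is low, then the other two are high.
\begin{tabular}{l|l|c}
 type & triples $(a,b,c)$ & count \\ \hline
 one low index                        & (low, high, high), (high, low, high), (high, high, low) & 3 \\
 no low index, $(b,c)$ = (med, high)  & (med, med, high), (high, med, high)                     & 2 \\
 no low index, $(b,c)$ = (high, med)  & (med, high, med), (high, high, med)                     & 2 \\
 no low index, $(b,c)$ = (high, high) & (med, high, high), (high, high, high)                   & 2 \\ \hline
 total                                &                                                         & 9
\end{tabular}
```

Merging the two choices a = med and a = high in each of the last three rows leaves 3 + 3 = 6 cases, which matches the six cases attributed to Figure 1.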
We now discuss qualitatively how each of the six cases in Figure 1 will be estimated. In all of the cases, the main challenge in proving (106) or (107) is to obtain the decay factor $1/N^{(s-1/2)-}$; it is relatively straightforward to prove these estimates without this decay factor, but then Proposition 5.4 will only let us control the Hamiltonian for times T = O(1), which will not give us global well-posedness for any H s x .
To obtain this decay we must use the fact that the high frequencies are small if measured in rough norms. In particular, we will make frequent use of the simple estimate valid for all u such that û is only supported in "high" frequencies |ξ| ≳ N , and all θ ≥ 0 and s ∈ R. Thus if there is a high frequency term present, we can sacrifice some of its regularity to obtain the desired gain in N .
Figure 1. The non-zero cases in the decomposition of (113), and the Lemmas and estimates which are useful in each case (although for the null form N 0 the analysis is more complicated than the above table suggests). In most cases, it will not be so important to distinguish between the cases a = med and a = high. One can also eliminate the med-med-high case by Fourier support considerations, though this does not significantly simplify the argument.
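The high-frequency estimate referred to above as (115) is presumably of the following form; the exact statement is an assumption here, but the derivation is elementary and shows how regularity is traded for decay in N.

```latex
% Assumed form of (115): if \hat u is supported on |\xi| \gtrsim N, with N \ge 1, \theta \ge 0, s \in \mathbb{R}, then
\[
\|u\|_{H^{s-\theta}_x} \lesssim N^{-\theta} \|u\|_{H^s_x} .
\]
% Derivation via Plancherel: on the support of \hat u we have \langle\xi\rangle \gtrsim N, hence
\begin{align*}
\|u\|_{H^{s-\theta}_x}^2
 = \int \langle\xi\rangle^{-2\theta} \langle\xi\rangle^{2s} |\hat u(\xi)|^2 \, d\xi
 \lesssim N^{-2\theta} \int \langle\xi\rangle^{2s} |\hat u(\xi)|^2 \, d\xi
 = N^{-2\theta} \|u\|_{H^s_x}^2 .
\end{align*}
```

For example, half a derivative of surplus regularity on a high frequency factor (θ = 1/2−) yields a gain of N −1/2+ .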
This will be fairly straightforward in the first three cases of Figure 1, when there are two high frequency terms, because the low frequency term is smooth and easily estimated. In fact for these low frequency cases one can usually improve the decay estimate to N −1/2+ or better. We use the following two lemmas to handle the low frequency cases.
Proof. By (16) we may take p = ∞; by lowering s 3 if necessary we may assume s 3 = −s 1 . By a Hölder in time (and discarding all appearances of the operator I, which is bounded on every Lebesgue and Sobolev space) it suffices to prove the spatial estimate We perform a Littlewood-Paley decomposition $u = \sum_{k \geq 0} u_k$, where $\hat u_k$ is supported in the region $\langle \xi \rangle \sim 2^k$. Similarly split $w = \sum_{k' \geq 0} w_{k'}$. Observe from the Fourier support of v that $\langle u_k, v w_{k'} \rangle$ vanishes unless k ′ = k + O(1). By Hölder we may therefore estimate the left-hand side by The claim then follows from Cauchy-Schwarz and the almost orthogonality of the u k and of the w k ′ in Sobolev norms.
The claim for permutations follows since the expression $\langle u(t), v(t) w(t) \rangle$ is essentially invariant under permutations (the conjugation being irrelevant for the above norms).
This lemma does not give the decay of $1/N^{(s-1/2)-}$ directly, but we shall combine it with (115) to do so if there is enough surplus regularity in the u and w variables. However, even when this surplus regularity is unavailable we can still obtain this decay if there is a commutator structure. More precisely: Lemma 13.2 (Commutator estimate). We have Proof. By (16) it suffices to take p = ∞. By Hölder's inequality in time, it thus suffices to show the spatial commutator estimate Let us first assume that ŵ is supported in the annulus |ξ| ∼ M for some dyadic M ; we will sum in M later. If M ≪ N then I(vw) − v(Iw) = vw − vw = 0, so we may assume M ≳ N . Under this assumption, we will prove that (116) x ; the claim then follows for general w by a Littlewood-Paley decomposition and the triangle inequality.
It remains to show (116). Fix M . We use Plancherel to write From the support conditions on v and w we may insert some cutoff functions where a(ξ 1 ) is a bump function adapted to |ξ 1 | ≲ 1 and b(ξ 2 ) is a bump function adapted to |ξ 2 | ∼ M .
From the mean value theorem and the smoothness of m we see that on the support of a(ξ 1 )b(ξ 2 ). Moreover, we may write where c is a bump function of two variables adapted to the region |ξ 1 | ≲ 1, |ξ 2 | ∼ M .
By inverting the Fourier transform again, we obtain where č is the inverse Fourier transform of c. From the bump function estimates on c and standard integration by parts computations, we obtain the bounds $|\check c(y, z)| \lesssim M^3 \langle y \rangle^{-100} \langle M z \rangle^{-100}$.
Thus by Minkowski's inequality followed by Hölder's inequality, (117) x and the claim (116) then follows from the Cauchy-Schwarz inequality.
Because of this Lemma, the low frequency terms will be quite minor in comparison to the medium and high frequency interactions, although they will unfortunately occupy about half of the cases in the sequel. For the medium and high frequency interactions we shall often use the estimate We may assume that u, v have non-negative Fourier transform. From the pointwise inequality $m(\xi + \eta)^{-1} \lesssim m(\xi)^{-1} + m(\eta)^{-1}$ and Plancherel, we see that I −1 obeys a fractional Leibnitz rule, so it suffices to show that x . But these both follow from (14) (observing that u controls Iu and I −1 v controls v; the conjugation is irrelevant).
To prove the third estimate it is enough to prove But, by integration by parts, duality and (14) we have and the claim follows.
As with Lemma 13.1, these estimates when combined with (115) will give the desired decay in N provided that there is enough surplus regularity in the high frequency factors.
The "high-high" interaction, when b, c are both high, will also be relatively easy to handle because there are two high frequency terms in which one can sacrifice some regularity. (It will turn out that the a term usually has no surplus regularity.) The "medium-high" or "high-medium" interactions will be more delicate however, especially if the function associated with the "high" frequency is quite rough (e.g. ∇ x,t φ). In this case there may be no surplus regularity on the high frequency factor to use, but to compensate for this the medium frequency factor will have quite a bit of surplus regularity. To exploit this we will use the commutator structure, and specifically the Hölder continuity (or mean-value theorem) estimate for any medium frequency ξ 1 , high frequency ξ 2 , and 0 ≤ θ ≤ 1. Morally speaking, the estimate (120) allows us to transfer 19 up to one full degree of regularity from the medium frequency factor to the high frequency factor. (If it were not for the Hölder estimate (120) (which would for instance be the case if m was rough), one would have to require that v and w can both individually come up with this much surplus regularity; this is roughly equivalent to the existence of an extra smoothing estimate of the type mentioned in the introduction.) We now turn to the specific details for each commutator in turn.
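The mean-value mechanism behind the Hölder continuity estimate can be sketched as follows. The model form of the multiplier m and the restriction |ξ_1| ≤ |ξ_2|/2 are assumptions made for this illustration; the precise statement and range of (120) may differ.

```latex
% Assumed model for the symbol of I (standard for the I-method):
%   m smooth, radial, nonincreasing, m(\xi) = 1 for |\xi| \le N, \quad m(\xi) = (|\xi|/N)^{s-1} for |\xi| \ge 2N.
% Consequences used below:
%   |\nabla m(\zeta)| \lesssim m(\zeta)/|\zeta|, \qquad m(\zeta) \sim m(\xi_2) \ \text{whenever} \ |\zeta| \sim |\xi_2| .
% Let |\xi_1| \le |\xi_2|/2, so that every \zeta on the segment from \xi_2 to \xi_1 + \xi_2 satisfies |\zeta| \sim |\xi_2|.
\begin{align*}
|m(\xi_1 + \xi_2) - m(\xi_2)|
 &\le |\xi_1| \sup_{|\zeta| \sim |\xi_2|} |\nabla m(\zeta)|
  \lesssim \frac{|\xi_1|}{|\xi_2|} \, m(\xi_2) \\
 &\le \Big( \frac{|\xi_1|}{|\xi_2|} \Big)^{\theta} m(\xi_2)
  \qquad \text{for } 0 \le \theta \le 1, \ \text{since } |\xi_1|/|\xi_2| \le 1 .
\end{align*}
```

Read this way, the factor |ξ_1|^θ costs θ derivatives on the medium frequency factor, while the factor |ξ_2|^{−θ} supplies θ derivatives to the high frequency factor, which is the "transfer of regularity" described above.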
The commutators [I, N 1 ], [I, N 2 ]
Because of the presence of the relatively smooth function A 0 , these commutators can be handled by relatively simple tools, namely Hölder's inequality, the fractional Leibnitz rule, and some simple commutator estimates. We will be able to obtain a decay here of N −1/2+ , which improves over the claimed decay of $1/N^{(s-1/2)-}$. We split D 0 Φ, A 0 , and Φ into low, medium, and high components as in Section 13. It will suffice to show for all triples (a, b, c) in Figure 1.
To prove the above commutator estimates, we will use the following bounds on the Lemma 14.1 (Spacetime estimates). On the slab [t 0 , T ]×R 3 , we can place the low, medium, and high components of D 0 Φ, A 0 , ∇ x,t A 0 , Φ, and ∇ x,t Φ in the following spaces:
Footnote 19. It is this ability to use the commutator structure to transfer regularity from smooth factors to rough ones which distinguishes the methods here from the frequency truncation method used by Bourgain [1] and later authors. In that method one usually has to rely on "extra smoothing estimates" to control the medium-high interactions, but these estimates are usually only available if there are no derivatives in the nonlinearity.
We may discard the $I$ from the above spaces if desired thanks to the trivial embeddings $IH^\alpha \subseteq H^\alpha \subseteq I^{-1}H^\alpha$ for any $\alpha \in \mathbb{R}$.
We can now motivate the numerology behind the decay of $N^{-1/2+}$. Consider the commutators (121), (123), which are roughly of the form $\int_{[t_0,T]\times\mathbb{R}^3} \mathcal{O}(\nabla_{t,x}A_0\,\Phi\,\nabla_{t,x}\Phi)$. From the above we see that the three factors are in $H^1_x$, $L^2_x$, and $H^1_x$, for medium frequencies at least. Lemma 13.3 then allows us to estimate the above trilinear expression. In fact we have about half a derivative to spare; even if we reduced the regularity of one of the $H^1$ factors to $H^{1/2+}$, we could still use Lemma 13.3. The idea is then to use (115) to convert this half derivative of room into an $N^{-1/2+}$ factor in the estimates$^{20}$. The case of (122) is similar; the three factors are now in $L^2_x$, $H^2_x$, $L^2_x$, but there is still half a derivative of surplus regularity which one can hope to convert to an $N^{-1/2+}$ gain, by using (115) (and in some cases (120)).
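The conversion of spare regularity into decay in $N$ can be illustrated by a standard Bernstein-type computation. This is a sketch under our own assumptions: $f$ has spatial Fourier support in $\{|\xi| \gtrsim N\}$, and we write $1/2+$ as $1/2+\varepsilon$. The estimate (115) itself is not reproduced here and may be stated differently.

```latex
% Assumption: \hat f is supported in \{ |\xi| \gtrsim N \}, and 0 < \varepsilon < 1/2.
\begin{align*}
\| f \|_{H^{1/2+\varepsilon}_x}^2
  &= \int_{\mathbb{R}^3} \langle \xi \rangle^{1+2\varepsilon} |\hat f(\xi)|^2 \, d\xi
   = \int_{\mathbb{R}^3} \langle \xi \rangle^{-(1-2\varepsilon)}
       \langle \xi \rangle^{2} |\hat f(\xi)|^2 \, d\xi \\
  &\lesssim N^{-(1-2\varepsilon)} \| f \|_{H^1_x}^2 ,
\end{align*}
% so that
\[
  \| f \|_{H^{1/2+\varepsilon}_x} \lesssim N^{-1/2+\varepsilon} \, \| f \|_{H^1_x} .
\]
```

In other words, each half derivative that a high frequency factor can afford to lose is worth a factor of $N^{-1/2+}$.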
Unfortunately, there are a number of minor differences between (121), (122), and (123) which require separate treatment. To systematize the numerous cases we shall use a number of tables.
To prove (121) (which is the easiest case), for each of the six cases in Figure 1 we use the norms and lemmas indicated in Figure 2. For low frequency interactions we use Lemma 13.1, while for medium and high frequency interactions we use Lemma 13.3(i).
$^{20}$ Indeed, one could perhaps improve this factor even further by exploiting the room available in the time index. Currently we are estimating one factor in $L^2_t$ and the other two in $C^0_t$. By using Strichartz estimates (cf. Section 12, or [17]) one might be able to sacrifice integrability in time for regularity in space, which might then be convertible to further gains in $N$.

Figure 2. List of possible cases for (121), the spaces in which to estimate the three factors, and the lemma used to obtain the estimate. In this case the smoothing effect of $I$ or the commutator structure does not need to be exploited. Observe that in all six cases the product of the three norms is

The proof of (122) is a little trickier because of the low regularity of $\partial_t\phi$. We tackle the six cases in Figure 1 using the spaces and lemmas in Figure 3 (concatenating the fifth and sixth cases). Five of the cases are straightforward applications of the lemmas of the previous section and will not be discussed further. The one case which is interesting is Case 4, when $a$ is medium or high, $b$ is medium, and $c$ is high. By a Hölder in time it suffices to show the commutator estimate where $v$ has medium frequency and $w$ has high frequency. (Note that Lemma 13.2 is not available to us here because $v$ is not low frequency.) Since all the norms on the right-hand side are $L^2_x$ based we may assume that $\hat u$, $\hat v$, $\hat w$ are non-negative. By (120) with $\theta = 1/2-$ we may then transfer $1/2-$ degrees of regularity from $v$ to $w$, and from Sobolev embedding we have $\| |\nabla_x|^{1/2-} v \|_{L^\infty_x} \lesssim \| v \|_{H^2}$. The claim follows.
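The Sobolev embedding step above can be checked by counting derivatives in $\mathbb{R}^3$. In the following sketch we write $1/2-$ as $1/2-\varepsilon$ with $0 < \varepsilon < 1/2$ (our notation), and use only the standard embedding $H^{\sigma}_x(\mathbb{R}^3) \subseteq L^\infty_x$ for $\sigma > 3/2$.

```latex
\begin{align*}
\big\| |\nabla_x|^{1/2-\varepsilon} v \big\|_{L^\infty_x}
  &\lesssim \big\| |\nabla_x|^{1/2-\varepsilon} v \big\|_{H^{3/2+\varepsilon}_x}
   && \text{(Sobolev embedding, since } 3/2+\varepsilon > 3/2\text{)} \\
  &\le \| v \|_{H^{2}_x}
   && \text{(since } |\xi|^{1/2-\varepsilon} \langle \xi \rangle^{3/2+\varepsilon}
      \le \langle \xi \rangle^{2}\text{)} .
\end{align*}
```

The total derivative count $(1/2-\varepsilon) + (3/2+\varepsilon) = 2$ is exactly what the $H^2$ norm of $v$ provides, which is why $\theta$ must be taken strictly below $1/2$.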
Finally, we prove (123). We now argue as before, except that we must now re-shuffle the six cases of Figure 1 because $\nabla_x I A_0$ has different behaviour at medium and

spaces, and in particular the fact that the "b" index is $s-$ and not just $1/2+$. Here is the one case where we will only be able to obtain a decay of $\frac{1}{N^{(s-\frac{1}{2})-}}$ instead of $N^{-1/2+}$. Unsurprisingly, we shall also need null form estimates for these spaces. The time localization to the interval $[t_0,T]$ has been ignored up until now (because we have always done a Hölder in time anyway) but is now a major technical nuisance, as multiplication by sharp time cutoffs destroys the "b" index of regularity.
The major difficulty with this estimate is with the $\nabla^{-1}Q(\Phi,\Phi)$ component of $N_0$, because there is no extra regularity in the $s$ index in any of the factors to be sacrificed to obtain the decay in $N$. However, there is some extra regularity in the $b$ index which can (after much work) be exploited as a substitute. Informally, the strategy is as follows. If at least one of the factors is low or medium frequency then one can obtain the decay in $N$ through commutator estimates. Now suppose all factors are high frequency. We look at the spacetime Fourier transform of all three factors. If at least one of them is far away ($\gtrsim N$) from the light cone then one can exploit the additional room in the $b$ index to obtain the gain. The only remaining possibility is when all three frequencies are close to the light cone, but this means that their frequencies must be close to parallel (since they must add up to zero), at which point one can obtain some gain from the null structure.
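The geometric claim in this strategy (three spacetime frequencies on the light cone which sum to zero must have parallel spatial parts) can be illustrated numerically. The sketch below is ours and covers only the sign case in which $(\tau_1,\xi_1)$ and $(\tau_2,\xi_2)$ both lie on the forward cone $\tau_j = |\xi_j|$; the function names are hypothetical. It computes how far the third frequency $(\tau_0,\xi_0) = -(\tau_1+\tau_2,\ \xi_1+\xi_2)$ is from the cone, and compares this with the closed form $2|\xi_1||\xi_2|(1-\cos\angle(\xi_1,\xi_2)) / (|\xi_1|+|\xi_2|+|\xi_1+\xi_2|)$.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def cone_defect(xi1, xi2):
    """Distance ||tau0| - |xi0|| of the third frequency from the light cone,
    assuming tau1 = |xi1|, tau2 = |xi2| and that all three frequencies sum to zero."""
    xi0 = [-(a + b) for a, b in zip(xi1, xi2)]
    tau0 = -(norm(xi1) + norm(xi2))
    return abs(abs(tau0) - norm(xi0))

def angle(xi1, xi2):
    """Angle between xi1 and xi2, with the cosine clamped against rounding."""
    dot = sum(a * b for a, b in zip(xi1, xi2))
    c = max(-1.0, min(1.0, dot / (norm(xi1) * norm(xi2))))
    return math.acos(c)

def defect_formula(xi1, xi2):
    """Closed form 2|xi1||xi2|(1 - cos(angle)) / (|xi1| + |xi2| + |xi1 + xi2|)."""
    r1, r2 = norm(xi1), norm(xi2)
    r0 = norm([a + b for a, b in zip(xi1, xi2)])
    return 2.0 * r1 * r2 * (1.0 - math.cos(angle(xi1, xi2))) / (r1 + r2 + r0)

if __name__ == "__main__":
    print(cone_defect((1.0, 0.0, 0.0), (2.0, 0.0, 0.0)))   # parallel frequencies
    print(cone_defect((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))   # orthogonal frequencies
```

The defect vanishes exactly when $\xi_1$ and $\xi_2$ point in the same direction, and for a small angle it is quadratic in that angle. This is the quantitative form of "close to the light cone forces close to parallel".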
It will suffice to prove that for all a, b, c as in Figure 1.
We first deal with the low frequency terms when one of a, b, c is low frequency. For this term we shall abandon the null structure and just prove We argue using the following table; the estimates on D 0 Φ and Φ come from Lemma 11.2 and (94) respectively. Note that the arguments are almost identical to those in the previous section.
Now suppose that $a$, $b$, $c$ are in one of the three remaining cases in Figure 1. Since we have dealt with all low frequency issues there will no longer be a need to distinguish between $|\xi|$ and $\langle\xi\rangle$, or between homogeneous and inhomogeneous Sobolev norms, etc.
The Sobolev estimates which sufficed for all the other commutators will not work here, and we must exploit the null structure. Since $[t_0,T]$ is contained inside $[t_0-\delta/2,\, t_0+\delta/2]$, it will suffice by (94) and Lemma 11.2 to prove the global spacetime estimate where $u_a$, $v_b$, $w_c$ are supported in the Fourier regions corresponding to $a$, $b$, $c$. Now that we are working globally in spacetime we are able to use the spacetime Fourier transform. We may assume that the spacetime Fourier transforms of $u_a$, $v_b$, $w_c$ are all real and non-negative.
Next, we recall the standard estimate for the null form symbol $|\xi_1\wedge\xi_2|$. In fact, we shall study several cases and show that all of them (except one) are

For any $\alpha > 0$, let $a_\alpha(t)$ denote the Fourier transform of $\langle\tau\rangle^{-\alpha}$; this function appears implicitly in (126), (127) for $\alpha = 1/2, 1$. We shall need the following $L^p_t$ estimates:

Lemma 15.2. Let $0 < \alpha \le 1$. Then $a_\alpha \in L^p_t$ for all $p < 1/(1-\alpha)$.
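The exponent $1/(1-\alpha)$ in Lemma 15.2 can be motivated by the standard behaviour of one-dimensional Bessel potential kernels. The following is a heuristic sketch and not the proof given in the paper; the constant $c_\alpha$ is unspecified.

```latex
% Standard asymptotics of a_\alpha, the Fourier transform of \langle\tau\rangle^{-\alpha} on \mathbb{R}:
\[
  a_\alpha(t) \sim
  \begin{cases}
    c_\alpha \, |t|^{\alpha - 1}, & 0 < \alpha < 1, \\
    c_1 \log \dfrac{1}{|t|},      & \alpha = 1,
  \end{cases}
  \qquad \text{as } t \to 0,
\]
% together with rapid decay of a_\alpha(t) as |t| \to \infty.
% Hence integrability is decided near t = 0:
\[
  \int_{|t| \le 1} |t|^{(\alpha - 1) p} \, dt < \infty
  \iff (1 - \alpha)\, p < 1
  \iff p < \frac{1}{1 - \alpha} ,
\]
% with the convention 1/(1-\alpha) = \infty when \alpha = 1 (a logarithm lies in every L^p, p < \infty).
```

For the two values used in the text this gives $a_{1/2} \in L^p_t$ for $p < 2$ and $a_1 \in L^p_t$ for all $p < \infty$.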
We now estimate (126), (127) separately in the cases $\min = 0$ and $\min \neq 0$, giving four cases. In all cases except Case 4, we will be able to obtain a decay of $O(N^{-1/2+})$.

Case 3: The bounding of (127) when $\min = 0$.
Since the j = 1 and j = 2 cases are symmetric, we may assume that j = 0 or j = 1.
We again use (128) to bound (127) by an expression of the form $N^{-1/2+}\int_{*;\,N_{\min}\gtrsim 1,\ N_{\max}\sim N_{\mathrm{med}}\gtrsim N}\ \cdots\ \prod_{j=0}^{2}F_j(\xi_j,\tau_j)$. Undoing the Fourier transform and considering the $j = 0$ and $j = 1$ cases separately, it thus suffices to show a pair of estimates. For the latter estimate, we place $u_0$ in $L^4_tL^4_x$, $u_1$ in $L^2_tL^2_x$, and $u_2$ in $L^{4+}_tL^4_x$, using (58) and Lemma 15.2. Now we turn to the former. To avoid excessive notation, we will pretend that $a_1$ is compactly supported rather than rapidly decreasing; the rapidly decreasing case can then be handled by a routine dyadic decomposition. We may now assume that $u_0$, $u_1$, $u_2$ are compactly supported in time. From (61) we have $\|u_1u_2\|_{H^{-1/2,0}} \lesssim \|u_1\|_{H^{0,s-}}\|u_2\|_{H^{1/2,s}}$, so it suffices to show that $\int_{\mathbb{R}\times\mathbb{R}^3} u\,v\,a_1\,dx\,dt \lesssim \|u\|_{H^{1/2,(s-1/2)-}}\|v\|_{H^{-1/2,0}}$.
But this is easily established from Plancherel's theorem.
By symmetry we may take $\min = 1$. We may assume that $N_1 \ll N_0, N_2$, since otherwise we could take $\min = 0$ and be in Case 3. We can then bound (127) by an expression of the form $\int_{*;\,1\lesssim N_1\ll N_0,N_2}\cdots$. Using Cauchy-Schwarz, we may fix $N_0$ and $N_2$; but we will still need to sum in $N_1$.
There are two subcases.
We let $e_1$, $e_2$, $e_3$ be the standard basis for $\mathbb{R}^3$. We choose

But this can easily be seen to be false for sufficiently small $\varepsilon$, giving the desired contradiction.
Remark A.2. The above construction shows that $\phi[1] - \phi_{\mathrm{lin}}[1]$ can be made arbitrarily large in the energy norm even when $\|\Phi\|_{[H^s]}$ is small. It is possible to modify the above construction (using multiple frequency scales $N$) to in fact make $\phi[1] - \phi_{\mathrm{lin}}[1]$ have infinite energy; we leave the details to the reader.