APPROXIMATION OF (SOME) RANDOM FPUT LATTICES BY KDV EQUATIONS

Abstract. We consider a Fermi-Pasta-Ulam-Tsingou lattice with randomly varying coefficients. We discover a relatively simple condition on the nature of the randomness which allows us to prove that small-amplitude/long-wavelength solutions are almost surely rigorously approximated by solutions of Korteweg-de Vries equations for very long times. The key ideas combine energy estimates with homogenization theory, and the technical proof requires a novel application of autoregressive processes.

Here t ∈ R, j ∈ Z, and the unknowns q(j, t) (the relative displacement) and p(j, t) (the velocity) are real-valued. The mass coefficients m(j) are strictly positive and the spring potential is of the form (1.2).² Lastly, δ⁺f(j) = f(j + 1) − f(j) and δ⁻f(j) = f(j) − f(j − 1) are the right and left finite-difference operators. Models of this sort are ubiquitous in applications. A partial list: molecular dynamics, lamination, nondestructive testing, vehicular traffic, granular media, metamaterials, chemistry/biochemistry, and power generation [21]. The system (1.1) also plays a major role as a paradigm for the mathematical analysis of wave propagation, especially solitary waves, in nonlinear dispersive settings, and it is the system's famous connection to the Korteweg-de Vries (KdV) equation wherein our interest lies. Here are several important mathematical results about that connection:

• When m(j) is constant, long-wavelength (say like 1/ϵ, where 0 < ϵ ≪ 1), small-amplitude (order of ϵ²) solutions are well-approximated over long time scales (order of 1/ϵ³) by solutions of KdV equations. The relative ℓ²-error made by the approximation in this case is O(ϵ). See [26] for the earliest formal derivation and [23] for the first rigorous result.

¹We write the equations as a first-order system as opposed to the possibly more familiar second-order form m(j)ẍ(j) = V′(x(j + 1) − x(j)) − V′(x(j) − x(j − 1)). The change of variables leading from this to (1.1) is q(j) = x(j + 1) − x(j) and p(j) = ẋ(j).

²This choice of the spring potential, which is an instance of the "α-potential" from [9], is made mainly for simplicity. We could allow more complicated potentials and, so long as we had V′(0) > 0 and V′′(0) ≠ 0, only minor changes to our results would occur.
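The change of variables mentioned in the footnote can be checked directly; with q(j) = x(j + 1) − x(j) and p(j) = ẋ(j), the second-order equations become a first-order system, which is what we expect (1.1) to be (a sketch, since the display of (1.1) is not reproduced here):

```latex
\dot q(j) = \dot x(j+1) - \dot x(j) = \delta^+ p(j), \qquad
m(j)\,\dot p(j) = V'(q(j)) - V'(q(j-1)) = \delta^-\bigl(V'(q)\bigr)(j).
```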
While there have been a few derivations of KdV from random versions of the FPUT lattice previously (specifically [14, 25]), all have been purely formal with no rigorous quantitative results. Even conjectures for the size of the error have been absent. We have been working for some time to remedy this. In our article [20] we discovered that if the m(j) are independent identically distributed (i.i.d.) random variables then the accuracy of long-wavelength approximations is substantially diminished and, consequently, only shorter time scales and the linear problem (that is, when V′(q) = q) are within reach. Precisely, we showed³ that long-wavelength solutions (again like 1/ϵ) converge almost surely and strongly, but rather slowly, to solutions of a wave equation on time scales on the order of 1/ϵ: the relative ℓ²-error made by the approximation is almost surely O(ϵ ln | ln(ϵ)|). Numerics indicate that our error estimate is close to sharp. In [19], McGinnis proved a similar result for a 2D lattice. Furthermore, formal and numerical studies of random FPUT and other similar random lattice problems report that the waves in such systems experience a notable deterioration of their amplitude as time evolves (see, for instance, [11, 15, 16, 18]). We have carried out our own simulations of the nonlinear problem (1.1) with i.i.d. random variables as coefficients in the long-wavelength/small-amplitude regime. These simulations demonstrate that for time scales longer than 1/ϵ, solutions of (1.1) attenuate substantially; KdV-like dynamics (namely, resolution into solitons) is not observed. We include the results of our simulations below in Section 7, Figure 1. In short, we do not believe that when the coefficients are i.i.d. random variables a KdV approximation is appropriate or possible.
However, there are more sorts of randomness than simply taking the coefficients to be i.i.d. In this paper we consider the random case, but we restrict the randomness in such a way that we can prove a fully rigorous KdV approximation. We believe that this is the first example of such a result involving randomness and nonlinear dispersive systems, though there are several earlier results which carefully derive, but do not fully justify, KdV as an effective equation for the evolution of long water waves over randomly varying topography [22, 4, 6, 5].
Here is our assumption on the masses:

Hypothesis 1.1. The masses are given by
(1.3) m(j) = 1 + δ⁺δ⁻ζ(j),
where ζ(j), j ∈ Z, are i.i.d. random variables with zero mean, variance σ² and support contained in (−1/4, 1/4).
We refer to (1.3) as a transparency condition and we call (1.1) subject to Hypothesis 1.1 the transparent random mass FPUT lattice. The use of "transparent" here is due to an observation from simulations: if the masses meet this condition then waves propagate relatively cleanly through the lattice without too much "back scattering" or "internal reflection." Our idea for making this choice was inspired by the derivation of KdV as an effective equation for water waves over a random bottom in [22], where the topography is given as a perfect spatial derivative. The condition on the support of ζ(j) is there to ensure that the m(j) are strictly positive (for if |ζ(j)| < 1/4 then the triangle inequality tells us m(j) > 0). It also guarantees that σ² < ∞.
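To make Hypothesis 1.1 concrete, here is a sketch in Python, assuming (as the triangle-inequality remark suggests) that the transparency condition (1.3) reads m(j) = 1 + δ⁺δ⁻ζ(j); the uniform distribution below is an illustrative choice, not part of the hypothesis.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# i.i.d. zeta(j), mean zero, support inside (-1/4, 1/4); uniform is one choice.
zeta = rng.uniform(-0.25, 0.25, size=N + 2)

# Assumed form of (1.3): the mass perturbation is a perfect second difference,
# m(j) = 1 + (delta^+ delta^- zeta)(j) = 1 + zeta(j+1) - 2*zeta(j) + zeta(j-1).
m = 1.0 + zeta[2:] - 2.0 * zeta[1:-1] + zeta[:-2]

# |zeta| < 1/4 and the triangle inequality force m(j) > 0.
assert m.min() > 0.0

# The perturbation telescopes: any partial sum of m(j) - 1 collapses to a
# difference of two values of delta^+ zeta, hence is bounded by 1; i.i.d.
# mass perturbations would instead have sqrt(length)-size partial sums.
partial_sums = np.cumsum(m - 1.0)
print(np.abs(partial_sums).max())  # bounded by 1, uniformly in N
```

This telescoping is one heuristic for the "transparency" of the lattice: block averages of the mass perturbation are as small as possible.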
Our main result in a nutshell: for suitable initial conditions, solutions of the transparent random mass FPUT lattice almost surely satisfy the estimate (1.4) for |t| ≤ T₀/ϵ³, where A and B solve KdV equations. The fully technical version of our result appears in Theorem 6.1 below.
Remark 1.2. To the uninitiated, it may look like the size of the error exceeds the size of the approximation. However, the long-wave scaling of the spatial coordinate gives ∥F(ϵ·)∥_{ℓ²} = O(ϵ^{−1/2})∥F∥_{L²}, so the ℓ²-size of the approximation itself is O(ϵ^{3/2}) rather than O(ϵ²).

Our paper is organized as follows. Section 2 spells out some notation and other ground rules. Section 3 proves a general approximation theorem for (1.1). While motivated by KdV approximations, the result applies more broadly. Section 4 contains the derivation of KdV from (1.1) under Hypothesis 1.1; this is the heart of the paper. Section 5 contains a multitude of estimates which set the stage for the application of the general approximation theorem. It is in this section where probability plays a major role and where the technical guts of our result live. Section 6 ties everything together with the statement and proof of our main result, the technical version of (1.4). Then we present the results of supporting numerics in Section 7. We close out with a big list of open questions in Section 8.

Acknowledgements: The authors would like to thank Amanda French, C. Eugene Wayne and Atilla Yilmaz for helpful conversations related to this project. Also, JDW would like to recognize the National Science Foundation, which supported this research with grant DMS-2006172.

Preliminaries
2.1. Function/sequence spaces. For a doubly infinite sequence f : Z → R we put, as per usual, ∥f∥_{ℓ²} := (Σ_{j∈Z} f²(j))^{1/2} and ∥f∥_{ℓ∞} := sup_{j∈Z} |f(j)|. Of course ℓ² and ℓ∞ are the sets of all sequences for which the associated norms are finite. If we write ∥f, g∥_{ℓ²} we mean ∥f∥_{ℓ²} + ∥g∥_{ℓ²}, the norm on ℓ² × ℓ². The analogous convention applies to ∥f, g∥_{ℓ∞}. For functions F : R → R and non-negative integers n and r we define a weighted Sobolev norm, and H^n(r) is the closure of the set of all smooth functions with respect to this norm. We define H^n := H^n(0) and L²(r) := H⁰(r); W^{n,∞} is the function space associated to the analogous L∞-based norm. By L∞ we mean W^{0,∞}. All of the spaces listed above are Banach spaces.
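A small numerical illustration, with a hypothetical localized profile F, of the long-wave scaling fact that recurs throughout the paper (compare Remark 1.2 and Lemma 5.6): sampling a function at spacing ϵ inflates its ℓ²-norm by a factor of ϵ^{−1/2}.

```python
import numpy as np

# Long-wave substitution u(j) = F(eps * j): the sum defining ||u||_{l2}^2 is a
# Riemann sum, sum_j F(eps*j)^2 ~ eps^{-1} * integral of F^2, so the l2 norm of
# the sampled sequence is roughly eps^{-1/2} times the L2 norm of F.
F = lambda x: 1.0 / np.cosh(x)**2   # hypothetical smooth, localized profile

x = np.linspace(-50.0, 50.0, 400001)
norm_L2 = np.sqrt(np.sum(F(x)**2) * (x[1] - x[0]))

ratios = []
for eps in (0.1, 0.01):
    j = np.arange(-int(50 / eps), int(50 / eps) + 1)
    norm_l2 = np.sqrt(np.sum(F(eps * j)**2))
    ratios.append(np.sqrt(eps) * norm_l2 / norm_L2)
print(ratios)  # both ratios are close to 1
```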
We at times default to "big C" notation: if we simply write f(ϵ) ≤ Cg(ϵ) and omit qualifiers then we mean f(ϵ) = O(g(ϵ)). Some quantities will depend on the random variables ζ(j). For such quantities, if we write f(ϵ) = O(g(ϵ)) we mean this in an almost sure sense: the constants involved depend on the realization but are almost surely finite. To be clear: we always mean O, o and their ilk rigorously, and we always mean them in the almost sure sense.

Approximation in general
We begin by proving a general approximation theorem using the strategy described in Section 5.3 of [10] (itself inspired by [23]). Suppose that for ϵ ∈ (0, 1) we have some functions q_ϵ(j, t) and p_ϵ(j, t) (the approximators) that we expect are good approximations to solutions of (1.1) when ϵ is small. By this we mean that we know that q_ϵ(j, t) and p_ϵ(j, t) nearly solve (1.1) in the sense that the residuals (3.1) are small relative to ϵ. To validate the approximation over the time scale |t| ≤ T₀/ϵ³ we need information about the quantities α₁(ϵ) and α₂(ϵ) defined at (3.2). In particular, we assume (3.3). Our goal is to show that if we have approximators with these features then the true solution of (1.1) whose initial conditions are consistent with those of the approximators remains close over the long time scale. The result we prove here is specialized to FPUT lattices where the spring potentials are homogeneous and of the form (1.2), but requires only the non-degeneracy condition (3.4) on the masses. The condition on the support of ζ(j) in Hypothesis 1.1 implies (3.4), though we do not require all of that hypothesis in this section.
Here is the result:

Theorem 3.1. Suppose that the mass coefficients satisfy (3.4), the approximators q_ϵ(j, t) and p_ϵ(j, t) meet (3.3), and the initial conditions for (1.1) satisfy a smallness condition on ∥q(0) − q_ϵ(0), p(0) − p_ϵ(0)∥_{ℓ²}. Then the solution (q(t), p(t)) of (1.1) satisfies the absolute error estimate as well as the relative error estimate.

Proof. We introduce the errors η := q − q_ϵ and ξ := p − p_ϵ, where q(j, t) and p(j, t) solve (1.1). Time differentiation of these expressions together with (1.1) and some algebra gets us the error equations (3.5); the right-hand side involves the combination V′(q_ϵ + η) − V′(q_ϵ) = (1 + 2q_ϵ)η + η². Now we define the energy functional: In the above, (u, v) = (u(j), v(j)) is in ℓ² × ℓ². Under our assumptions, the square root of this quantity is equivalent to the ℓ² × ℓ² norm in the following sense: there exist ϵ* ∈ (0, 1) and constants for which (3.6) holds. Here are the details. First of all, simple estimation gives, thanks to (3.4), that the square root of Σ_{j∈Z} ½m(j)v(j)² is equivalent to ∥v∥_{ℓ²}. This gives the "v" part of (3.6).
For the next step, we suppose that η(j, t) and ξ(j, t) solve (3.5) and put H(t) := H(η(t), ξ(t); t). Differentiation of H(t) with respect to t gives: Using (3.5) (and suppressing some dependencies) results in: We sum by parts and terms cancel: Subsequently, Cauchy-Schwarz and the like get us: One easily computes that ∂_bW(η, q_ϵ) = η². In which case we conclude, using the earlier formula for W′ and routine estimates, that Ḣ is controlled by ∥ξ∥_{ℓ²} times the residuals and quadratic remainders. Since (d/dt)H^{1/2} = Ḣ/(2H^{1/2}), the above can be recast as a differential inequality for H^{1/2}. We have assumed α₂(ϵ) = O(ϵ³), so the above implies, for a constant C₂ > 0:
An application of Grönwall's inequality gets us: Using (3.6) again: We take the supremum of this over |t| ≤ T₀/ϵ³ and get: The constant C⋆ > 0 is independent of ϵ.
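For the reader's convenience, the Grönwall step here has the familiar schematic shape, with y standing in for H^{1/2} and with a and b of size O(ϵ³) per the preceding inequality (a sketch; the precise constants are those derived above):

```latex
\dot y \le a\,y + b, \quad y \ge 0
\qquad\Longrightarrow\qquad
y(t) \le e^{a|t|}\,y(0) + \frac{b}{a}\bigl(e^{a|t|}-1\bigr).
```

Since a = O(ϵ³) and |t| ≤ T₀/ϵ³, the factor e^{a|t|} stays bounded uniformly in ϵ, which is precisely why the assumption α₂(ϵ) = O(ϵ³) produces a constant independent of ϵ over the long time scale.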
In conclusion, under the stated assumptions we have shown: This is the absolute error estimate. As for the relative error, a standard reverse triangle inequality argument shows that: □

Derivation of the effective equations
Now that we have Theorem 3.1, we can move on to deriving the KdV equations from (1.1). The procedure for the derivation is a multiple scales expansion, inspired by [3]. We assume the form (4.1) for our approximators, where the Q_n = Q_n(j, X, τ, T) and P_n = P_n(j, X, τ, T) are maps from Z × R × R × R to R. Of course we are viewing ϵ as being small. Given that we put X = ϵj in q_ϵ and p_ϵ, we think of X as the long-wave length scale and j as the microscopic length scale.
For expansions of the sort we are carrying out, it pays to be organized at the outset. First we define, for functions U = U(j, X), the operators δ⁺_jU(j, X) := U(j + 1, X) − U(j, X) and δ⁻_jU(j, X) := U(j, X) − U(j − 1, X). These are partial shifts and partial finite-differences with respect to j. Next, for ϵ > 0 put D⁺U(j, X) := U(j + 1, X + ϵ) − U(j, X) and D⁻U(j, X) := U(j, X) − U(j − 1, X − ϵ). If u(j) = U(j, ϵj) then δ±u(j) = D±U(j, ϵj). That is to say, D± are the total finite-difference operators.
Expanding the right-hand sides of D±U(j, X) in (formal) Taylor series with respect to ϵ and truncating the sum at n = M would give a formal error on the order of ϵ^{M+1}, and so we define operators E±_M which give the exact error made by such a truncation. Note that for M = 0 we just ignore the sum, i.e. ϵE±₀ := D± − δ±_j. If we plug (4.1) into the residuals (3.1) and carry out some substantial algebra, we find that Res₁ and Res₂ expand in powers of ϵ with coefficients Z₁ₖ and Z₂ₖ and with remainders W₁ and W₂ given at (4.2). The usual way to proceed is to select the Q₀, P₀, . . ., Q₃, P₃ so that each Z₁ₖ = Z₂ₖ = 0. In this case we would have Res₁ = ϵ⁶W₁ and Res₂ = (1/m)ϵ⁶W₂, which we can then estimate using the formulas for the Q_n and P_n. This strategy works perfectly well in the homogeneous and periodic problems, as all the terms are rigorously the size they formally appear to be, modulo an annoying factor of ϵ^{−1/2} caused by the long-wave scaling. But it fails in the random problem; the randomness leads to terms which are much larger than they appear.
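The truncation operators E±_M can be sanity-checked numerically. A sketch, with a hypothetical smooth profile U and the M = 1 truncation of D⁺ (so the remainder should be O(ϵ²)):

```python
import numpy as np

# Hypothetical smooth test function U(j, X) and its exact X-derivative.
U  = lambda j, X: np.cos(0.3 * j) * np.exp(-X**2)
UX = lambda j, X: np.cos(0.3 * j) * (-2.0 * X) * np.exp(-X**2)

def Dplus(j, X, eps):
    # Total difference: D+ U(j, X) = U(j+1, X+eps) - U(j, X).
    return U(j + 1, X + eps) - U(j, X)

def truncation_error(eps, j=5, X=0.3):
    # Truncate the Taylor expansion in eps after the first-order term:
    # D+ U = delta_j^+ U + eps * dX U(j+1, X) + (remainder of size eps^2).
    approx = (U(j + 1, X) - U(j, X)) + eps * UX(j + 1, X)
    return abs(Dplus(j, X, eps) - approx)

e1, e2 = truncation_error(1e-2), truncation_error(5e-3)
print(e1 / e2)  # close to 4: halving eps quarters the error, so it is O(eps^2)
```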
Our modified strategy is to solve Z₁ₖ = Z₂ₖ = 0 for k = 2, 3, 4 (which will largely determine Q₀, P₀, . . ., Q₂, P₂) and then to do "something else" for Z₁₅ and Z₂₅. At the end of this, we find the final form of the residuals. In Section 5 we show that these are O_{ℓ²}(ϵ⁵|ln(ϵ)|). This is enough to apply Theorem 3.1 and get the error estimates shown in (1.4).

4.1. A tutorial on solving Z₁ₖ = Z₂ₖ = 0. Each pair of equations Z₁ₖ = Z₂ₖ = 0 will have the form (4.3), a linear combination of products f_n(j)F̄_n(X, τ, T) and g_n(j)Ḡ_n(X, τ, T). The sequences f_n(j) and g_n(j) are mean-zero random variables which come, in one way or another, from ζ(j); they depend only on the microscale coordinate. The F̄_n and Ḡ_n functions do not depend on the microscale coordinate at all. They will be made up of pieces of the various P_n and Q_n where n < k − 2. In this way (4.3) allows us to figure out P_{k−2} and Q_{k−2} from the earlier functions.
We decompose (4.3) into a "long-wave" part (those pieces that do not depend on the microscale coordinate j at all) and a "microscale" part (those that do). The long-wave part just consists of the terms F̄₀ and Ḡ₀, and so we set
(4.4) F̄₀ = 0 and Ḡ₀ = 0.
This is a sort of solvability condition that will wind up giving us the long-wave dynamics; how it all plays out will be seen when we get in the weeds below. The microscale part is what is left over. We can just write down a solution for this: sums of terms χ_n(j)F̄_n(X, τ, T) and κ_n(j)Ḡ_n(X, τ, T), where we select χ_n and κ_n so that the difference equations in (4.6) hold. Solving these equations for χ_n and κ_n from f_n and g_n is one of the key steps in the whole procedure and, as we shall show, the transparency condition makes this a relatively easy affair...at least at first. The functions P̄_{k−2}(X, τ, T) and Q̄_{k−2}(X, τ, T) are "constants of integration"; in most cases we determine these from (4.4) at a later point in the derivation. Now we get into actually solving the equations.
Remark 4.1. In this section any function with a "bar" on top will not depend on j. We make this convention so that we do not need to perpetually clutter up our algebra with functional dependencies. For the same reason it is helpful to keep in mind that m and ζ depend only on j and not on the other variables.
4.3. Z₁₃ = Z₂₃ = 0. Using (4.7), these equations become: Using the transparency condition (1.3) converts these to: Following the steps from the tutorial in Section 4.1, we see that the long-wave part (4.4) of these equations is: This is the wave equation wearing a fake mustache and glasses, and we readily solve it:

Remark 4.2. We use the convention that w = X − τ and l = X + τ, so that A = A(w, T) and B = B(l, T). Note that A and B do not depend on j. It is these functions that will ultimately solve KdV equations.
After (4.8) we are left with the microscale part: The solution formula (4.6) gives Q₁ = Q̄₁ + χ∂_τP̄₀, where we want δ⁻χ = δ⁺δ⁻ζ. Finding χ is easily done, as we can simply cancel a δ⁻ from both sides and put χ = δ⁺ζ. This is so simple because of the transparency condition (1.3), and this is one of the reasons we have assumed it. Likewise (4.6) says that we should put P₁ = P̄₁, but it will turn out that P̄₁ will be zero so we just enforce that now. In short we have (4.10). Note that δ⁺_jζ is bounded in j because of the compact support assumption in Hypothesis 1.1.
Remark 4.3. What if we had not made the transparency assumption but instead assumed that m(j) = 1 + z(j), where the z(j) are i.i.d. mean-zero random variables? The long-wave part is the same as above but now the microscale part is δ⁻_jQ₁ = z∂_τP̄₀. To use the solution formula (4.6) we would want to find χ so that δ⁻χ = z. This equation tells us that χ(j) is a random walk with steps given by z(j) and as such we expect χ to grow like √j. To see why this is an issue, notice that Q₁ would include the term χ(j)A_w(X − τ, T), which then would show up in the approximator (4.1) as ϵ³χ(j)A_w(ϵ(j − t), ϵ³t). The term A_w is propagating to the right with roughly unit speed and thus when t ∼ 1/ϵ³ will be located at j ∼ 1/ϵ³. In turn this indicates χ(j) ∼ ϵ^{−3/2} towards the end of the approximation time interval. Hence the term ϵ³χA_w would be substantially larger than it appears: the techniques from [20] show that almost surely it is of ℓ²-size roughly ϵ³ · ϵ^{−3/2} · ϵ^{−1/2} = ϵ, up to logarithms. Were χ = O_{ℓ∞}(1), the right-hand side of the preceding estimate would instead be Cϵ^{5/2} (see Lemma 5.6 below). And so we find that the "ϵ³ term" in the approximator is more than an order of magnitude larger than it should be, bigger in fact than the leading order term in the approximation. Disaster!
The lesson learned: if a term in our approximation involves a random walk it will ultimately be at least ϵ −3/2 larger than it formally appears to be.We call this difficulty a random walk disaster.
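A toy computation contrasting the two scenarios of Remark 4.3 (a sketch; zeta plays the role of the lattice randomness and the distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.01
N = int(1 / eps**3)  # the wave reaches j ~ 1/eps^3 by the end of the time interval

zeta = rng.uniform(-0.25, 0.25, size=N + 1)  # i.i.d., mean zero

# Without transparency: delta^- chi = zeta makes chi a random walk,
# growing like sqrt(j) (so ~ eps^{-3/2} at j ~ 1/eps^3).
chi_walk = np.cumsum(zeta)

# With transparency (1.3): chi = delta^+ zeta is bounded uniformly in j.
chi_transparent = zeta[1:] - zeta[:-1]

print(np.abs(chi_walk).max())         # grows with N: hundreds at this N
print(np.abs(chi_transparent).max())  # at most 0.5, independent of N
```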
4.4. Z₁₄ = Z₂₄ = 0. The relations (1.3), (4.7), (4.8), (4.10) and a little algebra convert these equations to: The long-wave part of this is: which can be solved by putting: This leaves the microscale part (4.12), which, as per (4.6), we solve by taking: Once again, the transparency condition (1.3) made finding this solution a simple matter of cancelation; it is the reason why the transparency condition has two finite-differences on ζ. If we had put only one finite-difference in (1.3), then another random walk disaster as described in Remark 4.3 would occur when we solve (4.12).
4.5. Z₁₅ and Z₂₅. Recall that σ² is the variance of ζ(j). We need Z₁₅ and Z₂₅ to be small relative to ϵ, but zeroing them out completely happens to be too restrictive; we will need to modify the microscale part of the decomposition described in Section 4.1. But before we get there, we deal with the long-wave part.
4.5.1. Kill the long-wave part with KdV equations. As per normal, we zero out the long-wave parts of Z₁₅ and Z₂₅. We have conveniently arranged all such terms in the first line on the right of the preceding formulas for Z₁₅ and Z₂₅, and so we put (4.14). Within (4.14) lurk the KdV equations; here is how we coax them into the daylight. Let: and use (4.9) in (4.14) to get: Subtracting these gives (4.15). If we let ℬ be an l-antiderivative of B (specifically ℬ(l, T) := ∫₀ˡ B(y, T)dy) and set (4.16), many terms in (4.15) die. What survives is: This is a KdV equation! A parallel argument (after adding instead of subtracting the equations a few steps above) shows we should take (4.18), with 𝒜 a w-antiderivative of A (specifically 𝒜(w, T) := ∫₀ʷ A(y, T)dy). In which case we get that B solves another KdV equation: To summarize: taking A, B, A₂ and B₂ as we have just described means that (4.14) is satisfied.

4.5.2. Handle the microscopic part using autoregressive processes. The next step in dealing with Z₁₅ and Z₂₅ is to control the microscopic parts that are left over after (4.14): Many, but not all, of these terms in Z₂₅ can be eliminated with the same cancelation tricks that worked earlier. To see this, we let: where for the moment we leave γ₁ = γ₁(j) and γ₂ = γ₂(j) unspecified. Substituting the above into (4.20) gives (4.21). If we followed the strategy from the tutorial in Section 4.1, we would choose γ₁ and γ₂ to zero out the remaining terms and get Z₁₅ = Z₂₅ = 0.
Since the ζ(j) are i.i.d. random variables, we would then find that γ₁(j) and γ₂(j) are random walks, which leads us to another disaster as described in Remark 4.3 (this time in the residual terms). Why not just stack another finite-difference on ζ in the transparency condition? This would help in Z₁₅, but would not be useful in handling the parts stemming from ζδ⁺_jδ⁻_jζ in Z₂₅. To avoid these problematic random walks we take γ₁ and γ₂ to solve (4.22). In which case we find that (4.21) becomes: The extra factors of ϵ on the right-hand sides here mean that our choices of P₃ and Q₃ are formally as good as putting Z₁₅ = Z₂₅ = 0. Estimates for P₃ and Q₃ (and consequently Z₁₅, Z₂₅ and the residuals) ultimately require us to understand γ₁ and γ₂. The equations in (4.22) are examples of autoregressive processes [12]. These are dissipative cousins of random walks and, with classical probabilistic methods, we will show that they roughly cost us a factor of ϵ^{−1/2} (see Lemma 5.10 below) instead of the ϵ^{−3/2} we get from using random walks. This is big, but not too big for our estimates to handle.

4.6. Summing up. At this point we have completely determined all the functions P₀, . . ., Q₃ in the approximation. As it can be challenging to sort through it all, we close out this section by summarizing the derivation.

Definition 4.4. Suppose A(w, T) and B(l, T) solve the KdV equations (4.17) and (4.19), and γ₁(j) and γ₂(j) solve the autoregressive processes (4.22). Take A₂(w, l, T) and B₂(w, l, T) as in (4.16) and (4.18). Define Q_k(j, X, τ, T) and P_k(j, X, τ, T) via the formulas above, where it is understood that w = X − τ and l = X + τ. Then we call q_ϵ(j, t) and p_ϵ(j, t), as defined in (4.1), the extended KdV approximators.
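The dissipative mechanism can be seen in a toy computation (a sketch: the recursion below has the AR(1) structure attributed to (4.22), with θ = 1/(1 + ϵ) as in the proof of Lemma 5.10, but its exact right-hand side is an assumption made for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.01
N = int(1 / eps**3)
z = rng.uniform(-0.25, 0.25, size=N)  # i.i.d., mean zero

# Schematic AR(1) recursion: gamma(j+1) = gamma(j)/(1+eps) + z(j).
theta = 1.0 / (1.0 + eps)
gamma = np.empty(N + 1)
gamma[0] = 0.0
for j in range(N):
    gamma[j + 1] = theta * gamma[j] + z[j]

# The non-dissipative alternative: a plain random walk with the same steps.
walk = np.cumsum(z)

print(np.abs(gamma).max())  # ~ eps^{-1/2} scale, up to logarithms
print(np.abs(walk).max())   # ~ eps^{-3/2} scale: orders of magnitude bigger
```

The stationary variance of the AR(1) process is of size 1/(1 − θ²) ≈ 1/(2ϵ), which is where the ϵ^{−1/2} in Lemma 5.10 comes from.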
In this section we have proven:

Lemma 4.5. The extended KdV approximators have residuals: with W₁ and W₂ given at (4.2).
We move on to proving many estimates related to the extended KdV approximators.

Estimates on the approximators and residuals
To streamline some of the forthcoming statements we put forth the following convention:

Definition 5.1. We say A and B are good solutions of KdV on [−T₀, T₀] if they satisfy (4.17) and (4.19) along with the estimate:

Remark 5.2. The existence of good solutions of KdV on intervals of arbitrary length is by now classical (see [24]). The lower bound is just to guarantee that the approximation is not trivial.
In this section we prove:

Proposition 5.3. Assume Hypothesis 1.1. Let q_ϵ(j, t) and p_ϵ(j, t) be the extended KdV approximators as in Definition 4.4, where we further assume that A and B are good solutions of KdV on [−T₀, T₀]. Then almost surely the quantities defined at (3.2) satisfy:

Estimates on terms which do not involve γ₁ or γ₂ can be handled using well-understood techniques found in previous works, whereas the rest require new ideas. All dependence on γ₁ and γ₂ enters through P₃ and Q₃, the latter of which has some terms without them. And so we put (5.1). To be clear, Q₃₀ has no instances of a γ within.
Similarly, if in the formulas for W₁ and W₂ we eliminate any term with a γ in it we get: Thus the terms with a γ are: (5.2)

5.1. Terms without γ₁ and γ₂. In this part we prove:

Lemma 5.4. Assume Hypothesis 1.1. Let q_ϵ(j, t) and p_ϵ(j, t) be the extended KdV approximators as in Definition 4.4, where we further assume that A and B are good solutions of KdV on [−T₀, T₀]. Then:

Remark 5.5. Note that in Hypothesis 1.1 we assumed that |ζ(j)| < 1/4 for all j. A consequence of this is that none of the estimates in Lemma 5.4 depend on the realization of the ζ(j). That is to say, no probability is needed to understand this lemma.
Proof. The proof is similar to that of Proposition 4.2 of [10], though there are a few small, but substantive, differences. The main tool we need is:

Lemma 5.6.

Proof. Lemma 4.3 of [10] is nearly identical to this, but has the requirement that f(j) be N-periodic. Still, we can piggyback the proof of our result on that one. The first estimate is all but obvious. For the second we have the easy estimate ∥u∥_{ℓ²} ≤ ∥f∥_{ℓ∞}∥F_ϵ∥_{ℓ²}, where F_ϵ(j) := F(ϵj), j ∈ Z. But then the second estimate of Lemma 4.3 of [10] applies and shows ∥F_ϵ∥_{ℓ²} ≤ Cϵ^{−1/2}∥F∥_{H¹}. For the third, a direct computation shows that: The third estimate from Lemma 4.3 of [10] then finishes the job; the estimate for E⁻_M is similar. The final estimate, for D±u, follows from the definition of D±, the triangle inequality and the second estimate in this lemma. □

We also need the following, to control the antiderivatives in A₂ and B₂:

Lemma 5.7.

Proof. We use Cauchy-Schwarz and the fact that ∫_R (1 + y²)^{−1}dy = π. To wit:
Taking the supremum over X seals the deal. □

Armed with Lemmas 5.6 and 5.7, we can get into proving the estimates in Lemma 5.4. There are many terms, and handling each would inflate this paper like a bounce house. So we do not do that. Instead we show how to estimate a "prototype" term which captures the nuances. That term is g = E⁻₀(δ⁺_jζ A∂²_lB), which some digging will show appears in Res₂. Using the estimate for E⁻₀ from Lemma 5.6 we have ∥g∥_{ℓ²} ≤ Cϵ^{−1/2}∥δ⁺_jζ∥_{ℓ∞}∥A∂²_lB∥_{H¹}. By the triangle inequality and the definition of δ⁺_j we have ∥δ⁺_jζ∥_{ℓ∞} ≤ 2∥ζ∥_{ℓ∞}, and the supposition that the support of ζ(j) is in (−1/4, 1/4) ultimately gives ∥δ⁺_jζ∥_{ℓ∞} ≤ 1/2. Also, classical Sobolev-Hölder inequalities control the product norm ∥A∂²_lB∥_{H¹}. Likewise, Sobolev's inequality controls the relevant L∞ norms. Since A and B are assumed to be good solutions of KdV on [−T₀, T₀], we get sup_{|t|≤T₀/ϵ³} ∥g∥_{ℓ²} ≤ Cϵ^{−1/2}, which is the targeted estimate.
All the other terms are handled using the same sorts of steps used above. We close the proof with a comment on the regularity needed. The most smoothness required of A and B comes from the terms in ∂_TQ₃₀. As in [23, 2, 10], one finds that ∂⁶_wA and ∂⁶_lB make an appearance and so, to deploy estimates like those in Lemma 5.6, we need A and B to be in H⁷. □

5.2. The autoregressive part. Now we need to put bounds on terms where γ₁ and γ₂ appear. The first question: how big are these sequences? The equations in (4.22) which these satisfy are examples of autoregressive models, specifically AR(1) processes [12]. We have the following almost sure estimate for solutions of such processes:

Lemma 5.8. Suppose that z(n), n ≥ 0, are i.i.d. random variables with zero mean and compact support. Fix θ ∈ (−1, 1) and let
(5.3) y(n) := Σ_{k=0}^{n−1} θᵏ z(n − k).
Then there exists a constant C > 0 so that |y(n)| ≤ C(1 − θ²)^{−1/2}√(ln(e + n)) for all n ≥ 0. The constant C depends on the realization of the z(n) but does not depend on θ; it is almost surely finite.
Proof. The result is a consequence of Hoeffding's inequality, whose proof can be found in [13]:

Theorem 5.9. Let w(0), . . ., w(n − 1) be mean-zero, independent random variables with |w(k)| ≤ b_k and set S := Σ_{k=0}^{n−1} w(k). Then for any μ ≥ 0,
P(|S| ≥ μ) ≤ 2 exp(−μ²/(2Σ_{k=0}^{n−1} b_k²)).

We apply this to (5.3); let w_n(k) := θᵏz(n − k). Since E[z(j)] = 0 we have E[w_n(k)] = 0 for all choices of n and k. Since the z(j) are independent it follows that, for fixed n, the w_n(k) are independent with respect to k. The support of z(j) is compact, so there is a ≥ 0 for which the support lies in [−a, a]. Then the support of θᵏz(n − k) is in [−a|θ|ᵏ, a|θ|ᵏ]. Thus w_n(0), . . ., w_n(n − 1) pass the hypotheses of Theorem 5.9 with b_k = a|θ|ᵏ and we have: Since Σ_{n≥0} 2/(e + n)² is finite, the Borel-Cantelli lemma [8] finishes the job. □

Lemma 5.10. Take Hypothesis 1.1 as given. Suppose that γ₁(j) and γ₂(j) solve (4.22) and γ₁(0) = γ₂(0) = 0. Then there exists a constant C > 0 such that for all ϵ ∈ (0, 1) we have the bound. The constant C depends on the realization of the ζ(j) but is almost surely finite.
Proof. We prove the estimate for γ₂, as the one for γ₁ is similar but easier. Taking j > 0 in the second equation in (4.22) gives: or rather: If we take γ₂(0) = 0, then we can find γ₂(j) (for j > 0) from the above by iteration. In particular we have γ₂(j) = γ₂₁(j) + γ₂₂(j) + γ₂₃(j), where we have put θ_ϵ := 1/(1 + ϵ). To be clear, γ₂₁(j), γ₂₂(j) and γ₂₃(j) correspond to the three sums in the order of their appearance.
The random variables ζ(j) meet the hypotheses of Lemma 5.8, and so we can apply the results to γ₂₁(j) and γ₂₂(j) forthwith to get: Dealing with γ₂₃(j) is a bit more complicated because the summands are not independent. We have: From this we see that v(j) and v(j + 1) are dependent. As are v(j) and v(j + 2), since ζ(j + 1) appears in both. But v(j + 3) and v(j) have no terms in common, and it follows that they are independent. Thus {v(3l)}_{l∈Z} is an i.i.d. collection of random variables. As are {v(3l + 1)}_{l∈Z} and {v(3l + 2)}_{l∈Z}. We break up γ₂₃(k) accordingly: Each of the three sums passes the hypotheses of Lemma 5.8, though there are some small subtleties. We estimate the first, as the others are all but the same. Put k = 3l to find: Then we have from Lemma 5.8: This, along with the fact that ln is an increasing function, gives: which in turn leads to the estimate we are after.

We need estimates for γ₂(j) when j < 0 too. If we take j < 0 in the second equation of (4.22) we get: We rearrange this: As we have taken γ₂(0) = 0, the above formula gives us γ₂(−1) and, more generally, γ₂(j), j < 0, by iteration. For j = −l < 0 we obtain: where ϑ_ϵ := 1 − ϵ. The first two sums pass the hypotheses of Lemma 5.8, and the third is handled with the same splitting trick as before. That completes the proof of Lemma 5.10. □

Next we prove the main workhorse lemma for controlling γ terms in our approximation:

Lemma 5.11. For k = 1, 2: (The choices for + or − in E±₀ and F(ϵ(• ± t), ϵ³t) are not linked.) The constant C > 0 is almost surely finite.
Note that if 0 ≤ y < t then the left-hand side of (5.7) is negative whereas the right-hand side is positive. So there can be no solutions with y < t, and this implies the left-hand inequality in (5.6).
The estimate (5.5) follows from (5.4) with a few tricks. First we have, by direct calculation: If we put I_ϵG(X) := ϵ^{−1}∫_X^{X+ϵ} G(Y)dY, then the Fundamental Theorem of Calculus and the definition of E⁺₀ tell us E⁺₀F = I_ϵ∂_wF. One can show (see the argument that leads to equation (3.4) in [20]) that ∥I_ϵG∥_{H^n(r)} ≤ C∥G∥_{H^n(r)}. Putting it all together, we get the claimed supremum bound.

□
Now we can control all the γ-dependent terms.
Lemma 5.12. Assume Hypothesis 1.1. Let q_ϵ(j, t) and p_ϵ(j, t) be the extended KdV approximators as in Definition 4.4, where we further assume that A and B are good solutions of KdV on [−T₀, T₀]. Then almost surely:

Proof. The estimates for P₃ and Q_{3γ} are immediate from Lemma 5.11 and their definitions.
A direct calculation shows that: Each of these can be estimated with Lemma 5.11 as well. Another calculation gives: The first line of the above we estimate with Lemma 5.11. The ones in the second line all hinge on estimating terms of the form D⁻(Q_{3γ}Q_l) for different choices of l. The definition of D⁻ and the triangle inequality give: At this point the remainder of the estimates follow from earlier estimates on the components Q_k and bookkeeping. □

5.3. Finishing up. We are now in position to prove Proposition 5.3.
To prove the estimate for α₂(ϵ), from (4.1) we have: All terms appearing in h_ϵ have been estimated in one place or another previously, and each is O_{ℓ²}(ϵ^{−1/2}) at worst, so that we get sup_{|t|≤T₀/ϵ³} ∥ϵ⁴h_ϵ∥_{ℓ²} = O(ϵ^{7/2}). On the other hand, using the first estimate in Lemma 5.6 shows that: So all told we have α₂(ϵ) = O(ϵ³).

The main event
Now we can state and prove our main theorem in full detail.

Numerics
In this section we report the outcomes of a variety of numerical simulations of solutions of (1.1). In all cases our methodology is to truncate (1.1) to |j| ≤ M, where M ≫ 1, and enforce periodic boundary conditions (M is always taken to be so incredibly vast that the solutions are never large anywhere near the edges of the computational domain). The resulting system is a large finite-dimensional ODE which we solve with a standard RK4 algorithm. This is essentially the same method as used in [10, 20]. The calculations were performed in MATLAB.
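The methodology just described can be sketched as follows (a Python stand-in for the MATLAB runs; the first-order form of (1.1), the α-type nonlinearity V′(q) = q + q², and the soliton-like initial data are assumptions made for illustration):

```python
import numpy as np

M = 2048
eps = 0.25
j = np.arange(-M, M)

rng = np.random.default_rng(3)
zeta = rng.uniform(-0.25, 0.25, size=2 * M)
# Transparent masses per (1.3), assumed to be a perfect second difference of zeta.
m = 1.0 + np.roll(zeta, -1) - 2.0 * zeta + np.roll(zeta, 1)

def rhs(state):
    q, p = state
    Vp = q + q**2                        # assumed V'(q) for an alpha-type potential
    qdot = np.roll(p, -1) - p            # delta^+ p; np.roll enforces periodicity
    pdot = (Vp - np.roll(Vp, 1)) / m     # delta^- V'(q), divided by m(j)
    return np.array([qdot, pdot])

def rk4_step(state, dt):
    # Classical fourth-order Runge-Kutta step.
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Long-wave, small-amplitude initial data in the spirit of (7.2).
q0 = 3.0 * eps**2 / np.cosh(eps * j)**2
state = np.array([q0, -q0])

dt = 0.1
for _ in range(200):
    state = rk4_step(state, dt)

print(np.abs(state[0]).max() / eps**2)  # amplitude, in long-wave units
```

Real runs would use far longer times (out to 3/ϵ³) and much larger M; the sketch only shows the truncation, the periodic wrap-around, and the RK4 loop.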
7.1. Amplitude attenuation. The first experiment simulates (1.1) with a number of choices for m(j). These are:
• m(j) = 1 for all j, that is, they are constant;
• m(j) 2-periodic in j;
• m(j) satisfying the transparency condition of Hypothesis 1.1;
• m(j) = 1 + z(j) with the z(j) i.i.d.
We take ϵ = 1/2, 1/4, 1/8 and 1/16 and simulate from t = 0 out to t = 3/ϵ³. Famously, solutions of KdV equations with smooth and localized initial data will, over time, resolve into a sum of separated solitary waves of fixed amplitude [7]. Thus, if the solution of the FPUT lattice is well-approximated by a KdV equation, we expect the ℓ∞-norm to at least roughly stabilize over long time periods. And so in Figure 1 we plot ∥q(•, T/ϵ³), p(•, T/ϵ³)∥_{ℓ∞}/ϵ² vs. T, for 0 ≤ T ≤ 3. (The scaling here is to be consistent with the long-wave scaling, so that we may compare various choices of ϵ on the same plot.) We see exactly this stabilization in the plots for the constant, 2-periodic and transparent cases. Furthermore, the stabilization becomes more pronounced as ϵ decreases, which is consistent with the rigorous KdV approximation theorems here and in [23, 2, 10]. But when the masses are taken to be i.i.d., there is an obvious, pronounced decay of the amplitude; this attenuation (up to the scaling) becomes stronger as ϵ decreases. This is why we said in the Introduction that a KdV approximation for the i.i.d. problem is not appropriate.

7.2. Numerical computation of the optimal error bound. In the second experiment we aim to corroborate the conclusions of our main result, Theorem 6.1. We simulate (1.1) with m(j) subject to the transparency condition (1.3), where the ζ(j) are drawn from the uniform distribution on [−1/8, 1/8]. In this case, σ² = 1/192. We choose the initial data so that B(l, T) is zero and A(w, T) is an exact solitary wave solution of (4.17), namely 3 sech²(√(6/(1 + 24σ²))(w − T)). That is to say, we take
(7.2) q(j, 0) = 3ϵ² sech²(√(6/(1 + 24σ²)) ϵj) and p(j, 0) = −3ϵ² sech²(√(6/(1 + 24σ²)) ϵj).
We simulate for ϵ = 2^(−l/2), where l = 2, . . . , 10, and run the simulations from t = 0 to t = 3/ϵ³. (When ϵ = 1/32 this takes a very long time!) To be clear, we fix a realization and then vary ϵ as stated, with the same realization used throughout. Then we compute the overall absolute error E_ϵ. Then we repeat for another realization (ten different realizations altogether). If we plot E_ϵ vs. ϵ on a loglog plot, Theorem 6.1 tells us the best fit line to the data should have slope somewhere around 2, or larger. The results are shown in Figure 2, with all ten realizations on the same graph. The line of best fit has slope exceeding 2.5 in each case; the slopes are 2.5715, 2.5982, 2.5677, 2.5445, 2.5778, 2.5948, 2.5510, 2.5760, 2.5499, 2.5563. This numerically computed slope is over 0.5 larger than what we expect from our rigorous estimate. That is to say, the numerics indicate that the approximation of the transparent mass FPUT lattice by KdV is a fair bit better than what our results from Theorem 6.1 show. We repeated the same experiment for many different realizations and the numerically computed slope was always close to 2.5. Thus we conjecture that the absolute ℓ²-error is at worst O(ϵ^{5/2}), which is the same size as the error for the constant and periodic problems. Of course we do not know how to prove such a thing at this time.

8. What's next.
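The slope extraction on the loglog plot is an ordinary least-squares fit of log E_ϵ against log ϵ. In the sketch below the error values are synthetic placeholders (an exact power law with exponent 2.5, mimicking the observed slopes), not output of the simulations; only the fitting step is the point.

```python
import numpy as np

# epsilon = 2^(-l/2), l = 2, ..., 10, as in the experiment
eps = 2.0 ** (-np.arange(2, 11) / 2)

# Synthetic stand-in for the measured errors: an exact power law
# E = C * eps^s with s = 2.5 (the real data come from the simulations).
E = 0.7 * eps**2.5

# least-squares line on the loglog plot; its slope estimates the order
slope, intercept = np.polyfit(np.log(eps), np.log(E), 1)
print(round(slope, 4))   # recovers 2.5 for this synthetic power law
```

For the genuine simulation data the fitted slopes hover near 2.5, which is the basis of the O(ϵ^{5/2}) conjecture above.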
Our results are the first piece of a much larger program aimed at bringing stochastic homogenization to nonlinear dispersive problems. Here are a number of open problems, some of which should be relatively straightforward given our results here, and others of which will require substantial new technical ideas.
(1) Prove a result analogous to Theorem 6.1, but in expectation instead of in the almost sure sense. In our work on the linear i.i.d. lattice [20] we proved approximation results in both senses, and the strategy for the expectation result is almost surely transferable.
(2) Study (1.1) but allow spatial heterogeneity in the spring potentials as well. We expect an analogous transparency condition can be used to achieve a result similar to the one here.
(3) Confirm (or reject!) the conjecture that the sharp order of the KdV approximation error for the transparent mass FPUT lattice is O(ϵ^{5/2}), smaller than the estimate we prove. The error estimate we prove here is due entirely to our use of the autoregressive processes in the extended approximation. But perhaps a yet more clever option exists to handle the terms which we encountered at Z_{15} and Z_{25}.


(4) If one can get the conjectured sharp error estimate, it opens the door to replacing the transparency condition (1.3) with m(j) = 1 + δ⁻ζ(j).
2.2. Probability. All probabilistic components in the paper descend through the random variables ζ(j). Associated probabilities are represented by P and expectations by E.

2.3. O, o and C notation. We use the following version of Landau's "big O/little o" notation. Given two real-valued functions, f(ϵ) and g(ϵ), we say f