Pushed, pulled and pushmi-pullyu fronts of the Burgers-FKPP equation

We consider the long time behavior of the solutions to the Burgers-FKPP equation with advection of strength $\beta\in\mathbb{R}$. This equation exhibits a transition from pulled to pushed front behavior at $\beta_c=2$. We prove convergence of the solutions to a traveling wave in a reference frame centered at a position $m_\beta(t)$ and study the asymptotics of the front location $m_\beta(t)$. When $\beta<2$, it has the same form as for the standard Fisher-KPP equation established by Bramson \cite{Bramson1,Bramson2}: $m_\beta(t) = 2t - (3/2)\log(t) + x_\infty + o(1)$ as $t\to+\infty$. This form is typical of pulled fronts. When $\beta>2$, the front is located at the position $m_\beta(t)=c_*(\beta)t+x_\infty+o(1)$ with $c_*(\beta)=\beta/2+2/\beta$, which is the typical form of pushed fronts. However, at the critical value $\beta_c = 2$, the expansion changes to $m_\beta(t) = 2t - (1/2)\log(t) + x_\infty + o(1)$, reflecting the "pushmi-pullyu" nature of the front. The arguments for $\beta<2$ rely on a new weighted Hopf-Cole transform that allows one to control the advection term when combined with additional steepness comparison arguments. The case $\beta>2$ relies on standard pushed front techniques. The proof in the case $\beta=\beta_c$ is much more intricate and involves arguments not usually encountered in the study of the Bramson correction. It relies on a somewhat hidden viscous conservation law structure of the Burgers-FKPP equation at $\beta_c=2$ and utilizes a dissipation inequality, which comes from a relative entropy type computation, together with a weighted Nash inequality involving dynamically changing weights.


Introduction
We consider the long time behavior of the solutions to the Burgers-FKPP equation $u_t + \beta u u_x = u_{xx} + u(1-u)$, (1.1) with $\beta\in\mathbb{R}$. When $\beta = 0$, it reduces to the classical Fisher-KPP equation $u_t = u_{xx} + u(1-u)$, (1.2) that dates back to the seminal work of Fisher [21] and Kolmogorov, Petrovskii and Piskunov [33], and has been studied extensively since. This equation arises in numerous applications in the physical and biological sciences, also discussed in [42]. Beyond this, it has been used to study the fine properties of branching Brownian motion after McKean [40] discovered its connection to (1.2); see, for instance, [2,3,13,14,43] and references therein. Both of the original papers [21] and [33] showed that (1.2) admits traveling wave solutions of the form $u(t,x) = U_c(x-ct)$ for all $c \ge c_* = 2$, which necessarily satisfy the ODE $-cU_c' = U_c'' + U_c(1-U_c)$, $U_c(-\infty) = 1$, $U_c(+\infty) = 0$. (1.3) We denote by $U_*(x)$ the traveling wave moving with the minimal speed $c_* = 2$. Given $c \ge c_*$, solutions to (1.3) are unique up to translation in $x$. One may fix a normalization for $U_*(x)$, for example, by requiring that $U_*(0) = 1/2$. It was observed already in [21,33] (albeit at a different level of mathematical rigor) that the solution to (1.2) with an initial condition that is a step function $u(0,x) = \mathbb{1}(x\le 0)$ (1.4) converges in shape to a traveling wave: there exists a reference frame $m(t)$ so that $u(t, x+m(t)) \to U_*(x)$, as $t\to+\infty$, uniformly in $x\in\mathbb{R}$. (1.5) It was also argued informally in [21] and proved in [33] that $m(t) = 2t + o(t)$ as $t\to+\infty$. (1.6) We refer to $m(t)$ as the location of the front at time $t>0$ since, roughly, it separates the regions $\{u(t,x)\approx 1\}$ for $x \ll m(t)$ and $\{u(t,x)\approx 0\}$ for $x \gg m(t)$. This result was refined in the pioneering works by Bramson [11,12], who used probabilistic techniques and the connection of (1.2) to branching Brownian motion to analyze the front position $m(t)$. In particular, Bramson showed that there exists a constant $x_\infty$ that depends on the initial condition $u_{\mathrm{in}}(x)$ for (1.2), so that $m(t) = 2t - \frac{3}{2}\log t + x_\infty + o(1)$, as $t\to+\infty$. (1.7) This asymptotic expansion holds as long as the initial conditions $u_{\mathrm{in}}(x)$ decay sufficiently fast to zero as $x\to+\infty$.
We also mention the related work by Uchiyama [52] and Lau [34], which are based on PDE techniques, and more recent refinements and alternative proofs of Bramson's result in [8,27,29,44,45,46], that use both probabilistic and PDE methods. Very recently, spectral techniques have been applied to study the "Bramson shift" in [4,5,6], including some problems that do not obey the comparison principle. In particular, it was shown in [27,45] that convergence in (1.5) is only algebraic in time: x > 0. (1.8) The more precise results in [27] show that the convergence rate can not be improved to an exponential in time rate: even convergence in shape to a traveling wave does not hold beyond the order O(t −1 ). The algebraic rate of convergence of the solution to (1.2) to a shift of a traveling wave is closely related to the fact that FKPP fronts are "pulled"-that is, the long time behavior of the solutions is governed by the behavior far ahead of the front, where u is small. Hence, the problem is not compact in a certain sense, making the algebraic-in-time (rather than exponential) rate of convergence natural.
The "pulled" behavior should be contrasted with the class of equations of the form similar to (1.2): v t = v xx + f (v) (1.9) but with solutions that behave as "pushed fronts" -that is, the long time behavior of the solutions is governed by the behavior at the front, where u is neither small nor approximately 1. An example of such f is a bistable nonlinearity of the form , with some θ ∈ (0, 1). Unlike the Fisher-KPP equation, in the pushed cases solutions to the initial value problem for (1.9) with a rapidly decaying initial condition v(0, x) = v in (x) converge to a shift of the traveling wave U f (x) for (1.9) exponentially fast in time: there exists ω > 0 such that (1.10) Moreover, the front location has the asymptotics m f (t) = c f t + x ∞ , as t → +∞, (1.11) without any logarithmic in time correction. The rate of convergence in (1.10) is exponential in time precisely because the fronts are "pushed," so that the long time behavior is determined by what happens in a compact region around the front, and not by the tail behavior as x → +∞. We refer to the monograph [38] and extensive presentations in [19,51] for, respectively, an applied mathematics and a physics perspective on pushed and pulled fronts, and to [24] for a recent mathematical analysis, among many other references.
The pulled to pushed fronts transition in the Burgers-FKPP equation
As we have mentioned, the long time behavior of the reaction-diffusion equations that admit either pulled or pushed fronts is reasonably well-understood, at least on an intuitive level. An interesting aspect of the Burgers-FKPP equation (1.1) is that it exhibits a transition from the pulled to pushed behavior at $\beta_c = 2$. The behavior of traveling waves for (1.1) already illustrates the change in behavior at $\beta_c = 2$. For a given $\beta\in\mathbb{R}$, the Burgers-FKPP equation (1.1) admits traveling wave solutions for all $c \ge c_*(\beta)$, with the minimal speed $c_*(\beta) = 2$ for $\beta \le 2$, and $c_*(\beta) = \frac{\beta}{2} + \frac{2}{\beta}$ for $\beta > 2$. (1.12) The minimal speed traveling wave $\phi_\beta$ satisfies $-c_*(\beta)\phi_\beta' + \beta\phi_\beta\phi_\beta' = \phi_\beta'' + \phi_\beta(1-\phi_\beta)$, $\phi_\beta(-\infty) = 1$ and $\phi_\beta(+\infty) = 0$. (1.13) Once again, the solution to (1.13) is unique only up to a translation in $x$. We fix the normalization by requiring that $\phi_\beta(0) = 1/2$. It happens that the traveling wave profile for $\beta \ge 2$ is explicit. Indeed, one can check by direct computation that $\phi_\beta(x) = \frac{1}{1+e^{\beta x/2}}$, for $\beta \ge 2$. (1.14) On the other hand, when $\beta < 2$, the profile of the minimal speed traveling wave is, to the best of our knowledge, not explicit, and the asymptotics of $\phi_\beta$ as $x\to+\infty$ are no longer purely exponential, being given by $\phi_\beta(x) \sim (Ax+B)e^{-x}$, as $x\to+\infty$, for $\beta<2$, (1.15) with some $A>0$ and $B\in\mathbb{R}$ that depend on $\beta$. This was shown, for instance, in [42] by a phase plane analysis. It is discussed further in Appendix A.
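The "direct computation" behind (1.14) can be sketched numerically. The Python snippet below is a minimal illustration and not part of the original argument: it assumes that (1.1) reads u_t + βuu_x = u_xx + u(1−u), so that a wave φ(x − ct) satisfies −cφ′ + βφφ′ = φ″ + φ(1−φ). The function names are ours and purely illustrative.

```python
import math

def phi(x, beta):
    """Explicit profile 1 / (1 + exp(beta * x / 2)) from (1.14)."""
    return 1.0 / (1.0 + math.exp(0.5 * beta * x))

def wave_residual(x, beta, c):
    """Residual of -c*phi' + beta*phi*phi' - phi'' - phi*(1 - phi).

    Uses the closed-form derivatives of the logistic profile:
    phi'  = -(beta/2) * phi * (1 - phi),
    phi'' = (beta/2)^2 * (1 - 2*phi) * phi * (1 - phi).
    """
    p = phi(x, beta)
    k = 0.5 * beta
    dp = -k * p * (1.0 - p)
    d2p = k * k * (1.0 - 2.0 * p) * p * (1.0 - p)
    return -c * dp + beta * p * dp - d2p - p * (1.0 - p)

def pushed_speed(beta):
    """Candidate speed beta/2 + 2/beta (the formula stated for beta > 2)."""
    return 0.5 * beta + 2.0 / beta

def max_residual(beta, c):
    """Largest absolute residual on a grid of x in [-10, 10]."""
    xs = [0.1 * j for j in range(-100, 101)]
    return max(abs(wave_residual(x, beta, c)) for x in xs)

if __name__ == "__main__":
    print(max_residual(3.0, pushed_speed(3.0)))  # round-off level
    print(max_residual(3.0, 2.0))                # visibly nonzero
```

Under the stated assumption on (1.1), the residual vanishes up to round-off only when c = β/2 + 2/β, which equals 2 at β = 2.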
As explained in Remark 1 of [24], a quantitative mathematical criterion for a traveling wave profile $U_c(x)$, that moves with a speed $c \ge 0$, to be pushed is that $U_c(x)e^{cx/2} \in L^2(\mathbb{R})$. (1.16) Otherwise, a traveling wave is pulled. We will see the motivation behind the criterion (1.16) in the discussion of the long time behavior of the solutions to (1.1) for $\beta>2$, which is contained in Section 7. According to this classification, the Burgers-FKPP traveling waves are pushed for $\beta>2$, and pulled for $\beta\le 2$, as can be seen from (1.14) and (1.15). While the case $\beta=2$ obviously fails the pushed front criterion (1.16), it has an additional property distinguishing it from the case $\beta<2$. The traveling wave, still given by $\phi_2(x) = \frac{1}{1+e^{x}}$ (1.17) due to (1.14), satisfies the weaker condition (cf. (1.16), recalling that $c_* = 2$ when $\beta=2$) that $\phi_2(x)e^{x} = \frac{e^{x}}{1+e^{x}} \in L^\infty(\mathbb{R})$. (1.18) As we will see, this reflects a very different long time behavior of the solutions to (1.1) at $\beta=2$ compared both to $\beta<2$ and $\beta>2$. Borrowing the terminology of [36], we will refer to such "dual nature" fronts at $\beta=\beta_c$ as "pushmi-pullyu" fronts.
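The classification based on (1.16) and (1.18) amounts to bookkeeping of tail exponents, which the following Python sketch spells out. It is an illustration under assumptions taken from the text: the tail of the profile is e^{-βx/2} for β ≥ 2 by (1.14) and (Ax+B)e^{-x} for β < 2 by (1.15), and the minimal speed is 2 for β ≤ 2 and β/2 + 2/β for β > 2. As x → −∞ the profile tends to 1 and the weight e^{cx/2} decays, so only the right tail matters. All function names are hypothetical.

```python
def minimal_speed(beta):
    """Minimal speed: 2 for beta <= 2, beta/2 + 2/beta for beta > 2."""
    return 2.0 if beta <= 2.0 else 0.5 * beta + 2.0 / beta

def weighted_tail(beta):
    """Describe phi_beta(x) * exp(c_* x / 2) as x -> +infinity.

    Returns (rate, linear_factor): the weighted profile behaves like
    exp(rate * x), times (A x + B) when linear_factor is True.
    """
    c = minimal_speed(beta)
    if beta < 2.0:
        # Tail (A x + B) e^{-x} from (1.15), weight e^{x} since c_* = 2.
        return 0.5 * c - 1.0, True
    # Tail e^{-beta x / 2} from (1.14).
    return 0.5 * c - 0.5 * beta, False

def classify(beta):
    """Apply criterion (1.16) and the weaker boundedness property (1.18)."""
    rate, linear_factor = weighted_tail(beta)
    if rate < 0.0:
        return "pushed"          # weighted profile is in L^2
    if not linear_factor:
        return "pushmi-pullyu"   # bounded but not in L^2
    return "pulled"              # grows linearly

if __name__ == "__main__":
    for beta in (1.0, 2.0, 3.0):
        print(beta, weighted_tail(beta), classify(beta))
```

For β > 2 the rate is 1/β − β/4, which is negative exactly when β > 2; at β = 2 it vanishes, leaving a bounded but non-integrable weighted profile.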

The large time behavior of the solutions
We now describe the main results of this paper on the large time behavior of the solutions to the Burgers-FKPP equation (1.1) with $\beta \neq 0$ and rapidly decaying initial conditions, generalizing the results on the standard Fisher-KPP equation (1.2) discussed above. The main new feature is the aforementioned transition from the pulled to pushed behavior at $\beta_c = 2$. The analysis at $\beta=\beta_c$ turns out to be surprisingly delicate. The study of the large time behavior in the present paper relies, in particular, on the notion of steepness of the solution. While such arguments date back to the original KPP paper [33], we were to a large extent motivated by the definition in the recent paper of Giletti and Matano [26]. As we will only need it for smooth functions, we can formulate their notion as follows. Let us denote by $W$ the class of $C^1$, monotonically decreasing functions $u(x)$, $x\in\mathbb{R}$, such that $\lim_{x\to-\infty}u(x) = 1$ and $\lim_{x\to+\infty}u(x) = 0$. (1.19) Let $\overline{W}$ be its closure in $L^1$. Given two functions $u_1, u_2 \in W$, we say that $u_1$ is steeper than $u_2$ if $|u_1'(u_1^{-1}(z))| > |u_2'(u_2^{-1}(z))|$, for all $z\in(0,1)$. (1.20) In other words, the graph of $u_1(x)$ is steeper than the graph of $u_2(x)$ when compared at each fixed level $z\in(0,1)$, rather than at a fixed point $x\in\mathbb{R}$. This notion is translation invariant: if $u_1$ is steeper than $u_2$, it is also steeper than any translate $u_2(\cdot+h)$, with a fixed $h\in\mathbb{R}$. For $u_1, u_2\in\overline{W}$, we say that $u_1$ is steeper than $u_2$ if $u_1$ and $u_2$ can be approximated by $C^1$-functions $u_{1,\varepsilon}$ and $u_{2,\varepsilon}$ as $\varepsilon\to 0$ such that $u_{1,\varepsilon}$ is steeper than $u_{2,\varepsilon}$ for all $\varepsilon$. Equation (1.1) has the following important property.
Proposition 1.1. Let $u_1(t,x)$ and $u_2(t,x)$ be the solutions to (1.1) with the corresponding initial conditions $u_{10}, u_{20}$ in $\overline{W}$, the closure of $W$ in $L^1$. If $u_{10}$ is steeper than $u_{20}$, then $u_1(t,\cdot)$ is steeper than $u_2(t,\cdot)$ for all $t>0$.
This result was essentially proved for the classical Fisher-KPP equation (1.2) in the original KPP paper [33]. For the convenience of the reader, we present the proof for the Burgers-FKPP case in Section 2. While the steepness comparison is used throughout the paper, most of its applications are quantitative rather than in the spirit of the elegant intersection number type of arguments: in contrast to the qualitative arguments used in previous works, we use "steepness" to obtain estimates on u and on integral quantities involving it. An exception is the proof of Proposition 1.1 itself and its consequences discussed in Section 2.
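To make the definition (1.20) concrete, here is a small Python sketch that compares two profiles level by level. It is only an illustration: the logistic family x ↦ 1/(1 + e^{ax}) is our choice of test profiles (for it, the slope at level z equals a z(1−z)), and the helper names are hypothetical.

```python
import math

def logistic(a):
    """Return the decreasing profile x -> 1 / (1 + exp(a x)) and its derivative."""
    def u(x):
        return 1.0 / (1.0 + math.exp(a * x))
    def du(x):
        p = u(x)
        return -a * p * (1.0 - p)
    return u, du

def inverse(u, z, lo=-60.0, hi=60.0, iters=200):
    """Bisection for the unique x with u(x) = z, for a decreasing profile u."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if u(mid) > z:
            lo = mid   # the level z is reached further to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slope_at_level(u, du, z):
    """|u'(u^{-1}(z))|: the slope of the graph where u takes the value z."""
    return abs(du(inverse(u, z)))

def is_steeper(p1, p2, levels):
    """Check the inequality (1.20) on a finite set of levels z in (0, 1)."""
    return all(slope_at_level(*p1, z) > slope_at_level(*p2, z) for z in levels)

if __name__ == "__main__":
    levels = [0.05 * j for j in range(1, 20)]
    print(is_steeper(logistic(2.0), logistic(1.0), levels))  # a = 2 vs a = 1
```

Because the comparison is made at fixed levels rather than at fixed points, replacing either profile by a translate does not change the outcome.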
The main result of the present paper is the following:
Theorem 1.1. Let $u(t,x)$ be the solution to (1.1) with the initial condition $u_{\mathrm{in}}$ in $\overline{W}$, the closure of $W$ in $L^1$, such that $u_{\mathrm{in}}$ is steeper than the minimal speed traveling wave $\phi_\beta$. Then, for each $\beta\le 2$, there exists a constant $x_\infty$ that depends on $\beta$ and the initial condition $u_{\mathrm{in}}$ so that $\lim_{t\to+\infty} u(t, x+m_\beta(t)) = \phi_\beta(x)$, (1.21) with the function $m_\beta(t)$ given by $m_\beta(t) = 2t - \frac{3}{2}\log(t+1) - x_\infty + o(1)$, as $t\to+\infty$, (1.22) if $\beta<2$, and, otherwise, by $m_{\beta=2}(t) = 2t - \frac{1}{2}\log(t+1) - x_\infty + o(1)$, as $t\to+\infty$. (1.23) For $\beta>2$, there exists $\omega>0$, which depends on $\beta$ but not on $u_{\mathrm{in}}$, and $K>0$, which depends both on $\beta$ and $u_{\mathrm{in}}$, such that the exponential-in-time convergence estimate (1.24) holds.
We note that the class of initial data considered in Theorem 1.1 includes the Heaviside initial data: $u_{\mathrm{in}} = \mathbb{1}(x<0)$. Indeed, it is easy to see that $u_{\mathrm{in}}$ is the limit in $L^1$ as $\varepsilon\to 0$ of functions $u_{\mathrm{in},\varepsilon}$, and $u_{\mathrm{in},\varepsilon}$ is clearly steeper than $\phi_\beta$ for $\varepsilon<1$. Theorem 1.1 reflects the different nature of the Burgers-FKPP fronts we have discussed above for various values of $\beta\in\mathbb{R}$. For $\beta<2$, the solution is pulled and the front location has the same asymptotics (1.22) as (1.7) for the standard Fisher-KPP equation (1.2). For $\beta>2$, the solution is pushed and the exponential-in-time convergence to the traveling wave (1.24) agrees with what we have seen in (1.10) for pushed fronts. The new asymptotics (1.23) for the "pushmi-pullyu" solutions at $\beta=\beta_c$ is different from both of these cases.
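The three regimes of Theorem 1.1 can be compared side by side with a few lines of Python. The sketch below only evaluates the explicit leading terms of the front location: the linear drift and, for β ≤ 2, the logarithmic correction from (1.22) and (1.23). It omits the constant shift and the o(1) remainder, and for β > 2 it uses the speed β/2 + 2/β stated in the abstract. The function name is hypothetical.

```python
import math

def front_drift(t, beta):
    """Leading terms of the front location, without the constant shift and o(1).

    beta < 2 : 2 t - (3/2) log(t + 1)       (pulled, as in (1.22))
    beta = 2 : 2 t - (1/2) log(t + 1)       (pushmi-pullyu, as in (1.23))
    beta > 2 : (beta/2 + 2/beta) t          (pushed, no logarithmic term)
    """
    if beta < 2.0:
        return 2.0 * t - 1.5 * math.log(t + 1.0)
    if beta == 2.0:
        return 2.0 * t - 0.5 * math.log(t + 1.0)
    return (0.5 * beta + 2.0 / beta) * t

if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        # How far each front lags behind the line x = 2 t.
        print(t, 2.0 * t - front_drift(t, 1.0), 2.0 * t - front_drift(t, 2.0))
```

At any fixed time the critical front is ahead of the subcritical one by exactly log(t + 1) in these leading terms, which is the gap between the two logarithmic corrections.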
One may ask if the asymptotics (1.22) and (1.23) can be refined, as was done in [7,8,19,27,45] for the standard Fisher-KPP equation (1.2). It seems that this is possible using the fascinating, if formal, technique of [8]. Arguing in the manner of [8], it turns out that the o(1) terms in the expansion of m β (t) are exactly the same as in the Fisher-KPP case when β < 2; however, they are different when β = 2. In this "pushmi-pullyu" case, one finds (1.25) The first terms in this expansion above agree with the formal computation in [19], performed by a completely different method, where it was computed up to the order O(t −1/2 ). While the error in [19] is given as O(1/t), and the O(log t/t) correction seems to be missing, it is tempting to conjecture that (1.25) is universal for the "pushmi-pullyu" situations, such as, for instance, the reaction-diffusion equation (1.28) below. The application of the formal technique of [8] to derive (1.25) is contained in Section 8. We should mention that the convergence of the solution to the Burgers-FKPP equation (1.1) to a traveling wave was studied using the matched asymptotic expansions in [37]. Their formal results agree with Theorem 1.1 in the cases β < 2 and β > 2 but unfortunately overlook some of the details in the case β = 2, leading to an incorrect prediction. On the other hand, as we have mentioned, the matched asymptotics analysis in [19,38] predicts the shift (1/2) log t for the "pushmi-pullyu" transition situations, such as (1.28) below, mimicking, in a certain sense, the Burgers-FKPP equation at β = 2.

Connection to other pulled to pushed transition problems
The Burgers-FKPP equation at β = 2 is not the only example of a "pushmi-pullyu" front. A well-known instance is a particular generalized Fisher nonlinearity considered in [28]: (1.26) Here, the situation is qualitatively and even quantitatively extremely similar to the Burgers-FKPP picture: traveling waves exist for all c ≥ c * (a), with c * (a) = 2 for all 0 ≤ a ≤ 2, and c * (a) = a 2 + 2 a , for a > 2, (1.27) as in (1.12). Even the traveling wave profile for a = 2 is given exactly by φ β=2 in (1.17). As explored in [19], (1.26) is a special case of the more general class of equations (1.28) in which the pulled to pushed transition occurs at a = n + 1: when a ≤ n + 1, the minimal speed is c * = 2 and when a > n + 1, c * = a n+1 + n+1 a > 2. In the critical case a = n + 1, the corresponding traveling wave is explicit, with the same purely exponential decay as in (1.17).
Numerous other examples, including the pushed-pulled transition for systems of reaction-diffusion equations arising in chemistry and biology are discussed in Section 3.13 of [51]. We also mention the repulsive Keller-Segel-FKPP equation that was recently used to model the spread of a population in which individuals reproduce and diffuse, influenced by a preference for low population density regions with strength |χ|. This is seen, for instance, in slime molds [39] and bacteria [54]. Origins of some other similar chemotaxis models have been discussed in [22,23]. The pushed-pulled transition at the level of traveling waves has been shown in [31]: there exists a threshold χ 0 > 0 such that, when |χ| < χ 0 , the fronts are pulled, while, as χ → −∞, fronts are pushed. In the same vein is a system of equations considered in [10]: one equation is a Fisher-KPP equation for T with drift u and the other, which governs u, satisfies a Burgers type equation with a Boussinesq-type forcing depending on T . Actually, this specializes to (1.1) for a certain choice of parameters. As with the Keller-Segel-FKPP, a pushed-pulled transition occurs. In both models, the pushed-pulled analysis in [10] is performed only at the level of the traveling wave, which is a much simpler setting. The Cauchy problem, on the other hand, appears to require new ideas.
A matched asymptotic analysis in [19,38] predicts the results as in Theorem 1.1 to hold for the long time behavior of the solutions of equations at the pushed-pulled transition, such as (1.28). However, to the best of our knowledge, there are no rigorous results with this precision in any of such critical cases. The best result in this direction seems to be the very recent paper [25], which shows that when a = 2, the level sets of the solutions to (1.26) are located at a position As discussed in detail in [5] and [6], the threshold cases that separate the pulled and pushed fronts seem to be also outside the scope of the currently available spectral methods.

Comments on the proofs
Let us comment on the strategy of the proof of the three cases in Theorem 1.1, in the order of increasing difficulty and intricacy. The case β > 2 falls into the category of pushed fronts, and the proof follows the classical strategy of [47,48,49], with appropriate modifications. For β < 2 in Theorem 1.1, we use an extension of the arguments in [29,44] for the Fisher-KPP equation, approximating the dynamics by the Dirichlet problem for the linear heat equation on a half line. The Burgers drift term causes a difficulty, since the linearization strategy used, for instance, in [27,30,29,44,45] taking out the spatial exponential decay in the solution seems insufficient for 0 < β < 2. To overcome this, we pass to the moving frame x → x − 2t, setting and introduce a weighted Hopf-Cole transform combining the standard Hopf-Cole transform for the heat equation and the exponential weight used in the standard Fisher-KPP arguments. It turns out that if the initial condition u in for (1.1) is steeper than the traveling wave φ β , then v(t, x) satisfies a differential inequality as in Proposition 3.1 below. We stress that the steepness comparison of the initial condition to a traveling wave plays a crucial role in showing that v(t, x) satisfies the differential inequality (1.32).
With (1.32) in hand, we are able to construct upper and lower barriers in the self-similar variables for the linearized equation for v(t, x) on the half line, and then the convergence in the tail implies the convergence in the bulk due to the pulled-front nature of the dynamics, as in [44]. Interestingly, this last step also utilizes the assumption that the initial condition, and hence the solution, is steeper than the minimal speed traveling wave, in an explicit quantitative way. Qualitatively, the case β < 2 is similar to the standard Fisher-KPP equation, and the weighted Hopf-Cole transform gives a tool to see that. However, the repeated use of the steepness comparison is something new in this argument for Burgers-FKPP equation.
The weighted Hopf-Cole transform also indicates the technical reason for the transition at β = 2: while it is easy to see from (1.31) that v(t, x) → 0 as x → −∞ for β < 2 (recall that u ≤ 1), in the case β = 2 the function v(t, x) approaches a positive constant as x → −∞. This modifies the boundary condition for the linearized problem for the upper and lower barriers in the self-similar variables, and ultimately leads to the change in the logarithmic shift from (3/2) log t to (1/2) log t at β = 2.
Let us now discuss the ingredients of the proof of Theorem 1.1 in the critical case β = 2, which is remarkably different from the approach for the standard Fisher-KPP equation. This analysis is probably the most novel part of the present paper. The first key observation is that when β = 2, the Burgers-FKPP equation (1.1) has a special structure: the function satisfies a spatially inhomogeneous conservation law: Here, u(t, x) is defined in (1.30). An immediate consequence of (1.34) is a conservation law for the exponential moment of u(t, x): This conservation law eventually leads to a lower bound for m 2 (t) of the form see the proof of Lemma 5.2 in Section 5.3 below. A matching upper bound for m 2 (t) is related to the behavior of p(t, x). Note that, together with the explicit expression (1.17) for the profile φ 2 (x), the convergence to a traveling wave in shape in (1.21) yields, roughly, Thus, an upper bound of the form would follow from an L ∞ -bound on p(t, x) of the form (1.39) Such decay, while natural to expect in view of (1.34), is not automatic for solutions of mass-conserving advection-diffusion equations, even if the advection is bounded: see the end of Section 5.1 for simple examples of such equations with solutions that do not decay in time.
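The conservation of the exponential moment at β = 2 can be checked numerically on a test profile. The Python sketch below is an illustration under explicit assumptions, not the paper's computation: we take (1.1) to read u_t + 2uu_x = u_xx + u(1−u) and pass to the frame x → x − 2t, so that the shifted solution w satisfies w_t = w_xx + 2w_x − 2ww_x + w − w². The moment ∫ e^x w dx then has time derivative ∫ e^x (w_xx + 2w_x − 2ww_x + w − w²) dx, and this integrand is the x-derivative of e^x (w_x + w − w²). The test profile 1/(1 + e^{2x}) and all names are our own choices.

```python
import math

def w(x):
    """Test profile 1 / (1 + exp(2x)); it decays faster than exp(-x) as x -> +inf."""
    return 1.0 / (1.0 + math.exp(2.0 * x))

def moment_integrand(x):
    """exp(x) * (w'' + 2 w' - 2 w w' + w - w^2) for the test profile."""
    p = w(x)
    dp = -2.0 * p * (1.0 - p)
    d2p = 4.0 * (1.0 - 2.0 * p) * p * (1.0 - p)
    return math.exp(x) * (d2p + 2.0 * dp - 2.0 * p * dp + p - p * p)

def reaction_integrand(x):
    """exp(x) * w * (1 - w): the reaction term alone, for comparison."""
    p = w(x)
    return math.exp(x) * p * (1.0 - p)

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for j in range(1, n):
        total += f(a + j * h)
    return h * total

# Full right-hand side: diffusion, drift, Burgers and reaction terms cancel.
moment_rate = trapezoid(moment_integrand, -40.0, 40.0, 8000)
# The reaction term by itself does not integrate to zero (its value is pi/4).
reaction_only = trapezoid(reaction_integrand, -40.0, 40.0, 8000)

if __name__ == "__main__":
    print(moment_rate, reaction_only)
```

The cancellation is specific to the weight e^x and to β = 2 under the assumed form of (1.1); the growth from the reaction term is balanced exactly by the other three terms.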
The proof of (1.39), presented in Section 5, turns out to be rather intricate. While (1.34) looks like a degenerate viscous conservation law, we were unable to adapt the methods of [16] or [32] to (1.34) and instead take a different approach. The first step is a relative entropy computation inspired by [17,41] where it was used for linear advection-diffusion equations. An unusual twist is that we compute the relative entropy not with respect to another solution but to a super-solution to (1.34). This leads to a weighted dissipation inequality for the function The dissipation identity (1.41) is similar to that for the standard heat equation, where it takes the form d dtˆϕ that is, as in (1.41) but without the weight ρ(t, x). In the latter case, (1.42) combined with the Nash inequality and a standard duality argument directly leads to the temporal decay rate t −1/2 in R. Here, the time-dependent weight ρ(t, x) that appears in (1.41) is degenerate as x → −∞, so the standard Nash inequality can not be used. Instead, we obtain a Nash-type inequality for weighted spaces for a certain class of degenerate weights: see Proposition 5.9 below. The weights need to satisfy certain quantitative assumptions, and we need to verify that the dynamics do not take the weight ρ(t, x), defined in (1.40), out of the class of the admissible weights or make the constants in the weighted Nash inequality in Proposition 5.9 degenerate as t → +∞. Applying the weighted Nash inequality in (1.41), leads to the appropriate decay of ϕ(t, x) in a weighted L 2 -space. However, the nature of the Nash inequality leads to an extra delay in time, before the decay sets in, that depends on the initial condition. This, among other issues, prevents us from using the duality argument to establish the weighted L ∞ -decay of ϕ(t, x) from the L 2 -estimate. 
Instead, it comes from additional ad hoc arguments, first establishing the bound at a large time t ′ less than, but comparable to, t and then extending it to the final time t. The decay of p(t, x) follows from that of ϕ(t, x) and bounds on the weights. An extra technical complication is that the function ρ(t, x) appears in the denominator in the definition (1.40) of the function ϕ(t, x) but vanishes as x → −∞. As a result, the L 2 -norm of ϕ(t, x) is actually infinite for a large class of interesting initial conditions and extra approximations have to be used to deal with this issue. At this stage, the assumption that u in (x) = 1 for x ≤ L 1 is actually not a simplification but a complication that can not be avoided if one wants to include the Heaviside function into the class of admissible initial conditions. The need to control the behavior in the back of the front is another reflection of the "pushmi-pullyu" nature of the solution.

Organization of the paper
This paper is organized as follows. Section 2 uses the steepness comparison arguments to prove Proposition 1.1 and convergence of the solution to a traveling wave in shape. Section 3 describes the weighted Hopf-Cole transform leading to the differential inequality (1.32). The proof of Theorem 1.1 for β < 2 is contained in Section 4. Section 5 is devoted to the decay estimates of the solutions to the inhomogeneous viscous conservation law (1.34) that appears in the case β = 2. It is here that we prove the aforementioned L ∞ -decay of the solutions to (1.34). Section 6 uses these results to prove Theorem 1.1 for β = 2, bootstrapping the bounds (1.36) and (1.38) to the precise asymptotics (1.23). The case β > 2 is considered in Section 7. Section 8 uses the techniques of [8] to obtain, by formal arguments, further corrections to the front location asymptotics given in Theorem 1.1 for β ≤ 2. The result for β < 2 is identical to the standard Fisher-KPP equation but is different for β = 2. Finally, Appendix A contains some basic facts about the traveling waves for the Burgers-FKPP equation.
Acknowledgment. JA was supported by Joe Oliger Fellowship. CH was partially supported by NSF grant DMS-2003110. LR was partially supported by NSF grant DMS-1910023, and ONR grant N00014-17-1-2145. We are grateful to John Leach for a very friendly and helpful discussion of the results in [37] and [38].

Convergence to a traveling wave in shape
In this section, as a preliminary step to the proof of Theorem 1.1, we use a strategy inspired by the original KPP paper [33] to show convergence in shape of a solution to the Burgers-FKPP equation to a traveling wave. As the first step, we prove Proposition 1.1.

The proof of Proposition 1.1
It is enough to assume that u 10 , u 20 ∈ W by standard density arguments. Let u 1 (t, x) and u 2 (t, x) be the solutions to (1.1) with the respective initial conditions u 10 , u 20 such that u 10 is steeper than u 20 . First, we note that since the initial conditions are decreasing, both u 1 (t, x) and u 2 (t, x) are decreasing and have the left and right limits as in (1.19), so that both u 1 (t, ·) and u 2 (t, ·) lie in W.
Recall the definition of "steeper" (1.20). To show that u 1 (t, ·) is steeper than u 2 (t, ·) for any t > 0, consider the functions for a fixed k 0 ∈ R. The function w(t, x; k 0 ) satisfies Since u 10 is steeper than u 20 , it is also steeper than u 20 (· + k 0 ). Therefore, there exists x 0 so that and w(0, x; k 0 ) < 0 for all x > x 0 .
As w(t, x; k 0 ) is a solution to the parabolic equation (2.1), a consequence of [1, Theorems A and B] is that the function w(t, x; k 0 ) has exactly one zero y(t; k 0 ) for all t > 0. Indeed, [1] shows that the number of zeros is nonincreasing and that a zero may only disappear at a time t 0 > 0 when two zeros "collide." Hence, w(t, x; k 0 ) > 0 for all x < y(t; k 0 ) and w(t, x; k 0 ) < 0 for all x > y(t; k 0 ), with y(0; k 0 ) = x 0 . In addition, we have ∂ x u 1 (t, y(t; k 0 )) < ∂ x u 2 (t, y(t; k 0 )).
Since this is true for all k 0 ∈ R, it follows that u 1 (t, ·) is steeper than u 2 (t, ·). A standard approximation argument shows the following.
Corollary 2.1. Let $v(t,x)$ and $u(t,x)$ be the solutions to (1.1) with the respective initial conditions $v_{\mathrm{in}} \in W$ and $u_{\mathrm{in}}(x) = \mathbb{1}(x\le 0)$. Assume that $v_{\mathrm{in}}(x)$ is steeper than the minimal speed traveling wave $\phi_\beta(x)$. Then, for any $t>0$ the solution $v(t,\cdot)$ is steeper than $\phi_\beta$, and is less steep than $u(t,\cdot)$.
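The single-crossing property at the heart of the proof of Proposition 1.1 can be seen on a toy example at the initial time. The Python sketch below assumes that the comparison function is the difference of the steeper profile and a translate of the less steep one, and uses two logistic profiles of our own choosing. It counts sign changes on a grid and does not simulate the evolution or the zero-number theorem of [1]. All names are hypothetical.

```python
import math

def u1(x):
    """Steeper profile: slope 2 z (1 - z) at level z."""
    return 1.0 / (1.0 + math.exp(2.0 * x))

def u2(x):
    """Less steep profile: slope z (1 - z) at level z."""
    return 1.0 / (1.0 + math.exp(x))

def sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros."""
    signs = [v > 0.0 for v in values if v != 0.0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def crossings(f, g, k0, lo=-30.0, hi=30.0, n=6001):
    """Sign changes of x -> f(x) - g(x + k0) on a uniform grid."""
    xs = [lo + (hi - lo) * j / (n - 1) for j in range(n)]
    return sign_changes([f(x) - g(x + k0) for x in xs])

if __name__ == "__main__":
    for k0 in (-5.0, 0.0, 3.7):
        print(k0, crossings(u1, u2, k0))
```

For these profiles the difference is positive to the left of the crossing point x = k0 and negative to the right, which is the sign pattern used in the proof; a profile compared with its own translate, by contrast, does not cross it at all.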

Convergence in shape
We now establish convergence of the solution in shape to a traveling wave.
Proposition 2.2. Let $u(t,x)$ be the solution to (1.1) with the initial condition $u_{\mathrm{in}} \in W$ that is steeper than the minimal speed traveling wave $\phi_\beta(x)$, or with $u_{\mathrm{in}}(x) = \mathbb{1}(x\le 0)$. Then, there exists a function $m_\beta(t)$ such that $\dot m_\beta(t) \to c_*(\beta)$ as $t\to+\infty$ and $u(t, x+m_\beta(t)) \to \phi_\beta(x)$ as $t\to+\infty$. Here, $\phi_\beta(x)$ is a solution to (1.13) with the minimal speed $c_* = c_*(\beta)$.
Corollary 2.1 shows that it suffices to consider the solution u(t, x) to (1.1) with the initial condition u(0, x) = ½(x ≤ 0). Note that for any τ > 0, the function u (τ ) (t, x) = u(t + τ, x) is the solution to (1.1) with the initial condition u (τ ) (0, x) = u(τ, x) that is less steep than u(0, x). It follows that for any t > 0 and τ > 0 the function u(t, ·) is steeper than u(t + τ, ·). In addition, u(t, ·) is steeper than the minimal speed traveling wave φ β (x) for all t > 0. Hence, if for each v ∈ (0, 1) and t > 0, we let x(t, v) be the unique point such that u(t, x(t, v)) = v, then, the function is increasing in t for all v ∈ (0, 1), and Let now m β (t) be the position such that u(t, m β (t)) = 1/2 for all t > 0, and consider the translateũ (t, x) = u(t, x + m β (t)), as well as the corresponding inverse ξ(t, v) defined byũ(t, ξ(t, v)) = v, for 0 < v < 1. Observe that ξ(t, 1/2) = 0 for all t > 0. We see from (2.5)-(2.6) that the function E(t, v) is negative and increasing in time. Thus, it has a limit Hence , as t → +∞, and ξ(t, v) =ˆv As a consequence, the functionũ(t, x) also converges uniformly on compact sets to a limitũ ∞ (x): (2.10) Moreover, due to (2.7), we have This yields the correct behavior of the limits x → ±∞: Indeed, considering, for example the behavior x → +∞, we have Notice thatṁ By parabolic regularity theory, the numerator is bounded and, by (2.6) with v = 1/2, the denominator is bounded away from zero. It follows thatṁ β is bounded uniformly in t. Hence, for any sequence t n → ∞, there is a subsequence t n k → ∞ and a real number c ∈ R such thatṁ β (t n k ) → c.
Using then the convergence (2.9), we deduce that where we have switched to ∂ notation to avoid the awkward double subscript. From (2.14), we see thatũ ∞ (x) is a traveling wave solution to (1.1) moving with the speed c. It remains to show that c = c * (β). The key point is that the steepness comparison argument above applies to any traveling wave solution to In other words, if we set for any φ that satisfies (2.15) with some c ≥ c * (β). Therefore, the limitũ ∞ (x) is the traveling wave that is the steepest among all traveling wave solutions. Lemma A.2 implies thatũ ∞ (x) = φ β (x) is the minimal speed traveling wave and, thus, c = c * (β). By the arbitrariness of the sequence t n , it follows thatṁ β (t) → c * (β) as t → ∞. This finishes the proof of Proposition 2.2.

The weighted Hopf-Cole transform
In this section, we discuss a weighted Hopf-Cole transform that will play a key role in the analysis of the Burgers-FKPP equation for β ≤ 2. Let us recall that the standard Burgers equation can be linearized by means of the Hopf-Cole transform. Namely, if u is a solution to (3.1) then the function The second simple observation is that if u(t, x) is the solution to the standard Fisher-KPP equation in a frame moving with the speed c * = 2: The nonlinear term in (3.6) is negligible for x very large and positive but plays the role of a large absorption for x very negative. Therefore, the solution to (3.6) should be well approximated by the solution of the heat equation on a half-line x > 0 with the Dirichlet boundary condition: This simple idea is what is driving the convergence to a traveling wave in [27,29,44,45]. The weighted Hopf-Cole transform that we discuss below allows us to adapt this intuition to the Burgers-FKPP equation (1.1) with β ≤ 2, and also shows why the transition from pulled to pushed fronts happens at β = 2.
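The classical Hopf-Cole linearization recalled above can be verified numerically in the reverse direction. The Python sketch below assumes one common normalization, which may differ from the exact form used in (3.1): the Burgers equation is taken as u_t + βuu_x = u_xx, and the transform as u = −(2/β) h_x/h with h a positive solution of the heat equation h_t = h_xx. The particular solution h = 1 + exp(ax + a²t) with a = −β/2, the value of β, and the function names are our own illustrative choices.

```python
import math

BETA = 1.5  # illustrative advection strength

def h(t, x):
    """A positive solution of the heat equation h_t = h_xx."""
    a = -0.5 * BETA
    return 1.0 + math.exp(a * x + a * a * t)

def u(t, x):
    """Hopf-Cole image u = -(2 / beta) * h_x / h of the solution h above."""
    a = -0.5 * BETA
    e = math.exp(a * x + a * a * t)
    return -(2.0 / BETA) * a * e / (1.0 + e)

def burgers_residual(t, x, d=1e-3):
    """Central-difference residual of u_t + beta * u * u_x - u_xx."""
    ut = (u(t + d, x) - u(t - d, x)) / (2.0 * d)
    ux = (u(t, x + d) - u(t, x - d)) / (2.0 * d)
    uxx = (u(t, x + d) - 2.0 * u(t, x) + u(t, x - d)) / (d * d)
    return ut + BETA * u(t, x) * ux - uxx

def max_burgers_residual():
    """Largest residual over a small grid of times and positions."""
    points = [(t, 0.5 * j) for t in (0.0, 1.0) for j in range(-10, 11)]
    return max(abs(burgers_residual(t, x)) for t, x in points)

if __name__ == "__main__":
    print(max_burgers_residual())  # small, at the level of the discretization error
```

With this choice of h, the resulting u is a decreasing front connecting 1 to 0 that moves with speed β/2, so the check also exhibits a traveling wave of the pure Burgers equation.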
We will consider the solution to (1.1) in the reference framẽ Here, we take in accordance with the different behavior in Theorem 1.1 in these two cases. In the above reference frame, (1.1) takes the form that is a combination of (3.2) and (3.5).
Let us point out one way to see why β = 2 is the critical value by using v. Sinceũ is steeper than φ and φ converges exponentially to 1 as x → −∞, we find The constant C t may depend on t, but not on x. From this, we see that, as These differences reflect the three different behaviors in Theorem 1.1: when β < 2, the nonlinear term (integral ofũ) does not dominate, and when β > 2, the nonlinear term dominates.
The main result of this section is the following analogue of (3.6) in the standard Fisher-KPP case. It will allow us to adapt an approximation similar to the linear Dirichlet boundary problem (3.7) for β < 2 in Section 4 and with a different boundary condition for β = 2 in Section 6. This will be extremely important for the proof of Theorem 1.1 for β ≤ 2.
Proposition 3.1. Let u(t, x) be the solution to (1.1) with β ≤ 2 and the initial condition u(0, x) as in Theorem 1.1. Then, the function v(t, x) defined in (3.12) satisfies the differential inequality As we will see in the proof, it is here, among other places, that the steepness assumption on the initial condition u(0, x) plays a crucial role, together with propagation of steepness in Proposition 1.1.

Proof of Proposition 3.1
We claim that the function v(t, x) satisfies an equation of the form where Let us verify that (3.14) holds. We compute (3.17) Using these identities in (3.11) gives (3.18) To simplify this equation, we integrate (3.11) from x to ∞ to get Substituting this back into (3.18) gives, after some algebra (3.20) Using the second identity in (3.16) in the right side gives which is exactly (3.14)-(3.15).
Here is the key observation. Proof. The claim is trivially true for β ≤ 0, so we only consider the case 0 < β ≤ 2. Let us first show that a traveling wave φ β (x) satisfies the inequality (3.22) We will prove (3.22) for β < 2, and deduce the conclusion for β = 2 by continuity. Recall that for β < 2 the traveling wave has the asymptotics (A.24): with some constants A and B. As β < 2, it follows that there exists L > 0 so that (3.22) holds for all x ≥ L, and we only need to prove this inequality for x < L. Next, we integrate the traveling wave equation, with c * (β) = 2: Passing to the limit x → −∞ in (3.24), and keeping in mind that φ β (−∞) = 1, gives Therefore, there exists L 1 < 0 so that (3.22) holds for all x < L 1 . In order to show that this inequality also holds for L 1 < x < L, consider the function Using (3.23), we obtain Hence, the only possible critical points of G(x, φ β ) are local maxima. As G(x) > 0 for x > L and x < L 1 we deduce that G(x, φ β ) > 0 for all x ∈ R, and (3.22) holds.
To finish the proof of Lemma 3.2, consider the integral To write this integral differently, define $x(t, v)$ by $\tilde u(t, x(t, v)) = v$ and note that with $E(t, v)$ as in (2.5). Then, making the change of variables $y \mapsto v$ via $y = x(t, v)$, we find with $\bar E(v)$ defined in (2.6). We used (2.6) in the inequality in the first line in (3.28), employing the assumption that the initial condition is steeper than the minimal speed traveling wave. Using (3.22) in (3.28) we see that We conclude that finishing the proof of Lemma 3.2, and hence that of Proposition 3.1 as well.
4 The proof of Theorem 1.1 for β < 2

Outline of the proof
In this section, we prove Theorem 1.1 for $\beta < 2$. The overall strategy of the proof is similar to [44], which considers the classical Fisher-KPP equation with $\beta = 0$, but there are also non-trivial differences worth mentioning. The first step is to get control of the solution on the spatial scales $x \sim O(\sqrt{t})$. This is done using the self-similar variables. The estimates are precise enough to include the tail behavior Unlike in [44], in this step we rely crucially on the weighted Hopf-Cole transform and Proposition 3.1 to construct upper and lower barriers for the solution. The main estimate is the following:

Lemma 4.1. Let $\tilde u(t, x)$ be the solution to (3.11) with the initial condition as in Theorem 1.1. There exist $\alpha_\infty > 0$ and $\varepsilon_0 > 0$ so that, for any $0 < \gamma < 1/2$ and $\varepsilon \in (0, \varepsilon_0)$, there exists $T_{\varepsilon,\gamma} > 0$ such that

The second step is to use the pulled nature of the problem to show that the control of $\tilde u$ given in Lemma 4.1 at $x = x_\gamma(t)$ induces convergence to a traveling wave on the scales $x \sim O(1)$. Before discussing the modifications required for $\beta \neq 0$, we recall the argument in [44]. It proceeds by constructing solutions $\tilde u_\alpha$ to the same equation as satisfied by $\tilde u$ (which, in [44], is (3.11) with $\beta = 0$), considered on the half-line $(-\infty, (t + 1)^\gamma)$, with the boundary condition $\tilde u_\alpha(t, (t + 1)^\gamma) = \alpha x_\gamma e^{-x_\gamma}$, and with $\alpha = \alpha_\infty \pm \varepsilon$ (cf. [44, Section 4]). Then, the analogue of (4.1) for $\beta = 0$, which is Lemma 5.1 in [44], and the comparison principle imply that $\tilde u_{\alpha_\infty - \varepsilon} \le \tilde u \le \tilde u_{\alpha_\infty + \varepsilon}$.
Unfortunately, a direct attempt to do this in our setting fails for several reasons. The convergence of $\tilde u_\alpha$ to $\varphi_\alpha$ in [44] is achieved by the construction of an explicit super-solution for the equation for the difference $e^x(\tilde u_\alpha - \varphi_\alpha)$. However, the Burgers term in our context leads to a growth term in this equation when $\beta > 0$ (see, in contrast, (4.63) when $\beta \le 0$). We bypass this issue by working with the weighted Hopf-Cole transforms of $\tilde u$ and the traveling wave. One might be tempted to define $\tilde u_\alpha$ as above and then take its weighted Hopf-Cole transform. However, there is no apparent reason for $\tilde u_\alpha$ to be steeper than the traveling wave, meaning that Lemma 3.2 does not apply, and we cannot deduce the key differential inequality (3.13). To bypass this difficulty, we work at the level of the weighted Hopf-Cole transform, defining $v_\alpha$ as the solution to (3.14), treated as a linear equation with the $\tilde u$ terms serving as given coefficients (see (4.29)-(4.31) below), with a suitable boundary condition at $x = (t + 1)^\gamma$. Proceeding as in [44], we obtain an upper bound on $v_\alpha$ given by a shift of the weighted Hopf-Cole transform of the traveling wave and a decaying term. The lower bound is obtained using yet another steepness comparison. Afterwards, an additional argument is needed to upgrade this to the convergence of $\tilde u$, due to the fact that $\tilde u$ and $v$ are connected in a nonlocal fashion.
Below, we first prove Lemma 4.1, then apply it to show closeness of the weighted Hopf-Cole transforms of $\tilde u$ and the traveling wave $\phi_\beta$ in Lemma 4.3, and, finally, deduce the closeness of $\tilde u$ and $\phi_\beta$ from this. For the sake of concreteness, we take $\tilde u(0, x) = \mathbb{1}(x \le 0)$. The argument for general initial conditions as in Theorem 1.1 is nearly verbatim.

Analysis in the self-similar variables
We start with equations (3.14)-(3.15) and pass to self-similar variables: let In these variables, (3.14)-(3.15) becomes with the operator L defined by Note that when β = 0, which is the classical Fisher-KPP equation, (4.3) reduces to a local equation In that case, the nonlinear term in the right side is very small for η > 0 but plays the role of a large absorption for η < 0. This was used in [44] to show that (4.5) is well-approximated by the linear problemω augmented with the Dirichlet boundary conditionω(τ, 0) = 0. This is the main intuition behind the proof of the long-time asymptotics of ω in [44]. We now collect the ingredients that would allow us to use the arguments of [44]. First, Lemma 3.2 implies that, as long as the initial condition u(0, x) is steeper than the minimal speed traveling wave, the right hand side of (4.3) is non-positive. Hence, the solution to (4.6) is still a super-solution to (4.3).
Second, the coefficient in the parentheses in the right side of (4.3) can be bounded below as follows: by Lemma 3.2, we have that leading to It follows that the solution to is a sub-solution to (4.3).
The third observation is that the right side of (4.3) is very small for $\eta \gg e^{-\tau/2}$. To see this, we first show that, if $\beta < 2$, then there exist $A > 1$ and $L > 0$ so that (4.9) Indeed, the function $\bar u(t, x)$ satisfies (we set $L = 0$ momentarily to simplify the notation) Hence, as long as $\beta < 2$, we may choose $A \in (1, 2/\beta)$ so that With this choice of $A$, the function $\bar u(t, x)$ is a super-solution to (1.1) for any $L \in \mathbb{R}$. Since $A > 1$, we Then, the comparison principle for (1.1) implies that (4.9) holds. As a consequence, we see that the right side of (4.3) can be bounded by $\tilde u$ which are double-exponentially small on the scales $\eta = O(e^{-(1/2 - \gamma)\tau})$ for any $\gamma \in (0, 1/2)$ (recall that $x = \eta e^{\tau/2}$).
Finally, the asymptotics in (4.12) yield the approximate Dirichlet boundary condition for $\omega$ at $\eta = -e^{-(1/2 - \gamma)\tau}$. Indeed, we have (4.13) In order to analyze the long-time behavior of $\omega$, the main point is the following. The above arguments show that $\omega(\tau, \eta)$ should be well-approximated by the solution to the linear problem (4.6) with the Dirichlet boundary condition. The linear operator $L$, given by (4.4), is self-adjoint with compact resolvent on $H^1_0(e^{\eta^2/4}\,d\eta; \mathbb{R}_+)$. Its spectrum consists of the eigenvalues $0, 1, 2, \dots$, and its principal eigenfunction is $\eta e^{-\eta^2/4}$ (in general, the eigenfunctions are given by the odd Hermite polynomials multiplied by the Gaussian $e^{-\eta^2/4}$). Hence, the dominant behavior for $\omega(\tau, \eta)$ as $\tau \to +\infty$ should be given by $\alpha_\infty \eta e^{-\eta^2/4}$, for some $\alpha_\infty$ depending on the initial data. This simple picture is complicated by the error terms in (4.3) and the "not quite zero" boundary condition (4.13). However, they have "fast" decay, so, with careful analysis, they can be suitably controlled. This is the argument in a nutshell, even though the details of the proof are more intricate.
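The spectral claims can be sanity-checked by hand. The sketch below is only an illustration under an assumption: since (4.4) is not reproduced here, it takes the operator in the form $Lw = -w_{\eta\eta} - \tfrac{\eta}{2} w_\eta - w$ on $\eta > 0$ with $w(0) = 0$, which is the form used for the classical Fisher-KPP problem in self-similar variables. The function $h$ is a hypothetical name for the polynomial prefactor.

```latex
% Assumed form of the operator (not taken from (4.4)):
%   L w = -w_{\eta\eta} - (\eta/2) w_\eta - w   on \eta > 0, with w(0) = 0.
\begin{align*}
e^{\eta^2/4}\,\bigl(w_{\eta\eta} + \tfrac{\eta}{2} w_\eta\bigr)
  &= \bigl(e^{\eta^2/4} w_\eta\bigr)_\eta
  && \text{(symmetry in } L^2(e^{\eta^2/4}\,d\eta)\text{)}, \\
w = h(\eta)\,e^{-\eta^2/4}
  \;&\Longrightarrow\;
  L w = -\bigl(h'' - \tfrac{\eta}{2}\,h' + \tfrac12\,h\bigr)\,e^{-\eta^2/4}, \\
h = \eta &: \quad L w = 0 \quad (\lambda_0 = 0), \\
h = \eta^3 - 6\eta &: \quad L w = w \quad (\lambda_1 = 1).
\end{align*}
```

Both prefactors are odd polynomials vanishing at $\eta = 0$, in line with the Dirichlet condition and with the statement that the eigenfunctions come from odd Hermite polynomials.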
In order to carry out this strategy, the authors of [44] require exactly the four ingredients listed above: equations for the super- and sub-solutions (given in our case by (4.6) and (4.8)), double-exponential decay of the coefficients in the right side of (4.3) (see (4.11) and (4.12)), and the approximate Dirichlet boundary condition, as in (4.13). Thus, the strategy of that paper, which involves constructing successively more precise sub- and super-solutions of $\omega$ using the spectral properties of $L$, can be applied without alteration. As such, we state the following lemma giving the asymptotics of $\omega$ and omit the details.
The next step is to pass from the control of the solution on the spatial scales $x \sim O(t^\gamma)$, provided by Lemma 4.1, to the spatial scales $x \sim O(1)$ using the pulled nature of the Burgers-FKPP equation for $\beta < 2$. The arguments are a bit different for $\beta \le 0$ and $\beta \in (0, 2)$. We first discuss the latter case, where the arguments deviate from [44], as we have discussed in the outline of this section, and later explain why the case $\beta < 0$ is quite similar to what was done in [44] for $\beta = 0$. We are going to use an argument inspired by [44], but we will only apply it after an application of the weighted Hopf-Cole transform, and the conclusion is different as a result. We take a traveling wave $\phi_\beta(x)$, and shift it into the moving frame: leaving the definition of $\zeta_\alpha(t)$ open for the moment. This function satisfies Next, we define its Hopf-Cole transform as Noticing that we obtain an equation for the function $\psi$: We used (3.24) in the last step above.
We note that, differentiating (4.20) and using the asymptotics (A.24) of $\phi_\beta$, it is easy to check that $\psi_\alpha$ is increasing in $x$ when $x \gg 1$. Hence, $\zeta_\alpha$ is well-defined for $t$ sufficiently large. In the sequel, we work with $\alpha = \alpha_\infty \pm \varepsilon$, in which case $\psi_\alpha$ approximately matches with $v$ at $x = (t + 1)^\gamma$. Note a difference with [44]: the shift is determined by the value of $\psi_\alpha(t, x)$ at the point $x = (t + 1)^\gamma$, and not by the value of $\varphi_\alpha(t, x)$ as in [44]. Next, using the asymptotics (A.24) of $\phi_\beta$, we find that and where $C$ is independent of $\alpha$ over the interval $(\alpha_\infty/2, 2\alpha_\infty)$. In addition, using (A.24) again, we find that In view of (4.22), (4.25), and (4.26), we find, when $|x| \le (t + 1)^\gamma$, Let us also recall that the Hopf-Cole transform $v(t, x)$ of the function $\tilde u$, defined in (3.12), satisfies (3.14)-(3.15):

We define $v_\alpha(t, x)$ as the solution to (4.28), thought of as a linear equation for $v$, with prescribed function $\tilde u(t, x)$: We now make a couple of observations. First, note that the normalization (4.23) and the boundary condition (4.30) imply that
$$v_\alpha(t, (t + 1)^\gamma) = \psi_\alpha(t, (t + 1)^\gamma), \quad \text{for all } t > T_\varepsilon \text{ and } \varepsilon > 0. \tag{4.32}$$
Next, we claim that for any $\varepsilon > 0$, we have
$$v_{\alpha_\infty - \varepsilon}(t, x) \le v(t, x) \le v_{\alpha_\infty + \varepsilon}(t, x), \quad \text{for all } t \ge T_\varepsilon \text{ and } x < (t + 1)^\gamma. \tag{4.33}$$
This follows from the fact that $v$ and $v_{\alpha_\infty \pm \varepsilon}$ satisfy the same linear parabolic equation (4.28) and (4.29) and the same initial condition (4.31), but take ordered values at the boundary. Indeed, with the help of Lemma 4.1, we obtain for $t > T_\varepsilon$, up to increasing $T_\varepsilon$ to deal with the different exponential factors in the second line above. A similar argument gives $v_{\alpha_\infty - \varepsilon}(t, (t + 1)^\gamma) < v(t, (t + 1)^\gamma)$. Thus, comparison yields (4.33), as claimed.
The above is due to a comparison argument using the fact that the function in the right side of (4.35) solves the same linear equation as v α∞+ε , with the same boundary condition, but with larger initial data. The important consequence of (4.35), along with (4.33), is that For the last inequality, we used (4.12). Let us define, for any ε ∈ (−1, 1), The following bound will be crucial for us.
Lemma 4.3. There exists $\lambda > 0$ so that we have, for any $\varepsilon > 0$:

Proof. We temporarily abuse notation and denote $s_\varepsilon = s$ in this proof. Using (4.27) and (4.29), we find, for $|x| \le (t + 1)^\gamma$, Using (4.36), we see that and, by (4.32), The inequality (4.40) is another reminder of the importance of the condition $\beta < 2$ here. Lemma 3.2 tells us that the zero order coefficient in (4.39) is positive: In addition, we claim that Indeed, recalling from (4.21) that $\varphi_x(t, x) = \phi_\beta'(x + \zeta(t))$, and using the notation $E$ and $\bar E$ set in (2.6), we find and (4.43) follows from (4.44) and (4.45). We deduce from (4.39), (4.42), and (4.43) that $s(t, x)$ satisfies a differential inequality with $G(t, x) \ge 0$ and boundary conditions (4.40)-(4.41). We can now argue as in [44] that we can find $\lambda$, $\gamma$, $\varepsilon$ sufficiently small so that, for we have for $t \ge T$, up to possibly increasing $T$. The choice of $\gamma$ occurs in this step. Hence, $\bar s$ is a super-solution of (4.39) and, up to multiplying $\bar s$ by a large constant so that $s(T, \cdot) \le \bar s(T, \cdot)$, we have From this, (4.38) follows, finishing the proof of Lemma 4.3.

By virtue of Lemma 4.3, we have now established that, for $t > T_\varepsilon$ and $|x|$ (4.49) The first inequality is due to (4.33) and the second is due to (4.38). We point out that we do not have a more precise lower bound. This is another place where the steepness comparison will play a crucial role. Finally, note that the shifts corresponding to $\psi_{\alpha_\infty \pm \varepsilon}$ satisfy, as in (4.24):

The end of the proof of Theorem 1.1 for β < 2

The case 0 < β < 2

The definitions (3.8) of $\tilde u$ and (4.18) of the difference $\varphi_{\alpha_\infty}$, and the asymptotics (4.24) of $\zeta$ imply that Theorem 1.1 reduces to the uniform convergence of $\tilde u - \varphi_{\alpha_\infty}$ to zero on $\mathbb{R}$. So far, we have only shown a weak version of closeness for their weighted Hopf-Cole transforms in (4.49). Indeed, notice that where $o(1)$ vanishes as $t \to \infty$. It follows that the established inequalities are quite far apart due to the $t^\gamma$ factor. We handle this issue now.
Before going to the proof, note that the uniform convergence to zero of $\tilde u - \varphi_{\alpha_\infty}$ reduces to showing that, for any $L > 0$, This is sufficient due to the convergence in shape in Proposition 2.2. Fix any $\varepsilon \in (0, \alpha_\infty/2)$. We note that, due to (4.50), we have for all $t$ sufficiently large and any $x$. Hence, it suffices to establish upper bounds on $\tilde u - \varphi_{\alpha_\infty + \varepsilon}$ and $\varphi_{\alpha_\infty - \varepsilon} - \tilde u$.
To handle the term I 1 in (4.56), we apply (4.49) to find, since β ∈ (0, 2): (4.58) The term I 4 is the next simplest. Using (4.12), we see that Using (4.59) and also (A.24) to handle the tail integral over ((t + 1) γ , ∞), we obtain (4.60) We now handle I 3 . If the bracketed term in the definition of I 3 is non-positive, there is nothing to prove as I 3 ≤ 0. If the bracketed term is positive, we write The reason for the extra step here is that we do not know a priori that e −x v is bounded; however, we do know that e −x ψ α∞+ε is.
For I 31 , we use (4.49), the positivity of the bracketed term, and the fact that the bracketed term is smaller than one (again recall that β ∈ (0, 2)) to find For I 32 , we first notice that In addition, from (4.54), we see that Thus, we find Taylor expanding the exponential and using the asymptotics (4.50) of ζ α∞±ε and (A.24) of φ β , we see that We conclude that The claim (4.52) then follows from (4.53), (4.55), and (4.62). This concludes the proof for β ∈ (0, 2).
The case β ≤ 0

This case is essentially the same, except for some simplifications, so we only highlight the changes that need to be made. The key estimate is to obtain an upper bound on In this case $\zeta_\alpha$ has the same asymptotics as in (4.50). Then $s_\varepsilon$ satisfies The last inequality holds when $|x| \le (t + 1)^\gamma$ for the same reasons as in (4.27) and due to the fact that $\beta \le 0$ and $\partial_x \varphi_{\alpha_\infty + \varepsilon} = \phi_\beta' \le 0$. Thus, the same upper bound as in Lemma 4.3 holds. On the other hand, it is easy to check that $\tilde u > \varphi_{\alpha_\infty - \varepsilon}$ on $x < (t + 1)^\gamma$ for $t$ sufficiently large. This is due to Lemma 4.1, which yields the correct ordering at $x = (t + 1)^\gamma$, and the fact that $\tilde u$ is steeper than $\varphi_{\alpha_\infty - \varepsilon}$.
The combination of the above with a simpler version of the argument for the case β ∈ (0, 2) yields the desired convergence. This finishes the proof of Theorem 1.1 when β < 2.

The case β = 2: bounds on the front location
We now turn to the case $\beta = 2$, where the analysis is particularly delicate. Let us define the shift $\mu(t)$ by Note that, due to the sign convention in (5.1), an upper bound on $\mu(t)$ is a lower bound on the front location and vice versa. Our goal in this section is to prove the following upper and lower bounds on $\mu(t)$.

Proposition 5.1. Let $u(t, x)$ be the solution to (1.1), with the initial condition as in Theorem 1.1, and let $\mu(t)$ be defined by (5.1). There exists $m_0 > 0$ that depends on $u(0, x)$ so that

Later, in Section 6, we will improve these bounds to the precise asymptotics of Theorem 1.1: Proposition 5.1 is, however, a crucial step in the proof of (5.3). Its proof occupies almost all of the rest of this section and is the heart of this paper. In order to explain the outline of the proof, we will need to do some preliminary transformations, leading to Lemmas 5.2 and 5.4 below that imply the conclusion of Proposition 5.1. We will discuss their proofs when we come to their respective statements. As a technical comment, we mention that without loss of generality we will, once again, assume that $u(0, x) = \mathbb{1}(x \le 0)$. At the very end of this section, we use Proposition 5.1 to obtain a helpful bound in an intermediate region in a short Section 5.6.

5.1 Outline of the proof of Proposition 5.1

An exponential moment
We first give a heuristic argument to explain the delay in the case $\beta = 2$ and the role of $\beta$. Consider the Burgers-FKPP equation, in the moving frame $x \to x - 2t + \mu(t)$, with an unknown shift $\mu(t)$: To highlight the role of $\beta$, we have not yet specified it to the value $\beta = 2$. A simple computation, using only (5.4) and several integrations by parts, shows that the exponential moment In the first equality, we simply used (5.4). In the second-to-last equality, we used that $\beta = 2$. In every equality in between, we only integrated by parts and cancelled terms. Therefore, as long as we choose $\mu(0) = 0$. As is clear from (5.6), this algebraic property is specific to $\beta = 2$.
If $\mu(t)$ is the "correct frame," in the sense that $\tilde u(t, x)$ converges to a traveling wave, we expect from the philosophy of the self-similar variables that $\tilde u(t, x)$ has the asymptotics
$$\tilde u(t, x) \sim \frac{1}{1 + e^{x - x_0}}\, e^{-x^2/(4t)}, \quad \text{for } t \gg 1 \text{ and } x \ge 0. \tag{5.8}$$
This indicates that if $I(t)$ is computed in the correct reference frame, then with an explicit constant $C_0$ that depends on $x_0$. For this to be consistent with (5.7), we should have explaining the $(1/2)\log t$ shift for the front position. The rest of this section and the following one is a justification of (5.10).
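The heuristic can be made explicit with a short formal computation. The sketch below assumes that (1.1) reads $u_t + \beta u u_x = u_{xx} + u(1 - u)$, that $\tilde u(t, x) = u(t, x + 2t - \mu(t))$, and that $e^x \tilde u$ and $e^x \tilde u_x$ vanish as $x \to \pm\infty$. The moment is written as $I(t) = \int e^x \tilde u(t, x)\,dx$, and the constant obtained at the end is only what this formal calculation suggests, not a statement of (5.9).

```latex
\begin{align*}
\tilde u_t &= (2 - \dot\mu)\,\tilde u_x - \beta\,\tilde u\,\tilde u_x
              + \tilde u_{xx} + \tilde u - \tilde u^2, \\
\int e^x \tilde u_x\,dx &= -I, \qquad
\int e^x \tilde u_{xx}\,dx = I, \qquad
\int e^x \tilde u\,\tilde u_x\,dx = -\tfrac12 \int e^x \tilde u^2\,dx, \\
\dot I &= -(2 - \dot\mu)\,I + \tfrac{\beta}{2}\int e^x \tilde u^2\,dx + I + I
          - \int e^x \tilde u^2\,dx
        = \dot\mu\,I + \Bigl(\tfrac{\beta}{2} - 1\Bigr)\int e^x \tilde u^2\,dx .
\end{align*}
% For \beta = 2 the last integral drops out, so I(t) = I(0) e^{\mu(t)} when \mu(0) = 0.
% Inserting the ansatz (5.8): the part x < 0 contributes O(1), and for x > 0
\begin{align*}
I(t) \approx \int_0^\infty \frac{e^{x}}{1 + e^{x - x_0}}\, e^{-x^2/(4t)}\,dx
     \sim e^{x_0} \int_0^\infty e^{-x^2/(4t)}\,dx = e^{x_0}\sqrt{\pi t},
\qquad\text{hence}\qquad
\mu(t) = \tfrac12 \log t + O(1).
\end{align*}
```

The sign of the coefficient $\beta/2 - 1$ in the last term of $\dot I$ is the point of the computation: it vanishes only at $\beta = 2$, which is why the moment is slaved to the shift exactly at the critical value.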

An inhomogeneous conservation law
In order to explain the outline of the proof of Proposition 5.1, we need to introduce a change of variable related to the evolution of mass for the exponential moment in (5.6). Let $u(t, x)$ be the solution to the Burgers-FKPP equation (1.1) with the specific value $\beta = 2$, and set which satisfies Note that we use a slightly different notation here for the shifted function $u$ rather than $\tilde u$. This is to denote the difference in shift: $u$ is shifted into the moving frame $2t$, while $\tilde u$ is shifted to the moving frame $2t - \mu(t)$ matching the front, as, for instance, in the proof of Proposition 2.2. The first key observation is that the function $p(t, x) = e^x u(t, x)$ satisfies a viscous spatially inhomogeneous conservation law:
$$p_t + \bigl(e^{-x} p^2\bigr)_x = p_{xx}, \tag{5.14}$$
with the initial condition
$$p(0, x) = e^x\, \mathbb{1}(x \le 0). \tag{5.15}$$
Recall that we assume, without loss of generality, that $u(0, x) = \mathbb{1}(x \le 0)$. Notice that (5.14) conserves mass for $p(t, x)$:
$$\int p(t, x)\,dx = \int p(0, x)\,dx = 1. \tag{5.16}$$
The normalization (5.1) in terms of $u(t, x)$ becomes which translates into A simple preliminary observation is that the solution to (5.14)-(5.15) satisfies
$$p(t, x) \le e^x, \tag{5.19}$$
simply because $u(t, x) \le 1$. Note also that
$$p(x) = e^x \quad \text{and} \quad p(x) = \frac{e^x}{1 + e^{x - \xi}} \tag{5.20}$$
are exact solutions to (5.14), for any shift $\xi \in \mathbb{R}$. The proof of Proposition 5.1 relies on the analysis of the solution to (5.14)-(5.15) and proceeds in the following steps. First, we prove an upper bound on $\mu(t)$.
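The conservation-law structure can be verified directly. The sketch below assumes that (1.1) reads $u_t + \beta u u_x = u_{xx} + u(1 - u)$, so that for $\beta = 2$ the solution in the frame moving with speed $2$ satisfies $u_t - 2u_x + 2uu_x = u_{xx} + u - u^2$, and it takes $p = e^x u$. The form of the conservation law is derived here under these assumptions and may be arranged differently from (5.14); the symbol $U$ is used only for this illustration.

```latex
\begin{align*}
u_x = e^{-x}(p_x - p), \qquad u_{xx} &= e^{-x}(p_{xx} - 2p_x + p), \\
u_t - 2u_x + 2uu_x - u_{xx} - u + u^2
  &= e^{-x}\bigl(p_t - p_{xx} + 2e^{-x} p\,p_x - e^{-x} p^2\bigr) \\
  &= e^{-x}\bigl(p_t - p_{xx} + (e^{-x} p^2)_x\bigr),
\end{align*}
% so p_t + (e^{-x} p^2)_x = p_{xx}, equivalently p_t + (u p)_x = p_{xx} since e^{-x} p^2 = u p.
% Steady states with zero flux solve p_x = e^{-x} p^2. Writing p = e^{x} U:
\begin{align*}
p_x - e^{-x} p^2 = e^{x}\bigl(U + U' - U^2\bigr) = 0
\quad\text{for}\quad U(x) = \frac{1}{1 + e^{x - \xi}}, \quad U' = -U(1 - U),
\end{align*}
% and also for U \equiv 1, which gives p = e^{x}.
```

Integrating the divergence form in $x$ gives conservation of $\int p\,dx$ whenever the flux $p_x - e^{-x}p^2$ vanishes at both ends, which is the mass conservation used throughout this section.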
Lemma 5.2. There exists $K_1 > 0$ so that $\mu(t)$ defined by (5.17) satisfies

First, we will show that the mass of $p(t, x)$ in the region $\{x > N\sqrt{t}\}$ is exponentially small in $N$ by an argument that bounds exponential moments of $p$. Next, using the simple exponential bound (5.19), we can check that the mass in the region $\{x < -\mu(t)\}$ is bounded above by $\exp\{-\mu(t)\}$.
To bound the mass in the middle region, recall that when β = 2, the traveling wave moving with the minimal speed c * = 2 is explicit and is given by (1.14): From this and the steepness comparison in Proposition 1.1, we immediately deduce the following useful property.
By Lemma 5.3 and (5.18), we know that It follows that the mass in the region $-\mu(t) \le x \le N\sqrt{t}$ is bounded by $2N\sqrt{t}\,e^{-\mu(t)}$ (it is easy to show the weak bound $\mu(t) < N\sqrt{t}$, see (5.28) below). Combining all three bounds and recalling mass conservation (5.16) of $p$, we deduce that $\mu(t)$ must satisfy the upper bound in Lemma 5.2. The details are given in Section 5.3.
To prove a lower bound for µ(t) we use the following lemma.
Lemma 5.4. There exists $C > 0$ so that, for all $x \in \mathbb{R}$ and $t > 0$, The constant $C$ depends on the initial data nontrivially.
We prove Lemma 5.4 in Section 5.4, and its surprisingly delicate proof is outlined there. When combined with (5.18), Lemma 5.4 implies the lower bound in Proposition 5.1 Let us make a brief comment that the t −1/2 decay rate in (5.25) is standard for parabolic equations in one dimension, and would be expected for a solution to (5.13). Nevertheless, the proof of Lemma 5.4 is much less straightforward than one would naively expect. To illustrate the potential obstacles, notice that (5.14) can be written as One might hope to "forget" the connection between u and p, and prove t −1/2 decay for linear divergence form advection-diffusion equations of the form (5.27) in general, with an advection term u(t, x) that, say, connects two constants on the left and on the right. This seems to be a good cartoon for u(t, x). However, such decay cannot hold in general. Indeed, consider the following explicit example. Let The function q(x) is rapidly decaying at infinity but, of course, it does not decay in time. This shows that the boundedness or existence of the limits at infinity of u are not sufficient to determine the decay. It is crucial that u is negligible on R + and its profile does not move much -these are, however, exactly the properties that we are trying to prove. Circumventing these difficulties requires interesting a priori estimates for our specific problem, which will be discussed in Section 5.4. To summarize, Proposition 5.1 reduces to Lemmas 5.2 and 5.4 and we prove these lemmas in the rest of this section.

Preliminary weak bounds on µ
Let us first give some very poor bounds on µ(t) that, at least, ensure that it does not behave too wildly. These are useful in the sequel.
For a very simple bound, a comparison to the standard KPP equation with β = 0 that uses monotonicity of u(t, x) in x, implies that with a universal constant C.
We now use the conservation of mass for $p$ (5.16) to obtain a lower bound on $\mu$. Recalling Lemma 5.3 and the definition (5.1) of $\mu$, we find
$$p(t, x) = e^x u(t, x) \ge \frac{e^x}{1 + e^{x + \mu(t)}}, \quad \text{for } x < -\mu(t).$$
We emphasize that the proof of Lemma 5.3 is independent of all other lemmas in this section -it is simply a consequence of the steepness comparison. Therefore, we have We conclude that µ(t) ≥ − log 2. (5.29) The bounds (5.28) and (5.29) will be greatly improved below.
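For the reader's convenience, here is one way the bound (5.29) can be obtained from the two ingredients just mentioned. The sketch assumes the normalized initial condition $u(0, x) = \mathbb{1}(x \le 0)$, so that the conserved mass of $p$ equals $\int_{-\infty}^0 e^x\,dx = 1$, and uses only $p \ge 0$ together with the lower bound on $p$ for $x < -\mu(t)$ stated above; it is not claimed to be the exact chain of inequalities in the original display.

```latex
\begin{align*}
1 = \int p(t,x)\,dx
  \;\ge\; \int_{-\infty}^{-\mu(t)} \frac{e^{x}}{1 + e^{x + \mu(t)}}\,dx
  \;\ge\; \frac12 \int_{-\infty}^{-\mu(t)} e^{x}\,dx
  \;=\; \frac12\, e^{-\mu(t)},
\end{align*}
% using 1 + e^{x + \mu(t)} \le 2 for x \le -\mu(t); hence e^{\mu(t)} \ge 1/2, that is, \mu(t) \ge -\log 2.
```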

5.3 An upper bound on the shift: the proof of Lemma 5.2

As we have mentioned, the strategy of the proof is to show that if $\mu(t)$ is too large then $p(t, x)$ cannot have a mass larger than $1/10$ in any of the three regions
$$L = \{x < -\mu(t)\}, \qquad M = \{-\mu(t) \le x \le N\sqrt{t}\}, \qquad R = \{x > N\sqrt{t}\}, \tag{5.30}$$
provided that $N$ is also chosen sufficiently large. This would contradict (5.16). Note that (5.29) implies $N\sqrt{t} \ge -\mu(t)$ for $t \ge 1$ and $N$ sufficiently large, so that the regions above are well-defined. For the left region $L$, we simply apply (5.19) and write
$$\int_L p(t, x)\,dx \le \int_{-\infty}^{-\mu(t)} e^x\,dx = e^{-\mu(t)}. \tag{5.31}$$
For the middle region $M$, we also have a simple estimate that uses Lemma 5.3: for all $t \ge 1$. We used (5.28) in the last step above. Now, we deal with the right region $R$. First we state a bound on an exponential moment of $p$: fix any $m \in (0, 1/2)$ and let
$$I_m(t) = \int \bigl(e^{mx} + e^{-mx}\bigr)\, p(t, x)\,dx. \tag{5.33}$$
We claim that We postpone the proof of (5.34) momentarily and show how to conclude the bound for the integral over $R$. For $N > 1$ large, we estimate this mass by using (5.33) with $m = 1/\sqrt{t}$. This gives We used the conservation of mass (5.16) in the last step.
All that remains to finish the proof of Lemma 5.2 is to establish (5.34), which we do now. Recalling I m from (5.33), multiplying (5.14) by e mx + e −mx , and integrating yields We decompose the last integral: On the other hand, as x > log(2) > −µ(t), by (5.29), we may apply Lemma 5.3 to obtain (2) [e mx + e −mx ]p(t, x)dx.
In the second inequality, we used (5.41), and in the last inequality, we used that $|x| \le e^x \le e^{-\mu(t)}\bigl(1 + e^{x + \mu(t)}\bigr)$.

A lower bound on the shift: the proof of Lemma 5.4
The proof of Lemma 5.4 is quite a bit more involved than that for Lemma 5.2. Let us first explain the main steps of the proof. Recall that the standard $L^\infty$-decay for diffusion equations of the self-adjoint form with a uniformly positive and bounded diffusivity $a(x)$ is obtained as follows: first, one gets the dissipation inequality An application of the Nash inequality leads, after solving an elementary differential inequality, to the $L^1$-$L^2$ decay estimate
$$\|z(t, \cdot)\|_{L^2} \le \frac{C}{t^{n/4}}\, \|z(0, \cdot)\|_{L^1}. \tag{5.46}$$
The self-adjoint form of (5.44) and the estimate (5.46) give the dual bound
$$\|z(t, \cdot)\|_{L^\infty} \le \frac{C}{t^{n/4}}\, \|z(0, \cdot)\|_{L^2}. \tag{5.47}$$
The last step is to apply the semi-group property and the above estimates to deduce that See, e.g., [18, Section 2.4] for a full treatment of this. It does not seem possible to directly obtain a dissipation inequality for the $L^2$-norm of the function $p(t, x)$, starting with (5.14), due to the spatial inhomogeneity of the nonlinear term. Instead, to get an analogue of (5.45), we will use a suitably chosen weight $\rho(t, x)$ that weighs $\mathbb{R}_+$ more than $\mathbb{R}_-$, and establish an $L^2_w$-dissipation inequality for the function
$$\varphi = \frac{p}{\rho}. \tag{5.49}$$
Here we use $w$ as a subscript to emphasize that we are working in weighted Lebesgue spaces. Such a weight allows us to focus on the region where advection is negligible and diffusion dominates the evolution of (5.27).
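The "elementary differential inequality" in the classical argument recalled above can be spelled out as follows. This is a sketch of the standard unweighted case only, under the assumptions that $z$ solves $z_t = \nabla\cdot(a(x)\nabla z)$ in $\mathbb{R}^n$ with $a \ge a_0 > 0$, that the $L^1$ norm of $z$ does not increase in time, and that the Nash inequality is used in the form written in the first line; the names $E$, $M$, $a_0$, $C_N$ and $c$ are introduced here for illustration.

```latex
% Notation: E(t) = \|z(t,\cdot)\|_{L^2}^2, \quad M = \|z(0,\cdot)\|_{L^1}.
\begin{align*}
\text{Nash:}\qquad
  & \|z\|_{L^2}^{1 + 2/n} \le C_N\, \|z\|_{L^1}^{2/n}\, \|\nabla z\|_{L^2}, \\
\text{dissipation:}\qquad
  & \tfrac12\, \dot E = -\int a\,|\nabla z|^2\,dx \le -a_0\, \|\nabla z\|_{L^2}^2
    \le -\frac{a_0}{C_N^2}\, \frac{E^{1 + 2/n}}{M^{4/n}}, \\
\text{integration:}\qquad
  & \frac{d}{dt}\, E^{-2/n} = -\frac{2}{n}\, E^{-2/n - 1}\, \dot E
    \ge \frac{4 a_0}{n\, C_N^2}\, M^{-4/n}
    \;\Longrightarrow\;
    E(t) \le \Bigl(\frac{n\, C_N^2}{4 a_0\, t}\Bigr)^{n/2} M^2 ,
\end{align*}
% which is the L^1 - L^2 bound \|z(t)\|_{L^2} \le C t^{-n/4} \|z(0)\|_{L^1}. Combining it with
% the dual L^2 - L^\infty bound on [t/2, t] via the semigroup property gives
\begin{align*}
\|z(t,\cdot)\|_{L^\infty}
  \le \frac{c}{(t/2)^{n/4}}\, \|z(t/2,\cdot)\|_{L^2}
  \le \frac{c^2}{(t/2)^{n/2}}\, \|z(0,\cdot)\|_{L^1}.
\end{align*}
```

In one dimension this is the $t^{-1/2}$ rate that Lemma 5.4 aims for; the difficulty described in the text is that neither the dissipation step nor the duality step is available in this form for $p$.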
It turns out that the weight ρ(t, x) = 1 − u(t, x) can actually be used to produce a dissipation inequality because, as we will see, the function 1− u is a super-solution for (5.14) satisfied by p(t, x). This property is not purely algebraic: it will, once again, use the steepness comparison of u(t, x) to the traveling wave. The dissipation computation is directly inspired by the relative entropy arguments for linear advection-diffusion equations in [17,41]. Here, however, we compute the entropy relative not to a solution but a super-solution, and the nonlinear nature of the present situation requires specific cancellations. This is the subject of Proposition 5.8 below.
We also establish a weighted Nash inequality stated in Proposition 5.9 below. When adapted to our setting, it yields the appropriate long time decay of the L 2 w -norm of ϕ, up to a (potentially large) boundary layer in time. An interesting complication is that the weighted Nash inequality holds for a nontrivial class of weights satisfying certain assumptions, and we need to control the fact that our weight, coming from the solution to a nonlinear evolution equation, satisfies these assumptions for all t > 0 in a uniform way. This is done in the course of the proof of Lemma 5.7 below.
In contrast to the standard proof for diffusion equations, we cannot directly pass from $L^2_w$-decay to $L^\infty$-decay. Indeed, due to the weight, the aforementioned $L^2_w$-decay estimate is not an $L^1_w \to L^2_w$ estimate, as the boundary layer depends on the initial $L^2_w$-norm of $\varphi$ and, hence, the usual adjointness trick in (5.47), used to establish the $L^\infty$ decay, is not available. This complication is present even for the linear equation (5.27) when the relationship between $p$ and $u$ is "forgotten." In fact, decay in $L^\infty_w$ of $\varphi$ is not even expected in the setting in which we find ourselves. To pass from the $L^2_w$ bounds for the function $\varphi(t, x)$ to the $L^\infty$ decay for $p(t, x)$, we use time averages to find a particular intermediate time $T_g < T$, at which $\varphi$ satisfies "good" pointwise bounds in a region of interest. Those bounds, stated in Lemma 5.10, can be transferred to show that $p$ is bounded both by $e^x$ and $C/\sqrt{T}$. We can then "trap" that estimate going forward in time, from the time $T_g$ until the time $T$, by breaking $p$ up into a small mass part, which necessarily stays small due to conservation of mass and parabolic regularity theory, and a part that sits under an explicit, small super-solution. This will conclude the proof.
As the reader will surely have noticed, there is a subtle, but serious, issue in the above outline. The $L^2_w$ decay requires that the initial $L^2_w$ norm be bounded, which, as can be immediately seen from (5.49), is not true when $u(0, \cdot)$ is the Heaviside function, the function $p(0, \cdot)$ is given by (5.15), and $\rho = 1 - u$. This requires an extra step where we choose an approximate initial condition
$$u_a(0, x) = \frac{1}{1 + e^{\gamma(x - a)}}\, \mathbb{1}(x \le 0),$$
with $\gamma \in (1, 2)$, to obtain new solutions $u_a$ and $p_a$. Here, $a > 0$ is a parameter depending on the final time $T$ at which we wish to establish the upper bound. A careful analysis shows that, with an appropriate choice of $a$, the two solutions, $p$ and $p_a$, stay within $O(1/\sqrt{T})$ of each other, due to the error estimate in Lemma 5.6.
Below, we state the upper bound on the modification p a and show how to bootstrap that bound to the decay of p itself. This is done in Section 5.4.1. Then we give the proof of the upper bound on p a following the outline above, in Section 5.4.2.

The proof of Lemma 5.4

A modified initial condition
We first construct the modified solution $p_a(t, x)$. Recall that $u(t, x)$ is the solution to the Burgers-FKPP equation with the initial condition $u_{in}(x) = \mathbb{1}(x \le 0)$. For any $a > 0$ and $\gamma \in (1, 2)$, let $u_a(t, x)$ be the solution to (5.51) with the initial condition
$$u_a(0, x) = \frac{1}{1 + e^{\gamma(x - a)}}\, \mathbb{1}(x \le 0), \tag{5.52}$$
and set
$$p(t, x) = e^x u(t, x), \qquad p_a(t, x) = e^x u_a(t, x). \tag{5.53}$$
As in (5.14) and (5.27), the function $p_a$ satisfies and We have made the switch to $\partial_{t,x}$ notation to avoid the awkward double sub-script. It is easy to observe that $u_a \le 1$, and we have, by the comparison principle (notice that $e^x$ solves (5.54)): Two important quantities of interest for us are
$$\begin{aligned}
M_a(t) &= \int p_a(t, x)\,dx = M_a(0), \\
I_a(t) &= \int \frac{p_a(t, x)^2}{1 - u_a(t, x)}\,dx.
\end{aligned} \tag{5.57}$$
The last equality in the first line above follows by conservation of mass. Note that M a and I a are, respectively, the weighted L 1 and L 2 norms of the function ϕ = p a /(1 − u a ) with weight 1 − u a . Two easy computations show that M a (0) is uniformly bounded in a > 0: and I a (0) is finite for all a > 0: but is not uniformly bounded as a → +∞. Note that an analogous estimate for I(0) with u a and p a replaced by u and p, respectively, does not hold, as This is what prevents us from establishing the decay of p directly.
As we see in the sequel, another crucial feature of this altered initial condition is that u a (0, ·) is steeper than the traveling wave 1/(1+e x ). This is why we take γ > 1 in (5.52). The restriction γ < 2 comes from the upper bound on I a (0) in (5.59).
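The roles of the two restrictions on $\gamma$ and of the parameter $a$ can be seen from an explicit evaluation at $t = 0$. The sketch below assumes, consistently with the description above, that $M_a = \int p_a\,dx$ and $I_a = \int p_a^2/(1 - u_a)\,dx$, and uses the initial condition (5.52) with $a > 0$; the bounds are elementary estimates written for illustration and are not meant to reproduce (5.58)-(5.60) verbatim.

```latex
% At t = 0 and x \le 0:  u_a = 1/(1 + e^{\gamma(x-a)}),  1 - u_a = e^{\gamma(x-a)}/(1 + e^{\gamma(x-a)}).
% For x > 0 both p_a(0,x) and the integrand of I_a(0) vanish.
\begin{align*}
M_a(0) &= \int_{-\infty}^{0} \frac{e^{x}}{1 + e^{\gamma(x-a)}}\,dx \;\le\; \int_{-\infty}^{0} e^{x}\,dx = 1, \\
I_a(0) &= \int_{-\infty}^{0} \frac{e^{2x}\, e^{-\gamma(x-a)}}{1 + e^{\gamma(x-a)}}\,dx
        \;\le\; e^{\gamma a} \int_{-\infty}^{0} e^{(2-\gamma)x}\,dx = \frac{e^{\gamma a}}{2 - \gamma}, \\
I_a(0) &\ge \frac{e^{\gamma a}}{2\,(2 - \gamma)}
        \qquad \text{since } 1 + e^{\gamma(x-a)} \le 2 \text{ for } x \le 0,\ a > 0 .
\end{align*}
% For the Heaviside datum, 1 - u(0,x) = 0 while p(0,x) = e^{x} > 0 on x < 0, so the analogous I(0) is infinite.
```

The upper bound is finite precisely when $\gamma < 2$, and the two-sided bound shows that $I_a(0)$ grows like $e^{\gamma a}$ as $a \to +\infty$, which is the source of the balancing act between Lemma 5.5 and Lemma 5.6.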
The main estimate we establish on $p_a$ is:

Lemma 5.5. Let $u_a(t, x)$ be the solution to (5.51) with the initial condition (5.52) for some $a > 0$ and $1 < \gamma < 2$, and $p_a(t, x) = e^x u_a(t, x)$. There exists a universal constant $C_{5.5} > 0$ such that the following holds. Set
$$t_1(a) = C_{5.5}\, I_a(0), \tag{5.61}$$
with $I_a$ defined in (5.57). There exists $K > 0$ that does not depend on $a > 1$ so that

We postpone the proof of Lemma 5.5 until Section 5.4.2. First, we obtain a closeness estimate on $p$ and $p_a$. Its proof is succinct enough to give it immediately.
Lemma 5.6. Fix a ∈ R and γ > 0. Let u(t, x) and u a (t, x) be the solutions to (5.51) with the respective initial conditions u(0, x) = ½(x ≤ 0) and (5.52), respectively. There is a constant C > 0 that does not depend on a or γ such that p(t, x) = e x u(t, x) and p a (t, x) = e x u a (t, x) satisfy x) for all t ≥ 0 and x ∈ R. Note that h(0, ·) ≥ 0 and, due to (5.14) and (5.54), that h satisfies the parabolic equation Hence, the comparison principle implies that h ≥ 0. This finishes the lower bound in (5.63).
To conclude the upper bound, we use mass conservation for p(t, x) and p a (t, x). Indeed, We are now in a position to combine Lemma 5.5 with Lemma 5.6, in order to prove Lemma 5.4. One delicate point is that Lemma 5.6 requires us to take a large. On the other hand, I a (0) in (5.59) blows up as a → +∞. This will require a careful balancing act.
We note that we need only prove Lemma 5.4 for T sufficiently large, as the claim follows for "smaller" T by simply increasing the constant C. where the last inequality follows due to (5.67). Due to (5.70), we may apply Lemma 5.5 to find We may also use Lemma 5.6 to see that for all x ∈ R and t > 1. (5.72) Recalling the choice (5.68) of γ and a in (5.72) and using (5.71) to t = T , we deduce that finishing the proof of Lemma 5.4.

The proof of Lemma 5.5
We now prove Lemma 5.5, following the outline from the beginning of Section 5.4.
Step one: decay in a weighted $L^2$ space

Our first goal is to obtain an $L^2$-decay estimate, an analogue of (5.46) in the present situation. We will work with norms weighted by
$$\rho_a = 1-u_a, \tag{5.73}$$
and make the change of function
$$\phi_a = \frac{p_a}{\rho_a}. \tag{5.74}$$
The main goal of this step is the decay rate of $I_a$, which we state here. Recall that $u_a$ and $p_a$ are defined in Lemma 5.5.
Lemma 5.7. For $a>0$ and $\gamma\in(1,2)$, let $t_1(a)$, $M_a$, and $I_a(0)$ be as in (5.61) and (5.59). Then the bound (5.75) holds, and the bound (5.76) holds for $t\ge t_1(a)$.

Note that $I_a$ may be thought of as $\|\phi_a\|^2_{L^2(\rho_a)}$. To prove Lemma 5.7, we obtain a more general result for weighted $L^2$ decay, identifying assumptions on the weight under which decay holds. Afterwards, we show that $\rho_a$ satisfies these assumptions and apply the general result to our setting.
The first step is a dissipation inequality, inspired by the relative entropy arguments for linear equations in [17,41]. As in [17], given a weight ρ(t, x) and an advection v(t, x), we consider an operator D ρ defined by Let v(t, x) be a smooth bounded function, q(t, x) be a solution to and ρ(t, x) be a super-solution to (5.78): with initial conditions p(0, x) ≥ 0 and ρ(0, x) > 0 such that where ϕ(t, x) = q(t, x)/ρ(t, x) for all t ≥ 0 and x ∈ R. Then ϕ satisfies The main difference with [17,41] is that the function ρ(t, x) is not a solution to the advectiondiffusion equation (5.78) but a super-solution.
The second general result is an adaptation of the Nash inequality for weighted spaces. This allows us to make use of the dissipation inequality in Proposition 5.8. Then, for any θ > 0 and any smooth non-negative function ϕ(x) that is sufficiently rapidly decaying as x → +∞ and bounded as x → −∞ we havê Let us point out that the weight satisfies assumptions of Proposition 5.9, and this is an example the reader may want to keep in mind.
Let us also briefly note the connection with the standard Nash inequality. A key step in the proof of the latter is to establish (5.85) with max{1, θ 2 } replaced by θ 2 . The change here reflects the fact that the measure induced by r is finite on R − . The proofs of Proposition 5.8 and Proposition 5.9 are postponed until Section 5.5. We first apply these results to establish the L 2 decay of p a , that is, Lemma 5.7.
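The way a dissipation inequality and a Nash-type inequality combine into a $t^{-1/2}$ decay can be illustrated on a toy model. The sketch below is not the actual inequality of the paper: it assumes, with all constants set to $1$, that the weighted energy $y(t)$ obeys $y' = -\min\{1, y^3\}$, where the cap at $1$ mimics the case of large dissipation and the cubic term mimics the Nash regime. The name `model_energy` and the waiting time $2y_0$ are hypothetical choices made for this illustration.

```python
import math

def model_energy(t, y0):
    """Exact solution of the toy ODE y' = -min(1, y**3) with y(0) = y0 > 0."""
    if y0 > 1.0:
        t_star = y0 - 1.0              # time at which y first reaches 1
        if t <= t_star:
            return y0 - t              # linear phase: decay at unit speed
        y_start, elapsed = 1.0, t - t_star
    else:
        y_start, elapsed = y0, t
    # Nash phase: y' = -y**3, so y**(-2) grows linearly with slope 2
    return 1.0 / math.sqrt(y_start ** -2 + 2.0 * elapsed)

if __name__ == "__main__":
    y0 = 50.0
    for t in (2 * y0, 10 * y0, 100 * y0):
        print(t, model_energy(t, y0), 1.0 / math.sqrt(t))
```

In the toy model the linear phase lasts at most a time proportional to $y_0$, after which $y(t)\le t^{-1/2}$ independently of $y_0$. This is the mechanism suggested by the choice of a waiting time proportional to $I_a(0)$ in (5.61).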

Proof of Lemma 5.7
We first assume that the assumptions of the dissipation and Nash inequalities, that is, Proposition 5.8 and Proposition 5.9, hold in a uniform way for ρ a (t, x), and show how to conclude. Afterwards, we show how to verify those assumptions.
Assuming that we can apply Proposition 5.8 and Proposition 5.9, we proceed as follows. By Proposition 5.8, we have Moreover, recalling the definition of M a from (5.57) and that it is uniformly bounded, as in (5.58), and applying Proposition 5.9, we find, for any θ > 0, with a constant C that does not depend on t, γ, or a. There are two cases to consider: first, assume that at some t > 0 we havê Then, we get from (5.86): On the other hand, if (5.88) fails, so that |∂ x ϕ a (t, x)| 2 ρ a (t, x) dx < 1, we can use (5.87) with the constant Keeping in mind the upper bound (5.58) on M a , this leads tô . Therefore, in the second case we have  Therefore, for t ≥ t 1 (a), we have Hence, all that remains in the proof of Lemma 5.7 is to verify the assumptions of Proposition 5.8 and Proposition 5.9. We consider first the assumptions of Proposition 5.8. In the notation of Proposition 5.8, we have q = p a , v = u, and ρ = ρ a . In view of (5.55) and (5.59), the assumptions (5.78) and (5.80) are satisfied. Hence, the only assumption to check is (5.79), a somewhat miraculous property that First, using that u a satisfies (5.51), we find The initial condition for u a in (5.52) was chosen to be steeper than the traveling wave φ given by (5.22) -this is why we needed to take γ > 1 in (5.52). Using Proposition 1.1, we deduce that u a (t, x) is steeper than φ for all t > 0. Applying this property to (5.97) gives Using the explicit expression (5.22) for φ, it is straightforward to check that the right side of (5.98) is non-negative everywhere: This can, of course, be also obtained from (3.24). Hence, (5.96) is established. Next, we verify that ρ a satisfies assumptions (i) and (ii) of Proposition 5.9 uniformly for all t > 0. Assumption (i) holds automatically since ρ a is increasing in x (recall that u a is decreasing in x), with ρ a (t, −∞) = 0 and ρ a (t, +∞) = 1 for all t > 0. 
Assumption (ii) in (5.83) requires that It is useful to (implicitly) define the analogue µ a of µ for u a : Consider first the case when x < −µ a (t). As u a (0, ·) is steeper than the traveling wave, so is u a (t, ·), by Proposition 1.1, and we know from Lemma 5.3 that, if y ≤ x, then ρ a (t, y) = 1 − u a (t, y) ≤ ρ a (t, x)e y−x 1 − ρ a (t, x) + ρ a (t, x)e y−x . (5.102) This givesρ ≤ 2ρ a (t, x).

(5.103)
In the last line we used that x < −µ a (t), so that This implies (5.100) for x such that ρ a (t, x) < 1/2: On the other hand, when x ≥ −µ a (t), we have ρ a (t, x) ≥ 1/2, so that Since ρ a (t, x) and ρ a are increasing, we find, using (5.103) for y < −µ a (t) and (5.105) for y > −µ a (t): Taking the maximum of (5.104) and (5.106), we arrive at (5.100). We deduce that the function ρ a (t, x) satisfies the assumptions of the weighted Nash inequality in Proposition 5.9, for all t > 0. This concludes the proof of Lemma 5.7.
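The elementary estimate behind (5.103) can be checked numerically. The sketch assumes our reading of the computation, since the display itself is not reproduced here: one integrates the right side of (5.102) over $y\le x$, which gives $-\log(1-\rho)$ with $\rho=\rho_a(t,x)$, and then uses $-\log(1-\rho)\le 2\rho$ for $\rho\le 1/2$. The function name and the truncation length are ours.

```python
import math

def majorant_integral(rho, length=60.0, n=100_000):
    """Midpoint rule for the integral over s = y - x <= 0 of
    rho e^{s} / (1 - rho + rho e^{s}), truncated at s = -length."""
    h = length / n
    total = 0.0
    for k in range(n):
        s = -length + (k + 0.5) * h
        es = math.exp(s)
        total += rho * es / (1.0 - rho + rho * es)
    return total * h

if __name__ == "__main__":
    for rho in (0.05, 0.2, 0.5):
        print(rho, majorant_integral(rho), -math.log(1.0 - rho), 2.0 * rho)
```

The bound $2\rho$ is only available for $\rho\le 1/2$, which is why the regions $x<-\mu_a(t)$ and $x\ge-\mu_a(t)$ are treated separately.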
Step two: pointwise bounds at a particular time We now find a "good time" T g < T when ϕ a satisfies the desired bounds via a time averaging. The catch is that we do not have control over T g and, thus, a third step is required afterwards, to control the solution on the time interval T g ≤ t ≤ T .
To this end, we use time averages in order to find the "good time" T g when the (weighted) L 2 bound of ∂ x ϕ a is small. Since T /2 ≥ 2t 1 (a), Lemma 5.7 yieldŝ for all t ≥ T /2. Integrating the dissipation inequality (5.82) and using (5.109), we obtain 4 Tˆ3 As a result, there exists Heuristically, we conclude by arguing that, due to (5.110), if ϕ a is "too big" somewhere to the right of −µ a (t), then it must be "too big" on a large set. However, in this region, ρ a is bounded above and below and p a and ϕ a are comparable. As a result, p a will be "too big" on a large set, violating mass conservation. The key point here that makes the above reasoning work is that, by the definitions (5.73), (5.74), and (5.101) of ρ a , ϕ a , and µ a , we have In particular, working on x > −µ a (t) is crucial here as ρ a is potentially small to the left of −µ a (t).
To make this reasoning rigorous, we take any x 0 ≥ −µ a (T g ), and use the Newton-Leibniz formula for any y ∈ [x 0 , x 0 + √ T ] along with (5.110)-(5.111), to obtain a lower bound for ϕ a (T g , y): Using again (5.111), this yields Recalling mass conservation (5.16) and using (5.112) we arrive at Rearranging the above and using the arbitrariness of x 0 > −µ a (t), as well as (5.111), to translate this into a bound for ϕ a (t, x 0 ), finishes the proof of Lemma 5.10. .
Step three: preserving the L ∞ -smallness over [T g , T ] We are now in a position to combine the results in the first two steps to finish the proof Lemma 5.5, establishing the upper bound (5.62) on p a . Proof of Lemma 5.5. Combining (5.111), Lemma 5.10, and (5.56), we find C > 1 such that The function is a steady solution to the equation (5.54) that p a satisfies as Moreover, it obeys a uniform bound The next step is to split p a into a portion bounded above by P and a small error part. Let us write Here, the "P -portion" ψ P ≥ 0 solves with the initial condition ψ P (T g , x) = min{p a (T g , x), P (x)}, x ∈ R. (5.119) The "error part" ψ E ≥ 0 solves with the initial condition We now estimate ψ P and ψ E at time T . From (5.115) and (5.119), it is clear that and both functions solve (5.118). Hence, the comparison principle implies that ψ P (T, x) ≤ P (x) for all x ∈ R. Using this and (5.116), we find as desired.
On the other hand, (5.120) implies that the total mass of ψ E is conserved: We now bound the right hand side. First, by (5.113) and (5.114), From this and (5.120), it follows that Hence, using (5.121) and (5.56), we obtain (5.124) Invoking (5.123), we obtainˆψ Recall that T g ≤ 3T /4 and T is sufficiently large. We may, thus, apply parabolic regularity theory (recall that ψ E solves (5.120)) to conclude that, up to increasing C, we have

5.5 The weighted $L^2$ framework: the proof of Propositions 5.8 and 5.9

First, we establish the dissipation inequality in Proposition 5.8.

Proof of Proposition 5.8. Following [17], let us write an equation for $h = H(\phi)$ with a given function $H$, and not just for the cases $H(\phi)=\phi$ and $H(\phi)=\phi^2$ that we use here. A direct computation leads to (5.127). We now use (5.78) to obtain (5.128). When $H(\phi)=\phi$, this establishes (5.81).
Let us now specialize to the case h = H(ϕ) = ϕ 2 to prove (5.82). Multiplying (5.128) by ρ and integrating by parts, we find finishing the proof. Next, we prove the analogue of Nash's inequality. Due to the inhomogeneity, we are not able to use the standard Fourier-based proof. The proof is more similar to that of Carlen and Loss [15], which is based on the Poincaré inequality. A complication in our setting is that the decay of r on the left makes our space more akin to R + than R; however, we only have a mild "boundary condition" on the left in that we only know that ϕ 2 r is integrable.
Proof of Proposition 5.9. Let us first assume that ϕ is not only non-negative, bounded, and decays to zero sufficiently fast as x → +∞ but also monotonically decreasing. Fix θ ∈ R + and notice that, due to the assumptions on r, we may find L ∈ R such thatr(L) = θ. Fix ε > 0 to be chosen, and write Note that, since ϕ(x) is decreasing, Rearranging (5.130) and using (5.131), we find (5.132) Since r is increasing, we have (5.134) Next, using assumption (ii) in (5.83), together with monotonicity ofr(x) giveŝ We may then define rearrangements of functions through the "layer cake decomposition": As with the usual symmetric decreasing rearrangement, we immediately find that ϕ * is decreasing, the functions ϕ and ϕ * have super-level sets of equal measure, and, for any p > 0, We claim that an analogue of the Pólya-Szegö inequality holds: Since we have already established (5.85) for decreasing functions, the fact that (5.85) holds for a general function ϕ is an immediate consequence of (5.138) and (5.139). We now prove (5.139). In principle, it is an immediate consequence of a more general result in [50]. We present a simpler proof in our present one-dimensional setting for the convenience of the reader. We may assume without loss of generality that the function ϕ(x) is positive everywhere, smooth and takes each value finitely many times. A useful consequence is that any level set of ϕ * has at most one point.
First, we use the coarea formula to rewritê Then, we note that so that (5.140) becomesˆϕ Next, we show that both the numerator and denominator in the right hand side of (5.142) may be replaced by their analogues with ϕ * in place of ϕ, up to an inequality. First, since we conclude that (ϕ * ) −1 (t) ≤ sup ϕ −1 (t). Using that r is increasing, we conclude that On the other hand, by the coarea formula, we have that, for any t > 0 ds.
An immediate consequence is that The analogous formula for ϕ * holds. Since m r ({x : ϕ(x) > t}) = m r ({x : ϕ * (x) > t}) for all t, it follows that .

(5.144)
The last equality above is due to the fact that (ϕ * ) −1 (t) is a one point set. Including (5.143) and (5.144) in (5.142), we conclude that Reapplying the coarea formula to the rightmost quantity above yieldŝ which concludes the proof of (5.139) and, thus, of Proposition 5.9.

A lower bound on p(t, x) in the middle region
We finish this section with a lower bound on p(t, x) in an intermediate region that will be useful to us later on.
Lemma 5.11. Let u(t, x) be the solution to the Burgers-FKPP equation (1.1) with the initial condition u(0, x) = ½(x ≤ 0), and p(t, x) be given by (5.11) and (5.13). There exist k > 0, T 0 > 0 and c 0 > 0 so that we have Proof. Let us take t > 0 sufficiently large. We argue by contradiction. For ε ∈ (0, 1/10) and k > 0 to be chosen, suppose there exists We will show that this violates mass conservation (5.16): if ε and k are chosen sufficiently small. The first step is to notice that the mass to the left of (−1/2) log t is small. Indeed, due to the upper bound (5.19) We also know that the mass to the far right is small. Indeed, recall from (5.35) that there exist T 0 and N 0 so that for all t > T 0 and N > N 0 , we havê To estimate the mass of p in the middle region, we split it into two parts. When x ∈ [k √ t, N √ t], we use (5.148) together with the steepness estimate in Lemma 5.3 to find We used (5.146) in the second inequality above. It follows that On the other hand, when x ∈ [−(1/2) log t, k √ t], we use Lemma 5.4 to find Putting together (5.150), (5.151), (5.153), and (5.154) yieldŝ Taking ε and k sufficiently small, and N sufficiently large we obtain a contradiction to (5.149). The conclusion of Lemma 5.11 follows.
6 The proof of Theorem 1.1 for $\beta = 2$

We now prove Theorem 1.1 in the critical case $\beta=2$. As in the case $\beta<2$ considered in Section 4, the proof is based on the analysis in the self-similar variables, and on using the pulled nature of the problem to show that convergence on the diffusive scales implies convergence to a traveling wave on scales $x\sim O(1)$. The key difference with the situation for $\beta<2$ is that, as we have mentioned previously, the Dirichlet boundary condition at $\eta=0$ for the function $v$, introduced by the weighted Hopf-Cole transform (3.12), no longer approximately holds in the self-similar variables. Instead, the function $v$ has a positive but a priori unknown limit on the left: see Lemma 6.1 below. The bounds in Proposition 5.1 will be a crucial ingredient in establishing the correct boundary condition in the self-similar variables. After we pass to the self-similar variables, the non-zero boundary condition changes the long time behavior. This is the algebraic reason for adjusting the logarithmic shift $(3/2)\log t$ in the front position to $(1/2)\log t$. With these bounds in hand, the argument has many similarities with the case $\beta<2$, so we omit many of the details and only highlight the differences. As before, we assume without loss of generality that the initial condition is $u(0,x)=\mathbf{1}(x\le 0)$.

The weighted Hopf-Cole transform
Motivated by Proposition 5.1, we consider the moving frame $x\to x-2t+\frac12\log(t+1)$, and set
$$\tilde u(t,x) = u\big(t,\,x+2t-\tfrac12\log(t+1)\big). \tag{6.1}$$
Here, $u(t,x)$ is the solution to the Burgers-FKPP equation (1.1) in the non-shifted reference frame. We stress the difference between the function defined in (5.11) and used throughout Section 5, and the function $\tilde u(t,x)$ used in the present section. The former is $u(t,x)$ in the reference frame $x\to x-2t$, while the latter refers to the solution in the reference frame used in (6.1). In particular, the function $p(t,x)$ used throughout Section 5 and $\tilde u(t,x)$ are related by
$$p(t,x) = e^{x}u(t,x+2t) = e^{x}\,\tilde u\big(t,\,x+\tfrac12\log(t+1)\big),$$
where $u$ in the middle expression is the solution in the original frame.
The key step in the proof will again be to analyze the long time behavior of v at scales O((t + 1) γ ) for γ ∈ (0, 1/2), and this will be done through the use of self-similar variables. However, the key difference with the case β < 2 is that v does not decay to zero as x → −∞ (cf. (4.13)). This is quantified in the following: for all t ≥ T 0 and any x, (6.6) and Proof. Let us defineμ(t) byũ Hence, for x ≤ 0 we can estimate v(t, x) from above by which is (6.6). The proof for x ≥ 0 is simpler and follows directly from (6.10).
To obtain (6.7), we first use (6.11) to write for x ≤ −K:  In order to extend this lower bound on the right, we recall the lower bound in Lemma 5.11. Combined with the change of variables (6.2), it shows that we can find c 0 > 0 and k 0 > 0 so that for 0 ≤ x ≤ k 0 √ t and t ≥ T 0 we havẽ (6.14) We deduce that Finally, for x ∈ (−K, 0), we see from the monotonicity ofũ(t, x) and (6.14) thatũ(t, This proves (6.7).

Analysis in the self-similar variables
We now outline the main ingredients required to establish the long-time behavior of v. The thrust of the argument is similar to the case β < 2, so the outline will be made in reference to the ideas used in Section 4. First, apply the familiar self-similar change of variables: v(t, x) = w log(t + 1), Then, (6.5) leads to the evolution equation (1 −ũ)dy −ũ , (6.17) with the operator L defined by This is different from the operator L in (4.4) by a multiple of the identity. Considering L as an operator on H 1 (e η 2 /4 dη; R + ) augmented with Neumann boundary conditions, its spectrum consists of the eigenvalues 0, 1, 2, . . . with (unnormalized) principal eigenfunction ψ 0 (η) = e −η 2 /4 . Importantly, Lemma 3.2 shows that the right side of (6.17) is negative: Recall the major ingredients in establishing the long-time dynamics in self-similar variables when β < 2: (1) a super-solution solving a tractable equation; (2) a sub-solution solving a (potentially different) tractable equation; (3) smallness of the right hand side when η ≫ e −τ /2 ; (4) an approximate boundary condition when η ≪ −e −τ /2 . We now check that analogous ingredients are available in this case.

(6.24)
Hence, the right hand side of (6.17) is double exponentially small.
Finally, we address the approximate boundary conditions of ω at η = −e −(1/2−γ)τ . Keeping Lemma 6.1 in mind, an approximate Dirichlet boundary condition is not possible. This is, of course, part of the reason for the different shift when β = 2. Instead, we have an approximate Neumann boundary condition. First notice that, due to (6.11), we have that |1 −ũ(e τ , −e γτ )| ≤ Ce −e γτ . Hence, w satisfies an approximate Neumann boundary condition with double exponentially small error. Thus, the main ingredients to an analogous argument as in the case β < 2 are in place, with the only major difference being the change from approximate Dirichlet boundary conditions to approximate Neumann boundary conditions. The change in boundary conditions changes the principal eigenfunction of L and suggests that the long-time behavior of w should look like α ∞ e −η 2 /4 for η ≥ 0. This is confirmed by the following key estimate, which is the analogue of Lemma 4.2. As the proof follows from similar arguments as in Lemma 4.2, using the four ingredients above and our knowledge of the spectrum of L, we omit the details. Lemma 6.2. Given w solving (6.17) with initial conditions w(0, η) = 1(η ≤ 0), there exists a constant α ∞ > 0 and a function R γ such that w(τ, η) = α ∞ e −η 2 /4 + R γ (τ, η)e −η 2 /6 for η ≥ −e −(1/2−γ)τ , (6.28) for any γ ∈ (0, 1/2) and Before proceeding, we note that the role of Lemma 6.1 in the proof of Lemma 6.2 is to guarantee the positivity of α ∞ and the boundedness of w. From Lemma 6.2, we immediately obtain the longtime behavior ofũ at scales between O(1) and O( √ t).
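The change of principal eigenfunction can be checked by finite differences. The sketch assumes that $L = -\partial_\eta^2-\frac{\eta}{2}\partial_\eta-\frac12$, which is our reading of the lost display: it is consistent with the stated spectrum $0,1,2,\dots$ and with the principal eigenfunction $e^{-\eta^2/4}$, but it is not quoted from the text. Under the same assumption, the operator of Section 4 would be $L-\frac12$, with Dirichlet ground state $\eta e^{-\eta^2/4}$. All function names are ours.

```python
import math

def second_order_residual(psi, zeroth, eta, h=1e-3):
    """Central-difference value of psi'' + (eta/2) psi' + zeroth * psi at eta."""
    d1 = (psi(eta + h) - psi(eta - h)) / (2.0 * h)
    d2 = (psi(eta + h) - 2.0 * psi(eta) + psi(eta - h)) / (h * h)
    return d2 + 0.5 * eta * d1 + zeroth * psi(eta)

def gauss(eta):
    # candidate Neumann ground state (eigenvalue 0 of the assumed L)
    return math.exp(-eta * eta / 4.0)

def odd_mode(eta):
    # Dirichlet ground state of the shifted operator, relevant for beta < 2
    return eta * math.exp(-eta * eta / 4.0)

def second_even_mode(eta):
    # next Neumann mode (eigenvalue 1 of the assumed L)
    return (1.0 - eta * eta / 2.0) * math.exp(-eta * eta / 4.0)

grid = [0.05 * k for k in range(0, 121)]  # eta in [0, 6]
res_neumann = max(abs(second_order_residual(gauss, 0.5, e)) for e in grid)
res_dirichlet = max(abs(second_order_residual(odd_mode, 1.0, e)) for e in grid)
res_next = max(abs(second_order_residual(second_even_mode, 1.5, e)) for e in grid)

if __name__ == "__main__":
    print(res_neumann, res_dirichlet, res_next)
```

All three residuals vanish up to discretization error. The Gaussian has zero slope at $\eta=0$ and does not vanish there, while $\eta e^{-\eta^2/4}$ vanishes at $\eta=0$, which matches the switch from approximate Dirichlet to approximate Neumann boundary conditions.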
7 Convergence to pushed fronts for $\beta > 2$

In this section, we consider convergence to the minimal speed traveling wave in the case $\beta>2$.
As we have mentioned in the introduction, the proof is quite standard and follows the classical approach of [47,48,49] for the convergence to a traveling wave in the pushed front regime. Let us recall that for all $\beta>0$ there is a traveling wave solution to the Burgers-FKPP equation (1.1) of the form
$$\varphi_\beta(x) = \frac{1}{1+e^{\beta x/2}} \tag{7.1}$$
that moves with the speed
$$c_*(\beta) = \frac{\beta}{2}+\frac{2}{\beta}. \tag{7.2}$$
A key point is that for $\beta\ge 2$, the speed $c_*(\beta)$ given by (7.2) also happens to be the minimal front speed. In other words, for $\beta\ge 2$ the minimal speed wave profile is explicit and given by (7.1).
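The explicit wave can be verified numerically. The sketch assumes that (1.1) reads $u_t+\beta uu_x=u_{xx}+u-u^2$, which is the form used in the computation of $g_t$ in Section 8, so that a profile $U(x-ct)$ must satisfy $U''+(c-\beta U)U'+U(1-U)=0$. The function names and the finite-difference step are ours.

```python
import math

def wave(beta, x):
    """Explicit profile 1 / (1 + exp(beta x / 2)) from (7.1)."""
    return 1.0 / (1.0 + math.exp(0.5 * beta * x))

def wave_residual(beta, x, h=1e-3):
    """Central-difference residual of U'' + (c - beta U) U' + U (1 - U)
    with the speed c = beta/2 + 2/beta."""
    c = beta / 2.0 + 2.0 / beta
    u = wave(beta, x)
    d1 = (wave(beta, x + h) - wave(beta, x - h)) / (2.0 * h)
    d2 = (wave(beta, x + h) - 2.0 * u + wave(beta, x - h)) / (h * h)
    return d2 + (c - beta * u) * d1 + u * (1.0 - u)

def max_residual(beta):
    xs = [-10.0 + 0.1 * k for k in range(201)]
    return max(abs(wave_residual(beta, x)) for x in xs)

if __name__ == "__main__":
    for beta in (1.0, 2.0, 3.0, 4.0):
        print(beta, max_residual(beta))
```

The residual is at the level of the discretization error for every $\beta>0$ tried, including $\beta<2$, where the profile is still a traveling wave but no longer the one of minimal speed.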
Another property that will be crucial for the analysis is that for β > 2 we have According to the criterion of [24], this puts the traveling front (7.1) into the category of pushed fronts, unlike the fronts for β ≤ 2, for which (7.3) does not hold. Accordingly, the proof of the large time convergence of the solutions to the initial value problem for (1.1) with β > 2 consists of three steps that are common in such results for pushed fronts: compactness, local stability and quasi-convergence.

Compactness
As the first step, we show that the solution u(t, x) to (1.1) with β > 2 can be trapped between an explicitly constructed super-solutionū and a sub-solution u. Each of them will converge to a separate shift of the traveling front, exponentially fast in time. Here, we follow the construction in [47]. We denote by c * = c * (β), as given by (7.2), and drop the subscript β in the notation for the traveling wave profile φ β (x) given by (7.1). Consider the moving frame z = x − c * t, setting u(t, z) = u(t, z + c * t). (7.4) This function satisfies with the initial condition u(0, z) = u in (z). The sub-and super-solutions in the moving frame are described by the following lemma.
Proof. We only prove the claim for v. We drop the subscripts of q 1,2 and ξ 1,2 and superscripts of q (1,2) 0 , and z (1,2) 0 in order to simplify the notation. Let us insert the ansatz for v(t, z) given by the left side of (7.6) into the desired inequality that needs to hold for v(t, x) to be a sub-solution to (1.1). This gives Here, we have set We fix a sufficiently small δ > 0, and consider (7.10) in three regions of z ∈ R separately.
The middle region: There exists α δ > 0, which depends on δ > 0, so that for all z in the middle region we have We need to consider separately the points where z ≥ z 0 , or z < z 0 , as this changes the definition of q(t, z). If z > z 0 , then q satisfies (7.15), and we use (7.23) to obtain, using positivity of ξ ′ (t) once again, Due to the assumption on λ, λβ > 1, which implies that q 2 − λβq 2 ≤ 0. It follows that C does not depend on q 0 . On the other hand, if z ≤ z 0 , then q t = −µq, q z = 0, (7.25) and, as long as we choose q 0 < δ, we get, after another application of (7.18): after possibly increasing K depending only on δ, which, in turn, depends on D 0 and λ. The far left region: Arguing as in the far right region, we see that, up to further decreasing δ depending on D 0 , we have z + z 0 ≤ 0. Therefore, q(t) satisfies (7.25), and we have The explicit form (7.1) of the profile φ(x) as x → −∞ implies that in the far left region we have As we also have F ′ (1) = −1, we can ensure that

Local stability
Lemma 7.1 implies a certain compactness for the solution v(t, z) to (7.5) that eventually will lead to the nonlinear stability result. To formulate it, we need to introduce the weighted Banach space where η(z) = min{1, exp(−λz)}. (7.32) We also introduce the norm v λ,1 = v λ + v z λ (7.33) for a later usage. Let us define the ω-limit set of u in ∈ B λ with respect to the evolution (7.5) as By the standard parabolic regularity theory (see, e.g., [35, Chapter IV]), we know that the "orbit" {v(t, ·) : t ≥ 1} is relatively compact in C 2 loc (R). Thus, the set ω(u in ) is nonempty for a given initial condition u in ∈ B λ . Proposition 2.2 implies that each element of ω(u in ) is either a constant 0 or 1, or a shift of the traveling wave φ(x). Indeed, v is simply u in the moving frame c * t. Hence, if m β (t n ) − c * t n → ∞, then Proposition 2.2 implies that v(t n , ·) → 1 as the front m β (t n ) is "far ahead." Similarly, if m β (t n ) − c * t n → −∞, then v(t n , ·) → 0. Finally, if m β (t n ) − c * t n converges to a constant, then Proposition 2.2 implies that v(t n , ·) converges to (a shift of) the traveling wave.
Lemma 7.1 rules out the possibility of convergence to a constant. Our goal is to prove that it consists of exactly one element, and that the solution converges exponentially fast to that particular shift of a traveling wave. Lemma 7.1 implies the following local stability result.

Quasi-convergence and exponential stability
Now, convergence in shape in Proposition 2.2 and the steepness comparison in Proposition 1.1, together with Lemma 7.2 imply the following quasi-convergence property.
Corollary 7.3. Fix λ ∈ (2/β, β/2). Let v(t, x) be the solution to (7.5) with the initial condition v 0 (x) = 1(x ≤ 0). Then there exists a sequence t n → +∞ and s 0 ∈ R so that To improve the result of Corollary 7.3 to exponential convergence, we will use the method of [48,49], which considers convergence to traveling waves for equations of the form In the Burgers-FKPP case, we have If one looks for solutions that are perturbations of a traveling wave, that is then in the moving coordinate z = x − c * t, the equation for w can be decomposed as Here, the linearized operator is The remainder R is a nonlinear operator whose Fréchet derivative vanishes at w = 0.
The main result of [49] is that a local stability result, as we have in Corollary 7.3, implies exponential convergence to a traveling wave, as long as the operator L satisfies certain spectral assumptions that we will now recall and specialize to the Burgers-FKPP equation. Let us put the linearized operator in the form and set To remove the drift term in the operator L in (7.39), Sattinger introduces the operator so that In our case, with φ(x) given explicitly by (7.1), and c * by (7.2), we have the left and right limits We see that p ± = lim z→±∞ p(z) are given by as β > 2, and As both p + < 0 and p − < 0, the operator M is stable both as x → +∞ and x → −∞, and Theorems 1 and 2 in [49] imply that convergence in Corollary 7.3 is actually exponential in time.

An informal derivation of the higher corrections
We explain in this section how the non-rigorous but extremely interesting methodology of [8] can be used to predict the higher order corrections to the logarithmic shift in the front position. This strategy was applied in [8] to the classical Fisher-KPP equation, and the first two extra terms in the expansion were rigorously confirmed in [27,45], leading to the long-time front position asymptotics (8.1), a significant refinement of Theorem 1.1. The terms that appear in (8.1), except for $x_\infty$, do not depend on the initial conditions. Moreover, they are expected to be universal for a large class of Fisher-KPP type problems. That is, this expansion has been shown to hold, with exactly the same coefficients, for all equations of the form (8.2) with a nonlinearity $f(u)$ of the Fisher-KPP type, normalized so that $f'(0)=1$, as long as $f\in C^{1,\delta}$ near $0$ (see [9] for the treatment of the less regular case). An analysis similar to what we do in this section for $\beta=2$ would show that (8.1) holds with exactly the same coefficients for the Burgers-FKPP equation (1.1) for all $\beta<2$, confirming the universality prediction of [8]. We omit the details. The goal of this section is to show that when $\beta=2$ the coefficients of this expansion change, but not the form of the individual terms:
$$m(t) = 2t-\frac12\log(t+1)-x_\infty-\frac{\sqrt{\pi}}{2\sqrt{t+1}}+\frac{1-\log 2}{4}\,\frac{\log(t+1)}{t+1}+o\Big(\frac{\log t}{t}\Big). \tag{8.3}$$
It is tempting to conjecture that the expansion (8.3) is also universal for problems that combine the pulled nature with traveling waves that decay as $e^{-x}$ rather than with the Fisher-KPP asymptotics $xe^{-x}$. This regime is referred to as Case (II) in Chapter 2 of [38] for reaction-diffusion equations of the form (8.2). We begin with the following observation, inspired by [8], that we state for all $\beta\le 2$. Let $u(t,x)$ be the solution to (1.1) with an initial condition $u_{\rm in}(x)$ and set
$$\phi(t,r) = \int_{\mathbb R}u^2(t,z)e^{rz}\,dz,\qquad \Phi_1(r) = (1-r\beta/2)\int_0^\infty\phi(t,r)e^{-(r^2+1)t}\,dt, \tag{8.4}$$
and
$$\Phi(r) = \int_{\mathbb R}u_{\rm in}(x)e^{rx}\,dx. \tag{8.5}$$
Note that the function $\Phi(r)$ is smooth for all $r>0$, as long as the initial condition $u_{\rm in}(x)$ is compactly supported on the right. On the other hand, as we will see below, the function $\Phi_1(r)$ may potentially blow up as $r\to1^-$. This possibility is removed by the identity (8.7) of Proposition 8.1.

Proof of Proposition 8.1. The proof is a modification of the argument in [8]. Without loss of generality we assume that $L_0=0$. First, we recall that, as we have shown in (4.9)-(4.10), for all $\beta\le2$, we have the upper bound
$$u(t,x)\le\bar u(t,x) := \frac{1}{1+e^{x-2t}},\quad\text{for all } x\in\mathbb R. \tag{8.8}$$
It follows that $\phi(t,r)$ is defined and differentiable in $r$ for all $r<2$. This also shows that for all $r\in(0,1)$ we have
$$g(t,r) := \int_{\mathbb R}u(t,x)e^{rx}\,dx\le\int_{\mathbb R}\frac{e^{rx}}{1+e^{x-2t}}\,dx = I_0(r)e^{2rt}, \tag{8.9}$$
with
$$I_0(r) = \int_{\mathbb R}\frac{e^{rx}}{1+e^{x}}\,dx<\infty,$$
since $r\in(0,1)$. Next, differentiating $g(t,r)$ in $t$, we obtain
$$g_t(t,r) = \int_{\mathbb R}u_t(t,x)e^{rx}\,dx = \int_{\mathbb R}\Big(-\frac{\beta}{2}(u^2)_x+u_{xx}+u-u^2\Big)e^{rx}\,dx = (1+r^2)g(t,r)-(1-r\beta/2)\phi(t,r).$$
As 2r < 1 + r 2 , we may use (8.9) to pass to the limit t → +∞ and obtain (8.7). An immediate consequence of Proposition 8.1 is that the function Φ 1 (r) remains regular and even infinitely differentiable, with bounded derivatives as r → 1 − . A surprising discovery of [8] is that for the classical Fisher-KPP equation one can perform a careful analysis of that limit in terms of the front location m(t) and this regularity alone can be used to obtain the asymptotics (8.1) of m(t) as t → +∞. As noted above, this approach can be applied nearly verbatim for β < 2, so we only consider β = 2 below.
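The differential identity for $g(t,r)$ used in the proof can be tested on an explicit solution. The sketch below is only a consistency check under stated assumptions: it takes the traveling wave $u(t,x)=\varphi_\beta(x-c_*(\beta)t)$ from (7.1)-(7.2), for which $g(t,r)=e^{rc_*t}G$ and $\phi(t,r)=e^{rc_*t}Q$ with time-independent moments $G$ and $Q$, so that the identity reduces to $rc_*G=(1+r^2)G-(1-r\beta/2)Q$. The moments are finite only for $0<r<\beta/2$. The function names, the truncation of the integrals, and the choice of test values are ours.

```python
import math

def weighted_moments(beta, r, half_width=100.0, n=200_000):
    """Midpoint-rule values of G = int U e^{rx} dx and Q = int U^2 e^{rx} dx
    for U(x) = 1 / (1 + e^{beta x / 2}); requires 0 < r < beta / 2."""
    h = 2.0 * half_width / n
    g_sum = 0.0
    q_sum = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        u = 1.0 / (1.0 + math.exp(0.5 * beta * x))
        w = math.exp(r * x)
        g_sum += u * w
        q_sum += u * u * w
    return g_sum * h, q_sum * h

def identity_defect(beta, r):
    """Relative defect of r c G = (1 + r^2) G - (1 - r beta / 2) Q."""
    c = beta / 2.0 + 2.0 / beta
    G, Q = weighted_moments(beta, r)
    lhs = r * c * G
    rhs = (1.0 + r * r) * G - (1.0 - r * beta / 2.0) * Q
    return abs(lhs - rhs) / G

if __name__ == "__main__":
    for beta, r in ((1.0, 0.3), (2.0, 0.5), (3.0, 0.7)):
        print(beta, r, identity_defect(beta, r))
```

For $\beta=2$ the factor $1-r\beta/2$ vanishes as $r\to1^-$ while the moments $G$ and $Q$ blow up, which is the competition that the identity resolves.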

Assumptions on the rate of convergence
Let us now formalize the assumptions that go into the derivation of (8.1) and (8.3). We know from Proposition 2.2 that there is a reference frame $m(t)$ such that
$$u(t,x+m(t))\to\varphi_2(x), \tag{8.11}$$
and from Theorem 1.1 that, when $\beta=2$,
$$m(t) = 2t-\gamma(t),\qquad \gamma(t) = a\log(t+1)-\alpha(t),\qquad \alpha(t) = x_\infty+o(1)\ \text{ as } t\to+\infty. \tag{8.12}$$
Of course, we already know that $a=1/2$ when $\beta=2$, but we leave this coefficient undetermined for now, to show how its value can be discovered by the arguments below. One assumption made in [8], and later proved in [27], is that the analogue of (8.11) with $\beta=0$ holds at the rate $O(1/t)$. For the Burgers-FKPP equation this translates into the assumption that
$$u(t,x+m(t)) = \varphi_2(x)+\frac{1}{t}\,\eta(t,x),\quad\text{as } t\to+\infty, \tag{8.13}$$
with a function $\eta(t,x)$ that decays rapidly in space (but not necessarily in time). A result of [27] for $\beta=0$ is that $\eta(t,x)$ has a positive limit as $t\to+\infty$, and the rate $O(t^{-1})$ cannot be improved. We stress that (8.13) is an assumption and not a rigorous claim, even though we believe that it holds, as it does for the classical Fisher-KPP equation.
In order to avoid additional technicalities in an argument that is not rigorous (because assumption (8.13) has not been justified), we use (8.13) to replace the rigorous claim that $\Phi_1(1-\varepsilon)$ remains regular as $\varepsilon\to0$ by the assumption that
$$I(\varepsilon)\ \text{remains strictly positive and regular as } \varepsilon\to0, \tag{8.23}$$
neglecting the error term $E(\varepsilon)$. Note that the function φ(r) is regular near $r=1$, as can be seen immediately from its definition (8.17). The positivity of $I(\varepsilon)$ holds because we know from (8.7) that $\Phi_1(1-\varepsilon)$ is finite and not small for any $\varepsilon>0$, and so is φ(1). The surprising fact is that (8.23) by itself leads to the asymptotic expansion (8.3) for $m(t)$. We first explain how we can find from (8.23) that the coefficient $a$ that appears in (8.12) equals $1/2$, as expected. Using the expression for $\gamma(t)$ in (8.12), we write
$$I(\varepsilon) = \varepsilon\int_0^\infty\frac{e^{-\varepsilon^2t+(1-\varepsilon)\alpha(t)}}{(t+1)^{a(1-\varepsilon)}}\,dt = \cdots,$$
where the $o(1)$ is understood in the limit $\varepsilon\to0$. We replaced $\alpha(s/\varepsilon^2)$ by its limit $x_\infty$ in the second-to-last step above. If $a>1$, then the non-integrability of $r^{-a}$ near the origin will cause the integral to grow like $\varepsilon^{-2(a-1)}$. In this case, we deduce that $I(\varepsilon)\sim\varepsilon$, which cannot happen as $I(\varepsilon)$ is uniformly positive by (8.23). A similar argument applies to the case $a=1$. Hence, $a<1$. In this case, the integral is finite, so that $I(\varepsilon)\sim\varepsilon^{2a-1}$. Since $I$ is positive and bounded (again, by (8.23)), the only choice is $a=1/2$, as in Theorem 1.1.
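The scaling $I(\varepsilon)\sim\varepsilon^{2a-1}$ for $a<1$ can be observed numerically on a simplified model. The sketch below makes two simplifications of our own: it sets $\alpha\equiv x_\infty=0$ and replaces the exponent $a(1-\varepsilon)$ by $a$, so that $I(\varepsilon)=\varepsilon\int_0^\infty e^{-\varepsilon^2t}(t+1)^{-a}\,dt$. The substitutions $t=s/\varepsilon^2$ and $s=y^4$ are used only to obtain a bounded integrand, and the quadrature is intended for $a\le3/4$. The function names are hypothetical.

```python
import math

def model_I(eps, a, y_max=3.0, n=20_000):
    """eps * int_0^inf e^{-eps^2 t} (t + 1)^{-a} dt for 0 < a <= 3/4.

    With t = s / eps^2 and s = y^4 this equals
    eps^(2a - 1) * int_0^inf e^{-y^4} (y^4 + eps^2)^(-a) 4 y^3 dy,
    which is evaluated by the midpoint rule on [0, y_max].
    """
    h = y_max / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        s = y ** 4
        total += math.exp(-s) * (s + eps * eps) ** (-a) * 4.0 * y ** 3
    return eps ** (2.0 * a - 1.0) * total * h

def scaling_exponent(a, eps1=1e-3, eps2=2.5e-4):
    """Observed exponent k in I(eps) ~ eps^k between eps2 and eps1."""
    return math.log(model_I(eps1, a) / model_I(eps2, a)) / math.log(eps1 / eps2)

if __name__ == "__main__":
    for a in (0.25, 0.5, 0.75):
        print(a, scaling_exponent(a), 2.0 * a - 1.0)
```

Only $a=1/2$ gives an exponent close to zero, that is, a limit of $I(\varepsilon)$ that is both positive and finite. In this simplified model the limit is $\Gamma(1/2)=\sqrt{\pi}$.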

A Traveling waves for the Burgers-FKPP equation
In this appendix, we recall some basic facts on the Burgers-FKPP traveling waves. Most of them can be found in Section 13.4 of [42], at least on a formal level, and also in [53] for $\beta < 0$. We will use the notation $f(u) = u(1-u)$.

Existence of the traveling waves
We first state the result on the existence of the traveling waves. A traveling wave is a heteroclinic orbit of (A.4) that goes from $E_2$ to $E_1$. The eigenvalues of the linearization of (A.4) around $E_1$, given by (A.6), are both real and negative if $c \ge 2$. Thus, the point $E_1$ is a stable equilibrium when $c \ge 2$.
On the other hand, the eigenvalues of the linearization of (A.2) around $E_2$ are real and have opposite signs, so $E_2$ is a saddle point.
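For orientation, the two sets of eigenvalues can be recovered directly from the traveling wave equation. The sketch below rests on assumptions that are not spelled out in this appendix: the Burgers-FKPP equation is taken in the form $u_t + \beta u u_x = u_{xx} + u(1-u)$, the wave is $u(t,x) = U(x - ct)$, and the first-order variables are $(U, V)$ with $V = U'$, so that $E_1 = (0,0)$ and $E_2 = (1,0)$. The symbols $\lambda_\pm$ and $\mu_\pm$ are ours. An equivalent choice of variables, such as $V = -U'$, changes the matrices but not the eigenvalues.
\[
U'' + (c - \beta U)\,U' + U(1-U) = 0,
\qquad
\frac{d}{dz}\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} V \\ -(c - \beta U)V - U(1-U) \end{pmatrix}.
\]
\[
\text{At } E_1:\quad \begin{pmatrix} 0 & 1 \\ -1 & -c \end{pmatrix},
\qquad \lambda_\pm = \frac{-c \pm \sqrt{c^2 - 4}}{2},
\]
\[
\text{At } E_2:\quad \begin{pmatrix} 0 & 1 \\ 1 & \beta - c \end{pmatrix},
\qquad \mu_\pm = \frac{(\beta - c) \pm \sqrt{(\beta - c)^2 + 4}}{2}.
\]
In this form, $\lambda_\pm$ are real exactly when $c \ge 2$, and then both are negative because $\lambda_+ \lambda_- = 1$ and $\lambda_+ + \lambda_- = -c < 0$. At $E_2$, the product $\mu_+ \mu_- = -1$ forces one positive and one negative eigenvalue for every $\beta$ and $c$, which is the saddle structure described above.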

The asymptotic profile of the traveling waves
We briefly summarize the asymptotic profiles of the traveling wave solutions. The details are essentially identical to those for the Fisher-KPP equation; see, for instance, [38] for a detailed analysis. When $\beta < 2$, the minimal speed traveling wave has the following asymptotics on the right:
\[ U(z) = (Az + B)e^{-z} + O(e^{-(1+\delta)z}), \quad \text{as } z \to +\infty, \tag{A.24} \]
with $A > 0$ and $\delta > 0$. When $\beta \ge 2$, the critical front has the explicit form
\[ U(z) = \frac{1}{1 + e^{\beta z/2}}. \tag{A.25} \]
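As a consistency check on the explicit profile, one can substitute it into the traveling wave equation. The computation below is a sketch under the assumption that the wave equation reads $U'' + (c - \beta U)U' + U(1-U) = 0$, which corresponds to the Burgers-FKPP equation written as $u_t + \beta u u_x = u_{xx} + u(1-u)$; this form is not displayed in this appendix. For $U(z) = (1 + e^{\beta z/2})^{-1}$ we have
\[
U' = -\frac{\beta}{2}\,U(1-U),
\qquad
U'' = \frac{\beta^2}{4}\,(1 - 2U)\,U(1-U),
\]
so that
\[
U'' + (c - \beta U)\,U' + U(1-U) = U(1-U)\Big[\frac{\beta^2}{4} - \frac{c\beta}{2} + 1\Big].
\]
The bracket vanishes exactly when $c = \beta/2 + 2/\beta$, which agrees with the speed $c_*(\beta)$ quoted for $\beta > 2$ and equals $2$ at $\beta = 2$. Note also that this profile decays like $e^{-\beta z/2}$ as $z \to +\infty$, so at $\beta = 2$ it decays like $e^{-z}$ without the linear prefactor $Az$ present in the case $\beta < 2$.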

Steepness comparison of the waves
Finally, we present a steepness comparison for the traveling waves that is used to show convergence of the solution in shape to the minimal speed traveling wave.
Lemma A.2. Fix any $\beta \in \mathbb{R}$. Let $U_1$ and $U_2$ be two traveling waves with speeds $c_1$ and $c_2$, respectively. If $c_1 \le c_2$, then $U_1$ is steeper than $U_2$.
Proof. The proof follows an approach from [20]. If we define $\bar E$ … it is enough to show that $\bar E$ … It follows that the function $F(z) = (\bar E_1(z) - \bar E_2(z)) \exp\big(-\int^z \cdots\big)$ … Observe that $\bar E_1 - \bar E_2 \to 0$ as $z \to 1$. It follows that $F(z) \to 0$, as well. On the other hand, since $c_1 \le c_2$, we see from (A.30) that $F(z)$ is decreasing. It follows that $F(z) \le 0$ for all $z$, which implies (A.27) and concludes the proof.