On the identification of a nonlinear term in a reaction-diffusion equation

Reaction-diffusion equations are one of the most common partial differential equations used to model physical phenomenon. They arise as the combination of two physical processes: a driving force $f(u)$ that depends on the state variable $u$ and a diffusive mechanism that spreads this effect over a spatial domain. The canonical form is $u_t - \triangle u = f(u)$. Application areas include chemical processes, heat flow models and population dynamics. The direct or forwards problem for such equations is now very well-developed and understood. However, our interest lies in the inverse problem of recovering the reaction term $f(u)$ not just at the level of determining a few parameters in a known functional form, but recovering the complete functional form itself. To achieve this we set up the usual paradigm for the parabolic equation where $u$ is subject to both given initial and boundary data, then prescribe overposed data consisting of the solution at a later time $T$. For example, in the case of a population model this amounts to census data at a fixed time. Our approach will be two-fold.First we will transform the inverse problem into an equivalent nonlinear mapping from which we seek a fixed point. We will be able to prove important features of this map such as a self-mapping property and give conditions under which it is contractive. Second, we consider the direct map from $f$ through the partial differential operator to the overposed data. We will investigate Newton schemes for this case. In recent decades various anomalous processes have been used to generalize classical Brownian diffusion. Amongst the most popular is one that replaces the usual time derivative by a fractional one of order $\alpha \leq 1$. We will also include this model in our analysis. The final section of the paper shows numerical reconstructions that demonstrate the viability of the suggested approaches.

Classical, Brownian motion diffusion is not the only version, and in recent decades various anomalous processes have been used to generalize this case. Amongst the most popular is one that replaces the standard time derivative by a subdiffusion process based on a fractional derivative of order α 1. We will also include this model in our analysis. The final section of the paper will show numerical reconstructions that demonstrate the viability of the suggested approaches. This will also include the dependence of the inverse problem on both T and α.
Keywords: reaction-diffusion equation, fractional diffusion, parameter identification, regularization (Some figures may appear in colour only in the online journal)

Introduction
Before the thorough mathematical theory of existence, uniqueness and regularity results for reaction-diffusion equations were developed during the 1960s, there was extensive literature on applications dating from the 1930s. This includes the work of Fisher in considering the Verhulst logistic equation together with a spatial diffusion, u t − ku xx = au (1 − u) and that of Kolmogorov, Petrovskii and Piskunov in similar models now collectively referred to as the Fisher-KPP theory; more complex polynomial terms occurring in combustion theory due to Zeldovich and Frank-Kamenetskii. The class of reaction-diffusion equations describe the behaviour of a wide range of chemical systems where diffusion of material competes with production by a chemical reaction. They also describe systems where heat is produced depending on the local temperature and diffuses away by some mechanism. These systems can also be based on particles, as in the nuclear diffusion equation. For general references here we note the books [2,9,22] as well as chapter [19]. These equations and their generalization can be described as follows.
Let Ω be a bounded, simply connected region in R d with a smooth boundary ∂Ω, and let L be a uniformly elliptic operator of second order with L ∞ coefficients.
and subject to the initial and boundary conditions In (1) the standard, direct problem is to know both the diffusion operator L, the nonhomogeneous linear term r and the reaction term f (u). However, even in the earliest applications it was appreciated that the diffusion term −k u might have the diffusion coefficient k as an unknown, and while the specific form of f (u) was assumed known from the underlying model, specific parameters may not be. In our case we assume that f is unknown (save that it be a function of u alone). Thus, we are interested in the inverse problem of recovering f from additional data that would be overposed if in fact f were known. There has been previous work along such lines. For example, in a series of papers, [23][24][25], Pilant and Rundell investigated the recovery of f from overposed data consisting of a time trace of the solution u at a given point x 0 ∈ ∂Ω, namely u(x 0 ,t) for all t > 0. See also [6]. We will take a very different situation here by assuming that one is able to measure in the spatial direction by taking a snapshot at a fixed later time T and so our overposed data will be u(x, T) = g(x).
We also note that in the case of such data, if the reaction term is of the form q(x) f (u) where q is unknown but the actual form of the nonlinearity f (u) is known, then it is possible to recover the spatial component in a unique way [17].
Exchanging the time derivative in (1) for one of fractional order adds further complexity but is also of considerable physical interest as we are modifying in a fundamental way the diffusion process (Fick's law) to one of a so-called anomalous diffusion mechanism that would lead to a non-local, time-fractional equation.
and is subject to the same initial and boundary conditions (2). Here we have also included the possibility of an additional (known) inhomogeneity, which can be used to drive the evolution process. By D α t we mean the Djrbashian-Caputo derivative of order α which is defined to be (5) and since this is a non-local operator, the designation of the starting point t = a is essential; in our case we shall take a = 0. Of course, there are other possibilities for anomalous diffusion models, but the one chosen is both standard in most applications and has an existing rich mathematical theory. Examples of the above include applications in many of the areas in which traditional reaction diffusion equations have featured over the last 100 years and form a natural generalization. The central feature of a reaction-diffusion model is the interplay between the two central processes that are in some sense in competition. One of the main themes of this paper is to investigate the role the different diffusion models play in our ability to recover the reaction term f (u).
Briefly, the outline of the paper is as follows. In the next section we collect crucial main existence and regularity results for the direct problem. These are well known for the parabolic case but we require an addition to the existing literature for the case of (4). Sections 3 and 4 are devoted to two iterative approaches for numerically solving the inverse problem, a fixed point scheme and a Newton-type method. While these methods are formulated in the general, possibly higher-dimensional setting, the analysis will be restricted to the spatially one-dimensional case, due to the need for certain Sobolev embeddings. In the last section we show some numerical examples illustrating the above and use this to bring out certain features of the problem. In particular we investigate the difference between the classical and fractional diffusion operators and the role that the value of the final time T plays.

The direct problem
In this section we provide the results and tools for the direct problem, i.e. solution of the initial boundary value problem (2), (4), which will be needed later on in the reconstruction methods. These are, first of all, a representation of the solution to the inhomogeneous linear problem by means of Mittag-Leffler functions as well as eigenvalues and eigenfunctions of the operator −L. Moreover, versions of the familiar Gronwall's inequality for ordinary differential equations in function space are required for the fractional case; we develop these below, first for the linear case, then later for the semilinear. These will then be used to provide estimates needed for the semilinear fractional operator.

The solution representation and the Mittag-Leffler function
Throughout this paper we will assume that L = ij ∂ xi (a ij ∂ xj ) + c is a symmetric uniformly elliptic operator with coefficients a ij ∈ W 1,∞ (Ω), b i , c ∈ L ∞ (Ω), and that Ω is a bounded and convex or C 1,1 domain.
We define an operator A in L 2 (Ω) by where L is equipped with impedance boundary conditions, with its domain D(A) ⊆ H 2 (Ω). Then the fractional power A σ is defined for any σ ∈ R. Since L is a symmetric uniformly elliptic operator, the spectrum of A is entirely composed of eigenvalues and counting according to the multiplicities, we can set 0 < λ 1 λ 2 . . .. By ϕ j ∈ H 2 (Ω), we denote the L 2 (Ω) orthonormal eigenfunctions corresponding to λ j . Next, we introduce a space Ḣ s (Ω) bẏ which is a Hilbert space with the norm v 2Ḣ ) . Henceforth, we set Ḣ −s (Ω) = (Ḣ s (Ω)) , which consists of bounded linear functionals on Ḣ s (Ω). The solution u to the linear problem can be represented by [13] u( Here E α,β (z) is the Mittag-Leffler function [21] defined by This is an entire function of order 1/α and type 1. For the general properties of this function we refer to [20,27]. The definitive references are due to Djrbashian [3][4][5], although not so immediately accessible. See also the tutorial on inverse problems for fractional operators [13]. In the case α = 1, E α,1 (z) = E α,α (z) = e z and the model (7) as well its solution formula (8) recover the standard parabolic equation. For our purposes, and in fact for a large range of inverse problems involving a subdiffusion process, one of the key features of the Mittag-Leffler function is its decay for large, real and negative arguments. The asymptotic behaviour as z → ∞ in various sectors of the complex plane C was first derived by Djrbashian [3], and subsequently refined by many over the years since.
The following useful estimate is a direct corollary of lemma 2.1.
Corollary 2.1. Let 0 < α < 2, β ∈ R, and πα/2 < µ < min(π, πα). Then the following estimates hold The solution representation (8) can be succinctly rewritten as where the solution operators E and E are defined by, respectively and Since

Existence, uniqueness and regularity
The differences between the parabolic (α = 1) and fractional (α < 1) cases is substantial in terms of the regularity of the direct solution even in the homogeneous case r = 0. For the parabolic problem, if u 0 ∈ L 2 (Ω) then u(t) is infinitely differentiable in t and so we have an infinite pick up in regularity of the solution. For the fractional problem, this is not the case; one can only recover at most two weak derivatives; see [26]. This difference is due to the exponential decay of e −λjt , in the parabolic case but the only linear decay of E α,1 (−λ j t α ) for 0 < α < 1.
There is also a distinction for the case of a linear nonhomogeneous term as shown in [26] and lemma 2.3 below, which we will use to obtain a corresponding result for the nonlinear situation (4). As in the parabolic case, the main tool for this is Gronwall's inequality; we will need a version of this for the fractional operator case. Now we introduce the concept of weak solution for problem (7).
In the homogeneous case, existence and uniqueness of a solution to (7) can be concluded from (12), and the following stability estimate on the solution operator E(t) defined in (13).

Lemma 2.2.
For the operator E(t) defined in (13), we have, for any t > 0, where 0 q 2 and q p q + 2, and the constant c depends only on α and p − q.
Proof. By the definition of the operator E(t), there holds By corollary 2.1, we have where the constant c depends only on α. Consequently where the last inequality follows from the choice of p and q such that 0 p − q 2 and Young's inequality ab 1 P a P + P−1 P b Remark 2.1. The choice p q + 2 in lemma 2.2 indicates that the operator E(t) has at best a smoothing property of order 2 in space, which contrasts sharply with the classical parabolic case, for which the following estimate holds [29] for q 0, and any p q. This is reflected in the exponential decay of e −λjt , instead of the linear decay E α,1 (−λ j t α ) for 0 < α < 1.

Lemma 2.3.
For the operator Ē (t) defined in (14), there holds, for any t > 0, where 0 q 2 and q p q + 2, and the constant c depends only on α and p − q.
Gronwall's inequalities are a basic tool for differential equations; that shown here is the classical integral formulation that does not require interior differentiability of the function u(t). [a, b] while α is integrable on every subinterval of (a, b). If β(t) 0 and if

Theorem 2.1. Let β and u be real-valued continuous functions defined on the interval
If, in addition, the function α is non-decreasing, then We will need the corresponding results for fractional-order equations.
is a nonnegative, nondecreasing continuous function on (0, T). If u(t) is nonnegative and locally integrable on (0, T) such that In addition, if a(t) is nondecreasing on [0, T) then Proof. For locally integrable functions φ(t) on [0, T) define the operator B by Then for each integer n by successively applying (19) we obtain We claim that The proof of this is found by induction, and the inequality (23) is clearly true for n = 1. Assume now it is true for n = j , then we have where the second line follows from the first from the fact that g(t) is nondecreasing, and the third line follows by an obvious change of variables. Writing the integrand as a beta function and then invoking the beta-gamma function duplication formula yields the identity This now shows (23).
and it is easily seen that From the above we now obtain (20). If a(t) is nondecreasing on [0, T) then we can write (20) as where we have used the fact that An important particular case occurs when g is constant and the expression on the righthand side of the estimate can be written as a Mittag-Leffler function.
Lemma 2.4. Suppose b 0, β > 0 and a(t) is a nonnegative function locally integrable on 0 t < T (for some T < +∞), and suppose u(t) is nonnegative and locally integrable on Moreover, the following convolution version of Gronwall's inequality will be employed.

Lemma 2.5. Suppose c 0 and b(t) is a nonnegative function locally integrable on 0 t < T (for some T < +∞), and suppose u(t) is nonnegative and locally integrable on
Proof. The proof proceeds as in that of theorem 2.2, but in this case, it is easier since there is no singularity involved. Define (Ba)(t) = c t 0 e −λ(t−s) a(s) ds, then the inequality for u reads

Semilinear problems
Now we briefly discuss the slightly more complex case, i.e. semilinear problems, using a fixed point argument.
Consider the following initial boundary value problem for a semilinear subdiffusion In the model, we assume The argument below is based on the operator theoretic approach in L 2 (Ω); see also [12, theorem 3.1]. (29) hold. Then the solution u to problem (28) Proof. For the unique existence of u, it suffices to discuss the integral equation; see (12) First we estimate u t (t), which is given by where we have used (15). Consequently, for any 0 < t T, by lemma 2.3 with p = q = 0, there holds Using assumption (29) ).

B Kaltenbacher and W Rundell Inverse Problems 35 (2019) 115007
). This completes the proof of the theorem. □

An iteration scheme to recover f
A natural scheme to recover f is to evaluate equation (4) on the overposed boundary t = T.
where g is the given data and define a sequence of approximations by the fixed point iteration f k+1 = T( f k ). We start with the idealized situation of exact data and later on in section 3.5 consider the realistic setting of noisy data. Before we can utilize the map (30) we must obtain conditions on the data that guarantee it is well defined. Specifically, the range of g(x) must contain all values of the solution u(x, t; f ) for t T : The operator T is a concatenation T = P • S of the operators P : Z → X and S : X → Z defined by where we choose ρ 0 f act (u 0 ) + r(·, T)−Lu 0 Ḣσ (Ω) to make sure that a fixed point of T solves the original inverse problem (35), (3).
We use a bounded interval I = [g min , g max ] with g min = min{g(x) : x ∈ Ω}, g max = max{g(x) : x ∈ Ω}, in order to be able to make use of embedding theorems. Then we use the function space setting with σ such that Ḣ σ (Ω) continuously embeds into W 1,∞ (Ω) and with the norm Moreover, we employ the projection P(z) = max{min{z, g max }, g min } on I to define u(x, t; f ) as a solution to Hence, based on the range condition (31), which we need to assume holding for the actual nonlinearity f act only, and the fact that u(x, T; f act ) solves (4), we can replace the original model (4) by the equation containing the projection (35). In order to establish well-definedness of the operator P we will assume throughout this section that from which the general case q ∈ [1,2] follows by interpolation. Moreover, Under condition (36), the operator P : Z → X is well defined by (32) between the spaces defined by (33), and satisfies Proof. Existence of a minimizer follows from the fact that { f (g) − y Z : f ∈ X and f (u 0 ) Ḣσ (Ω) ρ 0 } is a nonempty closed convex (hence weakly * closed) subset of the space X which is the dual of a separable space. Moreover, from (37) and the triangle inequality as well as by minimality of Pf , comparing with f = 0, we get any minimizer f + of (32) satisfies f + (g) = y and is therefore unique, due to the fact that g(Ω) = I . The second condition in (40) can be shown to be satisfied by the strict monotonicity of g accord- is strictly monotone. This can be seen by using the identity f (u(·, t u(·, T; f ) will be shown at the beginning of the next section) and using the assumed regularity (36) of g, which transfers to its inverse and gives

as an easy verification of the chain rule in
In our implementation we use a regularized version of P by restricting the search for a minimizer of f (g) − y X to a finite-dimensional subset of smooth functions in X. This is actually an intrinsic part of the algorithm and to some extent explains its good contraction performance in spite of the fact that in an infinite-dimensional function space setting, contractivity of T can only be proven in the conditional sense of theorem 3.2 below.
Well-definedness of the operator S : X → L 2 (Ω) follows from theorem 2.3. The higher regularity Sf ∈Ḣ σ (Ω) will be established in the following section.

Self-mapping property of the fixed point operator
To prove that D α t u(x, T; f ) ∈ Z we make use of the representation formulae (12)-(14) which by the triangle inequality and Young's inequality for convolution (16) 2−2θ+σ ) in order to cancel the powers of λ j in the term containing f (Pu), so that no space derivatives are finally applied to f (Pu).
We can therefore further estimate, noting that by the range condition (31 Gronwall's inequality (17) yields provided α is sufficiently close to one: Note that Q * = ∞ does not work in the case α < 1, because of the singularity at t = 0.
Also note that the condition on α arising here is compatible with the previous one, Q and can therefore be satisfied provided σ < 2 and α is sufficiently close to one: That is, since we need σ > 3 2 to guarantee continuity of the embedding Ḣ σ (Ω) → W 1,∞ (Ω), this implies α > 4 5 . The function Φ appearing in the estimate (43) is an increasing function of the initial data f (u 0 ) + r(·, 0) − Lu 0 Ḣσ (Ω) , of f L ∞ (I) and of r t L Q * (0,T;L 2 (Ω)) ; moreover for fixed T and bounded f L ∞ (I) , the value of Φ can be made arbitrarily small by making and apply the above estimate (43) to get, for any f ∈ B, where the second inequality in (45) can be achieved by choosing ρ sufficiently large, ρ > C(g)C Ω H σ ,W 1,∞ Lg Z , and then choosing ρ 0 , r t L Q * (0,t;L 2 (Ω)) small enough so that Φ ρ 0 , T, ρ, r t L Q * (0,t;L 2 (Ω)) Moreover, by the constraint in the definition (32) of P, we have i.e. altogether T( f ) ∈ B, provided f ∈ B.

Weak * compactness of B and weak * continuity of T
The set B is by definition weakly * compact and convex in the Banach space X with norm For any sequence ( f n ) n∈N ∈ B converging weakly * in X to f ∈ X we have that the sequence of images under T, i.e. T( f n ), is contained in the weakly * compact set B. Using this and compactness of the embeddings X → L ∞ (I) (where boundedness of the interval I is crucial) we can extract a subsequence with indices (n k ) k∈N and an element f + ∈ B such that It remains to prove that f + = T( f ). For this purpose, we use the fact that the difference with homogeneous initial and impedance boundary conditions, where q n (x, t) Young's inequality (16) and (41), as well as the fact that f n ∈ B, allows us to obtain the crude estimate which by Gronwall's inequality (17) yields Using the fact that û j n as defined in (49) satisfies the fractional ODE D α tû j n (t) + λ jû j n (t) = ((q nûn +f n (Pu))(·, s), ϕ j ) we obtain From Young's inequality (16) and (41) we therefore obtain an estimate of D α tûk in a rather weak norm Thus, under the attainability condition (40) by (47). On the other hand, (47) also implies Invoking Schauder's fixed point theorem in locally convex topological spaces (see [7]), we have proven the following result. with ȳ = 1 0 f (g + θPw))P dθ 0 and homogeneous boundary conditions, and therefore has to vanish. □ Below we will make a series of short remarks. The question arises whether a similar result can be achieved for α < 4 5 . We have not investigated all possibilities to prove this, but also have no evidence that it cannot be done.
The assumption of strict monotonicity of g is used to guarantee attainability (40); see remark 3.1. Our iteration algorithm can be implemented without it and in practice our scheme worked perfectly well also for non-monotone g.
Estimate (43) shows that the influence of the initial data indeed decreases for larger T. This confirms the computational observation of faster convergence for larger T.

Contractivity for monotone nonlinearities in the parabolic case-uniqueness
To establish a contraction property of T in the parabolic case, we will make use of the fact that the solution u to the forward problem (4), (2) and even its time derivative decays exponentially, provided f is monotonically decreasing. In order to prove this, we restrict ourselves to L = ∇ · (A∇), where A ∈ R n×n is a symmetric positive definite matrix, i.e. the coefficients of the elliptic differential operator L are supposed to be constant, and r = 0.
Assume that f 0, then from then the maximum principle shows that v 0 but more crucially, v v. For v we can use the representation formula (12)- (14), which yields and Now we will employ (52) in order to show that provided we have f 1 , f 2 ∈ W 2,∞ (I), then For f 1 , f 2 ∈ X, when taking the difference T( f 1 ) − T( f 2 ) = f + 1 − f + 2 , the terms containing g explicitly and r cancel, so that with and we can estimate (T( f 1 ) − T( f 2 ))(g) Z by estimating D α t z. In the parabolic case α = 1 this can be done by differentiating (53) with respect to time to obtain that w = D t z satisfies We assume that there exist potentials q,q ∈ L ∞ (Ω), q,q 0 such that for some ĉ 1 , č 1 , such that 4ĉ 1 <λ 2 1 , 4č 1 <λ 2 1 , where λ 1 is the smallest eigenvalue of Lq = L +q (and analogously for q). We rewrite both initial boundary value problems in terms of Lq, and Lq, respectively, and use corresponding eigensystems λ j , φ j , λ j , φ j . This yields where the first term on the right-hand side can be estimated by This shows that we cannot take an r larger than 1 here, in order to avoid derivatives acting on the term ỹ(s) u u t works for Lebesgue norms but not for higher-order Sobolev norms). As a consequence, an embedding into W 1,∞ (Ω) will not be possible here and we have to work with a weaker norm on the differences f 1 − f 2 which, however, should still be strong enough to capture the f 1 − f 2 difference term appearing in ỹ. We will use the norm |||f 1 − f 2 ||| := ( f 1 − f 2 )(g) Ḣ1 (Ω) and make the assumptions (36) and u 0 ∈ W 1,∞ (Ω), which, together with the fact that by theorem 3.1 ∇u (2) L ∞ (0,T;L ∞ (Ω)) TC Ω H σ ,W 1,∞ ρ, implies the existence of constants C −1 (g), C 0 (g), Altogether we have The convolution version of Gronwall's inequality lemma 2.5 yields Analogously to (55), we obtain for (53) the identity Hence, using the replacements in (58) we obtain z(·, t) 2Ḣr (Ω) which for sufficiently large r would even yield but will actually only need r = 0 below. This allows us to estimate the term ỹu (2) t as follows Inserting this together with (52) into (58), setting r = 1, and using our assumption (56), we obtain λ1+ĉ3 . Altogether we have proven: Theorem 3.2. Let the assumptions of theorem 3.1 and additionally α = 1, r = 0, Then there exists a constant C > 0 depending only on f 1 L ∞ (I) , (Lu 0 + f (u 0 )) 2
As a consequence of theorem 3.2 we have the following conditional uniqueness result

Some thoughts and comments
It is instructive to compare the inverse problem for (1) with the initial boundary conditions (2) studied in this paper with earlier work in the [23][24][25] for similar equations and initial boundary conditions. Both approaches used an iterative method obtained by projecting the equation onto a boundary where overposed data are available. In the current case this represents a spatial measurement of u at a final time t = T, u(x, T) for x ∈ Ω, and in the former a time measurement u(x 0 ,t) for x 0 ∈ ∂Ω. In each case the method relies on a mapping of f to the differential operator projected onto the overposed boundary. In the current case this is f → u t (x, T) and in the previous it is (in one space dimension) f → u xx (x 0 , t). The analysis of the iteration relies on quite different tools for the representation of solutions. Here we work with series representations and estimates can to some extent be carried over from the parabollic case to the subdiffusion setting. In [23][24][25] the proofs (in a parabolic setting) rely on a representation by means of a Green's function and do not directly extend to subdiffusion, since the Green's function there-defined by the Mainardi-Wright function-exhibits a singularity at the origin.

Data smoothing and noise propagation
Typically, we will be given noisy data g δ and sometimes also some information on the statistics of the noise or, in the deterministic setting considered here, on the noise level with respect to the L 2 norm Application of the iterative scheme requires sufficiently smooth data, so instead of g δ we employ a smoothed version g δ satisfying the conditions on g of theorems 3.1, 3.2 as well as for some m > 0, where δ → 0 as δ → 0; see, e.g. [18, section 6.1].
In the situation of theorem 3.1 with m = σ, we can make use of uniform boundedness by ρ of the fixed point fg δ of the operator Tgδ corresponding to data g δ as δ → 0, as well as weak * continuity of T to prove subsequential weak * convergence in X of fg δ to f act as δ → 0, where fg δ is the fixed point corresponding to data g δ . Namely, uniform boundedness of ( fg δ X ) δ∈(0,δ] = ( Tgδ ( fg δ ) X ) δ∈(0,δ] implies existence of weakly * in W 1,∞ (I) and strongly in L ∞ (I) convergent subsequences f k := fg δ k , Tgδ k ( fg δ k ) with limits f and f , respectively. It remains to prove that f = T g (f ), which means that f is a solution to the inverse problem with exact data g, provided it satisfies the range condition (31). To this end, we abbreviate f k := fg δ k and g k :=g δ k and consider the following decomposition y Z : f ∈ X and f (u 0 ) + r(·, 0) Ḣσ (Ω) ρ 0 }, and we have used the fact that Tg k (f k ) = Pg k (ỹ k ) =f k . The first and last difference T g (f ) − T g (f k ) and f k −f tend to zero weakly * in X, due to the weak * continuity of T g . For the second and third differences, we abbreviate the projections defined by (32) by f + k = P g (y k ), f + k = P g (ỹ k ), and f + k = Pg k (ỹ k ), i.e. f + k (g) = y k and f + k (g) =ỹ k =f + k (g k ). Hence, the second difference in (62) satisfies and the third difference can be estimated by Altogether, the right-hand side of (62) tends to zero as k → ∞. A subsequence-subsequence argument therefore yields stability.  (36) and (61) with m = σ such that δ → 0 as δ → 0 and σ > 3 2 . Then the sequence of fixed points fg space, e.g. just X = W 1,∞ (I), or, in order to work with a Hilbert space setting, The operator F is well defined due to theorem 2.3 (replacing f there by f • P here). This allows one to write the inverse problem as We denote the exact solution by f act and some initial guess by f 0 . The derivative of the forward operator is given by The well-definedness of v can be concluded from results in [26], which has to be applied together with some fixed point argument, since the potential f (Pu) depends on x and t. Its adjoint (as appearing in the explicit version of Tikhonov-regularized Newton or in source conditions) can be derived by means of the standard integration by the parts procedure using the co-area formula. However, this leads to quite opaque formulas and hence we avoid it by stating Newton's method and source conditions in a variational manner below. Further regularity of F, such as Lipschitz continuity at f act , can be verified by using the fact where

Formulation of Newton-type methods
For simplicity we again just use a deterministic noise bound while the noise itself might of course be random. Consider two versions of regularized Newton iterations; see, e.g. [1, 11, 14-16, 28, 30] and the references therein.

Tikhonov-regularized Newton:
with chosen so that the exact solution is admissible, i.e. f act − f 0 . In order to guarantee the existence of the minimizers of (69) or (70) respectively, it is crucial to use a space X that guarantees weak( * ) compactness of sublevel sets as well as weak( * ) lower semicontinuity of the cost functional (and constraint). Therefore a choice X = H σ (I) or X = W 1,∞ (I) is admissible, while X = C 1 (I) would not satisfy these requirements.
Both versions of Newton's method are stopped according to the discrepancy principle: for some τ > 1 (independent of δ). The advantage of (70) as compared to (69) is that it always guarantees a bounded sequence of iterates, no matter if further conditions guaranteeing convergence are satisfied; see section 4.3 below. Its drawback is that an upper bound of the distance between f 0 and f act should be known in order to have a reasonable choice of . If for all δ sufficiently small the sequence reaches the stopping index defined by the discrepancy principle, then weak compactness of B ( f 0 ) also guarantees weak subsequential convergence of f k * (δ) to a solution f act , provided F is weakly continuous.

Convergence of Newton-type methods under a Lipschitz condition at f act and a variational source condition
Structural constraints on the forward operator, such as the tangential cone condition, are often not satisfied for parameter identification problems in PDEs unless complete observations of the state (i.e. in our case observations of u on all of Ω × (0, T)) are available. So they are likely not satisfied for the inverse problem under consideration here.
In contrast to this, a Lipschitz condition on F can usually be verified in a much more standard way. Note that one of the two points where F is evaluated is fixed to f act , which enables one to establish (72) without the need to impose a stronger space X (which would make the problem more ill-posed), but just by assuming additional regularity on f act ; see section 4.4. A convergence proof of Newton-type methods without a structural condition (so with just the Lipschitz condition on F ) requires f act to satisfy a source condition. The explicit version of the source condition would involve an adjoint, which we avoid by considering the variational version (see, e.g. [10]) for some β < 1 L . Note that (73) implies that f act − f 0 lies in the orthogonal complement of the kernel of F ( f act ) so its verification is related to the uniqueness question for the linearized problem. We here restrict ourselves to the case of a Hilbert space X with inner product (·, ·) X for simplicity. The considerations below can be extended to a Banach space setting by using Bregman distances (see, e.g. [10]).
Convergence of (69) under conditions (72) and (73) so that the stopping index defined by the discrepancy principle (71) is reached after a finite number of steps follows existing results (e.g. [15, lemma 4.1]; see also [1] for the Hilbert space setting and [28,30] in Banach spaces).
In case of (70), we can argue as follows, assuming that has been chosen such that f act − f 0 2 f act − f 0 2 + Cδ and omitting the norm subscripts to simplify the notation. Minimality of f k+1 and admissibility of f act implies where the left-and right-hand sides can further be estimated by means of the Lipschitz condition on F so inserting F p (k) and F pa (k) into (74) we get Here the quadratic terms can be estimated by means of the variational source condition and the fact that f − f 0 where we have again used the Lipschitz condition in the last estimate, hence Using this for = k and = k + 1 altogether yields i.e. an estimate of the form with m = 8βL 1−4βδ smaller than one if β is sufficiently small, so which proves that k * (δ) is finite provided τ has been chosen sufficiently large, τ > 1 +C 1−m . Moreover, (71) together with (75) for = k * (δ) yields the convergence rate

Back to the reaction-diffusion equation
We now verify the assumptions of proposition 4.1 for the forward operator F defined by (63).

Computational algorithms used
Here we describe some of the computational logistics behind the two approaches analysed in the previous sections: that of the iteration scheme based on (30) and those Newton-type methods based on computing the map and the associated Jacobian from (64) and (65). In both methods the forward solver was a standard Crank-Nicolson scheme for the parabolic equation while in the fractional case the time stepping and computation of D α t at the final time was based on a second-order method introduced in [8].
The former of these methods is by far the simplest and without question performed both exceedingly well and in a much superior way to those based on the more complex Newton methods. The algorithm can be summarized in the following steps: • Obtain the (noisy) data g(x) = u(x, T; f ) from some unknown f (u) but given the values of the initial and boundary conditions under which the solution u of the direct problem was constrained. • Since we require the quantity Lg this must be computed in a stable manner. This was accomplished by using the fact that u(x, T) must share the prescribed boundary conditions and so the exact value can be approximated as N n g n φ n (x) where {φ n (x)} is the orthogonal set of eigenfunctions of L on Ω. Then Lg can be written as N n g n λ n φ n (x). Now we require that Lg be at least as smooth as the desired value for f (g) to lie in W 1,∞ and we do this by a standard least squares fit with an appropriate penalty term to ensure this level of smoothness in Lg. Hence we used the Ḣ σ (Ω) norm where σ must be chosen in accordance with the Sobolev embedding theorem value (see previous sections).
• At each iteration we must recover f from f (g) = u t (x, T; f ) − Lg. We do so by representing f at the (now smoothed) function g in a basis set of functions. This is essentially the projections P and P noted in (32) and (35). Since we make no constraints on the form of f other than sufficient regularity we do not choose a basis with in-built restrictions as would be obtained from an eigenfunction expansion. Instead we used a radial basis of shifted Gaussian functions b j (x) := e −(x−xj) 2 /σ centred at nodal points {x j } and with width specified by the parameter σ. Since we use a finite number of such analytic functions, the computed f at each iteration will be likewise smooth. We compute the basis coefficients from a given f in the standard way using the Gram matrix of {e −(x−xj) 2 /σ }. It should be noted that no smoothing is necessary at this step -the entire regularization of the problem being achieved by the smoothing of the noisy data g. Figure 1 shows the reconstruction of the reaction term f (a) (u) = 2u(1 − u)(u − a) with a = 0.75. This corresponds to a particular choice of parameters in the Zeldovich model. The initial approximation was f (u) = 0, and we show iterations 1, 2, 5. The latter represented effective numerical convergence and it is clear that the second iteration was already very good.
The figure on the left shows the situation with 1% added noise and that on the right with 5% noise. Note that the iterations scheme itself was without regularization; all of this was contained in the initial smoothing of the data g (and also g xx ). The parameters for this were chosen by the discrepancy principle. However, it is clear that in the reconstruction from 5% noise that this resulted in under-smoothing. In all numerical runs based on the iteration scheme (30) the initial approximation for f was taken to be the zero function.
Of course there are more challenging possible reaction terms, and figure 2 shows the reconstruction of the Lipschitz function given by The data g(x) had 1% added noise and again effective numerical convergence was obtained within five iterations and even iteration 2 was almost as close to the original. For the case of Newton schemes we of course must compute the derivative of the map F and its associated derivative through equations (64) and (65). The computations for these rely on the same solver families used for the direct equation. As usual, there are several versions of Newton schemes available. For example, we utilized the option of freezing the Jacobian at the initial approximation to obviate what is the single most computationally intensive item in each iteration. In common with many nonlinear problems, the initial approximation is quite important. Choosing one at considerable distance from the actual function frequently led to a failure to converge. Even with a reasonable initial approximation it was often prudent to implement some control over the length of the Newton step. Although the convergence under such conditions was fairly rapid, perhaps surprisingly, the number of iterations required was almost always larger than that taken by the iteration scheme (30). Figures 3 show reconstructions of f (u) = f (a) (u) and f (u) = f (b) (u) using Newton iteration. In both cases the data g was subject to 1% noise. The initial guess was taken to be of similar form: f However, the norms of f and f init were quite far apart in both cases.
The figures show the 10th iteration although the rate clearly slowed down by the fourth or fifth iteration, which were already close to the one shown.
The use of a frozen Newton gave almost similar results; only a slight lag in the convergence rate was noticed. This of course shows that, provided the initial approximation is sufficiently good, this version has a significantly lower computational cost. Conversely, if the initial approximation is poorer, then a hybrid approach with initial shorter Newton steps followed by a holding of the derivative after a certain point becomes a viable option.   We note that the issue of regularization parameter choice is far from trivial in this case. Our regularization of the (singular) Jacobian required a stronger norm than just L 2 and in fact we used a penalty term of the form 0 I + 2 R where R was a smoothing matrix formed from R jk = umax umin b j (u)b k (u)λ(u) du where the weight function λ(u) allowed for differential smoothing over the interval.

Dependence on the final time T and on α
The sign of f (u) and of f (u) in the reaction-diffusion model will make a difference to the rate of convergence as will the time of measurement T. For example, theorem 3.2 shows that the contractive nature of the map T depends on T, and the larger the value T the smaller the contraction constant. In the parabolic case this decay is in fact exponential but we cannot expect the same in the fractional situation where the large time decay of the exponential is replaced by the linear decay of the Mittag-Leffler function.
The leftmost figure in figure 4 shows the dependence on the choice of T and of α of the convergence rate as measured by the improvement of the first iteration over the initial approxima- iter1 ∞ . This is a more definitive measure of the effect of the parameters T and α than the number of iterations in achieving effective convergence as the latter involves a degree of subjectivity. However both measures give consistent results.
In the parabolic case the results are as predicted by theorem 3.2; the number of steps required for a given accuracy decreases as T increases. What is perhaps surprising given the only linear asymptotic behaviour of the Mittag-Leffler function is the fact the convergence for large values of T is at least as good in the fractional case and, indeed, it is considerably better for small values of the final time T.
The right side of figure 4 shows the corresponding results for Newton's method. Here we again only look at the difference in the L ∞ norm of the initial approximation of f (a) init of f (a) (u) obtained after the first iteration.
In the parabolic case there is a definite improvement in the convergence rate as T increases. However, this feature diminishes with decreasing α and by α = 0.25 was almost imperceptible. Note that the improvements obtained by the first approximations shown in the above figures should not be directly compared as the iteration scheme (30) started from an initial approximation of f

Epilogue
Reaction-diffusion equations are building blocks for mathematical modelling in a wide range of applications. In many cases these models make a specific-form ansatz for both the diffusion and reaction mechanisms and contain parameters that have to be determined in order to prime the system for subsequent predictive behaviour of the quantity u of prime interest. This has traditionally been accomplished by a least squares fit to a relatively small number of parameters. In this paper we have sought to go beyond this by making no prior assumptions about the qualitative behaviour of the unknown reaction term f (u), but by recovering a full functional form from measured information consisting of the variable u at a fixed time. Such data are very natural in many applications; for example, in the case of ecology models, it corresponds to a population census. We have assumed that the diffusive process is known, including any parameters involved in this component. Thus, the diffusion coefficient a(x) and any transport term b(x) is assumed known as well as whether diffusion is through a Brownian or a particular version of an anomalous process. Other random walk models are possible and are frequently utilized. This paper has concentrated on the case of a process where the probability density function for the mean waiting time may be unbounded leading to a time derivative of the fractional order α < 1. In the limiting case when the mean waiting time is bounded, that is, α = 1, we of course obtain the regular parabolic operator, and this is also covered by our analysis. The recovery of f , in particular the convergence rate of the numerical algorithms, depends on α and also on the final time of measurement T, and the interplay between these quantities is quite complex.
We also noted the possibility of a fractional power in the space variable. However, this was through the fractional power of the elliptic operator −L rather than through the similar process of fractional-order spatial derivatives that would arise if our assumption on the random walk process took a spatial probability density function with infinite variance. While the latter model has a definite physical basis, the replacement of −L by its fractional power (−L) σ has a less clear physical connection. It does have mathematical elegance and convenience for analysis and computation and this has lead to its popularity. Discovering the interplay among the fractional power constants α, σ, the diffusion coefficient a(x) and the inversion of f (u) would be very interesting.
The fact that the model considered allows for spatially dependent coefficients is important beyond mathematical generality; this could be an essential factor in modelling. For example, in ecology, certain areas could be either rich in food or less hospitable and thus support a different f (u) term. In this respect the current paper is complementary to [17], where it was assumed that the reaction term f (u) was known but the coefficient that quantified this environmental factor had to be determined.
From a mathematical perspective, the standard parabolic reaction diffusion equation has considerable complexity. For example, without a growth constraint on f (u) the solution u can 'blow up' in finite time. Even if we measure our data-the solution u-at a time T, where T is less than this blow-up time, we are likely to have extreme values occurring if the leading term in f (u) is asymptotically near the critical exponent. Our reconstruction methods had to be set in a suitable space to prevent such an event occurring during the iterative process. The regularity of the non-homogeneous parabolic equation is well understood and there are critical gaps in the Schauder space regularity of the solution in the time variable that prevented a straightforward analysis of the convergence of the iteration schemes used in reconstructing f (u). In Sobolev spaces the embedding schemes between these spaces forced much of the analysis to hold only in one space variable. In the case of time-fractional diffusion the situation is more complicated. The subdiffusion (α < 1) equation has even more restricted regularity properties due to the singularity at t = 0 that causes at most a two-derivative pick up in smoothness of the solution u(:, t) over that of f whereas the parabolic equation has an infinite smoothness pick up. For reasons such as these, this paper is quite technical in parts and, as a compromise on length, has a rather dense exposition, which we have tried to ameliorate as much as possible.
In conclusion, this paper has only touched the surface of an exceedingly nontrivial problem. There remain challenging mathematical questions in incorporating more complex diffusion models into the framework as well as the myriad of cases that could occur in terms of finding optimal driving conditions such as initial boundary data to best recover a given type of function f . Clearly, in higher space dimensions and for space-and time-fractional models there are significant obstacles in developing both effective numerical methods and providing the required numerical analysis. In seeking a solution to the inverse problem of reconstructing f (u) we have concentrated on the featured iteration scheme based on the map involving u t (·, T; f ). An alternative approach is to extend the scope of Newton-type methods based on the map from f (u) to the data g(x) = u(x, T) instead. This will require showing the injectivity of the resulting Jacobian map as well as finding regularization strategies for its effective inversion. We view this as a nontrivial step, but believe it allows for multiple options. Beyond the indicated analysis, mathematical opportunity beckons.