Data assimilation in price formation

We consider the problem of estimating the density of buyers and vendors in a nonlinear parabolic price formation model using measurements of the price and the transaction rate. Our approach is based on a work by Puel (Puel J-P 2002 C. R. Acad. Sci., Paris 335 (2) 161–166), and results in an optimal control problem. We analyze this problem and provide stability estimates for the controls as well as the unknown density in the presence of measurement errors. Our analytic findings are supported by numerical experiments.


Introduction
In this paper we use techniques developed in the field of data assimilation to predict the dynamics of a nonlinear parabolic free boundary price formation model proposed by Lasry & Lions in [16]. The Lasry-Lions (LL) model describes the price evolution of a single good traded between a large group of buyers and a large group of vendors. The price enters as a free boundary, at which trading takes place. After a transaction has been realised, buyers and vendors immediately sell or rebuy the good at a shifted price. The shift in the price is due to the previously paid constant transaction costs. The situation detailed above can be described by a nonlinear parabolic partial differential equation, system (1.1), with initial conditions
f(x, 0) = f_0(x), p(0) = p_0. (1.1c)
The positive part f^+ = max(f, 0) of the function f = f(x, t) corresponds to the distribution of buyers over the price x ∈ Ω, the negative part f^- = min(f, 0) to the distribution of vendors over the price. The free boundary p = p(t) corresponds to the price where f(·, t) = 0, the function Λ to the total number of transactions executed at that price. The immediate placement and execution of new bids and orders after the trading event are modeled by Dirac deltas at the shifted prices p(t) + a and p(t) − a, where a ∈ R_+ denotes the transaction costs. Random changes in the buyer and vendor distribution are included by a Laplacian with constant diffusivity σ ∈ R_+. We assume that the initial distribution f_0 satisfies
f_0(p_0) = 0, f_0(x) > 0 for x < p_0 and f_0(x) < 0 for x > p_0, a.e. in Ω, (1.2)
and set w.l.o.g. σ²/2 = 1. System (1.1) can be posed on the positive real line Ω = R_+ or on a bounded interval Ω = [0, x_max], where x_max denotes the maximum price. We will consider (1.1) on the bounded interval only and impose homogeneous Neumann boundary conditions to ensure that the total number of buyers and vendors is constant in time.
For convenience, we assume the initial price p_0 is normalized to 0 and only consider its relative change. Hence we work on the shifted domain [−L, L], where L = x_max/2. Altogether we will consider (1.1) with boundary condition (1.3) on Ω = [−L, L] throughout this manuscript.
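For orientation, the following LaTeX block sketches system (1.1) on Ω = [−L, L] in the form in which the Lasry-Lions model is commonly stated in the literature. It is assembled from the verbal description above (Dirac sources at p(t) ± a, transaction rate Λ, constant diffusivity, homogeneous Neumann conditions); the exact layout, the sign conventions and the assignment of the labels (1.1a), (1.1b) and (1.3) are assumptions and may differ from the original display.

```latex
\begin{align}
  \partial_t f - \frac{\sigma^2}{2}\,\partial_{xx} f
    &= \Lambda(t)\,\bigl(\delta_{p(t)-a} - \delta_{p(t)+a}\bigr),
    && x \in \Omega,\ t > 0, \tag{1.1a}\\
  \Lambda(t) &= -\frac{\sigma^2}{2}\,\partial_x f(p(t),t),
    \qquad f(p(t),t) = 0, && t > 0, \tag{1.1b}\\
  f(x,0) &= f_0(x), \qquad p(0) = p_0, \tag{1.1c}\\
  \partial_x f(-L,t) &= \partial_x f(L,t) = 0, && t > 0. \tag{1.3}
\end{align}
```

With the sign condition (1.2), the slope of f at the price is negative, so Λ(t) is positive in this form; buyers who traded re-enter as vendors at p(t) + a (negative source) and vendors re-enter as buyers at p(t) − a (positive source).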
The LL model (1.1) was analyzed in a series of papers, cf. [11,19,8,4,5]. Most available results are based on a nonlinear transformation of (1.1), which transforms the problem to the heat equation with nonlinear boundary conditions. This connection provides the main analytical ingredients to study existence and long time behavior of solutions to (1.1). Lasry and Lions introduced the model on the macroscopic level only; a more detailed microscopic interpretation of the trading process and the respective limit as the number of buyers and vendors tends to infinity was missing. This connection was established by Burger et al., who proved that the original LL model can be derived from a Boltzmann type model as the number of transactions tends to infinity, see [2]. In their approach trading events between buyers and vendors are modeled by "collisions", which can also be used to describe price dynamics in case of more general trading rules. The connection between the Boltzmann-type price formation model and the LL model (1.1) was further investigated in different asymptotic limits in [3]. The LL and Boltzmann-type price formation models are appealing in many respects, especially in terms of analytical tractability. However, the resulting price process is deterministic and does not give any insights into connections between transaction rates, order flows or price volatility. Markowich et al. [18] considered a stochastic extension of the original LL model. However, this extension did not give realistic price dynamics either. Very recently Cont and Müller [10] proposed a stochastic partial differential equation with multiplicative noise, which reproduces statistical properties of real price dynamics.
In this paper we focus on the inverse problem of determining the buyer-vendor distribution given measurements of the price and the transaction rate on a time interval [0, T]. This distribution can then be used as an initial value and thus allows us to predict price and transaction rate for t > T. More specifically we will investigate the following question.
Problem I: Given measurements of the price p(t) and the transaction rate Λ(t) in some time interval [0, T], is it possible to predict the price for times t > T?
Our approach is based on an optimal control approach proposed by J.-P. Puel, see [20,21]. It relies on a duality argument, which allows us to reconstruct the distribution f at the final time T. This is in contrast to standard data assimilation, where one tries to recover the initial datum f_0(x). We adapt the strategy of Puel et al. and use duality estimates to compute linear functionals of f(T, x). These functionals involve the solution of optimal boundary control problems with PDE constraints. Optimal boundary control problems are well studied in the literature, see e.g. [17,22,13]. We will make use of an exact null controllability result for parabolic boundary control problems shown in [7]. Its proof is based on Carleman estimates, a technique commonly used to derive exact controllability results (and also uniqueness for inverse problems), see [23,14] for details. A possible numerical realisation of Puel's strategy was presented in [9].
Our contributions to the subject of optimal control for parabolic free boundary problems and data assimilation in price formation models are the following:
• We present the first approach to reconstruct the buyer and vendor distribution from measurements of price and transaction rate (to the authors' knowledge).
• We generalise the data assimilation approach of Puel et al., see [20], to free boundary value problems and evolving domains.
• We provide stability estimates, which give novel insights into the influence of measurement errors on the price dynamics.
• We propose a computational strategy to implement the developed framework numerically.
This paper is organized as follows: The proposed framework is based on several analytic results, which will be presented in Section 2. The data assimilation problem itself is discussed in Section 3. Section 4 is devoted to stability in the presence of measurement errors and we conclude by presenting numerical experiments in Section 5.

Preliminary results
In this section we provide analytic tools and results for the forward problem and define the respective adjoint problem, which will be used in the optimal control approach.
Assumption (A1) is the necessary compatibility condition for the initial datum f_0 (which we already stated in (1.2)), while (A2) ensures that the price stays sufficiently far away from the interval boundaries. Note that the restriction on p(t) is not severe in the context of inverse problems: since we will assume later on that we know measurements of p(t) in some time interval [0, T], we can always choose the domain size L (within realistic bounds) such that the condition p(t) ∈ (−L + a, L − a) is satisfied. As p(t) is continuous, we also know that it will stay in (−L + a, L − a) for some time, so that it is safe to predict for t > T.

Nonlinear transformation of the model
We start by discussing the nonlinear transformation which converts (1.1) to a linear heat equation. This connection was exploited in almost all analytic results as well as computational methods. The idea is that the second derivative of f at p(t) − a behaves like Λ(t) δ_{p(t)−a}, while at p(t) + a it behaves like −Λ(t) δ_{p(t)+a}. Thus shifting the function by multiples of ±a and adding them up 'eliminates' the singularity on the right hand side. More precisely, for Ω = R, we define (2.1) Then the function F = F(x, t) satisfies the heat equation with the transformed initial datum Since we consider (1.1) with homogeneous Neumann boundary conditions on a bounded domain, we only have a finite sum in (2.1) but obtain the following boundary conditions: Note that the solution of the original LL model (1.1) can be computed by

Existence and regularity of the price
In the following we provide additional existence and regularity results for the direct problem. Note that these results are not optimal in terms of regularity. However, they are sufficient to define all quantities that we shall need in the sequel.
Proof. The proof is mainly based on the definition of the transformation (2.1), see [5] for details.
Note that the stationary price is determined by the initial mass of buyers and vendors as well as the transaction cost a. In particular where M_l = ∫_{−L}^{p_0} f_0(x) dx and M_r = ∫_{p_0}^{L} f_0(x) dx. The presented analysis of the adjoint and assimilation problem relies on the following regularity result for the price p = p(t).
Proof. The result is a direct consequence of the fact that F(x, t) is smooth in space and time for all t > 0 and of the boundedness of Λ. Indeed, differentiating the relation F(p(t), t) = 0 yields and therefore where the parabolic version of Hopf's Lemma applied at
Remark 2.3. The regularity of the price p as well as of the buyer-vendor density f at the initial time is crucial to define the transformation between the time-dependent domains [−L, p(t)] and [p(t), L] and the reference domain [0, 1] (see Subsection 2.3), but also for the exact controllability results of Theorem 3.3. Therefore we will work on the temporal domain [ε, T] instead of [0, T] for some fixed ε > 0 in the following.

Evolving spaces and transformation to fixed domain
A crucial step in the subsequent analysis is the splitting of the domain Ω into the parts left and right of the price p(t) (illustrated in Figure 1). We introduce the domains and Ω = [0, 1], as well as Following [1], we define evolving Bochner spaces on these domains. We present the construction for the left domain Ω_l = [−L, p(t)] only, since the argument for the right domain is analogous. First denote by H^1(t) := H^1((−L, p(t))) the evolving Hilbert space. Next we define the map φ_t : The function φ_t is obviously continuous and reduces to the identity at t = ε. It is also a homeomorphism, as its inverse is continuous as well. This allows us to introduce the evolving Bochner spaces (as in [1, Definition 2.7]) and, following again [1], to identify u(t) = (ū(t), t) with ū(t) for u ∈ L^2_{H^1} (and likewise in L^2_{(H^1)*}). The space of continuously differentiable functions on evolving Bochner spaces is given by Thus we can, as in [1, Definition 2.20], give a notion of time (material) derivative as Then we can finally define the space used for the notion of weak solutions, namely The definitions of the respective quantities
While the previous definitions allow us to work directly in a noncylindrical domain, it is sometimes also useful to consider the transformation to the fixed domain Hence we introduce transformations which map Q_l and Q_r to Q: Note that due to assumption (A1), T_l and T_r are well-defined and that T_r actually flips the domain, i.e. it swaps left and right boundary points.
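The transformation to the fixed domain can be illustrated with a short Python sketch. It assumes the simplest affine maps sending [−L, p(t)] and [p(t), L] onto [0, 1], with the right-hand map flipping the orientation so that the free boundary always lands at y = 1. The function names and the precise form of the maps are illustrative assumptions, not the definition (2.9) of the paper.

```python
def to_reference_left(x, p_t, L):
    """Map x in [-L, p(t)] to y in [0, 1] (assumed affine form)."""
    return (x + L) / (p_t + L)


def from_reference_left(y, p_t, L):
    """Inverse of to_reference_left."""
    return -L + y * (p_t + L)


def to_reference_right(x, p_t, L):
    """Map x in [p(t), L] to y in [0, 1] with flipped orientation:
    x = L goes to y = 0 and x = p(t) goes to y = 1 (assumed form)."""
    return (L - x) / (L - p_t)


def from_reference_right(y, p_t, L):
    """Inverse of to_reference_right."""
    return L - y * (L - p_t)


if __name__ == "__main__":
    L, p_t = 1.0, 0.2  # hypothetical domain size and current price
    for x in (-1.0, -0.4, 0.2):
        print("left ", x, "->", to_reference_left(x, p_t, L))
    for x in (0.2, 0.6, 1.0):
        print("right", x, "->", to_reference_right(x, p_t, L))
```

Both maps require p(t) to stay strictly inside (−L, L), which is why a condition keeping the price away from the interval boundaries is needed for them to be well-defined.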

Adjoint equations
The next ingredient will be two adjoint equations, posed on the domains Q_l and Q_r, respectively.
ε, T) there exist unique solutions Φ_l and Φ_r to (2.10) and (2.11), respectively. Furthermore, we have With the help of the transformations T_l and T_r, equations (2.10) and (2.11) can be transformed into a generic problem of the form For (2.10) we define (y, t) = T_l(x, t) and compute while for (2.11) and (y, t) = T_r(x, t) we obtain (2.16) Note that in view of Lemma 2.2 and Assumption (A1), the coefficients a and b are (in both cases) continuous and uniformly bounded by as there may be points with p'(t) = 0. Thus, standard existence and regularity results for linear diffusion-convection equations on fixed domains, such as [15, Theorem 5.2], can be used to ensure the solvability of (2.14).

Data assimilation problem
We now turn to the main part of this paper, the inverse or data assimilation Problem I. In classic data assimilation approaches one would use the measurements of p = p(t) and Λ = Λ(t) on [0, T] to reconstruct the initial datum f_0(x) of (1.1). Here we follow an alternative approach proposed by Puel et al., see [20,21], and estimate the buyer-vendor distribution at the final time, that is f(x, T), instead. This requires the solution of additional optimal control problems, which are, however, well posed if an appropriate regularisation (penalty) is added.
To use Puel's strategy in our setting, we will estimate the densities of buyers and of vendors separately (that is, on the left and right of the free boundary). The reconstruction is then based on the following two duality estimates: satisfy (2.10) and (2.11), respectively. Then, the following duality estimates hold for arbitrary functions u_l, u_r ∈ L^2(0, T) and every ε > 0.
Proof. We prove the first estimate only, since the argument for (3.2b) is the same. We have where we have used the boundary condition (1.3), f(p(t), t) = 0 and the definition of Λ.
Now we will use (3.2a)-(3.2b) to determine f(x, T). Since the choice of Ψ_l and Ψ_r in (3.2a) and (3.2b) is arbitrary and the last term on the right hand side contains only known (i.e. computed or measured) quantities, we could obtain a linear functional of f(x, T). The only unknowns are the first terms on the respective right hand sides. But since we are free to choose arbitrary boundary data u_l and u_r, this leads to the null-controllability problems for (2.10)-(2.11). Indeed, if we can choose u_l and u_r such that Φ_l(x, ε) = 0 and Φ_r(x, ε) = 0, the unknown terms in both orthogonality relations drop out and we can reconstruct f(x, T).

Optimal control problem
To carry out the strategy outlined above, we have to solve the optimal control problems min u_l ∈ L^2(ε, T) Since the structure of both problems is the same, we will only discuss the first one. To increase readability, we will drop the subscript l and write u, φ, ... instead of u_l, φ_l from now on.
The next result states that the equation underlying the optimal control problem is indeed exactly null-controllable in the sense of the following definition.
The following exact boundary controllability result is based on [7, Theorem 2.3], slightly extended and adapted to our situation.The theorem reads as follows.
Theorem 3.3 (Exact null-controllability). For every Ψ ∈ L^2(Ω_l), there exists at least one control u ∈ L^2(ε, T) such that the solution Φ of (2.10) satisfies Φ(x, ε) = 0 on Ω_l. Furthermore, there exists a constant C which depends on p(t), L and T such that holds with ū being the control of minimum L^2-norm.
Proof. The regularity of the price p allows us to transform the problem to a fixed domain using T_l defined in (2.9). Hence we only consider equations of type (2.14). First we observe that for any δ > 0, any solution Φ to (2.
In order to be able to solve the optimal control problem numerically, we introduce the following regularized version min u ∈ L^2(ε, T) Standard arguments guarantee the existence of a unique minimizer, see e.g. [22, Section 3.5].
Calculating the derivatives of the corresponding Lagrange functional, we obtain the first order optimality system where Φ satisfies the adjoint equation (2.10) and the coupling The following results examine the convergence of u as α → 0. The proofs use the same techniques as in [21], adapted to our boundary control problem.
Theorem 3.4.For every α > 0, denote by (u α , φ α ) the corresponding solution to (3.7).Then we have where ū is the solution to the optimal control problem (3.3) having minimal L 2 -norm and Φ α and Φ are the solutions to (2.10) with boundary data u α and ū, respectively.
Proof. By Theorem 3.3, we know that there exists at least one function solving the exact null controllability problem. Thus, the set of all these controls in L^2(ε, T) is nonempty. As it is also convex and closed, there exists a unique ū having minimal L^2-norm. Since u_α minimizes the functional (3.7) among all functions in L^2(ε, T) we have which implies the (uniform in α) bound Thus, we can extract a subsequence, again labeled u_α, that converges weakly to some ũ in L^2(ε, T). Using the weak formulation of (2.10) and an Aubin-Lions argument, we see that this is sufficient to obtain the convergence )), as α → 0, and (3.13) implies Φ(ε, x) = 0. Thus, arguing as in the proof of [21, Theorem 2.12], we can use the fact that ū has minimal norm as well as the lower semicontinuity of the norm w.r.t. weak convergence to obtain that ũ = ū. This argument also implies norm convergence, and the uniqueness of the limit then finally yields This also implies Φ = Φ which completes the proof.
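The mechanism behind Theorem 3.4 can be illustrated on a finite-dimensional toy problem. The Python sketch below assumes a regularized functional of the form 0.5·|S u + g|² + 0.5·α·|u|², where the matrix S (a stand-in for the discretized control-to-state map at time ε) and the vector g (a stand-in for the contribution of the final datum Ψ) are hypothetical; the weights in (3.7) may differ. It shows that the Tikhonov minimizers u_α approach the minimum-norm exact control as α → 0 while their norms stay below that of the limit.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def tikhonov_control(S, g, alpha):
    """Minimise 0.5*|S u + g|^2 + 0.5*alpha*|u|^2 by solving the
    normal equations (S^T S + alpha I) u = -S^T g."""
    m, n = len(S), len(S[0])
    A = [[sum(S[k][i] * S[k][j] for k in range(m)) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [-sum(S[k][i] * g[k] for k in range(m)) for i in range(n)]
    return solve_linear(A, rhs)


def norm(u):
    return sum(v * v for v in u) ** 0.5


# Hypothetical toy data: two control unknowns, one state unknown.
S = [[1.0, 2.0]]
g = [-5.0]
# Minimum-norm solution of S u = -g, i.e. S^T (S S^T)^{-1} (-g).
u_min_norm = [1.0, 2.0]

if __name__ == "__main__":
    for alpha in (1.0, 1e-2, 1e-4, 1e-6):
        u = tikhonov_control(S, g, alpha)
        err = norm([u[i] - u_min_norm[i] for i in range(2)])
        print(f"alpha={alpha:g}  |u_alpha|={norm(u):.6f}  error={err:.2e}")
```

In this toy setting exact controls exist but are not unique, which mirrors the situation of the proof: the penalty term selects the control of minimal norm in the limit.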
Remark 3.5. Understanding the optimal control problem (3.3) (or (3.4)) as Tikhonov regularisation, one could ask for convergence rates of u_α to ū as α → 0. Indeed, such rates could be expected under appropriate source conditions on ū. The interesting point now is to understand the influence of p(t) in the definition of the forward operator on the characterisation of such conditions, and also how perturbations in p would influence them. We leave this question for future research.

Stability in the presence of measurement errors
Assume we have measurements of two different prices p_1 and p_2 as well as two different transaction rates Λ_1(t) and Λ_2(t). Can we control the difference in the reconstructions f_1(x, T) and f_2(x, T) as well as the future predicted prices p_1(t) and p_2(t) for t > T in terms of these differences? In this section we will give a positive answer to this question based on the following strategy:
1. Estimate the error in the optimal controls u_1 and u_2 in terms of the error in p_1 and p_2 (Lemma 4.2).
2. Estimate the error in the respective reconstructions f 1 (x, T ) and f 2 (x, T ) in terms of errors in price and transaction rate (Lemma 4.3).
3. Use these results to predict errors in the future price (Lemma 4.7).
Note, however, that for the last point we need to make additional regularity assumptions on the reconstructed final data that do not directly follow from our analysis (see Remark 4.5 for details). We start by assuming W.l.o.g. we only consider the optimality system related to (3.3), i.e. the left part Ω_l = [−L, p(t)], and again drop the subscript l. Moreover, we transform all equations to the unit interval [0, 1], so that the optimality system reads as and the coupling condition with a(t) and b(t) as defined in (2.15). Note that the transformed primal and dual equations are still adjoint to one another, yet now with respect to the scalar product
Lemma 4.1. Let Φ and G be the solutions to (4.1a) and (4.1e), respectively. Then we have
Proof. These are standard estimates obtained by choosing Φ and G as test functions in the weak formulation of (4.1a) and (4.1e), respectively. For the first estimate, we additionally used the L^2-bound (3.14) on the boundary control, which introduces the α-dependence in C_1.
Now we are able to prove stability of the optimal control problem in terms of measurement errors in the price.
Lemma 4.2 (Stability of u). Consider two different prices p_1(t) and p_2(t) such that p_1(ε) = p_2(ε) and ‖p_1 − p_2‖_{C^1([ε,T])} ≤ δ_p. Denote by Φ_1 and Φ_2 and by G_1 and G_2 the solutions to (4.1a) and (4.1e) with p = p_1 and p = p_2, respectively. Then the following stability estimate for the controls u_1 and u_2 holds:
Proof. For each p_i (and corresponding a_i, b_i), we denote by G_i, Φ_i and u_i the corresponding solutions to the optimality system (4.1a)-(4.1i) and furthermore Then, Φ and Ḡ satisfy, in the weak sense, the equations and Note that the following calculations are formal, since for now we only know existence of weak solutions and therefore some of the integrals are not defined. In the end we arrive, however, at an estimate which is again well defined and can be obtained rigorously by working directly with weak solutions. We chose this way of presentation as we believe it to be easier to follow. Thus (formally) taking equation (4.3a) and testing it with Ḡ (with respect to the scalar product (4.2)) yields Integrating by parts on the left hand side, using (4.3e) and the boundary conditions results in A final integration by parts to remove the second derivatives on the right hand side gives (p(t) + L) Using the estimates of Lemma 4.1, the boundedness of u in L^2 (see (3.14)) and Cauchy's inequality applied to the last term on the right hand side, we have where we also used the lower bounds (2.17) on a and Assumption (A3) to estimate the expression (p(t) + L) from below by p and above by L − p. Using again (2.17) yields Combining this with the previous estimate yields the assertion.
For the second step of our strategy, we return to the orthogonality relation (3.2a) which, transformed to [0, 1], reads as (p(T) + L) In the presence of errors in p and Λ we obtain two different relations and the following stability result. Note that the above results on the adjoint equations imply solvability for Φ with continuous dependence on the initial value for any Ψ ∈ L^2([0, 1]). Hence, the duality relation uniquely defines f(·, T). Moreover, f(·, T) depends stably on the errors in the price and the transaction rate, which we make precise by the following result:
Lemma 4.3 (Stability of f(x, T)). Let p_1, p_2 and Λ_1, Λ_2 be given functions which satisfy Assumption (A3) and denote by f_1(x, T) and f_2(x, T) the corresponding reconstructed distributions calculated using (4.4). Then we have We estimate each term of the right hand side separately where we used Lemmata 4.1 and 4.2. Next we have Combining all estimates and taking the supremum over all Ψ ∈ L^2((0, 1)) with ‖Ψ‖_{L^2((0,1))} = 1, we finally obtain Taking C_6 = max(C_13, C_11) yields the assertion.
Remark 4.4. The estimates of Lemmata 4.2 and 4.3 show that, for α > 0, the reconstruction of the unknown buyer-vendor distribution f(x, T) is actually a well-posed problem, at least for sufficiently smooth perturbations of p. This is due to the fact that we are solving a regularized optimization problem. The price to pay is that the term involving f(x, ε) in (4.4) does not vanish. However, since f(x, ε) is fixed, it does not appear in our stability estimates.
For the next result, we choose perturbed prices p_1 and p_2 such that |p_1(T) − p_2(T)| < 2a, assume w.l.o.g. that p_1(T) ≤ p_2(T), and make the following additional assumptions:
Remark 4.5. We mention that it is indeed natural to assume strong regularity of f in a neighbourhood of p(T) for T > 0, since it locally arises as the solution of a heat equation. On the other hand, we need to expect some singularities around p(T) − a and p(T) + a due to the singular source terms. Thus (A5) seems completely natural for forward solutions of the price formation model. Moreover, it can also be verified that f(·, T) reconstructed via (4.4) has local H^4-regularity, which follows from using Ψ supported in I and an analysis of the solution of the parabolic equation for Φ, which can be estimated in terms of the H^{−4} norm of the initial value.
In the following we analyze the forward propagation for t > T in a small time interval. We denote the new initial value by f_{i,0} := f_i(·, T). First note that using the same localisation strategy as in [19] (i.e. multiplying the solution to (1.1) with a smooth cut-off function that has support inside the interval I) implies with γ > 0 to be fixed later on and where f_i is the solution to (1.1) with the reconstructed initial datum f_i(x, T) that additionally satisfies (A4)-(A6). Furthermore, I_2 is an interval As the p_i(t) are continuous, choosing γ sufficiently small guarantees that the derivatives of K^T_N appearing in the definition of θ_i are always evaluated away from their singularity; in particular they are bounded and locally Lipschitz-continuous, which implies with the local Lipschitz constant λ Taking the absolute value on both sides of (4.9) and using Lemma 4.6 implies so that Gronwall's lemma, together with (A3) and (A6), yields Next we exploit the fact that f_i(p_i(t), t) = 0 by taking the time derivative, which gives Subtracting the above equations for i = 1 and i = 2, using the definition of Λ_i and integrating in time we obtain, for Denoting by Λ = inf_{T ≤ s ≤ T+γ} Λ_1(s)Λ_2(s) and using (A3) this yields As a consequence of (

Numerical Simulation
We conclude by illustrating the proposed methodologies and confirming the obtained analytic results with various computational experiments. All simulations are performed on the domain [−L, L], which is split into N intervals of length h. The discrete grid points are denoted by x_i = ih. We compute solutions at discrete times t_k = k∆t, where ∆t is the discrete time step.
for k = 1, ..., K. Note that the perturbed price is still in C^1 and that p_{δ,0}(0) = p_{δ,k}(0) for all k = 1, ..., K. We use the first initial datum (5.1) and set K = 7. All other parameters are the same as in the first example. Figure 5 illustrates the linear increase of the error in the controls and the reconstruction as the noise level increases.
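As an illustration of the grid and time stepping described in this section, the following Python sketch evolves the heat equation F_t = F_xx (recall σ²/2 = 1) on [−L, L] with an explicit finite-difference scheme and reads off the price as the zero of F. It is a minimal sketch under stated assumptions and not the scheme used for the experiments reported here: the homogeneous Neumann conditions replace the transformed boundary conditions of Section 2.1, the grid is indexed as x_i = −L + ih, and the initial datum, the parameter values and all function names are hypothetical.

```python
import math


def heat_step(F, r):
    """One explicit Euler step for F_t = F_xx with r = dt / h**2 (stable for r <= 0.5)."""
    new = F[:]
    for i in range(1, len(F) - 1):
        new[i] = F[i] + r * (F[i - 1] - 2.0 * F[i] + F[i + 1])
    # Homogeneous Neumann conditions via mirrored ghost nodes (simplifying assumption).
    new[0] = F[0] + 2.0 * r * (F[1] - F[0])
    new[-1] = F[-1] + 2.0 * r * (F[-2] - F[-1])
    return new


def price(F, x):
    """Zero of F, found by linear interpolation at the first change
    from a positive to a non-positive value."""
    for i in range(len(F) - 1):
        if F[i] > 0.0 >= F[i + 1]:
            return x[i] + (x[i + 1] - x[i]) * F[i] / (F[i] - F[i + 1])
    raise ValueError("no sign change found")


def simulate(L=1.0, N=100, steps=1000, r=0.4):
    h = 2.0 * L / N
    x = [-L + i * h for i in range(N + 1)]
    dt = r * h * h
    # Hypothetical initial datum: positive (buyers) left of 0, negative (vendors) right of 0.
    F = [-math.sin(math.pi * xi / (2.0 * L)) for xi in x]
    prices = [price(F, x)]
    for _ in range(steps):
        F = heat_step(F, r)
        prices.append(price(F, x))
    return x, F, prices, steps * dt


if __name__ == "__main__":
    x, F, prices, t_end = simulate()
    print("final time:", t_end)
    print("final price:", prices[-1])
    print("buyer density at the left boundary:", F[0])
```

For this antisymmetric datum the price stays at 0 by symmetry, and the profile decays like exp(−(π/(2L))² t), which gives a simple check of the time stepping.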

Predicting price dynamics
We conclude with an example where we use the reconstructed buyer-vendor distribution f (x, T ) to estimate future price dynamics and illustrate the influence of noise in those.We consider an initial datum of the form

Summary and Outlook
We studied a data assimilation problem for a parabolic nonlinear free boundary problem. This partial differential equation describes the evolution of the price, that is the free boundary, in a large economic market. We developed an analytical and computational framework for the corresponding data assimilation problem, which is based on a previous work by Puel et al., see [20]. The free boundary splits the original problem into two parts, each of them defining a separate optimal control problem. We discussed analytic properties of the respective problems and derived stability estimates for the controls and the reconstructed unknown buyer-vendor distribution in the presence of noise. Finally, we confirmed and illustrated our results with computational experiments. We believe that the developed framework provides the basis for more general data assimilation problems in price formation. In [2] Burger et al. considered a Boltzmann type price formation model, which allows for more complex trading mechanisms. This problem is a system of nonlocal reaction-diffusion equations on the whole domain, where multiple prices (even with continuous distribution) and transaction rates can appear. Analogous questions can be asked for this problem if only the expectation of the price is to be predicted, but the problem could

Theorem 2.1 (Existence of f, p(t)). Let f_0 ∈ L^2(−L, L) and p_0 ∈ (−L + a, L − a) satisfy (A1). Then the BVP (1.1) has a global solution conserving the total mass of buyers and vendors iff the zero level set p = p(t) of the solution of (2.2)-(2.3) satisfies p(t) ∈ (−L + a, L − a) for all t > 0. Then the free boundary p(t) converges to the stationary price p_∞ ∈ (−L + a, L − a).

Figure 2: Left: Evolution of the price p = p(t); Right: Reconstructed and computed buyer-vendor distributions F.

Figure 3: Evolution of the controls u_1 and u_2.

Figure 4: Left: Evolution of the price p = p(t); Right: Reconstructed and computed buyer-vendor distributions F.

(a) Difference in the L^2 norm of the controls. (b) Difference in the C^1 norm of the reconstructions.

Figure 5: Difference in the controls and the reconstructions for different values of δ.