Dynamic Equilibrium Limit Order Book Model and Optimal Execution Problem

In this paper we propose a dynamic model of Limit Order Book (LOB). The main feature of our model is that the shape of the LOB is determined endogenously by an expected utility function via a competitive equilibrium argument. Assuming zero resilience, the resulting equilibrium density of the LOB is random, nonlinear, and time inhomogeneous. Consequently, the liquidity cost can be defined dynamically in a natural way. We next study an optimal execution problem in our model. We verify that the value function satisfies the Dynamic Programming Principle, and is a viscosity solution to the corresponding Hamilton-Jacobi-Bellman equation which is in the form of an integro-partial-differential quasi-variational inequality. We also prove the existence and analyze the structure of the optimal strategy via a verification theorem argument, assuming that the PDE has a classical solution.


Introduction
The effect of the liquidity of a security, both short term and long term, has been noticed by practitioners and researchers alike for quite some time. Tremendous efforts have been made in modeling liquidity costs as well as their impact on security prices (see, e.g., [4,2,8,15], to mention a few). In a frictionless market model (the Black-Scholes framework, for example), one assumes that securities can be bought or sold at a quoted price regardless of the trade size and the actual availability of the securities. But this is far from realistic. In practice, the disparity between supply and demand often causes the actual trade price to deviate from the fundamental price, leading to the bid-ask spread. As a consequence, some extra cost has to be paid in actual trading, especially when the volume of the trade is large relative to the existing liquidity on the market.
Unlike the quote driven market models, in which a market maker sets the price upon which all the trades are made, an "order-driven" market model is one that reflects more of the reality. In such a model, both buyers and sellers are allowed to be "patient" in the sense that they submit the "orders" containing the amount of the shares and the prices at which they are willing to buy or sell.
These orders are called limit orders. Unlike "market orders", which are executed immediately at the "market price" whenever there is sufficient liquidity, limit orders are executed only when an opposite order with a matching price comes in. Since limit orders are usually not executed immediately, a limit order book (LOB) is formed. Intuitively, a reasonable model of an LOB must contain the following basic elements: (i) The best ask/bid price (the frontier of the sell/buy LOB); (ii) The shape of the LOB (the volumes of the orders at each price).
There have been many papers in the literature that model and analyze the movement of the LOB (cf., e.g., [11,13,14,16] and the references cited therein), as well as the optimal execution/liquidation problems in which a large trader needs to acquire/liquidate a certain amount of stock within a given time horizon at minimal cost (see, e.g., [3,12,15]). Apart from the usual factors such as the fundamental price (or mid-price) and the liquidity (often referring to the total amount of shares available for trading), an important characteristic of an LOB is its "shape", that is, the "density" function of the LOB. This is particularly the case when the liquidity cost is among the main concerns. In most of the existing works, however, the shape of the LOB is assumed to be exogenously given, either in the simple "block-shaped" form (cf. e.g., [4,15]), or in a general given shape that is supposed to be determined by empirical studies (cf. e.g., [3,2,17] and the references cited therein). Such an assumption cannot adapt to changes in the market, especially when the underlying price is volatile within the time horizon concerned. A more desirable model would be one in which the shape of the LOB is determined endogenously, through more basic market factors such as the bid-ask spread, the fundamental prices (the "mid price", for example), and the market liquidity. This paper is an attempt in this direction.
To simplify the argument in this paper we shall consider only the "sell" side of the LOB, namely we assume that all the buyers are "impatient" in the sense that they only submit market orders, so there is no "buy" side LOB. Our first objective is to develop a dynamic model for the LOB whose shape is determined via the movement of the fundamental price, the instantaneous trading size, as well as the liquidity. The guiding principle of our model comes from the idea of equilibrium distribution, initiated by Rosu [16]. Roughly speaking, we assume that there exists a competitive equilibrium among all the prices in the LOB. The existence of such an equilibrium can be heuristically justified as a balance between the expected sell price and the cost of waiting (for the order to be executed). The equilibrium could be affected by the fundamental price, the execution of orders, the arrival of new orders, etc., and when an existing equilibrium is broken, every seller in the LOB will reposition until an equilibrium is reached. It should be noted that this equilibrium is "competitive" in the sense that one trader's deviation will be stopped by others' immediate undercutting. In other words, when the market is under monopoly, we should allow the distribution to behave differently. In this paper we assume, for simplicity, that the time needed to reach a new equilibrium is negligible, that is, the impact has zero duration, or "zero resilience".
We note that the issue of resilience is interesting in its own right (see, e.g., [4] and also [1,2]), but it is not the main focus of this paper.
Mathematically, we shall assume that the equilibrium density process takes the form $\mu^*_t = \mu^*(t, X_t, Q_t, y)$, $y \ge p_0$, where $p_0$ is the lowest (selling) price, $X$ is the fundamental value of the asset, and $Q$ is the total volume of the LOB. We also assume that the equilibrium is "quantified" by a common expected utility on each price, which depends on the fundamental price and the total liquidity, and is denoted by $U(X, Q)$. Our main premise is that, after each trade with size $\alpha \in [0, Q]$, the following two identities must hold:

$$\int_{p_0}^{p(\alpha)} \mu^*(y)\,dy = \alpha, \qquad \int_{p_0}^{p(\alpha)} y\,\mu^*(y)\,dy = \alpha\,U(X, Q-\alpha). \tag{1.1}$$

Here the first equality is self-evident: $p(\alpha) = p(\alpha, X, Q) \ge p_0$ is the price in the LOB at which the accumulated volume of sell limit orders between $p_0$ and $p(\alpha)$ is exactly equal to $\alpha$; whereas the second equality means that the average price sold should be equal to $U(X, Q-\alpha)$, the expected utility for the remaining LOB (a more detailed argument will be given in §3). Using the equations in (1.1) we will be able to solve explicitly for the process $\mu^*$ in terms of $U$, from which we will define the liquidity cost and argue that, modulo a term of order $\alpha^2$, where $\alpha$ is the trading size, it is linear (although time inhomogeneous) in $\alpha$. We also show that, under mild technical conditions, the average price (including liquidity cost) exactly coincides with the supply curve in the sense of Cetin-Jarrow-Protter [8].
The second goal of this paper is to consider an optimal execution problem, that is, to find an optimal strategy for purchasing a large block of shares within a prescribed time duration [0, T ] at minimum cost. Such a problem has been studied by many authors (cf. e.g., [2,4,6,17], and the references cited therein), but with the endogenously determined shape of the LOB, our problem seems to be new. We shall consider only two types of actions: the (buying) action of the large investor herself, and an aggregated action of all the other investors, which is modeled as a compound Poisson process, representing all incoming limit sell orders, canceled orders, and the market buy orders. In other words, without the buying action of the investor, whose accumulated purchase will be described by an increasing process, the movement of the total available shares in the LOB is a continuous time pure jump process. We then show that the Bellman Principle of dynamic programming holds in this case, and the value function is a viscosity solution of the resulting HJB quasi-variational inequality (QVI). Finally, in the case that the QVI has a classical solution, we shall analyze the optimal strategy by proving a verification theorem. It is noted that the continuation (or inaction) region in our model may not be simply connected, and as a consequence the optimal strategy may contain multiple (even infinitely many) jumps.
The rest of the paper is organized as follows. In §2 we give the necessary technical background and describe the basic elements of the model. In §3 we introduce the notion of equilibrium distribution, and analyze some important quantities that can be derived endogenously from such distribution. These in particular include bid-ask spread and the liquidity cost that play the fundamental role in our optimal execution problem. In §4 we introduce the optimal execution problem and study its various equivalent expressions. In §5 and §6 we prove the dynamic programming principle, derive the HJB equation, and prove that the value function is a viscosity solution to the corresponding HJB equation. Finally, §7 is devoted to the construction of an optimal strategy, in the case that the HJB equation has a classical solution.

Preliminaries
Throughout this paper we assume that all the randomness comes from a complete probability space (Ω, F, P) on which are defined a standard Brownian motion W = {W t : t ≥ 0}, and a standard Poisson process N = {N t : t ≥ 0} with intensity λ. In what follows the Brownian motion W represents the market noise that drives the fundamental value (or mid-price) of the underlying stock, and the Poisson process N represents the frequency of the incoming limit orders. Therefore it is reasonable to assume that W and N are independent. We shall denote by F W = {F W t : t ≥ 0} and F N = {F N t : t ≥ 0} the natural filtrations generated by W and N , respectively.
We consider a finite time horizon [0, T ]. For simplicity, we assume that there is only one stock traded in an order driven market, and the interest rate is 0. We first give the mathematical description of the basic elements involved in our model.
1. Fundamental Price. We assume that the underlying stock has a fundamental value (or mid-price) which is known to the public. But the market price deviates away from it, due to the possible illiquidity, which leads to the bid-ask spread. Since the fundamental value only affects our model as a source of randomness, we simply assume that it is a diffusion, and satisfies the following stochastic differential equation (SDE):

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \tag{2.1}$$

where b and σ satisfy the following standing assumptions: (H1) (i) b(·, ·) and σ(·, ·) are deterministic functions, continuous in t, and uniformly Lipschitz continuous in x, with a common uniform Lipschitz constant L > 0.
Remark 2.1 It is clear that the assumption (H1) guarantees the well-posedness of the SDE (2.1), and that the solution satisfies X t > 0 for all t ≥ 0, P-a.s. The continuity of b and σ in t is mainly for the viscosity property of the value function in §6 below. For notational simplicity, in this paper we assume W is 1-dimensional, but all the results can be extended to the higher dimensional case.
Moreover, we may even allow b and σ to be random, and all the results in §4 and §5 will still hold true, after obvious modification. However, in this case the HJB equation in Section 6 will become a backward stochastic PDE and the associated path dependent PDE. We refer to [10] for the related theory.
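To make the role of the fundamental price concrete, the following is a minimal simulation sketch of a diffusion with drift b and volatility σ using the Euler-Maruyama scheme. The coefficient functions `b_example` and `sigma_example`, the step count, and all numerical values are illustrative assumptions and are not taken from the paper; they are linear in x and hence Lipschitz, in the spirit of (H1).

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate dX = b(t, X) dt + sigma(t, X) dW on [0, T] with the Euler-Maruyama scheme."""
    dt = T / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x = x + b(t, x) * dt + sigma(t, x) * dw
        path.append(x)
    return path

# Hypothetical coefficients (assumptions for illustration only).
def b_example(t, x):
    return 0.05 * x

def sigma_example(t, x):
    return 0.2 * x

# One sample path of the fundamental price on [0, 1].
path = euler_maruyama(b_example, sigma_example, x0=100.0, T=1.0,
                      n_steps=250, rng=random.Random(0))
```

The scheme is only a discretization; it does not by itself guarantee positivity of the simulated path for large step sizes.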
2. The Limit Order Book (LOB). We assume that there are patient and impatient investors in the market, and they put different bid and/or ask prices to either liquidate or purchase the given stock based on their preferences (see §3 for more discussion on this). Since in this paper we consider the optimal execution problem for purchasing the stock, only the sell side LOB will be relevant. We thus assume in what follows that all the buyers are impatient and only make "market orders" (i.e., buying whatever is available on the market), and consequently there is no "buy side" LOB. Moreover, we isolate one particular investor, referred to as the investor, who will carry out the optimal execution problem later.
We shall assume that the movement of the LOB depends solely on the investment activities, namely those of the investor herself and of all other investors (buyers and sellers). For simplicity, we assume that the activities of the other investors are aggregated as a large investor whose investment activities are described by a compound Poisson process $Y_t = \sum_{i=1}^{N_t} \Lambda_i$, $t \ge 0$, where $\{\Lambda_i\}_{i \ge 1}$ is a sequence of i.i.d. random variables with distribution ν. We shall assume E{|Λ i |} < ∞. We should note that the large investor is allowed to make both limit orders and market orders (to buy and to sell), and can also cancel orders. Thus the Λ i 's will take values in R (i.e., ∆Y t < 0 is possible). It is useful to introduce the following filtration: , which will be the basic information source allowed in our execution problem. We notice that F N ⊂ F Y ⊂ F.
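As an illustration of the aggregated order flow, the sketch below simulates one path of a compound Poisson process with intensity λ and signed jump sizes. The jump-size sampler is a hypothetical choice made only for this example; the paper requires only that the jump distribution ν has a finite first moment.

```python
import random

def compound_poisson_path(lam, jump_sampler, T, rng):
    """Return the jump times tau_i in (0, T] and the values Y_{tau_i} of Y_t = sum_{i <= N_t} Lambda_i."""
    times, values = [], []
    t, y = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)      # inter-arrival times of N are Exp(lam)
        if t > T:
            break
        y += jump_sampler(rng)         # Lambda_i; negative values model cancellations or market buys
        times.append(t)
        values.append(y)
    return times, values

def value_at(times, values, t):
    """Right-continuous (cadlag) evaluation of Y at time t."""
    y = 0.0
    for tau, v in zip(times, values):
        if tau <= t:
            y = v
        else:
            break
    return y

# Hypothetical jump law (an assumption): Gaussian sizes with positive mean.
times, values = compound_poisson_path(5.0, lambda r: r.gauss(10.0, 30.0), 2.0, random.Random(42))
```

Evaluating with `value_at` makes the càdlàg convention explicit: the process takes its post-jump value at each jump time.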

3. The Inventory Process. We assume that the investor is trying to purchase a certain number, say K, of shares of the given stock within a given time horizon [0, T ], and denote the accumulated number of shares purchased up to time t ∈ [0, T ] by π t . Then clearly π = {π t : t ≥ 0} is an increasing process, and we assume that it is F-predictable. Note that, with this assumption, all the jump times of π are predictable, and consequently ∆π τ i ∆Y τ i = 0, since all jump times of N (and of Y ) are totally inaccessible. In fact, for practical reasons we could, and will, assume that N and Y have càdlàg paths but π is càglàd, and then naturally we have ∀t ∈ [0, T ], P-a.s.
Note that with such a definition the investor can observe the jump of Y and immediately jump afterwards. Clearly, each particular realization of π could be considered as an execution strategy.
We can now describe the dynamics of the total number of shares of the stock in the (sell) LOB, denoted by Q = {Q t : t ∈ [0, T ]}. We shall consider in this paper the simplest case in which the dynamics of Q can be affected by only two factors: the order made by the investor herself, π, and the orders made by the other large investor (or the aggregated action by all other market participants), Y . Then, it is readily seen that, for a given strategy π ∈ A and initial inventory q, the movement of Q π := Q π,q is determined by: Q π 0 := q, and (ii) When π is continuous, which will be the case in most of the paper, Q π is càdlàg.
We note from (2.3) that Q π τ i+1 ≥ 0. This is a natural constraint since the volume of the LOB can never be negative. However, not all π ∈ A will guarantee that the corresponding Q π t ≥ 0 for all t ∈ [0, T ]. We thus consider the following admissible strategies: given q ≥ 0, Throughout the paper, we shall denote (2.5) We remark that we do not take the closure for the first R + inŌ.

Equilibrium Distribution
In this section we introduce the notion of "equilibrium density" of the LOB, one of the most important ingredients in our model. Our idea follows that of Rosu [16], which we now describe. We assume that every seller comes into the market with the same amount of information (this is different from the asymmetric information assumptions, cf. [5]). Each seller sets his/her ask price based on personal preference, which is the combination of the expected return of the order and the possible lost value (or cost) due to, say, the waiting time for the order to be executed. In an equilibrium we assume that every seller will have the same "expected return" (or "expected utility") of the order, which we denote by U (X, Q), where X is the fundamental value of the stock and Q is the total number of shares available.
The existence of such equilibrium could be argued as follows. Suppose two sellers do not believe that they have the same expected return, then one of them (usually the one with lower expected return) is going to cancel his/her limit order and resubmit it to the market with a different ask price in exchange for a higher expected return. Then every seller in the market will do the same until an equilibrium is reached. We should point out that such an equilibrium approach only works when there is sufficient competition in the market. In fact, when the market is under monopoly, we should not expect the distribution to behave like this.
Given the expected return U (X, Q), we now introduce the concept of "equilibrium density".
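Recall that the density function of an LOB is a non-negative function $\mu(y) \ge 0$, $y \ge 0$, such that $\mu(y) = 0$ for $y < p_0$, where $p_0 \ge X$ is the lowest (best) ask price, and that

$$\int_{p_0}^{\infty} \mu(y)\,dy = Q. \tag{3.1}$$

We note that if $\mu(y) \equiv \mu$, $p_0 \le y \le p_0 + Q/\mu$, is a constant, then the LOB is said to have a "block shape" (see, e.g., [4] and [15]). Another way to study the problem is to assume that the "shape" of the LOB is given exogenously (see, e.g., [2,17]). Our main idea is to show that the shape function is determined by the following simple facts. Assume that a (large) market buy order comes in and $\alpha$ shares of the stock are purchased, where $\alpha \in (0, Q]$. We assume that the lowest portion of $\alpha$ shares in the LOB is consumed. Thus, if we denote by $p(0) = p(0, X, Q)$ the lowest ask price, then we can find $p(\alpha) > p(0)$ such that

$$\int_{p(0)}^{p(\alpha)} \mu(y)\,dy = \alpha. \tag{3.2}$$

On the other hand, we assume that, in equilibrium, the average price of the sold block should equal the expected return of the remaining orders in the LOB, which has a total of $Q - \alpha$ shares after the purchase. In other words, we assume that, for any $0 \le \alpha \le Q$,

$$\int_{p(0)}^{p(\alpha)} y\,\mu(y)\,dy = \alpha\,U(X, Q-\alpha). \tag{3.3}$$

Now taking derivatives with respect to $\alpha$ in (3.2) and (3.3) we obtain

$$\mu(p(\alpha))\,p'(\alpha) = 1, \qquad p(\alpha)\,\mu(p(\alpha))\,p'(\alpha) = U(X, Q-\alpha) - \alpha\,\partial_q U(X, Q-\alpha). \tag{3.4}$$

Solving the two equations in (3.4) we have

$$p(\alpha) = U(X, Q-\alpha) - \alpha\,\partial_q U(X, Q-\alpha), \tag{3.5}$$

$$\mu(p(\alpha)) = \frac{1}{p'(\alpha)} = \frac{1}{\alpha\,\partial_{qq} U(X, Q-\alpha) - 2\,\partial_q U(X, Q-\alpha)}. \tag{3.6}$$

We note that, by setting $\alpha = 0$ in (3.5),

$$p(0) = p(0, X, Q) = U(X, Q). \tag{3.7}$$

That is, the "frontier" of the LOB is exactly the representative of the equilibrium, as expected.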
On the other hand, since the function $\alpha \mapsto p(\alpha)$ is obviously non-decreasing, we can assume further that it is invertible and denote $h(y) = p^{-1}(y)$; then (3.6) becomes

$$\mu(y) = h'(y) = \frac{1}{p'(h(y))}, \qquad p(0) \le y \le p(Q). \tag{3.8}$$

Namely, the equilibrium density $\mu := \mu^{X,Q}$ can be explicitly derived, as long as $U(X, Q)$ is given.
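The construction can be checked numerically. Differentiating the two equilibrium identities in α suggests the price curve p(α) = U(X, Q − α) − α ∂_q U(X, Q − α) and the density µ(p(α)) = 1/p'(α). The sketch below evaluates both for a hypothetical utility U(x, q) = x + c/(1 + q), which is decreasing and convex in q; this functional form and the constant c are assumptions made purely for illustration, not the paper's choice.

```python
def U(x, q, c=5.0):
    """Hypothetical expected utility (an assumption): decreasing and convex in q, with U >= x."""
    return x + c / (1.0 + q)

def dU_dq(x, q, c=5.0):
    return -c / (1.0 + q) ** 2

def d2U_dq2(x, q, c=5.0):
    return 2.0 * c / (1.0 + q) ** 3

def price_at(alpha, x, Q):
    """p(alpha): the ask price reached once the cheapest alpha shares are consumed."""
    return U(x, Q - alpha) - alpha * dU_dq(x, Q - alpha)

def density_at(alpha, x, Q):
    """mu(p(alpha)) = 1 / p'(alpha), where p'(alpha) = -2 U_q + alpha U_qq at (x, Q - alpha)."""
    dp = -2.0 * dU_dq(x, Q - alpha) + alpha * d2U_dq2(x, Q - alpha)
    return 1.0 / dp

def block_cost(alpha, x, Q, n=20000):
    """Integral of y mu(y) dy over [p(0), p(alpha)], rewritten as int_0^alpha p(a) da (midpoint rule)."""
    h = alpha / n
    return sum(price_at((i + 0.5) * h, x, Q) for i in range(n)) * h
```

For this U the frontier satisfies `price_at(0, x, Q) == U(x, Q)`, and `block_cost(alpha, x, Q)` agrees with `alpha * U(x, Q - alpha)` up to quadrature error, which is the average-price identity used to define the equilibrium.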
We should remark here that the modeling of the expected return function U (X, Q) is itself an interesting and challenging problem. For example, in [16] such an expected return function was obtained explicitly by solving a recursive difference equation. Also, in a slightly different setting, the relationship between the bid-ask spread and the liquidity was considered by Avellaneda-Stoikov [5], in which an argument of indifference pricing was applied to construct the return function U . In what follows we shall assume the existence of such a function U , and furthermore, based on the discussion above, we make the following assumptions.
(H2) The expected utility function U : R + ×R + → R + enjoys the following properties: which leads further to the existence of its inverse so that the formula (3.8) makes sense. Moreover, This fact will be frequently used in our discussion.
(ii) (H2) obviously does not render the function U a true "utility function" in either variable.
In fact, the assumption (H2)-(i), which guarantees the positivity of the density function µ (see (3.6)), implies that U is decreasing and convex in Q, hence a "cost function" on Q in the usual sense. Of course, it would be reasonable to assume that U is concave in X, hence a utility on the price, but we do not need such an assumption in the rest of our discussion.
(iii) In practice, it is natural to assume further that U (x, q) ≥ x, or lim q→∞ U (x, q) = x. The latter implies that the liquidity premium vanishes as the supply goes to infinity. But technically we do not need them in this paper.
We conclude this section by observing that, given the density function µ = µ X,Q , the cost for buying α shares of stock can be easily calculated as

$$C(X, Q, \alpha) := \int_{p(0)}^{p(\alpha)} y\,\mu(y)\,dy = \alpha\,U(X, Q-\alpha), \tag{3.9}$$

where the last equality is due to (3.3). From this we obtain that Clearly, we can see that the liquidity cost consists of a linear part (with respect to the trade size α), due to the bid-ask spread; and a higher order part that is determined by the "shape" of the LOB. More precisely, assume for example p ′ (α) < ∞, then we can easily derive from (3.10) that In particular, if we consider a purchase strategy π = {π t }, then (3.11) amounts to saying that . Consequently, for a continuous strategy π c = {π c t , t ∈ [0, T ]}, the following calculation of the total cost will be useful in the rest of the paper: The following observation is worth noting. Assume that the function U is sufficiently regular, then by (3.3) we see that, for each α ∈ [0, Q], the process of "average price" of the stock counting liquidity cost, defined by is a semi-martingale. Furthermore, the assumption (H2) implies that it is convex and increasing with respect to the trade size α. In other words, the process S is exactly the supply curve in the sense of Cetin-Jarrow-Protter [8](!).
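The split of the liquidity cost into a linear spread part and a higher-order shape part can be illustrated numerically. The sketch below assumes that the liquidity cost of a block of size α is the total cost α U(X, Q − α) minus the fundamental value αX, and again uses the hypothetical utility U(x, q) = x + c/(1 + q); both are assumptions for this example only.

```python
def U(x, q, c=5.0):
    """Hypothetical expected utility (an assumption): decreasing and convex in q."""
    return x + c / (1.0 + q)

def liquidity_cost(alpha, x, Q):
    """Total block cost alpha * U(x, Q - alpha) minus the fundamental value alpha * x."""
    return alpha * (U(x, Q - alpha) - x)

def decompose(alpha, x, Q):
    """Split the liquidity cost into a part linear in alpha and a higher-order remainder."""
    linear = alpha * (U(x, Q) - x)                 # spread between the best ask U(x, Q) and x
    higher = alpha * (U(x, Q - alpha) - U(x, Q))   # depends on the shape of the book
    return linear, higher
```

For small α the remainder behaves like −∂_q U(x, Q) α², so `decompose(alpha, x, Q)[1] / alpha**2` approaches c/(1 + Q)² for this particular U, consistent with a second-order shape term.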

Optimal Execution Problem
We are now ready to introduce the main objective of the paper: the optimal execution problem.
Consider the scenario when an investor would like to purchase K shares of the stock within a prescribed time duration [0, T ]. Given initial inventory q ≥ 0 and a purchase strategy π ∈ A ad (q), we consider the following cost functional: where π c denotes the continuous part of π, and g : R + × [0, K] → R + is the terminal penalty function. Clearly, the first term is the cost for the jump part of π, and the second term is the cost of the continuous part of π. The value function is thus We shall assume that the terminal penalty function g satisfies the following assumption: (H3) (i) g is uniformly Lipschitz continuous in (x, y), with Lipschitz constant L > 0.
Remark 4.1 In the case π T < K, one is forced to purchase the remaining amount of shares y := K − π T at time T , regardless of the liquidity. The terminal penalty condition g(x, y) ≥ U (x, 0)y for y ≥ 0 amounts to saying that this price would be more expensive than the highest market price U (x, 0), the price with zero liquidity. Furthermore, by (H3)-(ii) we see that Therefore, if the final inventory is Q, and the investor needs to purchase a total of y shares, but decides to buy 0 < y ′ ≤ y ∧ Q from the LOB right before T and buys the remaining y − y ′ at the penalty price, then her total cost would be: recall (3.9), This again shows that it is disadvantageous to purchase everything at the terminal time.
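The point of this remark can be seen in a small numerical comparison. The sketch assumes a hypothetical linear penalty g(x, y) = κ U(x, 0) y with κ ≥ 1, which satisfies g(x, y) ≥ U(x, 0) y, together with the hypothetical utility U(x, q) = x + c/(1 + q); neither is the paper's specification.

```python
def U(x, q, c=5.0):
    """Hypothetical expected utility (an assumption): decreasing and convex in q."""
    return x + c / (1.0 + q)

def g(x, y, kappa=1.2):
    """Hypothetical terminal penalty (an assumption) with g(x, y) >= U(x, 0) * y for y >= 0."""
    return kappa * U(x, 0.0) * y

def terminal_cost(x, Q, y, y_book):
    """Buy y_book shares from the book just before T, then the remaining y - y_book at the penalty."""
    assert 0.0 <= y_book <= min(y, Q)
    return y_book * U(x, Q - y_book) + g(x, y - y_book)
```

With x = 100, Q = 50 and y = 10, taking `y_book = 10` is cheaper than `y_book = 5`, which in turn is cheaper than paying the penalty on all 10 shares.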
We now introduce two alternative expressions for V 0 to facilitate the future discussion. First, we define the set of continuous strategies by Clearly, if π ∈ A c ad (q), then Q π is càdlàg and C(X t , Q π t , ∆π t ) = 0. We thus define (4.4) Next, recall that p(0, X, Q) = U (X, Q) is decreasing in Q. Thus, for 0 < α ≤ Q, it holds that We now replace C(· · · ) by D(· · · ) in (4.1) and define (4.6) We note that since A c ad (q) ⊆ A ad (q), it follows from (4.5) that Our main observation is that the cost D(X, Q, α) can actually be approximated by continuous strategies, thus these inequalities should all be equalities. We substantiate this in the following theorem.
To this end, we fix arbitrary π ∈ A ad (q) and ε > 0. We claim that Indeed, for each m ∈ N, define τ m 0 := 0 and τ m i+1 Since π has right limits and the filtration F is right continuous, we see that Clearly, (π m ) c = π c and π m ≤ π. This implies that Q π m ≥ Q π and thus π m ∈ A ad (q). Moreover, Furthermore, since obviously one has lim m→∞ g(X T , K − π m T ) = g(X T , K − π T ), we conclude that lim m→∞ J 1 (π m ) ≤ J 1 (π), and thus there exists M such that Next, recall again that ∆π s ∆N s = 0 and thus τ i ≠ τ M j , P-a.s. for all i, j. Let δ > 0 be a small number. For each i = 1, · · · , M 2 , let j i be the smallest j such that τ j > τ M i . We remark that j i is random and τ j i is still an F-stopping time. Define π M,δ recursively as follows. First, π M,δ where we abuse the notation that τ M m 2 +1 := T . It is clear that π M,δ is continuous and π M,δ ≤ π M . This implies that π M,δ ∈ A c ad (q). Note that, by changing variable u : and that lim δ→0 P(τ M,δ i = τ M i + δ) = 1, thus we have lim δ→0 π M,δ T = π T , P-a.s. Now, by the monotonicity of U again and applying the dominated convergence theorem, We now choose δ > 0 small enough that J 0 (π M,δ ) ≤ J 1 (π M ) + ε 2 . By (4.9) and recalling that π M,δ ∈ A c ad (q), we prove (4.7), whence the theorem.
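The idea behind the approximation of jumps by continuous strategies can be illustrated by order splitting. Under zero resilience the book re-equilibrates after every trade, so buying α shares in n equal child orders costs the sum of n smaller block costs. The sketch below, which again uses the hypothetical utility U(x, q) = x + c/(1 + q), shows these costs decreasing towards the integral of U(x, Q − a) over a ∈ [0, α]; identifying that limit with the cost of a continuous strategy is an interpretation adopted here for illustration, not a formula quoted from the paper.

```python
import math

def U(x, q, c=5.0):
    """Hypothetical expected utility (an assumption): decreasing and convex in q."""
    return x + c / (1.0 + q)

def split_cost(alpha, x, Q, n):
    """Cost of buying alpha in n equal child orders, the book re-equilibrating after each one."""
    child = alpha / n
    total, q = 0.0, Q
    for _ in range(n):
        total += child * U(x, q - child)   # block cost of one child order of size alpha / n
        q -= child                         # remaining depth after the child order
    return total

def continuous_limit(alpha, x, Q, c=5.0):
    """Closed form of int_0^alpha U(x, Q - a) da for the hypothetical U above."""
    return alpha * x + c * math.log((1.0 + Q) / (1.0 + Q - alpha))
```

With n = 1 the function returns the one-shot block cost `alpha * U(x, Q - alpha)`; refining the split lowers the cost, which is why a single large jump is never cheaper than trading the same amount gradually in this setting.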

Remark 4.3 (i) We note that the cost functional J 0 (t, x, k, q; π) in (4.14) uses only continuous strategies. It will facilitate the argument when we prove that the value function V is a viscosity solution to the HJB equation in §5 and §6.
(ii) The cost functional J 1 (t, x, k, q; π) will be useful when we investigate the existence of an optimal strategy in §7. Recall from Theorem 4.2 the inequality V 0 0 ≤ V 0 ≤ V 1 0 . Thus an optimal strategy, if it exists, should also optimize J 1 . However, it is worth noting that the cost function D(· · · ) does not have a practical meaning, as opposed to the cost function C(· · · ), and in practice it cannot be implemented directly. Nevertheless, combining the approximations (4.8) and (4.10) in the proof of Theorem 4.2, we will be able to find an implementable good approximation of the optimal strategy, as we shall see in §7.

Dynamic Programming Principle
In this section we verify some properties of the value function V and establish the Dynamic Programming Principle (DPP). As we pointed out in Remark 4.3-(i), we shall consider the cost functional J 0 . We begin with the regularity of V with respect to the "spatial variables" x, k, and q, respectively. is non-decreasing in x, non-increasing in k and q, respectively, and uniformly Lipschitz continuous with respect to (x, k, q) ∈ Ō.
We can now follow the standard arguments in the literature to establish the following simpler form of the dynamic programming principle, in which the time increments are deterministic.
Proposition 5.2 Assume (H1) -(H3). Then, for any 0 ≤ t 1 < t 2 ≤ T and (x, k, q) ∈ Ō,
Proof. Let Ṽ (t 1 , x, k, q) denote the right side of (5.6). We first show that V (t 1 , x, k, q) ≥ Ṽ (t 1 , x, k, q). Indeed, for any π ∈ A c ad (t 1 , k, q), letπ denote the restriction of π on [t 2 , T ]. Then X In other words,π ∈ A c ad (t 2 , π t 2 , Q π,q t 2 ). This implies that We remark that in the above the last equality can be proved rigorously by using the notion of regular conditional probability distribution. Since the argument would be rather lengthy but more or less standard, we omit the details. Now taking the infimum over π ∈ A c ad (t 1 , k, q) on both sides of the above, we obtain V (t 1 , x, k, q) ≥ Ṽ (t 1 , x, k, q).
To prove the opposite inequality, we first fix ε > 0, and consider a countable partition {O i } ∞ i=1 ofŌ and (x i , k i , q i ) ∈ O i , i = 1, 2 · · · , such that, for any (x, k, q) ∈ O i , it holds that |x − x i | ≤ ε, For any (x, k, q) ∈ O i , note that π i − k i + k ∈ A c ad (t 2 , k, q i ) ⊂ A c ad (t 2 , k, q). Then, by (5.1), (5.3), (5.4), and applying Proposition 5.1, for a generic constant C we have Now for any π ∈ A c ad (t 1 , k, q), define a new strategyπ: It is clear thatπ t 1 = k,π is continuous and non-decreasing on [t, T ], andπ T ≤ π i T ≤ K on each O i . Moreover, Qπ ,q s = Q π,q s ≥ 0 for s ∈ [t 1 , t 2 ], and for s ∈ [t 2 , T ], on O i we have Thusπ ∈ A c ad (t 1 , k, q), and therefore, it follows from (5.7) that Now, since ε > 0 is arbitrary and π ∈ A c ad (t 1 , k, q), we conclude that V (t 1 , x, k, q) ≤Ṽ (t 1 , x, k, q), proving the proposition.
As a corollary of Proposition 5.2, we shall prove the temporal regularity of V . We note that this will be a crucial step towards the general form of the dynamic programming principle.
Corollary 5.3 Assume (H1)-(H3). Then, for any 0 ≤ t 1 < t 2 ≤ T and (x, k, q) ∈ Ō, we have
Proof. First note that the constant process k ∈ A c ad (t 1 , k, q). Then, by Propositions 5.2 and 5.1, Next, recall from §2 that the dynamics of Q (see (2.3)) is driven by the compound Poisson process Y , whose jump size Λ i 's and the jump times τ i 's are independent. Then one can easily check: Consequently, we obtain On the other hand, since U ≥ 0 and V is decreasing in q, where the last inequality is due to (5.9). This, together with (5.10), leads to (5.8).
To conclude this section we give a general version of the dynamic programming principle.
Denote by T t the set of all F-stopping times taking values in (t, T ].
Theorem 5.4 Assume (H1)-(H3). Then, for any (t, x, k, q) ∈ [0, T ) × Ō and any τ ∈ T t ,
Proof. For each π ∈ A c ad (t, k, q) and τ ∈ T t , let I(π, τ ) denote the expectation on the right side of (5.11). Following the arguments in Proposition 5.2 one can easily show that V (t, x, k, q) ≥ inf π∈A c ad (t,k,q) I(π, τ ). So it suffices to prove the reversed inequality: V (t, x, k, q) ≤ inf π∈A c ad (t,k,q) I(π, τ ). (5.12) We first assume that τ ∈ T t takes only finitely many values t < t 1 < · · · < t m ≤ T . We prove (5.12) by induction on m. When m = 1, (5.12) follows from Proposition 5.2. Now assume that (5.12) holds for m − 1, and that τ takes m values. For any π ∈ A c ad (t, k, q), we have Note that {τ > t 1 } ∈ F t 1 and τ takes only m − 1 values on {τ > t 1 }. By the induction hypothesis we have where the last inequality is due to Proposition 5.2. Since π ∈ A c ad (t, k, q) is arbitrary, we have proved (5.12) for m, completing the induction.
To prove (5.12) for arbitrary τ ∈ T t , we first find τ n ∈ T t , n = 1, 2, · · · , each taking finitely many values, such that τ n − τ ≤ 1/n and τ n ↓ τ , as n → ∞. By the previous arguments we see that (5.12) holds for each τ n . That is, V (t, x, k, q) ≤ I(π, τ n ) for each π ∈ A c ad (t, k, q). Moreover, by definition of I(π, τ ) we have Applying Corollary 5.3 and noting that π is continuous we see that the right hand side above converges to 0 as n → ∞. Consequently we obtain that V (t, x, k, q) ≤ I(π, τ ) for each π ∈ A c ad (t, k, q). This implies (5.12), and hence concludes the proof.
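A discrete-time caricature may help to see how the dynamic programming principle is used. The sketch below solves a toy execution problem by backward induction on a small integer grid: the fundamental price is frozen at x, the investor buys a whole number of shares at each step at the block cost a U(x, q − a), and the book is refilled by a fixed amount with a given probability. Every modelling choice here (the utility U, the penalty g, the refill mechanism, the grid sizes) is an assumption made for illustration and is far simpler than the continuous-time model of the paper.

```python
def U(x, q, c=5.0):
    """Hypothetical expected utility (an assumption): decreasing and convex in q."""
    return x + c / (1.0 + q)

def solve_toy_execution(N=4, K=6, Qmax=8, x=100.0, p_arrival=0.5, refill=2, kappa=1.2):
    """Backward induction V_n(k, q) = min_a { a U(x, q - a) + E[V_{n+1}(k + a, q')] }."""
    def g(y):
        # Hypothetical terminal penalty, chosen so that g(y) >= U(x, 0) * y.
        return kappa * U(x, 0.0) * y

    # Terminal condition: V_N(k, q) = g(K - k).
    V = {(k, q): g(K - k) for k in range(K + 1) for q in range(Qmax + 1)}
    for _ in range(N):
        V_next, V = V, {}
        for k in range(K + 1):
            for q in range(Qmax + 1):
                best = float("inf")
                # Admissible trades: 0 <= a <= q (book stays non-negative) and k + a <= K.
                for a in range(min(q, K - k) + 1):
                    q_stay = q - a
                    q_up = min(q - a + refill, Qmax)
                    cont = ((1.0 - p_arrival) * V_next[(k + a, q_stay)]
                            + p_arrival * V_next[(k + a, q_up)])
                    best = min(best, a * U(x, q - a) + cont)
                V[(k, q)] = best
    return V

V0 = solve_toy_execution()  # value function at the initial time on the (k, q) grid
```

In this toy problem the computed value is zero once k = K, and it is non-increasing in both k and q, mirroring the monotonicity properties stated for the value function in this section.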

The HJB equation
In this section we shall prove that the value function, while not necessarily smooth, is a viscosity solution of the Hamilton-Jacobi-Bellman equation of the optimal execution problem.
We begin by introducing some notation. For simplicity we often use the equivalent notation for partial derivatives: ∂ t ϕ = ∂ϕ/∂t. The notations ∂ x ϕ, ∂ k ϕ, ∂ q ϕ, and ∂ xx ϕ are defined similarly. In this and the next section, we denote by C 1,2 b ([0, T ] × Ō) the set of continuous functions ϕ on [0, T ] × Ō such that the partial derivatives ∂ t ϕ, ∂ x ϕ, ∂ k ϕ, ∂ q ϕ, and ∂ xx ϕ exist and are continuous and bounded.
For each t ∈ [0, T ), we introduce a new filtration: Moreover, in light of the cost functional J 1 in (4.13) and the DPP (5.13), we define, for each (t, x, k, q) ∈ [0, T ) ×Ō, π ∈ A ad (t, x, k, q), ϕ ∈ C([0, T ] ×Ō), and F-stopping time τ , Next, we let τ t 1 be the first jump time of N after t and ν is the common distribution of the jump size random variables Λ i 's. We remark here that, by definition (6.1) it is clear that (τ t 1 , ∆Y τ t 1 ) is independent ofF t , and hence τ t 1 is not an F t -stopping time(!). Furthermore, we have the following result that is important for our discussion. Lemma 6.1 For any fixed (t, k, q) and any π ∈ A ad (t, k, q), there exists anF t -adapted processπ such thatπ s∧τ t 1 = π s∧τ t 1 , for all s ≥ t, P-a.s.
Proof. We first note that since π is left continuous, we need only find aF t -adapted processπ such that, for any fixed s ≥ t P{π s 1 {τ t 1 >s} = π s 1 {τ t 1 >s} } = 1. This amounts to saying that given s ≥ t, and X ∈ L 0 (F s ), there existsX ∈ L 0 (F t s ) such that X1 {τ t 1 >s} =X1 {τ t 1 >s} , P-a.s. Although this last statement is more or less standard (see, e.g., [7]), we nevertheless give a brief proof for completeness. We fix s > t and denote Clearly, H s ⊆ L 0 (F s ). We claim that H s ⊇ L 0 (F s ). Indeed, note that F s =F t s ∨ σ{Y r , t ≤ r ≤ s}. By a simple monotone class argument, for any X ∈ L 0 (F s ), we need only assume either X ∈ L 0 (F t s ) or X = Y r for some r ∈ [t, s]. But in the former case we can chooseX = X, and in the latter case we chooseX = Y t . Since in both casesX ∈ L 0 (F t s ), we conclude that X ∈ H s . This proves the claim, whence the lemma.
Now for any ϕ ∈ C 1,2 b ([0, T ] × Ō) we introduce the following integro-differential operators: The following lemma is crucial.
Lemma 6.2 Assume $\phi\in C^{1,2}_b([0,T]\times\bar O)$ and $\tau$ is an $\tilde{\mathbb F}^t$-stopping time. Then it holds that
where $L$ and $M$ are defined by (6.3).
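The proof of Lemma 6.1 above rests on the observation that the jump process is frozen at its time-$t$ value until the first jump after $t$. The following Python sketch illustrates this on one simulated path. It assumes, purely for illustration, that $Y$ is a compound Poisson process driven by $N$ with i.i.d. jump sizes $\Lambda_i$; the function names, the jump rate, and the Gaussian jump law are hypothetical and are not taken from the paper.

```python
import random


def simulate_jumps(rate, horizon, jump_size, rng):
    """Jump times and sizes of a compound Poisson path on [0, horizon].

    Assumption: inter-arrival times are exponential with the given rate,
    and jump_size(rng) draws one jump from the (hypothetical) law nu.
    """
    times, sizes = [], []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        sizes.append(jump_size(rng))
        t += rng.expovariate(rate)
    return times, sizes


def value_at(r, times, sizes):
    """Y_r: the sum of all jumps that occurred at or before time r."""
    return sum(size for u, size in zip(times, sizes) if u <= r)


def first_jump_after(t, times):
    """tau^t_1: first jump time strictly after t (infinity if there is none)."""
    later = [u for u in times if u > t]
    return min(later) if later else float("inf")


def frozen_before_first_jump(t, times, sizes, horizon, n_grid=10):
    """Check Y_r == Y_t on a grid of points r in [t, min(tau^t_1, horizon))."""
    s = min(first_jump_after(t, times), horizon)
    grid = [t + j * (s - t) / n_grid for j in range(n_grid)]
    y_t = value_at(t, times, sizes)
    return all(value_at(r, times, sizes) == y_t for r in grid)


rng = random.Random(0)
times, sizes = simulate_jumps(2.0, 5.0, lambda g: g.gauss(0.0, 1.0), rng)
print(frozen_before_first_jump(1.0, times, sizes, 5.0))
```

On the event $\{\tau^t_1 > s\}$ the check amounts to replacing $Y_r$ by $Y_t$, which is the choice $\tilde X = Y_t$ made in the proof.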

Moreover, V is called a viscosity solution if it is both a viscosity subsolution and supersolution.
Our main result of this section is the following theorem.
Proof. The terminal condition $V(T,x,k,q) = g(x,K-k)$ is obvious. Moreover, note that if $\pi_t = K$, then $\pi_s\equiv K$ for all $s\in[t,T)$, as there is no need to purchase any more. Thus $d\pi_s = 0$ for $s\in[t,T]$, and clearly $g(X_T, K-\pi_T) = g(X_T,0) = 0$. That is, $V(t,x,K,q) = 0$. So Definition 6.3 (i) holds (with equalities), and thus it suffices to check Definition 6.3 (ii) and (iii).
We now turn to the viscosity supersolution property. We first check Definition 6.3 (iii). Let x, k, 0). For any $\pi\in A^c_{ad}(t,k,0)$, since there is no liquidity ($q=0$), there is no possibility of trading, and thus it must hold that $\pi_s\equiv k$ and $Q^{\pi,0}_s = 0$ for $s<\tau^t_1$. Then, by (6.10), (6.13) and (6.9) again, we have
Dividing both sides above by $\delta$ and then sending $\delta\to 0$, similarly to the case (6.15), we can prove Definition 6.3 (iii).
It remains to verify Definition 6.3 (ii). Suppose to the contrary that
for some $(t,x,k,q)\in[0,T)\times O$ and $\phi\in A(t,x,k,q)$. Then, applying Theorem 5.4 on $\tau^t_1$ we can find $\pi := \pi^\delta\in A^c_{ad}(t,k,q)$ such that
Now let $\tilde\pi$ be the $\tilde{\mathbb F}^t$-adapted version of $\pi$, as defined in Lemma 6.1, and $\tilde Q^\pi_s = q-\tilde\pi_s+k$, $s\ge t$. For any $\delta>0$, define the following stopping times:
Then $\tau'_\delta$ is an $\tilde{\mathbb F}^t$-stopping time. Similarly to the first part of Proposition 5.2 we can show that
Now following the derivation of (6.15) we obtain
Since $\phi$ is smooth, we deduce from (6.17) that, for $\delta$ small enough,
Thus it follows from (6.19) that
Finally, recalling (6.16) and noting that
we derive from (6.20) that $\delta^2\ge\frac{c}{2}\delta - C(1+|x|^4)\delta^2$. But this is impossible when $\delta>0$ is small enough, which contradicts the assumption (6.17). This completes the proof.
Remark 6.5 (i) If the value function actually has the regularity $V\in C^{1,2}_b([0,T]\times\bar O)$, then instead of being merely a viscosity solution, it is a classical solution to the QVI (6.11). Moreover, by Theorem 7.4 below, we see that the classical solution is unique.
(ii) We note that one may try to analyze uniqueness in the sense of viscosity solutions by following the standard techniques (see the classical reference [9]). However, since our main focus is the dynamic equilibrium model of the limit order book, we prefer not to pursue this in this already lengthy paper and leave it to the interested reader.

Description of Optimal Strategy
In this section we give a characterization of the optimal strategy. Our argument will be based on the assumption that the HJB equation has a "classical solution", which will not be substantiated in this paper, as it is itself a challenging problem. Our main purpose is to see the possible structure of the optimal strategy and compare it to the usual optimal singular stochastic control in the literature.
Our starting point is the following partial Verification Theorem.
In the rest of the section we shall find an optimal strategy $\pi^*\in A_{ad}(0,0,q)$ such that (7.2), hence (7.1), holds with equality, given the existence of the classical solution $v$ of the QVI (6.11)-(6.12). We remark, however, that although it is interesting in theory, $\pi^*$ is in general not implementable, since the cost $D$ in the expression $J_1$ of (4.13) is not the real jump cost. Nevertheless, as was pointed out in Remark 4.3, this $\pi^*$ provides a very good and implementable approximate optimal strategy.
To help identify the optimal strategy $\pi^*$, we first provide some sufficient conditions. Without loss of generality, we shall only focus on the interval $[0,\tau_1]$, corresponding to the term in (7.2) with $i=0$. To be more precise, we want to find $\pi\in A_{ad}(0,0,q)$ such that $e^{\pi,0} := \mathbb E$
To this end, for any $(t,x,k,q)$, $O(t,x,q)$ is an open set in $[0,K\wedge q]$, and $\varphi$ is $\mathbb F^W$-progressively measurable, non-decreasing in $k$, such that $\varphi(t,k,q)\ge k$, and $\varphi(t,k,q) = k$ for $k\in O(t,X_t,q)$. We have the following result.
That is, $\pi_t\in\bar O(t,X_t,q)$. But since $v$ solves the variational inequality (6.11), it is easy to see that $L[v](t,X_t,y,q-y) = 0$ holds whenever $M[v](t,X_t,y,q-y)>0$, namely, for any $y\in O(t,X_t,q)$. The continuity of $v$ then yields $L[v](t,X_t,y,q-y) = 0$ on $\bar O(t,X_t,q)$.
We next show that such a $\pi$ indeed exists. Fix $(x,q)$. In light of Proposition 7.2 we introduce:
$$A_0 = \Big\{\pi\in A_{ad}(0,0,q) : \mathbf 1_{O(t,X_t,q)}(\pi_t)\,d\pi^c_t = 0,\ \pi_{t+}\le\varphi(t,\pi_t,q),\ t\in[0,\tau_1),\ \mathbb P\text{-a.s.}\Big\}. \tag{7.10}$$
Clearly, $\pi_t\equiv 0\in A_0$, thus $A_0\ne\emptyset$. We shall construct the optimal strategy from this set.
Proof. We shall prove the existence by using Zorn's lemma. To this end, we introduce a partial order in $A_0$:
$$\pi^1\prec\pi^2 \ \text{ if and only if } \ \pi^1_t\le\pi^2_t \ \text{ for all } t\in[0,T],\ \mathbb P\text{-a.s.} \tag{7.11}$$
We claim that every totally ordered subset in $A_0$ has an upper bound in $A_0$. Indeed, let $\{\pi^i\}_{i\in I}\subseteq A_0$ be a totally ordered subset, where the index set $I$ could be uncountable. Denoting by $\mathbb Q_T$ the set of all rationals in $[0,T]$, we define
$$\pi_r := \operatorname*{esssup}_{i\in I}\pi^i_r, \quad \forall r\in\mathbb Q_T. \tag{7.12}$$
Since $\{\pi^i\}$ is totally ordered, by a standard argument we can find a sequence $\pi^n = \pi^{i_n}$, $i_n\in I$, $n=1,2,\cdots$, such that the $\pi^n$'s are non-decreasing in $n$ and
$$\lim_n\pi^n_r = \operatorname*{esssup}_{i\in I}\pi^i_r = \pi_r, \quad \forall r\in\mathbb Q_T. \tag{7.13}$$
We then define $\pi_t := \lim_{r\uparrow t,\, r\in\mathbb Q_T}\pi_r$ for all $t\in(0,T]$. We shall prove that $\pi\in A_0$, and is therefore an upper bound of $\{\pi^i\}$. Clearly $\pi$ is $\mathbb F$-adapted, non-decreasing, left continuous, and $\pi_0 = 0$, $\pi_T\le K$. Moreover, since $Q^{\pi^n}\ge 0$, clearly $Q^\pi_r\ge 0$ for all $r\in\mathbb Q_T$, which implies $Q^\pi_t\ge 0$ for all $t\in[0,T]$ and thus $\pi\in A_{ad}(0,0,q)$.
Summarizing, we have shown that every totally ordered subset of $A_0$ has an upper bound.
This, together with Proposition 7.1, completes the proof.
Remark 7.5 Based on Proposition 7.2 we can roughly describe the optimal strategy $\pi^*$ as follows.
At each time $t\in[\tau_i,\tau_{i+1}]$ between two consecutive jump times of $N$, there is an "inaction region" $O(t,X_t,Q^{\pi^*}_{\tau_i})$, which is an open set and can therefore be decomposed into countably many disjoint open intervals. If $\pi^*_t-\pi^*_{\tau_i}\in O(t,X_t,Q^{\pi^*}_{\tau_i})$, then $\pi^*$ stays "flat." If it is at the boundary of $O(t,X_t,Q^{\pi^*}_{\tau_i})$, hence at the boundary of one of the open intervals, then it either jumps to $\varphi(t,X_t,Q^{\pi^*}_{\tau_i})$, i.e., the boundary of the nearest neighboring interval above it, if $\varphi(t,X_t,Q^{\pi^*}_{\tau_i})>\pi^*_t$, or moves along with the boundary of $O(t,X_t,Q^{\pi^*}_{\tau_i})$, when $\varphi(t,X_t,Q^{\pi^*}_{\tau_i}) = \pi^*_t$. In particular, when $O(t,X_t,Q^{\pi^*}_{\tau_i})$ is connected, that is, a single interval, $\pi^*$ essentially behaves like an optimal singular stochastic control. However, it is not clear to us that $O(t,X_t,Q^{\pi^*}_{\tau_i})$ will be connected, and consequently the optimal strategy may jump multiple (even infinitely many) times within $[\tau_i,\tau_{i+1}]$.
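The qualitative rule in Remark 7.5 can be made concrete with a small discrete-time toy in Python. This is only a sketch under stated assumptions, not the paper's construction: the inaction region is supplied by hand as a finite list of disjoint open intervals (in the paper it is determined by the classical solution $v$), time is discretized, and the hypothetical map `phi` is taken to be $\inf\{y\ge k : y\in O\}$, which is one map that is non-decreasing in $k$, satisfies $\varphi\ge k$, and equals $k$ on $O$. When no point of $O$ lies at or above $k$, the sketch sets `phi(k) = k` by convention.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # an open interval (a, b) with a < b


def in_region(k: float, region: List[Interval]) -> bool:
    """True if k lies in one of the open intervals of the inaction region."""
    return any(a < k < b for a, b in region)


def phi(k: float, region: List[Interval]) -> float:
    """Hypothetical target map: inf{y >= k : y in region}, or k if empty."""
    if in_region(k, region):
        return k
    # Outside the region, the nearest point of the region above k is the
    # smallest left endpoint that is at least k.
    lefts = [a for a, _ in region if a >= k]
    return min(lefts) if lefts else k


def run(pi0: float, regions: List[List[Interval]]) -> List[float]:
    """Apply the rule once per time step; regions[j] is the region at step j."""
    path = [pi0]
    for region in regions:
        # Flat inside the region; otherwise move up to phi (a jump if the
        # gap is large, a small push if the boundary has only drifted).
        path.append(phi(path[-1], region))
    return path


regions = [
    [(0.0, 0.3), (0.5, 0.8)],  # pi sits at 0.3, the top of the first interval
    [(0.0, 0.3), (0.5, 0.8)],  # pi sits at 0.5, on the boundary
    [(0.55, 0.8)],             # the boundary has moved up to 0.55
    [(0.4, 0.8)],              # pi is now strictly inside the region
]
print(run(0.3, regions))
```

In this toy path the strategy first jumps across the gap between two intervals, then waits on the boundary, is pushed up as the boundary moves, and finally stays flat inside the region. With a single interval only the last three behaviors occur, which is the singular-control-like case described in the remark.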