H\"older regularity for stochastic processes with bounded and measurable increments

We obtain an asymptotic H\"older estimate for expectations of a quite general class of discrete stochastic processes. Such expectations can also be described as solutions to a dynamic programming principle or as solutions to discretized PDEs. The result, which is also generalized to functions satisfying Pucci-type inequalities for discrete extremal operators, is a counterpart to the Krylov-Safonov regularity result in PDEs. However, the discrete step size $\varepsilon$ has some crucial effects compared to the PDE setting. The proof combines analytic and probabilistic arguments.


Introduction
The celebrated Krylov-Safonov [KS79] Hölder estimate is one of the key results in the theory of non-divergence form elliptic partial differential equations with bounded and measurable coefficients. The result, in addition to being important in its own right, also provides a flexible tool in higher regularity and existence theory, owing to its very general assumptions on the coefficients.
In this paper, we study the regularity of expectations of a quite general class of discrete stochastic processes or, equivalently, of functions satisfying the dynamic programming principle (DPP)
\[ u(x) = \alpha \int u(x+\varepsilon z)\, d\nu_x(z) + \beta \fint_{B_\varepsilon(x)} u(y)\, dy + \varepsilon^2 f(x), \tag{1.1} \]
where $f$ is a Borel measurable bounded function, $\nu_x$ is a symmetric probability measure for each $x$ with support in $B_\Lambda$, $\Lambda \ge 1$, $\varepsilon > 0$, and $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta > 0$. From a stochastic point of view, our processes are generalizations of the random walk where the next step in the process is taken according to a probability measure that is a combination of $\nu_x$ and the uniform distribution on $B_\varepsilon(x)$, as described by the DPP (more details are given in Sections 2.1 and 2.3).
It is important to notice that $\nu_x$ can vary quite freely from point to point. Under continuity or other assumptions not satisfied in our case, related results have been studied for example in [LR86], [CS09] and [Kus15].
The first of the two main results of this article is a Hölder estimate in the discrete setup without any further continuity assumption on the measures $\nu_x$.
Theorem 1.1. There exists ε 0 > 0 such that if u is a function satisfying (1.1) in B 2R with ε < ε 0 R, then for every x, z ∈ B R , where C > 0 and γ > 0 are constants independent of ε.
The role of the discrete processes we study can be compared to the role of linear uniformly elliptic partial differential equations with bounded and measurable coefficients in the theory of PDEs.
The regularity techniques in PDEs, see [KS79] and [CC95], or in the nonlocal setting, see [CS09] and [GS12] utilize, heuristically speaking, the fact that there is information available in all scales. We also refer to [CD12] and [SS16] for similar results regarding nonlocal operators with nonsymmetric kernels. For a discrete process, the step size sets a natural limit for the scale, and this limitation has some crucial effects. Indeed, the value can even be discontinuous, and our estimates are asymptotic. Such estimates suffice in many applications, for example in passing to the limit with solutions to discretized PDEs or stochastic processes. Similar results have been obtained on a grid in the context of difference equations with random coefficients in [KT90]. See also [Law91], where regularity estimates for difference equations arising from random walks are obtained using probabilistic techniques.
The proof uses a stochastic approach akin to the original proof of Krylov and Safonov in [KS79], with some crucial differences. The first observation, as suggested above, is that the function $u$ in (1.1) can be presented as an expectation. The key estimate is then Theorem 5.7, stating that we can reach any set of positive measure with a positive probability before exiting a bigger cube. With this result at our disposal, the De Giorgi oscillation estimate, Lemma 5.8, follows in a straightforward manner. Indeed, we can reach a level set with a positive probability and use this in estimating the oscillation. The Hölder estimate, Theorem 1.1, then follows from the De Giorgi oscillation lemma after a finite iteration.
The proof of Theorem 5.7 is nonstandard. In the proof we would like to construct a set of cubes which is large enough and such that the set we want to reach has a high enough density in the cubes. Both conditions, however, cannot always be satisfied simultaneously in our setting. As suggested above, both the PDE and nonlocal techniques utilize the information in all scales. Concretely, a rescaling argument is used in those proofs in arbitrarily small cubes. In contrast, in our case the step size $\varepsilon$ sets a limit for the scale. If we simply drop all the cubes of scale smaller than $\varepsilon$ in the usual Calderón-Zygmund decomposition, we have no control on the size of the error. Therefore, the cubes of scale $\varepsilon$ need to be taken into account separately both in the decomposition lemma, Lemma 5.4, and in the proof of the key intermediate result, Theorem 5.7.
The proof of Theorem 5.7 is based on the $\varepsilon$-version of the Alexandrov-Bakelman-Pucci (ABP) estimate with bounded and measurable right hand side, Theorem 4.7. However, the classical proof of the ABP estimate, which uses the change of variables formula for integrals to obtain a quantity that can be estimated by the PDE, does not seem directly applicable here. Instead, we adapt the nonlocal approach of Caffarelli and Silvestre [CS09]. A second remark is that we directly apply the ABP estimate with a discontinuous right hand side, which is chosen to be a characteristic function of a level set. In this case, the standard ABP estimate having the $L^N$-norm on the right hand side is false (Example 4.6), and therefore the statement of Theorem 4.7 is weaker, but sufficient for our purposes.
Our study is partly motivated by the aim of developing stochastic methods in connection with the $p$-Laplace equation and other nonlinear PDEs, see Example 2.4 and Remark 7.6. The Hölder estimate, Theorem 1.1, can be generalized to functions merely satisfying the Pucci-type inequalities, later modified and shortened to the form $L_\varepsilon^+ u \ge -|f|$, $L_\varepsilon^- u \le |f|$. This is our second main result.
Theorem 1.2. There exists ε 0 > 0 such that if u is a function satisfying (1.2) and (1.3) for every x ∈ B 2R with ε < ε 0 R, then for every x, z ∈ B R , where C > 0 and γ > 0 are constants independent of ε.
We refer the reader to Section 7 and in particular to Theorem 7.3 for a more detailed description.

Preliminaries
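As above, let $\Lambda \ge 1$, $\varepsilon > 0$, $\beta \in (0,1]$ and $\alpha = 1-\beta$. Every constant may depend on $\Lambda$, $\alpha$, $\beta$ and the dimension $N$. If a constant depends on other parameters, we indicate this explicitly. Throughout this paper $\Omega \subset \mathbb{R}^N$ denotes a bounded domain, and further $B_r(x) = \{y \in \mathbb{R}^N : |x-y| < r\}$ as well as $B_r = B_r(0)$. We construct an extended domain containing all balls $B_{\Lambda\varepsilon}(x)$ with $x \in \Omega$ as follows
\[ \Omega_{\Lambda\varepsilon} = \{x \in \mathbb{R}^N : \operatorname{dist}(x,\Omega) < \Lambda\varepsilon\}. \]
We follow the convention
\[ \fint_A u(x)\,dx = \frac{1}{|A|}\int_A u(x)\,dx. \]
Further,
\[ \|f\|_{L^N(\Omega)} = \Big(\int_\Omega |f(x)|^N\,dx\Big)^{1/N}, \qquad \|f\|_{L^\infty(\Omega)} = \sup_\Omega |f|. \]
When no confusion arises we simply write $\|\cdot\|_N$ and $\|\cdot\|_\infty$, respectively.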
For each $x = (x_1,\dots,x_N) \in \mathbb{R}^N$ and $r > 0$, we define $Q_r(x)$ as the open cube of side-length $r$ and center $x$ with faces parallel to the coordinate hyperplanes. More precisely,
\[ Q_r(x) = \{y \in \mathbb{R}^N : |y_i - x_i| < r/2,\ i = 1,\dots,N\}. \]
In addition, if $Q = Q_r(x)$ and $\ell > 0$, for simplicity we denote $\ell Q = Q_{\ell r}(x)$.
2.1. Dynamic programming principle and difference operators. We consider M(B Λ ) the set of symmetric unit Radon measures with support in B Λ and ν : defines a Borel measurable function for every u : R N → R Borel measurable. Then for each x ∈ R N we have a measure ν x with support in B Λ such that It is worth remarking that the hypothesis (2.1) on Borel measurability holds, for example, when the ν x 's are the pushforward of a given probability measure µ in R N . More precisely, if there exists a Borel measurable function h : is measurable by Fubini's theorem.
For each $\varepsilon > 0$ we consider a generalized random walk starting at $x_0 \in \Omega$. Given the value of $x_k$, the next position of the process $x_{k+1}$ is determined as follows. A biased coin is tossed. If we get heads (probability $\alpha$), a vector $z$ is chosen according to $\nu_{x_k}$ and we have $x_{k+1} = x_k + \varepsilon z$. If we get tails (probability $\beta$), $x_{k+1}$ is distributed uniformly in the ball $B_\varepsilon(x_k)$. More details are given in Section 2.3. Denote by $\tau$ the exit time from the domain, that is $\tau = \min\{n \in \mathbb{N} : x_n \notin \Omega\}$.
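The coin-toss description of a single step can be condensed into one transition law. The following is only a sketch under the standing assumptions ($\nu_x$ a probability measure supported in $B_\Lambda$, $\alpha+\beta=1$); the symbol $\pi_x$ for the one-step distribution is introduced here purely for illustration and is not notation from the text.

```latex
% One-step distribution of the walk when the current position is x.
% A is a Borel set; (A - x)/\varepsilon = \{ z : x + \varepsilon z \in A \}.
% \pi_x is illustrative notation (an assumption of this sketch).
\[
  \pi_x(A) \;=\; \alpha\,\nu_x\!\Big(\frac{A-x}{\varepsilon}\Big)
           \;+\; \beta\,\frac{|A\cap B_\varepsilon(x)|}{|B_\varepsilon|}.
\]
% Averaging a bounded Borel function u against \pi_x gives the
% expected value of u after one step:
\[
  \int u\,d\pi_x \;=\; \alpha\int u(x+\varepsilon z)\,d\nu_x(z)
  \;+\; \beta\,\frac{1}{|B_\varepsilon|}\int_{B_\varepsilon(x)} u(y)\,dy .
\]
```

The second display is the averaging operator that appears on the right hand side of the dynamic programming principle.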
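Given a Borel measurable bounded function $g : \mathbb{R}^N \setminus \Omega \to \mathbb{R}$, we can define
\[ u(x_0) = \mathbb{E}^{x_0}[g(x_\tau)], \]
where $\mathbb{E}^{x_0}$ stands for the expectation with respect to the process. We will prove in Section 3 that $u : \mathbb{R}^N \to \mathbb{R}$ satisfies the homogeneous dynamic programming principle given by
\[ u(x) = \alpha\int u(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u(y)\,dy \]
for $x \in \Omega$, and $u(x) = g(x)$ for $x \notin \Omega$. Moreover, $u$ is the unique bounded function that satisfies the dynamic programming principle.
Moreover, given the running payoff $f : \Omega \to \mathbb{R}$, a Borel measurable bounded function, we can define
\[ u(x_0) = \mathbb{E}^{x_0}\Big[\varepsilon^2\sum_{i=0}^{\tau-1} f(x_i) + g(x_\tau)\Big]. \]
In Section 3 we prove that $u$ is the unique bounded function that satisfies the dynamic programming principle (DPP)
\[ u(x) = \alpha\int u(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u(y)\,dy + \varepsilon^2 f(x) \tag{2.2} \]
for $x \in \Omega$ and $u(x) = g(x)$ for $x \notin \Omega$. For clarity, let us emphasize that (2.2) is what we call the DPP in this paper. This also motivates the following definitions.
Definition 2.1. We say that a bounded Borel measurable function $u$ is a subsolution to the DPP if it satisfies
\[ u(x) \le \alpha\int u(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u(y)\,dy + \varepsilon^2 f(x) \]
in $\Omega$. Analogously, we say that $u$ is a supersolution if the reverse inequality holds. If the equality holds, we say that it is a solution to the DPP.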
If we rearrange the terms in the DPP, we may alternatively use a notation that is closer to the difference methods.
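Definition 2.2. Given a Borel measurable bounded function $u : \mathbb{R}^N \to \mathbb{R}$, we define $L_\varepsilon u : \mathbb{R}^N \to \mathbb{R}$ as
\[ L_\varepsilon u(x) = \frac{1}{\varepsilon^2}\Big(\alpha\int u(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u(y)\,dy - u(x)\Big). \]
With this notation, $u$ is a subsolution (supersolution) if and only if $L_\varepsilon u + f \ge 0$ ($\le 0$).
By defining $\delta u(x,y) := u(x+y) + u(x-y) - 2u(x)$ and recalling the symmetry condition on $\nu_x$ we can rewrite
\[ L_\varepsilon u(x) = \frac{1}{2\varepsilon^2}\Big(\alpha\int \delta u(x,\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_1} \delta u(x,\varepsilon y)\,dy\Big). \tag{2.3} \]
Our theorems actually hold for functions merely satisfying Pucci-type inequalities. For expositional clarity we leave this to Section 7.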
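The passage to second differences rests only on symmetry. As a sketch, assuming that the difference operator is the normalized form of the DPP, that is, (one-step average minus $u(x)$) divided by $\varepsilon^2$ (an assumption consistent with the bound (4.12) used later), the computation can be written out as follows.

```latex
% Sketch. Uses: \nu_x is a symmetric probability measure, B_1 is symmetric,
% and \delta u(x,y) = u(x+y) + u(x-y) - 2u(x).
\begin{align*}
\int u(x+\varepsilon z)\,d\nu_x(z)
  &= \frac12\int \big(u(x+\varepsilon z)+u(x-\varepsilon z)\big)\,d\nu_x(z)
   = u(x) + \frac12\int \delta u(x,\varepsilon z)\,d\nu_x(z),\\
\frac{1}{|B_\varepsilon|}\int_{B_\varepsilon(x)} u(y)\,dy
  &= \frac{1}{|B_1|}\int_{B_1} u(x+\varepsilon y)\,dy
   = u(x) + \frac{1}{2|B_1|}\int_{B_1}\delta u(x,\varepsilon y)\,dy .
\end{align*}
% Multiply the first line by \alpha, the second by \beta, add, use
% \alpha+\beta=1, subtract u(x) and divide by \varepsilon^2:
\[
\frac{1}{\varepsilon^2}\Big(\alpha\!\int u(x+\varepsilon z)\,d\nu_x(z)
  + \frac{\beta}{|B_\varepsilon|}\int_{B_\varepsilon(x)} u(y)\,dy - u(x)\Big)
= \frac{1}{2\varepsilon^2}\Big(\alpha\!\int \delta u(x,\varepsilon z)\,d\nu_x(z)
  + \frac{\beta}{|B_1|}\int_{B_1}\delta u(x,\varepsilon y)\,dy\Big).
\]
```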

2.2. Examples. In this section, we present some recent examples and applications. The list is by no means exhaustive, and further examples could be obtained by discretizing partial differential operators with bounded and measurable coefficients.
Example 2.3 (Convergence to the solution of the PDE). Let φ ∈ C 2 (Ω). We can use the second order Taylor's expansion of φ to obtain an asymptotic expansion for L ε φ(x). Indeed, observe that holds as ε → 0 for every y ∈ B Λ , where a ⊗ b stands for the tensor product of vectors a, b ∈ R n , that is, the matrix with entries (a i b j ) ij . Hence, by the linearity of the trace, On the other hand, since every measure ν x ∈ M(B Λ ) defines a matrix which is a linear second order partial differential operator. Furthermore, for β ∈ (0, 1], the operator is uniformly elliptic: given ξ ∈ R N \ {0}, and estimating the integral we have that . It also holds, using Theorem 1.1 (cf. [MPR12, Theorem 4.9]), that under suitable regularity assumptions, the solutions u : = u ε to the DPP converge to a viscosity solution v ∈ C(Ω) of Similar convergence results also hold in the following examples.
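The Taylor computation behind this example can be sketched as follows. This is a hedged reconstruction, not the text's own display: it assumes $L_\varepsilon$ is written through the second differences $\delta\varphi(x,y)=\varphi(x+y)+\varphi(x-y)-2\varphi(x)$ with the prefactor $1/(2\varepsilon^2)$, and the name $A(x)$ and its normalization are choices made for this sketch.

```latex
% Sketch for \varphi \in C^2 and x an interior point.
% Second-order Taylor expansion, uniformly for y in B_\Lambda:
\[
  \delta\varphi(x,\varepsilon y)
  = \varepsilon^2\,\mathrm{Tr}\big(D^2\varphi(x)\,y\otimes y\big) + o(\varepsilon^2).
\]
% Inserting this into the second-difference form of L_\varepsilon:
\[
  L_\varepsilon\varphi(x) = \mathrm{Tr}\big(D^2\varphi(x)\,A(x)\big) + o(1),
  \qquad
  A(x) := \frac{\alpha}{2}\int z\otimes z\,d\nu_x(z)
        + \frac{\beta}{2|B_1|}\int_{B_1} y\otimes y\,dy .
\]
% By rotational symmetry of the ball, and since the mean of |y|^2 over B_1
% equals N/(N+2):
\[
  \frac{1}{|B_1|}\int_{B_1} y\otimes y\,dy = \frac{1}{N+2}\,I .
\]
% Uniform ellipticity: for \xi \neq 0, using |z| < \Lambda on the support of \nu_x,
\[
  \frac{\beta}{2(N+2)}\,|\xi|^2
  \;\le\; \xi^{T}A(x)\,\xi
  = \frac{\alpha}{2}\int \langle z,\xi\rangle^2\,d\nu_x(z) + \frac{\beta}{2(N+2)}\,|\xi|^2
  \;\le\; \Big(\frac{\alpha\Lambda^2}{2}+\frac{\beta}{2(N+2)}\Big)|\xi|^2 .
\]
```

The lower bound is where $\beta>0$ is used: the uniform part of the step supplies ellipticity regardless of how degenerate $\nu_x$ is.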
Example 2.4 ($p$-Laplacian). In [LP17, Section 3.2], the following process that is covered by Theorem 1.1 was considered. Let $u$ be a $p$-harmonic function whose gradient vanishes at most at a finite number of points. When at $x \in \Omega$ such that $\nabla u(x) \ne 0$, we define a probability measure where $\mathcal{L}_{B_\varepsilon(x)}$ denotes the uniform probability distribution in $B_\varepsilon(x) \subset \mathbb{R}^N$ and $\delta_x$ the Dirac measure at $x$. Then we choose the next point according to the probability measure There is a classical well-known connection between Brownian motion and the Laplace equation. This example is related to the so-called tug-of-war games introduced in the paper of Peres, Schramm, Sheffield and Wilson [PSSW09] in connection with the infinity Laplace operator. Similarly, a connection exists between the $p$-Laplacian, $1 < p < \infty$, and different variants of the tug-of-war game with noise, [PS08], [MPR12] and [Lew20].
There are several regularity methods devised for tug-of-war games with noise: the above papers contain a global approach, and a local approach is developed in [LPS13] as well as [LP18]. However, none of these methods seems to apply directly to the present situation. Moreover, later we prove Theorem 7.3, which applies to solutions of the DPP associated to tug-of-war games with noise and the $p$-Laplacian, see Remark 7.6.
Example 2.5 (Ellipsoid process). A particular case of the stochastic process considered in this paper is the so-called ellipsoid process (see [AP20]). This process arises when $\nu_x$ is the uniform probability measure on $E_x \setminus B_1$, where $E_x$ denotes an ellipsoid centered at the origin such that Then for every measurable set $A \subset \mathbb{R}^N$, $\alpha = \frac{|E_x \setminus B_1|}{|E_x|}$ and $\beta = \frac{|B_1|}{|E_x|}$. Hence, replacing this in (2.2) with $f = 0$ we get that the expectation related to the ellipsoid process satisfies the dynamic programming principle An asymptotic Hölder estimate was obtained in [AP20] under a certain assumption on the ellipticity ratio of the ellipsoids. Now, Theorem 1.1 implies the Hölder estimate for $u$ without any additional assumption and thus improves the result in [AP20]. Such a mean value property over ellipsoids has been studied by Pucci and Talenti in connection with smooth solutions to PDEs in [PT76].
2.3. Stochastic process. Next we define the stochastic process related to the DPP (2.2). Let x 0 ∈ Ω be the initial position of the process. We equip R N with the natural topology, and the σ-algebra B of the Borel measurable sets. We consider along with the positions of the process the results of the coin tosses, so our process is defined in the product space For ω = (x 0 , (c 1 , x 1 ), (c 2 , x 2 ) . . . ) ∈ H ∞ , we define the result of the k-th toss C k (ω) = c k and the coordinate processes X k (ω) = x k . If C k+1 = 0 (probability α), the next position of the process X k+1 is distributed according to ν x k . And for C k+1 = 1 (probability β), X k+1 is uniformly distributed in B ε (x k ). That is, we have the following transition probabilities Let {F k } k denote the filtration of σ-algebras, F 0 ⊂ F 1 ⊂ · · · defined as follows: F k is the product σ-algebra generated by cylinder sets of the form We have that C k and X k are F k -measurable random variables. By Kolmogorov extension theorem, the transition probabilities determine a unique probability measure P x0 in H ∞ relative to the σ-algebra F ∞ = σ(∪ k F k ). We denote E x0 the corresponding expectation.
We consider $\tau$ the exit time from the domain, that is $\tau = \min\{n \in \mathbb{N} : X_n \notin \Omega\}$. We define $T_A$ as the hitting time for $A$ and $\tau_A$ as the exit time from $A$, that is $T_A = \min\{k \in \mathbb{N} : X_k \in A\}$ and $\tau_A = \min\{k \in \mathbb{N} : X_k \notin A\}$.
2.4. Stochastic estimates. In this section we establish some estimates related to $\tau$ and other stochastic results. We will prove that $\mathbb{E}^{x_0}[\tau]$ is of order $1/\varepsilon^2$. Moreover, we will prove that the second moment of $\varepsilon^2\tau$ is bounded. We start with a rough estimate needed as a first step.
Lemma 2.6. The process leaves the domain almost surely and moreover for every x 0 ∈ Ω.
Thus, it remains to prove (2.4). We choose $n$ such that $n\varepsilon/2 > \operatorname{diam}\Omega$. We consider the event $E$ that the $n$ steps after the $k$-th one are uniformly distributed and that in each of them the first coordinate increases by at least $\varepsilon/2$. That is, $E = \{c_{k+1} = \dots = c_{k+n} = 1 \text{ and } \pi_1(x_i - x_{i-1}) > \varepsilon/2 \text{ for } i = k+1,\dots,k+n\}$, where $\pi_1$ denotes the projection to the first coordinate. Observe that
\[ \mathbb{P}^{x_0}(E) = \Big(\beta\,\frac{|B_1 \cap \{y_1 > 1/2\}|}{|B_1|}\Big)^n > 0. \]
Assuming $E$ we have that $|x_{k+n} - x_k| \ge n\varepsilon/2$; hence, since $n\varepsilon/2 > \operatorname{diam}\Omega$, it must be the case that $\tau \le n + k$. Therefore (2.4) holds for $\lambda = 1 - \mathbb{P}^{x_0}(E) < 1$.
Now we construct two sequences of random variables. They will allow us to obtain bounds for the growth of the expected value of the square of the distance from the starting point $x_0$.
Lemma 2.7. The sequence of random variables By the symmetry of ν x k and the ball B ε we can write Employing the parallelogram law we get where we have used that ν x k is supported in B Λ .
Therefore, for $C = \alpha\Lambda^2 + \beta\fint_{B_1} |x|^2\,dx$ we have as we wanted to show.
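The one-step second-moment computation behind this constant can be sketched as follows. This is a hedged reconstruction that uses only the symmetry of $\nu_{x_k}$ and of the ball together with the parallelogram law; the abbreviation $a$ and the conditional-expectation notation are choices made for this sketch.

```latex
% Sketch. a := x_k - x_0; \mathcal{F}_k is the history up to step k.
\begin{align*}
\mathbb{E}\big[\,|x_{k+1}-x_0|^2 \,\big|\, \mathcal{F}_k\big]
 &= \alpha\int |a+\varepsilon z|^2\,d\nu_{x_k}(z)
   + \frac{\beta}{|B_1|}\int_{B_1}|a+\varepsilon y|^2\,dy\\
 % symmetrize both integrals, then |a+b|^2 + |a-b|^2 = 2|a|^2 + 2|b|^2
 &= |a|^2 + \varepsilon^2\Big(\alpha\int |z|^2\,d\nu_{x_k}(z)
   + \frac{\beta}{|B_1|}\int_{B_1}|y|^2\,dy\Big)\\
 % \nu_{x_k} is supported in B_\Lambda
 &\le |x_k-x_0|^2 + C\varepsilon^2,
 \qquad C = \alpha\Lambda^2 + \frac{\beta}{|B_1|}\int_{B_1}|y|^2\,dy .
\end{align*}
```

Dropping the nonnegative $\nu_{x_k}$-term in the second line instead of bounding it gives the matching lower bound $|x_k-x_0|^2 + c\,\varepsilon^2$ with $c = \frac{\beta}{|B_1|}\int_{B_1}|y|^2\,dy$, which is the submartingale direction.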
Lemma 2.8. The sequence of random variables Proof. As in the previous lemma we have By dropping the |εz| 2 term, we get as claimed.
We are ready to prove that E x0 [τ ] is of order 1/ε 2 .
Lemma 2.9. There exist $C_1, C_2 > 0$ such that Since the increments of $M_k$ are bounded and $\mathbb{E}^{x_0}[\tau] < +\infty$ we can apply the optional stopping theorem. We obtain $\mathbb{E}^{x_0}[M_\tau] \le 0$ and hence as desired. The other inequality can be obtained by considering the submartingale from Lemma 2.8. In fact, we get where we have used that $x_\tau$ is at a distance of at most $\Lambda\varepsilon$ from $x_{\tau-1} \in \Omega$ and therefore $|x_\tau - x_0| \le \operatorname{diam}\Omega + \Lambda\varepsilon$.
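How optional stopping turns the two sequences into two-sided bounds of order $1/\varepsilon^2$ can be sketched as follows. The exact form of the lemma's constants is not recoverable here, so this is only an illustration under the assumption that $M_k = |x_k-x_0|^2 - C\varepsilon^2 k$ is a supermartingale and $N_k = |x_k-x_0|^2 - c\,\varepsilon^2 k$ is a submartingale; the names $N_k$, $C$ and $c$ are hypothetical.

```latex
% Sketch. M_0 = N_0 = 0, x_0 in \Omega, x_\tau outside \Omega.
% Supermartingale + optional stopping: lower bound for the exit time.
\[
  0 \ge \mathbb{E}^{x_0}[M_\tau]
  \;\Longrightarrow\;
  C\varepsilon^2\,\mathbb{E}^{x_0}[\tau] \;\ge\; \mathbb{E}^{x_0}|x_\tau-x_0|^2
  \;\ge\; \operatorname{dist}(x_0,\partial\Omega)^2 .
\]
% Submartingale + optional stopping: upper bound for the exit time.
\[
  0 \le \mathbb{E}^{x_0}[N_\tau]
  \;\Longrightarrow\;
  c\,\varepsilon^2\,\mathbb{E}^{x_0}[\tau] \;\le\; \mathbb{E}^{x_0}|x_\tau-x_0|^2
  \;\le\; (\operatorname{diam}\Omega+\Lambda\varepsilon)^2 .
\]
```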
Now we obtain an estimate for the random variable ε 2 τ necessary to bound its second moment in the subsequent corollary. We follow [BER19, Lemma 3.6]. The key point here is that the process is memoryless.

Dynamic Programming Principle: Existence and uniqueness
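Recall that $\Omega \subset \mathbb{R}^N$ is a bounded domain and $g : \mathbb{R}^N \setminus \Omega \to \mathbb{R}$ and $f : \Omega \to \mathbb{R}$ are measurable bounded functions. We define
\[ u(x_0) = \mathbb{E}^{x_0}\Big[\varepsilon^2\sum_{i=0}^{\tau-1} f(x_i) + g(x_\tau)\Big], \]
where $\tau$ is the exit time from $\Omega$. In this section we prove that $u$ is the unique bounded solution to the DPP (2.2) given by
\[ u(x) = \alpha\int u(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u(y)\,dy + \varepsilon^2 f(x) \]
for $x \in \Omega$, and $u(x) = g(x)$ for $x \notin \Omega$. For related arguments, see [LPS14], and [Har16], [Ruo16], [AHP17] as well as [BPR17].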
In the following lemma we prove that subsolutions are uniformly bounded. We have required subsolutions to be bounded, which is necessary as shown by Example 3.5 below, but here we prove that there is a bound that only depends on the parameters of the problem and not the solution itself.
Lemma 3.1. There exists $C = C(\operatorname{diam}\Omega, f, g, \varepsilon) > 0$ such that $u \le C$ for every subsolution $u$ to the DPP with boundary values $g$.
Proof. We consider the space partitioned along the $x_N$ axis in strips of width $\varepsilon/2$. We define $S_k = \{y : y_N < k\varepsilon/2\}$, We define $p = \beta A$ and consider $k \in \mathbb{Z}$, for $x \in S_{k+1}$ we have {y : Then, inductively, we get (3.1) We assume without loss of generality that $\Omega \subset \{y : 0 < y_N < R\}$ for some $R > 0$. Then, since $u = g$ in $\mathbb{R}^N \setminus \Omega \supset S_0$, we have $\sup_{S_0} u = \sup_{S_0} g \le \sup g$. We assume that $\sup u \ge \sup g$ (if not, then $\sup g$ is an upper bound for $u$ and the proof is finished) and consider $n$ such that $n\varepsilon/2 > R$; then $\Omega \subset S_n$ and we have $\sup_{S_n} u = \sup u$. We apply (3.1) for such $n$ and $k = 0$, we get From where we finally get the upper bound

Lemma 3.2. There exists $u_0$ a subsolution to the DPP with $u_0 = g$ on $\mathbb{R}^N \setminus \Omega$.
Proof. We consider $v(x) = K + L|x|^2$ where we will choose $K \in \mathbb{R}$ and $L > 0$ in what follows. Since it is convex we have and therefore for $L$ large enough we get We choose $K$ small enough such that $v \le g$ in $\Omega_{\Lambda\varepsilon}$. Then $u_0$ given by $u_0 = v$ on $\Omega$ and $u_0 = g$ on $\mathbb{R}^N \setminus \Omega$ is a subsolution.

Proof. We construct the solution by iterating the DPP. We consider the function $u_0$ given by Lemma 3.2 and define inductively
\[ u_{n+1}(x) = \alpha\int u_n(x+\varepsilon z)\,d\nu_x(z) + \beta\fint_{B_\varepsilon(x)} u_n(y)\,dy + \varepsilon^2 f(x) \quad \text{for } x \in \Omega, \]
and $u_{n+1}(x) = g(x)$ for $x \notin \Omega$. Since $u_0$ is a subsolution we get $u_1 \ge u_0$. Given $u_n \ge u_{n-1}$, by the recursive definition, we get $u_{n+1} \ge u_n$. Then, by induction, we obtain that the sequence of functions is increasing. By replacing $u_{n+1}$ by its definition in $u_{n+1} \ge u_n$ we get that $u_n$ is a subsolution.
Then Lemma 3.1 gives us a uniform bound for u n . We conclude that the sequence of functions u n converges pointwise to a Borel function u. Now our goal is to prove that u is a solution to the DPP. To that end, we will prove that the sequence converges uniformly. Then we obtain that u is a solution of the DPP by passing to the limit in the recursive definition.
For the sake of contradiction suppose that the convergence is not uniform. Observe that $\sup_\Omega (u - u_n)$ is a decreasing nonnegative sequence, so it has a limit
\[ M = \lim_{n\to\infty} \sup_\Omega\,(u - u_n). \]
Since we are assuming that the convergence is not uniform we have $M > 0$.
Observe that by Fatou's Lemma, we have $\lim_{n\to\infty} \int_\Omega (u(y) - u_n(y))\,dy = 0$, so we can bound $\fint_{B_\varepsilon(x)} (u(y) - u_n(y))\,dy$ uniformly in $x$.
We fix $\delta > 0$. Let $n_1$ be such that $\sup_\Omega (u - u_n) \le M + \delta$ for every $n \ge n_1$, and let $n_2$ be such that
\[ \fint_{B_\varepsilon(x)} (u(y) - u_n(y))\,dy \le \delta \]
for every $x \in \Omega$ and $n \ge n_2$.
Let n 0 = max{n 1 , n 2 } and k > n 0 , we have for every x ∈ Ω. Since this holds for every k > n 0 we get Recalling that α < 1 we can select δ such that α(M + δ) + βδ < M and we have reached a contradiction.
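The chain of inequalities behind this contradiction can be sketched as follows. It is a hedged reconstruction, assuming the iterates are obtained by applying the right hand side of the DPP to the previous iterate in $\Omega$ and setting them equal to $g$ outside $\Omega$ (the natural reading of "iterating the DPP").

```latex
% Sketch. Fix x in \Omega and indices n > k \ge n_0.
% The running payoff \varepsilon^2 f(x) cancels in the difference.
\begin{align*}
u_{n+1}(x)-u_{k+1}(x)
 &= \alpha\int (u_n-u_k)(x+\varepsilon z)\,d\nu_x(z)
  + \frac{\beta}{|B_\varepsilon|}\int_{B_\varepsilon(x)} (u_n-u_k)(y)\,dy\\
 % u_n \le u everywhere, and u_n - u_k = 0 outside \Omega
 &\le \alpha\,\sup_\Omega\,(u-u_k)
  + \frac{\beta}{|B_\varepsilon|}\int_{B_\varepsilon(x)} (u-u_k)(y)\,dy\\
 &\le \alpha(M+\delta)+\beta\delta .
\end{align*}
% Let n \to \infty, take the supremum over x in \Omega, then let k \to \infty:
\[
  M \;\le\; \alpha(M+\delta)+\beta\delta \;=\; \alpha M+\delta \;<\; M
  \qquad\text{whenever } 0<\delta<\beta M ,
\]
% which is impossible; here \alpha + \beta = 1 and \beta > 0 were used.
```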
Theorem 3.4. The function
\[ u(x_0) = \mathbb{E}^{x_0}\Big[\varepsilon^2\sum_{i=0}^{\tau-1} f(x_i) + g(x_\tau)\Big] \]
is the unique bounded solution to the DPP with boundary values $g$.
Proof. Given a solution $v$ to the DPP, we have that Then, by Doob's stopping time theorem (recall that $v$ and $f$ are bounded) we have Thus $v = u$. We have proved that $u$ is a solution to the DPP and that every solution coincides with it; hence the solution is unique.
Example 3.5. The uniqueness fails if we do not assume that solutions to the DPP are bounded. For Ω = (−2, 2) ⊂ R we consider the process given by ε = 1, α = β = 1/2 and ν x the uniform probability distribution on B 1 for every point x ∈ Ω except the ones in the set { 1 2 k } k∈N . There we set 2 .
In this case the DPP has multiple solutions: the function u ≡ 0 and which is not bounded. This is why the solutions to the DPP are required to be bounded.

ε-ABP estimate
Regarding the classic theory of elliptic PDEs, one of the key inequalities in the Krylov-Safonov proof of Hölder regularity is the so-called Aleksandrov-Bakelman-Pucci estimate (ABP estimate for short), which guarantees a pointwise bound for subsolutions of Lu + f = 0 by means of the L N -norm of f . Namely, if f is a continuous bounded function in Ω and u ∈ C(Ω) satisfies then there exists a constant C > 0 depending only on N , diam Ω and the uniform ellipticity of A(x) such that See [CC95, Chapter 3] for the ABP estimate for viscosity solutions and [GT01, Section 9.1] for strong solutions.
Given a subsolution $u$, one of the key ideas in the proof of the classical ABP estimate was the use of the concavity properties of $u$ at the set of points where the graph of $u$ can be touched from above by tangent hyperplanes. This set of points (known as the contact set and denoted by $K_u$) turned out to carry all relevant information about the subsolution. To be more precise, if we denote by $\Gamma$ the concave envelope of $u$, the ABP estimate is obtained by studying the behavior of $\Gamma$ at those points in $\Omega$ where $\Gamma$ and $u$ agree. Using the concavity of $\Gamma$, the first main step in the proof consisted in obtaining an estimate of $\sup_\Omega u$ in terms of $|\nabla\Gamma(\Omega)|$. It is worth noting that the structure of the PDE does not play any role in the proof of this first estimate, which was obtained using exclusively geometric arguments.
In a second step, and in addition to the concavity of $\Gamma$, it turned out that $\Gamma$ is $C^{1,1}$ in the contact set so, by virtue of Rademacher's theorem, $\Gamma$ is indeed $C^2$ a.e. in $K_u$. This fact and a change of variables formula give an inequality of the form which allowed one to use the equation to estimate the right hand side and, consequently, to obtain the ABP estimate.
However, in the case of the DPP, the non-local nature of the setting forces us to consider also non-continuous subsolutions of the DPP, so the corresponding concave envelope Γ might not be C 1,1 as in the classical setting. In addition there is no PDE to connect with the right hand side of the previous inequality, and therefore we follow a different strategy in order to estimate |∇Γ(Ω)|. The idea is to cover the contact set K u by a finite collection of balls of radius ε/4, and then to estimate |∇Γ(B ε/4 (x))| by means of the oscillation of Γ with respect to a supporting hyperplane touching the graph of Γ from above at x ∈ K u . This oscillation, in turn, is estimated by using the DPP, which yields the desired ε-ABP estimate. It is also interesting to note that one can recover the classical ABP estimate by taking limits as ε → 0.
In this section we adapt the ideas from [CS09, Section 8], where an ABP-type estimate was obtained for continuous solutions of non-local integro-differential equations (see also [CLU14] and [CTU20] for similar approaches). Further references are [KT90] for an ABP estimate for elliptic difference equations and [GS12], where an ABP-type estimate is obtained using a generalization of the concept of concave envelope as a non-local fractional envelope.
Let u : R N → R be a bounded function and let Γ be the concave envelope of Since u is not necessarily continuous, we define the 'contact' set as Since u + ≤ Γ, then K u is a closed subset, and thus compact. Moreover, observe that in the particular case of u being an upper semicontinuous function in Ω, then K u = Ω ∩ {u + = Γ}.
As we have already pointed out, one of the key steps in the proof of our ABP-type estimate is the construction of a suitable cover of the contact set $K_u$ by balls of radius $\varepsilon/4$. For this purpose, before stating the main result of this section we introduce the following notation. Given $\varepsilon > 0$, we denote by $Q_\varepsilon(\mathbb{R}^N)$ a grid of open cubes of diameter $\varepsilon/4$ covering $\mathbb{R}^N$. Take for instance We stress that, while not needed in the proof of the main $\varepsilon$-ABP estimate, the assumption of $Q_\varepsilon$ being a grid is needed later in the proof of Theorem 4.7. We are now in a position to state the main theorem of this section. We use the notation $L_\varepsilon u + f$ for convenience in some of the proofs, but this is equivalent to the DPP and stochastic notations, as we recall at the end of the section.
in Ω for f ∈ C(Ω). Let Γ be the concave envelope of u + in Ω Λε and let Q = Q ε (K u ) be the grid of pairwise disjoint open cubes Q of diameter ε/4 defined in (4.2). Then After proving this theorem we relate the result to the stochastic process and in Theorem 4.7 we obtain a version of the estimate where the continuity hypothesis for f is removed.
In what follows, we can assume without loss of generality that f ≥ 0 in Ω and sup R N \Ω u = 0. Then u + = max{u, 0}.
It turns out that in order to prove Theorem 4.1 we only need to use the information of the concave envelope in the set of contact points K u . Indeed, since inserting this inequality in (2.3) we get that . Since Γ is concave, then Γ is continuous and for any fixed x 0 ∈ K u we have that Hence, if f ≥ 0 is a continuous function in Ω and u is a bounded function satisfying Therefore, in what follows, we will use (4.5) instead of (4.4).
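Before stating the first lemma of this section, we need to define the superdifferential of $\Gamma$ at $x \in \Omega_{\Lambda\varepsilon}$ as the set
\[ \nabla\Gamma(x) = \{\xi \in \mathbb{R}^N : \Gamma(z) \le \Gamma(x) + \langle \xi, z - x\rangle \text{ for all } z \in \Omega_{\Lambda\varepsilon}\}. \tag{4.6} \]
Since $\Gamma$ is a concave function in $\Omega_{\Lambda\varepsilon}$, then $\nabla\Gamma(x) \ne \emptyset$ for every $x \in \Omega_{\Lambda\varepsilon}$. Moreover, given a set $S \subset \Omega_{\Lambda\varepsilon}$, we denote $\nabla\Gamma(S) = \bigcup_{x\in S} \nabla\Gamma(x)$.
In addition, if $S$ is a compact subset of $\Omega$, then $\nabla\Gamma(S)$ is closed. Indeed, if $\{\xi_n\}_n \subset \nabla\Gamma(S)$ is a sequence converging to $\xi_0 \in \mathbb{R}^N$, by definition there exists $\{x_n\}_n \subset S$ (which, by compactness and after passing to a subsequence, we can assume converges to some $x_0 \in S$) such that $\Gamma(z) \le \Gamma(x_n) + \langle \xi_n, z - x_n\rangle$ for each $z \in \Omega_{\Lambda\varepsilon}$. Since $\Gamma$ is concave (and thus continuous), taking limits we get that $\xi_0 \in \nabla\Gamma(x_0) \subset \nabla\Gamma(S)$. In consequence, $\nabla\Gamma(S)$ is a Lebesgue measurable set.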
Lemma 4.2. Let u : R N → R be a bounded function such that u ≤ 0 in R N \ Ω. Then where K u is the contact set defined in (4.1).
Since Γ is the concave envelope of u + and sup u = sup Γ, then for every |ξ| < ρ there exists ℓ a supporting hyperplane of Γ in Ω Λε such that ∇ℓ ≡ ξ. Fix any ξ ∈ B ρ . We claim that ξ ∈ ∇Γ(x 0 ) for some x 0 ∈ K u . To see this, let and define ℓ(z) = τ + ξ, z for every z ∈ Ω Λε . Then ℓ ≥ Γ ≥ u + in Ω Λε . Moreover, by the definition of τ , for each n ∈ N, there exists x n ∈ Ω Λε such that On the other hand, by the definition of ℓ, for any z ∈ Ω we have where in the inequality we have used that x n ∈ Ω Λε and z ∈ Ω, and the definition of ρ has been recalled in the last equality. Taking the supremum for z ∈ Ω we obtain 1 − |ξ| ρ sup Ω u + ≤ ℓ(x n ) ≤ u + (x n ) + 1 n for each n ∈ N. Hence, since |ξ| < ρ and sup Ω u + > 0, we can assume that x n ∈ Ω for every n ∈ N. Otherwise, since u ≤ 0 in R N \ Ω by assumption, if x n ∈ Ω Λε \ Ω for each sufficiently large n ∈ N, letting n → ∞ we would obtain a contradiction. Furthermore, by a compactness argument, we can assume without loss of generality that x n converges to a point x 0 ∈ Ω. Thus, since Γ is continuous, taking limits we get lim sup Finally, since u + ≤ Γ, we have in particular that x 0 ∈ K u and ξ ∈ ∇Γ(x 0 ). In consequence B ρ ⊂ ∇Γ(K u ), so |B 1 |ρ N ≤ |∇Γ(K u )| and (4.7) follows.
The idea is to estimate the term |∇Γ(K u )| in the right hand side of (4.7) by covering the contact set K u with balls of radius ε/4 and estimating |∇Γ(B ε/4 (x))|. This is done by obtaining an upper bound for the gradients of the concave function Γ in B ε/4 (x) which depends on the oscillation of Γ with respect to a supporting hyperplane touching the graph of Γ at x.
The following lemma shows that the graph of $\Gamma$ stays quadratically close to a tangent hyperplane in a neighborhood of any point at which the inequality $L_\varepsilon\Gamma + f \ge 0$ is satisfied. It is worth noting that this is the only result where the DPP is used.
Lemma 4.4. Suppose that Γ is a concave function and x 0 ∈ Ω satisfying L ε Γ(x 0 )+ f (x 0 ) ≥ 0. Then, for any w > 0, the following holds where ξ is any vector in ∇Γ(x 0 ). Furthermore, (4.11) Proof. First observe that, since Γ is concave in Ω Λε , then δΓ(x 0 , y) ≤ 0 for every y ∈ B Λε . Thus, we can estimate by zero the α-term in (2.3), so we obtain L ε Γ(x 0 ) ≤ β 2ε 2 Bε δΓ(x 0 , y) dy. (4.12) Since f (x 0 ) ≥ −L ε Γ(x 0 ) by assumption, using the definition of δΓ(x 0 , y) and the symmetry of the ball we can estimate for any fixed ξ ∈ ∇Γ(x 0 ). Let us define the auxiliary function Φ : Observe that, for the sake of convenience, the sign of Φ has been changed with respect to the previous proof. Notice that Φ ≥ 0 due to the concavity of Γ. We split the ball B ε in two sets and we study the integral of Φ over each of them. Then where in the inequality we used that Φ ≥ 0 to estimate the second integral over B ε ∩ {Φ ≤ w}. Then (4.10) follows by combination of the previous estimates.
If $f(x_0) > 0$, we choose $w > 0$ so that $0 \le \Phi(y) \le w$ holds for every $y \in B_{\varepsilon/2}$. Notice that, as we already mentioned, $\Phi \ge 0$ follows directly from the concavity of $\Gamma$. To check that $\Phi \le w$, observe that the inclusion $B_{\varepsilon/2}(y) \subset B_\varepsilon$ holds for every $y \in B_{\varepsilon/2}$. Then (4.10) yields In particular, choosing $w = \frac{2^{N+2}}{\beta} f(x_0)\,\varepsilon^2$, we get that the left hand side of the previous inequality is bounded by $1/4$, and thus there exists $z \in B_{\varepsilon/2}$ such that Combining the inequalities for $z$ and $-z$ we obtain that so $\Phi(y) \le w$ follows from the convexity of $\Phi$ and this completes the proof.

We are now in a position to prove the main result of this section.
Proof of Theorem 4.1. Let us consider the pairwise disjoint collection of open cubes $Q_\varepsilon(K_u)$ defined in (4.2). Then the following conditions are satisfied: Since $K_u$ is bounded, we can label the cubes in $Q_\varepsilon(K_u)$ as $Q_1, \dots, Q_n$, where $n = n(\varepsilon) \in \mathbb{N}$. Furthermore, we select a point $x_i \in K_u \cap Q_i$ for each $i = 1, \dots, n$ so that $Q_i \subset B_{\varepsilon/4}(x_i)$. From all the above considerations we can estimate Combining this with the estimates from Lemmas 4.3 and 4.4 we obtain Then the result follows by replacing this in the estimate from Lemma 4.2.
As we have seen in Section 3, solutions to the DPP can be interpreted as an expected value. Thus the $\varepsilon$-ABP extends to this setting as well. (4.13) More precisely, $C = 2^{N+3}/\beta$.
Proof. We consider u : By Theorem 3.4 we have that L ε u = −f in Ω and u = 0 in R N \ Ω. The result follows by applying Theorem 4.1 to u.
Let us observe that the $\varepsilon$-ABP estimate (4.3) yields the classical ABP estimate after taking limits as $\varepsilon \to 0$. Since each $Q$ in $Q_\varepsilon(K_u)$ has diameter $\varepsilon/4$, then $\varepsilon = 4\sqrt{N}\,|Q|^{1/N}$ and thus Furthermore, letting $\varepsilon \to 0$ the size of the cubes in $Q_\varepsilon(K_u)$ converges to zero, and since $f$ is continuous, we obtain the $L^N(K_u)$-norm of $f^+$ as the limit of Riemann sums, i.e.
Thus, replacing this in the ε-ABP estimate (4.3) we get which is the classical ABP estimate plus an error term vanishing when ε → 0.
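The arithmetic that converts powers of $\varepsilon$ into cube volumes, and hence sums over the grid into Riemann sums, can be sketched as follows. The precise form of (4.3) is not reproduced here; the weights $a_Q$ and the shape of the sum are assumptions made only to illustrate the mechanism.

```latex
% A cube Q of side s has diameter s\sqrt{N} and volume s^N, so
\[
  s\sqrt N = \frac{\varepsilon}{4}
  \;\Longleftrightarrow\; s = \frac{\varepsilon}{4\sqrt N},
  \qquad |Q| = s^N
  \;\Longrightarrow\; \varepsilon = 4\sqrt N\,|Q|^{1/N}.
\]
% All cubes of the grid have the same volume, hence for any numbers a_Q \ge 0
% (hypothetical weights, e.g. a_Q = \sup_Q f^+):
\[
  \varepsilon\,\Big(\sum_{Q} a_Q^{\,N}\Big)^{1/N}
  = 4\sqrt N\,\Big(\sum_{Q} a_Q^{\,N}\,|Q|\Big)^{1/N}.
\]
% With a_Q = \sup_Q f^+, f continuous and K_u compact, the right hand sum is
% an upper Riemann sum:
\[
  \sum_{Q\in Q_\varepsilon(K_u)} \big(\sup_Q f^+\big)^N |Q|
  \;\longrightarrow\; \int_{K_u} (f^+)^N\,dx
  \qquad (\varepsilon\to 0).
\]
```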
Observe that the error depends on f , moreover it does not vanish uniformly on f . Also observe that the ABP estimate requires f to be continuous. The standard version of the ABP in the context of PDEs is with L N -norm of f in the right hand side. Unfortunately such an inequality does not hold in our setting. That is, for a general f , an inequality such as  Let ν be given by where v x is such that We overcome the difficulty of the previous example in the following theorem, where we obtain a weaker version of the result. Fortunately it is enough for our purposes, see Lemma 5.1.
Theorem 4.7 (ε-ABP estimate with measurable f ). Given a non-negative bounded measurable function f : Ω → R, there exists C > 0 (depending only on N ) such that
Proof. First we extend f : Ω → R outside Ω by defining f (x) = 0 for every x ∈ R N \ Ω. Then we define $\tilde f$ : R N → R as the function given by for every x ∈ R N , so $\tilde f$ is continuous in R N and, in particular, $\tilde f$ ∈ C(Ω). For i ≥ 1 we have (4.14) Since where we used (4.14). Rearranging terms, we get Observe that since $\tilde f$ ∈ C(Ω) and $\tilde f$ ≥ 0 we can apply Corollary 4.5. We obtain For any fixed Q ∈ Q ε (Ω), let x 0 denote the center of Q, so Q = Q ε Let ℓ = ℓ(N ) ∈ N be the unique odd integer such that $\ell - 2 < 9\sqrt{N} \le \ell$. In consequence Q ⊂ B ε (x) ⊂ ℓQ for every x ∈ Q. Since $\tilde f$ is continuous in R N , there exists some x ∈ Q such that $\tilde f(x) = \sup_Q \tilde f$ and thus (sup where in the first inequality we have recalled Jensen's inequality for convex functions. Moreover, since the cubes in Q ε (Ω) form a grid, it turns out that ℓQ can be expressed as the union of the cubes Q ′ such that Q ′ ∈ Q ε (Ω) and Q ′ ⊂ ℓQ. Since any particular Q ′ ∈ Q ε (Ω) belongs to $\operatorname{card}\{Q \in Q_\varepsilon(\Omega) : Q' \subset \ell Q\} = \ell^N$ of the cubes ℓQ, we can estimate the overlap and get where the last equality comes from the fact that f ≡ 0 outside Ω. Taking the N -th root we finally obtain , and the result follows after inserting this in (4.15).

De Giorgi oscillation lemma
The main goal of this section is to prove Lemma 5.8, a version of the classical De Giorgi oscillation lemma.
We follow the Krylov-Safonov argument in [KS79] and [KS80], see also [Bas98, Chapter V, Section 7]. However, our case is partly discrete and ε sets a natural limit for the scale that can be used in the proofs. This causes considerable changes. The key result is Theorem 5.7, where we prove that a set of positive measure is reached by the process with positive probability. This then implies the De Giorgi oscillation lemma in a straightforward manner by using a level set as the set of positive measure.
One of the key steps in this section is the use of an adapted version of the Calderón-Zygmund decomposition, Lemma 5.4. The main difference with the classical version is that we do not consider cubes of scale smaller than ε. If we simply stopped the decomposition once we reach the cubes of scale ε, we would lose control of the relation between the original set and the union of cubes in the decomposition. Therefore we need a subtle additional condition in the decomposition for cubes of size ε.
The first step in our argument is to prove that sets of 'large density' are reached by the process with positive probability. This is done in the following lemma. There we employ the ε-ABP estimate, Theorem 4.7, with the characteristic function of A as a right-hand side, and further estimates from Section 2.4. Recall that $T_A$ denotes the hitting time for A and $\tau_A$ the exit time, that is, $T_A = \min\{k \in \mathbb{N} : X_k \in A\}$ and $\tau_A = \min\{k \in \mathbb{N} : X_k \notin A\}$.
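As an illustration of these two stopping times, the following minimal Python sketch evaluates the hitting time and the exit time along a finite sample path and checks the event that a target set is hit before a domain is left. The helper names (`hitting_time`, `exit_time`, `centered_cube`, `reaches_before_exit`) are hypothetical, the cube is assumed to be open and centered at the origin, and the time index is assumed to start at k = 0; none of these conventions is fixed by the text above.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Indicator = Callable[[Point], bool]


def hitting_time(path: Sequence[Point], in_set: Indicator) -> Optional[int]:
    """First index k with path[k] inside the set, or None if it is never hit."""
    for k, x in enumerate(path):
        if in_set(x):
            return k
    return None


def exit_time(path: Sequence[Point], in_set: Indicator) -> Optional[int]:
    """First index k with path[k] outside the set, or None if it never leaves."""
    for k, x in enumerate(path):
        if not in_set(x):
            return k
    return None


def centered_cube(side: float) -> Indicator:
    """Indicator of the open cube of the given side length centered at 0 (assumed convention)."""
    half = side / 2.0
    return lambda x: all(abs(c) < half for c in x)


def reaches_before_exit(path: Sequence[Point], in_target: Indicator, in_domain: Indicator) -> bool:
    """True when the hitting time of the target is strictly smaller than the exit time of the domain."""
    t = hitting_time(path, in_target)
    tau = exit_time(path, in_domain)
    if t is None:
        return False
    return tau is None or t < tau
```

On a finite path the functions return `None` when the corresponding event does not occur within the recorded steps, which is a convention of this sketch rather than of the paper.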
In the following lemma we prove that sets of positive measure are reached by the process with positive probability when ε 0 /2 ≤ ε < ε 0 . By performing a scaling of the space and step size we later use the result for cubes of size comparable to ε in Theorem 5.7, see Remark 5.3.
Proof. We define N 0 = 2 √ N ε0 + 1 and consider the event E that the first $N_0$ movements are uniformly distributed, that is, $E = \{c_1 = \dots = c_{N_0} = 1\}$. We have . Observe that $P(E) = \beta^{N_0}$. If a uniform random step occurs, then the step size is at most ε. Hence, after $N_0$ uniform random steps the token is at a distance of at most from the starting point. Therefore, we have that all the steps until time $N_0$ are inside of Q 10 We consider a sequence $U_i$ of independent random variables uniformly distributed in $B_1$, and we define $Y = \sum_{i=1}^{N_0} U_i$. Let f denote the density of Y ; it is a radially decreasing function, strictly positive in the ball of radius $N_0$. Given $x_0 = x \in Q_1$ and $y \in Q_1$ we can bound Finally we obtain Therefore the result holds for γ = 1
Remark 5.3. Given a cube Q there exists an affine transformation h(x) = ax + b such that $h(Q) = Q_1$. Given the process $X_k$ we can consider the process $h(X_k)$.
Observe that this new process is of the type that we are considering, for $\tilde\varepsilon = a\varepsilon$ and the pushforward measure $\tilde\nu$ given by $\tilde\nu_x(A) = \nu_{h^{-1}(x)}(h^{-1}(A))$. Then, results established for $Q_1$ such as Lemma 5.2 can be applied to cubes of any size. Moreover, if $\varepsilon_0/2 \le \tilde\varepsilon < \varepsilon_0$, then the constant γ only depends on $\varepsilon_0$.
Now we state our version of the Calderón-Zygmund lemma. In the discrete setting, ε sets a natural limit for the scale. To control the error when stopping the decomposition at the level ε, we introduce an additional condition. When applying the decomposition, a careful choice of the parameters allows us to guarantee the two opposite goals: there are enough cubes in the decomposition and the share of A in measure is still large enough.
First we introduce some notation. We denote by $D_\ell$ the family of dyadic open subcubes of $Q_1$ of generation $\ell \in \mathbb{N}$. That is, $D_0 = \{Q_1\}$, $D_1$ is the family of $2^N$ dyadic open cubes obtained by dividing $Q_1$, and so on. Given $\ell \in \mathbb{N}$ and $Q \in D_\ell$ we define $\mathrm{pre}(Q) \in D_{\ell-1}$ as the unique dyadic cube in $D_{\ell-1}$ containing Q.
Next we will construct a collection of (open) cubes $Q_B$, containing subcubes from generations $D_0, D_1, \dots, D_L$, and a set $B := \bigcup_{Q \in Q_B} Q$.
By the assumption we first observe Then we split $Q_1$ into $2^N$ dyadic cubes $D_1$. For those dyadic cubes $Q \in D_1$ that satisfy For other dyadic cubes that do not satisfy (5.2) and are not contained in any cube already included in $Q_B$, we keep splitting, and again repeat the selection according to (5.2). We repeat splitting $L \in \mathbb{N}$ times. At the level L, in addition to the previous process, we also select those cubes $Q \in D_L$ (not the predecessors) into $Q_B$ for which $\delta|Q| \ge |A \cap Q| > \tilde\delta|Q|$. (5.3) Now the following lemma holds.
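The selection procedure can be sketched in code. The following one-dimensional Python sketch is only an illustration under stated assumptions: the set A is represented by a 0/1 indicator on a uniform grid of the unit interval, dyadic cubes are encoded as pairs (generation, index), and condition (5.2) is assumed to read |A ∩ Q| > δ|Q| with the predecessor pre(Q) being selected, which is how the condition is used later in the proof of Theorem 5.7. The names `modified_cz`, `density`, `is_covered` and `measure` are hypothetical.

```python
from typing import List, Sequence, Tuple

Cube = Tuple[int, int]  # (generation g, index i) encodes the interval [i / 2**g, (i + 1) / 2**g)


def density(cells: Sequence[int], gen: int, idx: int) -> float:
    """Fraction of the dyadic interval (gen, idx) occupied by A."""
    size = len(cells) >> gen
    return sum(cells[idx * size:(idx + 1) * size]) / size


def is_covered(cube: Cube, selected: List[Cube]) -> bool:
    """True if the cube equals, or is contained in, an already selected cube."""
    gen, idx = cube
    return any(g <= gen and (idx >> (gen - g)) == i for g, i in selected)


def modified_cz(cells: Sequence[int], delta: float, delta_tilde: float, depth: int) -> List[Cube]:
    """Dyadic selection stopped at generation `depth`, with the extra rule at the last level."""
    n = len(cells)
    if n & (n - 1) != 0 or n < (1 << depth):
        raise ValueError("need a power-of-two grid with at least 2**depth cells")
    if sum(cells) / n > delta:
        raise ValueError("the sketch assumes |A| <= delta |Q_1|")
    selected: List[Cube] = []
    for gen in range(1, depth + 1):
        for idx in range(1 << gen):
            if is_covered((gen, idx), selected):
                continue
            # assumed form of (5.2): high density in Q, so the predecessor of Q is selected
            if density(cells, gen, idx) > delta:
                selected.append((gen - 1, idx >> 1))
    for idx in range(1 << depth):
        if is_covered((depth, idx), selected):
            continue
        # extra rule (5.3) at the last level: intermediate density, Q itself is selected
        if delta_tilde < density(cells, depth, idx) <= delta:
            selected.append((depth, idx))
    return selected


def measure(selected: List[Cube]) -> float:
    """Total length of the selected (pairwise disjoint) intervals."""
    return sum(2.0 ** -g for g, _ in selected)
```

For the grid `[1,1,1,1, 0,0,0,0, 1,0,0,0, 0,0,0,0]` with δ = 0.6, $\tilde\delta$ = 0.2 and depth 3, the sketch selects the left half of the interval (as the predecessor of a full quarter) and one interval of the last generation, and the resulting set B satisfies |A| ≤ δ|B| + $\tilde\delta$, which is the form in which the decomposition is applied in the proof of Theorem 5.7.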
Proof. Observe that for pre(Q) selected according to (5.2) into Q B , it holds that since otherwise we would have stopped splitting already at the earlier round. Also for cubes Q selected according to (5.3) into Q B , it holds that |A ∩ Q| ≤ δ|Q|.
Summing up, for all the cubes Q ∈ Q B , it holds Moreover, by construction, the cubes in Q B are disjoint.
We define G L as a family of cubes of D L that covers Q 1 \ B a.e. It immediately holds a.e.
By this, using (5.4), as well as observing that $|A \cap Q| \le \tilde\delta|Q|$ by (5.3) for every $Q \in G_L$, we get
Before proceeding to the main result, we need to show that if the stochastic process starts in a certain cube, it will reach any subcube in the next level of the dyadic decomposition with positive probability. We also need to show that for any starting point in $Q_1$, the process reaches $Q_{1/2}$ with positive probability. We obtain these results as a corollary of the following lemma.
Observe that ϕ is smooth in that region for ε 2 small enough.
We are ready to prove our main result, that a set of positive measure is reached by the process with positive probability. The idea is the following: given a suitable set A we construct B using the Calderón-Zygmund lemma such that B is larger than A in measure. Using this, we prove that the process reaches B (estimate (5.7) below) and then A by considering two alternatives (estimates (5.9) and (5.10) below).
First, observe that By Corollary 5.6 we have that $P^x(T_{Q_{1/2}} < \tau_{Q_{10\sqrt{N}}})$ is positive and by Lemma 5.1 we have a positive lower bound for $P^y(T_A < \tau_{Q_1})$ for $y \in Q_{1/2}$ whenever $|Q_1 \setminus A|$ is small enough. Therefore the probability $P^x(T_A < \tau_{Q_{10\sqrt{N}}})$ is uniformly bounded from below for A such that $|Q_1 \setminus A|$ is small enough. We get $1 > q_0$.
By the previous observation, we may choose $q > q_0$ such that $(q + q^2)/2 < q_0$. Thus for $\eta := (q - q^2)/2$ we have Given $A \subset Q_1$ with $q \ge |A| > q - \eta$, we consider the union of cubes B constructed in Lemma 5.4 for $\delta = q$, $\tilde\delta = \eta$, and $L \in \mathbb{N}$ such that $2^L\varepsilon < \varepsilon_0 \le 2^{L+1}\varepsilon$. Observe that L depends on ε, that is, the depth of the Calderón-Zygmund decomposition depends on ε. This is what allows us to have the smaller cubes in the decomposition of side length comparable to ε. All the other constants are independent of ε. With these choices, by the Calderón-Zygmund Lemma 5.4, we have $|A| \le q|B| + \eta$, that is Hence, by the definition of ϕ and (5.6), we have since by the choice of q we had $q > q_0$. We can estimate Now we estimate $P^y(T_A < \tau_{Q_{10\sqrt{N}}})$ for $y \in B$, distinguishing two cases. Because of the construction of B we know that one of the following must hold:
• There exists a dyadic cube Q with side length equal to $1/2^L$ such that $y \in Q \subset B$ and $|A \cap Q| > \tilde\delta|Q| = \eta|Q|$, or
• There exists a dyadic cube Q with side length larger than or equal to $1/2^L$ such that $y \in \mathrm{pre}(Q) \subset B$ and $|A \cap Q| > \delta|Q| = q|Q|$.
In the first case, by scaling the cube Q to $Q_1$ (see Remark 5.3) we obtain a process for $\tilde\varepsilon$ with $\varepsilon_0/2 \le \tilde\varepsilon = 2^L\varepsilon \le \varepsilon_0$. By applying Lemma 5.2 we obtain for $y \in Q \subset B$ Observe that γ depends on $\varepsilon_0$ but not on ε.
In the second case we scale pre(Q) to $Q_1$ and obtain a version of the process for $\tilde\varepsilon \le \varepsilon_0$ and some $\tilde\nu$. We may assume that the scaled version of Q is $P = Q_1 \cap \{x : x_i > 0 \text{ for } i = 1, \dots, N\}$. We can bound the probability of reaching P by Corollary 5.6 and then the probability of reaching A using that $|A \cap Q| > q|Q|$. By the choice of q, we obtain for $y \in \mathrm{pre}(Q) \subset B$ $P^y(T_A < \tau_{Q_{10\sqrt{N}}}) \ge p\varphi(q) > 0$. (5.10) Using (5.9), (5.10) and (5.7) in (5.8), we conclude that $P^y(T_A < \tau_{Q_{10\sqrt{N}}}) \ge \varphi(q)\min\{\eta\gamma, p\varphi(q)\} > 0$. Hence, $\varphi(\xi) > 0$ for every $\xi > q - \eta$, which is a contradiction.
Now we state a version of the classical De Giorgi oscillation lemma for the subsolutions of the DPP.
We have $B_{1/2} \subset Q_1$. We take $k = 2(5N + \Lambda\varepsilon_0)$, so that $X_{\tau_{Q_{10\sqrt{N}}}} \in B_{k/2}$. We define $A = B_{1/2} \cap \{\tilde u \le m\}$ and consider the stopping time where the first inequality holds since $\tilde u$ is a subsolution to the DPP and we have bounded $E^{\tilde x}[\varepsilon^2 T]$ by Lemma 2.9.
Observe that $\inf_{\tilde x \in B_{1/2}} P^{\tilde x}(T_A < \tau_{Q_{10\sqrt{N}}})$ is positive as stated in Theorem 5.7. Also observe that $\|\tilde f\|_\infty = (2R)^2\|f\|_\infty$. Therefore, we have proved the result, since bounding $\tilde u(\tilde x)$ for every $\tilde x \in B_{1/2} \subset Q_1$ is equivalent to bounding u(x) for every $x \in B_R$. Observe that the values of k and η do not depend on ε or R. An analogous statement holds for supersolutions.

Proof of the Hölder estimate
The Hölder estimate follows from the De Giorgi oscillation lemma, Lemma 5.8, after a finite iteration. We include here the details as we have to take special care of the role of ε in the arguments. We also define osc(A) = sup Observe that osc(B R ) = O(R).
Lemma 6.1. There exist λ < 1 and k > 1 such that for every solution u to the DPP defined in $B_{kR}$ we have for every R and $\varepsilon < R\varepsilon_0$.
Proof. We can assume that $O(kR) \neq 0$, where k is given by Lemma 5.8. We consider $l = (M(kR) + m(kR))/2$. Either Suppose that the first holds (the proof is completely analogous in the other case). Then, since $u \ge m(kR)$ and $l = m(kR) + O(kR)/2$, Lemma 5.8 implies that Thus, the statement holds for $\lambda = 1 - \eta/2$.
By iterating the oscillation estimate from Lemma 6.1 we can obtain the Hölder regularity. To that end we prove the following lemma.
Lemma 6.2. If $O(s) \ge 0$ is a non-decreasing function and $O(s) \le \lambda O(ks) + C s^2\|f\|_\infty$ for every $s > \xi$, for some $\lambda \in (0, 1)$, $k > 1$ and $\xi > 0$ such that $\lambda k^2 > 1$, then
Proof. Since k > 1 there exists a unique $m \in \mathbb{N}_0$ such that By using repeatedly that $O(s) \le \lambda O(ks) + C s^2\|f\|_\infty$, for $s = R/k^m, R/k^{m-1}, \dots, R/k$, we obtain Observe that we have used the hypothesis for values larger than ξ, in fact We have, Thus, the inequality follows since
Observe that Lemmas 6.1 and 6.2 prove that given u a solution to the DPP defined in $B_R(x)$, for $\varepsilon < \rho\varepsilon_0$. We are ready to prove the Hölder estimate.
Proof of Theorem 1.1. Given x, z ∈ B R we consider ρ = |x − z|.
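The way the iteration in Lemma 6.2 produces a Hölder-type decay can be checked numerically. The Python sketch below applies the recursion O(s) ≤ λO(ks) + Cs²‖f‖∞ at the scales s = R/k, ..., R/k^m, starting from a bound on O(R), and compares the result with the closed form λ^m (O(R) + C R²‖f‖∞/(λk² − 1)), where λ^m = (k^{-m})^γ and γ = log(1/λ)/log k. This closed form is one admissible bound obtained by summing the geometric series, not necessarily the constant used in the paper, and the function names are hypothetical. The restriction s > ξ of the lemma is not modeled: the caller is assumed to pick m so that all scales used stay above ξ.

```python
import math


def iterate_oscillation(osc_R: float, R: float, lam: float, k: float,
                        C: float, f_norm: float, m: int) -> float:
    """Bound on O(R / k**m) obtained by applying the recursion m times, starting from O(R) <= osc_R."""
    bound = osc_R
    for j in range(1, m + 1):
        s = R / k ** j
        # one application of O(s) <= lam * O(k s) + C s^2 ||f||
        bound = lam * bound + C * s ** 2 * f_norm
    return bound


def holder_exponent(lam: float, k: float) -> float:
    """Exponent gamma with lam == k ** (-gamma)."""
    return math.log(1.0 / lam) / math.log(k)


def closed_form_bound(osc_R: float, R: float, lam: float, k: float,
                      C: float, f_norm: float, m: int) -> float:
    """Geometric-series bound; requires lam * k**2 > 1 as in the lemma."""
    if lam * k ** 2 <= 1.0:
        raise ValueError("the hypothesis lam * k**2 > 1 is needed")
    gamma = holder_exponent(lam, k)
    return (k ** -m) ** gamma * (osc_R + C * R ** 2 * f_norm / (lam * k ** 2 - 1.0))
```

The hypothesis λk² > 1 is what makes the contributions of the source term summable against λ^m, so that the decay rate is governed by λ alone.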

Generalization to Pucci-type operators and inequalities
Here we explain how to modify our arguments to include solutions to Pucci-type operators and inequalities. Our method is robust and essentially the same arguments remain valid.
We start by defining the operator and then a stochastic process associated to it.
For each ε > 0 we consider a stochastic process starting at x 0 ∈ R N . The process is driven by a controller. Given the value of x k , the next position of the process x k+1 is determined as follows. A biased coin is tossed. If we get heads (probability α), the controller chooses z ∈ B Λ and we have x k+1 = x k ±εz, each with probability one half. If we get tails (probability β), x k+1 is distributed uniformly in the ball B ε (x k ).
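A single run of this process can be simulated as follows. This is a minimal Python sketch under simplifying assumptions: the controller is modeled as a function of the current position only (the strategies considered in the text may depend on the whole history), β is taken as 1 − α, and the names `uniform_in_ball`, `step` and `simulate` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def uniform_in_ball(center: Sequence[float], radius: float, rng: random.Random) -> Point:
    """Sample a point uniformly from the ball of the given radius around center."""
    n = len(center)
    norm = 0.0
    while norm == 0.0:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # isotropic random direction
        norm = math.sqrt(sum(c * c for c in g))
    r = radius * rng.random() ** (1.0 / n)  # radial law of the uniform distribution
    return tuple(c + r * gi / norm for c, gi in zip(center, g))


def step(x: Point, eps: float, alpha: float,
         controller: Callable[[Point], Sequence[float]], rng: random.Random) -> Point:
    """One transition: controlled symmetric jump with probability alpha, uniform jump otherwise."""
    if rng.random() < alpha:
        z = controller(x)  # assumed to return a point of B_Lambda
        sign = 1.0 if rng.random() < 0.5 else -1.0
        return tuple(c + sign * eps * zi for c, zi in zip(x, z))
    return uniform_in_ball(x, eps, rng)


def simulate(x0: Sequence[float], eps: float, alpha: float,
             controller: Callable[[Point], Sequence[float]],
             n_steps: int, seed: int = 0) -> List[Point]:
    """Path x_0, ..., x_{n_steps} of the process for a position-dependent controller."""
    rng = random.Random(seed)
    path: List[Point] = [tuple(x0)]
    for _ in range(n_steps):
        path.append(step(path[-1], eps, alpha, controller, rng))
    return path
```

For instance, `simulate((0.0, 0.0), eps=0.1, alpha=0.3, controller=lambda x: (1.0, 0.0), n_steps=200)` produces a path in which every step has length at most Λε for any Λ ≥ 1 with the controller's choice in $B_\Lambda$.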
To be more precise, a strategy S for the controller is a measurable function defined on the partial histories, that is Then the process is moved according to this choice. That is, given A ∈ B and c = 0 or 1, we have the following transition probabilities For a fixed strategy we have a process as before. The only difference is that now the measure ν may depend not only on x but also on S (this does not introduce any difference in our arguments). For a fixed S we denote by $E^{x_0}_S$ the corresponding expectation. All the estimates obtained for $E^{x_0}$ hold for $E^{x_0}_S$ and are independent of S.
We consider a game where the controller is paid $g(x_\tau)$ at the end, so her goal is to maximize that value. The expectation of her earnings is given by where $E^{x_0}_S$ stands for the expectation with respect to the process and S is the strategy adopted by the controller. The function u : R N → R satisfies the DPP given by u(y) dy for x ∈ Ω, and u(x) = g(x) for x ∉ Ω.
We can consider a version of the game where, whenever the token leaves a point $x_i$, the controller is paid $\varepsilon^2 f(x_i)$. In this case the expectation of her earnings is given by It turns out, as will be shown below, that u is the unique bounded Borel measurable function that satisfies for x ∈ Ω and u(x) = g(x) for x ∉ Ω. Or equivalently $L^+_\varepsilon u + f = 0$. The existence of a solution to equation (7.2) can be seen as before. Next we prove the analogue of Theorem 3.4.
Theorem 7.2. The function u given by (7.1) is the unique bounded solution to equation (7.2) with boundary values g.

Proof.
Let v be a solution to equation (7.2). Given a strategy $S_0$ we have Thus $\{v(X_k) + \varepsilon^2\sum_{i=0}^{k-1} f(X_i)\}_k$ is a supermartingale. Then, by Doob's optional stopping theorem (recall that v and f are bounded) we have Since this holds for every strategy and v( On the other hand, given η > 0 we consider a strategy $S_0$ that almost maximizes the right-hand side of (7.2), that is S 0 (x 0 , . . . , The strategy can be taken measurable similarly to Lemma 3.1 in [LPS14].
We have Since this holds for every η > 0 we conclude that v ≤ u. Thus v = u. We have proved that u is a solution to (7.2) and that every solution coincides with it; hence the solution is unique.
Finally, we state again Theorem 1.2, which is our main result of this section: one only needs Pucci-type inequalities in order to obtain the regularity result.
Remark 7.4. Observe that the ε-ABP estimate (Theorem 4.1), as well as all the results from Section 4, are valid if we consider the maximal Pucci-type operator $L^+_\varepsilon$ instead of $L_\varepsilon$. This is due to the fact that (similarly as in equations (4.4) and (4.5)), if u is a bounded Borel measurable function satisfying $L^+_\varepsilon u + f \ge 0$ in Ω, then $L^+_\varepsilon \Gamma + f \ge 0$ in $K_u$, where Γ is the concave envelope of u and $K_u$ is the set of contact points defined in (4.1). Hence, using this together with the fact that the second differences satisfy $\delta\Gamma(x_0, y) \le 0$ for each $x_0 \in K_u$, we can estimate the α-term in $L^+_\varepsilon \Gamma(x_0)$, so $L^+_\varepsilon \Gamma(x_0) \le \frac{\beta}{2\varepsilon^2}\,\frac{1}{|B_\varepsilon|}\int_{B_\varepsilon} \delta\Gamma(x_0, y)\,dy$.
This is analogous to the inequality (4.12) in the proof of Lemma 4.4, and it is indeed the cornerstone in all the estimates from Section 4.
With the analogous results of Sections 2 and 4 for $L^+_\varepsilon$ in hand, those of Section 5 follow. However, a key modification is needed in the analogue of Lemma 5.8 where, after establishing some estimates related to the stochastic process, solutions to the DPP are considered. Here we adapt our argument to functions satisfying (7.3).
Given η > 0 we consider the strategy $S_0$ that almost maximizes the right-hand side of (7.4), that is $S_0(x_0, \dots, x_k) = \tilde z \in B_\Lambda$ such that As in the proof of Theorem 7.2 we get that $\{v(X_k) + \varepsilon^2\sum_{i=0}^{k-1} f(X_i) - \eta 2^{-k}\}_k$ is a submartingale for $E^{\tilde x}_{S_0}$.
Observe that $\inf_{\tilde x \in B_{1/2}} P^{\tilde x}(T_A < \tau_{Q_{10\sqrt{N}}})$ is positive as stated in Theorem 5.7. Also observe that $\|\tilde f\|_\infty = (2R)^2\|f\|_\infty$. Therefore, we have proved the result, since bounding $\tilde u(\tilde x)$ for every $\tilde x \in B_{1/2}$ is equivalent to bounding u(x) for every $x \in B_R$. Finally, since the inequality holds for every η > 0, it also holds without the η-term.
Remark 7.6. Given nonempty subsets $M_x \subset M(B_\Lambda)$ for each $x \in \mathbb{R}^N$ with suitable measurability requirements, we can consider solutions to the equation Observe that any such function satisfies (7.3).
Thus solutions to (7.6) satisfy the hypotheses of Theorem 7.3.
In a similar way, there is a large family of discrete operators associated with different PDEs that satisfy the hypotheses of our main result.