On Dantzig and Lasso estimators of the drift in a high dimensional Ornstein-Uhlenbeck model

In this paper we present new theoretical results for the Dantzig and Lasso estimators of the drift in a high dimensional Ornstein-Uhlenbeck model under sparsity constraints. Our focus is on oracle inequalities for both estimators and on error bounds with respect to several norms. In the context of the Lasso estimator our paper is strongly related to [11], which investigated the same problem under row sparsity. We improve their rates and also prove the restricted eigenvalue property solely under an ergodicity assumption on the model. Finally, we present a numerical analysis to illustrate the finite sample performance of the Dantzig and Lasso estimators.


Introduction
During the past decades immense progress has been achieved in statistics for stochastic processes. Nowadays, comprehensive studies on statistical inference for diffusion processes under low and high frequency observation schemes can be found in the monographs [14,17,19]. Most of the existing literature considers a fixed dimensional parameter space, while the high dimensional framework has received much less attention in the diffusion setting.
Since the pioneering work of McKean [20,21], high dimensional diffusions entered the scene in the context of modelling the movement of gas particles. More recently, they have found numerous applications in economics and biology, among other disciplines [3,6,9]. Typically, high dimensional diffusions are studied in the framework of mean field theory, which aims at bridging the interaction of particles at the microscopic scale and the mesoscopic features of the system (see e.g. [28] for a mathematical study). In physics particles are often assumed to be statistically identical, but this homogeneity assumption is not appropriate in other applications. For instance, in [6] high dimensional SDEs are used to model the wealth of trading agents in an economy, who are often far from being equal in their trading behaviour. Another example is the flocking phenomenon of individuals [3], where it seems natural to assume that there are only very few "leaders" who have a distinguished role in the community. These examples motivate the investigation of statistical inference for diffusion processes under sparsity constraints.
This paper focuses on the statistical analysis of a d-dimensional Ornstein-Uhlenbeck model of the form

dX_t = -A_0 X_t dt + dW_t, t ≥ 0, (1.1)

defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P), with underlying observation (X_t)_{t∈[0,T]}. Here W denotes a standard d-dimensional Brownian motion and A_0 ∈ R^{d×d} represents the unknown interaction matrix. Ornstein-Uhlenbeck processes are one of the most basic parametric diffusion models. When the dimension d is fixed and T → ∞, statistical estimation of the parameter A_0 has been discussed in several papers. Asymptotic analysis of the maximum likelihood estimator in the ergodic case can be found in e.g. [17], while the non-ergodic setting is investigated in [16,18]. Adaptive Lasso estimation for multivariate diffusion models has been studied in [12].
Our main goal is to study the estimation of A_0 under sparsity constraints in the large d/large T setting. Such a mathematical problem finds its main motivation in the analysis of connectedness among banks whose wealth is modelled by the diffusion process X. This field of economics, which studies linkages between a large number of banks associated with e.g. asset/liability positions and contractual relationships, is key to understanding systemic risk in a global economy [13]. Typically, the connectivity structure, which is represented by the parameter A_0, is quite sparse, since only few financial players are significant in an economy, and the main focus is on the estimation of the non-zero components of A_0.
Theoretical results in the high dimensional diffusion setting are rather scarce. In this context we would like to mention the Dantzig selector, which was introduced in [5] and primarily designed for linear regression models. More specifically, [5] established sharp non-asymptotic bounds on the l_2-error of the estimated coefficients and proved that the error is within a factor of log(d) of the error that would have been achieved if the locations of the non-zero coefficients were known. Further extensions of the aforementioned results can be found in [10] and [24], which study the Dantzig selector for discretely observed linear diffusions and support recovery for the drift coefficient, respectively. Our work is closely related to the recent article [11], where estimation of A_0 under row sparsity has been investigated. The authors propose to use the classical Lasso approach and derive upper and lower bounds for the estimation error. We build upon their analysis and provide oracle inequalities and non-asymptotic theory for the Lasso and Dantzig estimators. In comparison to [11], we obtain an improved upper bound for the Lasso estimator, which essentially matches the theoretical lower bound, and also show that the restricted eigenvalue property is automatically satisfied under the ergodicity condition on the model (1.1) (in [11] the extra assumption (H4) has been imposed). The latter is proved via Malliavin calculus methods proposed in [22]. Moreover, we show that the Lasso and Dantzig estimators are asymptotically equivalent, which is a well known fact in linear regression models (cf. [2]). Finally, we present a simulation study to uncover the finite sample properties of both estimators.
The paper is organised as follows. Section 2 is devoted to the exposition of the classical estimation theory in the fixed dimensional setting and to the definition of the Lasso and Dantzig estimators. Concentration inequalities for various stochastic terms are derived in Section 3. In particular, we show the restricted eigenvalue property under the ergodicity assumption via Malliavin calculus methods. In Section 4 we present oracle inequalities and error bounds for both estimators. Numerical simulation results are presented in Section 5. Finally, some proofs are collected in Section 6.

Notation
In this subsection we briefly introduce the main notation used throughout the paper. For a vector or a matrix x the transpose of x is denoted by x^⊤. For p ≥ 1 and A ∈ R^{d_1×d_2} we define the l_p-norm as

‖A‖_p := ( Σ_{1≤i≤d_1, 1≤j≤d_2} |A_{ij}|^p )^{1/p}.

We denote by ‖A‖_∞ = lim_{p→∞} ‖A‖_p the maximum norm and set ‖A‖_0 := Σ_{1≤i≤d_1, 1≤j≤d_2} 1_{{A_{ij} ≠ 0}}. We associate to the Frobenius norm ‖·‖_2 the scalar product

⟨A, B⟩_F := tr(A^⊤ B),

where tr denotes the trace. For a symmetric matrix A ∈ R^{d×d} we write λ_max(A), λ_min(A) for the largest and the smallest eigenvalue of A, respectively. We denote by ‖A‖_op the operator norm of A. For a quadratic matrix A ∈ R^{d×d}, diag(A) stands for the diagonal matrix satisfying diag(A)_{ii} = A_{ii}. We also introduce the cone

C(s, c_0) := { A ∈ R^{d_1×d_2} \ {0} : ‖A_{I_s(A)^c}‖_1 ≤ c_0 ‖A_{I_s(A)}‖_1 }, (2.1)

where c_0 > 0 and I_s(A) is a set of coordinates of the s largest (in absolute value) elements of A. Furthermore, vec denotes the vectorisation operator and ⊗ stands for the Kronecker product. For z ∈ C we denote by Re(z) (resp. Im(z)) the real (resp. imaginary) part of z. Finally, for stochastic processes (Y_t)_{t∈[0,T]} and (Z_t)_{t∈[0,T]} we set

⟨Y, Z⟩_{L²} := (1/T) ∫_0^T ⟨Y_t, Z_t⟩ dt and ‖Y‖_{L²} := ⟨Y, Y⟩_{L²}^{1/2}.

The setting and fixed dimensional theory
We consider the d-dimensional Ornstein-Uhlenbeck process introduced in (1.1). Throughout this paper the matrix A_0 is assumed to satisfy the following condition:

(H) The matrix A_0 admits the spectral decomposition A_0 = P_0 diag(θ_1, ..., θ_d) P_0^{-1}, where the column vectors of P_0 are eigenvectors of A_0. Furthermore, the eigenvalues θ_1, ..., θ_d ∈ C have strictly positive real parts:

r_0 := min_{1≤j≤d} Re(θ_j) > 0. (2.2)

It is well known that under condition (H) the stochastic differential equation (1.1) exhibits a unique stationary solution, which can be written explicitly as

X_t = ∫_{-∞}^t exp(-A_0 (t-s)) dW_s.

In this case we have that

E[X_t X_t^⊤] = C_∞ := ∫_0^∞ exp(-A_0 s) exp(-A_0^⊤ s) ds.

We assume that the complete path (X_t)_{t∈[0,T]} is observed and we are interested in estimating the unknown parameter A_0. Let us briefly recall the classical maximum likelihood theory when d is fixed and T → ∞. When P_T^A denotes the law of the process (1.1) with transition matrix A restricted to F_T, the log-likelihood function is explicitly computed via Girsanov's theorem as

log (dP_T^A / dP_T^0)(X) = -∫_0^T ⟨A X_t, dX_t⟩ - (1/2) ∫_0^T ‖A X_t‖_2^2 dt. (2.3)

Consequently, the maximum likelihood estimator Â_ML is given by

Â_ML = -( ∫_0^T dX_t X_t^⊤ ) ( ∫_0^T X_t X_t^⊤ dt )^{-1}.

Under condition (H) the estimator Â_ML is asymptotically normal, i.e.

√T vec(Â_ML - A_0) →^d N_{d²}(0, C_∞^{-1} ⊗ id), (2.4)
with id denoting the d-dimensional identity matrix. Indeed, we have the identity

Â_ML - A_0 = -ε_T C_T^{-1} with ε_T := (1/T) ∫_0^T dW_t X_t^⊤ and C_T := (1/T) ∫_0^T X_t X_t^⊤ dt, (2.5)

and the result (2.4) follows from the standard martingale central limit theorem. We refer to [17, p. 120-124] for a more detailed exposition.
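For a discretely sampled path, the estimator Â_ML is available in closed form once the two integrals in (2.5) are replaced by their Riemann/Itô-sum approximations. The following sketch illustrates this; the helper name mle_ou and the discretisation scheme are our illustrative choices, not part of the paper.

```python
import numpy as np

def mle_ou(X, dt):
    """Closed-form MLE of A_0 in dX_t = -A_0 X_t dt + dW_t.

    X is an (n+1, d) array of equidistant observations with step dt.
    We approximate G = (1/T) int dX_t X_t' and C_T = (1/T) int X_t X_t' dt
    and return A_ML = -G C_T^{-1}, the solution of grad L_T(A) = 0.
    """
    n = X.shape[0] - 1
    T = n * dt
    dX = np.diff(X, axis=0)             # increments X_{(k+1)dt} - X_{k dt}
    G = dX.T @ X[:-1] / T               # Ito-sum approximation of (1/T) int dX_t X_t'
    C = dt * (X[:-1].T @ X[:-1]) / T    # Riemann-sum approximation of C_T
    return -G @ np.linalg.inv(C)
```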
When assumption (H) is violated, the asymptotic theory for the maximum likelihood estimator Â_ML is more complex. If some eigenvalues satisfy Re(θ_j) < 0, exponential rates appear, as has been shown in [18]. A further application of Ornstein-Uhlenbeck processes to co-integration is discussed in [16], where the condition Re(θ_j) = 0 appears for some j's.

The Lasso and Dantzig estimators
Now we turn our attention to the large d/large T setting. We consider the Ornstein-Uhlenbeck model (1.1) satisfying assumption (H) and assume that the unknown transition matrix A_0 satisfies the sparsity constraint

‖A_0‖_0 ≤ s_0. (2.6)

We remark that due to condition (2.2) it must necessarily hold that s_0 ≥ d. A standard approach to estimate A_0 under the sparsity constraint (2.6) is the Lasso method, which has been investigated in [11] in the framework of an Ornstein-Uhlenbeck model. The Lasso estimator is defined as

Â_L := argmin_{A ∈ R^{d×d}} { L_T(A) + λ ‖A‖_1 } with L_T(A) := (1/T) ∫_0^T ⟨A X_t, dX_t⟩ + (1/(2T)) ∫_0^T ‖A X_t‖_2^2 dt, (2.7)

where λ > 0 is a tuning parameter. We remark that Â_L can be computed efficiently, since it is a solution of a convex optimisation problem. Next, we introduce the Dantzig estimator of the parameter A_0. According to (2.3) the quantity L_T(A) can be written as

L_T(A) = (1/2) ‖(A - A_0) X‖_{L²}^2 + ⟨ε_T, A⟩_F - (1/2) ‖A_0 X‖_{L²}^2. (2.8)

We recall that B belongs to the subdifferential ∂f(A_0) of a convex function f: R^{d×d} → R if and only if f(A) ≥ f(A_0) + ⟨B, A - A_0⟩_F for all A ∈ R^{d×d}. In particular, any B ∈ ∂‖B_0‖_1 satisfies the constraint ‖B‖_∞ ≤ 1. A necessary and sufficient condition for the minimiser in (2.7) is that 0 belongs to the subdifferential of the function A ↦ L_T(A) + λ ‖A‖_1. This implies that the Lasso estimator Â_L satisfies the constraint

‖∇L_T(Â_L)‖_∞ ≤ λ, where ∇L_T(A) = (1/T) ∫_0^T dX_t X_t^⊤ + A C_T. (2.9)

Now, the Dantzig estimator Â_D of the parameter A_0 is defined as a matrix with the smallest l_1-norm that satisfies the inequality (2.9), i.e.

Â_D := argmin { ‖A‖_1 : A ∈ R^{d×d}, ‖∇L_T(A)‖_∞ ≤ λ }.
By definition of the Dantzig estimator we have that ‖Â_D‖_1 ≤ ‖Â_L‖_1. In particular, when the tuning parameter λ is set to the same value for the Lasso and Dantzig estimators, the Lasso estimate is always a feasible solution of the Dantzig minimisation problem, although it may not necessarily be the optimal one. This implies that, whenever the respective solutions are not identical, the Dantzig solution has smaller l_1-norm than the Lasso solution (see [15], Appendix A for details). From a computational point of view, the Dantzig estimator can be found numerically via linear programming.
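To make the computational remarks concrete, the following sketches implement both estimators from a discretised path, using the statistics G ≈ (1/T) ∫_0^T dX_t X_t^⊤ and C_T from (2.5). The helper names (lasso_ou, dantzig_ou), the plain proximal gradient (ISTA) scheme and the linear programming reformulation are our illustrative choices and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def lasso_ou(X, dt, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch for the Lasso estimator (2.7).

    Minimises <G, A>_F + 0.5 tr(A C A') + lam * ||A||_1, where G and C are
    discretisations of (1/T) int dX_t X_t' and C_T; the gradient of the
    smooth part is G + A C and the prox of lam*||.||_1 is soft-thresholding.
    """
    n, d = X.shape[0] - 1, X.shape[1]
    T = n * dt
    dX = np.diff(X, axis=0)
    G = dX.T @ X[:-1] / T
    C = dt * (X[:-1].T @ X[:-1]) / T
    eta = 1.0 / np.linalg.eigvalsh(C).max()   # step size 1/L with L = lambda_max(C)
    A = np.zeros((d, d))
    for _ in range(n_iter):
        Z = A - eta * (G + A @ C)             # gradient step on the smooth part
        A = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam, 0.0)  # soft-threshold
    return A

def dantzig_ou(X, dt, lam):
    """Linear programming sketch for the Dantzig estimator.

    Minimises ||vec(A)||_1 subject to ||G + A C||_inf <= lam via the standard
    split A = A_plus - A_minus with nonnegative parts; note that
    vec(A C) = (C kron I_d) vec(A) for column-stacked vec and symmetric C.
    """
    n, d = X.shape[0] - 1, X.shape[1]
    T = n * dt
    dX = np.diff(X, axis=0)
    G = dX.T @ X[:-1] / T
    C = dt * (X[:-1].T @ X[:-1]) / T
    M = np.kron(C, np.eye(d))
    c = np.ones(2 * d * d)                    # objective: sum of both parts
    A_ub = np.vstack([np.hstack([M, -M]),     #  vec(G + A C) <= lam
                      np.hstack([-M, M])])    # -vec(G + A C) <= lam
    b_ub = np.concatenate([lam - G.flatten('F'), lam + G.flatten('F')])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method='highs')
    z = res.x
    return (z[:d * d] - z[d * d:]).reshape((d, d), order='F')
```

Both routines only require the sufficient statistics G and C_T, so once these matrices are formed the cost is driven by the dimension d rather than by the sample size.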
The following basic inequality, which is a direct consequence of the fact that L_T(Â_L) + λ ‖Â_L‖_1 ≤ L_T(A) + λ ‖A‖_1 for all A ∈ R^{d×d}, provides the necessary basis for the analysis of the error Â_L - A_0.
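In view of the representation (2.8), the essence of this inequality can be derived in one line; the following display is our sketch of the standard argument (specialised to A = A_0) and may differ from the exact formulation of Lemma 2.1:

\[
L_T(\widehat{A}_L) + \lambda \|\widehat{A}_L\|_1 \le L_T(A_0) + \lambda \|A_0\|_1
\;\Longrightarrow\;
\tfrac{1}{2}\,\big\|(\widehat{A}_L - A_0)X\big\|_{L^2}^2
\le \big\langle \varepsilon_T,\, A_0 - \widehat{A}_L \big\rangle_F
+ \lambda\big(\|A_0\|_1 - \|\widehat{A}_L\|_1\big).
\]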

From Lemma 2.1 it is obvious that we require good control over the martingale term ⟨ε_T, V⟩_F for certain matrices V ∈ R^{d×d} in order to get an upper bound on the prediction error ‖(Â_L - A_0)X‖_{L²}^2. Another important ingredient is the restricted eigenvalue property, which is a standard requirement in the analysis of Lasso estimators (see e.g. [2,4]). In our setting the restricted eigenvalue property amounts to showing that

inf_{V ∈ C(s, c_0)} ‖V X‖_{L²}^2 / ‖V‖_2^2

is bounded away from 0 with high probability.
Interestingly, the latter is a consequence of the model assumption (H) and not an extra condition as in the framework of linear regression. This has been noticed in [11], but an additional condition (H4) was required there, which is in fact not needed, as we will show in the next section.
In order to establish the connection between the Dantzig and the Lasso estimators we will show the inequality ‖Â_L‖_0 ≤ c s_0 for a certain constant c > 0, which holds with high probability. Once the term ‖Â_L‖_0 is controlled, we deduce statements about the error term Â_D - A_0 via the corresponding analysis of Â_L - A_0.

Concentration bounds for the stochastic terms
In this section we derive various concentration inequalities, which play a central role in the analysis of the estimators Â_L and Â_D.

The restricted eigenvalue property
This subsection is devoted to the proof of the restricted eigenvalue property. The main result of this subsection relies heavily on some theoretical techniques presented in [22], where Malliavin calculus is applied in order to obtain tail bounds for certain functionals of Gaussian processes. In the following, we introduce some basic notions of Malliavin calculus; we refer to the monograph [23] for a more detailed exposition.
Let H be a real separable Hilbert space. We denote by B = {B(h): h ∈ H} an isonormal Gaussian process over H. That is, B is a centred Gaussian family with covariance kernel given by

E[B(h) B(g)] = ⟨h, g⟩_H.

We shall use the notation L²(B) = L²(Ω, σ(B), P). For every q ≥ 1, we write H^{⊗q} to indicate the qth tensor product of H; H^{⊙q} stands for the symmetric qth tensor product. We denote by I_q the isometry between H^{⊙q} and the qth Wiener chaos of B. It is well known (see e.g. [23, Chapter 1]) that any random variable F ∈ L²(B) admits the chaotic expansion

F = E[F] + Σ_{q≥1} I_q(f_q),

where the series converges in L² and the kernels f_q ∈ H^{⊙q} are uniquely determined by F. The operator L, called the generator of the Ornstein-Uhlenbeck semigroup, is defined as

L F = -Σ_{q≥1} q I_q(f_q),

whenever the latter series converges in L². Next, let us denote by S the set of all smooth cylindrical random variables of the form F = ϕ(B(h_1), ..., B(h_n)) with ϕ smooth and h_1, ..., h_n ∈ H; the Malliavin derivative of such an F is the H-valued random variable DF = Σ_{i=1}^n ∂_i ϕ(B(h_1), ..., B(h_n)) h_i, and D^{1,2} denotes the closure of S with respect to the norm ‖F‖_{1,2}^2 := E[F²] + E[‖DF‖_H^2]. The Malliavin derivative D verifies the following chain rule: when ϕ: R^n → R is in C_b^1 (the set of continuously differentiable functions with bounded partial derivatives) and (F_i)_{i=1,...,n} is a vector of elements in D^{1,2}, then ϕ(F_1, ..., F_n) ∈ D^{1,2} and

D ϕ(F_1, ..., F_n) = Σ_{i=1}^n ∂_i ϕ(F_1, ..., F_n) D F_i.

The next theorem establishes left and right tail bounds for certain elements Z ∈ D^{1,2}.

Theorem 3.1 (cf. [22]). Assume that Z ∈ D^{1,2} with E[Z] = 0 and define the function

g_Z(z) := E[ ⟨DZ, -DL^{-1}Z⟩_H | Z = z ].

Suppose that the following condition holds for some α ≥ 0 and β > 0:

g_Z(Z) ≤ α Z + β almost surely.

Then, for any z > 0, it holds that

P(Z ≥ z) ≤ exp( -z² / (2αz + 2β) ) and P(Z ≤ -z) ≤ exp( -z² / (2β) ).

Now, we apply Theorem 3.1 to certain quadratic forms of the Ornstein-Uhlenbeck process X. The following result is crucial for proving the restricted eigenvalue property.

Proposition 3.2. Suppose that assumption (H) is satisfied and let C_T be defined as in (2.5). Then it holds for all x > 0 and all v ∈ R^d with ‖v‖_2 = 1:

P( |v^⊤ (C_T - C_∞) v| ≥ x ) ≤ 2 exp( -T x² / H_0(x) ),

where the function H_0 is defined in the proof below in terms of x, ‖P_0‖_op, ‖P_0^{-1}‖_op and r_0, and the quantities P_0 and r_0 are introduced in assumption (H).

Proof. We define the centred stationary Gaussian process Y_t^v := v^⊤ X_t and note that its covariance kernel is given by K(t, s) = v^⊤ E[X_t X_s^⊤] v. The family (Y_t^v) can be considered as an isonormal Gaussian process indexed by a separable Hilbert space H whose scalar product is induced by the covariance kernel of (Y_t^v). We set

Z_T^v := v^⊤ (C_T - C_∞) v = (1/T) ∫_0^T ( (Y_t^v)² - E[(Y_t^v)²] ) dt

and notice that Z_T^v is an element of the second order Wiener chaos. Hence, Z_T^v has a Lebesgue density and we have L^{-1} Z_T^v = -Z_T^v / 2, and we conclude by the chain rule that the function g_{Z_T^v} admits a linear bound as in Theorem 3.1. Consequently, the conditions of Theorem 3.1 are satisfied with explicit constants α and β of order T^{-1}, which yields the assertion.

The statement of Proposition 3.2 corresponds to assumption (H4) in [11], which has been shown to be valid via a log-Sobolev inequality only when A_0 is symmetric (cf. [11, Theorem]). In other words, the extra assumption (H4) is not required, as it directly follows from the modelling setup.
The next theorem proves the restricted eigenvalue property.

Theorem 3.3. Suppose that assumption (H) is satisfied and define the event

E(s, c_0) := { inf_{V ∈ C(s, c_0)} ‖V X‖_{L²}^2 / ‖V‖_2^2 ≥ k_∞ / 2 }, where k_∞ := λ_min(C_∞).

Then for any ε_0 ∈ (0, 1) it holds that

P( E(s, c_0) ) ≥ 1 - ε_0 for all T ≥ T_0(ε_0, s, c_0),

where the constant T_0(ε_0, s, c_0) is defined explicitly in Section 6.1.

Proof. See Section 6.1.
The next corollary presents a deviation bound for the quantity C_T.

Deviation bounds for the martingale term ε T and final estimates
As mentioned earlier, controlling the stochastic term ⟨ε_T, V⟩_F for matrices V ∈ R^{d×d} is crucial for the analysis of the estimators Â_L and Â_D. The martingale property of ε_T turns out to be the key in the next proposition. We remark that the following result is an improvement of [11, Theorem 8].

Proof. We first recall Bernstein's inequality for continuous local martingales. Let (M_t)_{t≥0} be a real-valued continuous local martingale with quadratic variation (⟨M⟩_t)_{t≥0}. Then for any a, b > 0 it holds that

P( M_t ≥ a, ⟨M⟩_t ≤ b ) ≤ exp( -a² / (2b) ).

This result is a straightforward consequence of the exponential martingale technique (cf. Chapter 4, Exercise 3.16 in [25]). By definition, each entry ε_T^{ij} = (1/T) ∫_0^T X_s^j dW_s^i is a continuous martingale, so applying Bernstein's inequality entrywise completes the proof.
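For completeness, the exponential-martingale argument behind Bernstein's inequality can be spelled out in a few lines; the following derivation is a standard sketch and is not quoted verbatim from [25]:

\[
\begin{aligned}
&\mathcal{E}_t^{\theta} := \exp\Big(\theta M_t - \tfrac{\theta^2}{2}\langle M\rangle_t\Big), \quad \theta > 0,
\ \text{is a supermartingale with } E[\mathcal{E}_t^{\theta}] \le 1,\\
&P\big(M_t \ge a,\ \langle M\rangle_t \le b\big)
\le P\big(\mathcal{E}_t^{\theta} \ge e^{\theta a - \theta^2 b/2}\big)
\le e^{-\theta a + \theta^2 b/2},\\
&\text{and optimising the bound at } \theta = a/b \text{ yields } e^{-a^2/(2b)}.
\end{aligned}
\]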
Summarising all previous deviation bounds we obtain the following result.

Oracle inequalities and error bounds for the Lasso and Dantzig estimators
In this section we present the main theoretical results for the Lasso and Dantzig estimators. More specifically, we derive oracle inequalities for Â_L and Â_D, and show error bounds for the norms ‖·‖_{L²}, ‖·‖_1 and ‖·‖_2. In particular, we establish the asymptotic equivalence between the Lasso and Dantzig estimators.

Properties of the Lasso estimator
We start this subsection by proving a statement which is important for obtaining the oracle inequality for the Lasso estimator Â_L.

In particular, it implies the inequality (4.2).
Proof. Let us set δ_L(A) := A - Â_L. Applying Lemma 2.1 we obtain the corresponding basic inequality for δ_L(A); hence, on E(s, c_0), the prediction error controls the Frobenius norm of δ_L(A). We observe next that

‖δ_L(A)‖_1 + ‖A‖_1 - ‖Â_L‖_1 ≤ 2 ‖δ_L(A)_{|supp(A)}‖_1,

which immediately implies (4.1). Applying (4.1) to A = A_0 we deduce that

‖(Â_L - A_0)_{|supp(A_0)^c}‖_1 ≤ 3 ‖(Â_L - A_0)_{|supp(A_0)}‖_1 ≤ 3 ‖(Â_L - A_0)_{|I_{s_0}(Â_L - A_0)}‖_1,

where the last inequality holds due to the sparsity assumption ‖A_0‖_0 ≤ s_0. Consequently, Â_L - A_0 ∈ C(s_0, 3) and the proof is complete.
We are now in a position to present an oracle inequality for the Lasso estimator Â_L, which is one of the main results of our paper.

Theorem 4.2. Let Â_L be the Lasso estimator defined in (2.7) and assume that condition (H) holds. Then, for a suitable choice of the tuning parameter λ and for T ≥ T_0(ε_0/2, s_0, 3 + 4/γ), the following oracle inequality holds with probability at least 1 - ε_0.

Proof. Consider an arbitrary matrix A ∈ R^{d×d} with ‖A‖_0 ≤ s_0. Then, on E(s_0, 3 + 4/γ), according to Lemma 4.1 and the Cauchy-Schwarz inequality, the prediction error of Â_L is bounded by that of A plus a cross term driven by λ and the sparsity s_0. If the cross term involving 4λ is dominated by the prediction error of A, the result immediately follows from Lemma 4.1. Hence, we only need to treat the opposite case. The latter implies that Â_L - A_0 ∈ C(s_0, 3 + 4/γ) due to (4.2). Then, on the event E(s_0, 3 + 4/γ), the restricted eigenvalue property applies, and consequently we obtain from (4.2) a quadratic bound for the prediction error. Using the inequality 2xy ≤ ax² + y²/a for a > 0, we then conclude the assertion, which completes the proof.
Theorem 4.2 enables us to find upper bounds on various norms of Â_L - A_0 as well as on the sparsity of Â_L. We remark that the bound in (4.6) will be useful to provide the connection between the Lasso and Dantzig estimators in the next subsection.
Proof. On the event E(s_0, 3), taking A = A_0 and setting S_0 := supp(A_0), we obtain an inequality which gives (4.3) and (4.4). Moreover, on the same event (4.5) follows. Now, it remains to prove (4.6). Note that a necessary and sufficient condition for Â_L to be the solution of the optimisation problem (2.7) is the existence of a matrix B ∈ ∂‖Â_L‖_1 such that

∇L_T(Â_L) + λ B = 0.

Furthermore, Â_L^{ij} ≠ 0 implies that B^{ij} = sign(Â_L^{ij}). Thus, we conclude an upper bound on ‖Â_L‖_0, where the last inequality holds on E(s_0, 3). On the other hand, on the same event we obtain a complementary bound, which implies (4.6).
The upper bounds in (4.3)-(4.5) improve the bounds obtained in [11, Corollary 1] and they are in line with the classical results for linear regression models. We recall that the paper [11] considers row sparsity of the unknown parameter A_0, i.e.

max_{1≤i≤d} ‖A_0^i‖_0 ≤ s,

where A_0^i denotes the ith row of A_0. Obviously, this constraint corresponds to s_0 = ds in our setting. The authors of [11] obtained a strictly larger upper bound, in contrast to our improved bound T^{-1} ds log d. Thus, we essentially match the lower bound which has been derived in [11, Theorem 2].

Remark 4.4.
Unfortunately, an extension of the analysis to more general diffusion models does not seem to be straightforward. The linearity of the drift function in the parameter A is absolutely crucial for the proofs. First of all, it allows for an explicit computation of the log-likelihood function, whereas in a more general setting we would require an approximative analysis of the likelihood, which is expected to be much more involved (see e.g. [26,27] for an example of such an analysis for general parametric models). Secondly, the linear form of the drift function leads to the quadratic form structure of the term C_T. We remark, however, that the methodology of [22], which is the basis of Proposition 3.2, only applies to quadratic functionals of X, and thus different mathematical techniques are needed to show this type of concentration phenomenon in the general framework. Hence, we leave this investigation for future research.

Remark 4.5.
In this section we showed that the Lasso estimator has the asymptotically optimal estimation rate. Another desirable property of an estimator in a sparse model is consistency in variable selection. For the Ornstein-Uhlenbeck model we say that an estimator Â is consistent for variable selection if

P( supp(Â) = supp(A_0) ) → 1 as T → ∞.
These two properties are referred to as the oracle properties (see [8]), and it is well known that the Lasso estimator for linear Gaussian models cannot satisfy both of them with the same tuning parameter λ (see [30]), while the adaptive Lasso estimator can. The authors of [11] have introduced the adaptive Lasso estimator for the Ornstein-Uhlenbeck model, which is defined as

Â_ad := argmin_{A ∈ R^{d×d}} { L_T(A) + λ ‖ (|Â_ML|^{-γ}) • A ‖_1 },

where • denotes the Hadamard product and (|Â_ML|^{-γ})_{ij} := |Â_ML^{ij}|^{-γ} for a γ > 0. They have proved that the adaptive estimator Â_ad is consistent for support selection and showed the asymptotic normality of Â_ad when restricted to the elements in supp(A_0); see [11, Theorem 4] for more details.
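Computationally, the adaptive Lasso only changes the soft-thresholding level of the ISTA sketch from Section 2 to an entrywise level proportional to the weights |Â_ML^{ij}|^{-γ}. The following sketch (hypothetical helper name adaptive_lasso_ou, reusing the discretised statistics from the earlier sketches) illustrates this:

```python
import numpy as np

def adaptive_lasso_ou(X, dt, lam, gamma=1.0, n_iter=500):
    """ISTA sketch for the adaptive Lasso: entrywise weights
    W_ij = |A_ML,ij|^{-gamma}, so the threshold level is eta*lam*W_ij.
    Entries where A_ML vanishes receive an infinite weight and stay at zero.
    """
    n, d = X.shape[0] - 1, X.shape[1]
    T = n * dt
    dX = np.diff(X, axis=0)
    G = dX.T @ X[:-1] / T
    C = dt * (X[:-1].T @ X[:-1]) / T
    A_ml = -G @ np.linalg.inv(C)            # maximum likelihood pilot estimate
    W = np.abs(A_ml) ** (-gamma)            # adaptive weights
    eta = 1.0 / np.linalg.eigvalsh(C).max()
    A = np.zeros((d, d))
    for _ in range(n_iter):
        Z = A - eta * (G + A @ C)
        A = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam * W, 0.0)
    return A
```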

Properties of the Dantzig estimator
In this subsection we will establish a connection between the prediction errors associated with the Lasso and Dantzig estimators. This step is essential for the derivation of error bounds for Â_D. Our results are an extension of the study in [2], where it was shown that, under sparsity conditions, the Lasso and Dantzig estimators exhibit similar behaviour for linear regression and nonparametric regression models, for the l_2 prediction loss and for the l_p loss in the coefficients for 1 ≤ p ≤ 2. In what follows, we derive analogous bounds for the Ornstein-Uhlenbeck process.

Proposition 4.6. (i) If a matrix A satisfies the Dantzig constraint (2.9), then ‖Â_D‖_1 ≤ ‖A‖_1. (ii) On the event {‖Â_L‖_0 ≤ s} ∩ E(s, 1), the prediction error of Â_D is controlled by that of Â_L up to an explicit remainder; the precise inequality is derived in Section 6.3.

Proof. See Section 6.3.

Proposition 4.6 implies an oracle inequality for the Dantzig estimator, which is formulated in the next theorem.
The statements of Theorems 4.2 and 4.7 suggest that the Lasso and Dantzig estimators are asymptotically equivalent. This is in line with the theoretical findings in linear regression models as it has been shown in [2]. More specifically, we obtain the following result, which is a direct analogue of Corollary 4.3.
Proof. Denote S_0 := supp(A_0). On the event E(s_0, 1) the matrix A_0 satisfies the Dantzig constraint (2.9), we have Â_D - A_0 ∈ C(s_0, 1), and the resulting estimate gives (4.8) and (4.9). Moreover, a further bound on the same event completes the proof.
In this work we have shown that the performances of the Lasso and the Dantzig selector are equivalent. It is worth mentioning that, although we study penalised likelihood methods, it may be of separate interest (in both computational and theoretical contexts) that the Dantzig estimator can also be applied in settings where no explicit likelihoods or loss functions are available (see [7] for more details).

Numerical simulations
This section presents some numerical experiments on simulated data that illustrate our theoretical results.
Our estimation methods are based on continuous observations of the underlying process, which need to be discretised for numerical simulations. We use 500000 discretisation points over the time interval [0, T] with T = 300. Such an approximation is sufficient for illustration purposes, since further refinement of the grid does not lead to a significant improvement.
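A simple Euler-Maruyama scheme suffices to generate such data; the following sketch (hypothetical helper simulate_ou, started at X_0 = 0 rather than from the stationary law for simplicity) mirrors the setup just described:

```python
import numpy as np

def simulate_ou(A0, T=300.0, n_steps=500_000, seed=0):
    """Euler-Maruyama discretisation of dX_t = -A0 X_t dt + dW_t on [0, T]."""
    rng = np.random.default_rng(seed)
    d = A0.shape[0]
    dt = T / n_steps
    X = np.zeros((n_steps + 1, d))
    for k in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=d)   # Brownian increment
        X[k + 1] = X[k] - (A0 @ X[k]) * dt + dW
    return X
```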
The selection of the value of the tuning parameter λ is made by a cross-validation technique, using the first 90% of the observations as a training set and the last 10% as a validation set. In Figure 1 we show an example of the transition matrix A_0 ∈ R^{15×15} and the corresponding maximum likelihood, Lasso and Dantzig estimators. Instead of giving numerical values of the entries of A_0, we use a colour code to highlight the sparsity. We observe that the MLE provides a good performance on the support, but it gives rather poor estimates outside the support. On the other hand, the superiority of the Lasso and Dantzig estimators, especially in terms of support recovery, is quite obvious even for a relatively small matrix dimension. Figure 2 shows the relative errors of the maximum likelihood, Lasso and Dantzig estimators compared to the norm of the true matrix. We compute the relative error for dimensions d = 5, ..., 20 and for the L_1 and Frobenius norms. Figure 2 clearly shows the improved performance of the penalised estimation methods relative to maximum likelihood estimation as the dimension d grows. Indeed, we observe that the relative errors of maximum likelihood estimation grow linearly in both the L_1 and Frobenius norms, while the relative errors of the Lasso and Dantzig estimators decay in d. The sparsity of the true parameter A_0 was chosen equal to s = 0.3d², which might explain the limiting behaviour of the Lasso and Dantzig estimators as d increases. Finally, we observe that the relative errors of the Lasso and Dantzig estimators are practically equivalent, which is exactly in accordance with our theoretical results.
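The validation step can be organised as in the following sketch; the concrete criterion below (the likelihood-type loss ⟨G, A⟩_F + ½ tr(A C A^⊤) evaluated on the held-out segment) is one natural choice and is our assumption, as is the reuse of the hypothetical lasso_ou helper from Section 2:

```python
import numpy as np

def select_lambda(X, dt, lambdas, train_frac=0.9):
    """Pick lambda by fitting on the first 90% of the path and scoring each
    candidate on the final 10% with the likelihood-type loss
    <G_val, A>_F + 0.5 * tr(A C_val A')."""
    n_split = int(train_frac * (X.shape[0] - 1))
    X_tr, X_val = X[:n_split + 1], X[n_split:]
    T_val = (X_val.shape[0] - 1) * dt
    dXv = np.diff(X_val, axis=0)
    G_val = dXv.T @ X_val[:-1] / T_val
    C_val = dt * (X_val[:-1].T @ X_val[:-1]) / T_val
    best_lam, best_loss = None, np.inf
    for lam in lambdas:
        A = lasso_ou(X_tr, dt, lam)            # fit on the training segment
        loss = np.sum(G_val * A) + 0.5 * np.trace(A @ C_val @ A.T)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam
```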

Proof of Theorem 3.3
We first note the identity ‖V X‖_{L²}^2 = tr(V C_T V^⊤). Replacing C_T by its limit C_∞ we deduce the inequality tr(V C_∞ V^⊤) ≥ k_∞ ‖V‖_2^2 > 0 and therefore obtain the bound (6.1). Next, we introduce the set K(s) := {V ∈ R^{d×d} \ {0}: ‖V‖_0 ≤ s}. As is shown in Lemma 6.1, the supremum over C(s, c_0) is controlled by the corresponding supremum over K(s); see (6.2). Thus, it suffices to consider K(s) instead of C(s, c_0) in the following discussion. Observing (6.1), we reduce the claim to a deviation bound over K(s). For a matrix V ∈ K(s) we denote its jth row vector by v_j and set v := vec(V) ∈ R^{d²}. Moreover, we define a symmetric random matrix built from C_T - C_∞ and deduce the associated quadratic form identity. According to Proposition 3.2 we obtain the corresponding deviation inequalities for any x > 0. By Lemma 6.2 we conclude a uniform bound over K(s), and we deduce from (6.3) the desired probability estimate. The latter statement together with (6.2) implies the inequality defining E(s, c_0) for all T ≥ T_0(ε_0, s, c_0), which completes the proof of Theorem 3.3.

Proof of Corollary 3.4
Let e_{(i,j)} ∈ R^{d×d} be the matrix defined by (e_{(i,j)})_{kl} := 1_{{(k,l)=(i,j)}}. Expressing the entries of C_T - C_∞ through quadratic forms associated with these basis matrices, applying Proposition 3.2 to each of them and combining the resulting estimates completes the proof of Corollary 3.4.

Proof of Proposition 4.6
Since A satisfies the Dantzig constraint (2.9), we deduce ‖Â_D‖_1 ≤ ‖A‖_1 by the definition of the Dantzig estimator, which proves part (i). Now we show part (ii) of the proposition. Set δ := Â_L - Â_D. Due to (2.8) we deduce the two identities (6.4) for the difference of the prediction errors. The Dantzig constraint (2.9) implies the inequality ‖∇L_T(Â_D)‖_∞ ≤ λ, and the same inequality holds with Â_D replaced by Â_L. On E(s, 1) the restricted eigenvalue property applies. Furthermore, on {‖Â_L‖_0 ≤ s} it holds that δ ∈ C(s, 1), and we conclude from Theorem 3.3 that ‖δ X‖_{L²}^2 ≥ (k_∞/2) ‖δ‖_2^2. We also have

‖δ‖_1 ≤ 2 ‖δ_{|supp(Â_L)}‖_1 ≤ 2 ‖Â_L‖_0^{1/2} ‖δ‖_2.

Observing the first identity of (6.4), putting the previous estimates together and using the inequality 2xy ≤ ax² + y²/a for a > 0, we obtain the first inequality of part (ii). On the other hand, applying the second identity of (6.4), we deduce the reverse estimate, which completes the proof.

Some lemmas
In this subsection we present two results that can be easily deduced from Lemmas F.1, F.2 and F.3 in the supplementary material of [1]. We provide their proofs for the sake of completeness.
Proof. First, recall the definition of the cone C(s, c_0) in (2.1) and denote the unit balls by B_q(r) := {v ∈ R^d: ‖v‖_q ≤ r} for any d ≥ 1 and q ≥ 0, r > 0. Furthermore, we introduce the notation K(s) := B_0(s) ∩ B_2(1) for s ≥ 1.
Then K(s) = ∪_{|U|≤s} S_U, where S_U denotes the set of vectors in B_2(1) whose support is contained in the index set U. In what follows, we choose A = {u_1, ..., u_m} to be a (1/10)-net of S_U. Lemma 3.5 of [29] guarantees that |A| ≤ 21^s. Next, notice that for every v ∈ S_U there exists some u_i ∈ A such that ‖Δv‖_2 ≤ 1/10, where Δv := v - u_i. Then the quantity of interest can be decomposed accordingly. Next, we use the fact that 10(Δv) ∈ S_U, which gives us in consequence the desired bound.