Generalized Lagrange Theorem

The present paper is devoted to possible generalizations of the classic Lagrange Mean Value Theorem. We consider a real-valued function of several variables that is only assumed to be continuous. The main idea is to replace the derivative by the so-called bisequential tangent cone. We first prove Rolle and Lagrange type results and then compare this cone with the Clarke subdifferential in the case of a Lipschitz function. We also investigate an approach using normal cones.


Introduction
Lagrange's Mean Value Theorem in its classic form, for a differentiable real function of a single variable, is one of the most important facts in mathematical analysis, with a large number of applications.
A natural and interesting question is whether the Mean Value Theorem has a valid counterpart for a function that is only continuous and not necessarily differentiable. To answer it, one has to find a replacement for the derivative of the function or for the tangent space of its graph.
There are many works on this topic. A very interesting result, employing the generalized gradient, was published by G. Lebourg in [7]. Another approach, via approximate Jacobian matrices, can be found in [6]. Some results using Dini derivatives were obtained by T. Ważewski and W. Mlak in [11] and [8] respectively. Some possible generalizations can also be found in [4]. For vector-valued maps, see also [5] and references therein.
In the present paper yet another approach for real continuous functions of several variables will be presented, using the paratingent cone, named here the bisequential tangent cone (BTC) (Definition 1.3). The BTC was first introduced by F. Severi and G. Bouligand in the articles [10] and [2] respectively, and then studied in detail by Yuntong W. in [13]. It was also mentioned in several other sources such as [9] or [12], with little to no further information provided. In the first section a handful of basic examples and the main useful properties of this object will be presented, and later a generalized version of the Mean Value Theorem will be formulated, namely a Rolle type result (Theorem 1.12) and a Lagrange type result (Theorem 1.13).
The second idea is to explore the properties of the normal cone. In this case the geometric interpretation is quite different. Instead of having a tangent parallel to the given secant, one can find at the mean point a nonzero vector from the normal cone perpendicular to the secant, see Theorem 2.7.
Finally, if the considered functions are restricted to be locally Lipschitz, the Clarke subdifferential can be used as the natural counterpart of the derivative. In the third section we recall the theorem by G. Lebourg and later we show a correspondence between the Clarke subdifferential and the BTC in Theorem 3.11. This results in a very elegant geometric interpretation of Lebourg's result.
I would like to thank my supervisor Maciej Denkowski for suggesting the problem.

Bisequential tangent cones
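Let us recall the basic Mean Value Theorem: if $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists a point $c \in (a, b)$ such that $f'(c) = \frac{f(b) - f(a)}{b - a}$.
As we would like to obtain a natural counterpart of the above theorem in the case where the function is only continuous, we may replace the derivative of the function by a properly understood tangent to the graph.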
The most intuitive idea is to consider the Peano tangent cone:
Unfortunately, the above notion turns out to be insufficient. Take for example
We see that the horizontal line $y = 0$ is not contained in the tangent cone of the graph at any of its points, which gives us a counterexample. So a slight modification of the classic definition is necessary. Thus we define the bisequential tangent cone (BTC) of a given set $X$ at a point $x \in X$ (see e.g. [12]).
Definition 1.3. We say that a vector $v \in \mathbb{R}^n$ belongs to the bisequential tangent cone of $X$ at $x$ (we write $v \in B_x(X)$) iff there exists a bisequence of points
The above definition has a clear geometric interpretation, as the lines tangent to $X$ at $x$ are described by the limits of directions of secants with both ends approaching $x$. Now let us examine some properties of the BTC.
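As a hedged illustration of why secants with two moving ends help, the following sketch looks at the graph of $f(x) = |x|$ at the origin. Both the function and the normalization of the definition are assumptions made for illustration: membership $v \in B_x(X)$ is read here in the standard paratingent form, that is $t_n(a_n - b_n) \to v$ for some $a_n, b_n \in X$ tending to $x$ and some scalars $t_n > 0$.

```latex
% Assumed example: X = \Gamma_f with f(x) = |x|, base point (0, 0).
% Assumed form of the definition: v = \lim t_n (a_n - b_n),
% with a_n, b_n \in X, a_n, b_n \to (0, 0), t_n > 0.
\begin{align*}
& a_n = \Bigl(\tfrac{1}{n}, \tfrac{1}{n}\Bigr), \quad
  b_n = \Bigl(-\tfrac{1}{n}, \tfrac{1}{n}\Bigr), \quad
  t_n = \tfrac{n}{2}
  \quad\Longrightarrow\quad t_n (a_n - b_n) = (1, 0), \\
& \bigl| |s| - |r| \bigr| \le |s - r| \quad (s, r \in \mathbb{R})
  \quad\Longrightarrow\quad
  B_{(0,0)}(\Gamma_f) \subset \{(u, w) \in \mathbb{R}^2 : |w| \le |u|\}.
\end{align*}
```

Under these assumptions the horizontal direction $(1, 0)$ arises from secants straddling the corner, whereas the one-sided cone at the same point is only $\{(u, |u|) : u \in \mathbb{R}\}$.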

Remark 1.4.
We always have $0 \in B_x(X)$. Moreover, unless $x$ is isolated in $X$, we have $B_x(X) \neq \{0\}$, and if $v \in B_x(X)$, then the BTC contains the whole line $\{tv : t \in \mathbb{R}\}$; thus it is a full cone.
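The line property can be checked in one step. The sketch below assumes that $v \in B_x(X)$ is witnessed by points $a_n, b_n \in X$ with $a_n, b_n \to x$ and scalars $t_n > 0$ such that $t_n(a_n - b_n) \to v$; this normalization is an assumption, not a quotation of Definition 1.3.

```latex
% Assumption: v \in B_x(X) is witnessed by (a_n, b_n) and t_n > 0
% with t_n (a_n - b_n) \to v.
\begin{align*}
s \ge 0 &: \quad (s\,t_n)(a_n - b_n) \to s v
  && \text{(for $s = 0$ take $a_n = b_n = x$),} \\
s < 0 &: \quad (|s|\,t_n)(b_n - a_n) = s\,t_n (a_n - b_n) \to s v
  && \text{(swap the roles of $a_n$ and $b_n$).}
\end{align*}
```

The second line is exactly where the symmetry of the bisequence is used; for a one-sided tangent cone the swap is not available.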
In what follows we will assume that a (non-vertical) linear subspace of codimension 1 can be identified with the graph of a linear function $L : \mathbb{R}^{n-1} \to \mathbb{R}$.
Definition 1.5. We say that a linear subspace as above is tangent to a set
From now on, we will consider the set $X$ being the graph of a continuous function $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^n$, denoted by $\Gamma_f$. For a linear map
where $B_a(f)$ denotes the set of directions of tangent subspaces of $\Gamma_f \subset \mathbb{R}^n$ at the given point $a$, called the BTC subdifferential of $f$ at $a$ (see also [13], Definition 4.1).
Later a proper explanation of this name will be provided, as we will compare the BTC and Clarke subdifferentials of a Lipschitz function (see Theorem 3.11).
Remark 1.6. We also have to deal with a vertical tangent subspace, which is a hypersurface in $\mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R}$ given by
Later we will see that the set of tangent directions is reduced to $\{0\}$ at $a \in A$ if the function $f$ is of class $C^1$ in some neighbourhood of $a$ and $f'(a) = 0$.
for some $t \in \mathbb{R}_{\geq 0}$, we obtain
Moreover, by taking a bisequence
The BTC equals the full space:
Remark 1.7. In the case of a single variable function $f$ :
where
The function is differentiable at $0$ and $f'(0) = 0$, yet the derivative is not continuous at the origin. Thus we obtain
In order to find any applications for the BTC subdifferential, some basic properties are needed. The following proposition is well-known for the Clarke subdifferential (see for example [9], Theorem 9.8, or [13], Proposition 4.3). We provide a direct proof based on the definition of the BTC subdifferential.
Proposition 1.9. If $A$ is open in $\mathbb{R}^n$ and $f : A \to \mathbb{R}$ is differentiable, then for any point $a \in A$, the following conditions are equivalent:
Proof.
(1) ⇒ (2) Assume that there exists
The function $f|_{[a_n, b_n]}$ for $n > N$ can be identified by a projection with a single variable function and the basic Mean Value Theorem can be applied. We find a point $c_n$ on the segment $[a_n, b_n]$ for which
Then we obtain
Taking the limit of both sides as $n \to +\infty$, we have
which yields a contradiction.
(2) ⇒ (1) Fix $\varepsilon > 0$. Then, by assumption, the directions of secants in $\Gamma_f$ with both ends within a small neighbourhood of $(a, f(a))$ are close to the direction determined by the derivative $f'(a)$. In other words:
which gives the continuity of $f'$, so $f$ is of class $C^1$ near $a$.
Another useful property refers to the behaviour of the BTC subdifferential of a sum of two functions (cf. [9], Exercise 8.8).
Proof. The inclusion "⊃" can be obtained directly (use 1.9). Then we get "⊂" by applying the above rule to $f + g$ and $-g$.
Remark 1.11. We necessarily have to assume that one of the functions is of class $C^1$. Consider as an example
Now we state generalized Rolle and Lagrange Theorems for the BTC.
Theorem 1.12 (Generalized Rolle). Let $K$ be a compact subset of $\mathbb{R}^n$ such that $\operatorname{int} K \neq \emptyset$ and let $f$ :
Proof. If $f$ is constant, then the theorem obviously holds. Otherwise, $f$ attains two different extreme values in $K$ by the Weierstrass Theorem. Without any loss of generality, we may assume that $\exists c \in \operatorname{int} K$ :
We consider two cases.
We consider separately the situation in which there exists a sequence
Taking the sequence of numbers
(2) Case $n > 1$.
Fix $x \in \mathbb{R}^n$. We will show that $(x, 0) \in B_c(f)$. We choose $a, b \in \partial K$ in such a way that $c \in [a, b] \subset K$ and the point $c + x$ belongs to the line going through $a$ and $b$. Then we can identify $f|_{[a,b]}$ with a single variable function and the rest of the proof is done as in the previous case with the sequence of numbers multiplied by $\pm\|x\|$.
Theorem 1.13 (Generalized Lagrange). Let $K$ be a compact subset of $\mathbb{R}^n$, $\operatorname{int} K \neq \emptyset$ and let $f$ :
Proof. Let us consider the map $g$ :
It is constant on $\partial K$, so by 1.12 $\exists c \in \operatorname{int} K : 0 \in B_c(g)$. Obviously $B_c(L) = \{\nabla L\}$ and 1.10 applied to $g$ and $L$ ends the proof.
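The reduction of the Lagrange type result to the Rolle type result can be written out as a short chain. This is a sketch under assumptions: it takes $g = f - L$ with $L$ linear, and it reads the sum rule 1.10 in the form $B_c(g + L) = B_c(g) + \nabla L$ for a summand of class $C^1$.

```latex
% Assumptions: g := f - L is constant on \partial K;
% L is linear, hence of class C^1, with B_c(L) = \{\nabla L\}.
\begin{align*}
& 0 \in B_c(g) \ \text{ for some } c \in \operatorname{int} K
  && \text{(Theorem 1.12 applied to } g\text{)}, \\
& B_c(f) = B_c(g + L) = B_c(g) + \nabla L
  && \text{(sum rule, } L \in C^1\text{)}, \\
& \nabla L = 0 + \nabla L \in B_c(f).
\end{align*}
```

Geometrically, the last line says that the graph of $f$ has, at the point over $c$, a tangent subspace parallel to the graph of $L$.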

Normal cones
Let $L$ be a linear subspace of $\mathbb{R}^n$. Any vector $v \in \mathbb{R}^n$ can be uniquely presented in the form
The map $P_L : \mathbb{R}^n \ni v \mapsto v_1 \in L$ is the orthogonal projection onto the subspace $L$. Now we consider the following situation. Let $S \subset \mathbb{R}^n$. For $a \in S$ we recall the normal cone of $S$ at $a$.
Definition 2.1. The normal cone of $S$ at $a$ is the set
If the considered set $S$ is the graph of a function $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^{n-1}$, then for $a \in A$ we will use the following notation:
For the sake of further considerations, we introduce a modified definition of the angle between two (non-zero) vectors.
Definition 2.3. For $x, y \in \mathbb{R}^n$, $x, y \neq 0$ we define the angle between $x$ and $y$ as
Remark 2.4. By the above definition, the second condition in 1.2 coincides with
We will show the Mean Value Theorem stated for a continuous function using the normal cone. The following known result will be crucial.
Proposition 2.5. Let $a \in S \subset \mathbb{R}^n$ and let $x \in \mathbb{R}^n$ be a point for which the Euclidean distance from $S$ is realized at $a$, that is
Proof. We may assume that $a = 0$. If $x \notin N_0(S)$, then by definition there exists a point $v \in C_0(S)$, $v \neq 0$ for which $\langle x, v \rangle > 0$. Without loss of generality we assume that $v \in \partial B(x, \|x\|)$. According to 1.2,
This can be interpreted geometrically as the statement that the lines going through $0$ and $a_n$ intersect $\partial B(x, \|x\|)$ at some points $b_n \neq 0$ for all $n \geq N$. We see that $b_n \to v$ ($n \to +\infty$), so for $n$ sufficiently large, the point $a_n$ must be inside the segment $[0, b_n]$. Otherwise we would have
But then $\|x - a_n\| < \|x\|$ for some $n \in \mathbb{N}$. This contradiction finishes the proof.
Remark 2.6. The converse implication is not always true. Take as an example the graph of $f(x) = \bigl||x| - 1\bigr|$. Then the point $(0, 7)$ belongs to $N_0(f)$ but its Euclidean distance from the graph is not realized at $(0, 1)$.
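The numbers in this example can be checked directly. The sketch assumes that the normal cone consists of the vectors $w$ with $\langle w, v \rangle \le 0$ for every $v$ in the tangent cone, which is the reading suggested by the proof of Proposition 2.5; the comparison point $(4, f(4))$ is chosen here for illustration.

```latex
% Graph of f(x) = ||x| - 1|; near x = 0 it coincides with 1 - |x|.
% Assumed description of the normal cone: <w, v> <= 0 for all tangent v.
\begin{align*}
& C_{(0,1)}(\Gamma_f) = \{(s, -|s|) : s \in \mathbb{R}\}, \\
& \langle (0, 6), (s, -|s|) \rangle = -6|s| \le 0, \qquad
  \langle (0, 7), (s, -|s|) \rangle = -7|s| \le 0, \\
& \|(0,7) - (0,1)\| = 6, \qquad
  \|(0,7) - (4, f(4))\| = \|(-4, 4)\| = 4\sqrt{2} < 6 .
\end{align*}
```

The second line shows that the vector is normal whether it is measured from the origin or from the base point $(0, 1)$, while the third line exhibits a point of the graph that is strictly closer to $(0, 7)$.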

Now to the main theorem.
Theorem 2.7. Let $K$ be a compact subset of $\mathbb{R}^n$, $\operatorname{int} K \neq \emptyset$ and let $f$ :
In other words, the normal cone at some point $c$ inside the set $K$ contains a nonzero vector perpendicular to $\Gamma_L$.
Proof. Without loss of generality, let $C = 0$. We consider the map
It is continuous as a composition of continuous functions and defined on a compact set, so it attains its extreme values. By the assumptions we know that $g|_{\partial K} \equiv 0$. If $g \equiv 0$, then $f$ is affine and
Otherwise we choose a point $c \in \operatorname{int} K$ such that $g(c) = \sup_{x \in K} g(x)$ and we consider the vector
Then $v$ is perpendicular to $\Gamma_L$ and moreover
Then by Proposition 2.5 we have $v \in N_c(f)$, which was to be proven.

Clarke subdifferential
In this section we will consider the case where the function satisfies the local Lipschitz condition.
All further considerations in this section concern a locally Lipschitz function $f$, defined as in 3.1. The word "locally" can be omitted, as the set $A \subset \mathbb{R}^n$ can be narrowed to a suitably chosen neighbourhood of the point $a \in A$.
For the convenience of the reader we will recall several basic notions from [3], first the generalized directional derivative of the function $f$ at the point $a$ in the direction $v \in \mathbb{R}^n$.
Definition 3.2. Let $f : A \to \mathbb{R}$ be a Lipschitz function, $A \subset \mathbb{R}^n$. The generalized directional derivative of the function $f$ at the point $a \in A$ and in the direction $v \in \mathbb{R}^n$ is the upper limit
$$f^\circ(a; v) = \limsup_{y \to a,\ t \to 0^+} \frac{f(y + tv) - f(y)}{t}.$$
Proposition 3.3. $f^\circ(a; v)$ as a function of the single variable $v$ is finite, positively homogeneous and subadditive. Moreover,
$$|f^\circ(a; v)| \leq M \|v\|,$$
where $M$ is the Lipschitz constant of $f$.
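A one-dimensional computation may help to fix the notion. The sketch below assumes Clarke's usual definition $f^\circ(a; v) = \limsup_{y \to a,\ t \to 0^+} \frac{f(y + tv) - f(y)}{t}$ and takes $f(x) = |x|$, $a = 0$ as an illustrative example that is not part of the original text.

```latex
% Assumed definition:
% f^\circ(a; v) = \limsup_{y \to a,\ t \to 0^+} (f(y + tv) - f(y)) / t.
% Assumed example: f(x) = |x| on \mathbb{R}, a = 0, Lipschitz constant M = 1.
\begin{align*}
\frac{|y + tv| - |y|}{t} &\le \frac{|tv|}{t} = |v|
  && \text{(triangle inequality, any } y \in \mathbb{R},\ t > 0\text{)}, \\
\frac{|0 + tv| - |0|}{t} &= |v|
  && \text{(the choice } y = 0\text{)}, \\
f^\circ(0; v) &= |v| \le M |v| \quad \text{with } M = 1 .
\end{align*}
```

In this example $v \mapsto f^\circ(0; v)$ is visibly positively homogeneous and subadditive, and the bound by $M\|v\|$ is attained.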
We can apply the Hahn-Banach Theorem to the function $f^\circ(a; v)$, finding a linear functional $\xi : \mathbb{R}^n \to \mathbb{R}$ such that
Now let us turn to the basic properties of the generalized gradient, given in [3].
Proposition 3.7. For Lipschitz functions $f_i$ :
where the set on the right hand side consists of elements of the form
We can see how the Mean Value Theorem can be formulated for a Lipschitz function using the generalized gradient. In [3] we can find the following fact, known as the Lebourg Theorem.
Theorem 3.8 (Lebourg). Let $U \subset \mathbb{R}^n$ be an open set and let $x, y \in U$ be such that $[x, y] \subset U$. Then for a Lipschitz function $f : U \to \mathbb{R}$ there exists a point $c \in (x, y)$ such that $f(y) - f(x) \in \langle \partial^\circ f(c), y - x \rangle$.
There are several other notions of a subdifferential. In some sources (see e.g. [1] or [9]) the following can be found:
Definition 3.9. Let $U$ be an open and nonempty subset of $\mathbb{R}^n$ and let $f : U \to \mathbb{R}$ be a locally Lipschitz function. For $x \in U$ we define:
(1) The subdifferential $\partial f(x)$ of $f$ at $x$ as
(2) The limit subdifferential
In [9], the links between all the different notions of subdifferentials are clearly explained. In particular, it is well-known that $\operatorname{conv} \partial' f(x) = \partial^\circ f(x)$ for $f$ being a locally Lipschitz function.
Remark 3.10. The Fréchet subdifferential has a convenient geometric interpretation. For $x \in U$ consider the tangent cone $C_x(f)$, taking into account all sequences $\{x_n\}_{n=1}^{+\infty} \subset U$ convergent to $x$ such that
and $x_n \neq x$ $\forall n \in \mathbb{Z}_+$. Then $\partial f(x)$ can be viewed as the set of vectors $\xi \in \mathbb{R}^n$ such that for any sequence $\{x_n\}_{n=1}^{+\infty}$ we have
Now we shall see the connection between the Clarke subdifferential and the BTC subdifferential considered in the first section.
Proof (of Theorem 3.11). Without loss of generality we may assume that $a = f(a) = 0$. Let us fix $v \in \partial^\circ f(a)$. First we will prove that
which will be done in three steps.
If $y_1 < -y_2$, then, by 3.10, $\partial f(0) = \emptyset$ and there is nothing to prove. Assume that $y_1 \geq -y_2$. Then we find sequences
Without loss of generality assume that $|a_n|, |b_n| < \frac{1}{2^n}$ for $n \in \mathbb{N}$, choosing proper subsequences if needed. Now we wish to construct a bisequence
then we fix $c_N = a_N$ and consider the map
It is continuous as a composition of continuous functions and furthermore
It can be easily observed that this bisequence fulfills the desired conditions. Moreover, all segments $[c_i, d_i]$, $i \in \mathbb{Z}_+$ are parallel to the segment $[0, x]$.
Step 2. $v \in \partial' f(0)$. Let $\{x_n\}_{n=1}^{+\infty} \subset U$ and $\{v_n\}_{n=1}^{+\infty} \subset \mathbb{R}^n$ be sequences such that:
Having fixed a point $x \in \mathbb{R}^n$, for all pairs $(x_i, v_i)$, $i \in \mathbb{Z}_+$ we construct bisequences $\{(c_k^i, d_k^i)\}_{k=1}^{+\infty} \subset U^2$ as in the first step. Then for any $n \in \mathbb{N}$ we have
Hence for the bisequence
and $c_n \to 0$ ($n \to +\infty$). Moreover, each of the segments
for some $T \in (0, 1)$. Analogously to the second step, for a given $x \in \mathbb{R}^n$ we may construct bisequences
Without loss of generality we may assume that $\langle v_1, x \rangle < \langle v_2, x \rangle$. Then
For all $i \in \mathbb{Z}_+$ we consider a continuous mapping
Now set $N \in \mathbb{Z}_+$, $N > N_0$. From the second step we have
So we can find a mean point $t_N \in [0, 1]$ such that
for some $v \in \mathbb{R}^n$. Fix $x_0 \in \mathbb{R}^n$, $x_0 \neq 0$. Without any loss of generality, let $\|x_0\| = 1$.
Then there exists a bisequence $\{a_n, b_n\}_{n=1}^{+\infty}$ such that
Let $v_n = a_n - b_n$. Then $\|v_n\| \to 0^+$ and $\frac{v_n}{\|v_n\|} \to x_0$. Moreover,
For the first component of the above sum, we have an approximation:
For the second component, by the Lipschitz condition:
where $L$ denotes the Lipschitz constant of $f$. Finally, $\langle v, x_0 \rangle \leq f^\circ(a; x_0)$, which by the arbitrariness of $x_0$ yields $v \in \partial^\circ f(a)$.
As a corollary of the above theorem, we obtain a very elegant geometric interpretation of the Lebourg Theorem.
$r_a$ and define $r_b$ analogously. Then for $x, y \in B(a, r_a) \cap B(b, r_b)$ we have:

Theorem 3.11.
Let $U$ be open in $\mathbb{R}^n$, $a \in U$ and let $f : U \to \mathbb{R}$ be a locally Lipschitz function. Then we have $\partial^\circ f(a) = B_a(f)$.
Now it is enough to take $e_{N-N_0} = (1 - t_N) a_N + t_N c_N$, $f_{N-N_0} = (1 - t_N) b_N + t_N d_N$. This bisequence fulfills our desired conditions. Thus we have $\partial^\circ f(a) \subset B_a(f)$. Conversely, assume that $(x, \langle v, x \rangle) \in B_a(f)$, $\forall x \in \mathbb{R}^n$

Corollary 3.12.
Let $U \subset \mathbb{R}^n$ be open and let $x, y \in U$ be such that $[x, y] \subset U$. Then for a Lipschitz function $f : U \to \mathbb{R}$ there exists a point $c \in (x, y)$ such that $f(y) - f(x) \in \langle B_c(f), y - x \rangle$.
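A minimal instance of the corollary, given as an illustration under assumptions: take $f(x) = |x|$ on $U = \mathbb{R}$ with endpoints $-1$ and $1$ (an example chosen here, not taken from the text), and use the known value $\partial^\circ f(0) = [-1, 1]$ together with Theorem 3.11.

```latex
% Assumed example: f(x) = |x| on U = \mathbb{R}, endpoints x = -1, y = 1, mean point c = 0.
\begin{align*}
& f(1) - f(-1) = 0, \qquad y - x = 2, \\
& B_0(f) = \partial^\circ f(0) = [-1, 1] \ni 0, \\
& \langle B_0(f), y - x \rangle = \{2\xi : \xi \in [-1, 1]\} = [-2, 2] \ni 0 = f(1) - f(-1).
\end{align*}
```

For $c \neq 0$ the function is differentiable with $f'(c) = \pm 1$, so the classical theorem has no mean point on this segment; the corollary locates it at the corner $c = 0$.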