Calculus of variations and optimal control for generalized functions

We present an extension of some results of higher order calculus of variations and optimal control to generalized functions. The framework is the category of generalized smooth functions, which includes Schwartz distributions, while sharing many nonlinear properties with ordinary smooth functions. We prove the higher order Euler-Lagrange equations, the D'Alembert principle in differential form, the du Bois-Reymond optimality condition and the Noether's theorem. We start the theory of optimal control proving a weak form of the Pontryagin maximum principle and the Noether's theorem for optimal control. We close with a study of a singularly variable length pendulum, oscillations damped by two media and the Pais-Uhlenbeck oscillator with singular frequencies.

Leibniz's axiom Natura non facit saltus seems to be inconsistent not only at the atomic level, but also for several macroscopic phenomena. Even the simple motion of an elastic bouncing ball seems to be more easily modeled using non-differentiable functions than classical C 2 ones, at least if we are not interested to model the non-trivial behavior at the collision times. Clearly, we can always use smooth functions to approximately model the forces acting on the ball, but this would introduce additional parameters (see e.g. [62,71] for smooth approximations of steep potentials in the billiard problem). Therefore, the motivation to introduce a suitable kind of generalized functions formalism in classical mechanics is clear: is it possible, e.g., to generalize Lagrangian mechanics so that the Lagrangian can be presented as a generalized function? This would undoubtedly be of an applicable advantage, since many relevant systems are described by singular Lagrangians: non-smooth constraints, collisions between two or more bodies, motion in different or in granular media, discontinuous propagation of rays of light, even turning on the switch of an electrical circuit, to name but a few, and only in the framework of classical physics. Indeed, this type of problems is widely studied (see e.g. [6, 30, 36, 39, 44-46, 55, 68, 70]), but sometimes the presented solutions are not general or hold only for special conditions and particular potentials.
From the purely mathematical point of view, an enlarged space of solutions (with distributional derivatives) for variational problems leads to the the so-called Lavrentiev phenomenon, see [7,29]. Variational problems with singular solutions (e.g. non-differentiable on a dense set, or with infinite derivatives at some points) are also studied, see [12,13,29,66].
In this sense, the fact that since J.D. Marsden's works [51][52][53] there has not been many further attempt to use Sobolev-Schwartz distributions for the description of Hamiltonian mechanics, can be considered as a clue that the classical distributional framework is not well suited to face this problem in general terms (see also [60] for a similar approach with a greater focus on the generalization in geometry). In [51][52][53], J.D. Marsden introduces distributions on manifolds based on flows. Since the traditional system of Hamilton's equations breaks down, Marsden considers the flow as a limit of smooth ones. On the other hand, Kunzinger et al. in [38] (using Colombeau generalized functions) critically analyses the regularization approach put forward by Marsden and constructed a counter example for the main flow theorems of [51]. In this field, a number of problems remains open, see again [51]. For example, in the non-smooth case the variational theorems fail, and the study of the virial equation for hard spheres in a box is still an open question. All this motivates the use of different spaces of nonlinear generalized functions in mathematical physics and in singular calculus of variations, see e.g. [3,10,27,31,36,58,72].
In this paper, we introduce an approach to variational problems involving singularities that allows the extension of the classical theory with very natural statements and proofs. We are interested in extremizing functionals which are either distributional themselves or whose set of extremals includes generalized functions. Clearly, distribution theory, being a linear theory, has certain difficulties when nonlinear problems are in play.
To overcome this type of problems, we are going to use the category of generalized smooth functions (GSF), see [22][23][24][25]43]. This theory seems to be a good candidate, since it is an extension of classical distribution theory which allows to model nonlinear singular problems, while at the same time sharing many nonlinear properties with ordinary smooth functions, like the closure with respect to composition and several non trivial classical theorems of the calculus. One could describe GSF as a methodological restoration of Cauchy-Dirac's original conception of generalized function, see [14,35,41]. In essence, the idea of Cauchy and Dirac (but also of Poisson, Kirchhoff, Helmholtz, Kelvin and Heaviside) was to view generalized functions as suitable types of smooth set-theoretical maps obtained from ordinary smooth maps depending on suitable infinitesimal or infinite parameters. For example, the density of a Cauchy-Lorentz distribution with an infinitesimal scale parameter was used by Cauchy to obtain classical properties which nowadays are attributed to the Dirac delta, cf. [35].
In the present work, the foundation of the calculus of variations is set for functionals defined by arbitrary GSF. This in particular applies to any Schwartz distribution and any Colombeau generalized function (see e.g. [9,31]).
The main aim of the paper is to start the higher-order calculus of variations and the theory of optimal control for GSF. The structure of the paper is as follows. We start with an introduction into the setting of GSF and give basic notions concerning GSF and their calculus that are needed for the calculus of variations (Sec. 2). After the basic definitions to set the problem in the framework of GSF, we prove the higher order Euler-Lagrange equations, the D'Alembert principle in differential form, the du Bois-Reymond optimality condition and the Noether's theorem (Sec. 3). In Sec. 4, we start the theory of optimal control proving a weak form of the Pontryagin maximum principle and the Noether's theorem, and in Sec. 5 we close with a study of a singularly variable length pendulum, oscillations damped by two media and the Pais-Uhlenbeck oscillator with singular frequencies.
The paper is self-contained, in the sense that it contains all the statements required for the proofs of calculus of variations we are going to present. If proofs of preliminaries are omitted, we clearly give references to where they can be found. Therefore, to understand this paper, only a basic knowledge of distribution theory is needed.

Basic notions
The new ring of scalars. In this work, I denotes the interval (0, 1] ⊆ R and we will always use the variable ε for elements of I; we also denote ε-dependent nets x ∈ R I simply by (x ε ). By N we denote the set of natural numbers, including zero.
We start by defining a new simple non-Archimedean ring of scalars that extends the real field R. The entire theory is constructive to a high degree, e.g. neither ultrafilter nor non-standard method are used. For all the proofs of results in this section, see [22][23][24][25].
(2.1) This is a congruence relation on the ring R ρ of moderate nets with respect to pointwise operations, and we can hence define ρ R := R ρ / ∼ ρ , which we call Robinson-Colombeau ring of generalized numbers. This name is justified by [8,63]: Indeed, in [63] A. Robinson introduced the notion of moderate and negligible nets depending on an arbitrary fixed infinitesimal ρ (in the framework of nonstandard analysis); independently, J.F. Colombeau, cf. e.g. [8] and references therein, studied the same concepts without using nonstandard analysis, but considering only the particular gauge ρ ε = ε.
We can also define an order relation on ρ R by saying that [x ε ] ≤ [y ε ] if there exists (z ε ) ∈ R I such that (z ε ) ∼ ρ 0 (we then say that (z ε ) is ρ-negligible) and x ε ≤ y ε + z ε for ε small. Equivalently, we have that x ≤ y if and only if there exist representatives [x ε ] = x and [y ε ] = y such that x ε ≤ y ε for all ε. Although the order ≤ is not total, we still have the possibility to define the infimum [x ε ] ∧ [y ε ] := [min(x ε , y ε )], the supremum [x ε ] ∨ [y ε ] := [max(x ε , y ε )] of a finite number of generalized numbers. Henceforth, we will also use the customary notation ρ R * for the set of invertible generalized numbers, and we write x < y to say that x ≤ y and x − y ∈ ρ R * . Our notations for intervals are: As in every non-Archimedean ring, we have the following For example, setting dρ := [ρ ε ] ∈ ρ R, we have that dρ n ∈ ρ R, n ∈ N >0 , is an invertible infinitesimal, whose reciprocal is dρ −n = [ρ −n ε ], which is necessarily a positive infinite number. Of course, in the ring ρ R there exist generalized numbers which are not in any of the three classes of Def. 2, like e.g. x ε = 1 ε sin 1 ε . The following result is useful to deal with positive and invertible generalized numbers. For its proof, see e.g. [31].
Lemma 3. Let x ∈ ρ R. Then the following are equivalent: (i) x is invertible and On the ρ R-module ρ R n we can consider the natural extension of the Euclidean Even if this generalized norm takes values in ρ R, it shares some essential properties with classical norms: It is therefore natural to consider on ρ R n a topology generated by balls defined by this generalized norm and the set of radii ρ R >0 of positive invertible numbers: Definition 4. Let c ∈ ρ R n and x, y ∈ ρ R, then: The relation < has better topological properties as compared to the usual strict order relation a ≤ b and a = b (that we will never use) because the set of balls B r (c) | r ∈ ρ R >0 , c ∈ ρ R n is a base for a sequentially Cauchy complete topology on ρ R n called sharp topology. We will call sharply open set any open set in the sharp topology. The existence of infinitesimal neighborhoods (e.g. r = dρ) implies that the sharp topology induces the discrete topology on R. This is a necessary result when one has to deal with continuous generalized functions which have infinite derivatives. In fact, if f (x 0 ) is infinite, we have f (x) ≈ f (x 0 ) only for x ≈ x 0 , see [23]. Also open intervals are defined using the relation <, i.e. (a, b) := {x ∈ ρ R | a < x < b}. Note that by Lem. 3.(iii), for all r ∈ ρ R >0 there exists m ∈ N such that r ≥ dρ m . Therefore, also the set of balls B dρ m (c) | m ∈ N, c ∈ ρ R n is a base for the sharp topology.
We will also need the following result.
The reader can feel uneasy in considering a ring of scalars such as ρ R instead of a field. On the other hand, as mentioned above, we will see that all our generalized functions are continuous in the sharp topology. Therefore, the following result is partly reassuring: Lemma 6. Invertible elements of ρ R are dense in the sharp topology, i.e.
2.1. Open, closed and bounded sets generated by nets. A natural way to obtain sharply open, closed and bounded sets in ρ R n is by using a net (A ε ) of subsets A ε ⊆ R n . We have two ways of extending the membership relation x ε ∈ A ε to generalized points [x ε ] ∈ ρ R n (cf. [24,57]): (ii) Let (x ε ) be a net of points of R n , then we say that x ε ∈ ε A ε , and we read it as (x ε ) strongly belongs to if there exists a representative [x ε ] = x such that x ε ∈ A ε for ε small, whereas this membership is independent from the chosen representative in case of strongly internal sets. An internal set generated by a constant net A ε = A ⊆ R n will simply be denoted by [A].
The following theorem (cf. [24,25,57]) shows that internal and strongly internal sets have dual topological properties: For example, it is not hard to show that the closure in the sharp topology of a ball of center c = [c ε ] and radius r = [r ε ] > 0 is

2.2.
Generalized smooth functions and their calculus. Using the ring ρ R, it is easy to consider a Gaussian with an infinitesimal standard deviation. If we denote this probability density by f (x, σ), and if we set σ = [σ ε ] ∈ ρ R >0 , where σ ≈ 0, we obtain the net of smooth functions (f (−, σ ε )) ε∈I . This is the basic idea we are going to develop in the following Definition 9. Let X ⊆ ρ R n and Y ⊆ ρ R d be arbitrary subsets of generalized points. Then we say that Let us note explicitly that this definition states minimal logical conditions to obtain a settheoretical map from X into Y and defined by a net of smooth functions of which we can take arbitrary derivatives still remaining in the space of ρ-moderate nets. In particular, the following Thm. 10 states that the equality f ([ Theorem 10. Let X ⊆ ρ R n and Y ⊆ ρ R d be arbitrary subsets of generalized points. Let f ε ∈ C ∞ (Ω ε , R d ) be a net of smooth functions that defines a generalized smooth map of the type X −→ Y , then (iv) GSF are closed with respect to composition, i.e. subsets S ⊆ ρ R s with the trace of the sharp topology, and GSF as arrows form a subcategory of the category of topological spaces. We will call this category ρ GC ∞ , the category of GSF. Therefore, with pointwise sum and product, any space ρ GC ∞ (X, ρ R) is an algebra.
Similarly, we can define generalized functions of class ρ GC k , with k ≤ +∞: Definition 11. Let X ⊆ ρ R n and Y ⊆ ρ R d be arbitrary subsets of generalized points and k ∈ N ∪ {+∞}. Then we say that Note that properties (iv), (v) are required only for |α| = k because for lower length they can be proved using property (iii) and the classical mean value theorem for f ε (see e.g. [24,25]). From (i) and (ii) of Thm. 10 it follows that this definition of ρ GC k is equivalent to Def. 9 if k = +∞. Moreover, properties similar to (iii), (iv) of Thm. 10 can also be proved for ρ GC k .
The differential calculus for ρ GC k+1 maps can be introduced by showing existence and uniqueness of another ρ GC k map serving as incremental ratio (sometimes this is called derivativeï¿oe la Carathï¿oeodory, see e.g. [37]).
Theorem 12. Let U ⊆ ρ R n be a sharply open set, let v = [v ε ] ∈ ρ R n , k ∈ N ∪ {+∞}, and let f ∈ ρ GC k+1 (U, ρ R) be a ρ GC k+1 map generated by the net of functions f ε ∈ C k+1 (Ω ε , R). Then There exists a sharp neighborhood T of U × {0} and a map r ∈ ρ GC k (T, ρ R), called the generalized incremental ratio of f along v, such that , so that ∂f ∂v ∈ ρ GC k (U, ρ R). Note that this result permits us to consider the partial derivative of f with respect to an arbitrary generalized vector v ∈ ρ R n which can be, e.g., infinitesimal or infinite. Using recursively this result, we can also define subsequent differentials D j f (x) as j−multilinear maps, and we set , the generalized number defined by the operator norms of the multilinear maps A ε ∈ L j (R n , R d ).
The following result follows from the analogous properties for the nets of smooth functions defining f and g.
Note that the absolute value function | − | : ρ R −→ ρ R is not a GSF because its derivative is not sharply continuous at the origin; clearly, it is a ρ GC 0 function. This is a good motivation to introduce integration of ρ GC k functions. One dimensional integral calculus is based on the following Then, there exists one and only one generalized We can thus define Definition 15. Under the assumptions of Thm. 14, we denote by´( All the classical rules of integral calculus hold in this setting: .
We also have a generalization of Taylor formula: Then, for all n ∈ N ≤k we have Formulas (i) and (ii) correspond to a plain generalization of Taylor's theorem for ordinary functions with Lagrange and integral remainder, respectively. Dealing with generalized functions, it is important to note that this direct statement also includes the possibility that the differential d n+1 f (ξ) may be infinite at some point. For this reason, in (2.3) and (2.4), considering a sufficiently small increment k, we get more classical infinitesimal remainders d n+1 f (ξ) · k n+1 ≈ 0. We can also define right and left derivatives as e.g. f (a) := f + (a) := limt→a a<t f (t), which always exist if Analogously to the classical case, we say that x 0 ∈ X is a local minimum of f ∈ ρ GC k (X, ρ R) if there exists a sharply open neighbourhood (in the trace topology) Y ⊆ X of x 0 such that f (x 0 ) ≤ f (y) for all y ∈ Y . A local maximum is defined accordingly.

2.3.
Embedding of Sobolev-Schwartz distributions and Colombeau generalized functions. We finally recall two results that give a certain flexibility in constructing embeddings of Schwartz distributions. Note that both the infinitesimal ρ and the embedding of Schwartz distributions have to be chosen depending on the problem we aim to solve. A trivial example in this direction is the ODE y = y/dε, which cannot be solved for ρ = (ε), but it has a solution for ρ = (e −1/ε ). As another simple example, if we need the property H(0) = 1/2, where H is the Heaviside function, then we have to choose the embedding of distributions accordingly. See also [26,48] for further details. If ϕ ∈ D(R n ), r ∈ R >0 and x ∈ R n , we use the notations r ϕ for the function x ∈ R n → 1 r n · ϕ x r ∈ R and x ⊕ ϕ for the function y ∈ R n → ϕ(y − x) ∈ R. These notations permit us to highlight that is a free action of the multiplicative group (R >0 , ·, 1) on D(R n ) and ⊕ is a free action of the additive group (R >0 , +, 0) on D(R n ). We also have the distributive property r (x ⊕ ϕ) = rx ⊕ r ϕ.
and ψ ε is even for all ε ∈ I. (ii) Let ω n denote the surface area of S n−1 and set c n := 2n ωn for n > 1 and c 1 : Concerning embeddings of Schwartz distributions, we have the following result, where c( Theorem 21. Under the assumptions of Lemma 20, let Ω ⊆ R n be an open set and let (ψ b ε ) be the net defined in 20. Then the mapping and satisfies the following properties: Concerning the embedding of Colombeau generalized functions, we recall that the special Colombeau algebra on Ω is defined as the quotient G s (Ω) := E M (Ω)/N s (Ω) of moderate nets over negligible nets, where the former is Using ρ = (ε), we have the following compatibility result: for every open set Ω ⊆ R n .

Example 23. (i)
Let δ, H ∈ ρ GC ∞ ( ρ R, ρ R) be the corresponding ι b -embeddings of the Dirac delta and of the Heaviside function. Then  (v) which is a necessary property to prove Thm. 21.(i), see [25,26]. It is well-known that the latter property is one of the core ideas to bypass the Schwartz's impossibility theorem, see e.g. [31].

Functionally compact sets and multidimensional integration.
2.4.1. Extreme value theorem and functionally compact sets. For GSF, suitable generalizations of many classical theorems of differential and integral calculus hold: intermediate value theorem, mean value theorems, suitable sheaf properties, local and global inverse function theorems, Banach fixed point theorem and a corresponding Picard-Lindelï¿oef theorem both for ODE and PDE, see [22,[24][25][26]48].
Even though the intervals [a, b] ⊆ R, a, b ∈ R, are not compact in the sharp topology (see [24]), analogously to the case of smooth functions, a ρ GC k map satisfies an extreme value theorem on such sets. In fact, we have: We shall use the assumptions on K and (K ε ) given in this theorem to introduce a notion of "compact subset" which behaves better than the usual classical notion of compactness in the sharp topology.
We motivate the name functionally compact subset by noting that on this type of subsets, GSF have properties very similar to those that ordinary smooth functions have on standard compact sets.
Remark 26. In the present paper, we need the following properties of functionally compact sets.
As a corollary of this theorem and Rem. (26).(ii) we get Let us note that a, b ∈ R can also be infinite numbers, e.g.
Finally, in the following result we consider the product of functionally compact sets: A theory of compactly supported GSF has been developed in [22], and it closely resembles the classical theory of LF-spaces of compactly supported smooth functions. It establishes that for suitable functionally compact subsets, the corresponding space of compactly supported GSF contains extensions of all Colombeau generalized functions, and hence also of all Schwartz distributions.
As in the classical case (see e.g. [21]), thanks to the extreme value theorem 24 and the property of functionally compact sets K, we can naturally define a topology on the space ρ GC k (K, ρ R d ): Definition 30. Let K f ρ R n be a functionally compact set such that K = • K (so that partial derivatives at sharply boundary points can be defined as limits of partial derivatives at sharply interior points; such K are called solid sets). Let l ∈ N ≤k and v ∈ ρ GC k (K, ρ R d ). Then The following result (see [43] and [2] for a similar approach) permits us to calculate the (generalized) norm v l using any net (v ε ) that defines v.
Lemma 31. Under the assumptions of Def. 30, let [K ε ] = K f ρ R n be any representative of K. Then we have: Using these ρ R-valued norms, we can naturally define a topology on the space ρ GC k (K, ρ R d ).
One can easily prove that sharply open sets form a sequentially Cauchy complete topology on ρ GC k (K, ρ R d ), see e.g. [23,49]. The structure ρ GC k (K, ρ R d ), ( − l ) l≤k has the usual properties of a graded Frï¿oechet space if we replace everywhere the field R with the ring ρ R, and for this reason it is called an ρ R-graded Frï¿oechet space.

Multidimensional integration.
Finally, to deal with higher-order calculus of variations, we have to introduce multidimensional integration of GSF on suitable subsets of ρ R n (see [25]).
Definition 33. Let µ be a measure on R n and let K be a functionally compact subset of ρ R n . Then, we call K µ-measurable if the limit exists for some representative (K ε ) of K. Here m ∈ N, the limit is taken in the sharp topology on In the following result, we show that this definition generates a correct notion of multidimensional integration for GSF.
(iii) Let (K ε ) be any representative of K and let f ∈ ρ GC ∞ (K, ρ R) be a GSF defined by the net (f ε ). ThenˆK exists and its value is independent of the representative (K ε ).
(iv) There exists a representative (K ε ) of K such that for any representatives (a i,ε ), (b i,ε ) of a i and b i respectively. Therefore, if n = 1, this notion of integral coincides with that of Thm. 14 and Def. 15.

Higher-order calculus of variations for generalized functions
In this section, we prove for GSF the higher-order Euler-Lagrange equation, the du Bois-Reymond optimality condition and the Noether's theorem. We start with some definition of basic notions and notations. Using the sharp topology of Def. 32, we can define when a curve is a minimizer of a given functional. Note explicitly that there are no restrictions on the generalized numbers a, b ∈ ρ R, a < b, e.g. they can also both be infinite numbers.
Definition 35. Let t 1 , t 2 ∈ ρ R, with t 1 < t 2 , and let m ∈ N >0 , then The subscript "bd" stands here for "boundary values". Note explicitly that both ρ GC ∞ 0 (m) and ρ GC ∞ Note explicitly that Thm. (iv).10 (closure of GSF with respect to composition) and Def. 15 of 1-dimensional integral of GSF (i.e. Thm. 14) allows us to say that J(q) is a well-defined number in ρ R.
The following results establish classical necessary and sufficient conditions to decide if a function u is a local minimizer for the functional (3.3).
is well defined and continuous with respect to the trace of the sharp topology in its codomain. Therefore, we can findr . This shows that the GSF s ∈ Br(0) → J(q + sh) ∈ ρ R has a local minimum at s = 0. Now, we apply Lem. 19 and thus the claims are proven.
. We claim that s = 0 is a minimum of ψ. In fact, for all s ∈ B 1 (0) by Taylor's Thm. 18 In the following, we always assume the notations of Def. 35. Moreover, we also set for the set of all the polynomials with coefficients in the ring ρ R having degree less of equal to d ∈ N.
In order to prove the higher-order version of the fundamental lemma, we need the following preliminary results: implies f (x) ≤ dρ q and hence also, letting q → +∞, f (x) ≤ 0. Since f (x) ≥ 0, this would imply f (x) = 0. Therefore, taking the negation of (3.5) we get Moreover, from f ≥ 0 and Thm. 16  The second preliminary result introduces the use of approximate identities for convolution with GSF (see also [43,Lem. 4.3]): Only in this statement, we write ∀ 0 t ∈ ρ R >0 to denote ∃r ∈ ρ R >0 ∀t ∈ B r (0) ∩ ρ R >0 , i.e. "for t sufficiently small (in the sharp topology)". Assume that Proof. We only have to generalize the classical proof concerning limits of convolutions with strict delta nets. We first note that For each r ∈ ρ R >0 , sharp continuity of f at x yields |f (x − y) − f (x)| < r for all y such that |y| < δ ∈ ρ R >0 , and we can take δ < R. Assuming that (ii) holds for all 0 < t < s, for where we used (iii). The right hand side of (3.8) can be taken arbitrarily small in ρ R >0 because it holds for all r ∈ ρ R >0 .

3.1.
Higher-order Euler-Lagrange equation, D'Alembert principle and du Bois-Reymond optimality condition. In this section, we first compute the first variation δJ(q; h) and thereby we deduce the Euler-Lagrange equations both in integral and differential forms, and the D'Alembert principle in differential form.
In order to prove the Euler-Lagrange equation in integral form (see below Cor. 44), we first need to show the fundamental lemma and the higher order du Bois-Reymond lemma of the calculus of variations for generalized functions.
Lemma 41 (Fundamental Lemma of the Calculus of Variations). Let a, b ∈ ρ R such that a < b, and the assumptions of Lem. 39 hold. Therefore and Lem. 5 hence yields f (x) = 0.
Lemma 42 (Fundamental higher order du Bois-Reymond lemma). Let a, b ∈ ρ R be such that Proof. For all h ∈ ρ GC ∞ 0 (a, b), using m times integration by parts in (3.12), by h (i) (a) = h (i) (b) = 0 for all i = 0, . . . , m − 1 we get´b a f (m) (t)h(t) dt = 0. Thereby, the fundamental Lem. 41 yields f (m) = 0. Integrating m times (see Thm. 14), we get the claim As a simple consequence, we have the following  0 (a, b). Therefore, Lem. 42 implies that g(t) − F (t) is a constant (in ρ R), and hence g = F = f as claimed.
Applying the higher-order du Bois-Reymond Lem. 42, we arrive at the first form of the Euler-Lagrange equations: , ρ R is a weak extremal of functional (3.3), then q satisfies the following higher-order Euler-Lagrange integral equation: (3.13) From the second equality of Lem. 40 and the fundamental Lem. 41, we obtain Corollary 45 (Higher-order Euler-Lagrange equations in differential form). (3.14) Finally, Lem. 40, Def. 35.(v) and the fundamental Lem. 41 directly yield the D'Alembert principle in differential form: and assume that the functional J satisfies at q the D'Alembert's principle with generalized forces Q, then Associated to a given function q ∈ ρ GC ∞ [t 1 , t 2 ], ρ R , it is convenient to introduce the following quantities (see [20]): These operators are useful for our purposes because of the following properties: We are now in conditions to prove a higher-order du Bois-Reymond optimality condition.
Proof. Using, for simplicity, ϕ j := ϕ j (q), we have d dt We now simplify the second term on the right-hand side of (3.19), which forms a telescopic sum: Substituting (3.20) into (3.19) and using the higher-order Euler-Lagrange equations (3.14), and since, by definition, we obtain the intended result, that is,

3.2.
Higher-order Noether's theorem. In our approach to Noether's theorem, we can use the non-Archimedean language of ρ R to formalize some infinitesimal properties frequently informally used in this topic.

(3.25)
Proof. Differentiating (3.21) with respect to s at s = 0 ∈ P and t ∈ T and using Rem 50.(ii), we obtain Note explicitly that using two times Thm. 12, for any GSF f we always have and for all i = 2, . . . , m ∂ ∂s

Theorem 53. If functional (3.3) is invariant in the sense of Def. 49, then the quantity C[q] m (t)
defined for all q ∈ ρ GC ∞ ([t 1 , t 2 ], S ) and t ∈ T by is a constant of motion on T and S .
Proof. The proof follows directly from the previous results and no specific property of GSF is needed. For simplicity, we use the notations ϕ i := ϕ i (q, ·) and η i := η i (q, ·).

Optimal control problems
Thm. 53 gives a Lagrangian formulation of Noether's principle extended to the generalized smooth functions setting. In this section, we adopt the Hamiltonian formalism in order to generalize Noether's principle to the wider context of optimal control, see e.g. [15].

(CP)
We explicitly note here that, because our generalized functions are set-theoretical maps, and because of the closure with respect to composition, the natural notion of solution for a differential equation with GSF is given by pointwise equality. On the other hand, Thm. 14 and Cor. 43 imply that, for GSF, the notion of pointwise solution is equivalent to that of weak solution.
In problem (4.1)-(CP), 3), this is clearly more general than the usual traditional case, where state and control functions are assumed to be piecewise smooth. In particular, we will always consider state and control functions in the following spaces where r ∈ ρ R >0 is a fixed generalized radius such that B r (q 1 ) ⊆ H. In this section, we use simplified notations, e.g., of the form L(·, q,q) or ϕ(·, q, u) to denote compositions t ∈ [t 1 , In developing this topic, it is therefore essential to already have suitable results about solutions of ODE with GSF. The Banach fixed point theorem can be easily generalized to spaces of generalized continuous functions with the sup-norm − 0 (see Def. 30). As a consequence, we have the following Picard-Lindelï¿oef theorem for ODE in the ρ GC k setting, see also [16,49], and [67] for an updated reference about solution of (fractional) ODE in the framework of Colombeau's algebra.
Finally, we have the following Grï¿oenwall-Bellman inequality in integral form: Theorem 55. Let α ∈ ρ R >0 . Let u, a, b ∈ ρ GC k [0, α], ρ R and assume that a 0 · α < N · log dρ −1 for some N ∈ N. Assume that a(t) ≥ 0 for all t ∈ [0, α], and that u(t) ≤ b(t) + t 0 a(s)u(s) ds. Then In this section, we always assume that the state equation ϕ satisfies the assumptions of Thm. 54, i.e. setting: we assume that Therefore, Thm. 54 allows us to state that ∀u ∈ A ∃!q u ∈ Q : (CP) holds for all t ∈ [t 1 , t 2 ]. (4.8) Note that Thm. 54 in particular yields both , and thereby q u ∈ Q (see (4.2)). Therefore, (4.8) holds for all solid functionally compact sets H f ρ R d and K f ρ R l such that both B r (q 1 ) ⊆ H and (4.7) hold. Note also explicitly, the importance in this deduction of the closure of GSF (of which, we recall, Sobolev-Schwartz distributions are a particular case) with respect to composition (Thm. 10.(iv)).
Note that the Lipschitz constant J ∈ ρ R in (4.26) can be an infinite number, e.g. if the Lagrangian L shows some kind of singularity as a generalized function. (4.28) Moreover, for any control u ∈ A, let p u ∈ ρ GC ∞ ([t 1 , t 2 ], ρ R d ) denote the adjoint variable (generalized momentum), i.e. the unique GSF solution of the Cauchy problem As we showed above, this problem has a unique solution on [t 1 , t 2 ] because of our assumptions (4.7).
We want to close this section, giving a proof of the weak Pontryagin Maximum Principle, i.e. a theorem where instead of the usual condition (if v is an optimal control, i.e. it solves problem (4.11); see e.g. [4,33,61,65]) we have instead the necessary condition ∂ u H(t, q v (t), v(t), p v (t)) = 0 assuming that v ∈ A is a local minimum of the functional Directly from the definition of Hamiltonian, we get that, for any control u ∈ A, the pair (q u , p u ) is a solution of the following Hamiltonian system: Next, we prove the following optimality condition for the functional (4.10): Theorem 63. In the assumptions of Thm. 60, let u ∈ Then ∂H ∂u (·, q u , u, p u ) ·ū.
The next proposition generalizes the du Bois-Reymond necessary optimality condition Thm. 47 (as usual, set in the following theorem: m = 1, ϕ(t, q, u) = u so thatq = u). Its proof follows directly from (4.28) and the system (WPS).
Note that for the particular case of calculus of variations (ϕ(t, q, u) = u and hence H = L + p ·q) one obtains from (4.36) the necessary condition of invariance Lem. 51.

Examples and applications
5.1. Modeling singular dynamical systems. In this section we show several applications of the calculus of variations with GSF we introduced above. As we already mentioned in the introduction, we will not consider mathematical models of singular dynamical systems at the times when singularities occur. Indeed, this would clearly require new physical ideas, e.g. in order to consider the nonlinear behavior of objects or materials for the entire duration of the singularity. Like in every mathematical model, the correct point of view concerns J. von Neumann's reasonably wide area of applicability of a mathematical model: To begin, we must emphasize a statement which I am sure you have heard before, but which must be repeated again and again. It is that the sciences do not try to explain, they hardly ever try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of some verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work -that is correctly to describe phenomena from a reasonably wide area. [73, pag. 492] Therefore, it is not epistemologically correct to use the theory described in the present article to deduce a physical property of our modeled systems when a singularity occurs. Stating it with a language typically used in physics, we consider physical systems where the duration of the singularity is negligible with respect to the durations of the other phenomena that take place in the system. Mathematically, this means to consider as infinitesimal the duration of the singularities. As a consequence, several quantities changing during this infinitesimal interval of time have infinite derivatives. We can hence paraphrase the latter sentence saying that the amplitude (of the derivatives) of these physical quantities is much larger than all the other quantities we can estimate in the system. However, this is a logical consequence of our lacking of interest to include in our mathematical model what happens during the singularity, constructing at the same time a beautiful and sufficiently powerful mathematical model, and not because these quantities really become infinite.
As we will see below for the singularly variable length pendulum (Sec. 5.3), only a rigorous mathematical theory of infinitesimal quantities is able to resolve, e.g., the apparent inconsistency of considering infinitesimal oscillations and at the same time neglecting the infinitesimal duration of the singularity.
On the other hand, the aforementioned "wide area" is now able to include in a single equation the dynamical properties of our modeled systems, without being forced to subdivide into cases of the type "before/after the occurrence of each singularity". Which is not even reasonable in several cases, e.g. in the motion of a particle in a granular medium or of a ray of light in an optical fiber. Clearly, if an explicit analytic solution is possible, this is preferable, but this is a rare event. On the other hand, in numerous cases we have to deal with a differential equation whose singularities are generated by Heaviside's functions or Dirac's deltas and linear or nonlinear operations with them. Embeddings of these generalized functions as GSF are studied and quite well-known, see Sec. 2.3. Their pictures, Fig. 2.1, are clearly obtained by numerical methods, but their properties can be fully justified by suitable theorems, see e.g. example 23. In the same way, we can consider numerical solutions of our differential equations as empirical laboratories helping us to guess suitable properties and hence conjectures on the solutions. In principle, these properties must be justified by corresponding theorems. From this point of view, the fact that GSF share with ordinary smooth functions a lot of classical theorems (such as the intermediate value, the extreme value, the mean value, Taylor theorems, etc.) is usually of great help. 5.3. Singularly variable length pendulum. As a first example, we want to study the dynamics of a pendulum with singularly variable length, e.g. because it is wrapping on a parallelepiped (see Fig. 5.1; see [54] for a similar but non-singular case).
The pendulum length function is therefore Λ(θ) = H(θ 0 −θ)L 1 +L 2 , where H is the (embedding of the) Heaviside function. We always assume that L 1 , L 2 ∈ ρ R >0 are finite and non-infinitesimal numbers. From this it follows that for all θ, L1 and hence that also Λ(θ) > dρ is invertible (recall that H has negative infinitesimal oscillations in an infinitesimal neighborhood of the origin, see Fig. 2.1).
The kinetic energy is given by: The potential energy (the zero level being the suspension point of the pendulum) is: and δ is the Dirac delta function. We then obtain the following equation of motion: Taking into account thatΛ(θ) = Λ (θ)θ, we finally obtain the equation of motion for the variable length pendulum:θ Note in (5.9) the nonlinear operations on the Sobolev-Schwartz distribution Λ, on the GSF θ and the composition t → Λ(θ(t)). Since the Lagrangian L is autonomous, in the usual way Noether's Thm. 53 implies that the mechanical energy of the system is a constant of motion: (5.10) Even if local (in time) existence and uniqueness of the solution θ of (5.10) is easily granted by Thm. 54 and by the extreme value Thm. 24, the existence of a global solution is not so easy to show (see in [49] the general theory) and is out of the scope of the present work. Before showing the numerical solution of (5.9), let us consider the simplest case of the dynamics far from the singularity and that of small oscillations. The former, as we mentioned above, is the only physically meaningful one. 5.3.1. Description far from singularity and small oscillations. For simplicity, let us consider the simplest case θ 0 = 0. Furthermore, we consider that the pendulum is initially at rest and starts its movement at t 1 ∈ ρ R. The initial conditions we use are hence: with θ 1 < 0. Assuming that at some time t 3 ∈ ρ R we have θ(t 3 ) > 0, by the intermediate value theorem for GSF, there exists t 2 ∈ ρ R where we have the singularity, i.e. θ(t 2 ) = 0 and the length of the pendulum smoothly changes from L 1 + L 2 to L 2 after the rope touches the parallelepiped. By example 23, it follows that this change happens in an infinitesimal interval, because by contradiction it is possible to prove that if Λ(θ) ∈ (L 2 , L 1 + L 2 ), then |θ| ≤ −1 log dρ ≈ 0.
Definition 71. Let x, y ∈ ρ R. We say that x is far from y if |x − y| ≥ dρ a for all a ∈ R >0 . More generally, we say that x is far from y with respect to the class of infinitesimals For example, if |x| ≥ r for some r ∈ R >0 , then x is far from 0, but also the infinitesimal number x = −1 k log dρ (k ∈ R >0 ) is far from 0; similarly, the infinitesimal x = −1 k log log dρ if far from 0 with respect to all the infinitesimals of the type −1 h log dρ for h ∈ R >0 . If θ is far from 0 and b ≥ dρ −a , a ∈ R >0 , then |bθ| ≥ dρ −a |θ| ≥ dρ −a/2 ≥ 1. Therefore, from example 23, it follows that H(−θ) ∈ {0, 1} and henceΛ(θ(t)) = 0. Equation (5.9) becomes If we assume that θ(t 1 ) = θ 1 < 0 and θ(t 3 ) > 0 are far from 0, the sharp continuity of θ yields the existence of δ 1 , δ 3 ∈ ρ R >0 such that because θ(t 2 ) = 0). Assuming that t 1 , t 3 are far from t 2 , without loss of generality we can also assume to have taken δ i so small that also t 1 + δ 1 and t 3 − δ 3 are far from t 2 .
Since each t ∈ [t 1 , t 1 + δ 1 ) ∪ (t 3 − δ 3 , t 3 ] is far from t 2 , we can also formally join the two solutions ϑ i using the Heaviside's function: For the motivations previously stated, this infinitesimal approximation cannot be extended to a neighborhood of t 2 . We close this section noting that all these deductions can be repeated using any GSF H ∈ ρ GC ∞ ( ρ R, ρ R) satisfying for all x far from zero H(x) = 1 if x > 0 and H(x) = 0 if x < 0. This allows us to consider e.g. a GSF that does not show the infinitesimal oscillations of the Heaviside's function in an infinitesimal neighborhood of the origin. The graph of θ(t), and its derivativeθ(t), based on the Mathematica definitions of H(x) and δ(x) (see [75]) are shown in Figure 5.2. In Figure 5.3 we show the second derivative graph. Directly from (5.9) and (5.7) we can prove that when θ(t) = θ 0 ,θ(t) is an infinite number and henceθ(t) has a corner point. Note finally that no infinitesimal oscillations are represented in an infinitesimal neighborhood of the singularities. This is due to the Mathematica implementation of H and δ: we could hence say that these graphs represent the solution far from the singularities.

5.4.
Oscillations damped by two media. The second example concerns oscillations of a pendulum in the interface of two media. Since we have to neglect the dynamics occurring at singular times (i.e. at the changing of the medium), this can be considered only a toy model approximating the case of a very small but sufficiently heavy moving particle.
We hence want to model the system employing a "jump" in the damping coefficient β, i.e. a finite change occurring in an infinitesimal interval of time, see Fig. 5.4. Since the frictional forces acting in this case are not conservative, it is well-known that the Euler-Lagrange equations cannot be assumed to describe the dynamics of the system and we have to use the D'Alembert principle Cor. 46; see [42] for a deeper study.
The kinetic energy is given by: In case of fluid resistance proportional to the velocity, we can introduce the generalized forces Q as: where r is a proportional coefficient depending on the media. Let's define the Lagrangian L as L(θ,θ) := T (θ) − U (θ). (5.20) We hence assume that the equation of motion for this non-conservative system is given by the D'Alembert's principle, so that Cor. 46 gives d dt By introducing the damping coefficient β(θ) := r(θ)/(2m) (we clearly assume that the mass m ∈ ρ R > 0 is invertible) we obtain the classical form of the equation of motion for damped If the pendulum crosses the boundary between two media (see Fig. 5.4) with damping coefficients β 1 and β 2 , we can model the system using the Heaviside function H: β(θ) = β 1 + (H(θ + θ 0 ) − H(θ − θ 0 )) (β 2 − β 1 ), (5.24) where θ = ±θ 0 are the angles at which we have the changing of the medium (singularities). The numerical solution of (5.23) with β defined by (5.24) and initial conditions (5.16) is presented in Fig. 5.5. The numerical solution has been computed using Mathematica Solver NDSolve, but with an implementation of the Heaviside's function H corresponding to Thm. 21, i.e. as represented in Fig. 2.1.
We also include the graphs of the angular frequencyθ (which shows corner points) and of the angular accelerationθ (which shows "jumps", i.e. infinite derivatives at singular times, as we can directly see from (5.23) and (5.24)).
Note that, to simplify our analysis, we considered a fixed length Λ for the pendulum. However, in the framework of the suggested model, we could also consider a singularly variable length pendulum dumped by two media.
Once again, we note that the model can be refined by considering, instead of the Heaviside's function, any GSF H ∈ ρ GC ∞ ( ρ R, ρ R) satisfying for all x far from zero H(x) = 1 if x > 0 and H(x) = 0 if x < 0; e.g. this refinement could allow one to consider different "large" infinitesimal neighborhoods of the origin in order to take into account a better modeling near the singularities. 5.5. Pais-Uhlenbeck oscillator with singular frequencies. An interesting higher order system with several applications in Physics is the Pais-Uhlenbeck (PU) oscillator [59]. The PU oscillator is usually studied as a toy model of recent higher-derivative theories like gravity, quantummechanical field theories and others (see for example [1,5,32,50]), but it can also describe simple real word systems such as, for example, electrical circuits [5]. Actually, the PU oscillator is a bi-harmonic oscillator with two natural frequencies and is described by a fourth-order equation of motion. An example of PU GSF oscillator would be a PU electric circuit with singularly variable frequencies. We can think at an (exogenous or endogenous) force acting on the system and rapidly changing these frequencies.
The oscillator we consider is given by the following Lagrangian function L(t, q,q,q) = m 2 q 2 − ω 2 1 (t) + ω 2 2 (t) q 2 + ω 2 1 (t)ω 2 2 (t)q 2 ∀t ∈ [t 1 , t 3 ] (5.25)  Second derivativeθ of the solution of (5.23) (blue line). The case with β = const = β 1 is also shown for comparison (violet line). Note the "jumps" at the singular moments, for example at t = 0.083 s (scaled in the right figure). The infinitesimal oscillations are caused by the embedding as GSF of the Heaviside function.

Conclusions
The present paper concretely shows the possibility to develop calculus of variations and optimal control for a class of generalized functions extending Sobolev-Schwartz distributions. On the one hand, we have clearly only presented basic results, but, on the other hand, this paves the way for a lot of further results in pure and applied mathematics, theoretical physics and engineering.
We have also shown that generalized smooth function theory has features that closely resemble classical smooth functions. In contrast, some differences have to be carefully considered, such as the fact that the new ring of scalars ρ R is not a field, it is not totally ordered, it is not ordered complete, so that its theory of supremum and infimum is more involved (see [56]), and its intervals are not connected in the sharp topology because the set of all the infinitesimals is a clopen set. Almost all these properties are necessarily shared by other non-Archimedean rings because their opposite are incompatible with the existence of infinitesimal numbers.
Conversely, the ring of Robinson-Colombeau generalized numbers ρ R is a framework where the use of infinitesimal and infinite quantities is available, it is defined using elementary mathematics, and with a strong connection with infinitesimal and infinite functions of classical analysis (see Def. 2). As we have shown in Sec. 5.3 with the singularly variable length pendulum, this leads to a better understanding and opens the possibility to define new models of physical systems.