A note on continuous-time stochastic approximation in infinite dimensions*

We find sufficient conditions for convergence of a continuous-time Robbins-Monro type stochastic approximation procedure in infinite-dimensional Hilbert spaces in terms of Lyapunov functions, the variational approach to stochastic partial differential equations being used as the main tool.


1.1
Stochastic approximation was originally introduced as a procedure for sequentially finding a zero or an extremum point of a function which can be observed only with a random measurement error; it has found many applications, e.g. to recursive estimation, adaptive control or learning algorithms; see the books [BMP, BPP, Bo, Che2, KC] or [KY] for thorough information about stochastic approximation methods. The seminal Robbins-Monro procedure may be roughly described as follows: let R : R → R be a function which is known to have a unique root x_0, but the observation of R(x) at time k ∈ N is corrupted by a noise e_k(x). Let α_n > 0 be such that

∑_{n≥1} α_n = ∞,  ∑_{n≥1} α_n^2 < ∞,

and define recursively Y_{n+1} = Y_n + α_{n+1}(R(Y_n) + e_{n+1}(Y_n)). Then under suitable assumptions upon the function R and the random variables e_k(x) it may be shown that Y_n → x_0 almost surely as n → ∞. M. B. Nevel'son and R. Z. Khas'minskiȋ in their book [NCh] studied a continuous-time version of stochastic approximation. In particular, they introduced a continuous-time analogue of the Robbins-Monro procedure: consider a stochastic differential equation

dX = α(t) R(X) dt + σ(t, X) dW,  X_0 = x,  (1.1)

where W is a Wiener process and α is a strictly positive function in L^2(R_≥0) \ L^1(R_≥0). Sufficient conditions for X_t to converge to the zero set of R almost surely as t → ∞ were found in terms of the existence of a suitable Lyapunov function for (1.1). One may consult the book [Ko] or the papers [Pf, Che, La1, La2, La3, La4] for further results in this direction. Owing to powerful tools from stochastic analysis, proofs in the continuous-time case may be presented in a very lucid way (cf. also [Che] for a discussion of this point).
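For orientation, the scalar case is easy to simulate. The sketch below is not from the text: the concrete choices R(x) = x_0 − x with x_0 = 2, standard Gaussian observation noise, steps α_n = 1/(n+1), and, for the continuous-time analogue of (1.1), α(t) = 1/(1+t) ∈ L^2(R_≥0) \ L^1(R_≥0) with σ ≡ α are all illustrative assumptions.

```python
import numpy as np

def robbins_monro_discrete(x0_root=2.0, n_steps=20000, seed=0):
    """Discrete Robbins-Monro recursion Y_{n+1} = Y_n + a_n (R(Y_n) + e_n)
    for R(x) = x0 - x, with a_n = 1/(n+1): sum a_n = inf, sum a_n^2 < inf."""
    rng = np.random.default_rng(seed)
    y = 0.0
    for n in range(n_steps):
        a = 1.0 / (n + 1)
        noise = rng.normal()            # observation error e_n
        y = y + a * ((x0_root - y) + noise)
    return y

def robbins_monro_sde(x0_root=2.0, T=2000.0, dt=0.05, seed=0):
    """Euler-Maruyama discretisation of dX = a(t) R(X) dt + a(t) dW,
    with a(t) = 1/(1+t) and R(x) = x0 - x (illustrative choices)."""
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    for _ in range(int(T / dt)):
        a = 1.0 / (1.0 + t)
        x += a * (x0_root - x) * dt + a * np.sqrt(dt) * rng.normal()
        t += dt
    return x
```

With these choices both iterates approach the root x_0 = 2; the Euler-Maruyama scheme is of course only a numerical approximation of (1.1), not part of the theory developed here.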
The aim of this note is to extend the stochastic analysis approach, in the form proposed by Nevel'son and Khas'minskiȋ, to infinite-dimensional Hilbert spaces. Several results on discrete-time stochastic approximation in infinite-dimensional spaces are available, cf. e.g. [BRS, Go, KS, Ni, Yi, YZ], but the only paper using infinite-dimensional stochastic analysis to study stochastic approximation we are aware of is [BYY, §4]. However, [BYY] treats stochastic delay equations, whilst we are interested in stochastic partial differential equations. We confine ourselves to procedures of the Robbins-Monro type in the case of a unique root, since our aim is to indicate how the ideas from [NCh] may be combined with techniques from the theory of stochastic evolution equations, not to obtain the strongest possible results. A typical example we can cover is the following: consider a nonlinear elliptic equation

∆u + r(u) = f in D,  u = 0 on ∂D,  (1.2)

where D ⊆ R^d is a bounded domain with a smooth boundary ∂D, and a stochastic parabolic equation

dX = α(t)(∆X + r(X) − f) dt + α(t)σ(t, X) dW,  (1.3)

where α is again a strictly positive function. Sufficient conditions on r will be found for the solution X of (1.3) to converge almost surely to the (unique) solution u_0 ∈ W^{1,2}_0(D) of (1.2) (see Example 3.1 below). A common approach to equations like (1.3) is to interpret them in the mild sense, as an integral equation driven by the evolution operator U generated by α(·)∆. However, our proofs rely heavily on the use of Lyapunov functions, while mild solutions are not semimartingales and the Itô formula cannot be applied to them directly, approximations of a rather technical nature being needed. Hence we decided to use the theory of variational solutions, going back to [Pa] and [KR] (see e.g. the books [Cho, Chapter 6] and [LR, Chapters 4 and 5] for a more recent presentation). Moreover, this choice makes it possible to deal with quasilinear problems (see Examples 3.2, 3.3).
Before stating our main results we have to introduce some notation and recall a few basic facts about variational solutions we shall need. This is done in the next two subsections; the main results are stated and proved in Section 2, and some illustrative examples are provided in Section 3.

1.2
Let E and F be Banach spaces; we shall denote by L(E, F) the space of all bounded linear operators from E to F. If both spaces are Hilbert, we shall denote by L_2(E, F) the ideal of Hilbert-Schmidt operators in L(E, F). C^k(G) will stand for the space of k-times continuously differentiable real-valued functions on an open set G ⊆ E. If f : G → R, we shall denote by Df(x) and D^2 f(x) the first and second Gâteaux derivative of f at the point x, respectively, provided they exist. Analogously, if f : R_≥0 × G → R, then D_x f(t, x) and D^2_x f(t, x) will stand for the first and second Gâteaux derivative of f(t, ·) at the point x.
For spaces of (Bochner) integrable functions and Sobolev spaces we shall use standard notation; finally, λ will denote the Lebesgue measure on R_≥0.

1.3
Let H and K be real separable Hilbert spaces, and let B be a reflexive Banach space embedded continuously and densely in H. Upon identifying H with its dual H^* we get a Gelfand triple B ⊆ H ⊆ B^*; note that, in this representation, the restriction of the dual pairing ⟨·, ·⟩_{B^*,B} to H × B coincides with the scalar product ⟨·, ·⟩_H in H. In this framework we consider a stochastic evolution equation (1.4).

Definition 1.1. A solution of (1.4) is a triple ((Ω, F, (F_t), P), W, X), where (Ω, F, (F_t), P) is a filtered probability space whose filtration satisfies the usual conditions and on which a standard cylindrical (F_t)-Wiener process W on K and a B^*-valued (F_t)-progressively measurable process X are defined, such that (1.4) holds for all t ≥ 0 P-almost surely.
Since the process X solving (1.4) is in general only B * -valued, the Itô formula cannot be used to compute ϕ(X) for an arbitrary ϕ ∈ C 2 (H) and extra assumptions on ϕ are needed. We state here two Itô formula-type results which we shall need later.
First, let (Ω, F, (F_t), P) be a filtered probability space satisfying the usual conditions and carrying a standard cylindrical (F_t)-Wiener process W on K. Assume that u satisfies (1.5); then there exists a B-valued process ũ such that ũ ∈ L^p_loc(R_≥0; B) P-almost surely and u = ũ λ ⊗ P-almost everywhere on R_≥0 × Ω. Moreover, u has sample paths in C(R_≥0; H) P-almost surely and the Itô formula for ‖u‖²_H holds; see [KR, Theorem 2.17] or [LR, Theorem 4.2.5].
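In the setting of [LR, Theorem 4.2.5] the Itô formula in question takes the following standard form (a sketch in our notation: v and Z denote the drift and diffusion parts of the semimartingale decomposition u(t) = u(0) + ∫_0^t v(s) ds + ∫_0^t Z(s) dW(s), both assumed suitably integrable):

```latex
\|u(t)\|_H^2 \;=\; \|u(0)\|_H^2
  + \int_0^t \Bigl( 2\,{}_{B^*}\!\langle v(s), \tilde u(s)\rangle_{B}
      + \|Z(s)\|_{L_2(K,H)}^2 \Bigr)\,\mathrm{d}s
  + 2\int_0^t \bigl\langle u(s), Z(s)\,\mathrm{d}W(s)\bigr\rangle_H ,
  \qquad t \ge 0 .
```

Note that the drift term is evaluated against the B-valued version ũ of u, which is exactly why the existence of ũ matters in the variational framework.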
Comparing this result with Definition 1.1 we see that any solution X of (1.4) has continuous sample paths in H P-almost surely.
In order to establish the Itô formula for functions more general than ‖·‖²_H one needs an additional hypothesis:

(C) Both B and B^* are uniformly convex.

Remark 1.2. a) The hypothesis (C) is obviously satisfied if B is a Hilbert space. Let us emphasize that (C) can be omitted if ϕ = ‖·‖²_H or, more generally, if processes of the form ψ(t, ‖u(t)‖²_H) with ψ ∈ C^{1,2} are considered.
b) An Itô formula for the process χ(t, u(t)), where χ is a suitable smooth function on R_≥0 × H, is proved in [Cho, Theorem 7.2.1], but under rather restrictive additional assumptions on u.

Main results
Following [NCh], we derive the convergence of a Robbins-Monro type procedure as an immediate corollary to a theorem providing sufficient conditions for the convergence of the paths of any solution of (1.4) to a singleton {x_0}, which will be established first. (In applications to stochastic approximation, x_0 will be the unique root of the drift coefficient, but on an abstract level it may be an arbitrary point in H.) Hence, let us consider equation (1.4) and denote by L the Kolmogorov operator associated with it. Further, let us consider the following conditions (2.1)-(2.3), x_0 being the point introduced in (H1), and

∫_H V(0, y) dµ(y) < ∞.  (2.4)
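For the reader's convenience we recall the usual form of the Kolmogorov operator of a stochastic evolution equation; this is a sketch under the assumption that (1.4) has drift A(t, x) ∈ B^* and diffusion σ(t, x) ∈ L_2(K, H) (the letters A, σ and the orthonormal basis (e_h) of K are our notation, not taken from the text). For sufficiently smooth ϕ with Dϕ(x) ∈ B,

```latex
L(t)\varphi(x) \;=\; {}_{B^*}\!\bigl\langle A(t,x),\, D\varphi(x)\bigr\rangle_{B}
  \;+\; \frac12 \sum_{h \ge 1}
    \bigl\langle D^2\varphi(x)\,\sigma(t,x)e_h,\; \sigma(t,x)e_h \bigr\rangle_H ,
```

the second term being the trace term \tfrac12\,\mathrm{Tr}\bigl(\sigma(t,x)\sigma(t,x)^* D^2\varphi(x)\bigr).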

Remark 2.2.
Tracing the proof below one may check easily that, as in the finite-dimensional case, it suffices to assume instead of (2.1)-(2.3) that V ≥ 0, that there exists τ ≥ 0 such that lim_{x→x_0} sup_{t≥τ} V(t, x) = 0, and that a corresponding smallness condition holds for any ε > 0 and some τ = τ(ε).

Remark 2.3. The singleton {x_0} may be replaced with an arbitrary closed set Γ ⊆ H.
Let (2.1)-(2.3) be modified accordingly; the proof then requires only very straightforward changes. Unfortunately, this result is usually too weak to be applied to equations with multiple roots (cf. the discussion in [NCh, Chapter 5]).
Proof. a) The first two steps of the proof are essentially known from the stability theory of stochastic PDEs, but we provide them for completeness and because we shall refer to parts of the argument in the sequel. Define an auxiliary function U on R_≥0 × H in terms of V and γ. Since γ ∈ L^1(R_≥0), U is obviously well defined and U ≥ 0 on R_≥0 × H. To avoid overcomplicated formulae, we shall proceed as if γ were also continuous, i.e. exp(∫_·^∞ γ(r) dr) ∈ C^1(R_≥0); the general case may be handled in the same way by using (1.7) instead of (1.6). If γ is continuous, then U ∈ K and an easy calculation shows that (2.7) holds. We aim at proving that (U(t, X_t), t ≥ 0) is a supermartingale. Towards this end, define stopping times τ_n (with the convention inf ∅ = +∞) in terms of the process X̃ introduced in Definition 1.1.
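The construction of U is consistent with the following standard choice (a sketch; we assume here that the Lyapunov condition takes the typical form ∂_t V + LV ≤ γ(t)V, which is our reading of the lost display, and the well-definedness of the exponential is exactly where γ ∈ L^1(R_≥0) enters):

```latex
U(t,x) \;:=\; V(t,x)\,\exp\!\Bigl(\int_t^\infty \gamma(r)\,\mathrm{d}r\Bigr),
\qquad
\partial_t U + LU
  \;=\; \bigl(\partial_t V + LV - \gamma(t)V\bigr)
        \exp\!\Bigl(\int_t^\infty \gamma(r)\,\mathrm{d}r\Bigr)
  \;\le\; 0 .
```

With this choice U ≥ 0, U ≥ V, and the inequality ∂_t U + LU ≤ 0 is precisely what the supermartingale argument below requires.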
Plainly, τ_n ↗ ∞ as n → ∞ P-almost surely. Using the Itô formula and (2.7), we get for any t ≥ 0 an estimate for the stopped process, the stochastic integral term being a martingale due to the definition of τ_n and the boundedness of D_x U on bounded subsets of R_≥0 × H. We see that E U(t ∧ τ_n, X_{t∧τ_n}) ≤ E U(0, X_0) for all t ≥ 0 and n ∈ N, the right-hand side being finite by (2.4). Since U ∈ C(R_≥0 × H) and the paths of X are continuous in H, we obtain by the Fatou lemma that U(t, X_t) ∈ L^1(P) for every t ∈ R_≥0. Analogously, for any 0 ≤ s ≤ t we have a conditional version of this estimate. The Fatou lemma for conditional expectations now implies that E[U(t, X_t) | F_s] ≤ U(s, X_s) P-almost surely, which is the supermartingale property. For further use, let us note that proceeding as above we get an analogous inequality, again by the Fatou lemma.
Since U (t, X t ) is a continuous nonnegative supermartingale, the martingale convergence theorem yields a random variable U ∞ ∈ L 1 (P ) such that lim t→∞ U (t, X t ) = U ∞ P -almost surely.
Remark 2.6. a) Note that (2.12) may be satisfied only if x_0 is the unique root of R.
Example 3.1. Consider the nonlinear elliptic problem

∆u + g(u) = f in Λ,  u = 0 on ∂Λ.  (3.1)

Set H = L^2(Λ), B = W^{1,2}_0(Λ) and denote by G the superposition operator defined by g. Assume that G is a continuous mapping from B to H and that there exists a constant in R such that the corresponding one-sided growth estimate holds for any T ∈ R_≥0. Finally, let α be a strictly positive function with the required integrability for any T ∈ R_≥0, let f ∈ B^*, and let µ be a Borel probability measure on H with a finite second moment, i.e. ‖·‖_H ∈ L^2(µ). Then it may be checked easily that all hypotheses of Theorem 4.2.4 in [LR] are satisfied and hence there exists a unique solution ((Ω, F, (F_t), P), W, X) to the stochastic parabolic equation

dX = α(t)(∆X + G(X) − f) dt + α(t)σ(t, X) dW(t),  X(0) ∼ µ,

the Dirichlet Laplacian ∆ being interpreted as an operator in L(B, B^*) in a natural way. Assume that there exists a weak solution u_0 ∈ B of (3.1); one may consult e.g. [BS], [Pr, Chapter 9] or references therein for results in this direction. We want to apply the main theorem of Section 2, whose hypotheses here reduce to conditions (3.6) and (3.7). As we have already mentioned, (3.6) is satisfied if g is either nonincreasing, or Lipschitz continuous with a sufficiently small Lipschitz constant.
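The convergence mechanism of this example can be illustrated by a crude one-dimensional finite-difference simulation. Everything concrete below is an illustrative assumption, not from the text: Λ = (0, 1), g(u) = −u (nonincreasing), f(x) = sin(πx), additive noise of intensity σ = 0.05, α(t) = 1/(1+t), and an explicit Euler scheme with a stability-constrained step.

```python
import numpy as np

# 1-D sketch of dX = a(t)(Delta X + g(X) - f) dt + a(t) sigma dW on (0,1),
# Dirichlet boundary conditions; g(u) = -u, a(t) = 1/(1+t) in L^2 \ L^1.
m = 9                                   # interior grid points
h = 1.0 / (m + 1)
xs = np.linspace(h, 1.0 - h, m)
f = np.sin(np.pi * xs)                  # illustrative right-hand side

# discrete Dirichlet Laplacian (tridiagonal)
A = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) / h ** 2

# target u0 solves Delta u + g(u) = f, i.e. (A - I) u0 = f for g(u) = -u
u0 = np.linalg.solve(A - np.eye(m), f)

rng = np.random.default_rng(1)
X = np.zeros(m)                         # initial state X(0) = 0
t, T, sigma = 0.0, 2000.0, 0.05
err0 = float(np.linalg.norm(X - u0))
while t < T:
    a = 1.0 / (1.0 + t)
    dt = min(0.05, 0.4 * h * h / a)     # explicit-scheme stability constraint
    drift = A @ X - X - f               # Delta X + g(X) - f with g(u) = -u
    X = X + a * drift * dt + a * sigma * np.sqrt(dt / h) * rng.standard_normal(m)
    t += dt
err = float(np.linalg.norm(X - u0))
```

Since g is nonincreasing, the drift is dissipative and the discretised process drifts toward the discrete solution u0 of (3.1), while the decay of α damps the noise; `err` ends up far below the initial error `err0`. This is only a numerical illustration of the statement, not part of its proof.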