Local asymptotic normality for shape and periodicity of a signal in the drift of a degenerate diffusion with internal variables

Taking a multidimensional time-homogeneous dynamical system and adding a randomly perturbed time-dependent deterministic signal to some of its components gives rise to a high-dimensional system of stochastic differential equations which is driven by possibly very low-dimensional noise. Equations of this type are commonly used in biology for modeling neurons or in statistical mechanics for certain Hamiltonian systems. Assuming that the signal depends on an unknown shape parameter $\theta$ and also has an unknown periodicity $T$, we prove Local Asymptotic Normality (LAN) jointly in $\theta$ and $T$ for the statistical experiment arising from (partial) observation of this diffusion in continuous time. The local scale turns out to be $n^{-1/2}$ for $\theta$ and $n^{-3/2}$ for $T$ which generalizes known results for simpler systems.

1 Introduction of the model and the problem

Let $U \subset \mathbb{R}^{N+L}$ be a σ-compact set and let $f : U \to \mathbb{R}^N$ and $g : U \to \mathbb{R}^L$ be locally Lipschitz continuous functions. Finally, let $S : [0, \infty) \to \mathbb{R}^N$ be a continuous periodic signal and consider the deterministic dynamical system

$$\begin{aligned} dX_t &= f(X_t, Y_t)\,dt + S(t)\,dt,\\ dY_t &= g(X_t, Y_t)\,dt. \end{aligned} \tag{1}$$

This system is divided into two groups of variables: the $N$ components of $X$, whose dynamics depend directly on the signal, and the $L$ components of $Y$, which are affected by the signal only indirectly through the influence of $X$. Intuitively speaking, we can think of (1) as a dynamical system with no intrinsic time-inhomogeneity which then receives an additional time-dependent external input $S$ in some of its variables, while the remaining variables merely describe an interior mechanism. This is why we sometimes refer to $X$ as the adjustable variable(s) and $Y$ as the internal variable(s). Note that the only source of time-inhomogeneity is indeed the signal: if the system receives constant external input $S \equiv c \in \mathbb{R}^N$ (or none at all, i.e. $c = 0$), it is homogeneous in time. Systems of this kind frequently arise in the context of neuroscience and statistical mechanics (see Examples 1.1 and 1.2 below).
We construct a stochastic model by following the idea that the signal is not actually received in its original shape, but is subject to random perturbations by external noise (i.e. noise that is independent of the rest of the system). To take account of this notion, it seems natural to substitute the signal term $S(t)\,dt$ in (1) with the increment $dZ_t$ of a process taking values in a closed set $U' \subset \mathbb{R}^N$ and satisfying an SDE of the type

$$dZ_t = \big[b(Z_t) + S(t)\big]\,dt + \sigma(Z_t)\,dW_t,$$

where $W$ is an $M$-dimensional standard Brownian Motion, while $b : U' \to \mathbb{R}^N$ and $\sigma : U' \to \mathbb{R}^{N \times M}$ are locally Lipschitz continuous drift and volatility functions. Note that this SDE can be viewed as a generalized Ornstein-Uhlenbeck type process with time-dependent mean-reversion level (think of $b(Z_t) = -\beta Z_t$ with $\beta \in (0, \infty)$).
A particularly prominent special case is the classical signal in noise model (take M = N = 1, b ≡ 0, and σ ≡ 1, see for example [22,Example I.7.3, Chapter III.5]), which arises in a wide variety of fields including communication, radiolocation, seismic signal processing, or computer-aided diagnosis and has been the subject of extensive study.
Perturbing $S(t)$ randomly in this way leads to the stochastic dynamical system

$$\begin{aligned} dX_t &= f(X_t, Y_t)\,dt + dZ_t,\\ dY_t &= g(X_t, Y_t)\,dt,\\ dZ_t &= \big[b(Z_t) + S(t)\big]\,dt + \sigma(Z_t)\,dW_t, \end{aligned} \tag{2}$$

with state space $E = U \times U'$. This system can be thought of as degenerate in the following sense: Firstly, the equation for $Y$ does not incorporate the driving Brownian Motion $W$ explicitly, making it rather unclear which effect noise has on these components. Secondly, the dimension $M$ of the driving Brownian Motion can (and will usually) be much lower than the dimension $N + L + N$ of the system. This is why we call a stochastic process satisfying a system of stochastic differential equations of the type (2) a degenerate diffusion with internal variables and randomly perturbed time-inhomogeneous deterministic input.
We now have three groups of variables: The entirely autonomous external input governed by dZ t (the "noisy signal"), the components of X that are directly adjusted by the noisy signal, and the components of the internal variable Y whose dynamics are only indirectly affected by noise, since the respective differential equations incorporate neither Z nor the driving Brownian Motion W explicitly. Note that for this reason Y is conditionally deterministic given X and has continuously differentiable trajectories.
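To make the three groups of variables concrete, the following is a minimal simulation sketch of a toy system with one adjustable, one internal, and one external variable ($N = L = M = 1$), using an Euler-Maruyama scheme. Everything specific in it is an illustrative assumption and not taken from this article: the FitzHugh-Nagumo-type choices of `f` and `g`, the mean-reverting drift `b(z) = -beta * z`, the constant volatility, the sinusoidal signal, and the function name `simulate`. The structure assumed is that $X$ receives the increment of $Z$, $Y$ is driven by $X$ only, and $Z$ carries the signal and the noise.

```python
import math
import random


def simulate(t_end=50.0, dt=0.01, beta=1.0, sigma=0.3, amp=0.5, period=10.0, seed=0):
    """Euler-Maruyama sketch for a toy degenerate diffusion (X, Y, Z).

    All coefficient choices below are hypothetical illustrations.
    Returns a list of tuples (t, x, y, z).
    """
    rng = random.Random(seed)

    def f(x, y):  # drift of the adjustable variable X (FitzHugh-Nagumo type)
        return x - x ** 3 / 3.0 - y

    def g(x, y):  # drift of the internal variable Y (no noise, no Z)
        return 0.08 * (x + 0.7 - 0.8 * y)

    def b(z):  # mean-reverting drift of the external variable Z
        return -beta * z

    def signal(t):  # deterministic periodic signal with periodicity `period`
        return amp * (1.0 + math.sin(2.0 * math.pi * t / period))

    x, y, z = 0.0, 0.0, 0.0
    path = [(0.0, x, y, z)]
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))              # Brownian increment
        dz = (b(z) + signal(t)) * dt + sigma * dw       # noisy signal increment
        dx = f(x, y) * dt + dz                          # X is adjusted by dZ
        dy = g(x, y) * dt                               # Y sees the noise only via X
        x, y, z = x + dx, y + dy, z + dz
        path.append(((k + 1) * dt, x, y, z))
    return path


if __name__ == "__main__":
    final_t, final_x, final_y, final_z = simulate()[-1]
    print(final_t, final_x, final_y, final_z)
```

Only the update of `z` draws a random number; `y` is updated by an ordinary Euler step given the current `x`, which mirrors the statement that $Y$ is conditionally deterministic given $X$.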
The system (2) is a generalization of the one introduced in equation (18) of Section 4.1 of [20], which is a probabilistic version of a class of dynamical systems that are well-known in the mathematical modeling of neurons (see Example 1.1 below). In [12] (which can be viewed as a companion article to the present one), we study the model (2) from a purely probabilistic standpoint and use methods from [21] to discuss sufficient conditions for the process $(X, Y, Z)$ to be positive Harris recurrent. Before we explain the focus of the current article, let us introduce two major examples.

Example 1.1. Let $N = 1$ and $L = 3$, and let $f$ and $g$ be the coefficient functions of the Hodgkin-Huxley equations, defined for all $(x, y) = (x, y_1, y_2, y_3)^\top \in U$. The corresponding dynamical system (1) is known as the Hodgkin-Huxley system and it was first introduced by Hodgkin and Huxley in 1952 (see [10], note however that we use the slightly different model constants from [24]) with the aim of describing the initiation and propagation of action potentials in the cell membrane of a neuron in response to an external stimulus. While $X$ is the membrane potential itself (usually labeled $V$ in the literature), the internal variables $Y_1$, $Y_2$, and $Y_3$ (commonly denoted by $n$, $m$, and $h$) correspond to the ionic mechanism underlying its evolution. The two predominant ion currents in the cell membrane are import of sodium $\mathrm{Na}^+$ and export of potassium $\mathrm{K}^+$ through the membrane. Each of the internal variables signifies the probability that a specific type of gate in the respective ion channel is open at a given time. It is for this reason that $n$, $m$, and $h$ are often called gating variables. In the context of this model, the signal $S$ represents the dendritic input which the neuron receives from a large number of other neurons, transported by an even larger number of synapses located on the respective dendritic tree. The resulting "total dendritic input" can then be thought of as an average of interdependent and repeating similar currents, which is why $S$ is usually assumed to be periodic (or even constant).
When modeling neurons, particular interest lies in the typical spiking behaviour of the membrane potential, a feature that is commonly agreed upon to be adequately described by the Hodgkin-Huxley model. For a more detailed modern introduction, interpretation, and an in-depth comparison with other neuron models, see for example [24] and [6].
Adding noise in the sense of (2) by choosing $\sigma \in C^\infty(U')$ and $b(Z_t) = -\beta Z_t$ with $\beta \in (0, \infty)$, we obtain the so-called stochastic Hodgkin-Huxley model (with mean reverting Ornstein-Uhlenbeck type input). It was first introduced and studied by Höpfner, Löcherbach, and Thieullen in the series of three papers [19], [20], and [21]. The constant $\beta$ is determined by the so-called time constant of the membrane, which represents spontaneous voltage decay not related to the input. For many types of neurons, the time constant is known from experiments (see [7]). A degree of freedom lies in the choice of the volatility $\sigma$, which reflects the nature of the influence of noise. In the past, mean reverting Ornstein-Uhlenbeck type equations with various volatilities have been used to model the membrane potential itself (see for example [28] or [15]), and in a sense our stochastic Hodgkin-Huxley model can be viewed as a refinement of this kind of model. If $\sigma$ is Lipschitz continuous, existence of a unique non-exploding strong solution taking values in $E = \mathbb{R} \times [0, 1]^3 \times U'$ follows from the same arguments as in [19, Proposition 1] and [20, Proposition 2].
Analogously, one can introduce stochastic versions of simpler neuron models such as the FitzHugh-Nagumo model (see [24, equations (4.11) and (4.12)]) or the Morris-Lecar model (see [31] or, for a modern version, [33]).

Example 1.2. Systems of coupled oscillators are particularly intuitive Hamiltonian systems, and several different stochastic models have been the subject of research in the past (see e.g. [9], [2], [32], [3]). The following example is inspired by the model from [4], to which we add a time-inhomogeneity and the corresponding external variables.
Let us think of three rotors, each given by their angle q i (t) ∈ R and momentum p i (t) ∈ R at the time t ∈ [0, ∞) for each i ∈ {1, 2, 3}. Assuming their respective masses to be all equal to 1 and not taking into account units, the laws of classical mechanics imply We suppose that these rotors are coupled in row, i.e.
where $w_1, w_2, w_3 : \mathbb{R} \to \mathbb{R}$ and $u_1, u_2, u_3 : \mathbb{R} \to \mathbb{R}$ are related to interaction potentials and pinning potentials, respectively. A classical model is the one that arises if we let one or both of the outer rotors receive external torques and interact with Langevin type heat baths. In order to give a mathematical description of this, we fix $i \in \{1, 3\}$ for the remainder of this paragraph. Applying an external time-dependent torque $S_i : [0, \infty) \to \mathbb{R}$ to the $i$-th rotor means expanding the equation for $p_i$ to which turns (3) and (4) into a system like (1). On top of that, we want to add interaction with a heat bath, i.e. for a temperature $\tau_i \in (0, \infty)$ and a dissipation constant $\delta_i \in (0, \infty)$, the equation for $p_i$ is further expanded to where the last term in parentheses is the total sum of external influences. Following the spirit of (2), we may replace this term with the increments of a more general random perturbation of the torque: We take

In this article, we want to study a statistical model in which the deterministic signal $S$ depends on a set of parameters. More precisely, we assume that there is an open set $\Theta \subset \mathbb{R}^D$ such that where $T$ is the signal's periodicity and $\vartheta$ is a $D$-dimensional shape parameter. A natural goal is to estimate $\vartheta$ and $T$ simultaneously from continuous observation of the process. However, observing the process $(X, Y, Z)$ entirely may not make sense in many models: The external variable $Z$ can be of a rather abstract nature and, for example, in the Hodgkin-Huxley model from Example 1.1 the only variable that is arguably observable is the membrane potential $X$. In spite of that, Section 3.1 shows:

Result 1. As long as the initial configuration $(X_0, Y_0, Z_0)$ is deterministic and known, it does not matter whether we can observe the entire process $(X, Y, Z)$, only the adjustable variable $X$, or only the external variable $Z$.
This is the content of Remark 3.1 and Proposition 3.2. Since $Z$ is the most convenient process to handle statistically among all of these, our considerations in the sequel are confined to this external variable. Being able to relate statistical problems entirely to $Z$ means that as long as this variable fits our setting, we can treat any example of (2) (including in particular those that were introduced in Examples 1.1 and 1.2). In Section 3.2, we prove an LAN result for the external variable (Theorem 3.7), generalizing [11, Theorem 2.3] in which we only treated the case $M = N = 1$. This can then be combined with the previous results in order to obtain:

Result 2. Under reasonable regularity conditions on the parametrization and under some non-degeneracy and ergodicity of the external variable $Z$, the sequence of statistical experiments corresponding to continuous observation of $(X, Y, Z)$ over growing time intervals $[0, n]$ for $n \to \infty$ has the LAN property. The local scales are identified as $n^{-1/2}$ for the shape and $n^{-3/2}$ for the periodicity.
The rigorous and precise corresponding statement is Theorem 2.3. It allows for application to simultaneous estimation of shape and periodicity, as under LAN we can use Hájek's Convolution Theorem and the Local Asymptotic Minimax Theorem in order to establish optimality for estimators when the rescaled estimation errors are stochastically asymptotically equivalent to the central statistic of the experiment (see [29], [5], [27] or [14] for a detailed presentation of the relevant theory).
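The two different local scales can be made plausible numerically in the classical signal in white noise special case ($M = N = 1$, $b \equiv 0$, $\sigma \equiv 1$). The sketch below assumes, purely for illustration, a signal of the form $\vartheta \sin(2\pi t/T)$; the function name `fisher_diagonal` and all parameter values are hypothetical. In this special case the Fisher information over $[0, n]$ is the integral of the squared parameter derivatives of the signal, so its diagonal entries can be approximated by a Riemann sum.

```python
import math


def fisher_diagonal(n, theta=1.0, period=1.0, steps_per_unit=200):
    """Approximate the diagonal Fisher information over [0, n] for the
    illustrative signal theta * sin(2 * pi * t / period) observed in
    white noise (b = 0, sigma = 1). Returns (info_theta, info_period)."""
    dt = 1.0 / steps_per_unit
    info_theta = 0.0
    info_period = 0.0
    for k in range(int(n * steps_per_unit)):
        t = (k + 0.5) * dt                        # midpoint rule
        phase = 2.0 * math.pi * t / period
        d_theta = math.sin(phase)                 # derivative w.r.t. theta
        # derivative w.r.t. the periodicity; note the factor t
        d_period = -theta * math.cos(phase) * 2.0 * math.pi * t / period ** 2
        info_theta += d_theta ** 2 * dt
        info_period += d_period ** 2 * dt
    return info_theta, info_period


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, fisher_diagonal(n))
```

Doubling the observation horizon multiplies the shape entry by a factor close to 2 and the periodicity entry by a factor close to 8, i.e. the entries grow like $n$ and $n^3$. Normalizing them requires local scales $n^{-1/2}$ and $n^{-3/2}$, in line with Result 2.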

2 Main results and applications
First, let us recall and collect the basic assumptions that were mentioned in the introduction.
(A0) Basic setting: The state space is E = U × U ′ where U ⊂ R N +L is σ-compact and U ′ ⊂ R N is closed. All of the coefficient functions f , g, b, σ are locally Lipschitz continuous and the signal S (ϑ,T ) is continuous, T -periodic with T ∈ (0, ∞) and depends on some parameter ϑ taken from an open set Θ ⊂ R D .
Throughout this article, (A0) will be a tacit standing assumption.
and incorporating the parameters, we rewrite the equation (2) as where We fix some probability space (Ω, F , P) and we consider the following assumptions about the SDE (5): (A1) Unique solvability: For all (ϑ, T ) ∈ Θ × (0, ∞) and all deterministic starting points Φ 0 ∈ E, the SDE (5) has a unique strong solution (A2) Bounded diffusion matrix: The mapping σσ ⊤ : U ′ → R N ×N is uniformly bounded away from 0 and from ∞ in the sense that there are σ 0 , σ ∞ ∈ (0, ∞) such that is positive Harris recurrent.
Remark 2.1. 1.) As we know from Linear Algebra, (A2) also yields that the inverse σσ ⊤ (z) −1 exists for all z ∈ U ′ , is symmetric and positive definite (and hence possesses a square root σσ ⊤ (z) −1/2 ∈ R N ×N ), and we have . Thus, the linear mapping σ(z) : R M → R N is surjective and hence M ≥ N . In this sense, (A2) is a non-degeneracy condition on the external equation for Z.
3.) The recurrence assumption (A3) allows us to make use of certain variants of classical Limit Theorems (see [17], [18]) which we will need for Lemma 3.4 below. Note that (A3) is weaker than the assertion that the entire process Φ (ϑ,T ) is positive Harris-recurrent (compare [12]).
Let (ϑ, T ) ∈ Θ × (0, ∞). We define the probability measure Observing the process continuously then means working with the filtration given by and gives rise to the sequence of statistical experiments defined by As is proved in Section 3, for all (θ,T ) ∈ Θ × (0, ∞) the corresponding log-likelihood ratios are given by where B (ϑ,T ) is a Brownian Motion and π Z = (π (N +L+1) , . . . , π (N +L+N ) ). Examining its structure suggests that in order to find a suitable quadratic expansion for LAN we have to impose appropriate smoothness conditions on the signal with respect to the parameters. The following set of conditions (S1) -(S5) turns out to be sufficient: (S1) Basic regularity: For each ϑ ∈ Θ we have a 1-periodic function and ∂ ϑi S ϑ (·) ∈ L 2 loc [0, ∞); R N for every ϑ ∈ Θ and i ∈ {1, . . . , D}.
Remark 2.2. 1.) If (S1) holds andṠ (ϑ,T ) (s) is continuous (and thus also locally bounded) with respect to ϑ, T , and s, (S2) and (S3) follow by dominated convergence. Note that in general, (S1) does not require that for example ∂ ϑ1 S (ϑ,T ) (s) is continuous (or even locally bounded) in T or s.
3.) As a consequence of the two preceding observations, all of the hypotheses (S1) - ; R N and 1-periodic with respect to s. Existence and boundedness of ∂ s D ϑ S ϑ (s) ensure that we can choose δ = 1 and ζ = 0 above.
4.) Note that the choice of the matrix norm in (S3) and (S4) is of course arbitrary. We decided to go with the Frobenius norm, because it is commonly used and it is convenient to handle in our calculations.
The main result is the following one. For a detailed explanation and proof, as well as an explicit introduction of the Fisher Information, we refer to Section 3.
Proof of Theorem 2.3. The claim follows immediately from Theorem 3.7 and (the proof of) Proposition 3.2. In particular, the assumptions (A2) and (S5) can in fact be replaced by the slightly weaker but more technical conditions (A2') and (S5') which are introduced in Section 3 below and are discussed in Remark 3.5.
Note that other than the basic existence and uniqueness assumption (A1), the conditions for Theorem 2.3 incorporate only the external variable and the deterministic signal. Before we proceed to the proof section, we would like to collect some comments on relevant examples in which these conditions are fulfilled.

Example 2.4.
A simple yet important example for the external variable is the multidimensional Ornstein-Uhlenbeck process with time-dependent mean reversion level S (ϑ,T ) . This process corresponds to (13) with b(z) = −βz for all z ∈ U ′ = R N with some positive definite β ∈ R N ×N and a constant volatility σ ∈ R N ×M such that σσ ⊤ ∈ R N ×N is positive definite. Assumption (A2) is then trivially fulfilled, and in complete analogy to the case M = N = 1 (see [17,Example 2.3]), one can calculate explicitly its transition densities, and then use Theorem 3.2 and Theorem 4.6 (with f ≡ 1 and V (z) = |z| 2 ) from [30] in order to check (A3).
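In dimension one, the recurrence of the grid chain in this example can be made explicit. The sketch below assumes that the external equation reads $dZ_t = [S_\vartheta(t/T) - \beta Z_t]\,dt + \sigma\,dW_t$ with scalar $\beta > 0$ and constant $\sigma$ (an assumption about the form of the equation, which is not displayed here). Under this assumption, solving the linear SDE over one period shows that $(Z_{kT})_k$ is a Gaussian autoregression $Z_{(k+1)T} = e^{-\beta T} Z_{kT} + c + \xi_k$ with $c = \int_0^T e^{-\beta(T-s)} S_\vartheta(s/T)\,ds$ and independent centred Gaussian $\xi_k$ of variance $\sigma^2(1 - e^{-2\beta T})/(2\beta)$. Its stationary law is therefore Gaussian with mean $c/(1 - e^{-\beta T})$ and variance $\sigma^2/(2\beta)$. The function name `grid_chain_invariant_law` is hypothetical.

```python
import math


def grid_chain_invariant_law(shape, period, beta, sigma, n_quad=20000):
    """Mean and variance of the stationary Gaussian law of the grid chain
    (Z_{kT}) for dZ = [shape(t / period) - beta * Z] dt + sigma dW.

    `shape` is a 1-periodic function; the integral defining c is
    approximated by the midpoint rule with `n_quad` nodes."""
    h = period / n_quad
    c = 0.0
    for j in range(n_quad):
        s = (j + 0.5) * h
        c += math.exp(-beta * (period - s)) * shape(s / period) * h
    mean = c / (1.0 - math.exp(-beta * period))   # fixed point of the recursion
    variance = sigma ** 2 / (2.0 * beta)          # does not depend on the signal
    return mean, variance


if __name__ == "__main__":
    print(grid_chain_invariant_law(lambda u: math.sin(2.0 * math.pi * u),
                                   period=2.0, beta=1.0, sigma=0.5))
```

For a constant signal $S \equiv s_0$ the mean reduces to $s_0/\beta$, the usual Ornstein-Uhlenbeck mean-reversion level, which gives a quick sanity check of the formula.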
is continuously differentiable with respect to $\vartheta \in \Theta$ and twice continuously differentiable with respect to $\xi \in \mathbb{R}^K$. Clearly, the property (S1) holds, and since $\dot S^{(\vartheta,T)}(s)$ is given by which is continuous with respect to $\vartheta$, $T$, and $s$, we also have (S2) and (S3). Moreover, we see that the Hölder property from part 2.) of Remark 2.2 is fulfilled if it is fulfilled by the mapping In that case, all of the hypotheses (S1) - (S4) hold.
2.) If the signal has a product structure $S_\vartheta(s) = G(\vartheta)\varphi(s)$ with $\varphi \in C^2([0, \infty); \mathbb{R}^K)$ 1-periodic and $G \in C^1(\Theta; \mathbb{R}^{N \times K})$, we can treat it as a special case of the preceding example. As for all $s, \tilde s \in [0, \infty)$ we have no further conditions are needed to ensure the Hölder property from part 2.) of Remark 2.2 to hold with $\delta = 1$ and $\zeta = 0$.
3.) In particular, the example above ensures that (S1) - (S4) are fulfilled for signals of the form with $K \in \mathbb{N}$ and $G_k, H_k \in C^1(\Theta; \mathbb{R}^N)$ for all $k \in \{1, \ldots, K\}$.
4.) Taking $K = D$, $N = 1$ and $G_k(\vartheta) = \vartheta_k$, $H_k(\vartheta) = 0$ for all $\vartheta \in \Theta$ and $k \in \{1, \ldots, K\}$, the signal from (9) clearly also satisfies (S5), as long as $0 \notin \Theta$.

3 Proofs and supplementary results
3.1 Observing (X, Y, Z), X, or Z

We start this section with a fundamental observation: If the starting point is known, observing only the adjustable variable $X$ is actually no restriction, since we can successively reconstruct the remaining variables $Y$ and $Z$. Let us explain this step by step in the following remark.
Remark 3.1. Assume that the starting point $(X_0, Y_0, Z_0) \in E$ is known. Fix a finite time horizon $t_0 \in (0, \infty)$ and assume that the trajectory $(X_t)_{t \in [0, t_0]}$ has been observed and is thus also known. Then the function $(t, y) \mapsto g(X_t, y)$ is completely known, and given the structure of the internal equation in (2), the trajectory $(Y_t)_{t \in [0, t_0]}$ is now given as the solution to the ordinary differential equation

$$dY_t = g(X_t, Y_t)\,dt, \qquad t \in [0, t_0],$$

with the known initial value $Y_0$. Thus, we know $(X_t, Y_t)_{t \in [0, t_0]}$, and by rearranging the first line of (2), this information allows us to calculate

$$Z_t = Z_0 + X_t - X_0 - \int_0^t f(X_s, Y_s)\,ds, \qquad t \in [0, t_0].$$

All in all, we have reconstructed every component of $(X_t, Y_t, Z_t)_{t \in [0, t_0]}$ just from $(X_t)_{t \in [0, t_0]}$ and the starting point $(X_0, Y_0, Z_0)$.
Remark 3.1 is the legitimation for us to work with the idealized assumption that we can in fact observe the entire process (X, Y, Z) even in situations where realistically one could only observe the adjustable variable X. Next, we will describe the corresponding statistical experiment.
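The reconstruction described in Remark 3.1 can be mimicked on a discretized path. The following sketch assumes that the first equation of the system has the form $dX_t = f(X_t, Y_t)\,dt + dZ_t$ and that the internal variable solves $dY_t = g(X_t, Y_t)\,dt$; the toy coefficients, the sinusoidal input, and the function names `simulate` and `reconstruct` are hypothetical illustrations. The path is generated with an Euler scheme, and then $Y$ and $Z$ are rebuilt from the $X$-path and the starting point alone.

```python
import math
import random


def f(x, y):
    """Illustrative drift of the adjustable variable X."""
    return x - x ** 3 / 3.0 - y


def g(x, y):
    """Illustrative drift of the internal variable Y."""
    return 0.08 * (x + 0.7 - 0.8 * y)


def simulate(n_steps, dt, seed=1):
    """Euler scheme for a toy system; returns the paths of X, Y and Z."""
    rng = random.Random(seed)
    x, y, z = 0.1, 0.2, 0.3
    xs, ys, zs = [x], [y], [z]
    for k in range(n_steps):
        t = k * dt
        dz = (-z + math.sin(2.0 * math.pi * t / 5.0)) * dt \
            + 0.4 * rng.gauss(0.0, math.sqrt(dt))
        x, y, z = x + f(x, y) * dt + dz, y + g(x, y) * dt, z + dz
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def reconstruct(xs, y0, z0, dt):
    """Rebuild Y and Z from the observed X-path and the known start."""
    ys, zs = [y0], [z0]
    for k in range(len(xs) - 1):
        y, z = ys[-1], zs[-1]
        # rearranged first equation: dZ = dX - f(X, Y) dt
        dz = xs[k + 1] - xs[k] - f(xs[k], y) * dt
        # internal equation: an ODE driven by the known X-path
        ys.append(y + g(xs[k], y) * dt)
        zs.append(z + dz)
    return ys, zs


if __name__ == "__main__":
    step = 0.01
    xs, ys, zs = simulate(2000, step)
    ys_rec, zs_rec = reconstruct(xs, ys[0], zs[0], step)
    print(max(abs(a - b) for a, b in zip(zs, zs_rec)))
```

Because the reconstruction uses the same discretization as the simulation, the rebuilt paths agree with the simulated ones up to floating-point rounding. With genuinely continuous-time observations the corresponding statement is the exact identity of Remark 3.1.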
In order to make Proposition 3.2 more apprehensible, we will do this very carefully and with much attention to measure-theoretic subtleties. A look at (5) reveals that the drift coefficient depends on the parameter (ϑ, T ) ∈ Θ × (0, ∞), while the volatility does not. Hence, we can use [14, Theorem 6.10] 2 in order to determine the log-likelihood ratios. Let (t, x, y, z) ∈ [0, ∞) × E. Comparing the drift coefficients of (5) with different parameters (θ,T ), (ϑ, T ) ∈ Θ × (0, ∞), we see that Thanks to (A2) and (6), Setting π Z := π (N +L+1) , . . . , π (N +L+N ) ⊤ and writing m Z,(ϑ,T ) for its local martingale part under P (ϑ,T ) , the expression for the log-likelihood ratio can be rewritten as In order to eliminate the rather unintuitive integral with respect to m Z,(ϑ,T ) , we introduce the local s for all t ∈ [0, ∞).
2 Note that we do not assume (as in this theorem) that $B$ and $\Sigma$ are defined on the entire Euclidean space and are globally Lipschitz continuous. By our assumptions, $E = U \times U'$ is σ-compact and hence we can find a sequence $(K_n)_{n \in \mathbb{N}}$ of compact sets increasing to $E$. Using Kirszbraun's Theorem ([26, Hauptsatz I]), the restriction to each $K_n$ of $B(t, \cdot)$ and $\Sigma$ can be extended to globally Lipschitz continuous functions on $\mathbb{R}^{N+L+N}$ (which also satisfy a linear growth condition). Hence, the proof of [14, Theorem 6.10] needs only a slight adjustment to work in our case: Using the notation from there, the stopping time $\varrho_n$ has to be replaced by $\varrho_n \wedge \inf\{t > 0 \mid \eta_t \notin K_n\}$, and in equation $(\mathrm{II}^{(n)})$ and thereafter the coefficients $b$, $\sigma$ and $c$ have to be altered in analogy to $\gamma$. The rest of the proof then needs no further changes.

Its quadratic variation process is
for all t ∈ [0, ∞), so Lévy's Characterization Theorem [23, Theorem II.6.1] yields that B (ϑ,T ) is an N -dimensional P (ϑ,T ) , (F t ) t∈[0,∞) -Brownian Motion. Incorporating this process, we can write We note immediately that the only component of π that is featured explicitly in this expression is the π Zcomponent. It seems plausible that we should get the same expression for the log-likelihood ratio in an experiment that does not even know that any variables other than Z exist. Let us make this formally rigorous. Let η = (η t ) t∈[0,∞) be the canonical process on C [0, ∞); U ′ , and write and consider the sequence of experiments given by Using the same arguments as above and writingm Z,(ϑ,T ) for the local martingale part of η under Q (ϑ,T ) , we can again use [14, Theorem 6.10] and conclude where the processB (ϑ,T ) := B (ϑ,T ) t t∈[0,∞) given by We now have calculated the log-likelihood ratios for both E (X,Y,Z) and E Z . Comparing them leads to the following result. Proof. Due to the definition of P (ϑ,T ) and Q (ϑ,T ) , we have L η Q (ϑ,T ) = L π Z P (ϑ,T ) , and in view of (11), (12), (15), and (16), this implies (17) from which the second statement of this Proposition follows immediately.
In view of Theorem 2.3, Proposition 3.2 is the justification for us to restrict ourselves to studying the simpler process Z instead of the more complex (X, Y, Z) in the following section.
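That the log-likelihood ratio can be evaluated from the $Z$-path alone is easy to illustrate in a discretized one-dimensional setting. The sketch below assumes an external equation of the form $dZ_t = [S^{(\vartheta,T)}(t) - \beta Z_t]\,dt + \sigma\,dW_t$ with the illustrative signal $\vartheta \sin(2\pi t/T)$ and uses the standard Girsanov-type expression: the difference of the signals, weighted by $\sigma^{-2}$, is integrated against the martingale part of $Z$ under the reference parameter, and half of its weighted squared norm is subtracted. The names `signal`, `simulate_z`, and `log_likelihood_ratio` and all numerical values are hypothetical.

```python
import math
import random


def signal(t, theta, period):
    """Illustrative signal theta * sin(2 * pi * t / period)."""
    return theta * math.sin(2.0 * math.pi * t / period)


def simulate_z(theta, period, t_end, dt, beta=1.0, sigma=1.0, seed=0):
    """Euler-Maruyama path of dZ = [signal - beta * Z] dt + sigma dW."""
    rng = random.Random(seed)
    z = 0.0
    zs = [z]
    for k in range(int(round(t_end / dt))):
        drift = signal(k * dt, theta, period) - beta * z
        z += drift * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        zs.append(z)
    return zs


def log_likelihood_ratio(zs, dt, alt, ref, beta=1.0, sigma=1.0):
    """Discretized log-likelihood ratio of parameter `alt` against `ref`,
    both given as (theta, period), computed from the Z-path only."""
    total = 0.0
    for k in range(len(zs) - 1):
        t = k * dt
        s_ref = signal(t, *ref)
        diff = signal(t, *alt) - s_ref
        # increment of the martingale part of Z under the reference parameter
        martingale_inc = zs[k + 1] - zs[k] - (s_ref - beta * zs[k]) * dt
        total += diff / sigma ** 2 * martingale_inc \
            - 0.5 * diff ** 2 / sigma ** 2 * dt
    return total


if __name__ == "__main__":
    step = 0.01
    zs = simulate_z(theta=1.0, period=2.0, t_end=400.0, dt=step)
    print(log_likelihood_ratio(zs, step, alt=(2.0, 2.0), ref=(1.0, 2.0)))
```

Neither $X$ nor $Y$ enters the computation. When the path is generated under the reference parameter, the ratio against a clearly different shape is typically strongly negative for long observation windows, since its mean is minus one half of the accumulated squared signal difference.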

3.2 Local Asymptotic Normality for Z
This section centres around the sequence of statistical experiments defined by E Z in (14), which corresponds to continuous observation over growing time intervals of the $N$-dimensional diffusion $Z$ following the parameter-dependent SDE (13). As mentioned in Section 1, taking $M = N = 1$, $b \equiv 0$, and $\sigma \equiv 1$ leads to the classical "signal in white noise" model. For this special case, Ibragimov and Khasminskii proved LAN with rate $n^{-3/2}$ for a smooth signal with known $\vartheta$ and unknown $T$, and discussed asymptotic efficiency for certain estimators (see [22, Sections II.7 and III.5]). In [8], Golubev extended their approach with $L^2$-methods in order to estimate $T$ at the same rate for unknown shape, which in turn was the basis for the non-parametric estimation of the shape under unknown $T$ by Castillo, Lévy-Leduc and Matias (see [1]). For our more general diffusion (13), we will stay within the confines of parametric estimation. The main result of this section is LAN for the sequence of experiments E Z with unknown $\vartheta$ and unknown $T$ (Theorem 3.7). For $M = N = 1$, Höpfner and Kutoyants had already solved this problem both for known $T$ with unknown $\vartheta$ (see [16]) and for known $\vartheta$ with unknown $T$ (see [18]). A result on LAN jointly in $\vartheta$ and $T$ was presented in [11], but still only in dimension one. Theorem 3.7 extends all of these results and allows for application to simultaneous estimation of the shape and the periodicity in any dimension.
In the context of this subsection, we replace the assumption (A1) with the following weaker analogue. We also work with the following slight relaxation of (A2).
Note that so far, the only use of (A2) occurred in (10), and there (A2') would also suffice. Let us also give an equivalent reformulation of (A3) which incorporates the notation we introduced in the previous section.
Periodicity of the signal is the reason why (A3) even makes sense at all: Since S (ϑ,T ) and therefore the entire drift term of (13) is T -periodic, the grid chain is a U ′ -valued time-homogeneous discrete-time Markov process. Another important process that is embedded in η in a similar way is the C([0, T ]; U ′ )-valued time-homogeneous path segment chain η ps := (η ps k ) k∈N0 defined by taking an arbitrary η ps 0 ∈ C [0, T ]; U ′ with η ps 0 (T ) = Z 0 and then setting η ps As we know from [17, Theorem 2.1 (a)] 3 , the path segment chain η ps inherits positive Harris recurrence under Q (ϑ,T ) from the grid chain and its invariant distribution m (ϑ,T ) is the unique measure on B C([0, T ]; U ′ ) such that for all l ∈ N, 0 = t 0 < t 1 < . . . < t l = T , and B 0 , . . . , B l ∈ B U ′ we have where Q (ϑ,T ) s,t t>s≥0 is the transition semi-group of η under Q (ϑ,T ) . We will make use of the following strong law of large numbers for the path segment chain which we cite from [17, Theorem 2.1 (b)]. Proposition 3.3. Let (A1') and (A3) hold and fix some (ϑ, T ) ∈ Θ × (0, ∞). Assume that (A t ) t∈[0,∞) is a Q (ϑ,T ) , (G t ) t∈[0,∞) -increasing process. If there is a non-negative function F ∈ L 1 m (ϑ,T ) such that Proof. See Section 2 of [17]. where is understood as a matrix-valued integral. Then the following statements are true.
is a non-negative definite and symmetric bilinear form.
Proof. For the sake of simplicity and as (ϑ, T ) is fixed anyway, we drop all corresponding superscripts. First, we check that B G is indeed a well-defined mapping with values in R. Let the lower bound for the eigenvalues of G(·) be denoted by G 0 ∈ (0, ∞). Recall that G −1 (·) always exists, is positive definite, and G −1 0 is an upper bound for its eigenvalues. Then by linearity and contractivity of the operator µQ 0,sT , we can estimate Thanks to the symmetry of G −1 , we can polarize the integrand and thus the whole expression, which allows us to use the above in order to conclude that and hence B G is well-defined. It is then trivial to see that it is a non-negative definite and symmetric bilinear form, and the proof for (i) is complete. We note that the left hand side of (20) is bilinear in u and v as well. Thanks to this and (i), the proof of the second statement of the Lemma can be reduced to the case u = v, since the general case then follows by polarization.
Let Since G −1 (·) is positive definite, the integrand is non-negative, and therefore A is an increasing process whose trajectories are obviously continuous. Note that the expression on the left hand side of (20) can be rewritten as For k = 0 this is simply 1 t A t , which we will handle with the help of Proposition 3.3. The general statement then follows from this special case by elementary calculus (compare Lemma 3.17 of [13]).
In order to establish the functional relation between A and η that is needed in Proposition 3.3, we define the function , and thus it is integrable with respect to the probability measure m. Since u is 1-periodic, we see that for all k ∈ N, and consequently Proposition 3.3 allows to deduce Q-almost sure convergence where the use of Fubini's Theorem in the second step is justified by the non-negativity of the integrand, and the third step makes use of (18). This completes the proof.
(S5') Regularity of the signal with respect to B (ϑ,T ) While part (ii) of (S5') is merely needed for technical reasons (as will become clear in the proof of Theorem 3.7 below), part (i) is of more general importance, since I (ϑ,T ) (1) will turn out to be the Fisher Information. We will discuss these conditions in detail in the following remark.
4.) If (A2) holds, for all u ∈ L 2 [0, 1]; R N we can use (6) and estimate σσ ⊤ is positive definite (in fact even coercive). Thus, (A2) and (S5) together imply (S5'). 5.) A very simple and seemingly natural sufficient condition for (S5') is orthogonality of the functions σσ ⊤ (without assuming this bilinear form to be positive definite). This is equivalent to both I (ϑ,T ) (t) and I ′ (ϑ,T ) (t) being diagonal matrices with non-vanishing diagonal entries and as such they are invertible. However, this is not a very likely scenario, since S ϑ has D degrees of freedom, determines the D functions ∂ ϑ1 S ϑ , . . . , ∂ ϑD S ϑ , and then S ′ ϑ -while adding no further degree of freedom -would have to be orthogonal to these as well.

2.) For
σσ ⊤ is just the standard L 2 -inner product with respect to Lebesgue's measure. If N = 1, D = 2d with d ∈ N, and the signal has a finite Fourier expansion it is both of the type from the first part of this example and of the type introduced in part 3.) of Example 2.5 (so in particular it satisfies (S1) -(S4)). Elementary calculations show that the conditions (22) and (23) and fix any bounded sequence (h n ) n∈N ⊂ R D+1 . Then Q (ϑ,T ) -almost surely we have with Fisher Information and score holds.
We now proceed to give the proof, divided into several steps. 1.) The main idea is to introduce a time step size t ∈ (0, ∞) into the log-likelihood ratio and then interpret log dQ (ϑ,T )+δnhn | Gtn dQ (ϑ,T ) | Gtn , n ∈ N, as a sequence of continuous-time stochastic processes. Splitting them into several parts and applying Lemma 3.4 together with tools from continuous-time martingale theory will eventually lead to the desired quadratic expansion. Indeed, adding and subtracting the termṠ (ϑ,T ) (s)δ n h n to the difference of the signals yields .) It remains to show convergence to zero in Q-probability of the remainder terms R n (t), U n (t), and V n (t) introduced at the very beginning of this proof. Therefore, we consider the sequence (R n ) n∈N of the local Q-martingales Their quadratic variation processes are obviously given by (U n (t)) t∈[0,∞) . Exploiting the uniform ellipticity assumption (A2'), we can estimate the quadratic variation by Note that this upper bound is entirely deterministic. In order to prove that it in fact converges to zero, we will separate the dependence on the parameters ϑ and T in such a way that we can use the periodicity and (S1) -(S4) efficiently. This can be achieved by continuing the inequality (27)  =: 3σ −1 0 (A n + B n + C n ). We will treat convergence of A n , B n , and C n step for step. For this purpose, set H := sup n∈N |h n | and note that due to (25) we have |ϑ n − ϑ| ≤ Hn −1/2 and |T n − T | ≤ Hn −3/2 for all n ∈ N.
Starting with A n , we observe that for sufficiently large n ∈ N we have T n ∈ [T /2, 2T ] and thus where the factor in front of the integral is obviously convergent. Using the L 2 -continuity condition (S3) and a simple application of the mean value theorem (compare Lemma 3.18 of [13]), one sees that the integral itself tends to zero. Next, using the Hölder condition (S4), we obtain for sufficiently large n ∈ N that B n ≤ |ϑ n − ϑ| The particular conditions on α and β from (S4) make the second summand vanish for n → ∞, while the first summand converges to zero because of (S3).
In order to estimate $C_n$, we make explicit use of the $C^2$-property (S1), which is readily translated into the condition that the mapping $(0, \infty) \ni T \mapsto S^{(\vartheta,T)}(s)$ is twice continuously differentiable for any fixed $s \in (0, \infty)$. Consequently, for every $s \in (0, \infty)$ and any $i \in \{1, \ldots, N\}$, Taylor expansion with the Lagrange form of the remainder provides a $\varrho_i = \varrho_i(s, \vartheta, T, T_n, h_n)$ between $T$ and $T_n$ such that for sufficiently large $n \in \mathbb{N}$ we can infer that and hence $C_n$ vanishes for $n \to \infty$. So far, we have shown that the sequence of random variables $(U_n(t))_{n \in \mathbb{N}}$ not only vanishes in probability under $Q$ for $n \to \infty$, but is even bounded by a deterministic sequence which goes to zero. Therefore, and in particular, $R_n(t)$ also vanishes in probability under $Q$ for $n \to \infty$. Finally, the same is true for the last remainder variable $V_n(t)$, as by the Cauchy-Schwarz inequality we get that

$$|V_n(t)|^2 \le U_n(t)\, h_n^\top I_n(t)\, h_n \le U_n(t)\, H^2\, |I_n(t)| \xrightarrow{\,n \to \infty\,} 0, \tag{29}$$

since $I_n(t)$ converges and $U_n(t)$ goes to zero. Taking $t = 1$ completes the proof.
Remark 3.8. The convergence in probability for n → ∞ of the remainder terms R n (t), U n (t), and V n (t) (which determine the term o Q (ϑ,T ) (1) in (24)) is in fact even uniform with respect to t ∈ [0, t 0 ] for every t 0 ∈ (0, ∞). For U n (t) this is clear, since it only increases with t. Using the Burkholder-Davis-Gundy inequality, the estimation (28) can be improved to which also takes care of R n (t). For V n (t) we notice that the bound given in (29) only depends on t via I n (t) and U n (t) which are both non-decreasing with respect to t.
Remark 3.9. In the one-dimensional case M = N = 1, variants of Theorem 3.7 are already known in the literature, where shape and periodicity are treated separately and one of them is assumed to be known. A detailed contextualization is provided in Remark 2.6 and Examples 2.7 and 2.8 of [11].