Prediction of dynamical systems from time-delayed measurements with self-intersections

In the context of predicting behaviour of chaotic systems, Schroer, Sauer, Ott and Yorke conjectured in 1998 that if a dynamical system defined by a smooth diffeomorphism $T$ of a Riemannian manifold $X$ admits an attractor with a natural measure $\mu$ of information dimension smaller than $k$, then $k$ time-delayed measurements of a one-dimensional observable $h$ are generically sufficient for $\mu$-almost sure prediction of future measurements of $h$. In a previous paper we established this conjecture in the setup of injective Lipschitz transformations $T$ of a compact set $X$ in Euclidean space with an ergodic $T$-invariant Borel probability measure $\mu$. In this paper we prove the conjecture for all Lipschitz systems (also non-invertible) on compact sets with an arbitrary Borel probability measure, and establish an upper bound for the decay rate of the measure of the set of points where the prediction is subpar. This partially confirms a second conjecture by Schroer, Sauer, Ott and Yorke related to empirical prediction algorithms as well as algorithms estimating the dimension and number of required delayed measurements (the so-called embedding dimension) of an observed system. We also prove general time-delay prediction theorems for locally Lipschitz or H\"older systems on Borel sets in Euclidean space.

1. Introduction

1.1. Time-delayed measurements and Takens-type delay embedding theorem. Suppose we are given a dynamical system T : X → X on some phase space X with a transformation (evolution rule) T. For an experimentalist, direct knowledge of the system (X, T) may be lacking or non-existent. As a consequence, information on its behaviour is often provided by a finite sequence of time-delayed measurements

(1) h(x_i), h(T x_i), . . ., h(T^m x_i),   x_1, . . ., x_r ∈ X,   r, m ∈ N,

of a function (observable) h : X → R. In general, the information contained in (1) might not be sufficient for reconstructing the system (X, T). However, it is natural to ask how well one may approximately reconstruct (X, T) (or determine its important attributes, such as its dimension) from (1), at least for some observables h. This problem has been widely studied from both theoretical and applied points of view, in natural, social and medical sciences. For a more detailed introduction to time-delayed embeddings, see e.g. [ASY97, Chapter 13] or [BT11, Chapter 6].
For x ∈ X and k ∈ N, consider the time series

(2) h(x), h(T x), h(T² x), . . .

of the observable h along the orbit of x, and the corresponding k-delay coordinate map

(3) φ(x) = (h(x), h(T x), . . ., h(T^{k−1} x)) ∈ R^k.
1.2. Prediction algorithms and the predictability conjectures of Schroer, Sauer, Ott and Yorke. Consider time-delayed measurements from a different perspective. Instead of trying to reconstruct the original dynamics (X, T) from the time-delayed measurements of an observable h (determining whether the map φ is injective), consider the problem of predicting from the first k terms of the time series h(x), h(T x), . . ., h(T^{k−1} x) its future values h(T^k x), h(T^{k+1} x), . . .. Notice that such prediction takes place not in the original phase space X, but in φ(X) ⊂ R^k, considered as a model space for the system. One of the basic questions in this context is to determine for which values of k such prediction is possible.
It should be observed that the prediction problem can be considered for both invertible and non-invertible transformations T , while the possibility of dynamical reconstruction (embedding) by delay coordinates maps is generally restricted to invertible systems.It is important to note that from a theoretical point of view, reconstruction implies prediction but not vice versa.
Thus, the problem of predicting future values of a dynamically generated time series may possibly be more accessible than the problem of reconstructing the entire system. Moreover, the problem of prediction is relevant from the point of view of applications (see e.g. [JL94, HGLS05, WCL09, SKY+18]). In the context of applications, creating reliable prediction algorithms is a matter of major importance. Let us present one of these algorithms, proposed by Farmer and Sidorowich in [FS87]. To describe it, consider a sequence of measurements h(x), . . ., h(T^{n+k−1} x) of an observable h for a point x ∈ X and some k, n ∈ N. This defines a sequence z_0, . . ., z_n ∈ φ(X) of k-delay coordinate vectors, where

(4) z_i = z_i(x) = φ(T^i x) = (h(T^i x), . . ., h(T^{i+k−1} x)),   i = 0, . . ., n.
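For illustration, the passage from the measured series to the k-delay vectors z_0, . . ., z_n in (4) can be sketched in a few lines of Python (the helper name is ours, not from the paper):

```python
# Build the k-delay coordinate vectors z_0, ..., z_n of (4) from the scalar
# measurements h(x), h(Tx), ..., h(T^{n+k-1} x), given as a plain list.
def delay_vectors(series, k):
    return [tuple(series[i:i + k]) for i in range(len(series) - k + 1)]

# e.g. four measurements with k = 2 yield three delay vectors
print(delay_vectors([1, 2, 3, 4], 2))  # [(1, 2), (2, 3), (3, 4)]
```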
Given such a sequence, for y ∈ R^k, ε > 0 and n ∈ N, define

(5) Pred_{x,ε,n}(y) = (1/#I_n) Σ_{i∈I_n} z_{i+1},   where I_n = I_n(x, y, ε) = {0 ≤ i < n : z_i ∈ B(y, ε)},

assuming I_n ≠ ∅, where B(y, ε) denotes the open ball of centre y and radius ε. Now, knowing the values of z_0, . . ., z_n, we predict the value of the next point z_{n+1} = (h(T^{n+1} x), . . ., h(T^{n+k} x)) by Pred_{x,ε,n}(z_n). In other words, the predicted value of z_{n+1} is taken to be the average of the values z_{i+1}, i = 0, . . ., n − 1, where we count only those i for which z_i are ε-close to the last known point z_n. In this way, we predict the one-step future of the dynamics in the model space φ(X) ⊂ R^k. The Farmer-Sidorowich algorithm, as well as its variants, like the simplex algorithm [SM90], are important tools for non-parametric prediction (see e.g. [SPSZ20, SFG+22]).
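A minimal Python sketch of the predictor (5) (names hypothetical; it returns None when the index set I_n is empty):

```python
import math

def predict(zs, y, eps):
    # zs = [z_0, ..., z_n]: the known k-delay vectors; y: the query point.
    # Average the successors z_{i+1} of those z_i (i < n) lying in B(y, eps).
    n = len(zs) - 1
    idx = [i for i in range(n) if math.dist(zs[i], y) < eps]
    if not idx:
        return None
    k = len(zs[0])
    return tuple(sum(zs[i + 1][j] for i in idx) / len(idx) for j in range(k))

# On an alternating series with k = 1, the predicted successor of (0.0,) is (1.0,)
zs = [(0.0,), (1.0,), (0.0,), (1.0,), (0.0,)]
print(predict(zs, zs[-1], 0.5))  # (1.0,)
```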
When studying the Farmer-Sidorowich and other prediction algorithms, a useful and realistic approach is to consider a probabilistic setting, where there is an (explicit or implicit) random process, given by a probability measure µ on X, determining which initial states are accessible to the experimentalist. In this context, Schroer, Sauer, Ott and Yorke [SSOY98] introduced in 1998 the following notion of probabilistic predictability for systems defined by smooth diffeomorphisms on Riemannian manifolds. We present it here in a general setup of transformations of Borel sets in Euclidean space.

Definition 1.1. Let X ⊂ R^N, N ∈ N, be a Borel set admitting a Borel probability measure µ. Let T : X → X be a Borel transformation, h : X → R a Borel observable and k ∈ N.
For y ∈ R^k and ε > 0 with φ_*µ(B(y, ε)) > 0, where φ is the k-delay coordinate map defined in (3), set

χ_ε(y) = (1 / φ_*µ(B(y, ε))) ∫_{φ^{−1}(B(y,ε))} φ(T x) dµ(x),

σ_ε(y) = ( (1 / φ_*µ(B(y, ε))) ∫_{φ^{−1}(B(y,ε))} ‖φ(T x) − χ_ε(y)‖² dµ(x) )^{1/2}.

Definition 1.2. In the above setting, we say that h is almost surely k-predictable with respect to µ if lim_{ε→0} σ_ε(y) = 0 for φ_*µ-almost every y ∈ R^k.
The relation of χ_ε and σ_ε to the Farmer-Sidorowich algorithm for ergodic systems is described in the following proposition.
The proof of Proposition 1.3 is given in Section 6.
We can now state the first conjecture of Schroer, Sauer, Ott and Yorke [SSOY98] (the SSOY predictability conjecture) in its original form. It is stated for a special class of natural measures, which are 'physically observed' T-invariant probability measures on Riemannian manifolds; see Definition 2.10 for details. The symbol ID denotes the information dimension of a measure (see Definition 2.7).
SSOY predictability conjecture ([SSOY98, Conjecture 1]). Let T : X → X be a smooth diffeomorphism of a compact Riemannian manifold X with a natural measure µ of information dimension ID(µ) = D. Then a generic observable h : X → R is almost surely k-predictable with respect to µ for k > D.
Note that in this formulation some details, including the type of genericity and the smoothness class of the dynamics and observable, are not specified precisely. Arguably, the most important feature of the conjecture is that the bound for the minimal number of measurements is reduced (at least) by half compared to various versions of Takens-type delay embedding theorems, i.e. from k > 2 dim X to k > dim µ. The possibility of such a reduction is the main difference between the deterministic and probabilistic settings (as in the latter case one can neglect sets of measure zero). In fact, the SSOY predictability conjecture has been invoked in the literature as a theoretical argument for reducing the number of time-delay measurements of observables (see [OL98, MS04, Liu10]), also in direct applications (e.g. in [QMAV99], studying neural brain activity in focal epilepsy).
In our previous paper [BGS22, Corollary 1.10] we proved the SSOY predictability conjecture for arbitrary ergodic T-invariant Borel probability measures µ. On the other hand, we constructed an example of a C^∞-smooth diffeomorphism with a non-ergodic natural measure for which the conjecture does not hold (see [BGS22, Theorem 1.11]). However, after replacing the information dimension ID(µ) by the Hausdorff dimension dim_H µ (see Definition 2.7), we verified the conjecture for C^r-generic (r ≥ 1) diffeomorphisms T (see [BGS22, Corollary 1.9]). In fact, in [BGS22, Corollaries 1.9-1.10] we also showed that the suitable k-delay coordinate map φ is injective on a set of full µ-measure for a generic observable h, inducing an almost sure embedding of the system into R^k. To this aim, we established a predictable embedding theorem [BGS22, Theorems 1.7 and 3.1] for injective Lipschitz maps T on compact sets in Euclidean space and arbitrary Borel probability measures, assuming a suitable condition on the size of the sets of T-periodic points of low period, similar to the one in the Takens-type delay embedding theorem by Sauer, Yorke and Casdagli, quoted in Subsection 1.1.
In this paper we prove a general version of the SSOY predictability conjecture, valid for an arbitrary dynamical system defined by a Lipschitz transformation of a compact set in Euclidean space with a Borel probability measure, and for Lipschitz observables. In particular, we assume neither the injectivity of the dynamics nor any bound on the size of the sets of periodic points. Similarly to [SYC91, Rob11, BGS20, BGS22], in this and subsequent results we consider the genericity of observables in terms of prevalence in the space of Lipschitz or locally Lipschitz maps h : X → R (see Definition 2.3 and the discussion afterwards). More precisely, we prove the following.
Theorem 1.4 (General SSOY predictability conjecture). Let X ⊂ R^N, N ∈ N, be a compact set, µ a Borel probability measure on X and T : X → X a Lipschitz map. Then a prevalent Lipschitz observable h : X → R is almost surely k-predictable with respect to µ for every k > dim_H µ.

The proof of Theorem 1.4 is given in Section 7. In fact, we show an extended version of the result (Theorem 7.1), valid for Hölder observables.
Remark 1.5. Since for a Lipschitz map T : X → X on a closed set X ⊂ R^N and an ergodic T-invariant Borel probability measure µ we have dim_H µ ≤ ID(µ) (see e.g. [BGS22, Proposition 2.1]), Theorem 1.4 shows that the SSOY predictability conjecture holds in its original form (i.e. with the information dimension of the measure µ) for an arbitrary Lipschitz map T on a compact set X ⊂ R^N, an ergodic T-invariant Borel probability measure µ and a prevalent Lipschitz observable h : X → R.
In [SSOY98], Schroer, Sauer, Ott and Yorke also formulated another conjecture, concerning the decay rate of the prediction error σ_ε for a natural measure on the attractor of the system.

SSOY prediction error conjecture ([SSOY98, Conjecture 2]). Let T : X → X be a smooth diffeomorphism of a compact Riemannian manifold X with a natural measure µ of information dimension ID(µ) = D. Fix a generic observable h : X → R, k ∈ N and δ > 0. Then, for the k-delay coordinate map φ corresponding to h and sufficiently small ε > 0, assertions (i)-(iv) of the conjecture describe the behaviour of σ_ε depending on the relation between k and D; in particular, assertion (ii) states that if D < k < 2D and φ is not injective on X, then the measure φ_*µ({y : σ_ε(y) > δ}) decays like ε^{k−D} as ε → 0.

The main result of this paper partially establishes the SSOY prediction error conjecture, with the information dimension of µ replaced by the box-counting dimension of X (see Definition 2.7). More precisely, we prove the key estimate for the decay rate of the measure of the set of points with given prediction error (i.e. the upper bound in assertion (ii) of the conjecture) with an exponent arbitrarily close to k − D, and confirm assertions (iii)-(iv). Case (ii), where φ is not injective, is precisely the case referred to in the title of this paper, i.e. 'time-delayed measurements with self-intersections'.
Furthermore, in Section 8 we provide an example showing that assertion (ii) of the conjecture does not hold with the information or Hausdorff dimension of an arbitrary Borel probability measure µ (we should, however, emphasize that the counterexamples are not in the class of natural measures). This is in contrast with the SSOY predictability conjecture, which holds for the Hausdorff dimension of µ. As with our previously mentioned results, the theorem is formulated in a general category of Lipschitz transformations on compact sets in Euclidean space and arbitrary Borel probability measures. Note that in the formulation of the result there appear both the upper and the lower box-counting dimension of X.
Theorem 1.6 (Prediction error estimate). Let X ⊂ R^N, N ∈ N, be a compact set, µ a Borel probability measure on X and T : X → X a Lipschitz map. Then there is a prevalent set S of Lipschitz observables h : X → R such that for every h ∈ S, k ∈ N and δ, θ > 0 there exist C, ε_0 > 0 such that for the k-delay coordinate map φ defined in (3) and every 0 < ε < ε_0, the following hold.
The proof of Theorem 1.6 is given in Section 7. Again, we show an extended version of the result (Theorem 7.4 and Corollary 7.3), valid for Hölder observables.
1.3. Another perspective on predictability. As explained above, the approach to predictability taken in [SSOY98] through the definition of the prediction error σ_ε can be motivated by its relation to prediction algorithms, such as the Farmer-Sidorowich algorithm. However, one can consider another, apparently more straightforward approach. It is based on the premise that predictability is precisely the phenomenon whereby the first k terms of the time series (2) of an observable h at a point x ∈ X determine its (k + 1)-th term. Following this point of view, we define a notion of deterministic predictability.

Definition 1.8. Let (X, T) be a dynamical system consisting of a map T : X → X on a set X ⊂ R^N and let k ∈ N. We say that an observable h : X → R is deterministically k-predictable at a point y ∈ φ(X) if, for the k-delay coordinate map φ defined in (3), the map φ ∘ T is constant on φ^{−1}({y}). If h is deterministically k-predictable at all points y ∈ φ(X), then we say that h is deterministically k-predictable.
The subsequent proposition is almost immediate (the proof is given in Section 5).
Proposition 1.9. The following conditions are equivalent:
(i) h is deterministically k-predictable;
(ii) there exists a map S : φ(X) → φ(X) such that S ∘ φ = φ ∘ T.
Moreover, if X is compact and T, h are continuous, then the map S is continuous.
Note that whenever S exists, it is given by S(y_1, . . ., y_k) = (y_2, . . ., y_k, h(T^k x)) for any x ∈ φ^{−1}({(y_1, . . ., y_k)}). It follows that if h is deterministically k-predictable, then, knowing the first k terms of its time series, we can determine all its further terms by iterating the map S.
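As a toy example (ours, not from the paper): for the logistic map T(x) = 4x(1 − x), the identity observable h and k = 1, the delay map is φ(x) = (h(x)) and the prediction map reduces to S(y) = h(T(y)), so iterating S recovers the whole time series from its first term:

```python
def T(x):           # logistic map on [0, 1]
    return 4.0 * x * (1.0 - x)

def h(x):           # identity observable
    return x

def S(y):           # prediction map for k = 1: next term from the current one
    return h(T(y))

# the true time series h(x), h(Tx), ..., h(T^5 x) ...
x, true_series = 0.2, []
for _ in range(6):
    true_series.append(h(x))
    x = T(x)

# ... is reproduced by iterating S on the first measurement
predicted, y = [0.2], 0.2
for _ in range(5):
    y = S(y)
    predicted.append(y)

assert all(abs(a - b) < 1e-12 for a, b in zip(true_series, predicted))
```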
Consider now a probabilistic setting, in which the phase space X admits some probability measure µ and one predicts measurements performed at a typical (µ-almost every) point. In view of Proposition 1.9, we define the following 'almost sure' version of deterministic predictability.
Definition 1.10. Let X ⊂ R^N, N ∈ N, be a Borel set with a Borel probability measure µ. Let T : X → X be a Borel transformation, h : X → R a Borel observable and k ∈ N. We say that h is almost surely deterministically k-predictable with respect to µ if there exists a Borel set X_h ⊂ X of full µ-measure such that for every x_1, x_2 ∈ X_h, if φ(x_1) = φ(x_2), then φ(T x_1) = φ(T x_2). Equivalently, there exists a map S : φ(X_h) → φ(X) such that φ(T x) = S(φ(x)) for every x ∈ X_h, i.e. the corresponding diagram commutes and

(6) S(h(x), . . ., h(T^{k−1} x)) = (h(T x), . . ., h(T^k x)) for µ-almost every x ∈ X.
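In the notation of Definition 1.10, the commuting relation φ ∘ T = S ∘ φ on X_h can be rendered as the diagram:

```latex
\begin{CD}
X_h @>T>> X \\
@V\phi VV @VV\phi V \\
\phi(X_h) @>S>> \phi(X)
\end{CD}
```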
Definition 1.11. In the context of Proposition 1.9 and Definition 1.10, we call the map S (wherever it is defined) the prediction map. The space φ(X) (or φ(X_h)) is called the model space, and the number k is the delay length.

Remark 1.12. In the case of deterministic predictability (Definition 1.8 and Proposition 1.9), the dynamical system (φ(X), S) is a topological factor (model system) of (X, T). In contrast, in the case of almost sure deterministic predictability (Definition 1.10), the prediction map S is not guaranteed to map φ(X_h) into itself, hence it might not be possible to iterate it. However, if we assume the measure µ to be T-invariant, then the set X_h can be chosen to satisfy T(X_h) ⊂ X_h, so that S maps φ(X_h) into itself, providing an almost sure model (φ(X_h), S) of the system (X, T).
The following proposition describes properties of continuous, almost surely deterministically predictable observables for continuous systems.

Proposition 1.13. Let X be a Borel set, µ a Borel probability measure on X and T : X → X a continuous map. Let h : X → R be a continuous observable which is almost surely deterministically k-predictable with respect to µ for some k ∈ N. Then the following hold.
(i) The set X h from Definition 1.10 can be chosen such that the model space φ(X h ) is a Borel set and the prediction map S is a Borel map.
(ii) If φ ∘ T ∈ L¹(µ), then lim_{ε→0} χ_ε(y) = S(y) for φ_*µ-almost every y ∈ φ(X), where χ_ε is introduced in Definition 1.1.

Moreover, if the measure µ is T-invariant, then the assumption φ ∘ T ∈ L¹(µ) in assertion (ii) can be replaced by h ∈ L¹(µ). (In the literature it is more common to find the terminology reconstruction space or embedding space for φ(X), as well as embedding dimension for k. Note, however, that this terminology is inadequate when φ is not injective.)
The proof of Proposition 1.13 is given in Section 6. Our next result shows, in particular, the surprising fact that the two probabilistic notions of predictability, presented in Definitions 1.2 and 1.10, coincide for continuous systems on compact sets.

Theorem 1.14 (Relations between two notions of almost sure predictability). Let X be a Borel set, µ a Borel probability measure on X and T : X → X a continuous map. Let h : X → R be a continuous observable and k ∈ N. Consider the k-delay coordinate map φ defined in (3) and the prediction map S. Then the following hold.
(i) If φ ∘ T ∈ L²(µ) (e.g. if h is bounded) and h is almost surely deterministically k-predictable with respect to µ, then S ∈ L²(φ_*µ) and lim_{ε→0} χ_ε(y) = S(y), lim_{ε→0} σ_ε(y) = 0 for φ_*µ-almost every y ∈ φ(X), where χ_ε, σ_ε are as in Definition 1.1; in particular, h is almost surely k-predictable with respect to µ.
(ii) If X is compact, then h is almost surely deterministically k-predictable with respect to µ if and only if it is almost surely k-predictable with respect to µ.
Moreover, if the measure µ is T-invariant, then the assumption φ ∘ T ∈ L²(µ) in assertion (i) can be replaced by h ∈ L²(µ).
The proof of Theorem 1.14 is presented in Section 6. In view of this result, if X is compact and T is continuous, then one can use the term almost sure predictability for a continuous observable h in the context of both Definitions 1.2 and 1.10.
The notion of almost sure deterministic predictability enables one to gain a new perspective on the Farmer-Sidorowich algorithm. Indeed, if an observable h : X → R is almost surely deterministically k-predictable, then the points z_i defined in (4) satisfy z_{i+1} = S(z_i) for every i ≥ 0, for φ_*µ-almost every z_0 ∈ R^k. Hence, by Proposition 1.3, Proposition 1.13 and Theorem 1.14, we immediately obtain the following corollary.
Corollary 1.15. Let X be a Borel set, T : X → X a continuous map and µ an ergodic T-invariant Borel probability measure on X. Suppose an observable h : X → R is continuous, h ∈ L¹(µ) and h is almost surely deterministically k-predictable with respect to µ for some k ∈ N. Consider z_i = z_i(x), i = 0, . . ., n and Pred_{x,ε,n}(y), defined in (4)-(5) for µ-almost every x ∈ X and φ_*µ-almost every y ∈ R^k.
It follows that under the assumptions of Corollary 1.15, at a typical point y, the vector χ_ε(y) for small ε > 0 may be interpreted as a limit of empirical means of the prediction map S associated with the Farmer-Sidorowich algorithm, while σ_ε(y) is a suitable limit of empirical standard deviations of S.
1.4. Time-delay prediction theorems. As remarked above, deterministic predictability can be used for predicting future values of a given observable, rather than for a faithful reconstruction of the original dynamics. On the other hand, deterministic predictability holds whenever the conclusion of the Takens delay embedding theorem is satisfied, i.e. whenever the delay coordinate map φ is injective on X (with S given as S = φ ∘ T ∘ φ^{−1}). In the language of the theory of dynamical systems, deterministic predictability means that φ is a semi-conjugacy between the original system (X, T) and its factor (φ(X), S), while injectivity of φ means that the two systems are isomorphic (conjugate). Obviously, in many cases an observable h can be deterministically predictable while φ is not injective (for example, this holds trivially when h is constant or T is the identity). Therefore, it is natural to expect that typical deterministic predictability may hold under weaker assumptions than the ones required for Takens-type delay embedding theorems.
Indeed, a result of this kind (called a time-delay embedding theorem for non-injective smooth dynamics) was proved by Takens [Tak02] in 2002. More precisely, he showed that if X is a compact C¹-manifold and k > 2 dim X, then for a typical pair of a C¹-map T : X → X and a C¹-observable h : X → R, the k-delay coordinate map φ determines the (k − 1)-th iterate of T, i.e. there exists a (Lipschitz) map π : φ(X) → X such that

(7) π ∘ φ = T^{k−1}.

This conclusion is clearly weaker than the injectivity of φ in the classical time-delay embedding results (in fact, without assuming the injectivity of T, the injectivity of φ does not hold even in the generic sense, see [Tak02, Section 2]). On the other hand, the result implies typical deterministic predictability, as one can define the prediction map S in terms of π as S(y) = S(y_1, . . ., y_k) = (y_2, . . ., y_k, h(T(π(y_1, . . ., y_k)))).
Similar results have been recently obtained in the topological category [Kat20,Kat21] for some classes of topological spaces.
In this paper we prove a general deterministic time-delay prediction theorem in the setup of locally Lipschitz maps on arbitrary sets in Euclidean space. In comparison to the Takens-type delay embedding theorem by Sauer, Yorke and Casdagli [SYC91], quoted in Subsection 1.1, we remove the assumption of the injectivity of T and the bound on the dimension of the sets of periodic points, obtaining deterministic predictability rather than the injectivity of the delay coordinate map.

Theorem 1.16 (Deterministic time-delay prediction theorem). Let X ⊂ R^N, N ∈ N, and let T : X → X be a locally Lipschitz map. Then, for a prevalent locally Lipschitz observable h : X → R and every k > 2 \overline{dim}_B X, the equality φ ∘ T = S ∘ φ holds on X (i.e. the corresponding diagram commutes), where φ is the k-delay coordinate map defined in (3) and S is the prediction map. Furthermore, if X is compact, then S is continuous.
The proof of this result (in fact, its extended version given by Theorem 5.2) is given in Section 5.
Remark 1.17. The existence of the map π from (7) requires a bound on the dimension of periodic points. Therefore, one cannot hope to improve Theorem 1.16 to obtain a result analogous to the one in [Tak02].
In our previous paper [BGS20], we studied the problem of determining conditions under which a system (X, T) can be almost surely reconstructed by the k-delay coordinate map, with respect to some probability distribution on the phase space X. In [BGS20, Theorems 1.2 and 4.3], we proved that for an injective, (locally) Lipschitz system (X, T) on a Borel set in Euclidean space with a Borel probability measure µ on X, the k-delay coordinate map corresponding to a prevalent (locally) Lipschitz observable is injective on a set of full µ-measure, provided k > dim_H µ and the Hausdorff dimension of µ restricted to the set of p-periodic points is smaller than p for every p = 1, . . ., k − 1. This agrees with the general philosophy of the SSOY conjectures, where the number of measurements can be reduced by half if one is allowed to neglect sets of measure zero. Following the above approach, we can establish the following 'almost sure' version of Theorem 1.16.
Theorem 1.18 (Almost sure time-delay prediction theorem). Let X ⊂ R^N, N ∈ N, be a Borel set, µ a Borel probability measure on X and T : X → X a locally Lipschitz map. Then a prevalent locally Lipschitz observable h : X → R is, for every k > dim_H µ, almost surely deterministically k-predictable with respect to µ, i.e. the equality φ ∘ T = S ∘ φ holds on X_h (the corresponding diagram commutes), where φ is the k-delay coordinate map defined in (3), X_h is a full µ-measure Borel subset of X, φ(X_h) is a Borel set and S is a Borel prediction map. Furthermore, if X is compact and k > dim_H(supp µ), then S is continuous for a prevalent locally Lipschitz observable.
Remark 1.19. If, additionally, X is closed, T is Lipschitz and µ is T-invariant and ergodic, then the same conclusions hold under the assumption k > ID(µ) (cf. Remark 1.5).
Note that Theorem 1.4 follows directly from Theorem 1.18 and Theorem 1.14. In Section 6 we prove an extended version of Theorem 1.18, given by Theorem 6.1. In Section 8 we provide an example showing that neither the assumption k > dim_H µ nor the assumption k > ID(µ) is sufficient to obtain continuity of the prediction map S for a prevalent set of observables.
1.5. Estimating the minimal delay length. Oftentimes within the time-delayed measurement framework, experiments are performed on an underlying system whose key attributes, e.g. its dimension, are unknown. In such situations, one is faced with the challenge of estimating the delay length k from a sample of finite time series as in (1), with the goal of achieving a reasonable model for the system. Manifestly, what is considered reasonable depends on the application at hand (see [Aba96, Chapter 4] for an overview of algorithms aiming at this goal). A basic algorithm of this kind is the false nearest neighbor algorithm introduced in [KBA92] (see also [Aba96, Section 4.2]; similar algorithms were introduced in [ČP88, LPS91]). Let us give a brief overview of this algorithm. For a given k (a candidate for the delay length) one considers points φ(x_i) ∈ R^k, i = 1, . . ., m, obtained from the measurements (1). For each point φ(x_i) one considers its nearest neighbor φ(x_j) in the k-dimensional model space. If the distance |h(T^k x_i) − h(T^k x_j)| is much larger than the distance ‖φ(x_i) − φ(x_j)‖ (relative to some a priori threshold), then φ(x_j) is called a false nearest neighbor of φ(x_i) (as in such a case one expects x_i and x_j to be far away in the original phase space X, implying that φ(x_i) and φ(x_j) are close merely due to the poor performance of φ). Next, one computes the proportion (relative to m) of points which have a false nearest neighbor. The smallest k for which this proportion is smaller than a fixed a priori rate is chosen as the delay length. The false nearest neighbor algorithm is often employed in practice (see e.g. [HOP97, HP97, MRCA14], and [CG19, ZLZ21, Bła22, BR23] for some recent applications). It is also used and studied in the literature on numerical methods for modelling and analysing chaotic dynamics [RSTA95, RM97, KMB15, CG19]. Interestingly, for a number of dynamical systems with known dimension of the phase space, it has been observed that the
delay length estimated by the above (or similar) algorithms is smaller than the theoretical Takens delay embedding theorem bound of 2 dim X; see [Aba96, Section 2.4.1] and [KBA92, ČP88, LPS91] for examples including the Lorenz and Rössler systems, the Hénon map and the dynamics of stochastic self-modulation of waves in a non-stationary medium.
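The false nearest neighbor test described above can be sketched as follows (a simplified Python version with hypothetical names; phis[i] = φ(x_i) and next_vals[i] = h(T^k x_i)):

```python
import math

def fnn_fraction(phis, next_vals, threshold=10.0):
    # For each delay vector, find its nearest neighbor in the model space and
    # flag it as 'false' when the next measurements separate much faster than
    # the delay vectors themselves (ratio above the a priori threshold).
    m = len(phis)
    false_count = 0
    for i in range(m):
        j = min((j for j in range(m) if j != i),
                key=lambda j: math.dist(phis[i], phis[j]))
        d = math.dist(phis[i], phis[j])
        if d > 0 and abs(next_vals[i] - next_vals[j]) / d > threshold:
            false_count += 1
    return false_count / m
```

One would run this for increasing k and choose the smallest k for which the returned fraction drops below a fixed a priori rate.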
It seems plausible that the SSOY prediction error conjecture, together with our prediction error estimate (Theorem 1.6), presents the correct framework for a mathematical explanation of this numerically observed phenomenon. Indeed, note that the algorithm described above tests how well future values of the time series are predicted from the previous ones, studying in fact the predictability problem rather than the reconstruction (embedding) problem. Furthermore, one is interested in having a satisfactory prediction property with high enough probability, rather than for all sampled points, hence adopting a probabilistic point of view. Moreover, the quantitative statement of assertion (ii) of the SSOY prediction error conjecture, as well as assertion (i) of Theorem 1.6, provide a method for determining the dimension of the phase space as well as the desired delay length.

Structure of the paper. Section 2 contains preliminary definitions and results, as well as a discussion of the notion of prevalence and its variants used in this paper. In Section 3 we prove a crucial combinatorial proposition (Proposition 3.1) and derive a corollary (Corollary 3.2), which is essential for the proofs of the main results of the paper. Section 4 provides general estimates of the Lebesgue measure of some sets of parameters related to delay coordinate maps for perturbations of Lipschitz observables. In Section 5 we prove Theorem 5.2, which provides an extended version of the deterministic time-delay prediction theorem (Theorem 1.16). Section 6 contains the proofs of the results on the properties of continuous, almost surely deterministically predictable observables and the relations between the two notions of almost sure predictability (Proposition 1.13 and Theorem 1.14), as well as Theorem 6.1, which is an extended version of the almost sure time-delay prediction theorem (Theorem 1.18). In Section 7 we prove extended versions of the general SSOY predictability conjecture (Theorem 1.4), given
by Theorem 7.1, and the prediction error estimate (Theorem 1.6), split into three results (Theorem 7.2, Corollary 7.3 and Theorem 7.4).In Section 8 we provide examples showing that Theorems 1.6 and 1.18 do not hold under weaker assumptions on the dimension of the given measure (Corollary 8.2 and Proposition 8.3).The paper concludes with a list of open questions related to the considered problems (Section 9).

2. Preliminaries
Notation. The number of elements of a set A is denoted by #A. The symbols ‖·‖, dist(·, ·), ⟨·, ·⟩ and |·| denote, respectively, the Euclidean norm, distance, inner product and diameter in R^N, N ∈ N. We write conv A for the convex hull of a set A ⊂ R^N. The open δ-ball around a point x ∈ R^N is denoted by B_N(x, δ) and the closed ball by B̄_N(x, δ) (sometimes we omit the dimension N in the notation). By Leb we denote the Lebesgue measure (or the outer Lebesgue measure, in the case of non-measurable sets).
Singular values. Let ψ : R^m → R^k be a linear map and let A be the matrix of ψ. It is a classical fact that the following numbers are equal: the dimension of the image of ψ; m minus the dimension of the kernel of ψ; the maximal number of linearly independent columns of A; the maximal number of linearly independent rows of A. This common value is called the rank of A and is denoted by rank A.

For p ∈ {1, . . ., k} let σ_p(A) (we also use the symbol σ_p(ψ)) denote the p-th largest singular value of A, i.e. the square root of the p-th largest eigenvalue of the matrix A^*A. The following fact is standard (see e.g. [Rob11, Lemma 14.2]).
Lemma 2.1. The rank of a matrix equals the number of its non-zero singular values.
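Lemma 2.1 is easy to check numerically; e.g. with NumPy (an illustrative computation, not part of the paper):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # twice the first row
              [1.0, 0.0, 1.0]])

sv = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order
rank = int(np.sum(sv > 1e-10 * sv[0]))    # number of non-zero singular values
assert rank == np.linalg.matrix_rank(A) == 2
```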
Prevalence. Below we present the definition of prevalence, a notion introduced by Hunt, Sauer and Yorke in [HSY92], which may be considered as an analogue of the 'Lebesgue almost sure' condition in infinite-dimensional normed linear spaces.

Definition 2.3. By a (complete) linear metric in a linear space we mean a (complete) metric which makes addition and scalar multiplication continuous. Let V be a complete linear metric space (i.e. a linear space with a complete linear metric). A Borel set S ⊂ V is called prevalent if there exists a Borel measure ν in V, which is positive and finite on some compact set in V, such that for every v ∈ V, we have v + e ∈ S for ν-almost every e ∈ V. A non-Borel subset of V is prevalent if it contains a prevalent Borel subset. For more information on prevalence we refer to [HSY92] and [Rob11, Chapter 5].
Let us now describe prevalence in the spaces of (locally) Lipschitz and Hölder observables. Let X ⊂ R^N and β ∈ (0, 1]. Recall that a function h : X → R is β-Hölder if there exists c > 0 such that |h(x) − h(y)| ≤ c‖x − y‖^β for every x, y ∈ X. We say that a function h : X → R is locally β-Hölder on X if for every x ∈ X there exists an open neighbourhood U of x such that h|_{U∩X} is β-Hölder. We denote the space of β-Hölder (resp. locally β-Hölder) functions on X by H^β(X) (resp. H^β_loc(X)). A function h : X → R is Lipschitz (resp. locally Lipschitz) if it is 1-Hölder (resp. locally 1-Hölder). The space of Lipschitz (resp. locally Lipschitz) functions on X is denoted by Lip(X) (resp. Lip_loc(X)).
For X ⊂ R^N, β ∈ (0, 1] and h : X → R let

‖h‖_{β,X} = sup_{x∈X} |h(x)| + sup_{x,y∈X, x≠y} |h(x) − h(y)| / ‖x − y‖^β

(where ‖h‖_{β,X} can be infinite). Suppose X is bounded. Then ‖h‖_{β,X} < ∞ for every h ∈ H^β(X). In this case, it is a standard fact that H^β(X) endowed with the β-Hölder norm ‖·‖_{β,X} is a Banach space (in particular, a complete linear metric space). We also write ‖·‖_{1,X} for the Lipschitz norm on Lip(X). Consider now the space H^β_loc(X) for a (possibly unbounded) set X ⊂ R^N. Clearly, H^β_loc(X) is a linear space. To introduce a linear metric in H^β_loc(X), fix a countable basis B = {U_1, U_2, . . .} of the standard topology in R^N, such that B consists of bounded sets. For functions h, h̃ ∈ H^β_loc(X) we define

d_{β,X}(h, h̃) = Σ_{n=1}^∞ 2^{−n} min(1, ‖h − h̃‖_{β, U_n ∩ X}).

Let us emphasize that even if X is bounded, it might happen that H^β(X) ≠ H^β_loc(X). However, if X is compact, then H^β(X) = H^β_loc(X), with ‖·‖_{β,X} and d_{β,X}(·, ·) generating the same topology.
In this paper we use the notion of prevalence in the sense of Definition 2.3, applied suitably to the spaces V = H^β(X) or V = H^β_loc(X).

Remark 2.4. Similarly as in [SYC91, Rob11, BGS22], to prove prevalence of appropriate sets of observables, we check the following condition, which is sufficient for prevalence. Let {h_1, ..., h_m}, m ∈ N, be a finite set of functions in V = H^β(X) or V = H^β_loc(X), called the probe set. Define ξ : R^m → V by ξ(α_1, ..., α_m) = Σ_{j=1}^m α_j h_j. Then ν = ξ_* Leb (where Leb is the Lebesgue measure in R^m) is a Borel measure in V, which is positive and finite on the compact set ξ(B_m(0, 1)). For this measure, a sufficient condition for a set S ⊂ V to be prevalent is that for every h ∈ V, the function h + Σ_{j=1}^m α_j h_j is in S for Lebesgue-almost every (α_1, ..., α_m) ∈ R^m. In this case, we say that S is prevalent in V with the probe set {h_1, ..., h_m}. Following [BGS20, BGS22], as probe sets we take suitable interpolating families of functions in V, defined below.

Definition 2.5. Let X be a subset of R^N. A family of functions h_1, ..., h_m : X → R is called a k-interpolating family in X if for every collection of distinct points x_1, ..., x_k ∈ X the k × m matrix (h_j(x_i))_{1≤i≤k, 1≤j≤m} has maximal rank. Note that the same is true for any collection of distinct points x_1, ..., x_ℓ with ℓ ≤ k.
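The rank condition of Definition 2.5 can be explored numerically. The following Python sketch (using numpy; a toy illustration, not part of the formal argument) builds the matrix (h_j(x_i)) for the monomial family of Remark 2.6 below and checks that it has maximal rank; the specific parameters N = 2, k = 4 and the random points are illustrative assumptions.

```python
import itertools
import numpy as np

def monomial_basis(N, k):
    """Exponent tuples of all monomials in N variables of degree <= k - 1."""
    return [e for e in itertools.product(range(k), repeat=N) if sum(e) <= k - 1]

def interpolation_matrix(points, basis):
    """The matrix (h_j(x_i))_{i,j} for the monomial family h_j."""
    P = np.asarray(points, dtype=float)
    return np.column_stack([np.prod(P ** np.array(e), axis=1) for e in basis])

rng = np.random.default_rng(0)
N, k = 2, 4
basis = monomial_basis(N, k)             # here m = 10 monomials of degree <= 3
points = rng.standard_normal((k, N))     # k distinct points in R^N
M = interpolation_matrix(points, basis)  # k x m matrix of maximal rank k
```

By Remark 2.6, maximal rank here holds for every choice of distinct points, not only for generic ones.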
Remark 2.6. It is known that any linear basis {h_1, ..., h_m} of the space of real polynomials of N variables of degree at most k − 1 is a k-interpolating family in R^N (see e.g. [GS00, Section 1.2, eq. (1.9)]). Obviously, the functions from this family are locally Lipschitz on any set X ⊂ R^N; in particular they belong to H^β_loc(X) for β ∈ (0, 1].

Dimensions and measures.

Definition 2.7. For s > 0, the s-dimensional (outer) Hausdorff measure of a set X ⊂ R^N is defined as

H^s(X) = lim_{δ→0} inf { Σ_{i=1}^∞ (diam U_i)^s : X ⊂ ∪_{i=1}^∞ U_i, diam U_i ≤ δ }.

For a bounded set X ⊂ R^N and δ > 0, let N(X, δ) denote the minimal number of balls of diameter at most δ required to cover X. The lower and upper box-counting (Minkowski) dimensions of X are defined, respectively, as

dim_B X = liminf_{δ→0} log N(X, δ) / (−log δ),

with the upper box-counting dimension defined analogously with limsup in place of liminf. The lower (resp. upper) box-counting dimension of an unbounded set is defined as the supremum of the lower (resp. upper) box-counting dimensions of its bounded subsets. By separability one may always assume that the supremum is taken over a countable number of bounded subsets.
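For intuition, the quantity N(X, δ) can be estimated on a finite sample by counting grid boxes (comparable to covers by balls up to constants, which does not affect the dimension). A toy Python sketch, with the unit circle as an illustrative set of box-counting dimension 1:

```python
import numpy as np

def box_count(points, delta):
    """Estimate N(X, delta) on a finite sample: the number of grid boxes of
    side delta met by the sample (comparable to ball covers up to constants)."""
    idx = np.floor(np.asarray(points) / delta).astype(int)
    return np.unique(idx, axis=0).shape[0]

# illustrative set: a dense sample of the unit circle in R^2 (dim_B = 1)
t = np.linspace(0.0, 2.0 * np.pi, 200000, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
# log N(X, delta) / (-log delta) approximates the box-counting dimension
```

The ratio log N(X, δ)/(−log δ) converges slowly because of multiplicative constants, so in practice one fits a slope over several scales δ.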
Recall that for a measure µ on a set X, one sets dim µ = inf{dim A : A ⊂ X is measurable with µ(X ∖ A) = 0}, where dim may denote any one of the dimensions defined above.
For a Borel probability measure µ in R^N with compact support define its lower and upper information dimensions as

ID(µ) = liminf_{ε→0} (∫ log µ(B(x, ε)) dµ(x)) / log ε,   ĪD(µ) = limsup_{ε→0} (∫ log µ(B(x, ε)) dµ(x)) / log ε.

If ID(µ) = ĪD(µ), then we denote their common value by ID(µ) and call it the information dimension of µ. For more information on dimension theory in Euclidean spaces see [Fal14, Mat95, Rob11].
Definition 2.8. The support of a Borel measure µ in a metric space, denoted supp µ, is the smallest closed set of full µ-measure. For a Borel map φ, by φ_*µ we denote the push-forward of µ under φ, defined by φ_*µ(E) = µ(φ^{−1}(E)) for Borel sets E. We say that measures µ and ν are mutually singular if there exists a measurable set X of full µ-measure such that ν(X) = 0. We denote this fact by µ ⊥ ν.
Definition 2.9. Let µ be a Borel measure on a Borel set X and let T : X → X be a Borel map. The measure µ is called T-invariant if T_*µ = µ, and ergodic if every Borel set A ⊂ X with T^{−1}(A) = A satisfies µ(A) = 0 or µ(A) = 1.

Definition 2.10 (Natural measure). Let X be a compact Riemannian manifold and let T : X → X be a map with an attractor Λ ⊂ X with basin B(Λ). A Borel probability measure µ on Λ is called a natural measure for T if

(1/n) Σ_{i=0}^{n−1} δ_{T^i x} → µ as n → ∞

for almost every x ∈ B(Λ) with respect to the volume measure on X, where δ_y denotes the Dirac measure at y and the limit is taken in the weak-* topology.

3. Combinatorics of orbits
Let X ⊂ R^N be an arbitrary set and let T : X → X be a transformation. Fix k ∈ N and let h_1, ..., h_m : X → R be a 2k-interpolating family on X, according to Definition 2.5. Consider an observable h : X → R. For α = (α_1, ..., α_m) ∈ R^m let h_α : X → R be given by h_α = h + Σ_{j=1}^m α_j h_j and let φ^T_{α,k}(x) = (h_α(x), h_α(Tx), ..., h_α(T^{k−1}x)) be the k-delay coordinate map corresponding to h_α (here it is useful to include the dependence on T and k in the notation). In this section we study the combinatorics of orbits of points of X in relation to the properties of φ^T_{α,k}. Note that for x, y ∈ X we have

φ^T_{α,k}(x) − φ^T_{α,k}(y) = (h(T^i x) − h(T^i y))_{i=0}^{k−1} + D_{x,y} α,

where D_{x,y} = (h_j(T^i x) − h_j(T^i y))_{0≤i≤k−1, 1≤j≤m}.
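A minimal Python sketch of the delay coordinate map and the perturbed observable h_α (the doubling map and the cosine observable are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def delay_map(h, T, k, x):
    """k-delay coordinate map: phi(x) = (h(x), h(Tx), ..., h(T^{k-1} x))."""
    values = []
    for _ in range(k):
        values.append(h(x))
        x = T(x)
    return np.array(values)

def perturbed(h, probes, alpha):
    """The perturbed observable h_alpha = h + sum_j alpha_j h_j."""
    return lambda x: h(x) + sum(a * hj(x) for a, hj in zip(alpha, probes))

# toy system: doubling map on [0, 1) observed through h(x) = cos(2 pi x)
T = lambda x: (2.0 * x) % 1.0
h = lambda x: np.cos(2.0 * np.pi * x)
phi = delay_map(h, T, 3, 0.1)  # (h(0.1), h(0.2), h(0.4))
```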
The aim of this section is to prove the following proposition, which provides a basic technical tool used in the proofs of the main results of this paper. Note that as the proof of the proposition is purely combinatorial, we do not assume any regularity of the dynamics T and the observables h, h_1, ..., h_m, except for the condition that {h_1, ..., h_m} is a 2k-interpolating family in X.
Proposition 3.1. For every x, y ∈ X, if rank D_{x,y} < k, then for every α ∈ R^m,

|h_α(T^k x) − h_α(T^k y)| ≤ Σ_{i=0}^{k−1} |h_α(T^i x) − h_α(T^i y)|.

This proposition together with Lemma 2.1 immediately implies the following corollary, which is a key result used in this paper.
The rest of this section is devoted to the proof of Proposition 3.1. Let us first introduce some notation. For x ∈ X, let Orb_T(x) = {T^n x : n ≥ 0} be its orbit. We will often write Orb(x) instead of Orb_T(x) if the transformation is clear from the context. We call a point x ∈ X periodic if there exists p ≥ 1 such that T^p x = x, and pre-periodic (or eventually periodic) if it is not periodic and there exists n ≥ 1 such that T^n x is periodic. If x is neither periodic nor pre-periodic, it is called aperiodic. Note that x is aperiodic if and only if Orb(x) is infinite.
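The orbit structure of a point with finite orbit (its pre-periodic part and cycle, denoted P(x) and C(x) below) can be computed with a simple dictionary-based scan; a toy Python sketch with an illustrative finite map:

```python
def preperiod_and_cycle(T, x):
    """For a point with finite orbit, return (P(x), C(x)): the length of the
    pre-periodic part and the length of the cycle of its orbit under T."""
    first_seen = {}
    n = 0
    while x not in first_seen:
        first_seen[x] = n
        x = T(x)
        n += 1
    return first_seen[x], n - first_seen[x]

# illustrative finite map: 0 -> 1 -> 2 -> 3 -> 4 -> 2, so P(0) = 2, C(0) = 3
f = {0: 1, 1: 2, 2: 3, 3: 4, 4: 2}.get
```

For a periodic point (e.g. the point 2 above) the pre-periodic part is empty, so P(x) = 0 and C(x) is its period.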
Every orbit Orb(x) can be uniquely presented as a disjoint union of a pre-periodic part and a cycle (we adopt the convention that for aperiodic x, the pre-periodic part is infinite and the cycle is empty). We will denote by P(x) the length of the pre-periodic part and by C(x) the length of the cyclic part. Now we proceed with the proof of Proposition 3.1. Fix a pair of points x, y ∈ X and assume

(11) rank D_{x,y} < k.
Note that we can assume

(12) T^i x ≠ T^i y for i = 0, ..., k − 1,

as otherwise T^k x = T^k y and the proposition holds trivially. Denote the elements of the set {x, Tx, ..., T^{k−1}x, y, Ty, ..., T^{k−1}y} by z_0, ..., z_{ℓ−1}, without multiplicities and preserving the above order, so that ℓ = #{x, Tx, ..., T^{k−1}x, y, Ty, ..., T^{k−1}y}. With this notation, the matrix D_{x,y} can be written as a product of matrices D_{x,y} = J_{x,y} V_{x,y}, where J_{x,y} is a k × ℓ matrix whose i-th row has the entry 1 in the column m_{i,1} with z_{m_{i,1}} = T^i x and the entry −1 in the column m_{i,2} with z_{m_{i,2}} = T^i y, and V_{x,y} = (h_j(z_i))_{0≤i≤ℓ−1, 1≤j≤m} (note that we enumerate rows and columns starting from 0). By (12), the entries m_{i,j} are well-defined. Moreover, every row of J_{x,y} consists of a single entry 1, a single entry −1 and ℓ − 2 zeros. Since z_0, ..., z_{ℓ−1} are distinct, ℓ ≤ 2k and {h_1, ..., h_m} is a 2k-interpolating family, the matrix V_{x,y} has maximal rank. Therefore, rank D_{x,y} = rank J_{x,y}, so by (11),

(13) rank J_{x,y} < k.
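The incidence matrix J_{x,y} can be built explicitly; a toy Python sketch assuming the row structure described above (a single entry 1 at the position of T^i x and a single entry −1 at the position of T^i y), with an illustrative finite map consisting of two disjoint cycles:

```python
import numpy as np

def J_matrix(T, x, y, k):
    """Sketch of the k x l incidence matrix J_{x,y}: row i carries +1 at the
    position of T^i x and -1 at the position of T^i y among the distinct
    points z_0, ..., z_{l-1} (assumes T^i x != T^i y for i < k, as in (12))."""
    points = []  # z_0, ..., z_{l-1}: the x-orbit segment, then the y-orbit segment
    for start in (x, y):
        z = start
        for _ in range(k):
            if z not in points:
                points.append(z)
            z = T(z)
    J = np.zeros((k, len(points)), dtype=int)
    zx, zy = x, y
    for i in range(k):
        J[i, points.index(zx)] = 1
        J[i, points.index(zy)] = -1
        zx, zy = T(zx), T(zy)
    return J

# illustrative finite map with two disjoint cycles: 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 3
f = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}.get
J = J_matrix(f, 0, 3, 3)
```

Each row then sums to zero, reflecting the single +1 and single −1 per row.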
Let G be a directed graph with the set of vertices V = {x, Tx, ..., T^{k−1}x, y, Ty, ..., T^{k−1}y} and directed edges T^i x → T^i y for i = 0, ..., k − 1. Let Ḡ denote the non-directed version of G (i.e. a non-directed graph with the same set V of vertices and non-directed edges T^i x ↔ T^i y for i = 0, ..., k − 1). Note that the points T^k x, T^k y may be contained in V, and the graph can have vertices of degree higher than 1. The key observation is the following.
Lemma 3.3. If u, u' ∈ V are connected by a path in Ḡ, then for every α ∈ R^m we have |h_α(u) − h_α(u')| ≤ Σ_{j=0}^{k−1} |h_α(T^j x) − h_α(T^j y)|.

Proof. Let u = u_0, u_1, ..., u_L = u' be a path in Ḡ with pairwise distinct edges. Then h_α(u_i) − h_α(u_{i+1}) = ±(h_α(T^j x) − h_α(T^j y)) for i = 0, ..., L − 1, as for each i there holds {u_i, u_{i+1}} = {T^j x, T^j y} for some j ∈ {0, ..., k − 1}. Hence, |h_α(u_0) − h_α(u_L)| ≤ Σ_{i=0}^{L−1} |h_α(u_i) − h_α(u_{i+1})| ≤ Σ_{j=0}^{k−1} |h_α(T^j x) − h_α(T^j y)|. This ends the proof, as the total number of edges in Ḡ is at most k, hence any two vertices connected by a path are in fact connected by a path of length at most k.
We proceed with the proof of Proposition 3.1. We will consider four (not mutually disjoint) cases which together cover all the possibilities where at least one of the points x, y is periodic or aperiodic (note that only Case 4 will actually require the assumption of periodicity). Then we will show how to reduce the remaining case (with both x, y pre-periodic) to the already established ones.
By symmetry, we can assume # Orb(x) ≥ k. Then ℓ ≥ k and the matrix J_{x,y} has entries 1 on the main diagonal.^10 Therefore, the first k × k square submatrix of J_{x,y} has the form I − A, where I is the identity matrix, and we see that A is the adjacency matrix of a subgraph of G, obtained by restricting the set of vertices to {x, Tx, ..., T^{k−1}x} and including only those edges of G which join two of these vertices. If A were nilpotent, then I − A would be invertible, so rank J_{x,y} ≥ k, contradicting (13). Therefore, we can assume that A is not nilpotent, so the graph described by A contains a cycle. Consequently, G contains a cycle. We will prove that this implies T^k x, T^k y ∈ V and the existence of a path from T^k y to T^k x in G. Then there is a path from T^k x to T^k y in Ḡ and the assertion of Proposition 3.1 holds by Lemma 3.3 (with constant k instead of 2k).
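The equivalence used here, that an adjacency matrix is nilpotent exactly when the directed graph it describes has no cycle (so that I − A is invertible), can be checked numerically; a toy Python sketch with two illustrative three-vertex graphs:

```python
import numpy as np

def has_cycle(A):
    """A directed graph on n vertices contains a cycle iff its adjacency
    matrix A is not nilpotent, i.e. iff A^n is nonzero."""
    n = A.shape[0]
    return bool(np.linalg.matrix_power(A, n).any())

acyclic = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [0, 0, 0]])  # path 0 -> 1 -> 2, no cycle
cyclic = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])   # 3-cycle 0 -> 1 -> 2 -> 0

# for nilpotent A the matrix I - A is invertible (indeed det(I - A) = 1),
# which is the rank argument used above
```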
To show that there is a path from T^k y to T^k x in G, note first that any cycle in G is of the form

(14) T^{i_1} x → T^{i_1} y = T^{i_2} x → T^{i_2} y = ... = T^{i_s} x → T^{i_s} y = T^{i_1} x

for some i_1, ..., i_s ∈ {0, ..., k − 1}. If i_j < k − 1 for each j = 1, ..., s, then the image of the path (14) under T also describes a cycle in G. Therefore, there exists a cycle in G containing the edge T^{k−1} x → T^{k−1} y, and hence there is a path in G from T^{k−1} y to T^{k−1} x of the same form, where i_1, ..., i_s ∈ {0, ..., k − 1}. In fact, i_1, i_s < k − 1, since T^{k−1} x ≠ T^{k−1} y, and all other indices can be assumed to be smaller than k − 1, as otherwise we can consider a shorter path. Therefore, T^k x, T^k y ∈ V and the image of this path under T is a path from T^k y to T^k x in G.

As the orbits of x, y are finite, each of the points x, y is periodic or pre-periodic, and hence the numbers C(x), P(x), C(y), P(y) are finite with C(x), C(y) > 0. For simplicity, denote p = C(x), q = C(y) and note that # Orb(x) = P(x) + p and # Orb(y) = P(y) + q. By assumption, we have p = rq for some r ∈ N. As k − p ≥ P(x) and k − p ≥ P(y), we see that T^{k−p} x belongs to the cyclic

^10 By the main diagonal of a matrix we understand the elements of the matrix with coordinates of the form (i, i).
part of Orb(x) and T^{k−p} y belongs to the cyclic part of Orb(y). Therefore, T^{k−p} x = T^k x and T^{k−p} y = T^{k−rq} y = T^k y. Consequently, T^k x, T^k y ∈ V and G contains the path T^k x = T^{k−p} x → T^{k−p} y = T^k y. Hence, we can use Lemma 3.3 to show Proposition 3.1, with constant k instead of 2k.
Note that if Orb(x) ∩ Orb(y) ≠ ∅ and both orbits are finite, then the cyclic parts of Orb(x) and Orb(y) have to be equal, hence C(x) = C(y). Therefore, Cases 1 and 2 cover all the possibilities with Orb(x) ∩ Orb(y) ≠ ∅ (since then # Orb(y) ≤ k implies P(y) + C(x) ≤ k).
This implies the following equalities.
• T^{k−p} x = T^k x (as k − p ≥ P(x), and hence T^{k−p} x belongs to the cyclic part of Orb(x), and p is its period),
• T^{k−q−p} y = T^{k−p} y (as k − q − p ≥ P(y)),
• T^{k−q−p} x = T^{k−q} x (as k − q − p ≥ P(x)),
• T^{k−q} y = T^k y (as k − q ≥ P(y)).

Using these equalities, we see that T^k x, T^k y ∈ V and Ḡ contains a path from T^k x to T^k y. As x is periodic, the matrix J_{x,y} has the following form (we present here the particular case k = 10, p = 4, P(y) = 2, q = 5 for illustration, but the structure of the matrix is general):
We will now perform a sequence of elementary row operations to 'move' all the entries 1 to the main diagonal of the matrix. Namely, we perform the following sequence of k − p operations on J_{x,y}, to obtain a new matrix K_{x,y} (recall that we enumerate rows and columns starting from 0).
(1) Subtracting row k − 1 − p from row k − 1 (the last row of the matrix).
As T^{k−1} x = T^{k−1−p} x, this operation deletes the entry 1 from row k − 1. Since k − 1 − p < # Orb(y) (as k ≤ # Orb(x) + # Orb(y)), this also adds the entry 1 in column k − 1 − p + p = k − 1 of row k − 1. We claim that this entry does not cancel with the entry −1 in row k − 1. For that, we have to check that the unique entry −1 in row k − 1 is not located in column k − 1. There are two possibilities, corresponding to the assumption C(y) ∤ C(x) or P(y) + C(x) ≥ k. If P(y) + C(x) ≥ k, then k − p − 1 < P(y), hence the column k − 1 'corresponds' to the pre-periodic part of Orb(y) and thus contains a single entry −1 (in row k − p − 1), which is therefore not deleted. Alternatively, we have C(y) ∤ C(x). Then, the cancellation of the entry −1 from row k − 1 by subtracting row k − 1 − p would mean that T^{k−1} y = T^{k−1−p} y. This cannot happen, as C(y) ∤ C(x). Therefore, the final effect of this operation is moving the entry 1 in the last row from column k mod p to column k − 1 in the same row, i.e. to the main diagonal (without cancelling any entries −1).
(2) Subtracting row k − 2 − p from row k − 2. Similarly, this moves the entry 1 in row k − 2 to the main diagonal.
...
(k − p) Subtracting row 0 from row p. This moves the entry 1 in row p to the diagonal of the first k × k square submatrix.
Note that in the above sequence of operations, once a row is modified, it is never used later to modify another row, hence indeed the only results of the operations are the ones described above. Consequently, in the new matrix K_{x,y}, all the rows between p and k − 1 have their unique entry 1 on the main diagonal, while all the entries −1 remain in the same location as in J_{x,y}. As J_{x,y} has entries 1 on the main diagonal in the rows between 0 and p − 1, and these rows were not modified by the above operations, we conclude that K_{x,y} has all its entries 1 on the main diagonal, with the entries −1 inherited from J_{x,y}. The key observation is that the matrix K_{x,y} is equal to J_{x,y}(T̃, k) for a suitably changed dynamics T̃ on the set Orb(x) ∪ Orb(y). Indeed, define T̃ : Orb(x) ∪ Orb(y) → Orb(x) ∪ Orb(y) by T̃|_{Orb(y)} = T|_{Orb(y)} and

T̃(T^j x) = T^{j+1} x for 0 ≤ j < p − 1,   T̃(T^{p−1} x) = y.
In other words, T̃ follows the dynamics of T on Orb(y) and changes the periodic orbit of x into a pre-periodic part, whose final element is mapped to y instead of returning to x. Note that # Orb_{T̃}(x) = # Orb_T(x) + # Orb_T(y) ≥ k, hence J_{x,y}(T̃, k) has entries 1 exactly on the main diagonal. The same is true for K_{x,y}. As additionally T̃|_{Orb(y)} = T|_{Orb(y)} and the matrices J_{x,y}(T, k) and K_{x,y} have entries −1 at the same positions, we see that indeed K_{x,y} = J_{x,y}(T̃, k). Note that rank J_{x,y}(T̃, k) < k, because otherwise rank J_{x,y}(T, k) = rank K_{x,y} = rank J_{x,y}(T̃, k) = k, which is impossible by (13). Since # Orb_{T̃}(x) ≥ k, Case 1 applies to the points x, y undergoing the dynamics of T̃ and we can use Proposition 3.1 for T̃. Consequently, as x is p-periodic under T, translating the conclusion for T̃ back to the original dynamics T completes the proof of Proposition 3.1 in Case 4.
Note that Cases 1-4 (applied with x and y possibly exchanged) cover all the possibilities when at least one of the points x, y is periodic or aperiodic. Therefore, it remains to consider the case when both points x and y are pre-periodic. We now explain how to reduce this case to the previous ones. We can assume # Orb(x) ≤ k and # Orb(y) ≤ k, as otherwise Case 1 applies (which requires no assumptions on the periodicity of x or y). Moreover, as noted above, we can assume Orb(x) ∩ Orb(y) = ∅, as the case of non-disjoint orbits follows from Cases 1-2, which do not require the assumption that at least one of the points is periodic. By symmetry, we can assume 0 < P(x) ≤ P(y). Then the matrix J_{x,y} is of a block form consisting of four column blocks: the first contains the identity matrix I_{P(x)} of size P(x) × P(x), the third contains −I_{P(y)}, where I_{P(y)} is the identity matrix of size P(y) × P(y), and the second and fourth involve circulant matrices C_x, C_y (we take here p = 3, k − P(x) = 7 and q = 2, k − P(y) = 5 for illustration, but the circulant structure is general; we write s = P(y) − P(x)). Add the first P(x) columns of J_{x,y} to the first P(x) columns of the third column block of J_{x,y} (the one containing −I_{P(y)}). As P(x) ≤ P(y), this deletes the first P(x) entries −1 in the third column block. Therefore, we obtain a new matrix J̃_{x,y} such that rank J̃_{x,y} = rank J_{x,y}, containing a submatrix K_{x,y} of size (k − P(x)) × (C(x) + # Orb(y)). Let K̃_{x,y} be the matrix obtained from K_{x,y} by deleting zero columns. As there are P(x) of such columns, we obtain a matrix of size (k − P(x)) × (C(x) + # Orb(y) − P(x)). A crucial observation is that K̃_{x,y} = J_{T^{P(x)} x, T^{P(x)} y}(T, k − P(x)).
Moreover, rank K̃_{x,y} < k − P(x), since otherwise (15) implies rank J_{x,y} = k, which contradicts (13). Hence, as T^{P(x)} x is periodic, one of Cases 1-4 applies to the points T^{P(x)} x, T^{P(x)} y with k − P(x) instead of k. Hence, we can use Proposition 3.1 for the points T^{P(x)} x, T^{P(x)} y and, consequently, Proposition 3.1 holds in this case. This completes the proof of the proposition.

4. Covering bounds
In this section we give bounds on the Lebesgue measure of the set of 'bad' parameters α ∈ R^m in terms of covers of the set X (or its subsets). Here and in the subsequent sections we keep all the notation from the beginning of Section 3, except that we write φ_α instead of φ^T_{α,k} for short. We also consider the matrix D_{x,y} defined in (10).
(i) If {B(y_i, ε_i)}_{i∈I} is a finite or countable cover of Y with balls centred at y_i ∈ Y, then

Remark 4.2. Without any assumptions on the measurability of Y and Z, the sets appearing on the left-hand sides of the inequalities in items (i) and (ii) might not be Lebesgue measurable.
In this case, it follows from the proof that the inequality holds for the outer Lebesgue measure.
Proof of Lemma 4.1. By the assumptions, φ_α and φ_α ∘ T are β-Hölder on Y with constants uniform with respect to α ∈ B_m(0, 1) (see [BGS20, p. 4955]). Hence, we can set a suitable uniform constant D = D(Y). To prove assertion (i), fix x ∈ Y and consider α ∈ B_m(0, 1). Then, by (9) and Lemma 2.2, one obtains the required bound on the Lebesgue measure of the corresponding set of parameters α ∈ B_m(0, 1). This completes the proof of assertion (i). Statement (ii) is proved in the same way, where instead of fixing x ∈ Y and considering an approximation of y by y_i, we approximate the pair (x, y) by (x_i, y_i). We omit the details.
If Y is compact, we can apply Corollary 3.2 to obtain the following bound.
Lemma 4.3. Let Y ⊂ X be as in Lemma 4.1 and assume additionally that Y is compact. Then for every δ > 0 there exists a positive lower bound with the following properties. Note that F_δ is compact, as the projection onto the first two coordinates of the compact set

{(x, y, α) ∈ Y × Y × B_m(0, 1) : φ_α(x) = φ_α(y) and ‖φ_α(Tx) − φ_α(Ty)‖ ≥ δ}.

Corollary 3.2 implies that the k-th singular value σ_k(D_{u,v}) is positive on F_δ. By the continuity of the maps h_i ∘ T^j, the matrix D_{x,y} depends continuously on (x, y), and hence so does σ_k(D_{x,y}) (see e.g. [GVL13, Corollary 8.6.2]). Therefore, by the compactness of F_δ, we can set

(16) σ(δ) = min_{(u,v)∈F_δ} σ_k(D_{u,v}) > 0.

If we now fix x ∈ Y and consider the set F_{δ,x} = {y ∈ Y : (x, y) ∈ F_δ}, we obtain a bound independent of x ∈ Y. Let {B(y_i, ε_i)}_{i∈I} be a finite or countable cover of Y with y_i ∈ Y. Set J = {i ∈ I : B(y_i, ε_i) ∩ F_{δ,x} ≠ ∅} and for i ∈ J choose an arbitrary point of B(y_i, ε_i) ∩ F_{δ,x}; then by the definition of F_{δ,x}, Lemma 4.1(i) and (16), there exists
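The singular-value tool used here is available directly in numpy; a toy Python sketch, where the Weyl-type perturbation bound |σ_k(D') − σ_k(D)| ≤ ‖D' − D‖ is one way to see the continuity of (x, y) ↦ σ_k(D_{x,y}) invoked above (the matrices are illustrative, not from the paper):

```python
import numpy as np

def sigma_k(D, k):
    """k-th singular value of D (singular values in decreasing order);
    sigma_k(D) > 0 if and only if rank D >= k."""
    return np.linalg.svd(D, compute_uv=False)[k - 1]

D = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])  # rank 2, singular values (2, 1)
# small perturbation of D, to test the Weyl-type stability of sigma_k
E = D + 1e-6 * np.ones_like(D)
```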

5. Deterministic predictability - proof of Theorem 1.16
We begin by proving Proposition 1.9, which shows the continuity of the prediction map for continuous deterministically predictable observables on compact spaces.
Proof of Proposition 1.9. The equivalence of assertions (a)-(c) is obvious. Suppose X is compact and T and h are continuous. For every closed set F ⊂ R^k, we have S^{−1}(F) = φ((φ ∘ T)^{−1}(F)). Since φ ∘ T is continuous and X is compact, the set (φ ∘ T)^{−1}(F) is compact. Hence, by the continuity of φ, the set S^{−1}(F) is compact. This shows that the map S is continuous.
We will also repeatedly use the following lemma.
Lemma 5.1. Let T : X → X be a locally Lipschitz map on a set X ⊂ R^N. Let h_1, ..., h_m : X → R be locally β-Hölder observables and let k ∈ N. Then X can be presented as a countable union X = ∪_{n=1}^∞ X_n of bounded sets X_n, such that X_n ⊂ X_{n+1} for n ∈ N, the maps T, T², ..., T^k are Lipschitz on each X_n and h_1, ..., h_m are β-Hölder on each set T^i(X_n), i = 0, ..., k. If X is Borel, then the sets X_n can also be chosen to be Borel.
Proof. Using the local Lipschitz and Hölder conditions together with the fact that R^N is a hereditarily Lindelöf space, one can find a countable open cover V of X by bounded sets, such that for every V ∈ V, the map T is Lipschitz on V and h, h_1, ..., h_m are β-Hölder on V. Let U be the collection of all sets of the form V_{i_0} ∩ T^{−1}(V_{i_1}) ∩ ... ∩ T^{−k}(V_{i_k}) with V_{i_0}, ..., V_{i_k} ∈ V.

Let us now turn to the proof of the deterministic time-delay prediction theorem (Theorem 1.16). We will actually prove the following extended version of the result. Recall that H^s, s > 0, denotes the s-dimensional Hausdorff measure (see Definition 2.7).
Theorem 5.2 (Deterministic time-delayed prediction theorem - extended version). Let T : X → X be a locally Lipschitz map on a set X ⊂ R^N. Fix k ∈ N and β ∈ (0, 1] such that H^{βk}(X × X) = 0. Let {h_1, ..., h_m} be a 2k-interpolating family in X consisting of locally β-Hölder functions. Let h : X → R be a locally β-Hölder observable. Then the observable h_α = h + Σ_{j=1}^m α_j h_j is deterministically k-predictable for Lebesgue-almost every α = (α_1, ..., α_m) ∈ R^m. Furthermore, if X is compact, then the prediction map S_α corresponding to h_α is continuous for Lebesgue-almost every α ∈ R^m.

Proof of Theorem 1.16 assuming Theorem 5.2. Since a countable intersection of prevalent sets is prevalent (see e.g. [HSY92]), we can fix a number k > 2 dim_B X. Note that by (8) and [ORS16, Lemma A1], dim_B(X × X) ≤ 2 dim_B X < k, hence H^k(X × X) = 0 and Theorem 5.2 can be applied with β = 1. The prevalence of deterministically predictable observables in the space Lip_loc(X) follows from Remark 2.4.
Proof of Theorem 5.2. Without loss of generality, it suffices to show (18) for almost every α in the closed unit ball B_m(0, 1). Let X = ∪_{n=1}^∞ X_n be the decomposition from Lemma 5.1 (so that we are in a position to invoke Lemma 4.1 with Y = X_n). Note that in order to establish (18), it suffices to prove Leb(A_n) = 0 for every n ∈ N, where A_n is the corresponding set of 'bad' parameters α ∈ B_m(0, 1). By Corollary 3.2, σ_k(D_{x,y}) > 0 for x, y ∈ X and α ∈ R^m such that φ_α(x) = φ_α(y) and φ_α(Tx) ≠ φ_α(Ty). Therefore, defining for n, j ∈ N the corresponding sets A_{n,j} and Z_{n,j}, we have A_n ⊂ ∪_{j∈N} A_{n,j}. Thus, our goal is to prove

(19) Leb(A_{n,j}) = 0 for every n, j ∈ N.
so there exists a countable cover of Z_{n,j} by balls {B((x_i, y_i), ε_i)}_{i∈N} with (x_i, y_i) ∈ Z_{n,j}. By Lemma 4.1(ii) applied with ε = 0, we obtain a bound on Leb(A_{n,j}) in terms of the cover. As for fixed n and j we can take ρ arbitrarily small, the above yields (19). This completes the proof of (18), which implies the existence of the prediction map S_α for almost every α ∈ R^m. The continuity of S_α for compact X follows from Proposition 1.9.
6. Almost sure predictability - proofs of Theorems 1.14 and 1.18

We start this section by proving Proposition 1.3, describing the relation of χ_ε and σ_ε from Definition 1.1 to the Farmer-Sidorowich algorithm for ergodic systems.
Proof of Proposition 1.3. Consider x ∈ X, y ∈ R^k, ε > 0 and the k-delay coordinate map φ corresponding to h. Note that since µ is T-invariant, we have ∫ |h ∘ T^i| dµ = ∫ |h| dµ for every i ≥ 0. Since µ is T-invariant and ergodic, by the Birkhoff ergodic theorem (see e.g. [Wal82, Theorem 1.14]) we obtain (20) for µ-almost every x ∈ X, φ_*µ-almost every y ∈ R^k and ε > 0. This proves the first assertion of the proposition.
To show the second assertion, note that, analogously as in (20), by the Birkhoff ergodic theorem we obtain the corresponding convergence for µ-almost every x and φ_*µ-almost every y. Furthermore, by the Schwarz inequality and again by the Birkhoff ergodic theorem, for µ-almost every x ∈ X and φ_*µ-almost every y ∈ φ(X) the cross term satisfies |III| → 0 as n → ∞ by (20). We conclude that Var_{x,ε,n}(y) → σ²_ε(y) as n → ∞ for µ-almost every x ∈ X and φ_*µ-almost every y ∈ R^k, which shows the second assertion of the proposition.

Now we prove the results on the properties of continuous almost surely deterministically predictable observables for continuous systems and the relations between the two notions of almost sure predictability (Proposition 1.13 and Theorem 1.14).
Proof of Proposition 1.13. First, we prove assertion (i). Since X_h is Borel, by the regularity of the measure µ we can assume that it is σ-compact (i.e. a countable union of compact sets).
Then, as φ is continuous by assumption, the set φ(X_h) is σ-compact, hence Borel. Moreover, for every closed set F ⊂ R^k, we have S^{−1}(F) = φ((φ ∘ T)^{−1}(F) ∩ X_h). Hence, by the continuity of φ, the set S^{−1}(F) is σ-compact, hence Borel. This shows that the map S is Borel and proves assertion (i).
To show assertion (ii), assume φ ∘ T ∈ L¹(µ) and note that this holds in particular if h is bounded, since then ‖φ ∘ T‖ ≤ k sup |h|. Then, by the almost sure deterministic predictability of h, we have S ∈ L¹(φ_*µ) and by the integral differentiation theorem (see e.g. [Fed69, Theorem 2.9.8] or [LY85, Lemma 4.1.2]) the required convergence holds for φ_*µ-almost every y ∈ φ(X). This proves assertion (ii). Finally, note that if µ is T-invariant, then the condition h ∈ L¹(µ) implies φ ∘ T ∈ L¹(µ).

Proof of Theorem 1.14. To show assertion (i), suppose φ ∘ T ∈ L²(µ) and note (similarly as in the proof of Proposition 1.13) that this holds in particular if h is bounded. For y ∈ R^k and ε > 0 such that µ(φ^{−1}(B(y, ε))) > 0, we consider the corresponding conditional averages. By the almost sure deterministic predictability of h we have S ∈ L²(φ_*µ), so ‖S(·) − S(y)‖² ∈ L¹(φ_*µ). By the integral differentiation theorem, the required convergence holds for φ_*µ-almost every y ∈ φ(X). Furthermore, by Proposition 1.13(ii), the same is true for φ_*µ-almost every y ∈ φ(X). This shows that h is almost surely k-predictable, proving assertion (i).

Now we prove assertion (ii). In view of assertion (i), it suffices to show that if h is almost surely k-predictable, then it is almost surely deterministically k-predictable. To do it, we use the properties of the system of conditional measures {µ_y}_{y∈φ(X)} for the map φ, in the sense of [BGS22, Definition 2.4]. In particular, µ_y is a Borel probability measure on φ^{−1}({y}) for φ_*µ-almost every y ∈ φ(X), and µ(A) = ∫_{φ(X)} µ_y(A) dφ_*µ(y) for every µ-measurable set A. Note that the function φ(X) ∋ y ↦ ∫ φ ∘ T dµ_y is φ_*µ-measurable by [BGS22, Theorem 2.5], so the function X ∋ x ↦ ∫ φ ∘ T dµ_{φ(x)} is µ-measurable. Hence, the set X̃_h is µ-measurable. By [BGS22, Corollary 3.3], for φ_*µ-almost every y ∈ φ(X) there exists a set X^{(y)} ⊂ φ^{−1}({y}) of full µ_y-measure, such that φ ∘ T is constant on X^{(y)}. As X^{(y)} ⊂ X̃_h, by the properties of the system of conditional measures we obtain

µ(X ∖ X̃_h) = ∫_{φ(X)} µ_y(φ^{−1}({y}) ∖ X̃_h) dφ_*µ(y) = 0,

so the set X̃_h has full µ-measure. By the definition of X̃_h, if x_1, x_2 ∈ X̃_h and φ(x_1) = φ(x_2), then φ(Tx_1) = φ(Tx_2). The regularity of the measure µ implies that there exists
a Borel set X_h ⊂ X̃_h of full µ-measure. This shows that h is almost surely deterministically k-predictable, proving assertion (ii).
As previously, note that if µ is T-invariant, then the condition h ∈ L²(µ) implies φ ∘ T ∈ L²(µ) (see the proof of Proposition 1.3). Now we prove the almost sure time-delay prediction theorem (Theorem 1.18). Again, we actually show its extended version.

Theorem 6.1 (Almost sure time-delay prediction theorem - extended version). Let X ⊂ R^N be a Borel set, µ a Borel probability measure on X and T : X → X a locally Lipschitz map. Fix k ∈ N and β ∈ (0, 1] such that µ ⊥ H^{βk}, and let {h_1, ..., h_m} be a 2k-interpolating family in X consisting of locally β-Hölder functions. Let h : X → R be a locally β-Hölder observable. Then for Lebesgue-almost every α = (α_1, ..., α_m) ∈ R^m, the observable h_α = h + Σ_{j=1}^m α_j h_j is almost surely deterministically k-predictable with respect to µ, i.e. there exists a prediction map S_α defined on a Borel set X_α ⊂ X of full µ-measure. If µ is T-invariant, then X_α can be chosen to satisfy T(X_α) ⊂ X_α. If, additionally, X is compact and H^{βk}(supp µ) = 0, then S_α is continuous for almost every α ∈ R^m.
Proof of Theorem 1.18 assuming Theorem 6.1. Again, since a countable intersection of prevalent sets is prevalent, we can fix a number k > dim_H µ. Fix β = 1 and apply Theorem 6.1, noting that dim_H µ < k implies µ ⊥ H^k. As before, the prevalence of almost surely predictable observables follows from Remark 2.4.
Proof of Theorem 6.1. The proof is similar to the proof of Theorem 5.2. The main difference is that due to Fubini's theorem we can work with varying y for a fixed x, rather than varying the pair (x, y). Hence, it suffices to consider covers of X rather than covers of X × X. Detailed arguments are given below.
Once again, note that it is enough to prove the theorem for almost every α ∈ B_m(0, 1). By the assumption µ ⊥ H^{βk}, there exists a Borel subset of X of full µ-measure and zero H^{βk}-measure. Combining this with Lemma 5.1, we can find a non-decreasing sequence of bounded Borel sets X_n of positive µ-measure, satisfying µ(∪_{n=1}^∞ X_n) = 1, such that T, T², ..., T^k are Lipschitz on each X_n and h, h_1, ..., h_m are β-Hölder on each set T^i(X_n), i = 0, ..., k. Denote by µ_n the normalized restriction of µ to X_n. By Theorem 1.14, to prove the main assertion of the theorem, it is enough to show that for a fixed n ≥ 1 and almost every α ∈ B_m(0, 1) there exists a Borel set X_α ⊂ X_n of full µ_n-measure such that φ_α(x) = φ_α(y) implies φ_α(Tx) = φ_α(Ty) for every x, y ∈ X_α. To this end, it is enough to prove that (21) holds for every x ∈ X. Indeed, if (21) holds, then by Fubini's theorem^12 for the measure µ_n ⊗ Leb, the set of exceptional pairs (x, α) has µ_n ⊗ Leb-measure zero, and consequently, for Lebesgue-almost every α ∈ B_m(0, 1), the suitable set X_α ⊂ X_n of full µ_n-measure can be defined as the complement of the zero-measure set above.
Hence, to prove (21), it is enough to show Leb(A_{n,j}) = 0 for every n, j ∈ N and x ∈ X_n. Apply Lemma 4.1 for Y = X_n to find a suitable number D = D(X_n). Fix ρ > 0. Since H^{βk}(X_{n,j}) = 0 by assumption, there exists a countable cover of X_{n,j} by balls B(y_i, ε_i), i ∈ N, with y_i ∈ X_{n,j}, with the sum of the ε_i^{βk} smaller than ρ. By Lemma 4.1(i) applied with ε = 0, we obtain a bound on Leb(A_{n,j}) in terms of ρ. As for fixed n, j and x we can take ρ arbitrarily small, we have Leb(A_{n,j}) = 0. This gives (21) and completes the proof of the existence of the prediction map S_α for Lebesgue-almost every α.

^12 The measurability of the set to which Fubini's theorem is applied can be checked e.g. using [BGS20,
The forward invariance of the set X_α in the case when µ is a T-invariant measure follows from Remark 1.12.
Suppose now that X is compact and consider the question of the continuity of S_α. Unlike in Theorem 5.2, we cannot invoke Proposition 1.9, as the set X_α is not guaranteed to be compact. We therefore have to combine the arguments from the proof above with Proposition 1.9 to conclude a stronger property. This requires the stronger assumption H^{βk}(supp µ) = 0 (in fact, as shown in Corollary 8.2, the assumption µ ⊥ H^{βk}, or even dim_H µ < βk, is too weak to guarantee the continuity of S_α for typical α).
Set Y = supp µ. By definition, H^{βk}(Y) = 0. Moreover, Y is compact, T, T², ..., T^k are Lipschitz on Y and h, h_1, ..., h_m are β-Hölder on each set T^i(Y), i = 0, ..., k. Therefore, we can repeat the proof of (21) for some sequence of points y_n ∈ Y and a fixed ε > 0. Then, by the compactness of Y, we can take a subsequence y_{n_k} → y for some y ∈ Y, so by the continuity of φ_α, we have φ_α(x) = φ_α(y) and ‖φ_α(Tx) − φ_α(Ty)‖ ≥ ε for a set of α ∈ B_m(0, 1) of positive Lebesgue measure, contradicting (22). Hence, for every x ∈ Y, property (23) holds for almost every α ∈ B_m(0, 1). By Fubini's theorem for the measure µ ⊗ Leb, for almost every α ∈ B_m(0, 1) there exists a set Y_α ⊂ Y of full µ-measure, such that (23) holds for every x ∈ Y_α. Let X̃_α = X_α ∩ Y_α. Then X̃_α is a set of full µ-measure, S_α is defined on φ_α(X̃_α) and (23) holds for every x ∈ X̃_α. We claim that S_α is continuous on φ_α(X̃_α). Indeed, suppose z_i ∈ φ_α(X̃_α) for i ∈ N and z_i → z for some z ∈ φ_α(X̃_α). Let y_i, x ∈ X̃_α be such that φ_α(y_i) = z_i and φ_α(x) = z. Passing to a subsequence, we can assume y_i → y for some y ∈ Y. By the continuity of φ_α, we have φ_α(y) = z = φ_α(x), so (23) implies φ_α(Ty) = φ_α(Tx). Hence, by the continuity of T,

S_α(z_i) = φ_α(Ty_i) → φ_α(Ty) = φ_α(Tx) = S_α(z).

This shows the continuity of S_α on φ_α(X̃_α) for almost every α ∈ B_m(0, 1).

Remark 6.2. Similarly to the proof of [BGS20, Theorem 4.3], the main part of Theorem 6.1 (almost sure predictability of prevalent observables) can be extended to the case when the measure µ is σ-finite. We leave the details to the reader.

7. SSOY Conjectures - proofs of Theorems 1.4 and 1.6

In this section we prove the general SSOY predictability conjecture (Theorem 1.4) and a part of the SSOY prediction error conjecture, i.e. the prediction error estimate (Theorem 1.6). Again, we prove extended versions of the theorems.
The first result, extending Theorem 1.4, follows directly from Theorem 6.1, applied with β = 1, and Theorem 1.14.

Theorem 7.1 (General SSOY predictability conjecture - extended version). Let X ⊂ R^N be a compact set, µ a Borel probability measure on X and T : X → X a Lipschitz map. Fix k ∈ N and β ∈ (0, 1] such that µ ⊥ H^{βk}. Let {h_1, ..., h_m} be a 2k-interpolating family on X consisting of β-Hölder functions. Let h : X → R be a β-Hölder observable. Then for almost every α ∈ R^m, the observable h_α = h + Σ_{j=1}^m α_j h_j is almost surely predictable.

Recall that by Theorem 1.14, almost sure predictability of h_α in Theorem 7.1 is equivalent to almost sure deterministic predictability of h_α. The extended version of the prediction error estimate (Theorem 1.6) is divided into three parts, where the first two correspond to assertion (ii) of the theorem, while the third one deals with assertion (i). First, we observe that assertion (ii) of Theorem 1.6 holds for all deterministically predictable continuous observables on a compact space with continuous dynamics.
Theorem 7.2. Let X ⊂ R^N be a compact set and T : X → X a continuous map. If a continuous observable h : X → R is deterministically k-predictable for some k ∈ N and the prediction map S is continuous, then for every δ > 0 there exists ε_0 = ε_0(h, k, δ) > 0 such that σ_ε(φ(x)) < δ for every x ∈ X and every 0 < ε < ε_0, where φ is the k-delay coordinate map corresponding to h and σ_ε comes from Definition 1.1. In particular, the assertion holds if φ is injective.
Proof. Recall that the prediction map S : φ(X) → φ(X) satisfies S ∘ φ = φ ∘ T. Therefore, for every x ∈ X and ε > 0 we have φ(Tx) = S(φ(x)). Moreover, as S is uniformly continuous on the compact set φ(X), for every δ > 0 there exists ε_0 > 0 such that ‖S(y) − S(z)‖ ≤ δ/2 for every y, z ∈ φ(X) with ‖y − z‖ < ε_0.
By the definition of χ_ε and σ_ε (see Definition 1.1), this gives ‖φ(Tx) − χ_ε(φ(x))‖ ≤ δ/2 and σ_ε(φ(x)) ≤ δ/2 < δ. If φ is injective on X, then the prediction map S is given by S = φ ∘ T ∘ φ^{−1}, and hence it is continuous (as φ^{−1}, being the inverse of a continuous injective map on a compact set, is continuous).

Theorem 7.2 implies the following corollary. Similarly as above, we write σ_{α,ε} for the σ_ε from Definition 1.1 corresponding to the observable h_α.
Corollary 7.3 (Prediction error estimate, assertion (ii) – extended version). Let X ⊂ R^N be a compact set, µ a Borel probability measure on X and T : X → X a Lipschitz map. Fix k ∈ N and β ∈ (0, 1] such that H^{βk}(X × X) = 0. Let {h_1, ..., h_m} be a 2k-interpolating family on X consisting of β-Hölder functions and let h : X → R be a β-Hölder observable. Then for almost every α = (α_1, ..., α_m) ∈ R^m and every δ > 0 there exists ε_0 = ε_0(α, δ) > 0 such that σ_{α,ε}(φ_α(x)) < δ for every x ∈ X and every 0 < ε < ε_0, where φ_α is the k-delay coordinate map corresponding to the observable h_α = h + Σ_{j=1}^m α_j h_j.

Proof. By Theorem 5.2, for almost every α ∈ R^m the observable h_α is deterministically k-predictable and the prediction map is continuous. Hence, the result follows from Theorem 7.2.
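Definition 1.1 is not reproduced in this excerpt; the sketch below assumes the usual SSOY form of the empirical predictor, where χ_ε(y) averages φ(Tx) over sample points x whose delay vectors lie ε-close to y and σ_ε(y) is the corresponding standard deviation. The logistic map, the observable h and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def delay_map(h, T, x, k):
    """k-delay coordinate map phi(x) = (h(x), h(Tx), ..., h(T^{k-1} x))."""
    out = []
    for _ in range(k):
        out.append(h(x))
        x = T(x)
    return np.array(out)

def empirical_predictor(points, h, T, k, eps):
    """Build chi_eps and sigma_eps from a finite sample (assumed form of Def. 1.1).

    chi(y):   average of phi(Tx) over sample points x with |phi(x) - y| <= eps.
    sigma(y): root-mean-square spread of those phi(Tx) around chi(y).
    """
    Phi = np.array([delay_map(h, T, x, k) for x in points])
    PhiT = np.array([delay_map(h, T, T(x), k) for x in points])

    def chi(y):
        mask = np.linalg.norm(Phi - y, axis=1) <= eps
        return PhiT[mask].mean(axis=0)

    def sigma(y):
        vals = PhiT[np.linalg.norm(Phi - y, axis=1) <= eps]
        return float(np.sqrt(((vals - vals.mean(axis=0)) ** 2).sum(axis=1).mean()))

    return chi, sigma

# Logistic map with h = identity and k = 2: phi is injective, so the
# prediction error sigma_eps shrinks as eps does.
T = lambda x: 4.0 * x * (1.0 - x)
h = lambda x: x
rng = np.random.default_rng(0)
pts = list(rng.uniform(0.01, 0.99, size=2000))
chi, sigma = empirical_predictor(pts, h, T, k=2, eps=0.05)
y = delay_map(h, T, 0.3, 2)
print(chi(y), delay_map(h, T, T(0.3), 2), sigma(y))
```

Since T is Lipschitz on [0, 1], points with delay vectors ε-close to φ(0.3) have images with delay vectors O(ε)-close to φ(T(0.3)), so both the bias of χ_ε and the spread σ_ε are small, matching the qualitative content of Corollary 7.3.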
Note that Theorem 7.2 together with Corollary 7.3 provides an extended version of assertion (ii) of Theorem 1.6. In fact, to obtain Theorem 1.6(ii), it is enough to set β = 1 in Corollary 7.3 and use the assumption k > 2 dim_B X together with (17) to conclude that H^k(X × X) = 0.

8. Counterexamples
In this section we present an example showing that neither of the assumptions dim_H µ < k and ID(µ) < k is in general sufficient to guarantee the continuity of the map S in Theorem 1.18 or the prediction error estimates in Theorem 1.6. In both cases we show that there exists an open set of Lipschitz observables for which the conclusion of the theorem fails. As any prevalent set is dense [HSY92, Fact 2'], this shows that the results are not true under the weaker assumptions.

Note that A_0 is dense in the unit interval [0, 1] and A is dense in the union of two copies of the unit interval. For the rest of this section we set the dynamics T : X → X and note that T(A) = A. The following proposition shows that for an open set of Lipschitz observables, the prediction map for k = 1 (whenever it exists) cannot be continuous on the full image of A.
Proposition 8.1. There exists a non-empty open set U ⊂ Lip(X) such that for every h ∈ U there exist a point z ∈ A and a sequence z_i ∈ A with h(z_i) → h(z) as i → ∞, but such that h(Tz_i) does not converge to h(Tz).
The following corollary shows that the conditions dim_H µ < k and ID(µ) < k are not sufficient to obtain the continuity of the prediction map S in Theorem 1.18 for a prevalent set of Lipschitz observables.
Corollary 8.2. There exist a compact set X ⊂ R^2, a Borel probability measure µ on X with dim_H µ = ID(µ) = 0 and a Lipschitz map T : X → X such that for a non-empty open set of Lipschitz observables h : X → R and k = 1, the prediction map S : φ(X_h) → φ(X_h) (provided it exists) is not continuous on a set of positive φ_*µ-measure. Therefore, the continuity of S on a set of full φ_*µ-measure does not hold for a prevalent set of Lipschitz observables h : X → R.
Proof. Define X and T as above. Let µ be a Borel probability measure on R^2 such that µ(A) = 1 and µ({x}) > 0 for every x ∈ A. Note that dim_H µ = ID(µ) = 0, as µ is purely atomic (the fact dim_H µ = 0 follows directly from the definition of the Hausdorff dimension, while ID(µ) = 0 follows from [Rén59, Theorem 3]). Fix k = 1, so that φ = h. By Proposition 8.1, there exists a non-empty open set U ⊂ Lip(X) such that for every h ∈ U there exist z ∈ A and a sequence z_i ∈ A, i ∈ N, such that φ(z_i) → φ(z), but φ(Tz_i) does not converge to φ(Tz). Assume for a contradiction that S is well defined and continuous on a set of full φ_*µ-measure. As µ({z_i}) > 0, S must be defined at φ(z_i) for all i, and likewise at φ(z). By (6), we have S(φ(z_i)) = φ(Tz_i) and S(φ(z)) = φ(Tz). Thus, S(φ(z_i)) does not converge to S(φ(z)), even though φ(z_i) converges to φ(z), contradicting the continuity of S.
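The displayed formulas defining A_0, A and T are missing from this excerpt, so the snippet below uses a stand-in with the same flavour (the doubling map on dyadic rationals, not the paper's construction) to show concretely how a prediction map S with S(φ(x)) = φ(Tx) can fail to be continuous at an atom: for k = 1 and h the identity, points z_i accumulate at z = 1/2, yet their images Tz_i accumulate at 1 while Tz = 0.

```python
# Stand-in example (assumed, not the paper's sets A_0, A):
# doubling map on [0, 1), which preserves the dense set of dyadic rationals.
T = lambda x: (2 * x) % 1.0
h = lambda x: x  # k = 1, so the delay coordinate map phi equals h

z = 0.5
zs = [0.5 - 2.0 ** (-i) for i in range(3, 20)]  # dyadic z_i -> z

# h(z_i) -> h(z), but h(T z_i) -> 1 while h(T z) = 0: any S with
# S(phi(x)) = phi(Tx) is discontinuous at phi(z).
print(h(zs[-1]) - h(z))
print(h(T(zs[-1])), h(T(z)))
```

Since every dyadic rational carries positive mass under a measure like µ above, this discontinuity sits inside every set of full φ_*µ-measure, which is exactly the mechanism exploited in the proof of Corollary 8.2.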
To show that Theorem 1.6 does not hold with dim_H µ or ID(µ) in place of dim_B X, we need to consider a more specific family of measures. Let β_n, n = 1, 2, ..., be a positive sequence satisfying (27) for some constant M > 0. Consider a purely atomic probability measure ν on A given by ν({q/2^n}) = β_n/2^{n−1} for n ≥ 1 and odd q ∈ {0, ..., 2^n − 1}. Note that with this definition, every dyadic rational

Combining the above estimates completes the proof.
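Condition (27) on β_n is not shown in this excerpt; the sketch below simply takes β_n = 2^{−n} (an assumed choice) to check the bookkeeping behind ν: level n carries 2^{n−1} atoms q/2^n (q odd), each of mass β_n/2^{n−1}, so the level contributes total mass β_n and the truncated measure has mass Σ_{n≤N} β_n.

```python
from fractions import Fraction

def nu_atoms(N, beta):
    """Atoms of nu up to level N: nu({q/2^n}) = beta(n) / 2^(n-1) for odd q."""
    atoms = {}
    for n in range(1, N + 1):
        w = beta(n) / 2 ** (n - 1)
        for q in range(1, 2 ** n, 2):  # odd q in {0, ..., 2^n - 1}
            atoms[Fraction(q, 2 ** n)] = w
    return atoms

beta = lambda n: Fraction(1, 2 ** n)  # assumed choice; (27) is not shown here
atoms = nu_atoms(12, beta)
print(sum(atoms.values()))        # total mass 1 - 2^(-12), tending to 1
print(atoms[Fraction(3, 8)])      # the single atom at 3/8 (n = 3, q = 3)
```

Since q is odd, each fraction q/2^n is already in lowest terms, so the atoms at different levels are distinct and every dyadic rational in (0, 1) receives mass from exactly one level.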
gives the desired family of sets.See the proof of [BGS20, Theorem 4.3] for more details.
2^n − 1} \ 2N consist of pairwise disjoint sets, hence each point of Q_n can belong to at most one set in each of these families. Therefore, #(Q_n \ Y_n) ≤ 2 · 2^{n−K}, so combining this with (28) and recalling that K = 8, we obtain #Y_n ≥ 2^{n−7}.
A T-invariant Borel probability measure µ on Λ is called a natural measure if

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} δ_{T^j x} = µ

in the weak-* topology for Lebesgue-almost every x in the basin of attraction of Λ.
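The Birkhoff-average condition in this definition can be illustrated numerically. The example below is an assumed one, not taken from the paper: for the logistic map T(x) = 4x(1−x), the absolutely continuous invariant measure has density 1/(π√(x(1−x))), whose mean is 1/2, so time averages of the observable f(x) = x along a typical orbit should approach 1/2.

```python
def birkhoff_average(T, f, x, n):
    """Time average (1/n) * sum_{j<n} f(T^j x) along the orbit of x."""
    s = 0.0
    for _ in range(n):
        s += f(x)
        x = T(x)
    return s / n

# Logistic map: its natural (here, absolutely continuous invariant) measure
# has mean 1/2, so the empirical measures of a typical orbit reproduce it.
T = lambda x: 4.0 * x * (1.0 - x)
avg = birkhoff_average(T, lambda x: x, 0.2, 200_000)
print(avg)  # close to 0.5
```

This is precisely the sense in which k time-delayed measurements sample the natural measure µ in the SSOY setting: the empirical distribution of the orbit, and hence of its delay vectors, converges to the µ-distribution.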