Stability of invariant measures

We generalize various notions of stability of invariant sets of dynamical systems to invariant measures, by defining a topology on the set of measures. The defined topology is similar, but not topologically equivalent, to the weak* topology, and it also differs from the topologies induced by the Riesz Representation Theorem. It turns out that the constructed topology arises from a limit case of the $p$-optimal transport problem, namely $p=\infty$.


Introduction
The motivation for this paper is twofold. The first motivation comes from dynamical systems: we look for the "right" topology on the set of measures, namely a topology which carries certain properties of a dynamical system on a metric space over to the induced dynamical system on the set of measures. The second motivation is to investigate an alternative formulation of the well-known optimal transport problem.
The "right" topology on the set of measures. Let f be a continuous function (i.e. a discrete dynamical system) on a metrizable topological space X, and f ♯ be the induced function on the set of Borel probability measures P (X) on X. Then f ♯ is a discrete dynamical system on P (X). The study of the dynamical system f ♯ can often give useful information on the system f , as was shown for example in [5], [12]. However, a number of fundamental properties of a dynamical system f have no analogue for the dynamical system f ♯ in any of the standard topologies, such as the weak* topology on P (X). We give two examples. Let x ∈ X be a sink or a source of f , and let δ x be the probability measure supported on {x}. Then, typically, δ x is not a sink or a source of f ♯ in the weak* topology. A detailed discussion of this is in Section 5. Now, let A ⊂ X be a closed invariant set of f , and in addition an attracting set. Let A ♯ be the set of all measures in P (X) whose support is a subset of A. One can see (and we show in Section 5) that A ♯ is an invariant set of f ♯ , but is typically not an attractor with respect to the weak* topology.
The same conclusions as in the last two examples hold for flows and semiflows. They also hold for other usual topologies on the set of measures, induced by topologies on the set C * (X) of all bounded linear functionals on the set of continuous real valued functions C(X), and the Riesz Representation Theorem.
Our goal is to find the "right" topology on the set of measures which would naturally generalize various notions of stability and attraction to dynamical systems on the set of measures. The topology should be close enough to the weak* topology, so that one can find nontrivial compact sets of measures, and use it in various applications. The topology constructed here is called the dynamical topology. We will also show that, by identifying stable or attracting sets of measures (with respect to the dynamical topology) rather than sets of points, we get better insight and more information on the behavior of a chosen dynamical system.
The ∞-optimal transport problem. Let X be a metric space with a metric d, and µ, ν be two Borel probability measures on X. Then the p-Wasserstein distance W p , where 1 ≤ p < ∞, is the function
$$W_p(\mu, \nu) = \inf_{\gamma \in T(\mu, \nu)} \left( \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right)^{1/p}. \tag{1.2}$$
The set T (µ, ν) is the set of all transports; i.e. the set of all Borel probability measures γ on X × X, such that π 1♯ γ = µ, π 2♯ γ = ν, where π 1 , π 2 are the projections of X × X to the first, resp. second variable (transports are sometimes called "couplings" in the probability and some ergodic theory literature).
The measure γ which minimizes (1.2) is a solution of the optimal transport problem in the p-norm. Intuitively, the minimizer γ is the measure which describes how the points in the support of µ are coupled to the points in the support of ν, so that the p-norm of the coupling distances is minimal.
There is a rich literature on the optimal transport problem, including proofs of existence, uniqueness, and properties for various spaces X and norms 1 ≤ p < ∞ (see e.g. [1], [2], [4], [11]). (The metric d(x, y) can also be replaced by a more general cost function c(x, y).) In this paper we study the case p = ∞. This case may have implications for various optimization problems where the cost of transport does not depend on the mass to be transported, but only on the maximal transport distance.
For all 1 ≤ p < ∞ the formula (1.2) generates a metric on the space of probability measures P (X), called the p-Wasserstein metric. One can show that the p-Wasserstein metrics for 1 ≤ p < +∞ are uniformly equivalent to each other and to the Prokhorov metric, and so generate the weak* topology on P (X) (see e.g. [6]). We will show that they are neither uniformly nor topologically equivalent to the metric in the case p = +∞, and that the topology in the case p = +∞ is the dynamical topology defined in the first part of the paper.
The structure of the paper. We start with definitions of the dynamical metric and the dynamical topology, which are the main tools of this paper. Then we show that the dynamical topology indeed differs from the weak* topology, and from topologies induced by the Riesz Representation Theorem. In the third section, we discuss various properties of the dynamical topology. In the fourth section we focus on a characterization of convergence of sequences of measures with respect to the dynamical topology. The proof of the characterization of convergence is combinatorial in character (as such, it could have implications in combinatorial ergodic theory). We continue with proofs that the dynamical topology indeed gives natural generalizations of various notions of stability to dynamical systems on sets of measures. Finally we analyze the ∞-optimal transport problem, and we show that the ∞-Wasserstein metric generated by the ∞-optimal transport problem generates the dynamical topology on P (X). We also prove existence of a solution of the ∞-optimal transport problem.

Definition of the dynamical topology
In this paper X is always a compact metric space with metric d, made into a measurable space by its Borel σ-algebra. Let M (X) be the space of all finite Borel measures on X, and P (X) ⊂ M (X) the space of all probability measures. Let C(X) be the Banach space of all real valued continuous functions f : X → R. Then M (X) can be naturally embedded in the dual space C * (X) of all bounded linear functionals on C(X).
We denote by τ w the weak* topology on M (X), and by τ u the uniform topology (i.e. the topology induced by the sup-norm on the dual C * (X) of C(X)). The topology τ u is much finer than the topology τ w , and as such is seldom used in dynamical systems. We will show that both topologies differ from the topology τ d to be constructed. We use the notation "w-", "u-", and "d-" ("d" for the dynamical topology, yet to be defined) when referring to properties of a set or a sequence in the various topologies. In particular, w-convergent, u-convergent, and d-convergent mean that a sequence of measures in P (X) is convergent with respect to the weak*, uniform, or dynamical topology respectively.
We now define the dynamical metric and topology on the set P (X). Let I be the unit interval [0, 1], and λ the Lebesgue measure defined on the family of Borel-measurable subsets of I. Given two functions f, g : I → X, we define their distance as $D(f, g) = \sup_{a \in I} d(f(a), g(a))$.
The distance D is well defined because X is compact. It is straightforward to check that D is symmetric, and that it satisfies the triangle inequality D(f, g) + D(g, h) ≥ D(f, h). If f : I → X is a (Borel) measurable function, then f ♯ λ = µ denotes the measure µ(A) = λ(f −1 (A)) for all measurable A ⊆ X.
Definition 1. We define the distance between two probability measures µ, ν as ∆(µ, ν) = inf D(f, g), where the infimum is taken over all measurable functions f, g : I → X satisfying µ = f ♯ λ, ν = g ♯ λ.
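As an illustration of Definition 1, the following Python sketch treats finitely supported measures on the real line (an assumption made only for this example; the names `quantile_pieces`, `common_pieces`, `sup_distance` and `p_cost` are hypothetical). It represents µ and ν by nondecreasing step functions f, g : I → X, refines I into intervals on which both are constant, and evaluates D(f, g). By Definition 1 this value is an upper bound for ∆(µ, ν); on the line the monotone rearrangement is known to be an optimal pairing, so here the bound should coincide with ∆(µ, ν).

```python
from fractions import Fraction


def quantile_pieces(atoms):
    """Step function f: I -> R with f#lambda = sum of weight * delta_point.

    `atoms` is a list of (point, weight) pairs, positive weights summing to 1.
    Returns [(right_endpoint, point), ...]: f equals `point` on the piece of I
    that ends at `right_endpoint`; pieces are ordered by increasing point.
    """
    pieces, total = [], Fraction(0)
    for point, weight in sorted(atoms):
        total += Fraction(weight)
        pieces.append((total, point))
    return pieces


def common_pieces(mu, nu):
    """Refine I into intervals on which both step functions are constant.

    Returns [(length, f_value, g_value), ...].
    """
    f, g = quantile_pieces(mu), quantile_pieces(nu)
    out, left, i, j = [], Fraction(0), 0, 0
    while i < len(f) and j < len(g):
        right = min(f[i][0], g[j][0])
        out.append((right - left, f[i][1], g[j][1]))
        left = right
        if f[i][0] == right:
            i += 1
        if g[j][0] == right:
            j += 1
    return out


def sup_distance(mu, nu):
    """D(f, g): the largest value of |f - g| over the common pieces."""
    return max(abs(x - y) for _, x, y in common_pieces(mu, nu))


def p_cost(mu, nu, p):
    """(integral over I of |f - g|^p)^(1/p) for the same pair f, g."""
    total = sum(m * abs(x - y) ** p for m, x, y in common_pieces(mu, nu))
    return float(total) ** (1.0 / p)


n = 100
mu_n = [(0, Fraction(1, n)), (1, Fraction(n - 1, n))]  # (1/n) delta_0 + (1 - 1/n) delta_1
mu = [(1, Fraction(1))]                                # delta_1
print(sup_distance(mu_n, mu), p_cost(mu_n, mu, 1))
```

For this pair the sup-distance equals 1 for every n, while the 1-cost of the same pairing equals 1/n, which is the kind of gap between the dynamical metric and the weak* topology discussed in the paper.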
We will now prove in several steps that P (X) equipped with ∆ is indeed a metric space. We will use the following form (as in e.g. [3], Proposition 2.17) of the well-known isomorphism theorem ([7], Theorem C of Section 41):
Theorem 1. If X is a compact metric space with a nonatomic Borel probability measure µ, then it is isomorphic (in the category of measure spaces) to the Lebesgue measure λ on the family of Borel measurable subsets of I.
Corollary 1. Suppose µ is a Borel probability measure on a compact space X. Then there exists a Borel measurable function f : I → X such that µ = f ♯ λ.
This implies that the infimum in the definition of ∆ goes over a nonempty set, hence ∆ is well defined.
In the rest of the paper, "measurable" will always mean "Borel-measurable".
Lemma 1. Suppose µ ∈ P (X), and let g 1 , g 2 : I → X be measurable functions, such that µ = g 1♯ λ = g 2♯ λ. Then for each ε > 0, there exist measurable, λ-invariant functions h 1 , h 2 : I → I such that
Proof. Let C 1 , C 2 , ..., C k be any pairwise disjoint, measurable cover of the support of µ, such that each C i has diameter less than ε (such a cover exists because of compactness of X). Without loss of generality we also assume that for all i, µ(C i ) > 0. We define a set Y ε ⊆ I × I, and a Borel probability measure ν ε on Y ε , with where λ 2 is the Lebesgue measure on I × I. Since $\lambda^2(Y_{\varepsilon,i}) = \lambda(g_1^{-1}(C_i)) \cdot \lambda(g_2^{-1}(C_i)) = \mu(C_i)^2$, one can easily check that ν ε is a probability measure. By definition, for any a ∈ Y ε , where π 1 , π 2 : I 2 → I are the coordinate projections. For any measurable A ⊆ I, By using Corollary 1, we find a measurable function h * : I → I 2 such that ν ε = h * ♯ λ, and h * (I) ⊆ Y ε . Now (2.2) and (2.3) imply that h 1 = π 1 • h * , h 2 = π 2 • h * are the required functions.
Proposition 1. The function ∆ is a metric on P (X).
Definition 2. The topology on P (X) induced by the metric ∆ is called dynamical topology. We denote it by τ d .

Properties of the dynamical topology
We now compare different topologies on P (X), and investigate elementary properties of the dynamical topology.
Proposition 2. The weak* topology on P (X) is coarser than τ d . Equivalently, d-convergence implies w-convergence.
The next simple example shows that the dynamical topology differs from both the weak* and uniform topology on any nontrivial X (i.e. X with more than one element). We will see that the dynamical topology refines the weak* topology in a very different way than the uniform topology.

Example 1. Suppose that µ n is a sequence of atomic measures, each supported on
Without loss of generality we can assume that p n i → q i , and x n i → y i . It is easy to check that µ n is u-convergent if and only if for all i, x n i is eventually constant (i.e. there exists n 0 such that for all n ≥ n 0 , x n i = y i ). On the other hand, one can check that the sequence (µ n ) is d-convergent, if and only if for all i, p n i is eventually constant.
We also deduce that in this example (µ n ) is at the same time u- and d-convergent, if and only if it is eventually constant.
Proposition 2 and the Example above imply the following conclusion.
Corollary 2. (i) If X has at least two elements, then τ w ⊂ τ d , but is not equal to it; (ii) If X is not a finite set, then τ d ⊄ τ u and τ u ⊄ τ d .
Now we discuss d-connectedness and d-compactness of P (X).
(ii) If X has at least two elements, then P (X) with the dynamical topology is not d-sequentially compact, and not d-compact.
Proof. (i) Suppose that X is path-connected. Let µ, µ ′ be any two measures in P (X), and choose any measurable f, f ′ : I → X such that µ = f ♯ λ, µ ′ = f ′ ♯ λ. Since X is path connected, there is a measurable function g : I × I → X, such that g(., 0) = f , g(., 1) = f ′ , and such that t → g(a, t) is continuous for every a. Now the function t → g(., t) ♯ λ is a d-continuous curve in P (X), connecting µ and µ ′ .
(ii) We construct the following simple example: choose two points x ≠ y in X, and the sequence of measures where δ x , δ y are atomic measures concentrated in x, y. Now µ n w-converges to µ = δ y . Since ∆(µ n , µ) = d(x, y), neither µ n nor any subsequence of µ n d-converges to µ. Proposition 2 implies that µ n has no d-convergent subsequence, hence P (X) is not d-sequentially compact. Since P (X) is metrizable, P (X) is not d-compact.
We now develop several simple tools used in proofs later in the paper. For a given set J ⊆ I and measurable f, g : We denote the support of a measure µ by supp(µ) . Lemma 2. Suppose J ⊆ I is a measurable set of full measure.
(i) For any measurable f, g : I → X, there exist measurable f , g : (ii) If µ n d-converges to µ, then supp(µ n ) converges to supp(µ) in the Hausdorff topology.
Proof. (i) Suppose µ, ν ∈ P (X). For any ε > 0 there exist measurable functions f, g : I → X, such that µ = f ♯ λ, ν = g ♯ λ, and such that D(f, g) < ∆(µ, ν) + ε. Using Lemma 3, we construct f , g, as in the Lemma. We easily check that
The claim (ii) of Proposition 4 is not true in the weak* topology. The counterexample constructed in (3.1) is a sequence of w-convergent measures µ n , converging to a measure µ, such that the supports of the measures µ n do not converge to the support of µ.
Assume that a sequence of measures µ n w-converges to a measure µ, and that the sequence of supports of the measures µ n converges in the Hausdorff topology to the support of µ. We cannot then in general claim that µ n d-converges to µ. A counterexample is the sequence for δ x , δ y as in (3.1). The same example shows that we can find measures µ, ν, such that d H (supp(µ), supp(ν)) = 0, but such that ∆(µ, ν) is arbitrarily large.
We now show that the dynamical topology is not much finer than the weak* topology.
Proof. Choose µ ∈ P (X), and any ε > 0. Since X is compact, we can find a measurable pairwise disjoint cover C 1 , C 2 , ..., C k of X, such that the diameter of each C i is at most ε. For each 1 ≤ i ≤ k, choose any x i ∈ C i . Let $\nu_\varepsilon = \sum_{i=1}^{k} \mu(C_i)\,\delta_{x_i}$, so that ν ε is supported on the finite set {x 1 , ..., x k }. Choose any measurable f : I → X such that µ = f ♯ λ, and define g : I → X by g(a) = x i for a ∈ f −1 (C i ). Then ν ε = g ♯ λ and D(f, g) ≤ ε, hence ∆(µ, ν ε ) ≤ ε.
With regard to the weak* topology, the set of all measures uniformly supported on a finite (multi)set is w-dense. No similar claim is true in the uniform topology.
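The approximation by finitely supported measures used in the proof above can be sketched in Python for a finitely supported µ on X = [0, 1]. This is only an illustration under assumptions of ours: the cells are taken to be the intervals [iε, (i + 1)ε), each cell is represented by its left endpoint, and the names `discretize` and `max_displacement` are hypothetical. Since every atom is moved inside its own cell, no mass travels farther than ε, which is the estimate behind ∆(µ, ν ε ) ≤ ε.

```python
from fractions import Fraction
from math import floor


def cell_representative(point, eps):
    """Left endpoint of the cell [i*eps, (i+1)*eps) that contains `point`."""
    return floor(Fraction(point) / eps) * eps


def discretize(atoms, eps):
    """Return nu_eps: the mass of each cell is moved to the cell representative.

    `atoms` is a list of (point, weight) pairs describing mu on [0, 1];
    `eps` is a positive Fraction (exact arithmetic avoids rounding at cell borders).
    """
    cells = {}
    for point, weight in atoms:
        rep = cell_representative(point, eps)
        cells[rep] = cells.get(rep, 0) + weight
    return sorted(cells.items())


def max_displacement(atoms, eps):
    """Largest distance by which an atom of mu is moved; always less than eps."""
    return max(Fraction(point) - cell_representative(point, eps) for point, _ in atoms)


# mu: eight atoms k/7, k = 0..7, each of weight 1/8 (an arbitrary test measure).
mu = [(Fraction(k, 7), Fraction(1, 8)) for k in range(8)]
eps = Fraction(1, 4)
nu_eps = discretize(mu, eps)
print(nu_eps)
print(max_displacement(mu, eps))
```

The number of atoms of ν ε is bounded by the number of cells, independently of how many atoms µ has.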

Characterization of convergence in the dynamical topology
We first recall some well known properties of convergent sequences of measures. Proposition 6. Assume that µ n w-converges, d-converges or u-converges to µ.
Proof. The proof for w-convergence is e.g. in [13]. The rest follows from Proposition 2 and the definition of u-convergence.
The following notion is the main tool for characterization of d-convergence.

Definition 3. We say that a measurable set
Note that a µ-separating A can have measure 0. In the following, the ε-neighborhood of a set A is the open set {x ∈ X, such that ∃y ∈ A, d(x, y) < ε}. Note also that, since X is compact, if D is any open set such that Cl(A) ⊆ D, then for small ε, the ε-neighborhood of A is a subset of D.
The proof of the following Lemma is an easy exercise.
We prove Theorem 2 in several steps.
Lemma 5. Suppose µ n , µ satisfy conditions (i), (ii) of Theorem 2, and choose any ε > 0. Assume that A 1 , A 2 , ..., A m is a cover of X, of measurable, nonempty, pairwise disjoint sets with diameter at most ε. Then we can find δ, 0 < δ ≤ ε, and an integer n 0 , such that, if B 1 , ..., B m are δ-neighborhoods of A 1 , ..., A m respectively, then for all n ≥ n 0 , and any
Proof. Let I be the set of all subsets of indices for which the condition (ii) of the Theorem is satisfied in the following sense: for all (i 1 , i 2 , ..., i k ) ∈ I, if B 1 , ..., B m are δ-neighborhoods of A 1 , ..., A m respectively, then there is n 1 such that for all n ≥ n 1 , is µ-separating, we can also assume that δ is small enough, so that The converse of Lemma 4 implies that if (i 1 , i 2 , ..., i k ) ∉ I, Then because of (4.4), ρ > 0. Now, choose n 2 such that, for all n ≥ n 2 , and for all (i 1 , i 2 , ..., i k ) ∉ I Such n 2 exists because B i1 ∪ B i2 ∪ ... ∪ B ik is open, because µ n w-converges to µ, and because of Proposition 6, (i). Let n 0 = max{n 1 , n 2 }. Now, (4.2) and (4.3) imply (4.1) for (i 1 , i 2 , ..., i k ) ∈ I, and (4.5) and (4.6) imply (4.1) for (i 1 , i 2 , ..., i k ) ∉ I.
Lemma 6. Let ξ be a measure on X (positive, not necessarily a normed one), x 1 , x 2 , ..., x m nonnegative real numbers, and B 1 , B 2 , ..., B m measurable subsets of X, such that for any Then there exist measures ν 1 , ν 2 , ..., ν m (positive, not necessarily normed) such that for all i = 1, ..., m,
The proof of Lemma 6 is essentially combinatorial in character and is not related to the rest of the paper, so we postpone it to the Appendix.
Proof. Because of compactness of X, we can find a finite cover A 1 , A 2 , ..., A m of X of measurable, nonempty, pairwise disjoint sets with diameter at most ε.
We apply Lemma 5 and find 0 < δ ≤ ε and an integer n 0 , such that if B 1 , B 2 , ..., B m are the δ-neighborhoods of A 1 , A 2 , ..., A m , and n ≥ n 0 , then (4.1) holds. Choose n ≥ n 0 , and set x i = µ(A i ), ξ = µ n . The relation (4.1) and the fact that (A i ) are pairwise disjoint imply (4.7). As (A i ) is a measurable partition of X, µ(A 1 ) + ... + µ(A m ) = µ(X) = 1 = µ n (X), which is by definition (4.8). Now applying Lemma 6 we obtain positive measures ν 1 , ν 2 , ..., ν m such that Similarly, we can construct a function f n : I → X such that f n (I i ) ⊆ B i , and such that f n♯ λ = µ n . We do it in the following way: let λ i = λ | I i . We construct f n | Ii to be any measurable function so that The construction implies that f (t) ∈ A i if and only if f n (t) ∈ B i . Since every point of B i lies within distance δ ≤ ε of A i , and the diameter of A i is at most ε, for any x ∈ A i , y ∈ B i we get d(x, y) ≤ 2ε. We conclude that D(f, f n ) ≤ 2ε, and by definition ∆(µ, µ n ) ≤ 2ε.
Proof. Assume that A is µ-separating, and let D be an open set, such that Cl(A) ⊆ D, and such that µ(D \ A) = 0. Let ε > 0 be such that the 2ε-neighborhood of A is a subset of D. Let B be the ε-neighborhood of A, and choose an arbitrary open set C such that Cl(A) ⊆ C ⊆ B. Then the construction implies that
$$x \in C,\ y \in D^c \implies d(x, y) > \varepsilon. \tag{4.12}$$
Choose δ < ε, small enough such that the δ-neighborhood of A is a subset of C. Now find n 0 large enough such that for all n ≥ n 0 , ∆(µ, µ n ) < δ/2. Now we can find functions f , f n such that
$$D(f, f_n) < \delta, \tag{4.13}$$
and such that µ = f ♯ λ, µ n = f n♯ λ. The definitions of A and D imply that µ(D \ A) = 0. Now Lemma 2, (i) implies that without loss of generality we can assume that for all x ∈ I,
$$f(x) \notin D \setminus A. \tag{4.14}$$
Suppose f (x) ∈ A. Then (4.13) implies that d(f (x), f n (x)) < δ, so f n (x) is in the δ-neighborhood of A, which is a subset of C. Now, suppose f n (x) ∈ C. Because of (4.12) and (4.13), f (x) ∉ D c , and because of (4.14), f (x) ∈ A, so we deduce that for all n ≥ n 0 , µ n (C) = µ(C).
We now prove Theorem 2.
Proof. One implication of Theorem 2 follows from Lemma 7; the other from Proposition 2 and Lemma 8.
Theorem 2 could be further modified: to check whether a w-convergent sequence is d-convergent, it is sufficient to prove (ii) only for closed µ-separating sets.
The following important Corollary shows that the dynamical topology is in some sense sufficiently close and similar to the weak* topology, and as such is expected to have various applications.
Proof. Indeed, if X is connected and supp µ = X, then there are no µ-separating sets.
Now we give yet another characterization of d-convergence. By definition, a sequence µ n ∈ P (X) d-converges to a measure µ ∈ P (X), if there exist sequences of measurable functions f n , g n : I → X, such that µ n = (f n ) ♯ λ, µ = (g n ) ♯ λ, and D(f n , g n ) → 0. Now we show that g n can be chosen independent of n, and put this in the context of the well-known Skorokhod Theorem (see e.g. [8]).

Theorem 3. (Skorokhod) Assume that X is metrizable, separable, and complete.
A sequence of measures µ n ∈ P (X) is w-convergent, if and only if there exists a sequence of measurable functions f n : I → X, and a function f : I → X, such that for all a ∈ I, f n (a) → f (a), and such that µ n = f n♯ λ, and µ = f ♯ λ.

Corollary 4. Assume that X is metrizable. A sequence of measures µ n ∈ P (X) is d-convergent, if and only if there exists a sequence of measurable functions f n : I → X, and a function f : I → X, such that f n → f as n → ∞, uniformly on I, and such that µ n = f n♯ λ, and µ = f ♯ λ.
Proof. ⇐=: It follows directly from the definition of ∆.
=⇒: Assume that µ n d-converges to µ. We can adjust the construction of the function f : I → X in the proof of Theorem 2 so that µ = f ♯ λ, and so that f is independent of ε. We do it by choosing $\varepsilon = 1/2^n$, and constructing the cover A 1 , ..., A m for a chosen ε to be a refinement of the cover A ′ 1 , ..., A ′ m ′ for any other ε ′ > ε. We can also, using the same construction, find a sequence f n : I → X so that D(f n , f ) → 0, and that µ n = f n♯ λ, which proves the claim.

Stability of invariant measures
We now show that various notions of stability, including Lyapunov stability, attracting sets, asymptotic stability, and Nekhoroshev stability, generalize well to invariant measures. As in the previous sections, X is a compact metric space. In this section, f is a continuous function on X, and f ♯ is then a d-continuous function on P (X), f ♯ µ(A) = µ(f −1 (A)) for all Borel measurable A.

Definition 4. For all Borel measurable A ⊆ X, we call the set A ♯ the lift of the set A, defined as the set of all measures µ ∈ P (X), such that supp(µ) ⊆ A.
All properties (i)-(vi) follow directly from the definition of the lift. Note that in general in (v) and (vi) equality does not hold, so lift ♯ is not a morphism of set algebras.
⊇: Let µ ∈ f (A) ♯ , i.e. supp(µ) ⊆ f (A). Let $\mu_n = \sum_{k=1}^{m_n} \lambda^n_k\, \delta(y^n_k)$ be any sequence of finitely supported measures which w-converges to µ, and such that for all n, k, $y^n_k \in \operatorname{supp}(\mu)$; let $x^n_k \in A$ be any sequence such that $f(x^n_k) = y^n_k$, and let ν be the limit point of a w-convergent subsequence of $\nu_n = \sum_{k=1}^{m_n} \lambda^n_k\, \delta(x^n_k)$. Then supp(ν) ⊆ f −1 (supp(µ)) ⊆ A, and because of continuity
In the following, ε-neighborhoods of sets of measures and other properties in P (X) are always with respect to the dynamical topology, unless specified otherwise.
The key property of the dynamical topology is the following lemma:
Proof. ⇒: Denote by V the ε-neighborhood of A ♯ in P (X). (i) Claim: U ♯ ⊆ V. Choose any µ ∈ U ♯ , and then by definition supp(µ) ⊆ U . Since X is compact and supp(µ) closed, there exists δ > 0 such that supp(µ) is a subset of the (ε − 2δ)-neighborhood of A. Now, Proposition 5 implies that there is a measure ν δ supported on a finite set {x 1 , ..., x k } such that ∆(µ, ν δ ) < δ. Without loss of generality we also assume that ν δ ({x i }) > 0 for all i, and then it is easy to see that for all 1 ≤ i ≤ k, x i is in the (ε − δ)-neighborhood of A. We now choose y 1 , ..., y k ∈ A, such that d(x i , y i ) < ε − δ, and define $\nu = \sum_{i=1}^{k} \nu_\delta(\{x_i\})\, \delta_{y_i}$. Now, ν ∈ A ♯ , and because of the choice of x i and y i we have ∆(ν δ , ν) ≤ ε − δ, hence by the triangle inequality ∆(µ, ν) < ε.
⇐: It now follows from uniqueness of ε-neighborhood.
Lemma 11 is not true for weak* topology, or uniform topology.

Proof. Lemma 11 implies that A is open if and only if A ♯ is open.
Assume that A is closed, and choose any convergent sequence of measures µ n ∈ A ♯ , converging to µ. Proposition 6, (ii) now implies that supp(µ) ⊆ A, hence µ ∈ A ♯ and A ♯ is closed. Now, assume that A ♯ is closed, and let x n be any convergent sequence in A, converging to x. By the definition of d-convergence, δ xn converges to δ x , and since A ♯ is closed, δ x ∈ A ♯ . By definition x ∈ A, so A is closed.
Definition 5. Lyapunov stability. Given a continuous function f on X, we say that a closed invariant set A of the dynamical system f is Lyapunov stable, if for each ε > 0, there exists δ > 0, such that if U , V are the ε, δ neighborhoods of A respectively, then for all n ≥ 0, n ∈ N, f n (V ) ⊆ U .

Proposition 7. Suppose f is a continuous function on X. A closed invariant set A is Lyapunov stable with respect to f , if and only if A ♯ is Lyapunov stable with respect to f ♯ .
Proof. Suppose A is Lyapunov stable, i.e. for a given ε > 0, f n (V ) ⊆ U for some δ > 0 and all n ≥ 0, with U , V respectively the ε, δ neighborhoods of A. By definition, f n (V ) ⊆ U is equivalent to f n (V ) ♯ ⊆ U ♯ , and because of Lemma 10 and f n (V ) ♯ = f n ♯ (V ♯ ), this is equivalent to f n ♯ (V ♯ ) ⊆ U ♯ . Lemma 11 states that U , V are respectively the ε, δ neighborhoods of A if and only if U ♯ , V ♯ are respectively the ε, δ neighborhoods of A ♯ , which completes the proof.
Definition 6. Asymptotic stability. Given a continuous function f on X, we say that a closed invariant set A of a dynamical system f is asymptotically stable, if there exists ε > 0, such that for each x ∈ U , where U is the ε-neighborhood of A, lim n→∞ d(f n (x), A) = 0.

Proposition 8. Suppose f is a continuous function on X. A closed invariant set A is asymptotically stable with respect to f , if and only if A ♯ is asymptotically stable with respect to f ♯ .
Proof. =⇒: Suppose A is asymptotically stable. Let ε > 0, U , be as in the definition of the asymptotic stability, and choose any δ > 0. Now, because of compactness of X, there is n 0 such that for all n ≥ n 0 , f n (U ) ⊆ V , where V is the δ-neighborhood of A. This, and Lemma 11, imply that for n ≥ n 0 , and any measure µ ∈ U ♯ , f n ♯ (µ) is in the δ-neighborhood of A ♯ . Since δ was arbitrary, A ♯ is indeed asymptotically stable.
⇐: Assume that A ♯ is asymptotically stable, and choose ε > 0, U ♯ , as in the definition of asymptotic stability, where because of Lemma 11, U is the ε-neighborhood of A. By definition, if x ∈ U , then δ x ∈ U ♯ , and then ∆(f n ♯ (δ x ), A ♯ ) → 0 as n → ∞. That, and the relation f n ♯ (δ x ) = δ f n (x) now imply that d(f n (x), A) → 0 as n → ∞, which proves that A is asymptotically stable.
A somewhat stronger property than asymptotic stability is that of an attractor.
Proof. The Proposition follows from the following equivalences: (I) Lemma 11 implies that U ♯ is the ε-neighborhood of A ♯ , if and only if U is the ε-neighborhood of A; (II) Lemma 9, (ii) and Lemma 10 imply that $f^N(U) \subseteq U$ if and only if $f_\sharp^N(U^\sharp) \subseteq U^\sharp$; and (III) Lemma 9, (iv) and Lemma 10 imply that $\bigcap_{n=1}^{\infty} f_\sharp^n(U^\sharp) = \left(\bigcap_{n=1}^{\infty} f^n(U)\right)^\sharp$. That, and Lemma 9, (i), now imply that
An analogous claim holds for repellers, sinks and sources.

Proposition 10. Suppose f is a continuous function on X. A closed invariant set A is exponentially stable with respect to f , if and only if A ♯ is exponentially stable with respect to f ♯ .
Proof. It follows directly from the definition of exponential stability, Lemma 11 and Lemma 12.
A similar claim can be also proven for Nekhoroshev stability (see e.g. [10]). We now give an example which shows that the claims above do not hold for uniform or weak* topology. In other words, we show that uniform or weak* topology on P (X) are not the right topologies for generalizing notions of stability to spaces of measures.

Example 2. Assume that f is a dynamical system with one attracting sink x, and one source y. Such a dynamical system can be constructed on, say, a 2-sphere, with the sink and the source being the poles. We denote by δ x and δ y the probability measures concentrated on x, y respectively, and we define µ ε = (1 − ε)δ x + εδ y , for a given ε > 0. Now, for small ε > 0, µ ε is arbitrarily u-close and w-close to δ x (but not d-close!). However, µ ε is an f ♯ -fixed point. We conclude that {δ x } = {x} ♯ is neither an attractor, nor an asymptotically stable set for f ♯ in the weak* or uniform topology on P (X).
We say that an f -invariant measure µ is Lyapunov stable, asymptotically stable, an attractor, or exponentially stable, if {µ} is Lyapunov stable, asymptotically stable, an attractor, or exponentially stable respectively, with respect to f ♯ and the dynamical topology on P (X).
More generally, if A is an f ♯ -invariant, d-closed set of measures, we say that it is Lyapunov stable, asymptotically stable, an attractor, or exponentially stable, if it is so with respect to f ♯ and the dynamical topology on P (X).
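The mechanism of Example 2 can be checked on a one-dimensional stand-in. In the Python sketch below we replace the sphere by X = [0, 1] and take f (x) = x², for which 0 attracts every x < 1 and 1 is a repelling fixed point; this map and the names `pushforward` and `square` are assumptions made only for the illustration. The sketch verifies that µ ε = (1 − ε)δ 0 + εδ 1 is fixed by f ♯ , while a Dirac measure at an interior point is carried towards δ 0 .

```python
from fractions import Fraction


def pushforward(atoms, f):
    """Image measure f#mu of a finitely supported mu = sum of weight * delta_point."""
    image = {}
    for point, weight in atoms:
        target = f(point)
        image[target] = image.get(target, 0) + weight
    return sorted(image.items())


def square(x):
    # Hypothetical stand-in for the map of Example 2 on [0, 1]:
    # sink at 0, source at 1.
    return x * x


eps = Fraction(1, 1000)
mu_eps = [(0, 1 - eps), (1, eps)]          # (1 - eps) delta_0 + eps delta_1
print(pushforward(mu_eps, square) == mu_eps)

# A Dirac measure at 1/2 is moved towards delta_0 under iteration of f#.
orbit = [(Fraction(1, 2), Fraction(1))]
for _ in range(5):
    orbit = pushforward(orbit, square)
print(orbit)
```

The mass ε sitting at the source never moves, so every transport from µ ε to δ 0 has to carry it across the full distance between the two fixed points. This is why µ ε stays far from δ 0 in the dynamical metric, however small ε is.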
Information on the stability of invariant measures tells us much more about the dynamics in a neighborhood of a set (which can be the support of an invariant measure) than information on the stability of invariant sets. The following example illustrates that claim.
Example 3. Let X = R²/Z² be a 2-torus, and define a function f (x, y) = (x ′ , y ′ ) with x ′ = x + y, y ′ = y (mod 1) (a standard map with k = 0). Let A be the circle y = 0. Then A is a closed invariant set (all points on A are fixed), and also Lyapunov stable. Let λ be the Lebesgue measure on A, and ν any probability measure supported on A, different from λ. Both λ, ν are f -invariant, and supported on a Lyapunov stable set A. Now, one can check that λ is Lyapunov stable, and ν is not. This reflects the fact that the rotation in the vicinity of A is "with uniform speed". Similar examples could be constructed for attractors and exponentially stable sets.
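The instability of a non-uniform measure in Example 3 can be seen numerically in the simplest case ν = δ (0,0) . The Python sketch below assumes that the standard map with k = 0 is the shear (x, y) → (x + y, y) mod 1 and uses the flat metric on the torus; both choices, and the names `shear`, `circle_gap` and `torus_distance`, are assumptions of this illustration. Since the distance ∆ between two Dirac measures equals the distance of their base points, it suffices to follow a single orbit.

```python
from fractions import Fraction


def shear(point):
    """One step of (x, y) -> (x + y, y) on the torus R^2/Z^2 (assumed form of f)."""
    x, y = point
    return ((x + y) % 1, y % 1)


def circle_gap(a, b):
    """Distance between a and b on the circle R/Z."""
    t = abs(a - b) % 1
    return min(t, 1 - t)


def torus_distance(p, q):
    """Flat distance on the torus; any equivalent metric would serve as well."""
    dx, dy = circle_gap(p[0], q[0]), circle_gap(p[1], q[1])
    return float(dx * dx + dy * dy) ** 0.5


eta = Fraction(1, 100)
base = (Fraction(0), Fraction(0))      # nu = delta at (0, 0), supported on A
p = (Fraction(0), eta)                 # a Dirac measure eta-close to nu
for _ in range(50):
    p = shear(p)

print(circle_gap(p[1], 0))             # distance of the orbit point to the circle A
print(torus_distance(p, base))         # distance to the base point of nu
```

The orbit stays at distance η from the set A, in agreement with the Lyapunov stability of A, but after 50 steps it sits on the opposite side of the circle from (0, 0), so the Dirac measure has drifted far from ν in the dynamical metric.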

The optimal transport problem
In this section we define the ∞-optimal transport problem, and show that the metric on the set of measures induced by the ∞-optimal transport problem is equal to the dynamical metric ∆. We then deduce that the ∞-optimal transport problem generates a different structure on the set of measures than the p-optimal transport problem for any 1 ≤ p < ∞.
Definition 10. Let X be a metric space with a metric d, and µ, ν be two Borel probability measures on X. The distance of the measures µ, ν with respect to a transport γ ∈ T (µ, ν) is defined by
$$\Delta_\gamma(\mu, \nu) = \sup \{\, d(x, y) : (x, y) \in \operatorname{supp}(\gamma) \,\}.$$
The ∞-Wasserstein distance of two measures µ, ν is defined by
$$\Delta_\infty(\mu, \nu) = \inf_{\gamma \in T(\mu, \nu)} \Delta_\gamma(\mu, \nu). \tag{6.1}$$
If the minimum in (6.1) is attained, then any measure γ for which ∆ γ (µ, ν) = ∆ ∞ (µ, ν) is called a solution of the optimal transport problem with respect to the measures µ, ν.
Now we prove that the ∞-Wasserstein distance is the same as the dynamical metric ∆.
Corollary 7. Assume that X has at least two elements. Then the ∞-Wasserstein metric ∆ ∞ = ∆ is neither uniformly nor topologically equivalent to any of the p-Wasserstein metrics W p for 1 ≤ p < ∞.
Proof. For any p, 1 ≤ p < ∞, the metric W p is uniformly equivalent to the Prokhorov metric on P (X) (see e.g. [6]), hence topologically equivalent to the weak* topology. Corollary 2, (i) implies the claim.
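For finitely supported measures the ∞-Wasserstein distance can be computed by brute force, which may help to make the definition concrete. The Python sketch below is not the method of this paper: it relies on the classical supply and demand feasibility criterion of Hall type (a transport that moves no mass farther than r exists if and only if µ(S) ≤ ν(N r (S)) for every set S of atoms of µ, where N r (S) is the set of atoms of ν within distance r of S), and it tries the finitely many candidate values of r in increasing order. The names `transport_exists` and `w_infinity` are hypothetical, and the running time is exponential in the number of atoms of µ.

```python
from fractions import Fraction
from itertools import combinations


def transport_exists(mu, nu, dist, r):
    """Is there a transport of mu to nu that moves no mass farther than r?

    `mu`, `nu` map points to positive weights with equal total mass.
    Checks mu(S) <= nu(points within distance r of S) for all nonempty S.
    """
    sources = list(mu)
    for k in range(1, len(sources) + 1):
        for S in combinations(sources, k):
            reachable = sum(w for y, w in nu.items()
                            if any(dist(x, y) <= r for x in S))
            if sum(mu[x] for x in S) > reachable:
                return False
    return True


def w_infinity(mu, nu, dist):
    """Smallest pairwise distance r for which a transport within r exists."""
    candidates = sorted({dist(x, y) for x in mu for y in nu})
    for r in candidates:
        if transport_exists(mu, nu, dist, r):
            return r


def line_distance(x, y):
    return abs(x - y)


half = Fraction(1, 2)
mu = {0: half, 2: half}
nu = {1: half, 3: half}
print(w_infinity(mu, nu, line_distance))

# A small amount of mass that has to travel far dominates the distance.
n = 1000
mu_n = {0: Fraction(1, n), 1: Fraction(n - 1, n)}
print(w_infinity(mu_n, {1: Fraction(1)}, line_distance))
```

In the second pair the result does not depend on n, whereas the p-Wasserstein distances of the same pair tend to 0 as n grows for every finite p, in line with Corollary 7.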
We close this Section with a proof that the ∞-optimal transport problem has a solution.

Appendix: The proof of Lemma 6
In this Appendix we prove the fact from the proof of Theorem 2, which is essentially combinatorial in character. Assume that an integer m, any family of measurable subsets (B 1 , B 2 , ..., B m ) of X, and a measure ξ are given. We first introduce some notation and definitions. Let P({1, ..., m}) be the set of all subsets of {1, ..., m}, and for a nonempty ϕ ∈ P({1, ..., m}), (The arrangement ρ depends also on the measure ξ, which we omit from the argument of ρ because it is always clear which measure is being considered). The k th arrangement ρ k shows how many sets of intersections of exactly k sets B i have positive measure ξ. The arrangement ρ is the number of such sets B ϕ for any k > 0, such that they have positive measure ξ, and then Now we prove Lemma 6.
and also for all 1 ≤ k ≤ m, x k > 0.
(Note that the assumptions of Case 3 do not imply that for all nonempty ϕ, ξ(B ϕ ) > 0.) Let p be any p ∈ ψ.