On the numerical approximation of the Perron-Frobenius and Koopman operator

Information about the behavior of dynamical systems can often be obtained by analyzing the eigenvalues and corresponding eigenfunctions of linear operators associated with a dynamical system. Examples of such operators are the Perron-Frobenius and the Koopman operator. In this paper, we will review different methods that have been developed over the last decades to compute finite-dimensional approximations of these infinite-dimensional operators - e.g. Ulam's method and Extended Dynamic Mode Decomposition (EDMD) - and highlight the similarities and differences between these approaches. The results will be illustrated using simple stochastic differential equations and molecular dynamics examples.


Introduction
The two main candidates for analyzing a dynamical system using operator-based approaches are the Perron-Frobenius and the Koopman operator. These two operators are adjoint to each other in appropriately defined function spaces and it should therefore theoretically not matter which one is used to study the system's behavior. Nevertheless, different methods have been developed for the numerical approximation of these two operators.
The Perron-Frobenius operator has been used extensively in the past to analyze the global behavior of dynamical systems stemming from a plethora of different areas such as molecular dynamics [47,51], fluid dynamics [27,23], meteorology and atmospheric sciences [54,53], or engineering [57,45]. Toolboxes for computing almost invariant sets or metastable states are available and efficiently approximate the system's behavior using adaptive box discretizations of the state space. An example of such a toolbox is GAIO [15]. This approach is, however, typically limited to low-dimensional problems.
Recently, several papers have been published focusing on data-based numerical methods to approximate the Koopman operator and to analyze the associated Koopman eigenvalues, eigenfunctions, and modes [10,58,59]. These methods extract the relevant global behavior of dynamical systems and can, for example, be used to find lower-dimensional approximations of a system and to split a system into fast and slow subsystems as described in [24]. In many applications, the complex behavior of a dynamical system can be replicated by a small number of modes [59].
The approximation of the Perron-Frobenius operator typically requires short simulations for a large number of different initial conditions, which, without prior knowledge about the system, grows exponentially with the number of dimensions; the approximation of the Koopman operator, on the other hand, relies on potentially fewer, but longer simulations [10]. However, we will show that this is not necessarily the case, the Perron-Frobenius operator can also be approximated using just a small number of long simulations. Thus, the latter approach might be well-suited for experimentally obtained data running just a few tests with different initial conditions for a longer time. Whether the numerically obtained operator then captures the full dynamics of the system, however, depends strongly on the initial conditions chosen.
While the Koopman operator is the adjoint of the Perron-Frobenius operator, the connections between different approaches to approximate these operators have -to our knowledge -not been fully described. In this paper, we will review different numerical methods to approximate the Perron-Frobenius operator and the Koopman operator and illustrate the similarities and differences between these approaches. We will mainly focus on simple stochastic differential equations and molecular dynamics applications.
For such systems, it can be quickly seen that the Perron-Frobenius operator satisfies Pf (y) = f (x)k(x, y) dµ(x) , (2) and that the Markov operator property holds as well. If the measure µ is invariant, i.e., µ(A) = A k(x, y) dµ(x) dµ(y) for every A ∈ B, then P : L p (X) → L p (X) is a well-defined non-expansive operator for every p ∈ [1, ∞], as in the deterministic case.
Invariant (or stationary) densities play a special role. These are densities f (i.e., positive functions with unit L 1 norm) which satisfy Pf = f . If such a density f is unique, the system is called ergodic, and satisfies for any g ∈ L p (X), p ∈ [1, ∞], that lim n→∞ 1 n P-almost surely (a.s.) for µ-a.e. x ∈ supp(f ), where supp(f ) is the set {f > 0}. With some additional assumptions on k, the convergence in (3) is geometric, with the rate governed by the second dominant eigenvalue of P. In general, eigenfunctions associated with subdominant eigenvalues correspond to the slowly converging transients of the system and yield information about metastable sets; sets between which a dynamical transition is a rare event.
where ·, · µ is the duality pairing between L 1 and L ∞ functions. For specific combinations of Φ and µ, the Koopman operator can be defined on L 2 (X), too 3 ; in what follows, we assume that this is the case. Again, K is an infinite-dimensional linear operator that characterizes the finite-dimensional nonlinear system Φ. To obtain the dynamics of a system defined on X ⊂ R d , use the set of observables g i (x) = x i , i = 1, . . . , d, or in shorthand, the vector-valued observable g(x) = x, where g is called full-state observable. On vector-valued functions, the Koopman operator acts componentwise.
In order to maintain duality with the Perron-Frobenius operator, for the non-deterministic system Φ with transition density function k, the Koopman operator is defined as where E[·] denotes the expectation value with respect to the probability measure underlying Φ(x). Note that while the Koopman operator was defined here for a discrete-time dynamical system, the definition can be extended naturally to continuous-time dynamical systems as described in [10].
If ϕ 1 and ϕ 2 are eigenfunctions of the Koopman operator with eigenvalues λ 1 and λ 2 , then also the product ϕ 1 ϕ 2 is an eigenfunction with eigenvalue λ 1 λ 2 . The product of two functions is defined pointwise, i.e. (ϕ 1 ϕ 2 )(x) = ϕ 1 (x) ϕ 2 (x). Analogously, for any eigenfunction ϕ and r ∈ R, ϕ r is an eigenfunction with eigenvalue λ r assuming that ϕ(x) = 0 for r < 0. Example 2.1. Consider a linear dynamical system of the form x k+1 = A x k with A ∈ R d×d , cf. [10,58]. Let A have d left eigenvectors 4 w i with eigenvalues µ i , i.e. w i A = µ i w i for i = 1, . . . , d. Then ϕ i (x) = w i x is an eigenfunction of the Koopman operator K with corresponding eigenvalue λ i = µ i since As described above, also products of these eigenfunctions  Let f : X → R be an observable of the system that can be written as a linear combination of the linearly independent eigenfunctions ϕ i , i.e.
. 4 Here and in what follows, left eigenvectors are represented as row vectors.
Analogously, for vector-valued functions F = [f 1 , . . . , f n ] T , we get where v i = [c i,1 , . . . , c i,n ] T . These vectors v i corresponding to the eigenfunctions ϕ i are called Koopman modes.
Definition 2.2. Given an eigenfunction ϕ i of the Koopman operator K and a vector-valued observable F , the vector v i of coefficients of the projection of F onto span{ϕ i } is called Koopman mode.
The connection between the dynamical system Φ and the Koopman operator K is given by the full-state observable g(x) = x and the corresponding Koopman eigenvalues λ i , eigenfunctions ϕ i , and eigenmodes v i required to retrieve the full state [58]. Since (Kg)(x) = (g • Φ)(x) = Φ(x) and, using the Koopman modes v i belonging to g, we can compute Φ(x) with the aid of the Koopman operator. A pictorial representation of the relationship between states and observables as well as the evolution operator and Koopman operator can be found in [58].

Numerical approximation 3.1 Generalized Galerkin methods
The Galerkin discretization of an operator A over some Hilbert space H can be described as follows. Suppose we have a finite-dimensional subspace V ⊂ H with basis (ψ 1 , . . . , ψ k ) given. The Galerkin projection of A to V is the unique linear operator A : V → V satisfying ψ j , Aψ i = ψ j , Aψ i , for all i, j = 1, . . . , k .
If the operator A is not given on a Hilbert space, just a Banach space, it can be advantageous to take basis functions (with respect to which the projected operator is defined) and test functions (which serve in (5) to project objects not necessarily living in the same subspace) from different sets. If A : Y → Y is an operator on a Banach space Y, V ⊂ Y a subspace with basis (ψ 1 , . . . , ψ k ), W ⊂ Y * a subspace of the dual of Y with basis (ψ * 1 , . . . , ψ * k ), i.e. dim V = dim W, then the Petrov-Galerkin projection of A is the unique linear operator A : V → V satisfying where ·, · denotes the duality bracket. This idea can be taken one step further, resulting in a Petrov-Galerkin-like projection even if l := dim W > dim V. In this case, (6) is over-determined and the projected operator A is defined as the solution of the least-squares problem We refer to this as the over-determined Petrov-Galerkin method.

Ulam's method
Probably the most popular method to date for the discretization of the Perron-Frobenius operator is Ulam's method; see e.g. [56,14,3,24]. Let {B 1 , . . . , B k } ⊂ B be a covering of X by a finite number of disjoint measurable boxes and let 1 Bi be the indicator function for box B i , i.e.
Ulam's method is a Galerkin projection of the Perron-Frobenius operator to the subspace spanned by these indicator functions. More precisely, if one chooses the basis functions ψ i = 1 µ(Bi) 1 Bi , then from the relationship we can represent the discrete operator by a matrix P = (p ij ) ∈ R k×k with The denominator µ(B i ) normalizes the entries p ij so that P becomes a row-stochastic matrix. Thus, P defines a finite Markov chain and has a left eigenvector with the corresponding eigenvalue λ 1 = 1. This eigenvector approximates the invariant measure of the Perron-Frobenius operator P [37,40,22,16]. The entries p ij of the matrix P can be viewed as the probabilities of being mapped from box B i to box B j by the dynamical system Φ. These entries can be estimated by randomly choosing a large number of test points x On the one hand, this is a Monte-Carlo approach to estimate the integrals in (8), and hence a numerical realization of Ulam's method. On the other hand, it is also an over-determined Petrov-Galerkin method (7) with test functionals ψ * being point evaluations at the respective sample points x ; i.e., for a piecewise continuous function ϕ we have ψ * (ϕ) = ϕδ x dµ = ϕ(x ). One can see this by noting that due to the disjoint support of the basis functions 1 Bi the sum in (7) decouples and the entries of P can be readily seen to be as on the right-hand side of (10). The effect of Monte-Carlo sampling and the choice of the partition on the accuracy and convergence of Ulam's method has been investigated in [4,41,33].
Remark 3.1. We note that, given independent random test points x (l) i ∈ B i , l = 1, . . . , n, expression (10) is a maximum-likelihood estimator for (9). This holds true in the non-deterministic case as well, where (9) reads as

Further discretization methods for the Perron-Frobenius operator
Petrov-Galerkin type and higher order methods. Ulam's method is a zeroth order method in the sense that it uses piecewise constant basis functions. We can achieve a better approximation of the operator (and its dominant spectrum, in particular) if we use higher order piecewise polynomials in the Galerkin approximation; see [18,20].
If the eigenfunctions of the Perron-Frobenius operator are expected to have further regularity, the use of spectral methods can be advantageous [30,26]. Here, collocation turns out to be the most efficient, in general; i.e., where basis functions are Fourier or Chebyshev polynomials [9], and test functions are Dirac distributions centered in specific domain-dependent collocation points. Mesh-free approaches with radial basis functions continuously gain popularity due to their flexibility with respect to state space geometry [25,60].
A kind of regularity different from smoothness is if functions of interest do not vary simultaneously strongly in many coordinates, just in very few of them. Sparse-grid type Galerkin approximation schemes [11] are well suited for such objects; their combination with Ulam's method has been considered in [32].
Higher-order approximations do have, however, an unwanted disadvantage: the discretized operator is not a Markov operator (a stochastic matrix), in general [33,Section 3]. This desirable structural property can be retained if one considers specific Petrov-Galerkin methods; cf. [19], where the basis functions are piecewise first-or secondorder polynomials and the test functions are piecewise constant.
Maximum entropy optimization methods. Let us consider a Petrov-Galerkin method for discretizing the Perron-Frobenius operator P, such that ψ * One might as well alleviate the condition g ∈ V, at the cost of not having a unique solution to (11). Then, in order to get a unique solution, one has to impose additional conditions on g. If one considers (11) as constraints, one could formulate an optimization problem whose solution is g. There is, of course, no trivial choice of objective functional for this optimization problem, however energy-type (i.e. g 2 dµ) and entropy-type (i.e. g log g dµ) objective functionals turned out to be advantageous to use [17,6,5,7]. The reason for this is that the available convergence analysis for Ulam's method is quite restrictive [37,20,22], and these optimization-based methods yield novel convergent schemes to approximate invariant densities of non-singular dynamical systems -to this end, one sets g = f in (11).
The down-side of this method is that in order to represent the approximate invariant density, one has to compute "basis functions" which arise as non-trivial combinations of the test functions h j and the dynamics Φ.

Extended dynamic mode decomposition
An approximation of the Koopman operator, the Koopman eigenvalues, eigenfunctions, and eigenmodes can be computed using Extended Dynamic Mode Decomposition (EDMD). Note that we are using a slightly different notation than [58,59] here to make the relationship with other methods, in particular Ulam's method and Dynamic Mode Decomposition (DMD, defined in Remark 3.6 below), more apparent. In order to obtain EDMD, we take the basis functions ψ i , as above, and for the test function(al)s, we take delta distributions δ xj , that is, δ x , ψ = ψ(x). EDMD requires data, i.e. a set of values x i and the corresponding and additionally a set of basis functions or observables D = {ψ 1 , ψ 2 , . . . , ψ k } called dictionary. EDMD takes ideas from collocation methods, which are, for example, used to solve PDEs, where the x i are the collocation points rather than a fixed grid [59]. Writing Ψ = ψ 1 ψ 2 · · · ψ k T as a vector of functions, that is Ψ : Here, K ∈ R k×k applied from the right to vectors in R 1×k represents the projection of K with respect to the basis (ψ 1 , . . . , ψ k ). If the number of basis functions and test functions does not match, (6) cannot be satisfied in general and a least squares solution of the (usually overdetermined) system of equations is given by applying Ψ + X , the pseudoinverse of Ψ X , giving A more detailed description can be found in Appendix D. For the sake of convenience and to compare DMD and EDMD, we define M K = K T . This approach becomes computationally expensive for large m since it requires the pseudoinverse of the k × m matrix Ψ X . Another possibility to compute K is where the matrices A, G ∈ R k×k are given by In order to obtain the second EDMD formulation from the first, the relationship Ψ + X = Ψ T X (Ψ X Ψ T X ) + was used. For a detailed derivation of these results, we refer to [58,59].
An approximation of the eigenfunction ϕ i of the Koopman operator K is then given by Example 3.2. Let us consider the linear system described in Example 2.1 again. The eigenfunctions computed using EDMD with the basis functions ψ l = x l1 1 x l2 2 , 0 ≤ l 1 , l 2 ≤ 5, are in very good agreement with the theoretical results. EDMD computes exactly the eigenfunctions shown in Figure 1 with negligibly small numerical errors ε < 10 −10 , where we computed the maximum difference between the eigenfunctions and their approximation. The first eight nontrivial eigenvalues of M K are In order to obtain the Koopman modes for the full-state observable g(x) = x introduced above, define ϕ = [ϕ 1 , . . . , ϕ k ] T and let B ∈ R d×k be the matrix such that g = B Ψ, then ϕ = Ξ Ψ and Note that since Ξ is the matrix which contains all left eigenvectors of M K , the matrix Ξ −1 needed for reconstructing the full-state observable g contains all right eigenvectors of M K . That is, the Koopman eigenfunctions ϕ = Ξ Ψ are approximated by the left eigenvectors of M K and the Koopman modes V = B Ξ −1 by the right eigenvectors (cf. [58], with the difference that there the observables and eigenfunctions are written as column vectors and the data matrices Ψ X and Ψ Y are the transpose of our matrices; we chose to rewrite the EDMD formulation in order to illustrate the similarities with DMD and other methods). Remark 3.4 (Convergence of EDMD to a Galerkin method). As described in [58], EDMD converges to a Galerkin approximation of the Koopman operator for large m if the data points are drawn according to a distribution µ. Using the Galerkin approach, we would obtain matrices A and G with entries . Then K T = A G −1 would be the finite-dimensional approximation of the Koopman operator K. Clearly, the entries a ij and g ij of the matrices A and G in (14) converge to a ij and g ij for m → ∞, since Remark 3.5 (Variational approach for reversible processes). The EDMD approximation of the eigenfunctions of the Koopman operator is given by the left eigenvectors ξ of the matrix M K = A G + , i.e. ξ M K = λ ξ, and can be -provided that G is regular -reformulated as a generalized eigenvalue problem of the form ξ A = λ ξ G. This results in a method similar to the variational approach presented in [42] for reversible processes. A tensor-based generalization of this method can be found in [44].
Remark 3.6 (DMD). Dynamic Mode Decomposition was first introduced in [48] and is a powerful tool for analyzing the behavior of nonlinear systems which can, for instance, be used to identify low-order dynamics of a system [55]. DMD analyzes pairs of d-dimensional data vectors x i and y i = Φ(x i ), i = 1, . . . , m, written again in matrix form (12). Assuming there exists a linear operator M L that describes the dynamics of the system such that The DMD modes and eigenvalues are then defined to be the eigenvectors and eigenvalues of M L . The matrix M L minimizes the cost function There are different algorithms to compute the DMD modes and eigenvalues without explicitly computing M L which rely on the (reduced) singular value decomposition of X. For a detailed description, we refer to [55].
Remark 3.7 (DMD and EDMD). The first EDMD formulation (13) shows the relationship between DMD and EDMD. Let the vector of observables be given by Ψ(x) = x. Then Ψ X = X and Ψ Y = Y , thus i.e. the DMD matrix M L is an approximation of the Koopman operator K using only linear basis functions. Since B = I, the Koopman modes are V = Ξ −1 , which are the right eigenvectors of M K and thus the right eigenvectors of M L , which illustrates that the Koopman modes in this case are the DMD modes. Hence, (exact) DMD can be regarded as a special case of EDMD.
Remark 3.8 (Sparsity-promoting DMD). A variant of DMD aiming at maximizing the quality of the approximation while minimizing the number of modes used to describe the data is presented in [31]. Sparsity is achieved by using an 1 -norm regularization approach. The 1 -norm can be regarded as a convexification of the cardinality function. The resulting regularized convex optimization problem is then solved with an alternating direction method. That is, the algorithm alternates between minimizing the cost function and maximizing sparsity.
In the same way, a sparsity-promoting version of EDMD could be constructed in order to minimize the number of basis functions required for the representation of the eigenfunctions.

Kernel-based extended dynamic mode decomposition
In some cases, it is possible to improve the efficiency of EDMD using the so-called kernel trick [59]. In fluid problems, for example, the number of measurement points k is typically much larger than the number of measurements or snapshots m. Suppose f (x, y) = (1 + x T y) 2 for x, y ∈ R 2 , then f (x, y) = 1 + 2 x 1 y 1 + 2 x 2 y 2 + 2 x 1 x 2 y 1 y 2 + x 2 1 y 2 1 + x 2 2 y 2 2 = Ψ(x) T Ψ(y) for the vector of observables Ψ(x) = 1, The kernel function f (x, y) = (1 + x T y) p for x, y ∈ R d will generate a vector-valued observable that contains all monomials of order up to and including p.
That is, instead of O(k), the computation of the inner product is now O(d) since inner products are computed implicitly by an appropriately chosen kernel function.
In [59], it is shown that any left eigenvector v of M K for an eigenvalue λ = 0 can be written as v =v Ψ T X , witĥ v ∈ R m . Using the relationship Ψ + and thus a left eigenvector of M K can be computed by a left eigenvector ofM K =K T =ÂĜ + multiplied by Ψ T X , whereÂ = Ψ T X Ψ Y ∈ R m×m andĜ = Ψ T X Ψ X ∈ R m×m . The entries of the matricesÂ andĜ can be computed efficiently byâ using the kernel function f . The computational cost for the eigenvector computation now depends on the number of snapshots m rather than the number of observables k. For a more detailed description, we refer to [59].

Duality
In this section, we will show how, given the eigenfunctions of the Koopman operator, the eigenfunctions of the adjoint Perron-Frobenius operator can be computed, or vice versa. The goal here is to illustrate the similarities between the different numerical methods presented in the previous sections and to adapt methods developed for one operator to compute eigenfunctions of the other operator. We will focus in particular on Ulam's method and EDMD.

Ulam's method and EDMD
Let us consider the case where the dictionary contains the indicator functions for a given box discretization where 1 n ∈ R n is the vector of all ones. The pseudoinverse of this matrix is Ψ + X = 1 n Ψ T X and the matrix M K = Ψ Y Ψ + X ∈ R k×k with entries m ij has the following form Comparing the entries m ij of M K with the entries p ij of P in (10), it turns out that M K = P T and thus P = K. That is, EDMD with indicator functions for a given box discretization computes the same finite-dimensional representation of the operators as Ulam's method.

Computation of the dual basis
For the finite-dimensional approximation, let ϕ i be the eigenfunctions of K and ϕ i the eigenfunctions of the adjoint operator P, i = 1, . . . , k. Since Kϕ i , ϕ j µ = λ i ϕ i , ϕ j µ and ϕ i , P ϕ j µ = λ j ϕ i , ϕ j µ , subtracting these two equations gives 0 = (λ i − λ j ) ϕ i , ϕ j µ . The left-hand side of the equation is zero due to the definition of the adjoint operator. Thus, if λ i = λ j , the scalar product must be zero. Furthermore, ϕ j can be scaled in such a way that ϕ i , ϕ i µ = 1. Hence, we can assume that ϕ i , ϕ j µ = δ ij .
Let now B = (b ij ) ∈ C k×k and C = (c ij ) ∈ C k×k . Define b ij = ϕ i , ϕ j µ and write It follows that the coefficients c ij have to be chosen such that C = B −1 . In order to obtain the matrix B, we compute Here, we assume that the matrix G is invertible. It follows that The drawback of this approach is that all the eigenvectors of the matrix M K need to be computed, which -for a large number of basis functions -might be prohibitively time-consuming. We are often only interested in the leading eigenfunctions.

EDMD for the Perron-Frobenius operator
EDMD as presented in Section 3 can also directly be used to compute an approximation of the eigenfunctions of the Perron-Frobenius operator. Since the entries of the matrix A T are given by Pψ i , ψ j µ . The matrices A and G are approximations of A and G, respectively. Thus, the eigenfunctions of the Perron-Frobenius operator can be approximated by computing the eigenvalues and left eigenvectors of Analogously, the generalized eigenvalue problem can be solved. We discuss an even more general way of approximating the adjoint operator in Appendix A.
Example 4.1. Let us compute the dominating eigenfunction of the Perron-Frobenius operator for the linear system introduced in Example 2.1. Note that the origin is a fixed point and we would expect the invariant density to be the Dirac distribution δ with center (0, 0). Using monomials of order up to 10 and thin plate splines of the form ψ(r) = r 2 ln r, where r is the distance between the point (x, y) and the center, respectively, we obtain the approximations shown in Figure 2. This illustrates that the results strongly depend on the basis functions chosen. EDMD will return only meaningful results if the eigenfunctions can be represented by the selected basis. One possibility to detect whether the chosen basis is sufficient to approximate the dominant eigenfunctions accurately is to add additional basis functions and to check whether the results remain essentially unchanged. Here, one should take into account that the condition number of the problem might deteriorate if a large number of basis functions is used. Another possibility is to compute the residual Ψ Y − K T Ψ X F . A large error indicates that the set of basis functions cannot represent the eigenfunctions accurately.
using the fact that G is symmetric.
This shows that the eigenfunctions of the Koopman operator are approximated by the left eigenvectors and the eigenfunctions of the Perron-Frobenius operator by the right eigenvectors of the generalized eigenvalue problem with the matrix pencil given by (A, G). The advantage of this approach is that arbitrary basis functions can be chosen to compute eigenfunctions of the Perron-Frobenius operator. This might be beneficial if the eigenfunctions can be approximated by a small number of smooth functions -for instance monomials, Hermite polynomials, or radial basis functions -whereas using Ulam's method a large number of indicator functions would be required.

Numerical examples
In this section, we will illustrate the different methods described in the paper using simple stochastic differential equations and molecular dynamics examples.

Double-well problem
Consider the following stochastic differential equation where w t,1 and w t,2 are two independent standard Wiener processes. In this example, the potential, shown in Figure 3a, is given by V (x, y) = (x 2 − 1) 2 + y 2 and σ = 0.7. Numerically, this system can be solved using the Euler-Maruyama method, which, for an SDE of the form can be written as where h is the step size and ∆w k = w k+1 − w k ∼ N (0, h). Here, N (0, h) denotes a normal distribution with mean 0 and variance h. A typical trajectory of system (17) is shown in Figure 3b. That is, Ulam's method requires 2500 parameters to describe the eigenfunctions while EDMD requires only 66. For each box, we generated n = 100 test points, i.e. 250000 test points overall, and used the same test points also for EDMD resulting in Ψ X , Ψ Y ∈ R 66×250000 . The system (17) is solved using the Euler-Maruyama method with a step size of h = 10 −3 . One evaluation of the corresponding dynamical system Φ corresponds to 10000 steps. That is, each initial condition is integrated from t 0 = 0 to t 1 = 10. The first two eigenfunctions of the Perron-Frobenius operator and Koopman operator are shown in Figure 4. Observe that the computed eigenvalues are -as expected -almost identical. The second eigenfunction computed with Ulam's method is still very coarse, increasing the number of test points per box would smoothen the approximation. Since for EDMD only smooth basis functions were chosen, the resulting eigenfunction is automatically smoothened.
The system has two metastable states and the second eigenfunction of the Perron-Frobenius operator can be used to detect these metastable states. Also the second eigenfunction of the adjoint Koopman operator contains information about a possible partitioning of the state space, it is almost constant in the y-direction and also almost constant in the x-direction except for an abrupt transition from −1 to 1 between the two metastable sets. The other eigenvalues of the system are numerically zero.

Triple-well problem
Consider the slightly more complex triple-well potential taken from [51]. Here, the variables x and y are coupled, i.e. the potential cannot be written as V (x, y) = V 1 (x) + V 2 (y) anymore. The potential function is shown in Figure 5 and the first two nontrivial eigenfunctions of the Perron-Frobenius operator and the Koopman operator in Figure 6. Note that the eigenfunction ϕ 2 separates the two deep wells at (−1, 0) and (1, 0) and is near zero for the well at (0, 1.

Molecular dynamics and conformation analysis
Classical molecular dynamics. Classical molecular dynamics describes the motion of atoms, or groups of atoms, in terms of Hamiltonian dynamics under the influence of atomic interaction forces resulting from a potential. The position or configuration space Q ⊂ R d describes all possible positions of the atoms, while the momentum space P = R d contains all momenta. The potential V : Q → R is assumed to be a sufficiently smooth function. The phase space X = Q × P of the molecule consists of all possible position-momenta pairs x = (q, p). The evolution of a molecule in phase space under ideal conditions is described by Hamilton's equations of motioṅ where M denotes the symmetric positive definite mass matrix. Since molecules do not stand alone, but are rather subject to interaction with their surrounding molecules, different models incorporating these interactions are more commonly used. One way to account for the collisions with the surrounding molecules is to include a damping and a stochastic forcing term in (19) to obtain the Langevin equation This is an SDE giving rise to a non-deterministic evolution, hence positions and momenta are random variables.
Here, w t is a standard Wiener process in R d . Further, γ and σ satisfy the fluctuation-dissipation relation 2γ = βσσ T , where 0 < β is called the inverse temperature. This is due to the fact that β = (k B T ) −1 , where T is the macroscopic temperature of the system, and k B is the Boltzmann constant. The fluctuation-dissipation relation ensures that the energy of the system is conserved in expectation. It can also be shown (cf. [38,51]) that the Langevin process, governed by (20), has a unique invariant density with respect to which it is ergodic. This density is also called the canonical density, and has the explicit form f can (q, p) = Spatial transfer operator. One of the main features of molecules we are interested in is that it has several important geometric forms, called conformations, between which it "switches". Hereby it spends "long" times (measured on the time scales of its internal motion) in one conformation, and passes quickly to another. Due to this time scale separation the conformations are called metastable. The identification of metastable conformations is of major interest, and it is connected to the sub-dominant eigenfunctions of a special transfer operator which is adapted to the problem at hand [50]: although the more appreciated models describe the dynamics of a molecule in the complete phase space including positions and momenta, metastability is observed (and described) in the positional coordinate only. This problem-adapted transfer operator is called the spatial transfer operator (cf. molecules with positional coordinates distributed according to the density w : Q → R with respect to the canonical distribution is given, then its image under the spatial transfer operator with lag time t describes the density of the positional coordinate of the ensemble after time t, again with respect to the canonical distribution: where f Q is the positional marginal of the canonical density, i.e. f Q (q) = P f can (q, p) dp, and P t Lan is the transfer operator of the Langevin process governed by (20). The operator S t : L 2 (Q, µ Q ) → L 2 (Q, µ Q ), where dµ Q (q) = f Q (q)dq, is self-adjoint (i.e. has pure real point spectrum), and due to the ergodicity of the Langevin process it possesses the isolated and simple eigenvalue 1 with corresponding eigenfunction 1 Q [2].
With the right chemical intuition at hand the range of positional coordinates possibly interesting for conformation analysis can be drastically reduced to just a handful of essential coordinates; as it is shown in Section 5.4. The spatial transfer operator can be adapted to this situation, as we describe in Appendix C. There we also show that if we carry out the EDMD procedure in the space of these reduced observables, we actually approximate a Galerkin projection of the corresponding reduced spatial transfer operator. A similar technique has been developed in [42,43]. Chekroun et al [13] also approximate a reduced transfer operator from observable time series from climate models, but only for the case where the basis functions are characteristic functions, as in Ulam's method.

n-butane
Let us now consider the n-butane molecule H 3 C−CH 2 −CH 2 −CH 3 shown in Figure 7 (drawn with PyMOL [49]). We want to analyze this molecule since the energy landscape and conformations are well-known. The four configurations illustrated in Figure 7 can be obtained by rotating around the bond between the second and third carbon atom. The potential energy of a molecule depends on the structure. The higher the potential energy of a conformation, the lower the probability the system will remain in that state. Thus, we would expect a high probability for the anti configuration, a slightly lower probability for the gauche configuration, and low probabilities for the other configurations. Indeed, the anti and gauche configurations are metastable conformations.  Molecular dynamics simulators are standard tools to analyze the conformations and conformational dynamics of biological molecules such as proteins and the extraction of this essential information from molecular dynamics simulations is still an active field of research [44]. We simulated the n-butane molecule for an interval of 10 ns with a step size of 2 fs using AmberTools15 [12] and, downsampling by a factor of 100, created one trajectory containing 50,000 data points. From this 42-dimensional trajectory -3 coordinates for each of the 14 atoms -, we extracted the dihedral angle ϕ shown in Figure 8 as cos ϕ = n 1 · n 2 n 1 n 2 , where n 1 = v ij × v jk and n 2 = v lk × v jk are the vectors perpendicular to the planes spanned by the carbon atoms i, j, k and j, k, l, respectively, and v ij is the bond between i and j.
In order to compute the dominant eigenfunctions of the spatial transfer operator for this one essential coordinate, we used 41 basis functions {1, cos(i x), sin(i x)}, i = 1, . . . , 20, for the interval [0, 2 π]. The resulting leading eigenfunctions are shown in Figure 9. As expected, the first eigenfunction predicts high probabilites for the gauche and anti configurations and low probabilites for the other configurations. The (sign) structure of the second and third eigenfunctions contain information about the metastable sets.

Conclusion
The global behavior of dynamical systems can be analyzed using operator-based approaches. We reviewed and described different, projection-based numerical methods such as Ulam's method and EDMD to compute finitedimensional approximations of the Perron-Frobenius operator and the Koopman operator. Furthermore, we highlighted the similarities and differences between these methods and showed that methods developed for the approximation of the Koopman operator can be used for the Perron-Frobenius operator, and vice versa. We demonstrated the performance of different methods with the aid of several examples. If the eigenfunctions of the Perron-Frobenius operator or Koopman operator are smooth, EDMD enables an accurate approximation with a small number of basis functions. Thus, this approach is well suited also for higher-dimensional problems.
The next step could be to investigate the possibility of extending the methods reviewed within this paper using tensors as described in [44] for reversible processes. Currently, not all numerical methods required for generalizing these methods to tensor-based methods are available. Nevertheless, developing tensor-based algorithms for these eigenvalue problems might enable the analysis of high-dimensional systems. a vector-valued function, and for sets of points (collected column-wise into a d × m matrix) Scalar products.
Given f, g ∈ R k and some positive measure µ, such that | ψ i ψ j dµ| < ∞ for all i, j = 1, . . . , k, we wish to express the µ-weighted L 2 scalar products of elements of V. To this end, we compute where S ∈ R k×k with S ij = ψ i ψ j dµ. Since µ is positive, S is symmetric positive definite, hence invertible.

Adjoint operator.
With this, we are ready to express the adjoint A * of any (linear) operator A : V → V with respect to the scalar product ·, · µ . By successive reformulations of the defining equation for the adjoint, we obtain

Thus,
Remark A.1. From (23) we can see that A T represents the adjoint of A if S is a multiple of the identity matrix, implying that the basis functions are orthogonal with respect to ·, · µ . This is the case for Ulam's method, given the boxes have all the same measures.
Let Φ : R d → R d be some dynamical system. The following properties hold also, if Φ, such as the basis functions and the measure µ are restricted to some set X.
Recall equations (1) and (4), stating that the Perron-Frobenius operator P µ : L 1 → L 1 with respect to the measure µ is (uniquely) defined by h dµ, for all measurable A , and the Koopman operator K : L ∞ → L ∞ is defined by respectively. They satisfy the duality relation We have seen in section 3.4, that if the data points satisfy y i = Φ(x i ), i = 1, . . . , m, then K, with K T = Ψ Y Ψ + X , is a data-based approximation of the Koopman operator. More precisely, in the infinite-data limit m → ∞, x i ∼ µ, the operator K converges to a Galerkin approximation of K on V with respect to ·, · µ . Using (15), we can also conclude that where S is the symmetric positive definite weight matrix from above. This suggests, using (23), that if there is a sufficient amount of data points at hand, then we can approximate the Galerkin projection of the Perron-Frobenius operator P µ to V by The same matrix representation has been obtained in equation (16), by a different consideration. Note also, that if one can compute the matrix S with S ij = ψ i ψ j dρ with respect to a different measure ρ, the Perron-Frobenius operator with respect to ρ can be approximated as well, one is not restricted to use the empirical distribution µ of the data points.
Remark A.2. All these considerations can be extended to the case where the dynamics Φ is non-deterministic.

B On the ergodic behavior of one-step pairs
We will need the result of this section, equation (25), in the following section. Let the non-deterministic dynamical system Φ be given with transition density function k, that is, for a.e. y ∈ X. Further, let f denote the unique invariant density of Φ, with respect to which Φ is geometrically ergodic. Geometric ergodicity of the Langevin process (20) has been established in [38]. For φ, ψ ∈ L 2 (X) we wish to determine the ergodic limit To this end, we consider the non-deterministic dynamical system Ψ : .
In order to find the transition density function of Ψ, note that yielding k Ψ ((x, y), (u, z)) = δ y (u)k(u, z) as the transition density function of Ψ. From this we immediately find its invariant density.
Proof. Direct computation shows the last equality following from the invariance of f under Φ.

C EDMD for the reduced spatial transfer operator
We shall first discuss the restriction of the spatial transfer operator, introduced in (21), to a collection of coordinates which we assume to be sufficient to describe the metastable behavior of the system. Let ξ : Q → U ⊂ R r be a smooth, possibly nonlinear mapping of the configuration variable q to these so-called essential (or reduced ) coordinates. For instance, in case of n-butane in Section 5.4 we have r = 1 and ξ describes the mapping q → ϕ given implicitly by (22). Let ξ have the property that for every regular value z ∈ U of ξ, is a smooth, codimension r manifold. We suppose that ξ is a physically relevant observable of the dynamics, e.g. a reaction coordinate.
To define the spatial transfer operator for the essential coordinates, we need a nonlinear variant of Fubini's theorem, the so-called coarea formula [21,Section 3.2]. For an integrable function h : Q → R it holds where G(q) = det ∇ξ T ∇ξ −1/2 is the Gramian, and dσ z denotes the Riemannian volume element on M z . It follows that the (marginal) canonical density for the observable ξ is f U (z) = Mz×P f can G dσ z dp .
Comparing with (15), we thus see that EDMD converges in the infinite-data limit to a Galerkin projection in L 2 (U, µ U ) of the spatial transfer operator for the essential coordinates given by ξ.

D Derivation of the EDMD-discretized Koopman operator
Let the finite dictionary D = {ψ 1 , . . . , ψ k } of piecewise continuous functions be given, and define V to be the linear space spanned by D. We will give a step-by-step derivation of the matrix representation of the EDMD-discretized Koopman operator K : V → V with respect to the basis D. Let us denote also with K ∈ R k×k this matrix representation, and note that the matrix K acts by multiplication from the left, i.e. if the vector c ∈ R k represents the function i c i ψ i , then K c represents its image under the discrete Koopman operator. Recall that ψ : X → R k denotes the column-vector valued function with [ψ(x)] i = ψ i (x). Now, EDMD is an over-determined Petrov-Galerkin method (7), where x 1 . . . , x l are the initial data points and y 1 , . . . , y l denote their images under the dynamics. If there was just one single data point x , we would like to find a matrix K satisfying the equation for every c ∈ R k . Rearranging the terms and using Kψ i (x ) = ψ i (y ) yields or, in vectorial notation, c T ψ(y ) = c T K T ψ(x ). Since this has to hold true for every c ∈ R k , we have ψ(y ) = K T ψ(x ). From this it follows by putting the column vectors ψ(x ) and ψ(y ) side-by-side for multiple data points x to form the matrices Ψ X and Ψ Y , respectively, that (28) is equivalent with where · F denotes the Frobenius norm. Thus, EDMD can be viewed as a DMD of the transformed data Ψ X and Ψ Y . The solution of the minimization problem is given by where Ψ + X is the pseudoinverse of Ψ X .