A KERNEL-BASED METHOD FOR DATA-DRIVEN KOOPMAN SPECTRAL ANALYSIS

Abstract. A data-driven, kernel-based method for approximating the leading Koopman eigenvalues, eigenfunctions, and modes in problems with high-dimensional state spaces is presented. This approach uses a set of scalar observables (functions that map a state to a scalar value) that are defined implicitly by the feature map associated with a user-defined kernel function. This circumvents the computational issues that arise due to the number of functions required to span a "sufficiently rich" subspace of all possible scalar observables in such applications. We illustrate this method on two examples: the first is the FitzHugh-Nagumo PDE, a prototypical one-dimensional reaction-diffusion system, and the second is a set of vorticity data computed from experimentally obtained velocity data from flow past a cylinder at Reynolds number 413. In both examples, we use the output of Dynamic Mode Decomposition (DMD), which has a similar computational cost, as the benchmark for our approach.

1. Introduction. In many applications, the evolution of complex spatiotemporal phenomena can be characterized using models based on the interactions between a relatively small number of modes. Due to the availability of data and computational power, algorithmic techniques for identifying such modes have become increasingly common [15,8,29,28,26,6,35,36,17,9,34,32,10,20]. Perhaps the best known method is the Proper Orthogonal Decomposition (POD) [15,8,4], which is related to Principal Component Analysis (PCA) [4,20]. However, other algorithms that generate different sets of modes exist. In recent years, approximations of the modes of the Koopman operator [19,18] have become popular in fluid flow applications [26,6,24,1,9,36,14].

We apply the kernel-based approach to a pair of sample problems: the one-dimensional FitzHugh-Nagumo PDE, a prototypical reaction-diffusion system, and experimentally obtained vorticity data from flow over a cylinder at Reynolds number 413. Using the FitzHugh-Nagumo PDE example, we will demonstrate that using a higher-dimensional subspace of observables than DMD can enable Koopman-based algorithms to more accurately and consistently reproduce the leading Koopman modes, eigenvalues, and eigenfunctions. The cylinder example represents a typical use case for methods like DMD, and our objective here is to demonstrate that the kernel-based approach has practical advantages in more realistic settings where the data are noisy and the "true" Koopman eigenvalues and modes are unknown.
The remainder of the manuscript is outlined as follows: in Sec. 2 we present the kernel reformulation of Extended Dynamic Mode Decomposition. In Sec. 3, we apply it to a numerical discretization of the FitzHugh-Nagumo PDE in one spatial dimension, where a subset of the Koopman eigenvalues and modes are known. In Sec. 4, we apply our method to experimentally obtained data of flow over a cylinder, which is a more realistic example with measurement noise and other experimental realities. Since this example is a more typical application area for DMD and similar methods, we compare the computational cost of our method, as measured in terms of wall-clock time, against a DMD benchmark. Finally, in Sec. 5, we present some concluding remarks and future outlook.

2. A data-driven approximation of the Koopman operator.
In this section, we present a data-driven approach for approximating the Koopman operator that can be applied to systems with high-dimensional state spaces. The method presented here is a reformulation of the Extended DMD procedure that makes use of the so-called kernel trick [30]. In this section we will: (i) briefly review the Koopman operator, (ii) give a short derivation of "standard" Extended DMD, which enables the "tuples" of Koopman eigenvalues, eigenfunctions, and modes to be approximated from data, and (iii) present the kernel-based reformulation of the approach.
2.1. The Koopman operator. Although we will focus on data-driven approximations throughout this manuscript, the Koopman operator [19,18], along with its eigenvalues, eigenfunctions, and modes, is defined by a dynamical system and not a set of data.
Definition 1 (Koopman operator). Given the discrete time dynamical system (n, M, F), where n ∈ Z is time, M ⊆ R^N is the state space, and x ↦ F(x) defines the dynamics, the action of the Koopman operator, K, on a scalar observable, φ : M → C, is

(Kφ)(x) = (φ ∘ F)(x) = φ(F(x)).   (1)

Intuitively, Kφ is a new scalar-valued function that gives the value of φ "one step in the future." Note that the Koopman operator acts on functions of the state, and not the states themselves. Since φ ∈ F, where F is an appropriate space of scalar observables, the Koopman operator is infinite dimensional. However, it is also linear, and can have eigenvalues and eigenfunctions, which we refer to as Koopman eigenvalues and eigenfunctions. Accompanying the eigenvalues and eigenfunctions are the Koopman modes for a given vector-valued observable, g : M → R^{N_o}, where N_o ∈ N. These modes are vectors in a system of ODEs (or spatial profiles in a PDE) that contain the coefficients required to construct g using a Koopman eigenfunction basis [26,35]. One particularly useful set of modes is that of the full state observable, g(x) = x, which we refer to as simply the Koopman modes in all that follows.
In many systems [26,23,24], tuples consisting of an eigenvalue, an eigenfunction, and a mode enable a simple yet powerful means of representing the system state and predicting future state values. In particular,

x_n = Σ_{k=1}^{N_K} µ_k^n ϕ_k(x_0) ξ_k,   (2)

where ξ_k is the Koopman mode corresponding to the eigenfunction ϕ_k, µ_k is the corresponding eigenvalue, and N_K is the number of tuples required for a practically useful reconstruction, which could be infinite. The eigenfunction ϕ_k : M → C is a function, but the mode ξ_k ∈ C^N is a vector. From (2), the coefficient associated with the k-th mode, ξ_k, is obtained by evaluating the k-th eigenfunction, ϕ_k, and the temporal evolution is dictated by µ_k. Ultimately, one benefit of this approach is that the Koopman eigenvalues, eigenfunctions, and modes are intrinsic to the dynamical system, so unlike the principal components or the outputs of most other data-driven dimensionality reduction methods, they are defined absent any data.

2.2. Extended dynamic mode decomposition.
Extended Dynamic Mode Decomposition (Extended DMD) [35] is a regression procedure whose solution produces a finite-dimensional approximation of the Koopman operator. To do this, Extended DMD must be provided with a set of N_K scalar observables, which we denote as ψ_k : M → R for k = 1, …, N_K, that span F_{N_K} ⊂ F. In addition to defining a subspace of the scalar observables, the ψ_k also define a mapping from state space to feature space:

Definition 2 (Feature space). Suppose the space of scalar observables is approximated using N_K functions; then feature space is R^{N_K}. The feature map from state space to a subset of feature space is determined by

ψ(x) = [ψ_1(x), ψ_2(x), …, ψ_{N_K}(x)]^T,   (3)

which is simply the set of scalar observables "stacked" into a column vector.

Remark 1.
Due to the connection between state space and the definition of feature space, the value of any pair of functions φ, φ̂ ∈ F_{N_K} at x ∈ M can be written as the inner product of a coefficient vector, a or â ∈ C^{N_K}, and an element of feature space:

φ(x) = ψ(x)^T a,   φ̂(x) = ψ(x)^T â.   (4)

Our goal is to capture the evolution of a scalar observable in the subspace F_{N_K}. In practice, F is not known explicitly, and all we have access to is a data set of snapshot pairs:

{(x_m, y_m)}_{m=1}^{M},   (5)

where x_m, y_m ∈ M. One important special case of such a data set is a single time series of data, which can be expressed in the above form by "grouping" sequential pairs of snapshots that were obtained with a fixed sampling interval ∆t.
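For instance, the "grouping" of a single time series into snapshot pairs is a one-line operation (a minimal sketch; the function name and array layout are illustrative, with one snapshot per row):

```python
import numpy as np

def snapshot_pairs(timeseries):
    """Group a single time series (one snapshot per row) into (x_m, y_m)
    pairs by pairing each snapshot with its immediate successor."""
    X = timeseries[:-1]   # x_m
    Y = timeseries[1:]    # y_m = F(x_m)
    return X, Y

# A time series of 5 snapshots yields 4 snapshot pairs.
ts = np.arange(10.0).reshape(5, 2)
X, Y = snapshot_pairs(ts)
```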
Then, let φ̂ = Kφ + r, where r ∈ F is a residual function that appears because F_{N_K} is not necessarily invariant to the action of the Koopman operator. Using the notation in (4), the objective of the Extended DMD procedure [35] is to use the data in (5) to identify a mapping K ∈ R^{N_K × N_K} from some given coefficient vector, a, to a new vector, â, such that this residual is minimized. One procedure that does this is:

Algorithm 1 (Conceptual procedure for Extended DMD).
1. Using the data set of snapshot pairs in (5) and the set of scalar observables aggregated into ψ, compute the observation matrices

Ψ_x = [ψ(x_1), ψ(x_2), …, ψ(x_M)]^T,   Ψ_y = [ψ(y_1), ψ(y_2), …, ψ(y_M)]^T,   (6)

whose m-th rows are ψ(x_m)^T and ψ(y_m)^T, respectively.
2. Compute

K = Ψ_x^+ Ψ_y,   (7)

where + denotes the pseudoinverse obtained using the SVD.
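As a deliberately tiny sketch of Algorithm 1 (an illustration, not the authors' implementation), assume a two-dimensional state and a hypothetical dictionary of monomials up to degree two; for linear dynamics this dictionary is Koopman-invariant, so the eigenvalues of K are recovered exactly:

```python
import numpy as np

def psi(x):
    """A hypothetical dictionary for a 2-D state: monomials up to degree 2."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def edmd(X, Y):
    """Algorithm 1: K = pinv(Psi_x) @ Psi_y, with one snapshot per row."""
    Psi_x = np.array([psi(x) for x in X])
    Psi_y = np.array([psi(y) for y in Y])
    return np.linalg.pinv(Psi_x) @ Psi_y

# Linear dynamics y = A x: the Koopman eigenvalues restricted to this
# dictionary are 1, the eigenvalues of A, and their pairwise products.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
X = rng.standard_normal((50, 2))
Y = X @ A.T
K = edmd(X, Y)
```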

Remark 2.
While this procedure is extremely simple, it is only applicable to problems with moderate numbers of snapshots and scalar observables, as both Ψ_x and Ψ_y, which are typically dense matrices, must be stored in their entirety. As we will show, identical results can be obtained without explicitly forming Ψ_x and Ψ_y.
In Ref. [35], the authors considered the case where M ≫ N_K, and proposed the following computational procedure:

Algorithm 2 (Extended DMD).

1. Form the matrices of outer products

G = (1/M) Σ_{m=1}^{M} ψ(x_m) ψ(x_m)^T,   A = (1/M) Σ_{m=1}^{M} ψ(x_m) ψ(y_m)^T,

by evaluating ψ(x_m) and ψ(y_m), computing the outer products of the appropriate pair of vectors, and updating the entries of G and A. By constructing G and A term by term, we avoid the need to explicitly form the Ψ_x and Ψ_y matrices.
2. Compute the pseudoinverse of G, and let

K = G^+ A.   (8)

Remark 3. Because EDMD is a least-squares problem, the matrix K in (7) is equivalent to the definition in (8). Equation (7) identifies K by solving the least squares problem using the pseudoinverse, while (8) uses the normal equations.
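The term-by-term assembly in Algorithm 2 can be sketched as follows (illustrative code, not the authors' implementation); because of the least-squares equivalence noted in Remark 3, the result agrees with Ψ_x^+ Ψ_y:

```python
import numpy as np

def edmd_streaming(X, Y, psi):
    """Algorithm 2: accumulate G and A one snapshot pair at a time, so
    Psi_x and Psi_y are never stored, then form K = pinv(G) @ A."""
    n = psi(X[0]).size
    G = np.zeros((n, n))
    A = np.zeros((n, n))
    M = len(X)
    for x, y in zip(X, Y):
        px, py = psi(x), psi(y)
        G += np.outer(px, px) / M   # G accumulates psi(x_m) psi(x_m)^T
        A += np.outer(px, py) / M   # A accumulates psi(x_m) psi(y_m)^T
    return np.linalg.pinv(G) @ A
```

For a dictionary ψ and snapshot data X, Y, `edmd_streaming(X, Y, psi)` matches the pseudoinverse formulation up to numerical precision, while only ever storing the two N_K × N_K matrices.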
Regardless of the method, the properties of K of interest here are:
1. The k-th eigenvalue of K, µ_k, is an approximation of an eigenvalue of the Koopman operator. When the data are generated by sampling a continuous time dynamical system with a fixed sampling interval ∆t, we also define the approximation of the continuous time eigenvalue as λ_k ≜ log(µ_k)/∆t.
2. Using (4), the corresponding eigenvector, v_k, approximates an eigenfunction of the Koopman operator via

ϕ_k(x) = ψ(x)^T v_k.   (9)

3. The left eigenvector, w_k, can be used to approximate the Koopman mode, ξ_k.
For a derivation of these relationships, see Ref. [35].
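The discrete-to-continuous eigenvalue conversion is a one-liner; the sampled values below are illustrative:

```python
import numpy as np

# Convert a discrete-time Koopman eigenvalue mu to its continuous-time
# counterpart lambda = log(mu) / dt, and note that the map is invertible.
dt = 0.1
mu = 0.95 * np.exp(1j * 0.2)   # example discrete-time eigenvalue
lam = np.log(mu) / dt          # continuous-time eigenvalue
```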
2.3. The kernel method. In Sec. 2.2, we considered the case where the number of snapshots was large compared to the dimension of the desired subspace of functions. Now we will consider the opposite and more commonly encountered regime [28,29,27,26,6,24,1,9,36,14], where the number of snapshots is small compared to the dimension of our subspace of scalar observables (i.e., M ≪ N_K).

Remark 4.
The difficulty that appears when N_K ≫ M is that the Extended DMD procedure, as formulated in Sec. 2.2, requires O(N_K^2 M) time to assemble K and O(N_K^3) time to compute the modes and eigenvalues. However, the value of N_K for a "rich" set of basis functions grows rapidly as the dimension of state space increases, which makes this approach computationally infeasible.

Example 1.
Consider the case where F_{N_K} is the space of all (multivariate) polynomials on R^256 with degree up to 20. Then N_K ∼ 10^30, which is far too large for practical computations [4]. This explosion in the size of the problem is common in machine learning applications, and is one facet of the curse of dimensionality.
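The count in this example follows from the standard "stars and bars" formula: the number of monomials in N variables with total degree at most α is C(N + α, α). A quick check:

```python
from math import comb

def dictionary_size(n_vars, max_degree):
    """Number of monomials in n_vars variables of total degree <= max_degree."""
    return comb(n_vars + max_degree, max_degree)

size = dictionary_size(256, 20)   # on the order of 10**30
```

Note that `dictionary_size(2, 2)` is 6, matching the six-dimensional feature space of the quadratic kernel discussed below.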
Similar to the approach taken by Schmid [27], we make the problem tractable by computing K̂, a matrix that is similar to K and therefore allows the non-zero eigenvalues of K and their corresponding eigenvectors to be computed. To define K̂, let the (economy-sized) singular value decomposition of Ψ_x be

Ψ_x = QΣZ^T,   (10)

where Q, Σ ∈ R^{M×M} and Z ∈ R^{N_K×M}. Then K̂ is defined as

K̂ ≜ (Σ^+ Q^T) Â (Q Σ^+),   where Â ≜ Ψ_y Ψ_x^T.   (11)

The pair µ ≠ 0 and v̂ are an eigenvalue and eigenvector of K̂ if and only if µ and v = Zv̂ are an eigenvalue/eigenvector pair of K. To see this, note that K̂ = Z^T K Z with Z^T Z = I, and that the range of K is contained in the image of Z, so K̂v̂ = µv̂ implies KZv̂ = ZK̂v̂ = µZv̂; hence if v̂ is an eigenvector of K̂, then v = Zv̂ is an eigenvector of K. Conversely, any eigenvector of K with a non-zero eigenvalue must lie in the range of Ψ_x^T, and therefore the image of Z. As a result, if v is such an eigenvector of K, then v̂ = Z^T v is an eigenvector of K̂.
If we could form K̂, then we could compute the non-zero eigenvalues and the corresponding left and right eigenvectors of K, which would yield the desired approximations of the Koopman eigenvalues, eigenfunctions, and modes. However, this has simply moved the computational bottleneck from the eigendecomposition of K to computing the SVD of Ψ_x, which is an O(N_K M^2) task. While this is an improvement, due to the (possibly extremely large) size of N_K, even linear dependence on the number of observables will make the resulting problem computationally intractable.
In order to develop a more efficient algorithm, we must first demonstrate that the scalar observables do not need to be defined explicitly. More concretely:

Remark 5. The matrix K̂ in (11) can be obtained from matrices whose elements are inner products in feature space evaluated at pairs of data points. To demonstrate this, we first define the matrix Ĝ ≜ Ψ_x Ψ_x^T, and reuse the matrix Â ≜ Ψ_y Ψ_x^T from (11). Despite appearing "flipped," the ij-th elements of Ĝ and Â are the inner products

Ĝ_ij = ψ(x_i)^T ψ(x_j),   Â_ij = ψ(y_i)^T ψ(x_j).   (12)

However, Ĝ = QΣ^2 Q^T, using the definition of Q and Σ in (10). Therefore, given Ĝ we can obtain Q and Σ via its eigendecomposition. With Â, Q, and Σ in hand, K̂ can be constructed using (11). Note that this is simply the method of snapshots approach for computing the POD modes, performed in feature space [31].
With this property established, we can now use the kernel trick:

Definition 3 (The kernel trick). The kernel trick is a common technique for efficiently computing inner products in an implicitly defined reproducing kernel Hilbert space [7,3,30,25]. A kernel function has the form f : M × M → R, and is associated with a feature map ψ, which maps from state space to feature space. The relationship between f and ψ is that

f(x, z) = ψ(x)^T ψ(z).   (13)

Example 2 (A quadratic kernel). The simplest example is the polynomial kernel f(x, z) = (1 + x^T z)^2 with x, z ∈ R^2, which, when expanded, is

f(x, z) = 1 + 2x_1 z_1 + 2x_2 z_2 + x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = ψ(x)^T ψ(z),

where ψ(x) = [1, √2 x_1, √2 x_2, x_1^2, √2 x_1 x_2, x_2^2]^T. As a result, one could compute inner products in feature space using either the kernel function or the mapping, ψ, identified above. The results are identical, but the kernel approach requires an inner product of vectors in R^2 while the direct approach requires an inner product in R^6.
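This equivalence is easy to verify numerically (a small sketch using the quadratic kernel above; the sample points are arbitrary):

```python
import numpy as np

def kernel(x, z):
    """Quadratic polynomial kernel: evaluated with only R^2 operations."""
    return (1.0 + x @ z) ** 2

def feature_map(x):
    """The explicit feature map associated with the quadratic kernel (R^2 -> R^6)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])
# kernel(x, z) equals the R^6 inner product feature_map(x) . feature_map(z).
```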
There are libraries of kernel functions, each of which is associated with a different implicitly defined feature map. In this paper, we use the following kernels:

Definition 4 (Polynomial kernel). Polynomial kernels have the form

f(x, z) = (1 + x^T z / d)^α,   (14)

which is equivalent to a basis set that can represent all polynomials up to and including terms of degree α, but takes only O(N) time to evaluate. The addition of the d parameter scales the data by a constant factor, and affects the relative importance of higher order terms in the polynomial expansion.
In the example that follows in Sec. 3, we use a polynomial kernel with α = 20. We selected a polynomial kernel because the Koopman eigenfunctions are often analytic in a disk about a fixed point [22], and polynomial kernels mimic an α-th order power-series expansion. The specific choice of α = 20 is more ad hoc. In general, large values of α use a "richer" set of basis functions, but also deleteriously impact the condition number of Ĝ. The choice of α = 20 appeared to strike a balance between having the "right" basis functions and avoiding numerical issues.
Definition 5 (Gaussian kernel). Gaussian kernels have the form

f(x, z) = exp(−‖x − z‖^2 / σ^2),   (15)

which we include because it is commonly used in machine learning applications [4,11,30], and because it highlights the performance of a different possible choice of the kernel. Similar to d, σ is a scaling parameter that is dependent on the characteristic length scales of the underlying problem.
This choice will result in different Koopman eigenvalues, eigenfunctions, and modes, and may yield superior (or inferior) results compared to the polynomial kernel. Ultimately, the choice of kernel is equivalent to the choice of dictionary elements in EDMD. The goal is to choose a kernel whose implicit reproducing kernel Hilbert space contains the Koopman eigenfunctions of interest, but the optimal choice is problem dependent and remains an open research question.
Once a kernel has been chosen, the procedure for obtaining an approximation of the Koopman operator is as follows:
1. Using the data set of snapshot pairs and the kernel function, f, compute the elements of Ĝ and Â using

Ĝ_ij = f(x_i, x_j),   Â_ij = f(y_i, x_j).   (16)

2. Compute the eigendecomposition of the Gramian Ĝ = QΣ^2 Q^T to obtain Q and Σ.
3. Form K̂ = (Σ^+ Q^T) Â (Q Σ^+), as in (11).
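A compact sketch of these steps (an illustration, not the authors' code), using the quadratic kernel from Example 2 so that the result can be cross-checked against explicit Extended DMD with the corresponding feature map:

```python
import numpy as np

def kernel_dmd(X, Y, f, tol=1e-10):
    """Build Ghat and Ahat from kernel evaluations, eigendecompose
    Ghat = Q Sigma^2 Q^T, and form Khat = Sigma^+ Q^T Ahat Q Sigma^+."""
    M = len(X)
    Ghat = np.array([[f(X[i], X[j]) for j in range(M)] for i in range(M)])
    Ahat = np.array([[f(Y[i], X[j]) for j in range(M)] for i in range(M)])
    s2, Q = np.linalg.eigh(Ghat)            # eigenvalues of Ghat are Sigma^2
    keep = s2 > tol * s2.max()              # discard numerically zero directions
    Q, sigma = Q[:, keep], np.sqrt(s2[keep])
    Khat = (Q.T @ Ahat @ Q) / np.outer(sigma, sigma)
    return Khat, Q, sigma

f = lambda x, z: (1.0 + x @ z) ** 2         # quadratic kernel from Example 2
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
Y = 0.9 * X                                 # simple contracting dynamics
Khat, Q, sigma = kernel_dmd(X, Y, f)
```

By the similarity argument above, the non-zero eigenvalues of K̂ coincide with those of K computed from the explicit six-dimensional feature map.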
Remark 6. The practical advantage of this formulation is that K̂ can be assembled in O(M^2 N) time rather than O(M^2 N_K) time. However, because ψ has not been defined explicitly, we must adjust the approach for Koopman mode and eigenfunction computation, because the method in [35] assumes ψ is known.
Next we will show how to compute the values of the Koopman eigenfunctions and modes with the restriction that ψ has been defined implicitly through the choice of the kernel function. More concretely, we will demonstrate that the needed operations in feature space can be expressed entirely in terms of inner products, which will then be evaluated using the kernel function.

Remark 7.
The values of the Koopman eigenfunctions can be evaluated at the supplied data points using the (right) eigenvectors of K̂ and (9). Let V̂ be the matrix whose columns are the eigenvectors of K̂. Substituting this matrix into (9) results in the matrix of eigenfunction values

Φ_x = Ψ_x V = QΣV̂,   (17)

where the i-th row of Φ_x contains the numerically computed eigenfunctions evaluated at x_i. Because Q and Σ are obtained as byproducts of the solution process, the Koopman eigenfunctions can be evaluated at the data points without explicit knowledge of ψ.
This formulation once again exploits the property that the eigenfunctions can be written as the inner product of two vectors in feature space, which is computed using the selected kernel. However, because Φ_x evaluates the eigenfunctions at the data points, there is a significant amount of cancellation/pre-computation, which results in the simple expression shown above.

Remark 8.
The k-th numerically approximated Koopman eigenfunction can also be evaluated at a new data point, x. Substituting v_k = Zv̂_k into (9) yields

ϕ_k(x) = ψ(x)^T Z v̂_k = [f(x, x_1), f(x, x_2), …, f(x, x_M)] QΣ^+ v̂_k,   (18)

which uses the same reasoning as in (17), though without the simplifications and cancellations that occur in that case.
Next, we focus on the computation of the Koopman modes. In [35], it was assumed that the state itself could be written as a linear combination of the ψ_k with known coefficients. Because we have not defined ψ explicitly, this assumption will not hold in the kernel case. Instead, we select the modes to minimize the error associated with reconstructing the state data from the Koopman eigenfunction embedding, i.e.,

Ξ = arg min_{Ξ̃} ‖X − Φ_x Ξ̃‖_F,   (19)

where the m-th row of X ∈ R^{M×N} is x_m^T, and the k-th row of Ξ is the k-th Koopman mode, ξ_k^T. Using (17), the solution is

Ξ = Φ_x^+ X = V̂^{-1} Σ^+ Q^T X,   (20)

provided that V̂ has full rank. As a result, given the eigenvectors of K̂ and the matrices used to construct it, we can also approximate the modes.

Remark 9.
Provided K̂ has a full set of eigenvectors, the inverse of V̂ is

V̂^{-1} = Ŵ*,   (21)

where the i-th row of Ŵ*, ŵ_i^*, is a left eigenvector of K̂ scaled so that ŵ_i^* v̂_j = δ_ij. This implies that the k-th Koopman mode, ξ_k, is

ξ_k = (ŵ_k^* Σ^+ Q^T X)^T.   (22)

Therefore, approximate Koopman modes and eigenfunctions can be obtained via the left and right eigenvectors of K̂, and do not require a complete eigendecomposition of K̂. For the problems considered here, a complete decomposition of K̂ is performed, but for problems with larger numbers of snapshots, Krylov methods could be used to compute a leading subset of its eigenvalues and vectors [21]. We give a short example of the speedup possible using this approach in Sec. 4.
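The eigenfunction and mode formulas of Remarks 7-9 can be sketched end to end; this illustration again assumes the quadratic kernel and simple contracting dynamics, chosen purely so that the reconstruction can be checked:

```python
import numpy as np

def koopman_tuples(X, Y, f, tol=1e-10):
    """Return eigenvalues mu, eigenfunction values Phi_x = Q Sigma Vhat (17),
    and modes (rows of Xi, cf. (22)), from kernel evaluations only."""
    M = len(X)
    Ghat = np.array([[f(X[i], X[j]) for j in range(M)] for i in range(M)])
    Ahat = np.array([[f(Y[i], X[j]) for j in range(M)] for i in range(M)])
    s2, Q = np.linalg.eigh(Ghat)
    keep = s2 > tol * s2.max()
    Q, sigma = Q[:, keep], np.sqrt(s2[keep])
    Khat = (Q.T @ Ahat @ Q) / np.outer(sigma, sigma)
    mu, Vhat = np.linalg.eig(Khat)
    Phi_x = (Q * sigma) @ Vhat                      # eigenfunctions at the data
    Xi = np.linalg.solve(Vhat, (Q / sigma).T @ X)   # rows are the Koopman modes
    return mu, Phi_x, Xi

f = lambda x, z: (1.0 + x @ z) ** 2
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 2))
Y = 0.9 * X
mu, Phi_x, Xi = koopman_tuples(X, Y, f)
# Because the state lies in the span of the quadratic features,
# Phi_x @ Xi reconstructs X (up to numerical precision).
```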
3. Example: The FitzHugh-Nagumo PDE. In order to highlight the effectiveness of the kernel method, we first apply it to the FitzHugh-Nagumo PDE [12] in one spatial dimension, using DMD as the benchmark. This example is particularly useful because a subset of the true Koopman eigenvalues and modes can be deduced from the system linearization. The governing equations are

∂v/∂t = ∂²v/∂x² + v − v³ − w,
∂w/∂t = δ ∂²w/∂x² + ε(v − c_1 w − c_0),   (23)

where v is the activator field, w is the inhibitor field, c_0 = −0.03, c_1 = 2.0, δ = 4.0, ε = 0.02, and x ∈ [0, 20] with Neumann boundary conditions. Both v and w are approximated using a discrete cosine transform-based spectral method with 128 basis functions each [5].
With these parameter values, (23) has a stable equilibrium point that resembles a standing wave front in both the activator and inhibitor fields. In order to explore a subset of state space and to mimic the presence of impulsive actuation, every 25 time units we perturb the system state by adding to v(x, t) three localized bumps centered at x_1 = 7.5, x_2 = 10, and x_3 = 12.5, with random amplitudes u_1, u_2, u_3 ∼ N(0, 0.1²); w(x, t) remains unchanged. Starting from the equilibrium point, we generate five separate trajectories consisting of 2500 snapshots (or 2499 snapshot pairs) with a sampling interval of ∆t = 1, starting with five different random seeds. Figure 1 shows the v and w data for a prototypical trajectory: the perturbations applied every 25 units in time excite "fast" directions that decay almost immediately (and thus are not visible in the figure) as well as "slow" directions that take the form of "shifts" in the position of the wave front.
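A minimal sketch of this data-generation loop, using an explicit finite-difference integrator with Neumann boundaries in place of the paper's spectral method, and Gaussian bumps as a stand-in for the perturbation profile (the grid size, time step, and bump width are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Illustrative discretization (the paper uses a 128-mode DCT spectral method).
nx, L = 128, 20.0
dx = L / (nx - 1)
xg = np.linspace(0.0, L, nx)
c0, c1, delta, eps = -0.03, 2.0, 4.0, 0.02

def laplacian(u):
    """Second difference with Neumann (zero-flux) boundaries via ghost points."""
    g = np.empty(u.size + 2)
    g[1:-1], g[0], g[-1] = u, u[1], u[-2]
    return (g[2:] - 2.0 * g[1:-1] + g[:-2]) / dx**2

def step(v, w, dt):
    """One explicit Euler step of the FitzHugh-Nagumo system (23)."""
    dv = laplacian(v) + v - v**3 - w
    dw = delta * laplacian(w) + eps * (v - c1 * w - c0)
    return v + dt * dv, w + dt * dw

def perturb(v, rng, width=1.0):
    """Add three random-amplitude bumps at x = 7.5, 10, 12.5 (width assumed)."""
    for xi in (7.5, 10.0, 12.5):
        v = v + rng.normal(0.0, 0.1) * np.exp(-((xg - xi) / width) ** 2)
    return v

rng = np.random.default_rng(0)
v, w = np.zeros(nx), np.zeros(nx)
v = perturb(v, rng)
for _ in range(200):               # a short burst of integration
    v, w = step(v, w, dt=1e-3)
```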
To compute the Koopman eigenvalues, eigenfunctions, and modes, we use a polynomial kernel with α = 20 and set the scaling parameter d = (1/M) Σ_{m=1}^{M} ‖x_m‖. For these data sets, both X (for DMD) and Ψ_x (for the kernel method) have small singular values, so when we compute the pseudoinverse of X (for DMD) or Σ (for the kernel method), we retain the 150 largest singular values and set the others to zero. This is equivalent to projecting X and Y onto the 150 leading POD modes of X, or Ψ_x and Ψ_y onto the leading 150 kernel principal components of Ψ_x, before carrying out the computation. In our experience, this often helps to avoid the spurious unstable eigenvalues that both DMD and Extended DMD can produce. Figure 2 shows the approximation of the continuous-time eigenvalues obtained for the five different data sets using this methodology. Because we used truncated SVDs for computing pseudoinverses, both DMD and the kernel method produce eigenvalues that are, up to numerical errors, contained in the left half plane. Furthermore, both DMD and the kernel method consistently identify the eigenvalues with λ_1 ≈ 0 and λ_{2,3} ≈ −0.006 ± 0.053i, though, as shown in the figure, the variance in this pair is smaller when the kernel method is used.

[Figure 2 caption: The leading eigenvalues computed using the kernel method (left) and DMD (right). Both DMD and the kernel approach consistently identify the λ_1 = 0 eigenvalue and the pair associated with the system linearization (λ_{2,3} ≈ −0.006 ± 0.053i). However, the kernel-based method appears to do so with less variance, and is also able to reconstruct two additional "layers" of eigenvalues, such as λ_4 ≈ −0.013 and λ_7 ≈ −0.019 + 0.053i.]

Although DMD identifies other eigenvalues with larger negative real parts, no other eigenvalues lie in the subset of the complex plane shown in the figure. The kernel method, which effectively uses a "richer" set of basis functions to approximate the Koopman eigenfunctions, also identifies two additional layers of the "pyramid" of eigenvalues that the Koopman operator should possess in this problem [13]. As a result, although the numerically computed eigenvalues are truly data dependent, the kernel approach: (i) appears to approximate the leading eigenvalues of the Koopman operator with less variance than the DMD benchmark, and (ii) identifies additional Koopman eigenvalues the DMD benchmark appears to neglect.

[Figure 3 caption fragment: There is a larger amount of variation in the 4th and 7th modes between the five data sets, but the shapes of these modes remain consistent.]

The real part of the v-component of the modes corresponding to four of the leading eigenvalues are shown in Fig. 3. The first mode, which has λ_1 ≈ 0, is a scaled version of the solution at the fixed point, where the appropriate scaling factor is determined by the corresponding Koopman eigenfunction. The error, ‖ξ_1 − ξ_true‖, with the normalization ‖ξ‖ = 1, was less than 0.01 for the five data sets used in the computation (the mean value of the error was 6.9 × 10^−3). The next pair of modes (modes 2 and 3), which have λ_{2,3} ≈ −0.006 ± 0.053i, are the "slowest" pair of eigenvectors of the system linearization, and provide a useful description of the evolution near the fixed point. The error in these modes was less than 0.02 for the five data sets (with a mean value of 1.9 × 10^−2). As a result, not only does the kernel method consistently identify the leading Koopman modes in this problem, but it is accurate as well.
The first mode associated with nonlinear effects, which has the eigenvalue λ 4 ≈ −0.013, is shown in the third image of Fig. 3. As shown in the figure, the kernel method consistently identifies this mode, though there are differences in the mode shape that are visible to the eye if one "zooms in" near x = 10 in the figure.
In general, both DMD and the kernel method become less accurate the further "down" (i.e., into the left half-plane) one goes, as is demonstrated by the slight, yet visible differences in the 7th mode with λ 7 ≈ −0.019 + 0.053i shown in the bottom panel of Fig. 3. Intuitively, this is because the corresponding eigenfunctions tend to become more complicated functions; in this example, the eigenfunctions corresponding to each subsequent layer of the pyramid are higher powers of the leading two eigenfunctions. As a result, they are less likely to lie in or near the subspace spanned by our implicitly chosen basis set, and are therefore less accurately computed.
In this section, we applied the kernel approach to the FitzHugh-Nagumo PDE, and found that it consistently identified a subset of the Koopman modes and eigenvalues, including those that are not associated with the system linearization. The first three modes, where the true solutions happen to be known, are identified accurately, with a relative error of less than 2%. Given a sufficiently large amount of data and the proper choice of kernel, this approach can also identify additional "layers" of Koopman eigenvalues, but here the variance in the numerically obtained eigenvalues and modes is larger, which is likely due to the fact that the eigenfunctions of interest do not lie entirely within the span of the functions used to represent them. As a result, by choosing a larger set of functions, the kernel method has the potential to accurately reproduce more of the leading Koopman eigenvalues, eigenfunctions, and modes than the DMD benchmark.

4. Example: Experimental data of flow over a cylinder. In this example, we apply the kernel approach to experimentally obtained Particle Image Velocimetry (PIV) data for flow over a cylinder taken at Reynolds number 413, which was used in previous works [33,14]. The experiment was conducted in a water channel designed to minimize three-dimensional effects, and the computed velocity field was sampled at 20 Hz at a resolution of 135 × 80 pixels [33]. From these snapshots of the velocity field, we compute the vorticity via standard finite difference methods. In the original manuscripts, a single time series consisting of 8000 snapshots of the vorticity was used. In this manuscript, we create multiple data sets by randomly selecting 3000 snapshot pairs from this master data set. As a result, although we have access to the same data, each data set is just a set of snapshot pairs and not a single time series.
This example is intended to highlight how we foresee the kernel approach being used in practice. Due to the experimental nature of the data and the presence of measurement and process noise, the true Koopman eigenvalues and modes are not known. However, as we will show shortly, the numerically computed Koopman modes and eigenvalues have several of the properties that the true eigenvalues should have, which suggests the kernel method is producing a more accurate approximation than DMD. Because we cannot quantitatively judge the performance of the kernel method in this setting, our objective here is simply to demonstrate the practical advantages of the kernel method, rather than the accuracy of the resulting modes and eigenvalues.
To highlight the impact of other kernel choices, we also use the same data to approximate the Koopman eigenvalues and modes using a Gaussian kernel with σ = d. In this problem, both kernels produce similar eigenvalues and modes, but in other applications, or even simply with other scaling parameters, they may give different results. From previous efforts [33,14], a useful set of modes and eigenvalues can be obtained from these data using DMD and similar methods. Figure 4 shows the eigenvalues computed using DMD along with the kernel approach with either a polynomial or Gaussian kernel. In all cases, all 3000 singular values were above machine epsilon, and were retained when computing pseudoinverses. The practical advantage of either the polynomial or Gaussian kernel is that the eigenvalues and modes of interest can be identified visually; in particular, they are the "most slowly decaying" eigenvalues, which could be computed efficiently even in large systems using Krylov methods. On the other hand, DMD has a "cloud" of both stable and unstable eigenvalues, and the modes known to be most important are not necessarily those that decay most slowly. We should note that a useful set of eigenvalues and their concomitant modes are contained in this cloud, but energy-based [33] or sparsity-promoting [16] methods must be used in order to select this subset of the eigenvalues. To do this, however, both of these methods require all of the eigenvalues and modes to be computed, which could become computationally expensive in systems with larger numbers of snapshots.
In addition to the eigenvalues shown in Fig. 4, both kernel methods generate a cloud of eigenvalues associated with more rapid decay rates (i.e., to the left of the window shown). It should be stressed, however, that none of the neglected eigenvalues have a positive real part. Conceptually, the underlying system possesses a limit cycle with the addition of both process and measurement noise. In the absence of noise, the Koopman eigenvalues would lie on the imaginary axis and at regular intervals in the left half-plane [13]. The addition of process noise causes many of the eigenvalues to decay (or decay more rapidly) [23,2,35], as the distribution of trajectories approaches the invariant distribution that is concentrated near the limit cycle.

[Figure 5 caption: The real part of three of the leading Koopman modes obtained using the kernel method with a polynomial kernel (top) and a Gaussian kernel (bottom). These modes agree favorably with pre-existing modes computed using DMD [33], but were identified visually rather than via energy-based or sparsity-promoting [16] methods for mode selection. Note: the frequencies listed above are the angular frequencies, and differ from similar results in Refs. [33,14] by a factor of 2π.]

This decay rate depends upon the amount and type of noise added to the system. Furthermore, the real part of the numerically computed eigenvalues also depends on the data. For example, the 3000 randomly chosen snapshot pairs used here resulted in eigenvalues with much larger decay rates than a 3001-snapshot time series would with the same kernel. As a result, the eigenvalues presented here almost certainly overestimate the decay rate the "true" Koopman eigenvalues would have due to process noise. The real part of three of the leading Koopman modes are shown in Fig. 5. Both the shape and angular frequency (i.e., Im(λ)) of these modes are similar to those obtained using the full set of 8000 snapshots in Refs. [33,14]. Using the polynomial kernel results as an example, λ ≈ 5.58i corresponds to an oscillation frequency of 0.89 Hz, λ ≈ 11.15i to a frequency of 1.77 Hz, and λ ≈ 16.73i to 2.66 Hz. In Ref. [14], these frequencies were reported as 0.89 Hz, 1.77 Hz, and 2.73 Hz, respectively. The modes shown in Fig. 5 were obtained from the left eigenvectors of the matrix K̂ using (22), which in this example were obtained by completely decomposing K̂. However, for both the polynomial and Gaussian kernels, they could also have been obtained by computing the leading left eigenvectors using the implicitly restarted Arnoldi method [21].
Although more complex methods for scattered-data interpolation exist, one benefit of the kernel-based approach is that it is similar to the "standard" DMD algorithm in both implementation and computational cost. Table 1 shows the "wall-clock" time, averaged over 100 realizations, associated with DMD, the kernel method with a polynomial kernel, and the kernel method with a Gaussian kernel. For each realization, 3000 snapshot pairs (of the 8000-snapshot time series) were randomly selected without replacement and passed to DMD or the kernel method. The "Matrix Assembly Time" column contains the time required to form the matrix K̂ (or its equivalent for DMD). Because both kernels can be evaluated in O(N) time, assembling K̂ can be done in O(NM²) time, which is identical to DMD. This equivalence in computational cost is captured by the nearly identical run times for all three cases. The "Mode/Eigenvalue Computation Time" column in Table 1 contains the time required to compute the complete set of Koopman modes, and Fig. 6 shows the superposition of the eigenvalues obtained from the 100 experiments (i.e., all 100 spectra plotted on the same axes). Because both DMD and the kernel method produce M × M matrices, this step requires O(M³) time in either case. Because the kernel methods appear to consistently produce isolated eigenvalues, there is the potential for significant computational savings if only a few of the leading eigenvalues/modes are required. The values in parentheses in Table 1 are the mean wall-clock times required to compute the five leading eigenvalues and eigenvectors using ARPACK, which can be up to a factor of 10 faster than computing all of the eigenvalues/modes. For the sake of comparison, we also computed the five leading eigenvalues/modes for DMD, but it should be noted that for DMD this does not always recover the modes of interest (those shown in Fig. 5), as they are not always associated with the most slowly decaying values of λ in this example.

Table 1. The wall-clock times associated with assembling the matrix K̂ and computing the eigenvalues and modes, averaged over 100 randomly generated data sets consisting of 3000 snapshot pairs. The values in parentheses are the means of the wall-clock times required to compute the 5 largest eigenvalue/eigenvector pairs using ARPACK [21]. In all three cases, we do not compute the associated Koopman eigenfunction values; those would require the right eigenvectors and would roughly double the "Mode/Eigenvalue Computation Time" values listed.

Figure 6. A superposition of the eigenvalues that result when DMD (left), a polynomial kernel (center), and a Gaussian kernel (right) are applied to the 100 data sets used to create Table 1. The kernel methods fairly consistently identify the same set of leading eigenvalues, though there is a significant amount of variance in the rapidly decaying eigenvalues. As in Fig. 4, eigenvalues in the left half-plane are colored blue, and those in the right half-plane are colored red.
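The O(NM²) assembly and O(M³) eigendecomposition costs discussed above can be sketched in a few lines. The sketch below is ours, not the authors' implementation: the function names and rank tolerance are illustrative, and the sanity check uses a plain linear kernel (rather than the polynomial or Gaussian kernels from the text) so the recovered eigenvalues can be checked against a known linear map. It forms the Gram-type matrices Ĝ and Â from snapshot pairs and builds the M × M matrix K̂ through a truncated pseudoinverse of Ĝ^(1/2):

```python
import numpy as np

def kernel_dmd(X, Y, kernel, rank_tol=1e-8):
    """Sketch of the kernel method.  X and Y are (N, M) arrays whose
    columns are the snapshot pairs x_m -> y_m.  Assembling Ghat and Ahat
    costs O(N M^2), matching standard DMD; the eigendecomposition of the
    M x M matrix Khat costs O(M^3)."""
    M = X.shape[1]
    Ghat = np.empty((M, M))
    Ahat = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            Ghat[i, j] = kernel(X[:, i], X[:, j])  # each evaluation is O(N)
            Ahat[i, j] = kernel(Y[:, i], X[:, j])
    # Truncated pseudoinverse of Ghat^{1/2} via the eigendecomposition
    # Ghat = Q diag(sigma^2) Q^T, discarding near-zero directions.
    sigma2, Q = np.linalg.eigh(Ghat)
    keep = sigma2 > rank_tol * sigma2.max()
    Sinv = np.diag(1.0 / np.sqrt(sigma2[keep]))
    Khat = Sinv @ Q[:, keep].T @ Ahat @ Q[:, keep] @ Sinv
    return np.linalg.eig(Khat)

# Sanity check on a linear system x -> A x; with a degree-1 kernel the
# method should recover the eigenvalues of A.
rng = np.random.default_rng(0)
A = np.diag([0.9, 0.5])
X = rng.standard_normal((2, 50))
Y = A @ X
evals, _ = kernel_dmd(X, Y, lambda x, y: np.dot(x, y))
print(np.sort(evals.real))  # approximately [0.5, 0.9]
```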
In this section, we applied the kernel method to experimental data and computed the leading Koopman modes. Unlike the previous example, where the kernel approach produced modes that DMD could not, in this example both methods produce similar sets of modes. The difference is that the "important" modes identified in Refs. [33,14] using an energy-based method are the clearly visible leading (i.e., most slowly decaying) modes of the kernel method. As a result, rather than computing all of the Koopman modes and then truncating, one could compute only the leading Koopman modes via (22). Furthermore, the eigenvalues identified by the kernel method possess more of the properties that Koopman eigenvalues should have; in particular, they lie on the imaginary axis or in the left half-plane. Therefore, even though we cannot definitively say that the kernel method has accurately identified the Koopman eigenvalues, there are conceptual and practical advantages to using it in lieu of standard DMD in experimental applications like the one shown here.

Conclusions.
We have presented a data-driven method for approximating the Koopman operator in problems with large state spaces, which commonly occur in physical applications such as fluid dynamics. The kernel method we have developed defines a subspace of scalar observables implicitly through a kernel function, which allows the method to approximate the Koopman operator with the same asymptotic cost as DMD but with a far larger set of observables. In this manuscript, we used a polynomial kernel, whose associated set of basis functions can represent α-th order polynomials, and a Gaussian kernel. However, many other kernels are available, each associated with a different basis set; as a result, the choice of kernel, like the choice of basis set in Extended DMD, is important and currently must be made by the user.
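For concreteness, the two kernels used in this manuscript can be written as follows; this is a minimal sketch, and the hyperparameter values (α = 10, σ = 1) are illustrative defaults of ours rather than values prescribed by the text:

```python
import numpy as np

def polynomial_kernel(x, y, alpha=10):
    # k(x, y) = (1 + x.y)^alpha; its implicit feature space spans all
    # polynomials in the state up to degree alpha.
    return (1.0 + np.dot(x, y)) ** alpha

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / sigma^2); its implicit feature space
    # is infinite-dimensional.
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

# Either kernel is evaluated in O(N) time for N-dimensional snapshots,
# regardless of the dimension of the implicit feature space.
x, y = np.ones(4), np.zeros(4)
print(polynomial_kernel(x, y, alpha=2))  # (1 + 0)^2 = 1.0
print(gaussian_kernel(x, x))             # exp(0) = 1.0
```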
To highlight the effectiveness of the kernel method, we applied it to a pair of illustrative examples. The first is the FitzHugh-Nagumo PDE in one spatial dimension, where a subset of the Koopman eigenvalues and modes could be deduced by linearizing the system about an equilibrium point. For this subset, the kernel method consistently and accurately identified the leading Koopman eigenvalues and modes. However, it also accurately computed additional Koopman eigenvalues that are not associated with the system linearization and, although we do not have an analytical expression for the corresponding modes, consistently identified a mode shape.
In practical applications, the data often come from systems whose true Koopman eigenvalues (or related information such as the location and linearization about fixed points) are unknown. Our second example, which used experimental data from flow over a cylinder, was a more realistic example of how we envision the kernel approach being applied. In that example, the numerically obtained eigenvalues had more of the properties the Koopman eigenvalues should have, though it is likely they overestimate the decay rates associated with those eigenvalues due to the randomly selected data and the presence of measurement noise. The primary advantage of the kernel method here was that the Koopman modes of interest were associated with the eigenvalues that decay the slowest. As a result, they could be computed using the leading left eigenvectors of the finite dimensional approximation, rather than requiring the complete set of eigenvectors and eigenvalues to be computed and then truncated using energy-based or sparsity promoting methods.
In the end, the Koopman operator is an appealing mathematical framework for defining a set of spatial modes because these modes are intrinsic to the dynamics rather than tied to a particular set of data. However, obtaining effective approximations of this operator is non-trivial, particularly when all that is available are data. One method for obtaining such an approximation is Extended Dynamic Mode Decomposition [35], which, for certain choices of basis functions, is computationally feasible only in problems with small state dimension. The kernel method presented here is conceptually identical to Extended Dynamic Mode Decomposition but computationally feasible even in large systems; in particular, its asymptotic cost is identical to that of "standard" Dynamic Mode Decomposition, and it can therefore be used anywhere DMD is currently being used. As with most existing data-driven methods, there is no guarantee that the kernel approach will produce accurate approximations of even the leading eigenvalues and modes, but in practice it often appears to produce useful sets of modes when the kernel and the truncation level of the pseudoinverse are chosen properly. As a result, approaches like the kernel method presented here are a first step toward the ultimate goal of enabling the conceptual tools provided by Koopman spectral analysis to be used in practical applications.