Diffusion maps embedding and transition matrix analysis of the large-scale flow structure in turbulent Rayleigh--B\'enard convection

By utilizing diffusion maps embedding and transition matrix analysis we investigate sparse temperature measurement time-series data from Rayleigh--B\'enard convection experiments in a cylindrical container of aspect ratio $\Gamma=D/L=0.5$ between its diameter ($D$) and height ($L$). We consider the two cases of a cylinder at rest and rotating around its cylinder axis. We find that the relative amplitude of the large-scale circulation (LSC) and its orientation inside the container at different points in time are associated to prominent geometric features in the embedding space spanned by the two dominant diffusion-maps eigenvectors. From this two-dimensional embedding we can measure azimuthal drift and diffusion rates, as well as coherence times of the LSC. In addition, we can distinguish from the data clearly the single roll state (SRS), when a single roll extends through the whole cell, from the double roll state (DRS), when two counter-rotating rolls are on top of each other. Based on this embedding we also build a transition matrix (a discrete transfer operator), whose eigenvectors and eigenvalues reveal typical time scales for the stability of the SRS and DRS as well as for the azimuthal drift velocity of the flow structures inside the cylinder. Thus, the combination of nonlinear dimension reduction and dynamical systems tools enables to gain insight into turbulent flows without relying on model assumptions.


Introduction
Due to technological advancements in past years the rate at which data can be acquired, stored and processed has increased tremendously. In addition, powerful computers allow to simulate natural processes in greater detail, with larger temporal and spatial resolution than ever before. While acquiring data from measurements and simulation is essential for scientific progress, the goal is always to develop simple effective models that map high-dimensional natural processes onto lower dimensional models that, optimally, allow for predicting the future with accuracy.
Ideally, such a reduction results in functional relationships between just a few control and response parameters. Deriving equations that describe the system, however, can only be done if one has a clear understanding of the underlying fundamental mechanisms. While such an understanding often does not exist, even if the fundamental mechanisms are known, they are often not sufficient to make accurate predictions in complex nonlinear systems. For example, while it is well known that the flow of simple fluids is governed by the Navier-Stokes equations [Dav09], their possible solution is computationally demanding and very sensitive to boundary and initial conditions, such that simple accurate predictions are impossible.
Over the years many tools have been developed for the reduction of data without taking additional knowledge about the source and type of these data into account. Early pioneering approaches are Principal Component Analysis (PCA) [Pea01,Jol86,Shl14], where multidimensional data are mapped on a lower dimensional space via a linear mapping by keeping as much variation as possible, or Multidimensional Scaling (MDS) [Kru64,TdSL00], where a linear embedding is used such that pair-wise distances between points are maintained as well as possible. A more novel method, the diffusion maps approach, first suggested by Lafon and Coifman [Laf04,CL06a], uses local information from the data [RS00,BN03,DG03,ZZ04], in contrast with the above methods that use global distances. It is a non-linear technique that parametrises data based on their underlying connectivity, i.e., their proximity in the space spanned by the observable variables. This approach has been successfully used for instance for image analysis [BWWA13] and image processing [FFL10].
The combination of dimension reduction tools with dynamical time-series analysis can lead to a better understanding of complex and high-dimensional systems: (i) PCA is applied in Proper Orthogonal Decomposition (POD) [SMH05,Row05]; (ii) time-lagged correlation analysis results in Dynamic Mode Decomposition [SS08, KNK + 18], which is a particular instance of Koopman mode analysis [RMB + 09, WKR15, KKS16, AM17], (iii) cluster analysis [KNC + 14] aggregates system states with respect to similarity, and (iv) optimal transport is used to find a non-linear transformation that decouples the dynamics of coordinates [AS19]. Data-based approximation of the Koopman operator by diffusion maps is obtained in [BGH15,GKKS18], where the latter reference analyses a convection problem similar to ours below, in a different container geometry. Classical embedding results from differential geometry and dynamical systems with subdivision techniques are used in [DvMZ16,ZDG18,GKD19] to approximate finite-dimensional attractors and invariant manifolds of infinite-dimensional systems.
In this work we propose to use the diffusion maps embedding and transition matrix analysis [Hsu87,DJ99] to analyse the structure and dynamics of a turbulent system. The main advantage of our approach is that it represents the geometry of data in a low-dimensional space where analysis techniques such as the transition matrix method can be applied. Thus, it simultaneously delivers a geometric-topological and a dynamical understanding of the system. The particular system considered here is turbulent Rayleigh-Bénard convection (RBC). In RBC a horizontal fluid layer is confined by a warm plate from below and a cold plate from the top. It is an archetypical system to study pattern formation and turbulence and has been investigated for more than a century [Bén00,Ray16].
Under the Oberbeck-Boussinesq approximation [Obe79, Bou03, SV60] the system is fully determined by only two dimensionless parameters. These are the Rayleigh and Prandtl numbers, Ra = gα∆T L 3 κν and Pr = ν/κ, respectively. Here, L, ∆T , and g denote the height of the fluid layer, the temperature difference between its bottom and its top, and the gravitational acceleration. Furthermore, α, ν, and κ are the isobaric thermal expansion coefficient, the kinematic viscosity and the thermal diffusivity. The Rayleigh number measures the thermal driving, while the Prandtl number is the ratio of the two damping mechanisms, i.e., viscous and thermal diffusion. For small Ra the flow is laminar, and forms steady spatially periodic convection rolls with a wavelength of roughly twice the height of the fluid layer. The convection flow becomes unsteady, chaotic and finally turbulent as Ra increases. In sufficiently extended systems, the signature of the initial periodic rolls can still be observed in the averaged flow and temperature field as turbulent superstructures [PSS18]. For technical reasons, most experiments and numerical simulations have been conducted in smaller cylindrical containers, with an aspect ratio Γ = D/L between its diameter D and its height L close to unity. In this case, the superstructure consists of a single large-scale circulation (LSC) roll, that extends over the entire cell, so that warm fluid rises close to the sidewall on one side, while cold fluid sinks at the opposite side. This single roll structure is rather stable compared to the eddy turnover time, but exhibits interesting dynamics on larger time scales, such as diffusive azimuthal drift, cessations, or torsional oscillations [BA07,FBA08,BA09]. The shape and dynamics of the LSC heavily depends on Γ. With increasing Γ the single roll state (SRS) is replaced by counter-rotating rolls arranged side-by-side [PWB + 11].
Well studied is the case of Γ = 1/2. Here the system is predominantly in the SRS, but the single roll is significantly less stable than for Γ = 1 and often undergoes a transition to a state in which two counter-rotating rolls are on top of each other (double roll state -DRS) [XX08,WA11c]. In the DRS, the vertical heat transport is reduced by about 1 -2 % compared to the SRS. Note that in the following, we refer to the large-scale flow as LSC regardless of whether the system is in SRS or DRS. Studying these large-scale coherent structures is interesting as they pose forms of self-organisation in highly nonlinear systems very far from equilibrium that are currently not understood at all.
A particularly interesting variation of the classical RBC system is rotating RBC where the convection cylinder rotates around its vertical axis with a constant angular speed. Studying rotating convection is crucial for a better understanding of the large convection system in geoand astrophysics that occur in rotating frames and are thus strongly influenced by Coriolis forces (see e.g., [LE09, ZA10, KSN + 09, WWA16]).
In this paper, we investigate the dynamics of the temperature field in turbulent RBC for the rotating and non-rotating case using diffusion maps embedding. In particular, we re-analyse temperature measurements at various locations in the sidewall of a convection cylinder of aspect ratio Γ = 0.5, filled with water at an average temperature of 40 • C. The analysis here was done with measurements at Rayleigh numbers of Ra = 7 × 10 10 and 9×10 10 . Temperature measurements in the sidewall help to reveal large-scale convection structures, as in general, hot fluid rises from the warm bottom plate along one side, while cold fluid sinks down from the top plate along the opposite side (see fig. 1a). Despite this large-scale circulation, the turbulent fluid motion results in vigorous fluctuations of the temperature field in space and time that are detected by the sidewall thermometers. The relevant data have been published before in [WA11b] for the non-rotating case and in [WA11c] for the rotating case.
In the next section, we will briefly describe the experimental setup used to acquire the data as well as the structure of the data. In section 3 we explain in detail the diffusion map embedding and we will show the resulting embedding for a standard (horizontal and non-rotating) RBC case. In section 4 we will analyse the embedded data regarding dynamical features of the systems, and find that the long-term dynamical behavior can be connected to the evolution of the LSC. We close the paper with a discussion section.

Experimental setup and data collection
A sketch of the experimental setup is shown in fig. 1a. The main part of the experiment is the cylindrical convection cell of aspect ratio Γ = 0.5 that was closed by two horizontal copper plates from the bottom and the top. The top plate was cooled using temperature regulated water, while the bottom one was heated by an ohmic heater. The temperature of the bottom and top plate were kept constant to within ±0.02 K of the desired temperature during an experiment.
To characterise the temperature field, 24 thermistors were embedded in blind holes in the sidewall, roughly a millimeter away from the fluid. Eight thermistors were equally distributed along the azimuthal direction at each of the heights L/4, L/2, and 3L/4. The working fluid was water at a temperature of 40 o C, resulting in Pr =4.38 for all analysed measurements in this paper. Under the applied conditions the flow was highly turbulent with vigorous small scale fluctuations that self-organised into a large-scale circulation that was predominantly in the SRS. The SRS can be detected in sidewall temperature measurements as sinusoidal temperature variation along the azimuthal direction at a specific height ( fig. 1b).
During an experiment, the top and bottom plate temperature T t and T b were held constant and the temperature of 24 thermistors embedded in the sidewall were recorded with a rate of roughly one measurement every 3.4 s. The experiment was conducted for several hours. Data from the first hour were discarded as the system has not yet reached statistical equilibrium. Please see [WA11c] for further experimental details. black dots (left) and cross-section of the cylinder with the azimuthal location of the thermistors (right). Arrows mark the large-scale motion of the hot (red) and cold (blue) plumes that rise and sink close to the sidewall. (b) Thermistor measurements at one time instance at the heights L/4 (red), L/2 (green) and 3L/4 (blue) showing the temperature signature of the large-scale circulation in the single roll state. The solid lines are sinusoidal fits to the temperature as a function of the azimuthal position.
In the next sections we will reanalyse time series of the sidewall temperature measurements using diffusion maps embedding. We will compare the results with results published in [WA11c]. The analysed datasets together with the relevant experimental parameters are listed in table 1.

Data embedding
In the following we consider the measurements of the 24 side wall thermistors as observations of the full system's state (the full velocity and temperature fields) by a 24-dimensional observable at a given time. Before we start analyzing these data, we first describe the diffusion maps approach for an arbitrary data set (a cloud of data points).

Diffusion maps
Let the set of measured data points Z := {z 1 , . . . , z m } ⊂ R r be given. In our case, the state of the system at a given time t j is represented by a 24-dimensional vector (z j ∈ R 24 ), spanned by the r = 24 thermistor readings. Note that the components of z j are dimensionless, as each component was calculated by subtracting the mean temperature (T t + T b )/2 and normalising by the temperature difference The entire set Z thus represents a cloud of all points taken during an experimental run, i.e., m = 302360 points in the 24-dimensional phase space for data set D1 (see table 1). With the embedding algorithm explained below, we represent each state z j in a new coordinate system, spanned by convenient and representative embedding coordinates ξ •,j := ξ • (z j ) ∈ R n , n < r.
Here n is determined from the output of the diffusion maps algorithm and should be such that the representation of the data set with these n coordinates delivers insight into its geometry; in general we use n = 2, 3 for visualization reasons. The reduced embedding coordinates ξ • will be a selected subset of the full embedding coordinates ξ, as described below. We denote by ξ i (z j ) the i-th (full) embedding coordinate of the j-th data point, and ξ i , the i-th coordinate function, can thus be represented by a vector in R m through setting ξ i,j = (ξ i ) j = ξ i (z j ), where (·) j denotes the j-th entry of a vector.
Note that the method will use only the vectors of measurements, and in particular it will use no information whatsoever on the thermistor positions at which these measurements were taken. The diffusion maps approach is designed to reveal the geometric features with the largest (nonlinear) variation in the data set it is applied to.
The method we choose to work with, diffusion maps [CL06b], assumes that the given data points are sampled from a low-dimensional smooth manifold that is embedded in the highdimensional ambient space R r . It will try to construct (non-linear) coordinate functions from the data set into a low-dimensional space that is one-to-one; thus giving a low-dimensional parametrization (embedding) of the data set. To find this embedding map ξ • the method constructs a virtual diffusion process on the data points, with the jump probabilities of this diffusion being based on proximity between data points-and thereby it neglects any natural or artificial ordering of the data, in particular any temporal order. Such information, however, can later be re-included as we will do below in section 4.2.
1 The normalised temperatures zj relate to the measured temperatures Tj as zj = Here Tt and T b are the top and bottom plate temperatures and Tj is a 24-dimensional vector containing all temperature measurements at time tj.
The guiding intuition behind the construction is that small local Euclidean distances in the ambient space R r are good approximations of local geodesic distances on the unknown data manifold. This is not true for large distances, as the manifold might be curved. A diffusion-like process that is running locally "along the data manifold" thus "feels" the intrinsic geometry of this manifold, and its properties thus reflect this geometry.
The construction is as follows. First we choose a proximity or scale parameter ε > 0, and define the data similarity matrix K ∈ R m×m by where algorithmically often a cutoff radius is used to set K ij 1 to zero, thus obtaining a sparse matrix with essentially no loss of accuracy. One sees, that K ij is a measure for the closeness between the two data points z i and z j in their r-dimensional state space (i.e., the 24-dimensional space in our case). The row sums k ε (·) are now used to pre-normalize 2 the similarity matrix: By row-normalizingK we finally obtain the diffusion map matrix With this normalization, the matrix elements P ij can be interpreted as a probability that a random walker moves from z i to z j . P is a stochastic matrix giving rise to a-by construction reversible-Markov chain; a virtual diffusion on the data points. The central observation in [CL06b] (as pioneered in related works [RS00, BN03, DG03, ZZ04]) is that the right eigenvectors of P , denoted by Ξ i ∈ R m , at dominant eigenvalues Λ i ∈ R, can be used as intrinsic coordinates on the manifold. 3 More precisely, if (Λ i , Ξ i ) denote eigenpairs of P , we embed the data point z j into R m by Thus, the entry of the i-th eigenvector (scaled by the associated eigenvalue) at a data point is used as i-th coordinate value for the embedded data point. As m is usually large (e.g., m = 302360 for dataset D1), this is not yet a simplification. However, if the data manifold is low-dimensional, then only the first few eigenvectors carry geometric information, and the subsequent ones are redundant. Thus, we choose n of the dominant eigenmodes (which we denote, for notational simplicity, by the first n subdominant ones, keeping in mind that we could skip some, e.g., taking the 2-nd, 4-th, and 7-th mode) to define the diffusion map where the main advantage is in n m and n r. Note that we discarded the first eigenvector, since by the row-stochasticity of P we have Ξ 1 = (1, . . . , 1) T because P Ξ 1 = Ξ 1 , and thus it is an entirely non-informative coordinate. We summarize the notation used in this construction in Table 2.

Object
Properties / Notation Table 2: Summary of the notation used in the construction of the embedding by diffusion maps.
The computation of pairwise distances z i − z j as necessary in here, can be done efficiently up to hundreds of dimensions utilizing k-d tree data structure [Fri18], and subsequently solving a m×m sparse eigenvalue problem through vector iteration [Ste02], as only the dominant modes are required. This latter step is usually the computational bottleneck, and makes the method for m 10 4 infeasible, depending on the sparsity of P . Fortunately, one can subsample the data, compute (Λ i , Ξ i ) pairs on this data set of tractable size, and "interpolate" the embedding of the remaining points without the need of any further eigenvalue computations. More information on how to do this, on how to choose the proximity parameter ε, and on why the diffusion map gives good intrinsic coordinates on the manifold is deferred to Appendix A.

Applying the algorithm to the measurements
The data set. The measured data points are local observations z j = h(x(t j )) at specific times t j by some observation function h : X → R r of the original dynamics where x(t) ∈ X is the full state of the system (the velocity and temperature field), and F τ denotes the time dynamics of the system, i.e., governed by the momentum and energy equations in our case. Although strictly not necessary, here we assume that the observation times t j are equispaced, i.e., t j+1 − t j = τ for all j.
In the experimental setup of section 2 the state z j ∈ R r is given by the r = 24 sidewall temperature measurements and thus marks a single point in a 24-dimensional parameter space. The number of data points acquired during a single experimental run is usually at least m ≈ 10 4 .
We note that to uncover the attractor of a partially observed system, often delay-embedding in the sense of Takens [Tak81,Rob05] is used. For a noisy system, however, these approaches have their limitations, as one would necessarily reconstruct the state space of the noise as well. Our analysis did not show a significant difference in the geometry of the data set if analysed in delay-coordinates, thus in the following we will work in the original 24-dimensional data space.
Geometric features of the data: small scales. We now calculate diffusion maps from sidewall data of dataset D1 that was acquired without rotation of the convection cylinder. For this preliminary analysis we subsample the original data set and take 3800 equally sampled points between times 3400 s and 68930 s, as measured from the beginning of the experiment. We apply the automated procedure to find an appropriate scale parameter ε, as described in Appendix A.2. This automated procedure yields ε = 1.25·10 −4 , and estimates a dimension of the data manifold to be approximately 5. As we will see, the data does not have a simple manifold structure of locally homogeneous dimension (at least not on this scale), and noise plays a role as well, thus this dimension estimate should not be taken as a precise numerical value. Nevertheless, as the estimate is substantially smaller than the ambient dimension 24, we can claim with high certainty that the data yields a low-dimensional geometric structure.  [WA11c]. We see that all eigenvectors pick up a core disc-like structure and long excursions that depart from the core and return to it shortly after. We will see below that the disc-like structure is a representation of the large-scale flow structure in the system. The excursions on the other hand might be interpreted as signatures of the intermittent nature of the turbulent flow. We already see here that the disc-like core correlates with the amplitude ( fig. 2 b) and the orientation ( fig. 2 c) of the LSC. This indicates that the low-dimensional embedding readily represents important physical structures related the to large-scale circulation of the convective flow-the turbulent superstructures in convection-while the distinct eigenvectors encode transient excursive behavior of different kinds. We note that this distinguishes diffusion maps from linear methods, such as PCA, which are not able to effectively separate the bulk of the data from the excursions (computations not shown here).
In this study we shall focus on the dynamically most prevalent structure, namely the core region where most data points reside. Noting that the transients are rare events consisting of just a few data points-which nevertheless have a large geometric deviation from the "mean pattern"-, we will consider an embedding that focuses on the large-scales and suppresses the transients. This is achieved by increasing the proximity scale to ε = 5 · 10 −4 . An explanation of the phenomenon is given via the diffusion distances in Appendix A.1. Further increasing of ε would make the similarity of all point pairs asymptotically equal, thus we would lose all the information on the geometric structure in the data. Taking ε 1.25 · 10 −4 would, contrariwise, disconnect all data points, resulting in a similar loss of information. We refer to Appendix A.2 for details. We also note that there is a reasonable robustness of the method with respect to local variation of the proximity parameter.
Geometric features of the data: large scale. The results of the diffusion map embedding with ε = 5 · 10 −4 , are shown in fig. 3, where we plot three-dimensional embeddings (ξ k , ξ k+1 , ξ k+2 ) for k = 2, . . . , 6 of the data set by the 8 eigenvectors with the largest eigenvalues. The corresponding eigenvalues are: We see that the eigenvalues decrease sufficiently fast and thus we do not expect further relevant geometric information to be hidden in the lower spectrum. A comparison of the mutual ratios of the log-eigenvalues log(Λ i ) with one another show similar ratios than the eigenvalues of the Laplacian on a disc, indicating that the large-scale geometric structure of the data set is a disc (see Appendix B). Another evidence supporting this claim is that the embedded points in fig. 3(a-d) gather along a two-dimensional sub-manifold in the 3d-space, meaning that two of the embedding coordinates parametrize the third one, hence this does not yield additional geometric information. Moreover, these three-dimensional embeddings show very strong similarities to analogous embeddings by the eigenfunctions of the Laplacian on a disc, cf. fig. 13 in Appendix B, indicating a disc-like data manifold. We will thus focus in the following mainly on the embedding of the data in the (ξ 2 , ξ 3 )-space. First, we shall compare the information obtained from this embedding with a previous analysis of the data-which relies on the knowledge of the physical setting of the underlying experiment, such as the shape of the container and the existence of convection roll states.
Comparison with previous results. Figure 4 shows the (ξ 2 , ξ 3 )-embedding for ε = 5 · 10 −4 . Additionally, some further properties obtained in [WA11c] of the physical system are represented by the color of the data points. In fig. 4a, colors represent time at which the data were taken, ranging from blue at early times to yellow at later times. In this representation no clear correlation between time and the location of the data in the eigenvector space can be seen. That means, on sufficiently large time scales the system is in a statistically steady state. Later, in section 4 we will further analyse the dynamical behaviour of the system. amplitude of the LSC, and (c) orientation of the LSC as deduced from the azimuthal temperature profile at midheight (color hue shows orientation). Data points with very small amplitude have been omitted in (c), since calculating the orientation for the LSC with very small amplitude is not only ambiguous but also meaningless, as there might not be a well defined LSC. Subfigure (d) shows data points of the system when the large-scale circulation is in a single roll state (blue) or a double roll state (red).
The data points in fig. 4b are color-coded with the amplitude of the LSC. Here, we see a clear correlation. Points with large radii r ξ,j := ξ 2 2,j + ξ 2 3,j also show large LSC amplitudes, while points inside the circle, with small radii also show small amplitudes. Furthermore, as shown in fig. 4c, the angle θ := arctan (ξ 3 /ξ 2 ) also correlates with the orientation of the LSC very well. We thus see that the two dominant eigenvectors describe the most prominent variation of the LSC, the large-scale motion of the system. As we have pointed out already in the introduction, the LSC in cylinders of aspect ratio Γ = 1/2 can take the shape of a single roll (single roll state -SRS) or of two counter rotating rolls (double roll state -DRS) one on top of the other. In fig. 4d, we mark with blue dots whenever the system is in SRS, while red squares mark states, whenever the system is in DRS. We do not plot any points for which the system is in a undefined state 4 . It can be seen that points corresponding to SRS and DRS are clearly distinguishable. While SRS points are located on the outer areas of the disc, points corresponding to the DRS are located closer to the center of the disc. In this representation points belonging to SRS and DRS are not well separated, and there is a large number of points that belong to a transition state. This suggests that the DRS here is not a stable state but rather an intermediated state after the SRS has been rendered unstable, e.g., due to the growth of a corner roll (see [SNS + 10]).
It is worth to stress again that diffusion maps finds the largest nonlinear geometric variation in the data set, without any additional information about the system. Hence, while it identifies structure in the data, it does not tell us what these structures are in a physical sense, unless we provide additional knowledge, as for example about the positions of the thermistors. As an advantage, we can detect dominant dynamical behavior of the system without any additional physical knowledge, as long as these are present in the identified coordinates ξ. This will be discussed in the next section. In our case, a sufficiently spread-out arrangement of the thermistors is vital for "seeing" the LSC in the data. If the thermistors would be located rather close to each other, for example only at one half of the sidewall, the LSC could not have been detected -in this case the embedding would show other features related to the local fluctuations of the temperature field at that particular location.

Analysing dynamical features
It is in the nature of turbulent flows to show chaotic dynamics on different time and length scales. We therefore propose in the following new methods of statistically analysing the dynamical features of the convective flow utilising the diffusion maps approach. The main focus will be on analysing our data in their (ξ 2 , ξ 3 )-embedding. Recall that we denote this reduced embedding of the j-th data point by ξ •,j := ξ • (z j ).

The dynamics of the large-scale motion
As we observe in fig. 4, the large-scale geometry of the dataset resembles a disc with sparse occupation towards the center. In particular, data points that correspond to the SRS form a ring. This suggests that the orientation on the ring (the "angular coordinate") is the most important variable of the system, as it has the largest variation in the measurement space. Thus, in the following we investigate the dynamics of this coordinate. Note that investigating the stability and dynamics of the SRS has been a topic of interest for more than a decade [BA07,BA08a,BA09,SBHT14].
To this end, for each embedded point ξ •,j ∈ R 2 we compute its azimuthal angle θ j = arg(ξ •,j ). Since we want to investigate the long time dynamics, we need to get rid of any effects caused by its periodicity. Thus we unwind the data by assuming that within a single time steps the angle does not change by more than π. Any larger changes are due to winding and thus we unwind θ j by adding or removing multiple of 2π until −π < (θ j − θ j−1 ) ≤ π. Now, for a given offset s ∈ N, we calculate ∆θ j (s) = θ j+s −θ j and from this the corresponding mean displacement ∆θ j (s) and its variance ∆θ j (s) 2 − ∆θ j (s) 2 . Here the average · is done over all times, i.e., over all j.
Non-rotating convection cylinder. For experiment D1 with ε = 5 · 10 −4 , we analyse the data for the (ξ 2 , ξ 3 ) embedding and show the result in fig. 5. Fig. 5a shows the mean displacement (drift) and variance as a function of the lagtime ∆ := τ s on a linear scale, whereas the same data are plotted on the double-log scale in (b). We see in fig. 5a that the average displacement (yellow triangles) is negligible compared to the variance. This is expected, since there is no source for a constant mean drift of θ with time. The mean drift is not exactly zero as we average over a finite number of samples (i.e., short sections in time) and would further decrease for longer time traces. We will see below that the same system exhibits a nearly constant drift when the convection cylinder is rotated around its vertical axis with a constant rotation rate.
The variance (blue bullets in fig. 5) increases monotonically with increasing ∆. From fig. 5b we see that the increase appears to be linear for ∆ 100, suggesting a diffusive process on large time scales. A fit to the data reveals a diffusion coefficient (i.e., the slope of the curve) of D = (9.74 ± 0.01) × 10 −3 rad 2 /s. For small lag-times (∆ < 50 s) the variance does not follow a linear trend. Its slope in the log-log plot ( fig. 5b) is close to 2, which suggests a ballistic behaviour, i.e., (∆θ) 2 − ∆θ 2 ∝ at 2 for some a > 0. Such dynamics, i.e., a ballistic motion for small time scales and a diffusive behaviour for large time scales is characteristic for example for molecular motion and can be well described by a Langevin equation Here, ω 0 is a constant angular drift that might occur, for example in a rotating frame and γ is a resistance caused by friction. The stochastic noise η(t) describes contributions of turbulent fluctuations to the dynamics and is assumed to be δ-correlated, with η(t 1 )η(t 2 ) = σ 2 δ(t 1 − t 2 ). These fluctuations lead to increments of the form η(t)dt = σdW (t), with dW (t) being a standard Wiener process. For eq. (7) the mean displacement and its variance can be calculated [Gar04], giving and Thus, the drag and forcing coefficients γ and σ 2 can be computed from the diffusion coefficient D and the coefficient a of the parabola, yielding γ = 2a/D = (4.7 ± 0.4) × 10 −2 s −1 and A Langevin equation has already been suggested for the orientation of the LSC in turbulent thermal convection in cylinders with aspect ratio Γ = 1 by Brown and Ahlers [BA08b]. In order to compare our results with their results, we have to compensate for the difference in Rayleigh number. This can be done, as our cell height was similar to theirs and they provide fitted power law relations between Ra and D θ . While the fits in [BA08b] and thus our estimate includes significant uncertainty, we estimate a diffusion coefficient of D θ = 0.8 × 10 −3 rad 2 /s 3 , which is more than an order of magnitude smaller than for our case. This result reflects the larger fluctuations of the flow and smaller stability of the LSC in Γ = 1/2 cylinders.
Rotating convection cylinder. While for the non-rotating case, discussed above, the net drift was very small and just a statistical feature that would shrink even more with longer time series, the drift becomes significant when the cylinder is rotated around its cylinder axis. Figure 6 shows an analysis of the embedded data acquired in a cylinder under rotation with a rotation rate of 0.088 rad/s (data set D2). As a result due to Coriolis forces the internal convection structure also rotates with respect to the side walls and thus with respect to the temperature probes deployed in them.
We see that now ∆θ increases linearly with ∆. A linear fit to the date reveals a slope of ω 0 = (1222 ± 0.01) × 10 −5 ) rad/s. The variance of ∆θ on the other hand looks very similar to the non-rotating case in fig. 5. The variance increases initially (∆ 30 s) quadratically with a coefficient a = (1.48 ± 0.02) × 10 −4 rad 2 /s 2 and for larger ∆ linearly with a diffusion coefficient D = (6.59 ± 0.01) × 10 −3 rad 2 /s, corresponding to γ = (4.49 ± 0.06) × 10 −2 s −1 and σ 2 = (1.33 ± 0.04) × 10 −5 rad 2 /s 3 . It is quite interesting that while the drag coefficient γ is very similar to the non-rotating case, σ 2 and thus the diffusion coefficient are significantly reduced, by almost 40%. This shows how slow rotation suppresses turbulent fluctuations, resulting in a much more stable large scale circulation. This stabilising effect has also been observed in other statistical quantities such as the frequency of transitions between the double and the single roll state, the width of the probability density function of the LSC amplitude, or the number of Fourier modes determined from sidewall measurements [WA11a].
We note that Ra was roughly 15% smaller for the rotating data set (D2) as for the nonrotating one (D1). The difference in Ra is too small to have any significant influence on either γ or D, as we have observed from analysing other non-rotating data sets with Ra = 7.2 × 10 10 that were however much shorter and thus less suitable for a rigorous statistical analysis.
A note on the radius evolution. We have seen in section 3.2 that the radius r ξ = ξ 2 2 + ξ 2 3 corresponds to the amplitude of the LSC. We therefore also want to analyse the stochastic behaviour of r ξ . Similarly to θ, we now look at displacements ∆r ξ (∆) for given lagtimes ∆ and calculate their variance v r := (∆r ξ ) 2 − ∆r ξ 2 . We plot in fig. 7 v r as a function of ∆. For the non-rotating case ( fig. 7a) we see for ∆ < 20 also a ballistic regime, where data follow a parabola with coefficient a = (1.59±0.03)×10 −8 . For larger ∆ there is a short range where the data follow a linear function of ∆. The corresponding slope is D = (4.94 ± 0.02) × 10 −7 . For even larger ∆, the slope of the data decreases again. This decrease is expected, as r ξ can not take arbitrarily large values as also the amplitude of the LSC is confined. We note that Brown and Ahlers [BA08b] have modeled the dynamics of the amplitude of the LSC as a Brownian motion inside a potential well. As fitting the complete model would be more involved, we defer this to future work. With the more general method presented in the next section, the statistical behaviour of r ξ is captured as well. Note, that information about the exact amplitude of the LSC can not be calculated from r ξ but only qualitative features of it. Fig. 7b shows a very similar analysis for the rotating case (dataset D2). There, the coefficients a and D are significantly smaller, suggesting that also in this quantity the stabilising effect of rotation can be seen. We do not elaborate on this finding any further, as clearly a simple Langevin model (eg. (7)) is only valid for very small deviations, and v r is a nonlinear function of ∆, just as the relationship between r ξ and the amplitude of the LSC is assumed to be nonlinear. Furthermore, the relationship between the LSC amplitude and r ξ might be even different for the non-rotating and the rotating case.

Transition matrix
In the previous subsection we have discussed mainly the angular dynamics of the single convection roll (SRS) by considering the dynamics of the polar coordinate in the (ξ 2 , ξ 3 ) parameter space. However, the dynamics could have other important features not readily revealed by the embedding geometry. In examples below we will investigate the switching between SRS and non-SRS, and also the dynamics observed for a small-ε embedding in three dimensions, where the transient excursions of the dynamics are still dominating the geometry. The tool for this is going to be the transition matrix, which describes the redistribution of states under the dynamics between subsets of the state space. As such, it is an approximation of the so-called transfer operator that describes the evolution of distributions due to the dynamics of the system [LM94,DJ99].
Recall that the data points were obtained from a long simulation, and were sampled at a nearly constant rate with time period τ = 3.4 s. Thus, the ordered time series (ξ •,j ) j=1,...,m , represents the dynamical time series x(t j ) j=1,...,m observed through ξ • • h. Consequently, one can view ξ •,j+s as the image of ξ •,j under the dynamics after a lagtime ∆ = sτ , where s is the chosen offset. Let J = {1, . . . , m − s}. The discretization of the dynamics is then done by constructing the so-called transition matrix T = T (s) ∈ R p×p by counting the relative number of transitions from box i to box , within the time from step j to j + s. Note that we can use an arbitrary offset s ∈ N in (11), and arbitrary partition P of the data set, but should consider the following aspects such that the transition matrix represents the dynamical properties of the system well [PWS + 11, SNL + 11]: (a) The offset should not be too small with respect to the size of the boxes B i , otherwise most points do not leave the box they start in, and thus the dynamical features remain invisible.
(b) The offset should not be too large with respect to the number of data points in the boxes, otherwise the Monte Carlo estimate (11) might be strongly erroneous.
(c) On a similar note, the boxes should not be too small with respect to the data density, otherwise there are too few data points per box, and again one obtains too high sampling errors.
By construction, T is a row-stochastic matrix. Its elements approximate the probabilities that a point from a certain box B i is mapped into another box B by the dynamics. For determining the box partition and assembling the transition matrix we use the MATLAB-based software package GAIO [DFJ01]; cf. https://github.com/gaioguy/GAIO/.
Note that in principle this analysis can be applied directly to the 24-dimensional parameter space in which our original data points x k are observed. However, partitioning a highdimensional space suffers from the curse of dimensionality, and becomes quickly computationally intractable. Also, with the diffusion maps embedding we can represent our data points based on a few chosen most important independent features and neglect everything else. In the next section we will elaborate how spectral analysis of T can uncover the long-term dominant dynamical behavior of the underlying process.
Assembling the transition matrix by (11) scales linearly with the number of data points, making it numerically tractable for a large amount of dynamical data. Because for diffusion maps the computational bottleneck is to acquire the pairwise distances between (close-by) data points and to solve the eigenvalue problem, it would be desirable to do this computation on data sets with m 10 4 . These seemingly opposing attributes can be brought together by the outof-sample extension of diffusion maps [CSSS08]. On the one hand, for a representative subset Y ⊂ Y of data points the diffusion maps embedding of the dominant geometric features of the data set is not improved by increasing the size of Y . On the other hand, having computed a diffusion map embedding for some Y , there is a substantially cheaper way to approximate the embedding coordinates of additional points (i.e., Y \ Y ) than to compute the embedding for the full data set. This method is described in Appendix A.3 and it is utilized in the following to compute the embedding on less than 5 × 10 3 data points, and to "interpolate" the embedding of up to 3 × 10 5 additional data points. These are then used to calculate the transition matrix T .

Extracting dynamical features from the transition matrix
Stationary distribution. The transition matrix provides access to the long-time behavior of the dynamics. For instance, a stochastic dynamics governed by the matrix T (i.e., a Markov chain jumping from box to box) will be ergodic if T is irreducible, i.e., every box can be reached from any other box with a positive probability, thus T k i > 0 for a sufficiently large k. Then the relative time duration that the process spends in box B b is given by the b-th component of the stationary distribution µ = (µ 1 , . . . , µ p ) that is given by µ T T = µ T with p a=1 µ a = 1. It is straightforward to see that for our transition matrix holds µ b represents the probability that a random data point lies within box B b and it is µ b = lim k→∞ T k ib .
Almost invariant behavior. If the transition matrix was not irreducible, there would be disjoint sets I 1 ∪ . . . ∪ I = {1, . . . , p} such that T ab = 0 whenever a ∈ I i and b ∈ I j , i = j. The index sets I i , and equivalently the sets b∈I i B b , are then called invariant. This means that there are different sets of boxes, between which transitions are impossible. A point ξ •,j can only transfer to other boxes that are part of its own box set, but not to boxes in other sets. One can show that in this case T has an -fold eigenvalue 1. There would be right eigenvectors ρ k satisfying Whether the transition matrix is irreducible or not highly depends on the time duration considered here, i.e., the offset s. For example, for s = 0 every box is decoupled from the other boxes, while for s → ∞ we gain the stationary solution T ab = µ b (naturally, one would also need an arbitrarily long data set to be able to set up T ). Now, if for some s the transition matrix is not irreducible but close to an irreducible matrix (with respect to some matrix norm) with invariant sets, then T has eigenvalues close to 1, and the corresponding eigenvectors are close to those in (12). That means, in turn, if T has eigenvalues close to one, we can identify regions (unions of boxes belonging to the same index set I i ) that the system is unlikely to leave within time ∆, i.e., they are almost invariant [Dav82, GS98, DJ99, DW04, GS06].
In fig. 8(a) we show the eigenvector of T (obtained with s = 10) corresponding to the largest real eigenvalue, λ 8 = 0.496. As τ ≈ 3.45 s, the decay rate of this eigenvalue is As the sign structure of the eigenvector distinguishes regions well described by some level set of r ξ , this means that the largest almost invariance of the dynamics is in the radial direction. Comparison with fig. 4 suggests that the dynamical feature found here approximately separates the SRS from non-SRS of the convection. Note that this is not the same as separating SRS from DRS, as this separation would be done by the gray circle in fig 8, as obtained from fig. 4(d).
Computing expected lifetimes supports this observation. In [WA11b] expected lifetimes of 53 s for the DRS and 270 s for the SRS were measured, while both seemed to be exponentially distributed random variables. Fig. 8(b) shows the empirical distribution of jump times between the almost invariant sets. They seem exponentially distributed too, suggesting the dynamics on this level of coarseness (i.e., only observing whether r ξ > c for some appropriate constant c) is Markovian. The expected transition time between the yellow and the blue areas are (145 ± 5) s (blue → yellow) and (168 ± 5) s (yellow → blue). The numerical values in these two studies are of the same order of magnitude, however the discrepancy between them is not surprising. It stems from the fact that [WA11b] was measuring lifetimes of SRS and DRS, however there is a transition region between the two which does not belong to either. Our analysis assigns the transition region, the DRSs and some SRSs with smaller r ξ to one almost-invariant set (blue region), and the rest to the other. where ω = e 2πi/p and i is the imaginary unit. The corresponding right eigenvectors ρ k are ρ k σ(a) = ω k ρ k a . (13) It follows that S p = Id, the identity, and thus a permutation induces a cyclic behavior. If all we know is S, we can deduce the dynamics from its eigenvectors by (13). Note that every permutation matrix is also a stochastic matrix.
With a similar reasoning one can also relax the permutation requirement on S in the following ways: (b) If the matrix S is stochastic, and close to a permutation matrix, then also its eigenvalues and eigenvectors are close to those of a permutation matrix. In this sense, one can speak of almost-cyclic behavior.
To summarize, if the transition matrix T shows eigenvalues close to those of a permutation matrix, i.e., eigenvalues close to the unit circle in the complex plane, then the time-series is expected to have an almost cyclic component in it [DJ99]. Via transition matrix analysis one is able to find the hidden cyclic behavior even when considering an embedding that does not map the data on a disc where an angular coordinate can readily be identified. To show this, we use the automated ε-selection procedure from Appendix A.2 for the data set D2, where the convection cylinder is rotating. This gives ε = 7.5 · 10 −5 . We subsample the data set, such that only every second measurement is used, and "interpolate" the remaining data points as described in Appendix A.3. For this proximity parameter, the large but rare excursions dominate the geometry, and we choose ξ • = (ξ 4 , ξ 5 , ξ 6 ), where the disc-like geometry is not obvious. To compute the transition matrix, we subdivide the bounding box of the embedded, three-dimensional data into 128 × 128 × 128 congruent boxes, by keeping only those that contain data points; leaving us with p = 856 boxes. With the offset s = 5 we compute the transition matrix T , and find the second eigenvalue with decay rate κ = −0.0026 ± 0.0116i.
The real part of the corresponding eigenvector is shown in fig. 9(b). It has negligibly small values on the excursing paths, and shows a sinusoidal pattern on the part of the embedding space that can be attributed to the SRS; cf. fig. 9(a). Application of T turns the pattern of the eigenvector counterclockwise with period t per = 2π Im(κ) = 540 s. Note that in section 4.1 we calculated for the same dataset (D2) a rotation period of the structure of 2π/0.0122=515 s. The discrepancy is due to the embedding with different proximity parameters ε and due to the discretization error from the transition matrix computation on a finite partition. In summary, the transition matrix method reveals the cyclic behavior in the dynamics even for complicated embedding scenarios.
Transition matrix analysis summary. In figures 10 and 11 we depict the real parts of the first 12 left eigenvectors of the transition matrix for the data sets D1 and D2, respectively. Their eigenvectors can be separated roughly in two classes: the first showing sinusoidal patterns in This suggests that-at least on long time scales-the dynamics in the orientation and the radial direction are independent; otherwise we would expect a mixed-mode eigenvector, i.e., one that cannot be written as a product φ ang (θ)φ rad (r) of purely angular and radial modes. If the dynamics were a pure noisy rotation of the orientation, the drift and diffusion coefficients ω 0 and D from section 4.1 could be approximated from the eigenvalues of the transition matrix directly, as described in Appendix C. This works well for the angular frequency, as shown above for the data set D2, is however more defective for the diffusion coefficient.

Comparison with other methods
There is increasing activity on the dynamical analysis of flow fields; in particular the last decade witnessed the development of different novel tools. We briefly discuss two such methods. The first one seems to be the most widespread to date. The second is in spirit the closest to ours. Then we draw a conceptual comparison with the theory of effective (or reduced) transfer operators and reaction coordinates [BKK + 18], and discuss differences to Koopman Mode Decomposition [RMB + 09].
Dynamic mode decomposition. Dynamic mode decomposition (DMD), first introduced in [SS08], was devised to reveal long-term dynamical features of a system (eq. (6)), when all that is available are observables z 1 , . . . , z m sampled at a constant rate. With some offset s, building the  data matrices of the same size, one seeks a matrix (linear transformation) A such that AX ≈ Y , where this equality is solved in a least-squares fashion (by minimizing the Frobenius norm); i.e., A = Y X + with X + being the pseudoinverse of X.
In [WKR15], a connection of DMD and the so-called Koopman operator 6 has been revealed, which has been extended to the Perron-Frobenius operator in [KKS16], which is dual to the Koopman operator. In particular, DMD converges to a Galerkin projection of the Koopman (and, by a similarity transformation, to the Perron-Frobenius) operator onto the space spanned by functions linear in the observable vector z; cf. [KKS16].
The main connection to our work here is that the transition matrix T (eq. (11)) is a discretization of the Perron-Frobenius operator as well, as described in [KKS16, Section 3.2]. We expect this discretization to be superior to DMD, as it is a projection onto hundreds of boxes in our example cases, while DMD is a projection merely onto the 24-dimensional observable space. Nevertheless, an application of DMD with offset s = 5 to the data set D2 gives a mode with decay rate κ = −0.0021 ± 0.0116i, in good agreement with the above. The reason for this is that the complex cyclic mode can be well approximated by the linear functions of the observable; e.g., 8 k=1 e 2πki/8 ((z) k + (z) k+8 + (z) k+16 ), where z = ((z) 1 , . . . , (z) 24 ) T ∈ R 24 . However, applying DMD to the data set D1, we do not find any mode with decay rate close to κ = −0.02, which we attribute to the fact that the radial direction in our embeddings above-where the dynamics with this decay rate is happening-is a nonlinear function of z.
Thus, DMD can capture dynamical features that can be characterized by linear functions of the observable, but is oblivious to other dynamical behavior.
Direct Koopman analysis by diffusion maps. Berry, Giannakis, and Harlim recently developed a method [BGH15] with strong connections to the one we presented here. They approximate the so-called (stochastic) Koopman generator L, i.e., the generator of the stochastic differential equations assumed to be underlying the dynamics that governs the data, directly on the diffusion maps eigenvectors Ξ i . To this end they use the approximation to obtain a converging (Galerkin) approximation as the number of dynamical samples grows. Our transition matrix T can be seen as an approximation of e τ L on a discretization of the domain of this operator; see [FJK13] for connections between the transition matrix and discretization of the generator. The main difference lies in the discretization of the approximation space: while we are aggregating data points in a fixed number of partition elements, the approximation L ∈ R m×m of [BGH15] grows in size with the size of the data set, and makes it thus numerically intractable at some point. It should be nevertheless remarked that by combining their method with the out-of-sample extension of diffusion maps, as used here, it is possible to compute more accurate (Galerkin) approximations of L on a fixed tractable set of diffusion maps eigenfuctions [TGDW19].
Effective transfer operators and Koopman Mode Decomposition. Effective (reduced) transfer operators arise in the context when the state of the system is only observed partially, through some non-linear observation function ξ : X → R r . They propagate distributions or observables f ξ that are functions of ξ, i.e., f ξ =f • ξ withf : R r → C, describing the effect of the dynamics as seen through the observation function ξ, conditional to the system being in equilibrium. Thus, they are given by the conditional expectations where E[ · ] denotes expectation with respect to the invariant measure of the system. It is immediate that our transition matrix T , defined by (11) in section 4.2, is a discretization of P τ ξ with observation function ξ = ξ • • h. In fact, it can be seen as a Galerkin projection of this operator on the space spanned by piecewise constant functions over the boxes B i , cf. [KKS16] for further details. Its transpose T T is an analogous approximation of the effective Koopman operator K τ ξ , which is the adjoint of P τ ξ . It is shown for reversible dynamics in [BKK + 18] that the dominant timescales of the original system are still well retained in the system observed through ξ, i.e., the dominant spectra of P τ and P τ ξ are close, if the dominant eigenvectors φ j of P τ are well parametrized by the observation function, i.e., there existφ j : R r → C, such that φ j ≈φ j • ξ. This non-linear representation property is often present in molecular systems where the long time scales are connected to transitions between regions of state space ("reactions"), hence such a ξ is called reaction coordinate.
Koopman Mode Decomposition (KMD) [RMB + 09]-of which DMD is a particular data-based approximation-also considers the dynamics affecting a vector-valued observation function ξ. It assumes that the observation function can be decomposed in the eigenfunctions of K τ , i.e., ξ(·) = ∞ k=0 v j φ j (·), where v j ∈ C r are the vector-valued coefficients of this decomposition (the Koompan modes). The evolution of the observation ξ(x(t)) of the true systems state x(t) can then be described as or some suitable truncation of this series.
Given some observation function ξ, both KMD and spectral analysis of the effective transfer operator are model reduction tools for the original, complex, possibly high-dimensional deterministic or random system. While KMD aims at reconstructing the (expected) dynamics pointwise, spectral analysis of the effective transfer operator gives information about the dynamical processes with the slowest time scales. Requirements for good performance of both methods are in some sense quite opposite: for KMD to perform efficiently, one requires ξ to be (componentwise) representable through linear combinations of a reasonable number of eigenfunctions of the Koopman operator, while the effective transfer operators require the dominant eigenfunctions of the full transfer operators P τ or K τ to be approximately non-linearly parametrizable by a low-dimensional observation function. From the perspective of having a complex (chaotic) system at hand, it appears more suitable to ask for the long-term dominant statistical behavior of the system, as delivered by analyzing the effective transfer operators, than to consider single trajectories, as given by KMD. Also, once we have given a fair parametrization ξ of the (approximate) attractor of a system, unless important dynamical processes happen in dimensions of the attractor not seen by this parametrization, the eigenfunctions of the full transfer operators are non-linear functions of ξ.

Summary and discussion
In this paper, we have analysed geometric and dynamical features of turbulent Rayleigh-Bénard convection using the diffusion maps embedding approach. The state of the system was measured by the temperature at 24 different locations in the sidewall of the convection cylinder, and the approach embeds the unordered set of measurements into a space of chosen dimensionality.
We applied this embedding to the data of two different data sets, one where the convection cylinder was at rest and another one where it rotates with a constant angular speed. We found that in both cases the transformed data set forms a disc in the space spanned by the two dominant diffusion-map coordinates ξ 2 and ξ 3 . We found that this disc represents the large-scale circulation (LSC) in the convection container, with the orientation of the LSC being represented by the azimuthal location on the disc, and the strength of the LSC being represented by the radial distance from the disc center.
We furthermore investigated the double and single roll states and their representation in the (ξ 2 , ξ 3 )-embedding. We found that points corresponding to the DRS were located at the center of the disc, while points belonging to the SRS were located at the perimeter of the disc. Analysing the azimuthal dynamics of the embedded data shows a diffusive process on large time scales and a ballistic process on short time scales. This behavior supports a model for the large-scale circulation based on a Langevin equation suggested by Brown and Ahlers [BA08b].
While we clearly see the diffusive azimuthal motion of the LSC in our approach, we did not detect any clear signatures of the torsional or the sloshing mode of the LSC (see e.g., [ZXZ + 09, BA09]). We believe that this is mainly due to the small aspect ratio of the cell Γ = 0.5, causing the SRS to be rather unstable with frequent switching between SRS and DRS. Due to this erratic behaviour of the SRS, periodic torsional and sloshing oscillations are difficult to detect. However, we do expect to observe the torsional mode for Γ = 1 cells when the SRS is more stable and the torsional mode better detectable.
If the embedding does not represent the bulk dynamical behaviour (because, e.g., large but rare dynamical excursions dominate the geometry in state space), a transition matrix analysis in the low-dimensional embedding space can reveal hidden large-scale dynamical features, e.g., cyclic dynamics; shown in Figure 9.
The proposed approach belongs to the class of transfer (or Koopman) operator methods to approximate properties of dynamical systems. While having direct connections to other methods of this class [RMB + 09, WKR15,BGH15], it is a novel composition of the so-called Ulam discretization of transfer operators [Ula60,Fro98,DJ99,KKS16] and the diffusion maps [CL06b,CL06c] manifold learning approaches to approximate a so-called effective transfer operator [BKK + 18]. We discussed accuracy and numerical cost in connection with related methods. We expect the (spectral) convergence of our method towards the effective transfer operator P τ ξ from (14) in the correct limit of infinite data points and vanishing-diameter box-covering. This should follow by combining results from [VLBB08,DJ99] and [KKS16,Appendix C], under the assumption that the data lies exactly on a finite-dimensional smooth manifold.
We note that many of the findings reported here, can also be calculated by using a classical approach, where the amplitude and orientation of the LSC is determined from fitting a cosine function to the thermistors at a given vertical position [BA07,FBA08,BA09,BA08b,WA11c]. While this approach assumes already the existence of an LSC, the diffusion maps approach in contrast is suitable to detect unknown structure in data in the first place.
In summary, our approach was able to provide geometric intuition about the dynamical state space (in a general sense, the stochastic "attractor") of the infinite-dimensional Rayleigh-Bénard convection system in a cylinder. Furthermore, the long-term dynamical behavior on this space was revealed, and found to be in good quantitative agreement with previous experimental results. While these experimental results heavily relied on the knowledge of the physical space's geometry and model assumptions, our analysis is not utilizing such knowledge at any point. We believe that this and similar analysis techniques can help to improve the understanding and reduced modeling of high-or infinite-dimensional complex systems.

Acknowledgement
We acknowledge support by the Priority Program 1881 "Turbulent Superstructures" of the German Research Foundation (DFG). We thank Michael Wilczek and Jörg Schumacher for valuable discussions.

A.1. Diffusion distance
To understand in which sense does diffusion maps retain the geometric properties of the data manifold after embedding, let us recall that P is a stochastic matrix and thus P ij can be viewed as the probability that a random walker jumps from the data point z i to the data point z j . Now define a diffusion distance as follows: Here, π = (π 1 , . . . , π m ) T the stationary distribution of the Markov chain, and satisfies , such that π i P ij = π j P ji .
In this definition, the diffusion distance D(z i , z j ) is small, if the transition probabilities from z i are similar to the transition probabilities from z j . This can be written as P i− ≈ P j− , where P i− denotes the i-th row of P . Thus, we note that forξ , a weighted Euclidean distance in the space of distributions on the data set corresponds to the diffusion distance on the data manifold. The important observation in [CL06b] is that this now can be reformulated using the eigenvalue decomposition of P to yield This now means that with the embedding where · is the usual Euclidean norm. Eq. (18) holds only for the full m-dimensional embedding. Assuming that the eigenvalues Λ decay sufficiently fast (after sorting them accordingly), we can represent the data points sufficiently well by just a few eigenvectors, as in (5), i.e., in a lower-dimensional space. Also if this is not the case, one still obtains a valuable parametrization of the data manifold by just a few (usually the number of eigenvectors to take is the dimension of the data manifold) selected eigenvectors, as the other eigenvector do not contain additional topological information (these are so-called higher order harmonics, cf. Appendix B for an example). Of course, the quantitative property (18) is lost, but on a qualitative level one still obtains a low-dimensional one-to-one embedding of the data manifold.
Let us now consider the statement from section 3.2, claiming that a larger proximity parameter emphasizes the one-dimensional transient loops less in the embedding. Note that the data points on the transient arc are sparsely spaced along a one-dimensional line. This means, for a small ε there will be non-negligible transition probability P ij essentially only between neighboring points z i and z j . Thus, the random walker needs many steps to transition from one part of the arc to another. Meanwhile, for a larger ε, transitions jumping over several neighbors are possible, speeding up transitions strongly. In the bulk, for both small and large values of ε there are plenty neighbors present, allowing for a fast diffusive spreading. Thus, the diffusion distance between points in the bulk stays low in ε is changed from large to small, while on the transient arc it becomes large. If the pairwise mutual distance between the arc points is large, they need to fill in more space in the embedding.

A.2. Optimal choice of the proximity parameter ε
Here we will consider the task of automatically determining a "good" proximity parameter value ε (sometimes also called bandwidth) in the diffusion maps method. The ideas presented here stem from [CSSS08], and have been later refined in [BH16].
Let us consider the entries K ij (ε) = exp −ε −1 x i − x j 2 of the similarity matrix in the diffusion maps construction. If there are m data points sampling the manifold M, then, by interpreting the following sum as Monte Carlo approximation of an integral, we obtain Here, on the one hand, ( * ) works if ε is sufficiently large, such that the point cloud {x i } m i=1 "resolves" the functions exp −ε −1 · −x j 2 properly, such that the Monte Carlo estimation is valid. On the other hand, ( * * ) is a good approximation, if ε is sufficiently small, such that the integral M exp −ε −1 · −y 2 is well approximated by the same integral on the tangential space R d of M at y. For this, the "Gaussian bell" should be sufficiently localized, i.e., ε small. It is assumed, that in between, there is a sweet spot for the values of ε that the above approximations hold. It follows that We note the two limiting behaviors: • As ε → 0, we have K ij (ε) → δ ij , the Kronecker delta. Thus, S(ε) → m.
In between these two extremes there should be a region of linear growth in the double-logarithmic plot S(ε) versus ε, according to (19). This is suggested to be determined by maximizing d log(S(ε)) d log(ε) . The idea is that such an ε is neither too small (compared with the data point density), nor too large (compared with the diameter of the data point cloud). Note that the procedure also gives an estimate of the dimension of the manifold M through the slope computed in (19).

A.3. Out-of-sample extension of diffusion maps
Once we have computed the diffusion map matrix P and its eigenpairs (Λ, Ξ) for a given fixed set of data points, z 1 , . . . , z m , we would like to embed a new data point z without repeating the whole diffusion map computation anew on the entire data set augmented by z. Especially, we would like to avoid solving the numerically expensive eigenvalue problem.
To this end, let us write the eigenvalue equation P Ξ = ΛΞ of the diffusion map matrix in the following functional form, where p(z i , z j ) = P ij . The main idea of the "interpolation" is to use (20) to evaluate Ξ in an arbitrary point z, cf. [CL06c,Def. 2]. For this we need that the function z → p(z, z j )-for fixed data points z 1 , . . . , z m and given z j -can be evaluated at any point of the space, not just in data points z i . This is possible by simply repeating the construction in (1)-(3) with replacing z i by z. There, the summation in (1) and (2) is carried out over the original data points only. Thus, (3) gives the values p(z, z j ), and by (20) we set where the Ξ(z j ) have already been computed in advance. Naturally, one can vectorize this procedure for a whole set {z 1 , . . . ,z M } of new data points. Then we need to compute the interpolating matrixP ∈ R M ×m withP ij = p(z i , z j ), and Ξ := (Ξ(z 1 ), . . . , Ξ(z M )) T is obtained byΞ = 1 ΛP Ξ.

B. Eigenmodes of the Laplacian on a disc
To be able to better understand the diffusion-maps embedding, we will briefly consider the eigenvalue problem of the Laplacian on the disc D = {(x, y) ∈ R 2 : x 2 + y 2 ≤ 1} with homogeneous Neumann boundary conditions. It turns out [GN13, section 3.2] that eigenfunctions are all separable in the sense that they can be written in polar coordinates (r, φ) as n,m (r, φ) = cos(nφ)J n ( −λ n,m r), ψ (2) n,m (r, φ) = sin(nφ)J n ( −λ n,m r), where n ∈ N 0 , m ∈ N, J n is the n-th Bessel function of first kind, and −λ n,m is the m-th positive root of their derivatives J n . Due to the rotational symmetry in φ, the eigenspaces for n > 0 are twice degenerate, and ψ (1) n,m , ψ n,m form a basis for them. If n = 0, there is no ψ (2) 0,m , because the eigenspace is non-degenerate. The λ n,m are the associated eigenvalues. For instance: λ 0,1 = −14.682, λ 1,1 = −3.390, λ 2,1 = −9.328.
The corresponding eigenfunctions are shown in fig. 12 as surface plots. If the disc has radius R instead of one, the eigenvalues are scaled by 1 R 2 . Note that depending on the kernel function used in the diffusion maps method, one approximates different multiples of the Laplacian, i.e., in the appropriate sense where for our kernel function k(x, y) = exp(−ε −1 x − y 2 ) we have C = 1 4 , cf. [HAL07, Theorem 25]. Thus, the transformed eigenvalues (λ − 1)/ε (or log(λ)/ε) of the diffusion map approximate 1 4 λ n,m . As for high-dimensional data we cannot directly compare the diffusion map eigenfunctions with those of the Laplacian on a disc, an indication whether the data is an almost-isometric embedding of a disc in high dimensions is given by the following: (i) The eigenvalues themselves depend on the radius of the circle, but their ratios should stay close to the ratios given by the λ n,m above.
(ii) Another indicator for a disc-like manifold is if the embedding by different diffusion-map eigenfunctions resembles an associated embedding by the eigenfunctions of the true Laplacian on a disc. Some of these are shown in fig. 13. Thy are to be compared with the embedding of the convection data in fig. 3. That these three-dimensional embeddings show two-dimensional surface, is an indication of degeneracy (non-linear interdependence) between the eigenfunctions. This is expected, as the manifold we consider is itself merely two-dimensional.
(a) (b) (c) Figure 13: Embeddings of the disc D into R 3 by different combinations of Laplacian eigenfunctions. The cone-shaped (a), saddle-shaped (b), and figure-eight shaped (c) embeddings appear for the convection-roll data set as well, indicating a disc-like data manifold.

C. Noisy rotation on a circle
Let us consider the SDE dθ(t) = ω dt + which describes a noisy uniform rotation on the circle with circumference L, that we identify with the (periodic) interval [0, L). The associated backward Kolmogorov equation reads as ∂ t u = D 2 ∂ θθ u + ω ∂ θ u for u = u(t, θ) .
The associated eigenvalues (and eigenfunctions) are those of the right-hand side, which is a second order differential operator; called the generator. Expressing the Kolmogorov equation for a Fourier basis with basis functions φ k (θ) = exp i 2πkθ L , k ∈ Z, we directly obtain that the basis functions are eigenfunctions at eigenvalues Thus, λ k , λ −k are complex conjugate eigenpairs, and the corresponding eigenspace can be spanned by the real functions sin 2πkθ L and cos 2πkθ L . The forward Kolmogorov (or Fokker-Planck) equation associated with (22) reads as ∂ t u = D 2 ∂ θθ u − ω ∂ θ u, and has thus the very same eigenfunctions as the backward equation at complex conjugate eigenvalues, i.e., λ f wd k =λ bwd k . Thus, the noise and drift coefficients show up separately in the real and imaginary parts of the eigenvalues, and so we can estimate them from those.

D. Azimuthal drift and diffusion as a function of the applied rotation rate
We have explained in section 4.1 how one can extract information of the azimuthal orientation and their time dependence from the embedded data. We have shown that the azimuthal orientation can be modeled with a Langevin equation that includes a constant azimuthal drift ω 0 . This drift is zero (or small for a single time trace) for a non-rotating system, but becomes important under rotation. Fig. 14 shows the azimuthal diffusion D, the ballistic coefficient s and the average drift ω 0 for different rotation rates (expressed dimensionless as the inverse Rossby number 1/Ro). Let us first have a look at the drift in fig. 14b. Note that using the diffusion maps embedding and calculating the azimuthal position from it, does not preserve the sign of the azimuthal angle. As a result we cannot distinguish between cyclonic and anti-cyclonic motion of the flow structure. Thus, we assigned the sign from the classical analysis to ω 0 . The result in fig. 14b shows that the in this way calculated azimuthal velocity (in fact these are not flow velocities but rather velocities of the thermal structure) agree very well with the classically calculated values. |ω 0 | increase first with increasing 1/Ro reach a maximum at around 1/Ro≈ 0.5 and decrease after that. Another minimum is reached at around 1/Ro≈ 0.8 and then |ω 0 | increases again, with a negative sign.
The diffusion rate D and the ballistic coefficient s also change as a function of 1/Ro, as can be seen in fig. 14a. Interestingly, both values decrease initially with increasing 1/Ro, they reach a minimum at roughly 1/Ro≈ 0.8 and increase for larger 1/Ro. Note that a change in the heat transport was observed at 1/Ro=0.8. For faster rotation rates, the vertical heat transport is enhanced compared to the non-rotating case.
We interpret the results found here, such that for sufficiently small rotation rates, the flow is somehow stabilised, turbulent fluctuations are reduced or have less influence on the large-scale circulation and thus the diffusion coefficient is reduced. The finding further support that at 1/Ro=0.8 a transition occurs and that the fluid is in a different state for larger 1/Ro. While it is not clear what exactly happens at this point, there are evidences from previous studies that a large convection roll is replaced by multiple vortices and columnar structures in which cold and warm fluid is transported from the boundaries into the bulk (see e.g., [WSZ + 10, SOLC11]). The diffusion of the structures is then no longer driven by the turbulent plume emission, but rather by the motion and interaction of the vortices. Drift ω 0 as a function 1/Ro calculated via diffusion maps as explained above (blue bullets) and via the classical way (red x, see [WA11a]). One should note that since we cannot determine the direction of the drift, i.e., the sign in front of ω 0 , we assigned the sign from the classical analysis to it. Thus if the classical analysis shows cyclonic motion, we add assumed that ω 0 was positive, while we assumed negative ω 0 when the classical analysis shows anti-cyclonic motion. The black vertical line marks 1/Ro = 0.8, where there onset of heat transport enhancement due to rotation was observed (see [WA11c]).