Predicting flow reversals in chaotic natural convection using data assimilation

A simplified model of natural convection, similar to the Lorenz (1963) system, is compared to computational fluid dynamics simulations in order to test data assimilation methods and better understand the dynamics of convection. The thermosyphon is represented by a long time flow simulation, which serves as a reference"truth". Forecasts are then made using the Lorenz-like model and synchronized to noisy and limited observations of the truth using data assimilation. The resulting analysis is observed to infer dynamics absent from the model when using short assimilation windows. Furthermore, chaotic flow reversal occurrence and residency times in each rotational state are forecast using analysis data. Flow reversals have been successfully forecast in the related Lorenz system, as part of a perfect model experiment, but never in the presence of significant model error or unobserved variables. Finally, we provide new details concerning the fluid dynamical processes present in the thermosyphon during these flow reversals.


Introduction
Forecasting methodologies, traditionally motivated by numerical weather prediction (NWP), can find applications in other fields such as engineering (Savely et al., 1972), finance (Sornette and Zhou, 2006;Bollen et al., 2011), epidemiology (Ginsberg et al., 2009), and marketing (Asur and Huberman, 2010). Techniques borrowed from the weather forecasting community may prove to be powerful for forecasting these other types of complex systems. Fluid systems can be particularly challenging due to dynamics taking place at multiple interacting spatial and temporal scales. However, because of their relationship to NWP, fluid systems are among the most studied in the context of forecasting.
In this paper, we show that the flow in a computational fluid dynamics (CFD) simulated thermosyphon undergoing chaotic convection can be accurately forecast using an ordinary differential equation (ODE) model akin to the classic Lorenz (1963) system. The thermosyphon, a type of natural convection loop or non-mechanical heat pump, can be likened to a toy model of climate. Thermosyphons are used in solar water heaters (Belessiotis and Mathioulakis, 2002), cooling systems for computers (Beitelmal and Patel, 2002), roads and railways that cross permafrost (Lustgarten, 2006), nuclear power plants (Detman and Whipp, 1968;Beine et al., 1992;Kwant and Boardman, 1992), and other ⋆ Corresponding author. e-mail: kameron.harris@uvm.edu industrial applications. In these heat pumps, buoyant forces move fluid through a closed loop, and at high amounts of forcing they can exhibit complex aperiodic behavior. As first suggested by Lorenz (1963), this is illustrative of the unpredictable convection behavior observed in weather and climate dynamics.
Synthetic observations of the thermosyphon are combined with model data to produce new forecasts in the process known as data assimilation (DA). DA is a generic method of combining observations with past forecasts to produce the analysis, an approximately optimal initial condition (IC) for the next forecast cycle. Another interpretation of the analysis is that it is a "best guess" for the true system state as represented in the phase space of the model. DA can be used as a platform for the reanalysis of past observations, in which the dynamical model plays a key role in constraining the state estimates to be physically realistic (Compo et al., 2006).
In the present study, we use an analysis of simulated thermosyphon mass flow rate data to explain the heat transport processes occurring during chaotic flow reversals, and to inform empirical forecasts of the occurrence of these flow reversals. Although the flow reversals are chaotic, we show they have short-term predictability, quantifying the extent to which this is possible with our methods. This paper is structured in the following way: In Sec. 2, we explain the CFD simulation used to generate a synthetic true state or "nature run" of the thermosyphon and the separate forecasting model. In Sec. 3 we present an overview of how DA was applied to this experiment and its performance.  Figure 1. The thermosyphon has a simple circular geometry. The bottom wall is heated to a constant hot temperature T h while the top wall is maintained at the temperature Tc, creating a temperature inversion of hot fluid below cold fluid. If conduction alone cannot stabilize this temperature inversion, then the fluid will begin to rotate and convection becomes the dominant process of heat transfer. The relevant model state variables are proportional to the bulk fluid velocity u and the temperature difference across the loop ∆T 3−9 . For counterclockwise flow, as indicated by the arrow near 9 o'clock, fluid velocity u > 0 and ∆T 3−9 is typically > 0. The radius ratio R/r = 24 used in our experiments is shown to scale.
In Sec. 4 we explain and present the results for flow reversal and rotational state residency time forecasts. Finally, Sec. 5 contains concluding remarks. In Appendices S1-4 in the Supporting Information we present a derivation of the model, detail the tuning of model parameters, and explain in detail the DA methods used.

Models and the data assimilation algorithm
Following previous experiments that examined the periodic (Keller, 1966) and chaotic Creveling et al., 1975;Gorman and Widmann, 1984;Yuen and Bau, 1999;Jiang and Shoji, 2003;Burroughs et al., 2005;Desrayaud et al., 2006;Ridouane et al., 2009) behavior of toroidal thermosyphons, we also consider a circular thermosyphon geometry. Picture a vertically-oriented hula hoop, as shown in Fig. 1. An imposed wall temperature T h on the lower half of the loop (− π 2 < φ < π 2 ) heats the fluid contained in this section. Similarly, a wall temperature Tc < T h is imposed on the upper half ( π 2 < φ < 3π 2 ) to cool the upper section (Fig. 1). The forcing, proportional to the temperature difference ∆Tw = T h − Tc, is constant. We focus on the case of developed flow, ignoring transient behavior.
The behavior of the fluid can be qualitatively understood as follows. As the heating parameter is increased, the flow behavior transitions from a conduction state (conduct-ing equilibrium) to a steady, unidirectional state of convection (convecting equilibrium). No particular rotational state (clockwise, CW, or counterclockwise, CCW) is favored due to symmetry. At still higher heating values, chaotic flow oscillations can be observed. In the chaotic regime, the flow is observed to oscillate around one unstable convecting equilibrium state until flow reversal. Each flow reversals causes the system to transition between CW and CCW rotational states.

Thermosyphon simulation
The reference state of the thermosyphon is represented by a CFD-based numerical simulation in two spatial dimensions (2D). The details of the computational model have been described in detail in a previous study by Ridouane et al. (2009); however, for completeness, we summarize here its essential elements.
It is assumed that the temperature differential ∆Tw is sufficiently small so that temperature-dependent variations of material properties can be regarded as negligible, save for the density. The standard Boussinesq approximation is invoked and all fluid properties are assumed to be constant and evaluated at the reference temperature (T h + Tc)/2. The flow is assumed to be laminar, two-dimensional, with negligible viscous dissipation due to low velocities. Under these circumstances, the governing dimensionless equations are the unsteady, 2D laminar Navier-Stokes equations along with the energy equation and equation of state for the density. No slip velocity boundary conditions are imposed on the walls and isothermal boundary conditions of T h and Tc are imposed on the heated and cooled lower and upper walls, respectively.
The dimensionless control parameter for convection is the Rayleigh number, defined here as where g is the gravitational acceleration, γ is the thermal expansion coefficient, ν is the kinematic viscosity, and κ is the thermal diffusivity. The one dimensionless geometric parameter is the ratio of major (loop) radius R to minor (tube) radius r, hereafter referred to as the radius ratio. Consistent with the previous study, the dimensions of the loop are chosen with R = 36 cm and r = 1.5 cm to yield a radius ratio of 24.
As in the classic Rayleigh-Bénard problem, the Rayleigh number determines the onset of convection in the thermosyphon. For the numerical simulations on this fixed geometry, a range of Rayleigh numbers can be imposed by varying the value of the gravitational acceleration. As the Rayleigh number is increased from zero, the flow behavior transitions from a stationary, conduction state to a steady, unidirectional state of convection. At still higher values of Ra, chaotic flow oscillations can be observed. Unless otherwise indicated, the simulation results presented in this paper correspond to a value of Ra= 1.5 × 10 5 , which is within the chaotic regime.
All numerical simulations were performed using the commercial CFD software ANSYS Fluent (2006), which is based on the finite-volume method. (An example of the output is shown in Fig. 8, in the discussion of flow reversals.) During the course of the simulations, the time-varying mass flow rate, a scalar denoted by q and proportional to u, is saved at 10 s intervals. This reporting interval is conservative, as laboratory thermocouples can be sampled more than once pre second. In doing so, a time series of the "true" synthetic thermosyphon state is recorded to be used in a forecasting scheme.

Forecast model
The Ehrhard-Müller (EM) system is a three-variable ODE derived specifically to model bulk flow in the thermosyphon ; also see Appendix S1 in the Supporting Information for an alternative derivation). Written in dimensionless form, the governing equations are The state variable x1 is proportional to the mass flow rate or mean fluid velocity, x2 to the temperature difference across the convection cell (∆T3−9, measured between 3 o'clock and 9 o'clock), and x3 to the deviation of the vertical temperature profile from the value it takes during conduction; specifically, x3 ∝ ( 4 π ∆Tw − ∆T6−12), where ∆T6−12 is the temperature difference measured between 6 o'clock and 12 o'clock. The parameter α is comparable to the Prandtl number, the ratio of momentum diffusivity and thermal diffusivity. Similar to the Rayleigh number, the heating parameter β ∝ ∆Tw determines the onset of convection as well as the transition to the chaotic regime. Finally, K determines the magnitude of variation of the wall heat transfer coefficient with velocity. The functional form of that variation is determined by h : The interested reader is referred to Appendix S1 in the Supporting Information for an explanation of this piecewise form, which differs slightly from the original model of . Note that when K = 0, the system is analogous to the Lorenz (1963) system with geometric factor (Lorenz's b) equal to one. The lack of a geometric factor in the EM system is due to the circular geometry of the convection cell. Lorenz equations have been widely used in nonlinear dynamics to study chaos and in NWP as a model system for testing DA (Miller and Ghil, 1994;Yuen and Bau, 1999;Annan and Hargreaves, 2004;Evans et al., 2004;. When in the chaotic parameter regime, the EM system exhibits growing oscillations in the x1 and x2 state variables around their convecting equilibrium values until flow reversal. In this system, the CCW rotational state is characterized by x1 > 0 and x2 > 0, and the CW rotational state by x1 < 0 and x2 < 0. However, one should note that near a flow reversal x1 and x2 can have opposite signs, because zero-crossings of the x1 variable typically lag behind those of x2. Here, using the 3D-Var algorithm and an assimilation window of 5 min, the filter has satisfactory overall performance (scaled error ≈ 35%). Note the error spike around 135 min when the forecast and truth end up in different rotational states. The largest errors tend to occur at or near flow reversals due to inherent sensitivity near that transition and to the qualitatively different behavior of the different flow directions.
The parameters found to match the simulated thermosyphon were α = 7.99, β = 27.3, and K = 0.148. The characteristic time and mass flow rate scales, used to transform the dimensionless model variables t ′ and x1 into dimensional time and "observations" of mass flow rate, were 631.6 s and 0.0136 kg/s, respectively. The q scale is the one nonzero entry in the observation operator H, Eqn. (4). The above parameters were found using a multiple shooting algorithm explained in Appendix S1.2 in the Supporting Information. Numerical integration of this autonomous ODE was performed with a fourth-order Runge-Kutta method and timestep 0.01 (corresponding to 6.316 s) in Matlab (2009).

Data assimilation
DA is the process by which observations of a dynamical system are combined with forecasts from a model to estimate error covariances and calculate an optimal estimate for the current state of the system, called the analysis. The inherent difficulties are compounded by the fact that the forecaster uses an inexact forecasting model and never knows the true state of the dynamical system. The number of state variables in a NWP model is typically O(10 3 ) times larger than the number of observations. Nevertheless, the analysis becomes the IC for a new forecast. The time interval between successive applications of the DA algorithm, i.e. the time between analysis steps (usually determined by the availability of observations but here allowed to vary), is called the assimilation window. The process is illustrated in Fig. 2.
A variety of filters are capable of solving the DA problem. The canonical example is the Kalman filter (KF; Kalman, 1960), the optimal state estimation algorithm for a linear system. One of DA's first applications was to trajectory estimation and correction of missiles and rockets (Savely et al., 1972). A number of nonlinear DA schemes are implemented in this study. In 3D variational DA (3D-Var; here 3D refers to the spatial dimensions for weather mod-els), the background error covariance is estimated a single time, offline, prior to the data assimilation procedure. In the extended Kalman filter (EKF), background error is evolved according to the linear tangent model, which approximates the evolution of small perturbations about the trajectory. Ensemble Kalman filters (collectively EnKFs) use ensembles of forecasts to estimate the background error and better capture nonlinear behavior. The methods examined in this study were 3D-Var, the EKF, the ensemble square root Kalman filter (EnSRF), and the ensemble transform Kalman filter (ETKF). Detailed descriptions of each method are included in Appendix S2 in the Supporting Information. A full review of DA is beyond the scope of the present paper; for a comprehensive treatment, we refer the reader to .

Methods
A perfect model experiment, in which the Lorenz equations were used to forecast a synthetic truth created by the exact same system, was tested first but not included here. We found analysis errors similar to those reported by  (3D-Var and EKF) and  (ETKF), using the same model and tuning parameters. This ensured that the DA algorithms were working before applying them to the synthetic thermosyphon data.
As stated in Sec. 2.1, forecasts of the thermosyphon are made observing one scalar variable, the mass flow rate q ∝ x1. Gaussian noise with standard deviation equal to 6 × 10 −4 kg/s, approximately 0.8% of the mass flow rate climatological mean, q 2 = 0.075812 kg/s, is added to the synthetic truth to create observations. The relative magnitude of this error is comparable to that of experimental measurements.
The EM model is used in the forecast step to integrate the analysis forward in time and create the new background forecast. The end results of applying DA are a background and analysis timeseries of x1, x2, x3, informed by both the timeseries of thermosyphon mass flow rate and the EM model dynamics.
In this realistic forecasting scenario, where only limited information about the true state is available, the observations of state variable q provide the only validation. For this reason, we calculate the forecast errors in observation space. These are given as root mean square error (RMSE), where RMSE = δq 2 . The residual at a specific assimilation cycle is given by δq = q − Hx b . Here, x b is the background forecast made by the model, and H : R 3 → R is the linear observation operator in units of kg/s. All errors are then scaled by q 2 , the climatology of q. Analysis error is a common metric for assessing DA performance in perfect model experiments. In this study, however, we assert that background error is preferable. Analysis error in observation space, which will be small even for large assimilation windows, is not an appropriate metric for assessing model performance since it can disagree substantially with the background error. For example, 3D-Var in one experiment with a 10 minute assimilation window yielded analysis and background scaled errors of 0.08 and 0.86, respectively. The analysis error would seem to indicate that forecasting is doing a good job, but the background error shows that background forecasts are essentially meaningless. The filter, however, accounts for this and weights the observations heavily over the background forecasts when producing the analysis. Since we are concerned with forecasting, background error is a more representative metric.
When applying DA to nonlinear systems, some type of covariance inflation is performed to prevent filter divergence due to error underestimation.  found that a Lorenz forecasting model with a slightly different forcing parameter required a 10-fold increase in the multiplicative inflation factor when using a 3 member EnKF. Model error is more pronounced for our forecasts, since the EM model is a reduced approximation of the numerically simulated thermosyphon. We relied upon additive and multiplicative background covariance inflation to capture model error. Additive inflation was particularly important for the stability of the EKF and EnKFs. Additive noise provides a different exploration of dynamically accessible regions of state space, and it would be interesting to explore why additive versus multiplicative is preferred in certain cases, although this is beyond the scope of this paper. The specifics of how inflation was performed and tuned, and the parameters used are given in Appendix S2 in the Supporting Information.
All EM and DA parameter tuning was performed using a separate mass flow rate time series than was used for validation. Each DA algorithm was allowed 500 cycles to spin-up, and its performance was measured over the following 2500 cycles. Ensemble size in each case was set to 10 members.

Results
With proper tuning, all DA algorithms were capable of synchronizing the EM model to observations of mass flow rate alone. As the assimilation window increased, scaled background error increased in a sigmoidal fashion, as expected (see Fig. 3). For assimilation windows up to 2.5 min, all DA algorithms have nearly indistinguishable errors. For assimilation windows between 3 and 6 min, 3D-Var performs noticeably worse than the other methods which remain indistinguishable. Then, with assimilation windows greater than 6 min, the ensemble methods (EnSRF and ETKF) outperform the EKF noticeably. This is perhaps surprising, at first glance, because the ensemble size is significantly smaller than the dimension of the simulated thermosyphon state space (O(10 5 ) variables). However, we know the thermosyphon dynamics effectively take place on the EM equations' attractor (a manifold in three dimensions). The superior performance of EnKFs here is likely due to the ensemble methods capturing nonlinear effects which dominate at larger windows.
Following the historical S1 score convention, scaled error above 70% is considered a "useless" forecast, while under 20% the forecast is "perfect" . Perfect forecasts for 3D-Var were found up to a 4 minute assimilation window, while the other methods (EnSRF, ETKF, and EKF) produced perfect forecasts with assimilation windows 1.5 minutes longer. q 2 over 2500 assimilation cycles plotted for different DA algorithms and varying assimilation windows. As the window becomes larger, the error increases towards saturation. The lower dashed line in the main figure shows the limit of a "perfect" forecast while the upper demarcates a "useless" forecast.
A persistent spike in background error for the 5 minute assimilation window (Fig. 3) is possibly due to that time being approximately the same as the characteristic period of oscillations in q (evident in Fig. 2). We conjecture that this may lead to a type of resonance in the DA-coupled EM system which degrades DA performance.
Besides these results pertaining to forecast skill, we also found that the DA algorithms infer thermosyphon dynamics which are absent from the EM model. In Fig. 4 we see the simulated thermosyphon's attractor obtained by both a time-delay embedding ( Fig. 4(a); Alligood et al., 1996) and a projection of the EM analysis states to the x1-x2 plane ( Fig. 4(b)). If the thermosyphon fluid flow stalls in the midst of a reversal, fluid in the bottom can quickly heat up while that in the top is cooled, leading to an unstable, strong temperature inversion. This causes the fluid to move very quickly in the reversed direction, but this new direction also ends up being unstable, and a new flow reversal can occur immediately. Absent DA, the EM model system does not exhibit this behavior of stalling followed by large swings of the trajectory.
In the time-delay embedding ( Fig. 4(a)), this phenomenon is exhibited by small loops in the trajectory as it moves near the convective fixed points. The flow stalls when the system state is near the conductive fixed point at the origin, then it swings wildly which brings it near the convective fixed point, but in such a way that it does not end up spiraling outward in the usual fashion as during a normal flow reversal, as exhibited by the Lorenz equations. Instead, it quickly reverses again, which we call non-Lorenz behavior. This non-Lorenz behavior is further elaborated upon . Two views of the numerically simulated thermosyphon attractor. A time-delay reconstruction, using the monitored mass flow rate, is shown in (a). In (b), plotted points show x 1 and x 2 of the EM analysis generated by EKF with an assimilation window of 120 s. Each is colored by the scaled background forecast error at that point. The delay time of 60 s used for (a) was chosen because it yielded a good approximation of the attractor in (b). Note how in (a) trajectories that move through the far edge of either lobe create distinctive loops near the center of the opposite lobe. This is an example of dynamics which are not present in the EM model without DA. It may explain the higher error for points in (b) at the far edge of each attractor lobe. See text for further description and Fig. 5 for another example.
in Sec. 4.5. Forecast skill is worst at the far edges of the assimilated attractor ( Fig. 4(b)). This could be due to the wild swings of the EM trajectory after being ejected from the region of state space near the conducting equilibrium, or to the nonlinear dynamical instabilities at the edge of the attractor found by Palmer (1993) and Evans et al. (2004).
We also explicitly show one of these stalled flow rever- The EKF algorithm with a 30 s assimilation window was used. Color indicates the scaled background error. The state starts (a) in the CCW rotational region and stalls near the conducting state for an extended period, causing fluid in the bottom to heat up, manifesting in an x 3 that creeps towards 0 with x 1 ,x 2 ≈ 0, before the state swings quickly (b) through one oscillation in the CW rotational state. This is followed by one oscillation in the CCW state (c) before another stall near the conducting point and subsequent swing (b again) before settling into Lorenz-like oscillations (d). Note that the filter only observes the x 1 variable but is able to reconstruct the dynamics in the full state space. sals in Fig. 5, where we plot the EKF-assimilated EM trajectory using a 30 s assimilation window. When the fluid stalls, the x3 variable moves closer to 0 (i.e. ∆T6−12 increases) while x1 and x2 (proportional to q and ∆T3−9, respectively) are approximately 0, reflecting the growing temperature inversion while the fluid remains stationary. When the fluid starts to move, the assimilated trajectory swings wildly to the left attractor lobe (CW rotational state), then right (CCW rotational state). The trajectory undergoes another stall-swing cycle before finally resuming Lorenz behavior, where the trajectory spirals outward from the CCW convecting equilibria. This contrasts the Lorenz and EM model dynamics, for which large deviations in the system state from convecting equilibrium are driven close to the other convecting equilibrium during a flow reversal, which stabilizes the system. See also Sec. 4.5, Fig. 7, and the accompanying discussion. This result remains unchanged for the other DA algorithms also using a 30 s assimilation window. The inference remains using EKF and a 60 s assimilation window, but the trajectory appears much noisier, leading us to believe that this is due to the rapid update. With larger assimilation windows, the trajectory becomes uninterpretable as error in the unobserved variables increases.

Experimental setup
For the purpose of flow reversal forecasts, we picked a single DA algorithm and assimilation window. In this Section, all analyses were generated by the extended Kalman filter and an assimilation window of 30 s. This interval corresponds to 5 time steps of the model and is shorter than that used in  and . The following could certainly be repeated using other algorithms, observations, and assimilation windows, but this was beyond the scope of this paper. The flow reversal tests in Sec. 4.3 and the residency time forecasts in Sec. 4.4 were tuned and validated on separate analysis timeseries. The length of the tuning and validation timeseries were approximately 39 and 93 days, respectively.

Occurance of flow reversals: traditional explanation
The first explanation of the mechanism responsible for flow reversals was presented by  and repeated by Creveling et al. (1975). Welander, who was also the first to discover that thermosyphons exhibit aperiodic oscillatory behavior, explained the instability of steady convecting flow by considering a thermal anomaly or "warm pocket" of fluid. For low heating rates, the convecting equilibrium is stable because viscous and thermal dissipation are in phase, thus an increase (decrease) in flow rate leads to an increase (decrease) in friction and a decrease (increase) in buoyancy, and such perturbations are damped out. At higher heating rates, the warm pocket is amplified with each cycle through the loop due to out of phase viscous and thermal dissipation's. Welander explained that when the warm pocket emerges from the heating section and enters the cooling section, it feels a greater buoyant force than the surrounding fluid and accelerates, exiting the cooling section quickly, giving it less time to radiate away its energy. As the pocket moves into the section with warm boundary, the buoyant force it experiences is again higher than normal, so now the pocket decelerates and passes slowly through the heating section, gaining more energy. This positive feedback effect causes the pocket to grow hotter and larger with each pass through the loop. These oscillations in the fluid temperature and velocity do not grow unhindered, however. The pocket eventually becomes large and hot enough that its descent towards the heating section is stopped entirely by its own buoyancy. Without movement, the pocket dissipates, but its remnant heat biases new rotation in the opposite direction, and the flow reverses.
In the Lorenz and EM systems, this feedback is embodied in the spiraling repulsion of trajectories from the unstable convecting equilibria at the center of each lobe or wing of the attractor before moving to the other lobe. Because the growth of oscillations is an important component to the flow reversal process in both the CFD simulated thermosyphon and EM system, we define here what is meant by oscillation amplitude in each case. In the CFD simulated thermosyphon, it is the maximum distance of the system state from the nearest convecting equilibrium, where system state is understood to mean the state of the entire temperature and velocity flow fields in the CFD simulation. When considering the DA-generated EM analysis, the kth x1 oscillation amplitude x max 1 is defined as the maximum amplitude f ] is the time interval of the kth oscillation.

Flow reversal forecasting methods
Three separate tests were developed to predict, at each assimilation step, whether a flow reversal would occur within the next oscillation period (approximately 11 min), here taken to be within the next 20 DA cycles. See Sec. 4.6 and Appendix S3 in the Supporting Information for a description of how the tunable parameters were chosen.

Lead forecast
The simplest test forecasts a flow reversal whenever the background forecast changes rotational state. Note that to forecast a flow reversal occurring in the future, the background forecast started from the most recent analysis IC provides our only information about the system's future state. Ignoring the three-dimensional nature of the state space, a flow reversal is forecast whenever x1 crosses through zero. Note also that the forecast is unable to predict flow reversals that occur beyond the lead time, and that lead forecast quality quickly degrades as the lead time is increased. We impose a limit on the number of assimilation cycles to look ahead, λ lead = 7, so that the algorithm does not trust forecasts too far in advance.

Bred vectors
An ensemble of perturbed states forming a small ball around the analysis can be used to represent uncertainty in the IC. A nonlinear system will dynamically stretch and shrink such a ball around its trajectory as it moves through the attractor (Danforth and Yorke, 2006). Small perturbations to points on a trajectory are integrated forward in time, and the differences between perturbed and unperturbed solutions are called bred vectors (BVs). Here, the rescaling amplitude is 0.001 and the integration time coincides with the 30 s assimilation window.
The average BV growth rate is a useful measure of local instabilities (Hoffman et al., 2009). Evans et al. (2004), studying perfect-model forecasting of the Lorenz system, set a BV growth rate threshold which accounted for 91.4% of the observed flow reversals (hit rate). Our BV test simply forecasts a flow reversal whenever the average BV growth rate over the previous assimilation window exceeds a threshold, ρBV = 0.6786.

Correlation
The final test uses the fact that flow reversals are suspected to be caused by out of phase viscous and thermal dissipation. Since the friction term grows with fluid velocity ∝ x1 and the thermal dissipation grows with the size of the temperature anomaly, related to x2, we examined the correlation between those two variables over a tunable number of previous analysis cycles. Specifically, when the slope of the least-squares linear fit of λcorr = 18 previous analysis points [x2(t−i), x1(t−i)] T for i = 0, 1, . . . , (λcorr−1) exceeds a threshold ρcorr = 1.42, a flow reversal is forecast. See Fig. 6 for an illustration of this process. Interestingly, increasing autocorrelation of the state seems to be a universal property of many systems in advance of critical transitions (Scheffer et al., 2009;Cotilla-Sanchez et al., 2012).

Forecasting residency times in the new rotational state
We found that the analysis' x1 oscillation amplitude preceding each flow reversal is correlated with the duration of the following rotational state, shown in Fig. 7. We refer to these durations between flow reversals as residency times. Residency times are observed at discernible "steps" corresponding to integer numbers of oscillations. This correlation makes the x1 oscillation amplitude a plausible predictor for residency time in the new rotational state.
Furthermore, the average BV growth rate measured over the assimilation window preceding that extremum follows a clear gradient in Fig. 7, the growth rate increasing with oscillation amplitude. The BV growth rate gradient implies that more unstable system states precede longer residency times in the next rotational state. Outliers with x max 1 14.5 result in shorter residency times than expected from making similar plots to Fig. 7 for the pure Lorenz and EM systems (not shown). In the Lorenz and EM systems, the steps continue to move upwards with x1 oscillation amplitude. The discrepancy is due to the non-Lorenz behavior that was mentioned at the end of Sec. 3.2 Our residency time prediction algorithm proceeds as follows. When a flow reversal is forecast by one of the methods described in Sec  (11,12). This is the interval that we would consider for an x 1 oscillation amplitude of 10.5 preceding flow reversal. The most likely residency time is about 23 min or 2 oscillations, the middle "step" for the histogram range.
observed exactly once in the training timeseries, so it was considered too rare an event to merit a category). The typical residency times corresponding to 1, 2, 3, 4, 5, and 6 oscillations are taken to be 11. 48, 23.09, 33.72, 44.38, 55.11, and 66.08 minutes, respectively; the oscillation category associated with a given residency time is taken to be that with the closest time in this list. This algorithm generates a probabilistic forecast from the relative abundance of points in each oscillation category. An example output would be 20%, 40%, 30%, and 10% chance of 1, 2, 3, and 4 oscillations in the next rotational state and zero probability of 5 or 6 oscillations.

New details regarding the flow reversal mechanism
Not all flow reversals occur when the system reaches the same flow oscillation amplitude, nor do all rotational states last the same amount of time. During a flow reversal, the fluid motion stalls after hot fluid extends across the entire heating section into the cooling section (see Sec. 4.2 and Fig. 8). The magnitudes of this hot "tongue" and, likewise, the opposite cold tongue affect the stability of the system as it reverses. If the oscillation is small, it will mostly dissipate before the new rotational state is entered, bringing the temperature profile close to that of conduction. This is a highly unstable equilibrium, since the vertical temperature gradient builds until the fluid in the bottom is much hotter than the fluid above (illustrated in the analysis in Fig. 5). When the fluid begins to rotate, it accelerates rapidly. The large amount of heat carried by the fluid brings the system state far from the convecting equilibrium. If the oscillation is large (corresponding to a large deviation from convecting equilibrium in temperature and velocity), remnant warm and cool areas will be present in the top and bottom sections of the loop, respectively. These stabilize the new rotational state near its convective equilibrium. The resulting duration is longer since the instability requires more time to grow before causing the next reversal. These two situations are illustrated in Fig. 8 and explain the trend in the Lorenz region of Fig. 7. Animations of the simulated temperature field during flow reversal are consistent with this explanation 1 .
We believe that the behavior in the extremely large oscillation, non-Lorenz region, where x max 1 14.5 and shown Figure 8. Temperature profiles before and after two flow reversals for the chaotic regime in units of Kelvin. For ease of visualization, the radius ratio is reduced to 3. Ra = 1.8 × 10 4 . The amplitude of the oscillation shown in (a) is relatively weak; the temperature profile quickly approaches (b) that of conduction (c) where it heats up significantly before the reversal. The extreme instability of the conducting state, near time 20 min, produces a large oscillation in the CW direction (d), immediately causing another flow reversal back to CCW. As the system enters the new rotational state, remnant heat stabilizes the flow [contrast (e-f) with (b-c)], necessitating a longer residency in the new rotational state while the instability grows (g-i) (note that for the radius ratio of 24 no more than 7 oscillations are ever observed).
in Fig. 7, is caused by excessive remnant thermal energy after flow reversal. Although the temperature distribution present after a flow reversal is configured in a way that stabilizes the flow in the new rotational state, the very large magnitude of the temperature field is a competing, destabilizing factor that dominates as x max 1 increases into the non-Lorenz region. This leads to shorter durations in the new rotational state before a second flow reversal occurs.

Flow reversal forecast skill
The results of the three tests are presented in Tab. 1 as two-by-two contingency tables. Shown in Tab. 2 are the threat score (TS), false alarm ratio (FAR), and probability of detection (POD) (Wilks, 1995). Given a non-probabilistic yes/no forecast with a hits, b false alarms, c misses, and d correct negatives for a total of n events, these are defined as TS=a/(a + b + c), FAR=b/(a + b), POD=a/(a + c). Because flow reversals are relatively rare events, the hit rate (a+d)/n would be dominated by correct negatives. Instead, TS is chosen as an appropriate overall performance metric since it disregards these frequent negative events and takes into account both false alarms and misses.
There are trade-offs among the various skill scores for each flow reversal test. Tuning the reversal tests then amounts to multiobjective optimization, attempting to maximize TS, RPS-avg, and RPS-med (the skill scores used for residency time forecasts, defined in Sec. 4.7), minimize FAR, and maintain POD above 95%. The goal was to tune each method to all-around good performance, for both reversal occurrence and residency time forecasts. To guide the process, plots of the skill scores were made for different tuning parameters, but the final tuning was performed ad hoc. In Appendix S3 in the Supporting Information, Fig. S3-2 shows one of these tuning experiments with the final parameters chosen appearing in the center of each subfigure.
Considering TS alone, the lead forecast performed best, followed by the correlation test, with the BV test performing poorest. The BV test also had a very high FAR, leading us to conclude, in contrast to the results of Evans et al. (2004) for a perfect model experiment, that BV growth rate is a poor overall predictor of flow reversals in a realistic thermosyphon. On the other hand, the correlation test had the lowest FAR while maintaining a high TS, but this comes at the price of more misses, resulting in a lower POD. The reasonable performance of the correlation test in all areas lends circumstantial evidence to the claim that out of phase dissipations are indeed the cause of flow reversals.
The flow reversal occurrence tests are triggered in different situations, leading to variation in how far in advance flow reversals are detected, the "warning time". Warning times were only computed for hits, i.e. forecast flow reversals which were observed to occur. The lead, BV, and correlation tests had average warning times of 175, 217, and 304 s respectively. Histograms of these warning times are presented in Appendix S3 in the Supporting Information.

Residency time forecast skill
A näive way of forecasting residency times would assign each possible outcome a probability equal to that measured  Table 2. Skill scores for flow reversal categorical forecasts (TS, FAR, POD) and residency time probabilistic forecasts (RPS-avg, RPS-med) for the three tests. Note that for TS and POD a perfect score is 100%, while a perfect FAR is 0%; RPS scores should be interpreted as a percent improvement over climatology, so that any RPS > 0 is an improvement.
from the climatology. In our case, this would amount to using the marginal distribution of oscillation occurance. However, our method also takes into account the x max 1 before the flow reversal (i.e. the joint distribution of events by oscillation occurance and x max 1 ), which we have shown contains important information about the number of oscillations that the system will undergo in the new rotational state. So, we compare our method to climatology using a ranked probability skill score (RPS, see Wilks, 1995). This is only computed in the case of hits. We actually computed two variants, by taking either the mean (RPS-avg) or median (RPS-med) of the ranked probability scores for each reversal event when computing the skill. The results are illustrated in Tab. 2. The lead forecast test performs best, followed by the BV test and the correlation test. Unsurprisingly, the flow reversal tests with smaller warning times performed better when making residency time forecasts. Because there is more information about the system state immediately preceding a flow reversal if the warning time is small, the residency time forecast is better informed. The magnitude of the improvement over climatology is large for all methods. The RPS scores in Tab. 2 are similar to or better than those for probabilistic forecasts in NWP (Tippett and Barnston, 2008;Doblas-Reyes et al., 2000).

Conclusion
DA was shown to be an effective way of coupling a simplified model to CFD simulations of the thermosyphon. Although background forecast errors were always larger than observational noise, climatically scaled background error was small for reasonable assimilation windows. Proper tuning of multiplicative and additive inflation factors was essential for avoiding filter divergence and achieving low forecast error. All of the DA methods used in this study accurately capture the behavior of the thermosyphon with short assimilation windows. Among the DA methods, the ensemble methods show advantages over 3D-Var and EKF with longer assimilation windows, when nonlinear error growth becomes important. With frequent analysis update, DA can reveal non-Lorenz behavior in the thermosyphon even with the EM (Lorenz-like) model.
Three different predictors of flow reversals were proposed and tested with reasonable success. In comparison with the two rules in Evans et al. (2004) for predicting the behavior of the Lorenz trajectory, the BV growth rate is a useful measure of the EM model's dynamical instabilities, but it does not perform well on its own as a predictor of flow reversals. Finally, the amplitude of the final oscillation in the current rotational state was found to be correlated with the residency time in the following rotational state, and we provide a physical explanation for this phenomenon, elaborating on the details of flow reversals. Oscillation amplitudes were then used to create probabilistic forecasts of those residency times with significant improvements over climatology.
A laboratory thermosyphon device is in construction. The next stage of this research will apply similar methods to forecasting the system state, flow reversals, and residency times using 3D numerical flow simulations. Spatiallyaware DA techniques, such as the Local Ensemble Transform Kalman Filter , could be applied to finite-volume or finite-element models to study the spatial structure of the fluid flow and error growth. These imperfect model experiments could be used to compare the relative performance of other DA algorithms (4D-Var, , synchronization approaches (adaptive nudging, see , and empirical correction techniques Allgaier et al., 2012).
Although the thermosyphon is far from representing anything as complex and vast as Earth's weather and climate, there are characteristics our toy climate shares with global atmospheric models. Sophisticated atmospheric models are, at best, only an approximate representation of the numerous processes that govern the Earth's climate. Global weather models and the EM model both parameterize finescale processes that interact nonlinearly to determine largescale behavior. Clouds and precipitation are sub-grid scale processes in a global weather model, and the correlations for the heat transfer and friction coefficients are parameterizations of fluid behavior on a finer scale than can be dealt with in the reduced model. Cloud formation is only partly understood, and moist convection is an area of active research where some models bear similarities to the EM model considered here (Weidauer et al., 2011).
The methods we use to forecast the toy model are also similar to the methods used for global geophysical systems. Both require state estimation to find the IC from which to generate forecasts. Also, when forecasts are made in either system, climatology and dynamically accessible regimes are often more important than specific behavior: the occurrence of flow reversals for the thermosyphon; periodic behavior such as the El Niño Southern Oscillation, and statistics such as globally and regionally-averaged temperatures and their effects on rainfall, ice cover, etc. for climate. Each of these is a statistic that must be post-processed from the model output. To meet these global challenges, many tools are needed in the modeling toolbox. These techniques may also be useful for forecasting sociotechnological systems which are rapidly gaining importance as drivers of human behavior. In this way, toy models can provide us with insights that are applicable to the important scientific problems of today.

S1.1 Derivation
Following the derivations of  and , we consider the forces acting upon a control volume of incompressible fluid in the loop. All fluid properties are cross-sectionally averaged, and the radial components of velocity and heat conduction within the fluid are neglected. The fluid velocity u = u(t) is assumed to be constant at all points. Applying Newton's second law, the sum of all forces on the control volume must equal its change in momentum: where Fp = −πr 2 R dφ∇p = −πr 2 dφ ∂p ∂φ (S1-1b) (S1-1d) The angular coordinate φ and loop dimensions r and R are defined in Fig. 1 in the main text, g is the acceleration of gravity, ρ is the fluid density, u is velocity, and p is pressure. The total force in Eqn. (S1-1a) is comprised of the net pressure (Fp), friction from shear within the fluid (F f ), and the force of gravity (Fg). The pressure term, Eqn. (S1-1b), is the volume times the pressure gradient. The friction term, Eqn. (S1-1c), is written in this form in order to simplify the analysis; all frictional effects are contained in fw which will depend on fluid velocity, to be discussed later. Before we write the momentum equation, it is convenient to apply the Boussinesq approximation, which assumes that variations in fluid density are linear with temperature. In other words, ρ = ρ(T ) ≈ ρ0(1−γ(T −T0)) where ρ0 is the reference density, γ is the coefficient of volumetric thermal expansion, and T0 = 1 2 (T h + Tc) is the reference temperature. The Boussinesq approximation also states that the density variation is insignificant except in terms multiplied by g. Thus, the density ρ is replaced by ρ0 in all terms of Eqn. (S1-1) except gravity, Eqn. (S1-1d). Using the Boussinesq approximation, gathering terms, and dividing out common factors gives the momentum equation (S1-2) Integrating about the loop, the momentum equation is simplified because u and fw are independent of φ and other terms drop out due to periodicity.
We now must account for the transfer of energy within the fluid, and between the fluid and the wall. All modes of heat transfer are neglected except convection, which is a valid approximation when r ≪ R ). The energy rate of change (D/Dt is the material derivative with respect to time) in the control volume is which must be equal to the heat transfer through the wall ∆Q = −πr 2 R dφ hw(T − Tw) , (S1-5) where cp is the specific heat of the fluid, hw is the heat transfer coefficient, which depends on velocity, and Tw is the temperature at the wall. Combining Eqns. (S1-4) and (S1-5) gives the energy equation Together, Eqns. (S1-3) and (S1-6) represent a simple model of the flow in the loop. The transport coefficients fw and hw characterize the interaction between the fluid and the wall. They are defined by the constitutive relations  hw = hw 0 [1 + Kh(|x1|)] (S1-7) fw = 1 2 ρ0fw 0 u , (S1-8) where x1 ∝ u is the dimensionless velocity. The function h(x) = p(x) when x < 1 x 1/3 when x 1 (S1-9) in Eqn. (S1-7) determines the velocity dependence of the heat transfer coefficient, which varies as u 1/3 for moderate u . We introduce the fitting polynomial p(x) = 44/9 x 2 − 55/9 x 3 + 20/9 x 4 to ensure that hw is analytic at x1 = 0. This piecewise definition causes h(x) to vary as p(x) for x 1 and x 1/3 for x > 1. Eqn. (S1-8) gives the frictional deceleration of the fluid when |u| > 0, and the ρ0/2 term is retained to simplify the final solution. Dimensionally, fw is an acceleration (m/s 2 ) and hw is power per unit volume per unit temperature (W/m 3 K). These coefficients hw 0 , fw 0 , and K must be estimated from experiments (e.g.,  or from other empirical means. In Sec. S1.2, we describe the empirical methods used for parameter estimation.  solved the system of two coupled, partial differential equations (Eqns. (S1-3) and (S1-6)) by introducing an infinite Fourier series for T . The essential dynamics can be captured by the lowest modes, i.e., T (φ, t) = C0(t) + S(t) sin φ + C(t) cos φ . (S1-10) Because this form of T separates the variables φ and t, the problem is transformed into a set of ordinary differential equations. Substituting Eqn. (S1-10) into Eqn. (S1-3) and integrating gives the equation of motion for u. Similarly, Eqn. (S1-6) is integrated by dφ sin φ and dφ cos φ to separate the two temperature modes S and C. The system is written in dimensionless form where the following linear transformations have been made to create dimensionless variables Physically, x1 is proportional to the mean fluid velocity, x2 to the temperature difference across the convection cell or ∆T3−9 (between 3 o'clock and 9 o'clock), and x3 is proportional to the deviation of the vertical temperature profile (characterized by the temperature difference between 6 o'clock and 12 o'clock, ∆T6−12) from the value it takes during conduction.
The parameter α = 1 2 ρ0cpfw 0 /hw 0 is comparable to the Prandtl number, the ratio of momentum diffusivity and thermal diffusivity. Similar to the Rayleigh number, the heating parameter Rhw 0 fw 0 ∆Tw (S1-13) determines the onset of convection as well as the transition to chaotic behavior. Although the previous derivation assumes a 3D geometry, the CFD simulations described in Sec. 2.1 of the main text, were performed in 2D. A 2D geometry corresponds to infinite concentric cylinders as opposed to the quasi-1D torus. Due to cross-sectional averaging, the EM equations of motion (S1-11) are the same in 2D or 3D; the change may be realized by letting πr 2 → 2r in Eqns. (S1-1), (S1-4), and (S1-5) and carrying out the rest of the derivation. The only differences arise in the non-dimensional transformations and parameters, which were empirically determined by a multiple shooting algorithm (see Sec. S1.2).

S1.2 Parameter estimation
Before any forecasting, the parameters matching the EM model to the thermosyphon simulation needed to be determined.  used experimental measurements to determine the correlation coefficients for friction, fw 0 , and heat transfer, hw 0 and K. They achieved this by opening the loop at φ = π/2 and providing a developed flow with adjustable velocity. By measuring the pressure loss (∝ fw) and heat transfer across the loop for a range of velocities, they were able to find the correlation coefficients using regression. We were unable to accomplish this with a CFD simulation of an open-loop geometry.
Instead, parameter estimation was formulated as a multiple shooting problem. Shooting methods minimize the error in an ODE trajectory relative to data by optimizing over all possible initial conditions and parameter space. Multiple shooting is a shooting method suitable for chaotic ODEs (Baake et al., 1992). It overcomes the sensitive dependence on initial conditions by partitioning the data set and solving the shooting problem on those subsets of the data, augmented by continuity conditions. We used the nonlinear least square optimizer lsqnonlin in Matlab (2009) to perform the minimization and relaxed the continuity constraints. The model parameters which were tuned were α, β, and K. However, we also needed a way to determine the time and velocity scales to convert the dimensionless variables t ′ and x1 to their observed, dimensional values t and q. These scales change as the other parameters are varied, so these were incorporated into the variables of the optimization.
The results of the multiple shooting algorithm are shown in Fig. S1-1 and Table S1-1.

S2.1 Kalman Filter (KF)
The KF is well-known and widely used in linear DA and control problems. Although the thermosyphon is highly  nonlinear, the linear update equations are similar to those of the nonlinear algorithms used for this experiment. The KF attempts to assimilate observations and forecasts for a process of the form In this case, x t is the true state, which advances in time according to the linear process W, which is unknown but approximated by the model M. Subscripts index the time step. Using the model, the analysis from the previous time step is integrated to generate the background forecast for the current time step where M is the linear model, x a is the old analysis, and x b is the background. Because M is only an approximation of W, a perfect initial condition will not lead to a perfect forecast, so where the model errors ǫ q have covariance Q (usually assumed to be constant in time) and are written on the right hand side for convenience. When deemed unnecessary, time subscripts are left out. Given an observation y and background forecast x b , the KF finds the optimal way to combine them into the analysis x a , the best guess of the current state. This becomes the IC when forecasting with the model, Eqn. (S2-2). In an operational context, we usually cannot observe every state variable. If x ∈ R N and y ∈ R M , then M < N (in NWP M ≪ N ), so we define the observation operator H : R N → R M that takes the background forecast from the model state space into the observation space. This serves two purposes: first, it avoids extrapolation of observations to gridpoints in state space; and second, it enables us to interpret our forecasts by comparing them directly to observations. For the thermosyphon, H is linear, so we write it as H, but this is usually not the case for the observations in NWP, e.g., satellite radiances and radar reflectivities.
The complete application of the KF consists of a forecast step and an analysis step with the Kalman gain K k given by The forecast equations create the background forecast and update the background error covariance. The new background error covariance is the old analysis error integrated forward plus the model error Q. In the analysis step, this background forecast is incremented by the gain times the innovation (y − Hx b ) to produce the analysis. The difference between the analysis and the background is referred to as the analysis increment; statistical properties of these increments can be used to reduce model error Danforth and Kalnay, 2008b;a). The new analysis error is equal to the background error reduced by a factor of (1 − KH). By finding the analysis, the filter has revealed the best possible starting point for the next background forecast. In fact, if the system is linear, the KF is the optimal algorithm for state-estimation.

S2.2 Variational Filtering (3D-Var)
Rather than minimize the analysis error variance, the analysis equations can also be derived by finding the analysis state x a that minimizes the quadratic scalar cost function The cost C(x) has its minimum at x = x a , where x a is given by Eqn. (S2-5). This is called the 3D variational (3D-Var) method since the minimization for NWP is with respect to a state vector embedded in a three-dimensional field (latitude, longitude, and height).
Formally, both 3D-Var and the KF yield the same solution . However, in this case the control variable is the analysis, while in the KF the control variable is the weight matrix itself. In operational NWP, where the dimension of the state space N is of O(10 9 ), the numerical implementations of 3D-Var and the nonlinear KF are drastically different. Because 3D-Var assumes the background error B is fixed in time, the Kalman gain K needs to be calculated only once. The calculation of K is the most computationally prohibitive part of DA because it requires solving a linear system in N variables. A constant K thus makes the algorithm computationally simple; the most difficult part of implementing 3D-Var is finding the optimal B.
However, a static B is not realistic. From a dynamical systems standpoint, uncertainty is closely related to stability, which is clearly dependent on the system state. In the thermosyphon, the true background error is typically smaller when the system state is near the unstable convecting equilibria than when the state is near the more unstable conducting equilibrium. Because 3D-Var is computationally cheap, the National Centers for Environmental Prediction (NCEP) employ it to estimate ICs for the National Weather Service 14-day global forecasts. However, it cannot detect so-called "errors of the day", state-dependent forecast errors which grow quickly but are not represented in the 3D-Var background error covariance matrix .
In our implementation, the 3D-Var background error covariance B3D-Var was calculated iteratively, using a techniques similar to that described in . We did this by calculating a time average of the outer product of analysis increments disregarding the initial 500 assimilation cycles and iterating the process until convergence. During this, forecast errors were observed to decrease and stabilize. This B3D-Var was first computed for the 30 s assimilation window. It was stored and then used to bootstrap the iterative procedure for the 60 s assimilation window, which was stored and fed into the calculation for the 90 s assimilation window, etc.

S2.3 Extended Kalman Filter (EKF)
The EKF is essentially the KF applied to a nonlinear model. Given a nonlinear model M, the error covariances are updated by the linear tangent model M ≡ ∂M/∂x| x=x b which takes the place of M in Eqn. (S2-4b). This model propagates small perturbations around the trajectory x b forward in time. To operate on the matrix A with the linear tangent model, first take the Jacobian of F (the right hand side of the nonlinear differential equationẋ = F (x) which describes the model M) and evaluate it at the background point x b ; call this matrix J. Each column ai of A, which can be thought of as an error perturbation to the analysis state, is then integrated forward in time according to the linear ODEȧi = J ai.
Also note that if the observation operator H is nonlinear, it is replaced by a similar linear tangent model H in the matrix equations (S2-5) and (S2-6). The transpose of these matrix functions are called adjoint models, which are used in sensitivity analysis of the state to perturbations.
To propagate the background covariance without the explicit adjoint model, as Eqn. (S2-4b) would require, B was first decomposed with the Cholesky factorization (Golub and van Loan, 1996) into the product of a lower and upper diagonal matrix before its columns were integrated forward with the linear tangent model M.
This guarantees symmetry for the new analysis error covariance A. Some modifications to the EKF algorithm are necessary to prevent filter divergence. A multiplicative inflation factor -11) was applied to the background covariance matrix after the model integration and before the analysis step. We also performed additive inflation, following . Random numbers uniformly distributed between 0 and µ were added to the diagonal elements of A after performing the analysis and before the next forecast step, i.e.
where ν is an N -dimensional vector whose entries are drawn from a uniform distribution between 0 and 1.

S2.4 Ensemble Kalman Filters (EnKFs)
The EnKF is a method that replaces a single forecast state with an ensemble of states. The spread of the ensemble about its mean gives an approximation of the background error covariance and forecast uncertainty, while the ensemble average gives the best guess of the forecast. The EnKF was first introduced by Evensen (1994). For a comprehensive overview of ensemble filters, see Evensen (2003). It was shown that if the observation, which has random error with covariance R, is perturbed with P random errors (again with covariance R), to make an P -member ensemble of independent observations {yi}, then the background error covariance can be written (Evensen, 2003) which is simply the unbiased average outer product of background perturbations X b = [x ′b 1 , . . . , x ′b P ]. The background forecast of ensemble member i is denoted x b i , x b is the background forecast ensemble average, and x ′b i = x b i − x b is the ith member's deviation from the mean. In this case, each ensemble member is updated according to the KF equations for their associated observation The advantages of the EnKF are many: there is no linear tangent model to compute, the number of ensemble members can be small (O(10 2 ) for NWP) relative to the dimensionality of the state space, and prior knowledge about the structure of the forecast errors is not necessary. Currently, 4D-Var (like 3D-Var but also taking into account older observations) and ensemble filters are the most promising candidates being considered to replace 3D-Var in operational NWP.
As with the EKF, ensemble filters tend to underestimate the background error, resulting in an ensemble spread which is typically less too small. We again used multiplicative inflation of the background error, a common method shown to be successful in Evensen (2003); Whitaker and Hamill (2002); Annan and Hargreaves (2004); ; . This is accomplished by setting before the analysis step. Additive inflation proved crucial to stabilizing both EnKFs tested. Without it, the filters sometimes worked but only with ∆ ≫ 1; ∆ is supposed to be a small parameter. As in the EKF, additive inflation is applied immediately after the analysis step, but in this case the noise is added to the analysis ensemble states x a i ← x a i + µ ν (S2-15) for all i = 1, . . . , P . The noise ν is, again, an N -dimensional random vector with entries drawn from the uniform distribution between 0 and 1.

S2.5 Ensemble Square Root Filter (EnSRF)
The original EnKF adds noise to create linearly independent observations and is classified as a perturbed observations method . This necessarily introduces additional sampling error into the forecast. For this reason, Whitaker and Hamill (2002) introduced the ensemble square root filter (EnSRF) as an improved EnKF. In the EnSRF, the ensemble mean is updated with the traditional Kalman gain (Eqn. (S2-6)) -16) and deviations from the mean are updated by (S2-18) When the observation is a scalar, it can be shown that If observation errors are uncorrelated (R is diagonal), then Eqn. (S2-19) can be used to process observations one at a time (Whitaker and Hamill, 2002). The updated analysis ensemble is then {x a i }, where x a i = x a + x ′a i . Square root filters have better numerical stability and speed than their standard KF counterparts. The Potter square root filter was employed for navigation in the Lunar Module of the Apollo program (Savely et al., 1972).

S2.6 Ensemble Transform Kalman Filter (ETKF)
The ETKF is another type of deterministic square root filter. In this variant, the analysis perturbations are assumed to be equal to the background perturbations postmultiplied by a transformation matrix T so that the analysis error covariance satisfies Eqn. (S2-5b). The analysis covariance is written c 0000 Tellus, 000, 000-000 The local ensemble transform filter (LETKF) is a variant that computes the analysis at a given gridpoint using only local observations. This allows for efficient parallelization. Localization removes spurious long-distance correlations from B and allows greater flexibility in the global analysis by allowing different linear combinations of ensemble members at different spatial locations . Table S2-1 lists the tuning parameters used for the DA experiments. The tuning was done manually. Sensitivity of model error to the tuning parameters was checked by creating a course contour plot of background error for assimilation windows of 2, 4, 6, 8, and 10 minutes for each filter; an example is shown in Fig. S2-1.

APPENDIX S3: Flow reversal forecast tuning
In Fig. S3-1, we present skill score curves as the tuning parameters of the three flow reversal forecasts are varied. We used a sequence of plots of this type to inform our tuning of the various flow reversal tests.
In Fig. S3-2, we also present warning time histograms for the different tests. If early detection of flow reversals is desirable, then the warning times tell us how the tests compare.