A New Method to Compute Transition Probabilities in Multi‐Stable Stochastic Dynamical Systems: Application to the Wind‐Driven Ocean Circulation

The Kuroshio Current in the North Pacific displays path changes on an interannual‐to‐decadal time scale. In an idealized barotropic quasi‐geostrophic model of the double‐gyre wind‐driven circulation under stochastic wind‐stress forcing, such variability can occur due to transitions between different equilibrium states. The high‐dimensionality of the problem makes it challenging to determine the probability of these transitions under the influence of stochastic noise. Here we present a new method to estimate these transition probabilities, using a Dynamical Orthogonal (DO) field approach. In the DO approach, the solution of the stochastic partial differential equations system is decomposed using a Karhunen–Loève expansion and separate problems arise for the ensemble mean state and the so‐called time‐dependent DO modes. The original method is first reformulated in a matrix approach which has much broader application potential to various (geophysical) problems. Using this matrix‐DO approach, we are able to determine transition probabilities in the double‐gyre problem and to identify transition paths between the different states. This analysis also leads to the understanding which conditions are most favorable for transition.

2 of 20 oscillator (Berloff et al., 2007). When these models include a stochastic representation of the wind-stress forcing, transitions between states with different paths can be found when the amplitude of the noise is large enough (Pierini, 2014;Schmeits & Dijkstra, 2001). In this paper, we focus on a new method to compute transition probabilities in multi-stable stochastic dynamical systems governed by stochastic partial differential equations, with the barotropic double-gyre quasi-geostrophic wind-driven model as application.
A standard, but naive, method to determine transition probabilities between different equilibrium states is a Monte Carlo approach. One obtains sufficient independent realizations of the system and simply counts the number of transitions within a given time. Such a Monte Carlo approach is not well-suited for high-dimensional systems or when transitions are rare. For example, for a transition probability of p = 10 −3 , already 1,000 realizations would be needed to likely simulate one event. Hence, several specialized methods have been suggested to compute transition rates more efficiently, such as Adaptive Multilevel Splitting (Rolland & Simonnet, 2015) and Genealogical Particle Analysis (Wouters & Bouchet, 2016). A promising variant for computing the probability of a transition within a certain time interval is the Trajectory Adaptive Multilevel Sampling (TAMS) method, where an ensemble of trajectories is steered from one equilibrium state to another, and by keeping track of the different trajectories, an estimate of the transition probability can be made (Lestang et al., 2018;Rolland & Simonnet, 2015).
To apply TAMS to high-dimensional systems, a projected time-stepping method was suggested in Baars et al. (2021), for which in every time step, only a small projected linear system has to be solved, and for which only the projected trajectories have to be stored, which are much smaller dimensional than the original trajectories. The projected time-stepping method is related to the Karhunen-Loève transform in the context of stochastic processes (Loeve, 1955) or the proper orthogonal decomposition in the context of fluid dynamics (Cazemier et al., 1998;Lumley, 1967). Such model order reduction techniques have been applied in many other fields when studying transitions (Hartmann et al., 2016;Hartmann & Schütte, 2005;Hernández et al., 2018;Mohamad et al., 2016;Mukhin et al., 2015;Yvinec et al., 2012).
Here, we explore the direct computation of transition probabilities using a combination of a model order reduction technique, the Dynamical Orthogonal (DO) field method (Sapsis & Lermusiaux, 2009) and classical Monte Carlo simulation. The stochastic partial differential equations are efficiently solved using a Karhunen-Loève expansion of the state vector and an orthogonal projection onto a small-dimensional subspace of DO modes. Until now, the implementation of DO has been limited as for each model the governing DO equations had to be derived analytically (Sapsis & Dijkstra, 2013). In this paper we present a general matrix-DO framework which is more versatile for applications and use it to study transition probabilities in the barotropic double-gyre quasi-geostrophic model of the wind-driven ocean circulation. In Section 2 we discuss the matrix-DO framework together with a set-up of the wind-driven ocean circulation model. The results on transition probabilities and transition paths for this model are the main focus of this study and are presented in Section 3. The results are summarized and discussed in Section 4.

The Wind-Driven Ocean Circulation Model
We consider the midlatitude wind-driven ocean circulation in a square L × L, constant depth D basin on a β − plane (f = f 0 + β 0 y), with f 0 the Coriolis acceleration at a certain (mid)latitude. The density of the ocean water ρ is constant and we assume the lateral momentum mixing to be represented with help of a constant eddy-viscosity A H . The flow is forced by a zonal wind-stress τ x (x, y) = τ 0 τ(x, y) having a deterministic and a stochastic component. When the equations are scaled with a horizontal velocity U, a length L and time L/U, the non-dimensional stochastic partial differential equations of this model (Sapsis & Dijkstra, 2013) become the following in streamfunction-vorticity formulation.
where ω is the vertical component of the vorticity vector and ψ is the associated streamfunction. Furthermore, VAN WESTEN ET AL.

10.1029/2022MS003456
3 of 20 where τ x is the deterministic part of the wind-stress forcing and the term b ∂W is the stochastic wind-stress curl; W indicates a Wiener process. The boundary conditions are given by slip conditions at the north-south boundaries (y = 0, 1) and no-slip conditions on the east-west boundaries (x = 0, 1).
Equation 1 contain three parameters, α = τ 0 L/(ρDU 2 ), β = β 0 L 2 /U, Re = UL/A H and the noise strength σ. We use the values for the parameters (α = β = 10 3 ) as provided in Sapsis and Dijkstra (2013) because the deterministic bifurcation diagram for this case is known, and focus on varying the parameters σ and Re. For the deterministic case (σ = 0), it is known that for relatively low values of Re (<30) a single symmetric state exists. Near Re = 30, there is a pitchfork bifurcation resulting in two (asymmetric) steady states, the so-called jet-up or jet-down states, and for even higher Re (>52) there is a Hopf bifurcation (Sapsis & Dijkstra, 2013). So the multi-stable steady state regime, focused on in this paper, is bounded by 30 < Re < 52.

The Matrix-DO Method
When Equation 1 are discretized in space on an equidistant n x × n y grid using central differences, the resulting set of equations can be written as with X(0) = X 0 . Here, ( ) ∈ ℝ is the stochastic state vector, W(t) is a n W − dimensional Wiener process (n W = 1), M is a symmetric positive semi-definite n × n matrix, f a smooth mapping ℝ × ℝ → ℝ and g a smooth mapping ℝ × ℝ → ℝ × ℝ . The governing Equation 1 can be written in the form Equation 3, where for example, g contains the discretized stochastic wind-stress curl. In this case, M is singular and we have a system of stochastic algebraic differential equations. We assume that there is no stochastic forcing on the algebraic part of the equations and that X 0 satisfies the algebraic part of the equations.
In the matrix-DO formulation, we aim to solve Equation 3 and split the state vector X(t) into a deterministic and an additive stochastic part where ̄( ) is the ensemble mean in the DO method, satisfying [ ( )] =̄( ) , and the remaining part Substituting Equation 4 into 3, and taking the expectation results in the equation for ̄( ) , that is, Subtracting Equation 5 from 3, we obtain the equation for Z(t) where Next we use a Karhunen-Loève expansion of the form where V(t) (n × n V ) contains an M-orthogonal basis of the so-called DO modes and Y(t) is a stochastic vector (n V × 1), containing the amplitudes of these modes. Here n V is the number of DO modes retained in the Karhunen-Loève expansion. Of course, the approximation is exact if the dimension n V tends to infinity, but we want to keep the dimension as low as possible, to reduce the needed computational effort. The statistical properties of the desired stochastic variables are determined by using n Y realizations. The larger the number of realizations, the closer the statistics of the stochastic variable is approximated.
Since V depends on t, the expansion Equation 8 is not unique, because for any orthogonal matrix U(t), ( ) = ( ) ( ) = ( ) ( ) ( ) ( ) ≡̂( )̂( ) . To remove this redundancy, the DO-condition is used which states that the change in V is orthogonal to the space V, that is, Next we substitute the Karhunen-Loève expansion Equation 8 into 6, leading to For readability, we drop the time argument in the equations from now on. Pre-multiplying Equation 10 by V T , using Equations 9 and 8, the following evolution equation for Y(t) is obtained: Pre-multiplication of Equation 11 with MV and subtracting it from Equation 10 gives To determine an equation for each DO mode we post-multiply Equation 12 by Y T and take the expectation, leading to Note that the DO-condition Equation 9, which leads to d(V T MV)/dt = 0, assures that when starting with V(0) T MV(0) = I then V T MV = I for all t.
From these equations we can now create a matrix based method by writing f(x, t) as a combination of a vector, a linear form and a bilinear form. Similarly g(x, t) is written as a combination of a matrix and a matrix that depends linearly on x. In the Supporting Information S1, the matrix DO equations are given for general quadratic f(x, t) and linear g(x, t). In this case, the user only has to provide only M, f(x, t), the Jacobian matrix of f, and the bilinear form f(x, t) contains. Moreover, the user should provide matrices defining the quantity g(x, t). Note that the version used here is limited to equations with most quadratic nonlinearities and additive stochastic forcing, but this still allows tackling a large class of problems. Details on how to numerically solve Equations 5, 11 and 13 are also presented in Supporting Information S1. Results of testing the code using the stochastic Burgers equation are presented in Figure S1 in Supporting Information S1 and a publicly available Python code is provided online.

Results
Within the multi-stable regime of the model 30 < Re < 52, we will explore the transitions between the two deterministically stable states. In the results below, we use a n x = n y = 64 equidistant grid and the standard (dimensionless) time step is Δt = 1 × 10 −3 . We first focus on the standard case (Section 3.1) for Re = 40 and σ = 1 and then in the following sections focus on the noise amplitude (Section 3.2), number of DO modes (Section 3.3) and the value of Re (Section 3.4).

The Standard Case
In the standard case, we take n V = 4 (number of DO modes) and n Y = 1, 000 (number of stochastic realizations). Based on adaptive criteria, Sapsis and Lermusiaux (2012) demonstrated that n V = 4 is the lowest number of DO modes which sufficiently capture the dynamics and statistics of the solutions of the wind-driven ocean circulation  Figure 1a) is the well-known symmetry-breaking P-mode (Simonnet & Dijkstra, 2002). A weaker noise amplitude (Figure 1b), more DO modes ( Figure 1c) and a higher spatial resolution ( Figure 1d) do not change this result. As expected, the variances for v 2 , v 3 and v 4 in Figure 1b are considerably lower (compared

of 20
to the standard case) as a result of the lower noise level. For Re = 20 (Figure 1e), the DO modes do not grow, which is consistent with the fact that this value of Re is outside the multi-stable regime of the model. A low spatial resolution effectively increases the value of Re at the pitchfork bifurcation to values beyond Re = 40 and hence no growth of DO modes is found ( Figure 1f).
The probability distribution function (PDF) of the stochastic coefficients (Y i ), together with streamfunction and vorticity patterns of and the v i at t = 15 are shown in Figure 2 for the standard case. Because two stable steady states exist for Re = 40 in the deterministic case, v 1 has a bimodal PDF (Figure 2a) while the PDFs of the other DO modes are Gaussian (Figures 2d, 2g and 2j). As already mentioned above, v 1 (Figure 2b) breaks the symmetry of ( Figure 2c). The spatial pattern of v 2 ( Figure 2e) shares much resemblance with the gyre intensifying L-mode (Simonnet & Dijkstra, 2002). It modulates the gyre strength as this DO mode has a similar (opposite) sign for the gyres compared to for positive (negative) values of Y 2 . The v 3 and v 4 DO modes (Figures 2h and 2k) show spatial patterns with a more complicated spatial structure, but we will not discuss those because their variances are relatively small.
In the following, the amplitudes of the realizations with Y 1 > 0 are indicated by ( +   1   ) , i = 1, 2, …, n V and the (ensemble) mean of these coefficients is indicated by . We use a similar notation for The mean values of the Y i coefficients are plotted in Figure 2f and show that the Y 1 coefficients strongly dominate in the model response. We reconstruct the total state using (Figures 2i and 2l). Note that we actually plotted the + and − , but we drop the bars for the equilibria states from now on. We also reduce the general notation to 1 ( + 1 ) = + 1 for i = 1 and to ( + 1 ) = + for i > 1, and similarly for the realizations with Y 1 < 0 ( − 1 and − ). We will next use the standard case to study transition behavior when the noise level is increased.

Transitions: Different Noise Levels
From the solution of the standard case (Figure 1a), we branch off simulations with different noise levels (σ). We increase the number of realizations to n Y = 10, 000 by copying 10 times the Y i at t = 15, while the and v i , i = 1, …, 4 (at t = 15) remain unaltered. We use a finer time step of Δt = 5 × 10 −4 in the time integration, both based in stability and accuracy reasons in the case of higher noise levels. After branching the simulations, we integrate the model up to t = 5 (t = 0 is now the end of the standard case simulation).
To identify transitions by analyzing individual realizations and checking whether the Y 1 value approaches the branch as the 95%-confidence level of all positive (negative) Y 1 values for each time step. Note that the related + or − coefficients are not strictly positive or negative as this is only the case for the + 1 and − 1 coefficients. We also distinguish the realizations based on their initial Y 1 sign at t = 0, as this is relevant when a realization undergoes a transition and switches (Y 1 ) sign. The transition probability is then the probability that a transition between the steady states (i.e., + 1 → − 1 and − 1 → + 1 ) happens within the given simulation time (t = 5). One relatively easy way to check whether realizations started near one equilibrium state reach the other equilibrium state is by taking the minimum (maximum) value of all the Y 1 realizations with an initial positive (negative) sign, which we refer to as + 1,min ( − 1,max ). Figure 3 shows an illustrative example on how realizations are distinguished based on the initial Y 1 sign and the related Y 2 coefficients. For the related + coefficients (with initial positive Y 1 ) we determined either the maximum value (and similarly − max and − min for the initial negative Y 1 realizations).
The variances of the DO modes are shown in the insets of Figures 4a, 4d, 4g and 4j for different noise amplitudes σ, while the branch specific properties of the Y i coefficients and the v i are plotted in Figure 5. For the standard case (σ = 1), we find no transitions for the chosen simulation time as the + 1,min and − 1,max stay close to the corresponding branch ( Figure 5a). This suggests that the transition probabilities are (very) low. The + 1,min deviates more from the + 1 branch than the − 1,max from the − 1 branch. Note that all n Y realizations have different Y i coefficients, but all n Y realizations share the same time-varying patterns v i . This suggests that the larger deviations in + 1,min compared to − 1,max are related to the spatial patterns of the DO modes. For example, a realization in the + 1 branch may experience a (small) decrease in Y 1 and increase in Y 2 . However, such excursions do not affect the jet-down circulation pattern much as the v 2 streamfunction pattern is strongly correlated with the jet-down state 8 of 20 (r ψ = 0.63, Figure 6a). The spatial correlations for the v 2 vorticity with the jet-down state are smaller (r ω = 0.23) because this pattern is spatially more varying than the streamfunction pattern. On the other hand, the degree of variations for a realization in the − 1 branch is smaller because the spatial correlations of v 2 with the jet-up state are smaller (r ψ = 0.48 and r ω = 0.23). This explains the higher variances in + 1 and + 2 compared to − 1 and − 2 (Figures 5a and 5b), respectively. Using the same reasoning, the variance in − 3 is larger than + 3 (Figure 5c) as v 3 is stronger correlated with the jet-up state than with the jet-down state (Figure 6a). The variance of v 3 is, however, smaller compared to that of v 2 ( Figure 4a) and hence the + 1 realizations have a larger variance compared to the − 1 realizations. We do not consider v 4 as this mode has an even smaller variance. This explains that particular variations in some Y i coefficients are larger than in others (e.g., + 1 vs. − 1 ) because of the resemblance of the DO spatial patterns with the equilibrium states (jet-down or jet-up state).
For σ = 5, the variances of the Y i strongly increase after branching this simulation from the standard case (inset Figure 4d). + 1,min and − 1,max deviate more from the corresponding branch ( Figure 5d) compared to the standard case, but the noise level of σ = 5 is still insufficient to induce transitions. The variances in + 1 and + 2 are again larger than those of − 1 and − 2 , respectively (Figures 5c, 5d and 6b). At t ≈ 4.7 the v 3 DO pattern substantially changes (Figure 6b). When two DO modes have the same variance the spatial patterns of these modes are inter-exchangeable; this is the case for v 3 and v 4 at t ≈ 4.7. However, these changes show no sign changes in the Y 1 coefficients.
Further increasing the noise level to σ = 10 shows substantial oscillatory deviations in − 1,max from the − 1 branch. Note that + 1,min also oscillates but the amplitude is much smaller. The variances of − 1 and − 2 are now larger than those of + 1 and + 2 (Figures 5g and 5h), respectively, as v 2 is now better correlated with the jet-up state than with the jet-down state (Figure 6c). The higher noise level causes a substantial perturbation in v 2 for t < 0.5 leading to a change in sign and flipped image (under mid-axis reflection) of this mode compared to the lower noise cases (compare insets in Figures 5b, 5e and 5h). Under the higher noise amplitude, v 2 slightly oscillates for t > 0.5, where its symmetry (w.r.t. y = 0.5) slightly varies over time (black curve in Figure 6c). A higher level of symmetry in v 2 coincides with relatively large deviations in − 1,max from the − 1 branch. Again v 3 and v 4 are inter-exchangeable as these modes have the same variance multiple times during the simulation (Figures 4g and 6c). The timing in swapping these DO modes is also linked to the symmetry of v 2 . Although a few realizations switch sign in Y 1 at the end of the simulation, the noise level of σ = 10 is still too low to induce transitions before t = 5.

of 20
The transitions do not randomly occur over time, only during specific time windows when v 2 is highly symmetric ( Figure 6d) and when the amplitude of this mode strongly increases (Figure 5k). This coincides with kinetic energy transfer from to v 2 , kinetic energy transfer from v 2 to the other DO modes (mostly to v 1 and v 3 ) and kinetic energy dissipation by v 2 (Figure 4). A symmetric v 2 together with the symmetric results in a strongly symmetric total circulation pattern when |Y 2 | > |Y 1 |. There is also an increase in the magnitude of Y 3 coefficients before undergoing the transition (Figure 5l), but the magnitude of Y 3 is smaller than that of Y 2 . The magnitude increase in Y 2 (and Y 3 ) makes the contribution of the DO mode v 1 less relevant and hence realizations with The transition probabilities between the two branches ( + 1 → − 1 vs. − 1 → + 1 ) are strongly different over this time interval. One could argue that the 8 realizations which return to the original + 1 branch (Figure 7a) are also transitions from − 1 to + 1 . Indeed, these 8 realizations also switch sign in Y 1 around t = 2.5, which is  . The + 1,min and − 1,max are the largest deviations from the initial branch. (Middle column): Same as the left column, but now for the Y 2 coefficients. The insets show the spatial patterns of v 2 at t = 1, 2.5 and 4 (same color scheme as in Figure 2e). (Right column): Same as the middle column, but now for the Y 3 coefficients and v 3 . In addition in panel (i) four realizations (thin lines) are shown which reach a different branch, the related Y 2 and Y 3 coefficients are also shown in panels (k and l), respectively. The additional insets in panels (a-f) are zoomed-in versions of the Y i deviations. 11 of 20 comparable to the three initial − 1 realizations ( Figure 7b). Nonetheless, accounting for these 8 realizations still results in a factor 5 difference between the two transition probabilities. It is however not very surprising that one finds differences in the transition probabilities. All realizations share the same DO patterns and a particular DO pattern at a given time may favor a specific state, as was demonstrated for the v 2 and v 3 w.r.t. the jet-up and jet-down states ( Figure 6). This also explains why the transition paths are not identical (Figures 7c and 7d).
The DO modes v 2 , v 3 and v 4 oscillate in the higher noise level simulations and this may eventually favor transitions from − 1 to + 1 . Hence, we continued the σ = 15 simulation to t = 10. Realizations which made the transition to the other branch for t = 0-5 are not part of their initial branch at t = 5. Therefore we distinguish the realizations at t = 5 based on their Y 1 (t = 5) sign and followed the same procedure as before. There are 47 transitions (out of 4,840 realizations) from − 1 to + 1 for t = 5-10 (Figure 7b), for + 1 to − 1 there are only 10 transitions (out of 5,160 realizations, Figure 7a). These transitions occur at t ≈ 6 when v 2 is highly symmetric (insets Figures S2h and S3a in Supporting Information S1). The mode v 2 has a similar pattern at t = 6 compared to t = 2.5, so this does not explain why the − 1 to + 1 transitions are favored over the + 1 to − 1 transitions for t = 5-10. However, v 3 switches sign at t = 6 (insets Figure  S2i in Supporting Information S1) compared to t = 2.5 (insets Figures S2c and S2i in Supporting Information S1) and the spatial correlations between v 3 and the states v − and v + also switch sign over time ( Figure S3a in Supporting Information S1). The − 1 to + 1 transitions are favored (suppressed) over the + 1 to − 1 transitions at t ≈ 6 (t ≈ 2.5), as the spatial correlation of v 3 with v − (v + ) is stronger than the spatial correla tion of v 3 with v + (v − ) before the transitions ( Figure S3a in Supporting Information S1). The transition paths are switched in sign for Y 3 in Figure 7f but Figure 6. (a-d) The spatial correlation coefficients r ψ of the streamfunction patterns of the jet-down state (v + (t)) and jet-up state (v − (t)) with v 2 (t) and v 3 (t) for different noise levels. The black curve shows the symmetry of v 2 by correlating the southern part ( 2 ( ), 0.5 ) with the northern part The v + (t) and v − (t) are determined in the same way as in Figure 2. 12 of 20 share similar transition paths as in Figure 7c which is related to the sign difference for v 3 . We also find comparable results between Figures 7e and 7d. This again underlines our earlier point that transitions are dependent on the spatial patterns of the different DO modes and the time-varying DO modes favor particular Y 1 transitions over time.
Another factor which may influence the transition probabilities is the difference in the initial number of realizations starting from the + 1 (5,210) and − 1 (4,790) branch at t = 0. This difference results in slightly different 13 of 20 values of + 1 and − 1 (Figure 2f) and a small asymmetry in the DO mean and other DO modes (Figure 2c). To investigate whether this has an effect on the transitions, we branched off a simulation from the end of the standard case simulation at t = 10 where we fitted the + 1 ( = 10) distribution and we drew 500 positive and 500 negative (by multiplying with a factor −1) from the original + 1 ( = 10) distribution. The other Y i coefficients, DO modes and noise level (σ = 1) remained unaltered at t = 10. We continued the simulation up to t = 15 ( Figure S4 in Supporting Information S1) and then branched off a simulation with higher noise level (σ = 15) and n Y = 10, 000 (copying 10 times the Y i PDFs). Hence, we now have an equal number (5,000) of realizations starting in both Y 1 branches.
The values of + 1 and − 1 are the same and the is completely symmetric. Note that some of the DO modes are flipped (w.r.t. y = 0.5) and/or switched in sign (compare Figure 2 and Figure S4 in Supporting Information S1). We mainly find transitions from − 1 to + 1 between t = 0-10 ( Figure S5 in Supporting Information S1). In this simulation, v 3 does not switch sign (see v 3 (t = 2.5) Figure S2f and v 3 (t = 6) Figure S2l in Supporting Information S1) and not favoring the other transition from + 1 to − 1 . This is also reflected in the spatial correlations ( Figure S3b in Supporting Information S1), where v 3 is stronger correlated with v − than with v + before both transitions at t ≈ 2.5 and t ≈ 6. Stochastic variability is a plausible explanation why v 3 did not switch sign after t > 5, but it is likely that this mode will eventually switch sign for longer integration times and then favoring transitions from + 1 to − 1 . Despite that we mainly find one type of transition ( − 1 to + 1 ), the transition probabilities and paths ( Figure S5 in Supporting Information S1) are comparable to the results above (Figure 7).
We did not show the evolution of v 1 (tripolar mode) for the different noise levels as the pattern of this mode remains fairly constant, even for the higher noise level simulations ( Figure S6 in Supporting Information S1).

Transitions: Different Number of DO Modes
In this section, we deviate from the standard case by increasing the number of DO modes (n V ) while the Reynolds number is not altered (Re = 40). Increasing n V results in a better approximation of the exact solution, but comes at the cost of longer computational time. Each additional DO mode effectively adds more variance to the entire system, note that the variances of higher DO modes are relatively small compared to the first four DO modes. The tri-polar mode (v 1 ) is the dominant DO mode and the spatial pattern is robust for higher n V (see Figures 1a  and 1c). The spatial structures of the higher DO modes are different than the standard case and become more complicated for higher n V . For example, the gyre mode (v 2 ) from the standard case (n V = 4, Figure 2e) is somewhat similar to the sum of v 2 and v 3 for n V = 5 and n V = 10 (Figures 8c and 8f).
The simulations with more DO modes can either be initiated from rest (Figures 8a and 8d) or branched off from the standard case at t = 10 (Figures 8b and 8e). The variances and spatial patterns of the DO modes are similar between the methods, although there are some sign differences in the spatial patterns. We have chosen to branch off the simulations from the standard case, which has two advantages. First, the number of + 1 (521) and − 1 (479) realizations at the end of the standard case is identical as in the standard case and therefore easy for comparison. Second, the computational time is at least a factor three shorter than a full simulation for each n V .
The variances of v 1 and v 2 are similar among the simulations for varying n V while those of v 3 and v 4 strongly increase for higher n V (Figure 8g). Variability in the Y 1 and Y 2 coefficients is comparable to the standard case (Figures 8h and 8i). The Y 1 and Y 2 deviations remain small and we observe no transitions with a noise level of σ = 1 within the chosen simulation time. In the previous subsection we demonstrated that a symmetric gyre mode is key for transitions, but note that the symmetry in v 2 (Figure 8i) is the result of splitting the standard gyre mode pattern into two different modes (i.e. composite of v 2 and v 3 ) for higher n V . Otherwise, we would expect the largest Y 2 deviations for n V = 5, but this is not the case. Similar as was done for n V = 4, we branched off simulations with n Y = 10, 000, Δt = 5 × 10 −4 , t = 5 for n V > 4 and different noise amplitudes. There are no transitions for σ = 5 and σ = 10 before t = 5; a similar result was found for the standard case.
There are Y 1 transitions for σ = 15 for varying n V (Figures 9a and 9b). However, note the higher temporal variability in Y 1 for simulations with more DO modes compared to the standard case (n V = 4). More DO modes increases the variance of the entire system, which results in more temporal variability in Y 1 (as well as for the other Y i coefficients). The relatively high temporal variability in the Y i coefficients results in very noisy transition paths, hence we do not show these results for n V > 4, but there are common factors when Y 1 undergoes a transition. When Y 1 switches sign, the magnitude of Y 2 is relatively strong and the second mode transfers some of its energy to the other modes, this is 14 of 20 comparable to the standard case (Figures 9c and 9d). The spread in the number of transitions is about constant for n V ≥ 6 (see insets Figures 9c and 9d) and together with the variances of the DO modes (Figure 9e) shows the convergence of the solution with n V . The system state is remarkably constant for n V = 5, as there are hardly variations in the Y i coefficients, spatial patterns and energy transfer rates ( Figure S7 in Supporting Information S1). As was demonstrated for the standard case, variations in the spatial patterns influence various DO quantities. Spatial variations can be induced through stochastic noise or when the spatial patterns of two DO modes are inter-exchanged because their variances are very similar. The variances are differently separated when adding one more DO mode to the standard case and adding two or more DO modes result in a similar separation again (Figures 9e and 9f). The variances of the different modes are well separated for n V = 5, which likely explains why there are no Y 1 transitions for n V = 5. Although more DO modes approximates the exact solution better, we find comparable results as in the standard case (except for n V = 5), in that during Y 1 transitions the magnitude of the gyre mode (i.e., Y 2 ) plays an important role.

Transitions: Other Reynolds Numbers
Close to Re = 30, the asymmetric branches which appear from the pitchfork bifurcation in the deterministic case, will come closer together. Close to Re = 52, oscillatory modes which destabilize the mean state at a first Hopf 15 of 20 bifurcation will become more important in the behavior of the flow. Hence, we varied Re between 35 and 50 and used a DO simulation from rest using otherwise standard case parameters (i.e., σ = 1). Although the results slightly vary when adding more DO modes to the system, we set the number of DO modes to n V = 4 for easy comparison with the standard case. For Re ≤ 35 the circulation takes a longer time to equilibrate (compared to the standard case), so we simulated up to t = 20 using Δt = 10 −3 . For Re ≥ 45, the circulation is more variable and therefore we used a finer time step of Δt = 5 × 10 −4 and integrated the model up to t = 15. Figure 10 shows the jet-down state (v + ), the DO modes v 1 and v 2 for Re = 35, 45 and 50 (for full simulation results see Figures S8, S9, S10 in Supporting Information S1). Higher values of Re result in a stronger jet-down state (Figure 10j), which is also reflected in a stronger tri-polar mode (i.e., larger |Y 1 | coefficients, Figure 10k). On the other hand, for lower values of Re, v 1 is weaker and the distance between the two Y 1 branches becomes smaller The mean and 95%-Cl are determined over all the jet-down realizations (blue curve). (k): The mean of |Y 1 | and maximum deviations from the Y 1 mean branches (see also Figure 5) for varying Re. The mean and 95%-Cl of the largest deviations from the Y 1 branches are determined over t − 2 to t. (l) The symmetry of v 2 (similar to Figure 6) and the largest deviations from 2 for varying Re. The mean and 95%-Cl of the correlation coefficient and deviations are determined over t − 2 to t. The noise level is σ = 1 for all simulations.
(|Y 1 |, Figure 10k) and eventually becomes zero when reaching the pitchfork bifurcation at Re ≈ 30. As was shown for the standard case, a relatively weak v 1 (relatively low |Y 1 |) in combination with a (stronger) symmetric v 2 is key to causing transitions. We find a more symmetric v 2 for lower Re, which results in larger deviations in the Y 1 and Y 2 realizations (Figures 10k and 10l). The relatively large deviations in Y 2 for Re = 46 and Re = 47 are the results of a strongly time-varying gyre mode in these simulations. Overall, the symmetry of v 2 is dependent on Re and is not related to a different composite of the gyre mode as this is the case for simulations with n V > 4 (see Figure 8i). This suggests that transition probabilities are dependent on Re, where lower values of Re have higher transition probabilities; however, we find no transitions with the current noise level of σ = 1.
Similar as was done before, we branched off simulations with σ = 10, n Y = 10, 000 and Δt = 5 × 10 −4 for Re = 35, 45 and 50. There are now transitions in Y 1 (Figure 11a) for Re = 35 as the DO mode v 2 is highly symmetric for lower Re (Figure 10l). There are no transitions for higher Re (40, 45 and 50) with the same noise level (Figures 5 and 11). Note that v 2 has switched sign for Re = 35 (Figure 11b) and therefore we show the lowest Y 2 coefficients. The results show again that a strong (i.e., more negative Y 2 for Re = 35) symmetric DO mode v 2 is crucial for having transitions in Y 1 .

Summary and Discussion
Determining transition paths and probabilities of transitions between different states in a multi-stable model described by stochastic partial differential equations is a substantial computational challenge. Traditional Monte-Carlo methods are in most cases too computationally heavy because of the large number of simulations required and the different equilibria (i.e., initial distribution of the ensemble) are not known a priori. Often a long single trajectory is analyzed in a reduced space, such as a two-dimensional data based space of Empirical Orthogonal Functions  (Crommelin, 2003;Tantet et al., 2015). Another approach is through Adaptive Multilevel Splitting techniques, which require adequate score functions to be successful (Baars et al., 2021;Lestang et al., 2018).
In this study we proposed a novel method using the Dynamical Orthogonal (DO) field approach as model reduction technique. First, a matrix formulation of DO was provided which is much more versatile than the original method, as the DO equations do not have to be derived analytically (Sapsis & Lermusiaux, 2009). The version used here is limited to at most quadratic nonlinearities in the equations and additive stochastic forcing. The DO equations are generated numerically once the user provides the right hand side of the discretized stochastic partial differential equations. The dimension of the problem is substantially reduced by the DO method and the lowest number of DO modes which sufficiently captures the dynamics of the solutions can be found by adaptive methods (Sapsis & Lermusiaux, 2012). The matrix-DO formulation is implemented in an openly available Python software package and can be used for various (geophysical) problems. The idea behind the method is that transitions will mainly occur in a subspace where the probabilities of the trajectories is high. Once this subspace is dynamically computed, as in DO, the transition probabilities and paths are determined by just counting trajectories, as in traditional Monte Carlo methods.
As an interesting application of the method, we considered the transitions between a jet-up and jet-down state in a barotropic quasi-geostrophic model of the wind-driven ocean circulation of which the evolution operator is not self-adjoint. Such transitions are thought to be relevant for path changes in ocean western boundary currents, in particular in the Kuroshio Current near Japan. As the DO approach solves the stochastic partial differential equations numerically, the results on the transition probabilities and transition paths are considered as accurate as this numerical approximation of the solution. When configured into a multi-stable deterministic regime, no transitions occur in this model for low noise levels within the chosen simulation time, which suggests that for this case the transition probabilities are (extremely) low.
For sufficiently large noise levels there are transitions and these follow specific transition paths which are determined by the amplitude and asymmetry (with respect to the mid-basin axis) of v 2 . When the noise level is too low, v 2 remains asymmetric and hinders transitions. The transition probabilities are also dependent on the eddy viscosity of the flow (represented through Re), where for higher viscosities, but still in the multi-stable regime, the system is more susceptible of undergoing (then small) transitions because the v 2 DO mode is more symmetric. This seems at first counter intuitive, as the degree of instability is smaller but it can be understood in the framework of Freidlin-Wentzell theory and its extensions (Lucarini & Bódai, 2019, 2020. In the weak-noise limit, the logarithm of the expected escape time is proportional to the quasi-potential height difference, which is lower in the small Reynolds number case. The matrix-DO formulation is a special case of a dynamical low rank and tensor approximation (Feppon & Lermusiaux, 2018;Hesthaven et al., 2022) and such more general approaches may be useful when extending the method under more general nonlinearities. An alternative approach to capture directions of transient instabilities is through the computation of Optimal Time Dependent (OTD) modes (Babaee & Sapsis, 2016). While these instabilities have a finite lifetime, they can be crucially important in altering the system dynamics through the activation of other instabilities or by creating sudden nonlinear energy transfers. For example, the time-dependent basis of OTD modes can capture strongly transient non-normal energy growth. The OTD modes also converge exponentially fast to the eigendirections of the Cauchy-Green tensor associated with the most intense finite-time instabilities (Babaee et al., 2017).
All of these model reduction techniques increase the computational viability of rare event algorithms such as TAMS (Lestang et al., 2018) to study transitions with low transition probabilities in stochastic partial differential equations. Just like in the projected time-stepping method (Baars et al., 2021), where the basis consists of eigenvectors of the covariance matrix at steady state, a "frozen" DO (or OTD) basis can be used instead. This will reduce the TAMS computations enormously as the algorithm only has to deal with a small dimensional subspace when computing trajectories. First tests for the wind-driven ocean circulation model as used in this paper show promising results, but future work is needed to establish the potential of this combined approach.

Data Availability Statement
The matrix-DO Python software is openly available at https://doi.org/10.5281/zenodo.7180807 together with model set-up of the Burgers' equation and the wind-driven ocean circulation model as used in this paper. The data for generating the main result are also provided at https://doi.org/10.5281/zenodo.7180807, the full (raw) data is