Scale-aware deterministic and stochastic parametrizations of eddy-mean flow interaction

The role of mesoscale eddies is crucial for the ocean circulation and its energy budget. The sub-grid scale eddy variability needs to be parametrized in ocean models, even at so-called eddy permitting resolutions. Porta Mana and Zanna (2014) propose an eddy parametrization based on a non-Newtonian stress which depends on the partially resolved scales and their variability. In the present study, we test two versions of the parametrization, one deterministic and one stochastic, at coarse and eddy-permitting resolutions in a double gyre quasi-geostrophic model. The parametrization leads to drastic improvements in the mean state and variability of the ocean state, namely in the jet rectification and the kinetic-energy spectra as a function of wavenumber and frequency for eddy permitting models. The parametrization also appears to have a stabilizing effect on the model, especially the stochastic version. The parametrization possesses attractive features for implementation in global models: very little computational cost, it is flow aware and uses the properties of the underlying flow. The deterministic coefficient is scale-aware, while the stochastic parameter is scaleand flow-aware with dependence on resolution, stratification and wind


Introduction
Ocean mesoscale eddies, with scales of 10-100 km, are turbulent features in the ocean derived from barotropic and baroclinic instabilities, and are strongly influenced by wind forcing and stratification. Eddies play a key role in ocean circulation, including tracer transport, mixing and stirring, and actively participate in energy transfer between scales. The mesoscale eddy energy is particularly enhanced in the vicinity of western boundary currents and their extension (e.g. Gulf Stream and Kuroshio), and in the Southern Ocean. Eddies are crucial in the feedback of energy to the large-scale flow (e.g., Scott and Arbic, 2007 ) and in maintaining the jet extension via upgradient momentum fluxes leading to sharpening of gradients ( Greatbatch et al., 2010 ).
Climate models from the Coupled Model Intercomparison Project (CMIP) archive ( Taylor et al., 2012 ) used for the last Intergovernmental Panel on Climate Change ( IPCC, 2013 ) have too coarse horizontal resolution to resolve these eddies. The effect of eddies on the large scale is parametrized in such coarse resolution models using the Gent-McWilliams parametrization ( Gent and achieved using Laplacian viscosity (or diffusion) with too large coefficients, or using hyperviscous parametrization ( Holloway, 1992;Frisch et al., 2008 ) or biharmonic closure ( Smagorinsky, 1963;Leith, 1990;Griffies and Hallberg, 20 0 0 ) which dissipates enstrophy at the grid scale near the deformation radius and scales with model resolution. However, recent studies have shown that hyperviscosity, in addition to representing a direct enstrophy cascade ( Bachman et al., 2017 ), spuriously dissipates energy at small scales Jansen and Held, 2014 ). Parametrization of sub-grid scale eddies for eddy-permitting regimes are therefore needed to either correct the spurious loss of energy resulting from the use of hyperviscosity (including modified hyperviscosity; Fox-Kemper and Menemenlis, 2008 ), or to replace hyperviscosity altogether. The aim of our paper is to introduce an eddy parametrization, derived for eddy-permitting models, that makes use of the resolved variability, mimics the behaviour of Reynolds stresses such as sharpening ocean jets, scales with resolution and the flow, and feeds back energy lost due to viscosity. Jansen and Held (2014) propose to re-inject the energy lost at small scales using a negative viscosity determined by an energy equation following Eden (2010) . Filtering of the velocities, as done for example in the Lagrangian-averaged Navier-Stokes-α model ( Holm and Wingate, 2005;Holm and Nadiga, 2003 ), or the nonlinear gradient approximation ( Nadiga and Bouchet, 2011 ) have shown promising results (see PMZ14 and Anstey and Zanna (2017) for comparisons between our proposed schemes and these studies). However, recent studies ( Graham and Ringler, 2013 ) highlighted that these parametrizations can lead to a build-up of enstrophy at small scales and to numerical instability. Other approaches at eddy-permitting resolutions have argued for the use of a stochastic term for upgradient momentum fluxes and energy backscatter in spectral models ( Kraichnan, 1976;Frederiksen and Davies, 1997;Duan and Nadiga, 20 07;Nadiga, 20 08;Kitsios et al., 2012;Grooms and Majda, 2013 ). The sub-grid forcing is generally constrained by an energy spectrum. In quasi-geostrophic models the need for upgradient momentum closures based on a stochastic model was also pointed out ( Berloff, 2005b( Berloff, , 2015( Berloff, , 2016. However, all approaches require some a priori knowledge of sub-grid eddy statistics. Here we implement a parametrization proposed by Porta Mana and Zanna (2014 , referred to as PMZ14). In PMZ14 we diagnosed a relationship between the missing eddy forcing and a non-Newtonian stress divergence ( Ericksen, 1956;Rivlin, 1957 ). The missing forcing is defined as the PV eddy flux divergence resulting from a high-resolution eddy resolving model compared to an eddy permitting model. The non-Newtonian stress divergence depends on the Lagrangian rate of change of the potential vorticity (PV) gradient and its local deformation. The relationship between the missing eddy forcing and a non-Newtonian stress divergence was inspired by general principles of potential vorticity conservation, frame-invariance, differential memory ( Truesdell and Noll, 2004 ) and symmetry properties of the stress tensor ( Bachman and Fox-Kemper, 2013 ). The relationship, more intuitively, is based on an argument that in eddy-permitting models the rate of strain, eddy shape and orientation, and the PV gradient can be used to mimic the evolution of the eddy PV forcing ( Nadiga, 2008;Anstey and Zanna, 2017 ). The work argued that the parametrization could be efficient in a deterministic mode, with a coefficient for the parametrization that scales with model resolution. In addition, a stochastic parametrization was also presented, with a stochastic forcing term whose probability is conditional on the non-Newtonian forcing, wind forcing, stratification, and model resolution.
This paper is structured as follows. In Section 2 we briefly present the quasi-geostrophic model used in the current study. In Section 3 we discuss two implementations of the parametrization, one deterministic and one stochastic. In Section 4 we present the results of the two different implementations for the mean flow and variability. Section 5 is a discussion of the impact of the parametrized forcing on the momentum, energy and enstrophy budgets and presents ways forward for implementation in primitive-equations models. We briefly conclude in Section 6 .

Model setup
The model used in the present study, PEQUOD, solves the forced dissipative baroclinic quasi-geostrophic (QG) potential vorticity (PV) equation on a beta plane in a square basin (e.g., Berloff, 20 05a, 20 05b ). The main setup is similar to the one used in Porta Mana and Zanna (2014 , PMZ14). The model is composed of three isopycnal layers with thicknesses H m (with m = 1 , 2 , 3 for the upper, middle and bottom layer, respectively). For each layer m , the prognostic equation solved for the potential vorticity q is given by The planetary vorticity is f = f 0 + βy , ∇ = ( ∂ ∂x , ∂ ∂y ) is the horizontal gradient, N is the Brunt-Väisälä frequency of the mean density stratification and ψ is the streamfunction derived from the non-divergent velocity such that u m = (− ∂ψ m ∂y , ∂ψ m ∂x ) . The dissipation term is D m = −r∇ 2 ψδ m, 3 − ν∇ 6 ψ m , where δ m, i is the Kronecker delta function. The first term parametrizes the presence of a bottom Ekman layer with a bottom drag coefficient r . The second term is a horizontal biharmonic viscosity term, with viscosity coefficient ν, which scale-selectively dissipates enstrophy near the grid-scale. Note that PMZ14 used a Laplacian viscosity term rather than a biharmonic term for high-resolution and eddy-permitting runs (but not for coarse resolution runs). The use of hyperviscosity in the present study is to allow a setup that mimics current eddy-permitting ocean model setups, and ensures small-scale dissipation and numerical stability (we struggled to keep the model stable when using the deterministic parametrization; see Section 3 ). The hyperviscous term was calculated at the previous timestep for practical reasons (see Section 3 b).
The forcing F m , applied to the upper layer, is the curl of the wind stress τ : where ρ 0 is the reference density. The wind stress curl profile is identical to PMZ14 and spins up two gyres separated by a strong meandering jet emanating from the western boundary. The term F eddy m is the eddy parametrization, which can take a deterministic or a stochastic form. We use different model configurations defined as follows: i. The "truth": a high-resolution run with 7.5 km horizontal resolution. The eddy forcing term F eddy m , in Eq. (1) , is set to 0. ii. Low-resolution unparametrized runs: runs at eddy-permitting resolution with horizontal grid-spacing of 30 km and 60 km; and a coarse-resolution run at resolution of 120 km. No parametrization of eddy forcing is included, i.e. F eddy m is again set to 0. iii. Low-resolution parametrized runs: Same as in (2) has a spatial and temporal dependence on the flow, which can be deterministic or stochastic as defined in Section 3 .  The length of the integrations from rest to statistically steady state is 410 years. All simulations presented are numerically converged and are solved using centred-leapfrog with RAW filter ( Williams, 2009 ) and a modified Arakawa advective scheme ( Arakawa, 1966 ). Additional experiments using different numerical schemes led to similar results to the ones presented in the following sections. We output the data daily (as snapshots), with all steady states and variance calculations done using the last 200 years of model integration. The common model parameters for all runs are defined in Table 1 while the differing model parameters for various resolutions are listed in Table 2 . Fig. 1 shows the statistically steady streamfunction ψ for the high resolution truth (panel a) and the 30 km-resolution unparametrized model (panel b). A few differences between the runs include the absence of a strong and narrow eastward jet at low resolution compared to the truth; and a too far south separation point at the western boundary. The eddy-permitting simulations at 30 km and 60 km (the latter not shown) generate some eddies via barotropic and baroclinic instabilities but with fewer filaments and weaker turbulence compared to the eddy resolving run (cf Fig. 2 a and b). The potential vorticity snapshot at low resolution also hints at some numerical instability near the jet, due to the appearance of sharp features of alternating sign with spatial scale of the model gridscale (30 km). The differences in the simulations arise as a result of the small Reynolds number at low resolution, due to the increased horizontal grid box size and viscosity.

Parametrization of sub-grid mesoscale eddies
In this section we introduce the parametrization proposed by PMZ14 and discuss the deterministic and stochastic implementations in the baroclinic QG model of Section 2 . In the remainder of the paper we omit the layer subscript m for conciseness.

PMZ14 parametrization
In PMZ14 we postulate that the divergence of the sub-grid eddy PV stress in QG ( S * ) reflecting the missing forcing due to truncated nonlinear advection and increased dissipation between the truth and a low-resolution run can be well approximated by where κ is a scalar, independent of space or time. The value of κ is estimated by coarse-graining high-resolution simulations onto a coarse resolution grid. Unlike common closures, the parameter κ is not a diffusivity or viscosity coefficient, as it has units of length squared. As discussed in PMZ14, the proposed closure, obtained by imposing several mathematical and physical constraints such as frame-invariance and memory, can be expressed as a non-Newtonian second order Rivlin-Ericksen-like stress ( Ericksen, 1956;Rivlin, 1957 ) using where ∇u is a rank-2 tensor (i.e., a matrix) given by Qualitatively, the parametrization can amplify or weaken existing gradients of PV. The parametrization is mainly applicable at eddypermitting resolution in which instabilities can create meanders and deformations of the flow field ( Fig. 2 b) which are required for the parametrization to perform most effectively.
The robustness of the relationship between the diagnosed eddy forcing ( S * ) and the proposed parametrization is thoroughly quantified in PMZ14 by constructing probability distribution functions (PDFs) and conditional probability distribution functions (cPDFs) 1 . The diagnosed eddy forcing PDFs conditional on the expression (4) are used to assess the degree to which the "true" eddy forcing correlates with ∇ 2 D q D t for each layer. PMZ14 find, using the conditional PDFs P (S * | ∇ 2 D q D t ) , a very strong linear correlation between the two quantities, such that the mean of the PDFs of the eddy forcing term conditional on ∇ 2 D q D t can be used as a deterministic parametrization using a constant coefficient κ, where κ is negative with no spatio-temporal dependence. 2 The correlation can exhibit large deviations from its mean for large values of eddy PV forcing (i.e., the PDFs are not delta functions). Excursions in the eddy forcing can reach more than 40% of its mean value. A single constant coefficient, or a single value of the parametrized eddy forcing, therefore cannot adequately represent the behaviour of highly turbulent eddying regions.
Large values of eddy forcing parametrization, ∇ 2 D q D t , describe large growth rate of nonlinear instabilities developing in the flow. Such nonlinear processes can exhibit large fluctuations which are reflected in skewed PDFs, as found in PMZ14. Therefore a statistical (stochastic) parametrization of ocean eddies might be more appropriate when attempting to capture the effect of eddies in highly turbulent jet regions. In the following two sections we describe the implementation of a deterministic and stochastic version of the eddy parametrization in the QG model.

Deterministic parametrization
Assuming that a constant (and negative) value for κ can be used in (4) , the QG model equation with the deterministic eddy parametrization is given by The parametrized equation can be rewritten as Using coarse-grained high-resolution experiments, PMZ14 observe that the value of κ, which was found to be negative, only scales with the grid size of the coarse-resolution model. The operator acting on the PV tendency and advective terms, 1 − κ∇ 2 , behaves as a roughener of total forcing in the PV equation. This operator could be viewed as a filter acting to replace the scales of motion that are truncated by the coarse grid scale of the model, with the strength of the filter being determined by the value of κ.
Due to the singularity present at wavenumbers K 2 = −1 /κ, inverting the operator 1 − κ∇ 2 t o solv e Eq. (9) is not feasible. Instead, we numerically solve Eq. (7) by evaluating κ∇ 2 D q D t at the previous timestep, so that the parametrization is implemented as where t is the model timestep increment (the hyperviscous term is also commonly calculated at the previous time step for numerical stability) and other terms are calculated at time t . The implementation of the deterministic parametrization comes with the same computational cost as biharmonic viscosity. The implementation only requires saving the Lagrangian (material) derivative of potential vorticity at the previous timestep and calculating its Laplacian.
Using a large range of numerical simulations under different forcing and dissipation and different geometric configurations, PMZ14 estimate the mean κ value as −( α x ) 2 , where α ≈ 0.45. To avoid the singularity present at wavenumbers K 2 ≈ −1 / ( α x ) 2 , we adjust the value of κ such that | κ| = ( 0 . 31 x ) 2 in order to ensure numerical stability (see Section 5.1 for further discussion).

Stochastic parametrization
Even if the deterministic parametrization shows a useful relationship between the eddy properties and the large scale flow, it remains difficult to predict the behaviour of turbulent flow with a single deterministic relationship based on the resolved scales. The turbulent nature of the flow field suggests the use of a nondeterministic closure to describe the eddy-mean flow interaction. PMZ14 showed that probability distribution functions of the eddy forcing conditional on the large scale flow, ∇ 2 D q D t , can be reconstructed given solely information from the coarse-resolution QG model. One main characteristic is that the standard deviation and the higher-order moments of the PDF increase with the Reynolds number and decrease with the coarse-graining grid box size.
In the eddy-permitting range considered here, the standard deviation can be expressed to a very good approximation as The standard deviation scales linearly with the wind stress and exhibits an inverse dependence on the resolution. The standard deviation also scales linearly with the PV variance, such that with γ q = 1 . 7 (also non-dimensional) and is valid whether we consider wind or buoyancy forcing (see PMZ14 for sensitivity to forcing, resolution and model configuration). Note that using coarse-graining diagnostics, we find that the maximum value of σ is always proportional to the mean value of the eddy forcing, with a scale-independent proportionality constant, such that σ ∝ x 2 (not shown). Therefore, as x goes to zero, the mean and standard deviations also tend to zero. The standardized skewness and kurtosis (i.e., skewness and kurtosis scaled by the standard deviation) were shown to be roughly independent of model parameters and found to be O (1), with values μ 3 ≈ 0.61 and μ 4 ≈ 1.4, respectively. The resolved quantities therefore fully determine the standard deviation, skewness, and kurtosis of the PDFs.
For each run, we use the relationships found in PMZ14 and described above to reconstruct the PDFs, based on a maximum entropy procedure ( Mead and Papanicolaou, 1984 ), and use them as the basis for a stochastic parametrization. The implementation of the stochastic parametrization can thus be written as where the stochastic term F is sampled according to the reconstructed conditional PDFs, with spatial and temporal scales, x stoc and t stoc , respectively. The scales determine intervals at which the stochastic term F is selected before being recalculated. In all the runs presented here, we diagnose the value of at every grid box ( x stoc = x ), then sample F from the corresponding conditional PDFs P (S * | ∇ 2 D q D t ) ; the value of F is kept for t stoc = 1 day. The timescale is chosen by using the coarse-grained information which showed a decorrelation time-scale of about 1 day for the eddy forcing term. The results deteriorate if the timescale is too short (on the order of a few timesteps), which turns the parametrization into a simple AR1 process and is not an accurate representation of the eddy forcing (see Berloff, 2005b ). It is important to reiterate that the stochastic parametrization is not equivalent to simply adding random fluctuations: the stochastic term is sampled from a PDFs given a value of the local ; it is therefore strongly dependent on the resolved flow field. For this implementation there is no rescaling of the mean or any other moments of the PDFs. All results presented in the paper are using the above implementation. The computational cost associated with this stochastic parametrization is about 10% more compared to the deterministic parametrization. The cost can be reduced by changing the spatial and temporal decorrelation of the stochastic term ( x stoc and t stoc , respectively) or by slightly changing the implementation. A slightly faster implementation can be introduced. The following implementation, similarly to the deterministic runs, requires tuning of the mean of the PDFs to ensure numerical stability by avoiding the singularity discussed in Section 3.2 . This additional implementation is briefly outlined below, however readers can skip directly to Section 4 for the results of the simulations described above. We can decompose the eddy parametrization into a deterministic and stochastic term such that The stochastic term is determined as follows. The standard deviation σ of the cPDFs scales with ∇ 2 D q D t (t − t ) , and skewness and kurtosis of the cPDFs are standardized moments such that the shape of the cPDFs is selfsimilar. We can therefore select a value s * from a single rescaled conditional PDF with zero mean, standard deviation of one, and constant moments μ 3 , μ 4 such that P (s * | μ = 0 , σ = 1 , | μ 3 | , μ 4 ) . The rescaled cPDF is independent of ∇ 2 D q D t . We can then rescale With this implementation, there is no need to first check the value of ∇ 2 D q D t (t − t ) and then choose from a specific conditional PDF. The results of this stochastic implementation are slightly closer to the deterministic simulations than to the stochastic simulations described using Eq. (11) , possibly due to the reduction in κ necessary for numerical stability.

Results
In this section, we analyse and compare the results of deterministic and stochastic implementations at different horizontal resolutions, with a focus on runs performed at 30 km resolution. Fig. 1 shows the statistically steady streamfunction ψ for the high resolution truth (7.5 km horizontal resolution) and for the unparametrized and parametrized eddy permitting runs at 30 km horizontal resolution. The parametrized runs show clear improvement in the steady state streamfunction for the deterministic ( Fig. 1 c) and stochastic ( Fig. 1 d) implementations, compared to the unparametrized run at the same resolution ( Fig. 1 b). The steady state streamfunction for the parametrized simulations are closer to the truth ( Fig. 1 a) than the unparametrized run. The strength of the subtropical gyre is marginally increased in the parametrized runs to 14.71 Sv and 15.22 Sv, compared to 14.48 Sv in the unparametrized run and 15.54 Sv in the high resolution run. The strength of the subpolar gyre is significantly improved from −17.40 Sv in the unparametrized run, to −18.92 Sv and -21.6 Sv in deterministic and stochastic runs, compared to −24.75 Sv in the high resolution run. Another feature is the improvement of the separation point of the jet in the parametrized runs. In the unparametrized run, the jet separates from the western boundary 300 km further south than in the high resolution. In the deterministic and the stochastic runs, the separation point is located only 210 km and 60 km south of the separation point in the high resolution run, respectively.
Snapshots of potential vorticity for the different model runs, taken after 250 years of model run, are shown in Fig. 2 , only for illustrative purposes. The unparametrized run shows the presence of weak turbulence and also some grid scale numerical noise (panel b). However the high resolution (panel a) and parametrized runs (panels c and d) exhibit a lot more turbulence and filamentation. The introduction of the parametrization results in the enhancement of gradient of potential vorticity but also attenuates the numerical noise present in the unparametrized simulation (see Section 5 for discussion). Fig. 3 shows the steady state kinetic energy as a function of latitude at x = 120km and x = 300km , further highlighting the presence of a strong jet detaching from the western boundary in the high-resolution simulation (black). The zonal eddy momentum stress divergence, ∇ · u u , is shown in panels (c) and (d) for the same longitudes as panels (a) and (b), where primes are anomalies from the time mean denoted by an overline ( Waterman and Jayne, 2012 ). In the unparametrized simulation (blue), the jet is weaker and broader than in the high resolution simulation. The deterministic (red) and stochastic (grey) parametrizations lead to a sharpening of the jet and an increase in the zonal velocity (hence in the kinetic energy) in the core of the jet. The jet sharpening is a direct consequence of the parametrization, which enhances the eddy momentum stress divergence. Another notable impact of the parametrization is to inject energy back into the large-scale flow, as shown in Fig. 4 . The introduction of the eddy closure vastly improves the spectrum of kinetic energy over all wavenumbers by reducing the amount of energy dissipated at small scales and allowing energy spuriously lost to be backscattered.
This can further be explored by considering the turbulent energy budget Larichev and Held, 1995;Scott and Arbic, 2007;Straub and Nadiga, 2014 ). As shown previously by these authors, the spectral budget for total energy in the QG model can be expressed as the sum of the contribution in Fourier space from forcing, bottom friction, hyperviscosity and from the redistribution of energy across scales from the spectral transfer of kinetic and available potential energy (APE). Fig. 5 shows the statistically steady state budget of the different model simulations. For the high-resolution run at a resolution of 7.5 km (panel a), the spectral characteristics are reminiscent of other QG and primitiveequation models, and of baroclinic turbulence ( Charney, 1971;Rhines, 1977;Salmon, 1978 ). Forcing and dissipation mostly balance each other, and the spectral transfer of kinetic and APE summing up to zero for all wavenumbers. The transfer of kinetic energy has a large negative lobe (sink of kinetic energy) at higher wavenumbers, and a large positive lobe (source of KE) at smaller wavenumbers. Integrating this transfer (starting from large wavenumbers) would produce a single, large negative lobe representing a net inverse transfer of energy to larger scales. APE is extracted at large scales and transferred down-scale toward the deformation scale. At the deformation scale, energy is converted to kinetic energy and this kinetic energy is transferred to large scales. Our parametrization is attempting to reproduce this energetic behaviour. There is no transfer of kinetic energy toward small scale, therefore as expected the term due to hyperviscosity is very small at a resolution of 7.5 km.
At a resolution of 30 km ( Fig. 5 b), without an eddy parametrization, our results are similar to Scott and Arbic (2007) ; Hallberg (2013) ; Jansen and Held (2014) with energy being lost near the grid scale (roughly the deformation scale) due to hyperviscosity, and therefore only part of the energy being fluxed towards small scale as APE is then fluxed upscale as kinetic energy. Both the sink of kinetic energy at higher wavenumbers and the source at lower wavenumbers are significantly smaller compared to the high-resolution case, thus leading to a reduced inverse transfer of kinetic energy to larger scales. As a consequence, there is less kinetic energy at large scales and a reduction of the APE being extracted. When the deterministic parametrization is introduced ( Fig. 5 c), energy is returned mostly at scales only somewhat larger than the deformation wavenumber as seeing by the increase in the size of both the positive and negative lobes in spectral transfer. The most dramatic improvements are when the stochastic parametrization is introduced ( Fig. 5 d), a larger portion of kinetic energy is being fluxed back up to the larger scales and more closely matches the kinetic energy transfer of the high-resolution simulation. These results are is in agreement with our other diagnostics.  c and d with b). The deterministic parametrization leads to the creation of a jet extension, thereby reducing the error in the streamfunction away from the recirculation region. However, the introduction of the stochasticity vastly improves the recirculation gyres and leads to an error of less than 1 Sv at any location in the basin. At eddy permitting resolutions (30 and 60 km), the deterministic version of the parametrization leads to about 50% error reduction, and the stochastic component to a further 20 to 40%. At coarser resolution (120 km), there is a very small improvement in the mean flow when the deterministic parametrization is implemented, and there is no added value to the implementation of stochastic parametrization when considering the mean flow error.
Despite the limited number of tests at different resolutions, it appears that there is a cut-off at which stochasticity impacts upon the mean flow compared to the deterministic parametrization, which is roughly equal to the Rossby radius of deformation. Further tests will be required to validate the critical or cut-off resolution. Additionally, the reduction in mean flow error at coarse resolution (120 km) is rather small which is in agreement with our initial assumption that the parametrization will be successful at eddy-permitting resolution by reinforcing the existing gradients.
Figs. 7 and 8 show that not only the mean flow is improved but also its spatial and temporal variability. The absence of a meandering jet in the low resolution run without any eddy closure is reflected by the loss of variance in the centre of the domain ( Fig. 7 ). The deterministic parametrization is shown to improve the variance significantly, especially at low frequency ( Fig. 8 ). The stochastic parametrization has a positive impact at high-frequency, as expected by the introduction of random fluctuations, but also at low frequency. Therefore the high-frequency variability of mesoscale eddies can have a significant impact on modulating low frequency variability ( Berloff et al., 2007 ).

Fig. 5.
Spectral budget of total energy as a function of horizontal wavenumbers K for the following runs with horizontal resolution: (a) 7.5 km, (b) 30 km unparametrized, (c) 30 km with deterministic parametrization, (d) 30 km with stochastic parametrization. The different terms in the spectral energy budget are due to forcing (blue), dissipation by bottom friction (red), hyperviscosity (green), and spectral transfer of kinetic (violet) and available potential energy (yellow). All terms are normalised by the total energy and by dK / K . The black vertical lines denote the deformation scale. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 6.
Mean Error relative to the high resolution truth: (a) Root-mean-square error, defined as the L2 norm of the low resolution runs with respect to the high resolution truth, as a function of resolution; Difference between the time-mean streamfunction of the high-resolution simulation and the 30km low-resolution (b) unparametrized, (c) with deterministic parametrization, (d) with stochastic parametrization. Fig. 9 shows the error in variance for the different runs: deterministic, and stochastic with different moments for the reconstruction of the cPDFs. The ability of the stochastic parametrization to reinforce or introduce variance is clear at all resolutions, even at very coarse resolution. There is some benefit to include non-zero skewness and kurtosis of the cPDFs. For the run at 120 km horizontal resolution, the injection of non-Gaussian stochastic forcing leads to a non-zero contribution to the variance. The deterministic parametrization, on the other hand, increases the variance of the eddy permitting runs only, with no visible impact for the coarse resolution run.

Discussion
In the previous section, we showed that the parametrization improves the energy spectrum as a function of frequency and wavenumber, and the jet strength and its separation point, while keeping the model numerically stable. We now discuss several aspects of vorticity, energy and enstrophy budget to provide some insights concerning the results described above. We also propose an implementation for a primitive equation model based on our results. Most aspects are discussed in the context of the deterministic parametrization but the stochasticity is discussed as well since its inclusion is beneficial to the representation of the mean and variance.

Vorticity forcing and stability criteria
Consider the parametrized potential vorticity equation in spectral space by taking the Fourier Transform of ( Eq. (7) ), we obtain (similarly to Eq. (12) in PMZ14) where κ < 0, i.e. κ = −| κ| ; f (K , t ) is the spatial Fourier transform of f ( x , t ), x is the position vector, K = (k, l) is the total 2D wavenumber with modulus K = | K | = k 2 + l 2 , and ∇ f = i K f . The argument ( K , t ) is omitted for convenience. As the total wavenumber K increases (i.e., the wavelength decreases), the amplitude of the Fourier transform of PV forcing increases (see Eq. (12) and Fig.  9 of PMZ14).
The value of κ in the simulations required some tuning to keep the model numerically stable and this is in part a direct consequence of Eq. (12) . The amplitude of the Fourier Transform of PV forcing has a singularity at K 2 = 1 | κ| , for a length scale of | κ| . However, the maximum wavenumber the model can resolve is equal to K max = 2 π / 2 x = π / x . Therefore avoiding the singularity requires that 1 − | κ| K 2 max > 0 , which using | κ| = (α x ) 2 leads to the condition (α x ) 2 < K 2 max = x 2 /π 2 . The stability condition is thus α < 1 /π = 0 . 318 , justifying the choice of α = 0 . 31 for numerical stability (rather than the diagnosed value of roughly 0.45 from PMZ14).
The tuning of κ is also sensitive to the sub-grid viscous dissipation, and the likely build-up of small scale enstrophy (see below). For a Laplacian viscous term (as used in PMZ14), the maximum possible values of κ for the deterministic and stochastic implementations are κ = −(0 . 27 x ) 2 and κ = −(0 . 29 x ) 2 , respectively. One can increase κ if the viscosity coefficient is also increased, but the influence of the parametrization is then damped by the dissipation, making it therefore inefficient to represent up-gradient momentum fluxes. Numerical stability is improved by raising the power of the Laplacian operator, which increases the scale-selective behaviour of the dissipative term, allowing for a stronger forcing with larger values of | κ|. For the stochastic implementations, the effective value of parametrization (henceforth of | κ|) is always larger than that of the deterministic case. This reflects the fact that stochastic fluctuations can have a stabilizing effect on the model solution. As pointed out in Palmer (2012) , the tuning of a stochastic parametrization should not be done starting from the deterministic version.

Relationship of PMZ14 to hyperviscosity
We can approximate the effective viscosity term, i.e. the combination of the biharmonic viscosity and the RE parametrization in Eq. (12) as using a Taylor expansion 1 / (1 − x ) = 1 + x + x 2 + x 3 + · · · , valid for | x | < 1. The subgrid effective dissipation differs in its dependence on K from that in the unparametrized equation. The infinite series for the effective viscosity corresponds to a dissipative behaviour on the Fourier amplitude of D q /D t , reminiscent of hyperdiffusive closures (e.g. ν∇ 6 ψ, ν∇ 8 ψ ...), hence removing energy and enstrophy at small scales. Recall from Eq. (12) that avoiding the singularity requires the value of κ to be chosen so that | κ| K 2 < 1 for any wavenumber K resolved by the model. We therefore have (| κ| K 2 ) n < 1 for n ≥ 0, and hence the magnitude of the terms in the series of Eq. (13) decreases with increasing n . The effective viscosity in spectral space thus corresponds to a series of hyperviscous terms of increasing order, where the first and largest term is the explicitly specified hyperviscosity (as defined in Section 2 ) and the subsequent terms could be characterized as higher-order hyperviscous corrections.
Hyperviscous closures are motivated by the desire to enhance dissipation at very small scales while retaining as much resolved turbulence as possible in the slightly larger scales, i.e. to create a sharper transition between regions of spectral space in which advective and dissipative behaviour dominates ( Holloway, 1992 ). The addition of higher-order hyperviscous corrections (in spectral space) should further enhance the dissipation at the smallest resolved scales, with increasingly larger effect as the scale decreases. The effect of the parametrization can be interpreted as modifying the scale-selectivity of the dissipative term. Unlike hyperviscosity, the parametrization ( Eq. (5) ) involves nonlinear terms and a temporal derivative. Another interpretation, given in PMZ14 considering the terms in Eq. (5) , is to view the parametrization as acting to enhance or weakens flow parcels depending on their "history".
As discussed in Section 3 , a practical workaround to implement the parametrization is to use D q D t at the previous timestep, so that the timestepping can remain explicit. The parametrized model equation (1) , can be written as Applying (14) recursively we obtain where for n = 0 we set ( κ∇ 2 ) 0 ≡ 1. The total forcing of potential vorticity at time t involves successive applications of the operator κ∇ 2 to the wind forcing F wind and dissipation D (as defined in Section 2 ) at previous times.
We can neglect terms involving ∇ 2 n F wind , for n ≥ 1, assuming that the wind forcing is large-scale and constant in time, so that such higher order Laplacians are increasingly small. The evolution Eq. (15) can then be written as where the bottom drag contribution to D has been ignored, and the last line uses the fact that κ < 0. In spectral space we then have D q D t Since t = 0, the relative signs of ψ (t) , ψ (t − t ) , ψ (t − 2 t ) ...
can differ, and the term need not be strictly dissipative. This contrasts with the behaviour expected when D q /D t is evaluated at the present timestep ( t = 0 ), as the different terms would act as hyperviscosity (see Eq. (13) ). However, ψ (t) in most cases would have the same sign as ψ (t − t ) , assuming an AR1 process with memory of order the eddy timescale, such that Eq. (17) does behave as a dissipative term. An infinite series of terms is not warranted for implementation in a numerical model nor might be valid for all scales relevant here (indeed, our attempts of using a selected number of terms showed that the simulations did not converge). However, the series did provide some insights. We interpret the infinite series of hyperviscous terms as a change in the scale-selective dissipation of the flow, modifying the overall dissipation and the interaction between scales. However, improvements in the simulation are not solely due to changes in the overall dissipation, but also via an upscale energy transfer.

Energy and enstrophy
Since the energy is spuriously being dissipated at small-scales in non-eddy resolving models, inhibiting energy backscatter from small to large scales ( Jansen and Held, 2014 ), this energy needs to be reinjected on average by our parametrization. The energy tendency due to the parametrization is given by where E is the sum of the kinetic energy, 1 2 (∇ψ ) 2 , and APE f 2 2 N 2 (∂ z ψ ) 2 for all three layers. The energy tendency due to the parametrization is then the product of ψ and the potential vorticity eddy forcing, | κ|∇ 2 D q D t . The nonlinear energy tendency term due to the parametrization can be further interpreted as follows.
Suppose that q following a parcel is increasing, D q D t > 0 , and that ∇ 2 D q D t < 0 . If this occurs on a cyclonic eddy, i.e. ψ < 0, then ψ∇ 2 D q D t > 0 . The parametrized energy term | κ| ψ∇ 2 D q D t is thus positive, yielding an energy source. Correspondingly, ∇ 2 D q D t > 0 and ψ > 0 yield an energy source for an anticyclonic eddy. This suggests that the effect of the closure is to increase the kinetic energy of the flow when the potential vorticity tendency on a parcel has the same sign as the relative vorticity of the parcel.
Assuming that the effect of the eddies is small compared to the wind forcing and dissipation, leading to D q D t ≈ F wind + D, the effect of the eddies will be to amplify the energy input of the net forcing, i.e. the residual of the wind forcing plus the dissipation (with no effect if these are zero). Likewise, if ∇ 2 D q D t has opposite sign to the vorticity, there will be an energy sink, suggesting the parameterisation should act to amplify the dissipation of energy. Our parametrization therefore requires some input of energy from the wind (or buoyancy) forcing, in addition to the presence of small scale deformation (as discussed earlier). The input of energy will then modify the velocity, and the nonlinear advective term, leading to an indirect energy transfer between scales. We argued above that the effect of the closure on the dissipation resembles the effects of a hyperdiffusion on q . This suggests that it is dissipative for q under certain assumptions. The fact that it can behave as an energy source, but also dissipate enstrophy, is the desired qualitative behaviour for a parametrization of QG turbulence ( Charney, 1971 ). To further understand the contribution of the parametrization to the enstrophy budget, consider the enstrophy tendency due to the parametrization: where G = 1 2 q 2 is the enstrophy. Using the identity the enstrophy tendency (19) can be expressed as The first term on the right hand side of Eq. (19) does not contribute to the global enstrophy budget. The second term can act as a sink or source of enstrophy depending on the relative orientation of ∇ D q D t with ∇q . In a location where D q D t is tending to amplify q anomalies, ∇ D q D t will tend to point in a similar direction as ∇q so that ∇ D q D t · ∇q > 0 , yielding an enstrophy source, and similarly an enstrophy sink if the tendency is dissipative, meaning that D q D t is damping the q anomalies. Hence κ < 0 is required for selfconsistency, and the parametrization can provide both a source and sink of enstrophy. This demonstrates that both locally and globally enstrophy can build up in the model, further affecting the numerical stability of the model. We earlier highlighted that some tuning of κ was necessary to maintain numerical stability and that the value obtained (and thereby our results) were sensitive to the viscous term used ( ∇ 4 ψ vs. ∇ 6 ψ).
The improvement seen in the numerical simulations presented in this work are therefore due to a careful combination of the sub-grid eddy dissipation (here ν∇ 6 ψ) and the eddy parametrization ( κ∇ 2 D q D t ). For the parametrization to be effective, we must ensure that the build-up of enstrophy at the grid-scale does not overwhelm the solution and that the nonlinear upscale energy transfer is not inhibited by the viscous terms.
The stochastic backscatter is more effective than the deterministic nonlinear backscatter. It is difficult to disentangle the reasons for the differences. However, we hypothesize that it is the result of a combination of several factors. As described previously, the noise has a stabilizing impact on the model simulations.
The nonlinearity of the model equation might be pushing the stochastic model into a new (and more stable) regime that the unparametrized or parametrized deterministic models cannot attain. This can be a consequence of the sampling of the tails of the PDFs of eddy forcing (with non-zero kurtosis), which the deterministic parametrization cannot sample adequately. The infrequent but extreme values of eddy forcing can have a significant impact on the scale interaction in turbulent flows, as illustrated here. Furthermore, the probabilistic forcing might avoid the frequent forcing at the singularity, stabilizing the model.

Implementation into a primitive equation model
The implementation of our parametrization is tied to a QG potential vorticity equation. Current ocean climate models do carry momentum, temperature and salinity as prognostic variables but not potential vorticity which is only diagnosed. We propose two possible ways forward to implement the parametrization into a primitive equation model. The proposed implementations below are defined by imposing a total QG potential vorticity forcing as given by Eq. (4) . Each implementation provides advantages but also caveats.
Two-dimensional implementation: The parametrization can be implemented into the momentum equations, without adding a contribution into the buoyancy equation, if we assume that the eddy forcing is purely 2D. This does not mean that the buoyancy forcing is neglected but rather that it projects directly on the momentum. The u and v momentum tendencies due to the eddy parametrization can be expressed as follows where q is the QG potential vorticity as defined in Eq.
(2) . The curl of the momentum tendency then leads to the following vorticity forcing The newly calculated velocity components from Eq. (21) where q u is the relative and planetary vorticity and q b is the stretching vorticity such that q u = ∇ 2 ψ + βy Let us define F, G and B as the forcing terms due to the parametrization on the right hand side of the zonal momentum, meridional momentum and buoyancy equations, respectively. The QG PV tendency due to eddy parametrization via the forcing terms F, G and B can be estimated by deriving an equation for QG PV and is therefore where b = f 0 ψ z is the buoyancy which can be shown to yield the following QG PV forcing i.e. the PZM14 closure. The arbitrary parameter κ b has been introduced, which partitions the buoyancy component of the forcing between the momentum and buoyancy equations. Choosing κ b = 0 gives the closure suggested in Eq. (21) . Choosing κ b = κ puts all of the forcing associated with q b into B , i.e. into the buoyancy equation. To avoid spurious diapycnal mixing, the implementation should be performed on isopycnal surfaces, which would also provide a close match to the QG PV implementation. The implementation of the PZM14 closure proposed in Eq. (27) is equivalent to the proposed implementation of Eden (2010) (note the parametrizations are not equivalent, only their implementations in primitive equation models). Eden (2010) proposes PV diffusion closure as well as a buoyancy diffusion closure. Combining both PV and buoyancy closures yields an expression for the eddy momentum flux forcing term that appears in the zonal-momentum equation (in a zonal-mean channel model). Similarly to our approach in Eqs. (21) and (27) , Eden (2010) assumes a specific forcing form, showing that it gives the desired forcing in the PV equation. Eden (2010) finds that an additional term, the "gauge term", is required to satisfy the momentum constraint. For the PMZ14 closure, the gauge term can be any contribution to F and G that vanishes when ∂G ∂x − ∂F ∂y is computed. Our simulations do not have a direct buoyancy forcing, therefore it is difficult to argue which implementation would lead to more physically based results. Furthermore, it is difficult to decide how to choose κ b sensibly. One could attempt to diagnose it from high-resolution simulation model, as in PMZ14. For example, diagnostic results from the QG high-resolution model indicate that momentum and buoyancy contributions to ∇ 2 Dq / Dt are of similar magnitude (not shown), suggesting that κ b = κ would a physically reasonable choice. One could use high-resolution primitive equation models to further diagnose the different contributions of momentum and buoyancy forcing and validate the QG results, however this is beyond the scope of the present study.
Both implementations in QG, deterministic and stochastic, do show some sensitivity to the timestep being used, as any numerical simulation would. Yet, the sensitivity to the timestep in the current parametrization might be more pronounced since the implementation requires the knowledge of the time-tendency of nonlinear quantities. In the present work, we choose to keep the timestep of the high resolution truth and the low-resolution 30 km runs identical. This was done to isolate the role of timestepping in the stability of the simulations. Runs with increased timesteps up to 4 times the present values lead to results identical to the ones presented here. As for the QG implementation, the Lagrangian derivative of potential vorticity is required in primitive equation, therefore there might be some sensitivity to the timestep chosen to integrate the model which will need to be investigated.
Another (related) approach for an implementation in primitive equation is to use the different parts of the non-Newtonian Rivlin-Ericksen stress directly from the momentum equations, rather than the analogy from PMZ14. Anstey and Zanna (2017) have shown that the deformation part of the Rivlin-Ericksen stress in the momentum equation can dissipate enstrophy, conserve energy while mimicking very closely the effect of eddies onto the mean flow.

Summary
We presented a new parametrization of eddy-mean flow interaction for use in eddy permitting models. The closure, based on a form resembling a Rivlin-Ericksen stress which includes a deformation and a memory term of potential vorticity gradient, can be implemented as a deterministic or stochastic parametrization. When either formulation of the parametrization are implemented, we obtained a drastic improvement in the mean state and the variability of the low resolution models over all wavenumbers and frequencies.
The parametrization requires only resolved variables from the low resolution model, namely the Laplacian of the Lagrangian tendency of potential vorticity, ∇ 2 D q D t . For the deterministic parametrization, the parameter (which has dimensions of length squared) depends only the low resolution grid size. For the stochastic parametrization, in addition to the low resolution grid box size, the maximum strength of the wind stress (or the variance in PV under any forcing -mechanical or thermodynamical) and the local stratification are necessary input to estimate the contribution of the eddy forcing. The deterministic and stochastic parametrizations are therefore scale-aware, and flow-aware due to the necessary evaluation of the local ∇ 2 D q D t . The present results indicate there is a cut-off resolution at which the deterministic parametrization and the stochasticity do not impact the mean flow. The cut-off is around the Rossby radius of deformation, but further tests are required to ascertain this value. For horizontal resolutions smaller than the cut-off value, the model produces instabilities generating barotropic and baroclinic eddies. The effects of these eddies can be enhanced by the parametrization, affecting local and global momentum, energy and enstrophy budgets. The parametrization manages to overcome dissipation, especially in regions where the eddy forcing has a strong impact on the larger scale flow.
The parametrization captures some key ingredients of geotrosphic turbulence such as jet sharpening and upgradient momentum fluxes, energy backscatter and enstrophy dissipation. It requires only spatial and temporal derivatives already computed by the model. Some numerical stability criteria must be respected and would need to be tested in more complex models. However in QG, the parametrization also appears to have a stabilizing effect on the model. It might be necessary to revisit some of the initial assumption when considering primitive equations such as decomposing the forcing into momentum and buoyancy as described in the discussion, or combine the parametrization with an energy or an enstrophy equation ( Marshall and Adcroft, 2010;Marshall et al., 2012 ).
The stochastic backscatter is shown to be a more efficient and a more stable eddy parametrization than its deterministic counterpart. Stochastic parametrizations for convection have been fairly routine in atmospheric models ( Raisanen et al., 2004;Plant and Craig, 2008 ), especially in the grey zone (analagous to the horizontal resolution cut-off discussed above for eddy-mean interaction). Stochastic parametrizations in primitive equation ocean models, mainly at coarse-non-eddying resolution, are slowly being implemented showing various degrees of success (e.g., Brankart, 2013;Andrejczuk et al., 2016;Williams et al., 2016;Grooms, 2016;Juricke et al., 2017 ). The encouraging results presented here reinforces the need for developing and implementing scale-and flowaware stochastic ocean parametrization in ocean climate models.