Oceanic stochastic parametrizations in a seasonal forecast system

We study the impact of three stochastic parametrizations in the ocean component of a coupled model, on forecast reliability over seasonal timescales. The relative impacts of these schemes upon the ocean mean state and ensemble spread are analyzed. The oceanic variability induced by the atmospheric forcing of the coupled system is, in most regions, the major source of ensemble spread. The largest impact on spread and bias came from the Stochastically Perturbed Parametrization Tendency (SPPT) scheme - which has proven particularly effective in the atmosphere. The key regions affected are eddy-active regions, namely the western boundary currents and the Southern Ocean. However, unlike its impact in the atmosphere, SPPT in the ocean did not result in a significant decrease in forecast error. Whilst there are good grounds for implementing stochastic schemes in ocean models, our results suggest that they will have to be more sophisticated. Some suggestions for next-generation stochastic schemes are made.


Introduction
Seasonal forecasting with coupled atmosphereocean-land surface models has become well established at many numerical weather prediction (NWP) and climate forecast centers during the last two decades (MacLachlan et al. 2015;Molteni et al. 2011;Saha et al. 2014). These coupled systems for seasonal forecasting exploit predictability originating from the ocean and the land surface. More specifically, coupled models allow for predictions of seasonal-tointerannual variations of the climate system such as the El Niño-Southern Oscillation (ENSO) cycle, which may in turn affect predictions of other long timescale variations. Coupled models are also increasingly being used for medium-range weather predictions. For example, the European Centre for Medium-Range Weather Forecasts (ECMWF) recently introduced a coupled atmosphere-ocean model into its operational forecasts in November 2013 (Lang et al. 2015;Janssen et al. 2013).
Forecast uncertainties need to be accounted for in these prediction systems. For example, errors in the observations and an incomplete observing system lead to inaccuracies in the initial conditions of forecasts. Because of the chaotic nature of the atmosphere and the earth system, these initial errors tend to grow quickly, reducing the predictive skill of forecast models. Ensembles of model simulations are used to account for initial condition uncertainty. Each ensemble member is initialized with slightly different initial conditions, generated using a perturbation method [e.g., ensembles of data assimilations, singular vectors; see, e.g., Molteni et al. (2011)]. The ensemble as a whole then provides a probabilistic forecast.
In addition to the uncertainties in initial conditions, forecast models themselves are inaccurate due to the numerical approximations used in the temporal and spatial discretization or due to inaccurate representations of subgrid-scale processes. In the past few decades, different strategies have emerged to account for these model uncertainties. Multimodel ensembles make use of the diverse set of weather and climate models to sample the uncertainty in the model formulation, on time scales of weather (e.g., Mylne et al. 2002;Park et al. 2008), seasonal (Palmer et al. 2004;Weisheimer et al. 2009), decadal (e.g., Smith et al. 2013;Meehl et al. 2014), and climate (e.g., Tebaldi and Knutti 2007;Flato et al. 2013) predictions. Perturbed parameter ensembles, on the other hand, sample the uncertainty in the choice of specific, imperfectly constrained model parameters (e.g., Stainforth et al. 2005). Furthermore, in multi-parameterization ensembles the set of applied parameterization schemes is modified for each ensemble member.
Finally, stochastic parameterizations are becoming an alternative technique for representing model uncertainty, especially for atmospheric NWP (e.g., Shutts 2005;Berner et al. 2009;Palmer et al. 2009, and references therein). By injecting stochastic perturbations into the system at the subgrid scale, uncertainties in closure schemes are incorporated and unresolved subgrid-scale variability may be taken into account. Some of the stochastic schemes such as the stochastically perturbed parameterization tendency (SPPT) scheme have been used successfully in atmospheric forecast models (e.g., Palmer et al. 2009), leading to improved forecast skill up to the seasonal time scales (Weisheimer et al. 2014;Batté and Doblas-Reyes 2015). SPPT is a multiplicative stochastic scheme based on existing deterministic parameterizations. The study of Weisheimer et al. (2011) compared the multimodel, perturbed parameter and atmospheric stochastic physics approach for monthly and seasonal forecasts showing the potential for stochastic parameterizations to outperform the multimodel ensemble.
The motivation for stochastic parameterizations in the atmosphere originates to some degree from the existence of power-law structures and the related rapid upscale error propagation [see Palmer (2012) for a detailed discussion]. Since similar power-law structures, associated with mesoscale eddies, can also be found in the ocean (LaCasce and Ohlmann 2003; LaCasce 2008), similar arguments for the potential role of stochastic parameterizations may therefore hold.
Stochastic parameterizations have not only been introduced into the atmospheric component of NWP and seasonal forecast models (e.g., Buizza et al. 1999;Palmer et al. 2009), but have more recently also been implemented into the sea ice (Juricke et al. 2013;Juricke andJung 2014), ocean (e.g., Brankart 2013;Brankart et al. 2015), land surface (MacLeod et al. 2016), and air-sea coupling (e.g., Williams 2012; Beena and von Storch 2009) components of global general circulation models, to account for the uncertainty in subgrid parameterizations. Although the impact of ocean stochastic parameterizations has been demonstrated in a climatological context (e.g., Brankart 2013), as yet no comparable stochastic parameterization in the ocean has been implemented into operational coupled models. Since some of the proposed stochastic schemes for ocean subgrid-scale parameterizations (e.g., Porta Mana and Zanna 2014) may not be straightforward to implement in complex coupled global models, the purpose of this study is to investigate the impact of simpler stochastic schemes of similar complexity to those used in the atmosphere. These schemes aim at quantifying model uncertainty based on existing parameterizations.
Three methods of stochastic parameterization are considered: a surface flux parameterization similar to Williams (2012), a stochastic perturbation of the equation of state similar to Brankart (2013), and stochastic perturbations of the parameterized tendencies of diffusion-mixing and viscosity-which can be considered to be the application to the ocean of the SPPT scheme used successfully in atmospheric models (e.g., Palmer et al. 2009). The details of the model setup, the stochastic parameterization schemes, and the methods of data analysis are given in section 2. The results of several ensemble integrations over a 10-yr period are presented in section 3 and our conclusions are summarized and discussed in section 4.

Model configuration and experimental design
Integrations were performed using a variation of the ECMWF seasonal forecast system [system 4; Molteni et al. (2011)]. The ocean component consists of the NEMO v3.0 global ocean primitive equation model (Madec et al. 2008) discretized onto an approximately 18 ORCA tripolar grid (Madec and Imbard 1996) with 42 vertical levels. NEMO is coupled to the ECMWF atmospheric forecast model IFS Cy36r4, integrated at T159 horizontal resolution (reduced from T255 in system 4) with 91 vertical levels. Ocean reanalysis data, used for initial conditions and verification, was provided by the new ECMWF operational ocean reanalysis system (ORAS4; Balmaseda et al. 2013). The ensemble of five reanalyses is driven by sampling uncertainty in winds and in deep ocean initial conditions, and subsampling observation coverage. The ocean analyses are then augmented by applying SST perturbations with an associated subsurface temperature signal (Molteni et al. 2011), as stochastic SST perturbations are linearly decreased to a depth of 80 m. The unperturbed ensemble member of ORAS4 is used for verification. Atmospheric initial conditions are derived from the ERA-Interim dataset (Dee et al. 2011), without initial or stochastic perturbations applied.
To investigate the potential impact of spatially and temporally correlated noise used for the stochastic schemes, we followed the example of first implementations of SPPT in the atmosphere (see Buizza et al. 1999) and introduced a simple, computationally efficient spatial correlation method. The horizontal domain was divided into a coarse, regular L 3 L latitude longitude grid. In each of the cells of this coarse grid, a single uniformly distributed random number was generated at a given time interval I and applied uniformly to each of the ocean model grid cells within the respective L 3 L region. Different resolutions for L and time intervals I were tested. The diagnostics in this paper focus on the results with L 5 308 and with I 5 1 or I 5 30 days, as these showed the largest, physically reasonable and computationally stable response in terms of changes to the generated oceanic ensemble spread. Reduced spatial and time scales reduced the impact. Results of a reduced 208 3 208 spatial correlation grid will be briefly discussed as well.
A 10-member ensemble of experiments, initialized on 1 November 1989 was integrated for 3 months. The same procedure was repeated over 9 more years to generate an ensemble for each of the 10 years 1989-98, for a total of 100 integrations. Our setup is designed to mimic, in a computationally efficient way an operational seasonal forecasting system (e.g., system 4 from ECMWF). We define three regions for the purpose of summarizing the time-dependent nature of the integrations: 1) the North Atlantic subpolar region, 458-658N, 708W-08; 2) the North Atlantic subtropics region, 108-458N, 708W-08; and 3) the Southern Ocean, 358-658S, all longitudes. Although we considered other regions, including the tropical Pacific and whole globe averages, the above regions were chosen in order to focus on the areas where the largest biases in the mean and the largest ensemble variance occur. These are also regions where the schemes examined here have the largest impact. Results of area averages over other regions support the findings described here, or are inconclusive (i.e., cannot be distinguished from the noise). In this context, the criterion to distinguish signal from noise is simply defined as approximately where the mean is greater than two standard deviations in the mean: [e.g., see section 2.5.9 in von Storch and Zwiers (1999)].
Here m is the ensemble mean difference between a stochastically perturbed and a control ensemble integration, s is the standard deviation of the differences of all start dates, and n 5 10 is the number of independent samples, assuming that the model state is independent from one year to the next. Equation (1) is similar to the 95% confidence two tailed t test. We chose this test statistic because it is simple and to avoid giving the impression of accurate confidence intervals. The t test is not robust against the sampling assumption that every realization occurs independently of all other realizations [e.g., see section 6.6.1 in von Storch and Zwiers (1999)] and is not appropriate for variances. The choice of a 2s threshold is somewhat arbitrary and provides a rough guide. [See Fig. 3 for an impression of the spread indicated by (1).]

a. Stochastic surface flux (SSF)
Stochastic perturbations of the air-sea fluxes were applied to the seasonal forecast model based upon the method described in Williams (2012) who performed experiments with a lower-resolution coupled atmosphere-ocean GCM. Their ocean model (OPA 8.2; Madec et al. 1998) was integrated using the ORCA2 grid, which has approximately a 28 horizontal resolution, with 31 vertical levels. Their atmospheric model (ECHAM 4.6;Roeckner et al. 1996) was integrated using a T30 spectral grid with 19 vertical levels. In Williams (2012), the air-sea freshwater flux DS and nonsolar heat flux DT, were perturbed separately in two separate experiments, given the following perturbation scheme: DT / (1 1 r T )DT, and DS / (1 1 r S )DS, (2) with r T and r S being random numbers uniformly distributed between 60.5 and generated at 3-h intervals on the ORCA2 grid scale. For this paper, in addition to moving to a higher resolution, the interval over which random numbers were chosen was increased to I 5 1 day and applied using the L 3 L grid defined above with L 5 308. Also, both fluxes were perturbed simultaneously but with different sequences of random numbers.

b. Stochastic equation of state (SES)
The method of stochastic parameterization of the nonlinear equation of state, which relates the density to the temperature, salinity, and pressure, is based upon the method described in Brankart (2013). To simulate the uncertainty related to area-averaged temperature and salinity fields used as input for the equation of state, a first-order autoregressive process perturbs both state variables by an amount proportional to their gradients. The autoregressive process was implemented with a MAY 2016 A N D R E J C Z U K E T A L . decay time scale of 12 days, however, integrations did not complete. We found that a decay time of 7.5 days was necessary to keep the model stable. Further investigation, beyond our current scope, is necessary to establish the precise reason for this instability. Ultimately the density at each grid point is perturbed independently of neighboring grid cells as described by Brankart (2013). In Brankart (2013) integrations were performed using NEMO with the same ORCA2 grid as that used by Williams (2012). Their system, however, was forced using climatological atmospheric data without interannual variations (Large and Yeager 2009) rather than the atmosphere from a coupled model.

c. Stochastically perturbed parameterization tendency
We introduce a SPPT scheme for the ocean similar to that implemented in atmospheric models for ensemble weather forecasts. The subgrid-scale-parameterized tendencies (used to crudely mimic turbulent diffusion, mixing, convection, and viscosity) applied to the zonal velocity u, meridional velocity y, salinity S, and temperature T, are multiplied by (1 1 r X ) as in (2), with different random sequences r X for u, y, S, and T, that is, X 2 fu, y, S, Tg. For example, the deterministic prognostic equation for T given by takes the following form when SPPT is implemented where U 5 (u, y, w) is the 3D Eulerian velocity, U GM is the eddy-induced velocity from the Gent-McWilliams parameterization scheme (Gent and McWilliams 1990), D T represents the parameterized diffusion and mixing tendencies, and F T is the air-sea flux. Summarized in the respective D X and X 2 fu, y, S, Tg, terms for the momentum and tracer equations are parameterized terms using vertical eddy viscosity and diffusivity coefficients calculated by a turbulent kinetic energy (TKE) closure scheme as well as a double-diffusion mixing scheme. Lateral diffusion and viscosity are also included in D X using horizontally varying coefficients for tracers [following Held and Larichev (1996) and Treguier et al. (1997)] as well as a three-dimensional spatially varying viscosity coefficient for the momentum equations. See Madec et al. (2008) for the specification of these schemes. For the results discussed in section 3 the spatial field of the random numbers r X was generated at either I 5 30 days or alternatively I 5 1-day intervals using the L 5 308 grid defined above with the same values of r X applied on each vertical model level. The four r X were applied to the parameterized tendencies of the respective prognostic equations simultaneously for all four fields and were drawn from a uniform distribution between 60.8. This value for the magnitude of r X was found to be the maximum value consistent with model stability. The eddy viscosity term D T stabilizes the model, so a value of r u , 20.8, held for 30 days, could be an obvious source of model instability.

Impact of the stochastic parameterizations on model bias and ensemble forecast performance
The control ensemble integrations without any stochastic perturbations exhibit biases relative to the reanalysis. The daily bias is estimated by taking the difference between the reanalysis and the ensemble mean for each start year averaged over all 10 start years. Averaging in time between 60 and 90 days into the integration indicates that this bias, as illustrated for upper 300-m ocean heat content in Fig. 1a, often coincides with the regions of high ensemble variability such as the eastern tropical Pacific, Southern Ocean, Gulf Stream, and Kuroshio regions. The upper 300-m heat content is chosen as it yields a particularly strong signal-to-noise ratio compared to sea surface temperatures in isolation (see the online supplemental material).
The SPPT scheme leads to a change in the bias in the Southern Ocean, particularly in the region of the south coast of Australia, and the North Atlantic (Fig. 1b). Comparing Figs. 1a and 1b indicates that the warm bias along the south coast of Australia is reduced. Taking the area means, specified in section 2, of the daily bias (Fig. 2) highlights a reduction in the mean bias for the upper 300-m ocean heat content and sea surface salinity (SSS) in the North Atlantic subpolar region after the first month. The change in bias of SST found in our study is relatively small (about 10%) in the North Atlantic and the tropical Atlantic over the 3-month integration performed compared to those obtained when increasing the ocean resolution (Scaife et al. 2011) or the atmospheric vertical resolution (Harlasse et al. 2015) for long integrations. The bias is initially not exactly zero due to the random spread in the initial condition perturbations. In the Southern Ocean, while the warm bias has been reduced, no noticeable changes are observed in the SSS bias. In contrast, the bias in the North Atlantic subtropical region has increased due to SPPT. For certain regions the difference in bias remains relatively constant over the length of the integration.
The gray lines in Fig. 2 represent the ensemble mean bias of the control integration for each of the 10 years and can be considered as an indicator of confidence. Changes in the ensemble bias and spread (defined as the ensemble variance) between the control and integrations using the SSF and SES schemes are too small to distinguish from zero. The inherent ensemble spread due to the ocean initial condition perturbations and atmospheric variability is large in comparison. When it comes to the area-averaged quantities, Fig. 2 also confirms that the SSF and SES schemes have little impact upon the bias though there are regions in which the bias appears slightly reduced (e.g., North Atlantic subpolar SSS around 2-3 months for the SES scheme).
The impact of the SPPT scheme upon the ensemble spread is apparent in several key regions (Fig. 1d). The largest and most significant impact is shown in the region of the south coast of Australia with patches of visibly increased spread throughout the Southern Ocean. This is a similar pattern to the changes in vertical mixing induced by a stochastic wind forcing compared to climatological wind forcing [cf. Fig. 8 in Beena and von Storch (2009)]. In addition to the Southern Ocean, Fig. 1d also reveals increased ensemble spread along western boundary currents, including the Gulf Stream and Kuroshio. Although there are reductions of ensemble spread in parts of the tropics, they are not distinguishable from zero using the criterion in (1). The increase in ensemble spread over the length of the integrations is readily seen in area means of the North Atlantic subtropical and subpolar regions and the Southern Ocean (Figs. 3a-c). For the North Atlantic subtropics (see Fig. 3b), which corresponds to a rather moderate increase in regionally averaged ensemble spread, the relative increase in spread after three months represents about 40% of the mean ensemble spread of the deterministic ensembles. The sharp changes in spread apparent in Figs. 3a-c is due to a new set of random numbers being chosen every 30 days. We expect that substituting the r X for autoregressive processes with the equivalent time scales will remove this artifact.
On the other hand, Figs. 3d-f indicates that the SPPT scheme has increased the regional spread at the expense of increased forecast error in the upper ocean heat

MAY 2016
A N D R E J C Z U K E T A L . content in most regions over the length of the integration. The forecast error is defined as the difference between the bias-corrected ensemble mean and the reanalysis. Bias corrected means that the daily bias is subtracted from each integration. The mean squared error is then the squared error, averaged over the region of interest. The forecast error is slightly, but statistically significantly reduced between 2 and 5 weeks in the North Atlantic subtropical region. Comparing Figs. 3a and 3d, Figs. 3b and 3e, and Figs. 3c and 3f, indicates that the additional mean squared error is approximately an order of magnitude lower than the increase in spread. The equivalent results for the SSF and SES schemes show that there is little impact upon the total ensemble spread (see the online supplemental material). Reducing the time between random numbers from I 5 30 days to I 5 1 day resulted in a reduced impact for the SPPT scheme (to the point at which changes are almost indistinguishable from zero) when comparing ensemble bias, spread, and mean squared error to those of the control integration. This supports the conventional hypothesis that the impact of the stochastic term is strongly dependent upon its time scale, with longer time scales corresponding to a larger impact. Integrations in which the spatial correlation of the noise is reduced to an L 5 208 grid yield only small changes. From previous sensitivity studies we would expect that as the correlation length scale is reduced further, there will come a point at which the impact of the random term is strongly reduced (see, e.g., Juricke et al. 2013). (See the online supplemental material for additional figures demonstrating the reduced impact at these time scales and equivalent plots for the SSF and SES schemes and the impact on the sea surface temperature and salinity.)

Discussion
In this study, we test three oceanic stochastic parameterization schemes that aim at quantifying model uncertainty: the stochastically perturbed parameterization tendency (SPPT), the stochastic surface flux (SSF) (Williams 2012), and a stochastic equation of state (SES) (Brankart 2013). The relatively simple SPPT scheme, which has proved to be an important element of an ensemble forecast system in numerical weather prediction, injects multiplicative noise into the prognostic equations with an amplitude proportional to the deterministically parameterized tendencies. These three schemes are applied to the ocean component of a state-of-the-art seasonal coupled forecast system to account in part for the uncertainty in subgrid processes. The model considered here exhibits relatively large oceanic variability FIG. 2. Bias in the (a)-(c) sea surface salinity and (d)-(f) upper 300-m ocean heat content averaged over (left to right) three regions. See text for details. The colored lines indicate the ensemble mean bias averaged over all 10 start years for the control integrations (blue), the integrations with SPPT (red), SSF (green), and SES (black). Each gray line represents the 10-member ensemble mean bias of the control integration. There is 1 gray line for each of the 10 yr. Note the different y-axis scales. compared to a system run at coarser resolution or without an interactive atmosphere. Using such a model, the impact of the SSF and SES schemes was relatively small at the monthly time scales considered, and may be more visible through longer time-scale integrations. In the case of the SES and SPPT schemes, the amplitude of the stochastic perturbations was limited by model stability.
Results show that compared with the other schemes, SPPT is an effective stochastic parameterization for increasing ensemble spread for variables such as sea surface temperature and salinity, and upper 300-m ocean heat content. The impact of SPPT was found to be particularly marked, and visible above the background variability, in regions of strong eddy activity, such as along western boundary currents in the Gulf Stream and Kuroshio regions, in the North Atlantic subpolar region, and also in parts of the Southern Ocean. In most other regions the atmospherically induced oceanic variability is the major contributor to the ensemble spread. Similar results were found by Juricke et al. (2014) in the context of applying stochastic perturbations to the sea ice strength in seasonal sea ice modeling. They showed that after a few weeks the atmospheric variability is the largest contributor to sea ice ensemble spread.
While SPPT increased ensemble spread for some regions, ensemble-mean forecast skill on the other hand was not improved by the addition of SPPT, with the exception of the North Atlantic subtropics. For some regions, model bias was made worse. The latter does not necessarily imply that SPPT is degrading model accuracy, since the value of many climate model parameters are found by running the model in deterministic mode and estimating the values that fit the observations best. If a stochastic scheme impacts on the model mean state, then tuning should be performed using the full stochastic model and not a deterministic approximation to it (Palmer 2012). In addition to tuning the deterministic parameters of the model, the impact of a stochastic parameterization may be tuned by adjusting the decorrelation time scale and spatial distribution of the random perturbations as well as their magnitude. These are key parameters of the stochastic parameterization [e.g., see Cooper and Zanna (2015) for a discussion]. For the particular configurations discussed in this study, changes to the time scale appeared more important than the spatial scale. It should also be mentioned that the procedure used to create the random perturbations is crucial if unphysical artifacts such as the sharp changes in spread apparent in Figs. 3a-c are to be avoided. For future implementations of these kinds of stochastic ocean parameterization schemes the pattern generator used to create spatially correlated perturbations should FIG. 3. Area-averaged changes in the (top) ensemble spread and (bottom) mean square error with respect to the control integration due to SPPT of upper 300-m ocean heat content. See text for details. The gray lines indicate the statistics from the ensembles for each of the 10 start years. The solid red line indicates the mean over all years and the dashed lines indicate the uncertainty in the mean (the mean 6 2s/10 1/2 ).

MAY 2016
A N D R E J C Z U K E T A L .
be able to produce more realistic, smoothly varying random fields, respecting physical constraints such as boundary conditions. Where possible, observational data should be used to get estimates for spatial and temporal decorrelation scales. A more technical constraint, on the other hand, is the computational efficiency of such a random pattern generator, especially when it comes to operational seasonal forecasting. While the current implementation of SPPT led to an increase in spread, it was not able to demonstrate a reduction in ensemble-mean forecast error, even when the model fields have been bias corrected a posteriori. This result is possibly due to SPPT being too crude a scheme for ocean models. In the ocean, subgrid processes are parameterized largely by diffusion and viscosity. The impact of uncertainty in these terms upon the large-scale oceanic circulation, as simulated by SPPT, may not well reflect the uncertainty in the underlying turbulent processes. What this suggests is that a more positive impact than has been found here requires the development of more sophisticated stochastic schemes for unresolved and missing processes. Nonetheless the simple oceanic SPPT scheme introduced in this study provides a first estimate of the potential impact of stochastic ocean parameterizations in an operational seasonal forecast model.
The development of such stochastic closures to represent unresolved and missing processes should when possible remain consistent with fundamental physical constraints. For example, our initial implementations of a stochastic Gent-McWilliams scheme violated important nondivergent and adiabatic constraints leading to unrealistic upwelling and growing instabilities (see the online supplemental material). Avenues for future investigations might include a stochastic Gent-McWilliams scheme that is stable over long time periods, and remains consistent with the nondivergent and adiabatic constraints of the deterministic scheme. Stochastic schemes should ultimately be guided by observations, however, ocean observations are sparse. As a methodology to develop new parameterizations, highresolution idealized simulations can be substituted as ''truth'' (e.g., Berloff 2005Berloff , 2015 and optimally coarse grained to derive stochastic terms. This approach can provide a guide for the magnitude, spatial, and temporal patterns of the stochastic terms to be implemented in the numerical model, and can, for example, be decoupled from the background flow (e.g., Cooper and Zanna 2015). A complementary approach is to implement a PDF-based parameterization, such as Porta Mana and Zanna (2014) who have developed a stochastic parameterization of ocean mesoscale eddies that depends on the temporal tendency of potential vorticity.