An ensemble reconstruction of ocean temperature, salinity, and the Atlantic Meridional Overturning Circulation 1960–2021

Ocean reanalyses covering many decades, including those with few observations, are needed to understand climate variability and to initialize and assess interannual to decadal climate predictions. The Met Office Statistical Ocean Re‐Analysis (MOSORA) exploits long‐range covariances to generate full‐depth reanalyses of monthly ocean temperature and salinity even from sparse observations. We extend MOSORA by generating an ensemble that samples uncertainties in long‐range covariances. Initial covariances are taken from model runs and these are improved with observations using an iterative process. We demonstrate that covariances are improved by iteration, and that this procedure, using very sparse observations typical of the 1960s, captures many features of analyses benefiting from modern observation density. We investigate the ensemble spread and find that salinity trends in the covariances from model runs can introduce unexpected changes in the reanalyses. We nudge the reanalyses into an ensemble of coupled climate models to produce estimates of the Atlantic Meridional Overturning Circulation (AMOC). At 26°N, the AMOC shows decadal variability consistent with observations at this latitude and shows signs of strengthening in recent years. The ensemble spread in AMOC reconstructions increases with time as more observations interact with uncertain covariances. At 45°N, the amount of decadal variability in the AMOC varies between members, but on shorter timescales the variability is similar across the ensemble. At 45°N, the AMOC can be constrained better with more observations on the western boundary, but longer continuous observations are needed to improve covariances and reduce uncertainties in the AMOC.


INTRODUCTION
Decadal climate prediction fills the gap between seasonal forecasts and climate projections and is important for government policy and corporate planning.It is well established that decadal predictions are skilful (Meehl et al., 2014;Smith et al., 2019) and climate services are beginning to be developed (Dunstone et al., 2022;Hermanson et al., 2022;Solaraju-Murali et al., 2022).The main difference between decadal predictions and climate projections is that decadal predictions start from the current state of the climate system.This is essential for predicting internal variability and may also provide more accurate forecasts of externally forced changes in climate (Doblas-Reyes et al., 2013;Meehl et al., 2014;Smith et al., 2019).Accurate ocean initial conditions are particularly important, since predictable slow changes in the ocean can drive multiyear and decadal climate impacts including Atlantic hurricanes (Caron et al., 2018;Smith et al., 2010), Sahel drought (Sheen et al., 2017;Yeager et al., 2018), European rainfall (Simpson et al., 2019), droughts and wildfires in southwestern United States (Chikamoto et al., 2017), summer temperatures in northeast Asia (Monerie et al., 2018), the occurrence of warm summer temperature extremes (Borchert et al., 2019), and Tibetan Plateau summer rainfall (Hu & Zhou, 2021).
Over the last 20 years, observations of ocean temperature and salinity over the upper 2000 m have been greatly improved by the deployment of a global array of around 3000 Argo floats (Riser et al., 2016;Roemmich et al., 2019Roemmich et al., , 2012)).However, the long timescales of decadal prediction and the need to evaluate skill and biases for different phases of multidecadal climate variability require retrospective forecasts (known as hindcasts) that cover several decades.For the 6th Coupled Model Intercomparison Project (CMIP6, Eyring et al., 2016), decadal hindcasts start in 1960 (Boer et al., 2016).At that time, most of the ocean was poorly observed (Abraham et al., 2013), making initialization of the ocean difficult and highlighting the need to extract as much information as possible from the available observations.
The Met Office Statistical Ocean Re-Analysis (MOSORA: Smith et al., 2015) is used to initialize the Met Office Decadal Prediction System (DePreSys) and uses optimal interpolation with long-range (potentially global) covariances to fill the gaps between the sparse observations, thereby exploiting long-range teleconnections to create globally complete reanalyses.The potential of this approach to generate realistic reanalyses even with very sparse observations typical of the 1960s was demonstrated by Smith and Murphy (2007) and Smith et al. (2015).Allison et al. (2019) showed that this approach reproduces temperature trends well down to 2000 m and that accurate global covariances improve the analysis greatly compared with local covariances for data-sparse periods.Historical ocean observations are too sparse to calculate global covariances, and even local covariances are not known with sufficient accuracy in some key regions, including the Labrador Sea (Menary & Hermanson, 2018).MOSORA attempts to overcome this by using covariances calculated from climate models.Whilst these are physically plausible, they are model-dependent.Hence, for the latest Met Office decadal prediction system (DePreSys4), which contributes to CMIP6, we develop an ensemble of MOSORA (hereafter MOSORA-E) that samples uncertainties in covariances by using an ensemble of different climate models.
In this work, we document the development of MOSORA-E, including assessing its fidelity with data-withholding experiments and investigating sources of uncertainty.Many decadal climate anomalies are linked to the North Atlantic and require skilful predictions of the Atlantic Meridional Overturning Circulation (AMOC), since it is important for northward heat transport in the Atlantic (Bryden et al., 2020;Johns & Baringer, 2011) and subsequent impacts over land (Hermanson et al., 2014;Müller et al., 2014;Robson et al., 2012;Sheen et al., 2017;Yeager et al., 2015;Yeager & Robson, 2017).Constraining the AMOC in historical reconstructions has been shown to be difficult (Karspeck et al., 2017;Pohlmann et al., 2013), but is expected to improve with modern observational coverage and assimilation methods (Jackson et al., 2019).We therefore also assess reconstructions of the AMOC in the DePreSys4 ensemble assimilation run.Surprisingly, we find that AMOC uncertainties do not necessarily decrease with greater observational coverage.We investigate the reasons for this and suggest ways in which AMOC reconstructions can be improved.
In Section 2, we describe how MOSORA-E is constructed and how we use a climate model to reconstruct the AMOC.In Section 3, we assess MOSORA-E using withheld observations and investigate the origins of the ensemble spread.In Section 4, we present the AMOC reconstructions and investigate causes of uncertainty.Finally, we present our conclusions and recommendations in Section 5. used are EN4 temperature and salinity profiles (Good et al., 2013) with XBT corrections by Gouretski and Reseghetti (2010).The observed profiles are averaged into monthly means and onto a 1.25 • × 1.25 • global grid with 20 levels as used previously (Smith et al., 2015;Smith & Murphy, 2007).The top 80 m (top six levels) also uses sea-surface temperature (SST) observations in the form of HadSST2 (Rayner et al., 2006) before 1982and HadISST1.1 (Rayner et al., 2003) thereafter (Smith & Murphy, 2007).SST under sea-ice is set to −1.8 • C.
In this version, MOSORA-E uses the years 1951-2016 to calculate climatologies, variances, covariances, and correlations (apart from the initial iteration, see below).Since optimal interpolation is computationally expensive, we limit the number of observations that can influence an analysis grid point to 200, though we stress that these are not necessarily limited to local regions.For each grid point and month, the analysis searches for observations (both temperature and salinity) in the same month and the five preceding months over a predetermined number of vertical levels for which the correlation between observation location and analysis grid point is greater than 0.3 in magnitude.This search begins with regions close to the analysis grid point and expands until either 300 potential observations have been found or all regions have been searched.If more than 200 potential observations are found, then the number is reduced to 200 by keeping the observations with the highest absolute correlation with the analysed point.These observations are then used in the optimal interpolation to determine the values of salinity and potential temperature at the analysed point, exploiting covariances between temperature and salinity to perform a joint analysis.
Covariances for the initial iteration are taken from historical simulations.Member one uses a coupled HadGEM3 historical simulation with Global Configuration version 2 (GC2, Williams et al., 2015).Member two uses an ocean-only run forced with observed interannual forcing; the model is described by Garry et al. (2019), but here we use the Coordinated Ocean Research Experiments v2 Inter-Annual Forcing (CORE2-IAF, Large & Yeager, 2009).The remaining eight members use coupled HadGEM3 historical simulations that were performed with perturbed physics at GC3 (Williams et al., 2018;Yamazaki et al., 2021).In this way, our ensemble samples covariances from Nucleus for European Modelling of the Ocean (NEMO) models (Storkey et al., 2018) at a resolution of 0.25 • (ORCA025) with a variety of surface forcings.
The model covariances used will inevitably contain errors, and a key part of the MOSORA procedure is to exploit the available observations to improve the model covariances.This is achieved through the iteration F I G U R E 1 Iteration procedure for MOSORA.First-guess model covariances and an observed climatology are used to make an initial analysis for all months in the covariance period .Subsequent iterations use covariances and climatologies calculated from the previous analysis, which benefit from observed information.The same observations are used for every iteration.procedure illustrated in Figure 1.The first-guess covariances taken from the model runs described above are used, along with a climatology based on observations from 1941 to 1996 (Smith & Murphy, 2007), to create global reanalyses for all months over the covariance period .These reanalysis fields contain observed information and are then used to update the climatologies of temperature and salinity and to calculate the covariances in the next iteration (iteration 1), and so on.Note that cross-covariances are used to make joint temperature and salinity analyses, which is particularly useful for subsurface salinity, which benefits from the cross-covariance with better observed subsurface temperature in the early reanalysis period.To reduce noise, reanalysis fields are smoothed with a 3 × 3 grid-point window.We demonstrate below that two iterations improve the reanalyses significantly.
To reconstruct the AMOC and create initial conditions for the DePreSys4 decadal climate predictions for CMIP6, assimilation runs are performed in which observational reanalysis fields are nudged into a coupled climate model covering the period December 1959-November 2021.The model is the same as the HadGEM3-GC3.1-MMCMIP6 historical simulation (Andrews et al., 2020), which has a nominal atmospheric resolution of 60 km (N216) and an ocean resolution of 0.25 • (ORCA025), with these components coupling every hour.In each ensemble member, the corresponding MOSORA-E member is nudged in at all levels (except the top level) in the ocean towards the monthly-mean full-field ocean temperature and salinity data from iteration 2 (linearly interpolated to the model grid).As in the previous version of DePreSys (Dunstone et al., 2016), we use a relaxation time of 10 days in the ocean, sea ice is nudged towards HadISST (Rayner et al., 2003, updated) with a relaxation time of 1 day, and the atmosphere is nudged with a 6-hour relaxation towards temperature and winds at all levels from ERA-40 (Uppala et al., 2005) until 1996 and ERA-Interim (Dee et al., 2011) after that.

ASSESSMENT OF MOSORA-E
Our assessment of the reanalyses is divided into two parts.
In the first part, we use data-withholding experiments to assess the impact of iterations and to investigate the accuracy of reconstructions made using the observational network typical of the 1960s.In the second part, we investigate the ensemble spread in ocean temperature and salinity and how it changes with iteration.We focus mainly on the 995-m level, which encompasses the 800-1200 m depth range, because it is approximately the depth of the AMOC maximum (which we will study in Section 4) and in the early part of the reanalysis period this depth is poorly observed compared with recent years.

Iterations and data-withholding experiments
The observational network is continually changing over our target reanalysis period 1951-2016.The 1960s, for example, are characterized by a sparse observational network, with most temperature and salinity observations in the top few 100 m.The 2010s, on the other hand, benefit from the global Argo network of robotic floats and have much better observational coverage in the top 2000 m in ice-free areas and away from coastal shelves (Palmer et al., 2019).Here we perform data-withholding experiments to assess how well MOSORA-E made using 1960s observations compares with analyses using 2010s observations.For this, we made the reanalyses for each month of the well-observed period 2010-2013 (48 months in total) and compared them with reanalyses made using a subset of the observations that are close to locations with observations from 1964 to 1967.This was performed using the MOSORA-E ensemble member with first-guess covariances taken from the ocean-only forced run (member two).
An example of the results for salinity at 995 m can be seen in Figure 2 for May 2013.The left-hand column shows the analyses made from all the salinity profile locations available in May 2013 and the right-hand column shows the analysis using a subset of the observations that were chosen as they are close to locations where there were observations in May 1967.The actual observations used at 995 m are shown in Figure 2a,b.Only 17% of the locations with observations remain after subsetting and these are not uniformly distributed, but instead follow international shipping routes and are biased to the Northern Hemisphere.As a result, local covariances would not be expected to capture many features seen in the analysis with all observations.The analyses using first-guess covariances from the model (Figure 2c,d) agree close to 1967 observational locations (along the Newfoundland coast, the Norwegian Sea, west of South America, and east of Japan), but in other regions the agreement is poor and the spatial correlation between the two is (0.09).In contrast, iteration 2 shows an improvement in agreement (Figure 2e,f).Both analyses have anomalies with a larger spatial scale that are more clearly defined with less small-scale noise.The North Pacific and Southern Ocean are now more similar and the spatial correlation is now higher (0.62).
We now generalize these results to other levels and dates.We expect the largest improvements for levels where there are few observations in the 1960s and many observations in the 2010s, because the error covariances are expected to improve most when there are many observations.The proportional increase in observations between the 1960s and the 2010s is largest over the depths 400-2000 m (not shown).This is because depths below 2000 m are not well sampled in either period and, although the number of observations in the upper 400 m did increase, there were already observations in that layer during the 1960s.Therefore, we focus our analysis on the levels between 400 and 2000 m.Frequency histograms of spatial correlations between the full analysis with all observations and the analysis with withheld observations show that iteration brings the two analyses much closer to each other for both temperature and salinity (Figure 3).As the difference between consecutive iterations reduces as more iterations are made, there is likely not much to gain by iterating a third time.Hence, by iterating covariances to take into account observational information, MOSORA-E is able to provide analyses using sparse observations from the 1960s that capture many of the features seen in analyses made using observational coverage similar to the present day.
The optimal interpolation algorithm produces an estimate of the expected error variance based on available observations and the strength of their link (correlation) to the analysis point (Smith & Murphy, 2007, equation 3).However, this assumes that covariances are perfect, and imperfect covariances lead to an underestimation of this error variance.The agreement between estimated and actual error variance provides an indication of how good the covariances used in the analyses are.We take the final analysis (iteration 2) using all observations as the truth, such that the actual error variance is calculated as the variance of the difference between the full analysis and the analysis from withheld observations.The ratio of actual to expected error variance (Figure 4) shows that analyses made using the first-guess covariances from the models greatly underestimate the actual error variances.On average, the actual error variance is almost eight times larger than the expected error variance (average ratio is 7.9), so the error covariances taken directly from the model runs are clearly not optimal.After two iterations, there are still areas with ratios above one, but the values are five times smaller (average ratio is 1.6) and the expected error variance is much closer to the actual error, indicating an improvement in the covariances resulting from the iterations.

Ensemble spread
Figure 5 shows the ensemble standard deviation of temperature and salinity for the second iteration at 995 m over the years used for the assimilation and hindcasts.Considering measurement errors, the spread is small, being only a few hundreths of a degree or a thousandth of a salinity unit in most of the oceans.The spread is of the same order as the ensemble mean interannual variability (not shown).
The lowest spread (high agreement between members) is in the tropical oceans and the North Pacific.There is larger spread in parts of the Atlantic, Southern Ocean, and Arctic Sea.For temperature, there is relatively small spread for the Atlantic subpolar gyre, but for salinity this region has relatively large spread.
To understand how the spread has changed with iteration, the right-hand column shows the difference in spread between the second iteration and the analysis with first-guess covariances.It is expected that, in well-observed regions and regions strongly influenced by the observations, iterations will cause covariances to converge, leading to more confident reanalyses and a decrease in ensemble spread.This is indeed the case in most regions for temperature, except for the Southern Ocean and Arctic Ocean, where perhaps the observation data are too sparse to constrain the covariances.For salinity, convergence is less prevalent: most notably, the spread in the Atlantic has increased for salinity but decreased for temperature with iteration.This indicates that the error covariance of salinity with itself differs more between members than the error covariance of temperature with itself.This implies that reanalyses of salinity will be more uncertain than those of temperature, perhaps since there are more temperature observations than salinity observations.We look more closely at salinity in the North Atlantic Subpolar Gyre to investigate the increase in spread of salinity with iteration there.Figure 6a shows the time series of salinity at a point in the western subpolar gyre for each of the runs used in the model covariances.The point is marked as a cyan cross in Figure 5d.We find that several models have trends at 995 m in the subpolar gyre; the time series are shown relative to their initial value to emphasize this.This location has a relatively long time series of observations at this depth with some gaps.In comparison with the observations, for most of the models the trends are large.These trends also exist for other points in the subpolar region, possibly related to model drift due to insufficient spin-up.As the points across the region have trends in common, this leads to spuriously strong correlations, reducing the spread of the ensemble of the first analysis, as all covariances are similar.This is evident in Figure 6b, which shows the analyses made using the covariances derived from the time series in Figure 6a.The influence of the observations has removed any trends in the time series and the analyses follow the decadal variability of the observations more closely.However, the spread of the ensemble often does not encompass the observations.
The impact of removing the trends from the covariances becomes evident with iteration.Figure 6c shows salinity at the same location for iteration 2. With the influence of the trends removed, other differences between the covariances have become more apparent, leading to an increase in spread compared with Figure 6b.In agreement with what we see in Figure 5d, the spread of salinity in the subpolar gyre has increased from the first-guess analysis (Figure 6b) to iteration 2 (Figure 6c). Figure 6 also illustrates what was shown in Figure 4: the second iteration has improved covariances compared with the initial model covariances, as the expected error variance is closer to the actual error variance.We therefore conclude that the model covariances in the subpolar gyre used to make the first-guess analysis did not have sufficient spread to represent the uncertainty of the reanalysis and the increase in spread represents improved covariances and uncertainties.
Another region that stands out in Figure 5d is the Gulf of Guinea, where the spread of salinity increases with iteration over a large region.implies that the anomalies prior to this are close to zero, as there are few observations influencing the Gulf of Guinea at this depth and the increased observational coverage from the Argo array highlights the sensitivity to uncertain covariances.Further investigation into the covariances of the individual members (not shown) shows that the members with a positive trend (such as green, pink, and purple) have positive correlations between temperature and salinity in this region; members with a negative trend (such as orange, grey, and blue) show negative correlations; and the remaining members (red, brown) have only weak correlations.The three members with positive correlations had positive salinity trends in the models used for the first-guess covariances and positive correlations between temperature and salinity.These positive correlations are maintained through all the iterations, indicating that they are dominant in determining the analysis for this point.The cause of the correlations in the other members is less clear.The fact that the ensemble spread increases when more observations are introduced is a sign that the covariances are uncertain and that models perhaps do not represent the processes in this region well.
Using the gridded EN4 data set of temperature and salinity from 2006 to 2021 (the period where MOSORA-E shows conflicting trends and also when there is observational coverage from Argo) reveals that there is no strong correlation between salinity and temperature in this region on annual mean timescales (not shown).This region is controlled primarily by the formation of Antarctic Intermediate Water (AAIW; Li et al., 2022), the origin of which is the Southern Ocean, also poorly observed.Hence, the increase in uncertainty despite increased observational coverage highlights the need to understand and model the formation and characteristics of AAIW better.

IMPACT ON ATLANTIC MERIDIONAL OVERTURNING CIRCULATION
In this section, we investigate the reconstructed AMOC and its uncertainties.The AMOC is calculated as the maximum overturning streamfunction at each latitude in the assimilation runs.To focus on the impact of ocean observations, we assess the AMOC with Ekman transport removed.The Ekman transport is the same for all members, as it is based on the same wind observations.Figure 8 shows the AMOC at 26 • N (hereafter AMOC-E26) and 45 • N (hereafter AMOC-E45).The RAPID AMOC (Cunningham et al., 2007) is measured at 26 • N and shows a mean strength of about 13 Sverdrup (Sv) (1 Sv = 10 6 m 3 s −1 ) with Ekman removed, so the transport in all members is weak, which has been found for earlier nudged runs of DePreSys (Karspeck et al., 2017).The RAPID time series is short, but the AMOC in the assimilation runs shows variability, in agreement with RAPID.The AMOC in our reconstruction is currently strengthening from the recent low in 2010.Both AMOC-E26 and AMOC-E45 show decadal variability, though the amount varies from member to member.Variability in AMOC-E45 ranges from about 2.5 Sv (blue member) to about 1.5 Sv (green member).The difference in decadal variability is smaller at 26 • N, and at both latitudes the variability on less than 10-year timescales is similar between the members.
The black lines in the panels of Figure 8 refer to the right-hand axis and show the AMOC uncertainties diagnosed from the ensemble standard deviation.Perhaps surprisingly, the uncertainty in AMOC-E26 generally increases with time despite an increase in observational coverage.Uncertainties in AMOC-E45 vary with time, with three peaks around 1970, 1995, and 2011 that are not related to the mean AMOC.
We now investigate what might be influencing the spread of the AMOC at the two latitudes.Figure 9a shows the ensemble mean of the correlation of temperature at a point on the western boundary at 26 • N (cyan cross) with temperature elsewhere on the 1500-m level.The variable and depth are chosen, as the AMOC maximum is usually found close to the 1500-m depth and temperature is more dominant than salinity in determining the variability of density at this location (not shown).All correlations of magnitude smaller than 0.3 have been masked out, as they would not be used in the analysis (Section 2).The correlations are highest at the western boundary close to the analysis point and following the path of the Gulf  Stream extension across the Atlantic.Figure 9b shows this ensemble mean multiplied by the ensemble standard deviation of the correlations.This is a rough measure of where the mean correlation is high and the ensemble spread is also high.In other words, it highlights the locations where observations are important for the analysis point shown by the cross, but the disagreement between ensemble members on the error covariances means there will be spread in the analyses between members.The region close to the analysis point has low values, as all members agree here; instead the regions to the north and east are likely to be contributing to the spread of the temperature analysis at 26 • N on the western boundary.
In Figure 9c, we correlate the spread of AMOC-26E with the number of observations (when there are observations) across the North Atlantic.As the spread of AMOC-26E increases with time, this shows primarily where observations have similarly increased, which is almost everywhere.However, we note that the regions noted in Figure 9b, particularly the region off the northeast coast of North America, appears to be associated with the spread of AMOC-26E.As the mean correlation in Figure 9a is high in this region and the spread is also high (Figure 9b), more observations here will increase the spread of the AMOC due to uncertainties in the error covariances.Thus, long-term observations are needed in this region at depth to reduce the uncertainty of the covariances with 26 • N.
For the spread of AMOC-E45, we look at salinity at 995 m.Again, this depth is close to the depth of the maximum overturning at this latitude, but here salinity plays a larger role in determining the variability of density than it does at 26 • N (not shown).Figure 9d shows the ensemble-mean correlation between a point at 45 • N on the western boundary and salinity elsewhere on the level.The strongest correlations are at 45 • N on the western boundary and in the Labrador Sea.The product of the ensemble mean correlation and the standard deviations, shown in Figure 9e, does not show any regions with large values (as seen in Figure 9b; note the difference in contour levels), apart from the negatively correlated region to the south.
In Figure 9f, the correlation between the number of observations and the spread in AMOC-E45 shows a strong negative region to the east of the Grand Banks and close to 45 • N. It appears that the spread in AMOC-E45 is associated with the number of observations on the western boundary there.Particularly in the 1980s, there were relatively many observations here compared with other time periods (not shown), which contributes to the low spread at this time in Figure 8.A lack of observations in this region therefore reduces our ability to reconstruct the AMOC and hence the heat transport into the North Atlantic subpolar gyre.

SUMMARY AND CONCLUSIONS
Initialization of the ocean is key for decadal predictions.To assess the skill of decadal predictions and correct model drifts and biases, hindcasts covering multiple decades are needed and typically start in 1960 (Boer et al., 2016).Whilst the Argo array has improved the observational coverage substantially over the most recent 20 years, subsurface ocean observations before then are very sparse and introduce uncertainties in initial conditions for hindcasts.The latest Met Office Decadal Prediction System (DePreSys4), which contributed to CMIP6, samples these uncertainties by using a 10-member ensemble.Each ensemble member takes initial conditions from an assimilation run in which the coupled climate model is nudged towards different monthly mean reanalyses of ocean temperature and salinity.These analyses are taken from an ensemble version of the Met Office Statistical Ocean Re-Analysis (MOSORA-E), which we document here.
MOSORA (Smith et al., 2015) uses optimal interpolation with potentially global covariances to produce globally complete reanalyses of ocean temperature and salinity.Because observations are too sparse to provide global covariances, these are initially computed from climate model simulations.These are physically plausible but model-dependent, so MOSORA-E samples uncertainties in covariances by using an ensemble of 10 different model simulations.An important aspect of MOSORA-E is that the covariances may be improved by taking observations into account.This is achieved through iteration: the model covariances are used to produce a first-guess analysis covering the period 1951-2016; because this has been influenced by observations, it potentially provides more accurate covariances and climatologies and is used for the next iteration, and so on.We use data-withholding experiments to demonstrate that, after two iterations, analyses using sparse observations typical of the 1960s are able to capture many regional temperature and salinity anomalies seen in analyses that benefit from Argo observations.We also assess the ratio of actual error to expected error diagnosed from the optimal interpolation and find that this reduces from nearly eight for the first-guess analysis to 1.6 for the second iteration, demonstrating that iteration improves the covariances.
We further assessed uncertainties across the MOSORA-E ensemble, focusing on a depth of 995 m, which is important for the AMOC.In general, we expect uncertainties in the analysis to reduce as covariances are improved, and this is indeed the case for temperature in most regions.However, ensemble spread increases with iteration in some regions, especially for salinity.In the North Atlantic subpolar gyre, spurious trends in the model simulations produce spatial salinity correlations that are too strong, leading to unrealistic agreement across the ensemble in the first-guess analysis.Iteration removes these trends, resulting in improved analyses despite an increase in uncertainty.Ensemble spread of salinity also increases in the Gulf of Guinea, especially during the Argo period.We traced this to uncertainties in the cross-correlation between salinity and temperature, which can produce different trends in salinities once observations become available.This highlights the need to check first-guess covariances for trends and for improved understanding and modelling of the formation and characteristics of AAIW, which dominates salinities at 995 m in this region.
We also assessed variability and uncertainty in the AMOC, which has been shown to be important for decadal predictions (Hermanson et al., 2014).The AMOC was obtained from the ensemble of assimilation runs and shows coherent decadal variability at both 45 • N and 26 • N. Surprisingly, uncertainties at 26 • N increase almost continuously with time, despite improvements to the observational coverage.We traced this mainly to uncertainties in covariances between temperature at the western boundary and temperatures further north off the coast of North America.At 45 • N, increased uncertainties appear to be related to fewer observations at the western boundary.
The result that analysis spread increases with increasing numbers of observations can seem counterintuitive.Optimal interpolation is a statistical method, akin to multiple linear regression.Simple mathematical models showing this can be found in the Supplementary Material for this article (Figures S1 and S2).When there are few observations, the solution falls back on climatology and hence ensemble members remain close.Observations impact the solution through covariances, such that more observations will move the analysis further from climatology.However, uncertainties in covariances in regions strongly correlated with the analysis point may lead to a larger spread in the analysis.For example, in the Gulf of Guinea (Figure 7), the covariance between temperature and salinity is particularly uncertain and can even differ in sign, leading to an increase in spread of the analyses when observations become available in the region of uncertainty (Figure S3).Similarly, the spread of the AMOC at 26 • N (Figure 8a) increases when more observations become available in regions where covariances are uncertain, but the observations have a strong influence on the solution (Figure S4).
Overall, our results are encouraging for reconstructing historical ocean temperature, salinity, and AMOC for at least the last 60 years, even with very sparse observational coverage from the 1960s.Key to this is the generation of accurate covariances through an iterative procedure that updates model-based relationships with observations.Sparse ocean observations likely contain more information than is currently utilized by many reanalysis techniques that do not exploit global covariance, and further studies are warranted to assess the potential to develop ocean reanalyses further back in time.Nevertheless, substantial uncertainties remain, especially for the AMOC, and these can grow despite improved observational coverage from the Argo array in regions where covariances are particularly uncertain.Reducing AMOC uncertainties will likely require improved observational coverage along the western boundary of the North Atlantic at the depth of the AMOC maximum, and improved understanding of the physical processes leading to covariance uncertainties in this region.Although not available for the whole period analysed here, if included, sea-surface height information may help to constrain the analyses.Finally, we note that our ensemble does not sample the full range of uncertainty and recommend future studies that include covariances from other models and also sample additional uncertainties related to XBT corrections (Cheng et al., 2014(Cheng et al., , 2016)).

F
I G U R E 2 Evolution of salinity anomalies at 995 m with iteration for all available observations (left) and for observations typical of 1967 (right).All anomalies are relative to the climatology of the final iteration (iteration 2) for 1951-2016.(a) Locations and grid-box mean amplitude of gridded observed salinity anomalies in May 2013.(b) May 2013 observations subsampled to resemble the observational density of May 1967.(c,d) Analysis of salinity anomalies with first-guess covariances using the observations above.(e,f) Analysis of salinity anomalies for iteration 2. In brackets in the titles of (d) and (f) are the spatial correlations of these fields with (c) and (e), respectively.

F I G U R E 3
Frequency polygons of spatial correlations between the analysis with all observations and the analysis with withheld observations for potential temperature (left) and salinity (right).All months between 2010 and 2013 and four levels (447, 666, 995, 1500 m) are used to create the histograms.Each iteration is shown with a different colour.F I G U R E 4Ratio of actual to expected error variance for salinity at 995 m for (a) first-guess covariances and (b) iteration 2. Note that the colour scale is nonlinear.

F
The effect of iteration on uncertainties at 995 m.(a) Ensemble standard deviation for iteration 2 potential temperature (unit K) over annual means of the reanalysis (1960-2016) and (b) the difference from the first-guess analysis.(c,d) The same as (a,b) but for salinity (practical salinity units, PSU).The crosses in (d) show points used in Figures 6 and 7.
Figure 7 shows the analysed salinity in iteration 2 at 995 m for one point in the Gulf of Guinea.The point is marked as a yellow cross in Figure 5d.The ensemble spread arises from a divergence in the trend since about 2000.It is perhaps surprising that the ensemble uncertainty increases after Argo buoys are introduced into the region (around 2006) and the number of observations increases compared with earlier years (lower panel).This F I G U R E 6 Annual mean salinity at a point (53 • N, 36 • W) at 995 m for (a) initial model covariances, (b) first-guess analysis, and (c) iteration 2. Ensemble members are in colour and observations in black.Observations are not continuous at this location and some values may be biased due to missing months in the annual mean.In (a), the salinity values are relative to the value in 1950 to emphasize the trends.F I G U R E 7 Salinity at a point at 6 • S, 5 • W in the Gulf of Guinea at 995 m from iteration 2. The lower panel shows the annual count of temperature and salinity observations in the Atlantic 30 • S-10 • N at 995 m on the same time axis as the upper panel.

F
I G U R E 8 Atlantic Meridional Overturning Circulation (AMOC) with Ekman transport removed at (a) 26 • N and (b) 45 • N. The red line shows observations from the RAPID array reduced by 4 Sv.Grey lines show transport from MOSORA nudged runs.Time series of annual mean transport have been smoothed with a five-year running mean.The blue and green lines pick out two members with differing decadal variability.The black line shows one standard deviation of the smoothed ensemble (right-hand axis).

F
Error covariances related to AMOC spread.(a) Ensemble mean of member correlations between annual mean temperature at a point 26 • N at the western boundary (cyan cross) and temperature at 1500 m in the North Atlantic.(b) The ensemble mean correlation multiplied by the ensemble standard deviation.(c) The correlation between the annual mean number of observations (temperature and salinity) and AMOC ensemble standard deviation.Both the AMOC and observation time series have been smoothed with a 5-year running mean.Points with no observations in a 5-year period have been masked out.All fields where the ensemble mean correlation magnitude is less than 0.3 have been masked.(d-f) Same as (a-c), except for 45 • N and salinity at 995 m.