Uncertainties in critical slowing down indicators of observation-based fingerprints of the Atlantic Overturning Circulation

Observations are increasingly used to detect critical slowing down (CSD) to measure stability changes in key Earth system components. However, most datasets have non-stationary missing-data distributions, biases and uncertainties. Here we show that, together with the pre-processing steps used to deal with them, these can bias the CSD analysis. We present an uncertainty quantification method to address such issues. We show how to propagate uncertainties provided with the datasets to the CSD analysis and develop conservative, surrogate-based significance tests on the CSD indicators. We apply our method to three observational sea-surface temperature and salinity datasets and to fingerprints of the Atlantic Meridional Overturning Circulation derived from them. We find that the properties of these datasets and especially the specific gap filling procedures can in some cases indeed cause false indication of CSD. However, CSD indicators in the North Atlantic are still present and significant when accounting for dataset uncertainties and non-stationary observational coverage.


INTRODUCTION
In recent years there has been increasing focus on non-linearities and the potential of abrupt transitions in the Earth system, especially in response to anthropogenic greenhouse gas emissions.Of particular interest are systems that have multiple stable equilibrium states, and so could rapidly transition in a self-perpetuating way to a different state once a critical forcing threshold is reached (Mckay et al., ).When such systems approach a transition to a different state in response to gradual changes in forcing, they may exhibit so-called critical slowing down (CSD), in which their response to perturbations changes in a characteristic manner (Dakos et al., ).CSD can be a sign of a forthcoming transition and may in certain situations be used to anticipate it; statistical signs of CSD, such as increasing variance or autocorrelation, have hence also been termed early-warning signals (Scheffer et al., ).CSD has been identified in observations of numerous Earth system components that have been identified as tipping elements (McKay et al., ).These include the the Greenland Ice Sheet (Boers and Rypdal, ), the Atlantic Meridional Overturning Circulation (AMOC) (Boers, ; Michel et al., ), the Amazon rainforest, (Boulton et al., ) as well as other parts of global vegetation (Smith et al., b,a).
However, CSD indicators such as the variance and lag-one autocorrelation (Scheffer et al., ) or the restoring rate (Boers, ; Held and Kleinen, ) are calculated from observational datasets that are not optimised to capture higher-order statistics.Observational datasets employ a variety of methods to combine data from different instruments, adjust observational biases and fill in missing grid cells.These methods are tuned to best capture mean global trends in the data, sometimes at the cost of underlying statistical properties.For example, variance may decrease just as a result of increasing data coverage.This can be caused by an increasing numbers of observations that can be taken into account for each reported value, e.g. by taking the mean over samples of increasing size and correspondingly reduced standard error.However, this is just a simplified example, and as each observational dataset has its own specific methods of data assimilation and infilling, it is not possible to generalize the effect that observational dataset uncertainties would have on higher-order statistics.For an in-depth investigation of data aggregation effects using remote sensing data we refer to Smith et al. ( b).In this work we focus on sea-surface temperature (SST) and salinity based CSD indicators for the AMOC.We show how dataset uncertainties can be incorporated into the CSD analysis, and how the standard significance testing methods can be modified to account for the influence of different infilling methods.The most commonly used CSD indicators are an increase in the variance and autocorrelation of a timeseries.However, these indicators can result in false positives, as an increase in the variance or autocorrelation can also be caused by a corresponding statisical change in the external conditions, often modeled as noise.To avoid such false positives, B introduced the restoring rate λ, estimated under the assumption of non-stationary correlated noise driving the system.
For a system in state x close to equilibrium, we can linearize about the equilibrium state and thus the dynamics can be approximated as dx dt ≈ λx + η, where η stands for random external perturbations.λ can thus be estimated by regressing and estimating of the derivative dx/dt against x.One can then avoid false CSD indicators caused by the properties of η by performing this regression with a generalized least square algorithm that assumes noise with varying autocorrelation (for more details see B ). λ is negative for systems close to a stable state, and when a multistable system approaches a critical transition, λ increases to from below.We focus on λ in the main text of the paper; corresponding analyses and figures for the variance and autocorrelation can be found in the supplementary materials.
A prerequisite for a statistically significant increase in CSD indicators is a sufficiently long time series .The second approach is to calculate the CSD indicators for each grid cell in the SST or salinity dataset, and look at the regions that are thought to be related to the AMOC strength.For example, if the AMOC weakens, salinity is accumulated along its main transport path, and thus the changes in near-surface salinity along the Gulf Stream and North Atlantic Current are thought to reflect changes in the strength of the AMOC (Rahmstorf, ).B found significant increases in λ both in the SST and salinity fingerprints and on spatially explicit maps, and both of these approaches will be used in this work.Although this study will focus on SST and salinity datasets and on CSD indicators, the work presented here can be generalised to many other datasets and higher-order statistics (for example, Shi et al. ( )).

UNCERTAINTY ENSEMBLES
In this section, we asses how uncertainties provided with the observational datasets propagate to uncertainties in the CSD indicators (or other higher-order statistics).The only uncertainty provided for EN . . is the uncertainty associated with the analysis method, and that estimate has issues that limit its usefulness for our analysis; for example, in some areas more observations actually increase the analysis uncertainty (see Good et al. ( )).Thus, in this section, we make use only of the uncertainties provided with the HadCRUT and HadSST datasets.See Methods for a detailed discussion of the analysis and processing procedures of these datasets, as well as of the provided uncertainties.
We  HadSST and HadCRUT, respectively, as those are the time series a user would obtain when downloading the data from the Met Office website (www.metoffice.gov.uk/hadobs/hadcrut/, www.metoffice.gov.uk/hadobs/hadsst/).In general, the median is a better choice, as taking the mean changes the statistical properties.
However, the magnitude of the trend is of lesser interest to us than whether or not it is statistically significant.Figures S , S , S and S show the same calculations for the variance and autocorrelation.The autocorrelation behaves similarly to λ, although with a less significant increase.The variance has no overall trend, due to a marked decrease from to the s, which is likely caused by the increasing number of observations for the reasons explained above.

GLOBAL SIGNIFICANCE ESTIMATION WITH SURROGATES
In addition to an uncertainty estimation, it is also important to calculate the statistical significance of our higher-order statistic of interest.When we want to test a statistic of a time series xt, for example the linear trend of its lambda time series, denoted by s λ (xt), this is generally done by generating surrogate time series of xt, xi t (Theiler et al., ).The values of the statistic for many surrogate time series s λ (x i t ) can then be used as a distribution to estimate the significance of the actual s λ (xt).Our null model determines the properties that the surrogate time series need to have.When calculating CSD indicators, our null model should be a time series that has the same autocorrelation structure and variance as xt, but is otherwise random.Such a time series can be produced either by measuring a finite number of autocorrelation properties of xt and generating an according time series, or by the method of Fourier surrogates, where the Fourier phases are randomly shuffled (see Methods).These surrogates can be modified to include the effects that interpolation methods or lack of data have on the time series (e.g.removing data points that are missing in the original time series).
It is important to note that in the case of calculating CSD indicators, our conservative null model is an SST or salinity time series with given properties, not a λ time series.In this regard, the surrogate analysis of this work is more conservative than that in B , who considered surrogates of the λ time series.This is because the autocorrelation structure of λt has a non-trivial dependence on the autocorrelation of xt.By generating surrogates of λt as in B one ignores the wider range of autocorrelations that λt can have given the autocorrelation of xt.This can result in a narrower distribution of surrogate λ trends, and the significance of s λ (xt) is thus overestimated.It is therefore important to generate time series from xt and not λt, even if the latter is computationally more costly.Note that this also implies that Fourier surrogates are not generally suited to test trend significance in arbitrary time-correlated time series.They should only be used in situations where the trend of a sliding-window higher-order statistic is estimated from a given time series, and one has access to that time series to compute surrogates from.For both datasets, the region of increasing λ extends over the whole North Atlantic, but only a smaller region shows statistically significant change at the .confidence level.For HadISST, significant increase only occurs along the North Atlantic Current, the density-driven part of the AMOC, whilst for HadCRUT the significant increase occurs at the northern edge of the sub-tropical gyre, along the North Atlantic Current and in the eastern sub-polar North Atlantic, including the Greenland, Iceland and Norwegian seas.
As opposed to the SST datasets, the infilling method in EN uses information from previous times, as well as a clima- tology.This has the potential to cause false indications of CSD (see Methods for a full discussion).Thus when generating surrogates for a time series from the EN salinity dataset, we must consider the effect the lack of data and analysis method has on the earlier years.The full analysis process is too complex to reproduce when generating the surrogates.However, we can reproduce the specific effect that the analysis procedure has on the calculated CSD indicators.To do this, we use the observational weights provided with the dataset (see e.g.Fig S h-m).These observational weights were produced by setting all observational values to one and the climatology to and rerunning the analysis that produced the infilled dataset (Good et al., ).The resulting weight w represents the amount of information in the given analysis value that comes from observations.It should be noted that this observational information comes from the whole globe, and so w can be high even if there is no observation at the specific gridpoint.However, a low w value is still a good indicator of a datapoint where the persistence based forecast dominates.
For each time series, we use the autocorrelation properties of its last years to generate AR( ) surrogates.Then, for each month that has an observational weight below some limit w 0 , we replace the surrogate value with the persistencebased forecast (Eq ).This creates the same spikes in the surrogates that are present in the analysis data, and thus modifies the autocorrelation structure in a similar way (Fig. S ).
Using the unmodified surrogates, there are regions of significantly positive linear trends of λ for the EN .out modification the whole of the Atlantic seems to have a significant increase in variance.But once the effect of the analysis method is incorporated, no significant regions remain, and we recognise the increase in variance as spurious, caused by the analysis procedure of EN .

DISCUSSION AND CONCLUSIONS
We have addressed a common problem that arises when CSD indicators are computed from pre-processed observational data, namely, that the observational datasets have inherent and potentially non-stationary uncertainties and bi- However, the fact that the areas of significant CSD indicators on the map are much smaller in our analysis than in B , does strengthen the case for an AMOC destabilization.In B , the whole North Atlantic was marked as significant, as well as large regions in the South Atlantic.This is not a result one would expect if the CSD was caused by a weakening AMOC.In this work, the significance is reduced to smaller regions in the North Atlantic that are more typically associated with the path of the warm branch of the AMOC: the North Atlantic Current, the Irminger Sea and the Greenland, Iceland and Norwegian Seas.These CSD indicators could be a sign of AMOC destabilization, as the SST and salinity in these regions would be sensitive to the strength of the AMOC (Rahmstorf, ).The Irminger and Iceland basins in particular have been recently identified as the centres for subpolar AMOC variability, as opposed to the subpolar gyre (Lozier et al., ; Chafik et al., ).Finally, note that the definition of significance in this work is conservative.Significance is calculated on a point-by-point basis and not globally, and we do not take the spatial coherence of positive trends in the North Atlantic into account in our significance testing.
Together with the computational expense, the uncertainty estimates provided with the observational datasets are the most important factors for a reliable uncertainty estimation of CSD.Although running the full analysis algorithm that is used to create the observational datasets is beyond the scope of this work, we are able to make modifications to the surrogates that influence the statistical properties of the data in a similar way to the full analysis.However, we cannot estimate the complete uncertainty on the CSD indicators of the salinity dataset EN . .because the analysis but not the observational uncertainties are provided.Even so, our results show that the influence of observational analysis methods on higher orders statistics should be taken into account alongside the effect on the long-term mean properties.
In summary, for CSD indicators computed from observation-based SST and salinity data we have presented a comprehensive uncertainty estimation and propagation together with significance testing.Such an analysis is a prerequisite to robustly assessing the destabilization of a system from observational data.We find that data processing methods can lead to false detection of CSD (see also Smith et al. ( b)).However, we demonstrate that such obstacles can be overcome by incorporating the data processing effects into uncertainty estimates and significance testing.

HadISST
The HadISST dataset is based on the Met Office Marine Data Bank as well as the Comprehensive Ocean-Atmosphere Data Set (COADS) (Woodruff et al., ), and has been temporally and spatially homogenized using an empirical orthogonal function method called reduced-space optimal interpolation.The data is bias adjusted before applying the interpolation, and the final product is a blend of the interpolated data with the original data, once that has been quality-controlled to homogenize its grid-scale variance.This infilling makes the dataset ideal for forcing atmospheric models and for comparison with coupled climate model outputs.However, the interpolation has limitations in regions with scarce observations, and so is not optimal for statistical analyses of climate variability.In addition, HadISST does not have an uncertainty estimation of either the bias adjustement, analysis method, measurement or sampling uncertainty.This makes it difficult to estimate possible effects on the CSD analysis (Rayner et al., ).

HadSST and HadCRUT
Given the lacking uncertainty information for HadISST we focus primarily on two similar SST datasets The HadSST dataset has a member ensemble which explores variations of the bias scheme parameters, and in addition comes with measurement and sampling uncertainties.This uncertainty analysis is ideal for estimating uncertainties of CSD indicators.However, as the HadSST dataset is non-infilled, it has large gaps where there is no data.
This makes a grid cell by grid cell CSD analysis impossible, and even causes difficulties when averaging data over larger regions such as the subpolar gyre.In this work, we therefore complement our analysis with SSTs from the HadCRUT This increase in observations affects the statistical properties of the data.Primarily, an increase in observations causes a decrease in the variance, as the data in later times is an average of more values and the variance of the mean scales as , where σ 2 is the variance of the individual observations and n is the number of observations.The effect this would have on the autocorrelation is more difficult to determine.The more accurate values in later times could cause a higher autocorrelation due to the improved signal-to-noise ratio of the data (see Smith et al. ( b)), but the larger range of measurement instruments used in later times could also reduce any false contributions to the autocorrelation that are related to the instruments.In all these cases, a large part of the effect would be included in the uncertainties provided with the datasets, as they account for both sampling and measurement uncertainties.).They are used to obtain the globally complete analysis by calculating an optimal fit to the good profiles and profile levels in each month, given a background (prior constraint).Good profiles and levels are those which do not fail any quality-control check.The resulting optimal interpolation equations are solved using a nu-merical scheme.The background used for this calculation is a damped persistence-based forecast: where x f i denotes the damped persistence forecast for month i, x c i is the climatological mean for that month, and α = 0.9.As we are concerned here with the statistical properties of the data, the influence of using such a persistencebased forecast as the background needs to be addressed.If there are no observations for long periods of time, the analysis will relax to the climatology for that given location.If we then have a single observation, this causes a spike in the data, which relaxes back to the climatology with a monthly lag-autocorrelation of . .In most of the Atlantic Ocean there are very few observations before the s (see Fig S ), and so the monthly autocorrelation is artificially forced to about .at the start of the time series, which also affects the yearly autocorrelation.Depending on the true autocorrelation function of the underlying time series and depending on how much of the forecast is used, this analysis method causes a biased estimate.In particular, it could cause a false indication of CSD if the true monthly autocorrelation for the more recent decades is systematically above . .

AMOC indices
The SST-based AMOC index is calculated as the mean SST in the subpolar gyre minus the mean global SST, following

Caesar et al. (
).The subpolar gyre in this work is taken as the area between °and °N and °and °E (following Menary et al. ( ) for ease of calculation).This is a slightly different area than that used by B , but makes little difference at the low °resolution of HadSST and HadCRUT.We also take the full year instead of the winter months, as the latter method causes no substantial difference for the change in statistical properties.
All salinity time series and global plots in this work are for the thickness-weighted mean of the upper ocean layers, corresponding to the average salinity in the top m of the salinity profiles.The observational weights are similarly averaged in the top layers.The uncertainty is calculated by simple uncertainty propagation: ∆s = ∆s 2 i , where {∆s i } are the uncertainties of each individual level.This incorrectly assumes the layer uncertainties are independent, but is acceptable here as the uncertainty is not used for any quantitative analysis, but only to display an uncertainty range in Figure S .

Gaps in the data
Both HadSST and HadCRUT have gaps in their time series.When averaging this data for different regions, we take a conservative approach: We first spatially average the monthly resolution data and then take the yearly average of the resulting time series.When taking the yearly average we only consider a year if there is data for all months, otherwise it is set to NaN.However because we make the monthly-resolution spatial average first, this approach does not ensure that each grid cell in the region has data for each month of the year.Thus the value for a given year might have more grid cells contributing to one month than another, which could affect variability.For HadCRUT we also calculate CSD indicators at each grid cell where there are less than missing years out of .

Critical slowing down indicators
The restoring rate λ, the variance and the autocorrelation are calculated in the same manner as in B .Each time series is first nonlinearly detrended using a running mean with a -year window.The edges are not removed, so the detrending method is less certain at the first and last years of the time series.The CSD indicators are then calculated in -year running windows.Note that as in B the λ plotted in this study is the numerical result of the regression of ∆x i against x i , and so is related to the analytical λ defined in the text by λ = e λ − 1 (when the timestep ∆t = 1).As the magnitude of λ is immaterial in this study and we are only concerned with its increase or decrease, both definitions behave similarly and are thus interchangeable for our purposes.

Surrogates
Surrogates are created from the detrended SST and salinity time series.Fourier surrogates are calculated by taking the discrete Fourier transform of the time series, multiplying by random phases and then taking the inverse Fourier transform.By the Wiener-Khinchin theorem, the variance and autocorrelation function of wide-sense-stationary random processes are specified by the squared amplitudes of the (discrete) Fourier transform.Thus the Fourier surrogates preserve the variance and autocorrelation function of the original time series.
However, in this work we know the timeseries we are dealing with have been modified by the analysis process and lack of observations.When the analysis method modifies the autocorrelation, as in the case of the salinity data, the Fourier surrogates of the full time series are not a correct null hypothesis for CSD analysis, because the autocorrelation function will include information from the earlier, modified times.In addition, Fourier surrogates can only be calculated for timeseries with no missing values, and thus cannot be used for the HadCRUT global analysis.Thus, in these cases, we do not use Fourier surrogates.

AR( ) surrogates
For the cases where Fourier surrogates cannot be computed, we choose to use AR( ) surrogates on a monthly resolution: where the timeseries value at time t, xt, is determined by the value at times t − 1, t − 2 with autocorrelation coefficients a 1 , a 2 , and t is white noise.Even though the monthly time series in the datasets are close to being AR( ) processes, the higher coefficient is needed to get the correct lag-one autocorrelation coefficient for the yearly averaged timeseries.Since the yearly average is taken before calculating CSD indicators, using an AR( ) process instead of AR( )-AR( ) does not make a big difference as long as the annual lag-one autocorrelation is correct.If the estimate of monthly lag-one autocorrelation is Am, the coefficients are related by: We get the values of a 1 , a 2 for each time series by calculating the true lag-one autocorrelation estimate for the last year of the annual and monthly time series, Ay, Am, and then calculating the a 2 value that minimizes the difference of the estimated A y , A m of the AR( ) time series from the true values.

Modification for salinity surrogates
It is only possible to produce perfect surrogates for the salinity analysis data by repeating the complete analysis used to create the dataset.This is not feasible for this study.We can, however, replicate the effect of the analysis method that would potentially cause spurious CSD detection, namely the relaxation back to the climatology that occurs when there is a lack of observational data.For this we utilize the observational weights provided as part of the EN . .dataset.We start with an AR( )-based surrogate dataset of the global monthly data averaged over the levels in the top m.For each grid cell, we take the months that have an observational weight that is below some limit w 0 and replace the surrogate value with the persistence-based forecast (see equation ).As we did not have access to the climatology used in the EN analysis, we took the first year of analysis data in each grid cell as the climatology.This will in most cases be the true climatology, as very few cells have observational influence in the first year.An example of this replacement process for w 0 = 0.5 can be seen in Figure S .The general location of positive and negative trends is similar to that for λ, but the regions of significance are different.Note that in this case the significance regions are slightly reduced when using the modified surrogates, as the modification causes a larger false increase in AR( ) than it does in λ.Nevertheless, the overall pattern of regions with significant increases remains qualitatively the same.but for the STD.Although almost the whole of the Atlantic shows an increasing variance, using modified surrogates removes all of the significant regions because the modification mimics the false variance increase that the EN analysis method causes.The areas of strong negative trend close to the North American and South African coasts are also due to dataset irregularities (a "spike" in the early , probably due to a faulty data point).Note that these results do not rule out the existence of increasing variance trends in the true SST values, as those would be masked by the analysis method that causes the false trends.

B
used three observational datasets: the HadISST (Rayner et al., ) and ERSST (Huan et al., ) datasets for SSTs and the EN . .dataset (Good et al., ) for salinity profiles.They provide smooth global fields from to present for the SST data and from to present for the salinity data.In this work we will use an updated version of the EN salinity data (EN ) and for further robustness testing additionally use the HadSST (Kennedy et al., ) and HadCRUT (Morice et al., ) SST datasets, which date back to .

Figure :
Figure : a. Mean number of SST observations per month in HadSST and HadCRUT for the time span in each grid cell.b.Number of observations per month in HadSST and HadCRUT in the subploar gyre (SPG, turqoise) and globally (black).Raw observations are the same for both HadSST and HadCRUT.c. Number of years with complete annual mean data (see Methods) for HadSST.d.Fraction of full grid cells after taking the annual mean in the SPG (orange) and globally (black) in HadSST.e. Same as (c) but for HadCRUT.f.Same as (d) but for HadCRUT.The SPG area is shown as a square on the maps.

Figure :
Figure : Significance of trend in λ of subpolar gyre AMOC index in HadSST (a,c,e,g) and HadCRUT (b,d,f,h).a,b.Median (mean) AMOC index (black) and min-max range (turqouise) of samples ( ensemble members) for HadSST (HadCRUT).We use the mean instead of the median for HadCRUT as that is the default product a user would download when not investigating uncertainties.c,d.Same as (a,b) but for the λ of the AMOC indices, computed using a window size of years.e. Distribution (orange) of linear trends of λ computed from Fourier surrogates for each AMOC index sample from HadSST; distribution (turquoise) of linear trends of λ of samples from HadSST with a fitted gausian distribution (solid black).The linear trend of the median HadSST index is shown in dashed black.f.Same as e but for HadCRUT, using fourier surrogates of the ensemble member AMOC indices, and the linear trend of the mean.g. p-value of linear trend of λ of each sample of the HadSST AMOC index with respect to its own Fourier surrogates.The p-values of the median with respect to Fourier surrogates is shown as a dashed black line, and the .significancevalue is shown as a solid grey line.f. same as g but for the AMOC index of the HadCRUT ensemble members, with the p-value of the mean AMOC index as a dashed black line.

Figure
Figure : a. Linear trends of λ time series for the HadISST data.b.Same as (a) but for the HadCRUT mean dataset.Light blue stippling shows the regions where the trends are significant at th percentile, calculated from Fourier surrogates for each cell for HadISST (a) and AR( ) surrogates for each cell for HadCRUT (b).See Methods for more details.
Figure : a. Linear trend of λ time series for the average top m salinity in the EN dataset.Black stippling shows the regions of th percentile significance calculated using AR( ) surrogates modified following the EN analysis method with an observational weight bound of . .These significance regions are indistinguishable from those calculated from the unmodified surrogates (see Figure S ).Only regions between • W and • E with + years of observational weight above .are shown.See Figure S and Methods for details.

Figure :
Figure :Regions in the Northern Atlantic with a statistically significant increase in λ for the EN salinity (purple), HadISST SSTs (turquoise) or HadCRUT SSTs (orange).Significance is calculated from surrogates for each grid cell (see text and Methods).
dataset.The SST data in HadCRUT is based on the HadSST dataset, but provides an additional, more globally complete analysis dataset.The gaps in the data are filled using a Gaussian-process-based statistical method.In regions where the local observations offer an insufficient constraint this infilled data is removed, so the final dataset still has some gaps (see Fig ).The dataset is comprised of ensemble members, which sample the reconstruction error for the Gaussian process in addition to the bias and observational uncertainty in the data.The number of individual SST observations has increased approximately exponentially over the last years (Fig ).
The statistical properties of the EN dataset are affected by the data analysis method in a much clearer way than the other datasets.EN includes both global quality-controlled ocean temperature and salinity profiles and monthly objective analyses (Good et al.,).The profiles are direct observations from various sources, such as the World Ocean Database (Boyer et al., Figure S shows how this modification shifts the global distribution of autocorrelation to higher values.

Figure S :
Figure S : Same as Fig. but for the standard deviation (STD).The STD for the AMOC index in HadSST and HadCRUT shows a marked difference from that of HadISST (see Fig S and B , Fig. e).The overall negative trends of variance are because of a decrease in variance from to , present in both HadSST and HadCRUT, but not in HadISST.This is likely due to the changing number of observations in the subpolar gyre (see Fig ): as the number of observations increase the index is a mean of more values and the variance decreases.

Figure S :
Figure S : Same as Fig. but for the AR trends.

Figure S :
Figure S : Same as Fig. but for the standard deviation (STD) of linear trends.The STD of the ensemble members is different from the STD of the ensemble mean because taking the mean removes the high variability at early times that causes the negative trend in the STDs seen in plots c-e (see also black line vs green range in Fig. S d,f and h).

Figure S :
Figure S : a. Number of good salinity profiles per month in the Atlantic Ocean for the EN . .dataset (note the logarithmic scale).b-g.Mean annual salinity in the upper m (black) with its corresponding analysis uncertainty range (grey shading) for six locations in the North Atlantic.h-m.Mean annual observational uncertainty (black) and the density of good salinity profiles in the surrounding °region (pink) for the same six locations.

Figure S :
Figure S : Number of years in the EN . .data with top m mean observational weight above w = .(a), .(b) and .(c).As in other figures only the region between • W and • E is shown.

Figure S :
Figure S : Modification of AR surrogates of salinity time series (see Methods).First column: mean of distribution of modified surrogate λ trends.Second column: original AR surrogate (black) and modified AR surrogates where w ≤ 0.5 (turquoise).Third column: λ time series of the time series shown in the second column.Fourth column: distribution of λ trends at the location, with a dashed line of the corresponding colour showing the mean.Note that the modification does not systematically bias the trends.

Figure
Figure S : a. Mean of distribution of surrogate λ trends for raw AR surrogates of salinity time series, and modified surrogates with observational weight limit b. w = ., c. .and d. . .The areas where the analysis trends are significant given each of the surrogate distributions is shown in the second row on top of the λ trend of the analysis data (e-h).Note the differences in scale between the induced negative and positive biases (upper row) and the actually inferred trends (lower row).As in other figures only the region between • W and • E is shown.

Figure S :
Figure S : Same as Fig. but for the AR trends.

Figure S :
Figure S : Same as Fig. but for the STD trends.The significant difference between the trends for HadISST and HadCRUT is likely due to dataset properties, which have a large influence on the variance.

Figure S :
Figure S : Similar to Fig. for the AR , but showing the significance regions both for the raw AR( ) surrogates (a) and the modified surrogates with w = 0.5 (b).The general location of positive and negative trends is similar to that for λ, but the regions of significance are different.Note that in this case the significance regions are slightly reduced when using the modified surrogates, as the modification causes a larger false increase in AR( ) than it does in λ.Nevertheless, the overall pattern of regions with significant increases remains qualitatively the same.

Figure S :
Figure S : Same as Fig. Sbut for the STD.Although almost the whole of the Atlantic shows an increasing variance, using modified surrogates removes all of the significant regions because the modification mimics the false variance increase that the EN analysis method causes.The areas of strong negative trend close to the North American and South African coasts are also due to dataset irregularities (a "spike" in the early , probably due to a faulty data point).Note that these results do not rule out the existence of increasing variance trends in the true SST values, as those would be masked by the analysis method that causes the false trends.

Figure S :
Figure S : Same as Fig. but for the AR .Similarly to λ, the significance regions for the three datasets are roughly in the same area.The regions in the Irminger, Greenland and Iceland seas are similar to those found for λ.However the significant regions in lower latitudes are the northern Gulf Stream and its extension into the Atlantic Ocean.

Figure S :
Figure S : Same as Fig. but for both the AR and λ trends.All the significant AR (λ) regions are shown in pink (turquoise).The three datasets are distinguished by pattern: dots, vertical lines and diagonal lines for HadCRUT, HadISST and EN , respectively.The difference between the two indicators is due to the calculation method of λ, which accounts for autocorrelated residual noise.Together the regions for both indicators trace the currents that are part of the AMOC in the North Atlantic: from the Gulf Stream along the North Atlantic Current and into the Greenland, Iceland, Irminger and Labrador seas.

Figure S :
Figure S : To show the difference between the first and last years of salinity analysis data and the effect that modification has on the surrogates, we plot the distribution of the lag-one autocorrelation values of all Atlantic grid cells in the EN . .dataset in the first (a) and last (b) years of data.The analysis autocorrelation (solid black) changes from a sharply peaked distribution in the first years to a wider one in the last years.The raw AR( ) surrogates (dashed black) match the last years, but not the first .The modified surrogates with weight limits of .(turqouise), .(orange) and .(purple) have a more similar distribution to the analysis in the first years, and are unchanged from the raw surrogates in the last years, as required.
. Direct observations of the AMOC strength in the Northern Atlantic only go back to (Frajka-Williams et al., ).Consequently, numerous AMOC fingerprints based on observations spanning longer time periods have been suggested, which are thought to reflect variations in the strength of the AMOC.As the AMOC transports heat and salt northward, SSTs and