1 Introduction

In the last decades, a strong increase of the meltwater fluxes from the Greenland ice sheet (GrIS) and surrounding glaciers was observed (Bamber et al. 2018; Mouginot et al. 2019; Otosaka et al. 2023). Since 1995, runoff fluxes have been larger than in any other period during the last 115 years, both in absolute magnitude and amplitude of the change (Fettweis et al. 2017). Mass losses from the largest ice caps in the Canadian Arctic Archipelago (CAA) have also been increasing since the mid-2000 s (Gardner et al. 2011) and ice loss has reached a record in 2019 in Greenland (Sasgen et al. 2020). The total amount of runoff and solid ice discharge from Greenland and surrounding regions exceeds 1300 km\(^3\)/yr (40 mSv) after 2010 (Bamber et al. 2018). This is small compared to the total freshwater (FW) input to the Arctic Mediterranean of 200 mSv (Østerhus et al. 2019) and FW export through Davis and Fram Strait (Haine et al. 2015; de Steur et al. 2018). Yet, the GrIS and surrounding regions contributions have become significant and further increase of the meltwater fluxes is expected in the future (Golledge et al. 2019; Haine et al. 2015).

The impact of the recent increase of FW from GrIS over the Subpolar North Atlantic (SPNA) region and the Arctic remains unclear (Rignot and Kanagaratnam 2006; Böning et al. 2016). Enhanced GrIS melting may have contributed to the 2015 cold anomaly or “cold blob” (Schmittner et al. 2016). The modification of density variations in the deep Labrador Sea have been linked to the GrIS melting (Yang et al. 2016; Robson et al. 2016; Rühs et al. 2021) and may have impacted deep water formation, though with no certainty for now (Böning et al. 2016; Rhein et al. 2018; Yashayaev and Loder 2016). This question is crucial because SPNA watermass transformation is an important component of the Atlantic Meridional Overturning Circulation (AMOC) which plays a key role in our global climate system (Buckley and Marshall 2016). The upper limb of the AMOC transports warm water northward in the Atlantic, compensated by a deep southward return flow of cold, dense waters. Modification of that circulation would have a significant impact on the meridional ocean heat transport in the Atlantic, ocean heat storage and the ocean uptake of anthropogenic carbon cycle (Kostov et al. 2014; Romanou et al. 2017). Rahmstorf et al. (2015) hypothesised that the possible 20th century AMOC weakening may be a consequence of the freshening of North Atlantic (NA) waters resulting from GrIS melt. Indeed, there is some evidence that past increase of FW fluxes may have slowed down the AMOC (Rahmstorf 2002; Hawkins et al. 2011; Hu et al. 2010; Sánchez Goñi et al. 2012), even if recent studies suggest an overestimation of the sensitivity of the AMOC to FW fluxes and Arctic freshening (He and Clark 2022). This possibility is discussed in the 6th chapter of the IPCC Special Report on the Ocean and Cryosphere in a Changing Climate (“SROCC” report, Pörtner et al. 2019) concluding that we were not able to find yet enough evidence to properly attribute the possible AMOC weakening. Moreover, as the AMOC response has a large spread in the last CMIP6 (Bellomo et al. 2021; Gong et al. 2022), there is an emerging need to constrain the AMOC response in future climate predictions (Bonnet et al. 2021).

A variety of freshwater forcing experiments have been conducted to improve knowledge and understanding of these possible links. Some idealised studies (“hosing experiments”) with coupled models showed that unrealistically large flux anomalies distributed around Greenland over different periods can lead to an expected weakening of the AMOC (Gerdes et al. 2006; Jungclaus et al. 2006; Stouffer et al. 2006; Hu et al. 2008; Swingedouw et al. 2013, 2015; Jackson et al. 2023). The studies from North Atlantic Hosing MIP (NAHosMIP) aim at finding the trigger point for an AMOC collapse and hysteresis, applying 0.3 Sv of hosing over the North Atlantic for 20 and 50 years. Some models experience an increase in Arctic salinity in response to highly enhanced FW fluxes. This signal also appeared in a recent realistic freshwater forcing experiment (Devilliers et al. 2021), where around 4.2 mSv were released using an observation-based estimate. A simple conceptual model (Wei and Zhang 2022) linked the Arctic salinity anomalies to the multidecadal AMOC variability. Moreover, the study of Meccia et al. (2023) links the centennial variability of the AMOC to the freshwater propagation and the release of salinity anomalies southwards through the East Greenland Current. Deep-water formation areas are indeed impacted by the resulting changes in North Atlantic surface density. Interestingly, as sea ice plays an important role in this scheme (see also: Schmith and Hansen 2003; Drews and Greatbatch 2017), they found that this variability disappears under a much warmer (\(+\,4.5\,^{\circ }\)C) climate when Arctic sea ice has disappeared. In fact, (Meccia et al. 2023) shows that this version of EC-Earth3 exhibits excessive multi-centennial variability, which might make it difficult to detect smaller signals on other time scales with similar processes involved.

In more recent studies, Ocean General Circulation Models (OGCMs) have been forced with observational and high resolution model-based estimates of FW fluxes from the GrIS melting in order to investigate its potential role on the variability of the AMOC (Böning et al. 2016; Dukhovskoy et al. 2016; Gillard et al. 2016; Marsh et al. 2010; Yang et al. 2016). These ocean-only model studies considered input from the GrIS alone and not the surrounding glaciers and ice caps, and suggested a possible future change in the AMOC which is not yet detectable. However, a very recent study (Martin and Biastoch 2023) using coupled and ocean GCMs highlights the importance of the atmospheric feedback. It is shown in this study and others (Swingedouw et al. 2013) that AMOC’s response to an external forcing, such as enhanced FW input, is much more important when the atmosphere is not able to adjust to surface temperature changes, like it is the case in ocean-only models. In Martin and Biastoch (2023) the importance of resolving the ocean mesoscale dynamics is also discussed: in addition to their well-known role in deep convection processes (Böning et al. 2016; Pennelly and Myers 2022), eddies and realistic boundary currents are needed to properly transport the FW to the central Labrador Sea. This was also highlighted in a FW forcing experiment at 1/24 degree resolution (Swingedouw et al. 2022).

To account for some of the deficiencies in the previous work described above, some experiments were performed including a realistic meltwater runoff flux varying in time and space in the forcing of global coupled models. With the coupled climate model CESM (Community Earth System Model, version 1.1.2), two simulations were performed from 1850 to 2200, one with emission scenarios RCP8.5, the other one with RCP2.6. Over the historical period, no changes in the AMOC strength was found but enhanced GrIS freshwater forcing in RCP8.5 scenario led to a slight weakening of the AMOC by 1.2 Sv at the end of the 21th century (Lenaerts et al. 2015). However, the impact of the runoff forcing could not clearly be disentangled from internal variability, as a quantitative assessment would require several ensemble members. A multi model study of the impact of a modest (0.05 Sv), abrupt increase in the runoff using 200-year-long preindustrial climate model simulations (MPI-ESM, FOCI, and AWI-CM) showed an AMOC decline of 1.1–2.0 Sv emerging after three decades of enhanced runoff (Martin et al. 2022). A much smaller impact on the AMOC was found by imposing observed, varying FW fluxes from Greenland and surrounding regions in an ensemble of ten 95 years historical simulations for the period of 1920–2014 (Devilliers et al. 2021). In this latest study, which was done with the IPSL-CM6A-LR model (Boucher et al. 2020), the AMOC slowed by about 0.3 Sv for the simulated period. This result could be model-dependent and need to be explored in other model systems. Here we follow the protocol from Devilliers et al. (2021), and impose observed time varying FW forcing from Greenland and surrounding regions to the EC-Earth3 model (Döscher et al. 2022) for the historical period: 1920–2014. To include the most recent years in this experiment, we extended the dataset and continued the runs until 2019 with the scenario ssp2–4.5 (Shared Socioeconomic Pathways, Meinshausen et al. 2020).

In Sect. 2, we present the FW flux dataset, the forcing protocol, the climate model and reanalyses products. Technical details are provided regarding the ensemble of forced simulations that is afterwards compared to a control ensemble, and statistical tools that are used in this study. Results in terms of response to the freshwater forcing experiment are presented in Sect. 3, showing impact over NA temperature and salinity, convection activity and AMOC. Comparison of the model simulations to observations and reanalyses in the Arctic and SPNA regions are provided. An overall discussion and concluding remarks are provided in the last section.

2 Materials and methods

The EC-Earth3 climate model (described in Sect. 2.2) is externally forced with an observation-based dataset of freshwater fluxes from Greenland and surrounding regions. This section describes the dataset and the protocol followed in the freshwater forcing experiment.

2.1 The meltwater flux dataset

In this study, we use the [1920–2016] FW flux dataset derived from the work of Devilliers et al. (2021), which was constructed using the observation-based [1958–2016] dataset of Bamber et al. (2018). Here the FW flux estimate of Bamber et al. (2018) is based on a combination of satellite observations of glacier flow speed and regional climate modelling of the glaciers surface mass balance and runoff. It provides monthly values of both runoff and solid ice discharge components. Data are available from Greenland and surrounding glaciers and ice caps at high spatial resolution (5 km) for the period 1958–2016. In the study of Devilliers et al. (2021), the ice runoff estimate for the period 1840–2010 from Box and Colgan (2013) is used to extend these FW flux components back in time in order to obtain a dataset that includes the large runoff fluxes resulting from anomalously high melting rates in the 1920 s. The details method for this extension can be found in Devilliers et al. (2021). The time series of the extended dataset are presented in Fig. 1.

Fig. 1
figure 1

Reconstructed FW fluxes (in mSv) from Devilliers et al. (2021) (their Fig. 3), covering the period 1840–2016. Solid lines are 5-year running means, dashed lines are the mean annual values. The solid black line plotted against the right-hand Y axis is the total FW fluxes from all sources

The solid ice discharge component provided in Bamber et al. (2018) is concentrated along the coast of Greenland. This spatial distribution is modified using satellite-based concentrations of icebergs derived from the Altiberg project (Tournadre et al. 2015). We apply the Altiberg distribution of the icebergs to the solid ice discharge component of Bamber et al. (2018) in order to account for iceberg drift and effectively redistribute freshwater forcing off-shore. The time-averaged spatial distribution of the fluxes, for both runoff (near coastal) and solid ice discharge (redistributed offshore), is presented in Fig. 2.

Fig. 2
figure 2

Time averaged (1920–2014) spatial distribution of icebergs (left) and runoff (right) freshwater fluxes (in km\(^3\)/yr) from Devilliers et al. (2021) (their Fig. 1)

Complete description of the dataset reconstruction for the period 1920–2016 and discussion regarding the methods used can be found in the study of Devilliers et al. (2021). For this study, the fluxes were extended for the years 2017 to 2019 using a linear extrapolation, keeping the spatial redistribution of the year 2016. The extension shows a good agreement with recent observational studies (Mouginot et al. 2019; Box et al. 2022).

2.2 The EC-Earth3 model

EC-Earth is a modular Earth System Model developed by a collaborative European consortium. In this study, we use the Atmosphere-Ocean General Circulation Model configuration of the version that contributed to CMIP6: the standard EC-Earth3 documented in Döscher et al. (2022). In this configuration, the model system includes atmosphere, ocean and sea ice components. The atmospheric component is the Integrated Forecast System (IFS, https://www.ecmwf.int/en/publications/ifs-documentation) from the European Centre for Medium-Range Weather Forecasts (ECMWF), based on its cycle cy36r4, with a T255 horizontal resolution (about 80 km) and 91 vertical levels. The ocean component is version 3.6 of the Nucleus for European Modelling of the Ocean (NEMO; Gurvan et al. 2017) with 75 vertical levels, which is itself composed of the ocean model OPA (Ocean PArallelise) and the Louvain-La-Neuve sea ice model (LIM3; Rousset et al. 2015), both run with an ORCA1 horizontal resolution (about 1\(^{\circ }\) nominal resolution, 48–64 km around Greenland). The atmospheric and oceanic components interact through OASIS-MCT coupler (Craig et al. 2017). The vegetation fields are prescribed and have been derived from a historical simulation performed with EC-Earth3-Veg, a different model configuration that in addition includes interactive vegetation as represented by the LPJ-GUESS model (Smith et al. 2014). The runoff fluxes from land to ocean are derived from a runoff mapper that uses OASIS3-MCT to interpolate local runoff and ice-shelf calving (from Greenland and Antarctica) to the ocean. The runoff and calving received from the atmosphere and from the surface model HTESSEL are interpolated onto 66 hydrological drainage basins on a mapper grid by a nearest-neighbour distance-based Gauss-weighted interpolation method. The runoff is evenly and instantaneously distributed along the ocean coastal points connected to each hydrological basin (Döscher et al. 2022).

In this study we use as reference an ensemble of ten historical simulations performed with EC-Earth3, following the CMIP6 protocol (Eyring et al. 2016) and forcings of greenhouse gas concentrations, land use, volcanic and anthropogenic aerosol concentrations, and solar variability for the historical period 1850–2014. This ensemble is denoted as the “Historical” ensemble in this study, following the notation of Devilliers et al. (2021). The ten members (r1–2, r5, r7–8, r10, r13–15, and r21) start from different initial conditions, obtained from a multi centennial preindustrial simulation, in order to sample internal variability. The ensemble mean of the Historical ensemble represents the forced signal from external radiative forcing, while the difference between the Historical and Melting ensemble means is the forced signal from GrIS melting. The spread represents the amplitude of the internal variability, and can be compared to the forced signals to obtain a signal to noise ratio. The complete ensemble of 25 EC-Earth3 historical members available has also been used to strengthen our results (Milinski et al. 2020). For the last years of the experiment (2015–2019), the Historical ensembles is extended with CMIP6 protocol under the scenario ssp2–4.5. All CMIP6 historical and scenario simulations are available at the CMIP6-Earth System Grid Federation (ESGF) data nodes.

2.3 The forcing protocol

The Historical ensemble is compared in the next sections to a “Melting” ensemble of ten simulations that are branched from the ten simulations of the Historical ensemble at January 1, 1920, and added the freshwater forcing around Greenland as it is described in more detail below.

During the Melting ensemble simulations, the monthly runoff values around Greenland computed by the atmospheric component are replaced by the total value obtained by summing the observation-based solid ice discharge (calving), ice and tundra runoff values presented in Fig. 1, i.e. the total value (black line). The snow melt from GrIS which is calculated from the standard EC-Earth3 and considered as calving around Greenland are therefore set to zero as they are included as liquid flux in the runoff variable (black line). The runoff is added at two meter depth, assumed to be freshwater (0 psu), and at surface ocean temperature. The thermal impact of iceberg melting is neglected as it depends on the ocean upper temperature and is higly spatially variable.

Figure 3 compares the runoff fluxes that are added in the Melting simulations (blue) to the runoff output of a large historical (black) ensemble, both summed over the forcing area for runoff (denoted as the “runoff forcing zone”) displayed in the blue zone on the right map of Fig. 2. This area is well described in Bamber et al. (2018) and it includes Ellesmere, Devon, and Axel Heiberg islands as it is the case for the Greenland runoff-map of EC-Earth but also Svalbard and Iceland. The variable computed is named “friver” (CMIP6 output standard name) and is described as water flux going into sea water from rivers. The calving output was not available for many of Historical members on the ESGF node and we included only 17 members out of the 25 of the total EC-Earth3 historical ensemble as they were the only ones with runoff outputs that were made available. The Melting simulations and the historical simulations present runoff values of the same order of magnitude but we are introducing a multi-decadal variability, which is not present in the Historical members, leading to a maximum correction of 18 mSv in the 1920 s–1930 s (due to a warming period) and from the 1990 s onward. The fact that the model does not capture the increase in 1920 s and 1980 s onwards underline its limitations in the representation of the Greenland Ice Sheet. The finite depth of the snow layer results in a limited amount of possible runoff fluxes, which will clearly affect the model prediction skill in the future decades considering the acceleration of the melting rate. On-going developments regarding the coupling with ice sheet models (Madsen et al. 2022) should help with these limitations. On average, 3.9 mSv is added over the 100 years of the freshwater forcing experiment, while the standard deviation of the 17-members historical ensemble is 2.3 mSv. Monthly climatologies are compared in supplementary Figure A1 where we note that the maximum runoff fluxes occur a little later in the year (July instead of June) in the observation-based estimate than in the EC-Earth3 model.

Fig. 3
figure 3

Mean annual runoff values (in mSv) in the two ensembles summed over the runoff forcing zone: prescribed fluxes in the Melting ensemble (blue line) and computed fluxes in the Historical ensemble (grey lines). The thick black line represents the ensemble mean of the Historical ensemble

Figure 4 shows for comparison the total runoff from the same region as simulated by the first member (r1) of the available friver variable from the CMIP6 models with a nominal resolution of 100 km for the ocean component module. Among the seven models, IPSL-CM6-LR, GDFL-ESM4 and EC-Earth3-CC display the most realistic runoff values while MIROC6 and NorESM2-MM give much lower values.

Fig. 4
figure 4

Five year running mean of historical runoff values (in mSv) from seven CMIP6 models and the reconstructed dataset (thick red line, see Sect. 2.1), summed over the runoff forcing zone

Difference in spatial redistribution of the runoff between the six-member mean of the two ensembles is shown in Fig. 5. Only six members (r1, r2, r7, r10, r14 and r21) are used because those were the only ones with runoff outputs available through the ESGF node. Runoff is increased in almost the whole forcing region except for the part coming from the Newfoundland and Labrador region.

Fig. 5
figure 5

Difference (in mSv) in runoff output between the Melting and Historical ensemble means (6 members) averaged over the period of the experiment [1920–2019]

2.4 The reanalysis and data products

Data and reanalysis products for ocean salinity and temperature are compared to our experiment outputs in the result section of this study (Sect. 3.3). The products have been interpolated on the ORCA1 grid to facilitate comparison.

2.4.1 HadISST

Sea Surface Temperature (SST) data are extracted from the Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST) product, from Rayner et al. (2003). It is a combination of monthly global fields of SST and sea ice concentration for the period 1871-present. HadISST uses reduced space optimal interpolation applied to SSTs from the Marine Data Bank (mainly ship tracks) and ICOADS (International Comprehensive Ocean–Atmosphere Data Set) through 1981 and a blend of in-situ and adjusted satellite-derived SSTs for 1982-onwards.

2.4.2 ORAS5

The Ocean Reanalysis System 5 (ORAS5) reanalysis dataset provides global ocean and sea-ice reanalysis monthly mean data prepared by the European Centre for ECMWF OCEAN5 ocean analysis-reanalysis system (Uotila et al. 2019; Zuo et al. 2019). OCEAN5 uses the NEMO ocean model and assimilates sub-surface temperature, salinity, sea-ice concentration and sea-level anomalies. The consolidated product uses reanalysis atmospheric forcing (ERA-40 until 1978 and ERA-Interim from 1979 to 2014) and re-processed observations. The near real-time (referred to as “Operational” in the download form) ORAS5 product is available from 2015 onwards and is updated on a monthly basis 15 days behind real time. It uses ECMWF operational atmospheric forcing and near real-time observations. The ORAS5 dataset is produced by ECMWF and funded by the Copernicus Climate Change Service. The choice of this product is motivated by the main conclusion from the master thesis by Rautiainen (2020) stating that the ORAS5 compares well with observations in the Arctic ocean. Moreover, the study of Langehaug et al. (2023) compared ORAS5 with hydrographic data from cruises in the Eurasian part of specific section in the Arctic (CAATEX section) and concluded that ORAS5 represents well the observed temperature and salinity of the Atlantic Layer.

2.5 Statistical testing

In order to test the difference between the Melting and Historical ensemble means (averaged over time and members), we use the modified Student’s t-test proposed by Zwiers and von Storch (1995), which accounts for serial correlation. We can regard the variances as similar according to an F-test (details in Supplementary Section B).

3 Results

In this section, we investigate the response of the model to the freshwater forcing (FWF), by comparing the Melting ensemble to the Historical one (see Sect. 2.2), and both ensembles to observations and reanalysis products (Sect. 2.4).

3.1 Ensemble mean ocean response

3.1.1 Surface temperature and salinity and sea-ice cover

Comparing the Historical with the Melting ensemble mean SST (Fig. 6) reveals a slight cooling of the Gulf Stream region (up to \(-0.2\,^{\circ }\)C, not significant) and a partly (localized in Baffin Bay and east coast of Greenland) statistically significant increase of the North Atlantic SST with a maximum of \(0.9\,^{\circ }\)C along the coasts of Southeastern Greenland, in the Baffin Bay and the East Greenland Current. No difference is detected in the Arctic Ocean.

This broad heating pattern over the SPNA and Nordic Seas along the pathway of the North Atlantic Current could be explained by enhanced heat advection (AMOC), through changes in stratification (mixed layer depth), while the stronger warming observed along the Greenland coast might be a result of local air-sea heat flux changes due to the retreat of sea ice cover (SIC).

Fig. 6
figure 6

Colours show the difference in sea surface temperature (in celsius degree) between the Melting and Historical ensemble means averaged over the period of the experiment [1920–2019]. Black dots show grid points where differences are significant at the 90% level using a modified Student’s t-test taking serial correlation into account. Contour lines show the mean state of the Historical ensemble mean over 1920–2019

Fig. 7
figure 7

Difference in sea surface salinity (SSS, left, in practical salinity unit) and October-November (ON) sea ice cover (SIC, right, in %) between the Melting and Historical ensemble means averaged over the period of the experiment [1920–2019]. Black dots show grid points where differences are significant at the 90% level using a modified Student’s t-test. Contour lines show the mean state of the Historical ensemble mean over 1920–2019

Looking at the sea surface salinity (SSS) and October-November sea ice cover (ON SIC, key period for sea ice formation) differences between the two ensembles, we observe in Fig. 7 a significant freshening of the Baffin Bay by about 1.5 psu and a weak freshening up to 1 psu on the East Greenland shelf. The response in the Arctic is non-uniform, with an increase of \(+\,0.1\) psu in the central Arctic and a freshening of \(-0.1\) psu in the Eurasian basin (not significant).

SIC is slightly decreased in the Melting ensemble in the Baffin Bay and from the south east of Greenland to the Denmark Strait and up to the Nordic Seas, with partial statistical significance (areas where grid cells displaying black dots). Significant changes in SIC appear in connection to the Bering Strait Inflow and in the Arctic export gateways including the Lincoln Sea and downstream Nares Strait and Fram Strait. Confined to the strongly advective regions, this indicates a robust dynamic impact of the forcing.

The slight salinification of the entire SPNA and Nordic Seas (\(+\,0.3\) psu, not significant) indicates a rerouting of Arctic FW export or long-term FW storage in the Baffin Bay and/or an intensification of the salt advection through the AMOC. These changes and the surface temperature response are consistent with a strengthening of the AMOC (Caesar et al. 2018; Rahmstorf et al. 2015; Zhang et al. 2007).

3.1.2 Mixed layer depth and AMOC

The AMOC is partly driven by the convective activity and downwelling in the Labrador (disputed, e.g. Lozier et al. 2019) and Irminger (Rühs et al. 2021) and Nordic Seas (Årthun et al. 2023). Rühs et al. (2021) discuss a potential role of enhanced Greenland runoff on the shift of convection activity from the Labrador to the Irminger Seas. To address those questions, changes in NA mixed layer depth (MLD) and Atlantic meridional stream function are shown in Fig. 8.

Fig. 8
figure 8

Left figure displays the 1920–2019 time-averaged February–March–April mixed layer depth (FMA MLD, positive values increasing with depth) difference between the Melting and Historical ensemble means. Right figure compares the Melting and Historical ensemble means of the Atlantic meridional streamfunction. Black dots show grid points where differences are significant at the 90% level using a modified Student’s t-test. Contour lines show the mean state of the Historical ensemble mean over 1920–2019

Time averaged (1920–2019) comparison of the February–March–April (FMA, period of deep water formation) MLD in Fig. 8 (left) indicates a significant increased convective activity in the Labrador and Nordic Seas (partially significant) in the Melting ensemble as compared to the Historical one. Time series provided in Supplementary Fig. A2 shows that the Labrador Sea MLD seems to be the most impacted, which is possibly linked to the retreat of the sea-ice edge (Fig. 7). This slight increase might be the result of the difference in the spatial redistribution of the freshwater as we can see on Fig. 5 that less freshwater is poured west of the Labrador Sea in the Melting ensemble, which might reduce the stratification and enhance the convection in this region, as shown in Fig. 8. A deeper mixed layer, ventilation and densification of subsurface and deeper waters feed the AMOC and suggest a strengthening. Figure 8 (right) indeed reveals a slight increase in the AMOC strength between the Melting and the Historical ensembles.

The difference of the maximum AMOC at 45\(^{\circ }\)N is \(+\,0.20(\pm 0.81)\) Sv (respectively \(+\,0.18 (\pm 0.63)\) Sv for the maximum AMOC at 26\(^{\circ }\)N and \(+\,0.30( \pm 0.63)\) Sv for the maximum AMOC at all latitudes). The uncertainty or spread represents the amplitude of the internal variability. To compute the uncertainty intervals, the standard deviation of the member-to-member differences in AMOC maximum at 26\(^{\circ }\)N, 45\(^{\circ }\)N, and between 30\(^{\circ }\)S and 60\(^{\circ }\)N between the two ensembles are calculated after averaging each member over the time of the experiment (1920–2019). The uncertainty is too large here to conclude on the response of the AMOC to the external freshwater forcing. It is larger than what was found in a previous study with the IPSL-CM6-LR model where a decrease of \(-0.32 (\pm 0.35)\) Sv for the maximum AMOC at 26\(^{\circ }\)N was found using a similar freshwater forcing in a ten-member ensemble over a comparable period (Devilliers et al. 2021).

Time series of the maximum AMOC at 45\(^{\circ }\)N are presented in Fig. 9. We estimate that the centennial variability of the AMOC is correctly sampled, as shown by the dashed grey lines representing the ensemble spread with the large Historical ensemble of 25 members. The Melting ensemble mean AMOC becomes slightly stronger than the Historical one after about 50 years of simulation demonstrating a modulation of the time evolution of the AMOC on a multi-decadal time scale. The runoff estimate used for the forcing displays indeed a multi-decadal variability (blue curve in Fig. 3) where the minimum values arise between 1960 and 1990, and are lower than the estimation of the model in some years. This period coincides with the one when the Melting ensemble displays a slightly stronger AMOC (Fig. 9). The negative trend of the Melting ensemble is stronger (\({-}\)0.065) than the Historical ensemble trend (\({-}\)0.036) over the last 30 years of the experiment, which is likely a consequence of the enhanced freshwater forcing in this time.

Fig. 9
figure 9

Time series of the maximum AMOC at 45\(^{\circ }\)N for the Melting (blue) and the Historical (black) ensembles. Continue lines are the annual (thin) and 10-year running mean (thick) ensemble means. The spread of both ensembles is one standard deviation. Trends over the last 30 years are represented by the dashed blue and black lines. Dashed grey lines show the spread (1 std.) of the large historical ensemble (25 members)

We note that at the beginning of the experiment, the perturbation leads to an increase of the Melting ensemble spread (represented here by one standard deviation) in comparison to the Historical (e.g., 2.5 Sv vs. 1.8 Sv in 1940). The addition of freshwater is adding noise to the system. At the end of the experiment (in 2019), the spread of the Melting ensemble becomes much smaller (1.0 Sv) than that of the Historical and the large Historical ensemble of 25 members (both 1.5 Sv). A hundred combinations of 10 Historical members from the large ensemble of 25 members were tested and none presented a spread for the year 2014 as small as the one obtained with the Melting ensemble, and this implies a significance at the 0.005%-level (two-sided). This, as well as the temporal evolution, could reflect the effect of the external constraint added by the time-varying FW forcing which is most pronounced at the end of the experiment. It takes indeed several decades after a perturbation to reach for the new quasi-equilibrium state (e.g. Swingedouw et al. 2013; Jackson and Wood 2018). Supplementary Figure A3 describes the distribution of the members for the total period of the experiment, and for the first and the last 10 years. Over the whole period, all ensemble members present a very small increase of the maximum AMOC at 45\(^{\circ }\)N but the Melting members are more concentrated around the ensemble mean than the Historical members at the end of the simulation.

3.2 AMOC response among ensemble members

The AMOC in EC-Earth3 is characterized by a multi-centennial variability (Döscher et al. 2022) and this variability might be strongly related to a variability in the freshwater discharge through sea-ice melt (Meccia et al. 2023). If that is the case, we suggest here that the model’s response could be constrained using the members showing the more consistent representation of the historical AMOC, because these members would be more realistically tuned to the observed freshwater discharge and its impact on sea-ice. Exploring the evolution of maximum AMOC at 45\(^{\circ }\)N for the different ensemble members, we can (subjectively) separate them into three groups. We base the groups on the members’ AMOC strength in 1920: greater than 18 Sv, between 16 and 18 Sv, and lower than 16 Sv (respectively red, green and blue in Fig. 10, showing the maximum AMOC at 45N 10-year running mean). Leaving out the two members in the ensembles with the strongest maximum AMOC state at 45\(^{\circ }\)N (> 18 Sv) at the beginning of the experiment (red in Fig. 10) leads to slowing of \(-0.08(\pm 0.48\) Sv) of the sub-Melting ensemble as compared to the sub-Historical one over the whole period. Leaving out also the members with a more moderately high AMOC (between 16 and 18 Sv, green in Fig. 10) leads to a slowing of \(-\,0.21(\pm 0.58\) Sv) over 1920–2019 for the ensemble composed of the blue members in Fig. 10. The response with this sub-ensemble shows very large uncertainty but is closer to the results obtained in Devilliers et al. (2021) (\(-0.26 \pm 0.37\) Sv, for the maximum AMOC at 48\(^{\circ }\)N) where a similar freshwater forcing experiment was done using the climate model IPSL-CM6-LR over the period 1920–2014. The clearer response (less uncertainty) obtained with the IPSL-CM6-LR model might be linked to the fact that its AMOC multi-centennial variability has a lower amplitude than the one in EC-Earth3.

Fig. 10
figure 10

Left figure shows the time series for the ensemble members of the maximum AMOC at 45\(^{\circ }\)N (10 year running mean). Solid (dashed) lines are the members of the Melting (Historical) ensemble. Red (green and blue) lines are the Melting members with max AMOC at 45\(^{\circ }\)N greater than 18 Sv (between 16 and 18 Sv and lower than 16 Sv) at the beginning of the experiment. Right figure displays the difference member-to-members

Working under the disputable hypothesis that the model’s response to the FWF depend on the AMOC state, we explore the processes at play by regrouping the members into two groups: one with the five members with moderate initial AMOC state (blue members in Fig. 10) and one with the other five members (red and green members in Fig. 10). Figure 10 shows the differences between each member and its forced twin, and we can notice the large spread in the response of the model. The maximum AMOC at 45\(^{\circ }\)N of the sub-ensembles is plotted in Figure A5, which gives an illustration of the two different AMOC responses to the FWF. Time-averaged sea surface height (SSH) differences between Historical and Melting sub-ensembles are shown in Fig. 11. The SSH differences are based on zero-mean globally-averaged SSH in each run. A positive sea surface height difference in the central Arctic is the sign of an enhanced anticyclonic circulation, which helps to keep salinity anomalies (freshwater) inside the Arctic basin.

The first set of sub-ensembles made of initially moderate AMOC members displays a lower Central Arctic Ocean SSH for the Melting runs compared to their Historical counterparts. Time series averaged over the Beaufort gyre region (Fig. 12) show that the SSH is increasing in both sub-ensembles (Historical and Melting). These sub-ensembles seems to be following the mechanism found for EC-Earth3 in the study (Meccia et al. 2023): the more anticyclonic circulation traps more freshwater in the Arctic, and the decrease of the freshwater export from the Arctic increases surface salinity in the NA, which enhances the deep convection and the AMOC strength (Fig. 10, blue members).

The second set of sub-ensembles, with high initial AMOC state, display a higher SSH than the first set (Fig. 12). The mechanism described in Meccia et al. (2023) for the AMOC to go from high to low state is through sea-ice melting, which would reduce the Arctic surface salinity, leading to an anticyclonic circulation and an increase of FW export, which in turn reduces the convection and eventually slows down the AMOC. As both SSH and AMOC stay quite stable along the simulation in these sub-ensembles, we can hypothesise that those members do not have enough sea ice to trigger the mechanism as it is the case in a warmer climate in the study of Meccia et al. (2023). Supplementary Figure A6 and A4 indeed show a much reduced sea-ice cover in those members that start from high initial AMOC state. However the differences are not significant and the signals are overall too weak to draw strong conclusions.

Fig. 11
figure 11

Time-averaged [1920–2019] sea surface height (SSH) difference between the Melting and Historical sub-ensemble means. Left figure compares the members with moderate AMOC (blue members in Fig. 10). Right figure compares the members with high AMOC (green and red members in Fig. 10). Contour lines are the mean state of the Historical sub-ensemble over 1920–2019

Fig. 12
figure 12

Time series of sea surface height (SSH) averaged over the Beaufort gyre (red triangles on right figures, lat[80–88\(^{\circ }\)N], lon[139–179\(^{\circ }\)W]) of the Melting and Historical sub-ensembles. Upper left figure compares members with moderate initial AMOC state (blue members in Fig. 10). Lower left figure compares members with high initial AMOC states (green and red members in Fig. 10). Thick lines are the sub-ensemble means

3.3 Comparison to observations and reanalysis

3.3.1 Surface temperature in the Subpolar gyre region

Changes in the Subpolar gyre (SPG) region surface temperature have been associated with changes in the gyre strength (Chafik and Rossby 2019) and play a role in the preconditioning of the water column for deep convection events. SPG temperatures are also used as a proxy for AMOC strength (Caesar et al. 2018). Figure 13 shows that the freshwater forcing is reducing the SST bias in the SPG region by comparing the Melting and Historical ensemble means to ORAS5 and HadISST products. We notice that ORAS5 reanalysis agree well with HadISST product in this region. The warmer North Atlantic in the Melting ensemble after 1970 is consistent with a stronger AMOC, in comparison to the Historical ensemble (Fig. 9).

We evaluated the root mean square error (RMSE) and the Pearson correlation coefficient (PCC) between the Melting (Historical) ensemble means and the ORAS5 over the 40 years available (1979–2018). Results are given in Table 1. The Melting ensemble shows a reduced error (significantly different from the Historical ensemble RMSE with \(p{\hbox {-value}} \ll 0.01\)) and increased correlation with the observations. Correlation of the Historical ensemble mean is however not significant and not significantly different from the Melting ensemble. Figure 13 also shows that the trend over the last 30 years of the experiment in the reanalysis is almost zero (slope of 0.001), so the Melting ensemble mean trend (slope of 0.017) is improved compared to the Historical one (slope of 0.036). The improvement of the trend is not sensitive to the period chosen over 1980–2020 (not shown) and might be related to the increase of the freshwater forcing.

Fig. 13
figure 13

Top figure displays the Melting (blue line) and Historical (black line) ensemble means SST (in deg. C), averaged over the SPG region represented by the red trapeze displayed in the bottom figure (lat[42.5;58]\(^{\circ }\)N, lon[20;50]\(^{\circ }\)W). The spread of each ensemble is one standard deviation. Red and yellow lines are the time series of the HadISST production and the ORAS5 reanalysis ensemble mean

Table 1 Root mean square error (RMSE) and Pearson correlation coefficient (PCC) are calculated for the SST in SPG region (red trapeze in Fig. 13), comparing the Historical and Melting ensemble means to the ORAS5 product

3.3.2 Beaufort gyre salinity

Previous Sect. 3.1 showed an interesting response from our model in the Arctic region (SIC and SSS in Fig. 7, SSH in Fig. 11), which is investigated here in more detail. An increase of the Arctic surface salinity was also found in response to freshwater forcing in previous studies (Devilliers et al. 2021; Swingedouw et al. 2013; Stouffer et al. 2006). Here, the salinity response is most prominent at depths between 200 and 300 ms, at the bottom of the halocline layer and upper part of the Atlantic layer to the extent they are properly represented.

Figure 14 shows the time series of the 200–300 ms salinity of the two ensembles, Historical and Melting, and the ORAS5 reanalysis ensemble mean, averaged over the Beaufort region. While both ensembles display a fresh bias at these depths, the Melting ensemble is again closer to the reanalysis, consistent with the fact that a higher AMOC would bring more salinity to the Arctic through the Atlantic layer. We note that the spread of the Melting ensemble is again clearly smaller than that of the Historical ensemble at the end of the experiment.

Results in terms of RMSE and PCC are given in Table 2. The bias in terms of RMSE is indeed reduced in the Melting ensemble (significantly different from the Historical ensemble RMSE with \(p{\hbox {-value}} \ll 0.01)\), even though the correlation (significant) with reanalysis is not increased in the Melting in comparison to the Historical ensemble. Correlation coefficients are also not significantly different between the Melting and the Historical ensembles. Figure 14 also shows that the trend over the last 20 years of the experiment in the reanalysis is almost zero (slope of −0.021), so the Melting ensemble mean trend (slope of −0.020) is improved compared to the Historical one (slope of −0.015). The improvement of the trend only occurs during the last 20 years of the experiment (not shown), which might be due to the delay for the freshwater signal to reach this region.

Fig. 14
figure 14

Top figure displays the Melting (blue line) and Historical (black line) ensemble means [200–300] metres salinity (in psu), averaged over the Beaufort gyre region represented by the red trapeze displayed in the bottom figure (lat[75;83]\(^{\circ }\)N, lon[140;180]\(^{\circ }\)W). The spread of each ensemble is one standard deviation. The yellow line is the time series of the ORAS5 reanalysis ensemble mean

Table 2 Root mean square error (RMSE) and Pearson correlation coefficient (PCC) are calculated for the [200–300] metres salinity in the Beaufort Gyre (BG) region (red trapeze in Fig. 14), comparing the Historical and Melting ensemble means to the ORAS5 product

3.3.3 Canadian basin AWCD and AWCT

In order to assess how changes in North Atlantic temperature due to freshwater forcing can impact the Arctic region through ocean exchanges, we focus here on the temperature of the warm Atlantic Water (AW) layer, which is the main water mass at the Arctic intermediate depth (200–800 m). The AW core temperature (AWCT) is defined as the maximum temperature below the halocline (150 m in this study) in the vertical profile, and the AW core depth (AWCD) is defined as the depth of the AWCT (Polyakov et al. 2004; Li et al. 2012, 2014; Wang et al. 2018; Shu et al. 2019). In the observations, the AWCD range is about 200–600 m (Ilıcak et al. 2016). Shu et al. (2019) found that the depth of the maximum temperature below the halocline in several CMIP5 models is much deeper than the observations. Khosravi et al. (2022) also shows that EC-Earth3 deep waters (below 200 m) are too warm and too fresh in the Canadian basin, on average over the period 1979–2014. Langehaug et al. (2023) showed that EC-Earth3 was one of the models that show overall good results for Arctic temperature, with a Halocline Layer (100–300 m) temperature close to ORAS5 reanalysis in a section crossing the Central Arctic Ocean over the period 1993–2010.

We evaluate the AWCT and AWCD in both our ensemble simulations for the Canadian basin, and compare the results to ORAS5. We found that 5 members exhibit an AWCD greater than 3000 m, which is unrealistic but not uncommon among CMIP6 models according to Heuzé et al. (2023). Three of these members are from the Melting ensemble, two are from the Historical ensemble. Interestingly, these members (melting-r2, historical-r5, melting-r13, historical-r13, melting-r15) correspond to the ones that start from a high AMOC state at the beginning of the simulation (red and green members in Fig. 10). The ensemble mean (continuous lines) and sub-ensemble means (dashed lines, with the unrealistic members r2, r5, r13 and r15 excluded from both ensembles) AWCT and AWCD time series are shown in Fig. 15. The RMSE and correlation evaluation are presented in Table 3.

The AWCD shows a large spread in both Melting and Historical ensembles with ensemble means that are about 1000 m too deep when compared to observations. Jumps in the AWCD (Fig. 15) are due to the raw vertical resolution of the vertical column, as we are showing here the depth of grid cell with maximum temperature. Sub-ensembles display more realistic AWCD. Both ensembles exhibit a warm bias when comparing AWCT to reanalysis (at 300 m depth for ORAS5 and 1000 m for EC-Earth). The RMSE is again reduced here in the Melting ensemble, for the complete ensembles of ten members and the sub-ensembles of 6 members. Correlation is however decreased slightly when comparing the Melting runs to the Historical ones, but increased if we look at the sub-ensembles (but not significant).

Fig. 15
figure 15

Top figure displays the Melting (blue line) and Historical (black line) ensemble means AWCT (in \(\,^{\circ }\)C) and middle figure shows the AWCD (in metres), both averaged over the Canadian basin region represented by the red trapeze displayed in the bottom figure (lat[73;87]\(^{\circ }\)N, lon[125;180]\(^{\circ }\)W). The spread of each ensemble is one standard deviation. The dashed lines are the sub-ensemble means, for which 4 members with AWCD > 3000 m are excluded (see Sect. 3.3.3). The yellow line is the time series of the ORAS5 reanalysis ensemble

Table 3 Root mean square error (RMSE) and Pearson correlation coefficient (PCC) are calculated for the the Canadian basin AWCT (red trapeze in Fig. 15), comparing the Historical and Melting ensemble means to the ORAS5 product. Numbers in the parenthesis are for the sub-ensemble means (dashed line in Fig. 15)

4 Discussion and conclusion

In this study, we investigate the impact of a realistic freshwater forcing in the climate model EC-Earth3 in its CMIP6 configuration. Therefore we do not perform a classic “hosing” experiment with a release of a relatively large amount of freshwater, but instead we use an observation-based dataset of runoff and solid ice discharge. Ten simulations are run over the historical period until 2014 and then continued with the ssp2–4.5 scenario until 2019. In these simulations, the runoff and iceberg melting fluxes that are calculated by the model are replaced by values from the observation-based dataset.

The ensemble of these ten simulations, denoted as Melting ensemble, is compared to equivalent (same initial conditions) simulations run without external FWF (Historical ensemble). The comparison of the two ensemble means reveals a small, partially significant, surface warming and salinity increase of the North Atlantic, except for the Baffin Bay, where the FWF is the strongest, which displays a significant freshening. We were able to find some reduction of the temperature and salinity biases in key regions such as the Subpolar North Atlantic region and the Canadian basin of the Arctic. The trends over the last decades are also more aligned with reanalysis in these regions.

The mixed layer depth is increased in the Labrador and Nordic seas and some stratification develops in the Irminger Sea while the ensemble mean AMOC is slightly enhanced (+0.19±0.81 Sv for the maximum AMOC at 45\(^{\circ }\)N) with a large ensemble spread. Opposite trends were found in a similar study done with the IPSL-CM6-LR model where the mixed layer was more shallow in the Labrador and Nordic seas and a slight decrease (\({-}\,0.26\pm 0.37\) Sv) for the maximum AMOC at 48\(^{\circ }\)N was found with a 10-members ensemble forced over 95 years with the same dataset (Devilliers et al. 2021). This might be explained by a lower amplitude of the multicentennial variability of the AMOC that IPSL-CM6-LR shows compared to EC-Earth3. The negative trend of the last 30 years is stronger in the Melting ensemble and coincides with an increase of the FWF. Given the large uncertainty of the results, we conclude that the impact of the FWF on the AMOC remains unclear in both climate models after a century of forcing. The significant differences that we found in terms of salinity and temperature near Greenland do not develop into a large scale impact and mostly remain contained in the FWF zone.

At the end of the experiment, we observe that the spread of the AMOC is reduced in the Melting ensemble, and such reduced variance cannot be found in any other combination of the Historical ensemble members. Temperature and salinity in different regions also display a reduced spread after a hundred years of transient FW forcing. The fact that even a small change in FWF amount and temporal variability modulates the model dynamics is an interesting result for attribution of observed AMOC changes and the development of robust early warning indicators.

Exploring the dynamics of the different members among our two ensembles showed that some members start from a relatively strong AMOC state, and both ensemble means have a much stronger AMOC than in the study with IPSL-CM6-LR. Leaving out high AMOC members leads to a similar response to what was found in the study of Devilliers et al. (2021). The members with very high AMOC at the beginning of the simulation are also the ones that display an unrealistic Atlantic water layer core depth in the Arctic. The study of Caesar et al. (2018) suggests that the state of the AMOC in 1920 was approximately the same as the one in 2010 (see their Fig. 6 or their extended Fig. 6). RAPID-MOCHA array observations (Smeed et al. 2017; Moat et al. 2022) reveal a quite weak state of the AMOC at this period which could suggest that the low-AMOC members of our ensembles are the most realistic and that possibly the AMOC response to the FWF is indeed a small decline and not a small strengthening. The phase of the AMOC in its internal oscillation may have an important impact on the response to the FWF, as it is playing a role in the northward transport of heat and momentum (Meccia et al. 2023). This points to the importance of properly initialising the ocean state for getting better decadal predictions. This also brings forward the question of the validity of the classic equal weights when calculating an ensemble mean. Indeed, members with a strong AMOC could be considered as less realistic, and should therefore have a smaller weight in the ensemble mean calculation. All these results highlight the need of developing new metrics using specific weights when calculating the ensemble means, as it is done for multi-model means (Knutti et al. 2017; Ruiz et al. 2022), in order to properly evaluate the skill of a climate model and its response to a sensitivity experiment.

Finally, we acknowledge that one of the main limitations of this study is the coarse resolution of the ocean model grid, due to the very high computational cost of a coupled system. Therefore the (sub-)mesoscale processes that play a role in connecting boundary current, deep convection and downwelling (Georgiou et al. 2019; Tagklis et al. 2010) are not resolved. In another study, applying the same FW experiment to a very high (1/24\(^{\circ }\)) resolution ocean-only model, the AMOC weakens by about 2 Sv after only 13 years (Swingedouw et al. 2022). This study provides only one simulation, but one can argue that internal variability is much reduced in a forced simulation due to the lack of atmospheric feedback. The role of meso-scale processes and atmospheric feedback were systematically investigated in the study of Martin and Biastoch (2023) and they both showed their importance, making the decision between low (coupled) resolution and high (forced) resolution simulations very difficult. There are some lines of evidence suggesting that the sensitivity of the AMOC to future changes could depend on the resolution (e.g. Spence et al. 2008; Roberts et al. 2020; Swingedouw et al. 2022) which determines the inclusion of the mesoscale processes, the strength and width of the boundary currents and also the mean state and strength of the AMOC (Jackson et al. 2020; Hirschi et al. 2020), even if this is not trivial at 1/4\(^{\circ }\) ocean resolution (see for the EC-Earth3 model: Haarsma et al. 2020). Indeed, the next generation of 1/4\(^{\circ }\) climate models in preparation for CMIP7 have potential for improvement, but are still not resolving the energetic ocean mesoscale, and the correct assessment of the impact of Greenland melting remains a challenge in climate modelling.

To conclude, the response of the system is still undetermined, but we were able to find some signs that the freshwater from Greenland melting plays a role over the whole North Atlantic, and impacts the global-scale circulation and the Arctic deep layers. A clearer response at large scale ermerges at the end of the experiment, showing that in the next decades, the increasing melting fluxes from the Greenland Ice Sheet could play a critical role in climate model dynamics.