Evaluating sea ice thickness simulation is critical for projecting a summer ice-free Arctic Ocean

The rapid decline of Arctic sea ice, including sea ice area (SIA) retreat and sea ice thinning, is a striking manifestation of global climate change. Analysis of 40 CMIP6 models reveals a very large spread in both model simulations of the September SIA and thickness and the timing of a summer ice-free Arctic Ocean. The existing SIA-based evaluation metrics are deficient due to observational uncertainty, prominent internal variability, and indirect Arctic response to global forcing. Given the critical roles of sea ice thickness (SIT) in determining Arctic ice variation throughout the seasonal cycle and the April SIT bridging the winter freezing and summer melting processes, we propose two SIT-based metrics, the April mean SIT and summer SIA response to April SIT, to assess climate models’ capability to reproduce the historical change of the Arctic sea ice area. The selected 11 good models reduce the uncertainty in the projected first ice-free Arctic by 70% relative to 11 poor models. The chosen models’ ensemble mean projects the first ice-free year in 2049 (2043) under the shared socio-economic pathways (SSP)2-4.5 (SSP5-8.5) scenario with one standard deviation of the inter-model spread of 12.0 (8.9) years.


Introduction
The rapid reduction of Arctic sea ice since the end of the 20th century has drawn much attention as an indicator of local and global climate. The change of Arctic sea ice would affect the mid-high latitude climate (Wu et al 2009, Tang et al 2013, Mori et al 2014, Vihma 2014, Gu et al 2018, Chripko et al 2021, economic activity (Ho 2010, Harsem et al 2011, Guy and Lasserre 2016, and ecosystems (Arrigo 2014, Post et al 2013. Since an ice-free summer is the most representative symbol of a warming Arctic, how soon the summer Arctic will become ice-free is always a keen societal concern, as the ice-free Arctic may have remarkable impacts on the Arctic environment, marine ecosystem, and maritime activities. Understanding the observed change of Arctic sea ice and predicting the ongoing evolution toward a seasonally ice-free Arctic Ocean has been a grand challenge for climate science. September sea ice area (SIA) shows an accelerated retreat (about −1.7 million km 2 decade −1 ) from 1998 to 2007, as assessed from the modern satellite passive microwave data record (Comiso et al 2008). The SIA here is defined by the sum of the pixel area multiplied by the sea ice concentration (SIC) in each pixel over all the sea ice grids with SIC greater than 15%. The sea ice thickness (SIT) and sea ice volume (SIV) also show an accelerated loss (Lindsay and Schweiger 2015), especially for the multi-year and perennial ice (Kwok 2018). The summer season is extended, dominated by an earlier melting onset and delayed autumn freeze-up (Stroeve et al 2014). However, after 2008, the September SIA has shown a near-zero trend with a shallow mean state (3.9 million km 2 ) (Swart et al 2015). This trend has not changed in the past six years. A robust decadal variation has occurred over the past three decades, posing considerable uncertainties for projecting anthropogenically forced Arctic SIA changes. However, the reason for decadal variation remains unresolved.
Coupled global climate models are the primary objective tools to investigate the underlying mechanisms of sea ice response and provide future projections based on physical laws. Thus, the ability of climate models to simulate the observed changes in Arctic sea ice has become a central measurement of models' performance in Arcticfocused climate-model intercomparisons (Notz and Community 2020). In 2019, model projections from CMIP6 (the sixth phase of Coupled Model Intercomparison Project) (Eyring et al 2016) became available. Sea Ice Model Intercomparison Project in CMIP6 designed a specific set of diagnostics that allowed for detailed analyses of ice-related processes and process-based evaluations of sea ice simulations in the participating models . However, substantial uncertainty persists in these models' simulated sea-ice loss rate. Most models fail to simulate an evolution broadly in accord with observations for SIA and volume (Notz and Community 2020). Research has indicated that model errors and internally generated climate variability may be the dominant factors contributing to uncertainties in those coupled models (Melia et al 2015, Swart et al 2015. We have examined 40 CMIP6 models' projections of the ice-free Arctic. These models are listed in table S1. The ice-free Arctic is commonly defined as the Arctic September SIA being less than 1.0 million km 2 (Snape andForster 2014, Notz andCommunity 2020). While the CMIP6 models provide a more realistic simulation of Arctic sea ice, the intermodel spread in the projected ice-free Arctic years remains as large as in CMIP5 models (Notz and Community 2020), ranging from 2014 to far beyond 2100 under the medium emission (SSP2-4.5) scenario (Wang et al 2021). This notorious uncertainty makes the all-model ensemble mean less meaningful. Therefore, selecting realistic models to create a reliable projection with reduced uncertainty remains a frontline challenge.
The existing primary approaches for model selection are based on the models' ability to reproduce the observed September SIA-based climatology, linear trend, and its response to global warming (table S2). This work first reviews these current and widely used evaluation metrics and their performances (section 3). Given the essential role of SIT in determining Arctic ice variability in both the melting and freezing seasons, we propose thickness-based evaluation metrics to assess the CMIP6 model's quality (section 4). In contrast to the previous efforts, our unique approach stresses the critical role of SIT in driving long-term sea ice loss. We discuss the role of the thickness-constrained model selection in reducing the simulation and projection uncertainties. Section 5 presents the conclusion and discussion.

Data and method
SIC is commonly used to measure the amount of ice distribution. There are several observational data available. We primarily rely on the passive microwave satellite data record from 1979 to the present, which provides consistent estimates of SIC data from several sensors. The most referenced dataset comes from the National Snow and Ice Data Center (NSIDC) (Comiso 2017) and Hadley Centre observations datasets (Rayner 2003). Due to the satellite orbit inclination, there is a central Arctic hole in the NSIDC data. The hole is patched using the northernmost latitude's average value to keep the data's consistency.
Here we use both datasets to reduce the uncertainty in observation. The SIT derives from Pan-arctic Ice-Ocean Model and Assimilation System, which blends satellite-observed data, such as SIC, into model calculations to estimate SIT. This dataset agrees reasonably well with the submarine observation of ice thickness, and the bias is within 9% along the 1993 submarine track in the Arctic (Zhang and Rothrock 2003). All sea ice data are interpolated into the same grid with 1 • resolution. Then, the monthly SIA is calculated based on sea ice concentration.
The 2 m near-surface air temperature is also used to diagnose the sea ice's sensitivity to global warming. To reduce the uncertainty, we used the ensemble mean of two reanalysis datasets: the fifth generation European Centre for Medium-range Weather Forecasts reanalysis data (ERA-5) with 1 • resolution (Hersbach et al 2020) and the National Centers for Environmental Prediction (NCEP-II) datasets with 2.5 • resolution (Kanamitsu et al 2002). The period of 1979-2014 is chosen as the evaluation period since it overlaps with the satellite observation period, and 2014 is the last year of the models' historical simulation period.
This study analyzes 40 models that provide sea ice simulation for various scenarios (table S1). Most of them provide historical simulations and future projections under the SSP2-4.5 and SSP5-8.5 scenarios, representing a medium and high radiative forcing of 4.5 and 8.5 W m −2 in 2100 relative to the preindustrial levels (O'Neill et al 2016). Among these 40 models, most models provide more than one realization. We used as many realizations as possible to diminish the bias from the different initial conditions. The model information and realization number are shown in table S1. Since models have different grid configurations, we first interpolate the SIC and thickness from model grids to a 1.0 × 1.0 degree standard latitude/longitude grid (same with observation) before calculating the sea ice area. In this way, model results were compared consistently.
It is crucial to choose an appropriate region and season for study due to the complex spatial-temporal variations of sea ice. Here we focus on the Arctic Ocean north of 70 • N to exclude the interference from the Atlantic Ocean. We define the freezing season as four calendar months from December to March and the primary melt season as three calendar months from June to August.

Previous sea ice area-based observational evaluations
3.1. The climatological mean September SIA Climatological September SIA or SIC measures the average sea ice amount in the peak melt season and is widely used for evaluation of the CMIP3 and CMIP5 models (Wang and Overland 2009, Massonnet et al 2012, Stroeve et al 2012, Liu et al 2013, Huang et al 2017. However, the average September SIA from 1979 to 2014 is 5.36 and 4.73 million km 2 in the NSIDC and Hadley datasets. This observed uncertainty is even more significant than some inter-model spreads. Supplementary figure 1 shows two groups of the best ten models selected based on these two different observational datasets. Eight out of ten models are different. The projected ensemble mean ice-free year by the top ten models is 2053 and 2067, respectively, under the SSP2-4.5 scenario. Furthermore, the intermodel spread is not reduced and even increased in the future projection. Thus, the observational uncertainty in September SIA argues against its application for model evaluations.

The linear trend of the September SIA
The linear trend of September SIA is another widely used quantity to measure the mean area reduction rate of sea ice in the calibration period. While this constraint avoids the observational uncertainties, it highly depends on the choice of the reference period. The Arctic SIA trend is superimposed on large decadal variability, making it difficult to determine the actual trend using a limited length of observation. To illustrate the sensitivity of the trend to different evaluation periods, we examined three reference stages 1979-2000 (the CMIP3 period), 1979-2007 (the CMIP5 period), and 1979-2014 (the CMIP6 period) (supplementary figure 2). The average trends for the three reference periods vary from 0.38 to 0.72 million km 2 decade −1 . The top ten models selected using the three trends projected an ice-free year ranging from 2041 to 2082. Therefore, the linear trend of September SIA cannot be used in the model evaluation due to the decadal variation since 1979.

The response of Arctic SIA to global warming forcing
Notz and Community (2020) evaluated the CMIP6 models using September SIA sensitivity to global warming, namely, the September SIA reduction for a given amount of global warming (Gregory et al 2002, Mahlstein and Knutti 2012, Stroeve and Notz 2015, or a given amount of accumulative anthropogenic CO 2 emission (Herrington andZickfeld 2014, Notz and. The sensitivity was estimated with the regression of September SIA to the global annual mean 2 m air temperature. Most CMIP6 models (34 out of 40) underestimate the observed sensitivity (supplementary figure 3(a)). The inter-model spread for the ten selected models, defined by the average standard deviation of September SIA from 1979 to 2014, is 1.37 million km 2 , only slightly smaller than all model spreads (1.87). Thus, the intermodel uncertainty is not significantly reduced. The ensemble mean of the ten models overestimates the observed downward trend. Therefore, by 2039, the projected September SIA will be about one million km 2 (supplementary figure 3(b)). Supposing we use global annual mean CO 2 as the proxy, 25 out of 40 CMIP6 models underestimate the observed sensitivity. The simulated inter-model spread for the ten selected models is 0.97 million km 2 , significantly reduced compared to all model spread. However, the ensemble mean of the ten models overestimates the mean September SIA and projects ice-free year ranging from 2024 to 2065 with one standard deviation of 16 years. The selected good models still show significant uncertainties.
These constraints measure the models' fidelity in reproducing the local Arctic SIA's response to global external forcing. However, the Arctic regional response is driven by atmospheric circulation changes under the global forcing. The heterogeneous atmospheric circulations make regional warming vary from place to place under the same external global forcing (Wang et al 2021). This constraint might be problematic because the Arctic ice is directly affected by local warming rather than the global mean temperature rise or global anthropogenic forcing.

Physical consideration
Previous evaluations focus on the SIA variability in summer. However, the warming during fall and winter significantly reduces SIV and thickness (Lindsay and Schweiger 2015, Kwok 2018). The reduced SIT at the end of winter could enhance summer ice loss Washington 1979, Curry et al 1995). Therefore, we must consider the model's ability to reproduce the observed SIT in the evaluation.
The critical roles of SIT in Arctic ice melting can be understood from the processes in both the freezing and melt seasons. During the freezing season, the average climatological mean temperature is about −20.1 • C, far below the sea ice freezing point (−1.8 • C). Meanwhile, the temperature at the interface between the seawater and the sea ice bottom Figure 1. Evaluation of Arctic sea ice simulation in each CMIP6 model by two metrics. The abscissa is the climatological mean April sea ice thickness (SIT). The ordinate is the summer SIA reduction response rate (September-minus-May SIA) to the April mean SIT from 1979 to 2014. The black and red dots denote observed and 40 models' MME, respectively. The seven models within the red circle are excellent performing models (names marked by red in the legend). The good, fair, and poor models are marked in the legend with blue, black, and yellow ink. keeps at the freezing point. Thus, the substantial temperature gradients in the interior of ice and snow cause continuous heat loss from the ocean to the atmosphere, forming new sea ice at its bottom. In this process, the near-surface air temperature dominates the ice growth efficiency and further determines the SIT at the end of winter. The April SIT preconditions the summer SIA changes. As summer comes, the average air temperature in the melt season June-July-August (JJA) is above the freezing point (−1.8 • C), leading to a broad sea ice melt in the Arctic. Since the albedo of seawater (0.06) is much smaller than sea ice (0.4-0.8 depending on the snow cover above), the seawater absorbs insolation and heats (melts) the ice laterally. The initial warming would release more low reflective seawater, increasing absorption of insolation and eventually induce a further sea ice loss (Curry et al 1995, Serreze andFrancis 2006). This positive ice-albedo feedback amplifies the initial warming and sea ice retreat. Note that the lateral melting (ice-albedo feedback) is highly subjected to the SIT (Parkinson and Washington 1979). The thinner ice would shrink faster for the same amount of heat in the melt season, intensifying the ice-albedo feedback and accelerating the ice area retreat. Thus, the SIT thinning plays a critical role in summer ice melting. In summary, Arctic SIT is crucial in determining Arctic ice melting throughout the seasonal cycle.

Sea ice thickness-based evaluation metrics
We argue that the April SIT is a critical quantity to be evaluated because it serves as an initial condition in the ice melt season. A reliable model must correctly reproduce it. Here we use the climatological mean April SIT from 1979 to 2014 as the first evaluation quantity (the abscissa in figure 1). In April, the SIT reaches its maximum value in the annual cycle, and the entire Arctic is covered by sea ice (SIC is close to 100% in most individual grids). The observed mean   of the simulated September SIA for seven excellent models (a), eight good models (b), ten fair models (c), and 15 poor models (d) under historical experiment. The thick black curve is the observation. The thick red curve is the models' multi-model ensemble (MME) mean for each group. The 'MME RMSE' denotes the MME's root mean square error (RMSE) against observation. The 'uncertainty' is defined by the average root mean square error between the MME and each model (dashed curve), representing the mean inter-model spread in each group. Five-year running mean has been applied to the data. thickness is 2.41 m (black dot). The April SIT evaluates how the models simulate the sea ice growth process in the freezing season.
We propose that the summer SIA response to April SIT is another quantity to be evaluated since the SIT significantly regulates the summer ice area reduction (or lateral melting). This quantity measures the response of summer SIA reduction to each year's initial sea ice condition. The summer here means JJA, during which more than 88% of the SIAloss occurs. We measure the summer SIA area variation by September-minus-May SIA for each year and estimate the summer SIA response rate by the least square regression between the summer SIA reduction and April Arctic SIT using historical experiment data for 1979-2014. The regression coefficient is the response rate used as the second evaluation quantity (the ordinate in figure 1). The correlations between the summer SIA reduction and April Arctic SIT are significant at the 95% confidence level in 38 out of 40 CMIP6 models. 32 out of 40 models have a correlation coefficient exceeding 0.7. The observed response rate is −2.5 million km 2 m −1 . It means that for every one-meter thinning of sea ice in April, the SIA retreats by 2.5 million km 2 in summer.
The September SIA is arguably determined by the April SIT (initial condition) and the summer ice melting as measured by the summer SIA response rate. Therefore, we use these two criteria to select credible models (figure 1). Although the multi-model ensemble (MME) mean (red dot) is close to the observation (black dot), 40 CMIP6 models show enormous deviations from their ensemble means. We divided these models into four groups based on the relative errors from the observations (figure 1). Seven models within the red circle are excellent performers with less than 20% relative error. The eight models within the blue square and outside the red circle are good The thick black curve is the observation; the thick red curve is the 11 models' multi-model ensemble mean. Five-year running mean has been applied to the data. models, with a relative error is less than 40%. The ten models between the blue and black square boxes are considered fair, with a relative error ranging from 40% to 100%. The remaining 15 models outside the black square are poor, with a relative error exceeding 100%. Figure 2 shows each group's September SIA skill in historical simulations . Supplementary figure 4 shows the counterpart for April SIT. Here we use two ways to measure each group's performance. The MME-root mean square error (RMSE), calculated by the RMSE between each group's MME (red curve) and observation (black curve), measures the models' holistic ability to reproduce the observation. The 'Uncertainty' , which is the average RMSE deviating from the MME for each model, depicts each group's inter-model spread. Reliable constraints should select models that not only capture the observation with small MME-RMSE but also reduce the inter-model spreads (small Uncertainty).
The results show that most individual models in the excellent and good group can capture the long-term decline of Arctic sea ice with an RMSE of 0.43 (0.36) million km 2 (figure 2). The MME-RMSE increases for the fair models (0.57) and poor models (0.90). Notably, the uncertainty increases progressively from the excellent model (0.44) to the poor model (2.25). Similarly, the uncertainty in the simulated April SIT also systematically increases from the excellent models (0.19 m) to poor models (0.89 m) (supplementary figure 4). The increase of the uncertainty from the excellent to poor models is four to six folds, suggesting the selected good models can dramatically reduce the simulated uncertainties.

Future projection of the first Arctic ice-free year
Can the selected models reduce the uncertainties in their future projection of the ice-free Arctic? Since only 11 out of 15 excellent and good models (hereafter good models) and 11 out of 15 poor models provided future projection data, we compare two groups of models that have an equal number (11) members. One group consists of 11 good models, and the other includes 11 poor models. The inter-model spread for the 11 good models, defined by the average standard deviation of September SIA from 1979 to 2014, is 0.85 million km 2 , which is only 30% of the 11 poor models' spread (2.8 million km 2 ). Therefore, the good models' spread is less than one-third of the poor models' counterparts. Compared to all (22) models' spread (2.1 million km 2 ), the 11 good models reduced the inter-model spread by 60%. The results confirm that the selected models can significantly reduce the uncertainty of the simulated September SIA. Figure 3 compares the models-projected September SIA for 11 good models (a), (b) and 11 poor models (c), (d) under SSP2-4.5 and SSP5-8.5 scenarios, respectively. The projected first ice-free year by the good group is 2049 (2043), with one standard deviation of the inter-model spread of 12.0 (8.9) years, under the SSP2-4.5 (SSP5-8.5) scenario. By contrast, the projected first ice-free year for the poor group is 2078 (2060), 29 years later than the good group. More importantly, the inter-model spread of ice-free years among 11 poor models is 40.2 (34.2) years under the SSP2-4.5 (SSP5-8.5) scenario, which are about 3.5 times larger than the good models' spread in predicting the ice-free Arctic. The model selection significantly narrows the uncertainty range of the projected ice-free Arctic summer.
In addition to the reduction of SIA, the SIT will also decrease as more areas will be covered by firstyear ice. Figure 4 shows the April SIT and September SIC projected by the selected good models when the ice-free year comes (shading). The contours represent the observed field for the present state (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014).
In the year when the ice-free Arctic comes, much of the central Arctic is covered by sea ice, less than 2.5 m in April (figures 4(a) and (c)). The area of thick sea ice (>2 m) will be reduced by 5.4 (5.0) million km 2 under the SSP2-4.5 (SSP5-8.5) scenario. The average Arctic April SIT is thinned by about 25% compared to the period from 2007 to 2014. The September SIA will also be substantially reduced (figures 4(b) and (d)). The SIC in the central Arctic would generally be less than 50%. The SIA is reduced by 74% compared to the present state (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014). The highly concentrated sea ice in the central Arctic at the present day will break into small floes with more open water released. Most perennial ice will disappear, and the Arctic Ocean will reach a new equilibrium state with the seasonal ice.

Conclusion and discussion
Analysis of the 40 CMIP6 models indicates that the model simulated and projected SIA (sea ice area) and its decline rate have enormous spreads from their ensemble means, representing notorious uncertainties in the Arctic sea ice simulation and projection. This huge uncertainty makes the all-model ensemble mean less meaningful. Therefore, selecting realistic models to create a reliable projection with reduced uncertainty is imperative.
We reviewed the performance of three existing SIA-based evaluation metrics in selecting models and elucidated why these metrics are inadequate. We stress that Arctic SIT plays a critical role in determining Arctic ice melting in both freezing and melt seasons. The maximum April SIT reflects winter freezing processes and serves as the initial condition for the melting season. In summer, the SIT also significantly regulates SIA reduction. The April SIT acts as a bridge, linking the winter freezing and summer melting processes. Therefore, we propose two new thickness-based quantities, the April mean SIT and summer SIA response to April SIT, for evaluating climate models' skills.
We have classified 40 CMIP 6 models into four groups (excellent, good, fair, and poor) and demonstrated that both the errors (RMSE) and uncertainties (inter-modal spread) decrease systematically from the excellent to poor model groups in the simulated longterm decline of Arctic SIA and SIT (figure 2 and supplementary figure 4). The uncertainty in the simulated September SIA in the excellent model group (0.44 million km 2 ) is reduced by about 81% compared to the poor model group (2.25 million km 2 ). Likewise, the uncertainty in the simulated April SIT in the excellent group is reduced by about 79% against the poor group (supplementary figure 4). The model selection using the SIT-based metrics also significantly narrows the uncertainty range of the projected ice-free Arctic summer: The 11 good models' intermodel spread is reduced by 70% compared to the 11 poor model groups. The projected uncertainty could be substantially reduced by eliminating poor models from the MME. The selected models' ensemble mean projects the first ice-free year in 2049 (2043) under the SSP2-4.5 (SSP5-8.5) scenario with one standard deviation of the inter-model spread of 12.0 (8.9) years.
Given the critical roles of the SIT and the notable spread in the CMIP6 models' simulation, we suggest modelers pay specific attention to improving SIT simulation, including the winter thickening and summer thinning processes.
Although the SIT-based evaluation metrics can significantly reduce the inter-model spread, the sources of uncertainty in the coupled climate system models have not been fully elucidated. A better understanding of the physical processes in the seasonal sea ice evolution and establishing a set of physics-based emergent constraints remain a target for further studies. In addition, our new thicknessbased constraints only relate to the sea ice component. The sea ice evolution is governed by local atmospheric forcing (Wang et al 2021). A physics-based constraint metric should consider the performance of atmospheric or oceanic forcing and include their impacts on the Arctic sea ice dynamics.