A 10-year record of Arctic summer sea ice freeboard from CryoSat-2

Satellite observations of pan-Arctic sea ice thickness have so far been constrained to winter months. For radar altimeters, conventional methods cannot differentiate leads from meltwater ponds that accumulate at the ice surface in summer months, which is a critical step in the ice thickness calculation. Here, we use over 350 optical and synthetic aperture radar (SAR) images from the summer months to train a 1D convolutional neural network for separating CryoSat-2 radar altimeter returns from sea ice floes and leads with an accuracy >80%. This enables us to generate the first pan-Arctic measurements of sea ice radar freeboard for May–September between 2011 and 2020. Results indicate that the freeboard distributions in May and September compare closely to those from a conventional ‘winter’ processor in April and October, respectively. The freeboards capture expected patterns of sea ice melt over the Arctic summer, matching well to ice draft observations from the Beaufort Gyre Exploration Program (BGEP) moorings. However, compared to airborne laser scanner freeboards from Operation IceBridge and airborne EM ice thickness surveys from the Alfred Wegener Institute (AWI) IceBird program, CryoSat-2 freeboards are underestimated by 0.02–0.2 m, and ice thickness is underestimated by 0.28–1.0 m, with the largest differences being over thicker multi-year sea ice. To create the first pan-Arctic summer sea ice thickness dataset, we must address the primary sources of uncertainty in the conversion from radar freeboard to ice thickness.


Introduction
Sea ice extent in the Arctic has declined at an unprecedented rate in recent decades (Stroeve and Notz, 2018), affecting polar amplification of global warming trends (Serreze et al., 2009), changes in precipitation (Webster et al., 2014) and Arctic Ocean freshwater content (Morison et al., 2012). Eight of the lowest ever recorded September sea ice extents have occurred in the last ten years (Fetterer et al., 2017). These changes have fostered growing stakeholder interest in the Arctic Ocean, particularly during summer and autumn months when open water area is greatest (Barnhart et al., 2016), the sea ice is most dynamic and ocean primary productivity (Arrigo et al., 2012) and biogeochemical processes (Barber et al., 2015) are most active. Accurate forecasts of summer sea ice conditions weeks to months in advance would revolutionize polar numerical weather prediction, increase commercial shipping and cruise ecotourism throughout the Arctic, and improve planning of resource exploitation, fishing, and hunting activities in the marginal ice zone (Guemas et al., 2016).
Our understanding of and ability to predict changes in the Arctic sea ice cover during summer are limited by the availability of remotely sensed sea ice observations. State-of-the-art forecasting systems for short-term (weeks to a few months) sea ice conditions demonstrate significantly improved fidelity when initialized from winter ice thickness observations (Chen et al., 2017; Allard et al., 2018; Blockley and Peterson, 2018) or sub-model grid ice thickness distributions (Schröder et al., 2019). Through idealized sea ice model experiments, Day et al. (2014) and Bushuk et al. (2017) have demonstrated that ice thickness observations can theoretically offer predictions of pan-Arctic and even regional September sea ice area up to 4-5 months in advance. However, initializing sea ice models with thickness observations prior to May/June is less effective because synoptic episodes of sea ice advection and negative ice-growth feedbacks in spring diminish the impact of winter thickness anomalies on summer ice area (Bushuk et al., 2020). A sharp increase in predictability occurs at the onset of the sea ice melting season, when the ice-albedo feedback acts to enhance remaining thickness anomalies (Sigmond et al., 2016; Babb et al., 2019). This transition has been termed the "spring predictability barrier" and is a robust feature across most of the GCMs in CMIP5 (Bonan et al., 2019).
To date, pan-Arctic sea ice thickness observations from satellite laser altimetry, radar altimetry (Laxon et al., 2013) and L-band radiometry (Kaleschke et al., 2012) are only available for the winter months of October-April. Airborne electromagnetic (EM) sensors and moored upward-looking sonar (ULS) instruments have provided snapshots of the sea ice thickness distribution over limited areas and/or time periods (Haas and Howell, 2015; Belter et al., 2020, 2021). However, consistent pan-Arctic sea ice thickness observations remain elusive for the months of May-September, when they would arguably be most valuable. A critical step of the sea ice thickness processing chain for radar altimeters is the separation of measurements from sea ice floes and leads (cracks in the ice that form when ice floes diverge). Lead detection relies on the ability to distinguish different surface types (open ocean, leads and ice floes) from backscattered radar echoes. In the winter months (Oct-Apr), each surface type reflects the radar signal differently. Leads tend to return specular reflections with high backscatter because they present a mirror-like surface. Returns from ice floes are less specular with lower backscatter, while returns from the open ocean have a smooth functional form. This allows parameters such as the calibrated backscatter coefficient sigma naught (σ0), the waveform shape (including parameters derived from the shape, such as the waveform power, pulse peakiness and leading-edge width) and contextual information (e.g., sea ice concentration from passive microwave satellites) to be used to distinguish the different surface types. For synthetic aperture radar (SAR) altimeters, such as CryoSat-2, further parameters can be derived from the echo stack information (range integrated power (RIP), stack kurtosis, stack peakiness and stack standard deviation) to classify surface types.
The usefulness of physical models to parameterize each surface type is limited and tends to be governed by the assumptions made in the modelling. Thus, classification thresholds can be chosen either empirically (Laxon et al., 2003; Peacock and Laxon, 2004; Ricker et al., 2016; Passaro et al., 2017) or statistically, for example, by machine learning techniques (Poisson et al., 2018; Müller et al., 2017; Dettmering et al., 2018). The accuracy of these lead-ice floe classifiers generally ranges from 80 to 95% (Lee et al., 2016; Dettmering et al., 2018).
Over the summer months (May-Sep), high-backscatter specular reflections originate from both leads and melt ponds, making reflections from the sea ice surface varied and difficult to distinguish from those of leads (Drinkwater, 1991). Reflective leads and melt ponds covering as little as 1% of the sensor footprint can produce a specular radar return (Kwok et al., 2018). Thus, traditional surface type classification algorithms perform poorly during the summer. For instance, algorithms developed for Arctic winter months regularly classify >50% of the sea ice-covered area from May onwards as leads despite ice concentrations exceeding 90% (e.g., Lee et al., 2018). Despite the challenge of detecting leads in summer months, observations of the sea surface height that include all specular returns are biased high between June and October compared to other months (Armitage et al., 2016). This indicates that the radar detects bare ice and melt pond surfaces elevated above sea level and can potentially still measure the ice freeboard.
Here, we use a novel approach to accurately separate leads and summer sea ice floes in CryoSat-2 observations of the Arctic Ocean to retrieve sea ice freeboard. We (i) collate images from optical and SAR satellites overlapping altimeter orbits, enabling us to characterize reflections from different surface types, (ii) use local along-track variations in echo parameters instead of absolute values to account for the strong variability of summer sea ice conditions, and (iii) include the local variation in elevation as a classification variable to distinguish surface water on sea ice from leads. Machine learning techniques are applied for the final classification. Radar freeboards are derived from the elevation differences between classified ice floes and leads. We first compare sea ice radar freeboards between May and September to those obtained from a standard CryoSat-2 processing chain for the winter months. We then validate the derived freeboards against independent sea ice freeboard, draft, and thickness observations from airborne laser scanner, ULS, and airborne EM surveys, respectively. We identify sources of apparent bias in the summer radar freeboards, before finally discussing prospects for converting freeboards into a first pan-Arctic sea ice thickness data product for May-September 2011-2020.

CryoSat-2 data
The CryoSat-2 satellite, launched in 2010, is equipped with the Ku-band SAR/Interferometric Radar ALtimeter (SIRAL) instrument and uses either the SAR or the SAR interferometric (SARIn) altimeter mode over sea ice. A limitation in obtaining reliable elevation measurements over ice floes and leads is that backscatter and waveform shape can vary significantly depending on the reflecting surface. To partially account for this, we applied the SAR Altimetry MOde Studies and Applications + (SAMOSA+) physical retracker (Dinardo et al., 2018), based on the SAMOSA2 delay-Doppler analytical radar echo model (Ray et al., 2015). SAMOSA2 estimates the epoch, waveform power and significant wave height, while SAMOSA+ estimates an additional parameter, the mean square surface slope, which is designed to model surfaces with a range of backscattering properties from quasi-specular sea ice to fully specular leads. Our available dataset included zero-padding to increase waveform sampling but did not include a Hamming window filter. This filtering step is designed to reduce the effects of side lobes on the antenna main beam but can introduce bias in the lead elevation data, and it is not explicitly required for SAMOSA+ retracking (Laforge et al., 2020). We used CryoSat-2 data processed from Level-1B to Level-2 for May-September 2011-2020 with the SARvatore and SARINvatore modules provided by the European Space Agency Grid Processing On Demand (GPOD) service (Dinardo et al., 2016) (data repositories #45 and #46, available from http://wiki.services.eoportal.org/tiki-index.php?page=SARvatore+Data+Repository).

SAR and optical data
We used six satellites, three optical (Landsat-8 and Sentinel-2A and 2B) and three SAR sensors (RADARSAT-2 and Sentinel-1A and 1B), to identify scenes overlapping CryoSat-2 orbits that were acquired a maximum of 15 min apart. This ensured that we did not have to use a sea ice drift model to align the sensors: for example, a drift speed of 0.4 m/s sustained for 15 min would result in a 360 m offset, which is approximately the along-track sampling of CryoSat-2. Sea ice drift speeds in the Arctic are typically well below 0.2 m/s. All missions except RADARSAT-2 started operating after the CryoSat-2 launch (2010), and all have a lower maximum latitude than CryoSat-2 (see Table 1). Thus, no coincident passes could be obtained for the northernmost 4° of the CryoSat-2 orbit.

Table 1. Optical and SAR satellite scenes with overlapping CryoSat-2 tracks, along with how many scenes could actually be used to create the testing/training database. Columns: Satellite; First year of operation; …

Overall, we found 209 and 1918 coincident images for the optical and SAR satellites, respectively. The largest number of coincident images was found between the twin Sentinel-1A and -1B satellites and CryoSat-2 (1864 images). Fewer Sentinel-1 images coincided with CryoSat-2 SARIn data (N = 58) than with SAR data (N = 1806), because the SARIn sea ice data are mostly limited to coastal regions. An experimental SARIn area in the open Arctic Ocean, known as the Wingham Box, was included in the CryoSat-2 mode mask from November 2010 to July 2014. Only RADARSAT-2 started operating before the mode for this box was changed to SAR; however, we could not obtain any overlapping images. We manually checked each corresponding image to find locations of leads that intersected with CryoSat-2 tracks. By screening manually, we were able to detect leads with minimal processing of the images and without using pre-existing optical or SAR image classification schemes. Leads were identified visually from variations in image intensity and from their shape, as leads are typically elongated linear features, unlike rounded melt ponds or ice floes. We used a true colour image from the visible bands as the primary method to find leads in optical satellite images because leads are less reflective than ice floes in the visible spectrum. As a secondary check, we also used the difference between the red and blue bands, as melt ponds tend to reflect more light at blue wavelengths than leads (Istomina et al., 2016). Both Landsat-8 (USGS, NASA) and Copernicus Sentinel-2 (ESA) data were accessed through the Google Public Cloud (https://cloud.google.com/storage/docs/public-datasets).
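The 15-min coincidence window rests on simple kinematics: the along-track misregistration between an image and an altimeter pass is the drift speed multiplied by the time separation. A minimal Python check, assuming a ~300 m CryoSat-2 along-track sampling as the tolerance (the helper names are hypothetical):

```python
def drift_offset_m(speed_m_s: float, dt_min: float) -> float:
    """Displacement (m) of ice drifting at speed_m_s for dt_min minutes."""
    return speed_m_s * dt_min * 60.0


def max_separation_min(speed_m_s: float, tolerance_m: float = 300.0) -> float:
    """Longest time separation (min) that keeps the drift offset within tolerance_m.

    The default tolerance of ~300 m is an assumed stand-in for the CryoSat-2
    along-track sampling.
    """
    return tolerance_m / (speed_m_s * 60.0)


# Extreme case quoted in the text: 0.4 m/s for 15 min gives ~360 m.
print(drift_offset_m(0.4, 15))
# Typical drift (below 0.2 m/s) keeps the 15-min offset under ~180 m.
print(drift_offset_m(0.2, 15))
# At 0.2 m/s, images up to ~25 min apart would stay within ~300 m.
print(max_separation_min(0.2))
```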
For the SAR satellites, we used the backscatter from both HH and HV polarization images to discriminate leads. The microwave backscatter intensity is mainly governed by the dielectric constant and roughness of the surface, along with the incidence angle, polarization and frequency of the incident wave (Carsey, 1992). Thus, the ice type, roughness, wind speed over water and melt pond fraction all play a role in the backscatter intensity (Scharien et al., 2014). In the HH image, a lead can appear either bright or dark depending on incidence angle and wind speed. In the HV image, leads typically appear dark but tend to have lower contrast owing to the proximity of the HV signal to the sensor noise floor (Komarov and Buehner, 2017). For Sentinel-1, we used the preprocessing scheme described by Filipponi (2018), which involves border and thermal noise reduction, speckle filtering using the refined Lee filter, and conversion to sigma naught (σ0). The RADARSAT-2 data were calibrated to σ0 and speckle-filtered with a Lee sigma filter.
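To illustrate what speckle filtering does to σ0 before lead identification, the sketch below implements the basic Lee filter for multiplicative speckle on linear-scale backscatter, followed by conversion to decibels. It is a simplified stand-in, not the refined Lee filter (which adapts its window to edges) used for Sentinel-1 or the Lee sigma filter used for RADARSAT-2, and the window size and equivalent number of looks (`enl`) are illustrative assumptions.

```python
import numpy as np


def box_mean(img: np.ndarray, size: int) -> np.ndarray:
    """Local mean over a size x size window (size odd), with reflected edges."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def lee_filter(sigma0_lin: np.ndarray, size: int = 7, enl: float = 4.0) -> np.ndarray:
    """Basic Lee filter for multiplicative speckle (not the refined Lee variant).

    Where local variation is close to pure speckle, the output tends to the local
    mean; where the scene varies strongly (e.g. lead edges), it stays closer to
    the original pixel value.
    """
    img = sigma0_lin.astype(float)
    mean = box_mean(img, size)
    var = np.maximum(box_mean(img**2, size) - mean**2, 0.0)
    # Squared coefficients of variation of the scene (ci2) and of the speckle (cu2).
    ci2 = np.divide(var, mean**2, out=np.zeros_like(mean), where=mean > 0)
    cu2 = 1.0 / enl
    weight = np.divide(ci2 - cu2, ci2, out=np.zeros_like(ci2), where=ci2 > 0)
    weight = np.clip(weight, 0.0, 1.0)
    return mean + weight * (img - mean)


def to_db(sigma0_lin: np.ndarray) -> np.ndarray:
    """Convert linear backscatter to decibels."""
    return 10.0 * np.log10(sigma0_lin)


# Homogeneous scene with gamma-distributed speckle (4 looks), mean sigma0 ~0.05.
rng = np.random.default_rng(0)
scene = 0.05 * rng.gamma(shape=4.0, scale=0.25, size=(64, 64))
filtered = lee_filter(scene)
print(scene.std(), filtered.std())  # filtering reduces the speckle variance
```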
Copernicus Sentinel-1 data, processed by ESA, were retrieved from the ASF DAAC (https://asf.alaska.edu/data-sets/sar-data-sets/sentinel-1/), while RADARSAT-2 data were provided by Natural Resources Canada's Earth Observation Data Management System (www.eodms-sgdot.nrcan-rncan.gc.ca). We only recorded leads that showed a notable corresponding change in elevation in the CryoSat-2 SAR or SARIn data and that could be clearly identified in the optical or SAR image; thus, the minimum lead size identified was approximately the size of the CryoSat-2 footprint (~300 m).

Training/testing database of CryoSat-2 summer sea ice classes
We found that roughly 10% of the images could be used in the manual classification. This was due to a combination of cloud cover for the optical satellites, the image being over open water, or only a small part of the CryoSat-2 orbit crossing the image. Although we obtained roughly the same number of coincident images for each month, in the early and mid-summer months (April-July) it was difficult to manually determine leads from the Sentinel-1 SAR images and CryoSat-2 elevation differences. This was caused by melting/saturated snow and by melt ponds in their initial stage of formation on the ice surface, which reduced the contrast between leads and floes in the SAR images. Such a reduction in contrast has been documented previously over first-year ice (FYI) in May and June (e.g., Barber et al., 1992; Scharien et al., 2014). Thus, most of the lead classifications were from August and September (N = 117; 109 SAR and 8 SARIn), compared with the early and mid-summer months (N = 48; 43 SAR and 5 SARIn).
Along with the classified leads, we defined two further classes: 'good' floes and 'noisy' floes. This allowed us to create a training and testing dataset that represents the complete CryoSat-2 observational record over Arctic sea ice in summer months. 'Good' floes were defined visually where we could be confident that a sea ice floe was present in both the CryoSat-2 elevation data and its coincident optical or SAR image. 'Noisy' floes were defined visually where the coinciding image showed sea ice floes but the local surface parameters from the CryoSat-2 data were too variable to classify (caused, for example, by off-nadir reflections from melt ponds, heavily mixed surface types or areas of rough ice). The three classification types (leads, good floes and noisy floes) allowed us to separate local changes in CryoSat-2 parameters (e.g., elevation, σ0 or waveform shape) associated with a lead from large local changes owing to noise. We only distinguished between floe types in the manual classification scheme and did not separate good from noisy floes when estimating sea ice freeboards from CryoSat-2.
Overall, we classified 170 (157 SAR and 13 SARIn mode) leads, along with 236 examples of good ice floes and 193 examples of noisy ice floes. The current data set is available in the supplementary materials, and the spatial distribution of leads is shown in Fig. 1. We obtained good coverage in the Central Arctic Ocean, but there were fewer classified leads over thin sea ice near the lower-concentration margins of the ice pack. In these marginal ice zone (MIZ) locations, it was more difficult to manually identify leads between thinner sea ice floes because the change in elevation at a lead was often obscured by background elevation variability (noise) over neighbouring sea ice floes. As a result, only 6% of the classified leads originated from areas of thin ice with freeboards <0.05 m.
The elevation, σ0, pulse peakiness (PP: maximum of the waveform divided by the average of all bins above a noise floor) and RIP peakiness (RP: maximum of the RIP divided by the average of all bins) of leads, good floes and noisy floes in the training/testing dataset were detrended using a 30-point (8.5-km) boxcar filter and are shown in Fig. 2. All four parameters show a clear change over the classified lead, with σ0, PP and RP showing a local increase while the elevation decreases. We also tend to observe a rise in elevation on either side of the lead. This is because adjacent radar footprints include brighter off-nadir reflections from the lead, which can cause the retracker to 'snag' and measure an incorrect elevation. Over good floes and noisy floes there was no clear pattern; however, the noisy floe samples exhibited considerably more variability in all parameters. For the SARIn data we observed the same overall patterns (Fig. 2); however, owing to the limited number of samples, the mean profiles were noisier. Other parameters, such as the stack standard deviation, stack kurtosis and waveform leading-edge width, were also investigated; however, the four parameters in Fig. 2 showed the clearest evidence of a lead.
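The parameter definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' processing code: it assumes that "bins above a noise floor" means bins whose power exceeds a given noise-floor value, and that the 30-point boxcar detrending subtracts a running mean that averages only over available samples at the track ends. The function names are hypothetical.

```python
import numpy as np


def pulse_peakiness(waveform, noise_floor):
    """PP: waveform maximum divided by the mean of bins above the noise floor."""
    wf = np.asarray(waveform, dtype=float)
    return wf.max() / wf[wf > noise_floor].mean()


def rip_peakiness(rip):
    """RP: maximum of the range integrated power divided by the mean of all bins."""
    rip = np.asarray(rip, dtype=float)
    return rip.max() / rip.mean()


def boxcar_detrend(series, n=30):
    """Subtract an n-point running mean (~8.5 km for n=30 at CryoSat-2 sampling).

    Dividing by the convolved ones-vector averages only over samples that exist
    near the track ends, instead of treating missing samples as zeros.
    """
    x = np.asarray(series, dtype=float)
    n = min(n, x.size)  # keep the output the same length as short tracks
    kernel = np.ones(n)
    trend = np.convolve(x, kernel, mode="same") / np.convolve(np.ones_like(x), kernel, mode="same")
    return x - trend


wf = [0.0, 0.0, 1.0, 10.0, 2.0, 1.0, 0.0]
print(pulse_peakiness(wf, noise_floor=0.5))  # 10 / mean([1, 10, 2, 1]), ~2.857
```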

Machine learning lead classification
We used a supervised 1D convolutional neural network (CNN) for our primary classification scheme. The 1D CNN is considered a deep form of learning because it allows us to use the data at any stage of processing, and its features are learned within the algorithm. CNNs, like other deep learning algorithms, are a series of layers in which the raw data are transformed (in this case by 1D convolutions) into increasingly meaningful representations of the data.
We used the detrended elevation, σ0, pulse peakiness and RIP peakiness over an 11-point (3-km) window centred on the classification point. The 11-point window allowed us to capture the lead classification signal without including too many data points from adjacent sea ice floes, as a larger window size could. We implemented the CNN in TensorFlow using the Keras API (https://www.tensorflow.org/api_docs/python/tf/keras) (Abadi et al., 2015). The CNN consisted of two 1D convolution layers with 16 filters each (filter size of 3 points), with a max-pooling layer to downsample the data between the two layers. The input was an 11 × 4 matrix comprising the 4 parameters (detrended elevation, σ0, pulse peakiness and RIP peakiness) over the 11-point window. The final output was a 3-element vector of the probability of each classification type. We used the 'softmax' activation on the final output layer, which enabled us to obtain the classification confidence, and the rectified linear unit ('relu') activation on the convolutional layers.
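The architecture described above can be sketched with the Keras API. This is an illustrative reconstruction, not the authors' code: the flatten and dense layers that map the convolutional features to the three class probabilities, the pooling size, the optimizer, the loss and the integer label encoding (0 = lead, 1 = good floe, 2 = noisy floe) are all assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, N_PARAMS, N_CLASSES = 11, 4, 3  # 11-point window x 4 detrended parameters


def build_lead_cnn() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW, N_PARAMS)),
        layers.Conv1D(16, kernel_size=3, activation="relu"),  # 11 -> 9 samples
        layers.MaxPooling1D(pool_size=2),                      # 9 -> 4 samples
        layers.Conv1D(16, kernel_size=3, activation="relu"),  # 4 -> 2 samples
        layers.Flatten(),                                      # 2 x 16 -> 32 features
        layers.Dense(N_CLASSES, activation="softmax"),         # lead / floe / noisy floe
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_lead_cnn()
windows = np.random.default_rng(0).normal(size=(5, WINDOW, N_PARAMS)).astype("float32")
probs = model.predict(windows, verbose=0)  # shape (5, 3); each row sums to 1
confidence = probs.max(axis=1)             # classification confidence per sample
```

Training would then call `model.fit` on labelled windows; the softmax output is what makes a per-sample confidence (the maximum class probability) available for the later 50% confidence filter.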
We split the 632 samples from the classification database into 568 training samples and 64 testing samples and ensured there were equal proportions of lead, floe and noisy floe classifications within each group. Overall, the 1D CNN correctly classified 80% of the testing data. Only 5% of samples that were actually floes or noisy floes were misclassified as leads. If we compared only the lead classification with all sea ice floes and did not distinguish between good and noisy floe types, the overall accuracy of the CNN increased to 90%. The accuracy of the classification scheme on the testing data only partially indicates how well it will perform on the complete CryoSat-2 observational record. Although the classification sample database was intentionally designed to represent as much of the overall population as possible, any part of the full observational record that is not represented in the training/testing database will not be reflected in the classification scheme's accuracy. Therefore, we further tested the CNN classification scheme visually by applying it to a series of coincident image tracks that contained no data used in the classification. Figs. 3 and 4 show two examples of the classification schemes in August and June, respectively. The 1D CNN performed well, and where there was a false positive, this tended to be a lower-confidence classification. It is notable in both cases, but particularly in June (Fig. 4), that a number of leads appearing in the SAR image are not detected by the CryoSat-2 classification scheme (and often do not show any discernible signal in the CryoSat-2 waveform parameters or elevation). This emphasizes how leads can easily be obscured in the CryoSat-2 data by noise during summer months, when strong reflections from melt ponds can mask the signal from a lead within the same footprint (Kwok et al., 2018). These 'errors of omission' will not significantly affect the estimation of sea ice freeboard.
However, it is important to note that melt-pond-covered ice floes erroneously classified as leads, which would bias derived freeboards low, represented just 5% of the misclassified samples.

Fig. 1. (a) Locations of classified floes (black circles) and noisy floes (black crosses) for the SAR and SARIn data, and of classified SAR leads (blue) and SARIn leads (red); the background map displays average sea ice concentration for the 2011-2020 study period. Examples of manually classified leads in Sentinel-1 SAR images (HH polarization) (b) for September in the Central Arctic Ocean and (c) for September near the ice margins, and (d) in a Landsat-8 optical image for July in the Central Arctic Ocean. The solid black line denotes the location of the coinciding CryoSat-2 track.
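The relationship between the three-class testing accuracy (80%) and the lead-versus-floe accuracy (90%) reported above can be made concrete with a short sketch. It assumes an integer label encoding (0 = lead, 1 = good floe, 2 = noisy floe) and a simple per-class random split; both are illustrative choices, not the authors' implementation.

```python
import numpy as np

LEAD, FLOE, NOISY_FLOE = 0, 1, 2  # assumed integer label encoding


def stratified_split(labels, test_fraction, seed=0):
    """Random split that keeps class proportions equal in the training and testing sets."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        test_idx.extend(idx[: int(round(test_fraction * idx.size))])
    is_test = np.zeros(labels.size, dtype=bool)
    is_test[np.array(test_idx, dtype=int)] = True
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)


def accuracies(y_true, y_pred):
    """Three-class accuracy, lead-vs-floe accuracy, and fraction of floes called leads."""
    three_class = np.mean(y_true == y_pred)
    true_lead, pred_lead = y_true == LEAD, y_pred == LEAD
    two_class = np.mean(true_lead == pred_lead)  # good and noisy floes merged
    floes_as_leads = np.mean(pred_lead[~true_lead])
    return float(three_class), float(two_class), float(floes_as_leads)


# Confusing good and noisy floes lowers only the three-class score.
y_true = np.array([LEAD, LEAD, FLOE, FLOE, NOISY_FLOE, NOISY_FLOE])
y_pred = np.array([LEAD, LEAD, NOISY_FLOE, FLOE, FLOE, NOISY_FLOE])
print(accuracies(y_true, y_pred))  # three-class ~0.67, two-class 1.0, floes as leads 0.0
```

Because good/noisy floe confusions count as errors only in the three-class score, merging the floe types can raise accuracy without any change in how leads are detected.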
We also tested the same training data on a decision tree (DT) classification algorithm. This allowed us to compare with a classification scheme that is commonly used and easy to visualize and interpret. Decision tree classification is essentially a set of questions inferred from the data that allows the model to predict the target variable. The main disadvantage is that decision tree classification is considered a shallower form of learning, as it requires us to extract several features of the data ourselves prior to training. Here we used the same detrended parameters and window size as the CNN classification, then calculated the standard deviation of the data over the window and the difference between the classification point and the average of the 10 other points in the window. This resulted in 8 different parameters (two features for each of the four detrended parameters) for the decision tree classification. These features allowed us to capture the local variation and quantify the noise level within a window using a relatively simple set of parameters. We used the scikit-learn decision tree package (https://scikit-learn.org) with a maximum tree depth of 7, using the Gini index to define how the classification questions are split. The decision tree classification had an accuracy of 88%, while 5% of the floe or noisy floe samples were again misclassified as leads. However, when we tested the data on the series of coincident tracks that contained no data used in the classification (Figs. 3 and 4), this method produced more false positives, which will likely bias the final freeboard estimates. It is also evident that the CNN and DT algorithms regularly classified different samples along these tracks as leads (Figs. 3 and 4), while missing many other leads visible in the imagery. We expect that the DT algorithm may not fully capture the change in signal when CryoSat-2 samples a lead, despite performing well on the test dataset.
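The hand-crafted decision tree features described above (window standard deviation, and centre-minus-neighbours difference, for each of the four detrended parameters) can be sketched as follows. This is an illustration under stated assumptions: the population standard deviation (`ddof=0`) and NaN padding at the track ends are our choices, and the function name is hypothetical.

```python
import numpy as np


def window_features(params: np.ndarray, half_width: int = 5) -> np.ndarray:
    """Decision tree features for each along-track sample.

    params: (n_samples, n_params) array of detrended parameters (elevation,
    sigma0, PP, RP). For each sample with a full 11-point window, returns the
    window standard deviation of each parameter, followed by the difference
    between the centre sample and the mean of the other 10 samples.
    Samples without a full window are left as NaN.
    """
    n, p = params.shape
    feats = np.full((n, 2 * p), np.nan)
    for i in range(half_width, n - half_width):
        win = params[i - half_width:i + half_width + 1]
        others = np.delete(win, half_width, axis=0)
        feats[i, :p] = win.std(axis=0)
        feats[i, p:] = params[i] - others.mean(axis=0)
    return feats


rng = np.random.default_rng(1)
track = rng.normal(size=(50, 4))      # stand-in for 50 detrended along-track samples
X = window_features(track)
valid = ~np.isnan(X).any(axis=1)      # keep samples with a complete window
print(X[valid].shape)                 # (40, 8)
```

The valid rows, together with the manual labels, could then be passed to scikit-learn's `DecisionTreeClassifier(max_depth=7, criterion="gini")`.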
Finally, we performed a point-to-point comparison of the freeboard measurements derived for all leads classified in the arbitrarily selected summer months of 2012. A point-to-point comparison is typically used to evaluate measurement consistency; here, a higher variability in freeboard measurements would indicate that more points are misclassified as leads. The method for deriving freeboard is described in the next section. We compared all pairs of freeboard points within 40 km and 7 days of each other and found that the paired freeboard measurements from the decision tree method and the CNN had standard deviations of 0.22 m and 0.14 m, respectively. This method is limited by the long time interval of the pairwise comparisons, which is reflected by the high standard deviations. However, the CNN had better consistency than the decision tree classification in this experiment, suggesting that the CNN misclassifies fewer leads.
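The pairwise comparison can be sketched as below, assuming freeboard points with projected coordinates in kilometres and times in days. Because each pair's difference has an arbitrary sign, the sketch summarizes the spread with the root-mean-square of the paired differences; this is our assumption about how a standard deviation of paired measurements could be computed, and the O(N²) pair enumeration is only practical for modest sample sizes.

```python
import numpy as np


def pairwise_consistency(x_km, y_km, t_days, fb_m, max_dist_km=40.0, max_dt_days=7.0):
    """Spread of freeboard differences over all pairs within max_dist_km and max_dt_days.

    Returns (rms_difference_m, n_pairs); the RMS is insensitive to pair ordering.
    """
    x, y, t, fb = (np.asarray(a, dtype=float) for a in (x_km, y_km, t_days, fb_m))
    i, j = np.triu_indices(fb.size, k=1)  # every unordered pair once
    close = (np.hypot(x[i] - x[j], y[i] - y[j]) <= max_dist_km) & (np.abs(t[i] - t[j]) <= max_dt_days)
    diffs = fb[i[close]] - fb[j[close]]
    spread = float(np.sqrt(np.mean(diffs**2))) if diffs.size else float("nan")
    return spread, int(close.sum())


# Only the first two points are within 40 km and 7 days of each other.
spread, n_pairs = pairwise_consistency([0, 10, 1000], [0, 0, 0], [0, 3, 0], [0.3, 0.5, 0.1])
print(spread, n_pairs)  # ~0.2 m from a single pair
```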

Radar freeboard calculation
The sea ice freeboard in winter months is conventionally calculated as the difference between sea ice floe elevations and the along-track sea level interpolated from proximal lead elevations, following appropriate range corrections (i.e., removing the mean sea surface height and atmospheric or tidal effects). This method performs well in the winter months, when there is a sufficient density of leads along-track; however, in the summer months detectable leads can be sparse, and using this method would require interpolating over long distances (hundreds of kilometres). Instead, we calculated one freeboard estimate at each lead point by interpolating the elevations from nearby sea ice floes (using points classified as both good floes and noisy floes). The mean ice floe elevation around a lead was calculated by fitting a 2nd-order polynomial over a span of 7 km centred on each lead point. This span included a sufficient number of floe elevation samples to obtain realistic freeboard measurements even when there was high local variability in elevation. We used robust fitting because the ice floe samples closest to the lead location can exhibit large, unrealistic elevation changes caused by snagging (e.g., Fig. 2). This involves iteratively recalculating the least-squares regression with a bisquare weighting that reduces the impact of outlying residuals from the previous fit iteration. The machine learning method can select several lead points in succession over gaps in the ice wider than a single CryoSat-2 footprint (>300 m); hence, we treat successively classified leads as one freeboard measurement. In some cases, the lead classification selected a point one sample along-track from the point where the local elevation was at a minimum. Thus, we also labelled the points on either side of classified lead samples as leads, to avoid calculating a freeboard that is not centred on this minimum (which is likely the correct lead location).
This means that every lead is treated as at least a set of three successive CryoSat-2 samples. We performed the fit on ice floe elevations centred over the middle point of the consecutive leads and used the largest estimate as the single freeboard measurement from that consecutive group of lead points. We exclude 7 km segments where more than 50% of the points are labelled as leads, because our machine learning algorithm was not trained on data with a low sea ice concentration. We did not include any point where the mean square error from the polynomial fit through ice floe surface heights (including noisy floes) was greater than 0.5 m to remove unreliable freeboard measurements.
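The lead handling and floe-surface interpolation described above can be sketched as follows. This is a simplified illustration, not the authors' code: the bisquare tuning constant (4.685), the MAD-based scale estimate and the iteration count are common iteratively reweighted least squares (IRLS) defaults that we assume here, the freeboard is evaluated at the middle sample of each lead group, and the 50% lead-fraction and mean-square-error filters are omitted.

```python
import numpy as np


def dilate_leads(is_lead):
    """Also label the samples on either side of each classified lead as leads."""
    out = is_lead.copy()
    out[1:] |= is_lead[:-1]
    out[:-1] |= is_lead[1:]
    return out


def lead_groups(is_lead):
    """(start, stop) index pairs, stop exclusive, for each run of consecutive leads."""
    edges = np.diff(np.concatenate(([0], is_lead.astype(int), [0])))
    return [(int(a), int(b)) for a, b in zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1))]


def robust_polyfit(x, y, deg=2, n_iter=20, c=4.685):
    """Polynomial fit by IRLS with Tukey bisquare weights."""
    w = np.ones_like(y)
    for _ in range(n_iter):
        # np.polyfit weights multiply the residuals, so pass sqrt of the bisquare weights.
        coef = np.polyfit(x, y, deg, w=np.sqrt(w))
        resid = y - np.polyval(coef, x)
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745  # robust sigma (MAD)
        if scale < 1e-9:  # residuals essentially zero: converged
            break
        u = resid / (c * scale)
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)  # outliers get zero weight
    return coef


def freeboard_at_lead(dist_km, elev_m, is_lead, lead_idx, span_km=7.0):
    """Floe surface interpolated to the lead position minus the lead elevation."""
    x0 = dist_km[lead_idx]
    floe = (np.abs(dist_km - x0) <= span_km / 2) & ~is_lead
    coef = robust_polyfit(dist_km[floe] - x0, elev_m[floe])
    return np.polyval(coef, 0.0) - elev_m[lead_idx]


# Synthetic track: quadratic floe surface, one classified lead, two 'snagged' samples.
dist = np.arange(30) * 0.3                      # ~300 m along-track spacing
elev = 0.25 + 0.02 * (dist - dist[15]) ** 2
is_lead = np.zeros(30, dtype=bool)
is_lead[15] = True
is_lead = dilate_leads(is_lead)                 # samples 14-16 become leads
elev[is_lead] = 0.0
elev[[13, 17]] += 1.0                           # snagging beside the lead
for start, stop in lead_groups(is_lead):
    mid = (start + stop - 1) // 2
    print(freeboard_at_lead(dist, elev, is_lead, mid))  # ~0.25 m
```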
We used inverse distance weighted gridding to produce 15-day freeboard fields at a cell size of 80 km. This cell size is more than three times coarser than the 25 km resolution of standard CryoSat-2 freeboard products developed for winter months, a consequence of the limited spatial density of valid freeboard observations from our summer processing scheme. We used a search radius of r = 80 km, with measurements weighted by 1/(1 + (3d/r)²), where d is the distance from the grid node. To remove anomalous measurements, we flagged cells where the difference between either σ0 or freeboard and the median of the adjacent cells (in the same grid and in the grids 15 days before and after) exceeded manually defined cutoff thresholds of 15 dB and 0.1 m, respectively. We also removed any cells where the average classification confidence was less than 50%, to filter out freeboards derived from points with a high probability of being misclassified.
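A minimal sketch of the gridding step, assuming observations and grid nodes in a common projected coordinate system in kilometres. The weight function and 80 km search radius follow the text; the anomaly and confidence filters are not included, and the brute-force loop over nodes is chosen for clarity rather than speed.

```python
import numpy as np


def idw_grid(obs_x, obs_y, obs_val, grid_x, grid_y, r_km=80.0):
    """Weighted mean of observations within r_km of each node, w = 1/(1 + (3d/r)^2)."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    out = np.full(gx.shape, np.nan)          # nodes with no nearby data stay NaN
    for idx in np.ndindex(gx.shape):
        d = np.hypot(obs_x - gx[idx], obs_y - gy[idx])
        near = d <= r_km
        if near.any():
            w = 1.0 / (1.0 + (3.0 * d[near] / r_km) ** 2)  # 1 at the node, 0.1 at d = r
            out[idx] = np.sum(w * obs_val[near]) / np.sum(w)
    return out


# Two observations 10 km either side of a node receive equal weight.
grid = idw_grid(np.array([10.0, -10.0]), np.array([0.0, 0.0]), np.array([0.2, 0.4]),
                np.array([0.0, 500.0]), np.array([0.0]))
# grid[0, 0] is ~0.3 m; the node at x = 500 km has no data within 80 km and is NaN.
```

This weighting decays smoothly rather than as a pure inverse power of distance, so an observation at the edge of the search radius still contributes one tenth of the weight of one at the node.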

Radar freeboard
Examples of the gridded radar freeboards and freeboard anomalies estimated for 2013 are shown in Figs. 5 and 6, respectively, and the freeboard climatology from 2011 to 2020 is shown in Fig. 7. We can produce freeboard maps for most of the summer; only July and early August occasionally exhibit a significant loss of coverage, when we expect melt pond coverage on the ice to be highest (Kwok et al., 2018), producing a higher proportion of noisy measurements. Despite the majority of training data coming from August and September, we could still obtain good coverage of valid freeboards in May and June. This is unexpected because we found it challenging to identify leads in the coincident SAR and optical images for this early summer period. However, there are similar levels of noise in the CryoSat-2 observations in May and June compared to later in the summer, so the nadir-looking altimeter appears to be less affected by the transition from a cold to a melting/saturated snowpack than off-nadir SAR imaging sensors (Mahmud et al., 2016; Scharien et al., 2014). We also lose coverage in areas around the margins of the sea ice pack, as is evident from the climatology (Fig. 7). We could not obtain valid freeboard observations in marginal sea ice zones with low ice concentration and the thinnest sea ice floes; for instance, our method is unable to resolve freeboards lower than about 4 cm. This is because we had to use local elevation changes as a key parameter of the classification scheme and therefore could not reliably include leads showing minimal elevation changes (below 4 cm) in the training dataset. Few reliable leads were identified in the marginal ice zone (Fig. 1), with the training dataset biased geographically towards the thicker sea ice pack resident in the Central Arctic.

Fig. 4. Example of the CNN and decision tree lead classification results for a coincident SAR image on 16 June 2018 in the Central Arctic Ocean, with the corresponding CryoSat-2 along-track data for (a) RIP peakiness (RP), (b) pulse peakiness (PP), (c) backscatter (σ0) and (d) residual elevation. (e) CNN classification, with classified leads in blue (confidence >50%) and red (confidence <50%). (f) Leads classified by the decision tree, in green. The lack of contrast between ice floes and leads is common for SAR images obtained in May and June, and is reflected by the high variability (noise) in the retracked CryoSat-2 elevations in (d).
The yearly radar freeboard maps and climatology (Figs. 5, 6 & 7) and time series (Fig. 8) show the expected seasonal evolution, with thickness decreasing in May, June and July and increasing in late August and September (Blanchard-Wrigglesworth and Bitz, 2014). The freeboard patterns evolve consistently over the summer, with relatively thinner and thicker areas of the ice cover generally persisting from one time interval to the next. By including April and October radar freeboard maps from a conventional 'winter' processing scheme, we see that the spatial patterns of the freeboard from the shoulder months of the summer processing scheme (May 1st-15th and Sept 16th-30th) closely match the winter data.
It is noticeable that the radar freeboards generally increase between April and May 2013, before declining thereafter (Figs. 5 and 6). This is a robust feature of the summer freeboard fields in every year of the record (Fig. 7), despite the fact that we do not expect significant sea ice growth during May (Lindsay and Schweiger, 2015). One likely explanation is that the scattering horizon of the CryoSat-2 Ku-band radar shifts systematically upwards within the snowpack between April and May. It is typically assumed in the winter processing schemes (Oct-April) that the Ku-band radar penetrates fully through the snow to the snow-sea ice interface (e.g., Laxon et al., 2003, 2013). However, studies have demonstrated that in certain conditions, when the temperature of the snow is higher (Willatt et al., 2009), or the snowpack contains significant layering or ice lenses (Willatt et al., 2011; Ricker et al., 2015), the principal Ku-band scattering horizon can shift upwards. It is very likely that the seasonal warming and melting of the Arctic snowpack on sea ice in May, particularly around the peripheral seas, would introduce moisture between snow grains and prevent the Ku-band radar from fully penetrating to the snow-ice interface. However, it is also possible that the processing scheme for summer months contributes to increasing radar freeboards in May. A direct comparison of the winter and summer processing schemes in April 2016 (Fig. 9) shows that radar freeboards from the summer processor are marginally thicker in the Arctic peripheral seas.
Fig. 5. CryoSat-2 summer (May-September) radar freeboards for 2013, with freeboards for April and October 2013 from a conventional 'winter' processing scheme by Landy et al. (2020). The grey areas represent the presence of sea ice but no valid ice thickness measurements.
The apparent differences between winter and summer processors are much clearer for the direct comparison in October 2015 (Fig. 9). We did not include newly formed sea ice in this comparison; however, summer-processed data still significantly overestimate the freeboards (<0.04 m) of thin sea ice around the edges of the Central Arctic ice pack compared to the winter-processed data. It is also notable that radar freeboards of the remaining MYI in October are generally underestimated by the summer processing scheme in comparison to the winter one (Fig. 9f), which we discuss further below. We do observe a clear correlation between the patterns of summer and winter freeboards in both April and October, despite these regions of over- and underestimation.

Validation against independent airborne and mooring observations
We use independent observations of the sea ice freeboard, draft, and thickness to evaluate the CryoSat-2 estimates of radar freeboard in different regions of the Arctic Ocean and periods of the summer melting season. These validation data have non-negligible uncertainties. To examine the radar freeboards directly, we derived airborne estimates of sea ice freeboard from the Operation IceBridge Arctic summer campaigns in the Chukchi Sea on 16th and 19th July 2016 and in the Lincoln Sea on 24th and 25th July 2017 (Fig. 10). We assume that most of the snow accumulated on sea ice floes over winter has melted by these midsummer dates, so we can compare airborne laser scanner freeboards directly to the CryoSat-2 radar freeboards without requiring corrections for snow penetration or loading. IceBridge Airborne Topographic Mapper (ATM) returns from sea ice and leads, which typically have an absolute elevation accuracy of about 10 cm, are classified with coinciding aerial photographs (Buckley et al., 2020) and used to derive a laser freeboard estimate along the flight track (details in the Supplementary Material). We averaged all valid observations along 7 km sections of the flight track and performed a point-to-grid comparison with the CryoSat-2 freeboard grids. For the 2016 campaign in the Chukchi Sea, the difference between the laser freeboards from the ATM and the CryoSat-2 radar freeboards is 0.02 ± 0.06 m (Fig. 10). However, for the 2017 campaign in the Lincoln Sea, the ATM laser freeboards are distributed at significantly higher values than the coinciding CryoSat-2 freeboards. We find that the airborne freeboards vary as a clear function of the sea ice surface roughness (Supp. Fig. S4). Separating the laser freeboards into smoother and rougher sea ice using an arbitrary roughness threshold of 0.35 m, we find the freeboards are underestimated by 0.10 ± 0.06 m and 0.20 ± 0.10 m below and above this threshold, respectively (Fig. 10).
This suggests that CryoSat-2 will underestimate the thickness of rougher ice more than that of smoother ice.
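The 7 km along-track averaging and the split by roughness can be sketched as follows. This is a minimal illustration, not the processing code used in this study: it assumes the along-track distance (km) and already-classified valid ATM freeboards are available as NumPy arrays, that each matched segment carries a single roughness value, and the function names `segment_means` and `bias_by_roughness` are hypothetical.

```python
import numpy as np


def segment_means(along_track_km, values, segment_km=7.0):
    """Average valid values within consecutive fixed-length along-track segments."""
    seg_id = np.floor(np.asarray(along_track_km) / segment_km).astype(int)
    ids = np.unique(seg_id)
    means = np.array([values[seg_id == i].mean() for i in ids])
    return ids, means


def bias_by_roughness(cs2_fb, atm_fb, roughness, threshold=0.35):
    """Mean and std of (CryoSat-2 minus ATM) freeboard below/above a roughness threshold (m)."""
    diff = np.asarray(cs2_fb) - np.asarray(atm_fb)
    smooth = np.asarray(roughness) < threshold
    return ((diff[smooth].mean(), diff[smooth].std()),
            (diff[~smooth].mean(), diff[~smooth].std()))
```

Negative mean differences indicate that CryoSat-2 freeboards are lower than the airborne freeboards, which is the sign convention used for the underestimation quoted above.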
Fig. 6. CryoSat-2 summer (May-September) radar freeboard anomalies for 2013, with winter freeboard anomalies included for April and October from a conventional 'winter' processing scheme by Landy et al. (2020). The data are relative to 2011-2020 averages.
We further compare the summer freeboards to airborne EM induction sounding observations of sea ice thickness (Krumpen et al., 2016) and ULS observations of sea ice draft from the Beaufort Gyre Exploration Program (BGEP) moorings (https://www.whoi.edu/beaufortgyre). These datasets are chosen to evaluate the spatial and temporal validity of the satellite observations, respectively. To make these comparisons we need to convert the CryoSat-2 radar freeboards to estimates of thickness and draft, respectively (e.g., Tilling et al., 2018). For both comparisons, we use a fixed sea ice density of 930 kg/m³ and do not include any snow/meltwater loading in the conversion. It is a reasonable approximation to ignore snow or meltwater loading for the mid-summer months of July and August, as we expect the snowpack to have mostly depleted by this point (Stroeve et al., 2020) and ponds to have drained to sea level (Eicken et al., 2004; Landy et al., 2014). However, we can expect significant snow loading in May and June, and some new snowfall accumulating towards the end of the summer season. Additionally, melt ponds on the sea ice surface significantly affect the freeboard-to-thickness calculation, and we do not account for them here (see discussion). Finally, we know sea ice density differs between first-year and multi-year sea ice (Alexandrov et al., 2010) and may vary throughout the summer. It is beyond the scope of this study to correct for these factors, but we have used a fixed, relatively high sea ice density to partially account for them. Developing realistic corrections for converting the new CryoSat-2 radar freeboards more accurately to summer sea ice thickness is an active area of our research.
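The conversion used for both comparisons follows hydrostatic equilibrium. With no snow or meltwater load, and treating the radar freeboard f as the ice freeboard, thickness is h = f ρw/(ρw − ρi) and draft is h − f = f ρi/(ρw − ρi). The sketch below is illustrative only; it assumes a seawater density of 1024 kg/m³, which is not stated in the text, and the function names are hypothetical.

```python
RHO_ICE = 930.0        # kg/m^3, fixed ice density used in the text
RHO_SEAWATER = 1024.0  # kg/m^3, ASSUMED value (not given in the text)


def thickness_from_freeboard(freeboard_m, rho_ice=RHO_ICE, rho_w=RHO_SEAWATER):
    """Hydrostatic thickness with no snow/meltwater load: h = f * rho_w / (rho_w - rho_i)."""
    return freeboard_m * rho_w / (rho_w - rho_ice)


def draft_from_freeboard(freeboard_m, rho_ice=RHO_ICE, rho_w=RHO_SEAWATER):
    """Draft below sea level: d = h - f = f * rho_i / (rho_w - rho_i)."""
    return freeboard_m * rho_ice / (rho_w - rho_ice)


h = thickness_from_freeboard(0.10)
d = draft_from_freeboard(0.10)
print(f"thickness = {h:.3f} m, draft = {d:.3f} m")  # prints: thickness = 1.089 m, draft = 0.989 m
```

The factor ρw/(ρw − ρi) ≈ 10.9 means that a 1 cm freeboard error maps to roughly 11 cm of thickness error under these assumptions.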
Fig. 7. CryoSat-2 summer (May-September) radar freeboard climatology for 2011-2020, with climatological freeboards for April and October from a conventional 'winter' processing scheme by Landy et al. (2020). The grey areas represent the presence of sea ice but no valid ice thickness measurements.
The EM dataset comprises data from the Alfred Wegener Institute (AWI) POLARSTERN ARK-XXVI/3 (TransArc) campaign in 2011 and the IceBird campaigns from 2016 to 2018. For the IceBird campaigns, the ice thickness was measured using the EM-bird sensor towed by a fixed-wing aircraft. Data are concentrated near the coast of Northern Greenland and the Fram Strait (see Fig. 10) and were recorded in late July and August. The TransArc campaign acquired helicopter-borne EM-bird ice thickness data in the Central Arctic Ocean in August and September. The EM-bird estimates sea ice thickness by measuring the electrical conductivity difference between ice and ocean water (Haas et al., 2009); the accuracy of these measurements is of the order of ±0.1 m over flat ice (Pfaffling et al., 2007; Haas et al., 2009) and can be reduced in the presence of melt ponds (Haas et al., 1997). As the airborne datasets have a spatial resolution of tens of meters, compared to the lower-resolution gridded CryoSat-2 data, we average the airborne data over a 10 km window and then perform a point-to-grid comparison between the down-sampled EM data and the CryoSat-2 thickness grids. There is high spatial variability in the comparison between sea ice thickness estimates from CryoSat-2 and EM surveys. If we assume the EM data represent a true reference for the ice thickness, the CryoSat-2 observations acquired around the coast of Northern Greenland and in the Lincoln Sea underestimate the ice thickness by 1.0 ± 0.4 m. In the Fram Strait, CryoSat-2 underestimates the sea ice thickness measured by the airborne EM sounder by 0.76 ± 0.4 m, while observations in the Central Arctic Ocean underestimate the ice thickness by 0.28 ± 0.2 m (Fig. 10). Despite using a higher ice density than we realistically expect for Central Arctic multi-year ice in August, the satellite observations still underestimate those from the independent airborne reference. Moreover, both the mean difference and the variability of the difference increase from thinner sea ice at the periphery towards the thickest ice in the Lincoln Sea. This pattern of bias matches the results from the direct comparison of CryoSat-2 radar freeboards with OIB airborne laser freeboards. The systematic nature of the bias indicates that it has a physical explanation, with the attendant possibility of correcting for it in the conversion from freeboard to thickness, which we discuss below.
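The point-to-grid step can be sketched as a simple cell lookup. This assumes, hypothetically, that the 10 km window means and the 80 km CryoSat-2 grid are expressed in the same projected x/y coordinates in km (for example a polar stereographic projection), with the lower-left grid corner at (x0, y0); the function name `point_to_grid` is illustrative and not part of any specific library.

```python
import numpy as np


def point_to_grid(x_km, y_km, grid, x0_km, y0_km, cell_km=80.0):
    """Return the value of the grid cell containing each point, or NaN outside the grid.

    grid is indexed as grid[row, col], with rows increasing in y and columns in x.
    """
    col = np.floor((np.asarray(x_km, dtype=float) - x0_km) / cell_km).astype(int)
    row = np.floor((np.asarray(y_km, dtype=float) - y0_km) / cell_km).astype(int)
    ny, nx = grid.shape
    inside = (row >= 0) & (row < ny) & (col >= 0) & (col < nx)
    out = np.full(col.shape, np.nan)
    out[inside] = grid[row[inside], col[inside]]
    return out
```

Subtracting the airborne window means from the returned CryoSat-2 values then gives per-window differences that can be summarised by region, as in the Lincoln Sea, Fram Strait and Central Arctic statistics above.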
The BGEP moorings have been maintained in the Beaufort Sea since 2003, monitoring freshwater and heat content in the Arctic Ocean, including the solid freshwater flux through observations of sea ice draft. ULS ice draft observations (with uncertainties of ±0.05-0.1 m; Krishfield and Proshutinsky, 2006) from Moorings A, B and D are available for the period between 2011 and 2018, coinciding with our CryoSat-2 sea ice thickness observations, enabling us to validate the magnitude and timing of the ice melting rates obtained from our new product. We also include estimates of the ice draft obtained from CryoSat-2 in winter months, using the LARM and SnowModel-LG sea ice thickness dataset (available from https://data.bas.ac.uk/full-record.php?id=GB/NERC/BAS/PDC/01257), which does include all relevant corrections for the snow load, ice type, etc. Satellite-derived ice drafts within a radius of 150 km around each mooring are compared against a 31-day rolling average of daily measurements of the mean ice draft from the mooring ULS (excluding draft measurements <5 cm).
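The mooring matching described above can be sketched with a spherical (haversine) distance for the 150 km search radius and a centred 31-day running mean of the daily ULS drafts that ignores values below 5 cm. The Earth radius, the use of a centred window and the requirement of at least 31 daily values are assumptions of this sketch, not details given in the text; function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(np.asarray(lon2, dtype=float) - np.asarray(lon1, dtype=float))
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def within_radius(lat, lon, mooring_lat, mooring_lon, radius_km=150.0):
    """Boolean mask of satellite samples inside the search radius around a mooring."""
    return haversine_km(lat, lon, mooring_lat, mooring_lon) <= radius_km


def uls_rolling_mean(daily_draft_m, window_days=31, min_draft_m=0.05):
    """Centred running mean of daily ULS drafts, skipping NaNs and drafts below min_draft_m."""
    d = np.asarray(daily_draft_m, dtype=float)
    if d.size < window_days:
        raise ValueError("need at least window_days daily values")
    valid = np.isfinite(d)
    valid[valid] = d[valid] >= min_draft_m
    kernel = np.ones(window_days)
    sums = np.convolve(np.where(valid, d, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

The length check matters because `np.convolve(..., mode="same")` returns an array as long as the longer of its two inputs, so a series shorter than the kernel would otherwise come back with the wrong length.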
The CryoSat-2 observations appear to capture the timing of sea ice growth and melt cycles in the Beaufort Sea very closely (Fig. 11). The satellite draft estimates in summer exhibit considerably more scatter than the winter estimates but still clearly capture the sea ice thinning and decay between May and September, and in many cases the measurements at the shoulder points between the winter and summer processing schemes match closely. We also observe similar interannual variation in the magnitude of ice thickness. For example, in 2013 and 2014, the ULS recorded lower summer melting rates and thicker sea ice remaining at the end of the melt season (Tilling et al., 2015), and this is reflected in the satellite observations (Figs. 6 and 11). However, there are some disparities between the CryoSat-2 and ULS data. In some winters, the CryoSat-2 draft estimates from our conventional processor do not match the ULS well, for instance between November 2014 and April 2015 at Mooring D, which provides some evidence for the radar scattering biases highlighted by Khvorostovsky et al. (2020). Despite the assumed high sea ice density, CryoSat-2 ice draft estimates regularly underestimate those from the ULS in May and June, indicating that it is still necessary to account for the snow load into the early melting season. In a few years, the satellite-derived drafts consistently underestimate the ULS for the entire summer (e.g., 2013 at Moorings B and D), which may reveal a bias similar to that identified in the Lincoln Sea through comparison with the IceBridge ATM and AWI AIREM data. Overall, the correlation coefficients between CryoSat-2 and ULS sea ice drafts, for data between May and September only, are 0.76, 0.63 and 0.62 for Moorings A, B and D, respectively. The mean bias and standard deviation of the bias are −0.13 ± 0.45 m, −0.33 ± 0.52 m, and −0.29 ± 0.51 m for Moorings A, B and D, respectively.
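The summary statistics quoted above (correlation, mean bias and standard deviation of the bias for May to September only) can be computed as in the following sketch. It assumes matched CryoSat-2 and ULS drafts on common dates, each with a month number; the use of the sample standard deviation (`ddof=1`) and the function name are assumptions, not details from the text.

```python
import numpy as np


def draft_comparison_stats(cs2_draft, uls_draft, months, summer=(5, 6, 7, 8, 9)):
    """Pearson r, mean bias and std of bias (CryoSat-2 minus ULS) for summer months only."""
    cs2 = np.asarray(cs2_draft, dtype=float)
    uls = np.asarray(uls_draft, dtype=float)
    keep = np.isin(months, summer) & np.isfinite(cs2) & np.isfinite(uls)
    diff = cs2[keep] - uls[keep]
    r = np.corrcoef(cs2[keep], uls[keep])[0, 1]
    return r, diff.mean(), diff.std(ddof=1)
```

With this sign convention, a negative mean bias means CryoSat-2 drafts are thinner than the ULS drafts, as reported for all three moorings.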

Potential sources of radar freeboard bias
Comparisons with winter radar freeboard maps, airborne laser scanner, EM and mooring-based ULS observations identify clear limitations of the current summer radar freeboard maps. For example, inter-comparing the summer and winter processing schemes in October indicates that the summer processing cannot resolve radar freeboards thinner than 4 cm (Fig. 9); thus, areas of the thinnest sea ice are either overestimated or unmeasured. In contrast, applying a physically based radar retracking approach enables the detection of radar freeboards as low as 2-3 cm in October for the winter processing scheme. The inter-comparison between summer and winter processors also reveals that the thickness of the thickest multi-year sea ice is typically underestimated by the summer scheme (Fig. 9). This is confirmed by the airborne laser freeboard and EM observations, which demonstrate that CryoSat-2 underestimates the thickness of late-summer sea ice in the Lincoln Sea, with the magnitude of the difference increasing as the sea ice gets thicker and rougher (Fig. 10). These biases in summer radar freeboards likely arise from a range of factors that can be broadly grouped into three categories: the classification scheme, the elevation measurement of the radar altimeter, and the conversion from radar freeboard to ice thickness.
For (e) we mask out areas of newly formed ice where the sea ice concentration was less than 5% in September and larger than 5% in October.
Ideally, we would not have to use local elevation change as a parameter in the classification scheme, as this can constrain the magnitude of detectable freeboards. However, we found that the elevation change was vital for separating specular leads from melt ponds (Fig. 2), so the parameter had to be included in the summer classifier. As we did not have enough testing/training samples from the marginal ice zone, where the ice tends to be at its thinnest, we could not robustly classify leads that displayed a small elevation change. This is apparent when we compare the winter and summer processing schemes in October: the thickness of new sea ice, which our classifier has not been trained on, is truncated (Figs. 8 & 9). Additionally, any misclassification will potentially result in an underestimation of the freeboard. For example, ice floes misclassified as leads raise the apparent sea surface and therefore reduce the freeboard. These represent only 5% of the misclassified samples when compared to the testing data. In contrast, misclassification of leads as ice floes, caused by strong reflections from melt ponds that mask the signal from a lead (which may result in an erroneous ice floe elevation measurement), is more frequent. However, this will only impact the freeboard calculation if the misclassified points are used in the polynomial fit to the ice floe elevation. To test the effect of misclassification, we randomly selected 10% of leads and purposefully misclassified them as floes (which is intentionally larger than what is indicated in the training data test). To maintain the same number of leads in the data, we then randomly selected the same number of ice floes and misclassified them as leads. We repeated this 100 times over the arbitrarily selected month of August 2018 and found an average underestimation of the freeboard of 0.3 ± 0.2 cm when compared to the 'correctly' classified data. While we cannot fully quantify the bias, as we are unable to accurately determine how much data are misclassified over the entire dataset, the results indicate that misclassification will not contribute significantly to the underestimation of the freeboard, especially as we chose a larger percentage of misclassified leads than the training data test indicates.
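The misclassification experiment can be illustrated with a simplified Monte Carlo sketch. To keep it short, freeboard is taken here as the mean floe elevation minus the mean lead elevation. This is a deliberately crude stand-in (an assumption, not the local polynomial fit used in the study) in which every swapped sample affects the result, so it will exaggerate the sensitivity relative to the method described above. Function names are hypothetical.

```python
import numpy as np


def simple_freeboard(elev, is_lead):
    """HYPOTHETICAL simplified estimator: mean floe elevation minus mean lead elevation."""
    return elev[~is_lead].mean() - elev[is_lead].mean()


def misclassification_test(elev, is_lead, frac=0.10, n_trials=100, seed=0):
    """Swap frac of leads to floes (and as many floes to leads), repeat, return mean/std change."""
    rng = np.random.default_rng(seed)
    baseline = simple_freeboard(elev, is_lead)
    lead_idx = np.flatnonzero(is_lead)
    floe_idx = np.flatnonzero(~is_lead)
    n_swap = int(round(frac * lead_idx.size))
    diffs = np.empty(n_trials)
    for k in range(n_trials):
        perturbed = is_lead.copy()  # keep the caller's labels unchanged
        perturbed[rng.choice(lead_idx, n_swap, replace=False)] = False  # leads -> floes
        perturbed[rng.choice(floe_idx, n_swap, replace=False)] = True   # floes -> leads
        diffs[k] = simple_freeboard(elev, perturbed) - baseline
    return diffs.mean(), diffs.std()
```

Because the same number of samples is swapped in each direction, the total lead count is preserved in every trial, mirroring the design of the experiment described in the text.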
The elevation measurement of the radar altimeter is another significant source of uncertainty. Radar altimetry for estimating sea ice freeboard relies on accurate detection of the mean level of ice floe surfaces. If the principal scattering horizon of the radar is not located at the same height as the mean ice floe surface, the altimeter range measurement will be biased (e.g., Armitage and Ridout, 2015; Ricker et al., 2015). It is well understood, for example, that the troughs of ocean waves reflect the nadir Ku-band radar altimeter pulse more effectively than crests (Melville et al., 1991), lowering the centroid of the scattering horizon with respect to mean sea level. This is known as the electromagnetic (EM) sea state bias, which increases with wind speed (i.e., sea surface roughness) up to around 40 cm (Tran et al., 2010). There is some evidence for a similar bias over sea ice floes in winter; for instance, Xia and Xie (2018) found that CryoSat-2 increasingly underestimates the freeboard of coincident OIB observations as the sea ice gets thicker. Moreover, Arctic sea ice floe echoes are generally specular in the summer months (Kwok et al., 2018). This indicates that the waveform peak power will frequently be referenced to the surface of reflecting ponds. If the pond surfaces lie below the mean ice floe surface elevation, an EM bias will be added to the range measurement over ice floes. This type of EM bias over summer sea ice floes with mixed bare-ice and ponded surfaces would generally bias the range high, and thus the freeboard low, and would be larger over rougher sea ice (equivalent to the sea state bias). This EM bias could be a major contributor to CryoSat-2 underestimating ice thickness relative to the independent airborne observations (Fig. 10).
Our comparisons with the ATM laser freeboards and airborne EM thickness data support this argument, showing a bias that increases from marginal sea ice in the Central Arctic (2011 data) to Fram Strait (2016) into the roughest, oldest MYI in the Lincoln Sea (2017-2018) (Fig. 10). Since the sea ice roughness is higher for thicker ice (Supp. Fig. S4), we can expect the bias to also be larger for thicker sea ice floes. This mixed-footprint scattering bias is not currently accounted for in the SAMOSA+ retracker over sea ice nor any other conventional retracking algorithm for winter or summer processing.
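The proposed mechanism can be illustrated with a deliberately simple toy model. Its ingredients are assumptions of this sketch, not a model used in the study: facet heights within a footprint are Gaussian with standard deviation `sigma` (a proxy for roughness), ponds occupy the lowest 30% of facets, and pond facets return ten times more power than bare ice. The power-weighted mean height, used here as a crude proxy for the effective scattering horizon, then sits below the area-weighted mean surface.

```python
import numpy as np


def toy_scattering_bias(sigma, pond_fraction=0.3, pond_gain=10.0, n=100_000, seed=1):
    """Offset (m) of the power-weighted mean height from the mean surface in a toy footprint.

    ASSUMPTIONS: Gaussian facet heights, ponds fill the lowest facets, ponds reflect
    pond_gain times more power than bare ice. Negative values mean a scattering
    horizon below the mean floe surface, i.e. a range biased long and a freeboard biased low.
    """
    rng = np.random.default_rng(seed)
    z = sigma * rng.standard_normal(n)            # facet heights relative to the surface (m)
    threshold = np.quantile(z, pond_fraction)     # water collects in the lowest facets
    weights = np.where(z <= threshold, pond_gain, 1.0)
    return np.average(z, weights=weights) - z.mean()


smooth = toy_scattering_bias(0.2)
rough = toy_scattering_bias(0.4)
```

Because the same random draws are rescaled, doubling `sigma` exactly doubles the offset in this toy. Real waveforms and retrackers such as SAMOSA+ will not behave this simply; the sketch only shows why a pond-weighted horizon would deepen with surface roughness.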
The range to the sea surface may also be underestimated at leads, as the CryoSat-2 footprint will generally be larger than a lead and the radar is sensitive to specular scatterers covering just 1% of the SAR-limited footprint (Kwok et al., 2018). The radar returns classified as leads may comprise reflections from ponds located closer to the nadir point than a nearby lead, causing the retrieved sea surface elevation to be biased high and the freeboard to be underestimated. We have investigated this bias by comparing CryoSat-2 data processed at an 80 Hz posting rate with data at the standard 20 Hz rate. The 80 Hz data have an along-track footprint of ~80 m, compared to ~320 m for the 20 Hz data (with the same across-track footprint), and thus will be less susceptible to reflections that do not originate from the lead. We compared the retracked elevations of all lead samples in the testing/training dataset at 20 and 80 Hz and found a median difference (20 Hz minus 80 Hz) of −5 mm. Therefore, we do not expect mixed pond/lead reflections to be a major source of the apparent sea ice thickness underestimation.
Comparison of CryoSat-2 sea ice thickness observations with airborne EM thickness measurements. The CryoSat-2 thickness estimates were obtained from radar freeboards assuming no snow/meltwater loading and a fixed sea ice density of 930 kg/m³.
The final potential source of uncertainty is in the conversion from radar freeboard to ice thickness. Alexandrov et al. (2010) provide a range of densities for multi-year ice in winter of 720 to 910 kg/m³. One set of ice core observations from level multi-year ice found a mean and one-sigma ice density of 887 ± 20 kg/m³ (Eicken et al., 1995), but the authors acknowledge regular desalination during sampling. If sea ice floes in summer are completely permeable, then air pockets below sea level will be filled with ocean water (although this will vary regionally and between new and multi-year ice). Thus, the actual density of the liquid-filled sea ice below sea level might be much higher than desalinated ice core observations suggest. In this analysis, we used a relatively high fixed ice density of 930 kg/m³, which would tend to overestimate ice thickness. Hence, it is unlikely that the underestimation of ice thickness seen in the comparisons with the EM and mooring-based ULS observations is predominantly due to the conversion from radar freeboard. Moreover, this bias is already present in the freeboard comparison, before any conversion to ice thickness. Nevertheless, any local or regional difference in sea ice density will likely contribute additional uncertainty to the ice thickness calculation. To account for this we would require an improved understanding of the constraints on summer sea ice density.
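To illustrate why the chosen density matters, the no-load hydrostatic relation and its sensitivity to ice density are written out below, with f_i the ice freeboard, h_i the ice thickness, and ρ_i and ρ_w the ice and seawater densities. The seawater density ρ_w = 1024 kg/m³ used for the numbers is an assumption not stated in the text.

$$
h_i = \frac{\rho_w}{\rho_w - \rho_i}\, f_i ,
\qquad
\frac{\partial h_i}{\partial \rho_i} = \frac{\rho_w}{(\rho_w - \rho_i)^2}\, f_i
$$

For f_i = 0.10 m this gives h_i ≈ 1.09 m with ρ_i = 930 kg/m³, 0.90 m with 910 kg/m³, 0.75 m with 887 kg/m³ and 0.34 m with 720 kg/m³. Under these assumptions the fixed value of 930 kg/m³ sits at the thick end of the plausible range, and the sensitivity grows rapidly as ρ_i approaches ρ_w.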
Fig. 11. Comparison of sea ice draft measured by the Beaufort Gyre Exploration Programme Mooring ULS (Upward Looking Sonar) with ice draft estimates by CryoSat-2 in a 150 km radius surrounding each mooring. CryoSat-2 draft observations in winter are from the LARM algorithm. CryoSat-2 draft observations in summer are obtained by converting radar freeboard to thickness assuming no snow/meltwater loading and a fixed sea ice density of 930 kg/m³. BGEP Mooring A is located at approximately 75°N 150°W, Mooring B at 78°N 150°W, and Mooring D at 74°N 140°W; all are shown in Fig. 10.
Residual snow load on the sea ice, or melt pond water accumulated on the ice above sea level, must also be accounted for in the conversion to thickness. This is unlikely to have affected our comparison of CryoSat-2 with the AEM data, as the sea ice north of Greenland in August does not support a significant snow load (Stroeve et al., 2020). However, it is clear from our comparison to the mooring ULS observations that sea ice draft is regularly underestimated in May-June (Fig. 11) because a residual snow load has not been corrected for, while in July this may be an additional source of uncertainty in the comparison to the ATM data.
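For the early-summer case, the standard hydrostatic relation with a snow layer of depth h_s and density ρ_s, h_i = (ρ_w f_i + ρ_s h_s)/(ρ_w − ρ_i), shows why omitting the load biases thickness (and draft) low. The sketch assumes ρ_w = 1024 kg/m³ and ρ_s = 300 kg/m³, illustrative values not taken from the text, and works with ice freeboard. Deriving ice freeboard from radar freeboard beneath a snowpack needs further corrections that are not shown here.

```python
RHO_W = 1024.0  # kg/m^3, ASSUMED seawater density
RHO_I = 930.0   # kg/m^3, fixed ice density used in the text
RHO_S = 300.0   # kg/m^3, ASSUMED snow density (illustrative only)


def thickness_with_snow(ice_freeboard_m, snow_depth_m, rho_s=RHO_S, rho_i=RHO_I, rho_w=RHO_W):
    """Hydrostatic ice thickness including a snow load: (rho_w*f + rho_s*h_s) / (rho_w - rho_i)."""
    return (rho_w * ice_freeboard_m + rho_s * snow_depth_m) / (rho_w - rho_i)


no_snow = thickness_with_snow(0.10, 0.0)     # about 1.09 m
with_snow = thickness_with_snow(0.10, 0.20)  # about 1.73 m
```

In this example, ignoring a 20 cm snowpack would understate thickness by roughly 0.64 m, which is consistent in sign with the May-June draft underestimation at the moorings.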

Capturing regional and interannual freeboard variability
Regional patterns of the sea ice radar freeboard derived from CryoSat-2 in summer months seem realistic. For instance, the distributions of the freeboard in the shoulder months of May and September closely match those obtained from a conventional radar altimetry processing scheme in April and October, respectively (Fig. 7). We can observe ice freeboard anomalies persisting through the summer in the same locations, which we would not expect if noise exceeded the CryoSat-2 freeboard signal. In April 2013, negative thickness anomalies are present across the Central Arctic with positive anomalies in the marginal seas, and these patterns persist in our new dataset until mid-June (Fig. 6). It has been shown that the summer of 2013 was anomalously cool, with 5% fewer melting days compared to the 1980-2014 average (Tilling et al., 2015), and involved reduced export of sea ice to the North Atlantic due to atmospheric circulation patterns (Lei et al., 2018) and strong ice convergence against the Canadian Arctic coastline (Kwok, 2015). The freeboard maps in Fig. 6 suggest the switch between negative and positive Central Arctic thickness anomalies occurred in July, with much thicker-than-usual sea ice persisting throughout August and September into the next winter (Tilling et al., 2015). The yearly time series of regional sea ice thickness evolve smoothly throughout the summer, capturing the thinning and advection of thicker sea ice out of the Arctic basin from May/June to August, through ice melt, and the early stages of ice regrowth in September (Figs. 8 & 11).
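The anomaly maps (Fig. 6) are described as relative to 2011-2020 averages. One minimal way to compute such anomalies per half-month period and grid cell, while ignoring cells without valid freeboards, is sketched below. The array layout (years, periods, y, x) with NaN for missing data and the function name are assumptions of this sketch.

```python
import warnings

import numpy as np


def freeboard_anomalies(fields):
    """Anomalies relative to the multi-year mean of each period and grid cell.

    fields: array of shape (n_years, n_periods, ny, nx), NaN where no valid freeboard.
    """
    with warnings.catch_warnings():
        # nanmean warns for cells with no valid data in any year; those stay NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        climatology = np.nanmean(fields, axis=0)
    return fields - climatology[np.newaxis, ...]
```

Cells that are missing in a given year remain NaN in that year's anomaly map, matching the grey "no valid measurement" areas in the figures.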
The comparisons between CryoSat-2 estimates of sea ice draft and those observed directly at the BGEP mooring ULS sensors confirm that the satellite can accurately resolve the full annual time series of sea ice growth and decay (at least in the Beaufort Sea; Fig. 11). The volume, type, and seasonal evolution of sea ice in the Beaufort Sea vary considerably from year to year (Babb et al., 2019, 2020), making it a challenging region for altimetry-based sea ice thickness retrievals (Khvorostovsky et al., 2020). In most years, the timing of the transitions between ice growth and melt in May/June, and vice versa in September, is correct, albeit drafts are regularly underestimated by CryoSat-2 in early summer without correcting for snow loading. Most encouragingly, the satellite accurately captures interannual variations in the sea ice draft remaining at the end of the summer melting season, including the anomalously thick ice in 2013 and 2014. These two years of lower-than-usual summer melting (Tilling et al., 2015) also stand out in regional time series of CryoSat-2 radar freeboards (Fig. 11), with an anomalously strong and early rebound of freeboard in the Central Arctic in August and September 2013 (Fig. 8). In contrast, CryoSat-2 also resolves the rapid melt and thinning of sea ice in July and August 2012, particularly apparent in the Beaufort and Chukchi Seas (Figs. 8 & 11), that led to the lowest ever recorded September ice extent (Parkinson and Comiso, 2013).
Although the radar freeboard fields can capture realistic spatiotemporal patterns of summer sea ice thickness variability, they still have some limitations. For example, a conventional winter waveform classification scheme identifies 309,000 and 155,000 leads within the region with sea ice concentration >70% north of 65°N, for October 2015 and April 2016, respectively. In contrast, our summer classification scheme identifies only 34,000 and 25,000 leads, respectively, for the same months. The machine learning algorithm naturally produces a conservative classification for leads to get an accurate match between training and testing data, which excludes a lot of leads (see Figs. 3 and 4). This is desirable because commission errors in the classification (i.e., pond-covered ice floes erroneously classified as leads) would bias the radar freeboards. Significantly more CryoSat-2 observations are also omitted during summer months due to noise (Fig. 2), which is introduced by off-nadir snagging of the radar by melt ponds and occasionally by SAMOSA+ retracking of echo side lobes (no Hamming or other weighting has been applied). The combined impact of noise and the classifier excluding leads is that we can only obtain valid freeboards at relatively low resolution (80 km), while significant regions of the ice cover can at times be completely missing data, for instance, parts of the Western Arctic in July 2013 (Fig. 5).

Prospects for improvement
Two features of the classification scheme could potentially be improved. Radar freeboards are overestimated in the marginal ice zone because the classifier has been trained on samples geographically biased to the region with high sea ice concentration (Fig. 1). The training samples for leads include significant local variations in retracked elevation that generally represent thicker sea ice floes adjacent to leads in the Central Arctic (Fig. 2). Therefore, we require further training samples representing leads in zones of thinner and less concentrated sea ice at the ice pack margins. It may be possible to use the radar waveform or the full stack of single-look echoes for each CryoSat-2 sample in a deeper machine learning classifier, rather than local profiles of derived parameters as we have used here (Fig. 2). However, considerably more training/testing data are required for the algorithm to learn patterns in lower-level waveform observations, and the classifier may be prohibitively slow. There will be value in testing whether fully focused (FF-) SAR processing (Egido and Smith, 2016) can reduce noise and improve the classification by narrowing the along-track footprint to tens of meters; however, we found little improvement using 80 Hz versus 20 Hz posted CryoSat-2 data. The prospective Copernicus Polar Ice and Snow Topography Altimeter (CRISTAL) mission will use open-burst SARIn mode over sea ice, with multiple radar frequencies. FF-SAR processing and off-nadir lead/melt pond detection could enable improved lead height estimation and noise removal with CRISTAL during Arctic summer months.
With the twin Sentinel-3A and -3B SAR altimeters offering coverage up to a latitude of 81.5°N since 2018, it may be possible to improve both the resolution and precision of the summer ice freeboard grids and reduce areas of missing observations by combining all three sensors (Lawrence et al., 2019). This will be particularly valuable in June and July, when CryoSat-2 observations are more frequently missing. A significant area of the ice cover lies below 81.5°N (Fig. 7), which may allow us to obtain more testing/training samples of the marginal ice zone. Since March 2020 the RADARSAT Constellation Mission (RCM) has routinely covered almost the entire pole (up to 90°N), which should enable us to identify a larger training database in future. Early research also indicates that it might be possible to measure sea ice freeboard with NASA's ICESat-2 laser altimeter during Arctic summer months, following continued developmental work (Tilling et al., 2020). With CryoSat-2 now on a migrated orbit, operating alongside ICESat-2 to provide more than 20 long coinciding profiles every month in the Cryo2Ice campaign, there may be future opportunities to intercompare radar and laser freeboards over summer sea ice.

Conclusions
In this study we have presented the first estimates of pan-Arctic summer sea ice freeboard from a satellite radar or laser altimeter. The ten-year record covers May to September, between 2011 and 2020, and stitches together a time series of observations from the CryoSat-2 radar altimeter that have, so far, been limited to Arctic winter months.
Meltwater ponds accumulating at the surface of sea ice floes in summer present the major obstacle to deriving valid freeboards during summer months. These ponds prevent conventional radar altimeter waveform classification schemes from accurately separating ice floes from leads. Here we identified almost 350 optical and SAR images acquired within 15 min of, and coinciding in space with, CryoSat-2 passes in order to verify the surface type ('good' sea ice floe, 'noisy' floe, or lead) of around 600 coinciding CryoSat-2 footprint samples. Samples were split into two groups for training and testing a deep learning 1D CNN classification algorithm based on local variations of four CryoSat-2 parameters. The overall accuracy of the algorithm was ~80%, increasing to 90% when considering only the distinction between all sea ice floes (good and noisy) and leads, with only 5% of ice floes misclassified as leads. The final algorithm can classify the surface type, with an estimate of confidence, for any CryoSat-2 observation over sea ice without requiring external information.
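The classifier input described above, local along-track variations of four CryoSat-2 parameters (RIP peakiness, pulse peakiness, backscatter and residual elevation, as in Fig. 4), could be assembled into fixed-length windows for a 1D CNN roughly as follows. The window half-width of 10 samples and the function name are hypothetical; the text does not state the window length or any normalisation applied. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

HALF_WIDTH = 10  # HYPOTHETICAL half-window in samples; not specified in the text


def local_profiles(rip_peakiness, pulse_peakiness, sigma0, residual_elev, half=HALF_WIDTH):
    """Cut a centred along-track window around each sample for the four parameters.

    Returns shape (n_samples - 2*half, 2*half + 1, 4); window k is centred on sample k + half.
    """
    x = np.stack([rip_peakiness, pulse_peakiness, sigma0, residual_elev], axis=-1)  # (n, 4)
    windows = sliding_window_view(x, window_shape=2 * half + 1, axis=0)  # (n - 2h, 4, 2h + 1)
    return np.moveaxis(windows, -1, 1)  # (n - 2h, 2h + 1, 4): steps x channels
```

Samples within `half` of the start or end of a track get no window in this sketch; how edge samples were handled in the study is not stated.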
The classifier was applied to all CryoSat-2 SAR and SARIn mode observations north of 65°N, retracked with the SAMOSA+ algorithm through the ESA GPOD service. Sea ice radar freeboards were estimated from the mean elevation of ice floes around each classified lead and gridded to 80-km bi-monthly fields, accounting for observation uncertainties. Valid radar freeboard fields could be obtained for all months of the summer, although the data in June and July occasionally exhibit significant loss of coverage owing to noise and a lack of leads. Freeboard fields in the shoulder months of May and September have patterns that closely resemble freeboards measured in April and October, respectively, with a conventional 'winter' processing scheme. However, the method cannot resolve freeboards thinner than around 4 cm: because the training data are confined to the central Arctic, the small elevation variations that characterize leads in thin, marginal sea ice are not recognized.
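As a rough illustration of the two steps described above, the sketch below estimates a radar freeboard at each classified lead as the mean floe elevation within an along-track window minus the lead elevation, then averages those estimates into 80 km grid cells. The window half-width, the use of the standard error of the floe mean as the uncertainty, the minimum-uncertainty floor, and the inverse-variance weighting are assumptions for this example, not the processor's documented settings. Positions are taken to be in a projected kilometre grid (for example, polar stereographic).

```python
import math
from collections import defaultdict


def freeboards_at_leads(track, half_window_km=5.0):
    """Radar freeboard at each lead along one ground track.

    track: list of (along_track_km, elevation_m, surface) tuples, where
    surface is "floe" or "lead" (e.g. the classifier's output).
    Returns a list of (along_track_km, freeboard_m, sigma_m) tuples.
    """
    floes = [(x, h) for x, h, s in track if s == "floe"]
    results = []
    for x_lead, h_lead, surface in track:
        if surface != "lead":
            continue
        nearby = [h for x, h in floes if abs(x - x_lead) <= half_window_km]
        if len(nearby) < 2:
            continue  # too few floes to estimate a mean and its spread
        mean_h = sum(nearby) / len(nearby)
        var = sum((h - mean_h) ** 2 for h in nearby) / (len(nearby) - 1)
        sigma = math.sqrt(var / len(nearby))  # standard error of the floe mean
        results.append((x_lead, mean_h - h_lead, sigma))
    return results


def grid_freeboards(obs, cell_km=80.0, min_sigma_m=0.01):
    """Inverse-variance weighted mean freeboard per grid cell.

    obs: iterable of (x_km, y_km, freeboard_m, sigma_m).
    Returns {(i, j): (mean_freeboard_m, sigma_of_mean_m)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum of w*f, sum of w]
    for x, y, fb, sigma in obs:
        w = 1.0 / max(sigma, min_sigma_m) ** 2  # floor avoids infinite weights
        cell = (math.floor(x / cell_km), math.floor(y / cell_km))
        sums[cell][0] += w * fb
        sums[cell][1] += w
    return {c: (swf / sw, 1.0 / math.sqrt(sw)) for c, (swf, sw) in sums.items()}
```

Inverse-variance weighting lets precise estimates (many consistent floes around a lead) dominate a cell, and the returned cell uncertainty shrinks as more independent leads contribute.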
Sea ice freeboards evolve as expected throughout the summer months, thinning rapidly between June and August before stabilizing and thickening slightly in September. The timing of the seasonal freeboard evolution and its interannual variations closely match those measured by independent ULS instruments mounted on the Beaufort Gyre Exploration Program moorings. CryoSat-2 freeboard observations also capture the distribution of airborne laser scanner freeboards measured by Operation IceBridge in the Chukchi Sea in July 2016. However, they underestimate IceBridge freeboards and airborne EMI thickness observations collected over the oldest Arctic sea ice in the Lincoln Sea, with a thickness bias that increases to up to 1 m over rougher ice. A likely source of this bias is that the radar overestimates the range to sea ice floes, because specular echoes from the surfaces of reflective melt ponds, which sit below the floe's mean level, dominate the returned signal. In theory, this electromagnetic (EM) ranging bias should increase with sea ice surface roughness.
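To make the roughness argument concrete, the toy model below represents a floe as facets whose heights follow a zero-mean Gaussian with standard deviation `roughness_m`, fills the lowest `pond_fraction` of the surface with ponds at a flat water level, and takes the "retracked" elevation as a power-weighted mean in which pond facets reflect `pond_gain` times more strongly than bare ice. All of these choices, including the default values, are illustrative assumptions; real SAR altimeter echoes and retrackers such as SAMOSA+ are far more complex. The sketch only shows why the bias should scale with roughness: every height in the model scales with `roughness_m`, so the offset between the floe's mean level and the pond-dominated elevation does too.

```python
from statistics import NormalDist


def toy_ranging_bias(roughness_m, pond_fraction=0.3, pond_gain=10.0, n_facets=1000):
    """Toy estimate of the elevation bias caused by bright, low-lying melt ponds.

    Returns (mean_level_m, effective_elevation_m, bias_m); a positive bias means
    the effective elevation sits below the floe's mean level, i.e. the range is
    overestimated and the freeboard underestimated.
    """
    gauss = NormalDist()  # standard normal, mean 0 and sigma 1
    # Deterministic facet heights at evenly spaced quantiles of N(0, roughness^2)
    heights = [roughness_m * gauss.inv_cdf((k + 0.5) / n_facets) for k in range(n_facets)]
    water_level = roughness_m * gauss.inv_cdf(pond_fraction)
    surface, weights = [], []
    for h in heights:
        if h < water_level:  # depression filled with meltwater
            surface.append(water_level)
            weights.append(pond_gain)  # assumed strong specular pond return
        else:
            surface.append(h)
            weights.append(1.0)  # diffuse bare-ice return
    mean_level = sum(surface) / len(surface)
    effective = sum(w * h for w, h in zip(weights, surface)) / sum(weights)
    return mean_level, effective, mean_level - effective


for sigma in (0.05, 0.10, 0.20):
    print(f"roughness {sigma:.2f} m -> toy bias {toy_ranging_bias(sigma)[2]:.3f} m")
```

In this toy setting, doubling the roughness doubles the bias, consistent with the expectation that the EM ranging bias is largest over rough multi-year ice.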
Our ongoing research will use external datasets and numerical modelling of radar echoes to develop the corrections required to convert radar freeboards to summer sea ice thickness, including corrections for the EM ranging bias and for residual snow loading in early summer. We anticipate that a new 10+ year record of Arctic sea ice thickness, spanning both the autumn-spring 'cold' season and the summer melt season, could be valuable for many polar applications. For instance, it could be assimilated into sea ice prediction systems to improve the skill of weekly to monthly summer ice forecasts (Bushuk et al., 2017), provide opportunities for estimating the sunlight reaching primary producers living in and under sea ice, and support active marine operations during Arctic summer months.
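For context on why freeboard errors matter so much, a common way to convert freeboard to thickness assumes hydrostatic equilibrium, h_i = (ρ_w F_i + ρ_s h_s) / (ρ_w − ρ_i), where F_i is the ice freeboard, h_s the snow depth, and ρ_w, ρ_i, ρ_s the densities of sea water, ice and snow. The sketch below applies it with a placeholder term for a future EM ranging-bias correction. The densities, the parameter names, and the assumption that the corrected radar freeboard equals the ice freeboard (no snow wave-speed correction) are illustrative assumptions, not the authors' planned processing.

```python
RHO_WATER = 1024.0  # kg m^-3, assumed sea water density
RHO_ICE = 917.0     # kg m^-3, assumed ice density (multi-year ice is often lower)
RHO_SNOW = 320.0    # kg m^-3, assumed density of residual snow


def hydrostatic_thickness(radar_freeboard_m, ranging_bias_m=0.0, snow_depth_m=0.0,
                          rho_ice=RHO_ICE, rho_snow=RHO_SNOW, rho_water=RHO_WATER):
    """Sea ice thickness (m) from freeboard assuming hydrostatic equilibrium.

    ranging_bias_m is a hypothetical correction added back to the radar
    freeboard; the corrected value is treated as the ice freeboard.
    """
    ice_freeboard = radar_freeboard_m + ranging_bias_m
    return (rho_water * ice_freeboard + rho_snow * snow_depth_m) / (rho_water - rho_ice)


# Sensitivity of thickness to freeboard under these assumed densities
print(RHO_WATER / (RHO_WATER - RHO_ICE))
print(hydrostatic_thickness(0.15), hydrostatic_thickness(0.15, ranging_bias_m=0.05))
```

With these densities, thickness changes by ρ_w / (ρ_w − ρ_i) ≈ 9.6 m per metre of freeboard, so a freeboard error of 0.1 m alone maps to roughly 1 m of thickness; a lower multi-year ice density would reduce this factor.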

Credit author statement
J.L. conceived the study. Both G.J.D. and J.L. wrote the paper. G.J.D. undertook the data analysis, and both G.J.D. and J.L. developed the methods. All authors commented on the manuscript.

Declaration of Competing Interest
All authors declare no conflict of interest.

Acknowledgments
This paper is a contribution to the UK Natural Environment Research Council (NERC) Project "PRE-MELT" under Grant NE/T000546/1. JL also acknowledges support from the European Space Agency Living Planet Fellowship "Arctic-SummIT" under Grant ESA/4000125582/18/I-NS and from the Centre for Integrated Remote Sensing and Forecasting for Arctic Operations (CIRFA) project through the Research Council of Norway (RCN) under Grant #237906. MT acknowledges support from ESA's "CryoSat+ Antarctic Ocean" under grant ESA AO/1-9156/17/I-BG, and MT and JL from the "EXPRO+ Snow" under grant ESA AO/1-10061/19/I-EF. The authors thank the SARvatore (SAR Versatile Altimetric Toolkit for Ocean Research & Exploitation) service available through ESA Grid Processing on Demand (GPOD) for providing Level 2 CryoSat-2 observations. This free-to-use service was invaluable for completing the objectives of our study. We acknowledge the use of RADARSAT-2 data and products © MDA Geospatial Services Inc. All Rights Reserved. RADARSAT is an official mark of the Canadian Space Agency. RADARSAT-2 data are available for a fee from Natural Resources Canada's Earth Observation Data Management System (www.eodms-sgdot.nrcan-rncan.gc.ca). We thank three anonymous reviewers for their comments.