A practical algorithm for the retrieval of floe size distribution of Arctic sea ice from high-resolution satellite Synthetic Aperture Radar imagery

In this study, we present an algorithm for summer sea ice conditions that semi-automatically produces the floe size distribution of Arctic sea ice from high-resolution satellite Synthetic Aperture Radar data. Currently, floe size distribution data from satellite images are very rare in the literature, mainly due to the lack of a reliable algorithm to produce such data. Here, we developed the algorithm by combining various image analysis methods, including Kernel Graph Cuts, distance transformation and watershed transformation, and a rule-based boundary revalidation. The developed algorithm has been validated against the ground truth that was extracted manually with the aid of 1-m resolution visible satellite data. Comprehensive validation analysis has shown both perspectives and limitations. The algorithm tends to fail to detect small floes (mostly less than 100 m in mean caliper diameter) compared to ground truth, which is mainly due to limitations in water-ice segmentation. Some variability in the power law exponent of floe size distribution is observed due to the effects of control parameters in the process of de-noising, Kernel Graph Cuts segmentation, thresholds for boundary revalidation and image resolution. Nonetheless, the algorithm, for floes larger than 100 m, has shown a reasonable agreement with ground truth under various selections of these control parameters. Considering that the coverage and spatial resolution of satellite Synthetic Aperture Radar data have increased significantly in recent years, the developed algorithm opens a new possibility to produce large volumes of floe size distribution data, which is essential for improving our understanding and prediction of the Arctic sea ice cover


Introduction
Determining the processes that govern the evolution of sea ice is critical to the improvement of predictive forecasts of sea ice conditions. A key parameter necessary for studying this evolution and validating the parameterization in sea ice models is the sea ice floe size distribution (FSD). Sea ice models estimate the FSD through floe breakup parameterizations (e.g., Williams et al. 2013a;Horvat and Tziperman, 2015;, and the parameterized FSD is then used to calculate thermodynamic melt (Steele, 1992;Zhang et al. 2016). This is a highly coupled process. It is essential that the in situ evolution of FSD be derived to verify the parameterized evolution of FSD and its impacts on sea ice dynamics and thermodynamics in the models (Williams et al. 2013b;Zhang et al. 2016). However, records of the evolution of the in situ FSD from satellite observations are rare.
The latest satellite Synthetic Aperture Radar (SAR) images cover large areas of the Arctic Ocean on a regular basis and/or on demand at spatial resolutions ranging from sub-meter to hundreds of meters. Satellite SAR has also generated a vast historical inventory of images since the 1990s (or since 2007 for high-resolution images), and an increasing number of high-resolution (~ 3-20 m) satellite data are being acquired. To process SAR imagery efficiently and robustly it requires proven algorithms that are not only applicable across various SAR sensors (to produce accurate FSD) but also fast enough to digest large quantities of imagery. Developing such an algorithm is complicated and challenging due to i) the variability in intensity (backscattering coefficient), ii) the high level of noise (speckles) in the imagery, and iii) the lack of edge discrimination between touching floes. The primary objective of this paper is to introduce a new FSD retrieval algorithm

RESEARCH ARTICLE
A practical algorithm for the retrieval of floe size distribution of Arctic sea ice from high-resolution satellite Synthetic Aperture Radar imagery Sea ice FSD was first studied by using a set of aerial photographs by Rothrock and Thorndike (1984). Since then, FSD has been derived from aerial photography, satellite visible and SAR imagery (e.g., Holt and Martin, 2001;Toyota et al. 2011;Wang et al. 2016). FSD derived from aerial photos and/or visible satellite imagery was used mainly for case studies that examined the effect of lateral melt of small floes (Toyota et al., 2006(Toyota et al., , 2011. These case studies occurred during relatively short periods of time and in restricted locations (like near the ship). Thus, full FSD evolution during winter-to-summer transition has not been measured. Satellite SAR can provide a high level of flexibility by taking images through cloud and darkness, so monitoring the continuous evolution of FSD can be realized. Previously, the major stumbling block in using SAR was coarse spatial resolution of SAR (> ~ 100 m) and low signal-to-noise level, which made it difficult to resolve individual floes reliably. Recent SAR technology has significantly improved its spatial resolution (~ 1-20 m) and signal-to-noise quality, which makes it possible to resolve small floes (~ 50-100 m) near the regime shift between mechanical breakup and lateral melt (Toyota et al., 2011).
FSD retrieval from aerial photos and visible imagery utilizes simple thresholding methods for sea ice segmentation followed by morphological eroding/expanding operator to split touching floes (e.g., Steer et al., 2008;Toyota et al., 2011). The retrieved images are often manually inspected and corrected. The major differences between aerial photos/visible and SAR are the high level of speckle noise and inter-and intra-image variability in intensity in SAR. This high noise and variability makes it even more difficult to apply a simple thresholding method to SAR to produce satisfactory results. Holt and Martin (2001) used local dynamic thresholding (Haverkamp et al., 1995) for sea ice segmentation to derive FSD from ERS-1 SAR data, which combined with correction to wind-roughened open water to reduce the effect of variable intensity within the SAR images. They also implemented a restricted shrinking/growing algorithm (Soh et al., 1998) to split touching ice floes. This method is similar to the erosion/expansion operator used by Steer et al. (2008), but the growth of pixels is restricted so that the original floe boundary can be preserved when floes are split. Such a shrinking/growing algorithm is computationally expensive and difficult to implement, as the number of iterations varies from floe to floe and the correct "marker" (skeleton image) is difficult to achieve. A recent study by Zhang and Skjetne (2015) applied the gradient vector flow snakes algorithm to aerial photographs to delineate the boundaries of touching sea ice floes. The results suggested better handling of detecting correct floe boundaries if the seed and initial contour can be reasonably assumed.

Issues of water-ice segmentation from SAR
Obtaining accurate water-ice segmentation from SAR imagery is important for FSD retrieval, and several algorithms were proposed for this problem. Early attempts include ISODATA cluster analysis (Kwok, 1992) and grey level co-occurrence probability texture features (Barber and LeDrew, 1991). Later, dynamic local thresholding and an expert system with known geophysical parameters were investigated for water-ice segmentation and ice type classification (Haverkamp et al., 1995). Deng and Clausi (2005) proposed a function-based Markov Random Field Model algorithm that did not require any training data but a small set of control parameters. The algorithm uses intensity only for water-ice separation but also incorporates gray-level co-occurrence probability texture features for sea ice type classification. This algorithm was later incorporated into the map-guided system to aid the ice charting by human experts (Maillard et al., 2005;Clausi et al., 2010). Further development includes watershed and iterative region growing with semantics (Yu and Clausi, 2008). The algorithm produces regions (polygons), each labeled via maximum a priori estimation. Such a labeling is based on backscattering (backscattering mean, covariance and number of pixels within the region). The authors reported an accuracy higher than 80% when compared with an ice expert system (Clausi et al., 2010).
Most of the aforementioned algorithms assume that the distribution of image data within each segmentation region follows a Gaussian model. However, SAR image distribution can vary from Gaussian to Gamma depending on sea ice condition, sea surface roughness and incident angle. Therefore, an efficient algorithm that does not fit a specific image distribution would be ideal for sea-ice SAR segmentation. Recently, Salah et al. (2011) proposed parametric kernel graph cuts (KGC), an unsupervised and versatile method that can adapt to any image distribution. Instead of fitting a specific distribution shape, KGC seeks the modes of the image distribution via mean-shift updates and assigns pixels to such modes using a kernelinduced, non-Euclidean distance. This approach relaxes the need for knowing a priori the correct statistical image model within each region. Evaluated over many synthetic and real images with various image distributions (e.g., Gaussian, Gamma and exponential) and contrasts, KGC yielded consistently competitive results in comparison to choosing the correct models (Salah et al., 2011).
In the next section, the developed algorithm is detailed in several steps. This description is then followed by case studies (Section 4) in which the algorithm-produced FSD results are validated against manually produced "ground truth" data. Figure 1 illustrates the processing steps and examples of outputs from each step (SAR, water-ice segmented image, floe-splitting image and FSD). A combination of de-noising filters was applied to reduce speckle noise in the SAR images, including median, bilateral and Gaussian filters. Water-ice segmentation was computed by using paramet-ric KGC (Salah et al., 2011). We used SAR intensity only, as image intensity is in general sufficient for water-ice discrimination, while texture features are effective for ice type classification (Clausi and Yue, 2004;Deng and Clausi, 2005).

Preprocessing and water-ice segmentation
In this study, we used TerraSAR-X (TS-X) single-polarized (HH) StripMap (SM) multi look ground range detected image products. Before applying KGC, the original TS-X data were radiometrically calibrated to backscattering coefficient (σ 0 ) values (using Sentinel-1 Toolbox). They were linearly scaled to 8-bit gray-scale intensity, and then were applied with a low-pass Gaussian filter and a bilateral filter (Tomasi and Manduchi, 1998) to reduce speckle noises. The incidence angle dependencies of the observed σ 0 over various types of sea ice were not corrected, as the incident angle dependent variation (IADV) is relatively small for TS-X SM images. The IADV can be large for some wide-area swath images such as Radarsat-2 ScanSAR Wide (500 × 500 km). However, the swath size of TS-X SM images are relatively small (~ 30 × 50 km) and the IADV within the image is also very small (less than a couple of degrees). Thus, IADV within the image was ignored. In addition, the KGC algorithm can handle some internal intensity variation by adjusting to various image distributions. Let ( ) 2 : x I x I I x ∈ Ω ⊂ → = ∈ R R be our processed image function. The process of image segmentation is to find a partition { } 1 k l l R = of Ω into k homogeneous regions. Let λ denote a variable labeling function  of W, which assigns each pixel to a region parameter The problem consists of finding a labeling function l that minimizes the following functional (Salah et al., 2011): where D is a kernel-induced, non-Euclidean distance evaluated by the radial basis function kernel: ( ) ( ) is a regularization function given by the truncated squared absolute difference (Boykov et al., 2001): where const is a constant. Minimization of E is carried out by iterating two steps: (i) optimization with respect to partition { } 1 k l l R = using swap moves from combinatorial graph-cut optimization (Boykov et al., 2001); and (ii) optimization with respect to region parameters μ l using fixed-point iterations (Salah et al., 2011), which interestingly yield mean-shift updates (Comaniciu and Meer, 2002). Such mean-shift updates drive each parameter μ l towards the mode of the image distribution of region R l . The two main control parameters of the KGC algorithm are k (the number of regions) and β, which controls the weight of the smoothness term. We choose β less than 0.01 to keep the details in the water-ice segmented image. The selection of k is dependent on image distribution, e.g., Gaussian mixture vs Gamma mixture, and the values of k that we selected were between 2 and 18. For instance, when the actual image distribution approaches Gamma mixture, we observed that higher values of k produce better results. The optimal labeling λ produced by KGC is a continuous variable in the range [0, 1], and our results were obtained by thresholding λ . For water-ice segmentation, only one cut-off threshold (τ) (typically τ = 0.1) is required to distinguish between water and ice pixels. For ice-type classification, two cut-off thresholds are required to classify pixels into open water, first-year ice and multiyear ice. We note that the ice type classification is based on an optimization of the relative contrast in backscattering intensity between first-year ice and multiyear ice for each TS-X image, not accounting for backscattering variation due to incident angle. A globally applicable algorithm for ice type classification is being currently developed for L-, C-and Ku-band SAR by Nghiem et al. (2016).

Floe splitting
Floe splitting was performed through a combination of distance transformation, watershed and simple rule-based boundary revalidation processing . We tried to avoid using any sort of shrinking/growing algorithm (Soh et al., 1998;Steer et al., 2008), as they will lead to deformation of the floes and inaccurate FSD estimation, and the implementation requires expensive computing. Once the binary water-ice image (0 = water and 1 = ice) has been produced from the SAR image by using the KGC algorithm (see Section 3.1), the distance transform was applied to the binary water-ice image. For each pixel in the image, the inverse distance transform calculates the distance between the pixel and the nearest water pixel of the binary image (the Euclidean distance was used by default). Thus, the ice interior away from open water has larger negative values in the inverse distance transform, forming the basin of the watershed. Any un-detected "false" holes would produce unnecessary minima that lead to over-splitting of the floe (see Section 4.5 for more details). We applied a rule-based post-processing (i.e., boundary revalidation) to reduce erroneous floe-splitting or merging. In this post-processing, all the watershed boundaries neighboring more than two ice areas were in question for its validity of floe splitting or merging. In this study, we used two geometric and two intensity-based rules to revalidate the boundaries in question. More comprehensive description of the rule-based revalidation rules is available in Ren et al. (2015).
• Rule 1: if current boundary length < threshold 1 (T 1 ), then split the neighboring ice regions. The length of the boundary in question between touching ice floes is proportional to floe size; i.e., the smaller the floes, the shorter the length of touching boundaries. Thus, threshold T 1 defines the upper limit of the expected length of the touching boundaries. • Rule 2: if current boundary length < average length of all neighboring boundaries, then split the neighboring ice regions. The length of the touching boundary in question is likely shorter than the mean length of other non-touching floe boundaries. • Rule 3: |difference between all neighboring region intensities| > threshold 3 (T 3 ). If two different ice floes are touching each other, there is a higher possibility that these floes have different intensities. • Rule 4: |mean intensity of the current boundary -mean intensity of neighboring ice regions| > threshold 4 (T 4 ), then split the ice regions. When two or more floes are touching one another, the touching boundaries are likely to have different intensities due to stronger backscattering at the floe edge or existence of open water in between.

Manual inspection and correction
Manual inspection/correction was performed at two stages. At the first stage, the (black and white) waterice image, produced from the KGC algorithm, was visually inspected to check any erroneous segmentation (Figure 1). The inspection was done by comparing the water-ice image with the original TS-X image, which normally took a few minutes. If the resulting water-ice map was not satisfactory (based on a perception of the human expert), the SAR image was reprocessed with a different set of KGC parameters (i.e., k or β). Once satisfactory, the water-ice binary image was used to perform floe splitting as described in Section 3.2 to produce the ice floe-splitting image (Figure 1). The second stage inspection took place after floe splitting had finished. The floe-splitting image was examined visually by comparing the floe-splitting image (as shown in Figures 4 and 5) with the original TS-X image. Common errors from the algorithm include a) over-splitting of elongated floes, b) over-splitting due to unmasked melt ponds, and c) lack of splitting of closely packed floes. Once these errors were identified, they were corrected by removing over-split boundary or adding new boundary. This procedure typically took about less than an hour. We repeated this procedure, normally two to three times, until satisfactory (based on the perception of the human expert) results are achieved. The whole process took about one or two hours per case, and the final results were then used to determine the FSD. Further discussions about the effects of visual inspection/correction on FSD retrieval can be found in Section 4.3.

FSD calculation
In the final image from the floe-splitting algorithm, all detected floe boundaries were converted into water pixels (0 = water). As such, each floe consisted of connected ice pixels, which can be analyzed to calculate the floe size. The size of an ice floe can be measured by 1) the area, 2) the mean caliper diameter, and 3) the perimeter (Rothrock and Thorndike, 1984). These properties are highly correlated, so that measuring one of them can provide approximation of the others. Rothrock and Thorndike (1984) found that the area of floes was correlated to the mean caliper diameter (i.e., A = 0.66 d 2 ). This relationship satisfies that perimeter = πd, assuming the floe approximates as a circle. Previous studies defined floe size as the diameter of a circle or a side of a square with the same area of the floe (Steer et al., 2008;Toyota et al., 2011). If the floe shapes are irregular, the caliper diameter provides the most realistic representation of the floe size. Thus, in this study the mean caliper diameter was adopted, which can be calculated for each floe by finding the distance between two parallel calipers that are just tangent to opposite sides of the floe and then averaging over all orientations of the calipers. However, caution should be taken for small floes as the limited number of tangent lines may provide incorrect representation of the size. In this study, the validation results were compared as a floe number density distribution (FND) and/or a cumulative floe number distribution (CFND). CFND, or N(d), is power law as N(d) ∝ d -α , where d is the mean caliper diameter and α is the power law exponent (Rothrock and Thorndike, 1984).
In practice, the power-law exponent α can be calculated by using a number of different methods. Previously α was estimated by fitting the slope of the CFND plot in log-log space using a least-square fit (LSF), which exhibits a straight line over a truncated range of the CFND. Finding the truncated range for LSF is done by visually examining the distribution. In this study, LSF α was derived for a number of truncated ranges for the validation. A recent study shows that the power-law exponent α can be estimated directly from the distribution and does not require any prescribed truncation range (Clauset et al., 2009;Virkar and Clauset, 2014). This method (hereinafter referred as VC14) provides a statistically sound estimate of (non-cumulative) α and x min (the lower bound to the power-law behavior), and a goodness-of-fit to test of whether the data in question represent a power-law distribution or not (e.g., a powerlaw distribution if p-value > 0.1; otherwise not the powerlaw). As described in Appendix A, the (non-cumulative) α from VC14 can be converted to cumulative α by subtracting 1 (or adding 1 if α is defined in negative).

Selected datasets
We selected four cases for the validation of the algorithm in which "ground truth" data were manually produced. The selection was made to present diverse sea ice conditions in summer. All selected cases consisted of TerraSAR-X (TS-X) single-polarized (HH) StripMap (SM) images acquired during the summer of 2014, as part of the Marginal Ice Zone (MIZ) project, supported by the U.S. Office of Naval Research. For all selected TS-X SAR images, we found co-located high-resolution visible-band (HRV) images, a result of the declassification effort of the MEDEA group, from the U.S. Geological Survey Global Fiducials Library (GFL) (http://gfl.usgs.gov) ( Table 1). The exact acquisition time for TS-X data is known, while such information is unknown for the GFL HRV images. Nonetheless the same ice floes between TS-X SAR and HRV images can be identified clearly as they did not move very much (Figure 2). For the selected SAR images, the intensity values were calibrated radiometrically and also scaled linearly to grayscale and re-projected to a UTM coordinate, prior to the application of the proposed algorithm. The pixel spacing of the original TS-X SM images is 1.25 m, and the pixel spacing of the HRV images is 1 m.
Case 1 represents summer ice conditions in which very large floes (> 2 km) are loosely dispersed with smaller floes (< 0.5-1 km) (Figure 2a). In Case 2, floes of diverse sizes occur in a closely packed ice condition (Figure 2b). For both cases, the same ice floes shown in the SAR images can be easily identified in the corresponding HRV images (Figure 2a and b), although slight movement of floes can be seen between the two images. Cases 3 represents a late summer ice condition where sea ice floes are highly fragmented; i.e., most of ice floes are less than 300 m (Figure 2c). Both SAR and HRV images show that closely packed clusters of small floes occur at the bottomleft and top-right of the images, with some scattering of floes between two clusters. Case 4 also represents a late summer ice condition where a mixture of small and large floes are clustered together (Figure 2d). In this case, the ice condition was much more dynamic, and identifying the same floes between SAR and HRV images was more challenging.

Ground truth (GT) data
It is difficult to obtain a single, absolutely accurate ground truth (GT) that can provide a baseline to measure the validity of the algorithm-produced outputs. As human perception of texture and intensity of the image may vary from person to person, manual segmentation can still contain inherent variations. Manual segmentation of SAR imagery is particularly challenging due to the existence of speckle and backscattering changes caused by surface roughness, melting and incident angle. Despite these problems, operational sea ice charts have been produced by sea ice experts through manual analysis of SAR imagery and are considered as reasonable GT data to validate relevant algorithms (e.g., Karvonen, 2014). In this study, we did not use 1-m resolution HRV image directly to build GT, as the floes were moved around between TS-X image and the corresponding HRV image (due to difference in acquisition time). We used the HRV image as a guide to split touching boundaries, especially for difficult cases where the floe boundaries were unclear in the TS-X image, which provided more objective delineation of individual floes in the TS-X image. The motivation of building GT was to produce baseline data that accurately represented the FSD. While building GT, we made an effort to trace the floe boundaries as seen in HRV images to preserve the shape of the floes. In some cases, however, tracing actual boundaries in TS-X are quite challenging, especially under compact ice conditions. In those cases, the manually traced boundaries may not accurately represent the shape of floes in the SAR images.
In our study, GT data were produced as follows. We first downsized the image, reducing the resolution of the selected TS-X SAR by 50%; i.e., the pixel spacing (ps) = 2.5 m. This resolution accommodates a more manageable file size during manual tracing of touching boundaries of the floes. We produced a water-ice segmented image by using the interactive feature extraction module in ENVI®, rather than KGC. The main reason for this approach was to avoid using the same algorithm being validated. The ENVI® module implements an edge-preserving watershed and merging algorithms, and produces a reasonable outcome if the threshold is well defined. We manually adjusted the threshold to have as accurate a water-ice image as possible, with the aid of the corresponding HRV image. As for the next step, the boundaries between touching floes were manually traced. Thus, GT images contain open water and ice regions, with all touching floes separated by manually traced boundaries.
Producing GT images took more than three days of a sea ice expert's time for each case. The procedure of drawing a vector line for each touching boundary was very labor extensive. We note that the image size used for GT was about 10 × 10 km or smaller. In practice, a larger image size (e.g., 30 × 30 km or larger) is used to derive FSD. In such a case, the expert time required for the manual analysis can increase significantly, especially for closely compacted ice conditions. One of the advantages using the algorithm is to save the human expert labor time in producing FSD; e.g., it typically takes about one or two hours for an expert to check and correct the errors from the algorithm. As part of the algorithm, a manual inspection/correction is required (see Section 3.3); however, the required expert time is significantly reduced from three Hwang et al: A practical algorithm for the retrieval of floe size distribution of Arctic sea ice from high-resolution satellite Synthetic Aperture Radar imagery Art. 38, page 7 of 23 days to a couple of hours or less to simply check for any obvious errors.

Validation results -visual comparison
To validate the algorithm results with the GT data, we first decreased the image resolution of the selected TS-X SAR images by 50% (i.e., pixel spacing = 2.5 m) and applied speckle noise filters prior to running the algorithm. For speckle filtering, we kept the same parameters for median (5 × 5), Gaussian (7 × 7) and bilateral (half-width = 15) filters for all cases. To produce a water-ice image, we set KGC parameters as follows: k = 3, β = 0.001 for Cases 1 and 2, and k = 3, β = 0.000 for Cases 3 and 4. For Cases 1 and 2 (typical of early to mid-summer ice condition), we applied more smoothing (β = 0.001 in KGC) during production of a water-ice image to reduce the effects of "false" holes (e.g., low-intensity areas mainly caused by the presence of melt ponds) within ice floes. Melt ponds (i.e., pools of water formed on sea ice during melting) are most widespread in late spring, when the effects of "false" holes are most significant. In summer, their effects are less significant, as the melt ponds are drained down from the ice surface. Figure 3 shows the algorithm results for SAR images along with GT. When visually compared to GT, the algorithm results agree reasonably well with GT for large ice floes but not for smaller floes (Figure 3). To examine the algorithm results in more detail, we cut out a section of HRV, SAR and algorithm-derived floe-split images for Case 2 (Figure 4). Here we would like to pay attention to the floes marked in numbers 1-8 in Figure 4. Those large (≥ ~1 km) floes are in general well delineated by the algorithm. However, small floes surround some of those large floes, which makes it difficult to identify the exact floe boundary in SAR images. For example, following a close look at floe 8, one can identify a small floe at the bottom of floe 8, which, as confirmed in HRV image, is clearly a separate one. However, the algorithm failed to split this small floe from floe 8 (Figure 4) due to insufficient evidence in the SAR image. Similar issues can be found in floes 6 and 7 in Figure 4.
One important issue is the existence of low intensity spots ("false" holes) within an ice area. In Figure 4, there are low intensity (darker tones) areas in the middle of floe 9. Those low intensity areas can occur due to thinner ice and melt ponds. In the algorithm, we employed speckle filters and the KGC smoothing term β to reduce insignificant low intensity spots within ice floes. However, those low intensity spots cannot be removed in some occasions (i.e., large melt ponds), and will result in "false" local minima, leading to an over-splitting of floes. In this case, floe 9 was over-split by the algorithm but corrected through visual inspection. Another common case for over-splitting can be found in floe 10. As can be seen in the HRV image, floe 10 is clearly a single floe. However, this floe appears to be an aggregation of two floes (e.g., a two hump-like shape) in the SAR image with small floes clustered around the floe. As the distance transformation is calculated from the floe edge, this two hump-like shape leads to two local minima and ends up with over-splitting. If the shape of a sea ice floe is close to a circle, this issue does not occur. However, we do find that actual sea ice floes come with very irregular shapes (such as long elongated ellipse, two hump-like, rectangle and so on). One of the objectives of the visual inspection is to identify and correct those errors.
Next we examine how the algorithm detects small floes. For this aim, we selected three sub-areas (marked as A, B and C in Figure 4) and magnified these areas in  in sub-area C, as the floes are more dispersed. However, in sub-areas A and B, small floes are much more packed, and it is very difficult to separate them in a TS-X SAR image (see floes marked as "2" in sub-areas A and B). The algorithm considers those small floes that are closely packed together as one larger floe (> ~ 200 m) instead of many individual ones. The result will be an underestimation of the number density of small floes but overestimation of the number density of larger floes. This limitation can be partly overcome by using SAR images with a higher resolution.

Validation results -statistical comparison
In Section 4.3 we examined the algorithm results qualitatively based on visual comparison with GT. Next we extend our validation exercise into a quantitative assessment. We compare the algorithm results with GT in terms of FND and CFND. We first compare FNDs between the algorithm and GT (Figure 6). Here we note that FND derived by the algorithm was much smaller than the one from GT; i.e., the algorithm FND was only 40-50% of GT FND for Cases 1, 2 and 4 ( Figure 6). Most of the floes undetected by the algorithm are small floes (≤100 m); i.e., the algorithm FND was only 40% or less of GT FND for these three cases (Figure 6). This difference suggests that the algorithm is failing to resolve small (≤100 m) floes, as discussed in Section 4.3. This difference in FND is not due to any difference in image resolution between the algorithm and GT, as the same SAR image was used for both the algorithm and GT. Two factors contributing to the difference are summarized as follows. First, we set the lower pixel limit as 25 pixels, meaning that we ignored any ice area that had less than 25 pixels (equivalent to about 12.5 m in diameter) in the algorithm. Second, de-noising filters and smoothness term β were applied to subdue low intensity spots caused by speckle noise, melt ponds and different ice types (see Section 3.1 and 5.1). This inevitably smoothed out some details of the segmented water-ice images. Small open water areas between floes are needed for the floesplitting algorithm to find the correct floe boundaries, especially for small floes (see Figure 5). We also note that FND derived by the algorithm in Case 3 is much higher (almost 70% of GT FND) than that in other cases (Figure 6c), where FND of small (≤100 m) floes from the algorithm approaches almost 60% of GT FND. This result is partly because no smoothness term in KGC was applied (β = 0.00) for this case, and thus more details in the waterice image have been kept. We also note that the algorithm overestimates FND for d = 100-200 m by ~4 0% (Figure 6c). We attribute this overestimation to the fact that the algorithm derives small floes that are tightly packed together as a large floe (see Section 4.3). In summary, the algorithm overestimates, compared to GT, mean and median d by 19-81 m and 23-39 m, respectively, and the overestimation is the highest in Cases 1, followed by Case 2, 4 and 3 (Figure 6).
Next we examine the algorithm results using CFND. We first compare the CFND plots between the algorithm and GT. For Case 1, CFNDs between the algorithm and GT match reasonably well to each other until d is less than 200 m (Figure 7a). For Case 2, the algorithm CFND match very well with GT until d = 200 m and then slightly deviate from GT until d = 100 m (Figure 7b). The effects of limited number of small floes are shown as "flattening" of CFNDs. For Cases 1 and 2, the flattening occurs at d = ~5 0 m in both the algorithm and GT CFNDs (Figure 7a and b). This flattening can be seen more clearly in GT CFNDs, as they exhibit a straighter line before the flattening. For Case 3, the algorithm CFND matches with GT until d = 150 m and then deviates slightly upward from GT until d = 50 m (Figure 8c). For Case 4, the algorithm CFND looks steeper than GT (Figure 7d). Table 2 contains the α values estimated from both the LSF and VC14 methods. LSF α was determined by linearly fitting (an outlier-resistant linear regression) CFND in log-log space over a truncated floe size range (α range ) such as 50-100 m, 100-500 m or 100-500 m. The α value from VC14 were converted to (cumulative) α values by subtracting 1 for the comparison with α from CFND (see Appendix A for details). For Case 1, the algorithm α values are slightly higher (steeper) than GT values (up to 0.09 for LSF α or up to 0.04 for VC14 α, Table 2). For Case 2, the differences in α increase up to 0.15 (LSF α) or 0.19 (VC14 α), indicating a steeper slope from the algorithm ( Table 2). For Case 3, similar to other cases, the algorithm α from LSF is slightly higher (an overestimation of α), but VC14 α shows an underestimation of the algorithm α ( Table 2). Note that VC14 α is not statistically significant to be assumed as the power-law distribution (p-value from the goodness-of-the-fit < 0.1), which casts a doubt whether such α estimates can be trusted. For Case 4, the differences in α further increase up to 0.37 (LSF α) or 0.432 (VC14 α), indicating a much steeper slope from the algorithm ( Table 2). This difference (an overestimation of α from the algorithm) can be attributed to the algorithm's inability to resolve small (d < 100 m) floes. The algorithm tends to see multiple small floes as a larger floe due to its limitation in water-ice segmentation (see Figures 4 and 5). This increases the number of floes with d = 100-300 m, which causes a steeper slope in CFND (see Figure 7d). This overestimation is more significant for a late summer condition (e.g., Case 4) than for an early summer condition (e.g., Cases 1 and 2). Also note that the number of samples and the size of image used in the validation are quite small. In our normal FSD retrieval, we typically use SAR images of 30 × 30 km or larger in size. In this case, the algorithm-derived CFND typically exhibits a straight line and the number of floes exceeds 1,000 (see Section 5.4). In summary, our validation results showed that FND derived from the algorithm is consistently smaller than those from the GT, especially for small (< 100 m) floes. We attribute this underestimation of small floes to the lower pixel limit (25 pixel, ~1 2.5 m in d) set in the algorithm and the limitation in the water-ice segmentation. This limitation leads to an overestimation of the algorithm α (i.e., steeper slope), as the algorithm tends to treat those unresolved small floes as larger floes. This overestimation tends to be more significant for a late summer ice condition (Case 4). We also note that the size of SAR imagery that was used for the validation is much smaller (~10 × 10 km and ~3 × 3 km), and the number of floes derived by the algorithm was also quite small (e.g., N = 555 for Case 4). Thus, the results should be interpreted with caution. In the next section, we discuss various factors affecting the algorithm performance, including the effect of limited number of floes in a small image size.

Effects of speckle filtering and KGC parameters on water-ice segmentation
In the algorithm, accurate segmentation of water-ice regions is a very important prerequisite for FSD retrieval, as the segmented water-ice image is be used as a base map to split the boundaries of touching floes (see Section 3).
In the algorithm, a combination of de-noising filters and KGC is employed to produce accurate water-ice segmentation. SAR imagery acquired over sea ice areas often exhibits a Gamma-like image distribution (see the example for Case 1 SAR imagery in Figure 8a), which is difficult to segment by using conventional methods such as K-mean. Applying even a modest median (3 × 3) filter can significantly improve the image distribution toward a Gaussian (Figure 8b), and so does the corresponding water-ice segmentation (Figure 9b). Despite this improvement, the water-ice image still contains many low intensity spots (Figure 9b). These low intensity spots commonly occur in sea ice SAR imagery and can be attributed to unfiltered speckle noise, melt ponds and different ice types. More distinctive bi-modal Gaussian distribution can be achieved by applying additional bilateral (half-width = 15) filter (Figure 8c), which further subdues low intensity spots while preserving the floe edges (Figure 9c). Additional Gaussian filter does not improve the image distribution as the number density distribution remains almost unchanged (Figure 8d), and so does the associated waterice image which is also almost unchanged in this particular case (Figure 9d). As shown in the results, de-noising filters can significantly reduce low intensity spots and improve water-ice segmentation. However, the resulting water-ice segmentation still contains many low intensity spots and is not good enough to be used as a base map for the floe-splitting purpose (see Figure 9d).
In the example described above, we set k = 2 (Gaussianmixture) and β = 0.00 (no smoothing) in KGC to create water-ice segmentation, in order to demonstrate the effects of de-noising filters. As mentioned earlier, KGC has shown a potential to produce good segmentation results for a range of image distributions such as Gaussian, Gamma and exponential. In our case, this capability of KGC is important, as SAR imagery often exhibits a Gamma-like distribution, and de-noising alone fails to produce accurate water-ice segmentation. In KGC, the selection of k is dependent on image distribution, e.g., Gaussian mixture vs Gamma mixture. To test this point, we set various k values to produce water-ice segmentation, the results of which are shown in Figure 10. We can see that low intensity spots are greatly reduced when we use k = 4 (Figure 10b). Applying higher k (k = 6) further reduces low intensity spots within ice regions, but also increases high intensity spots (white spots) in water regions (Figure 10c). Much better segmentation results can be achieved by applying de-noising filters and higher k (k = 4), although some low intensity spots still occur within ice regions (Figure 10d).
Another KGC parameter β controls the smoothness in water-ice segmentation (see Section 3.1). To test the effect of β, we produced water-ice segmentation for three different β values, i.e., β = 0.000, β = 0.001 and β = 0.01. The results are shown in Figure 11. In this test, we kept k = 4 and de-noising filters as before. As seen in Figure 11, a slight addition of smoothness (i.e., β = 0.001) greatly improves the removal of low intensity spots (dark spots) within ice regions (Figure 11b). For the remaining low intensity  Virkar and Clauset (2014). c Manually revised, through manual inspection/correction. We set the threshold T 1 = 1000 m for Cases 1 and 2, and T 1 = 200 m for Cases 3 and 4. d Not applicable. e Estimates of α by ground truth (GT) and algorithm that failed the goodness-of-the-fit test (p-value < 0.1), indicating that the power-law distribution is not a suitable model. spots, they can be further removed by applying a higher β (i.e, β = 0.01) at the expense of losing details of floe edges (Figure 11c). In our validation exercise described in Section 4, we used β = 0.01 for Cases 1 and 2 (early to mid-summer) and β = 0.00 for Cases 3 and 4 (late summer). Note, smoothing is not required for late summer cases, as the effect of low intensity spots is relatively small (not shown here). For early to mid-summer cases, applying a low β would have better resolved smaller floes but also promoted over-splitting of larger ice floes. For example, α and N were increased by 0.07 and 257, respectively, when we used β = 0.001 instead of β = 0.01 (from α = 1.51 and N = 832 for β = 0.01) (Figure 11). Applying β = 0.00 (no smoothing) increases both α and N by 1.81 and 2,225, respectively (Figure 11). This over-splitting can be minimized by applying a higher β.
We note that more advanced speckle filters are available (e.g., Zabalza et al., 2015), and applying those advanced filters can potentially improve the results especially for better resolving small floes. As for now, however, a combined use of speckle filters and KGC is quite effective in producing accurate water-ice segmentation image that can be used as a base map for floe splitting.

Effects of boundary revalidation parameters on FSD retrieval
In the previous section, our discussion was focused on the effects of speckle filters and KGC parameters in producing accurate water-ice segmentation images. In this section, we turn our attention to the effects of boundary revalidation parameters on floe splitting. In our algorithm, Figure 9: Water-ice image for Case 1 produced with various speckle filters. The segmentation results were derived from the TerraSAR-X SAR image (pixel spacing = 2.5 m) (a) without any speckle filter, (b) with a median (3 × 3) filter, (c) with median and bilateral (half-width = 15) filters, and (d) with median, bilateral and Gaussian (7 × 7) filters. We set kernel graph cuts (KGC) parameters k = 2 and β = 0.00 (no smoothing) for the segmentation to demonstrate the effects of speckle filtering. In each panel there are two small red boxes that have been blown up and re-displayed in the lower left and right corners. No manual inspection/correction was applied for the results shown here. DOI: https://doi.org/10.1525/elementa.154.f9 we used inverse distance transformation combined with watershed to identify the boundaries between touching ice floes. As this approach normally leads to over-splitting of floes, we employed a rule-based boundary revalidation process (see Section 3.2). Among four rules used in this study, we set the constant value for the thresholds for Rules 2-4. Rule 2 regulates whether the length of the boundary in question is shorter than the mean length of other non-touching floe boundaries or not. If it is shorter, the boundary in question is regarded as valid. Rules 3 and 4 examine the difference in intensity values among ice regions and along the boundary in question (see Section 3).
Different thresholds for Rule 1 (T 1 ) were used depending on ice conditions. Rule 1 examines whether the length of the boundary in question is short enough to be a valid boundary of touching floes. For example, if T 1 is too large, all the boundaries detected by the algorithm are regarded as floe boundaries, promoting over-splitting of floes. If T 1 is too small, the algorithm can fail to detect some of the "real" floe boundaries. To test the effects of T 1 , we derived CFND and α for three different T 1 for all cases, and summarize the results in Table 3. The α value derived from the algorithm varies with the selection of T 1 , but the difference in α is within 0.14 for usual selection of T 1 (i.e., T 1 = 500 or 1000 m for early to mid-summer, T 1 = 200 or 500 m for late summer). The difference in α between T 1 = 200 m and T 1 = 1000 m increases up to 0.15 for Case 1 and up to 0.10 for Case 2 ( Table 3).
As another note for our selected cases, we observed relatively small changes in α after visual inspection/correction. The largest difference in α (0.05) was found in Case 2 (early summer case) and the smallest was found in Cases 3 and 4 (late summer cases). This finding is partly because over-splitting commonly occurs in larger floes, and visual inspection/correction is more difficult for small floes. The small changes in α (∆α ≤ 0.05) for the four selected cases raises a question whether the manual inspection is required. In practice (when we analyze a larger data set), we found that the difference due to manual inspection was about 0.16 on average, but it increased up to 0.67 for some cases. The cases showing larger difference are typically associated with a) over-splitting by unfiltered melt ponds, b) over-splitting of elongated floes, and c) under-splitting of closely packed floes.  Ground truth (GT) and the algorithm values used in the validation exercise in Section 4.3. d Estimates of α by ground truth (GT) and algorithm that failed the goodness-of-the-fit test (p-value <0.1), indicating that the power-law distribution is not a suitable model.

Effects of image resolution
As discussed in Section 4.1, we used the half resolution images (ps = 2.5 m) to derive FSD. In this section, we extend our discussions and analyze how the image resolution would affect FSD retrieval. For this purpose, we added two more image resolutions, i.e., a full resolution (ps = 1.25 m) and a less than quarter resolution (degraded to 15% of the full resolution) (ps = 8.33 m). Table 4 summarizes the results in α and N for all four selected cases. First, we note that the algorithm α is very comparable between full and half resolutions for Cases 1 and 2 (i.e., the difference is ≤0.01) ( Table 4). For Cases 3 and 4, the difference is much larger, i.e., 1.48 for Case 3 and 0.50 for Case 4 ( Table 4). The results from Cases 3 and 4 need to be interpreted with caution as the derived FSD may not represent floe statistics due to the small number of floes derived in a limited size coverage of the image (i.e., ~3 × 3 km). In particular, the goodness-of-the-fit test showed that VC14 α for Case 3 is not statistically significant (i.e., unlikely the power-law), and that the VC14 method failed to estimate α for the quarter resolution (Table 4).
In Figure 12, the FSD results from Case 2 are shown for detailed analysis. First, the difference in FND mainly occurs in small (≤100 m) floes, where the FND of small (<100 m) floes from the full resolution is almost doubled or 7 times higher than the ones from half or quarter resolutions, respectively. Also note that the mean and median d derived by the algorithm become smaller as the image resolution increases, which suggests that a higher image resolution is certainly helpful to resolve small (≤100 m) floes.
Comparison of α for different image resolution is summarized in Table 4. For Cases 1 and 2, the algorithm α increases from the quarter resolution to the half resolution, but slightly decreases or remains the same from the half resolution to the full resolution ( Table 4). The increase of the algorithm α from the quarter resolution to the half resolution is expected, as a higher resolution is helpful to resolve small floes. However, smaller or similar α from the full resolutions is unexpected, especially for Case 4 which is almost 0.50 less ( Table 4). Why does the full resolution image produce a smaller (less steep) α? Here we first compare x min (lower-bound to the power-law behavior estimated by the method of Vikar and Clauset, (2014) between the half resolution and the full resolution for Case 3. At the half and full resolution x min is 158 m and 79 m, respectively, which indicates the ability to resolve smaller floes at the full resolution. In Section 4.4, the algorithm showed a higher (steeper) α than GT, mainly because the algorithm saw a congregation of small (d < 100 m) floes as a larger floe (Figure 5), which increased the number of floes with d = 100-300 m (Figure 6). We speculate that the increased (full) image resolution has helped to resolve such small floes that were otherwise detected as a larger floe at the half resolution, which caused less steep slope in CFND (Figure 12).
As mentioned earlier, the validation cases were made for small sub-images (~10 × 10 km or ~3 × 3 km). Thus, the number of floes derived by the algorithm can be very small, especially for the quarter resolution. In the next section, we use a larger image to examine how CFND and α behave between different image resolutions if sufficient number of floes can be derived.

FSD retrieval from a larger image
In previous sections, we found some of the algorithm results might not be representative due to an insufficient number of floes derived from small sub-images. In our Ground truth (GT) and the algorithm values used in the validation exercise in Section 4.3. c Estimates of α by GT and algorithm that failed the goodness-of-the-fit test (p-value <0.1), indicating that the power-law distribution is not a suitable model.
normal FSD retrieval, we typically use a larger SAR image (30 × 30 km or larger) to derive FSD, and the derived CFNDs usually exhibit quite a straight line, with the number of floes derived from the algorithm exceeding over 1,000. The size of TS-X images used for the validation exercise was much smaller than 30 × 30 km ( Table 1). This is because the validation image size was limited by finding the area that overlapped between TS-X and HRV images, and this process became difficult due to cloud cover in the HRV images. In this section, our objectives are twofold. First, we reexamine how the shape of CFND plots changes with a larger number of samples. For this purpose, we divided the original SAR images (30 × 60 km) into sub-images (30 × 30 km), and derived corresponding FSDs from those sub-images. Figure 13 shows FSD results derived from SAR images with two different resolutions (quarter and half resolutions) for Case 1. First, the number of floes derived from the algorithm is 1,898 (even for the quarter image resolution), and CFND exhibits quite a straight line (no local deviation visible). The α values are slightly lower than the validation case (1.74 from the larger image vs. 1.83 from the validation). This difference can be attributed to regional differences (larger vs. small areas) or simply an insufficient number of floes in the validation results. Also note that the difference in α between the VC14 and LSF methods is very small (less than 0.05) (Figure 13). Second, the difference in VC14 α between quarter and half resolutions is also small (within 0.09) for the large image ( Table 4 and Figure 13).  Figure 14 shows the same comparison results between two image resolutions for Case 3. Similar to Case 1, the number of floes exceeds 4,000, and CFND exhibits quite a straight line. In particular, we note that FNDs for small floes (< 200 m, floe size group 1 and 2) derived from the half resolution image are significant higher than those from the quarter resolution image (Figure 14), which makes the total number of floes 13,554 for the half resolution image. In the validation case (image size = 4 × 4 km), the VC14 method failed to estimate α from the quarter resolution image (Table 4). We speculated this failure was owing to insufficient floe number (N = 103 for the quarter resolution) from the small image. For the larger image (30 × 30 km), the difference in α between quarter and half resolutions is reduced to 0.23, due to sufficient numbers of floes derived for both resolutions. More importantly, the VC14 α values are statistically significant (p-value > 0.1 for the goodness-of-the-fit). The difference in α in Case 3 is larger than what we found in Case 1 (0.09), likely due to a much higher number of small floes. Nonetheless, this confirms that CFND and α can be reasonably comparable between quarter (ps = 8.33 m) and half (ps = 2.50 m) image resolutions if a sufficient number of floes can be derived from a larger image.

Summary and closing remarks
In this study, we have presented a set of algorithms that derive summer sea ice FSD from high-resolution SAR imagery. The proposed algorithm is comprised of (i) speckle filtering, (ii) water-ice segmentation by kernel graph cuts (KCG) and (iii) floe splitting by a combination of distance transformation, watershed and rule-based boundary revalidation methods. To test the performance of the algorithm, we selected four cases that represent sea ice conditions during early to later summer. All selected cases consisted of both TerraSAR-X SAR (TS-X SAR, Strip-Map, single polarization HH) and high-resolution visibleband (HRV) imagery (USGS GFL). Ground truth (GT), used as baseline data to validate the algorithm, was created for each case by using a feature extraction module in ENVI® and then manually tracing floe boundaries with the aid of HRV images.
The results show that the algorithm considerably underestimated the number density of small (d ≤ 100 m) ice floes. This underestimation is mainly due to a lower pixel limit (25 pixels) used in the algorithm as well as the limitation in water-ice segmentation, especially for small floes that were closely packed together. These combined effects caused almost 36-75 % underestimation for small (d ≤ 100 m) floes, while increased the number of larger (d = 100-300 m) floes. This has led to the algorithm α over-estimated by 0.04-0.19 (Cases 1 and 2, early summer case) or by 0.12-0.42 (Cases 3 and 4, late summer case) ( Table 2). The relatively large difference in α between GT and the algorithm for Case 4 highlights its limitation, i.e., detecting a congregation of small floes as a larger floe. This effect is more severe in a late summer condition when small floes are closely packed (Figures 4 and 5). This limitation may be overcome by using a higher resolution image, as the algorithm α became more comparable to GT α when the full resolution image was used ( Table 4). Several factors affecting algorithm performance were addressed. The results showed that a combination of speckle filtering and KGC was effective in producing the best water-ice segmentation (Section 5.1). SAR imagery acquired over sea ice areas often exhibits a Gammamixture image distribution (Figure 8a), for which it is difficult to produce water-ice segmentation using conventional methods such as K-means. We found that speckle filters (median, bilateral, Gaussian) could be used to refine the image distribution into a Gaussian by subduing low intensity spots ("false" holes) that were caused by unfiltered speckles, melt ponds and different ice types within ice regions (Figures 8 and 9). Despite the improvement, speckle filters themselves would not be sufficient to produce satisfactory water-ice image for floe splitting (i.e., too many low intensity spots) (Figure 9d). This limitation could be improved by employing a larger k in KGC to further subdue unwanted low intensity spots even before applying speckle filters (Figure 10a-c). By combining KGC with a larger k with speckle filters, significant improvement could be achieved. However, water-ice segmentation map still contained some low intensity spots for early to mid-summer cases (Figure 10d). Slight addition of smoothness β in KGC effectively subdued the remaining low intensity spots (for early to mid-summer cases), producing water-ice segmentation maps that are good enough for floe splitting.
Floe splitting was done by a combination of inverse distance transformation, watershed and rule-based boundary revalidation, using the water-ice segmented image (as described above) as a base map (see Section 3.2). We examined the effects of boundary revalidation parameters to determine whether a certain boundary detected by watershed will be kept (splitting) or discarded (merging). Our discussion was focused around T 1 for Rule 1, which determined either splitting or merging based on the length of the boundary in question. The selection of different T 1 did affect the algorithm CFND and α, but the difference in α was relatively small if T 1 was reasonably selected (Table 3). For instance, the difference in α remained within 0.14 for Cases 1 and 2 (early to mid-summer ice condition) between various T 1 values (Table 4). Note, we typically use T 1 = 500 or T 1 = 1000 m for the image acquired during early to mid-summer ice condition). Our results also showed that the difference in α between before and after manual (visual) inspection are quite small (within 0.05) for all four cases (Table 3). Although the validation results showed that manual inspection could have a very small impact on the derived α, in practice the manual inspection could have a significant impact on the results (e.g., causing a difference in α up to 0.67 for some cases). The cases showing larger difference are typically associated with We also addressed how different image resolutions would affect the FSD retrieval (Section 5.3). The results were compared in three different resolutions; i.e., the quarter resolution (ps = 8.3 m), the half resolution (ps = 2.5 m) and the full resolution (ps = 1.25 m). The comparison results showed that the algorithm α values at the half resolution were higher (steeper) than the ones at the quarter resolution (Table 4). This effect was expected, as a higher resolution is helpful to resolve small floes (Figure 12). The algorithm α values at the full resolution, however, were smaller (less steep) than the ones at the half resolution (Table 4). We speculate that the result of lower α values at the full resolution is likely due to the increased resolution that helped to resolve the small floes that were otherwise detected as a larger floe at the half resolution.
When we examined the validity of FSD statistics from a larger (30 × 30 km) image (see Section 5.4), the number of floes derived from the algorithm exceeded 1,000, and CFNDs exhibited quite straight lines (i.e., no local deviation as shown in validation cases). For larger images, the difference in α between quarter and half resolution was 0.09-0.23 (Figure 13 and 14).
In conclusion, the results showed that the algorithm α varied between 0.04 and 0.42, including the validation against GT and the effects of different image resolutions, as well as the effects of control parameters such as k, β and T 1 and visual (manual) inspection. This degree of variability means that a combination of the algorithm and manual inspection could be used to derive FSD from SAR imagery with some accuracy compared to what a human expert could manually produce. Here we stress that our purpose of the algorithm development is to minimize (not replace) the labor and time of a human expert, to provide consistent FSD retrieval, and to mimic what an expert produces manually with sufficient time and effort. Human intervention by an expert is still necessary to check and correct errors at the end of production of water-ice segmented image and floe splitting, but much less time and effort is needed in comparison to conventional fully manual FSD retrieval. In our case, an expert spent at least three full days to construct GT, yet with the algorithm, the expert spent only a couple of hours to check and correct errors.
For future work, there are still remaining challenges in the algorithm; i.e., (i) resolving small (≤100 m) floes, (ii) more automated correction for the errors, and (iii) melt pond detection. Small (≤100 m) floes may be better resolved by employing more advanced speckle filters (e.g., Zabalza et al., 2015) and/or optimization-based schemes. We are currently working on these algorithms to see whether more details of floe edges can be preserved while suppressing low intensity spots in SAR imagery. In this study, we used four simple rules to validate the boundary, but more comprehensive rules can be added to further reduce the errors. Combining the rule-based method with other algorithms such as active contour is an interesting idea. One of the critical issues for SAR imagery acquired during early summer is the significant presence of large melt ponds. Melt ponds are potentially important for FSD, because they serve as "weaker points" for the floes to break up (Arntsen et al., 2015). At the same time, those melt ponds at their maximum can cause significant problems for FSD retrieval, in which cases the current algorithm (even with smoothing, large β) cannot eliminate the low intensity spots caused by large melt ponds. In the future, automatically masking melt ponds would be needed to robustly derive FSD from early summer SAR imagery.

Appendix A: Comparison with previous studies
Regarding the comparison of the exponent α values, we first note that the exponent α values reported in previous studies are dependent on whether the slope in loglog space is derived from cumulative distribution or non-cumulative distribution (see Stern et al., 2017b, for more detailed discussions). In previous studies, a cumulative distribution was typically used to estimate α (i.e., the slope of the distribution in log-log space) ( Table A1). In a few studies, non-cumulative distribution was used to estimate α, in which case non-cumulative α can be converted to cumulative α by subtracting 1 (or adding 1 if α is defined as negative). For, non-cumulative α from Stern et al. (2017a) and Hwang et al. (2017) would have ranges of 1.0-1.6 and 1.9-2.5, respectively, when converted to cumulative α ( Table A1).
As can be seen in Table A1, the range of cumulative α values from Rothrock and Thorndike (1984), Holt and Martin (2001) and Perovich and Jones (2014) is 1.7-2.5. The range of cumulative α values from this study is 1.5-2.2, if Case 3 is excluded (see Table 4 in this study), which is comparable to previous studies. Note that the selected cases used in this study were drawn from a larger dataset in Hwang et al. (2017); i.e., we selected the four cases that had corresponding 1-m resolution visible images among the dataset used in Hwang et al. (2017). In Hwang et al. (2017), we used the algorithm developed in this study to segment and delineate floe boundary from TS-X data, but used another (non-cumulative) method suggested by Clauset et al. (2009) and Virkar and Clauset (2014) to calculate power law coefficient α. In Hwang et al. (2017), the mean α values for July and August are 1.9 and 2.5 (when converted to cumulative α), respectively, which are also within the range reported in previous studies. We note that the cumulative α value for Case 3 in this study is much higher than other cases ( Table 4). This case represents a convex curve of cumulative distribution, not a straight line, representing a homogeneous floe size regime (Figure 7c). This relatively high α may reflect that the power law is not a suitable model to fit the distribution. When the goodness-of-the-fit test (Clauset et al., 2009;Virkar and Clauset, 2014) is applied to Case 3, it shows the distribution is unlikely a power law distribution.
The cumulative α values from Wang et al. (2016) are much shallower (1.0-1.5) than the reported values stated above (Table A1) Although the α values from MODIS have shown some consistency with the values from 1-m resolution MEDEA images (referred as HRV images in this study) for limited cases during May to early July (Figures 12 in Stern et al., 2017a), the TS-X derived α values were generally steeper ( Figure 13 in Stern et al., 2017a). We note that MODIS resolution may be insufficient to resolve small floes in summer, and this may result in a shallow slope from MODIS. Wang et al. (2016), however, used high-resolution satellite data (such as MEDEA and Landsat), where it is expected that their α values would be consistent with previous studies and our study. In particular, MEDEA data used in Wang et al. (2016) covered the Beaufort Sea (74-76N, 149-150W) and spanned August 2-16. Some possible causes for the inconsistencies include 1) actual spatial and temporal variability in FSD, 2) sampling variability, 3) inadequacy of the power law as a model of the FSD, and 4) poor analysis method (Stern et al., 2017b). Considerable variability in FSD can occur within a relatively short distance (Hwang et al., 2017), and limited coverage of satellite image can produce some inconsistency, especially if only a few images were analyzed for comparison. For some cases (like Case 3 in this study), the power law is not the best model to describe the FSD. Poor analysis including inappropriate binning and truncation of the distribution can cause some variability in the retrieved FSD (see Stern et al., 2017b).