Improving the accuracy of the Water Detect algorithm using Sentinel-2, Planetscope and sharpened imagery: a case study in an intermittent river

ABSTRACT Mapping surface water using remotely sensed optical imagery is a particular challenge in intermittent rivers because water contracts down to narrow linear features and isolated pools, which require accurate water detection methods and reliable image datasets. Of the many methods that use optical sensors to identify water, the Water Detect algorithm stands out as one of the best options due to its classification accuracy, open-source code, and because it does not require ancillary data. However, in the original study, the Water Detect algorithm was only tested with Sentinel-2 imagery. High-resolution and high-frequency imagery, such as Planetscope, combined with sharpening and band synthesizing techniques have the potential to improve the accuracy of surface water mapping, but their benefit to the Water Detect algorithm remains unknown. Uncertainty also exists about the extent to which different input parameters (i.e. maximum clustering and regularization) influence the accuracy of Water Detect. Practitioners seeking to map surface water in intermittent rivers need guidance on a best-practice approach to improve the accuracy of Water Detect. To meet this need, we automated an existing method for sharpening and synthesizing bands and applied it to a series of multispectral Sentinel-2 and Planetscope images. We then developed a sensitivity analysis algorithm that compared the accuracy for all possible combinations of input parameters in a given range for the water detection process – enabling optimal parameters to be identified. We applied this workflow to an 81 km stretch of the lower Fitzroy River (Western Australia) to periods when spatial water extent varied markedly, i.e. mid-wet (February), early-dry (June), and late-dry season (October), across three years with variable wet season flow. 
We found that the ability to accurately detect surface water using multispectral imagery was increased by using input parameters identified by the sensitivity analysis and using Visible + Near-infrared (VNIR) bands, with relatively little gained by image sharpening unless the area of interest was burnt or experienced considerable shading. Also, the regularization parameter exerted less influence on results than maximum clustering. Importantly, the accuracy of the Water Detect algorithm can vary drastically if input parameters are not calibrated to local conditions. Results also revealed that our approach was adept at detecting linear features in intermittent rivers. We recommend that practitioners using Water Detect to identify surface water undertake a workflow similar to that described here to improve the accuracy of the Water Detect algorithm. The automated routines provided by this study will significantly assist practitioners in doing so. Increasing the accuracy with which we detect and map water in intermittent rivers will improve our understanding and management of these important systems which are under increasing threat.


Introduction
Global declines in the quality and quantity of freshwater habitats (Vörösmarty et al. 2010) mean that now, more than ever, there is a need for accurate water resource assessments. This is a challenge for the world's intermittent rivers, which support around 40% of the global population (Koohafkan et al. 2008) but have little gauging infrastructure or data to guide decision-making (Callow and Boggs 2013; Jarihani et al. 2015). A remotely sensed approach that uses multispectral imagery to detect and map water is a cost-effective alternative to describe intermittent river hydrology (Callow and Boggs 2013). A multispectral approach also allows seamless characterization of water across a landscape, as opposed to distinct locations (i.e. gauges), enabling more nuanced management and assessment of impacts. However, a multispectral approach in intermittent rivers faces its own challenges, as narrow linear features such as runs and riffles, typically transitory, are difficult to map with a coarse pixel resolution and low satellite revisit frequency. Cloud cover can also interfere with image capture during times of precipitation. For a multispectral approach to be useful in intermittent rivers, practitioners need guidance on how to optimize its accuracy. This includes information on which water detection method to select, the satellites and sensors to choose, and whether images should be pre-processed (i.e. sharpened).
Of the many methods that use optical sensors to identify water, the Water Detect algorithm (Cordeiro, Martinez, and Peña-Luque 2021) stands out as potentially one of the best options due to its classification accuracy (Tottrup et al. 2022), open-source code, and because it does not require ancillary data. This algorithm mapped water bodies smaller than 0.5 ha at a country scale more accurately than other commonly used unsupervised methods (e.g. Canny-edge MNDWI, Multi-Otsu MNDWI, FMask, Sen2Cor, and MAJA) (Cordeiro, Martinez, and Peña-Luque 2021). Other studies have confirmed the advantage of Water Detect's approach to classifying water at both local (i.e. reservoir) and regional levels (Peña-Luque et al. 2021; Tottrup et al. 2022). However, for the Water Detect algorithm to achieve maximum accuracy for a given region or optical sensor, the best input parameters need to be used. If sub-optimal parameters are chosen, the accuracy of the resulting classification can be compromised. Sensitivity analysis algorithms designed to test and compare the results of multiple input parameters can be used to identify optimal parameter values, i.e. those that most accurately detect water, for each use case. Currently, no automated script exists that performs a sensitivity analysis for the Water Detect algorithm, meaning there is little guidance for practitioners wishing to use this method for water mapping and limited information on the likely consequences of sub-optimal parameters on water detection accuracy.
Optical satellites with high spatial resolution and high revisit frequency can increase the quality of input data to water detection methods, improving the mapping of intermittent river features and the overall accuracy of water detection. If the extent of surface water in a river changes markedly through time, which is common in intermittent systems, the spatial resolution and the frequency of cloud-free imagery constrain our ability to quantify hydrological changes. In this sense, the Planetscope constellation (Planet Team 2021) is a game-changer, producing daily global coverage at high spatial resolution (3 m). The high frequency and spatial resolution of imagery obtained from Planetscope now mean that riverine surface water can be mapped with much higher precision than previously achieved with coarser and less frequent imagery from platforms such as Landsat, Sentinel, or MODIS. However, while Planetscope accessibility, frequency, and pixel resolution represent a significant advance, its sensors lack longer spectral wavelengths (e.g. shortwave infrared, SWIR, bands). This can be problematic for water mapping as these wavelengths are known to be sensitive to water absorption and are widely used in water mapping. For instance, the Water Detect algorithm was initially proposed to be applied to Sentinel-2 images with SWIR bands. Planetscope sensors also have no onboard calibration devices, making it hard to systematically implement corrections to the multiple generations of this satellite constellation, resulting in spectrally variable output data (Huang and Roy 2021). While initially problematic, this difference in spectral quality and number of bands among Planetscope images may be overcome by spectral data sharpening and band synthesizing.
Image sharpening has been developed to get the best of both spectral and spatial resolution by integrating different image sources to improve the information content and quality of multispectral images (Kaplan and Avdan 2018). Generally, high-resolution multispectral imagery is merged with lower-resolution images to create a hybrid product with high resolution and good spectral quality (Fonseca et al. 2011). Some satellites, such as Landsat, Worldview, and Quickbird, provide a panchromatic (Pan) band, which can be used in sharpening. Satellites with no Pan band can use other satellites' high-resolution bands to emulate a Pan band, provided the pixel resolution and the sharpening algorithm are suited to the specific process. For instance, to take advantage of Planetscope's four high-resolution spectral bands (3 m), Li et al. (2020) tested methods to combine spectrally corrected Sentinel-2 imagery with high-resolution but uncorrected Planetscope imagery for Earth Observation studies. Li et al. (2020) concluded that it is feasible to sharpen 3 m VNIR bands (Visible: Blue/Green/Red; and Near-Infrared: NIR) and synthesize Red-edge and SWIR reflectance on days that Sentinel-2 and Planetscope are spatially overlapping. Sharpening and synthesizing bands can also be highly beneficial for surface water mapping. For instance, sharpening methods, such as the Landsat-MODIS fusion (Jarihani et al. 2015), could be adapted to Planetscope-Sentinel. Given that water has a strong absorption in the SWIR band (Xu 2006), synthesizing bands allows the application of many methods for water detection that use the SWIR band (Feyisa et al. 2014; Xu 2006; Fisher, Flood, and Danaher 2016; Wang et al. 2018; Jarihani et al. 2015), even if the chosen satellite is not equipped with the sensor to capture this wavelength. Although using high-resolution and high-frequency imagery combined with sharpening and band synthesizing techniques has the potential to improve the accuracy of surface water mapping, its specific benefit to the Water Detect algorithm remains unknown. Moreover, as the sharpening and band synthesizing process can require considerable processing time, there is a need to quantify the accuracy gains associated with this process.
Our study aimed to improve the ability of the Water Detect algorithm to detect surface water in intermittent rivers by using a case study to examine the extent to which satellite/band combination, image sharpening and input parameters influenced water detection accuracy. The satellites evaluated were Planetscope, which has a high spatial resolution and narrower spectral range, and Sentinel-2, which has a lower spatial resolution but wider spectral range. To address our aim, we developed a workflow and automated routines that sharpened images (based on the method by Li et al. 2020) and identified the optimal input parameters for Water Detect. The input parameter routine was a sensitivity analysis that evaluated all possible combinations of spectral indices (NDWI, MNDWI), maximum clustering, and regularization. Our study also evaluated the extent to which the optimal satellite/band combination, image sharpening and input parameters differed depending on whether time series data or a discrete period was evaluated, providing an insight into whether a time-averaged approach can accurately map water across the hydrologic diversity that typifies intermittent rivers. To highlight the benefit of our approach, we compared the performance of Water Detect default input parameters with those identified by our sensitivity analysis. We conclude by visually assessing the sharpened results and the ability of the improved water maps to describe small linear water features (i.e. runs) and by quantifying their vulnerability to misclassification (i.e. burnt, shaded and heavily vegetated areas), issues likely to be problematic for other studies. Our study was conducted on time series imagery with 9 time steps that spanned considerable variation in surface water extent along an 81 km river stretch of the lower Fitzroy River, an intermittent river in north Western Australia. The automated routines produced as part of this study and the associated workflow are available on GitHub (see Data Availability Statement) and can be used by practitioners to determine if image sharpening is needed and to identify the satellite/band combinations and input parameters that optimize the accuracy of water detection. This will assist the development of best practices in identifying and mapping surface water in intermittent rivers and contribute to our ability to manage and protect these systems.

Study area
The Fitzroy River, located in the Kimberley region of north-western Australia (Figure 1), is an intermittent river whose flow regime is classified as "predictable summer highly intermittent" (Kennard et al. 2010). The river, which is situated in a savannah landscape, experiences extreme flood flows during the wet season (November to April) and dries to poorly interconnected or disconnected pools in the dry season (May to October) (Beesley et al. 2021). The extent to which surface water fragments during the dry season is linked to the magnitude of wet season flows, which vary markedly among years (Kennard et al. 2010), and to groundwater inputs (Taylor et al. 2018). We focused on one reach, an 81 km stretch of the lower Fitzroy River (Figure 1). This section had little to no groundwater discharge, and surface water transitioned from a fully connected channel during the wet season, down to a pool-run/riffle sequence along a thalweg during the early dry season, and down to isolated pools during the late dry season. The study area thus provided an excellent representation of the range of hydrologic conditions that the river experiences and enabled us to assess the ability of our multispectral approach to map narrow linear features (i.e. runs/riffles).
The area of interest polygon, i.e. the study site, was determined by first mapping the river corridor and then creating a polygon that encompassed a floodplain buffer. Specifically, we used slope values >20 degrees from a Digital Elevation Model (DEM) derived from LiDAR (2 m) to identify the main river corridor (i.e. main stem). Values >20 degrees were considered riverbanks, given that the lower Fitzroy River has deeply scoured channels. After filtering slope values, the river corridor was visually defined and manually polygonized using ArcGIS Pro 3.0 and Esri base maps, giving a total area of 10.08 km². The river corridor was then buffered on each side by 1 km to expand the area of interest (169.98 km²) so that it contained the main channel and the immediate floodplain (Figure 1). The resulting polygon was used to clip images for further analysis. We deliberately chose to include both the main channel and the adjoining floodplain in the assessment in recognition of the importance of river-floodplain connectivity to the ecological functioning of rivers; hence, the ability of the method to span this interface would be an important consideration for many practitioners. Note that the DEM was provided by the Western Australian Department of Water and Environmental Regulation.

Planetscope
As a source of high spatial resolution imagery, we used cloudless Level-3B PlanetScope-0 analytic surface reflectance orthoscenes atmospherically corrected by Planet Labs to surface reflectance using the 6S radiative transfer model (Kotchenova et al. 2006) with ancillary data from MODIS. The Planetscope-0 satellites have four spectral bands (VNIR): Blue (B: 455-515 nm), Green (G: 500-590 nm), Red (R: 590-670 nm), and Near-infrared (NIR: 780-860 nm) with a spatial resolution of 3 × 3 m (Planet Team 2021). We used 82 images captured in February, June and October of 2018, 2019, and 2020 (~9 tiles per month). The acquisition of images on dates close to Sentinel-2 image capture helps produce better sharpening quality results (Fonseca et al. 2011). Planetscope images were acquired using an academic license provided by Planet Labs Inc (https://planet.com accessed on 21 November 2022).

Method workflow
The study workflow was divided into three parts: image sharpening and band synthesizing, sensitivity analysis, and performance assessment (Figure 2). The first part sharpened Sentinel-2 VNIR bands using Planetscope VNIR bands as Pan and synthesized SWIR bands. The second part tested five satellite/band combinations and several combinations of parameters for the Water Detect algorithm using a sensitivity analysis. The last part was a performance assessment of the tested satellite/band combinations and parameters and a visual analysis of the results. We automated the Li et al. (2020) method for sharpening and synthesizing Sentinel-2 bands with Planetscope. We followed the same workflow as Li et al. (2020) but used different Python libraries with the intention of improving performance (see Data Availability Statement). For a detailed description and equations, refer to the Supplementary Material.

Image sharpening and band synthesizing
The first step of the automated sharpening process was to check if both images were in the same projection and if the reference image (Planetscope) was within the extent of the target image (Sentinel-2). Each band of Sentinel-2 and Planetscope images was co-registered using AROSICS (Scheffler et al. 2017) to ensure that all pixels were in the same geographic position. In the AROSICS co-registration process, the target image (Sentinel-2) is resampled to match the reference (Planetscope) pixel resolution (3 m). The co-registered Planetscope VNIR bands were spatially degraded to Sentinel-2 spatial resolutions (i.e. 10 or 20 m) using an OpenCV 2D convolution filter (Bradski 2000) with a 41 × 41 matrix (indices −20 to 20, with center i = 0, j = 0).
As Planetscope is equipped with sensors to capture only four bands (VNIR), and water has a high absorption in the SWIR band (Xu 2006), a 3 m equivalent SWIR band was synthesized from Sentinel-2 SWIR 1 (Band 11) and SWIR 2 (Band 12). For that, each of the Sentinel-2 SWIR bands was resampled and co-registered with Planetscope bands. While Jarihani et al. (2015) suggested calculating indices and then sharpening data for water footprint mapping, Planetscope lacks the same spectral bands as Sentinel-2, which necessitates band synthesizing first, followed by an index approach. All Planetscope bands were degraded to 20 m resolution (Supplement Equation (1), with f = 1/40 m⁻¹) and the respective SWIR modulation transfer function (MTF) values (2021) were used to calculate the convolutional matrix. The degraded 20 m bands were used as input variables in multiple linear regression equations (Supplement Equation (5)), executed with Scikit-learn (Pedregosa et al. 2011), using each band as an explanatory (or independent) variable and the 20 m Sentinel SWIR bands as the response (or dependent) variable. The regression coefficients were used in Supplement Equation (4) with the Planetscope bands to synthesize the SWIR bands. Lastly, a high-pass modulation (HPM) equation (Schowengerdt 2007; Li et al. 2020; Vivone et al. 2019) was applied to each VNIR band (Supplement Equation (6)) and to the synthetic SWIR bands (Supplement Equation (7)) to finish the sharpening process.
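As an illustration of the final step, the following is a minimal sketch of high-pass modulation in its commonly cited ratio form, in which a coarse band resampled to the fine grid is multiplied by the ratio of the reference band to its spatially degraded version. It is not the exact form of Supplement Equations (6) and (7); the function name `hpm_sharpen`, the `eps` guard, and the reflectance values are assumptions made for illustration only.

```python
def hpm_sharpen(low_res_up, pan, pan_low, eps=1e-6):
    """High-pass modulation on 2-D lists of reflectance values.

    low_res_up : coarse band already resampled to the fine grid
    pan        : fine-resolution reference band (here, a Planetscope band)
    pan_low    : the same reference band after spatial degradation
    eps        : hypothetical guard against division by zero
    """
    sharpened = []
    for row_ms, row_pan, row_low in zip(low_res_up, pan, pan_low):
        sharpened.append([
            # ms * p / pl is equivalent to ms + (ms / pl) * (p - pl)
            ms * p / max(pl, eps)
            for ms, p, pl in zip(row_ms, row_pan, row_low)
        ])
    return sharpened


# Illustrative 2 x 2 example with made-up reflectance values.
coarse = [[0.10, 0.10], [0.10, 0.10]]
pan = [[0.08, 0.12], [0.10, 0.10]]
pan_low = [[0.10, 0.10], [0.10, 0.10]]
result = hpm_sharpen(coarse, pan, pan_low)
```

In this form, pixels where the reference band is darker than its degraded version are darkened in the sharpened band and brighter pixels are brightened, while the local mean of the coarse band is preserved.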

Ground truthing
The Fitzroy River lies in a remote location and access to the river is limited, making the collection of ground truth information on water extent costly and logistically infeasible for all images. Opportunistic handheld GPS water extent points were collected during two fieldwork campaigns in June and October 2019 and September 2020 (Figure 3C). Although the ground truth points were insufficient to validate the entire area of interest, they aided in interpreting satellite imagery and creating hand-classified validation maps. Validation was also assisted by using high-resolution images (10 cm) from the Western Australian Land Information Authority (Landgate, 21 December 2016) (Figure 3D).
To overcome the scarcity of ground truth data, we manually polygonized the water extent for all images by visually interpreting areas identified as water (Figure 3A). We used supporting data, including high-resolution aerial imagery and LiDAR DEM products (2 m, captured during October 2016; Figure 3E), to understand preferential runoff flow paths, and a combination of Sharpened SWIR bands (3 m; Figure 3F) and natural (Figure 3A) and false colors (Figure 3B) of Planetscope (3 m) to enhance contrast. The resulting hand-digitized polygons were smoothed using ArcGIS's "smooth polygon" tool to give features a more natural and rounded appearance, as manual polygonizing can leave sharp and unnatural edges. Although manual visual interpretation is widely used in remote sensing and land use accuracy assessments (Shao and Wu 2008; Zhou et al. 2014; Costa, Foody, and Boyd 2018), it has limitations. This subjective and qualitative process is influenced by the interpreter's understanding of the spectral characteristics of each object and local knowledge of the study area, meaning that the results will vary with different interpreters. However, for an experienced interpreter with good local knowledge of the study area, misinterpretation of an object is less likely than with automated methods. The interpreted data were used as a ground truth reference in the subsequent quantitative accuracy analysis.

Automated sensitivity analysis for the Water Detect algorithm
The Water Detect algorithm (Cordeiro, Martinez, and Peña-Luque 2021) identifies surface water features by combining spectral water indices (e.g. the Normalized Difference Water Index, NDWI; the Modified Normalized Difference Water Index, MNDWI; and the Multiband Water Index, MBWI) with VNIR-SWIR bands to highlight features of interest that are clustered based on a multidimensional agglomerative clustering and a naive Bayesian classifier. In addition to choosing the main water spectral indices and the combination of bands, two other parameters can significantly influence the results of Water Detect: maximum clustering and regularization of the normalized spectral indices. These parameters are particularly important for our study, given that the area of interest was a buffered corridor around the main river channel. In this case, the number of possible targets to be identified as clusters is lower than in an entire Sentinel-2 scene, such as that used in the original study by Cordeiro, Martinez, and Peña-Luque (2021). Moreover, the regularization of spectral water indices promotes a shrinkage of the water indices' variance and can avoid a water cluster being split in two in the presence of different water constituents (i.e. organic and inorganic matter). The main input parameters for Water Detect must be configured in an initialization file (.ini) which holds all necessary inputs.
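For readers unfamiliar with the two indices evaluated in this study, the sketch below computes NDWI and MNDWI for single pixels using their standard normalized-difference definitions (Green and NIR for NDWI; Green and SWIR for MNDWI). The helper names and the reflectance values are hypothetical and chosen only to show the expected sign of each index; this is not code from Water Detect itself.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """Modified NDWI: (Green - SWIR) / (Green + SWIR)."""
    return normalized_difference(green, swir)


# Made-up surface reflectance values for one water pixel and one dry soil pixel.
water_pixel = {"green": 0.06, "nir": 0.02, "swir": 0.01}
soil_pixel = {"green": 0.10, "nir": 0.25, "swir": 0.30}

water_ndwi = ndwi(water_pixel["green"], water_pixel["nir"])
soil_ndwi = ndwi(soil_pixel["green"], soil_pixel["nir"])
water_mndwi = mndwi(water_pixel["green"], water_pixel["swir"])
soil_mndwi = mndwi(soil_pixel["green"], soil_pixel["swir"])
```

Because water absorbs strongly in the NIR and SWIR, both indices are positive for the water pixel and negative for the soil pixel in this example.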
We developed a sensitivity analysis algorithm (see Data Availability Statement) to test all possible combinations of input parameters (i.e. spectral indices, maximum clustering, and regularization) for Water Detect within a specified range. The algorithm assesses the accuracy of each combination, determines the most accurate inputs for each specific case, and produces a most-to-least accurate ranking. The sensitivity algorithm has four main inputs: 1) the Water Detect default initialization file as per Cordeiro, Martinez, and Peña-Luque (2021); 2) the range of maximum clustering and regularization given by lowest, highest, and step values; 3) the images to be tested; and 4) the ground truth raster to be used in the accuracy assessment.
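The step of rewriting the initialization file for each parameter combination can be sketched with Python's standard `configparser` module. The section name `Clustering` and the keys `max_clusters` and `regularization` are assumptions used for illustration and should be checked against the actual Water Detect initialization file; the function `make_config` is likewise hypothetical and is not the routine published with this study.

```python
import configparser
import io

# Stand-in for the default .ini file; section and key names are assumed.
DEFAULT_INI = """
[Clustering]
regularization = 0.02
max_clusters = 7
"""


def make_config(default_text, max_clusters, regularization):
    """Return the text of a new .ini with the two parameters overridden."""
    parser = configparser.ConfigParser()
    parser.read_string(default_text)
    parser["Clustering"]["max_clusters"] = str(max_clusters)
    parser["Clustering"]["regularization"] = f"{regularization:.2f}"
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


modified = make_config(DEFAULT_INI, max_clusters=4, regularization=0.05)
```

Writing one modified file per combination keeps each Water Detect run reproducible, since the exact inputs used for a given accuracy score can be inspected afterwards.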
The first step to finding the most accurate Water Detect input parameters for each tested image was to calculate all possible combinations between the specified range of maximum clustering and regularization values and the images to be tested. We automatically changed the initialization file for each unique combination based on the number of bands, maximum clustering, and regularization and used the modified initialization file to execute Water Detect. The reference points (ground truth) were then compared with the classification results (water mask) using the Scikit-learn Metrics module and its Matthews correlation coefficient (MCC) function (Pedregosa et al. 2011). The MCC (Eq. 1) is a performance measure of the quality of binary and multiclass classifications proposed by Matthews (1975) and revised by Baldi et al. (2000), which adapted the Pearson correlation coefficient to assess the correlation in confusion matrices. The MCC is less influenced by imbalanced datasets, as it considers both accuracy and error rates and uses all confusion matrix values (Bekkar and Alitouche 2013). In intermittent rivers, the ratio of water to non-water pixels will invariably become smaller as the dry season progresses and the relative number of water pixels decreases. Therefore, using metrics that account for skewed or biased datasets is necessary. MCC values range from −1 (total disagreement between predicted scores and true labels) through 0 (prediction no better than random) to +1 (perfect prediction) (Fernández et al. 2018). A high MCC score means high accuracy and low misclassification of positive and negative classes (Chicco, Tötsch, and Jurman 2021).

$$\mathrm{MCC} = \frac{tp \times tn - fp \times fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}} \qquad (1)$$

Where tp = True Positives, fp = False Positives, tn = True Negatives, and fn = False Negatives.
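The MCC can be computed directly from the four confusion matrix counts. The sketch below is a plain-Python illustration of the formula rather than the Scikit-learn call used in the study; the function names, the choice to return 0 when the denominator is zero, and the toy labels are assumptions made for illustration.

```python
from math import sqrt


def confusion_counts(truth, predicted):
    """Count tp, tn, fp, fn for binary labels (1 = water, 0 = non-water)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if the denominator is zero."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example: eight pixels, two of which are water in the reference.
truth = [1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(truth, predicted)
score = mcc(*counts)
```

In this toy case six of eight pixels are classified correctly (75% overall accuracy), yet the MCC is only about 0.33, which illustrates why the metric is preferred when water pixels are a small minority.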
To assess how input parameters for the Water Detect algorithm changed the accuracy of water identification, we used 3, 9, and 1 as lowest, highest, and step values for maximum clustering (i.e. 3, 4, 5, 6, 7, 8, 9), and 0.01, 0.1, and 0.01 for regularization (i.e. 0.01, 0.02, ..., 0.10). We tested five satellite/band configurations as input for the sensitivity analysis algorithm (Table 1). The accuracy of the classified images was evaluated using all valid points retrieved from each reference ground truth raster (~19 × 10⁶ points) and the respective values of those points in each classified raster. We therefore applied our sensitivity analysis algorithm across 3,150 configurations, i.e. 70 possible input combinations (7 maximum clustering values × 10 regularization values) × 5 satellite/band combinations × 9 time steps (February, June and October of 2018, 2019, and 2020). Thus, nine MCC scores were generated for each satellite/band combination and for each clustering-regularization combination (a total of 3,150 MCC values).
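The size of this search space can be reproduced by enumerating the grid. The sketch below is illustrative only: the helper `inclusive_range` and the text labels for satellite/band combinations and time steps are assumptions, not identifiers from the published routines.

```python
from itertools import product


def inclusive_range(lowest, highest, step, decimals=2):
    """Values from lowest to highest (inclusive), rounded to avoid float drift."""
    count = int(round((highest - lowest) / step)) + 1
    return [round(lowest + i * step, decimals) for i in range(count)]


max_clusters = list(range(3, 10))                # 3, 4, ..., 9
regularization = inclusive_range(0.01, 0.1, 0.01)  # 0.01, 0.02, ..., 0.10

# Hypothetical labels for the five satellite/band combinations.
band_sets = [
    "Planetscope VNIR",
    "Sentinel-2 VNIR",
    "Sharpened VNIR",
    "Sentinel-2 VNIR-SWIR",
    "Sharpened VNIR-Synthetic SWIR",
]
time_steps = [f"{year}-{month}" for year in (2018, 2019, 2020)
              for month in ("Feb", "Jun", "Oct")]

runs = list(product(band_sets, time_steps, max_clusters, regularization))
per_image = len(max_clusters) * len(regularization)
```

Here `per_image` is the 70 parameter combinations tested for each image, and `len(runs)` is the 3,150 Water Detect executions.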

Comparing satellite/band combination performance
The water detection performance of different satellite/band combinations (Planetscope VNIR, Sentinel-2 VNIR, Sharpened VNIR, Sentinel-2 VNIR-SWIR, and Sharpened VNIR-Synthetic SWIR) was compared over the nine time periods assessed (February, June, and October of 2018, 2019, and 2020). Firstly, we compared each satellite/band performance using all the combinations considered for the sensitivity analysis and used boxplots and density plots to aid in interpreting accuracy dispersion, patterns, and regions of high performance. Secondly, we compared the performance of the water detection results using the best-fitting input parameters, identified by the sensitivity analysis, with optimum time-averaged parameters (inputs that presented high averaged accuracy over all considered images) to test whether the optimum time-averaged parameters agreed well with the optimal input parameters. We then compared the performance of each satellite/band combination using the optimum time-averaged parameters.
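The selection of optimum time-averaged parameters amounts to averaging MCC scores over time for each parameter pair and keeping the pair with the highest mean. A minimal sketch for a single satellite/band combination follows; the function name, the dictionary layout, and the MCC values are made up for illustration and are not results from this study.

```python
from collections import defaultdict
from statistics import mean


def time_averaged_optimum(scores):
    """scores maps (time_step, max_clusters, regularization) to an MCC value.

    Returns the (max_clusters, regularization) pair with the highest MCC
    averaged over all time steps, together with that mean.
    """
    by_params = defaultdict(list)
    for (_, clusters, reg), value in scores.items():
        by_params[(clusters, reg)].append(value)
    averages = {params: mean(values) for params, values in by_params.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Made-up MCC values for two time steps and two parameter pairs.
scores = {
    ("2018-02", 3, 0.02): 0.80,
    ("2018-06", 3, 0.02): 0.70,
    ("2018-02", 4, 0.05): 0.78,
    ("2018-06", 4, 0.05): 0.76,
}
best_params, best_mean = time_averaged_optimum(scores)
```

In this example the pair (3, 0.02) is the best choice for the first time step alone, but (4, 0.05) has the higher mean across both, which is the distinction between per-image optimum parameters and optimum time-averaged parameters.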
Although the sensitivity test algorithm can extract top-performing input combinations for each image, for reproducibility purposes, we also aimed to find a combination that would perform consistently well over multiple images and time periods for each satellite/band, i.e. to identify optimum time-averaged parameters. We identified the optimum time-averaged parameters for each satellite/band by averaging the MCC results over time for each combination of maximum clustering and regularization. In addition, we analyzed the optimum time-averaged parameter combinations for each time step and satellite/

Visual analysis
Visual assessment is an important step in image data validation as human interpretation can detect nuances not readily identified by image statistics. The visual analysis was divided into two parts: verifying the sharpening and band synthesizing results, and assessing the water masks that resulted from the optimum time-averaged Water Detect parameters. In the first part, we sought to understand the impact of sharpening on image edge effects, caused by merging different tiles, and on water identification. In the second part, we checked the water masks for narrow linear features between pools and investigated cases of water misclassification, such as burnt areas, heavily shaded areas and dense vegetation. Misclassification was quantified by determining the percentage area of each misclassification type for each satellite/band combination.

Results
Results are separated into four sections. The first section uses data from the whole time series to evaluate the extent to which satellite/band combination affects water detection accuracy using MCC scores. The second section assesses how improved water maps using the time series data compare with data from discrete periods, providing an insight into whether a time-averaged approach can accurately map water across the hydrologic diversity that typifies intermittent rivers. This section also compares water detection accuracy using the default input parameters of Water Detect with the optimum and optimum time-averaged input parameters selected by our sensitivity analysis, and examines the performance of sharpened imagery using optimum time-averaged input parameters. The third section evaluates the extent to which maximum clustering and regularization parameters affect the accuracy of water detection. The fourth section uses water maps to provide a visual assessment of the sharpening results and of the ability of our approach to detect linear features common in intermittent rivers, and quantitatively appraises our approach's vulnerability to misclassification, i.e. its tendency to identify burnt, shaded or heavily vegetated areas as water.

Water detection accuracy using time series data
For our case study, VNIR bands were considerably better at detecting surface water compared to SWIR bands. VNIR bands typically received mean MCC scores of >0.7, whereas SWIR bands had mean MCC scores between 0.48 and 0.55 (Figure 4). Note that MCC scores of +1 indicate perfect prediction and −1 total disagreement. Sharpened images increased the accuracy of water detection but only marginally. For instance, Sharpened (VNIR) images had a mean MCC score of 0.746, whereas unsharpened images from Sentinel-2 (VNIR) and Planetscope (VNIR) had scores of 0.721 and 0.704, respectively (Figure 4).
Input parameters for the Water Detect algorithm had a greater influence on the ability to accurately detect water than the choice of satellite/band combination. For instance, the MCC scores displayed more variation within a satellite/band than among satellite/band combinations (Figure 4). This was particularly the case for SWIR bands, which displayed extremely large interquartile ranges (i.e. MCC scores of 0 to 0.756 for Sentinel-2 SWIR and 0.178 to 0.793 for Sharpened SWIR) (Figure 4). Sharpened (SWIR) presented the greatest range in accuracy, with MCC scores ranging from −0.266 to 0.87 (or a 427% increase). This high variation in accuracy indicates that using sub-optimal input parameters can markedly reduce the accuracy of water detection. VNIR bands displayed considerably smaller interquartile ranges, indicating they were more robust, i.e. sub-optimal input parameters did not impact their accuracy as much (Figure 4). However, it should be noted that some parameter choices also led to poor water identification for these satellite/band combinations, as illustrated by the outliers (black dots), which received MCC scores as low as 0.01.

Water detection accuracy: time series averaged vs. discrete times vs. the Water Detect default
The accuracy with which optimum time-averaged input parameters detected surface water was very similar to that of parameters chosen to improve accuracy for a single time period, indicating that, for our case study, time-averaged parameters can be confidently used to map water for an array of hydrologic conditions. This was particularly the case for VNIR bands (see the congruence between the orange and blue lines in the left panels in Figure 5). SWIR bands also showed considerable similarity, but time-averaged parameters displayed slightly lower accuracy (see the orange line sitting below the blue line in the right panels in Figure 5).
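The distinction between per-date optimum parameters and optimum time-averaged parameters can be sketched as follows. The MCC values, the three-date layout, and the function names are hypothetical and serve only to illustrate the selection logic: the time-averaged optimum is the single parameter pair with the highest mean MCC across all dates, whereas the per-date optimum may differ from one date to the next.

```python
from statistics import mean

# Hypothetical MCC scores, keyed by (max_clustering, regularization),
# with one score per image date.
scores = {
    (3, 0.09): [0.80, 0.78, 0.82],
    (6, 0.07): [0.85, 0.70, 0.75],
    (7, 0.02): [0.60, 0.79, 0.40],
}


def time_averaged_optimum(scores):
    """Parameter pair with the highest mean MCC across all dates."""
    return max(scores, key=lambda params: mean(scores[params]))


def per_date_optimum(scores):
    """Best parameter pair for each date taken separately."""
    n_dates = len(next(iter(scores.values())))
    return [
        max(scores, key=lambda params: scores[params][i])
        for i in range(n_dates)
    ]


print(time_averaged_optimum(scores))
print(per_date_optimum(scores))
```

In this toy table the per-date winner changes at every date, yet a single pair still gives the best average, which is the situation the comparison in Figure 5 is designed to reveal.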
The Water Detect algorithm's default parameters (regularization: 0.02 and maximum clustering: 7) were highly variable in their ability to accurately detect surface water at our study site through time. For instance, for some satellite/band combinations at some times, i.e. Planetscope (VNIR) in February, June and October 2018 and June 2020, the default parameters had similar accuracy to optimal parameters chosen by our sensitivity analysis. For other satellite/band combinations at other times, i.e. Sentinel-2 (SWIR) during October 2018, February and October 2019, and October 2020, the default parameters performed extremely poorly (Figure 5). The mechanism driving poor performance was unclear, as it was observed in both VNIR and SWIR bands and during times when the river was connected and when it was severely contracted.
Sharpened (VNIR) showed the highest water detection accuracy, i.e. MCC performance, with a mean score of 0.822, followed by Planetscope (VNIR) 0.809, Sentinel-2 (VNIR) 0.791, Sharpened (SWIR) 0.783, and finally Sentinel-2 (SWIR) 0.753 (Figure 6). While Sharpened (VNIR) imagery was consistently accurate through time, Planetscope (VNIR) had the highest MCC scores for six of the nine time steps. However, it also had the highest standard deviation due to poor performances in February 2018 and October 2020, likely caused by sensor calibration issues that appeared to be corrected in the sharpened image. Likewise, Sentinel-2 (VNIR) and Sharpened (SWIR) images had excellent performance over time, with mean MCC scores very close to Planetscope (VNIR). In fact, the Water Detect algorithm consistently identified water in all images with good performance, i.e. the difference between the highest mean (Sharpened VNIR) and the lowest (Sentinel-2 SWIR) was only 0.069 or 8%. Furthermore, the Sharpened (VNIR) MCC scores appeared to track the more accurate of their two source images, i.e. Sentinel-2 (VNIR) and Planetscope (VNIR), which suggests that the sharpening process can benefit water detection consistency over time and helps to explain its overall good performance. In our case, however, the differences in performance metrics between Sharpened (VNIR), Planetscope (VNIR), and Sentinel-2 (VNIR) were almost negligible.

Water detection accuracy: maximum clustering and regularization
The Water Detect input parameters that increased water detection accuracy (i.e. MCC scores) on average for all satellite/band combinations were low maximum clustering and mid-regularization (Figure 7, grouped mean). This is illustrated by the highest MCC values (yellow) occurring in a narrow ellipse to the left side of the bottom right panel in Figure 7. Maximum clustering exerted more influence on accuracy than regularization, as shown by greater variation in MCC values, i.e. the greater color differentiation along the clustering axis than along the regularization axis in Figure 7. This pattern also played out when evaluating the top five parameter combinations for each satellite/band combination. For instance, four out of five satellite/band combinations had a mean maximum clustering between 3 and 5, with Sentinel-2 (VNIR) having an optimal value of 6. Similarly, for regularization, four out of five satellite/band combinations had values between 0.07 and 0.1, with Sharpened (VNIR) having an optimum value of 0.04. Our results indicate that, across a hydrological continuum (here examined using images from multiple times), water detection accuracy was increased by the following parameters: Planetscope (VNIR) (max. clustering: 3, regularization: 0.09), Sentinel-2 (VNIR) (6, 0.07), Sharpened (VNIR) (5, 0.04), Sentinel-2 (SWIR) (3, 0.08) and Sharpened (SWIR) (4, 0.1). There was a minimal difference in accuracy between the top five ranking parameter combinations for each satellite/band combination, with the MCC difference between the first and last (1 and 5) being 0.007 (Planetscope VNIR), 0.012 (Sentinel-2 VNIR), 0.017 (Sharpened VNIR), 0.006 (Sentinel-2 SWIR), and 0.023 (Sharpened SWIR). Thus, for our case study, we could recommend any of the top five most accurate input combinations as suitable.
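The ranking described above can be illustrated with a short sketch that enumerates every combination of the two inputs on a grid, scores each one, and reports the top five together with the MCC spread between the first and fifth. The parameter ranges, the `toy_score` function, and all names below are assumptions chosen for illustration; in practice the score for each combination would come from running Water Detect and comparing the resulting mask with ground-truth data.

```python
from itertools import product

# Assumed search ranges; the ranges used in a real study may differ.
max_clustering_values = range(2, 11)                          # 2 to 10
regularization_values = [round(i / 100, 2) for i in range(1, 11)]  # 0.01 to 0.10


def toy_score(max_clustering, regularization):
    """Stand-in for a time-averaged MCC; peaks at (4, 0.08) by construction."""
    return 1.0 - 0.05 * abs(max_clustering - 4) - 2.0 * abs(regularization - 0.08)


def rank_combinations(score_fn, clustering_values, regularization_values, top_n=5):
    """Score every parameter pair and return the top_n as (params, score)."""
    scored = [
        ((k, reg), score_fn(k, reg))
        for k, reg in product(clustering_values, regularization_values)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


top_five = rank_combinations(toy_score, max_clustering_values, regularization_values)
spread = top_five[0][1] - top_five[-1][1]  # MCC difference between rank 1 and rank 5
for params, score in top_five:
    print(params, round(score, 3))
print("spread:", round(spread, 3))
```

A small spread between the first and fifth ranked combinations, as reported for all five satellite/band combinations here, indicates that any of the top five would be a reasonable choice.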

Visual analysis and misclassification cases
Our visual assessment indicated that the sharpening method seamlessly unified all Planetscope scenes (Figure 8). The resulting image looked more natural with improved contrast compared with the merged raw images. For instance, the black arrows in Figure 8B indicate contrast issues (i.e. edge effects) that arise when merging raw Planetscope imagery, and Figure 8C shows how these problems are resolved after sharpening, i.e. smooth edges, obvious contrast, and critical image detail easily visible. Interestingly, edge effects did not appear to affect water identification using Water Detect on Planetscope images (Figures 9A, B and C). No edge effect was observed in the Sharpened SWIR images (Figures 8D and E).
The Water Detect algorithm was able to identify surface water features in all images. This was also the case when narrow linear features between pools (i.e. runs) were analyzed, an important feature in river fragmentation/connectivity studies. A visual appraisal of these images reveals how well isolated pools and runs were mapped (Figure 10B, F and I). It also shows how successfully inundated floodplain channels were mapped (Figure 10A, D and G).
Although the Water Detect algorithm was able to identify surface water features in all images, some instances of misclassification were observed. This occurred even when optimum time-averaged input parameters were used for each satellite/band combination in the water detection process. For instance, when using Planetscope and Sentinel-2 (VNIR), burnt areas were consistently and erroneously classified as water (Figure 11). This can dramatically reduce the accuracy of the water mask, given that burnt zones can extend over large areas, as seen in Figure 11. However, our results showed that image sharpening reduced misclassification, especially during June and October 2018, but also during October 2019 and October 2020 (see Figure 11). Heavily shaded areas were also frequently mistaken for water in VNIR images (Figure 12); however, the extent of misclassification was much smaller than for burnt areas (a maximum of <2% against a maximum of <38%). In this case, sharpening also reduced misclassification. In some SWIR images (Sentinel-2 and Sharpened), spots of dense riparian vegetation were misclassified as water, as were some spots of sparse vegetation during dry months, i.e. see June and October in Figure 13. In this case, sharpening increased misclassification (see June and October 2018, October 2019, and June and October 2020 in Figure 13).
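One plausible way to express the misclassification percentages reported above is as the share of pixels in a reference mask of the confusing surface (burnt, shaded, or densely vegetated) that the water mask labels as water. The sketch below assumes this definition and uses hypothetical names and data; the exact calculation behind Figures 11 to 13 is not specified here.

```python
def misclassified_percent(water_mask, confuser_mask):
    """Percentage of confuser pixels (e.g. burnt areas) labelled as water.

    Both masks are equal-length sequences of 0/1 values covering the same pixels.
    """
    if len(water_mask) != len(confuser_mask):
        raise ValueError("masks must have the same number of pixels")

    confuser_total = sum(confuser_mask)
    if confuser_total == 0:
        return 0.0  # no confuser pixels, so nothing can be misclassified

    wrongly_wet = sum(
        1 for w, c in zip(water_mask, confuser_mask) if w == 1 and c == 1
    )
    return 100.0 * wrongly_wet / confuser_total


# Hypothetical six-pixel example: four burnt pixels, two of them mapped as water.
water = [1, 1, 0, 0, 1, 0]
burnt = [0, 1, 1, 1, 1, 0]
print(misclassified_percent(water, burnt))  # 50.0
```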

Discussion
This case study found that the ability to accurately detect surface water in the Fitzroy River using multispectral imagery was improved by identifying optimum input parameters for the Water Detect algorithm and using VNIR bands, with relatively little gained by image sharpening routines. Our automated sensitivity test was highly effective at identifying optimal inputs that increased the accuracy of the Water Detect algorithm, with the use of sub-optimal inputs leading to considerable declines in accuracy. In our study system, we found that VNIR bands were better able to detect water than SWIR bands, highlighting the possibility of accurate water detection using simple satellite/band combinations. The benefits of sharpening Sentinel-2 with Planetscope images were marginal, especially if the target area did not include heavily shadowed or extensive burnt areas; thus, sharpening should be considered only after weighing the benefits against the additional processing time. Using automated routines to sharpen imagery and determine input parameters for the Water Detect algorithm is a significant advancement that will increase the accuracy of water detection and mapping in intermittent rivers.
The correct choice of input parameters for Water Detect considerably influenced the ability of the algorithm to accurately classify surface water in our case study. For instance, optimal parameters for maximum clustering and regularization identified by our sensitivity test algorithm increased accuracy by 427% compared to sub-optimal values for the satellite/band combination with the greatest range in accuracy, here Sharpened (SWIR). In our case study, the optimum time-averaged maximum clustering value was 6 for Sentinel-2. This finding is in accordance with the results of Cordeiro, Martinez, and Peña-Luque (2021), who found that the accuracy of Sentinel-2 peaked and stabilized with maximum clustering values between 5 and 10 (Water Detect default = 7). Our study is the first systematic test of regularization for the Water Detect algorithm, and our results indicate that the ability to accurately identify water is increased with mid-regularization values. However, the fact that the optimal value for this parameter varied considerably among discrete time periods, even though water detection accuracy changed relatively little, suggests that this parameter exerts less influence on water detection accuracy than the maximum clustering value. Ground-truthed data are required to confirm any sensitivity test and performance assessment; however, obtaining broad-scale ground-truth data may be infeasible for many remote intermittent river systems. In such instances, we recommend that practitioners use a small but representative stretch of their study system to collect ground-truth data, perform the sensitivity test and identify optimum time-averaged input parameters.
Our automation of the Li et al. (2020) sharpening routine is a significant advancement that makes it possible to easily and quickly sharpen many sets of images or time series. This automation is advantageous because of its speed and because it reduces the possibility of human error, which can arise when many images are sharpened manually. The resulting images were clear and seamlessly unified compared to raw merged Planetscope images and are visually comparable to the results achieved by Li et al. (2020). The sharpened images also outperformed, albeit slightly, raw merged images when identifying water, which is another indicator that the sharpening process successfully fused the best qualities of the source images.
While our automated image sharpening was successful, our study revealed that sharpening had a negligible effect on the accurate identification of water in most situations. For instance, the maximum increase in accuracy associated with sharpening in our study system was only 1.52%. Nonetheless, when one of the source images performed poorly, the sharpened image tended to perform as well as the most accurate source image (Sentinel-2 or Planetscope), leading to more consistent accuracy through time. The benefits of sharpening can be attributed to a given sensor's spectral quality or spatial resolution, since we used the best combination of both in the sharpened image. Depending on a project's objective, this could justify the additional processing time despite marginal improvement.
VNIR images outperformed SWIR images in all scenarios but one, indicating that four-band images, which are much more common and less expensive to capture, are suitable for accurate water detection. This finding was somewhat surprising since the Water Detect algorithm was developed initially to be used only with SWIR images (Cordeiro, Martinez, and Peña-Luque 2021). In this sense, VNIR sensors should be considered an option in future versions of the Water Detect algorithm. However, we caution against using VNIR images for detecting water in intermittent rivers in built-up, burnt, or heavily shaded areas, as research suggests that SWIR bands perform better in these settings (Xu 2006; Zhang et al. 2011). We recommend that our sensitivity algorithm be used to reveal the optimum satellite/band combinations and input parameters for intermittent rivers heavily impacted by shade.
Even though all tested images demonstrated excellent overall performance in water detection when optimized Water Detect input parameters were used, we observed significant misclassifications linked to dark surfaces, i.e. burnt areas, heavily shaded areas, and shaded areas in dense vegetation, that are worthy of discussion. Misclassification linked to burnt and heavily shaded areas was more pronounced with VNIR images, and misclassification with dense vegetation was more pronounced with SWIR images. Many studies have detailed the misidentification of shadows and dark surfaces as water (Xu 2006; Zhai et al. 2015; Guo et al. 2017; Zhang et al. 2011). Burnt areas can be particularly problematic in northern Australia, where Indigenous people frequently light fires as a part of land management (Jackson, Finn, and Featherston 2012; Bird, Bird, and Parker 2005; McGregor et al. 2010). Those seeking to identify surface water in regions with frequent fire histories should consider using SWIR bands; this is especially the case if the area of interest is the floodplain (i.e. main channel areas are infrequently burnt). A possible solution to minimize misclassification would be introducing methods into the classification routine that mask heavily shaded and burnt areas. Such additional steps are possible (Mostafa 2017; Yamazaki, Liu, and Takasaki 2009; Pereira 2003; Roy et al. 2005; Frantz et al. 2016) but were beyond the scope of our study.
In our case study, Sentinel-2 (VNIR) images performed surprisingly well relative to Sharpened (VNIR) and Planetscope (VNIR), even though this satellite's spatial resolution is more than three times as coarse. This finding is important as the marginal improvement in performance from Planetscope (VNIR) and Sharpened (VNIR) incurs the expense of greater processing time and the financial cost of purchasing Planetscope versus the freely available Sentinel-2 imagery. It is likely that the relatively large size (~100 m width) of the main channel in our case study minimized the difference in performance between high-resolution (Planetscope and Sharpened) and coarser images (Sentinel-2). For instance, while the study site was chosen because it dried down to pools and contained small linear features (runs/riffles), these small features still made a relatively minor contribution to surface water on average through time. A study examining a narrow tributary (width <20 m) may find that satellites with higher resolution have superior water detection accuracy. We believe that Sentinel-2 is a solid option for studies seeking to identify surface water at catchment or sub-basin scales, but ultimately that practitioners should choose satellites with the spatial resolution most suited to their study system and aims.
Practitioners seeking to identify surface water using the Water Detect algorithm can improve their accuracy by considering satellite/band combinations, input parameters and the need for image sharpening. We recommend that a pilot study, similar to the workflow described here, be undertaken at the commencement of a study to determine optimal conditions. Practitioners need to be mindful of their river/stream size and study aims and the resolution required to meet them, as this will also likely influence the choice of a satellite. Similarly, consideration should be given to potential sources of water misclassification, as techniques such as image sharpening can be used to reduce this error. While our case study identified the conditions that improve water detection for the lower Fitzroy River and are likely to perform well for other large lowland river systems in savannah landscapes, the suitability of our findings to other settings remains unknown. Only by evaluating the accuracy of Water Detect in different systems with different aims can we learn which satellite/band combinations, input parameters, and sharpening methods are most appropriate and whether generalizations can be made.

Figure 1. Location of the study area in the Fitzroy River, Kimberley, Western Australia. The study area is represented by the light blue polygon and the inset dashed box on the right shows an example of the Fitzroy River corridor and the surrounding 1 km buffer that was used as a polygon to clip images for analysis.

Figure 2. Study workflow showing image sharpening, band synthesizing and sensitivity analysis prior to the performance assessment. Note that the satellite/band combinations tested are shown in light blue boxes.

Figure 3. Example of the different sources of data used to ground truth the water masks.

Figure 4. Boxplots of MCC scores for all satellite/band combinations, showing their mean (white dots), median (black horizontal line), interquartile range (colored box), minimum/maximum values (whiskers), and outliers (black dots). High MCC scores indicate better water detection performance.

Figure 5. Comparison of performance between the most accurate images using optimum fitting parameter combinations, the optimum time-averaged fitting parameter combination, and the Water Detect default parameters (maximum clustering 7, regularization 0.02) for the five satellite/band combinations examined. Horizontal lines show the adherence between the fittings.

Figure 6. Comparison of the accuracy of water detection for sharpened and unsharpened images through time. Bars show MCC scores for each time and satellite/band combination, as per the legend on the bottom. Note that optimum time-averaged parameters were used as inputs to Water Detect.

Figure 7. Point density plots comparing different time-averaged MCC scores for different combinations of the two inputs into the Water Detect algorithm, i.e. maximum clustering and regularization parameter, for each satellite/band combination. The lower right panel shows the grouped mean for all satellite/band combinations combined. MCC scores represent the accuracy of water detection and are color coded as per the legend. Note that parameter combinations that produced poor accuracy, i.e. MCC scores < 0.7, were omitted from the figure to improve visual interpretation.

Figure 8. Visual comparison between Sentinel-2 (A) on the left, Planetscope (B) in the center, and Sharpened Sentinel-2 (C) on the right (February 2019). The black arrows in the center indicate the difference between scenes (edge effect) when merging raw Planetscope imagery without any treatment. Panels D and E show Sentinel-2 and Sharpened SWIR, respectively.

Figure 9. Water detection in raw merged Planetscope imagery at three locations within the study site affected by image edge effects, i.e. where Planetscope images were merged. Note the seamless identification of water across these boundary locations.

Figure 10. An example of water mapping at the study site through time as achieved by the best satellite/band combination and the optimum Water Detect input parameters identified using the sensitivity analysis. Each panel represents a different date and shows the variation in hydrological connectivity within and among years. The blue coloring indicates identified water. The base map in the background is imagery from Planetscope. Sharpened (VNIR) showed the best performance over the other considered satellite/band combinations in February 2018 and October 2020 (A and I); Sharpened (SWIR) in June 2018 (B); and Planetscope (VNIR) in October 2018, February 2019, June 2019, October 2019, February 2020, and June 2020 (C, D, E, F, G, and H).

Figure 11. A typical case of misclassification: burnt areas. The left panel shows burnt areas identified as water in red, and on the right, a bar plot shows the percentage of burnt areas erroneously classified as water.

Figure 12. A typical case of misclassification: heavily shaded areas. The left panel shows heavily shaded areas identified as water in yellow, and on the right, a bar plot shows the percentage of heavily shaded areas erroneously classified as water.

Figure 13. A typical case of misclassification: dense vegetation. The left panel shows dense vegetation identified as water in pink, and on the right, a bar plot shows the percentage of shadows on dense areas erroneously classified as water.

Table 1. Satellite/band configurations, water indices and parameters used in the sensitivity test.