Mapping burn severity in the western Italian Alps through phenologically coherent reflectance composites derived from Sentinel-2 imagery

Deriving burn severity from multispectral satellite data is a widely adopted approach to infer the degree of environmental change caused by fire. Burn severity maps obtained by thresholding bi-temporal indices based on pre- and post-fire Normalized Burn Ratio (NBR) can vary substantially depending on temporal constraints such as matched acquisition and optimal seasonal timing. Satisfying temporal requirements is crucial to effectively disentangle fire and non-fire induced spectral changes and can be particularly challenging when only a few cloud-free images are available. Our study focuses on 10 wildfires that occurred in mountainous areas of the Piedmont Region (Italy) during autumn 2017 following a severe and prolonged drought period. Our objectives were to: (i) generate reflectance composites using Sentinel-2 imagery that were optimised for seasonal timing by embedding spatial patterns of long-term land surface phenology (LSP); (ii) produce and validate burn severity maps based on the modelled relationship between bi-temporal indices and field data; (iii) compare burn severity maps obtained using either a pair of cloud-free Sentinel-2 images, i.e. paired images, or reflectance composites. We proposed a pixel-based compositing algorithm coupling the weighted geometric median and thematic spatial information, e.g. long-term LSP metrics derived from the MODIS Collection 6 Land Cover Dynamics Product, to rank all the clear observations available in the growing season. Composite Burn Index data and bi-temporal indices exhibited a strong nonlinear relationship ( R 2 > 0.85) using paired images or reflectance composites. Burn severity maps attained overall classification accuracy ranging from 76.9% to 83.7% (Kappa between 0.61 and 0.72) and the Relative differenced NBR (RdNBR) achieved the best results compared to other bi-temporal indices (differenced NBR and Relativized Burn Ratio). Improvements in overall classification accuracy offered by the calibration of bi-temporal indices with the dNBR offset were limited to burn severity maps derived from paired images. Reflectance composites provided the highest overall classification accuracy and differences with paired images were significant using uncalibrated bi-temporal indices (4.4% to 5.2%) while they decreased (2.8% to 3.2%) when we calibrated bi-temporal indices derived from paired images. The extent of the high severity category increased by ~19% in burn severity maps derived from reflectance composites (uncalibrated RdNBR) compared to those from paired images (calibrated RdNBR). The reduced contrast between healthy and burnt conditions associated with suboptimal seasonal timing caused an underestimation of burnt areas. By embedding spatial patterns of long-term LSP metrics, our approach provided consistent reflectance composites targeted at a specific phenological stage and minimising non-fire induced inter-annual changes. Being inde- pendent from the multispectral dataset employed, the proposed pixel-based compositing approach offers new opportunities for operational change detection applications in geographic areas characterised by persistent cloud cover.


Introduction
Fire is one of the major natural disturbance agents in European Alpine forests (Bebi et al., 2017;Kulakowski et al., 2016). Current fire regimes in the European Alps exhibit significant heterogeneity in terms of fire frequency, spatial extent and seasonality, according to the variability in climatic, environmental and anthropogenic drivers (Bebi et al., 2017;Conedera et al., 2018;Wastl et al., 2013;Zumbrunnen et al., 2011). Recent and projected increases in temperatures and drought conditions associated with climate change (Gobiet et al., 2014;Gobiet and Kotlarski, 2020;Gudmundsson and Seneviratne, 2016) are crucial factors for future shifts of fire regimes in the European Alps (Bedia et al., 2014;Cane et al., 2013;Schumacher and Bugmann, 2006), substantially increasing the probability of large wildfires occurrence (Barbero et al., 2019). Recently, the severe and prolonged drought conditions associated with heat waves that occurred during the summer of 2017 in several parts of south-central Europe (Rita et al., 2020) led to an anomalous fire season in many regions of France, Italy, Portugal and Spain (Turco et al., 2018).
Severity is one of the main factors influencing ecosystem responses, so its assessment is crucial to effectively guide post-fire management strategies aimed at promoting forest regeneration and the recovery of ecosystem services (Leverkus et al., 2018). From an ecological perspective, severity is defined as the magnitude of environmental change caused by fire (Key and Benson, 2006;Lentile et al., 2006). The term burn severity is commonly used in remote sensing applications (Keeley, 2009), and its differences with fire severity are related to the assessment period (Cansler and McKenzie, 2012;Lentile et al., 2006;Veraverbeke et al., 2010a). Fire severity commonly refers to an initial assessment of those effects directly related to combustion, such as fuel consumption and tree mortality measured immediately after the fire. On the contrary, burn severity refers to an extended assessment of severity, usually performed during the first growing season following the fire. This implicates that burn severity combines fire effects and the initial ecosystem response, including delayed mortality and survivorship (Key, 2006).
Mapping burn severity with medium-resolution satellite imagery, e. g. Landsat data, acquired in the pre-and post-fire growing seasons, is typically performed through bi-temporal indices based on the Normalized Burn Ratio (NBR) (García and Caselles, 1991), such as the differenced Normalized Burn Ratio (dNBR, Key and Benson, 2006), the Relative dNBR (RdNBR, Miller and Thode, 2007) and the Relativized Burn Ratio (RBR, Parks et al., 2014). Ecologically meaningful burn severity maps can be produced by classifying bi-temporal indices using thresholds derived from parametric models incorporating field measures of burn severity (Key and Benson, 2006;Kolden et al., 2015). Commonly adopted field data comprehend the Composite Burn Index (CBI) (Cansler and McKenzie, 2012;Key and Benson, 2006) and its modifications, i.e. GeoCBI (De Santis and Chuvieco, 2009) and weighted CBI (Soverel et al., 2010), percentage change in tree canopy cover and tree basal area (Miller et al., 2009).
Given the influence of image selection on bi-temporal indices (Chen et al., 2020;Veraverbeke et al., 2010a), pre-and post-fire image pairs employed for computing bi-temporal indices should meet temporal requirements relative to matched acquisition and optimal seasonal timing (Key, 2006;Veraverbeke et al., 2010a). Image pairs with similar acquisition timing throughout the year enhance spectral matching between pre-and post-fire conditions, enabling to disentangle between spectral changes induced by fire and external factors (Eidenshink et al., 2007;Key, 2006;Miller and Thode, 2007;Veraverbeke et al., 2010b). Plant phenology, solar elevation angle, illumination variations due to topography, and moisture content of both soil and vegetation are among the most important external factors causing inter-and intra-annual changes in the spectral response of the land surface (Key, 2006;Key and Benson, 2006;Veraverbeke et al., 2010b;Verbyla et al., 2008). Several approaches are useful to limit the influence of such non-fire induced changes. For example, specific topographic correction techniques can effectively reduce pre-and post-fire differences in reflectance values arising from illumination effects associated with rugged terrains (Veraverbeke et al., 2010b). Inter-annual variations in plant phenology and moisture content can be mitigated by performing a calibration based on dNBR values retrieved from the unburnt area surrounding the fire perimeter (Key, 2006;Meddens et al., 2016;Miller and Thode, 2007;Parks et al., 2014) and in the same forest type (Furniss et al., 2020;Picotte et al., 2020). The dNBR offset, usually computed as the average from the unburnt area and subtracted from the entire scene, should ideally be close to zero in the case of image pairs with matched spectral conditions (Key, 2006). This approach proved to be effective in improving the relationship between bi-temporal indices and CBI (Meddens et al., 2016;Parks et al., 2018). Nevertheless, extracting the dNBR offset can be a subjective process (Picotte et al., 2020) and depends on the spatial configuration of the landscape mosaic (Parks et al., 2018). In particular, the selection of an appropriate set of pixels requires the presence of similar forest types near the burnt area, and a single value could be suboptimal to calibrate inter-annual differences in multiple forest types or within burnt areas with a broad altitudinal gradient.
Optimal seasonal timing refers to the timing of image acquisition that maximises the contrast between healthy and burnt vegetation (Chen et al., 2020;Eidenshink et al., 2007;Key and Benson, 2006;Veraverbeke et al., 2010a). In temperate ecosystems, the optimal seasonal timing typically spans from early-to-middle growing season dates as the vegetation reaches its maximal photosynthetic activity (Eidenshink et al., 2007;Key, 2006;Key and Benson, 2006;Picotte et al., 2020). In burnt areas spanning a wide elevation range, the optimal seasonal timing can vary considerably, thus requiring multiple images to be processed (Key, 2006). Moreover, complex landscape mosaics such as those of the Mediterranean Basin exhibit a high degree of local variations in phenology associated with different land covers (Veraverbeke et al., 2010a).
The selection of appropriate image pairs poses challenges for the operational assessment of burn severity with bi-temporal indices, e.g. when large areas or a high number of fires are considered (Parks et al., 2018;Whitman et al., 2020). Specifically, the amount of time required by expert operators (Parks et al., 2018), the lack of standardised procedures (Chen et al., 2020), and the availability of cloud-free images (French et al., 2008), especially during the first post-fire growing season (Key and Benson, 2006), constrain the usage of bi-temporal indices. Recent approaches to overcome the limitations above have been developed (Parks et al., 2018;Whitman et al., 2020). They rely on pixelbased mean compositing algorithms that exploit all the Landsat images available in the pre-and post-fire growing season. While producing spatially consistent results, the mean compositing method requires an effective removal of invalid pixels, i.e. those contaminated by clouds, cloud shadows or snow (Vancutsem et al., 2007). As accurately identifying cloud and cloud shadows in Landsat and Sentinel-2 imagery is an active research topic (Tarrio et al., 2020;Wei et al., 2020), compositing methods that minimise the importance of odd values would be preferable. Compositing algorithms based on multidimensional medians such as the medoid (Flood, 2013) and the geometric median  can produce consistent pixel-based reflectance composites when the proportion of invalid values is less than half of the observations. The medoid approach belongs to the best pixel selection strategy group (sensu Griffiths et al., 2019), while the geometric median generates synthetic reflectance values, i.e. not actually sensed. The amount of available data considerably influences the quality of reflectance composites produced with the medoid Van Doninck and Tuomisto, 2017). However, the geometric median can produce spatially coherent reflectance values even when clear observations are scarce . These methods have been successfully employed to produce image composites of Landsat imagery (Flood, 2013;Kennedy et al., 2018;Roberts et al., 2017;Van Doninck and Tuomisto, 2017).
A widely adopted best pixel selection strategy is parametric scoring, which involves assigning a possibly weighted sum of scores obtained from the evaluation of several parameters to each clear observation available within the compositing period for a given pixel location (Frantz et al., 2017;Griffiths et al., 2013;White et al., 2014). The acquisition Day-Of-Year (DOY) and year, the distance to clouds and cloud shadows, the sensor and the amount of atmospheric haze are among the most used parameters. While target DOYs are often determined in a static fashion for producing reflectance composites over large areas (Griffiths et al., 2019White et al., 2014), spatially explicit land surface phenology (LSP) metrics can serve to dynamically adjust scoring functions relative to the acquisition DOY for attaining phenological coherence (Frantz et al., 2017). Reflectance composites optimised for representing land surface at a specific phenological stage encompass climatic variations induced by both latitudinal and elevation gradients that can otherwise generate spectral inconsistencies when analysing vegetation dynamics over time (Frantz et al., 2017).
In the Western Italian Alps (Piedmont Region), several large wildfires simultaneously occurred in the second half of October 2017, outside of the fire season that typically spans from winter to early spring. They were favoured by exceptional summer drought conditions that lasted into the autumn associated with strong gusts of foehn wind (Arpa Bo et al., 2020). These events burnt nearly 10,000 ha of forests, woodlands, shrublands and pastures and heavily affected the air quality of the surrounding urban areas (Bo et al., 2020). In response to these events, a post-fire management plan has been developed by the regional forest managers in cooperation with other stakeholders (Regione Piemonte, 2019), using burn severity maps produced by the University of Torino. Burn severity maps provided crucial information to identify areas that required prompt post-fire interventions. In fact, fire impaired the protective function of montane forests (e.g. Brang et al., 2006), i.e. protection against soil erosion, rockfall and avalanches, particularly in high severity patches.
Our general hypothesis is that phenologically coherent reflectance composites can increase both effectiveness and operational usage of bitemporal indices for mapping burn severity. In particular, we expect that targeting surface reflectance in early to intermediate stages of the growing season can enhance the discrimination capability of bitemporal burn severity indices, e.g. between fire and non-fire induced spectral changes. In this study, we aimed at achieving the following objectives: (i) to generate pre-and post-fire reflectance composites optimised for burn severity mapping with Sentinel-2 imagery by embedding spatial patterns of long-term LSP metrics; (ii) to produce and validate burn severity maps based on the modelled relationship between bi-temporal indices and CBI data; (iii) to compare burn severity maps obtained by using either Sentinel-2 image pairs or reflectance composites.

Study area
Our study focuses on 10 burnt areas ( Fig. 1, Table 1) whose perimeters were mapped by the Carabinieri Command for Forest Protection through field surveys. We retrieved the start and end dates of each fire (Table 1) from the Visible Infrared Imaging Radiometer Suite (VIIRS) 375 m standard Active Fire product (Schroeder et al., 2014) distributed by NASA's Fire Information for Resource Management System (FIRMS). The extent of burnt areas ranged from 55 to 3974 ha, resulting in a total area of 9740 ha, of which forests covered 7202 ha, according to the Dominant Leaf Type map produced in 2015 (European Environmental Agency, 2018a). Burnt areas were mostly covered by montane and submontane broadleaved and coniferous forests. In particular, forests dominated by broadleaved species covered 4995 ha, and among tree species, sweet chestnut (Castanea sativa M.), European beech (Fagus sylvatica L.) and downy oak (Quercus pubescens W.) were the most abundant. The dominant coniferous tree species were European larch (Larix decidua M.), and Scots pine (Pinus sylvestris L.), covering 2207 ha of the total burnt area.

General overview
We performed the analyses in two steps: fire severity (Section 2.5) and burn severity mapping (Section 2.6), as highlighted in Fig. 2. Fire severity maps allowed us to develop the sampling design to collect CBI data in the burnt areas (Section 2.4). We employed LSP metrics during the selection of pre-and post-fire images, hereafter referred to as paired images, and to produce reflectance composites (Fig. 2).

Satellite data and preprocessing
We conducted the analyses using data acquired by the MultiSpectral Instrument (MSI) onboard Sentinel-2A (S2A) and Sentinel-2B (S2B) satellites. Specifically, we downloaded Sentinel-2 images containing Top-Of-Atmosphere (TOA) reflectance values processed according to Level-1C for the UTM-based Military Grid Reference System (MGRS) tiles 32TLQ and 32TLR. We preprocessed Sentinel-2 images employed for mapping fire severity (Section 2.5) and burn severity (Section 2.6) using different tools, depending on their availability when we performed the analyses. We mapped fire severity using Bottom-Of-Atmosphere (BOA) reflectance products obtained with the Sen2Cor 2.4.0 processor (Louis et al., 2016). We removed pixels contaminated by clouds, cloud shadows and snow by applying masks generated using the Function of mask (Fmask) version 4.0 software (Qiu et al., 2019). We mapped burn severity using BOA reflectance products derived with the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE) software (version 3.6.5, available at https://github.com/da vidfrantz/force) (Frantz, 2019). In particular, the FORCE Level 2 Processing System performed the following operations: (i) detection of clouds, cloud shadows and snow using a modified version of the Function of mask (Fmask) algorithm (Frantz et al., 2018); (ii) sub-pixel coregistration (Rufin et al., 2020); (iii) atmospheric correction (Frantz et al., 2016); (iv) topographic correction (Kobayashi and Sanga-Ngoie, 2008); (v) computation of Nadir BRDF (Bidirectional Reflectance Distribution Function)-Adjusted Reflectance (NBAR) (Roy et al., 2017a(Roy et al., , 2017b. Sub-pixel co-registration, topographic correction and NBAR values were fundamental requirements for temporal and spatial consistency in a multi-temporal analysis setting (Frantz et al., 2016;Roy et al., 2017a;Rufin et al., 2020). As Sentinel-2 bands at 20 m spatial resolution, i.e. band 5 (B5), band 6 (B6), band 7 (B7), band 8A (B8A), band 11 (B11) and band 12 (B12), were processed by FORCE at 10 m by dividing each pixel into four sub-pixels, we resampled BOA reflectance values to the original 20 m grid using the cubic convolution method. For the Quality Assurance Information (QAI) masks resampling, we applied a conservative approach based on the "presence" rule proposed by

Table 1
Information regarding fire, landform, climate and forest cover properties of the burnt areas. For spatial data we report the average ± standard deviation. Column abbreviations are as follows: Ba = Burnt area; Ex = Extent; Sd = Start date; Ed = End date; El = Elevation; Te = Mean annual temperature; Pr = Mean annual precipitation; Br = Broadleaved tree cover; Co = Coniferous tree cover; Nf = Non-forest cover; Dt = Dominant tree species. Tree species abbreviations are: Eb = European beech; Sc = Sweet chestnut; Do = Downy oak; Mb = Mixed broadleaves; El = European larch; Sp = Scots pine.

Fig. 2.
Flowchart of the analyses performed in the current study, grouped into fire severity and burn severity mapping steps. Claverie et al. (2018): for each bit, in the case one of the four 10 m pixels within each original 20 m was equal to one, all four pixels were set to one. Additional processing operations included spatial subsetting based on buffers around wildfire perimeters and invalid pixels masking, i.e. those covered by clouds, cloud shadows, snow and saturated pixels using the relevant bits in the QAI mask. We processed raster data using the R (R Core Team, 2021) package "raster" (Hijmans, 2020).

Field data
We employed the Composite Burn Index (CBI) protocol (Key and Benson, 2006) to classify and validate bi-temporal indices derived for burn severity mapping (Section 2.6). CBI scores rely on ocular estimations of different factors grouped into five vertical strata: three in the understory and two in the overstory vegetation (Key and Benson, 2006). The final CBI score assigned to each plot assumed values in the interval [0, 3], where zero corresponded to an unaltered condition and three to the maximum degree of fire induced changes. The CBI sampling design was based on fire severity maps (Section 2.5) and forest types. In particular, a stratified random scheme for plot selection allowed us to collect CBI data covering the whole range of its values within the dominant tree species. We located 20 m circular CBI plots at a minimum distance of 60 m from each other and at the centre of 3 × 3 windows of 20 m pixels characterised by a low variability in dNBR (Key and Benson, 2006). We placed centroids of CBI plots close to those of Sentinel-2 pixels using a survey-grade Trimble R2 GNSS antenna with sub-meter geolocation accuracy and performed differential correction of these locations. We assigned CBI scores with the supervision of the same person in order to minimise the variability introduced by different evaluators (Miller et al., 2009). We surveyed a total of 251 CBI plots from June to September 2018, distributed among the burnt areas as reported in Table S3. We classified CBI values into three severity categories: unchanged to low (CBI ≤ 1.25), moderate (CBI > 1.25 and ≤ 2.25) and high (CBI > 2.25) as proposed in other studies (Miller et al., 2009;Soverel et al., 2010). The distribution of CBI plots among severity categories was as follows: 51% in the unchanged to low, 33.5% in the moderate, and 15.5% in the high. In some areas, we were able to survey only a limited number of plots, i.e. less than 10, due to steep slopes and the lack of hiking trails for safely reaching the burned patches.

Fire severity
We mapped fire severity patterns using Sentinel-2 data acquired between September and November 2017 (Table S1). We computed preand post-fire NBR (Eq. (1)) using the near infrared (NIR) narrow band (MSI B8A) and the second shortwave infrared (SWIR) band (MSI B12) to derive dNBR (Table 2). We calibrated dNBR by subtracting the dNBR offset, which was the average dNBR of unburnt pixels close to each fire perimeter, located within a 200 m outer buffer. We employed a relatively narrow buffer to minimise differences between burnt and unburnt vegetation characteristics, as suggested by Parks et al. (2018). Furthermore, inside this region, we selected only pixels mainly covered by tree canopies, e.g. more than two-thirds, to limit the influence of phenological mismatches caused by other vegetation types. In particular, we performed the selection according to the Tree Cover Density map produced by the Copernicus Land Monitoring Service in 2015 at 20 m spatial resolution (European Environmental Agency, 2018b). We finally classified dNBR using thresholds proposed by Key and Benson (2006).

Burn severity
We mapped burn severity using bi-temporal indices based on NBR (dNBR, RdNBR, RBR) ( Table 2) and performed the calibration through the dNBR offset. We employed either paired images acquired during the growing seasons of 2017 and 2018 (Table S2) or reflectance composites generated using all the clear observations available in these periods. Specifically, we defined a compositing period spanning from 20 May to 10 September as this date range falls within the growing season in the burnt areas. The phenology-based weight used in the compositing algorithm (Section 2.7.1.) generally assumed minimum values, e.g. 0.01, at the start and end dates of this period.
For each burnt area, we obtained paired images by selecting pre-and post-fire images that were cloud-free, i.e. with a percentage of valid pixels ≥ 95, and with matched acquisition dates, i.e. closest DOY as possible. If multiple paired images were available, we chose those closer to the long-term peak of the growing season, computed as described in Section 2.7.1.

Sentinel-2 reflectance composites
We produced reflectance composites using all the MSI bands acquired at 20 m spatial resolution and the weighted geometric median, also known as the Fermat-Weber location problem (Brimberg, 1995;Cohen et al., 2016;Vardi and Zhang, 2000). The weighted geometric median is an estimator of centrality for multivariate data based on the weighted sum of Euclidean distances to all the observations rather than on their sum, as in the case of the geometric median (Chaudhuri, 1996). It is effective for the generation of noise-reduced, cloud-free reflectance composites (Roberts et al., 2019).
At the pixel level, we considered a n × p matrix containing n repeated observations in rows and reflectance values relative to p spectral bands in columns. We assigned a weight to each observation based on a specific weighting system. This procedure, iterated over all the pixels forming the image, can be formally expressed as follows. Given n observations y i ∈ ℝ p with associated weights w i ∈ ℝ >0 , i = 1,…,n, the weighted geometric median is the vector m n ∈ ℝ p that minimises the weighted sum of Euclidean distances from the n observations: where ‖•‖ denotes the Euclidean norm. If all the weights are equal, i.e. w i = 1/n, m n becomes the geometric median (Chaouch and Goga, 2012;Chaudhuri, 1996;Small, 1990). When observations are not collinear, e. g. they lie in two or more dimensions, the (weighted) geometric median always exists and is unique (Milasevic and Ducharme, 1987). The robustness properties of the weighted geometric median differ from those of the geometric median (Nevalainen et al., 2007), which is unaffected by outlying values if their proportion remains under the breakdown point corresponding to half of the observations (Lopuhaa and Rousseeuw, 1991;Oja, 2013). In particular, the robustness of the estimator is influenced mostly by the weights assigned to invalid observations, e.g. contaminated by clouds, as it depends on the smallest set of observations whose weights sum up to at least half of all weights (Nevalainen et al., 2007). If only two clear observations are available, the weighted geometric median corresponds to the observation with the highest weight, providing no resistance to potential invalid values. As mountainous areas are characterised by frequent cloud and snow cover, the number of clear observations during certain growing seasons can be equal to two, so an adaptive time window for selecting more clear observations can mitigate data scarcity. During the compositing process, the width of the time window iteratively widened up to 20 days for each side until the number of clear observations for a given pixel location reached a minimum of three. We computed the weighted geometric median using Weiszfeld's iterative algorithm (Vardi and Zhang, 2000;Weiszfeld, 1937) implemented in the R (R Core Team, 2021) package "Gmedian" (Cardot, 2020), which uses efficient linear algebra libraries based on C++ programming language.

Weighting system
Our pixel-based weighting system prioritised clear observations acquired during the greenup phase of the growing season and close to the date of peak while it penalised those lying nearby clouds or cloud shadows. Hence, the total weight w i of the i-th observation was the sum of a phenology-based weight (w p ) with a distance-based weight (w d ), and each of them assumed values in the interval [0, 1]. We computed w p through long-term land surface phenology (LSP) metrics representing the DOY at which transitions between phenological phases typically occur at a given location. Long-term LSP metrics allowed us to overcome two major limitations related to annual data. First, post-fire LSP metrics likely deviated from pre-fire ones due to changes in vegetation cover. Second, persistent snow and cloud cover occurring in mountainous areas during certain years prevent the computation of w p on a yearly basis. We derived long-term LSP metrics from the Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 6 Land Cover Dynamics Product (MCD12Q2) (Friedl et al., 2019). The MCD12Q2 product is available at 500 m spatial resolution from 2001 until 2017 and it is based on the 2band Enhanced Vegetation Index (EVI2). Specifically, we computed the pixel-wise geometric median throughout the 17-year time series of Maturity (p 1 ), Peak (p 2 ) and Senescence (p 3 ) metrics. These correspond to the dates when EVI2 first crosses 90% of its amplitude, reaches its amplitude and last crosses 90% of its amplitude, respectively. The robustness of the geometric median limited the influence of ephemeral changes for the considered time interval induced by land-use/land-cover or annual climate anomalies. We discarded pixels flagged as poor quality retrievals in the "QA_Detailed" layer of the MCD12Q2 product to limit the proportion of contaminated data. To cope with the different spatial resolutions of the MCD12Q2 product and Sentinel-2 data, we first resampled LSP metrics to 20 m using the nearest neighbour method, i.e. we divided MODIS pixels into sub-pixels matching the grid and resolution of Sentinel-2 data. We then applied a Gaussian filter with a kernel having sigma equal to 250 m and a width corresponding to one kilometre to eliminate the edges of the 500 m pixels while retaining the effects of local climate variability associated with topography on LSP metrics. Following Frantz et al. (2017), we derived w p for a given pixel location by adapting an asymmetrical Gaussian curve to each triplet of LSP metrics (p 1 , p 2 and p 3 ) (Fig. 3). This procedure involved first computing the parameters σ l and σ r that determine the width of the curve at each tail: We then computed w p by evaluating the Gaussian function at the DOY corresponding to the acquisition date of the i-th observation (parameter D i in Eq. (4)): The parameter c in Eq. (4) further controlled the width of the Gaussian curve and we set it to 0.2 such that w p is equal to 0.45 at p 1 and p 3 (Fig. 3).
The distance-based weight, w d , was aimed at reducing the influence of potential invalid reflectance values lying close to detected clouds or cloud shadows. Moreover, it limited patchiness in reflectance composites caused by the edges resulting from the removal of invalid pixels in Sentinel-2 images. This was particularly relevant when a low number of clear observations was available for a given pixel location. We computed w d through a sigmoid function previously employed as a scoring function for evaluating the distance to clouds in several best pixel selection algorithms (Frantz et al., 2017;Griffiths et al., 2019Griffiths et al., , 2013: where ED max is the maximum Euclidean distance to invalid pixels at which w d can assume values lower than one and ED i is the Euclidean distance of the i-th observation (Fig. 4). Our experiments showed that a distance of 200 m, equivalent to 10 Sentinel-2 pixels, was adequate to effectively reduce patchiness. Finally, we normalised the weights, w 1 ,…,w n , computed for all the observations available at each pixel location using the softmax function, employed with the weighted geometric median by Roberts et al. (2019) and defined as: The softmax transformation converts each component of a vector of  real numbers into values comprised in the interval [0, 1] that sum to one (Bishop, 2006). Therefore, all the transformed weights can be interpreted as probabilities proportional to the exponentials of the unnormalised weights. This transformation increased the contrast in importance between the best and worst observation.

Evaluation of long-term LSP metrics and radiometric consistency of reflectance composites
Elevation emerged as a critical driver in mountainous areas controlling long-term LSP metrics derived either from MODIS (Hwang et al., 2011;Xie et al., 2017) or Landsat (Elmore et al., 2012) data. Hence we evaluated spatial patterns of our long-term LSP metrics within the burnt areas through a correlation analysis with elevation using the Pearson correlation coefficient. We obtained elevation data at 20 m spatial resolution by averaging values of the digital terrain model available for the Piedmont Region at 5 m spatial resolution.
We assessed radiometric consistency in time of reflectance composites by evaluating the symmetrical linear relationship between pre-and post-fire values at each MSI 20 m band and the derived NBR values using the reduced major axis (RMA) regression method (Roy et al., 2016;Smith, 2009). The RMA regression is adopted when both dependent and independent variables contain measurement errors, as in the case of surface reflectance values from satellite imagery. The similarity between the symmetrical linear relationship obtained with NBR values computed either with paired images or reflectance composites allowed us to assess whether the weighted geometric median preserved the spectral relationships between the MSI bands B8A and B12. We performed these evaluations on a set of one million pixels randomly selected outside the fire perimeters within a maximum distance of three kilometres. We quantified differences between pre-and post-fire reflectance and NBR values using the root mean square error (RMSE). We assessed the significance of the linear relationships through the p-value associated with the F-test performed with ordinary least squares (OLS) fits, using pre-fire values as the independent variable and post-fire values as the dependent variable.

Classification and validation of burn severity maps
Following the nonlinear regression models proposed by different authors (Miller et al., 2009;Miller and Thode, 2007;Parks et al., 2014), we predicted thresholds of each bi-temporal index that discriminated between field severity categories (unchanged to low, moderate and high) through the inversion of the following equation: where y corresponds to values of either dNBR, RdNBR or RBR. CBI plots for some burnt areas were either limited in the total amount or the range of values, so we used all the plots (n = 251) to build a combined regression model for each bi-temporal index. We note that vegetation characteristics varied among plots due to the presence of stands dominated by either conifers or broadleaves. We assessed the predictive performance of the regression models by averaging values of the coefficient of determination (R 2 ) and RMSE obtained from 5-fold cross-validation repeated 1000 times. We sampled values of the three bi-temporal indices at each CBI plot location without employing any data interpolation method, e.g. bilinear interpolation (Cansler and McKenzie, 2012;Parks et al., 2014), as this approach did not improve the predictive performance of the regression models.
Using CBI values as reference, we built an error matrix by pooling reference and classification data of all the burnt areas. We then computed classification accuracy (producer's, user's and overall accuracy) and Cohen's Kappa statistic. Lastly, we performed the exact McNemar test implemented in the R package "exact2x2" (Fay, 2010) to evaluate whether differences in overall classification accuracy between burn severity maps produced using paired images or reflectance composites were statistically significant.

Reflectance composites
The number of pre-and post-fire clear observations available at the pixel level within the compositing period (20 May -10 September) varied considerably inside each burnt area, between burnt areas and according to the acquisition year (Fig. 5). Clear observations available at the majority of pixel locations were more than three in the pre-and postfire compositing period while cloud-free images were fewer than three for some burnt areas (Fig. S1). Sentinel-2 data was generally more abundant in 2018 than 2017. The overlapping zone between tiles 32TLR and 32TLQ greatly increased the number of clear observations at burnt area 7.
Spatial patterns of the long-term Maturity (p 1 ) and Peak (p 2 ) were strongly and positively correlated with local variations of elevation, as highlighted by the Pearson correlation coefficient that varied across the burnt areas from 0.79 to 0.97 for p 1 and from 0.66 to 0.95 for p 2 (Table S4). Conversely, the association between long-term senescence (p 3 ) and elevation was generally lower compared to p 1 and p 2 and varied considerably according to the burnt area (Table S4).
Reflectance composites exhibited radiometric consistency in space as values between neighbouring pixels were similar, though they were computed using a different number of observations (Fig. 6). Radiometric consistency in time was highlighted by similar reflectance values between pre-and post-fire unburnt pixels (Figs. 6, 7). The RMA regression lines between pairs of unburned synthetic reflectance values were close to the 1:1 line for all the MSI bands acquired at 20 m and R 2 values associated with the OLS fit were moderate to high (0.69-0.84). RMSE ranged from 0.014 (B5) to 0.045 (B7) and the OLS regression was always highly significant (p < 0.001) (Fig. 7). Synthetic reflectance for MSI bands in the SWIR wavelengths (B11 and B12) was higher in 2017 (xaxis) than in 2018 (y-axis), as highlighted by the slope of the RMA regression line (Fig. 7e-f).
The weighted geometric median preserved the spectral relationships across MSI bands, as displayed by pre-and post-fire NBR values of unburnt pixels (Fig. 8b) and synthetic reflectance values of burnt pixels (Fig. 9). In particular, the RMA linear regression between pre-and postfire NBR derived from reflectance composites was very similar to that obtained with paired images (Fig. 8). Pre-fire synthetic reflectance values for MSI bands B6, B7 and B8A were generally higher than most of those assumed by clear observations (Fig. 9).

Regression models
The average R 2 from the repeated 5-fold cross-validation procedure highlighted a slightly higher predictive performance of nonlinear regression models built with bi-temporal indices derived from reflectance composites than those computed with paired images (Table 3). However, differences in R 2 were limited (≤ 0.008) for all bi-temporal indices. Considering the same bi-temporal index, the average RMSE was always higher for reflectance composites compared to paired images (Table 3). The calibration of bi-temporal indices with the dNBR offset provided a minor increase in the predictive performance of the regression models (Table 3). Regression models built using the complete set of CBI plots and uncalibrated bi-temporal indices are displayed in Fig. 10. Predicted thresholds of bi-temporal indices discriminating between burn severity categories were higher using reflectance composites compared to paired images (Table S5).

Burn severity maps
Burn severity maps achieved overall accuracies ranging from 76.9% to 83.7% (Kappa between 0.61 and 0.72) and these were consistently higher for reflectance composites compared to paired images (Table 4). Paired images performed better with calibrated bi-temporal indices and among them, RdNBR provided the highest overall classification accuracy (80.5%, Kappa 0.67). Conversely, reflectance composites attained the highest overall classification accuracy (83.7%, Kappa 0.72) with uncalibrated RdNBR (Table 4). Among burn severity categories, the moderate one achieved the lowest user's and producer's accuracy.
The calibration of bi-temporal indices through the dNBR offset improved the overall accuracy of burn severity maps derived from paired images by 2% (RdNBR and dNBR) and 1.6% (RBR). On the contrary, the calibration of bi-temporal indices derived from reflectance composites reduced the overall accuracy by 1.2% (dNBR) and 2% (RdNBR and RBR). Notably, changes in producer's and user's accuracy induced by calibration were mostly limited to the unchanged to low and moderate categories.
Burn severity maps obtained from uncalibrated bi-temporal indices highlighted significant differences in overall classification accuracy (p < 0.05) between paired images and reflectance composites (Table 5), particularly for the RdNBR (p < 0.01). Differences in overall classification accuracy of burn severity maps from calibrated bi-temporal indices produced using paired images or reflectance composites were not statistically significant (Table 5).
We compared burn severity maps obtained with calibrated RdNBR from paired images and uncalibrated RdNBR from reflectance composites. Visual inspection of the spatial patterns of burn severity patches revealed good agreement between maps obtained using the two approaches above (Fig. 11), though differences in the extent of burn severity categories were noticeable (Fig. 12). To account for missing pixels in burn severity maps derived from paired images (Fig. 11a), we removed these pixels also in maps obtained with reflectance composites. In some areas, the extent of the moderate and high burn severity categories was greater in maps derived from paired images compared to those obtained with reflectance composites by a total of 198.6 ha (6.3%) and 47.5 ha (11.4%), respectively. Conversely, where the extent of the moderate and high severity categories was lower in maps obtained with paired images than in those produced with reflectance composites, differences amounted to 290.9 ha (− 9.2%) and 125.6 ha (− 30.1%). Overall, the extent of the moderate and high severity categories was lower in burn severity maps obtained with paired images compared to those derived with reflectance composites by 93.2 ha (− 2.9%) and 78.1 ha (− 18.7%), respectively.

Reflectance composites
The 5-days revisit time offered by the Sentinel-2 mission since both S2A and S2B satellites have been in orbit, provided a considerable amount of clear observations within the compositing period, i.e. more than 10 observations for many pixel locations. The lower number of clear observations available in the 2017 growing season compared to that of 2018 (Fig. 5, Fig. S1) was caused by the availability of S2B images only since July 2017, after the completion of the ramp-up phase (Sudmanns et al., 2019).
Relying on long-term LSP metrics, our weighting system ignores inter-annual differences in plant phenology arising from different amounts of precipitation over the years, as in the case of the severe summer drought in 2017 (Rita et al., 2020). Divergent plant phenology between years has the potential to generate non-fire induced spectral changes (Veraverbeke et al., 2010a;Verbyla et al., 2008). Reflectance composites showed greater overall stability of NBR values across the growing seasons of 2017 and 2018 in unburnt pixels compared to paired images (Fig. 8). This suggests a limited impact of non-fire induced interannual variations on NBR values derived from reflectance composites. However, non-fire induced inter-annual differences in synthetic spectral values were noticeable in the SWIR wavelengths (MSI B11 and B12) and limited in the NIR narrow band (MSI B8A) (Fig. 7d-f). The slope of the RMA fit lower than one in B11 and B12 is likely associated with low moisture conditions in plants and substrates during the pre-fire growing season of 2017 as a consequence of the severe drought. Pre-fire moisture content has been identified as the most critical variable for improving regression models between dNBR and CBI data (Soverel et al., 2011), suggesting its relevance in determining non-fire induced inter-annual changes at the spectral level.
Linear and positive correlation between spatial patterns of long-term LSP metrics relative to the greenup phase of the growing season and patterns of topography (Table S4) has been observed in mountainous landscapes. This is associated with the influence of elevation on temperature and snow cover, which are both dominant factors controlling spring onset at high altitudes (Elmore et al., 2012;Hwang et al., 2011;Xie et al., 2017). Limiting the influence of ephemeral land-cover changes and climate anomalies on the phenology-based weight improved the suitability of our weighting system for producing consistent reflectance composites throughout the years. Moreover, the coarse spatial scale at which long-term LSP metrics are implemented in the weighting system prevented the presence of artefacts in reflectance composites due to spatial discontinuities in LSP metrics associated with transitions between different land covers.
Spatial heterogeneity associated with the number of clear  Table 1. observations had a negligible influence on the radiometric consistency in space of reflectance composites (Fig. 6). The weighted geometric median produced synthetic reflectance values that were statistically representative of the compositing period and preserved spectral relationships across bands (Roberts et al., 2019). Because of the two properties above, the (weighted) geometric median is defined as a central and global indicator of the data distribution (Chaouch and Goga, 2012). Hence, synthetic reflectance values obtained with the weighted geometric median are suitable for the computation of spectral indices (Roberts et al., 2019. For missed invalid pixels that can occur using detection algorithms for clouds and cloud shadows currently available for Sentinel-2 imagery (Tarrio et al., 2020), the robustness of the weighted geometric median is considerably superior to that of the weighted mean (Roberts et al., 2019).

Burn severity mapping
When mapping burn severity is the primary aim, the exploitation of the weighted geometric median integrating long-term LSP metrics offers a threefold improvement over the usage of paired images. First, it provides reflectance data for the whole burnt area with no or minimal presence of outlying values. Second, it generates reflectance values representative of the optimal seasonal timing at every pixel location. Third, spectral matching between pre-and post-fire conditions is promoted by the weighting system anchored to fixed DOYs corresponding to the long-term LSP metrics.
Reflectance composites marginally increased the predictive performance of nonlinear regression models compared to paired images (Table 3). Results obtained with NBR mean composites were either similar (Parks et al., 2018) or opposite (Whitman et al., 2020) to ours. The exponential growth function correctly approximated the nonlinear asymptotic relationship between CBI and bi-temporal indices (Lentile et al., 2009) since values of these latter are not constrained to complete vegetation mortality (Miller et al., 2009). Mixing CBI plots dominated by either coniferous or broadleaved species did not prevent nonlinear regression models from achieving high predictive performance (R 2 > 0.85). Moreover, the extensive preprocessing of Sentinel-2 images and the thoroughly collected CBI data likely enhanced the relationship between CBI data and bi-temporal indices.
The overall classification accuracy of burn severity maps derived from paired images was higher (approximately 6-7% differences) compared to similar studies using Landsat (Parks et al., 2018) or Sentinel-2 imagery (Mallinis et al., 2017). Reflectance composites outperformed NBR mean composites derived from Landsat imagery (Parks et al., 2018), increasing the overall accuracy of burn severity maps by between 8-10% depending on the bi-temporal index. User's and producer's accuracies of the high burn severity category (> 95%) provided by uncalibrated RdNBR derived from reflectance composites were comparable to results obtained with the random forest classifier for the crown consumption class in fire severity maps Gibson et al., 2020). Improvements in the overall classification accuracy offered by reflectance composites over paired images were maximal and statistically significant for uncalibrated bi-temporal indices (Table 5). Similarly, Parks et al. (2018) obtained significant improvements in the overall classification accuracy using uncalibrated RdNBR derived from NBR mean composites compared to paired images.
Our burn severity maps discriminated the moderate severity with a relatively low accuracy compared to the other categories (Table 4). Similar results have previously been reported when inferring burn severity from remotely sensed multispectral data (Miller et al., 2009;Soverel et al., 2010). In non-stand replacing fires, the assessment of mixed fire effects mostly impacting the understory vegetation through multispectral data is challenging as healthy and relatively dense tree canopy cover can mask fire effects (De Santis and Chuvieco, 2007). Recently, Yin et al. (2020) integrated tree canopy cover information in the parameterisation and backward inversion of a radiative transfer model to limit the underestimation of the moderate and high burn severity.
The calibration of bi-temporal indices with the dNBR offset reduced the classification accuracy of burn severity maps derived from reflectance composites (Table 4). In fact, the radiometric consistency in time between pre-and post-fire reflectance composites likely limited the utility of the dNBR offset. Drawbacks of using the dNBR offset are related to the selection of an appropriate set of unburnt pixels, as they should be representative of the pre-fire vegetation in the burnt area (Parks et al., 2018;Picotte et al., 2020). Therefore, deriving the dNBR offset for large burnt areas characterised by a heterogeneous pre-fire vegetation can be subjective and prone to error. Our compositing method can improve the operational usage of bi-temporal indices by eliminating the need for further calibration.
The increase in the accuracy of burn severity maps provided by reflectance composites over paired images appeared to be related to higher thresholds of bi-temporal indices discriminating between burn severity categories (Table S5). In fact, a large range of values within each burn severity category favoured the discrimination capability of bitemporal indices (Lentile et al., 2009). The priority given to observations close to the long-term peak of phenology enhanced the spectral contrast between healthy (pre-fire) and burnt vegetation, e.g. in the NIR region (Fig. 9). A steady decrease in canopy greenness has been observed in deciduous forests during the greendown phase and has been implemented when modelling phenological transitions in forests using optical remote sensing data (Elmore et al., 2012;Klosterman et al., 2014;Melaas et al., 2013). In particular, the decline of reflectance in the nearinfrared wavelengths throughout the summer months in deciduous forests (Jenkins et al., 2007) likely affects the temporal pattern of NBR mostly during the pre-fire growing season. Conversely, during the first post-fire growing season, the detection of fire induced changes can be heavily influenced by regeneration processes, particularly in quickly recovering ecosystems such as those of the Mediterranean basin (Veraverbeke et al., 2010a). In our burnt areas, the recovery of the herbaceous layer in wood-pastures and the abundant sprouting from stumps in broadleaved forests increasingly reduced the spectral contrast between pre-and post-fire conditions over time.
The optimisation of the seasonal timing in reflectance composites also increased the overall extent of the moderate and high burn severity categories (Fig. 12). An underestimation of burn severity likely occurred when using paired images acquired during late summer (Fig. S1) due to Fig. 8. Scatter plots of pre-fire (x-axis) and post-fire (y-axis) NBR values randomly sampled outside the fire perimeters (n = 1,000,000). NBR was computed using either (a) paired images or (b) reflectance composites. The dashed black line is the 1:1 line, while the solid red line represents the RMA regression line. The coefficient of determination (R 2 ) and p-value are derived from the OLS regression. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. 9. Pre-fire (green) and post-fire (red) surface reflectance values of three pixels in the burnt area 2 for each of the MSI bands acquired at 20 m (B5, B6, B7, B8A, B11, B12). Each pixel was located within a different burn severity category: (a) unchanged to low, (b) moderate, and (c) high. Light green and light red lines represent pre-and post-fire reflectance values of Sentinel-2 images acquired from 20 May to 10 September, whereas synthetic reflectance values are displayed in dark colours. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Table 3 Average values of the coefficient of determination (R 2 ) and RMSE obtained from a repeated 5-fold cross-validation. We built regression models between bitemporal indices (dNBR, RdNBR and RBR) computed using either paired images or reflectance composites and CBI. We either calibrated or not bi-temporal indices through the dNBR offset.  Fig. 10. Nonlinear regression models built using CBI field-data (x-axis, n = 251) and uncalibrated bi-temporal indices (y-axis) derived either from paired images (ac) or reflectance composites (d-f).

Table 4
Accuracy of burn severity maps obtained using different bi-temporal indices (either uncalibrated or calibrated with the dNBR offset) derived from paired images or reflectance composites. User's Accuracy (UA), Producer's Accuracy (PA) and Overall Accuracy (OA) are expressed as percentages. Cohen's Kappa (K) ranges from − 1 to 1. the weak contrast between pre-and post-fire spectral conditions, as observed by other authors (Chen et al., 2020;Key, 2006;Veraverbeke et al., 2010a). On the contrary, the slight decrease in the extent of the moderate and high burn severity, e.g. in burnt area 2 and 3, when using reflectance composites was probably caused by higher thresholds of bitemporal indices than those derived from paired images (Table S5). Further investigation is needed to determine the applicability of our approach in certain fire-prone ecosystems where fire severity is of primary interest due to rapid vegetation regrowth (Parker et al., 2015;Picotte and Robertson, 2011;Veraverbeke et al., 2010a). As fire severity is assessed within a few weeks following the fire (Key, 2006), reflectance composites should be produced using a relatively short compositing period which would benefit from multispectral data with a high temporal resolution. In this sense, the Harmonized Landsat Sentinel-2 (HLS) dataset (Claverie et al., 2018) provides medium-resolution multispectral data with a relatively high temporal resolution (up to 2-3 days).
Though remote sensing indices are widely used for mapping burn severity (Morgan et al., 2014), they typically make use of only two or three spectral bands, e.g. NBR, Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI). Other methods based on optical remote sensing data make use of all the spectral bands available and proved to be effective in burn severity estimation (Morgan et al., 2014;Yin et al., 2020). Machine learning classifiers, e.g. random forest Gibson et al., 2020), radiative transfer models (Yin et al., 2020) and spectral mixture analysis (Quintano et al., 2017) are among those approaches that could benefit from exploiting the full spectral information provided by reflectance composites.

Conclusions
In this study, we presented a compositing approach that offers the possibility to overcome some of the major limitations hindering burn severity mapping through bi-temporal indices derived from multispectral data. We highlighted that temporal constraints in the selection of appropriate paired images can significantly affect burn severity maps. Our approach provides new opportunities for operational burn severity mapping in areas characterised by persistent cloud cover, such as mountainous landscapes. Moreover, phenologically coherent Table 5 Results from the exact McNemar test relative to differences in overall classification accuracy of burn severity maps derived from paired images or reflectance composites. P-value and 95% confidence interval (CI) refer to the significance and magnitude of difference in overall classification accuracy, respectively.  Fig. 11. Burn severity maps of area 1 (a, c) and area 2 (b, d) obtained with calibrated RdNBR computed using paired images (a, b) or uncalibrated RdNBR from reflectance composites (c, d). The black line indicates the official fire perimeters.
reflectance composites provide a standardised approach for mapping burn severity and avoid further processing to mitigate spectral mismatches. Among the advantages the proposed compositing algorithm provides, there is also its transferability to other optical sensors and multi-sensor data, such as surface reflectance provided by the HLS dataset (Claverie et al., 2018). In the context of forest ecology and forest disturbances mapping, several change detection techniques and classification algorithms could benefit from the use of phenologically coherent reflectance composites, thus being of broad interest to forest managers and researchers.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Fig. 12. Extent of burn severity categories (unchanged to low, moderate and high) in each burnt area derived from maps obtained with calibrated RdNBR computed using paired images or uncalibrated RdNBR from reflectance composites.