Harvested area did not increase abruptly—how advancements in satellite-based mapping led to erroneous conclusions

Using satellite-based maps, Ceccherini et al. (Nature 583:72-77, 2020) report abruptly increasing harvested area estimates in several EU countries beginning in 2015. Using more than 120,000 National Forest Inventory observations to analyze the satellite-based map, we show that it is not harvested area but the map’s ability to detect harvested areas that abruptly increases after 2015 in Finland and Sweden.


Introduction
Using satellite-based maps, Ceccherini et al. (2020) report abruptly increasing harvested area estimates in several EU countries beginning in 2015. They identify Finland and Sweden as countries with the largest harvest increases and the biggest potential effect on the EU's climate policy strategy. In a response to comments Wernick et al. 2021) regarding the original study, Ceccherini et al. (2021) reduce their estimates markedly but generally maintain their conclusion that harvested area increased abruptly. Using more than 120,000 field reference observations to analyze the satellite-based map employed by Ceccherini et al. (2020), we confirm the hypothesis by Palahí et al. (2021) that it is not a harvested area but the map's ability to detect harvested areas that abruptly increases after 2015. While the abrupt detected increase in harvest is an artifact, Ceccherini et al. (2020) interpret this difference as an indicator of increasing intensity in forest management and harvesting practice. Ceccherini et al. (2020) use satellite-based Global Forest Change (GFC) (Hansen et al. 2013) data to estimate the yearly harvest area in each of 26 EU states over the period 2004 to 2018. They claim that an increase of harvested areas will impede the EU's forest-related climate-change mitigation strategy, triggering additional required efforts in other sectors to reach the EU climate neutrality target by 2050.

Discussion
In their response to comments, Ceccherini et al. (2021) carry out a stratified estimate of harvested area for the combined area of Finland and Sweden with more than 5000 visually classified reference points based on manual interpretation, using high-resolution aerial images and Landsat data. They compare the time periods 2011-2015 and 2016-2018 to find a 35% increase in harvested area in the second period which is a considerable reduction compared to the original article, where a 54% and 36% increase was reported for Finland and Sweden, respectively. Although this approach is more robust than the "pixel counting" ) of the original article, as can be seen below, this is still a gross overestimation of the change in harvested area. The main issue is the use of Landsat to determine the timing of forest cover losses. Because Landsat became more sensitive in detecting forest cover loss over time, many losses that occurred in or before the first period are thus detected in the second period. This causes errors in the reference data which propagate in the reported estimate. Moreover, Landsat provides the primary data input for GFC, which results in circular reasoning when using Landsat as reference data for GFC. In other words, Landsat cannot be used to validate a Landsat-based product.
Furthermore, Ceccherini et al.'s (2021) argument that abrupt changes in harvested area were not observed in all countries and therefore cannot be caused by data artifacts is inappropriate because the algorithms used to create the GFC map are, due to the modeling procedure applied, inherently non-linear (Hansen et al. 2013). Unexpected changes can therefore happen in some regions but not in others.
Finally, Ceccherini et al. (2021) claim inconsistencies in GFC were unknown. Though inconsistencies in GFC's time series have previously been reported (Rossi et al. 2019;Galiatsatos et al. 2020), this may indeed not have been well-known. However, it is a well-established fact that Earth observation data and related products can be unreliable and inconsistent (McRoberts 2011;Olofsson et al. 2014). Important interpretations and decisions should therefore not be based on "pixel counting" estimates.
We employ more than 120,000 field observations from repeated measurements in 44,000 sample plots from the Finnish and Swedish national forest inventories (NFIs) as reference data in statistically rigorous estimators in order to analyze the accuracy of Ceccherini et al.'s (2020) findings (see Appendix and our dataset (Breidenbach et al. 2021b)). We find that GFC's ability to detect harvested areas and thinnings abruptly increases after 2015 (Fig. 1). When the ability to detect harvest improves, the overall harvested area in GFC will increase, even without a real change in management activity. As a result, more harvested areas and thinnings were detected by GFC after 2015, and this explains why the "harvested area" reported by Ceccherini et al. (2020) abruptly increases. In other words, the reported abrupt increase in harvest is to a large degree simply a technical artifact (bias) caused by the advancement of GFC over time. Ceccherini et al.'s (2020) conclusions, however, are the product and direct consequence of an inconsistent time series and are thus both incorrect and misleading.
Assuming the average proportion of correctly identified harvested areas before 2015 also applies after 2015, the GFC area after 2015 can be modeled without this increasing sensitivity. This indicates that the GFC recorded increases in "harvested area" of 54% and 36% in Finland and Sweden, reported by Ceccherini et al. (2020) represents an overestimate of 188% and 851%, respectively, compared to our reference data (Fig. 2). Because this modeled area still includes commission error, thinnings, and other harvests, additional calculations would be required to provide improved estimates of the actual harvested area change (Rossi et al. 2019). We further highlight that Ceccherini et al.'s (2021) more recent findings do not in any way alter or affect these basic, validated findings.
In addition to generating harvested area estimates subject to systematic error, Ceccherini et al. (2020) do not provide any estimates of uncertainty and further assume all biomass in their mapped harvested areas was in fact removed. Given that a considerable share of the harvested areas in the period 2016-2018 are thinnings and not final harvests (Fig. 2), the latter results in even larger errors with respect to C-losses. Ceccherini et al. (2020) likewise assume the biomass map they utilize is accurate and without uncertainty, which is unrealistic (Naesset et al. 2020). We focus on the problems related to the harvested area estimate in Ceccherini et al. (2020) as this is the most fundamental issue and is adequate for illustrating the erroneous conclusions drawn by the authors.
We acknowledge the strong desire for sound and independently verifiable monitoring strategies driven by their potential for supporting the promotion of forest-related climate benefits (Griscom et al. 2017;Bastin et al. 2019;Brancalion et al. 2019). Without this, much hesitation has accompanied interest in mobilizing forest resources behind the climate change mitigation challenge. Earth observation remote sensing (RS) and related mapping efforts embody the promise of providing very important tools for monitoring land use change, tropical deforestation, and forest restoration (Hansen et al. 2013;Baccini et al. 2017;Harris et al. 2021). As such, they likewise hold the promise of supporting efforts to better integrate forest resources into the framework of climate change mitigation strategies.
RS products, however, can be used in ways that potentially result in estimates subject to severe systematic error, as we have seen in this and other studies (Naesset et al. 2020). Ceccherini et al.'s (2020) claim that a 30% increase of harvested area in France corresponds with national statistics has been invalidated by Picard et al. (2021). These authors find that, despite the already very weak correlation between national statistics and Ceccherini et al.'s (2020) results, this "correlation" itself was caused by one single year that was heavily affected by a storm event. In reliable surveys, Picard et al. (2021) were unable to find signs of increased harvested area in France.
Because RS data measure reflections of electromagnetic waves (e.g., visual light in the case of optical satellites) rather than the direct object of interest such as forest cover loss and carbon stock, algorithmic models are required for interpreting these reflections. Models, however, are frequently imprecise tools (Box 1976) and generally require reference data to correct their data output and thereby provide unbiased estimates (Naesset et al. 2020;Breidenbach et al. 2021a). The compilation of RS data results in nice, colorful maps and scientificlooking figures further distract attention. The collection of the required reference data, on the other hand, is tedious, expensive, and their enormous importance not well understood (McRoberts 2011). Combining the GFC map with adequate reference data into reliable estimators can prove very useful for estimating harvested area and related C-stock losses, as illustrated in various studies (Turubanova et al. 2018;Rossi et al. 2019;Galiatsatos et al. 2020;Naesset et al. 2020;Breidenbach et al. 2021a).

Conclusion
We certainly agree with the authors that one of the more important elements of the Paris Agreement is to "achieve a balance between anthropogenic emissions by sources and removals by sinks of greenhouse gases in the second half of this century" (UN 2015). Based on the data at hand, however, it would be erroneous to lay blame for the failure to achieve these goals at the feet of the forestry sector. We nonetheless remain hopeful future debate over the role of the European forest sector will remain rooted in more scientific foundations. Certainly, the use of largescale open data in carbon monitoring and reporting, as Ceccherini et al. (2020) also (as well as many others before them) propose, represents the next great trend and should generally be applauded. However, strong systematic errors in estimated results clearly need to be avoided. This demonstrates why work of this kind should always be accompanied by rigorous collection of field observations and appropriate statistical estimates. Future work should therefore continue in the direction of further combining the use of large-scale, field-based sampling methods with remote sensing data resources.

Appendix
The Finnish NFI The Finnish NFI (Korhonen 2016) is a systematic nationwide cluster sampling survey composed of permanent and temporary clusters. In this study, only data from the permanent clusters were used. Since the 10th NFI (2004)(2005)(2006)(2007)(2008), the inventory is continuous with a 5-year cycle such that 20% of the clusters are measured in each year. Finland is divided into six regions denoted as strata, with decreasing sampling intensity towards the north. In two of these strata, the partly autonomous Åland islands and the low-productivity, northernmost Lapland region, the continuous design is not applied and all plots are measured in a single field season. Because of this inconsistency, compared to the vast majority of the NFI data, these two strata were not included in this analysis. The distance between the permanent clusters ranges from 12 to 20 km.
Each permanent cluster consists of 10-14 sample plots. Depending on the sampling stratum, a distance of 250 or 300 m separates adjacent plots. Each sample plot position is recorded with a high-precision Global Navigation Satellite Systems (GNSS) device. Until 2013, the plot design was restricted angle count sampling (ACS) with a basal area factor (BAF) of 2 and maximum radius of 12.52 m in southern Finland and a BAF of 1.5 and maximum radius 12.45 m in northern Finland. Since 2014, tree-level measurements have been carried out on concentric circular plots with radii of 9.00 and 5.56 m for trees with a diameter at breast height (dbh) ≥ 95 mm and ≥ 45 mm, respectively. Trees with a dbh < 45 mm are still sampled using ACS with a BAF of 1.5. As of 2019, the radius of the smaller circle was changed to 4.00 m. A large number of forest stand, site, and tree variables are assessed on each plot. The tree-level measurements are used to estimate stem volume and biomass. At reinventory, trees are re-measured and, if logged, harvested trees and time of logging are estimated and recorded. In this study, "logging-type" is defined as (1) final felling consisting of clear cutting, cutting for natural regeneration and cutting before deforestation, (2) thinning (first thinning and later thinnings), and (3) other harvests (removal of seed trees, salvage cutting tree removal along ditches and other locations). Time of logging is defined by harvest season, not calendar years, and the harvest season starts on the 1st of June.
For this study, the last calendar year of a harvest season determined the loss year, and forest cover losses have been assessed since 2008 using 33,846 observations from 15,565 permanent sample plots visited from 2009 to 2019. The NFI data used represent a total land area including wetlands of 27 Mha.
The Swedish NFI The Swedish NFI (Fridman et al. 2014) is a systematic nationwide cluster sampling survey composed of permanent and temporary clusters. In this study, only data from the permanent clusters were used. The inventory is continuous with a 5-year cycle such that 20% of the clusters are measured in each year. Sweden is divided into five strata, with decreasing sampling intensity towards the north. The distance between clusters ranges from 11 to 26 km.
Each permanent cluster consists of 4-8 sample plots. Depending on the sampling stratum, a distance of 300 to 1200 m separates adjacent plots. Each sample plot position is recorded with a hand-held GNSS device. A consistent plot design has been applied in the time period considered and tree-level measurements are carried out on concentric circular plots with radii of 10.0, 3.5, and 1.0 m for measurements of trees with a dbh ≥ 100 mm, ≥ 40 mm, and ≥ 0 mm dbh, respectively.
A large number of forest stand, site, and tree variables are assessed on each plot. The tree-level measurements are used to estimate stem volume and biomass. At reinventory, trees are re-measured and, if logged, volume loss, logging type and time of logging are estimated and recorded. In this study, "logging-type" is defined as (1) final felling consisting of clear cutting, cutting for natural regeneration and cutting before deforestation, (2) thinning (first thinning and later thinnings), or (3) other harvests (removal of seed trees, salvage cutting, other tree removal). Time of logging is defined by harvest seasons, not calendar years, where harvest season is defined as the time between the start of the vegetation period (between end of April and end of May, depending on region) in one calendar year to the start of the vegetation period in the next calendar year. The first three harvest seasons before the measurement of the plot are determined using this method and prior harvests are grouped into one harvest class.
For this study, the first calendar year of a harvest season determines the loss year and forest cover losses have been assessed since 2004 using 91,304 observations from 28,544 permanent sample plots visited from 2004 to 2019. The NFI data used represent all of Sweden; a total land area including wetlands of 45 Mha.

GFC data and determination of the loss year
We intersected the GFC map version 1.6 used by Ceccherini et al. (2020) with the center coordinates of the NFI plots. The GFC loss year, if available, was then attributed to the respective NFI period. Because the NFIbased loss year is estimated, we replaced the NFI loss year by the GFC loss year where both were observed for individual plots. We use the NFI plots to analyze which changes in the forest can be detected by GFC. In other words, we use the field observations as ground-truth to evaluate how well GFC captures harvests over time. Estimators The estimators and notation used here closely follow Breidenbach et al. (2021a) but deviate in important ways when it comes to the application. The estimators are repeated here for completeness and with minor adjustments for this context.
The estimates utilizing only NFI data are based on the basic expansion (BE) estimator, i.e., the sum of total estimates within each NFI stratum (region) where t represents the total of a variable of interest, the "^" identifies this as an estimate of a population parameter and h indexes the strata. Uncertainty can be estimated by the variance estimator The total within a stratum is estimated using n h clusters indexed by i within the sample of clusters s h located within the stratum. The design of the NFI clusters is fixed, resulting in single-stage cluster sampling. To simplify the notation and improve readability, we drop the subscript h indexing the strata using the estimators in this section where λ is the area of the stratum and y i is the mean over the variable of interest observed on m i plots of the ith NFI cluster. To estimate the population parameter of interest for a certain domain such as the area of final felling in a certain year, a domain indicator variable I d is used. This domain indicator is 1 if the plot belongs to the domain of interest and 0 otherwise such that where y ij is the observed value of the variable of interest on the jth plot of the ith cluster (Mandallaz 2008, p. 65). In the case of area estimation, y ij is an n-vector of ones. (In the case where other variables would be of interest such as carbon stocks, y ij is the observed carbon stock on the plot scaled to per-hectare values.) The number of plots m i is typically fixed within a stratum but can vary due to the irregular shape of the stratum. In other words, m i is the number of plots on land which usually is constant but can vary for clusters located close to the coast or along stratum borders.
To develop the variance estimator of the total, it is convenient to write the total estimator aŝ whereŶ is the mean over all plots irrespective of the cluster structure (Mandallaz 2008, p. 66). This is the ratio of two random variables because m i is not fixed. Therefore, variance is estimated aŝ where n is the number of observations (clusters), m ¼ 1 n P i∈s m i is the average number of plots per cluster (Mandallaz 2008, p. 68). The variance of the total is then estimated by multiplying the squared area of the stratum with the variance estimate of the mean We assume simple random sampling and accept that the variance estimates are likely conservative due to the systematic distribution of the clusters in the NFIs. Other options are possible (Magnussen et al. 2020) but will not generally change our case or conclusions.

Application of the estimators
The loss year, determined by GFC if available or otherwise determined by the field crews, was the primary domain of interest (d). All sample plots that covered a loss year were used for estimating the variables of interest. For example, for estimates of the domain of interest "final felling area for the loss year 2018," all sample plots measured in 2018 and 2019 were used and the indicator variable was set to 1 for sample plots with loss year 2018 and final felling was recorded based on the particular logging type. The indicator variable was set to 0 for all other plots. Because GFC information was not used in this estimate apart from adjustments to the felling year, we refer to this estimator ast NFI τ . Correspondingly, for estimating the area of final felling detected by GFC, the indicator variable was set to 1 for sample plots with the GFC-based loss year 2018 and final felling recorded as the logging type. The indicator variable was set to 0 for all other plots. We refer to this estimator ast GFC τ . The proportion of correctly detected final fellings (thinnings, or other harvests) by GFC is a ratio of the two estimates (Mandallaz 2008, p. 68) with variancê where λ h is the area of the hth stratum and where y GFC i is y i [eq. (4)] resulting int GFC τ and y NFI i is y i [eq. (4)] resulting int NFI τ . While our approach is suitable for assessing the accuracy of GFC, it is not optimal for estimating actual harvested area for two reasons. First, the use of the GFC loss year can introduce bias in estimates if the GFC loss year has a systematic error. Second, official NFI statistics include measurements from both permanent and temporary sample plots and utilize standlevel observations around the sample plots for area estimation rather than only plot-level measurements. We have employed this approach because plot-level measurements conceptually match the pixel-level GFC data better than stand-level observations.
Standard errors for estimates in Fig. 2 Breidenbach et al.