Integrating multi-source remote sensing data for mapping boreal forest canopy height and species in interior Alaska in support of radar modeling

Vegetation information is essential for analyzing aboveground biomass and understanding subsurface characteristics, such as root biomass, soil organic matter, and soil moisture conditions. In this study, we mapped boreal forest canopy height (FCH) and forest species (FS) distributions in the Delta Junction region of interior Alaska, by integrating multi-source remote sensing observations within a machine learning framework based on the extreme gradient boosting technique. Model inputs included multi-frequency (C-/L-/P-band) SAR observations from Sentinel-1, UAVSAR (Uninhabited Aerial Vehicle SAR) and AirMOSS (Airborne Microwave Observatory of Subcanopy and Subsurface), and Sentinel-2 optical reflectance data. LVIS (Land Vegetation and Ice Sensor) LiDAR measurements (RH98) and Tanana Valley State Forest timber inventory data were used as respective canopy height and species ground truth data. The combination of multi-source datasets produced the best model performance (RMSE 1.62 m for FCH, and 84.27% overall FS classification accuracy) over other models developed from single source observations. The resulting FCH and FS maps using multi-source datasets were derived at 30 m spatial resolution and showed favorable agreement with plot level field measurements from the Forest Inventory and Analysis record. The model results also captured characteristic differences in stand structure between dominant species and from post-fire vegetation succession. Our results show the potential of multi-source remote sensing observations, including low frequency microwave sensors, for monitoring boreal forest complexity and changes due to global warming.


Introduction
Approximately one-third of global boreal forests are underlain by permafrost (Safford and Vallejo 2019). The presence of, and depth to, the permafrost table exerts strong controls on vegetation distribution in the boreal forest (Chapin et al 2016). Forest cover acts as an insulating thermal buffer for belowground permafrost (Chang et al 2015). Understanding vegetation properties is crucial for studying the interactions between above- and below-ground processes, as well as permafrost characteristics. Developing better methods for landscape monitoring of vegetation, potentially affecting permafrost degradation, is also needed to improve understanding of the vulnerability of boreal ecosystems to climate change. Addressing these needs has been a major focus of the NASA Arctic Boreal Vulnerability Experiment (ABoVE), a decadal field campaign spanning North American Arctic and boreal ecosystems of Alaska and Northwestern Canada (Miller et al 2019, https://above.nasa.gov/).
In boreal ecosystems of interior Alaska, the forest cover is driven by diversity of topography, surface geology, disturbance history, and the presence of permafrost. The diversity of boreal forest species (FS) is low compared to temperate and tropical biomes, with forest stands generally composed of a few dominant overstory species (Chapin and Danell 2001). Forest canopy height (FCH) and FS are among the most important forest attributes for inferring subsurface properties. An important subsurface property, the annual maximum thaw depth to permafrost observed at the end of the summer, is referred to as the active layer thickness (ALT) and is known as a major indicator of permafrost degradation (Chen et al 2019). In discontinuous permafrost regions, forest characteristics can be used as indicators of the presence or absence of near-surface permafrost and of its characteristics, such as ALT (Viereck 1992, Döpper et al 2021).
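Interior Alaska is vast and often inaccessible, and only limited regions can be studied by means of labor-intensive fieldwork. Therefore, the use of remote sensing data to map FCH and FS is essential for a complete characterization of the above-ground vegetation. Light Detection And Ranging (LiDAR) data can provide vertical forest profiles, including FCH (Potapov et al 2021, Zhang et al 2022). Radar data offer the advantage of broad coverage in all-weather conditions. Polarimetric SAR (PolSAR) imagery has shown the capability of estimating forest structure and distinguishing FS in different ecosystems (Sader 1987, Ranson and Sun 2000, Maghsoudi et al 2013). C-band radar signals can penetrate through leaves, and L-band signals penetrate deeper but are saturated for tall and dense tropical trees (Chen et al 2020, Nandy et al 2021, Pourshamsi et al 2021). Due to their longer wavelength, P-band data have deeper characteristic penetration into vegetation and soil, and have been used to estimate forest structure and retrieve soil moisture (Zhao et al 2022, Moghaddam et al 2000, Tabatabaeenejad et al 2015, Brigot et al 2019). Optical multispectral satellite imagery has been used for classifying boreal vegetation types such as deciduous and evergreen trees, and for predicting vegetation structure (Persson et al 2018, Astola et al 2019, Abdi 2020). Combinations of SAR and optical data have been explored for vegetation structure and cover type classification, since SAR and optical data are expected to be complementary to each other (Moghaddam et al 2002, Pham et al 2020, Li et al 2020). Recent studies on FCH and FS mapping using multi-source remote sensing data have mainly focused on the synergistic use of satellite C-band SAR data from Sentinel-1 and multispectral data from Sentinel-2 (Erinjery et al 2018, Bhattarai et al 2021, Nandy et al 2021, Silveira et al 2023), or the combined use of multifrequency SAR data (Rignot et al 1994, Ranson and Sun 1994, Saatchi and Rignot 1997, Ranson and Sun 2000, Schlund and Davidson 2018).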
The combined use of multifrequency SAR (C-/L-/P-band) and multispectral data remains a topic of active research interest. Airborne SAR acquisitions in ABoVE and the availability of contemporaneous satellite observations from Sentinel-1/2 offer a unique opportunity to apply these data to the estimation of FCH and FS. In this study, we investigated the integration of AirMOSS P-band radar, UAVSAR L-band radar, Sentinel-1 C-band radar, and Sentinel-2 multispectral optical data for mapping FCH and FS in the Delta Junction region of interior Alaska. Delta Junction was selected as a regional case study for developing this multi-source estimation technique due to its ecological diversity, availability of diverse data sources, and representativeness of the broader boreal forest region of Alaska. One of the primary purposes of the generated FCH and FS maps is to serve the radar remote sensing community. For example, radar soil retrievals are highly sensitive to uncertainty contributed by vegetation scattering. To reduce this uncertainty, the FCH and FS maps can be used as prior information for parameterizing radar scattering models to initialize and enable land parameter retrieval from low frequency radar observations. Furthermore, other immediate uses of the generated FCH and FS maps are forest inventory assessment, post-wildfire disturbance analysis, and establishing correlations between vegetation characteristics and permafrost ALT distributions. As satellite-based L/P-band data become more readily available (Le et al 2011, Kellogg et al 2020), large-scale mapping of FCH, FS, and other forest structural attributes will become more feasible using these data.
The research conducted in this paper provides support to the NASA airborne campaigns as part of ABoVE. We investigate the synergy of AirMOSS, UAVSAR, and Sentinel-1/2 data for mapping FCH and FS, using a machine learning framework based on the extreme gradient boosting (XGBoost) algorithm. The relative contributions of predictors to the FCH and FS models are quantified. The relationship between the generated FCH and FS maps was analyzed and compared with information from the Forest Inventory and Analysis (FIA) record as validation. The results offer new insights into next-generation remote sensing methods designed to monitor boreal forests.

Study area and data
The study area is located at Delta Junction in the Tanana River Valley, Alaska, USA (figure 1). Boreal forests at this site are dominated by five tree species: black spruce (Picea mariana), white spruce (Picea glauca), paper birch (Betula neoalaskana), quaking aspen (Populus tremuloides), and balsam poplar (Populus balsamifera). The study area overlaps the Tanana Valley State Forest (TVSF), where the forest can reach heights of up to 25 m. The study area (inside the solid yellow contour in figure 1) encompasses approximately 2500 km², and was defined by the location and coverage of available remote sensing imagery. We compiled a suite of remote sensing observations (table 1) and reference ground measurements. These datasets were resampled and reprojected to a consistent 30 m resolution grid in the Alaska Albers Equal Area Conic projection based on the NASA ABoVE coordinate system (EPSG:102001, Loboda Carroll 2023). A detailed description of the data processing steps can be found in the supplementary material.

Methodology
The overall workflow consists of (1) data assembly, processing, and preparation for the analysis; (2) investigation of FCH and FS XGBoost modeling using multi-source datasets, and quantification of the relative importance of model predictors; and (3) FCH and FS mapping using the best-performing model, and assessment of the resulting FCH and FS maps (figure 2).
To estimate FCH, the LVIS Relative Height 98 (RH98) metric was used as the reference canopy height (Montesano et al 2023). For FS prediction, the processed TVSF timber inventory served as the reference. A total of 23 prediction scenarios were developed using different combinations of AirMOSS P-band, UAVSAR L-band, Sentinel-1 C-band, Sentinel-2 optical multispectral, and topographic data. The prediction scenarios included: (1) Sentinel-1 (S1); (2) … adding S1/S2 to a combination of L- and P-band, and scenario 15 tested the performance from using all multi-source data. Scenarios 16-23 explored the performance using topography variables alone and with multi-sensor data.
Machine learning methods have proven successful in modeling forest species and structure attributes (Grabska et al 2020, Pourshamsi et al 2021, Silveira et al 2023). In this paper, Extreme Gradient Boosting (XGBoost) was applied for estimating FCH and FS. XGBoost is a gradient tree boosting machine learning algorithm that has shown strong performance and has recently been widely used in remote sensing applications (Chen and Guestrin 2016). The XGBoost package available in Python was used in this study.
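As an illustration of how scenarios of this kind can be organized, the following minimal sketch enumerates every non-empty combination of the five predictor groups named above. It is not the authors' scenario list: the study used 23 selected scenarios, whereas the sketch lists the full candidate space from which such a subset could be drawn. The group labels and the helper names `enumerate_scenarios` and `scenario_name` are hypothetical.

```python
from itertools import combinations

# Predictor groups named in the text; the short labels are illustrative.
GROUPS = ["S1", "S2", "L", "P", "Topo"]


def enumerate_scenarios(groups):
    """Return every non-empty combination of predictor groups as tuples."""
    scenarios = []
    for size in range(1, len(groups) + 1):
        for combo in combinations(groups, size):
            scenarios.append(combo)
    return scenarios


def scenario_name(combo):
    """Join group labels into a readable scenario name, e.g. 'S1 + S2'."""
    return " + ".join(combo)


all_scenarios = enumerate_scenarios(GROUPS)
for combo in all_scenarios[:5]:
    print(scenario_name(combo))
```

Five groups yield 31 candidate combinations, so a hand-picked list of 23 scenarios covers most, but not all, of this space.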
We randomly divided the forest height and species ground truth data described above into training (80% of samples) and test (20% of samples) sets. Based on the test dataset, the root-mean-square error (RMSE) and coefficient of determination (R²) were calculated to assess the accuracy of FCH predictions. Overall accuracy (OA) and the Kappa coefficient (K) were computed to compare the FS predictions among different models. The best-performing models among the 23 prediction scenarios were selected. Permutation feature importance was then ranked for the best-performing FCH and FS XGBoost models. Permutation feature importance is defined as the decrease in a model's score when the values of a single feature are randomly shuffled. It is less biased than impurity-based importance measures and reflects how important the feature is for the model (Pedregosa et al 2011). We used the trained best-performing XGBoost regression model for FCH mapping, and the classification model for FS mapping.
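The evaluation steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: R² is taken as 1 − SS_res/SS_tot, K as Cohen's kappa, and all function names are hypothetical.

```python
import math
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Randomly split samples into (training, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def overall_accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted class matches the reference."""
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)


def cohen_kappa(labels_true, labels_pred):
    """Cohen's kappa: agreement corrected for chance; inputs are lists."""
    n = len(labels_true)
    p_observed = overall_accuracy(labels_true, labels_pred)
    classes = set(labels_true) | set(labels_pred)
    p_chance = sum(
        (labels_true.count(c) / n) * (labels_pred.count(c) / n) for c in classes
    )
    if p_chance == 1.0:  # only one class present: agreement is trivial
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)
```

For example, `train_test_split(range(100))` returns 80 training and 20 test samples, matching the 80/20 split described in the text.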
The FCH and FS maps were further analyzed and evaluated against available ground measurements. These ground measurements were assembled from in-situ FS and height data from the FIA inventory in interior Alaska. Due to landowner privacy, plot locations are obscured through fuzzing or swapping methods (Coulston and Reams 2018). Therefore, rather than matching individual plots, we compared height data for single-species stands from our FCH and FS mapping products against in-situ FIA forest inventory plot data to analyze the agreement between the two. Alaska FS and stand structure are strongly connected with fire disturbance and post-fire recovery stages (Goetz et al 2010, Shenoy et al 2011). Burned areas due to wildfires that occurred between 1942 and 2007 were studied. The FCH maps of burned and unburned areas were compared, and ten-year intervals of time since disturbance were used to analyze FCH in relation to time since burn. The FS mapping pattern in relation to fire history was also studied for two distinct burn recovery stages within the study area: 11-40 and >40 years. This comparison was used to verify whether the FCH and FS predictions were able to capture characteristic forest recovery patterns from historical wildfire events.

XGBoost regression for forest canopy height
We investigated various combinations of sensor data and topographic variables for FCH estimation (figures 3(a) and (b)). Among single-source remote sensing datasets, S2 was better for estimating FCH than S1, L-band, and P-band. Combinations of datasets generally performed better than single-source datasets. However, the L/P combination was less accurate than using S2 only. The S1/2 combination performed better than the L/P and S1 models. The combination of L/P + S1/2, compared to the S1/2 model, yielded a 0.051 increase in R² and a decrease of 0.31 m in RMSE. The combination of S1/2 + Topo produced a good estimate, only slightly worse than that of L/P + S1/2 + Topo. The best FCH estimate (R² = 0.89, RMSE = 1.62 m) was obtained from the multi-source model (L/P + S1/2 + Topo), demonstrating the contribution of multi-source information to improving the accuracy of FCH mapping.
To understand the influence of individual predictors in the FCH XGBoost model, the permutation feature importance of the best-performing model (L/P + S1/2 + Topo) is presented (figure 3(c)). Overall, S2 data show higher importance than S1/L/P data. For S2, the short-wave infrared bands (B12 and B11) play an important role in the estimation of FCH. S1-VH also has a higher model impact than the S1-VV polarization. For P-/L-band, L-band volume scattering has higher importance than the other decomposition parameters. The ranking also reveals the substantial importance of DEM terrain information for predicting FCH.
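The permutation procedure behind such a ranking can be sketched as follows. This is a simplified, self-contained illustration rather than the implementation used in the study: the score is assumed to be mean squared error, `toy_predict` is a hypothetical stand-in for a fitted model, and the data are synthetic.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns untouched.
            shuffled_col = [r[col] for r in rows]
            rng.shuffle(shuffled_col)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled_col)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical fitted model that depends on feature 0 only."""
    return 2.0 * row[0]


# Synthetic data: feature 0 drives the target, feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
importances = permutation_importance(toy_predict, rows, targets)
print(importances)
```

Shuffling the informative feature degrades the score, while shuffling the irrelevant feature leaves it unchanged, which is the behavior the ranking in figure 3(c) relies on.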

XGBoost classification for forest species
We performed the FS classification with different combinations of feature layers (figures 4(a) and (b)). Similar to the FCH results, for single-source remote sensing datasets, S2 outperformed the S1, L- and P-band models. S2 alone produced better prediction performance than the combination of S1, L- and P-band. The L/P + S1/2 combination also produced a ∼7% increase in OA and a 0.1 increase in K, compared to the S1/2 combination. The best-performing FS predictions were obtained using the multi-source data with topographic variables (OA = 84.27%, K = 0.81).
The permutation feature importance ranking of the best FS XGBoost model (L/P + S1/2 + Topo) is presented in figure 4(c). The most important model features are topographic: DEM, slope, and aspect play a key role in determining the FS pattern. The S2 SWIR (B11, B12), red-edge (B5) and visible (B4) bands were also found to be crucial FS predictors. The S1-VH feature showed greater importance than S1-VV. In contrast, the P-/L-band decomposition features showed minimal importance for the FS predictions. Among these decomposition features, volume scattering, which is correlated with the forest canopy layer, generally ranked higher in importance than the other components.

FCH and FS mapping
The best FCH XGBoost model (L/P + S1/2 + Topo) was used to generate the 30 m resolution FCH spatial distribution over the study area (figure 5(a)). The resulting FCH map showed overall good agreement with the RH98 validation pixels (R² = 0.892, RMSE = 1.62 m), although taller canopies (FCH > 15 m) were underestimated by 1.31 m relative to the LiDAR values (figure 5(b)).
Table 2 shows the corresponding confusion matrix for the best FS model (L/P + S1/2 + Topo). The FS predictions show good performance for single-species stands, with both user accuracy (UA) and producer accuracy (PA) larger than 80%, while the predictions for two-species (BS-WS and PB-QA) and mixed forests are less accurate. The non-forested areas are masked using the U.S. National Land Cover Database (NLCD) (Wickham et al 2017). The FS map was generated using the best FS model (figure 6). A detailed comparison between the FCH and FS maps is provided in the supplementary material (figure S1).
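For readers less familiar with UA and PA, the following sketch shows how both are derived from a confusion matrix. The row/column orientation (rows are reference classes, columns are predicted classes) is an assumption of this sketch, the function name is hypothetical, and the counts are made up for illustration; they are not the values in table 2.

```python
def user_producer_accuracy(matrix, classes):
    """Per-class user accuracy (UA) and producer accuracy (PA).

    matrix[i][j] counts samples of reference class i predicted as class j
    (this orientation is an assumption of the sketch).
    """
    result = {}
    for k, name in enumerate(classes):
        correct = matrix[k][k]
        reference_total = sum(matrix[k])                 # row sum
        predicted_total = sum(row[k] for row in matrix)  # column sum
        result[name] = {
            "PA": correct / reference_total if reference_total else 0.0,
            "UA": correct / predicted_total if predicted_total else 0.0,
        }
    return result


# Made-up counts for two classes, for illustration only.
classes = ["BS", "WS"]
matrix = [[8, 2],
          [1, 9]]
acc = user_producer_accuracy(matrix, classes)
print(acc)
```

PA reflects how often reference pixels of a class are correctly mapped (omission errors), while UA reflects how often a mapped class is correct on the ground (commission errors).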

Comparison with FIA inventory data
The XGBoost predictions reproduce characteristic patterns and relationships between FCH and FS for the study area. In taller forest stands, paper birch, quaking aspen and white spruce are the dominant species. In contrast, comparatively low-statured stands are dominated by black spruce. These findings were confirmed by comparing the distributions of FCH values across boreal FS classes derived from the predictions and from the FIA inventory data (figure 7). In general, the model predictions and forest inventory records show similar characteristic patterns. For example, the mean FCH of black spruce is around 5 m, making it generally the lowest-statured forest type. White spruce is about twice the height of black spruce and similar in height to deciduous forests, which are about 4 m taller than black spruce. These results show overall consistency between the XGBoost predictions and the FIA inventory data used as regional ground truth. However, more detailed comparisons between the inventory measurements and overlapping model predictions are constrained by the imprecise nature of reported FIA plot locations.

The link between fire disturbance and FCH and FS maps
Boreal forests in Alaska are strongly affected by a legacy of wildfire disturbance, which is a major driver of forest successional changes in dominant tree species and canopy heights. Notably, the XGBoost-generated FCH pattern captures the apparent fire disturbance legacy. The estimated FCH pattern captures the general post-fire successional growth in tree heights by ten-year epoch of time since burn (figure 8(a)).
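The grouping of canopy heights by ten-year epoch of time since burn can be sketched as below. This is a minimal illustration assuming that each pixel carries a years-since-fire value and a predicted height; the function names and sample values are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def epoch_label(years_since_fire, width=10):
    """Map years since fire to a bin label such as '11-20'."""
    start = ((years_since_fire - 1) // width) * width + 1
    return f"{start}-{start + width - 1}"


def mean_height_by_epoch(pixels, width=10):
    """pixels: iterable of (years_since_fire, canopy_height_m) pairs."""
    groups = defaultdict(list)
    for years, height in pixels:
        groups[epoch_label(years, width)].append(height)
    return {label: mean(values) for label, values in groups.items()}


# Made-up pixel values, for illustration only.
pixels = [(12, 3.0), (18, 5.0), (25, 6.0)]
by_epoch = mean_height_by_epoch(pixels)
print(by_epoch)
```

The same grouping could feed a boxplot per epoch instead of a mean, which is closer to how a distribution by epoch would be displayed.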
The model-predicted FS pattern is also sensitive to the fire history, as represented for two distinct burn recovery stages within the study area: 11-40 and >40 years following fire (figure 8(b)). Comparing estimated differences in the proportions of stand species shows that deciduous trees (paper birch and quaking aspen) become dominant at the beginning of regeneration, accounting for 36.5% of FS in the 11-40 year period. However, their prevalence decreases to approximately 18% in stands older than 40 years. This is likely because deciduous forest stands initially replace spruce forests following stand-replacing wildfires (Mack et al 2008, 2021).
The relative distribution of mixed forests increases over time from 19.3% to 22.8%, as spruce reestablishes at later successional stages and relatively short-lived deciduous trees gradually die. White spruce commonly replaces deciduous trees and becomes dominant at late successional stages (Viereck 1992), while black spruce can become dominant in more poorly drained permafrost sites. Therefore, the spruce stands recover gradually, constituting a larger percentage of the forest stand composition over time (Li et al 2021).
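Species proportions per recovery stage, of the kind compared above, can be tallied as in the following sketch. The two stage boundaries follow the text (11-40 and >40 years); the function names and sample pixels are hypothetical, and the output is not meant to reproduce the reported percentages.

```python
from collections import Counter


def recovery_stage(years_since_fire):
    """Assign one of the two recovery stages used in the text, else None."""
    if years_since_fire > 40:
        return ">40"
    if years_since_fire >= 11:
        return "11-40"
    return None  # burns younger than 11 years are not part of the comparison


def species_share_by_stage(pixels):
    """pixels: iterable of (years_since_fire, species_label) pairs.

    Returns the percentage of pixels per species within each stage.
    """
    counts = {}
    for years, species in pixels:
        stage = recovery_stage(years)
        if stage is None:
            continue
        counts.setdefault(stage, Counter())[species] += 1
    shares = {}
    for stage, counter in counts.items():
        total = sum(counter.values())
        shares[stage] = {sp: 100.0 * n / total for sp, n in counter.items()}
    return shares


# Made-up pixels, for illustration only.
pixels = [(15, "PB"), (30, "BS"), (45, "WS"), (60, "WS"), (5, "QA")]
shares = species_share_by_stage(pixels)
print(shares)
```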

XGBoost FCH and FS predictions
We used XGBoost driven by C-/L-/P-band SAR, optical remote sensing, and topographic data to estimate FCH and FS patterns across the Delta Junction boreal forest region of interior Alaska. The resulting model predictions showed favorable accuracy in depicting characteristic structural differences among the major overstory tree species and post-fire successional patterns in relation to ground truth observations from forest inventory records. The best-performing model predictions were attained using multi-source remote sensing data, highlighting the advantages of the combined observations. Moreover, multi-source remote sensing data provide complementary information on vegetation properties, including the structure and dielectric constant of vegetation biomass components (SAR), and the photosynthetic canopy cover and biochemical composition of vegetation (optical). Furthermore, multi-frequency SAR data can provide different layers of information, due to the different effective penetration depths and sensitivity levels of different microwave frequencies. For example, C-band can only penetrate to the branch layer of forests, L-band is capable of penetrating to the trunk layer and soil surface, and P-band may reach the root zone. Therefore, adding more sources of data could decrease the RMSE of FCH and increase the OA of FS, even though the improvement achieved through the addition of a given source is sometimes marginal (figures 3 and 4).
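The feature importance ranking verified that the integration of multi-source data can benefit the FCH and FS model predictions. The Sentinel-2 SWIR (B11, B12) and red-edge (B5) data were among the most important model predictors. The SWIR bands have been shown to be sensitive to leaf/needle water content (Astola et al 2019, Grabska et al 2020). The red-edge bands have also been used to study vegetation structure and leaf area index (Majasalmi and Rautiainen 2016). Topography-related data (DEM, slope, aspect) are also crucial, especially for FS modeling, since topography affects surface wetness, drainage, and incident solar radiation (temperature) (Viereck et al 1983). In interior Alaska, deciduous forests (quaking aspen and paper birch) become dominant on warm south-facing slopes with increased elevation and soil drainage. In contrast, cool north-facing slopes with wet soils and lowlands underlain by permafrost are generally covered by black spruce (Viereck et al 1983, Walker and Johnstone 2014, Döpper et al 2021). For SAR-related predictors, Sentinel-1 VH consistently ranked high, which is consistent with deciduous and coniferous forests generally showing large differences in C-band cross-polarization backscatter behavior (Udali et al 2021). L-band volume scattering, closely related to cross-polarization HV, is sensitive to different canopy biomass height levels and is also useful for FCH and FS estimation (Rignot et al 1994). Other long-wavelength radar (P-/L-band) predictors were less important, but their inclusion still improved the model predictions, which may be because longer-wavelength radars are better able to capture surface micro-topography and soil characteristics.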

Implications for retrieval of soil properties
The PolSAR backscattering of forests can be modeled as a function of various parameters, including aboveground vegetation characteristics such as FCH, as well as ground parameters such as soil moisture and organic matter (Burgin et al 2011). These parameters can jointly and substantially affect radar backscattering coefficients. Remote sensing of soil properties in forests is possible from low frequency (P-band) PolSAR backscattering, but the retrieval accuracy depends heavily on the accuracy of the vegetation parameterization. For instance, a tall forest over dry soil could show similar backscatter behavior to a low-statured forest overlying wet soil. Therefore, the physics-based retrieval of soil properties can be ambiguous without accurate aboveground information. The XGBoost-derived FCH and FS maps may contribute to more reliable soil property retrievals by providing more realistic aboveground information and by initializing the parameterization (such as diameter at breast height and stem dielectric constant) of boreal forest in physics-based radar backscatter models (Wang et al 1993, Yarie et al 2007). This could further enable radar retrieval algorithms based on such scattering models (Chen et al 2019, Bakian-Dogaheh et al 2022), leading to better predictions of soil properties, such as soil moisture and water table depth dynamics.
In addition, boreal FS types in interior Alaska are related to the distribution of permafrost and ALT (Anderson et al 2019, Roland et al 2019, Döpper et al 2021). Black spruce is relatively short-statured compared to other FS and commonly occurs over near-surface permafrost (30-60 cm), while balsam poplar, paper birch, and quaking aspen are taller and grow in regions with deeper ALT (>70 cm) or discontinuous to permafrost-free conditions on relatively well-drained soils. Therefore, based on the FS map, additional information on near-surface permafrost and active layer conditions can be inferred. More accurate vegetation information, as proposed here, is needed to understand permafrost resilience and vulnerability, and their dependencies on landscape characteristics.

Conclusion
This study applied a machine-learning methodology informed by LVIS LiDAR and multi-source data from AirMOSS, UAVSAR, Sentinel-1, and Sentinel-2 for mapping boreal FCH and dominant FS distributions over the Delta Junction region of interior Alaska. Our analysis and results revealed that the joint use of multi-source datasets improved the performance of the XGBoost predictions for FCH (RMSE of 1.62 m, R² of 0.89) and FS (OA of 84.27%, Kappa of 0.81). These results established the general effectiveness and potential of using multi-source remote sensing data for regional mapping of boreal FS and height. The resulting FCH and FS maps showed characteristic patterns and statistical consistency with FIA forest inventory ground-truth data. The predictions also captured patterns in post-fire successional changes following wildfire, as indicated by regional fire history records. These results are expected to inform the development of next-generation remote sensing approaches for monitoring the rapid changes occurring in boreal forests as a consequence of permafrost thaw and more intense wildfires from global warming.

Figure 1. The Delta Junction study area within interior Alaska. The area covered by the airborne SAR mosaic is shown by the solid yellow contour. The LVIS collection area is shown by the blue dotted contour; the TVSF timber inventory is shown as green polygons. Red dots represent FIA plots. Burned areas in Delta Junction are depicted from Alaska fire history polygons.

Figure 2. The overall workflow of this study: blue areas denote the data predictors; green areas represent the ground truth data for forest canopy height and species; white areas indicate the FCH and FS XGBoost model analysis; orange and yellow areas denote the FCH and FS mapping and verification process, comprising agreement with FIA data and assessment against distinctive post-fire recovery phases.

Figure 3. XGBoost for FCH. (a) RMSE of XGBoost models developed from various combinations of remote sensing data. (b) R² of XGBoost models developed from various combinations of remote sensing data. (c) Permutation feature importance of the best performing FCH XGBoost model (L/P + S1/2 + Topo).

Figure 4. XGBoost for FS. (a) OA of XGBoost models developed from various combinations of remote sensing data. (b) K of XGBoost models developed from various combinations of remote sensing data. (c) Permutation feature importance of the best performing FS XGBoost model (L/P + S1/2 + Topo).

Figure 6. Dominant boreal FS mapping derived at 30 m spatial resolution (BS: black spruce, WS: white spruce, BP: balsam poplar, PB: paper birch, QA: quaking aspen, BS-WS: black spruce and white spruce mixed, PB-QA: paper birch and aspen mixed, Mixed: all other mixed types).

Figure 7. Distribution of FCH for different dominant boreal FS as derived from (a) FIA inventory data within the Delta Junction, Alaska study domain (see figure 1), and (b) the XGBoost models' predictions from this study. Boxplots extend from the Q1 to Q3 quartile values of the data, with a line at the median and a triangle at the mean. The whiskers extend from the edges of the box to show the range of the data, but no further than 1.5 * (Q3−Q1) from the edges of the box.
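The boxplot convention described in this caption can be reproduced numerically with the sketch below. It assumes quartiles computed by linear interpolation between data points (Python's "inclusive" method), which may differ slightly from the plotting software actually used; the function name and sample heights are hypothetical.

```python
from statistics import mean, median, quantiles


def boxplot_stats(values):
    """Box and whisker statistics following the Q1/Q3 and 1.5*IQR convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the limits.
    inside = [v for v in values if low_limit <= v <= high_limit]
    return {
        "q1": q1,
        "median": median(values),
        "mean": mean(values),
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Made-up canopy heights in metres; 30 falls beyond the upper whisker.
heights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = boxplot_stats(heights)
print(stats)
```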

Figure 8. Relevance of the XGBoost-estimated FCH and FS maps to the fire disturbance legacy. (a) Distribution of FCH by ten-year epoch of time since fire. (b) Comparison of FS successional stages at 11-40 years and >40 years post-fire.

Table 1. Feature layers used in this study.

Table 2. Forest species classification confusion matrix results. Bold numbers represent correct classifications.
the Delta Junction study area burned within the previous 80 years and is in various stages of post-fire recovery.The approximate coverage of forest recovery stages represented include ∼3% recently burned in the previous 11-20 years, ∼4% burned from 21-30 years prior, ∼3% burned from 31-40 years prior, and ∼2% recovering from older (>40 years) burns.