Mapping tree diversity in the tropical forest region of Chocó-Colombia

Understanding spatial patterns of diversity in tropical forests is indispensable for their sustainable use and conservation. Recent studies have reported relationships between forest structure and α-diversity. While tree α-diversity is difficult to map via remote sensing, large-scale forest structure models are becoming more common, which would facilitate mapping the relationship between tree α-diversity and forest structure, contributing to our understanding of biogeographic patterns in the tropics. We developed a methodology to map tree α-diversity in tropical forest regions at 50 m spatial resolution using α-diversity estimates from forest inventories as response variables and forest structural metrics and environmental variables as predictors. To include forest structural metrics in our modelling, we first developed a method to map seven of these metrics integrating discrete light detection and ranging (LiDAR), multispectral, and synthetic aperture radar imagery (SAR). We evaluated this methodology in the Chocó region of Colombia, a tropical forest with high tree diversity and complex forest structure. The relative errors (REs) of the random forest models used to map the seven forest structural variables ranged from low (6%) to moderate (35%). The α-diversity maps had moderate RE; the maps of Simpson and Shannon diversity indices had the lowest RE (9% and 13%), followed by richness (17%), while Shannon and Simpson effective number of species indices had the highest RE, 27% and 47%, respectively. The highest concentrations of tree α-diversity are located along the Pacific Coast from the centre to the northwest of the Chocó Region and in non-flooded forest along the boundary between the Chocó region and the Andes. Our results reveal strong relationships between canopy structure and tree α-diversity, providing support for ecological theories that link structure to diversity via niche partitioning and environmental conditions. With modification, our methods could be applied to assess tree α-diversity of any tropical forest where tree α-diversity field observations coincident with LiDAR data.


Introduction
Decreasing biodiversity loss and ensuring sustainable use of biodiversity are two main targets of the Convention on Biological Diversity, currently signed by 196 countries (CBD 2019). Essential biodiversity variables (EBVs) have been developed to monitor progress towards these goals (Pereira et al 2013, Vihervaara et al 2017. α-, β-, and γ-diversity measures are EBVs typically estimated by in-situ observations (Condit et al 1992, Higuchi et al 2008, Laurance et al 2010, Baraloto et al 2013, Hubbell 2013, Duque et al 2017, McNicol et al 2018, Oliveira et al 2018. However, the potential of remote sensing to predict α-β-γ-and even functional diversity has been evaluated by relating information of multispectral, hyperspectral, synthetic aperture radar (SAR), and light detection and ranging (LiDAR) data to species inventories (Gould 2000, Foody and Cutler 2003, Carter et al 2005, Carlson et al 2007, Goetz et al 2007, 2015, Rocchini 2007, Rocchini et al 2010, Hernández-Stefanoni et al 2012, Hakkenberg et al 2018a, 2018b, Bae et al 2019, Marselis et al 2020 and to morphological and physiological traits (Schneider et al 2017(Schneider et al , 2020.
Although the importance of tree diversity information for the conservation, management, and sustainable use of tropical forests has been recognized (Kessler et al 2009, Anderson-Teixeira et al 2015, Laurance et al 2018, regional spatial models of tree α-diversity (mean species diversity at a local scale (Whittaker 1960)) at detailed spatial resolution are lacking. Some maps have been developed regionally and globally but their relatively coarse spatial resolutions (⩾∼1 km) (Barthlott et al 1996, 2007, Ter Steege et al 2003, Parmentier et al 2011, Keil and Chase 2019, limits their applications for local management, as does the fact that they contain relatively little information on the vegetation itself.
In situ observations have shown that the effect of vegetation structure on α-diversity patterns can differ among taxa (Allouche et al 2012, Simonson et al 2014, Ben-hur andKadmon 2020). However, positive relationships between plant and animal α-diversity and vegetation structure are common in several vegetation types including forests, shrublands, wetlands, and grasslands (Duivenvoorden 1996, Lundholm 2009, Laurans et al 2014, Bae et al 2019, Maskell et al 2019, Marselis et al 2020, Meyer et al 2020. Two non-exclusive hypotheses could explain this ecological relationship. The niche hypothesis proposes that α-diversity should rise with the increasing niche space produced by taller and more heterogeneous vegetation structures (Hutchinson 1957, Stein et al 2014 until a maximum where it would start to decline due to the reduction of the average amount of effective area available for the species (Ben-hur and Kadmon 2020). Under this hypothesis, taller and more heterogeneous canopies should present higher tree αdiversity in tropical forest. The environmental harshness hypothesis proposes that harsh environments limit the set of viable tree species traits. Vegetation height is assumed to act as an integrative measure of limiting environmental factors, thereby leading to positive relationships between taller canopies and tree diversity (Whittaker 1965, Kraft et al 2015, Marks et al 2016. Causality between diversity and structure may act in the other direction as well, with more diverse forest stands containing a wider array of tree architectures thereby leading to greater structural complexity. LiDAR, an active sensor that allows to map vegetation structure from the ground to the top of the canopy (Silva et al 2018, Dubayah et al 2020, has been used for assessing tree α-diversity at detailed spatial resolution (∼50 m) in central America (Fricker et al 2015, George-Chacon et al 2019, in Lopé National Park, Gabon, Africa (Marselis et al 2019), Hawaii (Carlson et al 2007), and for 15 study sites distributed across the tropics (Marselis et al 2020). These and other local scale studies demonstrate strong relationships between LiDAR and tree diversity.
Notwithstanding their efficacy for estimating detailed forest structural metrics, airborne LiDAR are expensive to acquire and typically cover limited spatial extents. Although the long-awaited spaceborne Global Ecosystem Dynamics Investigation (GEDI) LiDAR is now operational and has acquired measurements across much of Earth's temperate and tropical forests, its orbital transects may be separated by hundreds of meters and persistent cloud cover may further reduce coverage (Dubayah et al 2020). Consequently, spatial models of α-diversity that use only LiDAR predictors also generally cover limited spatial extents. To overcome this constraint, LiDAR data can be integrated with other sensor types to obtain wall-to-wall maps of forest structural metrics for large areas (Asner et al 2010, 2012, Qi and Dubayah 2016, Potapov et al 2020. For example canopy height has been mapped in tropical forests with relatively high accuracy by integrating temporal metric predictors derived from multispectral imagery and SAR together with lidar data (Fagua et al 2019b, Healey et al 2020. Such examples show the potential of building wall-to-wall maps of forest structural metrics for large areas of tropical forest, allowing the subsequent estimation of α-diversity patterns. Our objectives in this paper were to (a) model the relationships between forest structure and tree diversity for the Chocó-region of Colombia and (b) create the first maps of forest structure and tree α-diversity at relatively detailed spatial resolution (∼50 m) for the region. This region contains the most well preserved forest of the Chocó global ecoregion, one of 25 biodiversity hotspots prioritized for conservation (Fagua and Ramsey 2019, WWF 2019, Fagua et al 2019a. We hypothesized that α-diversity of trees would be related to forest structural metrics and environmental variables at local scales, revealing patterns at larger scales. In the results and discussion of the paper we identify and evaluate these spatial patterns from the perspective of the niche diversity (Whittaker 1965) and environmental harshness hypotheses (Marks et al 2016) in the context of biodiversity conservation in this region.

Study area
Our analysis focused on the Chocó-region (Ideam et al 2016), a lowland tropical forest zone of 68927 km 2 located along the Colombian Pacific Coast and the western branch of the Colombian Andes (figure 1). The Chocó-region includes sections of five ecoregions (Olson et al 2001), and contains   Hill (1973), Jost (2006) Effective number of species-Shannon It weights species in proportion to their frequency and is calculated as exp (− ∑ p i log(p i )) where pi is the proportion of total individuals contributed by the ith species.
ENS -Shannon Hill (1973), Jost (2006) Effective number of species-Simpson It gives more weight to abundant species and is calculated as 1/ ∑ (p i ) 2 ENS-Simpson Hill (1973), Jost (2006) Shannon index Shannon index It is defined as Simpson (1949) a high percentage of well-preserved forests (Fagua and Ramsey 2019). Because of its unique geological and biological history, the Chocó-region is one of the most rich and endemic areas on Earth in flora and fauna (Gentry 1986, Gregory-Wodzicki 2000, WWF 2019. Additional description of the region in appendix A.

Plots, α-diversity, forest structure and environmental variables
We measured α-diversity, forest structure, and environmental variables in eighty-five 0.25-hectare plots (50 m × 50 m) located in six ecoregions of lowland moist forests across South America ( figure 1(a)). This size of vegetation plot tends to produce lower errors when predicting tropical tree diversity using remote sensing (Fricker et al 2015, Marselis et al 2019 and is an appropriate scale for estimating tree α-diversity (Gentry 1988, Phillips and Miller 2002, Baraloto et al 2013. For α-diversity, in-situ observations were conducted in each plot including censuses of all live trees, live palms, and live woody lianas with a diameter at breast height (DBH) > 10 cm. These data were provided by BioREDD project for biodiversity research in Colombia (Meyer et al 2019). We then calculated a set of tree α-diversity measures to evaluate and compare tropical forest communities: the Hill numbers, Shannon index, and Simpson index (table 1). Additional details in appendix B.
For forest structure, discrete return LiDAR data were acquired in each plot. We first estimated four vertical forest structural metrics from the canopy profile, then we estimated three horizontal metrics based on the derived canopy height model. A summary of these metrics is provided in table 2. Although other metrics could be estimated (e.g. individual tree attributes and density metrics), we selected these metrics because they are comparable when different LiDAR instruments are used, can be emulated with metrics of waveform LiDAR, and have been related to α-diversity measures (MacArthur and MacArthur 1961, Davies and Asner 2014, Zhang et al 2017. Additional information regarding forest structure data is provided in appendix B and details on the lidar data acquisitions in appendix C.
For environmental variables, we selected seventysix, 1 km (or finer) gridded datasets related to climate, soil, topography, and incident radiation which have been shown to be relevant drivers of diversity pattern (Wright et al 1993, Hawkins et al 2003, Laurance et al 2010, Guo et al 2013, Toszogyova and Storch 2019; see table 3 for a summary of these variables and appendix B for additional processing and variable selection information. We did not include any precipitation data between 2001 and 2019 due to their coarse spatial resolution. Furthermore, we did not include slope and aspect as topographic variables because the lowland tropical forests of South America are dominated by relative flat topography and the 85 plots were all in flat areas.

Plot level modelling
We modelled α-diversity estimated in the 85 plots using the forest structural metrics and environmental variables as predictors. We first assessed the predictive potential of five machine learning algorithms and selected random forest (RF) regressions because RF had the lowest errors (appendix D). With this model in mind, we first selected the set of forest structural metrics and environmental variables that maximized the RF prediction of each α-diversity measure using the Boruta algorithm (Kursa and Rudnicki 2010). We also tuned the number of trees (from 300 to 1000) and the number of variables randomly sampled at each split (from 2 to 10) (Boehmke and Greenwell 2019). Root mean squared error (RMSE), mean absolute error (MAE), and relative error (RE) were used as model-error measures. We performed two error estimations: five-fold cross-validation (CV) and data  partition (DP) in which 70% of the data are used for training and 30% are used for testing. We ran 100 iterations per method producing a distribution of RMSE, MAE, and RE. We also reported predictor variable importance generated by the Boruta algorithm, which reduces the bias of the Gini index, a common indicator of variable importance (Nicodemus and Malley 2009). Boruta compares predictor importance with shadow predictors (predictor copies generated by random permutations of their own values) in numerous RF regressions, 100 in our case. Predictors with significantly larger or smaller importance than shadow predictors are declared as important or unimportant for the modelling (Kursa and Rudnicki 2010). We also calculated linear correlation (Pearson-r) between the α-diversity Table 4. Summary of the multispectral and SAR data used to build temporal means (mean-X) and temporal standard deviations (Sd-X) mosaics. The first letter and second letter in the SAR data (H or V) refer to the transmit and return signals; H stands for horizontal and V for vertical polarizations. The details for the construction of these mean-X and Sd-X are provided at appendix E.  measures and the predictors as a descriptive evaluation.

Mapping forest structure
We mapped the seven forest structural metrics integrating LiDAR, multispectral and SAR data. We first obtained the forest structural metrics at 50 m spatial resolution for 578 km 2 of LiDAR surveys across the Chocó (figure 1(d)), taking the LiDAR metrics as 'ground truth' for training models based on the image data ( We followed the same methods to process the LiDAR surveys as described in section 2.2 and appendix B. We then built temporal mean (mean-X) and standard deviation (sd-X) mosaics of multispectral and SAR data for the Chocó region using the pixel values of all imagery available between 1 January 2017 and 31 December 2019. Temporal metrics improve the modelling of forest structure variables by increasing sensitivity and reducing errors (Fagua et al 2019b, Potapov et al 2019. A summary of these metrics is provided in table 4 and additional details and codes in appendix E.
Finally, we used the RF algorithm to model and map the relationship between the forest structural metrics and the 30 multispectral and SAR predictors.
We selected 67 654 training sites (50 × 50 m pixels) in the 578 km 2 of LiDAR surveys to estimate the seven forest structural metrics, using a spatial filter of 100 m 2 to reduce spatial autocorrelation effects. We evaluated predictor importance using Boruta, tuned the parameters of the RF regressions, and evaluated model errors using 100 iterations for CV and DP as described above.

Mapping tree α-diversity
We used the RF regressions at the plot level for modelling maps of the α-diversity measures at the regional level, using stacks of predictor variables at the spatial scale of 50 m (plot size). Because the RF algorithm selects only about 66% of the training data in an analysis (Liaw 2018), we performed 30 iterations (each iteration is a RF regression) to estimate 30 maps for each α-diversity measure; a final map per α-diversity measure was obtained by averaging the 30 maps. Map errors (RMSE, MAE and RE) were calculated using predicted diversities against diversity values of a reserved group of fourteen 0.25 hectare plots in the Chocó region (figure 1(d)) that were not used for model training.

Spatial distribution of biodiversity
As an additional validation and evaluation of our diversity maps, we randomly selected 100 000 sites to assess differences in the spatial distribution of the tree α-diversity against the terrestrial ecoregions (Olson et al 2001) and natural land covers of the Chocó Table 5. Error estimation of random forest regressions to predict five diversity metrics. The regressions were estimated using eighty-five 0.5 hectare plots where diversity metrics, forest structural metrics, and environmental variables were recorded. For each error estimation method, cross validation (CV) and data partition (DP), we performed 100 iterations where each iteration estimated RMSE, MAE, and RE. Mean and Standard deviation (Sd) of each group of 100 iterations are showed in the

Results
RF modelling at the plot level showed that H and D predictions had the lowest error, Richness predictions had moderate error, and ENS-Shannon and ENS-Simpson predictions had the highest errors (table 5). Boruta algorithm showed that 41 of the evaluated 76 predictors were significantly important across the RF models. Forest structure metrics and five environmental variables (BIO15, BIO18, Alt, Tem, and FPAR) tended to show importance over the third quantile relative to all importance values (appendix F). Forest structure metrics were highly and positively correlated with each α-diversity metric (0.65 > r magnitude < 0.97) (appendix G). RF modelling used to map the forest structural metrics (figure 2) showed the lowest errors for VCH while the other six forest structural metrics presented moderate errors (appendix H). Boruta algorithm showed that all multispectral and SAR predictors were significantly important for the RF modelling of each forest structure metric. Means of C-band VH/VV and Landsat SVVI tended to have the highest importance; however, other predictors also presented importance higher than the third quantile relative to all importance values, showing that the contribution of multispectral and SAR predictors as a group was key to the relatively low modelling errors (appendix I).
The maps of H and D obtained the lowest errors, the map of richness showed moderate errors, and the maps of ENS-Shannon and ENS-Simpson presented the highest (figure 3). Curves of ENS-Shannon and ENS-Simpson against richness show linear relationships and curves of Shannon index against Simpson index produced quadratic relationships for the αdiversity maps of Chocó as well as for the 85 plots (figure 4), showing that our maps correspond well to the biodiversity patterns of inventories. Higher ENS-Shannon than ENS-Simpson values and higher Shannon index than Simpson index values reflect higher evenness of tree species distributions with lower dominance of abundant species.
Diversity patterns changed significantly across terrestrial ecoregions; Hill's numbers, H and D were higher in the three forest ecoregions than in the mangrove ecoregions (398 > F values < 227, p values < 0.001) (figure 5). Diversity patterns also varied significantly with natural land cover; Hill's numbers, H and D were highest in forest, lower in secondary vegetation, and lowest in wetlands (980 > F values < 1800, p values < 0.001) (figure 6). High tree diversity is located along the Pacific Coast from the centre to the north of the region and in non-flooding areas along the boundary between the Chocó region and the Andes. The north-eastern and south-western portions of the Chocó Region tend to have lower diversity (figure 3).

Spatial modelling
Integrating in situ forest inventories and remote sensing data, we developed an accurate methodology to map tree α-diversity metrics in tropical forests at detailed spatial resolution. We used mostly free and open-access data; therefore, our methods can be adopted in other regions to evaluate diversity. The only restricted data source was airborne LiDAR data, however our methods can be adapted to use publicly available GEDI LiDAR and airborne LiDAR datasets are becoming more common. Our methods open new possibilities for mapping and monitoring EBVs at regional to global scales as well as new prospects to understand and conserve the biodiversity of tropical forests.
Although previous work has explored relationships between tree α-diversity and remote sensing data in tropical forest (Carlson and Ripley 1997, , we were able to use these relations to map tree α-diversity over a relatively large extent by using aircraft LiDAR as the key link between forest inventories and continuous remote sensing imagery. LiDAR data allowed accurate estimations of forest structural metrics that were related to tree diversity at the level of forest inventories. These forest structural metrics were also related to continuous remote sensing imagery, allowing the construction of accurate regional maps of forest structure, a primary factor driving diversity at fine scales in tropical forests. We found forest structural metrics were highly correlated to tree α-diversity and were important for their prediction. Higher VCH and z_sd thus suggest more canopy stratification and more tree species because trees tend to occupy different vertical strata due to light competition. This hierarchy of the canopy is considered one of the main mechanisms that maintains the high tree diversity of tropical forests (Sheil et al 2006). Higher canopy stratification also suggests more niches and thus diversity as niche hypothesis proposes. Evidence of vertical niche partitioning has been related to diversity of the maximum plant height trait, providing support for the statement that niche partitioning is an important mechanism for maintaining high diversity (Laurans et al 2014). CH_sd, CH_H, and CH_R are indicators of canopy heterogeneity; higher structural heterogeneity suggests more potential for ecological niches and thus higher diversity (Torresani et al 2020).
Climate (  seasonality (BIO15) and temperature were important and negatively correlated to α-diversity, indicating that longer periods of seasonal water deficit plus higher temperatures decrease tree α-diversity in tropical forests as predicted by the harshness hypothesis. The southern and northern portions of Amazon forests present this combination of harsh conditions, resulting in fire increases and diversity reduction (Davidson et al 2012, Brando et al 2014. Although most soil predictors did not influence our modelling, some predictors related to soil structure (sand, clay, and bulk density) from 0 cm to 30 cm tended to have low but significant importance. Sand and clay proportion at low depths regulates water content by reducing water loss at high precipitation seasonality and temperature in the tropical forest (Maia et al 2020). Therefore, these soil variables are related to tree diversity, temperature, and BIO15 in correspondence with our results.
Two aspects of our approach were key to minimizing model errors: First, we used seven forest structural metrics to predict α-diversity measures, which contributed significantly to the prediction by capturing different aspects of forest structure. Second, integrating LiDAR metrics with temporal SAR and multispectral metrics improved the accuracy of forest structure maps. The smaller numerical scale of Shannon and Simpson indices produced lower errors compared with maps of ENS-Shannon and ENS-Simpson.
Our map estimations of tree α-diversity fall within the range of field estimates recorded for the ecoregions and land cover of the study region However, the high tree species turnover (species dissimilarity from 30% to 45% in adjacent hectares and from 85% to 99% in hectares separated by 50 km (Condit et al 2002)) linked to large numbers of tree species in the tropical forests (8000 species are estimated in the Choco alone (WWF 2019)) mean that our tree αdiversity maps cannot be interpolated to a different spatial resolution.
Our results add to a growing literature that show strong positive relationships between canopy height, canopy vertical structure, and tree diversity at plot scales for a range of forest types (Dubayah et al 2020, Walter et al 2021). Our study area sits at an extreme end of the tree diversity gradient, recognized as one of the richest forests in the world (Gentry 1986, Keil and Chase 2019). Other studies conducted in temperate ecosystems with far fewer tree species, greater seasonality, lower rainfall, and lower productivity have also observed strong structure diversity relationships (Lopatin et al 2016, Walter et al 2021. This suggests that our approach may be applicable to forest ecosystems with wide range of environmental conditions and biogeographic histories and could yield similarly useful insights on ecoregion scale patterns of tree diversity. Testing the approach in different biogeographic realms would also aid in understanding how historical factors and environmental constraints mediate relationships between structure and diversity.

Distribution of tree α-diversity
Our maps agree with field estimates of tree αdiversity in ecoregions and by natural land cover category. This result can be interpreted as indirect validation of map accuracy and as partial support for the harshness hypothesis. The highest α-diversity values were in non-flooding areas of EPaM and ChD ecoregions. The highest α-diversity was in non-flooding forest followed by non-flooding secondary vegetation. These areas also correspond to tall and heterogeneous canopies. This strong relation between well-drained upland areas, CH, and tree diversity has also been observed in other lowland forest of South America and has been associated with stable environmental conditions for the establishment and growth of trees (Duivenvoorden 1996, Ter Steege et al 2003. Conversely, the lowest α-diversity values were found in areas with permanent or seasonal flooding and adverse conditions like high salinity and anoxic soils, such as wetlands or mangroves. These areas also corresponded to shorter and less heterogeneous canopies. Similar results have been recorded in the The highly variable conditions of the flooded areas would select for species with a narrower subset of traits than found in the regional pool, thus reducing tree diversity relative to upland forest. Our maps suggest a zone of highest tree diversity between the Pacific Coast and the wetlands of the Atrato River, from around 5.5-8.6 • of longitude. This non-flooding forest zone has the lowest human population density of the Chocó region and roads are absent, conferring it special value in terms of conservation. This zone was identified as the highest biomass forest in the Chocó region (Meyer et al 2019), increasing its value in terms of co-benefits if preserved. However, the proximity of Quibdó, the capital of Chocó department, and minimal protected area coverage are the main threats to its preservation. The other high diversity zone identified by our maps corresponds to the non-flooding areas along the border between the Chocó region and the Andes. Scholars have highlighted the high biodiversity of the Andes forest/lowland forest ecotone in South America (Dehling et al 2014).
Although wetlands and mangroves presented lower tree α-diversity than the non-flooding forest areas of the Chocó-region, this does not indicate low absolute diversity. Some wetlands of Chocó-region show comparable α-diversity values (Shannon index until 3.1) to other tropical vegetation, such as savannas (Morandi et al 2018). These wetlands are also key habitat for endemic flora and fauna (Rangel 2011), increasing their contribution to regional and global biodiversity. There are no directly comparable maps of tree α-diversity for the Chocó-region. A global tree αdiversity model was recently published using a minimum 1 cm DBH threshold to define trees (Keil and Chase 2019), compared to 10 cm in our study. Because a large proportion of diversity in the region is comprised of small stemmed trees (Gentry 1986), the use of different thresholds alone would likely lead to significant differences in diversity estimates between these studies. However, in terms of internal validation metrics, our model is comparable, with a 4.4% relative error computed from five-fold CV compared to ∼90% of deviance explained in the global richness model (Keil and Chase 2019) and 80% variance explained in a model of Amazon tree α-diversity (Ter Steege et al 2003).

Conclusion
We show it is possible to integrate inventories of vegetation and remote sensing data to build accurate maps of tree α-diversity in tropical forest regions. The use of LiDAR data was essential for deriving the forest structural metrics that were associated with optical and SAR image data that was, in turn, used to create maps of α-diversity. Although we used the forest structural maps to evaluate tree diversity, these maps could be used to evaluate other processes and vegetation traits that are linked to vegetation structure. Furthermore, the variability in tree diversity observed across ecoregions and land covers with our study region provides evidence for interactions between the niche and harshness hypotheses to explain tree diversity. While we did not use GEDI LiDAR data, the similarity in spatial resolution between our analysis (50 m) and GEDI (25 m), indicate the potential to produce wall-to-wall structure maps trained on GEDI Lidar data. Thus, our results establish a baseline for predicting and mapping tree α-diversity in additional regions using widely available LiDAR data.

Data availability
Maps of diversity, maps of forest structure, data, and codes that support the findings of this study are openly available at the following URL/DOI: (https://zenodo.org/record/4697615). Likewise, the Landsat imagery processing using Google Earth Engine codes is in the next link (https://code. earthengine.google.com/ed2613adc4c3beb6ad4f0212 8ec6ee59) and the Sentine-1 imagery processing using Google Earth Engine codes is in next link (https://code.earthengine.google.com/8ed106adc297 6b23224761dca47fa65f)

Acknowledgments
We acknowledge NASA Group on Earth Observations Work Programme grant 80NSSC18K0338 for support of CF and PJ, and NASA GEDI Science Definition Team grant NNL15AA03C for support of PB and SJG. We also acknowledged COLCIENCIAS-Colombia (529-2011) and Fulbright-US for partically support of CF. We really appreciate the effort and comments done by three hidden reviewers to improve this paper.

APPENDICES Appendix A. Additional information on study area
The study area of our analysis is called the Chocóregion by Colombian environmental authorities and is considered one of the five primary natural regions of the country (Ideam et al 2016) (figure 1(a)). In terms of the global map of terrestrial ecoregions (Olson et al 2001), 86.5% of this area is considered Chocó-Darien moist forest (59672 km 2 ), 3.4% is considered Western Ecuador Moist Forest in the south (2364.6 km 2 ), and 1.2% is Eastern Panamanian montane forests (871.3 km 2 ) in the north. The Chocó-region also includes the 7.8% of South American Pacific Mangrove (5444.3 km) and 0.8% of Amazon Orinoco-Southern Caribbean Mangrove (574.7 km) ( figure 1(b)). In terms of recent land cover, circa 2015 (Fagua and Ramsey 2019), most of the Chocó-region is formed by natural covers such as well-preserved forests (72.2%; 49803 km), secondary vegetation (natural regrowth of vegetation after deforestation: 13.5%; 9353.4 km), wetlands (7.5%; 5213.1 km), and water bodies (3.6%; 2539 km). Anthropogenic land cover, such as grassland, crops and palm plantations, covered less than 1% of the region ( figure 1(c)).
The forests of the Chocó-region became separated from the forests of the amazon basin by the uplift of the Andes beginning around 25 million years ago (Gregory-Wodzicki 2000). Consequently, endemic species emerged in a significant pulse of diversification. The Isthmus of Panamá later formed roughly three million years ago, establishing a land bridge for fauna and flora from North and South America (Gentry 1986, WWF W W F 2019) and further enriching regional diversity. The Chocó-region presents a rainfall ranging from 8000 mm to 13000 mm across the region (Poveda and Mesa 2000); the highest rainfalls are recorded in the municipality of Lloró (Chocó Department-Colombia) with 12700 mm in average between 1952 and 1960.

Appendix B. Summary of α-diversity measures, forest structural metrics, and environmental variables B.1. α-diversity measures
The five measures of α-diversity used in this work are defined as: the three Hill's numbers (also termed effective number of species (ENS)), where q = 0 is species richness (richness or 0 D), q = 1 is ENS-Shannon or ( 1 D), and q = 2 is ENS-Simpson or ( 2 D) Hill (1973), Jost (2006). Richness ( 0 D) is simply the number of species. ENS-Shannon ( 1 D) weights species in proportion to their frequency and is calculated as exp (− ∑ p i log(p i )). ENS-Simpson ( 2 D) gives more weight to abundant species and is calculated as 1/ ∑ (p i ) 2 (Chao et al 2014). We also estimated the Shannon diversity index (H) defined as (Shannon 1948, Hill 1973) and the Simpson diversity index (D) defined as D = 1 − ∑ p i 2 (Simpson 1949). In all cases, p i is the proportional abundance of the ith species. We used the R package 'mobsim' (May et al 2018) to calculate these metrics of diversity.

B.2. Forest structure metrics
Vertical metrics: to obtain the first set of vertical metrics, the LiDAR surveys were processed by summing the returns within a footprint estimated at 50 m spatial resolution, as the 0.25 hectare plots of vegetation, and 1 m vertical resolution (Dubayah et al 2010, Clark et al 2011. We selected the ground returns and produced 1 m DTMs (digital terrain models) using the knearest neighbour method with the inverse-distance weighting. We then obtained normalized heights for each return of the point cloud and selected the highest returns in the 50 m spatial resolution to compute the vertical metrics as: (a) canopy height (CH) or the cumulative height above ground at the 98th percentile. (b) 50th percentile of heights (z_50) or the cumulative height above ground at the 50th percentile (also referred to as height of median energy). (c) vertical canopy heterogeneity (VCH) expressed as − ∑ p i log(p i ), where p i was the sum of the returns in a voxel calculated at 50 m spatial resolution and 1 m vertical resolution. (d) Standard deviation of normalized heights (z_sd) where the standard deviation was calculate in voxels of 50 m spatial resolution and 1 m vertical resolution (Davies and Asner 2014, Zhang et al 2017). We also estimated the coefficient of variation of normalized heights (z_cv) from the LiDAR data, but we eliminated this metric from our analysis because we could not model the z_cv spatially. Horizontal metrics: to obtain these metrics, we initially created 1 m CH models using the previous procedure.
We used the 1 m CH model to calculate three horizontal metrics as: (a) standard deviation (CH_sd) or standard deviation of the pixel values. (b) Horizontal heterogeneity (CH_H) expressed as − ∑ v i log(v i ), where v i was the proportion of the pixel value. (c) Rumple Index (CH_R) that computes the roughness of CH as the ratio between its area and its projected area on the ground using Jenness's algorithm (Jenness 2004). The R packages 'Raster' and 'lidR' were used to implement these procedures (Hijmans et al 2016, Roussel et al 2019.

B.3. Environmental variables
The 76 environmental variables were obtained and processed thus. (a) The 19 WorldClim variables (BIO1, BIO1… and BIO19) were used without modification (Hijmans et al 2005). (b) The annual mean temperature between 2001 and 2019 (Tem) was obtained from the MODIS product MOD11A1 v006 (MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global) (Wan et al 2015). This product provides daily land surface temperature and emissivity in • K at 1 km spatial resolution. We built mosaics of annual mean values using Google Earth Engine (Gorelick et al 2017) and average these daily mosaics to obtain a final mosaic. (c) We obtained altitude (Alt) from the SRTM-30 digital elevation model (NASA Shuttle Radar Topographic Mission) (Jarvis et al 2008). (d) Photosynthetically Active Radiation (FPAR) was obtained from the MODIS product MOD15A2H (Myneni and Knyazikhin 2015); an 8 days composite dataset at 500 m resolution that chooses the best pixel available from all the acquisitions of the Terra sensor from within the 8 days period. We built annual mean mosaics of this product corresponding to the years of the vegetation surveys of each 0.5 hectare plot, using Google Earth Engine. (e) The soil variables were obtained from the Soil-Grids250m (Hengl et al 2017). This soil product offers nine main variables at six soil depths (0 cm, 10 cm, 30 cm, 60 cm, 100 cm, 200 cm): Nitrogen content (N) in cg kg −1 , percentage of silt content (Silt), percentage of sand content (Sand), percentage of clay content (Clay), soil pH (pH), soil organic carbon content (C) in 5 g kg −1 , soil bulk density (Bulk) in 10 kg m −3 , and soil water content (Water) at 33 kPa of field capacity.

Appendix C. Description of the LiDAR instruments
The LiDAR instruments used in the eighty-five 0.25 hectare plots across tropical forest of South America ( figure 1(b)) and the Chocó region (figure 1(d)) were similar models of Teledyne Optech: (a) ALTM 3100 (Optech Inc.) instruments were used in 44 plots, (b) ALTM Orion M-20 instruments (Optech Inc.) in 31 plots, and (c) ALTM 3033 (Optech Inc.) instruments in the other ten plots. The flights were between 700 and 1000 m above ground and had a swath side lap of 65%. A minimum return density (⩾4 m 2 over 99.5% of the study area) was required to avoid inconsistencies and biases in estimated forest structural metrics associated with low return density (Jakubowski et al 2013, Longo et al 2016 D1). For GLM.net, we tuned λ and α. For RF, we selected the predictors that fit the best models using the algorithm Boruta and tuned the number of trees to grow (from 300 to 1000) as well as the number of variables randomly sampled as candidates at each split of each tree. For SVM, we assessed three kernel types and tuned their parameters: Liner kernel (cost was tuned), polynomial kernel (cost and degree were tuned) and radial kernel (cost and sigma were tuned). Liner kernel produced the best models; thus, we only include their results in this appendix. For MARS (an extension of linear regression that captures nonlinearities and interactions between variables), we tuned the maximum number of terms (including intercept) in the pruned model and the degree. For BRT, we excluded highly correlated predictors based on their variance inflation factors (VIFs).

Appendix E. Building multispectral and SAR mosaics.
We built: (a) ten temporal mean (mean-X) and ten standard deviation (sd-X) mosaics for seven bands and three indices using Landsat data; (b) four mean-X and four sd-X mosaics of backscattering coefficients of the C-band using Sentinel-1 data; and (c) two mean-X mosaics of backscattering coefficients of the L-band using ALOS-2-PALSAR data.
These mosaics were built in Google Earth Engine (Gorelick et al 2017). For the Landsat data, we first utilized the product USGS Landsat-8 Surface Reflectance Tier-1 to estimate temporal mean-X and temporal Sd-X mosaics. This products is atmospherically corrected using LaSRC (Landsat 8 Surface Reflectance algorithm) that includes cloud, shadow, water and snow masks (Usgs 2019). We only included confident pixel-values (low cloud and atmospheric distortion) using a mask based on the pixel quality attributes generated from the CFMASK correction algorithm (pixel_qa values) and the radiometric saturation pixel (radsat_qa values). The mosaics were resampled to 50 m using bilinear interpolation to match the spatial resolution of the 0.25 hectare plots. To reduce the blanks in the previous Landsat-8 mosaics, a second mosaics was built combining two Landsat products to fill these blanks: USGS Landsat-8 Surface Reflectance Tier-1 and USGS Landsat-7 Surface Reflectance Tier 1 collections. In this second mosaics, the Landsat-8 pixelvalues were rescaled to the Landsat-7 pixel-values to estimate mean-X and Sd-X mosaics following (Roy et al 2016). For the Sentinel-1 data, we used the Sentinel-1 SAR GRD product, which offers calibrated and ortho-corrected data (ESA 2019). We applied an angular-based radiometric slope correction to Sentinel-1 data following (Vollrath et al 2020), using the correction model suggested for vegetation applications and the ALOS World 3D global digital surface model to estimate surface slope. The spatial resolution of this Sentinel-1 product is 15 m which we resampled to 50 m using bilinear interpolation. For the ALOS-2-PALSAR data, we used the years 2017 and 2018 of the Global PALSAR-2/PALSAR Yearly Mosaic (Shimada et al 2014) to estimate Xmeans. We did not estimate X-Sds because this is an annual product with and only two years of data were available. The spatial resolution of this product is 25 m which we resampled to 50 m using bilinear interpolation. Appendix G Table G. Correlations (Pearson-r) between tree α-diversity measures against forest structural metrics and environmental variables. Significant P values range; P < 0.001 ( * * * ), P < 0.01 ( * * ), and P < 0.05 ( * ). Bold highlights significant P values. Variable acronyms were descripted previously.