Tree regeneration characteristics in limestone forests of the Cat Ba National Park, Vietnam

The ability of overstory tree species to regenerate successfully is important for the preservation of tree species diversity and its associated flora and fauna. This study investigated forest regeneration dynamics in the Cat Ba National Park, a biodiversity hotspot in Vietnam. Data was collected from 90 sample plots (500 m²) and 450 sub-sample plots (25 m²) in regional limestone forests. We evaluated the regeneration status of tree species by developing five ratios relating overstory and regeneration richness and diversity. By examining the effect of environmental factors on these ratios, we aimed to identify the main drivers for maintaining tree species diversity or for potential diversity gaps between the regeneration and the overstory layer. Our results can help to increase the understanding of regeneration patterns in tropical forests of Southeast Asia and to develop successful conservation strategies. We found 97 tree species in the regeneration layer compared to 136 species in the overstory layer. The average regeneration density was 3674 ± 1602 per ha. Around 70% of the overstory tree species generated offspring. Only 36% of the tree species listed as threatened on the International Union for Conservation of Nature's Red List were found in the regeneration layer. A principal component analysis provided evidence that the regeneration of tree species was slightly negatively correlated with terrain factors (percentage of rock surface, slope) and soil properties (cation exchange capacity, pH, humus content, soil moisture, soil depth). Contrary to our expectations, traces of human impact and the prevailing light conditions (total site factor, gap fraction, openness, indirect site factor, direct site factor) had no influence on regeneration density and composition, probably due to the small gradient in light availability. We conclude that the tree species richness in Cat Ba National Park appears to be declining at present. We suggest similar investigations in other biodiversity hotspots to learn whether the observed trend is a global phenomenon. In any case, a conservation strategy for the threatened tree species in the Cat Ba National Park needs to be developed if tree species diversity is to be maintained.


Background
Forest regeneration plays a key role in forest development. In managed forests, it ensures the survival of tree species after the overstory layer has been harvested. In natural forests, it is key to the resilience of an ecosystem after natural disturbances [1–6]. Thus, the forest regeneration status determines the future of a forest ecosystem [4]. However, the regeneration layer also directly depends on the structure of the standing tree layer [2,7,8] and reflects forest resilience and vitality [3,9,10]. When a forest ecosystem lacks sufficient natural regeneration of certain tree species, tree species diversity is lost, which may, in turn, affect related ecosystem functions and services in the long term [4,9,11–13]. Therefore, research on natural forest regeneration dynamics and on potential factors influencing successful regeneration will increase the understanding of the long-term functioning and stability of forest ecosystems [14].
Studies of the impacts of abiotic and biotic factors on establishment, survival, and increase in natural regeneration have been conducted worldwide in different forest types [1,3,4,6,15–24]. Research on regeneration patterns in tropical forests is, however, still scarce (but see below). Nevertheless, this research is critical due to the contributions of tropical forests to global biodiversity [25–28]. Southeast Asia harbors approximately 15% of the world's tropical forests [29], located in countries such as Cambodia, Indonesia, Malaysia, Myanmar, the Philippines, Thailand, and Vietnam. This part of the world can be regarded as a biodiversity hotspot where the greatest number of endemic and threatened species in the world can presumably be found [26,30]. It is, therefore, highly important for biodiversity conservation. In addition, these forests are important for environmental protection, socio-economics, and the living conditions of forest-dependent populations [31]. However, to maintain these tropical forests and their diversity, we need to understand the degree to which tree regeneration patterns depend on abiotic and biotic factors and how they change due to natural or human disturbances [32]. Many studies have examined the tree diversity of saplings depending on light and water availability in tropical forests, or have focused on the regeneration patterns within gap-understory habitats in tropical rainforest environments [26–30,33–35]. Research on natural regeneration under potential limiting factors other than light is, however, still rare, especially in Southeast Asia.
In 1943, 14.3 million hectares of natural forests could be found in Vietnam, accounting for 43% of its total land area [36,37]. After the long-lasting wars in Vietnam during the periods 1945–1954 and 1955–1975, the forest area had decreased to 11.2 million hectares [36]. From 1975 to 1990, the quality and quantity of forests declined further due to multiple socio-economic factors, unsustainable management, and consumption [36,38]. As a consequence, the forests in Vietnam reached their lowest coverage (27%) in 1990 [36,37,39]. Due to government policy, the forest cover increased again up to 42% in 2019 [40]. This was achieved both by protecting the remaining natural forest ecosystems and by establishing five million ha of forest plantations [40]. These measures reduced the pressure on forests such that the forest area increased to 13.8 million ha in 2019 [36,39,41]. At the same time, the Vietnamese government also established protected areas and national parks across the country to enable the recovery of secondary forests and to protect primary forest ecosystems [36,42]. So far, 30 national parks and protected areas have been established in Vietnam [42,43]. Due to past unsustainable management practices, most natural forests in Vietnam are now secondary forests; primary forests are restricted to core zones of protected areas or national parks [36].
To date, few studies have focused on forest regeneration in both of these forest types. Dao and Hölscher [44] examined the regeneration status of three threatened species in north-western Vietnam and found that most of those tree species regenerated in core zones, while their regeneration was poorer in buffer zones and restoration zones. Van and Cochard [45] suggested that forest isolation contributed to decreasing regeneration of rare tree species in lowland hillside rainforests in central Vietnam. Blanc et al. [46] conducted a study on forest structure, natural regeneration status, and floristic composition at five locations in Cat Tien National Park, Vietnam. Their results showed that tree species diversity in the regeneration layer decreased due to the dense canopies of the dominant tree species. Tran et al. [47] studied the regeneration of 18 commercially valuable tree species after 30 years of selective logging in Kon Ha Nung Experimental Forest, Vietnam. Their results indicated that tree regeneration density in intensively managed forests was significantly higher than in low-impact and unlogged forests. However, to our knowledge, no study has yet addressed natural forest regeneration in the limestone forests of Vietnam (including secondary and remaining primary forests), even though they are diversity hotspots and habitat for many threatened tree species [48].
The regeneration layer is known to be influenced by overstory tree species composition and density [49,50], abiotic factors [9,51], and biotic factors [4]. Here we investigated natural forest regeneration in Cat Ba National Park (CBNP), located on limestone islands in Vietnam [52–54]. Specifically, we sought to identify the impact of environmental factors on natural regeneration diversity by focusing on two main questions: (1) Does tree species richness in the regeneration layer resemble the tree species richness in the overstory, indicating high stability in tree species richness? (2) If species richness differs among the different layers, which environmental factors drive the species richness gap between the overstory and the regeneration layer?

Species diversity status of the overstory vs. the regeneration layer
In 90 sample plots, we found a total of 97 tree species in the regeneration layer (see "Appendix": Table 7) compared to 136 species in the overstory tree layer (see "Appendix": Table 8), indicating that species richness in the overstory layer was higher in almost every sample plot compared to the regeneration layer (Fig. 1). We observed a similar pattern for the threatened tree species (Fig. 2). The average density of regeneration trees was 3,674.42 ± 1,601.62 ha⁻¹ (mean ± sd).
Extrapolation of results underpinned the observed tree species diversity patterns. Both incidence-based (Fig. 3a) and abundance-based (Fig. 3b) extrapolation showed a clear difference in tree species diversity, with higher values in the overstory layer across the three investigated Hill numbers (Fig. 3, see "Appendix": Tables 7, 8, 9 and 10). Extrapolating to a base sample size of 180 plots (double the observed sample size [55]) increased the species richness in the overstory to 152 species compared to 124 species in the regeneration layer (Fig. 3a, see "Appendix": Tables 7 and 8). The difference was even more pronounced when extrapolating based on the number of sampled individuals (Fig. 3b, see "Appendix": Tables 9 and 10). The diversity gap between forest layers further increased with increasing Hill number (Fig. 3a, b). The estimated sample coverage for the base sample size was above 95% for both forest layers, indicating completeness of sampling (see "Appendix": Figs. 9 and 10).
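For readers less familiar with Hill numbers, the following minimal Python sketch shows how the three orders used here (q = 0, 1, 2) can be computed from a vector of species abundances. It only illustrates the diversity measure itself; it does not reproduce the rarefaction and extrapolation procedure, and the example counts are hypothetical rather than data from this study.

```python
import math

def hill_number(abundances, q):
    """Hill number (effective number of species) of order q.

    q = 0 gives species richness, q = 1 the exponential of Shannon
    entropy, and q = 2 the inverse Simpson concentration.
    """
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    props = [n / total for n in counts]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

# Hypothetical stem counts per species in one forest layer.
example_counts = [50, 30, 10, 5, 5]
for order in (0, 1, 2):
    print(order, round(hill_number(example_counts, order), 2))
```

Because higher orders weight common species more strongly, a gap between layers that widens with increasing q suggests that the layers differ most in their dominant species.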

Ratios comparing overstory vs. regeneration layer diversity
We calculated five ratios linking the overstory and regeneration layer diversity per plot. The five ratios clearly indicate that the regeneration layer does not reach the diversity level of the overstory because all five ratios fell below 1 on average (Fig. 4). This result was confirmed by one-sample t-tests, with all five ratios being significantly lower than 1 (Table 1). When separating the regeneration into different height classes, the true diversity and species richness ratios were smallest for the height class < 50 cm (0.2 and 0.17, respectively) and highest for regeneration > 200 cm in height with DBH < 5 cm (0.46 and 0.42, respectively) (see "Appendix": Fig. 11). The results show that the regeneration layer only reaches 70% of the diversity of the overstory layer, with only 38% of the overstory tree species regenerating successfully within a sample plot (Table 1). Interestingly, 30% of the regenerating tree species came from mother trees presumably located outside the sample plots, as these species were not present in the overstory (Table 1). Offspring was found for only 36% of the mature threatened tree species (Table 1).
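As an illustration of how such per-plot ratios and the test against 1 can be computed, consider the minimal Python sketch below. The ratio definitions are assumptions inferred from the wording above (shared species relative to overstory richness; regeneration-only species relative to regeneration richness) and may differ from the exact definitions used in the study. The species labels and ratio values are hypothetical.

```python
import math
import statistics

def layer_ratios(overstory, regeneration):
    """Richness-based ratios for one plot; inputs are sets of species names.

    Assumed definitions (not confirmed by the source):
      species richness ratio = regeneration richness / overstory richness
      same species ratio     = shared species / overstory richness
      new species ratio      = regeneration-only species / regeneration richness
    """
    if not overstory or not regeneration:
        raise ValueError("both layers need at least one species")
    shared = overstory & regeneration
    return {
        "species_richness_ratio": len(regeneration) / len(overstory),
        "same_species_ratio": len(shared) / len(overstory),
        "new_species_ratio": len(regeneration - overstory) / len(regeneration),
    }

def one_sample_t(values, mu=1.0):
    """t statistic for the null hypothesis mean(values) == mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error

# Hypothetical plot: five overstory species, three regenerating species.
ratios = layer_ratios({"sp_a", "sp_b", "sp_c", "sp_d", "sp_e"},
                      {"sp_a", "sp_b", "sp_f"})
print(ratios)

# Hypothetical per-plot ratios; a clearly negative t indicates a mean below 1.
print(one_sample_t([0.6, 0.7, 0.8]))
```

The t statistic would then be compared with a t distribution with n − 1 degrees of freedom to obtain the p-value.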

Principal components as independent environmental gradients
Principal component analysis was used to identify independent environmental gradients as potential drivers of regeneration patterns. The first three principal components (PC) of the PCA explained 54.14% of the variation in environmental characteristics among plots. PC1 (23.5% explained) had the highest loadings for different light availability factors, while PC2 (19.7%) represents soil fertility (CEC, humus content), percentage of rock surface, soil moisture, soil depth, and pH. PC3 (10.9%) represents the soil texture (silt, clay, and sand) (Fig. 5, see "Appendix": Table 11).

Impact of environmental factors on regeneration patterns
Tree regeneration density
Neither specific environmental factors (Table 2) nor the first three principal components (Table 3) were significantly correlated with tree regeneration density using linear mixed effect models.

Ratios comparing overstory and regeneration layer diversity
For three out of five ratios, PC2, which combines a gradient of fertility (S_CEC, S_SH), percentage of rock surface, and moisture, was the best predictor (Table 4). An increasing PC2 axis value slightly reduced the species richness ratio (SRR), the true diversity ratio (TDR), and the new species ratio (NSR), indicating that the difference between the forest layers increases with soil fertility, soil moisture, and rock surface. The percentage of rock surface best predicted the same species ratio. An increasing percentage of rock surface reduced the same species ratio, indicating that only certain tree species were able to regenerate on rough terrain (Table 4). Light variables, summarized as PC1, were the best predictors for the threatened species ratio, but without significance (Table 4). In general, marginal and conditional R² values were very low, showing that the recorded environmental variables could explain only a small proportion of the variation.
Fig. 3 (a) Sample-size-based (incidence-based) and (b) individual-based (abundance-based) rarefaction and extrapolation. The solid line depicts the interpolation, and the dotted line shows the extrapolation of sampling curves for tree species data of overstory and regeneration layers for different Hill numbers: q = 0 (species richness, left side), q = 1 (Shannon diversity, middle), and q = 2 (Simpson diversity, right side). The solid dots/triangles show the observed reference sample size

Discussion
Seedling density in the regeneration layer is an important property for successful regeneration. Our results demonstrate that the average regeneration density of CBNP was 3,674 ± 1,602 trees per ha (see "Results" section). This mean density is considerably higher than that of sub-tropical forests [4], but comparable with other forest locations in Vietnam, such as the Highland forests (around 3400 trees per ha) [47] and limestone forests in Quangninh Province, Vietnam (3814 trees per ha) [56]. However, in Vietnam, even higher regeneration densities have been reported, for example in the Cat Tien National Park, where tree regeneration density ranged from 2850 to 8150 trees per ha [46], and in other broadleaf evergreen forests of Vietnam [57]. Since we could not identify any specific environmental factor explaining variation in regeneration density, we can only speculate about the most important drivers. It is known from studies in various biomes around the world that light availability plays a crucial role in regeneration abundance and distribution [3,6,58]. It is likely that the narrow range of light availability in our study (e.g. for ISF, from 8.21% (± 2.75%) to 10.37% (± 11.68%); see Table 5) prevented us from confirming its importance in our case.
Table 2 Linear mixed effect model results of tree regeneration density and six environmental factors which were most strongly correlated with the first three PCs (see more in "Appendix": Table 11). Acronyms of variables are defined in Table 5. Given are the estimates (Value) and the respective standard error, the degrees of freedom (df), the t-value of each variable, and its significance (p-value). Significance was assumed with p < 0.05
Light availability, however, only partially explains regeneration density [58]; other studies have shown that disturbances due to logging [47], livestock browsing, and microsite characteristics [17] also explain variation in seedling density. In our study, neither environmental factors nor human disturbances appeared to affect tree regeneration density (Tables 2, 3). Our results suggest that competition within the regeneration layer may also play a role, indicating the importance of dominant tree species [59]. The eight most dominant tree species in the regeneration layer accounted for 55% of all seedlings and the 16 most dominant tree species in the overstory represented 67% of total seedling abundance (see "Appendix": Table 12 and Fig. 12). The low ranking of threatened species in the overstory may explain the even lower regeneration success of this species group compared to the common species (see "Appendix": Table 12 and Fig. 12). However, some threatened species (e.g. Aporusa ficifolia) regenerated successfully relative to their ranking in the overstory (rank 37 in the regeneration vs. 119 in the overstory). Our inconclusive results underscore the need for additional research to explain regeneration density more mechanistically. Approaches should focus more on species traits, such as how the fruit coat requires specific environmental conditions to allow successful germination and establishment [60]. Many studies have used seedling, sapling, and mature tree densities as criteria for evaluating the forest regeneration status [4,7,61]. Forests are classified as having good regeneration potential when the number of seedlings > the number of saplings > the number of trees; the potential is poor if the numbers of seedlings and saplings are both lower than the number of mature trees [4,7,61].
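The density-based classification described above can be written as a simple rule. The Python sketch below encodes only the two cases stated in the text; the label for all remaining cases and the function name are hypothetical, and the cited studies may define intermediate classes that are not reproduced here.

```python
def regeneration_status(seedlings, saplings, trees):
    """Classify regeneration potential from stem counts (or densities).

    Only the two cases described in the text are encoded; any other
    combination is returned as "undetermined" (a placeholder label).
    """
    if seedlings > saplings > trees:
        return "good"
    if seedlings < trees and saplings < trees:
        return "poor"
    return "undetermined"

# Hypothetical counts for three stands.
print(regeneration_status(120, 60, 30))
print(regeneration_status(10, 5, 30))
print(regeneration_status(40, 60, 30))
```

A dense mature stand with few seedlings falls into the "poor" class under this rule regardless of how many species are regenerating, which is the limitation discussed next.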
We question the suitability of this approach for some forest types since it does not take developmental stages into account; for example, where mature tree density is so high that regeneration is inhibited due to low light availability. These forests should not be rated as poor since their potential for regeneration may still be high. We modified this approach, focusing on species richness and diversity indices of the tree regeneration and overstory layer rather than on tree density. Even though this approach is also quite simplistic and may not consider different recruitment events over time that may have shaped the regeneration as well as the overstory [62], relating overstory and regeneration richness and diversity can give insights into potential trajectories of tree species richness. We found that tree species richness and diversity in the regeneration layer were lower than in the overstory layer (see Figs. 1, 2, 3). The 97 tree species that were found in the regeneration layer accounted for 71% of the overstory tree species (136 tree species) (see "Results" section, and see "Appendix": Tables 7 and 8). After extrapolation to a base sample size, species richness in the overstory was still higher than in the regeneration layer (152 vs. 124 species; Fig. 3). The difference was even higher for Simpson diversity (1.63 times higher diversity in the overstory). The pattern was similar when using an abundance-based extrapolation approach, indicating the robustness of results when accounting for sampling effort and the number of individuals [63].
Table 3 Linear mixed effect model results of tree regeneration density and the first three principal components. PC1 = light availability gradient; PC2 = soil fertility, rock surface, soil moisture, and pH gradient; PC3 = soil texture gradient. Given are the estimates (Value) and the respective standard error, the degrees of freedom (df), the t-value of each variable, and its significance (p-value). Significance was assumed with p < 0.05
Furthermore, our results are comparable to other studies conducted in Vietnam. Tran et al. [47] found 107 tree species in the sapling stratum and 90 tree species in the seedling stratum compared to 144 tree species in the overstory layer in an evergreen broadleaf forest. Blanc et al. [46] reported 92, 83, 53, 1, and 43 tree species, respectively, in the overstory layer of five 1-ha sample plots in Cat Tien National Park, whereas the numbers of regenerating tree species were 50, 52, 20, 1, and 24, respectively.

The poor status of species richness in the regeneration layer found in our study was verified by the various ratios (Fig. 4, Table 1). In addition, separating the regeneration into height classes indicates that the gap between overstory and regeneration richness and diversity is even increasing with time, as the ratios were highest for the largest height class representing the oldest regeneration (see "Appendix": Fig. 11). Our results may therefore hint at potential future community alterations of the kind observed in other tropical forests [64,65]. Decreasing species dispersal by large vertebrates is mentioned as an important factor for such community alterations [64]. In our study, only 38% of the regenerating tree species came from overstory tree species (same species ratio), while 30% came from outside the plots (new species ratio) (Table 1). The trend was also observed for the threatened tree species, which had an equally poor regeneration rate (36%) (Fig. 2, Table 1). Interestingly, the threatened tree species were mainly found around the parent trees in our study area. According to Janzen [66], the seed density of a given tree species decreases with distance from the parent tree but also varies with seed size and seed dispersal processes, and is affected by plant parasites and seed-eating animals. However, more detailed research is needed to determine whether low seed production, low germination rates, low survival rates, or insufficient dispersal can explain this pattern.
Table 5 Environmental and human activity characteristics in the three study sites (LLA, MSA, and ISA) in Cat Ba National Park. The values represent the mean and standard deviation of 30 plots per study site (in total 90 plots). Different lower-case letters indicate significant differences between the three areas (at p ≤ 0.05). We used the "multcomp" package to calculate differences between the three study sites [111]. The acronym column shows the abbreviation of the factor. T terrain factors, S soil properties, L light availabilities, and H human impact
Many previous studies have found that a single environmental factor fails to explain forest regeneration characteristics [1, 3, 4, 6, 7, 9, 11, 15–17, 19, 24, 51, 59, 67–71]. These results are confirmed by our study since we found that PC2, which represented a combined fertility, rough terrain, and moisture gradient (see "Appendix": Table 11 and Table 4, Fig. 5), explained the pattern of tree species regeneration better than single environmental variables. However, the marginal R² values of each model (Table 4) were very small. So although we can confirm a link between species richness ratios and environmental factors, we did not observe a strong relationship. We assume that other unidentified factors or factors functioning on a larger scale must be considered, such as rainfall seasonality [72], water erosion [73,74], and flooding period [75,76]. In particular, increasingly frequent extreme events can have major impacts on seedling establishment over extensive areas. In general, tropical forests are considered very sensitive to changing climatic conditions and interannual climate variability because they display, for example, strong coevolutionary interactions and specializations that can be decoupled by global change. In addition, changing environmental conditions may eliminate the narrow niches in tropical forests and thereby reduce species diversity [77,78].
As previously mentioned, one important factor affecting tree regeneration patterns at the local scale may be light availability. However, we did not find an influence of light-related factors (represented by PC1) on the tree species richness and diversity ratios (Table 4); we assume that our gradient in light availability was too small (Table 5). Therefore, we can only speculate as to whether higher light availability would have resulted in more balanced ratios between overstory and regeneration tree species richness.
Previous studies have also demonstrated variability in tree species composition along topographic gradients [18,79–85], because topography affects soil formation (including soil fertility, moisture, and depth) and creates microhabitats [83,84,86,87]. Microhabitats contribute to regeneration niches, which in turn are strongly linked to species coexistence [23,68]. In our research, topography was represented by the percentage of rock surface, slope, and elevation. We assume that a combination of rock surface, slope, and limestone ridges strongly affects soil characteristics (soil nutrient status, humus, soil moisture, and depth), which may have implications for seed storage ability [6,61]. With increasing percentage of rock surface, soil cover and soil depth decreased (Table 4, Fig. 5, and "Appendix": Table 11). Furthermore, with increasing slope, soils become shallower, store fewer nutrients, and are more prone to erosion. Therefore, factors indicating rough terrain may have created unfavorable conditions for seed storage and germination [6,83].
Besides topography and light, soil factors are considered most important for natural forest regeneration [2,3,16,17,68,70,80,88]. In our study, soil moisture as well as base saturation and CEC were represented by PC2 and affected the species richness ratios negatively. However, this unexpected result may be a methodological artifact, since soil moisture and soil chemical properties were determined for the upper 20 cm of the soil only. These 20 cm likely do not sufficiently represent the real status of soil moisture and soil fertility. This view is supported by the finding that soil depth was negatively correlated with PC2, and thus influenced the species richness ratio positively.
Forest regeneration of tree species depends on both natural disturbances and anthropogenic activities. Natural disturbances can increase the variability in light conditions, influence seed arrival, and contribute to the diversity of seeds by providing regeneration niches [23,89,90]. In addition, natural disturbances also affect recruitment patterns of colonizing species, influence soil resource levels, and determine longer-term community development [91]. Human activities may have similar effects, but they can additionally affect seed bank composition, for example by removing dominant tree species [70,91]. However, we did not find a strong effect of human disturbances on species richness and diversity ratios. Only the number of footpaths was related to PC2 (r = −0.21) (see "Appendix": Table 11, and Fig. 5). Because this relationship was negative and increasing PC2 values reduced the ratios, the number of footpaths had a positive effect on the ratios, lending support to the idea that disturbances can promote the regeneration process. This is supported by Tran et al. [47], who found a higher similarity between regeneration and overstory richness in forests with high-intensity selective logging than in forests with a lower management intensity or in unlogged forests after 30 years, because sufficient sunlight reached the forest floor in the intensively managed forests to facilitate seed germination and seedling growth. Although we do not have records of natural disturbances or historic human impact, long-term effects of former disturbances may still be reflected in the richness and composition of the regeneration layer, or even more so of the overstory layer, and can explain current richness differences between layers [62,92,93]. Thus, both natural disturbance and historical human influence should be taken into account when investigating regeneration patterns of tree species, including threatened species.

Conclusions
Our results indicate that a considerable number of tree species found in the overstory of the forests in the CBNP are absent from the regeneration layer. We interpret this finding as an indication that tree species diversity appears to be decreasing. Since we were not able to explain the resulting pattern to a satisfying degree, even though a large number of potentially influencing variables were tested, unidentified factors such as species dispersal or factors functioning on a larger spatial scale may be decisive. Thus, future research may make use of experiments to learn more about the autecology of the different tree species or to examine the impact of climate change on regeneration processes. Natural forest recovery after historical (natural or human) disturbances should also be examined in detail, as processes acting on different time scales may have shaped the tree layers.
Building on our results and with additional knowledge, conservation strategies could be developed for maintaining tree species biodiversity and particularly for maintaining threatened species. Since we only recorded the regeneration status at one point in time, we suggest continuous monitoring of its development by using the ratios introduced here. This would make it possible to address the question of species turnover and diversity change with more certainty for the Cat Ba National Park.

Study site
The data presented stems from northern Vietnam and was collected in the CBNP (20°44′ to 20°55′ N, 106°54′ to 107°10′ E). The national park is part of the Cat Ba Island archipelago located in the South China Sea. CBNP lies 25 km south of Halong City, and Hanoi, the capital, lies 150 km north-west of CBNP (cf. Fig. 6).
CBNP comprises 366 islands of varying size [52,94]. The main rock bed is limestone. The park has a total size of nearly 16,200 ha. This includes maritime (5265 ha) and terrestrial sites (10,932 ha) [52,53]. The highest point of the park lies at 331 m above sea level, whereas the average elevation is around 125 m above sea level. CBNP has a heterogeneous topography with slopes ranging from 15° to 35° [54]. The climate of CBNP is humid sub-tropical, with annual precipitation of around 1500–2000 mm, an average humidity far above 80%, and a mean annual temperature of 23 °C. The rainy season lasts from May through October and the dry season from November to April [52,95].
The forest ecosystems of CBNP are diverse and include evergreen limestone forests, wetland high mountain forests, and mangroves, as well as caves and maritime coral reefs [52,95]. The evergreen broadleaf tropical rain forests of CBNP can be categorized as undisturbed primary forests or secondary forests, which have undergone significant disturbances by humans [96]. The secondary forests are mainly in the lower parts of the park and in the limestone mountains. Other secondary forests are restored moist evergreen, wetland, and bamboo forests, as well as mangrove forests (cf. Pham et al. [48]). There are also former plantations in the park [53,96].
Due to its high plant and animal diversity, UNESCO granted the park the status of a biosphere reserve in 2004 [52]. The plant diversity is currently estimated to comprise 1561 plant species. These belong to 842 genera. More than 400 of the species are timber species, but there are also more than 1000 medicinal, edible and ornamental species. More details on species diversity can be found in Le and Le [97]. According to the CBNP report [53] and Le [95], 29 IUCN Red List tree species have to date been identified at CBNP. In addition, 43 are listed on the Vietnam red list and account for almost 60% of all tree species in Vietnam that are in need of protection.
A large share of CBNP (~ 45%) is dedicated to the protection of natural dynamics in six different core zones of the park (Fig. 6). These core zones are strictly protected, which means that no management measures are carried out. However, the accessibility of the core zones varies, and data was collected in three of the six areas along a gradient of accessibility (Fig. 6). In these areas, the protection efforts were mainly directed at the conservation of the evergreen broadleaf forests. In the following, these three areas are referred to as lowland area (LLA), mid-slope area (MSA), and isolated area (ISA). The areas cover about 1916 ha, 600 ha, and 1560 ha, respectively. Accessibility decreases in the same order, mainly due to the elevation; ISA is additionally separated from the accessible part of the park by water (more details in Pham et al. [48]).

Data sampling
We applied a simple random sampling technique [98] to set up the sample plots (Fig. 7). Each study area was divided into 30 strips. In each strip, random sample plots were generated using random numbers to determine their coordinates. For each candidate point, two uniform random numbers $U_{1i}$ and $U_{2i}$ (drawn from the interval 0 to 1) were used to calculate the coordinates $X_i = U_{1i} \times X_{\max}$ and $Y_i = U_{2i} \times Y_{\max}$, where $X_{\max}$ and $Y_{\max}$ are the highest coordinates of the area map (Fig. 7). If the coordinate $(X_i, Y_i)$ fell within the defined strip, the point was accepted as a sample plot point. Otherwise, the point was rejected (Fig. 7).
Using this technique, we randomly selected 30 plots within each of the three protected areas (LLA, MSA, ISA), for a total of 90 plots. Each plot was 500 m² in size (20 m × 25 m).
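The acceptance and rejection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the map extent (`X_MAX`, `Y_MAX`), the layout of the strips as 30 vertical bands of equal width, and the helper names are assumptions made only for this example.

```python
import random

def sample_plot_point(x_max, y_max, in_strip, rng, max_tries=100_000):
    """Draw one plot location inside a strip by rejection sampling."""
    for _ in range(max_tries):
        u1, u2 = rng.random(), rng.random()   # two uniform numbers on [0, 1)
        x, y = u1 * x_max, u2 * y_max         # X_i = U_1i * X_max, Y_i = U_2i * Y_max
        if in_strip(x, y):                    # accept only points inside the strip
            return x, y
    raise RuntimeError("no point accepted; check the strip definition")

def make_strip(index, n_strips, x_max):
    """Hypothetical strip: a vertical band of equal width (assumption)."""
    width = x_max / n_strips
    lower, upper = index * width, (index + 1) * width
    return lambda x, y: lower <= x < upper

rng = random.Random(42)                        # fixed seed for reproducibility
X_MAX, Y_MAX, N_STRIPS = 3000.0, 2000.0, 30    # hypothetical map extent in metres

# One accepted point per strip gives 30 plots for one study area.
plots = [
    sample_plot_point(X_MAX, Y_MAX, make_strip(i, N_STRIPS, X_MAX), rng)
    for i in range(N_STRIPS)
]
```

Because each candidate is drawn over the whole map and then tested against the strip, accepted points are uniformly distributed within the strip, at the cost of discarding most candidates.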

Standing tree layer
We recorded all trees with a diameter at breast height (DBH) ≥ 5 cm on each plot. Their diameter and height were measured, and their identity was determined by botanical experts from the Northeast College of Agriculture and Forestry (AFC) and park employees. Not all species could be identified in the field. For these, the genus or, failing that, only the family was recorded. All recorded species were assigned to categories of threat according to the IUCN [99-102].

Regeneration layer
The regeneration of tree species was recorded on five subplots established on each sample plot (Fig. 8). Each subplot was 25 m² (5 m × 5 m) in area. Subplots were positioned in the center and the four corners of the plot. The species identity of seedlings and saplings (defined as trees with DBH < 5 cm) was recorded here. Following the approach for the overstory tree species, species recorded in the regeneration layer were also assigned to categories of threat. Tree regeneration was assigned to four height classes (< 50 cm, 50-100 cm, 100-200 cm, and > 200 cm).

Growth site characteristics

Topographic data
The topographic terrain variables recorded for the whole plot were the elevation in m above sea level (T_Ele), the slope in degrees (T_Sl), and the rock surface in percentage (T_RS). As measurement devices, we used an inclinometer for the slope and a GPS device (Garmin GPSMAP 64st) for coordinates and elevation. The rock surface was assessed visually on the basis of the five subplots (Fig. 8).

Soil conditions
Soil chemistry was derived from soil samples. An auger of 10 cm in diameter was used in the plot center to collect the samples. We only used the first 20 cm of the soil, because the nutritional status of this layer is most relevant for plant vitality and growth in the area [103]. We took 90 soil samples in total, one from each plot. As variables describing soil conditions, we analyzed the samples for base saturation (S_BS), cation exchange capacity (S_CEC), hydrolytic soil acidity (S_HA), and pH value (S_pH). In addition, the soil humus content (S_SH) and the absolute soil moisture content (S_SM) were derived.
In the first step, soil samples were dried at room temperature and sieved through a 2 mm mesh to remove larger rocks and organic material. The samples were then oven-dried at 105 °C until a constant weight was reached, after about 6-8 h. The absolute soil moisture content (S_SM) was calculated by subtracting the post-drying weight from the pre-drying weight and dividing the difference by the pre-drying weight. Potassium dichromate (K₂Cr₂O₇) was used to oxidatively determine the soil humus content (S_SH) following the Walkley and Black method [104,105]. The hydrolytic acidity (S_HA) was determined with the Kappen method using NaOH [104-108]. Finally, the cation exchange capacity (S_CEC) was determined following the Kjeldahl method using ammonium acetate (CH₃COONH₄) [104-108]. Here, the CEC was calculated as K⁺ + Ca²⁺ + Mg²⁺ + Na⁺ + NH₄⁺ + H⁺ + Al³⁺. Base saturation (S_BS) was defined as the ratio of the exchangeable bases (Ca²⁺, Mg²⁺, K⁺, and Na⁺) to the cation exchange capacity. All soil analyses were conducted at the Vietnam National University of Forestry.

The soil physical variables, soil texture (S_Clay, S_Sand, S_Silt) and rocks in the soil (S_SR), were also derived from the auger samples. The percentages of clay, sand, and silt were estimated with the Bouyoucos hydrometer method [109]. The percentage of rocks in the soil was estimated from a soil subsample, which was sieved again and separated at the 2 mm threshold. The weight ratio of the two fractions was taken as a percentage value. Soil depth (S_SD) was measured with a steel rod and defined per plot as the mean depth of five measurements across the plot (more details in Pham, et al. [48]).
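The derived soil variables follow directly from the definitions above. The sketch below is illustrative only: the function names, the input values, and the unit (cmol(+)/kg) are assumptions, and the results are returned as fractions that can be multiplied by 100 to obtain percentages.

```python
def absolute_soil_moisture(pre_dry_g, post_dry_g):
    """S_SM: weight lost on oven-drying relative to the pre-drying weight."""
    return (pre_dry_g - post_dry_g) / pre_dry_g

def cation_exchange_capacity(cations):
    """S_CEC: sum of K, Ca, Mg, Na, NH4, H and Al as defined in the text."""
    required = ("K", "Ca", "Mg", "Na", "NH4", "H", "Al")
    return sum(cations[name] for name in required)

def base_saturation(cations):
    """S_BS: exchangeable bases (Ca, Mg, K, Na) divided by the CEC."""
    bases = sum(cations[name] for name in ("Ca", "Mg", "K", "Na"))
    return bases / cation_exchange_capacity(cations)

# Hypothetical sample values (weights in g, cations in cmol(+)/kg).
moisture = absolute_soil_moisture(pre_dry_g=120.0, post_dry_g=96.0)
sample = {"K": 0.5, "Ca": 6.0, "Mg": 2.0, "Na": 0.3,
          "NH4": 0.2, "H": 1.5, "Al": 0.5}
cec = cation_exchange_capacity(sample)
bs = base_saturation(sample)
```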

Light indicators
Light availability was estimated with the Solariscope (SOL 300B, Ing.-Büro Behling, Wedemark) [110], which takes and automatically analyzes hemispheric photographs. Measurements were conducted at 2 m above the soil surface in three diagonal subplots across the sample plot (Fig. 8). The Solariscope characterizes seven properties related to light availability [110]: total site factor, gap fraction, openness, indirect site factor, direct site factor, leaf area index (L_LAI), and L_ELAD.

Human impact
To date, human activities are still recorded in the park, irrespective of its protection status. The park is also comparatively young (established in 1986), and former harvesting, slash-and-burn, and hunting activities still affect the forest structure today [52,95]. Since the area became protected, considerable effort has gone into reducing human activities, especially in the core zones of the park. These efforts even included resettling residents outside the borders of the park. However, many villages are still located close to the park. Hence, human activities, mainly logging and hunting, can still be detected within the park boundaries although they are illegal. As proxies for human activities, we counted footpaths (H_FP), tree stumps (H_STP), and poacher traps for catching animals (H_AT) on the plots.

Environmental characteristics of the study sites
Environmental characteristics differed among the three study sites (Table 5). The average slope in ISA was twice as steep as in LLA. ISA also had the highest percentage of rock surface, followed by MSA and LLA. The average elevation was lowest in MSA. Soil depth was greatest in LLA and shallowest in ISA. MSA was characterized by rockier soil than the other two areas. The percentage of silt and clay was highest in MSA; however, soil moisture was highest in ISA. Although LLA was characterized by the deepest soils, its soil chemical properties showed lower pH, lower humus content, and lower soil moisture than those of the other two areas. Light availability was comparable among the three study sites, with indirect site factors ranging between 8 and 10%. However, light availability was slightly lower in LLA than in the other study sites. The factor L_LAI was highest in MSA, and L_ELAD was highest in ISA. Human disturbances such as footpaths and stumps occurred more frequently in LLA than in the other two sites, while most animal traps were found in MSA (Table 5).

Table 6 Definition of five ratios contrasting tree species diversity in the regeneration and overstory layers

Data analysis
To visualize and contrast species diversity in the overstory and regeneration layers for the entire study area, we used the "iNEXT" package in R [112] to estimate regional tree species diversity in both forest layers. This package is based on rarefaction and extrapolation methods and estimates diversity for different Hill numbers [113]. Hill numbers represent the effective number of species; with increasing order q, they give increasing weight to the abundance or frequency of a species. This means that Hill numbers with q < 1 disproportionately favor infrequent species within the dataset, while all orders > 1 disproportionately favor frequent species [112,114]. We considered the first three Hill numbers, which correspond to widely used species diversity measures: species richness (q = 0), the true diversity of the Shannon-Index, which is the exponential of the Shannon-Index (q = 1), and Simpson diversity (q = 2) [112,114].
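To make the role of the order q concrete, the following sketch computes empirical Hill numbers from a vector of species abundances. It is a simplified illustration under stated assumptions: it does not reproduce the incidence-based rarefaction and extrapolation performed by iNEXT, and the function name and the example counts are hypothetical.

```python
import math

def hill_number(abundances, q):
    """Effective number of species of order q for a list of counts."""
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    p = [n / total for n in counts]              # relative abundances
    if q == 1:
        # The general formula is undefined at q = 1; its limit is exp(Shannon).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

even = [10, 10, 10, 10]      # hypothetical community, all species equally common
uneven = [50, 30, 15, 5]     # hypothetical community with one dominant species

even_profile = [hill_number(even, q) for q in (0, 1, 2)]
uneven_profile = [hill_number(uneven, q) for q in (0, 1, 2)]
```

For the even community all three orders return the number of species, whereas for the uneven community the values decline from q = 0 to q = 2 because rare species receive progressively less weight.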
To investigate whether and how the overstory tree layer and the regeneration layer deviate in their tree species diversity and composition at the plot level, we also calculated species richness and the true diversity of the Shannon-Index (in the following referred to as true diversity) at the plot level. Species richness represents the total number of species per plot. The Shannon-Index accounts for the abundance and evenness of species and is calculated as $H' = -\sum_i p_i \ln p_i$. Here, $p_i = n_i / N$ is the abundance of species i ($n_i$) divided by the total number of individuals ($N$), and each $p_i$ is multiplied by its natural logarithm ($\ln p_i$) [115]. We used the "vegan" package to calculate the Shannon-Index [116]. The true diversity was calculated as the exponential of the Shannon-Index ($\exp(H')$) [113]. By dividing plot-based richness and diversity of the regeneration layer by the respective measures of the overstory layer, we calculated several ratios (Table 6).
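As an illustration of the plot-level calculation, the sketch below derives a richness ratio and a true diversity ratio from two species lists. The exact definitions of the five ratios are given in Table 6 and are not reproduced here; the two ratios, the function names, and the species lists are assumptions for this example only.

```python
import math
from collections import Counter

def richness(individuals):
    """Number of distinct species among the recorded individuals."""
    return len(set(individuals))

def shannon_index(individuals):
    """H' = -sum(p_i * ln p_i) with p_i = n_i / N."""
    counts = Counter(individuals)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def true_diversity(individuals):
    """Exponential of the Shannon-Index."""
    return math.exp(shannon_index(individuals))

# Hypothetical plot: one list entry per recorded individual.
overstory = ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4
regeneration = ["A"] * 6 + ["B"] * 6

# Values below 1 indicate a less diverse regeneration layer.
richness_ratio = richness(regeneration) / richness(overstory)
diversity_ratio = true_diversity(regeneration) / true_diversity(overstory)
```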
We used the one-sample t-test to check the similarity in diversity or species richness between overstory and regeneration layers by comparing the ratios to the value of 1. The null hypothesis of the one-sample t-test is that the mean value of each ratio is equal to 1, indicating similarity between both forest layers in terms of diversity and species richness. The alternative hypothesis is that the mean value of each ratio is less than 1, indicating a less diverse regeneration layer compared to the overstory layer [117]. Before applying the one-sample t-test, we tested the ratios for normality with the Shapiro-Wilk test and additionally applied a nonparametric Kruskal-Wallis rank sum test.
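The one-sided test against a reference value of 1 can be sketched as follows. This is a minimal illustration using only the standard library; the ratio values are hypothetical, and the decision uses a tabulated critical value rather than an exact p-value, which a statistics package would normally supply.

```python
import math
import statistics

def one_sample_t(values, mu0=1.0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return (mean - mu0) / standard_error, n - 1

ratios = [0.6, 0.8, 0.7, 0.9, 0.5]    # hypothetical plot-level ratios
t_stat, df = one_sample_t(ratios)

# Tabulated one-sided critical value of the t distribution for alpha = 0.05, df = 4.
T_CRIT_DF4 = 2.132

# Alternative hypothesis: mean ratio < 1, so reject H0 for strongly negative t.
reject_h0 = t_stat < -T_CRIT_DF4
```

With these example values the mean ratio is 0.7 and the null hypothesis of equal diversity in both layers would be rejected in favor of a less diverse regeneration layer.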
Principal component analysis (PCA) was used to extract important variables from our set of environmental variables [118]. Input data for the PCA included the 24 environmental and human factors from the 90 random sample plots. In the first step, the "prcomp()" function and the "FactoMineR" and "factoextra" packages were used to run the PCA [117,119]. Then, the PCs that best explained the variation in the data were determined based on their eigenvalues. We chose the three most important PCs for further analyses.
We built linear mixed effect models with the five ratios as response variables, the PCs as fixed effects, and the study area as random effect using the function "lme()" [120,121]. The first model was built with all three PCs; PCs were then removed by backward elimination at a 5% level of significance [51]. From these models we selected the best-fit model using the "model.sel()" function in the "MuMIn" package [122]. In parallel, we built a full model with the six environmental variables (EV) most strongly correlated with the first three PC axes and conducted a model selection using the "model.sel()" function in the "MuMIn" package [122]. The study site remained as random factor. The corrected Akaike information criterion (AICc) and the log-likelihood (logLik) were used as criteria to choose the best-fit model. Finally, these criteria were compared between the best "PC" model and the best "EV" model [117,122]. We calculated pseudo-R² values to estimate the goodness of fit of the linear mixed effect models [123]. The marginal R² indicates the variance explained by the fixed effects only, whereas the conditional R² shows the variance explained by both fixed and random effects [117,122,123]. In addition to the five ratios, we also used the regeneration density as a response variable.
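The comparison of candidate models by AICc can be sketched as follows. The sketch uses the standard small-sample correction, AICc = -2 logLik + 2k + 2k(k + 1)/(n - k - 1); the log-likelihoods and parameter counts are hypothetical and do not come from the fitted models in this study.

```python
def aicc(log_lik, k, n):
    """Corrected Akaike information criterion for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

N_PLOTS = 90   # number of sample plots, as in the study design

# Hypothetical candidates: (log-likelihood, number of estimated parameters).
candidates = {
    "PC model": (-40.0, 4),
    "EV model": (-38.5, 6),
}

scores = {name: aicc(ll, k, N_PLOTS) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)                         # lowest AICc is preferred
delta = {name: score - scores[best] for name, score in scores.items()}
```

In this example the model with more parameters has the higher log-likelihood but is still ranked second, because the penalty for its additional parameters outweighs the gain in fit.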
All statistical analyses were conducted using the statistical software R version 3.4.2 [117]. The level of significance was defined by a p-value < 0.05.
Data collection was conducted in close cooperation with the National Park authorities and all permissions were acquired before data sampling. See Tables 7, 8, 9, 10, 11 and 12 and Figs. 9, 10, 11 and 12.

Table 7 Diversity estimates of the regeneration layer interpolated and extrapolated based on incidence data using the iNEXT package

Table 8 Diversity estimates of the overstory layer interpolated and extrapolated based on incidence data using the iNEXT package

Fig. 9 a Coverage-based rarefaction and extrapolation, and b sample completeness for estimating species diversity based on incidence data. The solid line depicts the interpolation, and the dotted line shows the extrapolation of sample-based curves for tree species data of overstory and regeneration layers for different Hill numbers: q = 0 (species richness, left side), q = 1 (Shannon diversity, middle) and q = 2 (Simpson diversity, right side). The solid dots/triangles show the observed reference sample size of 90 plots

The sixteen most abundant species accounted for 51% of total tree species abundance in the overstory and 72% in the regeneration layer. The 16 species of the overstory that accounted for 51% provide 67% of the trees in the regeneration layer. Abundance columns show the number of tree species individuals across the 90 sample plots and 450 sub-sample plots. The percentage column was calculated by dividing the abundance of each species by all tree species abundance. Accumulation aggregated the percentage column from the first to the last species. The Group column classifies species as common species, threatened species, and newly occurred species. Rank shows the ranking of the species in terms of their share of total abundance for the overstory and the regeneration. Species are sorted by the abundance of the overstory species from largest to smallest value. Sp1 to Sp5 are unidentified species

Fig. 12 The combination of percentage abundance of tree species in the overstory (unframed light blue bar), and in the regeneration layer (framed light blue bar). Species ranking was conducted according to the rank-abundance in the overstory (x-axis; for tree species see Table 12). The dark blue bars show newly occurring species in the regeneration, the yellow bars represent threatened species