The genome of Shorea leprosula (Dipterocarpaceae) highlights the ecological relevance of drought in aseasonal tropical rainforests

Ng, Kevin Kit Siong; Kobayashi, Masaki J.; Fawcett, Jeffrey A.; Hatakeyama, Masaomi; Paape, Timothy; Ng, Chin Hong; Ang, Choon Cheng; Tnah, Lee Hong; Lee, Chai Ting; Nishiyama, Tomoaki; Sese, Jun; O’Brien, Michael J.; Copetti, Dario; Isa, Mohd Noor Mat; Ong, Robert Cyril; Putra, Mahardika; Siregar, Iskandar Z.; Indrioko, Sapto; Kosugi, Yoshiko; Izuno, Ayako; Isagi, Yuji; Lee, Soon Leong; Shimizu, Kentaro K.

doi:10.1038/s42003-021-02682-1

Article
Open access
Published: 07 October 2021

Kevin Kit Siong Ng (ORCID: orcid.org/0000-0002-7810-7575)^1,2,na1^,
Masaki J. Kobayashi (ORCID: orcid.org/0000-0003-0554-8801)^1,3,4,5,na1^,
Jeffrey A. Fawcett^6,7^,
Masaomi Hatakeyama^1,3,8,9^,
Timothy Paape^1,3^,
Chin Hong Ng^2^,
Choon Cheng Ang^1,3^,
Lee Hong Tnah^2^,
Chai Ting Lee^2^,
Tomoaki Nishiyama (ORCID: orcid.org/0000-0003-1279-7806)^10^,
Jun Sese^5,11,12^,
Michael J. O’Brien (ORCID: orcid.org/0000-0003-0943-8423)^1,3,13^,
Dario Copetti^1,14^,
Mohd Noor Mat Isa^15^,
Robert Cyril Ong^16^,
Mahardika Putra (ORCID: orcid.org/0000-0003-3365-6025)^17^,
Iskandar Z. Siregar^17^,
Sapto Indrioko^18^,
Yoshiko Kosugi^19^,
Ayako Izuno^1,19,20^,
Yuji Isagi^19^,
Soon Leong Lee (ORCID: orcid.org/0000-0001-5581-4691)^2^ &
…
Kentaro K. Shimizu (ORCID: orcid.org/0000-0002-6483-1781)^1,3,21^

Communications Biology volume 4, Article number: 1166 (2021)

Abstract

Hyperdiverse tropical rainforests, such as the aseasonal forests in Southeast Asia, are supported by high annual rainfall. The canopy of these Southeast Asian forests is dominated by the species-rich tree family Dipterocarpaceae (Asian dipterocarps), which has both ecological (e.g., supporting flora and fauna) and economic (e.g., timber production) importance. Recent ecological studies suggested that rare, irregular drought events may act as both an environmental stress and a signal for tropical trees. We assembled the genome of a widespread but near-threatened dipterocarp, Shorea leprosula, and analyzed the transcriptome sequences of ten dipterocarp species representing seven genera. Comparative genomic and molecular dating analyses suggested a whole-genome duplication close to the Cretaceous–Paleogene extinction event, followed by the diversification of the major dipterocarp lineages (i.e., Dipterocarpoideae). Interestingly, the retained duplicated genes were enriched for genes upregulated by a no-irrigation treatment. These findings provide molecular support for the relevance of drought to tropical trees despite the lack of an annual dry season.

Introduction

Average annual rainfall is highest in tropical rainforests, which harbor hotspots of biodiversity. Southeast Asian tropical rainforests are commonly aseasonal, lacking distinct intra-annual dry seasons, and are characterized by the dominant canopy tree family Dipterocarpaceae^1,2,3. Although ecologists have long viewed light and soil characteristics as the main drivers of environmental filtering and species distributions in ever-wet tropical forests⁹, recent research has examined the importance of rainfall variation and drought in shaping species distributions⁴ and triggering reproduction^5,6,7,8 in tropical forests. Drought events in this system are often associated with irregular, supra-annual El Niño–Southern Oscillation (ENSO) events, and climate models project more frequent and severe ENSO events^10,11,12. Such increased drought could alter synchronous general flowering^5,6,7,8,13, reduce plant growth and carbon sequestration¹⁴, increase tree mortality^15,16, and shift species composition¹⁷.

To complement existing ecological studies, genomic studies may elucidate the potential importance of inter-annual drought for plants. A major limitation of tropical plant research is the paucity of genetic and genomic data for species of environmental and forestry relevance, in contrast to crop and commodity-producing species (cacao¹⁸, rubber tree¹⁹, oil palm²⁰, and durian²¹). Nonetheless, several molecular studies of Dipterocarpaceae using real-time PCR or de novo transcriptome approaches suggested that expression levels of phenology- and stress-related genes^7,8 were associated with ENSO-related fluctuations in drought or temperature. This suggests that a genome assembly would be valuable for testing the relevance of drought to tropical trees.

Dipterocarpaceae, the dominant tree family (>500 species), has its center of diversity in tropical Southeast Asia, where 488 species of the subfamily Dipterocarpoideae are found^1,2. Its evolutionary origin remains enigmatic. While many dipterocarp researchers have proposed an ancient origin of the family in Gondwanaland (e.g., >120 Ma (million years ago))³, molecular dating studies have suggested a much younger date^22,23. Supporting the importance of inter-annual drought events, dipterocarp species appear to have maintained a functional response to drought at the community level, which promotes species coexistence²⁴ and diversity²⁵ and synchronizes reproduction^5,6,7,8. Beyond their ecological importance, Asian dipterocarps lead the international tropical timber market and therefore play an important role in the economies of many countries in the region²⁶. They are critically important as keystone species and serve as active carbon sinks³. Although research on Asian dipterocarps dates back to 1825²⁷, the main obstacles to tropical tree breeding and improvement remain the complexity and cost of breeding programs and the long breeding cycles. Additionally, many dipterocarp species are now categorized as near threatened or endangered as a result of exploitation and massive population reduction²⁸, further indicating the need for genomic resources to strengthen research on the genetic conservation of dipterocarps^29,30.

Here, we report a draft genome assembly of Shorea leprosula, a species that has been used as a representative of Dipterocarpaceae to assess genetic diversity with allozymes, nuclear SSRs, AFLPs, and chloroplast loci^{31,32,33,34,35}. It is locally known as Meranti Tembaga and is traded internationally in the Light Red Meranti timber group. This species is widely distributed throughout the aseasonal tropical rainforests of Southeast Asia (Peninsular Malaysia, Borneo, and Sumatra)^1,36, but is classified as Near Threatened on the IUCN Red List³⁷. Using genome-wide data from 19 S. leprosula individuals sampled across the species’ distribution and from 10 species in seven genera of Dipterocarpaceae, we show that an ancient whole-genome duplication (WGD) event coincided with the Cretaceous–Paleogene (K-Pg) boundary. Genes upregulated by a no-irrigation treatment were significantly enriched among the retained duplicated genes. Climate data indicate that S. leprosula is distributed in environments with irregular drought despite the lack of an annual dry season. A dipterocarp genome assembly is of great utility for genetic conservation and plant breeding in the face of global change.

Results

Genome assembly

Whole-genome sequencing of S. leprosula (Fig. 1) was performed on the Illumina HiSeq platform using paired-end and mate-pair libraries with insert sizes ranging from 170 bp to 17 kb, giving over 380-fold coverage of its haploid genome (n = 7)^38,39 (Supplementary Table 1). The contig and scaffold N50 lengths of the ALLPATHS-LG⁴⁰ assembly were 7.8 kb (spanning the longest 71,752 contigs) and 2.07 Mb (with 2913 scaffolds above 1 kb), respectively. The total size of the scaffold assembly was 340.5 Mb (Table 1). The scaffolds thus covered ~85% and ~87% of the genome size estimated by flow cytometry (~402 Mb)⁴¹ and by k-mer distribution (~391 Mb)⁴², respectively. K-mer Analysis Toolkit (KAT)⁴³ analysis revealed two peaks (Supplementary Fig. 1), confirming that the S. leprosula genome is heterozygous. The frequency of k-mers in the assembly confirmed that the assembly is haploid (i.e., only one of the two heterozygous variants is present).
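The N50 and coverage percentages above follow from simple arithmetic. The sketch below is a minimal illustration, assuming contig or scaffold lengths are available as a list of integers; the function names are hypothetical and are not part of ALLPATHS-LG or KAT.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembly_fraction(assembly_mb, genome_size_mb):
    """Fraction of an estimated genome size covered by the assembly."""
    return assembly_mb / genome_size_mb


# Reported values: 340.5 Mb of scaffolds vs. two genome-size estimates
for method, size_mb in [("flow cytometry", 402.0), ("k-mer distribution", 391.0)]:
    print(f"{method}: {assembly_fraction(340.5, size_mb):.0%}")
```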

**Fig. 1: The *Shorea leprosula* tree that was used for genome sequencing.**

Table 1 Summary statistics of the Shorea leprosula draft genome assembly.

To validate the genome assembly, we mapped all paired-end and mate-pair reads to the assembled genome and found that the vast majority of reads (93.35%) aligned (Supplementary Table 1). To assess the completeness of the assembly, we searched it for the 1440 core genes of the Embryophyta lineage using BUSCO⁴⁴: 93.3% were present (79.7% as single copies and 13.6% in two copies), and only 2.5% and 4.2% were fragmented or missing, respectively. This is comparable to the available Malvales assemblies of cacao (95.8%)¹⁸ and durian (90.3%)²¹. We also confirmed that most RNA-seq reads (~86%) from seven organs of the sequenced S. leprosula individual (leaf bud, flower bud, flower, inner bark, small seed, large seed, and calyx) mapped to the assembly (Supplementary Table 2).

Genome annotation

To annotate the S. leprosula assembly, we first identified transposable elements and non-genic repeated sequences. About 132 Mb of sequence (corresponding to 33% of the assembly) was attributed to transposable elements and repeats (Table 1 and Supplementary Table 3). Gene prediction with AUGUSTUS⁴⁵, using the RNA-seq reads from the seven organs described above, yielded 60,563 protein-coding gene models (Supplementary Table 4). For further evaluation, we compared the S. leprosula models with the protein-coding genes of Arabidopsis thaliana and of Theobroma cacao¹⁸ (cacao, Malvaceae), which is only distantly related within Malvales but is still the closest well-characterized relative of Dipterocarpaceae without a lineage-specific genome duplication; 43,868 genes were supported by homology. Of these 43,868 genes, 20,690 showed synteny with the T. cacao assembly based on MCScanX. On the basis of this empirical support, we classified the predicted genes into three categories: category A, the 20,690 genes with synteny; category B, the 23,178 genes with homology to T. cacao and/or A. thaliana but without synteny; and category C, the 16,695 genes without clear homology (Supplementary Fig. 2 and Supplementary Tables 4 and 5). Category C consisted mostly of predicted genes shorter than 80 or 50 amino acids (aa) (mean length ~122 aa, compared with 414 and 458 aa on average for categories A and B, respectively; Supplementary Table 5).
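The three evidence categories form a simple decision rule, and the reported counts can be checked for internal consistency. The sketch below assumes each gene model has already been flagged for homology and synteny (for example from BLAST-style searches and MCScanX output); the record format and gene IDs are hypothetical.

```python
from collections import Counter


def classify_gene(has_homology: bool, has_synteny: bool) -> str:
    """Assign a gene model to evidence category A, B or C.

    Synteny was assessed only among homology-supported genes, so a
    syntenic gene is treated as homologous as well.
    """
    if has_synteny:
        return "A"  # collinear with T. cacao
    if has_homology:
        return "B"  # homolog in T. cacao and/or A. thaliana, no synteny
    return "C"      # no clear homology


# Hypothetical evidence records: (gene_id, has_homology, has_synteny)
records = [("g1", True, True), ("g2", True, False), ("g3", False, False)]
print(Counter(classify_gene(h, s) for _, h, s in records))

# Consistency check on the counts reported in the text
counts = {"A": 20_690, "B": 23_178, "C": 16_695}
assert counts["A"] + counts["B"] == 43_868  # high-confidence set
assert sum(counts.values()) == 60_563       # all AUGUSTUS gene models
```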

To test whether genes in categories A and B are also present in individuals from different populations and in other dipterocarp species, we analyzed resequencing data from 19 S. leprosula individuals covering the distribution range (Borneo, Sumatra, and Peninsular Malaysia; Supplementary Table 6), obtaining 673,772 SNPs. Resequencing data from three other dipterocarp species, Shorea platycarpa, Neobalanocarpus heimii, and Dryobalanops aromatica (Supplementary Table 7), showed relatively high mapping rates (73–92%), allowing the identification of homologs. We found that 30,677 (70%) of the 43,868 genes in categories A and B were present in all the studied individuals and species (Supplementary Table 5). Using these 30,677 genes (Supplementary Table 4), we estimated genome-wide average nucleotide diversity (π), Watterson’s theta (θ_w), and Tajima’s D as 0.0072, 0.0095, and −0.9801, respectively (Supplementary Table 8 and Supplementary Fig. 3), values comparable to those of a previous study that used fewer nuclear loci⁴⁶. Admixture analysis of the 19 S. leprosula individuals from Peninsular Malaysia, Sumatra, and Borneo, with K chosen from the cross-validation error plot, suggested two subpopulations (K = 2; Supplementary Fig. 4), in which the samples from Borneo were separated from those of Peninsular Malaysia and Sumatra (Supplementary Fig. 5). Given the empirical support from closely related species and populations, together with their longer protein sequences, we considered the gene models in categories A and B (43,868 genes) to be high-confidence genes.
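The three summary statistics above are all derived from the number of segregating sites and the mean number of pairwise differences. The sketch below uses Tajima’s (1989) estimator and variance constants. It assumes a small, gap-free alignment of haplotype strings, so it ignores missing data, multi-allelic sites, and per-gene averaging. It does not reproduce the pipeline used for the values reported above.

```python
import math
from itertools import combinations


def diversity_stats(seqs):
    """Return (pi_per_site, theta_w_per_site, tajimas_d) for aligned haplotypes."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Segregating sites: alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    k_hat = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    if seg_sites == 0:
        d = float("nan")  # D is undefined without polymorphism
    else:
        d = (k_hat - theta_w) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return k_hat / length, theta_w / length, d


print(diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"]))
```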

Ancient whole-genome duplication (WGD)

To understand genome evolution in Dipterocarpaceae, we assessed synteny between S. leprosula and T. cacao. As visualized in a dot plot (Fig. 2a and Supplementary Table 9), most T. cacao genomic regions were syntenic to two genomic regions of S. leprosula, suggesting that the entire S. leprosula genome duplicated after its divergence from the T. cacao lineage. Among the 20,690 S. leprosula genes (category A) with syntenic homologs in T. cacao, more than half (12,886 genes, 62%) were retained as duplicates in the collinear blocks of S. leprosula (Supplementary Tables 4 and 10). We then estimated the expected number of synonymous substitutions per synonymous site (Ks) between the S. leprosula collinear duplicates. The Ks distribution showed a single, distinct peak around Ks = 0.3 (Fig. 2b), further supporting the idea that these genes duplicated at around the same time, most probably via a single WGD event. In addition, Ks estimates between S. leprosula and T. cacao orthologs were considerably larger than those between the S. leprosula collinear duplicates (hereafter referred to as “the WGD-retained duplicates”) (Fig. 2b). This is consistent with the WGD being specific to the S. leprosula lineage and not shared with the T. cacao lineage, and also suggests that the WGD is considerably younger than the divergence of Dipterocarpaceae and Malvaceae.
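A peak such as the one near Ks = 0.3 can be located by binning pairwise Ks values and taking the most populated bin. The sketch below is only a sketch: the bin width and the upper cutoff used to drop saturated Ks estimates are hypothetical choices. The paper’s own peak-calling procedure is not described in this section.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.05, max_ks=3.0):
    """Return the midpoint of the most populated Ks histogram bin.

    Values outside [0, max_ks] are discarded because large Ks estimates
    are prone to saturation (max_ks is an assumed cutoff).
    """
    kept = [k for k in ks_values if 0 <= k <= max_ks]
    if not kept:
        raise ValueError("no Ks values in range")
    bins = Counter(int(k // bin_width) for k in kept)
    best_bin, _ = max(bins.items(), key=lambda item: item[1])
    return (best_bin + 0.5) * bin_width


# Hypothetical Ks values for duplicate pairs
print(ks_peak([0.31, 0.32, 0.33, 0.34, 0.8, 1.2, 4.0]))
```

A kernel density estimate or a Gaussian mixture fit is a common, more robust alternative to a fixed-width histogram when the number of duplicate pairs is small.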

**Fig. 2: Assessment of whole-genome duplication.**

To test whether the WGD is shared with other dipterocarp species, we examined the Ks distributions of duplicated genes (4004 to 7108 genes) obtained from transcriptome assemblies of 10 other species from seven genera (Supplementary Tables 4 and 11). The Ks distributions of all species also had single peaks around Ks = 0.3 (Supplementary Fig. 6a), suggesting that the WGD event occurred before the examined Dipterocarpoideae species split. To validate this finding further, we also checked the Ks distributions of ortholog pairs between S. leprosula and the other Dipterocarpoideae species (Fig. 2b, Supplementary Fig. 6b, and Supplementary Data 1). In all the studied species, the Ks peak of the orthologous genes was lower than the peak corresponding to the WGD. Taken together, these results place the WGD event after the split from T. cacao but before the divergence of the examined Dipterocarpoideae species.

The WGD event coincided with the K-Pg boundary, as in other plant lineages

To determine when the WGD event occurred, we focused on the WGD-retained duplicates in S. leprosula that have syntenic homologs in the T. cacao genome. Phylogenetic dating was performed (Supplementary Data 2) with a previously described Bayesian evolutionary analysis framework⁴⁷ for 204 orthologous groups with cleaned alignment lengths of at least 100 aa. For each orthologous group, node dates were estimated by incorporating fossil calibrations and dates from previous studies (i.e., secondary calibrations) as priors that account for uncertainty in the calibration ages. Using two different calibration settings (Supplementary Table 12), we estimated the timing of the WGD event as 66.9 Ma (95% CI, 61.3–69.3 Ma) and 69.7 Ma (95% CI, 67.7–75.3 Ma). In addition, the divergence between Dipterocarpaceae and Malvaceae was estimated at ~86–98 Ma, whereas the divergences between the dipterocarp lineages represented by nodes 4 and 5 were estimated at ~42–50 and ~36–40 Ma, respectively (Fig. 3, Supplementary Figs. 7–9, and Supplementary Table 13). These results suggest that the ancestral dipterocarp lineage underwent a WGD close to the Cretaceous–Paleogene (K-Pg) extinction event of ~66 Ma, as in many other angiosperm lineages^47,48.
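Bayesian dating tools usually report 95% intervals summarized from MCMC samples of node ages, often as highest posterior density (HPD) intervals. This section does not say whether the intervals above are HPD or equal-tailed. The sketch below assumes HPD and shows how such an interval is obtained from a list of posterior samples. The simulated ages are hypothetical.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must contain (guard against float error).
    k = math.ceil(mass * n - 1e-9)
    if not 1 <= k <= n:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    widths = [ordered[i + k - 1] - ordered[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return ordered[i], ordered[i + k - 1]


# Hypothetical posterior samples of a node age (Ma)
rng = random.Random(1)
ages = [rng.gauss(67.0, 2.0) for _ in range(5000)]
low, high = hpd_interval(ages)
print(f"95% HPD: {low:.1f}-{high:.1f} Ma")
```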

**Fig. 3: Time estimation of the whole-genome duplication.**

Characterization of duplicated genes in dipterocarps

We next characterized the WGD-retained duplicates in the S. leprosula genome, beginning with their overall evolutionary trends. Previous studies suggest that genes retained as duplicates after a WGD tend to evolve more slowly at nonsynonymous sites than genes that lost their duplicates during the long process of rediploidization (the loss of some gene duplicates after WGD)^49,50. To test whether the WGD-retained duplicates of S. leprosula show a similar trend, we estimated Ka, Ks, and Ka/Ks for orthologs between S. leprosula and T. cacao and compared the WGD-retained duplicates with the genes that lost their syntenic WGD-derived duplicate (“the non-retained genes”). Ka and Ka/Ks estimates for the WGD-retained duplicates were significantly lower than those for the non-retained genes (Supplementary Fig. 10), indicating slower evolutionary rates of the WGD-retained duplicates at nonsynonymous sites. In contrast, Ks estimates did not differ significantly between the two groups (Supplementary Fig. 10). Therefore, the difference in substitution rates was specific to nonsynonymous sites.
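This section does not name the statistical test behind “significantly lower”. One distribution-free option for comparing Ka/Ks between two gene groups is a permutation test on the difference in medians, sketched below. The choice of test and all values are assumptions for illustration.

```python
import random
from statistics import median


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in medians."""
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign values to the two groups and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical (Ka, Ks) pairs for S. leprosula-T. cacao orthologs
retained = [ka / ks for ka, ks in [(0.05, 0.5), (0.06, 0.5), (0.055, 0.5)]]
non_retained = [ka / ks for ka, ks in [(0.15, 0.5), (0.14, 0.5), (0.16, 0.5)]]
print(permutation_test(retained, non_retained))
```

With genome-scale gene sets, a rank-based test such as Mann–Whitney U is a common and faster alternative.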

Gene retention and loss in ancient polyploids have been shown to be nonrandom with respect to gene function^{51,52,53,54,55,56}. We therefore examined the functions shared by the 12,886 WGD-retained duplicates using a gene ontology (GO) enrichment test based on the GO terms of their A. thaliana orthologs. Many genes related to transcriptional regulation, signal transduction, and development were retained (Supplementary Table 14), consistent with previous findings in A. thaliana and other plants^{51,52,53,54,55}. In addition to these terms, which are commonly enriched among retained duplicates, drought-related terms such as “response to salt stress” and “response to abscisic acid” were also found. To test whether the retention of drought-related genes is specific to S. leprosula or common among Dipterocarpoideae species, we examined the retention of these duplicated genes in the resequencing data from the population and interspecific samples (Supplementary Tables 6 and 7). Of the 12,886 WGD-retained duplicates, most (87%, 11,250 genes) had both copies of the corresponding homologs (Supplementary Table 15). A GO enrichment test of this conserved gene set yielded similar results, including “response to abscisic acid” (Supplementary Table 16). These data suggest that the retention of drought-related duplicated genes is a common feature of Dipterocarpoideae species in the aseasonal tropics rather than a lineage-specific character of S. leprosula.
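The GO enrichment software and background set are not specified in this section. A common formulation for a single GO term is a one-sided hypergeometric test: given a background of annotated genes, how likely is it to see at least the observed number of term-annotated genes in the study set (here, the WGD-retained duplicates)? The sketch below assumes that formulation and omits multiple-testing correction across terms. The example counts are hypothetical.

```python
from math import comb


def go_enrichment_p(total_genes, term_genes, study_genes, overlap):
    """One-sided hypergeometric P(X >= overlap) for a single GO term.

    total_genes: annotated background (e.g. genes with A. thaliana orthologs)
    term_genes:  background genes annotated with the GO term
    study_genes: size of the study set (e.g. WGD-retained duplicates)
    overlap:     study-set genes annotated with the term
    """
    denom = comb(total_genes, study_genes)
    upper = min(term_genes, study_genes)
    tail = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, study_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denom


# Hypothetical counts for one GO term
print(go_enrichment_p(total_genes=20_000, term_genes=400, study_genes=12_000, overlap=300))
```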

We also examined the functions of tandemly duplicated genes in the S. leprosula genome using a GO enrichment test. We found that 1212 category A genes had tandemly duplicated copies (Supplementary Table 4) and that their enriched functions did not overlap with those of the WGD-retained duplicates (Supplementary Table 17).

Functional analysis of drought-responsive genes via no-irrigation treatment

Although GO terms assigned from A. thaliana orthologs indicated that drought-related genes were significantly enriched among the WGD-retained duplicates, homology to functionally verified A. thaliana genes does not ensure that the S. leprosula homologs also function in the drought response. We therefore characterized the drought-responsive genes of S. leprosula by applying a no-irrigation treatment to S. leprosula seedlings (Supplementary Table 18). Leaf samples were collected for RNA-seq analysis at the beginning of the treatment and on the 7th day, shortly before the typical wilting symptoms (withered and brown leaves) were observed on the 9th day (Fig. 4a, b). Under this water-stress condition, we conducted an expression analysis using genes from all three categories. Differential expression analysis identified 1200 upregulated and 914 downregulated genes in total, of which 829 and 658, respectively, were in category A (Supplementary Fig. 11 and Supplementary Tables 19 and 20). Among the upregulated genes, the highest-ranking GO terms resembled those known to be involved in the drought response, such as “response to water deprivation”, “response to abscisic acid”, and “response to salt stress” (Supplementary Table 21). The enriched categories also included “response to chitin” and “response to oxidative stress”, which may reflect crosstalk among abscisic acid, wounding, and defense signaling under the high pathogen pressure of the tropics^57,58. GO terms related to photosynthesis, light, and biosynthetic processes (starch, chlorophyll, glycogen, and amylopectin) were enriched among the downregulated genes (Supplementary Table 22).
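This section does not give the differential expression software or the cutoffs used to call up- and downregulated genes. A typical final step adjusts per-gene p-values for multiple testing and then applies a fold-change threshold. The sketch below assumes Benjamini–Hochberg adjustment and hypothetical cutoffs (adjusted P < 0.05, |log2 fold change| ≥ 1). These cutoffs are illustrative only and are not the study’s settings.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_de(log2_fc, padj, fc_cutoff=1.0, alpha=0.05):
    """Label a gene 'up', 'down' or 'ns' (hypothetical thresholds)."""
    if padj < alpha and log2_fc >= fc_cutoff:
        return "up"
    if padj < alpha and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical per-gene results: (log2 fold change day 7 vs. day 0, raw p-value)
genes = [(2.3, 0.001), (-1.8, 0.004), (0.4, 0.03), (1.2, 0.6)]
padj = benjamini_hochberg([p for _, p in genes])
print([classify_de(fc, q) for (fc, _), q in zip(genes, padj)])
```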

**Fig. 4: No-irrigation treatment on *Shorea leprosula* seedlings carried out in a growth chamber.**

Using the genes that responded to the no-irrigation treatment, we tested whether they are significantly enriched among the WGD-retained duplicates. Fisher’s exact test showed significant enrichment of upregulated genes among the WGD-retained duplicates (Bonferroni-corrected P = 0.0004; Table 2 and Fig. 4c), whereas downregulated genes were not enriched (Bonferroni-corrected P = 1.0000; Table 2 and Fig. 4d). This result is consistent with the GO analysis described above and indicates that the observed enrichment of drought-response genes is unlikely to be an artifact of GO terms inferred from A. thaliana orthologs. These WGD-retained drought-up genes also showed slower evolutionary rates at nonsynonymous sites than the non-retained genes (Supplementary Fig. 10). In contrast, drought-response genes were not significantly enriched among the tandemly duplicated genes (Supplementary Table 23), consistent with the GO enrichment results (Supplementary Table 17). The WGD-retained drought-up genes included genes with diverse molecular roles in drought stress-response pathways (Supplementary Table 19), including homologs of ABI1 (encoding a component of the abscisic acid receptor complex), DREB2C (encoding a key transcription factor in the drought response), and TIP1 and TIP3 (encoding water-transport aquaporins) (Supplementary Fig. 11). These results support the hypothesis that some WGD-retained duplicates in Dipterocarpoideae species tend to function in the drought response.
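The enrichment test above compares two binary gene classifications (drought-responsive or not, WGD-retained or not) in a 2×2 table. The sketch below implements a two-sided Fisher’s exact test and a Bonferroni correction. The table counts and the assumed number of tests (one each for up- and downregulated genes) are hypothetical, and the gene universe (for example, category A genes) is also an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one (with float tolerance).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: rows = drought-up / not up; columns = WGD-retained / not retained
p = fisher_exact_two_sided(30, 10, 20, 40)
print(bonferroni(p, n_tests=2))  # assumed: one test each for up- and downregulated genes
```

The cap at 1 explains why a corrected P can be reported as exactly 1.0000.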

Table 2 Comparisons between the WGD-retained genes and the differentially expressed genes under the no-irrigation treatment.

Irregular drought instead of annual dry season

To examine whether S. leprosula populations experience dry conditions, we analyzed multiple precipitation datasets across the species’ range. First, we extracted the precipitation of the driest month across the spatial range of S. leprosula from the WorldClim data. Despite broad variation, most localities (173) across the range of S. leprosula received more than 100 mm of rainfall in the driest month (i.e., even the driest month met the evapotranspiration demands at the site). These values showed little overlap with those of species in seasonal forests (Shorea roxburghii is shown as an example from seasonal forests; Supplementary Fig. 12a, b), indicating that few sites across the range of S. leprosula have an annual dry season. Second, we analyzed the average 30-day cumulative rainfall from 2001 to 2014 at two localities within the distribution of S. leprosula (Pasoh Forest Reserve, Peninsular Malaysia, and Danum Valley Field Centre, Borneo). It fell below 100 mm roughly 20% of the time at Pasoh and roughly 5% of the time at Danum (Supplementary Fig. 12c–f). The latter site was wetter, but it still experienced supra-annual drought events (below 100 mm in 2002 and 2010). Together, these modeled and observed climate data suggest that S. leprosula is distributed in environments with irregular drought events even though they lack an annual dry season.
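The 100 mm criterion above can be applied to a daily rain-gauge record as a running 30-day sum. The sketch below computes the fraction of complete 30-day windows whose cumulative rainfall falls below the threshold. It assumes a gap-free daily series in millimetres; the exact averaging and windowing used for the Pasoh and Danum records are not described in this section.

```python
def fraction_below(daily_mm, window=30, threshold=100.0):
    """Fraction of complete `window`-day running totals below `threshold` mm."""
    if len(daily_mm) < window:
        raise ValueError("series shorter than window")
    total = sum(daily_mm[:window])
    below = int(total < threshold)
    count = 1
    for i in range(window, len(daily_mm)):
        # Slide the window one day: add the newest day, drop the oldest.
        total += daily_mm[i] - daily_mm[i - window]
        below += total < threshold
        count += 1
    return below / count


# Hypothetical record: a wet month followed by a completely dry month
print(fraction_below([5.0] * 30 + [0.0] * 30))
```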

Discussion

We sequenced the genome of S. leprosula using an Illumina paired-end and mate-pair sequencing strategy, yielding a dataset of ~388-fold genome coverage. K-mer analysis, BUSCO analysis, and the high read-mapping rate supported the completeness and accuracy of our genome assembly. We annotated 43,868 high-confidence genes showing homology to the T. cacao and/or A. thaliana proteomes.

Our comparative genomic and molecular dating results, together with many recent studies on angiosperm evolution, allow us to propose the following scenario for the evolution and biogeography of Dipterocarpaceae. The Dipterocarpaceae lineage split from the Malvaceae lineage in the Late Cretaceous; a WGD then occurred in the common ancestor of the Dipterocarpoideae species close to the K-Pg boundary, after which the Dipterocarpoideae lineages diverged during the Eocene. Dipterocarpaceae thus provides another example of a pattern observed across many plant groups, in which diversification followed a WGD around the K-Pg extinction event^59,60,61. This timeline contrasts with the scenario hypothesized by many dipterocarp researchers, which posits that Dipterocarpaceae originated on Gondwanaland >120 Ma³ or >135 Ma⁶² and that Dipterocarpaceae and related lineages distributed in South America, Africa, Madagascar, the Seychelles, and Asia diverged as the Gondwanan landmasses broke up^3,63 (i.e., Gondwanan vicariance). However, a Gondwanan origin of Dipterocarpaceae is clearly inconsistent with the generally accepted timeline for the divergence of angiosperm/eudicot clades, in particular, that the eudicots should not be much older than ∼130 Ma^22,64. Moreover, molecular dating studies of many tropical and Southern Hemisphere plant groups with “Gondwanan” trans-continental distributions have reported much younger divergence dates that are not consistent with a strict Gondwanan vicariance scenario^{65,66,67,68,69}. Our estimates for the divergence between Dipterocarpaceae and Malvaceae (86–98 Ma) are much younger than the dates implied by a Gondwanan origin, as expected given our priors, although they are slightly older than previous molecular dating estimates (~70–80 Ma^22,63). In addition, we obtained much younger estimates for the divergence of the Dipterocarpoideae lineages, which are not consistent with the assumption that the separation of India and the Seychelles caused the divergence of certain lineages (see also “Methods”)^3,63.

The main hypothesis to explain the inconsistencies between molecular dating results and vicariance scenarios in many plant lineages is that long-distance and trans-oceanic dispersal is much more common than previously thought^{65,66,67,68,69}. For Dipterocarpaceae, such dispersal has been considered highly unlikely because their seeds lack dormancy, are salt-intolerant, and have low dispersal capacity^3,63. Yet results indicating long-distance dispersal have also been obtained for plant groups such as Nothofagus, which shows trans-oceanic distributions despite poor dispersal capacity⁶⁵. It is also worth noting that the exact timing and nature of the Gondwanan breakup are debated, and that connected landmasses may have enabled overland dispersal after the various proposed dates of landmass separation^66,67,70. Thus, while our results, combined with various recent findings, suggest that dispersal played a key role in the trans-continental distribution of Dipterocarpaceae, its exact mechanism remains an open question that is also relevant to many other plant groups.

The aseasonal tropical rainforests of Southeast Asia, where dipterocarps dominate, receive high annual rainfall. S. leprosula is a typical species of aseasonal tropical rainforests, and the precipitation of the driest month in its habitat is clearly higher than that in the habitat of S. roxburghii, a species of the seasonal tropics (Supplementary Fig. 12a, b). Although S. leprosula inhabits regions with no annual dry season, our results showed that drought-up genes were preferentially retained after the WGD event in this species (Table 2 and Fig. 4), and these WGD-retained drought-up genes likely have conserved functions, as suggested by their slower evolutionary rates at nonsynonymous sites (Supplementary Fig. 10). Whether these substitution rate differences are biologically relevant remains to be shown. Nevertheless, the WGD-retained duplicates were conserved among three species in different genera (Shorea, Dryobalanops, and Neobalanocarpus) inhabiting the aseasonal tropics, as well as among the 19 S. leprosula individuals from different populations (Supplementary Tables 15 and 16). This conservation suggests that the WGD-retained drought-related genes have been functionally important not only at the time of the WGD event but also during the subsequent history of dipterocarp species in the aseasonal tropics. At the time of the WGD, genome duplication and the duplicated drought-related genes might have allowed the ancestral dipterocarp species to tolerate harsh environments during the mass-extinction period at the K-Pg boundary, because contemporary polyploids often show enhanced environmental tolerance^71,72,73. After the WGD, paleoclimate studies suggest that Asian dipterocarps lived in climates with dry seasons^74,75,76, which might have contributed to the retention of WGD-derived drought-related genes.

In the present day, the aseasonal tropics of Southeast Asia receive high annual rainfall but also experience occasional droughts, mostly due to ENSO. Although such droughts are rare, irregular supra-annual drought (Supplementary Fig. 12) may underlie the preferential retention of drought-related duplicated genes in the Asian dipterocarps of the aseasonal tropics. This preferential retention of WGD-derived drought-related genes does not contradict recent ecological studies showing the relevance of inter-annual drought events to dipterocarp species in the aseasonal tropical rainforests of Southeast Asia^{5,6,7,8,24,25}. Nevertheless, the functional significance of an additional copy of a drought-related gene remains difficult to determine. We note that the enrichment of retained drought-related genes in Dipterocarpaceae originated from WGD (Supplementary Tables 14 and 16) rather than from tandem duplication (Supplementary Table 17), in contrast to the lineage-specific tandem duplication of stress-related genes reported in, e.g., A. thaliana⁷⁷.

In 2015, Malaysia and Indonesia contributed over 37.8% (93.7 million m³) of total global production of tropical saw and veneer logs and more than 70% (4.8 million m³) of total global plywood exports²⁶. The growing demand for timber and timber products requires tree breeders to accelerate germplasm improvement. The lack of improved planting materials and of genetic and genomic resources, such as high-density markers or even genetic maps for any dipterocarp, hinders the success of forest plantations. Our genome assembly, genome-wide polymorphism data, and divergence data for 10 additional dipterocarp species will serve as a solid basis for establishing a molecular breeding program for Dipterocarpaceae. Here, we identified 673,772 SNPs by resequencing 19 individuals throughout the distribution range. The population structure analysis showed a split of the Bornean populations from those of Peninsular Malaysia and Sumatra, which informs the design of breeding and association studies. Our findings support the hypothesis that canopy trees^35,78,79 and other terrestrial organisms^80,81,82 in Sundaland were divided into two clusters by the drowning of the Sunda Shelf after the Last Glacial Maximum⁸³.

Dipterocarp species are keystones of Asian tropical ecosystems. Biomass estimates for natural Asian dipterocarp forests range from 205 to 496 Mg per ha^84,85,86, 30–60% higher than those of corresponding forests in Amazonia^87,88,89, highlighting their high carbon storage value³. A large number of dipterocarp species have been and are being planted and monitored at the Sabah Biodiversity Experiment and FRIM’s Common Garden Experiment sites, which thus provide opportunities for genome-wide association studies, genomic selection, and ecological genomics analyses^29,30. Considering the critical contribution of tropical forests to Earth systems, it is urgent to bring molecular knowledge of tropical trees to a level comparable to that for temperate trees.

Methods

Sequencing of Shorea leprosula genome

Sample collection

Leaf samples of S. leprosula were obtained from a reproductively mature (diameter at breast height, 50 cm) diploid tree B1_19 (DNA ID 214) grown in the Dipterocarp Arboretum, Forest Research Institute Malaysia (FRIM).

DNA extraction

Genomic DNA was extracted from leaf samples using the 2% cetyltrimethylammonium bromide (CTAB) method⁹⁰ and purified using a High Pure PCR Template Purification kit (Roche).

Library preparation and sequencing

Paired-end (170, 500, and 800 bp) and mate-pair (2 kb) genomic libraries were prepared using a TruSeq DNA Library Preparation kit (Illumina) and a Mate Pair Library Preparation kit (Illumina), respectively. Mate-pair libraries with larger insert sizes were constructed using a Nextera Mate Pair Library Preparation kit (Illumina). Ten micrograms of genomic DNA were tagmented in a 400 μl reaction and size-fractionated using SageELF, recovering 11 fractions spanning 3 to >16 kb. Each fraction was circularized and fragmented with a Covaris S2. Biotin-containing fragments were purified using Dynabeads M-280 streptavidin beads. Sequencing adapters (KAPA TruSeq Adapter kit) were attached using a KAPA Hyper Prep kit. The libraries were amplified for 10–13 cycles and purified with 0.8× AMPure XP beads. The DNA libraries were then sequenced (~388× coverage) on Illumina HiSeq2000 (TruSeq libraries) and HiSeq2500 (Nextera libraries) instruments at the Functional Genomics Center Zurich (FGCZ), University of Zurich, Switzerland (Supplementary Table 1).
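Fold coverage such as the ~388× figure is the total number of sequenced bases divided by the haploid genome size. The sketch below assumes each library is summarized as (read pairs, read length) with two reads per pair. The library values are hypothetical and are not the counts in Supplementary Table 1. The genome-size basis (for example, the ~402 Mb flow-cytometry estimate) is also an assumption.

```python
def sequencing_depth(libraries, genome_size_bp):
    """Fold coverage: total sequenced bases divided by haploid genome size.

    libraries: iterable of (read_pairs, read_length_bp) tuples; each read
    pair contributes two reads of read_length_bp bases.
    """
    total_bases = sum(2 * pairs * read_len for pairs, read_len in libraries)
    return total_bases / genome_size_bp


# Hypothetical libraries, not the read counts in Supplementary Table 1
libs = [(200_000_000, 100), (150_000_000, 125)]
print(f"{sequencing_depth(libs, 402_000_000):.0f}x")
```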