Whole genome based insights into the phylogeny and evolution of the Juglandaceae

The walnut family (Juglandaceae) contains commercially important woody trees commonly called walnut, wingnut, pecan and hickory. Phylogenetic relationships and diversification within the Juglandaceae are classic and hot scientific topics that have been elucidated by recent fossil, morphological, molecular, and (paleo) environmental data. Further resolution of relationships among and within genera is still needed and can be achieved by analysis of the variation of chloroplast, mtDNA, and nuclear genomes. We reconstructed the backbone phylogenetic relationships of Juglandaceae using organelle and nuclear genome data from 27 species. The divergence time of Juglandaceae was estimated to be 78.7 Mya. The major lineages diversified in warm and dry habitats during the mid-Paleocene and early Eocene. The plastid, mitochondrial, and nuclear phylogenetic analyses all revealed three subfamilies, i.e., Juglandoideae, Engelhardioideae, Rhoipteleoideae. Five genera of Juglandoideae were strongly supported. Juglandaceae were estimated to have originated during the late Cretaceous, while Juglandoideae were estimated to have originated during the Paleocene, with evidence for rapid diversification events during several glacial and geological periods. The phylogenetic analyses of organelle sequences and nuclear genome yielded highly supported incongruence positions for J. cinerea, J. hopeiensis, and Platycarya strobilacea. Winged fruit were the ancestral condition in the Juglandoideae, but adaptation to novel dispersal and regeneration regimes after the Cretaceous-Paleogene boundary led to the independent evolution of zoochory among several genera of the Juglandaceae. A fully resolved, strongly supported, time-calibrated phylogenetic tree of Juglandaceae can provide an important framework for studying classification, diversification, biogeography, and comparative genomics of plant lineages. Our addition of new, annotated whole chloroplast genomic sequences and identification of their variability informs the study of their evolution in walnuts (Juglandaceae).

complete chloroplast genomes [6,9] and whole genome resequencing data [11]. The plastid genome has provided insight into molecular phylogeny and evolutionary relationships at many taxonomic levels [4,9,12,13]. Foundational genetic studies of the Juglandaceae were based on analysis of selected loci [14][15][16]. Whole genome scale studies can be useful-and in some cases necessary-supplements to previous research. Whole genomes are particularly suited to resolution of evolutionary relationships where sequence variation is limited by taxonomic level, early divergence, large difference in morphology, rapid speciation or slow genome evolution [7,[17][18][19][20].
Although studies based on a limited number of loci (chloroplast DNA fragments) and fossil evidence have greatly advanced our understanding of Juglandaceae [16, 17, 23-27, 30, 31], some relationships within Juglans, Carya, and Pterocarya are weakly supported or conflicting; especially the relationship of Platycarya to Carya, and the position of Cyclocarya and Pterocarya in relation to Juglans [16,36]. Other issues include the placement of the Rhoipteleaceae, a monotypic family containing only the species Rhoiptelea chiliantha [36,43]. It was placed in the Juglandaceae by APG III (2009) system ( Fig. 1) [44]. Similarly, the genus Annamocarya contains only one

species, A. sinensis. Placement of Annamocarya within
Carya is well-accepted [27,35], although it shares a number of characteristics with walnuts (genus Juglans). The evolution of the Juglandaceae remains a difficult problem too; hypothesized to have both ancient and recent extinctions and radiations [21,27,45], the family is considered species poor. The species that remain, however, are divergent in their ecology (wind versus animaldispersed fruit) [31], and flower development [23].
The primary goal of this study was to increase the resolution of the molecular phylogeny of the Juglandaceae by maximizing the number of taxa sampled and the number of genetic markers used [23,28,31]. We selected 27 Juglandaceae taxa, slightly more than half of the ~ 60 recognized species from three subfamilies (Engelhardioideae, Juglandoideae, and Rhoipteleoideae), and from seven of the ten worldwide genera, making this the most comprehensive study to date. We used sequence data from matrilineally (chloroplast genomes and mitochondrial protein-coding genes) and biparentally (whole genome re-sequencing of nuclear genome SNPs) inherited DNA to illuminate the evolutionary history of the Juglandaceae. We also reanalyzed phylogenetic relationships of 55 species using ITS (Internal transcribed spacers) sequences. Our goal was to (1) reconstruct the phylogenetic relationships of the family Juglandaceae based on whole chloroplast genomes, whole genome re-sequencing of nuclear genome SNPs (nrSNPs), ITS, and sixteen mitochondrial protein-coding genes (mtCDS), with an eye toward the major unresolved systematic questions in this family, (2) compare the plastid genomes of Juglandaceae, and identify the location and extent of genetic variations in these genomes across within the Juglandaceae, (3) reconstruct a time-calibrated phylogeny of the Juglandaceae based on whole chloroplast genomes, (4) reveal the timing of diversification for important nodes within the family.

Phylogenetic relationships of the Juglandaceae
Based on best-fit partitioning schemes and models, the phylogenies returned from the RAxML and MrBayes analyses using 61 chloroplast protein-coding genes showed all branches highly supported (Fig. 3a). Within the Fagales, members of the Juglandaceae were closest to the Myricaceae and Betulaceae (Fig. 3b). Species within the Juglandaceae divided into three groups corresponding to the three previously described sub-families (Juglandoideae, Engelhardioideae, and Rhoipteleoideae) with 100% bootstrap (BS) support based on mtCDS and chloroplast genomes using maximum likelihood (ML) analysis ( Fig. 3a, b).
Within the Juglandoideae subfamily, the species divided into five groups, corresponding to the five genera Carya, Platycarya, Cyclocarya, Pterocarya, and Juglans that were strongly supported as monophyletic (Fig. 3a). The genus Pterocarya was most closely related to Juglans (Fig. 3). The wheel wingnut (Cyclocarya paliurus) is the sole member of its genus in Juglandaceae. It was monophyletic  and most closely related to Pterocarya based on chloroplast genomes (Fig. 3a). In Carya, Pecan (C. illinoinensis a North American species) was joined with the other four species of Carya (Asian hickories) with 100% BS. The cladograms supported the current division of Carya into two sections (Sect. Sinocarya, Asian hickories, i.e., C. cathayensis, C. hunanensis, C. kweichowensis, and C. sinensis; and Sect. Apocarya, which includes C. illinoinensis). We also confirmed that the genus Annamocarya (A. sinensis) is properly within Carya and closest to Carya regia/J. sigillata were extremely short, further supporting their recent divergence. Based on 1,161,468 nuclear SNPs, the phylogenetic analysis showed a generally well-supported clustering topology with high bootstrap values when rooted against Populus trichocarpa as the outgroup (Fig. 3c). The resulting phylogeny identified and provided 100 % support for the three sub-families that we observed in the genome-based phylogeny of the Juglandaceae (Fig. 3): Clade I (Rhoipteleoideae), clade II (Engelhardioideae), and clade III (Juglandoideae). Clade III (Juglandoideae) contained five genera Platycarya, Carya, Cyclocarya, Pterocarya, and Juglans, however, the relative placement of the three genera, Carya, Platycarya, and Cyclocarya was not consistent in the phylogenies based on the combined Cp and mitochondrial genomes versus the nuclear data. Although we only used one species in Platycarya, our results strongly supported the model that Cyclocarya Table 2 Summary of variants from all Juglandaceae genomes based on comparison with Juglans regia whole genome sequences cp-SNPs the number of SNPs of chloroplast genomes, nr-SNPs the number of SNPs of whole genome resequencing, cp-Indels the number of Indels of chloroplast genomes, Ts/Tv ratio the transition/transversion ratio based on chloroplast genomes and whole genome resequencing data respectively, Mapped the mapped ratio of whole genome resequencing data used common walnut genome sequence data, Het-ratio the Heterozygosity ratio of each samples based on the whole genome resequencing data and Platycarya are monophyletic with long branches and taxa-specific SNPs ( Fig. 3c; Additional file 1: Table S3). Based on nuclear SNPs, we found a strong sister relationship of Cyclocarya to Pterocarya and, secondarily, to Juglans (Fig. 3c), as suggested by Manos et al. (2007) [17] and Larson-Johnson (2016) [35]. We reconstructed the Bayesian and ML trees based on ITS sequences of 55 Juglandaceae species (Fig. S3). The resulting phylogenetic tree showed that the three subfamilies, Juglandoideae, Engelhardioideae, and Rhoiptelioideae, cluster as monophyletic branches, however, support for the genera within the Juglandoideae was weak (< 50 %) (Additional file 4: Fig. S3). ITS alone produced cladograms markedly different than accepted topologies.

The divergence time and historical diversification of Juglandaceae
The stem age of Juglandaceae was estimated at 78.69 Mya (95% highest posterior density (HPD): 76.58-80.50 Mya). The walnut family diverged from the Myricaceae during the late Cretaceous (Fig. 4). During the Fig. 3 The Maximum Likelihood (ML) phylogenetic trees of Juglandaceae. Trees are based on a sixty one chloroplast protein-coding genes in the chloroplast, b 16 mtCDS fragement DNA sequence data, and c nuclear SNPs from whole genome resequencing data. For these trees, the PartitionFinder method for the best model combinations (Additional file 1: Table S4) was inferred by RAxML. Numbers at nodes correspond to ML bootstrap percentages (10,000 replicates). The three subfamilies are indicated with red arrows; Rhoipteleoideae (black bar), Engelhardioideae (dark red bar), and Juglandoideae (blue bar). Fruit morphology is shown using one species from each genus; the black solid circles indicate wingless fruits, hollow circles indicate winged fruits. Details for the outgroups (orange bar) are in Additional file 1: Table S1. The triangles indicate taxa with discordance between nuclear and chloroplast phylogeny   (Fig. 4).

Comparison of the genomes of the Juglandaceae
Both genome size and GC content among Juglandaceae plastomes were consistently more than the median genome size and GC content for land plant plastomes ( Table 1). The nucleotide variability (Pi) across all 27 plastomes of Juglandaceae included in this study was 0.00791 (Fig. 2). Coding regions with the highest variation included matK, atpI, rpoC2, rps14, aacD, psaI, ycf4, cemA, rpl33, infA, rps19, ndhF, rpl32, ndhD, ndhI, and ycf1. Non-coding regions that were most variable were matK-rps16, petN-psbD, ndhC-trnV-UAC, rbcL-psaI, psbE-petL, and rpl14-ycf1. These regions of maximum variability will no doubt prove the most informative for phylogenetic studies in the Juglandaceae [6,12]. Previous studies have identified rpl22, rps19 and ycf1 genes as the most variable genes in the Juglandaceae plastomes based on high indel density [12]. It was surprising, however, that the LSC region also contained variation, including 2577 bp differences among Juglandaceae plastomes, while SSC had 237 bp and IR had 296 bp differences among plastomes ( Table 1). The identification of these regions of variability in protein-coding genes (CDS) and intergenic spacer (IGS) regions will be useful for the study of the evolution, phylogeny, biogeography of the walnut family (Juglandaceae) and Fagales [4, 27-29, 34, 35] and, potentially, for DNA barcoding. Three potential pseudogenes (infA, rpl22, and ycf15) will also be valuable genetic resource for study of plastid transfer to the nucleus and for studies of the evolution of the walnut family and Fagales [10,20].
In previous studies, it was suggested the genus Cyclocarya is sister to genus Platycarya [17] based on fossil, chloroplast DNA fragments, and morphological data. Our data confirm this relationship (Figs. 3 and 4).

Alternatively, it was suggested by Xiang et al. (2014) that
Platycarya is sister to Juglans based on five chloroplast markers [31], and that Carya and Platycarya are sister groups [31]. Others considered Cyclocarya and Juglans to be sister groups [29]. Within Juglandoideae, our results strongly supported five genera (Juglans, Pterocarya, Cyclocarya, Platycarya, Carya) based on our chloroplast data (Fig. 3a), which is consistent with the phylogeny inferred from RAD-Seq data [36]. Using criteria based on fruit morphology, however, Carya and Juglans are sister groups [35], this relationship was not confirmed by our DNA-based analysis (Fig. 4), and Cyclocarya and Pterocarya are sister groups, a relationship supported in our data (Figs. 3 and 4) [35]. Previously, Smith and Doyle (1995) [54], based on chloroplast DNA and morphological data, concluded that Platycarya evolved earlier than Carya; our results based on nuclear resequencing (Fig. 3c)  hopeiensis, a controversial taxon that results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing, and whole chloroplast genome data indicated is a horticultural variety [55].
Species diversity centers of the genus Pterocarya occur in the northern temperate zone [27,36,37]. The previously unresolved intrageneric relationships of Pterocarya were resolved with high support using chloroplast genome data. P. stenoptera, P. hupehensis, and P. tonkinensis were clustered as a group (Fig. 3a); a second group consisted of P. macroptera and P. fraxinifolia (Fig. 3a) [37]. Morphology of the two groups within Pterocarya differs: group one species (P. stenoptera, P. hupehensis, and P. tonkinensis) have naked terminal buds, while the group two species P. macroptera and P. fraxinifolia have terminal buds with 2 to 4 caducous scales [37,57]. We consider these taxa species relationships based on our chloroplast genome, mtDNA fragments, and nuclear SNPs data (Fig. 3, but see Fig. 4), however we did not complete a detailed phylogeny of Pterocarya because our sample pool was too small.
Based on sequence data from 16 mtCDS and 61 chloroplast protein-coding genes, our results supported the unification of J. mandshurica, J. ailantifolia, and J. cathayensis within sect. Cardiocaryon ( Fig. 3b; Additional file 4: Fig. S3), consistent with a previous conclusion based on genotyping by sequencing data [22,55]. We also confirmed that the Ma walnut (J. hopeiensis) arose from the resent hybridization of J. regia and J. mandshurica based on both matrilineal and biparental inheritance data (Fig. 3) [12,55]. The placement of J. cinerea into Rhysocaryon (black walnuts) based on plastome sequence was clear (Fig. 3a), however, it belongs to Cardiocaryon (Asian butternuts) based on nuclear sequences (Fig. 3c), and its morphology is consistent with Cardiocaryon [12,15]. In addition, J. cinerea can hybridize with members of Cardiocaryon and even Dioscaryon, but not with Rhysocaryon [59]. All other North American Rhysocaryon freely hybridize. The discordance between the J. cinerea nuclear genome and its plastome is almost certainly the result of a chloroplast capture [16,32]. It is notable that the chloroplast of J. cinerea is not an ancient one (ancestral to the Rhysocaryon) but is instead most like J. nigra (Figs. 3 and 4).
Our results indicated that the capture of a Rhysocaryon chloroplast by J. cinerea capture was relatively recent (Figs. 3 and 4). Hybridization and chloroplast capture between Rhysocaryon and Cardiocaryon apparently played a major role in the diversification of Juglans, as it did in other plant families [33,[36][37][38]60]. The evaluation of divergence time using strictly bifurcating tree methods can be misleading because gene flow can result in underestimates of species divergence time [61].

Dating the origin and historical diversification of Juglandaceae
Stem ages in the Juglandaceae are controversial [13,17,29,30]. Most previous studies estimated a stem age of Juglandaceae about 84 Mya in the Cretaceous [25,29], however, the divergence times for some genera remain uncertain [29,30], as only a few studies have examined the divergence times of the major genera and within the species of the family [17,30]. The lack of a robust phylogenetic framework and time tree has hindered development of a full understanding of the diversification of Juglandaceae.
The  [35]. By the end of the Eocene, Cyclocarya and Platycarya became extinct in North America but survived in Eurasia [25]. Our results indicated Carya emerged as an animaldispersed genus about 58 Mya, considerably earlier than the estimate (~ 44 Mya) of Larson-Johnson (2016) [35] and Song et al. (2020, ~ 40 Mya) [37], but later than the estimate (~ 80 Mya) of Zhang et al. (2021) [27], although we agree that the overwhelming majority of winged and wingless fruited genera diverged or diversified during the Paleogene, probably reflecting adaptation to changing regeneration regimes [62]. We estimated the divergence time between the Juglandoideae and Engelhardioideae, which are reciprocal monophyly subfamilies, was ~ 68.6 Mya, later than the estimate of Mu et al. (2020) was ~ 79.18 Mya [36].
From the early Tertiary to the Neogene there was likely extensive migration and exchange among North Atlantic, North America, western European, and Asian flora [25]. Interestingly, most species within the extant genera diversified between 18.5 and 8.5 Mya in warm and dry environments of the Early Miocene (Fig. 4), a period of especially rapid speciation within Juglans and Pterocarya. Juglandaceae species diversity in from Oligocene to Pliocene with a rapid increase elucidated by Zhang et al. 2021 (between 30 and 5 Mya) [27], Mu et al. 2020 (between 20 and 5 Mya) [36], and Song et al. 2020 (between 13 and 5 Mya) [37]. Some closely related taxa within Juglans appear to have diverged relatively recently, under the influence of climate change during the Quaternary glacial period ( Fig. 4; Bai et al. 2017) [63]. For example, J. regia and J. sigillata, J. mandshurica and J. hopeiensis, and Carya hunanesis and C. kweichwensis (Fig. 4). Overall, the Juglandaceae reflect a complex evolutionary history and diversification affected by changes in geography, distinctive distributions, climate changes, coevolution with animals. Biotic interactions (e.g., pathogens) no doubt also had a role in driving species abundance and distribution [63], but biotic interactions of that type are difficult to detect from current data [36][37][38][39].

Conclusions
Our results are a first attempt to use whole genomes to elucidate the characterize sequence divergence and evolutionary history in the Juglandaceae. Evidence of early lineage diversification, hybridization and extinction lead us to predict complex evolutionary histories for the extant species in the Juglandaceae. A fully resolved, strongly supported, time-calibrated phylogenetic tree of Juglandaceae will provide an important framework for studying classification, diversification, biogeography, phenotypic evolution, gene function and comparative genomics of this important family. Our results supported some recently clarified circumscriptions of controversial genera, although our taxonomic sampling is insufficient to stand alone as definitive. Variation within our newly annotated whole chloroplast genomic sequences (available in GenBank) should be a useful resource for study of the evolution, for DNA barcoding, phylogeny, biogeography, and studies of genetic variation in the walnut family (Juglandaceae). Wider plastid phylogenomics, whole genomes (nuclear data), a more complete fossil record, better dating of the fossil record, and more studies of morphology will all be needed to fully reconstruct the phylogeny of woody plant families such as the Juglandaceae and other families of Fagales.

Taxon sampling, genomic DNA extractions, library, and sequencing
We analyzed 27 species of Juglandaceae from seven genera that span the taxonomic, geographic, and morphological range of the family. These were contextualized using published plastomes of nine species of Fagales (include four species for Betulaceae, and five species for Fagaceae), three species of Cucurbitales, and four species of Rosales (Additional file 1: Table S1). The voucher specimens were deposited in the herbarium of Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Northwest University (Table 1). We collected fresh leaf samples from the field, and the samples were stored in air tight bags filled with silica gel desiccant for later DNA extraction.
Total genomic DNA was extracted from 200 mg of silica gel-dried leaves using a modified CTAB (cetrimonium bromide) method [64,65]. The DNA concentration was quantified using a NanoDrop spectrophotometer (Thermo Scientific, Carlsbad, CA, USA). A paired-end (PE) library with 350 bp insert size was constructed using the Illumina PE DNA library kit according to the manufacturer's instructions and sequenced using an Illumina Hiseq2500 by Novogene (www. novog ene. com, China).

Plastomes assembly and annotation
The sequenced and assembled plastomes were quality controlled using the NGSQC toolkit v2.3.3 trim tool to remove low quality reads, unknown bases, adapter sequences, and sequencing errors [66]. Short reads were assembled into long contigs using SPAdes Genomic Assembler v3.6.0 [67], followed by manual checking and finishing. We used a reference J. regia complete chloroplast genome (Genbank accession number KT963008) in this study [50]. The contigs were assembled in Geneious v8.0.2 [68]. To exclude nuclear DNA, we used BLAST to remove contigs that did not align to a reference plastome from J. regia [50]. A reference-based assembly allowed us to reconstruct each of all other species [13].
After we identified the boundaries between the inverted repeats (IR) and the single copy regions, i.e., the Large Single Copy (LSC) and Small Single Copy (SSC) regions, the completed plastomes were annotated using the online software DOGMA based on the J. regia reference [50,68,69]. We manually annotated start and stop codons and other regions of interest using Geneious v8.0.2 [50]. A circular representation of each plastome was visualized in OGDraw [70]. Finally, gene content, order, and variability were analyzed in Geneious and R [71]. The plastid genomes data were deposited in National Center for Biotechnology Information (NCBI), the accession numbers were KX703001 to KX703038 (Table 1).

Variant calling
Using paired-end (2 × 150 bp) Illumina sequencing, we obtained high sequencing depth (> 30×) per sample based on alignment with the J. regia reference plastome [50]. After aligning the re-sequenced reads, we processed the alignments to remove duplicate reads and applied a series of quality control filters with the intent of limiting false-positive variants. Sequence reads passing Illumina's quality control filter were aligned using bwa-mem algorithm of BWA v0.7.12 [72] and then mapped to the J. regia plastid genome. Only uniquely mapped reads were retained, which removed the repeat region IR. Duplicate reads were removed from individual sample alignments using Picard tools v2.5.0 (https:// github. com/ broad insti tute/ picard) Mark Duplicates function and assigned genomic positions for each accession based on the alignment files generated by SAMtools v0.1.19 [73].
The SNPs (single nucleotide polymorphisms) and small Indels (insertion-deletion) among Juglandaceae plastid genome accessions were identified if they were supported by at least three mapped reads. Following bwa-mem mapping, the rest of the sequencing pipeline was performed using the toolkit GATK v3.5.0 [74]. Reads present in areas surrounding Indels were realigned using the built-in function Indel Realigner, after which SNPs were called using Unified Genotyper. Finally, a series of quality filters were applied to reduce systematic errors, including quality-by-depth ratio (QD) < 10, ReadPosRankSum < − 8.0, depth coverage (DP) ≥ 30, probability of strand bias (FS) > 10.0, SNPs that passed these filters were kept for subsequent analyses. Finally, we use the stats module in the bcftools v1.1 to count SNPs and Indels and calculate Ts/Tv (transition/transversion) rates [75].
In this study, we called the nuclear SNPs from all samples of Juglandaceae (Additional file 1: Table S3). The Illumina paired-end reads from each sample were first processed to remove adaptor and low-quality sequences using Trimmomatic [76]. The cleaned unique reads were aligned to the common walnut reference genome version 1.0 (https:// treeg enesdb. org/ FTP/ Genom es/ Jure/) using BWA [46,73], and only uniquely mapped reads were retained. Following mapping, genotypes were assigned to each genomic position for each sample based on the alignment files generated by SAMtools [72]. Single nucleotide polymorphism (SNPs) and small indels (insertion and deletion) in the 27 samples were identified using GATK [74]. The redundant reads were then filtered based on the location of clean reads in the reference genome using software Picard (Picard: http:// sourc eforge. net/ proje cts/ picard/). We used GATK's Haploype Caller (local haplotype assembly) algorithm for SNPs and InDels based on each sample.

Partition strategy and phylogenetic analysis
To infer the evolutionary relationships among the 27 Juglandaceae plastomes, and to test the phylogenetic signal from different regions of the plastomes, we reconstructed the Juglandaceae phylogeny using the following four datasets based on the exons of protein-coding genes, whole chloroplast genome data, mitochondrial proteincoding genes (mtCDS), whole genome re-sequencing of nuclear genome SNPs (nrSNPs), and ITS (Internal transcribed spacers) sequences; to avoid large amounts of missing data in the phylogenetic analyses, sixty-one protein coding genes that were shared by all 44 taxa were extracted and aligned (Additional file 1: Table S4). Bestfit partitioning schemes and models were selected using the greedy search mode implemented in PartitionFinder v2.1.1 [77] (Additional file 1: Table S6).
Plastomes were aligned using default settings in MAFFT v7.245 [78]. The resulting alignments were manually checked in Geneious v8.0.2 [50]. The best-fit nucleotide substitution model for all our plastome data sets was determined (as suggested by Modeltest v3.7 with the Akaike information criterion (AIC) [79,80]. A concatenated data set was analyzed using Bayesian Inference (BI) and Maximum Likelihood (ML) analysis in MrBayes v3.2.6 [53] or RAxML v8.1.24 [81]. BI trees were produced by MrBayes v3.2.6 set at 10,000,000 generations. Two independent Markov chain Monte Carlo (MCMC) chains were run, each with one cold chain and three incrementally heated chains. Trees were sampled every 10,000 generations, with the first 25 % of the trees discarded as burn-in. Stationarity was considered reached when the average standard deviation of split frequencies was < 0.01. The Maximum Likelihood (ML) trees were generated using RAxML v8.1.24 using a GTR-GAMMA model [81]. For ML analysis, difference general time reversible models were performed with all data sets. For all analyses, 10 independent ML searches were conducted, bootstrap support was estimated with 1000 bootstrap replicates, and bootstrap (BS) proportions were drawn on the tree with highest likelihood score from the 10 independent searches. We generated multiple mtCDS sequence alignments using ClustalX with default parameters [82]. The phylogenetic tree analysis was performed using MEGA7 [83].
For the phylogenetic tree analysis based on nuclear genome data, we selected a total of 1,161,468 SNPs with minor allele frequency (MAF) ≥ 5 % and missing rate per site ≤ 10 % for phylogenetic analyses. A Maximum Likelihood (ML) tree was constructed using RAxML v8.1.24 in 1000 bootstrap replicates [81]. To gain a better understanding of the species relationships, we selected 55 species to represent all extant genera in the Juglandaceae for which internal transcribed spacer (ITS) sequence data are available in NCBI (Additional file 1: Table S7). We generated multiple ITS sequence alignments using ClustalX with default parameters [82], and a phylogenetic tree analysis using Maximum Likelihood analysis [81].

Divergence-time estimation and fossil calibration
We estimated the divergence time of Juglandaceae species based on complete chloroplast genome data combined with six fossil calibrations (Additional file 1: Table S8) [24,25,29]. Penalized likelihood (PL) dating analyses were conducted using the treePL v1.0 program [84]. To identify the appropriate level of rate heterogeneity in the phylogram, a data-driven cross-validation analysis was conducted with treePL v1.0. One thousand bootstrap replicates with branch lengths were also generated using RAxML v8.1.24 for calculating the confidence age intervals with TreeAnnotator as implemented in BEAST v2.4.5 with a GTR + I + G substitution model and an uncorrelated lognormal relaxed-clock [85,86]. The phylogenetic trees were then compiled into a maximum clade credibility tree using Tree Annotator v1.8.0 [87]. The program FigTree v1.3.1 (http:// tree. bio. ed. ac. uk) was used to visualize mean node ages and highest posterior density (HPD) intervals at 95 % (upper and lower) for each node and to estimate branch lengths and divergence times.
Additional file 1: Table S1. The taxa, Genbank accession number, and family of 41 species used for phylogenetic analysis. Table S2. Characterization of three primers to verified three pseudogenes. Table S3. Summary of variants from all Juglandaceae genome resequencing based on comparison with Juglans regia whole genome sequences. Table S4. Protein-coding genes (n = 61) included in the phylogenetic analysis. Table S5. Characterization of sixteen mitochondrion protein-coding genes (mtCDS) primers used in this study. Table S6. The PartitionFinder method for the best model combinations of the ML phylogenetic tree based on 61 protein-coding genes. Table S7. Sources of Internal transcribed spacer (ITS) sequences used in this study. Table S8. Fossil evidence used for estimation divergence time of Juglandaceae. The PCR amplication products of three pseudogenes. Their identity was verfied by Sanger sequencing (primers see Table S8). (e) The amino acid sequence of three pseudogenes of ten species of the Juglandaceae.