Conflicting phylogenetic signals in plastomes of the tribe Laureae (Lauraceae)

Background Gene tree discordance is common in phylogenetic analyses. Many phylogenetic studies have excluded non-coding regions of the plastome without evaluating their impact on tree topology. In general, plastid loci have often been treated as a single unit, and tree discordance among these loci has seldom been examined. Using samples of Laureae (Lauraceae) plastomes, we explored plastome variation among the tribe, examined the influence of non-coding regions on tree topology, and quantified intra-plastome conflict. Results We found that the plastomes of Laureae have low inter-specific variation and are highly similar in structure, size, and gene content. Laureae was divided into three groups, subclades I, II and III. The inclusion of non-coding regions changed the phylogenetic relationship among the three subclades. Topologies based on coding and non-coding regions were largely congruent except for the relationship among subclades I, II and III. By measuring the distribution of phylogenetic signal across loci that supported different topologies, we found that nine loci (two coding regions, two introns and five intergenic spacers) played a critical role at the contentious node. Conclusions Our results suggest that subclade III and subclade II are successively sister to subclade I. Conflicting phylogenetic signals exist between coding and non-coding regions of Laureae plastomes. Our study highlights the importance of evaluating the influence of non-coding regions on tree topology and emphasizes the necessity of examining discordance among different plastid loci in phylogenetic studies.


INTRODUCTION
Gene tree discordance is relatively common in phylogenomic studies. The conflicts can be caused by biological factors like incomplete lineage sorting (ILS), hybridization, horizontal gene transfer, gene loss, and gene duplication (Maddison, 1997;Sun et al., 2015;Gonçalves et al., 2019;Sato et al., 2019). Most relevant studies have focused on incongruent tree topologies among different genomic compartments (Sun et al., 2015;Zhao et al., 2016;Walker et al., 2019) because these genes have evolved independently and their gene tree topologies have been influenced by biological processes. By contrast, relatively few studies have focused on tree conflicts among plastid genes (e.g., Foster, Henwood & Ho, 2018;Gonçalves et al., 2019;Walker et al., 2019;Zhang et al., 2020). Usually, plastomes are considered to be uniparentally inherited and to have evolved as a single unit, free from such biological sources of conflict (Birky, 1995;Wicke et al., 2011). However, the branched and linear structure of plastid DNA, which arose from recombination-dependent replication, is indicative of recombination (Oldenburg & Bendich, 2016;Ruhlman et al., 2017). In addition, biparental inheritance and heteroplasmy (e.g., the presence of different plastomes within an individual or a cell) have been reported in seed plants (Szmidt, Aldén & Hällgren, 1987;Johnson & Palmer, 1989;Reboud & Zeyl, 1994;Carbonell-Caballero et al., 2015). Heteroplasmy may, in rare cases, give rise to heteroplasmic recombination, which has been invoked to explain gene tree discordance (Marshall, Newton & Ritland, 2001;Sullivan et al., 2017;Sancho et al., 2018). In addition to recombination events, the transfer of genes among plastid, mitochondrial and nuclear genomes; positive selection; tree length (a proxy for evolutionary rate); and GC content may also generate phylogenomic conflict (e.g., Stegemann et al., 2003;Smith, 2014;Wysocki et al., 2015;Piot et al., 2018;Saarela et al., 2018;Foster, Henwood & Ho, 2018). Aside from biological factors, non-biological factors (e.g., outlier genes, uninformative loci, and gaps) may cause conflict as well. For example, Duvall, Burke & Clark (2020) found that alternative topologies arose from alignment gaps. Given that most studies assume no conflict and treat the plastome as a single unit, taking biological and non-biological factors into consideration and quantifying the extent of conflict among different plastid loci is of great importance (Wolfe & Randle, 2004).
Owing to the rapid development of next-generation sequencing (NGS), more plastomes are becoming available at a reasonable cost, driving advances in phylogenomics and promoting a more comprehensive understanding of plant evolution (Li et al., 2019). Phylogenetic relationships among Lauraceae (Song et al., 2017), as well as many other groups (e.g., Eserman et al., 2014;Barrett et al., 2016), have been well resolved using plastome data. In phylogenomic studies of plastomes (Guo et al., 2017;Gonçalves et al., 2019;Xu et al., 2019;Li et al., 2019), plastome coding genes have generally been used, and non-coding regions have been excluded. Only a few studies have noted the potential impact of non-coding regions on tree topology. Parks, Cronn & Liston (2009) revealed that the phylogenetic position of Pinus albicaulis Engelm. based on complete plastomes differed from that based on exon sequences. A similar situation also occurred for phylogenetic relationships within Rubiaceae (Wikström, Bremer & Rydin, 2020), suggesting that there were conflicting phylogenetic signals between coding-and non-coding regions. Because tree topology is the foundation of comparative studies that infer biogeographic history, phylogenetic diversity and other evolutionary patterns (Walker et al., 2019), the influence of non-coding regions on phylogenetic inference should be evaluated.
Both ILS and hybridization are at play in tree species, which generally have high rates of outcrossing and large population sizes (Petit & Hampe, 2006;Crowl et al., 2019). Interspecific hybrids have been described in Persea (tribe Perseeae, sister to tribe Cinnamomeae and tribe Laureae), Cinnamomum and Aiouea (tribe Cinnamomeae) (van der Werff, 1984;Rohwer et al., 2019). These processes are perhaps also problematic in Laureae. When combined, such biological processes may make accurate inference of evolutionary relationships in Laureae difficult. Unfortunately, previous phylogenomic studies of Laureae have ignored potential conflicts among different plastid loci and the underlying processes that may have generated them (Zhao et al., 2018;Song et al., 2019;Tian, Ye & Song, 2019). These characteristics make Laureae an ideal group in which to explore intra-plastome conflict and its influence on phylogenetic inference.
Tribe Laureae, a species-rich group in the family Lauraceae, is phylogenetically sister to tribe Cinnamomeae . It comprises approximately 500 species and 10 genera: Actinodaphne, Adenodaphne, Dodecadenia, Iteadaphne, Laurus, Lindera, Litsea, Neolitsea, Parasassafras and Sinosassafras (Van der Werff & Richter, 1996;Chanderbali, Van der Werff & Renner, 2001;Li et al., 2004;Li et al., 2008b). Species of this tribe are evergreen or deciduous and usually occur in the form of trees or shrubs (Li et al., 2008a). Their distribution ranges from the Mediterranean region, Asia, and Oceania to North America (Li et al., 2004). Some members of Laureae have great ecological and economic value. For example, Neolitsea sericea (Bl.) Koidz. is a dominant species found in various evergreen and deciduous broadleaf mixed forests and in evergreen broadleaf forests (Wang et al., 2009), and Laurus nobilis L. has been used in remedies for centuries (Nayak et al., 2006).
Thirty-five plastomes representing 28 species and six genera of Laureae have been published (Table S1). Compared with the vast diversity of Laureae, the published plastome data for this group are relatively limited. Hence, we now report 12 newly sequenced plastomes (Table 1) and combine them with existing plastomes to address three primary  (2019). Twelve plastomes from other tribes were also downloaded (Table S1). Genomic DNA was extracted from silica-gel-dried leaf tissue using the cetyl trimethyl ammonium bromide (CTAB) method (Doyle & Doyle, 1987). The yields of genomic DNA extracts were quantified by fluorometric quantification on a Qubit instrument (Invitrogen, Carlsbad, California, USA) using the dsDNA HS kit, and the DNA size distribution was assessed visually on a 1% agarose gel. DNA libraries with an average insert size of 270 bp were prepared by the Beijing Genomics Institute (BGI, Shenzhen, China). Paired-end reads of 2× 151 bp were generated on the Illumina X ten sequencing system (Illumina Inc.).
Plastome annotation was performed using the program GeSeq -Annotation of Organellar Genomes (Tillich et al., 2017). Start and stop codons were inspected and manually adjusted in Geneious Prime when necessary. Plastomes were submitted to GenBank (MN274947, MN428456-MN428466). Maps of all 12 plastomes were drawn using the OrganellarGenomeDRAW tool (OGDRAW) (Lohse et al., 2013). A summary of the newly sequenced plastomes is presented in Table 2.
Sequences were aligned using MAFFT with default settings and manually edited with BioEdit v7.2.5 (Hall, 1999) when necessary. The best-fitting DNA substitution models for the six unpartitioned datasets were selected using ModelTest-NG (Darriba et al., 2020) under the corrected Akaike Information Critierion (AICc). The aligned sequences and selected DNA substitution models were used for ML analyses, and ML trees were constructed using RAxML-NG (Kozlov et al., 2019). We also implemented a partitioning strategy on two datasets, the CP with one IR region removed (CP-reduced) and CDS (configuration details shown in File S1). The optimal partitioning schemes for each dataset were inferred with PartitionFinder 2 (Lanfear et al., 2016), and the optimal partitioning schemes, and nucleotide substitution models for each partition were used for ML analyses in RAxML-NG.
We extracted all loci (coding regions, introns, tRNA, rRNA and intergenic spacers) using a python script (Jin, 2019) and aligned them using MAFFT with default settings. These alignments were used to infer gene trees by rapid bootstrap analyses (option -f a) in RAxML (Stamatakis, 2014) with the GTRGAMMA model. The number of bootstrap replicates was set to 1000, as Simmons & Kessenich (2019) have suggested that fewer replicates may be insufficient to find the optimal gene tree topology. The best-scoring ML trees were used to estimate the species tree with local posterior probability (LPP) (Sayyari & Mirarab, 2016) in ASTRAL III (Zhang et al., 2018).
We performed constrained maximum likelihood analyses in IQ-TREE (Nguyen et al., 2014) to obtain the ML trees that supported different topologies. To understand which loci supported the alternative topologies, we calculated site-wise log-likelihood values for each topology in RAxML using option ''-f G''. After obtaining site-wise lnL differences, we converted site-wise differences to locus-wise lnL differences ( lnL) in R 3.6.2. The lnL differences were plotted against each locus using ggplot2 (Wickham, 2016). It has been suggested that loci with an absolute lnL > 2 are statistically significant (Edwards, 1984). Therefore, we conducted separate ML analyses on datasets from which these loci (absolute lnL >2) had been removed to test whether small subsets of sequence matrices determined tree topology (Shen, Hittinger & Rokas, 2017).

Plastome features of Laureae
The sizes of the 12 newly generated Laureae plastid genomes ranged from 152,132 bp (Litsea szemaois) to 152,916 bp (Lindera erythrocarpa) (Table 2), similar to previously published Laureae plastomes (152,211-153,011 bp, Table S1). All had a typical quadripartite structure and were assembled into a single, circular and double-stranded DNA sequence (Fig. 1)  The 12 newly sequenced plastomes contained 112 single-copy genes: 78 protein-coding genes, 30 tRNA genes, and 4 rRNA genes (Table 2 and Table S2). Sixteen genes had one intron, and two genes had two introns. There were 13 duplicated genes in the IR regions (Table S2), and rps12, ycf1, and ycf2 were partly duplicated in the IR regions (Fig. 1).
Subclade II was sister to subclade I based on four unpartitioned datasets (CP, non-CDS, LSC, SSC; Fig. 2 and Figs. S2-S4, respectively). However, subclade II was sister to subclade III rather than subclade I based on the unpartitioned CDS dataset (Fig. S1). Both topologies were strongly supported. The sister relationship of subclades I and II was supported in the ML tree based on partitioned plastomes (one IR removed, CP-reduced dataset; Fig. S7), and subclade II was sister to subclade III in the ML tree based on the partitioned CDS dataset (Fig. S8), indicating that our results were robust to different partitioning schemes.
The sister relationship of subclades I and II (BS values ranging from 80% to 92%) was consistently revealed even as the percentage of gaps increased (Table S3), indicating that gaps had no impact on our tree topology. Positively selected sites were detected in 27 coding genes (Table S4). The ML tree based on CDS-reduced dataset supported a sister relationship of subclades II and III (Fig. S9), consistent with ML trees based on CDS dataset (Figs. S1 and S8), suggesting that positive selection did not affect the relationship of the three subclades.

Investigating incongruent nodes and differences in tree topology
The tree topology inferred from ASTRAL III (Fig. 3) was largely congruent with that of the ML trees ( Fig. 2 and Figs. S1-S4), except that the former showed a sister relationship of subclade I and subclade III. We performed constrained maximum likelihood analyses in IQ-TREE and obtained three suboptimal ML trees that supported the subclade II-subclade I (called T1 hereafter), subclade II-subclade III (T2) and subclade I-subclade III (T3) affinities. We extracted 243 loci and assessed how each locus supported one of the three topologies by examining the gene-wise log-likelihoods (Fig. 4). T1 was strongly supported by six loci (rpoC1 intron, trnG-trnfM, ndhA intron, psaJ-rpl33, rpl2-rpl23 and petN-psbM ; absolute lnL >2); T2 was strongly supported by three loci (psaB, trnS-ycf3 and ycf2; absolute lnL >2); and T3 was moderately supported by one locus (clpP intron1; absolute lnL >1 and <2) ( Table S5). The sum of absolute lnL of T1 was higher than that of T2 and T3 (Fig. 4), suggesting that our data support the topology of T1 rather than T2 or T3. After the removal of six loci (rpoC1 intron, trnG-trnfM, ndhA intron, psaJ-rpl33, rpl2-rpl23 and petN-psbM ), a sister relationship of subclade II and subclade III was revealed (Fig. S10). After the removal of three loci (psaB, trnS-ycf3, and ycf2), subclade II was sister to subclade I (Fig. S11). These results underscore the decisive role played by small subsets of loci in phylogenetic inference.
The topological tests showed that T2 did not differ significantly from T1 (p > 0.05, Table S6). T3 was statistically rejected by the KH and AU tests (p < 0.05) but not by the Shimodaira-Hasegawa (SH) test (p = 0.0505). That T3 was rejected according to the KH and AU tests suggests that the sister relationship between subclades I and III may be misleading.

Plastome features
It has been noted that most plastid genomes of land plants and algae range from 120 to 160 kilobase pairs (kb) in length (Palmer, 1985). In this study, the plastid genome sizes of 12 species from five Laureae genera ranged from 152,132 bp to 152,916 bp, indicating that plastid genome size was conserved within Laureae. GC content was highest in the IR region rather than in the single copy regions, owing to the presence of a ribosomal RNA gene cluster in the IR region, consistent with a previous study (Huotari & Korpelainen, 2012). GC contents of the IR, LSC and SSC regions of the newly sequenced plastomes were identical to those of nine Lindera species studied earlier (Zhao et al., 2018). In contrast to the gene losses recognized in several Lauraceae lineages (Song et al., 2017), our analysis revealed that gene content among Laureae was highly conserved. Song et al. (2017) suggested that plastome contraction in Lauraceae was largely driven by fragment loss events in the IR regions. In our study, we found no gene loss among Laureae plastomes.

Phylogenetic relationships within Laureae
Previous phylogenetic studies (Song et al., 2017;Zhao et al., 2018) based on complete plastomes suggested that Laureae was sister to Cinnamomeae and that together they were sister to Perseeae. The same phylogenetic relationships among these groups were recognized in our study (Figs. 2 and 3). In previous work, Actinodaphne and Neolitsea were resolved as monophyletic groups based on matK, ITS and rpb2 (Fijridiyanto & Murakami, 2009a;Fijridiyanto & Murakami, 2009b), but Actinodaphne was not a monophyletic group based on complete plastid genomes . In this study, the non-monophyletic status of Actinodaphne was supported. The conclusion of Actinodaphne monophyly may have been caused by sampling bias in previous studies (Fijridiyanto & Murakami, 2009b;Fijridiyanto & Murakami, 2009a). The monophyly of Neolitsea can be neither rejected nor supported in the present study. Actinodaphne cupularis (Hemsl.) Gamble was grouped with Neolitsea oblongifolia Merr. et Chun, N. pallens and N. chui Merr. with low bootstrap support (54%; Fig. 2), and sampling of Neolitsea and related genera was limited. Lindera and Litsea were polyphyletic in our analysis, consistent with previous studies (Li et al., 2008b;Fijridiyanto & Murakami, 2009b). The phylogenetic position of P. confertiflorum was unresolved based on ETS and ITS (Li et al., 2008b), and the ambiguity of its position still remains, despite the integration of complete plastid genomes in our analysis and a previous study (Liao, Ye & Song, 2018). Subclade III was sister to subclade I and II in our study, consistent with previous analyses (Zhao et al., 2018;Song et al., 2019;Tian, Ye & Song, 2019). The three Lindera species in subclade I share common morphological traits, such as alternate and pinninerved leaves, a persistent involucre, vegetative terminal buds in inflorescences and 3-merous flowers (Li et al., 2008a). However, these characters also occur in several members of the other two subclades (e.g., Lindera benzoin (L.) Bl. and Laurus nobilis), perhaps resulting from convergent and/or parallel evolution (Li et al., 2008b). These traits are not good indicators for delimiting the three subclades of Laureae. In subclade III, the trinerved or triplinerved species of Lindera (Lindera aggregata, L. chunii, L. fragrans, L. limprichtii, L. pulcherrima, L. supracostata, L. thomsonii and L. thomsonii var. vernayana) formed a well-supported clade in both our study and that of Tian, Ye & Song (2019). However, triplinerved leaves also exist in most species of Neolitsea (Li et al., 2008b;Li et al., 2008a). Therefore, traditional morphological traits are of limited use in taxon delimitation, even within subclades of Laureae. Given the limited samples and data in our analyses, more sampling and DNA sequences are needed to further elucidate the relationships within Laureae.

Phylogenetic incongruence in the plastome
Although many studies have treated plastid protein-coding genes or the complete plastome as a single unit (e.g., Song et al., 2019;Tian, Ye & Song, 2019), potential conflicts among sequence types (i.e., coding vs. non-coding regions) have been reported in several studies. By comparing phylogenies based on complete plastomes and coding regions (Yu et al., 2017), it was inferred that non-coding regions did not significantly influence the tree topology of Theaceae. By contrast, non-coding regions had an impact on the phylogenetic relationships of several tribes in Rubiaceae (Wikström, Bremer & Rydin, 2020) and subtribes in Poaceae (Saarela et al., 2018). A conflicting signal between coding and non-coding regions was also reported in Leguminosae (Zhang et al., 2020). In this study, inclusion of non-coding regions altered tree topology in the tribe Laureae, suggesting the existence of a conflicting signal between coding and non-coding regions. Non-coding regions are often discarded for being uninformative, or for being misleading due to saturation at deep time scales (Foster, Henwood & Ho, 2018). In our study, tree topologies based on coding and non-coding regions were largely congruent, except for the relationships among the three subclades (Figs. S1-S2), indicating that non-coding regions are as informative as coding regions in Laureae. Thus, it is imperative to evaluate the influence of non-coding regions on tree topology rather than treating the whole plastome as a single unit or simply excluding non-coding regions from phylogenetic analysis.
To accommodate the conflicts among different plastid regions, a species tree was inferred through summary coalescent analysis. It has been suggested that the coalescent method is more robust than the concatenation method when the level of ILS is high (Liu, Xi & Davis, 2014;Mirarab, Bayzid & Warnow, 2014). High ILS tends to occur when the time interval between consecutive speciation events is short (Sun et al., 2015;Sato et al., 2019), and the core Lauraceae group (Perseeae, Cinnamomeae and Laureae) is thought to have undergone a rapid radiation (Chanderbali, van der Werff & Renner, 2001;Rohwer & Rudolph, 2005;Nie, Wen & Sun, 2007). We therefore chose to implement the coalescent method. Nonetheless, it should be noted that, with this method, short and uninformative loci may lead to problematic gene trees and therefore result in a less accurate species tree (Xi, Liu & Davis, 2015;Springer & Gatesy, 2016). In our study, only nine of 243 loci (rpoC1 intron, trnG-trnfM, ndhA intron, psaJ-rpl33, rpl2-rpl23, petN-psbM, psaB, trnS-ycf3, and ycf2) had a strong phylogenetic signal at the contentious node. The other 234 loci with weak phylogenetic signals may have resulted in gene trees with uncertainties and led to inaccurate topology at this node.
Exploration of the factors that underlie conflicts in phylogenetic signals is of great importance-but it is also challenging. Previous studies have examined whether biological and non-biological factors contribute to such conflicts (e.g., Duvall, Burke & Clark, 2020;Zhang et al., 2020). For example, gaps have been found to cause alternate, but conflicting topologies in Poaceae (Duvall, Burke & Clark, 2020). However, the inclusion of alignment gaps did not alter our tree topology (Table S3). Although previous studies indicated that partitioning improves phylogenetic inference (Xi et al., 2012), ML tree topologies based on partitioned and unpartitioned datasets did not differ significantly in our study. It has been suggested that plastid genes under positive selection may bias phylogenies (e.g., Piot et al., 2018;Saarela et al., 2018), however, we found that the relationship among subclades I, II and III was not affected by positively selected sites, suggesting that positive selection was not the cause of tree conflicts. In this study, the low support values and short branch lengths of the estimated species tree (Fig. 3) suggested that each locus had a significantly incongruent topology and may indicate the existence of ILS. High levels of ILS are thought to yield similar numbers of loci supporting alternative topologies (Huson et al., 2005). In our study, the numbers of loci supporting each topology were different (six for T1, three for T2, and zero for T3 after exclusion of loci with absolute lnL ≤ 2), suggesting that ILS may not be the primary cause of the discordance among loci. Another plausible explanation for the conflict is heteroplasmic recombination, which can occur in species with biparental plastome inheritance (Walker et al., 2019). Although heteroplasmic recombination has been reported with clear evidence in Brachypodium and Picea (Sullivan et al., 2017;Sancho et al., 2018), to our knowledge it has never been documented in Lauraceae. Based on the data reported here, it is too early to draw a firm conclusion about the causes of the conflict in phylogenetic signals. Although fully resolved phylogenies may still remain elusive based on different genomic compartments (i.e., nuclear, mitochondrial and plastid), phylogenomic studies that incorporate these compartments can provide new insights into tree discordance and its underlying causes (Koenen et al., 2020). Therefore, more genetic information (e.g., nuclear genes) will be required to solve this problem in future work.

CONCLUSION
In summary, this study revealed that Laureae plastomes are conserved in structure, size and gene content. A conflicting phylogenetic signal was detected between coding and non-coding regions, suggesting that the plastid genome should not be treated as a single unit. ML trees based on coding and non-coding regions were largely congruent except at the contentious node, indicating that coding regions are as informative as non-coding regions and that the influence of non-coding regions on tree inference should be evaluated. We also found that small subsets of plastome loci determined the topology at specific nodes, consistent with the results of a previous study (Shen, Hittinger & Rokas, 2017). Through quantification and analysis of intra-plastome conflicts, the sister relationship of subclade I (including Lindera communis, L. glauca and L. nacusua) and II (including Laurus azorica, L. nobilis, Lindera megaphylla, Litsea acutivena, L. glutinosa, L. monopetala and L. pungens) was supported by our study. Biological factors may contribute to the conflicts among plastid loci; however, more information is needed to determine the underlying mechanism(s).
• Yong Xu and Lu Jin analyzed the data, prepared figures and/or tables, and approved the final draft.
• Tong-Jian Liu and Hai-Fei Yan analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Xue-Jun Ge conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

DNA Deposition
The following information was supplied regarding the deposition of DNA sequences: The 12 newly sequenced plastomes are available at GenBank: MN274947, MN428456-MN428466.

Data Availability
The following information was supplied regarding data availability: The 12 newly sequenced plastomes in our study are available in Data S1. Voucher specimens were deposited in the herbarium of the South China Botanical Garden (IBSC) and can be searched by collection number: XTBGLQM0236, XTBGLQM0582, 180923, XTBGLQM0095, WBGQXJ001, XTBGLQM0653, CFL2678, XTBGLQM0687, WBGQXJ124, XTBGLQM0692, 18371(see also Table 1).

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.10155#supplemental-information.