Plastid genomics of Nicotiana (Solanaceae): insights into molecular evolution, positive selection and the origin of the maternal genome of Aztec tobacco (Nicotiana rustica)

Species of the genus Nicotiana (Solanaceae), commonly referred to as tobacco plants, are often cultivated as non-food crops and garden ornamentals. In addition to the worldwide production of tobacco leaves, they are also used as evolutionary model systems due to their complex development history tangled by polyploidy and hybridization. Here, we assembled the plastid genomes of five tobacco species: N. knightiana, N. rustica, N. paniculata, N. obtusifolia and N. glauca. De novo assembled tobacco plastid genomes had the typical quadripartite structure, consisting of a pair of inverted repeat (IR) regions (25,323–25,369 bp each) separated by a large single-copy (LSC) region (86,510–86,716 bp) and a small single-copy (SSC) region (18,441–18,555 bp). Comparative analyses of Nicotiana plastid genomes with currently available Solanaceae genome sequences showed similar GC and gene content, codon usage, simple sequence and oligonucleotide repeats, RNA editing sites, and substitutions. We identified 20 highly polymorphic regions, mostly belonging to intergenic spacer regions (IGS), which could be suitable for the development of robust and cost-effective markers for inferring the phylogeny of the genus Nicotiana and family Solanaceae. Our comparative plastid genome analysis revealed that the maternal parent of the tetraploid N. rustica was the common ancestor of N. paniculata and N. knightiana, and the later species is more closely related to N. rustica. Relaxed molecular clock analyses estimated the speciation event between N. rustica and N. knightiana appeared 0.56 Ma (HPD 0.65–0.46). Biogeographical analysis supported a south-to-north range expansion and diversification for N. rustica and related species, where N. undulata and N. paniculata evolved in North/Central Peru, while N. rustica developed in Southern Peru and separated from N. knightiana, which adapted to the Southern coastal climatic regimes. We further inspected selective pressure on protein-coding genes among tobacco species to determine if this adaptation process affected the evolution of plastid genes. These analyses indicate that four genes involved in different plastid functions, including DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF), came under positive selective pressure as a result of specific environmental conditions. Genetic mutations in these genes might have contributed to better survival and superior adaptations during the evolutionary history of tobacco species.


INTRODUCTION
Nicotiana L. is the fifth largest genus in the megadiverse plant family Solanaceae, comprising 75 species (Olmstead et al., 2008;Olmstead & Bohs, 2007), which were subdivided into three subgenera and fourteen sections by Goodspeed (1954). The subgenera of Nicotiana, as proposed by Goodspeed (1954), were not monophyletic (Aoki & Ito, 2000;Chase et al., 2003), but most of Goodspeed's sections were natural groups. The formal classification of the genus has been refined to reflect the growing body of evidence that Nicotiana consists of 13 sections (Knapp, Chase & Clarkson, 2004). One significant utilization of Nicotiana species has been as a source of genetic diversity for improving one of the most widely cultivated non-food crops, common tobacco (N. tabacum L.). This species is of major economic interest and is grown worldwide for its leaves used in the manufacture of cigars, cigarettes, pipe tobacco, and smokeless tobacco products consumed by more than one billion people globally (Lewis, 2011;Occhialini et al., 2016). While N. tabacum is the most notable commercial species, several additional species are also cultivated for smoking (N. rustica L.) and ornamental (N. sylvestris Spegazzini & Comes) or industrial (N. glauca Graham) purposes (Lester & Hawkes, 2001). Aztec or Indian tobacco (N. rustica), characterized by short yellowish flowers and round leaves, is widely cultivated in Mexico and North America. It was the first tobacco species introduced to Europe in the 16th century, but later superseded by N. tabacum for its milder taste (Shaw, 1960). Known as ''o-yen'-kwa hon'we'' (real tobacco) by North American Iroquois (Kell, 1966), it was used for medicinal and ritual purposes or even in weather forecasting (Winter, 2001). Aztec tobacco is still cultivated in South America, Turkey, Russia and Vietnam due to its tolerance to adverse climatic conditions (Sierro et al., 2018).
Some members of Nicotiana offer several research advantages, including extensive phenotypic diversity, amenability to controlled hybridizations and ploidy manipulations, high fecundity, and excellent response to tissue culture (Lewis, 2011). Consequently, N. tabacum and N. benthamiana Domin have become model organisms in the generation of new knowledge related to hybridization, cytogenetics, and polyploid evolution (Goodin et al., 2008;Zhang et al., 2011;Bally et al., 2018;Schiavinato et al., 2019). The first complete plastome nucleotide sequence was published in 1986 for N. tabacum (Shinozaki et al., 1986). Since then, the structure and composition of plastid genomes has become widely utilized in identifying unique genetic changes and evolutionary relationships of various groups of plants. Furthermore, plastid genes have also been linked with important crop traits such as yield and resistance to pests and pathogens (Jin & Daniell, 2015).
translocations (Lim et al., 2004;Kovarik et al., 2012), while additivity can be observed in the 5S and 35S rDNA loci respect to its progenitors (Lim et al., 2007), which were homogenized by concerted evolution (Kovarik et al., 2004). These analyses did not provide further evidence for the maternal parent of N. rustica but suggested either N. knightiana L. or N. paniculata could be the maternal donor.
Here, we assembled the plastid genome of five Nicotiana species and compared their sequences to gain insight into the plastid genome structure of genus Nicotiana. We also inferred the phylogenetic relationship of the genus and investigated the selection pressures acting on protein-coding genes. We then identified mutational hotspots in the Nicotiana plastid genome that might be used for the development of robust and cost-effective markers in crop breeding or taxonomy. Lastly, we used this information to trace the origin of the maternal genome of the allopolyploid Nicotiana rustica.

Comparative genome analysis and RNA editing prediction
Novel plastid genome sequences were compared through multiple alignments using MAFFT v7 (Katoh & Standley, 2013). All parts of the genome, including intergenic spacer regions (IGS), introns, protein-coding genes, and ribosomal RNAs and tRNAs, were considered for comparison. Each part was extracted and used to determine nucleotide diversity in DnaSP v6 (Rozas et al., 2017). Substitutions, transition and transversion rates were also determined compared to the N. tabacum reference using Geneious R8.1. Structural units of the plastid genome (LSC, SSC and IR) were individually aligned to determine the rate of substitutions and to further search for indels using DnaSP v6. Structural borders of plastid genomes were compared for 10 selected Nicotiana species using IRscope with option ''GB file upload'' and default settings (Amiryousefi, Hyvönen & Poczai, 2018b). The online software PREP-cp (Putative RNA Editing Predictor of Chloroplast) was used with default settings to determine putative RNA editing sites (Mower, 2009). Codon usage and amino acid frequencies were determined by Geneious R8.1.

Repeats analyses
Microsatellites within the plastid genomes of five Nicotiana species were identified using MISA (Beier et al., 2017) with a minimal repeat number of 7 for mononucleotide repeats, 4 for dinucleotide repeats and 3 for tri-, tetra-, penta-and hexanucleotide SSRs. We also used REPuter (Kurtz et al., 2001) with minimal repeat size set to 30 bp, Hamming distance set to 3, minimum similarity percentage of two repeat copies up to 90%, and a maximum computed repeat of 500 for scanning and visualizing forward (F), reverse (R), palindromic (P) and complementary (C) repeats. Tandem repeats were found with the Tandem Repeats Finder using default parameters (Benson, 1999).

Non-synonymous (K a ) and synonymous (K s ) substitution rate analysis
To determin K a and K s , protein-coding genes were extracted from the newly assembled Nicotiana plastid genomes and aligned using MAFFT with the corresponding genes of the previously published plastid genome of N. tabacum (Z00044.2) as a reference and analyzed using DnaSP v6. The data were interpreted in terms of purifying selection (K a /K s < 1), neutral evolution (K a /K s = 1), and positive selection (K a /K s > 1).
We evaluated the impact of positive selection using additional codon models to estimate the rates of synonymous and nonsynonymous substitution. The signs of positive selection were further assessed using fast unconstrained Bayesian approximation (FUBAR) (Murrell et al., 2013) and the mixed effects model of evolution (MEME) (Murrell et al., 2012) as implemented in the DATAMONKEY web server (Delport et al., 2010). Sites with cut-off values of PP > 0.9 in FUBAR were considered as candidates to have evolved under positive selection. From all the analyses performed in DATAMONKEY, the most suited model of evolution for each dataset was selected as directly estimated on this web server. In addition, the mixed effects model of evolution (MEME), a branch-site method incorporated in the DATAMONKEY server, was used to test for both pervasive and episodic diversifying selection. MEME applies models with variable ω across lineages at individual sites, restricting ω to ≤ 1 in a proportion p of branches and unrestricted at a proportion (1 − p) of branches per site. Positive selection was inferred with this method for p values < 0.05 using the false discovery rate (FDR) correction according to Benjamini & Hochberg (1995) in Microsoft Excel.

Phylogenetic analyses
Plastid genome sequences of the genus Nicotiana were selected from the Organelle Genome Resources of the NCBI (accessed on 21.02.2019) and used in phylogenetic inference along with de novo assembled sequences of Nicotiana. The x = 12 clade includes the traditional subfamily Solanoideae plus Nicotiana with the Australian endemic Anthocercideae tribe and takes its name from the synapomorphy of chromosome numbers based on 12 pairs (Olmstead et al., 2008). Nicotiana and Anthocercideae appear to be in a first branching position in the x = 12 clade, thus we have chosen Solanum dulcamara L. (Amiryousefi, Hyvönen & Poczai, 2018a) from the Solanoideae tribe with a curated plastid genome as an outgroup for rooting our phylogenetic tree. For the species included in our analysis, coding alignments were constructed from the excised plastid genes using MACSE (Ranwez et al., 2011), including 75 protein-coding genes (Table S2). For phylogenetic analysis, a 75,449bp concatenated matrix was used with the best fitting GY+F+I+G4 model determined by ModelFinder (Kalyaanamoorthy et al., 2017) according to the Akaike information criterion (AIC), and Bayesian information criterion (BIC). Maximum likelihood (ML) analyses were performed with IQ-TREE (Nguyen et al., 2015) using the ultrafast bootstrap approximation (UFBoot;Hoang et al., 2018) with 1,000 replicates. The key idea behind UFBoot is to keep trees encountered during the ML-tree search for the original sequence alignment and to use them to evaluate the tree likelihoods for the bootstrap sequence alignment. UFBoot provides relatively unbiased bootstrap estimates under mild model misspecifications and reduces computing time while achieving more unbiased branch supports than standard bootstrap (Hoang et al., 2018). TreeDyn was used for further enhancement of the phylogenetic tree (Dereeper et al., 2008;Lemoine et al., 2019).
Relative divergence times were estimated for species N. rustica and putative parental species using BEAST v. 1.8.4 (Drummond et al., 2012), applying GTR + I + G rate substitution to the protein-coding plastid gene matrix. A Yule speciation tree prior and a uncorrelated relaxed clock model that allows rates to vary independently along branches (Drummond et al., 2006) were used, with all other parameters set to default. The median time split between S. dulcamara and N. undulata (mean = 25 Myr; standard deviation = 0.5) was used as a temporal constraint to calibrate the BEAST analyses derived from the Solanaceae-wide phylogeny of Särkinen et al. (2013). Uncertainty regarding this date was incorporated by assigning normal prior distributions to the calibration point (Couvreur et al., 2008;Evans et al., 2014). Four independent BEAST runs were conducted with Markov Chain Monte Carlo (MCMC) samples based on 10 million generations, sampling every 10,000 generations. Convergence of all parameters was assessed in Tracer 1.5 (Rambaut et al., 2014) and 10% of each chain was removed as burn-in. The Markov chains were combined in LogCombiner 1.7.2. (Drummond et al., 2012) to calculate the maximum clade credibility tree.
We defined six biogeographical areas based on the Köppen-Geiger climate classification and further biogeographic evidence and distributions: (A) Colombian/Ecuadorian mountain range mixed equatorial (Af), monsoon (Am), and temperate oceanic climate (Cfb); (B) Northern Peruvian mountain range with tropical savanna climate (Aw); (C) Central Peru with equatorial climate (Af); (D) Coastal Peru with cold semi-arid and desert climate (Bsk, BWk); (E) Peruvian Mountain range with humid subtropical/oceanic highland climate (Cwb); and (F) Bolivian/Chilean alpine/mountain range with mixed semi-arid cold (Bsk, BWk) and humid subtropical climate (Cwa). These areas were used in the Bayesian Binary Method (BBM) model implemented in RASP (Yu, Blair & He, 2020) to investigate the biogeographic history of the selected four Nicotiana species. BBM infers ancestral area using a full hierarchical Bayesian approach and hypothesizes a special ''null distribution'', meaning that an ancestral range contains none of the unit areas (Ronquist, 2004). The analysis was performed on the BEAST maximum clade credibility tree using default settings, i.e., fixed JC + G (Jukes-Cantor + Gamma) with null root distribution. Ancestral area reconstruction for each node was manually plotted on the BEAST tree using pie charts. Species distributions were determined from data stored in the Solanaceae Source Database (http://solanaceaesource.org/) and Global Biodiversity Information Facility (GBIF) (https://www.gbif.org/).

Characteristics of Nicotiana plastid genomes
The genome size of the assembled complete plastid genomes ranged between 155,689 bp (N. paniculata) and 156,022 bp (N. obtusifolia), while reads provided 327 to 1,951x coverage (Table S1). Genomes harbored 133 unique genes, of which 18 genes were duplicated in the IR region (Table S2, Fig. 1A). Out of 133 genes, 85 were protein-coding, 37 were tRNA genes and 8 were rRNA genes. Among 18 duplicated genes in the IR region, 7 were protein-coding, 7 were tRNA genes and 4 were rRNA genes. From the protein-coding genes, 18 contained introns, while rps 12 was a trans-spliced gene with its 1st exon found in the LSC and the 2nd and 3rd exons found in the IR region. Structural elements of the IR region also showed the highest GC content (43.2%) compared to the LSC (35.9%) and SSC (32.1%) (Table S1). This finding could be attributed to the presence of tRNA (52.9%) and rRNA (55.4%) genes in inverted repeats.
The nucleotide composition comparison of Nicotiana genomes revealed high synteny among all regions, including not only the LSC, SSC, IR and CDSs, but interestingly also in non-coding regions. Detailed comparison of the base composition of each region is shown in Table S3. All amino acid sequences in Nicotiana plastid genomes were rich in AT bases and coded a higher percentage of hydrophobic amino acids compared to acidic amino acids (Fig. 1B). Codon usage and frequency of amino acids revealed that leucine is the most abundant and cysteine the least encoded amino acid in these genomes (Fig. 1B). At the 3rd codon position the frequency of A/T codons was higher compared to C/G (Table S4).
The number of RNA editing sites predicted using PREP-cp varied between 34 and 37, distributed among 15 genes (see Table S3). From these genes, the most RNA-edited sites were possessed by ndhB (9), followed by ndhD (6-8) and rpo B (4). The ndhD gene revealed a fraction of variation among species: N. knightiana, N. rustica and N. paniculata having six RNA editing sites, whereas seven were observed in N. obtusifolia and eight in N. glauca. Most of the RNA editing sites were C to U edits on the first and second base of the codons, with the frequency of second base codon edits being much higher. These changes helped in the formation of hydrophobic amino acids, for example valine (V), leucine (L) and phenylalanine (F), with conversions from serine to leucine being the most frequent. (Table S5).

IR contraction and expansion
The

Non-synonymous (K a ) and synonymous (K s ) substitution rate analysis
The synonymous/non-synonymous substitution rate ratio is widely used as an indicator of adaptive evolution or positive selection (Kimura, 1979). We have calculated the K s , K a and K a /K s ratio for 77 protein-coding genes for five selected Nicotiana species using N. tabacum as a reference. Among the analyzed genes, 31 had K s = 0, 19 had K a = 0, and 39 genes had both K s and K a = 0 values. Of the investigated genes, 13 genes showed K a /K s > 1 in at least one species (Table S4). We selected these genes for further analysis using FUBAR and MEME. FUBAR estimates the number of nonsynonymous and synonymous substitutions at each codon given a phylogeny, and provides the posterior probability of every codon belonging to a set of classes of ω (including ω = 1, ω < 1 or ω > 1) (Murrell et al., 2013). MEME estimates the probability for a codon to have undergone episodes of positive evolution, allowing the ω ratio distribution to vary across codons and branches in the phylogeny. This last attribute allows identification of the proportion of codons that may have been evolving neutrally or under purifying selection, while the remaining codons can also evolve under positive selection (Murrell et al., 2012). The two models indicated positive selection on the codons only found in atpB, ndhD, ndhF and rpoA (Table 1). Thus, the methods described suggested six amino acid replacements altogether as candidates for positive selection, of which three were fixed in all Nicotiana, and three were restricted to diverse groups of species (see Table 1).  Notes. α, the mean synonymous substitution rate at a site; β, the mean non-synonymous substitution rate at a site; PP, posterior probability of positive selection at a site; LRT, Likelihood ratio test for episodic (positive) diversification; FDR, false discovery rate (FDR = 5%). Nat, N. attenuata; Ngla, N. glauca; Nkni, N. knightiana; Nobt, N. obtusifolia; Noto, N. otophora; Npan, N. paniculata; Nrus, N. rustica; Nsyl, N. sylvestris; Ntab, N. tabacum; Ntom, N.

Repetitive sequences in novel Nicotiana plastid genomes
Repeat analysis performed with MISA revealed high similarity in chloroplast microsatellites (cpSSRs) ranging from 368 to 384 among the species. The majority of the SSRs in these plastid genomes were mononucleotide rather than trinucleotide or dinucleotide repeats. The most dominant of the SSRs were mononucleotide A/T motifs, while the second most predominant were dinucleotide AT/TA motifs. Mononucleotide SSRs varied from 7to 17-unit repeats, dinucleotide SSRs varied from 4-to 5-unit repeats, and other SSRs types present mainly in 3-unit repeats. Most SSRs were located in the LSC and were less frequent in the IR and SSC (Fig. S1 and Table S7). Locating repeats with REPuter revealed 117 oligonucleotide repeats evenly dispersed among the species, ranging from 21 (N.  Table S8). The non-coding IGS regions also contained most of the tandem repeats (Fig. S3).

Single nucleotide polymorphism and insertion/deletion analyses in Nicotiana
To discover polymorphic regions (mutational hotspots), the CDS, intron and IGS regions of the whole plastid genome of five Nicotiana species were compared. Nucleotide diversity values varied from 0 (ycf3) to 0.306 (rps12 intron region) (Fig. 2). High polymorphism was found in intronic regions (average π = 0.1670) compared to IGS (π = 0.031) and CDS regions (average π = 0.002). We further investigated the number and occurrence of substitution types in the plastid genomes using N. tabacum as a reference and encountered 509 (N. galuca) to 861 (N. paniculata) substitutions along the entire plastid genome. Most of the conversions were A/G and C/T single nucleotide polymorphisms (SNPs) ( Table 2). A detailed description of the ratio of transition to transversion substitutions (Ts/Tv) can be found in Table S9. In addition to the distribution of SNPs, we examined insertions and deletions (indels) and located 107 (N. rustica) to 143 (N. obtusifolia) polymorphisms across the compared genomes (Table 3). Based on this comparison we successfully determined 20 highly polymorphic regions that might be used as potential markers in Nicotiana species barcoding (Table 4).

Phylogenetic analyses
Phylogenetic analysis with Nicotiana plastid genomes was performed with the maximum likelihood method based on 75 selected and concatenated protein-coding genes. Our phylogenetic analyses resulted in a highly resolved tree (Fig. 1D). Almost all the recovered clades had maximum branch support values reconstructed based on alignment size of 75,449 bp with best fit model GY+F+I+G4. We further concentrated on the species phylogeny of N. rustica and putative parental species, where relative divergence times were estimated using a relaxed uncorrelated clock implemented in BEAST. This analysis found that the divergence of N. undulata appeared 5.36 million years ago (Ma) (highest posterior  . This analysis showed that the Nicotiana species included in the analysis are not older than the end of the Pliocene and that most subsequent evolution must have occurred in the Pleistocene. The timing of these lineage splits, in addition to the current distributions of four closely related species, were used to infer the progression of migratory steps in RASP (Fig. 3). The most recent common ancestor (MRCA) area illustrated a dispersal event for N. paniculata in Northern (B) and Southern Peru (E) and the vicariance of N. knightiana in Coastal Peru (D). The overall dispersal pattern of the examined species showed a south-to-north expansion pattern from Central Peru to Colombia and Ecuador (N. rustica) to Bolivia (N. undulata).

Molecular evolution of Nicotiana plastid genomes
We compared plastid genomes from five Nicotiana species, which revealed similar genomic features. These comparative analyses produced an insight into the phylogeny and evolution of Nicotiana species. The GC content of the novel Nicotiana plastid genomes were similar to those previously reported (Sugiyama et al., 2005;Yukawa, Tsudzuki & Sugiura, 2006); the GC content was high in the IR, which might be a result of the presence of ribosomal RNA (Qian et al., 2013;Cheng et al., 2017;Zhao et al., 2018). The genome organization, gene order and content also showed high similarity and synteny with sequences previously published for N. slyvestris and N. tabacum (Sugiyama et al., 2005;Yukawa, Tsudzuki & Sugiura, 2006). This could be attributed to plastid genomes of land plants having a conserved structure but with diversity prevailing at the border position of LSC/SSC/IR. However, examining the IR junction sites of Nicotiana species also showed similarities with some variation prevalent in N. tomentosiformis, which has 60 bp in the IRb region, while the rps19 gene is present entirely in the LSC compared to the N. tabacum reference. Such fluctuations at the border positions of various regions of the plastid genome might be helpful in determining evolutionary relationships or could be indicators of environmental adaptation of species (Menezes et al., 2018). Liu et al. (2018) reported that the similarities in the junction regions may be useful for explaining the relationship between species, and that plants with a high level of relatedness show minimal fluctuations in the junctions of the plastid genome. In this respect, the high resemblance of the IR junction sites reveals a close relationship of Nicotiana species. Repeats in the plastid genome are useful in evolutionary studies and play a vital role in genome arrangement (Zhang et al., 2016). We detected the presence of large amounts of mononucleotide repeats (A/T), and trinucleotide SSRs (ATT/TAA) in the five analyzed species of Nicotiana, which may be a result of the A/T-richness of the plastid genome. A similar result was also reported in N. otophora L. (Asaf et al., 2016). In all the species of Nicotiana, the LSC region contained a greater amount of SSRs in comparison to SSC and IR, which has also been demonstrated in other studies of angiosperm plastid genomes (Asaf et al., 2016;Shahzadi et al., 2019;Mehmood et al., 2019;Yang et al., 2019). To understand molecular evolution, it is important to analyze nucleotide substitution rates (Muse & Gaut, 1994); in plastid genomes LSC and SSC regions are more prone to substitutions and indels, whereas the IRs are more conserved (Ahmed et al., 2012;Abdullah et al., 2019b). Our results corroborate this finding, indicating the IR region is more conserved, and most of the substitutions occur in the LSC and SSC regions. Similar results have been shown in the plastid genome of yam (Dioscorea polystachya Turcz.) (Cao et al., 2018).
Divergence hotspot regions of the plastid genome could be used to develop accurate, robust and cost-effective molecular markers for population genetics, species barcoding, and evolution studies (Ahmed et al., 2013;Ahmed, 2014;Nguyen et al., 2017). Previous studies identified several polymorphic loci based on a comparison of plastid genomes, which have provided suitable information for the development of further molecular markers (Choi, Chung & Park, 2016;Li et al., 2018;Menezes et al., 2018). We found 20 polymorphism loci in Nicotiana that were more polymorphic than the frequently used rbcL, and mat K markers. For example, infA, rps12 intron and rps16 -trnQ-UUG had nucleotide diversities of 0.2594, 0.1527 and 0.0845, respectively (Table 4). These regions might show great potential as markers for population genetics and phylogenetic analyses in the genus Nicotiana.

Positive selection on Nicotiana plastid genes
Plants have evolved complex physiological and biochemical adaptations to adjust and adapt to different environmental stresses. Nicotiana, originating in South America, has spread to many regions of the world and members of the genus have successfully adapted to survive in harsh environmental conditions. This large variation in their distributional range has induced distinctive habits and morphology in inflorescence and flowers, indicative of the physiological specialization to the area where they evolved. Desert ephemeral Nicotiana species are short while subtropical perennials have tall and robust habits with variable inflorescences ranging from pleiochasial cymes to solitary flowers and diffuse panculatecymose mixtures. For example, members of Nicotiana section Suaveolentes, evolving in isolation, faced several cycles of harsh climate change. In Australia, the native range of the species, a predominantly warm and wet environment went through intensive aridification (Poczai, Hyvönen & Symon, 2011). Throughout this climate change and increasing central aridification, many species either retreated to the wetter coastline or adapted to and still survive in this hostile inland environment (Bally et al., 2018). Tobacco plants also developed specialized biosynthetic pathways and metabolites, such as nicotine, which serve complex functions for ecological adaptations to biotic and abiotic stresses, most importantly serving as a defense mechanism against herbivores (Xu et al., 2017). Nicotiana is thus a rich reservoir of genetic resources for evolutionary biological research since several members of the genus have gone through changing climatic events and adapted to environmental fluctuations.
The patterns of synonymous (K s ) and non-synonymous (K a ) substitution of nucleotides are essential markers in evolutionary genetics defining slow and fast evolving genes (Kimura, 2006). K a /K s values >1, =1, and <1 indicate positive selection, neutral evolution and purifying selection, respectively (Lawrie et al., 2013). Many proteins and RNA molecules encoded by plastid genomes have undergone purifying selection since they are involved in important functions of plant metabolism, self-replication, and photosynthesis and therefore play a pivotal role in plant survival (Piot et al., 2018). Departure from the main purifying selection in the case of plastid genes might happen in response to certain environmental changes when advantageous genetic mutations can contribute to better survival and adaptation. The K a /K s ratios in our analysis indicate changes in selective pressures. The genes atpB, ndhD, ndhF and rpoA had greater K a /K s values (>1), possibly due to positive selective pressure as a result of specific environmental conditions. This was conclusively supported by an integrative analysis using Fast Unconstrained Bayesian AppRoximation (FUBAR) and Mixed Effects Model of Evolution (MEME) methods, which identified a set of positively selected codons in these genes.
These genes are involved in different plastid functions, such as DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF). The rpoA gene encodes the alpha subunit of PEP, which is believed to predominantly transcribe photosynthesis genes (Hajdukiewicz, Allison & Maliga, 1997). The transcripts of plastid genes encoding the PEP core subunits are transiently accumulated during leaf development (Kusumi et al., 2011), thus the entire rpoA polycistron is essential for chloroplast gene expression and plant development (Zhang et al., 2018). The housekeeping gene atpB encodes the β-subunit of the ATP synthase complex, which has a highly conserved structure that couples proton translocation across membranes with the synthesis of ATP (Gatenby, Rothstein & Nomura, 1989), which is the main source of energy for the functioning of plant cells. In chloroplasts, linear electron transport mediated by PSII and PSI produces both ATP and NADPH, whereas PSI cyclic electron transport preferentially contributes to ATP synthesis without the accumulation of NADPH . Chloroplast NDH monomers are sensitive to high light stress, suggesting that the ndh genes encoding NAD(P)H dehydrogenase (NDH) may also be involved in stress acclimation through the optimization of photosynthesis (Casano, Martín & Sabater, 2001;Martin et al., 2002;Rumeau, Peltier & Cournac, 2007). During acclimation to different light environments, many plants change biochemical composition and morphology (Terashima et al., 2005). The highly responsive regulatory system controlled by cyclic electron transport around PSI could optimize photosynthesis and plant growth under naturally fluctuating light (Yamori, 2016). When demand for ATP is higher than for NADPH (e.g., during photosynthetic induction, at high or low temperatures, at low CO 2 concentration, or under drought), cyclic electron transport around PSI is likely to be activated (Yamori, 2016;Yamori & Shikanai, 2016). Thus, positive selection acting on ATP synthase and NAD(P)H dehydrogenase encoding genes is probably evidence for adaptation to novel ecological conditions in Nicotiana.
These findings are further supported by our observation that RNA editing sites occur frequently in Nicotiana ndh genes (Table S3). It has been shown that ndhB mutants under lower air humidity conditions or following exposure to ABA present reduced levels of photosynthesis, likely mediated through stomatal closure triggered under these conditions (Horvath et al., 2000). Therefore, a protein structure modification resulting from a loss or decrease in RNA editing events could affect adaptations to stress conditions or cause other unknown changes (Rodrigues et al., 2017). Previous studies have demonstrated that abiotic stress influences the editing process and consequently plastid physiology (Nakajima & Mulligan, 2001). Alterations in editing site patterns resulting from abiotic stress could be associated with susceptibility to photo-oxidative damage (Rodrigues et al., 2017) and indicate that Nicotiana species experienced abiotic stresses during their evolution, which resulted in positive selection of some plastid genes. Up to this point, positive selection has rarely been detected in plastid genes except for clpP (Erixon & Oxelman, 2008), ndhF (Peng, Yamamoto & Shikanai, 2011), matK (Hao, Chen & Xiao, 2010 and rbcL (Kapralov et al., 2011). However, Piot et al. (2018 showed that one-third of the plastid genes in 113 species of grasses (Poaceae) evolved under positive selection. This indicates that positive selection is overlooked among diverse groups of plant taxa.

Phylogenetic relationships and the origin of tetraploid Nicotiana rustica
Our comparative plastid genome analysis revealed that the maternal parent of the tetraploid N. rustica was the common ancestor of N. paniculata and N. knightiana, with the latter species being more closely related to N. rustica. The relaxed molecular clock analyses estimated that the speciation event between N. rustica and N. knightiana appeared ∼0.56 Ma (HPD 0.65-0.46), in line with previous findings (Sierro et al., 2018). Comparative analysis of the genomes of four related Nicotiana species revealed that N. rustica inherited about 41% of its nuclear genome from its paternal progenitor, N. undulata, and the rest from its maternal progenitor, the common ancestor of N. paniculata and N. knightiana (Sierro et al., 2018), which has also been confirmed by our study. We also revealed that N. knightiana is more closely related to N. rustica than N. paniculata, which can be further corroborated by the distribution of indels highlighted in the present study. The paternal inheritance of plastid genomes was observed in Nicotiana under certain stressed conditions (Medgyesy, Fejes & Maliga, 1985;Medgyesy, Páy & Márton, 1986;Thang & Medgyesy, 1989;Avni & Edelman, 1991;Ruf, Karcher & Bock, 2007;Thyssen, Svab & Maliga, 2012;) Such low-frequency paternal leakage of plastids via pollen was suggested to be universal in plants with strict maternal plastid inheritance (Azagiri & Maliga, 2007). Thus, we expect that the plastids in the putative parents of N. rustica are maternally inherited. Medgyesy, Páy & Márton (1986) observed the paternal transmission of plastids in N. plumbaginifolia Viv., but they concluded that plants carried maternal mitochondria. Further studies investigating the parental origins of Nicotiana species should also focus on mitochondrial genomes excluding possible low-frequency paternal plastid inheritance.
The biogeographical analysis suggests that N. undulata and N. paniculata evolved in North/Central Peru, while N. rustica developed in Southern Peru and separated from N. knightiana, which adapted to the Southern coastal climatic regimes. Positively selected plastid genes with functions such as DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF ) might have been associated with successful adaptation to, for example, a coastal environment. However, our results are tentative, as our study lacks data for several broad ecological variables, including variation in salinity, island versus mainland, and East versus West of the Andes. We aim to highlight that many potential environmental variables might be highly correlated with speciation processes in Nicotiana, as has been demonstrated in the same region for another Solanaceae group in the tomato clade (Solanum sect. Lycopersicon), where amino acid differences in genes associated with seasonal climate variation and intensity of photosynthetically active radiation have been correlated with speciation processes (Pease et al., 2016). Another example of rapid adaptive radiation from the family is the genus Nolana L.f., where several clades gained competitive advantages in water-dependent environments by succeeding and diverging in Peru and Northern Chile (Dillon et al., 2009). In the case of N. rustica and related species, we assume that diversification was driven by the ecologically variable environments of the Andes. Our molecular clock analysis provides evidence for recent species diversification in the Pleistocene and Pliocene while substantial climatic transitions in Peru predate these events. For example, the uplift of the central region of the Andes and the formation of the Peruvian coastal desert ended (Hoorn et al., 2010;Gerreaud, Molina & Farias, 2010), before the geographical and ecological expansion of N. rustica and related parental species.
The dispersal of N. rustica and related species shows a south-to-north range expansion and diversification which has been suggested by phylogenies of other plant and animal groups in the Central Andes (Picard, Sempere & Plantard, 2008;Luebert & Weigend, 2014). Based on the south-to-north progression scenario, habitats located at high altitudes were first available for colonization in the south, recently continuing to northward. Erosion and orogenic progression caused dispersal barriers of species colonizing these high habitats to diversify in a south-to-north pattern, frequently following allopatric speciation. Thus, for taxonomic groups currently residing throughout a large portion of the high Andes, a southto-north speciation pattern is expected (Doan, 2003). In this case, the most basal species (N. undulata) has more southern geographic ranges, and the most derived species (N. rustica) has more northern geographic ranges, except for N. knightiana, which presumably colonized the coastal range of Peru. Although the four Nicotiana species examined show overlaps in their distribution, it is probable that speciation was caused by fragmentation of populations during the glacial period (see Simpson, 1975). Utilizing fewer chloroplast loci for phylogenetic analyses of plant species may limit the solution of phylogenetic relationships, specifically at low taxonomic levels (Hilu & Alice, 2001;Majure et al., 2012). Previously, Nicotiana was subdivided into 13 sections using multiple chloroplast markers, i.e., trnL intron and trnL-F spacer, trnS-G spacer and two genes, ndhF and mat K (Clarkson et al., 2004). Recently, inference of phylogeny based on complete plastid genomes has provided deep insight into the phylogeny of certain families and genera (Henriquez et al., 2014;Amiryousefi, Hyvönen & Poczai, 2018a;Abdullah et al., 2020). Here, we reconstructed a phylogenetic tree for eleven species of Nicotiana that belong to nine sections (Clarkson et al., 2004) based on 75 protein-coding genes by using S. dulcamara as an outgroup, which attests the previous classification of genus Nicotiana with high bootstrap values. Species of each section are well resolved whereas N. tabacum of section Nicotiana, and N. sylvestris of section Sylvestres, show close resemblance. N. paniculata and N. knightiana belong to section Paniculatae and appeared to reflect the maternal ancestry of these species relative to N. rustica of section Rusticae. Overall, our phylogenetic analyses support the previous classification of genus Nicotiana, and corroborates that plastid genomic resources can provide further support for highly resolved phylogenies.

CONCLUSION
In the present study, we assembled, annotated and analyzed the whole plastid genome sequence of five Nicotiana species. The structure and organization of their plastid genome was similar to those of previously reported Solanaceae plastid genomes. Divergences of LSC, SSC and IR region sequences were identified, as well as the distribution and location of repeat sequences. The identified mutational hotspots could be utilized as potential molecular markers to investigate phylogenetic relationships in the genus, as we demonstrated in our study to elucidate the maternal genome origins of N. rustica. Our results could provide further help in understanding the evolutionary history of tobaccos.