Repetitive DNA Restructuring Across Multiple Nicotiana Allopolyploidisation Events Shows a Lack of Strong Cytoplasmic Bias in Influencing Repeat Turnover

Allopolyploidy is acknowledged as an important force in plant evolution. Frequent allopolyploidy in Nicotiana across different timescales permits the evaluation of genome restructuring and repeat dynamics through time. Here we use a clustering approach on high-throughput sequence reads to identify the main classes of repetitive elements following three allotetraploid events, and how these are inherited from the closest extant relatives of the maternal and paternal subgenome donors. In all three cases, there was no clear maternal (cytoplasmic) bias in repeat evolution, i.e., the predicted bias towards maternal subgenome-derived repeats was absent, with roughly equal contributions from both parental subgenomes. Overall repeat dynamics differed across timescales of <0.5 (N. rustica L.), 4 (N. repanda Willd.) and 6 (N. benthamiana Domin) Ma, showing near additivity, genome upsizing and genome downsizing, respectively. Lower copy repeats were inherited in similar abundance to the parental subgenomes, whereas higher copy repeats contributed the most to genome size change in N. repanda and N. benthamiana. Genome downsizing post-polyploidisation may be a general long-term trend across angiosperms, but at more recent timescales there is species-specific variance as found in Nicotiana.


Introduction
Allopolyploidy is pervasive in flowering plants, having occurred multiple times throughout angiosperm evolutionary history and recurrently in most clades [1]. Allopolyploidy occurs where genome duplication and hybridisation happen in concert. This has led to an appreciation of polyploidy (or whole genome duplication) as an important driving force in plant evolution, with the potential for immediate disadvantages of neopolyploidy ultimately leading to evolutionary novelty and ecological persistence [2][3][4][5][6]. The combination of two divergent subgenomes within one nucleus, and the redundancy of DNA sequences, leads to complicated post-polyploidisation restructuring of the polyploid genome as it is diploidised [7]. In Nicotiana (Solanaceae), recurrent polyploidisation has led to allotetraploid species of different ages, making this an excellent model system for investigating allopolyploidisation through time [8][9][10][11][12][13]. This allows us to test whether similar outcomes occur through allopolyploidisation within one genus, over different time scales, or whether post-polyploidisation changes are instead species-specific.
A combination of genomic, cytogenetic and phylogenetic work over the past two decades has established the parentage and age of the allotetraploid species of Nicotiana [14][15][16][17][18]. At the younger end of the spectrum (<0.5 Ma) is the commercial species, N. tabacum L., the common smoking tobacco. The closest extant relatives of the parental lineages for N. tabacum (section Nicotiana) are well-established as being N. tomentosiformis Goodsp. (paternal; T-genome donor) and N. sylvestris Speg. (maternal; S-genome donor) [9,14] (Figure 1). Young polyploids are also found within the sections Rusticae and Polydicliae. Nicotiana rustica formed <0.5 Ma from N. undulata Ruiz & Pav. (paternal) and N. paniculata L. (maternal) ([14,19]; Figure 1). Section Repandae formed approximately 4 Ma, between N. obtusifolia M.Martens & Galeotti (paternal) and N. sylvestris (maternal), and consists of four taxa (Figure 1): N. repanda, N. stocktonii Brandegee, N. nesophila I.M.Johnst. and N. nudicaulis S.Watson [11,14,18,19]. Finally, the largest section of polyploid taxa is section Suaveolentes (approximately 50 species), which is also the oldest section of allotetraploids, appearing at approximately 6-7 Ma [18,20]. The parentage of this section is probably more complex and likely involves N. sylvestris (paternal subgenome donor) and a homoploid hybrid as the maternal subgenome donor (between sections Noctiflorae and Petunioides; Figure 1). The section Suaveolentes also includes the important model N. benthamiana, which is used extensively for plant-viral interaction studies. Recent genomic analysis using gene-based phylogenetic trees indicated that approximately 1.4 Gbp of the N. benthamiana genome is attributable to N. noctiflora Hook. as the maternal parent [20]; thus, we consider N. noctiflora a close relative of the maternal parent of N. benthamiana.
Genes 2020, 10, x FOR PEER REVIEW
Genome restructuring occurs post-polyploidisation, perhaps as one result of "genomic shock", the combination of two divergent subgenomes within one nucleus (McClintock, 1984), although this phenomenon is more pronounced in herbaceous annuals than in perennials and woody groups. The nuclear-cytoplasmic interaction hypothesis (NCI) proposes to explain some of the patterns seen in the formation of allopolyploids.
The paternal genome is considered as alien DNA within the context of the maternal cytoplasm, which could therefore lead to vulnerability of the paternal sub-genome, and ultimately to its paternal-specific degradation [23]. Previous results have shown a bias in the loss of paternal DNA (T-genome) for Nicotiana tabacum [9,10,23]. Similarly, the recent draft genome of N. rustica has suggested that approximately 59% originated from the maternal genome donor (N. knightiana/N. paniculata) versus 41% from the paternal genome donor, using a k-mer analysis of sequence data [24].

1. How do repeat dynamics vary across polyploidisation events of different ages?
2. Is there a higher contribution of maternal subgenome DNA to the allopolyploid genome, reflected in repetitive element abundances?
3. If present, do these biases vary with the age of the allopolyploid, and/or phylogenetic distance of progenitor subgenomes?

Reads were quality filtered (Phred score minimum >10, maximum 3 Ns) and trimmed to the same length (91 bp). Higher quality filtering can lead to biased estimates of repeat abundance, due to typically lower quality scores for AT/GC rich sequences such as satellite repeats.
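The filtering criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `filter_read`, the reading of "Phred score minimum >10" as a per-base threshold, the Phred+33 quality encoding and the decision to discard reads shorter than 91 bp are all assumptions.

```python
def filter_read(seq, qual, min_phred=10, max_n=3, length=91):
    """Return the read trimmed to `length` bp, or None if it fails a filter.

    Assumptions (not stated in the text): `qual` is a Phred+33 encoded
    string, every retained base must score above `min_phred`, and reads
    shorter than `length` are discarded.
    """
    if len(seq) < length or len(qual) < length:
        return None
    # Trim sequence and qualities to the common read length.
    seq, qual = seq[:length], qual[:length]
    # Reject reads with more than `max_n` ambiguous bases.
    if seq.upper().count("N") > max_n:
        return None
    # Decode Phred+33 and require every base to exceed the threshold.
    if min(ord(c) - 33 for c in qual) <= min_phred:
        return None
    return seq
```

Keeping the threshold low, as in the text, avoids preferentially discarding reads from AT/GC rich repeats such as satellites.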

Clustering of Read Data
In each case, read subsets representing a genome proportion of 2% (0.02 × coverage) were used to estimate the repetitive element abundance in each diploid and allotetraploid genome. These were combined to create three final datasets, each combining one allotetraploid with the diploid relatives of its two subgenome donors. Reads were prefixed with five-letter sample-specific codes, and comparative clustering was run on each dataset using RepeatExplorer2 with default settings [26,27].
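As a rough guide to the sampling described above, the number of reads that represents a given genome proportion follows from the genome size and the read length. The sketch below assumes the 1C genome sizes quoted elsewhere in this paper and the 91 bp read length; the function names and the sample code `RUSTI` are hypothetical, chosen only for illustration.

```python
def reads_for_proportion(genome_size_bp, read_length_bp=91, proportion=0.02):
    """Number of reads whose total length equals `proportion` of the genome."""
    return round(proportion * genome_size_bp / read_length_bp)


def prefix_read_name(read_name, sample_code):
    """Prepend a five-letter sample code so reads stay traceable after pooling."""
    if len(sample_code) != 5:
        raise ValueError("sample codes are expected to have five letters")
    return sample_code + read_name


# 1C genome sizes (bp) as quoted in this paper.
genome_sizes = {
    "N. rustica": 5_200_000_000,
    "N. paniculata": 2_900_000_000,
    "N. undulata": 2_400_000_000,
}
for species, size in genome_sizes.items():
    print(species, reads_for_proportion(size))

# Hypothetical read name and sample code.
print(prefix_read_name("read_000001", "RUSTI"))
```

Sampling every genome at the same proportion means that, for a strictly additive tetraploid, the read count of a cluster in the tetraploid should equal the sum of the read counts in the two parents.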

Statistical Analyses
Statistical analyses and graphical plots were performed in R version 3.3.0 [28]. Contaminating clusters were removed (sequence artefacts, organellar DNA), as well as clusters with fewer than 10 reads (per species) and species-specific clusters. The expected cluster size for each tetraploid species was calculated as the sum of the abundance in each parent. The cumulative deviation from expectation was calculated and plotted against the cumulative expected cluster size in order to visualise repeat dynamics across the range of cluster sizes, as in [11]. Linear regression was fitted to test parental contributions to tetraploid cluster size. Cluster sizes (number of reads) were natural log-transformed.
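The expected-size and cumulative-deviation calculation described above can be illustrated with a short Python sketch. It assumes that clusters are ordered from smallest to largest expected size before accumulating, which the text does not state explicitly; the function name and the toy read counts are hypothetical and are not data from this study.

```python
from itertools import accumulate


def cumulative_deviation(clusters):
    """Cumulative expected size and cumulative deviation from expectation.

    `clusters` holds (maternal, paternal, tetraploid) read counts per cluster.
    The expected tetraploid size is the sum of the two parental counts.
    """
    # Pair each expected size with the observed tetraploid size, then
    # order clusters by expected size (assumed ordering).
    rows = sorted(((m + p, t) for m, p, t in clusters), key=lambda r: r[0])
    expected = [e for e, _ in rows]
    deviation = [t - e for e, t in rows]
    return list(accumulate(expected)), list(accumulate(deviation))


# Toy read counts, for illustration only.
toy = [(100, 50, 160), (10, 20, 25), (1000, 2000, 2500)]
cum_expected, cum_deviation = cumulative_deviation(toy)
print(cum_expected)
print(cum_deviation)
```

Plotting the second list against the first gives a curve that stays near zero for an additive genome, rises with upsizing and falls with downsizing.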
Three-dimensional plots were drawn using the plot3D package [29], and the car package [30] was used to compare the parental regression slopes. In all analyses, residuals were inspected to ensure that model assumptions were met.
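A minimal sketch of the regression step, written in Python rather than the R used for the analyses. It fits a separate ordinary least-squares line of ln(tetraploid cluster size) on ln(parental cluster size) for each parent, which is one plausible reading of the description above; it does not reproduce the slope comparison carried out with the car package. Function names and the toy read counts are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns r-squared too."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def parental_fits(clusters):
    """Fit ln(tetraploid) against ln(parent) separately for each parent.

    `clusters` holds (maternal, paternal, tetraploid) read counts; all counts
    must be positive, which holds after removing clusters below 10 reads.
    """
    logged = [[math.log(v) for v in row] for row in clusters]
    maternal, paternal, tetraploid = zip(*logged)
    return {
        "maternal": ols(maternal, tetraploid),
        "paternal": ols(paternal, tetraploid),
    }


# Toy read counts, for illustration only.
fits = parental_fits([(10, 40, 20), (100, 15, 200), (1000, 500, 2000)])
for parent, (slope, intercept, r_squared) in fits.items():
    print(parent, round(slope, 3), round(intercept, 3), round(r_squared, 3))
```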

Repeat Dynamics in Polyploids of 0.5-6 Ma
To broadly compare the repeat dynamics across polyploidisation events, we compared the deviation from expected abundance for a range of cluster sizes from ten reads upwards (Figure 2). These clusters represent different classes of repetitive elements. The expected cluster size is the sum of the cluster size in each of the two parental subgenomes for each allotetraploid. Deviation above zero represents an increase in repeat abundance relative to expectation, and deviation below zero a decrease.
In the young tetraploid Nicotiana rustica there is little deviation from expectation across the range of repeat sizes (Figure 2). This indicates that N. rustica has faithfully maintained repetitive DNA without significant sequence divergence or genome size change. Comparing the deviation from expectation at the same scale across polyploidisation events (Figure 2) shows that any deviation is minimal compared with the alterations that occur in older tetraploids.
Over longer timescales, deviation from simple expectation is much greater in both of the older tetraploid groups for sections Repandae (~4 Ma) and Suaveolentes (~6 Ma). In both older tetraploids, there is limited deviation from expected cluster size at the small-medium sizes (up to about 5000 reads). As cluster size increases, however, these higher abundance elements have significantly differed from expectation in both cases. For N. repanda the larger abundance elements are more represented than expected, resulting in a final genome size that is larger than expected (Figure 2). For N. benthamiana, there is an initial increase above expectation for some larger clusters (mid-size repeats at Log 10-11), but this quickly becomes a decrease, showing appreciable genome downsizing from expectation. This appears to be almost entirely due to deletion of higher abundance repeat elements (Figure 2).

Parental Contribution to Allotetraploid Genomes
For each tetraploid, we conducted a series of regression analyses to explore the relative contribution of the paternal vs. maternal parents (Figures 3 and 4). In all cases, there is no clear difference between the maternal and paternal contribution (Figure 3), with no obvious pattern across the range of cluster sizes (Figure 4).

In the youngest polyploid, N. rustica, there is a stronger correlation between the parental cluster size and that of the tetraploid (Figure 3A), with a statistically significant difference between the slope of the two parental regression lines (p = 0.0201). For both older tetraploids, N. repanda (p = 0.0002) and N. benthamiana (p = 0.0046), the slopes were found to be significantly different between the maternal and paternal progenitors (Table 1; Table S1), although visually there is a lack of obvious pattern or direction in the regressions (Figure 3B,C). Untransformed plots (Figure S1) and LOESS plots (Figure S2) are additionally shown in Supplementary Materials.
Plotting the regression results in 3D as an alternative visualisation also showed no clear pattern between maternal and paternal parents (Figure 4; Table S2). In the case of N. rustica, the cluster sizes have a much higher correlation between parents and polyploid, and hence fit closer to the regression plane (Figure 4A). For older polyploids, N. repanda (Figure 4B) and N. benthamiana (Figure 4C), there is dispersion across the range of cluster sizes and around the regression plane. Summary statistics for each dataset are given in Table S3.

Repeat Restructuring across 0.5-6 Ma Timescales
In the youngest allotetraploid, N. rustica (Figure 2), repeat abundances are close to the sum of abundances expected from both parental donors. Nicotiana rustica has a genome size of 1C = 5.2 Gbp [22], which is close to the sum (5.3 Gbp) of the extant parental lineages N. paniculata (2.9 Gbp) and N. undulata (2.4 Gbp). It has also retained all 24 pairs of chromosomes (12 pairs from each subgenome). Previous studies using GISH (genomic in situ hybridisation) showed clear additivity for both parental subgenomes and a lack of any clear genomic translocations [8,31]. Additionally, several tandem repeats of the HRS60 family showed chromosomal locations concordant with their distribution in the parental genomes, apart from one from the N. undulata subgenome [31]. For ribosomal DNA, sequence conversion of the 18S-5.8S-26S locus is towards the type of the paternal genome donor, N. undulata (approx. 80% of sequences) [32], the same pattern found for N. tabacum. Recent studies in N. tabacum, by contrast, found much greater restructuring, including genomic translocations and significant loss of repetitive DNA from the paternal T-genome. Similar results were also found in synthetic lines of N. tabacum in the fourth generation [9,10,23].
In older tetraploids, repeat dynamics are much more variable (Figure 2), which likely reflects greater restructuring of the genome and turnover of repetitive elements. Species of section Repandae have retained all chromosomes (n = 24) since their origin ca. 4 Ma [18], but translocations identified via GISH [11] show genome exchange and reorganisation. In N. repanda, there is significant genome upsizing of around 29% relative to the expected sum of parental subgenomes (observed 1C = 5.3 Gbp versus expected 1C = 4.1 Gbp) [11,13,22]. This genome size change is mostly due to an increase in high-abundance repeats, including Ty3/Gypsy chromovirus retroelements [13], and the same trajectory of repeat accumulation is found in N. stocktonii and N. nesophila [11]. Repeat dynamics are, however, to an extent species-specific, because N. nudicaulis has experienced overall genome downsizing (−14%), and its repeat abundances/types are much closer to expectation [11,13].
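The genome size comparisons in this section reduce to simple arithmetic, shown here as a small Python sketch using only the 1C values quoted above. The function name is hypothetical.

```python
def percent_change(observed_gbp, expected_gbp):
    """Percentage deviation of the observed genome size from expectation."""
    return 100 * (observed_gbp - expected_gbp) / expected_gbp


# N. rustica: expected size is the sum of N. paniculata and N. undulata.
rustica = percent_change(5.2, 2.9 + 2.4)
# N. repanda: observed and expected 1C values as given in the text.
repanda = percent_change(5.3, 4.1)

print(f"N. rustica: {rustica:+.1f}%")
print(f"N. repanda: {repanda:+.1f}%")
```

With these values the sketch gives roughly -1.9% for N. rustica and +29.3% for N. repanda, consistent with the near additivity and the approximately 29% upsizing described above.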
Section Suaveolentes is the oldest (ca. 6 Ma; [18]) and most species-rich [33] group of tetraploids in Nicotiana. In this section, most species have genome sizes that represent stark genome downsizing [34], and N. benthamiana has a genome size of 1C = 3.3 Gbp. This is in addition to a reduction in chromosome number (from n = 24 to n = 15 in some species), likely the result of descending dysploidy [34], which is not necessarily correlated with genome-size reduction. Ancestrally, section Suaveolentes has n = 24, but N. benthamiana has n = 18, 19. Such extensive genome reorganisation as part of the diploidisation process could potentially mask events that occurred shortly after allotetraploidisation at approx. 6 Ma [18]. It would be important to sample repeat types/abundances in several of the species with the ancestral (or nearly ancestral) chromosome numbers (there are six species with n = 23, 24), although among these there is a wide range of genome sizes, 1C = 2.87-5.45 Gbp. Given the number of taxa in section Suaveolentes (ca. 40-50 species), investigating how repeat dynamics vary in this group will be central to discovering how they relate (or not) to chromosomal changes and genome sizes in a descending dysploid series (n = 15-24). In other groups, particularly Brassicaceae, there is extensive dysploidy post-polyploidisation [35], and this process is likely responsible for much of the diversification of angiosperms [7].

Lack of a Clear Parental Bias in Repeat Loss/Retention
As part of the "genomic shock" experienced by combining two divergent subgenomes within one neoallopolyploid nucleus, there are predictions regarding the level of genome reorganisation and sequence loss/retention based on the direction of the cross. The maternal subgenome is expected to be favoured, relative to the paternal subgenome, due to its compatibility with the maternal cytoplasm. This potentially leads to specific degradation of various elements from the paternal subgenome, as predicted by the NCI hypothesis. In N. tabacum this was found to be the case, including in synthetic lines, whereby the T-genome from N. tomentosiformis is significantly reduced relative to the S-genome from N. sylvestris [9,10]. However, not all lines showed the genomic translocations predicted by NCI, which are therefore assumed to be more stochastic and not essential to polyploid success [23].
In the three allotetraploids studied here, we found no significant evidence of maternal bias in the repeat abundances (Figures 3 and 4). Nicotiana rustica has a much closer correlation between parental cluster size and tetraploid cluster size (Figures 3A and 4A), but its two parents are much more closely related than those of N. tabacum (Figure 1). This could give post-tetraploid diploidisation more stochasticity due to greater overall similarity of the two parental genomes (i.e., they are simply too similar for there to be a noticeable effect favouring one subgenome over the other). Additionally, the slopes of the regression lines were not found to be statistically different (Table 1). This reveals a faithful inheritance of most repeats from both parents, N. undulata and N. paniculata, and a lack of maternal cytoplasmic bias. Previously, this was suggested on the basis of a lack of genomic translocations, particularly involving perturbation of the paternal subgenome [23,31]. It is possible that if overcoming a greater level of genomic shock involves transgenomic translocations, then the effects of genomic incompatibility are greater, leading to a maternal bias, whereas if the two parental genomes are more similar then incompatibility is a much more subtle factor. We would argue that the ages of the two older allotetraploids also make detecting parental biases difficult to impossible: too much genomic change has taken place post-polyploidisation to detect parental biases that might have been operating at earlier stages in the process.
For N. repanda and N. benthamiana, correlation between cluster size in parental lineages and tetraploids is much less clear (Figures 3 and 4), which reflects the longer timescales over which these tetraploid genomes have diverged. Slopes of parental regressions were significantly different in both cases (Figure 3B,C; Table 1), but there is no clear directionality, e.g., preferential retention of maternal repeat abundances. In both cases, repeat changes in the allotetraploid are mostly a result of changes in higher copy sequences, but these are present in both parental subgenomes and subject to overall similar pressures. In the case of N. repanda this is overall genome upsizing, whereas in the case of N. benthamiana there is significant genome downsizing (Figure 2).

Conclusions
Overall, our analyses show extensive repeat restructuring over longer time frames in sections Repandae and Suaveolentes, with the former experiencing genome upsizing and the latter significant genome downsizing. In the younger polyploid N. rustica we show little deviation from the expected sum of parental subgenomes. Across the three separate allotetraploidisation events of <0.5 (N. rustica), 4 (N. repanda) and 6 (N. benthamiana) Ma, we did not find significant evidence of the overall maternal bias in repeat retention found previously for N. tabacum. Future work comparing the signatures of sequence retention in repetitive DNA, dark matter (i.e., degraded repeats) and gene space will be important to understand whether different processes govern the evolution of different sequence types post-polyploidisation and how these contribute to genome reorganisation.