Complete chloroplast genome assembly and phylogenetic analysis of blackcurrant (Ribes nigrum), red and white currant (Ribes rubrum), and gooseberry (Ribes uva-crispa) provide new insights into the phylogeny of Grossulariaceae

Background. Blackcurrant (Ribes nigrum), red currant (R. rubrum), white currant (R. rubrum), and gooseberry (R. uva-crispa) belong to Grossulariaceae and are popular small-berry crops worldwide. The lack of genomic data has severely limited their systematic classification and molecular breeding.
Methods. The complete chloroplast (cp) genomes of these four taxa were assembled for the first time using MGI-DNBSEQ reads, and their genome structures, repeat elements, and protein-coding genes were annotated. Genomic variations were identified by comparing the four new cp genomes with five previously released Ribes cp genomes. The phylogeny of Grossulariaceae and the infrageneric relationships of Ribes were inferred using maximum-likelihood and Bayesian methods.
Results. The four cp genomes range in length from 157,450 to 157,802 bp and share 131 genes. A total of 3,322 SNPs and 485 Indels were identified among the nine released Ribes cp genomes. Red currant and white currant have 100% identical cp genomes, partially supporting the hypothesis that white currant (R. rubrum) is a fruit color variant of red currant (R. rubrum). The most polymorphic genic and intergenic regions are ycf1 and trnT-psbD, respectively. The phylogenetic analysis demonstrated the monophyly of Grossulariaceae in Saxifragales and the paraphyletic relationship between Saxifragaceae and Grossulariaceae. Notably, the Grossularia subgenus is well nested within the Ribes subgenus and shows a paraphyletic relationship with the co-ancestor of the Calobotrya and Coreosma sections, which challenges the dichotomous subclassification of the Ribes genus based on morphology (subgenus Ribes and subgenus Grossularia). These data, results, and insights lay a foundation for the phylogenetic research and breeding of Ribes species.

Plastids are semi-autonomous organelles in the plant cell that carry out photosynthesis and various metabolic and signaling functions (Daniell et al., 2016; Dobrogojski, Adamiec & Lucinski, 2020). Owing to the haploidy, uniparental inheritance, lack of genetic recombination, conserved genomic structure, and relatively small genome size of plastids, polymorphic plastid loci have been increasingly applied to plant phylogenetics, taxonomy, population genetics, and marker development for molecular breeding and DNA barcoding (Olmstead & Palmer, 1994; Kuang et al., 2011; Dong et al., 2012; Gao et al., 2022). Meanwhile, a lack of variation in the modest number of plastid genes may hamper species discrimination, suggesting that complete plastome sequences can offer improved resolution (Parks, Cronn & Liston, 2009; Nock et al., 2011). Through comparative analysis based on complete chloroplast (cp) sequences, the phylogeny of some plants, such as Cardiocrinum, Ficus, Orchidaceae, and Myrtales, has been delineated from the species level to the order level (Zhang et al., 2021; Chen, Hu & Zhang, 2022; Huang et al., 2022; Jiang et al., 2022). However, only five Ribes cp genomes have been released in NCBI to date (Jan. 2023). Among them, R. glaciale and R. fasciculatum var. chinense (Dong et al., 2018) are wild species indigenous to East Asia, whereas Sierra currant (R. nevadense), Sierra gooseberry (R. roezlii) (Folk et al., 2020), and clove currant (R. odoratum) (Wang et al., 2021) are species endemic to America. There has been no report on the cp genomes of cultivated Ribes species and their phylogenies, which has severely limited the germplasm exploration, genetic conservation, cross-breeding, and genetic improvement of Ribes.
Herein, the complete cp genomes of blackcurrant (R. nigrum), red currant and white currant (R. rubrum), and gooseberry (R. uva-crispa) were assembled and characterized, laying a molecular foundation for research on Ribes taxa. The study also identified the highly polymorphic sites across all nine released Ribes cp genomes, providing practical tools for molecular breeding and DNA barcoding. Moreover, phylogenetic trees of Saxifragales were constructed using 31 complete cp genomes, which can shed new light on the phylogeny of the Grossulariaceae family and Ribes species.

Plant materials and DNA extraction
Fresh leaves of blackcurrant 'Liangye houpi' (R. nigrum), red currant 'Red Cross' (R. rubrum), white currant 'Witte Parel' (R. rubrum), and gooseberry 'Pixwell' (R. uva-crispa) were collected at the Horticultural Station of Northeast Agricultural University (126.73°E, 45.74°N), Harbin, China. Samples were frozen immediately in liquid nitrogen and stored in an ultra-low temperature freezer (−80 °C) until DNA extraction. Total genomic DNA was isolated using the Hi-Fast Plant Genomic DNA Kit (GeneBetter BioTech Co. Ltd., Beijing, China) following the manufacturer's instructions (http://www.gene-better.cn/).

Sequencing, assembly, and annotation
DNA library preparation and next-generation sequencing were carried out at Frasergen Bioinformatics Co., Ltd. (Wuhan, China) on the DNBSEQ-T7 platform following the manufacturer's instructions (MGI Tech Co., Ltd., Shenzhen, China).

Repeat, SNP, and indel identification
Tandem Repeats Finder (Benson, 1999) was used to determine tandem repeats with default parameters. REPuter (Kurtz et al., 2001) was used to determine forward (F), palindromic (P), reverse (R), and complement (C) repeats with the parameters "Hamming Distance: 3; Maximum Computed Repeats: 5,000 bp; Minimum Repeat Size: 30 bp". SSR loci were identified using Misa-web (Beier et al., 2017) (http://pgrc.ipk-gatersleben.de/misa/) with default parameters. SNPs and Indels among the nine cp genomes were identified using DnaSP v6.12 (Rozas et al., 2017), and the Pi values were calculated with a window length of 600 bp and a step size of 200 bp.
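The sliding-window Pi calculation described above can be illustrated with a minimal Python sketch. It is not DnaSP's implementation: it assumes the input is a list of pre-aligned sequences of equal length, it skips alignment columns pairwise wherever either sequence carries a gap or an ambiguous base (DnaSP's gap handling may differ), and the function names are hypothetical. Only the window length (600 bp) and step size (200 bp) are taken from the text.

```python
from itertools import combinations

VALID_BASES = "ACGT"

def pairwise_diff_fraction(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch, not necessarily what DnaSP does).
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                diffs += 1
    # With no comparable sites the pair contributes zero diversity.
    return diffs / compared if compared else 0.0

def nucleotide_diversity(seqs):
    """Pi: mean per-site difference over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff_fraction(a, b) for a, b in pairs) / len(pairs)

def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) tuples along an alignment."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results
```

Calling `sliding_pi(alignment)` with the defaults mirrors the window settings stated in the text; plotting the returned midpoints against Pi gives a profile of the kind shown in Fig. 6.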

Identification of repeat elements among the four Ribes cp genomes
To provide useful information for marker development, the repeat elements in the four cp genomes were annotated. A total of 219 long repeats were identified (Fig. 3A), including tandem, forward, and palindromic repeats but not reverse and complement repeats. Tandem repeats accounted for the largest proportion (44.7%), followed by palindromic repeats (30.1%) and then forward repeats (25.1%) (Tables S2 and S3).
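To make the distinction between forward and palindromic repeats concrete, the following Python sketch finds exact repeated k-mers in a single sequence. It is an illustration only and not REPuter's algorithm: REPuter allows mismatches (a Hamming distance of 3 in this study) and reports maximal repeats, whereas this sketch assumes exact matching and does not merge overlapping hits. The default k of 30 follows the minimum repeat size given in the methods; the function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def find_exact_repeats(seq, k=30):
    """Return (forward, palindromic) lists of start-position pairs.

    forward:      the same k-mer occurs at both positions on one strand.
    palindromic:  the k-mer at the first position matches the reverse
                  complement of the k-mer at the second position.
    A k-mer equal to its own reverse complement appears in both lists.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    forward, palindromic = [], []
    for kmer, starts in positions.items():
        # Every pair of occurrences of the same k-mer is a forward repeat.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                forward.append((starts[a], starts[b]))
        # .get avoids inserting new keys while iterating over the dict.
        for i in starts:
            for j in positions.get(reverse_complement(kmer), []):
                if i < j:  # report each pair once
                    palindromic.append((i, j))
    return forward, palindromic
```

On a plastome-sized sequence, a longer repeat would show up as a run of adjacent k-mer hits; collapsing such runs into single repeat records is left out to keep the sketch short.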

Genomic comparison of the nine Ribes cp genomes
We performed a genomic comparison of nine Ribes cp genomes, including the four assembled in the present study and five cp genomes released previously (R. glaciale, R. fasciculatum var. chinense, R. nevadense, R. roezlii, and R. odoratum). The nine Ribes cp genomes showed identical genomic features, gene orders, and gene numbers (Fig. 4).
As expected, the coding regions exhibited lower levels of divergence than non-coding regions. The pairwise genetic divergence between any two of the nine cp genomes was lower than 2.90% (Table S5). The highest sequence divergence was detected between R. fasciculatum var. chinense and R. odoratum. Notably, the cp genomes of red currant and white currant were 100% identical.

Identification of SNPs and Indels in the Ribes cp genomes
A total of 3,322 SNPs and 485 Indels were identified in the nine Ribes cp genomes (Table 3). The LSC region contained the largest number of SNPs (2,337) and Indels (361), followed by the SSC region (701 SNPs and 65 Indels). Generally, the CDS is more conserved than non-coding regions. Here, a total of 1,594 SNPs and 377 Indels were detected in the intergenic spacers (IGS); 424 SNPs and 80 Indels were found in introns; and 1,304 SNPs and 28 Indels were detected in the CDS (Table S6). The most polymorphic intergenic region was trnT-psbD, which contains 86 mutation sites, followed by ndhF-rpl32 (76) and atpH-atpI (71). The most variable gene was ycf1, whose coding sequence harbors 248 mutation sites, followed by rpoC2 (104), ndhF (56), and ycf2 (56). In terms of introns, the introns of trnK and matK contained the most mutation sites (57), followed by introns of rpl16 (38) and ndhA (57) (Tables S7 and S8). In the pairwise comparison, the most variation sites (1,863) were found between R. roezlii and R. fasciculatum var. chinense, and the fewest (507) were detected between blackcurrant (R. nigrum) and R. nevadense (Table S9). The nucleotide diversity (Pi) of the nine Ribes cp genomes was visualized using a window length of 600 bp and a step size of 200 bp (Fig. 6).
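The pairwise comparison of variation sites can be sketched in Python as follows. This is a simplified illustration rather than the DnaSP procedure used in the study: it assumes pre-aligned sequences with "-" as the gap character, counts each mismatched column as one SNP, and counts each contiguous gap run in one sequence as a single indel event. How the reported totals treat multi-base gaps is not stated in the text, so that counting rule is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def count_variants(a, b):
    """Count (SNPs, indel events) between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    snps = indels = 0
    open_gap = None  # which sequence ("a" or "b") has an open gap run
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap shared by both sequences: no pairwise signal
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != open_gap:
                indels += 1  # a new gap run starts here
                open_gap = side
        else:
            open_gap = None
            if x != y:
                snps += 1
    return snps, indels

def pairwise_variant_table(alignment):
    """Map each pair of names to its (SNPs, indel events) tuple.

    `alignment` is a dict of name -> aligned sequence.
    """
    return {
        (n1, n2): count_variants(s1, s2)
        for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)
    }
```

Summing the two values for each pair gives a per-pair count of variation sites comparable in spirit to Table S9, subject to the counting assumptions above.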
Notably, the nine Ribes taxa clustered independently to form clade I with maximum BS and PP values instead of nesting within Saxifragaceae (clade II), indicating the monophyly of Grossulariaceae. Within the Ribes genus (Fig. 7B), R. fasciculatum (subgenus Ribes, section Parilla) diverged first from the other species, followed by R. odoratum (subgenus Ribes, section Symphocalyx). However, the ancestral clade of the remaining seven species was not well resolved (BS = 52%). In addition, paraphyletic relationships were found between the Berisia and Ribesia sections, between the Coreosma and Calobotrya sections, and between the Hesperia and Grossularia sections. Notably, the Grossularia subgenus was nested within the Ribes subgenus instead of being parallel to it at the base of the tree as expected.
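The nesting argument can be checked mechanically: a group is monophyletic on a rooted tree only if its members are exactly the leaves below some node. The Python sketch below encodes a toy rooted topology as nested tuples. The topology is a hypothetical reading of the branching order described in the text, not a reproduction of Fig. 7B, and the leaf labels and function names are assumptions made for illustration.

```python
def clades(tree):
    """Return (leaf set of `tree`, leaf sets of every internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def is_monophyletic(tree, taxa):
    """True if `taxa` are exactly the leaves under one node of `tree`."""
    taxa = frozenset(taxa)
    all_leaves, all_clades = clades(tree)
    if len(taxa) == 1:
        return taxa <= all_leaves  # a single leaf is trivially a clade
    return taxa in all_clades

# Toy topology (assumption): R. fasciculatum diverges first, then
# R. odoratum; the two gooseberries sit inside the remaining currants.
TOY_TREE = (
    "R. fasciculatum",
    ("R. odoratum",
     (("R. glaciale", ("R. rubrum (red)", "R. rubrum (white)")),
      (("R. nigrum", "R. nevadense"),
       ("R. roezlii", "R. uva-crispa")))),
)

SUBGENUS_GROSSULARIA = {"R. roezlii", "R. uva-crispa"}
ALL_TAXA, _ = clades(TOY_TREE)
SUBGENUS_RIBES = ALL_TAXA - SUBGENUS_GROSSULARIA

print(is_monophyletic(TOY_TREE, SUBGENUS_GROSSULARIA))
print(is_monophyletic(TOY_TREE, SUBGENUS_RIBES))
```

Under this toy topology the two gooseberry taxa form a clade while the remaining taxa do not, which is the pattern the text describes as Grossularia being nested within subgenus Ribes.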

DISCUSSION
The present study assembled and characterized, for the first time, the complete cp genomes of four cultivated Ribes taxa, including blackcurrant (R. nigrum), red currant and white currant (R. rubrum), and gooseberry (R. uva-crispa), which enriches the genomic foundation for Ribes. The Grossulariaceae showed higher chloroplast conservation in terms of sequence length, structural organization, gene content, order, and arrangement than other plant families such as Gentianaceae, Asteraceae, Caprifoliaceae, and the closely related family Saxifragaceae (Walker, Zanis & Emery, 2014; Sun et al., 2018; Wang et al., 2020; Chen et al., 2022). Intriguingly, the results showed that the IR/SC shift types of the nine Ribes cp genomes are in accordance with the phylogenetic groups (Figs. 5, 7B and S2), suggesting that the shrinkage and expansion of the IR/SC boundaries might serve as additional confirmation of phylogenetic results calculated by sequence-based algorithms. Notably, the cp genomes of red currant (OP888488) and white currant (OP888486) are 100% identical, which supports the hypothesis that white currant is an albino cultivar of red currant (Reisch & Pratt, 1996; Prange, 2002; Lim, 2012) and calls for further verification based on nuclear data. Repeat elements play an important role in sequence divergence and are involved in plastome rearrangement (Timme et al., 2007; Weng et al., 2014). Herein, 51/40, 64/42, and 53/54 long repeats/SSR loci were identified in the R. rubrum, R. uva-crispa, and R. nigrum cp genomes, respectively, whereas the wild R. odoratum has 49/56 (Wang et al., 2021). These differences not only display the characteristic evolutionary trace of these plastomes but also provide effective information for species discrimination and population genetics. Moreover, a total of 3,322 SNPs and 485 Indels were identified in the nine released Ribes cp genomes.
In particular, the intergenic spacer trnT-psbD, the CDS of ycf1, and the introns of trnK and matK showed high resolution potential. The molecular markers developed from these loci may contribute to genetic evaluations for germplasm collections, conservation genetic research for endangered species or enigmatic taxa, DNA barcoding for endemic species or important cultivars, and hybrid identification in cross-breeding of Ribes in the future.
In terms of the phylogeny of Grossulariaceae, we used the genomic data of Ribes species for the first time to clarify the monophyly of Grossulariaceae and the paraphyletic relationship between Grossulariaceae and Saxifragaceae, both of which are consistent with the results of previous studies on the phylogeny of Saxifragales (Dong et al., 2013, 2018). Moreover, both the ML and Bayesian trees revealed the highly supported co-ancestral clade of Saxifragaceae and Grossulariaceae and the relatively late divergence between them, suggesting their close phylogenetic relationship, which is in accordance with previous studies based on various data types (Dong et al., 2013; Soltis et al., 2013; Han et al., 2022). In terms of the infrageneric relationships, our results generally support the consensus classification of Ribes proposed by Berger (1924) and revised by Sinnott (1985), that is, the Ribes genus includes 12 sections: Berisia, Calobotrya, Coreosma, Grossularioides, Heritiera, Parilla, Ribesia, Symphocalyx, Grossularia, Hesperia, Lobbia, and Robsonia. This classification is also supported by recent studies based on molecular markers (Messinger, Hummer & Liston, 1999; Weigend, Mohr & Motley, 2002; Senters & Soltis, 2003; Schultheis & Donoghue, 2004). Although our results revealed a well-supported clade of the subgenus Grossularia, only two species in this subgenus (R. roezlii and R. uva-crispa) were used. Therefore, the monophyletic origin of Grossularia cannot be asserted here. In any case, the clade of the Grossularia subgenus is nested within the Ribes subgenus and shows a paraphyletic relationship with the co-ancestor of the Calobotrya and Coreosma sections, both in the present study and in previous studies (Senters & Soltis, 2003; Schultheis & Donoghue, 2004). This is incongruent with the generally accepted subclassification of the Ribes genus based on morphological characteristics, regardless of whether gooseberry (Grossularia) is monophyletic. However, given the limited samples and the use of only cp genomic data in the present study, a thorough clarification of the infrageneric phylogeny and resolution, evolutionary scenario, and diversification history of Ribes still calls for comprehensive studies based on large-scale sample collection, nuclear genomic data, biogeographic information, and fossil evidence.

CONCLUSIONS
The complete cp genomes of blackcurrant (R. nigrum), red currant and white currant (R. rubrum), and gooseberry (R. uva-crispa) have lengths of 157,450-157,802 bp and share a total of 131 genes (86 protein-coding genes, 37 tRNA genes, and eight rRNA genes). Notably, 100% sequence identity was observed between red currant and white currant, supporting the inference that the latter is an albino cultivar of the former. A total of 485 Indels and 3,322 SNPs were identified in the nine released Ribes cp genomes. The gene ycf1 (Pi = 0.020) and the intergenic region trnT-psbD (Pi = 0.021) are the most polymorphic genic and intergenic regions, respectively. The ML and Bayesian phylogenetic trees provide strong support for the monophyly of Grossulariaceae in Saxifragales and the paraphyletic relationship between Saxifragaceae and Grossulariaceae. The tree topology of the nine Ribes taxa shows that the Grossularia subgenus is nested within the Ribes subgenus and has a paraphyletic relationship with the co-ancestor of the Calobotrya and Coreosma sections. Our findings lay a foundation for the phylogenetic research and molecular breeding of Ribes species.

Figure 2 Consensus feature map of cp genomes for blackcurrant (Ribes nigrum), red and white currant (R. rubrum), and gooseberry (R. uva-crispa). Genes outside and inside the circle are transcribed clockwise and counterclockwise, respectively. Gray bars in the inner ring show the GC content percentage. The inverted repeat regions (IRa and IRb) are denoted with thick lines and are separated by the large single copy (LSC) region and the small single copy (SSC) region. Genes belonging to different functional groups are color-coded accordingly at the bottom left. DOI: 10.7717/peerj.16272/fig-2

Figure 3 Characterization of repeats in the four Ribes cp genomes. (A) Summary of long repeats in the four Ribes cp genomes. (B) Types and repetitions of SSRs identified in the four cp genomes. Mono: mononucleotide-repeat; Di: dinucleotide-repeat. The numbers above the bars are the counts of the SSRs and those below the bars are the repetitions. (C) Counts of SSRs in the LSC, SSC, and IR regions. (D) Counts of SSRs in intergenic, intron, and protein-coding regions. DOI: 10.7717/peerj.16272/fig-3

Figure 4

Figure 5 Comparison of LSC, SSC, and IR region boundaries in the nine Ribes cp genomes. Distance in the figure is not to scale. The gene names are indicated in the colored boxes, and the length of the corresponding region is shown next to each box. Ψ represents a pseudogene. JLB, LSC-IRb boundary; JSB, IRb-SSC boundary; JSA, SSC-IRa boundary; JLA, IRa-LSC boundary. LSC, large single copy; IRa, IRb, two IR regions that are identical but in opposite orientations; SSC, small single copy. DOI: 10.7717/peerj.16272/fig-5

Figure 6 Nucleotide diversities (Pi) of the nine Ribes chloroplast genomes presented in a sliding window approach. The X-axis represents the position of the window midpoint; the Y-axis represents the nucleotide diversity within each window (window size: 600 bp; step size: 200 bp). Six regions with relatively high Pi values were marked based on the alignment and annotation. DOI: 10.7717/peerj.16272/fig-6

Figure 7 Chloroplast phylogeny of Saxifragales and Ribes. (A) Maximum likelihood tree constructed using 29 complete cp genomes from Saxifragales and two outgroup species from Rosales with 1,000 bootstrap replications. (B) Maximum likelihood tree inferred from nine complete cp genomes of Ribes with 1,000 bootstrap replications. Vertical bars at the right label the section classification of these nine taxa according to the consensus revision of Berger (1924), Sinnott (1985), Messinger, Hummer & Liston (1999), and Senters & Soltis (2003). DOI: 10.7717/peerj.16272/fig-7
Sun et al. (2023), PeerJ, DOI 10.7717/peerj.16272

Table 1
Summary of cp genome subunits in the four Ribes taxa.

Table 2
Gene annotation of the four Ribes cp genomes.

Table 3
Summary of SNPs and Indels identified in the nine Ribes cp genomes.