Phylogeny of the Diploid Species of Rubus (Rosaceae)

Rubus L. (Rosaceae, Rosoideae) contains around 700 species distributed on all continents except Antarctica, with the highest species diversity in temperate to subtropical regions of the northern hemisphere. The taxonomy of Rubus is challenging due to the frequency of polyploidy, hybridization and apomixis. Previous studies mostly sampled sparsely and used limited DNA sequence data. The evolutionary relationships between infrageneric taxa, therefore, remain to be further clarified. In the present study, genotyping by sequencing (GBS) reduced-representation genome sequencing data from 186 accessions representing 65 species, 1 subspecies and 17 varieties of Rubus, with emphasis on diploid species, were used to infer a phylogeny using maximum likelihood and maximum parsimony methods. The major results were as follows: (1) we confirmed or reconfirmed the polyphyly or paraphyly of some traditionally circumscribed subgenera, sections and subsections; (2) 19 well-supported clades, which differed from one another on molecular, morphological and geographical grounds, were identified for the species sampled; (3) characteristics such as plants with dense bristles or not, leaves leathery or papyraceous, number of carpels, instead of inflorescences paniculate or not, aggregate fruits and leaves abaxially tomentose or not, may be of some use in classifying taxa whose drupelets are united into a thimble-shaped aggregate fruit that falls in its entirety from the dry receptacle; and (4) a preliminary classification scheme of diploid species of Rubus is proposed based on our results combined with those from previous phylogenetic analyses.


Phylogenetic Analysis
The chloroplast and nuclear datasets were analyzed separately. Equally weighted maximum parsimony (MP) jackknife (JK) analyses [72] were conducted using 1000 random-taxon-addition replicates and tree-bisection-reconnection (TBR) branch swapping in PAUP* version 4.0a169 [73], with MAXTREES set to 10,000, the removal probability set to approximately 37% and "jac" resampling emulated. Insertions and deletions were coded as missing data. Jackknife 50% majority-rule consensus trees were computed. jModelTest 2 [74,75] was used to test the models of nucleotide substitution for maximum likelihood (ML) [76]. The Akaike information criterion (AIC) [77] was used to select among models instead of the hierarchical likelihood ratio test (hLRT), following Pol [78] and Posada and Buckley [79].
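The two numerical ideas in this paragraph, jackknife resampling with a removal probability of about 37% (e^-1) and model choice by AIC, can be illustrated with a minimal Python sketch. It is not the PAUP* or jModelTest implementation; the function names and the example model fits are hypothetical and serve only to show the arithmetic.

```python
import math
import random

# Deletion probability e^-1 (about 0.3679), i.e. the "approximately 37%" above.
REMOVAL_PROBABILITY = math.exp(-1)


def jackknife_columns(n_characters, removal_probability=REMOVAL_PROBABILITY, rng=None):
    """Return the indices of alignment columns kept in one jackknife replicate.

    Each character is deleted independently with the given probability.
    """
    rng = rng or random.Random()
    return [i for i in range(n_characters) if rng.random() >= removal_probability]


def aic(log_likelihood, n_free_parameters):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_free_parameters - 2 * log_likelihood


def best_model_by_aic(fits):
    """Pick the model with the lowest AIC.

    fits maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    kept = jackknife_columns(1848, rng=random.Random(0))
    print(f"kept {len(kept)} of 1848 characters in this replicate")

    # Hypothetical fits, for illustration only (not values from the study).
    toy_fits = {"model_A": (-100.0, 3), "model_B": (-99.5, 10)}
    print(best_model_by_aic(toy_fits))
```

On average a replicate keeps about 63% of the characters, and the AIC penalizes each additional free parameter by 2 units, so a richer model is preferred only if it raises the log-likelihood by more than one unit per added parameter.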
Maximum Likelihood (ML) analyses were performed using RAxML-HPC2 on XSEDE version 8.2.12 [80] on the CIPRES web server [81], with 1000 rapid bootstrap analyses followed by a search for the best-scoring tree in a single run [82]. The nucleotide substitution models "GTRCAT" and "GTRGAMMA" were chosen for the bootstrapping phase for the chloroplast dataset and nuclear dataset, respectively.
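For readers who want to reproduce a comparable run outside CIPRES, the sketch below assembles a command line for a standalone RAxML 8 installation. It assumes the executable is called raxmlHPC; the file names, run names and random seed are hypothetical, and the exact settings used by the authors on the web server may differ.

```python
def raxml_rapid_bootstrap_args(alignment, run_name, model, n_bootstraps=1000, seed=12345):
    """Build an argument list for a RAxML 8 rapid-bootstrap plus best-tree run.

    The executable name, file names and seed are assumptions for illustration.
    """
    return [
        "raxmlHPC",
        "-f", "a",                # rapid bootstrapping and best-scoring ML tree search in one run
        "-m", model,              # substitution model, e.g. GTRCAT or GTRGAMMA
        "-p", str(seed),          # random seed for the parsimony starting trees
        "-x", str(seed),          # random seed for rapid bootstrapping
        "-#", str(n_bootstraps),  # number of bootstrap replicates
        "-s", alignment,          # input alignment file
        "-n", run_name,           # suffix for the output files
    ]


if __name__ == "__main__":
    # Hypothetical input files; models follow the assignment given in the text.
    chloroplast = raxml_rapid_bootstrap_args("chloroplast_snps.phy", "cp_run", "GTRCAT")
    nuclear = raxml_rapid_bootstrap_args("nuclear_snps.phy", "nuc_run", "GTRGAMMA")
    print(" ".join(chloroplast))
    print(" ".join(nuclear))
```

The list can be passed to subprocess.run once RAxML is installed; it is only printed here so that the sketch runs without the program.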

GBS Data Analysis Summary
A summary of the sequence data generated from the samples studied is included in Supplementary Material Tables S1-S3. The 190 accessions, which consisted of 186 samples of Rubus and 4 samples of outgroups, were successfully sequenced with the Illumina HiSeq sequencing platform. The mean GC content was 37.85%, which is within the normal range. The sequencing data were of high quality (Q20 ≥ 96.19%, Q30 ≥ 89.54%) and suitable for subsequent analyses. After excluding the outgroups, the average sequencing depth of the Rubus samples ranged from 10.81 to 75.48, with an average of 25.21. The average degree of sequence coverage from the samples of Rubus was 14.51% for at least single-base coverage, while the average degree of coverage of the samples with at least 4-base coverage was 8.68%. A total of 7,293,296 SNPs (nuclear genome) and 8115 SNPs (chloroplast genome) were obtained according to detection with SAMtools software; 256,954 high-quality SNPs (nuclear genome) and 1848 high-quality SNPs (chloroplast genome) were obtained after filtration.
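The effect of filtering is easier to see as a proportion. The short Python sketch below uses only the counts reported above; the variable and function names are ours, and the filtering criteria themselves are not modeled.

```python
# Counts reported in the text.
rubus_samples, outgroup_samples = 186, 4
raw_snps = {"nuclear": 7_293_296, "chloroplast": 8_115}
high_quality_snps = {"nuclear": 256_954, "chloroplast": 1_848}


def retention_percent(raw, kept):
    """Percentage of raw SNP calls that survive filtering."""
    return 100.0 * kept / raw


retained = {
    genome: retention_percent(raw_snps[genome], high_quality_snps[genome])
    for genome in raw_snps
}

print(f"accessions sequenced: {rubus_samples + outgroup_samples}")
for genome, pct in retained.items():
    print(f"{genome}: {pct:.2f}% of raw SNPs retained")
# accessions sequenced: 190
# nuclear: 3.52% of raw SNPs retained
# chloroplast: 22.77% of raw SNPs retained
```

Filtering therefore removes a much larger share of the nuclear calls than of the chloroplast calls.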

Phylogenetic Tree Based on SNPs of Nuclear Genome
Both maximum likelihood (ML) phylogeny and maximum parsimony (MP) jackknife (JK) analyses based on high-quality SNPs from the nuclear genome showed Rubus to be monophyletic (Figure 1). ML bootstrap (BS) support and MP JK support are shown on the phylogenetic tree from the ML analysis. Based on our reconstructed nuclear phylogeny and in consideration of morphological characteristics and distribution information, 19 maximally supported (ML BS = 100%, MP JK = 100%) clades of Rubus were identified in this study (Figure 1). The relationships among most of the 19 clades were also well supported.
The topologies and support values from ML and MP analyses were mostly similar except for the position of Clade XI. The ML analyses resolved Clade XI as sister to the lineage containing Clade XII-Clade XIX with maximum support values. In contrast, the MP analysis resolved Clade XI as sister to the remaining clades except for Clade I and Clade II; support values were also maximum.
Clade I plus Clade II were resolved as monophyletic with 77% ML BS and 94% MP JK support and as sister to the rest of the ingroup.
Except for Clade XI, the basal Clade I plus Clade II, and the predominately polyploid Clade XII, Clade XIII and Clade XIV, the remaining clades formed two strongly supported superclades. Superclade A contained Clade III-Clade X, and Superclade B consisted of Clade XV-Clade XIX.

Phylogenetic Tree Based on SNPs of Chloroplast Genome
The topologies from the ML and MP analyses were similar for the chloroplast phylogeny, which also showed Rubus to be a monophyletic group. However, the topologies and support values based on the chloroplast data were not identical to those based on the nuclear data (Figure 2). First, for the topology of the chloroplast tree, 13 (Clades II, III, IV, V, VI, VII, IX, X, XII, XIII, XIV, XV and XVII) of the 19 major clades identified from the nuclear data were reconfirmed, but the monophyly of 6 clades (Clades I, VIII, XI, XVI, XVIII and XIX) was not supported. Rubus lasiococcus did not cluster with Rubus pedatus. The accessions of Clade VIII identified by nuclear data were split into two clusters on the chloroplast tree, one of which (including Rubus eustephanos, Rubus hirsutus, Rubus rosifolius, etc.) was sister to Clade VI. Clade XVI clustered with a portion of the accessions in Clade XVIII (Rubus pseudopileatus var. glabratus, Rubus trijugus, etc.) and Rubus lasiostylus and R. lasiostylus var. villosus of Clade XIX. Species in the remainder of Clade XIX clustered with some of the accessions in Clade XVIII (Rubus biflorus, etc.). Second, the two superclades were not supported by the chloroplast data. Third, Clade III was sister to the lineage containing Clades XVI, XVIII and XIX, instead of the lineage containing Clades IV to X. Clade XIV nested within Clade XI and was not sister to Superclade B as identified by nuclear data. Clade XVII was more closely related to Clade XV than to Clades XVIII and XIX.
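Whether a clade from one tree is "reconfirmed" in another comes down to a monophyly test: the accessions of the clade must make up the complete leaf set of some node in the second tree. The Python sketch below shows this on toy rooted trees written as nested tuples; the taxon labels and both topologies are hypothetical and are not taken from Figures 1 and 2.

```python
def clades(tree):
    """Return (leaf set of the tree, set of leaf sets of all its nodes).

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def is_monophyletic(tree, group):
    """True if the taxa in group form the complete leaf set of some node."""
    return frozenset(group) in clades(tree)[1]


if __name__ == "__main__":
    # Hypothetical topologies: group "a" is a clade in the first tree only.
    nuclear_like = ((("a1", "a2"), ("b1", "b2")), "outgroup")
    chloroplast_like = ((("a1", "b1"), ("a2", "b2")), "outgroup")
    group_a = {"a1", "a2"}
    print(is_monophyletic(nuclear_like, group_a))      # True
    print(is_monophyletic(chloroplast_like, group_a))  # False
```

Applied to real trees, the same test would be run once per nuclear clade against the chloroplast topology, ideally after collapsing weakly supported branches.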

Phylogenetic Incongruence between the Chloroplast and Nuclear Phylogenies
The monophyly of Rubus in its current circumscription was maximally supported in this study (Figures 1 and 2) and consistent with the results of earlier studies [1,42,46], further confirming Rubus to be a natural group. Phylogenetic incongruence, however, was detected between the chloroplast and nuclear phylogenies as the topologies based on the chloroplast data were not identical to those based on the nuclear data. The discordance between the nuclear and chloroplast phylogenetic relationships may be due to incomplete lineage sorting, genetic introgression and hybridization.

Major Evolutionary Lineages within Rubus Revealed in This Study
Within the ingroup, the 186 accessions representing 65 species, 1 subspecies and 17 varieties of Rubus included in the current study were assignable to the following 19 well-supported clades (Figure 1). Those clades are also supported by morphological characteristics.
The clades (Figure 1) can be divided into five main groups and thus may represent several major evolutionary lineages within Rubus, which is somewhat consistent with the results of Carter et al. [42]. The first group, found primarily in North America and containing Clades I and II, represents the basal lineage of Rubus. The second group is composed of Clade XI and mainly includes members of Rubus bearing aggregate fruit that falls together with the receptacle. The third group consists of Superclade A, including Clades III to X. The fourth group contains Clades XIII and XIV and represents the polyploid species of Rubus. The fifth group consists of Superclade B, including Clades XV to XIX. Unfortunately, morphological homoplasy is common in Rubus, thereby making it challenging to distinguish Superclades A and B from the other groups. That is, morphological synapomorphies of the two superclades are unclear at present.
Clade I--Clade I, composed of R. lasiococcus and R. pedatus, was shown to be sister to Clade II with high support (ML BS = 77%, MP JK = 94%; Figure 1). R. lasiococcus is native to North America and R. pedatus occurs naturally in North America and NE Asia. Both species were included in R. subg. Dalibarda by Focke [9,11]. They are morphologically similar in their creeping herbaceous habit, unarmed stems, broad stipules and filiform filaments.
Clade II--Clade II contained two species, Rubus odoratus and Rubus parviflorus, and was sister to Clade I (Figure 1). The sister relationship between R. odoratus and R. parviflorus was consistent with the findings by Carter et al. [42]. The two species, both native to North America and included in R. subg. Anoplobatus by Focke [9,11], are distinguished from other species of Rubus by habit (erect, prickleless shrubs) and their simple palmately lobed or divided leaves. Our study, together with previous studies [1,42], provides strong molecular evidence that R. subg. Dalibarda, R. subg. Anoplobatus and R. subg. Chamaemorus occupy basal positions in the phylogenetic trees, which conflicts with Lu's [5] hypothesis of R. subg. Idaeobatus being the most primitive group.
Clade III--Clade III contained Rubus arcticus subsp. stellatus and Rubus pubescens and was strongly supported (ML BS = 100%, MP JK = 100%) as sister to the lineage containing Clades IV to X (Figure 1). Clade III to Clade X together form the lineage Superclade A, the morphological synapomorphy of which is unclear at present. R. arcticus subsp. stellatus occurs in North America and the Russian Far East; R. pubescens occurs in North America. Morphologically, they are somewhat similar to the members of Clade I, but differ from the latter in having dilated, laminar filaments.
Clade IV--Clade IV, composed of eight samples of Rubus ellipticus var. obcordatus and Rubus wallichianus, was strongly supported (ML BS = 100%, MP JK = 100%) as sister to the clade containing Clades V to X (Figures 1 and 2). The species of Clade IV are endemic to Asia [6,7] and were included in R. subg. Idaeobatus sect. Idaeanthi ser. Elliptici sensu Focke [10,11]. The morphological synapomorphies of Clade IV are shrubs with sparse, curved prickles and dense, spreading reddish brown bristles and usually three-foliolate leaves.
Clade V--This well-supported clade contained only one species, Rubus peltatus of China and Japan. Clade V was sister to Clade VI with maximum support (Figure 1). R. peltatus differs from other species of Rubus in having peltate leaves. Our molecular analysis supported the treatment of Focke [10,11], Yu and Lu [83], Lu and Yu [6] and Naruhashi [84], who placed R. peltatus in the monospecific section R. subg. Idaeobatus sect. Peltati (Lu and Yu's section and subsection, respectively, almost corresponding to Focke's subgenus and section).
Clade VI--In our samples, Clade VI contained Rubus corchorifolius, Rubus chingii, Rubus glabricarpus var. glabricarpus and R. glabricarpus var. glabratus. The sister relationship between Clade VI and Clade V was well supported (ML BS = 100%, MP JK = 100%) in our analyses (Figure 1). The four taxa of Clade VI are distributed in Asia and were included in R. sect. Idaeobatus subsect. Corchorifolii by Lu and Yu [6]. Morphologically, species of Clade VI share the following features: flowers usually solitary, leaves simple, stipules adnate to base of petiole, stems usually pilose or glandular (except in R. chingii), aggregate fruit nearly globose and hairy or glabrous. The members of this clade are similar in appearance to taxa included in Clade VII, with the main difference being that the latter have three to several flowers in nearly corymbiform inflorescences and glabrous aggregate fruit. Clade VI can be divided into two subclades. The first contains R. corchorifolius and R. chingii. Species of this subclade lack glandular hairs and the fruits are hairy. Members of the second subclade, represented by R. glabricarpus var. glabricarpus and R. glabricarpus var. glabratus, have glandular hairy pedicels and glabrous fruits.
Clade VII--Clade VII comprised two species, Rubus crataegifolius and Rubus conduplicatus, in our sampling and was strongly supported (ML BS = 100%, MP JK = 100%) as sister to a monophyletic lineage composed of Clades VIII, IX and X (Figure 1). Species of Clade VII are morphologically similar to each other in their nearly corymbiform inflorescences with three to several flowers, simple leaves, stipules adnate to the base of the petiole, and nearly globose, glabrous aggregate fruit.
Clade VIII--In our sample, Clade VIII contained R. eustephanos, R. hirsutus, R. rosifolius, Rubus sumatranus and Rubus cf. tsangii, and was strongly supported (ML BS = 100%, MP JK = 100%) as sister to the monophyletic lineage formed by Clades IX and X (Figure 1). Species of Clade VIII are mainly in Asia. Morphologically, they share a shrubby habit, imparipinnate leaves and ca. 100 or more carpels usually inserted on a stipitate torus. Our molecular evidence supported the classification of Focke [10,11], Yu and Lu [83], Lu and Yu [6] and Naruhashi [84], who placed these species in R. subg. Idaeobatus sect. Rosifolii as a natural group.
Clade IX--Clade IX is composed of four endemic Asian species: Rubus delavayi, Rubus macilentus, Rubus simplex and Rubus xanthocarpus. The sister relationship between Clade IX and Clade X was well supported (ML BS = 100%, MP JK = 100%; Figures 1 and 2). Morphologically, species of Clade IX are subshrubs or herbs with leaves of three (or five) leaflets and with the abaxial surface of the calyx pubescent and bearing straight needle-like or curved minute prickles (except in R. macilentus, where the calyx is unarmed). R. simplex and R. xanthocarpus were both previously included in R. subg. Cylactis by Focke [9,11] for their suffruticose or nearly herbaceous habit. Yu and Lu [83], Lu and Yu [6] and Lu and Boufford [7], noting that the two species are morphologically similar to some members of R. subg. Idaeobatus in non-paniculate inflorescences and that the abaxial surface of the leaflets and aggregate fruit were not tomentose, transferred them to R. sect. Idaeobatus sensu Lu and Yu [6]. Our study (Figures 1 and 2) showed that R. simplex and R. xanthocarpus are more closely related to R. macilentus of R. subg. Idaeobatus than to R. pubescens of R. subg. Cylactis, which supported the classification of Lu and Yu [6] and Lu and Boufford [7].
Clade X--Clade X consisted of four samples of Rubus columellaris and two samples of Rubus impressinervus. It was resolved as sister to the monophyletic Clade IX with maximum support (Figures 1 and 2). The two species are morphologically quite different from the other species of Rubus that we sampled in having approximately leathery leaves. Our study suggests that R. sect. Idaeobatus subsect. Leucanthi sensu Lu and Yu (including R. columellaris, R. delavayi, R. impressinervus, etc.) [6] is paraphyletic and implies that the species with leathery leaves in R. subg. Idaeobatus may be a natural group and that R. delavayi might be excluded from R. subg. Idaeobatus sect. Leucanthi.
Clade XI--Clade XI was composed of Rubus allegheniensis, Rubus argutus, Rubus canadensis, etc. ML analyses of nuclear data indicated that this clade is strongly supported (BS = 100%) as sister to the monophyletic lineage formed by Clade XII-Clade XIX (Figure 1). The morphological characteristics of this clade include shrubs being often prickly, leaves usually ternate, pedately or palmately and quinately compound, persistent narrow stipules which are mostly adnate to the base of the petiole, drupelets remaining on the fleshy receptacle at maturity or falling with the receptacle, or falling separately. Geographically, the members of this clade are mainly American and Eurasian. The eight species of this clade and Rubus caesius (4x) of Clade XIV were all included in R. subg. Eubatus (= subg. Rubus) by Focke [11]. Although the nine species clustered together based on chloroplast data (Figure 2), the phylogenetic analyses based on our nuclear data (Figure 1), ITS [1] and nearly one thousand low copy nuclear genes [42] suggested that R. subg. Rubus is not monophyletic.
Clade XII--Clade XII contained three samples of Rubus paniculatus, one of the few doubtfully diploid species reported to occur in the predominately polyploid R. subg. Malachobatus [27]. This clade was strongly supported (ML BS = 100%, MP JK = 100%) as sister to Clade XIII based on the nuclear data (Figure 1). The notable morphological characteristics of Clade XII include shrubs that are often prickly, stems that are erect, arching or climbing, and broad, free stipules that are caducous or persistent on the twig near the base of the petiole.
Clade XIII--Clade XIII, comprising the Asian endemics Rubus pentagonus var. pentagonus (tetraploid [42]) and R. pentagonus var. eglandulosus, was sister to Clade XII with maximum support (Figure 1). Species of Clade XIII are morphologically distinguishable from other sampled taxa by their shrubby habit and palmately compound leaves with three or five leaflets. The close phylogenetic relationship between R. pentagonus of subg. Idaeobatus and R. paniculatus of subg. Malachobatus is consistent with the findings of Wang et al. [46] and Carter et al. [42] and may provide some support to the hypothesis that R. pentagonus may be one of the possible progenitors of the subg. Malachobatus polyploids [42,46].
Clade XIV--Clade XIV consisted of two samples of R. caesius (Figure 1), a tetraploid of subgenus Rubus occurring naturally from Europe and western Asia to western China. Our chloroplast phylogeny showed that R. caesius was nested within Clade XI (ML BS = 100%, MP JK = 100%; Figure 2), indicating that the maternal parent of R. caesius was likely from R. subg. Rubus, which is consistent with the findings of Carter et al. [42].
Clade XV--Clade XV contained two North American species, Rubus leucodermis and R. occidentalis. The nuclear phylogeny indicated that Clade XV was sister to the lineage formed by Clades XVI through XIX (ML BS = 100%, MP JK = 100%; Figure 1). Clade XV to Clade XIX together form Superclade B (ML BS = 100%, MP JK = 100%; Figure 1), the morphological synapomorphy of which is unclear at present. Morphologically, the two species of Clade XV share palmately compound or ternate leaves, nearly black aggregate fruit and drupelets separating from the torus. R. leucodermis, R. occidentalis and two species, Rubus eriocarpus and Rubus glaucus, which were not sampled, were all placed in R. subg. Idaeobatus sect. Idaeanthi ser. Occidentales by Focke [9,11]. In the molecular phylogeny the three diploid species, R. leucodermis, R. occidentalis and R. eriocarpus, were closely related, while the tetraploid R. glaucus aligned with some putative blackberry × raspberry hybrids [42], suggesting this series defined by Focke is not monophyletic.
Clade XVII--Clade XVII, composed of Rubus pungens var. pungens, R. pungens var. ternatus and R. pungens var. oldhamii, was resolved as sister to the monophyletic lineage formed by Clade XVIII and Clade XIX with maximum support (Figure 1). Generally, species of Clade XVII are widely distributed from Kashmir to Japan and exhibit complex morphological variability: prickles dense to sparse, glandular hairs present or absent and size of leaflets unstable. However, they differ in appearance from the other species of Rubus we sampled: shrubs, stems longer, climbing or trailing, with dense or sparse needle-like prickles (R. pungens var. oldhamii sometimes nearly unarmed), vegetative reproduction mainly by rooting at apex of stem, three- to nine-foliolate leaves, abaxial surface of calyx with needle-like prickles, inflorescences terminal or axillary, 1-flowered or corymbose 2- to 4-flowered.
Clade XVIII--Clade XVIII included R. biflorus var. biflorus, R. cf. biflorus var. adenophorus, R. biflorus var. pubescens, R. pseudopileatus var. glabratus, R. cf. pseudopileatus var. glabratus and R. trijugus and was sister to Clade XIX with maximum support (Figure 1). Species contained in Clade XVIII occur in Asia and have the following features: shrubs, branchlets usually pruinose, pedicels distally inflated, leaves thick papyraceous or semi-leathery, aggregate fruit yellow or reddish yellow and densely gray tomentose (tomentum deciduous in R. biflorus and its varieties), flowers 1.5-3 cm in diameter and one to several in corymbose inflorescences.

A Preliminary Classification Scheme of the Diploid Species of Rubus
According to our results, combined with previous phylogenetic analyses [1,21,37,42,46,47,49,50,52,53], a preliminary classification scheme of the diploid species of Rubus (Figure 3) is proposed herein. R. paniculatus, representing R. subg. Malachobatus, was excluded since its ploidy level is uncertain and it was not verified in this study. Before making formal taxonomic and nomenclatural decisions on an infrageneric classification, more samples and molecular data are needed to unravel and confirm the relationships and evolutionary history of these clades.
Figure 3. A preliminary classification scheme of the diploid species of Rubus.
3a. Stem herbaceous, never prickly, rarely bristly; stipules broad, free or nearly so; floral branches arising directly from rootstock or from stolons.

Conclusions
The taxonomy of Rubus is challenging and the phylogenetic relationships within Rubus remain to be clarified. This study inferred the phylogeny of Rubus with emphasis on diploid species based on GBS data with comprehensive taxon sampling. Our results provided useful information for deducing the phylogeny of Rubus, especially providing important insights into the evolution of Rubus in China. We reconfirmed that R. subg. Idaeobatus, recognized by Focke, is not monophyletic. We found that characteristics such as leathery or papyraceous leaves may be of some use in classifying the raspberries. Based on our results, and combined with previous phylogenetic analyses, a preliminary classification scheme of the diploid species of Rubus is proposed here. Further studies, however, including more samples and additional molecular data, are still needed to better unravel and confirm the complicated evolutionary history of Rubus.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14061152/s1, Table S1: Quality statistics of sequencing data of 190 samples; Table S2: Statistics of sequencing depth and coverage of nuclear genome of 190 samples; Table S3