Integrative Taxonomy of Armeria Taxa (Plumbaginaceae) Endemic to Sardinia and Corsica

Sardinia and Corsica are two Mediterranean islands where the genus Armeria is represented by 11 taxa, 10 out of which are endemic. An integrative approach, using molecular phylogeny, karyology, and seed and plant morphometry was used to resolve the complex taxonomy and systematics in this group. We found that several taxa are no longer supported by newly produced data. Accordingly, we describe a new taxonomic hypothesis that only considers five species: Armeria leucocephala and A. soleirolii, endemic to Corsica, and A. morisii, A. sardoa, and A. sulcitana, endemic to Sardinia.


Introduction
Island biodiversity attracted the attention of many biologists since the mid-XIX century [1][2][3][4]. Islands have been historically classified according to their geological history into two large groups: oceanic and continental [5,6]. In the Mediterranean, among the largest (>5000 km 2 ) islands-Sicily, Sardinia, Cyprus, Crete, and Corsica-only Cyprus can be considered an oceanic island [7], whereas all the others are continental islands. Speciation processes in the two types of islands are different [8][9][10].
Sardinia and Corsica share a similar Oligocenic-Miocenic (ca. 30 to 16 Ma) geological history [11,12] that played a role in shaping the current biodiversity. In fact, some authors consider them as a single biogeographic unit [13,14]). More recently, the Messinian salinity crisis (5.96-5.3 Mya) led to a high rate of extinction of the tertiary flora, followed by the onset of the Mediterranean climate [7].
It is generally assumed that continental islands should possess a higher rate of palaeoendemics. However, Cheikh Albassatneh et al. [15] failed to find any evidence supporting this idea for the flora endemic to Sardinia and Corsica. Irrespective of its origin, this flora shows a much higher rate of endemism than the mainland [16] due to isolation processes, reduced gene flow and adaptation, resulting in a higher rate of speciation [17]. The current floristic diversity of Sardinia and Corsica [18] is one of the highest in the Mediterranean. Indeed, 485 out of the 4572 plant species native to Sardinia and Corsica are endemic, and the Corso-Sardinian biogeographic province is currently considered a hotspot for plant diversity in the Mediterranean Basin [13,[19][20][21].

Molecular Phylogenetic Analysis
The results of the ILD test are shown in Table S2 and allow us to perform the concatenation of plastidial and nuclear markers. In total, the concatenated matrix contained 2384 characters: the nuclear marker (ITS) showed only seven polymorphic sites, whereas the plastidial markers showed 51 polymorphic sites (trnH-psbA: 12; trnL-rpl32: 28; trnL-trnF: 5; trnQ-rps16: 6), the ITS and plastidial trees are shown in Figure S2. The phylogenetic tree ( Figure 3) shows a rather complex scenario, with many evolutionary grades and admixtures between populations and even individuals. The chosen outgroup is formed by two populations of A. pungens, whereas the other branches of the tree are mostly collapsed. Armeria leucocephala and A. multiceps populations form together a scattered paraphyletic group. One population, BI, is found in two completely different grades. Concerning A. sardoa Spreng. subsp. sardoa and A. sardoa subsp. genargentea Arrigoni, they form a well-supported clade (0.98) together; however, the latter subspecies is paraphyletic. Lastly, A. morisii and A. sulcitana Arrigoni from the type locality form a clade closely related to A. soleirolii and to part of the A. leucocephala accessions.

Molecular Phylogenetic Analysis
The results of the ILD test are shown in Table S2 and allow us to perform the concatenation of plastidial and nuclear markers. In total, the concatenated matrix contained 2,384 characters: the nuclear marker (ITS) showed only seven polymorphic sites, whereas the plastidial markers showed 51 polymorphic sites (trnH-psbA: 12; trnL-rpl32: 28; trnL-trnF: 5; trnQ-rps16: 6), the ITS and plastidial trees are shown in Figure S2. The phylogenetic tree ( Figure 3) shows a rather complex scenario, with many evolutionary grades and admixtures between populations and even individuals. The chosen outgroup is formed by two populations of A. pungens, whereas the other branches of the tree are mostly collapsed. Armeria leucocephala and A. multiceps populations form together a scattered paraphyletic group. One population, BI, is found in two completely different grades. Concerning A. sardoa Spreng. subsp. sardoa and A. sardoa subsp. genargentea Arrigoni, they form a well-supported clade (0.98) together; however, the latter subspecies is paraphyletic. Lastly, A. morisii and A. sulcitana Arrigoni from the type locality form a clade closely related to A. soleirolii and to part of the A. leucocephala accessions.

Seed Morphometric Analysis
According to Random Forest mean decreased accuracy, the five most important seed morphometric features are, in decreasing order: CHull, MBCRadius, Feret, Concavity, and MaxR (Tables S3 and S4). The overall accuracy of the final model on the current species circumscription was quite low (50.47%), albeit the AUC (Area Under the ROC Curve) was 0.859. The highest per class errors were found among the subspecies of Armeria leucocephala and A. multiceps (class error >66%) (Table S6). On the contrary, the lowest values of misclassification were obtained by A. sardoa subsp. sardoa and A. morisii (27.09% and 30.1%, respectively). According to the tuned HDBSCAN* model, the whole dataset contains only two true clusters ( Figure S3) for values ranging from 20 to almost zero, obtaining cluster scores (i.e., the sum of the stability scores for each salient flat cluster) of 59.40 (first cluster) and 315.54 (second cluster). The first cluster contains all the seeds from the GO  These two populations show much lower values for the five most important morphometric  features found by Random Forest with respect to all others (Table S5).
The second cluster contains all the remaining populations. A finer structure is found within this latter cluster, given that two unsupported subclusters can be observed. PCoA applied to the same dataset confirmed this pattern. The first two axes captured more than 86% of the total variability of the distance matrix ( Figure S3). The KMO test returned a MSA = 0.5.

Plant Morphometric Analysis
The two first axes of the PCoA explain 58.4% of the total variance ( Figure 4). All of the populations largely overlap in the morphometric space. Both A. leucocephala and A. multiceps show a large overlap among their respective subspecies, whereas some degree of separation is present between A. sardoa subsp. sardoa and A. sardoa subsp. genargentea, especially concerning the populations from Badde Urbara (BU), Monte Limbara (ML), and Aritzo (AR). Armeria morisii (TH) shows a clear separation from all other taxa ( Figure 4 and Figure S4). The overall pattern is maintained when comparing the first and the third axis, but A. soleirolii shows an almost complete separation in the morphospace ( Figure S5).   (Table 1), whereas the colors indicate the taxa at species level. Asterisk = type localities.
The best k value for kNN when applied to the current taxonomic hypothesis (Table  S7) at species level was k = 3. The multiclass AUC was 0.958, whereas the balanced accuracy was 92.8%.
The lowest values of balanced accuracy were obtained by the three subspecies within A. leucocephala (A. leucocephala subsp. breviaristata 69.15%; A. leucocephala subsp. leucocephala 60%; A. leucocephala subsp. pubescens 89.43%) followed by A. multiceps subsp. multiceps (88.6%). Moving to the univariate data, a heatmap ( Figure 5) shows that the highest number of pairwise significant differences occurs between A. sardoa subsp. genargentea and A. morisii (29), whereas there is no difference between the subspecies within A. leucocephala and A. multiceps. Again, A. morisii shows the highest number of pairwise differences, fol-  (Table 1), whereas the colors indicate the taxa at species level. Asterisk = type localities.
The best k value for kNN when applied to the current taxonomic hypothesis (Table S7) at species level was k = 3. The multiclass AUC was 0.958, whereas the balanced accuracy was 92.8%.
The lowest values of balanced accuracy were obtained by the three subspecies within A. leucocephala (A. leucocephala subsp. breviaristata 69.15%; A. leucocephala subsp. leucocephala 60%; A. leucocephala subsp. pubescens 89.43%) followed by A. multiceps subsp. multiceps  (88.6%). Moving to the univariate data, a heatmap ( Figure 5) shows that the highest number of pairwise significant differences occurs between A. sardoa subsp. genargentea and A. morisii (29), whereas there is no difference between the subspecies within A. leucocephala and A. multiceps. Again, A. morisii shows the highest number of pairwise differences, followed by A. sardoa subsp. genargentea (135) and A. sardoa subsp. sardoa (104). In general, all subspecies within A. multiceps and A. leucocephala show <50 pairwise differences. There is a high similarity among taxa, especially A. sulcitana, A. leucocephala and its subspecies, A. multiceps and its subspecies, A. sardoa and its subspecies. Accordingly, we tested two alternative taxonomic hypotheses based on the phylogenetic results and on morphometric comparisons. Hypothesis 1 consists in just lumping the subspecies under their respective species, while hypothesis 2 consists in also lumping A. leucocephala and A. multiceps. A tuned kNN model under hypothesis 1 showed a balanced accuracy of 97.87% with k = 5 (Table S8), whereas a tuned kNN model for hypothesis 2 yielded a final balanced accuracy of 99.01% with k = 4 and AUC = 0.994 (Table S9). In the Canonical Variate Analysis morphospace, where individuals were grouped according to hypothesis 2, A. sardoa is distinct from A. sulcitana, A. leucocephala, and A. soleirolii following the first and second discriminant functions, whereas A. sulcitana is distinct from other taxa when the first and third discriminant functions are plotted. These distinctions are confirmed by the lack of overlap of the 95% confidence ellipses ( Figure S4).

Discussion
The currently accepted taxonomic hypothesis [24] is no longer supported by the new evidence produced in this study. Specifically, neither morphological nor molecular or karyological data is able to support the grouping hypothesis currently accepted for Armeria in Sardinia and Corsica ( Figure 6). There is a high similarity among taxa, especially A. sulcitana, A. leucocephala and its subspecies, A. multiceps and its subspecies, A. sardoa and its subspecies. Accordingly, we tested two alternative taxonomic hypotheses based on the phylogenetic results and on morphometric comparisons. Hypothesis 1 consists in just lumping the subspecies under their respective species, while hypothesis 2 consists in also lumping A. leucocephala and A. multiceps. A tuned kNN model under hypothesis 1 showed a balanced accuracy of 97.87% with k = 5 (Table S8), whereas a tuned kNN model for hypothesis 2 yielded a final balanced accuracy of 99.01% with k = 4 and AUC = 0.994 (Table S9). In the Canonical Variate Analysis morphospace, where individuals were grouped according to hypothesis 2, A. sardoa is distinct from A. sulcitana, A. leucocephala, and A. soleirolii following the first and second discriminant functions, whereas A. sulcitana is distinct from other taxa when the first and third discriminant functions are plotted. These distinctions are confirmed by the lack of overlap of the 95% confidence ellipses ( Figure S4).

Discussion
The currently accepted taxonomic hypothesis [24] is no longer supported by the new evidence produced in this study. Specifically, neither morphological nor molecular or karyological data is able to support the grouping hypothesis currently accepted for Armeria in Sardinia and Corsica ( Figure 6).
Seed shape and size are under natural selection and may play a role in the adaptation of the population to different ecological conditions, leading to genetically fixing some characters and shaping the evolution of a species [39,40]. However, this is not the case for the Armeria taxa endemic to Sardinia and Corsica, in which we highlighted a high homogeneity in seed features. From a karyological point of view, there is no significant difference among taxonomic units, either arranging the taxa according to the current hypothesis or according to the alternative grouping hypotheses 1 or 2. Accordingly, there is a high degree of karyological homogeneity, suggesting that evolution in Sardinia and Corsica did not imply any relevant chromosomal rearrangement. It is well known that homoploid hybrid speciation is frequent in Armeria [41,42]. Indeed, the karyotype is very stable with 2x = 2n = 18 [24,43], as also confirmed by our data, despite in other cases slight differences in karyotype structure can be observed [33]. The overall karyological homogeneity and the reduced prezygotic barriers, limited to the sole pollen-stigma dimorphism [44][45][46], may have facilitated the hybridization/introgression and reticulate evolution of these plants in Sardinia and Corsica [25,47,48]. Indeed, hybridization is common in Armeria [34,49], despite the low signal provided by ITS ( Figure S2), which may be due to concerted evolution [47]. Accordingly, the phylogenetic signal found in the concatenated tree is largely dominated by the information coming from the cpDNA ( Figure S2), which in Armeria is maternally inherited [50]. Hybridization/introgression, incomplete lineage sorting and concerted evolution could all explain the observed pattern in the Armeria clade containing all the taxa endemic to Sardinia and Corsica. There, some individuals belonging to the same population are sometimes scattered all over the tree Seed shape and size are under natural selection and may play a role in the adaptation of the population to different ecological conditions, leading to genetically fixing some characters and shaping the evolution of a species [39,40]. However, this is not the case for the Armeria taxa endemic to Sardinia and Corsica, in which we highlighted a high homogeneity in seed features. From a karyological point of view, there is no significant difference among taxonomic units, either arranging the taxa according to the current hypothesis or according to the alternative grouping hypotheses 1 or 2. Accordingly, there is a high degree of karyological homogeneity, suggesting that evolution in Sardinia and Corsica did not imply any relevant chromosomal rearrangement.
It is well known that homoploid hybrid speciation is frequent in Armeria [41,42]. Indeed, the karyotype is very stable with 2x = 2n = 18 [24,43], as also confirmed by our data, despite in other cases slight differences in karyotype structure can be observed [33]. The overall karyological homogeneity and the reduced prezygotic barriers, limited to the sole pollen-stigma dimorphism [44][45][46], may have facilitated the hybridization/introgression and reticulate evolution of these plants in Sardinia and Corsica [25,47,48]. Indeed, hybridization is common in Armeria [34,49], despite the low signal provided by ITS ( Figure S2), which may be due to concerted evolution [47]. Accordingly, the phylogenetic signal found in the concatenated tree is largely dominated by the information coming from the cpDNA ( Figure S2), which in Armeria is maternally inherited [50]. Hybridization/introgression, incomplete lineage sorting and concerted evolution could all explain the observed pattern in the Armeria clade containing all the taxa endemic to Sardinia and Corsica. There, some individuals belonging to the same population are sometimes scattered all over the tree (i.e., GO, MO, BI). Genomic data and more sophisticated models, such as coalescent network-like models, may be needed to disentangle among these possible events [51].
Nevertheless, we can see that the subspecies of A. leucocephala and A. multiceps are mixed and form a grade. This evidence is corroborated by the low number of univariate morphological features that separate these taxa, and supports the view by Tison & de Foucault [38], who did not recognize the taxonomic value of what they considered just altitudinal variations. Moreover, there is no possibility to morphologically separate neither the two species in their traditional concept, nor the two clades found by molecular phylogeny ( Figure 6). Accordingly, we do think that the only sound taxonomic solution is to merge A. leucocephala and A. multiceps into a single species. In this respect, the name Armeria leucocephala Salzm. ex W.D.J.Koch has priority over A. multiceps Wallr. Concerning A. sardoa, A. sardoa subsp. sardoa is phylogenetically included within A. sardoa subsp. genargentea. Morphologically, the two subspecies show only six statistically significant differences, possibly due to adaptation to different altitudes. Accordingly, we deem more opportune not to distinguish subspecies within A. sardoa. Armeria soleirolii is narrowly endemic to the northwestern part of the rocky coast of Corsica, ranging from Calvi to Galeria [24]. Interestingly, this taxon is phylogenetically isolated within a grade composed of taxa both from Sardinia and Corsica. The unique habitat and morphological features (i.e., the presence of papillate epidermal cells on leaves that are also canaliculate and stiff) support the autonomy of this taxon, which indeed shows no misclassification by kNN.
Lastly, A. morisii shows the highest number of unique morphological features. Armeria morisii and A. soleirolii are easily distinguished from all other taxa by a set of character states, and their position in the phylogenetic tree is better defined than in other cases. Our findings are partially in contrast with the observations published by Bernis [37], particularly for A. soleirolii. Finally, A. sulcitana is different from A. sardoa in having longer inner involucral bracts, longer awns and being taller (Table S10), whereas it is slightly different from A. leucocephala in possessing mainly dimorphic leaves with exclusively smooth margins (Table S11). Accordingly, considering the currently available systematic data together ( Figure 6), we deem the alternative taxonomic hypothesis 2 (i.e., five species) the best possible solution. Thus, we can conclude that there was an overestimation of plant diversity for this genus in the biogeographical region of Sardinia and Corsica. Interestingly, this is not an isolated case. Indeed, other studies that used an integrative taxonomic approach focusing on taxa endemic to this region found a similar pattern in Santolina L. (Asteraceae) and other taxa ( [52] and literature cited therein). A similar reduction in the number of taxa following integrative taxonomic approaches has been documented in Ophrys L. (Orchidaceae) [53] and Pulmonaria L. (Boraginaceae) [54] from elsewhere.
To facilitate the identification of the five species endemic to Sardinia and Corsica as newly circumscribed here (plus A. pungens), their morphological differences are summarized in an identification key (Section 4.1). Note: Koch [55] provided a description in German, also citing a collector ("Salzmann") and the provenance "in den Corsischen". Herbarium and types of Johann Friedrich Wilhelm Koch are unknown [56]. We found two herbarium specimens, one in G and one in MPU, where Salzmann's herbarium is preserved [56,57]. The sheet barcode G00440097 bears two plants (part of the same collection) and the labels (bottom-left corner of the sheet) "Armeria "Statice | Montagnes de Corse" and "Statice leucocephala| Armeria leucocephala Koch in . . . | Montagnes de Corse" (Salzmann's handwriting; C. Loup pers. comm.). Unfortunately, both specimens lack the date of collection. In many sheets, Salzmann did not annotate the date (C. Loup pers. comm.). Despite this, it seems prudential to designate the specimen G00440097 as neotype of the name Armeria leucocephala. This specimen corresponds to the protologue and to the application of this name in this study.

Armeria leucocephala
= Note: Salis-Marschlins [58] described Statice armeria var. pubescens providing a Latin diagnosis and the following provenance in "Monte Cagna supra Portovecchio. 4000' s. m." No original material was found. Therefore, a neotype was designated with a specimen concurring with the diagnosis and coming from the type locally. This specimen agrees with the protologue and the data presented in this study and definitely allows us to consider this name as a heterotypic synonym of A. leucocephala.
= Armeria leucocephala subsp.  Note: Sprengel [61] described this species as an addition to those already listed in volume 1 of Sprengel's Systema vegetabilium [62]. He provided a short diagnosis ("16. A. [Armeria] foliis filiformibus canaliculatis scapoque glabris, foliolis involucri obtusis") and the two generic localities of Sardinia and Corsica. Plumbaginaceae of Sprengel's herbarium were housed in B, but a large part of it was destroyed during the Second World War [56,63]. No original material was found in the main European herbaria consulted; therefore, a neotype was selected. This specimen corresponds to the protologue and to the data and application of this name in this study.

Identification Key to Armeria in Sardinia and Corsica
Before starting the identification, refer to the supplementary material published by Tiburtini et al. [33], where calyx and leaf apex characters are graphically exemplified. We also recommend taking several measures and computing the mean values of the mentioned characters. The key was built using herbarium material. Descriptive statistics and contingency tables for the newly circumscribed taxa are provided in Tables S10 and S11.

Sampling
We selected 15 populations (Table 1) across Sardinia and Corsica. The populations studied were selected based on three criteria: (1) to include all the type localities of the taxa considered in this study (BS, CB, MCA, ML, MO, MR, MS, RE, SP, and TH-acronyms as in Table 1); (2) to include other populations explicitly cited by Arrigoni [27]: AR, BU, GO, and BI; (3) to include other populations for those taxa showing a distribution range relatively larger than others (FO).
In total, we studied 197 herbarium specimens stored either in FI or in PI. Digitized specimens stored in the Herbarium Horti Botanici Pisani (PI) are freely available for consultation at http://erbario.unipi.it/ (codes in Table 1), whereas herbarium specimens stored in FI are listed in the specimina visa and marked with an asterisk (*). Concerning molecular systematics, dried leaves were collected from a subset of three individuals for each population and put in a paper bag with silica gel. Unfortunately, specimens from the BI population were unavailable for the morphometric analysis and were used only for the phylogenetic reconstruction. Ripen fruits were also collected from the same populations. The seeds were dried, cleaned, selected, and stored at −25 • C in the Sardinian Germplasm Bank (BG-SAR), Cagliari, Italy.

Karyological Data
We followed the protocol described by Tiburtini et al. [39] to obtain metaphasic plates. Seeds germinated easily in Petri dishes with agar-agar at 1%, without any pretreatment, in 4-6(8) days. Chromosomes were observed with a Leitz Diaplan microscope, and pictures were taken with a Leica MC-170HD camera through Leica LAS-EZ 3.0 imaging software. From all the pictures taken, at least four metaphase plates for each population were selected and measured. Chromosome number and karyological parameters such as THL (Total Haploid Length), M CA (Mean Centromeric Asymmetry), CV CL (Coefficient of Variation of the Chromosome Length), and CV CI (Coefficient of Variation of the Centromeric Index) were obtained from each plate with MATO 1.1 (version 20210101) [65]. All the statistical analyses, excluding the molecular systematics, were conducted in RStudio IDE [66]. Data were managed using the tidyverse package (version 2.0.0) [67], and pairwise permutation tests from the rcompanion package(version 2.4.26) [68] were used to test statistical significance reporting only p-values corrected with Benjamini-Hochberg procedure [69] (BH step-up procedure) to control the False Discovery Rate (FDR) at level = 0.01.

DNA Extraction and Molecular Systematics
We followed the protocol described by Tiburtini et al. [33] for DNA extraction, primer selection for internal transcribed spacers (ITS1 + 5.8S + ITS2) and four chloroplast intergenic spacers (trnF-trnL, trnH-psbA, trnL-rpl32, trnQ-rps16), their amplification, sequencing, and data analysis. ITS sequences and chloroplast intergenic spacers were submitted to DDBJ (ITS: LC757034-LC757084; trnF-trnL: LC757325-LC757375; psbA-trnH: LC757085-LC757135; trnL-rpl32: LC757226-LC757273; trnQ-rps16: LC757274-LC757324). Sequences were visually inspected and aligned using the ClustalW algorithm Sequences were visually inspected and aligned using the ClustalW algorithm [70] implemented in BioEdit (version 7.2.5) [71] with the default values. An incongruence length difference (ILD) test was carried out in Nona (version 2.0) [72] as a daughter process of Winclada (version 1.00.08) [73] to test the putative incongruence of nuclear and chloroplast partitions prior to combination; default values were used for the analysis. A nucleotide evolution model was calculated for each of the five sequenced regions using jModelTest (version 2.1.10) [74], and the best-fitting model was chosen over the others using the Bayesian Information Criterion (BIC) [75]. Bayesian phylogenetic trees for concatenated and nuclear markers were inferred in MrBayes (version 3.2.6) [76] in two simultaneous, independent runs with the following settings: 2,000,000 generations of MCMC sampling every 2000 generations and four runs (three cold and one hot). For the cpDNA, convergence was reached in 3,000,000 generations sampled every 3000 generations. Convergence and mixing were evaluated in Tracer (version 1.7.2) [77]. The consensus Bayesian tree was visualized in FigTree (version 1.4.2) [78] as implemented in BioEdit (version 7.2.5) with the default values. The best evolution models were JC for ITS and F81 for chloroplast markers, except for trnL-rpl32, which was F81+G. We were unable to amplify this plastidial marker from A. soleirolii. The tree was rooted using A. pungens

Seed Morphometric Analysis
Seeds were selected and prepared for image analysis. The seeds were dried at the Sardinian Germplasm Bank (BG-SAR) of Cagliari (Italy) based on established international protocols down to 3-5% of internal moisture content, which guarantees homogeneity and regularity in seed size and weight and allows an effective long-term conservation [79]. Digital images of 100 seeds for each accession were acquired using a flatbed scanner (Epson Perfection V550 photo) with a digital resolution of 1200 dpi. When an accession had fewer than 100 seeds, the analysis was carried out on the whole batch available. Seeds were randomly placed on the scanner tray so that they did not touch one another: each accession was scanned using a light blue background to avoid interference from environmental light. Images were processed using the software package ImageJ v. 1.53v [(http://rsb.info.nih.gov/ij) (accessed on 28 December 2022)], and the plugin Particles8, freely downloadable on the official website [(https://blog.bham.ac.uk/intellimic/g-landini-software/) (accessed on 28 December 2022)], was used to measure 26 morphometric features (Table S4). The raw final matrix was 1445 × 26 variables. We explored, visualized, and preprocessed the data through centering and scaling using the tidyverse package [67]. We build a Random Forest model using the caret package (version 6.0-93) [80] with 750 trees with a depth of 3. To both maximize model performance and reduce the effect of correlation, we tuned the hyperparameter mtry for 1 to 20. The dataset was split at each iteration in 75% training and 25% testing, and within the training set, we cross-validated the training data using the LGOCV method with 10 individuals left out at each iteration and applied it to the testing data. We selected the model that maximized ROC-AUC values instead of accuracy as a model performance metric to avoid the accuracy paradox. From the tuned model (mtry = 15), we extracted the most important morpho-colorimetric variables and the confusion matrix. Finally, we applied HDBSCAN* (Hierarchical Density-Based Spatial Clustering of Applications with Noise) [81,82], an unsupervised clustering algorithm. Classical clustering techniques such as K-mean are limited by the fact that (1) the number of clusters must be known a priori, (2) each point, even outliers, must belong to a cluster, and lastly (3) they assume some known probability density function (PDF) that may have generated the observed data. HDBSCAN* is a non-parametric, density-based clustering algorithm that resolves these problems and can be used both as a nonhierarchical and hierarchical clustering algorithm. The Manhattan distance was computed from the standardized matrix and used as input to HDBSCAN* from the dbscan package (version 1.1-11) [83]. The sole hyperparameter of the model, min-Points, was tuned by fitting 49 models with minPoints ranging from 2 to 50 and selecting the model that had the least number of dimensions (26) plus one [83,84] and minimized both the number of clusters and the points considered as noise points. We picked the value of 30 minPoints as the best model. After testing for suitability to factor analysis with the KMO test, PCoA from ape package (version 5.6-2) [85] corrected with Cailliez correction [86] based on the same distance matrix was used to visualize both the nonhierarchical clustering and the membership probability calculated from the estimated PDF.

Plant Morphometric Analysis
In total, 49 qualitative and quantitative morphological characters (Table 2) were studied, with a resulting dataset of 192 individuals × 49 variables. Macroscopic measures were taken with a digital caliper (error ± 0.1 mm) under a Leica A60 stereomicroscope, while microscopic measurements were taken through bar-scaled pictures in Fiji 2.1.0 [87]. To have a more objective way to count the number of leaf veins, free-handed transversal sections of leaves were carried out. We considered as a "vein," each fascicule composed of xylem and phloem surrounded by sclerenchyma. The anatomy of summer leaves was surveyed under a Leitz Diaplan light microscope at 40×.
We used the tidyverse package [67] for data exploration, preparation and description, while we used the nearZeroVar function from the caret [88] package (version 6.0-93) for preprocessing, removing zero and near-zero variance variables. We tested the suitability of the data for factor analysis with the Kaiser-Meyer-Olkin test (MSA = 0.89, psych (version 2.2.9) [89] and Bartlett sphericity test (p < 0.001, REdaS package (version 0.9.4) [90]. Both were performed successfully on the numeric correlate matrix. PCoA (Principal Coordinate Analysis) was performed using the pcoa function in the FD package (1.0-12.1) [91], while the Cailliez correction [86] was applied due to the violation of the triangle inequality (i.e., the matrix was not Euclidean). Lastly, 95% confidence ellipses were calculated assuming a t-distribution of individuals in a morphometric space using the function stat_ellipse of ggplot2 package (version 3.4.2) at the species level for the Canonical Variable Analysis (CVA) [67,92]. Categorical data were binary encoded using the encode_binary function from the cleandata package (version 0.3.0) [93]. To test the alternative taxonomic hypotheses, we applied the kNN algorithm from the caret package (version 6.0-93) [88]. We decided to use kNN, a non-parametric, powerful [94] supervised machine learning model, because, for some populations, a low number of individuals was available. The model was tuned using the caret package for the sole hyperparameter k applying LOOCV for a range of k values from 1 to 20 (i.e., the maximum number of individuals per population) and retaining the model that maximizes the multiclass, cross-validated, balanced accuracy value. We followed the separating or merging scheme proposed by Xiong et al. [95] in case of overlapping regions in the morphospace between classes (i.e., a priori groups to be tested) or propose new alternative hypotheses. Each character was statistically tested, comparing each species. For all the quantitative characters, a pairwise permutation test was applied (10,000 permutations for each test), while for qualitative variables, Fisher's Exact test was used. Both functions derive from the rcompanion package (version 2.4.26) [68]. We used the Benjamini & Yekutieli [96] procedure to control the family-wise error rate due to multiple testing setting the α at 0.01. We obtained a 10 × 10 matrix containing the number of characters showing statistically significant differences for each pair of taxa. Based on this matrix, we created a heatmap from the plotly package (version 2.4.26) [97]. To build the identification key, we used the combination of the most representative decision tree from the Random Forest tuned model (mtry = 5.64, 1000 trees) extracted using the ReprTree function of the reprtree package (version 0.6) [98] and the comparisons of the effect size of each character through Cohen's D and mosaic plots for qualitative variables. We selected those characters that maximized the pairwise Cohen's D values given the species or group species pairs.

Nomenclature
Information about currently accepted names, basionyms, and homotypic synonyms was obtained from the Euro+Med-Checklist [99] and Peruzzi et al. [100]. Information about the herbaria in which the original material of these names could be stored was derived from Stafleu and Cowan [56]. Accordingly, we digitally examined the following herbaria: BR, COI, G, K, L, MPU, and P (herbarium acronyms follow Thiers [101] in search of potential original material. This allowed us to typify the names A. leucocephala, A. morisii, A. multiceps, A. sardoa, Statice armeria var. pubescens, and Statice soleirolii. Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants12112229/s1, Table S1, Mean and standard deviation of the karyological parameters in Armeria taxa endemic to Sardinia and Corsica; Table S2, Results of IDL tests for all the molecular markers used in the study of Armeria taxa endemic to Sardinia and Corsica; Table S3, Mean Decrease Accuracy of Random Forest for each of the 26 seed morphological variables of Armeria taxa endemic to Sardinia and Corsica considered in the model; Table S4, List of the 26 seed morphometric features measured in Armeria taxa endemic to Sardinia and Corsica; Table S5, Descriptive statistics of the five most important seed morphometric features in Armeria taxa endemic to Sardinia and Corsica, found by Random Forest; Table S6, Confusion matrix from the tuned Random Forest model on the current taxonomic hypothesis for the seed morphometric data in Armeria taxa endemic to Sardinia and Corsica; Table S7, Confusion matrix for Armeria taxa endemic to Sardinia and Corsica, based on the current taxonomic hypothesis; Table S8, Confusion matrix of the morphometric data in Armeria taxa endemic to Sardinia and Corsica, according to the tuned kNN model under the alternative taxonomic hypothesis 1 (6 species, no subspecies); Table S9, Confusion matrix of the morphometric data in Armeria taxa endemic to Sardinia and Corsica, according to the tuned kNN model under the alternative taxonomic hypothesis 2 (five species, no subspecies); Table S10, Descriptive statistics for the continuous characters according to the new taxonomic circumscription of Armeria taxa endemic to Sardinia and Corsica; Table S11, Contingency tables of most significant categorical characters used in the identification key to Armeria taxa endemic to Sardinia and Corsica. Significance according to Fisher's exact test. Figure S1, Haploid idiograms of the 15 populations of Armeria taxa endemic to Sardinia and Corsica considered in this study; Figure S2, Phylogenetic trees of Armeria taxa endemic to Sardinia and Corsica, based on the plastidial markers; Figure S3, PCoA based on the Manhattan distance of the 26 seed morphometric features in Armeria taxa endemic to Sardinia and Corsica, colored based on the results of HDBSCAN* clustering algorithm; Figure  S4, Canonical Variate Analysis of the morphometric dataset of Armeria taxa endemic to Sardinia and Corsica, composed by numeric variables grouped based on the new taxonomic circumscription; Figure S5. PCoA illustrating the morphometric variation of the first and third axes in Armeria taxa endemic to Sardinia and Corsica, based on the Gower distance of the 49 characters measured; Figure  S6. Leaf surface of Armeria soleirolii and its papillate epidermal cells.