Rangeomorph classification schemes and intra-specific variation: are all characters created equal?

Abstract Rangeomorphs from the Ediacaran of Avalonia are among the oldest known complex macrofossils and our understanding of their ecology, ontogeny and phylogenetic relationships relies on accurate and consistent classification. There are a number of disparate classification schemes for this group, which dominantly rely on a combination of their branching characters and shape metrics. Using multivariate statistical analyses and the diverse stemmed, multifoliate rangeomorphs in Charnwood Forest (UK), we assess the taxonomic usefulness of the suite of characters currently in use. These techniques allow us to successfully discriminate taxonomic groupings without a priori assumptions or weighting of characters and to document a hitherto unrecognized level of variation within single taxonomic groups. Variation within the currently defined genus Primocandelabrum is too great to be realistically assigned to different species and may instead reflect primary character diversity, ontogenetic changes in character state or ecophenotypic variability. Its recognition cautions against generic-level diagnoses based on single differences in character state and will be crucial in understanding the mode of growth of these enigmatic organisms. Supplementary material: Data tables, definition of the characters used in the analyses, and detailed descriptions and breakdowns of methods and results are available at https://doi.org/10.6084/m9.figshare.c.3726937

Rangeomorphs were a major component of early Ediacaran macroscopic communities (c. 580-557 Ma), even dominating many of the preserved assemblages in Newfoundland, Canada and Charnwood Forest, UK (e.g. Liu et al. 2015). They are characterized by a modular frond, the architecture of which appears self-similar over at least four orders of subdivision (Narbonne 2004; Brasier et al. 2012). Despite this apparently simple organization, rangeomorphs achieved considerable morphological diversity. They exhibit variations in the number of growth poles, in the degree of insertion, inflation and organization of their branches, and in the presence or absence of a stem/stems (Brasier et al. 2012; Laflamme et al. 2012; Hoyal Cuthill & Conway Morris 2014). Of the major Ediacaran clades recognized by Erwin et al. (2011), the Rangeomorpha have received perhaps the greatest attention. Nevertheless, almost every aspect of their biology remains contentious, including feeding, growth, reproduction, internal anatomy and phylogenetic position (e.g. Liu et al. 2015; Dufour & McIlroy 2016).
The unique anatomy of rangeomorphs (Fig. 1) and their typical preservation as mere (albeit high resolution) external moulds (Narbonne 2005) has thus far confounded attempts to develop a universally accepted classification scheme. There is an urgent need for such a scheme to support increasingly sophisticated analyses of their ecological interactions, environmental tolerance and developmental biology (Darroch et al. 2013; Mitchell et al. 2015). Current taxonomic frameworks for rangeomorphs place the greatest weight on one of the two main sources of available characters: (1) categorical characters, such as branching architecture (see Fig. 2b; see Supplementary material table 1 for a full description of characters), the presence/absence of a holdfast and the number of growth poles (Brasier & Antcliffe 2009; Narbonne et al. 2009; Brasier et al. 2012; Laflamme et al. 2012; Liu et al. 2016); and (2) continuous characters (see Fig. 2a), such as shape metrics (length to width ratios) and their overall morphology (Laflamme et al. 2004; Laflamme & Narbonne 2008). Each method has particular merits and disadvantages. For example, the comparative resilience of categorical characters to tectonic overprinting (shearing and distortion) and their possible application to fragmentary fossils makes them attractive (Brasier et al. 2012), but only a few taxa are sufficiently well preserved to determine the number of branches present throughout ontogeny. Equally, although the overall morphology can differentiate forms above the species level (Hoyal Cuthill & Conway Morris 2014), shape metrics are not easily transferred between different tectonic settings. Liu et al. (2016) have argued that categorical characters are more appropriate to genus-level diagnoses and that continuous characters should be restricted to differentiating species.
This was based on the inference that the branching architecture reflects genetic controls and that shape metrics are more susceptible to environmental influences (and thus convergence), which relies on the consistency of branching between multiple specimens. Accordingly, forms that differ in even a single branching character (e.g. Vinlandia v. Beothukis) are classified as different genera, with the possibility that they reflect different species within one genus, or even variation within a species, being rejected (Brasier et al. 2012). Very few species level diagnoses note variation in branching architecture within a taxon, although local variation has been noted and attributed to either taphonomic overprint or ontogeny. Examples include: the presence of displayed branches in Avalofractus, where they are more typically rotated, as well as the pivoting of individual branches (Narbonne et al. 2009); local unfurling in Bradgatia (Brasier et al. 2012); and the reversal of branch overlap in Charnia (Wilby et al. 2015). Such variation may, potentially, be more widespread, but it has not been systematically investigated.
Phylogenetic relationships within the Rangeomorpha will remain opaque, and the metrics of species richness or diversity will have uncertain value, until the taxonomic robustness of individual characters is better known. Herein, we use multivariate statistical analysis to test the extent to which branching characters correlate with continuous morphological characters and to assess which characters are most taxonomically useful. Our analyses use the abundant and well-preserved assemblage of multifoliate, stemmed rangeomorphs present on Bed B in Charnwood Forest (Wilby et al. 2011; Fig. 1). This assemblage includes variability in many of the character states used in rangeomorph taxonomy and therefore provides an opportunity to reassess the species concept within the Rangeomorpha.
Rangeomorph terminology is reviewed in Supplementary material table 1.

Multifoliate rangeomorphs from Charnwood Forest
Stemmed, multifoliate rangeomorphs (Fig. 1) are abundantly preserved on Bed B in Charnwood Forest (Wilby et al. 2011; n = 64). They are all oriented in the same direction (σ = 15°), suggesting that they were felled and covered by the same event bed and, as such, will have experienced similar flow and burial conditions and the same tectonic processes. This substantially reduces the likelihood and magnitude of any artefacts associated with differential modes of preservation being introduced into the statistical analysis. Each specimen consists of a disc, a discrete stem and a multifoliate crown, and falls into one of three morphotypes based on general appearance. The crown is here defined as that part of a multifoliate rangeomorph that consists of the branches. Fifty-three specimens are referable to Primocandelabrum (Fig. 1b-i), a taxon first described from the Bonavista Peninsula, Newfoundland (Hofmann et al. 2008); nine specimens conform to the 'dumbbell-like taxon' of Wilby et al. (2011; Fig. 1a); and two specimens (Fig. 1j, k) are referable to the so-called 'feather dusters' from Mistaken Point, Newfoundland (Clapham & Narbonne 2002). Variation in the overall morphology and branching architecture within the Bed B Primocandelabrum sp. individuals led Wilby et al. (2011) to suggest that several taxa or forms were represented on Bed B.
Primocandelabrum is principally diagnosed by its coarse branching and the triangular outline of its crown, with P. hiemaloranum being separately distinguished on the basis of its Hiemalora-like holdfast, which has numerous sinuous to linear rays (Hofmann et al. 2008). The type material from Bonavista consists of just eight specimens, all of which are poorly preserved. Only two main branches are generally visible in this material, none of which reveals the taxon's branching architecture. As such, no formal diagnosis of the branching pattern exists by which this genus or its constituent species may be definitively identified at other localities. The Bonavista specimens are often reconstructed as a two-dimensional fan or candelabrum in life (Hofmann et al. 2008), while the single specimen assigned to Primocandelabrum from NW Canada suggests a more brush-like appearance (Narbonne et al. 2014). In the specimens from Charnwood Forest that fit the broad diagnosis of Primocandelabrum, it is clear that the crown consists of three first order branches that split from the main stem at a single point (Fig. 1b-i). Each of these first order branches is referred to here as a folium, plural folia. All the Primocandelabrum sp. from Charnwood Forest have rotated and unfurled first order branches, but differ in their other branching characters. We note that none of the Primocandelabrum specimens from Bed B are referable to Primocandelabrum hiemaloranum, although an isolated disc with rays is known from nearby Bradgate Park (Wilby et al. 2011).
In contrast with the Primocandelabrum sp. spectrum on Bed B, the dumbbell has a crown that is circular in outline and consists of upwards of five first order branches, which are curvilinear in outline and are entirely concealed along their lengths; a proportionally longer stem and large holdfast disc (Fig. 1a); and displayed first order branches.
The Mistaken Point feather dusters are a poorly defined, bucket group for small, brush-like fronds, which have numerous (more than three) first order branches emanating from a shared point at the top of the stem and a small, bulb-shaped holdfast (Clapham & Narbonne 2002; Mason & Narbonne 2016; Fig. 1j, k). Two of the >200 known specimens of feather dusters from the Mistaken Point E surface have been described as Plumeropriscum hofmanni, but the remainder still lack systematic description (Mason & Narbonne 2016). The relationship of feather dusters to Primocandelabrum remains uncertain, although the presence of more than three folia in Plumeropriscum is considered to be diagnostic at the generic level (Mason & Narbonne 2016). The gross morphology, number of branches and overall appearance of Plumeropriscum are superficially similar to those of the two feather dusters on Bed B. The Bed B feather dusters are distinct from all specimens of the dumbbells, even those of comparable size: they have smaller holdfasts, proportionally shorter stems, a crown that is more deltoid in shape and branches that appear more tightly constrained. However, the branching architecture of Plumeropriscum has not been described and cannot be determined with confidence from the published photographs. The inclusion of Plumeropriscum in this study would require additional data pretreatment (retro-deformation), which could potentially impart bias to the analyses, and as such should be treated separately.

Statistical techniques
Taxonomy has sporadically made use of statistical and computational approaches to deal with large datasets. The techniques used most commonly include dimensionality reduction analyses, such as principal components analysis and multiple correspondence analysis (Lajus et al. 2015), as well as clustering algorithms (Jardine & Sibson 1971; Dunn & Everitt 2004; Bonder et al. 2012; Pagni et al. 2013). The approach adopted in the present study is to combine clustering methods with multivariate analytical techniques. Various iterations of the analyses were run to determine the extent of influence of outliers and particular characters on the resultant groupings. All tests were run in the R statistical package, version 3.1 (R Core Team 2014). The methods and iterations run are discussed in the Supplementary material.
The main advantage of the statistical techniques outlined here is that they allow the characterization of clusters of specimens based on all characters and on subsets of characters, and that they weight all characters equally. They also provide a way of testing the correlation of continuous and categorical characters and of identifying statistically meaningful groups within a population. If categorical characters correlate with each other, and with the continuous characters, then the groups can be considered to be morphologically distinct and are therefore more likely to be taxonomically meaningful than ones where the characters vary essentially randomly.

Materials and methods
Jesmonite resin casts were made of 64 specimens of stemmed, multifoliate taxa from Bed B in Charnwood Forest. The original moulds taken from the bedding surface and the corresponding casts are held at the British Geological Survey, Keyworth, Nottingham, UK.
Tests were run on all specimens that fitted into the broad 'stemmed, multifoliate' morphospace for which both categorical and continuous characters could be recorded. Casts were studied under controlled lighting conditions using a microscope equipped with a camera lucida. Continuous characters were measured by hand to the nearest millimetre using a ruler (Fig. 2a). First and second order branching characters (Fig. 2b) can be discerned in 48 specimens (40 Primocandelabrum sp., six dumbbells and two feather dusters). These characters were diagnosed and recorded for each of these specimens, including any variation within a specimen (Supplementary material table 2). The identification of branching characters probably represents the largest source of primary error due to taphonomic overprint and complexities arising from the compound nature of the fossils. To minimize these biases, only those characters that were consistent, or that varied consistently, across the specimen were included in the statistical analyses. Third-order and higher characters can only be discerned in a few individuals and so were not included in the analyses. Characters for which states could not confidently be determined were left blank to minimize any errors introduced due to misidentification. All the descriptive characters proposed by Brasier et al. (2012) were used in these analyses, in conjunction with the 'constrained/unconstrained' character of Narbonne (2004). Although some of these characters may not be taxonomically meaningful, the effect of the removal of any character from the analyses would be to impart a subjective bias to the analyses. As dumbbells can be confidently distinguished from Primocandelabrum sp. based on gross morphology, they are used to assess the validity of the statistical tests, although no assumption is made regarding the taxonomic rank of this difference.

Data pre-treatment
The data were not retro-deformed prior to analysis due to a lack of independent strain markers on Bed B, meaning that holdfasts are the only structures available for strain analysis. For all multifoliate specimens, the correlation between disc length and disc width is even across the surface (R² = 0.9915). Together with the fact that the holdfasts exhibit little variation in ellipticity (σ = 2.6%, Wilby et al. 2015), this indicates that tectonic shear is even across the surface, that it operated parallel to the long axis of the fossils and that it affected all specimens evenly. If the holdfasts were originally circular (the common assumption in the retro-deformation of Ediacaran fossils; Wood et al. 2003), then all the fossils have been shortened to 77% of their original height (i.e. a compaction of 23%) parallel to their long axis. Although this will have affected the absolute proportions, it will have done so equally for all the studied fossils and so the relative proportions may be robustly compared.
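The two quantities used in this argument can be illustrated with a minimal sketch. It assumes that each holdfast is described by a length (parallel to the long axis of the fossil) and a width (perpendicular to it), that the discs were originally circular and that the width was unaffected by deformation; the last assumption is not stated in the text. The measurements and function names are hypothetical and the code is in Python rather than the R package used for the analyses.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def mean_shortening(lengths, widths):
    """Mean length/width ratio of the discs.

    Under the assumption that discs were circular and that only the
    length was shortened, this ratio estimates the proportion of the
    original height that is preserved.
    """
    return mean(l / w for l, w in zip(lengths, widths))

# Hypothetical disc measurements (mm), not data from Bed B.
disc_widths = [10.0, 20.0, 30.0, 40.0]
disc_lengths = [7.7, 15.4, 23.1, 30.8]

fit = r_squared(disc_lengths, disc_widths)
preserved = mean_shortening(disc_lengths, disc_widths)
compaction = 1.0 - preserved
```

A high value of `fit` combined with a low spread in the individual ratios is what would justify treating the strain as uniform across the surface.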
Several aspects of the data make it challenging to work with, requiring various data pre-treatments. The variance of each continuous character was investigated using the makeProfilePlot function (Coghlan 2014; Fig. 3). The total size of the individual is a source of great variance in the data and necessarily affects the absolute values of all other continuous characters. Accordingly, all the continuous characters were divided by total height (taken from the top of the crown to the base of the stem, Fig. 2a) to standardize them. As only well-preserved specimens with clear distal margins were used in this study, we consider that this step does not bias the analyses. As categorical characters (here, branching architecture) are either binary or consist of a small number of discrete categories, they have a proportionately large variance compared with the continuous characters. The method used here scales the data as a part of its algorithm, scaling all values to unit variance so that all characters influence the results equally. The majority of the fossils included in this study are incompletely preserved, or include characters that cannot be determined with confidence and are therefore left blank. This necessarily results in a dataset that has a relatively large number of missing values, which is problematic for any statistical test. The missing values were imputed following guidelines for best practice.
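The two pre-treatment steps described above (division by total height, then scaling to unit variance) can be sketched as follows. This is an illustration only: the character names and values are hypothetical, missing values are assumed to have been imputed already, and the scaling is written out explicitly here whereas in the study it is carried out internally by the analytical method.

```python
from statistics import mean, pstdev

def standardise_by_height(specimen, height_key="total_height"):
    """Express every continuous character as a proportion of total height."""
    height = specimen[height_key]
    return {k: v / height for k, v in specimen.items() if k != height_key}

def scale_to_unit_variance(rows):
    """Centre each character on zero and scale it to unit variance."""
    scaled = [dict() for _ in rows]
    for key in rows[0]:
        column = [row[key] for row in rows]
        centre, spread = mean(column), pstdev(column)
        for out, value in zip(scaled, column):
            # A character with no variance carries no information.
            out[key] = (value - centre) / spread if spread > 0 else 0.0
    return scaled

# Hypothetical measurements (mm) for three specimens.
specimens = [
    {"total_height": 100.0, "crown_width": 40.0, "stem_length": 30.0},
    {"total_height": 50.0, "crown_width": 25.0, "stem_length": 10.0},
    {"total_height": 200.0, "crown_width": 60.0, "stem_length": 80.0},
]

proportions = [standardise_by_height(s) for s in specimens]
scaled = scale_to_unit_variance(proportions)
```

After the first step the largest specimen no longer dominates the raw values, and after the second each character has the same variance and therefore the same potential influence on the result.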

Character selection
Some of the continuous characters are strongly correlated with one another. For instance, the left and right branches of Primocandelabrum are the same length in most specimens, and their lengths also correlate with the height and/or width of the crown (depending on whether it is taller than it is wide). Similarly, the width of the stem measured at any point along its length will tend to be correlated with the width at any other point along the stem; those specimens with narrower stems tend to be narrower along their entire length. To avoid any bias towards characters that are themselves strongly correlated due to the double-correlation of the analyses (Dillon & Goldstein 1984), most analyses were run on a reduced character matrix using only those characters whose proportions do not innately co-vary with one another.
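One way to screen for such redundancy is sketched below. It is not the procedure used to build the reduced character matrix, which was selected on the basis of which proportions innately co-vary; the greedy filter, the threshold of 0.9 and the example values are all assumptions made for illustration.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def reduce_characters(table, threshold=0.9):
    """Keep a character only if it is not strongly correlated
    (|r| >= threshold) with any character already kept."""
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical standardized measurements for five specimens.
characters = {
    "branch_left": [10, 12, 15, 20, 22],
    "branch_right": [10, 13, 15, 19, 22],
    "stem_width": [3, 2, 4, 2, 3],
}

reduced = reduce_characters(characters)
```

In this example the right branch length is dropped because it duplicates the information in the left branch length, whereas stem width is retained. The outcome of a greedy filter depends on the order of the characters, which is one reason to prefer a selection grounded in anatomy.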

Outlying individuals
The dataset includes organisms that are significantly larger than the main population, but lacks ontogenetic intermediates between the two size groups (cf. Wilby et al. 2015). Accordingly, any changes in morphology or mode of life attributable to ontogeny cannot be constrained by the present dataset: if any character differences are ontogenetic, the lack of specimens recording intermediate stages precludes confirmation of such patterns. The presence of such outlying individuals can influence the results of each of the described methods (cf. Dillon & Goldstein 1984) and so several iterations of the data were run to investigate the extent of their influence.

Analyses
Principal components analysis was run on continuous characters; multiple factorial analysis was run on categorical characters and factorial analysis of mixed data was run on the mixed datasets, i.e. those with both categorical and continuous characters. Hierarchical clustering (HCPC) was performed on the results of the principal components analysis, multiple factorial analysis and factorial analysis of mixed data analyses, producing hierarchical dendrograms (Husson et al. 2010). The influence of the iteration on the results was examined by determining the percentage of individuals that were assigned to the same group for each pair of iterations. The number of clusters was chosen based on an analysis of inertia gain, a measure of the within-group variance (plotted as a histogram of variance v. number of clusters; see insets, Figs 4-6). The greatest jump in inertia gain (i.e. the greatest decrease in within-group variance) is taken as the best node at which to cut the dendrogram into clusters. For each cluster, three factors describe the success of the cluster discrimination: (1) the percentage assignment of individuals displaying a particular character state to a cluster characterized by that character state (where 100% indicates that all individuals which display that character are placed in the cluster); (2) the percentage of individuals within a cluster that display a given character state used to describe that cluster (where 100% indicates that all individuals in the cluster display that character); and (3) the average value for a continuous character within a cluster compared with the average for all specimens (Tables 1 & 2).
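The first two descriptive factors listed above are simple proportions and can be illustrated with a short sketch. The cluster assignments and character states below are hypothetical, missing states are assumed to have been imputed beforehand, and the function name is introduced only for this example.

```python
def describe_state(clusters, states, cluster_id, state):
    """Return the two percentages used to describe a cluster by a character state.

    (1) the percentage of all individuals showing `state` that fall in the cluster;
    (2) the percentage of individuals in the cluster that show `state`.
    """
    in_cluster = sum(1 for c in clusters if c == cluster_id)
    with_state = sum(1 for s in states if s == state)
    both = sum(1 for c, s in zip(clusters, states)
               if c == cluster_id and s == state)
    pct_of_state = 100.0 * both / with_state if with_state else 0.0
    pct_of_cluster = 100.0 * both / in_cluster if in_cluster else 0.0
    return pct_of_state, pct_of_cluster

# Hypothetical assignments for six specimens.
cluster_of = [1, 1, 2, 2, 2, 3]
second_order = ["furled", "furled", "furled",
                "unfurled", "unfurled", "unfurled"]

assigned, displayed = describe_state(cluster_of, second_order, 1, "furled")
```

Here every individual in cluster 1 is furled (factor 2 is 100%), but one furled individual lies in another cluster, so factor 1 is only about 67%. A character state discriminates a cluster completely only when both values are 100%.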

Results
The results of the analyses including dumbbells are presented in Figure 4, whereas the analyses run on Primocandelabrum only are presented in Figure 5 and on Primocandelabrum and feather dusters in Figure 6.
Fig. 3. Profile plots: graphic representations of the variance within each continuous character. Vectori is a measure of the variance and Index refers to the number of the specimen (in random order). (a) All specimens, all continuous characters, data not scaled to unit variance (see Supplementary material 1.2). The two outsize specimens are at the right-hand edge of the plot. The greatest variance is within crown width and disc width, length left and length right. (b) As (a), but on the reduced character matrix; the difference in variance between crown width and disc width compared with the other characters is more pronounced. (c) As (b), but scaled to unit variance. (d) All Primocandelabrum and feather dusters, including outsize specimen BB (right-most data points).
Fig. 4. Results of the cluster analysis (HCPC) on the dataset, including all individuals for which categorical and continuous characters could be determined. All values were standardized to total height. (a) Cluster dendrogram and (b) factor map for analysis on continuous characters (reduced character matrix) combined with categorical characters; (c) cluster dendrogram and (d) factor map for categorical characters only; (e) cluster dendrogram and (f) factor map for morphological characters (reduced character matrix) only. In (a), inertia gain supports division into two or three clusters; in (c) it supports division into two or six clusters and in (e) it supports two or three clusters. Dumbbells plot separately in (a), (b), (c) and (d), but with other specimens in (e) and (f). Schematic diagrams describe the clusters that match their colour (see online version for colour): where a categorical character trait was not significant in describing the cluster, the most common state for the cluster was used; where more than one state is common in the group, both states are depicted. Where a continuous character did not significantly describe the cluster, average values for the population were used.
Fig. 5. [...] Inertia gain plots support division into four clusters in (a), two, three or six clusters in (c) and two or three clusters in (e). Schematic diagrams describe the clusters that match their colour (see online version for colour): where a categorical character trait was not significant in describing the cluster, the most common state for the cluster was used, and where more than one state is common in the group, both states are depicted. Where a continuous character did not significantly describe the cluster, average values for the population were used.
Fig. 6. Results of the cluster analysis (HCPC) on the dataset including all individuals for which categorical and continuous characters could be determined, excluding dumbbells and using the reduced character matrix. All values were standardized to total height. (a) Cluster dendrogram and (b) factor map for analysis on continuous and categorical characters; (c) cluster dendrogram and (d) factor map for categorical characters only; (e) cluster dendrogram and (f) factor map for continuous characters only. Inertia gain plots support division into two or three clusters in (a), two, three or six clusters in (c) and two, three or five clusters in (e). Schematic diagrams describe the clusters that match their colour (see online version for colour): where a categorical character trait was not significant in describing the cluster, the most common state for the cluster was used, and where more than one state is common in the group, both states are depicted. Where a continuous character did not significantly describe the cluster, average values for the population were used.

Summary of results
All tests discriminated dumbbells to some degree. In fewer than 10% of the iterations run, one or two Primocandelabrum or feather duster specimens are included in the dumbbell cluster, or the smallest two dumbbells are placed in a different cluster to the majority of that group. Even then, the dumbbells all plot close together in the principal components space. The separation of dumbbells from Primocandelabrum is further supported by profile plots of variance (Fig. 7a, b) and bivariate plots (Fig. 7c-f). The morphological and branching characters of the feather dusters place one specimen (8C1, Fig. 1k) most frequently with the dumbbells and one (8C3, Fig. 1j) within the Primocandelabrum clusters.
The clustering consistently supports the placement of Primocandelabrum individuals into two or three clusters by analysis of inertia gain (inset, Figs 4-6) that are distinct from dumbbells. The profile plots of the variance within the data show a smoother, more consistent correlation of continuous traits when sorted into three groups as determined by cluster analysis than when the individuals are unsorted (Fig. 7a, b). However, within Primocandelabrum, there is considerable variation in the composition of each cluster (both individuals placed into the cluster and the character descriptions of the cluster) depending on the iteration. The lowest percentage match between iterations resulted from analyses using only categorical characters. Primocandelabrum clusters are defined by at most one character state with 100% inclusion or exclusion of the individuals that display a particular character state (coded blue or green in Table 2b). The other character states defining the cluster also describe other clusters in the iteration; that is, individuals in multiple clusters share characters. Within Primocandelabrum, no iteration produces a set of clusters within which all individuals are identical in terms of their branching and distinct from other clusters (without dividing the individuals into 15 clusters). Importantly for this study, continuous and categorical characters do not correlate perfectly within Primocandelabrum as they do for dumbbells, as is evident from the lower percentage of group matches (Tables 3 & 4). It is not clear at this point whether this reflects the greater number of specimens within Primocandelabrum or that the dumbbells innately show less variability. The outsize specimen BB plots as an outlier to the bulk population in the principal components space, especially for morphological characters only (Fig. 6e, f).
The continuous characters that significantly contribute to the construction of the dimensions are similar in each iteration, but the categorical characters significantly contributing to the dimensions are more variable. For the reduced character matrix, the percentage of variance described by each dimension was comparable to that for the full character matrix, but slightly higher when the continuous characters only were considered (55% for the reduced and 27% for the full matrix, compared with around 40% for the reduced and 20% for the full matrix using only continuous characters). The influence of iteration on these aspects is presented in the following sections, focusing on the assignment of individuals within the group assigned to Primocandelabrum. This is quantified for the hierarchical cluster analyses as the percentage of individuals that are assigned to the same group for each pair of iterations. Significance refers to the 95% confidence level (i.e. p < 0.05) throughout. The utility of particular characters in defining clusters is summarized in Table 5.
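The percentage match between a pair of iterations can be sketched as below. Because cluster numbers are arbitrary within each iteration, the sketch first finds the one-to-one relabelling that maximizes agreement; this alignment step is an assumption of the example (in the study the clusters were compared on the basis of their character descriptions), as are the specimen identifiers and assignments shown.

```python
from itertools import permutations

def percent_match(iter_a, iter_b):
    """Percentage of shared specimens placed in the same group by two
    iterations, after aligning cluster labels by the best one-to-one mapping.
    Exhaustive search over mappings, so only suitable for a few clusters."""
    shared = sorted(set(iter_a) & set(iter_b))
    if not shared:
        raise ValueError("the two iterations share no specimens")
    labels_a = sorted({iter_a[s] for s in shared})
    labels_b = sorted({iter_b[s] for s in shared})
    # Pad the shorter list so that every label can be mapped one-to-one.
    size = max(len(labels_a), len(labels_b))
    labels_a += [None] * (size - len(labels_a))
    labels_b += [None] * (size - len(labels_b))
    best = 0
    for perm in permutations(labels_b):
        mapping = dict(zip(labels_a, perm))
        hits = sum(1 for s in shared if mapping[iter_a[s]] == iter_b[s])
        best = max(best, hits)
    return 100.0 * best / len(shared)

# Hypothetical cluster assignments from two iterations.
full_matrix = {"sp1": 1, "sp2": 1, "sp3": 2, "sp4": 2, "sp5": 3}
reduced_matrix = {"sp1": 2, "sp2": 2, "sp3": 1, "sp4": 3, "sp5": 3}

match = percent_match(full_matrix, reduced_matrix)
```

In this example four of the five specimens keep the same companions once the labels are aligned, giving a match of 80%.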

Hierarchical clustering
In the following descriptions, all descriptors of continuous characters are proportional compared with the mean values of all individuals. Individual characters that are not mentioned in the descriptions do not significantly describe the cluster: the continuous characters for the cluster are not significantly different from the mean values for the group, and the categorical characters are not significantly different between groups.
All iterations of the analyses using dumbbells show the greatest increase in inertia gain in moving from two to three clusters (Fig. 4a, c, e). In most iterations, the next greatest increase in inertia gain is in moving from three to four clusters (Fig. 4a, e), but for branching characters only, the next greatest increase was between six and seven clusters.
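The reasoning behind the choice of cut can be illustrated with a small sketch. It assumes that the within-group inertia has been computed for successive partitions of the same dendrogram and takes the inertia gain at k clusters to be the drop in within-group inertia on moving from k - 1 to k clusters. The coordinates, partitions and the `min_clusters` parameter are hypothetical, and the criterion implemented by the HCPC routine itself may differ in detail.

```python
def within_inertia(points, labels):
    """Sum of squared distances of each point to the centroid of its cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum((x - c) ** 2
                     for point in members
                     for x, c in zip(point, centroid))
    return total

def best_cut(inertia_by_k, min_clusters=2):
    """Return the number of clusters with the largest inertia gain,
    together with the gain for every candidate number of clusters."""
    gains = {k: inertia_by_k[k - 1] - inertia_by_k[k]
             for k in inertia_by_k
             if k - 1 in inertia_by_k and k >= min_clusters}
    return max(gains, key=gains.get), gains

# Hypothetical factor-map coordinates and nested partitions.
coords = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
partitions = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 1, 2, 2, 3, 3],
}

inertia = {k: within_inertia(coords, labels) for k, labels in partitions.items()}
k_best, gains = best_cut(inertia, min_clusters=3)
```

With these values the gain at three clusters (25) far exceeds that at four (0.5), so the dendrogram would be cut into three groups. The first split usually gives the largest gain of all, which is why a minimum number of clusters is imposed in this sketch.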
For most analyses where dumbbells were included, the clusters can be characterized as
Table notes (colour coding and abbreviations). Pink, average value for cluster larger than average value for all clusters; orange, average value for cluster larger than average value for all clusters, but smaller than the average value for another cluster; blue, average value for cluster smaller than average value for all clusters (see online version for colour). Blue, >75% for all metrics of cluster assignment; green, <75% for one metric, but either 100 or 0% of individuals with character state are included in cluster (i.e. total inclusion/exclusion); yellow, >75% for all bar one metric, but total inclusion/exclusion of individuals with a character state within/from cluster; orange, all metrics 50% < x < 75% (see online version for colour). br, categorical characters only; mo, continuous characters only analysed; nbb, outsize specimens removed; nd nbb, outsize specimens and feather dusters removed; red, reduced character matrix.
Table notes (comparison of iterations). Primocandelabrum cluster 1 (corresponding to P. aethelflaedia) are characterized by long stems, Primocandelabrum cluster 2 (corresponding to P. boyntoni) are characterized by short stems and unconcealed folia and Primocandelabrum cluster 3 (corresponding to P. aelfwynnia) is characterized by wide crowns and large discs. Only individuals for which continuous and categorical characters could be determined were included in each iteration. Analyses excluding dumbbells (po) were divided into two groups to allow comparison with those analyses including dumbbells (which typically separates into its own group, with the other two groups consisting of Primocandelabrum and feather dusters). All iterations were divided into three clusters, as this was supported by inertia gain and by cluster descriptions for most iterations. br, categorical characters only; mo, continuous characters only analysed; nbb, outsize specimen BB excluded; nbbnd, BB and 'feather dusters' excluded; nbo, outsize specimens removed; po, 'dumbbells' excluded; red, reduced character matrix.
For most analyses where dumbbells were excluded, three clusters can be characterized as follows: Cluster 1 (Figs 5-7, Table 2) is typified by: a long and narrow stem, large disc and small crown; first order branches concealed with median inflation; and second order branches with median inflation. Cluster 2 (Figs 5-7, Table 2) specimens have a large crown, slightly small disc and short, slightly narrow stem (all approximating the average values for each variable); first order branches show proximal inflation. Cluster 3 (Figs 5-7, Table 2) is characterized by: a large disc, a wide crown and a wide stem; unconcealed first and second order branches with either distal or proximal inflation; and furled second order branches.

Discrimination of dumbbells. For all tests where they are included, dumbbells plot away from the main group of Primocandelabrum specimens and comprise the second group when specimens are forced into two clusters (Fig. 4). Iterations including only branching characters place one Primocandelabrum specimen (6A1) into the cluster with dumbbells (Fig. 4), whereas iterations using only continuous characters place 4A1b (right), 13A4 and BB (when included) into the dumbbells group (Fig. 4e, f). Two dumbbells (3D7 and 5A4) do not cluster with the others in the iterations including only branching characters, placing in cluster 1 when outsize specimens are included and cluster 2 when outsize specimens are excluded. This is not due to the quantity of missing data, as other individuals with the same number of missing values consistently plot with the rest of the dumbbell specimens, but is rather due to the fact that 5A4 has unfurled second order branches and is the smallest specimen and 3D7 is missing the furled second order character, which is a significant variable in defining the clusters in this analysis (Table 2). 5A4 and 3D7 also have tall crowns and short stems compared with the rest of the dumbbells, which place them closer to the Primocandelabrum proportions for these characters.

Table 5. (b) Categorical characters
The rankings for the numbers of clusters defined and for how well clusters are defined are in descending order, where 1 = most useful character, and are based on the characters colour coded in Tables 1 and 2. Green, ranked 1; yellow, ranked 2 or 3; red, ranked bottom two; blue, character state unique to a taxon (see online version for colour).
Assignment of feather dusters. When dumbbells are included in the tests, and when only continuous characters are used, 8C1 clusters most frequently with dumbbells and 8C3 clusters most frequently with Primocandelabrum cluster 2. However, when categorical characters and outsize specimens are included, both 8C1 and 8C3 place in Primocandelabrum cluster 2. When dumbbells are excluded, the feather dusters place in very different parts of the factor map and far from each other on the trees (Fig. 6).
Removing outsize specimens. For the iterations including dumbbells, removing the outsize specimen BB only affected the number of clusters for one iteration, i.e. when only categorical characters were used. For this iteration, six clusters were supported when outsize specimens were included, but five clusters were supported when outsize specimens were excluded. There was a 90-100% (Table 3) match with the group assignment of specimens for all tests except those iterations including dumbbells and including all characters (88%; Table 3) and those using categorical characters only (62%; Table 3). The latter test also showed an appreciable difference between characters significantly correlated to cluster 1 in particular, with more variables significantly correlated to this cluster when outsize specimens are included (Table 3).
Assignment of Primocandelabrum specimens. Primocandelabrum specimens + feather dusters are consistently distinguished from dumbbells in almost all iterations of the data, but the number and composition of clusters within Primocandelabrum are more variable. The variation in the definition of each cluster is summarized in Tables 1 and 2. When iterations including and excluding dumbbells are compared, either including outsize specimens or using only morphological characters results in a very low match in the cluster assignment of Primocandelabrum specimens (around 50%; Tables 3 & 4). When either all characters or only branching characters are considered, and when outsize specimens are excluded, the match increases to 74 and 81%, respectively (Tables 3 & 4). When individuals are divided into three clusters and the outsize specimen BB is excluded, there is a 92% match between iterations including and excluding feather dusters when all characters are considered, 82% when only continuous characters are considered and 84% when only categorical characters are considered (Table 4).
When the Primocandelabrum specimens are divided into three or more clusters, disc size becomes a significant contributing factor to cluster definition (Table 2a). When the data are split into two clusters, disc size is only significant if only continuous characters are used, or if the analyses are run on the reduced character matrix (Table 2). Three clusters are most strongly supported by inertia gain for most analyses (Figs 5 & 6). In addition, when individuals are divided into three rather than two clusters, a higher percentage of individuals that displayed a given character state are placed in the cluster described by that character state, and each cluster contains a higher percentage of individuals displaying the character states by which that cluster is described (Table 2). When divided into more than three clusters, there is a similar or greater percentage of cluster and character match, but only one or two characters significantly describe each cluster. Only when the specimens are divided into 15 clusters do the individual clusters have a suite of categorical characters that are unique to that group.
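The use of inertia gain to choose the number of clusters can be illustrated with a minimal sketch. It assumes Ward's criterion for the agglomeration and uses invented two-dimensional coordinates in place of principal component scores; the function names and data are hypothetical and this is not the software or data used in the analyses reported here.

```python
from itertools import combinations

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster inertia if clusters a and b were merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2

def inertia_gains(points):
    """Agglomerate with Ward's criterion (assumed here) and return {k: gain},
    where gain is the within-cluster inertia removed by cutting the tree
    into k clusters rather than k - 1."""
    clusters = [[p] for p in points]
    gains = {}
    while len(clusters) > 1:
        # Find the pair of clusters whose merger is cheapest.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        gains[len(clusters)] = ward_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return gains

# Hypothetical scores forming three tight groups (not data from this study).
toy_scores = [(0, 0), (1, 0), (0, 1),
              (10, 0), (11, 0), (10, 1),
              (0, 10), (1, 10), (0, 11)]
gains = inertia_gains(toy_scores)
for k in sorted(gains):
    print(k, round(gains[k], 3))
```

In this toy case the gain is large for the splits into two and three clusters and drops sharply for a fourth, which is the pattern that would support a three-cluster division.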
All characters v. continuous characters only v. categorical characters only. There is a match between all characters and categorical characters of only 55% when outsize specimens are included, 60% when outsize specimens are excluded and 49% when outsize specimens and feather dusters are excluded. Comparing all characters and continuous characters only, there is a match of 88% when outsize specimens are included, 81% when outsize specimens are excluded and 87% when outsize specimens and feather dusters are excluded. Comparing continuous and categorical characters only, there is a match of 57% when outsize specimens are included, of 60% when outsize specimens are excluded and of 42% when outsize specimens and feather dusters are excluded (Table 4).
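One way in which a percentage match between the cluster assignments of two iterations could be computed is sketched below. Because cluster numbers are arbitrary labels, the sketch tries every relabelling of one iteration and keeps the best agreement; this is an assumption about how such a match might be calculated, not a statement of the procedure behind Tables 3 and 4. The function name, specimen identifiers and assignments are hypothetical.

```python
from itertools import permutations

def match_percentage(labels_a, labels_b):
    """Percentage of shared specimens placed in equivalent clusters by two
    iterations, maximized over all relabellings of the second iteration."""
    shared = sorted(set(labels_a) & set(labels_b))
    ids_a = sorted({labels_a[s] for s in shared})
    ids_b = sorted({labels_b[s] for s in shared})
    # Pad with None if the second iteration has more clusters than the first.
    ids_a += [None] * (len(ids_b) - len(ids_a))
    best = 0
    for perm in permutations(ids_a, len(ids_b)):
        relabel = dict(zip(ids_b, perm))
        hits = sum(labels_a[s] == relabel[labels_b[s]] for s in shared)
        best = max(best, hits)
    return 100.0 * best / len(shared)

# Hypothetical assignments from two iterations (not data from this study).
iteration_all = {"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 3}
iteration_categorical = {"s1": 2, "s2": 2, "s3": 1, "s4": 3, "s5": 3}
print(match_percentage(iteration_all, iteration_categorical))
```

Exhaustive relabelling is only practical for the small numbers of clusters considered here; specimens missing from either iteration are ignored.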
Assessment of cluster assignment. When dumbbells are included, the within-cluster averages for the dumbbell group are significantly different to the within-cluster averages for the two Primocandelabrum (+feather duster) clusters. There is also a greater discrimination of the dumbbells from Primocandelabrum individuals than for groups within Primocandelabrum based on categorical characters, evidenced by more characters with 100% inclusion or exclusion (i.e. 0% inclusion) of individuals into the dumbbell cluster than into either of the Primocandelabrum clusters (Tables 1 & 2).
When individuals are sorted into the clusters to which they are assigned by the majority of iterations, the total variance within clusters is easily visualized through profile plots (Fig. 7a, b). Variance within clusters determined through the hierarchical cluster analysis is lower than it is for all individuals treated as a whole (Fig. 3, evidenced by smoother trend lines in Fig. 7). When bivariate character plots are constructed with individuals coloured according to their cluster number (Fig. 7c-f), the clusters follow distinct trend lines for those characters shown to provide the strongest discrimination of each cluster (Tables 1 & 2), but do not form clear trend lines for those characters that provide weak discrimination of the clusters. These plots therefore support the conclusions drawn from the results of the cluster analyses.

Discussion
The argument that generic level diagnoses should be based on categorical characters, such as branching architecture, and that continuous characters should be used for species level diagnoses rests on the inference that categorical characters dominantly reflect a genetic control, whereas continuous characters, such as stem length, are more susceptible to environmental influences and, potentially, to ecophenotypic variability and convergence (Liu et al. 2016). If aspects of branching architecture can be influenced or modified by the environment to the same extent as continuous characters, then this inference would break down.

Potential functional significance of rangeomorph branching
The furling of branches is observed in many rangeomorph taxa (Brasier et al. 2012) and is seemingly at odds with the presumed function of rangeomorph elements as exchange surfaces, as it decreases the surface area exposed to the water column (Liu et al. 2015). However, there are several conceivable advantages that might have been bestowed on a rangeomorph by the possession of furled branches, particularly for higher order branches. First, it allows for a tighter packing of branches and minimizes the overlap of rangeomorph elements. Second, it may have served to protect the tips (a likely site of growth; Antcliffe & Brasier 2007) from abrasion by neighbouring branches and from environmental damage. Third, the flow of water within a furl might be slowed, increasing its contact time with the rangeomorph surface. On a larger scale, the surface texture of a row of adjacent furled branches may help to break up the boundary layer, resulting in a faster flow over the surface (as for the riblets on shark skin; Dean & Bhushan 2010). By contrast, unfurled rangeomorph elements would increase the surface roughness of the rangeomorph exchange surface, reducing the thickness of the diffusive boundary layer over the surface and enhancing the uptake of nutrients and/or the removal of waste products. Therefore, if these characters are functionally important, it might be expected that a trade-off between increased surface roughness and damage prevention would determine whether a rangeomorph adopted a furled or unfurled branching architecture.
The functional advantage of certain aspects of branching, such as proximal v. median v. distal inflation (Brasier et al. 2012), is not readily apparent. However, maintaining a consistent mode of inflation across different levels of branching may have served to reduce self-shading (cf. Enríquez & Pantoja-Reyes 2005) and could therefore have been favoured in slower flow or lower nutrient environments. Other aspects of branching, such as rotated v. displayed branches and concealed v. unconcealed axes, may be implicated in the efficiency of branch packing and consequently have other functional constraints. Traits that provide an advantage through their functionality are more likely to be converged upon and therefore have the potential to skew phylogenies where taxa lack a large number of apomorphies. Given the current uncertainties regarding the functional constraints on branching architecture, and the consequent possibilities of convergence, it is perhaps premature to place greater phylogenetic relevance on some categorical characters over others, and over continuous characters.

Influence of taphonomy
Any analysis of fossil morphology must take into account the biases that may be imposed by taphonomic and post-fossilization processes (Matthews et al. 2017). Taphonomic interference may include the local rotation of branches (Narbonne et al. 2009), the deflation of individual branches (Brasier et al. 2013), as well as the displacement of branch order of overlap for unconstrained branches. The fact that only branching characters consistent across a given specimen were included in the analyses presented here should minimize the influence of taphonomy on the branching characters in this study.
The overall morphology of a rangeomorph (i.e. continuous characters) is potentially susceptible to several sources of taphonomic overprint. For example, the preserved size of the holdfast and the number of concentric rings may be influenced by burial depth in the sediment. Stem width may be obscured by sediment settling beneath the stem during felling (cf. Laflamme et al. 2007). The shape of the crowns of Primocandelabrum and the dumbbells are inferred to have been three-dimensional in life, but are preserved as compound, two-dimensional impressions. The preserved morphology was also likely to have been influenced during felling, depending on the stiffness and rheology of the branches. The preservation of only two main branches in the Primocandelabrum specimens from Newfoundland is potentially due to the failure of a third branch to be expressed in the fossil, perhaps held above the plane of preservation. By that token, it is possible that Primocandelabrum specimens from elsewhere, and indeed multifoliate taxa in general, may have had more branches than are preserved.

Validity of the statistical approach
The group of individuals assigned to dumbbells can readily be distinguished from Primocandelabrum individuals using a number of traditional taxonomic methods and so provide a good means of validating the approach used here. The dumbbell cluster is consistently discriminated from Primocandelabrum based on categorical and continuous characters in isolation and in tandem. That the dumbbell cluster is discriminated for all of these iterations indicates that categorical character states correlate consistently with particular values for the continuous characters, at least at this taxonomic level. Crucially, for the categorical characters that statistically significantly describe the dumbbell cluster, a high proportion of the individuals within that cluster display a given character state and a high percentage of the individuals that display that character state are assigned to the cluster (Table 3). In addition, the mean values of the continuous characters for dumbbells are statistically significantly different from the means of the continuous characters describing the Primocandelabrum clusters and are more different from the means of the Primocandelabrum clusters than are the means within the Primocandelabrum clusters. This is strong support for these individuals representing a distinct and separate group and indicates that the statistical approach described here is useful in discriminating taxonomic groups in mixed populations. It also suggests that both categorical and continuous characters are, at some level, taxonomically useful.
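The two percentages used above to judge how well a categorical character state describes a cluster can be made explicit with a short sketch. The function name, specimen identifiers and character scores are hypothetical, and treating undeterminable scores as excluded from both percentages is an assumption of this example rather than a documented detail of the analyses.

```python
def state_cluster_overlap(states, clusters, state, cluster):
    """Return (% of individuals showing `state` that fall in `cluster`,
               % of individuals in `cluster` that show `state`).
    Individuals whose state could not be determined (None) are skipped."""
    scored = [s for s in clusters if states.get(s) is not None]
    with_state = [s for s in scored if states[s] == state]
    in_cluster = [s for s in scored if clusters[s] == cluster]
    both = [s for s in with_state if clusters[s] == cluster]
    pct_of_state = 100.0 * len(both) / len(with_state) if with_state else 0.0
    pct_of_cluster = 100.0 * len(both) / len(in_cluster) if in_cluster else 0.0
    return pct_of_state, pct_of_cluster

# Hypothetical scores for one categorical character (not data from this study).
states = {"s1": "furled", "s2": "furled", "s3": "furled",
          "s4": "unfurled", "s5": None, "s6": "unfurled"}
clusters = {"s1": "A", "s2": "A", "s3": "B",
            "s4": "B", "s5": "A", "s6": "B"}
result = state_cluster_overlap(states, clusters, "furled", "A")
print(result)
```

A cluster is well described by a character state when both values are high; total inclusion or exclusion corresponds to a value of 100 or 0%.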
Categorical or continuous characters, or both? If the clusters identified by these analyses represent discrete taxa, which iteration should be used for taxonomic discrimination: those including all characters, only categorical characters, or only continuous characters? If categorical characters do indeed represent generic level diagnoses, then the division of clusters should first be based on these characters and only subsequently subdivided into species by the variation in continuous characters. However, unless taxa with the same branching characters have converged on forms that are indistinguishable based on the sum of their continuous characters, then the combination of continuous and categorical characters should provide the best discrimination of meaningful taxonomic groupings.
For continuous characters, the disc width, stem width, stem length, crown width and crown length all served to distinguish pairs of clusters from each other, although no single character discriminated each of the three clusters from each other and from the average values of the group, both when dumbbells were and were not included (Tables 1 & 2). This means that the combination of characters is required to identify each of the clusters identified within the population. For categorical characters, the nature of the first order branches (folia) proved most successful in discriminating dumbbells from Primocandelabrum, especially displayed v. rotated, the sense of inflation and furled v. unfurled (Table 1). For second order branching characters, only furled v. unfurled and the sense of inflation were consistently significant in the division of the clusters. Within Primocandelabrum, the only relevant first order characters were the sense of inflation and concealed or unconcealed axes, as all individuals had rotated and unfurled branches (where this character could be determined). Median inflation discriminated single clusters with 100% inclusion or exclusion, and the absence of either concealed or unconcealed axes was also useful in discriminating pairs of clusters from each other. For the second order branch characters, only the sense of inflation consistently discriminated pairs of clusters across all iterations, although furling and concealed/unconcealed characters were useful for some iterations (Table 2).
In the analyses of Primocandelabrum, the cluster assignment for iterations using only continuous characters most consistently agrees with the assignment from iterations using all characters, having a higher cluster assignment match than for categorical characters only with all characters. Although variation in one continuous character (such as stem length) could reflect ecological drivers, variation in all continuous characters as documented here is much harder to explain in this manner, especially as all specimens are found on the same bedding plane. Equally, the potential functional drivers for branching architecture as well as for continuous characters mean that one set of characters is not inherently more robust than the others. Accordingly, the clusters defined by all characters are interpreted here to most likely represent the fundamental, biological clusters within the data. The greater mismatch in cluster assignment between iterations when only categorical characters are used (compared with iterations using both categorical and continuous characters, or only continuous characters) has two likely causes: (1) the clusters defined do not incorporate all of the individuals that share a character state; and (2) the particular set of categorical characters that are used to describe the clusters varies widely between the different clusters in a given iteration and between iterations. By contrast, the same continuous characters are used to define the clusters, both between clusters in a given iteration and between iterations (Tables 1 & 2). This could be taken to imply that categorical characters are less taxonomically robust than continuous characters.
Inclusion of outgroups is an important consideration. Although the inclusion of outgroups can help to root clusters by providing information on ancestral or shared characters, they can also mask variation within the group under scrutiny, particularly if there is character convergence. When dumbbells are included, the characteristics of individuals belonging to dumbbells contribute to the definitions of the principal component space and of all the clusters, both masking variation within Primocandelabrum and influencing the way that group is divided (Tables 1-4). When dumbbells are excluded, only the characters of Primocandelabrum contribute to the construction of the component space and, consequently, these iterations should most reliably and accurately identify natural clusters within Primocandelabrum. The inclusion/exclusion of the two feather dusters has little effect on the analyses, but does influence the clustering to some degree (Tables 1-4), particularly when 8C3 is included because this individual is assigned to dumbbells.

Placement of feather dusters
Separation of the two feather dusters is more problematic, as their morphological and branching characters place one (8C3, Fig. 1j) within Primocandelabrum clusters and the other (8C1, Fig. 1k) most frequently with dumbbells. There are three potential causes for the failure of feather dusters to fall consistently into the same cluster and into a cluster separate from the Primocandelabrum specimens: (1) the limited size of the dataset; (2) they represent two different taxa; and (3) the exclusion of a key taxonomic character (the number of folia) from the analyses due to its uncertainty in the vast majority of specimens of both Primocandelabrum and feather dusters; as with dumbbells, it is only possible to determine a minimum number of folia. The inclusion of the number of folia may change the placement of feather dusters relative to Primocandelabrum and would further distinguish dumbbells from Primocandelabrum. Based on morphological characters, 8C1 is more similar to dumbbells than it is to 8C3 or Primocandelabrum and its branching structure (as far as can be determined) only differs from that of dumbbells in having median v. distally inflated first order branches, median v. proximally inflated second order branches and possibly also in the rotated/displayed nature of its folia.
Even within the dumbbells, which these analyses demonstrate to be a discrete taxon, there is variation in both continuous and categorical characters. Accordingly, although the stem of specimen 8C1 is proportionally thicker and its holdfast smaller than other specimens of dumbbells, even of similar size, its dominant placement in the dumbbell cluster suggests that it is a different morph of dumbbells, although whether it is a different species or simply a variant cannot be determined on the evidence from this solitary individual. All Primocandelabrum specimens appear to have three rotated folia, but because the number of folia and their rotated/displayed nature in 8C3 is uncertain, it is not clear whether this individual represents a separate taxon or a different taphonomic expression of Primocandelabrum. The importance of folia number and branching characters on the placement of the two feather dusters in particular indicates that these are key taxonomic characters. However, the consistent placement of 8C3 within the Primocandelabrum spectrum based on those characters that are available, and its close position to Primocandelabrum in the principal component space even when its folia are set as 'displayed', does not preclude its assignment to this genus.

Taxonomic subdivision of Primocandelabrum
Determining the number and rank of taxa within the Primocandelabrum specimens is challenging given the variability of cluster composition depending on data treatment and the fact that many characters that define clusters do not do so exclusively (Table 2). Following the guidelines of Brasier et al. (2012), where the branching characters of each genus are unique, would result in 15 genera within the 40 well-preserved Primocandelabrum specimens analysed here. There is also variation in the continuous characters within each of these 15 groups, and this is greater than the variation between the groups. Although it is possible that the specimens here ascribed to Primocandelabrum represent a supra-generic group, the presence of so many morphologically similar genera (let alone species) is considered to be unlikely and difficult to objectively apply to other, less well-preserved material. Accordingly, this interpretation is rejected here.
The clustering analyses consistently support two or three clusters within Primocandelabrum, with the best discrimination of clusters being achieved when the specimens are divided into three clusters. Analyses based on only the branching characters show additional increases in inertia gain at higher levels of cluster division (Figs 5a, c & 6c), supporting more than three clusters, but even then the best cluster discrimination is generally achieved when the group is divided into three rather than more than three clusters. When Primocandelabrum is divided into the three clusters determined by the majority of cluster analyses, profile plots of the data show a smoother, more consistent correlation of morphological traits with total size than when the individuals are unsorted (Fig. 7a, b). Stem width shows a smaller increase in proportion to the total height than other characters (Fig. 7b). Division into three groups is also borne out by simple scatterplots (Fig. 7c, d), where the groups assigned by the principal component analyses (reduced and full character matrix) follow separate trends for many pairs of measurements. An underlying ontogenetic cause for these differences can be ruled out by the fact that the large specimens do not cluster together or plot close to each other (Figs 4 & 7c-f).
There are therefore three dominant clusters that can be described within Primocandelabrum. The question is then to what taxonomic level these clusters should be assigned: genus, species, sub-species or variety? The clear discrimination of dumbbells from Primocandelabrum is not achieved for clusters within Primocandelabrum. Few clusters are defined by a 100% assignment for individuals that share the same categorical character state, or with 100% of individuals within a cluster sharing all the same character states. Only two or three categorical characters statistically significantly contribute to the definition of each cluster, with the other categorical characters showing no significant correlation with one cluster over another, even when only categorical characters are used. This is in contrast with the discrimination of dumbbells from Primocandelabrum based on almost all categorical characters. Although the mean values of many continuous characters for each cluster within Primocandelabrum are statistically significantly different from the means of the characters for Primocandelabrum as a whole (Table 2), the ranges in values for each group overlap (Table 5). In addition, morphological and branching characters do not correlate as closely as they do for dumbbells. An individual Primocandelabrum could be assigned to a taxon based on either its continuous or its categorical characters, but the resulting assignments are only likely to be the same around two-thirds of the time (Figs 4 & 5; Tables 3 & 4). This morphological variation does not appear to be the result of ontogenetic variation because the size ranges of specimens ascribable to each form overlap.
Taken together, this suggests that the taxonomic subdivision within Primocandelabrum is at a lower rank than the division between dumbbells and Primocandelabrum. The shared character states and overlap in morphology are inconsistent with division at a generic level and could be used as a basis to treat the groups as morphs of one species. However, the three natural groupings are consistently identified and can be distinguished on the basis of the sum of their characters. Accordingly, we consider that division of the Charnwood Forest Primocandelabrum specimens into three species (Figs 8-10) is justified. They are designated as P. aethelflaedia (corresponding to cluster 1 of Figs 5-7), P. boyntoni (corresponding to cluster 2 of Figs 5-7) and P. aelfwynnia (corresponding to cluster 3 of Figs 5-7).

Conclusions
Statistical techniques such as those described here provide a powerful way to analyse large datasets with missing values and, importantly, are a means of analysing both categorical and continuous characters in tandem, while being free from any assumption as to the taxonomic weight of any of the characters. They also provide a way of identifying and quantifying variation within a taxon. There are three possibilities for interpreting the variation within the group of individuals referable to Primocandelabrum. First, that it is a supra-generic group containing upwards of 15 genera, where each genus has a distinct branching architecture; second, that it is a single species within which there is a considerable degree of variation in both categorical and continuous characters; and third, that it is a single genus containing three species, within each of which there is variation in the continuous and categorical characters. That three natural clusters can be identified within the group leads us to favour the last of these interpretations. We further interpret the dumbbells as a genus distinct from Primocandelabrum, given their consistent separation from that group on all continuous and categorical characters.
These techniques may prove especially useful in elucidating the taxonomy of the feather dusters (Mason & Narbonne 2016). They allow taxa to be discriminated in an unbiased way based on the sum of the characters within the whole population. The findings that the combination of continuous and categorical characters is more powerful than either set considered in isolation, and that certain characters are more useful taxonomic discriminants (Table 5), has important ramifications for rangeomorph taxonomy. The application of these techniques to other taxa would, in an unbiased way, confirm or question some recent taxonomic revisions within rangeomorphs. These include synonymizations (Liu et al. 2016) and the creation of new genera based on single differences in branching character states alone (e.g. the differentiation of Vinlandia and Trepassia from Charnia).
The variety of branching characters in specimens whose gross morphology is otherwise essentially indistinguishable and, likewise, the shared branching characteristics of individuals with at least a superficially distinguishable morphology is surprising. This is especially so given the fact that few other published descriptions acknowledge variability in branching pattern within a taxon, unless attributed to processes such as ontogeny (Brasier et al. 2012). Certainly, Charnia masoni seems to have a very consistent branching pattern across multiple specimens, with the only variation observed clearly attributable to taphonomic processes (Wilby et al. 2015). Charnwood Forest has yielded few other currently assignable unifoliate rangeomorphs and so their variety in branching pattern is impossible to constrain. The only other multifoliate taxon known from more than a handful of individuals is Bradgatia, but its branching pattern is not distinguishable across large parts of many of these specimens due to their complex three-dimensional shape and their preservation as compound fossils. Analysis of taxa such as Trepassia, Vinlandia and Beothukis using these techniques would reveal the extent of variation in branching characters in unifoliate taxa and would test whether intra-specific variability in branching pattern is a feature unique to multifoliate taxa. The techniques presented here have the potential to revise the taxonomy of not just rangeomorphs, but also a host of other Ediacaran groups, including arboreomorphs, dickinsoniomorphs, erniettomorphs and even the palaeopascichnids. Problematic Phanerozoic groups would be similarly amenable to analysis using this approach.

This species is characterized by a proportionally small disc (c. 25% (10-30%) of the total height), a short stem (average 25% (10-30%) of the total height), and a relatively wide crown (approximately equal to the total height, but ranging from 75 to 130%).
The proportions of this taxon are close to the average for the genus Primocandelabrum (as defined from Charnwood Forest). Its first order branches have proximal inflation and are typically unconcealed and its second order branches are typically displayed, concealed and show proximal inflation, but may be unconcealed and either subparallel or radiating. Its third order branches are concealed and are typically displayed and unfurled, but may be subparallel or radiating, and with median or distal inflation.

P. aethelflaedia sp. nov.
Etymology. Named after Lady Aethelflaed, an ancient queen of the historical kingdom of Mercia (within which Charnwood Forest is located).
Material. This species is described from 12 complete specimens, all from Bed B in Charnwood Forest (Wilby et al. 2011). Master moulds and casts are housed at the British Geological Survey, Keyworth. Plastotype is designated as 4A1b (GSM105953; Fig. 9a, b); paratype is designated as 19C4 (GSM106049; Fig. 9c, d). This species has a proportionally long stem (average 35% (range 30-40%) of the total height) and a crown that is both short (average 65% (range 50-75%) of the total height) and narrow (average 80% (range 60-90%) of the total height). It is characterized by folia and second order branches that are concealed. First order branches show dominantly proximal (but sometimes median) inflation. Second order branches show dominantly median (but sometimes proximal) inflation, may be displayed or rotated and subparallel or radiating. Third order branches are concealed, but may be rotated or displayed, furled or unfurled, subparallel or radiating and with no or distal inflation.

P. aelfwynnia sp. nov.
Etymology. Named after Lady Aelfwynn, the last queen of the historical kingdom of Mercia (within which Charnwood Forest is located).
Material. This species is described from 10 complete specimens, all from Bed B in Charnwood Forest (Wilby et al. 2011). Master moulds and casts are housed at the British Geological Survey, Keyworth. Plastotype is designated as 3D8 (GSM105963; Fig. 10a, b); paratypes are 19B1 (GSM106040; Fig. 10c, d) and BB (GSM105872; Fig. 1b). This species is characterized by a crown that is wider than the organism is tall (average 140% (range 110-175%) of the total height) and a proportionally large disc (average 50% (range 30-90%) of the total height). Its first order branches are unconcealed and inflate proximally and its second order branches are displayed, concealed or unconcealed and may be furled or unfurled and may show distal inflation. Its third order branches are concealed, and may be displayed or rotated, furled or unfurled, subparallel or radiating and with no, median or distal inflation.