Species composition drives macroinvertebrate community classification

Community classification enables us to simplify, communicate, track and assess complex distribution patterns. Yet, the distribution of organisms may not coincide with predefined geographical and environmental boundaries, and therefore, biology itself should be leading the classification. In this study, we showed how to arrive at such a biology-based classification by clustering locations based on similarity in species composition. A hierarchical classification structure allowed for the selection of classification levels that suit multiple scales of analysis. We also showed how to objectively identify the number of clusters present in a dataset based on the distribution of specific indicator species, allowing to identify clear boundaries in species composition on multiple scales. The resulting biology-based clusters were identified and characterized by local and regional environmental conditions, showing the limited explanatory power of these environmental conditions and the added value of taking biology itself as a starting point of the classification. By departing community classification from species composition, the unknown environmental, geographical, and biotic drivers influencing species composition are accounted for.


Introduction
It is inherent to humanity to try to categorize and classify our environments, either on local, regional or global scales. This habit persisted into modern sciences, by naming species and grouping these into zones, assemblages and communities (Quicke, 1993;Sokal, 1974), although opposing currents in science perceive the environment as a continuum which cannot be classified into distinct units, except when arbitrarily done (Austin and Smith, 1989;McIntosh, 1967;Whittaker, 1967). Yet, there is a general consensus on the scientific merit and practical use of community classification, as it enables us to simplify, communicate, track and assess complex distribution patterns (Everitt et al., 2011;Sokal, 1974). When describing and quantifying biotic responses to abiotic changes on larger spatial scales, such as the landscape-level, the community is often the most appropriate entity. Likewise, in environmental management, community classification may aid to evaluate if restoration measures indeed lead to the desired ecological improvements (Costanza et al., 2017;Cullum et al., 2016;Hawkins and Norris, 2000).
Environmental and geographical conditions are acting at spatial scales ranging from landscapes down to the level of habitats (Frissell et al., 1986;Hawkins et al., 1993;Poff, 1997). Likewise, community composition can be regarded as being hierarchically structured, where communities described at local scales are embedded in larger-scale communities (Miller, 2008;Patterson and Brown, 1991;Petsch et al., 2015), influenced by the environmental drivers that act on each specific spatial scale (Frissell et al., 1986). Communities on the other hand have often been classified on single predefined spatial scales, using artificial boundaries such as legislative regions or more natural boundaries such as catchments or geomorphologic regions. However, this ignores that the classification of communities can be achieved at a hierarchy of spatial scales, with each scale being characterized by multiple groups with a distinct species composition.
Community classification is approached in different ways, departing from a biological perspective, from geography, from the environment, or from combinations of these. Previously, biology-based classifications have helped us to understand patterns in species distributions, by delineating groups with a similar species composition and considering these as separate entities. Biology-based classification took a flight from the 1920s onwards, especially to define plant associations (Braun-Blanquet, 1928;Gleason, 1939Gleason, , 1926 and continued to be of use in vegetation science ever since then (Hill, 1979;Westhoff and van der Maarel, 1978).
Yet, the attention for biology-based approaches has decreased. Instead, within the various domains of ecology, abiotic characteristics such as environmental drivers and geographical zonation have been used for classification (Bohn et al., 2004;European Commission, 2000;Omernik and Griffith, 2014). Such characteristics are more easily measurable and quantifiable (Huet, 1949) and are assumed to be influenced more directly, showing a clear link to environmental management. Numerous classifications based on geographical and environmental characteristics have been developed. In geographical classifications, areas are subdivided into relatively homogeneous regions. These ecoregions are based on boundaries in the landscape such as landforms, soils and land use, and delineate areas that contain similar ecosystems inhabited by similar species (Illies, 1978;Wasson et al., 2004). Geographical classifications have been developed for many organism groups (Bohn et al., 2004;Fa et al., 2004;Gering et al., 2003;Hughes and Larsen, 1988;Van Sickle and Hughes, 2000) and span different spatial scales. Environmental classifications on the other hand are guided by certain predefined ranges of abiotic conditions, which are not necessarily linked to a geographic region. Environmental classifications have been made for a variety of applications, including ecological assessment (Costanza et al., 2017;Martin and Brunke, 2012;Van der Molen et al., 2016). In environmental classifications, the level of detail depends on the spatial scale of application. Classifications departing from a combination of geographical and environmental characteristics use regional delimitations, which are then subdivided into biotic types based on multiple environmental criteria. Examples are vegetation classifications (Mucina et al., 2016;Omernik and Griffith, 2014) and the water-type classification used for water quality assessment proposed by the Water Framework Directive (European Commission, 2000).
All approaches discussed above have both advantages and drawbacks in their conceptualisation of reality and application in practice. The main disadvantage of environment-and geography-based approaches is that they start from an a priori classification of abiotic conditions, either in a geographical or in an environmental context. Consequently, species composition itself is not leading the classification. Indicator species, representing local communities and indirectly the complex of environmental and biological interactions (Dauvin et al., 2010;Heink and Kowarik, 2010;Nijboer, 2006), are appointed to predefined regions or categories. However, it is unlikely that the distribution of organisms fully coincides with these arbitrary boundaries (Lorenz et al., 2004;Magnusson and Gering, 2004). This may be caused by interactions between environmental stressors and location-and geography-specific interactions between species, such as food web and non-food web relations, that influence community composition (Lake et al., 2007;Soberón, 2007). Indeed, regional classifications based on predefined landscape characteristics have lower classification strengths than classifications based on biotic composition (Hawkins and Vinson, 2000;McCormick et al., 2000;Sandin and Johnson, 2000;Van Sickle and Hughes, 2000). Hence, there are strong arguments to take biology as a starting point to arrive at classifications that account for geography, small-scale environmental variability as well as biological interactions. An additional advantage is that a hierarchically structured classification based on biology can be used to let the organisms show the appropriate scales in the classification, based on discontinuities in species composition. This is opposed to environment-and geographybased classifications, which are mostly developed for a single or multiple scales that are predefined by the user.
Yet, revitalizing biology-based classifications is accompanied by methodological constraints. Grouping species into specific clusters brings the challenge to objectively identify the most appropriate number of clusters which are present in a given dataset at different scales. Even when the clusters are clearly distinct and nonoverlapping, they may still be hard to identify by statistical procedures (Milligan and Cooper, 1985). Such internal metrics are based on measures for withincluster similarity and between-cluster distance, indicating relevant thresholds to separate clusters (Everitt et al., 2011;Legendre and Legendre, 2012;Van Sickle and Hughes, 2000). However, in most cases species composition gradually changes along environmental gradients in space and time and groups may not be distinctly separated, which makes it complicated to objectively determine the optimal number of clusters with such internal metrics. Due to this difficulty, multiple criteria have been used to define the optimal number of clusters (Adriaenssens et al., 2007;Costanza et al., 2017), while in some studies the adopted criteria are not reported at all (Lorenz et al., 2004;Wright et al., 1997). As these previously adopted metrics are clearly not effective in all cases, it is necessary to adopt another approach to objectively determine the appropriate number of clusters present in a dataset, which should be applicable on multiple spatial scales.
The aim of the present study was to arrive at a community classification which uses species composition as a starting point, objectively defining the appropriate numbers of clusters at the appropriate scales. To this purpose, we grouped sites based on their species composition and evaluated if the resulting community classification could be related to the environmental conditions prevalent at these sites. To objectively identify the appropriate numbers of clusters in the classification, we used indicator value analysis as a tool to determine relevant thresholds (Dufrene and Legendre, 1997). The availability of high-resolution distribution data for macroinvertebrates in the Netherlands offered a unique possibility to illustrate and validate our approach.

Data collection
Data on macroinvertebrate species distribution in the Netherlands was collected by 19 regional water authorities as part of ecological monitoring programs at 7103 sites for the time period 2007-2016. This resulted in a dataset containing abundance data for in total 1600 species of macroinvertebrates. For 4569 of the sites (64%), regional scale environmental conditions were recorded. These included geomorphology (size, form, soil type, presence and absence of fen features, being small water bodies with low buffering capacities), hydrology (flow character) and chemistry (salinity) ( Table 1). For 2704 of the sites (38%), also local scale environmental conditions, which are often major drivers of biological gradients in freshwaters, were recorded (Verberk et al., 2012;Verdonschot et al., 1998), including total nitrogen concentration, total phosphorus concentration, dissolved oxygen (DO) concentration, pH, chloride concentration and temperature.

Data analysis
To maximize the uniformity of the data from all sites, the macroinvertebrate data was pre-processed by only selecting species-level data, excluding taxa identified on a higher taxonomic level (e.g. genus, family, aggregate), originating from morphological identification issues. This ensured a consistent dataset for further analysis and avoided taxonomic overlap among taxa (Nijboer and Verdonschot, 2000). In addition, we did not aggregate species data into a higher taxonomic level, as this would result in the loss of species-specific data on Table 1 Regional scale environmental conditions recorded at the monitoring sites (see Table A1 in Appendix for definition of environmental class boundaries). ecological preferences. To decrease the effect of differences in sampling effort between the different regional water authorities, only presence/ absence data was used.

Site selection
Classification should be based on species composition under nearnatural conditions, since anthropogenic stress diminishes the natural differences between communities, obscuring the patterns in species composition required for an appropriate classification (McCormick et al., 2000;Verdonschot, 2006). Hence, to avoid the influence of strongly degraded sites on the classification, these have to be removed from the dataset. In the present study this was achieved by selecting sites that were not strongly degraded using a quality metric representing general degradation. To this purpose we applied the saprobic index, which responds to organic pollution and decreased oxygen concentrations, but which is also correlated with multiple other stressors acting on aquatic ecosystems, such as hydromorphological degradation and wastewater discharges (Burdon et al., 2019;Davy-Bowker and Furse, 2006;Verdonschot et al., 2012) and as such acts as a general stress indicator. We calculated the saprobic index for each site based on the saprobic valence of the macroinvertebrate species present (Moog et al., 2002), derived from a preference dataset. In this dataset, the relative abundance frequencies are distributed over four saprobic classes, ranging from low to high levels of organic pollution (Verberk et al., 2012). Sites where more than 50% of the species present had a preference for α-mesosaprobic or polysaprobic conditions were marked as degraded and were removed from the dataset (1505 sites, 21%).

Cluster analysis
The remaining 5598 sites were clustered based on species composition, using the cluster package within the R programme (Maechler et al., 2019;R Core Team, 2018). An agglomerative hierarchical clustering technique was used (agnes function), as we also aimed to elucidate the hierarchical structure based on species assemblages. The sites were sequentially grouped into larger clusters until a single cluster was obtained (Legendre and Legendre, 2012), which was depicted in a dendrogram. For each clustering step, the two clusters to be potentially merged were chosen based on minimum resulting within-cluster variance (Wards criterion).
The most appropriate level of clustering in the hierarchy was chosen using indicator species analysis (Costanza et al., 2017). An indicator value was calculated and tested for significance based on random permutation tests for each species present in a cluster (1000 iterations), which was based on specificity (maximum if all individuals of a species are found in one specific cluster only) and fidelity (maximum if the species occurs at all sites belonging to that specific cluster) (Dufrene and Legendre, 1997). This procedure was repeated for each level of clustering, ranging from 2 to 40 clusters, using the indval function in the labdsv package (Roberts, 2016).
The output of the indicator value analyses was used to select the appropriate clustering levels in the hierarchy a posteriori. Three indices based on the indicator value analysis were used to identify these clustering levels: 1) the sum of all significant indicator values, 2) the total number of significant indicator species and 3) the average significant pvalue. An outcome in which the former two indices were maximized and the latter minimized was considered to be optimal. The first index was considered to be the primary indicator, because a levelling off in the sum of significant indicator values points to a level of clustering where additional clusters have a limited additional explanatory power (Dufrene and Legendre, 1997), which indicates a distinct change in species composition. This can be observed as a bend in the curve when plotting the index value against the total cluster number.

Cluster characterization
Next, it was evaluated if the resulting biology-driven clusters could be related to environmental conditions acting at local and regional scales. To this purpose, environmental conditions were summarized for all sites within a specific cluster for which data on local environmental conditions was available (2388 sites, 43%). To determine potential effects of low DO concentrations, which could potentially pose stress to many macroinvertebrate species (Burnett and Stickle, 2001), the minimum value per site over the complete monitoring period was taken and averaged per cluster. The same procedure was applied for pH. For Cl concentrations, the maximum concentration per site was used (Kefford et al., 2007). For total nitrogen concentration, total phosphorous concentration and temperature, the mean value per site was calculated, after which these values were averaged over all sites within a cluster. Subsequently, we tested for significant differences in the mean abiotic conditions between the clusters using one-way Anova's and used Tukey's HSD test as a post hoc procedure to test for pairwise differences between the individual clusters.
Each of the regional scale environmental conditions was transformed into two or three distinct environmental classes (Table 1). For a total of 3593 sites (64%) for which data on these conditions was available, the proportion of sites within a cluster fitting into each of the environmental classes was calculated after removal of the highly impacted sites.

Cluster analysis
Cluster analysis of all sites based on its species composition using presence-absence data generated an agglomerative coefficient of 0.98. This indicated a strong clustering structure, confirming that Ward's linkage criterion performed well for clustering the sites (Fig. 1).
Indicator species analysis showed that for clustering levels up to 9 clusters, each cluster contained at least one unique indicator species. A steep increase in the sum of the significant indicator values was observed (Fig. 2) until a clustering level of 10 clusters. For clustering levels exceeding 10 clusters, the sum of the significant indicator values continued to increase, but this increase levelled off considerably above a total cluster number of 10, followed by a second shift at clustering level 30. This pattern points to the levels of clustering where additional clusters have a limited additional explanatory power. Therefore, the clustering levels of 10 and 30 clusters were considered biologically relevant levels and were used for further analysis. For a clustering level of 10 clusters, the sum of the significant indicator values was 97.2, which is relatively high compared to the values for the other clustering levels. The average of the significant p-values was 0.006 and the number of significant indicator species was 989. For a total of 30 clusters, the sum of the significant indicator values was 129.3 with an average p-value of 0.007 for a total number of 1022 significant indicator species, approaching the maximum value of 131.0 observed in the clustering range of this study.
For a total of 10 clusters, the number of sites per cluster varied between 102 and 1162 (1.8% and 20.8% of all sites respectively). The number of indicator species per cluster ranged from 0 to 183. For a total of 30 clusters, the number of sites per cluster ranged from 10 to 442 (0.1% and 7.9% of all sites), with 0 to 144 indicator species per cluster (Appendix Table A2).

Cluster characterization
Next, it was evaluated if the resulting biology-driven clusters at two hierarchical levels (based on n = 5598 sites) could be related to the regional and local scale environmental conditions of the sites within each cluster (available for n = 2704 and n = 4569 sites, respectively). Because describing this for a level of 30 clusters would be too lengthy, here we focused on illustrating our approach for the clustering level with a total of 10 clusters only. These 10 clusters with an overview of the corresponding indicator species and associated environmental conditions are presented in Fig. 3, based on the descriptive statistics in the Appendix (Table A3, Fig. A1, and Table A4).
Conform expectation, the distribution of species did not coincide with geographical and environmental boundaries and consequently the present clusters based on species composition did overlap geographically (Fig. 4). This is expressed by the regional distribution of the main environmental conditions that characterize the 10 clusters, which also overlap in space.

Discussion
A reliable and appropriate community classification may reveal and predict how changes in the abiotic and biotic environment impact local species composition. Here, we have developed a hierarchically scaled community classification for aquatic macroinvertebrates using an extensive dataset generated by the regional water authorities of the Netherlands. By applying the present indicator species analysis, we were able to objectively distinguish multiple groups on different clustering levels. With this classification, we aimed to show how species composition can guide the classification of communities over different scales.

Cluster analysis
Any clustering technique is an attempt to divide a continuum or gradient into distinct groups. In this respect, each site in the dataset could be regarded as being unique concerning its species composition and set of environmental conditions (Palmer and White, 1994). However, this would not allow us to generalize the effects of, amongst others, changes in the abiotic environment and human interventions on species composition. Hence, it is advantageous to group sites by similarity. A method to guide this grouping is to acquire a higher similarity between sites within a cluster than between clusters, but for long gradients in absence of clear discontinuities, this may be less feasible (Sokal, 1974). In such cases, combining crisp and fuzzy clustering techniques may help to come to an ecological classification of sites (Adriaenssens et al., 2007). However, in several studies, this challenge was solved by using subjective criteria to determine the number of  clusters, while sometimes even no criteria were reported at all (Lorenz et al., 2004;Wright et al., 1997). Therefore, in the present study we used an alternative approach based on indicator value analysis to identify the appropriate cluster number at different scale levels (Dufrene and Legendre, 1997), showing that it is indeed possible to objectively arrive at the choice for the appropriate cluster number. Fig. 3. Overview of the ten clusters derived from the cluster analysis (based on n = 5598 sites) with their associated indicator species and local and regional scale environmental conditions (based on n = 2704 and n = 4569 sites, respectively). Strongest indicator species are those species that have the highest indicative values, based on specificity and fidelity (Dufrene and Legendre, 1997). P: mean total phosphorous concentration, N: mean total nitrogen concentration, DO: minimum oxygen concentration, T: maximum water temperature.

Cluster characterization
The entity of a community and the appropriate criteria and scales for defining a community are continuously under debate (Austin and Smith, 1989;Palmer and White, 1994). Previously, the delineation of communities has been described on multiple spatial scales using multiple theories (Heino et al., 2005;Townsend, 1989;Vannote et al., 1980). An operational view on defining communities was posed by Palmer et al. (1994), suggesting that communities could be described for any arbitrary spatial unit, as there is no single correct scale of observation. Contrary to these views, we showed that it is indeed possible to objectively identify at which spatial scales discontinuities in species assemblages occur. In addition, such discontinuities were identified at multiple spatial scales in the landscape.
We succeeded to distinguish multiple scales with different levels of detail in the clustering, in our case consisting of 10 and 30 clusters. Which clustering level would be preferred depends on the intended application. In a local assessment context, it may be necessary to distinguish a larger number of clusters at a finer scale. For the current dataset, the classification at a level of 10 clusters represented distinctly different biotic groups. The classification with a total number of 30 clusters could represent a subdivision of the biology-based water types into groups of near-natural and more degraded sites (Appendix Fig. A2) and the clusters could picture more detailed groups of species, which can be of use in applications at smaller spatial scales.
The classification at a level of 10 clusters yielded groups which all had a distinct species composition. Some clusters could be associated with water types characterized by relatively extreme environmental conditions, such as a high salinity, acidic conditions or relatively low water temperatures in combination with higher dissolved oxygen concentrations (Fig. 3). On the other hand, for other clusters the environmental conditions were not markedly different, whereas the identified indicator species did point at a cluster-specific species composition, raising the question what these biology-based clusters actually reflect.

Ecological interpretation
In the present study, we let species composition drive community classification and we evaluated if the outcome of the biological clustering was associated with local and regional scale environmental conditions. It was observed that the limited set of environmental conditions did not suffice to set clear geographical and environmental boundaries to explain the variation and composition of the biologybased clusters.
This could be due to missing out on certain environmental conditions, both abiotic and biotic, relevant in driving the species composition at these sites, e.g. small-scale habitat variability, food quality or flow velocity, or the occurrence of disturbances (Feld and Hering, 2007;Hering et al., 2006;Poff, 1997). In the current analysis, the set of environmental conditions included six local chemical and physical parameters and six larger-scale parameters describing morphology, hydrology and chemistry. However, even when this set of environmental conditions would be extended, this might still be insufficient to characterise the biology-based clusters, because it remains unclear which are exactly the key environmental variables that organisms respond to (Scherrer and Guisan, 2019). Hence, the challenge remains to identify the environmental variables with the highest ecological relevance. Ideally, environmental variables should be included that act on multiple spatial and temporal scales, representing both average and extreme conditions in hydrology, chemistry, morphology, system conditions and biology. Interactions between environmental stressors further complicate the identification of the set of environmental conditions that determine whether or not species are present or absent (Beermann et al., 2018;Coors and de Meester, 2008;Jackson et al., 2016).
In addition to these environmental drivers, geographical drivers (such as dispersal-related processes) and biotic drivers (food web-and other biotic interactions) likely influenced the observed patterns in species distribution (Guisan and Thuiller, 2005;Lake et al., 2007). This shows that purely environment-and geography-based classifications are not sufficient to understand patterns in species composition. Therefore, as long as we are unable to fully characterize the true drivers of species composition, we should adopt biology-based classifications to acknowledge the unknown environmental, geographical, and biotic drivers of species composition.

Application
Community classification finds applications in many areas. Within restoration ecology, the clusters could indicate regional species pools available for recolonization of disturbed or restored sites (Carstensen et al., 2013).
In addition, biology-based classifications can be used to refine ecological water quality assessment systems that are now still based on a priori defined geography-or environment-based classifications (Martin and Brunke, 2012;Van der Molen et al., 2016). A revised assessment system would then be based on a biology-based classification of community types as described here, while the assessment system itself would remain similar. First a reference community should be described by using community type-specific indicator species within a relative homogeneous geographical region and set of environmental conditions (as in water types). The ecological status of a water body can then be described as the degree of similarity to this region-and water type-specific reference community. Yet, it remains a challenge to assign strongly degraded sites to one of the defined community types, since they may not be recognizable anymore due to the strongly impoverished community composition. Degraded sites under multiple stress will lose their distinctive features, having a low number of ubiquitous species. In such cases, the assignment could be done in a context-specific way, with knowledge of the original water body type and the subsequent restoration possibilities to improve the ecological status.
Furthermore, the delineation of communities on multiple scales can aid in monitoring and tracking changes in species composition under the influence of environmental change, further degradation and restoration attempts (Adriaenssens et al., 2007;Palmer et al., 1997). In addition, it can improve identification of the spatial scale on which certain stressors act, like temperature (climate change) on a larger scale or point pollution (e.g. road runoff) on a local scale.

Conclusions
In the present study, we let species composition drive community classification, not guided by a priori defined environmental or geographical boundaries. We succeeded in identifying appropriate clustering levels on multiple scales in an objective way using indicator species analysis. The resulting biology-based clusters were compared to local and regional environmental conditions, showing the limited explanatory power of environmental conditions and the added value of taking species composition itself as a starting point. By departing from a biological perspective in community classification, the unknown environmental, geographical, and biotic drivers influencing species composition are accounted for.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table A3
Summary of local and regional environmental conditions of the 10 community clusters. Ptot: total phosphorous concentration, Ntot: total nitrogen concentration, DO: dissolved oxygen concentration, T: temperature, Cl: chloride concentration.