Does taxonomic and numerical resolution affect the assessment of invertebrate community structure in New World freshwater wetlands?

The efficiency of biodiversity assessments and biomonitoring studies is commonly challenged by limitations in taxonomic identification and quantification approaches. In this study, we assessed the effects of different taxonomic and numerical resolutions on a range of community structure metrics in invertebrate compositional data sets from six regions distributed across North and South America. We specifically assessed the degree of similarity in the metrics (richness, equitability, beta diversity, heterogeneity in community composition and congruence) for data sets identified to a coarse resolution (usually family level) and the finest taxonomic resolution practical (usually genus level, sometimes species or morphospecies) and by presence-absence and relative abundance numerical resolutions. Spearman correlations showed highly significant and positive associations between univariate metrics (richness and equitability) calculated for coarse- and finest-resolution datasets. Procrustes analysis detected significant congruence between composition datasets. Higher correlation coefficients were found for datasets with the same numerical resolutions regardless of the taxonomic level (about 90%), while the correlations for comparisons across numerical resolutions were consistently lower. Our findings indicate that family-level resolution can be used as a surrogate of finer taxonomic resolutions to calculate a range of biodiversity metrics commonly used to describe invertebrate community structure patterns in New World freshwater wetlands without significant loss of information. However, conclusions on biodiversity patterns derived from datasets with different numerical resolutions should be critically considered in studies on wetland invertebrates.


Introduction
The taxonomic impediment (Terlizzi et al., 2009) is a major challenge for biodiversity research (Bevilacqua et al., 2012). Knowledge about the biodiversity of many groups is limited, and the availability of specialized taxonomists to collaborate with ecologists is inconsistent throughout the world (Wheeler et al., 2004; Hortal et al., 2015). Consequently, it has become commonplace for researchers to rely on assessments conducted at lower (coarser) taxonomic resolutions, a practice known as taxonomic surrogacy (or the higher-taxon approach; i.e. lower taxonomic resolution relies on higher taxonomic categories; Bertrand et al., 2006; Bevilacqua et al., 2012), to describe community patterns and assess ecological relationships. Indeed, several studies have shown that community patterns obtained with species-level data, including those revealed by multivariate techniques, are still observed at lower taxonomic resolutions (usually the family level) (Kallimanis et al., 2012; Hernandez et al., 2013; Mueller et al., 2013; de Oliveira et al., 2020).
In contrast, several assessments and meta-analyses suggest that the efficacy of the higher-taxon approach can be variable (Lenat and Resh, 2001; Jones, 2008; Mueller et al., 2013; de Oliveira et al., 2020). The similarity in statistical outcomes can be weakened at coarser resolutions for different taxa or vary according to geography or ecosystem type; these inconsistencies include inaccurate estimates of biodiversity patterns (Melo, 2005; Heino and Soininen, 2007; Rosser and Eggleton, 2011; Heino, 2014; Vilmi et al., 2016). Different factors may explain the variation in results when different taxonomic resolutions are used, for example, species-to-higher taxa ratios, spatial extent, data transformation methods and differences in niche conservatism among groups (Bevilacqua et al., 2012; Mueller et al., 2013; Neeson et al., 2013; Rosser, 2017). This lack of generality led to criticism by several authors and to a long-standing debate on the appropriateness of the higher-taxon approach for reliable ecological assessments (Lenat and Resh, 2001; Jones, 2008; Rosser and Eggleton, 2011; Mueller et al., 2013; de Oliveira et al., 2020). While many authors maintain that the species level is the most appropriate resolution to describe ecological responses (Lenat and Resh, 2001; Jones, 2008; Terlizzi et al., 2009; Rosser and Eggleton, 2011), others suggest that the higher-taxon approach is justified by its favorable cost-benefit relationship given the logistic and financial constraints faced by many researchers. The higher-taxon approach is especially appropriate for rapid biological assessments or to prioritize conservation efforts in areas lacking needed taxonomic resources (e.g., keys, experts) (Bailey et al., 2001; Jones, 2008).
The higher-taxon approach has been extensively used for aquatic invertebrates in streams; in these ecosystems, the patterns observed at coarse taxonomic resolutions usually mirror results obtained with finer taxonomic resolutions (Melo, 2005; Heino and Soininen, 2007; Monk et al., 2012; Heino, 2014; Vilmi et al., 2016). It has, however, rarely been investigated for aquatic invertebrates in wetlands (Epele and Miserendino, 2015), despite the diversity of habitat types (Batzer, 2013) and the acknowledged ecological value and threatened status of these ecosystems (Costanza et al., 2014). The taxonomy of fauna dominating streams (e.g. Ephemeroptera, Plecoptera and Trichoptera) is well known, and most studies in streams can thus be based on data with fine taxonomic resolution (genus, species, or morphospecies; Lenat and Resh, 2001). This is rarely an option for studies of wetland invertebrate communities because most invertebrate species in these ecosystems belong to highly diverse insect orders (e.g., Coleoptera, Diptera), for which adults are frequently required to key individuals to species, and sometimes genus. Consequently, community-level assessments of invertebrates in wetlands based on species-level data are usually restricted to particular families or genera (Chessman et al., 2002; King and Richardson, 2002; Chadd and Extence, 2004; Simić et al., 2007; Garrido and Munilla, 2008; McDaniel et al., 2017; Grech et al., 2019).
In addition to the issue of taxonomic resolution, differences in numerical resolution (e.g., relative abundance and presence-absence) can influence the observed patterns of community structure (Melo, 2005; Mueller et al., 2013; Heino, 2014; Sgarbi et al., 2020). Analyses based on presence-absence data sets tend to increase the influence of rarer taxa (Anderson et al., 2011) and may be especially useful for diversity studies. Analyses based on abundance data emphasize the importance of common taxa, and may be especially useful for studying interspecific interactions (Heino, 2014). Differences in numerical resolution may constitute an especially important issue for the study of community structure in freshwater wetlands because the invertebrate communities are usually numerically dominated by a few taxa such as chironomids, oligochaetes and microcrustaceans (e.g. Kratzer and Batzer, 2007; Moraes et al., 2014), which are the same taxa that are rarely classified with a fine taxonomic resolution.
In this study, we assessed the effects of different taxonomic and numerical resolutions on a range of metrics commonly used in the characterization of the community structure, focusing on aquatic invertebrates in New World freshwater wetlands. We tested the congruence in community-level patterns between 'coarse' (usually family) and 'fine' (typically genus) taxonomic resolution and between presence-absence and relative abundance data sets for a range of univariate and multivariate metrics. We analyzed six independent invertebrate data sets from different wetland habitats distributed across subtropical and temperate regions of North and South America. If consistent patterns span across this range of habitats, the application of the higher-taxon approach and the use of presence-absence data could provide a useful shortcut in the assessment of biodiversity and community patterns of wetland invertebrates in the New World.

Study regions and reference sources
We compiled data sets of wetland invertebrate assemblages from six regions distributed across North and South America where freshwater wetlands are common elements of the landscape. In North America, the data sets covered the following regions of the continental United States (US): the dry-temperate northern Prairie Pothole region (state of North Dakota), the wet-temperate Northeastern US (state of Maine), the dry-temperate Western US (the Central Valley region of the state of California), and the wet-subtropical Southeastern US (Coastal Plain of the state of Georgia). In South America, the data sets covered the wet-subtropical Coastal Plain of Southern Brazil (states of Santa Catarina and Rio Grande do Sul) and the dry-temperate Argentinian Patagonia (province of Chubut) (Fig. 1; Table 1).
Each data set was comprised of 10-40 wetlands and the data sets over the six regions included a wide range of habitat types (e.g., prairie potholes, Carolina bays, Patagonian mallines, temporary and permanent ponds, rock pools) and variable time frames (ranging from a single-year snapshot collection to a three-year collection period; Table 1). Thus, our analyses were carried out for each data set separately. Additional details on specific environmental features of the study sites and collection procedures in each study region can be found in Supporting information 1.

Taxonomic and numerical resolution
To compare the effects of taxonomic and numerical resolution on the patterns of community structure in wetland invertebrates, we assembled four matrices: (i) presence-absence at the coarse taxonomic level; (ii) presence-absence at the finest practical taxonomic level; (iii) relative abundance at the coarse taxonomic level; and (iv) relative abundance at the finest practical taxonomic level. The coarse category usually corresponded to family level, although certain taxa could only be identified to coarser (higher) taxonomic resolutions (e.g., turbellarians, water mites, some anostracans). We refer hereafter to this category as the 'family' level, for the sake of brevity. The finest practical taxonomic level corresponded to either genus, species or morphospecies, although certain taxa could only be identified to a coarser taxonomic resolution (e.g. family). We refer hereafter to this resolution as the 'finest' level.
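The assembly of the four matrices can be illustrated with a minimal sketch. The wetland names, counts and taxon-to-family lookup below are hypothetical, and relative abundance is assumed here to be the proportion of individuals of each taxon within a wetland, which the text does not specify. Taxa absent from a wetland are simply omitted from its record (implicitly zero).

```python
from collections import defaultdict

# Hypothetical counts of individuals per wetland at the finest practical level.
finest_counts = {
    "wetland_1": {"Lestes": 4, "Chaoborus": 10, "Enallagma": 6},
    "wetland_2": {"Chaoborus": 3, "Ischnura": 7},
}
# Illustrative lookup from finest-level taxon to family.
family_of = {
    "Lestes": "Lestidae",
    "Chaoborus": "Chaoboridae",
    "Enallagma": "Coenagrionidae",
    "Ischnura": "Coenagrionidae",
}

def to_family(site_counts, lookup):
    """Pool finest-level counts into their families."""
    pooled = defaultdict(int)
    for taxon, n in site_counts.items():
        pooled[lookup[taxon]] += n
    return dict(pooled)

def presence_absence(site_counts):
    """Recode counts as 1 (present) or 0 (absent)."""
    return {taxon: int(n > 0) for taxon, n in site_counts.items()}

def relative_abundance(site_counts):
    """Assumed definition: proportion of individuals within the wetland."""
    total = sum(site_counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in site_counts.items()}

def build_matrices(counts, lookup):
    """Return the four matrices (i)-(iv) as nested dictionaries."""
    family_counts = {s: to_family(c, lookup) for s, c in counts.items()}
    return {
        "family_PA": {s: presence_absence(c) for s, c in family_counts.items()},
        "finest_PA": {s: presence_absence(c) for s, c in counts.items()},
        "family_RA": {s: relative_abundance(c) for s, c in family_counts.items()},
        "finest_RA": {s: relative_abundance(c) for s, c in counts.items()},
    }

matrices = build_matrices(finest_counts, family_of)
```

In this toy example the two coenagrionid genera collapse into one family-level column, which is the only difference between the 'family' and 'finest' matrices.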

Data analysis
Univariate approaches. For each region, we used the Spearman correlation coefficient to test the relationship between 'family-' and 'finest-level' datasets in terms of richness and community equitability (Shannon-Wiener diversity index). Correlations were conducted separately for each study region.
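As a minimal sketch of the univariate comparison, the functions below compute richness, the Shannon-Wiener index and a Spearman coefficient (Pearson correlation on tie-averaged ranks). The natural logarithm and the richness values at the end are assumptions made for illustration only; they are not data from the study.

```python
import math

def richness(abundances):
    """Number of taxa with non-zero abundance."""
    return sum(1 for n in abundances if n > 0)

def shannon(abundances):
    """Shannon-Wiener index, H' = -sum(p * ln p), natural log assumed."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical richness per wetland at the two taxonomic resolutions.
family_richness = [5, 8, 3, 10, 6]
finest_richness = [7, 12, 4, 15, 6]
rho = spearman(family_richness, finest_richness)
```

Because the coefficient depends only on ranks, it is unaffected by the fact that finest-level richness is never lower than family-level richness in the same wetland.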

Multivariate approaches. We assessed whether taxonomic resolution affects the homogeneity of multivariate dispersion within each study region. To calculate the average distance of each wetland (sampling unit) to its corresponding group centroid, we employed the PERMDISP procedure (Anderson et al., 2006). In our procedures, beta diversity was given by the total variance in a data set, in accordance with Legendre and De Cáceres (2013); the relative contributions of the turnover and nestedness components were calculated using the Podani family of indices (Podani and Schmera, 2011). We tested for possible effects of taxonomic and numerical resolution on beta diversity metrics and the relative contribution of the turnover (replacement) and nestedness (richness) components to compositional dissimilarity by means of paired t-tests.
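The partition into replacement and richness-difference components can be sketched for presence-absence data as follows. This sketch assumes the Jaccard-based form of the Podani indices and takes total beta diversity as the sum of pairwise dissimilarities divided by n(n - 1), one formulation of the total-variance approach; the text does not state which coefficient was used, and the wetland lists are hypothetical.

```python
from itertools import combinations

def podani_jaccard(site1, site2):
    """Return (dissimilarity, replacement, richness difference) for two taxon sets.

    a = shared taxa, b and c = taxa unique to each site.
    Jaccard-based Podani indices (an assumption of this sketch):
      replacement         = 2 * min(b, c) / (a + b + c)
      richness difference = |b - c| / (a + b + c)
    Their sum equals the Jaccard dissimilarity (b + c) / (a + b + c).
    """
    s1, s2 = set(site1), set(site2)
    a, b, c = len(s1 & s2), len(s1 - s2), len(s2 - s1)
    n = a + b + c
    if n == 0:
        return 0.0, 0.0, 0.0
    repl = 2 * min(b, c) / n
    rich = abs(b - c) / n
    return repl + rich, repl, rich

def beta_components(sites):
    """Total beta diversity and its two components over all site pairs."""
    n = len(sites)
    total = repl = rich = 0.0
    for s1, s2 in combinations(sites, 2):
        d, r, k = podani_jaccard(sites[s1], sites[s2])
        total, repl, rich = total + d, repl + r, rich + k
    denom = n * (n - 1)
    return {
        "total": total / denom,
        "replacement": repl / denom,
        "richness_difference": rich / denom,
    }

# Hypothetical taxon lists for three wetlands.
sites = {"w1": {"A", "B", "C"}, "w2": {"A", "D"}, "w3": {"A"}}
result = beta_components(sites)
relative_replacement = result["replacement"] / result["total"]
```

Running the same function on family-level and finest-level taxon lists for one region gives the paired values that a paired t-test across regions would compare.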
We assessed the effects of different resolutions (taxonomic and numerical) on the distribution and relationships between sampling units by means of Principal Coordinates Analysis (PCoA). Prior to the PCoA, the dissimilarity matrices derived from the relative abundance data sets were square-root transformed to avoid the production of negative eigenvalues (Legendre and Legendre, 2012). Finally, we used a Procrustes analysis to test the degree of congruence between PCoA site scores derived from datasets with different taxonomic and numerical resolutions (Legendre and Legendre, 2012). In our procedures, the Procrustes analysis was based on the site scores of the full set of vectors generated by PCoA. The significance of the Procrustes correlation was assessed with a permutation-based approach (protest function; 9999 permutations; Peres-Neto and Jackson, 2001). We tested the degree of congruence for the following set of pairwise combinations of composition data sets: (i) family level (presence-absence vs. relative abundance); (ii) finest level (presence-absence vs. relative abundance); (iii) family level (presence-absence) vs. finest level (presence-absence); and (iv) family level (relative abundance) vs. finest level (relative abundance). We ran all analyses in R v. 3.6.0 (R Core Team, 2019) using the functions available in the packages vegan (Oksanen et al., 2019) and ade4 (Dray and Dufour, 2007).
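For reference, the Procrustes correlation and its permutation test can be written compactly. The notation below is introduced here for illustration and assumes the symmetric form of the statistic, with both matrices of PCoA site scores, X and Y, column-centred and scaled to unit sum of squares.

```latex
% \sigma_i: singular values of X^{\top} Y (X, Y centred, unit sum of squares)
m^{2} = 1 - \Bigl(\sum_{i} \sigma_i\Bigr)^{2},
\qquad
r = \sqrt{1 - m^{2}} = \sum_{i} \sigma_i

% Permutation test: rows of one matrix are permuted N times, giving r^{*}_{j}
P = \frac{1 + \#\{\, j : r^{*}_{j} \ge r \,\}}{1 + N},
\qquad N = 9999
```

Under this formulation, r ranges from 0 (no concordance) to 1 (identical configurations after rotation and scaling).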
Finally, to provide a raw assessment of the ratios of the taxa classified to genus or species to their corresponding composite family taxa in each study region, we parsed the character of the "finest level" for each of the six regions, as compiled by the different research groups. That is, we calculated the following information for each data set: (i) the number of taxa that could be identified only to the family level (or to a coarser taxonomic resolution); (ii) the number of families with only one genus; (iii) the number of families with two genera; and (iv) the number of families with three (or more) genera.
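The tally described above can be sketched as follows. The taxon list is hypothetical, and the rule that a family counts as 'not beyond family' only when no genus was recorded for it is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical records: (family, genus); genus is None when the taxon
# could not be identified beyond the family level.
taxa = [
    ("Chironomidae", None),
    ("Lestidae", "Lestes"),
    ("Chaoboridae", "Chaoborus"),
    ("Coenagrionidae", "Enallagma"),
    ("Coenagrionidae", "Ischnura"),
    ("Dytiscidae", "Laccophilus"),
    ("Dytiscidae", "Hydroporus"),
    ("Dytiscidae", "Agabus"),
]

def summarize_finest_level(records):
    """Count families by the number of genera identified within them."""
    genera = defaultdict(set)
    families = set()
    for family, genus in records:
        families.add(family)
        if genus is not None:
            genera[family].add(genus)
    summary = {
        "not_beyond_family": 0,
        "single_genus": 0,
        "two_genera": 0,
        "three_or_more_genera": 0,
    }
    for family in families:
        n = len(genera[family])
        if n == 0:
            summary["not_beyond_family"] += 1
        elif n == 1:
            summary["single_genus"] += 1
        elif n == 2:
            summary["two_genera"] += 1
        else:
            summary["three_or_more_genera"] += 1
    return summary

summary = summarize_finest_level(taxa)
# Share of taxa for which family and finest level carry the same identity.
same_identity = (summary["not_beyond_family"] + summary["single_genus"]) / sum(summary.values())
```

The last quantity corresponds to the share of families for which the coarse and finest data sets are necessarily identical.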

Results
The Spearman correlation coefficients showed that richness values and equitability (measured for each taxonomic resolution) were positively and significantly correlated (P < 0.05) in the data sets of all regions (Fig. 2). The numerical outputs of the correlations in each region are given in Supporting information 2. For beta diversity, higher values of total beta diversity and turnover were obtained with data with fine (genus) taxonomic resolution than with data with coarse (family) taxonomic resolution; the opposite pattern was found for the nestedness component, i.e., lower values of the nestedness component were obtained with data with fine taxonomic resolution than with data with coarse taxonomic resolution (irrespective of the numerical resolution; Supporting information 3). Regarding numerical resolution, the relative contribution of the turnover component was higher in presence-absence data sets than in relative abundance data sets (except for the Northern US data set) (Fig. 3). The outputs of the beta diversity metrics obtained for each numerical and taxonomic resolution are given in Supporting information 3. Heterogeneity of community composition increased from coarsest to finest taxonomic resolutions and was higher in relative abundance data sets (compared to presence-absence data sets) in each region (Fig. 4). The paired t-tests comparing the absolute values and relative contributions of the turnover and nestedness fractions of beta diversity were significant or marginally significant in all cases (Supporting information 4). The Procrustes correlation coefficients for all pairwise comparisons were significant (P < 0.0001). Nevertheless, in each region, the values of the Procrustes correlation coefficients were consistently higher (r > 0.9 in most cases) for pairwise comparisons between taxonomic resolutions (family vs. finest) than for comparisons between numerical resolutions (presence-absence vs. relative abundance; Table 2).
The graphical outputs of the Procrustes analyses between the invertebrate composition data sets are given in Supporting information 5. PCoA ordination diagrams showed that the relationships among sampling units were consistently more similar across taxonomic resolutions based on the same numerical resolution for all regions (Supporting information 5).
In every invertebrate composition data set, the number of taxa that could not be identified beyond the family level (or that were identified only to a coarser taxonomic resolution), combined with the number of families encompassing a single genus, represented the largest share of the invertebrate composition. In contrast, the families in which researchers identified multiple genera (or other finer levels) were a distinct minority in each region (Fig. 5).

Discussion
We found that the patterns observed at coarse taxonomic resolutions (typically family level) showed elevated congruence to finest-practical taxonomic levels (typically genus level) for most of the metrics studied. Most important, the majority of the correlations between taxonomic resolutions were consistent across regions and wetland types. Thus, our results provide empirical evidence for the potential broad application of the higher-taxon approach in studies aiming at the basic characterization of invertebrate community structure patterns in New World freshwater wetlands.
For each alpha diversity metric (richness and Shannon-Wiener index), we found significant and strong correlations between taxonomic resolutions. The majority of the correlations exceeded 0.7, a threshold for congruence suggested by Heino (2010). We found a similar pattern for multivariate metrics, especially the outcomes of ordination techniques (Procrustes tests and PCoA ordination diagrams) of community composition (Table 2; Supporting Information 5). The highest Procrustes r coefficients were detected for comparisons between taxonomic (rather than numerical) resolution data sets. Numerous reasons can account for the high congruence between taxonomic resolutions, some of which are related to intrinsic ecological characteristics of wetlands, while others are related to contingencies associated with each research group. First, the taxonomic identity in the highest resolution data set often (~80%) matched the identity in the family data set, or only a single genus occurred in the same family (Fig. 5). It appears that many families are simply not diverse in wetlands of the Nearctic and Neotropical regions (e.g. Crangonyctidae = Crangonyx, Lestidae = Lestes, Chaoboridae = Chaoborus, etc.), or that diverse families frequently encompass a single genus within a given location (Maltchik et al., 2010). Thus, classification differences between family and finest level occur for a small number of families (0-20% of families had multiple genera; Fig. 5). Low species-to-higher taxa ratios (in our case, finest-to-coarse) underlie the efficacy of the higher-taxon approach (Heino and Soininen, 2007; Bevilacqua et al., 2012; Rosser, 2017; de Oliveira Jr. et al., 2020), and this explanation seems especially prevalent in wetlands.
Besides the low within-family richness, an important reason underlying the high congruence between datasets with different resolutions is the low completeness of the surveys, which, in turn, is due to the lack of expertise in most research groups to classify organisms beyond the family level for a wide array of groups: e.g., helminths, annelids, water mites, microcrustaceans, immature zygopterans and corixids, and certain dipterans. If those hard-to-identify groups are in fact more diverse in wetlands, lower correspondence might be detected between patterns generated with different taxonomic resolutions. Previous studies described that some of these underrepresented taxa can show elevated richness in some regions and be indicative of environmental change (Panatta et al., 2006; Batzer et al., 2014). Alternatively, some of these groups may be so rare that the effort for a detailed taxonomy is deemed unwarranted. Furthermore, because taxonomic errors at the genus level (and lower) are much more likely than at the family level (see Jones, 2008), many researchers opt for a conservative approach. Regrettably, the taxonomic expertise needed for a refined taxonomy is often not available, especially for the common invertebrate groups in wetlands; or, if available, access to these experts is often beyond the financial capabilities of many research groups, the experts themselves lack the time or resources to do the work, or productive collaborations have not been previously nurtured. Additionally, larval keys for many genera and species almost always focus only on late-instar stages, which are either underrepresented in samples or unavailable throughout the year, and thus reliable identification even by experts often becomes impracticable.
This suggests a scenario in which researchers must weigh investing more time in finer taxonomic resolution for a few speciose groups (in order to obtain more accurate ecological responses) against the risk of increased taxonomic errors and the added logistical costs. In summary, low natural richness and limited taxonomic expertise among research groups act synergistically to render similar community-level patterns across different taxonomic resolutions.
To some extent, the differences in taxonomic expertise likely account for the increased beta diversity and average distance to centroid in some datasets, e.g., the Maine data set (rock pools). In this data set, the Chironomidae were identified to genus (and sometimes species) level. This contingency may also explain the much weaker congruence between taxonomic resolutions in the Maine data set. Chironomidae is usually the most speciose (as well as the most abundant and widespread) invertebrate group in freshwater wetlands (e.g., earlier studies have described more than 50 species within a single wetland; see Batzer et al., 2014). In some cases, the species richness of Chironomidae alone can exceed the richness of other invertebrates, vertebrates, or plants. This combination likely makes the identification of Chironomidae to the genus level the most influential contingency affecting the observed community structure patterns (Chessman et al., 2002; King and Richardson, 2002; Jones, 2008). For the highly diverse invertebrate families in wetlands, genus (or species) level resolution has been shown to elucidate finer-level ecological relationships (see King and Richardson (2002) and Chessman et al. (2002) for Chironomidae; Grech et al. (2019) for Culicidae; and McDaniel et al. (2017) for Dytiscidae). Chessman et al. (2002) showed that species-level resolution in Chironomidae data sets rendered more accurate discrimination of sampling units in biomonitoring studies. However, contrary to what we observed in the Maine data set, the classification of the Chironomidae to genus in the Patagonia data set did not have appreciable effects. For organisms other than invertebrates (plants, fishes), the importance of genus (or species) identification for ecological analyses has been highlighted (Mueller et al., 2013). Yet, most wetland research groups are unable to reliably classify Chironomidae specimens beyond sub-family.
Although we agree that refined taxonomic identification would always provide more robust assessments, the basic characterization of invertebrate community structure was little affected by the finer-level identification of Chironomidae (and other families) in our study.
The values of beta diversity (total variance in a data set), as well as the relative contribution of the turnover component, and heterogeneity in community composition (average distance to centroid) were higher in the datasets with finest taxonomic resolution. These results resembled the findings by Terlizzi et al. (2009) and Heino (2014), who found lower heterogeneity in multivariate dispersion measures at coarser taxonomic resolutions. To some extent, increased heterogeneity could be the observed outcome of patchy distribution of genera or species, as stressed by Heino (2014). For example, the numerous examples of taxonomic turnover across wetland hydroperiod and predator gradients (Wellborn et al., 1996) typically occur among species within genera (e.g., Lestes or Enallagma damselflies (Stoks and McPeek, 2006) and Chaoborus midges (Garcia and Mittelbach, 2008)) or genera within families (e.g., Limnephilidae caddisflies; Wissinger et al., 2006). This may also be the case in our study, since some of the most widespread wetland invertebrate families (e.g. Dytiscidae, Hydrophilidae, Chironomidae) also encompassed an elevated number of taxa.
The other general trend observed in our assessment was the decreasing strength of the correlations (within the same taxonomic resolutions) and the reduced similarity in taxonomic composition ordination diagrams across numerical resolutions. An example of this is evident in the Northern US Prairie Pothole and Western US data sets (Supporting Information 5), where the magnitude of correlations decreased more sharply (regardless of taxonomic resolution in both cases). This effect of data transformation was also obtained by Mueller et al. (2013) and Heino (2008) for assorted taxa in streams, with impacts of numerical resolution being especially pronounced for complex community structure analyses (e.g. ordinations). It may be that changing the numerical resolution masks the effect of the dominant species responsible for driving assemblage patterns in each community (Heino, 2008), a dominance that is particularly common in wetland invertebrate communities (Kratzer and Batzer, 2007; Batzer et al., 2014; Moraes et al., 2014). Heino (2014) also found that community-environment relationships were more influenced by numerical resolution, regardless of taxonomic resolution. Our results thus suggest that presence-absence resolution may not always be optimal to detect community-environment relationships.

Conclusion
Our study provides support for the potential application of the higher-taxon approach for the characterization of basic patterns of invertebrate community structure in New World freshwater wetlands. In particular, community-level patterns detected at coarser taxonomic resolutions (typically family level) were similar to those detected with the finest-practical taxonomic levels (usually genus level) for a range of metrics (e.g., richness, equitability and ordination diagrams). Our results thus suggest that family-level assessments may be a cost-effective alternative for biodiversity studies focusing on invertebrate communities. In contrast, the congruence between community composition data sets based on presence-absence and relative abundance resolutions was lower than that between data sets with different taxonomic resolutions, suggesting that the ability to distinguish ecological patterns within study regions is more sensitive to numerical resolution, which should be carefully appraised in studies on wetland invertebrates. Lastly, our findings do not render void species-level assessments; these are essential to provide value judgments of wetland habitats for potential legal protection.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Fig. 2. Scatterplots of the relationships between richness (left column) and equitability (right column) of invertebrate communities calculated for each taxonomic resolution in each region. Lines were fitted for the purpose of visualization only.
Fig. 3. Side-by-side bar charts with the relative contribution of the turnover (a) and nestedness (b) components of the beta diversity across taxonomic and numerical resolutions in each region. PA = presence-absence; RA = relative abundance.
Fig. 4. Heterogeneity in community composition depicted in side-by-side bar charts with the average distance to centroid (calculated using the PERMDISP approach) at each taxonomic resolution in each region. PA = presence-absence; RA = relative abundance.
Fig. 5. Horizontal stacked bar plots with the number of taxa that could be identified only to the family level (or to a coarser taxonomic resolution) ('Not beyond Family'); the number of families with only one genus ('Family Single Genus'); the number of families with two genera ('Family 2 Genera'); and the number of families with three (or more) genera ('Diverse Families') in each of the six study regions.
Pires et al.
Table 1. Basic information about the wetlands in each region used in this study. "Temporal time frame" indicates collection periods.
Table 2. Results of the Procrustes analysis between the invertebrate composition data sets with different taxonomic and numerical resolutions. PA = presence-absence; RA = relative abundance; 'Family' = family-level taxonomic resolution; 'Finest' = finest practical taxonomic level.