A global perspective on the application of riverine macroinvertebrates as biological indicators in Africa, South-Central America, Mexico and Southern Asia

The aim of this study is to generate a first global overview of pressures and methods used to assess the environmental quality of rivers and streams using macroinvertebrates. In total, 314 peer-reviewed studies published in the period 1997-2018 were reviewed, from developing economies in Africa, South-Central America, Mexico and Southern Asia. To establish a global perspective, the results from the literature review were compared to other compiled datasets, biomonitoring manuals, environmental surveys and literature reviews from Europe, North America and Australasia. The literature review from the developing economies showed that sampling was most commonly done during baseflow, using kick or Surber sampling, with taxonomical identification mostly to genus or family level. Assessments were most often done using metrics (singular and multimetrics; > 70% of the applications) and were based on community attributes related to richness and dominance (58% of studies), sensitivity (40%), diversity by heterogeneity (32%) and functional traits (25%). Within each category, the most used metrics were the richness and dominance of Ephemeroptera, Plecoptera and Trichoptera (EPT), Biological Monitoring Working Party scoring systems (BMWP/ASPT), Shannon-Wiener diversity and feeding traits. Overall, 92% of the reviewed studies reported that the use of macroinvertebrates, at least in some of their responses, was successful in detecting degradation of environmental quality in the investigated rivers. Given the many similarities in applied methods worldwide, we consider that a global assessment of riverine environmental quality is currently feasible using family-level identifications of macroinvertebrate samples. We propose a global common metric (multimetric) comprising three of the most common river assessment metrics, in the reviewed literature and elsewhere, namely the BMWP/ASPT, Shannon-Wiener diversity and richness of EPT.
Recent concerns regarding the global state of nature and consequences for freshwater communities, as reported by the intergovernmental science-policy platform on biodiversity and ecosystem services (IPBES), emphasize the urgent need for such a synthesis.


Bioindication in a global context
The rate of global change witnessed in natural systems during the past half-century is unprecedented in human history (IPBES, 2019). Ecosystems worldwide now suffer from multiple large-scale impacts related to pollution, habitat degradation, climate change and introduction of alien species. The negative consequences for biodiversity and ecosystem services have underpinned an urgent need to take action on a global scale and have led to a set of international abatement targets, including the UN Aichi Biodiversity Target 15 and the EU Biodiversity Strategy for 2030, which aim to restore at least 15% of degraded ecosystems by 2020 (Convention on Biological Diversity, 2010) and to achieve favourable status for at least 30% of species and habitats not currently in that status by 2030 (EU, 2020).
Bioindicators are essential in tracking and quantifying environmental impacts (Carter and Resh, 2001;Niemi and McDonald, 2004), and are instrumental in the management and conservation of freshwaters worldwide by supporting the policy-makers who aim to improve and protect the ecosystems themselves and the goods and services they supply (Friberg et al., 2011). The need for a global assessment system using cost-effective bioindicators that track changes in ecosystem health and biodiversity is evident. For riverine ecosystems this could be achieved in a short-term perspective as there is already a worldwide use of methods that may be comparable in terms of scientific approach and underlying methodology.
The long tradition of using bioindicators to assess environmental quality, spanning more than a century in freshwater science (Cairns and Pratt, 1993;Metcalfe-Smith, 1996), has led to the accumulation of a substantial body of knowledge on biological community responses to human-induced stress, as well as a range of assessment methods that possesses a large degree of commonality (Friberg et al., 2011). The present-day widespread use of biological indicators for monitoring is a prime example of the applied use of ecological knowledge that has contributed to maintain and improve the environmental quality of many riverine ecosystems during recent decades (e.g. Birk et al. 2010), and which has a significant potential also to become instrumental in many developing economies with limited experience of the use of such tools and systems on a national basis.

Riverine macroinvertebrates are the key indicator group
Macroinvertebrates are the most used riverine indicator group in modern freshwater biomonitoring (Birk et al., 2012; Carter et al., 2017; Hellawell, 1986). They meet a fundamental prerequisite by covering a range in sensitivity to a variety of stressors. In addition, they have several practical advantages for bioassessments, including a wide distribution in most rivers, a sedentary behaviour providing good spatial resolution, and relatively long lifecycles. Moreover, they are easy to sample and can be identified to an operative level in a cost-effective way (Bonada et al., 2006; Rosenberg and Resh, 1993).
Although records of biological responses to water pollution date back to ancient Greece (Moog et al., 2018), modern biomonitoring originated in Europe and North America approximately a century ago (e.g. Forbes and Richardson, 1913; Kolkwitz and Marsson, 1909), with the development of indicator systems that focused on detecting inputs of organic waste to rivers (Cairns and Pratt, 1993; Karr and Chu, 1999). It took, however, approximately half a century before such systems were used routinely by river authorities to assess environmental quality (Carter et al., 2017; Hawkes, 1998). The saprobic systems originated in central Europe in the early 20th century and assessed rivers by measures of saprobity, i.e., the dependence of aquatic organisms on decomposing organic substance as sole source of food, having species-specific tolerances for bacteria, algae and fauna (Persoone and De Pauw, 1979; Sládeček, 1969). The saprobic systems were revised and modernized several times in the 1950s-1970s and became widely used in Central and Eastern Europe (Moog et al., 2018). However, these systems did not become popular in the United States (US) and the United Kingdom (UK), mainly because they were considered specific to central Europe, their use was restricted to measuring organic pollution, and the collection and identification of multiple organism groups was difficult and time consuming (Cairns and Pratt, 1993; Persoone and De Pauw, 1979). Instead, rapid biomonitoring approaches using biotic indices for assessment were developed in the UK and the US, exemplified by the Trent biotic index (Woodiwiss, 1964) and Beck biotic index (Beck, 1954). These systems were primarily designed to detect organic pollution at the community level but also reflected other environmental stressors (Paisley et al., 2014).
The development of biotic indices, along with the implementation of The Water Act in the UK in the 1960s, accelerated the use of macroinvertebrates for water quality monitoring as river authorities were now charged with biomonitoring responsibilities (Hawkes, 1998). Taxa-neutral diversity indices, i.e., numeric expressions of structural community composition based on both richness and abundances, were popular in North America in the 1960s-1970s, as they quantified the heterogeneity of full assemblages, had high statistical power, and were not dependent on tolerance values (Karr and Chu, 1999). Although diversity indices bypassed some of the difficulties experienced with the saprobic and biotic indices, they were eventually considered unsuccessful for several reasons, partly because they required rigorous sampling and the observed response to degradation was often poor (Cairns and Pratt, 1993; Metcalfe, 1989).
As the complexity of effluents increased with industrial activities and intensified land-use in the 1960s-1980s, many streams were affected by multiple stressors (Hellawell, 1986), and by the mid-1970s most European countries had changed their focus to biotic indices (Metcalfe, 1989). The plethora of systems that were in use in Europe at that time led to an exercise to calibrate and harmonize the applied methods. Systems derived from this process are still in use today, such as the Biological Monitoring Working Party index (BMWP) and the Global Biological Normalized Index (IBGN; Birk et al., 2010). The development of the BMWP systems in the UK from the mid-1970s (Hawkes, 1998; Paisley et al., 2014), along with the predictive classification models RIVPACS/RICT (Wright et al., 2000), has been instrumental for biomonitoring of rivers in the UK, and these systems have also been much used elsewhere (Birk et al., 2010).
In the US in the 1980s, biotic indices such as the family biotic index (Hilsenhoff, 1988) were introduced for the purpose of rapid biomonitoring, together with structural and functional metric components, as the US Environmental Protection Agency (EPA) also called for efficacious methods to assess environmental quality of surface waters, as mandated by the Clean Water Act of 1972 (Barbour et al., 1999). Multimetrics, in which simple metrics were combined to improve sensitivity, robustness and diagnostic capabilities of assessments, soon came into focus in the US (Cairns and Pratt, 1993; Karr and Chu, 1999). The implementation of the EU Water Framework Directive (European Community, 2000; hereafter EU WFD), at least in part, motivated the use of multimetrics also in Europe, as the directive requires assessments based on multiple community components (Friberg, 2014; Hering et al., 2006). In Australia, by the mid-1990s, a predictive model system called the Australian River Assessment System (AUSRIVAS) was developed and implemented under the National River Health Program, inspired by the systems used in the UK (Chessman, 1995; Simpson and Norris, 2000; Nichols and Dyer, 2013). In New Zealand, standardized methods for macroinvertebrate biomonitoring were introduced in 1999 with assessments based on the macroinvertebrate community index (MCI; Stark et al., 2001).

Assessing the emergent use of riverine macroinvertebrates for status assessments
This review aims to fill a gap by providing a first global overview of indicators and methods used in the assessment of environmental quality in rivers by macroinvertebrates. We will focus on world regions with a recent history of biomonitoring and where a comprehensive synthesis of experiences is currently non-existent, namely the developing economies in Africa, South and Central America (hereafter America-SC), Mexico and Southern Asia. The review considers 1) identification of main pressures on rivers, 2) the application of macroinvertebrate assessments in terms of diagnostic capabilities, assessment types (metrics, multivariate, model systems) and level of taxonomical identification of samples, and 3) pitfalls in using macroinvertebrates as bioindicators in rivers. To establish a global perspective, the results from the literature review were compared to available compiled datasets, biomonitoring manuals, environmental monitoring surveys and literature reviews from Europe, North America and Australasia.
The overall objective of our study is to assess whether the large amounts of existing data, as collected through the many national monitoring and research activities worldwide, can be used to perform a data driven synthesis of environmental quality in rivers as indicated by macroinvertebrate community composition, for instance by using a common index. Recent concerns regarding the global state of nature and consequences for freshwater communities (Habell et al., 2019;He et al., 2019;IPBES, 2019;Reid et al., 2019) emphasize, in our opinion, the urgent need for such a synthesis. This is particularly relevant for nations lacking official biomonitoring programmes as it provides comparable insights into national river status and may indeed provide the impetus and baseline for getting started in the systematic use of biomonitoring. Developing a common metric to assess environmental quality of rivers, and one to which existing metrics could be intercalibrated, would be a powerful tool in a unifying, global assessment of riverine freshwater ecosystems.
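To make the idea of a common metric concrete, the following is a minimal sketch of how three component metrics (here ASPT, Shannon-Wiener diversity and EPT richness, the components named in the abstract) might be combined. The ratio-to-reference scaling, the capping at 1 and the equal weighting are assumptions made for illustration only, not the aggregation method of this review, and all function names and values are hypothetical.

```python
# Illustrative sketch only: scaling and equal weighting are assumptions.

def normalise(value, reference):
    """Scale an observed metric to 0-1 relative to a reference-condition value."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return max(0.0, min(1.0, value / reference))


def common_multimetric(observed, reference):
    """Unweighted mean of the three normalised component metrics."""
    keys = ("aspt", "shannon", "ept_richness")
    return sum(normalise(observed[k], reference[k]) for k in keys) / len(keys)


# Hypothetical site and reference values (not taken from any reviewed study).
observed = {"aspt": 3.0, "shannon": 1.5, "ept_richness": 5}
reference = {"aspt": 6.0, "shannon": 3.0, "ept_richness": 10}
score = common_multimetric(observed, reference)
```

In such a scheme, existing national indices could in principle be intercalibrated by regressing them against the common score across shared sites, but that step is not shown here.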

Reviewing existing literature
We searched literature published in peer-reviewed journals during the period 1997-2018 that used riverine macroinvertebrates in the biomonitoring of freshwaters within developing economies in Africa, America-SC, Mexico and Southern Asia (Appendix A), as this covers a period when freshwater biomonitoring and related research activities accelerated in those areas (e.g. Ramírez and Gutiérrez-Fonseca, 2020; Resh, 2007). Novel methods and approaches following the selected period were discussed in the context of this review. For the acquisition of literature, we searched the online database ISI Web of Science (http://webofknowledge.com; option "All Databases"), and used fixed sets of search criteria: [country/continent], and in addition one of the following criteria: [macroinvertebrate/invertebrate/insect/benthic/benthos], [ecological status], or [river/stream quality]. The literature search was conducted in December 2017 with a supplementary search in April 2018. We restricted the searches to literature published in English as we believe this forms a representative sample of the main pressure types and assessment methods used. We also searched the grey literature, without using systematic search criteria, by exploring publication reference lists and using our network of colleagues. The grey literature also contains numerous studies, reports and other scientific communications from all three continents, published both in English and in native languages (e.g. Damanik-Ambarita et al., 2016; Mekong River Commission, 2018; Thirion, 2007). However, as these sources did not add new insights to our study objective, and it was difficult to avoid biases introduced by language barriers, the grey literature was not included in the final data analysis.
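The fixed search criteria described above can be sketched as a small routine that pairs each region term with each of the three topic groups. The Boolean `AND`/`OR` syntax and quoting are assumptions chosen for illustration; the exact query strings entered into the database are not given in the text, and the region list below is a hypothetical subset.

```python
# Illustrative sketch only: query syntax and region list are assumptions.

REGIONS = ["Africa", "Southern Asia"]  # hypothetical subset of [country/continent]

TERM_GROUPS = [
    ["macroinvertebrate", "invertebrate", "insect", "benthic", "benthos"],
    ["ecological status"],
    ["river quality", "stream quality"],
]


def build_queries(regions, term_groups):
    """Return one query string per (region, topic group) combination."""
    queries = []
    for region in regions:
        for group in term_groups:
            joined = " OR ".join(f'"{term}"' for term in group)
            queries.append(f'"{region}" AND ({joined})')
    return queries


queries = build_queries(REGIONS, TERM_GROUPS)
```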
Since our focus was on running waters, studies targeting estuarine areas, lagoons, lakes or lentic wetlands were not included, and neither were those solely targeting inherent variation in macroinvertebrate community compositions, baseline mapping surveys, as well as technical comparisons of sampling methods. All studies were given equal weight in the data analysis.
Our search resulted in a total of 314 publications (Fig. 1): 78 from Africa, 138 from America-SC including Mexico (North America) and 100 from Asia (Thorne and Williams (1997) covered all three continents). The frequency of studies was unevenly distributed across the study area, with China dominating in Southern Asia, Brazil and Argentina in America-SC, and South Africa in Africa. A pronounced increase in the number of publications was found during our selected time period. In the last ten years (2008-2018) there were approximately four times more publications (255) compared to the first eleven years (1997-2007; 59 publications). Although this was often not stated by the authors, the methods used revealed that most rivers, or at least the sampling sites, were accessible by wading (~97%). The review data table is given in Supplementary material 1, information on data treatment in Supplementary material 2, and the reviewed literature references are listed in Supplementary material 3.
Pressures on rivers and streams show many commonalities across continents (IPBES, 2019), often related to the combined effects of land use changes (deforestation, agriculture, human settlements and urban development) with associated perturbations by nutrients, sediments, xenochemicals and hydromorphological alterations (Malmqvist and Rundle, 2002; Reid et al., 2019; Vorosmarty et al., 2010). We believe that a global assessment of environmental conditions in rivers using a macroinvertebrate common metric could be feasible, as their community attributes have shown high diagnostic capabilities for degradation caused by such stressors, cf. the river assessment intercalibration process in Europe. To enable a global comparison, we started out by getting an overview of the dominating pressure types on rivers in Europe, North America and Australasia. For Europe we used data from the European Environmental Agency and the 2nd River Basin Management Plan (EEA, 2018a; 2018b); for North America the US EPA National Aquatic Resources Surveys (U.S. EPA, 2016; U.S. EPA, 2020), the US Geological Survey National Water Quality Program (Falcone et al., 2018), and Environment and Climate Change Canada (2020); for Australia the State of the Environment (Argent, 2016), Bond et al. (2008) and Haase and Nolte (2008); for New Zealand Clapcott et al. (2012) and LAWA (2020). For data on sampling methods and assessment types, we used the WISE methods database (Birk et al., 2010) and EN-16150:2012 for Europe; Buss et al. (2015), Carter et al. (2017) and Peck et al. (2016) for North America; van Looij (2009) and Smith et al. (1999) for Australia; Stark et al. (2001) and Clapcott et al. (2012) for New Zealand.
Pressures were categorized into six types, where a single study could be ascribed more than one type: 1) deforestation/erosion; 2) agriculture/nutrients; 3) organic pollution (incl. wastewater runoff from settlements, livestock and sewage); 4) urban development/infrastructure (incl. general degradation, human water use, garbage waste disposal and light industry, e.g. saw mills); 5) chemical/metals/spills (incl. mining tailings drainage and heavy industry leading to metal pollution, oil and other chemical spills); 6) hydromorphological (incl. water abstraction and damming, physical habitat modifications such as canalization, weirs, bank protection and sediment extraction). Alien species was also considered as a category, but records were too few to have any impact on the data (three studies). Category 3) also includes high levels of nutrient pollution from unknown sources, elevated levels of biological and chemical oxygen demand, faecal bacteria and animal sacrifices (blood spills). Aquaculture was also grouped within this category as there were only a few such cases and they often referred to organic pollution effects reported downstream of such installations.
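Because a single study can be ascribed more than one pressure type, the tallies are naturally multi-label. The sketch below shows one way such records could be summarised (share of studies with multiple pressures, mean number of pressure types, and per-type frequency). The four study records are invented for illustration and are not taken from the review data table.

```python
from collections import Counter

# The six pressure categories defined above.
PRESSURES = (
    "deforestation/erosion",
    "agriculture/nutrients",
    "organic pollution",
    "urban development/infrastructure",
    "chemical/metals/spills",
    "hydromorphological",
)


def summarise(studies):
    """Summarise multi-label pressure records; each study is a set of types."""
    n = len(studies)
    counts = Counter(p for study in studies for p in study)
    return {
        "share_multiple": sum(1 for s in studies if len(s) > 1) / n,
        "mean_types": sum(len(s) for s in studies) / n,
        "frequency": {p: counts[p] / n for p in PRESSURES},
    }


# Hypothetical study records (illustration only).
studies = [
    {"agriculture/nutrients", "organic pollution"},
    {"hydromorphological"},
    {"deforestation/erosion", "agriculture/nutrients",
     "urban development/infrastructure"},
    {"organic pollution"},
]
summary = summarise(studies)
```

Note that the per-type frequencies can sum to more than 1, since each study may contribute to several categories.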
Macroinvertebrate assessment types were categorized into four main types: singular metrics, multimetrics, multivariate (including predictive models) and models. To select candidate metrics for a global assessment system, we divided community responses into four response groups representing different ecological aspects: sensitivity, comprising metrics that use specific tolerance scores of the fauna to one or more stressors; richness/dominance, those that reflect the number of taxa or composition (e.g. % dominance of selected groups); diversity/entropy (hereafter diversity), those that combine richness and dominance to express heterogeneity (e.g. Shannon-Wiener diversity); and traits, those that address ecological function (e.g. feeding types).
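As an illustration of three of the response groups, the sketch below computes EPT family richness (richness/dominance), Shannon-Wiener diversity (diversity) and a BMWP-style score with its average score per taxon, ASPT (sensitivity), from a family-level sample. The sample counts are invented, and the tolerance scores in `PLACEHOLDER_SCORES` are placeholders for illustration; a real application would use the published score table of the relevant regional scoring system.

```python
import math

EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}

# Placeholder tolerance scores (illustration only, not an official table).
PLACEHOLDER_SCORES = {
    "Heptageniidae": 10, "Hydropsychidae": 5, "Baetidae": 4, "Chironomidae": 2,
}

# Hypothetical family-level sample: (family, order, individuals counted).
SAMPLE = [
    ("Heptageniidae", "Ephemeroptera", 10),
    ("Baetidae", "Ephemeroptera", 30),
    ("Hydropsychidae", "Trichoptera", 20),
    ("Chironomidae", "Diptera", 40),
]


def ept_richness(sample):
    """Number of EPT families present in the sample."""
    return sum(1 for _, order, n in sample if order in EPT_ORDERS and n > 0)


def shannon_wiener(sample):
    """H' = -sum(p_i * ln p_i) over families with non-zero counts."""
    counts = [n for _, _, n in sample if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts)


def bmwp_aspt(sample, scores):
    """Sum of family scores (presence only) and the mean score per scoring family."""
    present = [scores[f] for f, _, n in sample if n > 0 and f in scores]
    bmwp = sum(present)
    aspt = bmwp / len(present) if present else 0.0
    return bmwp, aspt


ept = ept_richness(SAMPLE)
h = shannon_wiener(SAMPLE)
bmwp, aspt = bmwp_aspt(SAMPLE, PLACEHOLDER_SCORES)
```

The BMWP-style score uses presence only, whereas the diversity index uses relative abundances, which is one reason the two can respond differently to the same degradation gradient.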

Pressures and application of methods
Overall, 92% of the reviewed studies reported that macroinvertebrates, at least by some of their responses, were successful in detecting perturbations in the investigated rivers (88, 91 and 96% success rate in Africa, America-SC and Asia, respectively). In the other cases, and for various reasons, the outcome was either unclear or they failed to find a provable connection with a putative degradation gradient. The number of study sites, elevation range, and stream size  Fig. 1), and there was hence a high potential for the application of macroinvertebrates for biomonitoring across a wide range of elevations and river sizes in those regions. Even so, the authors also reported several pitfalls in their application, covering the entire work chain from sampling and identification to assessment (Table 1).

Multiple anthropogenic pressures were the norm
Most of the studies reviewed were conducted in areas subject to deforestation/erosion, agriculture/nutrients, organic pollution, urban development/infrastructure and hydromorphological alterations, and the distribution of pressures was very similar across continents (Fig. 2a). Approximately 73% of studies reported the occurrence of more than one of the selected pressure types in their study gradient (71%, 75% and 73% in Africa, America-SC including Mexico and Asia, respectively; on average 2.6 pressure types). Although most studies were able to find correlations between stream degradation and macroinvertebrate assemblages, it was frequently reported that relevant environmental variables, according to the authors, were not measured, such as hydromorphology and habitat degradation, land use, and also water chemistry parameters (Bere et al., 2016; de Jesus-Crespo and Ramirez, 2011; Forio et al., 2017). Despite a high frequency of agricultural influences, xenochemicals like pesticides were rarely measured, although the authors discussed their likely presence in the environment. In cases where effects of such substances were targeted, impacts were found (Di Marzio et al., 2010; Hunt et al., 2017; Rasmussen et al., 2016).
Multiple anthropogenic pressures are also common in the populated and cultivated areas of the terrestrial realm, and rivers and river systems in North America, Australasia and Europe are impacted, to various degrees, by habitat degradation and river flow modifications, climate change, sewage pollution, agricultural run-off and invasive species (e.g. IPBES, 2019; Mazeika et al., 2019). Data from the EEA 2nd management plan in Europe show that multiple pressures were acting in about 60% of impacted rivers, with 68% being affected by hydromorphology, 25% by point source pollution (mainly urban wastewater and storm overflows), and 53% by diffuse sources. In the US, based on the national stream and river assessments in the periods 2008-2009 and 2013-2014 (U.S. EPA, 2016 and U.S. EPA, 2020, respectively), rivers were commonly impacted (categorized as disturbed or moderately disturbed) by nitrogen pollution (62-68% of stream and river lengths), phosphorus (65-82%), sedimentation (45-44%), excessive enterococci bacteria levels (23-30%), riparian disturbance (66-71%) and in-stream habitat degradation (31-34%). In Canada, land use changes through forestry, agriculture, mining and urbanization, acting alone or in combination, were common (Environment and Climate Change Canada, 2020). For Australia, common pressures were excessive water use, climate change, eutrophication and impacts from farming practices (incl. sedimentation and salinity), as well as invasive species (Argent, 2016; Bond et al., 2008; Haase and Nolte, 2008), and for New Zealand, forestry, urbanization, eutrophication and organic pollution were most prominent (Clapcott et al., 2012; LAWA, 2020). These findings demonstrate the many similarities in pressures on rivers worldwide, and that a global common metric should be responsive to a variety of such co-existing pressure types.

Sampling during base flow is most common
Based on the reviewed literature from Africa, America-SC including Mexico and Southern Asia, approximately 37% of the studies reported specific sampling seasons for macroinvertebrates (dry, wet or intermediate), of which an average of 56% applied more than one sampling season (71, 46 and 70 per cent in Africa, America-SC and Asia, respectively; Fig. 2b). The most common combination was to sample both in the dry (low flow) and wet (high flow) seasons. If there was only one sampling campaign, the dry season was preferred over the wet season. Several studies report high variation in macroinvertebrate composition between these seasons (Ferreira et al., 2009; Fierro et al., 2015; Kim et al., 2013), with abundance and richness metrics generally having lower scores during the wet season (Buss et al., 2004; Imoobe and Ohiozebau, 2010; Mesa, 2010), possibly as a result of destabilized substrates, dislodgement and drift of macroinvertebrates (Gebrehiwot et al., 2017), and insect emergence prior to spates (Astudillo et al., 2016). Sampling was also considered difficult or unsafe during periods of high precipitation, leaving some sites unattainable (Dedieu et al., 2016; Nhiwatiwa et al., 2017). The authors found that impacts of pollutants may be more critical during low flow conditions because this is the time with the highest temperatures and the most concentrated pollution, whereas spates may obscure the results by diluting pollutants and simultaneously mask population effects through increased macroinvertebrate recolonization (Jacobsen, 1998; M'Erimba et al., 2014; Zhang et al., 2012). However, spates may also represent cases of chemical flushing linked to surface runoff and agrochemicals, thereby increasing the presence of chemical substances in river water (Neumann and Dudgeon, 2002).
Table 1. Pitfalls for the application of riverine macroinvertebrates for assessing environmental condition as reported by the authors of the reviewed literature (some reported more than one pitfall).
During low flow conditions, it was argued, river sites may be dominated by local sources of pollution, whereas in periods of high precipitation sites will also receive pollutants from distant parts of the catchment (Itayama et al., 2015; Jerves-Cobo et al., 2020). Despite the observed seasonal variation, many of the applied assessment systems proved successful across seasons (Baptista et al., 2007; Chen et al., 2014), although not always (e.g. Helson and Williams, 2013), and some mention this as a future topic of study (Kaaya et al., 2015). In Europe, North America and Australasia, it is not recommended to sample riverine macroinvertebrates during and following spates, for safety reasons and to ensure the quality of the data. Sampling is primarily recommended during baseflow (Buss et al., 2015), when the natural disturbances to stream assemblages are low and the effects of pollutants most representative. Several sampling seasons may be used (spring, summer and autumn), but care should be taken to compare like with like (Barbour et al., 1999; EN-16150:2012; van Looij, 2009).

Kick and Surber sampling are the most common sampling methods
Kick nets (51%) and Surber samplers (40%) were the most common sampling devices for macroinvertebrates (Fig. 2c; Appendix B). Other sampling devices, such as colonization units, and grab, drift and core samplers, were only infrequently used. In Africa, the kick net was by far the most common sampling device (76%), whereas in Asia and America-SC including Mexico, the Surber sampler was more frequently used. Additional collection by handpicking was often applied in Africa, and less frequently in America-SC and Asia. Where mentioned, the mesh size was on average 450 µm (range 60-2000 µm), and generally finer for Surber than kick nets (Supplementary material 2, Fig. 2). African studies applied the coarsest mesh sizes (average 598 µm), Southern Asia intermediate sizes (average 451 µm), and America-SC the finest sizes (average 378 µm). Regrettably, the choice of mesh size was most often not justified, but there is generally a trade-off between the portion of organisms collected and the amount of unwanted sediment and allochthonous material retained. For Africa, the frequent application of coarse mesh sizes is probably a consequence of the South African scoring systems (SASS), which use kick sampling over a relatively large sampling area (three different habitats if available) and live sorting of samples in the field.
For the purpose of routine monitoring in wadable rivers, the handnet is most frequently used in North America (Carter et al., 2017), Australia (van Looij, 2009) and Europe (Birk et al., 2010), with a dominant mesh size of ~500 µm. Kick sampling is often considered more suitable for rapid biomonitoring purposes as it is substantially more cost-effective than Surber sampling without losing critical information (Storey et al., 1991; Tubic et al., 2017). Although applied sampling methods vary worldwide with respect to area and sub-habitats monitored and sampling devices (Birk et al., 2012; Buss et al., 2015), this may be of minor importance when considering biological metrics, because the methods applied are, after all, based on similar principles (Friberg et al., 2006).

Identification level of macroinvertebrates varies
In the reviewed literature, identification of macroinvertebrates was based entirely on morphological characters, such as structure, shape, chaetotaxy and coloration of body parts. Interestingly, no studies reported the use of molecular methods to support or facilitate identifications. The chosen identification level was rarely justified; where it was, a common argument was that it would suffice for calculating the target metrics. The average identification level varied across continents (Fig. 3) but was also different for various groups of macroinvertebrates (Supplementary material 2, Table 3). The most common identification levels were genus and family, and a combination of those levels was often used within the same study to target different groups of macroinvertebrates. In the tropical regions of Asia and America-SC, identification levels were closer to family than genus compared to temperate regions, where genus levels were more common, e.g. in Argentina, Chile, China and Korea. For Africa, the common application of the SASS systems and frequent field sorting of samples may explain the relatively coarse identification levels. Overall, the groups Ephemeroptera, Plecoptera, Trichoptera, Crustacea and Mollusca were identified to the lowest level (near genus), Coleoptera, Hemiptera, Odonata and Diptera to intermediate levels (genus/family), and Oligochaeta and Polychaeta to the highest level (near family). These findings undoubtedly relate to taxonomic issues, although such ranking is frequently considered adequate for their application in biomonitoring systems.
Identification to lower taxonomical levels (e.g. species) was often not achieved although this was recommended in some studies (Martínez-Sanz et al., 2014; Moya et al., 2011). The reasons for choosing higher identification levels were inadequate taxonomical knowledge and keys for some groups (Boonsoong and Braasch, 2013; Buss and Vitorino, 2010), limited time or funding for conducting analysis (Buss and Vitorino, 2010), and lack of trained personnel and facilities (Suriano et al., 2011). As effects of degradation were generally detectable at higher taxonomical levels, such as family, there was little incentive to do more laborious analyses to obtain additional information (Suriano et al., 2011), although such data may be required for fine-tuned assessments and ecological research questions (Buss and Salles, 2007; Marshall et al., 2006), and for improving the existing biomonitoring systems (Abbaspour et al., 2017; Baptista et al., 2013; Dalu et al., 2017). In Latin America, developing taxonomy and systematics is currently considered a major research need (Ramírez and Gutiérrez-Fonseca, 2020).
Taxonomical identification levels also vary in biomonitoring programmes applied in Europe, North America and Australasia, ranging from family to species (Birk et al., 2010; Buss et al., 2015), and similar cost-benefit debates regarding taxonomical identification levels have also taken place there (e.g., Marshall et al., 2006; Whittier and Van Sickle, 2010). Despite a much better taxonomical knowledge in general in these areas, species-level identifications are still difficult for some groups and may only be possible for certain instars or sexes (Chessman, 1995). Because rapid biomonitoring is designed to be low cost, while identification work is generally time consuming (Rosenberg and Resh, 1993), species-level identification is not a realistic goal in most monitoring programmes (Buss and Vitorino, 2010). However, we may be close to a paradigm shift, changing from morphological to molecular identifications for the purpose of routine monitoring (e.g. Buchner et al., 2019; Pawlowski et al., 2018), at least in the developed economies with such resources at their disposal. Molecular methods offer the possibility to monitor aquatic communities by extracting DNA from water samples (environmental DNA) and from macroinvertebrate samples collected using traditional sampling devices (e.g. kick sampling). The retrieved sequences (e.g. from immature larvae) can then be compared to already established sequences from reference libraries, based on DNA extraction from stages that can be identified by morphology, to study their genetic distance to those reference species. A shift to molecular methods may eventually offer the possibility to study all organism groups and life stages in the samples and enable the use of multiple biological indicators in assessments.
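The comparison of a retrieved sequence against a reference library can be illustrated with the simplest genetic distance, the uncorrected p-distance (proportion of differing sites). This is a minimal sketch assuming pre-aligned sequences of equal length; real workflows typically rely on alignment tools and curated barcode libraries. The taxon labels and sequences below are invented.

```python
# Illustrative sketch only: assumes pre-aligned, equal-length sequences.

def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions where both have A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def closest_reference(query, library):
    """Return the library entry with the smallest p-distance to the query."""
    name = min(library, key=lambda key: p_distance(query, library[key]))
    return name, p_distance(query, library[name])


# Hypothetical reference library and query (made-up sequences).
library = {"taxon_A": "ACGTACGTAC", "taxon_B": "ACGTTCGAAC"}
match, distance = closest_reference("ACGTACGTAA", library)
```

Whether a given distance is small enough to assign the query to a reference species depends on a threshold that varies among groups and markers, so none is hard-coded here.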

Spatial and temporal variation may affect status assessments
Macroinvertebrate assemblage composition was frequently reported to vary spatially and temporally, depending on variables like ecoregion, river typology, habitat type and time of year, with consequences for the assessments (Bae et al., 2011; Buss et al., 2004; Huang et al., 2015; Lorion and Kennedy, 2009; Pan et al., 2015). Many assessments in lowland rivers have also been challenged by the lack of non-degraded reference sites (Baptista et al., 2013; Hart et al., 2001; Mangadze et al., 2019), and in some cases, lowland assemblages had to be compared with those found at higher elevations although the fauna differed considerably (Jacobsen, 2003; Soldner et al., 2004). For example, undisturbed river faunas at high elevations in the tropics may score differently in sensitivity indices compared to nearby lowland rivers due to constraining environmental conditions such as oxygen limitation, large diel temperature variations and seasonal flow regime (Jacobsen, 2003; Jacobsen and Marin, 2008). Alternatively, one may make use of "least-disturbed sites", which is also not ideal as it implies at least some degradation (Hughes et al., 1986; Stoddard et al., 2006; Liu et al., 2017). Long-term data, from both reference and degraded sites, were generally lacking (Siqueira et al., 2015; Thanee and Phalaraksh, 2012; Zhang et al., 2014), although such data would give valuable insights into the temporal stability of reference assemblages, which may be crucial for the assessments.
Spatial variation in macroinvertebrate assemblage compositions is the norm in riverine landscapes (Cairns and Pratt, 1993; Hynes, 1970) and this needs to be disentangled from human-induced stress to avoid misclassification in assessments. In the EU WFD (Annex II), as in the UK (Wright et al., 2000), US (Hawkins et al., 2000; U.S. EPA, 2016), Canada (Rosenberg et al., 2000) and Australia (Simpson and Norris, 2000), reference communities are predicted on the basis of abiotic variables, such as climatic region, catchment geology and river size, using either categorical or continuous river typologies. If no appropriate reference typology exists, assessments can be made based on the distance to the closest available type by adding uncertainty to the classification. Missing reference site conditions may also be established based on expert judgements or historical data, which is most common for categorical typologies (Friberg et al., 2011).

Assessment types
Singular metrics and multimetrics were the most used in the assessments, comprising > 70% of the applications, with singular metrics being the most frequent (63%; Fig. 2d). Multivariate methods had equal popularity to multimetrics in America-SC and Southern Asia, whereas in Africa multivariate methods were more frequent than multimetrics. In Southern Asia, species abundance distribution models (SAD; i.e. vectors that integrate the abundances of all species encountered in a sample; McGill et al., 2007) were infrequently used (Kim et al., 2016a). Predictive models were applied to predict either O/E ratios of taxa (Sudaryanti et al., 2001) or metric values (Ambelu et al., 2010; Moya et al., 2011) on all three continents, although at low frequencies (1.3%). In South Africa, a predictive model system (MIRAI) has been proposed for specialists instead of the SASS-5 for the purpose of biomonitoring (Thirion, 2007).
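As an illustration of the O/E (observed/expected) taxa ratio mentioned above, the following minimal Python sketch follows the general RIVPACS-style convention: the expected richness E is the sum of model-predicted capture probabilities for taxa above a probability threshold, and O is the number of those taxa actually found. The function name, the 0.5 threshold and the probabilities are assumptions made for illustration only; they are not taken from the reviewed studies.

```python
def oe_ratio(predicted, observed, threshold=0.5):
    """Observed/expected taxa ratio for one site.

    predicted: {taxon: probability of capture at a reference-quality site}
               (hypothetical values; normally the output of a predictive model)
    observed:  set of taxa actually found in the sample
    threshold: only taxa predicted with at least this probability are counted
               (0.5 is an assumed, commonly used cut-off)
    """
    expected_taxa = {t: p for t, p in predicted.items() if p >= threshold}
    expected = sum(expected_taxa.values())
    if expected == 0:
        raise ValueError("no taxa predicted above the threshold")
    # Count only observed taxa that were also expected above the threshold
    observed_count = sum(1 for t in expected_taxa if t in observed)
    return observed_count / expected


# Hypothetical example: Elmidae is found but ignored (predicted below 0.5)
predicted = {"Baetidae": 0.9, "Perlidae": 0.6, "Chironomidae": 1.0, "Elmidae": 0.3}
observed = {"Baetidae", "Chironomidae", "Elmidae"}
ratio = oe_ratio(predicted, observed)
```

Values near 1 indicate that the site holds roughly the fauna expected under reference conditions, whereas values well below 1 indicate missing taxa.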
Some singular metrics, like the widely used BMWP/ASPT-type systems, robustly measured impacts across different river typologies, habitats and ecoregions and were therefore preferred by many authors. However, on some occasions, such metrics were also considered too simplistic and failed to detect some perturbations (Bere and Nyamupingidza, 2014; Xu et al., 2014). For this reason, many authors made use of multimetrics to increase the reliability of assessments by adding more community components. Yet, some studies experienced problems using those too, relating to complex calculations and, in particular, to requirements for specific reference values or metrics for different regions (Dedieu et al., 2016; Huang et al., 2015; Thorne and Williams, 1997; Zagarola et al., 2017). A combined approach using multimetrics and predictive models to establish reference states was proposed to overcome this issue (Moya et al., 2011; Silva et al., 2017), but the threshold for using predictive models can be high (Blakely et al., 2014), relating to lack of sufficiently large datasets, environmental data, and higher technical demands to operate and interpret the model systems. Hence, although the use of a combined approach looks promising, the use of predictive modelling requires more effort in terms of study design and modelling expertise. Several studies introduced assessment systems suited to evaluate their degradation gradient, typically a multimetric, but reported that further testing and validation elsewhere were needed (Helson and Williams, 2013; Jun et al., 2012; Raburu et al., 2009; Weigel et al., 2002). In other words, it was often uncertain to what extent the proposed assessment system was successful beyond the limits of that study.
In recent years, all US states have used riverine macroinvertebrates for status assessments, applying a wide variety of singular and multimetrics focusing on traits and organic pollution sensitivity metrics (Carter et al., 2017). One major national biomonitoring programme operates in the US, the National Aquatic Resources Survey (NARS), funded through the US Environmental Protection Agency (USEPA), using a combination of multimetrics and predictive O/E taxa ratios in assessments (Buss et al., 2015). The Canadian Aquatic Biomonitoring Network (CABIN) applies predictive models to set reference conditions and assesses environmental quality using multivariate and multimetric methods. In the EU WFD, individual member states use various methods that have been previously intercalibrated with the other nations (intercalibration groups). Several specific metric systems have also been developed to address other pressures than organic pollution, such as acidification (Davy-Bowker et al., 2005; Sandin et al., 2004), pesticides, hydromorphology (Lorenz et al., 2004), flow changes (Extence et al., 1999) and sedimentation (Extence et al., 2013). In Australia, little development has occurred since the introduction of AUSRIVAS and SIGNAL/SIGNAL2 (Chessman, 2003; Nichols and Dyer, 2013; Simpson and Norris, 2000), although a system targeting eutrophication using species level indicators (Haase and Nolte, 2008) and a trait database have been developed (Kefford et al., 2020), with aims to facilitate future biomonitoring. Predictive models have been tested for macroinvertebrate biomonitoring in New Zealand but have not been adopted nationally (Stark and Maxted, 2007).

Richness and dominance metrics were focused on EPT
On all three continents, assessment metrics were based on community attributes reflected by changes in richness and dominance, sensitivity, diversity (by heterogeneity) and traits. The representative metric types were similar across the regions with richness and dominance being most used (58%), followed by sensitivity/biotic (40%), diversity (32%) and traits (25%). Several studies made use of more than one metric group for assessments (e.g. for multimetrics). In metrics covering richness and dominance, total taxon richness (incl. family richness) was used for assessments in 39% of the studies. For specific macroinvertebrate groups, taxa within the orders Ephemeroptera (41%), Plecoptera (37%) and Trichoptera (39%), hereafter EPT, were most used, showing lower richness and dominance in response to degradation. A similar pattern was found in studies up until 2016 by Ruaro et al. (2020), who showed that total richness and EPT richness were the most used components in macroinvertebrate multimetrics globally. Despite being scarce in some regions, plecopterans are often included in the EPT metric calculations (Arnon et al., 2015; Dudgeon, 1999). Although some EPT taxa are obviously more tolerant than others to some stressors (Baptista et al., 2007; Dos Santos et al., 2011), EPT assemblages were, as a whole, sensitive to all types of degradation of their habitats. Degradation also led to a decrease in richness and dominance of Odonata (used in 5% of assessments) and Coleoptera (8% of assessments; e.g. Buss and Vitorino, 2010; Dedieu et al., 2016; Huang et al., 2015; Perera et al., 2012), whereas Diptera and Oligochaeta (8%) typically increased in dominance but decreased in richness (e.g. Boonsoong et al., 2009; Ferreira et al., 2011).
In areas with organic pollution, there was typically a noticeable increase in the dominance of Chironomidae (14%), often of "red forms" (5%), such as Chironomus (Rosa et al., 2014), and of the oligochaete family Tubificidae (Shi et al., 2017), which are well adapted to the anaerobic conditions following inputs of easily degradable organic matter (Hellawell, 1986; Hynes, 1960).
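The EPT richness and dominance metrics discussed above can be computed directly from a family-level abundance table. The Python sketch below is a minimal illustration; the family-to-order lookup is a small hypothetical subset, and the function name and sample values are assumptions rather than part of any reviewed protocol.

```python
# Hypothetical family-to-order lookup; a real table would cover the regional fauna
FAMILY_TO_ORDER = {
    "Baetidae": "Ephemeroptera",
    "Leptophlebiidae": "Ephemeroptera",
    "Perlidae": "Plecoptera",
    "Hydropsychidae": "Trichoptera",
    "Chironomidae": "Diptera",
    "Tubificidae": "Oligochaeta",
}
EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}


def ept_metrics(counts):
    """Return (EPT family richness, % EPT individuals) for {family: abundance}."""
    present = {fam: n for fam, n in counts.items() if n > 0}
    ept = {fam: n for fam, n in present.items()
           if FAMILY_TO_ORDER.get(fam) in EPT_ORDERS}
    total = sum(present.values())
    pct_ept = 100.0 * sum(ept.values()) / total if total else 0.0
    return len(ept), pct_ept


# Hypothetical sample dominated by Chironomidae
sample = {"Baetidae": 30, "Perlidae": 5, "Hydropsychidae": 15, "Chironomidae": 50}
ept_richness, pct_ept = ept_metrics(sample)
```

Richness only needs presence/absence data, whereas % EPT (dominance) requires abundance estimates.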
Although there is a multitude of metrics in use for assessing environmental quality of rivers in the US, total richness, EPT richness and % EPT are among the most used in biomonitoring programmes (Stoddard et al., 2006; Carter et al., 2017). EPT are also frequently used as parts of multimetrics in Europe (Birk et al., 2010), such as in Sweden (DJ-index), in Estonia (Estimation of freshwater quality using macroinvertebrates) and in Italy (MacrOper, based on STAR ICM index calculation). These groups have also proven successful for biomonitoring in Australia and New Zealand (Clapcott et al., 2012). Several EPT taxa are also indicators of low impact in most sensitivity indices (Chessman, 2003; Hilsenhoff, 1988; Stark and Maxted, 2007), which shows that EPT is an essential group for river biomonitoring worldwide. Therefore, not only is there a remarkable similarity in the composition of macroinvertebrate stream fauna worldwide (Hynes, 1970), but their responses to degradation are also strikingly similar.

BMWP systems were the most used sensitivity metrics
BMWP/ASPT was the most used sensitivity metric (31%), including the many modifications to this system, such as SIGNAL (Chessman, 2003) and SASS. The second most popular was the family biotic index (Hilsenhoff, 1988), including modifications, used by ~8% of the studies, followed by the Indice Biotico Esteso (IBE), including modifications such as the Belgian biotic index and the biotic index for Pampean rivers and streams (Depauw and Vanhooren, 1983; Ghetti, 1986), used by ~3% of studies. A variety of other biotic indices were employed at lower frequency (~3%). The saprobic systems were overall little used (~1%), except in South Korea, with some application of the Korean saprobic index (Bae et al., 2011; Kim et al., 2016b).
Although the BMWP score tables developed for the UK proved successful in several cases, local modifications were often preferred. The present literature review indicates that the BMWP system can detect organic pollution, for which it was originally developed, but also a variety of other degradation types, e.g. relating to deforestation/siltation, agricultural land use or mining. The fact that this system in most cases requires only family level identification and is applied purely qualitatively (i.e. presence/absence) may explain its popularity worldwide. Several European countries, Australia and New Zealand use BMWP/ASPT type systems as part of their national biomonitoring assessments (Birk et al., 2010; Chessman, 2003; Stark and Maxted, 2007), whereas the family biotic index is most commonly used in the US (Carter et al., 2017). The latter requires abundance data for calculating index scores whereas many BMWP variants do not, although see Paisley et al. (2014) and Stark and Maxted (2007) for versions that also apply abundance classes. Although it is restricted to measuring organic pollution, the saprobic system is still actively used for assessment in some parts of Europe, modified to suit assessments in the EU WFD (Moog et al., 2018; Rolauffs et al., 2004).
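The presence/absence logic of BMWP-type systems can be sketched in a few lines of Python: the BMWP score is the sum of the tolerance scores of the scoring families present, and the ASPT is that sum divided by the number of scoring families. The score table below is a small illustrative subset with assumed values and should not be read as an authoritative table; in practice a published, preferably locally adapted, score table would be used.

```python
# Illustrative family scores only (assumed values, hypothetical subset);
# higher scores denote more pollution-sensitive families
SCORES = {
    "Heptageniidae": 10,
    "Perlidae": 10,
    "Hydropsychidae": 5,
    "Baetidae": 4,
    "Chironomidae": 2,
    "Oligochaeta": 1,
}


def bmwp_aspt(families):
    """Return (BMWP, ASPT) from the families present in a sample.

    Each scoring family counts once regardless of abundance; families
    missing from the score table are ignored.
    """
    scoring = {fam for fam in families if fam in SCORES}
    bmwp = sum(SCORES[fam] for fam in scoring)
    aspt = bmwp / len(scoring) if scoring else 0.0
    return bmwp, aspt


# Hypothetical sample; "Unknownidae" has no score and is skipped
bmwp, aspt = bmwp_aspt(["Perlidae", "Baetidae", "Chironomidae", "Unknownidae"])
```

Because the ASPT is an average, it is less dependent on sampling effort and total richness than the summed BMWP score.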

Shannon-Wiener was the most used diversity metric
Shannon-Wiener was the most used diversity metric (85%), followed by evenness (31%; often Pielou evenness, but this was sometimes not stated), Margalef (27%) and Simpson (21%). Hence, our results further support that Shannon-Wiener is the most used diversity index for river biomonitoring using macroinvertebrates worldwide (Carter et al., 2017; Metcalfe, 1989; Resh and McElravy, 1993). Shannon-Wiener diversity combines the number of taxa with their relative abundances, without applying weights to rare or dominant taxa (as the Simpson index does), whereas the Margalef index measures richness in relation to the total abundance. Although the use of diversity indices for freshwater biomonitoring has been much debated and somewhat disregarded (Cairns and Pratt, 1993; Metcalfe, 1989), this review shows that they are still frequently used to assess environmental quality of rivers in response to various degradation types.
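For reference, the four diversity measures named above can be computed from a taxon abundance table as in the following minimal Python sketch. It uses the standard textbook formulas with natural logarithms (Shannon-Wiener H' = -sum of p_i ln p_i; Pielou J = H'/ln S; Margalef d = (S - 1)/ln N; Simpson reported here as 1 - sum of p_i^2). The choice of logarithm base and of the Simpson variant are assumptions, as the reviewed studies did not always state which form they used.

```python
import math


def diversity_indices(counts):
    """Diversity indices for one sample given {taxon: abundance}."""
    abundances = [n for n in counts.values() if n > 0]
    total = sum(abundances)      # N, total number of individuals
    richness = len(abundances)   # S, number of taxa present
    if total == 0:
        return {"shannon": 0.0, "pielou": 0.0, "margalef": 0.0, "simpson": 0.0}
    props = [n / total for n in abundances]
    shannon = -sum(p * math.log(p) for p in props)
    # Evenness is undefined for a single taxon; report 0 in that case
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    margalef = (richness - 1) / math.log(total) if total > 1 else 0.0
    simpson = 1.0 - sum(p * p for p in props)
    return {"shannon": shannon, "pielou": pielou,
            "margalef": margalef, "simpson": simpson}


# Hypothetical, perfectly even sample of four families
indices = diversity_indices(
    {"Baetidae": 25, "Perlidae": 25, "Hydropsychidae": 25, "Chironomidae": 25}
)
```

For this perfectly even sample the Shannon-Wiener value equals ln 4 and Pielou evenness equals 1; dominance by a few taxa lowers both.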

Traits often gave unclear results
Feeding traits were the most common trait metrics (85%), followed more infrequently by mobility/habitat (16%), and least by trophic guilds, refuge, respiration, external protection and body size (each < 2% of assessments). Such attributes were often successful in detecting river degradation, i.e., the authors found ecologically meaningful changes in trait composition that they could relate to degradation. However, sometimes these attributes also behaved in unpredictable ways (Lorion and Kennedy, 2009; Marquez et al., 2015; Miserendino and Pizzolon, 2003).
A general expectation of the authors was for shredders and/or grazer/scrapers/predators to be replaced by gatherer/collectors in response to deforestation, intensified land-use, inputs of nutrients, and organic effluents (Ding et al., 2017; Mesa, 2014; Miserendino and Pizzolon, 2000). A common problem, especially in tropical regions, is that shredders may still be scarce even if leaf litter is available throughout the year (Li and Dudgeon, 2009; Mesa, 2014), and their abundance is not always correlated with the amounts of leaf material (Lorion and Kennedy, 2009). Reasons for this may be that the roughness, high lignin/tannin content and low nutritional quality of some tropical riparian plants render them unpalatable and unattractive to shredders (Ferreira et al., 2011; Stout, 1989; Wantzen et al., 2002). The abundance of shredders may also be naturally low in medium and large-sized rivers (Marques and Barbosa, 2001; Miserendino and Pizzolon, 2003), and may sometimes change seasonally and in response to stormy weather that increases allochthonous matter inputs (Fierro et al., 2015). Microbial breakdown may also be relatively more important in tropical streams than in temperate systems due to higher temperatures (Irons et al., 1994), and the link between leaf litter and shredders may therefore be weaker. Other pitfalls associated with the application of traits were low taxonomical resolution of the data (often family), missing autecological knowledge for much of the fauna and the lack of a standardized methodology for trait-allocation (Buss and Vitorino, 2010; Marquez et al., 2015; Forio et al., 2018). Overall, the authors found potential in the use of trait metrics for assessments, but at present more research is probably needed before trait metrics can be recommended for use in routine monitoring covering larger areas and river typologies.
Feeding (e.g., richness of scrapers) and habitat traits (e.g., % burrower taxa) are used by the US EPA for the IBI multimetric (Stoddard et al., 2006; Carter et al., 2017), and in Europe, traits are used in the multimetric systems of several member states (Birk et al., 2010), such as in Germany (assessment method for rivers using benthic invertebrates; mobility and habitat), Austria (assessment of the biological quality elements, part benthic invertebrates; feeding and habitat) and Sweden (multimetric index for stream acidity; feeding).
Although traits are not yet frequently applied for biomonitoring there, the development of trait databases for Australasian rivers may increase their use in the future (Chessman, 2015; Kefford et al., 2020; Phillips and Smith, 2018).

A global ecological assessment of rivers and future perspectives
The present review shows that riverine macroinvertebrates have an excellent track-record worldwide as indicators of human-induced perturbations in rivers, whether these act alone or in concert, made evident by using relatively simple and cost-effective biomonitoring methods. Current and projected land-use (Habell et al., 2019; IPBES, 2019) is of high significance for the integrity of rivers, especially in developing economies where natural ecosystems are particularly exposed to impacts but often with limited knowledge, political will or resources for biomonitoring (Resh, 2007). This review shows that there is considerable similarity in pressures acting on rivers worldwide and that methods applied for sampling, analysis and assessment of macroinvertebrates have much in common. We consider that a global river assessment using macroinvertebrates can be implemented based on existing data and lead the way for a future standardized global monitoring programme.
Ideally, all biomonitoring assessments from wadeable streams would apply the same sampling methods to reduce the variability inevitably introduced by using different approaches (Clarke et al., 2003; Friberg et al., 2006). However, this is rarely possible as devices, sampling approach and specimen identifications vary to suit different scientific objectives. To compensate for such differences, a global comparison of riverine macroinvertebrate data is, at present, perhaps only feasible using higher taxonomical levels, such as family, which are less sensitive to methodological differences but may still show high diagnostic capabilities. By selecting some of the most commonly used metrics for river assessments worldwide, we suggest three metrics to be considered as components of a common global multimetric index for measuring environmental quality, namely BMWP/ASPT, Shannon-Wiener diversity and richness of EPT taxa. Such a combination of metrics includes aspects of sensitivity, diversity by heterogeneity and richness (of sensitive groups), with each component expected to diagnose both specific (e.g. sewage pollution) and general degradation. The proposed metrics were also included as components in the European STAR Intercalibration Common Metric Index, developed to assess general degradation of river sites and used to harmonize assessment methods between EU Member States. Each metric should be referenced against appropriate national or regional reference sites, with the final index value normalized and averaged (e.g. see Hering et al. 2006). Using local reference sites would allow for local modifications to the BMWP score tables and account for natural variability in community compositions across spatial scales, including different river typologies. To calculate diversity, estimates of abundances in samples are required, which may prove challenging for live sorting of samples in the field.
Regarding seasonality, data from baseflow (dry season) should be used to assure global comparability.
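To make the proposed combination concrete, the Python sketch below normalizes each of the three component metrics against a reference value and averages the results. The normalization rule used here (site value divided by the reference-site value, capped at 1) is an assumption chosen for simplicity; Hering et al. (2006) should be consulted for the established procedure, and the metric names and numbers are hypothetical.

```python
COMPONENTS = ("aspt", "shannon", "ept_richness")


def global_multimetric(site, reference):
    """Average of three metrics, each scaled to 0-1 against reference values.

    site, reference: dicts with the keys in COMPONENTS. The reference values
    would come from national or regional reference sites (e.g. their median);
    the simple ratio-and-cap scaling is an assumed normalization.
    """
    ratios = []
    for name in COMPONENTS:
        ref_value = reference[name]
        if ref_value <= 0:
            raise ValueError(f"reference value for {name} must be positive")
        # Cap at 1 so that sites exceeding the reference do not inflate the mean
        ratios.append(min(max(site[name] / ref_value, 0.0), 1.0))
    return sum(ratios) / len(ratios)


# Hypothetical site scoring half of the reference value on every component
reference = {"aspt": 6.0, "shannon": 2.0, "ept_richness": 10}
site = {"aspt": 3.0, "shannon": 1.0, "ept_richness": 5}
index = global_multimetric(site, reference)
```

Equal weighting of the three components is also an assumption; any weighting scheme would need to be agreed as part of a harmonized global method.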
The use of family level identification in combination with relatively simple and proven methods, with low technological demands, is still advocated (Resh, 2007) and supported by the results of the present review. However, molecular techniques for species identification are developing rapidly, and will most likely within the next decade be a cost-efficient alternative to traditional taxonomy, allowing for a taxonomic resolution at species and genus level across multiple organism groups. To ensure comparability and reproducibility between such molecular data, however, it will be crucial to extend and develop reliable reference libraries for molecular sequences, e.g. The Barcode of Life Data System (Ratnasingham and Hebert, 2007). The use of molecular data would therefore, at least partly, solve the issue of finding a common and robust taxonomical level for monitoring. Although these methods will probably revolutionize biomonitoring, much work remains to test their applicability compared to traditional methods, especially in regions with limited resources and facilities to conduct such analyses. We believe molecular methods will facilitate rather than replace the traditional sampling methods, at least in a short-term perspective, as both approaches have valuable aspects not covered by the other, e.g. detecting whether the macroinvertebrates were alive at the time of sampling, hence advocating for combined approaches.
As molecular methods progress, in terms of environmental DNA or metabarcoding, we believe future sampling programmes should aim at data collection to measure i) environmental quality, ii) diversity and population estimates for macroinvertebrate assemblages (sensu global population declines), and iii) conservational status following the International Union for Conservation of Nature (IUCN) or other criteria (loss of species/diversity). As sample processing methods may soon be about to change, e.g. by introducing molecular methods to analyse collected samples, we believe that now will be a suitable moment also to harmonize macroinvertebrate sampling methods on a global scale to ensure comparable results in the future.

Horizon scanning: A step towards a global assessment?
Freshwaters are an essential and endangered resource for human wellbeing and could become a bottleneck for a sustainable future global development with more equality and fewer conflicts (UN, 2020). It is therefore imperative that we take steps to protect our freshwater resources and to tackle the main pressures. This initiative requires knowledge of what the main threats are and how they scale to each other in terms of importance. Today, global assessments, such as IPBES, are made by synthesising a range of heterogeneous sources of information differing in terms of scope and methodology, limiting the possibility of quantifying effect-sizes of different pressures on biodiversity and relating these to each other. For this reason, nations, and supra-national structures such as the EU, rely on reproducible and standardised assessment methods for comparable assessment across spatial and temporal scales (e.g. EU WFD) and there is a need to expand this type of rigorous approach to global scale assessments. This review suggests that we could be close to a data driven synthesis of the status of rivers as indicated by macroinvertebrate community composition, which may provide a platform for a future global assessment. A first global river assessment could be done by establishing a "Global River Assessment Committee", with members represented from every continent, with the responsibility for establishing and harmonizing the global common method, providing protocols and training to members, and conducting the final assessment. The global assessment project would require funding over an extended period to allow for several assessment phases, like the Mekong River Commission in Southern Asia, EEA in Europe and NARS in the US. This would be an efficacious way of sharing knowledge between participants and yield high quality data to be used by policy makers to underpin global strategies for securing freshwater resources in a World under threat.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
For information about projects, initiatives and monitoring trends that were not covered by our literature survey, we thank our worldwide network of colleagues, including Dr. Helen M. Barber-James (Albany Museum, South Africa), Dr. Eduardo Dominguez (Instituto de Biodiversidad Neotropical, Argentina), Dr. Ian Campbell (Monash University, Australia), Dr. John Conallin (Charles Sturt University, Albury, Australia/IHE Delft Institute for Water Education) and Dr. Phil Suter (La Trobe University, Albury-Wodonga). We also thank two anonymous reviewers for their constructive comments that helped us to improve the manuscript.
Appendix A. The distribution of studies in 1997-2018, number of sites investigated, site characteristics and success rate in assessments, as noted by the authors, at least for some of the community properties tested.
Footnotes to Appendix B: 1 Includes BMWP/ASPT systems designed for use in the UK, e.g. Armitage et al. (1983), and modified versions adapted to Asia