Bioavailability and Toxicity Models of Copper to Freshwater Life: The State of Regulatory Science

Efforts to incorporate bioavailability adjustments into regulatory water quality criteria in the United States have included four major procedures: hardness‐based single‐linear regression equations, water‐effect ratios (WERs), biotic ligand models (BLMs), and multiple‐linear regression models (MLRs) that use dissolved organic carbon, hardness, and pH. The performance of each with copper (Cu) is evaluated, emphasizing the relative performance of hardness‐based versus MLR‐based criteria equations. The WER approach was shown to be inherently highly biased. The hardness‐based model is in widest use, and the MLR approach is the US Environmental Protection Agency's (USEPA's) present recommended approach for developing aquatic life criteria for metals. The performance of criteria versions was evaluated with numerous toxicity datasets that were independent of those used to develop the MLR models, including olfactory and behavioral toxicity, and field and ecosystem studies. Within the range of water conditions used to develop the Cu MLR criteria equations, the MLR performed well in terms of predicting toxicity and protecting sensitive species and ecosystems. In soft waters, the MLR outperformed both the BLM and hardness models. In atypical waters with pH <5.5 or >9, neither the MLR nor BLM predictions were reliable, suggesting that site‐specific testing would be needed to determine reliable Cu criteria for such settings. The hardness‐based criteria performed poorly with all toxicity datasets, showing no or weak ability to predict observed toxicity. In natural waters, MLR and BLM criteria versions were strongly correlated. In contrast, the hardness‐criteria version was often out of phase with the MLR and, depending on waterbody and season, could be either strongly overprotective or underprotective. The MLR‐based USEPA‐style chronic criterion appears to be more generally protective of ecosystems than other models. Environ Toxicol Chem 2023;42:2529–2563. © 2023 The Authors. 
Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.


BACKGROUND
Trace metals have long been of interest in both aquatic toxicology and the setting of water quality criteria to protect fisheries and other aquatic life. Copper (Cu) in particular has been of persistent concern in aquatic toxicology and fish physiology because it can be toxic to aquatic organisms at environmentally relevant concentrations and because it is an essential micronutrient. Copper occurs naturally in the environment and is a component of a number of enzymes, which makes it an essential element for all aerobic organisms. In addition to its essentiality, Cu is a potent toxicant and, as a result, aquatic organisms have developed delicate homeostatic controls to maintain a balance between deficiency and toxicity (Grosell, 2011).
However, in addition to its natural occurrence in waters, the value and ubiquitous use of Cu in society bring human-caused increases in its concentrations in the aquatic environment. Relevant features and uses include its high toxicity to certain organisms, its ubiquity in the environment, its important role in the manufacture of antibiofouling, corrosion-resistant, and electrically conductive materials, and its use in aquaculture as a pesticide with low risk to humans (Grosell, 2011). Copper's toxicity to aquatic organisms was recognized by at least the 1700s, when shipbuilders began adding Cu cladding to wooden hulls to reduce damage from wood-boring molluscs (Dürr & Thomason, 2009).
The chemical form (speciation) of metals in the environment has long been recognized as critical to understanding and appropriately regulating allowable ambient concentrations in freshwaters. Bioavailability controls are important for understanding and managing the potentially harmful effects of all trace metals, but especially Cu (Allen et al., 1980; Chapman & McRady, 1977). In developing Cu criteria, refining bioavailability adjustments is a particular priority because of the narrow range between natural background and toxic concentrations, which can be as small as a factor of 5 (Mebane et al., 2020). The need for sufficiently protective criteria must be tempered by the fact that Cu is naturally found in all freshwaters; simply setting very low "fully protective" water quality criteria could encroach on the range of natural background concentrations, which could cause problems when criteria are applied to manage allowable discharge limits.
The purpose of the present study is to review the performance of the leading approaches that have been developed to incorporate bioavailability and predictive toxicity of Cu in freshwaters into practical regulatory applications. The performances of four approaches for predicting bioavailability-adjusted criteria concentrations of Cu in freshwaters are contrasted: (1) an empirical hardness-based linear regression model currently in wide use (US Environmental Protection Agency [USEPA], 1985b, 1996); (2) the water-effect ratio (WER) approach to adjusting hardness-based criteria using a ratio derived from concurrent toxicity testing with laboratory and site waters; (3) a mechanistic biotic ligand model (BLM) that predicts the speciation of Cu in water and its subsequent toxicity to aquatic organisms and that represents the USEPA's most recently recommended Cu criteria (Santore et al., 2001; USEPA, 2007); and (4) empirical multiple linear regressions (MLRs) that predict Cu toxicity to aquatic animals as a function of dissolved organic carbon (DOC) concentration, water hardness, and pH (Brix et al., 2020, 2021). The goal was not to reproduce analyses similar to those presented by Brix et al. (2017, 2021) and Environment Canada (2021), but to show different perspectives that might help to vet approaches for potential regulatory application. The supporting datasets and calculations for these evaluations are available from the figshare data repository (Mebane, 2022).
Four notes on the scope and terminology used in the present review are pertinent. First, the term "water quality criteria" and its variations are used to mean protective values for aquatic life that were calculated consistent with the analytical framework of Stephan et al. (1985), not just those criteria values formally determined and published by the USEPA itself. Second, the review was undertaken to examine the different approaches with potential application to California. However, even though examples from diverse California watersheds are used, little in this review is peculiar to California. Third, the literature within the scope of this review is extensive, and a great many citations could be given the prefix "e.g.," but doing so would be awkward. Citations to previous research are intended to be relevant and representative for the point made but not to be all-encompassing. Fourth, the present review focuses on the effects of Cu on aquatic animals. The aquatic animal focus follows the traditional USEPA criteria practice of deriving criteria using animal data but possibly adjusting criteria downward if the final chronic values (FCVs) are shown to be unprotective of societally important aquatic plants or of important ecological functions (Stephan et al., 1985; Stephan, 1985). It is doubtful whether bioavailability models derived to predict aquatic animal responses to Cu will predict toxicity to aquatic plants and algae as accurately (Mebane, Chowdhury, et al., 2020).

Mechanisms of aquatic toxicity and factors that modify Cu toxicity
In short-term exposures, the risks of Cu toxicity seem to be primarily related to ionoregulatory disruption, mostly from interfering with sodium uptake (Grosell, 2011; Wood, 2012). The concept that metal binding to critical sites on receptor organs (e.g., gills of fish or, more generally, biotic ligands) leads to toxic effects was supported by experiments showing that metal accumulated in the gills, disrupting sodium entry and the activity of the sodium-potassium pump (sodium-potassium adenosine triphosphatase) present in gill tissue (Playle et al., 1992, 1993b, 1993c).
An important group of sublethal effects in fish that follow even short-term Cu exposures is interference with chemosensory and mechanosensory organs and related olfactory and lateral line dysfunction (Baldwin et al., 2003; Hansen, Rose, et al., 1999; Mebane et al., 2020; Sandahl et al., 2004, 2006). This dysfunction then leads to a cascade of physiologic impairment and maladaptive behaviors, including altered swimming behaviors (Calfee et al., 2016; McIntyre et al., 2012; Puglis et al., 2019), failure to sense alarm chemicals indicating the presence of predators and consequent failure to avoid predation because innate behaviors to freeze or hide are lost (Dew et al., 2014; McIntyre et al., 2012; Thomas et al., 2016), loss of equilibrium or rheotaxis (the ability to orient head-first into the current; Calfee et al., 2014; Dew et al., 2012; Linbo et al., 2009), or disrupted homing or migratory behavior (Lorz & McPherson, 1976, 1977; Saucier et al., 1991; Saunders & Sprague, 1967). Whereas behavioral avoidance of Cu is likely a natural defense mechanism, avoidance behavior can be seen as an adverse effect because it can interfere with other important life history behaviors such as natural movements or migrations (Saunders & Sprague, 1967). In laboratory swim chambers in which the chemical is released into one side of the chamber and in the absence of other behavioral cues, avoidance responses may occur at concentrations that are up to 15× lower than the concentrations that biodynamic and toxicity modeling, Balistrieri et al. (2020) showed that the relative importance of Cu uptake by insect larvae via dissolved or dietary pathways is highly dependent on the concentrations of metals in food and water, with the dietary pathway usually dominant at lower water concentrations. The metal uptake rates for different insect taxa are also important, with some Cu-sensitive mayflies having much higher uptake rates than less sensitive caddisfly families (Balistrieri et al., 2020).
From a regulatory perspective, an important question is whether dietary toxicity of Cu to aquatic organisms is likely at dissolved concentrations that do not exceed regulatory criteria for waterborne Cu. As shown in the section Field and ecosystem studies of the present review, the answer to that question for Cu often depends on which potential regulatory criteria model is considered. Even under the best-performing potential regulatory model, the answer at present seems to be "yes" for some sensitive mayfly species but "no" for profound shifts in aquatic community structure or function.

Factors that modify the bioavailability and toxicity of Cu to aquatic animals
An overriding theme in the present review is that the risks of aquatic toxicity of Cu are not simply a function of environmental Cu concentrations; factors such as complexation with negatively charged organic and inorganic particles or molecules, such as clays and DOC, influence toxicity, as does competition between positively charged Cu and other cations such as H+ and Ca2+ (Chapman & McRady, 1977; Erickson et al., 1996; Grosell, 2011). Many factors have been related to the modification of the bioavailability and toxicity of Cu in freshwater, including pH, DOC, calcium (Ca), magnesium (Mg), sodium (Na), potassium (K), alkalinity, temperature, and the presence of other metals (De Schamphelaere & Janssen, 2002, 2004b; Erickson et al., 1996; Mebane et al., 2020; Niyogi & Wood, 2004). Other issues that influence the toxicity of metals, including Cu, are related to the sensitivity and vulnerability of animals, including factors such as osmoregulatory stress, dietary loading, size or life stage of the organism, nutritional status, and parasite burden (Cadmus et al., 2019; Chapman, 1978; Kelly & Janz, 2008; Sprague, 1985; Wendelaar Bonga, 1997). Of the factors influencing Cu bioavailability, organic carbon, pH, and hardness (as a surrogate for Ca) have come to be considered of primary importance, and these factors are the focus of the present review.
Toxicity-modifying factors (such as pH, hardness, and DOC) themselves affect the vigor of aquatic organisms (Galvez et al., 2008; Meyer et al., 2007; Wood et al., 2011). The largest influence is probably on the energy requirements of osmoregulation. In low-hardness (low ionic strength) water, fish membranes become highly permeable to the efflux of ions (e.g., Na+, Ca2+). This leads to ion leakage through cell junctions, and energy is required to counteract the ion losses from the body and to maintain internal mineral balance. Metals seem to compound this problem (McDonald & Rogano, 1986; Wendelaar Bonga, 1997). For instance, the much greater resistance of fish to metals in marine waters versus freshwaters cannot be attributed solely to competition and complexation; rather, the increased Na in the marine environment adds physiological protection (Crémazy et al., 2022; Grosell, 2011). Short-term pre-exposure to soft water can substantially amplify the effects of exposure to Cu or other metals, especially if the toxicant targets ionoregulation (Taylor et al., 2000). In very soft water, fish may be on the verge of ionoregulatory problems, and any increase in metals or H+ (i.e., pH decrease) may result in plasma ion losses and a fatal loss of homeostasis (Playle et al., 1992; Wendelaar Bonga & Lock, 2008). Cell biology is similar enough across faunal groups that other organisms, such as aquatic insects or crustaceans, likely also experience ionoregulatory stress and have differing adaptations in very soft waters (Buchwalter & Luoma, 2005; Griffith, 2017), which is related to the strong association between range limits of aquatic insects and ionic strength (Cormier et al., 2018). As a practical matter, it may make little difference to the organism or to the population whether individuals are killed by metal toxicity or by increased susceptibility to ionic disruption secondary to metals.
pH. The pH can be considered a master variable in metal toxicity, because it controls the speciation of metals, affects homeostasis, and, through proton competition, reduces the binding of cationic metals to ionoregulatory cells (Mebane et al., 2020). The pH affects complexation reactions in the water surrounding the organism, at the interface between water and organism (Playle & Wood, 1989), and internally (Wood, 2012). At low pH, Cu complexes dissociate, and the highly toxic free ionic form becomes increasingly prevalent relative to the somewhat less toxic Cu-hydroxyl forms and the nontoxic carbonate complexes, which increases toxicity. Concurrently, however, as the H+ concentration increases, proton competition with the positively charged Cu ions can decrease the effective toxicity. This interplay results in nonlinear influences of pH on Cu toxicity and likely contributes to differing reports in the literature that low pH either increases susceptibility (Erickson et al., 1996; Waiwood & Beamish, 1978a, 1978b) or reduces susceptibility to Cu toxicity (Cusimano et al., 1986; Ng et al., 2010).
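To make the two competing pH effects concrete, the sketch below uses a deliberately simplified single-site model; it is not the BLM. A lumped inorganic complexation term (a crude stand-in for hydroxide and carbonate complexes) shrinks free Cu2+ as pH rises, while H+ competes with Cu2+ for a single hypothetical biotic ligand. All constants (`LOG_K_CU_BL`, `LOG_K_H_BL`, `PK_INORG`) are illustrative assumptions chosen only to show the shape of the response.

```python
# Illustrative constants only; these are NOT calibrated BLM parameters.
LOG_K_CU_BL = 7.4   # hypothetical Cu2+ affinity for the biotic ligand
LOG_K_H_BL = 5.4    # hypothetical H+ affinity for the same ligand
PK_INORG = 7.0      # hypothetical pH at which half the Cu is inorganically complexed


def free_cu(total_cu_molar: float, ph: float) -> float:
    """Free Cu2+ after a lumped inorganic complexation term that grows
    tenfold per pH unit (a crude stand-in for hydroxide/carbonate complexes)."""
    return total_cu_molar / (1.0 + 10 ** (ph - PK_INORG))


def ligand_occupancy(total_cu_molar: float, ph: float) -> float:
    """Fraction of biotic-ligand sites occupied by Cu when Cu2+ and H+
    compete for the same sites (single-site competitive binding)."""
    cu_term = 10 ** LOG_K_CU_BL * free_cu(total_cu_molar, ph)
    h_term = 10 ** LOG_K_H_BL * 10 ** (-ph)
    return cu_term / (1.0 + cu_term + h_term)


if __name__ == "__main__":
    # Same dissolved Cu (1e-7 mol/L, about 6.4 ug/L) across a pH gradient.
    for ph in (4, 5, 6, 7, 8, 9):
        print(f"pH {ph}: occupancy = {ligand_occupancy(1e-7, ph):.3f}")
```

With these made-up constants, occupancy peaks at intermediate pH and falls off at both low pH (proton competition dominates) and high pH (less free Cu2+), which is one way the nonlinear pH effects described above can arise; real speciation and binding constants would move the peak.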

DOC.
In the metals ecotoxicology literature, the terms dissolved organic matter (DOM) and DOC are often used more or less interchangeably, along with related terms such as natural organic matter, humic substances, or humic acids. Conceptually, this is incorrect, because DOC accounts for only approximately 50% of DOM, with the balance often including dissolved organic nitrogen along with other elements, and the composition of DOM varies among water bodies (Findlay, 2006).
Dissolved organic matter is a substantial mitigating factor against Cu toxicity because of its strong affinity to form metal complexes, which then do not bind to biotic ligands on the gill or other biological surfaces. There has been extensive research into characterizing different sources of DOM (e.g., autochthonously produced, terrestrial, sewage) as well as its chemical and biological characteristics, including strength of ameliorating toxicity, kinetics, and chemical composition (Al-Reasi et al., 2013; Ma et al., 2001; Richards et al., 2001; Wood et al., 2011). Repeated studies have shown differences among DOM sources in binding affinities for Cu and other metals and in protection from acute Cu toxicity. However, some research has shown that DOM sources do not vary much in their metal binding characteristics on a scale relevant to toxicity to freshwater organisms; thus, it is the concentrations of DOM and metals that are more important when the protective effects of DOM against metal toxicity to aquatic organisms are considered (Playle, 1998; Richards et al., 2001). Aqueous chemists developing code to model the binding of metals to DOM have taken the same tack, treating the quantity of DOM as more important than its source (Tipping et al., 2011; Tipping, 1994). Furthermore, the DOC fraction of DOM is known to be ecologically important and an important modifier of metal bioavailability (Findlay, 2006; Steinberg et al., 2006; Wood et al., 2011).
For simplicity's sake, the term DOC will be used from now on, except where more specific distinctions are needed. One key area in which such distinctions are needed is in working with geochemical speciation models, which usually require DOM as an input, whereas most analytical laboratories report DOC, not DOM. This requires the user to estimate the fraction of DOM that is DOC and to correct the inputs mathematically. In the case of the USEPA (2007) BLM, DOM is estimated as 2× the DOC concentration internally within the software code (USEPA, 2003, p. A1). The differing practices around this issue are described further in the section The reactive DOM issue.
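The DOC-to-DOM input correction is simple arithmetic. A minimal sketch, assuming a hypothetical helper named `doc_to_dom` (it is not a function of the BLM software): the 2× factor described for the USEPA (2007) BLM is the default, and it is exposed as a parameter because practices differ.

```python
# Default factor described for the USEPA (2007) Cu BLM: DOM = 2 x DOC.
DOM_PER_DOC = 2.0


def doc_to_dom(doc_mg_c_per_l: float, dom_per_doc: float = DOM_PER_DOC) -> float:
    """Estimate DOM (mg/L) from a laboratory-reported DOC value (mg C/L).

    Hypothetical helper for preparing inputs to a speciation model that
    expects DOM; the conversion factor is an assumption that varies
    among modeling practices.
    """
    if doc_mg_c_per_l < 0:
        raise ValueError("DOC concentration cannot be negative")
    if dom_per_doc <= 0:
        raise ValueError("DOM:DOC factor must be positive")
    return doc_mg_c_per_l * dom_per_doc


if __name__ == "__main__":
    print(doc_to_dom(3.0))  # 3.0 mg C/L DOC -> 6.0 mg/L DOM with the 2x default
```

Keeping the factor explicit makes it easy to see when two tools given the same DOC value are actually modeling different amounts of DOM.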
Hardness. The term water hardness and its measurement came into wide use to describe hard waters that cause soap to precipitate into scum rather than lather and that cause scale buildup and clogging of water supply pipes. Hardness is the sum of Ca and Mg milliequivalents in water and is generally expressed as "hardness as CaCO3" (Hem, 1992). For brevity, hardness as CaCO3 is shortened to "hardness" throughout the present review.
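The definition above (Ca plus Mg milliequivalents, expressed as CaCO3) translates directly into code. A minimal sketch, assuming standard molar masses; it reproduces the familiar per-milligram factors of about 2.497 for Ca and 4.118 for Mg. The function name is hypothetical.

```python
# Molar masses (g/mol); Ca2+, Mg2+, and CaCO3 each carry 2 equivalents per mole.
MOLAR_MASS_CA = 40.078
MOLAR_MASS_MG = 24.305
MOLAR_MASS_CACO3 = 100.087


def hardness_as_caco3(ca_mg_per_l: float, mg_mg_per_l: float) -> float:
    """Hardness (mg/L as CaCO3) from dissolved Ca and Mg (mg/L).

    Sums Ca and Mg milliequivalents per liter, then expresses the total as
    the mass of CaCO3 that carries the same number of milliequivalents.
    """
    meq_per_l = ca_mg_per_l / (MOLAR_MASS_CA / 2) + mg_mg_per_l / (MOLAR_MASS_MG / 2)
    return meq_per_l * (MOLAR_MASS_CACO3 / 2)


if __name__ == "__main__":
    # Per-mg factors: about 2.497 for Ca and 4.118 for Mg.
    print(round(hardness_as_caco3(1.0, 0.0), 3), round(hardness_as_caco3(0.0, 1.0), 3))
```

Because hardness lumps Ca and Mg together, two waters with the same hardness can differ in their Ca:Mg ratio, which matters given that the protective effect against metal toxicity to fish comes mostly from Ca.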
In laboratory waters, increasing hardness has sometimes been observed to be strongly correlated with reduced Cu toxicity (Crémazy et al., 2017; Erickson et al., 1996; Waiwood & Beamish, 1978a, 1978b). Particularly in tests with reconstituted dilution water made by adding Ca, Mg, and Na salts in constant ratios, the pH and alkalinity rise as the hardness is increased (USEPA, 2007, Appendix C1). However, in natural waters this may not always be the case (see the Supporting Information). With the recognition of the importance of factors such as pH and DOC for Cu speciation, competition, complexation, and toxicity, the old hardness-based criteria have come under severe criticism. The criticisms center on three areas: (1) negligible or inconsistent effects of water hardness on Cu toxicity and the failure of the hardness-based criteria to track changes in Cu toxicity in natural waters (Apte et al., 2005; Chapman et al., 1980; De Schamphelaere & Janssen, 2002; Hyne et al., 2005; Iwasaki et al., 2022; Markich et al., 2006; Wang et al., 2009; Winner, 1985); (2) failure of the hardness-based criteria to protect sensitive organisms or aquatic communities (Iwasaki et al., 2022; Joachim et al., 2017; March et al., 2007; Markich et al., 2005; Wang et al., 2014; present review); and (3) chemosensory toxicity and related maladaptive behaviors, such as avoidance of Cu or lack of predator avoidance, at Cu concentrations lower than the hardness-based criteria (Hecht et al., 2007; McIntyre et al., 2012; Meyer & Adams, 2010).
Recognition of the negligible effects of water hardness on Cu toxicity relative to other factors including pH and DOC led to more sophisticated and complex approaches to predict Cu toxicity to aquatic organisms.

REGULATORY MODELS FOR PREDICTING Cu BIOAVAILABILITY AND TOXICITY
The great majority of jurisdictions in the United States use versions of the USEPA's hardness-based criteria (USEPA, 1985b, 1996). Although the USEPA (1985b) noted that organic carbon was sometimes reported to have more effect on Cu toxicity than hardness, the criteria were expressed as a function of water hardness, consistent with criteria developed for other metals at the time and with the available data. Since that time, the USEPA-recommended approaches for bioavailability-adjusted metals criteria have progressed through three strategies: hardness-regression models paired with WERs, the mechanistic BLM, and empirical MLR models. A discussion of each one follows.
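The hardness-based criteria are log-linear functions of hardness of the form CMC = exp(m_A · ln(hardness) + b_A) × CF, with an analogous chronic (CCC) equation. A minimal sketch, assuming the Cu slope, intercept, and dissolved conversion-factor values commonly tabulated in the USEPA national recommended criteria (m_A = 0.9422, b_A = −1.700; m_C = 0.8545, b_C = −1.702; CF = 0.960). These coefficients are not given in this review, so check them against the current USEPA table before any use.

```python
import math

# Cu coefficients as commonly tabulated by USEPA (assumption; verify before use).
M_ACUTE, B_ACUTE = 0.9422, -1.700
M_CHRONIC, B_CHRONIC = 0.8545, -1.702
CONVERSION_FACTOR = 0.960  # total recoverable -> dissolved, acute and chronic


def hardness_criteria(hardness_mg_per_l: float) -> tuple[float, float]:
    """Dissolved Cu acute (CMC) and chronic (CCC) criteria in ug/L.

    Hardness is the only water chemistry input; DOC and pH play no role.
    """
    if hardness_mg_per_l <= 0:
        raise ValueError("hardness must be positive")
    ln_h = math.log(hardness_mg_per_l)
    cmc = math.exp(M_ACUTE * ln_h + B_ACUTE) * CONVERSION_FACTOR
    ccc = math.exp(M_CHRONIC * ln_h + B_CHRONIC) * CONVERSION_FACTOR
    return cmc, ccc


if __name__ == "__main__":
    for hardness in (25.0, 50.0, 100.0, 200.0):
        cmc, ccc = hardness_criteria(hardness)
        print(f"hardness {hardness:>5}: CMC {cmc:5.1f} ug/L, CCC {ccc:5.1f} ug/L")
```

At a hardness of 100 mg/L this gives roughly 13 µg/L (acute) and 9 µg/L (chronic) dissolved Cu. Note that the same result is returned regardless of DOC or pH, which is the core limitation discussed in this review.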

WERs
The USEPA (1994) recognized that the steps taken at that time to address bioavailability in metals criteria (expressing criteria on a dissolved rather than total [unfiltered] concentration basis and the hardness adjustment) did not address the influence of other important factors on metals bioavailability, namely, DOC. The WER concept for bioavailability adjustments was developed for states to accommodate requests from dischargers to make empirical bioavailability adjustments to metals criteria (USEPA, 1994, 2000). The concept behind WERs is simple and intuitive. The toxicity tests that were acceptable for use in aquatic life criteria were conducted in laboratory waters that had low concentrations of total organic carbon (<5 mg/L) or particulates. However, environmental waters that receive treated sewage will have elevated organic carbon, clay particles, and particulate organic matter that can sorb metals and reduce their bioavailability (Stephan et al., 1985). Therefore, the hardness-based metals criteria could be adjusted by conducting acute toxicity tests of a metal in parallel, using site water collected from the receiving water (containing the diluted effluent) and a standard laboratory water of similar hardness. The ratio of the concentration lethal to 50% of the test population (LC50) in the site water to that in the laboratory water was the "water-effect ratio," which could then be multiplied by the statewide acute and chronic hardness-based criteria to obtain WER-adjusted site-specific criteria. The guidance called for seasonal testing to account for variability of site waters and for chemical characterization of total and dissolved organic carbon and major ions in the test waters. The standard laboratory waters specified for use in WER testing were synthetic waters prepared using purified laboratory water with various salts added to produce reconstituted water of the desired hardness (USEPA, 1994). Many of these studies have been completed, generating Cu WERs ranging from at least 1.1 to 65 (Adams et al., 2020). Di Toro et al. (2001) recommended expanding the concept through modeling, using the Cu BLM as a more efficient way of generating WERs from site water chemistry in lieu of empirical toxicity testing, which can be expensive and time consuming and can have high variability.
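The WER arithmetic described above reduces to one ratio and one multiplication. A minimal sketch with hypothetical function names and hypothetical paired LC50 values (they are not data from this review):

```python
def water_effect_ratio(site_lc50: float, lab_lc50: float) -> float:
    """WER: LC50 in site water divided by LC50 in laboratory water of similar hardness."""
    if site_lc50 <= 0 or lab_lc50 <= 0:
        raise ValueError("LC50 values must be positive")
    return site_lc50 / lab_lc50


def wer_adjusted_criterion(hardness_criterion: float, wer: float) -> float:
    """Site-specific criterion: the hardness-based criterion multiplied by the WER."""
    return hardness_criterion * wer


if __name__ == "__main__":
    # Hypothetical paired test results (ug/L dissolved Cu).
    wer = water_effect_ratio(site_lc50=45.0, lab_lc50=18.0)
    print(wer, wer_adjusted_criterion(9.0, wer))  # 2.5 22.5
```

The calculation is only as good as the laboratory-water LC50 in the denominator: if that water is more toxic than the waters used to build the original criteria, the ratio, and therefore the adjusted criterion, is inflated.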
Example WER calculations for several natural waters and laboratory analogs using the acute Cu BLM (USEPA, 2007), as suggested by Di Toro et al. (2001), and with the acute Cu MLR toxicity model (Brix et al., 2021) are given in Table 1. Depending on the toxicity model used, the WER multiplier for these site waters ranged from 2.5 to 10.8. However, there is a deliberate twist: for the "site water" chemistry examples used to produce the Table 1 values, rather than chemistry from streams receiving treated wastewater, as intended by the WER concept, all are laboratory waters from prominent toxicity testing laboratories that use natural water sources for their dilution waters. These include dechlorinated tap water (Environment Canada, Burlington, ON; University of Saskatchewan, Saskatoon, SK, Canada), river-influenced shallow well water (USEPA, Corvallis, OR), and filtered lake water (Lake Superior, Duluth, MN, USA and Horsetooth Reservoir, Fort Collins, CO, USA). The majority of the acute toxicity tests that were used in the hardness-based Cu criteria were from laboratories using natural water sources, not the reconstituted water recipes recommended in the WER guidance (USEPA, 1985b, 1994; Welsh et al., 2001). Following the WER concept, a laboratory water-to-laboratory water comparison with similar hardness values should produce WERs close to 1.0, not 2.5 to 11 (Table 1). The implication of the laboratory water-to-laboratory water WER examples is that WER studies that use reconstituted water recipes to represent the laboratory waters used to generate the acute Cu criteria can produce WERs that are biased high by factors of 2.5 to >10.
A further undesirable feature of the WER approach is the likelihood of illogical results: multiplying the criteria by a WER can produce a supposedly protective site-specific criterion that is higher than concentrations that were acutely toxic in the same site waters used to produce the WER. For example, the MLR model predicted a Ceriodaphnia dubia LC50 of 17 µg/L in Lake Superior water, and a WER calculated with the MLR (i.e., MLR-WER) gave an adjusted site-specific acute criterion of 23 µg/L (Table 1). A site-specific criterion that is higher than a concentration killing 50% of a sensitive species in that same site water can hardly be considered fully protective for that site. In 2001, the USEPA published a streamlined WER approach that constrained the maximum ratios, although it did not necessarily supersede the 1994 interim WER guidance (USEPA, 2001). Furthermore, at least with fish, the protective effect of hardness against metal toxicity comes mostly from Ca, not Mg (discussed further in the Supporting Information). However, the specified reconstituted water recipes have higher Mg than occurs in most natural waters, among other differences (Diamond et al., 1997; Welsh, Lipton, Chapman, 2000; Welsh, Lipton, Podrabsky, et al., 2000). Moreover, as will be shown in the present study, the hardness-based criteria fluctuate seasonally in natural waters in patterns that are disconnected from reliable predictions of toxic or nontoxic conditions. Multiplying a hardness function that has little capacity to predict toxic or nontoxic conditions by a fixed ratio cannot much improve its predictive capacity.
The USEPA's interim recommendation for WERs advanced the science in that the various WER studies generated high-quality datasets that advanced our understanding of metal toxicity-modifying factors in water. These WER datasets facilitated the development and evaluation of superseding concepts, including the BLM and MLR models (Di Toro et al., 2001; Santore et al., 2001; present review). Site-specific toxicity testing is of unquestionable value because, as will be discussed, the BLM and MLR predictive toxicity models are robust but have limits to their reliable performance. However, the specific 1994 "interim" WER construct for interpreting site-specific toxicity testing as a scientifically defensible tool to establish bioavailability-adjusted site-specific Cu criteria is not supported by more recent science and may produce unprotective and misleading results, as in the Table 1 illustration. Although current USEPA policy recommendations on developing bioavailability-adjusted aquatic life criteria for metals have moved well beyond WERs (USEPA, 2022a), the WER procedures were codified into most state water quality standards (USEPA, 1992), and in some states the 1994 "interim" WER procedures remain codified in state water quality standards (USEPA, 2000). As of November 2022, references to the 1994 interim WER guidance still appeared in the regulatory water quality standards for metals criteria in 28 states, as well as in the USEPA's own water quality standards (USEPA, 2021).

The BLM
Although this manuscript imprecisely refers to "the BLM," there are actually many BLMs, and the BLM should be thought of as a family of related models for predicting toxicity of metals (Mebane et al., 2020;Niyogi & Wood, 2004;Paquin et al., 2002;Smith et al., 2014).In the present study, the focus is on the performance of the BLM version used in USEPA (2007), which was in turn refined from the BLM originally described by Di Toro et al. (2001) and Santore et al. (2001).Although the performance of the 2007 version of the Cu BLM has been further optimized (Brix et al., 2021;Environment Canada, 2021), the present review is specific to the USEPA (2007) recommended criteria version and "the BLM" refers to this version unless otherwise specified.
The scientific underpinnings of the BLM include a mechanistic paradigm with a strong physiological basis that predicts dissolved metal binding at the site of action (the gill in fish), which in turn is related to mechanisms of acute toxicity (blockage of sodium channels and hyponatremia) for Cu. The model incorporates pH, DOC, major ions, and alkalinity, which allows for prediction of toxicity under a broad range of environmental conditions. Specifically, the BLM requires 12 user inputs: temperature, pH, DOC, the percentage of DOC assumed to be humic (rather than fulvic) acid, Ca, Mg, Na, K, chloride, sulfate, alkalinity, and sulfide. The required sulfide input is a placeholder that is not presently used in the reaction calculations. Users are advised to enter a default humic acid percentage of 10% (HydroQual, 2007; USEPA, 2007). The BLM includes an inorganic speciation submodel, and metal-organic complexes are modeled using the code from the Windermere Humic Aqueous Model V (WHAM V; Tipping, 1994).
Note to Table 1 (partial): …C-2, interpolated to match "site" hardnesses); Lake Superior at Duluth, Minnesota, USA (Erickson et al., 1996); Lake Ontario at Burlington, Ontario, Canada (Borgmann et al., 2005); Saskatoon at the University of Saskatchewan, Canada (Vardy et al., 2014); Horsetooth Reservoir at Fort Collins, Colorado, USA (Naddy et al., 2007); and Corvallis, USEPA Western Fisheries Toxicology Station, Corvallis, Oregon, USA (Chapman, 1978). The C. dubia BLM-predicted LC50s were calculated using the BLM Ver. 3.41.2.45 software with the predefined acute C. dubia parameter file, which uses a critical accumulation (LA50) value of 0.05541 nmol/g. The MLR C. dubia predicted LC50s used Equation 1, substituting a C. dubia mean intercept of −5.89 (Brix et al., 2021). BLM = biotic ligand model; CMC = criterion maximum concentration; DOC = dissolved organic carbon; LC50 = median lethal concentration; MLR = multiple linear regression model; WER = water-effect ratio.
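The 12 BLM inputs described above can be captured as a simple record before a model run. A minimal sketch, assuming a hypothetical `CuBlmInputs` dataclass whose field names and units are illustrative choices, not the BLM software's own input labels. It carries the 10% humic acid default and the unused sulfide placeholder noted in the text, and flags waters with pH below 5.5 or above 9, where this review found neither MLR nor BLM predictions reliable.

```python
from dataclasses import dataclass

# pH range outside which MLR and BLM predictions were found unreliable.
RELIABLE_PH_RANGE = (5.5, 9.0)


@dataclass
class CuBlmInputs:
    """Hypothetical record of the 12 user inputs listed for the USEPA (2007) Cu BLM."""
    temperature_c: float
    ph: float
    doc_mg_c_per_l: float
    ca_mg_per_l: float
    mg_mg_per_l: float
    na_mg_per_l: float
    k_mg_per_l: float
    chloride_mg_per_l: float
    sulfate_mg_per_l: float
    alkalinity_mg_caco3_per_l: float
    humic_acid_pct: float = 10.0   # recommended default % of DOC treated as humic acid
    sulfide_mg_per_l: float = 0.0  # placeholder; not used in the reaction calculations

    def screening_notes(self) -> list[str]:
        """Flag water chemistry outside the range where predictions were reliable."""
        notes = []
        low, high = RELIABLE_PH_RANGE
        if self.ph < low or self.ph > high:
            notes.append(
                f"pH {self.ph} is outside {low}-{high}; model predictions may be "
                "unreliable and site-specific testing may be needed"
            )
        return notes


if __name__ == "__main__":
    # Hypothetical soft, acidic water (not data from this review).
    soft_acidic = CuBlmInputs(temperature_c=12.0, ph=5.2, doc_mg_c_per_l=1.5,
                              ca_mg_per_l=3.0, mg_mg_per_l=0.8, na_mg_per_l=2.0,
                              k_mg_per_l=0.4, chloride_mg_per_l=1.0,
                              sulfate_mg_per_l=2.5, alkalinity_mg_caco3_per_l=8.0)
    print(soft_acidic.screening_notes())
```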
The BLM was developed to predict the acute toxicity of Cu, and its development relied heavily on a large dataset of acute toxicity tests with fathead minnow, Pimephales promelas (Erickson et al., 1996). At the time, this was the only dataset of its type available in which water chemistry characteristics were systematically and comprehensively varied (Santore et al., 2001). A key assumption and feature of the software version used in the USEPA (2007) Cu criteria document was that conventional LC50 toxicity test data could be used to model inferred gill accumulations through speciation calculations, rather than relying on measured gill accumulations as in the primary literature supporting the approach (MacRae, Smith, et al., 1999; Playle et al., 1993b). The inferred accumulation derived from a conventional toxicity test is termed the "critical accumulation" value; for acute lethality data it is referred to as the LA50, the lethal accumulation associated with 50% mortality. The LA50 values can be treated in the same manner as conventional hardness-toxicity regression-normalized toxicity data, to reduce the influence of water chemistry differences on apparent differences in sensitivity among tests. In theory, the LA50s represent the intrinsic sensitivity of test organisms and can be compared against each other, and species mean acute values can be calculated to rank the relative sensitivity of species into a species sensitivity distribution (SSD), which in turn can be used to define an acceptable acute water criterion at the 5th percentile of the SSD. These are the steps defined by Stephan et al. (1985) for reducing disparate acute toxicity testing data for a substance into a water quality criterion to protect aquatic communities. In the USEPA aquatic life criteria procedures, the "species" sensitivity distributions are actually calculated using genus mean sensitivity values. However, the term SSD is commonly used in the ecotoxicology literature to describe ranked taxa sensitivities to chemicals, even though the taxonomic resolution may not always be at the species level (Posthuma et al., 2002; Stephan, 2002).
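The 5th-percentile step can be sketched with the final acute value (FAV) procedure of Stephan et al. (1985) as we understand it. Genus mean acute values (GMAVs) are ranked from most to least sensitive, each is assigned a cumulative probability P = R/(N + 1), the four GMAVs with P closest to 0.05 (the four lowest when N < 59) are fitted as ln(GMAV) against √P, and the fit is extrapolated to P = 0.05; the acute criterion is then FAV/2. The GMAVs in the example are hypothetical, and the sketch omits other provisions of the guidelines (such as lowering the FAV to protect commercially or recreationally important species).

```python
import math


def final_acute_value(gmavs):
    """Final acute value (FAV) from genus mean acute values, after Stephan et al. (1985).

    gmavs: all genus mean acute values (ug/L) in the dataset, in any order.
    """
    ranked = sorted(gmavs)  # rank 1 = most sensitive genus
    n = len(ranked)
    if n < 4:
        raise ValueError("at least four genus mean acute values are required")
    cum_p = [(rank + 1) / (n + 1) for rank in range(n)]  # P = R / (N + 1)
    # The four GMAVs whose cumulative probabilities are closest to 0.05
    # (for N < 59 these are simply the four lowest).
    chosen = sorted(range(n), key=lambda i: abs(cum_p[i] - 0.05))[:4]
    ln_g = [math.log(ranked[i]) for i in chosen]
    p = [cum_p[i] for i in chosen]
    sqrt_p = [math.sqrt(v) for v in p]
    # Slope S = sqrt(sum of squares of ln GMAV / sum of squares of sqrt(P)),
    # using the guideline formulas.
    s2 = (sum(v * v for v in ln_g) - sum(ln_g) ** 2 / 4) / (sum(p) - sum(sqrt_p) ** 2 / 4)
    s = math.sqrt(max(s2, 0.0))
    intercept = (sum(ln_g) - s * sum(sqrt_p)) / 4
    return math.exp(s * math.sqrt(0.05) + intercept)  # extrapolate to P = 0.05


def criterion_maximum_concentration(fav):
    """Acute criterion (CMC) taken as one-half of the FAV."""
    return fav / 2.0


if __name__ == "__main__":
    # Hypothetical GMAVs (ug/L) for 20 genera; not data from this review.
    gmavs = [10, 20, 30, 40] + [100 + 50 * i for i in range(16)]
    fav = final_acute_value(gmavs)
    print(f"FAV = {fav:.2f} ug/L, CMC = {criterion_maximum_concentration(fav):.2f} ug/L")
```

Only the four most sensitive genera drive the extrapolation when fewer than 59 genera are available, which is why the normalization of those few GMAVs to a common water matters so much.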
The 2007 BLM predicted the acute toxicity of Cu. An implicit assumption in the USEPA (2007) Cu criteria is that the mechanisms of acute and chronic Cu toxicity are similar, so that the acute criterion can be adjusted by a simple, fixed acute-to-chronic ratio (ACR) to define a chronic criterion. The ACR set in USEPA (2007) was 1.615; that is, the acute criterion divided by 1.615 equals the chronic criterion.
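Because the ratio is fixed, the chronic criterion is always the same fraction of the acute criterion, whatever the water chemistry. If the acute criterion is taken as FAV/2 (Stephan et al., 1985), dividing it by 1.615 is equivalent to dividing the FAV by 2 × 1.615 = 3.23. A minimal sketch with a hypothetical FAV:

```python
ACUTE_TO_CHRONIC_DIVISOR = 1.615  # divisor applied in USEPA (2007), per the text


def chronic_from_acute(acute_criterion_ug_l: float) -> float:
    """Chronic criterion = acute criterion / 1.615, independent of water chemistry."""
    return acute_criterion_ug_l / ACUTE_TO_CHRONIC_DIVISOR


def chronic_from_fav(fav_ug_l: float) -> float:
    """Same chronic criterion starting from the FAV, assuming acute criterion = FAV / 2."""
    return chronic_from_acute(fav_ug_l / 2.0)


if __name__ == "__main__":
    fav = 10.0  # hypothetical FAV, ug/L
    print(f"acute = {fav / 2.0:.3f} ug/L, chronic = {chronic_from_fav(fav):.3f} ug/L")
```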
The publicly shared software implementing the BLM code has been updated by its developers to maintain functionality with newer Windows operating systems and to add or deprecate features (Santore & Croteau, 2019). Added features include a simplified chemistry input option, whereby hardness can be entered in lieu of Ca, Mg, Na, K, chloride, sulfate, and alkalinity. Because major ions are often correlated with each other in natural waters, the software then estimates these major ion values using median ratios for waters in the United States. The 2019 software version includes numerous predefined calculation options for predicting effects on various species and metals, including calculating, for lead (Pb) and Zn, the hazardous concentration at which 5% of species will be affected (HC5) (Santore & Croteau, 2019). The HC5 nomenclature distinguishes the criteria values the model produces for Cu, which were formally published by the USEPA, from the criteria-like values for Pb and Zn that were developed by others in a manner consistent with the USEPA's approach for Cu (DeForest & Van Genderen, 2012; DeForest et al., 2017; USEPA, 2007). One function deprecated in the 2019 public version of the BLM software is the ability for users to calculate their own LA50 estimates from speciation calculations and then use those LA50s to normalize data from different toxicity tests to a common water type. This LA50 normalization is a key step in analyses in which users contrast differences in species sensitivities, such as when deriving SSDs to calculate USEPA-style criteria (HC5) values analogously to the USEPA's criteria derivation (DeForest & Van Genderen, 2012; DeForest et al., 2017; USEPA, 2007). The LA50 normalization step is also needed by those who wish to compare, for example, the sensitivities of newly tested species with previous research (Calfee et al., 2014; Wang et al., 2009, 2011, 2014).
The reactive DOM issue. A long-standing question among those working with BLMs and related constructs is whether only some of the DOC should be considered metal-reactive and whether BLM inputs should be adjusted accordingly. Some fraction of the DOC appears to be relatively inert for Cu binding under natural conditions (Bryan et al., 2002; Dwane & Tipping, 1998; McKnight et al., 1983). For instance, Fulton and Meyer (2014) concluded that, on average, only 43% of the DOC in their study provided a protective effect against Cu, and Bryan et al. (2002) concluded that, on average, only 65% of DOC in natural waters is reactive or complexes with metals (range 40%-80%), whereas the remainder is inert with respect to ion binding. This has led to disparate approaches among those developing or using BLMs for adjusting measured DOC to the metal-reactive fraction. For example, modelers have multiplied DOC by 65% (Balistrieri & Mebane, 2014), by 70% (Balistrieri & Blank, 2008), by 50% (De Schamphelaere & Janssen, 2004a; Villavicencio et al., 2005; Welsh et al., 2008), or by a site-specific function using a UV-absorbance-based aromaticity correction (De Schamphelaere et al., 2004). The USEPA (2007) version of the Cu BLM does not address the issue and thus appears to assume 100% DOC reactivity. In some datasets of Cu toxicity under varying DOC concentrations, omitting reactivity adjustments resulted in biased BLM predictions, with DOC having a greater effect in model predictions than was observed in toxicity testing (De Schamphelaere & Janssen, 2004a; Fulton & Meyer, 2014; Villavicencio et al., 2005; Welsh et al., 2008). This can result in overestimates of Cu toxicity at low DOC levels and underestimates at high DOC levels.
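The practical consequence of these conventions is that the same measured water can enter the BLM with quite different DOC inputs. A minimal sketch, using only the fractions cited above and a hypothetical measured DOC of 4 mg/L (the site-specific UV-absorbance correction is not represented):

```python
# Reactive fractions of measured DOC as applied in the studies cited in the text.
REACTIVE_DOC_FRACTIONS = {
    "USEPA (2007) BLM, implicit": 1.00,
    "Balistrieri & Blank (2008)": 0.70,
    "Balistrieri & Mebane (2014)": 0.65,
    "De Schamphelaere & Janssen (2004a); Villavicencio et al. (2005); Welsh et al. (2008)": 0.50,
}


def reactive_doc(measured_doc_mg_l: float, fraction: float) -> float:
    """Scale measured DOC to the assumed metal-reactive fraction."""
    if measured_doc_mg_l < 0:
        raise ValueError("DOC cannot be negative")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("reactive fraction must be in (0, 1]")
    return measured_doc_mg_l * fraction


if __name__ == "__main__":
    measured = 4.0  # hypothetical measured DOC, mg/L
    for source, f in REACTIVE_DOC_FRACTIONS.items():
        print(f"{source}: DOC input = {reactive_doc(measured, f):.2f} mg/L")
```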
Four comparisons of BLM-predicted and experimentally determined Cu free-ion concentrations in natural waters are shown in Table 2. Predicted speciation was calculated using the BLM in speciation mode, in which it implements the WHAM V humic submodel (Tipping, 1994). In each instance, the BLM-predicted percentages of Cu present as free ion were far lower than the experimentally determined percentages (0.01%-0.4% vs. 7%-44%, respectively; Table 2).
In addition to the worked examples in Table 2, other comparisons of measured versus WHAM-predicted free Cu have reported that WHAM overpredicted organic carbon complexation of Cu by an order of magnitude or more, resulting in measured free-ion concentrations greatly exceeding predicted values (Boeckman & Bidwell, 2006; Christensen et al., 1999; Nolan et al., 2003). It should be noted that these studies and the Table 2 comparisons used the older WHAM versions V and VI, whereas the most recent WHAM version available at the time of writing (VII) is reported to predict Cu-organic complexes better than the superseded versions (Tipping et al., 2011, 2015). However, the comparisons are still pertinent because the USEPA (2007) national recommended Cu criteria relied on the WHAM V code.
The implication of the apparent over-reactivity of the BLM-WHAM V model in predicting Cu binding to DOM is that the 2007 BLM-based criteria could be overly responsive to DOC. This could make the criteria less protective than intended (too high) under high-DOC conditions or overly protective (too low) under low-DOC conditions.
In the more than 15 years since the USEPA recommended adoption of BLM-based water quality criteria, only six states had adopted the BLM-based Cu criteria into their statewide water quality standards, with nine more allowing it as a site-specific standard (USEPA, 2022b). The limited adoption of the USEPA (2007) BLM-based criteria is presumably related to concerns about the additional data requirements of the BLM-based criteria, the problem of missing data when determining the need for, or quantifying, effluent limits, and the overall complexity of bioavailability-adjusted criteria. Efforts to address these concerns through less complex bioavailability-adjusted metals criteria have led to the empirical MLR approach to water quality criteria for metals (Adams et al., 2020; Brix et al., 2020; Mebane et al., 2020; USEPA, 2022a).

MLR models
Multiple-factor equations have been used in criteria since the 1980s, when the USEPA established ammonia criteria using nonlinear equilibrium calculations that predicted un-ionized ammonia as a function of pH and temperature (Emerson et al., 1975; USEPA, 1985a). Most recently, national water quality criteria for aluminum were developed using MLR equations as functions of three independent variables: hardness, DOC, and pH (USEPA, 2018). With Cu, because of the observed interplay of different factors, MLRs and related approaches have repeatedly been used to explain the combined influence of several independent factors (e.g., pH, DOC) on a single dependent variable (e.g., LC50s; Brix et al., 2017; Cui et al., 2023; Fulton & Meyer, 2014; Howarth & Sprague, 1978; Meador, 1991; Waiwood & Beamish, 1978a, 1978b; Welsh et al., 1993).
There are, however, two fundamental differences between the classic use of MLRs to explain individual study results and the "2020 MLR approach" put forward as a framework for establishing bioavailability-adjusted water quality criteria for metals (Adams et al., 2020; Brix et al., 2020). First, the classic MLR approach may select variables by stepwise regression, statistically testing which variables most influence toxicity, whereas the 2020 MLR approach relies on the theory and predictions of previous models and toxicity studies to identify the variables of interest a priori. The 2020 approach proposes that hardness, pH, and DOC are fundamental variables for use in MLRs across the metals of regulatory concern (Brix et al., 2020). No BLM has ever used water hardness for modeling, and none of the MLR studies reviewed tested whether hardness was a better predictive variable than Ca. However, hardness and Ca are correlated in natural waters, as are all the major ions used in BLM modeling (Ca, Mg, Na, K, Cl, SO₄, alkalinity), making hardness a reasonable surrogate for major ions in most waters (Carleton, 2008). Furthermore, using hardness rather than Ca retains a link to the more familiar hardness-based criteria.
The second major difference between the 2020 MLR approach and classic MLRs is the use of pooled models. Whereas classic MLRs are developed for the individual species being studied, such as fathead minnow or rainbow trout, pooled models are intended to perform adequately for all species and can therefore serve as a basis for protecting aquatic communities as a whole, which is a fundamental precept of establishing water quality criteria (Adams et al., 2020; Brix et al., 2020; Stephan et al., 1985). The development of pooled multispecies MLRs in this context is simply the logical extension of the pooled hardness-toxicity slopes in traditional metals criteria documents (Stephan et al., 1985). In Stephan et al. (1985), the hardness-toxicity slopes developed with different species are pooled through analysis of covariance.
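The pooling step can be sketched as the common within-species slope of an analysis of covariance: each species' ln(hardness) and ln(LC50) values are centered on that species' own means, and one slope is fitted through the pooled, centered data, so species keep separate intercepts but share a slope. This is a minimal sketch of that idea, not the full guideline procedure (which also tests whether the species slopes can reasonably be pooled), and the function name and data are hypothetical.

```python
from collections import defaultdict


def pooled_slope(records):
    """Common within-group slope of y on x across groups (ANCOVA-style pooling).

    records: iterable of (group, x, y), e.g. (species, ln_hardness, ln_lc50).
    Returns the pooled slope and a dict of per-group intercepts.
    """
    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))
    sxy = sxx = 0.0
    means = {}
    for group, pts in groups.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        means[group] = (mx, my)
        for x, y in pts:  # accumulate within-group (centered) sums only
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    if sxx == 0.0:
        raise ValueError("no within-group variation in x")
    slope = sxy / sxx
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, intercepts


if __name__ == "__main__":
    # Synthetic data: two species sharing a slope of 0.9 but differing in sensitivity.
    data = [("A", x, 0.9 * x + 1.0) for x in (2.0, 3.0, 4.0, 5.0)]
    data += [("B", x, 0.9 * x - 0.5) for x in (3.0, 4.5, 6.0)]
    print(pooled_slope(data))
```

Centering within species keeps differences in overall sensitivity between species from distorting the shared slope, which is the point of pooling by covariance rather than by one regression over all data.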
The 2020 MLR approach simply expands that to include pH and DOC as additional toxicity-modifying factors (Brix et al., 2020). Brix et al. (2017) published the first multispecies MLRs for Cu as a function of hardness, DOC, and pH in a framework suitable for setting aquatic life criteria, following the Stephan et al. (1985) framework. Brix et al. (2017) developed two approaches, and thus two versions of the chronic criteria equations: the ACR approach and the direct final chronic value (FCV) approach. The ACR approach to setting chronic criteria assumes that the mechanisms of toxic action and the influences of toxicity-modifying factors are similar for acute and chronic exposures, and that acute models can be extended to chronic criteria by dividing by a simple fixed ratio. As a practical matter, the ACR approach is required for chemicals with insufficient chronic data to develop a chronic criterion directly. The FCV approach does not require that assumption and directly uses chronic data to derive a chronic criterion (Stephan et al., 1985). Brix et al. (2017) found differences between the acute and chronic regression slopes that suggested different modes of acute and chronic toxicity. This observation was consistent with earlier reports that, for Daphnia magna, acute and chronic toxicity could not be predicted from the same BLM (De Schamphelaere & Janssen, 2004a). Similarly, Chapman et al. (1980) found that hardness mitigated Cu toxicity in acute tests with D. magna but not in chronic tests. However, the differences between chronic criteria values calculated by the ACR and FCV methods were not large (Brix et al., 2017).
This development of Cu MLRs in an aquatic life criteria context was followed by extensive comparisons of the performance of the MLRs with that of an updated Cu BLM version (Brix et al., 2021). The comparisons greatly expanded and synthesized the evaluation of model performance across different species and a range of diverse water types. In addition, both the MLRs and the BLM were refined (Brix et al., 2021).
The pooled models for acute and chronic criteria developed by Brix et al. (2021) were:

$$\text{Acute Cu criterion}\ (\mu\text{g/L}) = \exp\left[0.700\,\ln(\text{DOC in mg/L}) + 0.579\,\ln(\text{hardness in mg/L}) + 0.778\,\text{pH} - 6.738\right] \tag{2}$$

where exp is the number "e" raised to the power in brackets, and ln is the natural logarithm.
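The acute MLR criterion equation above can be evaluated directly. The sketch below also lets the criterion intercept (−6.738) be swapped for a species- or endpoint-specific intercept, which is how species-level LC50s and EC20s are predicted later in this review (−4.109 for fathead minnow, −5.89 for C. dubia, −6.208 for coho salmon EOG EC20s), assuming those intercepts pair with the same three slopes. The example water is hypothetical, and the sketch does not check whether inputs fall within the water conditions used to develop the equation.

```python
import math

# Slopes and criterion intercept of the acute Cu MLR as given in the text.
B_DOC, B_HARDNESS, B_PH = 0.700, 0.579, 0.778
CRITERION_INTERCEPT = -6.738

# Species/endpoint mean intercepts quoted elsewhere in this review.
SPECIES_INTERCEPTS = {
    "fathead minnow LC50": -4.109,
    "Ceriodaphnia dubia LC50": -5.89,
    "coho salmon EOG EC20": -6.208,
}


def acute_mlr(doc_mg_l, hardness_mg_l, ph, intercept=CRITERION_INTERCEPT):
    """exp(0.700 ln DOC + 0.579 ln hardness + 0.778 pH + intercept), in ug/L."""
    if doc_mg_l <= 0 or hardness_mg_l <= 0:
        raise ValueError("DOC and hardness must be positive")
    return math.exp(B_DOC * math.log(doc_mg_l)
                    + B_HARDNESS * math.log(hardness_mg_l)
                    + B_PH * ph + intercept)


if __name__ == "__main__":
    doc, hardness, ph = 1.0, 50.0, 7.0  # hypothetical water
    print(f"acute criterion: {acute_mlr(doc, hardness, ph):.2f} ug/L")  # about 2.65
    for name, b0 in SPECIES_INTERCEPTS.items():
        print(f"{name}: {acute_mlr(doc, hardness, ph, b0):.2f} ug/L")
```

Because all predictions share the slopes, the ratio between any species prediction and the criterion is a constant, exp(intercept difference), in every water.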

Evaluations of Cu MLR model performance and criteria protectiveness
In this section, the performance of the MLR models is evaluated with datasets of results from diverse waters, species, and endpoints. The evaluations focus on the performance of the MLR relative to the hardness-based criteria because the latter are still the most commonly used in statewide water quality criteria in the United States. The USEPA (2022a) reported additional comparisons of the performance of the best available Cu BLM and MLR versions and found that the MLR and BLM performed similarly for the acute models, but that for chronic models the MLR was clearly superior. There is little overlap between the datasets evaluated in the present study and those used as training datasets in the MLR development (Brix et al., 2021) or in the evaluations of the USEPA (2022a, Table 3). Three of the five fathead minnow (Pimephales promelas) datasets in Figure 1 (Ryan et al., 2004; Sciera et al., 2004; Welsh et al., 1993) were among the datasets used by Brix et al. (2021). These three datasets provided 61 (7%) of the 903 acute tests used to develop the MLR.
Many "predicted versus observed" graphs are presented to visualize the performance of the MLRs. Ideally, the linear regression between predicted and observed values has a coefficient of determination (R²) near 1, indicating that the model explains most of the variance in the dataset, and a slope near 1 and an intercept near 0, indicating little bias in the predictions. The data are plotted as observed (y-axis) versus predicted (x-axis) regressions, in which the model predictions are treated as the independent variable and the observations as the dependent variable (Piñeiro et al., 2008). Plots comparing the model predictions also show whether the responses fall mostly within a factor of 2 of the line of perfect agreement. This factor-of-2 test has become something of a convention in related model testing, and it reflects the intrinsic variability of toxicity tests; for example, results of repeated tests with fathead minnows in the same dilution water source have varied by up to a factor of 6 (Mebane et al., 2020, their Figure 8; Santore et al., 2001). Overall, the factor-of-2 rule of thumb appears broadly applicable for acute metal toxicity to a range of species for median effective concentration (EC50) data (Meyer, Traudt, & Ranville, 2018; Price et al., 2022). In the present study, the MLR-predicted toxicity for a test is calculated using the species mean intercept value for that dataset, as described by Brix et al. (2017, their Equation SI-2). Thus the accuracy of the model predictions cannot be expected to be better than the intrinsic biological variability of test organisms, which can be compounded by pooling data generated from different laboratories, cohorts, and strains of the same species. For some datasets, bivariate toxicity-hardness plots are also shown because the hardness-toxicity model is effectively the null model. In these plots, simple linear regression lines are shown.
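The evaluation statistics described above can be computed as in the following sketch: an ordinary least-squares regression of observed on predicted values (the orientation recommended by Piñeiro et al., 2008), plus the fraction of predictions within a factor of 2 of the observations. Fitting on log10-transformed concentrations is an assumption of this sketch, as are the function name and the returned keys.

```python
import math


def evaluate_predictions(observed, predicted, factor=2.0):
    """Observed-vs-predicted statistics on a log10 scale.

    Regresses log10(observed) (y) on log10(predicted) (x) and reports R^2,
    slope, intercept, and the fraction of tests within `factor` of the observation.
    """
    if len(observed) != len(predicted) or len(observed) < 3:
        raise ValueError("need matched lists of at least 3 values")
    y = [math.log10(v) for v in observed]
    x = [math.log10(v) for v in predicted]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    within = sum(1 for o, p in zip(observed, predicted) if 1 / factor <= o / p <= factor)
    return {"r2": r2, "slope": slope, "intercept": intercept, "within_factor": within / n}


if __name__ == "__main__":
    obs = [12.0, 35.0, 80.0, 210.0, 600.0]   # hypothetical LC50s, ug/L
    pred = [10.0, 40.0, 95.0, 150.0, 700.0]  # hypothetical predictions, ug/L
    print(evaluate_predictions(obs, pred))
```

A model can score a high R² yet still be biased (slope or intercept far from 1 and 0), which is why all three statistics and the factor-of-2 check are reported together.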

Acute toxicity of Cu
Fish.
Soft-water tests with fathead minnow. The fathead minnow is a model organism used extensively by aquatic toxicologists worldwide to evaluate the relative potency of compounds and the factors affecting toxicity (Ankley & Villeneuve, 2006). The BLM was calibrated to an extensive fathead minnow dataset that tested responses to Cu under several different toxicity-modifying factors (Santore et al., 2001; USEPA, 2007). The original dataset, by Erickson et al. (1996), included many tests with fathead minnows and Cu in natural water from Lake Superior that had been manipulated to vary factors that could potentially control toxicity, such as DOC (humic acid), pH, Ca, Mg, Na, temperature, and suspended solids. These data could be explained well by the BLM, and the model's performance with this dataset greatly influenced its adoption in the USEPA (2007) criteria. Brix et al. (2021) demonstrated that the MLR predictions for the updated MLR dataset were as strong as those of the 2021 updated BLM, and no further analyses of that dataset were made in the present study.
However, other research has suggested that the BLM for acute Cu toxicity underpredicted toxicity (that is, overpredicted the LC50s) for fathead minnow in very soft water (Van Genderen et al., 2005). In the present review, datasets of acute Cu toxicity to fathead minnow in soft water (Sciera et al., 2004; Van Genderen et al., 2005; Welsh et al., 1993; Welsh, Parrott, et al., 1996) and hard water (Ryan et al., 2004) were compiled and regressed against hardness, BLM predictions, and MLR predictions (Figure 1). For the BLM predictions, the species mean LA50 of 2.97 nmol/g wet weight was taken from the USEPA (2007), and that species mean LA50 was used to predict toxicity under the varying water quality conditions of each test, as described in earlier articles (Santore et al., 2001; Wang et al., 2009). For the MLR predictions, an intercept was calculated for each fathead minnow test using the slope terms from Equation 1, rearranged as described in Brix et al. (2017). The geometric mean of all the intercepts for the fathead minnow tests was then used as the species mean intercept. The species mean intercept (−4.109) was then used with the hardness, DOC, and pH values from each test to calculate an MLR-predicted value (Brix et al., 2017, their Supporting Information 1). The predictions were then plotted and evaluated by how well the MLR-predicted LC50s agreed with the observed LC50s. The same approach was used for all species and endpoint comparisons with the BLM or MLR.
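The intercept step can be sketched by rearranging the MLR: for each test, intercept = ln(LC50) − 0.700 ln(DOC) − 0.579 ln(hardness) − 0.778 pH, using the acute slopes of the Brix et al. (2021) equation given earlier. Because these intercepts are already on the natural-log scale (and negative), the sketch averages them arithmetically, which corresponds to the ln of the geometric mean of the exponentiated intercepts; that reading of "geometric mean" is our assumption, and the function names and test waters are hypothetical.

```python
import math

B_DOC, B_HARDNESS, B_PH = 0.700, 0.579, 0.778  # acute MLR slopes from the text


def intercept_for_test(lc50_ug_l, doc_mg_l, hardness_mg_l, ph):
    """Back-calculate the MLR intercept implied by one observed LC50."""
    return math.log(lc50_ug_l) - (B_DOC * math.log(doc_mg_l)
                                  + B_HARDNESS * math.log(hardness_mg_l)
                                  + B_PH * ph)


def species_mean_intercept(tests):
    """Mean of per-test intercepts; tests are (lc50, doc, hardness, ph) tuples."""
    values = [intercept_for_test(*t) for t in tests]
    return sum(values) / len(values)


def predict_lc50(doc_mg_l, hardness_mg_l, ph, intercept):
    """MLR-predicted LC50 (ug/L) for one water, given a species mean intercept."""
    return math.exp(B_DOC * math.log(doc_mg_l)
                    + B_HARDNESS * math.log(hardness_mg_l)
                    + B_PH * ph + intercept)


if __name__ == "__main__":
    # Hypothetical tests: (observed LC50 ug/L, DOC mg/L, hardness mg/L, pH)
    tests = [(15.0, 1.0, 20.0, 7.0), (120.0, 3.0, 90.0, 7.8), (6.0, 0.5, 45.0, 6.5)]
    b0 = species_mean_intercept(tests)
    for lc50, doc, hard, ph in tests:
        print(f"observed {lc50:7.1f}  predicted {predict_lc50(doc, hard, ph, b0):7.1f}")
```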
Although there is clearly a pattern of increasing LC50s with increasing hardness (i.e., decreasing toxicity) in Figure 1A, the variability is so large as to render a hardness-toxicity relationship useless for predicting toxicity. For instance, at a hardness of 20 mg/L, fathead minnow LC50s ranged from 2 to over 300 µg/L (a factor of 150), and at a hardness of approximately 90 mg/L, LC50s ranged over a factor of approximately 20 (~100 to >2000 µg/L). In Figure 1B, the hardness-toxicity linear regression equation is rearranged to plot observed LC50s in the same style as the BLM and MLR plots. The purpose of Figure 1B is simply to show the hardness-equation predictions in a style visually comparable to the BLM and MLR plots. When toxicity is predicted from a single variable, such as hardness, this observed:predicted presentation is far less informative than a simple scatterplot and correlation, and thus it is not used beyond this illustration.
The BLM predictions were strongly correlated with the observed toxicity responses in both the soft-water and hard-water datasets. However, the comparison shows a consistent bias, with the BLM generally underpredicting toxicity in the more toxic samples with low measured LC50s (Figure 1C). Like the BLM predictions, the MLR predictions were strongly correlated with observed toxicity, but with much less bias toward underpredicting the toxicity of the most toxic, soft-water tests.
The BLM had a systematic problem of underpredicting Cu toxicity to fish in very soft water (Figure 1C). This limitation of the BLM predictions goes beyond defining optional competition terms for H, Ca, Na, and Cu or other metals at the biotic ligand. Instead, it is likely related to conditioning changes that fish undergo in low-ionic-strength waters, independent of metals exposure (Mebane et al., 2010; Taylor et al., 2000; Van Genderen et al., 2008). In very soft waters without elevated metals, fish have increased energy requirements for respiratory gas transfer across the gill and to counter passive diffusive losses of Ca and Na from their bodies (Wendelaar Bonga & Lock, 2008). The MLR model did not show this soft-water bias toward underpredicting toxicity (Figure 1D). The BLM could be modified to eliminate this bias by incorporating an LA50 correction for soft water (Paquin et al., 2012). However, this would add complexity to an already complex model; moreover, as noted, the main purpose of the present study is to compare regulatory criteria options, and the USEPA (2007) BLM is currently the regulatory alternative to the MLR.
Rainbow trout. Rainbow trout have commonly been used in foundational work on the effects of Cu and other metals on physiology, bioavailability, and mechanisms of toxicity (Grosell, 2011; MacRae, Smith, et al., 1999; Playle et al., 1992, 1993a, 1993c). In the present study, two large datasets of acute toxicity responses of rainbow trout in natural waters were evaluated with the MLR and hardness models (Figure 2). Both datasets are from limited-circulation gray literature and were not previously evaluated in the MLR development. The first dataset with salmonids was from a study that tested the comparative sensitivity of rainbow trout and Chinook salmon to Cu in natural waters of the upper Sacramento River in northern California. Tests were also conducted in laboratory waters in which Ca, Mg, and pH were manipulated (Welsh et al., 1998; Welsh, Marr, et al., 1996). All necessary water chemistry parameters were measured, and experimental controls were tight and well described. The natural river waters tested tended to be soft, with low DOC and pH in the ranges typical of other salmonid-bearing, forested waters in coastal or mountainous regions of California and western North America. Although the ranges of water chemistry are fairly narrow, these data are otherwise nearly ideal for evaluating model performance under environmentally realistic conditions. A second large dataset with rainbow trout and Cu is from a site-specific criteria study on the Clark Fork River, Montana (ENSR International, 1996). In the Clark Fork testing, rainbow trout were tested in laboratory water and in natural waters from tributary and river sites during different seasonal "rounds" of testing, with measurements of all BLM chemical parameters. The resulting waters ranged from very soft to very hard, with DOC from <1 to 11 mg/L (ENSR International, 1996). In addition, the evaluation includes a series of nine tests with cutthroat trout and Cu that alternately held alkalinity constant while varying hardness, or vice versa (Chakoumakos et al., 1979).
The MLR acute model predictions were strongly correlated with the observed LC50s (R² = 0.85), with a slope close to 1 and the majority of the predictions within a factor of 2 of the observations (Figure 2). The MLR performance with rainbow trout in these mostly natural-water datasets was far superior to that reported for rainbow trout by Brix et al. (2021), whose dataset was dominated by tests conducted in laboratory dilution waters that had been manipulated to test different water characteristics (R² = 0.48).
Cu toxicity across pH gradients and model performance. pH is a particularly challenging toxicity-modifying factor to incorporate into either BLM or MLR models because of its unique influence on organism physiology and because its nonlinear influences on toxicity are difficult to capture in linear model structures (Brix et al., 2020). MLR model performance is compared with two groups of data in Figure 3. The first group tested Cu toxicity across pH gradients that extended to very low pH conditions (Cusimano et al., 1986; Ng et al., 2010), and the second group tested Cu toxicity across the narrower pH ranges routinely encountered in natural oligotrophic waters (Brix et al., 2020). This second group of tests was conducted with natural waters from the Sacramento River that were tested either at the ambient pH of approximately 8 or after the pH had been artificially lowered to 6 (Figure 3B).
At the extreme low pH values below approximately 5.7, the Cu MLR predictions failed even to predict the direction of responses correctly (i.e., whether toxicity would decrease or increase as pH increased; Figure 3A). This illustrates the limitations of an empirical model that uses linear functions fitted to an approximately linear portion of a nonlinear function (Figure 3). In contrast, the MLR performed excellently in the natural-water tests between pH 6 and 8: three of the four pairs of Sacramento River tests had parallel slopes, and in the fourth pair the predicted response was at least in the correct direction (Figure 3B). Thus the pH range tested in the Sacramento River studies was within a linear portion of the Cu-pH response curve, as well as within the range encountered in the vast majority of surface waters across the United States (Brix et al., 2020; Supporting Information). In that range, MLR performance was robust.
The performance of the 2007 BLM across pH gradients was previously evaluated with the same datasets as in Figure 3 and with other datasets (Balistrieri & Mebane, 2012, pp. 81-85). Their comparisons of Cu toxicity as modified by pH showed that, in different but closely matched test series, contradictory response patterns have occurred between rainbow trout and fathead minnow tests, and even between different tests within a series. In those comparisons, the BLM predictions for fathead minnow tests from the original Erickson et al. (1996) training dataset were excellent. However, unlike the MLR, the 2007 BLM performed poorly with the Sacramento River datasets, and, as with the MLR, the BLM performed poorly with the very low pH datasets (below 5.5). The pH responses appeared to differ with Ca concentration, and even among factor-testing studies with fathead minnows, markedly different patterns arose from tests across similar pH ranges at overlapping Ca concentrations (Balistrieri & Mebane, 2012, pp. 81-85). These different patterns elude simple explanation but emphasize an important limitation to the goal of developing overarching models applicable to nearly all waters and species: if fundamentally different biological responses result from similar water chemistry conditions, no water-chemistry-based model will ever be able to predict these different biological responses (Mebane et al., 2020).
Cu toxicity across a low DOC gradient and model performance. The development and refinement of the Cu MLRs included extensive training and evaluation of their performance relative to DOC (Brix et al., 2017, 2021). In Figure 4, the performance of the MLR and BLM is contrasted over a gradient of very low DOC concentrations, from 0.15 to 1.9 mg/L, using independent data from Welsh et al. (2008). The MLR predictions closely matched the observed data, whereas the BLM predicted much greater reductions in toxicity than were actually observed (Figure 4). It is possible that the oversensitivity of the BLM to the mitigating influence of DOC on Cu toxicity is related to the reactive DOM issue discussed earlier.
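A quick calculation shows how much change in toxicity the MLR attributes to this DOC gradient. With hardness and pH held constant, the ratio of predicted LC50s between two DOC levels reduces to (DOC_high/DOC_low) raised to the DOC slope (0.700 in the acute MLR given earlier), so the 0.15 to 1.9 mg/L gradient corresponds to roughly a 6-fold change. By the same algebra, multiplying both DOC values by a constant reactive fraction leaves this ratio unchanged in the power-law MLR form; this is arithmetic on the MLR equation only and says nothing about how the BLM responds.

```python
B_DOC = 0.700  # DOC slope of the acute Cu MLR given earlier


def doc_fold_change(doc_low_mg_l, doc_high_mg_l, slope=B_DOC):
    """MLR-predicted LC50 ratio between two DOC levels, other inputs held constant."""
    if doc_low_mg_l <= 0 or doc_high_mg_l <= 0:
        raise ValueError("DOC must be positive")
    return (doc_high_mg_l / doc_low_mg_l) ** slope


if __name__ == "__main__":
    print(round(doc_fold_change(0.15, 1.9), 2))  # 5.91
    # A constant reactive fraction (here a hypothetical 0.5) cancels from the ratio.
    print(round(doc_fold_change(0.5 * 0.15, 0.5 * 1.9), 2))  # 5.91
```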
Invertebrates. Brix et al. (2021) included extensive model testing with acute and chronic D. magna datasets. In the present study, the performance of the MLR toxicity predictions with other invertebrate taxa was further evaluated with Ceriodaphnia dubia. Linton et al. (2006) conducted an extensive study of Cu toxicity in natural waters collected from forested streams and lakes in Michigan's Upper Peninsula over a wide range of hardness and DOC. They tested the toxicity of Cu to C. dubia in approximately 25 natural waters with diverse water chemistries. The hardness of the stream and lake waters ranged from approximately 17 to 185 mg/L, and DOC from 0.8 to 30 mg/L. Toxicity in a moderately hard artificial reference water was tested as a benchmark of the inherent variability of the C. dubia responses. The other C. dubia and Cu datasets evaluated in the present study were acute and chronic tests across a DOC gradient (Wang et al., 2011) and chronic tests in a variety of natural waters (Schwartz & Vigneault, 2007).
Hardness had no predictive value whatsoever for the acute Cu toxicity patterns in the testing with the Michigan natural waters (Figure 5A). In contrast, DOC was strongly predictive of acute Cu toxicity (Figure 5B), with a much higher coefficient of determination (R²) than the MLR with the Michigan dataset (Figure 5C). Curiously, Wang et al.'s (2011) acute testing of laboratory waters amended with DOC produced a slope with the MLR similar to that of the Michigan dataset, but with a nearly perfect R². The better fit of the acute C. dubia responses to DOC than to the MLR predictions in the Michigan waters is surprising, considering that pH and hardness also varied widely. The chronic C. dubia responses to Cu were also well explained by the MLR predictions, with R² values >0.9 and almost all predictions falling within ±2× of the line of perfect agreement (Figure 5D).
The MLR also performed well in predicting the acute toxicity of Cu to C. dubia in the highly urbanized waters of the Los Angeles River watershed in southern California (Figure 6A). In contrast to the forested, natural waters of northern Michigan, the Los Angeles River watershed is characterized by high DOC, high pH, and high hardness owing to the ubiquitous contact of runoff with high-Ca, alkaline concrete and asphalt surfaces, as well as numerous discharges of treated municipal wastewater. The watershed also has pronounced wet and dry cycles, producing widely varying hardness values (Larry Walker Associates, 2014). In contrast to the MLR predictions, Cu toxicity was only weakly predicted by hardness, with EC50s ranging from 20 to 1000 µg/L at a hardness of approximately 400 mg/L (Figure 6B).

…2011) amended diluted well water with a concentrate of natural organic matter from the Suwannee River (GA, USA) to obtain a gradient of DOC concentrations. The patterns were similar despite the disparate DOC sources. Wang et al. (2009, 2011) included similar comparisons between the 2007 BLM and the USEPA (1996) hardness-toxicity model. The BLM predictions were similar to the MLR predictions, including the differences in slopes between the studies. There was little relationship between hardness and Cu toxicity (Wang et al., 2009). Gillis et al. (2010) similarly tested the toxicity of Cu to the glochidia stage of L. siliquoidea mussels using waters collected from a variety of natural sources and laboratory soft water amended with DOC extracts collected from the Luther Marsh (ON, Canada). For these comparisons, only the natural-water glochidia tests were used to set the intercepts for the predicted MLR EC50s. Whereas the predicted and observed Cu EC50s of the glochidia tested in natural waters overlapped with those from Wang et al.'s (2009) tests with older juveniles, the observed EC50s in Gillis et al.'s (2010) tests with Luther Marsh extracts were much higher than those predicted from the natural-water tests. This was likely due to the stronger metal-complexing ability of the darker, terrestrial plant-derived organic matter extracted from the marsh compared with the lighter autochthonous organic matter (i.e., derived from aquatic photosynthesis) occurring in open waters (Gillis et al., 2010; Schwartz et al., 2004).
Hyalella azteca. Two usable Hyalella and Cu datasets were located for evaluation. The MLR predictions were considered reasonable (Figure 7). In a series of four tests with natural lake waters of different DOC concentrations (Welsh, 1996), the predictions were highly correlated with the observations (Figure 7C). The Collyard (2002) data series with variable Ca and pH in the absence of DOC also showed reasonable agreement between observations and predictions (Figure 7C), whereas hardness predictions were poorer in both datasets (Figure 7D). Another prominent dataset with Hyalella and Cu was examined but was unusable for the model evaluations. Borgmann et al. (2005) tested the effects of major ions (Ca, Mg, Na, and K) and pH on Cu toxicity to H. azteca. The data were from static, nonrenewal, 1-week exposures in which the animals were fed twice during the tests. The DOC in the artificial medium used as dilution water rose from ≤0.2 mg/L before the introduction of animals or food to a range of 0.4 to 2.8 mg/L (average 1.72 mg/L) at the end of the test. For tests with low alkalinity, this uncertainty in DOC values resulted in prediction differences greater than a factor of 10. Thus, this large dataset was not usable for evaluating model performance.

Olfactory and behavioral toxicity
Studies reporting olfactory toxicity or behavioral responses of salmonids to Cu, and that reported sufficient chemistry to calculate MLR criteria, are summarized in Table 3 and in Figure 8. McIntyre et al. (2008a) conducted electrophysiological recordings from the olfactory epithelium of juvenile coho salmon (Oncorhynchus kisutch) following exposure to Cu under varying conditions of Ca, alkalinity, hardness, and DOC. These responses (electro-olfactograms [EOGs]) were reinterpreted by Meyer and Adams (2010) as EOG 20% effect concentrations (EC20s), which allowed comparison of the same relative level of effect across studies. Meyer and Adams (2010) argued that whereas <20% olfactory impairment might be considered important for some species of concern, the variability associated with behavioral testing would make a smaller effect percentile of questionable meaning.
Such EOG EC20 values were modeled with the acute MLR in the same manner as all the other toxicity endpoints, using an intercept of −6.208, and the experimental EC20s are plotted against the predictions in Figure 8. The MLR predictions were strong, with an R² of 0.81.
The protectiveness of the different criteria formulations (hardness vs. MLR) is evaluated by comparing the EC20 level of impairment with the criteria values of appropriate exposure duration for selected salmonid and white sturgeon endpoints (Table 3). White sturgeon have also exhibited behavioral toxicity at low Cu concentrations (Calfee et al., 2016; Puglis et al., 2019).
Copper has been shown to elicit similar types of behavioral, olfactory, or other mechanosensory neuron impairment in other species, including fathead minnow, yellow perch, and zebrafish (Dew et al., 2012, 2014; Green et al., 2010; Meyer & Adams, 2010; Meyer & DeForest, 2018; Scott & Sloman, 2004; Thomas et al., 2016; Tierney et al., 2010). However, the present analyses focus on salmonids and white sturgeon because these taxa seem to be among the most vulnerable to sensory toxicity.
The hardness-based acute criterion was unprotective in 19 of 22 comparisons; the three protective scenarios were tests with elevated DOC. The MLR-based criterion could be considered clearly unprotective in 1 of the 22 comparisons, in which it was almost 2× higher than the endpoint (rainbow trout avoidance; Table 3), and at the margins of protectiveness in 7 of the 22 comparisons. In several of the comparisons in which the MLR criterion was flagged in Table 3 as exceeding the EC20, it was only slightly higher than the endpoint, whereas the hardness criterion was often several times higher. Meyer and Adams (2010) and Meyer and DeForest (2018) reached a similar conclusion: that the BLM acute criterion was largely protective against olfactory toxicity from Cu to fishes.
Not all behavioral or olfactory toxicity endpoints are of equal importance to the life histories of fishes. Behavioral avoidance by fish is one of the most sensitive sublethal endpoints for elevated Cu concentrations. However, the biological significance of avoiding the side of a featureless laboratory test chamber into which Cu has been introduced is debatable at best. Conceptually, avoidance behaviors could disrupt movements that are important to fish life histories, for example through habitat exclusion or disrupted migrations. However, fish presumably evolved with occasionally hazardous locations or episodes of elevated metal concentrations, and avoidance can be considered a defense mechanism that allows fish to evade hazardous conditions; thus, a laboratory avoidance response is an ambiguous surrogate for harm in the environment. In studies that measured avoidance of metals with test designs that included competing behavioral cues, such as the presence of other fish or shelter, avoidance responses to the metals were greatly diminished relative to tests in bare chambers (Hartwell et al., 1987; McNicol et al., 1999; Tierney, 2016).
Olfactory or mechanosensory toxicity has clearer negative implications for fish successfully completing their life cycles in the wild than does behavioral avoidance. Migratory salmonids use olfactory imprinting to migrate to their natal streams to spawn, and impaired olfaction also interferes with the downstream movements of juvenile salmonids (Lorz & McPherson, 1976; Quinn, 2005; Saucier et al., 1991). Olfaction is important to many fish species because of its role in helping prey fish avoid predators. Prey fish have a behavioral alarm response to olfactory predation cues that provides a survival benefit when under attack, and some pollutants such as Cu disrupt the behavioral alarm response (Hecht et al., 2007; Sandahl et al., 2007).
FIGURE 7: Acute and chronic copper toxicity to mussels and amphipods as correlated with the MLR predictions and hardness. Mussels and MLR predictions (A); mussels and hardness correlations (B); amphipod Hyalella azteca as a function of the MLR (C) and hardness (D). Mussel data: Wang et al., 2009, 2011; amphipod data: Collyard, 2002; Welsh, 1996. DOC = dissolved organic carbon; LC50 = median lethal concentration; MLR = multiple linear regression.
McIntyre et al. (2012) demonstrated reduced survival of Cu-exposed prey fish in predator-prey encounters. Copper exposure altered prey (juvenile coho salmon) responses to olfactory predation cues in the presence of predators (adult cutthroat trout), and this "info-disruption" reduced prey fish survival. The primary impact of Cu on predator-prey dynamics was faster prey detection, measured as shorter times to attack and to capture. Whereas the natural behavior of juvenile salmonids in response to detecting a chemical alarm cue (a conspecific skin extract, mimicking the signal given off by a fish being attacked and eaten) is to freeze and settle to the substrate, Cu-exposed prey remained actively swimming in the water column. For visual predators of juvenile fishes (e.g., larger fishes, birds, otters), prey activity is a critical determinant of detection (McIntyre et al., 2012).
McIntyre et al. (2012) tested predator-prey interactions in relation to Cu exposures in two trials. The first trial tested encounters between Cu-exposed prey and non-Cu-exposed predators. The result was a graded decline in prey survival times following 3-h exposures to Cu concentrations ranging from approximately 0.2 µg/L (controls) to 20 µg/L, tested in well water with very low organic carbon concentrations (≤0.25 mg/L DOC). No threshold of response was found, because reduced survival times were observed at the lowest concentration tested, 5 µg/L. In the second trial, both prey and predators were exposed to Cu in a subset of the encounters; exposing the predators did not markedly improve the ability of Cu-exposed prey to evade them (Figure 9).
For the water chemistry conditions of the exposure waters in McIntyre et al.'s (2012) study, both the MLR- and the 2007 BLM-based acute Cu criteria concentrations (0.65 and 0.36 µg/L, respectively) were only slightly higher than the average Cu concentration measured in the control waters (0.16 µg/L). In contrast, the acute hardness-based criterion (USEPA, 1996) value for a hardness of 56 mg/L as CaCO₃ is approximately 8 µg/L, well into the range of decreased prey survival (Figure 9). No minimum threshold below which Cu exposures have little or no effect on predator-prey interactions was obtained, because in the predator-prey test design there was no escape and all of the prey were eventually eaten by the predator. However, the MLR-based acute criterion concentration was much closer to the control concentration than to the lowest adverse effect concentration, suggesting that adverse effects would be unlikely at the MLR criterion concentrations.
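As a check on the approximately 8 µg/L value, the USEPA (1996) hardness-based acute Cu criterion can be recomputed from the standard hardness equation, CMC = 0.960 × exp(0.9422 ln(hardness) − 1.700), where 0.960 converts total recoverable to dissolved Cu. The minimal sketch below assumes those coefficients as published in the USEPA national recommended criteria tables (they are not stated in this article), and the function name is hypothetical; it compares the result with the 5 µg/L lowest tested concentration at which McIntyre et al. (2012) observed reduced prey survival.

```python
import math

# Assumed USEPA (1996) hardness-equation coefficients for the acute Cu criterion (CMC);
# taken from the USEPA national recommended criteria tables, not from this article.
ACUTE_SLOPE = 0.9422
ACUTE_INTERCEPT = -1.700
CONVERSION_FACTOR = 0.960  # total recoverable -> dissolved Cu


def hardness_acute_cu_cmc(hardness_mg_l: float) -> float:
    """Hypothetical helper: hardness-based acute Cu criterion in dissolved ug/L."""
    if hardness_mg_l <= 0:
        raise ValueError("hardness must be positive (mg/L as CaCO3)")
    return CONVERSION_FACTOR * math.exp(ACUTE_SLOPE * math.log(hardness_mg_l) + ACUTE_INTERCEPT)


cmc = hardness_acute_cu_cmc(56.0)
# Lowest tested concentration with reduced prey survival (McIntyre et al., 2012)
lowest_effect_ug_l = 5.0
print(f"Hardness-based acute criterion at 56 mg/L: {cmc:.1f} ug/L")  # 7.8
print("criterion exceeds lowest effect concentration:", cmc > lowest_effect_ug_l)  # True
```

Under these assumptions the hardness-based value (about 7.8 µg/L) is roughly 12 times the MLR-based acute value of 0.65 µg/L reported for the same water.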
In summary, the available information indicates that in natural waters the MLR-based Cu acute criterion would likely be protective against olfactory or other neurological damage or behavioral impairment resulting from Cu exposures.In many water types, the older hardness-based Cu acute criterion would be considerably underprotective for chemoreception, behavioral avoidance, predator avoidance, and survival from predators.

Chronic toxicity of Cu
When evaluating a candidate aquatic life criterion, two factors are germane: the protectiveness of the criterion and the predictive power of the toxicity functions. For chronic Cu toxicity data, the ability of the MLR to predict the actual toxicity to an organism was addressed comprehensively by Brix et al. (2021). Additional datasets of the chronic responses of the freshwater mussel V. iris and the cladoceran C. dubia were predicted by the MLR with high accuracy (Figures 5D and 7A). Beyond the datasets used by Brix et al. (2021) to develop the chronic MLR equation, no additional datasets were located that systematically tested chronic responses to Cu under three or more conditions of hardness, DOC, or pH (at least three points are needed for regression analysis). Thus the evaluation of the chronic criteria with chronic data focuses on the protectiveness of the MLR criteria versus the traditional hardness-based chronic Cu criterion (USEPA, 1996, 2000).
A previously derived (Mebane et al., 2020) SSD compilation of chronic freshwater toxicity data for Cu was standardized to a common water quality condition using the MLR as described in Brix et al. (2017). Plotting this SSD against the MLR-based criterion shows that all but the two most sensitive species, both mayflies, would be protected (Figure 10). The lack of full protection of the most sensitive species in the SSD is a mathematical certainty for datasets larger than 20 because of the 95% protection goal of the USEPA's framework for deriving aquatic life criteria (Stephan et al., 1985; Stephan, 1985).
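The "mathematical certainty" follows from how the USEPA guidelines place the criterion at the 5th percentile of genus sensitivity. In the Stephan et al. (1985) procedure, N genus values are ranked, assigned cumulative probabilities P = R/(N + 1), and the 5th percentile is extrapolated from the four lowest values (for N < 59 the four values with P closest to 0.05 are always the four lowest). Once N is 20 or more, the lowest-ranked genus has P = 1/(N + 1) < 0.05, so it sits below the 5th percentile in the ranked distribution and its value is typically below the estimate. The sketch below follows that procedure as it is commonly summarized; the function name and the example genus values are hypothetical, and the slope is the Stephan et al. (1985) variance-ratio form rather than an ordinary least-squares fit.

```python
import math
from typing import Sequence


def stephan_fifth_percentile(genus_values: Sequence[float]) -> float:
    """Hypothetical helper: 5th-percentile estimate of genus mean values.

    Assigns P = R / (N + 1) to ranked values and extrapolates ln(value)
    against sqrt(P) through the four lowest values to P = 0.05.
    Only the 4 <= N < 59 case (four lowest values) is handled.
    """
    n = len(genus_values)
    if not 4 <= n < 59:
        raise ValueError("sketch handles 4 <= N < 59 genera")
    if any(v <= 0 for v in genus_values):
        raise ValueError("genus values must be positive")
    low4 = sorted(genus_values)[:4]
    p = [rank / (n + 1) for rank in range(1, 5)]
    ln_v = [math.log(v) for v in low4]
    sqrt_p = [math.sqrt(pi) for pi in p]

    # Slope S and intercept L of ln(value) vs sqrt(P), Stephan et al. (1985) form
    s2 = (sum(x * x for x in ln_v) - sum(ln_v) ** 2 / 4) / (sum(p) - sum(sqrt_p) ** 2 / 4)
    s = math.sqrt(s2)
    l = (sum(ln_v) - s * sum(sqrt_p)) / 4
    return math.exp(s * math.sqrt(0.05) + l)


# Hypothetical genus mean values (ug/L) for 24 genera, for illustration only
values = [2.0, 3.0, 4.5, 6.0] + [10.0 + i for i in range(20)]
fifth = stephan_fifth_percentile(values)
print(f"N = {len(values)}, P of lowest rank = {1 / (len(values) + 1):.3f}")
print(f"5th-percentile estimate = {fifth:.2f} ug/L")
print("genera below the estimate:", sum(v < fifth for v in values))
```

With these hypothetical values the lowest genus (P = 0.04) falls below the estimate, which is the situation the text describes for the two mayflies in Figure 10.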
In the present study, the MLR response function is considered the closest estimate of "truth" for the toxicity predictions, based on the evaluations shown in Brix et al. (2021) and in the present analyses. Figure 10 illustrates that under selected DOC, hardness, and pH combinations, the hardness- and MLR-based chronic criteria provide equivalent protection. However, under combinations of high hardness and low DOC, or of low hardness and high DOC, the hardness-based criteria could be either severely underprotective or overprotective, respectively (Figure 10).
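A worked illustration of the two regimes follows. It assumes the USEPA (1996) hardness-based chronic equation, CCC_H = 0.960 exp(0.8545 ln H − 1.702) (coefficients from the USEPA national recommended criteria tables, not from this article), and the Brix et al. (2021) chronic MLR equation as given in this article, with the intercept taken as −1.402. The two water chemistries (H = hardness and DOC, both in mg/L) are hypothetical and chosen only to show the direction of the discrepancy.

$$
\begin{aligned}
&\text{A: } H = 200,\ \text{DOC} = 0.5,\ \text{pH} = 7.5:\\
&\quad \text{CCC}_H = 0.960\,e^{0.8545\ln 200 - 1.702} \approx 16.2\ \mu\text{g/L},\qquad
\text{CCC}_{\text{MLR}} = e^{0.855\ln 0.5 + 0.221\ln 200 + 0.216(7.5) - 1.402} \approx 2.2\ \mu\text{g/L}\\
&\text{B: } H = 20,\ \text{DOC} = 8,\ \text{pH} = 7.0:\\
&\quad \text{CCC}_H = 0.960\,e^{0.8545\ln 20 - 1.702} \approx 2.3\ \mu\text{g/L},\qquad
\text{CCC}_{\text{MLR}} = e^{0.855\ln 8 + 0.221\ln 20 + 0.216(7.0) - 1.402} \approx 12.8\ \mu\text{g/L}
\end{aligned}
$$

In case A the hardness-based value is about seven times the MLR value (underprotective); in case B it is less than one-fifth of the MLR value (overprotective).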

Field and community ecology studies
All the previous analyses of laboratory studies implicitly rely on the assumption that laboratory toxicity testing produces results similar to those that would occur to the same life stages of the same species in nature in waters of similar characteristics.Laboratory toxicity testing also implicitly assumes by omission that combined dietary and waterborne exposures of Cu would be no more toxic than waterborne Cu alone, because all the chronic tests were ostensibly water-only tests.In the field, organisms will be exposed to metals through diet and water.To at least some extent, so-called field validation, or ecological reality comparisons, give a sense of the reasonableness and representativeness of laboratory toxicity testing and criteria calculations in more natural settings (Burton et al., 2012;Chapman, 1995).
A simple field validation approach to evaluate whether a criterion would be protective and relevant in field conditions is to make comparisons with field studies or experiments. The absence of apparent effects in settings that never or seldom exceed criteria would suggest that the criterion is protective. Several field and experimental ecosystem/community tests for which approximate thresholds of effect were reported or could reasonably be estimated, and for which sufficient water chemistry was available to calculate MLR and BLM criteria values, are summarized in Table 4. Three types of study results were compared: (1) a long-term field study of streams recovering from historical Cu contamination; (2) studies that experimentally dosed natural streams with Cu; and (3) mesocosm studies. Each study type has strengths and limitations.
Field studies of recovering streams have the key strength of realism.The exposure durations are multigenerational, and any indirect effects such as altered prey food base for fish are incorporated.Limitations include the variability in exposures and natural variability in a system not under the physical control of the researchers.The absence of instream effects can never be proved; effects may be present that are obscured by natural variability and measurement error in field surveys.Furthermore, attributing apparent effects to the stressor of interest may be difficult because other unmeasured or correlated variables could be the cause.Thus biological effects in field surveys are qualified as "apparent" effects.Only one such field study with minimal confounding factors was identified.Big Deer Creek (ID, USA) is a mining-contaminated wilderness stream for which Cu was the dominant contaminant (Mebane et al., 2015).
Experimentally dosed natural streams have the same advantages as field studies, plus the exposures are manipulated by the researchers and Cu is the only added stressor. Limitations are that few treatments are feasible, so thresholds of effect and no effect may be poorly defined, and that such studies are difficult to conduct, rare, and decades old. Two such studies with Cu were identified. The first was a 3-year dosing of an entire stream (Shayler Run, OH, USA) with Cu, which included fish and invertebrate surveys complemented by streamside toxicity tests (Geckler et al., 1976). The second dosed an Eastern Sierra Nevada stream, Convict Creek (CA, USA), with Cu for almost 2 years; the investigations included fate and transport of Cu, stream ecosystem functions (primary productivity, metabolism, decomposition, etc.), and macroinvertebrate community structure (Leland & Carter, 1984, 1985; Leland et al., 1989).
Mesocosm experiments represent an important middle ground between laboratory toxicity tests and field studies. Those conducted with natural communities and food webs have an obvious connection to natural ecosystems and greater ecological realism than laboratory toxicity tests. Unlike field surveys, mesocosm experiments control for the effects of confounding variables, supporting causal relationships between stressors and responses (Buchwalter et al., 2017; Cadmus et al., 2016; Perceval et al., 2009). One limitation to the use of mesocosms for controlled tests of natural communities is exposure duration, which for "microcosm"-scale community tests is usually far shorter than the annual life cycle of many aquatic insects. Tank or captivity effects may also develop, whereby the experimental ecosystems diverge over time from their natural counterparts (Schmidt et al., 2018). In large, outdoor mesocosms with complex food webs that more closely mimic natural systems, variability and interactions can also start to mimic those of natural systems, making interpretation difficult. Differences have been reported between effects thresholds in mesocosm community studies and field aquatic invertebrate community responses, especially with short-term mesocosm experiments (Iwasaki et al., 2018). However, in longer exposures, mesocosms and field studies have shown similar patterns. For example, aquatic insect responses were similar between 30-day mesocosm exposures with Zn+Cd mixtures and biosurveys in streams polluted with Zn+Cd mixtures at similar ratios (Mebane et al., 2017). With these cautions in mind, 14 field or ecosystem studies were analyzed to compare the lowest adverse effect concentrations with criteria concentrations that are intended not to allow adverse effects in field or ecosystem conditions (Table 4). For the studies in natural streams, conditions were seasonally variable; for the criteria comparisons, effects and conditions during stable summer flows were selected. Some other published community studies with Cu were reviewed but not included in this comparison: Gardham et al. (2014), Hoang et al. (2011), Pascoe et al. (2000), and Shaw & Manning (1996), for which water chemistry was insufficient to calculate MLR-based criteria; Hickey & Golding (2002), in which Cu mixtures were tested in equitoxic ratios; and Meador et al. (1993), in which contrived communities were tested.
One study of ecosystem functional metrics in Convict Creek (Leland & Carter, 1985) showed adverse effects at a concentration lower than any of the hardness, BLM, or MLR criteria (Table 4, row 1). In that study, minor effects on periphyton or macroinvertebrate community structure were also observed at that low treatment (2.5 µg/L), with pronounced effects at 5 µg/L, which was also the MLR chronic criterion concentration for that condition. Overall, the MLR-based chronic criterion concentrations were higher than selected effects concentrations in 3 of the 15 field or ecosystem case studies summarized, the hardness-based chronic criterion in 4 of 15, and the BLM-based chronic criterion in 9 of the 15 comparisons. That the MLR-based criterion allowed adverse effects in three studies is not surprising, because two were with phytoplankton or periphyton, data types that are not included in the SSD from which the criterion's 95% protection level is derived (Stephan et al., 1985). The third less than fully protective example, loss of sensitive mayflies (Clements et al., 2013), is also not surprising, because mayflies have been under-represented in aquatic life criteria datasets owing to methodological challenges and restrictive definitions of acceptable data (Buchwalter et al., 2017).
These simple comparisons should not be taken too literally as "bright line" tests of protectiveness. Consider, for example, the test by Iwasaki et al. (2022; Table 4). Although the hardness-based chronic criterion was slightly lower than the test condition, mayflies were decimated, and it is unlikely that they would have been unaffected at the slightly lower criterion concentration. Similarly, the MLR-based criteria were only slightly above a concentration that caused only transitory phytoplankton effects in the study by Winner (1985). The BLM-based criteria concentrations that were higher than the MLR values suggest oversensitivity of that BLM version either to pH (Brix et al., 2021) or to DOC (discussed earlier in "The reactive DOM Issue" section).

Models and criteria considerations
Whereas the preceding analyses addressed the MLR and to a lesser extent the BLM and hardness model performance with different species, endpoints, and communities, now let us consider how the criteria would behave when they are applied.
Seasonal time series plots of the hardness, BLM, and MLR criteria for 10-year periods are contrasted for six data-rich streams in Figure 11. Although these six streams are all in California, their characteristics and environmental settings are diverse and not peculiar to California. Correlations among the criteria, and among the factors driving the criteria values, are shown in Figure 12. An important implication of these seasonal criteria behaviors for water quality management is the phase differences in the cyclical patterns. From the foregoing material and the performance comparisons shown by Brix et al. (2021), the MLR criteria are the closer representation of "truth" in predicting conditions expected to be nontoxic or toxic to the great majority (95%) of aquatic genera represented in the criteria datasets. Thus, in the present study, the MLR is considered the benchmark against which to compare the hardness and BLM criteria versions.
Each of the streams reflects a very different water chemistry profile that in turn drives different criteria patterns. The Sacramento River and the San Joaquin River have similarities as the major rivers draining the Central Valley of California, yet the San Joaquin River tends to have higher and more variable DOC concentrations, which are reflected in high MLR and BLM criteria values. The lowest troughs in the patterns occurred with the hardness criteria. The Merced River and the Marble Fork Kaweah River both have alpine headwaters in the High Sierra, but the Merced has very low hardness year-round; there, the seasonal criteria ups and downs are driven by the higher DOC in the spring runoff. The Marble Fork Kaweah River also experiences very low hardness during spring runoff. However, there, hardness has a strong influence on the MLR criteria because during baseflows the hardness increases to >80 mg/L, with pH increasing along with hardness. Sagehen Creek, in the subalpine northern portion of the Sierra Nevada range, has strong and regular seasonal patterns in criteria, DOC, hardness, and pH, with hardness and pH positively correlated and hardness and DOC negatively correlated. The Santa Ana River differs from the other streams in its more arid and urban southern California setting, with very high hardness and DOC. Its criteria values vary strongly and irregularly, presumably in relation to rainfall events (Figures 11 and 12).
Sagehen Creek and the Santa Ana River are dissimilar streams, but they share the fact that the hardness and MLR criteria are largely out of phase. Conditions for which the hardness-based criteria would indicate that a given Cu concentration would be minimally toxic are actually predicted to be the most toxic by the MLR-based criteria, and vice versa (Figure 11). In settings where they are out of phase with the MLR, the hardness-based criteria would be severely underprotective at some times of the year and overprotective at others. Even in a setting where the hardness and MLR criteria are in phase, such as the Marble Fork Kaweah River, the hardness criterion concentrations are much too high during baseflow, again indicating that the criteria would be underprotective in such a setting. In contrast, in streams such as the Merced and San Joaquin Rivers, in which hardness decreased and DOC increased during runoff, the hardness-based criteria would be overprotective at those times, relative to the MLR (Figure 11).

Summary of the relative performances of the hardness, MLR, and BLM models
Overall, the hardness-toxicity model underlying the hardness-based criteria (USEPA, 1985b, 1996) was unreliable for predicting toxicity in all settings evaluated. Depending on the interplay of DOC, hardness, and pH, the hardness-based criteria could be strongly underprotective, strongly overprotective, or only sometimes equally protective relative to the USEPA's protection goals for national criteria as described by Stephan et al. (1985). In sum, the hardness-based Cu toxicity model cannot be recommended for any purpose.
The MLR toxicity model reliably predicted toxic or nontoxic conditions for a wide variety of species and endpoints over a wide variety of water types (Figures 1–9). Because Brix et al. (2021) applied the model to a chronic SSD that they had assembled and interpreted following the steps of Stephan et al. (1985), they produced a criterion value (Equation 2) similar to what the USEPA itself would have produced by following that guidance with the same dataset. When the Brix et al. (2021) chronic criterion value was compared with an independently compiled but strongly overlapping chronic dataset, the comparison indicated that it provided the 95th percentile level of "community protection" (treating the chronic dataset in the context of a biological community) intended by the USEPA's national guidelines for deriving aquatic life criteria (Figure 10). Comparisons of the MLR-based chronic criterion with field and experimental ecosystem studies with Cu indicated that the MLR criterion was largely protective and fared better in these comparisons than did the hardness-based or BLM criteria.
Thus, the MLR criteria equations appear robust and suitable for general application within the range of conditions under which they were generated. This range would include 95% of expected water quality conditions in California (Supporting Information, Table S1) and in the United States overall (Brix et al., 2020, their Table 1). The MLR toxicity predictions are unreliable at pH below approximately 5.5 (Figure 3), which is outside the range over which the pH-toxicity relationship can be approximated with a linear function.
The acute Cu BLM that was adopted into the USEPA's (2007) aquatic life criteria has had further incremental updates and refinement (Brix et al., 2021;Environment Canada, 2021).Brix et al. (2021) demonstrated that the performance of their 2021 BLM version approached that of the MLR and performed well across diverse waters with diverse species.Their version improved on an earlier oversensitivity to pH.However, the specific BLM version incorporated into the USEPA (2007) criteria version did not have those enhancements and produces criteria values that could either be too high or too low relative to the MLR.
The mechanistic structure of the BLM gives it greater potential flexibility for site-specific use in unusual waters well outside the datasets used to develop the MLR (Mebane et al., 2020). Examples of waters in which the MLR is untested include waters in which hardness is dominated by Mg, waters with high hardness but low alkalinity (Van Genderen et al., 2007), and waters in direct proximity to some acid mine drainage (Dee et al., 2023). Conceptually, the refined BLM may be able to predict nonlinear pH-toxicity relations outside the pH 6 to 9 range that encompassed most of the MLR data. The BLM excels as a research tool in that it is flexible, is not as constrained to the training data as MLRs are, can be modified to address mixtures, and has good application in ecological risk assessment and other applied issues (Farley et al., 2015; Mebane et al., 2020).
Both the BLM and MLR approaches are appropriate tools for capturing important toxicity-modifying factors for the metals commonly of concern in manufacturing, mining, effluents, and runoff (Mebane et al., 2020; USEPA, 2022a, 2022b). However, for regulatory water quality criteria, the BLM approach has disadvantages in terms of transparency and resiliency over time. The present BLM software implementations and, in some cases, the underlying speciation models (current versions of the WHAM submodel, for example) may be the intellectual property of their developers (Lofts, 2012; Santore & Croteau, 2019). It may be difficult for public environmental managers to ensure the long-term availability of software-based BLMs, which would require a sustained commitment to maintaining and updating the software. Over time, updates to make the software interoperable with different and evolving computer operating systems could be needed, perhaps along with software testing and a help desk providing technical support for users. Adopting an equation into regulatory criteria avoids all those complications.

CRITERIA IMPLEMENTATION CONSIDERATIONS
Considering the state of the science, model performance, water quality goals to protect freshwater environments, USEPA policy directions, transparency, and simplicity, I conclude that the MLR is the best candidate model presently available for statewide criteria updates. For example, in California, updated BLM-based criteria could be a candidate site-specific criteria option for waters with conditions outside the central 95% of ambient conditions in California, that is, waters below the 2.5th or above the 97.5th percentiles of ambient DOC, hardness, and pH (Supporting Information, Table S1). Conditions within that central range are within the range of data used to calibrate the MLR. In particular, receiving waters whose critical conditions (in a discharge-permitting context) are below approximately pH 5.7 or above pH 9 would be outside the range of the data used to develop the MLR. The MLR model should be applied cautiously or not at all outside these bounds. In theory, because of the nonlinear behavior of pH and Cu toxicity and the constraints of a linear model, the most current BLM could give a better representation of actual toxicity relations in such waters.
Multiple-factor equations have been used in criteria since 1985, when the USEPA established ammonia criteria that applied nonlinear equilibrium calculations to predict un-ionized ammonia (the form considered toxic) as a function of pH and temperature (Emerson et al., 1975; USEPA, 1985a). With the USEPA (2018) updates to the aquatic life criteria for aluminum, the USEPA considered basing the criteria on either a BLM or an MLR, both with similar model performance. The USEPA (2018) selected the MLR approach over the BLM for practical, nonscience reasons: the MLR has greater computational transparency, similar performance, and fewer data requirements. In March 2022, the USEPA expanded this science policy preference for the MLR over the BLM from aluminum to trace metals generally, as an overarching approach adaptable to all criteria for metals (USEPA, 2022a).
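The ammonia precedent can be illustrated with the Emerson et al. (1975) relation for the un-ionized fraction of total ammonia, f = 1/(1 + 10^(pKa − pH)), with pKa = 0.09018 + 2729.92/T and T in kelvin. The minimal sketch below assumes that form as it is commonly quoted; it shows only the speciation step, not the full USEPA (1985a) criterion calculation, and the function name is hypothetical.

```python
def unionized_ammonia_fraction(ph: float, temp_c: float) -> float:
    """Hypothetical helper: fraction of total ammonia present as un-ionized NH3.

    Uses the Emerson et al. (1975) temperature-dependent pKa,
    pKa = 0.09018 + 2729.92 / T (T in kelvin = 273.2 + degrees C),
    and the acid-base equilibrium f = 1 / (1 + 10**(pKa - pH)).
    """
    t_kelvin = 273.2 + temp_c
    pka = 0.09018 + 2729.92 / t_kelvin
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Nonlinear in both pH and temperature: the un-ionized share rises steeply with pH
    for ph in (6.5, 7.5, 8.5):
        for temp_c in (5.0, 25.0):
            f = unionized_ammonia_fraction(ph, temp_c)
            print(f"pH {ph}, {temp_c:.0f} C: un-ionized fraction = {f:.4f}")
```

The point of the example is the one the text makes: a criterion can embed a nonlinear, multifactor equation and still be applied with a short, transparent calculation.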
Changing from long-standing hardness-based Cu criteria to criteria requiring hardness plus DOC and pH as monitoring inputs can be challenging for discharge permittees and permit administrators alike. Adopting updated criteria on a tiered basis may be an approach that avoids burdening small dischargers while keeping flexibility for implementation with major dischargers. A tiered implementation could be devised in which conservative default MLR conditions are used to screen out facilities with low Cu concentrations in their discharges without the need to monitor the additional parameters (pH and DOC). For example, sewage effluent may have elevated Cu from household pipes and other sources in addition to elevated DOC and pH, relative to upstream receiving waters (Breault et al., 1996; Sarathy & Allen, 2005). Thus large municipal wastewater facility operators might embrace the additional complexity of the MLR, with its DOC and pH requirements, if they felt that their effluent permits would then appropriately incorporate the characteristics of their discharges.
Tiered criteria implementation, in which conservative assumed values for toxicity-modifying factors are applied if reliable measurements are unavailable, is one way to adopt updated criteria without unnecessarily burdening dischargers with low Cu concentrations in their discharges. Such defaults also sidestep measurement difficulties: data quality problems from filtration, cross-contamination, and sample storage are not much of a concern for hardness measurements but are for DOC, and pH sensors require exacting calibration and may be finicky. The vast majority of streams in California have Cu concentrations lower than MLR-based criteria (Supporting Information). A conservative, but not extreme, default condition could be to compare monitoring data against the MLR criteria calculated with 1 mg/L DOC, pH 7, and actual hardness values. Then, if environmental Cu concentrations are projected to exceed criteria calculated with these default values, actual DOC, hardness, and pH measurements under representative hydrologic conditions would be needed.
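A tiered screen of the kind described above can be sketched as follows. The sketch assumes the chronic MLR equation of Brix et al. (2021) as given in this article (coefficients 0.855, 0.221, and 0.216, with the intercept taken as −1.402 because the sign is lost in the extracted equation), the default DOC of 1 mg/L and pH of 7 suggested in the text, and measured hardness. The function names, return strings, and second-tier logic are hypothetical illustrations, not a regulatory procedure.

```python
import math
from typing import Optional

# Hypothetical tier-1 defaults from the text: DOC = 1 mg/L and pH = 7, with measured hardness
DEFAULT_DOC_MG_L = 1.0
DEFAULT_PH = 7.0


def mlr_chronic_cu(doc_mg_l: float, hardness_mg_l: float, ph: float) -> float:
    """Chronic Cu MLR criterion (ug/L); intercept sign (-1.402) is an assumption."""
    return math.exp(0.855 * math.log(doc_mg_l) + 0.221 * math.log(hardness_mg_l)
                    + 0.216 * ph - 1.402)


def tier1_screen(cu_ug_l: float, hardness_mg_l: float) -> bool:
    """True if Cu is at or below the criterion computed with default DOC and pH."""
    return cu_ug_l <= mlr_chronic_cu(DEFAULT_DOC_MG_L, hardness_mg_l, DEFAULT_PH)


def tiered_evaluation(cu_ug_l: float, hardness_mg_l: float,
                      site_doc_mg_l: Optional[float] = None,
                      site_ph: Optional[float] = None) -> str:
    """Hypothetical two-tier logic: default screen first, site chemistry only if needed."""
    if tier1_screen(cu_ug_l, hardness_mg_l):
        return "screened out with default DOC and pH"
    if site_doc_mg_l is None or site_ph is None:
        return "site DOC and pH needed"
    site_criterion = mlr_chronic_cu(site_doc_mg_l, hardness_mg_l, site_ph)
    if cu_ug_l <= site_criterion:
        return "meets site-specific MLR criterion"
    return "exceeds site-specific MLR criterion"


if __name__ == "__main__":
    print(f"default criterion at 50 mg/L hardness: {mlr_chronic_cu(1.0, 50.0, 7.0):.2f} ug/L")
    print(tiered_evaluation(1.2, 50.0))
    print(tiered_evaluation(4.0, 50.0))
    print(tiered_evaluation(4.0, 50.0, site_doc_mg_l=3.0, site_ph=7.5))
```

At a hardness of 50 mg/L the default-condition criterion is about 2.65 µg/L under these assumptions, so a sample with 1.2 µg/L Cu would screen out, whereas 4.0 µg/L would trigger collection of site DOC and pH; with hypothetical site values of 3 mg/L DOC and pH 7.5 the site criterion rises to about 7.6 µg/L.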
Finally, the USEPA's water quality standards regulation provides three ways for states and tribes to set numeric aquatic life criteria: (1) adopt the USEPA's current criteria guidance, such as the USEPA (2007) Cu criteria; (2) modify the USEPA's criteria to reflect site-specific conditions; or (3) use "other scientifically defensible methods" (USEPA, 2021). The body of work with the Cu MLR-based aquatic life criterion-like values establishes the scientific defensibility of the Cu MLR method for setting criteria (Brix et al., 2017, 2020, 2021; USEPA, 2022a). The independent vetting of the Cu MLR-based aquatic life criteria values in the present study shows that the specific data compilations and reduction by Brix et al. (2021), following the procedures of Stephan et al. (1985), produced "criteria" values that also appear largely protective against olfactory toxicity and in field and experimental ecosystem settings. Together, the data from this body of work would seem to provide sufficient technical justification that the Cu MLR-based criteria equations in their present form may be considered an acceptable "other scientifically defensible method" for Cu aquatic life criteria revisions.
Supporting Information: The Supporting Information is available on the Wiley Online Library at https://doi.org/10.1002/etc.5736. This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results. The data are available at https://doi.org/10.6084/m9.figshare.19808668. Learn more about the Open Practices badges from the Center for Open Science: https://osf.io/tvyxz/wiki.
Data Availability Statement: Supporting data and calculation tools used to support the interpretations in this article are available from https://doi.org/10.6084/m9.figshare.19808668.
Copies of obscure gray literature references may be requested from the author (cmebane@usgs.gov).
$$\text{Chronic Cu criterion }(\mu\text{g/L}) = \exp\left[0.855\ln(\text{DOC, mg/L}) + 0.221\ln(\text{hardness, mg/L}) + 0.216\,\text{pH} - 1.402\right]$$
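Because every term of the chronic criterion sits inside an exponential, each water-quality factor scales the criterion multiplicatively. A short worked consequence, assuming the equation has the form C = exp(0.855 ln DOC + 0.221 ln H + 0.216 pH - 1.402), where C is the chronic criterion and H is hardness, and holding the other two factors fixed:

$$
\begin{aligned}
\frac{C(2\,\mathrm{DOC})}{C(\mathrm{DOC})} &= 2^{0.855} \approx 1.81\\
\frac{C(2H)}{C(H)} &= 2^{0.221} \approx 1.17\\
\frac{C(\mathrm{pH}+1)}{C(\mathrm{pH})} &= e^{0.216} \approx 1.24
\end{aligned}
$$

Under this form, doubling DOC raises the criterion by roughly 80%, whereas doubling hardness raises it by less than 20%, which helps explain why a criterion driven by hardness alone can diverge from the MLR version when DOC varies.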

FIGURE 1: Fathead minnow toxicity from copper in soft and hard water, using linear regressions of hardness, BLM predicted values, and MLR predicted values as predictors of observed toxicity. (A) Hardness-toxicity correlations. (B) Predicted toxicity using the hardness-toxicity single linear regression equation from (A); hardness-predicted toxicity values <0 were plotted as 1 µg/L. (C) Observed toxicity as a function of BLM predicted toxicity; these relationships were much stronger than the hardness regressions but were systematically biased toward underpredicting actual toxicity in soft water. (D) Observed toxicity as a function of MLR predictions, which were the most accurate. BLM = biotic ligand model; LC50 = median lethal concentration; MLR = multiple-linear regression model.

(1) large datasets testing the acute responses of the cladoceran Ceriodaphnia dubia in natural and urban waters; (2) chronic responses in laboratory waters; (3) acute responses of the amphipod Hyalella azteca in natural and manipulated laboratory waters; and (4) acute and chronic responses of freshwater mussels to Cu in a wide variety of natural and laboratory waters.
Freshwater mussels. The performance of the MLR was evaluated with four freshwater mussel datasets. Wang et al. (2009) tested the acute toxicity of Cu to juvenile fatmucket mussels (Lampsilis siliquoidea) under varying DOC and hardness conditions, and Wang et al. (2011) tested the acute and chronic toxicity of Cu to juveniles of the unionid mussel Villosa iris under varying DOC concentrations. The observed:predicted plots are very tight for all three datasets (Figure 7). The large fatmucket dataset had an observed:predicted slope near 1, and both of the Villosa iris tests had slopes <1, indicating that the model predicted a somewhat greater reduction in toxicity than was observed. The two studies used different DOC sources. Wang et al. (2009) used dilutions of natural pond water for their variable DOC tests, whereas Wang et al. (

FIGURE 4: Performance of MLR and BLM predictions across a DOC gradient with rainbow trout. Data from Welsh et al. (2008). BLM = biotic ligand model; DOC = dissolved organic carbon; LC50 = median lethal concentration; MLR = multiple-linear regression model.

FIGURE 5: Ceriodaphnia dubia toxicity as a function of hardness, DOC, and MLR predictions tested in natural waters from forested watersheds. (A) Acute C. dubia toxicity as a function of water hardness. (B) Acute C. dubia toxicity as a function of DOC. (C) Acute C. dubia toxicity as a function of the MLR model. (D) Chronic C. dubia toxicity as a function of the MLR model. Data sources: acute data in natural streams from Michigan's Upper Peninsula (UP; Linton et al., 2006); acute data in laboratory water and chronic data in mixed laboratory and natural waters (Schwartz & Vigneault, 2007; Wang et al., 2011). DOC = dissolved organic carbon; LC50 = median lethal concentration; MLR = multiple linear regression.

FIGURE 6: Ceriodaphnia dubia toxicity as a function of MLR predicted toxicity or hardness, with waters collected from the urban Los Angeles River (CA) watershed. (A) Acute C. dubia toxicity as a function of MLR predictions. (B) Acute C. dubia toxicity as a function of hardness. Source: LWA (2014). LC50 = median lethal concentration; MLR = multiple linear regression.

FIGURE 8: Olfactory toxicity to coho salmon and avoidance responses of rainbow trout and Chinook salmon to copper, compared with MLR acute toxicity predictions. Coho salmon electro-olfactogram (EOG) inhibition data from McIntyre et al. (2008a, 2008b) as recalculated into EC20s by Meyer and Adams (2010). EC20 = 20% effective concentration; MLR = multiple linear regression.

FIGURE 10: Species sensitivity distribution (SSD) for species mean chronic responses (20% effective concentrations [EC20s]) showing levels of protection to aquatic communities under different DOC and hardness conditions. The pH is 7.5 in all scenarios. Species plotting to the right of (higher than) the vertical chronic criteria lines are considered protected by that criterion; species to the left may not be protected. Under some conditions (B and C) the hardness- and MLR-based criteria are equivalent, but under low DOC and high hardness (A) the hardness-based criterion was 3× underprotective, and under low DOC and low hardness (D) the hardness-based criterion was 3× overprotective relative to the MLR-based version. DOC = dissolved organic carbon; MLR = multiple linear regression. Source: Mebane et al. (2020).
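The over- and underprotection contrasts in Figure 10 amount to dividing the hardness-based criterion by the MLR-based criterion for the same water. A minimal sketch, assuming the USEPA national hardness-based chronic Cu equation exp(0.8545 ln H - 1.702) multiplied by a 0.960 dissolved conversion factor (coefficients as we understand them; check the current USEPA criteria table before use) and an MLR of the form exp(0.855 ln DOC + 0.221 ln H + 0.216 pH - 1.402) with assumed operator signs. Function names and scenario values are hypothetical and do not reproduce the figure's scenarios.

```python
import math


def hardness_chronic_cu(hardness_mg_l: float) -> float:
    """Hardness-based chronic Cu criterion, ug/L dissolved (coefficients assumed)."""
    return math.exp(0.8545 * math.log(hardness_mg_l) - 1.702) * 0.960


def mlr_chronic_cu(doc_mg_l: float, hardness_mg_l: float, ph: float) -> float:
    """MLR chronic Cu criterion, ug/L (operator signs are an assumption)."""
    return math.exp(0.855 * math.log(doc_mg_l) + 0.221 * math.log(hardness_mg_l)
                    + 0.216 * ph - 1.402)


def protection_ratio(doc_mg_l: float, hardness_mg_l: float, ph: float = 7.5) -> float:
    """Hardness-based criterion divided by MLR-based criterion.

    > 1: hardness-based criterion is higher (underprotective relative to the MLR).
    < 1: hardness-based criterion is lower (overprotective relative to the MLR).
    """
    return hardness_chronic_cu(hardness_mg_l) / mlr_chronic_cu(doc_mg_l, hardness_mg_l, ph)


# Hypothetical waters at pH 7.5: low DOC/high hardness, then high DOC/low hardness.
for doc, hardness in [(0.5, 200.0), (10.0, 20.0)]:
    print(doc, hardness, round(protection_ratio(doc, hardness), 2))
```

Because hardness has a much larger exponent in the hardness-only equation than in the MLR, while DOC is absent from it, waters with little DOC relative to their hardness push the ratio above 1 and DOC-rich soft waters push it below 1.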

FIGURE 11: Hardness- versus BLM- versus MLR-based chronic criteria patterns in six California streams, ranging from low-pH, low-DOC alpine streams (A and B) to large lowland streams (C-F). The MLR and BLM patterns are in phase with each other, whereas the hardness-based criteria are often completely out of phase. BLM = biotic ligand model; DOC = dissolved organic carbon; MLR = multiple linear regression; USGS = US Geological Survey.

FIGURE 12: (A) Pearson correlations between hardness- and MLR-based chronic criteria and between BLM- and MLR-based criteria in six California streams. (B) Correlations between factors driving those criteria patterns: hardness versus pH and hardness versus DOC. BLM = biotic ligand model; DOC = dissolved organic carbon; MLR = multiple linear regression.

TABLE 1: Example Cu water-effect ratio calculations using natural laboratory waters as "site waters" and reconstituted laboratory water recipes, with the biotic ligand model or multiple linear regression model as predictions of Cu toxicity to Ceriodaphnia dubia. Laboratory water characteristics from the USEPA (2007, Table
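A water-effect ratio is the toxicity value (for example, an EC50) in site water divided by the toxicity value in laboratory water, and the site-specific criterion is the national criterion multiplied by that ratio. This minimal sketch uses hypothetical numbers to show the sensitivity Table 1 appears to examine: the same site-water result yields different WERs depending on which laboratory water supplies the denominator. The function names are hypothetical, and official USEPA WER procedures may add steps (such as adjusting the laboratory-water value to site hardness) that are not shown.

```python
def water_effect_ratio(site_ec50: float, lab_ec50: float) -> float:
    """WER = toxicity value in site water / toxicity value in laboratory water."""
    return site_ec50 / lab_ec50


def site_specific_criterion(national_criterion: float, wer: float) -> float:
    """Scale a national criterion by the WER."""
    return national_criterion * wer


# Hypothetical EC50s (ug/L): one site water tested against two laboratory recipes.
site_ec50 = 30.0
lab_ec50s = {"lab recipe A": 10.0, "lab recipe B": 5.0}
national_criterion = 2.0  # hypothetical criterion value, ug/L

for recipe, lab_ec50 in lab_ec50s.items():
    wer = water_effect_ratio(site_ec50, lab_ec50)
    print(recipe, wer, site_specific_criterion(national_criterion, wer))
```

Nothing about the site water changes between the two lines of output; only the choice of laboratory water does, which is one way a WER can be biased independently of site conditions.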

TABLE 2: Comparison of free-ion copper predicted by the biotic ligand model and experimentally determined by cupric ion selective electrode. BLM = biotic ligand model; Cu-ISE = cupric ion selective electrode; DOC = dissolved organic carbon; WHAM = Windermere Humic Aqueous Model.

TABLE 3: Behavioral avoidance responses or chemosensory impairment (20% effect concentrations) in rainbow trout, Chinook salmon, and coho salmon relative to hardness-, BLM-, or MLR-based acute criteria values, except tests noted as having >4 days of exposure, which are compared with chronic criteria.