Toxicological effects assessment for wildlife in the 21st century: Review of current methods and recommendations for a path forward

This article is part of the special series from the SETAC workshop “Wildlife Risk Assessment in the 21st Century: Integrating Advancements in Ecology, Toxicology, and Conservation.” The series presents contributions from a multidisciplinary, multistakeholder team providing examples of applications of emerging science focused on improving processes and estimates of risk for assessments of chemical exposures for terrestrial wildlife. Examples are considered relative to applications within an expanding risk assessment paradigm where improvements are suggested in decision-making and bridging various levels of biological organization.


INTRODUCTION
Ecological risk assessment (ERA) evaluates the likelihood that adverse effects may occur or are occurring in nonhuman organisms, populations, and ecosystems as a result of exposure to one or more stressors (Suter, 2007). Depending on the scope and problem formulation of the ERA, it may encompass the characterization of exposure and adverse effects evoked by a chemical or group of chemicals in a broad range of species, including amphibians, reptiles, birds, and mammals (viz., wildlife; Rattner, 2009). Adverse effect measurement endpoints commonly range from toxicological responses at the molecular through organism level but can extend to ecological consequences. The ERA process is widely used for prospective and retrospective evaluation of hazard and risk to support decisions on anthropogenic uses of chemicals that may enter the environment and for retrospective assessments that principally focus on hazard and remediation of accidental chemical discharges or contaminated sites.
Measures of adverse effects, including survival, growth, and reproduction, are central to both prospective and retrospective terrestrial wildlife ERAs where the scope and problem formulation include the terrestrial ecosystem. Since the 1960s, deterministic methods involving hazard quotients (e.g., daily oral exposure divided by the toxicity threshold or reference value) have been used by many regulatory bodies, with emphasis on problem formulation, uncertainty, and risk probabilities being incorporated into the ERA by the close of the 20th century (Fairbrother, 2019). Despite advances in methods to describe dose-response relationships, development of toxicity reference values (TRVs), and predictions of responses of ecological processes (population dynamics, community biodiversity) across geographic scales, the ERA framework has advanced slowly for wildlife. However, a recent update to the European Food Safety Authority's (EFSA) guidance for bird and mammal risk assessment does theoretically capture some of the new advances and technologies (EFSA, 2009, 2023). "Model" test species may not share life history traits with the ecologically relevant target species of stakeholder concern (Segner & Baumann, 2016), and toxicity data extrapolations are often addressed through what appear to be somewhat arbitrary uncertainty factors. Thus, sometimes the "ecology" is seemingly missing from "ecotoxicology."
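The deterministic hazard quotient mentioned above (daily oral exposure divided by a toxicity threshold or reference value) can be illustrated with a minimal Python sketch. The function name and all numerical values are hypothetical and are not drawn from any guideline; the use of 1 as a screening trigger is a common convention assumed here for illustration, not a universal rule.

```python
def hazard_quotient(daily_dose_mg_per_kg_bw: float, trv_mg_per_kg_bw: float) -> float:
    """Return the ratio of daily oral exposure to a toxicity reference value (TRV).

    Both arguments are assumed to share the same units (mg/kg body weight/day).
    """
    if trv_mg_per_kg_bw <= 0:
        raise ValueError("The toxicity reference value must be positive.")
    return daily_dose_mg_per_kg_bw / trv_mg_per_kg_bw


# Hypothetical example: an estimated exposure of 2 mg/kg bw/day and a TRV of 10 mg/kg bw/day.
hq = hazard_quotient(2.0, 10.0)
# Assumed screening convention: quotients of 1 or more are flagged for further evaluation.
flagged = hq >= 1.0
print(f"HQ = {hq:.2f}, flagged for further evaluation: {flagged}")
```

A single ratio of this kind carries no information about uncertainty or probability of effect, which is the limitation that later probabilistic approaches sought to address.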
With many regulatory agencies seeking to reduce, replace, or augment vertebrate animal tests with in chemico, in vitro, and in silico technologies (i.e., new approach methodologies [NAMs]) as potentially more efficient, predictive, and economical animal alternatives to inform hazard and risk assessments (European Commission [EC], 2007; National Research Council, 2007), ERAs focused on wildlife may be at a crossroads. While "omics" (e.g., SeqAPASS, LaLone et al., 2016; EcoToxChip, Basu et al., 2019) and other NAMs (e.g., CATMoS [Collaborative Acute Toxicity Modeling Suite], Mansouri et al., 2021) hold great promise for screening for chemical hazards and predicting interspecific differences, their application to wildlife ERAs is currently limited (e.g., effects of aryl hydrocarbon receptor active compounds for birds). Currently, the NAMs available for wildlife risk assessment (and ERA more generally) are less advanced than they are for human health risk assessment (Miccoli et al., 2022; Sewell et al., 2021; Stucki et al., 2022; United States Environmental Protection Agency [USEPA], 2021; Van der Zalm et al., 2022; Wolf et al., 2022). Protection goals for wildlife are often at the population level (e.g., EFSA, 2009), while to date many of the NAM activities are targeted for human health and therefore are applied at the individual level. A significant challenge is the translation and extrapolation of NAM endpoints to hazards and risks for whole animals and perhaps higher levels of biological organization for ERA (USEPA, 2021, 2022a; Rattner et al., 2023). Thus, for the foreseeable future, it is likely that ERAs focused on terrestrial wildlife will continue to depend upon data derived from animal testing and field studies, with a gradual infusion of omics, chemical groupings based on mode-of-action, computational models for species sensitivity distributions (SSDs; USEPA, 2022b), extrapolation-based tools such as the USEPA's WEB-ICE (Interspecies Correlation Estimation; Raimondo et al., 2015), and other NAMs during the discovery process for new chemicals and new active ingredients, by providing additional evidence to better extrapolate in vivo data across multiple species.
As part of a 2021 SETAC Technical Workshop on advancements that might enhance ERA for terrestrial wildlife (focusing on air-breathing species), an "effects assessment workgroup" focused on the suitability of current methods and animal models and how promising new technologies could be integrated into risk assessment frameworks. Herein we review, critique, and identify sources of uncertainty and opportunities to improve various animal standardized tests and nonguideline test evaluations that generate toxicity data for wildlife ERAs. This information, in concert with a companion paper on new and promising technologies (Rattner et al., 2023), aims to help lead the field in a direction that accommodates the 3Rs (Reduction, Refinement, and Replacement), including the use of NAMs (e.g., tools and models), while recognizing existing animal-based frameworks that have been developed over decades and will be the foundation of ERA in the near-term. In time, an enhanced understanding of mechanisms of action, with toxicokinetic, toxicodynamic, and bioinformatic predictions, will reduce both animal use and uncertainty to better inform ERAs and ultimately benefit wildlife populations and their supporting habitats. In this article, we discuss:
• Potential use of nonguideline wildlife toxicology effect studies in ERA
• Shortcomings of existing guidelines and animal models used for toxicological effects assessment for wildlife
• Knowledge gaps relating to risk assessments for taxa that may not be covered by the current animal models
• Value of investing some resources to improve current in vivo protocols described in standard guidelines despite the desire to move away from vertebrate testing
• The need for clear guidance on conducting wildlife field effect studies
• Potential improvements in the tools available for toxicological effect assessment in wildlife

Nonguideline tests
The potential use of nonguideline studies in risk assessment introduces the challenge of justifying which studies are included and which are excluded. In some circumstances, nonguideline studies may help to fill data gaps and support more robust wildlife risk assessments. Data from the scientific literature (and its quality) may be considered for pesticide renewals and evaluation of existing industrial chemicals (e.g., study reliability evaluated through Klimisch scoring, Klimisch et al., 1997; systematic review processes, Moermond et al., 2016; use of other narrative evidence integration techniques, Lent et al., 2021), and this approach can yield more robust risk analyses compared to sole reliance on guideline studies.
In the European Union (EU), literature reviews are a requirement for active substances under Regulation (EC) 1107/2009 (EC, 2009). The EFSA also provides a guidance document on how the literature review should be conducted, including the search, selection of relevant data, and its presentation in the dossier (EFSA, 2011). The literature review is only likely to yield additional vertebrate data when an active substance is up for renewal, as new active substances will not have vertebrate data outside of the regulatory databases at the time of submission.
Knowledge collection solutions such as ECOTOXicology Knowledgebase (initially developed by the USEPA in the 1980s) and adverse outcome pathways (Ankley et al., 2010; Delrue et al., 2016; discussed in detail in a companion paper, Rattner et al., 2023) also serve to support more robust risk assessments. The latest version of ECOTOX Knowledgebase (Ver. 5, www.epa.gov/ecotox) uses systematic methods for literature search, review, and data curation, providing ecotoxicity data for over 12 000 chemicals from studies using model test species and nonstandard species, including terrestrial wildlife (Olker et al., 2022).
Another means by which data from nonguideline studies can be incorporated into risk assessments is through adverse effect reporting. For example, if data showing adverse effects become available (e.g., a toxic response occurs at a lower dose or concentration than previously reported from a current guideline study in the same or a similar species), then these data would have to be reported immediately to a regulator in the United States (under the Federal Insecticide, Fungicide, and Rodenticide Act 6(a)(2)) for consideration. Thus, new data can be considered outside the regular renewal cycle (e.g., 10-15 years), and it is possible for nonregulatory studies and incident reports to be included (see Supporting Information).
Compared to prospective risk assessments for new chemicals, retrospective wildlife risk assessments often have more flexibility in their use of data from nonstandardized studies (e.g., contaminated land and spill events under Natural Resource Damage Assessment [USEPA, 2022a]; Comprehensive Environmental Response, Compensation, and Liability Act [USEPA, 2022c]; Federal Contaminated Site Action Plan [Environment Canada, 2010, 2013]; and existing chemical risk evaluations under the Toxic Substances Control Act [USEPA, 2022d]). Risk assessments for contaminated land regularly rely on repurposing data from nonstandardized studies in peer-reviewed journal articles to develop TRVs, although most of these investigations were not designed for ERA purposes. Such studies may be assessments of poultry productivity for birds and rat or mink reproduction for mammals. The flexibility of retrospective risk assessments can allow for the incorporation of data from studies that are designed with a specific problem or geography in mind. While flexibility to use data is an advantage of retrospective risk assessment, the absence of standardized requirements for vertebrate data generation can also be an issue when significant data gaps and uncertainties are apparent. When data are repurposed, the objectives of the original study may align only partially with those of a risk assessment. For example, avian toxicity data are lacking for antimony even though it is a widespread contaminant, and chromium toxicity in birds is often assessed in the United States based on an unpublished and incompletely documented reproductive study (Haseltine et al., 1983).

Guideline tests
For prospective risk assessments, the data requirements are often formalized in some applicable regulations and rely on standardized guideline studies (e.g., Federal Insecticide, Fungicide, and Rodenticide Act, 1996; EU Commission Regulation numbers 1107/2009, 283/2013, and 284/2013; EC, 2009, 2013a, 2013b; Government of Canada, 2005; Health Canada Pest Management Regulatory Agency, 2016). Currently, for product approval of veterinary and human pharmaceuticals in the EU and United States (e.g., European Medicines Agency, 2006, 2016; United States Food and Drug Administration [FDA], 1998a, 1998b, 2016; Veterinary International Conference on Harmonization, 2000, 2006), industries are not routinely required to generate toxicological effects data for wildlife species, although a wealth of mammalian data are usually available from preclinical trials (reviewed in Bean et al. [2022]; see Supporting Information for an overview of pharmaceutical effects assessment for wildlife).
Many of the regulatory frameworks for wildlife have population-level protection goals, and thus individual-level apical endpoints must be extrapolated from laboratory to field and from the individual level to the population level. Some regulatory frameworks are designed to protect at the individual level (e.g., endangered species in the United States; USEPA, 2022e), so apical hazard endpoints generated in the laboratory at the individual level in a surrogate model species can be extrapolated across species and to the field without additional extrapolation to population-level effects. The diverse physiologies of the many species to which the effects data from the model species must be extrapolated make it difficult to strike a balance between sufficient protection for all species and being overly conservative.
Toxicity data for pesticide and industrial chemical wildlife risk assessments in Europe, North America, and other regions are principally generated following guideline tests of the Organisation for Economic Co-operation and Development (OECD) or the USEPA that are designed to assess effects on birds (Tables 1 and 2) and mammals (Table 3). Studies conducted following OECD guidelines and in accordance with Good Laboratory Practice are mutually accepted by the 38 OECD member states (e.g., Organization for Economic Co-operation and Development [OECD], 2022) and some nonmembers that adhere to mutual acceptance of data. Vertebrate hazard assessments are often based on acute oral exposure (single or multiple doses in a 24-h period), subacute exposure (five-day dietary exposure followed by at least three days postexposure observation), and chronic and/or subchronic "long-term" exposure (e.g., avian reproduction, rat two-generation), with short exposures by diet or gavage and chronic exposures by diet or drinking water.
Current avian guideline studies are predominantly conducted with Galliform and Anseriform species (e.g., quail, chicken, mallard), although in 2007 an acute passerine test was added to USEPA pesticide data requirements (40 CFR Part 158; USEPA, 2014). Effects data for wild mammals are typically derived from standardized studies conducted for human toxicological assessments in rats, mice, and rabbits. Such studies examine effects at the individual level (i.e., apical endpoints) and below (i.e., cellular- and tissue-level endpoints) in animal models that are well suited to laboratory studies. To reduce biological variability, these studies rely on well-defined husbandry protocols and robust historical control data sets. At present, effects of dermal or inhalation exposure are generally not assessed in wildlife, although an acute inhalation test is available for microbial pesticides (Office of Prevention, Pesticides and Toxic Substances [OPPTS] 885.4100; USEPA, 1996b), and some dermal and inhalation toxicity values are available as well (e.g., dermal exposure of pesticides for birds: Hudson et al., 1979; respiratory exposure to volatile organics in burrowing mammals: Gallegos et al., 2007).

Methodological issues of protocols based on current test guidelines for effects assessment in wildlife toxicology
The advantages and disadvantages of existing test protocols for toxicological effects assessment in wildlife are outlined in Table 4. Among their advantages are that protocols developed based on existing guidelines for laboratory testing are well-established for use within the current risk assessment frameworks, and the animal models used in the guideline studies are well-suited to testing in the laboratory. The test guidelines focus on apical endpoints at the organismal level, which is advantageous for extrapolation to population-level protection goals (i.e., fewer assumptions are required when extrapolating across biological levels of organization). In contrast, when extrapolating effects from studies at the molecular, cellular, tissue, or organ level to final consequences on population dynamics, uncertainty is recognized as being greater.
The current OECD 206 avian chronic test guideline has received criticism for its lack of biological relevance (e.g., absence of parental care during egg laying, incubation, and chick rearing period; Mineau, 2005). There are also issues in statistical power among the 15 endpoints that are typically evaluated due to differences in variance; small percentage differences from the control can be detected for some endpoints (e.g., noncracked eggs), whereas only large differences can be detected for others (e.g., adult male or female body weight gain over the course of the study) (see Green et al. [2022] for a detailed discussion). Furthermore, surrogate test species are chosen for convenience, often have limited genetic variability, may have different sensitivity than the species of interest, and may even fail to exhibit the full spectrum of toxicological responses. Also, both the bioaccessibility and bioavailability of the administered dose may not be relevant to environmental conditions. Tissue analysis and mechanistic data collection are not generally conducted under standard ecotoxicology effects assessment test guidelines for wildlife (listed in Office of Chemical Safety and Pollution Prevention [OCSPP] 850.2300 as an option but not a requirement; USEPA, 2012c), which is understandable from the perspective of efficiency but presents some limitations in study utility. For example, field monitoring often involves tissue analyses, but such results are not interpretable in the absence of tissue residue-effect data.
The absence of tissue residue data in ecotoxicology guideline studies also limits the utility of physiologically based kinetic models (e.g., Baier et al., 2022). The absence of mechanistic data from guideline ecotoxicology studies for terrestrial vertebrates maintains the regulatory focus on apical effects, but it can make it difficult to determine if effects near the minimum detectable difference (see Green et al., 2022) [...] of normal ranges of responses and minimum detectable differences (Green et al., 2022; Valverde-Garcia et al., 2018). Historical control data should be used as part of a holistic evaluation of studies and can help address issues such as outlier identification, determining minimum detectable differences for endpoints, and evaluating whether informal statistical reasoning is appropriate (e.g., employing the lowest observable adverse effect concentration that is not statistically significant but produces a 10% reduction compared to the control group). It is also possible that statistically significant differences between treatments in apical organism-level endpoints in laboratory animal studies may have little relevance to actual thresholds for adverse effects if the observations lie within the normal range for the test species. Modeling of the complete dose-response curve (e.g., benchmark dose [BMD] methods) is a suggested alternative (Sample et al., 2022). However, chronic studies typically have only three treatment groups plus a control group, and the treatment levels are not necessarily geometrically spaced, so such designs are not well suited to BMD methods (discussed in more detail below; see Green et al. [2022]).
Table 4. Advantages and disadvantages of existing test protocols for toxicological effects assessment in wildlife
Advantages:
• Existing animal models tolerate the stress of the laboratory environment.
• Testing protocols are well established for use in the laboratory.
• Extensive data available on responses of existing animal models to a wide range of chemicals and contaminants.
• Clinical chemistry parameters and mammalian toxicity data are generated as part of the human health assessment.
• Validity criteria for a regulatory acceptable study are clearly defined.
• Controlled design can establish cause-effect relationships.
Disadvantages:
• Food avoidance and spillage in dietary studies challenge accurate dose quantification.
• Bioavailability and bioaccessibility of the administered dose may not be environmentally realistic.
• Generally, no quantification of internal exposure or dose.
• Little or no investigation of underlying mechanisms of toxicity that could potentially be used as biomarkers in field studies.
• Chronic studies only have three groups plus a control, which is not well suited to benchmark dose models.
• Rarely have sufficient data for species sensitivity distributions or geometric mean across species.
• Absence of testing protocols for air-breathing amphibians and reptiles necessitates extrapolation of data from fish, birds, and mammals.
• Uncertainty in extrapolation across species and from laboratory to field.
• Calculation of dose in mg/kg BW/day can be unreliable due to variations in food ingestion and body weight.

TOXICOLOGICAL EFFECTS ASSESSMENT FOR WILDLIFE. Integr Environ Assess Manag 20, 2024

In the laboratory, the dose is administered and quantified (or expressed) based on a selected exposure route, and each method has advantages and disadvantages. Oral gavage, typically used in acute studies and some repetitive exposure studies, has the advantage of accurately quantifying oral exposure on a mg chemical/kg body weight basis. However, oral gavage can introduce the "bolus effect" that can influence absorption, distribution, metabolism, and excretion in some species in ways that may not be realistic under environmental conditions. Gavage may initially result in higher peak internal concentrations (e.g., blood) of the test substance, which can result in overestimation or underestimation of effects compared to dietary dosing (Kapetanovic et al., 2006; Staples et al., 1976; Vandenberg et al., 2014). Gavage can also induce stress in animals, and bioavailability may differ between gavage in the laboratory setting and ingestion in field environments. Furthermore, it is important in some species to integrate robust procedures to quantify regurgitation if it occurs (e.g., cages lined with white paper, use of colored dyes mixed into food, contained within capsules or to coat the outside of capsules, constant observation for 2 h after dosing).
Although Johnson et al. (2005) found that regurgitation in common pigeons (Columba livia) was generally less than 4% of the administered dose (at a high-dose level) and it was surmised that food residing within the crop can partially ameliorate the bolus effect, regurgitation nonetheless remains a significant challenge in passerine acute oral studies (e.g., USEPA, 2012a). Dietary studies avoid artifacts associated with oral gavage, but feed spillage, individual variation in appetite, and food avoidance at high dietary concentrations can make dose quantification and responses difficult to interpret. Furthermore, waterbirds tend to dip their feed in water, which is a particular issue when testing these species (Heinz et al., 1989; Heinz & Sanderson, 1990). For example, in a dietary mallard test with selenium, Heinz et al. (1989) observed that ducks appeared to rinse the selenium off their treated diets in their water pan before swallowing the "cleaned up" diet to avoid exposure to the toxicant. Partial gavage in avian ad libitum studies (i.e., test substance in vehicle administered via a syringe into the crop) can reduce inaccuracies associated with estimating exposure in subchronic feeding trials without adversely affecting egg production (Johnson et al., 2005).
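The sensitivity of dietary dose estimates to spillage and body weight can be illustrated with a short Python sketch. It assumes the conventional conversion of a dietary concentration to a daily dose (concentration multiplied by food ingested, divided by body weight); the function name, the spillage correction, and all numerical values are hypothetical and are not part of any test guideline.

```python
def daily_dietary_dose(conc_mg_per_kg_diet: float,
                       food_removed_g_per_day: float,
                       body_weight_g: float,
                       spillage_fraction: float = 0.0) -> float:
    """Estimate dose in mg/kg body weight/day from a dietary concentration.

    food_removed_g_per_day is the feed that disappeared from the feeder;
    spillage_fraction is the (assumed) share of that feed spilled rather than eaten.
    """
    if body_weight_g <= 0:
        raise ValueError("Body weight must be positive.")
    if not 0.0 <= spillage_fraction < 1.0:
        raise ValueError("Spillage fraction must be in [0, 1).")
    food_eaten_kg = food_removed_g_per_day * (1.0 - spillage_fraction) / 1000.0
    body_weight_kg = body_weight_g / 1000.0
    return conc_mg_per_kg_diet * food_eaten_kg / body_weight_kg


# Hypothetical bird: 200 g body weight, 20 g feed removed per day, 100 mg/kg diet.
nominal = daily_dietary_dose(100.0, 20.0, 200.0)
corrected = daily_dietary_dose(100.0, 20.0, 200.0, spillage_fraction=0.25)
print(f"Nominal dose: {nominal} mg/kg bw/day; with 25% spillage: {corrected} mg/kg bw/day")
```

In this hypothetical case, ignoring a 25% spillage rate would overstate the ingested dose by one-third, which illustrates why unquantified spillage complicates interpretation of dietary studies.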
Subacute dietary studies with juvenile birds (typically northern bobwhite, Colinus virginianus, and mallard, Anas platyrhynchos) are now often waived in Europe and North America because oral (gavage) acute studies usually generate higher risk estimates (Hilton et al., 2019; USEPA, 2020a). Although it seems that risk can be assessed with confidence in most cases using just the avian acute oral test (Hilton et al., 2019), subacute dietary studies are still conducted to meet registration requirements of other jurisdictions (e.g., Brazil), and because many products are developed with the intent of global registrations, such data are often still generated. Subacute dietary studies for birds offer a more realistic exposure route and feeding scenario (gorge feeding is less typical) and could be a valuable tool as part of a higher tier risk assessment framework (Bone et al., 2022). Furthermore, for some chemicals (e.g., anticoagulant rodenticides) hazard may be markedly underestimated by acute oral studies, as toxicity is enhanced with repeated exposure (Vyas & Rattner, 2012). It is noteworthy that about two-thirds of acute oral passerine tests (OCSPP 850.2100; USEPA, 2012e) conducted to meet data requirements of the USEPA end up following an adaptation of the subacute dietary test guideline (OECD 205; OECD, 1984a; OCSPP 850.2200, USEPA, 2012b) because of dose regurgitation (Temple et al., 2019).
Vertebrate use is a concern with current testing protocols (EFSA, 2022). Acute oral toxicity test guidelines for avian and rodent protocols offer the opportunity to use up-and-down sequential dosing methods to reduce the numbers of test subjects ultimately needed to estimate the endpoint (i.e., small groups are treated and the results are used to choose doses for the subsequent groups) (OECD, 2008a; OECD 223, OECD, 2016). Some stagewise probit methods indicate that up to 24 animals are needed for rodent studies (American Standards for Testing and Materials, 2019). The OECD 223 avian acute oral toxicity test (OECD, 2016) can require as few as 10 animals if it stops at the limit test, but if it goes through all testing stages, then about 45 and often as many as 60 animals will need to be acclimated (i.e., at test initiation, it is uncertain if all the stages will be needed), and it is not always possible to use the untested animals in another study before the colony ages out (i.e., birds come into reproductive condition, staged dosing takes many weeks). Even if the birds have not aged out at the end of the test, there may not be enough left for use in another test because birds from a different hatch can only be used at a new stage if a new control from that same hatch is also added (OECD, 2016). The combination of these factors means that despite the best attempts of the laboratory to conduct a staged dosing study to reduce animal use, total animal usage may actually still be similar to an OCSPP 850.2100 test.
Chronic tests can use thousands of vertebrates (primarily offspring), yet with only three dose groups and a control, the data generated are not well-suited to determine an effect concentration (ECx) or estimate a BMD. Indeed, in 2015, EFSA evaluated the suitability of around 50 ecotoxicology and toxicology test guidelines used globally to determine an EC10 (EFSA, 2015). With regard to the data requirements for avian and mammalian wildlife risk assessment, the following conclusion was reached: "The test guideline has serious limitations for the derivation of reliable EC10 estimations. However, under certain specific conditions, it may be possible to derive reliable EC10 values" (EFSA, 2015). Green et al. (2022) ran simulations and predicted that if the 144 adult birds currently used in an avian reproduction study were distributed across five (instead of three) treatment groups plus a control, each with 12 pairs (one male and one female), the likelihood of generating a reliable BMD would be greatly increased, while the power to calculate a lowest observed effect concentration (and subsequently derive a no observed effect concentration) should a BMD not be derived would be reduced by only 12.5%. A disadvantage is that this reduction in group size may make some of the validity criteria harder to achieve and increases the influence of poor-performing pairs on the data set (i.e., with 12 replicates, each pen represents 8.33% of a group, whereas with 18 replicates, each pen represents 5.56% of a group). Nevertheless, such updates to statistical procedures offer an example of innovation that could help drive the field forward into the 21st century while building upon existing data compiled over the past four decades.
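The replicate arithmetic behind this comparison can be checked with a brief Python sketch. It only redistributes the 144 adults cited above across the two designs; the function name is hypothetical, and the sketch does not reproduce the power simulations of Green et al. (2022).

```python
def design_summary(total_adults: int, n_treatment_groups: int) -> dict:
    """Split adult birds (housed as male-female pairs) evenly across
    the treatment groups plus one control group."""
    n_groups = n_treatment_groups + 1          # treatments plus a control
    total_pairs = total_adults // 2            # one male and one female per pen
    pairs_per_group, remainder = divmod(total_pairs, n_groups)
    if total_adults % 2 or remainder:
        raise ValueError("Birds cannot be split evenly into pairs and groups.")
    return {
        "groups": n_groups,
        "pairs_per_group": pairs_per_group,
        # Share of a group's data contributed by a single pen, in percent.
        "pen_share_pct": round(100.0 / pairs_per_group, 2),
    }


current = design_summary(144, 3)   # three treatments plus control
proposed = design_summary(144, 5)  # five treatments plus control
print(current)   # {'groups': 4, 'pairs_per_group': 18, 'pen_share_pct': 5.56}
print(proposed)  # {'groups': 6, 'pairs_per_group': 12, 'pen_share_pct': 8.33}
```

Both designs use the same number of adults; the trade-off is between the number of dose levels available for curve fitting and the weight each pen carries within its group.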
With ongoing efforts to minimize the use of vertebrates in regulatory testing, another challenge is that the industry often has just one opportunity to conduct the studies needed for the submission and then the data generated are linked with the chemical for the lifetime of the product. At product renewal, which typically occurs on a 7-15-year timeframe, it is not uncommon for endpoints from existing studies to be reevaluated and reinterpreted (e.g., effect levels lowered to give a more conservative assessment; Brooks et al., 2020). Repeat testing might be desired to resolve uncertainties but may not be an option for the industry due to overarching concerns for animal welfare. In such cases, higher tier options are needed to pass the risk assessment, but many of these options for refinement relate to exposure and not effects assessment. For example, geometric means or SSDs could be produced for acute risk assessments by testing additional species in the laboratory (EFSA, 2023). The update to the EFSA bird and mammal guidance document indicates that a total of five species are needed for an acute SSD (EFSA, 2023); however, testing vertebrates beyond the minimum data requirements (a single species) is not permitted in the EU, so this option depends on robust studies being available from the literature or from data generated for other regions of the world (e.g., an acute test with a passerine species for a US registration or an acute oral test in pigeon or chicken conducted for registration in India). The risk assessment can also be refined by developing "grouped" risk assessments where a pesticide class can be assessed in its entirety using data for all members of the class. This can be done when a common mode of action is identified for the chemicals in a class and they all exhibit similar effects (USEPA, 2020b). Until ecological effects models are developed, validated, and widely accepted, field effect studies (discussed below) represent the only other realistic option available to refine the toxicological effect component of the risk assessment, but without clear guidance on what is required for a field study to be accepted, it is not uncommon for these studies to be considered supplemental data or even rejected. Modeling approaches at the individual level (e.g., toxicokinetic and/or toxicodynamic models for interspecies extrapolation) and population level (e.g., spatially explicit models such as the USEPA's Markov Chain nest productivity model [MCnest]; Bennett & Etterson, 2013) have not been validated and accepted for wildlife ERA by regulators in Europe. Thus, while such approaches are refinement options, intersector collaborations to develop or validate models for ecological effects are warranted and should be a future research priority.
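The geometric mean and SSD refinements mentioned above can be sketched in Python as follows. This is a minimal illustration that assumes a log-normal SSD fitted by simple moments to five hypothetical acute LD50 values; the species values, function names, and fitting choice are assumptions for illustration and do not represent the procedure specified in EFSA (2023) or any other guidance.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of positive toxicity values (e.g., LD50s in mg/kg bw)."""
    return math.exp(mean(math.log(v) for v in values))


def lognormal_hc(values: list[float], fraction: float = 0.05) -> float:
    """Hazardous concentration (dose) for a given fraction of species,
    assuming log10-transformed values follow a normal distribution."""
    logs = [math.log10(v) for v in values]
    ssd = NormalDist(mu=mean(logs), sigma=stdev(logs))
    return 10 ** ssd.inv_cdf(fraction)


# Hypothetical acute oral LD50 values (mg/kg bw) for five species.
ld50s = [10.0, 20.0, 50.0, 100.0, 200.0]
print(f"Geometric mean LD50: {geometric_mean(ld50s):.1f} mg/kg bw")
print(f"HC5 (log-normal SSD): {lognormal_hc(ld50s):.1f} mg/kg bw")
```

With only five species, the fitted distribution is sensitive to each individual value, which is one reason the availability of robust additional studies matters for this refinement option.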

Representativeness of model species used in current guidelines
Species chosen as laboratory animal models in guideline and nonguideline studies for extrapolation to terrestrial wildlife are not explicitly defined. Currently, Galliformes and Anseriformes are widely used for birds, and rodents are most often used for mammals. However, there are no approved laboratory animal models for terrestrial phase amphibians or reptiles (see OECD and OCSPP/OPPTS guidelines cited in Tables 1-3). The few model species in use are not representative of all relevant species due to differences in physiologies and life histories. This necessitates the use of uncertainty factors (also referred to as assessment or safety factors) in extrapolations for many ERAs. However, these factors are a one-size-fits-all approach that is rarely based on empirical data, so the resulting risk estimates may be overly conservative for some tolerant species or offer too little protection for some sensitive species. Evaluations of the appropriateness of uncertainty factors are difficult without detailed knowledge of physiological characteristics and life histories of species of concern, as well as an understanding of the chemical-specific kinetics, mode of action, and linkage of the underlying toxicological mechanisms to population and ecosystem-level effects (see Brooks [2022] where an evaluation has been attempted).
As there are currently no OECD or OCSPP test guidelines for the effects assessment in terrestrial phase amphibians or reptiles (guidelines do exist for aquatic phase amphibians; see Supporting Information), effects assessment for amphibians and reptiles is in some cases inferred from data for other taxonomic classes (e.g., Adams et al., 2021; Aldrich, 2009; Bridges et al., 2002; Fryday & Thompson, 2012; Glaberman et al., 2019; Ortiz-Santaliestra et al., 2018; Weltje et al., 2013). For example, Glaberman et al. (2019) found a strong positive relationship for the lowest observed adverse effect concentration between fish and aquatic phase amphibians for survival and body weight endpoints for 44 of the 45 pesticides evaluated.
For pesticide registration in Europe, Commission Regulations (EU) No. 283/2013 (EC, 2013a) and 284/2013 (EC, 2013b) describe data or relevant information needed for toxicity assessments involving terrestrial amphibian and reptile species. Data from the published literature on these species may be submitted for consideration for product renewal, but even then data are scarce. Such data are generally not available at the time of registration of new active (proprietary) ingredients. In those situations, risks to terrestrial amphibians and reptiles may be evaluated based on the available data for fish, birds, and mammals, although there has been a long-standing debate on whether toxicity data from surrogate taxa (i.e., fish for aquatic-phase amphibians and birds and mammals for terrestrial phase amphibians and reptiles) are suitable (reviewed in Johnson et al. [2017], Ortiz-Santaliestra et al. [2018], Weir et al. [2010], Weltje et al. [2017]). Weir et al. (2010) compared toxicant sensitivity between birds and reptiles and concluded that birds were not universally more sensitive than reptiles and thus may not always be appropriate surrogates for reptiles in ERAs. Subsequent work, however, pointed to the potential to generate correlation models that relate reptilian and avian toxicity, although with some caveats (Weir et al., 2015).
Reptilian and avian exposure pathways also likely differ. Similarly, in the component of their assessment focused on reptiles and terrestrial-phase amphibians, Ortiz-Santaliestra et al. (2018) compared pesticide sensitivity between amphibians and reptiles and their respective surrogates and concluded that homeothermic vertebrates are not suitable surrogates, especially for terrestrial phases or species. In contrast, Weltje et al. (2017) concluded that acute toxicity data for mammals were similar to or protective of terrestrial phase amphibians under acute exposure conditions. However, there are chemicals such as pyrethroids and organochlorine pesticides that are more toxic to reptiles and terrestrial amphibians than to homeothermic vertebrates (Ortiz-Santaliestra et al., 2018).
Another important consideration is the varied life histories of amphibians and reptiles, which provide a multitude of exposure and risk scenarios that are not represented by any homeothermic surrogates (Allard et al., 2010). One important element is that dermal exposure may be relatively more important for terrestrial amphibians and reptiles (Weir et al., 2010) compared to homeothermic vertebrates. While Weltje et al. (2017) developed an extrapolation model to predict acute dermal toxicity to terrestrial amphibians from fish toxicity data, overall such data and modeling efforts have been limited. Moving forward, one tool to address gaps in toxicity data for amphibians and reptiles is the development of quantitative structure-activity relationship (QSAR) models (e.g., Toropov et al., 2022) or toxicokinetic-toxicodynamic models; however, such models will require robust data sets, which are not currently available.
Even species within a vertebrate class, such as mammals, can have marked physiological differences that influence systemic dose (e.g., monogastric versus ruminant and/or hindgut fermenting species; Johnson et al., 2010). Clearly, a greater understanding of toxicokinetics and toxicodynamics is needed if advancements are to be made in improving the extrapolation of data from laboratory animal models to free-ranging wildlife. New approach methodologies that enable robust interspecies extrapolations with reduced uncertainty without further vertebrate testing should be a medium- to long-term priority (see also Fuchsman & Clewell, 2023; Rattner et al., 2023).
Due to species declines, bats are a vertebrate order of concern in many areas of the world but have received particular focus in Europe. They comprise a taxon that presents challenges when extrapolating toxicological effects from standard mammalian models due to the paucity of baseline data and biological differences relative to model species. Bats are elusive, nocturnal, and poorly studied, with approximately a third of the 1400 species being threatened or data-deficient (Frick et al., 2020). In 2019, EFSA's Plant Protection Products and their Residues panel reported that risk for bats may not be adequately covered by pesticide risk assessments for ground-dwelling insectivores (EFSA, 2009, 2019), but it is noteworthy that EFSA's report used some worst-case assumptions for exposure (discussed by Brooks et al. [2021, 2022]). The EFSA (2023) update to their guidance for bird and mammal risk assessments stated that bats are now adequately covered in the ground-dwelling mammalian insectivore risk assessment at the screening and Tier 1 level with the updated assumptions made to the exposure assessment. However, EFSA highlighted that uncertainties remain for other exposure routes (e.g., dermal, inhalation, and maternal transfer of contaminants to pups via lactation; EFSA, 2023). Existing pesticide data on the comparative sensitivity of bats, other mammals, and birds do not show a clear trend (see additional discussion in Supporting Information and Brooks et al. [2021]). Critical data gaps regarding bat behavior in agricultural landscapes and the comparative sensitivity of bats to pesticides and other contaminants should be filled to better gauge whether existing schemes for birds and ground-dwelling mammals are protective of bats. The EFSA (2023) update to the bird and mammal guidance document did not include a specific approach for bats but did recommend that robust exposure models covering additional exposure routes (e.g., dermal, inhalation) should be developed for all terrestrial nontarget organisms; however, it remains unclear who will take the lead on developing and validating these models.

Challenges related to updating test guidelines or introducing new test guidelines
The OECD has a process for evaluating and updating test guidelines, but it is a time-consuming exercise. The USEPA has introduced and updated its avian testing guidelines more recently, but since requirements for passerine testing were introduced in 2007 (40 CFR part 158; USEPA, 2014), challenges due to regurgitation have been acknowledged (see USEPA, 2012a), and there have been no new requirements for in vivo effects assessments for wildlife. The process to update or introduce a new OECD guideline (OECD, 2009) generally relies on proposals from member states' national coordinators (i.e., no routine procedures for periodic reviews and updates exist). Interested parties (e.g., industries, nongovernmental organizations) seeking to propose an update to a guideline must contact their national coordinator. The OECD Secretariat and any member country can also submit proposals to update a guideline. A review process is then conducted through the Working Group of National Coordinators of the Test Guidelines Program. Due to the required resources and involvement of OECD delegations and other participants, updates to ecological test guidelines are infrequent.
Previous attempts to update test guidelines specific to assessment for birds, such as shortening the one-generation avian reproduction study and introducing a two-generation avian reproduction study (Figure 1), have been unsuccessful (i.e., no update or new guidance document emerged; OECD, 2007, 2023; OCSPP 890.2100, USEPA, 2015). Consequently, the current OECD 206 and OECD 205 guidelines used for the avian reproduction and subacute dietary studies date back to 1984 (OECD, 1984a, 1984b), and even the recent update of the EFSA Bird and Mammal Guidance for Risk Assessment (EFSA, 2020, 2023) recommends following OCSPP 850.2300 (USEPA, 2012c) and not OECD 206 (OECD, 1984a) with regard to measurement endpoints and statistical analysis (note: OECD 206 has not been updated in almost four decades). However, in 2018, a new OECD guideline (OECD 443; OECD, 2018e) was introduced for mammalian testing, the extended one-generation reproductive toxicity study (EOGRTS), which reduces the number of animals used compared to a two-generation reproductive toxicity study and brings mammalian reproductive toxicity testing in line with what is done for other vertebrate groups such as birds and fish. The introduction of the EOGRTS guideline shows that guideline updates are possible if the resources are invested, although it is noteworthy that the paper upon which the EOGRTS was based was published 12 years before the guideline was finalized (Cooper et al., 2006).
The most recent effort to update OECD test guidelines for avian testing started at an OECD/SETAC Avian Toxicity Testing workshop held in Pensacola, Florida in 1994. Of all the test guidelines proposed for revision or suggested for introduction, only the OECD 223 guideline for the avian acute oral toxicity test emerged. Figure 1 summarizes what was attempted for each test type (acute oral, subacute dietary, one-generation reproduction, and two-generation reproduction), and the issues that ultimately caused the activities of the workgroup to be abandoned without an updated or finalized guidance document (Chapman et al., 2001; OECD, 2023). In light of current trends in reducing vertebrate use in testing, the possibility of revisiting guideline updates with another OECD/SETAC workshop should be considered.

Why is it important to consider updates and improvements to current test guidelines?
While some NAMs are currently being developed and validated directly for wildlife (e.g., Crump et al., 2020; Farhat et al., 2019) or with application to wildlife (Mansouri et al., 2021), it remains far from certain whether complete replacement of complex whole organisms in regulatory testing with in vitro, in silico, in ovo, and in utero test systems can be achieved in the near future. Therefore, despite the logistical challenges associated with OECD guideline updates to address shortcomings, this option should not be overlooked; the alternative methods (i.e., NAMs) for effects assessments in wildlife are not yet ready to be integrated into regulatory frameworks. It is also worth recognizing that the scope of whole-animal alternative methods will be different from the guideline in vivo studies that they seek to complement or replace, and thus there will likely be a considerable time before they are validated and embedded within regulatory frameworks. It is our perspective that the initial goals of any NAMs should be to complement in vivo data to reduce uncertainty and expand the scope of what is currently possible, for example, implementation into the screening steps used in the discovery of new chemicals to enable earlier channeling of funds to more promising active ingredients or products.
The endpoints generated by NAMs (molecular, cellular, tissue level) and the life stages used (e.g., embryos) in these methods tend to differ from the apical endpoints derived from in vivo tests in adults and juveniles. It is essential to ensure regulatory flexibility in integrating these new methods (Stucki et al., 2022) while ensuring that the findings of such assessments are still protective. Despite efforts to move from animal testing to NAMs (e.g., SeqAPASS, LaLone et al., 2016; EcoToxChip, Basu et al., 2019), the complexity of the whole organism and its interorgan relationships (e.g., neuroendocrine, neuroimmunological, renal-cardiovascular) currently dictates that at least some of the existing animal models be used for years to come in order to support development and validation of NAMs and ensure their consistent reliability in ERAs.
Therefore, despite the lengthy process involved in updating existing in vivo test guidelines, it is worth investing some resources to refine procedures to optimize the accuracy and biological relevance of results while hopefully also reducing the number of animals needed to generate the data. Animal testing is likely to be needed in some capacity for decades to come, and should be used until the replacement methods provide information of equivalent or better scientific quality and relevance, and have been demonstrated to offer free-ranging species protection from environmental contaminants with the same or greater accuracy when compared to current tests.

Opportunities for improving guideline-compliant studies without updating test guidelines
Small enhancements to existing practices within the guidelines for live animal studies are encouraged to maximize the quality of the in vivo data generated. Within current study protocols, options to improve the quality of effects assessment data and the accuracy of the conclusions exist. These options might entail small modifications to standard operating procedures or protocols used by testing laboratories (usually contract research organizations), such as contamination control measures to enable total randomization of birds to cages, and assessing the influence of stocking density in brooders on chick survival and body weight to better inform minimum and maximum chick density limitations (e.g., Bean, Riley et al., 2020; Bean, Stanfield et al., 2020; Stanfield & Bean, 2021; Stanfield et al., 2020). Moreover, the integration of historical control data into analyses (Brooks et al., 2019; Valverde-Garcia et al., 2018) and new statistical protocols should be used to complement existing test guidelines (Green et al., 2022). Likewise, the interpretation of the magnitude of effects in laboratory studies (e.g., percent difference from control) in the context of in situ conditions for free-living animals could be improved if robust field data were available.

FIELD EFFECT STUDIES
Field effect studies offer additional realism that laboratory studies cannot and thus may be used to reduce uncertainty or answer specific questions in some circumstances as a higher tier refinement in an ERA. However, the additional realism comes at the cost of less control over variables such as weather, exposure level, and ability to observe and/or measure desired endpoints, to name but a few. In some instances, the applicability of findings may be considered "site specific" even though there may be arguments for broader applicability (e.g., climatic zone or regional applicability).
Unlike laboratory tests in birds and mammals, field effect studies do not have a regulatory guideline for their conduct, data analysis, or interpretation, with the exception of OCSPP 850.2500 "Field Testing for Terrestrial Wildlife" (USEPA, 2012d). In Europe, wildlife field effect studies are typically conducted when significant questions remain unanswered from the laboratory-based assessments of risks to terrestrial wildlife. This flexibility for field studies has the advantage that study designs can be tailored to answer specific questions, but the disadvantage is that there is less clarity as to whether the study results will be considered acceptable by regulators.

Field study design
If designed appropriately, field studies can improve the ecological relevance of wildlife risk assessments. Ideally, these studies should collect population and demographic data as well as abiotic data and, if practical, tissue concentration and pathology findings in animals in both contaminated and so-called reference areas. To be useful in the assessment, these studies should use statistical and modeling tools that quantify field-obtained endpoints with enough confidence and power to assess whether risks are unacceptable based on a priori decision rules. The challenges of working with free-ranging vertebrates necessitate different approaches than those used for captive animals in laboratory settings. For example, studies of free-ranging animals can involve opportunistic, indirect, and in many cases nonlethal sampling. Acute toxicity assessment might involve the evaluation of wildlife incident data or monitoring mortalities in treated areas or contaminated sites versus reference sites. Carcass searches can be opportunistic, systematic (e.g., walking transects), or targeted (e.g., radiotelemetry). Such studies can provide meaningful data on the actual toxicity of a chemical(s) in a field setting (Elliott et al., 2008; Millot et al., 2017), and population effects can also be explored (e.g., Mateo-Tomás et al., 2020; Meyer et al., 2016). However, carcass searches can be unreliable and may underestimate contaminant-related mortality, and results should be interpreted with caution (Balcomb, 1986; Prosser et al., 2008; Vyas, 1999). Assessments of effects on reproduction might involve evaluation of parameters such as clutch size, hatching or fledging success, brood or litter size, or evaluating the age structure of the population. These might be evaluated by physically visiting and monitoring nest sites (e.g., Custer, 2021; English et al., 2022; Grove et al., 2009; Morrissey et al., 2014; Rattner et al., 2018), using game cameras, GPS, or other tracking technologies to monitor individuals and groups of animals, their activity and even health, or trapping to capture, mark, and release so that survival and population age structure can be evaluated. For longer-lived species, this requires long-term commitment and techniques, such as banding, to develop a marked population (e.g., Newton & Wyllie, 1992). To explore mechanisms, for example, to separate embryotoxicity from parental behavioral effects, some studies have employed egg-swap experiments across study sites (Kubiak et al., 1989; Wiemeyer et al., 1975; Woodford et al., 1998).
Other researchers have collected eggs of wild birds for incubation in the laboratory both from nest sites within a gradient of exposure (Elliott et al., 1996, 2001; Sanderson et al., 1994) and from less contaminated sites for egg injection studies (Gilman et al., 1978; Heinz et al., 2009). Strategic placement of nest boxes or other structures to enhance local populations has been combined with novel tracking and camera trap technologies to assess postrelease effects on reproduction, survival, and sublethal effects of persistent organic pollutants (POPs) (Custer, 2011; Groffen et al., 2019), pulp mill effluents (Harris & Elliott, 2000), pesticides (Poisson et al., 2021), and heavy metal contamination from mining sources (Berglund et al., 2011; Eeva & Lehikoinen, 2015). Carcasses and debilitated live birds are collected by agencies and wildlife rehabilitators and have been effectively used and factored into decision-making by agencies for lead (Descalzo et al., 2021), pesticides (Elliott et al., 2022; Hindmarch et al., 2019), and pharmaceuticals (Herrero-Villar et al., 2020). Similarly, feathers can be obtained from live birds (Espín et al., 2016; Jaspers et al., 2019), and carcass remains and tissues have been obtained from hunters and trappers and have also been assessed for POPs (Elliott et al., 2018; García-Fernández et al., 2013), pesticides (Martinez-Haro et al., 2022), lead (Mateo et al., 2001), and contaminants of emerging concern (González-Rubio et al., 2021).
Genotyping of noninvasively collected samples (e.g., feces, feathers, fur) can facilitate the marking of individuals in a population and permit the determination of population parameters (Guertin et al., 2010, 2012; Huang et al., 2018; Lundin et al., 2016). Moreover, many contaminants can be quantified in feces, and metals can be measured in feathers and fur, which can enable the determination of exposure and biomarkers. Other effects that might be evaluated in field studies include changes in body size, weight or condition, or behavior and migratory movements (Eng et al., 2019).
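As a simple illustration of how genotype "marks" can yield a population parameter, the sketch below applies the Chapman-corrected Lincoln-Petersen estimator to two sampling sessions. This is a deliberately basic closed-population estimator chosen for brevity, not the method used in the studies cited above; it assumes no births, deaths, or movement between sessions, equal detectability, and error-free genotype matching. The function name and the identifiers are hypothetical.

```python
def chapman_estimate(first_session_ids, second_session_ids):
    """Chapman estimator of closed-population size from two sessions.

    Each argument is an iterable of individual identifiers (e.g., genotypes
    resolved from feces or hair). Returns (estimate, approximate variance).
    """
    first = set(first_session_ids)
    second = set(second_session_ids)
    n1, n2 = len(first), len(second)
    m2 = len(first & second)  # individuals detected in both sessions
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)) / (
        (m2 + 1) ** 2 * (m2 + 2)
    )
    return n_hat, var


# Hypothetical genotype identifiers from two noninvasive sampling sessions.
session_1 = ["G01", "G02", "G03", "G04", "G05", "G06", "G07", "G08", "G09"]
session_2 = ["G06", "G07", "G08", "G09", "G10", "G11", "G12", "G13", "G14"]

estimate, variance = chapman_estimate(session_1, session_2)
print(f"estimated population size: {estimate:.1f} (variance {variance:.1f})")
```

In practice, genotyping error and unequal detectability would need to be addressed before such an estimate could support a comparison between contaminated and reference areas.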
Field studies suffer from uncertainties due to the lack of control of many relevant observational (e.g., detectability, observer skill) and environmental (e.g., habitat, climate) "nuisance" variables that may obscure understanding of chemical effects on endpoints. Modeling tools that predict and distinguish the relationships among such nuisance variables and the chemical effect can help overcome this challenge. For example, tools that correct for imperfect detectability in population density estimates using distance or probabilistic relationships include (1) spatially explicit capture-recapture models to estimate densities of marked or tagged animals (Efford & Schofield, 2020), (2) the Distance program for density estimates from point or transect counts (Thomas et al., 2010), and (3) the PRESENCE program to estimate animal or nest occupancy rates (MacKenzie et al., 2018). The MARK program can include nuisance covariates and remove field sampling error when quantifying survival, recruitment, and breeding probability using state-transition models that process capture histories of marked animals (Cooch & White, 2019). Reproductive data by age class can be estimated after removing sampling error (Morris & Doak, 2002) using general linear mixed modeling; individual adults are specified as the random effect to remove the bias of overrepresenting individuals most frequently sampled (Nur et al., 2021). Field studies should be designed with the potential nuisance covariates and parameters required for relevant software programs in mind.
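The idea of correcting a count for imperfect detectability can be sketched with the simplest textbook form of line-transect distance sampling. The example assumes a half-normal detection function, certain detection on the transect line, and no truncation of distances; under those assumptions the maximum-likelihood scale parameter has a closed form. It is an illustrative calculation only and does not reproduce the algorithms of the software cited above; the function name and the survey data are hypothetical.

```python
import math


def halfnormal_line_transect_density(perp_distances_m, transect_length_m):
    """Density estimate from line-transect data with a half-normal detection
    function g(x) = exp(-x^2 / (2 * sigma^2)) and untruncated distances.

    Returns (density per square metre, effective strip half-width in metres).
    """
    n = len(perp_distances_m)
    if n == 0:
        raise ValueError("at least one detection is required")
    # Closed-form maximum-likelihood estimate of sigma^2 for this model.
    sigma2 = sum(x * x for x in perp_distances_m) / n
    # Effective strip half-width: integral of g(x) from 0 to infinity.
    esw = math.sqrt(sigma2) * math.sqrt(math.pi / 2)
    density = n / (2 * transect_length_m * esw)
    return density, esw


# Hypothetical perpendicular distances (m) to detected animals on a 2 km transect.
distances = [2.0, 5.0, 8.0, 11.0, 15.0, 21.0, 30.0, 42.0]
length_m = 2000.0

density, esw = halfnormal_line_transect_density(distances, length_m)
naive = len(distances) / (2 * length_m * max(distances))  # assumes perfect detection
print(f"effective strip half-width: {esw:.1f} m")
print(f"corrected density: {density * 1e4:.2f} per ha; naive: {naive * 1e4:.2f} per ha")
```

The naive strip count treats every animal out to the farthest detection as seen, so it understates density whenever detectability declines with distance.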
To assess risks with statistical decision rules, data on population sizes, densities, reproduction, survival, sex or age ratios, and diversity can be compared between a reference area and a contaminated site while accounting for other factors important to the species of interest using modeling with habitat, temporal, or other covariates. A challenge with this approach is gaining consensus or approval of reference areas. A better approach than a simple comparison to reference areas is regressing the endpoints in a general linear model against a gradient of chemical concentrations, with or without a reference area. Sample sizes in the field design should meet the statistical power desired for the planned analysis; otherwise, nonsignificant chemical effects, which the decision rule treats as acceptable risk, could be false negatives. Whether they are prospective or retrospective, field studies should be compared with model-based risk assessments to determine whether the latter are corroborated. In summary, robust methods are available to support field studies of chemical effects on wildlife, and regulatory guidance could help promote acceptance of this currently underutilized line of evidence for wildlife ERAs.
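The link between sample size and false negatives in a gradient design can be explored by simulation before fieldwork begins. The sketch below is a minimal example under stated assumptions: sites are evenly spaced along a concentration gradient scaled from 0 to 1, the endpoint responds linearly with normally distributed noise, no habitat covariates are included, and significance is judged against a normal-approximation critical value (slightly anti-conservative for small numbers of sites). All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def slope_and_se(x, y):
    """Ordinary least-squares slope and its standard error."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, se


def simulated_power(n_sites, true_slope, noise_sd, alpha=0.05, n_sims=2000, seed=1):
    """Fraction of simulated studies in which the slope is declared significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    x = [i / (n_sites - 1) for i in range(n_sites)]  # scaled concentration gradient
    significant = 0
    for _ in range(n_sims):
        y = [true_slope * xi + rng.gauss(0, noise_sd) for xi in x]
        slope, se = slope_and_se(x, y)
        if abs(slope / se) > z_crit:
            significant += 1
    return significant / n_sims


# Hypothetical effect: endpoint declines by 1 unit across the gradient, noise SD of 1.
for n in (8, 20, 40, 80):
    print(f"{n:>3} sites: estimated power = {simulated_power(n, -1.0, 1.0):.2f}")
```

A design whose estimated power is low at the effect size of concern cannot support a conclusion of acceptable risk from a nonsignificant result.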

Field studies for retrospective risk assessments
Field studies to evaluate toxicological effects are most often conducted for retrospective ERAs on large sites. Small, moderately polluted sites may be irrelevant for charismatic, highly mobile wildlife with large home ranges (Tannenbaum, 2020). Field monitoring is commonly used to explore the potential effects of widespread contaminants such as mercury, lead, POPs, anticoagulant rodenticides, and other pesticides.
An advantage of field toxicity studies used in retrospective assessments at contaminated sites is that animals are not deliberately treated or exposed; rather, animals are already in areas where exposure may occur. Incident reports are often used to elucidate and understand the field effects of pesticides. For studies conducted in support of pesticide registration renewals, an experimental application occurs followed by the assessment of typical effects on abundance (population size and development such as minimum number alive), survival, growth (and/or body condition), reproduction, and even sublethal effects.
A pitfall of some field studies is the tendency to assume that the chemical of interest is the most important factor affecting the wildlife species of interest. Rather, field studies should optimally be approached from a stressor identification perspective (Cormier et al., 2003), whereby researchers identify multiple candidate stressors that could potentially contribute to observed effects and consider evidence supporting or refuting a causal role for each stressor. Habitat differences among field sampling sites can readily affect wildlife; one technique to address this issue is to apply general linear modeling with habitat covariates (e.g., Arcadis, 2021), as discussed above. Examples of habitat covariates for small mammals could include vegetative cover or biomass and rock or downed wood cover. For birds, habitat covariates could include the percent of mapped home ranges in different habitat types or variables that define the quality of the habitat type. Other potentially important stressors include co-occurring chemical exposures, low prey availability, predation, disease, competition, and adverse weather. Such multistressor studies have often been conducted in different systems following the application of pesticides (e.g., Gibbons et al., 2015; Newton, 2004) but appear to be less common at sites contaminated by industrial chemicals, with some exceptions (e.g., Gill & Elliott, 2003). Postrelease assessments are available for mercury, with the relationship between mercury exposure and loon reproduction being complex, including mercury bioaccumulation potential and prey availability covarying as a function of lake pH (Kenow et al., 2015; Merrill et al., 2005; Scheuhammer et al., 2016).

Field studies for prospective assessments
Pesticide registration in Europe provides an example of challenges involved with the use of field study data in prospective risk assessments. Section 8.9 of European Commission Regulation No. 283/2013 (EC, 2013a) requires any available monitoring data concerning adverse effects (e.g., mortality event, incident report) of the active substance to nontarget organisms to be reported. Currently, when a pesticide is registered, a quantitative risk assessment is conducted based on data from laboratory studies and, if the margin of safety is not acceptable after all the refinement options are exhausted, field effect studies may be undertaken. However, a field effect study is challenging to undertake prior to registration (i.e., this introduces some obstacles that are removed once the product is registered) as it requires the registrant to apply an unregistered active ingredient onto a relatively large area of farmland where a representative target crop from the product label is grown, with the valuable product having to be destroyed at great expense. Identification and availability of untreated areas with comparable landscapes to serve as a control or reference site can also be challenging. Such studies are expensive and difficult to replicate with meaningful statistical power. It would also be difficult to justify which products seemingly pose a risk great enough to warrant investing the resources to conduct pilot field monitoring. In addition, regulators can often be reluctant to accept the results of a field effect study if the theoretical risk assessment is still identifying potential risks, reflecting a perception that the desk-based risk assessment is more accurate than a specifically designed study. It would be more efficient to conduct field effect studies and monitoring after the product is registered, as discussed in the following section.

Field studies as part of an adaptive management process
Risk assessments of chemicals should be considered in the framework of the adaptive management process of any human activity (Berkes et al., 2000). The resilience of ecosystems, with its uncertainty and unpredictability, can be severely reduced by new chemical substances released into the environment, and feedback learning and appropriate responses (i.e., adaptive management) are essential for sustainable development (Berkes et al., 2000; Folke et al., 2002). Once a chemical is registered, typically there is a 7-15-year period before the registration is re-evaluated.
When specific questions about safety to terrestrial wildlife prompt re-evaluations of data from existing laboratory studies that result in more conservative interpretations, additional postregistration field monitoring or incident reporting to further evaluate safety may be prudent. However, there are rarely requirements for such monitoring, and new data are not always available for consideration in re-evaluations. Postregistration monitoring is logistically more feasible than preregistration assessments as the chemical is already registered, thereby facilitating landowner cooperation. Such studies may be more readily implemented and more appropriate for postregistration risk assessments. Priorities should be focused on generating data from noninvasive methods that are relevant to population-level endpoints and not on invasive collection of specimens for the measurement of biomarkers that cannot be linked with certainty to protection goals. As there are animal welfare issues and permission challenges related to postregistration field evaluations involving vertebrates, the frequency of such evaluations and the quantity of data needed to resolve unanswered questions should be carefully examined, considering the margin of safety from the initial quantitative risk assessment and the weight of evidence from all available studies. Nonetheless, such testing might address uncertainty in safety during the endpoint review that occurs during the reregistration process.

Concluding remarks on field studies for effects assessment in wildlife
Robust field data can be powerful if studies are designed, conducted, and analyzed appropriately. Thus, clear guidance from regulators, particularly for pesticide registrations in Europe, on what represents an acceptable field study is needed. Collective consideration of evidence from a combination of laboratory and field studies can often yield a stronger assessment than laboratory studies alone (Custer, 2021; Fuchsman et al., 2017), and well-designed field studies could reduce the need for additional laboratory tests with animals when uncertainties remain.

CONCLUSION
Applying emerging science, including knowledge collection solutions such as systematic review, WEB-ICE, Adverse Outcome Pathway frameworks, and the ECOTOXicology Knowledgebase, in vitro test systems (e.g., enzyme and cell line applications of high-throughput screening systems), advanced statistical or mathematical methods (e.g., for data integration, dose-response modeling, and cross-species extrapolation), and other NAMs for improving the hazard evaluation and risk assessment of chemicals is ongoing (Grimm, 2019), but for terrestrial wildlife, use of such methods has clearly lagged behind applications for aquatic animals and humans (Rattner et al., 2023). In vivo test guidelines to assess the effects of organic pesticides and industrial chemicals, and to a lesser degree, inorganic chemicals and pharmaceuticals, will likely play important roles in wildlife toxicology research and ERAs for years to come. Thus, improving on current approaches for laboratory animal and field effects assessment methods, and simultaneously learning how to efficiently reduce the numbers of test subjects for the long term, are worthy investments of resources.
Studies of intact animals will be relied upon well into the future for registrants seeking approval to market a new pesticide as well as for the evaluation of new and existing industrial chemicals and for risk managers making decisions on damages and setting remediation goals for a polluted site. Similarly, for ecological models (connecting effects at lower levels of biological organization to effects on individuals and populations) to play a larger role in ERAs, in vivo data will likely be needed as model inputs. At present, field effect studies for contaminants in wildlife offer realism incorporating a plethora of uncontrollable or unrecognized environmental variables that cannot be obtained from in vitro studies and computational models (and such tools for use with terrestrial wildlife do not yet exist). Regulatory test guidelines for controlled laboratory studies are well defined, but guidance from regulators on what constitutes acceptable designs for field studies is needed so that high-quality data for free-ranging animals can be obtained to help address unanswered questions that remain from the lower tier risk assessment. Some potential options to enhance terrestrial ERAs now follow.
Strategies needing commitment from the regulatory community

1. Revisit updates to standard test guidelines to address shortcomings
Revisit the potential to update existing in vivo test guidelines, for example, via engagement with the OECD Working Group of National Co-ordinators of the TGs program (WNT) (e.g., Tables 1-3). If updates to test guidelines are pursued, this could be through another SETAC/OECD workshop (e.g., OECD, 2023). Methods must focus on optimizing the accuracy of data and the biological relevance of tests, and they should be tailored toward improving the ability to derive more useful effects endpoints (e.g., TRVs), ideally without increasing animal use. Consideration should also be given to refining practices commonly used in test protocols in ways that would not require guideline updates but would still improve the science, such as reallocating animals among treatment groups to better support regression-based approaches (e.g., Green et al., 2022).

2. Provide clear regulatory guidance for field study design
Guidance is needed from regulators, particularly for pesticide registrations in Europe, on what constitutes acceptable designs for field studies, to ensure sufficient data quality and analysis for enhanced reliability of ERAs. Such guidance would likely increase the inclusion and consideration of terrestrial wildlife data from field studies and monitoring in prospective risk assessment.
Strategies for the scientific research community

3. Fill critical knowledge gaps on the sensitivity of terrestrial amphibians, reptiles, and bats compared to current animal models, and if necessary, validate alternative methods for toxicological effects assessment pertaining to these taxa
Before resources are channeled toward developing new risk assessment frameworks for amphibians, reptiles, and bats, critical knowledge gaps must be filled around their sensitivity (though beyond the scope of this article) and the frequencies and levels of exposures to contaminants in order to better establish whether existing effect data and risk assessments are protective.
If existing animal models and risk assessments are found not to be protective for these species, develop and validate in vitro omics and other NAMs against in vivo omics, tissue, organismal, and population data for both legacy contaminants and newer chemistries. This could then enable the linkage of NAMs with data from animal-based research to improve the predictive ability and quality of ERAs for terrestrial-phase amphibians, reptiles, and bats and also support the transition from animal testing to NAMs.

4. Reduce uncertainty in extrapolations
Generate and analyze data using terrestrial wildlife-specific physiologically based toxicokinetic models, toxicokinetic and/or toxicodynamic models, and QSARs to support accurate and broadly applicable interspecific extrapolations from data generated in model species.

5. Employ a holistic approach
Develop a framework building upon existing knowledge derived from decades of work with intact animals that integrates all lines of evidence from validated or soon-to-be-validated NAMs to generate reliable TRVs.
Undertaking such activities would be a step toward enhancing wildlife ERAs in the 21st century.

a Other rodent species may be justified.
b Other species must be justified.
c OECD 420 has a gap of 3-4 days between dosing at each dose level; OECD 423 and OECD 425 are sequential dosing designs; OECD 425 has a minimum of 48 h intervals between stages; the test can last several weeks.
d Stopping criteria for OECD 425: (a) three consecutive animals survive at the upper bound; (b) five reversals (a reversal is when there is a nonresponse in the animals at one dose and then a response is observed at the next dose tested, or vice versa [i.e., response followed by nonresponse]) occur in any six consecutive animals tested; (c) at least four animals have followed the first reversal and the specified likelihood-ratios exceed the critical value (see paragraph 44 and annex 3; calculations are made at each dosing, following the fourth animal after the first reversal).
e At least 10 animals (five males and five females) should be perfused in situ and used for detailed neurohistopathology.
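The OECD 425 stopping criteria in footnote d are easier to follow when written out as a procedure. The Python sketch below is illustrative only and is not taken from the guideline or from any regulatory software: the function names, the `(dose, died)` record layout, and the choice to evaluate criterion (a) on the three most recently tested animals are all assumptions made for the example. Only criteria (a) and (b) are covered; criterion (c) depends on likelihood-ratio calculations described in the guideline and is deliberately left out.

```python
from typing import List, Tuple

# Hypothetical record layout: (dose given to one animal, whether that animal died).
Outcome = Tuple[float, bool]


def count_reversals(responses: List[bool]) -> int:
    """Count changes in outcome between consecutively tested animals."""
    return sum(1 for prev, cur in zip(responses, responses[1:]) if prev != cur)


def three_survive_at_upper_bound(outcomes: List[Outcome], upper_bound: float) -> bool:
    """Criterion (a): the three most recent animals were dosed at the upper bound and survived."""
    if len(outcomes) < 3:
        return False
    return all(dose == upper_bound and not died for dose, died in outcomes[-3:])


def five_reversals_in_six(outcomes: List[Outcome]) -> bool:
    """Criterion (b): five reversals occur within some run of six consecutive animals."""
    responses = [died for _, died in outcomes]
    # Six animals give five transitions, so five reversals means every transition flips.
    return any(
        count_reversals(responses[i:i + 6]) == 5
        for i in range(len(responses) - 5)
    )


def should_stop(outcomes: List[Outcome], upper_bound: float) -> bool:
    """Check criteria (a) and (b) only; criterion (c) is not implemented in this sketch."""
    return three_survive_at_upper_bound(outcomes, upper_bound) or five_reversals_in_six(outcomes)
```

In use, `should_stop` would be called after each animal's outcome is recorded, with the full dosing history to date and the upper bound dose for the test.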
are a factor of the exposure or not. Subtle responses observed in a standardized laboratory test may also be difficult or impossible to detect in the field due to logistical constraints and inherent variability in animal populations. Robust historical control data sets, defined as control responses for the same species within a specified time period (e.g., covering a five-year period centered as closely as possible on the index test; EFSA, 2023), can help address this issue by providing an understanding

Integr Environ Assess Manag 2024:699-724 © 2023 His Majesty the King in Right of Canada and The Authors DOI: 10.1002/ieam.4795

TABLE 2
generations grown out from 2 weeks of egg production from the preceding generation

Total number of birds used
128-180 adults, 2000-5000 birds including offspring
160 in F0 generation; numbers in subsequent generations depend on production, optimally at least six eggs set for each pair (i.e., 96 per group), with a minimum of 40 for each treatment group for F1 and F2 generations

Endpoints
A total of 15 endpoints analyzed statistically (Hartless, 2012) that relate to adult health (food consumption, body weight gain, egg production), eggshell (eggshell thickness, noncracked eggs), fertility, embryo survival, hatchability, chick survival, and growth. Not analyzed statistically but recorded and reported: adult survival, morbidity, abnormal behavior, gross pathology
The purpose of the test is to characterize the nature and dose response of chemicals with endocrine bioactivity in birds. A total of 41 endpoints are measured; 23 must be analyzed statistically and five are optional. Endpoints that relate to growth, development, and reproduction are similar to the one-generation reproduction study but are measured in multiple generations (F0, F1 [treated diet], and F2 [untreated diet]). Histology endpoints (gross anomalies of the reproductive tract, histology of organs for signs of overt toxicity, reproductive tissues in adults and embryos) and biochemical endpoints (e.g., estradiol and testosterone in egg yolk and serum, thyroid hormones in gland and serum) across multiple generations

Note: Japanese quail (Coturnix japonica), Northern bobwhite (Colinus virginianus), mallard (Anas platyrhynchos).
Abbreviations: a.i., active ingredient; BW, body weight; NOAEC, no observed adverse effect concentration; NOAEL, no observed adverse effect level; OCSPP, Office of Chemical Safety and Pollution Prevention; OECD, Organisation for Economic Co-operation and Development.
Source: Organisation for Economic Co-operation and Development (1984b, 2007), USEPA (2012c, 2015).

TABLE 3 Standardized mammalian toxicity studies
Note: Rat (Rattus norvegicus), rabbit (Oryctolagus cuniculus).
Abbreviations: a.i., active ingredient; BW, body weight; LD50, median lethal dose; NA, not applicable; NOAEL, no observed adverse effect level; OCSPP, Office of Chemical Safety and Pollution Prevention; OECD, Organisation for Economic Co-operation and Development; OPPTS, Office of Prevention, Pesticides and Toxic Substances.
TABLE 4 Select advantages and disadvantages of current testing protocols for toxicological effects assessments in terrestrial vertebrates

AUTHOR CONTRIBUTION
Thomas G. Bean: Conceptualization; writing-original draft; writing-review and editing.
Val R. Beasley: Conceptualization; writing-original draft; writing-review and editing.
Philippe Berny: Conceptualization.
Karen M. Eisenreich: Conceptualization; writing-original draft.
John E. Elliott: Conceptualization; writing-original draft.
Margaret L. Eng: Conceptualization.
Phyllis C. Fuchsman: Conceptualization; writing-original draft.
Mark S. Johnson: Conceptualization; writing-original draft.
Mason D. King: Conceptualization.
Rafael Mateo: Conceptualization; writing-original draft.
Carolyn B. Meyer: Conceptualization; writing-original draft.
Christopher J. Salice: Conceptualization; writing-original draft; writing-review and editing.
Barnett A. Rattner: Conceptualization; writing-original draft; writing-review and editing.

ACKNOWLEDGMENT
A draft of this manuscript was critically reviewed by the US Environmental Protection Agency. The authors would like to thank Jason M. O'Brien (Environment and Climate Change Canada, Ontario, Canada) for his contributions to the workgroup and the ideas that helped develop this manuscript. This manuscript was a product of the SETAC Technical Workshop "Wildlife Risk Assessment in the 21st Century: Integrating Advancements in Ecology, Toxicology and Conservation," with funding for the workshop provided by the United States Geological Survey, Teck Resources Ltd., and SETAC. The contribution of Barnett A. Rattner to this work was supported in part by the Contaminant Biology Program of the US Geological Survey Ecosystems Mission Area.