Incorporating New Approach Methodologies into Regulatory Nonclinical Pharmaceutical Safety Assessment


Toxicology studies evaluate systemic organ toxicities, behavioral effects, reproductive and developmental toxicology, genetic toxicology, eye irritancy, and dermal sensitization. They include single and repeat dose studies in rodent and non-rodent animal species, which identify target organs, assess severity and reversibility, and define dose-response and no observed adverse effect levels. These parameters are essential for regulatory decision-making on whether the compound can progress into clinical trials and, if so, for estimating a suitable starting dose, maximum dose, dose escalation regime, and any non-standard clinical safety monitoring that may be needed. Attrition due to toxicity observed in animals and/or in humans is an important cause of the high failure rate of clinical drug development (Cook et al., 2014; Watkins, 2011; Thomas et al., 2021). Toxicity observed in nonclinical animal safety studies may cause attrition of candidate drugs prior to clinical trials (Cook et al., 2014); however, many drugs cause clinically serious adverse effects in humans which are not detected in animals (Bailey et al., 2015), leading to attrition late in clinical development, failed licensing, and/or restrictive drug labelling. For example, human drug-induced liver injury (DILI), which is not detected in animal safety studies, is an important cause of attrition (Watkins, 2011).
New approach methodologies (NAMs) include methods that predict and evaluate biological processes by which pharmaceuticals may elicit desirable pharmacological effects and/or may cause undesirable toxicity. Many different types of NAMs have been described. These include simple in vitro cell-based

Introduction
Extensive nonclinical safety studies are undertaken on new pharmaceuticals prior to and alongside clinical trials. Their purpose is to identify and understand the toxic effects of the compound in order to determine whether its anticipated benefit versus risk profile justifies clinical evaluation and, if so, to inform the design and monitoring of clinical studies. The nonclinical safety studies are mandated by regulatory guidelines and include a variety of safety pharmacology and toxicology investigations.
Safety pharmacology studies aim to determine whether pharmaceuticals cause on- or off-target effects on biological processes that can affect the function of critical organ systems (e.g., cardiovascular, respiratory, gastrointestinal, and central nervous systems) and to assess potency, which is needed to assess safety margins versus human clinical drug exposure. Safety pharmacology studies also help inform the selection of follow-on investigations that can aid human risk assessment and may provide insight into mechanisms that underlie any effects that arise in humans. Multiple leading pharmaceutical companies (e.g., AstraZeneca, GlaxoSmithKline, Novartis, and Pfizer) have outlined the advantages provided by in vitro safety pharmacological profiling, including early identification of off-target interactions and the prediction of clinical side effects that may be missed in animal studies, and have highlighted that these studies enable much more cost-effective and rapid profiling of large numbers of compounds than animal procedures (Bowes et al., 2012).
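A safety margin of the kind mentioned above is commonly expressed as the ratio of an in vitro potency value to the expected human exposure. The following is a minimal sketch only, assuming that potency is reported as an IC50 and exposure as the unbound peak plasma concentration (free Cmax). The function name, parameter names, and example values are hypothetical and are not taken from the workshop.

```python
def safety_margin(ic50_um: float, free_cmax_um: float) -> float:
    """Return the fold margin between in vitro potency and clinical exposure.

    ic50_um      -- assumed in vitro potency (IC50), in micromolar
    free_cmax_um -- assumed unbound peak plasma concentration, in micromolar
    """
    if ic50_um <= 0 or free_cmax_um <= 0:
        raise ValueError("concentrations must be positive")
    return ic50_um / free_cmax_um


# Hypothetical compound: off-target IC50 of 3.0 uM, free Cmax of 0.1 uM.
margin = safety_margin(3.0, 0.1)
print(f"safety margin: {margin:.0f}-fold")
```

A larger ratio indicates that the off-target effect is expected only at concentrations well above clinical exposure. The margin considered acceptable depends on the target and the endpoint and is not specified here.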
The aim of this workshop was to accelerate the transition to inclusion of NAMs within an integrated, human-relevant pharmaceutical safety assessment strategy that is based on identified need and is scientifically valid and acceptable to industry and regulators. A specific objective was to propose a decision tree framework that aligns NAMs with the regulatory needs identified by the US FDA (Avila et al., 2020) and highlights where additional NAMs need to be, or currently are being, developed.

Methods
A number of international experts (regulators, preclinical scientists, and NAM developers) were identified from the authors' professional networks, including the Alliance for Human Relevant Science¹, and were invited to participate in a series of workshops. It was stipulated that the workshops would focus on the safety testing of new pharmaceuticals only and would not cover efficacy testing, the testing of medical devices or vaccines, chronic effects, carcinogenicity, reproductive toxicity, pediatric or special toxicity concerns. Those who accepted were asked to read several papers in advance (US FDA, 2017; Avila et al., 2020; Butler et al., 2017; EMA, 2021; Patterson et al., 2021; Andersen et al., 2019) and were directed to Table 2 in the 2020 US FDA paper (Avila et al., 2020), which presents the nonclinical safety assessment needs and was used as a starting point for discussions.
Five workshops took place in 2021, conducted online due to COVID-19 restrictions. The workshops focused on four clinically important organ systems concerning adverse drug reactions that are assessed routinely in nonclinical animal safety tests: cardiovascular, respiratory, liver, and central nervous system (CNS). For three of the organ systems (cardiovascular, respiratory, and CNS), adverse functional effects are assessed via the core battery of safety pharmacology tests described in ICH S7A. The primary focus was on in vitro NAMs due to the participants' expertise, although the additional value provided by computational NAM models also was recognized. Based on their knowledge, participants were specifically allocated to one of the four organ system groups. Each group considered: i) which currently available NAMs could be used to address safety pharmacology/regulatory needs, ii) what outcomes would be measured by these NAMs, iii) the strengths and weaknesses of each NAM, and iv) key gaps not addressed by the currently available NAMs.
Initially, an overall map illustrating the typical process of screening drug candidates was generated to establish the scope and context of the exercise. Participants then developed maps that summarized safety requirements for each organ system, acknowledging that these represent only a part of the overall nonclinical assessment that a drug candidate would typically undergo. Participants populated the maps with in vitro NAMs already in use and those that could be used with further development, while identifying areas where NAMs are still needed. Where necessary, participants continued to collaborate after the workshops to complete the maps.

tests, more complex organotypic or microphysiological systems (MPS)/organ-on-a-chip devices, and whole tissues that are maintained ex vivo. Interpretation of the in vivo relevance of the data provided by these methods is complemented by computational tools that simulate and predict in vivo drug disposition and kinetics, in particular physiologically based pharmacokinetic (PBPK) models. Accurate in vitro to in vivo extrapolation is further aided by human low-dose testing and microdosing studies (phase 0 testing), which provide precise data on systemic human drug exposure and kinetics in vivo.
The use of NAMs during drug discovery and development has the potential to reduce, and ideally eliminate, toxicities that arise in animal safety studies and in humans. Pharmaceutical companies already are using NAMs in early drug development to support efficacy testing, and as screens to deselect unsuitable molecules prior to animal safety testing, with considerable success (Morgan et al., 2018). In 2017, the US Food and Drug Administration (US FDA) outlined its aim to improve predictivity and reduce the use of animals in toxicology by promoting the development, qualification, and integration of NAMs into regulatory science (US FDA, 2017). Three years later, it announced its "Innovative Science and Technology Approaches for New Drugs" (ISTAND) program, which enables developers of NAMs to interact with the US FDA in the early stages of the development process (US FDA, 2021a). In a key 2020 paper, the US FDA gave an overview of where NAMs are currently used in preclinical safety assessment and identified areas of unmet need where NAMs might deliver more predictive and productive methodologies, potentially improving and expediting drug development (Avila et al., 2020). It recommended that regulatory requirements and unanswered safety questions should be the starting point for the development of NAMs. However, at present there is no overarching consensus on how NAMs can be incorporated within regulatory guidelines. Regulatory agencies, pharmaceutical companies, and non-governmental organizations have identified numerous hurdles to doing so (Piersma et al., 2018; Butler et al., 2017; Burgdorf et al., 2019), and guidance is needed to help surmount them. Such guidance could be especially valuable to academic researchers and small biotech start-ups that drive new NAM development yet may be unfamiliar with the precise needs of regulators and other end users (ICCVAM, 2018).
The nonclinical studies used for safety assessment are identified in the International Council for Harmonization (ICH) guidance M3(R2) (non-clinical safety) (EMA, 2021) and ICH S7A (safety pharmacology) (ICH, 2000), which are nonbinding and do not specify what type of study should be conducted. The US FDA has acknowledged that a relatively standard set of studies has evolved to assess the profile and safety of a drug before it proceeds to first-in-human (FIH) trials, some of which (e.g., FIH dose selection) arguably perform better than others (e.g., predicting drug-induced liver injury) (Avila et al., 2020). While most of these studies involve animal testing, federal regulations state that pharmacology and toxicology data can also come from studies conducted ex vivo, in silico, and in vitro (ICH, 2000; EMA, 2018; US FDA, 2018).

Central nervous system
For the central nervous system (CNS), ICH S7 specifies that the effects of a test substance on motor activity, behavioral changes, coordination, sensory/motor reflex responses, and body temperature should be evaluated. Participants described NAMs that could assess motor activity, behavioral changes, and blood-brain barrier integrity.
Participants noted that established models such as Ntera-2D1 cells (differentiated with post-mitotic cell product) have been used extensively in motor activity screening programs over the last 10-20 years to assess elementary functionality and network formation, so a good body of data supports their use. They are advanced in terms of addressing regulatory requirements and display favorable characteristics (e.g., viable for extended periods), which allow several measurements to be routinely gathered using cellular platforms such as high-throughput imaging as well as cell viability assays. The Ntera-2D1 platform can also generate functional astrocytes and neuronal cells that can be used in calcium imaging and electrophysiological studies to give an understanding of brain cortex impact, and therefore, give some indication of behavioral change under drug pressure (Hill et al., 2012; Woehrling et al., 2015). However, Ntera-2D1 cells typically take 6-8 weeks to grow, making them expensive to maintain, and genetic instability can also result in post-mitotic cells. Other, less sophisticated models such as SH-SY5Y cells can be differentiated relatively quickly (~1 week) into neuronal-like cells for functionality studies.
It was observed that blood-brain barrier (BBB) models using transformed cell lines (e.g., hCMEC/D3) are well established

Results

Workshops
Thirteen individuals participated in the workshops: 2 preclinical scientists, 5 NAM developers, 5 who were both preclinical scientists and NAM developers (of whom 2 had extensive regulatory experience), and 1 regulator. Their affiliations (or previous affiliation in the case of one retiree) included the pharmaceutical industry, contract research, SME/start-up, toxicology, drug safety, regulation, academia, and bioengineering. Three were female. One was from the US, two from mainland Europe, and 10 from the UK.

Maps
Figure 1 provides an overview of the typical process of nonclinical safety and efficacy screening. Selected drug candidates are tested using computational methods and in vitro assays to address safety, kinetics (drug exposure), and efficacy. To predict efficacy and safety outcomes, effects observed in in vitro and in vivo assays are compared with in vivo human drug exposure, which is predicted using computational PBPK tools that may include quantitative systems toxicology (QST). These analyses inform decisions about whether the predicted benefit versus risk justifies progression to clinical trials and, if so, whether a bespoke safety monitoring plan is required.
The safety evaluation for each organ system is detailed in Figures 2-5. Safety evaluation of the liver included drug-drug interaction (DDI) studies, for which the liver is the primary organ of concern.

from lack of reproducibility and variable functionality, depending on the differentiation protocols used. Nevertheless, it was acknowledged that they offer the possibility of long-term and recovery studies since they can be retained in culture for many months and may also provide improved BBB models with superior paracellular integrity compared with current transformed cell line models.
Participants observed that the current in vitro methods alone cannot yet generate the "higher level" data requested by regulators, such as sensory and motor reflex responses or whole-body temperature effects, although computational packages exploiting real-world and/or clinical data, combined with NAM data, are beginning to address these challenges.

Cardiovascular system
For the cardiovascular system, ICH S7 specifies that the effects of a test substance on blood pressure, heart rate, and the electrocardiogram should be evaluated. Participants identified NAMs that could address repolarization and conductance abnormalities, contractility and, to a very limited extent, vascular endpoints.

and used to measure paracellular integrity and, therefore, BBB function. Although these cell lines exhibit only moderate paracellular tightness, they are applicable for these studies. Standard transporter assays, inhibitor studies, and subcellular localization studies by immunofluorescent imaging, together with permeability measurements, provide an overall picture of BBB integrity. However, complexity was noted to be an issue with any model of the BBB, including the number and types of cells necessary. Regarding human brain endothelial cells, it was acknowledged that few cell lines with physiological paracellular tightness are available, leading to attempts to recapitulate the BBB physiology more closely using multicellular (e.g., spheroids) and microfluidic (including shear stress) models. However, participants noted that these models are still under development and not yet standardized. Participants also observed that few primary human brain endothelial cell models achieve a paracellular tightness that is similar to the in vivo situation. Consequently, lower cost and more standardized transformed cells are currently preferred.
Participants noted that, while iPSC-derived neuronal cell types offer hope for the future of CNS studies, they currently suffer

The absence of expertise needed to discuss i) contractility measurements with iPSC-CMs and ii) vascular effects was apparent from the limited population of these parts of the cardiovascular system map. Publications describing combined methods for measuring contraction with iPSC-CMs were highlighted (Pointon et al., 2015; Wang et al., 2020), but with the available expertise the panel members were unable to comment extensively on these approaches. Angiogenesis models were mentioned and several technologies were suggested, although participants provided little detail. Methods and readouts for contractile force and stress, arrhythmogenic risk, and direct effects on the vascular system of the heart were covered. However, it was acknowledged that further development of this map is required to provide a more comprehensive coverage of the numerous mechanisms by which drugs may adversely affect the cardiovascular system.

Respiratory system
For the respiratory system, ICH S7 specifies that the effects of a test substance on the respiratory rate and other measures of respiratory function such as tidal volume and hemoglobin oxygen sat-

Participants noted that, while in vitro measurement of cardiac arrhythmia risk has traditionally employed hERG transfected cell lines (typically HEK293 and CHO), the development of human iPSC-derived cardiomyocytes (iPSC-CM) as well as platforms such as microelectrode arrays (Kanda et al., 2018) and automated patch clamp (Li et al., 2019) have enabled more extensive examination of multiple ion channel activities simultaneously. Co-culture of cardiomyocytes with cardiac fibroblasts or preparations of commercially available cells with mixed cell populations have also proved beneficial in many of these assays, although standardization remains a goal for many suppliers and users of these cells. It was acknowledged that reservations remain concerning the maturity and functionality of iPSC-CMs (Guo and Pu, 2020), with many cell types displaying fetal characteristics. This is already under consideration within the Comprehensive in vitro Proarrhythmia Assay (CiPA) program².
The use of iPSC-derived engineered heart tissue was discussed as it has enabled more contractility applications to be developed for assessing cardiovascular effects (Lemme et al., 2018). Traditional platforms like fluorescence imaging plate readers and high-content imaging are still used along with MPS incorporating co-culture 3D models. It was considered that further advances with multi-organ models (e.g., liver and heart (Ferrari and Rasponi, 2021)) will go some way towards creating a high-

Fig. 3: Cardiovascular map
Participants highlighted key functional processes (purple boxes) identified by regulators and the assays that could address those processes. Assay outputs, using the NAM platforms (blue boxes), cell types (orange boxes), and measurements (green boxes), are integrated to provide evidence of the compound-related effect on each system. 2D, 2-dimensional; 3D, 3-dimensional; ADME, absorption, distribution, metabolism and excretion; APD, action potential duration; CHO, Chinese hamster ovary; FDSS, functional drug screening system; FLIPR, fluorescent imaging plate reader; HEK, human embryonic kidney; hERG, human ether-à-go-go-related gene; iPSC, induced pluripotent stem cells; OOC, organ-on-a-chip; PECAM1, platelet and endothelial cell adhesion molecule 1; PBPK, physiologically based pharmacokinetic; SIRPA, signal regulatory protein alpha; TdP, torsades de pointes

ture are becoming more established, instrumentation to generate exposure conditions that mimic those that occur in vivo is still being developed, and only a few systems are commercially available (Sadler et al., 2011; Primavessy et al., 2021). Greater use of computational models (e.g., multiple-path particle dosimetry) or computational fluid dynamics was suggested (Kuprat et al., 2021; Corley et al., 2021; Su et al., 2020; Tsega, 2018) to determine the benchmark dose (BMD) or no-observed-effect level (NOEL) and was highlighted as an area ripe for research. One approach accepted by the US Environmental Protection Agency (US EPA) involves computational fluid dynamics, particle size, MucilAir™, and BMD modelling, resulting in an EPA waiver for the 90-day rat inhalation toxicity test (Hardy et al., 2017; Varewyck and Verbeke, 2017; LASA and NC3Rs, 2009; US EPA, 2016).

Liver
ICH S7 suggests that the effects of a test substance on organ systems not investigated as part of the core battery should be assessed when there is a reason for concern. Assessment of DILI was regarded by participants as an established part of the safety testing of all drugs, with the liver being a relatively frequent toxicity target organ in in vivo animal safety. The liver is also the primary organ involved in induction and/or inhibition of compound metabolism, the generation of metabolites that may be toxic to the liver itself or to other organs, and any potential DDIs.
Many different liver NAMs have been described, and hepatotoxicity testing was regarded by participants as an established part of the nonclinical safety testing undertaken by many pharmaceutical companies. However, currently there is no consensus on which of these should be used during drug discovery and development to reduce risk of in vivo liver toxicity (Fernandez-Checa et al., 2021). To address this gap, NAMs that address key biological processes by which hepatotoxicity and DDIs may arise, and which are suitable for inclusion for regulatory safety testing, were discussed and proposed.
The most clinically concerning consequence of DILI is acute liver failure, which arises following damage to a substantial fraction of hepatocytes (Fontana, 2008). Participants felt that current human primary hepatocyte (PHH) NAMs capture most of the endpoints required to assess toxicity to hepatocytes. A particular advantage of human hepatocytes is that they avoid species differences in expression of liver drug metabolizing enzymes and transporter proteins, which is an important limitation of animal safety and DDI studies. In addition, studies undertaken using individual donor hepatocytes enable evaluation of the impact of inter-individual variability in drug metabolism and drug transporter function, which play a key role in many DDIs. However, many liver NAMs use hepatocytes pooled from multiple donors, which reduces or eliminates the impact of this variability. Due to the limited availability of PHHs, many companies instead undertake cellular toxicity testing of human liver-derived cell lines (e.g.,

uration should be evaluated. Participants identified NAMs that could address many aspects of respiratory function for regulatory purposes, including sensitization, inhalation toxicity, fibrosis, and irritation. Methods for determining correct exposure levels were also described.
Participants agreed that manifested physiological effects on respiratory function were often due to damage or changes at the cellular level, which could be obtained as in vitro read-outs and, when integrated, could provide an indication of overall respiratory function and/or toxicity. Notably, although effects such as CNS depression or effects on pulmonary vasculature would be absent from in vitro models, participants felt that the cellular endpoints described in this map and the NAMs associated with these pathways offered a more human-relevant assessment of respiratory toxicity than current in vivo animal measurements could provide. Nonetheless, it was acknowledged that measurements requested in current guidelines for tidal volume, respiratory rate, etc. are only possible in whole organisms. Participants noted that in vitro measurement of epithelial cell health, immune response, cilia motion, omics approaches, and mucus secretion are well developed within the chemicals testing sector, where acute inhalation toxicity may be monitored using a variety of commercially available proprietary models such as SenzaGen's³ GARD®air, Epithelix's⁴ MucilAir™ and SmallAir™, MatTek's⁵ EpiAirway™ and EpiAlveolar™, ImmuONE's⁶ ImmuPHAGE™ and ImmuLUNG™, and Invitrolize's⁷ ALISens™.
The use of mucus-producing epithelial cells or co-cultures of human alveolar epithelial cells with pulmonary microvascular endothelial cells enables cilia function, mucus production, and transepithelial electrical resistance to be monitored, giving an indication of barrier function². Moreover, it was felt that the use of patient/disease donor cells in this area (e.g., from chronic obstructive pulmonary disease, asthma, and cystic fibrosis patients) provides additional translational information and has enabled better characterization of these disease phenotypes and potential tailored treatments.
One participant commented that the US FDA wants co-culture 3D models with a pathology output and biomarker support for safety assessment to predict an animal or clinical result (Clippinger et al., 2018). He also suggested that good markers are needed as surrogates for the respiratory response. 3D models offer many advantages over traditional 2D models because they can be maintained for months and used to study longer term exposure, multiple dosing, and recovery. They also may be implemented with other tissue models in multi-organ chips to investigate systems biology. However, as with other models that include multiple cultured cell types, it was felt that quality control and genetic stability were necessary for their widespread adoption.
A major challenge identified for respiratory safety testing was that exposure and cell response in vivo are determined by the complexity of the lung. Although relevant cell types for co-cul-

ger-term measurements to be performed. A recent in vitro toxicity test of 27 hepatotoxic and non-hepatotoxic test drugs in a proprietary MPS platform identified drugs that caused human DILI with very high specificity (100%) and high sensitivity (87%) (Ewart et al., 2022). It will be important to determine whether this performance can be replicated using larger numbers of drugs, and also whether similarly impressive results can be obtained using other liver MPS platforms.
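Sensitivity and specificity, as quoted for the MPS study above, are simple ratios from a table of assay calls against known clinical outcomes. The sketch below shows how they are calculated. The counts are purely illustrative and are not the confusion matrix of the cited study, and the function and variable names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known hepatotoxic drugs that the assay flags as toxic."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of known non-hepatotoxic drugs that the assay calls non-toxic."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (not from Ewart et al., 2022):
# 10 hepatotoxic drugs, of which 8 are flagged; 10 non-hepatotoxic drugs, none flagged.
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=10, false_pos=0)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

With small drug panels, each misclassified compound shifts these percentages substantially, which is one reason replication on larger drug sets matters.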
In principle, liver NAMs that maintain viability and function for many weeks might be used to explore reversibility of effects. When undertaking reversibility studies, models will need to be monitored regularly to ensure that no deterioration or dramatic changes occur during the study period. Cell line proliferation in long-term culture, and the conditions in which cells are maintained, were both highlighted as potential confounding variables to be considered when interpreting findings from such studies.
The level of complexity of the various liver NAMs was discussed at length. Participants commented that compared to liver

HepG2, HepaRG). When combined with information such as structural alerts and physico-chemical properties of compounds and the use of computational methods such as QST to support in vitro to in vivo extrapolation, human liver cell-derived NAM data have been successfully used to profile human DILI caused by numerous drugs (Smith et al., 2020). It was further noted that the use of liquid chromatography-mass spectrometry to detect and quantify metabolite formation has provided valuable insights into the role of biotransformation in DILI.
In conventional 2D culture, plated PHHs lose many of their hepatic phenotypes and functions within hours (Lauschke et al., 2016a). To overcome these limitations, various 3D culture methods, including spheroids, micropatterned co-cultures (MPCC), and liver MPS, have been developed which maintain PHH viability and functionality for weeks (Smith et al., 2020; Lauschke et al., 2016b, 2019; Lin et al., 2016; Zhang et al., 2020). Participants noted that such 3D liver systems are now generally accessible and allow dose responses and some lon-

When designing and interpreting human liver NAM studies, participants noted that concentrations of drugs and drug metabolites that are formed within human liver in vivo need to be considered. This is challenging since drug exposure within liver cells in vivo may be markedly higher than drug exposure in blood.
The need to evaluate differences between transient and persistent effects was also discussed. Participants cited troglitazone, which was withdrawn from the US market after deaths and severe liver failure in patients treated with the drug for prolonged intervals (>1 month). It has been proposed that activation of adaptive immune responses could play an important role in severe liver injury caused by troglitazone and by many other drugs. The currently available liver NAMs do not assess activation of adaptive immune responses. This gap was considered by the participants to be an additional important limitation of the proposed liver map.

cell suspensions or 2D culture models, more complex liver systems such as MPS, MPCC, and spheroids have the advantage of longer viability periods (~6 weeks), enabling effects such as fibrosis and cholestasis to be observed through interactions between co-cultured non-parenchymal cells (Ware et al., 2015). Many of the MPS catalogued by the North American 3Rs Collaborative⁸ focus on the liver, although at the time of writing companies operating in 19 different tissue areas were identified.
It has been reported that iPSC technology enables the development of hepatocyte-like cells that can be used to explore variability between humans in susceptibility to drug-induced liver injury in vitro (Choudhury et al., 2017; Takayama et al., 2014; Ouchi et al., 2019). However, participants noted that such individual donor cells may not be suitable for use in all platforms, potentially resulting in non-fully functioning liver models.

Fig. 5: Liver map
Participants highlighted key functional processes (purple boxes) identified by regulators and industry and the assays that could address those processes. Assay outcomes (yellow boxes), using the NAM platforms (blue boxes), cell or preparation types (orange boxes), and measurements (green boxes), are integrated (white boxes) into QST platforms to provide evidence of the compound-related effect on each system. 2D, 2-dimensional; 3D, 3-dimensional; ALT, alanine transaminase; ATP, adenosine triphosphate; BCS, biopharmaceutics classification system Seahorse™; CYP, cytochromes P450; GSH, glutathione; LDH, lactate dehydrogenase; MPS, microphysiological system; QST, quantitative systems toxicology

ties provided by in vitro methods. Some participants commented that using NAMs was risky because they could be rejected by regulators in favor of in vivo tests, resulting in lost revenue and delay. The cost of some NAMs and the lack of a comprehensive database of positive and negative test compounds were also felt to discourage their uptake.

Factors likely to increase uptake of NAMs
Participants highlighted the advantages of collaborating with stakeholders, citing the examples of previous genotoxicity working groups (European Environmental Mutagenesis and Genomics Society¹⁰; Health and Environmental Sciences Institute, 2021¹¹; Gocke et al., 2000) and the CiPA initiative², and suggested that talking to decision-makers within regulatory agencies prior to using NAMs was advisable. A collaborative approach was also considered useful in the context of standardization and validation. Generally, it was felt that engaging with other disciplines allowed more scientists to learn about the capabilities of NAMs and that much could be learned from other regulated sectors.
One participant commented that it would be helpful if NAMs were employed to assess developmental and reproductive toxicology (DART) prior to FIH. Application of NAMs to the assessment of DART was not discussed within the workshop and is a major focus of activity by other researchers. To increase the familiarity of regulators with NAMs, another participant suggested that it would be helpful for companies to describe in-house in vitro tests that they have used when making their regulatory submissions. For the same reason, participants supported the idea of submitting in vitro data alongside in vivo data. A bridging approach was mentioned in this context, whereby data are correlated between animal in vivo and animal in vitro, between animal in vitro and human in vitro, and finally between human in vitro and human in vivo studies (NASEM, 2021). Uncertainty was expressed about how regulators would choose between the results of in vitro and in vivo experiments in the likely event that they produced conflicting results. A participant responded that the concern of regulators was to make a judgement about human risk, not to compare predictive ability.
Other suggestions included rewording ICH M3 to make the guidance more flexible (currently, it was felt to convey that animal studies were the default option), agreeing a basic battery of cellular tests that could quickly identify compounds with high intrinsic toxicity (although it was recognized that some companies might not welcome a prescriptive approach), and making more NAMs commercially available. Some noted that, while much time was now being spent on validating NAMs, the known strengths and weaknesses of animal tests were receiving less attention. Participants felt that there was now compelling evidence that the replacement of some poorly performing animal tests would result in fewer clinical trial failures for safety reasons.

Overall themes
Several themes recurred in the workshops, which are presented briefly here and summarized in the supplementary file⁹.

Advantages of NAMs
Participants highlighted that NAMs have been used retrospectively to identify toxicity that animal studies failed to detect, citing examples of where in vivo toxicity was predicted in vitro (Dirven et al., 2021; Balogh Sivers et al., 2018), but agreed that the challenge now was to apply NAMs prospectively to predict toxicity, citing an example of where this had already been achieved (Smith et al., 2020; Watkins, 2020). Several participants commented that US regulators have expressed confidence in NAMs, with many being used in Good Laboratory Practice toxicology tests and accepted by the EPA. Participants noted that the US FDA uses modelling software for evaluating drug safety, including Simcyp™ and GastroPlus®, neither of which underwent US FDA qualification, prompting them to question whether formal validation of NAMs is necessary.
Participants reported many advantages of NAMs, including their ability to highlight mechanisms of toxicity (Woodhead et al., 2017), represent human variability, provide information about the efficacy and safety of compounds (using diseased tissue models), investigate the impact of compounds over longer, chronic durations, evaluate reversibility (using complex cell models), and rapidly screen many different types of compounds for their adverse effects on target organs.

Factors discouraging the uptake of NAMs
Concerns were raised about the quality of cells used for in vitro studies, including the possibility of genetic drift, batch-to-batch differences, problems with some culture media, and specific concerns relating to iPSCs. Several participants commented that NAMs are still unable to generate some information about higher level endpoints or to represent exposure that occurs in the whole organism. However, it was felt that models such as computational fluid dynamics or PBPK computations are increasingly able to address this challenge, especially alongside the MPS revolution. Participants emphasized the need for platform designers to develop NAMs tailored to regulatory requirements. Some participants reported that regulators were sometimes unclear about what data they required and that it was unhelpful to be asked to submit data without being told upfront why it was necessary. Others observed that, while several animal tests conducted in the early stages of drug development could now be replaced with in vitro methods, some companies lacked the confidence to jettison them. Conversely, it was noted that several pharmaceutical companies (e.g., AstraZeneca (US FDA, 2019)) now routinely use in vitro tests in their preliminary in-house studies to screen and triage compounds. Participants emphasized, however, that without more transparency about this, relatively few researchers and companies realize the opportunities [...]
⁹ doi:10.14573/altex.2212081s
¹⁰ https://www.eemgs.eu/affiliated-groups/
¹¹ https://hesiglobal.org/genetic-toxicology-gttc/

Discussion
These workshops brought together a diverse group of scientists to develop maps of NAMs for four vital organ systems. Greater regulatory input would have benefited the discussions and confirmed whether the NAMs included in the maps are able to address the needs identified by the US FDA (Butler et al., 2017). Only one of ten invited regulators accepted the invitation, and there was no representation from large pharmaceutical companies. Representation from the computational toxicology community was also limited, and it is acknowledged that not all relevant computational applications are likely to have been included in the maps. Furthermore, it is recognized that the maps are not a complete framework for addressing all the nonclinical safety concerns for new pharmaceuticals.
The workshops identified general, dynamic issues affecting organ testing systems that are difficult to represent in a static format. First, acute and chronic toxicity may be expressed as different endpoints, and therefore, distinct methods may be required for each, but different timepoints cannot easily be mapped in a static format. Second, while a battery of assays may be used that encompasses multiple steps of a toxicity pathway to address questions typically derived from in vivo animal studies, it is difficult to represent each component in map form. Third, it was difficult to capture nascent technologies that are not yet commercially or widely available. These issues highlight the need for a more dynamic means of representing this information, such as an interactive decision tree that users could interrogate to answer specific questions.
Understanding exposure levels without employing in vivo animal methods was considered problematic. While an increasing number of computational models exist, some felt that the link between exposure and cell response was missing in many areas. Nevertheless, many computational models are now able to use in vitro and human clinical data to successfully determine effective dose levels and regimes for new or next-in-class molecules (ICH, 2000, 2009; Ferrari and Rasponi, 2021; Bai et al., 2020; IPCS and IOMC, 2010; Busquet et al., 2020), making many animal studies redundant when undertaking dose prediction. Some participants observed that it was not yet possible to generate data on full system effects and higher-level responses using NAMs. It was noted that animal studies are likewise unable to provide reliable data on such responses and that such data can only be reliably obtained after Phase II trials. Furthermore, combining computational technologies with in vitro, existing in vivo, and clinical data makes it possible to model a whole system response to replace the use of animals (US FDA, 2021b).
It was unequivocally acknowledged that a number (or panel) of integrated in vitro and/or computational assays will be required to address safety needs that currently are unmet by animal studies. The described NAMs may be used in this way to measure cellular endpoints relating to the biological pathways involved in toxicity and pharmacological perturbations. This is the basis of the adverse outcome pathways (AOP) approach, which describes the sequence of molecular and cellular events necessary to produce a toxic response, as well as appropriate testing strategies for each event at multiple levels of biological complexity. The workshops identified gaps in assay endpoints, which when addressed should provide a better understanding of investigational drug effects.
The discussions revealed that traditional industry disciplines and their somewhat siloed nature (e.g., safety pharmacology, primary pharmacology, secondary pharmacology, general toxicity) may result in the unnecessary duplication of animal experiments. At present, no body oversees drug safety testing practices, although sponsors and regulators could identify where tests required for one discipline (e.g., safety pharmacology) might also be used for another (e.g., general toxicology) to prevent duplication and fulfil some of their 3Rs programs' objectives. It may be that the current classification of studies within traditional primary, secondary or safety pharmacology fields is no longer appropriate as NAMs continue to uncover, for example, mechanistic data or mode of action information that traverse conventional disciplines.
An important insight was that NAMs are highly valued within pharmaceutical companies and often routinely used in-house in the early stages of drug development, but that this is not widely known. Greater transparency would increase awareness among the broader scientific community about the value of NAMs in this context. A requirement for companies to specify in their submissions to regulators which NAMs they use in-house would also familiarize regulators with NAMs. Commercial protection would be necessary, but agencies could omit the test compound, formulation information, and other commercial details, and the sponsor could choose whether to be identified. Another reason for familiarizing regulators with NAMs is the possibility that the lack of guidance on in vitro methods in ICH M3 may reflect regulators' lack of expertise in these methods. If regulators are unable to recommend specific NAMs, guidelines will remain rooted in existing animal-based methodologies.
Regulator-sponsor collaboration about the NAMs to be used prior to submission would give all parties an understanding of the robustness, reproducibility, context of use, and applicability of specific NAMs. The ISTAND program (Piersma et al., 2018) now makes this possible, enabling developers of NAMs to communicate with the US FDA in the early stages of the development process. A notable example is the evaluation of liver MPS and spheroid reproducibility for drug hepatotoxicity and metabolism studies (Rubiano et al., 2021). Such initiatives give regulators a greater acquaintance with NAMs, enabling them to make upfront recommendations to scientists about the sort of in vitro tests that might be done, and to update guidelines where necessary. The use of facilitator groups such as the Health and Environmental Sciences Institute can also be successful in uniting stakeholders, enabling regulators to learn first-hand about the advantages, limitations, and context of use of NAMs, as with the CiPA initiative. The International Consortium for Innovation and Quality in Pharmaceutical Development has published pharmaceutical companies' preferred characteristics and features for MPS to guide system developers, regulators, and end users (Baudy et al., 2020; Fowler et al., 2020; Peterson et al., 2020; Pointon et al., 2021).
While human patient-derived cells retaining their inherent genotypic and phenotypic variability were considered invaluable for understanding heterogeneity in response to treatments and the manifestation of and susceptibility to disease (Chioccioli et al., 2019; Nantasanti et al., 2016; Bartfeld and Clevers, 2017; Kennedy et al., 2019; Yin et al., 2020; Liu et al., 2021), it was acknowledged that quality concerns need addressing if in vitro tests are to achieve wider acceptance. Good Cell and Tissue Culture Practice 2.0 (Pamies et al., 2022) now includes updated chapters on 3D culture, MPS, genetically modified cells, and pluripotent stem cells. NAMs also need to be fit for purpose (Parish et al., 2020) and properly reported (Bracher et al., 2020). Agency working groups set up to agree performance criteria for NAMs are already in place (EMA, 2021; US FDA, 2021c).
It was agreed that complex models using 3D culture may provide important information relating to system responses. However, low throughput, high cost, and the ability to measure these responses are all challenges that need to be addressed before wider adoption is possible. Andersen et al. (2019) have described tiered application scenarios for NAMs in the risk assessment of chemicals and stratified them depending on their complexity and context of use, providing a potential framework for including more complex models within an overall testing strategy.
While some companies appear hesitant about investing in NAMs due to perceived financial risks, clearly others are already investing in them heavily for the early screening of compounds. The acceptance of those screens as direct replacements in the regulatory environment would make the drug discovery process cheaper, faster, and more reliable. A return-on-investment model is needed to fully understand the financial opportunities offered by NAMs in drug discovery. As an example, for MPS, the reduction in total R&D costs for pharmaceutical companies is estimated to amount to 10-26% (~$700M) per new drug (Franzen et al., 2019).

Conclusion
In this workshop, a strategy for NAM use in four key areas of nonclinical pharmaceutical safety testing was presented. Comprehensive organ system maps, such as those documented within this article, will provide stakeholders with guidance on, and confidence in, the NAMs that may be used to complement and ultimately replace in vivo animal methods. Converting these maps into interactive decision trees would allow users to ask specific questions and select the most appropriate NAM for their purpose. The development of a more dynamic and user-friendly version of these maps is, therefore, an important project for the future. With broader discussion and further development, our maps could provide a template for the safety testing of other organ systems, or use in other testing contexts, leading to greater adoption of NAMs, improved productivity within pharmaceutical companies, and potentially, safer medicines.
The authors recommend that pharmaceutical companies be transparent about the NAMs employed in the early stages of drug development, that they include the results from all relevant NAMs used in their submissions to regulators, and that there be regulator-sponsor collaboration about NAM use prior to submission. These actions would familiarize regulators with NAMs, enabling them to update guidelines, where necessary, and make upfront recommendations to nonclinical scientists about NAMs that might be employed.
In addition, the terminology of safety pharmacology and toxicity testing guidance should be updated so that appropriate language is used to describe where NAMs may be employed in the nonclinical safety assessment of drug candidates.