Predicting Acute Oral Toxicity Using AcutoX: An Animal Product-Free and Metabolically Relevant Human Cell-Based Test

AcutoX is a human in vitro test method for the evaluation of acute oral toxicity, developed using a library of 67 curated test chemicals. These chemicals cover a wide variety of chemistries, industrial sectors, rodent toxicities, and all EPA and GHS hazard categories. The test uses two different cytotoxicity endpoints (Neutral Red uptake and MTT metabolism), each performed in the presence and absence of a pooled human liver extract (S9), to produce four EC50 values. The EC50 values are used in prediction models that first assign chemicals to “highly toxic” or “low toxicity” categories for both the EPA and GHS classification systems. This binary prediction can then be refined to assign a hazard category. The binary “highly toxic” / “low toxicity” prediction model has an accuracy of 73.8% and 63.1% for EPA and GHS, respectively. The subsequent hazard categorization gives a protective prediction (correct or more hazardous category) in 90.0% and 93.3% of cases, respectively. Moreover, the AcutoX test can identify chemicals that are activated or detoxified by liver metabolism.


Introduction
Acute systemic toxicity studies are used by regulatory agencies to determine hazard categorization, assign appropriate labelling, identify potential toxicity hazards, and support risk assessment (Strickland et al., 2018, 2023). There are three OECD test guidelines for acute oral systemic toxicity: OECD 420 (OECD, 2002a), OECD 423 (OECD, 2002b) and OECD 425 (OECD, 2022). Depending on the guideline used, the outcome is an LD50 range (OECD 420 and OECD 423) or a point estimate (OECD 425). In simple terms, these in vivo regulatory tests are used to determine the dose that results in 50% lethality, i.e., the oral LD50 value. These values are used to assign substances to toxicity categories under two widely used categorization systems. The first is the United States Environmental Protection Agency (EPA) system, consisting of four categories. The second is the United Nations Globally Harmonized System of Classification and Labelling of Chemicals (GHS), consisting of five categories for chemicals or mixtures. Both systems are summarized in Tab. 1.
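The following Python sketch shows how an LD50 maps onto the two systems. It assigns EPA and GHS categories using the commonly cited oral cut-offs. For EPA these are ≤50, ≤500, ≤5000 and >5000 mg/kg; for GHS they are ≤5, ≤50, ≤300, ≤2000 and ≤5000 mg/kg. The sketch assumes these match Tab. 1. The function names are hypothetical. Returning None for GHS above 5000 mg/kg (not classified) is an assumption about boundary handling.

```python
from typing import Optional

# Upper LD50 limits (mg/kg body weight) for each category, most toxic first.
# Commonly cited oral criteria; assumed here to match Tab. 1.
EPA_LIMITS = [(50, 1), (500, 2), (5000, 3)]  # EPA Cat 4: > 5000 mg/kg
GHS_LIMITS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]


def epa_category(ld50_mg_per_kg: float) -> int:
    """Return the EPA acute oral category (1-4) for an LD50 in mg/kg."""
    for upper, category in EPA_LIMITS:
        if ld50_mg_per_kg <= upper:
            return category
    return 4


def ghs_category(ld50_mg_per_kg: float) -> Optional[int]:
    """Return the GHS acute oral category (1-5), or None above 5000 mg/kg."""
    for upper, category in GHS_LIMITS:
        if ld50_mg_per_kg <= upper:
            return category
    return None


print(epa_category(120), ghs_category(120))  # 2 3
```

The same LD50 can fall into categories with different numbers in the two systems. An LD50 of 120 mg/kg, for example, is EPA Cat 2 but GHS Cat 3.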
There are many uses for the data generated from these tests (Strickland et al., 2018). Examples include determining acceptable human exposure limits for consumer or industrial products and identifying suitable personal protective equipment for handling chemicals in laboratories or manufacturing facilities.
Efforts have been made to develop and use alternative test methods, or new approach methodologies (NAMs), to reduce or replace the traditional animal tests used for acute toxicity. The United Nations GHS Mixtures Equation (UN, 2023) estimates the acute toxicity of a mixture from the toxicities of its components. Hamm et al. (2021) evaluated this equation using the EPA classification system. They reported a concordance of 55% for a dataset of 671 mixtures (620 agrochemical formulations and 51 antimicrobial cleaning products), and substances predicted by the equation to have low or negligible acute oral toxicity were adequately identified. Sjöström et al. (2008) compared mouse LD50 values with human blood LC50 data for 67 reference chemicals. The LC50 data were calculated from time-related sub-lethal and lethal blood concentrations determined in human acute poisoning cases. Their linear regression analysis gave a coefficient of determination (R²) of 0.56, indicating discordance and poor predictive value of the animal data for human hazard. Alberga et al. (2019) used a multi-fingerprint similarity approach to make high-accuracy predictions of the median lethal dose. The same approach predicted the "very toxic" (LD50 <50 mg/kg) and "low toxicity" (LD50 >2000 mg/kg) EPA and GHS categorizations, and the authors suggested using it alongside other available models for regulatory purposes.
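The additivity formula behind the GHS Mixtures Equation can be written as 100 / ATE_mix = Σ (C_i / ATE_i). Here C_i is the concentration (%) of ingredient i and ATE_i is its acute toxicity estimate (mg/kg). The Python sketch below implements only this core formula. It assumes the caller passes only relevant ingredients with known ATEs. It omits the GHS adjustment for ingredients of unknown toxicity, and the example mixture is hypothetical.

```python
from typing import Iterable, Tuple


def ate_mix(components: Iterable[Tuple[float, float]]) -> float:
    """Acute toxicity estimate of a mixture (mg/kg) by the GHS additivity formula.

    100 / ATE_mix = sum(C_i / ATE_i), where each component is a pair
    (C_i = concentration in %, ATE_i = acute toxicity estimate in mg/kg).
    Only ingredients with known ATEs are handled; the GHS correction for
    ingredients of unknown acute toxicity is deliberately not implemented.
    """
    total = sum(conc / ate for conc, ate in components)
    if total <= 0:
        raise ValueError("at least one ingredient with C_i > 0 is required")
    return 100.0 / total


# Hypothetical mixture: 10% of an ingredient with ATE 100 mg/kg
# and 5% of an ingredient with ATE 1000 mg/kg.
print(round(ate_mix([(10, 100), (5, 1000)]), 1))  # 952.4
```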
When creating new test methods (e.g., in vitro, in silico or in chemico) to replace existing in vivo test methods, the variability in performance of the animal test itself should be understood (van der Zalm et al., 2022). Karmaus et al. (2022) evaluated the variability in in vivo acute oral hazard classification. They compiled, curated, and analyzed rat acute oral LD50 data from multiple databases to characterize the variability and reproducibility of results across 2441 chemicals with multiple independent study records. Their analysis demonstrated that, on average, replicate studies resulted in the same hazard categorization only 60% of the time. The observed variability gave a margin of uncertainty of ±0.24 log10(mg/kg). They identified many reasons for this discordance, including choice of species (i.e., rat or mouse), strain, age and weight range, treatment vehicle, diet and housing. Sections 10 and 11 of test guideline 420 (OECD, 2002a) require healthy young adult female animals of commonly used laboratory strains aged between 8 and 12 weeks. However, these animals may be fed different conventional laboratory diets, with unlimited access to water. By contrast, in vitro test methods require much higher precision in test systems and cell husbandry. For regulatory acceptance, culture medium composition and all other components of the test system must be highly standardized. Labs may still perform the in vivo tests differently, especially in a non-regulatory context. Some discordance with the animal data should therefore be expected when developing in vitro test methods to replace in vivo tests.
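A ±0.24 log10(mg/kg) margin corresponds to a factor of 10^0.24 ≈ 1.74 either side of a point estimate. The hypothetical Python helper below uses the standard GHS oral cut-offs and lists the GHS categories spanned by that band. Using index 6 for "not classified" is an illustrative convention. The helper shows why an LD50 near a category boundary can plausibly fall into more than one category. An example is the 2.94 mg/kg mean for difenacoum discussed in Section 2.1.

```python
import math
from typing import List

# GHS oral upper limits in mg/kg; index 6 is used here for "not classified".
GHS_LIMITS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]


def ghs_index(ld50: float) -> int:
    """GHS category 1-5 for an LD50 in mg/kg, or 6 if above 5000 mg/kg."""
    for upper, category in GHS_LIMITS:
        if ld50 <= upper:
            return category
    return 6


def categories_within_margin(ld50: float, margin_log10: float = 0.24) -> List[int]:
    """GHS categories spanned by log10(LD50) +/- margin_log10."""
    centre = math.log10(ld50)
    low = 10 ** (centre - margin_log10)
    high = 10 ** (centre + margin_log10)
    # Categories are ordered by LD50, so every category between the two ends is spanned.
    return list(range(ghs_index(low), ghs_index(high) + 1))


print(categories_within_margin(1000))  # [4]
print(categories_within_margin(2.94))  # [1, 2]
print(categories_within_margin(3000))  # [4, 5, 6]
```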
In silico methods have also proved to be useful non-animal approaches to predict human oral acute toxicity, especially when used in combination with in vitro tests. To create the CatMOS database, a highly curated data set of 11,992 chemical structures was assembled for which the EPA and/or GHS oral acute toxicity categories were known with high confidence (Mansouri et al., 2021). Other curated data resources have been published, such as the Integrated Chemical Environment database (Daniel et al., 2022).
Another factor to consider is species differences between rodent (in vivo) and human (in vitro) test systems, and how these may translate to humans (in vivo) in occupational, consumer and patient settings. Anatomical and physiological differences provide many potential sources of variability. These include differences between the gastrointestinal tracts of large animals and humans (Furness et al., 2015), and between those of humans and laboratory animal species including rodents (Kararli, 1995). The critical differences between the rat and human include metabolic, gut microbiota, anatomical, morphological, and physiological factors, including gastrointestinal tract pH and bile, pancreatic juice, and mucus secretion. The livers of these species are known to differ in physiology (Kogure et al., 1999) and to contain species-specific enzymes (Martignoni et al., 2006). While few species differences are observed for CYP2E1, there are considerable species-specific isoforms of CYP1A, CYP2C, CYP2D and CYP3A (Martignoni et al., 2006). These differences would be expected to modify toxicokinetics and chemical clearance and to produce different metabolites. The result could be different detoxification pathways or increased potency of toxicants, and hence substantially different health hazards. In addition to these phase 1 enzymes, there are species and sex differences in phase 2 enzymes. These include glucuronidation in liver microsomes of humans, monkeys, rats, and mice, with the rank order of in vitro clearance by liver microsomes reported as mice > humans > monkeys > rats in both males and females (Mukai et al., 2015). Species differences in transporters could compound these hepatic enzyme differences by affecting the pharmacokinetic properties, and thus the toxicity, of a chemical (Hammer et al., 2021). For human CYP2D6 substrates, hepatic intrinsic clearance was lower in rats than in humans and correlated poorly with human values. Intestinal clearance of most human CYP3A substrates also appeared to be lower in rats (Nishimuta et al., 2013). Species differences in the intestinal clearance of 13 drugs and 4 drug candidates were attributed to differences in intestinal glucuronidation (Furukawa et al., 2014). This confirms that there are both phase 1 and phase 2 enzyme differences between species. All these parameters influence the rate and extent of absorption of a chemical from the gastrointestinal tract and how it is metabolized and cleared in the animal. They can therefore substantially change its acute toxicity.
The only in vitro model adopted for use in acute oral toxicity testing is a neutral red uptake (NRU) cytotoxicity assay (OECD, 2010). It uses either the BALB/c 3T3 cell line derived from mouse fibroblasts or primary normal human epidermal keratinocytes (NHK). This assay can reduce the number of animals required by identifying starting doses for the acute oral tests, but it is not a replacement test. However, its value for animal reduction has been questioned (Schrage et al., 2011). When the NRU assay was used to estimate the starting dose for the acute oral in vivo test, the default standard starting dose would have been almost as useful, and an experienced toxicologist was far more predictive. A step-by-step approach to replacing animals using this test has been proposed (Kojima et al., 2023); however, it is only suitable for chemicals with LD50 >2000 mg/kg. The NRU model has other weaknesses. It does not incorporate any liver enzyme system to identify chemicals that may be detoxified or activated by metabolism. It also uses only a single measure of toxicity, i.e., cellular membrane damage measured by uptake of neutral red into the cell. Cell death can result from different mechanisms and does not always involve cell membrane damage. In addition, the mouse cell line is cultured in fetal bovine serum (FBS) and benchmarked against in vivo rat data in order to extrapolate to humans. This highlights the complexities of species differences even within in vitro testing, although the mouse-to-human translation step is removed when human NHK cells are used instead. More complex approaches have combined a battery of organ-specific in vitro tests with in silico approaches (Prieto et al., 2013). Prieto et al. (2013) demonstrated the potential to reduce the number of chemicals wrongly predicted as non-toxic (LD50 >2000 mg/kg b.w.) under the Classification, Labelling and Packaging (CLP) Regulation, which implements the GHS classification criteria in the EU. However, uptake of these approaches has been limited. It is increasingly important to understand how international regulatory agencies use acute systemic toxicity data (Strickland et al., 2023). For many jurisdictions and chemical sectors, non-animal approaches are not accepted. Several jurisdictions do, however, provide guidance supporting the use of test waivers, including weight-of-evidence approaches, to reduce animal use for specific applications. These waivers are often limited to chemicals with an expected LD50 of >2000 mg/kg body weight (OECD, 2016).
The aim of this work was to develop an in vitro test method for chemicals ranging from non-toxic to extremely toxic. The method was intended for use alone or with other computational or in vitro test methods, either to reduce the number of animals used in rodent oral acute toxicity tests or to replace these in vivo tests fully. The priority was a fully human-relevant in vitro test system, with multiple cytotoxicity endpoints and incorporated metabolism, to improve predictivity compared with previously available methods.

2.1 Selection of chemicals
Test chemicals with the different GHS and EPA categorizations could not be chosen at random (Mansouri et al., 2021; Karmaus et al., 2022). A curation scheme was therefore devised to select suitable test chemicals, and many factors were considered. There is substantial variability between labs in the oral acute toxicity data obtained for the same chemical (Karmaus et al., 2022), so the robustness of these data was the first criterion. To create a prediction model for both systems, all EPA and GHS categories needed to be included. The relevance of the chemicals to different industrial sectors was also considered. Once the formal curation process had been defined (summarized in Table 2), chemical lipophilicity and dissolution in water (or media) were also considered. Adequate solubility in the media was needed to deliver concentrations high enough to cause cytotoxicity in the human dermal fibroblasts.
The test chemicals were chosen from the well-categorized and curated data set used to create the CatMOS database (Mansouri et al., 2021). A full list of chemicals was downloaded from Supplementary File 1 of Karmaus et al. (2022), giving an initial data set of 7574 entries representing 2441 chemicals. Curation Level 1 rejected any chemical with only 1 or 2 data entries as unreliable, giving a data set of 265 chemicals with 3 or more entries. Where all categorizations were the same (Curation Level 2), there were no discordant data sets. These chemicals were identified as truly categorized (e.g., true GHS Cat 3, true EPA Cat 2, or true EPA Cat 2/GHS Cat 3). For example, acrylamide (CAS No. 79-06-1) had 8 entries, all stating the chemical to be EPA Cat 2 and GHS Cat 3. It was therefore considered a true EPA Cat 2/GHS Cat 3 chemical. Where categorizations were discordant, Curation Level 3 was applied: the majority category was accepted if only one entry differed from all other entries for that chemical. For example, nicotine (CAS No. 54-11-5) had 8 entries. These gave EPA Cat 2 seven times and EPA Cat 1 once, and GHS Cat 3 seven times and GHS Cat 2 once. Nicotine was therefore considered a true EPA Cat 2/GHS Cat 3 chemical. Curation Level 4 used expert judgement, examining the number of classification groups and the mean LD50. For example, difenacoum (CAS No. 56073-07-5) had 7 entries (passing Curation Level 1), all EPA Cat 1 (true EPA Cat 1). However, there were five GHS Cat 1 entries and two GHS Cat 2 entries (failing Curation Level 3). The reported LD50 values were 0.68, 1.17, 1.625, 1.8, 2, 6 and 7.33 mg/kg, giving a mean LD50 ± standard deviation (SD) of 2.94 ± 2.61 mg/kg, which falls within GHS Cat 1 (Curation Level 4). Difenacoum was therefore considered a true EPA Cat 1/GHS Cat 1 chemical. Expert judgement was applied to a total of 7 chemicals across all classifications and is summarized in Table 3.
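The curation levels above amount to a small decision procedure, sketched below in Python for one chemical and one classification system. The function and argument names are hypothetical. Level 4 is reduced to a mean-LD50 rule because the expert judgement described in the text cannot be fully codified. The difenacoum values reproduce the worked example.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Callable, List, Optional, Tuple

GHS_LIMITS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]  # mg/kg


def ghs_category(ld50: float) -> Optional[int]:
    for upper, category in GHS_LIMITS:
        if ld50 <= upper:
            return category
    return None


def curate(categories: List[int], ld50_values: List[float],
           category_of: Callable[[float], Optional[int]]) -> Tuple[int, Optional[int]]:
    """Return (curation level reached, accepted category) for one chemical.

    Level 1 rejects chemicals with fewer than 3 entries (category None).
    Level 4 is reduced here to "category of the mean LD50"; in the study it
    was an expert judgement that also weighed the number of category groups.
    """
    if len(categories) < 3:
        return 1, None
    counts = Counter(categories)
    if len(counts) == 1:                      # Level 2: all entries agree
        return 2, categories[0]
    top, top_n = counts.most_common(1)[0]
    if top_n == len(categories) - 1:          # Level 3: a single discordant entry
        return 3, top
    return 4, category_of(mean(ld50_values))  # Level 4 (simplified)


difenacoum = [0.68, 1.17, 1.625, 1.8, 2, 6, 7.33]
ghs = [ghs_category(x) for x in difenacoum]    # five Cat 1 and two Cat 2 entries
print(curate(ghs, difenacoum, ghs_category))   # (4, 1)
print(round(mean(difenacoum), 2), round(stdev(difenacoum), 2))  # 2.94 2.61
```

The reported SD of 2.61 mg/kg matches the sample standard deviation (`statistics.stdev`), not the population form.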
Applied to Supplementary File 1 (Karmaus et al., 2022), the curation process yielded 265 chemicals that could potentially be tested in the AcutoX trial. These 265 chemicals were assessed by chemical class and by industrial sector or use. The test set was then chosen to be evenly distributed across the EPA and GHS categories. The industrial sector distribution was based on typical usage as described in the PubChem database and is intended to illustrate the diversity of the chemicals chosen. Many chemicals, however, are used across several industrial sectors. A total of 67 chemicals were selected for testing.

The test chemicals chosen for AcutoX testing are summarized in Table 4, including their CAS Registry No., molecular weight, chemical supplier, curated LD50, and EPA and GHS classifications. The positive control chemicals, sodium dodecyl sulfate (SDS; Cat No. 436143) and chlorpromazine hydrochloride (CAS No. 69-09-0, Cat No. C8138), were obtained from Sigma-Aldrich. Test chemicals and general chemicals were predominantly purchased from Sigma-Aldrich Co Ltd (Poole, Dorset, BH12 4QH, UK) and were of analytical grade or higher, where available. All other test chemicals and general chemicals were obtained from Fisher Scientific (Loughborough, LE11 5RG, UK), Thermo Fisher Scientific (Basingstoke, Hampshire, RG24 8PW, UK), LGC Standards (Teddington, Middlesex, TW11 0LY, UK) and Apollo Scientific Limited (Bredbury, Stockport, Cheshire, SK6 2QR, UK).

Cell lines and growth conditions
The xeno-free human dermal fibroblasts (Cat. No. FC-0037, Cell Systems) were obtained frozen and banked using standard cell cryopreservation techniques. They were supplied with a Product Specification Sheet, which included details of handling and subculturing procedures. A Certificate of Analysis confirming their sterility and virus-free status was also supplied. Contamination control consisted of visual checks for bacterial or fungal infection prior to routine culture and dosing. The cells were not cultured beyond passage 7 and underwent periodic mycoplasma testing. The cells were cultured in animal product-free conditions using Fibrolife Xeno-Free Complete Medium (Cat. No. LM-0013, Cell Systems) and incubated under standard culture conditions (37°C, 5% CO2, 95% humidity). The cells were passaged twice a week. Trypsin/EDTA Xeno-Free (Cat. No. CM-0046, Cell Systems) was used to detach the cells, and Trypsin Neutralizing Solution Xeno-Free (Cat. No. CM-0047, Cell Systems) to neutralize the trypsin. The cells were then centrifuged at 1100 r.p.m. for 5 minutes and the supernatant was removed. The cells were then plated at a density of 2500-5000 cells/cm². Fresh medium was supplied every 48-72 hours. An example image of the cells is provided in Fig 3.
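The seeding step is simple arithmetic. The number of cells needed is the growth area × seeding density, and the volume is the cells needed ÷ the counted cell concentration. A minimal Python sketch follows. The function name is hypothetical, and the flask area and cell count in the example are illustrative values, not figures from the study.

```python
from typing import Tuple


def seeding_plan(area_cm2: float, density_cells_per_cm2: float,
                 suspension_cells_per_ml: float) -> Tuple[float, float]:
    """Return (cells needed, volume of cell suspension in mL) for one vessel.

    The density check reflects the 2500-5000 cells/cm2 range used for passaging.
    """
    if not 2500 <= density_cells_per_cm2 <= 5000:
        raise ValueError("density outside the 2500-5000 cells/cm2 range")
    cells = area_cm2 * density_cells_per_cm2
    return cells, cells / suspension_cells_per_ml


# Hypothetical example: a 75 cm2 flask seeded at 4000 cells/cm2 from a
# suspension counted at 2.0 x 10^5 cells/mL.
print(seeding_plan(75, 4000, 2.0e5))  # (300000, 1.5)
```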

2.3 Preliminary testing
Chemicals were evaluated for visual dissolution in three solvents, in order of preference. The first was Dulbecco's phosphate buffered saline (Sigma-Aldrich, Cat. No. D8537), used as a proxy for cell culture medium. The second was dimethyl sulfoxide (DMSO; Fisher Scientific, Cat No. 10213810), and the third was ethanol (Sigma-Aldrich, Cat No. 459844). The attempted top concentration was set individually for each test item, based on the available literature solubility data in water, DMSO or ethanol. To achieve solubility, solutions were first vortexed, then heated and finally sonicated with heating. When solubility was not achieved at the expected concentration, the volume of solvent was increased to dilute the solution by a factor of 1.1-2, and the mixing steps were repeated. The dilution and mixing steps were repeated until visual assessment confirmed that the solution showed no signs of precipitation. This process ensured that the test system was dosed with the highest achievable concentration of test chemical. Liquid test items miscible with water were dosed at a top concentration of 20% (v/v). This ensured that any cytotoxicity detected was due to the action of the chemical and not to limited availability of nutrients to the cell culture.
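The iterative solubility procedure can be summarized as a loop that dilutes by a fixed factor until the solution is visually clear. In the Python sketch below, `is_dissolved` is a hypothetical callback. It stands in for the vortex, heat and sonicate steps plus the visual check. Using a fixed factor at every step is a simplifying assumption, since the text allows any factor between 1.1 and 2.

```python
from typing import Callable, Optional


def highest_dissolved_concentration(start: float,
                                    is_dissolved: Callable[[float], bool],
                                    factor: float = 2.0,
                                    max_steps: int = 20) -> Optional[float]:
    """Dilute by `factor` until `is_dissolved` reports no visible precipitate.

    `is_dissolved` stands in for the bench step (vortex, heat, sonicate with
    heat, then visual inspection). Returns None if no attempt dissolves.
    """
    if not 1.1 <= factor <= 2.0:
        raise ValueError("dilution factor should be between 1.1 and 2")
    concentration = start
    for _ in range(max_steps):
        if is_dissolved(concentration):
            return concentration
        concentration /= factor
    return None


# Hypothetical chemical that only dissolves at or below 30 mg/mL.
print(highest_dissolved_concentration(100.0, lambda c: c <= 30, factor=2.0))  # 25.0
```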
A thiazolyl blue tetrazolium bromide (MTT; Sigma-Aldrich, Cat. No. M2128) interference test was conducted prior to cytotoxicity testing to identify chemicals that directly reduce MTT. Such direct reduction could produce false negative cytotoxicity data, because the ability of the cells to metabolize MTT would be overestimated. The highest dissolved concentration of the test chemical was added to MTT (1 mg/mL) in Dulbecco's Modified Eagle Medium (DMEM; Gibco, Cat No. A1896701). The mixture was incubated for 1 hour ± 5 minutes at 37°C in a 5% CO2 atmosphere (standard conditions). After incubation, the solution was visually assessed for any darkening or purple staining indicative of MTT reduction.

2.4 Dose range finding test
Dose range finding experiments were conducted over 3 days. Their purpose was to find the top concentration that gives the best resolution of the half-maximal effective concentration, i.e., the concentration that kills 50% of the cells (EC50). On Day 1, the human dermal fibroblasts in Fibrolife Xeno-Free medium (Cell Systems, LM-0013) were plated at 5 × 10³ cells/well (100 µL/well) into 96-well plates. These plates were incubated under standard conditions for 24 ± 2 hours. On Day 2, the test chemicals, negative controls (i.e., the relevant solvent without test chemical) or the positive control chemical (sodium dodecyl sulfate) were prepared in media. Each was serially diluted 10-fold to produce 8 concentrations and dosed onto the cells in the 96-well plates, which were incubated under standard conditions for 24 ± 2 hours. On Day 3, a working stock solution (0.11 mg/mL) of neutral red (NRU; Sigma-Aldrich, N2889) was prepared in DMEM (Gibco, Cat No. A1896701). The dosing media were removed from the 96-well plates, NRU solution (100 μL) was added, and the plates were incubated under standard conditions for 3 hours ± 15 minutes. After incubation, the dye was extracted by adding NRU solubilizing solution (100 μL) and shaking for 10 minutes. The solubilizing solution contained ethanol, tissue culture grade water (Sigma-Aldrich, Cat No. W3500) and acetic acid (Sigma-Aldrich, Cat. No. 320099) at a ratio of 50:49:1 (v/v/v). The optical density (OD) of each plate was then measured at 490 nm (main reading) and 690 nm (background reading). Measurements were made with a FLUOstar Omega spectrophotometer (BMG Labtech GmbH; Omega Control Software v5.50 R3, MARS Data Analysis Software v3.32 R4). The resulting OD values were blank- and background-corrected and converted to percentage viability relative to the appropriate vehicle control. The EC50 for each test chemical was calculated and used to determine the top concentration for the main test.
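The Day 2 dilution and Day 3 data reduction can be sketched in Python as follows. The text does not specify the order of corrections or the definition of the blank. This sketch assumes that the 690 nm background is subtracted first, followed by the mean of cell-free blank wells. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def dilution_series(top: float, fold: float, n: int = 8) -> List[float]:
    """n concentrations from `top`, each `fold` times lower than the previous one."""
    return [top / fold ** i for i in range(n)]


def percent_viability(sample_ods: Sequence[Tuple[float, float]],
                      vehicle_ods: Sequence[Tuple[float, float]],
                      blank_mean: float) -> List[float]:
    """Convert (OD490, OD690) readings to % viability relative to the vehicle control.

    Each reading is background corrected (OD490 - OD690) and then blank corrected
    by subtracting `blank_mean`, the mean background-corrected OD of blank wells.
    """
    def corrected(reading: Tuple[float, float]) -> float:
        od_main, od_background = reading
        return (od_main - od_background) - blank_mean

    vehicle = mean(corrected(r) for r in vehicle_ods)
    return [100.0 * corrected(r) / vehicle for r in sample_ods]


print(dilution_series(1000, 10)[:3])  # [1000.0, 100.0, 10.0]
```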

2.5 AcutoX main test
Once a suitable dose range had been calculated from the dose range finding test, the main test was performed. It used either the calculated top concentration or the highest concentration at which the test chemical dissolved in incubation media. The main test was performed both with and without human liver S9 and used MTT as an additional endpoint. The four test groups were therefore MTT with S9 (MTT +S9), MTT without S9 (MTT -S9), neutral red uptake with S9 (NRU +S9) and neutral red uptake without S9 (NRU -S9).
The main test was conducted as described for the dose range finding test, with the following changes. On Day 2, the test chemicals and positive control chemical were prepared in media and serially diluted 2-fold to produce 8 concentrations. These were dosed in triplicate onto the human dermal fibroblasts in 96-well plates. SDS was used as the positive control in the dose range finding experiments because historical data were available to determine the validity of individual test runs. In the main test, however, chlorpromazine hydrochloride was used instead. SDS was the only surfactant examined during the study, and chlorpromazine hydrochloride was considered more chemically relevant. A human S9 mix was prepared in Fibrolife medium (Cell Systems, Cat. No. LM-0013) with final concentrations of NADP (0.8 mM; Sigma-Aldrich, Cat No. 93205), isocitric acid (3.6 mM; Fisher, Cat No. 205010010) and human S9 (2.4% w/v; XenoTech, Kansas City, KS 66103, USA, Cat No. 098H2620.S9). The human S9 mix was then added to the MTT +S9 and NRU +S9 test groups. The test groups without S9 were incubated under standard conditions as described for the dose range finding experiments. The +S9 test groups were exposed to the S9 metabolizing system and test chemicals under standard incubation conditions for 3 hours ± 15 minutes. They were then washed with Hanks' Balanced Salt Solution (Sigma-Aldrich, Cat. No. H8264) and returned to the incubator under standard conditions for 21 ± 1.5 hours. On Day 3, the NRU solution was prepared as described, and a working stock MTT solution (0.5 mg/mL) was prepared in DMEM. The dosing media were removed from the 96-well plates, and MTT solution (50 μL) was added to the plates of the MTT +S9 and MTT -S9 groups. The plates were then incubated under standard conditions for 3 hours ± 15 minutes. After incubation, the dye was extracted by adding DMSO (50 μL) and shaking for 10 minutes.
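The text does not state how EC50 values were calculated. A common simple approach is log-linear interpolation between the two tested concentrations that bracket 50% viability. The Python sketch below applies this approach, purely as an assumption, to the 2-fold, 8-point series of the main test. Curve fitting (e.g., a four-parameter logistic model) would be an equally plausible alternative. The viability values in the example are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def two_fold_series(top: float, n: int = 8) -> List[float]:
    """n concentrations, halving from `top` (the main-test dilution scheme)."""
    return [top / 2 ** i for i in range(n)]


def ec50_log_interpolation(concentrations: Sequence[float],
                           viabilities: Sequence[float]) -> Optional[float]:
    """Estimate the EC50 by linear interpolation of % viability on log10(concentration).

    Uses the first pair of adjacent concentrations (in increasing order) whose
    viabilities bracket 50%. Returns None (EC50 not determined) if 50% is never crossed.
    """
    points = sorted(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50 >= v_hi and v_lo != v_hi:
            x_lo, x_hi = math.log10(c_lo), math.log10(c_hi)
            x = x_lo + (v_lo - 50) * (x_hi - x_lo) / (v_lo - v_hi)
            return 10 ** x
    return None


concs = two_fold_series(100.0)              # 100, 50, 25, ... 0.78125 mM
viab = [5, 20, 80, 95, 100, 100, 100, 100]  # hypothetical % viability
print(round(ec50_log_interpolation(concs, viab), 1))  # 35.4
```

Returning None when viability never falls below 50% mirrors the "not determined" results reported for some chemicals.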

AcutoX EC50 values
The mean EC50 values for the 67 chemicals in the MTT +S9, MTT -S9, NRU +S9 and NRU -S9 test groups are presented in Table 5. For some tests, an EC50 could not be determined. Possible reasons for a not determined result include:
(a) the solubility of the test chemical in culture media was too low to produce toxicity to the cells (all chemicals were tested at their top soluble concentration in culture media, and all miscible liquid test items at a top concentration of 20% (v/v));
(b) the exposure time in the +S9 tests (3 hours) may have been too short compared with the 24-hour exposure in the preliminary NRU -S9 tests, so a 3-hour preliminary test will be added to the testing scheme in future;
(c) a physicochemical reason related to volatility, reactivity or another property, which was outside the scope of this project but could be investigated in future work; or
(d) a combination of these.
Of the 67 chemicals examined, 62 produced an EC50 in the NRU -S9 assay (93%), 55 in NRU +S9 (82%), 63 in MTT -S9 (94%) and 60 in MTT +S9 (90%). A total of 54 chemicals produced an EC50 in all four assays. Chlorpromazine hydrochloride proved to be a robust positive control across the 25 runs of the assay. Its inter-assay coefficients of variation (CV%) were 10.12%, 8.13%, 8.76% and 8.08% for NRU -S9, NRU +S9, MTT -S9 and MTT +S9, respectively.
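The percentages and CV% values above follow from simple formulas: share = n/67 × 100, and CV% = SD/mean × 100. The Python sketch below reproduces the rounded percentages. The per-run chlorpromazine hydrochloride EC50 values behind the reported CV% are not given here. Using the sample SD is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def percent_of(n: int, total: int = 67) -> float:
    """Share of chemicals (as %) that produced an EC50 in a given assay."""
    return 100.0 * n / total


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


for label, n in [("NRU -S9", 62), ("NRU +S9", 55), ("MTT -S9", 63), ("MTT +S9", 60)]:
    print(label, round(percent_of(n)))  # prints 93, 82, 94 and 90 in turn
```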

3.2 Prediction models
Prediction models were created for both EPA and GHS categorizations. Briefly, box and whisker plots were produced for each endpoint, separating mean EC50 values by EPA or GHS category (Fig 4 shows the example for the EPA classification system). A stepwise increase in EC50 from EPA Cat. 1 to Cat. 4 was observed for all endpoints, broadly suggesting that EC50 values could be used to predict categories. Because some overlap was observed between categories, chemicals were first assigned to broad "highly toxic" or "low toxicity" categories. EPA Cat. 1-2 and GHS Cat. 1-3 were regarded as "highly toxic", and EPA Cat. 3-4 and GHS Cat. 4-5 as "low toxicity". Using these categories as a guide, cut-off values were selected that give a protective prediction of toxicity. A protective prediction is either correct or more toxic than the category assigned from animal acute toxicity data. Chemical hazard predictions were then refined from this binary classification into EPA and GHS categories.

EPA binary prediction of highly toxic and low toxicity
A chemical was assigned as "highly toxic" (EPA Category 1 or 2) if one or more of the following four criteria were met:
1) MTT +S9 EC50 was ≤50 mM.
2) NRU +S9 EC50 was ≤50 mM.
3) MTT -S9 EC50 was ≤25 mM.
4) NRU -S9 EC50 was ≤25 mM.
If none of these criteria were met, the chemical was assigned as "low toxicity". This step in the AcutoX test assigned a binary prediction to 65 of the 67 chemicals (97.0%; Table 6). Of the 30 chemicals classified as EPA Category 1 or 2, 29 were assigned as "highly toxic", and no classification could be assigned for carbofuran. Of the 37 chemicals classified as EPA Category 3 or 4, 19 were assigned as "low toxicity" and 17 as "highly toxic", and no prediction could be made for ethyl acrylate. In total, 48 of the 65 assigned chemicals (73.8%) were correctly assigned as "highly toxic" or "low toxicity" for the EPA classification system.
Therefore, the sensitivity (29/29), specificity (19/36) and overall accuracy (48/65) of this binary prediction model were 100%, 52.8% and 73.8%, respectively. Table 6 shows the "highly toxic" and "low toxicity" assignments made by AcutoX against the EPA category.
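The binary rule and the reported performance figures can be checked with the short Python sketch below. The endpoint labels and function names are hypothetical. The treatment of chemicals with no determined EC50 is also an assumption. The counts passed to `binary_metrics` are those given in this and the following subsection: 29/0/19/17 for EPA and 22/0/19/24 for GHS. Carbofuran and ethyl acrylate are excluded.

```python
from typing import Dict, Optional, Tuple

# Binary cut-offs from the text (EC50 in mM): <= 50 with S9, <= 25 without S9.
CUTOFFS_MM = {"MTT +S9": 50.0, "NRU +S9": 50.0, "MTT -S9": 25.0, "NRU -S9": 25.0}


def binary_prediction(ec50_mm: Dict[str, Optional[float]]) -> Optional[str]:
    """Return "highly toxic", "low toxicity" or None (no prediction).

    `ec50_mm` maps each endpoint to its EC50 in mM, or None if not determined.
    Returning None only when no endpoint gave an EC50 is an assumption.
    """
    determined = {k: v for k, v in ec50_mm.items() if v is not None}
    if not determined:
        return None
    if any(v <= CUTOFFS_MM[k] for k, v in determined.items()):
        return "highly toxic"
    return "low toxicity"


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy (%), with "highly toxic" as positive."""
    total = tp + fn + tn + fp
    return (round(100 * tp / (tp + fn), 1),
            round(100 * tn / (tn + fp), 1),
            round(100 * (tp + tn) / total, 1))


print(binary_metrics(tp=29, fn=0, tn=19, fp=17))  # EPA: (100.0, 52.8, 73.8)
print(binary_metrics(tp=22, fn=0, tn=19, fp=24))  # GHS: (100.0, 44.2, 63.1)
```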

EPA prediction model
In the second stage, EC50 cut-off values giving a protective prediction were examined for all four endpoints. Chemicals assigned as "highly toxic" and "low toxicity" were examined separately. The MTT +S9 endpoint gave the most robust protective prediction. In this second stage, chemicals assigned as "highly toxic" were further assigned to EPA categories as follows: 1) EPA 1: if MTT +S9 EC50 was <10 mM.
2) EPA 4: if MTT +S9 EC50 was >200 mM. This second step of the AcutoX test predicted the EPA category for 60 of the 67 chemicals (89.6%). AcutoX was unable to assign an EPA prediction for 7 chemicals (4 EPA Cat 1 and 3 EPA Cat 3), and these chemicals were excluded from the accuracy calculations. Of the 60 chemicals, 29 were assigned to the correct EPA category (Tab. 6), giving an overall categorization accuracy of 48.3%. However, of the 31 mis-categorized chemicals, 25 (80.6%) were assigned to a more hazardous EPA category (e.g., EPA Cat 1 instead of Cat 2). This gives a protective hazard identification system in which 54 chemicals (90.0%) were assigned either the correct or a more hazardous EPA category (summarized in Tab. 7). Only 6 chemicals (10.0%) were assigned a less hazardous category, reflecting the cautious approach to hazard prediction used to minimize under-classification. An AcutoX prediction model decision tree for EPA categorization was created and is presented in Fig 5.
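The distinction between categorization accuracy and protective prediction can be made explicit with a small tally. The Python sketch below is hypothetical, and its names and input format are assumptions. It treats a prediction as protective if it is correct or assigns a lower-numbered, more hazardous category.

```python
from typing import Dict, Iterable, Optional, Tuple


def protective_summary(pairs: Iterable[Tuple[int, Optional[int]]]) -> Dict[str, float]:
    """Summarize (reference category, predicted category) pairs.

    Lower category numbers are more hazardous. Pairs without a prediction
    (None) are excluded, as the unassigned chemicals were in the text.
    """
    scored = [(ref, pred) for ref, pred in pairs if pred is not None]
    if not scored:
        raise ValueError("no scored predictions")
    correct = sum(1 for ref, pred in scored if pred == ref)
    over = sum(1 for ref, pred in scored if pred < ref)   # more hazardous
    under = sum(1 for ref, pred in scored if pred > ref)  # less hazardous
    n = len(scored)
    return {"n": n, "correct": correct, "over": over, "under": under,
            "accuracy_pct": 100 * correct / n,
            "protective_pct": 100 * (correct + over) / n}


# Hypothetical pairs, for illustration only.
print(protective_summary([(1, 1), (2, 1), (3, 3), (4, 3), (2, 3), (3, None)]))
# {'n': 5, 'correct': 2, 'over': 2, 'under': 1, 'accuracy_pct': 40.0, 'protective_pct': 80.0}
```

With the counts reported above (60 scored, 29 correct, 25 more hazardous, 6 less hazardous), the same arithmetic gives 48.3% accuracy and 90.0% protective prediction.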

GHS prediction model
As for the EPA prediction, the first step was the binary prediction of "highly toxic" (GHS Cat 1, 2 and 3) or "low toxicity" (GHS Cat 4 and 5). The same cut-off values for the four tests were applied. A chemical was assigned as "highly toxic" if the MTT or NRU EC50 without S9 was ≤25 mM, or if the MTT or NRU EC50 with S9 was ≤50 mM. Of the 23 chemicals classified as GHS Categories 1-3, 22 were correctly predicted to be "highly toxic", and no prediction could be made for carbofuran. Of the 44 chemicals classified as GHS Categories 4-5, 19 were correctly predicted to be "low toxicity", and no prediction could be made for ethyl acrylate. Therefore, the sensitivity (22/22), specificity (19/43) and overall accuracy (41/65) of this binary prediction model for the GHS system were 100%, 44.2% and 63.1%, respectively. Table 8 shows the "highly toxic" and "low toxicity" assignments made by AcutoX against the GHS category.
In the second stage, only the MTT +S9 results were used to assign a chemical to one of the five GHS categories. Chemicals assigned as "highly toxic" were assigned to GHS categories as follows:
1. GHS Cat 1: if MTT +S9 EC50 was <1 mM.
2. GHS Cat 2: if MTT +S9 EC50 was between 1 and 10 mM, inclusive.

Discussion
The only in vitro model previously adopted to predict the starting dose for in vivo acute oral toxicity testing is a neutral red uptake cytotoxicity assay (OECD, 2010). It has been used successfully in a stepwise approach to acute oral toxicity prediction (Kojima et al., 2023), although only for chemicals with LD50 >2000 mg/kg. This test system has two weaknesses. It has no liver enzyme system to detoxify or activate chemicals, and it uses only a single measure of toxicity, i.e., cellular membrane damage. It also uses mouse fibroblasts or NHK cultured in the presence of FBS and therefore lacks full human relevance. The AcutoX method overcomes these weaknesses in three ways. It employs a fully human test system (human fibroblasts cultured in the presence of pooled human serum). It tests with and without pooled human liver S9 as the liver metabolic system. It also uses multiple endpoints, cellular membrane damage (neutral red uptake assay) and metabolic competence (MTT assay), to give a more comprehensive assessment of cytotoxicity. Previous studies support the use of a single representative cell type to predict human acute toxicity. In those studies, the type of assay, rather than the tissue origin of the cells, had the greatest influence on predictive potential (Clothier et al., 2013).
When validating in vitro tests and in silico approaches, selection of appropriate benchmark data is critical. Methods are typically still validated against animal-derived benchmark data, even though the goal is to predict human responses. This has been necessitated by the general lack of reliable human data, especially for systemic endpoints such as acute oral toxicity, and can complicate the interpretation of results. For skin sensitization, the defined approaches in OECD Guideline No. 497 (OECD, 2023) showed that the "2 out of 3" defined approach predicted human patch test results better than the LLNA. The computational toxicology model OPERA (Mansouri et al., 2021) predicts rodent oral toxicity, whereas AcutoX is intended to predict human oral toxicity. This species difference may explain some of the differences between the two models when compared against in vivo rodent classifications.
The AcutoX model follows OPERA in first identifying whether a chemical is "highly toxic" or of "low toxicity" before assigning it to its EPA and GHS categories. When compared with OPERA, AcutoX is less predictive of the correct rodent acute oral toxicity EPA and GHS classifications. Species differences are one factor. The size of the underlying data is another. OPERA predicts the acute oral toxicity hazard of an unknown chemical from its 5 nearest chemically similar structures in the CatMOS database, which contains 11,992 chemical structures (Mansouri et al., 2021). AcutoX, by contrast, has been built from 67 chemicals to date. Because its prediction draws on a large chemical database, OPERA has the capacity to be a more accurate predictive model. OPERA does have limitations. For example, it cannot make predictions for inorganic and metallo-organic chemicals, for mixtures, or for non-specific general representations such as repeating monomers (Mansouri et al., 2021). AcutoX can be used with inorganic and metallo-organic chemicals. Mixtures and chemicals containing repeating monomers could, in theory, be assessed in AcutoX by basing the data on the mean molecular weight of the test item. However, building a full EPA and GHS prediction model for mixtures would require further work with mixtures that have known, curated rodent LD50 values.
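The nearest-neighbour idea described for OPERA can be illustrated with a generic k-nearest-neighbour (k-NN) classifier. The sketch below is not OPERA's implementation, which uses its own descriptors and weighting. It assumes each chemical is represented by a set of "on" fingerprint bits, scores similarity with the Tanimoto coefficient, and takes a simple majority vote over the k = 5 most similar reference chemicals. The function names and the toy fingerprints are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def knn_predict(query: frozenset, reference: list, k: int = 5) -> str:
    """Predict a label by majority vote over the k most similar reference chemicals.

    reference is a list of (fingerprint_bits, label) pairs. Equal similarities keep
    the reference order (sorted() is stable even with reverse=True); a tied vote goes
    to the label first encountered among the ranked neighbours.
    """
    ranked = sorted(reference, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy reference set: fingerprint bits -> binary hazard label
reference = [
    (frozenset({1, 2, 3}), "highly toxic"),
    (frozenset({1, 2, 4}), "highly toxic"),
    (frozenset({1, 2, 3, 4}), "highly toxic"),
    (frozenset({7, 8, 9}), "low toxicity"),
    (frozenset({7, 8}), "low toxicity"),
    (frozenset({8, 9}), "low toxicity"),
    (frozenset({1, 9}), "low toxicity"),
]

print(knn_predict(frozenset({1, 2, 3}), reference))  # highly toxic
print(knn_predict(frozenset({7, 8}), reference))     # low toxicity
```

A larger reference set gives each query more close neighbours to vote, which is the intuition behind the point that a database of 11,992 structures gives OPERA more capacity than a 67-chemical training set.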
A screening testing strategy is proposed. For a chemical with suitable solubility in the culture medium (>200 mM), a broad, binary "highly toxic" / "low toxicity" prediction can be made. A "low toxicity" prediction (EPA Cat 3 or Cat 4; GHS Cat 4 or Cat 5) means the chemical can be considered to have low acute oral toxicity. A "highly toxic" prediction (EPA Cat 1 or Cat 2; GHS Cat 1, Cat 2, or Cat 3) indicates high acute oral toxicity.
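The category-to-call mapping and the solubility condition of the screening strategy can be written down directly. In this sketch the predicted EPA or GHS category is taken as an input, because the AcutoX decision trees (Fig. 5 and Fig. 6) are not reproduced here. The function names, and the choice to return `None` when solubility does not exceed 200 mM, are illustrative assumptions.

```python
# Binary grouping of hazard categories as described in the text
# (lower category number = more severe hazard).
HIGHLY_TOXIC = {"EPA": {1, 2}, "GHS": {1, 2, 3}}
ALL_CATEGORIES = {"EPA": {1, 2, 3, 4}, "GHS": {1, 2, 3, 4, 5}}


def binary_call(system: str, category: int) -> str:
    """Map a predicted EPA or GHS category to the binary screening call."""
    if system not in ALL_CATEGORIES:
        raise ValueError(f"unknown system: {system!r}")
    if category not in ALL_CATEGORIES[system]:
        raise ValueError(f"{system} has no category {category}")
    return "highly toxic" if category in HIGHLY_TOXIC[system] else "low toxicity"


def screening_call(solubility_mM: float, system: str, category: int,
                   min_solubility_mM: float = 200.0):
    """Return the binary call, or None if media solubility does not exceed the threshold."""
    if solubility_mM <= min_solubility_mM:
        return None  # outside the applicability of the proposed screening strategy
    return binary_call(system, category)


print(screening_call(250.0, "EPA", 2))  # highly toxic
print(screening_call(250.0, "GHS", 4))  # low toxicity
print(screening_call(150.0, "GHS", 1))  # None
```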
Since the in vitro AcutoX and in silico OPERA prediction models use different methods to assign a chemical to an EPA and GHS acute oral category, they could be combined in an integrated approach to testing and assessment (IATA) for acute oral hazard identification and, eventually, replacement of the OECD acute oral tests. Further work will compare the outputs of these models.
Many chemicals are detoxified, or become more toxic, in the liver. AcutoX can identify chemicals that are metabolically activated or detoxified in the human liver. For example, 2-butyne-1,4-diol has been shown to be metabolized by liver alcohol dehydrogenase to an unknown toxic metabolite (Taberner and Pearce, 1974). The AcutoX assay detected this toxification: the EC50 values for 2-butyne-1,4-diol decreased in the presence of human liver S9 extract (Tab. 5, Fig. 7C-D). Conversely, the liver enzyme catalase is well known to degrade hydrogen peroxide into oxygen and water (Williams, 1928). Consistent with this, the EC50 values for hydrogen peroxide increased dramatically in the presence of exogenous liver metabolism in the AcutoX assay (Tab. 5, Fig. 7C-D). This shows that the metabolic component of the assay is active and able to recapitulate the activity of the liver in a whole organism.
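The direction of the S9 effect can be read from the ratio of the EC50 measured with S9 to that measured without it. A ratio below 1 means greater cytotoxicity with S9 (activation, as for 2-butyne-1,4-diol). A ratio above 1 means lower cytotoxicity with S9 (detoxification, as for hydrogen peroxide). The sketch below assumes a hypothetical 2-fold cut-off to call a shift. That cut-off is not part of the published AcutoX method, and the EC50 values are invented for illustration.

```python
def s9_shift(ec50_minus_s9: float, ec50_plus_s9: float, fold_threshold: float = 2.0):
    """Classify the effect of human liver S9 from the ratio EC50(+S9) / EC50(-S9).

    A ratio below 1 means the chemical is more cytotoxic with S9 (metabolic activation);
    a ratio above 1 means it is less cytotoxic with S9 (detoxification).
    fold_threshold is a hypothetical cut-off, not a value from the AcutoX method.
    """
    if ec50_minus_s9 <= 0 or ec50_plus_s9 <= 0:
        raise ValueError("EC50 values must be positive")
    ratio = ec50_plus_s9 / ec50_minus_s9
    if ratio <= 1 / fold_threshold:
        return "activation", ratio
    if ratio >= fold_threshold:
        return "detoxification", ratio
    return "no clear shift", ratio


# Hypothetical EC50 values in µM (not data from Tab. 5)
print(s9_shift(100.0, 20.0))  # ('activation', 0.2)
print(s9_shift(10.0, 500.0))  # ('detoxification', 50.0)
print(s9_shift(10.0, 15.0))   # ('no clear shift', 1.5)
```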
To build further confidence in the AcutoX test, future work could determine the reproducibility of the assay with a subset of the chemicals (van der Zalm et al., 2022) and compare it with the variability in in vivo acute oral hazard classification (Karmaus et al., 2022). The test is performed on 3 independent occasions, and the mean value is used in the prediction model. Within the laboratory, the inter-assay coefficients of variation are 10.12%, 8.13%, 8.76%, and 8.08% for NRU, NRU+S9, MTT, and MTT+S9, respectively. These values show good reproducibility for the chemicals tested. Data for chlorpromazine hydrochloride will continue to be collected to monitor reproducibility beyond this study. The acceptance criterion for the main test set is the mean EC50 ± 1.5 SD of these historical control data. Based on the current dataset, the acceptance ranges are 7.7 to 30.8 µM for NRU, 67.4 to 101.6 µM for NRU+S9, 9.5 to 24.3 µM for MTT, and 54.7 to 82.0 µM for MTT+S9.
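The acceptance range and the inter-assay coefficient of variation are simple summary statistics. As an arithmetic check, the NRU range of 7.7 to 30.8 µM corresponds to a historical mean of about 19.3 µM and an SD of about 7.7 µM, since (30.8 - 7.7) / 3 = 7.7. The sketch below assumes the sample SD (n - 1 denominator), which this section does not specify. The helper names and the control values are hypothetical.

```python
from statistics import mean, stdev


def acceptance_range(historic_ec50s, k=1.5):
    """Mean ± k·SD of historical positive-control EC50 values (sample SD assumed)."""
    m, s = mean(historic_ec50s), stdev(historic_ec50s)
    return m - k * s, m + k * s


def is_accepted(ec50, bounds):
    """True if a run's control EC50 falls inside the acceptance range (inclusive)."""
    low, high = bounds
    return low <= ec50 <= high


def cv_percent(values):
    """Coefficient of variation (%) of EC50 values from independent runs."""
    return 100.0 * stdev(values) / mean(values)


historic = [10.0, 20.0, 30.0]  # hypothetical chlorpromazine EC50s (µM)
bounds = acceptance_range(historic)
print(bounds)                     # (5.0, 35.0)
print(is_accepted(12.0, bounds))  # True
print(cv_percent(historic))       # 50.0
```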
In conclusion, the in vitro AcutoX model has been developed using human dermal fibroblasts in human-relevant culture conditions, with and without the human S9 metabolizing system, and with MTT and NRU as measures of cytotoxicity. The binary prediction model ("highly toxic" versus "low toxicity") achieved a sensitivity, specificity, and overall accuracy of 100%, 52.8%, and 73.8%, respectively, for the EPA classification system, and 100%, 44.2%, and 63.1%, respectively, for the GHS system. For the EPA system, 48.3% of chemicals were assigned to their correct EPA category. Most of the incorrectly predicted chemicals (80.6%) were assigned to a higher EPA category (e.g., EPA Cat 1 instead of Cat 2). The result is a protective hazard identification system, with 90.0% of chemicals assigned to either the correct or a higher EPA category. For the GHS system, 45.0% of chemicals were assigned to their correct GHS category. Most of the incorrectly predicted chemicals (87.9%) were assigned to a higher GHS category, again giving a protective hazard identification system, with 93.3% of chemicals assigned to either the correct or a higher GHS category.
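The summary statistics above can be computed from a confusion matrix and from paired category lists. The sketch below makes two assumptions. First, "highly toxic" is treated as the positive class for sensitivity and specificity. Second, categories are coded as integers where 1 is the most severe, so a "higher" category is a numerically lower one. It reports three values: the share of chemicals in the correct category, the share of errors that over-predict the hazard, and the protective rate (correct plus over-predicted). The counts and category lists are hypothetical, not the study's data.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity, specificity and accuracy, with 'highly toxic' as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def category_agreement(predicted, reference):
    """Compare predicted and reference hazard categories (1 = most severe)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    n = len(reference)
    correct = sum(p == r for p, r in zip(predicted, reference))
    over = sum(p < r for p, r in zip(predicted, reference))  # more severe than reference
    errors = n - correct
    return {
        "correct": correct / n,
        "over_share_of_errors": over / errors if errors else 0.0,
        "protective": (correct + over) / n,
    }


print(binary_metrics(tp=10, fn=0, tn=6, fp=4))  # (1.0, 0.6, 0.8)
print(category_agreement([1, 2, 2, 4, 3], [1, 3, 2, 3, 3]))
# {'correct': 0.6, 'over_share_of_errors': 0.5, 'protective': 0.8}
```

A sensitivity of 100% alongside a lower specificity is the pattern expected from a model that errs toward over-predicting hazard, which is what makes the category assignments protective.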

Fig. 1: Curated chemicals. A: Curated chemicals chosen for testing in AcutoX by industrial sector; B: Proportion of curated chemicals chosen for testing in AcutoX by EPA classification; C: Proportion of curated chemicals chosen for testing in AcutoX by GHS classification

Fig. 1A shows the curated chemicals categorized by industrial sector (food and fragrance, personal care, manufacturing, consumer products, pesticides, and pharmaceuticals), demonstrating a wide and evenly balanced mixture of chemicals across industrial sectors. The distributions of curated chemicals by EPA and GHS classification are shown in Fig. 1B and Fig. 1C, respectively. The chemicals were well balanced within both classification systems. Fig. 2 shows an even distribution of chemicals across all 9 classifications (4 EPA and 5 GHS categories).

Fig. 5: AcutoX decision tree for the EPA category prediction model

Fig. 6: AcutoX decision tree for the GHS category prediction model

3.3 Mechanism of action and metabolism

AcutoX uses two different measures of cytotoxicity: MTT reduction, a measure of the metabolic competence of the dermal fibroblasts, and neutral red uptake, a measure of their membrane integrity. Comparing MTT with NRU can therefore reveal differences in mechanism of action (Fig. 7). The R² values for MTT +S9 against NRU +S9 (Fig. 7A) and MTT -S9 against NRU -S9 (Fig. 7B) are both high (0.978 and 0.974, respectively). The two assays therefore determine cytotoxicity similarly, and no differences in mechanism of action were observed. Additionally, because AcutoX includes an exogenous metabolic component, i.e., human S9, it can identify metabolic activation to a more toxic metabolite or deactivation to a less toxic or non-toxic metabolite. Metabolic activation of 2-butyne-1,4-diol to an unknown toxic metabolite is indicated by the shift to the left in the MTT (Fig. 7C) and NRU (Fig. 7D) graphs. Conversely, detoxification of hydrogen peroxide and digoxin is indicated by the shift to the right in the same graphs.
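The R² comparison between the MTT and NRU endpoints can be reproduced with a simple least-squares fit, for which R² equals the squared Pearson correlation. This section does not state whether the published values were computed on raw or log-transformed EC50s. The sketch below assumes log10-transformed EC50 values, and all names and numbers are illustrative.

```python
import math


def r_squared(x, y):
    """Coefficient of determination of a simple least-squares line (equals Pearson r squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R squared is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)


def log_ec50_r_squared(mtt_ec50s, nru_ec50s):
    """R squared between log10 MTT and NRU EC50 values (the log scale is an assumption)."""
    return r_squared([math.log10(v) for v in mtt_ec50s],
                     [math.log10(v) for v in nru_ec50s])


print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
# Hypothetical EC50s (µM): NRU exactly twice MTT, so the log-log fit is perfect
print(round(log_ec50_r_squared([1.0, 10.0, 100.0], [2.0, 20.0, 200.0]), 6))  # 1.0
```

A high R² only shows that the two endpoints rank and scale chemicals similarly. A chemical acting mainly through one mechanism would appear as an outlier from the fitted line rather than lowering R² much on its own.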

Tab. 4: Curated chemicals chosen for AcutoX testing with CAS no., molecular weight, supplier and curated LD50, EPA and
a As curated in the Selection of Chemicals section.
b Sourced from product-specific MSDS documents.
c ECHA registration dossier.