Applying a Next Generation Risk Assessment Framework for Skin Sensitisation to Inconsistent New Approach Methodology Information

Cosmetic products must be safe for their intended use. Regulatory bans on animal testing for new ingredients have resulted in a shift towards the use of new approach methodologies (NAM), such as in silico predictions and in chemico / in vitro data. Defined Approaches (DA) have been developed to interpret combinations of NAM to provide information on skin sensitisation hazard and potency, and three have been adopted within OECD Guideline No. 497. However, the challenge remains of how DA can be used to derive a quantitative point of departure for use in next generation risk assessments (NGRA). Here we provide an update to our previously published NGRA framework and present two hypothetical consumer risk assessment scenarios (rinse-off and leave-on) for one case study ingredient. Diethanolamine (DEA) was selected as the case study ingredient because the existing NAM information shows differences between the outcomes of in silico predictions and in chemico / in vitro data. Seven DA were applied, and these differences resulted in divergent DA outcomes and reduced confidence in the hazard and potency predictions. The risk assessment conclusion for the rinse-off exposure was safe for all DA applied. The risk assessment conclusion for the higher leave-on exposure was safe when based upon some DA but unsafe for others. The reasons for this were evaluated, as was the inherent uncertainty arising from the use of each NAM and DA in the risk assessment, enabling further refinement of our NGRA framework.


Introduction
All cosmetic products placed on the market must be safe for their intended use and must therefore undergo a human health risk assessment (SCCS, 2021) (BPR, Regulation (EU) 528/2012). In Europe, a ban on animal testing for new cosmetic ingredients was implemented within the Cosmetics legislation (Regulation (EC) No 1223/2009) in 2009, with a marketing ban for skin sensitisation applying from 2013. Other geographic regions have followed, and continue to follow, suit (Daniel et al., 2018). Thus, risk assessment of cosmetics and their ingredients has shifted toward the use of new approach methodologies (NAM), such as in silico predictions and in chemico/in vitro data. The development of NAM within the field of skin sensitisation has been particularly successful, aided by our understanding of the molecular mechanisms of skin sensitisation and their documentation in "The Adverse Outcome Pathway (AOP) for Skin Sensitisation" (OECD, 2014). A number of in chemico and in vitro NAM addressing the key events (KE) involved in the induction of skin sensitisation have been developed and recently reviewed (Ezendam et al., 2016); some of these have now been validated and adopted as Organisation for Economic Co-operation and Development (OECD) test guidelines (TG). OECD TG 442C describes the direct peptide reactivity assay (DPRA), the amino acid derivative reactivity assay (ADRA) and the kinetic DPRA, which are test methods addressing KE-1, the binding of haptens to proteins in the skin (OECD, 2020). OECD TG 442D describes KeratinoSens™ and LuSens, which are test methods addressing KE-2, the activation of keratinocytes (OECD, 2018). OECD TG 442E describes the h-CLAT, U-SENS™, IL-8 Luc assay and GARDskin™, which are test methods addressing KE-3, the activation of dendritic cells (OECD, 2022).
Currently, there are no NAM addressing KE-4, the activation and proliferation of T cells, that are sufficiently advanced for implementation in an OECD TG or for use in a Next Generation Risk Assessment (NGRA) (van Vliet et al., 2018). Owing to the sequential nature of the AOP, a NAM addressing KE-4 may not be required for every NGRA. NAM can be used to provide information on skin sensitisation hazard and potency. Defined Approaches (DA), which follow a fixed data interpretation procedure, have been developed to interpret combinations of NAM information (Ezendam et al., 2016; Gilmour et al., 2020; Hoffmann et al., 2018; Kleinstreuer et al., 2018; Tollefsen et al., 2014). Three DA, which provide a skin sensitisation hazard identification or UN GHS classification, have now been adopted within OECD Guideline No. 497: Guideline on Defined Approaches for Skin Sensitisation (OECD, 2021). Despite this progress in identifying skin sensitiser hazard and UN GHS classes, the challenge remains of how predictions from DA can be used to derive a quantitative point of departure (PoD) for use in an NGRA to ensure consumer safety.
Cosmetics Europe has implemented a scientific research program, the Long-Range Science Strategy, to foster the development, assessment and application of NAM in human health risk assessments and to support their regulatory acceptance (Desprez et al., 2018). For skin sensitisation, this work has included a review and evaluation of available NAM (Reisinger et al., 2015) and the prioritization of six NAM to generate a database of NAM information (Hoffmann et al., 2022). This database was used to assess DA and Integrated Approaches to Testing and Assessment (IATA) for skin sensitisation and, ultimately, to propose an NGRA framework that applies a structured logic to skin sensitisation risk assessment (Ezendam et al., 2016; Kleinstreuer et al., 2018; Gilmour et al., 2020; SCCS, 2021).
Risk assessment case studies using only NAM are increasingly being used to explore the application of different DA, aligned to such NGRA frameworks, in hypothetical risk assessment scenarios (Assaf Vandecasteele et al., 2021; ENV/CBC/MONO(2022)32, 2022; Gautier et al., 2020; Gilmour et al., 2020, 2022; Natsch et al., 2018). Here, diethanolamine (DEA) was selected as the case study ingredient because NAM information for it is available in the public domain and shows differences between the outcomes of in silico predictions and in chemico / in vitro data (Hoffmann et al., 2022). Whilst read-across, including the use of analogue data, could reduce uncertainty in the risk assessment, it was considered out of scope for this case study so that the focus remained on how to deal with conflicting data in the absence of analogues. DEA is a pH adjuster used in manufacturing industries and in cosmetics, except in the EU and Canada, where its use in cosmetics is prohibited (European Union Council Directive, 76/768 EEC) (Fiume et al., 2017). Hypothetical risk assessments using only NAM information were conducted for two consumer exposure scenarios: 0.8% DEA in a shampoo (a rinse-off product representing a low exposure) and 0.8% DEA in an underarm deodorant (a leave-on product representing a relatively high exposure). These are hypothetical exposures and do not represent realistic use levels. The impact of the inconsistent outcomes from the in silico predictions and the in chemico / in vitro data was analyzed with respect to the outcomes of seven DA and the resulting risk assessment decision for each DA in the given exposure scenarios. This case study is an example of a complex risk assessment scenario and aims to address the causes of, and issues surrounding, the use of data sets with differing outcomes in risk assessment.
Exploring this added complexity led to an update of the NGRA framework (Gilmour et al., 2020), providing additional clarity on hypothesis generation and on how the available data will be used in the risk assessment. Case studies such as the one presented here will help us continue to build knowledge of the strengths and limitations of the NAM and confidence in the application of NAM within an NGRA framework for skin sensitisation.

2.1 Selection of case study ingredient and scope
Diethanolamine (DEA) (CASRN# 111-42-2) was selected as the case study ingredient based upon the availability of existing NAM information in which differences were evident between the outcomes of the in silico predictions and the in chemico / in vitro data (Hoffmann et al., 2022). The aim of this case study was to explore the impact of inconsistent NAM information on the final risk assessment outcome for two exposure scenarios: a rinse-off and a leave-on consumer use. The exposure scenarios are hypothetical, i.e., they do not represent real consumer exposures, and were designed to exclude the application of exposure-based waiving and read-across within the NGRA framework.

2.2 Risk assessment framework
The previously published NGRA framework (Gilmour et al., 2020) was applied and then updated (Fig. 1) based upon the learnings from this case study (described below). In short, the framework is a tiered and iterative approach involving: first, the collation of all available information; thereafter, hypothesis generation and consideration of how this information can be used within the risk assessment; then a decision on whether further data generation is required; and finally, the risk assessment conclusion.

Exposure
The consumer exposure level (CEL), expressed as the dose applied per unit area of skin per day (µg/cm²), was calculated for DEA in each product type based upon the 90th percentile consumer use provided in the SCCS Notes of Guidance (SCCS, 2021).

Chemical characteristics
Information on molecular weight (MW), LogP, LogS, LogVP, boiling point and melting point was sourced from a previously published dataset (Hoffmann et al., 2022). These values were calculated using a quantitative structure−property relationship (QSPR) model (Zang et al., 2017). The pH was measured using an adaptation of the method described in OECD Test Guideline No. 122 (OECD, 2013), and information on volatility was obtained from the MPBPWIN™ model in the open source EPI Suite™ software, version 4.1 (Spicer et al., 2002). The BN-ITS uses some additional physicochemical properties (log D and plasma protein binding), calculated with ACD/Labs software (version 2021.2.0). The fraction ionized is calculated from LogP and LogD.
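The text does not state the exact formula used to derive the fraction ionized from LogP and LogD. A minimal sketch, assuming (our assumption, not the source's) that only the neutral species partitions into octanol, so that D = P × f(neutral) and therefore f(ionized) = 1 − 10^(logD − logP):

```python
def fraction_ionized(log_p: float, log_d: float) -> float:
    """Estimate the fraction of a compound that is ionised at the pH of log D.

    Assumption (not stated in the source): only the neutral species partitions
    into octanol, so D = P * f_neutral, i.e. f_neutral = 10 ** (logD - logP).
    """
    if log_d > log_p:
        raise ValueError("under this assumption logD cannot exceed logP")
    f_neutral = 10.0 ** (log_d - log_p)
    return 1.0 - f_neutral


# Hypothetical example values, not the DEA properties in Table 1:
# logP = 2.0 and logD = 0.0 imply that about 99% of the compound is ionised.
print(round(fraction_ionized(2.0, 0.0), 2))  # 0.99
```

Under this assumption, each unit by which log D falls below log P corresponds to a tenfold drop in the neutral fraction; software packages may use more detailed speciation models.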

Available (NAM) hazard information
Information from the NAM listed below was sourced from a previously published dataset (Hoffmann et al., 2022):
− Derek Nexus v. 6.1: A prediction of skin sensitisation potential was obtained from Derek Nexus, a commercial knowledge-based expert system developed by Lhasa Ltd., with a set of rules coded as structural alerts (Chilton et al., 2018).
− TIMES-SS V2.30.1.11: Predictions of skin sensitisation potency for the parent chemical and its potential metabolites were obtained from the TImes MEtabolism Simulator platform, a hybrid commercial expert system developed by the Laboratory of Mathematical Chemistry (LMC) that encodes structure-toxicity and structure-skin metabolism relationships through a number of transformations (Patlewicz et al., 2007, 2014).
− ToxTree v. 3.1.0: A prediction of the skin sensitisation reactivity domain was obtained from ToxTree, a rule-based system for assigning substances to one or several skin sensitisation reactivity domains (Enoch et al., 2008). Information obtained with this version was collated in the reference dataset (Hoffmann et al., 2022).
− OECD QSAR Toolbox v 4.5: A skin sensitisation prediction was obtained from the OECD QSAR Toolbox automated workflow for DASS (defined approaches for skin sensitisation); the Toolbox is a software application for assessing the hazards of chemicals.
− Direct peptide reactivity assay (DPRA): Information on reactivity (KE-1), measured as the percentage depletion of synthetic cysteine- and lysine-containing peptides, was obtained from the DPRA, OECD TG 442C (OECD, 2020; Gerberick et al., 2004).
− KeratinoSens™: Information on keratinocyte activation (KE-2), measured as activation of the Keap1-Nrf2-ARE pathway, was obtained from the KeratinoSens™ assay, OECD TG 442D (Natsch and Emter, 2016; OECD, 2018).
Defined approaches applied in case study
The following seven DA were explored within this case study based upon a combination of factors, including the authors' working knowledge of the DA, accessibility of the DA, availability of input data, ability to derive a PoD and previous use within NGRA case studies for cosmetic ingredients (ENV/CBC/MONO(2022)32, 2022; Gilmour et al., 2020). Since each DA is described in detail elsewhere, only a brief description is provided below.
Integrated testing strategy (ITSv1 and ITSv2)
The ITSv1 and ITSv2 DA were recently adopted in OECD Guideline No. 497 (OECD, 2021). The DA require information from the DPRA, the h-CLAT and either Derek Nexus (ITSv1) or the OECD QSAR Toolbox automated workflow (ITSv2). The DPRA and h-CLAT results are each converted to a score of 0 to 3 and the in silico prediction to a score of 0 or 1; the summed score, ranging from 0 to 7, is used to predict the skin sensitising potential and to sub-categorise according to the UN GHS: Cat. 1A (6-7), Cat. 1B (2-5), Not Classified (non-sensitisers) (0-1) or inconclusive (0-1, when either the in silico prediction or one of the OECD test methods is out of domain).
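The summation and banding step of the ITS can be sketched as below. This is a simplified illustration that encodes only the score ranges and bands given above; the function names are hypothetical, and the full OECD Guideline No. 497 rules for converting raw DPRA and h-CLAT results into component scores, and for partial data sets, are not reproduced here.

```python
def its_total_score(dpra: int, hclat: int, in_silico: int) -> int:
    """Sum ITS component scores: DPRA and h-CLAT 0-3, in silico 0 or 1."""
    if not (0 <= dpra <= 3 and 0 <= hclat <= 3 and in_silico in (0, 1)):
        raise ValueError("component score out of range")
    return dpra + hclat + in_silico


def its_category(total: int, any_out_of_domain: bool = False) -> str:
    """Map a summed ITS score (0-7) to the outcome bands described in the text."""
    if not 0 <= total <= 7:
        raise ValueError("total score must be between 0 and 7")
    if total >= 6:
        return "UN GHS Cat. 1A"
    if total >= 2:
        return "UN GHS Cat. 1B"
    # A low score only counts as 'Not Classified' when all inputs are in domain.
    return "Inconclusive" if any_out_of_domain else "Not Classified"


print(its_category(2))                          # UN GHS Cat. 1B
print(its_category(1, any_out_of_domain=True))  # Inconclusive
```

The two example calls use the overall scores reported later for DEA (2 for ITSv1; 1 for ITSv2 with the DPRA out of domain). Note that the out-of-domain flag only changes the outcome for low scores, which is why an out-of-domain input weighs more heavily on a non-sensitiser call.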

Artificial neural network (ANN) model (ANN-TIMES and ANN-ToxTree)
The ANN is an integrated testing strategy for predicting skin sensitisation potential and potency (predicted EC3) (Hirota et al., 2018). The model uses information from the DPRA, KeratinoSens™, h-CLAT and either TIMES-SS or ToxTree (ANN-TIMES and ANN-ToxTree, respectively). The ANN is a nonlinear statistical data-modelling tool that predicts LLNA EC3 values, which can be used as continuous potency estimates or to sub-categorise skin sensitisers into UN GHS Cat. 1A and 1B. Whilst individual inputs can be used to make a prediction with the ANN, performance improves when all inputs are used.
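The sub-categorisation step applied to a predicted EC3 can be sketched with the UN GHS cut-off of EC3 ≤ 2% for Cat. 1A. This is only the downstream mapping, not the ANN itself, and it assumes the chemical has already been judged a sensitiser; the function name is hypothetical.

```python
def ghs_subcategory_from_ec3(ec3_percent: float) -> str:
    """Sub-categorise a sensitiser from its (predicted) LLNA EC3 in %.

    UN GHS assigns Cat. 1A when EC3 <= 2 % and Cat. 1B when EC3 > 2 %.
    This function does not decide whether the chemical is a sensitiser.
    """
    if ec3_percent <= 0:
        raise ValueError("EC3 must be positive")
    return "UN GHS Cat. 1A" if ec3_percent <= 2.0 else "UN GHS Cat. 1B"


# EC3 = 1.8 % is the value cited later in the text for glycolaldehyde.
print(ghs_subcategory_from_ec3(1.8))  # UN GHS Cat. 1A
```

The same 2% boundary underlies the default EC3 of > 2% used later for DA outcomes of UN GHS Cat. 1B.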

Sequential testing strategy (STS) DA
The STS DA is a tiered approach providing hazard identification and potency category information (UN GHS Cat. 1A/1B/Not Classified). The first tier (hazard identification) requires information from the DPRA, U-SENS™, KeratinoSens™ and TIMES-SS together with physicochemical parameters (pH and volatility). The second tier (potency) requires information from the DPRA, KeratinoSens™ and U-SENS™ (and optionally SENS-IS; Cottrez et al., 2016) together with physicochemical parameters (molecular weight, volatility and cLogP). The STS is a meta-model stacking five different statistical methods: Boosting, Naïve Bayes, Support Vector Machine (SVM), Sparse PLS-DA and Expert Scoring (Gomes et al., 2014). Each statistical method provides a probability of the chemical being a skin sensitiser. These intermediate probabilities are then linearly combined in the stacking meta-model, which determines a final overall probability (a stacking score) of the chemical being a skin sensitiser (Gomes et al., 2012; Nocairi et al., 2016; Wolpert, 1992), i.e., a probability of belonging to the group of interest (UN GHS Cat. 1A, UN GHS Cat. 1B or Not Classified (non-sensitisers)). A threshold of ≥ 70% probability of being a skin sensitiser is applied for UN GHS Cat. 1 classification and ≤ 30% for categorisation as Not Classified. Based upon comparison with historical data, the level of confidence in the hazard prediction was shown to be high when these thresholds are applied (Del Bufalo et al., 2018).
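The stacking and thresholding logic can be sketched as follows. This is an illustration under stated assumptions, not the published STS: the linear combination is implemented as a normalised weighted average, the weights and base-model probabilities are hypothetical (the STS uses fitted weights), and treating scores between 30% and 70% as inconclusive is our assumption.

```python
from typing import Sequence

# Thresholds on the probability of being a sensitiser, as stated in the text.
CAT1_THRESHOLD = 0.70
NOT_CLASSIFIED_THRESHOLD = 0.30


def stacking_score(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Linearly combine base-model probabilities into a single stacking score.

    Uses a normalised weighted average; the weights passed in are hypothetical,
    whereas the STS uses weights fitted on reference data.
    """
    if len(probabilities) != len(weights) or not probabilities:
        raise ValueError("provide one weight per base-model probability")
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight


def sts_hazard_call(p_sensitiser: float) -> str:
    """Apply the >= 70 % / <= 30 % thresholds to the stacking score."""
    if p_sensitiser >= CAT1_THRESHOLD:
        return "UN GHS Cat. 1"
    if p_sensitiser <= NOT_CLASSIFIED_THRESHOLD:
        return "Not Classified"
    # Assumption: scores between the thresholds are treated as low confidence.
    return "Inconclusive"


# Hypothetical outputs of the five base models (Boosting, Naive Bayes, SVM,
# sparse PLS-DA, expert scoring), weighted equally for illustration.
base_probabilities = [0.10, 0.15, 0.12, 0.20, 0.08]
score = stacking_score(base_probabilities, [1.0] * 5)
print(round(score, 2), sts_hazard_call(score))  # 0.13 Not Classified
```

The hypothetical inputs were chosen so that the combined score matches the pCat1 of 13% reported later for DEA; they are not the actual base-model outputs.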

Bayesian network ITS (BN-ITS) DA
The BN-ITS DA is a Bayesian network (BN) that provides skin sensitiser hazard identification and allows potency derivation distinguishing four categories (ENV/CBC/MONO(2022)32, 2022; Jaworska et al., 2015). The model uses information from the DPRA, KeratinoSens™, h-CLAT and TIMES-SS together with physicochemical properties (water solubility at pH 7 (WspH7, M), protein binding (% bound), log D (pH 7) and log Kow). The model prediction is given as a potency probability distribution (pEC3) over four potency classes: non-sensitiser, weak, moderate and strong/extreme. The BN-ITS can reason and provide potency information using partial input datasets. In general, using all input parameters lowers the uncertainty of the prediction, and by adding data sequentially one can explore the impact on confidence in the prediction. Expressing the prediction as a probability distribution allows quantification of uncertainty in the pEC3. In a subsequent step, the probabilities are converted into Bayes factors (BF), which estimate the strength of evidence, also for partial input datasets (Goodman, 1999); the final predicted potency category is the one with the highest BF.
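One common way to express a Bayes factor for a class is the ratio of its posterior odds to its prior odds (class versus all other classes). The sketch below uses that formulation with a hypothetical uniform prior and a hypothetical posterior; we have not verified that the BN-ITS computes its Bayes factors in exactly this way.

```python
def bayes_factors(prior: dict[str, float], posterior: dict[str, float]) -> dict[str, float]:
    """Bayes factor of each class against all other classes.

    BF = posterior odds / prior odds, with odds = p / (1 - p).
    """
    factors = {}
    for cls, p_prior in prior.items():
        p_post = posterior[cls]
        if not (0.0 < p_prior < 1.0 and 0.0 < p_post < 1.0):
            raise ValueError(f"probabilities for {cls!r} must lie strictly between 0 and 1")
        prior_odds = p_prior / (1.0 - p_prior)
        posterior_odds = p_post / (1.0 - p_post)
        factors[cls] = posterior_odds / prior_odds
    return factors


def predicted_class(factors: dict[str, float]) -> str:
    """Pick the potency class with the highest Bayes factor."""
    return max(factors, key=factors.get)


# Hypothetical uniform prior and posterior, for illustration only.
prior = {"non-sensitiser": 0.25, "weak": 0.25, "moderate": 0.25, "strong/extreme": 0.25}
posterior = {"non-sensitiser": 0.995, "weak": 0.003, "moderate": 0.001, "strong/extreme": 0.001}
bf = bayes_factors(prior, posterior)
print(predicted_class(bf), round(bf["non-sensitiser"]))  # non-sensitiser 597
```

Because the Bayes factor divides out the prior, it expresses how much the input data shifted the belief in a class, which is why it can be reported even for partial input datasets.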

SARA model
The SARA model uses a Bayesian statistical approach to infer a human-relevant metric of sensitiser potency and a measure of consumer risk for any given consumer exposure to a chemical of interest. It can use any combination of data from human repeat insult patch tests (HRIPT) (Politano and Api, 2008), the LLNA (OECD, 2010), DPRA, KeratinoSens™, h-CLAT or U-SENS™ to derive sensitiser potency, expressed as the ED01 (the dose in µg/cm² predicted to sensitise 1% of an HRIPT population), with explicit quantification of the uncertainty associated with the prediction. Whilst the SARA model can derive a potency and risk prediction from a single NAM input, the precision of the prediction depends upon which NAM is used and the experimental values obtained. Generally, a larger number of inputs reduces the uncertainty in SARA predictions. In addition to potency, the SARA model also provides the probability that a given exposure is low risk (the SARA risk metric, P(low risk)). The margin of exposure (MoE) for the given exposure is calculated within the SARA model by dividing the ED01 by the consumer exposure (in µg/cm²). This MoE is then regressed against the MoE derived for benchmark consumer exposures. The benchmark exposures are historical or current exposures to consumer products that have been designated as high or low risk for induction of skin sensitisation based upon publicly available clinical and patch test data.

Risk assessment
Determination of point of departure (PoD)
The predictions from the DA were converted to a PoD based on each unique DA and outcome. The purpose was not to use the information incorporated within the DA, or the DA predictions, in isolation but to apply them in the context of all available information within the NGRA. As described below in the hypothesis, uncertainty was introduced by the differing outcomes in the available information for DEA. Therefore, whilst not necessarily required for all purposes, in this case study a DA prediction of non-sensitiser was also converted into a PoD, with the aim of further increasing confidence in the risk assessment. The PoD was derived for this case study as follows:
− For DA which predicted a non-sensitiser outcome for DEA, an EC3 value of 100% (negative LLNA) was used to derive a PoD.
− For DA which predicted DEA to be a UN GHS Cat. 1B sensitiser, a default LLNA EC3 value of >2% was applied (GHS, 2021).
− The ANN output is a predicted EC3 value (in %).
− The SARA model predicts the human sensitisation potency, the ED01, and this value is used within the model as the PoD.
Several proposals have been published on how to convert LLNA EC3 values into sensitisation potency categories or PoD values for risk assessment (Api et al., 2008; Griem et al., 2003). For this case study we applied a unified approach, converting the DA-predicted LLNA threshold values (EC3, %) into a PoD expressed as dose per unit area (µg/cm²) using a factor of 250 (Robinson et al., 2000). This factor is based on the standard LLNA protocol, in which 25 µL of test solution is distributed over a surface of 1 cm² per mouse ear (Griem et al., 2003).

Margin of exposure
The Margin of exposure (MoE) is calculated using the following equation: MoE = PoD / CEL

Risk assessment outcome
The overall risk assessment outcome is evaluated as a weight of evidence considering the calculated MoE (and, in the case of SARA, the corresponding P(low risk)), the confidence in the use of NAM input data within the DA, and the relative conservatism in the transformation of the DA outcome to a PoD.
− The value of an acceptable MoE for risk assessments based upon NAM is yet to be defined. For the purposes of this case study, the MoE was considered high if > 100 and low if < 100, in order to illustrate the decision-making process.
− Note that this does not apply to SARA, which incorporates clinical benchmarks to provide empirical support for the size of the MoE.
− Confidence in the use of the NAM information informing each DA was defined by whether the chemical was predicted to be directly or indirectly reactive (i.e., whether it is a pre- or pro-hapten), which indicates whether each test method should be considered in or out of domain, and by whether the in silico prediction was in domain. For the purposes of this case study, confidence was considered high if all NAM used in the DA were within domain and low if all NAM used in the DA were out of domain.
− Relative conservatism in the transformation of the DA outcome to a PoD was considered high when the PoD was derived from a non-sensitiser outcome of the DA, low when the PoD was provided as a quantitative output of the DA, and unknown when the PoD was derived from CLP Cat. 1B or 1A DA outputs.
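The three criteria above can be written down as simple labelling functions, which makes explicit where the case-study rules leave gaps (an MoE of exactly 100, or a DA whose inputs are partly in and partly out of domain). This is a sketch with hypothetical names; the final weight of evidence judgement itself remains expert-driven and is not automated here.

```python
MOE_CUTOFF = 100.0  # illustrative only; an acceptable NAM-based MoE is not yet defined


def moe_level(moe: float) -> str:
    """High if MoE > 100, low if MoE < 100 (not applied to SARA)."""
    if moe > MOE_CUTOFF:
        return "high"
    if moe < MOE_CUTOFF:
        return "low"
    return "undefined"  # the text does not cover MoE == 100


def nam_confidence(inputs_in_domain: list[bool]) -> str:
    """High if every NAM input is in domain, low if every input is out of domain."""
    if not inputs_in_domain:
        raise ValueError("at least one NAM input is required")
    if all(inputs_in_domain):
        return "high"
    if not any(inputs_in_domain):
        return "low"
    return "not defined"  # mixed domain status is not covered by the text


def pod_conservatism(pod_source: str) -> str:
    """Relative conservatism of the DA-outcome-to-PoD transformation."""
    levels = {
        "non-sensitiser": "high",  # PoD from a default EC3 of 100 %
        "quantitative": "low",     # PoD taken directly from a DA output
        "category": "unknown",     # PoD from a UN GHS/CLP Cat. 1A or 1B call
    }
    return levels[pod_source]


# e.g. a low MoE, a DA with one out-of-domain input, and a category-based PoD
print(moe_level(8.3), nam_confidence([False, True, True]), pod_conservatism("category"))
# low not defined unknown
```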

Updated NGRA framework
In the process of constructing this case study it became evident that the original NGRA framework (Gilmour et al., 2020) could be refined. The updated framework is shown in Figure 1. The modifications are as follows:
− The elements of Tier 0 (gathering existing information) have been grouped together, as they are applied in no specific order rather than in the sequence indicated in the earlier version.
− Within Tier 0, the gathering of chemical characteristics was expanded to explicitly include physicochemical parameters, which are used in several DA.
− Tier 1 (hypothesis generation: how will data be used in the risk assessment) was simplified to better reflect the logic applied when the integration of the information does, or does not, allow a weight of evidence conclusion of non-sensitiser with high confidence (and thus exit) at Tier 1. This is described in more detail in the following section.

Use scenario
The case study exposure scenarios were selected to represent a relatively low consumer exposure (0.8% DEA in a shampoo) and a relatively high consumer exposure via a product that remains on the skin under occlusion (0.8% DEA in an underarm deodorant). The exposure was calculated based upon the 90th percentile consumer use provided in the SCCS Notes of Guidance (SCCS, 2021). The consumer exposure from use of 0.8% DEA in a shampoo was calculated to be 0.6 µg/cm² (11 g shampoo/day × 0.8% use concentration × 0.01 retention factor / 1440 cm² skin surface area).
The consumer exposure from use of 0.8% DEA in an underarm deodorant was calculated to be 60 µg/cm² (1.5 g deodorant roll-on/day × 0.8% use concentration × 1 retention factor / 200 cm² skin surface area).
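The two exposure calculations above follow the same formula, which can be checked with a short sketch (the function name is hypothetical; the inputs are those stated in the text, with the product amount converted from g to µg).

```python
def consumer_exposure_level(amount_g_per_day: float, concentration: float,
                            retention_factor: float, area_cm2: float) -> float:
    """CEL in ug/cm2 per day; concentration is a fraction (0.8 % -> 0.008)."""
    amount_ug_per_day = amount_g_per_day * 1e6  # g -> ug
    return amount_ug_per_day * concentration * retention_factor / area_cm2


shampoo_cel = consumer_exposure_level(11.0, 0.008, 0.01, 1440.0)
deodorant_cel = consumer_exposure_level(1.5, 0.008, 1.0, 200.0)
print(f"shampoo:   {shampoo_cel:.2f} ug/cm2")   # 0.61, reported as 0.6
print(f"deodorant: {deodorant_cel:.0f} ug/cm2")  # 60
```

The roughly 100-fold difference between the two scenarios comes from the retention factor (0.01 versus 1) combined with the much smaller application area for the deodorant.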

Chemical characteristics
For the purposes of this case study, it was assumed that DEA was 100% pure and that risk assessment of impurities was not required. Physicochemical properties are summarized in Table 1.

Hypothesis generation and how the data will be used in risk assessment
DEA was assumed to be a novel/new ingredient which is proposed for use in two cosmetic products: 0.8% in a shampoo and 0.8% in an underarm deodorant.
Two of the four in silico tools applied predicted no reactivity or skin sensitisation potential; all in silico predictions for DEA were in domain. Derek Nexus predicted DEA to be a skin sensitiser, although this outcome was reported as equivocal. A detailed review of the report revealed that the equivocal call for this alert (aminophenol) reflected inconsistent sensitiser / non-sensitiser outcomes in the historical in vivo data underpinning the alert, which range from weak/moderate sensitiser potency in the LLNA to a negative result in the guinea pig maximisation test. Human patch test data also demonstrate that materials within this chemical class are rare human sensitisers (Basketter et al., 2014; Lessmann et al., 2009). ToxTree triggered a protein binding alert and reported that DEA could be reactive via a Schiff base mechanism. This was corroborated by expert review, which concluded that DEA has the potential to form a Schiff base following biotic (metabolic) activation, i.e., it is a pro-hapten. The Derek Nexus report also supports this chemistry hypothesis, indicating that DEA is likely a pro-Schiff base: as an aminoethanol, it is thought to sensitise via reaction pathways involving either abiotic or enzymatic processes in which this class of compounds is oxidised to imines/aldehydes and further hydrolysed to glycolaldehyde (CASRN# 141-46-8), which has an EC3 of 1.8% (Anderson et al., 2007). Whilst this metabolite is a contact allergen of moderate potency, 100% enzymatic conversion is not plausible in vivo, since biotic transformation requires access to cells within the skin and skin penetration of DEA has been shown to be low (below 3% over 24 h in human skin in vitro), with most of the penetrating DEA remaining in the skin reservoir (Brain et al., 2005; Kraeling et al., 2004).
The available NAM data demonstrate differing outcomes. The DPRA and KeratinoSens™ gave negative results, whereas U-SENS™ and h-CLAT were positive, according to the prediction models specified in the respective OECD TGs. Because DEA was predicted to be a potential pro-hapten, some caution should be applied when interpreting the negative outcomes in the DPRA and KeratinoSens™, given the theoretical lack of enzymatic metabolic capability in these test systems (OECD, 2018, 2020).
A positive outcome was evident in the h-CLAT and U-SENS™, which have been shown to detect pro-haptens owing to their respective enzyme activities (OECD, 2022). Note, however, that when used in combination the skin sensitisation NAM are able to detect most pre- and pro-haptens (Urbisch et al., 2016; Patlewicz et al., 2016).
Based upon the above information, it is not possible to conclude with high confidence that DEA is a non-sensitiser.
Thus, the NGRA framework cannot be exited at Tier 1 and should progress to Tier 2 (Fig. 1). The next step was the application of DA to combine the NAM information and generate skin sensitisation potential and potency predictions.
In principle, a PoD is not required for a DA predicting a non-sensitiser, as a quantitative risk assessment is usually not performed. However, given the reduced confidence in the use of some of the NAM test data and, in the case of the deodorant, the high consumer exposure and prolonged occlusive skin contact, a non-sensitiser outcome from a DA should be converted to a PoD.
The weight of evidence risk assessment will consider the calculated MoE (and, in the case of SARA, the P(low risk)), the confidence in the use of NAM input data, and the relative conservatism in the transformation of the DA outcome to a PoD.

DA outcome and derivation of PoD
The ITSv1 predicts DEA to be UN GHS Cat. 1B based upon an overall score of 2. As described in section 2.5, this equates to a default LLNA EC3 value of >2%, which, when converted to dose per unit area, provides a PoD of >500 µg/cm².
The ITSv2 gives an overall score of 1; however, the DA outcome is considered inconclusive because DEA is predicted to be a pro-hapten and is thus out of domain for the DPRA (OECD, 2021). In accordance with OECD Guideline No. 497, for a DA prediction with low confidence (inconclusive), a WoE was applied within the context of the IATA (OECD, 2021). As outlined above in the hypothesis, DEA must undergo metabolism to induce skin sensitisation. Whilst the metabolite is a contact allergen of moderate potency, 100% enzymatic conversion is not plausible in vivo, since biotic transformation requires access to cells within the skin, skin penetration has been shown to be low, and only weak responses were observed in the h-CLAT and U-SENS™ (Tab. 1). Thus, in the case study risk assessment using the ITSv2, DEA is treated as UN GHS Cat. 1B. This equates to a default LLNA EC3 value of >2%, which, when converted to dose per unit area, provides a PoD of >500 µg/cm².
The different outcomes of the two versions of the ITS were attributable to the different in silico tools applied, i.e., Derek Nexus (ITSv1) predicted a sensitiser whereas the OECD QSAR Toolbox automated workflow (ITSv2) did not, and to the use of an out of domain test method, which has a greater impact on, and decreases confidence in, the overall DA outcome when that outcome is non-sensitising than when it is sensitising.
The ANN (TIMES-SS) predicts an EC3 value of 81.5%, which, converted to dose per unit area, provides a PoD of 20375 µg/cm².
The ANN (ToxTree) predicts an EC3 value of 59.1%, which, converted to dose per unit area, provides a PoD of 14775 µg/cm².
The STS predicts DEA to be a non-sensitiser with a high probability of 87% (pCat1 = 13%). As described in section 2.5, this equates to a default LLNA EC3 value of 100%, which, converted to dose per unit area, provides a PoD of 25000 µg/cm².
The BN-ITS predicts DEA to be a non-sensitiser with a high probability (> 99%) and a high Bayes factor (> 30), which again equates to a default LLNA EC3 value of 100% and, converted to dose per unit area, a PoD of 25000 µg/cm².
The SARA model predicts an expected ED01 of 13000 µg/cm² (95% confidence interval 530-370000 µg/cm²), which is consistent with a prediction of weak/moderate skin sensitiser potency.

3.4.2 Derivation of a point of departure and calculation of Margin of Exposure
For use of 0.8% DEA in a shampoo, the exposure was calculated to be 0.6 µg/cm² and the MoE obtained from the 7 DA ranged from 833 to 41667. Individual values are shown in Table 2.
For use of 0.8% DEA in an underarm deodorant product, the exposure was calculated to be 60 µg/cm² and the MoE obtained from the 7 DA ranged from 8 to 417. Individual values are shown in Table 3.
Notes to Tables 2 and 3:
− PoD derived from UN GHS Cat. 1B: DEA is categorised as a weak/moderate sensitiser with an EC3 value of >2%, which is converted to a PoD of >500 µg/cm². The exact PoD value is undetermined and could be anywhere above 500 µg/cm².
− 4 The DA outcome is a quantitative measure of potency (EC3, %), which is converted to dose per unit area.
− 5 The DA outcome was non-sensitiser.
− 6 The DA outcome is a quantitative measure of potency (ED01), which is converted to P(low risk) for a given exposure.
− 7 MoE < 100.
− 8 MoE > 100.
− 9 MoE = 217 and P(low risk) is 0.5, i.e., it is highly uncertain whether the exposure is high or low risk.
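A minimal sketch that recomputes the MoE (PoD / CEL) for each DA in both scenarios from the PoD and exposure values reported in the text. The dictionary layout is illustrative; the ITS PoDs are lower bounds (so their MoEs are really "> 833" and "> 8"), and for SARA the MoE feeds into P(low risk) rather than being judged against 100.

```python
# Consumer exposure levels (ug/cm2) as reported; the shampoo value is rounded.
CEL = {"shampoo": 0.6, "deodorant": 60.0}

# PoD values (ug/cm2) reported for each DA; the two ITS values are lower bounds.
POD = {
    "ITSv1": 500.0,
    "ITSv2": 500.0,
    "ANN (TIMES-SS)": 20375.0,
    "ANN (ToxTree)": 14775.0,
    "STS": 25000.0,
    "BN-ITS": 25000.0,
    "SARA (ED01)": 13000.0,
}


def margin_of_exposure(pod: float, cel: float) -> float:
    """MoE = PoD / CEL."""
    return pod / cel


for scenario, cel in CEL.items():
    moes = {da: margin_of_exposure(pod, cel) for da, pod in POD.items()}
    print(scenario, round(min(moes.values())), round(max(moes.values())))
# shampoo 833 41667
# deodorant 8 417
```

The SARA value for the deodorant, 13000 / 60 ≈ 217, matches the MoE given in note 9.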

3.4.3 Weight of evidence assessment and risk assessment outcomes
The overall risk assessment outcome was evaluated as a weight of evidence considering the calculated MoE (and, in the case of SARA, the P(low risk)), the confidence in the use of NAM input data and the relative conservatism in the transformation of the DA outcome to a PoD. Table 2 summarizes the NAM risk assessment outcomes based upon the 7 DA for the use of 0.8% DEA in a shampoo, and Table 3 those for the use of 0.8% DEA in an underarm deodorant. A detailed summary of the risk assessment outcome for each applied DA is given below.

ITSv1
The overall conclusion was that for the risk assessment based upon the ITSv1, use of 0.8% DEA in a shampoo was SAFE. However, use of 0.8% DEA in a deodorant was considered as UNSAFE.

ITSv2
The overall conclusion of the risk assessment based upon the ITSv2 was that use of 0.8% DEA in a shampoo was SAFE, whereas use of 0.8% DEA in a deodorant was UNSAFE.

ANN (TIMES-SS)
The overall conclusion of the risk assessments based upon the ANN (TIMES-SS) was that use of 0.8% DEA in both a shampoo and a deodorant was SAFE.

ANN (ToxTree)
The overall conclusion of the risk assessments based upon the ANN (ToxTree) was that use of 0.8% DEA in both a shampoo and a deodorant was SAFE.

STS
The overall conclusion of the risk assessments based upon the STS was that use of 0.8% DEA in both a shampoo and a deodorant was SAFE.

BN-ITS
The overall conclusion of the risk assessments based upon the BN-ITS was that use of 0.8% DEA in both a shampoo and a deodorant was SAFE.

SARA model
The overall conclusion of the risk assessment based upon the SARA model was that use of 0.8% DEA in a shampoo was SAFE, whereas use of 0.8% DEA in a deodorant was UNSAFE.
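The per-DA conclusions above can be collected into a small lookup table to make the split between the two exposure scenarios explicit. The Python sketch below only restates the reported conclusions; the data structure and helper names are illustrative assumptions, not part of the NGRA framework itself.

```python
from collections import Counter

# Conclusions reported above for each DA and exposure scenario.
CONCLUSIONS = {
    "ITSv1": {"shampoo": "SAFE", "deodorant": "UNSAFE"},
    "ITSv2": {"shampoo": "SAFE", "deodorant": "UNSAFE"},
    "ANN (TIMES-SS)": {"shampoo": "SAFE", "deodorant": "SAFE"},
    "ANN (ToxTree)": {"shampoo": "SAFE", "deodorant": "SAFE"},
    "STS": {"shampoo": "SAFE", "deodorant": "SAFE"},
    "BN-ITS": {"shampoo": "SAFE", "deodorant": "SAFE"},
    "SARA model": {"shampoo": "SAFE", "deodorant": "UNSAFE"},
}


def tally(scenario: str) -> Counter:
    """Count SAFE/UNSAFE conclusions across all DA for one scenario."""
    return Counter(result[scenario] for result in CONCLUSIONS.values())


def das_with(scenario: str, verdict: str) -> list[str]:
    """List the DA that reached a given verdict for one scenario."""
    return [da for da, result in CONCLUSIONS.items() if result[scenario] == verdict]


for scenario in ("shampoo", "deodorant"):
    counts = tally(scenario)
    print(f"{scenario}: {counts['SAFE']}/{len(CONCLUSIONS)} SAFE; "
          f"UNSAFE: {das_with(scenario, 'UNSAFE')}")
```

For the rinse-off scenario all 7 DA agree, whereas for the leave-on scenario the three UNSAFE conclusions come from ITSv1, ITSv2 and the SARA model.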

Discussion
Consumer safety risk assessments for new cosmetic ingredients are no longer based upon hazard characterisation from in vivo animal tests, but upon NAM. Previously, we published an NGRA framework for skin sensitisation to aid the construction of risk assessments based upon NAM (Gilmour et al., 2020), and an increasing number of case studies aligned to this framework have been applied, building our knowledge of, and confidence in, the application of these new information sources (Assaf Vandecasteele et al., 2021; ENV/CBC/MONO(2022)32, 2022; Gautier et al., 2020; Gilmour et al., 2020, 2022; Natsch et al., 2018). The present case study was selected based upon observed differences in outcome across the existing NAM information set (Hoffmann et al., 2022), allowing exploration of how these differences impact the DA and risk assessment outcomes. The exposure scenarios were hypothetical and were selected to represent a relatively high and a relatively low consumer exposure; they do not represent real use levels of DEA in cosmetics (European Union Council Directive 76/768 EEC) (Fiume et al., 2017). We considered DEA as a "new ingredient" without any existing in vivo or human data. Whilst the use of read-across analogues with historical data has previously been shown to reduce uncertainty (Gautier et al., 2020), read-across was considered out of scope for this case study to allow a focus on how to deal with inconsistent NAM information, a scenario of relevance for the risk assessment of novel ingredients. It is intended that read-across will form the topic of a subsequent case study. Information regarding the chemistry, i.e., the mechanism by which a chemical can interact with protein, is critical to understanding the applicability domain of the NAM. It is also one of the elements that define the confidence in using the NAM information within the risk assessment.
DEA was predicted to be a potential pro-hapten by two of the four in silico tools and by an expert chemistry review, i.e., it would require metabolic activation to a reactive intermediate before it could react with protein and act as a hapten. The potential inability of the DPRA and KeratinoSens™ to detect pro-haptens, because these test systems lack metabolic capacity, is well documented. However, when used in combination, the majority of the skin sensitisation NAM are reported to be able to detect most pre- and pro-haptens (Patlewicz et al., 2016; Urbisch et al., 2016). Nevertheless, the negative responses observed in both the DPRA and KeratinoSens™, combined with the positive responses observed in U-SENS™ and h-CLAT (the cell-based assays addressing KE-3), introduced some uncertainty into the assessment, leading to the conclusion that it was not possible to confidently define DEA as a non-sensitiser based upon the weight of evidence (and exit at Tier 1). Whilst some of the DA applied did predict DEA to be a non-sensitiser, the reduced confidence in some of the NAM information used within the DA, together with positive outcomes in test methods not used within the DA but available within the IATA, led in this case study to a more conservative approach being applied and to a PoD being derived using an LLNA EC3 of 100% (i.e., 25000 µg/cm²). Furthermore, owing to this uncertainty and the in silico tools applied, an inconclusive result was obtained from the ITSv2. In such instances, in accordance with OECD Guideline 497 and as exemplified in case studies published elsewhere, a WoE can be applied to derive an outcome that can be used in hazard and risk assessment (Macmillan et al., 2022; OECD, 2021).
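The Tier 1 reasoning above can be sketched in Python. The exit rule used here (exit as a non-sensitiser only if every available NAM result is negative) is a deliberately simplified assumption for illustration; the actual decision in this case study was a weight of evidence judgement. The assay results, key events and the lack of metabolic capacity in the DPRA and KeratinoSens™ are taken from the text; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Stated in the text: these test systems lack metabolic capacity.
NO_METABOLIC_CAPACITY = {"DPRA", "KeratinoSens"}

UG_PER_CM2_PER_PERCENT = 250.0   # EC3 100% -> 25000 ug/cm2
CONSERVATIVE_EC3_PERCENT = 100.0


@dataclass(frozen=True)
class NamResult:
    assay: str
    key_event: str   # AOP key event addressed by the assay
    positive: bool


DEA_RESULTS = [
    NamResult("DPRA", "KE-1", False),
    NamResult("KeratinoSens", "KE-2", False),
    NamResult("U-SENS", "KE-3", True),
    NamResult("h-CLAT", "KE-3", True),
]


def can_exit_tier1_as_non_sensitiser(results: list[NamResult]) -> bool:
    """Simplified rule (assumption): exit only if every available NAM is negative."""
    return not any(r.positive for r in results)


def possibly_false_negatives(results: list[NamResult], prohapten_predicted: bool) -> list[str]:
    """Negative assays that might miss a pro-hapten because they lack metabolism."""
    if not prohapten_predicted:
        return []
    return [r.assay for r in results
            if not r.positive and r.assay in NO_METABOLIC_CAPACITY]


exit_tier1 = can_exit_tier1_as_non_sensitiser(DEA_RESULTS)
suspect = possibly_false_negatives(DEA_RESULTS, prohapten_predicted=True)
# Without a Tier 1 exit, fall back to the conservative default PoD.
pod = None if exit_tier1 else CONSERVATIVE_EC3_PERCENT * UG_PER_CM2_PER_PERCENT
print(exit_tier1, suspect, pod)
```

For DEA this prints `False ['DPRA', 'KeratinoSens'] 25000.0`: no Tier 1 exit, two negatives that are open to the pro-hapten caveat, and the conservative PoD of 25000 µg/cm².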
All of the DA applied within this case study except the SARA model use information from in silico tools. Uncertainty can also be introduced when different in silico tools that provide the same type of information are used within a DA; this is best illustrated here by the different outcomes of the two versions of the ITS DA. This was in part due to the different in silico tools applied: the ITSv1 (Derek Nexus) outcome was that DEA was a skin sensitiser (UN GHS Cat. 1B), whereas the ITSv2 (OECD automated workflow for DASS) outcome for DEA was inconclusive. The OECD automated workflow for DASS is an open access software application, whereas the Derek Nexus software requires a commercial licence. This raises a challenge: when the safety assessor has access to both in silico tools and the predictions are (a) in domain but (b) differ in outcome and subsequently result in different UN GHS categories, which approach should be applied? Beyond the choice of tool, in silico tools are regularly updated with new expert knowledge, resulting in updated predictions that may influence the output of DA which use in silico predictions. Thus, it is important to document the versions of the in silico tools used in the risk assessment.
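One way to make tool versions part of the record is to refuse a DA result that lacks them, and to flag when DA that differ in their in silico input disagree. The Python sketch below is a minimal illustration under that assumption; the record layout is hypothetical, and the version strings are placeholders rather than the versions used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaRun:
    """Provenance record for one DA prediction that uses an in silico tool."""
    da: str
    in_silico_tool: str
    tool_version: str   # exact version used; must not be left blank
    outcome: str        # e.g. "UN GHS Cat. 1B" or "Inconclusive"

    def __post_init__(self) -> None:
        # Reject records that omit the tool version.
        if not self.tool_version.strip():
            raise ValueError(f"{self.da}: document the {self.in_silico_tool} version")


def outcomes_diverge(runs: list[DaRun]) -> bool:
    """True when the recorded DA outcomes disagree."""
    return len({run.outcome for run in runs}) > 1


# Version strings are hypothetical placeholders, not the versions used in this study.
runs = [
    DaRun("ITSv1", "Derek Nexus", "<version used>", "UN GHS Cat. 1B"),
    DaRun("ITSv2", "OECD automated workflow for DASS", "<version used>", "Inconclusive"),
]
if outcomes_diverge(runs):
    for run in runs:
        print(f"{run.da}: {run.in_silico_tool} {run.tool_version} -> {run.outcome}")
```

A divergence flag does not resolve which outcome to use; it only ensures the conflict and the exact tool versions behind it are visible in the assessment record.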
Within the NGRA framework, there is always the possibility of generating additional NAM information, e.g., on potential metabolites (Reynolds, 2021), although this was not considered in this case study. Another way to account for uncertainty within a risk assessment is the use of safety assessment factors (SAF). For example, the quantitative risk assessment (QRA) for skin sensitisation uses SAF to account for uncertainty in the extrapolation from the PoD, the no expected sensitisation induction level (NESIL), which is commonly derived from in vivo data (either the human repeat insult patch test (HRIPT) or the LLNA), to a consumer product use scenario. Uncertainties considered within these SAF include human variability (increased population size) and the way the product is used compared with the HRIPT exposure (frequency, anatomical site) (Api et al., 2008, 2020; Basketter and Safford, 2016). With the evolution of the skin allergy risk assessment paradigm and the use of NAM, it is yet to be determined whether simply applying the same SAF as in the historical QRA is appropriate, particularly given the different uncertainties associated with the use of historical in vivo data versus NAM information. In the present case study, we applied a different approach: we derived a MoE and then evaluated possible areas of uncertainty within the risk assessment process, namely the confidence in the use of NAM input data within the DA and the relative conservatism in the transformation of the DA outcome to a PoD.
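The two approaches are closely related arithmetically. In the QRA, an acceptable exposure level is obtained as AEL = NESIL / (product of the SAF), and an exposure is acceptable if the consumer exposure level (CEL) does not exceed the AEL; this is equivalent to requiring NESIL / CEL, i.e., the MoE, to be at least the total SAF. The Python sketch below shows this equivalence. The SAF values (10 × 3 × 3) are hypothetical, the PoD values are borrowed from this case study purely for illustration, and treating the boundary as "MoE ≥ threshold" is an assumption.

```python
from math import prod


def acceptable_exposure_level(nesil: float, safs: tuple[float, ...]) -> float:
    """QRA: AEL = NESIL / (product of the safety assessment factors)."""
    return nesil / prod(safs)


def qra_acceptable(nesil: float, cel: float, safs: tuple[float, ...]) -> bool:
    """QRA decision: the consumer exposure level must not exceed the AEL."""
    return cel <= acceptable_exposure_level(nesil, safs)


def moe_acceptable(pod: float, exposure: float, acceptable_moe: float) -> bool:
    """MoE decision: PoD / exposure compared against a threshold."""
    return pod / exposure >= acceptable_moe


# Hypothetical inter-individual, product matrix and use factors.
safs = (10.0, 3.0, 3.0)
exposure = 60.0  # ug/cm2, deodorant scenario from this case study
for pod in (500.0, 25000.0):
    print(pod,
          qra_acceptable(pod, exposure, safs),
          moe_acceptable(pod, exposure, prod(safs)))
```

Both decisions agree for each PoD, which illustrates that the choice of an 'acceptable' MoE plays the same role as the total SAF in the historical QRA.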
The SARA model translated the MoE into a risk metric, p(low risk), by regressing the MoE for the case study ingredient DEA against the MoE for established high/low risk benchmark exposures; this feature means that the benchmarks determine whether the MoE is sufficiently high. For the risk assessments using information from the other 6 DA, an arbitrary value of 100 was assigned as the threshold for judging whether the MoE was sufficiently high. It should, however, be noted that whether this value should be refined for skin sensitisation risk assessments based upon NAM remains to be determined; this needs to be revisited, and a systematic approach taken to ensure that all of the appropriate uncertainties are addressed. The weight of evidence uncertainty assessment outlined here is a very simple framework, applied to explore how such an approach could be further developed to increase transparency in deciding whether an exposure should be considered safe or unsafe. Our work is ongoing to expand upon this type of uncertainty assessment to ensure that our NGRA reaches the desired level of transparency and adequately addresses all the associated uncertainties.
Whilst the different outcomes observed in the NAM information led to differences in the DA outputs, there was less impact on the risk assessment outcome. In the rinse-off scenario, the use of 0.8% DEA in a shampoo was considered safe regardless of the DA used within the risk assessment. For the leave-on scenario of 0.8% DEA in an underarm deodorant product, 4 of the 7 applied DA resulted in a conclusion of safe (STS, BN-ITS, ANN (TIMES-SS) and ANN (ToxTree)), whilst 3 (ITSv1, ITSv2 and SARA) resulted in a conclusion of unsafe. The most likely explanation for this observation is that the outcome depends largely upon the rules applied within this case study. For example, when applying the DA that were developed to derive UN GHS categories, a conservative approach was applied, as demonstrated here for the ITSv1 and ITSv2: a PoD value of >500 µg/cm² was used as the most conservative estimate of the true threshold, since it is not possible to determine where the exact threshold would lie. Furthermore, whilst the SARA model has integrated high/low risk benchmarks, which provide empirical support for whether a MoE is sufficient, and provides a p(low risk), the other DA do not incorporate this functionality; thus, an arbitrary value of 100 was set as the 'acceptable' MoE so that the NGRA process could be illustrated. If, for example, a higher or lower value were applied as the 'acceptable' MoE, the risk assessment outcomes for both exposures could differ. As noted above, this is all 'work still to be done'. It is envisaged that this will become evident as we evolve a systematic weight of evidence uncertainty assessment approach, and it may ultimately transpire that the 'acceptable MoE' value depends upon the DA applied, the uncertainty associated with the NAM data used within the DA, and the additional NAM information available but not used in the DA. The purpose of this case study was to demonstrate that DAs can be used in an IATA.
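The sensitivity of the deodorant conclusion to the chosen 'acceptable' MoE can be made concrete by asking what PoD each threshold would require at the 60 µg/cm² exposure. The Python sketch below assumes the 250 µg/cm² per 1% EC3 conversion implied by this study and uses only the reported lowest and highest MoE (8 and 658); the candidate thresholds of 10 and 1000 are hypothetical and chosen only to bracket the value of 100.

```python
UG_PER_CM2_PER_PERCENT = 250.0      # implied by EC3 2% -> 500 ug/cm2
DEODORANT_EXPOSURE = 60.0           # ug/cm2
REPORTED_MOE_RANGE = (8.0, 658.0)   # lowest and highest MoE across the 7 DA


def minimum_pod(exposure: float, acceptable_moe: float) -> float:
    """Smallest PoD (ug/cm2) giving MoE >= acceptable_moe at this exposure."""
    return exposure * acceptable_moe


def minimum_ec3_percent(exposure: float, acceptable_moe: float) -> float:
    """The same requirement as an LLNA EC3 (%); values above 100% are unattainable."""
    return minimum_pod(exposure, acceptable_moe) / UG_PER_CM2_PER_PERCENT


for threshold in (10.0, 100.0, 1000.0):
    ec3 = minimum_ec3_percent(DEODORANT_EXPOSURE, threshold)
    passing = [moe for moe in REPORTED_MOE_RANGE if moe >= threshold]
    print(f"acceptable MoE {threshold:g}: needs PoD >= "
          f"{minimum_pod(DEODORANT_EXPOSURE, threshold):g} ug/cm2 "
          f"(EC3 >= {ec3:g}%); range endpoints passing: {passing}")
```

At an acceptable MoE of 100, a PoD of at least 6000 µg/cm² (an EC3 of 24%) would be needed, well above the conservative >500 µg/cm² used for the ITS; at a threshold of 1000, the requirement (EC3 of 240%) exceeds even the conservative default of EC3 100%.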
It is not required to apply all the DAs to conduct a next generation risk assessment. In reality, not all DA are publicly available for use at this moment. Which DA to apply will depend on accessibility to the DA, experience in the use of a certain DA, the information required from a DA (e.g., GHS classification or quantitative potency prediction), and the existing NAM information. A safety assessor will first apply a DA for which the required NAM input data are available, before generating new data.
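The selection step, applying first a DA whose required inputs already exist, amounts to a subset check. The Python sketch below includes only the two ITS versions, whose inputs (DPRA, h-CLAT and one in silico tool) follow OECD Guideline 497; the other DA are omitted, and the assessor's available data set, including the absence of a Derek Nexus licence, is hypothetical.

```python
# Input requirements for the two ITS versions; other DA are omitted from this sketch.
DA_INPUTS = {
    "ITSv1": frozenset({"DPRA", "h-CLAT", "Derek Nexus"}),
    "ITSv2": frozenset({"DPRA", "h-CLAT", "OECD automated workflow for DASS"}),
}


def applicable_das(available: set[str]) -> list[str]:
    """DA whose required inputs are all already available."""
    return sorted(da for da, needed in DA_INPUTS.items() if needed <= available)


def missing_inputs(da: str, available: set[str]) -> list[str]:
    """Inputs that would need to be generated (or licensed) before applying a DA."""
    return sorted(DA_INPUTS[da] - available)


# Hypothetical assessor without a Derek Nexus licence.
available = {"DPRA", "KeratinoSens", "U-SENS", "h-CLAT",
             "OECD automated workflow for DASS"}
print(applicable_das(available))           # ['ITSv2']
print(missing_inputs("ITSv1", available))  # ['Derek Nexus']
```

Extending the mapping with the inputs of further DA would let the same check rank which DA can be applied immediately and which would first require new data.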
Overall, we demonstrated that our NGRA framework can be successfully applied to more complex cases. Most importantly, the framework is transparent enough to explain exactly what has been done and why. This NGRA framework will continue to evolve and thus remain adaptable to different scenarios. Further case studies will follow to challenge our NGRA framework, build our knowledge in the application of NAM and increase confidence in NAM-based skin sensitisation risk assessment of cosmetic ingredients.