Developing structure-activity relationships for N-nitrosamine activity
Computational Toxicology

and carcinogenicity reaction mechanisms, collection and review of available, public, relevant experimental data, development of structure-activity relationships consistent with mechanisms for prediction of N-nitrosamine carcinogenic potency categories, and improved methods for calculating acceptable intake limits for N-nitrosamines based upon mechanistic analogs. Here we describe this collaboration and review our progress to date towards development of mechanistically based structure-activity relationships. We propose improving risk assessment of N-nitrosamines by first establishing the dominant reaction mechanism prior to retrieving an appropriate set of close analogs for use in read-across exercises.


Introduction
Recently, N-nitrosodimethylamine (NDMA) has been detected in several marketed pharmaceutical drugs. These events have led regulatory agencies to require that N-nitrosamine risk assessments be performed on all marketed medical products [1]. The need for these assessments is driven by the high carcinogenic potency of several N-nitrosamines in rodents, which makes these substances a significant regulatory concern [2]. Management of N-nitrosamine impurity levels in pharmaceutical drug substances and products has previously been guided by ICH M7, where they are referred to as "cohort-of-concern" (COC) compounds. Consequently, class-specific Acceptable Intake (AI) limits for N-nitrosamines are calculated from compound-specific carcinogenicity data by extrapolation of rodent TD50 values. For N-nitrosamines without carcinogenicity data, regulatory agencies established provisional AI limits for several N-nitrosamine impurities based on structure activity relationships (SARs) with "close" analogs [3][4][5][6]. Currently, these regulatory limits are based on the AIs for the highly potent animal carcinogens NDMA and N-nitrosodiethylamine (NDEA). However, not all N-nitrosamines are highly potent (as measured by rodent TD50 values), and their carcinogenic potencies have been shown to span over 4 log units of TD50 values [7,8]. Fortunately, the class-specific limit can be adjusted based upon a SAR analysis as part of a comparison with other similar N-nitrosamines that have established carcinogenicity data. The EMA Assessment Report on the subject [4] states, "It is therefore prudent to consider all N-nitrosamines containing an α-hydrogen that can be metabolically activated as potentially mutagenic and carcinogenic to humans, however with different potencies depending on nature of the functional group, specifics of metabolic activation and repair efficiency and capacity." To investigate whether improvements in SARs can more effectively predict N-nitrosamine carcinogenic potency, an ad hoc workgroup of 23 companies and universities was established to address several scientific and regulatory issues. These include: 1) reporting and review of N-nitrosamine mutagenicity and carcinogenicity reaction mechanisms, 2) collection and review of available, public, relevant experimental carcinogenicity and mutagenicity data, 3) development of SARs consistent with mechanisms for predicting N-nitrosamine carcinogenic potency categories, and 4) improved methods for calculating AI limits for N-nitrosamines based upon mechanistic analogs.
Abbreviations: AI, acceptable intake; ADME, absorption, distribution, metabolism, and elimination; CPDB, Carcinogenicity Potency Database; COC, cohort of concern; EMA, European Medicines Agency; EWG, electron-withdrawing group; FDA, U.S. Food and Drug Administration; LCDB, Lhasa Carcinogenicity Database; NDEA, N-nitrosodiethylamine; NDIPA, N-nitrosodiisopropylamine; NDMA, N-nitrosodimethylamine; NDSBA, N-nitrosodi-sec-butylamine; NMEA, N-nitrosomethylethylamine; NMNA, N-nitrosomethylneopentylamine; NMIPA, N-nitrosomethylisopropylamine; NMTBA, N-nitrosomethyl-tert-butylamine; NPDA, N-nitrosodiphenylamine; SAR, structure activity relationship; TD50, dose that results in a 50% excess in tumor incidence.
Herein we describe the progress made towards development of mechanistically based SARs, identifying the structural features that most affect carcinogenic potency. Specifically, we examine the effects of 1) α-carbon substitution and 2) electron-withdrawing groups on nitrosamine carcinogenicity potency and mutagenicity prevalence. The features that impact a SAR of a complex biological process such as carcinogenesis may include a number of different events. The key events driving DNA mutagenicity from dialkyl N-nitrosamines include metabolic activation, DNA alkylation and the repair of potential DNA adducts. While these events could potentially result in different SARs, the metabolic activation mechanism is understood [9,10] to be of principal concern for the overall SAR, since if a nitrosamine is not metabolically activated, the SAR for binding and repair is irrelevant. A three-stage consideration of the SAR, however, may be necessary in some cases to fully explain the potency of some dialkyl N-nitrosamines.
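The extrapolation from a rodent TD50 to an AI limit mentioned above can be illustrated with a short sketch. It assumes the linear extrapolation convention commonly associated with ICH M7 (TD50 divided by 50,000 to reach a 1 in 100,000 excess risk, scaled to a 50 kg adult); the function name, the default values and the example TD50 are illustrative assumptions and are not taken from this work.

```python
def acceptable_intake_ng_per_day(td50_mg_per_kg_day: float,
                                 body_weight_kg: float = 50.0,
                                 risk_divisor: float = 50_000.0) -> float:
    """Linearly extrapolate a rodent TD50 (mg/kg/day) to a daily intake in ng/day.

    Assumption: TD50 corresponds to a 50% excess tumor incidence, so dividing
    by 50,000 gives the dose associated with a 1 in 100,000 excess risk.
    """
    if td50_mg_per_kg_day <= 0:
        raise ValueError("TD50 must be positive")
    ai_mg_per_day = td50_mg_per_kg_day / risk_divisor * body_weight_kg
    return ai_mg_per_day * 1e6  # convert mg to ng


# Hypothetical TD50 values, chosen only to show how the limit scales.
potent_td50 = 0.1                       # mg/kg/day
weak_td50 = potent_td50 * 10 ** 4       # four log units less potent

print(acceptable_intake_ng_per_day(potent_td50))
print(acceptable_intake_ng_per_day(weak_td50))
```

Because the extrapolation is linear, a potency range of 4 log units in TD50 translates directly into a 4 log unit range in the derived limit, which is why the choice of reference compound matters so much in read-across.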

Metabolic activation mechanisms for dialkyl N-nitrosamine mutagenicity
Given the significance of metabolic activation in understanding the overall SAR, current understanding is briefly summarized here. It has been reported [9,10] that several different competing metabolism mechanisms primarily drive the potency for dialkyl N-nitrosamines, with uninhibited metabolic activation via α-carbon hydroxylation producing the most potent carcinogens. The mechanism for the highest potency dialkyl N-nitrosamines (i.e., those with the lowest TD50 values available in the carcinogenicity database) is that of α-carbon hydroxylation via metabolic activation, as indicated in Fig. 1 [9][10][11][12]. It has been reported that multiple stages of this mechanism, including that marked as heterolysis, may be catalysed by the same P450 enzyme without relaxation of conformation, resulting in the loss of the R1-bearing side as a carboxylic acid as opposed to an aldehyde [13,14]; however, in other cases such as nitrosomorpholine, the reactive aldehyde intermediate is significant and is trapped intramolecularly [15].
For small dialkyl nitrosamines, the predominant enzyme responsible for the activation of the nitrosamine to intermediate I is reported to be Cytochrome P450 2E1 (Cyp 2E1) [12]; however, the active site of this specific isoform is particularly small, and a number of other P450 isoforms may become involved for larger nitrosamines. Examples of particular relevance are: Cyp 2A6, which is also relevant for small nitrosamines [11,12,14]; Cyp 2C9, for substrates with an anionic site and specific orientation requirements [16,17]; Cyp 2C19, for zwitterionic compounds [17]; Cyp 2D6, for substrates with a cationic site [17]; and Cyp 3A4, which is able to metabolise particularly large substrates [17].
Many factors can contribute to nitrosamine carcinogenicity potency, including: a) the relevant P450 enzymes summarised above and their levels in various target organs, which can vary between species and between individuals [11], b) compound solubility, size, and shape [18], c) potential phase II conjugation (such as carboxylic acid-containing compounds being substrates for e.g., glucuronidation directly), d) the stability of intermediates such as the carbocation and diazonium ion, e) DNA adduct profiles and the level of mutagenic adducts, and f) DNA repair mechanisms and their capacity levels.
There can also be competing metabolic activation mechanisms, such as β-carbon [9,19], γ-carbon [19], and ω-carbon hydroxylation [9,19], as well as mechanisms such as denitrosation [20] and trans-nitrosation [21], which may be either metabolically mediated (in the case of denitrosation, potentially via the same radical intermediate as α-hydroxylation [22]) or not. This investigation will focus on identifying the structural characteristics that affect dialkyl N-nitrosamine potency and how they may be used to determine the relative potency of these different nitrosamines.

Dataset curation
Data were extracted from historic rodent carcinogenicity and mutagenicity sources and curated according to the respective standard protocols by a number of separate data-gathering exercises: Lhasa Limited's Vitic (2020) [7,23], Instem's Leadscope Genetox and Carcinogenicity Databases (2020) [24] and the now-retired Carcinogenicity Potency Database (CPDB) [25,26], available as the Lhasa Carcinogenicity Database [LCDB, carcdb.lhasalimited.org]. Data extraction from CPDB/LCDB was performed in-house at Lhasa Limited from the source data, extracting all data for structures that match NN(III)=O and filtering via RDKit substructure patterns in KNIME (www.rdkit.org, as implemented in KNIME version 4.1.0, www.knime.org) to remove those structures that match the non-dialkyl compounds shown in Fig. 2. These compound classes, such as nitrosoureas, nitrosamides and similar compounds, are known to exert mutagenic and carcinogenic potential via different mechanisms and have therefore been excluded from this analysis. A similar approach was taken to the Vitic data, using the same substructural features and extracting all data from the 'Carcinogenicity' and 'Genetic Toxicology in Vitro' tables; data from the latter were then filtered to Ames test or synonyms only. Data extraction from the Leadscope Genetox and Carcinogenicity Databases was similarly performed in-house at Instem from the source data, extracting all data for structures that match NN(III)=O and filtering using Leadscope substructure search functionality. These data were then filtered to include compounds containing Ames test data and carcinogenicity calls. Data from these three sources were curated together manually, creating a combined dataset with consensus calls for carcinogenicity and Ames test data.
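The curation described above was performed manually, so the following is only a sketch of how calls from several sources might be combined before expert review. The record layout, the source labels used as example data and the rule that disagreements are flagged rather than resolved automatically are all assumptions made for illustration.

```python
from collections import defaultdict


def consensus_calls(records):
    """Combine per-source calls into one call per compound.

    records: iterable of (compound_id, source, call) tuples, where call is
    'positive' or 'negative'. Compounds whose sources disagree are flagged
    for manual review instead of being resolved automatically (assumed rule).
    """
    calls_by_compound = defaultdict(set)
    for compound_id, _source, call in records:
        calls_by_compound[compound_id].add(call)

    consensus = {}
    for compound_id, calls in calls_by_compound.items():
        consensus[compound_id] = next(iter(calls)) if len(calls) == 1 else "needs review"
    return consensus


# Hypothetical records; compound identifiers and calls are invented.
example_records = [
    ("cmpd-1", "Vitic", "positive"),
    ("cmpd-1", "Leadscope", "positive"),
    ("cmpd-2", "LCDB", "negative"),
    ("cmpd-3", "Vitic", "positive"),
    ("cmpd-3", "LCDB", "negative"),
]
print(consensus_calls(example_records))
```

Flagging conflicts keeps the expert in the loop, which is closer in spirit to a manual curation than a majority vote would be.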

Choice of structural features
Exploratory investigations were performed using a subjective analysis of TD50 [27,28] potency data from the LCDB previously described [7,29] using substructure patterns for features previously identified [9,10]. Several distinct substructural categories were identified (see Fig. 5) and two were chosen to investigate in more depth: 1) the degree of α-branching of the nitrosamine (Fig. 3), and 2) the presence or absence of electron-withdrawing groups (Fig. 4).

Data analysis
The structural categories described in Figs. 3-5 were manually encoded into substructure patterns using the SMARTS notation [31], and pattern-matching was performed against the dataset described using RDKit (www.rdkit.org, as implemented in KNIME version 4.1.0, www.knime.org). Data analysis and visualisation were performed in Python (www.python.org, version 3.7.6).
The two alkyl substituents of the molecule were considered both separately and in combination (i.e., with R1 in Figs. 3-5 either kept as "C except C=O, C=N" or explicitly defined, respectively), and thus an exponentially large number of potential feature combinations exist. Many of these, however, have no examples in the dataset and are thus unable to be considered.
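How quickly feature combinations outrun the available data can be seen in a small sketch. It enumerates unordered pairs of per-side classes and reports which pairs have no observations; the four class labels and the observed pairs are illustrative assumptions, not the actual patterns or counts used in this analysis.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Illustrative per-side classes (assumed labels, loosely following the figure captions).
SIDE_CLASSES = ["CH2", "iPr", "tBu", "aryl"]


def pair_key(side_a, side_b):
    """Order-independent key, since R1 and R2 are interchangeable."""
    return tuple(sorted((side_a, side_b)))


def combination_coverage(observed_pairs):
    """Return (all possible pairs, counts per pair, pairs with no examples)."""
    possible = [pair_key(a, b) for a, b in combinations_with_replacement(SIDE_CLASSES, 2)]
    counts = Counter(pair_key(a, b) for a, b in observed_pairs)
    empty = [pair for pair in possible if counts[pair] == 0]
    return possible, counts, empty


# Hypothetical observations: one (R1 class, R2 class) tuple per compound.
observed = [("CH2", "CH2"), ("CH2", "iPr"), ("iPr", "CH2"), ("tBu", "CH2")]
possible, counts, empty = combination_coverage(observed)
print(len(possible), "possible pairs;", len(empty), "without examples")
```

With only four classes per side there are already ten unordered pairs; allowing several independent features on each side multiplies the count further, so unpopulated combinations quickly dominate.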

Results
The curation of carcinogenicity and Ames study data described resulted in a consensus dataset of 362 dialkyl N-nitrosamines. Of these, 208 have carcinogenicity data (including TD50 [27,28] values for 74 of these) and 281 have Ames study data. Analysis of the concordance between these endpoints has been performed elsewhere [7,29] (Trejo-Martin et al., manuscript in preparation), and is reported to be excellent. The lack of a TD50 for many of the carcinogenicity records has two principal causes: for 120 compounds, a study exists in the Lhasa and/or Instem dataset that was not incorporated in the CPDB; and for 14 compounds, at least one record exists in the CPDB but no TD50 could be determined by Gold et al. (typically due to a negative result in the study).

Categorizing nitrosamine potency by structural features
The analysis focused on extracting and developing chemistry-based knowledge by uncovering trends in the chemical feature-activity space that are represented in the database. The objective is ultimately to encode the expert, intellectual knowledge into alerts for identification of carcinogenicity potency categories for compounds (based on rodent TD50 values). As it is not the intent to develop statistical (Q)SAR models using these features, the number of observations is not as important as the relevance of chemical features to known organic chemistry reactivity and functional group properties.
A closer examination of the many structural features that can affect dialkyl N-nitrosamines is presented in Fig. 5. This figure shows a summary of all the structural features investigated thus far. Many potential features had few observations, and the presence of multiple substituents per compound can sometimes complicate the analysis when carbon hydroxylation can potentially occur on either substituent. Since the relative amount of 2-year rodent carcinogenicity bioassay data is low and there is little expectation of new data being generated, the potency trends established from analysing the carcinogenicity data were corroborated by comparing the Ames mutagenicity data for prevalence of positive and negative results with carcinogenicity potency trends. This comparison is supported by the high sensitivity of Ames study results in predicting rodent carcinogenicity [7,29] and the fact that nitrosamine mutagenicity is observed to occur via alkylation at specific DNA base sites (e.g., O6-guanine [32]) in a mutagenic mechanism [9,10].
Based upon these considerations, the current investigation chose to initially analyse and report the steric effects of α-carbon substitution and electronic effects of β-carbon electron-withdrawing groups on nitrosamine carcinogenicity potency and mutagenicity prevalence.
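The sensitivity referred to above, when Ames results are used to corroborate carcinogenicity trends, is the fraction of rodent carcinogens that are also Ames positive. A minimal sketch of that calculation follows; the function name and the example records are hypothetical and do not reproduce the dataset described here.

```python
def ames_vs_carcinogenicity(records):
    """Summarise how well Ames calls track carcinogenicity calls.

    records: iterable of (ames_positive, carcinogenic) boolean pairs, one per
    compound with data for both endpoints.
    """
    tp = fn = fp = tn = 0
    for ames_positive, carcinogenic in records:
        if carcinogenic and ames_positive:
            tp += 1
        elif carcinogenic:
            fn += 1
        elif ames_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "counts": {"tp": tp, "fn": fn, "fp": fp, "tn": tn}}


# Hypothetical counts: 9 carcinogens Ames positive, 1 Ames negative,
# 1 non-carcinogen Ames positive, 2 Ames negative.
hypothetical = ([(True, True)] * 9 + [(False, True)]
                + [(True, False)] + [(False, False)] * 2)
summary = ames_vs_carcinogenicity(hypothetical)
print(summary)
```

A high sensitivity is what justifies using the larger Ames dataset as a proxy: few carcinogens would be missed, even if some Ames positives turn out not to be carcinogenic.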

The effects of degree of α-carbon substitution on nitrosamine carcinogenicity potency and mutagenicity prevalence
The first category investigated is the degree of α-branching of the nitrosamine, which has historically been reported [9,10] to have a significant impact on potency; indeed, dialkyl nitrosamines lacking any α-carbon hydrogens are indicated by the European Medicines Agency (EMA) to be of lower concern [4]. Fig. 3 gives the structural definitions used to identify these classes. While much of the literature on nitrosamines has concentrated on experiments measuring NDMA and NDEA potency, Fig. 6 shows that these small nitrosamines constitute a very potent but limited nitrosamine set with a tight TD50 value range. Larger nitrosamines, such as those for drug-like compounds, have TD50 ranges spanning 4 orders of magnitude and containing examples of compounds with much lower potency and significant potency differences between structural classes. When comparing the "Only Et/Me" plot with the "Has Et/Me" plot it can be seen that increasing the size of a nitrosamine substantially increases the range of possible TD50 potency values. The reasons for this discrepancy may have a number of origins.
The "Has acyclic a-CH2 (not Et/Me)" and "Has cyclic a-CH2" plots illustrate that potency generally decreases for nitrosamines with increasing chain length and ring size (though there are some notable exceptions to this trend). Lastly, the "No a-CH2" plot is of particular interest. There are two compounds in this category with TD50 values. The first, 2,6-dimethyl-N,N'-dinitrosopiperazine, contains both a substituted and an unsubstituted nitrosamine, and thus matches the substructure pattern for having two isopropyl groups. However, it also has a reactive, unsubstituted nitrosamine that is the probable source of mutagenesis and carcinogenesis; hence this compound is worthy of inclusion in the cohort-of-concern and matches both the "No a-CH2" pattern (at one nitrosamine substitution site) and the "Cyclic a-CH2" pattern (at the other). The second is nitrosodiphenylamine, which is the weakest carcinogen in the nitrosamine dataset for which a TD50 was calculated (167 mg kg−1 day−1). When this is combined with the observations from Fig. 7 below, potency and also prevalence are significantly reduced. Therefore, it can be argued that nitrosamine groups with zero or one α-carbon hydrogen lack carcinogenic potency to such an extent that, even when positive, their TD50 values would no longer fall within the level for cohort-of-concern described in the ICH M7 guideline. This assumes, of course, that they are the only alerting substructural feature present.
Fig. 7 continues this analysis by more precisely illustrating the effects of α-carbon substitution and steric bulk on nitrosamine carcinogenicity and mutagenicity prevalence. Functionally, the "no a-CH2" category of the plot in Fig. 6 contains compounds with two of any one of the following as their substituents: isopropyl, tert-butyl and/or aryl groups; note there are no data for compounds with mixed arrangements such as isopropyl and aryl. These effects arising from the substituents have previously been reported [9,10], but are re-examined and confirmed here in light of additional data and decades of scientific advances. The top row histograms include data from binary (positive/negative) carcinogenicity data in addition to compounds having TD50 values. The "Two CH2 groups" histogram illustrates the strong prevalence of positive carcinogenicity for compounds having 2 or 3 hydrogens on at least one of the α-carbon positions. The "CH2 with iPr" histogram illustrates that substitution of an isopropyl group (or longer) at one of the α-carbon positions reduces the prevalence of positive carcinogenicity compounds, while "Two iPr groups" illustrates that isopropyl (or longer) substitutions at both positions reduce positive prevalence even more so. There is one exception to this trend without clear reason. The other compounds with positive results in this category are explainable. Firstly, 2,6-dimethyl-N,N'-dinitrosopiperazine (Fig. 8), as discussed above. Secondly, methyl-1-methyl-6-nitro-2-nitroso-1,3,4,9-tetrahydropyrido(3,4-b)indole-3-carboxylate (Fig. 8) is carcinogenicity positive but contains a nitro group on a polycyclic aromatic, and these are known structural alerts for genotoxicity independent of the hindered nitrosamine.
When a tert-butyl group substitution is present, the "CH2 with tBu" histogram illustrates that a lack of α-carbon hydrogens on just one side of the nitrosamine can negate genotoxicity. This occurs despite the presence of methyl/ethyl groups on the other side, which can be assumed to be metabolically labile, and may therefore be due to a lack of ability for the cation to alkylate nucleic acids [33]; this shows the importance of the three-stage SAR consideration discussed in section 1. If two tert-butyl groups are present there are of course no α-hydrogens, and compounds with this feature are likewise carcinogenicity negative. Lastly, the "CH2 with aryl" histogram illustrates that the presence of one aryl substitution may also reduce carcinogenicity prevalence. The specific effect likely depends not only on the presence of an aryl ring (versus a non-aryl ring, which with the open patterns used here appears as 'isopropyl'-like) but also on substitutions at the ortho, meta, and para positions on that ring (such as the differing results between 2-, 3- and 4-(N-nitroso-N-methylamino)pyridine; Fig. 8, center) and on the nature of the substituting groups. For the nitrosomethylaminopyridine series, the nature of the pyridinyl cation or diazopyridine may explain the different toxicity profile. An organic chemist's understanding would normally be that the 2- and 4-derivatives would be comparable and the 3-derivative different, due to the delocalisation patterns within the aromatic ring; however, this is not the case. Rather, the 2-derivative is different, which may be due to the proximity of the pyridinyl nitrogen and its lone pair (both as a potential base and as a lack of steric hindrance when compared to a CH group) to the diazonium site [34]. These effects in this and similar systems will be the focus of future investigations. While the aryl group itself provides no α-carbon hydrogens for metabolic activation via the standard mechanism, it may well be able to alkylate DNA. The difference in carcinogenicity effects between the similarly hydrogen-free tert-butyl group (which seems to eliminate genotoxic potential) and the aryl group also bears further investigation.
The bottom row of graphs in Fig. 7 provides an analysis of the same patterns of substitution on a larger set of Ames study data. These histograms compare favorably with the carcinogenicity histograms, reinforcing the same resulting trends. Again, this analysis approach is focused on extracting and developing chemistry-based knowledge by uncovering trends in the chemical feature-activity space that are represented in the database. Even though the number of observations is limited, finding associations of these chemical features with known organic chemistry reactivity and functional group properties provides a theoretical justification for these effects and allows us to use these chemical properties in the future to predict the carcinogenicity of new compounds having such groups.
In summary, both the extent of α-carbon substitution and steric bulk can significantly reduce or even potentially eliminate carcinogenicity.
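The substitution categories discussed in this section can be expressed as a simple rule on the number of α-carbon hydrogens on each side of the nitrosamine. The sketch below is an illustration only: the class and function names are hypothetical, the category labels follow the histogram titles quoted above, and the actual analysis used SMARTS substructure patterns rather than hydrogen counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    """One side of the nitrosamine, described by its alpha-carbon."""
    alpha_hydrogens: int      # 3 for methyl, 2 for ethyl, 1 for isopropyl, 0 for tert-butyl/aryl
    aromatic: bool = False    # True when the alpha atom is part of an aryl ring

    def __post_init__(self):
        if not 0 <= self.alpha_hydrogens <= 3:
            raise ValueError("alpha_hydrogens must be between 0 and 3")


def side_label(sub: Substituent) -> str:
    """Map a substituent onto the per-side classes used in the histogram titles."""
    if sub.alpha_hydrogens >= 2:
        return "CH2"
    if sub.alpha_hydrogens == 1:
        return "iPr"
    return "aryl" if sub.aromatic else "tBu"


def substitution_category(r1: Substituent, r2: Substituent) -> str:
    a, b = side_label(r1), side_label(r2)
    if a == b:
        return f"Two {a} groups"
    if "CH2" in (a, b):
        other = b if a == "CH2" else a
        return f"CH2 with {other}"
    return " with ".join(sorted((a, b)))  # mixed hindered pairs, e.g. iPr and aryl


methyl, isopropyl, tert_butyl = Substituent(3), Substituent(1), Substituent(0)
print(substitution_category(methyl, methyl))        # NDMA-like
print(substitution_category(methyl, isopropyl))     # NMIPA-like
print(substitution_category(isopropyl, isopropyl))  # NDIPA-like
print(substitution_category(methyl, tert_butyl))    # NMTBA-like
```

Counting hydrogens treats any singly substituted α-carbon as "isopropyl-like", which mirrors the open patterns described in the text but ignores ring membership and chain length.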

The effects of β-carbon electron withdrawing groups on nitrosamine potency and mutagenicity prevalence
The second category investigated is the presence or absence of β-electron-withdrawing groups, since several compounds with these features have been reported to be of lower potency or non-carcinogenic. Early investigations [29] showed that this rule is not universal, however: a number of compounds with the 2-oxo-propyl functionality are still potent. Such potency may be due either to the acidity of the enol protons in this case, or to a reduction in electron-withdrawing potential compared to other groups (such as trifluoromethyl). As a result, the electron-withdrawing groups were divided into strong and weak categories (after Remya and Suresh [30]), as shown in Fig. 4. These categories represent a commonly-observed subset of the possible electron-withdrawing groups, using an approximate energy difference cut-off (ΔVc) of 10 kcal mol−1. This is only an approximate cut-off value due to the need to group the specific structures modelled by Remya and Suresh according to the substructure patterns [30]. However, it covers all electron-withdrawing groups observed in the dataset. One obvious exception from the carbonyl/carboxyl category that is represented in the dataset is the carboxylic acids. These are expected to be deprotonated in vivo (either in isolation or forming a zwitterion with the amine N of the nitrosamine) and thus not electron-withdrawing. An example of this is N-nitrosoproline, whose negative charge at physiological pH may prevent it from entering relevant cells and thus allow it to evade cytochrome P450 metabolism [35,36].
The carcinogenic potencies of nitrosamines containing strong, weak, and no β-carbon electron-withdrawing groups are illustrated in Fig. 9. While there were limited examples of strong withdrawing groups in the carcinogenicity potency data set, their presence was associated with a reduction in carcinogenic potency, while weak withdrawing groups showed less effect. Fig. 10 more clearly illustrates the effect of β-carbon electron-withdrawing groups on nitrosamine mutagenicity prevalence, as it considers binary carcinogenicity data in addition to only positive nitrosamines having TD50 values. In this figure we see that a single weak electron-withdrawing group appears to have a limited effect on carcinogenicity as compared to the general trend for all compounds. The presence of two weak electron-withdrawing groups (as in nitroso-bis(2-oxopropyl)amine, Fig. 11) appears to have little to no effect on mutagenicity, though possibly some minor effect in reducing carcinogenicity, albeit there is little data present. The effect of a single strong electron-withdrawing group in negating carcinogenic potential is more clearly seen. The presence of two strong electron-withdrawing groups is likewise seen to negate carcinogenic potential (compare the tri- and hexa-fluoro compounds in Fig. 11). Overall, the presence of β-carbon electron-withdrawing groups is associated with a reduction in carcinogenicity prevalence and potency, with the greatest effect resulting from strong and multiple groups.
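The strong/weak grouping used in this section can be sketched as a threshold on the ΔVc descriptor followed by a count of groups per compound. This is an illustration under stated assumptions: treating the magnitude of ΔVc at or above the 10 kcal mol−1 cut-off as "strong" is an assumed convention, the function names are hypothetical, and the ΔVc values in the example are invented rather than taken from Remya and Suresh.

```python
CUTOFF_KCAL_PER_MOL = 10.0  # approximate cut-off quoted in the text


def ewg_strength(delta_vc: float) -> str:
    """Label one electron-withdrawing group (assumed: larger |dVc| means stronger)."""
    return "strong" if abs(delta_vc) >= CUTOFF_KCAL_PER_MOL else "weak"


def ewg_category(beta_group_delta_vcs) -> str:
    """Summarise the beta-EWGs on a nitrosamine, e.g. '1 strong' or '2 weak'."""
    strengths = [ewg_strength(value) for value in beta_group_delta_vcs]
    if not strengths:
        return "no EWG"
    parts = []
    for label in ("strong", "weak"):
        count = strengths.count(label)
        if count:
            parts.append(f"{count} {label}")
    return " + ".join(parts)


# Invented dVc values (kcal/mol), used only to exercise each category.
print(ewg_category([]))            # no beta-EWG present
print(ewg_category([4.0]))         # one weak group
print(ewg_category([4.0, 4.0]))    # two weak groups
print(ewg_category([15.0]))        # one strong group
print(ewg_category([15.0, 15.0]))  # two strong groups
```

In practice the grouping was applied through substructure patterns rather than per-compound descriptor values, so a lookup from matched pattern to strength class would replace the numeric threshold.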

NDMA and NDEA activity and their relevance to other N-nitrosamine potencies
NDMA and NDEA have been extensively studied both experimentally and mechanistically [27][37][38][39]. The potencies of these compounds have been used as references for establishing the regulatory limits for many nitrosamines [3][4][5][6]. However, these are very small, volatile compounds containing no functional groups beyond the nitroso moiety. Consequently, unlike larger nitrosamines such as those in drug impurities, they are characterized by specific physico-chemical properties and ADME (absorption, distribution, metabolism, and elimination) profiles, and they would not exhibit steric hindrance or electronic effects limiting their reactivity and subsequent carcinogenicity potency.
One path for carcinogenicity of many nitrosamines is believed to be via mutagenicity as a result of DNA adduct formation, although a quantitative relationship has not been determined, nor has the mode of action yet been proven [19,40]. Consequently, DNA adduct formation is used as a biomarker but is not a regulatory endpoint. The principal reaction mechanism for NDMA and NDEA is hydroxylation of the α-carbon, which is mediated by CYP450 2E1 [9]. After metabolic activation, intermediate I (Fig. 1) is formed, potentially stabilised by an intramolecular hydrogen bond (rotation around the nitrosamine group is restricted). Thus the molecule may or may not be in the correct configuration to form this intermediate [41]. This is followed by heterolysis and subsequent diazonium ion formation before ultimately alkylating DNA [19,41].
However, there are several characteristics of this metabolic activation mechanism that are key in determining the extent of DNA adduct formation. These vary significantly across different subclasses of nitrosamines consequently affecting the relative potency of subclasses which may make the use of default potencies (e.g. rodent TD 50 values) of NDMA and NDEA not always the most appropriate. These characteristics include: 1. This mechanism depends on the availability of CYP enzymes for the extent of reactivity and thus potency. The 2E1 enzyme levels vary by species (rat vs hamster vs human) as well as organ [42][43][44]. Variation by species leads to different experimental results for both in vivo and in vitro studies (for example the use of hamster vs rat S9 in Ames salmonella tests). Those species with higher 2E1 liver enzymes may be a more sensitive tester species. This also complicates assessment of human relevancy due to relative enzyme levels. For large nitrosamines, enzymes other than 2E1 (such as 2C9, 2A6 and ultimately 3A4) become responsible for hydroxylation since the active site of 2E1 is proportionally very small and larger nitrosamines do not fit [16,17]. Consequently, the rate and SAR of hydroxylation of these compounds can vary significantly from that for NDMA and NDEA. Different relevant enzyme levels in different organs affects the susceptibility of organs to tumor formation. Whilst relevant liver enzyme levels are the highest amongst target organs, metabolism outside of the liver (such as bladder, stomach, and esophagus) may also occur and subsequently results in tumor formation [45]. 2. This mechanism depends on the availability of the hydrogen on the α-carbon for metabolic activation to occur. Substitution at the α-carbon on one or both sides of the nitroso group affects reactivity. 
If there are no α-carbon hydrogens available this metabolic mechanism is completely inhibited (competing mutagenicity mechanisms may still theoretically occur, but in practice the blocking of this mechanism removes the need to consider the compound as part of the cohort-of-concern [4]). As fewer α-carbon hydrogens are present (with 6 being the maximum in NDMA) more inhibition of the mechanism occurs. 3. If the substitution patterns are different on each side of the nitroso group, two different DNA adducts may occur depending on which α-carbon hydrogen is predominantly metabolized. The proportion of different resulting adducts depends on the relative ease of hydroxylation of each side group. NDMA and NDEA have symmetrical substituents and thus produce only a single (small) adduct for alkylation (methyl or ethyl respectively) with no competition. In comparison, nitrosomethylethylamine (NMEA) can cause both DNA methylation and ethylation, and it has been observed that blocking the ethyl site, as in nitrosomethylneopentylamine (NMNA), results in only methylation with tumors specific to a particular organ (the esophagus) [46]. Additional organotrophic carcinogenic effects have been reported between symmetrical and asymmetrical nitrosamines [18].
4. Substitution affects steric access to the α-carbon. As steric bulk, such as isopropyl groups are added as part of the nitroso substituents, α-carbon hydroxylation can become partially or totally inhibited due to steric hinderance. Consequently, not all nitrosamines will have the same unfettered ability to undergo α-carbon hydroxylation as NDMA or NDEA. 5. The electrostatic effects of different substituents will alter reactivity. Differences in electrophilicity due to the presence of, for example, electron-withdrawing groups, on nitroso substituents can make the energetics of the α-hydroxylation mechanism less favorable. NDMA and NDEA display no such effects.  6. Substitution affects the potential formation and stability of the hydrogen bonding intermediate I. The formation of this intermediate depends on the correct structural orientation for hydrogen-bonding to occur as the hydroxyl hydrogen is hydrogen-bonded to the nitroso oxygen. Large and bulky substituents will affect energetics which are favorable to intermediate formation in NDMA and NDEA. The presence of additional functionality on nitroso substituents (such as β or γ hydroxyl groups) can also interfere not only with intermediate formation but may also lead to the formation of hydrogen-bonded rings between the two sides of the nitrosamine which can be energetically more favorable than the intermediate required for diazonium ion formation [S. Yu, et. al., manuscript in preparation].
7. Other reaction mechanisms may compete with α-carbon hydroxylation [9]. As NDMA and NDEA are relatively featureless, there is little competition from other known reaction mechanisms. For other N-nitrosamines, however, several different reaction mechanisms can compete with α-carbon hydroxylation, including hydroxylation at other (β, γ, and ω) carbon sites [9,19] and denitrosation [20]. The presence of a different dominant reaction mechanism reduces the potency of a particular N-nitrosamine subclass relative to NDMA and NDEA.
8. The presence of hydroxyl groups on N-nitrosamines substantially affects their stability and solubility [47]. α-Hydroxylated dialkyl N-nitrosamines have a half-life of only up to 10 s under physiologic conditions before they spontaneously decompose to an aldehyde or (ultimately) a diazonium hydroxide, which reacts predominantly with water and is then cleared from the system (but also reacts with DNA) [48]. Consequently, the site in the body where α-carbon hydroxylation occurs can influence the organ tumor site. In theory, hydroxylated N-nitrosamines may be less potent carcinogens due to their solubility. However, when considering hydroxylation at these additional positions, other enzymatic reaction mechanisms such as oxidation to aldehydes and carboxylic acids via alcohol dehydrogenase may result in their conversion to potent direct-acting carcinogens [49]. Hence the presence of alcohol, aldehyde, and carboxylic acid functionality complicates potency determination (with respect to the TTC) via several alternative pathways. These effects are not seen in NDMA or NDEA.
9. The effectiveness of in vivo detoxification mechanisms depends on the nitroso substituents. Detoxification pathways for drugs occur via phase II metabolism in the liver, where conjugation produces a stable, soluble compound that is more easily cleared from the body [50,51]. Several types of conjugates relevant to nitrosamines can be formed, including glucuronides, sulfonates, and glutathione conjugates [19]. The propensity for conjugate formation (and the type of conjugate formed) depends on the chemical functionality present in the nitroso substituents. Denitrosation, which results in formation of the secondary amine, is yet another "detoxification" mechanism, resulting in carcinogens that are less potent than those from α-carbon hydroxylation [20]. Detoxification through conjugate formation is not observed with NDMA or NDEA, though denitrosation is observed [52].
10. Labile functional groups such as thiol or ether moieties affect reactivity and potency. Such groups in nitroso substituents change the reactivity of an N-nitrosamine, resulting in conjugate formation or direct cleavage of the compound, for example at the thiol or ether functional group. Long-chain substituents may be subject to pre-metabolic oxidative clipping of the carbon chain, resulting in more potent carcinogenicity than expected [9,53,54]. The featureless NDMA and NDEA structures do not possess these characteristics.
11. DNA repair mechanisms are different for adducts larger than methyl and ethyl and involve more than the repair of DNA adducts via alkyltransferases [55][56][57]. The focus on NDMA and NDEA DNA mutations considers only the DNA repair mechanisms for methylation and ethylation (specifically methyl and ethyl transferases), while repair of bulky adducts involves other alkyltransferases as well as base and nucleotide excision repair mechanisms.
12. NDEA and NDMA are low-molecular-weight N-nitrosamines. When a read-across exercise is performed with these as analogs and a weight-based AI limit (as opposed to a molar one) is extrapolated to a larger compound, the derived limit permits even fewer molecules of the larger compound. Hence a molarity-based limit may provide a more relevant comparison.
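The difference between a weight-based and a molar limit can be made concrete with a short calculation. The sketch below is illustrative only: the function names, the 100 ng/day reference limit, and the 370.4 g/mol target compound are assumptions chosen for the example and are not regulatory values. The NDMA molecular weight (74.08 g/mol) follows from its formula, C2H6N2O.

```python
# Illustrative sketch: weight-based versus molar-equivalent acceptable intake (AI).
# All numeric inputs except the NDMA molecular weight are hypothetical.

def nmol_per_day(ai_ng_per_day, mol_weight):
    """Convert a weight-based limit (ng/day) to nmol/day; ng / (g/mol) = nmol."""
    return ai_ng_per_day / mol_weight

def molar_equivalent_limit(ai_ng_per_day, mw_reference, mw_target):
    """Weight-based limit (ng/day) that allows the target compound the same
    number of moles per day as the reference compound."""
    return ai_ng_per_day * mw_target / mw_reference

MW_NDMA = 74.08        # g/mol, from the formula C2H6N2O
MW_TARGET = 370.4      # g/mol, hypothetical larger N-nitrosamine
REFERENCE_AI = 100.0   # ng/day, hypothetical limit used only for illustration

# Moles permitted when the same weight-based limit is applied to both compounds.
reference_nmol = nmol_per_day(REFERENCE_AI, MW_NDMA)
weight_based_nmol = nmol_per_day(REFERENCE_AI, MW_TARGET)

# Limit for the larger compound if the comparison is made on a molar basis.
molar_limit = molar_equivalent_limit(REFERENCE_AI, MW_NDMA, MW_TARGET)

print(f"reference: {reference_nmol:.3f} nmol/day")
print(f"target at same weight limit: {weight_based_nmol:.3f} nmol/day")
print(f"molar-equivalent limit for target: {molar_limit:.1f} ng/day")
```

With these assumed inputs the larger compound is five times heavier than NDMA, so an unchanged weight-based limit permits one fifth as many molecules, while the molar-equivalent limit is five times the reference value.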
As demonstrated here, the mechanism responsible for NDMA and NDEA potency may not translate well when assessing other, more complicated N-nitrosamines. Because these small nitrosamines have no substituents beyond methyl and ethyl groups, they exhibit only a very limited subset of the potency-affecting parameters that may be present in larger and more complex N-nitrosamines. Larger and more varied substituents not only affect the efficiency of the α-carbon hydroxylation mechanism but can also result in different detoxification mechanisms and ADME properties (such as solubility), affecting exposure levels and times, varying the target organs, and involving different metabolic enzymes. Different competing mechanisms for mutagenicity can occur due to differences in structure, and small structural variations can lead to significant differences in potency. This makes the risk assessment of N-nitrosamines both complex and dependent on many variables, and significantly different from assessments of NDMA and NDEA.

Applying mechanistic analogs to support read-across of N-nitrosamines
This investigation has shown that structural features of the test article can be assessed to determine the nature of its metabolic activation (e.g., α-carbon hydroxylation) and whether any chemical attributes may reduce or eliminate its carcinogenicity. These chemical attributes may be structural feature patterns encoded as structural alerts, which can help to perform this assessment automatically. Once the dominant reaction mechanism of the test article has been identified along with any mitigation, a mechanistically appropriate set of reference compounds can be selected (i.e., based on mechanistic similarity) from a database of experimental data for comparison in a read-across exercise. Although the mechanistic analogs may not be as similar as some other structures (based on global structural similarity), their relevance to the reaction mechanism of the test article makes them a better choice for a read-across exercise, in agreement with the Read-Across Assessment Framework (RAAF) [58], where structural similarity considerations are combined with elements of mechanistic and metabolic similarity.
In some cases, the presence of multiple structural features on the test article could affect the nature and rate of metabolic activation. In this case the feature that will most significantly change the potency should be selected, and analogs with that feature used for comparison. Fig. 12 shows a potential example of this read-across exercise. N-nitrosohydroxyl proline contains two different features that affect α-carbon hydroxylation. A carboxylic acid group is substituted at the 2-position on the ring, partially blocking one α-carbon. There is also a hydroxyl group substituted at the 4-position on the ring. While the presence of the alcohol may reduce the potency of the test article with respect to nitrosopyrrolidine (TD50 of 0.679 mg kg⁻¹ day⁻¹) [26,27], the presence of the carboxylic acid group partially blocking metabolism at the α-carbon site is much more significant.
This observation is confirmed when looking at two analogs, both quite structurally similar to the test article and each containing one of these structural features. Analog 1 (N-nitroso-3-hydroxypyrrolidine) has a hydroxyl group in the same position as the test article, while analog 2 (N-nitroso-L-proline) has a carboxylic acid group in the same position.
Whereas analog 1 has a relatively high TD50 value of 7.65 mg kg⁻¹ day⁻¹, analog 2 is non-carcinogenic altogether, displaying the more significant effect. In this example, analog 2 would make a better choice for assessing the potency of N-nitrosohydroxyl proline in a read-across exercise, since a typical assessment of a nitrosamine must start with the assumption of extreme potency (as defined in the cohort of concern) and then add in the effects of deactivating features. It is important to note that in this case the presence of the acid group on the test article and analog 2 may have a significant impact beyond any steric or electronic effects. This is because the acid group substantially affects the ADME of these molecules, potentially in the following ways. Firstly, the acid group, negatively charged at physiological pH, increases the polarity (and consequently the solubility and potential for glucuronidation) and decreases the potential for cell membrane penetration, reducing both the need and the potential for metabolism and thus the risk of metabolic activation. Secondly, should metabolic activation of acid-containing compounds occur (not all are negative [59], though many are [29]), it is likely to be via CYP2C9, which has the specific steric requirements [16,17] mentioned in section 1.1 that may prevent α-hydroxylation. These factors further reinforce the choice of analog 2 as the most suitable and the conclusion that potency is low or negligible.
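The selection logic in this example (identify the feature with the largest expected effect on potency, then keep only analogs carrying that feature) can be sketched as follows. This is a minimal illustration, not the procedure used by any particular software: the feature labels, the `FEATURE_RANK` ordering, and the function names are hypothetical, while the compound names and the analog 1 TD50 value are taken from the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Analog:
    name: str
    features: FrozenSet[str]
    td50: Optional[float]  # mg/kg/day; None means reported as non-carcinogenic

# Hypothetical ranking: a larger number means a larger expected effect on potency.
# The ordering mirrors the reasoning in the text (acid group outweighs hydroxyl).
FEATURE_RANK = {"alpha_carboxylic_acid": 2, "ring_hydroxyl": 1}

def dominant_feature(features: FrozenSet[str]) -> str:
    """Return the feature expected to change potency the most."""
    return max(features, key=lambda f: FEATURE_RANK.get(f, 0))

def select_analogs(test_features: FrozenSet[str], candidates: List[Analog]) -> List[Analog]:
    """Keep only the candidates that carry the dominant feature of the test article."""
    key = dominant_feature(test_features)
    return [a for a in candidates if key in a.features]

# Test article: N-nitrosohydroxyl proline (acid at the 2-position, hydroxyl at the 4-position).
test_features = frozenset({"alpha_carboxylic_acid", "ring_hydroxyl"})

candidates = [
    Analog("N-nitroso-3-hydroxypyrrolidine", frozenset({"ring_hydroxyl"}), 7.65),
    Analog("N-nitroso-L-proline", frozenset({"alpha_carboxylic_acid"}), None),
]

chosen = select_analogs(test_features, candidates)
print([a.name for a in chosen])
```

Under the assumed ranking, only N-nitroso-L-proline is retained, matching the choice of analog 2 described above. In practice the ranking itself would have to come from the mechanistic assessment rather than from a fixed table.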
Another potential use of SAR is in the creation of categorical mechanistic alerts to support a read-across exercise. Here a mechanistic alert represents a set of analogs with a similar potency and mechanism of action. Alerts matching the test article are first reviewed for mechanistic relevancy. A TD50 value representing the most appropriate alert may then be selected. In Fig. 13 we see an example where two alerts are defined representing two different sets of compounds with two different potency ranges. Here alert 2, with compounds of low potency (e.g., TD50 > 15 mg kg⁻¹ day⁻¹), is selected as the more appropriate alert over alert 1 (medium potency, e.g., TD50 1.5-15 mg kg⁻¹ day⁻¹). As an alternative, the structures in the matching alert group could be inspected for the potency value of the most similar analog in that group, which would result in the use of nitrosoproline as an analog, similar to the first approach.
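The example potency ranges quoted for the two alerts can be written as a simple banding function. The sketch below uses the illustrative thresholds from the text (low potency above 15 mg kg⁻¹ day⁻¹, medium potency from 1.5 to 15 mg kg⁻¹ day⁻¹). The "high" band below 1.5, the treatment of the boundary values, and the function name are assumptions made for this example.

```python
def potency_category(td50_mg_per_kg_day: float) -> str:
    """Assign an illustrative potency band from a rodent TD50 (mg/kg/day).

    A lower TD50 means higher potency. Thresholds follow the example ranges
    in the text; the 'high' band and boundary handling are assumptions.
    """
    if td50_mg_per_kg_day <= 0:
        raise ValueError("TD50 must be a positive number")
    if td50_mg_per_kg_day > 15.0:
        return "low"
    if td50_mg_per_kg_day >= 1.5:
        return "medium"
    return "high"

# TD50 values quoted earlier in this section.
print(potency_category(0.679))  # nitrosopyrrolidine
print(potency_category(7.65))   # N-nitroso-3-hydroxypyrrolidine (analog 1)
```

Compounds reported as non-carcinogenic have no TD50 value and would need to be handled separately from these bands.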

Conclusions
This investigation into the structure-activity relationships responsible for differences in the relative carcinogenic potency (as defined using rodent TD50 values) of dialkyl N-nitrosamines examined in detail the structural effects on the main driver of high-potency nitrosamines, namely α-carbon hydroxylation via metabolic activation. While there are many different features affecting nitrosamine carcinogenic potency, this investigation examined some of the electronic and steric effects responsible for lower carcinogenic potency and even non-carcinogenic nitrosamines.
Both electronic and steric effects can partially or totally inhibit a metabolism mechanism, potentially resulting in an alternative dominant metabolism mechanism and/or a significant reduction or elimination of carcinogenic potency. We have shown (using both Ames study data and rodent carcinogenicity data) that increasing the number of α-carbon substitutions decreases the potency and positivity of nitrosamines, with potency decreasing as the number of α-carbon hydrogens is reduced by additional substitutions. Substitutions on both sides of the nitrosamine's amine nitrogen continue this trend. Complete substitution (i.e., removal of all α-carbon hydrogens) is not necessary to completely negate carcinogenicity, nor is it necessary to have substitution on both sides of the amine.
Steric hindrance from bulky chemical groups attached directly to the amine nitrogen or to the α-carbon can reduce or eliminate carcinogenicity. Again, it is not necessary to have steric hindrance on both sides of the amine to inhibit metabolic activation. While increasing chain length, ring size, and molecular weight can decrease carcinogenic potency compared to very small (methyl- and ethyl-substituted) nitrosamines, the nature of the attached functional groups plays a more important role in the relative potency of larger nitrosamines.
Compounds containing strong electron-withdrawing groups at the β-carbon exhibit a large decrease in carcinogenicity and an increase in the prevalence of compounds that are negative for mutagenicity. Two strong β electron-withdrawing groups exhibit an even more pronounced effect, while weak β electron-withdrawing groups exhibit little effect on potency (though two weak groups show some limited effect).
This investigation identified several structural features that affect the potency of dialkyl N-nitrosamines by reducing or eliminating their metabolism. Consequently, when analogs are selected for a read-across exercise to determine nitrosamine potency, their selection must initially be based upon assessment of the mechanistic domain and thus on the mechanistic similarity between the test article and potential analogs. Additionally, structural features affecting the metabolism of the test article must then be considered, either as explicit examples on the analog or as a functional group with the same characteristics for modifying the metabolism of the test article, including those investigated here: the presence of α-carbon substitution and strong β-electron-withdrawing groups. Performing a read-across exercise for a test article using NDMA or NDEA as analogs may therefore not always be appropriate, especially for test articles of significantly different size and shape, or those with significant chemical functional groups.
This investigation is ongoing, and the current results represent only a few of the categories in the nitrosamine SAR that affect carcinogenic potency. Future work will continue the investigation to elucidate the effects of additional categories on carcinogenic potency.

Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Kevin P. Cross reports financial support was provided by National Institute of Environmental Health Sciences of the National Institutes of Health. The authors are employed full-time by their respective companies, listed in their affiliations, which develop software for the prediction of chemical toxicity, and have performed this work as part of their research tasks in collaboration with each other and the informal working group discussed; no product, commercial, market or strategic information has been shared in discussion, nor has it influenced this work.