Computational insights into the mutagenicity of two tobacco-derived carcinogenic DNA lesions

Abstract 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone is a potent carcinogen found in all tobacco products that leads to a variety of DNA lesions in cells, including O6-[4-oxo-4-(3-pyridyl)butyl]guanine (POB-G) and O6-[4-hydroxy-4-(3-pyridyl)butyl]guanine (PHB-G), which differ by only a single substituent in the bulky moiety. This work uses a multiscale computational approach to shed light on the intrinsic conformational and base-pairing preferences of POB-G and PHB-G, and the corresponding properties in DNA and the polymerase η active site. Our calculations reveal that both lesions form stable pairs with C and T, with the T pairs being the least distorted relative to canonical DNA. This rationalizes the experimentally reported mutational profile for POB-G and validates our computational model. The same approach predicts that PHB-G is more mutagenic than POB-G due to a difference in the bulky moiety hydrogen-bonding pattern, which increases the stability of the PHB-G:T pair. The mutagenicity of PHB-G is likely further increased by stabilization of an intercalated DNA conformation that is associated with deletion mutations. This work thereby uncovers structural explanations for the reported mutagenicity of POB-G, provides the first clues regarding the mutagenicity of PHB-G and complements a growing body of literature highlighting that subtle chemical changes can affect the biological outcomes of DNA adducts.


INTRODUCTION
Nitrosamines are a large group of compounds that occur in the human diet, cosmetics, and tobacco, as well as flexible plastics such as balloons, condoms and baby bottle nipples (1)(2)(3). Despite the high prevalence of nitrosamines in contemporary society, these compounds are very carcinogenic, with 90% of the 300 identified nitrosamines being known carcinogens (4). The nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is formed during the curing of tobacco and has been identified in all tobacco products, including those used in conventional cigarettes, electronic cigarettes and Hookahs (5)(6)(7). NNK is also the only tobacco component that has led to lung cancer in every species tested (i.e. mice, rats, hamsters, rabbits, pigs, monkeys and humans) regardless of the route of administration (8,9). Therefore, the World Health Organization's International Agency for Research on Cancer has classified NNK as carcinogenic to humans (Group 1) (10). Nevertheless, changes in cigarette design since the 1950s and new routes of tobacco administration have increased human exposure to NNK, and consequently led to an increase in the number of cases of adenocarcinoma (a form of lung cancer) (11,12). Indeed, lung cancer is the most common cancer caused by NNK and is also the leading cause of cancer death worldwide in part due to a survival rate of only 18% (13).
In cells, NNK can be reduced to 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), and cytochrome p450 enzymes convert both NNK and NNAL into species that react with DNA (5). The complex metabolism of NNK and NNAL gives rise to a variety of DNA lesions including methyl, formaldehyde, pyridyloxobutyl (POB) and pyridylhydroxybutyl (PHB) DNA adducts (5). The current study focuses on the larger POB-G and PHB-G lesions that result from attack at the O6 position of G ( Figure 1). Both POB-G and PHB-G have a modified Watson-Crick hydrogenbonding face compared to G (i.e. N1 becomes a hydrogenbond acceptor; Figure 1). Therefore, this may cause these G lesions to no longer exhibit a preference for C, which could result in mutations upon replication. POB-G and PHB-G differ in the substitution of the chain connecting the adducted G to the bulky moiety ring (i.e. carbonyl versus hydroxyl group, respectively) and previous work on DNA adducts arising from known human carcinogens has shown that lesion mutagenicity is affected by small changes in chemical composition including linker substituents, ring substitution and ionization state (14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25). Thus, POB-G and PHB-G may have different mutagenic profiles, and their consideration may afford a broader understanding of the effect of bulky moiety linker composition on lesion mutagenicity. Detailed investigations have shown that POB-G is highly mutagenic (21,26,27). Specifically, POB-G has a 96% mutation rate in Escherichia coli cells, exclusively leading to G → A transition mutations (26). The mutational spectrum in human HEK293 cells is slightly more complex, with the primary replication outcome still being G → A transition mutations (54%), but G → T mutations also occurring at a low frequency (2%) (26). In vitro kinetics studies support these reported mutational spectra, indicating that translesion synthesis (TLS) polymerases preferentially insert dCTP and dTTP opposite the lesion (21,27). Although the kinetic studies also report dGTP insertion, extension past the POB-G:G base pair does not occur (21,27). Furthermore, while polymerase (pol) bypasses POB-G with a k cat /K m of 0.63 mM −1 s −1 , pol , and bypass POB-G with a k cat /K m of < 0.005 mM −1 s −1 , which indicates that pol likely plays a critical role in POB-G replication (21,27). Nevertheless, POB-G replication is significantly slower compared to that of natural DNA, which has a k cat /K m of 160 mM −1 s −1 for the insertion of dCTP opposite G by pol (21). While the mutagenicity of PHB-G is currently unknown, the NNAL parent compound that primarily forms PHB lesions has been reported to be tumorigenic in mice (28).
To complement mutagenicity data, structural studies have been performed on POB-G using a variety of models (29)(30)(31). Molecular mechanics (MM) minimization of the 5 -AT[POB-G] 2 -deoxytrinucleoside diphosphate revealed that the bulky moiety can be directed either 3 or 5 with respect to the lesion site, or extended in the same plane as the adducted G (29). Classical molecular dynamics (MD) simulations on POB-G-adducted DNA (29) and nuclear magnetic resonance (NMR)-based restrained MM minimizations (30) found that the bulky moiety is preferentially extended in the same plane as the adducted G. Nevertheless, while only one position of the POB-G bulky moiety was reported in the NMR structure, the inherent flexibility of the lesion is reflected in broad NMR signals (30). This correlates with computational work on the POB-G nucleobase (31) and 5 -AT[POB-G] 2 -deoxytrinucleoside diphosphate models (29), which indicate POB-G has a high degree of conformational flexibility. In terms of base pairing, POB-G has been shown both in computational and experimental work to form a wobble pair opposite C (29)(30)(31). Previous computational work from our group provided the first structural data for understanding the observed POB-G mutagenicity by characterizing nucleobase base dimers (31). Specifically, a stable, non-distorted pseudo-Watson-Crick POB-G:T pair and marginally distorted pairs between the Watson-Crick face of POB-G and A or C were identified (31). Furthermore, a predicted stable, but highly-distorted, Hoogsteen POB-G:G pair was found (31), which correlates with the experimentally-observed insertion of dGTP opposite POB-G, but the lack of subsequent extension (21,27). Overall, these previous studies have provided insight into the flexibility and base-pairing propensity of POB-G. Furthermore, these works highlight the critical role computational studies can play in unveiling how lesion structural properties relate to mutagenicity. Nevertheless, structural details of the adduct mispairs within the helical environment, as well as pairs within the polymerase active site, are key for understanding lesion mutagenicity, but remain unavailable. Additionally, no structural studies have clarified the similarities and/or differences in the flexibility and basepairing preferences of PHB-G with respect to POB-G.
To shed further light on the mutagenicity of POB-G and provide the first clues regarding the mispairing potential of PHB-G, the present work uses quantum chemical calculations and MD simulation methods to characterize the intrinsic conformational and hydrogen-bonding preferences of POB-G and PHB-G, as well as the corresponding properties in a DNA duplex and human TLS polymerase active site. Specifically, density functional theory (DFT) nucleobase and nucleoside models are initially used to examine the energy penalty for rotation about key bonds in the bulky moiety and at the glycosidic linkage, which uncovers the accessible conformations of free POB-G and PHB-G. Subsequently, 28 hydrogen-bonded pairs for each lesion were characterized using DFT nucleobase models to understand possible interactions between different lesion orientations and the canonical DNA nucleobases. MD simulations on adducted DNA provide insight into the dynamic conformational landscape of POB-G and PHB-G, and the effects of the identity of the pairing base on the structure of post-replication duplexes. Finally, MD simulations reveal the POB-G and PHB-G conformational flexibility and base pairing within the confines of the pol active site, and thereby shed light on the relative propensity for the insertion of various dNTPs. Overall, our multiscaled computational approach provides structural insight into the mispairing potential of two key tobacco-derived DNA lesions and complements a growing body of literature focused on understanding how small changes in the chemical composition of adducts affect mutagenicity (14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25).

MATERIALS AND METHODS
An AMBER conformational search about key rotatable bonds in the POB-G and PHB-G nucleobases (␣ , ␤ , ␥ , ␦ , ⑀ , and , Figure 1) as implemented in Hyperchem was used to determine the inherent conformational preference about the nucleobase-bulky moiety linker and within the bulky moiety. All orientations of POB-G and PHB-G isolated from the conformational search were subsequently optimized using DFT, specifically B3LYP-D3(BJ)/6-31G(d) and the relative energies were determined using B3LYP-D3/6-311+G(2df,2p). The structures thus obtained were grouped into conformational categories based on the orientation of the bulky moiety with respect to the adducted G. Subsequently, to assess the relative stability of the anti ( ≈ 220 • ) and syn ( ≈ 60 • ) conformations about the glycosidic bond in the POB-G and PHB-G nucleosides, 2deoxyribose was added to the most stable orientation from each conformational category of the nucleobase adducts. Finally, hydrogen-bonded nucleobase dimers between the Watson-Crick or Hoogsteen face of the most stable orientations of POB-G and PHB-G from each conformational category, and each of the four canonical DNA bases were investigated. These steps all involved B3LYP-D3(BJ)/6-31G(d) or M06-2X/6-31G(d) optimizations and B3LYP-D3(BJ)/6-311+G(2df,2p) energy calculations using Gaussian 09 (revision D.01).
MD simulations (AMBER14SB) were performed on 5 -CTCGGCG*CCATC 12-mer DNA duplexes, with G* = POB-G or PHB-G in orientations that represent each conformational category and paired opposite C. Using the insight gained from these simulations about possible POB-G or PHB-G orientations in DNA, MD simulations were performed on duplexes containing the lesions mispaired against T, A and G. Finally, MD simulations for dCTP, dTTP and dATP insertion opposite POB-G or PHB-G by pol were initiated from a crystal structure of the insertion complex for dATP incorporation opposite T (PDB ID: 4ECS) (32). DNA systems were neutralized and solvated such that 8Å of TIP3P water surround the duplex, while polymerase systems were solvated such that 10Å of TIP3P water surround the solute and the concentration of NaCl is ∼0.150 M. All systems were minimized, heated and equilibrated. Subsequently, 20 ns pre-production simulations were performed to understand the inherent conformational dynamics at the lesion site. From these trial simulations, representative structures of unique conformations that cannot be easily converted into another conformation through standard MD sampling were chosen as starting points for further simulations. To enhance sampling of the accessible lesion conformations, this process was continued iteratively until no new structures were obtained. Based on these sampling simulations, the lesion orientation was used to select structures for three 100 ns replicas for each system (using different initial velocities) to ensure statistically relevant results were obtained. Due to negligible all-atom rmsds between each replica (generally < 2.5Å; Supplementary Table S1), one replica was extended to yield a final 0.5 s production trajectory for each DNA duplex and DNA-polymerase complex. Data from these final production simulations will be discussed throughout the main text. The representative structure for each MD simulation shown in the figures was obtained by clustering the trajectory based on the rmsd of the damaged base pair using the average linkage algorithm; however, all analysis was performed over the entire trajectory. To qualitatively compare the strength of lesion-site hydrogen bonding, the average lesion nucleobase pair interactions in DNA and polymerase models were estimated using B3LYP-D3(BJ)/6-311+G(2df,2p) calculations on 100 representative structures across each simulation.
Full details of the computational protocol are provided in the Supplementary Data.

POB-G and PHB-G have a high degree of inherent conformational flexibility
To understand the inherently preferred orientations of POB-G and PHB-G, the conformational landscape of both lesions is investigated using DFT nucleobase and nucleoside models. Consistent with the low rotational barriers previously reported for POB-G (31), both lesions are extremely flexible, with 302 and 710 unique conformations of POB-G and PHB-G identified, respectively ( Figure 2A). The greater number of structures for PHB-G compared to POB-G arises due to multiple orientations of the hydroxy moiety. All structures were visually inspected and classified based on discrete interactions between the bulky moiety and adducted G as stacked, hydrogen bonded, T-shaped or extended (no interactions; Figure 2B). The majority of POB-G (62%) and PHB-G (52%) conformations do not contain an interaction between the bulky moiety and the adducted G (extended, Supplementary Figure S1). The relative distribution of the remaining conformations varies between the two adducts, although the separation between the categories is not as clear for PHB-G since some stacked conformations also involve hydrogen bonding between the bulky moiety hydroxy group and the adducted G. Most importantly, regardless of the lesion, substantial variation occurs within each conformational category (Figure 2A), which further highlights lesion flexibility.
The most stable POB-G conformation is stacked, while the most stable hydrogen bonded, extended and T-shaped orientations are 2. and 4). Additionally, the anti orientation is consistently ∼15 kJ/mol more stable than the syn conformation, which indicates that both glycosidic orientations of the lesions may be energetically accessible.
Overall, although our computational protocol may not have fully recovered all possible lesion conformations, the number and variation in the predicted orientations support the conclusions drawn from the nucleobase and nucleoside models. Most importantly, our calculations on the damaged nucleobases and nucleosides reveal that both POB-G and PHB-G display a high degree of inherent conformational flexibility. As a result, the nucleobase and nucleoside DFT models emphasize that many lesion conformations should be sampled in larger DNA and DNA-polymerase complexes, with observed conformations being limited by only the surrounding environment. Furthermore, each possible lesion conformation may have unique biological implications that should be carefully scrutinized in larger models.

The Watson-Crick face of POB-G and PHB-G forms stable and minimally distorted base pairs with C, T and A, while the Hoogsteen face of POB-G and PHB-G forms stable but distorted base pairs with G
To understand the effects of the POB-G and PHB-G conformational heterogeneity on the lesion base-pairing potential, DFT is used to evaluate the geometry and strength of interactions between the Watson-Crick or Hoogsteen face of the most stable structure in each lesion conforma-tional category and the canonical nucleobases. Additionally, since an NMR structure of POB-G adducted DNA predicted the bulky moiety to be projected straight into the major groove (30), the hydrogen-bonding potential of fully extended POB-G and PHB-G (i.e. all dihedral angles in the bulky moiety set to 180 • ) is also considered. Nevertheless, neither the POB nor PHB bulky moiety orientation affects the trends in stability of the resulting base pairs (Supplementary Table S2 and Figures S5-12). Therefore, only the fully extended conformation will be discussed for simplicity. However, the bulky moiety orientation leads to additional interactions between the lesion and pairing base in some cases (e.g. hydrogen bonding between the bulky moiety and opposing base; Supplementary Figures S5-12), which may be important and will be discussed in more detail in the helical and polymerase environments.
Both POB-G and PHB-G form a wobble base pair with C using the Watson-Crick face of the lesion (Figure 3), which is consistent with previous studies on POB-G (29)(30)(31). This pair contains two hydrogen bonds and is weaker than the canonical G:C pair by ∼58-65 kJ/mol. Both wobble pairs are wider than G:C by ∼0.6Å and the base pair opening is ∼10 • larger. Nevertheless, the base pairs are nearly planar (interplanar angle = 1-4 • ). Regardless of the bulky moiety, adducted G forms a pseudo-Watson-Crick base pair with T (-69.7 or -55.6 kJ/mol for POB-G or PHB-G, respectively; Figure 3), which contains two hydrogen bonds. Although the G*:T mispairs slightly deviate from planarity (by 10-  Table S2) and significantly distorted (i.e. deviations up to ∼2Å and 55 • in the base pair width and interplanar angles, respectively, Supplementary Figures  S11 and 12).
Several other DNA adducts use their Hoogsteen face to pair with the canonical nucleobases (18,33,34). In the case of POB-G and PHB-G, the strongest interactions with the lesion Hoogsteen face occur with G (−73.8 and −93.9 kJ/mol for POB-G and PHB-G, respectively, Figure 3). Nevertheless, these base pairs deviate from the natural G:C pair in the opening angle by up to ∼25 • and base pair width by up to ∼2Å, and the pairs are nonplanar by up to ∼57 • (Figure 3). The hydrogen-bonding interactions between the Hoogsteen face of POB-G or PHB-G, and the Watson-Crick face of C, T or A are weak (-43 to -52 kJ/mol, Supplementary Table S2 and Figures S5-10), and the pairs are structural distorted (i.e. deviations up to ∼3Å and 20 • in the base pair width and opening angles, respectively, Supplementary Figures S5-8).
Overall, the Watson-Crick faces of POB-G and PHB-G form similar hydrogen-bonding interactions with the canonical nucleobases. In particular, stable pairs are characterized between the lesions and C, T and A, with the T pairs being the least distorted. Conversely, the most stable base pairs with the Hoogsteen faces of the lesions occur with G. However, the high degree of distortion predicted for the isolated G*:G mispairs indicates that either the distorted base pairing geometry will not be adopted in the duplex environment or the pair will cause significant structural changes to the overall helical geometry.

Both lesions preferably extended the bulky moiety into the major groove of DNA, while PHB-G may also adopt an intercalated conformation
Since each lesion conformational category identified for the adducted nucleobases will result in a different bulky moiety position in DNA duplexes, MD simulations were initiated from each conformational theme for POB-G or PHB-G paired opposite C to better understand the accessible lesion orientations in DNA and the resulting impacts on DNA structure. Unsurprisingly, the bulky and intrinsically unstable T-shaped nucleobase orientation ( Figure 2) does not fit within the confines of the duplex environment. Incorporation of the stacked POB-G or PHB-G conformation into DNA places the bulky moiety between the adducted base pair and the 5 or 3 flanking base pair. Regardless of the initial 3 or 5 stacked orientation, the bulky moiety consistently adopted an extended orientation during equilibration (Supplementary Figure S13), which suggests the inherently stable stacked conformation of the damaged nucleobases (Figure 2) cannot be accommodated in the duplex. When the lesion hydrogen-bonded conformation is considered, the bulky moiety is situated in the location of the pairing base, forcing the opposing C into an extrahelical position. Although this conformation could be stabilized through stacking interactions between the bulky moiety and flanking base pairs, as well as hydrogen bonding between the bulky moiety and adducted G, POB-G acquires an extended conformation with the bulky moiety in the major groove and stacked with the 3 -G upon simulation, regardless of the initial starting point (Figure 4, left). This suggests that the POB-G hydrogen-bonded conformation is not stable in the duplex. Conversely, PHB-G remains intercalated throughout the simulation (Figure 4, right). In addition to bulky moiety stacking interactions with the flanking pairs and hydrogen-bonding interactions with the adducted G, the intercalated PHB-G position is stabilized through discrete hydrogen-bonding interactions between the bulky moiety hydroxy group and the 5 or 3 -G (5 , 17% or 3 , 11%; Figure 4, right), which suggests the stability of this conformer is likely sequence dependent.
When the dynamics of the extended POB-G or PHB-G orientation is considered with the bulky moiety in the major groove, many lesion conformations occur in which the bulky moiety does not interact with the surrounding DNA ( Figures 5 and 6). This directly correlates with the extended conformation being the inherently most common orientation for the isolated nucleobase ( Figure 2). Nevertheless, in some of the sampled adducted DNA conformations, the bulky moiety forms a T-shaped interaction with the Hoogsteen edge of the adducted G or a flanking base, or a hydrogen bond with the pairing C (N4H, ∼20%), which further highlights the inherent lesion flexibility and the pre-   dicted similar stability of many nucleobase conformational themes. As predicted by DFT base pair models, both POB-G and PHB-G form a wobble pair with C that contains G*(N1)···C(N4H) and G*(N2H)···C(N3) hydrogen bonds (each occupied for > 86% of the simulation). The pairs are ∼0.5Å wider and the opening angle is ∼10 • larger than for canonical DNA (Figures 5C and 6C; Supplementary Figures S14 and 15). The stability of POB-G:C (−61.8 kJ/mol) is less than PHB-G:C (−68.9 kJ/mol; Supplementary Table S2), as well as canonical G:C. Other global structural feature of the damaged DNA are consistent with those expected for natural DNA throughout the simulation, including the maintenance of the flanking base pairs and base step parameters ( Supplementary Figures S14 and 15).
Overall, POB-G adducted DNA adopts one dominant conformational theme that positions the bulky moiety in the major groove and the damaged G hydrogen bonds with the opposing base, which correlates with the single structure of POB-G:C adducted DNA previously reported based on NMR (30). Although we acknowledge limitations in the ability of classical MD simulations to fully sample all lesion site conformations, the current work highlights the diversity of bulky moiety orientations, which is consistent with reported broad NMR peaks (30). In contrast, PHB-G adducted DNA can adopt two conformations: (i) the bulky moiety intercalates into the helix and the pairing base is displaced into the major groove, and (ii) the bulky moiety is in the major groove and the adducted G hydrogen bonds with the opposing base. This difference in the conformational heterogeneity of POB-G and PHB-G damaged DNA provides insight into the potential influence of the bulky moiety composition on mutagenic outcomes. Indeed, it has been proposed that DNA lesions with the ability to stabilize intercalated conformations have a propensity to form deletion mutations (18,33,34). Therefore, our calculations predict that PHB-G may lead to deletion mutations, while the instability of the intercalated conformation for POB-G correlates with the absence of deletion mutations in the reported mutagenic profile for this lesion (21,27). Regardless, both POB-G and PHB-G can maintain hydrogen bonding with an opposing C such that there are only minor distortions to the DNA structure, which reflects the importance of this non-mutagenic pairing in the broader biological context and is consistent with the experimentallyobserved non-mutagenic replication of POB-G (21,27).

Both POB-G and PHB-G form stable interactions with T and A in the DNA duplex
To probe the dynamical structure of post-replication duplexes that correspond to mutagenic events, the POB-G and PHB-G mispairs with T, A and G were modelled in DNA. When POB-G or PHB-G is paired opposite T, a pseudo-Watson-Crick pair is maintained that contains G*(N2H)···T(O2) and G*(N1)···T(N3H) hydrogen bonds (−41.7 or −55.5 kJ/mol for POB-G or PHB-G, respectively; Figures 7A and 8A; Supplementary Table S4). The G*(N1)···T(N3H) interaction is less persistent for POB-G (34%) than PHB-G (60%) due to the orientation of the bulky moiety at the site of attachment (i.e. C10-H is directed toward the POB-G Watson-Crick face to allow for Tshaped interaction between the pyridyl ring with the flanking base pairs, but C10-H is directed toward the PHB-G Hoogsteen face, with transient hydrogen bonding between the bulky moiety hydroxy group and flanking bases). Although the G*:T pairs are 15 to 20 kJ/mol weaker than the G*:C pairs within the helix, the G*:T mispairs cause little deviation in the canonical DNA structure, with a lesion base pair width of ∼10.6Å and opening of ∼58 • , as well as base step and base pair parameters consistent with natural DNA (Supplementary Figures S17 and 18).
When syn-POB-G or PHB-G is paired opposite G (Supplementary Table S4), a single hydrogen bond occurs between the adducted G(N7) and the pairing G (N2, for >90% of the simulation; Figures 7C and 8C; Supplementary Table  S6), which results in pairs that are up to 10 kJ/mol weaker than the G*:C pairs. Although there is little deviation from the canonical base pair width, the lesion mispair is not planar (interplanar angle ∼20 • ). As a result, the DNA helix is highly distorted with respect to all base step and base pair parameters surrounding the lesion site ( Supplementary Figures S21 and 22). For example, the rise and twist between the adducted base pair and the 5 -flanking pair are 2Å and ∼90 • smaller than for an undamaged helix, respectively.
Overall, our simulations on post-replication DNA duplexes suggest that potential base substitution mutations formed upon replication of POB-G and PHB-G due to alternations in the Watson-Crick face of G will afford stable duplexes. Specifically, stable T and A mispairs can exist for both lesions, with the T mispairs causing the least distortion to the DNA duplex. However, the predicted highly distorted helices containing POB-G:G and PHB-G:G suggest that the G mispairs will destabilize the post-replication duplexes and may hinder lesion replication. This proposal is consistent with the observed formation of POB-G:G pairs, but the lack of extension past the lesion site by pol , , and in vitro (21,27).

Polymerase insertion complexes indicate that dCTP and dTTP are best aligned for catalytic insertion opposite POB-G and PHB-G, but suggest that PHB-G may display a higher mutagenic potential
To understand the impact of constraints imposed by the pol active site on the preferred lesion orientations and structures of lesion base pairs, the pol insertion complexes were considered for dCTP, dTTP or dATP incorporation opposite POB-G and PHB-G ( Figure 9). The bulky moiety adopts many distinct extended conformations in the pol major groove binding pocket (Supplementary Figure S23). Indeed, although there are limitations in the ability of classical MD simulations to sample the entire conformational landscape of flexible lesions, the lesion conformations predicted when bound in the polymerase active site exhibit a high degree of overlap with those for the free damaged nu-  Regardless of the lesion conformation, all insertion complexes have key structural features that correctly align the dNTP for the phosphoryl transfer reaction. Specifically, Cys16, Phe17, Phe18, Tyr52, Arg55, Arg61 and Lys231 form hydrogen-bonding interactions with the dNTP (Supplementary Figure S25) (32). Although minor differences exist in the interactions between the polymerase and each dNTP (Figure 10), all dNTPs maintain a network of dNTPpolymerase hydrogen bonds (Tables S7-S9). Additionally, all predicted structures for the pol insertion complexes associated with POB-G and PHB-G replication maintain octahedral coordination of the binding and catalytic Mg 2+ ions (Supplementary Table S10), which is known to be essential for catalysis (32). Nevertheless, the coordination of O␣2 to the binding Mg 2+ ion adopts a wide spread of distances (Supplementary Figure S24). Finally, for a polymerase insertion complex to be catalytically active, the reaction coordinates should be aligned, which includes: (i) the distance between the 3 -primer end(O3 ) and the dNTP(P␣) approaching the van der Waals radii for O and P (3.5 A), and (ii) the in-line attack angle [∠(3 -primer end(O3 )-dNTP(P␣)-dNTP(O␣␤))] approaching 180 • (35-37). All POB-G and PHB-G replication complexes maintain this reaction coordinate orientation, with a distance < ∼3.3Å and angle > ∼170 • (Figure 9).
In addition to the contacts between pol and the dNTP triphosphate tail, the position of the dNTP in the active site is stabilized by hydrogen-bonding interactions with the lesion in the template strand. Indeed, hydrogen bonding between the template base and incoming dNTP has been proposed to play an important role in dNTP selection by TLS polymerases (16,38,39). Consistent with DNA duplex mod-els ( Figures 5-8), POB-G and PHB-G form a wobble pair with dCTP that contains G*(N1)···dCTP(N4H) (>20%) and G*(N2H)···dCTP(N3) (>50%) interactions, and has an overall interaction strength of ∼−50 kJ/mol (Figure 9). The POB-G and PHB-G pairs with dCTP have opening angles significantly wider than G:dCTP in the pol active site (up to ∼35 • ), but have the same interplanar angle as G:dCTP. While the PHB-G:dCTP pair has the same width as G:dCTP, the POB-G:dCTP pair is ∼0.5Å narrower.
As seen for unbound DNA, a persistent pseudo-Watson-Crick pair occurs between dTTP and POB-G or PHB-G that contains an intermittent POB-G(N1)···dTTP(N3H) interaction (17 or 25%, respectively) and a POB-G(N2H)·dTTP(O2) hydrogen bond (39 or 86%, respectively; Figure 9). When only these two interactions are formed between the adducted nucleobase and the incoming dTTP, both pairs have an interaction strength of ∼−45 kJ/mol. However, in contrast to unbound DNA and dTTP insertion opposite POB-G within the pol active site, the PHB-G:dTTP pair can also contain a hydrogen bond between the bulky moiety hydroxy group and O4 of the pairing dTTP (40%). When this bulky moiety-dTTP contact is present, the interaction strength increases to ∼−70 kJ/mol. Irrespective of the involvement of the bulky moiety in dTTP binding, both the POB-G and PHB-G pairs maintain a structure consistent with canonical DNA.
Overall, regardless of the catalytically conducive alignment of the pol active site, the lack of strong, persistent and undistorted hydrogen bonding between A and the lesion suggests that dATP is unlikely to be selectively incorporated opposite POB-G or PHB-G by pol . On the other hand, dCTP insertion results in a more stable and a less distorted hydrogen-bonded pair with both lesions, as well as a correct active site configuration. Finally, persistent hydrogen bonding occurs between both lesions and an incoming dTTP, which leads to base pair structural parameters more consistent with natural DNA than dCTP. Therefore, our calculations predict that pol replication of these two key tobacco induced DNA lesions will lead to preferential dNTP insertion according to dTTP > dCTP >> dATP. This is consistent with experimental studies on POB-G, which report a high frequency of G → A mutations and non-mutagenic replication (21,26,27), and highlights the predictive power of computer modelling for understanding the mutagenicity of DNA damage. In the absence of experimental data, the same computational approach reveals that the PHB-G:dTTP pair can be significantly more stable in the pol active site than POB-G:dTTP, and therefore indicates that PHB-G may be more mutagenic. This effect is caused by a difference in the bulky moiety composition, and therefore this finding complements previous literature that emphasizes the influence of adduct structure on the mutational outcomes of DNA lesions (14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25).

CONCLUSION
The current study uses a multiscale computational approach to provide structural data that rationalizes and predicts the mutagenicity of two tobacco-derived carcinogenic DNA lesions, namely POB-G and PHB-G. Quantum mechanical calculations reveal that both POB-G and PHB-G possess a high degree of inherent conformational flexibility, which may affect the biological processing of the lesions and emphasizes the importance of dynamical structural data for other flexible DNA lesions. Nevertheless, regardless of the conformation adopted, both lesions form stable and minimally distorted nucleobase pairs with C, T and A, but highly distorted pairs with G. Indeed, MD simulations predict that the G mispairs significantly alter the DNA duplex, suggesting that dGTP incorporation will not afford stable post-replication duplexes. This proposal is consistent with the previously reported formation of, but lack of extension past, POB-G:G pairs by TLS polymerases in vitro (21,27). In contrast, stable minimally distorted duplexes occur when POB-G or PHB-G are paired opposite C, T and A. However, dATP weakly interacts with either lesion when bound within the polymerase, and therefore is unlikely to be selectively incorporated opposite POB-G or PHB-G, despite a catalytically conducive alignment of the pol active site. This finding agrees with the observed low frequency of G → T mutations upon replication of POB-G in vivo (2%) (26). When dCTP and dTTP are placed opposite POB-G or PHB-G, the pol active site is aligned for catalysis and the dNTP forms stable interactions with both lesions. However, the G*:dTTP pairs are less distorted, suggesting dTTP will be inserted more often than dCTP. These findings are consistent with experimental studies that report an abundance of G → A mutations, as well as non-mutagenic replication, for POB-G (21,26,27), which validates our approach. Nevertheless, differences in the bulky moiety hydrogen-bonding patterns enhance the stability of the PHB-G:dTTP pair, suggesting that dTTP will be more often misinserted opposite PHB-G. Furthermore, the mutagenicity of PHB-G is likely increased by the stabilization of a conformation that intercalates the bulky moiety into the DNA helix, which has been associated with deletion mutations for other adducts. This contrasts the instability of an analogous conformer for POB-G, which coincides with the experimental mutagenicity spectrum. Nevertheless, future experimental work that probes the mutational profile of PHB-G is required to confirm the relative importance of deletion mutations for different tobacco-derived lesions. Overall, this study provides key structural insight into the reported mutagenicity of POB-G, uncovers the first clues about the mutagenicity of PHB-G, and adds to a growing body of literature that highlights the impact of small chemical changes to the bulky moiety on lesion mutagenicity.