Modelling of C-terminal tail of human STING and its interaction with tank-binding kinase 1

Stimulator of interferon genes (STING) plays a significant role in a cell’s intracellular defense against pathogens or self-DNA by inducing inflammation or apoptosis through a pathway known as cGAS-cGAMP-STING. STING uses one of its domains, the C-terminal tail (CTT) to recruit the members of the pathway. However, the structure of this domain has not been solved experimentally. STING conformation is open and more flexible when inactive. When STING gets activated by cGAMP, its conformation changes to a closed state covered by 4 beta-sheets over the binding site. This conformational change leads to its binding to Tank-binding kinase 1 (TBK1). TBK1 then phosphorylates STING aiding its entry to the cell’s nucleus. In this study, we focused on the loop modeling of the CTT domain in both the active and inactive STING conformations. After the modeling step, the active and inactive STING structures were docked to one of the cGAS-cGAMP-STING pathway members, TBK1, to observe the differences of binding modes. CTT loop stayed higher in the active structure, while all the best-scored models, active or inactive, ended up around the same position with respect to TBK1. However, when the STING poses are compared with the cryo-EM image of the complex structure, the models in the active structure chain B displayed closer results to the complex structure.


Introduction
Human Stimulator of Interferon Genes (STING) protein is a homodimer, 379 amino acid long transmembrane protein encoded by the human TMEM173 gene. It is expressed in hematopoietic cells and immune tissue (Barber, 2015). STING is a member of an immune response signaling network that gets activated in response to bacterial, protozoa, viral nucleic acids, and self-DNA through the regulation of type-I interferon (IFN) (Li, Wilson and Kiss-Toth, 2017) Structurally, STING is divided into three parts, an N-terminal domain that includes four transmembrane regions (1-154), which functions as a control for interorganelle trafficking and membrane anchorage, a dimerization and ligand-binding domain , and a C-terminal tail (CTT, that is located in the cytoplasm. CTT domain includes the conserved PLPLRT/ SD motif and pLxIS motif that STING uses to recruit other members of the pathway (Zhao et al., 2019). In a healthy human cell, DNA is located either in the nucleus or mitochondria, but, in some rare cases, the mtDNA/DNA is released to the cytosol. This triggers the cGAS-cGAMP-STING pathway which gets also activated by bacterial, protozoa, or viral nucleic acids double-stranded DNA (dsDNA). The pathway as explained in Figure 1, starts with the dsDNA binding to cyclic GMP-AMP synthase (cGAS) that uses ATP and GTP to catalyze the formation of cGAMP, a second messenger (Bai and Liu, 2019;Unterholzner and Dunphy, 2019). The catalyzed cGAMP binds to STING in the ER activating it.
In Figure 2, inactive (Shu et al., 2012) STING conformation is displayed with the active X-ray structure (Gao et al., 2013). These structures are formed by dimerization of two monomers and ligand-binding domain . There is also a membrane-spanning segment and CTT domain, which are not resolved in these structures. STING is a homodimer, and when cGAMP binds to STING, the two monomers of the STING come closer around the ligand-binding site. This closure over ligand forms four β-sheets, by bringing 2 β-sheets from each subunit (four beta-sheets in red, Figure 2B). This conformational change leads to the CTT domain being released and moving towards the lid residues. Only then, STING will bind to TBK1 (Zhao et al., 2019). However, the CTT domain of STING is still not fully understood nor has its crystal structure been solved. This limits our understanding of STING-TBK1's complex formation and its important functional interactions.
In the molecular dynamics (MD) study of Tsuchiya and coworkers (Tsuchiya et al., 2016), active (cGAMPbound) and inactive STING structures with modeled CTT loop structures were used. 700 ns and 1000 ns-long MD simulations in explicit solvent molecules were collected for each structure. In the cGAMP-bound structure, a more organized and structured CTT loop structure is observed. Mainly, a temporary β-sheet structure is formed between residues 348-350 and 362-364 as seen in Figure  3. Hydrogen bond interactions between Thr348-Leu364 and Ala350-Glu362 main chain are observed only in the active structure. Tsuchiya and coworkers proposed that  (Tsuchiya et al., 2016). The hydrogen bonds were shown by green lines. The conserved SER residues in mouse and human STING are underlined. the formation of this β-sheet might be an important factor in the complex formation of TBK1 and STING (Tsuchiya et al., 2016). However, they did not have TBK1 in these simulations; only STING structures were present.
When the TBK1-STING complex structure is considered, a recent cryo-EM structure at 3.3 Å resolution was obtained in 2019 (Zhang et al., 2019). This complex structure is in between human TBK1 and chicken STING, and the resolution of the STING part is lower than 3.3Å. Thus, in the complex structure, PDB code 6NT9, only the STING tail is resolved while the whole TBK1 is in all-atom detail (as seen in Figure 4, red and pink for STING tails). The resolved segment of STING is only 8 amino acids long out of the 36 amino acid CTT loop. Namely, a highresolution complex structure of human TBK1 and human STING structure is still missing.
In this study, we focused on modeling the CTT domain in both active and inactive STING conformations by homology and loop modeling. Then, we docked full-length STING with the CTT domain in multiple conformations to TBK1 by using HADDOCK software. We used the resolved parts of STING as restraints in the proteinprotein docking part. Finally, we analyzed and compared the structural features of STING-TBK1 complex formation in the active and inactive forms of STING.

Modeling
Active STING (PDB code: 4LOI) and inactive STING (PDB code: 4EMT) structures shown in Figure 2 were used as the templates to model STING structure with the CTT domain. These two structures are X-Ray structures for human STING, but they lack the CTT loop part. CTT domain is approx. 36 amino acid long loop, and it exists at the end of each monomer of the dimer structure. For building the homology models and for loop modeling, MODELLER auto-model and loop classes were used (Webb and Sali, 2016). First, each X-Ray structure was aligned with the full sequence of STING (UniprotKB: Q86WV6) containing the CTT domain. Figure 5 shows the alignment results, and the missing CTT sequence is highlighted in green in the PIR sequence format. We, then, built ten models for each template by using the alignments. Next, the best model was picked based on the Z-Dope score. Z-Dope is a statistical score developed by MODELLER, which evaluates the energy of the models through many iterations. Models returning the minimum value of normalized Z-Dope (-1.0 being the native-like structure) score were chosen as the most probable structure.
Subsequently, two best models, one for active and one for inactive, are carried to the loop modeling step. Here, many random loop structures were generated by randomizing the atomic positions by ±5Å in each Cartesian direction. In the loop modeling algorithm, the model optimization occurs twice; the first takes into consideration only the loop atoms, while the second iteration takes into consideration how the atom interacts with the rest of the protein. For each structure, 50 different loop conformations were built via the MODELLER loop modeling module used (Fiser, Do and Šali, 2000). MODELLER has different levels of MD refinement stage after the optimization of the models. This is separate from the initial model building step. The models obtained after the loop modeling step was refined by setting "md_ level" parameter to "very_slow" in MODELLER. This is equivalent to 10000 steps of minimization followed by 1 ns equilibration of the model at 300 K. Again, for these structures, only the loop parts are different.
After the modeling step, loop modelling, and refinement stages, we picked the best scored inactive and active STING structure to protein-protein docking step. The details of filtering the best structures are explained in the results section.

Protein-protein docking
With the information we have about STING-TBK1 interaction, the protein-protein docking step was done using HADDOCK (Dominguez, Boelens and Bonvin, 2003). To drive the docking process, HADDOCK introduces ambiguous interaction restraints (AIRs) which represent the distances between the residues involved in the interaction of two proteins. In our case, these interacting residues were taken from PDB structure of 6NT9 (Zhang et al., 2019). Mainly, HADDOCK algorithm introduces three different steps: Randomization of orientations and rigid-body minimization followed by semiflexible simulated annealing in torsion angle space. In the first stage of minimization, AIRs are included in the energy function being minimized. The following scoring function is utilized in the first step: In this equation, van der Waals energy, electrostatic energy, and the distance restraint contribution of AIRs are included. The best structures from this step were taken to torsion angle space. In the final refinement stage, the following scoring function is evaluated for the complex structure.
Finally, a refinement step in cartesian coordinate space with explicit solvent (TIP3P water model) is performed.
For docking, the STING segment was removed from the cryo-EM complex structure. The remaining structure was used for TBK1, and it has a resolution of 3.3 Å. For  STING the models we built with the CTT domain in active and inactive conformations were used.
The active residue numbers directly involved in the binding were taken from the cryo-EM complex and set as (residues 215, 217, 218, 220, 219) for STING and (residues 8,27,29,577,581,584,582) for TBK1. The remaining passive residues were set automatically by HADDOCK. For the docking, all HADDOCK parameters were left as default. HADDOCK yielded seven cluster results. The cluster with the lowest z-score was considered as the best structure. We also analyzed the results by superimposing them onto 6NT9 to see how close our STING active residue poses are to the crystal structure's active residues. Moreover, STING has two chains to be docked to TBK1, chain A, and chain B, and the docking process was repeated several times using different chains.
Additionally, the best poses obtained from HADDOCK are cross-checked with the ClusPro web server (Kozakov et al., 2017). ClusPro uses rigid-body protein-protein docking, while HADDOCK incorporates rigid and flexible steps into the docking. Still, for the validation of the poses, a second protein-protein docking software was needed. Restraints in HADDOCK were given to the ClusPro web server as attraction points on the surfaces of TBK1 and STING. The obtained poses are then compared with the superposed HADDOCK results.

Modelling
At the end of the loop modelling step, we had 50 different conformations for each loop. In Figure 6A, the highest Z-Dope scored loop structures are displayed for chain A and chain B in the active structure. In Figure 6B, the same type of analysis is displayed for the inactive structure. Z-Dope score changes between -1 and 1, and a lower score means native-like structure. In these graphs, Z-Dope profiles, smoothed over a 15-residue window, and normalized by the number of restraints acting on each residue, are displayed. This shows the local quality of the model around each residue. We picked the best 4 structures from Figure 6 for both loops to be evaluated in the next step (Figure 7). A ribbon representation of the top 4 models superimposed in Figure 8 shows the difference  in the loops for both the active and inactive models. Please note the loops are staying lower with respect to the body of the protein in the inactive structure ( Figure 8A is for active, and Figure 8B is for inactive). This conformational difference has been determined to be functionally important for STING to interact with TBK1, STING is moving its arms up (its CTT loops) when it is activated and associating with TBK1 (cf. Figure 1, cGAS-cGAMP-STING pathway). Also, the formation of ordered structures in the loop region is inspected. In a previous MD study, a partial β-sheet formation in the CTT region has been observed only in the active structure of STING (Tsuchiya et al., 2016). Figure 9 displays distance distributions of the residues in the small β-sheet segment active (orange) and inactive (blue). The distances are between E362-A350, L363-S349, and L364-T348 (shown in Figure 3). Figure  9A displays the distance distribution of E362-A350 in all loops, and the distances are shorter in the active structure. Figure 9B displays L363-S349 distance, and a more random distribution is observed in both active and inactive structures. The inactive structure still samples larger distances. And finally, in Figure 9C, L364-T348 distance distribution is displayed, and in the active structure, this distance is also located more to the left. As a summary in the active structure, the previously observed β-sheet segment stays more compact. Please note that these loops are very mobile and random, and the sheet formation in the MD study is not checked experimentally. But still, our study also suggests that the distances in this specific region stay closer in the active structure.
Additionally, the best inactive and active models according to this β-sheet distances were aligned and can be seen in Figure 10. These models are the ones that displayed the shortest distance for β-sheet forming residues. The loops are staying more compact in the active structure (grey colored in Figure 10) while the arms as CTT loops are again lower in the best inactive model (blue colored in Figure 10) structure when compared with the active structure.
By considering all the above-mentioned criteria for the structural properties of the CTT loop, Z-Dope score, CTT loop height, and β-sheet distances are combined in Table 1. Z-Dope scores plotted in Figure 6 and Figure 7 were local scores. They showed the quality of local regions. However, the values in the Table 1 are the overall scores of the complete loop sections. CTT height is measured from the end of the CTT loop to the beginning of the loop (distance A), and from the midpoint of CTT to the beginning of the loop (distance B). In Figure 10B, these two distances are shown with green arrows in one of the models.
For inactive STING models chain-A, model 46, and model 11 have the shortest distances in β-sheet for chain A. Those two models have also the most compact form in CTT-height and low Z-Dope scores. The other model, model 23, has the lowest Z-Dope scores as well as compact CTT height. That means the most compact loop structure is giving the best Z-Dope score also.
For inactive STING models, chain B, model 5, and 19 have the shortest distances in β-sheet and relatively short CTT height. The best Z-Dope score with -0.566 has also a short CTT height. From the analysis of these two sets of models, the best Z-Dope scores are also resulting in low CTT heights.
For active models, Z-Dope scores are much better than inactive models in general. Once again, the TBK1 binds to STING from the CTT loop in the active conformation. Here in the results of active models, the loop structures are organized and more compact according to Z-Dope scores. In chain A model 12 has a strikingly low distance for β-sheet formation as well as the lowest Z-Dope score. For that model, the CTT model height from the midpoint (distance B in the table) distance is also relatively low. For chain B, the same model, model 12 is giving good results. Additionally, model 9 is giving short distances in β-sheet formation, as well as low CTT height values for chain B.

Protein-protein docking
Since Z-Dope scores for different models are still very close to each other; for the protein-protein docking steps, instead of taking single best models, we took all the models in Table 1 to HADDOCK analysis. Models in Table 1 were taken to HADDOCK database for the docking of STING and TBK1. The STING chains A and B were docked separately to TBK1 binding site.
In Table 2, the HADDOCK results are displayed with HADDOCK scores for the best clusters obtained and the corresponding Z-score of each cluster. RMSD values are calculated after superposing the residues of the whole TBK1 and STING segment that are solved in cryo-EM complex structure (PDB code:6NT9).
Model 46 chain A and Model 27 chain B for the inactive conformation and Model 12 chain A, Model 35 chain B for the active conformation displayed the best RMSD results from the docking step. Model 12 for the active model is also the best model according to the Z-Dope score. In Figure 11A and B, the poses of the STING Zaxis segments are displayed for inactive structure and active structure, respectively. The RMSD values are calculated superposing TBK1 and cryo-EM resolved STING segment (green colored in the figures). Since the STING cryo-EM structure is for chicken, and our models are for human, the RMSD values are for different sequences but include the conserved PLPLRT segment of CTT loops.
The best models according to RMSD values are shown in Figure 12 as full-length complexes to display the location of STING with respect to TBK1. TBK1 is cyan, while active STING molecules are purple and inactive ones are orange in this Figure. One important finding is that all those models are docked approximately around the same position in terms of the z-axis of TBK1. Please don't forget that there is a long CTT loop; thus, the STING molecule has all the freedom to be docked. In other words, out of 36 residues modelled in the CTT loop, we are only giving the 8 residue segments as restraints to the HADDOCK. In Figure 12C, different poses that we obtained for STING as sub-optimal solutions in HADDOCK for chain B are displayed (TBK1 is cyan, and different poses of STING are displayed with colours ranging from white-red-blue scale in Figure 12 C). Out of all those possible locations, both inactive and active best-docked poses ended up in the same location with respect to TBK1 (Figure 12 A) We also checked our findings here with a second protein-protein docking software, ClusPro. Figure 13 displays the best-scored models in HADDOCK together

Inactive 27B
with the results of ClusPro. We took the best-scored STING models and one TBK1 structure and re-dock them in ClusPro. We used the restraints from HADDOCK as attracting residues in ClusPro and left all the other parameters as default. The best or at most second best pose in ClusPro gave a similar position of STING with respect to TBK1 (Yellow colored STING in Figure 13 is ClusPro results).
Another key finding of this study is that in all the active models, when STING is docked from Chain B (see Table 2 Active STING chain B), RMSD values are lower than 8 Å. Thus, the active models, when the structure of STING is more compact, have better poses. We didn't get similar results for the inactive structure or active structure but chain A. There is still an ongoing discussion whether STING is binding to TBK1 from two CTT chains or one. If one STING molecule is interacting with TBK1 from two sites, then the second CTT loop can have different behaviour when the first chain is bound (Zhang et al., 2019;Zhao et al., 2019).

Discussion
When the STING molecule binds to cGAMP, it gets activated. Then, it can interact with TBK1. However, the interaction mechanism of STING with TBK1 remains unclear. In this study, we modelled the missing CTT loop in the human structure, which is around 36 amino-acid long by using loop modelling algorithms in MODELLER software. In general, loops are very difficult to model or resolve by experimental techniques due to their random and highly flexible structures (Fiser, Do and Šali, 2000;Lee, Heo and Seok, 2016;Feig, 2017). Here, instead of obtaining an initial homology model for the loop segment, then sampling it with different simulation techniques, we obtained an ensemble of loop structures from MODELLER program directly. Then, they are analysed with different scoring functions. These different loop structures could have also been obtained from a simulation such as MD simulation; however, MODELLER loops are sampling a larger pool of geometries with lower precision. Thus, in this study, we aimed at a larger sample of conformations that might be less precise in the atomistic details.
The loop structures are first analysed with the MODELLER scoring function locally and globally. Z-Dope scores obtained for active models are much lower than inactive models in general. Additionally, the height of the CTT and secondary structure formation of a small β-sheet, previously observed in 700 ns molecular dynamics simulation, has been checked for the models. The ligand-induced ordering of the CTT was observed only in the active conformation (Tsuchiya et al., 2016). In our models, the distance required to form this ordered structure is checked via the distribution of three distances in the models and we also observed that in the active conformation, these distances stayed shorter. Finally, the height of CTT, like the arms of the molecule, is observed to be higher in the active conformation.
Last but not least, in the HADDOCK protein-protein docking step, all the best-docked results ended up around the same vicinity of TBK1 (Figure 12 A). The location of TBK1 with respect to STING is still not known. Only a small conserved region of chicken STING is solved in the cryo-EM image (Zhang et al., 2019;Zhao et al., 2019). In our findings, RMSD values of the active human STING conformations for chain B are closest to the cryo-EM image structure. In the inactive models, there are also similar poses however, RMSD values are not that close.
As a summary, STING's relative location to TBK1 did not change whether STING is the inactive or the active form. However, the CTT tail sampled much closer results to cryo-EM complex structure when STING is in active conformation. We hope the findings of this study, especially the location of STING with respect to TBK1 and binding of one of the chains of CTT to TBK1 more efficiently than the second chain, will be tested when the full-length STING-TBK1 complex structure becomes available. Moreover, when creating the model in MODELLER and docking in HADDOCK, the solvation effects are implicitly included in the scoring functions. In a future study, we are planning to run the best scored complexes of this study in a fully solvated all-atom molecular dynamics simulation to observe the effect of solvation and equilibration on the CTT loop structure.