Investigating the trade-off between folding and function in a multidomain Y-family DNA polymerase

The way in which multidomain proteins fold has been a puzzling question for decades. Until now, the mechanisms and functions of domain interactions involved in multidomain protein folding have been obscure. Here, we develop structure-based models to investigate the folding and DNA-binding processes of the multidomain Y-family DNA polymerase IV (DPO4). We uncover shifts in the folding mechanism among ordered domain-wise folding, backtracking folding, and cooperative folding, modulated by interdomain interactions. These lead to ‘U-shaped’ DPO4 folding kinetics. We characterize the effects of interdomain flexibility on the promotion of DPO4–DNA (un)binding, which probably contributes to the ability of DPO4 to bypass DNA lesions, which is a known biological role of Y-family polymerases. We suggest that the native topology of DPO4 leads to a trade-off between fast, stable folding and tight functional DNA binding. Our approach provides an effective way to quantitatively correlate the roles of protein interactions in conformational dynamics at the multidomain level.


Introduction
Our understanding of protein folding has been deepened by intensive experimental, theoretical, and computational studies focused on single-domain proteins or isolated domains of multidomain proteins (Jackson, 1998). However, it is widely recognized that throughout all three kingdoms of life, proteins occur predominately in multidomain forms (Apic et al., 2001;Ekman et al., 2005). As their name indicates, multidomain proteins consist of more than one structural building unit, or domain (Teichmann et al., 1999). Domains themselves have a strong tendency to fold (Murzin et al., 1995), but although there is high structural modularity in a multidomain protein (Han et al., 2007), the folding of a multidomain protein usually takes a more complex form than a simple sum of folding of individual domains (Levy, 2017). The key component in the folding of a multidomain protein is the interaction of domain interfaces or linkers, which have been found to play nonuniversal roles in modulating folding stability (Fast et al., 2009;Bhaskara et al., 2013;Vishwanath et al., 2018), cooperativity (Batey et al., 2005) and kinetics (Osváth et al., 2005;. Efficient folding of a multidomain protein is vital not only for providing structural scaffolds for biological function (Vogel et al., 2004), but also for preventing misfolding (Strucksberg et al., 2007). Multidomain proteins, which often possess significant domain interfaces, are more prone to aggregation during folding processes than single-domain proteins (Han et al., 2007;Borgia et al., 2011). It has been suggested that in vivo, multidomain proteins can undergo co-translational folding (Fedorov and Baldwin, 1997), where each domain folds sequentially one-by-one during protein synthesis from the ribosome (Netzer and Hartl, 1997;Frydman et al., 1999). Likewise, a 'divide-andconquer' scenario has been proposed for in vitro multidomain protein folding, where all domains fold independently, followed by coalescence of neighbors (Wang et al., 2012a). In both of these folding scenarios, independent domain folding plays an essential role and is deemed to drive the global folding. At the same time, the role of domain coupling in multidomain protein folding appears to be important, but its complexity means that a definitive conclusion cannot easily be drawn (Batey et al., 2008). A recent computational study of a two-domain serpin elucidated the critical role of the functional binding-related reactive center loop (RCL) in the folding of the protein to distinct structures (Giri Rao and Gosavi, 2018). Folding of the serpin to the metastable active structure, where the RCL is present as an intradomain segment, is faster than folding to the stable latent structure, where the RCL is involved in extensive interactions between domains. Other work using a similar model, however, indicated that removal of interdomain interactions had little effect on the folding cooperativity of a three-domain adenylate kinase (Giri Rao and Gosavi, 2014). Using statistical mechanical models, Sasai and co-workers investigated a variety of multidomain proteins and their circular permutants. Their results showed that domain connectivity and interactions in multidomain proteins determine folding pathways, cooperativity, and kinetics (Itoh and Sasai, 2008;Inanami et al., 2014). However, at present, a unified perspective on the role of interdomain interactions is still missing. Addressing this issue is an important avenue in studies of multidomain protein folding.
From a structural perspective, the effects of neighboring domains in terms of interdomain interactions are fundamental for generating and stabilizing the correct multidomain folds (Jones et al., 2000;Bhaskara and Srinivasan, 2011). On the other hand, overwhelming interdomain interactions may distort domains from the structurally folded units, reducing the efficiency that comes with domain-wise folding. To achieve a 'speed-stability' balance, a multidomain protein may optimize the strength of interdomain interactions to simultaneously guarantee efficient folding through the 'divide-and-conquer' folding mechanism and successful formation of functional structures with the aid of stabilization from the domain interface. Usually, the relatively weak interdomain interactions trigger domain motions in multidomain proteins, making a pivotal contribution to protein function (Bennett and Huber, 1984;Schulz, 1991;Miyashita et al., 2003). As is now widely recognized, the native folds of proteins may exhibit a certain degree of frustration in favor of functional state switching (Ferreiro et al., 2007;Ferreiro et al., 2014;Whitford and Onuchic, 2015). Therefore, the energetics of interdomain interactions in multidomain proteins may be evolutionarily optimized for making the trade-off between fast, stable folding and efficient, tight substrate binding (Bigman and Levy, 2020). However, it is still unclear how a multidomain protein manages the intricate balance among its interactions to allow simultaneous folding and function. Here, we aim to answer this fundamental question through a computational study of the folding and DNA-binding processes of Sulfolobus solfataricus DNA polymerase IV (DPO4), a prototype Y-family DNA polymerase.
Akin to the other Y-family polymerases, DPO4 consists of a polymerase core with a right-handed architecture, including finger (F), palm (P), and thumb (T) domains, as well as a little figure (LF) domain that is connected to the polymerase core by a flexible linker ( Figure 1A; Ling et al., 2001). Structural analysis has shown that many more intradomain contacts than interdomain contacts are present in the apo form of DPO4 (Wong et al., 2008; Appendix 2-table 1). Thermal unfolding experiments have indicated the existence of one intermediate state (Sherrer et al., 2012), which is probably formed by the unfolding of the linker interactions with the domains. These features imply that the four domains of DPO4, though differing in size and topology, are prone to fold independently. Binding of DPO4 to DNA is an essential step in nucleotide incorporation (Fiala and Suo, 2004;Wong et al., 2008). Structural comparisons of DPO4 in apo form and DNA binary form have revealed that significant rotation and translation of the LF domain occur during DNA binding, while the domain structures remain unchanged (Wong et al., 2008). This 'open-to-closed' conformational transition in DPO4 is pivotal to the formation of a high-affinity DPO4-DNA complex prior to nucleotide binding and incorporation (Fiala and Suo, 2004;Sherrer et al., 2009). Previous experimental and simulation studies have suggested that the linker plays an important role in this conformational transition (Xing et al., 2009;Sherrer et al., 2012;Chu et al., 2014). Nevertheless, a thorough investigation of the roles played by DPO4 domain interactions in the modulation of DNA binding and protein conformational dynamics is still lacking. More importantly, it remains unclear how the folding and DNA binding of DPO4 can be optimized by the interactions in DPO4.
Here, we use structure-based models (SBMs) with a comprehensive procedure for parameterizing the strengths of intra-and interdomain interactions to investigate the folding and DNA-binding The 2D folding free energy landscape of DPO4 projected onto the fractions of total (QðTotalÞ) and interdomain (QðInterÞ) native contacts for the default SBM parameter 0 at the folding temperature T 0 f . There are six folding states identified on the free energy landscape, which are denoted by U, I 3 , I 0 3 , I 2 , I 1 , and N, with corresponding typical DPO4 structures shown. The all-atom representations of DPO4 were reconstructed based on the C a structures from the simulations (Rotkiewicz and Skolnick, 2008). Domains of DPO4 are labeled with different colors: the finger domain (F, blue, residues 11-77), the palm domain (P, red, residues 1-10 and 78-166), the thumb domain (T, green, residues 167-229), the little finger domain (LF, purple, residues 245-341), and the flexible linker (gray, residues 230-244), which connects the T and LF domains. (B) The 1D folding free energy landscape of DPO4 projected onto QðTotalÞ for different values of the ratio of interdomain to intradomain native contact strength, denoted by r and ranging from 0.5 to 1.5 (indicated by different colors). It is worth noting that the I 3 and I 0 3 states cannot be distinguished by QðTotalÞ. The black line corresponds to the free energy landscape for the default SBM parameter 0 . We set the zeros of the free energies at the lowest QðTotalÞ detected from the simulations. (C) Change in stability of each folding state for r relative to that for 0 at the corresponding folding temperature T f according to the expression DDFðÞ S ¼ DFðÞ S À DFð 0 Þ S , where S represents the folding state (I 3 , I 2 , I 1 , or U) and DFðÞ S is the stability of the folding state S at r. DFðÞ S is calculated as the free energy difference between the folding state S and the unfolded state U : DFðÞ S ¼ FðÞ S À FðÞ U , where F is the free energy obtained from (B). The online version of this article includes the following figure supplement(s) for figure 1: processes of DPO4. We find a monotonic increase in folding stability led by a decrease in the interdomain interaction strengths for all intermediate states during folding but not the folded states. This further underlines the importance of interdomain connections in maintaining the correct fold. Interestingly, we find that strengthening the interdomain interactions can result in an increase in folding cooperativity and the chances of backtracking (Gosavi et al., 2006). The interplay among folding stability, backtracking, and cooperativity leads to a 'U-shaped' interdomain interaction-dependent kinetics. Finally, we quantitatively characterize the role of a flexible domain interface in accelerating the fast DNA (un)binding to DPO4, which probably promotes DNA lesion bypass during DNA synthesis undertaken by a Y-family enzyme. Our results suggest that the topology and interactions of DPO4 have been optimized to achieve its fast folding and tight DNA binding, plausibly by evolutionary pressure. Our systematic investigation of the interactions in a multidomain protein provides a mechanistic understanding of the relationship between protein folding and function at the multidomain level and offers useful guidance for multidomain protein engineering.

Effects of interplay between intra-and interdomain interactions on DPO4 folding thermodynamics
SBMs are simplified models based on energy landscape theory (Bryngelson et al., 1995). They include only the interactions in the protein native structure and have proven successful in capturing protein folding mechanisms (Clementi et al., 2000). The essential assumption made by SBMs is that native contacts should determine the protein folding mechanism, which has been confirmed by allatom simulations (Best et al., 2013). Further systematic comparisons between coarse-grained SBMs and all-atom simulations have also shown high consistency in extensive predictions for the folding of proteins with diverse native topologies (Hu et al., 2017). These features, together with the reliability and computational affordability of SBMs, indicate that they are attractive models for investigating protein folding, especially in the case of complex large proteins. Therefore, in this study, we use a coarse-grained SBM to simulate the folding of DPO4.
To improve the sampling, we use replica-exchange molecular dynamics (REMD) to explore the thermodynamics of DPO4 folding (Sugita and Okamoto, 1999). We apply the default parameter in the SBM, which sets the same strength for all the native contacts ( ¼ 0 ¼ 1:0; details are presented in Materials and methods). We project the folding free energy landscape onto the fraction of native contacts, Q, which has been recognized as a good reaction coordinate for describing folding of the single-domain proteins with two-state manner by means of SBMs (Best and Hummer, 2005;Cho et al., 2006). In addition, there are previous studies using Q to describe the folding of various multidomain proteins (Li et al., 2012;Giri Rao and Gosavi, 2014;Inanami et al., 2014;Tanaka et al., 2015;Giri Rao and Gosavi, 2018), supporting the validity of Q as a reaction coordinate for describing DPO4 folding. However, more precise and detailed description of the multidomain protein folding process may require the involvement of more reaction coordinates. Here, we use the interdomain and total contact Q (QðInterÞ and QðTotalÞ) to describe DPO4 folding ( Figure 1A). The 2D free energy landscape reveals a complex DPO4 folding process with multiple intermediate states. The existence of these intermediate states in DPO4 folding is in good agreement with the results of temperature-induced unfolding experiments, which revealed more than one transition during the unfolding of DPO4 (Sherrer et al., 2012).
To see how DPO4 folds, we analyze the structures of DPO4 in the (meta)stable states indicated by free energy landscape minima from contact maps (Figure 1-figure supplement 1). This enables us to gain insight into domain and interface formation in DPO4 during folding. We see that U and N are the completely unfolded and fully folded states, with little and fully formed native contacts in DPO4, respectively; I 1 is an intermediate folding state in which DPO4 has only an unfolded F domain; DPO4 in the I 2 state further unfolds the T domain from I 1 ; in its I 3 and I 0 3 states, DPO4 possesses a similar QðTotalÞ but has a different QðInterÞ. From the contact maps, we find that there is only one folded domain in DPO4 in the I 3 or I 0 3 state: the LF domain in I 3 and the P domain in I 0 3 . Folding of the P domain can partly stabilize the formation of P-T and P-F interfaces within the DPO4 polymerase core (Silvian et al., 2001;Trincao et al., 2001), leading to an increase in QðInterÞ. By contrast, folding of the LF domain does not trigger any interdomain formation, probably because the LF domain is separated by the flexible linker, far from the DPO4 core Ling et al., 2001). Overall, DPO4 folding complies with a 'divide-and-conquer' framework (Wang et al., 2012a). Such a folding mechanism obtained from the SBM with default parameter 0 (the same native contact strength) is probably a consequence of the DPO4 topology (Baker, 2000), which exhibits many more intradomain contacts than interdomain ones (Appendix 2-table 1).
To see how a change in the balance between intra-and interdomain interactions in DPO4 may influence folding, we modulate the relative strength of these interactions in the SBM through the ratio ¼ Inter = Intra , where Intra and Inter are the strengths of the native contacts for intra-and interdomain interactions, respectively. In practice, this is implemented by changing only Inter while keeping Intra and the other parameters to the default as they are in a homogeneously weighted SBM (Clementi et al., 2000). Using a reweighting method (Cao et al., 2016;Li et al., 2018), we quantify the free energy landscapes for DPO4 folding by different values r. In Figure 1B, we see that changing r within a moderate range (0.7-1.3) does not alter the multistate characteristics of DPO4 folding, since the values of QðTotalÞ for all the (meta)stable states remain almost the same as those at 0 . Further decreasing or increasing r can distort the free energy landscape from that at 0 . In general, this involves an alteration of the DPO4 folding mechanism when the strength of interdomain interactions deviates significantly from the default value.
Change in interactions in proteins, for example, by means of the mutations, pH, and denaturants, can affect folding stability. Likewise, modulation of the strength of the intra-and interdomain interactions may change the stability of different states during DPO4 folding. In Figure 1C, we can see that all the intermediate states of DPO4 exhibit monotonic decreases in folding stability as the interdomain interaction strengths increase (DDF increases as r increases), although with different magnitudes. Since DPO4 in its intermediate states forms more intradomain contacts than interdomain ones, increasing the weight of interdomain interactions in the SBM is expected to weaken the folding stability in the intermediate states. DPO4 in the I 2 state possesses two large-sized P and LF domains, which are distal structural units and do not have interactions in between (Appendix 2table 1). Strengthening the interdomain interactions relatively weakens the intradomain interactions, thus leading to destabilization of the I 2 state, which exhibits the most significant decrease among the intermediates as r increases. DPO4 in the I 1 state is stabilized by both intra-and interdomain interactions ( Figure 1-figure supplement 1). Thus, increasing the interdomain interaction strength has less of a destabilization effect than that on the I 3 state, where only one domain in DPO4 is folded. An interesting nonmonotonic r-dependent behavior is observed in the N state, where weakening the interdomain interactions (decreasing r below~0.85) can instead lead to a decrease in folding stability. This is probably because the structures of DPO4 in the N state are maintained by cooperation between intra-and interdomain interactions. Increasing the relative weight of intradomain interactions in the SBM (decreasing r) cannot always increase the stability of the N state, since the fragile interdomain interactions may not be able to form domain interfaces within the DPO4 native structure.

Effects of interplay between intra-and interdomain interactions on the DPO4 folding mechanism
The 2D free energy landscape for DPO4 folding with the default SBM parameter 0 indicates that there are at least two potential folding pathways that go separately from I 3 and I 0 3 to I 2 and then to I 1 ( Figure 1A). Increasing and decreasing the strength of the interdomain interactions appear to enhance one of these two pathways (Figure 2-figure supplement 1). Very weak ( ¼ 0:5) or very strong ( ¼ 1:5) interdomain interactions can shift DPO4 folding completely to one pathway ( Figure 2A and Figure 2B), resonating with the findings from the 1D free energy landscape that a very high or low r can change the DPO4 folding mechanism. We further calculate the averaged QðInterÞ and QðTotalÞ and interestingly find that there are two regions exhibiting an increase followed by a decrease in QðInterÞ as QðTotalÞ increases ( Figure 2C). This observation may be a sign of folding 'backtracking,' which involves the formation, breaking, and refolding of a subset of native contacts as the protein proceeds from the unfolded to the folded state (Gosavi et al., 2006). As the REMD simulations have broken the kinetic connectivity of the folding trajectory, we perform additional kinetic simulations at constant temperature to assess the backtracking rigorously. Although a temperature (0:96T f ) lower than the folding temperature is used in the kinetic simulations with the aim of generating a sufficient number of folding events, we still observe the backtracking within many of the individual folding trajectories ( Figure 2D To address the origins of backtracking, we extract all the DPO4 structures where QðInterÞ shows a local maximum versus QðTotalÞ in the two backtracking regions and perform a structural analysis by calculating Q for each domain in DPO4. We find that DPO4 in backtracking region I (QðTotalÞ ¼ 0:15-0.35) probably has a folded T domain, while the other domains remain unfolded (bottom left in Figure 2E and Figure 2F). Since region I is located between the U and I 3 states, we deduce that backtracking is caused mainly by folding and subsequent unfolding of the T domain. On the other hand, DPO4 in backtracking region II (QðTotalÞ ¼ 0:4-0.7) has mainly either folded T and LF domains (top left in Figure 2E and Figure 2F) or a folded core formed from the F, T, and P domains (bottom right in Figure 2E and Figure 2F) during the transition from the I 3 or I 0 3 state to the I 2 and I 1 states. The bimodal distribution of DPO4 structures in region II indicates that       backtracking exists within both of the two folding pathways. Therefore, we suggest that the backtracking in DPO4 folding is led by unstable and fast domain folding and unfolding of the small-sized F and T domains (Figure 2-figure supplement 5).
Folding cooperativity measures how synchronous residues in a protein form native-like configurations during folding. It dictates the folding mechanisms of the single-domain proteins in two-state 'all-or-none' folding, folding with intermediates, and downhill folding (Muñoz et al., 2016). Here, we apply a thermodynamic coupling index (TCI) to quantify the folding cooperativity of the domains and interfaces in DPO4. TCI is a measure of the similarity between a pair of intra-and interdomain melting curves during DPO4 unfolding (Sadqi et al., 2006;Sborgi et al., 2015). A large (small) TCI leads to a high (low) degree of synchronous folding between the domains/interfaces and thus a high (low) folding cooperativity (TCI is defined in Materials and methods). In Figure 2G, we can see that there is high cooperative folding in the DPO4 core. This is consistent with the experimental evidence that the conserved DPO4 core is a stable structural unit that unfolds cooperatively (Sherrer et al., 2012). When the interdomain interaction is weakened, DPO4 folding cooperativity decreases ( Figure 2H). This trend is confirmed both when all domains/interfaces are taken into account and when only domains are considered. We also perform independent REMD folding simulations for individual domains of DPO4. This corresponds to the extreme case in which the interdomain interactions of the four domains of DPO4 are completely removed, reminiscent of the case ¼ 0. We observe an increase in the mean TCI (MTCI) as r increases from 0 to 0.5, indicating that the presence of interdomain interactions enhances cooperative folding among the four domains of DPO4. Interestingly, the relation between r and MTCI is not entirely monotonic, since MTCI decreases slightly after r increases beyond~1.0. This may be due to the following two reasons. (1) Increasing r tends to separate the folding curves of the unstable F domain and its associated F-P domain from the other parts of DPO4 (Appendix 3-figure 7). Removing the unfolding curves of the F domain and F-P interface when calculating MTCI can lead to a significant increase in its value as well as in the value of r at which MTCI reaches its maximum (Appendix 3-figure 8).
(2) Increasing r decreases the relative intradomain contributions in an SBM and may have different magnitudes of destabilization of the folding of different domains. Such a decrease in MTCI with increasing r can be found in applications to independent folding of individual domains with a reweighting method (Appendix 3- figure 9). Overall, we find that a moderate increase in interdomain interaction strength can significantly increase the folding cooperativity of DPO4 (for values of r in the range 0.5-1.0).

Effects of interplay between intra-and interdomain interactions on DPO4 folding kinetics
Previous studies have suggested that the topology of a protein's native structure is an important determinant of its folding rate (Plaxco et al., 1998;Koga and Takada, 2001). Interaction heterogeneity, which originates from the amino acid sequence, has also been recognized as affecting folding kinetics (Islam et al., 2004;Calloni et al., 2003;Szczepaniak et al., 2019). To see how the energetic factor in terms of intra-and interdomain interactions in DPO4 affects the folding rate, we perform 100 independent kinetic simulations starting from different unfolded configurations at room temperature T r for each r. In practice, we use the time at which Q reaches 0.75, termed as the first passage time (FPT) to represent the folding rate. We observe a 'U-shaped' r-dependent mean FPT (MFPT) behavior for DPO4 global folding kinetics ( Figure 3A). The value of r at which DPO4 global folding is fastest is 0.8, which is lower than the default parameter 0 . There are distinct effects of r on the kinetics of domain and interface folding. Intuitively, weakening interdomain interactions (decreasing r) should promote domain folding ( Figure 3B), since the increased folding stability of individual domains resulting from a relatively strengthening of intradomain interactions should accelerate the folding process (De Sancho et al., 2009;Garbuzynskiy et al., 2013). On the other hand, interdomain folding exhibits 'U-shaped' behavior with r, similar to global DPO4 folding. When interdomain interactions are weak (r in the range 0.5-0.9), the formation of interfaces between domains in DPO4 consumes more time than domain folding and thus is the rate-limiting step. Increasing r beyond 0.9 slows down interdomain formation. During DPO4 folding, domain interfaces use the folded domains as templates to proceed further. When r is high (!1.0), corresponding to the case of weak intradomain interactions, domain folding is slow and may become the rate-limiting step ( Figure 3B). Therefore, we investigate switching of the bottleneck for DPO4 folding between interface formation and domain folding through modulation of the intra-and interdomain interaction strengths.
In contrast to domain folding, we find that formation of domain interfaces and the linker show different r-dependent behaviors ( Figure 3C). The formation of the T-LF interface and the linker are significantly accelerated when r increases from 0.5 to 0.8. These two interdomain structures are critical to DPO4-DNA functional binding, where a large-scale 'open-to-closed' conformational transition with rearrangement of the T-LF interface and the linker has been observed between apo-DPO4 and DNA-binding DPO4 (Wong et al., 2008). The kinetic result implies that the DNA-binding dynamics of DPO4 may be facilitated at 0:8, where folding of DPO4 to the stable apo form is slowed down.
The folding order of elements in a protein dictates the kinetic folding pathway. Experimental determination of the domain folding order in DPO4 is challenging (Sherrer et al., 2012). To measure the folding order of the domains/interfaces and establish its connection to the interactions in DPO4, we calculate the probability of folding domain/interface I in step k, termed the folding order probability OP I k , for different values of r ( Figure 3D-F). We find a high chance of folding the domains prior  to forming the domain interface when r is very small ( ¼ 0:5, Figure 3D). In particular, domain folding at ¼ 0:5 approximately follows a deterministic route with T ! F ! LF ! P. Increasing r leads to more dispersed distributions of OP I k ( Figure 3E and F and Figure 3-figure supplement 1). The degree of dispersion of the OP I k distribution is quantified by calculating its standard deviation s OPk ( Figure 3G). We observe a significant decrease in s OPk with increasing r, in particular when 0:9. This indicates that the orders of folding and formation of the domains and interfaces become less deterministic as r increases, leading to more diverse folding pathways.
We also investigate the effects of temperature on folding kinetics by analyzing kinetic simulations performed at a temperature 0:96T f , which is the optimal temperature for growth of Sulfolobus solfataricus ( Figure 2). We find that most of the results obtained at this relatively high temperature, such as the 'U-shaped' r-dependent folding time and the more (less) deterministic folding order for lower (higher) r, are similar to those at room temperature T r . Interestingly, we observe a plateau in MFPT for the range of r from 1.2 to 1.5. This is probably due to the complex r-dependent behaviors of intra-and interdomain folding kinetics when r is high, although the optimum value of r for achieving the fastest folding is still 0.8, which is less than the default parameter of the SBM. In addition, the folding order is not the same as that at T r . The LF and P domains probably accomplish folding within the first two steps of DPO4 folding at 0:96T f when r is low (Figure 3). This may contribute to the elimination of backtracking when domains in DPO4 have a strong tendency to fold spontaneously (low r).

Effects of interplay between intra-and interdomain interactions on the DPO4-DNA binding function
As a DNA polymerase (Ohmori et al., 2001), DPO4 synthesizes DNA molecules by assembling nucleotides. An essential step in the action of DPO4 is its DNA-binding process. We investigate the effects of the interactions in DPO4 on its function in terms of DNA binding. To describe DPO4-DNA binding, we construct a 'double-basin' SBM by adding to the original SBM the F-LF interdomain contacts in DPO4 and the DPO4-DNA native contacts identified in the crystal of the DPO4-DNA binary structure using a similar protocol proposed previously (Whitford et al., 2007;Wang et al., 2012b;Chu et al., 2014). The 'double-basin' SBM takes into account the effects of DNA binding and aims to capture the large-scale 'open-to-closed' conformational transition in DPO4 during DNA binding. Here, we assess the efficiency and effectiveness of the DPO4-DNA binding process in terms of thermodynamic stability and kinetic rates.
We use umbrella sampling techniques to calculate the free energy landscape of DPO4-DNA binding for values of r ranging from 0.5 to 1.5 (details are presented in Materials and methods and Appendix 1). We find an increase in binding affinity with strengthening interdomain interactions (increasing r), and the value at ¼ 0:7 (simulated K d = 2.24 nM) is approximately equal to the experimental measurement of 3-10 nM ( Figure 4A, details of K d calculation are presented in Appendix 1) (Fiala and Suo, 2004;. This indicates that the interdomain interactions help to stabilize the DPO4-DNA complex. For all r, we observe a similar multistate DPO4-DNA binding process, which we divide into four binding states ( Figure        To see how the interactions in DPO4 modulate the kinetics of DNA binding, we focus on the individual transitions between neighboring states during the binding process ( Figure 4B). The transition from the US to the EC can be described approximately as a free diffusion of DPO4 to DNA molecules in the bulk. The diffusion coefficient thus controls the kinetics of this process. We perform simulations of free DPO4 in an infinite box for different values of r and then calculate the corresponding diffusion coefficient D ( Figure 4C and Figure 4-figure supplement 2). Although for low r, DPO4 exhibits excessive conformational flexibility (Figure 4-figure supplement 3), which is assumed to increase the hydrodynamic radius of the chain (Huang and Liu, 2009), we find little effect of the structural fluctuations on D. Our results indicate that the free spatial diffusion of a folded large multidomain protein is barely affected by interfacial domain formation. Therefore, the capture rate of DNA by DPO4 in the first step of the binding process should be similar for different interaction strengths of DPO4, leading to r-independent observations. Spatial proximity between DPO4 and DNA does not guarantee a successful transition from the EC to the IS, since DPO4 can dissociate from DNA through thermodynamic fluctuations. To investigate how the interactions of DPO4 influence the transition from the EC to the IS, we perform 200 independent kinetic simulations for different values of r. Each simulation starts from random configurations of DPO4 and DNA in the EC and ends when DPO4 binds with the DNA as the IS or when DPO4 completely dissociates from the DNA as the US. We calculate the rates associated with the EC, K Evo and K Dis , using a kinetic framework proposed previously (Huang and Liu, 2009). The encounter time, which is defined as K Dis =K Evo þ 1, measures the time for DPO4 to achieve one successful transition from the EC to the IS. We find that the encounter time increases significantly as r increases from 0.5 to 0.7 and then becomes more or less constant for >0:7 ( Figure 4D). This indicates that the weakly formed and flexible domain interface of DPO4 facilitates the DNA-binding process. We further find a strong correlation between the degree of conformational fluctuation of DPO4, quantified by the root-mean-square fluctuation RMSF, and DNA-binding encounter times. These results confirm the roles of domain interfacial fluctuations of DPO4 led by low r in promoting DNA binding.
Direct simulations of the transitions between the IS and the BS are expected to be computationally demanding owing to the high barriers between these two states. Instead, we calculate the kinetic rates by performing so-called frequency-adaptive metadynamics simulations (Wang et al., 2018), for which the computational expense is significantly lower (details are presented in Materials and methods and Appendix 1). We find that the transition times calculated from the metadynamics simulations are strongly correlated with the barrier heights measured by the umbrella sampling simulations for the transitions between the IS and the BS (Figure 4-figure supplements 4 and 5). The consistency between thermodynamics and kinetics resonates with the findings of previous work, where a quantitative relationship between the barrier heights and binding rates in both ordered and disordered protein-binding processes has been established (Cao et al., 2016). We find a monotonic increase in the kinetic times for both of the transitions between the IS and the BS as r increases ( Figure 4E). In particular, significant deceleration of the transition from the IS to the BS is found as the interdomain interactions in DPO4 become stronger. By contrast, the effect on the transition rates from the BS to the IS led by r appear to be minor. We find a strong r-dependent conformational distribution of DPO4 in the IS, where DPO4 in apo and intermediate forms is dominant for high and low r, respectively. The population of DPO4 in the apo form in the IS hinders the conformational dynamics of transformation of DPO4 to the DNA binary form and thus is disfavored in the binding transition to the BS. On the other hand, DPO4 is almost entirely in the DNA binary form in the BS (Figure 4-figure supplement 1). Escape from the BS should have little dependence on the conformational dynamics of DPO4. Therefore, the transition rates between the IS and the BS are dependent on DPO4 conformational dynamics, which is modulated by its inherent interactions. It is worth noting that when ! 0:8, the IS state, rather than the BS state, becomes the most stable in DPO4-DNA binding (Figure 4-figure supplement 4), and the transition time from the IS to the BS is much longer than that from the BS to the IS. Although there are quantitative discrepancies between the thermodynamic and kinetic results (Figure 4-figure supplement 5), our results lead to the conclusion that to achieve and maintain the conformation that underpins its biological function, DPO4 has to avoid strong interdomain interactions, even when they are purely native.

Discussion
Thermal denaturation experiments revealed that unfolding of truncated DPO4 mutants, such as the DPO4 core and LF domains, proceeds via cooperative processes (Sherrer et al., 2009). Stoppedflow Fö rster resonance energy transfer (FRET) studies monitoring intradomain  and interdomain (Xu et al., 2009;Maxwell et al., 2014) conformational dynamics of DPO4 during DNA binding as well as nucleotide binding and incorporation revealed weak and strong interactions between and within each domain of DPO4, respectively. Using only topological information, an SBM with homogeneously weighted native contacts predicted a 'divide-and-conquer' domainwise folding scenario for DPO4 folding (Wang et al., 2012a;Chu et al., 2020). These results suggested that domains in DPO4 can fold without any aid from other domains. Previous studies showed that many multidomain proteins have stable domains that can fold independently (Scott et al., 2002;Steward et al., 2002;Robertsson et al., 2005). From a structural perspective, these proteins exhibit a lack of densely packed domain interfaces, so their interdomain interactions should be minimal (Han et al., 2007). Nevertheless, the relatively weak interdomain interactions may still play an important role in the multidomain protein folding process, since many proteins with independently folding domains are found not to fold in a 'sum of the parts' manner (Levy, 2017). Here, we have investigated the effects of the interplay between intra-and interdomain interactions in DPO4 on the thermodynamics and kinetics of folding. The incorporation of strength heterogeneity into the intraand interdomain contact interactions in an SBM has enabled a quantitative investigation into how DPO4 can modulate its inherent interactions to maximize folding efficiency.
We have characterized the critical role of interdomain interactions in controlling the continuum shift of the DPO4 folding mechanism in noncooperative unstable folding, fast folding, and highly cooperative 'all-or-none' folding ( Figure 5, left panel). Folding of multidomain proteins resembles the binding of proteins (domains) to form complexes. Two extreme cases can be outlined. One is the docking of rigid domains, and the other is the concomitant folding and binding of domains, reminiscent of binding-coupled folding in intrinsically disordered proteins (IDPs) (Sugase et al., 2007;Dyson and Wright, 2002). Which of these two mechanisms is involved in multidomain protein folding should depend on the interplay between the folding tendency of domains and the binding strength between domains (Levy et al., 2004;Levy et al., 2005). For DPO4, we have found that a decrease in the interdomain interaction strength from that in the default SBM to a value of ¼ 0:8 can lead to the fastest DPO4 folding rate. In the presence of weak interdomain interactions, the Figure 5. Illustration of intra-and interdomain interactions in the trade-off between DPO4 folding and function. The four domains of DPO4 are indicated using the same color scheme as in Figure 1A. The sizes of the arrows indicate the magnitudes of the rates or fluxes for folding (binding). DPO4 and DPO4-DNA binding complexes shown in lighter shades are less stable or populated than the others. stability of the on-pathway folding intermediate states is enhanced. This can promote a deterministic folding order of the domains in DPO4 to help eliminate backtracking, which usually acts as a kinetic trap during folding (Capraro et al., 2008;Gosavi et al., 2008). However, if interdomain interactions are very weak, the folded states are destabilized, resulting in failed DPO4 folding. By contrast, strong interdomain interactions tend to couple domain folding and interface formation to fold the DPO4 as a whole. The 'all-or-none' DPO4 folding shows high cooperativity that disfavors the formation of intermediate states, so the folding order of DPO4 domains is not clearly defined. Besides, the domains in DPO4 in the presence of strong interdomain interactions are destabilized by the relatively weak intradomain interactions, which induces more backtracking. Collectively, the fast folding of DPO4 requires weak interdomain interactions, so that DPO4 can efficiently use the 'divide-andconquer' strategy to fold via modest cooperativity and limited backtracking.
By investigating DPO4-DNA binding, we have addressed the roles of weak interdomain interactions of DPO4 in facilitating DNA binding. Our results show that weak interdomain interactions can induce massive conformational flexibility of DPO4 in favor of anchoring DPO4 to DNA ( Figure 5, right panel). The result is reminiscent of the 'fly-casting' effect in the binding process (Shoemaker et al., 2000), where the roles of conformational disorder can be appreciated (Huang and Liu, 2009;Levy et al., 2007;Umezawa et al., 2016;Chu and Muñoz, 2017). Structural analysis has revealed that DPO4 in the IS is not in the DNA binary form (BS in Figure 4B) and thus is functionally inactive. At the same time, DPO4 is almost correctly located at the primer-template junction of the DNA substrate. Therefore, the transition from the IS to the BS is mainly related to the large-scale 'open-to-closed' DPO4 conformational transition. Experimentally, the existence of switching from the nonproductive to the productive DPO4-DNA complex has also been observed by single-molecule fluorescence resonant energy transfer (FRET) (Brenlla et al., 2014; and has been proposed to be essential in completing the DPO4-DNA binding process (Raper et al., 2018). Here, we have found that a decrease in interdomain interaction strength can significantly accelerate the transition from the IS to the BS, thus favoring subsequent DNA replication. On the other hand, an effect of weak interdomain interactions in facilitating the transition from the BS to the IS has also been observed, but tends to be minor. As a prototype Y-family DNA polymerase, DPO4 is capable of catalyzing translesion synthesis (TLS) across various DNA lesions (Ohmori et al., 2001;Sale et al., 2012). Weak interdomain interactions can accelerate both of the transitions between the functionally inactive IS and the active BS. Thus, they promote the function of DPO4 as a polymerase for bypassing DNA damage during TLS, which would otherwise stall DNA synthesis in vivo by highfidelity DNA polymerases (Kunkel and Bebenek, 2000). It is worth noting that in reality, both very weak and very strong interdomain interactions are not favored in DPO4-DNA binding. Very weak interdomain interactions lead to high conformational flexibility in DPO4 (Figure 4-figure supplement 3), resulting in very-low-affinity binding of DNA, which has been widely observed in IDPs (Wright and Dyson, 1999;Dunker et al., 2001). On the other hand, very strong interdomain interactions quench the conformational dynamics of DPO4, thereby slowing the transitions during the DNA-binding process and eventually populating the binding complex at the nonproductive IS rather than the productive BS.
We see that the default SBM with homogeneously weighted intra-and interdomain interactions can lead to many consistencies in describing the DPO4 folding and DNA-binding processes with experiments. The intermediate state caused by stable individual domain folding during DPO4 folding from the default SBM simulations resonates with the experimentally observed unfolding intermediate (Sherrer et al., 2012). In addition, simulations with the default SBM have described a multistate process for DPO4-DNA binding, in good agreement with the results of previous experiments, where the complexity of the processes, including multistep DNA binding and multistate DPO4 conformational dynamics, was revealed Lee et al., 2017). These features are manifestations that the native topologies of DPO4 and the DPO4-DNA complex are major elements in determining the mechanisms of folding (Clementi et al., 2000) and binding (Levy et al., 2004), respectively. However, there is also evidence indicating that SBM with considering only the topological factors, may not accurately capture the DPO4 conformational dynamics. The SBM simulations with various parameterizations on interdomain interaction strength show that DPO4 folding is not the most efficient when the strengths of intra-and interdomain interactions are in the same weights. Furthermore, the default SBM has led to a significantly enhanced affinity of DPO4-DNA binding and stabilized more on the functionally inactive state IS rather than the functionally active state BS, in contradiction with the experiments. These facts imply that the default SBM fails to result in a fast-folding process of DPO4 and generate the correct functional DPO4-DNA binding. Considering both the DPO4 folding and the DPO4-DNA binding results, we have suggested that the weak interdomain interactions in DPO4 are the key to the trade-off between DPO4 folding and function. The value of r for achieving fast folding and efficient DNA (un) binding appears to be below 1.0. By assuming that DPO4 has structurally evolved in nature to optimize its folding and function, we suggest that the SBM with default parameter 0 overweights the strength of interdomain interactions in its potential. This suggestion seems to be reasonable since within the domains of DPO4, there are a large proportion of conserved hydrophobic residues (Ling et al., 2001), which should form stronger interactions inside domains than the ones between domains. We anticipate that the SBM can be further improved by incorporating the energetic heterogeneity of native contacts (Cho et al., 2009) or by benchmarking available experimental observations and all-atom simulations (Ganguly and Chen, 2011;Jackson et al., 2015).
One interesting observation from our simulations is that DPO4 folds with backtracking, which has been found to be related to the fast and unstable folding followed by the unfolding of the F and T domains. Structural analysis has revealed that the F and T domains are significantly smaller than the corresponding domains in high-fidelity DNA polymerases (Ling et al., 2001). Thus, the interactions between these two domains and DNA are significantly reduced, providing an explanation of the ability of DPO4 to accommodate and to bypass various DNA lesions (Ling et al., 2003;Vaisman et al., 2005). Here, we have suggested that backtracking can help DPO4 to perform TLS by using the structural fluctuations of the domains. This can be undertaken by DPO4 adjusting its binding conformation through opening the active site when it encounters a bulky DNA lesion, and such an adjustment may enhance the binding of the damaged DNA to the DPO4 (Mizukami et al., 2006). For example, DPO4 binds DNA containing benzo[a]pyrene-deoxyguanosine and allows the bulky lesion to be flipped/looped out of the DNA helix into a structural gap between the F and LF domains (Bauer et al., 2007). In addition, the structure of DPO4 with DNA containing 8-(deoxyguanosine-N 2yl)-1-aminopyrene (dG1,8) reveals that the dG moiety of the bulky lesion projects into the cleft between the F and LF domains of DPO4 (Vyas et al., 2015). These structural characteristics, differing from those of DPO4 binding to undamaged DNA, provide the dynamic basis for TLS and are probably favored by the fluctuating F and T domain conformations when binding to DNA with backtracking. However, we also note that when r is below 1.0 (weak interdomain interactions), the backtracking is not very populated, indicating limited fluctuations in the F and T domains. This may help to maintain the conformational rigidity of the F domain in the DPO4-DNA binding complex, which has been deemed to contribute to the low-fidelity DNA polymerization of DPO4 (Wong et al., 2008). Therefore, we speculate that DPO4 may optimize the extent of backtracking represented by the conformational fluctuations in the F and T domains to promote the binding of damaged DNA.
Our results have the important implication that DPO4 has naturally evolved to favor simultaneously folding and function. For folding, the high structural modularity in DPO4 has led to the high thermodynamic modularity that allows an efficient 'divide-and-conquer' mechanism by significantly reducing the number of configurations that need to be searched during folding (Wang et al., 2012a). Furthermore, the fastest folding of DPO4 is achieved when the interdomain interactions are weaker than they if they were determined simply by topology. This corresponds to the fact that the evolutionary pressure has acted on the DPO4 sequence to place more hydrophobic residues inside domains rather than on the domain interfaces (Ling et al., 2001). For DNA binding, weak interdomain interactions in DPO4 promote functional conformational transition through domain movement and facilitate all transitions throughout the DNA-binding process in favor of DNA replication and lesion bypass. Therefore, the natural evolution of DPO4 requires optimization of its protein sequence to form a structure composed of independently folded domains with hydrophobic cores to handle the folding and DNA-binding processes.
Our theoretical predictions can be potentially assessed by targeted biophysical experiments. In these, it should be straightforward to change the interactions in DPO4 via site-directed mutations and determine their effects on DPO4 folding and DNA binding. Although the positively charged linker has been targeted as one of the widely investigated mutation sites in DPO4 (Sherrer et al., 2012), we argue that ionic interactions can be nonspecific and long-ranged, which significantly changes the direct binding interactions with DNA (Ling et al., 2001). These features would lead to difficulties in delineating the effects of changes in internal interactions on the DPO4 folding and binding processes. In this context, more attractive mutations in DPO4 would be those that disrupt the hydrophobic interactions within the domains to mimic the effect of the weakening of intradomain interactions (i.e. relatively strengthening of interdomain interactions) in theoretical studies. Subsequently, the well-developed experimental approaches for DPO4 can be used to investigate the kinetics and mechanism of DPO4 folding, DNA binding, and nucleotide binding and incorporation. For instances, the melting circular dichroism (CD) spectroscopy used in our previous study for monitoring the temperature-dependent melting of wild-type DPO4 and its various truncation mutants can be easily adapted to examine the alterations of the folding cooperativity and folding order of DPO4's domains by using site-directed mutagenesis and protein engineering (Sherrer et al., 2012). We expect to see a more cooperative folding of DPO4 with more synchronous melting curves of different DPO4 truncation mutants through weakening the intradomain interactions. In addition, the previously designed stopped-flow and single-molecule FRET experiments revealed a dynamical conformational equilibrium of DPO4 during DNA binding (Xu et al., 2009;. These FRET-based techniques have further measured the kinetic rates of DPO4 interconversion between different states during DNA binding (Raper et al., 2018). The mutations designated to destabilize the intradomain interaction in DPO4 can potentially promote the effective interdomain interaction. Thus, according to our simulations, the conformational dynamics of DPO4 on DNA is expected to slow down because of shifting the equilibrium toward the catalytically incompetent complex. These expectations can be verified through future FRET experiments. Furthermore, the structural determination of DPO4 in complex with a damaged DNA substrate and an incoming nucleotide (Ling et al., 2001;Vaisman et al., 2005;Ling et al., 2004b;Bauer et al., 2007;Ling et al., 2004a;Vyas et al., 2015) can provide insights into the effects of backtracking caused by weakening intradomain interactions via mutations, which disrupt the structures of the ternary complexes.
Our modeling and simulations are applicable to various Y-family DNA polymerases, and we expect similar findings to be obtained for these. This expectation is based on the fact that all Y-family polymerases share structurally conserved architecture and sequence homology (Ling et al., 2001;Silvian et al., 2001;Trincao et al., 2001;Uljon et al., 2004). For polymerases in other families, the results of applying our approach may be substantially different, because these polymerases possess significantly different structural architectures to Y-family polymerases. For example, the F domains in high-fidelity DNA polymerases are much larger than those in the Y-family enzymes (Ling et al., 2001). A large-scale conformational change in the F domains of replicative DNA polymerases upon nucleotide binding is thought to be responsible for their high-fidelity DNA synthesis (Swan et al., 2009;Johnson, 2010;Prindle et al., 2013;Bę benek and Ziuzia-Graczyk, 2018). However, such a change is not observed with Y-family DNA polymerases. These features imply that differences in topology of multidomain polymerases lead to different folding scenarios and different biological functions. We anticipate that models with a wide range of parameters can be applied when investigating other multidomain proteins and can provide a promising way to characterize the trade-off between folding and function. Our results can offer useful guidance for protein design and engineering at the multidomain level. We used an SBM to investigate the folding and DNA binding of DPO4 with molecular dynamics simulations. In SBMs, which are based on funneled energy landscape theory (Bryngelson et al., 1995), it is assumed that it is the native topology of the protein/complex that determines the folding (Clementi et al., 2000) and binding mechanisms (Levy et al., 2004). SBMs can be verified by comparing simulation results with experiment measurements in terms of identifying the intermediate states, f values, folding pathways, etc. (Clementi et al., 2000). Here, we applied a coarse-grained SBM that used one C a bead to represent one amino acid in DPO4 and three beads to represent the sugar, base, and phosphate groups of one nucleotide in DNA. For the DPO4 folding, we used a plain SBM potential V apoDPO4 SBM based on the crystal structure of apo-DPO4 (PDB: 2RDI) (Wong et al., 2008). V apoDPO4 SBM is made up of the bonded interactions, including bond stretching, angle bending, and dihedral rotation, as well as the nonbonded interactions (Clementi et al., 2000). To see the effects of intra-and interdomain native interactions on the DPO4 folding and DNA-binding processes, we further introduced a prefactor in front of the intra-and interdomain nonbonded terms to modulate the strength of the corresponding interaction. Thus, V apoDPO4 SBM can be expressed as follows:

Materials and methods
where V Intra SBM , V Inter SBM , and V Linker SBM are the nonbonded potentials for intradomain, interdomain, and linker interactions, respectively. We used a ratio ¼ Inter = Intra to control the relative weight of the intraand interdomain interactions. Since the linker is found to interact extensively with the other four domains in DPO4 (Table S1), we here grouped the interactions of linker into the interdomain interactions, so Linker Inter was used throughout our simulations.
To explore DPO4 folding, we used replica-exchanged molecular dynamics (REMD) (Sugita and Okamoto, 1999) with the default parameters, where all prefactors are equal to 1.0 ( ¼ 0 ¼ 1:0). Reduced units, except for the length unit (nanometers, nm), were used throughout the simulations. The weighted histogram analysis method (WHAM) (Kumar et al., 1992) was then applied to calculate the thermodynamics of DPO4 folding, including heat capacity curves, free energy landscapes, and domain/interface melting curves, among other things. The melting curve was further fitted by a sigmoidal function that provides the (un)folding probability along the temperature P I ðTÞ ( hj½P I ðTÞ À P J ðTÞji ( ) ; where N I;J is the summed number of pairs I,J (Sadqi et al., 2006). A large (small) TCI corresponds to high (low) synchronous folding of the domains/interfaces and thus a high (low) folding cooperativity of DPO4. We used a reweighting method based on the principles of statistical mechanics to efficiently calculate the thermodynamics of DPO4 folding at other values of r from the REMD simulations performed at 0 (Cao et al., 2016;Li et al., 2018). The algorithm was implemented as follows. The probability of one state having potential E and reaction coordinate r with parameters 0 and r at temperature T can be written as pðEð 0 Þ; rÞ ¼ nðrÞ exp À Eð 0 Þ kT ; pðEðÞ; rÞ ¼ nðrÞ exp À EðÞ kT ; where nðrÞ is the density of states. nðrÞ is intrinsic to the system, and thus is independent of r (Chu et al., 2013). Therefore, pðEðÞ; rÞ can be calculated by reweighting pðEð 0 Þ; rÞ as follows: pðEðÞ; rÞ ¼ pðEð 0 Þ; rÞ exp À EðÞ À Eð 0 Þ kT : Since pðEð 0 Þ; rÞ has been calculated by analyzing the REMD simulations at 0 with WHAM, we used the above equation to calculate pðEðÞ; rÞ. Eventually, the equilibrium properties for different values of r (free energy landscapes and averaged QðInterÞ along with QðTotalÞ, TCI, etc.) can be obtained. The reweighting method has been proven to be effective and accurate in characterizing many other protein dynamics processes, including many-body interactions in protein folding (Ejtehadi et al., 2004), multidomain protein folding (Li et al., 2018), and protein-protein binding (Cao et al., 2016). These processes often require elaborate calibration of parameters in the SBM potentials, so they are always computationally expensive. To verify the results from the reweighting method, we also performed the REMD simulations for four different values of r (0.5, 0.8, 1.2, and 1.5) in V apoDPO4 SBM . The high degree of consistency between the results of these two approaches confirms the reliability of the reweighting method (Appendix 3- figure 2).
To identify the backtracking and calculate the time for DPO4 folding, we performed additional kinetic simulations that ran at constant temperature. For comparing the kinetic results obtained at different r, it is essential that the kinetic simulations be performed under identical environmental conditions, that is, at identical temperatures. Since T f also changes with r, we shifted the folding temperature T f at the parameter r to that at 0 (T 0 f ). This was done by inserting the ratio T 0 f =T f in front of V apoDPO4 SBM . The rationale for this is based on the fact that the solvent effects in SBMs are linearly dependent on temperature. We examined the folding temperature with the rescaled V apoDPO4 SBM for four different values of r (0.5, 0.8, 1.2, and 1.5) by REMD simulations and confirmed the validity of our implementation (Appendix 3- figure 3). Furthermore, as the DPO4 folding at T f is too slow to be represented by the kinetic simulations, we performed the simulations at two temperatures lower than T f . These were the optimal temperature T p for the growth of Sulfolobus solfataricus and the room temperature T r of the experiments. The simulation temperatures can be approximately deduced from the linear relation T p;r ðsimÞ » T p;r ðexpÞ Â T f ðsimÞ T f ðexpÞ ; where T f ðexpÞ and T f ðsimÞ are the DPO4 folding temperatures in experiments (369 K) and simulations (1.13, in energy units). T f ðsimÞ was characterized as the temperature where the heat capacity curve shows a prominent peak (Appendix 3- figure 1). With the experimental temperatures T p ðexpÞ ¼ 353 K and T r ðexpÞ ¼ 300 K, we obtain the corresponding temperatures in the simulations: T p ðsimÞ ¼ 0:96T f ðsimÞ ¼ 1:08 and T r ðsimÞ ¼ 0:81T f ðsimÞ ¼ 0:92. For DPO4-DNA binding, we used a short DNA segment that is present in the binary DPO4-DNA PDB crystal structure (PDB: 2RDJ) (Wong et al., 2008). As DPO4 exhibits a large-scale 'open-toclosed' transition from apo to DNA binary structures (Wong et al., 2008), we built a 'double-basin' SBM for DPO4 by adding the specific contacts within DPO4 at the binary structure as the potential V binaryDPO4 SBM (Chu et al., 2014). These contacts are found to be entirely located at the interface between the F and LF domains. It is worth noting that we did not use this 'double-basin' SBM for investigating DPO4 folding, where the potential of the SBM (V apoDPO4 SBM ) has only a 'single basin' in the apo-DPO4 structure. This is because (1) in the absence of DNA, DPO4 is mostly in apo form, and the transition rate for DPO4 from apo form to DNA binary form is very slow Lee et al., 2017), so DPO4 is prone to fold to its apo form without DNA, and (2) the contacts formed at the F-LF interface can be regarded as a consequence of DNA binding , so the formation of these contacts reflects the effect of DNA binding. On the other hand, DNA was kept frozen throughout the simulations. Therefore, the potential for DPO4-DNA binding is expressed as where V binaryDPO4 SBM is the nonbonded potential (native contact) within DPO4 existing only in the DPO4-DNA binary structure, V DPO4ÀDNA SBM is the native contact potential between DPO4 and DNA, and V DPO4ÀDNA Elc is the electrostatic potential. V binaryDPO4 SBM and V DPO4ÀDNA SBM are SBM terms and provide the driving forces for the formation of the functional DPO4-DNA complex .
Elc is mostly non-native, except when there is a native contact formed by two oppositely charged beads. Electrostatic interactions are known to play important roles in fast 3D diffusion  and in facilitated 1D search on DNA (von Hippel and Berg, 1989).
Only Arg and Lys in DPO4 were modeled to carry one positive charge, while and Asp and Glu in DPO4 and the phosphate pseudo-bead in DNA were modeled to carry one negative charge. In practice, we used the Debye-Hü ckel (DH) model, which considers the ion screening effect of a solvent to describe electrostatic interactions (Azia and Levy, 2009). The DH model was scaled to the strength at which two oppositely charged beads located at 0.5 nm would form the same strength as the native contact (1.0). This energy balance between the electrostatic and native contact interactions in SBM was previously suggested (Azia and Levy, 2009) and subsequently applied to a wide range of systems (Chu et al., 2012). Furthermore, we also used the rescaled V apoDPO4 SBM that led to the same DPO4 folding at room temperature with different values of r for investigating DPO4-DNA binding. Finally, we performed the DPO4-DNA binding simulations at room temperature. Details of the models can be found in SI and our previous work (Chu et al., 2012).
We used umbrella sampling simulations to calculate the free energy landscape of DPO4-DNA binding for different values of r. For a protein folding SBM, the fraction of native contacts Q is a typical reaction coordinate (Cho et al., 2006) and therefore is usually used for applying the biasing potential. For DPO4-DNA binding, the fraction of interchain native contacts Q DNA is an intuitively obvious candidate for performing umbrella sampling. However, as noted elsewhere (Cao et al., 2016;Chu and Wang, 2019), Q DNA cannot be used to distinguish among different unbound states, which all have Q DNA ¼ 0. This would lead to great difficulty in accelerating the diffusion stage of binding and thereafter establishing the free energy for the unbound states. Instead, we chose the distance root-mean-square deviation of native contacts between DPO4 and DNA (d RMS ) to perform the umbrella sampling simulations (Chu and Wang, 2019). d RMS is given by where N is the number of summed native contacts, r ij is the distance between pseudo-beads in DPO4 and DNA forming native contacts, and s ij is the value of the corresponding distance at the native structure. d RMS has a minimum value of 0 nm for the native DPO4-DNA binary structure and deviates from 0 nm as unbinding proceeds. DPO4-DNA binding was described by the three different processes shown in Figure 4B. The calculations of kinetics were done by performing three additional simulations: free diffusion of DPO4, the transition from the EC to the IS, and the transition between the IS and BS. The free diffusion of DPO4 was characterized by the simulations of free DPO4 for different values of r. The simulations for estimating the encounter times at the EC-to-IS transition started from different EC complex configurations (defined by forming one native binding contact between DPO4 and DNA) and ran until the arrival of the IS or the complete dissociation of DPO4 from DNA.
Since a high barrier is detected between the IS and BS ( Figure 4A), especially when r is high, the corresponding transitions are expected to be very slow. This makes the direct transition between the IS and BS inaccessible by the kinetic simulation. We used a kinetic rate calculation framework based on enhanced sampling simulations (Tiwary and Parrinello, 2013;Salvalaglio et al., 2014). Specifically, we applied frequency-adaptive metadynamics simulations to investigate the transition between the IS and BS (Wang et al., 2018). We calculated the transition times for different values of r and found strong correlations with the kinetics inferred from the thermodynamic free energy landscapes (Figure 4-figure supplements 4 and 5). The result showed the consistency between the thermodynamic and kinetic simulations Cao et al., 2016). Further details of the frequency-adaptive metadynamics simulations can be found in SI Appendix 1. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. replicas attempted to exchange at every 1000 MD steps following the Metropolis criterion. Each replica ran for 1 Â 10 9 MD steps long. We observed reasonable exchanging probabilities between neighboring replicas, which are all higher than 0.2, indicating a good sampling. Finally, WHAM was used to collect all the replicas and calculate the thermodynamics of the DPO4 folding (Kumar et al., 1992). . It is worth noting that the P domain is made up of two sequentially separated segments (residues 1-10 and residues 77-166). We practically added a 5-residue long segment that has all glycines, linking the residue 10 to 77 (Appendix 3-figure 10). Such a long segment only has bond stretching and non-bonded repulsive potential, which has little effect on folding of the P domain. REMD simulations were performed for all the four domains independently.

Author contributions
Kinetic simulations for the DPO4 folding were performed for different r under the constant temperatures T p and T r , respectively. For each , we set up 100 independent simulations with a maximum length of 4 Â 10 8 MD steps at T p and 0:2 Â 10 8 MD steps at T r , starting from different unfolded DPO4 conformations. The individual simulations were terminated when DPO4 accomplished folding, or the maximum MD steps was reached. We found that DPO4 can accomplish folding in most of the simulations (more than 91 out of 100 simulations for all .). Finally, the first passage time (FPT) was collected by the criterion that the corresponding Q firstly exceeds 0.75, to represent the folding rate.
We used umbrella sampling to quantify the free energy landscape of DPO4-DNA binding. d RMS of the DPO4-DNA binding native contact was applied as the reaction coordinate and 151 windows range from 0.0 nm to 15.0 nm were practically used. The simulation at each window was performed for 0:2 Â 10 8 MD steps. After the first round of umbrella sampling simulation, the probability distributions of d RMS between 3.2 nm and 3.3 nm have little overlap when is large. We therefore added 10 more windows in the range of 3:2~3:3 to ensure the sufficiency of sampling. Finally, we performed four same sets of umbrella sampling simulations (except that different initial configurations were used for different sets of umbrellas sampling). All the data were collected and analyzed by the WHAM (Kumar et al., 1992).
We calculated the theoretical binding affinity K d from the binding free energy landscape. We simplified the DPO4-DNA recognition to a two-state association and dissociation process: ½DPO4 þ ½DNA ÀÀ* )ÀÀ ½Complex , where the 'Complex' is the (meta)stable binding complex that contains both the BS and IS. The binding affinity K d can be calculated by: , where P b is the probability of the binding complex and ½DPO4 0 is the total concentration of DPO4 in the system. In our simulation, we used a periodic cubic box with length of 20 nm, so N A Â ð20 Â 10 À9 Þ 3 mol=L ¼ 0:21 Â 10 À3 mol=L , where N A is the Avogadro constant. P b can be calculated from the binding free energy landscape with the following expression: for separating the IS and BS at ¼ 0:7, when the theoretical K d approximates to the experimental K d . We finally set d b=u rms ¼ 3:25 nm and obtained theoretical K d ¼ 2:24 nM, which is close to the experimental K d of 3-10 nM (Fiala and Suo, 2004;Sherrer et al., 2009).
To calculate the diffusion coefficient of free DPO4, we performed 10 independent simulations starting from folded states of DPO4 for each r. Every simulation ran for 1 Â 10 8 MD steps. The free diffusion coefficient D of DPO4 was calculated by applying the command 'g msd' implemented in Gromacs (4.5.7) (Hess et al., 2008).
To calculate the encounter times for the transition from EC to IS, we performed 200 independent simulations starting from the DPO4-DNA configurations at EC, where only one native contact between DPO4 and DNA was formed. The simulation ran until the arrival of IS or complete dissociation of DPO4 from DNA.
To calculate the kinetics between the IS and BS, we used the frequency adaptive metadynamics approach (Wang et al., 2018). Metadynamics has been shown to be not only a powerful enhanced sampling method to efficiently explore the complex free energy landscape (Laio and Parrinello, 2002), but also a reliable approach to estimate the kinetic rate for a basin-to-basin transition (Tiwary and Parrinello, 2013). The acceleration factor by metadynamics was found to be: , where V bias ðtÞ is the biasing potential and the average is over a metadynamics simulation run up until the simulation time t sim . The kinetic calculation by metadynamics simulation is accurate if adding the bias is sufficiently infrequent so that the barrier region is not affected by the biasing potentials (Tiwary and Parrinello, 2013). To overcome the computational expenses laid by the infrequent bias adding, we applied an efficient approach, that is, frequency adaptive metadynamics simulation (Wang et al., 2018), where a strategy of the fast filling up the basin followed by the infrequent bias near the barrier was proposed.
Practically, we used d RMS as the collective variable to add the biasing potential in metadynamics simulations. The height of the Gaussian biasing potential was set to 1.0 for the transition from BS to IS and changed to 2.0 for the transition from the IS and BS when ! 1:0. The well-tempered metadynamics simulation was applied with bias factors varying from 2.0 to 100.0 (increasing with ) ( Barducci et al., 2008). The initial deposition time for adding a bias is set to 1000 MD steps and increased to the maximum of 2 Â 10 5 MD steps, leading to the infrequent metadynamics. The details of the frequency adaptive metadynamics simulation can be found here (Wang et al., 2018).
We performed 100 independent frequency adaptive metadynamics simulations for the forward and backward transitions between the IS and BS for each r. A simulation for the transition from the IS to BS was deemed to a successful transition event when d RMS is smaller than 0.1 nm, while a simulation for the transition from the BS to IS was deemed to a successful transition even when d RMS is larger than 2.5 nm. The calculation of the transition time t trans was verified using a Kolmogorov-Smirnov (KS) test (Salvalaglio et al., 2014) to examine whether the cumulative distribution of the transition time obeys a Poisson distribution. We found all the p-values were higher than 0.05, confirming the reliability of the calculation (Salvalaglio et al., 2014).
We also used free energy landscape to estimate the times for the transitions between the IS and BS based on the energy landscape theory (Socci et al., 1996). The calculation procedure of t Ã trans is described as follows. In principle, t Ã trans can be calculated from the double integral of the free energy landscape projected onto the reaction coordinate (here we used d RMS ) with the expression (Socci et al., 1996;Chahine et al., 2007) where Dðd RMS Þ is the position-dependent diffusion coefficient. By approximately setting Dðd RMS Þ as a constant, we calculated the kinetic stability from the free energy landscape for different values of . In practice, we set the thresholds of d RMS for the transition from the IS to BS with d RMS ðStartÞ ¼ 3:5 nm and d RMS ðEndÞ ¼ 0:1 nm and for the transition from the BS to IS with d RMS ðStartÞ ¼ 0:0 nm and d RMS ðEndÞ ¼ 2:5 nm.