Linking in domain-swapped protein dimers

The presence of knots has been observed in a small fraction of single-domain proteins and related to their thermodynamic and kinetic properties. The exchanging of identical structural elements, typical of domain-swapped proteins, makes such dimers suitable candidates to validate the possibility that mutual entanglement between chains may play a similar role for protein complexes. We suggest that such entanglement is captured by the linking number. This represents, for two closed curves, the number of times that each curve winds around the other. We show that closing the curves is not necessary, as a novel parameter G′, termed Gaussian entanglement, is strongly correlated with the linking number. Based on 110 non redundant domain-swapped dimers, our analysis evidences a high fraction of chains with a significant intertwining, that is with |G′| > 1. We report that Nature promotes configurations with negative mutual entanglement and surprisingly, it seems to suppress intertwining in long protein dimers. Supported by numerical simulations of dimer dissociation, our results provide a novel topology-based classification of protein-swapped dimers together with some preliminary evidence of its impact on their physical and biological properties.

The apparent intertwining between proteins assembled in complexes is certainly a distinguishing characteristic of these systems. An interesting issue to explore is the possibility of introducing topology-based descriptors that can capture the entanglement in a robust and measurable way, and that can be related to either some physico/ chemical properties or biological functionalities.
For instance, for single chain globular proteins it has been observed that the backbones may entangle themselves into a physical knot 31-36 that does not disentangle even in the denatured unfolded state 37 . Knotted proteins are interesting because they are rare, and their folding mechanisms and function are not well understood, although it has been proposed that they might increase thermal and kinetic stability [38][39][40] .
Although knots are mathematically defined only for closed loops 41 , in the last decade there have been several attempts to introduce sufficiently robust and topologically inspired measures of knots in open chains 42 . For a single protein, for instance, one can close its backbone by connecting the N and C termini to a point (chosen randomly) on the surface of a sphere that contains the chain. This sphere can be either very large compared to the chain size (closure at infinity) or it can be replaced by the convex hull of the chain. In all cases the artificial closure can introduce additional entanglement and several ways have been suggested to either mitigate or control this problem [43][44][45][46] .
For two proteins forming a dimer, if one is interested in measuring the degree of mutual entanglement, the notion of a knotted open chain must be generalised to that of a link between two open chains. In analogy with the procedure used for knots, one can think of closing artificially the two backbones to generate two loops. This can be done by joining the ends of each protein at infinity and computing a link invariant, such as the linking number G n , an integer index that quantifies the number of times that a curve winds around the other (technically, with Gauss integrals evaluated over the protein backbone, G n detects the degree of homological linking between two closed curves 41 and can be used to classify proteins 47 ).
Two curves are not linked if G n = 0 while G n = 1 or G n = − 1 denotes the simplest link between two loops. Since the sign of G n depends on the orientation of the two curves, in our implementation we choose to follow the standard N-C orientation along the backbones of the proteins. As in knots, the random closure may introduce additional linking between the two chains and an estimate of G n is necessarily a probabilistic one that requires a sufficiently large number of closure paths 45 .
We denote by G the average of the linking number G n over many closures. Along with G, we consider a Gaussian entanglement indicator, G′ , computed with the same method adopted for G but without closing the curves. We show that G′ strongly correlates with its topological counterpart G. Both estimators are used to analyse a non-redundant set of 3D domain swapped dimers. It turns out that several dimers present a high degree of mutual entanglement.
The short CPU time required to estimate G′ allows a quick systematic mining of linked dimers from protein databanks. With a computationally much heavier test, for some dimers we check whether this measure of entanglement is robust or is just an artifact of the specific crystallographic structure found in the database. This is done by performing simplified molecular dynamics simulations, starting from several native structures with non-trivial values of G′ . The time evolution of G′ (t) during the process of dissociation of the dimer reveals additional information on the amount of intertwining of the two proteins.

Results
Databank. Within the protein databank we have found n D = 110 non-redundant domain-swapped proteins and for each of them we have computed the average linking number G and the Gaussian entanglement G′ for open chains (see Methods for details, and below). The results are reported in Table 1, ranked for increasing G′ . The table also indicates whether the dimer is human (33 cases, tagged with a "h") or not, and reports the number N of C α atoms of each protein forming the dimer.
Protein mutual entanglement estimators: G e G′. As a first indicator of the mutual entanglement of two proteins belonging to a given dimer, we consider the Gauss integral G n computed over the pair of loops obtained by closing randomly each protein C α backbones on a sphere (see Fig. 3 and Methods for details). For a given closure, G n is an integer 41 but, once averaged over several random closures, its mean value G is eventually a fractional number.
Alternatively, we can apply the use of the Gauss integrals to open backbones. This measure, G′ , is certainly not a topological invariant anymore, but nevertheless it captures the interwinding between the two strands and does not require averages over many random closures. By computing these two quantities on the whole set of swapped domains in our dataset, we can notice that there is a strong correlation between G and G′ (see Fig. 4). This result validates, at least for the domain-swapped proteins, the use of G′ as a faithful measure of the mutual entanglement. The reason to prefer G′ is twofold: First, the estimate of G′ does not require a computationally expensive averaging over different closure modes. Second, there are cases in which the closure procedure does not work properly, giving rise to an unreasonable value of G (compared to G′ , see the point with G > 2 and G′ ≈ 0 in Fig. 4).

Analysis of the Gaussian entanglement G′.
The histogram reproducing the number of swapped dimers with a given G′ is plotted in Fig. 5(a): The distribution of G′ is fairly well fitted by a Gaussian with mean ≈ − 0.1 and standard deviation ≈ 0.63. The plot has high fraction of cases with − 1 < G′ < 1, suggesting that most of the 3D domain-swapped dimers are not linked. On the other hand, there is a consistent percentage of structures that exhibit a non trivial value of |G′ |. In particular, in Table 1 there are 15 structures with G′ < − 1 and 4 dimers have G′ > 1. Hence, more than 15% of the dimers in our databank have |G′ | higher than 1. The figures also tell us that the statistics of mutual entanglement displays an asymmetry towards more negative values of G′ . Indeed, in Table 1 one could notice that about two thirds of the dimers have G′ < 0. For obtaining a better evaluation of the spread and average value of G′ , we analysed the empirical cumulative distribution function F(G′ ), namely the fraction of configurations that have a value at most equal to G′ , see solid lines in Fig. 5(b). These have been fitted by an error-function with non-zero average. The fit yields average ′ = − . G 0 163 0 and standard deviation Δ G′ = 0.853 (the corresponding fit is shown as a dashed line in Fig. 5(b)). If the fit is restricted to non-human dimers, we get ′ = − . G 0 839. Again, mean values are lower than zero both for the case of human proteins and non-human ones. A fraction of data ≈ 64% (non-humans) and ≈ 70% (humans) have G′ < 0.
The asymmetry in the distribution in favor of structures with negative G′ is slightly more marked for human swapped dimers. This could be explained by the fact that, for human proteins, the interface between the secondary structures of the two monomers is mainly formed by swapped β-structures: the preferential right-handed twist of the β-sheets together with the higher frequency of anti-parallel pairings may imply a negative value for G′ .
In order to verify whether our measure of mutual entanglement is affected by some bias that can be introduced by the different lengths of proteins, in Fig. 6 we plot G′ as a function of the number of amino acids in the proteins forming the dimers. As a matter of fact we see that the mean value of ′ ≈ G 1/2 is not varying significantly in the range under investigation, which includes protein lengths ranging from about 50 to 550 amino acids. Only fluctuations are larger at  Table 1. Domain-swapped dimers ranked from lowest to highest Gaussian entanglement G′. The mean linking number (G) and the number of amino acids in each protein of the dimer (N) are also reported for the analysed dimers. Human proteins are tagged with a "h".
small length due to the presence of more data. Therefore we conclude that, in our dataset, the Gaussian entanglement is a parameter intrinsic to the dimers and it is not affected by entropic effects induced by the length of the protein.
Dynamical entanglement. The values of G′ are easy to compute from configurations and thus represent a basic indicator of the mutual entanglement of a structure. Through visual inspection of configurations, as those shown in Figs 1 and 2, one verifies indeed that dimers with large |G′ | are quite intertwined. However, from the same figures, it appears that, in addition to a global twisting of one protein around the other, G′ may be affected by some local details of the chains, such as their 3D shape near their ends. These local details should be irrelevant if one thermally excites the dimer, which should unfold with a time scale that corresponds to the Rouse dynamics needed to untwist one whole chain from the other 48 . Motivated by these observations, to complement G′ we tackle the problem of the entanglement from a dynamical perspective. For some test dimers we monitor the evolution of the value of G′ (t) with time, in a Langevin simulation where only excluded volume effects play a role (besides of course the chain connectivities). This is equivalent to the unfolding at a sufficiently high temperature.   The average curve of G′ (t) over many trajectories starts from the static value of the crystallographic structure ) and decays to zero for long time. It turns out that an exponential form τ − ⁎ G e t/ represents well the long time decay of ′ G t ( ). However, the extrapolated value of the fit at time t = 0, namely G * , does not necessarily match the static value G′ . In the studied cases, shown in Fig. 7 and listed in Table 2, we find both instances of > ′ ⁎ G G (3DIE and 1LGP), and < ′ ⁎ G G (1WKQ, 1LOM, 1M0D). This shows that G * , a more time consuming option than G′ , may however be considered for complementing the quick, static evaluation of the Gaussian entanglement. Of course, any dynamical procedure provides a result that depends on the kind of dynamics used to disentangle the structure. For example, at room temperature we may expect different G * 's if we perform all  The second parameter of the fit, the timescale τ (in simulation units), should increase with the chain length. This is expected for Rouse dynamics of polymers in general 48 : to shift the polymer center of mass of one radius of gyration one needs to wait a time ∼ ν+ N 2 1 with Flory exponent v ≈ 3/5. It is not possible for us to assess if this scaling is respected, given the few cases analysed. However, these cases correctly display an almost monotonically increasing trend of τ with N (compatibly with error bars, see Table 2). Note that strong logarithmic corrections to the scaling τ ∼ ν+ N 2 1 are also expected in unwinding processes 51,52 .

Discussion
Mathematically, two curves are linked or not, in a rigorous sense, only if they are loops. Hence, it is not trivial to estimate the level of intertwining between two open chains. Yet, the mutual entanglement is a well-visible feature in the crystallographic structures of several domain-swapped dimers. Being interested in quantifying such entanglement, with Gauss double integrals over the backbones of the two proteins in the dimers we provide a simple and efficient indicator, the Gaussian entanglement G′ . Indeed, according to our comparisons, a procedure for looping each protein (with an artificial continuation escaping from the core of the dimer) produces on average a linking number G that is strongly correlated with G′ . This suggests that such procedure can be avoided in practice, one may just rely on the information from the open chains, encoded in G′ .
We report that about 15% of the domain-swapped dimers have a significant ′|> G 1. This is quite intriguing, especially if compared with corresponding figures for knotting of single proteins, where about 1% of the PDB entries has been found to host a knot 35 .
The asymmetry in the typical values of G′ , with many more cases with G′ < 1 than with G′ > 1, is another interesting feature emerging from our analysis. This asymmetry could be explained by the conjecture that the Gaussian entanglement, despite being a global feature, can be deduced from the local twisting of closely interacting swapped structural elements. In several cases the latter are β-strands within the same sheet (see for example 1M0D, 1LGP in Fig. 1 and 3DIE, 1WKQ in Fig. 2), so that the more frequent case of a right-handed anti-parallel β-sheet 53 would indeed imply a negative Gaussian entanglement. With a preliminary overview, we note that anti-parallel swapped β-sheets are indeed occurring more frequently in the human dimers of our database than in the non-human ones, and human dimers have indeed on average a more negative G′ . The dependence of the   intertwining of domain-swapped protein dimers on the local twist of swapped interacting elements is a feature clearly worth further investigation. If confirmed, the tuning of local interactions could then be an evolutionary mechanism used by natural selection to control the emergence of topological entanglement in domain-swapped dimers. We also tried to investigate whether there is a correlation between entanglement and pathological states. However, our analysis of domain-swapped dimers associated with pathologies 17 does not show the emergence of any clear pattern.
The longer the proteins in the dimers, the slightly lower is their mutual entanglement. This is a surprising feature because one might anticipate that longer chains should be more easily entangled than shorter ones. Thus, it seems that the natural selection has acted against a form of random interpenetration in long domain-swapped protein dimers.
Via numerical simulations of the dissociation of some dimers, we observe that the presence of a non-trivial mutual entanglement is a robust characteristic, which vanishes exponentially with time during the dimer unbinding. The exponential fit furnishes a characteristic disentanglement time τ whose values does not depend on G′ and is weakly correlated with the length of the proteins. The amplitude of the exponential decay of the Gaussian entanglement furnishes a further estimate (G * ) of the intertwining in the dimer, which complements the G′ of the crystallographic structure (they are not exactly equal to each other) in assessing the amount of linking in the dimer.
Our new approach of classifying domain-swapped protein dimers according to their mutual entanglement will likely add novel insight on the crucial role played by the generic topological properties of linear polymer chains in the protein context. As already demonstrated in the case of knotted protein folds 36 , the presence of a global topological constraint, such as the linking between two protein chains, may strongly impact the conformational properties, the thermodynamic and kinetic stability, the functional and evolutionary role of domain-swapped structures.
Finally, it is interesting to speculate on the possible outcome of a single-molecule experiment performed by pulling apart the two protein backbones of a domain-swapped dimer with significant entanglement (high |G′ |). A similar experiment was carried on very recently for single-domain protein knots, showing that the knotting topology of the unfolded state can be controlled by varying the pulling direction 54 . In the linked dimer case, similarly, we expect the choice of the pulling directions to be crucial in allowing or not dimer unlinking and dissociation into monomers.

Data bank of 3D domain-swapped dimers.
In order to derive a statistically significant ensemble of non-redundant domain-swapped dimers, we merge two existing databanks of domain-swapped proteins, namely 3DSwap [55][56][57] and ProSwap 58,59 . These databanks provide curated information and various sequence and structural features about the proteins involved in the domain swapping phenomenon. PDB entries involved in 3D Domain swapping are identified from integrated literature search and searches in PDB using advanced mining options. We first consider only the dimers and, to avoid the presence of related structures, we consider the UniprotKB code. For each code, we select only one protein, the one obtained experimentally with the highest experimental accuracy. We then filter the remaining structures to avoid the presence of holes in the main backbone chain. Specifically, we discarded those proteins in which the distance between two C α , listed consecutively in the pdb file, is bigger than 10 Å. As a matter of fact, such holes could affect dramatically the reliability of the linking number and there is not an obvious strategy to join them artificially. At the end we obtained a databank of n D = 110 proteins, whose PDB code is reported in Table 1. Out of these, 33 were human.

Gauss integrals.
Our procedure for estimating the amount of linking between two proteins uses Gauss integrals. As representative of the backbones of proteins, we consider the chains connecting the coordinates  r of the C α atoms of amino acids, which are N in each of the two proteins in the dimer.
A definition of linking number between two closed curves γ 1 and γ 2 in 3 dimensions is given by the Gauss double integral, (1) (1) This formula yields integer numbers G n = 0, − 1, 1, − 2, 2, etc. Such definition is adapted to compute the amount of linking between two proteins. We need first to define a procedure that closes each open chain via the addition of artificial residues. This closure starts by computing the center of mass of the dimer and by considering such center as a repeller for the new growing arms. Let us describe the method for protein 1, as for protein 2 it is exactly the same: from each of the two free ends we continue with a path expanding diametrically from the center of mass, with a length corresponding to n = 25 typical C α -C α distances ≈ .
 3 8 Å. At this stage the polymer is composed by N + 2n residues. Since there is some arbitrariness in the final closure joining the two artificial new end residues, we perform a statistics over 12 different closures, each being a meridian along a sphere where the poles are the artificial end residues. Semicircular closure paths are drawn, each containing a number of artificial residues n′ that makes their bond distance as close as possible to . An example is shown in Fig. 3 , for which we use the midpoint approximation ≡ + . For a given choice z of the closure for both proteins, out of the  = 12 × 12= 144 possible ones, we have  (1) (1) ( 2) 3 (1) ( 2) has no statistical averaging and is a straightforward alternative to G in the estimate of the linking of proteins.
Simulations. In our molecular dynamics simulations, each protein forming the dimer is modeled as a self-avoiding chain of N beads. Each bead has radius σ and is centered in the C α position of a residue. Adjacent beads of each protein are tethered together into a polymer chain by an harmonic potential with the average C α − C α distance along the chain equal to 1.5 σ. To take into account the excluded volume interaction between beads we consider the truncated Lennard-Jones potential is the distance of the bead centers i and j, θ is the Heaviside function and ε is the characteristic unit of energy of the system which is set equal to the thermal energy k B T.
The system dynamics is described within a Langevin scheme: where U 1 is the total potential of the i th particle, γ is the friction coefficient and η is the stochastic delta-correlated noise. The variance of each Cartesian component of the noise, σ η 2 satisfies the usual fluctuation dissipation relationship σ γ = η k T 2 B 2 . As customary, we set γ τ = m/(2 ) LJ , with τ σ ε σ = = m mk T / / LJ B being the characteristic simulation time. From the Stokes friction coefficient of spherical beads of diameters σ we have γ πη σ = 3 sol , where η is the solution viscosity. By using the nominal water viscosity, η = 1 sol cP and setting T = 300 K and σ = 2.5 nm, one has τ πη σ ε = = 6 / 74 LJ sol 3 ns. To study the unfolding dynamics of a given dimer, we take its folded configuration, as given by the PDB, as the initial condition. For each initial condition we generate 100 different molecular dynamics trajectories by integrating numerically (6) up to τ = t 10 LJ 4 . During the dynamics we monitor the quantities G and G′ . In Fig. 7 the curves are obtained by averaging over the 100 trajectories. Simulations are performed with the package LAMMPS 60 .