Molecular dynamics simulations disclose early stages of the photo-activation of cryptochrome 4

Birds appear to be equipped with a light-dependent, radical-pair-based magnetic compass that relies on truly quantum processes. While the identity of the sensory protein has remained speculative, cryptochrome 4 has recently been identified as the most auspicious candidate. Here, we report on all-atom molecular dynamics (MD) simulations addressing the structural reorganisations that accompany the photoreduction of the flavin cofactor in the European robin cryptochrome 4 (ErCry4). Extensive MD simulations reveal that the photo-activation of ErCry4 induces large-scale conformational changes on short (hundreds of nanoseconds) timescales. Specifically, the photo-reduction is accompanied with the release of the C-terminal tail, structural rearrangements in the vicinity of the FAD-binding site, and the noteworthy formation of an α-helical segment at the N-terminal part. Some of these rearrangements appear to expose potential phosphorylation sites. We describe the conformational dynamics of the protein using a graph-based approach that is informed by the adjacency of residues and the correlation of their local motions. This approach reveals densely coupled reorganisation entities, i.e. graph communities, which could facilitate an efficient signal transduction due to a high density of hubs. These communities are interconnected by a small number of highly important residues. The network approach clearly identifies the sites restructuring upon photo-activation, which appear as protrusions or delicate bridges in the reorganisation network. We also find that, unlike in the homologous cryptochrome from D. melanogaster, the release of the C-terminal domain does not appear to be correlated with the transposition of a histidine residue close to the FAD cofactor.


Introduction
Magnetoreception, the remarkable trait of perceiving Earth's weak magnetic field, is widespread among the animal kingdom [1][2][3][4]. Yet, the underlying sensory mechanisms have remained elusive. In several instances, most notably the compass sense in migratory birds [1,2,[5][6][7][8], the accumulated evidence supports Schulten's revolutionary hypothesis of a radical pair-based magnetic sensor [9]. According to this model, magnetoreception is the result of the quantum coherent evolution of the singlet and triplet states of transient pairs of radicals under the influence of spin-selective reactions and magnetic interactions [10]. Ritz, Adem and Schulten were the first to speculate that the underlying radical pair could be formed by photo-excitation in the animals' eyes in cryptochromes [11], a class of blue-light sensitive flavo-proteins that shares some similarity with photolyases [12]; this supposition still applies;. To date, cryptochromes are the only vertebrate proteins known to form radical pairs in a physiologically significant photoreaction and their relevancy to magnetoreception has indeed been implicated in a multitude of studies [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. The reader is referred to the many reviews on this subject for a more detailed exposition [10,27,29,30]. Interestingly, radical pair magnetoreception is considered Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. a truly quantum process in a biological environment [31]. The underlying coherent spin dynamics are a topic of much current research [29,[32][33][34][35][36][37][38][39]. Here, we focus on the processes that follow up on this initial stage.
Four different cryptochromes have been found in the retinae of birds. Among these, two (Cry1a from the garden warbler and Cry4 in chicken) have been shown to form long-lived flavo-semiquinone radicals under photoexcitation [40,41]. Cry1 has been located at the inner and/or outer segments of UV-sensitive cone cells [42,43]. Cry1b was found in the photoreceptor inner segment and the cytosol of ganglion cells [24,44]. Recently, Günther et al and Pinzon-Rodriguez et al have provided circumstantial support for the involvement of Cry4 in magnetoreception [20,21]: according to [20], Cry4 is localised in double cones and long wavelength single cones in the retinae of European robins and chickens. It shows only a weak circadian oscillation [20,21], but its expression is upregulated during the migratory season in European robins [20]. Importantly, as a Type IV cryptochrome, it is able to securely bind the photoactive flavin cofactor, which has previously been questioned for Type II animal cryptochromes including Cry1a from birds [45,46].
The structure of none of the four avian cryptochromes has so far been resolved. A homology model of Cry4 from the European Robin (E. rubecula; the protein is henceforth referred to as ErCry4) has recently been established [20]. Its three-dimensional structure and folding pattern resembles that of resolved cryptochromes and photolyases and is shown in figure 1. Its fold is characterised by an α/β-domain and a helical domain. The C-terminal tail (CTT) is thought to mediate signalling functions [41,47]. In the dark, Cry4 binds a fully oxidised flavin adenine dinucleotide (FAD) chromophore in a central groove [20,46]. Based on the nomenclature established for the cryptochrome of D. melanogaster (DmCry), the structural elements surrounding the FADbinding site (see figure 1(a)) are referred to as the C-terminal lid (residues 420-446 in DmCry; 400-421 in ErCry4), the protrusion motif (residues 288-306 in DmCry; 273-282 in ErCry4), the phosphate-binding loop (residues 249-263 in DmCry; 231-248 in ErCry4), and the CTT base loop (residues 154-160 in DmCry; 147-151 in ErCry4). Like DmCry and other animal cryptochromes, the protein features an electron transfer chain comprising four tryptophan residues (labelled W A , W B , W C and W D ), the tryptophan tetrad [20]. Photoexcitation is accompanied by the reduction of the flavin cofactor yielding FAD •in the first place. In chickens, this radical can be further photo-reduced to FADHin a process that likely involves intermittent FADH • [41]. It is currently unknown whether the direct photo-reduction of FAD or the dark-state re-oxidation of the fully reduced flavin, FADH -, gives rise to the magnetosensitive radical pair underpinning the compass sensor [14,16,29,31]. Some evidence in favour of the re-oxidation hypothesis has become available recently [18,26,[48][49][50][51][52][53]. Furthermore, a suggestion of how to overcome the issue of fast spin relaxation in the superoxide radical anion, which is closely linked to the latter hypothesis, has been provided [33,35]. In any case, here we focus on the photo-reduction, because it entails a well-defined initial state in terms of a radical pair comprising the flavin anion radical, FAD •-, and the radical ion W D •+ . Note, however, that for DmCry comparable structural changes are induced by photo-reduction and chemical reduction, suggesting that the charge on W D might not be essential for the induced rearrangements [54].
In order to function as magnetic sensor, the radical pair in the cryptochrome must engage in at least two reactive pathways: spin-selective charge recombination, which discriminates the different spin states of the pair, and the competing formation of a signalling state, which induces a cascade of (currently unknown) events eventually giving rise to nervous excitation. In birds, except for the likely involvement of the CTT, little is known about the conformational changes of the protein related to signalling [41,47,49,50]. More specifically, for the cryptochrome 4 from chicken, the molecular accessibility of the C-terminal region to a specific antibody was found to be reduced upon the photo-induced formation of the flavin semiquinone [41,47]. Light-dependent changes in its trypsin digestion pattern furthermore suggest that the CTT may bind the FAD-binding domain in the photo-activated state [41]. This suggestion is in stark contrast to the photo-activation of DmCry and the cryptochrome from A. thaliana (AtCry), which have been studied in more detail, as well as that of the avian Cry1a [49,50]. For DmCry, the reduction of the flavin to the anionic semiquinone by light or chemicals induces a complex conformational reorganisation of the protein, which involves the release of the CTT in addition to rearrangements of its surroundings (the CTT-coupled motif) and several more remote sites [54]. As a consequence, the binding to at least one region of the clock protein Timeless is promoted. These conformational changes were found to be fully reversible under regeneration of the resting state. Based on all-atom molecular dynamics (MD) simulations, Ganguly et al [55] have suggested that the release of the CTT is correlated to a transposition of His378 in the vicinity of the flavin cofactor. The authors furthermore argue that protonation of this histidine could enhance the conformational flexibility of the CTT. In plant cryptochromes, the formation of the anionic radical initiates rapid conformational changes in the photolyase homology region (PHR) (such as the loss of β-sheet structure in the α/β region), which precedes the slower release of the CTT [56][57][58][59]. Here, the anionic flavin radical is eventually protonated by a nearby aspartic acid (D396 in Cry1). The photocycle and signalling mechanisms of plant cryptochromes have recently been reviewed in [60] and the reader is referred to this publication for a detailed account on the topic.
The present study aims to elucidate the early structural changes in the PHR that accompany the photoactivation of the flavin in a model of Cry4 from the European robin. To this end, we have run extensive MD simulations, which reveal the fast initial response to the photo-induced charge separation on the timescale of a microsecond. We here interpret these data in terms of topological models derived from graph representations of the protein. In addition to the now popular contact maps, we advocate a graph representation that is simultaneously informed by the correlation of displacements in addition to the usual spatial adjacency. We argue that this representation reveals efficient reorganisation networks, which could mediate the signalling in a more directed, purposeful way than networks predicted from contact maps. Our approach builds upon two Figure 1. (a) Secondary structure model of ErCry4 highlighting the FAD cofactor (lime), the tryptophan tetrad (blue) and pertinent structural motifs such as the C-terminal tail (CTT), the C-terminal lid (red), the phosphate-binding loop (orange), the protrusion motif (yellow) and the CTT base loop (purple). A representative structure of the resting state of the protein is shown, which has been determined through principal component analysis of extensive MD trajectories as part of this study (vide infra; R in figure 3). (b) Rootmean-square deviation of the atomic positions of the protein backbone as a function of time for the dark-state (blue) and two independent radical pair-state trajectories (red and yellow). The RMSD has been calculated by aligning the rigid residues to the reference structure from [20]. (c) Average RMSD per residue for the dark state and the radical pair-state relative to the representative dark-state structure.
pillars: the use of covariance matrices to analyse the correlated dynamics of protein residues from MD, which has for example been described by Hünenberger and co-workers [61] as well as Karplus and co-workers [62,63], and network representations of the protein structure [64] and dynamics [65,66]. In terms of the graph theoretical analysis, our approach is similar to that of Kasahara et al [65].

Molecular dynamics simulations
All MD simulations were performed using the NAMD programme [67] in combination with the CHARMM36 force field including the CMAP corrections for proteins [68][69][70] and earlier parameterisations for FAD, FAD •and W •+ [71][72][73]. An equilibrated MD ensemble of the dark state (DS) configuration (containing fully oxidised FAD and the standard tryptophan residue) was available from a previous study [20]. The existing trajectory was extended by additional 250 ns following the protocol as outlined in [20]. The last 0.9 μs of the combined trajectory were used in the analysis described here. Two independent MD runs of the radical pair configuration (containing the anion semiquinone FAD •and an oxidised tryptophan residue at the terminal position of the tryptophan tetrad, W D

•+
) were generated from the final dark-state geometry by propagating this state by an additional 1 ns and 2 ns, respectively, followed by swapping the force field parameters for those reflecting the radical pair state (henceforth referred to as RPD). A total of 1.9 μs of MD trajectories was then accumulated for the radical pair state (two independent trajectories; 1 μs+0.9 μs). All production simulations pertained to the NVT-ensemble; equilibration runs used the NPT-ensemble. A time step of 2 fs was used and temperature and pressure were kept constant at 315 K and, for the equilibration runs, 1 atm using a modified Nosé-Hoover scheme in conjunction with Langevin dynamics. The SHAKE algorithm [37] was used to constrain bonds involving hydrogen atoms to their respective equilibrium distances. A tetragonal box of approximately 109×85×85 Å 3 contained 23658 TIP3P water molecules, 74 Na + and 66 Cl − ions in addition to the protein. This gave rise to an overall neutral system at a salt concentration of 50 mM. Periodic boundary conditions were employed to replicate the solvated box in all three dimensions. The particle-mesh Ewald summation method with a cut-off of 12 Å was used to evaluate Coulomb forces. The van der Waals energy was calculated using a smooth cut-off of 12 Å and a switching distance of 10 Å. The simulation conditions used for the production runs are summarised in table S1 of the supporting information (SI; available online at stacks.iop.org/NJP/20/ 083018/mmedia).

Contact networks/distance maps
In order to account for the different interaction sites of the cofactor, FAD/FAD •− was represented by 5 sub-residues corresponding to the flavin, the ribityl chain, the diphosphate, the ribose and the adenine, respectively (see figure S1 in the SI). The distance matrices of the centres of mass of all residues and cofactor fragments, d(s), where s ä {DS, RPD}, were evaluated for the equilibrated dark-state and the radical pair trajectories, respectively: Here, r i is the centre-of-mass coordinate of fragment i and 〈 〉 s denotes the average over the DS or RPD trajectory when calculating d(DS) or d(RPD), respectively. For the radical pair states, the first 200 ns after the charge-shift mimicking the electron transfer were excluded from this analysis. The protein 3D structural information was mapped onto a graph in which nodes represent residues/sites and edges indicate the existence of neighbourhood relationships. We considered two fragments to be adjacent, i.e. neighbours, if the average distance of their centres of mass was smaller than 8 Å. Subsequently, topological relationships between residues were explored by graph-theoretical tools such as the geodesic distance (the number of edges in a shortest path connecting two nodes), vertex degree (number of edges incident to a vertex, i.e. number of adjacent residues), the local clustering coefficient (a measure of the link-density in the node's neighbourhood) and the betweenness centralities (a measure of the importance of a node defined as the number of shortest paths that pass through a vertex relative to the total number of shortest paths). Following [74], nodes with a degree exceeding 3/2 times the average degree were classified as hubs. For a comprehensive introduction to network representations of protein structures, the inclined reader is referred to [64].

Displacement correlated contact networks
Global rotational and translational motions of the entire protein were removed by aligning the protein backbone to the average dark-state geometry. The average structure was determined using an iterative scheme: all snapshots were aligned to a template, i.e. the current approximation of the average structure. A new approximation of the average was determined from the aligned structures. The process was repeated until selfconsistency of the average was realised. An arbitrary, i.e. the first, structure from the ensemble was used to initiate the method. Following [55], only residues with a root mean square fluctuation of less than 1.3 Å were considered in the molecular alignment step. The RPD structures were aligned to the average dark state geometry using the same set of rigid residues. The protein structure was divided into N fragments, which formed the basis for the subsequent analysis. Here, we used each residue of the protein as a fragment, while the FAD cofactor was represented by five effective residue fragments as discussed above. Let us denote the ith component (i ä {x, y, z}) of the centre-of-mass coordinate of the kth fragment by r k,i and the 3N-dimensional vector of all residue coordinates by R. Furthermore, let us refer to the 3-vectors corresponding to the coordinate of fragment k as r k . Thus, R is the vector comprising all r k -vectors as partitions, i.e.
Here, the brackets 〈 〉 signify the average over the MD trajectory. C as defined in equation (2) should be rescaled as C′=M/(M−1) C, with M denoting the number of averaged structures, in order to arrive at the best unbiased estimate of the covariance matrix if the observations are from a normal distribution, as would be expected for structures subjected to thermal motion (here M=95 700 and 189 300 for the DS and RPD state, respectively). The 3N orthonormal eigenmodes of the N fragments, Q k , were determined as the solution of the eigenvalue problem By partitioning Q (n) in the same way as R, it is possible to associate 3-vectors q k with every fragment k. The displacement R−〈R〉 can be re-expressing in terms of the eigenmodes by evaluating: The variance of the kth 3-vector in X, x k , due to the nth normal mode is then given by q .
The following heuristics were used to identify correlated motions of fragments. A simple approach can be directly based on the correlation matrix P, which is related to C' by P S C S . 5 Here, S is the diagonal matrix comprising the variances of the displacements of the individual fragments, i.e. the matrix assembled from the diagonal of C′. Fragments k and l have been classified as correlated if the correlation coefficient of any pair of coordinates exceeds a problem specific threshold, r min . In detail, denoting the associated N-dimensional adjacency matrix by M, we may define An alternative approach to detect correlated reorientational motions of fragments is based on the eigenmode decomposition, equation (3). Following [75], two fragments k and l are considered correlated if the same normal modes provide, on average, similar contributions, i.e. if the ordered sets of displacement contributions } of the two fragments are positively correlated. The degree of correlation of d k and d l can be expressed by the Pearson correlation coefficient, corr(d k , d l ). Again, one has to binarise the correlation measure with a suitable r′ min to obtain the adjacency matrix of motional correlation M′ In practice, it is advisable to choose the cut-off distances in equations (6) or (7), r min or r min ′, to be as large as possible while ensuring that the resulting graph comprised a single dominant connected component in addition to a few isolated fragments. For comparable cut-offs the adjacency matrix M′ tends to be less sparse than M and comprise more large-distance correlations. In combination with the additional adjacency criterion (see below), comparable results are obtained from both approaches.
Our analysis aims to identify spatial networks of ErCry4 fragments that provide a pathway for structural rearrangements. In other words, we are interested in networks of local correlations that function as the building blocks of the overall reorganisation dynamics. This view stipulates that not all correlations from M are considered, but only those that pertain to neighbouring interacting sites. In order to accommodate this adjacency aspect, we introduced the matrix D: Here, we have used d min (k, l)=8 Å if fragments k and l are not part of the same rigid secondary structure element. If, on the other hand, k and l belong to the same α-helix or β-strand (as determined by the STRIDE programme [76]), d min =∞; the latter condition reflects that correlated motions over rigid secondary structure elements were always considered as direct link, thereby simplifying the interaction network by providing 'shortcuts'. This is consistent with our observation that for the ErCry4 protein considered here, secondary structure elements are preserved and exhibit collective modes. The displacement correlated contact network graph was eventually generated using the adjacency matrix A as defined by or a similar expression with M′ replacing M. The '°' denotes the Hadamard entrywise product. This graph was analysed using the same topological measures as set out for the contact networks above.

Results
Root-mean-square deviations (RMSD), solvent accessible surface areas and interaction energies Figures 1(b) and S2 in the SI illustrate the temporal evolution of the RMSD of the position of the backbone atoms (C, O, N, CA) in ErCry4 with respect to the reference structure from [20]. It is immediately obvious that the charge shift that generates the radical pair state (RPD) from the ground state molecule (DS), gives way to large fluctuations and an increase of the RMSD. The mean RMSD per residue (figure 1(c)) reveals that large RMSDs are accrued at the CTT and, to a lesser extent, several loop regions (which are to be specified in detail below). Note that in general sites of large RMSDs coincide for the DS and RPD state. However, for the latter the RMSD are typically bigger. The largest differences of mean RMSD per residue for the DS and RPD state are observed for the CTT and the mobile regions surrounding residues 45, 185 and the region following the C-terminal lid around residue number 410. While the RMSD plots reveal interesting features, this analysis is biased by the choice of the reference structure and the protocol for structure alignment. These limitations can be overcome by comparing the average DS structure to the average RPD structure of ErCry4 in internal coordinates (see below). The photo-activation is accompanied by a substantial increase of the solvent accessibility of W D ; its solventaccessible surface area (SASA) increases from 26.5 Å 2 to about 72 Å 2 , as is shown in figure S3 in the SI. Likewise, the SASA is slightly larger for FAD •− in the RPD state than FAD in the DS (26.5 Å 2 versus 30.3 Å 2 ; see figure S4). The Coulomb interaction energy of the FAD cofactor and W D decreases from −0.31±0.18 kcal mol −1 in the DS to −10.64±0.68 kcal mol −1 in the RPD state (see figure S5); the standard deviation increases by a factor of 3.7. As a consequence of the additional charge, the binding energy of FAD is furthermore more negative in the RPD state than the DS; the standard deviations of the binding energy are comparable in both states (figure S6).

Distance matrices and contact maps
In order to assess the internal structure of ErCry4 and its changes upon photo-ionisation, we have evaluated the average distances of the centres of mass of all residues over the equilibrated trajectories. Unlike many other studies that employ the distances of C α -atoms, we have focused on centres of mass as their location is more sensitive to sidechain reorganisations. The cofactor, FAD, has been represented by five effective residues corresponding to the flavin, the ribityl chain, the diphosphate, the ribose and the adenine, respectively, as explained above and illustrated in figure S1 in the SI. The difference of the average distance matrices of the RPD and DS ensembles, equation (1), reveals the identity of segments that undergo major restructuring upon activation. Figure 2 provides a density plot of this difference metric. It exhibits a checkered pattern, which is indicative of mobile segments that change their relative orientation with respect to a rigid mainstay structure. The mean absolute displacement shown in figure 2 is defined as with d i,j denoting the distance of the centres of mass of fragments i and j. This quantity reveals the identity of the mobile residues (top panel in figure 2). The dominant structural changes occur in the C-terminal region, in particular the segment comprising residues 472-486, which moves away from the core of the protein while approaching the distal α-helix H17 (residues 450-456). The conformational changes are, however, not limited to the C-terminus. The difference metric Δ j reveals additional reorganisation of the helix-bound residues Arg419 and Thr420 and several surface-exposed loop regions: residues 405-415, 276-285 (which correspond to the protrusion motif), 237-246 (which cover most of the phosphate binding loop), 179-186 and 38-48. In addition, large shifts of a few solitary residues are observed (e.g. Arg209, Asp367 and Ser130). We have also constructed contact maps (adjacency matrices) by binarizing the average distance matrices with a cut-off distance of 8 Å. The difference of the contact maps of the RPD and DS is indicative of the reorganisation process giving rise to the formation of new, or the interruption of existing, interactions (see figure S7 in the SI).
While the combined analysis of the distance matrices and the contact maps cannot reveal the temporal/ causal chain of events that facilitate these reorganisations, the correlated transpositions of a few key residues appear noteworthy. For example, the restructuring of the loop 237-246 appears to correlate with a reduced distance between Ser245 (in the loop) and Lys234 (change in average distance: -3.4 Å). The latter directly interacts with the diphosphate group of FAD via an H-bond involving its ammonium group. Likewise, Arg209, one of the isolated residues that are strongly displaced, decreases its distance to Pro244 and increases its distance to the rest of the loop. Figure S7 in the SI reveals that the reorganisation of the loop is accompanied by the formation of a series of new contacts with residues 337-350, a mostly α-helical segment that itself does not undergo strong reorganisation, and the segment 417-430. In terms of the difference metric, this is evinced in the displacement of the adjacent residues Arg419 and Thr420 and its upstream segment 405-414. While the former appears to be transpositioned towards the C-terminal domain of ErCry4, the upstream segment 405-414 mostly shifts in the N-terminal direction. In doing so, residues 413 and 414 establish new contacts with an adjacent αhelix (around residue 357). This is one of two reorganisation motifs in the vicinity of the C-terminal segment. The second involves repositioning of Asp367, which is located close to the oxidised W D =Trp369. Unlike Asp367, W D does not strongly change its relative position with respect to the rest of the protein upon photoionisation. The reorganisation of ErCry4 is not confined to the C-terminal part of the protein, but extends to the loop region 276-285 and the N-terminal part. Large displacements can be observed for residues 38-48, which correlate with a reduced Lys377-Ser45 distance. The restructuring of this loop goes along with the formation of a new contact of Arg39 and Asp197, which likely conveys additional rearrangements in the adjacent loop covering residues 179-186. Here, the largest displacement is observed for the surface-exposed cysteine Cys179. Large displacements are also detected for residues 276-285, the protrusion motif; a surface-exposed loop adjacent to one of the α-helices forming the FAD binding pocked.
The residues surrounding the flavin cofactor are comparably immobile. The largest total displacement, Δ j , of a residue within 15 Å of the isoalloxazine ring is due to His357, which is displaced by 0.6 Å away from the flavin; see figure 2. This is remarkable because a similar displacement of a solitary histidine has been implicated in the signalling of DmCry [55]. While these interaction motifs do indeed appear comparable on the first glance, it here involves a different residue. Based on a sequence alignment of DmCry and ErCry4, the histidine that was expected to mediate this mechanism in ErCry4 was His353 (corresponding to His378 in DmCry), which, however, exhibits an insignificant total displacement, Δ j , and a smaller shift with respect to the isoalloxazine (-0.3 Å). Furthermore, His357 makes a new contact with Phe411 and approaches the related loop. The largest individual shift of close-by residues with respect to the isoalloxazine is due to Leu383 (0.9 Å).

Principal component analysis (PCA)
A principal component analysis (PCA) on the motions of the centres of mass of the ErCry4 residues was carried out following the procedure outlined in the Methods section (for additional information, see [77]). The combined RPD and DS trajectories were analysed simultaneously with the aim of identifying the predominant reorganisation modes associated with the photo-induced charge separation and their large-scale correlations in ErCry4. A small number of principal components (PC) was found to account for the observed structural variability. The dominant PC explained 33% and the three leading PCs 64% of the total variance (relative contributions calculated as ; (3)). The corresponding biplots are shown in figure 3.
These plots provided the probability densities for the system configurations expressed as a function of their projection onto the normal modes (see equation (4)).
Several characteristic conformers can be identified from the biplots. Three conformers have been marked and their 3D structures plotted in figure 3. Conformation R is representative for most of the DS-trajectory; A1 and A2 correspond to two conformers occurring predominantly for the photo-activated state. The two main conformers, R and A1, are separated by a translation along PC1 for constant PC2. Note however, that the direct reorganisation path appears to involve a substantial free energy barrier, which the system can avoid by paths intermittently changing PC2 and PC3 (and/or lower order PCs; see the depopulated region between R and A1 along the PC1 component in figure 3). The major reorganisations are linked to the mobile residues that have already been named above based on the analysis of the distance matrices. Figure S8 in the SI shows the weighted RMSD modes [77], which provide the relative contribution of every residue/fragment to the total variance. It also reveals that the reorganisations at different sites are highly correlated, i.e. they proceed in a concerted fashion for changes in a single coordinate. Most mobile sites contribute significantly to all four dominant PCs. The notable exception is the reorganisation of residues 80-135, which appears to be an exclusive feature of PC2 (among PC1 to PC4; see figure S8). Expectedly, the largest per residue contribution of the dominant PCs is due to the CTT. The site surrounding residue 45 accounts for the second largest contribution in PC1 and PC2. The major reorganisations from R to A1, which are predominantly accounted for by PC1, are found at the CTT and the CTT-coupled motifs including the C-terminal lid, the phosphate binding loop and the protrusion motif (see figure 4). In addition, marked structural changes occur at the α-helix containing Asp197 and its adjacent loops (residues 186-206), which increase the solvent exposition of the α-helix. Furthermore, the loop region comprising residues 38-48 at the N-terminal end is internalised. For A2, the structural changes follow a similar pattern. However, the CTT is rotated away from the protein surface. Interestingly, for this structure, the loop region at residues 38-48 coils to form an α-helix (see figure 4, left), which is exposed at the protein surface, quite contrary to what is observed for PC1 at this site. Taken together, our analysis suggests that, just like in cryptochromes from other species, the CTT and the CTT-coupled motif show major responses to photoionisation. In addition, the loop region 38-48 at the N-terminus is identified as a site that undergoes interesting structural transformations upon photo-ionisation.
Protein contact networks/adjacency graphs Contact networks provide an alternative representation of the structural features of proteins. The graph representations were generated from the distance matrices of the DS and the RPD states according to the procedure described above. A graphical representation of the contact network of the DS is shown in figure 5; the plot has been obtained by using the spring-electrical graph embedding of Wolfram Mathematica [78].
Characteristic nodes have been highlighted by colours; mobile sites for which N Δ j >400 Å (see equation (10)) have been drawn in yellow; the isoalloxazine subunit of the cofactor (henceforth denoted F) is shown as a blue circle; other features will be discussed below. The corresponding graph representation of the RPD state of ErCry4 is shown in figure S9 of the SI. Some of the mobile sites are obviously associated with peripheral loops or  extensions. The CTT part of ErCry4 is particularly prominent (even more so for the RPD state; see figure S9), as is the loop in between the helices H6 and H7 at the N-terminal side of the protein and the loop connecting the helices H9 and H10. The prominently reorganising site surrounding residue Ser45 is located at the bottom of the graph, although not at its topological surface. Many mobile residues appear to be scattered over the network without following an obvious pattern. Comparing figure 5 with its counterpart for the RPD state of ErCry4 in figure S9 reveals the structural changes associated with the photo-activation. The release of the CTT is striking as are the reorganisations at residues 180 and 278, which appear to assume a more 'exposed' position. The mobile residues around Ser45-are moved to the bottom surface of the graph. These changes are in accordance with the picture of an opening of the protein structure, which has also been postulated to accompany the activation of other cryptochromes [54][55][56][57][58][59].
The notion of an opening of the protein structure is also supported by analysing the geodesic distance, i.e. the number of edges in a shortest path connecting two nodes in the network; see figure S10 in the SI. In fact, for the DS state of ErCry4, the flavin residue F is classified as a central vertex suggesting that its vertex eccentricity (the largest distance to other vertices) provides information about the protein opening as well as the length of the reorganisation pathway from F to any effector site in the periphery. We find that the eccentricity of F increases from 8 to 11 upon ErCry4 photo-excitation while the graph diameter increases from 15 to 18. We have calculated the difference of graph distance matrices. It reveals distance changes upon photo-activation ranging from −6 (expansion) to +2 (contraction). Among the preserved secondary structure elements, one particularly notes that the helices H7 and H8 approach F, while the helices H3 and H17 are drifting away from F (see figure S10). Figure 5 also features nodes that are deemed important for the structural identity of the network and the transmission of signals. This includes the so-called hubs, nodes with large vertex degree (i.e. the number of edges incident to a vertex; for hubs it is expected to exceed the mean node degree by at least 3/2), and the 20 nodes of highest betweenness centrality, which is defined as where σ st is the number of shortest paths between nodes s and t and σ st (ν) is the number of shortest paths that pass through vertex ν. g(ν) is a measure of the importance of node ν to the connectivity of the network. In particular, the presence of high-betweenness nodes has been suggested to facilitate long range transmission of information in protein networks [64]. Here, 5% of the nodes are classified as hubs, which is a typical fraction for protein networks [74].
As far as our aspiration to identify mobile sites in ErCry4 is concerned, additional insights are provided by the vertex degree, the local clustering coefficient and the node centrality. In fact, mobile sites comprising more than 1 residue have an average degree smaller than the overall mean degree (9.2 and 9.1 for the DS and the RPD state, respectively). Furthermore, for most mobile sites, residues tend to cluster together forming tightly knit groups characterised by a relatively high local clustering coefficient. On the contrary, the betweenness centrality (or likewise the eigenvector centrality) is low for the mobile residues. This portrays the image of mobile sites as groups of peripheral residues that are only weakly coupled to the core protein. Additional details, including representations of the mentioned graph-theoretical parameters as a function of residue sequence number, are provided in the SI (figures S10-S12).

Displacement correlated contact networks
Contact networks provide an accessible way to describe the structure of the DS and RPD states of ErCry4 and their differences. However, it lacks the insight as to how the underlying conformational changes are propagated through the protein, at least beyond the level that can be guessed based on mere adjacency. This is what displacement correlated contact networks seek to deliver by combining the concepts of correlated motion and closeness. Figure 6 shows the cross-correlation network as obtained based on the correlation of the displacements of the centres of mass of the residues over the combined DS and RPD trajectories. The correlation is measured here as expressed by equation (6) in combination with a threshold correlation coefficient of r min =0.7. The network in figure 6 has an astoundingly simple structure, for which most mobile sites are easily recognisable either as protrusions (e.g. CTT, protrusion motif, phosphate binding loop) or feeble bridges (e.g. C-terminal lid). These signalling sites are connected to tightly coupled core structures, by but a few residues. The network also shows evidence of dynamical bottlenecks between the left and right side in figure 6, which roughly separates the N-terminal part from the C-terminal part of the sequence. Not all of the residues are coupled to the main subgraph shown in the figure; a few isolated residues do not show a tendency for correlated motion with neighbouring sites. Among these are the mobile sites Met46, His47, Ile48 and His357. Evidently, this is a function of the cut-off value used in valuing correlation of motion in the protein, which here was chosen to produce a graph as sparse as possible but subject to the constraint that all subgraphs that are uncoupled from the core graph comprise only single residues. This allows the essence of the correlated restructuring motion to be captured, while keeping the network simple. Here, this procedure gives rise to a core connected component of graph diameter 26 and a vertex eccentricity of F of 14 (as compared to 15 and 8, respectively, for the DS contact network detailed above). The most notable feature of this graph is the high density of hubs, which here account for 20% of the nodes as compared to 5% for the contact networks. In addition, the assortativity (the tendency of nodes to attach to nodes of similar degree) of the correlation network is higher than for the adjacency networks. Mobile sites are typically characterised by a lower degree, high local clustering coefficient and low betweenness centrality. In addition, it is noteworthy that all signalling sites except for that surrounding Ser45 are remote from F and thus the network centre. Detailed summaries of the pertinent graph-theoretical parameters are given in the supporting information (figures S13 and S14); we will return to peculiar features in these data in the Discussion. Expectedly, the displacement correlated contact networks do depend on the choice of the correlation metric and the associated cut-off. While the network layout and individual paths can be strongly affected if different methods retain (slightly) different dynamic correlations, the overarching features such as trends in the pertinent parameters, the identity of important nodes/hubs agree. In favour of conciseness, here we shall limit ourselves on the graph based on equation (6).

Discussion
Structural changes upon photo-activation Our data suggest that grosso modo the photo-activation of ErCry4 is associated with an expansion of the protein.
In particular, the CTT is attached in the DS and, when illuminated, released in a fashion similar to what has been described for AtCry and DmCry [54][55][56][57][58][59][60]. We do not see indications of the CTT's attaching to the PHR of the protein in the course of the photo-activation, as suggested for cryptochrome 4 from chicken [41]. In [41], the authors found that the accessibility of trypsin to the FAD-binding domain is reduced upon light illumination, which they interpreted as the binding of the CTT to the protein core, thereby protecting the associated tryptic digestion sites. The present analysis does not support this suggestion. On the contrary, the CTT appears to be released in a way that increases the accessibility at the FAD binding site (see figure 4). The photo-activation of ErCry4 is, however, linked to substantial reorganisation at the CTT-coupled motif, which could possibly alter the susceptibility to trypsin. Alternatively, the discrepancy could be attributed to differences in the conformational changes of the full-length protein relative to the truncated structure currently amenable to theoretical studies or to the protonation state of the flavin. Here, we have investigated the radical anionic state, as this is the state that, in AtCry and DmCry, accounts for the fast, microsecond, structural rearrangements that lead to protein signalling. Indeed, in AtCry these conformational changes precede the protonation of the flavin anion radical at the nitrogen atom N5 with a nearby aspartic acid functioning as a proton donor (D396 in Cry1) [71][72][73][79][80][81][82]. As no comparable proton donor exists in ErCry4 (just as in DmCry; see figure S15 in the SI), one can safely assume that the microsecond conformational dynamics are correctly represented in our calculations. Further studies will be necessary to assess the influence of the full-length CTT once its structure has been revealed.
Signalling mediated by mobile histidines Ganguly et al [55] have conducted a detailed atomistic study of the activation of DmCry and concluded that the CTT release correlates with the conformation of conserved His378 (DmCry numbering), which resides between the CTT and the flavin cofactor. Given the high degree of homology of DmCry and ErCry4 (45.5% sequence identity over the entire sequence), in particular for the PHR, the question arises if a similar conformational change could be associated with the signalling of ErCry4. His378 in DmCry corresponds to His353 in ErCry4. According to the presented data, His353 in ErCry4 does not undergo a significant shift with respect to the isoalloxazine subunit of FAD (F) upon photo-activation (shift: -0.3 Å, in the opposite direction and by less than observed for His378 in DmCry). Furthermore, His353 assumes a peripheral position in the displacement correlated contact network, which does not hint at an important role in the activation of the protein's CTT domain (see figure 6). In particular, the residue does not qualify as hub or important residue from its betweenness centrality. Alternatively, here His357 is realised as a mobile residue in the vicinity of FAD (distance to isoalloxazine ring: ∼9 Å). However, our correlation network analysis indicates that its displacements are only weakly correlated to the mainstay motions, i.e. mostly random, to the extent that the node is not coupled to the main reorganizational network. Thus, it is likewise unlikely that His357 in ErCry4 could perform a similar function to His378 in DmCry. A closer inspection of the protein environment of His353 in ErCry4 reveals that its complacent signalling role does not come as a surprise: unlike for DmCry, the fold of ErCry4 does not allow hydrogen bonding of the histidine to the ribityl unit in FAD. This applies to all protonation states of His353. Correspondingly, we have not yet explored the impact of prototropy or double-protonation for ErCry4, which are relevant for DmCry [55].

Within-protein signalling pathways
Signalling 'pathways' are often understood as isolated continuous chains of residues connecting the key sites of a protein or a group of proteins, i.e. the effector site and multiple possible recipient sites. The present analysis based on networks of short-range correlated displacements portrays an entirely different picture of ErCry4. Instead of single pathways, we have identified difficult-to-define networks of residues that mediate global conformational changes involving most, if not the entirety, of the protein. Specifically, we have established by PCA that two main conformers, interpreted as resting and activated, are linked to changes in a single principal component that simultaneously affects multiple sites including the N-and C-terminal parts of the protein. Furthermore, we have realised that the correlated reorganisation network comprises communities of densely coupled nodes with a remarkably high density of hubs (20% over the entire protein as opposed to 5% in contact networks of average proteins; even dense in the strongly coupled subdomains). For these regions, the coupling network appears highly redundant and able to transmit information in a concerted fashion. The question of a pathway in the sense of a chain of well-defined residues is obviously moot for such tightly knit sections. On the other hand, many of the mobile sites in ErCry4 are characterised by a linear or quasi-linear topology, for which the notion of a pathway applies in good approximation if the view is limited to the dynamics away from the interconnection site. The mentioned, densely coupled communities can be determined from the graph by maximising the so-called modularity, i.e. the fraction of the edges that fall within communities minus the expected fraction if edges were distributed at random [64]. This procedure yields communities of dense connections within reorganisation modules but sparse connections to adjacent modules. In figure 7 we show a 3D representation of ErCry4 for which the 12 communities of cardinality larger than 1 have been coloured differently. According to the above discussion, within these communities signals are transmitted either linearly (for the mobile protrusion of the graph) or in a complex multi-sited mode involving many concurrent pathway. While the simple pathway picture is ill-conceived in the presence of the latter, the reorganisation network is still depending on a few distinguished nodes, the nodes of high centrality. These nodes mediate bottlenecks in the reorganisation network, typically at the interface of the tightly coupled communities. In this way, they facilitate the long range transmission of information while the hubs support short to medium range reorganisation. Note that for ErCry4 all mobile residues but residues 405-407 are located in a community other than that containing the flavin cofactor. Consequently, high betweenness nodes are essential for most reorganisations in this protein. In table 1 we have collected the 20 nodes of highest betweenness centrality as well as the hubs. Some of these nodes are also labelled in figure 6. The most remarkable feature is the fact that 4 residues (Glu102, Met103, Ser250 and Asp385) are crucial in passing the signal from the F-residue to sites at the N-terminus of the protein; if one wished to abolish this pathway, these would be the residues to target. While high betweenness residues also appear for the connection of F and the CTT domain, the system is more redundant at the C-terminal end suggesting that a point mutation will have a smaller impact. Figure 7(b) illustrates the coarse grained network graph of reorganizational communities. There, the widths of the edges are proportional to the number of  pairs of residues transmitting the coupling. This picture suggests that the FAD-site (community 2) is efficiently coupled to community 1, which mediates the release of the CTT (community 8). The mobile site involving residue 45 (contained in community 4) is directly innervated by the cofactor bearing community 2. Table 2 summarises the association of mobile sites with the reorganizational communities they are contained in. It is remarkable that only communities 11+1 and 4+12 appear to be coupled through mobile sites at the interface of the respective pairs. All other links involve high betweenness sites that however are only displaced by a comparably small amount with respect to the protein backbone. Note furthermore that while residues 405-415 are located at the interface of communities 2 and 11, they do not provide a connection via correlated displacements.

Contact maps versus dynamically correlated adjacency networks
With a description of the reorganizational dynamics of ErCry4 based on the correlated motion of adjacent segments established, the question arises if conventional contact networks can provide comparable insights. We here argue that this is not the case, because the mere adjacency of two residues does not necessarily imply an efficient reorganisation pathway. Only if spatially close sites undergo correlated displacements, can a functional relationship be inferred.
Here, the presence of non-productive links in the contact network becomes obvious in the face of the markedly larger sparsity of the correlated displacement graphs, which eventually gives rise to altered reorganisation pathways. This is, for example, reflected in the distance of the mobile sites from the isoalloxazine subunit, F: while the contact network predicts that the site comprising residues 237-246 and the C-terminal lid (residues 405-415) are relatively close to F, the dynamic displacement map reveals that they are as peripheral as the CTT (compare figures S10 and S13 in the SI). Likewise, residues that are identified as important to the contact network by their high betweenness centrality (e.g. His357, W A , W D , Asn394) are found not to actually transmit reorganisations by the correlated displacement maps. This is particularly obvious for His357, for which an analysis based on the contact network is likely to (erroneously) suggest functional relevancy. In strong contrast, the dynamic correlation network divulges that the residue is actually uncoupled from the main reorganisation network as a consequence of it lacking correlated motion with adjacent residues. Likewise, the contact network analysis overestimates the role of W D in releasing the CTT by placing it at a bottleneck position of the shortest reorganisation path. While analysis of the correlated displacements based on equation (7) reveals a shortest path close to this prediction, it indicates that Asp367 instead of W D is important. The analysis based on equation (6) favours an alternative pathway involving the α-helix H12. Thus, in summary, a study based on contact alone might fail to elucidate the intricate reorganisation properties of proteins.

Phosphorylation sites
It is important to stress that some of the mobile residues of ErCry4 are serine, thus providing a hint to phosphorylation sites that could be functionally coupled to the photo-activation. In order to explore this possibility, we have used NetPhos 3.1 [83,84] to predict the propensity of serines, threonines or tyrosines to phosphorylation by kinases. We find that 8 out of the 42 predicted phosphorylation sites are located within mobile sites (see table 3). Among these, the site surrounding Ser45 is particularly noteworthy as it comprises three adjacent phosphorylation sites (Thr43, Ser44, Ser45) that undergo α-helical coiling in response to the photo-activation of ErCry4. This conformational change is encoded by the second largest principal component accounting for 23% of the positional variance in the combined ensemble of DS and RPD states and is associated with a large increase in the SASA of Thr43 (see figure S16). Likewise, the contact graphs show that the potential phosphorylation sites Ser180 and Ser278 are on average more exposed in the photo-activated state than the DS ( figure S9). While currently unsupported by experimental evidence, we speculate that phosphorylation at these sites could have a function in modulating/ deactivating the photo-transduction cascade. This could proceed in a manner similar to the deactivation of rhodopsin, which after photoexcitation undergoes conformational changes that allow its phosphorylation by rhodopsin kinase [85]. It is then capped by arrestin thereby preventing further activation of transducine.

Conclusions
The present investigation utilises three tools to unravel the governing principles underlying the photo-activation of ErCry4. First, we have identified groups of residues that undergo marked structural reorganisation based on changes in internal coordinates, i.e. the distance matrix, and characterised their interrelation by principal component analysis. Second, we have mapped the protein structure onto a graph representation facilitated by the adjacency of residues and used these contact networks to identify common motifs of the mobile sites such as a low degree, an above-average local clustering coefficient and a low centrality. Third, we have demonstrated that these contact networks, though popular, can only reveal an incomplete picture of the dynamics. This shortcoming can be overcome by an alternative graph representation that is induced by the correlation of displacements in addition to the usual spatial adjacency. The signalling is multifaceted and involves many pathways: the network efficiently transmits correlation over short to intermediate distances via nearest-neighbour interactions and rigid secondary structure elements. The process is facilitated by hubs, which are four times more abundant than in the contact networks of the DS and RPD state. Tightly knit communities of residues, i.e. reorganisation modules, are interconnected by a small number of important residues, whose betweenness centrality also exceeds that of central nodes of the contact network by a factor of 3.5. Our analysis raises the prospect that the reorganisation properties could be altered or even engineered by manipulating these key links. In any case, the network layout provides a foundation for efficient signal transduction that goes beyond alternative modes of reorganisation such as a single pathway model or the undirected, i.e. stochastic, search of a new free energy minimum. More specifically, the analysis has revealed that the most prominent reorganisation sites in ErCry4 are located at the CTT and CTT-coupled motifs. In addition, the loop region involving Thr43-Ser45 has been found to show an interesting transition upon photo-activation whereupon a solvent-exposed α-helix forms. This and many of the other mobile sites expose serines (see figure 5 and table 3) that are potential phosphorylation sites in the ErCry4 photo-activation process. This observation allows us to insinuate that the phototransduction cascade could potentially be modulated/controlled by kinases.
Here, we focus on a radical pair state that is formed in a photo-induced electron transfer reaction from the darkstate protein containing the FAD cofactor in the fully oxidised form. It is currently unknown and intensely debated whether this state could form the basis of a magnetic compass [14,16,29,31,32,35,36,86]. Some evidence in favour of the alternative radical pair resulting from the re-oxidation of the fully reduced cryptochrome by e.g. oxygen has recently been found [18,26,33,[48][49][50][51][52][53]. Studies of the signalling of this and related states of ErCry4 are currently ongoing. However, based on experimental findings for the analogous activation process in DmCry, we anticipate that the signalling behaviour will be prominently mediated by the cofactor and widely indifferent to the oxidation state of W D [54]. The toolchain established here will prove useful in these follow-up studies.
Our approach has a bearing that goes beyond the photo-activation process in ErCry4 discussed here and is applicable to a multitude of signalling processes and allosteric adaption. In fact, it provides insights into a much deeper question: namely, does protein signalling/allostery involve dedicated pathways or follow a more holistic, concerted paradigm, for which the transition states and interactions are fleeting with only the initial and final state well-defined. The ErCry4 studied here shows characteristics of both prototypes: concerted reorganisation Table 3. Potential phosphorylation sites in regions that undergo large rearrangements upon the photo-activation of ErCry4 . The table lists  mobile regions by their residue sequence number and reorganisation  communities using the numeric labels from figure 7.   Potential phosphorylation sites  Mobile site   Reorganisation  community   T43, S44, S45  38-48  4  S130 129-131 12 S180 179-186 10 S278 274-285 7 T420 419-422 11 S475 CTT 8 in tightly coupled reorganisation modules, but still remnants of pathways in the form of highly important residues at interface sites. Currently, we utilise extensive MD simulations to derive the correlation of displacements of residues. However, as the dynamically correlated networks are based on the concept of local correlation, this raises the prospect of deriving the graph representations from short-time MD simulations or alternative tools such as normal mode analysis or elastic network models [87]. In this way, the approach could provide predictive instead of descriptive power.