Time Series Analysis on the Conformational Change of c-Src Tyrosine Kinase

c-Src tyrosine kinase plays an important role in signal transduction pathways, where its activity is regulated by phosphorylation of the two tyrosine residues. We performed targeted molecular dynamics simulation to obtain trajectory of conformational change from inactive to active form. To investigate the conformational change of c-Src tyrosine kinase, we applied network analysis to time series of correlation among residues. The time series of correlation between residues during the conformational change generated by targeted molecular dynamic simulation. With centrality measures such as betweenness centrality, degree centrality, and closeness centrality, we observed a few important residues that significantly contribute to the conformational change of c-Src tyrosine kinase for the different time steps.


Introduction
Tyrosine kinases play a critical role in various biological processes such as migration, angiogenesis, proliferation, differentiation, survival, and immune function [1][2][3]. c-Src (cellular Src), encoded by Src gene, is a non-receptor tyrosine kinase first isolated as the normal cellular homolog to the potent avian sarcoma viral transforming oncogene v-Src [4]. c-Src tyrosine kinase consists of the N-terminal unique region, the Src homology 3 (SH3), SH2, linker, kinase domain, and the regulatory C-terminal tail. Under normal circumstances, the SH3 domain of the c-Src tyrosine kinase binds to the proline-rich region in the linker domain. And the SH2 domain binds to Tyr527, which leads to a closed conformation [5,6]. However, under certain conditions, the closed, inactive c-Src tyrosine kinase undergoes a transition to an open and active conformation. One of the important features of c-Src tyrosine kinase catalytic activation is to control its phosphorylation status. One of the major phosphorylation sites is Tyr527, which is located in the C-terminal tail. Dephosphorylation of pTyr527 (phosphorylated Tyr527) releases the closed conformation, which leads to the active state. Another major phosphorylation site is Tyr416, which is located in the activation loop of the tyrosine kinase domain [7,8]. c-Src tyrosine kinase activity depends on the phosphorylation status of the two residues: Tyr527 and the Tyr416.
Using targeted molecular dynamics (TMD) simulation, we study the conformational change of c-Src tyrosine kinase by applying an external bias [9]. We generate sequential and continuous transition between two known conformations: the inactive and the active conformation for the case of Tyr527-pTyr416 (instead of pTyr527-Tyr416).
Network analysis has been successfully applied to biological problems [10,11]. For instance, Goh et al. constructed a network of Mendelian gene-disease associations to identify unknown important genes causing disease [12]. In a biological network, a node corresponds to a gene and an edge is an interaction or correlation between genes. Centrality measures are very effective approaches to determine the ranking or importance of genes in the gene-disease network. Those centrality measures uncover important target genes relevant to disease. In this study, using network analysis, clustering method, and time-series analysis, we attempt to reveal the key residues, which influence the conformational transition of the c-Src tyrosine kinase between the inactive and active conformation.

Targeted molecular dynamics (TMD) simulation
To perform the Targeted Molecular Dynamics simulation, an external potential, U TMD [13], to be defined.
In the above equation, RMSD(t) is defined as root-mean-square deviation (RMSD) of the simulated structure from the target structure at time t. RMSD*(t) is the RMSD value at time t assuming a linear decrease from the initial to the target structure. The inactive (PDB ID: 2SRC) and active form (PDB id: 1Y57) of c-Src tyrosine kinase are defined as initial and target conformation, respectively [14,15]. Both the conformations were modified (to have the same number of atoms) for the TMD simulation. The spring constant 'k' was set as 2500 kcal/mol ÁÅ 2 for 3,619 atoms (hydrogen atoms excluded). NAMD 2.9 [16] with the CHARMM 27 force field [17] was used to perform simulation and the protein parameters were incorporated with the CMAP corrections [18]. In TMD, the conformational change from inactive to the active state is guided by the external force within a reasonable time scale. In our calculation, we have used 10 ns time scale. To undergo the conformational change within 10 ns time scale and to avoid the bias effect, we employed the smallest spring constant 'k' (2500 kcal/mol ÁÅ 2 ). TIP3P water model [19] was used in the simulation. The particle mesh Ewald (PME) method was set as 12 Å direct space cut-off [20]. The damping coefficient was set as 5 ps À1 for the Langevin dynamics simulation. To maintain the constant pressure (1 atm), Nosé-Hoover method was used (1 atm) [21]. The NPT ensemble was carried out in 310K for the TMD simulation.

Network analysis
To analyze the correlation between residues in the trajectories of the TMD simulations, we have used the dynamical cross-correlation (DCCM) method [22][23][24].
Where, r i (t) and r j (t) = atomic positions of the i th and j th C α atoms at time t. The quantity "r i (t) À <r i (t)>" corresponds to the fluctuation of the "i th " atom and "r j (t) À <r j (t)>" corresponds to the fluctuation of the "j th " atom. For all the Cα atoms, a 450 Â 450 (residues 84-533) correlation map was obtained during the 10 ns TMD simulation. In Eq. (2), the quantity C ij in the DCCM is an adjacency matrix. The weight w ij , of the edge between the nodes, 'i and j', defined as [25,26] DCCM can be used to measure the weight, which is the probability of information transfer across the edge. In the constructed network, every node is a Cα atom and each edge is an information transfer probability, in the cross-correlation. To identify and quantify the nodes that occupy critical positions in a network, a few centrality measures have been proposed, including the degree, betweenness, and closeness centralities [27][28][29].
The degree centrality measures the number of edges incident on a node in a network, thus expressing the "popularity" of the node.
where A ij is the adjacency matrix: if w ij = 0 then A ij = 0, otherwise A ij = 1. The closeness centrality is defined as the average length of the shortest paths between a node and all the other nodes in a network. It can be used to measure information spread from a given node to the other nodes. The closeness centrality is defined as where g(v i , v j ) is the shortest path with a weight between two nodes 'i' and 'j'. The betweenness centrality is to measure the number of information pathways that flow through a node in a network. The betweenness of node 'i' is the fraction of the shortest paths between pairs of nodes that pass through node 'i'. The betweenness centrality is defined as where g st i is the number of shortest paths from 's' to 't' with a weight that passes through node 'i'. n st is the total number of shortest paths from to 's' to 't'. We obtained DCCM and three centralities using Bio3d [30][31][32]

Clustering
Clustering is an effective approach to discover interesting patterns of time series data. It is widely used in diverse fields of research from biology to economy: functional clustering of time-series gene expression data [33], identification of functionally related genes [34-36], detecting brain activity [37,38], identifying pathological cases from mass spectrometry (MS) clinical samples [39], discovering energy consumption pattern [40,41], and pattern finding in stock time series [42,43].
Using the targeted molecular simulation (TMD), we generated a dynamical correlation matrix between two residues, C ij , for each time step during the conformational change from the inactive to active state. Then, the time-series of C ij are grouped into a set of time series in such a way that their temporal profiles in the same group (cluster) are more similar to each other than those in other groups (clusters). We applied a hierarchical clustering algorithm and repeated clustering runs with a different number of clusters from 4 to 10 until getting clear clustering. The function 'tsclust [44]' of 'DTWclust [45]' in the R statistical package is used for clustering of time series data.

Conformational change of c-Src tyrosine kinase
The electrostatic interaction between the Tyr527 and the positively charged Arg175/Lys203 becomes weaker in the inactive conformation at the early stage of the transition from the inactive to the active conformation by TMD simulation. The C-terminal tail, including Tyr527, is completely detached from the SH2 domain (the period of 0-3 ns) in the TMD simulation. The detachment of Tyr527 from the SH2 domain triggers the conformational change. The detached Tyr527 moves toward the kinase domain (4-6 ns period: Figure 1b and c). At this stage, the most prominent conformational change occurs in the kinase domain. The secondary structures,  521-533, Yellow). The two residues, pTyr416 and Tyr527, are represented as licorice [9].
including the αC-helix in the kinase domain, rotate significantly. During this time, Tyr416 remains buried beneath the activation loop. At the final stage of transition, Tyr527 has reached the far side of the kinase domain relative to the SH2 domain. The activation loop has also moved from its original position. pTyr416 is now exposed to the surface.
The activation processes from the inactive state (0 ns: Figure 1a) to the active state (10 ns: Figure 1d) in the TMD simulation are shown in Figure 1.
The conformational transition of c-Src tyrosine kinase from the inactive to the active state generated by TMD simulation is shown in the supplementary video material [9].

Centrality measures
The degree, closeness, and betweenness centralities are measured for the conformational change that occurred during the 10 ns TMD simulation (Figure 2). The residues with high values of the degree, closeness, and betweenness centralities are not located in the activation loop, linker, αC-helix. The residues with high centrality measures are mainly confined to the helix region adjacent to the αC-helix. The residues with the top 5 values of each centrality measure are listed in Table 1.

Clustering
We investigate correlation patterns between specific residues (Trp260, Tyr416, Tyr527, Lys321) and all the other residues using the clustering method. The clustering of residues provides us with information on not only the pattern of the time series but also the correlation values, C ij , between residues at each time step.  [9].

Betweenness
Closeness Degree

Trp260
Trp260 has an important role in the conformational change of c-Src tyrosine kinase [46]. Figure 3 shows a time series of correlation, C ij (t), for Trp260. It shows distinctive patterns among clustering groups. Residues in both group 6 (Figure 3a) and group 11 (Figure 3b) are positively correlated to Trp260 for the entire time window (0-10 ns). The residues in clustering group 9 (Figure 3c), however, are negatively correlated to Trp260. Figure 1a shows no correlation in early time (0-4 ns). The correlation increases positively (4-7 ns), and reaches the highest correlation value ($7 ns). Eventually, the correlation decreases (8-10 ns). The location of residues in clustering group 6 (orchid), clustering group 9 (cyan), and clustering group 11 (pink) are shown (Figure 3d). The positively correlated (blue) and negatively correlated residues (red) are shown (Figure 3e). The positively correlated residues are mainly located near residue Trp260 in the inactive form of c-Src tyrosine kinase. The negatively correlated residues are spread around a relatively far from Trp260 in the inactive form. The time series analysis based on the clustering method and network analysis indicates that Trp260 is rapidly changing its position after the second half of the conformational transition. Figure 4 shows the time series of correlation, C ij (t), for pTyr416. The residues in the clustering group 12 are positively correlated to Tyr416 for the entire time window (0-10 ns) (Figure 4a). The residues in the clustering group 5, however, are negatively correlated to pTyr416 (Figure 4b). Interestingly, the residues in the clustering group 16 show correlation change in the two different time regimes (Figure 4c). As shown in Figure 4a, C ij (t) has a large value (> 0.5) even in early time and increases to 0.9. During the 7-8 ns period, C ij (t) reaches a plateau with the highest correlation value. After 8 ns, C ij (t) returns to 0.5. The residues in the clustering group 12, shown in orchid in Figure 4d, are mainly located close to pTyr416. The positive correlation between the clustering group 12 and pTyr416 is due to "physical distance".

pTyr416
Most of the residues in the clustering group 5 are negatively correlated to pTyr416 during the conformational transition (Figure 4b). C ij (t) decreases to a minimum value of -0.7 around 7 ns and returns to -0.3. The residues in group 5, shown as cyan in Figure 4d, are located far away from pTyr416. The negative value of C ij (t) indicates that the movement of pTyr416 and the residues in the clustering group 5 are in a reverse direction.
The residues in the clustering group 16 show the correlation pattern change to pTyr416 during the conformational transition. During 0-4 ns period, residues in group 16 (shown as yellow in Figure 4d) shows negatively correlated to pTyr416 with a minimum value of -0.3 in C ij (t). And after 4 ns, the correlation pattern shows positively correlated to pTyr416. C ij (t) reaches 0.7 around 7 ns in the conformational transition process.
The positively correlated (blue) and negatively correlated residues (red) to pTyr416 are mapped into the inactive conformation (Figure 4e).  Figure 5 shows the time series of correlation, C ij (t), for Tyr527. The residues in the clustering group 14 are positively correlated to Tyr527 for the entire time window (0-10 ns) (Figure 5a). The residues in the clustering group 9, however, are negatively correlated to Tyr527 during the most time of conformational transition (Figure 4b). As with the case of pTyr416, the residues in the clustering group 2 show correlation pattern change in the two different time regimes (Figure 5c). As shown in Figure 5a, C ij (t) increases to 0.6 during 0-4 ns period. After an abrupt decrease, C ij (t) increases again to reach a plateau with the value of 0.6 during 6-10 ns. The residues in the clustering group 14 (shown as orchid in Figure 5d) are located close to Tyr527. Similarly, the positive correlation between the clustering group 14 and Tyr527 is due to a strong coupled through "physical distance".

Tyr527
Most of the residues in the clustering group 9 are negatively correlated to Tyr527 during the conformational transition (Figure 5b). The residues in the clustering group 9 (shown as cyan in Figure 5d) are located away from Tyr527 in the inactive conformation.
The residues in the clustering group 2 show the correlation pattern change to Tyr527 during the conformational transition, which is observed in the case of pTyr416. During 0-4 ns period, the residues in the clustering group 2 (shown as yellow in Figure 5d) shows negatively correlated to Tyr527 with a minimum value of -0.3 in C ij (t). And after 4 ns, the correlation pattern shows positively correlated to Tyr527. In the last, C ij (t) reaches 0.45 around 7 ns in the conformational transition process.
The positively correlated (blue) and negatively correlated residues (red) to pTyr416 are mapped into the inactive conformation (Figure 5e).

Lys321
According to network analysis, Lys321 has high centralities both in betweenness and degree centralities ( Table 1). Within the framework of network theory, it implies that Lys321 would play an essential role in the conformational change of c-Src tyrosine kinase. The residues in the clustering group 13 are positively correlated to Lys321 for the entire time window (0-10 ns) (Figure 6a). The residues in the clustering group 1 are negatively correlated to Lys321 (Figure 6b). As with the cases of pTyr416 and Tyr527, the residues in the clustering group 16 show correlation pattern change in the two different time regimes (Figure 6c).
As shown in Figure 6a, C ij (t) values fluctuate around 0.5 during the 0-5 ns period. After that, it increases to 0.7 around 7 ns. Finally, it decreases to around 0.55. The residues in the clustering group 13 (shown as an orchid in Figure 6d) are located close to Lys 321. Similarly, the positive correlation between the clustering group 14 and Lys321 is due to a strong coupled through "physical distance".
Most of the residues in the clustering group 1 are negatively correlated to Lys321 during the conformational transition (Figure 6b). The residues in the clustering group 1 (shown as cyan in Figure 5d) are located away from Lys321 in the inactive conformation. Figure 6b shows that C ij decreases to -0.5 until 6 ns and increases and The residues in the clustering group 16 show the correlation pattern change to Lys321 during the conformational transition, which is observed in the cases of pTyr416 and Tyr527. During 0-2 ns period, the residues in the clustering group 16 (shown as yellow in Figure 6d) shows negatively correlated to Lys321 with the minimum value of -0.25 in C ij (t). And for 2-4 ns, the correlation pattern shows positively correlated to Lys321. After 4 ns, it sharply decreases to the minimum value of -0.25. Subsequently, it reaches the maximum positive value of 0.3 in C ij (t). Eventually, it decreases to zero. The C ij (t) for Lys321 shows more correlation pattern changes compared to the cases of pTyr416 and Tyr527.
The positively correlated (blue) and negatively correlated residues (red) to Lys321 are mapped into the inactive conformation (Figure 6e).
The residues in the clustering group (Figures 3d, 4d, 5d, and 6d) are listed in Table 2 positively and negatively correlated residues in the clustering group (Figures 3e, 4e, 5e, and 6e) are listed in Table 3. Negative correlation Cluster 5 162,163,169,170,171,172,173,174,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205 The residues in the clustering group (Figures 3d, 4d, 5d, and 6d).

Conclusions
In this study, we investigated the conformational transition of c-Src tyrosine kinase from the inactive to the active state using TMD simulation and network theory. Tyr416 and Tyr527 are known to be significant residues in the conformational transition by controlling the phosphorylation status. They play as a switch for the conversion of the inactive/active conformation. The time-dependent correlation matrices, C ij (t), between all the residues and pTyr416 and Tyr527 show very similar patterns (positively correlated, negatively correlated, negatively/positively correlated) during the conformational transition process. The time-series analysis supports that pTrp416 and Tyr527 act as a switch in a very concerted manner for completion of conformational transition of c-Src tyrosine kinase. Based on the analysis of the three centrality measures (betweenness, closeness, and degree), we observed that Lys321 plays an essential role in the conformational transition of c-Src tyrosine kinase. The time-series analysis shows that C ij (t), between all the residues and Lys321 show positively correlated, negatively correlated, and pattern change from negatively to positively correlated one during the conformational transition process. Combining the network analysis with time-series analysis, Lys321 may be another candidate for a switch for the conformational transition of c-Src tyrosine kinase.