Structural analyses of 2015-updated drug-resistant mutations in HIV-1 protease: an implication of protease inhibitor cross-resistance

Strategies to control HIV for improving the quality of patient lives have been aided by the Highly Active Anti-Retroviral Therapy (HAART), which consists of a cocktail of inhibitors targeting key viral enzymes. Numerous new drugs have been developed over the past few decades but viral resistances to these drugs in the targeted viral enzymes are increasingly reported. Nonetheless the acquired mutations often reduce viral fitness and infectivity. Viral compensatory secondary-line mutations mitigate this loss of fitness, equipping the virus with a broad spectrum of resistance against these drugs. While structural understanding of the viral protease and its drug resistance mutations have been well established, the interconnectivity and development of structural cross-resistance remain unclear. This paper reports the structural analyses of recent clinical mutations on the drug cross-resistance effects from various protease and protease inhibitors (PIs) complexes. Using the 2015 updated clinical HIV protease mutations, we constructed a structure-based correlation network and a minimum-spanning tree (MST) based on the following features: (i) topology of the PI-binding pocket, (ii) allosteric effects of the mutations, and (iii) protease structural stability. Analyis of the network and the MST of dominant mutations conferring resistance to the seven PIs (Atazanavir-ATV, Darunavir-DRV, Indinavir-IDV, Lopinavir-LPV, Nelfinavir-NFV, Saquinavir-SQV, and Tipranavir-TPV) showed that cross-resistance can develop easily across NFV, SQV, LPV, IDV, and DRV, but not for ATV or TPV. Through estimation of the changes in vibrational entropies caused by each reported mutation, some secondary mutations were found to destabilize protease structure. Our findings provide an insight into the mechanism of PI cross-resistance and may also be useful in guiding the selection of PI in clinical treatment to delay the onset of cross drug resistance.


Background
Ever since the identification of the Human Immunodeficiency Virus (HIV) as the cause of Acquired Immunodeficiency Syndrome (AIDS), scientists have raced to find effective and sustainable treatment methods to inhibit viral replication and assembly. In the last 30 years alone, HIV has grown to be a pandemic with more than 35 million people infected worldwide [1]. While therapeutic progress has been made in prolonging the lifespan of infected individuals using the Highly Active Anti-Retroviral Therapy (HAART) [2][3][4], HIV rapidly adapts and develops drug resistance.
Although new drugs such as Protease Inhibitors (PIs) are constantly being developed, such progress is outpaced by HIV drug resistance. This drug resistance arises from mutations in the viral protease gene to compromise the protease-PI interaction to facilitate the binding to protease substrate (i.e. Gag), even in the presence of the PIs [5][6][7][8][9][10][11]; consequently rendering the PIs ineffective.
While drug resistant mutations are often associated to specific PIs, many confer cross-resistance to other PIs [12]. The cross-resistance makes it challenging to map specific protease mutations to specific PIs. Nevertheless, to effectively guide clinical selection of second or third line of treatment when resistance to one such PI has occurred, such investigations are necessary [13]. Mutation mappings have revealed that these mutations spontaneously arise as part of the natural variance [14] and become dominant during PI-drug treatments.
This high variance in the HIV enzymes often results in reduced viral fitness (in terms of replication and infectivity) and an increasing percentage of inactivated or "unfit" viruses [15]. Nonetheless, of the PI-resistance mutations [16], many also compensate for the reduced viral fitness [17][18][19][20]. Such compensatory mutations are typically found outside the protease active site or on the protease substrate Gag [21][22][23][24][25][26] to balance fitness with the impaired enzymatic activity. Through better surveillance, the reports of such emerging mutations would certainly enable in-depth investigations into the structural mechanisms of drug resistance [27,28].
Current studies of protease structures bearing different resistance mutations [7,[18][19][20][27][28][29][30][31][32][33] are generally focussed on the protease flaps located above the protease active site and how the flaps mediate PI accessibility. Mutations in this area naturally affect the flap motions, reducing PI accessibility and binding. While such mutations can be single point mutations or a cluster to confer resistances to specific PIs, the exact structural mechanisms that result in cross-drug resistance remain enigmatic.
Using network analyses, Ragland et al. [27] and Appadurai et al. [28] investigated the relationship of mutations inside and outside the protease active site. Their studies revealed allosteric effects that explained resistance development against the current PIs. While their studies conveyed residue-based correlations within a protease structure and for specific PI resistance, the cross-resistance conferred by different combinations of mutations remains not well established.
In this study, we investigated different combinations of protease mutations and how they lead to resistance to various PIs by constructing a structure-based correlation network of 33 protease-PI mutant complexes from the 2015-updated clinical drug-resistant mutations in HIV-1 protease [34]. To construct this network, we employed different structural constraints such as PI-binding pockets corresponding to various PIs, allosteric effects on the PIbinding region, and structural stability affected by mutations. In addition, we investigated the mutation hotspots that can destabilize the protease structure. Together, these findings can lead to better understanding of PI crossresistance and serve as a possible guide in the clinical selection of PIs to delay the onset of PI resistance.

Structural modelling of mutated complexes of protease and protease inhibitors (PIs)
We modelled 26 mutated protease structures based on the various combinations of mutations reported in the 2015 update of drug-resistance mutations in HIV-1 [34]. According to this report, the mutations were categorized into "major" (substantially reducing protease binding ability to the PIs) and "minor" (some compensate and improve the viral fitness loss caused by major mutations, whereas others affect PI susceptibility) [34]. These protease mutants are known to be resistant to the seven Protease Inhibitors For each protease-PI complex, we used SCWRL4 [35] to model the side chains of the mutated residues in the protease. The PI coordinates were then input into the modelled protease and minimized using AMBER12 [36].

Constructing the correlation network of the modelled protease-PI complexes
We simulated the co-related drug resistant mutations against various PIs by constructing a weighted graph generated by Gephi v0.82 [37]. The graph G = (V, E) contains a set of nodes V, each of which represents a protease-PI complex containing the combinations of major and/or minor protease mutations [34]. Nodes are connected by edges E represented by the pairwise structural corelationships between nodes.
To investigate the influence among the nodes, we assigned node weights using closeness centrality, which quantifies the closeness of a node to the other nodes in the network. The nodes with high closeness centrality show the potentially related mutations that would be selected in resistance development against various PIs.
For each node v in graph G, we characterized a vector C v involving (i) PI-binding pocket corresponding to various PIs and mutations, (ii) allosteric effects on the PI-binding region caused by the mutations, and (iii) structural stability affected by the mutations.

(i) PI-binding pocket
The protein cavity detection package fpocket [38] was used to estimate the PI-binding pocket volume (Vpocket) for each protease-PI variant and the influence of the different mutation categories (i.e. major or minor) on the PI-binding region. (ii)Allosteric effect SPACER [39] was used to detect communications and the impact on the PI-binding regions resulting from the major and minor mutations. The communication strengths between sites, characterized by leverage coupling (details in [40]), imply the potential allosteric effect (AllosComm defined in our C v vector) caused by the various mutations. Due to the different combinations of mutations, where some minor mutations do affect the major mutations and PI susceptibility, we estimated the AllosComm as below: 1. If the protease-PI complex contains major mutations do 2. AllosComm ← effect of both (major, minor) to the PI-binding region, excluding effect of minor onto the major mutations 3. Else: 4. AllosComm ← effect of minor to the PI-binding region 5. End if (iii)Structural stability Thermostability of a native protease [PDB:1ODW] given the various mutations was evaluated using ENCoM server [41] and the resulting free energy (ΔΔG) was assigned to the mutated protease structure by linearly combining all energy values (including vibrational entropy and approximated enthalpy scores) that resulted from each single mutation. The structural deviations (RMSd) of the resulting mutated protease from the native protease above were also calculated. Therefore in our graph G, we defined an integrated vector C v for each node v as C v = (Vpocket, AllosComm, ΔΔG, RMSd), and the weight of the edge between nodes i,j was estimated based on Pearson's correlation between two normalized vectors C vi and C vj .

Constructing the minimum spanning tree
After constructing the graph G, a shortest path subgraph H was extracted by employing the Kruskal's minimum spanning tree (MST) algorithm [42] in Scipy package [43]. In this subgraph H, nodes were weighted using numbers of neighbours, and the edges were defined the same as in G.

Results and discussion
Structural relationship of PI-resistance protease mutants We set out to study the cross-drug resistance of protease mutations associated with various PIs structurally. To do this, we estimated the protease pocket volumes of 33 protease-PI complexes using fpocket [38] and calculated the pocket volume for another seven native protease-PI complexes [referenced from PDB:1ODW]. Results of the seven native complexes showed that the protease pocket volume was capable of flexible adjustment in accordance to the PI size (Tables 1, 2, 3 and 4) and that even a single mutation can alter the pocket volume of HIV protease (see Additional file 1).
To further investigate the mechanism, we analyzed the effects of these mutations on the PI-binding site using SPA-CER [39]. We found that both the major and minor mutations influenced the PI-binding pockets. The quadruple mutant-containing complexes (ATV_0, IDV_0, LPV_1, and NFV_1) showed increased allosteric effects when compared to the native complexes (Tables 1, 2 and 3). In addition, some minor mutations at position 82, demonstrated a strong effect on the binding region (in NFV_3, NFV_4, SQV_3, or SQV_4 complexes) when mutated to hydrophilic residues.
To investigate the structural stability of the mutated proteases, we first checked for possible biases in side chain replacement by SCWRL4 [35]. As a control, the replacement protocol was applied on the native protease [PDB: 1ODW], where all the side chains were first removed and added back to the retained backbone. Our results of superimposed full structures of the original native and the reconstituted structures showed that SCWL4 was able to recover the side chains of the native structure (RMSd0 Å). Thereafter, SCWRL4 was used to model the mutated side chains for all the protease mutants. ENCoM [41] was used to estimate the free energy and vibrational entropy and we found that the mutations destabilized the protease structure even though they caused reductions of PI susceptibility (Tables 1, 2, 3 and 4). In addition, it was shown that mutations might have caused structural changes in some mutated protease structures when compared to the native protease (i.e. RMSd values in Tables 1, 2, 3 and 4). Yet these values are unlikely to reflect the structural effect caused by single mutations. Nonetheless, we have incorporated the RMSd factor into our integrated vector C v to construct the correlation network. This was performed to include the structural contribution effect in the network, and also to avoid the possible biases toward allosteric communication and free energy.
Using the multiple structural constraints shown in Tables 1, 2, 3 and 4, we generated a structure-based  ΔΔG and ΔS scores represent free energy and vibrational energy respectively, demonstrating thermo-stability of the protease when mutated from the native protease correlation network of the mutant protease-PI complexes with respect to various PIs (Fig. 1). Figure 1 shows that mutations associated with the seven PIs are highly related where cross-resistance could develop easily against NFV, SQV, LPV, IDV, or DRV (Fig. 1b), with the arrows indicating the predominant directions of the correlations. The results suggest that LPV should be used as the first line of treatment to reduce the development of cross-resistance (since the arrows point from the other drugs to LPV and not vice versa, see Fig. 1b). We next constructed a MST to extract the shortest path between the different PI resistant proteases from a wild type protease (i.e. complex of protease-DRV at the bottom of the MST). Our MST demonstrates a path (Fig. 2) for a wild type protease to develop resistance from one PI to another, serving as a guide for the structural relationships among different protease mutants to various PIs.
The MST was sectioned into four main groups: (I), (II), (III), and (IV) as shown in Fig. 2. The major group III (including most of major mutations-containing protease-PI complexes, i.e. DRV_1, IDV_1, ATV_1, TPV_1, LPV_1, and SQV_1) was connected to the other groups at ATV_2 (to group II) and at SQV_1 (to group IV). The other major resistant mutant NFV_1 however belonged to group I. As the connections between the mutant complexes were weighted based on their structural correlations (see Methods), the thicker edges demonstrated the high likelihood of crossresistance to the next PI in the tree.
In the major group III, strong connections between protease mutants that resist DRV (i.e. DRV_1) and IDV (i.e. IDV_1) were observed. This connection was supported by clinical findings where the combinations of mutations V32I, I47A, and L76V, conferred high-level resistance to DRV and IDV [34]. The path thus further dissuades the use of DRV on patients with IDV resistant viruses.   Interestingly, the MST distinguished NFV_1 from the other major mutants. A distinct mutation D30N around the NFV-binding region might characterize the specificity of the NFV_1 complex ( Fig. 2 and Table 3). However, this D30N mutation is rarely associated with NFVresistance in some non-subtype B HIV [34], suggesting that NFV can be used as a first-line therapy to avoid the rapid development of resistance to other PIs.
Our structural and sequence analyses showed that mutations at the protease flap regions (position 46-54) and residue 90 (i.e. L90M) played a key role to distinguish group III from the other groups. In the major resistance protease mutants (i.e. group III), the flap region mutations were predominantly hydrophobic and accompanied by mutations at position 90 (e.g. L90M), whereas in the other minor resistance protease mutants, the flap regions were mostly hydrophilic with no mutations occurring at position 90 (Fig. 2). The physicochemical changes of the flap regions may have caused rigid or buried flaps, resulting in reduced PI susceptibility.
However, an exception to this trend was found for I50L in the mutant ATV_1 previously reported to improve susceptibility to IDV, SQV, LPV, or NFV [47][48][49]. In our MST, we further defined this relationship where ATV_1 was closer to IDV_1, SQV_1, and LPV_1 than to NFV_1, suggesting that NFV_1 resistance would develop  ). The protease-PI complexes are numbered following indices: 0 (the PDB available complex structure that was used to initiate the reported mutations), 1 (mutated protease-PI complexes containing both major and minor mutations), the rest from 2 to 6 (mutated protease-PI complexes containing minor mutations). In this network, edges are coloured based on the source node, which are connected to other nodes; hence highlighting the link from one node to the others. The node sizes are varied based on different related closeness level of each node to other nodes. b A consolidated graph that shows predominant trends that the protease is most likely to resist, from one PI to another. The thicker the arrows are, the more likelihood that protease mutations extend the resistance to the corresponding PIs. The graph was generated using Gephi v0.82 [37] slower when compared to the others. On the other hand, the other drug resistance mutations (IDV_1, SQV_1, LPV_1) would develop intervening intermediates (LPV_3 for IDV_1 and TPV_1 for the latter two).
For group IV, the path depicted that patients first treated with NFV (i.e. NFV_0 -wild type protease) may develop significant changes, e.g. in the flap region at position 54, and at the hotspot 82 in order to resist SQV. For patients treated with LPV (i.e. LPV_0wild type protease), there may be a series of resistance developed against NFV, SQV, and TPV by a single or a few minor mutations (Fig. 2, group IV). This suggests for the displacement of SQV instead of NFV in clinical use to avoid rendering LPV ineffective later.
Detecting protease structure stability-affecting hotspots Major mutations typically confer PI resistance but also cause viral loss of fitness. This loss is often in turn compensated by other minor mutations. In our findings, we showed that the major mutations destabilized protease structure via increased vibrational entropy (calculated by ENCoM [41]), and this vibrational entropy contributed significantly to the free energy of proteins [50]. To evaluate the reliability of the ENCoM findings, we used CUPSAT server [51] and found 63% (12/19) agreement in the calculation of the single mutations.
As shown in Fig. 3, many minor and even major mutations caused protease structure to be unstable in the course of resistance development. Resistance to DRV, LPV, and TPV resulted in noticeable stability impact as they contain the destabilizing major mutations (Tables 1, 2, and 4). With 60% (6/10) mutations causing the instability, viral resistance to escape from the strong DRV binding [10] comes with significant structural stability compromise.
Of these mutations, residue substitutions at the flap regions I47V (in all PI resistance), I50V (in DRV and LPV), and the mutation L76V (in DRV and LPV) significantly increased the regional entropy. While similar protease-destabilizing effect by major mutations was observed in LPV and TPV resistance, the minor mutation G73S (LPV) or I54A (TPV) amplified this effect while compensating for the viral fitness loss.

Conclusion
Based on the structural responses and viral fitness cost of the clinically reported mutations, we report a PI resistance-related pathway that HIV protease may undertake in PI cross-resistance. In this we found the structural rationale for the rapid development of crossresistance amongst five of the seven clinically used PIs. There are some PIs that would be better used in clinical settings against naïve HIV infections or when resistance Fig. 2 The drug resistance-related pathway that Protease might travel to resist from one PI to another. The minimum spanning tree or MST (left) shows the shortest path that leads to different PI resistance from a wild type protease (e.g. complex of protease-DRV at the bottom of the MST). Nodes are coloured and scaled the same as shown in Fig. 1 according to seven PIs: Atazanavir (ATV, cyan), Darunavir (DRV, blue), Indinavir (IDV, red), Lopinavir (LPV, yellow), Nefinavir (NFV, magenta), Saquinavir (SQV, green), and Tipranavir (TPV, purple). Edge thickness is weighted based on correlation between nodes. Highlighted on the right are the major (in bold) and minor mutations that could distinguish four close groups of mutant protease complexes (namely group I, II, III, and IV): PI-binding site region (position 30, 32), protease flap region (position 46-54), and other important major mutations (position 76, 82, 84, and 90). Protease-PI complexes containing major mutations are highlighted in bold. The graph was generated using Gephi v0.82 [37] has already been developed towards another PI. On this, our findings suggest the use of LPV as the first line of PI in HAART, and depending on the emergence of PIresistant mutations, certain drugs would be more useful in subsequent lines of treatment. The findings of this study thus provides a structural understanding that may be useful to guide the clinical use of PIs in HAART, aiding in drug selection to prolong the effectiveness of the given PI.

Additional file
Additional file 1: Dynamic changes of pocket volumes of three wild type protease-PI complexes during 10 ns molecular dynamics simulation. The three wild type protease-PI complexes (DRV_0, LPV_0, and NFV_0) contain single residue substitutions S37N, L63P, and V3I respectively that contributed to shrink the PI-binding pocket as compared to the native protease [PDB:1ODW]. The fluctuation of pocket volume of the native protease is estimated based on the three PIs (DRV, LPV, and NFV), and is shown in gray shade (~1955 ± 213 Å 3 ). The molecular dynamics simulations were performed in standard protocol for 2x5ns using AMBER14. (PDF 564 kb)  Mutation hotspots that cause protease structures to be unstable when resisting PIs. The hotspots were detected based on vibrational entropy (ΔS) using ENCoM [41] for seven major mutation-containing protease-PI complexes: Atazanavir (ATV_1), Darunavir (DRV_1), Indinavir (IDV_1), Lopinavir (LPV_1), Nefinavir (NFV_1), Saquinavir (SQV_1), and Tipranavir (TPV_1). The major mutations that reduced the PI susceptibility are underlined. Common hotspots among PIs are highlighted in coloured boxes and ovals. The size of the residue columns is reciprocal to the total number of mutations in each complex. Heat maps are generated by the ENCoM