Systematic Investigation on Interactions for HIV Drug Resistance and Cross-Resistance among Protease Inhibitors

We systematically investigate the interactions for HIV drug resistance and cross-resistance of three protease inhibitors. Using Bayesian probabilistic models, we first detect mutation combinations related to drug resistance in the HIV-1 protease or reverse transcriptase (RT) of patients with single, double and triple drug treatments respectively, and then infer detailed interaction structures of these mutations. Moreover, we compare the resistance patterns of single-drug treatments and multiple-drug treatments and study the selection patterns of mutations from single-drug treatments and multiple-drug treatments, using molecular dynamics simulation and free energy calculations, in order to draw inference for giving advice on the design of multiple-drug treatments.


Introduction
HIV drug-resistance, which is caused by mutations of viral proteins that disrupt the drugs' binding, enables the virus to survive and reproduce in the presence of antiretroviral drugs, hence substantially diminishes the efficacy of HIV drug therapy [1][2][3].Furthermore, HIV drug-resistance induced by the genetic variation in virus replication is preserved and flourished by selection pressure during therapy [3][4][5].Although it has been shown that physicians with access to drug resistance data can provide better-responded therapy to patients, the complicated mutation patterns and unequal importance of the viral protein mutations make it hard for genotypic data to be helpful in designing therapeutic regimen [7][8][9].
Zhang [13] proposed an innovative method for investigating mutation interactions of HIV induced by a certain drug treatment [13].In this study, the Bayesian variable partition (BVP) model is applied to the data of three different drug treatments (Indinavir, zidovudine, and nevirapine) respectively to select mutations associated with drug resistance, and then the recursive model selection (RMS) procedure is employed to infer the dependence structure among the interacting mutation positions found by the BVP model.This paper rediscovers the majority of known resistant mutations to the aforementioned three drugs [8] and uncovers several interacting structures for these mutations.In particular, a conditional independence structure among the mutations M46I, I54V and V82A agrees with previous experimental results.
However, combinations of drugs from more than one drug classes, which are believed to be able to prolong virus suppression and immunologic reconstitution, lead to the problem of cross-resistance, i.e.,drug resistance of the HIV virus when two or more drugs are used in the therapy for the same patient [11][12][13][14][15]. Currently there are sixteen antiretroviral drugs that have been approved for the HIV-1 infection: seven nucleoside/nucleotide reverse transcription (RT) inhibitors (NRTI), six protease inhibitors (PIs), and three nonnucleoside RT inhibitors (NNRTI) [11][12].Yet previous studies provide little insight on this issue and often give inconsistent results when analyzing the same input mutation data [5,10].
In this study, we proposed a systematic procedure to apply BVP and RMS on the data of single, double and triple treatments of three PIs: IDV, NFV and SQV from Stanford HIVdb database to detect the multiple-drug treatment effects and compare them to the single-drug treatment effects.We then use Molecular Dynamic (MD) Simulations and Free Energy Calculations to explore the selection pattern of drug-resistance effects from single-drug profile to multiple-drug profile.We find that only a few mutation positions of single-drug treatment showing up in multiple-drug treatment, and that the existence of these mutations depends on the competing outcome of the interaction effects on the drugs.

Using BVP and RMS for cross-resistance for multiple drug
Here we first used the Bayesian variable partition (BVP) model [13,14] to search for the mutation positions (1 to 99) associated with the cross-resistance of IDV, NFV and SQV interactively.After detecting the interaction mutation positions, we applied Recursive Model Selection [13] (RMS) to infer the detailed dependence structure among the interacting positions.
BVP and RMS are both designed to explore mutations for singledrug resistance rather than cross-resistance for multiple drugs.We now show our procedure of using BVP and RMS to study cross-resistance.We applied them sequentially to the data from single-drug treated, double-drug treated and triple-drug treated patients respectively.From Stanford database, we collected 4146 untreated patients' sequences, and single-drug treated sequences (949 IDV-treated, 714 NFV-treated, 240 SQV-treated), double-drug treated sequences (280 IDV-NFV, 143 IDV-SQV, 96 NFV-SQV), and finally triple-drug treated patients (201 IDV-NFV-SQV).
First we applied BVP to all single-drug-treated datasets (IDV, NFV and SQV) versus 4146 untreated dataset, and searched for interactively associated mutation positions (see Figure 1 for results).Then RMS was used to figure out the dependence structure among the interacting positions (see Figure 2).These are just single-drug resistance results.
Second, we applied BVP and RMS to 3 double-drug datasets individually (Figure 4 shows the results for IDV-NFV, Figure S1 and S2 in Supplementary Materials show the results for IDV-SQV, NFV-SQV).Comparing this result with the single-drug resistance results, we can see how the resistant mutation interactions change when two of the three protease inhibitors were used.
Third, we applied BVP and RMS to triple-drug dataset.Figure 3 shows the result.

Results
We downloaded the data from Stanford HIVdb database (http://www.hivdb.stanford.edu).Among all protease inhibitors (PI), NRTI, NNRTI, only several PIs and RTs have sufficient sample sizes, namely IDV, NFV, SQV, AZT and NVP.Since we are interested in cross-resistance within drug classes, we also need good sample sizes for multipledrug treatments.Only IDV, NFV and SQV have large enough sample sizes for all single-drug treatments (949 IDV-treated sequences, 714 NFV-treated, 240 SQV-treated), double-drug treated sequences (280 IDV-NFV, 143 IDV-SQV, 96 NFV-SQV), and triple-drug treated patients (201 IDV-NFV-SQV).Hence we focus our investigation on these three PIs, with the understanding that given enough data, the same methods can be applied to other PIs and NRTI, NNRTI, etc.
Interactions for single-drug resistance of IDV, NFV and SQV Zhang [13] proposed a systematic approach for detecting and understanding combinatorial mutation patterns responsible for HIV single-drug resistance.In order to find interaction pattern for drug resistance of IDV, Zhang [13] applied Bayesian variable partition (BVP) and Recursive Model Selection (RMS) on HIV-1 type B protease sequence data -4146 sequences from untreated patients and 949 sequences from IDV-treated patients.Similarly, we applied BVP and RMS on HIV-1 type B protease sequence data, from NFV-treated patients (714 sequences) and SQV-treated patients (240 sequences).Figure 1 and Figure 2 show the interacting positions detected by BVP and the dependence structure inferred by RMS for IDV, NFV and SQV.Interestingly, even from this single-drug resistance results we can see there are some positions associated with more than one drug treatments.54, 82 and 90 (in red circles in Figure 1) are associated with all of the three drugs, and 10, 71 and 73 are associated with IDV and SQV (but not NFV), while 88 associated with NFV and SQV (but not IDV).Totally, there are 10 positions interactively associated with IDV, 6 with NFV, and12 with SQV.

Dependent structures for IDV, NFV and SQV
After applying RMS, we were able to figure out some of the dependent structures among the interacting positions.Zhang [13] identified two independent groups: one is composed of 46, 54 and 82, the other composed of 73 and 90 (Figure 2a).In Figure 2b and 2c, there are three independent groups identified for NFV: 54 and 82, 20 and 90, 30 and 88.While for SQV, two independent groups were identified:   [13].For NFV (b), three independent interaction groups were inferred with high posterior probabilities (>0.9): 54 and 82, 20 and 90, 30 and 88.For SQV (c), two independent interation groups were inferred with high posterior probability (>0.9): 48, 54 and 82; 71, 73, 79 and 90, although other details of the structure were not able to inferred with high confidence.one contains 48, 54 and 82, the other containing 71, 73, 79 and 90.It is interesting to see that the group of 54 and 82 are associated with all of the drugs.The group of 90 is interacting with 73 for IDV and SQV, but with 20 for NFV.The interacting group of 30 and 88 is drug-specific for NFV.

Conjecture of drug-resistance mutation interaction profiles for multiple-drug treatments based on profiles from singledrug treatments
Figure 1 and Figure 2 are the drug-resistant mutation interaction profiles for IDV, NFV and SQV based on single-drug treatment data.Currently, the state of art in cross-resistance is based on single-drug resistance profiles.If the drug-resistant effects in a multiple-drug treatment are simply additive (independent), i.e. mutation interactions from one drug will not affect the one from another drug in the same treatment, then we would expect to see most of the mutation interactions in single-drug treatments show up in multiple-drug treatment.For example, there are 10 mutation positions involved in resistant interactions of IDV treatment and 6 positions involved in NFV treatment (Figure 1 and 2A), totally 13 non-redundant positions forming about 4 interaction groups (one for 30 and 88, one for 10 and 71, one for 73, 90 and 20, one for 54, 82 and others) for IDV and NFV.Based on the independent assumption, if we apply BVP and RMS to double-drug treatment of IDV and NFV, we would expect to see about 13 positions for mutation interactions and 4 interaction groups either from IDV or NFV.Similar arguments could also be applied for double-drug treatments of IDV-SQV and NFV-SQV.Finally, in triple-drug treatment of IDV-NFV-SQV, we would expect to see totally 18 non-redundant positions for resistant interactions.

Interactions for double-drug resistance of IDV, NFV and SQV
In order to study the interaction patterns for double-drug resistance of three PIs, we applied BVP and RMS to sequences from double-PI treatments: 280 from IDV-NFV treatment, 143 from IDV-SQV treatment, 96 from NFV-SQV treatment.Figure 4 shows the results for IDV-NFV data, while Figure S1 and S2 in the Supplementary Materials show those for IDV-SQV data and NFV-SQV data.For IDV-NFV treatment, in Figure 4, all of the 7 positions BVP detected are either from IDV single drug resistance [13], or from NFV single drug resistance (see Figure 1).For example, 54, 82 and 90 are interactively associated with both IDV and NFV treatments, while 24 and 73 are only associated with IDV treatment, 30 and 88 from NFV treatment.But there are only 7 mutation positions (Figure 4), instead of 13, as conjectured, from single-drug treatments of IDV and NFV.In other words, 6 positions (10, 20, 32, 46, 47, 71) disappeared comparing with single-drug treatment profiles.
Similar things happen to the dependent structure of double-drug resistance inferred by RMS.For IDV-NFV double-drug treatment, we found three resistant interaction groups: 24, 54 and 82 are from one independent interaction group in IDV resistance [13], while 73 and 90 are from another independent group of IDV [13]; 30 and 88 are from NFV independent group.All of these independence structures remain the same for double-drug resistance.However, we did not see the conditional independence of 46, 54 and 82 (which was observed in IDV drug resistance) in IDV-NFV treatment.In NFV single-drug treatment, 90 is in the group of 20, but in IDV-NFV double-drug treatment, 20 is not even detected by BVP (Figure 4).So again comparing with our conjecture from single-drug profiles, the group of 10 and 71 from IDV disappeared.
As for IDV-SQV data, we observed similar results (Figure S1): most of the positions detected are from IDV or SQV single-drug resistance, and two independent groups (46, 54, 82, and 73, 90) are also the same as the ones from IDV single-drug resistance.However, due to the small sample size (143 sequences), we were not able to infer the details of each group.20, 36, 61, 63 are not in single-drug profiles  of IDV or SQV, but in double-drug treatment profile of IDV-SQV.10, 22, 23, 48, 79 and 84 are in single-drug profiles but not double-drug profile.For NFV-SQV (Figure S2), again the sample size is too small (96 sequences), so the posterior probabilities shown in Figure S2a are very moderate, thus the details of the dependence structure cannot be inferred (Figure S2b).

Interactions for triple-drug resistance of IDV, NFV and SQV
We also studied triple-drug resistance of the three drugs by applying BVP and RMS to 201 sequences from three PI treated patients (note that this sample size is even larger than double-drug treatments of IDV-SQV and NFV-SQV).Figure 3 shows the associated positions and the detailed dependent structure.The results are very similar to double-drug IDV-NFV resistance (Figure 4) except that 46 is observed again.However, only 8 positions are involved in resistant interactions, much smaller than what we conjectured from single-drug profiles:18.It seems that only the ones repetitively detected in singledrug, double-drug profiles are detected in triple-drug profile.In Figure 3b, three independent interaction groups were inferred: 30 and 88; 73 and 90; 24, 46, 54 and 82.The detailed structure inside the last group (24, 46, 54 and 82) was ambiguous due to the small sample size.These interaction groups are also showing up repetitively in single-and double-drug profiles.

Multiple-drug treatment effect
Observing the results of double-drug treatments and triple-drug treatment, we see they are very different from what we conjectured from single-drug resistance profiles.In double-drug IDV-NFV treatment profile, there are only 7 mutation positions (Figure 4) in interactions, not 13 as conjectured from single-drug treatments of IDV and NFV.In triple-drug IDV-NFV-SQV treatment profile, only 8 are in Figure 3, not 18, as conjectured from single-drug profiles.Thus in multiple-drug treatment, the resistance profile is not simply additive from single-drug profiles.There are some multi-drug treatment effects that select mutation interactions from single-drug profiles.
In order to figure out what is this selection effect, we further did Molecular Dynamic (MD) Simulations and Free Energy Calculations of two interaction groups: 54 and 82; 30 and 88 (since it is well-known that the resistance caused by 73 and 90 cannot be explained by the binding free energy analysis [13], so we did not do MD simulation or free energy analysis on this group).Both are repetitively detected in single-, double-and triple-drug resistance profiles.We hope to figure out what makes them selected in multiple-drug profiles from single-drug profile.

Molecular Basis of Interacting Mutations Revealed by MD Simulations and Free Energy Calculations
From above statistical analysis we can see the three independent interaction groups: 54 and 82; 30 and 88; 73 and 90, repeatedly being detected in single-, double-and triple-drug resistance.To further investigate the molecular implications of the mutation interactions within the three groups across three different PIs, we conducted MD simulations to analyze the binding free energies of the protease/ IDV/NFV/SQV complexes.The results are shown in Table1, from which we can see that the interaction effects of 54V/82A in all of the three drugs are very strong but in different directions: 54V has individual resistance in neither IDV nor NFV but it amplifies the one from 82A in IDV while neutralizes 82A in NFV; in SQV 54V does have some intermediate resistance but 82A does not (even more susceptible than the wild type), so 82A actually neutralizes 54V for 54V/82A.Because the interaction effects are strong in all the drugs, the interaction group of 54 and 82 are detected in all single-, doubleand triple-drug treatments.Figure S5 shows the difference between the drug-protease residue interactions of the wild-type complex and those of the L82A mutated complex for (a) IDV, (b) NFV and (c) SQV. Figure S6 shows the structure alignment of the wild-type and the L82A mutated complexes of IDV and the structural distributions of eight important residues labeled in Figure S5a.
The interaction group of 30 and 88 behaves very differently from 54 and 82, since there are two possible mutations at 88: 88D or 88S.The interaction effect is most evident when we compare 30N/88D with 30N/88S in NFV: neither 30N nor 88S has individual resistance to NFV, but 88D does; more importantly, 30N/88D has intermediate resistance but 30N/88S is more susceptible to NFV than the wild type.This strong interaction effect is the main reason that 30 and 88 are detected in any single-, double-and triple-drug treatment where NFV is involved.
Figure 5 shows the difference between the drug-protease residue interactions of D30N/N88S and those of D30N/N88D for (a) IDV, (b) SQV and (c) NFV.It is quite clear that for IDV, there is very minimum difference; for SQV, some intermediate level of difference were observed; however, for NFV, very strong difference were observed at positions 50 and 50'.This observation is consistent with our statistical inference for IDV, SQV and NFV. Figure 6 shows the structural distributions of important residues labeled in Figure 5b for SQV and Figure 5c for NFV.The D30N/N88S HIV-1 protease mutant is shown as blue strand and SQV or NFV is shown as green stick.The favorable residues to the binding of SQV or NFV to the D30N/N88S mutant are shown as the yellow CPK model, those of the unfavorable residues are the red CPK model, and the residues at positions 30 and 88 are the green CPK model.In both cases of SQV (Figure 6a) and NFV (Figure 6b), the distributions of favorable and unfavorable residues are very unbalanced.The results suggest that if the interaction effect is strong to one of the drugs, and intermediate or weak to some other drugs in the same treatment, it has more chance to be selected in multiple-drug treatment, like 30 and 88 in NFV.
But the drug-resistant effects in a multiple-drug treatment are not simply additive (independent), thus the state of art in cross-resistance based on single-drug resistance profiles (which ignores the epistasis effect in multiple-drug treatment) is very inaccurate, and multiple-drug resistance profile should be built based on multiple-drug treatment data equipped with advanced statistical inference.

Discussion
Utilizing the pioneering method proposed by Zhang [13], which integrates Bayesian statistical modeling with molecular dynamic   simulations, we were able to detect and analyze, beyond the singledrug level, the complex interactions of drug resistance mutations and cross-mutations of the HIV-1 protease and reverse transcriptase.Most significantly, we applied the said method to all possible combinations of single-, double-, and triple-drug treatment data on IDV, NFV and SQV from the Stanford HIV-1db database to study the problem of cross-resistance, revealing the selection pattern of the puzzled problem of multiple-drug treatment effects.We find that in general, the interactions effect is likely to be strengthened when it has strong interaction with at least one other drug in the same treatment.Hence it seems that simultaneous multi-drug treatment is more effective than sequential multi-drug treatment.
There are, admittedly, many complexities not addressed in this paper.For example, the HIV data from Stanford HIV db are collected from a global sample and thus multiple subpopulations in both treated and untreated population are possible, which may render our inference biased.Another issue may lie in the quasi-species nature of HIV-1.The HIV-1 population within an individual consists of innumerable variants and minor variants that often go undetected and hence there exists the possibility that our data underrepresent those variants.Furthermore, HIV-1 drug resistance can be not only acquired (developing in a person receiving antiretroviral treatment) but also transmitted (occurring because a virus with drug-resistance mutations was transmitted to a drug-naive person).In recent years, the transmitted resistance occurrence has been increasing due to scaled-up antiretroviral treatments.In Europe, North America, and Brazil, it has been reported that the prevalence of drug resistance ranges from 5-15% in newly diagnosed individuals.Because our untreated sequence data were collected from 1982 to 2005, it is possible that there are several transmitted drug resistant sequences in the untreated group that may affect both the sensitivity of our BVP algorithm and the power of our Bayesian model structure inference method.Because there are many antiretroviral drugs, cross-resistance is a severe and practical problem.Despite all the possibilities that may emerge, this study has verified the original findings of HIV drug resistance and demonstrated the long-puzzled selection pattern of HIV multiple drug treatment effects.We are positive that the method and results presented here will enlighten new and more accurate ways to decipher the myths behind drug resistance of HIV and other related diseases.

Figure 1 .
Figure 1.The posterior probabilities of each position of mutation to be associated interactively with IDV, NFV and SQV treatments respectively (applying BVP to single-drug treated datasets).Red circles show the positions interactively associated with all of the three drugs (i.e.54, 82, 90).Blue circles show the positions interactively associated with IDV and SQV, but not NFV (i.e. 10, 71 and 73).Green circles show the positions interactively associated with NFV and SQV, but not IDV (i.e.88).

Figure 2 .
Figure 2. The inference of detailed dependence structure for IDV (a), NFV (b) and SQV (c) using RMS.The result of IDV was reported in Zhang[13].For NFV (b), three independent interaction groups were inferred with high posterior probabilities (>0.9): 54 and 82, 20 and 90, 30 and 88.For SQV (c), two independent interation groups were inferred with high posterior probability (>0.9): 48, 54 and 82; 71, 73, 79 and 90, although other details of the structure were not able to inferred with high confidence.

Figure. 3
Figure.3The posterior probabilities (a) of each position of mutation to be associated interactively with triple-drug treatment (IDV, NFV and SQV, applying BVP to triple-drug treated dataset) and the inferred detail dependence structure (b) among interacting positions (applying RMS to triple-drug treated dataset).In (a), totally eight positions are interactively associated: 24, 30, 46, 54, 73, 82, 88 and 90.In (b), three independent interaction groups were inferred: 30 and 88; 73 and 90; 24, 46, 54 and 82.The detailed structure inside the last group was ambiguous.

Figure S2 .
Figure S2.The posterior probabilities (a) of each position of mutation to be associated interactively with double-drug treatment (NFV and SQV, using BVP) and the inferred detail dependence structure (b) among interacting positions (using RMS).In (a), totally ten positions are interactively associated: 10, 30, 48, 54, 73, 74, 82, 84, 88 and 90.In (b), the details of the dependence structure were ambiguous.

Figure 5 .
Figure 5.The difference between the drug-protease residue interactions of D30N/N88S and those of D30N/N88D for (a) IDV, (b) SQV and (c) NFV.The residues with absolute value larger than 1.00 kcal/mol are labeled.(ΔΔG equals to the interactions between drug and the D30N/N88S protease mutant minus those between drug and the D30N/N88D protease mutant).

Figure 6 .
Figure 6.Structural distributions of important residues labeled in (a) Figure 5b and (b) Figure 5c.The D30N/N88S HIV-1 protease mutant is shown as blue strand and SQV (a) or NFV (b) is shown as green stick.The favorable residues to the binding of SQV to the D30N/N88S mutant are shown as the yellow CPK model, those of the unfavorable residues as the red CPK model, and the residues at positions 30 and 88 as the green CPK mode.

Figure
FigureS4shows the difference between the drug-protease residue interactions of the wild-type complex and those of the N88S mutated complex for (a) IDV, (b) NFV and (c) SQV.More positions are affected in SQV compared with IDV and NFV by N88S single mutation.The results suggest that if the interaction effect is strong to one of the drugs, and intermediate or weak to some other drugs in the same treatment, it has more chance to be selected in multiple-drug treatment, like 30 and 88 in NFV.But the drug-resistant effects in a multiple-drug treatment are not simply additive (independent), thus the state of art in cross-resistance based on single-drug resistance profiles (which ignores the epistasis effect in multiple-drug treatment) is very inaccurate, and multiple-drug resistance profile should be built based on multiple-drug treatment data equipped with advanced statistical inference.

Figure S5 .
Figure S5.The difference between the drug-protease residue interactions of the wild-type complex and those of the L82A mutated complex for (a) IDV, (b) NFV and (c) SQV.The residues with absolute value larger than 1.00 kcal/mol are labeled.(∆G difference equals to the interactions between drug and the wile-type protease minus those between drug and the mutant).

Figure S6 .
Figure S6.Structure alignment of the wild-type and the L82A mutated complexes of IDV and the structural distributions of eight important residues labeled in Figure S5a.The wild-type complex is shown as red strand, the mutated complex as green strand and IDV is shown as red or green stick.The favorable residues to the binding of IDV with the wild-type protease are shown as the red CPK model, and the unfavorable residue as the green CPK model.The structures are averaged from0.5 to 3 ns.The root mean square deviation (RMSD) of the main-chain atoms for the two averaged structures after structure alignment is 0.97 Å.

Figure S4 .
Figure S4.The difference between the drug-protease residue interactions of the wild-type complex and those of the N88S mutated complex for (a) IDV, (b) NFV and (c) SQV.The residues with absolute value larger than 1.00 kcal/mol are labeled.(∆G difference equals to the interactions between drug and the wile-type protease minus those between drug and the mutant).