Protein structural disorder of the envelope V3 loop contributes to the switch in human immunodeficiency virus type 1 cell tropism

Human immunodeficiency virus type 1 (HIV-1) envelope gp120 is partly an intrinsically disordered (unstructured/disordered) protein as it contains regions that do not fold into well-defined protein structures. These disordered regions play important roles in HIV’s life cycle, particularly, V3 loop-dependent cell entry, which determines how the virus uses two coreceptors on immune cells, the chemokine receptors CCR5 (R5), CXCR4 (X4) or both (R5X4 virus). Most infecting HIV-1 variants utilise CCR5, while a switch to CXCR4-use occurs in the majority of infections. Why does this ‘rewiring’ event occur in HIV-1 infected patients? As changes in the charge of the V3 loop are associated with this receptor switch and it has been suggested that charged residues promote structure disorder, we hypothesise that the intrinsic disorder of the V3 loop is permissive to sequence variation thus contributing to the switch in cell tropism. To test this we use three independent data sets of gp120 to analyse V3 loop disorder. We find that the V3 loop of X4 virus has significantly higher intrinsic disorder tendency than R5 and R5X4 virus, while R5X4 virus has the lowest. These results indicate that structural disorder plays an important role in HIV-1 cell tropism and CXCR4 binding. We discuss the potential evolutionary mechanisms leading to the fixation of disorder promoting mutations and the adaptive potential of protein structural disorder in viral host adaptation.


Introduction
HIV-1 cell entry relies primarily on the binding of viral envelope protein, gp120, to two host immune cell membrane proteins, namely, the CD4 receptor and the coreceptor CCR5 or CXCR4 [1]. Virus that solely uses CCR5 as coreceptor is termed R5 tropic, whereas exclusive CXCR4 using virus is termed X4 tropic, while virus that can use both coreceptors for cell entry is termed dual tropic (R5X4). The switch from CCR5 to CXCR4 is correlated with disease progression and pathogenesis [2,3]. Interestingly, the CCR5 receptor is not expressed in individuals that present the CCR5 delta-32 mutation [4]. This mutation confers natural resistance to a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 R5 tropic HIV and non-expression of this receptor appears not to have any significant affect. Based on these observations, the small molecule CCR5 antagonist maraviroc (MVC) was developed to inhibit the CCR5 receptor, thereby blocking virus cell entry [5] and helping to control virus infection.
In all documented cases of MVC virologic failure due to an observed tropism switch, a minority pre-existing X4 using viral population was present prior to therapy, which is 'unmasked' by the use of an entry-inhibitor [6]. This indicates that X4 populations may commonly be present, but only transiently dominant, in the life time of an infection. Indeed, in the now classic longitudinal study of Shankarrapa et al. [7], this is readily apparent (see Meehan et al. for visualisations [8]), indicating that the true rates of X4 using virus are much higher than the commonly reported figure of 50% of patients observed to progress to X4 using virus, as confirmed by recent studies [3,9].
Investigating the HIV-1 coreceptor switch is therefore central to understanding HIV-1 infection and has clinical implications for the use of CCR5 entry inhibitors. It is generally accepted that the V3 loop of gp120 plays a primary role in determining HIV-1 cell tropism and coreceptor specificity [10]. However, our understanding of the mechanisms of HIV-1 tropism remains incomplete [2, 3]. Charged residues in the V3 loop have been shown to affect tropism [11,12]. Moreover, N-linked glycosylation in the vicinity of the V3 loop has been shown to affect viral tropism[13], e.g., the lack of N-linked glycosylation sites is associated with X4 phenotypes [13][14][15][16][17][18]. Because N-linked glycosylation plays a role in protein folding and stability [19][20][21][22], lack of N-linked glycosylation may decrease V3 loop stability and therefore contribute to X4 using phenotypes. While these changes may collectively lead to the tropism switch, why does this occur in the majority of HIV-1 infections? How can random sequence evolution, given it's blind nature, result in a predictable outcome? Could changes in the disorder of gp120 V3 loop play a role in rewiring protein-protein interactions (from CCR5 to CXCR4) or be resulting in a different structure in the context of trimeric Env that results in the change in tropism?
The classical structure-function paradigm states that protein function requires a welldefined three-dimensional structure [23,24]. However, it has become clear that many functionally important proteins do not have well-defined structures (termed intrinsically disordered proteins), or have protein regions that lack structures (proteins with disordered regions) [23,[25][26][27][28]. In some cases the intrinsically disordered regions can form structures after binding to other molecules. HIV-1's gp120 has five highly variable (V1-V5) and five conserved (C1-C5) protein regions [29]. The variable regions are normally missing in the solved X-ray crystal structures (V1/V2, V3, V4 and V5) of gp120 unless co-crystallized with other binding partners, presumably because they are intrinsically disordered. Disordered proteins or protein regions have been observed to be "sticky" and aggregate potentially forming promiscuous molecular interactions [25,30,31]. Disordered regions are frequently found to undergo posttranslational modifications, such as phosphorylation or N-linked glycosylation [24,32]. In addition, amino acid residues in proteins that give rise to structural disorder tend to be polar and charged [24,25]. Given the disordered nature of HIV-1's V3 loop, the documented role of charge and glycosylation in the tropism switch, we hypothesize that intrinsic disorder of the V3 loop is permissive to the variation that can lead to a change coreceptor usage and thus switches in cell tropism.

Data and tropism prediction
We retrieved three data sets: (i) Full-length envelope protein (Env) sequences of HIV-1 subtype B with tropism information (R5, R5X4 or X4) from the Los Alamos National Laboratory HIV sequence database (http://www.hiv.lanl.gov/content/index). These sequences have annotated coreceptor usage based on biological data only, which is not from sequence-based bioinformatics prediction (http://www.hiv.lanl.gov/components/sequence/HIV/search/help. html#coreceptor). And, two intrapatient data sets: (ii) A dataset containing C2 to V5 regions of Env, as used by Shankarrapa [7]. (iii) A dataset consisting of V1 to V3 regions of Env, as used by Mild [16,33]. We predicted tropism using Geno2pheno[454] with default parameters, and a false positive rate (FPR) cutoff of 5% was used to class sequences as CXCR4-using ('true') or R5 tropic ('false') [34]. Note that Geno2pheno does not distinguish between X4 and R5X4 viruses, this means that sequences classed as true could be either X4-or dual-tropic, which has important implications when interpreting the statistical test results. All sequences used in this study with tropism annotation are in S1 File.
Comparison of protein disorder in the V3 loop of CCR5, CXCR4 and CCR5-CXCR4 tropic viruses Bioinformatics methods for charactering probable disordered regions in amino acid sequences generally fall into two categories: (1) machine-learning methods trained on missing (presumed to be disordered) regions of experimentally solved X-ray crystallographic structures and these trained models used to predict disordered regions in protein sequences or on mobile regions characterised by nuclear magnetic resonance (NMR) [35,36], or physiochemical properties and pairwise amino acid interaction energies can be calculated to determine likely disordered regions of sequences (2). The former approach is prone to errors in the solved protein structures deposited in Protein Data Bank (http://www.rcsb.org/) [37]. We therefore opted to use several methods, IUPred [38], DISOPRED3 [36], PONDR VL-XT [39] and PONDR VSL2 [40], to predict differences in the intrinsic disorder regions between CCR5 and CXCR4 using virus in our datasets, namely R5, R5X4 and X4 envelope sequences.
For each amino acid residue in a protein sequence, IUpred and other methods will report a disorder score from 0 to 1 ranging from complete order (0) to complete disorder (1). We use a cut-off 0.4 to indicate structural disorder (> = 0.4) [41]. We first performed disorder prediction on the full available envelope sequences in our three datasets, and then extracted the V3 region to analyze disorder within the V3 loop. We then group the extracted scores into two groups (CCR5-using and CXCR4-using) according to tropism prediction by Geno2pheno, and compare these statistically using a nonparametric method in the R statistical package [42] (two sample comparisons: nonparametric Behrens-Fisher problem in paired data, http://cran.r project.org/web/packages/nparcomp/index.html). The R scripts for the nonparametric test and V3 loop disorder scores of all patients are in S2 File. We performed three statistical tests. First, the null hypothesis is H 0 :P(a,b) = 1/2 (b tends to be similar to a); the alternative hypothesis is H a :P(a,b) < 1/2 or P(a,b) > 1/2 (b tends not to be similar to a). Second, the null hypothesis is H 0 :P(a,b) ! 1/2 (b tends to be greater than or similar to a); the alternative hypothesis is H a :P(a,b) < 1/2 (b tends to be less than a). Finally, the null hypothesis is H 0 :P(a,b) 1/2 (b tends to be less than or similar to a); the alternative hypothesis is H a :P(a,b) > 1/2 (b tends to be greater than a).

Results
First, we investigated whether there were statistically significant differences in structural disorder associated with coreceptor usage. We calculated the disorder tendency of amino acids in the full-length consensus envelope protein sequences (gp120 and gp41) of R5, R5X4 and X4 tropic viruses by four different methods, respectively. This allowed us to obtain an overview of the extent of protein disorder in HIV-1 envelope proteins by tropism with different methods.
The predicted disorder tendency from the four methods generally coincides with regions missing from experimentally solved crystallized protein structures of gp120 [43] and gp41 [44] (Fig  1). The V3 loop may be folded in the trimeric Env, however, structural studies of the Env trimers demonstrated that the variable regions are indeed structurally flexible (e.g., V1-V3) [45][46][47][48][49]. For example, the electron densities of the longer intrinsically disordered regions such as V1/V2, V4 and V3 are difficult to obtain monomerically without co-crystallization with a stabilizing binding partner. There are several noticeable differences between the predictions of the four methods. First, there are over-predictions of disorder in either N-and/or C-terminus by DISOPRED3, PONDR VL-XT and PONDR VSL2 except in IUPred as noticed before [41]. Second, strikingly, PONDR VL-XT gives direct support of our hypothesis that X4 has the highest disorder propensity of the V3 loop (Fig 1C and 1E) than R5/R5X4 V3 loop. Finally, as IUPred specifically offers an option to predict long disordered regions (>30 amino acids) and there is no over-prediction of the termini (Fig 1) [41], we therefore decided to use disorder tendency scores predicted by IUPred for subsequent statistical analysis.
Next, we focus on the V3 loop to examine how structural disorder changes between coreceptor switches. We found that the disorder tendency of amino acids in the V3 loop of X4 virus was significantly higher than in R5 virus (p = 2.35 × 10 −3 , Fig 2) and R5X4 virus (p = 1.87 × 10 −16 , Fig 2), respectively. Interestingly, there was significantly higher structural disorder in R5 virus compared to R5X4 virus (p = 7.49 × 10 −6 , Fig 2). Thus, dual tropic R5X4 virus has the lowest V3 loop structural disorder tendency, compared to R5 and X4 virus. Collectively, these results suggest that the V3 domain of X4 virus has significantly greater structural disorder tendency than that of R5 and R5X4 virus during coreceptor switch. Dual tropic R5X4 virus may therefore have a less flexible V3 domain than R5 and X4 virus, which may be more prone to neutralizing antibody binding. This is consistent with a previous study that showed dual tropic virus had lower fitness, and was therefore more sensitive to antibody inhibitors and neutralization [50].
To further test our hypothesis, we analyzed two data sets dataset also had experimentally verified coreceptor calls. In the Shankarappa dataset, we first predicted coreceptor usage for all sequences at all visits across all nine patients (patient 1, 2, 3, 5, 6 7, 8, 9 and 11). We compared the structural disorder tendency between X4-capable and R5 virus at each visit that has coreceptor switch events in eight patients (patient 1, 2, 3, 5, 7, 8, 9 and 11). We showed the results for two patients (patient 2 and 9) that are representative of all eight patients, as suggested in Shankarappa et al. study (results for patient 1, 3, 5, 6, 7, 8 and 11 are reported in S1 Fig). In patient 2 we found that X4-capable V3 loop had significantly higher disorder tendency than R5 using V3 loop at visits 7 (p<0.001), 10 (p<0.05), 12 (p<0.001), 13 (p<0.001), 14 (p<0.05), 15 (p<0.001), 17 (p<0.001), 19 (p<0.001) and 23 (p<0.001) except at visit 16 (Fig 3 patient 2). At visit 16 the R5 using V3 loop has significantly higher disorder tendency than the X4-capable V3 loop (the mean disorder tendency of the R5 using V3 loop is also higher than the X4-capable V3 loop), which may indicate the predicted X4-capable virus is actually dual tropic R5X4. For patient 9, it was only at visit 20 that the disorder tendency of the X4-cabable V3 loop is significantly higher than that of the R5 using V3 loop (p<0.001 Fig 3  patient 9). At both visits 3 and 21 the R5 using V3 loop disorder tendency is significantly higher than the X4-cabable V3 loop indicating the predicted tropism may be dual tropic R5X4. However, at both visits 17 and 22 there is no significant difference of disorder tendency during corecetor switch (p>0.05 Fig 3 patient 9), which may suggest the prediction of tropism is false positive and they have the same tropism.
In the Mild et al. data we first predicted the coreceptor usage for all sequences at each time point within patients. Patients who are predicted to have coreceptor switch events are consistent with the experiment verification except 25 months, 45 months, 10 months in patient 2239, 2242 and 2282, respectively (patient 2239, 2242 and 2282 are included in our study while patient 1865 was omitted due to dual tropic R3X4 virus). We then calculated the disorder tendency in R5 and X4 virus (Fig 4), respectively. We then compared the disorder tendency between R5 and X4 virus at each time point of all three patients that have coreceptor switch events. In patient 2239 we find that there are three time points that have coreceptor switch events (time points 25 months, 68 months and 88 months). At two of the three time points the X4-capable V3 loop has significantly higher disorder tendency than the R5 using V3 loop Disorder tendency for each amino acid is predicted by four prediction methods (0 represents complete order; 1 represents complete disorder; scores over 0.4 represent intrinsically disordered). Disorder tendency scores are plotted for consensus R5, R5X4 and X4 envelope sequence for DISOPRED3 (A), IUPred (B), PONDR VL-XT (C) and PONDR VSL2 (D), respectively. (E) The consensus protein sequences of R5, R5X4 and X4 are aligned to HXB2 envelope sequence (gp120 and gp41) for indicating the structure locations of the disordered residues (adapted from Jiang et al. 2015 [61]). Gp120 and gp41 protein domains are numbered and color-coded for visualisation. Note that here the length of the Env sequence is longer than the normal one because gaps are introduced in the sequence alignment and dashed lines are used for linking sites with missing prediction scores in the gaps.  Fig 4 patient 2242). In patient 2282 we find 5 time points that have coreceptor switch events (time points 10 months, 47 months, 62 months, 63 months and 70 months , Fig 4 patient 2282). Interestingly, all X4 V3 loops have significantly higher structural disorder tendency than R5 using V3 loops (p<0.001, Fig 4 patient 2282). These results are compatible with our hypothesis: Firstly the X4-capable virus had significantly higher structural disorder than the R5 tropic virus (time points 25 months and 88 months in patient 2239; time points 10 months, 47 months, 62 months 63 months and 70 months in patient 2282, Fig 4). Secondly, the structural disorder is not significantly different between R5 and X4 tropic virus (time points 45 months, 84 months and 85 months in patient 2242, Fig 4). This is not consistent with our hypothesis indicating the predicted tropism is due to a false prediction and therefore we may compare viruses with the same tropism. Moreover, the mean disorder tendency in each comparison was also consistent   Envelope V3 loop structural disorder contributes to HIV-1 tropism

Discussion
In this study we have used a nonparametric statistical approach to investigate if there are distinct patterns of predicted structural disorder in the V3 domain of HIV-1 envelope protein between R5, R5X4 and X4 tropic viruses. Interestingly, there seems to be an increase in the predicted structural disorder tendency of V3 domain from R5/R5X4 tropic virus to X4 tropic virus, which indicates that structural disorder (flexibility) of the V3 loop plays a role in HIV-1 cell tropism. Our disorder prediction of gp120 and gp41 generally agrees with previous studies [51]. The difference in disorder propensity of the V3 loop for preferential coreceptor usage (CCR5 and/or CXCR4) with the binding of gp120 to CD4 offers a new model to understand HIV-1 tropism and coreceptor switch. The flexibility of V3 loop and other variable regions (e.g., V1 and V2) are well known. The "snapshots" of the V3 loop presented in crystal or cryoEM structures are not at their functional state as there is no experimental gp120-CCR5/ CXCR4 complex available with the V3 loop present. Moreover, in order to get either the crystal or the cryo-electron microscopy (cryoEM) structures particular chemicals and/or antibodies are added before and/or after the purification of the protein for structural determination (see the "Methods" or "Experimental procedures" section in previous studies such as [44]). Thus, the appearance of a predicted disordered region in a structure does not necessarily mean it is not flexible or disordered as it can be stabilised by added chemicals and/or antibodies, and it also does not necessarily mean the "snapshot" of the disordered region is in its native functional state. The flexibility of the V3 loop in trimeric Env is also demonstrated in several structural and molecular dynamic simulation studies [45][46][47][48][49]. Even if the V3 loop is structured in the context of the trimer it is this context that is giving the structure.
Our statistical comparisons of the predicted disorder tendency between disordered V3 loops capable of using different chemokine receptors (CCR5 and/or CXCR4) are consistent with our hypothesis that 1) the V3 loop of X4 using virus has a significantly higher disorder tendency than that of R5 virus; and 2) the V3 loop of X4 and R5 virus has significantly higher disorder than the dual tropic (R5X4) virus. As protein disorder is a source of novel proteinprotein interactions [23,24,30,37,52], such a significant shift of protein disorder tendency is presumably contributing to rewiring of protein-protein interactions [52].
In terms of a mechanism for the switch, intrinsically disordered regions are relatively unconstrained resulting in the accumulation of neutral or nearly neutral residue changes in line with the neutral theory of molecular evolution [53,54]. Random changes may lead by chance to some affinity for the CXCR4 coreceptor, i.e., dual tropic intermediate viruses [50] which selection may then act upon resulting in X4 virus. Given the main receptor used in recognition by HIV-1 is CD4, switching between the spatially proximal (and paralogous) coreceptor molecules CCR5 and CXCR4 on the different cell types could be relatively straightforward due to the unconstrained nature of disordered regions.
The higher disorder tendency of the X4 virus V3 loop suggests it may be 'stickier' and able to use CXCR4 coreceptor more efficiently and therefore cause further infection of immune cells that express CXCR4. Or alternatively it forms a different structure in the context of the trimeric Env and this lead to the switch in tropism. On the other hand, the increased disorder tendency of the V3 loop may make the virus more promiscuous in binding to immune cell coreceptors for cell entry (e.g., not just CXCR4) [30,55].
We speculate that that structural disorder of the V3 loop contributes to function by being stabilized by either the context of the Env trimer and/or secondary interactions leading to coreceptor switch, which could be another protein (e.g., CXCR4 or other chemokine receptors [55]) or posttranslational modifications such as N-linked glycosylations as suggested in other studies [13][14][15][16][17][18][19][20]22]. Evidence from several previous studies supports our hypothesis. First, non-switching viral populations (remain R5 using) have increased N-linked glycosylation sites, which has a stabilization role [17,[19][20][21][22]. Second, molecular-dynamics simulation study also suggests that X4 binding stabilizes the V3 loop [56]. Third, many previous studies demonstrate that increased charge in the V3 loop play a significant role in coreceptor switch [13, 16,17,57,58]. It is well known that charged residues promote structural disorder [24], which leads to increased disorder in X4 using V3 loop. However, if the V3 loop with increased disorder can be stabilized by increased N-glycans or a combination of N-glycans and other secondary interactions the virus could remain R5 tropic [13][14][15][16][17][18]22]. In conclusion, understanding the evolutionary mechanisms that lead to the fixation of virus protein structural disorder may hold the key to understand viral host adaptation and develop novel intervention strategies. The prediction of intrinsic disorder can be informative in determining that a region may be more likely to form protein-protein interactions with other proteins or under post-translational modifications (PTM) [59], such as glycosylation, phosphorylation, disulfide-bond formation and many others [60]. So disorder analysis can be a useful approach to identify candidate regions for PTM and protein-protein interaction [59]. This also helps to explain why a particular N-linked glycosylation may be important to determine CCR5 tropism [13]. Nevertheless, we caution that further experimental studies must be used to confirm our hypothesis. The HIV-1 structural disorder mediated coreceptor switch provides a unique model to study change in specificity of a protein-protein interaction, which provides a mechanistic understanding of HIV-1 cell tropism. Future research aiming at understanding viral host adaptation could benefit from elucidating the molecular evolutionary mechanisms that lead to the fixation of structural disorder promoting mutations in the virus-host protein-protein interaction networks, as these mutations may be key to host adaptations and cross-species transmission.