Effects of the Q80K Polymorphism on the Physicochemical Properties of Hepatitis C Virus Subtype 1a NS3 Protease

Hepatitis C virus genotype 1a (HCV-1a) comprises clades I and II. The Q80K polymorphism is found predominantly in clade I but rarely in clade II. Here, we investigated whether natural polymorphisms in HCV-1a clade II entailed structural protein changes when occurrence of the Q80K variant was simulated. Based on HCV-1a clade I and II protein sequences, the structure of the HCV-1a Q80K mutant NS3-4A was obtained by comparative modeling. Its physicochemical properties were studied by molecular dynamics simulations and network analysis. Results demonstrate that, in the presence of the K80 variant, clade II protease polymorphisms A91 and S/G174 led to variations in hydrogen bond occupancies. Structural analyses revealed differences in (i) flexibility of the H57 catalytic residue on the NS3 protease and (ii) correlations between amino acids on the NS3 protease and the NS4A cofactor. The latter indicated possible destabilization of interactions, resulting in increased separation of these proteins. The present findings describe how the relationships between different HCV-1a NS3 protease amino acid residues could affect the appearance of viral variants and the existence of distinct genetic barriers to HCV-1a isolates.


Introduction
Hepatitis C virus (HCV) represents a global public health threat due to its high rate of evolution to chronic infection and the lack of a vaccine to prevent infections [1]. HCV is classified into seven major genotypes and multiple subtypes [2], of which HCV genotype 1 (HCV-1) is the most prevalent worldwide. Effective therapeutic drugs directly targeting HCV proteins, the so-called direct-acting antivirals (DAAs), have been the subject of intense research in the last decade. As the HCV NS3-4A protease is a critical component of the viral replicase complex, several protease inhibitors have been developed for use in clinical practice [3][4][5][6].
HCV displays high genetic heterogeneity and great sequence variability, contributing to the existence of natural polymorphisms. HCV is present within patients as a quasispecies-variants that are closely related but heterogeneous [7]. Variability of the HCV genome poses a challenge for DAAs, because drug resistance represents the leading cause for the failure of antiviral therapy against HCV infections. The Q80K polymorphism is found predominantly in HCV genotype 1a (HCV-1a) sequences and has been associated with reduced inhibitor activity of not only simeprevir [8], but also other protease inhibitors in both in vitro and in vivo experiments [5,9]. Additionally, this polymorphism could facilitate the emergence of resistance-associated substitutions (RAS) conferring the virus increased resistance to protease inhibitors or other DAAs [9,10].
Based on 282 statistically significant phylogenetic informative sites in the HCV genome, HCV-1a has been differentiated into two clades, I and II [11]. Several sites responsible for this distinction are located near or within codons associated with resistance to protease inhibitors, which would increase the possibility of one of these HCV-1a clades being more likely to develop resistance to DAA inhibitors. This same pattern of segregation was observed in HCV NS3 protease domain sequences containing 19 phylogenetic HCV-1a clade informative sites, whereby the Q80K polymorphism was detected in most HCV-1a clade I sequences but was rarely present in clade II [12]. The two clades have a different geographic distribution, with clade I accounting for 75% of the US sequences but less than half of the European ones. Consequently, the prevalence of the Q80K polymorphism is higher in the US than in Europe [13]. The vast majority of Brazilian HCV-1a sequences are grouped in clade I, mostly as a separate group of related sequences with a high supporting value (subclade IC). However, the prevalence of the K80 polymorphism in this population is very low (≤3%) within this subclade [12,14]. Accordingly, the presence of polymorphic sites in HCV genotypes, subtypes, and even in HCV-1a clades may be critical for the different Q80K frequencies observed in HCV NS3 sequences.
This study applied molecular dynamics (MD) simulations and network analysis to investigate whether the distinct natural polymorphisms in HCV-1a sequences caused structural changes in HCV-1a clade II proteins when the occurrence of the Q80K variant was simulated. Results point to the possible adverse effect of combining the Q80K variant and naturally occurring polymorphisms in clade II on the structural and enzymatic properties of the NS3 protease. Thus, polymorphisms could pose genetic barriers to the appearance of viral HCV-1a isolates, explaining why Q80K variants are not detected in HCV-1a clade II sequences.

Sequence Analysis and Statistical Treatment
A total of 120 HCV-1a clades I and II reference sequences per clade were extracted from Pickett et al. [11]. The sequence from the entire NS3 protease clade I domain (181 amino acids) was screened and compared to that of NS3 protease clade II using T-coffee [15]. Sequences were aligned with the Vespa program [16] to determine the subtype-clade-specific amino acid signature pattern at NS3 protease sites.
Based on the HCV-1a clade I and II amino acid signature pattern, only clade I sequences presented high frequencies of K80, S91, and N/S174. To verify if K80+S91+N/S174 polymorphisms were more frequent in the same clade I sequences than based on a random distribution, we estimated the frequencies of K80+S91+N174 and K80+S91+S174 and then compared the results with the frequency of clade I Q80+S91+N174 and Q80+S91+S174 polymorphisms. To this end, we considered 116 sequences, which contained either K or Q in position 80; the remaining four sequences had either G, L, N, or R at this site. A chi-square test was performed to verify the significance of sample size, with p < 0.05 being considered statistically significant. The total sample size of Q80 (44/116, 37.9%) and K80 (72/116, 62.1%) polymorphisms and their frequencies was estimated. A chi-square test was used to compare the observed counts of each category (K80+S91+N174, Q80+S91+N174 and K80+S91+S174, Q80+S91+S174) with their respective expected counts, specified by probability distributions.
As K80 was detected only in NS3 sequences belonging to HCV-1a clade I, we considered such sequences and the ensuing mutant structures' dynamic behavior as reference controls. For HCV-1a clade II sequences, a Q80K alteration was made in clade II GenBank EU256049 and EU239713 sequences prior to molecular modeling. A comparison with HCV-1a clade I controls, determined if this introduction of K80 affected HCV NS3 physicochemical properties.

Comparative Modeling
The BLASTP program was used to search the Protein Data Bank (PDB) [17] and select a template structure to be used in the comparative modeling procedure. A single template, 2f9v [18], was used for all target sequences generated. This specific template was chosen based on identities and coverage between target and template sequences. PDB crystallographic resolution (2.6 Å) and co-crystallization of the NS4A cofactor with the template structure were also taken into account.
The template and target sequences were aligned using the PSI-Coffee mode in T-Coffee. A hundred homology models were generated for each target sequence using the standard "auto model" routine in Modeller version 9.18 [19]. Each model was optimized using the variable target function method until accomplishing 300 iterations. MD optimization was carried out in the slow level mode. The full cycle was repeated twice to generate an optimized conformation of the model. The best resulting modeled structures were selected according to their discrete optimized protein energy (DOPE) score. The models' quality was evaluated using DOPE, ERRAT2 [20], Ramachandran plots, and the QMEAN server [21]. Sequence alignments were rendered using ALINE [22]. Three-dimensional (3D) structure figures were generated using the UCSF Chimera program [23].

Molecular Dynamics (MD) Simulations
MD simulations were carried out using the GROMACS 5.1.2 package [24], and protein interactions were represented using the AMBER99SB ILDN [25] force field. A set of 100-ns production runs was performed to thoroughly investigate the dynamic behavior of HCV-1a NS3-4A protease mutant proteins. Protonation states were assigned using pdb2pqr software, and zinc-coordinating cysteine residues were manually deprotonated to maintain ion coordination. Electrostatic interactions were treated using the particle mesh Ewald algorithm with a cut-off of 10 Å. Each system was simulated under periodic boundary conditions in a triclinic box, whose dimensions were defined automatically based on the most distant atoms from the center of the protein's mass being 1 nm away from the box edges and the box being filled with TIP3P water molecules [26]. All systems were neutralized by adding counterions.
Simulations were performed in three stages: (i) energy minimization, (ii) thermalization and equilibration, and (iii) trajectory production.
Energy minimization was achieved through 3000 steps and a gradient tolerance <1.0 kJ mol −1 of the steepest descent (with and without heavy atom restraints by applying a harmonic potential with a force constant of 1000 kJ mol −1 nm −2 ) and conjugate-gradient algorithms.
In the second phase, starting atomic velocities were assigned to all atoms in the system using a Maxwell-Boltzmann distribution, corresponding to an initial temperature of 20 K. The temperature was gradually increased to 300 K over 500 picoseconds (ps) utilizing the Langevin thermostat. During this stage, all heavy atoms were restrained by applying a harmonic potential with a force constant of 1000 kJ mol −1 nm −2 .
Systems were subsequently equilibrated during four successive equilibration simulations of 750 ps each, corresponding to different force constant values (900, 750, 500, 250 kJ mol −1 nm −2 ) from harmonic restraints. After this period, simulations ran with no restraints for 100 ns. All simulations were performed in the NPT ensemble. The V-rescale thermostat and Berendsen barostat were used for temperature (300 K) and pressure (1 atm) control, respectively. We performed five replicas simulations on the protease systems, each of 100 ns, starting from the same structure and running the simulation independently, yielding a total simulation time of 500 ns.

Trajectory Analysis
Differences in structural stability and dynamic nature of the mutants were investigated by comparing structural protein descriptors. Root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) values were calculated after taking the initial structure from production dynamics as a reference. VMD software was used to calculate distances between atoms and determine hydrogen bond (H-bond) formation using a geometrical criterion. A hit was computed when the distance between the heavy atom acceptor and the hydrogen atom was <2.5 Å and an angle <30 • was formed between the acceptor, hydrogen, and donor atoms.

Correlation Network Analysis
Cross-correlation and network analyses were carried out in R [27] with the Bio3D program and the igraph package [28]. Initially, dynamic cross-correlation matrices (DCCM) were calculated separately for each simulation using as input the corresponding last 70 ns of the MD trajectory superimposed onto the initial structure. DCCM was used for building the residues correlation network. Briefly, graphs were obtained considering Cα atoms as nodes, and the connection between nodes i and j was weighted by the absolute values of cross-correlation (C (i,j) ) coefficients: The community analyses network was built using the Girvan-Newman algorithm [29].

Phylogenetic Analysis
To investigate possible correlations between clade I S91+N/S174 polymorphisms and the K80 phenotype, phylogenetic analysis was performed with the same HCV-1a clade I and II reference sequences used to determine specific amino acid signature patterns. Maximum-likelihood phylogenetic trees were inferred using the PhyML program [30] with the approximate likelihood-ratio test (aLRT) [31] based on a Shimodaira-Hasegawa-like procedure. Specifically, general time reversible was used as substitution model and a subtree pruning and regrafting-based tree search algorithm was used to estimate tree topologies.
From the set of 57 structures returned by the BLASTP program, we chose the HCV NS3 protease domain with NS4A peptide (PDB ID: 2f9v) [18] as template. The template sequence shared 97% identity with clade II sequences and 98% identity with clade I sequences, based on 100% sequence coverage ( Figure 1). An evaluation of HCV NS3-4A protease models is presented in Table 1, and a 3D representation is shown in Figure 2.
97% identity with clade II sequences and 98% identity with clade I sequences, based on 100% sequence coverage ( Figure 1). An evaluation of HCV NS3-4A protease models is presented in Table  1, and a 3D representation is shown in Figure 2.

Assessing Dynamic Properties of the Models in Aqueous Solution
An important aspect of MD simulation analysis is the description of macromolecular motions. The most common mobility measures are the RMSD and RMSF, obtained after aligning the atomic coordinates in each trajectory step to a reference structure. The RMSD is used for the evaluation of differences between molecules. It can be used to calculate the global deviation during the trajectory and evaluate the oscillation of the system during the elapsed time. Uniform RMSD values that remain below a certain limit (this limit depends on the protein) indicate that the structure deviates only marginally from the initial position, whereas abrupt changes in mean values denote important changes in structural conformation.
The RMSF, a measure complementary to the RMSD, allows to obtain information about local structural flexibility. The RMSF calculates the atomic positions throughout the simulation. Its values might reflect only displacements of a small structural subset within an overall rigid structure.

Assessing Dynamic Properties of the Models in Aqueous Solution
An important aspect of MD simulation analysis is the description of macromolecular motions. The most common mobility measures are the RMSD and RMSF, obtained after aligning the atomic coordinates in each trajectory step to a reference structure. The RMSD is used for the evaluation of differences between molecules. It can be used to calculate the global deviation during the trajectory and evaluate the oscillation of the system during the elapsed time. Uniform RMSD values that remain below a certain limit (this limit depends on the protein) indicate that the structure deviates only marginally from the initial position, whereas abrupt changes in mean values denote important changes in structural conformation.
The RMSF, a measure complementary to the RMSD, allows to obtain information about local structural flexibility. The RMSF calculates the atomic positions throughout the simulation. Its values might reflect only displacements of a small structural subset within an overall rigid structure.
According To investigate the effect of the different mutations on residues' dynamic behavior and stability, the RMSF of each HCV-1a mutant protease residue was calculated ( Figure 4). HCV NS3 protease residues 80, 91, and 174 did not present substantial fluctuations around their respective averages of 0.8 Å, 1.19 Å, and 0.58 Å. However, residues 14 and 15, which are part of an alpha helix at the N-terminus of the protease, showed fluctuations >2 Å in the c1mutKSS structure. These residues are located close to the NS4A cofactor, which is probably the reason for the elevated fluctuation values observed for this structure. Higher fluctuations were observed also for residues 89 (2.38 Å) and 90 (2.22 Å) of the c2mutKAS structure, as well as residue 161 of c1mutKSN (2.61 Å) and c1mutKSS (2.33 Å), and residue 181 (3.36 Å) of c2mutKAG. Regarding protease catalytic residues, c2mutKAS and c2mutKAG structures exhibited more flexibility for H57, whereas the c1mutKSN structure displayed high flexibility for S139. These differences in flexibility likely allowed accommodation of the surrounding polymorphisms at sites 80, 91 or 174 and the catalytic residues in these mutated structures. No flexibility differences were observed for the catalytic residue D81 in any of the systems. In the NS4A cofactor protein, residue 182 and the final residues (198-203) of the c2mutKAG structure exhibited fluctuations >3 Å.
Viruses 2019, 11, x FOR PEER REVIEW 8 of 17 located close to the NS4A cofactor, which is probably the reason for the elevated fluctuation values observed for this structure. Higher fluctuations were observed also for residues 89 (2.38 Å) and 90 (2.22 Å) of the c2mutKAS structure, as well as residue 161 of c1mutKSN (2.61 Å) and c1mutKSS (2.33 Å), and residue 181 (3.36 Å) of c2mutKAG. Regarding protease catalytic residues, c2mutKAS and c2mutKAG structures exhibited more flexibility for H57, whereas the c1mutKSN structure displayed high flexibility for S139. These differences in flexibility likely allowed accommodation of the surrounding polymorphisms at sites 80, 91 or 174 and the catalytic residues in these mutated structures. No flexibility differences were observed for the catalytic residue D81 in any of the systems. In the NS4A cofactor protein, residue 182 and the final residues (198-203) of the c2mutKAG structure exhibited fluctuations >3 Å.

Hydrogen Bonding Interactions
To understand the possible effect of the Q80K mutation on HCV-1a clade II proteins' interactions, intramolecular H-bonds were calculated for all systems and compared to those of HCV-1a clade I control structures. H-bond occupancy between atoms from residues involved in the interaction is summarized in Table 2. The presence of the resistant amino acid K80 in the c2mutKAS structure resulted in the formation of nine H-bond interactions between side-chain atoms of catalytic residues H57 and D81 (~96%). Among these H-bonds, H 57 (ND1) − D 81 (CG), H 57 (ND1) − D 81 (OD1), and D 57 (ND1) − D 81 (OD2) presented high occupancy values of 48.99%, 27.41%, and 20.04%, respectively. The same H-bonds were observed in the control c1mutKSS structure, but at much lower occupancies of 10.34%, 6.38%, and 7.64%, respectively. The RMSF of residue 57 in the c2mutKAS structure was also higher (1.00 Å) compared to the control c1mutKSS (0.71 Å). The higher RMSF may have allowed a greater proximity to residue 81, as evidenced by the shorter distance between atoms involved in H-bond interactions (Table 3), and consequently resulted in higher H-bond occupancies.

Hydrogen Bonding Interactions
To understand the possible effect of the Q80K mutation on HCV-1a clade II proteins' interactions, intramolecular H-bonds were calculated for all systems and compared to those of HCV-1a clade I control structures. H-bond occupancy between atoms from residues involved in the interaction is summarized in Table 2. The presence of the resistant amino acid K80 in the c2mutKAS structure resulted in the formation of nine H-bond interactions between side-chain atoms of catalytic residues H57 and D81 (~96%). Among these H-bonds, H 57 (ND1) − D 81 (CG), H 57 (ND1) − D 81 (OD1), and D 57 (ND1) − D 81 (OD2) presented high occupancy values of 48.99%, 27.41%, and 20.04%, respectively. The same H-bonds were observed in the control c1mutKSS structure, but at much lower occupancies of 10.34%, 6.38%, and 7.64%, respectively. The RMSF of residue 57 in the c2mutKAS structure was also higher (1.00 Å) compared to the control c1mutKSS (0.71 Å). The higher RMSF may have allowed a greater proximity to residue 81, as evidenced by the shorter distance between atoms involved in H-bond interactions (Table 3), and consequently resulted in higher H-bond occupancies.  Two H-bond interactions between atoms were formed between the catalytic side-chain atoms of S139 and those of backbone K136. In the c2mutKAG case, the occupancy in one of these interactions, S 139 (OG) − K 136 (O), was substantially lower (6.58%) than in the other structures ( Table 2). RMSFs of residues located at positions 136 (0.89 Å) and 139 (0.49 Å) were also slightly lower in comparison to the control c1mutKSN, c1mutKSS, and c2mutKAS structures, whose respective RMSF values were as follows: 1.21 Å, 1.00 Å, and 1.02 Å (residue 136) and 1.04 Å, 0.52 Å, and 0.54 Å (residue 139). Hence, the slight reduction in flexibility affecting residues 136 and 139 in c2mutKAG may have prevented sufficient closeness and H-bond formation between these two residues. The greater distance between oxygen atoms in these two residues (5.2 Å) ( Table 3) and the lower H-bond occupancy appeared to confirm this hypothesis.
Differences observed for H-bond interactions involving the catalytic triad residues of HCV-1a clade II mutated structures could affect NS3 protease activity, providing evidence that changes in chemical bonding patterns in these structures would be a plausible reason for the non-detection of Q80K variants in HCV-1a clade II sequences.

Correlation Network Analysis and Community Network Comparison
A strategy based on network analysis was applied to study motion correlations between residues and among the four mutated HCV-1a clade structures. In this approach, each residue represented a node and the weight of the connection between nodes, i and j, represented their respective cross-correlation value, w ij . We analyzed how the simulated occurrence of the Q80K mutation in HCV-1a clade II sequences affected correlations between residues in NS3-4A protease structures and compared them to correlations between HCV-1a clade I mutants.
Overall, a similar pattern of cross-correlations was observed among all systems ( Figure 5). Both HCV-1a clade II mutant structures displayed anticorrelations between residues belonging to the NS4A cofactor and the NS3 protease. Anticorrelations were more pronounced in the c2mutKAG structure, indicating that the affected residues moved in opposite directions. These results were consistent among replicas, indicating that the observed differences were real and reproducible. Notably, the only difference between the HCV-1a clade II structures used in this study were the residues located at site 174, which are close to the NS4A cofactor. The presence of G174 was likely responsible for the increased anti-correlations in the c2mutKAG structure. DCCM confirmed that the movement between the HCV NS3 protease and the NS4A cofactor was opposite to what was observed for HCV-1a clade I K80 control structures. Community network analysis revealed significant differences between HCV clade I and clade II mutated proteins ( Figure 6). Correlations between the HCV NS3 protease and the terminal residues (197-203) from the NS4A cofactor were clearly evidenced in the simulated clade II systems. In the case of c2mutKAS, these terminal residues formed a community of 39 residues (4:7, 46, 53:62, 73, 75:83, 136, 191:203) that included the protease K80 resistance variant residue, as well as catalytic residues 57 and 81. In the c2mutKAG system, the 197-203 cofactor residues were included in a community of 67 residues (4: 9, 24:45, 63:72, 92, 108:112, 138, 182:203). These terminal cofactor residues formed a single and separated community in the c1mutKSN and c1mutKSS systems. Therefore, simulation of the K80 mutation in clade II systems demonstrated the generation of a new correlation pattern with the NS4A cofactor, which may not guarantee proper enzymatic functioning in these mutated structures. Community network analysis revealed significant differences between HCV clade I and clade II mutated proteins ( Figure 6). Correlations between the HCV NS3 protease and the terminal residues (197-203) from the NS4A cofactor were clearly evidenced in the simulated clade II systems. In the case of c2mutKAS, these terminal residues formed a community of 39 residues (4:7, 46, 53:62, 73, 75:83, 136, 191:203) that included the protease K80 resistance variant residue, as well as catalytic residues 57 and 81. In the c2mutKAG system, the 197-203 cofactor residues were included in a community of 67 residues (4:9, 24:45, 63:72, 92, 108:112, 138, 182:203). These terminal cofactor residues formed a single and separated community in the c1mutKSN and c1mutKSS systems. Therefore, simulation of the K80 mutation in clade II systems demonstrated the generation of a new correlation pattern with the NS4A cofactor, which may not guarantee proper enzymatic functioning in these mutated structures. To quantify the relative importance of each residue in the network, we computed the betweenness centrality per amino acid for each simulated system (Figure 7). Betweenness centrality is a measure based on the number of shortest paths between any two nodes. A high betweenness centrality might suggest that an individual node is connecting various parts of the network. The catalytic residue H57 displayed higher betweenness in c2mutKAS (0.06) than in the other mutant structures studied. To compare the overall similarity of betweenness centrality profiles, we calculated the square inner product (SIP) ( Table 4). A higher SIP was observed when comparing HCV-1a clade I structures (0.62) than HCV-1a clade II structures (0.46). Figure 6. Community networks of the four HCV-1a clade systems. Correlation between NS3 residues and the terminal NS4A cofactor residues (197-203) is evidenced in the simulated clade II systems (red). These terminal cofactor residues form a single and separated community in the c1mutKSN (green) and c1mutKSS (grey) systems.
To quantify the relative importance of each residue in the network, we computed the betweenness centrality per amino acid for each simulated system (Figure 7). Betweenness centrality is a measure based on the number of shortest paths between any two nodes. A high betweenness centrality might suggest that an individual node is connecting various parts of the network. The catalytic residue H57 displayed higher betweenness in c2mutKAS (0.06) than in the other mutant structures studied. To compare the overall similarity of betweenness centrality profiles, we calculated the square inner product (SIP) ( Table 4)

Discussion
HCV-1a comprises two clades (I and II), with the Q80K polymorphism being found predominantly in clade I and only rarely in clade II. However, the association between HCV-1a clades and Q80K polymorphism seems to be more complex. As shown by Santos et al. [14], clade I can be further split into at least three subclades (IA to IC). Notably, the Q80K polymorphism is associated with subclade IA and is mostly absent from subclades IB and IC. Most of the HCV-1a sequences reported in Brazil belong to subclade IC and some Argentinean HCV-1a sequences are grouped within clade I. Only a very low prevalence of Q80K has been reported in these countries [12,32]. Not surprisingly, the A91 polymorphism is also predominant in subclades IB and IC, as well as in clade II [14]. This fact suggests that the presence of alanine at site 91 may be related to the absence of lysine at site 80 of these sequences.
In our analysis, Q80K polymorphism correlated with S91+N/S174 substitutions. Importantly, these residues arose in viral lineages of two independent subclades (IA and IB), rather than branching out from a unique ancestral node. Coupling the structural results of the present study with analysis of coevolutionary dependencies between residues corroborates the findings and helps elucidate the relationships driven by interaction correlations.
After 100 ns of MD simulations, RMSD values from HCV-1a clade models of NS3 proteins containing Q80K polymorphisms exhibited only minimal differences among them. However,

Discussion
HCV-1a comprises two clades (I and II), with the Q80K polymorphism being found predominantly in clade I and only rarely in clade II. However, the association between HCV-1a clades and Q80K polymorphism seems to be more complex. As shown by Santos et al. [14], clade I can be further split into at least three subclades (IA to IC). Notably, the Q80K polymorphism is associated with subclade IA and is mostly absent from subclades IB and IC. Most of the HCV-1a sequences reported in Brazil belong to subclade IC and some Argentinean HCV-1a sequences are grouped within clade I. Only a very low prevalence of Q80K has been reported in these countries [12,32]. Not surprisingly, the A91 polymorphism is also predominant in subclades IB and IC, as well as in clade II [14]. This fact suggests that the presence of alanine at site 91 may be related to the absence of lysine at site 80 of these sequences.
In our analysis, Q80K polymorphism correlated with S91+N/S174 substitutions. Importantly, these residues arose in viral lineages of two independent subclades (IA and IB), rather than branching out from a unique ancestral node. Coupling the structural results of the present study with analysis of coevolutionary dependencies between residues corroborates the findings and helps elucidate the relationships driven by interaction correlations.
After 100 ns of MD simulations, RMSD values from HCV-1a clade models of NS3 proteins containing Q80K polymorphisms exhibited only minimal differences among them. However, differences and fluctuations were higher for the NS4A cofactor. This may be explained by the greater exposure of this protein to the solvent. Furthermore, we investigated the effect of the Q80K mutation on the residues of each system. Residues of the catalytic triad, H57, D81, and S139, showed distinct flexibilities when clade I and II mutant systems were compared.
The simulation of Q80K occurrence in the HCV NS3 protease also altered H-bond interactions of HCV-1a clade II proteins. When compared to clade I K80 control systems, c2mutKAS displayed more H-bonds between the catalytic residues H57 and D81. In contrast, in the c2mutKAG system, there were fewer H-bond interactions between the side chain of catalytic residue S139 and K136 on the backbone. Importantly, residues H57 and S139 are structural neighbors of polymorphic sites 91 and 174. These data confirm the possible impact of polymorphisms in HCV-1a clade II sequences on the catalytic triad residues in the presence of the Q80K mutation. Indeed, by affecting the chemical bonds in the catalytic site, the polymorphisms could compromise the protein's enzymatic activity.
Interestingly, noticeable anticorrelations were observed between the terminal residues of the NS4A cofactor and the NS3 protease in HCV-1a clade II systems, especially in c2mutKAG. Increased anticorrelations may indicate a possible destabilization of the interactions between these proteins, which could ultimately lead to an increased separation between them.
The betweenness centrality metric allowed the identification of critical nodes for network-wide communication. Residues presenting high betweenness values are considered "bottlenecks" of information as they are mostly found in the shortest paths of communication [33]. Interestingly, the catalytic residue H57 displayed a high betweenness value in c2mutKAS, indicating relevant communication through this particular residue. Elevated SIP of betweenness centrality was associated with weak modulation of intramolecular communication, thus reinforcing noticeable Q80K-related effects in c2mutKAG and c2mutKAS systems.
The HCV NS3 A91S and S174N polymorphisms were strongly associated with Q80K in a previous study by McCloskey et al. [34]. However, Santos and colleagues [14] only found an association between S174N and Q80K in subclade IA but not in subclades IB and IC, despite the high frequency of the S174N polymorphism in these last two subclades. Interestingly, most subclade IB and IC sequences have the same A91 polymorphic pattern as seen for clade II sequences. Santos et al. [14] also proposed that mutations at HCV NS3 80, 91, and 174 sites were not necessarily biologically relevant. However, our results suggest that the presence of the Q80K polymorphism is associated with S91 and N/S174 polymorphisms, which would affect NS3 protease structure and contribute to K80 stabilization. In this sense, natural polymorphisms found in the HCV genome are crucial determinants of viral fitness, and the probability of the Q80K polymorphism in the viral quasispecies population relates to the fitness of the corresponding NS3 protease structures [35].
The NS3 proteolytic activity and consequently viral replication could be compromised by modifying the physicochemical properties of the NS3 protease in the presence of the Q80K mutation, potentially hindering its emergence in the clade II viral quasispecies population and explaining why this mutation is not detected in these sequences.
In vitro studies using HCV subgenomic replicons, infectious cell culture systems and HCV-1a clade I mutants as controls, could elucidate whether the physicochemical differences highlighted in this study led to a decrease in viral replication capacity and reveal the pattern of viral variants that emerges from the different HCV-1a isolates. At present, we did not measure in vitro biological activity and thus could not confirm whether the Q80K mutation in clade II proteins did indeed diminish enzymatic activity.
Different HCV genotype 1a baseline polymorphisms determine the emergence pattern of additional resistance-conferring polymorphisms. While simeprevir treatment favored the emergence of resistant mutations such as D168A/V in the wild-type virus, engineered Q80K/R recombinants were responsible for positively selecting resistant R155K variants [36]. Also, viral fitness depended on these specific substitutions. The replication capacity of Q80K variants was 1.26-fold higher than that of the wild-type virus. However, after the emergence of R155K variants (Q80K+R155K), the replication capacity was only 0.63-fold higher than that of the wild-type virus [36]. These results confirm that baseline polymorphisms present in the HCV genotype 1a NS3 protease are determinants not only of RAS selection during DAA therapies but also of viral fitness. In another study, Pham and colleagues investigated the effects of engineered substitutions at position 80 on the fitness of three HCV genotype 1a infectious recombinants (isolates H77, TN, and HCV1) by comparing viral spreading kinetics. Interestingly, for isolate TN, which originally harbored K at position 80, viral spreading kinetics was significantly delayed compared to that of the original viruses. This isolate (GenBank JX993348.1) was the only one to contain A91 in its sequence [9]. This fact, not addressed in that study, demonstrates that polymorphisms present in HCV genotype 1a sequences differentially impact resistance in viruses. However, no study so far engineered HCV genotype 1a K80+A91+G/S174 recombinants to evaluate viral fitness and the ability to promote additional RAS emergence.
Given the high genetic variability of HCV, other polymorphic sites present in non-structural genes (NS5A or NS5B) could also determine the emergence of DAA-resistant viral variants. Indeed, in vitro studies have demonstrated that baseline polymorphisms can favor the selection of variants resistant to DAA inhibitors. Sun and colleagues used hybrid replicon cell lines to demonstrate that the HCV NS5A polymorphism E62D, which has no effect on daclatasvir inhibitor activity, could not only influence the emergence of the resistant variant Q30R, but was also able to increase the levels of resistance to daclatasvir [37].
As the sequences analyzed here were obtained by Sanger sequencing, we were not able to confirm whether all PCR-amplified molecules from a particular viral isolate contained the triple mutations addressed in this study. Further, the eventual detection of Q80K variants present at low frequencies in HCV-1a clade II isolates may have been missed with this approach. However, it does not necessarily mean that a minor Q80K variant population would prevail over time or that these variants would emerge to become the main viral population without the occurrence of additional mutations in these HCV genomes.
Here, we demonstrate that HCV isolates of the same subtype (1a) exhibit different molecular behavior in the presence of DAA-resistant substitutions. It is quite plausible that other HCV proteins also exhibit a differential molecular behavior pattern. It is not yet fully understood whether the differences in frequencies in DAA resistance substitutions found in HCV non-structural proteins from different geographical regions were caused by changes in physicochemical properties. In this sense, the protocol described here could help predict whether interactions between residues cause changes in HCV protein structures.
The physicochemical differences to the NS3 protease described in this study may be altered by the appearance of additional mutations driven by the elevated replication rates of HCV. In fact, such additional mutations may even cause the physicochemical properties of these proteins to become similar to those of the control structures used in this study.
In conclusion, structure prediction and MD simulations yielded important structural information on HCV-1a clade DAA-resistant proteases. These findings will pave the way for an improved understanding of the atomic interactions between residues and the design of new antiviral therapeutics that can tackle currently resistant variants. Moreover, the methods used in this study are also applicable to other non-structural HCV proteins, whose mutated structures and behavior in an aqueous medium have not been investigated yet by MD simulations. Funding: This research was supported by grants from the "Coordenação de Aperfeiçoamento de pessoal de nível superior (CAPES)".