Proline-Rich Hypervariable Region of Hepatitis E Virus: Arranging the Disorder

The hepatitis E virus (HEV) hypervariable region (HVR) presents the highest divergence of the entire HEV genome. It is characteristically rich in proline, and so is also known as the “polyproline region” (PPR). HEV genotype 3 (HEV-3) exhibits different PPR lengths due to insertions, PPR and/or RNA-dependent RNA polymerase (RdRp) duplications and deletions. A total of 723 PPR-HEV sequences were analyzed, of which 137 HEV-3 sequences were obtained from clinical specimens (from acute and chronic infection) by Sanger sequencing. Eight swine stool/liver samples were also analyzed. N- and C-terminal fragments were confirmed as being conserved, but they harbored differences between genotypes and were not proline-plentiful regions. The genuine PPR is the intermediate region between them. HEV-3 PPR contains a higher percentage (30.4%) of prolines than other genotypes. We describe for the first time: (1) the specific placement of HEV-3 PPR rearrangements in sites 1 to 14 of the PPR, noting that duplications are more frequently attached to sites 11 and 12 (AAs 74–79 and 113–118, respectively); (2) the cadence of repetitions follows a circular-like pattern of blocks A to J, with F, G, H, and I being the most frequent; (3) a previously unreported insertion homologous to apolipoprotein C1; and (4) the increase in frequency of potential N-glycosylation sites and differences in AA composition related to duplications.


Hepatitis E virus (HEV) infection is an important component of enteric-transmitted liver diseases and has a significant impact on public health. The number of new HEV infections has increased in recent years in the industrialized countries of the European Union [1]. HEV genotype 3 (HEV-3) infection is a viral zoonosis transmitted to humans through consumption of meat from infected animals, mainly pig [2][3][4], wild boar [5][6][7], and deer [8]. HEV-3 is spreading worldwide and causes acute, mainly self-limited hepatitis. In immunocompetent and immunocompromised patients, hepatitis can be fulminant, while chronic infection has only been described in immunocompromised patients.
RNA was extracted from 200 µL of serum by automated extraction with the Magna Pure L.C. System (Roche Diagnostics, Mannheim, Germany). After RNA extraction, complementary DNA was synthesized with a Transcriptor First Strand cDNA Synthesis kit (Roche Diagnostics, Mannheim, Germany) using random hexamers; 20 µL of cDNA was obtained from 10 µL of RNA extract, following the manufacturer's recommendations. PPR fragments were obtained through nested PCR, as previously described [19]. Afterwards, amplification products were purified with Illustra ExoProStar 1-step (VWR International Eurolab S.L., Radnor, PA, USA), and both the sense and antisense DNA strands were sequenced by the Sanger method. GenBank accession numbers are MT899272 to MT899416.

Consensus Definition
PPR consensus by genotype was obtained from reference and study sequences as follows: sequences were aligned with the MegAlign program (Lasergene 12.3.1; DNASTAR Inc., Madison, WI, USA), and the AA consensus sequence was established according to the most frequent AA in each position.
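The majority rule described above can be sketched in a few lines of Python. This is only an illustration, not the MegAlign procedure: the function name `consensus` and the toy sequences are hypothetical, the input is assumed to be already aligned to equal length, gap characters are counted like any other symbol, and ties are resolved in favor of the residue seen first.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each column of an alignment.

    `aligned` is a list of equal-length strings (already aligned).
    Ties go to the residue that appears first in the column.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    residues = []
    for column in zip(*aligned):
        # most_common(1) returns [(residue, count)] for the top residue
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


# Toy alignment (hypothetical, not study data)
toy = ["PPVTP", "PPVSP", "PAVTP"]
print(consensus(toy))
```

In practice one consensus would be computed per genotype, from the reference and study sequences of that genotype only.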

Limits of PPR
Due to the great variability of this region, we analyzed PPR in fragments, choosing histidine (H) as the starting position and aspartic acid (D) as the final position. Three fragments were examined: the initial 32 AAs in the N-terminal region, the intermediate region of variable length, and the final 12 AAs of the C-terminal region.
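As a minimal sketch of this fragment definition, the slicing below splits a PPR amino acid string into the three fragments. The function name `split_ppr` and the toy sequence are hypothetical; the sequence passed in is assumed to start at the chosen histidine and end at the chosen aspartic acid, and the default lengths (32 and 12) are the ones stated above.

```python
def split_ppr(ppr, n_len=32, c_len=12):
    """Split a PPR sequence into N-terminal, intermediate, and C-terminal fragments.

    The N-terminal fragment is the first `n_len` residues, the C-terminal
    fragment is the last `c_len` residues, and the intermediate region is
    whatever lies between them (variable length).
    """
    if len(ppr) < n_len + c_len:
        raise ValueError("sequence is shorter than the two flanking fragments")
    n_term = ppr[:n_len]
    intermediate = ppr[n_len:len(ppr) - c_len]
    c_term = ppr[len(ppr) - c_len:]
    return n_term, intermediate, c_term


# Toy sequence (hypothetical): H + 31 residues, 4-residue middle, 11 residues + D
toy = "H" + "A" * 31 + "PPPP" + "S" * 11 + "D"
n_term, middle, c_term = split_ppr(toy)
print(len(n_term), middle, len(c_term))
```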

Amino Acid Composition
AA composition of each PPR sequence was calculated with the EMBOSS Pepstats program, available at https://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/.
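For readers who want to reproduce the basic percentages locally, the following sketch computes the percentage of each residue in a sequence. It is not Pepstats and does not reproduce its other statistics; the function name `aa_composition` and the example fragment are assumptions made for illustration.

```python
from collections import Counter


def aa_composition(seq):
    """Return a dict mapping each residue to its percentage of the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Short illustrative fragment (10 residues, 4 of them proline)
comp = aa_composition("PPVTPVSKPA")
print(f"proline: {comp['P']:.1f}%")
```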

Residue Variability
This was calculated as the percentage of discordant AAs with respect to the consensus AA of each genotype. The average residue variability of each fragment was also determined.

Sequence Homology Analysis
This was calculated as the percentage of conserved AAs with respect to the consensus AA of each genotype.
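The two measures defined above are complementary, and a short sketch may make the arithmetic concrete. The reading used here is an assumption: variability is computed per alignment position across all sequences of a genotype, and homology per sequence across all positions. The function names and toy data are hypothetical.

```python
def residue_variability(sequences, consensus):
    """Per-position percentage of sequences that differ from the consensus."""
    n = len(sequences)
    if n == 0:
        raise ValueError("no sequences given")
    return [
        100.0 * sum(seq[i] != ref for seq in sequences) / n
        for i, ref in enumerate(consensus)
    ]


def mean_variability(sequences, consensus):
    """Average residue variability over all positions of a fragment."""
    per_position = residue_variability(sequences, consensus)
    return sum(per_position) / len(per_position)


def homology(sequence, consensus):
    """Percentage of positions at which a sequence matches the consensus."""
    matches = sum(a == b for a, b in zip(sequence, consensus))
    return 100.0 * matches / len(consensus)


# Toy alignment (hypothetical) with its majority consensus
toy = ["PPVTP", "PPVSP", "PAVTP"]
ref = "PPVTP"
print(residue_variability(toy, ref))
print(mean_variability(toy, ref))
print(homology("PPVSP", ref))
```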

Analysis of Insertions
In study sequences, a BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was performed to determine the origin of insertions.

Regulation Sites Analysis
Potential ubiquitination sites were analyzed using the BDM-PUB server (http://bdmpub.biocuckoo.org/prediction.php), with a threshold of >0.3 for the average potential score. Potential acetylation sites were identified using the PAIL server (http://bdmpail.biocuckoo.org/prediction.php) with a threshold of >0.2 for the average potential score. Potential phosphorylation sites were identified using the NetPhos 2.0 server (www.cbs.dtu.dk/services/NetPhos/) with a threshold of >0.5 for the average potential score. Finally, potential N-linked glycosylation sites were analyzed using the NetNGlyc 1.0 server (www.cbs.dtu.dk/services/NetNGlyc) with a threshold of >0.5 for the average potential score.
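As a rough local pre-screen for N-linked glycosylation, one can list the classical N-X-S/T sequons (X being any residue except proline) before submitting sequences to a predictor. The sketch below does only that motif scan; it does not reproduce NetNGlyc scores or the >0.5 threshold, and the function name `find_sequons` and the toy sequence are hypothetical.

```python
import re

# Classical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as separate hits.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (1-based position of Asn, motif) for every N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# Toy sequence (hypothetical): the first Asn is followed by Pro and is skipped
print(find_sequons("PANPSPTNGSA"))
```

A sequon found this way is only a candidate; whether it is predicted as glycosylated depends on the sequence context evaluated by the server.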

Statistical Analysis
Qualitative variables were analyzed with chi-square tests. Values of p < 0.05 were considered to be significant.
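For a 2×2 comparison of proportions, the chi-square test can be written out directly. The sketch below is an assumption about the form of the test (Pearson chi-square without continuity correction, 1 degree of freedom); the function name `chi_square_2x2` is hypothetical, and in the example only the first row (16 of 91) comes from the study, while the second row is an invented count used purely for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Row 1: 16 of 91 sequences with a potential site (from the study).
# Row 2: 0 of 40 sequences (HYPOTHETICAL group size, illustration only).
chi2, p = chi_square_2x2(16, 75, 0, 40)
print(f"chi2 = {chi2:.2f}, significant at 0.05: {p < 0.05}")
```

When expected cell counts are small, an exact test is often preferred over the chi-square approximation.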

Results
Three genomic regions were differentiated: the PPR-N-terminal (genome region encompassing the first 32 AAs); the PPR-C-terminal including the final 12 AAs; and the PPR intermediate region.

HEV-1 P P S A P A P D E P A S G T T A G A P A I T H Q T A R H
HEV-2 L P D V T D G S R P S G A R P A G P N P N G V P
HEV-3 P P V T P V S K P A N P P S P T T P R P P V R K P P T P P P A R N
HEV-4 P P D L V D G G A X P A L P S A S V A P P A P A Q P V X P S G P R
HEV-5 E N E R A A D G G S A A P V A A V P C P Q P P A Q P V G R L F C A G
HEV-6 X G X X X P X P A X X X P X X X P X X X E A X X P X P Q X X X X S X A X X X X A X
HEV-7 Q X P X P X X X X X P X X P X X X X S X X X P A Q G X X X X V X R N

PPR-N-Terminal Region
The average residue variability of this region was 2.76% in HEV-3, 2.13% in HEV-4, and 1.34% in HEV-1, with no significant differences between them. Most genotypes had 31 AAs, although the HEV-2 PPR-N-terminal was the shortest (27 AAs), and HEV-1 was the longest (32 AAs), with an extra valine at position 23. Proline was present at the highly conserved positions 8 and 32, but the most common AA was serine, even though its percentage varied between genotypes, being higher in the zoonotic (21.7%) than in the non-zoonotic (13.8%) genotypes (p < 0.05).

PPR-C-Terminal Region
The average residue variability of this region was 1.14% in HEV-1, 0.72% in HEV-3, and 0.19% in HEV-4, with no significant differences between them. All genotypes had nine AAs, and residues 130-133 and 136-138 were conserved in all genotypes. Proline was present and conserved in position 136.

Variability and Composition
The average residue variability of the intermediate region was higher in zoonotic HEV-4 (28.20%) and HEV-3 (24.20%) than in non-zoonotic HEV-1 (12.21%) (p < 0.05). Proline was present throughout the entire region, being the most common AA, with an average of 22.8%. The proline composition differed among the genotypes (21.05% in HEV-6), the amount being significantly higher in HEV-3 (p < 0.05). Furthermore, the prolines in positions 117 and 121 were conserved in all genotypes, except for HEV-1. Arginine at position 128 was also highly conserved across genotypes. In addition, HEV-2, with only three available sequences, had a significantly higher percentage of glycine (16.04%) compared with the other genotypes (6.11%) (p < 0.05).
Insertions were more frequent at positions 50 to 111, at different sites. Except for one sequence that duplicated the complete PPR, the duplications usually appeared at positions 74-79 and 113-118, these being duplicate fragments of AAs from position 67. There were 10 duplication blocks (Blocks A-J) (Table 4). Usually, the first fragment to appear duplicated was that immediately adjacent to the positions where the insertion occurred. For instance, block H was the starting block in duplications inserted at position 117. The only exception to this was in the 35 sequences corresponding to HEV-3a, in which the inserted block was I in position 113 (Table 4). Duplicated blocks were located one after the other, mainly in alphabetical order (and thereby in the same order as in the wild type PPR), although one of the blocks was occasionally skipped, or the cycle started with previous blocks, which recovered the alphabetical order (e.g., HIJABCDE). The most commonly repeated blocks were F, G, H, and I. In fact, we found one sequence (KJ917717) in which blocks F, G, H, I, and J were repeated up to three times in total. The HIJFG duplication was found in 189 HEV-3f sequences.
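The circular-like ordering of duplicated blocks can be made explicit with a small helper that splits a block string into runs of consecutive blocks, letting J wrap around to A. This is an illustrative sketch only: the function name `circular_runs` is hypothetical, and the way a "restart" is detected (any block that is not the circular successor of the previous one opens a new run) is an assumption, not the procedure used in the study.

```python
BLOCKS = "ABCDEFGHIJ"  # the 10 duplication blocks, in wild-type order


def circular_runs(block_string):
    """Split a string of block letters into runs that follow circular order.

    Within a run, each block is the successor of the previous one,
    with J followed by A. A block that breaks the order starts a new run.
    """
    runs = []
    for block in block_string:
        if block not in BLOCKS:
            raise ValueError(f"unknown block: {block!r}")
        if runs:
            previous = runs[-1][-1]
            expected = BLOCKS[(BLOCKS.index(previous) + 1) % len(BLOCKS)]
            if block == expected:
                runs[-1] += block
                continue
        runs.append(block)
    return runs


print(circular_runs("HIJABCDE"))  # J wraps around to A
print(circular_runs("HIJFG"))     # the cycle restarts at an earlier block
```

With this helper, HIJABCDE is a single circular run, whereas HIJFG splits into two runs (HIJ, then FG), matching the case in which the cycle restarts with previous blocks.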
HEV genome insertions other than PPR were HEV-RdRp fragments (Tables 3 and 4). In two sequences (KC618402 and KC618403) we found a 24-AA RdRp motif (LRGLTNVAQVCVDVVSRVCGVSPG).
Finally, we found HEV-3 inserts of ribosomal proteins 17S, 18S, 19S, and L6 (RPS17, RNA18S, RPS19, and RPL6, respectively) and human genes such as ring finger protein 19A (RNF19A), eukaryotic translation elongation factor 1a1 (EEF1a1P13), zinc finger protein (ZNF787), glycine aminotransferase (GATM), inter-alpha-trypsin-inhibitor heavy chain H2 (ITIH2), and kinesin-like protein 1B (KIF1B). We noted an insertion of five AAs in the HEV-3j sequence (STLPS motif) of unknown origin. In addition to those described in HEV-3, we identified a single AA insertion between positions 122 and 123 in HEV-4 that was present in sequences from HEV-4c, HEV-4a, and HEV-4d.

Table 4. HEV-3 PPR duplications. HEV-3 consensus amino acid sequence and sequences with duplications are illustrated, along with 10 duplication blocks (A to J) and their sequences. The Table shows sequences
HEV I W V L P P P S E G S A I D P P P V T P V S K P A N P P S P T T P R P P V R K P P T P P P A R N

Specific Analysis of Newly Obtained HEV-3 Sequences
Ninety HEV-3f sequences from patients with acute infection had a 29-AA duplication of blocks HIJFG (site 12; Tables 3 and 4), identical to six of the seven HEV-3f sequences obtained from swine stool/liver samples.
Regarding the follow-up of chronic patients, the HEV-3f CR-3 patient had a 28-AA insertion (site 3; Table 3) related to human apolipoprotein C1 (100% identity) that was maintained in the three follow-up samples over one year. This human insert conferred an increase in the number of potential regulation sites: four acetylation, three ubiquitination, and five phosphorylation sites (four serine and one threonine). In the case of the HEV-3c CR-1 patient, the first sequence had duplicated blocks (site 12; Tables 3 and 4), but the duplication was lost in the subsequent four follow-up samples. The three HEV-3f sequences of the CR-2 patient had no insertions.
The alterations in the number of potential regulation sites in the 91 sequences with duplication of blocks HIJFG (90 HEV-3f acute cases and one HEV-3c CR-1 chronic infection) were as follows: ubiquitination-suitable sites often increased by one site (range, 0 to 2); acetylation-suitable sites often increased by one site (range, −1 to 2); and potential phosphorylation sites often increased by six sites (range, 1 to 9), mainly due to the presence of serine. Regarding N-glycosylation, 16 out of 91 (17.6%) sequences with duplication had at least one potential N-glycosylation site; by contrast, none of the sequences without duplications or the sequences with insertions had potential N-glycosylation sites (p < 0.05). Table 5 compares the AA composition of sequences with insertions, with duplications, and without either. We observed an increase in positively charged and a decrease in hydrophobic and aromatic AAs in sequences with human fragment insertion; and an increase of negatively charged and hydrophobic while polar and a decrease in aromatic AAs in sequences with duplications.

Discussion
The PPR-C-terminal and PPR-N-terminal regions cannot be considered truly hypervariable or hyper-proline regions. Although a 105-AA fragment was originally considered to be a PPR [14], this study confirmed that the disorder does not actually encompass the entire hypervariable region, and implies that the true PPR would be located between positions 33 and 129, as was previously suggested [18,29]. A high conservation rate of these fragments was observed intra-genotypically, but with specific inter-genotypic discrepancies (AA 2 and AA 29 in the N-terminal region). Furthermore, AA 30 allowed zoonotic and non-zoonotic genotypes to be differentiated. The high degree of conservation of the two zones flanking the PPR-intermediate region suggests that a possible function can be assigned to these zones, although this would require additional functional studies. Proline is not common in these terminal regions, the PPR-N-terminal region being particularly rich in serine, which might be crucial for protein phosphatases that control many cell functions [30].
The true PPR hypervariable region is thus the intermediate region flanked by the PPR-N- and PPR-C-terminal regions. HEV-1, 5, 6, 7, and 8 maintain their length, harboring 60, 75, 76, 66, and 72 AAs, respectively. These differences seem to be related to previously undescribed insertions, such as the sequence GHLDA in HEV-2. Previous studies reported high sequence similarity in HEV-1 [29]. There are few sequences available for HEV-2, 5, 6, 7, and 8, which makes it difficult to draw conclusions, but our study nevertheless revealed considerable diversity among the small number of available sequences of each genotype. This means that although phylogenetic studies usually exclude this region because of its high degree of divergence, its phylogenetic use might be suitable (Figure S1), especially when complemented by the analysis of other genome regions [15]. There is more proline in HEV-3 than in the other genotypes.
By contrast, the main zoonotic genotypes, HEV-3 and HEV-4, showed substantive differences in length due to insertions and deletions. Although this phenomenon has been previously reported [20,24,26,27], we describe for the first time the specific location of HEV-3 PPR rearrangements, noting that PPR duplications were more frequently attached to specific locations. In this study we analyzed 723 HEV sequences, including 137 sequences newly obtained through Sanger sequencing. Next-generation sequencing may help researchers obtain hundreds of full genomes, but may give incorrect results in PPR when extremely rearranged sequences are assembled by mapping to reference genomes; thus, in these cases, it would be better to use de novo assembly or Sanger sequencing to obtain more reliable results [31]. Duplications affect HEV-3a, 3c, 3e, and 3f, and are described in acute and chronic infections. Additionally, a more thorough analysis of the duplications in HEV-3 shows the previously unreported cadence of repetitions, which follows a circular-like pattern of blocks. Previous studies reported that HEV-3f was divided into HEV-3f-short and HEV-3f-long [32], based on the specific duplication of blocks HIJFG. Here we describe the same duplication in one HEV-3c sequence from a chronic patient who presented this duplication in their first sequence but not in the subsequent four follow-up samples.
In the sequences newly obtained for this study, duplications increased the frequencies of potential ubiquitination, acetylation, and phosphorylation sites, as described previously [24]. However, a previously unreported increase in the number of potential N-glycosylation sites was also observed. Considering the parallels with other viruses, similar rearrangements have been described in the JC polyomavirus, whose noncoding control region, related to replication and transcription, features a genomic rearrangement that increases the replication rate and viral gene expression in patients with progressive multifocal leukoencephalopathy [33,34]. Something similar occurs in cytomegalovirus cell-adapted strains that contain genomic rearrangements located, in this case, in non-essential genes [35]. More similar to HEV PPR, the regulatory Nsp2 protein of porcine reproductive and respiratory syndrome virus (PRRSV) contains a highly conserved N-terminal enzyme domain, a highly conserved C-terminal transmembrane region, and a hypervariable intermediate region with differences in length between the European and North American strains [36]. The introduction of duplications in a highly conserved 3'-noncoding region of the Japanese encephalitis virus (JEV) was found to lead to increases in the production of RNA and in virus yield [37]. In respiratory syncytial virus, a duplication of 23 amino acids was observed in the C-terminal region of the attachment glycoprotein that resulted in the repetition of seven potential O-glycosylation sites. Such changes may influence the pathogenicity of the virus [38]. In duplications, we observed an increase in negatively charged AAs, in contrast to previous descriptions [24]. Insertions from other HEV ORF1 proteins, such as RdRp, that do not correspond to any functional motif, or that increase the number of potential functional sites, have been found less frequently [24].
Apart from duplications, exogenous inserts, all of human origin, were located along the PPR affecting HEV-3a, 3c, 3f, 3h, and 3m. We describe a new inserted fragment (homologous to apolipoprotein C1) in three samples from a chronic patient. Apolipoprotein C1 in hepatitis C virus is related to morphogenesis and virus infection [39]. Furthermore, the insertion significantly increased the number of potential acetylation, phosphorylation, and ubiquitination sites that might be involved in host adaptation [15]. Cell culture studies have demonstrated a replicative improvement of HEV harboring PPR insertions of the inter-alpha-trypsin-inhibitor heavy chain H2 [20], or the S17 and S19 ribosomal genes [23], that gives rise to new ubiquitination, acetylation, or phosphorylation sites. It seems significant that although a wide range of animals are susceptible to HEV-3, the exogenous inserts described are all of human origin. Human fragment insertions increase the frequency of positively charged AAs as described before [24] and decrease hydrophobic and aromatic AA fractions.
Two of the three patients with a chronic infection in our panel presented PPR insertions. Although this is a small number of patients, the frequency of rearrangements in chronically infected patients seems to be high, in contrast to the findings of Lhomme et al. [20], who reported that, in the majority of chronic patients, the PPR did not show insertions during follow-up.
In contrast to the HIV and HCV hypervariable regions, in which the HVRs are related to the structural proteins that the host response forces to mutate, allowing the virus to evade neutralizing antibodies [40,41], the HEV PPR, like the Rubivirus PPR, is considered an intrinsically disordered region (IDR), i.e., a protein domain that does not adopt a compact three-dimensional structure [15]. IDRs are a consequence of the viral interaction with hosts in a wide variety of viruses, such as herpes simplex [42]. More structural studies are required to see how duplications affect protein conformation.
Independently of the significance of the insertions, their abundance and variety suggest that PPR is a region that tolerates the insertions well, without apparently affecting virus viability. Potential insertion sites in HEV ORF1 were identified by the combined use of transposon-mediated random insertion and selection in a subgenomic replicon system, but insertions in functional domains (Mtase, helicase, and RdRp) were not viable. However, immunofluorescence, immunoblot analysis, and luciferase activity measurement demonstrated that PPR insertions do not affect virus infectivity and facilitate viral production [43]. This may be of interest in genetic engineering and will require additional studies to determine what insert capacity PPR allows and its potential use.

Conclusions
We propose that the true proline-plentiful hypervariable region is flanked by the PPR-N- and PPR-C-terminal regions, which, while conserved, harbor differences between genotypes and are not proline-plentiful regions.

• We describe PPR length differences between HEV genotypes.

• We describe for the first time the specific location of HEV-3 PPR rearrangements in sites 1 to 14 of the PPR, noting that duplications are more frequently attached to sites 11 and 12 (AAs 74–79 and 113–118, respectively). The cadence of repetitions follows a circular-like pattern of blocks A to J, with blocks F, G, H, and I being the most frequent. Duplicated fragments increase the frequency of potential N-glycosylation sites and negatively charged AAs.

• We identify a previously unreported insertion homologous to apolipoprotein C1 in a chronic patient sample.