Variable In Vivo Hepatitis D Virus (HDV) RNA Editing Rates According to the HDV Genotype

Human hepatitis delta virus (HDV) is a small defective RNA satellite virus that requires hepatitis B virus (HBV) envelope proteins to form its own virions. The HDV genome possesses a single coding open reading frame (ORF), located on a replicative intermediate, the antigenome, encoding the small (s) and the large (L) isoforms of the delta antigen (s-HDAg and L-HDAg). The latter is produced following an editing process, changing the amber/stop codon on the s-HDAg-ORF into a tryptophan codon, allowing L-HDAg synthesis by the addition of 19 (or 20) C-terminal amino acids. The two delta proteins play different roles in the viral cell cycle: s-HDAg activates genome replication, while L-HDAg blocks replication and favors virion morphogenesis and propagation. L-HDAg has also been involved in HDV pathogenicity. Understanding the kinetics of viral editing rates in vivo is key to unravel the biology of the virus and understand its spread and natural history. We developed and validated a new assay based on next-generation sequencing and aimed at quantifying HDV RNA editing in plasma. We analyzed plasma samples from 219 patients infected with different HDV genotypes and showed that HDV editing capacity strongly depends on the genotype of the strain.


Introduction
Hepatitis delta virus (HDV) is a small defective RNA virus that infects humans already chronically carrying hepatitis B virus (HBV). Indeed, HDV requires the HBV envelope proteins expressing the HBV surface antigen (HBsAg) for the morphogenesis of its viral particles [1]. HDV has been assigned to the Deltavirus genus. New HDV RNA-like sequences have been recently identified in the animal kingdom [2][3][4][5]. Human HDV infection is widespread, and its prevalence differs between areas. It is estimated that, among the 257 million individuals with chronic HBV infection, approximately 15-20 million are also infected with HDV. According to recent meta-analyses, this number may be underestimated in several countries [6,7]. HDV infection is associated with the most severe forms of viral hepatitis, ranging from acute and sometimes fulminant disease to a rapidly progressive form of chronic viral hepatitis rapidly leading to cirrhosis and hepatocellular carcinoma [8].
The HDV genus is characterized by a very high genetic diversity. We recently proposed, based on large-scale sequencing of HDV strains for many regions of the world,

Materials and Methods
Implementation and validation of a new tool to quantify HDV RNA editing rates. To implement a new assay to measure HDV RNA editing rates, we used an NGS 'method using the MiSeq ® Illumina sequencer (Illumina, Eindhoven, The Netherlands) to quantify the proportion of edited versus non-edited genome sequences in the plasma of HDV-infected patients. To analyze the large amount of sequence data generated, we took advantage of the patent-protected "in-house" software recently developed for HIV sequence analyses, including the Pyromute, Pyrotrop and Pyrolink modules, which were further developed to create the PyroEdit software.
Initial validation of the software was performed using computer-generated edited and non-edited sequences from reference sequences. Briefly, we created a program that generated several synthetic fastq files with the following characteristics: (I) fastq files containing 10 3 , 5 × 10 5 and 10 5 sequences; (II) consensus sequences from all HDV genotypes (HDV-1 to -8); (III) the presence of the W codon on all sequences of each genotype; (IV) sequence variability up to 30%; (V) Phred Quality Score ≥30, and (VI) a lower limit of detection of the W codon of 1%.
Finally, we analyzed the plasma samples from 219 HDV-infected patients described below. Repeatability and reproducibility experiments were conducted from 8 samples, including 3 low (10%), 3 median (32%) and 2 high (52%) RNA editing rates, that were tested 10 times using the same MiSeq ® flow cell and 3 times on 3 different MiSeq ® flow cells (Illumina, Eindhoven, The Netherlands).

Generation of Non-Edited and Edited Clones
We generated non-edited and edited clones from a well-characterized HDV-1 strain (dFr663d unpublished), which exhibited a double population by Sanger sequencing, ATC (for non-edited) and ACC (for edited) at the 1012 position of the genome corresponding to amber (UAG)/W (UGG) codon for the antigenome.
Briefly, amplicons of the so-called R0 genome region, spanning nt 920 to 1289 and covering the editing region [9], were generated after reverse transcription using the Su-perScriptIII Reverse Transcriptase (Invitrogen by ThermoFisher Scientific, Waltham, MA, USA) and amplification with the KAPA High-Fidelity PCR kit (Roche, Indianapolis, IN, USA). They were then cloned into a TOPO10 vector (Thermofisher Scientific, Waltham, MA, USA) and plasmids were expressed in E. Coli bacteria. The final non-edited and edited Viruses 2021, 13, 1572 4 of 15 purified clones were collected, further amplified using the KAPA high fidelity PCR kit and sequenced by the Sanger method using the ABI Prism Big Dye Terminator Cycle Sequencing Kit v3.1 (Applied Biosystems by ThermoFisher Scientific, Waltham, MA, USA). DNA plasmid purification was carried out with the GeneJET plasmid miniprep kit (Thermofisher Scientific, Waltham, MA, USA). DNA was normalized using a Qubit Fluorometer (Qubit dsDNA Assay kit, Life Technologies by ThermoFisher Scientific. Waltham, MA, USA), and new PCRs were performed to characterize each non-edited and edited clones by means of Sanger sequencing. Quantification of each clone by NGS was performed through new PCR amplifications using a pair of primers containing the 5 NGS-adaptor sequences P5 and P7 (Nextera XT Index Kit V2, Illumina, Eindhoven, The Netherlands). PCR products were further purified using the NXp Station (Agencourt AMPure XP, Beckman Coulter, Indianapolis, IN, USA). DNA quantity and quality were assessed using Qubit and BioAnalyser (Agilent by Thermofisher Scientific, Waltham, MA, USA).

Quantification of the In Vivo Editing Rate in Plasma Samples
Clinical samples from 219 untreated patients infected with different HDV genotypes were selected from the collection of the French HDV National Reference Laboratory according to the HDV genotype and a HDV viral load > 1000 UI/mL. HDV RNA levels ranged between 2.7 and 10.2 Log IU/mL, with a median of 6.3 ± 1.62 Log IU/mL, as quantified by the commercial Eurobioplex HDV kit (Eurobio, Les Ulis, France) [36]. HBV DNA levels, quantified by the Cobas HBV 6800 Systems (Roche, Waltham, MA, USA), were available in 139 of the 219 patients and their values ranged between 1 and 6.6 Log IU/mL, with a median of 1.83 ± 1.94 Log IU/mL, 5% of them being close to the limit of detection of the assay. HDV genotypes, determined as previously described [9,11], were distributed as follows: 115 HDV-1, 55 HDV-5, 12 HDV-6, 17 HDV-7 and 20 HDV-8.
For the quantification of HDV RNA editing rate, RNAs were extracted from plasma samples with the m2000sp system (Abbott Molecular, Chicago, IL, USA) and reversetranscribed with the SuperScript III First-Strand Synthesis System (Invitrogen by Ther-moFisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. The cDNAs were amplified as previously described [9,37] using the KAPA High-Fidelity DNA polymerase (KAPABiosystems, Roche, Indianapolis, IN, USA) and improving proofreading efficiency by using specific primers corresponding to the R0 genome region (See Appendix A, Table A1). After purification and quantification of the PCR products, a library was created containing all samples at the same normalized concentration, which was sequenced by NGS on the MiSeq ® System (MySeq ® V2 Reagent Kit 500 cycles, Illumina, Eindhoven, The Netherlands) according to the manufacturer's protocol.
NGS data analysis was performed with our new dedicated tool PyroEdit. After a specific quality filter step, only complete sequences (up to 151 pb) with a Phred Quality Scores >30 were recorded and aligned (Smith-Waterman algorithm) with a genotypespecific consensus reference sequence. The percentage of sequences bearing the W at the amber/W codon was calculated for each patient sample and recorded in the final report.

Determination of the Secondary Structures of the RNA Editing Region
To analyze secondary HDV RNA structures, especially in the region surrounding the amber/W editing site, we determined the full-length genome sequence of 32 out of the 219 strains from the initial cohort (10 HDV-1, 7 HDV-5, 5 HDV-6, 5 HDV-7 and 5 HDV-8), as previously described [9] (Being published in EMBL). The more stable predicted secondary structures (RNAfolding) of the RNA editing site regions were determined using two online software applications with default setting from the entire antigenome sequences: RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi accessed on 1 November 2019) and UNAfold WebServer (http://www.unafold.org/mfold/ applications/rna-folding-form.php accessed on 1 June 2021). We then compared the patterns of the putative minimum editing substrate for the ADAR-1 enzyme according to the genotype and the editing rate.

Provision of a New Tool to Quantify HDV RNA Editing Rate
We developed a new tool to measure the editing capacity of HDV strains in samples from infected patients. The workflow is based on NGS and quantifies editing of the amber/stop codon at positions 1010-1013 of the antigenomic RNA.
Previously created in-house software was adapted to develop the PyroEdit application, that was subsequently validated to be used regardless of the number of sequences to analyze, threshold values, virus variability and HDV genotype. Firstly, the validation process was conducted by using computer-generated fastq files, as described in the "Methods" section. The mean difference between the expected values of edited genome rates (input) and the values calculated by PyroEdit (output) was less than 0.1%, whatever the HDV genotype is (data not shown). Secondly, eight mixtures containing clones expressing either edited or non-edited genomes in various proportions were prepared ( Figure 1). A total of 18,400,000 quality-filtered sequences were obtained (mean: 175,792 sequences per sample) with a Phred Quality Score ≥ 30. The results obtained with PyroEdit were very close to the expected values, with a mean difference < 3%. The lower limit of quantification (LLOQ) was 5% (percentage of edited RNA in a mixture) and the lower limit of detection (LLOD), defined by the threshold for detection according to information from Illumina support, was 1% ( Figure 1). Furthermore, repeatability and reproducibility were both high (standard deviations: <18.2 and <8.75%, respectively), independent of the editing rate (low, 10%; medium, 30%; high >50%).

Figure 2.
Plasma HDV RNA editing rates according to the HDV genotype. Two-hundred and nineteen samples from untreated infected patients were analyzed. The box plots, generated with ggplot2 application, show the median editing rate and the value distribution obtained with different genotypes. Significantly higher editing rates (expressed in percentage) were observed for HDV-5 and HDV-7 strains, compared to HDV-1 (African and non-African genotypes), HDV-6 and HDV-8 (* p < 0.001). In addition, HDV-6 and HDV-8 editing rates were significantly lower than HDV-1 ones (** p < 0.05). Comparison of editing rates was performed by means of Kruskal-Wallis, chi-square, and Wilcoxon's tests.

Predicted RNA Secondary Structures of the Editing Genome Region
The editing process involves several structural requirements that are described in the introduction. RNA secondary structures can differ according to the HDV genotype, as described earlier [9,21,38]. Indeed, loops of variable sizes are directly displayed at the vicinity of the editing site and downstream, disrupting base pairing that is essential for ADAR-1 binding. We characterized full-length viral genome sequences from 32 of the 219 samples, selected to represent different HDV genotypes and according to different editing rate levels. Then, we analyzed the more stable RNA secondary structures of their antigenomic sequence, with the main focus on the so-called "minimal substrate for editing" ( Table 2 and Figure 3). The structures were identical whatever the application used, RNAfold or UNAfold. The 32 different structures according to genotypes are presented based on RNAfold analysis in Figure 4.        Nine out of the 10 HDV-1 sequences analyzed showed similar secondary structure patterns, composed of three long base-pairing motifs and two bulges ( Figure 4A). By contrast, analysis of the seven HDV-5 strain sequences showed one short base-pairing motif followed by three long base-pairing motifs, separated by one large bubble and two small bulges ( Figure 4B). The five HDV-6 strains displayed three or four base-pairing motifs interspersed with two or three bulges ( Figure 4C). As shown in Figure 4D, four out of the five HDV-7 sequences showed a similar pattern, composed of a repetition of two very large base-pairing motifs separated by one bulge. However, the remaining one (dFr7024) displayed a branched structure composed of two stem-loops flanking the base-pairing motif, as has been previously described for HDV-3 [19,39], with a same free energy than the other HDV-7 strains. Finally, the five HDV-8 strains showed no particular structural characteristics ( Figure 4E). The presence of an A-C mismatch at the amber/stop codon editing site has been reported to be related to the editing efficiency of genotype HDV-1. This mismatch was found in HDV-1 and HDV-5 strains in our work and was replaced in HDV-7 strains by a large bubble at the same location. This A-C mismatch pair was absent in HDV-6 and in most HDV-8 strains, suggesting an alternative transient conformation for the editing site. The base located immediately 3′ downstream of the adenosine, known to influence editing efficiency, was always a guanine, whatever the HDV genotype. Of note, however, the number of dsRNA-binding motifs (DRBMs) did not seem to be associated with the editing capacity (Table 2). Finally, when considering the 130 nucleotides located upstream of the amber/W codon in these 32 full-length sequences, we did not find any genotype-specific pattern [22]. Nine out of the 10 HDV-1 sequences analyzed showed similar secondary structure patterns, composed of three long base-pairing motifs and two bulges ( Figure 4A). By contrast, analysis of the seven HDV-5 strain sequences showed one short base-pairing motif followed by three long base-pairing motifs, separated by one large bubble and two small bulges ( Figure 4B). The five HDV-6 strains displayed three or four base-pairing motifs interspersed with two or three bulges ( Figure 4C). As shown in Figure 4D, four out of the five HDV-7 sequences showed a similar pattern, composed of a repetition of two very large base-pairing motifs separated by one bulge. However, the remaining one (dFr7024) displayed a branched structure composed of two stem-loops flanking the base-pairing motif, as has been previously described for HDV-3 [19,39], with a same free energy than the other HDV-7 strains. Finally, the five HDV-8 strains showed no particular structural characteristics ( Figure 4E). The presence of an A-C mismatch at the amber/stop codon editing site has been reported to be related to the editing efficiency of genotype HDV-1. This mismatch was found in HDV-1 and HDV-5 strains in our work and was replaced in HDV-7 strains by a large bubble at the same location. This A-C mismatch pair was absent in HDV-6 and in most HDV-8 strains, suggesting an alternative transient conformation for the editing site. The base located immediately 3 downstream of the adenosine, known to influence editing efficiency, was always a guanine, whatever the HDV genotype. Of note, however, the number of dsRNA-binding motifs (DRBMs) did not seem to be associated with the editing capacity (Table 2). Finally, when considering the 130 nucleotides located upstream of the amber/W codon in these 32 full-length sequences, we did not find any genotype-specific pattern [22].

Discussion
RNA editing is responsible for the switch from viral RNA replication to virion packaging and secretion. Indeed, early during the replication cycle, the virus produces the s-HDAg protein, which is required for RNA synthesis as it ensures the recruitment of BAZ2B-associated chromatin remodeling factors (BRFs) on the viral RNA by histone mimicry, favors DNA-dependent RNA Pol 2 hijacking, and allows the Pol2 RNA elongation process to occur [23,40]. At later times in the life cycle, the virus produces the L-HDAg as a result of editing at the amber/W codon, which inhibits RNA synthesis and favors virion morphogenesis. HDV genome replication is highly dependent on the respective proportions of the two forms of HDAg. Indeed, it has been shown in vitro that replication was reduced by as much as 8-fold at a 10:1 s-HDAg/L-HDAg ratio [41,42]. Altogether, HDV RNA editing is a key step in the virus life cycle that plays an important role in the natural history and clinical outcomes of human infection.
In this study, we report the development and validation of a new NGS-based assay using the MiSeq ® Illumina device, together with a new in-house software, PyroEdit, which allowed us to measure the viral editing rate in plasma samples from treatment-naïve patients infected with different HDV genotypes. We also analyzed the predicted RNA structural features of the editing region of several strains from this cohort according to editing rate and HDV genotype.
Our results show that our assay was able to: (I) be robust in spite of the high variability of the HDV genome, characterized by a nucleotide dissimilarity rate ranging from 20% to up to 35% over the entire genome sequence between the different HDV genotypes; (II) individualize the sequence of interest of 251 nt in length with a Phred Quality Score > 20% and with 99% accuracy; (III) correctly align each consensus sequence; (IV) check which nucleotide was present at the amber/W codon position and characterize the codon; and (V) calculate the editing rate and provide a final report. With this quantitative method, the editing rate values in clinical samples ranged from 7% to 69% in patients infected with HDV genotypes HDV-1, -5, -6, -7 and -8. Previous in vitro (cell culture) and in vivo (plasma) studies using HDV-1 strains found editing rates around 40% [34,35,43]. Remarkably, in three samples from our study, we repeatedly observed very low editing rate values, lying between the LLOD (1%) and the LLOQ (5%). These results indicate that very low amounts of L-HDAg are sufficient for virion production. At the opposite, strains with high HDV-RNA editing capacity (for instance HDV-5 strains which have a 69% editing capacity), the presence of 31% of non-edited RNA encoding s-HDAg appears to be sufficient for the initiation and continuation of genome replication and to ensure the subsequent steps of the viral life cycle.
Another important result of our study is that the editing rate values differed significantly between different HDV genotypes. Indeed, the African HDV-5 and HDV-7 strains showed significant higher in vivo rates of editing than HDV-1, HDV-6 and HDV-8 (40% vs. 28%, p < 0.001). In addition, HDV-1 strains exhibited significantly higher editing capacities than HDV-6 and HDV-8 (p < 0.05). Interestingly, in the HDV-1 group, no difference in editing efficiency was seen between African and non-African strains. Because HDV-1 is the most frequent, ubiquitously distributed genotype worldwide, this suggests that the in vivo HDV-1 editing rate, around 30%, could be optimal for a most efficient spreading fitness, with an optimal early-to-late phase of HDAg production ratio.
Based on our predicted structural analyses, our results are supported by at least two RNA structural features described in Figure 4 and Table 2. First, the A-C mismatch pair at the amber/W site, which is key in the editing process of HDV-1 strains, is also present in HDV-5 and HDV-7 strains. Interestingly, this mismatch is accentuated in the HDV-7 amber/W codon, because it is located at the top of a large bulge, whereas the mismatch is disrupted in HDV-6 and some HDV-8 sequences which have the lowest editing rates. In addition, in the 25 nt-long-region downstream the editing codon, we found two or three strict base-pairing motifs, allowing for optimal binding of the host ADAR-1 editing enzyme. Surprisingly, the HDV-7 editing motif of one strain (dFr7024 strain) displayed a complex branched structure composed of two stem-loops flanking a base-pairing motif. The editing capacity of this strain appeared to be lower than that of the four other HDV-7 strains ( Figure 4D). Such a structure with two stem-loop arms, has been previously proposed to be a transient secondary structure of the HDV RNA used by HDV-3 strains to improve their editing efficiency by ADAR-1 [19][20][21]39]. Therefore, we hypothesize that this conformation observed in a single HDV-7 strain could be optimal for its editing capacity. Whether different genotypes can transiently use this alternative conformation to up-or down-regulate their RNA editing efficiency remains to be determined. Interestingly, HDV-6 and HDV-8 strains, which did not display the A-C mismatch pair at their amber/W site, could use such transient conformation to increase the binding efficiency of host ADAR-1 to its substrate.
L-HDAg synthesis and its accumulation in several cell compartments has been shown to interfere with multiple cellular signaling pathways and alter the pathogenesis of the infection [24][25][26][27][28][29][30]. Indeed, the overexpression of L-HDAg induces the transactivation of a variety of heterologous promoters and upstream regulatory elements, notably by activating serum response factor-associated transcription [24,25]. Park et al. showed that L-HDAg is able to increase TNF-alpha-induced NF-kappa B transcriptional activation involved in inflammation pathways [27]. In a previous work, we showed that L-HDAg induces NADPH oxidase-4 gene expression, resulting in an increase in cellular reactive oxygen species production, subsequently leading to oxidative stress and activation of pleiotropic transcription factors, including STAT-3 or NF-kappa B [28]. Choi et al. showed that L-HDAg modulates transforming growth factor-beta signaling cascades, arguing for a possible induction of liver fibrosis [26]. Interestingly, we recently confirmed in a large clinical study that patients infected with HDV-5 strains (n = 147) were more at risk of developing cirrhosis than those infected with other African genotypes, such as HDV-1 [8]. We hypothesize that the high HDV-5 editing rates observed in vivo in this study could contribute to this observation, encouraging additional experimentations comparing HDV-1and HDV-5-infected patients to confirm this hypothesis.
It has been reported that HDV proteins (s-and L-HDAg) are involved in immunemediated liver damage [44]. Data from in vitro experiments and in the chimpanzee model suggested a direct cytopathic effect of s-HDAg on hepatocytes, while L-HDAg epitopes render hepatocytes more susceptible to immune-mediated damages. Thus, cells expressing high levels of L-HDAg are likely to be preferentially lysed by the immune system [45,46]. As a result, monitoring the relative production of HDV proteins by quantifying the RNA editing rate could be relevant for patient management to monitor the natural history of the disease associated with chronic infection.
It has been demonstrated in cell culture experiments that L-HDAg is involved in the inhibition of HBV replication observed in HBV-HDV co-or super-infected individuals [47]. Activation of the human interferon alpha inducible MxA protein was one of the possible explanations for this phenomenon [28,29]. Interestingly, we found a significant relationship between high editing rates leading to L-HDAg synthesis in large amounts and low HBV DNA viral loads in vivo.
Another known consequence of HDV RNA editing is the existence of HDV particles containing edited genomes. Such particles are unable to be at the origin of full viral replication cycles. In the event of a co-infection with virions containing genomes capable of coding for both proteins, L-HDAg would likely inhibit the replication of otherwise competent genomes. With our assay, we found that up to 69% of viral particles contained such edited genomes. These edited particles could behave as defective interfering particles (DIPs), as described for HBV and several other viruses [48]. DIPs are thought to serve as decoys for the immune system, being targeted by immune effectors and/or lowering the expression of their own genes. As a result, they can persist and favor fibrogenesis, inflammation and carcinogenesis. Such a mechanism remains to be studied in chronic HDV infection.
The s-HDAg/L-HDAg ratio and its regulation govern the switch from genome replication to packaging, which seems to be unrelated to the HDV viral levels, as we found no clear relationship between viral loads and editing rates. New treatments against HDV will be available soon in addition to pegylated interferon alpha, including Myrcludex, Lonafarnib, Replicor and others in clinical development. Longitudinal studies assessing the kinetics of HDV editing rates on and after treatment could be particularly interesting to better understand their mechanisms of success and failure. The NGS-based method we have developed appears to be particularly well-suited to this endeavor. In addition, the assay will be very useful to finely characterize the editing process control during the natural history of the infection, both in in vitro and in in vivo studies.
In conclusion, our study provides a new specific tool dedicated to the in vitro or in vivo quantification of HDV editing rates and novel insights into the genotype dependency of this important step of the HDV life cycle. This tool will be useful in large-scale clinical studies in the future to assess the role of editing rates as a predictive marker of the natural history and treatment response of the infection.