Mapping the Human Herpesvirus 6B Transcriptome

RNA sequencing (RNA-seq) is an important tool for studying RNA transcripts, particularly during active viral infection. We made use of RNA-seq to study human herpesvirus 6B (HHV-6B) infection.

obtained using our RNA-seq protocol. As observed in Fig. 1, a coverage of 100Â was obtained for most of the genome, indicating efficient transcription throughout the genome. Some regions showed very high coverage, e.g., the DR6 region, the forward direction of the U41-U42 region, and the reverse orientation of the U77 gene. In contrast, genomic repeat regions, exemplified by telomeric repeats, R2 and R3, showed less than 1Â or no coverage, indicating that these regions are not efficiently transcribed.
Relative expression and transcript kinetic class over time. Using data from across the time course of infection, we generated a heat map showing the relative expression of each ORF described in the annotated HHV-6B genome (NCBI accession number AF157706). As shown in Fig. 2, we found that particular genes, such as U90 (IE1), were expressed in abundance throughout the course of infection. For other genes, such as U27, expression began at 9 h postinfection. Genes such as DR6 were only expressed from 48 h postinfection. To complete our analysis, we performed RNA-seq on RNA extracted from HHV-6B-infected Molt-3 cells in the presence of CHX, an inhibitor of protein synthesis (9C in Fig. 2). As shown in Fig. 2, the U90 (IE1) gene was highly expressed in the presence of CHX and can be classified as belonging to the immediate-early (IE) kinetic class. The DR3, B2, U2, U37, U38, U39, U40, U45, U59, U62, U64, B7, U86 (IE2), U94, and U95 genes were also expressed in the presence of CHX, categorizing these genes in the IE kinetic class. A similar experiment was performed in the presence of phosphonoacetic acid (PAA), an inhibitor of viral DNA polymerase (72P in Fig.  2). We characterized early (E) genes as those expressed in the presence of PAA but not in the presence of CHX. The U20, U22, U54, U55, U63, U74, and B6 genes were the most highly expressed genes in this kinetic class. Genes that were not expressed in the presence of CHX or PAA were categorized as belonging to the late (L) kinetic class. Examples include the DR1, DR6, U11, U18, and U100 genes.
The efficiency of PAA treatment was determined by quantifying viral DNA copies at 72 h postinfection in the absence or in the presence of PAA. As shown in Fig. 3A, a 95% reduction in viral DNA copy number per cell was observed in infected cells treated with PAA relative to infected and untreated cultures. Protein expression of one gene per kinetic class was evaluated in the presence of PAA. An immunofluorescence assay was used to detect the IE1 (U90), p41 (U27), and DR6 proteins. As shown in Fig. 3B, we detected the IE1 (U90) and p41 (U27) proteins in HHV-6B-Molt-3-infected cells 72 h postinfection in both the presence and absence of PAA. However, the DR6 protein was only detectable in the absence of PAA, confirming that DR6 is a late gene.
Sequence analysis and comparison with the published sequence. Next, we analyzed NGS data obtained from HHV-6B Z29 viral DNA and RNA-seq data obtained from HHV-6B-Molt-3 cells infected for 72 h. The sequence was first analyzed to identify nucleotide mismatches from the nucleotide sequence under NCBI accession number AF157706; 95 mismatches in the nucleotide sequence were identified (Table 1). Thirtythree mismatches (highlighted in orange) were previously found in the MF994829.1 FIG 2 Detailed heat map of HHV-6B transcripts. Relative abundance of each of the HHV-6B transcripts described in the reference genome annotation AF157706 is shown. Each column represents a given condition (6,9,12,24,48, 72 h postinfection; 9C, which represents 9 h postinfection in the presence of CHX; and 72P, which represents 72-h postinfection in the presence of PAA), and each row represents a described transcript. Black boxes exemplify the 3 classes of genes: immediate-early genes represented by U90, early genes represented by U27, and late genes represented by DR6. sequence and reported by Finkel et al., while 12 additional mismatches (highlighted in pink) were restricted to the MF994829.1 sequence (17) (Table 1). Of these 95 mismatches, 26 were located in noncoding regions of genes annotated in the reference sequence, leaving 69 mismatches located in the coding regions affecting 74 amino acids of coding sequence ( Table 1). Twenty of these 74 differences did not affect the amino acid coding sequence of the corresponding ORF (yellow), 16 differences were conservative mutations (green), 37 differences were nonconservative mutations, and one amino acid difference changed a tryptophan into a stop codon (blue) ( Table 1). Six insertions were found in noncoding sequences, while two were found in the B9 coding sequence ( Table 2). These insertions are present in the vast majority of HHV-6B sequences found in GenBank. They were previously identified in MF994829.1 (17), and five were referenced by Finkel et al. (15). One deletion was identified in the intergenic region between B6 and B7 ( Table 2). This deletion removed a T from the original sequence at bp 119973. Comparison of all HHV-6B sequences found in GenBank supports that this is a genuine deletion. A number of reads cover this region, and most of the sequences found in GenBank have T deleted at this location. This deletion was also found in the MF994829.1 sequence and mentioned in the study by Finkel et al. (15,17).
Our attention then turned to splice variants of the virus transcriptome. We used our RNA-seq data to generate a splice map in relation to the annotated genome (Fig. 4). A number of splicing events occurred around the U44 gene: long-range splicing events in the reverse orientation of the U77 gene, in the same orientation as the U95 gene, and the previously described multispliced transcripts of U86, U90, and U100 (Fig. 4). We first concentrated on the U7-U8 genes. U7 and U8 are described as two genes in the Z29 strain of HHV-6B (NCBI accession number AF157706) (Fig. 5A). Our RNA-seq data revealed that U8 is part of exon 1 of the U7 gene, since reads were found that spanned the exon-intron-exon region and consensus splicing acceptor and donor sites. The gene arrangement described here is similar to that of the HHV-6A U1102 U7 gene (NCBI accession number X83413) (Fig. 5A). To confirm these data, we designed PCR primers spanning the new intronic region and performed reverse transcriptase PCR (RT-PCR) on a new RNA sample from HHV-6B-Molt-3-infected cells (Fig. 5B). As shown in Fig. 5B, we observed a 350-bp PCR product corresponding to the unspliced version of the cDNA and a PCR product of 245 bp corresponding to the spliced version of the U7 gene. The two PCR products were isolated from the agarose gel, purified, and   Gravel et al. Journal of Virology sequenced to confirm our findings. The DNA sequence is presented in Fig. 5C, showing the intronic region in lowercase. This new arrangement for U7 and U8 ORFs was described by Finkel et al. (15). The U12-U13 region was the next region to be investigated. In the NCBI entry with accession number AF157706, the U12 gene is described as two exons separated by an intron (Fig. 6A). In the HHV-6A U1102 genome annotation, the U12 gene is also described as two exons separated by an intron, but the second exon is longer than that described for HHV-6B (Fig. 6A). HHV-6B Z29 is exceptional in this instance, as most other HHV-6B strains sequenced to date do not have this stop codon and resemble the HHV-6A version of U12 (18). The 72-h postinfection RNA-seq data revealed a limited

Human Herpesvirus 6B Transcriptome
Journal of Virology number of reads that covered the intron present between the two exons described for U12 (Fig. 6A). We confirmed these data using RT-PCR and sequencing, but the 104-bp PCR product was very faint, indicating very low abundance of this transcript ( Fig. 6B and C). RNA-seq data also indicated that the first exon of U12 is part of the U13 gene (Fig. 6E). Splicing between exons 2 and 3 of U13 ( Fig. 6D and E) and the splicing event in the noncoding region of U12-U13 ( Fig. 6F and G) were confirmed by RT-PCR and sequencing. Therefore, U12 and U13 share both the noncoding exon 1 and exon 2 but have distinct third exons. This observation was partially described by Finkel et al. (15). The RNA-seq data obtained for the region spanning U44 to U46 were of particular interest, since many reads and splicing events were observed. The HHV-6B Z29 annotation described the transcription of the U44 and U46 genes in the 59 to 39 direction, whereas the U45 gene was transcribed in the reverse direction (Fig. 7A). No intron was described for these three genes in either HHV-6B or HHV-6A (Fig. 7A). The presence of a long transcript spanning U44 and portions of U45 and U46 was revealed by RNA-seq data (Fig. 7A). The presence of this new long intron in the U44 gene was confirmed by RT-PCR and sequencing of PCR products ( Fig. 7B and C).
The U65-U67 region of HHV-6B was also examined. In the NCBI entry with accession number AF157706, the U67 gene is annotated as a single exon, with no intron (Fig.  8A). The analysis of our RNA-seq data revealed a spliced variant of this gene with a new noncoding exon in the 59 region of the described U67 (Fig. 8A). These results were confirmed by RT-PCR and sequencing of the PCR products ( Fig. 8B and C). This new spliced U67 gene was not described in the annotated HHV-6A U1102 genome.
Other spliced transcripts. The region spanning U69 to B7 was investigated next. Our RNA-seq data revealed an intron that spanned more than 13,000 bp (Fig. 9A). This intron is part of a transcript that originated between the B6 and B7 genes and ended in the reverse orientation of the U69 gene. The presence of this intron was confirmed by RT-PCR and sequencing of the PCR products ( Fig. 9B and C). Short exons flanked this intron, giving an ORF of 87 bp. While this transcript is of limited abundance, it was  genes of HHV-6B. The AF157706-Z29 drawing represents the description found in annotated genome AF157706 for HHV-6B Z29. (B) PCR primers were designed to surround the splice region between the noncoding and coding exons (black arrows in panel A). PCR was performed as described in Materials and Methods. PCR products were analyzed on a 2% agarose gel. (C) PCR products were extracted from the gel, purified, and sequenced. Primer sequences are underlined, and the intron sequence is presented in lowercase. (D) PCR primers were designed to surround the splice region between the coding exons of the U13 gene (black arrows). PCR was performed as described in Materials and Methods. PCR products were analyzed on a 2% agarose gel. (E) PCR products were extracted from the gel, purified, and sequenced. Primer sequences are underlined, and the intron sequence is presented in lowercase. The intron sequence was cut (. . .) to fit the figure. (F) PCR primers were designed to surround the splice region between U12-U13 exon 1 and exon 2 (black arrows). PCR was performed as described in Materials and Methods. PCR products were analyzed on a 2% agarose gel. (G) PCR products were extracted from the gel, purified, and sequenced. Primer sequences are underlined, and the intron sequence is presented in lowercase. The intron sequence was cut (. . .) to fit the figure. present in the 48-and 72-h postinfection samples. It is currently unknown whether this spliced transcript translates into a functional protein or whether it is one of the noncoding RNAs (sncRNAs or lncRNAs) found in many herpesvirus genomes. In 2020, Finkel et al. reported the presence of at least 3 lncRNA in the HHV-6B genome (15). We confirmed the presence of lncRNA3 in our data and that this transcript was expressed as early as 9 h postinfection. This transcript was continuously expressed from 9 h up to 72 h postinfection and was expressed in the presence of PAA, suggesting that it was an early transcript whose expression did not require viral DNA replication.

DISCUSSION
The omics revolution of recent years has brought tremendous possibilities to the study of large genomes of any origin at different levels. Here, we determined the RNA profile of an HHV-6B Z29 active infection at 6, 9, 12, 24, 48, and 72 h postinfection and at 9 h and 72 h postinfection in the presence of CHX and PAA, respectively.
Using data from the 72-h postinfection RNA-seq, we constructed a splice map for the HHV-6B genome (Fig. 4). A number of splicing events were observed in the 68,000- PCR was performed as described in Materials and Methods, and PCR products were analyzed on a 2% agarose gel. (C) PCR products were extracted from the gel, purified, and sequenced. Primer sequences are underlined, and the intron sequence is presented in lowercase. The intron sequence was cut (. . .) to fit the figure. to 78,000-bp region of the genome, and all spliced events were oriented in the forward direction (Fig. 4). This observation coincided with the very high coverage obtained in the forward direction of the 69,000-bp region of the genome (Fig. 1) and could be explained by the fact that this location is the origin of replication (OriLyt) of HHV-6B. It was previously demonstrated that sequences deriving from around the OriLyt of rat and mouse cytomegalovirus, and herpesvirus simplex 1 (HSV-1) and HSV-2, can act as RNA primers for replication of the viral genome (19)(20)(21). Published studies report that miRNAs are in the OriLyt region. In 2011, Tuddenham et al. identified miRNAs and other small noncoding RNAs from HHV-6B in this region (22). In 2014, a group from Israel concluded that the finding of processed pri-miRNA in supraspliceosomes brought further support to the cross talk between the splicing and miRNA (19)(20)(21) processing machinery (23). Pri-miRNAs are the precursors of pre-miRNAs, which are themselves the precursors of miRNAs. Pri-miRNAs are hairpin structures of about 70 nucleotides, with a 59-cap and a 39-poly(A) tail at the respective extremities. Therefore, the large number of sequencing reads and the numerous splicing events that we observed in the vicinity of the OriLyt region likely represent pri-miRNAs, which can be processed to produce many miRNAs. A long noncoding RNA (lncRNA1) found in this region was previously identified as the most highly expressed RNA in HHV-6B (15).
From the RNA-seq analysis at each of the infection time points, we obtained a heat map of the relative expression of each ORF described in the annotated HHV-6B genome (accession number AF157706) (Fig. 2). From the 9-h time point in the presence of CHX and the 72-h time point in the presence of PAA, we could match the majority of ORFs described in the annotated genome to their previously defined kinetic class, immediate-early (IE), early (E), and late (L) genes ( Fig. 2 and Table 3). We then compared the kinetic classes obtained with those from two previous studies conducted on HHV-6B (Table 3) (24,25). Of the 97 ORFs described in the annotated genome, 50 ORFs were newly classified or classified as previously mentioned by other studies (Table 3). PCR was performed as described in Materials and Methods, and PCR products were analyzed on a 2% agarose gel. (C) PCR products were extracted from the gel, purified, and sequenced. Primer sequences are underlined, and the intron sequence is presented in lowercase.
Of the 47 remaining ORFs, 22 were classified as E genes, although these ORFs were previously classified as L genes in other studies (Table 3, green ORF). In general, the expression of E genes is not dependent on viral DNA replication (26). However, the accumulation of some E genes is enhanced by viral DNA replication; these genes are called early late genes. On the other hand, the expression of leaky late genes is delayed compared with that of E genes, and only true late genes can be identified by their "nonexpression" in the presence of PAA. The high sensitivity of RNA-seq used to analyze our data makes it difficult to distinguish early late genes from leaky late genes. Thus, our preference was to classify ORF transcripts that were detected 72 h postinfection in the presence of PAA as E genes. Nine ORFs from the remaining 25 were assigned to the L kinetic class, while other studies have classified these same genes as E genes (Table 3, orange ORF). Our results suggest these genes should be classified as L genes, as no transcription was observed for these genes at 72 h postinfection in the presence of PAA. Moreover, it was demonstrated that the hCMV homologs of HHV-6B U71 and U82 belong to the L gene kinetic class (Table 3) (27,28). Four ORFs of the remaining 16 correspond to the B1, B5, B6, and B9 genes found only in the HHV-6B genome (Table 3, dark blue ORF). Conflicting reports exist regarding the kinetic class of these B genes. We have classified 3 genes as L genes, as they were not detected in the presence of PAA at 72 h postinfection (B1, B5, and B9). The fourth gene, B6, was classified as an E gene, since this transcript was found in the presence of PAA but was not detected in the presence of CHX. Previous studies classified B6 as an IE or a biphasic (IE and L) gene; however, these studies were conducted in different cell lines or with different HHV-6B strains (24,25). Four genes out of the remaining 12 were assigned to the IE kinetic class, since they were expressed at high levels in the presence of CHX at 6 h postinfection (Table 3, gray ORF). These genes were previously assigned to the E or L kinetic class (24,25). Two of the 8 remaining genes were previously classified as IE genes in other studies (Table 3, pink ORF). However, our study did not reveal any reads covering these genes in the presence of CHX (Fig. 2). Therefore, these genes were classified as members of the E kinetic class of genes. The remaining 6 genes were assigned  to the IE kinetic class because some reads were detected in the presence of CHX and PAA (Fig. 2) (Table 3, blue ORF). These observations suggest that these genes can be classified as both IE and E genes. However, it is also possible that RNA from these genes is packaged within the virion and does not represent nascent transcription (Table 3, blue ORF). This is of particular relevance to the U2 gene, which showed a marked abundance of transcripts at 6 h postinfection (Fig. 2). A comparison of gene kinetic classes between hCMV and HHV-6B gene homologs is also presented in Table 3. We observed that most HHV-6B gene kinetic classes determined in this study are the same as those of their hCMV homologs.
Next-generation sequencing combined with RNA-seq data bring us to deposit an updated version of the HHV-6B Z29 annotated genome. Our HHV-6B Z29 sequence was deposited under GenBank accession number MW536483. In this GenBank record and as shown in Table 4, we modified the annotated genome of HHV-6B to include mutations and new splicing regions described in this study. We also confirmed and included new spliced regions found in this study that were also found in previous studies: U19, U79, U83, U91 (18,29), and U7-U8 and U12-U13, described in Finkel et al. (15). We identified a mutation that introduces a stop codon in U21 (Table 1). The U21 protein is truncated by 125 amino acids compared to the original U21 protein described in AF157706. This mutation was observed in our NGS data on HHV-6B DNA and in all RNA-seq data from the different time points mentioned in this study. However, it is not found in any other HHV-6B sequences in the database. As mentioned in a recent study on HSV-1 (30), we purposely did not rename any ORF in order to avoid causing confusion with previous work. Considerable work has been published on HHV-6B over the last 30 years, and this work forms the basis of our current knowledge. Thus, changing the name of an annotated ORF would cause unnecessary confusion in the field. Finkel et al. have chosen to rename the HHV-6B ORF and include their newly found internal ORF (iORF) and upstream ORF (uORF) in the annotation (15). Inclusion of the iORF and uORF in our modified annotated version of the AF157706 genome would be more suitable for comparison with historical nomenclature. Figure 10 summarizes the genomic organization of the HHV-6B Z29 genome.
Our work is not without limitations. Although we are confident that most of the viral transcripts were detected, some regions had limited, if any, coverage in our data set. These regions could be transcriptionally silent regions or express very low levels of transcripts below the limit of detection. Second, differences between our data and those of others could result from the use of different cell lines for infection. HHV-6B transcriptional regulation and patterns may differ slightly depending on the cell lines for infection. Indeed, this observation was made by Greninger et al. for the U79 spliced transcripts (18). Finally, differences in the HHV-6B viral strains used, or the number of times the virus was passaged (and in which cell type), may also have had an impact on the overall results.
In conclusion, our work provides an up-to-date analysis of the HHV-6B Z29 transcriptome that complements the work of Finkel et al. (15). The data generated should prove useful to generate a better understanding of the complex and highly regulated life cycle of herpesviruses.
Infection. Seventy million Molt-3 cells were infected with HHV-6B at a multiplicity of infection (MOI) of 0.5 for 4 h in 4 ml of supplemented RPMI 1640. Cells were then washed twice with 1Â phosphate-buffered saline (PBS). DNA was extracted from 1,000,000 cells, and the remaining cells were resuspended at 500,000 cells/ml in supplemented RPMI 1640. Ten million cells were harvested at 6,9,12,24,48, and 72 h postinfection, and the cell pellets were stored at 280°C until extraction. DNA was extracted from 1,000,000 cells at 72 h postinfection. Ten million Molt-3 cells were incubated in the presence of 10 mg/ml cycloheximide (CHX) (MilliporeSigma Canada, Oakville, ON, Canada) for 30 min before HHV-6B infection at an MOI of 0.5. Cells were maintained in 600 ml for 4 h, washed twice with 1Â PBS, and resuspended at