The first genome sequence of a metatherian herpesvirus: Macropodid herpesvirus 1

While many placental herpesvirus genomes have been fully sequenced, the complete genome of a marsupial herpesvirus has not been described. Here we present the first genome sequence of a metatherian herpesvirus, Macropodid herpesvirus 1 (MaHV-1). The MaHV-1 viral genome was sequenced using an Illumina MiSeq sequencer, de novo assembly was performed and the genome was annotated. The MaHV-1 genome was 140 kbp in length and clustered phylogenetically with the primate simplexviruses, sharing 67 % nucleotide sequence identity with Human herpesviruses 1 and 2. The MaHV-1 genome contained 66 predicted open reading frames (ORFs) homologous to those in other herpesvirus genomes, but lacked homologues of UL3, UL4, UL56 and glycoprotein J. This is the first alphaherpesvirus genome that has been found to lack the UL3 and UL4 homologues. We identified six novel ORFs and confirmed their transcription by RT-PCR. This is the first genome sequence of a herpesvirus that infects metatherians, a taxonomically unique mammalian clade. Members of the Simplexvirus genus are remarkably conserved, so the absence of ORFs otherwise retained in eutherian and avian alphaherpesviruses contributes to our understanding of the Alphaherpesvirinae. Further study of metatherian herpesvirus genetics and pathogenesis provides a unique approach to understanding herpesvirus-mammalian interactions.


Background
Members of the Herpesviridae have a linear double-stranded DNA genome, 120-245 kbp in length, and cause significant morbidity and mortality in diverse groups of animals. Members are further classified into three subfamilies: the Alpha-, Beta- and Gammaherpesvirinae. Reports of herpesvirus infections in the Marsupialia date back to the 1970s. The first isolation of a marsupial herpesvirus was from a fatal outbreak of severe respiratory disease and systemic organ failure in a zoological collection of Parma wallabies (Macropus parma) [1]. The isolation of this alphaherpesvirus, designated Macropodid herpesvirus 1 (MaHV-1), was closely followed by the isolation of a second, related herpesvirus (Macropodid herpesvirus 2, MaHV-2) from fatal cases of disease in several vulnerable macropod species [2]. The macropodid viruses were detected in animals displaying clinical signs similar to those caused by infection with Human herpesviruses 1 and 2 (HHV-1 and −2), such as conjunctivitis and vesicular anogenital lesions, as well as hepatic disease [1,3].
Despite its classification as a Simplexvirus, early genome hybridization studies of MaHV-1 identified a type D genome structure (as defined by [14]) of approximately 135 kbp in length, containing a short unique (US) region flanked by large inverted repeat sequences (internal repeat and terminal repeat; IRS/TRS), joined to a long unique (UL) region [15,16]. MaHV-1 occurs as only two equimolar genomic isomers [15]. These genomic features are characteristic of varicelloviruses such as varicella zoster virus (VZV) and pseudorabies virus (PRV), and contrast with those of MaHV-2, which has a type E genome arrangement more typical of the simplexviruses and occurs as four equimolar genomic isomers [17]. To date, MaHV-1 is the only alphaherpesvirus that encodes both ICP34.5 (RL1) and the host-derived oncogene thymidylate synthase [18]. Sequence analysis of two conserved ORFs in MaHV-1, −2 and −4, as well as analyses of their antigenic relationships, has clustered these macropodid viruses closely with the primate simplexviruses [3,6,19]. As metatherian and eutherian mammals are believed to have diverged over 85 million years ago [20], this viral phylogenetic grouping departed from the typical virus-host co-evolutionary pattern observed within the Herpesviridae [3,19,21,22] and instead suggested a recent and complex speciation event.
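The difference between two and four isomers follows from which unique regions can invert relative to the rest of the genome, since a region flanked by inverted repeats can flip by recombination between them. The sketch below is a simplified model for illustration only, not an analysis from this study: it assumes that only US inverts in a type D genome while both UL and US invert in a type E genome, and the segment labels (TRL, IRL and so on) are illustrative.

```python
from itertools import product

def genome_isomers(segments, invertible):
    """Enumerate genome isomers under a simplified model.

    segments: ordered region names along the genome.
    invertible: region names that can flip orientation; each one doubles the count.
    Returns one tuple of (region, orientation) pairs per isomer.
    """
    flips = [seg for seg in segments if seg in invertible]
    isomers = []
    for orientation in product("+-", repeat=len(flips)):
        strand = dict(zip(flips, orientation))
        isomers.append(tuple((seg, strand.get(seg, "+")) for seg in segments))
    return isomers

# Type D (e.g. MaHV-1, PRV): only the US region inverts.
type_d = genome_isomers(["UL", "IRS", "US", "TRS"], {"US"})
# Type E (e.g. HHV-1, MaHV-2): both UL and US invert.
type_e = genome_isomers(["TRL", "UL", "IRL", "IRS", "US", "TRS"], {"UL", "US"})

print(len(type_d), len(type_e))  # 2 4
```

Each independently invertible region doubles the number of isomers, which is why adding UL inversion takes the count from two to four.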
This study aimed to sequence and analyse the full genome of the metatherian alphaherpesvirus, MaHV-1, with particular attention to novel genomic features.

Results and discussion
Whole genome sequence analysis
MaHV-1 is the first metatherian herpesvirus to have its genome sequenced. Excluding the genomic termini, which remained unresolved, the final genome length of MaHV-1 was approximately 140.1 kbp (Fig. 1) [GenBank:KT594769], larger than previously predicted. This difference appears to be due to a larger-than-predicted inverted repeat region [15]. The genome comprised a 98.8 kbp UL region and a 15.3 kbp US region flanked by 13 kbp inverted repeat sequences (IRS/TRS). The MaHV-1 genome had an overall G + C content of 52.9 %, with a higher G + C content (up to 61.7 %) within the IRS/TRS regions. The final genome assembly had a mean depth of 2,168 reads per bp (2.05 million mapped reads), and approximately 95 % of reads had a quality score of at least Phred 20. Three origins of replication were identified: a single origin of lytic replication (oriLyt), located between UL29 and UL30 in the UL region, and two copies of oriS, one in each of the IRS and TRS regions, as in the genomes of HHV-1 and −2.
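The region sizes above can be checked against the total genome length, remembering that the 13 kbp inverted repeat is counted twice (IRS and TRS). The sketch below does that arithmetic and also defines a simple windowed G + C calculation of the kind that would reveal G + C-rich repeat regions; the function names and the window and step sizes are illustrative assumptions, not the parameters used in this study.

```python
def gc_content(seq: str) -> float:
    """Percentage G + C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0

def windowed_gc(seq: str, window: int = 1000, step: int = 500):
    """Return (start, G + C %) for each full-length window along the sequence."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

# Region sizes reported above, in kbp; the 13 kbp repeat occurs as both IRS and TRS.
layout_kbp = {"UL": 98.8, "IRS": 13.0, "US": 15.3, "TRS": 13.0}
total_kbp = sum(layout_kbp.values())
print(f"{total_kbp:.1f} kbp")  # 140.1 kbp
```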

Conserved alphaherpesvirus ORFs
The UL region of the MaHV-1 genome encoded 54 ORFs common to other herpesviruses (Table 1). The predicted protein sequences of these ORFs shared between 41 % and 73 % aa pairwise identity (up to 86 % aa similarity) with their HHV-1 and −2 homologues. In the US region, the MaHV-1 genome encoded seven ORFs common to other simplexviruses (US1 to US4 and US6 to US8), whose predicted protein sequences shared between 32 % and 59 % aa pairwise identity (up to 73 % aa similarity) with their HHV-1 and −2 homologues. The IRS/TRS regions encoded five ORFs, including those for thymidylate synthase, ICP0 and ICP34.5. No homologues of UL3, UL4, UL56 or US5 (glycoprotein J, gJ) were identified in the MaHV-1 genome. In addition, the US4 (glycoprotein G, gG) homologue was predicted to be nonfunctional, as the ORF was prematurely truncated (120 aa residues, compared with 583 aa in MaHV-4). This is consistent with previously published sequence data reporting a truncation of the MaHV-1 gG ORF [6,23]. Phylogenetic analyses using translated protein sequences of three core herpesvirus genes (UL27, UL30 and US6) are shown in Fig. 2. These analyses show that MaHV-1 clusters most closely with the other macropodid herpesviruses (MaHV-2 and MaHV-4), as well as with the simplexviruses that infect primates. It also groups with the herpesvirus of an Indonesian pteropodid bat. Comparison of other viral core genes yielded similar clustering patterns. The MaHV-1 UL27 and UL30 ORFs shared 71 % and 67 % pairwise aa identity (83 % and 78 % aa similarity), respectively, with those of the recently sequenced fruit bat herpesvirus 1 (FbHV-1) [GenBank:BAP00706 and GenBank:YP_009042092; UL27 and UL30, respectively]. This similarity is comparable to that seen between MaHV-1 and HHV-1/HHV-2 (Table 1 and Fig. 2) and may offer some insight into their evolutionary relationship; for example, it may suggest transmission of herpesviruses from primates to bats, and then to marsupials.
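Pairwise identity and similarity, as quoted above, are both computed from an alignment but count different things: identity counts identical residues, whereas similarity also counts conservative substitutions. The sketch below assumes a crude, hypothetical grouping of physicochemically similar amino acids and divides by all aligned columns except those that are gaps in both sequences. Alignment tools typically define similarity through a substitution matrix such as BLOSUM62 and may use different denominators, so this sketch will not exactly reproduce the values in Table 1.

```python
# Hypothetical grouping of physicochemically similar residues (an assumption;
# many tools instead count a pair as similar if its BLOSUM62 score is positive).
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG")

def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same assumed group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)

def identity_and_similarity(aln1: str, aln2: str) -> tuple:
    """Percent identity and similarity for two aligned protein sequences."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both sequences carry no information and are dropped.
    columns = [(a, b) for a, b in zip(aln1, aln2) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    residues = [(a, b) for a, b in columns if a != "-" and b != "-"]
    identical = sum(a == b for a, b in residues)
    similar = sum(is_similar(a, b) for a, b in residues)
    return 100 * identical / len(columns), 100 * similar / len(columns)

print(identity_and_similarity("MKV-L", "MRVAL"))  # (60.0, 80.0)
```

Similarity is always at least as high as identity, which is why each identity range above is paired with a higher similarity ceiling.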
Sequencing of herpesviruses from other metatherians, as well as other Australasian mammals, will be needed to determine the significance of this clustering.
Although UL3 and UL4 are conserved in all other alphaherpesviruses examined to date, gene deletion studies in the human simplexviruses have found that deletion of UL3 and UL4 does not affect viral replication or cell-to-cell spread in vitro [24]. The in vivo functions of the UL3- and UL4-encoded accessory proteins are not well understood, but these proteins colocalise and directly interact with the transcriptional repressor ICP22, encoded by US1, in small dense nuclear bodies, and may also be involved in the late phase of viral replication [25][26][27]. The absence of gJ is also notable. MaHV-1 is the third simplexvirus found to lack an ORF encoding a gJ homologue, which is otherwise conserved in the genus Simplexvirus; the other two are leporid herpesvirus 4 and FbHV-1 [28,29]. In other herpesviruses, gJ inhibits host cell apoptosis by inducing an increase in the concentration of reactive oxygen species in the host cell [30]. It is unclear whether the absence of UL3, UL4 and gJ reflects adaptation to a new host (marsupials) or results from virus passage in vitro. With respect to gJ, the former scenario may be more likely, as its absence from other non-primate simplexviruses shows that it is not strictly conserved within the genus. Sequencing of other marsupial alphaherpesviruses, particularly field isolates, would help to resolve this question.

Unique or hypothetical ORFs
Seven unique hypothetical ORFs were identified: one in the UL region, two in the IRS/TRS regions and four in the US region. Viral transcript analyses by qRT-PCR confirmed that six of the seven predicted ORFs were transcribed at both 4 and 12 h post infection (hpi) in vitro (Additional file 1: Figure S1). No transcripts for these six ORFs were detected in the uninfected cell controls at any time point. The seventh predicted ORF, located in the large inverted repeat region between ICP0 and ICP34.5, was excluded from further analyses because qRT-PCR targeting this ORF did not confirm transcription. The six ORFs for which transcription was confirmed were annotated PW1 to PW6. PW1 was encoded in the TRS/IRS repeat region (and was thus present in two copies), and no significant structural or sequence domains or motifs were identified within it. Four novel ORFs, PW2 to PW5, were encoded as a cluster in the US region downstream of US8 (Fig. 1).
The MaHV-1 genome lacked an identifiable UL56 homologue. Studies in HHV-2 have shown that UL56 encodes a tegument protein involved in relocalising the ubiquitin ligase Nedd4 in HHV-2-infected cells, with a role in intracellular virion transport and/or virion release from the cell surface [31,32]. The UL56 polypeptide interacts and forms complexes with the UL11 polypeptide, and the two colocalise in the Golgi apparatus and in aggresome-like structures [33]. In HHV-2, UL56 is dispensable for virus growth in vitro, but its deletion reduces the production of cell-free infectious virus [31]. In vivo, UL56 is important for the pathogenicity of HHV-1, with deletion mutants showing reduced neuroinvasiveness [34]. The hydrophobic C-terminal region of UL56 is particularly important for pathogenicity [35].
A similar hydrophobic region was identified in the C-terminal region of PW6. However, at this stage any structural or functional similarities between PW6 and UL56 remain unclear, particularly as preliminary analyses of predicted tertiary structures did not identify significant structural similarities.
In the absence of conserved motifs or domains, the sequences of PW1 to PW4 give no indication of the potential functions of these novel polypeptides. The identification of a rhoptry antigen domain in PW5 may suggest an association with organelles, but little else can be inferred. High relative transcript levels of PW2 to PW5 at 4 hpi may indicate that they are transcribed early in the replication cycle, but further studies are needed to define the expression kinetics of these ORFs. It is not clear whether these genes are important for in vitro replication or in vivo pathogenicity; gene deletion studies or functional studies of their products would be necessary to elucidate their function. The clustering of the novel ORFs PW1 to PW5 in the US and IRS/TRS regions suggests that they may have been acquired in a single event, possibly from an unknown host or another virus during virus speciation. Sequence comparisons with other marsupial herpesviruses would help determine whether the novel ORFs are unique to MaHV-1 or are common to herpesviruses infecting metatherians.

Conclusions
This is the first genome sequence of a herpesvirus that infects metatherians, a taxonomically unique mammalian clade. Members of the Simplexvirus genus are remarkably conserved, so the absence of ORFs otherwise conserved in eutherian and avian alphaherpesviruses contributes to our understanding of the Alphaherpesvirinae more generally. Together with the sequence similarities observed to the human herpesviruses, these findings indicate that further study of metatherian herpesvirus genetics and pathogenesis will provide a unique approach to understanding herpesvirus-mammalian interactions.

Viral genome sequencing and analysis
The MaHV-1 isolate selected for sequencing (MaHV1.3076/08) was originally isolated from a Parma wallaby with clinical signs of disease [1]. The viral nucleocapsid genomic DNA was purified and sequenced as previously described [6,36]. Briefly, 50 ng of viral genomic DNA was used to prepare libraries with Illumina Nextera DNA library preparation kits according to the manufacturer's instructions. The libraries were pooled in equimolar concentrations and loaded onto an Illumina MiSeq. Sequencing was carried out using a 300 cycle V2 SBS kit (Illumina, Inc.) in paired-end 150 bp format. Over 350 Mbp of sequence data were obtained from 2.69 million paired reads with a mean length of 137 bp (standard deviation 26.3 bp) and were submitted to the Short Read Archive [SRA:SRP067309]. Reads were trimmed to an error probability limit of 0.5 %, and de novo assembly was performed in the bioinformatics package Geneious version 6.1.7 [37] (Biomatters Ltd) using the medium-low sensitivity default settings. This yielded four large contigs (52.6, 37.3, 17.0 and 14.9 kbp) whose consensus sequences corresponded to herpesvirus sequence according to Blastx and Blastn searches of GenBank databases [38,39]. These consensus sequences were used as references in further assemblies, in which reads were iteratively mapped until there was no further contig extension. Previously published MaHV-1 genome sequence data [GenBank:AY048539, GenBank:AF188480] were used to aid scaffold construction. These assemblies used the medium and high sensitivity default settings in Geneious version 6.1.7, with a minimum overlap identity of 90-95 %.
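Phred scores and error probabilities are interconvertible through Q = -10 log10(P), so Phred 20 corresponds to a 1 % chance of an incorrect base call and the 0.5 % trimming limit corresponds to roughly Q23. Assuming the Geneious error-probability trimming follows the modified Mott algorithm (as used by Phred), each base scores the limit minus its error probability and the read is cut to its highest-scoring contiguous segment. The sketch below implements that idea; the function names are hypothetical, and it is not the Geneious implementation.

```python
import math

def phred_to_error(q: float) -> float:
    """Base-call error probability for a Phred quality score."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred quality score for a base-call error probability."""
    return -10 * math.log10(p)

def mott_trim(quals, limit=0.005):
    """Return (start, end), half-open, of the best segment to keep.

    Each base contributes (limit - error probability); the kept segment is the
    contiguous run with the maximum summed score (a Kadane-style scan).
    Returns (0, 0) if no base is better than the limit.
    """
    best_start = best_end = 0
    best_score = 0.0
    run_start, run_score = 0, 0.0
    for i, q in enumerate(quals):
        run_score += limit - phred_to_error(q)
        if run_score <= 0:
            # A non-positive running score can only hurt; restart after this base.
            run_start, run_score = i + 1, 0.0
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end

print(round(error_to_phred(0.005), 1))              # 23.0
print(mott_trim([5, 5, 30, 35, 35, 30, 10, 5]))     # (2, 6)
```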
Prediction of open reading frames (ORFs) using Glimmer3 was restricted to ORFs longer than 240 bp, and ORF annotations were determined by Blastx and Blastn searches against the NCBI non-redundant protein and nucleotide databases, respectively [38,39]. ORF nomenclature followed that of HHV-1 and −2, whilst the novel ORFs were prefixed with PW (Parma wallaby). The unique MaHV-1 ORF sequences were translated into hypothetical polypeptides and compared with sequence motifs in the Pfam database to predict their putative functions. Further structural prediction analyses were performed using I-TASSER [40]. Threshold values of >1 for the normalised Z-score, <3.0 Å for the RMSD and >0.7 for the TM-score were considered significant and used to identify structural homologues.
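Glimmer3 scores candidate genes with interpolated Markov models trained on the genome itself; the 240 bp length filter can be illustrated with a much simpler six-frame scan. The sketch below is not Glimmer3: it reports every ATG-to-stop stretch of at least 240 bp (stop codon included) on both strands, and it does nothing to distinguish real genes from chance ORFs. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_len: int = 240):
    """Return (strand, start, end) for ATG..stop ORFs of at least min_len bp.

    Coordinates are 0-based and half-open on the scanned strand; for "-" hits
    they refer to the reverse-complement sequence. Internal ATGs are ignored,
    so each stop codon yields at most one (longest) ORF per frame.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        hits.append((strand, start, i + 3))
                    start = None
    return hits

toy = "ATG" + "GCT" * 80 + "TAA"  # 246 bp synthetic ORF
print(find_orfs(toy))             # [('+', 0, 246)]
```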
Phylogenetic analyses of the translated protein sequences of the core herpesvirus genes UL27, UL30 and US6 were performed using the neighbour-joining method in Geneious version 6.1.7 with the Jukes-Cantor model of amino acid substitution [41]. Ten thousand bootstrap replicates were used to assess support for the phylogenetic tree topology.
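For protein data, the Jukes-Cantor correction converts the observed proportion of differing sites p into an estimated distance d = -(19/20) ln(1 - 20p/19), assuming that all 20 amino acids are equally frequent and interchange at equal rates. Neighbour-joining is then run on the matrix of these distances, and bootstrapping repeats the whole procedure on pseudo-alignments whose columns are resampled with replacement. The sketch below covers only the distance and resampling steps, leaving tree building to a phylogenetics package; the function names and the pairwise-deletion gap handling are assumptions.

```python
import math
import random

def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc_protein_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance for 20 equally likely amino acid states."""
    arg = 1 - (20 / 19) * p_distance(a, b)
    if arg <= 0:
        return math.inf  # too divergent: the correction is undefined
    return -(19 / 20) * math.log(arg)

def distance_matrix(alignment: dict) -> dict:
    """All pairwise Jukes-Cantor distances, keyed by (name_i, name_j)."""
    names = list(alignment)
    return {(i, j): jc_protein_distance(alignment[i], alignment[j])
            for i in names for j in names}

def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement to build one pseudo-alignment."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

toy = {"A": "MKVLLA", "B": "MKVLIA", "C": "MRVIIS"}
replicate = bootstrap_replicate(toy, random.Random(0))
```

Bootstrap support for a clade is then the fraction of replicate trees, here 10,000, that recover it.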

Confirmation of transcription of novel ORFs
To determine whether the novel ORFs were transcribed in vitro, RNA from infected cells was analysed using quantitative reverse transcription PCR (qRT-PCR). One-step growth analyses using wallaby fibroblast JU56 cells [42] in 6-well trays were performed as previously described [6], with modifications. Briefly, JU56 cells were infected with virus at a multiplicity of infection of 3 (3 median tissue culture infective doses (TCID50) per cell), and the contents of the wells were collected at 4 and 12 hpi. RNA was extracted using the RNeasy RNA Extraction kit (Qiagen), and 2 μg of purified nucleic acid was treated with DNase using the TurboDNase kit (Life Technologies). Complementary DNA was prepared using SuperScript III reverse transcriptase (Life Technologies). Transcription was detected by qPCRs containing 500 nM of each primer (Additional file 2: Table S1), 50 μM dNTPs, 2 μM MgCl2, 8 μM Syto9 green fluorescent stain (Life Technologies) and GoTaq DNA polymerase (Promega). Reactions were run for 40 cycles of 95 °C for 30 s, 60 °C for 30 s and 72 °C for 60 s. Relative transcription levels of each ORF were calculated by comparing the cycle threshold (Ct) values for each ORF with that of the host housekeeping gene GAPDH and with those obtained for uninfected cell controls, to determine the normalised expression value as previously described [43,44]. Further amino acid sequence analyses, as described above, were performed only on the polypeptides encoded by ORFs confirmed to be transcribed in vitro.
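The normalisation above compares each ORF with GAPDH and with uninfected controls. The exact method of references [43,44] is not restated here, so the sketch below assumes the widely used 2^-ΔΔCt method, which presumes near-100 % amplification efficiency for both target and reference. Because viral ORFs are not detected in uninfected cells, the example sets the control Ct for the target to the 40-cycle limit; that convention, the function name and the Ct values are all hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target versus control by the 2^-ΔΔCt method.

    ΔCt = Ct(target) - Ct(reference gene, e.g. GAPDH), computed for the sample
    and for the control; ΔΔCt = ΔCt(sample) - ΔCt(control).
    Assumes roughly 100 % amplification efficiency for both genes.
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(delta_sample - delta_control)

# Hypothetical Ct values: an ORF at Ct 20 in infected cells, undetected (capped
# at the 40-cycle limit) in uninfected cells, GAPDH at Ct 18 in both.
print(relative_expression(20, 18, 40, 18))  # 1048576
```

Capping undetected targets at the cycle limit yields a lower bound on the fold change rather than an exact value, which is one reason such numbers are best read as relative rather than absolute.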

Availability of supporting data
The MaHV-1 genome sequence data have been submitted to GenBank under accession number KT594769. The Illumina read data have been submitted to the Short Read Archive database under accession number SRA:SRP067309.
Abbreviations
repeat/terminal repeat; I-TASSER: Iterative Threading ASSEmbly Refinement algorithm; MaHV: macropodid herpesvirus; ORF: open reading frame; oriLyt: origin of lytic replication; PDB: Protein Data Bank; Pfam: protein families database; PRV: pseudorabies virus/suid herpesvirus 1; PSSM: position-specific scoring matrix; PW: Parma wallaby; qRT-PCR: quantitative reverse transcription polymerase chain reaction; RMSD: root-mean-square deviation; SRA: Short Read Archive; TCID: tissue culture infective dose; UL: unique long; US: unique short; VZV: varicella zoster virus.

Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
PKV, NF, TJM and EVF carried out the experiments, including sample preparation, genome assembly and annotation and the molecular genetic studies discussed. SWL, CAH, GFB, JMD and JRG conceived and designed the study and participated in the analysis of the data. All authors participated in the drafting of the manuscript, and read and approved the final manuscript.