Identification of the Alternative Splicing of the UL49 Locus of Human Cytomegalovirus

The UL49 ORF of human cytomegalovirus (HCMV) is essential for viral replication and is conserved among all herpes viruses, but its function is unclear. When the UL49 ORF was precisely deleted from the start codon to the stop codon, the mutant did not yield infectious progeny. In this study, we identified many alternatively processed ESTs from the UL49 locus in HCMV-infected cells and found two novel transcription termination sites in the locus. Most of these ESTs are rare transcripts that contain directed repeat sequences in the intron splicing regions. The UL49Y transcript contains a typical GU-AG intron splicing site. The 1847 bp UL49Y cDNA spans an ORF from position 335 to 1618 and encodes a putative protein of 427 amino acids with a predicted molecular mass of 47.1 kDa. All the new EST sequences and the UL49Y cDNA sequence have been deposited in the GenBank database (GenBank Accession nos. GW314860-GW314900 and GU376796). This study provides important clues to the significance of alternative splicing at the UL49 locus.

Human cytomegalovirus (HCMV), a β-herpes virus, is the most common cause of congenital infection and an important pathogen in immunocompromised individuals [1]. HCMV is the largest member of the herpes virus family, with a genome of ≈230 kb of double-stranded linear DNA [2]. Evaluation of the genome has led to estimates of the number of protein-coding ORFs that range from a maximum of 252 potentially functional ORFs that are conserved in different clinical isolates to a minimum of 165 ORFs that are conserved between HCMV clinical isolates [3][4][5]. Another significant source of uncertainty in the number of ORFs is our incomplete understanding of HCMV splicing. A variety of spliced mRNAs have been identified [6][7][8][9], but so far there has been no exhaustive experimental search for spliced HCMV mRNAs, and splice donors and acceptors cannot be predicted with certainty.
The HCMV UL49 ORF, whose function is unclear, is conserved among all herpes viruses. When the UL49 ORF was precisely deleted from the start codon to the stop codon, the mutant did not yield infectious progeny even after repeated transfection and extensive incubation [10]. The loss of replication after deletion of the whole UL49 locus may be caused by the deletion of essential overlapping ORFs, because the UL49 ORF overlaps with the nearby UL50 and UL48.5 ORFs. The UL48, UL48.5, and UL50 ORFs all encode essential proteins. Our recent study suggested a novel alternative transcript originating from the UL49 locus of the HCMV genome that differs from the UL49 and UL48.5 ORFs, but it remained unknown whether further transcripts arise from this locus. We therefore thoroughly examined HCMV-infected cells for additional alternatively processed ESTs from the UL49 locus.
A Towne HCMV BAC containing a GFP eukaryotic expression cassette was a gift from Professor Fenyong Liu, University of California, Berkeley [11]. Neonatal human dermal fibroblasts (HDFn; Cascade Biologics) were electroporated with the Towne HCMV BAC DNA and an expression plasmid for the HCMV pp71 tegument protein, which can increase the infectivity of HCMV BAC DNA. The cells were then plated onto 75 cm² (Table 1) corresponding to the nucleotide position of HCMV Towne genome used in this paper.
were cleared of cell debris by low-speed centrifugation and collected as stocks of cell-free virus.
HDFn cells were infected with the cell-free virus in Dulbecco's modified Eagle medium supplemented with 10% fetal calf serum (GIBCO/BRL). The final medium contained 10 μg/mL gentamicin and 0.25 μg/mL amphotericin B. Cells were cultured at 37 °C in a humidified incubator with 5% CO₂. RNA was extracted from infected HDFn cells cultured in six-well plates at various time points (2 h, 6 h, 12 h, 48 h, 72 h, and 96 h) after infection. To select for IE (immediate early) transcripts, cells were treated with the protein synthesis inhibitor cycloheximide (100 μg/mL) for 1 h prior to infection and throughout the 24 h infection period, after which they were harvested for RNA isolation. To select for E (early) transcripts, the viral DNA replication inhibitor phosphonoformic acid (100 μM) was added to the medium after the 24 h infection period, and cells were harvested 72 h after infection [11]. Total RNA of all samples was isolated using TRIzol (Invitrogen).
To define the 3′ ends of the alternative mRNAs from the UL49 locus, we performed 3′ RACE with the 3′-Full RACE Core Set (Takara); all primers are listed in Table 1 and Figure 1. The cDNA template was synthesized with the Oligo dT-3sites Adaptor Primer of the 3′-Full RACE Core Set according to the manufacturer's instructions. The first PCR was performed with the 3′ RACE Outer primer (5′-TAC CGT CGT TCC ACT AGT GAT TT-3′) and Primer 1 in a 50 μL reaction containing 1× LA PCR Buffer, 0.4 mM dNTP mixture, 0.2 μM each primer, 2.5 U LA Taq polymerase, and 1 μL of cDNA template. Amplification was carried out at 95 °C for 5 min, followed by 30 cycles of 94 °C for 30 s, 55 °C for 1 min, and 72 °C for 3 min. The second PCR used the same reaction mixture and conditions, except that the 3′ RACE Inner primer (5′-CGC GGA TCC TCC ACT AGT GAT TTC ACT ATA GG-3′) and Primer 2 were used with 1 μL of the outer PCR product as the template. Each sample was analyzed on a 1.5% agarose gel. PCR products were gel-purified with a Gel Extraction Kit (E.Z.N.A.). All isolated fragments were then cloned directly into the TA vector pMD18-T (TaKaRa) and sequenced with the sequencing primers RV-M and M13-47.
Through the 3′ RACE approach, 10 novel cDNA fragments were cloned, ranging in length from 188 bp to 1594 bp. These new EST sequences have been deposited in the GenBank database (GenBank Accession nos. GW314860-GW314869) (Table 2). The sequences fell into two groups. Each group was characterized by the usage of a distinct alternative poly(A) (Figure 1).
According to the 3′ ends of groups A and B and the 5′ end of the UL49 ORF, primers were designed for nested PCR. First-strand cDNA was synthesized using the Takara 1st Strand cDNA Synthesis Kit. Total RNA samples (3 μg) were reverse transcribed in a 10 μL volume in the presence of 5 μM Oligo dT Primer and 1 mM dNTP mixture. The reaction tube was incubated at 65 °C for 5 min and then placed on ice.
The following reagents were then added to each reaction tube: 1× PrimeScript Buffer, 20 U RNase Inhibitor, 200 U PrimeScript RTase, and 4.5 μL RNase-free DEPC H₂O (Takara, Japan). Samples were incubated at 42 °C for 60 min and 70 °C for 15 min and then stored at 4 °C. Nested PCR was then performed. To amplify the transcripts extending from the 5′ end of the UL49 ORF to group A, the outer PCR was carried out in a 50 μL reaction containing 1 μL of synthesized cDNA template, 1× LA PCR Buffer, 0.4 mM dNTP mixture, 0.2 μM each of Primer 3 and Primer 4, and 2.5 U LA Taq polymerase. Amplification consisted of 30 cycles of 94 °C for 30 s, 55 °C for 1 min, and 72 °C for 3.5 min. The inner PCR used the same reaction mixture and conditions, except that Primer 5 and Primer 6 were used with 1 μL of the outer PCR product as the template. To amplify the transcripts extending from the 5′ end of the UL49 ORF to group B, the outer PCR was performed with Primer 3 and Primer 7, and the inner PCR with Primer 5 and Primer 8. The nested PCR products were inserted into the pMD18-T cloning vector, and the recombinant plasmids were transformed into E. coli DH5α. The amplicons were sequenced using Primer RV-M and Primer M13-47. A complete list of the cDNA clones and their positions relative to the genomic DNA is given in Tables 3 and 4. These new EST sequences have been deposited in the GenBank database (GenBank Accession nos. GW314870-GW314900). We mapped the positions of these transcripts in the HCMV genome with BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
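The 50 μL PCR composition described above can be turned into pipetting volumes with the dilution relation C_stock × V_stock = C_final × V_total. The following is a minimal sketch of that arithmetic, not the authors' bench protocol: the final concentrations come from the text, whereas every stock concentration (10× buffer, 2.5 mM dNTP mixture, 10 μM primers, 5 U/μL polymerase) and the function names are illustrative assumptions that would need to be checked against the reagents actually used.

```python
# Final concentrations are taken from the text. Every stock concentration
# below is an ASSUMED, illustrative value, not a value stated in the paper.
TOTAL_UL = 50.0  # total reaction volume in microlitres


def stock_volume_ul(final_conc, stock_conc, total_ul=TOTAL_UL):
    """Volume of stock such that C_stock * V_stock = C_final * V_total."""
    return final_conc * total_ul / stock_conc


def reaction_mix():
    """Return a dict of component -> volume (uL) for one 50 uL reaction."""
    mix = {
        "LA PCR Buffer (assumed 10x stock)": stock_volume_ul(1.0, 10.0),
        "dNTP mixture (assumed 2.5 mM stock)": stock_volume_ul(0.4, 2.5),
        "forward primer (assumed 10 uM stock)": stock_volume_ul(0.2, 10.0),
        "reverse primer (assumed 10 uM stock)": stock_volume_ul(0.2, 10.0),
        "LA Taq (assumed 5 U/uL stock)": 2.5 / 5.0,  # 2.5 U per reaction
        "cDNA template": 1.0,  # 1 uL, as stated in the text
    }
    # Water makes up the remainder of the reaction volume.
    mix["water"] = TOTAL_UL - sum(mix.values())
    return mix


if __name__ == "__main__":
    for name, volume in reaction_mix().items():
        print(f"{name}: {volume:.1f} uL")
```

With different stock concentrations only the arguments to `stock_volume_ul` change; the water volume is always computed last so that the components sum to the total reaction volume.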
None of these transcripts from the UL49 locus has been reported before. All the cDNA clones were obtained by RACE and nested PCR. On the one hand, the results expose the limitations of bioinformatics methods in predicting alternatively spliced transcripts; on the other hand, they show that nested PCR is sensitive enough to detect additional transcripts. Indeed, this locus produces many rare transcripts. The alternatively spliced UL49 variants detected suggest that transcription in the UL49 locus is complex. We summarized all the novel transcripts and mapped them with R software (version 3.1.1) (Figure 4).
Most of these novel transcripts have directed repeat sequences in the intron splicing regions rather than the typical GU-AG RNA splice sites. Directed repeat sequences exist in many viruses, with different lengths and functions [12][13][14][15][16]. Further research is required to clarify the importance of the directed repeat sequences in the UL49 locus. None of these transcripts from the HCMV UL49 locus had been found previously, possibly because of their low abundance, which resembles the situation at the HCMV UL37 locus. Alternatively spliced UL37 variants are of exceedingly low abundance relative to the unspliced UL37x1 transcript, some ∼100-fold less and below detection by RT-PCR and gel detection, so they cannot be detected by either S1 or Northern blot analysis [17,18]. Consistent with these results, some UL49 spliced variant cDNAs were of such low abundance that they were only rarely obtained from HCMV-infected cells (Figure 2).
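One way to distinguish a junction flanked by a repeated sequence from a canonical splice site is to count how many bases at the start of the removed segment are repeated immediately downstream of it, and how many bases upstream of it are repeated at its end. The sketch below illustrates this idea under stated assumptions: the function name, the 0-based half-open coordinates, and both toy sequences are hypothetical and are not taken from the UL49 data or from the authors' analysis.

```python
def junction_repeat_length(genome, start, end):
    """Length of the repeated sequence flanking a removed segment.

    genome[start:end] is the segment absent from the cDNA (0-based,
    half-open). A repeat is present when bases at the start of the
    removed segment match the bases just after it (forward), or when
    bases just before it match the bases at its end (backward).
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError("invalid segment coordinates")
    removed = end - start

    # Extend to the right: genome[start + k] against genome[end + k].
    forward = 0
    while (forward < removed and end + forward < len(genome)
           and genome[start + forward] == genome[end + forward]):
        forward += 1

    # Extend to the left: genome[start - 1 - k] against genome[end - 1 - k].
    backward = 0
    while (backward < removed and start - 1 - backward >= 0
           and genome[start - 1 - backward] == genome[end - 1 - backward]):
        backward += 1

    return forward + backward


# Toy sequences invented for illustration only.
# "CCGT" occurs at the start of the removed segment and again right after it.
with_repeat = "TTAG" + "CCGT" + "AAAAAA" + "CCGT" + "GGA"
# A GT...AG intron with no repeated flanking sequence.
canonical = "ACT" + "GTAAGTCCCAG" + "TTC"

print(junction_repeat_length(with_repeat, 4, 14))
print(junction_repeat_length(canonical, 3, 14))
```

A repeat of only one or two bases can arise by chance, so any threshold for calling a junction repeat-mediated would have to be chosen and justified separately.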
The functions of the rare UL49 transcripts are unclear. We speculate that these transcripts might encode viral proteins or might act as RNAs that regulate host or viral genes, and we will investigate their functions further. Nonetheless, alternative processing of known HCMV transcripts results in the production of functionally different gene products. In the best-studied locus, differential processing of the major IE pre-mRNAs leads to the production of multiple spliced and polyadenylated RNAs. Moreover, Alwine has recently identified novel IE1 RNA splice variants whose abundances differ during HCMV infection; however, their temporal expression is similar to that of IE1 mRNA. The products of the differentially spliced IE1 and IE2 transcripts differ in function [17,19].
Only UL49X and UL49Y were detected at every time point from 2 h to 96 h after infection, whereas the other transcripts were expressed only transiently (Figure 2). That UL49X and UL49Y could always be recovered may reflect their higher abundance relative to the other transcripts. Although accurate splicing at the UL49X and UL49Y splice junctions has been verified, we obtained only EST fragments of UL49X and UL49Y, and the full-length UL49X and UL49Y cDNAs have not yet been cloned. Our next step is to obtain the full-length UL49X and UL49Y cDNAs.
The open reading frames were predicted with the ORF finder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The full-length UL49 cDNA began at 73134 bp and ended at 71047 bp, and its ORF began at 73043 bp and ended at 71331 bp. UL49X had the same 5′ end as UL49, but its 3′ end terminated at 68783 bp. A 2341 bp intron was removed from the UL49X transcript, so UL49X would encode a protein completely different from that of UL49. The UL49X cDNA sequence has been deposited in the GenBank database (GenBank Accession no. GW314876). Because we could not identify a complete ORF in UL49X by bioinformatics methods, we analyzed only the protein sequence of UL49Y, which has a complete ORF. The UL49Y cDNA sequence has been deposited in the GenBank database (GenBank Accession no. GU376796).
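The coordinates quoted above can be checked for internal consistency with simple interval arithmetic. The sketch below assumes that all coordinates are inclusive and that the transcript runs from the higher to the lower genomic coordinate, as the quoted start and end positions imply. The spliced UL49X length is a hypothetical figure that holds only if UL49X shares the UL49 5′ end and contains no intron other than the 2341 bp one; it is not a measured cDNA length. The variable and function names are introduced here for illustration.

```python
def span_bp(a, b):
    """Inclusive length of a genomic interval, regardless of orientation."""
    return abs(a - b) + 1


# Genomic coordinates quoted in the text (start, end).
UL49_CDNA = (73134, 71047)
UL49_ORF = (73043, 71331)
UL49X_3P_END = 68783
UL49X_INTRON_BP = 2341

ul49_cdna_bp = span_bp(*UL49_CDNA)      # full-length UL49 cDNA
ul49_orf_bp = span_bp(*UL49_ORF)        # UL49 ORF
utr5_bp = UL49_CDNA[0] - UL49_ORF[0]    # bases upstream of the ORF
utr3_bp = UL49_ORF[1] - UL49_CDNA[1]    # bases downstream of the ORF

# Genomic span of UL49X and its length after removing the intron
# (ASSUMES a shared 5' end and a single intron).
ul49x_genomic_bp = span_bp(UL49_CDNA[0], UL49X_3P_END)
ul49x_spliced_bp = ul49x_genomic_bp - UL49X_INTRON_BP

print(ul49_cdna_bp, ul49_orf_bp, utr5_bp, utr3_bp)   # 2088 1713 91 284
print(ul49x_genomic_bp, ul49x_spliced_bp)            # 4352 2011
```

The untranslated regions and the ORF sum to the full cDNA length (91 + 1713 + 284 = 2088), and the ORF length is a multiple of three, so the quoted coordinates are mutually consistent under these assumptions.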
The 1847 bp UL49Y cDNA spans an ORF from position 335 to 1618 and encodes a putative protein of 427 amino acids with a predicted molecular mass of 47.1 kDa and an isoelectric point (pI) of 9.35. There is an in-frame stop codon at position −72 upstream of the first initiation codon. The UL49Y cDNA lacks a 95 bp intron that conforms to the GU-AG rule (GGCTGgtgtg...tacagCATGGA). UL49Y is a truncated form of UL49 that lacks the 143 N-terminal amino acids of UL49 (Figure 3).
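The figures in this paragraph can be reproduced from the quoted positions. The sketch below assumes that the ORF coordinates are inclusive and include the stop codon, that lowercase letters in the junction string denote intron bases (written as DNA, so GU appears as gt), and that 110 Da is an acceptable rule-of-thumb average residue mass; the function names are introduced here for illustration and are not part of the original analysis.

```python
def orf_protein_length(first_nt, last_nt):
    """Amino acids encoded by an ORF given inclusive coordinates.

    ASSUMES the coordinates include the stop codon, which is not translated.
    """
    n_nt = abs(last_nt - first_nt) + 1
    if n_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_nt // 3 - 1


def intron_follows_gu_ag(junction):
    """Check the GU-AG rule on a junction written as EXONintron...intronEXON.

    Lowercase letters are taken as intron bases; dots mark the elided
    interior and are ignored.
    """
    intron = "".join(ch for ch in junction if ch.islower())
    return intron.startswith("gt") and intron.endswith("ag")


ul49y_aa = orf_protein_length(335, 1618)      # UL49Y ORF on the cDNA
ul49_aa = orf_protein_length(73043, 71331)    # UL49 ORF on the genome
rough_mass_kda = ul49y_aa * 110 / 1000        # ~110 Da per residue (rule of thumb)

print(ul49y_aa)                                           # 427
print(ul49_aa - ul49y_aa)                                 # 143
print(intron_follows_gu_ag("GGCTGgtgtg...tacagCATGGA"))   # True
print(round(rough_mass_kda, 2))                           # 46.97
```

Under these assumptions the 427-residue length, the 143-residue difference from UL49, and the approximate 47 kDa mass all agree with the values reported in the text.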
Bioinformatics analysis indicated that UL49Y contains a complete ORF, but further experiments are needed to determine whether it actually encodes a protein. UL49Y and UL49 share a common 3′ end, and current studies suggest that the UL49 protein is an E (early) viral protein. In our experiments, UL49Y expression began two hours after viral infection, which suggests that it is an IE (immediate early) gene product, and its expression continued until 96 hours. This difference in expression phase suggests that UL49Y may have a biological significance distinct from that of UL49. It is therefore important to determine the functions of UL49Y.
Previous data showed that the virus could not replicate when the UL49 ORF was deleted [10]. Through bioinformatics analysis, we found that the deletion of UL49 also destroyed the UL50 ORF and the UL49A ORF (Figure 1), and it may destroy the 3′ UTR of UL48. We also found that the UL49Y ORF was completely removed. Although we have not yet obtained the full-length form of UL49X, its ORF must also be disrupted when the UL49 ORF is deleted. We will examine the functions of the different transcripts from this locus and investigate the molecular mechanisms that make the locus critical for virus replication. In the meantime, we hope to characterize transcription of the UL49 locus in other low-passage CMV strains such as Merlin and TB40 and in patient-derived clinical isolates.
Overall, we found two novel transcription termination sites in the UL49 locus. At these two sites, we found a large number of new transcripts, most of which are rare and contain directed repeat sequences in the intron splicing regions. UL49X is stably expressed and has directed repeat sequences in its intron splicing regions. The UL49Y transcript contains a typical GU-AG intron splicing site and might encode a full-length ORF. These findings provide important clues to the significance of alternative splicing at the UL49 locus. This surprising transcript complexity makes the UL49 locus the most complex of any known HCMV transcription unit [20].