mRNA maturation in giant viruses: variation on a theme

Giant viruses from the Mimiviridae family replicate entirely in their host cytoplasm, where their genes are transcribed by a viral transcription apparatus. mRNA polyadenylation uniquely occurs at hairpin-forming palindromic sequences terminating viral transcripts. Here we show that a conserved gene cluster encodes both the enzyme responsible for hairpin cleavage and the viral polyA polymerases (vPAPs). Unexpectedly, the vPAPs are homodimeric and uniquely self-processive. The vPAP backbone structures exhibit a symmetrical architecture with two subdomains sharing a nucleotidyltransferase topology, suggesting that vPAPs originate from an ancestral duplication. A poxvirus processivity factor homologue encoded by Megavirus chilensis displays a conserved 5′-GpppA 2′O methyltransferase activity but is also able to internally methylate the polyA tails of mRNAs. These findings shed light on how the arm wrestling between hosts and their viruses for access to the translation machinery takes place in Mimiviridae.


INTRODUCTION
The discovery of Mimivirus 10 years ago brought the realization that DNA viruses could overlap with the cellular world in terms of genome size and complexity (1). Many additional members of the Mimiviridae family have since been isolated from various environments (2,3). To date, there are seven fully sequenced representatives of the mimiviruses sub-family, all infecting Acanthamoeba: Mimivirus (1), Mamavirus (2), Moumouvirus (4), Terra1 and Terra2 (5), Megavirus chilensis (6) and Megavirus Iba (7). More distant relatives belonging to the Mimiviridae family but infecting other unicellular eukaryotes have also been sequenced, including Cafeteria roenbergensis virus (CroV (8)), Phaeocystis globosa virus (PgV (9)) and the Organic Lake phycodnaviruses (OLPV1-2 (10)). They all share an AT-rich (>70%) linear DNA genome of up to 1.26 Mb encoding up to 1120 proteins, including a full replication and transcription apparatus. The nucleocytoplasmic large DNA viruses (NCLDV) (11) comprise the Iridoviridae and the Phycodnaviridae, which exhibit mixed nuclear and cytoplasmic replication stages, and exclusively cytoplasmic viruses such as the Poxviridae and the Asfarviridae, in addition to the Mimiviridae. Cytoplasmic NCLDV develop a transitory organelle, the virion factory, where all replication processes take place (12-15). The simultaneous delivery from the virion of the fully virally encoded transcription apparatus together with the genomic DNA allows the immediate transcription of the early expressed viral genes independently of the host machinery (13,16-20). As expected, all the Mimiviridae infecting Acanthamoeba (belonging to the mimiviruses sub-family) share the same transcriptional regulatory elements. These include a highly conserved promoter element ('AAAATTGA') associated with the early-expressed genes and a more degenerate promoter element associated with the late-expressed genes (18,21).
Their genes also present a unique polyadenylation signal, defined by a hairpin-forming palindromic motif that is not conserved at the nucleotide sequence level, instead of the consensus sequence used by eukaryotes and their viruses (6,18,22). This implies that the most distant members of the mimiviruses sub-family, Mimivirus and M. chilensis, despite their significant differences (only about half of their proteins are orthologous, and these orthologues are on average 50% identical), share the machinery responsible for hairpin recognition and polyadenylation. Accordingly, their genomes exhibit a conserved cluster of three genes (Mimivirus R341-R343 and M. chilensis Mg559-Mg561, Supplementary Figure S1A) encoding a remote homologue of the poxvirus polyA polymerase (PAP), a protein of unknown function, and a predicted dsRNA-specific RNase III, which could be responsible for the 3′-end maturation of the viral transcripts (Supplementary Figure S1C, steps 5 and 6).
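Because this polyadenylation signal is defined by structure (an inverted repeat able to fold into a stem-loop) rather than by a fixed sequence, it can be illustrated with a search for inverted repeats. The sketch below is a deliberately simple, hypothetical heuristic: `find_hairpins` and its parameters are illustrative names, not part of the study. It reports perfect Watson-Crick stems closing short loops, ignores G-U wobble pairs, mismatches and folding energies, and is no substitute for thermodynamic predictors such as Mfold, which the study used to predict hairpins.

```python
# Toy scan for perfect inverted repeats (candidate stem-loops) in an RNA.
# Assumptions (not from the study): Watson-Crick pairs only, no G-U wobble,
# no mismatches or bulges; min_stem/min_loop/max_loop are illustrative.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(PAIR[b] for b in reversed(rna))


def find_hairpins(seq: str, min_stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, stem_length, loop_length) for each perfect stem-loop.

    For every possible loop position and length, the stem is extended outward
    while the base 5' of the loop pairs with the base 3' of the loop.
    DNA input is accepted (T is converted to U).
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            right = loop_start + loop_len  # first base of the 3' arm
            k = 0
            while (loop_start - 1 - k >= 0 and right + k < n
                   and PAIR.get(rna[loop_start - 1 - k]) == rna[right + k]):
                k += 1
            if k >= min_stem:
                hits.append((loop_start - k, k, loop_len))
    return hits


# Example: a 6-bp stem (GACCUG / CAGGUC) closing a 4-nt loop.
print(find_hairpins("AAAGACCUGAAAACAGGUCAAA"))  # [(3, 6, 4)]
```

Real viral hairpins contain wobble pairs and imperfections, which is why a folding algorithm rather than exact matching is needed in practice.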
Cellular PAPs are classified into two groups. Canonical PAPs have a tripartite structure comprising an N-terminal nucleotidyltransferase (NT) catalytic domain, corresponding to the palm domain of polymerases, a central domain corresponding to the finger domain, and a C-terminal RNA-binding domain (RBD) (23,24). The only known 3D structure of a non-canonical PAP is that of the mitochondrial PAPD1, which is dimeric and possesses an N-terminal RBD (25). The only known structure of a viral PAP is that of Vaccinia virus VP55, which is very different from those of cellular PAPs (26). This structure is nevertheless reminiscent of the non-canonical cellular PAP, with an N-terminal domain predicted to be the RBD and a central domain corresponding to the catalytic subunit containing the NT domain.
Poxviruses and asfarviruses recognize a sequence signal for transcription termination corresponding to stretches of thymidines in the 3′ untranslated regions (UTR) of the genes (27,28). In addition, polyadenylation in poxviruses requires stretches of uridines at specific positions with respect to the 3′-end of the transcripts (26,29). In vitro, the monomeric Vaccinia virus PAP produces polyA tails as short (∼30 adenylates) as those of asfarvirus mRNAs (16), whereas Vaccinia virus mRNA polyA tails are much longer (∼300 adenylates) in vivo. This difference in length is due to the processivity factor VP39, which forms a heterodimer with the VP55 PAP (30). Since a homologue of this processivity factor is encoded by the M. chilensis genome (Mg18) but is absent from Mimivirus, we determined the crystallographic structures of their PAPs and characterized their enzymatic activity in the presence and absence of the Mg18 protein. We also identified the minimal machinery required for hairpin recognition, hairpin cleavage and mRNA polyadenylation.
Mimivirus and M. chilensis encode a trifunctional mRNA capping enzyme (Mimivirus R382 and M. chilensis Mg512) responsible for capping (Supplementary Figures S1B and S1C). Both the mRNA capping enzyme and the predicted 2′O MTase are present in the virion proteomes (14,17) and could thus be responsible for the 5′-end maturation of the mRNAs (Supplementary Figure S1C, steps 1-3). An unusual cap modification by an additional MTase (L320, conserved in M. chilensis as Mg584) has also been reported for Mimivirus, which produces a 2,7-dimethylguanosine (DMG) cap (Supplementary Figure S1C, step 4; (32)). Interestingly, the Vaccinia virus PAP processivity factor VP39 is also a cap 2′O MTase (33). This led us to study the RNA methyltransferase activity and specificity of its M. chilensis homologue, Mg18.

Plasmids and protein production
The M. chilensis Mg561 protein (H6c-Mg561) was produced as previously described (34). Two constructs were produced for the Mimivirus R341 gene, which was first amplified from Mimivirus genomic DNA. It was cloned using the Gateway system (Invitrogen) into the pDEST17 expression plasmid for expression with an N-terminal His6-tag (H6+R341). It was also inserted by restriction/ligation into an in-house modified pETDuet expression vector (Novagen), allowing expression with an N-terminal His6-tag (H6c+R341). A human rhinovirus 3C protease cleavage site allows tag removal. We refer to the Mimivirus and Megavirus PAPs after tag removal as H6c-R341 and H6c-Mg561, respectively. The ΔD1 mutants lacking residues 1-43 were generated by PCR-based site-directed mutagenesis and were named ΔD1 H6c+Mg561 (M. chilensis) and ΔD1 H6c+R341 (Mimivirus). Proteins were expressed in E. coli Rosetta (DE3) cells by induction with 0.2 mM IPTG once the OD600 reached 0.8, followed by overnight culture at 17 °C. Proteins were extracted from E. coli cells by sonication and purified on HisPur Ni-NTA columns (Pierce) washed with Buffer A (50 mM Tris-HCl, 300 mM NaCl, pH 8.5) and then with Buffer A containing 25 mM and 50 mM imidazole. Bound proteins were eluted with 500 mM imidazole in Buffer A. After tag cleavage and purification, the H6c-R341 and ΔD1 H6c-R341 proteins were dialyzed against 10 mM CHES pH 9, and the H6c-Mg561 and ΔD1 H6c-Mg561 proteins against 10 mM Tris-HCl pH 9, 100 mM NaCl, before concentration using an Amicon-Ultracell 30K centrifugal filter device (Millipore).
The Mg18 and R343 genes were amplified from M. chilensis and Mimivirus genomic DNA, respectively, and inserted into the modified pETDuet and pDEST42 vectors for expression of N- and C-terminally tagged proteins, respectively. Expression, protein extraction and purification conditions were the same as for the R341 proteins, except that buffer A was replaced by buffer B (50 mM Tris-HCl, 300 mM NaCl, pH 7.5) for Mg18 and by buffer C (50 mM Tris-HCl, 300 mM NaCl, pH 7.0) for R343. The His6-tags were not removed from Mg18 and R343, and the H6c+Mg18 and R343+H6 proteins were then concentrated using an Amicon-Ultracell 30K centrifugal filter device (Millipore).

Polyadenylation assay
A synthetic 20-mer RNA substrate (5′-GUCCACGUAGACUAACAACU-3′, Biomers) was 5′ end-labeled with T4 polynucleotide kinase (NEB) in the presence of [γ-32P]ATP and purified through Microspin G25 columns (GE Healthcare). Polyadenylation reactions were carried out at 30 °C in PAP reaction buffer (50 mM Tris pH 7.5, 10 mM KCl, 5 mM DTT) with 500 nM 32P-labeled RNA and 50 nM H6c-R341 or H6c-Mg561, and were initiated by the addition of 5 mM MgCl2 or 0.5 mM MnCl2 and 100 µM ATP. Where indicated, 100 nM Mg18 was added to the reaction. To test nucleotide specificity, ATP was replaced by CTP, GTP or UTP. Reaction products were separated on 8 M urea-14% polyacrylamide gels and visualized with a Storm PhosphorImager (Fujifilm) and Image Gauge software.

Binding assay
Binding experiments were conducted by biolayer interferometry using a BLItz instrument (ForteBio). For the binding analysis of Mg561 with Mg18, Ni-NTA-coated biosensors (ForteBio) were loaded with H6c+Mg18 at 20 µg·ml−1 in association buffer (1X PBS, 0.05% Tween-20, 100 µg·ml−1 BSA). After equilibration in association buffer to establish a baseline, Mg561 at concentrations ranging from 1.685 to 13.48 µM was run over the immobilized Mg18 protein in association buffer. The association and dissociation binding traces were recorded at room temperature. The ability of R341 to bind Mg18 was also measured using the H6c-R341 protein at 6.0 µM.
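The Results report K_D = 6.2 µM for the Mg561-Mg18 interaction, but the fitting model is not stated. One common way to extract a K_D from a concentration series like this one is a steady-state fit of equilibrium responses to a 1:1 binding isotherm; the sketch below assumes that approach (a kinetic fit, K_D = k_off/k_on, is an equally standard alternative). The two-fold dilution series is an assumption consistent with the reported endpoints (13.48 = 8 × 1.685), and the responses are synthetic, not the study's data.

```python
# Steady-state 1:1 affinity fit (sketch). Assumptions: equilibrium responses
# (R_eq) have been read off the BLI association traces, binding is 1:1, and
# concentrations and KD share one unit (µM here). Data below are synthetic.


def fit_kd(conc, resp, n_grid=5001, log_min=-2.0, log_max=3.0):
    """Least-squares fit of R_eq = Rmax * C / (KD + C); returns (KD, Rmax).

    KD is scanned on a log-spaced grid; for each trial KD the optimal Rmax
    has a closed form (linear least squares through the origin).
    """
    best = None
    for i in range(n_grid):
        kd = 10 ** (log_min + (log_max - log_min) * i / (n_grid - 1))
        x = [c / (kd + c) for c in conc]  # fractional saturation
        rmax = sum(xi * ri for xi, ri in zip(x, resp)) / sum(xi * xi for xi in x)
        sse = sum((ri - rmax * xi) ** 2 for xi, ri in zip(x, resp))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Two-fold series spanning the reported range (1.685-13.48 µM) and noise-free
# synthetic responses generated with KD = 6.2 µM and Rmax = 1.0 (arbitrary units).
conc = [1.685 * 2 ** i for i in range(4)]
resp = [c / (6.2 + c) for c in conc]
kd, rmax = fit_kd(conc, resp)
print(f"KD ~ {kd:.2f} uM, Rmax ~ {rmax:.2f}")
```

A grid search keeps the sketch dependency-free; with real, noisy data a nonlinear least-squares routine and replicate measurements would normally be used.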

Protein crystallization, structure determination and analysis
For the Mg561 structure determination, protein crystallization and data collection were performed as described (34). Phases were calculated using autoSHARP (36) from a three-wavelength MAD data set in the 51.3-2.24 Å resolution range, and a single solution was found with 34 selenium atoms (for 36 methionines) with a mean figure of merit of 0.39. The solvent-flattened electron density map allowed most of the two monomers to be built automatically. The model was refined using AutoBUSTER (37). The final round of refinement resulted in an Rwork of 18.4% and an Rfree of 21.2%. The refined structure comprises residues 4-514 for monomer A and residues 4-516 for monomer B. Residues 263-275 of monomer A and 263-277 of monomer B are disordered in the crystal and absent from the final model. In the Ramachandran plot, 95.4% of the residues are in the most favored regions and 4.2% in the additionally allowed regions. Refinement statistics are listed in Supplementary Table S1.
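The Rwork and Rfree values quoted above are the same agreement statistic computed on two disjoint sets of reflections: those used in refinement and a test set withheld from it. The sketch below shows the standard definition R = Σ||Fo| - |Fc|| / Σ|Fo|; the function names and toy amplitudes are illustrative, not taken from the study or from AutoBUSTER, and |Fc| is assumed to be already scaled to |Fo|.

```python
# Crystallographic R-factors (sketch): R = sum(||Fo| - |Fc||) / sum(|Fo|).
# Assumes |Fc| is already on the scale of |Fo|; the amplitudes and the
# free-set flags below are made-up illustrative values.


def r_factor(f_obs, f_calc):
    """R-factor over paired observed/calculated structure-factor amplitudes."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections into working and test (free) sets; return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


f_obs = [100.0, 200.0, 300.0, 400.0]
f_calc = [90.0, 210.0, 300.0, 380.0]
is_free = [False, False, False, True]
# Rwork = 20/600 (about 0.033), Rfree = 20/400 = 0.05
print(r_work_r_free(f_obs, f_calc, is_free))
```

Because the free reflections never drive refinement, a small gap between Rwork and Rfree (as for both structures here) indicates that the model is not overfitted.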
Crystals of H6c-R341 were obtained in 7% (v/v) PEG 400 and 50 mM imidazole pH 6.4. H6c-R341 was crystallized at 20 °C by hanging-drop vapor diffusion in 24-well culture plates (Greiner). Each hanging drop was prepared by mixing 0.5 µl of H6c-R341 (1.5 µg/µl) with 0.3 µl of reservoir solution and was equilibrated on the cover glass against 1 ml of reservoir solution. To improve crystal diffraction, a previously described evaporation protocol was used, in which crystals were soaked for 15 min in 10 µl of reservoir solution containing 10% ethylene glycol as a cryoprotectant (38). Crystals were then flash-frozen at −173 °C. Data were collected at −173 °C on beamline ID29 at the European Synchrotron Radiation Facility (Grenoble, France). Diffraction intensities were integrated with XDS (39). The structure of R341 was solved by molecular replacement with Phaser (40) using an Mg561 monomer as the search model, and refined using AutoBUSTER (37). The final round of refinement resulted in an Rwork of 22.7% and an Rfree of 24.6%. The refined structure comprises residues 9-528 for monomer A and residues 9-527 for monomer B. Residues 262-274 and 361-372 of monomer A and 262-276 and 360-368 of monomer B are disordered in the crystal and absent from the final model. The quality of the model was validated using MolProbity. In the Ramachandran plot, 90.8% of the residues are in the most favored regions and 8.6% in the additionally allowed regions. Refinement statistics are listed in Supplementary Table S1.
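The drop composition follows directly from the volumes above. The sketch below works out the initial concentrations in the drop and, under an idealization that is our assumption (the drop loses water until its PEG 400 concentration matches the reservoir's), the end point of vapor equilibration. It assumes the protein stock (in 10 mM CHES) contains no PEG or imidazole; in practice the end point also depends on the other solutes.

```python
# Hanging-drop bookkeeping (sketch). Values come from the R341 protocol;
# the equilibrium step is an idealization (assumption): the drop is taken to
# lose water until its PEG 400 concentration equals the reservoir's.


def mix(c1, v1, c2, v2):
    """Concentration after mixing volume v1 at c1 with volume v2 at c2."""
    return (c1 * v1 + c2 * v2) / (v1 + v2)


protein_ul, reservoir_ul = 0.5, 0.3           # drop components (µl)
protein_stock = 1.5                            # µg/µl H6c-R341
peg_res, imidazole_res = 7.0, 50.0             # % (v/v) PEG 400, mM imidazole

drop_ul = protein_ul + reservoir_ul
protein0 = mix(protein_stock, protein_ul, 0.0, reservoir_ul)    # µg/µl
peg0 = mix(0.0, protein_ul, peg_res, reservoir_ul)              # % (v/v)
imidazole0 = mix(0.0, protein_ul, imidazole_res, reservoir_ul)  # mM

factor = peg_res / peg0            # idealized concentration factor
drop_eq_ul = drop_ul / factor
protein_eq = protein0 * factor

print(f"initial drop: {protein0:.4f} ug/ul protein, {peg0:.3f}% PEG, {imidazole0:.2f} mM imidazole")
print(f"idealized equilibrium: {drop_eq_ul:.2f} ul, {protein_eq:.2f} ug/ul protein")
```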

Structure analysis
The contact areas between the two Mg561 molecules in the asymmetric unit were analyzed with the PISA server (41). Electrostatic surfaces were calculated with APBS (42).
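Interface analyses of this kind rest on solvent-accessible surface areas (SASA): the area buried upon complex formation is the SASA of the isolated partners minus that of the complex. Conventions differ, since some tools report this total and others half of it (an average per partner), so buried areas from different sources should be compared with care. The sketch below uses hypothetical SASA values, not values computed for Mg561.

```python
# Buried surface area from solvent-accessible surface areas (sketch).
# SASA values would come from a tool such as PISA; the numbers below are
# hypothetical placeholders, not values for Mg561.


def buried_surface(sasa_a, sasa_b, sasa_complex):
    """Return (total buried area, average buried area per partner) in Å²."""
    total = sasa_a + sasa_b - sasa_complex
    return total, total / 2.0


total, per_partner = buried_surface(21000.0, 21000.0, 38000.0)
print(total, per_partner)  # 4000.0 2000.0
```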

In vitro synthesis of Megavirus and Mimivirus RNA transcripts
DNA templates for the synthesis of 3′ hairpin-containing RNA transcripts of the M. chilensis Mg592 gene and the Mimivirus R418 gene were generated by PCR, using as template either M. chilensis genomic DNA (6) or a previously described plasmid containing the R418 gene (43), with a 5′ primer containing the T7 promoter and a 3′ primer complementary to the sequence downstream of the 3′ hairpin. The PCR products corresponded to the full-length Mg592 gene (394 nt) and to the 80-nt 3′-end of the R418 gene, which contains the canonical hairpin at which the natural transcripts are polyadenylated, followed by a shorter non-canonical hairpin never used in vivo. The RNA transcripts were synthesized in vitro at 37 °C overnight in 250 µl reactions containing 40 mM Tris-HCl pH 8.0, 5 mM DTT, 2 mM spermidine, 0.01% Triton X-100, 4% (w/v) PEG 8000, 8 mM NTPs, 40 mM MgCl2, 2.5 U of yeast inorganic pyrophosphatase, 200 nM DNA template and 100 nM T7 RNA polymerase (prepared in-house). DNA templates were then digested with 25 U of DNase I (Roche). The Mg592 RNA transcripts were purified on a 5% acrylamide-urea gel and the R418 transcripts with the RNeasy Mini Kit (Qiagen) following the manufacturer's protocol. All RNAs were quantified spectrophotometrically.

RNase assay
The R418 RNA transcripts were 5′-dephosphorylated using FastAP phosphatase (Thermo), then 5′-end-labeled with T4 polynucleotide kinase (NEB) in the presence of [γ-32P]ATP, and purified through Microspin G25 columns (GE Healthcare). The RNase assay was carried out at 30 °C in 50 mM Tris pH 7.5, 150 mM KCl, 5 mM DTT with 250 nM 32P-labeled R418 RNA transcripts and 1 µM R343+H6 in the presence of 5 mM MgCl2 or 0.5 mM MnCl2. Where indicated, 1 µM H6c-R341 was added to the reaction. Reaction products were separated on 8 M urea-10% polyacrylamide gels.

Polyadenylation of Mg592 natural transcripts and synthesis of the dscDNA for sequencing
RNA transcripts (1 µg) were polyadenylated in vitro at 30 °C for 1 hour in PAP reaction buffer containing 1 mM ATP, 5 mM MgCl2, 0.5 mM MnCl2 and 0.5 µM H6c-Mg561 or H6c-R341. Control reactions with 2 U of bacterial PAP (NEB) were also performed. Where indicated, the RNAs were pre-incubated for 3 hours with 1 µM R343+H6 and 5 mM MgCl2 or 0.5 mM MnCl2. Polyadenylated RNAs were purified using the Dynabeads mRNA Purification Kit (Invitrogen). PolyA+ RNA (150 ng) was reverse transcribed using SMARTScribe reverse transcriptase (Clontech) and a modified oligo(dT) primer (5′-AAGCATTATGCGGCCGCATTCTAGAGGCCGAGGCGGCCGACATGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3′, where V is A, C or G and N is any nucleotide) according to the manufacturer's protocol. Second-strand synthesis was performed using the NEBNext mRNA Second Strand Synthesis mix (NEB). Double-stranded cDNAs were amplified using GoTaq G2 Flexi DNA Polymerase (Promega) with a specific 5′-primer (5′-CCAGGGGCCCGGATCCATGTCATTTGATTGGGGAGTTAGCCATG-3′) and the modified oligo(dT) 3′-primer. PCR products were sequenced by Eurofins using the specific 5′-primer.
Nucleic Acids Research, 2015, Vol. 43, No. 7 3779
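The VN anchor at the 3′ end of the oligo(dT) primer uses IUPAC degenerate codes. Because V excludes T, the base after the T stretch must pair with a non-A residue (U, G or C) in the RNA, so the primer can only be anchored at the 5′ boundary of the poly(A) stretch rather than anywhere inside it; this is what allows the sequencing reads to report the polyadenylation site. The sketch below simply expands a degenerate sequence into its concrete variants; `expand_degenerate` is an illustrative helper, not a tool used in the study.

```python
from itertools import product

# IUPAC nucleotide codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(seq):
    """List every concrete sequence encoded by a degenerate IUPAC sequence."""
    pools = [IUPAC[base] for base in seq.upper()]
    return ["".join(bases) for bases in product(*pools)]


anchors = expand_degenerate("VN")
print(len(anchors))                      # 12
print(sorted({a[0] for a in anchors}))  # ['A', 'C', 'G']: never T
```

The primer is therefore a mixture of 12 molecules that differ only in their two 3′-terminal bases.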

Common features of the viral PolyA polymerases
The Mimivirus R341 and M. chilensis Mg561 genes were reliably predicted to encode PAPs after two iterations of PSI-BLAST (45) (Figure 1 and Supplementary Figure S2). They present the conserved NT signature hG[G/S]x(n)Dh[D/E]h (h, hydrophobic residue; x, any residue), with n = 23 instead of 13 for the Vaccinia virus PAP (46). Their sequences share 62% identity over their entire length, more than the average level of conservation (50%) between orthologous proteins of the two viruses (6,47). Sequence conservation varies among the four subdomains, from 48% identical residues in the N-terminal domain (D1) up to 80% in the catalytic domain (D2) (Figure 1). The Mimivirus R341 protein exhibits a unique acidic extension of 65 residues at its C-terminal end (D5) and an
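The NT signature can be written as a regular expression with the spacer length n as a parameter. The sketch below is a crude motif scan for illustration only (the PAPs were identified by PSI-BLAST, not by pattern matching); the set of residues counted as hydrophobic is an assumption, and the test sequence is synthetic, not a real PAP.

```python
import re

# Assumed set of residues counted as hydrophobic ("h"); definitions vary.
HYDROPHOBIC = "ACFILMVWY"


def nt_signature(n_min, n_max=None):
    """Compile hG[G/S]x(n)Dh[D/E]h with a spacer of n_min..n_max residues."""
    n_max = n_min if n_max is None else n_max
    h = f"[{HYDROPHOBIC}]"
    return re.compile(f"{h}G[GS].{{{n_min},{n_max}}}D{h}[DE]{h}")


def find_nt_signature(protein, n_min, n_max=None):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in nt_signature(n_min, n_max).finditer(protein)]


# Synthetic test sequence (not a real PAP): spacer of 23 residues between GS and D.
toy = "MKLGS" + "A" * 23 + "DIDVKK"
print(find_nt_signature(toy, 23))   # one match starting at index 2
print(find_nt_signature(toy, 13))   # [] (a Vaccinia-like spacer does not fit)
```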

R341 and Mg561 gene products are bona fide polyA polymerases
To confirm the functional prediction for Mg561 and R341, the proteins were purified and tested in an in vitro polyadenylation assay on a synthetic 20-mer RNA without secondary structure, in the presence of ATP and Mg2+ or Mn2+ as catalytic ions (Figure 2A and B). Both proteins synthesized long products from the RNA and ATP, apparently processively, in the presence of either cation, demonstrating the intrinsic ability of Mg561 and R341 to polyadenylate RNA in the absence of any specific signal (Supplementary Figure S1C, step 6). This in turn suggested that in vivo, hairpin recognition and processing could require additional proteins (Supplementary Figure S1C, step 5). In contrast to the Vaccinia virus and ASFV PAPs, which stop after the addition of ∼30 adenylates (16,48), Mg561 and R341 were able to add much longer polyA tails (up to 700 adenylates). The nucleotide specificity of the PAPs was then assayed in the presence of Mn2+ with each of the four NTPs (Figure 2C). As expected, ATP was by far the most efficient nucleotide for homopolymeric tail synthesis. Similar results were obtained with Mg2+ instead of Mn2+ (data not shown). However, as observed for non-canonical cellular PAPs known as terminal uridylyltransferases (TUTases) or polyU polymerases (PUPs) (49), Mg561 was also able to add long polyU tails, although with reduced activity. This property was also reported for Vaccinia VP55 (29), which requires uridylates 30-40 nt upstream of the polymerization site to proceed further. As a consequence, VP55 terminates polyadenylation after 30-40 nt and needs its processivity factor VP39 to add long poly(A) tails (30).
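The assay conditions given in the Methods (500 nM RNA, 100 µM ATP) allow a quick mass-balance check of these tail lengths. If every primer were extended equally, the ATP pool would support at most 200 adenylates per RNA, so tails of ∼700 adenylates can be carried by at most ∼29% of the molecules. Such an uneven distribution is what one would expect from processive synthesis, although this arithmetic alone does not prove processivity. The sketch assumes that ATP is the only limit on tail synthesis; helper names are illustrative.

```python
# ATP mass balance for the polyadenylation assay (sketch). Concentrations are
# those given in the Methods; the simplification that ATP is the only limit
# on tail synthesis is an assumption.

RNA_NM = 500.0          # 5'-labeled 20-mer substrate, nM
ATP_NM = 100_000.0      # 100 µM ATP, in nM
SUBSTRATE_NT = 20


def adenylates_added(product_nt, substrate_nt=SUBSTRATE_NT):
    """Tail length implied by a product's size on the gel."""
    return product_nt - substrate_nt


def max_fraction_with_tail(tail_len, atp_nm=ATP_NM, rna_nm=RNA_NM):
    """Largest fraction of RNA molecules that could carry a tail of tail_len
    if all ATP were incorporated."""
    return min(1.0, atp_nm / (rna_nm * tail_len))


print(ATP_NM / RNA_NM)                        # 200.0 adenylates per RNA, at most, on average
print(adenylates_added(720))                  # 700
print(round(max_fraction_with_tail(700), 3))  # 0.286
```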

Is Mg18 a bona fide processivity factor?
We produced the processivity factor homologue Mg18 encoded by the M. chilensis genome and assayed its binding to Mg561 (Supplementary Figure S3). As expected, Mg18 interacted with Mg561 (KD = 6.2 µM), suggesting that it could play a role similar to that of VP39 in the Vaccinia virus heterodimeric polyadenylation complex. No association was observed between Mg18 and R341, confirming the specificity of the interaction. To further investigate the role of Mg18, we tested the Mg561-Mg18 complex in our polyadenylation assay (Figure 2A and B). The presence of Mg18 increased neither the length of the polyA tails nor the polyadenylation rate, showing that, in contrast to eukaryotic and other viral PAPs, the Mimivirus and M. chilensis enzymes were unaffected by the addition of Mg18. Therefore, Mg18 and VP39 are not functionally equivalent, and the absence of an Mg18 homologue in Mimivirus has no consequence for the polyadenylation process.
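To relate the measured affinity to the assay conditions, a 1:1 binding model gives the fraction of Mg561 expected to be in complex with Mg18. This back-of-the-envelope sketch assumes that the BLI-derived K_D applies in solution, that binding is strictly 1:1, and that RNA or ATP do not change the affinity; the concentrations are those given in the Methods (50 nM PAP, 100 nM Mg18).

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein P in a 1:1 P·L complex at equilibrium.

    Exact solution of KD = [P][L]/[PL] with mass conservation; all arguments
    in the same concentration unit.
    """
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


# Concentrations (µM) from the polyadenylation assay: 50 nM Mg561, 100 nM Mg18.
print(round(fraction_bound(0.05, 0.10, 6.2), 3))  # 0.016
```

Under these assumptions, the model predicts that about 1.6% of Mg561 would be bound to Mg18 at the assay concentrations.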

Structure of the vPAP enzymes
We determined the crystallographic 3D structures of the Mimivirus and M. chilensis PAPs in order to identify the determinants of their processivity. Since they share the same characteristic features, we refer below to the higher-resolution structure of Mg561 (Supplementary Tables S1 and S2). Despite the lack of sequence similarity (15% identical residues), a DALI search (50) returned the Vaccinia virus PAP VP55 (2GA9) as the best structural homologue of the Mg561 monomers (Z-score between 5 and 6, Cα-rmsd of 3.8 Å over 248 residues; Supplementary Table S2). As predicted from the sequence comparison of the mimiviruses PAPs with other viral PAPs (Figure 1), their structures share four subdomains (Figure 3). The D5 domain unique to the Mimivirus PAP is disordered in the crystal structure.
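The Cα-rmsd reported by DALI is computed over aligned residue pairs after the two structures have been optimally superposed. The sketch below shows only the final RMSD step; structural alignment and superposition (for example with the Kabsch algorithm) are assumed to have been done already, and the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between matched atom lists that are ALREADY superposed.

    coords_a[i] and coords_b[i] must be the aligned atom pair (e.g. Cα atoms
    of aligned residues). No optimal fitting (e.g. Kabsch) is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom displaced by 1 Å along z.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(x, y, z + 1.0) for x, y, z in a]
print(rmsd(a, b))  # 1.0
```

The number of aligned residues (248 here) matters as much as the RMSD itself, since a low RMSD over few residues says little about global similarity.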
The dimerization domain. As observed in solution (see below), the Mg561 crystal structure reveals a very stable homodimer with an almost perfect two-fold symmetry axis (179°) and 3200 Å2 of surface area buried upon dimerization (Figures 3 and 4). The dimer structure reveals a large tunnel at the interface between the two monomers. The N-terminal domain (D1) comprises an extended strand followed by a long helix (α1) bent at residue P28 (Figures 1 and 3). In each monomer, this helix creates a domain swap and contacts the D4 loop located between α13 and β16 and the α7 helix of the D2 domain of the other monomer, forming the dimer. In the VP55 structure (PDB: 2GA9, (26)), the equivalent helix (α8, Figure 1B) contacts the D4 (α18 and β16) and D2 (α15) domains of the same molecule. In Mg561, next to the α1 helix, another proline residue (P44) changes the orientation of the main chain, making the first helix of the D2 domain (red) almost perpendicular to the α1 helix. We investigated whether VP55 could form a related dimer at the high protein concentrations found in crystals by examining the crystal structure of unliganded monomeric VP55, which contains two monomers per asymmetric unit (PDB: 3OWG, (51)). Even considering symmetry-related molecules, no such dimer was observed. Conversely, none of the symmetry-related molecules in the Mg561 and R341 structures reproduce the dimer found in the VP55 structure. To further characterize the molecular determinants of Mg561 dimerization and the role of dimerization in polyadenylation activity, we deleted the N-terminal domains of Mg561 (ΔD1 Mg561) and R341 (ΔD1 R341). Gel filtration experiments revealed that the mutants were monomeric and, although they still co-purified with RNA, they were inactive (Figure 4), suggesting that the dimeric state and the N-terminal domain are required for polyadenylation activity.
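Oligomeric state by gel filtration is usually read from a calibration of log(mass) against the partition coefficient Kav of globular standards, and the apparent mass is then compared with the monomer mass computed from the sequence. The sketch below shows that arithmetic; the column volumes and calibration points are hypothetical, and elongated or domain-swapped proteins can elute anomalously, so such masses are estimates only.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient from elution (ve), void (v0) and total (vt) volumes."""
    return (ve - v0) / (vt - v0)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def apparent_mass(ve, v0, vt, standards):
    """Apparent mass (kDa) from a log10(mass) vs Kav calibration.

    standards: list of (elution volume, mass in kDa) for globular markers.
    """
    xs = [kav(v, v0, vt) for v, _ in standards]
    ys = [math.log10(m) for _, m in standards]
    slope, intercept = linear_fit(xs, ys)
    return 10 ** (slope * kav(ve, v0, vt) + intercept)


# Hypothetical column (V0 = 8 ml, Vt = 24 ml) and synthetic calibration points.
standards = [(11.2, 10 ** 2.5), (14.4, 100.0), (17.6, 10 ** 1.5)]
print(round(apparent_mass(12.8, 8.0, 24.0, standards), 1))  # 177.8
```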
The vPAPs catalytic domain. The second domain (D2) exhibits the typical topology of a nucleotidyltransferase (NT) domain, made of a mixed five-stranded β sheet and two connecting helices; as in the VP55 structure, the two helices pack against one side of the β sheet, parallel to the β strands (Figures 1B and 3). The canonical acidic residues D94-E96-D143 are properly positioned and ordered in the crystal structure, as evidenced by the 2Fo-Fc electron density maps contoured at 1σ on the M. chilensis and Mimivirus PAP active sites (Supplementary Figure S6A). In the figures, a water molecule occupying the position of the Mg2+ ion usually found in structures in complex with ATP is used to help locate the active sites (Figures 3 and 5B, Supplementary Figure S4). The helix-turn motif within the β sheet contains a loop 10 residues longer than that of the Vaccinia virus PAP structure. This loop stabilizes the α1 helix of the same monomer in an orientation perpendicular to D2. This domain thus appears equivalent to the catalytic domain of VP55, made of an NT domain extended by one helix (Mg561 α5, VP55 α12), two antiparallel strands (β6-β7) and terminated by two parallel helices (Mg561 α6-α7, VP55 α13-α15). This topology thus defines the catalytic domain of viral PAPs (Figure 1B).
A duplication event at the origin of the internal symmetry of the vPAPs. The third domain (D3) of the Mg561 structure surprisingly mimics the catalytic domain topology (Figures 1B and 3), leading to an internal symmetry between the D2 and D3 domains (rotation of 162°, 2.4 Å translation and Cα-rmsd > 2.0 Å; Supplementary Figure S4, Table S2). In D3, two of the three catalytic positions carry non-canonical residues (F286, E288, K342), and an even longer insertion (37 amino acids) is present within the helix-turn motif of the NT-like subdomain (Supplementary Figure S2). An equivalent of the α5 helix is also missing. Unexpectedly, the Vaccinia virus PAP D3 domain also follows this topology (Cα-rmsd >5 Å with Mg561 D2 and ∼2.5 Å with Mg561 D3; Supplementary Figure S4, Table S2), but with an extra β strand (β8) at the N-terminus of the NT-like subdomain replacing the canonical first helix (Figure 1B and Supplementary Figure S2). In the VP55 structure, the two subdomains are related by a rotation of 179° and a translation of 5 Å, leading to a Cα-rmsd of ∼2.6 Å (Supplementary Figure S4, Table S2). The equivalents of the α10 and α12 helices of the NT domain (α3 and α5 in Mg561) are also missing (Figure 1B). This suggests that, despite the lack of sequence homology between the two domains, the D3 domain might have originated from an ancestral duplication of the catalytic domain, a duplication that might be a hallmark of the PAPs of all large DNA viruses. The two subdomains appear more divergent in the VP55 structure (Table S2), which could also explain why the internal symmetry of the molecule was not initially recognized.
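Rotation angles such as the 162° (D2/D3 in Mg561) or 179° (VP55) quoted above come from the superposition operator relating the two subdomains. For any proper rotation matrix R, the angle follows from its trace, trace(R) = 1 + 2 cos θ, independently of the axis direction. The sketch below assumes R has already been obtained from a superposition program; the translation component (2.4 Å or 5 Å) requires the full transformation and is not computed here.

```python
import math


def rotation_angle_deg(r):
    """Rotation angle (degrees) of a proper 3x3 rotation matrix.

    Uses trace(R) = 1 + 2 cos(theta); the cosine is clamped to [-1, 1]
    to absorb rounding noise near 0 and 180 degrees.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation matrix for a rotation of deg degrees about the z axis."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


for angle in (162.0, 179.0):
    print(round(rotation_angle_deg(rot_z(angle)), 6))
```

A perfect two-fold axis corresponds to 180°, so values such as 179° indicate near-exact two-fold symmetry, while 162° indicates a clearly imperfect pseudo-symmetry.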
The D4 domain. In Mg561, the D4 domain of monomer A extends to the interface between the two monomers, contacting the α1 helix of monomer B. It then runs toward the N-terminal end of monomer A, providing an extra β strand (β17) antiparallel to β3 of its D2 domain, instead of the two β strands provided by the D4 domain in the VP55 structure (β17 and β18). It finally runs back to terminate on top of the D3 domain of monomer A, the end of the structure being disordered (17 and 15 residues for monomers A and B, respectively). The interaction with the D3 domain is stabilized by a conserved disulfide bridge (C350-C426) connecting the beginning of the β14 strand and the β16 strand in the D4 domain (Figure 1). The two D4 domains delineate what we define as the entrance of the central tunnel at the dimer interface. This domain is the most basic in both sequences (predicted isoelectric points >10). Because the D5 extension unique to the Mimivirus PAP is disordered in the crystal structure, we have no information on its structure or function.
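Predicted isoelectric points such as the >10 values quoted for the D4 domain are usually computed from sequence with a Henderson-Hasselbalch model: the net charge is summed over ionizable groups and the pH of zero net charge is located numerically. The study does not say which tool was used; the pKa values below are approximate values chosen for illustration (an assumption, since prediction tools use different sets), and the test peptides are synthetic.

```python
# Approximate pKa values (an assumption; prediction tools use different sets).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))     # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))    # free carboxyl terminus
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    seq = seq.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:   # still positive: the pI is higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(isoelectric_point("G"), 2))  # 6.1, midway between the terminal pKa values
```

Bisection is safe here because the net charge decreases monotonically with pH.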
ATP binding site and nucleic acid recognition. Electrostatic potential calculations highlight intensely positively charged grooves on each side of the entrance of the central tunnel, contributed mostly by the positively charged D4 domains and to a lesser extent by the D2 and D3 domains (Figure 5A and Supplementary Figures S5 and S6B). RNAs could therefore interact with the PAP through these positively charged grooves and reach the catalytic sites through the 10 Å-wide central tunnel, which makes both catalytic sites accessible to a ssRNA molecule. We refer to this side as the entrance side. The processivity of the enzyme could be explained by its dimeric state, which positions and stabilizes the RNA molecule. Superimposition of the catalytic domain of Mg561 with the ATP-bound VP55 chain from the Vaccinia virus VP55-VP39 heterodimer structure (PDB 2GA9) showed that the residues involved in ATP binding are structurally conserved and ordered in the crystal structure, as evidenced by the 2Fo-Fc electron density maps (Supplementary Figure S6A). Consistent with the electrostatic properties of the molecular surface, superimposition of one monomer of the Mg561 structure with VP55 in complex with ATP and RNA (PDB 3ERC) suggests that both ATP and RNA could bind Mg561 in a similar way to VP55 (Figure 5B), and that the RNA molecule could enter through the central tunnel formed by the dimer (Figure 5C). We thus propose that the highly basic D4 domains assume the RNA-binding function performed by VP39 in the VP55-VP39 heterodimer (Supplementary Figure S6B). We also observed that a second tunnel, visible on each side of the central tunnel, leads directly to the ATP binding site and could therefore be the entry path for ATP toward the active site (Figure 5D). The processivity factor VP39 binds VP55 at this location, and the tunnel is absent from the VP55 structures, both alone and in complex with VP39 (2GA9 (26), 3OWG (51)). This major difference, which would allow ATP to flow continuously through the tunnel, could also contribute to the processivity of the mimiviruses PAPs. Interestingly, positively charged residues line the central tunnel; they are provided by α1A (R24-K32-R35) and α7B (R189-K192-R196) on one side of the tunnel and by α1B and α7A on the other (Supplementary Figure S6C). These charged residues are mostly conserved in the mimiviruses sequences (Supplementary Figure S2). They could create an outlet for the inorganic pyrophosphate (PPi) produced during polymerization, again contributing to the processivity of the PAPs.

3′-end hairpin recognition and cleavage
We synthesized in vitro the M. chilensis Mg592 transcript with its 3′ UTR hairpin, as predicted by Mfold (52), and analyzed the products polyadenylated by the vPAPs (Supplementary Figure S7). All transcripts were polyadenylated at their 3′-extremities when using Mg561, R341 or the bacterial PAP used as a control, confirming that other proteins are required for the recognition and processing of the hairpin. The Mimivirus and M. chilensis PAP genes belong to a conserved gene cluster predicted to also encode a protein with tandem RNase III-like domains (R343 in Mimivirus and Mg559 in M. chilensis). Interestingly, the structure of the eukaryotic enzyme Rnt1p, which has a single RNase III domain, in complex with an RNA hairpin (PDB: 4OOG (53)) reveals that Rnt1p recognizes its hairpin substrate as a homodimer, allowing the excision of the hairpin structure. In the Mimivirus and M. chilensis RNase III-like sequences, the residues involved in dimer formation, as well as those responsible for dsRNA binding, are conserved, and both dsRNA-binding domains have predicted isoelectric points >10, as expected for nucleic acid-binding domains. The catalytic and metal-binding residues are perfectly conserved in the N-terminal RNase domain, whereas the metal-coordinating residues are not conserved in the second RNase domain. This suggested that the mimiviruses RNase III-like enzyme, as a monomer with two RNase III-like domains, can recognize the hairpin in the same way as the homodimeric Rnt1p but is unable to excise it. Instead, it could introduce a nick within the hairpin, allowing polyadenylation by the viral PAP inside the hairpin. To test this hypothesis, we produced the Mimivirus R343 RNase III-like protein and assayed its enzymatic activity in vitro on an RNA corresponding to the 3′-end of the Mimivirus R418 transcript (Figure 6).
Our results showed that R343 was able to cleave this RNA, generating two cleavage products inside the canonical hairpin in a metal-dependent manner. In the presence of Mg2+, a major product of 34 nt and a minor one of 51 nt were observed, whereas only the 51-nt product was observed in the presence of Mn2+. Interestingly, the major product observed in the presence of Mg2+ corresponded to the polyadenylation site previously identified in vivo for this transcript (22), suggesting that R343 preferentially uses Mg2+ in the cell. More importantly, this demonstrated that R343 is the enzyme responsible for canonical hairpin recognition and its specific cleavage (Supplementary Figure S1C, step 5). The presence of the R341 PAP had no impact on cleavage. To determine whether R343 and R341 were sufficient for 3′-end mRNA maturation, we sequentially incubated the Mg592 transcripts with the two enzymes and analyzed the polyadenylated products. We also sequenced the corresponding in vivo transcripts extracted from A. castellanii cells infected by M. chilensis. We observed the same polyadenylation site within the hairpin after R343 incubation with Mg2+ or Mn2+, but this site differed from that of the transcripts polyadenylated in vivo (Supplementary Figure S7). This suggested that a third protein could be required to direct the in vivo location of the nick, although we cannot rule out that the Mimivirus and M. chilensis enzymes cleave at different sites on the same hairpin, in which case the Mg559 protein could produce the same nick on the Mg592 hairpin in vivo and in vitro.

Mg18 is a SAM-dependent ribose 2′O MTase
Although we postulated that a cap-specific 2′O MTase is encoded in the conserved gene cluster that includes the mimivirus mRNA capping enzyme gene, we wondered whether Mg18, like its Vaccinia virus homologue VP39, could retain 2′O MTase activity (Supplementary Figure S1C, step 3). Sequence analysis revealed that Mg18 harbors the conserved catalytic K-D-K-E tetrad shared by all known 2′O MTases, including those of poxvirus VP39 (33), flaviviruses (54), coronaviruses (55) and rhabdoviruses (56). In the VP39 structure, the cap is recognized through stacking of two aromatic side chains (Y22 and F180) and hydrogen bonding with two acidic side chains (D182 and E233). The first aromatic residue is conserved in Mg18 (Y59), whereas the loop bearing F180 and D182 is 4 amino acids shorter in Mg18, suggesting that Mg18 may be less specific for cap residues than VP39 (57). We compared the MTase activities of Mg18, VP39 and the human N7 MTase (hN7 MTase) on several RNA substrates (Figure 7). First, Mg18 was much more active on capped GpppAN13 than on uncapped pppAN13 and therefore depends on 5′ guanylylation. In contrast to VP39, Mg18 had comparable activity on 7MeGpppAN13 and GpppAN13, so its activity does not depend on the methylation status of the cap. Mg18 was about 10 times less efficient on 2′O-methylated capped RNAs (GpppA2′OMeN13) than on N7-methylated or unmethylated caps (7MeGpppAN13 or GpppAN13), indicating that the residue already methylated in GpppA2′OMeN13 is the methylation target. As expected, the addition of Mg561 had no influence on the 5′-GpppA-specific MTase activity of Mg18 (Supplementary Figure S8). We then investigated the methylation state of the A. castellanii polyA+ mRNA caps using VP39 and the hN7 MTase. The A. castellanii mRNAs were not methylated by either MTase, suggesting that their caps were already methylated at both the N7 and the ribose 2′O positions.
To our surprise, Mg18 exhibited strong MTase activity on polyadenylated A. castellanii mRNAs (Figure 7), suggesting that Mg18 has two distinct activities: the expected 5′-GpppA-specific 2′O MTase activity and a second, cap-independent one. To investigate this additional function, we assayed Mg18 activity on uncapped homopolymeric poly(A), poly(U), poly(C) and poly(G) substrates and found an even stronger MTase activity on polyadenylates. In the absence of any cap structure, Mg18 is thus able to methylate the ribose 2′O position of internal nucleotides, with a clear specificity for adenylates, suggesting that it can also methylate mRNA polyA tails (Supplementary Figure S1C, step 7).

DISCUSSION
Transcript maturation is a central step in eukaryotic gene expression and involves complex protein machineries responsible for transcription termination, 5′-end capping of the mRNA, N7 and/or 2′O methylation of the cap, and mRNA 3′-end cleavage and polyadenylation. Maturation enhances translation by increasing mRNA stability and affinity for the ribosome. Viruses infecting eukaryotes have evolved their own strategies to favor the expression of their genes, either by hijacking the host machinery or by using virally encoded proteins. For members of the mimiviruses sub-family, the transcription enzymes are virally encoded and loaded in the virions, allowing transcription to proceed immediately upon infection. These viruses all share a specific signal governing the polyadenylation of their transcripts, which correlates with a hairpin structure. Mimivirus and M. chilensis share a remote homologue of viral and cellular PAPs, located in a cluster of three conserved genes that is transcribed as a polycistronic mRNA encompassing all three protein-coding regions (22).

Nucleic Acids Research, 2015, Vol. 43, No. 7 3785
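The hairpin signal can be illustrated with a simple scan for inverted repeats, the sequence feature that lets a transcript 3′ end fold back on itself into a stem-loop. The Python sketch below is a simplification and is not equivalent to Mfold: it reports only perfect Watson-Crick stems and computes no folding free energy. The stem and loop length limits (`stem_len`, `min_loop`, `max_loop`) are arbitrary illustrative assumptions, as is the hypothetical function name `find_hairpin_stems`.

```python
# Complement table for DNA letters; RNA input is converted U -> T first.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hairpin_stems(seq: str, stem_len: int = 8,
                       min_loop: int = 3, max_loop: int = 10) -> list[tuple[int, int]]:
    """Locate exact inverted repeats that could fold into a stem-loop.

    Returns half-open (start, end) intervals spanning stem + loop + stem.
    Only perfect Watson-Crick stems are detected; no folding energy is computed.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        # The downstream arm must be the reverse complement of the upstream arm.
        target = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the downstream arm
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, j + stem_len))
    return hits
```

For instance, in the synthetic sequence `"TTTT" + "GACCTGAG" + "TTTTT" + "CTCAGGTC" + "TTTT"`, the scan reports the interval (4, 25), covering both 8-nt arms and the 5-nt loop. Real polyadenylation hairpins may contain mismatches or G·U pairs, which is why energy-based folding tools such as Mfold are used to predict them.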
We verified that the predicted Mimivirus and M. chilensis PAPs were able to perform polyadenylation. These proteins were not only active but also intrinsically self-processive, generating polyA tails more than 700 nucleotides long. To elucidate the molecular basis of this processivity, we determined the crystal structures of the Mimivirus and M. chilensis PAPs, which revealed two topologically identical subdomains with a nucleotidyltransferase fold. A similar topology is still recognizable in the Vaccinia virus PAP structure and appears conserved in other NCLDVs, suggesting that the vPAPs originated from an ancestral duplication. Surprisingly, the Mimivirus and M. chilensis enzymes are homodimeric, and their 3D structures suggested that dimer stability depends on the N-terminal domain, the deletion of which led to monomeric, enzymatically inactive proteins that were nevertheless still able to bind RNA. Thus, whereas other PAPs form heterodimers with processivity factors, the Mimivirus and M. chilensis PAPs become processive upon homodimerization. Since the Mimivirus and M. chilensis PAPs add polyA tails to the 3′ end of any mRNA, additional proteins are required to recognize the hairpin structure and cleave it prior to polyadenylation. We showed that this role is played by the product of the third gene of the PAP gene cluster, which can both recognize the hairpin and cleave the mRNA within it. However, the RNase III-like sequences lack the elements that act as a ruler in eukaryotic enzymes (53) and define the precise location of hairpin cleavage. The R343 cleavage site in vitro is not the same as in vivo, suggesting that another factor is required to achieve the highly specific 3′-end processing of viral transcripts. We hypothesize that the second gene of the cluster (R342 and Mg560), which encodes a protein made of an N-terminal basic domain and a C-terminal acidic domain, could play the role of the ruler through specific interactions with the RNase and the RNA. This hypothesis remains to be tested experimentally.
Another step of mRNA maturation is capping and cap methylation. All members of the mimiviruses sub-family encode an mRNA capping enzyme that also performs N7 methylation of the cap (31), and a cap-specific guanine-N2 methyltransferase that forms a 2,7-dimethylguanosine (DMG) cap, which could favor viral protein synthesis (32). The mRNA capping enzyme genes are located next to genes predicted to encode a 2′O methyltransferase, and both proteins are loaded in the virions. Homologues of this predicted 2′O MTase are found in all NCLDVs, including CroV (CroV442, which is also loaded in the virion (58)), PgV (9) and OLPV (10). A 2′O MTase activity has also been reported in Asfarvirus virions (16). Therefore, both proteins from the mRNA capping cluster could be involved in mRNA 5′-end maturation. In contrast, the poxviruses encode an inactive homologue of this 2′O MTase (D12), whose function is performed by the dual-function VP39 protein (59,60). M. chilensis encodes a homologue of the Vaccinia VP39 factor, the Mg18 protein, which has retained 2′O MTase activity and is also able to internally methylate mRNA polyA tails (Supplementary Figure S1C, step 7). PolyA tail synthesis and internal methylation could thus occur concomitantly through an interaction between the M. chilensis PAP and Mg18. To date, the flavivirus NS5 MTase is the only enzyme reported to specifically methylate internal adenosines of the viral genomic RNA at the 2′O position, increasing its stability (61). M. chilensis and the other Mimiviridae able to internally methylate their mRNAs could take advantage of this property. The longer polyA tails of viral mRNAs, compared with host mRNAs, might further increase their stability by delaying their degradation in the cytoplasm and thus favor their translation over that of host mRNAs. Although the natural host of M. chilensis is not known, the differences in transcript maturation observed between Mimivirus and M. chilensis could be the signature of their specificity toward their respective natural hosts.
The use of a hairpin signal for viral mRNA maturation was first reported in Mimivirus and is shared by all mimiviruses (18,22). Their genomes are AT-rich, which could explain why they use a structural signal for polyadenylation instead of the canonical 'AATAAA' sequence, which can occur by chance throughout such genomes. Some other AT-rich DNA viruses, such as the poxviruses, have conserved polyuridylate sequences at specific positions relative to the 3′ ends of genes, which are used as termination signals. As a result, poxvirus transcripts exhibit a very large number of distinct polyadenylated 3′ ends, consistent with the absence of a highly specific recognition motif (62). In the few Mimivirus transcripts that use the canonical AATAAA signal, its position relative to the transcript end is also much less constrained than that of the palindromic structure. Hairpin-based processes have been documented in the cellular world for RNA maturation. In prokaryotic Rho-independent transcription termination (63), transcription stops when the nascent RNA forms a short G-C-rich hairpin followed by a run of uridylates, which leads to the release of the RNA from the DNA template. In eukaryotes, histone mRNA maturation also involves short hairpin structures, but these mRNAs are not polyadenylated and their 3′ end is formed by endonucleolytic cleavage of the pre-mRNA (64). The 3′-end maturation of non-coding RNAs (small nuclear RNAs, small nucleolar RNAs and miRNAs in eukaryotes, rRNAs in prokaryotes) also involves hairpin structures (65). In these cases, a dimeric RNase III recognizes the hairpin and excises it, but the resulting products are not polyadenylated.
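The argument that 'AATAAA' is a poor signal in an AT-rich genome can be made quantitative with a back-of-the-envelope calculation. Assuming, for simplicity, that bases are independent, with A and T each at half the AT content, a hexamer made only of A and T occurs at a given position with probability (AT/2)^6. The Python sketch below applies this to a genome of 1.26 Mb (the largest genome size given in the Introduction) and compares 50% and 70% AT content. Real genomes are not random sequences, so the numbers are only an order-of-magnitude illustration, and the function names are hypothetical.

```python
def motif_probability(motif: str, at_content: float) -> float:
    """Probability of a motif at one position, assuming independent bases.

    A and T each occur with frequency at_content / 2; G and C each with
    (1 - at_content) / 2.
    """
    p_at = at_content / 2
    p_gc = (1 - at_content) / 2
    prob = 1.0
    for base in motif.upper():
        prob *= p_at if base in "AT" else p_gc
    return prob


def expected_occurrences(motif: str, genome_len: int, at_content: float,
                         both_strands: bool = True) -> float:
    """Expected number of chance matches in a random genome of given length."""
    positions = genome_len - len(motif) + 1
    strands = 2 if both_strands else 1  # AATAAA is not its own reverse complement
    return strands * positions * motif_probability(motif, at_content)


if __name__ == "__main__":
    for at in (0.50, 0.70):
        n = expected_occurrences("AATAAA", 1_260_000, at)
        print(f"AT = {at:.0%}: ~{n:,.0f} expected AATAAA matches")
```

Under these assumptions, the expected number of chance AATAAA matches on both strands rises from about 600 at 50% AT to about 4,600 at 70% AT, a (0.35/0.25)^6 ≈ 7.5-fold increase, which illustrates why the motif alone would be a poor marker of specific transcript ends in these genomes.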
The alternative solutions evolved by mimiviruses to carry out transcription, a central cellular process, question the general idea that viruses merely hijack the host cell machinery or recruit cellular processes through lateral gene transfer. Even if some viral enzymes are reminiscent of cellular ones, the overall mimivirus machinery is markedly different and appears less complex than the multiprotein machinery needed for cellular RNA maturation. A single protein both caps the mRNAs and methylates them at the N7 position. The 3′-end maturation of transcripts also involves a minimal machinery of two proteins instead of the five required in eukaryotes. The recently discovered Pithovirus sibericum, a 30,000-year-old virus revived from permafrost, although not a member of the Mimiviridae, also has an AT-rich genome and obeys the hairpin rule for transcript maturation (66). Interestingly, it encodes an RNase III-like protein, suggesting that it uses a similar transcript maturation process.
By definition, viruses are forced to rely on the host translation apparatus. This creates a fundamental bottleneck for their mRNAs to gain access to protein synthesis, in particular for viruses that replicate exclusively in the cytoplasm. To win this contest against the cell's own mRNAs, viruses must constantly evolve new strategies. Different virus families have independently evolved solutions ranging from ribosome editing and degradation of cellular mRNAs to stabilization of their own messengers (for review see (67)). The minimal molecular machinery set up by the Mimiviridae to produce and stabilize their own mRNAs is another illustration of these variations on a theme in the arms race with their hosts.

ACCESSION NUMBERS
The 3D structures have been deposited in the Protein Data Bank under the accession numbers: 4P37 (Mg561) and 4WSE (R341).

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.