Structure of the large terminase from a hyperthermophilic virus reveals a unique mechanism for oligomerization and ATP hydrolysis

Abstract The crystal structure of the large terminase from the Geobacillus stearothermophilus bacteriophage D6E shows a unique relative orientation of the N-terminal adenosine triphosphatase (ATPase) and C-terminal nuclease domains. This monomeric ‘initiation’ state with the two domains ‘locked’ together is stabilized via a conserved C-terminal arm, which may interact with the portal protein during motor assembly, as predicted for several bacteriophages. Further work supports the formation of an active oligomeric state: (i) AUC data demonstrate the presence of oligomers; (ii) mutational analysis reveals a trans-arginine finger, R158, indispensable for ATP hydrolysis; (iii) the location of this arginine is conserved with the HerA/FtsK ATPase superfamily; (iv) a molecular docking model of the pentamer is compatible with the location of the identified arginine finger. However, this pentameric model is structurally incompatible with the monomeric ‘initiation’ state, a conclusion supported by the observed increase in kcat of ATP hydrolysis from 7.8 ± 0.1 min−1 to 457.7 ± 9.2 min−1 upon removal of the C-terminal nuclease domain. Taken together, these structural, biophysical and biochemical data suggest a model where transition from the ‘initiation’ state into a catalytically competent pentameric state is accompanied by substantial domain rearrangements, triggered by the removal of the C-terminal arm from the ATPase active site.


INTRODUCTION
Most dsDNA bacteriophages and viruses utilize a powerful DNA translocation motor to package their unit-length genome into a preformed procapsid (1,2). A central component of the motor, the large terminase, consists of an N-terminal adenosine triphosphatase (ATPase) domain comprising ASCE (Additional Strand Catalytic glutamate) and lid subdomains, and a C-terminal nuclease domain (3,4), connected by a short linker sequence. While both domains are involved in interaction with DNA (5,6), DNA translocation is powered by the ASCE subdomain using the energy released by ATP hydrolysis (7,8). Due to topological differences between viral large terminases and other ASCE ATPases, they were classified into a unique division within the superfamily comprising HerA/FtsK, RecA and PilT ATPases (9,10). The nuclease domain is responsible for cleavage of the genomic DNA concatemer at both the initiation and completion stages of viral DNA packaging (11). This domain is a member of the RNase H-like endonuclease superfamily and has the highest similarity with the RuvC endonucleases (12-14).
Unlike hexameric ATPases (10,15), evidence suggests that the large terminase motor likely assembles into pentameric rings (3,5,16,17). The mechanism of the ATPase motor assembly, and how ATP hydrolysis is coupled to DNA translocation, remain poorly understood. Various models have been proposed, including the electrostatically driven mechanism (3), the chemo-mechanical coupling model (4) and the pentameric trans-finger model (5). According to the electrostatic model proposed on the basis of the T4 large terminase (3), ATP hydrolysis within the ASCE subdomain induces rotation of the lid subdomain (also referred to as the 'transmission domain'), which triggers electrostatic pulling of the nuclease domain towards the ATPase domain, translocating DNA (3). However, according to the chemo-mechanical coupling model based on structural studies of the Sf6 large terminase, DNA is translocated by the conformational changes of the lid ('linker') subdomain upon ATP hydrolysis (4). Notably, both these models involve the participation of a cis-acting P-loop arginine (cis-arginine) to trigger the movement of the lid subdomain upon ATP hydrolysis. Recently, a pentameric trans-acting arginine finger (trans-arginine) model has been reported for the phi29 packaging ATPase and the P74-26 large terminase ATPase. Distinct from the previous two models, this model compares the large terminase ATPase motor

MATERIALS AND METHODS
The DNA fragment encoding the full-length D6E large terminase (residues 1-427) was synthesized (Genewiz USA Inc.) with codons optimized for Escherichia coli protein expression. This DNA fragment was amplified by PCR and re-cloned into the expression vector pET-YSBLIC3C using ligation-independent cloning (20), resulting in a sequence encoding a protein with an N-terminal 6-histidine tag fused to the human rhinovirus 3C protease cleavage site. 
Site-directed mutagenesis was used to introduce a stop codon for the ATPase domain construct or codon changes for mutant variants of the full-length large terminase, using the CloneAmp™ HiFi PCR Premix (Takara Bio USA, Inc). The full-length terminase, ATPase domain and all mutants were expressed in E. coli BL21-Gold (DE3) (Agilent Technologies USA, Inc) cultured in LB medium containing 30 µg/ml kanamycin. Cells were grown at 37 °C until OD600 reached 0.6-0.8, followed by induction with 1 mM isopropyl 1-thio-β-D-galactopyranoside (IPTG) and further growth for 2 h before harvesting by centrifugation for 20 min at 5000 × g at 4 °C. Pellets were frozen at −80 °C until purification.
Before sonication, cell pellets of the D6E large terminase proteins were resuspended in buffer A (20 mM Tris pH 7.5, 1 M NaCl) containing 1 mM AEBSF, 0.5 µg/ml leupeptin, 0.7 µg/ml pepstatin and 0.1 mg/ml lysozyme. The lysate was clarified by centrifugation at 19 000 × g for 1 h and filtration using a 0.45 µm filter. Proteins were first purified by nickel affinity chromatography with a His-Trap column (GE Healthcare) equilibrated with buffer A containing 10 mM imidazole, and eluted with a 10-500 mM imidazole linear gradient in buffer A. The eluted target protein fractions were collected and dialyzed into 20 mM Tris pH 7.5, 250 mM NaCl, at 4 °C overnight. During the dialysis, HRV 3C protease was added to the protein in a 1:100 (w/w) ratio to remove the N-terminal 6-His-tag. Digested protein samples were applied to the His-Trap column as before, and the flow-through was concentrated and applied to a Superdex 200 Hiload 16/60 column pre-equilibrated in 20 mM Tris-HCl, pH 7.5, and 250 mM NaCl (buffer B). The ATPase domain was purified in the same way as the full-length protein, except that 1 M NaCl was used in the dialysis buffer and buffer B. The final protein samples were concentrated to 10-40 mg/ml, snap frozen in liquid nitrogen and stored at −80 °C until further use.

Analytical ultra-centrifugation
Sedimentation velocity experiments were performed to analyse the oligomeric state of the full-length D6E large terminase protein. Purified protein (at a concentration of 4 µM) was dialyzed extensively against a buffer containing 10 mM HEPES pH 7.5 and 50 mM potassium acetate using a 3500 Da cut-off dialysis device prior to the experiment. The dialyzed protein sample, together with the dialysis buffer used as a reference, was pipetted into sector-shaped cells with quartz windows and analysed in a Beckman Optima XL-A analytical ultracentrifuge. The sample was centrifuged at 35 000 rpm at 20 °C and radial scans at a wavelength of 280 nm were obtained continuously. Sedimentation profiles were analysed using Sedfit (21) with a partial specific volume of 0.74 ml/g. The buffer density (1.0014 g/ml) and viscosity (0.0102 Poise) were estimated for the potassium acetate buffer using Sednterp tabulated values for sodium acetate (22). Fits were considered to be satisfactory if the rmsd was less than 0.08 and the residuals were random and <10% of the original signal.

Thermal shift assays
This assay measures the fluorescence emitted when a fluorescent probe binds to hydrophobic regions that become exposed as the protein progressively denatures with increasing temperature. The assay was performed using 5 µM purified WT full-length D6E large terminase protein in the presence of 5 mM ADP or non-hydrolysable ATP analogues and 5 mM MgCl2 in a 20 µl mixture containing 5 × SYPRO® Orange (diluted from 5000 × stock), 5 mM HEPES pH 7.5 and 5 mM potassium glutamate, unless otherwise noted. As magnesium chloride was added to the assay buffer to facilitate nucleotide binding, its effect on protein stability was evaluated prior to the addition of ADP or non-hydrolysable ATP analogues. Prior to measurement, samples were centrifuged at 568 × g for 1 min to remove precipitate and bubbles. Melting curves were obtained in the temperature range of 25 °C to 95 °C at 1 °C/min on a real-time q-PCR machine. Each condition was measured three times. Tm values were obtained by globally fitting the melting curves with a sigmoid function.
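The Tm extraction described above can be illustrated with a minimal sketch. It assumes a two-state Boltzmann sigmoid, which is one common choice; the text does not state the exact functional form or the fitting software, and it describes a global fit, whereas this sketch fits each curve separately by a coarse grid search. The function names, the synthetic curves and the apo Tm of 60 °C are hypothetical; only the 7.8 °C shift is taken from the reported ATP-γ-S stabilization.

```python
import math

def boltzmann(temp, f_min, f_max, tm, slope):
    """Two-state sigmoid: fluorescence rises from f_min to f_max around tm."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp) / slope))

def fit_tm(temps, signal, slopes=(0.5, 1.0, 1.5, 2.0, 3.0, 4.0), step=0.1):
    """Grid-search Tm and slope; plateaus are taken from the data extremes."""
    f_min, f_max = min(signal), max(signal)
    best = (float("inf"), None, None)
    n_steps = int(round((max(temps) - min(temps)) / step))
    for i in range(n_steps + 1):
        tm = min(temps) + i * step
        for slope in slopes:
            # Sum of squared residuals between model and measured curve.
            sse = sum((boltzmann(t, f_min, f_max, tm, slope) - y) ** 2
                      for t, y in zip(temps, signal))
            if sse < best[0]:
                best = (sse, tm, slope)
    return best[1], best[2]

# Synthetic melting curves (hypothetical values), 25-95 °C in 1 °C steps.
temps = [float(t) for t in range(25, 96)]
apo = [boltzmann(t, 100.0, 1000.0, 60.0, 2.0) for t in temps]
bound = [boltzmann(t, 100.0, 1000.0, 67.8, 2.0) for t in temps]

tm_apo, _ = fit_tm(temps, apo)
tm_bound, _ = fit_tm(temps, bound)
delta_tm = tm_bound - tm_apo  # stabilization upon ligand binding
```

In practice a least-squares optimizer with free baselines would replace the grid search; the grid is used here only to keep the sketch dependency-free.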

Coupled ATPase assays
Coupled enzyme assays were performed at room temperature using: 6 U/ml pyruvate kinase, 6 U/ml lactate dehydrogenase, 1 mM phosphoenolpyruvate, 340 µM NADH, 10 mM HEPES pH 7.5, 50 mM potassium glutamate, 10 mM magnesium chloride and 0.5 mM ATP. Unless otherwise stated, reactions contained protein concentrations above the threshold for linearity: 4.5 µM full-length WT, mutant or equimolar mixtures of mutant large terminase proteins; or 0.35 µM ATPase domain. Absorbance was measured using sub-micro cell quartz cuvettes in a Cary 100 UV-visible spectrophotometer at a wavelength of 340 nm. In this assay, every NADH oxidized to NAD+ corresponds to one ATP hydrolyzed. To convert the measured absorbance directly to ATP concentration, we used the NADH extinction coefficient of 6077 M−1 cm−1 as measured under our experimental conditions. The ATP hydrolysis rates obtained at various ATP concentrations were fitted with the Hill equation (23) using OriginPro 2017 software.
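The conversion from absorbance to ATP turnover follows from the Beer-Lambert law and the 1:1 NADH:ATP stoichiometry stated above. The sketch below is illustrative only: the 1 cm path length, the function names and the example slope are assumptions, not values from the text; the extinction coefficient and the 4.5 µM enzyme concentration are taken from the assay description.

```python
EPSILON_NADH = 6077.0  # M^-1 cm^-1, extinction coefficient reported in the text

def atp_rate_uM_per_min(dA340_per_min, path_cm=1.0):
    """ATP hydrolysis rate (µM/min) from the slope of A340 versus time.

    NADH is consumed, so the slope is negative; one NADH oxidized
    corresponds to one ATP hydrolyzed. path_cm is an assumed path length.
    """
    molar_per_min = -dA340_per_min / (EPSILON_NADH * path_cm)
    return molar_per_min * 1e6  # M -> µM

def turnover_per_min(rate_uM_per_min, enzyme_uM):
    """Apparent turnover (min^-1) = rate divided by enzyme concentration."""
    return rate_uM_per_min / enzyme_uM

# Hypothetical slope chosen for illustration: -0.2133 absorbance units/min
# with 4.5 µM full-length protein.
rate = atp_rate_uM_per_min(-0.2133)
turnover = turnover_per_min(rate, 4.5)
```

With these illustrative inputs the turnover comes out close to the 7.8 min−1 reported for the full-length protein, which is why that slope was chosen.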

Crystallization and post crystallization manipulation
Crystallization was performed at 20 °C by sitting drop vapour diffusion using 8 mg/ml protein solution in 20 mM Tris pH 7.5, 250 mM NaCl. Protein solution (0.5 µl) was mixed with an equal volume of precipitant before equilibrating against 100 µl of the reservoir solution. Crystals of the full-length large terminase (1-427) grew with 0.1 M HEPES pH 8.0, 1.2 M ammonium sulfate in the reservoir (Supplementary Table S1, Crystal form 1) and were cryoprotected in a solution containing 1.2 M ammonium sulfate, 0.1 M HEPES pH 8.0 and 25% (v/v) glycerol. To aid structure determination, 1 mM manganese chloride was added to the protein before crystallization. The X-ray structure, determined by multi-wavelength anomalous diffraction (MAD), showed that the ATPase domain was disordered in the crystal. Soaking these crystals in 5 M sodium chloride or 3.5 M ammonium citrate for 1 min resulted in a new, better diffracting crystal form, where the ATPase domain was ordered (Table 1, Crystal form 2). A sulfate ion, present in the crystallization condition, was found in the ATPase active site. Initial attempts to obtain structures for complexes with ADP and non-hydrolysable ATP analogues were performed by soaking. While addition of ATP-γ-S abrogated the diffraction, in the presence of ADP, AMP-PNP or AMP-PCP the diffraction quality was retained but no ligand was observed in the electron density. Subsequent co-crystallization trials with ADP or non-hydrolysable ATP analogues using the WT protein resulted in crystals with the same characteristics as obtained in soaking experiments, with no ligand bound in the active site. A new construct, in which the C-terminal arm residues 418-427 were deleted, was used in further trials. Crystals grew in the same conditions as the full-length protein, but were soaked in 4 M sodium formate in the presence of 50 mM ATP-γ-S or 100 mM ADP for 16 h, with 100 mM MgCl2 to facilitate nucleotide binding. 
Apo crystals were obtained using a similar procedure but with the omission of nucleotides and MgCl2. While the soaked crystals of this new construct retained diffraction quality, the crystal form was different from that obtained for the WT protein (Table 1, Crystal form 3).

Data collection and structure determination
Diffraction data were collected at Diamond Light Source beamlines I02 and I04 (Table 1) and processed using XDS (24). The structure of crystal form 1, containing a disordered ATPase domain and an ordered nuclease domain with bound Mn2+, was determined by MAD using SHELXD (25) and SOLVE (26), followed by density modification (RESOLVE) (27) and model building (Buccaneer) (28). Structures of crystal forms 2 and 3 were determined by molecular replacement, using Phaser (29). Iterative cycles of model building and refinement were carried out using Coot (30) and REFMAC5 (31). Chimera (32) was used for figure generation.

Molecular docking
The pentameric D6E large terminase ATPase model was generated with M-ZDOCK (33), following an approach similar to that reported recently for the Thermus P74-26 large terminase (5). Prior to docking, two extended and potentially flexible solvent-exposed loops, residues 24-35 and 159-164, were removed, as their conformations in the monomeric and pentameric states may differ significantly.

Oligomeric state
Like T4 (34), SPP1 (35) and Sf6 (4), the full-length D6E large terminase is monomeric in solution, as judged by size exclusion chromatography (SEC) (Supplementary Figure S1A). However, subsequent analysis by sedimentation velocity analytical ultracentrifugation (SV-AUC) revealed the presence of multiple species with molecular weights corresponding to monomeric, dimeric, trimeric, tetrameric and pentameric assemblies (Supplementary Figure S1B), along with a small fraction of larger aggregates. Likewise, the ATPase domain alone was predominantly monomeric but also contained a small fraction of oligomeric species (Supplementary Figure S1C) with an estimated molecular weight of ∼110 kDa, i.e. around four to five times larger than that of a single ∼27 kDa ATPase domain. These data are consistent with a pentameric oligomeric state, in common with recent observations for the terminase ATPase from another thermophilic bacteriophage, P74-26 (5).

Crystal structure
Initial crystals used for structure determination by MAD belonged to the R32 space group and contained one molecule per asymmetric unit. There was no interpretable electron density for the whole ATPase domain, indicating its flexibility (Crystal form 1, Supplementary Table S1). Subsequent soaking of these crystals in cryo-protectant containing a high concentration of NaCl resulted in a transition from the R32 to the C2 crystal form (Crystal form 2, Table 1), with each of the ATPase domains of the three subunits related by the crystallographic 3-fold axis adopting a unique stable position. In the resulting C2 crystal form, there are three molecules in the asymmetric unit, each having a slightly different domain orientation, with an overall inter-subunit Cα rmsd of 0.8-1.6 Å (Supplementary Figure S2A). In common with other phages (4,5,7), the N-terminal ATPase domain of the D6E terminase comprises ASCE and lid subdomains facing each other (Figure 1). From the lid subdomain, the polypeptide chain leads, through an ordered linker comprising 12 amino acids (230-241), into the C-terminal nuclease domain. The nuclease domain resembles the RNase H-like fold (36,37) which, as found for the large terminases of other viruses, differs from other RNase H family proteins by an extended β-sheet and the presence of an auxiliary β-hairpin. Interestingly, the β-hairpin, previously predicted to interact with DNA (12,38-40), has an extended hairpin-like loop structure with an α-helix at its tip. The final α-helix (α6) of the nuclease domain is followed by an ordered extended C-terminal loop or 'arm' proximal to the ATPase active site, which contains a bound sulphate ion that was present in the crystallization condition. The relative position of the ATPase and nuclease domains is stabilized by a salt bridge formed between R224 of the lid subdomain and D394 of the hairpin-like loop, and an interdomain hydrogen bond (M216-N362; Figure 1 and Supplementary Figure S2B).

Binding of ATP analogues
Initial attempts to produce diffracting crystals of ligand-bound complexes of the full-length protein failed, despite thermofluor data showing stabilization of the protein in the presence of magnesium ions and either ATP-γ-S or ADP (ΔTm of 7.8 and 2.3 °C, respectively; Supplementary Figure S3). Inspection of the structure containing a bound sulphate ion in the ATPase active site showed that in crystal form 2, the C-terminal arm of the nuclease domain occludes the ATPase active site and likely requires significant reorganization to accommodate an ATP analogue (Figure 1). Consequently, a protein construct with this segment (residues 418-427) deleted was used in further crystallization trials. Structures of large terminase in complex with ATP analogues were obtained after soaking in a high salt cryoprotectant, producing a different crystal form with improved diffraction quality (Crystal form 3, Table 1), as described in the 'Materials and Methods' section. While these structures reveal very similar overall conformations, there are differences in the molecular interactions in the active site upon binding of different ATP analogues. Firstly, the side chain of R44, which hydrogen bonds to Y213 from the lid subdomain, adopts different conformations upon the binding of ATP analogues (Figure 2A-D). Secondly, the hydrogen bond formed by the P-loop lysine K47 with the side chain of N169 in the apo and ADP-bound structures was not observed in the structure with bound ATP-γ-S (Figure 2A-D and Supplementary Figure S4). Instead, the side chain amine of this lysine forms salt bridges with the β- and γ-phosphates of the bound nucleotide. Likewise, the presence of the γ-phosphate causes N169 to rotate its side chain to hydrogen bond to an inner shell water molecule of the catalytic magnesium ion (Figure 2C). 
Interestingly, addition of AMP-PNP neither significantly stabilized the protein (ΔTm = 1.4 °C; Supplementary Figure S3) nor produced crystals with any additional density in the active site, suggesting that the difference in its γ-phosphate conformation alters binding affinity. It is notable that pairwise comparison of the three molecules taken from the asymmetric unit of the same structure reveals larger differences (Cα rmsd of 0.8-1.6 Å; Supplementary Figure S2A) than comparison of the same molecule from the asymmetric unit taken from structures of complexes with different ATP analogues (Cα rmsd of 0.3-0.4 Å; Supplementary Figure S4A). This latter comparison reveals that the most significant differences occur in the lid subdomain and the P-loop (up to 1 Å Cα rmsd, Supplementary Figure S4B).

ATP hydrolysis by the ATPase domain and full-length terminase
To understand the ATP hydrolysis properties of this large terminase, we compared the activities of the full-length protein with that of the ATPase domain using a steady-state ATPase activity assay (Figure 3). Measurement of ATPase activity at various protein concentrations revealed a nonlinear increase in the rate of ATP hydrolysis at low protein concentrations, indicating a concentration-dependent assembly between subunits (Supplementary Figure S5). Interestingly, the ATPase domain and the full-length protein had distinct ATP binding and hydrolysis profiles (Figure 3), with the kcat of the ATPase domain (457.7 ± 9.2 min−1) around 60 times higher than that of the full-length protein (7.8 ± 0.1 min−1). This indicates that the ATPase domain turns over ATP much faster than the full-length protein. In addition, the Km of 68.4 ± 4.3 µM for this domain is approximately seven times higher than that of the full-length protein (9.2 ± 0.5 µM), suggesting weaker ATP binding. Despite these differences in kcat and Km, an apparent lack of positive cooperativity during ATP hydrolysis was observed for both proteins, with a Hill coefficient of 1.1 observed for both the full-length protein and the ATPase domain (Figure 3).
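The kinetic comparison above can be reproduced with a few lines of arithmetic. The sketch below assumes the common form of the Hill equation, v/[E] = kcat·[S]^n / (Km^n + [S]^n); the exact parametrisation used for the fits is not given in the text, and the function and variable names are hypothetical. The parameter values are those reported above.

```python
def hill_rate(s_uM, kcat, km_uM, n):
    """Turnover (min^-1) at ATP concentration s_uM under the Hill equation."""
    return kcat * s_uM ** n / (km_uM ** n + s_uM ** n)

# Reported parameters: kcat in min^-1, Km in µM, n = Hill coefficient.
full_length = {"kcat": 7.8, "km": 9.2, "n": 1.1}
atpase_domain = {"kcat": 457.7, "km": 68.4, "n": 1.1}

# Fold changes of the isolated ATPase domain relative to full-length protein.
kcat_ratio = atpase_domain["kcat"] / full_length["kcat"]
km_ratio = atpase_domain["km"] / full_length["km"]

# Specificity constant kcat/Km (min^-1 µM^-1), the usual measure of
# catalytic efficiency, and its fold change.
eff_full = full_length["kcat"] / full_length["km"]
eff_domain = atpase_domain["kcat"] / atpase_domain["km"]
eff_ratio = eff_domain / eff_full

# At [ATP] = Km the Hill equation gives half-maximal turnover for any n.
half_max = hill_rate(9.2, full_length["kcat"], full_length["km"], full_length["n"])
```

The kcat ratio is close to 59 and the Km ratio close to 7.4, matching the 'around 60 times' and 'approximately seven times' statements; because Km also rises, the gain in kcat/Km is smaller, roughly 8-fold.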

The effect of the C-terminal arm and R421 on ATP binding and hydrolysis
The C-terminal arm, residues 418-427 of the nuclease domain, lies close to the ATPase active site (Figure 1) and thus may play a role in ATP binding and/or hydrolysis. Indeed, closer analysis reveals that this ordered extended turn contributes to the active site (Figures 1, 4A and Supplementary Figure S4A) through an arginine residue (R421) at the tip of the C-terminal arm, which is in proximity to the sulphate ion bound in the active site. Additionally, the carbonyl oxygen of the preceding residue, L420, forms a hydrogen bond with R44 of the P-loop of the lid subdomain (Figure 4A). Superposition of the structures for the full-length protein and the truncated protein (with bound sulphate or ATP-γ-S, respectively) positions R421 close to the α-phosphate of the ATP-γ-S (Figure 4A). However, the fact that the presence of this C-terminal arm was incompatible with the formation of well-ordered, diffracting crystals of ATP analogue-bound complexes suggests conformational plasticity. It is therefore possible that this residue may interact with the β- or γ-phosphate, but not necessarily the α-phosphate. To further investigate the role of the C-terminal arm, and R421 in particular, in ATP binding and hydrolysis, we determined the Km and kcat values for the C-terminally truncated protein (1-417) and the full-length R421A mutant. Compared to the WT large terminase (Figures 3 and 4B), both variants had reduced ATP binding affinity, as indicated by the observed increase in the Km values of ∼10- and 20-fold for the R421A mutant and the C-terminally truncated protein, respectively (Figure 4B). Despite the drastic change in the Km, the catalytic rate (kcat) showed only a slight decrease compared to the WT protein. These data collectively suggest that the C-terminal arm plays a role in ATP binding, potentially serving to stabilize the lid subdomain and the active site in a conformation compatible with ATP binding in the monomeric form.

A model for D6E ATPase motor
To gain understanding of how the motor is assembled, we constructed a pentamer model of the D6E ATPase motor by molecular docking (33). The resulting model of the pentamer contains a central tunnel of ∼18 Å in diameter, with the ATPase active sites found at monomer-monomer interfaces (Figure 5A), in common with the pentamer models for phi29 (16) and P74-26 (5). In the pentamer, the bound nucleotide is in proximity to R44 and E143 from one subunit and R158 from the adjacent subunit (Figure 5B). Notably, two loops defined as L1 and L2, each containing a positively charged residue, R101 and K123, respectively (Figure 5B), are found to contribute to the central tunnel and are expected to interact with DNA. We note that equivalent positively charged residues are also present in P74-26 (confirmed by mutagenesis to be required for DNA binding (5)) and Sf6 large terminases (Figure 6). In common with models of large terminase ATPases from other viruses and structural observations for hexameric helicases (5,16,41,42), the tunnel surface is mostly positively charged (Supplementary Figure S6).

Identification of the trans arginine finger
Previous sequence analysis of the HerA/FtsK superfamily of ASCE ATPases revealed a conserved arginine, indicative of ring-like oligomeric ATPases, likely to promote inter-subunit coordination of ATP hydrolysis in trans (43,44). Recent structural and biochemical studies of the phi29 packaging ATPase and P74-26 large terminase identified the trans-arginine residues R146 and R139, respectively, located at the monomer-monomer interfaces of the corresponding pentamer models (5,16). R158, which projects into the active site from the adjacent subunit in our pentamer model (Figure 5B), is predicted to stabilize the transition state and facilitate ATP hydrolysis in trans. Additional active site residues such as the P-loop arginine, or 'sensor', and the catalytic glutamate were identified in other large terminases as facilitating ATP catalysis in cis (Figure 5B) (3-5). The three equivalent residues of the D6E large terminase, predicted to be R158 (trans), R44 and E143 (cis), should be indispensable for catalysis. We tested this by mutagenesis, showing that substitution of any one of these residues with alanine completely abrogated ATP hydrolysis (Figure 5C). However, this activity was partially recovered in assays where the R158A terminase was complemented with either R44A or E143A proteins (Figure 5C), as previously found for the P74-26 large terminase (5) and other ring-like ATPase motors (16,45). These data are consistent with the pentamer model, since complementation requires oligomer assembly to create a catalytically competent active site at the subunit interface containing both trans- and cis-arginines. These data confirm that R158 is indeed the trans residue for D6E (Figure 5C and D).

DISCUSSION
Here we provide data from a combination of biophysical, biochemical, structural and molecular docking experiments, which together support a model where the constituent domains of the D6E large terminase protein undergo radical rearrangement to both enable and regulate assembly into a functional pentameric motor capable of ATP hydrolysis and DNA translocation.

ATP binding and hydrolysis
The trans-arginine. In multi-subunit ASCE ATPases, a trans-arginine finger residue is generally required to couple ATP hydrolysis between the subunits of the active ring-like assembly (10,45,46). Comparative genomic analysis has indicated that although a trans-arginine finger is commonly found in multi-subunit ASCE ATPases, the position of this finger varies across different superfamilies (9,10). Within the HerA/FtsK superfamily, conserved arginine residues identified between α3 and β4 of the canonical ASCE fold have been proposed to be the trans-arginine finger (10). The biochemically confirmed trans-arginine residues, R158 and R146 for the D6E (Figure 5A-C) and phi29 packaging ATPases (16,47), respectively, are found at the equivalent position on the overall fold, coinciding with the position of the trans-arginine in the HerA/FtsK superfamily (Supplementary Figure S7). Arginine residues are also found in corresponding positions for Sf6, T4 and P74-26 large terminase proteins (4,5,48) but have not been confirmed in activity assays (Supplementary Figure S7). Conversely, the biochemically confirmed trans-arginine finger, R139, for the P74-26 large terminase is located between α2 and β3 (Supplementary Figure S7), where an arginine residue is also found for T4. We note that a comparable lysine at this position could also potentially act as the trans-acting finger for Sf6. Further biochemical mutation complementation assays are required to identify the trans-acting residue for Sf6 and T4 large terminases. Despite this, our study supports the premise that large terminase proteins are indeed multi-subunit ring-like ATPases, with a conserved trans-arginine finger that complements the ATPase active site at the subunit interface. These data also suggest that large terminase ATPases may be closely related to the HerA/FtsK superfamily, as previously proposed for the phi29 packaging ATPase (10). 
Furthermore, pentamer assemblies were observed for both the full-length protein and the isolated ATPase domain, by SV-AUC and SEC, respectively. This, along with the biochemical and structural confirmation of a conserved trans-arginine finger in this and other bacteriophage large terminases, further supports the notion that DNA translocation is driven by inter-subunit coordination of ATP hydrolysis (5).
Cis residues. The P-loop arginine found in many bacteriophage large terminases and packaging ATPases, including T4, Sf6, P74-26 and phi29 (4,5,16,48,49), and other multi-subunit ATPases (50,51), corresponds to R44 in the D6E large terminase (Figure 5A and B). It has been proposed that this arginine may act in cis to trigger ATP hydrolysis through subdomain rotation and/or chemo-mechanical coupling to coordinate DNA translocation (3-5). Recent studies of large terminase support the cis mechanism proposed previously, but further suggest that this arginine is likely to play a role in motor assembly (52). Structural and biochemical observations on the D6E large terminase are consistent with R44 being a cis arginine that couples the motion of the lid subdomain to ATP hydrolysis. However, similar to the observation for the Sf6 large terminase (4), the observed overall conformational differences upon nucleotide binding are insignificant. It is likely that the larger conformational changes that drive DNA translocation are dependent on the presence of other motor components and require the participation of the trans-arginine finger from the adjacent subunit within the pentameric assembly.
Our results provide insight into the mechanism of ATP hydrolysis of the large terminase motor. In the assembled motor, R158 facilitates ATP hydrolysis in trans while R44 acts in cis to transmit the conformational changes of the P-loop upon ATP hydrolysis through the lid subdomain.

Motor assembly and regulation
Formation of the D6E pentameric assembly likely requires large rearrangements in relative domain orientation. Such flexibility is implied from superposition of the ATPase domain of the T4, Sf6 and D6E large terminase structures revealing that a significant rotation is required to move the D6E nuclease domain in order for it to occupy a similar position to that observed in T4 or Sf6 ( Figure 5E). This is also consistent with the observation that the domain ori-entation seen in the crystal structure is incompatible with our proposed pentameric ATPase model ( Figure 5F), with the nuclease domain clashing with the ATPase of the adjacent subunit, and the fact that the ATPase domain alone is significantly more efficient at hydrolyzing ATP than the WT large terminase (∼50-fold increase in k cat Figure 3). This can also explain the concentration-dependent pentamer assembly observed, for the full-length protein, in SV-AUC (undiluted), but not by SEC where the protein concentration was diluted during chromatography (53) (Supplementary Figure S1). This observation is also supported by the nonlinearity observed in the concentration-dependent ATPase rate (Supplementary Figure S5); however, further structural studies of the assembled motor or its intermediates, along with single molecule FRET studies are required to confirm this. Consistent with previous studies (5), only modest or low positive cooperativity for ATP hydrolysis (Hill coefficient 1.1 for D6E and 1.7 for P74-26) was observed. This suggests that, in the absence of DNA and/or the portal ring,  the motor may be a partially assembled pentamer, where only 1-2 functional ATPase active site(s) are formed by 2-3 subunits. This premise is consistent with the observation of intermediate oligomeric states by SV-AUC. 
Alternatively, as proposed for the phi29 DNA packaging motor (Hill constant 1.2 measured by ATPase assays during DNA packaging) (54), binding of ATP to a monomer in the fully assembled pentameric motor is independent of its binding to the adjacent subunits (55).
In the assembled pentamer, loops L 1 and L 2 are predicted to project into the tunnel where they may engage in DNA translocation. This is supported by structural superpositions with super family 1 helicase UrvD (56) (the closest structural homologue identified by a DALI search, Z-score = 13.1 (57)) and the hexameric helicase E1 (42) (Supplementary Figure S8). Mutagenesis and DNA binding studies are required to confirm if positively charged residues found on these loops do indeed contribute to large terminase-DNA interaction. Loop L 2 precedes an ␣-helix (residues 125-132) that adopts somewhat different positions within each molecule of an asymmetric unit ( Supplementary Figure S2A) and this conformational flexibility may to be required for DNA binding or translocation.
ATP bound in the monomer active site cannot be hydrolyzed without the participation of the trans-arginine finger from an adjacent subunit in the assembled pentamer ( Figure 7A and B), which requires significant rotation of the nuclease domain with respect to the ATPase domain. This monomer conformation is stabilized, or 'locked', by the C-terminal arm. There are also additional ATPasenuclease domain-domain interactions (Figure 1, Supplementary Figure S2, Tables S2 and 3) resulting in a total surface area buried between the two domains of 362Å 2 . For comparison, areas buried in inter-domain interactions in T4 and Sf6, are significantly smaller, 212 and 78Å 2 , respectively. It is possible the distinct domain arrangement of the D6E large terminase stabilized by these interactions represents a biologically important conformationally locked inactive 'initiation' state that serves to prevent futile hydrolysis of ATP before assembly of this extremophilic motor.
It is possible that oligomerization of the large terminase into a stable pentameric motor capable of DNA translocation occurs only after interaction with the portal protein, as this may stabilize the domain reorientation (Figure 7A and B) that would be required for pentamer assembly. Since the C-terminal segment of the nuclease domain is thought to interact with the portal (58), we speculate that disengagement of the nuclease domain from its own ATPase domain frees this segment to interact with the portal assembly, enabling neighbouring large terminase subunits to form monomer-monomer interactions that may further inhibit restoration of the 'initiation' state. Combined with the increased local concentration of large terminase proteins at the portal vertex of the capsid, these processes might assist in overcoming the rate-limiting step of a co-ordinated interdomain conformational change and oligomerization of the ATPase subunits (Figure 7A and B), thus allowing more efficient assembly and ATP hydrolysis (Figure 3; Supplementary Figures S1 and S5).

The structure of the C-terminal arm and its potential regulatory role
Our structural and biochemical data show that a short segment after the C-terminal α-helix (α6 in Figure 8) of the nuclease domain, the C-terminal arm, plays a role in stabilizing the ATP-bound state in the monomeric form of the D6E large terminase. Indeed, these data suggest that this effect is mostly due to R421, which is predicted to contact a phosphate group of the ligand and whose mutation decreases ATP binding (increased Km, Figure 4B). This residue may play a role similar to that of the P-loop residue R44 by facilitating the binding of the triphosphate in cis. Interestingly, similar ordered C-terminal arm elements, displaying clusters of either negatively charged or hydrophobic residues around a positively charged amino acid (K or R) close to the turn, are found in the large terminase proteins from the P74-26, Sf6 and RB49 bacteriophages (Figure 8). For the nuclease structures of bacteriophages T4 and P22, where the C-terminal segments were mostly disordered, alignment of the C-terminal regions also reveals conserved positively charged residues (Figure 8). Indeed, the residue in T4 phage analogous to D6E R421 (K567) lies at the end of a negatively charged cluster of residues and near S583, which have been proposed and proven, respectively, to be involved in portal protein interaction (59). The observation that this structural element is conserved indicates that it plays a biologically important role in motor function, most likely mediating the assembly of large terminase onto the portal protein embedded in the preformed capsid. Further work will determine whether, as observed for the D6E large terminase, ATP binding to the monomeric state (prior to motor assembly) facilitated by the C-terminal arm also occurs in other phage systems. Additionally, this structural element may serve to restrict assembly of the pentameric motor, and subsequent futile ATP hydrolysis, until procapsid formation.
We here provide evidence that the D6E bacteriophage large terminase protein is a pentameric ring-like ATPase related to the HerA/FtsK superfamily, utilizing a similar trans-arginine finger to trigger ATP hydrolysis. Additionally, we identify a new structural element, the C-terminal arm of the nuclease domain, as a potentially conserved feature of large terminase proteins that is implicated in the assembly of the active motor complex onto the portal vertex of the viral capsid. The combination of crystallographic and biochemical data indicates that the arm may play a role in both regulating and coupling the hydrolysis of pre-bound ATP with the motor assembly events. Future work is required to verify this model for other large terminases and to ascertain the residues that are important for ATP binding, motor assembly, and for coupling ATP hydrolysis events with DNA translocation.
[Figure caption fragment: Positively and negatively charged residues are coloured in blue and red, respectively.]