Molecular details of the CPSF73-CPSF100 C-terminal heterodimer and interaction with Symplekin

Eukaryotic pre-mRNA is processed by a large multiprotein complex to accurately cleave the 3' end, and to catalyse the addition of the poly(A) tail. Within this cleavage and polyadenylation specificity factor (CPSF) machinery, the CPSF73/CPSF3 endonuclease subunit directly contacts both CPSF100/CPSF2 and the scaffold protein Symplekin to form a subcomplex known as the core cleavage complex or mammalian cleavage factor. Here we have taken advantage of a stable CPSF73-CPSF100 minimal heterodimer from Encephalitozoon cuniculi to determine the solution structure formed by the first and second C-terminal domain (CTD1 and CTD2) of both proteins. We find a large number of contacts between both proteins in the complex, and notably in the region between CTD1 and CTD2. A similarity is also observed between CTD2 and the TATA-box binding protein (TBP) domains. Separately, we have determined the structure of the terminal CTD3 domain of CPSF73, which also belongs to the TBP domain family and is connected by a flexible linker to the rest of CPSF73. Biochemical assays demonstrate a key role for the CTD3 of CPSF73 in binding Symplekin, and structural models of the trimeric complex from other species allow for comparative analysis and support an overall conserved architecture.


Introduction
Pre-messenger RNA, with the notable exception of histone-coding pre-mRNA, requires processing at the 3'-end via sequential steps of cleavage and polyadenylation. The resulting polyadenosine tail-containing mRNA is protected from the nuclear degradation machinery and ready for cytoplasmic translation. The process of cleavage and polyadenylation is also subject to regulation; for example, the choice of polyadenylation signal (PAS) among alternate sites on the pre-mRNA is a regulatory aspect of cell-type function and development. Furthermore, perturbation of this process is known to severely impact cell physiology, with links to a broad range of cancers and other diseases [1].
The 3'-end processing of pre-mRNA in metazoa is performed by the large multi-protein cleavage and polyadenylation specificity factor (CPSF). Metazoan CPSF is composed of a seven-subunit core, which can be divided into stable sub-complexes with specific functions. The sub-complex consisting of CPSF160, CPSF30, WDR33 and hFip1 is known in mammals as the mammalian polyadenylation specificity factor (mPSF), which binds the PAS motif and recruits poly(A) polymerase [2]. The second sub-complex is required for the cleavage step, and is composed of CPSF73, CPSF100 and Symplekin. These three proteins are co-purified under stringent conditions, and their presence in both the canonical CPSF complex and the complex required to process histone pre-mRNA identifies them as the essential core cleavage factors [3,4]. Based on later analyses, the sub-complex is also named the core cleavage complex (CCC) common to all metazoan 3'-end processing complexes [5], or the mammalian cleavage factor (mCF) based on the human proteins [6].
The CPSF73 subunit (also known as CPSF3) is a member of the metallo β-lactamase (MBL) family of proteins [7], and is the catalytically active component in CCC/mCF [8][9][10][11]. Specifically, the N-terminus of CPSF73 has an MBL fold that contains the active site residues [12], and also a β-CASP insert which may assist in RNA binding. CPSF73 was shown to directly contact the cleavage site downstream of the AAUAAA motif on the pre-mRNA, and its activity is zinc-dependent [10]. Additional point mutants demonstrated that CPSF73 is the endonuclease within the histone 3' processing complex [12]. The second subunit of CCC/mCF, CPSF100 (CPSF2), shares a similar domain architecture with CPSF73 but does not exhibit endonuclease activity [11,12]. Nevertheless, the presence of CPSF100 is required to enable functional activity of CPSF73 and to form a tether to the rest of the CPSF [10][11][12]. The third protein, Symplekin, is a long HEAT repeat-containing protein acting as a hub for the interaction with multiple factors, and was identified as part of the CPSF from Xenopus studies [13], the mouse histone 3' processing complex [14], and by comparison to the Pta1 subunit from the yeast cleavage and polyadenylation factor (CPF) [15,16].
The atomic details for several regions in CPSF73, CPSF100 and Symplekin have been described based on isolated domains or within larger assemblies. The crystal structure of the N-terminal MBL region of human CPSF73 (hsCPSF73-MBL) revealed two zinc-binding sites, and allowed for structure-guided mutagenesis to confirm active site residues with comparison to the structure of yeast CPSF100 (Cft2) [11]. Crystal structures of Cryptosporidium CPSF73-MBL were obtained in the apo form and bound to an anti-parasite compound [17], and hsCPSF73-MBL was crystallized in the presence of the anti-cancer drug JTE-607 [18]. The structure of the N-terminal region of Symplekin was first determined from Drosophila [19], followed by the human N-terminal region in complex with Ssu72 and an RNAPII-CTD phosphopeptide [20,21]. Cryo-EM structures of human CPSF [6] and the histone pre-mRNA 3'-end processing complex [22] provided valuable high-resolution insight into the architecture of the N-terminal regions of CPSF73, CPSF100 and Symplekin within the larger processing machinery. In contrast, significantly lower resolution was observed for the C-terminal regions of the CCC/mCF proteins, which necessitated homology docking from the IntS11-IntS9 crystal structure [23], as well as poly-Ala helices to fit the Symplekin C-terminal density. Despite the limited resolution, it is clear that the C-terminal regions of CPSF73, CPSF100 and Symplekin come together to form the main contacts within the CCC/mCF. The histone pre-mRNA 3'-end processing complex structure further defined three C-terminal domains for CPSF73 (CTD1, CTD2, CTD3) and two C-terminal domains for CPSF100 (CTD1, CTD2) [22]. There was evidence that the CTD1 domains from CPSF73 and CPSF100 come together to form a single β-barrel, and that the CTD2 from each protein resembles the C-terminal heterodimer of IntS11-IntS9 [23]. Additional density further suggested that CTD3 from CPSF73 may contact Symplekin. However, specific atomic information for these regions could not be determined.
The direct interactions between the C-termini of CPSF73, CPSF100 and Symplekin have also been studied using biochemistry and molecular biology. Using Drosophila proteins, it was noted that the minimal regions required to form the CCC/mCF complex involved the C-termini of dmCPSF73 (residues 552-684) and dmCPSF100 (residues 648-756), and the dmSympk region from residue 272-1080 [5]. In Arabidopsis, it was shown that the last 157 residues of atCPSF100 interact strongly with the last 254 residues of atCPSF73-I, or the last 180 residues of the truncated paralogue atCPSF73-II [24]. Insight into the CCC/mCF C-terminal complex has also been provided from Integrator, in which the Integrator subunits IntS11, IntS9 and IntS4 are closely related to CPSF73, CPSF100 and Symplekin, respectively. The IntS11-IntS9-IntS4 trimer represents a minimal Integrator cleavage module, similar to that of CCC/mCF, and is stable during purification [25,26]. It had been noted previously that the C-termini of IntS11 and IntS9 strongly interact [27], and the crystal structure of an IntS11-IntS9 heterodimer of the CTD2 from each protein [23] was already helpful for analysis of the histone pre-mRNA 3'-end processing complex [22]. Several recent cryo-EM structures containing Integrator have provided improved resolution for the CTD1 and CTD2 domains of IntS11 and IntS9 [26,28,29]. Insight into the metazoan CPSF is also provided by structural comparison to the yeast CPF nuclease module, which contains Ysh1 and Cft2 (orthologues of CPSF73 and CPSF100), but at present the C-termini of these two proteins have not been observable by structural methods [30,31].
To obtain this missing atomic information on the assembly of the CCC/mCF, we have taken advantage of the CPSF73 and CPSF100 proteins from the microsporidian mammalian pathogen Encephalitozoon cuniculi to determine the solution structure of all C-terminal domains. This includes an intimate complex formed by CTD1 and CTD2 of both proteins, as well as the independent CTD3 domain of CPSF73 that we find by biochemical analysis to be essential for binding Symplekin.

Cloning
Plasmids expressing hsCPSF73, hsCPSF100, ecCPSF73, ecCPSF100 and ecSymplekin were built by inserting the PCR-amplified sequences into modified pET vectors allowing for co-expression with or without an N-terminal hexa-histidine tag. The plasmids used for co-expression of ecCPSF73 and ecCPSF100 were previously described [32]. Cloning of the remaining constructs was performed with ligation-independent cloning, and plasmids were verified by DNA sequencing. Mutagenesis used PCR with primers containing the desired mutation. Electronic supplementary material, table S1, details all PCR primers used during cloning and mutagenesis, and electronic supplementary material, table S2, contains the full list of plasmids generated for this study.

Protein expression
E. coli BL21(DE3) lysY (New England Biolabs) were transformed with the various plasmids, and transformant colonies were used for small-scale overnight cultures grown at 37°C in lysogeny broth (LB) or terrific broth (TB) supplemented with the corresponding antibiotics. Bacteria from the overnight cultures were used to start 500 ml cultures in LB or TB for natural abundance protein, or in M9 minimal medium supplemented with 1 g l⁻¹ ¹⁵NH₄Cl and 2 g l⁻¹ ¹³C-glucose. Growth of the 500 ml cultures at 37°C was followed by induction at an OD600nm of 0.6 with 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), and protein expression continued for 16 h at 25°C. For amino acid-specific labelling of isoleucine, valine or leucine, 100 mg l⁻¹ [¹³C,¹⁵N]-Ile, [¹³C,¹⁵N]-Val or [¹³C,¹⁵N]-Leu was added to the M9 medium 30 min prior to induction. Specific labelling with deuterated amino acids used either 100 mg l⁻¹ [²H]-Phe in 500 ml M9, or a combination of 100 mg l⁻¹ [²H]-Tyr and 100 mg l⁻¹ [²H]-Trp in a culture size of 250 ml. Cells were harvested by centrifugation at 4500×g for 20 min at 4°C, resuspended in lysis buffer containing 5 mM imidazole, 50 mM Tris (pH 7.5), 500 mM NaCl and 5% (v/v) glycerol, and stored at −80°C in the presence of added lysozyme.
royalsocietypublishing.org/journal/rsob Open Biol. 13: 230221
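As a small worked example of the arithmetic behind the labelling recipe above, the following sketch converts the per-litre concentrations quoted in the text into the amounts needed for the stated culture volumes. The function name is hypothetical and chosen for illustration only; the concentrations and volumes are the ones given in this section.

```python
def amount_for_culture(conc_per_litre: float, culture_ml: float) -> float:
    """Scale a per-litre amount to a culture volume given in millilitres.

    The unit of the result matches the unit of conc_per_litre
    (g in for g l^-1, mg in for mg l^-1).
    """
    if culture_ml < 0 or conc_per_litre < 0:
        raise ValueError("concentration and volume must be non-negative")
    return conc_per_litre * culture_ml / 1000.0


# Values taken from the text: 500 ml M9 cultures, 250 ml for the Tyr/Trp sample.
nh4cl_g = amount_for_culture(1.0, 500)      # 15N-ammonium chloride, grams
glucose_g = amount_for_culture(2.0, 500)    # 13C-glucose, grams
labelled_aa_mg = amount_for_culture(100.0, 500)  # e.g. labelled Ile, milligrams
tyr_mg_250ml = amount_for_culture(100.0, 250)    # 2H-Tyr in 250 ml, milligrams

print(nh4cl_g, glucose_g, labelled_aa_mg, tyr_mg_250ml)
```

The same helper applies to any other per-litre supplement; it does not account for stock-solution dilution, which the text does not describe.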

Protein purification
After thawing on ice, resuspended cell pellets were sonicated and soluble protein was separated from cellular debris by centrifugation at 20 000×g for 30 min at 4°C. The supernatant was filtered through a GD/X 0.7 µm filter (GE Healthcare Life Sciences) and loaded onto 2 ml Nuvia IMAC Ni-charged resin (Bio-Rad Laboratories). The resin was washed with 10 column volumes of buffer containing 5 mM imidazole, 50 mM Tris (pH 7.5), 500 mM NaCl and 5% (v/v) glycerol, followed by 5 volumes of the same buffer but with 25 mM imidazole. Protein elution used the same buffer with 500 mM imidazole. Fractions containing the tagged domain were pooled and exchanged to the initial buffer containing 5 mM imidazole by using a PD10 desalting column (GE Healthcare Life Sciences). His-tagged TEV protease (0.1 mg ml⁻¹ final concentration) was added for overnight cleavage at 4°C. The protease, hexahistidine tag and any uncleaved protein were removed by a second passage through the Nuvia IMAC Ni-charged resin. The purified samples were concentrated to a volume of 500 µl with Vivaspin centrifugal concentrators (Merck Millipore Corporation) with a cut-off appropriate for each fragment, exchanged with a NAP-5 column (GE Healthcare Life Sciences) into a buffer consisting of 20 mM Tris (pH 7.5), 150 mM NaCl and 2 mM DTT, and concentrated to the desired volume. Protein concentrations were determined by absorbance at 280 nm with extinction coefficients obtained using ProtParam (http://web.expasy.org/protparam).
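The concentration measurement described above rests on the Beer-Lambert law. The sketch below assumes the widely used estimate of the 280 nm extinction coefficient from the Trp, Tyr and cystine content (5500, 1490 and 125 M⁻¹ cm⁻¹ per residue, respectively), which is the kind of value ProtParam reports. The sequence, absorbance reading and path length are made-up illustration values, not data from this study, and the function names are hypothetical.

```python
# Per-residue molar extinction contributions at 280 nm (M^-1 cm^-1).
TRP_EPS = 5500
TYR_EPS = 1490
CYSTINE_EPS = 125  # per disulfide; zero for fully reduced protein


def extinction_coefficient(sequence: str, n_cystines: int = 0) -> int:
    """Estimate the 280 nm extinction coefficient from sequence composition."""
    seq = sequence.upper()
    return (seq.count("W") * TRP_EPS
            + seq.count("Y") * TYR_EPS
            + n_cystines * CYSTINE_EPS)


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical peptide with two Trp and two Tyr residues.
example_sequence = "MKWYAAGYLW"
example_eps = extinction_coefficient(example_sequence)
example_conc_uM = concentration_molar(0.699, example_eps) * 1e6

print(example_eps, round(example_conc_uM, 2))
```

Because the final buffer contains DTT, the cystine term is left at zero in this example; that choice is an assumption of the sketch rather than a statement from the text.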

Limited proteolysis
Following co-expression of ecCPSF73 and ecCPSF100, the histidine-tagged complex was purified on Nuvia IMAC resin. A total of 100 µg of eluted complex was incubated with trypsin at a 0.001 mg ml⁻¹ final concentration in wash buffer (5 mM imidazole, 50 mM Tris (pH 7.5), 500 mM NaCl, 5% (v/v) glycerol). Aliquots were collected every 10 min and the reaction was stopped by adding SDS-PAGE loading buffer. After 60 min, the samples were loaded on 15% SDS-PAGE for analysis. In a second set of experiments, the reactions were stopped after the optimal incubation time to maximize the generation of the stable fragments, and samples were sent to the in-house mass spectrometry service for molecular mass and peptide identification.
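To interpret a limited proteolysis experiment of this kind, it can help to list the candidate trypsin sites in a construct beforehand. The sketch below uses the common simplified rule that trypsin cleaves C-terminal to Lys or Arg unless the next residue is Pro. The sequence and numbering offset are hypothetical and serve only to illustrate how sites would be reported in the numbering of the full-length protein; in limited proteolysis only exposed, flexible sites are actually cut.

```python
def trypsin_sites(sequence: str, first_residue_number: int = 1):
    """Return (residue, number) for each K/R after which cleavage is predicted.

    Simplified rule: cleave after K or R, except when followed by P.
    A K/R at the very C-terminus is skipped because nothing follows it.
    """
    seq = sequence.upper()
    sites = []
    for i, aa in enumerate(seq):
        if aa not in "KR":
            continue
        if i + 1 >= len(seq):
            continue  # C-terminal residue, no peptide bond to cleave
        if seq[i + 1] == "P":
            continue  # K/R-Pro bonds are poorly cleaved
        sites.append((aa, first_residue_number + i))
    return sites


# Hypothetical fragment whose first residue is numbered 100.
example_sites = trypsin_sites("AKPGRLMK", first_residue_number=100)
print(example_sites)
```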

NMR spectroscopy
NMR spectra were recorded at 298 K using a Bruker Neo Avance spectrometer at 700 MHz or 800 MHz, equipped with a standard triple resonance gradient probe or cryoprobe, respectively. Bruker TopSpin v. 4.0 (Bruker BioSpin) was used to collect data. NMR data were processed with NMR Pipe/Draw [33] and analysed with Sparky 3 (T.D. Goddard and D. G. Kneller, University of California).

Chemical shift assignment
We have previously reported the backbone and sidechain assignment for ec73-CTD12/ec100-CTD12 [32], which is available from the Biological Magnetic Resonance Data Bank (http://bmrb.wisc.edu/) under BMRB accession number 51624.
For the heterodimer formed by ec73-CTD12/ec100-CTD12, ¹H distances were first obtained using NOE crosspeaks from 3D ¹⁵N-HSQC-NOESY (120 ms mixing time) and 3D aliphatic ¹³C-HSQC-NOESY (120 ms mixing time) spectra using sample protein concentrations of 880 and 230 µM, respectively. For aromatic ¹H, the majority of distance restraints were obtained from a 458 µM unlabelled sample in 99% D₂O using a 2D ¹H,¹H-NOESY (120 ms mixing time) spectrum. Similar to the strategy used in the original chemical shift assignment [32], manual NOE crosspeak assignment was assisted by comparison to a 2D ¹H,¹H-NOESY spectrum collected on a sample containing ²H-Phe (150 µM), and a spectrum from a sample including both ²H-Tyr and ²H-Trp (85 µM). Additional ¹H distances were obtained by using a 3D aliphatic ¹³C-HSQC-NOESY (120 ms mixing time) spectrum collected from a 625 µM sample in which only the isoleucine was ¹³C-labelled, as well as a spectrum from a 694 µM sample in which both valine and isoleucine were ¹³C-labelled. Starting at iteration four, residual dipolar coupling (RDC) values were included based on interleaved spin state-selective TROSY spectra from a sample of 230 µM [²H,¹⁵N]ec73-CTD12/ec100-CTD12, without and with the addition of 18 mg ml⁻¹ Pf1 phage. RDC-based intervector projection angle restraints relate to Dₐ and R values of 10 and 0.64, respectively. Protein dihedral angles were obtained by using TALOS-N [39] and SideR [40,41]. Hydrogen bond restraints (two per hydrogen bond) were introduced following an initial structural calculation and include only protected amides in the centre of secondary structure elements. Final ensembles were refined in explicit water and consisted of the 20 lowest energy structures from a total of 250 calculated models.
For CTD3 of ecCPSF73, the ¹H distances were obtained using NOE crosspeaks from a sample of 800 µM [¹³C,¹⁵N]ec73-CTD3 using a 3D ¹⁵N-HSQC-NOESY (120 ms mixing time) spectrum, and on the same sample in 99% D₂O using a 3D aliphatic ¹³C-HSQC-NOESY (120 ms mixing time) spectrum. Distance restraints, in particular for the aromatic ¹H, were also obtained from an 800 µM sample of unlabelled ec73-CTD3 in 99% D₂O from a 2D ¹H,¹H-NOESY (120 ms mixing time) spectrum. Protein dihedral angles were obtained by using TALOS-N [39] and SideR [40,41]. Hydrogen bond restraints (two per hydrogen bond) were introduced following an initial structural calculation and include only protected amides in the centre of secondary structure elements. Final ensembles were refined in explicit water and consisted of the 20 lowest energy structures from a total of 100 calculated models.
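For orientation, the two alignment tensor parameters quoted for the heterodimer RDC restraints (an axial component and a rhombicity R) can be related to an individual coupling as follows. This is one common convention and is an assumption here; the text does not state which form was used in the structure calculation. For an internuclear vector with polar angles θ and φ in the principal frame of the alignment tensor:

```latex
D(\theta,\phi) = D_a \left[ \left( 3\cos^2\theta - 1 \right)
  + \tfrac{3}{2}\, R \,\sin^2\theta \,\cos 2\phi \right]
```

In this convention R ranges from 0 (axially symmetric tensor) to 2/3 (fully rhombic tensor), so a value of 0.64 would describe a strongly rhombic alignment.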

Pull-down assays
All proteins were purified as described above, with the exception that the tagged proteins were used directly after the first Ni-NTA purification without elution. Protein concentrations were determined by Coomassie staining. Equal amounts of individual tagged proteins were used for each experiment. Pull-down assays were performed by mixing tagged proteins on Ni-NTA beads with excess untagged constructs in binding buffer (20 mM sodium phosphate, pH 7.4, 150 mM NaCl, 2 mM β-mercaptoethanol) and incubating for 1 h at 20°C. Beads were then washed three times with the same buffer, and bound proteins were eluted and analysed on 18% SDS-PAGE.

Electrophoretic mobility shift assays
The binding of DNA and RNA to CPSF protein constructs used a Tris-glycine gel system [39] and non-specific nucleotide sequences as used in a previous study [44]. Protein samples were prepared in 20 mM Tris (pH 7.5), 150 mM potassium acetate, 1 mM EDTA and 10% glycerol, with bromophenol blue added to aid in gel loading. A 3' Cy3 label was incorporated into the forward strands of the DNA (AGGGTCTCCATTTTGAAGCATGC-Cy3) and RNA (AGGGUCUCCAUUUUGAAGCAUGC-Cy3) during chemical synthesis on an Expedite 8909 (PerSeptive Biosystems). Double-stranded oligonucleotides were prepared with unlabelled reverse strands (GCATGCTTCAAAATGGAGACCCT and GCAUGCUUCAAAAUGGAGACCCU) by heating 50 µM mixed samples to 98°C, and allowing them to slowly cool to room temperature. For each assay, 500 nM of the Cy3-labelled single- or double-stranded oligonucleotide was used without or with 50 µM protein, and analysed after incubation for 30 min. During the incubation step, polyacrylamide gels were prepared with a final concentration of 10% acrylamide:bisacrylamide (37.5 : 1), 300 mM Tris-HCl (pH 8.8), 0.1% (w/v) ammonium persulfate and 0.1% (v/v) TEMED. The wells were washed with water after polymerization for 10 min, and the gel was pre-run at room temperature for 30 min at 100 V in a running buffer containing 27 mM Tris, 192 mM glycine and 1 mM EDTA at pH 8.3. Following a second cleaning of the wells, 2 μl of each sample was loaded and the gel was run at 100 V for an additional 30 min at room temperature. Visualization of Cy3 fluorescence used a Typhoon Trio+ imager (GE Healthcare) with a 580 nm filter, 532 nm laser, normal sensitivity, photomultiplier tube setting of 450 V and 100 μm resolution. Preparation of images used ImageQuant TL v8.1.0.0 with default visualization parameters. In a final step, the gels were stained with Coomassie blue to visualize the protein.
[Table footnotes, partially recovered: ... [36]. c Determined using the PSVS validation suite [37,38]. Values are reported for ordered residues, with the Ramachandran analysis for all residues included in parentheses.]
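As a quick consistency check on the oligonucleotides listed above, the sketch below confirms that each unlabelled reverse strand is the reverse complement of its Cy3-labelled forward strand, as required for duplex formation. The sequences are copied from the text as continuous strings without the Cy3 label; the function name is hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA (default) or RNA sequence."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.upper().translate(table)[::-1]


# Forward (labelled) and reverse (unlabelled) strands from the methods text.
dna_forward = "AGGGTCTCCATTTTGAAGCATGC"
dna_reverse = "GCATGCTTCAAAATGGAGACCCT"
rna_forward = "AGGGUCUCCAUUUUGAAGCAUGC"
rna_reverse = "GCAUGCUUCAAAAUGGAGACCCU"

dna_ok = reverse_complement(dna_forward) == dna_reverse
rna_ok = reverse_complement(rna_forward, rna=True) == rna_reverse
print(dna_ok, rna_ok)
```

Both comparisons hold for the 23-nucleotide sequences as given, so the annealed samples are expected to be fully base-paired, blunt-ended duplexes.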

C-terminal heterodimer of CPSF73 and CPSF100 has two independent modules
To understand the molecular basis of heterodimer formation by the C-termini of CPSF73 and CPSF100, we initially focused on the human protein complex. Although protein production was observed, significant instability and rapid precipitation complicated its study. A search for possible alternative species identified excellent protein production and stability for the complex formed from the orthologous proteins of the parasite Encephalitozoon cuniculi. The initial complex was made by co-expression in E. coli with the full C-terminal region of both ecCPSF73 (residues 452-643, hereafter ec73-CTD123; figure 1a) and ecCPSF100 (residues 525-639, hereafter ec100-CTD12). Characterization of this heterodimer by NMR spectroscopy reveals a well-folded complex with dispersed peaks in the ¹H,¹⁵N-heteronuclear single quantum correlation (HSQC) spectrum (electronic supplementary material, figure S1a). A notable property of the spectrum is a subset of dispersed peaks with higher intensity, which suggests the presence of a smaller folded module independent of the larger complex. To identify the protein sequences corresponding to the two modules, we used limited trypsin proteolysis on the heterodimer sample with an N-terminal His6-tag on ec73-CTD123. Subsequent Ni-column purification showed that an N-terminal fragment of His6-ec73-CTD123 retained binding to a mostly intact ec100-CTD12, and mass spectrometry identified an ecCPSF73 trypsin cleavage site at Lys571 (electronic supplementary material, figure S1b). We therefore made two new constructs of ecCPSF73 based on secondary structure prediction and the trypsin digest results (figure 1a): ecCPSF73 from residue 452-567 (ec73-CTD12), and another from 567-643 (ec73-CTD3). Characterization by NMR spectroscopy of the new heterodimer prepared by co-expression of ec73-CTD12 and ec100-CTD12 showed that a stable and folded complex was produced (figure 1b). In addition, the isolated CTD3 from ecCPSF73 also produced a stable and folded protein by NMR spectroscopy (figure 1c). Using these two samples, we began structure determination to access their atomic details.

Solution structures of the two C-terminal modules
To determine the structure of ec73-CTD12/ec100-CTD12, we initiated a crystallization screen, as well as preliminary data collection by NMR spectroscopy. Because no crystals were obtained, we prioritized determination of the solution structure by NMR spectroscopy. Despite the high quality of the NMR spectra of ec73-CTD12/ec100-CTD12 (figure 1b), its overall size of 27 kDa required additional NMR approaches to simplify and reduce ambiguity in the chemical shift assignment.
A usual strategy would involve isotopic labelling of only one protein at a time in the complex; however, attempts to produce the heterodimer from separate bacterial expressions were unsuccessful. We previously took advantage of amino acid-selective isotopic labelling of the complex, with samples [¹³C,¹⁵N]-labelled on Ile, Val or Leu to help with backbone and side chain assignments of these residues [32]. It was also apparent early on that aromatic residues would define the hydrophobic core of this complex, and thus to ensure unambiguous assignment we used samples with ring-deuterated Phe or Tyr/Trp to selectively remove cross-peaks in the spectra. Near-complete chemical shift assignment was therefore obtained for the entire complex [32]. Continuing this selective-labelling approach, we obtained a large number of nuclear Overhauser effect spectroscopy (NOESY)-derived distance restraints, as well as orientational restraints from residual dipolar couplings in a phage-aligned sample (table 1). These data were complemented by dihedral angle predictions to generate an initial ensemble of structures by using ARIA2.3/CNS1.2 [34,35]. Inspection of these ensembles and observation of amide ¹HN NOESY cross-peaks enabled us to include additional hydrogen bond restraints in the final structure calculation. The resulting ensemble of 20 structures was obtained with a consistent overall architecture and good statistics (figure 1d; table 1). The smaller C-terminal module consists only of the CTD3 from ecCPSF73, and in this case the NMR spectroscopy and structure calculation used a standard approach to derive the ensemble of 20 structures (figure 1e; table 1).

Overview of the CTD12-CTD12 heterodimer
As can be seen in the ensemble of structures, the entire CTD12 region from both ecCPSF73 and ecCPSF100 forms a single folded complex (figure 1d). A notable feature of the complex is the extensive contacts made between secondary structure elements of ec73-CTD12 and ec100-CTD12 (figure 2a,b). The buried interface is calculated to be 1910 Å² using the PISA server [45], and the high degree of intermolecular contacts between the two proteins creates an overall architectural stability, as evident from the global fit of the RDC values (table 1). Despite this single folded structure, it is possible to consider three sub-domains within the complex (figure 2a,b): the CTD1-CTD1 β-barrel, a central region, and the CTD2-CTD2 segment.

CTD1-CTD1 β-barrel
Three β-strands from each CTD1 region of ecCPSF73 and ecCPSF100 come together to form the CTD1-CTD1 β-barrel structure (figure 2a,b). Specifically, the β2 strands from each protein form an extended antiparallel arrangement, followed by intramolecular antiparallel β2-β3 contacts and parallel β3-β1 contacts. The barrel is closed by the antiparallel interaction of the two β1 strands, with the β1 strand from ecCPSF100 interrupted by a short helical segment. The hydrophobic residues in the core are shown in figure 2c. For the human complex, the CTD1-CTD1 β-barrel was observed at low resolution in the cryo-EM study of the histone pre-mRNA 3'-end processing machinery (figure 2d) [22]; however, the limited resolution prevented a detailed comparison. To this end, we have used AlphaFold2 [42] to expand the atomic model of the entire human CPSF73/CPSF100 C-terminal complex (electronic supplementary material, figure S2a,b). The top-ranked model displays excellent agreement with the cryo-EM map for the CTD12-CTD12 region (electronic supplementary material, figure S2c). Using this model, we observe that the human CTD1-CTD1 β-barrel is highly similar to the E. cuniculi NMR structure (figure 2e,f) with an RMSD of 3.0 Å (81 residues). To determine the extent of similarity with other species, we then predicted models for the CPSF73/CPSF100 (CPSF3/CPSF2) C-terminal complex from four diverse model organisms (Trypanosoma cruzi, Caenorhabditis elegans, Arabidopsis thaliana and Drosophila melanogaster) as well as the corresponding Ysh1/Cft2 complex from the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe (electronic supplementary material, figure S2d). As expected, an overall structural similarity is clear from these additional models. However, it is also evident that large inserts are present at differing locations within the β-barrels (electronic supplementary material, figure S2e), such as the large β2-β3 loop insert from hs100-CTD1 (figure 2e). As a final comparison, cryo-EM data are available for the IntS11 and IntS9 proteins within the human Integrator complex, such as shown for PDB ID 7bfp (figure 2g) [26]. A similar architecture is clearly observed despite the limited resolution, but with a helix inserted within the IntS11-CTD1 β1-β2 loop.

CTD2 belongs to the TBP domain family
The CTD2-CTD2 region of CPSF73/CPSF100 had already been proposed to form in a similar manner to the C-terminal heterodimer of IntS11/IntS9 [23], as also suggested by the lower resolution cryo-EM data [22]. Consistent with this observation, we find in our structure that the CTD2-CTD2 region forms an extended β-sheet composed of β-strands β4-β8 of ecCPSF73 and β5-β8 of ecCPSF100, with four α-helices on the opposing side (figure 2a,b). To identify key residues, we created a sequence alignment for the C-terminal regions (electronic supplementary material; figure 3a,b). The alignment required use of our predicted models with the PROMALS3D structure-based protocol [46], to account for the variable loops in the sequences. Highest conservation is observed for residues at the interface between the two CTD2 domains (figure 3a,b), and indicates an importance for ionic interactions between the absolutely conserved Asp555 of ecCPSF73 and Arg629 of ecCPSF100, which is an Arg in all sequences except for a Lys in scCft2. Surrounding residues are also conserved. A similar ionic bridge appears to be key for the Integrator proteins (figure 3c), as shown by the interaction of Glu581 from IntS11 and Arg648 of IntS9 [23].
In order to identify additional structural homologues of the CTD2-CTD2 region and the entire CTD12-CTD12 heterodimer, we used the FoldSeek webserver [50] (https://search.foldseek.com/) since it is well-suited for multi-domain structures, and takes into account both the PDB and the AlphaFold Protein Structure Database. As expected, we identified the IntS9/IntS11 C-terminal complex from the Integrator cryo-EM structure (FoldSeek E-value of 0.07 for PDB 7bfq [26]), as well as predicted models for CPSF73/CPSF3 or CPSF100/CPSF2 from numerous species (electronic supplementary material, table S3). However, we also found structural similarity between the CTD2-CTD2 segment and the TATA-box binding protein (TBP), with E-values comparable to that of Integrator. The TBP domain family shares a core architecture of α-β-β-β-β-α, with or without an additional β-strand at the N-terminus (figure 3d). The CTD2 domain thus fits into this family, and the overall CTD2-CTD2 architecture is remarkably similar to TBP except that the N-terminal β-strand is missing in CPSF100 (figure 3e). In addition to TBP, the domain has been identified in bacterial DNA glycosidases and the archaeal ribonuclease H3, and may have evolved from a distant single domain protein ancestor [51]. Each of these proteins is involved to a different degree in nucleic acid binding: TBP has high specificity for the DNA TATA box (figure 3f), the domain from ribonuclease H3 is non-sequence specific for DNA-RNA duplexes, and DNA glycosidases use a separate domain to bind the DNA target (electronic supplementary material, figure S4a). Unlike TBP, the presence of several acidic residues centred in the β-strands would argue against a nucleic acid-binding function for the CTD2-CTD2 β-sheet in CPSF73/100 (figure 3g). Indeed, using NMR spectroscopy and a previously established electrophoretic mobility shift assay protocol [44], we did not detect any binding by the CTD12-CTD12 heterodimer to single-stranded or double-stranded DNA and RNA ligands (electronic supplementary material, figure S4d,e).
[Figure 2 caption, partially recovered: ... β-barrel from the human histone pre-mRNA 3' processing complex from EMD-21050, and the derived PDB 6x4v [22]. (e) CTD1-CTD1 β-barrel from the AlphaFold2 model (electronic supplementary material, figure S2a). (f) E. cuniculi CTD1-CTD1 β-barrel with the same orientation as (d,e,g). (g) CTD1-CTD1 β-barrel from human Integrator (PDB ID 7bfp) [26].]

Hydrophobic contacts in the central region
In our high-resolution structure of the CTD12-CTD12 heterodimer, we were able to observe all residues within the complex, and could therefore detect a large hydrophobic core that joins the CTD1-CTD1 and CTD2-CTD2 regions (figure 4a). In the context of the complete CTD12-CTD12 heterodimer (figure 2a,b), these residues connect the upper surface of the CTD1-CTD1 β-barrel, ecCPSF73 linker helix α1, ecCPSF100 linker strand β4, the bottom of the CTD2-CTD2 segment, and the final β-strand (β9) of ecCPSF100. This central hydrophobic core is not unique to E. cuniculi, as seen in the high conservation of corresponding residues mapped onto the human CTD12-CTD12 model (figure 4b). The final residue of ecCPSF100 is particularly interesting in this context, as the C-terminus position is strictly conserved in length and is invariably a hydrophobic residue (electronic supplementary material, figure S3b). This hydrophobic C-terminal side chain (Ile639 in ecCPSF100; figure 4a) is structured in the CTD12-CTD12 ensemble, and directly contributes to the central hydrophobic core. The AlphaFold models suggest that a similar situation exists for other organisms, such as for Val782 at the C-terminus of hsCPSF100 (figure 4b). Re-analysis of the initial crystal structure of the IntS11-IntS9 CTD2-CTD2 heterodimer (PDB 5v8w) [23] shows that intermolecular contacts already exist outside of the primary CTD2-CTD2 interface. For example, the final residue of hsIntS9 (Phe658) is visible in the crystal structure, and is surrounded by neighbouring hydrophobic residues N-terminal to the CTD2 domains (figure 4c). Despite the absence of residues from the top of the CTD1-CTD1 β-barrel, this partial central core is robust enough to form in a manner similar to that observed in the complete ec73-CTD12/ec100-CTD12 heterodimer.

CTD3 of CPSF73 binds Symplekin
In contrast to the larger CTD12-CTD12 module, the second module of the CPSF73/CPSF100 C-terminal complex is only composed of ecCPSF73 CTD3. A representative model from the ensemble of structures shows that ec73-CTD3 is composed of a single folded domain that is joined by a flexible linker to the rest of ecCPSF73 (figure 5a,b). The α-β-β-β-β-α architecture again places it in the TBP domain family, and a structure similarity search using DALI (http://ekhidna2.biocenter.helsinki.fi/dali/) reveals shared homology to many of the same hits as ec73-CTD2 and ec100-CTD2 (electronic supplementary material, table S4) including a top shared hit to the C-terminal domain of an Sm-like archaeal protein (PDB ID 1m5q; electronic supplementary material, figure S4b). In keeping with a four-stranded TBP domain fold, ec73-CTD3 also displays structural similarity to CTD2 in ecCPSF100 (electronic supplementary material, figure S4c), despite sharing only 13 identical residues (approx. 20% of the sequence) between the two folded domains. Similar to the CTD12-CTD12 heterodimer, we failed to detect a general ability of ec73-CTD3 to bind nucleic acids (electronic supplementary material, figure S4f). However, it was shown in the cryo-EM structure of the human histone pre-mRNA 3′ processing complex that CPSF73 CTD3, and also the CTD2-CTD2 region, are likely in direct contact with the Symplekin protein [22]. We therefore decided to see if our ecCPSF73 and ecCPSF100 constructs (figure 1a) indeed bind to the E. cuniculi Symplekin (ecSympk). In the human complex, it is the C-terminal HEAT repeats of Symplekin that are involved in the interaction, and therefore we started with the same region of ecSympk (figure 5c; residues 160-591). Using a pull-down assay, we found that the entire C-terminal complex of ec73-CTD123/ec100-CTD12 was able to interact with ecSympk(160-591) (figure 5d, lane 4; electronic supplementary material, figure S5a). In contrast, there was no detectable interaction with the CTD12-CTD12 heterodimer in which only CTD3 from ecCPSF73 was removed (lane 5). Furthermore, the pull-down assay suggested that the isolated ec73-CTD3 construct alone might retain the ability to bind (broadened protein band in lane 6). We further refined the interacting region in ecSympk to residues 160-385 (figure 5d; electronic supplementary material, figure S5b). To confirm our findings and to better probe the role of isolated CTD3, we used NMR spectroscopy to test for binding to ecSympk(160-385). Once again, we failed to detect any interaction between the CTD12-CTD12 heterodimer and ecSympk (figure 5e), whereas binding by ec73-CTD3 was clearly evident due to the large degree of chemical shift perturbation upon the addition of GST-ecSympk(160-385) (figure 5f).
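As an illustration of how a methyl chemical shift perturbation of the kind described above can be put on a single scale, the following minimal sketch combines the 1H and 13C shift changes of one crosspeak into a weighted distance. This is a commonly used convention and not necessarily the analysis performed in this study; the function name `methyl_csp`, the `carbon_weight` value of 0.25 and the peak positions are all assumptions made for the example.

```python
import math


def methyl_csp(h_free, c_free, h_bound, c_bound, carbon_weight=0.25):
    """Combined 1H/13C chemical shift perturbation (ppm) for one methyl crosspeak.

    carbon_weight scales the 13C shift change to the 1H scale. The value
    0.25 is an assumption for illustration, not a value from the study.
    """
    delta_h = h_bound - h_free
    delta_c = c_bound - c_free
    return math.sqrt(delta_h ** 2 + (carbon_weight * delta_c) ** 2)


# Hypothetical peak positions (ppm), not measured data.
unperturbed = methyl_csp(0.85, 21.3, 0.85, 21.3)
perturbed = methyl_csp(0.85, 21.3, 0.95, 22.1)
print(f"unperturbed: {unperturbed:.3f} ppm, perturbed: {perturbed:.3f} ppm")
```

Applying such a measure to each crosspeak gives one value per methyl group, so that residues with the largest changes can be ranked.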

Conserved interaction with Symplekin
Given the importance of ec73-CTD3 in the binding of ecSympk(160-385), we chose to generate a model of this complex in order to design mutants to disrupt this interaction. The AlphaFold model of the complex (electronic supplementary material, figure S6a,b) indicates that the main interface is hydrophobic and centred on helix α1 of ec73-CTD3 (figure 6a). We chose to introduce the acidic residue Glu in place of Val589 in the middle of this interface, or Leu585 towards the edge. Based on NMR spectroscopy, both mutant proteins were well folded (figure 6b) but only V589E prevented this interaction. The ability of the ec73-CTD3 V589E mutant to disrupt the interaction with ecSympk was confirmed in a pull-down experiment (lane 11, figure 6c). This assay also shows that mutation of Leu585 is only able to disrupt the interface in the context of the helix α1 L585K,N592D double mutation (lanes 10 and 12, figure 6c). The corresponding hydrophobic surface on ecSympk is predicted to include Phe212, Tyr215, Phe216 and Val245 (figure 6a), and indeed a V245D mutation is able to weaken the complex (electronic supplementary material, figure S5c).
To explore a wider conservation of the CTD3-Sympk interaction, we expanded our AlphaFold models to include the C-terminal region of human Symplekin in complex with hs73-CTD123 and hs100-CTD12 (figure 7a; electronic supplementary material, figure S6c). There is excellent agreement of this model with the cryo-EM map of the histone pre-mRNA 3′-end processing complex (electronic supplementary material, figure S6d) [22]. Using this model, we were able to look for surface residue conservation by using the ConSurf program [48]. The region on Symplekin in contact with hs73-CTD3 is enriched in conserved residues (white circle, figure 7b), and is surrounded by residues that are highly variable. Furthermore, the hs73-CTD3 surface in contact with hsSympk is enriched in conserved hydrophobic residues (right, figure 7b), consistent with the interface we observed for ec73-CTD3. Residue conservation for the CTD2 domains is instead mostly limited to residues that form the CTD2-CTD2 interface (as shown in figure 3a,b). Additional models for the CPSF73-CPSF100-Sympk C-terminal complexes in T. cruzi, C. elegans, A. thaliana and D. melanogaster show a similar arrangement of CPSF73 CTD3 onto Symplekin (electronic supplementary material, figure S6e). A conserved interaction for the yeast homologues is also seen from models determined for Ysh1-Cft2-Pta1 C-terminal complexes from S. cerevisiae and S. pombe (electronic supplementary material, figure S6e). Using S. cerevisiae as an example, the surface region on scPta1 in contact with scYsh1-CTD3 is clearly enriched in conserved residues (figure 7c,d).

Discussion
We have determined atomic details for the two structured modules formed by the C-termini of CPSF73 (CPSF3) and CPSF100 (CPSF2). The largest module comprises a heterodimer of the CTD1 and CTD2 from each protein, with extensive intermolecular contacts creating the 27 kDa complex. For full structure determination by NMR spectroscopy, a complex of this size remains challenging, particularly without an option to selectively label only one of the peptides. We instead took advantage of several amino acid type labelling strategies to simplify the NMR data analysis, reduce ambiguity in the crowded spectra, and obtain a large number of structural restraints (as shown in table 1). Using NMR spectroscopy on the isolated CTD1-CTD2 heterodimer allowed us to access full details of the intra- and intermolecular contacts, and to detect aspects of protein dynamics. Apart from high-resolution details of the CTD1-CTD1 and CTD2-CTD2 regions of the CPSF73-CPSF100 complex, we observed a significant number of hydrophobic residue contacts between these components to form an extended hydrophobic core. The entire CTD1-CTD2 heterodimer therefore behaves as a single structural unit, which likely persists within the assembled CPSF. Focusing on the β-barrel formed by the CTD1 of both proteins, we confirm structural similarity to the partial models built for this region from cryo-EM data of the histone pre-mRNA 3′ processing machinery [22], as well as IntS11 and IntS9 within Integrator [26,29,52]. Based on both predicted models and alignments from diverse model organisms (electronic supplementary material, figures S2 and S3), the β-barrel region typically includes loop insertions of varying length, such as the extensive 62-amino-acid loop between strands β2 and β3 in human CPSF100. The function of these species-specific loops is currently unknown. We note, however, that the CTD1-CTD2 regions from E. cuniculi CPSF73 and CPSF100 lack these extended loops, which may have been a factor in their favourable in vitro behaviour.
Moving to the CTD2 region, we find that the CTD2-CTD2 dimer is indeed similar to the heterodimer structure first determined for the isolated CTD2 of Integrator proteins IntS11 and IntS9 [23]. A search for additional structure similarity shows that the CTD2 falls into the larger family of TBP domains [51]. The CTD3 of CPSF73 is also a TBP domain, with a four-stranded architecture such as seen in the CTD2 of CPSF100. The TBP domain family is typically associated with nucleic acid binding proteins, with direct contact to DNA observed for the tandem TBP domains of TATA-box binding protein [49,53], and the binding of an RNA-DNA duplex by the TBP domain of archaeal ribonuclease HIII [54]. In both cases, the contact to nucleic acids is by the β-sheet surface. In contrast, we did not find any detectable binding to single- or double-stranded oligonucleotides for the CTD2 of CPSF73-CPSF100 or indeed for CPSF73 CTD3 (electronic supplementary material, figure S4). Although it is possible that binding is restricted to a specific motif, the fact that there are few conserved basic residues on the β-sheet surfaces argues against a nucleic acid-binding function. A similar case is observed for IntS11 and IntS9, for which there is non-specific ssRNA binding detected along one side of the CTD1-CTD1 β-barrel within the IntS4-IntS9-IntS11 trimer, but no interaction with the CTD2 domains [26]. Instead, the AlphaFold2 predicted models show that the peptide region of Symplekin/Pta1 that connects the N- and C-terminal domains tends to lie across the centre of the CTD2-CTD2 β-sheet surface (figure 7a,b; electronic supplementary material, figure S6). It may be that the TBP domains in these cases have a preference for binding peptides, and thus may help in the overall architecture of the CPSF and histone mRNA 3′ processing machinery.
In contrast to the larger CTD12-CTD12 heterodimer, the second structured module in the CPSF73-CPSF100 C-termini is just the single CTD3 domain of CPSF73. In this case, there is a clear role for the TBP domain in protein-protein interaction. However, this interaction with Symplekin involves the two α-helices on the opposite side of CTD3 from the β-sheet surface. Although we did not detect a high affinity interaction between the Symplekin CTD and the CPSF73-CPSF100 heterodimer module, it is again the α-helices of the CTD2 domains that make contact with Symplekin in the assembly of the three proteins. Our NMR data provided atomic details of the CPSF73 CTD3 and also allowed us to probe the specific interaction with Symplekin. A hydrophobic surface centred on Val589 in the first α-helix of ec73-CTD3 (corresponding to Met617 in human CPSF73; figure 7a) is key to the interaction with Symplekin, and a mutation to Glu preserves the fold but abolishes the interaction. For the corresponding surface on Symplekin, we combined sequence analysis (ConSurf) and predicted models (AlphaFold2) to identify a consistent patch of conservation where CPSF73 CTD3 appears to bind (figure 7a).
The key role of CTD3 in the interaction between CPSF73 and Symplekin also marks an important difference in comparison to Integrator. For IntS11 (a paralogue of CPSF73) there is no CTD3 domain, and therefore this mode of interaction is not possible. Indeed, except for a common role in the assembly architecture, Symplekin and the Integrator protein IntS4 share little sequence similarity and only resemble each other by containing a series of HEAT repeats [25]. Consistent with this difference, the main contact between IntS11-IntS9 and the C-terminus of IntS4 is on the opposite side of the CTD12-CTD12 heterodimer, as compared to Symplekin and CPSF73-CPSF100 [26]. Further specification of the CPSF73-CPSF100 interaction with Symplekin lies in the fact that despite some sequence conservation, CPSF73 cannot form a heterodimer with IntS9 [27]. Altogether, the intimate and specific dimerization of CPSF73 and CPSF100, via the CTD12-CTD12 heterodimer, ensures that only Symplekin is able to interact in the formation of the trimeric CCC/mCF core cleavage module.
The available cryo-EM data for human CPSF and the histone pre-mRNA 3′ processing machinery, along with several studies of Integrator, have allowed us to interpret our findings on CPSF73, CPSF100 and Symplekin in various contexts. In contrast, the orthologous regions in the yeast proteins, namely the C-termini of Ysh1, Cft2 and Pta1, do not have reported structural data. Nevertheless, our analysis of sequence conservation and predicted models provides confidence that similar interactions of the three proteins are preserved within the yeast CPF. A similarity between the yeast and human proteins has been previously validated by the ability of some mammalian factors to replace yeast proteins [10,55]. In CPF, Ysh1 and Cft2 belong to the nuclease module within the core CPF (CPFcore) [30] and their C-terminal regions likely form a CTD12-CTD12 heterodimer as observed for CPSF73 and CPSF100 (figure 7c). The model of the Ysh1-Cft2-Pta1 ternary complex supports a shared mode of interaction with Pta1 driven by the CTD3 domain of Ysh1, with further contacts to the CTD2 α-helices of Ysh1 and Cft2. This interaction helps link the CPF nuclease module to the phosphatase module. As seen for Symplekin, there is minimal direct contact of Pta1 with Cft2 in the predicted model, which in yeast may have an additional consequence. The APT complex (named for 'associated with Pta1') is a separate 3′ processing machinery in S. cerevisiae that displays a preference for sn/snoRNA [56]. The Syc1 protein is key to forming the APT complex, and appears to bind Pta1 in a mutually exclusive manner to Ysh1 [57]. The two domains of full-length Syc1 are most closely related to CTD2 and CTD3 of Ysh1/CPSF73 [58]. An AlphaFold model of the Syc1-Pta1 complex shows a remarkable similarity in the position and mode of Pta1 binding between Syc1 and scYsh1-CTD23 (figure 7e; electronic supplementary material, figure S6f). This model would explain how the single Syc1 protein binds to Pta1 in a mutually exclusive manner to Ysh1 (and Cft2), to prevent inclusion of the nuclease and polymerase modules in the APT complex. The model again highlights the important role played by the CTD3 domain. The second domain in Syc1, which we name 'CTD3' due to similarity to scYsh1-CTD3, has complete conservation of all residues that interact with Pta1 (figure 7f).
The structural details we have determined for the C-terminal complex of CPSF73 and CPSF100, as well as the trimer formation with Symplekin, fill a previous gap in the atomic description of CPSF in general, and CCC/mCF in particular. Our high-resolution data allow for a detailed description of the complete CTD12-CTD12 heterodimer, as well as the CPSF73 CTD3 structure and interaction with Symplekin. The next steps will use this information to understand the role of these interactions in the function of the CCC/mCF, taking advantage of structure-guided mutations. The consequence of post-translational modifications will add a further layer of regulation to be addressed. Related to the current study, sumoylation has been observed for human CPSF73 on Lys462 and Lys465, which we place along the first β-strand of CTD1 [59]. This modification would clearly impact assembly into CPSF or CCC/mCF. Sumoylation also occurs on Lys535, located in the loop between the first and second β-strands of CTD2, which may have additional implications. These and other protein modifications may be important in certain developmental or disease states. Finally, further study of the complexes from different organisms, including our use of the parasite E. cuniculi, may discover functionally important species-specific aspects within the CCC/mCF such as in infectious parasites. Such differences could in turn be exploited to derive new therapeutic avenues to treat infections and disease.
Ethics. This work did not require ethical approval from a human subject or animal welfare committee.

Figure 3. The CTD2 domains are part of the TBP domain family. (a) Close-up view of the interdomain contacts between ec73-CTD2 and ec100-CTD2. (b) Conservation of residues at the CTD2-CTD2 interface (indicated by black triangles) using PROMALS3D [46] and ESPript [47]. The bottom line shows conservation of each residue based on the human sequences and generated by ConSurf [48]. Colour spectrum goes from poorly conserved (teal) to highly conserved (magenta). Full alignments and complete details are found in electronic supplementary material, figure S3. (c) Close-up view of the interdomain contacts between hsIntS11-CTD2 and hsIntS9-CTD2 (PDB ID 5v8w) [23]. (d) Five-stranded (left) and four-stranded (right) TBP domain folds. (e) The TBP protein is composed of tandem five-stranded TBP domains, as compared to the five-stranded TBP domain from CPSF73 and the four-stranded TBP domain from CPSF100. The additional β-strand in TBP is highlighted in grey. (f) Cartoon representation (top) and surface charge representation (bottom) of the TBP-DNA complex (PDB ID 1cdw) [49]. (g) The corresponding cartoon and surface charge view for ec73-CTD2/ec100-CTD2.

Figure 4. Hydrophobic core of the CTD12-CTD12 central region. (a) Close-up view of the central region with residue sidechains annotated and shown in stick representation. The C-terminal residue is indicated with an asterisk. (b) Central region from an AlphaFold model of the human CTD12-CTD12 complex. Annotated hydrophobic sidechains are coloured based on their ConSurf conservation score (most conserved in magenta, average in white, and least conserved in teal). See electronic supplementary material, figure S2, for complete ConSurf analysis. (c) Annotated sidechains from the partial central region of the isolated CTD2-CTD2 complex from human Integrator proteins IntS11 and IntS9 (PDB ID 5v8w) [23]. (d) For reference, the boxed view shows a superposition onto the intact Integrator proteins within the structure of the human Integrator-PP2A complex (PDB ID 7cun) [52].

Figure 5. CTD3 of CPSF73 is required to bind Symplekin. (a) Schematic of the ec73-CTD3 domain. (b) Representative model of ec73-CTD3 with annotated secondary structure elements. (c) Domain architecture of ecSympk with construct boundaries. (d) SDS-PAGE gel following pull-down of various ec73 and ec100 samples using empty beads, the full HEATc, or a truncated HEATc construct of ecSympk. (e) Methyl group region of ¹³C-heteronuclear multiple quantum correlation (¹³C-HMQC) spectra for the CTD12-CTD12 heterodimer without (magenta) and with (cyan) two molar equivalents of unlabelled GST-ecSympk(160-385). No changes are detected between the two spectra, supporting an absence of interaction. (f) Similar experiments with isolated ec73-CTD3 (red) and upon addition of GST-ecSympk(160-385) (cyan) show chemical shift perturbation of most methyl groups. The inset shows large perturbation (dotted lines) of the two methyl crosspeaks from Val589.

Figure 6. Mutagenesis of the interface between ec73-CTD3 and ecSympk. (a) AlphaFold model of the complex between ec73-CTD3 and ecSympk(160-385) including surface charge representation from blue as basic to red as acidic, and key interface residues indicated. (b) Similar to figure 5f, methyl group region in ¹³C-HMQC spectra for ec73-CTD3 mutants L585E (left) and V589E (right), in the absence (red) and presence (cyan) of unlabelled GST-ecSympk(160-385). Both ec73-CTD3 mutant proteins are folded, but only V589E prevents binding to ecSympk. The mutation of Val589 is also confirmed by the disappearance of its two methyl groups in the spectrum (dotted circle) when changed to a glutamate. (c) SDS-PAGE gel from a pull-down experiment between ecSympk(160-591) and the wild-type ec73-CTD123/ec100-CTD12 complex, or a series of wild-type and mutant ec73-CTD3.

Figure 7. Conserved interaction between CTD3 and Symplekin/Pta1. (a) AlphaFold model of the human CPSF73-CPSF100-Symplekin C-terminal trimer (electronic supplementary material, figure S6c,d). (b) Surface residue conservation using ConSurf [48] with the location of CTD3 binding on Sympk circled (left), and the corresponding surface shown on hs73-CTD3 (right). Surface hydrophobic residues on hsCTD3 are annotated in red. Also annotated are hs73-CTD2 Asp578 and hs100-CTD2 Arg772 at the CTD2-CTD2 interface. (c) Predicted model of the S. cerevisiae Ysh1-Cft2-Pta1 C-terminal complex (electronic supplementary material, figure S6e). The C-terminal HEAT repeats of scPta1 are shown in surface representation, and the other proteins are shown as cartoons. (d) Surface residue conservation of scPta1. (e) Predicted model of the S. cerevisiae Syc1-Pta1 complex (electronic supplementary material, figure S6f). (f) Identical residues between S. cerevisiae Syc1 and Ysh1 (electronic supplementary material, figure S6g) are coloured in magenta. All residues in the 'CTD3' domain of Syc1 that contact Pta1 are identical with scYsh1.

Table 1. NMR and refinement statistics.
b Calculated as described in