Structural basis for centromere maintenance by Drosophila CENP‐A chaperone CAL1

Abstract
Centromeres are microtubule attachment sites on chromosomes defined by the enrichment of histone variant CENP‐A‐containing nucleosomes. To preserve centromere identity, CENP‐A must be escorted to centromeres by a CENP‐A‐specific chaperone for deposition. Despite this essential requirement, many eukaryotes differ in the composition of players involved in centromere maintenance, highlighting the plasticity of this process. In humans, CENP‐A recognition and centromere targeting are achieved by HJURP and the Mis18 complex, respectively. Using X‐ray crystallography, we here show how Drosophila CAL1, an evolutionarily distinct CENP‐A histone chaperone, binds both CENP‐A and the centromere receptor CENP‐C without the requirement for the Mis18 complex. While an N‐terminal CAL1 fragment wraps around CENP‐A/H4 through multiple physical contacts, a C‐terminal CAL1 fragment directly binds a CENP‐C cupin domain dimer. Although divergent at the primary structure level, CAL1 thus binds CENP‐A/H4 using evolutionarily conserved and adaptive structural principles. The CAL1 binding site on CENP‐C is strategically positioned near the cupin dimerisation interface, restricting binding to just one CAL1 molecule per CENP‐C dimer. Overall, by demonstrating how CAL1 binds CENP‐A/H4 and CENP‐C, we provide key insights into the minimalistic principles underlying centromere maintenance.


Introduction
Centromeres are specialised chromosomal regions that act as a platform for the assembly of kinetochores, the microtubule anchoring sites essential for chromosome segregation during mitosis and meiosis (Musacchio & Desai, 2017). Unlike budding yeast where DNA sequence is sufficient to define centromere identity, centromeres in most other eukaryotes are defined by the enrichment of unique nucleosomes containing the histone H3 variant CENP-A (Sekulic & Black, 2012;Zasadzinska & Foltz, 2017). As a consequence, maintenance of CENP-A-containing nucleosomes is essential for preserving centromere identity through generations of cell cycles. This is achieved through an epigenetic mechanism that relies on CENP-A as an epigenetic mark (Westhorpe & Straight, 2014;McKinley & Cheeseman, 2016;Musacchio & Desai, 2017;Zasadzinska & Foltz, 2017).
Unlike canonical chromatin maintenance, centromeric chromatin maintenance is decoupled from DNA replication. As a result, CENP-A levels on the sister chromatids are reduced by half during replication (Jansen et al, 2007;Hemmerich et al, 2008;Dunleavy et al, 2009;Mellone et al, 2011;Lidsky et al, 2013). To ensure stable centromere maintenance, CENP-A nucleosomes must return to their original levels through active CENP-A deposition. The timing of CENP-A deposition varies among species; however, the underlying mechanisms appear to share significant similarity (Zasadzinska & Foltz, 2017). A central player in this process is the CENP-A-specific chaperone HJURP in human and its homologue Scm3 in fungi (Kato et al, 2007;Foltz et al, 2009;Pidoux et al, 2009;Sanchez-Pulido et al, 2009;Dunleavy et al, 2011). Both HJURP and Scm3 can bind the CENP-A-histone H4 (CENP-A/H4) heterodimer in its pre-nucleosomal form, and these complexes are then targeted to centromeres by the Mis18 complex (Fujita et al, 2007;Moree et al, 2011;Dambacher et al, 2012;Hayashi et al, 2014;McKinley & Cheeseman, 2014;Nardi et al, 2016;Stellfox et al, 2016;French et al, 2017;Hori et al, 2017). While the human Mis18 complex is composed of Mis18α, Mis18β and Mis18BP1, the fission yeast Mis18 complex consists of Mis18, Mis16, Eic1 and Eic2, where Eic1 and Eic2 are proposed to be functional equivalents of human Mis18BP1 (Fujita et al, 2007;Hayashi et al, 2014;Subramanian et al, 2014). The timing of Mis18 complex assembly, its centromere targeting, and subsequent CENP-A deposition are suggested to be tightly controlled by the kinase activities of CDK and Plk1 (Silva et al, 2012;McKinley & Cheeseman, 2014;Stankovic et al, 2017;French & Straight, 2019). While we know the identity of the key players involved in centromere maintenance, a molecular and mechanistic understanding of their intermolecular cooperation is only just emerging (Stellfox et al, 2016;Pan et al, 2017;Spiller et al, 2017).
Strikingly, Drosophila species have regional centromeres defined by the presence of CENP-A (also called CID in this organism), but lack clear homologues of HJURP and the subunits of the Mis18 complex. Instead, fly-specific CAL1 appears to combine the roles of both HJURP and the Mis18 complex: pre-nucleosomal CENP-A recognition and its targeting to the centromere for deposition, respectively (Phansalkar et al, 2012). Targeting CAL1 to noncentromeric DNA in Drosophila cells can recruit CENP-A and establish centromeres capable of assembling kinetochore proteins and microtubule attachments (Chen et al, 2014). These observations and the ability of CAL1 to bind CENP-A/H4 and CENP-C with its N- and C-terminal regions, respectively, collectively established CAL1 as a "self-sufficient" CENP-A-specific assembly factor in Drosophila (Schittenhelm et al, 2010;Chen et al, 2014). However, a structure-level mechanistic understanding of how CAL1 binds CENP-A/H4 and CENP-C to facilitate the establishment and maintenance of centromeres is yet to be determined. The simplicity of the centromere maintenance pathway in Drosophila makes it a unique model system to understand the fundamentally conserved structural principles underlying centromere maintenance.
In this study, we present the structural basis for the recognition of CENP-A/H4 and CENP-C by CAL1. Our analysis reveals that although CAL1 does not share noticeable sequence similarity with its human or fission yeast counterpart, it recognises CENP-A/H4 using both conserved and adaptive structural principles. We also provide the structural framework of interactions responsible for CENP-C recognition by CAL1. Our structural analysis, together with validation of structure-guided mutants in vitro and in cells, provides the molecular basis for the mechanism by which CAL1 singlehandedly recognises and targets CENP-A to centromeres to maintain centromere identity in flies.

Results
The N-terminal region of CAL1 forms a heterotrimer with the histone fold domain of CENP-A and H4
Secondary structure prediction analysis indicated that CAL1 is likely to be a predominantly unstructured protein, although it includes an N-terminal domain spanning amino acid (aa) residues 1-200 predicted to fold into α helices (Fig EV1A and B). With the aim of structurally characterising the intermolecular interactions responsible for CAL1 binding to CENP-A/H4, we reconstituted a protein complex containing the N-terminal 160 aa of CAL1, a putative histone fold domain of CENP-A and H4 (His-CAL1 1-160-CENP-A 101-225-H4) (Fig 1A) using recombinant proteins as previously reported (Chen et al, 2014). Limited proteolysis experiments performed on the CAL1 1-160-CENP-A 101-225-H4 complex using different proteases suggested that a CENP-A fragment containing aa 144-225 (CENP-A 144-225) is sufficient to interact with CAL1 and H4. Subsequently, using CAL1 1-160, CENP-A 144-225 and H4, we reconstituted a truncated protein complex (His-CAL1 1-160-CENP-A 144-225-H4). The molecular weights (MW) measured for His-CAL1 1-160-CENP-A 101-225-H4 and His-CAL1 1-160-CENP-A 144-225-H4 using size-exclusion chromatography combined with multi-angle light scattering (SEC-MALS) are 47.0 ± 0.9 and 43.4 ± 0.8 kDa, respectively (Fig EV1C). These values match the calculated MW for a 1:1:1 heterotrimeric assembly for both complexes (46.7 and 41.7 kDa, respectively) and are in agreement with our previous report (Roure et al, 2019). This observation is also in agreement with the subunit stoichiometry of the human pre-nucleosomal CENP-A/H4 in complex with HJURP (Hu et al, 2011).
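The comparison between measured and calculated masses can be made explicit with a small calculation. The sketch below is only an illustration, not the authors' analysis: it takes the SEC-MALS values quoted above and reports how far each measurement lies from the calculated 1:1:1 mass, both as a fraction and in multiples of the reported error. The function name `sigma_deviation` and the dictionary labels are hypothetical.

```python
def sigma_deviation(measured_kda, error_kda, calculated_kda):
    """Return (relative deviation, deviation in multiples of the reported error)."""
    diff = measured_kda - calculated_kda
    return diff / calculated_kda, diff / error_kda


# Values quoted in the text (kDa): measured mass, reported error, calculated 1:1:1 mass.
complexes = {
    "CAL1(1-160)-CENP-A(101-225)-H4": (47.0, 0.9, 46.7),
    "CAL1(1-160)-CENP-A(144-225)-H4": (43.4, 0.8, 41.7),
}

for name, (measured, error, calculated) in complexes.items():
    rel, n_err = sigma_deviation(measured, error, calculated)
    print(f"{name}: {rel:+.1%} from the 1:1:1 mass ({n_err:+.1f} x reported error)")
```

Both measurements fall within a few per cent of the heterotrimer mass, which is the sense in which the values "match" here.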
Structure determination of the CAL1 1-160-CENP-A/H4 complex
Extensive crystallisation trials with CAL1 1-160-CENP-A 101-225-H4 and CAL1 1-160-CENP-A 144-225-H4 yielded two different crystal forms: form I that diffracted X-rays to about 3.5 Å and form II that diffracted anisotropically to about 4.4 Å (Table 1). Molecular replacement was performed for the dataset collected from form I using the coordinates of the Drosophila melanogaster (dm) H3/H4 heterodimer (deduced from the structure of the dm nucleosome core particle, PDB: 2PYO) (Clapier et al, 2008). The molecular replacement solution yielded initial phases sufficient for subsequent rounds of model building and refinement (Fig EV2A). The final model included residues 17-47 of CAL1, 147-220 of CENP-A and 27-98 of H4 and was refined to an R factor of 27.2% and an R free factor of 28.6% (Fig 1B and Table 1). Although we used a CAL1 fragment spanning residues 1-160 in the crystallisation experiment, the calculated electron density map accounted only for CAL1 residues 17-47. Considering these crystals took more than a year to form, we concluded that CAL1 was proteolytically cleaved, which may have facilitated the crystallisation of a truncated complex.
The refined model obtained using crystal form I was used as a template in molecular replacement to determine the structure of crystal form II (Figs 1C and EV2B). The difference electron density map calculated using the molecular replacement solution revealed unambiguous density for most main chain atoms of CAL1 1-160. Considering the modest resolution of the structure, intermolecular interactions stabilising the CAL1-CENP-A/H4 complex were further analysed using chemical cross-linking mass spectrometry (CLMS). Purified recombinant CAL1 1-160-CENP-A 101-225-H4 complex was cross-linked using EDC (solid lines), a zero-length cross-linker that covalently links carboxylate groups of Asp or Glu residues with primary amines of Lys and the N-terminus, or the hydroxyl group of Ser, Thr and Tyr, or BS3 (dashed lines), a cross-linker that covalently links amine to amine or to the hydroxyl group of Ser, Thr and Tyr. The cross-linked peptides were analysed by mass spectrometry to identify intra- and intermolecular contacts (Fig EV3). Notably, the data revealed intramolecular cross-links between the N- and C-terminal regions of CAL1 1-160, particularly linking Ser19 and Lys20 to Glu139 and Glu155, suggesting a direct interaction between these regions (Fig EV3). This information was particularly helpful in tracing the backbone atoms of residues beyond CAL1 residue 47 within the electron density map.
Figure 1. N-terminal 160 amino acids of CAL1 wrap around CENP-A/H4 heterodimer to form a heterotrimeric assembly.
A Schematic representation of structural features of CAL1, CENP-A and H4. Filled boxes represent folded domains. B Overall structure of His-CAL1 1-160-CENP-A 101-225-H4 (crystal form I). CAL1 is shown in blue, CENP-A in maroon and H4 in green. C Overall structure of His-CAL1 1-160-CENP-A 144-225-H4 (crystal form II). CAL1 is shown in blue, CENP-A in maroon and H4 in green.
© 2020 The Authors. The EMBO Journal 39: e103234 | 2020

Overall structure of CENP-A/H4 assembly
The structures obtained from two different crystal forms together provide key insights into the overall architecture of the assembly (Fig 1B and C). Structural superposition analysis showed that the CENP-A/H4 heterodimer (form I) aligns well with the H3/H4 heterodimer (PDB: 2PYO) with a root mean square deviation (RMSD) of 1 Å (Fig EV4A). This suggests that both H3 and CENP-A use an identical mode of H4 binding. However, CENP-A α1, H4 α3 and the C-terminal tail show conformational variations in the CAL1-bound CENP-A/H4 complex, likely due to CAL1 binding (Fig EV4B). Particularly, in the H3/H4 structure, the C-terminal tail of H4 folds back and makes contacts with the H3 α3, resembling the CAL1 interaction at the equivalent region of CENP-A in the CAL1/CENP-A/H4 structure. The H4 C-terminal tail possibly swings away from this site upon CAL1 binding. The overall structure of dm CENP-A/H4 (form I) is very similar to human CENP-A/H4 (PDB: 3NQJ) (Sekulic et al, 2010) with an RMSD of 1 Å (Fig EV4C). However, noticeable conformational variation is seen in loop L1, possibly to accommodate the amino acid variations between HJURP and CAL1 (Fig EV4C).
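For readers unfamiliar with the RMSD values quoted in such comparisons, the following is a minimal sketch of the final step of the calculation. It assumes that the two structures have already been superposed and that equivalent atoms are paired by index; the superposition itself (for example by a least-squares fit) is not shown. The coordinates are hypothetical and are not taken from any deposited structure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions in Å (illustrative only).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# The same points displaced by 1 Å along x.
shifted = [(x + 1.0, y, z) for x, y, z in reference]

print(rmsd(reference, shifted))
```

An RMSD of about 1 Å therefore means that equivalent atoms sit, in the root-mean-square sense, roughly 1 Å apart after superposition.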
CAL1 binds CENP-A/H4 heterodimers through multiple physical contacts
CAL1 1-160 is almost entirely made of α helices that make multiple contacts with the CENP-A/H4 heterodimer by wrapping around it (Figs 1C and 2A). Most CENP-A contacts are made by CAL1 helices α1 and α2 and loop L1, which interact with the CENP-A helices α2, α1 and loop L1, respectively, involving a total interface area of about 940 Å². Particularly, while the N-terminal half of the CAL1 α1 helix packs against CENP-A α2 involving electrostatic (CAL1 R18 with CENP-A Q90) and hydrophobic (involving CAL1 L11 and M14) interactions, the C-terminal half, mainly aa W22 and F29, is sandwiched between CENP-A α2 and H4 α3 (Fig 2A). CAL1 L1 crosses over CENP-A L1 to facilitate CAL1 α2 interaction with CENP-A α3.
In addition, CAL1 α4 contacts both CENP-A α2 and α3 involving an interface area of about 80 Å². These CAL1-CENP-A interactions appear to be further stabilised by CAL1 α5 and α6, which together with CAL1 α1 make an intramolecular helical bundle resembling a latch that restrains the position of the α1 helix (Fig 1C).
To validate the requirement of these interactions in cells, we expressed CENP-A-GFP-LacI in U2OS cells containing a synthetic array with a LacO sequence integrated in a chromosome arm (Janicki et al, 2004) and analysed its ability to recruit CAL1-V5 (Roure et al, 2019). When CENP-A-GFP-LacI was tethered to the LacO site, CAL1 WT was efficiently recruited (Fig 3A). Consistent with our in vitro binding assay, CENP-A-GFP-LacI recruited CAL1 F43R threefold less efficiently when compared to CAL1 WT . CAL1 W22/F29A and CAL1 W22/F29R showed an even stronger reduction in their ability to associate with CENP-A ( Fig 3A).
We also tested the recruitment of CAL1-V5 WT and mutants by co-transfecting them with CENP-A-GFP-LacI into physiologically related Drosophila Schneider S2 cells containing a LacO array (Fig 3B). In agreement with the interaction studies in U2OS cells, association of CAL1 F43R and CAL1 W22/F29R with CENP-A was significantly reduced at the tethering site (Fig 3B). Overall, in vitro binding assays together with interaction studies in cells indicate that the interactions mediated by W22 and F29 are crucial for the recognition of CENP-A/H4 by CAL1.
Figure 3. A Representative fluorescence images and quantification of tethering assays. U2OS cells containing a LacO array were co-transfected with CENP-A-GFP-LacI with CAL1 WT-V5 and also with CAL1-V5 carrying point mutations. Scale bar: 10 µm (n = 2 experiments). B Representative fluorescence images and quantification of in vivo tethering assays. Drosophila Schneider S2 cells containing a LacO array were co-transfected with CENP-A-GFP-LacI with CAL1 WT-V5 and also with CAL1-V5 carrying point mutations. Arrows point to the LacO site. Scale bar: 5 µm (n = 3 experiments).
Data information: In (A), data presented as mean ± SEM of 2 experiments, n ≥ 20 cells per experiment. P-values were calculated using a Mann-Whitney test. In (B), data presented as mean ± SEM of 3 experiments, n ≥ 45 cells per experiment. P-values were calculated using a Mann-Whitney test (****P < 0.0001).

CAL1 uses conserved and adaptive interactions to recognise Drosophila CENP-A/H4
Structural superposition of CAL1-CENP-A/H4 onto its respective human and Kluyveromyces lactis structures, HJURP-CENP-A/H4 (PDB: 3R45) (Hu et al, 2011) and Scm3-CENP-A/H4 (PDB: 2YFV) (Cho & Harrison, 2011), showed that CAL1 employs a broadly similar mode of CENP-A recognition with a few striking differences (Fig 4A). All CENP-A chaperones compared here use their α1 helix to interact with α2 of CENP-A in an anti-parallel fashion, occluding the tetramerisation of CENP-A/H4 heterodimers. However, in CAL1 the upstream segment of α1 swings away from CENP-A as compared with its counterpart in HJURP and Scm3. Structural superposition-based sequence alignments showed a key amino acid variation in dm CENP-A at position 186 as compared with human and yeast CENP-A: Ala is replaced with Met, an amino acid with a long side chain, which appears to push CAL1 α1 away from it (Fig 4B). This apparent weakening of the CAL1 α1-CENP-A α2 interaction is likely to be compensated by CAL1 α5 and α6, which together restrain the position of the α1 helix by forming a helical bundle. Our efforts to compare the CENP-A/H4 binding strengths of CAL1 with (CAL1 1-160) and without (CAL1 1-50) the downstream α-helical elements did not show a noticeable difference under the conditions (buffer containing at least 1 M NaCl) needed for CENP-A/H4 solubility. We speculate that in a cellular context post-translational regulation such as phosphorylation and/or other intermolecular interactions involving the downstream helical segments of CAL1 might modulate the CENP-A/H4 binding dynamics required for correct CENP-A recruitment at centromeres. This may be a possible explanation for why CAL1 1-50 is not sufficient for CENP-A recruitment in cells (Chen et al, 2014). Notably, loop L1 of both CAL1 and HJURP interacts with CENP-A L1 through main chain hydrogen bonding interactions. However, the secondary structural element downstream of L1 that interacts with the hydrophobic groove formed by CENP-A α1 and α2 is a three-stranded β sheet in HJURP, while it is an α helix in CAL1. Strikingly, unlike other histone chaperones, CAL1 shields CENP-A α3 through downstream α-helical elements (Figs 1 and 4). This intermolecular interaction appears to be critical for CENP-A recognition as a CENP-A chimera where CENP-A α3 was replaced with histone H3 α3 failed to associate with centromeres (Roure et al, 2019).

CAL1 recognises amino acid variations unique to CENP-A
The histone fold domains of CENP-A and histone H3 share 31% sequence identity. To understand how CAL1 differentiates CENP-A from histone H3, we looked for conserved CENP-A-specific amino acid variations in several Drosophila species and compared these variations against dm histone H3 (Fig 4C). This analysis together with the structural superposition of CENP-A onto histone H3 revealed several residues unique to CENP-A within the CAL1 binding region potentially responsible for CENP-A specificity: Ser154, Met186 and Gln190. The equivalent residues in histone H3 are Gln, Ala and Gly, respectively. To evaluate whether any of these specific amino acid variations are responsible for providing CENP-A specificity, we made several recombinant CENP-A mutants where these residues are mutated to the corresponding histone H3 residues (CENP-A 101-225 S154Q, CENP-A 101-225 M186A and CENP-A 101-225 Q190G) and tested their ability to interact with His-CAL1 1-160 in a nickel-NTA pull-down assay (Figs 4D and EV4D right panel). While His-CAL1 1-160 interacted with CENP-A mutants harbouring single "histone H3-like" mutations as efficiently as with WT CENP-A, combining the three "histone H3-like" mutations resulted in a significant reduction in CAL1 binding (Figs 4D and EV4D right panel). This suggests that CAL1 achieves CENP-A specificity by recognising multiple CENP-A-specific amino acid variations.
CAL1 chaperones CENP-A/H4 by shielding protein/DNA interaction surfaces crucial for nucleosome assembly
Histone chaperones are key regulators of nucleosome assembly. This function is achieved by ensuring the correct histone incorporation in a spatio-temporally controlled manner. To understand how CAL1 exerts its CENP-A chaperone function, we performed structural superposition of the CAL1-CENP-A/H4 complex onto the crystal structure of the nucleosome core particle (PDB: 2PYO) (Clapier et al, 2008). This revealed that CAL1 shields the CENP-A/H4 regions critical for nucleosome assembly at: (i) the CENP-A/H4 tetramerisation interface, (ii) the H2A/H2B binding region and (iii) the DNA-binding region (Fig 5). CENP-A/H4 tetramerisation is thought to be the very first step in the nucleosome assembly pathway, followed by the wrapping of DNA by the CENP-A/H4 heterotetramer and incorporation of H2A/H2B heterodimers (Hammond et al, 2017). Thus, the CAL1-bound form of CENP-A/H4 cannot be incorporated into the nucleosome, inhibiting any unwarranted incorporation of CENP-A.

CENP-C binds CAL1 via its C-terminal cupin domain
We next aimed to understand the structural basis for the centromere targeting of the CAL1 bound pre-nucleosomal CENP-A/H4 heterodimer. Previous studies have shown that CAL1 and CENP-C can directly interact with each other through their C-terminal regions, CAL1 699-979 and CENP-C 1009-1411 (Fig 6A), respectively (Schittenhelm et al, 2010). However, efforts to purify these recombinant proteins were not successful as they were prone to degradation. Based on secondary structure prediction and sequence conservation analysis, we designed shorter constructs, CAL1 841-979 and CENP-C 1264-1411 . This CENP-C fragment contains an evolutionarily conserved cupin domain. Reconstitution of CAL1-CENP-C complex using individually purified His-SUMO-CAL1 841-979 and His-CENP-C 1264-1411 showed clear complex formation (Fig 6B): His-SUMO-CAL1 841-979 eluted at a volume of 10.38 ml, His-CENP-C 1264-1411 10.54 ml, while the complex eluted at 9.63 ml.
Overall structure of the CENP-C cupin domain

… a C-terminal cupin domain, these appear to show striking amino acid variations. Pairwise sequence alignments of the dm CENP-C cupin domain against its budding yeast counterpart showed 11% sequence identity and 18% sequence identity against the human counterpart. Crystallisation trials carried out with CENP-C 1264-1411 alone and in complex with CAL1 produced diffraction quality crystals which diffracted X-rays to about 1.8 and 2.3 Å, respectively (Table 1).
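Percent identity figures such as those above depend on how the alignment columns are counted. The sketch below shows one common convention, assumed here purely for illustration: identity is the fraction of identical residues among columns where neither sequence has a gap. The function name and the toy aligned sequences are hypothetical and are not the CENP-C sequences used in the paper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (illustrative only).
seq_a = "MKV-LTAE"
seq_b = "MRVQLSAE"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Other conventions (for example dividing by the length of the shorter sequence) give different numbers, so the denominator should be stated whenever identities are compared.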
The CENP-C 1264-1411 structure was determined by molecular replacement using the crystal structure of the budding yeast Mif2p cupin domain (PDB: 2VPV) (Cohen et al, 2008). The twofold axis of the CENP-C 1264-1411 dimer was aligned with the crystallographic twofold axis. Consequently, just one molecule was present in the asymmetric unit (Fig 6C). As expected, the CENP-C 1264-1411 domain forms a cupin fold almost entirely made of β strands forming a β-barrel with a helix preceding the cupin domain. The β strands assemble into two β sheets: a six-stranded (β1-β2-β3-β10-β5-β8) and a four-stranded (β4-β9-β6-β7) sheet (Fig 6C). The β1 of the six-stranded β sheet is connected to the preceding α1 (spanning aa residues 1,276-1,288) with a long loop (aa residues 1,289-1,313) containing two short α-helical segments. Dimerisation of the CENP-C cupin domain is mediated by a back-to-back arrangement of the six-stranded β-sheets. In this arrangement, the loop connecting the N-terminal α helix (α1) to β1 crosses over to its dimeric counterpart, resulting in a "roof"-like positioning of α-helices on top of the β barrels. The surface area buried at the dimerisation interface is 1,706 Å², which is about 50% of the total solvent accessible surface area. The interactions stabilising the dimerisation are predominantly hydrophobic, involving residues L1283, W1286, L1287, L1312, L1314, Y1325, Y1335, M1407 and L1357 (Fig 6D). Among these residues, L1357 and M1407 are centrally located and juxtaposed within the hydrophobic core. This led us to hypothesise that these residues may be critical for the assembly of the cupin dimer. To test this, we generated a mutant where L1357 and M1407 were mutated to glutamic acids (CENP-C 1264-1411 L1357E/M1407E) and analysed its oligomeric state by measuring the MW using SEC-MALS (Fig 6E). While the measured MW of CENP-C 1264-1411 agreed with the calculated MW of a dimer, the corresponding value for His-CENP-C 1264-1411 L1357E/M1407E revealed that it was a monomer (measured MW 20.2 ± 0.4 kDa and calculated MW 19.2 kDa) (Fig 6E).
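The monomer assignment follows from dividing the measured mass by the calculated subunit mass and taking the nearest whole number. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the second call uses an invented mass of twice the monomer value purely to show what a dimer would look like, since the wild-type measurement is not quoted in the text.

```python
def oligomeric_state(measured_kda, monomer_kda):
    """Nearest whole number of subunits implied by a measured mass (at least 1)."""
    return max(1, round(measured_kda / monomer_kda))


# His-CENP-C(1264-1411) L1357E/M1407E: measured 20.2 kDa, calculated monomer 19.2 kDa.
print(oligomeric_state(20.2, 19.2))

# Hypothetical measurement of about twice the monomer mass (not a value from the paper).
print(oligomeric_state(38.4, 19.2))
```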
Structural comparison of dm and budding yeast CENP-C cupin domains showed that although these domains share only weak similarity at the amino acid sequence level (21%), the overall fold conferring the β barrel structure is conserved. However, two loop regions (dm CENP-C 1,324-1,333 and 1,368-1,376) show striking conformational variation as compared with their equivalent regions in budding yeast CENP-C, Mif2p (Fig EV5A).
During the preparation of this manuscript, work from elsewhere reported a crystal structure of a slightly longer fragment of dm CENP-C spanning aa 1,190-1,411 (PDB: 6O2K). The structure reported here and PDB: 6O2K are nearly identical and superpose well with an RMSD of 0.27 Å (Chik et al, 2019; Fig EV5B).

Structural basis for CAL1 recognition by CENP-C
The structure of CENP-C 1264-1411 bound to CAL1 841-979 was determined by molecular replacement using the CENP-C 1264-1411 structure reported here as a search model. The final model was refined to R and R free factors of 23.7 and 26.6%, respectively, and included CENP-C residues 1,303-1,411 and CAL1 residues 890-913 (Fig 7A). This suggests that CAL1 residues preceding and following the region 890-913 are flexible and are not stabilised by CENP-C. While CAL1 residues 890-893 form a β-strand, residues 894-913 form a highly basic α helix (calculated pI of 10.57). CENP-C binds CAL1 using a cradle-shaped surface formed by loops L1, L2 and L3 and β-strands β1 and β2. The calculated electrostatic surface properties show that CAL1 binding involves a surface suitable for both electrostatic and hydrophobic interactions (Fig 7A). CAL1 residues 890-893, which form a β-strand, interact with β1 of the CENP-C cupin domain running parallel to it and as a consequence extend the β sheet involved in cupin dimerisation. The CAL1 α helix consisting of residues 894-913 makes several hydrophobic (involving L896, I900, W904 and Y908) and electrostatic (R903 and K906) interactions with a complementary hydrophobic (involving residues Y1315, V1317, Y1322 and F1323) and acidic (S1295, E1311 and N1326) region of the cradle-shaped CENP-C surface (Fig 7A-C). To evaluate the requirement of these interactions to stabilise CAL1-CENP-C binding, we mutated conserved CENP-C F1324 to Arg, and CAL1 I900 to Arg together with K907 and Y908 to Ala, and tested the ability of these mutants to bind wild-type CAL1 and CENP-C, respectively, in separate SEC experiments (Fig 8A and B). Both His-CENP-C 1264-1411 F1324R and His-SUMO-CAL1 841-979 I900R/K907A/Y908A failed to interact with His-SUMO-CAL1 841-979 and His-CENP-C 1264-1411, respectively, and hence eluted at their original elution volumes rather than at the elution volume of the CENP-C-CAL1 complex.
We next evaluated, in U2OS and Drosophila Schneider S2 cells where LacO arrays are integrated in one of the chromosome arms, the contribution of the CENP-C and CAL1 residues identified here as critical for interaction in vitro. Tethering GFP-LacI-CENP-C recruited CAL1-V5 to the LacO array. However, the F1324R or the L1357E/M1407E mutations in CENP-C and the I900R/K907A/Y908A mutations in CAL1 each inhibited the interaction and reduced co-localisation at the tethering site (Fig 8D and E).

Dimerisation of the CENP-C cupin domain stabilises the CAL1 binding site
Previously, we showed that CENP-C dimerisation is required for CAL1 binding in cells (Roure et al, 2019). In the crystal structure presented here, the CAL1 binding site on CENP-C is in close proximity to the cupin dimerisation interface: the loop L1 and β-strands β1, β2 and β3 are all directly involved in stabilising the cupin dimer. This led us to hypothesise that the CAL1 binding site is stabilised in the right conformation by the dimerisation interface and hence disrupting the dimerisation interface might affect CAL1 binding. To test this, we evaluated using SEC the ability of His-CENP-C 1264-1411 L1357E/M1407E, which we have shown here is not capable of forming a dimer (Fig 6E), to bind CAL1. When His-SUMO-CAL1 841-979 was mixed with 1.2 molar excess of His-CENP-C 1264-1411 L1357E/M1407E and subjected to SEC analysis, they did not interact with each other and eluted separately at elution volumes 10.4 and 11.6 ml, respectively (Fig 8C). Consistent with these in vitro data, GFP-LacI-CENP-C tethered to the LacO site in U2OS and Drosophila S2 cells recruited CAL1 robustly, while GFP-LacI-CENP-C L1357E/M1407E failed to do so (Fig 8D and E). These observations together demonstrate that the CENP-C dimerisation-mediated stabilisation of the CAL1 binding site is an essential requirement for CENP-C association of CAL1.
The CENP-C cupin dimer binds just one CAL1 molecule
Although CAL1 binding by CENP-C involves just a cupin monomer, only one of the two cupin monomers was observed to interact with CAL1, while the equivalent CAL1 binding site of the dimeric counterpart was empty in the crystal structure. We speculate that the other binding site might be sterically hindered by the remaining residues of CAL1 not seen in the crystal structure, thus not allowing a second monomer of CAL1 to bind. This agrees with our previous observation that CAL1 and CENP-C 1264-1411 form a 1:2 complex in solution as estimated using the mass spectrometry derived iBAQ peptide ratio and SEC-MALS (Roure et al, 2019). To confirm the subunit stoichiometry of the CAL1-CENP-C complex unambiguously, we measured the molecular mass of the CAL1 841-979-CENP-C 1264-1411 complex using analytical ultracentrifugation (AUC) (Fig 9A). First, the individual components of the complex were characterised by both sedimentation velocity (SV) and sedimentation equilibrium (SE), the data from which (Fig EV5C) demonstrate that CAL1 841-979 is monomeric with a very weak tendency to self-associate, while CENP-C 1264-1411 is a dimer (95% confidence intervals for MW are 17.453-21.086 kDa for CAL1 and 33.901-36.311 kDa for CENP-C). Next, samples comprising a SEC-purified untagged complex were analysed. Both the mass and the sedimentation coefficient are consistent with a 2:1 CENP-C:CAL1 complex, but not with a 2:2 complex (95% confidence interval for MW is 38.896-58.881 kDa for the complex). Thus, the AUC data together with the crystal structure show that the CENP-C cupin dimer binds just one copy of CAL1 at any given time.
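The stoichiometry argument can be checked with simple interval arithmetic on the confidence intervals quoted above. The sketch below assumes the values are in kDa and combines intervals by adding their bounds, which is a deliberately crude approximation rather than a proper statistical treatment; the helper names are hypothetical.

```python
def add_intervals(*intervals):
    """Sum (low, high) intervals bound by bound (a crude worst-case combination)."""
    return (sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals))


def overlaps(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals quoted in the text (assumed to be in kDa).
cal1_monomer = (17.453, 21.086)
cenpc_dimer = (33.901, 36.311)
complex_measured = (38.896, 58.881)

# One CENP-C dimer plus one or two CAL1 monomers.
two_to_one = add_intervals(cenpc_dimer, cal1_monomer)
two_to_two = add_intervals(cenpc_dimer, cal1_monomer, cal1_monomer)

print("2:1 expected", two_to_one, "compatible:", overlaps(two_to_one, complex_measured))
print("2:2 expected", two_to_two, "compatible:", overlaps(two_to_two, complex_measured))
```

Even the lowest mass expected for a 2:2 assembly on these bounds lies above the upper limit measured for the complex, whereas the whole 2:1 range falls inside the measured interval.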

CAL1 can bind CENP-A/H4 and CENP-C at the same time
To gain a better understanding of the mechanism of CENP-A deposition, we wanted to establish whether CAL1 could bind both CENP-A/H4 and CENP-C simultaneously. However, we have not been able to generate recombinant full-length CAL1 using bacteria or insect cells. Since we have already established the regions needed to bind CENP-A/H4 and CENP-C, we generated an engineered version of CAL1 with 1-160 and 841-979 connected by a flexible GSSGGSSG linker. This was expressed and refolded with CENP-A 101-225 and H4 in a similar manner to CAL1 1-160 . The resulting folded complex was analysed by SEC on its own and mixed with 1.2 molar excess of His-CENP-C 1264-1411 ( Fig 9B). This revealed a clear shift in the elution profile of His-CAL1 1-160-LL-841-979 / CENP-A 101-225 /H4 when bound to His-CENP-C 1264-1411 (14.24 ml) compared to His-CAL1 1-160-LL-841-979 /CENP-A 101-225 /H4 (14.62 ml) and His-CENP-C 1264-1411 (16.05 ml) on their own (Fig 9B).

Discussion
Understanding the molecular details of how organisms maintain their centromere identity has been of great importance to biologists, as loss of centromeres or establishment of new centromeres at a noncentromeric locus (neocentromeres) results in genome instability, often leading to cell death. To maintain centromere identity defined by the enrichment of CENP-A-containing nucleosomes, the CENP-A-specific chaperone (HJURP in humans and Scm3 in yeast) escorts CENP-A until its incorporation into the centromeric chromatin (Dunleavy et al, 2009;Foltz et al, 2009;Pidoux et al, 2009). Correct spatio-temporal regulation of this process is achieved by the Mis18 complex in humans and fission yeast (Hayashi et al, 2004;Fujita et al, 2007;Foltz et al, 2009;McKinley & Cheeseman, 2014;Pan et al, 2017;Spiller et al, 2017). Despite the essential requirement of CENP-A deposition at centromeres, the pathways and the molecular players regulating this process show significant variations across organisms (Zasadzinska & Foltz, 2017). This suggests that these organisms have evolved to employ unique strategies to establish and maintain centromeric chromatin.
Drosophila is a remarkable model organism to study centromere inheritance as it lacks direct homologues of either HJURP/Scm3 or the Mis18 complex. Instead, it maintains centromere identity using just CAL1. CAL1 does not share obvious sequence similarity with Scm3 or HJURP and does not appear to share common ancestry with these chaperones (Sanchez-Pulido et al, 2009; Phansalkar et al, 2012; Rosin & Mellone, 2016). Our structural analysis presented here shows that although CAL1 appears to have evolved independently of Scm3 and HJURP, it employs evolutionarily conserved and adaptive structural principles to bind CENP-A.

Figure 6. CAL1 binds CENP-C by directly interacting with the evolutionarily conserved Cupin domain.
A Schematic representation of the structural features of CENP-C. Filled boxes represent domains. B SEC profile of His-SUMO-CAL1 841-979 (black), His-CENP-C 1264-1411 (red) and His-SUMO-CAL1 841-979 mixed with molar excess of His-CENP-C 1264-1411 (blue) and corresponding SDS-PAGE analysis of the fractions. Samples were analysed using Superdex 75 Increase 10/300 in 20 mM Tris-HCl pH 8.0, 100 mM NaCl and 2 mM DTT. C Crystal structure of CENP-C cupin domain determined at 1.7 Å resolution. D Overall structure of CENP-C cupin domain dimer. Amino acid residues involved in dimerisation are highlighted in zoomed-in panels. Residues mutated to disrupt dimerisation are circled. E SEC-MALS analysis of CENP-C 1264-1411 (black) and His-CENP-C 1264-1411 L1357E/M1407E (blue). Absorption at 280 nm (mAU, left y-axis) and molecular mass (kDa, right y-axis) are plotted against elution volume (ml, x-axis). Measured MW and the calculated subunit stoichiometry based on the predicted MW of different subunit compositions. Samples were analysed using either a Superdex 75 or a Superdex 200 Increase 10/300 in 50 mM HEPES pH 8.0, 100 mM or 300 mM NaCl and 1 mM TCEP.

The EMBO Journal 39: e103234 | 2020 © 2020 The Authors
Recognition of CENP-A L1 and α2 by the N-terminal 50 aa of CAL1 is similar to that of Scm3 and HJURP. Despite this, CAL1 is also distinctly dissimilar from Scm3 and HJURP as residues downstream of the N-terminal 50 aa wrap around CENP-A/H4, making additional contacts with CENP-A α3 and CAL1 itself. These interactions appear to be crucial for CENP-A deposition as the N-terminal 50 aa of CAL1 were not sufficient to recruit CENP-A to centromeres in cells (Chen et al, 2014). Notably, unlike the human CENP-A, the centromere targeting domain of Drosophila CENP-A includes α3, as L1 and α2 were not sufficient to target CENP-A (Roure et al, 2019). We note that while HJURP and Scm3 fragments used for structure analysis are shorter than the CAL1 fragment used here, secondary structure prediction analysis suggests a lack of similar α-helical segments downstream of HJURP/Scm3 α1 that stacks against CENP-A α2. When compared to the available "histone variant"-chaperone complex crystal structures, the overall mode of CENP-A/H4 recognition by CAL1 appears to be novel as it is the only one which wraps around CENP-A/H4 through multiple CENP-A and H4 contacts, resulting in the shielding of CENP-A/H4 surfaces involved in CENP-A/H4 tetramerisation, DNA binding and H2A/H2B binding, all critical for nucleosome assembly. This is in agreement with the observation that CAL1 cannot directly interact with the CENP-A nucleosome (Roure et al, 2019) and requires CENP-C to mediate the interaction with the centromeric chromatin.

Figure 8. CENP-C cupin dimerisation is critical for CAL1 binding.

A-C SEC analysis of (A) His-CENP-C
In humans and fission yeast, the Mis18 complex is responsible for targeting the HJURP-bound pre-nucleosomal CENP-A/H4 to the centromere by directly binding CENP-C (reviewed in Stellfox et al, 2012; Westhorpe & Straight, 2014; Zasadzinska & Foltz, 2017) but appears to have been lost during evolution in Drosophila. However, CAL1 seems to compensate for this loss by directly associating with CENP-C, which is present in most organisms with monocentric chromosomes (Drinnenberg et al, 2014). While there has been a suggestion that the CENP-C cupin domain could be a dimer based on the crystal structure of the Mif2p cupin domain (Cohen et al, 2008), the structural and functional roles of the CENP-C cupin domain have remained unclear.
Here we show that CAL1 associates with CENP-C by directly interacting with the cupin domain and that this interaction is essential for CENP-C-mediated recruitment in cells. Our structural analysis shows that the overall structure of the Drosophila CENP-C cupin domain is similar to that of the Saccharomyces cerevisiae (Mif2p) and Schizosaccharomyces pombe (Cnp3) CENP-C cupin domains, with striking differences in the mode of dimerisation (Fig EV5A and B). It is tempting to suggest that this variation is related to the ability of dm CENP-C to bind CAL1, as the CAL1 binding interface of CENP-C is stabilised by dimerisation. In agreement with this notion, the CENP-C cupin domains of dm and S. pombe appear to use different modes of binding to CAL1 and Moa1, respectively. In the case of S. pombe CENP-C, the Moa1 binding site is mapped to an extended site with critical residues laterally spread across the deep pocket that forms the core of the cupin domain (Chik et al, 2019), whereas CAL1 binds at the periphery of the equivalent pocket and extends away from the pocket and towards the dimeric interface (Fig EV5D).
Interestingly, although the dm CENP-C cupin dimer possesses two CAL1 binding sites, it appears to accommodate just one CAL1 at a time due to steric hindrance limiting the accessibility of the second CAL1 site. This might have broader implications for the mechanism of CENP-C binding at centromeres (Roure et al, 2019). In the context of the full-length proteins, CAL1 can also oligomerise via its N-terminus (Roure et al, 2019), leading to a scenario where a CENP-C-bound CAL1 at the centromere might interact with a second CAL1 bringing another CENP-A/H4 dimer and CENP-C to facilitate CENP-A/H4 tetramer incorporation and the recruitment of CENP-C to the newly formed CENP-A nucleosome (Fig 9C). This is consistent with CENP-C targeting being reliant on CAL1 and CENP-A.
In summary, our work demonstrates how Drosophila species elegantly compensate for the loss of HJURP or Scm3 and the Mis18 complex through CAL1, which, by combining evolutionarily conserved and adaptive structural interactions, escorts CENP-A/H4 to the centromere for its subsequent incorporation into the chromatin to maintain centromere identity. Moreover, this is the first study providing the structural basis for how the CENP-A deposition machinery is targeted to centromeres in any organism. Future structural studies on the Mis18 complex and its interaction with HJURP and CENP-C will provide insights into how apparently complex intermolecular interactions achieve the same objective in vertebrates and what the species-specific functional requirements of this complexity are.

Plasmids
Codon optimised Drosophila melanogaster CAL1 and CENP-C were produced as gBlocks (IDT) and used directly in ligation-independent cloning (LIC) into bacterial expression vectors. Smaller fragments were amplified using PCR and then used for LIC. CAL1 1-160-LL-841-979 was produced using homologous PCR.
All mammalian expression vectors used in this study were constructed in a pN2-CMV vector. Mutants were generated following the site-directed mutagenesis protocol using phusion ultra II. Primers used are shown in Table 2. pET3a CENP-A 101-225 was generated in Roure et al (2019). pET His6 Sumo TEV (14S Addgene plasmid # 48291) was a gift from Scott Gradia. pEC-K-3C-His was a gift from Elena Conti. pET22b H4 was a kind gift from Karolin Luger.

Protein production
Purification of histones
Histones were expressed and purified as described in Abad et al (2019).

Protein refolding
To refold histones with and without CAL1, histones were resuspended in 20 mM Tris-HCl pH 7.5, 7 M guanidine HCl and 2 mM βME and mixed with equimolar amounts of proteins needed. Proteins were then dialysed for 2 h at 4°C against 200 ml of 20 mM Tris-HCl pH 7.5, 7 M guanidine HCl and 2 mM βME; then, 2 l of 10 mM Tris-HCl pH 7.5, 2 M NaCl, 1 mM EDTA and 5 mM βME was slowly added overnight using a peristaltic pump. If needed, refolded protein was further dialysed against a lower salt concentration solvent; if not, complexes were purified by SEC using either a [...]
Cleaved CENP-C 1264-1411 was screened at 15 mg/ml in 20 mM Tris-HCl pH 8.0, 500 mM NaCl and 2 mM DTT against several commercial and homemade screens at 18°C. Crystals were obtained in around 13% of all conditions tested. His-CENP-C 1264-1411-CAL1 841-979 complex was made in 20 mM Tris-HCl pH 8.0, 100 mM NaCl and 2 mM DTT and used with Structure 1 + 2 and JCSG+ (Molecular Dimensions) at 15 mg/ml at 4°C. Crystals were briefly transferred to a cryoprotectant solution (either oil or the mother liquor supplemented with 40% PEG 3350) before being flash-cooled directly in liquid nitrogen and analysed on beamlines i03 and i04-1 at the Diamond Light Source (Didcot, UK).
Figures were prepared using PyMOL (http://www.pymol.org). Data collection, phasing and refinement statistics are shown in Table 1.

Ni-NTA interaction trials
Ni-NTA pull-down assays were performed using His-CAL1 1-160 WT and mutants mixed with a 1.3-fold molar excess of CENP-A 101-225-H4 and made up to 200 µl with 20 mM Tris-HCl pH 8.0, 2 M NaCl, 10% glycerol, 0.5% NP40, 35 mM imidazole and 2 mM βME. 190 µl was incubated with 120 µl of HisPur™ Ni-NTA resin slurry that had been washed with ddH2O and buffer for 30 min at 4°C. Beads were then washed four times with 1 ml of buffer, then twice with 1 ml of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 35 mM imidazole and 2 mM βME and eluted by boiling in SDS-PAGE loading dye before being separated on a Bolt™ 4-12% Bis-Tris Plus gel (Invitrogen) run at 180 V for 1 h in MES buffer. Gels were then stained with Coomassie Blue, and scanned gel images were analysed and quantified with ImageJ.

SEC-MALS
Size-exclusion chromatography (ÄKTA-Micro™, GE Healthcare) coupled to UV, static light scattering and refractive index detection (Viscotek SEC-MALS 20 and Viscotek RI Detector VE3580; Malvern Instruments) was used to determine the molecular mass of proteins and protein complexes in solution. Injections of 100 µl of 1-5 mg/ml material were used.
Light scattering, refractive index (RI) and A280 nm were analysed by a homo-polymer model (OmniSEC software, v5.02; Malvern Instruments) using the parameters stated for each protein, ∂n/∂c = 0.185 ml/g and buffer RI value of 1.335. The mean standard error in the mass accuracy determined for a range of protein-protein complexes spanning the mass range of 6-600 kDa is ± 1.9%.
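As a rough illustration of how a measured molar mass can be compared against the predicted masses of different subunit compositions, the following Python sketch picks the copy number of a homo-oligomer whose predicted mass lies closest to a SEC-MALS measurement. The function name, the subunit mass and the measured value are hypothetical placeholders chosen for illustration; they are not values from this study, and the sketch is not the OmniSEC analysis itself.

```python
def closest_stoichiometry(measured_kda, subunit_kda, max_copies=4):
    """Return (copies, predicted_kda, deviation_percent) for the best-matching homo-oligomer.

    measured_kda: molar mass measured in solution (kDa)
    subunit_kda:  predicted mass of a single subunit (kDa)
    max_copies:   largest copy number considered (illustrative default)
    """
    # Pick the copy number whose predicted mass is nearest to the measurement.
    best_n = min(
        range(1, max_copies + 1),
        key=lambda n: abs(n * subunit_kda - measured_kda),
    )
    predicted = best_n * subunit_kda
    # Relative deviation of the measurement from the predicted mass, in percent.
    deviation = 100.0 * abs(measured_kda - predicted) / predicted
    return best_n, predicted, deviation


# Hypothetical example: an 18.0 kDa subunit measured at 35.1 kDa in solution.
copies, predicted_kda, deviation_percent = closest_stoichiometry(35.1, 18.0)
print(copies, predicted_kda, round(deviation_percent, 1))
```

A deviation much larger than the mass accuracy of the instrument would suggest that none of the candidate compositions describes the sample well, for example because several species co-elute.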

Cross-linking mass spectrometry
Cross-linking was performed on gel-filtered complexes dialysed into PBS. 30 µg EDC (Thermo Fisher Scientific) and 66 µg sulpho-NHS (Thermo Fisher Scientific) were used to cross-link 10 µg of protein for 1.5 h at RT. 30 µg of BS3 was used to cross-link 10 µg of protein for 2 h at RT. The reactions were quenched with a final concentration of 100 mM Tris-HCl or 5 mM ammonium bicarbonate, respectively, before separation on Bolt™ 4-12% Bis-Tris Plus gels (Invitrogen). Following a previously established protocol (Maiolica et al, 2007), the bands were excised and proteins were digested with 13 ng/µl trypsin (Pierce) overnight at 37°C after being reduced and alkylated. The digested peptides were loaded onto C18-Stage-tips for LC-MS/MS analysis (see Appendix Supplementary Methods).

Cell culture and transfections
Schneider S2 cells containing the LacO array (L2-4_LacO_LexA_-Clone11) were generated as described in Mendiburo et al (2011). Schneider S2 cells were grown at 25°C in Schneider's Drosophila medium (Gibco) supplemented with 10% foetal calf serum (Sigma). Cells were seeded in 24-well plates a day prior to transfection at a density of 5 × 10⁵ cells per well. Cells were transfected using X-tremeGENE DNA Transfection Reagent (Roche) according to the manufacturer's instructions, using 200 ng of plasmid DNA. Transfected cells were analysed by immunofluorescence 72 h post-transfection.
U2OS cells containing 200 copies of an array of 256 tandem repeats of the 17 bp LacO sequence on chromosome 1 (gift from B.E. Black, University of Pennsylvania, Philadelphia; Janicki et al, 2004) were grown in DMEM supplemented with 10% FBS and 1% penicillin-streptomycin at 37°C in a 5% CO2 incubator. Cells were seeded in 10 cm dishes a day prior to transfection at a density of 2.5 × 10⁶ cells per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) according to the manufacturer's instructions, using 15 µg of plasmid DNA and Opti-MEM I reduced serum medium (Life Technologies). The next day, cells were washed once with 1× DPBS, trypsinised, counted and re-plated on polylysine-coated coverslips in 6-well plates at a density of 10⁶ cells per well. Downstream experiments were performed 3 days post-transfection.

Immunofluorescence
Cells were washed once in PBS and then fixed with 3.7% formaldehyde in 0.1% Triton X-100 in 1× PBS (PBST) for 8 min at RT. Following fixation, the slides were washed once in PBST and then blocked in Image-iT® FX signal enhancer in a humidified chamber at RT for at least 30 min. All antibodies were incubated in a 1:1 mix of PBST and 10% normal goat serum (Life Technologies) overnight at 4°C in a humidified chamber and were used at a 1:100 dilution unless otherwise stated: myc (Abcam-ab9106), V5 (Invitrogen-R96025) and HA (clone 3F10; E. Kremmer, 1:20). Secondary antibodies coupled to Alexa Fluor 555 and 647 (Invitrogen) were used at 1:100 dilutions. Counterstaining of DNA was performed with DAPI (5 µg/ml), and coverslips were mounted on the slides with 30 µl of SlowFade® Gold antifade reagent.

Microscopy and image analysis
All IF images were taken as 50 z-stacks of 0.2 µm increments, using a 100× oil immersion objective on a DeltaVision RT Elite Microscope and a CoolSNAP HQ Monochrome camera. All images were deconvolved using the aggressive deconvolution mode on a SoftWorx Explorer Suite (Applied Precision) and are shown as quick projections of maximum intensity. For U2OS cells, the mean fluorescence intensity of the protein of interest was measured at the LacO spot, and then the mean fluorescence intensity in the nucleus (background) was subtracted from this value. For S2 cells, the mean fluorescence intensity of the protein of interest was measured at the LacO spot, and then the mean fluorescence intensity of three spots around the LacO spot was subtracted from this value. 25-50 cells were analysed per biological replicate, and a minimum of three independent biological replicates were quantified per experiment.
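The background subtraction described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and all intensity values are hypothetical, and averaging the three surrounding spots before subtraction is an assumption about how the S2 background was combined.

```python
from statistics import mean


def background_corrected_intensity(spot_mean, background_means):
    """Subtract the average background from the mean intensity at the LacO spot.

    spot_mean:        mean fluorescence intensity measured at the LacO spot
    background_means: one value (nuclear background, U2OS-style) or several
                      values (spots around the LacO spot, S2-style)
    """
    if not background_means:
        raise ValueError("at least one background measurement is required")
    return spot_mean - mean(background_means)


# Hypothetical U2OS-style measurement: a single nuclear background value.
u2os_signal = background_corrected_intensity(850.0, [200.0])

# Hypothetical S2-style measurement: three spots around the LacO spot
# (assumed here to be averaged before subtraction).
s2_signal = background_corrected_intensity(850.0, [190.0, 210.0, 200.0])

print(u2os_signal, s2_signal)
```

Per-cell values obtained this way would then be pooled per biological replicate before any statistical comparison.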

Analytical ultracentrifugation
Sedimentation velocity (SV) and sedimentation equilibrium (SE) experiments were performed using a Beckman Coulter XL-I analytical ultracentrifuge equipped with an An-50 Ti eight-hole rotor. Depending on their concentration, samples were loaded into 12 mm (low concentration) or 3 mm (high concentration) pathlength charcoal-filled epon double-sector centrepieces, sandwiched between two sapphire windows. For SV, samples were equilibrated at 4°C in vacuum for 6 h before running at 49 k rpm. For SE, data were recorded at 26 k rpm. The laser delay, brightness and contrast were pre-adjusted at 3 k rpm to acquire the best quality interference fringes. Data were collected using Rayleigh interference and absorbance optics recording radial intensity or absorbance at 280 nm. For SV, data were recorded between radial positions of 5.65 and 7.25 cm, with a radial resolution of 0.005 cm and a time interval of 7 min, and analysed with the program SEDFIT (Schuck, 2000) using a continuous c(s) model. For SE, data were recorded between radial positions of 6.00 and 7.25 cm, with a radial resolution of 0.001 cm and a time interval of 3 h (until successive scans overlaid satisfactorily), and analysed with the program SEDPHAT (Vistica et al, 2004) using species analysis. The partial specific volume, buffer density and viscosity were calculated using SEDNTERP (Hayes et al, 2012). Sedimentation coefficients were computed from atomic coordinate models using SOMO (Brookes & Rocco, 2018).

Statistics
Data are representative of at least three independent experiments, unless otherwise stated. Statistics were performed using GraphPad Prism, version 7.0e (GraphPad Software, Inc), using an unpaired two-tailed t-test or Mann-Whitney test. For each statistical test, a P-value of < 0.05 was considered significant.
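For readers unfamiliar with the Mann-Whitney test, the following Python sketch shows the idea behind a two-sided comparison of two groups using only the standard library. It relies on a normal approximation without tie or continuity correction, so it is an illustration rather than a reproduction of the GraphPad Prism calculation (which may use an exact method for small samples); the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_normal_approx(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p) where U is the smaller of the two U statistics.
    No tie correction or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Count pairs in which a value from x exceeds a value from y; ties count 0.5.
    u1 = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Mean and standard deviation of U under the null hypothesis.
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma  # z <= 0 because u is the smaller statistic
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p


# Hypothetical background-corrected intensities for two conditions.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
mutant = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_normal_approx(control, mutant)
print(u_stat, p_value < 0.05)
```

With small group sizes such as these, an exact test would normally be preferred; the approximation is used here only to keep the sketch short.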
Expanded View for this article is available online.