Structural insights into the architecture and membrane interactions of the conserved COMMD proteins

The COMMD proteins are a conserved family of proteins with central roles in intracellular membrane trafficking and transcription. They form oligomeric complexes with each other and act as components of a larger assembly called the CCC complex, which is localized to endosomal compartments and mediates the transport of several transmembrane cargos. How these complexes are formed however is completely unknown. Here, we have systematically characterised the interactions between human COMMD proteins, and determined structures of COMMD proteins using X-ray crystallography and X-ray scattering to provide insights into the underlying mechanisms of homo- and heteromeric assembly. All COMMD proteins possess an α-helical N-terminal domain, and a highly conserved C-terminal domain that forms a tightly interlocked dimeric structure responsible for COMMD-COMMD interactions. The COMM domains also bind directly to components of CCC and mediate non-specific membrane association. Overall these studies show that COMMD proteins function as obligatory dimers with conserved domain architectures.


Introduction
The COMMD (Copper Metabolism Murr1 (Mouse U2af1-rs1 region 1) Domain) proteins are highly conserved in metazoans and unicellular protozoa (Burstein et al., 2005). There are ten family members that play key roles in intracellular trafficking and in the regulation of transcription (Burstein et al., 2005). A hallmark of the COMMD family members is a highly conserved C-terminal sequence of ~70-80 amino acids called the COMM domain, which has no known structure. The N-terminal domain of these proteins, which we refer to as the HN (helical N-terminal) domain, is more variable in sequence across the ten proteins, and is proposed to ascribe unique functions to the different family members (Burstein et al., 2005). Despite the high degree of conservation and the important roles of COMMD proteins in membrane trafficking and cell signalling, little is known about their structures or their specific molecular functions. In humans, all ten members of the COMMD family (Commd1-Commd10) are expressed broadly in many tissues (Burstein et al., 2005; van De Sluis et al., 2002). Much of our current understanding of the COMMD proteins is derived from studies of the founding family member Commd1. An in-frame deletion in exon 2 of the COMMD1 gene was first identified in Bedlington terrier dogs with (Figure 1A,B). SEC-MALLS data suggest that dimerization is likely to be a general property of the COMMD family members. To determine the domains required for homodimerisation we performed SEC-MALLS analyses of isolated HN and COMM domains. While the Commd1 and Commd9 HN domains behave as monomers, the Commd1 and Commd9 COMM domains exist as dimeric species (Figure 1B,C). These data show that the COMM domain is both required and sufficient to mediate COMMD protein dimerization.

Structural analysis of the COMMD proteins
We next sought to determine the X-ray crystallographic structures of COMMD proteins. However, although crystals of various full-length COMMD proteins grew rapidly, their diffraction quality was not sufficient for structure determination. Therefore a divide-and-conquer approach was taken to determine structures of the two individual domains of the protein, the conserved C-terminal COMM domain, and the variable N-terminal HN domain.

Crystal structure of the Commd9 C-terminal COMM domain
We determined the crystal structure of the Commd9 COMM domain to 2.2 Å resolution by single-wavelength anomalous dispersion (SAD). The overall structure of the Commd9 COMM domain is composed of two cone-shaped chains that are tightly intertwined with each other to form a globular dimeric module (Figure 1D, Table 1). Each monomer is composed of an N-terminal three-stranded β-sheet capped by an α-helix, with the overall arrangement making an open, hairpin-like structure. A simple analogy for the COMM domain dimer is that it resembles a left-handed handshake, where the sheet and helix from each monomer represent the interlocked palms and thumbs of each hand respectively (Figure 1E). The overlapping C-terminal α-helices of each chain bury a large hydrophobic surface area of approximately 2100 Å² (nearly one third of the monomer surface area), supporting the notion that the native state of COMMD proteins is to form dimers via the COMM domain.

COMMD proteins dimerise via a conserved hydrophobic interface in the COMM domain
In Figure 2 the interface that mediates Commd9 dimerisation is examined in closer detail. Using CONSURF (Ashkenazy et al., 2016; Landau et al., 2005), the most evolutionarily conserved residues in Commd9 were mapped onto the structure. There is a very high degree of sequence conservation in the hydrophobic core of the Commd9 COMM domain, particularly along the α-helical surface mediating dimerisation (Figure 2A). Multiple sequence alignment of the COMM domains of all the COMMD proteins demonstrates conservation of many hydrophobic amino acids, particularly leucine residues (Figure 2B). The high degree of conservation and of hydrophobicity within the dimerization interface strongly suggests that all of the COMMD proteins will form obligatory dimeric structures through similar mechanisms. We attempted to mutate several interfacial residues, but both the full-length mutated COMMD proteins and the isolated COMM domains aggregated due to poor protein solubility (not shown). This provides further support for the essential nature of dimer formation.

Figure 2. Structural mechanism underpinning the COMMD dimerization. (A) A surface map of the conserved and variable residues of the Commd9 COMM domain showing the hydrophobic core is highly conserved while the surface residues are more variable, confirming the importance of dimerization for COMMD stability. These calculations are made using the Consurf server (Ashkenazy et al., 2010).

Crystal structure of the Commd9 N-terminal HN domain
We next determined the structure of the Commd9 N-terminal domain by X-ray crystallography to a resolution of 1.55 Å using multi-wavelength anomalous dispersion. Overall, the Commd9 N-terminal domain has a globular architecture and is composed of a six-helix bundle with a meander topology (Figure 3A, Table 1). We therefore refer to this domain as the HN (Helical N-terminal) domain. The overall structure of the HN domain has a similar all α-helical fold to the equivalent domain of Commd1, which was previously determined by NMR (Figure 3B) (Sommerhalter et al., 2007). Compared to Commd1 however, the HN domain of Commd9 has an additional α-helix at its N-terminus that packs down on top of the structure, and a large disordered loop in Commd1 is better resolved in Commd9. In Commd9 the α4 helix forms a central core element that extends the length of the HN domain. In Commd1 however, the equivalent helix is bent and oriented differently. This could be due to the absence of the α1 helix to provide stability, or might also be due to an insufficient number of restraints in this region used for NMR structure calculations. However, the topology of the five shared α-helices (α2-α6) is similar overall. Comparing the HN domains of Commd1 and Commd9 using DALI (Holm and Laakso, 2016; Holm and Rosenström, 2010) showed structural similarity (DALI Z-score >3) with an RMSD of 3.2 Å over 83 Cα atoms.

Figure 3 (partial caption). (B) The Commd1 HN domain (Sommerhalter et al., 2007) shown in cartoon diagram in the same orientation as Commd9 in Figure 3A left panel. The Commd1 HN domain lacks the first helix (α1) and appears to have a kinked helix α4, but overall shares the same topology. (C) Sequence conservation of Commd9 HN domain mapped on the structure using CONSURF. DOI: https://doi.org/10.7554/eLife.35898.006
The electrostatic surface of the Commd9 HN domain reveals the presence of two basic patches and a negatively charged region, but it is not yet clear what their functional significance might be (Figure 3-figure supplement 1A). Mapping sequence conservation of the Commd9 HN domain across species shows conserved surface residues mainly in the α1 helix region (Figure 3C). The N-terminal regions of the COMMD proteins are quite variable in sequence across the ten family members (although they are well conserved in paralogous proteins across species), and this has led to it being referred to as a variable domain or VARD. Bioinformatics and secondary structure analyses of the human COMMD proteins however, show that in all of the family members the HN domain is very likely to share the same α-helical topology, as well as across various species of Commd9 (Figure 3-figure supplement 1B,C). In support of this, the HN domain of human Commd3 shows a strong α-helical signal when examined using far-UV circular dichroism spectroscopy (not shown). The exception to this is human Commd6, which does not possess an HN domain at all, although Commd6 orthologs in other species such as fish and amphibians do (Table 2). The scattering curves and pair distribution functions (P(r)s) both signify a relatively globular structure. Ab initio structures of these calculated from the SAXS data reveal compact but elongated molecules. We next performed rigid body fitting of the SAXS data with the program SASREF (Petoukhov and Svergun, 2005), using the crystal structures of the Commd9 HN and COMM domains for Commd7 and Commd9, whereas the Commd1 HN and Commd9 COMM domains were used for generating the Commd1 model. The resulting models obtained by this approach were superimposed on the SAXS envelope. The theoretical scattering profiles of the Commd1, Commd7 and Commd9 models, and the ab initio molecular envelopes, are in good agreement with the experimental scattering data with low χ² values.
These studies suggest that the COMMD proteins, including Commd9, are homodimers with an elongated shape in solution.
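As an illustration of how agreement between a model scattering profile and experimental SAXS data can be scored, the following minimal sketch computes a reduced χ² after least-squares scaling of the model intensities. The function names and the (N - 1) normalisation are assumptions made for this example; the exact definition used by SASREF or related programs may differ.

```python
def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale c minimising sum(((I_exp - c*I_calc)/sigma)^2)."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    return num / den


def reduced_chi2(i_exp, i_calc, sigma):
    """Reduced chi-squared between experimental and scaled model intensities.

    i_exp, i_calc and sigma are equal-length sequences sampled at the same
    q values (hypothetical inputs for illustration only).
    """
    c = scale_factor(i_exp, i_calc, sigma)
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return total / (len(i_exp) - 1)
```

Values close to 1 indicate that the model reproduces the data within the stated experimental errors, while much larger values indicate a poor fit.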

COMMD proteins are structurally related to a unique protein from chlamydial species
The COMMD proteins share no detectable sequence homology to any other proteins. However, structural comparison of the Commd9 COMM domain to the structures in the Protein Data Bank (PDB) using DALI (Holm and Laakso, 2016; Holm and Rosenström, 2010) did identify low scoring (DALI Z-score >2.5) structural matches to the PH domain of human pleckstrin as well as the phox homology (PX) domain (DALI Z-score >3) of yeast Grd19. Interestingly, it is clear from these comparisons that the COMM domain has a similar topology to core fragments of the larger PX and PH structures (Figure 4-figure supplement 3). A closer structural match however was identified with the Pur-α (purine-rich element binding protein) repeat domain (DALI Z-score >4.5), a whirly-like nucleic acid binding fold (Graebsch et al., 2009; Weber et al., 2016) (Figure 4-figure supplement 4A). Overlay of the Commd9 COMM domain with repeats of Pur-α reveals a similar fold with an RMSD of 2.5 Å over nearly 50 Cα atoms.
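For readers less familiar with the RMSD values quoted for structural overlays, the sketch below shows the underlying arithmetic for two sets of Cα coordinates. It assumes the atoms have already been paired and superposed; DALI performs its own alignment and superposition, which is not reproduced here, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the input units, e.g. Å) between two
    equal-length lists of (x, y, z) tuples that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```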

COMMD proteins bind promiscuously to each other
There are now a number of high-throughput proteomics studies that point to the existence of a large multi-subunit assembly containing all of the COMMD family proteins (Mallam and Marcotte, 2017; Wan et al., 2015; McNally et al., 2017; Dey et al., 2015; Hein et al., 2015; Huttlin et al., 2015). This is also well supported by several more targeted studies demonstrating that COMMD proteins associate with each other in both endogenous and over-expression conditions (Wan et al., , 2017). In the majority of cases these interactions were found using co-immunoprecipitation strategies, although it has also been shown that heterodimeric complexes are formed upon bacterial co-expression of Commd1-Commd6, Commd1-Commd5 and Commd9-Commd5 (Wan et al., 2015). To study these pairwise interactions systematically we initially attempted a GST pull down assay with purified His-tagged Commd1 and all of the GST-tagged COMMD members. However, we did not observe any significant interactions using this approach (not shown). Next, we assessed whether co-translation would lead to heterodimeric interactions. GST-baits (all COMMDs) were co-expressed in E. coli with selected His-tagged preys (Commd1, 7, 9 and 10) followed by affinity purification using glutathione sepharose beads (Figure 5A, Figure 5-figure supplement 1). Western blotting was used to confirm interactions unambiguously, as GST- and His-tagged COMMD proteins are of similar molecular weight. GST-Commd8 was not included in these experiments due to the tendency of this protein to degrade. This assay reveals that all of the COMMD proteins are able to co-assemble with each other in a highly promiscuous manner, while the lack of binding using separately purified proteins suggests that no exchange occurs between pre-formed homodimeric proteins. In general there is little specificity seen in the heteromeric complexes formed, although Commd10 binds most strongly to Commd2 and Commd5 and only weakly to Commd9, while Commd9 binds weakly to Commd6.
Previous reports have shown that the COMM domain alone may be sufficient to allow interactions between COMMD proteins (Burstein et al., 2005). To test this we next performed our co-expression GST-pull down assays with His-tagged COMM domains of Commd1 and Commd9 and the HN domain of Commd1 as prey. While no interaction was observed between any COMMD proteins and the N-terminal domain of Commd1, strong interactions were seen for the C-terminal COMM domains of Commd1 and Commd9, mirroring the full-length proteins (Figure 5A). Overall, our data indicate that COMMD proteins form promiscuous homo- and heterodimeric complexes through their C-terminal COMM domains. The fact that COMMD-COMMD interactions could only be reconstituted after co-translation suggests that these complexes most likely involve the formation of dimeric structures analogous to that seen in the Commd9 COMM domain crystal structure.

Reconstitution of stable heteromeric COMMD complexes
Our GST pull down experiment demonstrates that Commd10 preferentially binds to Commd2 and Commd5. To assess the stoichiometry of heteromeric COMMD complexes, we co-expressed GST-Commd5 with Commd10-His and purified the complex by sequential affinity purification using glutathione sepharose and TALON beads. The eluted complex was subjected to size exclusion chromatography (SEC) to obtain a 1:1 stoichiometric complex. The SEC fractions under the peak clearly show reconstitution of a 1:1 Commd5-Commd10 complex (Figure 5B). Since the COMM domain alone mediates the formation of homo- and heteromeric complexes of COMMD proteins, we also attempted to make the complex using full-length Commd5 with the COMM domain of Commd10, and using the COMM domains of both Commd5 and Commd10. Indeed, highly pure and stoichiometric species were isolated, confirming the role of the COMM domain in COMMD complexes (Figure 5C,D). Several COMMD proteins form COMM domain dependent homodimers and this prompted us to investigate the biophysical properties of Commd5-Commd10 complexes. SEC-MALLS analysis of Commd5-Commd10 reveals that the complex is highly monodisperse in solution. Although the combined theoretical molecular weight of these proteins is 47.5 kDa, the experimental mass was calculated to be 85 kDa, which suggests the existence of a Commd5-Commd10 heterotetramer (Figure 5E,F). Along the same lines, the Commd5-Commd10 COMM domain complex also appears to form a heterotetrameric assembly (Figure 5E,F). Altogether, these data raise two possibilities. First, the COMM domains of Commd5 and Commd10 form homodimers that assemble together as a tetramer. Second, the COMM domains of Commd5 and Commd10, when expressed together, form heterodimers that ultimately form the larger heterotetramer. In both of these scenarios, COMMD proteins are seemingly interacting with each other through a second binding interface on the COMM domain that is potentially distinct from the dimerization interface.
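The stoichiometry argument above rests on simple arithmetic: the measured SEC-MALLS mass is divided by the theoretical mass of one Commd5-Commd10 pair and rounded to the nearest whole number of copies. A minimal sketch of that calculation is given below, using the two masses quoted in the text; the function name is hypothetical and the rounding rule is an assumption for illustration, not a description of the analysis software.

```python
def estimate_copies(observed_kda, unit_kda):
    """Return (mass ratio, nearest whole number of units) for a complex."""
    ratio = observed_kda / unit_kda
    return ratio, max(1, round(ratio))


# Masses quoted in the text: 85 kDa observed, 47.5 kDa for one 1:1 pair.
ratio, copies = estimate_copies(85.0, 47.5)
print(f"mass ratio {ratio:.2f} -> {copies} Commd5-Commd10 pairs")
```

A ratio close to 2 corresponds to two Commd5-Commd10 pairs, that is, four chains in total, consistent with the heterotetramer interpretation.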

The COMM domains bind the calponin homology domains of CCDC22 and CCDC93
Central to the endosomal trafficking function of COMMD proteins is their assembly into the CCC complex (Mallam and Marcotte, 2017; Wan et al., 2015). CCDC22 and CCDC93 are large proteins predicted to contain N-terminal divergent calponin homology domains (NN-CH) and C-terminal coiled-coils (Schou et al., 2014), and are critical components of the CCC complex (Phillips-Krawczak et al., 2015; Bartuzi et al., 2016; Wan et al., 2015). A Commd1 knockout causes loss of CCDC22/CCDC93 (Phillips-Krawczak et al., 2015; Bartuzi et al., 2016). Therefore, we set out to examine whether CCDC22/CCDC93 directly interact with COMMDs, and their mode of association. Using the co-translation pull down assay, we found that Commd1, 7 and 10 bind to both CCDC22 and CCDC93 directly (Figure 6A, Figure 5-figure supplement 1). Commd9 in contrast appears to recognize CCDC93 specifically. We next performed domain truncations and conducted the binding assay with isolated calponin homology-like domains (NN-CH) and C-terminal coiled-coil regions of CCDC22 and CCDC93. Interestingly, CCDC22 and CCDC93 bind to COMMDs chiefly through the NN-CH domain. Consistent with the previous literature (Phillips-Krawczak et al., 2015; Starokadomskyy et al., 2013), we also observed relatively weaker interaction bands for the C-terminal coiled-coil domains of CCDC22 and CCDC93. Our data also show that, similar to COMMD-COMMD interactions, COMMD-CCDC22/CCDC93 binding occurs through the COMM domain.
In contrast to interactions between COMMD family members, which require co-expression for assembly, we find that CCDC22 and CCDC93 interaction with pre-formed COMMD dimers occurs spontaneously in vitro (Figure 6B). GST-pull down experiments conducted by mixing the purified GST-NN-CH domains of CCDC22 and CCDC93 with full-length Commd1 and Commd9 as well as the COMM domains showed a similar binding pattern to what was observed after co-expression (Figure 6A,B). Specifically, Commd1 and Commd9 bind more strongly to the N-terminal NN-CH domain of CCDC93 in comparison to CCDC22. Moreover, this assay also suggests that Commd1 has stronger affinity than Commd9 for CCDC proteins. The interaction of the CCDC22 and CCDC93 NN-CH domains with the COMM domain of Commd1 was recapitulated and quantified using biolayer interferometry (BLiTz). A dose-dependent increase in the binding of the COMM

Commd proteins bind to CCDC22 and CCDC93 via a conserved site
To identify the molecular determinants that govern the interaction between the COMM domain of COMMDs and the NN-CH domains of CCDC22 and CCDC93, we used cross-linking mass spectrometry (MS) in combination with pull down experiments. Non-deuterated BS3 cross-linker was used to cross-link full-length Commd9 with the NN-CH domain of CCDC93. Due to the availability of HN and COMM domain structures, Commd9 was chosen for these experiments. The NN-CH domain of CCDC22 could not be used as it does not contain any lysine residues. Upon cross-linking, three major bands (labeled as 1, 2 and 3) in the Commd9 and NN-CH domain of CCDC93 mixture were observed, which were excised for MS analysis (Figure 7A). Lysozyme was used as a negative control, and a similar presence of bands 1 and 3 suggests they are likely to be cross-linked dimers and tetramers of Commd9 alone (Figure 7A). This observation was further supported by inspection of the MS spectra and xQuest database (Table 3).
Analyses of MS data from band 2 revealed 5 unique cross-links, and 3 of these pairs connected Commd9 and the NN-CH domain of CCDC93 (Figure 7B and Table 3). Mapping of these pairs onto the crystal structures of Commd9 showed that all three lysines (K100, K133 and K152) were located on the surface accessible to the solvent (Figure 7C,D). Notably, K133 and K152 are located on contiguous surfaces of the β-sheets of the COMM domains (Figure 7D), suggesting a likely binding surface for the CCDC93 NN-CH domain. This surface also includes the side-chains of the conserved residues ¹²⁸WRVD¹³¹, with Trp128 in particular being strictly conserved across the entire Commd family (Figure 2B).
To minimize the possibility that the cross-linked peptides captured were non-specific interactions caused by BS3, we also performed a cross-linking reaction using the unrelated VPS26-VPS29-VPS35 retromer complex from zebrafish (hereafter designated as zfRetromer) as a positive control. Inspection of the xQuest database and the MS spectra reveals a total of 18 cross-linked peptides (Table 4). VPS26 and VPS29 mainly cross-linked to two opposite regions of VPS35, which is in good agreement with the known crystal structures of Retromer (Figure 7-figure supplement 1). This implies that the cross-links we observe between Commd9 and CCDC93 are specific.

COMMD proteins associate non-specifically with negatively charged phospholipids
COMMD proteins are peripheral membrane proteins commonly associated with endosomal compartments (Phillips-Krawczak et al., 2015; Bartuzi et al., 2016; Wan et al., 2015; McNally et al., 2017; Burkhead et al., 2009; Drévillon et al., 2011). To examine if COMMD family proteins possess membrane-binding properties we performed qualitative liposome-pelleting assays with purified COMMD proteins (Figure 8A, Figure 8-figure supplement 1). The assay shows that Commd1 associates relatively non-specifically with various negatively charged membranes, including PC/PE liposomes doped with different phosphoinositides, generic Folch lipids from brain extracts, and liposomes containing 30% phosphatidylserine (PS). We observed a similar binding characteristic for Commd7, but Commd10 showed a relatively strong interaction with Folch liposomes as well as di- and tri-phosphorylated phosphoinositide species. In contrast to the other family members, Commd9 appears to associate weakly if at all with the membranes tested (Figure 8A). A comparative analysis of the electrostatic surface of the COMM domain of Commd9 and a homology model of the COMM domain of Commd1 (constructed using the COMM domain of Commd9 as the template) shows that the Commd1 COMM domain exhibits a basic patch on its surface composed of solvent-exposed positively charged residues that are absent in Commd9 (Figure 8B,C). To test if this basic surface on Commd1 is involved in membrane recruitment, we made a triple mutant in the putative lipid-binding site (R133Q, H134A and K167A). In liposome-pelleting assays this mutant shows a drastic reduction in membrane interaction (Figure 8A).
Using BLiTz we next quantified the interactions of Commd1 and Commd9 with different phosphoinositide-containing membranes. While Commd1 shows a higher binding response for the phosphoinositides PI(3)P and PI(4,5)P2 compared to Folch and PS containing membranes, Commd9 showed little association with any of the lipids tested, in line with the liposome-pelleting assay (Figure 8-figure supplement 2). We performed a concentration-dependent binding series with a selection of liposomes and Commd1 to calculate the dissociation constants of these interactions. Commd1 binds to a variety of liposomes with similar affinities in the range of 1-10 μM, which is in agreement with the qualitative results from pelleting assays (Figure 8D,E,F,G and H). Confirming the importance of the basic patch on the Commd1 COMM domain, no interaction was observed with the Commd1 triple mutant. Altogether, the COMM domain of COMMD proteins appears to be a central hub for both protein-protein and protein-membrane interactions.
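To illustrate how a dissociation constant can be estimated from a concentration-dependent binding series such as the one described above, the sketch below fits a one-site binding model, R = Rmax × C / (Kd + C), to equilibrium responses. This is an assumption for illustration only: the text does not specify the fitting model or software used, and the function names and grid-search approach are hypothetical.

```python
import math


def log_grid(lo, hi, n):
    """Return n values spaced evenly on a log10 scale between lo and hi."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def fit_one_site(conc, resp, kd_grid):
    """Grid-search Kd; for each trial Kd the best Rmax is the closed-form
    least-squares solution. Returns (Kd, Rmax) with the lowest squared error.
    Kd is in the same units as conc."""
    best = None
    for kd in kd_grid:
        frac = [c / (kd + c) for c in conc]  # fractional occupancy
        rmax = sum(f * r for f, r in zip(frac, resp)) / sum(f * f for f in frac)
        sse = sum((r - rmax * f) ** 2 for f, r in zip(frac, resp))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]
```

In practice the responses would be the plateau signals at each protein concentration, and a non-linear least-squares routine would usually replace the simple grid search used here.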
We next assessed the importance of the Commd1 membrane-binding surface for its cellular localization. GFP-tagged Commd1 wild type and mutant were expressed in HeLa cells following lentiviral transduction ( Figure 8I). As seen previously, Commd1 is localized to endosomal puncta, and colocalises with the WASH complex subunit Fam21. Surprisingly however, the mutant Commd1 is also colocalised with Fam21 on endosomes, despite being defective for membrane binding in vitro. We propose therefore that this site on Commd1 may instead be important for non-specific membrane interactions and perhaps for the orientation of the CCC complex on endosomes, but in the context of the assembled complex with other COMMD proteins it is not required for specific recruitment to PtdIns3P-enriched endosomal compartments.

Discussion
The COMMD proteins were recently identified as conserved and central components of the CCC complex (Mallam and Marcotte, 2017; Wan et al., 2015; McNally et al., 2017), a large endosome-associated assembly that regulates cell surface recycling of various transmembrane receptors (McNally et al., 2017; Phillips-Krawczak et al., 2015; Bartuzi et al., 2016; Wan et al., 2015). Despite the high degree of conservation of the COMMD proteins, and the CCC complex more generally, little is known about the structures or the stoichiometries of the component subunits, or how these proteins assemble together to control membrane recruitment and protein trafficking. In this study we provide the first insights into the architecture of the COMMD proteins, and the mechanisms that underpin the previously reported interactions between the various family members. COMMD proteins form obligatory dimers, and interact with each other in promiscuous homo- and heterodimeric arrangements via the C-terminal COMM domain. Furthermore, this small domain is essential for pleiotropic interactions with both the CCC proteins CCDC93 and CCDC22, and with negatively charged phospholipid membranes. The solution structures of Commd1, Commd7 and Commd9 reveal a modular architecture with the α-helical N-terminal HN domains arranged as flexible appendages to the C-terminal COMM domain dimers. Previous studies had described the N-terminal region as a variable domain that could provide functional diversity to the COMMD proteins, due to the low level of sequence homology in this region across the family. Our biophysical, structural and bioinformatics analyses however show that the HN domain is a structurally conserved feature shared by all COMMD family members.
The architecture of the COMMD proteins revealed here bears a remarkable resemblance to the orthologous CT584 and Cpn0803, Chlamydia-specific proteins from C. trachomatis and C. pneumoniae respectively. The function of these proteins is unknown, although there is evidence that Cpn0803 can interact with phospholipids as well as components of the Type III secretion system responsible for injection of bacterial effectors into the host cytoplasm (Stone et al., 2012). What does the homology of COMMD proteins to these chlamydial proteins imply? Chlamydial species are intracellular pathogens that reside inside a membrane vacuole called the inclusion. During their life cycle they secrete many effectors that can hijack the intracellular trafficking and signalling machinery of the host cell to promote survival (Elwell et al., 2016; Mirrashidi et al., 2015; Fischer et al., 2017; Pruneda et al., 2016). We speculate therefore that the CT584/Cpn0803 proteins may be secreted effectors with a potential to mimic or modify COMMD protein functions, although this remains to be confirmed. The evolutionary relationship of the COMMD proteins to the chlamydial homologues is also an interesting question. Several chlamydial proteins are believed to have evolved through horizontal gene transfer from eukaryotic hosts, such as SWIB domain-containing proteins and Swi/Snf2 helicases (Bastidas and Valdivia, 2016; Stephens et al., 1998), and it is possible that the CT584/Cpn0803 genes have been acquired in a similar fashion. Both CT584 and Cpn0803 form hexameric structures composed of a trimer of dimers, with the N-terminal domains providing the primary interface for trimer formation (Barta et al., 2013; Stone et al., 2012) (Figure 4-figure supplement 4E).
While full-length recombinant Commd1, Commd7 and Commd9 proteins analysed here form homodimers, it is tempting to speculate that the N-terminal domains of COMMD proteins will contribute to formation of higher-order heteromeric assemblies that are present in the 600 kDa CCC complexes isolated from cultured cells (Wan et al., 2015), the stoichiometries and structures of which remain to be determined.
Although the COMMD proteins play a central role in endosomal membrane trafficking as components of the CCC complex, they have also been implicated in a number of other cellular processes. The gene encoding Commd1 (originally called Murr1) was first identified in dogs with copper toxicosis, which has been mechanistically linked to interactions with the Wilson disease ATPase protein ATP7B (Tao et al., 2003). Most prominently, Commd1 has been shown to be a potent inhibitor of the transcription factor NF-κB, a master regulator of inflammation, and other COMMD proteins also display similar activities (de Bie et al., 2006; Ganesh et al., 2003; Bartuzi et al., 2013). Commd1 acts downstream of the inhibitory IκB kinase (Ganesh et al., 2003), and is believed to associate with NF-κB subunits at chromosomal loci and promote NF-κB ubiquitination by Cullin family ubiquitin ligases for degradation (Mao et al., 2011). A similar function has been proposed for Commd1 regulation of the HIF-1α transcription factor (van de Sluis et al., 2007, 2009, 2010). The COMM domain fold is structurally related to the repeat domains found in the PUR (purine-rich element binding protein) family protein Pur-α (Graebsch et al., 2009; Weber et al., 2016). Pur-α contains three such repeat domains, the first two of which form an intramolecular 'dimer' while the third forms an intermolecular dimer (Weber et al., 2016), both of which resemble the dimeric COMM domain topology. Pur-α is a nucleic acid binding protein that plays important roles in the transcription of neuronal genes (Gallia et al., 2000; White et al., 2009), and is associated with GGGGCC-containing inclusions in ALS-FTD patients (Xu et al., 2013).
The similarity of the COMM domain with Pur-α suggests the intriguing possibility that COMMD proteins could also interact with DNA and/or RNA to regulate transcription, although it should be noted that the specific DNA binding site in Pur-α (Weber et al., 2016) is not conserved in the COMMD proteins (Figure 4-figure supplement 3A).
The CCC complex and associated COMMD proteins are predominantly localised to early endosomal membranes (Phillips-Krawczak et al., 2015; Bartuzi et al., 2016; Wan et al., 2015; McNally et al., 2017; Burkhead et al., 2009; Drévillon et al., 2011). The mechanism of membrane recruitment of the complex however is unknown. Most intracellular trafficking complexes rely on binding to proteins such as the Rab GTPases and membrane lipids such as phosphoinositides for their spatio-temporal recruitment to specific compartments (Cullen, 2011). COMMD proteins show significant but non-preferential binding to a variety of negatively charged lipids, and these interactions appear to be mediated by the COMM domain. Our data also highlight that the HN domain is dispensable for membrane association, which is in line with the work of Burkhead et al. (Burkhead et al., 2009). The membrane-binding surface of Commd1 in vitro is composed of residues from both molecules of the homodimer, but interestingly this does not appear essential for endosomal localisation of Commd1 in cells. This suggests that the ability of COMMD proteins to associate with membranes is only one part of a more complicated array of interactions controlling specific endosomal membrane recruitment, likely involving different lipid binding properties of heterodimeric COMMD proteins and other CCC complex subunits. Supporting a role for other subunits and complexes in specific endosomal localisation of the CCC complex, the depletion of CCDC93 or the WASH complex subunit Fam21 both lead to a loss of Commd1 from endosomes (Phillips-Krawczak et al., 2015), while the depletion of CCDC22 causes a redistribution of Commd1 and Commd10 (Starokadomskyy et al., 2013).
The ability to self-assemble and form heteromeric complexes is a core property of the COMMD proteins identified in some of the very first studies (Burstein et al., 2005). Our work provides a clear structural explanation for how homo- and heterodimers are formed by the different family members, and begins to suggest mechanisms for how different domains could contribute to the formation of larger assemblies, how they associate with biological membranes, and how they become incorporated into the CCC complex through interactions with the CCDC proteins. As COMMD-containing complexes emerge as key regulators of cellular trafficking, signalling and transcription, the details of these molecular interactions and the systems that regulate them remain outstanding questions to be answered.

Molecular biology and cloning
All the constructs cloned into bacterial expression plasmids are listed in Figure 1-figure supplement 1. Briefly, DNA encoding full-length human Commd proteins, CCDC22 and CCDC93 was cloned into the pGEX-4T-2 plasmid for expression as N-terminal GST-tagged fusion proteins. Full-length Commd1, 7 and 9 were also cloned into the pET30b(+) vector with a C-terminal His6 tag by Genscript Corporation. Commd1 HN domain (1-114

Recombinant protein expression and purification
The bacterial expression plasmids were transformed into Escherichia coli BL21-CodonPlus (DE3)-RIPL competent cells (Agilent). The bacterial cultures were grown in LB until the OD600 reached 0.6. The cultures were cooled to 18˚C before protein expression was induced by adding 0.5 mM isopropylthio-β-galactoside (IPTG), and were then grown for a further 16 h (Ghai et al., 2011). The cells were harvested by centrifugation at 6000 × g for 5 min at 4˚C and the cell pellet was resuspended in lysis buffer [20 mM Tris (pH 8.0), 500 mM NaCl, 10% glycerol, 0.2% IGEPAL, 50 µg/mL benzamidine, 100 units DNaseI, and 1 mM β-mercaptoethanol]. Cells expressing His6-fused proteins were resuspended in lysis buffer supplemented with 20 mM imidazole (pH 8.0). The cells were lysed by mechanical disruption at 30 kpsi using a Constant Systems cell disrupter. The lysate was clarified by centrifugation at 50,000 × g for 30 min at 4˚C. Proteins were purified from the clarified lysate by affinity chromatography.
His-tagged proteins were purified on a nickel-NTA (Clontech) gravity column and eluted with 500 mM imidazole in buffer containing 150 mM NaCl, 20 mM Tris (pH 8.0) and 1 mM β-mercaptoethanol. GST-tagged proteins were purified on a glutathione-Sepharose (GE Healthcare) gravity column and eluted with 10 mM glutathione, 150 mM NaCl, 20 mM Tris (pH 8.0) and 1 mM β-mercaptoethanol. Alternatively, the GST tag was cleaved by adding thrombin or PreScission protease to the beads and incubating overnight at room temperature. Finally, proteins were subjected to size exclusion chromatography using a Superdex-200 16/60 HiLoad column or Superdex-75 16/60 HiLoad column attached to an AKTA pure (GE Healthcare).
The Commd5-Commd10 complexes were reconstituted by co-transformation of GST-Commd5 or GST-tagged COMM domain of Commd5 with Commd10-His or His-tagged COMM domain of Commd10 into Escherichia coli BL21 (DE3) competent cells (Agilent). The cells were cultured, harvested and lysed as described above. The complexes were first purified using glutathione-Sepharose beads and the GST tag was cleaved using PreScission protease. The eluted fractions were then mixed with equilibrated TALON beads to obtain a 1:1 stoichiometric complex. The complexes were eluted by supplementing the wash buffer with 200 mM imidazole (pH 8.0). The eluted proteins were subjected to size exclusion chromatography using a Superdex-200 16/60 HiLoad column.
For crystallization, SAXS, and MALLS experiments, proteins were buffer exchanged into 10 mM Tris (pH 8.0), 100 mM NaCl and 2 mM DTT using SEC. For the structure determination of Commd9 COMM domain, the protein was labeled with selenomethionine using the method described by Van Duyne et al. (Van Duyne et al., 1993). zfVPS26, zfVPS29 and zfVPS35 were expressed separately in E. coli BL21 (DE3) cells grown in LB at 37˚C, and protein expression was induced at an OD600 of 0.7-0.8 by the addition of 1 mM IPTG. Cells were harvested after 18 hr of growth at 18˚C. Cell pellets of overexpressed zfVPS26, zfVPS29 and zfVPS35 were mixed and purified using standard metal affinity, glutathione affinity and size-exclusion chromatography techniques to obtain the purified retromer complex. The size-exclusion buffer contained 50 mM HEPES (pH 7.5), 150 mM NaCl and 2 mM DTT.

Multi-angle laser light scattering
The molecular mass of the COMMD proteins was determined by size exclusion chromatography on an AKTA pure (GE Healthcare) connected to multi-angle laser light scattering (MALLS) and differential refractive index (RI) detectors. The protein samples were gel-filtered in a buffer containing 25 mM Tris (pH 8.0), 300 mM NaCl and 2 mM DTT that had been filtered (0.22 µm) and degassed. Measurements of full-length COMMD proteins were made using a Superdex-200 Increase 5/150 column (GE Healthcare) at a flow rate of 0.25 ml/min with in-line UV, MALLS, and RI detectors (Dawn Heleos II and Optilab rEX, respectively, Wyatt Technology Corp) for Mw characterization. Measurements of the COMMD COMM and HN domains were made using a Superdex-75 5/150 column (GE Healthcare) at a flow rate of 0.25 ml/min. UV, MALLS and RI data were collected and analysed using the ASTRA software (Wyatt Technology) (Folta-Stogniew, 2006) to compute the molecular mass.

Crystallisation, data collection and structure determination
The Commd9 COMM domain was buffer-exchanged into 10 mM Tris (pH 8.0), 100 mM NaCl, 2 mM DTT, and concentrated to 8 mg/ml for crystallisation at 20˚C. The protein was supplemented with 10 mM DTT before setting up hanging-drop crystallization screens using a mosquito liquid handling robot (TTP LabTech). Commd9 COMM was crystallised in 0.1 M HEPES (pH 7.0), 6% Jeffamine M-600.
In the case of the Commd9 HN domain, protein was buffer exchanged into 50 mM HEPES (pH 8.0), 200 mM NaCl, 1 mM tris(2-carboxyethyl)phosphine (TCEP) and initial crystallisation screens were set up at 12 mg/ml at 18˚C. Crystals of the HN domain were obtained in 0.2 M citric acid (pH 4.9), 28% MME-PEG5000 in a hanging drop setup. Data were collected at the Australian Synchrotron MX1 and MX2 Beamlines. iMOSFLM (Battye et al., 2011) was used to integrate the data, and AIMLESS (Evans and Murshudov, 2013) was used for data scaling in the CCP4 suite (Winn et al., 2011). The Commd9 COMM domain structure was solved using single anomalous dispersion (SAD), and the phases were calculated using the peak wavelength data of selenium with AUTOSOL using the PHENIX suite (Adams et al., 2010;Terwilliger et al., 2009). The solution from AUTOSOL was built using AutoBuild (Terwilliger et al., 2008) and the resulting model was rebuilt with COOT (Emsley and Cowtan, 2004) followed by repeated refinement runs and model building with PHENIX (Adams et al., 2010) and COOT (Emsley and Cowtan, 2004). The Commd9 HN domain structure was determined using multiwavelength anomalous dispersion (MAD) and the phases were obtained using the program SOLVE. Model building and refinement were done using COOT and phenix.refine.

Small angle X-ray scattering
In-line SEC-SAXS measurements on homogeneous protein samples (assessed using MALLS) were performed at the SAXS/WAXS beamline at the Australian Synchrotron using a Superdex-200 Increase 5/150 column (GE Healthcare) and a Pilatus 1M detector (Dectris). The scattering data were measured in a q range of 0.011 to 0.4 Å⁻¹ at 12 keV using a 1.6 m camera length. Samples were loaded on to the size exclusion column that was equilibrated with 10 mM Tris (pH 8.0), 100 mM NaCl, 2 mM DTT and 5% glycerol. Data reduction was performed using the ScatterBrain program (written and provided by the Australian Synchrotron; available at http://www.synchrotron.org.au). The buffer frames were averaged after assessing their statistical equivalence using CorMap p values with a significance threshold (α) of 0.01. The averaged buffer scattering was subtracted from statistically similar frames across the elution peak of the Commd proteins. Rg was evaluated using the Guinier approximation and was found to be consistent across the elution peak. Primary data processing was performed in Primus using the ATSAS suite (version 2.6) (Petoukhov et al., 2012). The pair distance distribution P(r) of the Commd proteins was determined using GNOM. C2 symmetry was assumed in generating low-resolution three-dimensional ab initio envelopes using the program GASBOR (Kozin and Svergun, 2001). DAMAVER (Volkov and Svergun, 2003) was used to average the 20 independent models generated by GASBOR. Rigid body modeling was performed using SASREF (Petoukhov and Svergun, 2005) and the partial scattering amplitudes were calculated with CRYSOL (Svergun et al., 1995). The ab initio models were superimposed on to the rigid body modeled structures using SUPCOMB (Kozin and Svergun, 2001).

Co-expression GST pull-downs
(17-155), CCDC93_C (239-630) and empty pGEX4T-2 vector (expressing GST). Transformants were selected via overnight growth using triple antibiotic agar plates. A single colony was picked to initiate the culture and proteins were co-expressed using the standard protein expression protocol as described above. Proteins were purified by affinity chromatography using glutathione-Sepharose beads (GE Healthcare) and SDS-PAGE was run to visualize GST-tagged bait proteins. Binding of His-tagged proteins (prey) to GST-tagged proteins (bait) was observed by Western blotting using mouse anti-His antibody (Genscript). Genscript generated all the mutants used in this study.

GST pull-downs
1 nmol of GST-tagged CCDC22_N (1-139) and CCDC93_N (17-155) were mixed with 1 nmol of His-tagged Commd1 and Commd9, the COMM domains of Commd1 and Commd9, and the Commd1 HN domain, for 1 hr at 4˚C. The protein mixture was then centrifuged at high speed to remove any precipitated proteins. The supernatant was then added to pre-equilibrated (20 mM Tris (pH 8.0), 300 mM NaCl, 1 mM DTT) glutathione-Sepharose and allowed to mix for a further 30 min at 4˚C. Beads were washed five times in the above buffer supplemented with 0.5% Triton X-100 (Sigma-Aldrich). Bound proteins were analysed by Western blotting using mouse anti-His antibody (Genscript).

Chemical cross-linking coupled with mass spectrometry
For cross-linking, a mixture of purified full-length Commd9 and the NN-CH domain of CCDC93 at 50 µM in 50 mM HEPES (pH 7.5), 150 mM NaCl was incubated with a 100-fold molar excess of BS3-d0 cross-linker (Sigma-Aldrich) for 30 min at room temperature. The reaction was quenched by addition of 100 mM Tris-HCl (pH 8.5), and the cross-linked products were analysed by SDS-PAGE and subjected to MS analysis. A negative control cross-linking reaction was performed between full-length Commd9 and lysozyme using the same conditions described above. For the positive control cross-linking reaction, the purified retromer complex at 15 µM was reacted with a 100-fold molar excess of BS3-d0 cross-linker. BS3-d0 was purchased from Sigma-Aldrich (catalog no. S5799). The gel band that corresponds to the molecular weight of the monomeric retromer complex was subjected to MS analysis. The bands from the SDS-PAGE gels were excised and reduced with dithioerythritol followed by alkylation with iodoacetamide. Alkylated samples were digested with trypsin (Promega) in 50 mM ammonium bicarbonate (pH 8.0) overnight at 37˚C using an enzyme-to-substrate ratio of 1:100 (w/w). The digested samples were extracted using extraction buffer containing 5% formic acid and 50% acetonitrile followed by sonication for 1 min. The supernatant was then dried down in a vacuum centrifuge and redissolved in 0.1% formic acid prior to analysis by LC-MS/MS. The extracted peptides were analysed by uHPLC-MS/MS on an Eksigent Ekspert nano LC400 uHPLC (SCIEX, Canada) coupled to a TripleTOF 6600 mass spectrometer (SCIEX, Canada) equipped with a duo microelectrospray ion source. In brief, samples were injected onto a 300 µm × 150 mm ChromXP C18 CL 3 µm column (SCIEX, Canada) at 5 µl/min. The bound peptides were eluted with a gradient using solvent containing 0.1% formic acid in acetonitrile. A 250 ms full-scan TOF-MS spectrum was acquired, followed by up to 30 full-scan product ion spectra of 50 ms each, in Information Dependent Acquisition (IDA) mode.
TOF-MS data were acquired over the mass range m/z 350-2000, and product ion MS/MS data over m/z 100-1600. Ions observed in the TOF-MS scan exceeding a threshold of 100 counts and with a charge state of +2 to +5 were set to trigger the acquisition of product ion MS/MS spectra of the resulting 30 most intense ions. Acquisition of all MS/MS samples was performed using Analyst TF 1.7 software (SCIEX, Canada). Inspection of the raw MS data was done using ProteinPilot software (SCIEX, Canada). Cross-linked peptides were assigned using the xQuest database search engine (Rinner et al., 2008). Trypsin was set as the enzyme used for digestion during sample preparation, with an MS1 tolerance of 10 ppm and an MS2 tolerance of 0.2 m/z.
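To make the two search tolerances concrete, the following is a minimal sketch of how a relative (ppm) precursor tolerance and an absolute (m/z) fragment tolerance can be evaluated. It is an illustration only and is not taken from xQuest; the function names and the example m/z values are hypothetical.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part of xQuest.
# They show how a 10 ppm MS1 tolerance and a 0.2 m/z MS2 tolerance can be applied.

def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to the theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ms1_tolerance(observed_mz: float, theoretical_mz: float,
                         tol_ppm: float = 10.0) -> bool:
    """Relative tolerance: the allowed window widens with increasing m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_ms2_tolerance(observed_mz: float, theoretical_mz: float,
                         tol_mz: float = 0.2) -> bool:
    """Absolute tolerance: the allowed window is the same width at every m/z."""
    return abs(observed_mz - theoretical_mz) <= tol_mz


def ms1_window(theoretical_mz: float, tol_ppm: float = 10.0) -> tuple:
    """Lower and upper m/z bounds implied by a ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


if __name__ == "__main__":
    # Hypothetical precursor at m/z 1000: a 10 ppm tolerance allows +/- 0.01 m/z.
    low, high = ms1_window(1000.0)
    print(f"MS1 window: {low:.4f} to {high:.4f}")
    print(within_ms1_tolerance(1000.005, 1000.0))  # about 5 ppm away
    print(within_ms1_tolerance(1000.02, 1000.0))   # about 20 ppm away
```

The difference in form matters in practice: a ppm tolerance scales with the precursor mass, whereas the fixed 0.2 m/z window treats all fragment ions alike.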

Liposome preparation
All phosphoinositides were protonated prior to use. In brief, powdered lipids were resuspended in chloroform (CHCl3) and dried under argon. Dried lipids were then left in a desiccator for 1 hr to remove any remaining moisture. Dried lipids were resuspended in CHCl3:methanol (MeOH):1 N hydrochloric acid in a 2:1:0.01 molar ratio, then dried once again and allowed to desiccate. Lipids were then resuspended in CHCl3:MeOH in a 3:1 ratio and dried once again under argon. Finally, dried lipids were resuspended in CHCl3 and stored at −20˚C.
Lipid stock solutions were mixed to the desired molar ratios and dried under argon. To prepare control liposomes, POPC and POPE were mixed in a 90:10 molar ratio; for BLItz experiments, liposomes were doped with 0.5% biotinylated POPE. Liposomes containing phosphoinositides were prepared by mixing POPC, POPE and PIPs in an 80:10:10 molar ratio, respectively. 30% POPS was used for POPC:POPE:POPS liposomes. Dried lipids were hydrated in 25 mM HEPES (pH 7.2) and 220 mM sucrose to obtain a suspension of multilamellar liposomes containing sucrose. This suspension was then freeze-thawed five times to produce unilamellar liposomes. Liposomes were then diluted 1:5 in 25 mM HEPES (pH 7.2) and 125 mM NaCl. The suspension was then centrifuged at 250,000 × g to remove sucrose from the medium while maintaining osmolarity. The pelleted liposomes were resuspended in 25 mM HEPES (pH 7.2) and 125 mM NaCl to the desired concentration of 0.5 mM. All liposomes were used within 1 day of preparation.
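The molar ratios above translate into per-lipid amounts by simple proportion. The sketch below works through that arithmetic for the 90:10 and 80:10:10 mixtures at a final total lipid concentration of 0.5 mM. The 1 ml batch volume and the stock concentrations are hypothetical values chosen only for illustration; they are not stated in the protocol.

```python
# Illustrative arithmetic only. The batch volume and the stock concentrations
# used below are hypothetical; only the molar ratios and the 0.5 mM final
# lipid concentration come from the protocol text.
from typing import Dict


def lipid_amounts_nmol(ratios: Dict[str, float], total_nmol: float) -> Dict[str, float]:
    """Split a total lipid amount between components according to molar ratio."""
    ratio_sum = sum(ratios.values())
    return {name: total_nmol * part / ratio_sum for name, part in ratios.items()}


def stock_volumes_ul(amounts_nmol: Dict[str, float],
                     stock_mM: Dict[str, float]) -> Dict[str, float]:
    """Volume of each stock to pipette, in microlitres.

    1 mM equals 1 nmol per microlitre, so volume (ul) = amount (nmol) / conc (mM).
    """
    return {name: amounts_nmol[name] / stock_mM[name] for name in amounts_nmol}


if __name__ == "__main__":
    # Hypothetical batch: 1 ml of liposomes at 0.5 mM total lipid = 500 nmol lipid.
    total_nmol = 0.5 * 1000.0

    control = lipid_amounts_nmol({"POPC": 90, "POPE": 10}, total_nmol)
    pip_mix = lipid_amounts_nmol({"POPC": 80, "POPE": 10, "PIP": 10}, total_nmol)
    print(control)  # nmol of each lipid in the control mixture
    print(pip_mix)  # nmol of each lipid in the phosphoinositide mixture

    # Hypothetical stock concentrations in CHCl3 (mM).
    stocks = {"POPC": 10.0, "POPE": 10.0, "PIP": 1.0}
    print(stock_volumes_ul(pip_mix, stocks))
```

Keeping the ratio as a dictionary makes it straightforward to substitute other compositions without changing the arithmetic.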

Liposome pelleting
10 µM of the protein of interest was added to a final volume of 200 µl of the liposome solution. This solution was left at room temperature for 25 min to allow for protein-liposome interaction. After incubation, the solution was centrifuged at 400,000 × g for 30 min. Supernatant and pellet fractions were separated and the pellet was resuspended in 200 µl of 25 mM HEPES (pH 7.2) and 125 mM NaCl. Samples were then analysed on a precast 4-12% Bis-Tris gel (Novex) with Coomassie staining.

Biophysical interaction analysis using bio-layer interferometry (BLItz)
Protein-lipid and protein-protein interactions were measured by bio-layer interferometry using the BLItz system. Protein-lipid interactions were observed by immobilizing 500 µM of biotinylated liposomes on a streptavidin biosensor. After immobilization, the sensor was washed with buffer containing 10 mM Tris (pH 8.0), 150 mM NaCl and 0.1% BSA to prevent non-specific association. Increasing concentrations (12.5, 25, 50 and 100 µM) of protein were added to the sensor and the change in binding (nm) was measured. Proteins were then allowed to dissociate from the probe in the same buffer. The kinetics of the protein-protein interactions were determined in the same fashion using 5 µM of His-tagged Commd1 COMM domain immobilized on a nickel-NTA probe and increasing protein concentrations of 62.5, 125, 250 and 500 µM. The data were processed and plotted using the SigmaPlot package (Systat Software Inc.).
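The text does not state which kinetic model, if any, was fitted to the association and dissociation traces. As a reference point for reading such sensorgrams, the sketch below implements the commonly used 1:1 (Langmuir) binding model. This model is an assumption for illustration, not the authors' stated analysis, and all rate constants and concentrations in the example are hypothetical.

```python
# Illustrative sketch of a 1:1 (Langmuir) binding model for bio-layer
# interferometry traces. The model choice and all parameter values are
# assumptions for illustration; they are not taken from the paper.
import math


def equilibrium_response(conc_M: float, r_max: float, ka: float, kd: float) -> float:
    """Plateau response at analyte concentration conc_M, with KD = kd / ka."""
    k_D = kd / ka
    return r_max * conc_M / (conc_M + k_D)


def association(t_s: float, conc_M: float, r_max: float, ka: float, kd: float) -> float:
    """Response during association: exponential approach to the plateau
    with observed rate k_obs = ka * C + kd."""
    k_obs = ka * conc_M + kd
    r_eq = equilibrium_response(conc_M, r_max, ka, kd)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


def dissociation(t_s: float, r_start: float, kd: float) -> float:
    """Response during dissociation: single-exponential decay from r_start."""
    return r_start * math.exp(-kd * t_s)


if __name__ == "__main__":
    # Hypothetical parameters: ka in M^-1 s^-1, kd in s^-1, r_max in nm.
    ka, kd, r_max = 1.0e3, 1.0e-2, 1.0
    # Hypothetical two-fold analyte dilution series, in molar units.
    for conc in (1.25e-5, 2.5e-5, 5.0e-5, 1.0e-4):
        r_end = association(120.0, conc, r_max, ka, kd)  # after 120 s association
        r_off = dissociation(120.0, r_end, kd)           # after 120 s dissociation
        print(f"{conc:.2e} M: association end {r_end:.3f} nm, after wash {r_off:.3f} nm")
```

In this model the plateau response depends on concentration through KD, while the dissociation phase depends only on kd, which is why a concentration series is typically measured.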

Cell culture
RPE1 and HEK293T cells were maintained in DMEM (D5796; Sigma-Aldrich) plus 10% fetal calf serum (F7524; Sigma-Aldrich) under standard conditions. These cell lines were obtained from the American Type Culture Collection (ATCC). Parental and stable cell lines were negative for mycoplasma by DAPI staining, and were authenticated by STR profiling. Lentivirus particles for producing stably expressing cell lines were generated in HEK293T cells using the pXLG3 vector to carry the GFP-tagged Commd1 WT and Commd1 (R133Q, H134A, K167A). Cells were transfected with DNA using polyethylenimine (Sigma-Aldrich). Virus was harvested from the growth media 72 hr post transfection.
For stable transduction with lentivirus, cells were seeded at 75,000 per well in six-well plates. The cells were then incubated under normal conditions with titrations of viral supernatant for 72 hr. Cells were then passaged and expression of the GFP-tagged protein of interest was assessed by western blot analysis. Cell lines that displayed similar expression levels, closest to endogenous levels of the protein, were selected for comparison.

Immunofluorescence
RPE1 cells grown on 13 mm coverslips were washed with PBS before being fixed in ice cold 4% formaldehyde in PBS for 25 min. Cells were permeabilised in 0.1% Triton X-100 (Sigma) for 6 min. The cells were then blocked with 1% bovine serum albumin (BSA) in 0.01% Triton for 15 min at room temperature. Primary antibodies were diluted in 1% BSA and samples were incubated for 1 hr at room temperature. The samples were then incubated with Alexa Fluor conjugated secondary antibody and 0.2 mM DAPI for 30 min at room temperature. Coverslips were mounted in Mowiol-DABCO mounting medium (Sigma). Cells were visualised using a Leica TCS SP5 X confocal microscope (Leica Biosystems).

Data availability
Coordinates and structure factors for the COMM and HN domain of Commd9 have been deposited at the Protein Data Bank (PDB) with accession codes 6BP6 (COMM domain Commd9) and 4OE9 (HN domain of Commd9).