Antiparallel protocadherin homodimers use distinct affinity- and specificity-mediating regions in cadherin repeats 1-4

Protocadherins (Pcdhs) are cell adhesion and signaling proteins used by neurons to develop and maintain neuronal networks, relying on trans homophilic interactions between their extracellular cadherin (EC) repeat domains. We present the structure of the antiparallel EC1-4 homodimer of human PcdhγB3, a member of the γ subfamily of clustered Pcdhs. Structure and sequence comparisons of α, β, and γ clustered Pcdh isoforms illustrate that subfamilies encode specificity in distinct ways through diversification of loop region structure and composition in EC2 and EC3, which contains isoform-specific conservation of primarily polar residues. In contrast, the EC1/EC4 interface comprises hydrophobic interactions that provide non-selective dimerization affinity. Using sequence coevolution analysis, we found evidence for a similar antiparallel EC1-4 interaction in non-clustered Pcdh families. We thus deduce that the EC1-4 antiparallel homodimer is a general interaction strategy that evolved before the divergence of these distinct protocadherin families. DOI: http://dx.doi.org/10.7554/eLife.18449.001

Clustered Pcdh subfamilies show distinct phenotypes. In zebrafish, α and γ Pcdhs are expressed in overlapping but distinct brain regions (Biswas et al., 2012). In mammals, α Pcdhs regulate sorting of olfactory sensory neuron axons into glomeruli, serotonergic axon maturation, and dendritic patterning in CA1 pyramidal neurons (Hasegawa et al., 2008, 2012; Katori et al., 2009; Suo et al., 2012). The γ subfamily is important for self/non-self discrimination in retinal starburst amacrine cells and Purkinje neurons (Kostadinov and Sanes, 2015; Lefebvre et al., 2012). Thus, available data suggest that the different Pcdh subfamilies may function independently or cooperatively, perhaps depending on the brain region and/or neuronal cell type.
Our recent PcdhγA1 and PcdhγC3 EC1-3 structures revealed dimer interactions between EC2 and EC3 (Nicoludis et al., 2015), consistent with previous biochemical and bioinformatics data (Schreiner and Weiner, 2010; Wu, 2005). Using sequence coevolution analysis, we predicted intersubunit EC1-EC4 interactions, and proposed that clustered Pcdhs form extended antiparallel homodimers engaging EC1-4. A complementary biochemical and structural study arrived at a very similar docking model (Rubinstein et al., 2015), which was recently confirmed for α and β clustered Pcdhs (Goodman et al., 2016).
We determined the crystal structure of PcdhγB3 EC1-4, the first full antiparallel dimer for a γ isoform. We analyzed the clustered Pcdh structures in light of biological, biochemical and evolutionary data to further resolve how clustered Pcdhs encode specificity. We describe how structural differences between the α, β and γ subfamilies generate distinct modes of specificity encoding. We also provide evidence that the EC1/EC4 and EC2/EC3 interfaces are functionally different: EC1/EC4 provides nonselective dimerization affinity while EC2/EC3 is generally responsible for enforcing specificity. Finally, we extend our sequence coevolution analysis to the non-clustered Pcdhs and provide evidence that the EC1-4 interaction is broadly used by Pcdhs.

eLife digest
As the brain develops, nerve cells or neurons connect with one another to form complex networks. These connections form between branch-like structures, called dendrites, that project from the cell body of each neuron. To prevent unneeded connections from forming, dendrites that belong to the same neuron need a way to recognize and avoid one another. A family of proteins called protocadherins supports this process of self-avoidance.
Protocadherins have three main parts or domains: an extracellular domain that faces outwards away from the cell, a transmembrane domain that sits within the cell's surface membrane and an intracellular domain that faces into the cell's interior. There are two major groups of protocadherins, clustered and non-clustered, and the former are responsible for the self-avoidance behavior between dendrites. Clustered protocadherins in turn comprise three subfamilies, each of which consists of multiple variants with slightly different structures (known as isoforms). The particular set of protocadherin isoforms that a neuron displays on its surface distinguishes that neuron from all others, a little like a barcode.
When two dendrites meet, the protocadherins in their membranes come into contact with one another. If both dendrites come from the same neuron and therefore possess identical sets of protocadherins, then all protocadherins can form two-subunit complexes containing one copy of the same isoform from each dendrite. These complexes are called homodimers and their formation acts as a signal that informs the cell that it has encountered one of its own dendrites and should therefore not establish a connection. By using X-rays to determine the structure of a crystallized protocadherin fragment down to the level of its individual atoms, Nicoludis et al. now reveal exactly how clustered protocadherins form homodimers. The results show that each protocadherin subfamily uses a slightly different type of interaction due to differences in the structure of their extracellular domains.
The next challenge is to identify the signaling cascade that is triggered by the formation of clustered protocadherin homodimers, and to work out how activation of this cascade prevents a permanent connection from forming. In addition, the results of Nicoludis et al. predict that some non-clustered protocadherins form dimers with a similar architecture to that of clustered protocadherins. This possibility should also be tested experimentally.
Although the asymmetric unit contains a single PcdhγB3 molecule, a crystallographic two-fold axis generates an antiparallel dimer with intersubunit EC1/EC4 and EC2/EC3 interactions (Figure 1A). This dimer is consistent with the PcdhγA1 EC1-3 crystal structure, validating the previously predicted interface (Nicoludis et al., 2015; Rubinstein et al., 2015), and with recent α and β Pcdh structures (Goodman et al., 2016), confirming that this interaction mechanism is conserved among all clustered Pcdh subfamilies (Figure 1D). The structures do differ noticeably in overall twist, including subfamily-specific differences in relative EC1/EC4 orientation (Figure 1E).
The linear architecture of clustered Pcdhs enables extended antiparallel dimer interfaces. Overall, the tilt and azimuthal angles between adjacent clustered Pcdh repeats are distinct from those of classical cadherins (Figure 1-figure supplement 2) (Nicoludis et al., 2015). Classical cadherins, which typically dimerize through EC1/EC1 interfaces, exhibit smaller tilt angles and thus an overall curved structure (Boggon et al., 2002). Notably, the clustered Pcdh repeat orientation is such that EC1 and EC3 use the same face for intersubunit contacts, as do EC2 and EC4 (Figure 1-figure supplement 3), suggesting that longer cadherins could readily form even more extended interfaces.

Clustered protocadherin subfamilies have distinct specificity mechanisms dictated by structural differences
Clustered Pcdh subfamilies control different phenotypes in vivo and have discrete expression patterns (Biswas et al., 2012; Keeler et al., 2015), suggesting that they encode specificity using distinct modes, which may relate to subfamily-specific structural features. To investigate this hypothesis, we calculated the isoform conservation ratio (ICR) within individual subfamilies, which quantifies the extent to which individual residue positions are conserved among orthologs (same isoform in different species) and diversified in paralogs (different isoforms in the same species) (Nicoludis et al., 2015), resulting in three ICR value sets for the α, β and γ subfamilies, respectively (Figure 2-figure supplement 1). To account for subfamily differences in sequence conservation, we normalized the ICR values by dividing by the subfamily average. We then mapped them onto the Pcdhα7 EC1-5, Pcdhβ8 EC1-4 and PcdhγB3 EC1-4 structures (Figure 2A). Comparing the structures and isoform-specific conservation in the different subfamilies allowed us to identify key specificity determinant regions for individual subfamilies. We illustrate three examples of how the subfamilies have encoded specificity using unique structural features.
In α isoforms, the EC2 β4-β5 loop is enriched in high-ICR and chemically diverse residues, and differs in conformation in the Pcdhα4 and Pcdhα7 structures (Figure 2B): the Pcdhα4 EC2 β4-β5 loop contacts β1b of EC3, while the corresponding loop in Pcdhα7 does not, suggesting variable interactions in other isoforms. In comparison, the EC2 β4-β5 loop residues in both β and γ isoforms have lower ICR values and a more similar loop structure, and do not contact β1b of EC3. Thus, this loop may have evolved to generate diversity within α isoforms, but not in other subfamilies.
In β isoforms, the Phe-X10-Phe loop between β3 and β4 of EC3 has limited diversity compared to α and γ isoforms and wedges between the EC2 β4-β5 loop and β2b strand (Figure 2C). In contrast, the Phe-X10-Phe loop of α and γ isoforms has a helical conformation, and has residues with higher ICR values and greater chemical diversity. Therefore, alterations in secondary structure can affect how specificity is encoded within the subfamilies.
The short loop following the extended β1a strand in EC4 contacts the EC1 β6-β7 loop (Figure 2D), and there are large structural differences in the EC1/EC4 interaction between subfamilies (Figure 1E). α isoforms have low-ICR residues at this interface, whereas β and γ isoforms have residues with higher ICR values. This suggests that the large structural differences drive inter-subfamily specificity, on which additional isoform specificity may be layered.
In all cases, sequence regions with high isoform-specific conservation correlate with interface contacts, revealing the interplay between dimer structure and how subfamilies encode specificity. Diversity in the composition and conformation of loop regions provides distinct specificity mechanisms to subfamilies. Phylogenetic analysis indicates that isoforms are more similar within than across subfamilies (Sotomayor et al., 2014; Wu, 2005), and the available structures show that the interface architecture is more similar within subfamilies as well (Figure 1E) (Goodman et al., 2016; Nicoludis et al., 2015). With this insight, the dimer interface seen in the PcdhγC3 EC1-3 crystal structure may represent a unique dimer architecture for C-type isoforms (Nicoludis et al., 2015), as these isoforms are transcriptionally, functionally and evolutionarily distinct from the subfamilies in which they reside (Frank et al., 2005; Kaneko et al., 2006). Distinct expression of the clustered Pcdh subfamilies in different tissues and at different developmental stages supports the necessity for intra-subfamily specificity (Biswas et al., 2012; Frank et al., 2005). Differences in subfamily structure and isoform-specific conservation suggest that homophilic specificity mechanisms emerged independently in each subfamily through diversification of subfamily-specific interface contacts.

The EC1/EC4 interaction provides affinity of dimerization
The EC2/EC3 interaction is integral to clustered Pcdh dimerization specificity, as evidenced by bioinformatics and cell-aggregation assays (Nicoludis et al., 2015; Rubinstein et al., 2015; Schreiner and Weiner, 2010; Thu et al., 2014; Wu, 2005). We sought to understand the functional purpose of the EC1/EC4 interaction, and made three observations. First, for all isoforms with available structures, fewer EC1/EC4 interface residues have high isoform-specific conservation compared to the EC2/EC3 interface residues (Figure 2A, Figure 2-figure supplement 1). Second, interface residues shared by most isoforms are more hydrophobic in EC1/EC4 than in EC2/EC3 (Figure 3A). Third, the PcdhγB3 EC1/EC4 interface is much larger (buried surface area, BSA = 976 Å² per protomer) than the EC2/EC3 interface (555 Å² per protomer). The lack of isoform specificity, the hydrophobic nature, and the large interface area together suggest that the EC1/EC4 interface promotes binding with little specificity.
In the PcdhγB3 EC1/EC4 interface, F86, one of the predicted affinity-driving residues from EC1, wedges into a cavity created by hydrophobic EC4 residues (Figure 3D). A PcdhγB3 EC1-4 F86A mutant indeed disrupted dimerization, resulting in a monomeric protein as measured by MALS (Figure 3E). Thus, the hydrophobic interactions between EC1 and EC4 are crucial to dimerization. Analogously, purified EC1-3 constructs failed to dimerize in vitro whereas EC1-4 constructs did (Nicoludis et al., 2015; Rubinstein et al., 2015), and K562 cells expressing ΔEC1 or ΔEC4-6 constructs did not aggregate while cells expressing chimeras in which EC1 and EC4 derived from different paralogs did (Schreiner and Weiner, 2010; Thu et al., 2014). Together, these results indicate that the EC1/EC4 interaction is not strictly required for the specificity of dimerization but drives dimerization affinity through non-specific hydrophobic interactions.

Antiparallel EC1-4 interaction is predicted in non-clustered Pcdhs
The antiparallel EC1-4 interaction architecture can encode diverse specificities within the clustered Pcdh family. Is this architecture unique to clustered Pcdhs or is it ancestral, and thus also found in non-clustered Pcdhs? These include the δ-1 (Pcdh1, Pcdh7, Pcdh9, Pcdh11) and δ-2 (Pcdh8, Pcdh10, Pcdh17, Pcdh18, Pcdh19) families that are integral to the development and maintenance of the nervous system (Keeler et al., 2015; Kim et al., 2011). We used sequence coevolution analysis, which successfully predicted the clustered Pcdh interface (Nicoludis et al., 2015) (Figure 4-figure supplement 1), to look for evidence of an antiparallel interface in non-clustered Pcdhs (Figure 4A). As in clustered Pcdhs, most covarying residue pairs in non-clustered Pcdhs were intra-domain structural contacts of the well-conserved cadherin fold. Additionally, several covarying pairs are found between EC2 and EC3, or EC1 and EC4, similar to those observed for the clustered Pcdhs. When mapped onto the PcdhγB3 dimer, these covarying pairs are somewhat further apart than true interface contacts (Figure 4B), which could be due to differences in dimerization interfaces, as we observe between the clustered Pcdh families, or in the secondary structure of the δ-1 or δ-2 Pcdhs, for which there are no available structures. This analysis thus predicts an antiparallel EC1-4 interaction in members of the non-clustered Pcdhs. Notably, we cannot determine whether all members or only a subset (and if so, which) likely use this architecture. However, maximum parsimony suggests that the ancestral Pcdh used the antiparallel EC1-4 dimer interaction, and Pcdh members that do not show this interaction mechanism either diverged before it evolved or lost it subsequently.
Finally, we looked at the composition of a predicted non-clustered Pcdh interface by selecting residues homologous to those found at clustered Pcdh interfaces. The predicted EC1/EC4 interface residues are predominantly hydrophobic, while EC2/EC3 residues have more polar and ionic character (Figure 4C). Notably, positions 41 and 77 in EC1 and 320, 321, 371 and 373 in EC4 are more hydrophobic in non-clustered than clustered Pcdhs, indicating that these may form contacts in some non-clustered Pcdhs. Thus, as in the clustered Pcdhs, the EC1/EC4 interaction may promote dimer affinity while the EC2/EC3 interaction provides specificity.
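The comparison of interface composition described above can be illustrated with a minimal sketch. It assumes an aligned set of sequences and two lists of 1-based alignment positions, one per interface. The residue set treated as hydrophobic, the function name `hydrophobic_fraction`, and the toy sequences and positions are all illustrative assumptions, not the residues or classification used for Figure 4C.

```python
# Assumption: a simple residue-class definition of "hydrophobic";
# the analysis in the text may have used a different scale or classification.
HYDROPHOBIC = set("AVILMFWC")

def hydrophobic_fraction(sequences, positions):
    """Fraction of non-gap residues at the given 1-based alignment
    positions that belong to the HYDROPHOBIC set."""
    total = 0
    hydrophobic = 0
    for seq in sequences:
        for pos in positions:
            residue = seq[pos - 1]
            if residue == "-":      # skip alignment gaps
                continue
            total += 1
            if residue in HYDROPHOBIC:
                hydrophobic += 1
    return hydrophobic / total if total else float("nan")

# Toy alignment (hypothetical): columns 1-2 stand in for an EC1/EC4-like
# interface, columns 3-4 for an EC2/EC3-like interface.
toy_alignment = ["FLKD", "FIRE", "WVKD"]
ec1_ec4_like = hydrophobic_fraction(toy_alignment, [1, 2])
ec2_ec3_like = hydrophobic_fraction(toy_alignment, [3, 4])
print(ec1_ec4_like, ec2_ec3_like)
```

In this toy case the first interface is entirely hydrophobic and the second entirely polar or charged, which mirrors the qualitative pattern described in the text without reproducing any of its data.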

Conclusions
Recently, we and others predicted that clustered Pcdhs form homophilic antiparallel EC1-EC4 complexes based on crystal structures, mutagenesis and bioinformatics (Nicoludis et al., 2015; Rubinstein et al., 2015). Our structure of PcdhγB3 EC1-4 confirms our sequence coevolution analysis, demonstrating the robustness of the analysis and revealing the molecular details of this interaction. Here we extended this prediction to the non-clustered Pcdhs using sequence coevolution analysis.
We identified a hydrophobic interaction between EC1 and EC4 that contributes to dimerization affinity, whereas its conservation among clustered Pcdh isoforms suggests that this interaction is not a driver of specificity. Overall, our data support a general role for conserved hydrophobic EC1/EC4 interactions in affinity, and for highly diversified polar EC2/EC3 contacts in specificity, and sequence analyses suggest that this is conserved in at least some non-clustered Pcdhs.

Materials and methods
Expression, purification and crystallization of PcdhγB3 EC1-4
Human PcdhγB3 EC1-4 (residues 1-414, not counting the signal peptide) was cloned into pET21 with a C-terminal hexahistidine tag and expressed in BL21 Gold (DE3) Escherichia coli cells in terrific broth. Cells were induced at OD600 = 0.8 with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at 37°C for 4 hr, harvested and lysed by sonication in 8 M guanidinium hydrochloride (GuHCl), 50 mM HEPES pH 7.5, 2 mM CaCl2, and 20 mM imidazole. Cell lysates were diluted to 5 M GuHCl and loaded onto Ni-Sepharose, washed with 50 mM HEPES pH 7.5, 250 mM NaCl, 10 mM CaCl2, and 25 mM imidazole and eluted with 250 mM imidazole. Eluted protein was refolded at 1 mg/mL in 12 hr dialysis steps reducing the GuHCl concentration from 2.5 M to 1.25 M and finally 0 M in refolding buffer (100 mM Tris pH 8.5, 10 mM CaCl2, 1 mM EDTA, 5 mM dithiothreitol (DTT), and 0.5 M L-arginine). Concentrated refolded protein was purified by size-exclusion chromatography (SEC) on a Superdex 200 16/60 column (GE Healthcare, Pittsburgh, PA) in 20 mM Tris pH 8.5, 200 mM NaCl, and 2 mM CaCl2 (SEC buffer). Two peaks were isolated and each peak was run again separately by SEC before being concentrated for crystallization.

Multi-angle light scattering (MALS)
Approximate molecular mass of PcdhγB3 EC1-4 protein (WT or F86A mutant) was determined using a Superdex S200 10/300 column (GE Healthcare, Pittsburgh, PA) with in-line Wyatt Dawn Heleos II and Optilab T-rex refractive index detectors. Protein (100 µL at 4 mg/mL) was injected and run at 0.4 mL/min in SEC buffer. Signals were aligned, normalized and band-broadened using bovine serum albumin as a standard.

Crystallization, data collection, and structure determination and analysis
Crystals were obtained by vapor diffusion at room temperature in 0.1 M HEPES pH 7, 4% ethylene glycol, and 5% polyethylene glycol monomethyl ether 500 in a 0.3 µL protein (12 mg/mL) to 0.3 µL reservoir drop, then cryoprotected with reservoir with 20% glycerol before flash cooling in liquid N2. Diffraction data (Figure 1-source data 1) were processed in HKL2000 (Otwinowski and Minor, 1997). The PcdhγB3 EC1-4 structure was determined by an iterative molecular replacement search with single domains of the PcdhγA1 EC1-3 structure (PDBID 4zi9) in PHENIX (Adams et al., 2010). Model building was done in Coot (Emsley and Cowtan, 2004) and refinement in PHENIX (Adams et al., 2010). We analyzed the physicochemical properties of the dimer interface using PISA (Krissinel and Henrick, 2007). In the structure, we found a HEPES molecule near the EC2/EC3 interface that forms a salt bridge with N253 and N155 (Figure 1-figure supplement 4). MALS data were collected with Tris as the buffer (Figure 1-figure supplement 1), indicating that HEPES is not required for dimerization.

ICR value calculations
Overall percent identity of the most common residue at each position was used to calculate ICR values, dividing the percent identity across subfamily orthologs by the percent identity across subfamily paralogs. ICR values were then normalized by dividing by the whole sequence ICR average within each subfamily. The alignment and identity data are provided here (Nicoludis et al., 2015).
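The ICR calculation described above can be sketched as follows. This is one reading of the text, assuming two alignments that share column numbering: one of orthologs and one of paralogs from a single subfamily. The function names and the toy sequences are hypothetical, and the handling of gaps and of zero paralog identity is an assumption not stated in the source.

```python
from collections import Counter

def percent_identity(column):
    """Percent of non-gap sequences carrying the most common residue
    in one alignment column (assumption: gaps are ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return 100.0 * most_common_count / len(residues)

def icr_values(ortholog_alignment, paralog_alignment):
    """Per-position ICR: ortholog percent identity divided by paralog
    percent identity. Both alignments must share column numbering."""
    icr = []
    ortholog_columns = zip(*ortholog_alignment)
    paralog_columns = zip(*paralog_alignment)
    for orth_col, para_col in zip(ortholog_columns, paralog_columns):
        para_id = percent_identity(para_col)
        if para_id == 0.0:
            icr.append(float("nan"))  # assumption: undefined for all-gap columns
        else:
            icr.append(percent_identity(orth_col) / para_id)
    return icr

def normalize_by_average(icr):
    """Divide each ICR value by the whole-sequence average (NaNs skipped)."""
    valid = [v for v in icr if v == v]
    mean = sum(valid) / len(valid)
    return [v / mean for v in icr]

# Toy data (hypothetical): three orthologs and three paralogs, three columns.
orthologs = ["AKL", "AKL", "AKV"]
paralogs = ["AKL", "ARI", "AEV"]
raw = icr_values(orthologs, paralogs)
print(raw)
print(normalize_by_average(raw))
```

In the toy data, the second column is fully conserved among orthologs but fully diversified among paralogs, so it receives the highest ICR, which is the signature expected of a specificity-determining position.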
We used the precision of the intra-domain evolutionary couplings to determine whether the interdomain evolutionary couplings are likely to be true. For the clustered Pcdh alignment, 83 non-local (more than five residues apart) contacts fall above a threshold of 80% precision of the intra-domain evolutionary couplings. Intra-domain evolutionary couplings are considered true if they correspond to a structural contact (minimum atom distance < 8 Å) in any structure (Figure 4-figure supplement 1). Based on the same 80% precision threshold, the top 38 non-local ECs are significant in the non-clustered Pcdh alignment. We exclude couplings between residues greater than 400 in this analysis due to the false signal from gaps in this region.
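The thresholding step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: couplings are assumed to arrive sorted by decreasing score, precision is computed cumulatively over the intra-domain couplings only, and the cutoff is taken as the deepest rank at which that precision is still at least 80%. The `Coupling` type, the function name, and the reading of the residue-400 exclusion as "both residues above 400" are hypothetical choices.

```python
from typing import List, NamedTuple, Optional

class Coupling(NamedTuple):
    i: int                          # first residue number
    j: int                          # second residue number
    same_domain: bool               # True for intra-domain pairs
    min_distance: Optional[float]   # smallest atom distance (Å) in any
                                    # structure; None if not measured

def significant_couplings(ranked: List[Coupling],
                          precision_cutoff: float = 0.8,
                          contact_cutoff: float = 8.0,
                          min_separation: int = 5,
                          max_residue: int = 400) -> List[Coupling]:
    """Return the top-ranked non-local couplings down to the last rank at
    which cumulative intra-domain precision is >= precision_cutoff."""
    kept = []
    intra_total = 0
    intra_true = 0
    n_significant = 0
    for c in ranked:
        if abs(c.i - c.j) <= min_separation:
            continue                 # local pair: not considered
        if min(c.i, c.j) > max_residue:
            continue                 # assumption: both residues beyond 400
        kept.append(c)
        if c.same_domain:
            intra_total += 1
            if c.min_distance is not None and c.min_distance < contact_cutoff:
                intra_true += 1      # intra-domain coupling is a true contact
        if intra_total and intra_true / intra_total >= precision_cutoff:
            n_significant = len(kept)
    return kept[:n_significant]

# Toy ranked list (hypothetical values), highest score first.
toy = [
    Coupling(10, 50, True, 4.0),
    Coupling(12, 13, True, 3.0),      # local, skipped
    Coupling(20, 300, False, None),   # inter-domain candidate
    Coupling(30, 70, True, 12.0),     # intra-domain, not a contact
    Coupling(40, 90, True, 5.0),
    Coupling(402, 410, False, None),  # excluded region
]
print(significant_couplings(toy))
```

Inter-domain couplings that rank above the cutoff are then the candidates for intersubunit contacts, since their intra-domain neighbors in the ranking are mostly verified by structure.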