Intrinsically Disordered Flanking Regions Increase the Affinity of a Transcriptional Coactivator Interaction across Vertebrates

Interactions between two proteins are often mediated by a disordered region in one protein binding to a groove in a folded interaction domain in the other one. While the main determinants of a certain interaction are typically found within a well-defined binding interface involving the groove, recent studies show that nonspecific contacts by flanking regions may increase the affinity. One example is the coupled binding and folding underlying the interaction between the two transcriptional coactivators NCOA3 (ACTR) and CBP, where the flanking regions of an intrinsically disordered region in human NCOA3 increase the affinity for CBP. However, it is not clear whether this flanking region-mediated effect is a peculiarity of this single protein interaction or if it is of functional relevance in a broader context. To further assess the role of flanking regions in the interaction between NCOA3 and CBP, we analyzed the interaction across orthologs and paralogs (NCOA1, 2, and 3) in human, zebra fish, and ghost shark. We found that flanking regions increased the affinity 2- to 9-fold in the six interactions tested. Conservation of the amino acid sequence is a strong indicator of function. Analogously, the observed conservation of increased affinity provided by flanking regions, accompanied by moderate sequence conservation, suggests that flanking regions may be under selection to promote the affinity between NCOA transcriptional coregulators and CBP.


■ INTRODUCTION
The major determinants for a specific protein−protein interaction are found in the binding interface between the two proteins, as shown in numerous structural studies in combination with mutagenesis and binding assays. Presently, the field of systems biology grapples with how to integrate data on individual interactions in the context of the living cell. In addition to all specific interactions, i.e., those under natural selection for fitness, it is clear that nonspecific interactions between proteins and quinary interactions 1,2 affect the stability and function of proteins. Moreover, apparently nonspecific interactions within a protein complex add another layer of complexity. −8 We have recently investigated the role of flanking regions in the interaction between two transcriptional coactivators, CREB-binding protein (CBP) and nuclear receptor coactivator 3 (NCOA3, also called ACTR). 9 The interaction domains are the molten globule-like nuclear coactivator binding domain (NCBD) 10−13 of CBP and the highly disordered CBP-interacting domain (CID) of NCOA3, which interact in a coupled binding and folding reaction, resulting in a well-ordered complex. 14,15 The flanking disordered regions of the human CID domain from NCOA3 increase the affinity 3-fold for NCBD, likely via nonspecific hydrophobic or polar interactions. 9 This may seem a minor contribution in terms of free energy of binding (<1 kcal mol⁻¹), but the effect can be substantial if present in larger protein complexes such that the combined effect of several interactions provides an overall significant affinity. The regions involved in forming the complex interface are well conserved, for both NCBD and CID, but more sequence changes have occurred outside of the binding regions (Figure 1).
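To put such fold changes in energetic terms, the standard relation ΔΔG = RT ln(fold change) can be evaluated directly. The following is a minimal Python sketch and not part of the original analysis; the function name `fold_change_to_ddg` and the two temperatures (25 °C, and the 4 °C used for the kinetic experiments) are illustrative assumptions.

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def fold_change_to_ddg(fold_change: float, temp_kelvin: float = 298.15) -> float:
    """Free energy difference (kcal/mol) corresponding to a fold change in Kd.

    Hypothetical helper for illustration: ddG = R * T * ln(fold change).
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_kelvin * math.log(fold_change)


if __name__ == "__main__":
    # Fold changes of the size discussed in the text, at 4 degC and 25 degC.
    for fold in (2, 3, 9):
        for temp in (277.15, 298.15):
            ddg = fold_change_to_ddg(fold, temp)
            print(f"{fold}-fold at {temp:.2f} K: {ddg:.2f} kcal/mol")
```

Under these assumptions, a 3-fold change in affinity corresponds to roughly 0.6 to 0.7 kcal/mol, in line with the <1 kcal/mol stated above, and a 9-fold change to about 1.2 to 1.3 kcal/mol.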
While a high substitution rate is expected for intrinsically disordered linker regions without a dedicated function, 18 nonconservation would question whether the influence of flanking regions on binding affinity is of any functional significance. To address this question, we measured the effect of flanking regions on the affinity for several NCBD/CID complexes. We selected CIDs from orthologs in different species but also from the two other paralogs present in vertebrates, NCOA1 (Src1) and NCOA2 (Tif2), respectively. Our data show that the affinity in complexes from three jawed vertebrate species separated by 420−450 million years of evolution is consistently increased by the intrinsically disordered regions that flank the CID region defined by the complex. Our data corroborate the notion of interacting flanking regions as a general way to modulate affinity in protein interactions, despite less stringent constraints on the amino acid sequence compared to the binding interface.

■ RESULTS AND DISCUSSION
Phylogenetic Analysis of Flanking Regions around the CID Domain. We could previously identify the CID domain only in deuterostome animals (vertebrates, echinoderms, hemichordates) but not in protostomes (arthropods, nematodes, annelids, molluscs). 19 Thus, we concluded that the CID domain and its interaction with CBP emerged in an ancestral deuterostome. In the present work, we therefore collected NCOA sequences from different deuterostome animals in the Uniprot and NCBI databases. Based on our study on flanking regions for CID in human NCOA3, 9 we investigated 39 amino acid residues on either side of the "core" CID domain (Figure 1). The sequence alignment shows, as expected, that the flanking regions are less conserved than the core CID domain, which is defined by the binding interface with NCBD in NMR structures of the complex. 14,15,17 Conservation of disordered flanking regions is not straightforward to define quantitatively in terms of identity because of multiple insertions and deletions. However, there are conserved features in the flanking regions, for example, a D−D/E−Φ−Φ motif at the end of the N-terminal flanking region. (This motif could serve as a binding partner for another, unidentified protein domain.) Furthermore, both the N- and C-terminal flanking regions from jawed vertebrates have calculated isoelectric points between 3.39 and 4.66 because they contain more Asp and Glu than Lys and Arg residues (Supporting Information Text File 1). These conserved features suggest that the flanking regions may play a role beyond acting as linkers between functional domains.
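As a rough illustration of how an isoelectric point can be estimated from sequence composition, the sketch below finds the pH at which the Henderson-Hasselbalch net charge is zero. It is not the procedure behind Supporting Information Text File 1 (the tool used is not specified in this text); the pKa values, the function names, and the two example peptides are assumptions chosen for illustration, and published pKa scales differ enough to shift the result by several tenths of a pH unit.

```python
# Illustrative pKa values (an assumed scale; other published scales differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    # Free N-terminus (positive when protonated) and C-terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH in [0, 14] where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    # Hypothetical peptides, not sequences from the study.
    for name, seq in (("acidic", "DDEEDDEEAAGG"), ("basic", "KKKKRRAAGG")):
        print(f"{name}: pI ~ {isoelectric_point(seq):.2f}")
```

The bisection relies on the net charge decreasing monotonically with pH, which holds for this charge model. A peptide with an excess of Asp and Glu over Lys and Arg comes out with a low pI, which is the qualitative behavior reported for the flanking regions.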
Two whole genome duplications occurred in an early vertebrate around 450 million years ago, 20 resulting in paralogs of many genes that are conserved in all present-day jawed vertebrates (gnathostomes). While the relationship between the three NCOA paralogs from jawed vertebrates is clear, the phylogeny of the nonjawed vertebrate Petromyzon marinus is not. All jawed vertebrates contain three paralogs, NCOA1, NCOA2, and NCOA3. Sequence-based phylogeny supports a scenario where the gene encoding NCOA1 diverged from the ancestral NCOA2/3 gene in the first whole genome duplication and NCOA2 and NCOA3 diverged in the second genome duplication. However, for P. marinus, when taking the full-length NCOA sequences into account, the two paralogs, here denoted NCOAa and NCOAb, do not clearly group with specific NCOA paralogs from the jawed vertebrates (Figure S1). It is not clear whether the nonjawed vertebrates diverged before, during, or after the two whole genome duplications in the jawed vertebrate lineages. 20,21 Thus, NCOAa and NCOAb may have originated in the first genome duplication and experienced extensive sequence divergence relative to NCOA1 and NCOA2/3, or be the result of a separate gene duplication occurring after the split between jawed and nonjawed vertebrates.
Experimental Interaction Studies between NCBD and CID. For binding experiments, we designed expression constructs for CID and NCBD domains from five animals, based on phylogeny and previous experiments: 19,22 Strongylocentrotus purpuratus (purple sea urchin, an echinoderm, see note in the Materials section), P. marinus (sea lamprey, a jawless vertebrate), Callorhinchus milii (Australian ghost shark, a cartilaginous fish), Danio rerio (zebra fish, a bony fish), and Homo sapiens, representing tetrapods (Figures 1 and S1). We have previously expressed and purified the short version of CID from these animals and from the human paralogs, 19,22 but obtaining CID with flanking regions proved very challenging. Thus, while we initially aimed for four different expression constructs from each NCBD/CID complex consistent with our previous study 9 (the longest N-CID-C with both flanking regions, the N-terminal flanking region N-CID, the C-terminal CID-C, and the minimal region CID), we had to resort to comparing only the longest N-CID-C with CID.
We were able to express a long version (N-CID-C) and the minimal region (CID) from seven NCOAs: H. sapiens NCOA1, H. sapiens NCOA3, D. rerio NCOA1, D. rerio NCOA2, D. rerio NCOA3, C. milii NCOA3, and S. purpuratus NCOA. Except for S. purpuratus NCBD/CID, affinities were determined with stopped-flow spectroscopy using a Trp variant of NCBD from the respective species, as previously described 9 (Figures 2 and S2).
The independent determination of kon and koff by stopped-flow spectroscopy gives both high accuracy and precision to the data, which is important when comparing relatively small differences in the Kd value. For all of the vertebrate complexes, the affinity was increased by the presence of flanking regions, usually by 2−3-fold. However, for H. sapiens NCOA1 CID, the presence of flanking regions increased the affinity as much as 9-fold (Figures 2 and 3 and the Supporting Information Excel File). For the low-affinity nonvertebrate S. purpuratus NCBD/CID complex, we used isothermal titration calorimetry (ITC) to estimate the affinity. In this case, we did not observe a change in affinity from the flanking regions within the error of the ITC experiment, Kd = 8 and 5 μM with and without flanking regions, respectively (Figure 2). Furthermore, as we showed in a previous study, the stoichiometry of the S. purpuratus NCBD/CID interaction appears to be NCBD:CID 1:2, which complicates the analysis. 22 We note that the sequence composition of the flanking regions in S. purpuratus CID is very different from that of the chordates, with multiple Gln residues and only one Glu in the N-terminal regions and one Asp in the C-terminal regions, resulting in higher calculated isoelectric points (Supporting Information Text File 1).
Flanking Regions Remain Intrinsically Disordered in the Complex. The regions of CID that are in direct contact with NCBD in the complex fold to α helices upon binding (Figure 1). 15,17 We performed circular dichroism (CD) experiments to estimate formation of helix upon binding, for the core CID region and for N-CID-C, for four complexes (Figure 4). Difference spectra between bound and free CID (or N-CID-C) show the increase in the CD signal associated with binding. Furthermore, the very similar changes for CID and N-CID-C suggest that it is only the core CID region that folds into helices and that the flanking regions remain intrinsically disordered. These results are consistent with previous data for human NCOA3 CID and NCBD across a range of ionic strength 9 and with ColabFold prediction (Figure S3).

■ CONCLUSIONS
−27 Emerging quantitative data suggest a role of disordered flanking regions, which could make multiple transient interactions with a folded interaction partner to either increase or decrease affinity. 3,4,9 Phylogenetic methods are powerful in pinpointing evolutionarily conserved regions in proteins. If these regions are involved in a protein−protein interaction, then the conserved residues are likely important for affinity and specificity. Intrinsically disordered regions in proteins are usually less conserved than ordered ones, 18 although CID is an example of a very conserved disordered region. Obviously, the reason is that CID is directly involved in a binding interface with the NCBD domain of CBP and p300 and is therefore under strong selection to maintain the affinity of the complex. While the N- and C-terminal flanking regions of CID are less conserved than the "core" CID region in terms of sequence identity (Figure 1), we here show that they contribute to increasing the binding affinity in three different jawed vertebrate species (a shark, a bony fish, and a mammal), which diverged between 420 and 450 million years ago. But how is this apparently evolutionarily conserved trait achieved? It is conceivable that the nonspecific interactions contributed by the flanking regions are less dependent on a specific amino acid sequence compared to those in the binding interface and more on sequence composition. For example, in the present case, there is a conserved motif including negatively charged residues in the N-terminal flanking region and additional relatively well conserved negative net charges in both the N- and C-terminal regions, as well as a Trp residue in the N-terminal region. Our study on human NCOA3/NCBD demonstrated a lack of ionic strength dependence, suggesting that charge−charge interactions are not involved in increasing the affinity. 9 Thus, extrapolating to our present data, the flanking regions may be under purifying selection to maintain a certain degree of favorable polar or nonpolar nonspecific interactions, where the structural flexibility allows many combinations of residues. This is conceptually similar to the "conformational buffering" proposed based on experiments with adenovirus E1A and host Rb protein, where overall properties rather than exact sequence are conserved. 28 It could be argued that the observed effects on affinity are coincidental and of no functional importance. Because of the huge sequence space of even short disordered flanking regions, this objection is hard to refute since there will always be sequences that either promote or reduce affinity in a given context. In other words, negative controls are hard to design, and experiments would be difficult to interpret. While our approach, investigating the effect of naturally evolved and related sequences, does not provide direct proof, it corroborates the hypothesis that flanking regions promote interactions. In conclusion, our present data, limited to three paralogs and three species, and with the caveats delineated above, suggest that flanking regions are under selection for increasing the affinity and may therefore contribute functionally to the interaction between the transcriptional coregulator families CBP/p300 and NCOA in jawed vertebrates.

■ MATERIALS AND METHODS
Bioinformatics. Protein sequences were downloaded from Uniprot or NCBI (Supporting Information Text Files 2 and 3). Sequence alignment was performed with ClustalO 29 and Muscle. 30 Overall, the regions corresponding to NCBD and CID are well conserved among animals. 19 Sequences representing different branches of the deuterostome animal tree were selected for experiments. Prediction of complex structures was done with ColabFold, 16 which builds on AlphaFold2. 31 The sequence for S. purpuratus NCOA (W4YZZ7) was withdrawn from Uniprot at a late stage of the project and is now presented in UniParc (UPI000222AEB3). It is also present in NCBI and annotated as neurogenic protein mastermind (XP_030830181.1). A homologous sequence from the related L. variegatus (green sea urchin) is present in NCBI and annotated as NCOA2-like. Based on comparisons between these and the vertebrate NCOAs (Supporting Information Text File 3), we decided to keep the data for S. purpuratus in this paper.
Expression and Purification. Expression constructs were ordered from Genscript. Each plasmid encoded a 6×His-tagged lipo domain, followed by a thrombin cleavage site and the protein of interest (a CID or NCBD variant). The Uniprot ID for each sequence is shown in Supporting Information Text File 2. Gly-Ser remains at the N-terminus after thrombin cleavage. N-CID-C and CID from D. rerio NCOA2 were expressed with a PreScission site to improve yield. Here, Gly-Pro-Gly-Ser remains after cleavage. The first four residues in NCBD from C. milii were truncated during expression and purification, as shown by MALDI-TOF mass spectrometry, and this truncated NCBD was used in the experiments. For kinetic studies, a Trp was introduced at the position corresponding to Tyr2108 in human NCBD. 32 The expressed sequences are compiled in Supporting Information Text File 2. Expression and purification of CID and NCBD variants have been previously described in detail. 9
Biophysical Experiments. All experiments to assess the secondary structure and determine affinity were performed in 20 mM sodium phosphate (pH 7.40, 150 mM NaCl). Far-UV circular dichroism spectra were recorded in a Jasco J-1500 spectropolarimeter with a 1 mm quartz cuvette (Figure 4). Kinetic experiments were performed in an instrument from Applied Photophysics at a low temperature (4 °C), which reduces the observed rate constants and thereby facilitates the measurements. The details of the kinetic experiments and analysis of data were recently published. 9 To obtain the observed rate constant kobs (Figures 2 and S2), kinetic transients were fitted to either a single exponential function or, in the case of human and C. milii NCOA3, a double exponential to account for a slow kinetic phase in displacement experiments likely associated with equilibration of two alternative complexes following initial binding. 33 The high kobs values were used to calculate Kd. Isothermal titration calorimetry (ITC) experiments (Figure 2) were performed at 25 °C in a MicroCal iTC200 system (Malvern) as described in figure legends and in Karlsson et al. 22

■ ASSOCIATED CONTENT
Supporting Information. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.biochem.3c00285.
Phylogenetic gene tree and cladogram for the NCOA sequences; plots of all observed rate constants for CID/NCBD interactions; and confidence of ColabFold prediction of the NCOA3 N-CID-C/NCBD complex (PDF)
Rate constants and calculations (XLSX)
pI values of N- and C-terminal flanking regions (TXT)
Protein sequences and calculated isoelectric points (TXT)
Full-length sequences (TXT)

Figure 1. Sequence alignment of CID domains with flanking regions and structural models of the NCBD/CID complex. The sequence identity within the CID region that forms the interface with NCBD is higher than that in the flanking regions. The complex between human NCBD and NCOA3 CID with flanking regions (N, dark blue; C, red) (residues 1006−1125) was predicted by ColabFold. 16 The predicted complex is compared with a complex solved by NMR (Protein Data Bank ID 6ES7) 17 with the short NCOA3 CID (residues 1045−1084) and a slightly shorter NCBD construct than that used in the present paper, corresponding to the conserved region, which binds the CID domain (residues 2062−2109). There are differences in the conserved regions, most notably in the C-terminal helix of NCOA3 CID. This may be due to both uncertainty in the prediction (lDDT ∼70−80 in the helical regions) and an inherent flexibility in NCBD/CID complexes. 17 In either case, the N- and C-terminal flanking regions are predicted as intrinsically disordered (lDDT ∼40).

Figure 2. Determination of affinity using stopped-flow spectroscopy and isothermal titration calorimetry. (A) Examples of observed rate constants (kobs) from binding experiments between NCBD and CID from H. sapiens and D. rerio. kobs values were plotted versus CID concentration, and the slope at high [NCBD] corresponds to the association rate constant kon. (B) The dissociation rate constant was measured in a separate displacement experiment where the dissociation of the NCBD/CID complex was induced by an excess of wild-type NCBD domain. The observed rate constant is a good approximation of the dissociation rate constant, and koff was calculated as the average of the three experiments shown. The equilibrium constant Kd was calculated as the ratio of koff and kon. Kinetic data from all experiments are shown in Figure S2 and the Supporting Information Excel File. (D) Isothermal titration calorimetry was used to determine Kd for the low-affinity interaction between NCBD and CID from S. purpuratus. (E) The difference in affinity between long (N-CID-C) and short (CID) variants is shown as fold difference. The blue vertical line is at fold difference = 1, i.e., corresponding to identical affinity.
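The analysis described in this legend can be sketched in a few lines: in the linear regime, kobs increases with the concentration of the varied species, the slope gives kon, and Kd follows as koff/kon. The Python sketch below uses an ordinary least-squares line; the data points, the rate constants, and the function names are hypothetical and are not taken from the Supporting Information Excel File, and the published analysis may fit a different equation.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant as the ratio k_off / k_on."""
    return k_off / k_on


# Hypothetical binding data: concentrations in uM and k_obs in s^-1,
# generated from k_on = 20 uM^-1 s^-1 and k_off = 4 s^-1 without noise.
conc_um = [2.0, 4.0, 6.0, 8.0, 10.0]
k_obs = [20.0 * c + 4.0 for c in conc_um]

# The slope of k_obs versus concentration estimates k_on.
k_on, intercept = linear_fit(conc_um, k_obs)

# k_off taken as the mean of three hypothetical displacement experiments.
k_off = mean([3.9, 4.0, 4.1])

kd_um = kd_from_rates(k_on, k_off)
print(f"k_on = {k_on:.1f} uM^-1 s^-1, k_off = {k_off:.1f} s^-1, Kd = {kd_um:.2f} uM")
```

Taking koff from a separate displacement experiment, rather than from the intercept of the binding plot, avoids relying on an extrapolation to zero concentration, which is typically poorly determined when koff is small relative to kobs.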

Figure 3. Affinities mapped on a phylogenetic tree. A simplified phylogeny with affinities of NCBD/CID complexes from the present and previous work (indicated by footnotes: 1, Karlsson et al. 22; 2, Hultqvist et al. 19). Kd values derived from kinetic experiments have high precision, and the propagated errors from kon and koff are usually low (below 10%, see the Supporting Information Excel File). The fold difference between Kd(CID) and Kd(N-CID-C) for any particular pair is very accurate since the same NCBD solutions were used in stopped-flow experiments run back-to-back.
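The error propagation mentioned in this legend can be illustrated with the standard first-order formula for a ratio, assuming uncorrelated errors in the numerator and denominator. The Python sketch below uses hypothetical rate constants and a hypothetical helper name, `ratio_with_error`; it does not reproduce the calculations in the Supporting Information Excel File.

```python
import math


def ratio_with_error(numerator, num_err, denominator, den_err):
    """Return numerator/denominator and its propagated standard error.

    First-order propagation for uncorrelated errors: relative errors add
    in quadrature.
    """
    value = numerator / denominator
    rel_err = math.sqrt((num_err / numerator) ** 2 + (den_err / denominator) ** 2)
    return value, value * rel_err


# Hypothetical rate constants (value, standard error).
# Short CID construct: k_off in s^-1, k_on in uM^-1 s^-1.
kd_short, kd_short_err = ratio_with_error(6.0, 0.3, 20.0, 0.6)
# Long N-CID-C construct.
kd_long, kd_long_err = ratio_with_error(2.4, 0.12, 24.0, 0.72)

# Fold difference in affinity between the short and long constructs.
fold, fold_err = ratio_with_error(kd_short, kd_short_err, kd_long, kd_long_err)

print(f"Kd(CID) = {kd_short:.3f} +/- {kd_short_err:.3f} uM")
print(f"Kd(N-CID-C) = {kd_long:.3f} +/- {kd_long_err:.3f} uM")
print(f"fold difference = {fold:.2f} +/- {fold_err:.2f}")
```

Because the paired measurements shared the same NCBD solutions, concentration errors are correlated and should partly cancel in the fold difference, so an estimate that treats all errors as uncorrelated is likely conservative.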

Figure 4. Difference spectra for CID/NCBD complexes. Difference spectra for vertebrate NCBD/CID complexes. Difference spectra are shown for both the long N-CID-C and the short CID constructs. The similar difference spectra between N-CID-C and CID suggest that the flanking regions do not fold into a particular secondary structure upon binding to NCBD.