The ChiS family DNA-binding domain contains a cryptic helix-turn-helix variant

Sequence-specific DNA-binding domains (DBDs) are conserved in all domains of life. Proteins containing these domains carry out a variety of cellular functions, and a number of distinct structural domains that allow for sequence-specific DNA binding have already been described, including the ubiquitous helix-turn-helix (HTH) domain. In the facultative pathogen Vibrio cholerae, the chitin sensor ChiS is a transcriptional regulator that is critical for the survival of this organism in its marine reservoir. We have recently shown that ChiS contains a cryptic DBD in its C-terminus. This domain is not homologous to any known DBD, but it is a conserved domain present in other bacterial proteins. Here, we present the crystal structure of the ChiS DBD at a resolution of 1.28 Å. We find that the ChiS DBD contains an HTH domain that is structurally similar to those found in other DNA-binding proteins, like the LacI repressor. However, one striking difference observed in the ChiS DBD is that the canonical tight “turn” of the HTH is replaced with an extended loop containing a β-sheet, a variant which we term the “helix-sheet-helix”. Through systematic mutagenesis of all positively charged residues within the ChiS DBD, we show that residues within and proximal to the ChiS helix-sheet-helix are critical for DNA binding. Finally, through phylogenetic analyses we show that the ChiS DBD is found in diverse Proteobacterial proteins that exhibit distinct domain architectures. Together, these results suggest that the structure described here represents the prototypical member of the ChiS family of DBDs.

Importance
Regulating gene expression is essential in all domains of life. This process is commonly facilitated by the activity of DNA-binding transcription factors. There are diverse structural domains that allow proteins to bind to specific DNA sequences. The structural basis underlying how some proteins bind to DNA, however, remains unclear. Previously, we showed that in the major human pathogen Vibrio cholerae, the transcription factor ChiS directly regulates gene expression through a cryptic DNA-binding domain. This domain lacked homology to any known DNA-binding protein. In the current study, we determined the structure of the ChiS DNA-binding domain (DBD) and found that the ChiS-family DBD is a cryptic variant of the ubiquitous helix-turn-helix (HTH) domain. We further demonstrate that this domain is conserved in diverse proteins that may represent a novel group of transcriptional regulators.


Introduction
The intestinal pathogen Vibrio cholerae natively resides in the aquatic environment and can cause disease if ingested in contaminated food or drinking water. In the aquatic environment, V. cholerae commonly associates with the chitinous surfaces of crustacean zooplankton (1). Chitin is an abundant source of carbon and nitrogen for marine bacteria, including V. cholerae (2, 3). In addition, chitin serves as a cue to induce horizontal gene transfer by natural transformation in this species (4). Thus, Vibrio-chitin interactions are critical for this facultative pathogen to thrive and evolve in its environmental reservoir.

Chitin is sensed in V. cholerae by the hybrid histidine kinase ChiS (5-7). In response to chitin, ChiS activates the expression of the chitin utilization program. This regulon includes the chb operon, which is required for the uptake and degradation of the chitin disaccharide chitobiose. In a recent study, we showed that unlike most histidine kinases, ChiS is capable of directly binding to DNA to regulate the expression of the chb operon (5). This finding was particularly surprising because ChiS is not predicted to contain a DNA-binding domain via primary sequence homology (BLAST (8)) or structural predictions (Phyre2 (9)). In the current study, we sought to understand the structural basis for ChiS DNA binding. To that end, we determined the structure of the ChiS DBD and found that it contains a distinct variant of the canonical helix-turn-helix domain, which we term a "helix-sheet-helix".

Results and Discussion
The C-terminus of ChiS (ChiS 1024-1129) is sufficient to bind Pchb
Previous work from our group demonstrated that ChiS is a noncanonical hybrid histidine kinase that contains a DBD at its C-terminus (Fig. 1A) (5). In that study, we found that the C-terminal 106 amino acids of ChiS (ChiS 1024-1129) were necessary and sufficient to bind to the chb promoter in vivo. We further showed that ChiS binds directly to two binding sites within the chb operon promoter (Pchb) to activate the expression of this locus. To confirm that ChiS 1024-1129 was sufficient to bind DNA, we purified this domain and tested DNA-binding activity in vitro by electrophoretic mobility shift assays (EMSAs). We found that ChiS 1024-1129 bound to a wildtype Pchb promoter probe, but not to a probe in which the two ChiS binding sites were mutated, suggesting that this domain is sufficient to bind to DNA in a sequence-specific manner (Fig. 1B and Fig. S1). Thus, based on our in vivo and in vitro analysis, we refer to ChiS 1024-1129 as the ChiS DBD.

Identification of positively charged residues in the ChiS DBD that are critical for DNA binding and transcriptional activation of Pchb
As mentioned above, ChiS is not predicted to contain a DNA-binding domain based on in silico searches (i.e., BLAST (8) and Phyre2 (9)). To characterize interactions between the ChiS DBD and DNA, we first tried to identify residues important for DNA binding. The positively charged residues arginine (R) and lysine (K) commonly interact with the negatively charged DNA backbone (10). Thus, we mutated every R and K residue in the ChiS DBD to a glutamine (Q), to ablate their charge but maintain, to a reasonable extent, the steric properties of the side chain.

To determine how these mutations affected ChiS activity, we introduced them into full-length FLAG-tagged ChiS (5), and assessed the ability of each mutant to bind to DNA in vivo (by chromatin immunoprecipitation, or ChIP) and to activate Pchb expression (using a Pchb-GFP reporter). We found that all mutations to the ChiS DBD reduced Pchb-GFP activation to varying degrees (Fig. 2). Most mutants were able to facilitate partial activation of Pchb and correspondingly partially enriched for Pchb by ChIP, indicating that they were binding to the promoter in vivo. Some mutants (R1068Q, R1074Q, K1078Q, R1090Q, and R1092Q) did not bind to Pchb DNA in vivo and resulted in complete loss of Pchb expression. Importantly, all mutants still produced ChiS protein as assessed by Western blot analysis (Fig. S2). Collectively, these data identify a subset of positively charged residues in the ChiS DBD that are critical for DNA binding and subsequent transcriptional activation of the chb operon.
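
The charge-to-glutamine scan described above can be enumerated programmatically. The following is a minimal sketch, not the procedure used in this study: the function name and the toy peptide are hypothetical, and the toy peptide is not the ChiS sequence. It only illustrates how each R or K in a domain maps to a substitution name such as R1068Q once the numbering of the first residue is known.

```python
def charge_to_gln_mutations(sequence, first_residue_number):
    """Name an X->Q substitution for every Arg (R) or Lys (K) in `sequence`.

    `first_residue_number` is the full-length protein numbering of the first
    residue in `sequence` (1024 for a domain spanning residues 1024-1129).
    """
    mutations = []
    for offset, residue in enumerate(sequence.upper()):
        if residue in ("R", "K"):
            position = first_residue_number + offset
            mutations.append(f"{residue}{position}Q")
    return mutations


# Hypothetical 10-residue fragment used only for illustration;
# it is NOT the real ChiS DBD sequence.
toy_fragment = "MARKELLRQA"
print(charge_to_gln_mutations(toy_fragment, 1024))
```

Each returned name would correspond to one point mutant to be built and tested by ChIP and reporter assays.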

Structure of the ChiS DNA binding domain reveals a variant of the helix-turn-helix
We next sought to determine the structure of the ChiS DBD to further explore how ChiS interacts with DNA. Since no structures for close sequence homologs were available in the Protein Data Bank (PDB) to serve as search models for molecular replacement, we used the Single-wavelength Anomalous Dispersion (SAD) technique to determine initial phases. Selenomethionine was used as the replacement for methionine. Anomalous data were collected from a single crystal (Tables S1 and S2). The crystal diffracted to 1.28 Å resolution and belonged to the orthorhombic C222₁ space group with unit cell parameters of a=51.91 Å, b=78.61 Å, c=72.37 Å, α=β=γ=90.00°. There was one polypeptide chain in the asymmetric unit. The structure includes 105 out of 106 residues of the protein (1024-1128), two uncleavable residues of the purification tag, four sulfate ions (SO₄²⁻), one 2-(2-hydroxyethyloxy)ethanol molecule (PEG), two formic acid molecules (FMT) and 200 water molecules (HOH). Only the C-terminal E1129 was disordered in the structure and was not included in the final model.
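
As a worked check on the reported cell parameters, the unit cell volume can be computed from the edge lengths and angles. This is a minimal sketch using the general triclinic volume formula, which reduces to a·b·c when all three angles are 90°. The function name is an illustrative choice and is not taken from any crystallography package.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms from edges (angstroms) and angles (degrees)."""
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    # General (triclinic) expression; equals a*b*c for a cell with all 90-degree angles.
    factor = math.sqrt(
        1.0 - cos_a**2 - cos_b**2 - cos_g**2 + 2.0 * cos_a * cos_b * cos_g
    )
    return a * b * c * factor


# Cell parameters reported in the text above.
volume = unit_cell_volume(51.91, 78.61, 72.37)
print(f"Unit cell volume: {volume:.0f} cubic angstroms")
```

For the parameters above, this gives a volume of roughly 2.95 × 10^5 Å^3.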

The structure of the ChiS DBD revealed that it contains a fold that is reminiscent of the canonical helix-turn-helix (HTH) used by diverse DNA-binding proteins (Fig. 3A-B). The basic HTH domain consists of a trihelical bundle where the second and third helices encompass the namesake "helix-turn-helix" (11). The two helices that compose the HTH are connected via a relatively short linker that forms a sharp turn, which is a characteristic feature of this domain. Helix 3 from the HTH is generally inserted into the major groove of DNA, thus forming the principal DNA-protein interface. Alignment of the trihelical bundle from ChiS with the DNA-bound structure of the LacI repressor (PDB: 1EFA (12); RMSD of modeled Cα atoms = 3.514 Å) revealed a similar spatial arrangement for each helix (Fig. 3C). Notably, however, the ChiS HTH has an extended loop between helix 2 and helix 3 that forms a beta sheet (Fig. 3B-D). Structural insertion between these helices is not typical; thus, the sheet found here is a distinct variant of the HTH, which we refer to as a "helix-sheet-helix".

Alignment of the ChiS DBD to LacI also revealed that the sheet within the ChiS helix-sheet-helix domain runs along the major groove (Fig. 3C, 4A), though it sterically conflicts with the DNA bases. This may suggest that the ChiS DBD takes on a slightly different conformation when bound to DNA. Consistent with this idea, the beta sheet has the highest B-factor (a measure of structural motion) in the ChiS DBD structure, indicating that it is relatively flexible (Fig. 3E). Thus, we speculate that this beta sheet is stabilized in the major groove when the ChiS DBD is bound to DNA. The unique helix-sheet-helix feature of the ChiS C-terminal domain may also explain why it was not previously identified as a DBD by structure prediction algorithms like Phyre2.
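
The RMSD quoted for the ChiS-LacI alignment summarizes how far paired Cα atoms lie from one another after superposition. The sketch below shows only that final arithmetic, under the assumption that the two coordinate sets have already been superposed and paired residue by residue. The superposition step itself is omitted, the coordinates are made up for illustration, and this is not the software used to produce the reported value.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two sets are already superposed and that point i in `coords_a`
    is paired with point i in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in angstroms: every paired atom is displaced by 3 along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(toy_a, toy_b))
```

Because the value is an average over all paired atoms, a locally divergent element such as the inserted sheet can coexist with well-matched helices in the same alignment.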

ChiS may bind to intrinsically bent DNA
Above, we identified five residues (R1068, R1074, K1078, R1090, and R1092) that were critical for the ChiS DBD to bind to DNA. Mapping these residues onto the ChiS DBD structure revealed that all five residues were found within the trihelical bundle that forms the helix-sheet-helix (Fig. 4A), which is consistent with this domain playing a critical role in DNA binding. Specifically, these residues were located in the beta sheet of the helix-sheet-helix (R1068), helix 3 (R1074, K1078), and helix 1 (R1090, R1092).

Most residues critical for DNA binding activity (R1068, R1074, K1078, R1090) were in close proximity to DNA on our modeled alignment; however, one residue (R1092) was distant from the DNA (Fig. 4A). Many transcription factors bend DNA upon binding to their target site (13,14). Thus, one possible explanation for the critical role of R1092 is that the Pchb promoter is bent when bound by ChiS, which would allow R1092 to come into close contact with DNA. To test this idea, we carried out a classic in vitro gel mobility shift assay to test DNA bending (15). This assay operates on the basis that the location of a bend within a DNA molecule alters its mobility during native PAGE analysis (16,17). DNA probes that contain a bend in the middle of the probe exhibit the lowest mobility, while probes with the bend closer to one end show the highest mobility. Thus, we designed seven DNA probes of equal length that gradually shifted the position of the ChiS binding sites within the chb promoter (Fig. 4B and Fig. S1). First, we ran these probes in the absence of ChiS protein and found that they ran at different mobilities, with the probes carrying the ChiS binding sites in the middle exhibiting the lowest mobility (Fig. 4C). This suggested that the chb promoter likely has an intrinsic bend that is centered around the ChiS binding sites. The mobility pattern observed for these DNA probes did not change when incubated with the purified ChiS DBD (Fig. S3), suggesting that binding of the DNA probe by ChiS does not further bend the promoter. We propose that the chb promoter has an intrinsic bend, which may allow residues in the ChiS DBD, like R1092, to directly interact with DNA. The intrinsic bend found in the chb promoter may increase the affinity of ChiS for this region of DNA; indeed, DNA bending has been shown to increase the affinity of certain transcription factors for their DNA binding site (18).
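
The probe design logic of the bending assay can be sketched as a set of equal-length windows that slide across a promoter region, so that a fixed site falls at a different relative position in each probe. All lengths and coordinates below are hypothetical placeholders, not the actual probe coordinates used here (those are given in Fig. 4B and Fig. S1), and the function names are illustrative.

```python
def permuted_probe_windows(region_length, probe_length, n_probes):
    """Return (start, end) windows of equal length, evenly shifted across a region."""
    if n_probes < 2:
        raise ValueError("need at least two probes")
    if probe_length > region_length:
        raise ValueError("probe cannot be longer than the region")
    max_start = region_length - probe_length
    windows = []
    for i in range(n_probes):
        start = round(i * max_start / (n_probes - 1))
        windows.append((start, start + probe_length))
    return windows


def relative_site_position(window, site_center):
    """Fractional position of the site within a probe (0 = left end, 1 = right end)."""
    start, end = window
    return (site_center - start) / (end - start)


# Hypothetical numbers: a 260 bp region, 140 bp probes, site centered at bp 130.
windows = permuted_probe_windows(region_length=260, probe_length=140, n_probes=7)
for window in windows:
    print(window, round(relative_site_position(window, 130), 3))
```

In this toy layout the middle probe places the site at a fractional position of 0.5, which is the probe expected to migrate most slowly if the bend is centered on the site.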

The ChiS family DNA binding domain is associated with variable domain arrangements in diverse proteins
Above, we show that the ChiS DBD represents a cryptic variant of an HTH domain. As noted previously, the ChiS DBD is found in proteins other than homologs of ChiS (5). To more fully catalog proteins that contain this domain, we generated a profile Hidden Markov Model (HMM) of the ChiS DBD and screened for its presence among eubacterial genomes. A profile HMM is a position-specific scoring system that can effectively encode the variation in a training set of representative peptide sequences, and can then find similar sequences in a much larger and more distantly related dataset than tools that do not require training, such as BLAST (19,20).

This analysis revealed that the ChiS DBD is present in diverse Proteobacterial genomes (Spreadsheet S1). The vast majority of hits from our search were direct homologs of ChiS (3242/3829 = 84.7%); however, many proteins exhibited distinct domain architectures (587/3829 = 15.3%) (Fig. 5A). Strikingly, the ChiS DBD was found exclusively at the C-terminus in all of these proteins and was commonly associated with sensory domains (Fig. 5A). Furthermore, the helix-sheet-helix is highly conserved across these diverse proteins (Fig. 5B, Spreadsheet S1), and even the most dissimilar ChiS DBD homolog (MAC43155.

Cloning, protein production and purification
The chiS 1024-1129 (VC0622) construct was cloned into an AmpR pET15b-based vector using the FastCloning method (24). This vector appended a TEV-cleavable 6x His tag onto the N-terminus of ChiS 1024-1129. Vector and inserts were amplified using the primers listed in Table S4. The plasmid was transformed into E. coli BL21(DE3) (Magic) cells (25) and the protein was expressed in M9 media (High Yield M9 Se-Met media, Medicilon Inc.). The starting overnight culture was grown in LB medium supplemented with 130 μg/mL ampicillin and 50 μg/mL kanamycin at 37°C and 220 rpm. The next day, M9 medium supplemented with 200 μg/mL ampicillin and 50 μg/mL kanamycin was inoculated with the overnight culture (1:100 dilution) and incubated at 37°C and 220 rpm. Protein expression was induced at OD600=1.8-2.0 by the addition of 0.5 mM isopropyl β-d-1-thiogalactopyranoside and the culture was further incubated at 25°C, 200 rpm for 14 hours (26). The cells were harvested by centrifugation at 6,000 x g for 10 minutes, resuspended to 0.2 g/mL in lysis buffer (50 mM Tris pH 8.3, 0.5 M NaCl, 10% glycerol, 0.1% IGEPAL CA-630) and frozen at -30°C until purification.

Frozen pellets were thawed and sonicated at 50% amplitude, in a 5 s on, 10 s off cycle for 20 min at 4°C. The lysate was clarified by centrifugation at 18,000 x g for 40 minutes at 4°C and the supernatant was collected. The protein was purified in one step by IMAC followed by size exclusion chromatography using an ÅKTAxpress system (GE Healthcare) as previously described with some modifications (27). The cell extract was loaded onto a His-Trap FF (Ni-NTA) column with loading buffer (10 mM Tris-HCl pH 8.3, 500 mM NaCl, 1 mM Tris(2-carboxyethyl)phosphine (TCEP), 5% glycerol) and the column was washed with 10 column volumes of loading buffer and 10 column volumes of washing buffer (10 mM Tris-HCl pH 8.3, 1 M NaCl, 25 mM imidazole, 5% glycerol). Protein was eluted with elution buffer (10 mM Tris pH 8.3, 500 mM NaCl, 1 M imidazole), loaded onto a Superdex 200 26/600 column, separated in loading buffer, collected, and analyzed by PAGE. The 6x His tag was cleaved with recombinant TEV protease in a ratio of 1:20 (protein:protease) overnight at room temperature. The cleaved protein was separated from uncleaved protein, recombinant TEV protease, and 6x His tag peptide by Ni-NTA-affinity chromatography using loading buffer followed by loading buffer with 25 mM imidazole. The cleaved protein was collected in the flow-through in both the loading buffer and the loading buffer with 25 mM imidazole. Both fractions were analyzed by PAGE for 6x His tag cleavage, concentrated to 6-8 mg/mL, and set up for crystallization.

Crystallization, data collection, structure solution and refinement
The protein from both fractions (collected in flow-through and in 25 mM imidazole) was set up at 6-8 mg/mL in loading buffer containing 0 or 500 mM NaCl as 2 μL crystallization drops (1 μL protein: 1 μL reservoir solution) in 96-well plates (Corning) using commercial Classics II, PACT and JCSG+ (QIAGEN) crystallization screens. A diffraction-quality crystal of the protein collected with 25 mM imidazole, grown in the condition containing 0.2 M lithium sulfate, 0.1 M Bis-Tris, pH 5.5, 25% (w/v) PEG 3350 (Classics II, #74), was flash-frozen in liquid nitrogen for data collection.

The crystals were screened, and data were collected at the Life Sciences-Collaborative Access Team (LS-CAT) beamline F at the Advanced Photon Source (APS) of the Argonne National Laboratory. A total of 300 diffraction images were indexed, integrated and scaled using HKL-3000 (28). The structure was determined with the HKL-3000 structure solution package using anomalous signal from selenomethionine (Se-Met). The initial model went through several rounds of refinement in REFMAC v. 5.8.0258 (29) and manual corrections in Coot (30). The water molecules were generated using ARP/wARP (31) and ligands were added to the model manually during visual inspection in Coot. Translation-Libration-Screw (TLS) groups were created by the TLSMD server (32) and TLS corrections were applied during the final stages of refinement. MolProbity (33) was used for monitoring the quality of the model during refinement and for the final validation of the structure. The structure was deposited in the Protein Data Bank (https://www.rcsb.org/) with the assigned PDB code 7KPO.

Measuring GFP reporter fluorescence
GFP fluorescence was determined essentially as previously described (34). Briefly, single colonies were picked and grown in LB broth at 30°C for 18 hours. Cells were then washed and resuspended to an OD600 of 1.0 in Instant Ocean medium (7 g/L; Aquarium Systems). Then, fluorescence was determined using a BioTek H1M plate reader with excitation set to 500 nm and emission set to 540 nm.

Chromatin immunoprecipitation (ChIP)-qPCR assays
ChIP assays were carried out exactly as previously described (5). Briefly, overnight cultures were diluted to an OD600 of 0.08 and then grown for 6 hours at 30°C. Cultures were crosslinked using 1% paraformaldehyde, then quenched with a 1.2 molar excess of Tris. Cells were washed with PBS and stored at -80°C overnight. The next day, cells were resuspended in lysis buffer (1x FastBreak cell lysis reagent (Promega), 50 μg/mL lysozyme, 1% Triton X-100, 1 mM PMSF, and 1x protease inhibitor cocktail; 100x inhibitor cocktail contained the following: 0.07 mg/mL phosphoramidon (Santa Cruz), 0.006 mg/mL bestatin (MPbiomedicals/Fisher Scientific), 1.67 mg/mL AEBSF (DOT Scientific), 0.07 mg/mL pepstatin A (Gold Bio), 0.07 mg/mL E64 (Gold Bio)) and then lysed by sonication, resulting in a DNA shear size of ~500 bp. Lysates were incubated with Anti-FLAG M2 Magnetic Beads (Sigma), washed to remove unbound proteins, and then bound protein-DNA complexes were eluted off with SDS. Samples were digested with Proteinase K, then crosslinks were reversed. DNA samples were cleaned up and used as template for quantitative PCR (qPCR) using iTaq Universal SYBR Green Supermix (Bio-Rad) and primers specific for the genes indicated (see Table S4 for primers) on a Step-One qPCR system. Standard curves of genomic DNA were included in each experiment and were used to determine the abundance of each amplicon in the input (derived from the lysate prior to ChIP) and output (derived from the samples after ChIP). Primers to amplify rpoB served as a baseline control in this assay because ChiS does not bind this locus. Data are reported as 'Fold Enrichment', which is defined as the ratio of Pchb / rpoB found in the output divided by the same ratio found in the input.
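
The 'Fold Enrichment' definition above translates directly into a ratio of ratios. The sketch below is a minimal illustration of that arithmetic. The function and argument names are hypothetical, and the abundances in the example are invented numbers, not data from this study.

```python
def fold_enrichment(target_output, control_output, target_input, control_input):
    """(target / control) in the ChIP output divided by (target / control) in the input.

    Here the target amplicon would be Pchb and the control amplicon rpoB, with
    abundances taken from the genomic DNA standard curves.
    """
    values = (target_output, control_output, target_input, control_input)
    if any(value <= 0 for value in values):
        raise ValueError("amplicon abundances must be positive")
    output_ratio = target_output / control_output
    input_ratio = target_input / control_input
    return output_ratio / input_ratio


# Invented abundances: the target is 40-fold over control after ChIP,
# while both amplicons are equally abundant in the input.
print(fold_enrichment(target_output=80.0, control_output=2.0,
                      target_input=10.0, control_input=10.0))
```

Normalizing to the input ratio corrects for any difference in starting abundance between the two loci, so a value near 1 indicates no specific enrichment of the target.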