The Helix-Turn-Helix DNA Binding Motif

One of the most striking findings to emerge from the determinations of the structures of Cro, CAP, and λ-repressor (1-3) was that these three sequence-specific DNA-binding proteins had in common a substructure now named the helix-turn-helix motif. The apparent importance of this motif was suggested not only by its close similarity in the respective structures (4,5) but also by amino acid sequence comparisons (6-11), which indicated that a similar motif might occur in other DNA-binding proteins. It was proposed (1,3,12-14) that the helix-turn-helix motif interacts with operator DNA in a manner similar to that shown in Fig. 1 for Cro. Recognition of a specific sequence on DNA was thought to be achieved by a network of hydrogen bonds and other contacts between the side chains of the protein and the parts of the base pairs exposed within the grooves of the DNA. The second helix of the helix-turn-helix motif, which was thought to bind within the major groove of the DNA, was presumed to be especially important in recognition.
The importance of the helix-turn-helix motif in DNA-protein interaction has been supported by a wealth of biochemical and genetic evidence (e.g. see Refs. 15-18) and has now been shown directly by recent structure determinations of a number of complexes of repressor proteins with synthetic operators. These include the DNA-binding domains of the repressors of phage 434 (19,20) and phage λ (21), Cro protein from phage 434 (22), and trp repressor from Escherichia coli (23). In each case the helix-turn-helix motif is an integral part of the protein-DNA interface, with the second α-helix (the "recognition helix") located within the major groove of the DNA (cf. Fig. 1).
The purpose of this report is not to review, in detail, the structures of these repressor-operator complexes. Rather, we will first discuss the use of amino acid sequence comparisons to locate putative helix-turn-helix motifs. We will then review the structural correspondence between known helix-turn-helix units. Finally, we will evaluate the role of the helix-turn-helix in the recognition of sequence-specific binding sites on DNA in light of the recently determined repressor-operator complexes.


Sequence Comparisons
There has been considerable interest in the use of amino acid sequences to identify putative helix-turn-helix motifs in proteins of unknown structure. Predictions that have been confirmed by subsequent structure determination include the helix-turn-helix units of λ-repressor (3,6), trp repressor (9,23,24), 434 Cro protein, and 434 repressor (6,7,19,22,25,26). Prediction of a helix-turn-helix in lac repressor (7,8) has been verified by two-dimensional NMR spectroscopy (27), and predictions for the biotin repressor of E. coli (28) are strongly supported by genetic analysis (29).
These successful predictions give confidence in the methods on which they were based. They also provide a representative set of examples against which new predictions can be assessed.
In comparing amino acid sequences, especially those that have marginal correspondence, it is essential that defined statistical procedures be used. It is also very desirable that the significance of a proposed matching of sequences be assessed by controls carried out with unrelated or jumbled sequences (e.g. see Refs. 8, 30, and 31). We will focus our discussion on the method employed in our laboratory to search for putative helix-turn-helix units, which, as listed above, has proved effective in a number of cases.
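A minimal sketch of the jumbled-sequence control is given below in Python. It assumes the simplest possible similarity measure, a count of mismatched positions against a single reference segment, and asks how often a randomly shuffled copy of the query (same amino acid composition, scrambled order) scores at least as well as the query itself. The function names are hypothetical, and this is not the specific procedure of Refs. 8, 30, or 31.

```python
import random


def mismatches(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def shuffle_control(query: str, reference: str, trials: int = 1000, seed: int = 0) -> float:
    """Estimate how often a shuffled copy of `query` matches `reference`
    at least as well (no more mismatches) as `query` itself does."""
    rng = random.Random(seed)
    observed = mismatches(query, reference)
    residues = list(query)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(residues)  # scramble order, keep amino acid composition
        if mismatches("".join(residues), reference) <= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_good + 1) / (trials + 1)
```

A small returned value means the observed agreement depends on residue order rather than on composition alone; a value near 1 means a jumbled sequence does just as well.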
Fig. 1. Model of Cro protein bound to operator DNA (Ref. 12). For clarity the side chains of the residues within the helix-turn-helix units have been removed, and only the α-carbon atoms are shown as small white spheres. The helix-turn-helix unit on the right of the figure is the more clearly visible. Its first helix extends from right to left; the second helix (the "recognition helix") is seen end-on within the major groove of the DNA. For the remainder of the protein, acidic residues are drawn red, basic residues blue, hydrophilic noncharged residues green, and hydrophobic residues gray (original figure prepared by Dr. D. H. Ohlendorf, University of Oregon and Du Pont).
A new amino acid sequence is compared with a "master set" of prealigned sequences, 22 amino acids in length, taken from proteins known or assumed to contain helix-turn-helix units (9,31). Our "master set" consists of 10 proteins (Fig. 2). All but three of these (P22 Cro, P22 repressor (cII), and λ cII protein) are now confirmed as containing a helix-turn-helix motif (see above). An amino acid sequence of interest is compared with the master set by evaluating, in turn, the correspondence between every possible 22-amino acid segment and the master set of sequences. For largely historical reasons it is convenient to use a scoring system in which the sequences that correspond best with the master set have the lowest scores. The score for each 22-amino acid segment is obtained by counting, position by position, the total number of amino acids in the master set that differ from the segment being tested. This total is normalized by dividing by 220 (i.e. 22 amino acids × 10 sequences in the master set) to give the average amino acid change per codon (AAC). Scores obtained from comparing many 22-amino acid segments with the master set are pooled to provide statistical estimates of the significance of an individual AAC score (9,31).
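The scoring step described above can be sketched as follows. This is an illustration under stated assumptions: the master-set sequences are taken to be ungapped and exactly 22 residues long, and "differ" is read literally as non-identity of aligned residues (the text does not say whether any substitutions are weighted differently). The names `aac_score` and `scan` are hypothetical.

```python
from typing import List, Tuple

WINDOW = 22  # segment length used in the text


def aac_score(segment: str, master_set: List[str]) -> float:
    """Average amino acid change per codon: master-set residues that differ
    from the segment at the same aligned position, divided by
    (segment length x number of master sequences)."""
    if any(len(ref) != len(segment) for ref in master_set):
        raise ValueError("master sequences must be prealigned to the segment length")
    differences = sum(a != b for ref in master_set for a, b in zip(segment, ref))
    return differences / (len(segment) * len(master_set))


def scan(protein: str, master_set: List[str], window: int = WINDOW) -> List[Tuple[int, float]]:
    """Score every window-length segment of `protein`.
    Returns (1-based start residue, AAC) pairs, best (lowest) score first."""
    scores = [
        (start + 1, aac_score(protein[start:start + window], master_set))
        for start in range(len(protein) - window + 1)
    ]
    return sorted(scores, key=lambda item: item[1])
```

With a 10-sequence master set the denominator is 22 × 10 = 220, matching the normalization given in the text.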
Segments with the lowest scores (AAC values) are the most likely candidates for helix-turn-helix motifs. It is not possible to draw a sharp dividing line between proteins that do have a helix-turn-helix and those that probably do not. If, however, a 22-amino acid segment is found with an AAC score of 0.80 or less, it can be regarded as a strong candidate for a helix-turn-helix motif. Such sequences are shown in Fig. 2.¹
This procedure successfully predicts all known examples of helix-turn-helix motifs in sequence-specific DNA-binding proteins. In addition, it does not misidentify any 22-amino acid segment that is known not to be a helix-turn-helix unit.
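One way to express the pooled-score idea and the 0.80 cut-off in code is shown below. It assumes the background pool is simply a list of AAC values from many segments (for example, every 22-residue window of a collection of proteins). The optional tail probability further assumes that the pooled scores are roughly normally distributed; that is an assumption of this sketch, not a statement from the text.

```python
import statistics
from typing import Iterable

STRONG_CANDIDATE_AAC = 0.80  # cut-off quoted in the text


def is_strong_candidate(aac: float) -> bool:
    """Lower AAC means closer agreement with the master set."""
    return aac <= STRONG_CANDIDATE_AAC


def sd_below_background(aac: float, background: Iterable[float]) -> float:
    """Number of standard deviations by which `aac` lies below the mean of a
    pool of background AAC scores (larger values are more unusual)."""
    pool = list(background)
    if len(pool) < 2:
        raise ValueError("need at least two background scores")
    return (statistics.fmean(pool) - aac) / statistics.stdev(pool)


def normal_tail_probability(sd_units: float) -> float:
    """One-sided tail probability; meaningful only if the pooled scores are
    approximately normal (an assumption)."""
    return statistics.NormalDist().cdf(-sd_units)
```

For instance, a score lying 3 standard deviations below the background mean would correspond to a one-sided tail probability of about 0.0013 under the normality assumption.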
An amino acid sequence that corresponds to a putative helix-turn-helix motif can be given greater credence if it is compatible with known stereochemistry (9). Inspection of the structure of Cro suggested that a helix-turn-helix unit would probably have the following characteristics, with residue numbering as in Fig. 2 (9). (i) Residue 9 should be a glycine (having a conformation rare for nonglycines). (ii) Residues 4 and 15 should not be charged (these residues being buried …)
¹ Tables giving the AAC scores for these and other DNA-binding proteins, as well as statistical estimates of significance and a full set of references, are available on request.
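The two stereochemical rules that survive in the text, (i) and (ii), are easy to check mechanically. In the sketch below, position n in the Fig. 2 numbering is assumed to be the nth character of the segment passed in, and histidine is assumed to be uncharged; both are assumptions of this illustration. Because a bona fide motif can still violate these rules, the flags are advisory rather than disqualifying.

```python
from typing import List

# Residues treated as charged; histidine is counted as uncharged (an assumption).
CHARGED = frozenset("DEKR")


def stereochemical_flags(segment: str) -> List[str]:
    """List violations of rules (i) and (ii) from the text.

    Position n in the Fig. 2 numbering is assumed to be segment[n - 1];
    registering a real segment against that numbering is not shown here.
    """
    def residue(position: int) -> str:
        return segment[position - 1].upper()

    flags = []
    if residue(9) != "G":  # rule (i): glycine expected at position 9
        flags.append(f"position 9 is {residue(9)}, not glycine")
    for position in (4, 15):  # rule (ii): buried positions should be uncharged
        if residue(position) in CHARGED:
            flags.append(f"position {position} is charged ({residue(position)})")
    return flags
```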

Fig. 2. Amino acid sequences of the master set and of other DNA-binding proteins (among them 434 Cro, LeuO, LysR, and AmpR), aligned over the 22-residue helix-turn-helix segment and listed with their AAC scores.
These characteristics are not absolute, and it is quite possible that they could be violated by a bona fide helix-turn-helix motif (e.g. a functional mutant of λ-repressor lacks the "invariant" glycine at position 9 (34)). However, it is worth noting that none of the most …
There are a number of proteins that have been proposed to include potential helix-turn-helix motifs but that agree poorly with the master set of reference sequences used here. Examples include homeo boxes, histones, some transcriptional activator proteins, resolvases and inversion proteins, and the AIDS protein "tat." The poor scores associated with these proteins do not, of course, prove that they lack helix-turn-helix motifs; they only suggest that the likelihood is not high.
Dodd and Egan (32) have suggested a method of predicting putative helix-turn-helix motifs that is similar in principle to the one described here. It uses a master set of 37 sequences and a somewhat complicated method of scoring agreements. On the whole it gives results similar to those shown in Fig. 2, although it does not successfully predict the known helix-turn-helix motif in trp repressor. Another method of predicting helix-turn-helix motifs has been proposed by White (33). This method also does not predict the known helix-turn-helix in trp repressor and, in addition, incorrectly suggests helix-turn-helix motifs in myoglobin and superoxide dismutase.
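For comparison, one common way to score agreement against a larger aligned set is a position-specific weight matrix. The sketch below builds log-odds weights with a pseudocount and a uniform background; it illustrates the general principle only and is not claimed to reproduce the scoring used by Dodd and Egan (32). Note that here higher scores mean better agreement, the opposite of the AAC convention.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds weight for each amino acid at each aligned position, relative
    to a uniform background (pseudocount and background are illustrative)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must share one length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)  # missing residues count as 0
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def weight_score(segment: str, matrix: List[Dict[str, float]]) -> float:
    """Sum of per-position weights; higher means closer to the aligned set.
    Letters outside the 20-letter alphabet contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(matrix, segment))
```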

Structural Comparisons
The initial recognition of the significance of the helix-turn-helix motif derived as much from structural comparisons as from considerations of amino acid sequences. In particular, CAP and Cro were seen to contain a similar helix-turn-helix substructure in their presumed DNA-binding regions even though the rest of these protein structures had little in common (4). Twenty-four α-carbon atoms of Cro (residues 13-36), including the helix-turn-helix, could be superimposed on the corresponding 24 α-carbons of CAP (residues 166-189) with a root mean square discrepancy of 1.1 Å (4). The significance of this result was enhanced by the finding that the structure of λ-repressor also contained a helix-turn-helix substructure (5). In this case 23 α-carbons (residues 31-53) could be superimposed on residues 14-36 of Cro with a root mean square discrepancy of only 0.7 Å. A systematic search through all protein structures in the Brookhaven Data Bank failed to find any other 22 α-carbon segment that corresponded to the helix-turn-helix as seen in Cro, λ-repressor, and CAP (4,35) (Fig. 3).
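The root mean square discrepancies quoted above are computed after optimally superimposing one set of α-carbon coordinates on the other. A minimal NumPy sketch using the Kabsch (SVD) method follows; reading coordinates from data-bank files is omitted, and we do not claim this is the exact superposition procedure used in Ref. 4.

```python
import numpy as np


def superposition_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMS discrepancy between two equal-length sets of alpha-carbon
    coordinates (N x 3, in angstroms) after optimal rigid-body superposition
    (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 coordinate arrays of equal size")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The reflection guard matters because the helix-turn-helix is chiral: a mirror-image segment should not be reported as a good match.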
Now that the helix-turn-helix has been observed in a number of other DNA-binding proteins, it is possible to extend the initial structural comparisons. The correspondence between the helix-turn-helix unit of Cro (1) and those seen in CAP (2), λ-repressor (3), trp repressor (24), phage 434 Cro (25), and the phage 434 repressor headpiece (26) is given in Table I. In making these comparisons Cro was chosen as the "reference" structure, in part because it was used for the initial comparisons, but also because it agrees better with most of the other helix-turn-helix motifs than they do among themselves. The backbone segment that is most precisely conserved in all six of the above proteins consists of 21 α-carbon atoms (Table I).
Fig. 3. Histogram showing the result of a search through the Brookhaven Data Bank for helix-turn-helix motifs. The inset shows the superposition of the helix-turn-helix motifs as initially seen in Cro and CAP (4). The horizontal axis is the root mean square discrepancy between the 24 α-carbon atoms in the helix-turn-helix of Cro and all possible 24 α-carbon segments of the known protein structures. Also shown are the root mean square discrepancies between the helix-turn-helix of Cro and the eight other proteins listed in Table I. Note that the histogram is taken from the original search (4,35), which used α-carbon atoms 13-36 of Cro. In contrast, the individual values 1-8 shown in the figure were calculated for the shorter 21 α-carbon segment 16-36. As is clear from the figure and from Table I, this 21 α-carbon segment is well conserved in all known repressor and activator crystal structures. Beyond these 21 α-carbons, however, the protein backbones may differ substantially in the different structures.
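A search of the kind summarized in Fig. 3 can be sketched as sliding a reference α-carbon segment along every chain and recording the RMS discrepancy of each window. The sketch below assumes the α-carbon traces have already been extracted as N × 3 arrays (reading data-bank files is not shown); the 0.5 Å bin width and all function names are hypothetical choices.

```python
from typing import Dict, List

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD after optimal superposition of two N x 3 coordinate sets."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return float(np.sqrt(((p_c @ rotation.T - q_c) ** 2).sum() / len(p)))


def scan_chain(reference: np.ndarray, chain_ca: np.ndarray) -> List[float]:
    """RMSD of `reference` (N x 3) against every consecutive N-residue
    window of one chain's alpha-carbon trace (M x 3, M >= N)."""
    n = len(reference)
    return [kabsch_rmsd(reference, chain_ca[i:i + n])
            for i in range(len(chain_ca) - n + 1)]


def histogram(values: List[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Count values in fixed-width bins, keyed by each bin's lower edge."""
    counts: Dict[float, int] = {}
    for v in values:
        edge = float(np.floor(v / bin_width) * bin_width)
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))
```

Pooling `scan_chain` results over every chain in a structure collection and passing them to `histogram` gives the distribution against which the values for individual proteins can be compared.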

TABLE I Structural correspondence between helix-turn-helix motifs
The table gives the root mean square discrepancy between 21 α-carbons in the helix-turn-helix of the subject protein and that of Cro repressor. The rank corresponds to the numbering used in Fig. 3.
Three other proteins of known structure contain substructures similar to the helix-turn-helix motif. The first of these is "DNA-binding protein II," a nonspecific DNA-binding protein from Bacillus stearothermophilus (36); the second is cytochrome c peroxidase; and the third is the ribosomal protein L7/L12 (Table I). The latter two helix-turn-helix structures were found in a recent search through the Brookhaven Data Bank by Richardson and Richardson (37). Although the helix-turn-helix units in these three proteins correspond reasonably well with that of Cro, none agrees as well as the helix-turn-helix units in the proteins that are known to bind DNA sequence-specifically (Fig. 3). Nevertheless, it is intriguing that two of the three examples are proteins that interact with nucleic acids. The amino acid sequences within these three helix-turn-helix units do not obviously correspond with the master set of 10 sequences. However, the amino acid sequences of several homologs of the nonspecific DNA-binding protein have been determined, and one of these, from Rhizobium meliloti (DBPII(Rm)), yields a score of 0.80 (Fig. 2). Furthermore, this score is for the set of 22 amino acids that correspond to the observed helix-turn-helix in the B. stearothermophilus protein. The significance, if any, of this correspondence is a matter for debate.

Role of the Helix-Turn-Helix in Recognition
For Cro protein the "recognition helix" (i.e. the second helix of the helix-turn-helix motif) protrudes from the surface of the protein and matches in shape the major groove of the DNA (Fig. 1). This complementarity in shape can facilitate binding of the protein to both sequence-specific and nonspecific sites on the DNA. As shown in Table I, the backbone conformations of the helix-turn-helix motifs in other sequence-specific DNA-binding proteins are very similar to that seen in Cro. Notwithstanding this overall similarity, the helix-turn-helix motifs do not interact with the DNA in exactly the same way in each repressor-operator complex (19-23). This had been anticipated from early structural comparisons of Cro and λ-repressor (5,13), although "helix-swap" experiments (38,39) suggested similar rather than dissimilar geometries of binding. It can now be seen directly in the different repressor-operator complexes that the respective helix-turn-helix motifs adopt approximately similar but distinctly different binding geometries with respect to the DNA operator (19-23). The helix-turn-helix does not function as a fixed "reading head" that is always aligned in the same way relative to the DNA. Rather, recognition is achieved by a combination of factors. Specific hydrogen bonds and van der Waals contacts between the side chains of the protein and the parts of the base pairs exposed within the grooves of the DNA clearly play a major role. Appropriately placed water molecules that bridge the protein and the DNA may also participate in the recognition of a specific sequence. The DNA may also adapt its conformation to enhance contacts within the grooves as well as interactions with the phosphate backbone.
The role of the helix-turn-helix appears to be to provide a rigid underlying framework that supports the recognition surface of the protein but not to define how that recognition surface is aligned with the DNA.
The complexes involving λ-repressor headpiece, 434 Cro, and 434 repressor headpiece all support the expectation that these proteins recognize their specific binding sites on the DNA by direct contacts with the edges of the base pairs exposed within the grooves of the DNA (19-22). In the complex of trp repressor, however, there are no direct contacts between the protein and the base pairs that have been shown in vivo to be important in recognition. Sigler and colleagues (23) argue that trp repressor provides an example of indirect readout; that is, they suggest that the trp operator is recognized indirectly, through sequence-specific changes induced in the geometry of the phosphate backbone that in turn permit the formation of a tight complex with the protein. This rationalization for the specific binding of trp repressor raises several questions. In particular, the observed distortions in the structure of the trp operator are not large, and it is not obvious why different DNA sequences could not assume conformations similar to that adopted by the trp operator. The crystals of the trp repressor-operator complex were grown from 50 mM NaCl, 11 mM CaCl₂, 10 mM cacodylate, and 35% methylpentanediol (23). Such low-salt, high-alcohol conditions would be expected to increase the affinity of nonspecific DNA binding. Indeed, von Hippel and coworkers (40) have shown that in low salt the affinity of lac repressor for nonspecific sites on DNA can approach, if not exceed, its affinity for the lac operator. Furthermore, they have shown that the nonspecific binding of lac repressor is enhanced by the presence of glycerol (41). If trp repressor behaves similarly to lac repressor, this raises the possibility, first suggested to us by von Hippel, that the conditions necessary to crystallize the trp repressor-operator complex may have resulted in a sequence-nonspecific complex. This question will need to be addressed in future studies.
The reason for the precise conservation of the backbone conformation of the helix-turn-helix motif in the different repressors is not obvious. If different helix-turn-helix motifs were required to align on the DNA in exactly the same manner, then it would be easy to argue that this required strict conservation of the shape of the helix-turn-helix motif. As was noted, this is not the case: different helix-turn-helix motifs are seen to bind to DNA with different alignments. Recent evidence (42-44) suggests that the helix-turn-helix unit of lac repressor could be aligned on the DNA in a direction opposite to that of λ-repressor, 434 Cro, and trp repressor.
There is also no obvious structural reason for the conservation of the helix-turn-helix motif. Because structures corresponding to the helix-turn-helix motif are not found in protein structures in general (Fig. 3), it seems unlikely that the helix-turn-helix unit has a conformation that is energetically favored during the folding process. Inspection of observed helix-turn-helix motifs shows that there can be direct van der Waals contacts between residues in the "elbow" region, but the area in contact is relatively small. Pabo and Sauer (17) have noted that the contact residues include the highly conserved alanine at position 5 and the conserved isoleucine or valine at position 15 (Fig. 2), and have suggested that this "invariant" contact helps maintain the angle between the two α-helices. On the other hand, the observations that CAP has a glycine and trp repressor a lysine in place of the critical alanine suggest that the interhelix angle is not determined by these contacts alone.
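To put a number on the interhelix angle discussed above, one crude approach is to approximate each helix axis by the principal direction of its α-carbons and take the angle between the two N-to-C axes. This is a hypothetical sketch, not the method of Pabo and Sauer (17); for helices only a few turns long, the principal-axis fit can tilt the estimated axis by a few degrees.

```python
import numpy as np


def helix_axis(ca: np.ndarray) -> np.ndarray:
    """Approximate a helix axis as the principal direction of its
    alpha-carbons (N x 3), oriented from the first residue to the last."""
    centred = ca - ca.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    axis = vt[0]  # unit vector of largest variance
    if np.dot(axis, ca[-1] - ca[0]) < 0:
        axis = -axis  # enforce N -> C direction
    return axis


def interhelix_angle(helix1: np.ndarray, helix2: np.ndarray) -> float:
    """Angle in degrees between the N -> C axes of two helices."""
    cos_angle = np.clip(np.dot(helix_axis(helix1), helix_axis(helix2)), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))
```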
Perhaps the most reasonable explanation is that the conservation of the backbone of the helix-turn-helix motif is a natural consequence of evolution. Presumably all helix-turn-helix motifs that have amino acid sequence correspondence evolved from a common precursor. It is well known that the amino acid sequence of a protein changes much more rapidly during evolution than does its three-dimensional structure. For this reason, the backbone of the helix-turn-helix motif would be expected to be conserved during evolution. If, in addition, the helix-turn-helix were intimately associated with the DNA in the DNA-protein complex, then it would be heavily constrained not to change its shape. In this scenario, the evolution of DNA-binding proteins specific for different operators would be achieved by a sequential process in which changes of individual amino acids within the helix-turn-helix motif, as well as larger changes elsewhere in the DNA-binding protein, could result in new sequence preferences, but the backbone of the helix-turn-helix motif would remain essentially invariant.