Zinc-finger domains in metazoans: evolution gone wild

We acknowledge funding from the Ministerio de Economia e Innovacion (Spanish Government) co-funded by FEDER (BFU2015-65235-P), and from the Agencia de Gestio d'Ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR) (2014SGR1121).


Exuberant evolution of zinc-finger domains
Zinc-finger domains are rarely found alone; instead, they tend to form tandem arrays and sometimes they combine with different domains. This increases the length of the interaction surface and thus the number of different potential targets. The amount of redundancy in the binding sites is astonishing and it has been estimated that hundreds or even thousands of different C2H2-ZF sequences can recognize the same DNA triplet [5].
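As a rough illustration of why tandem arrays enlarge the space of potential targets, the Python sketch below counts the distinct DNA sequences that an array of a given length could in principle address. It assumes the simplified canonical picture in which each finger reads one DNA triplet, and it ignores overlapping contacts between neighbouring fingers. The function name `target_space` is hypothetical and is not taken from the cited studies.

```python
def target_space(n_fingers, bases_per_finger=3):
    """Return (site length in bp, number of distinct DNA sequences of that length).

    Assumption: every finger contacts `bases_per_finger` consecutive bases
    and fingers do not share bases.
    """
    if n_fingers < 1:
        raise ValueError("an array needs at least one finger")
    site_length = n_fingers * bases_per_finger
    # Four possible bases at each position of the site.
    return site_length, 4 ** site_length


for n in (1, 2, 3, 6):
    length, n_sites = target_space(n)
    print(f"{n} finger(s): {length} bp site, {n_sites} possible sequences")
```

Under these assumptions a single finger addresses 64 possible triplets, whereas a three-finger array addresses 262,144 possible 9-bp sites.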
In general, transcription networks evolve through modifications of cis-regulatory sequences rather than through changes in the TFs. Mutations in the latter are often deleterious because they change the regulation of several genes at the same time. As a result, many developmental TFs are highly conserved across species. Zinc-finger-containing proteins, however, do not appear to follow this general rule and have instead undergone very rapid diversification. At least some of these changes appear to have been adaptive [2], suggesting that this TF family has been an important driving force for evolutionary innovation.
Another interesting property of zinc-finger proteins is that, unlike other abundant TF families, they have undergone bursts of gene duplication at different evolutionary time points, such as at the base of the vertebrates or in the primate branch [6,7]. Although the significance of this is not yet clear, it is tempting to speculate that it may have driven important species-specific adaptations.

In search of a recognition code
A longstanding question is whether a zinc-finger-DNA recognition code exists. In other words, given the sequence of a zinc-finger domain, can we predict its DNA target? Deciphering such a code would be useful to identify the actual targets in the genome and to better understand the contribution of different amino acids to the binding mechanism.
In an attempt to decipher the code, two studies [3,5] used the one-hybrid system to estimate the DNA-binding affinities of thousands of natural zinc-finger domains. In this system, the interaction between the protein and the DNA sequence results in the expression of a reporter gene whose activity can be easily measured [8]. Persikov et al. [5] observed that not only could different zinc-finger sequences bind to the same DNA triplet, but the same zinc-finger could also recognize several different DNA targets. They also observed a negative relationship between the number of interactions and the strength of the binding, suggesting a trade-off between affinity and specificity. Najafabadi and colleagues [3] used information on the residues directly contacting the DNA bases to expand the zinc-finger-DNA recognition code. They observed that non-base-contacting amino acids also influenced the binding, but the mechanism remained elusive. It was not until the study published in Genome Biology [4] that these data could be integrated into a more general model.

The role of non-base-contacting residues in DNA binding
How does the relative abundance of the C2H2-ZF domain in metazoans compared to other eukaryotes affect the diversity of DNA targets? Najafabadi et al. [4] describe a novel approach to addressing this question by plotting the affinities of the domains present in 238 eukaryotes against all possible DNA triplets, using available one-hybrid data [5]. The species analyzed included both non-metazoans (mostly fungi and plants) and metazoans. By representing the data in this way, they could clearly see that, in contrast to metazoan zinc-fingers, non-metazoan zinc-fingers could only recognize a very limited subset of DNA motifs.
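The representation described above can be pictured as a table with one row per zinc-finger domain and one column per DNA triplet. The Python sketch below enumerates the 64 triplets and computes the fraction of them bound by at least one domain in a group. The domain names, affinity scores and the 0.5 threshold are invented placeholders for illustration only; they are not data or parameters from [4] or [5].

```python
from itertools import product

# All 64 possible DNA triplets.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
TRIPLET_SET = set(TRIPLETS)


def triplet_coverage(affinities, threshold=0.5):
    """Fraction of the 64 triplets bound by at least one domain.

    affinities: dict mapping domain name -> dict mapping triplet -> score.
    A triplet counts as bound when its score reaches `threshold`
    (hypothetical cut-off, chosen only for this example).
    """
    bound = set()
    for scores in affinities.values():
        for triplet, score in scores.items():
            if triplet not in TRIPLET_SET:
                raise ValueError(f"not a DNA triplet: {triplet!r}")
            if score >= threshold:
                bound.add(triplet)
    return len(bound) / len(TRIPLETS)


# Toy input: two made-up domains with made-up scores.
toy_group = {
    "zf_a": {"GCG": 0.9, "GAG": 0.7, "TTT": 0.1},
    "zf_b": {"GCG": 0.8, "AAA": 0.6},
}
print(triplet_coverage(toy_group))
```

Computing this fraction separately for metazoan and non-metazoan domains would give a simple numerical counterpart to the visual comparison: a group restricted to a small subset of motifs yields a value well below 1.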
The researchers discovered that there were other fundamental differences between the two groups of organisms. The recognition code in non-metazoans was mostly determined by the base-contacting residues, with little influence from other amino acids in the sequence. In metazoans, however, both base-contacting and non-base-contacting residues were required for binding.
To unravel the complexity of zinc-fingers in metazoans, Najafabadi et al. [4] developed two separate models, one for each type of residue. The researchers used random forests, a machine-learning approach, trained on a set of validated positive and negative cases. They observed that the non-base-contacting residues could discriminate between binding sites when the base-contacting residues were identical. By using molecular modeling of zinc-finger-DNA structures, they showed that these residues could contribute to the binding by forming hydrogen bonds with the DNA phosphate backbone. This is predicted to provide the necessary stability to the complex when the direct interactions with the bases are weak.
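To make the two residue classes concrete, the Python sketch below splits a zinc-finger recognition helix into base-contacting and non-base-contacting residues and one-hot encodes each set, which is one common way of preparing sequence features for a classifier such as a random forest. It assumes the canonical base-contacting positions -1, +2, +3 and +6 of the helix and a seven-residue input covering positions -1 to +6. The function names and the example sequence are hypothetical, and this is not presented as the feature scheme used in [4].

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: the input covers helix positions -1, +1, +2, +3, +4, +5, +6,
# so the canonical base-contacting positions (-1, +2, +3, +6) are indices 0, 2, 3, 6.
BASE_CONTACT_IDX = (0, 2, 3, 6)


def split_residues(helix):
    """Return (base-contacting residues, non-base-contacting residues)."""
    if len(helix) != 7:
        raise ValueError("expected seven residues (helix positions -1 to +6)")
    contacting = "".join(helix[i] for i in BASE_CONTACT_IDX)
    other = "".join(aa for i, aa in enumerate(helix) if i not in BASE_CONTACT_IDX)
    return contacting, other


def one_hot(residues):
    """Encode residues as a flat 0/1 vector, 20 entries per residue."""
    vector = []
    for aa in residues:
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid: {aa!r}")
        vector.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vector


# Illustrative seven-residue helix sequence.
contacting, other = split_residues("RSDELTR")
features_contacting = one_hot(contacting)  # input for a base-contacting model
features_other = one_hot(other)            # input for a non-base-contacting model
print(contacting, other, len(features_contacting), len(features_other))
```

Keeping the two feature sets separate mirrors the idea of one model per residue type: two domains with identical base-contacting residues produce the same first vector and can only be told apart by the second.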
Najafabadi et al. [4] also analyzed the differences in DNA target binding between extant C2H2-ZFs and their inferred ancestral sequences. They concluded that the evolution of zinc-fingers was analogous to a kaleidoscope, with slight modifications of the amino acid sequence leading to dramatic changes in the preferred DNA targets, and with the combinations of motifs and targets appearing virtually unrestricted.

Concluding remarks
The metazoan zinc-finger domain provides a fascinating example of exhaustive exploration of the sequence space, with only four residues being completely conserved and the rest being highly variable. As shown in the study by Najafabadi and co-workers [4], the binding to DNA depends both on residues that make direct contact with the bases in the DNA and on other residues that contact the phosphate backbone. As a consequence, the complete range of possible DNA triplets can potentially be recognized by C2H2-ZFs. In fact, the diversity of the contacts made by these domains is so high that several solutions exist in nature to bind to every DNA triplet.
The large-scale studies performed to date have focused on single domains; therefore, much remains to be learnt about the combinatorial effects of several C2H2-ZFs. New experiments will be required to validate the model and to understand how proteins with multiple domains behave. Another open question is why the fast evolution of zinc-fingers is not harmful, considering the potential of these domains to impact gene expression. Many C2H2-ZF proteins contain other types of domains, such as the Krüppel-associated box (KRAB), which represses the expression of endogenous retroviral elements. This may impose limits on the functionality of C2H2-ZFs, effectively taming their power.