Common ancestry of heterodimerizing TALE homeobox transcription factors across Metazoa and Archaeplastida

Background: Complex multicellularity requires elaborate developmental mechanisms, often based on the versatility of heterodimeric transcription factor (TF) interactions. Homeobox TFs in the TALE superclass are deeply embedded in the gene regulatory networks that orchestrate embryogenesis. Knotted-like homeobox (KNOX) TFs, homologous to animal MEIS, have been found to drive the haploid-to-diploid transition in both unicellular green algae and land plants via heterodimerization with other TALE superclass TFs, demonstrating remarkable functional conservation of a developmental TF across lineages that diverged one billion years ago. Here, we sought to delineate whether TALE-TALE heterodimerization is ancestral to eukaryotes.

Results: We analyzed TALE endowment in the algal radiations of Archaeplastida, ancestral to land plants. Homeodomain phylogeny and bioinformatics analysis partitioned TALEs into two broad groups, KNOX and non-KNOX. Each group shares previously defined heterodimerization domains, plant KNOX-homology in the KNOX group and animal PBC-homology in the non-KNOX group, indicating their deep ancestry. Protein-protein interaction experiments showed that the TALEs in the two groups all participated in heterodimerization.

Conclusions: Our study indicates that the TF dyads consisting of KNOX/MEIS and PBC-containing TALEs must have evolved early in eukaryotic evolution. Based on our results, we hypothesize that in early eukaryotes, the TALE heterodimeric configuration provided transcription-on switches via dimerization-dependent subcellular localization, ensuring execution of the haploid-to-diploid transition only when gamete fusion is correctly executed between appropriate partner gametes. The TALE switch then diversified in the several lineages that engage in a complex multicellular organization.
Electronic supplementary material The online version of this article (10.1186/s12915-018-0605-5) contains supplementary material, which is available to authorized users.

these drive the haploid-to-diploid transition by activating >200 diploid-specific genes and inactivating >100 haploid-specific genes [10,16,17]. In subsequent studies, plant-type TALE-TALE heterodimers between KNOX and BELL were shown to be required for the haploid-to-diploid transition of the moss Physcomitrella patens [18,19]. Given the conserved role of TALE heterodimerization as a developmental switch in the sexual life cycle of the plant lineage, understanding its origins and diversification promises to shed light on the evolution of developmental mechanisms during eukaryotic radiation and the emergence of land plants.

To delineate the ancestry of plant-type TALE heterodimerization, we performed a phylogenetic and bioinformatics analysis of TALE TFs in the three algal radiations of the Archaeplastida supergroup, the descendants of a single endosymbiosis event more than one billion years ago [20,21]. Our analysis showed that the TALEs were already diversified into two groups at the origin of Archaeplastida, one sharing KNOX-homology and the other sharing PBC-homology. Based on these results and our protein-protein interaction data, we propose that all TALE classes participate in heterodimerization networks via the KNOX- and PBC-homology domains between the two ancestral groups.

To collect all the available homeobox protein sequences, we performed BLAST and Pfam-motif searches against non-plant genomes and transcriptome assemblies throughout the Archaeplastida (S1 Spreadsheet), identifying 327 proteins from 55 species as the Archaeplastida homeobox collection (29 genomes and 18 transcriptomes; S2 Spreadsheet).

Of these, 102 possessed the defining feature of TALE proteins, a three-amino-acid insertion between amino acid positions 23 and 24 of the homeodomain [28]. At least two TALE genes were detected in most genomes, the exceptions being five genomes in the Trebouxiophyceae class of the Chlorophyta (S1 Spreadsheet; see S1A Notes for further discussion of the absence of TALEs in Trebouxiophyceae).
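As an illustration of the TALE criterion described above, the following minimal Python sketch checks whether a homeodomain carries a three-residue insertion between positions 23 and 24 of a reference homeodomain. It is not the procedure used in this study. It assumes the query has already been pairwise aligned to a canonical 60-residue reference with "-" as the gap character, and the function names and toy sequences are hypothetical placeholders, not real homeodomain sequences.

```python
def insertion_after(ref_aligned: str, query_aligned: str, ref_pos: int) -> str:
    """Return the query residues inserted between reference positions
    ref_pos and ref_pos + 1 (1-based) in a gapped pairwise alignment."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    seen = 0        # number of reference residues passed so far
    inserted = []
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            seen += 1
        elif seen == ref_pos and query_char != "-":
            # Gap in the reference right after ref_pos: an insertion in the query.
            inserted.append(query_char)
    return "".join(inserted)


def looks_like_tale(ref_aligned: str, query_aligned: str) -> bool:
    """True if the query has exactly three extra residues between
    reference homeodomain positions 23 and 24."""
    return len(insertion_after(ref_aligned, query_aligned, 23)) == 3


# Toy alignment with placeholder letters (assumption: 60-residue reference).
ref = "A" * 23 + "---" + "C" * 37
tale_like = "A" * 23 + "XYZ" + "C" * 37
print(insertion_after(ref, tale_like, 23))
print(looks_like_tale(ref, tale_like))
```

In practice the alignment would come from a profile or pairwise aligner, and a real screen would also need to confirm that the reference columns correspond to the canonical homeodomain numbering.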

The collected TALE sequences were then classified by their homeodomain features using a phylogenetic approach, with TALEs from animals, plants, and early-diverging eukaryotes (Amoebozoa and Excavata) as outgroups (S1 Fig). The resultant TALE homeodomain phylogeny distinguished two groups in all three phyla of Archaeplastida (Fig 2). 1) The KNOX-group as a well-supported clade displayed a phylum-specific cladogram: two Glaucophyta sequences at the base (as KNOX-Glauco) were separate from the next clade, which combines Rhodophyta sequences (as KNOX-Red1) and a Viridiplantae-specific clade with strong support (92/90/1.00). 2) The non-KNOX group, including the BELL and GSP1 homologs, contained clades of mixed taxonomic affiliations. These analyses showed that the TALE proteins had already diverged into two groups before the evolution of the Archaeplastida and that the KNOX-group is highly conserved throughout Archaeplastida.

In Viridiplantae, we found a single KNOX homolog in most Chlorophyta species, whereas KNOX1 and KNOX2 divergence was evident in the Streptophyta division, including the charophyte Klebsormidium flaccidum and land plants (Fig 2). The newly discovered KN-C1 domain was specific to the Chlorophyta KNOX sequences and found in all but one species (Pyramimonas amylifera). The absence of similarity between KN-C1 and the C-terminal

In all three species, we found that KNOX homologs interacted with all examined non-KNOX

Homology is restricted to a single homology-A domain outside the homeodomain.