Gamete expression of TALE class HD genes activates the diploid sporophyte program in Marchantia polymorpha

Eukaryotic life cycles alternate between haploid and diploid phases and in phylogenetically diverse unicellular eukaryotes, expression of paralogous homeodomain genes in gametes primes the haploid-to-diploid transition. In the unicellular chlorophyte alga Chlamydomonas, KNOX and BELL TALE-homeodomain genes mediate this transition. We demonstrate that in the liverwort Marchantia polymorpha, paternal (sperm) expression of three of five phylogenetically diverse BELL genes, MpBELL234, and maternal (egg) expression of both MpKNOX1 and MpBELL34 mediate the haploid-to-diploid transition. Loss-of-function alleles of MpKNOX1 result in zygotic arrest, whereas a loss of either maternal or paternal MpBELL234 results in variable zygotic and early embryonic arrest. Expression of MpKNOX1 and MpBELL34 during diploid sporophyte development is consistent with a later role for these genes in patterning the sporophyte. These results indicate that the ancestral mechanism to activate diploid gene expression was retained in early diverging land plants and subsequently co-opted during evolution of the diploid sporophyte body.


Introduction
Life cycles of eukaryotes alternate between haploid and diploid phases, initiated by meiosis and gamete fusion, respectively. Expression of paralogous homeodomain (HD) genes in the two gametes and the subsequent heterodimerization of the respective proteins in the zygote direct the haploid-to-diploid transition in gene expression in phylogenetically diverse eukaryotes, including the ascomycete fungus Saccharomyces cerevisiae (Goutte and Johnson, 1988; Herskowitz, 1989), the basidiomycete fungi Coprinopsis cinerea and Ustilago maydis (Gillissen et al., 1992; Hull et al., 2005; Kues et al., 1992; Spit et al., 1998; Urban, 1996), the amoebozoan Dictyostelium discoideum (Hedgethorne et al., 2017), the brown alga Ectocarpus (Arun et al., 2019), the red alga Pyropia yezoensis (Mikami et al., 2019), and the unicellular chlorophyte alga Chlamydomonas reinhardtii (Ferris and Goodenough, 1987; Lee et al., 2008; Nishimura et al., 2012; Zhao et al., 2001). This broad phylogenetic distribution suggests this was an ancestral function of HD genes (reviewed by Bowman et al., 2016b). In Viridiplantae, the paralogs are two subclasses, KNOX and BELL, of TALE class HD genes. In Chlamydomonas, the minus (-) gamete expresses a KNOX protein (GSM1) and the plus (+) gamete expresses a BELL protein (GSP1), and upon gamete fusion, the two proteins heterodimerize and translocate to the nucleus, activating zygotic gene expression (Lee et al., 2008). GSM1 and GSP1 are necessary for diploid gene expression and, when ectopically expressed together in vegetative haploid cells, are sufficient to induce the diploid genetic program (Ferris and Goodenough, 1987; Lee et al., 2008; Nishimura et al., 2012; Zhao et al., 2001). Biologically, the expression of a unique paralog in each type of gamete, coupled with the requirement for heterodimerization for functionality, is a
The M. polymorpha genome encodes nine TALE-HD-related family members: four KNOX genes and five BELL genes (Bowman et al., 2017; Figure 1), all of which are expressed in either the haploid sexual organs or the diploid sporophyte or both, with minimal or no expression detected by reverse transcription-polymerase chain reaction (RT-PCR) in the haploid vegetative thallus (Figure 1-figure supplement 1). Of the four KNOX genes, three belong to the KNOX1 subclass, only one of which encodes a HD, and one belongs to the KNOX2 subclass (Figure 1-figure supplement 1). The two subclasses arose via gene duplication in an ancestral charophycean alga (Sakakibara, 2016; Joo et al., 2018; Frangedakis et al., 2017). Expression of MpKNOX1 (Mp5g01600), which encodes a HD, is predominantly detected in archegoniophores and young sporophytes. In contrast, the two other KNOX1 genes (MpKNOX1A, Mp4g12450; MpKNOX1B, Mp2g11140), which lack a HD, are predominantly expressed in antheridiophores. Phylogenetic analysis indicates that these HD-less paralogs are liverwort specific and likely evolved in the ancestral liverwort (Figure 1-figure supplement 1). In contrast, expression of MpKNOX2 (Mp7g05320) is not detected in unfertilized reproductive organs and is expressed primarily during sporophyte development. The M. polymorpha BELL genes reside in three phylogenetically distinct clades, each with its own characteristic HD (Figure 2; Figure 2-figure supplement 1). Like some algal BELL-related proteins (Joo et al., 2018), the presence of a PBC domain is not easily discernible in any of the predicted M. polymorpha BELL protein sequences. Only one M. polymorpha BELL gene (MpBELL1, Mp8g18310) is phylogenetically related to previously described, 'canonical', land plant BELL genes. MpBELL1 harbors a conserved canonical land plant BELL homeodomain sequence and a discernible, albeit divergent, BELL domain characteristic of other land plant BELL genes. Similar to MpKNOX2, MpBELL1 expression is not detected in reproductive organs prior to fertilization and is expressed predominantly during sporophyte development (Figure 1-figure supplement 1). The other four M. polymorpha BELL genes fall into two distinct clades and are more closely related to algal TALE-HD genes than to the canonical land plant BELL genes (Figure 2). Three of the algal-like BELL genes, MpBELL2 (Mp4g09650), MpBELL3 (Mp8g02970), and MpBELL4 (Mp8g07680), each encode two or three homeodomains (Figure 1). The carboxyl HD is most conserved, with those upstream becoming progressively less conserved, consistent with their origins being via successive intragenic duplications. Notably, MpBELL3 and MpBELL4 both encode large proteins, while MpBELL2 encodes a much smaller protein consisting of only two HDs. These three genes, named here BELL-gamete, arose via gene duplications within liverworts, and related genes are found throughout liverworts and in some moss lineages that are sister groups to the vast majority of moss diversity (Figure 2). The fourth algal-like sequence, MpBELL5 (Mp5g11060), is phylogenetically distinct from the other four M. polymorpha BELL genes and resides in the previously defined 'GLX-basal' clade (Joo et al., 2018). Orthologs of MpBELL5 exist in other liverworts (Figure 2).
The most parsimonious interpretation is that a diversity of BELL paralogs arose in a charophyte algal ancestor, with the ancestral land plant likely possessing three paralogs representing the canonical land plant, GLX-basal, and BELL-gamete lineages (Figure 2). This diversity has persisted in M. polymorpha and other liverworts and, to a lesser extent, in mosses.

Figure 1 (legend, partial). White, 5' and 3' untranslated regions (UTRs); black, coding exons; thin lines, introns; green triangles, start codon; black triangles, stop codon; red triangles, guide RNA-targeted positions. The mutant alleles generated via CRISPR-Cas9 are described in more detail in Table 1; molecular lesions of some alleles can be found in Figure 1-source data 1. All protein annotation models are based on the Marchantia genome assembly v5.1 except for MpKNOX2. The MpKNOX2 model is based on sequences derived from reverse transcription-polymerase chain reaction (RT-PCR). In genes with multiple homeodomains, the homeodomains are denoted a, b, and c. Gene models were assembled using wormweb (http://wormweb.org/exonintron). The online version of this article includes the following figure supplement(s) for figure 1: Source data 1. Molecular lesions of alleles listed in Figure 1 and Table 1.

Maternal MpKNOX1 is required for post-zygotic embryo development
During gametophyte generation, MpKNOX1 is expressed specifically in the egg cell but is not detected at the preceding stage, when the venter canal cell is present (Figure 3A-B). During differentiation of the egg cell, MpKNOX1 appears to be expressed in an early pulse, with the signal diminishing in older archegonia (Figure 3B). Following fertilization, MpKNOX1 is expressed throughout the developing sporophyte up until the time when future sporogenous cells become distinct (Figure 3C-E), with expression thereafter becoming undetectable during meiotic stages (Figure 3F). A female harboring a loss-of-function allele consisting of a 2.3 kb deletion spanning the MpKNOX1 locus (Figure 1; Table 1) was crossed with wild-type males. In contrast to control crosses using wild-type males and females, where sporophyte production was observed (Figure 3G-I), mature sporophytes never developed on gametangiophores from >100 crosses between wild-type males and Mpknox1-6ge females (Figure 3J-L). However, on rare occasions (3/129 fertilization events), we encountered arrested multicellular embryos within senescing archegoniophores (Figure 3-figure supplement 1). A closer examination of female Mpknox1-6ge mutants revealed a developmental arrest of the zygote following fertilization, with the zygote failing to undergo cytokinesis (Figure 3K). However, two gametophytic tissues, the calyptra and the pseudoperianth, whose growth is dependent upon fertilization (Hofmeister, 1862), commence development normally. In Mpknox1-6ge mutants, as in wild type, the calyptra undergoes periclinal divisions, indicating successful fertilization and implying an intergenerational (sporophyte to gametophyte) signal inducing its development (Figure 3H-I and K-L). Likewise, the pseudoperianth, a ring of tissue surrounding the archegonium that only develops post-fertilization, initially develops normally in Mpknox1-6ge mutants (Figure 3H-I and K-L).
Subsequently, both the calyptra and pseudoperianth are developmentally arrested between 1 and 2 weeks post fertilization (wpf) in Mpknox1-6ge mutants. In addition, Mpknox1-6ge mutants exhibit a senescent archegoniophore phenotype. Unfertilized wild-type archegoniophores remain green for multiple weeks before they senesce (Figure 3M). In contrast, Mpknox1-6ge archegoniophores begin to senesce around 2 weeks post maturation (Figure 3N), suggesting that an MpKNOX1-dependent signal is required for archegoniophore maintenance. At the MpKNOX1 locus, a convergently transcribed gene, SUPPRESSOR OF KNOX1 (MpSUK1, Mp5g01590), is expressed predominantly in antheridia (Figure 3O-P), suggesting that MpKNOX1 may be regulated by an antisense transcript, as has been described for MpFGMYB (Mp1g17210) (Hisanaga, 2019). Consistent with this hypothesis, a transcriptional fusion of 4.6 kb 5' of the MpKNOX1 coding sequence with a β-glucuronidase (GUS) reporter coding sequence and 3' terminator results in expression in both egg cells and antheridia. However, if a 3' terminator sequence is inserted between the GUS coding sequence and the MpSUK1 sequences, the antisense repression is lost, suggesting a requirement for transcriptional overlap but via a non-sequence-specific mechanism (Figure 3-figure supplement 2). MpSUK1 also encodes a 225 aa protein, including a 55 aa domain conserved across most liverworts but not found in other land plant lineages, that does not overlap with the MpKNOX1 coding sequence (Figure 3O).

Figure 2 (legend, partial). are evident (Joo et al., 2018). MpBELL2/3/4 resides in a polytomy composed of liverwort and charophyte sequences, with other liverwort, moss, and charophyte sequences residing in a second clade; these are labeled 'BELL-gamete'. The multiple homeodomains of MpBELL2, MpBELL3, and MpBELL4 are designated a, b, and c (see Figure 1). MpBELL5 resides, with other liverwort and charophyte sequences, in the GLX-basal clade. Chlorophyte, light blue; charophyte, dark blue; liverwort, purple; moss, green; hornwort, dark green; lycophyte, brown; fern, orange; seed plant, red. Numbers at branches indicate posterior probability values >50%; branches explicitly shown have probability values >50%, whereas polytomies represent nodes with probability values <50%; values within subclades are omitted for clarity. The online version of this article includes figure supplement(s) for figure 2.
In addition to the full-length MpKNOX1 gene, two additional KNOX1-related genes, MpKNOX1A and MpKNOX1B, are encoded in the M. polymorpha genome. Both genes encode a conserved KNOX1 MEINOX domain.

MpBELL234 are expressed in antheridia, archegonia, and young embryos
The acropetal development of antheridia within antheridiophores results in the youngest being peripheral. MpBELL2, MpBELL3, and MpBELL4 expression was detected in stage 2-4 antheridia (Higo et al., 2016) within antheridiophores as assayed by in situ hybridization (Figure 4A, C and E). Likewise, translational MpBELL2 and transcriptional MpBELL3 and MpBELL4 GUS reporter gene fusion lines using 3.5 kb (MpBELL2), 5.6 kb (MpBELL3), and 5.8 kb (MpBELL4) of sequence 5' to a transcriptional start site exhibit signals in antheridia of developing antheridiophores (Figure 4B, D, F and G). In these reporter lines, no signal was detected in unfertilized archegoniophores or developing sporophytes, nor was any expression observed in the vegetative gametophyte. However, in contrast to the above reporter lines, strand-specific transcriptomic analysis suggests that shorter transcripts, likely produced from an alternative promoter, are generated at the MpBELL3 and MpBELL4 loci in archegoniophores and sporophytes (Figure 4-figure supplement 1A, B). Indeed, expression of both genes was detected in egg cells of unfertilized archegonia, gametophytic tissues surrounding fertilized archegonia at 1 wpf, as well as sporophytes up to at least 3 wpf via in situ hybridization (Figure 4I; Figure 4-figure supplement 1C). An MpBELL4 reporter construct using a 4.2 kb sequence upstream of a CAGE (Cap Analysis of Gene Expression) signal (Montgomery et al., 2020) corresponding to the presumed alternative transcriptional start site of the archegonial/sporophytic transcript results in an egg-cell and post-fertilization gametophytic signal; however, no expression was observed in the older sporophytes.

Figure 3 (legend, partial). and Mpknox1 development; unfertilized (G, J), 1 wpf (H, K), and 3 wpf (I, L). Mpknox1-6ge egg cells appear wild-type-like (unfertilized; G, J). In contrast to wild type (H, I), the embryo of Mpknox1-6ge mutants does not develop post fertilization (K, L). The zygote fails to undergo cytokinesis, with the nucleus disappearing after 3 weeks, leaving an empty space (approximately 3 wpf; L). The initial outgrowth of the pseudoperianth (black arrows) and calyptra (red arrows) post-fertilization is not affected in Mpknox1-6ge mutants, but their development is arrested as well at approximately 1 wpf (K, L). (M, N) Archegoniophores of Mpknox1-6ge mutants begin to senesce approximately 2 weeks after maturation (N), while wild-type archegoniophores of the same age remain green (M). (O) Based on RNA-sequencing (RNA-seq) data and associated gene models, the 3' end of MpKNOX1 overlaps with the 3' untranslated region (UTR) of MpSUK1, which is transcribed from the opposite strand. Predicted full-length MpKNOX1 transcripts, that is, those including exons 2 and 3, are present primarily in the archegoniophore, consistent with the in situ data and semiquantitative reverse transcription-polymerase chain reaction (sqRT-PCR) data in Figure 1

Figure 4 (legend, partial). These reporter genes all harbor sequences 5' of the longest predicted transcript at each of the loci; for example, proMpBELL4 in (G). Signal appears strongest in young- to medium-aged antheridia (stage 3 and stage 4; Higo et al., 2016), with the signal being lost in older antheridia toward the center of the antheridiophore (B, F), possibly due to draining of spermatogenous tissue. Staining was not observed in unfertilized archegonia (B, D, F). (I) In situ hybridization using the full-length MpBELL4 coding sequence exhibits signals in egg cells of unfertilized archegonia, gametophytic tissues surrounding fertilized archegonia at 1 week post fertilization (wpf), as well as sporophytes up to at least 3 wpf. (H, J) Reporter genes constructed using an alternative promoter internal to the MpBELL4 locus (iproMpBELL4), marked with a Cap Analysis of Gene Expression (CAGE) signal (G), exhibit a signal in unfertilized archegonia, but not in antheridia (H). Signal remains visible in developing unfertilized archegonia within the center of the archegoniophore, but no signal was observed in sporophytes at 2 wpf (J). Scale bar = 200 μm for left panels in A, C, E and the right panel in C, panels B, D, F, H, and the right panel in J; 100 μm for right panels in A, E, I and the left panel in J; 50 μm for the left and middle panel in I.

Paternal and maternal MpBELL are required for proper embryo development
To examine whether MpBELL234 could provide the male counterpart to the female MpKNOX1, we created loss-of-function alleles for each gene (Table 1). When used as male parents in crosses with wild-type females, Mpbell2, Mpbell3, and Mpbell4 single mutants did not exhibit an aberrant phenotype; however, when Mpbell3 Mpbell4 (Mpbell34) double mutant or Mpbell2 Mpbell3 Mpbell4 (Mpbell234) triple mutant males were used in crosses with wild-type females, mature sporophytes were formed at a reduced frequency compared to crosses between wild-type parents (Figure 5A-B). To quantify the reduction in fertility, we first examined crosses between wild-type parents in greater detail. A majority of fertilization events in crosses between wild-type males and wild-type females produced mature sporophytes (Figure 5A-B). However, nearly 40% of wild-type crosses were either aborted at the zygote stage or arrested at a globular multicellular stage, characteristic of about 1 wpf. As we did not observe sporophytes arrested at later stages, sporophytes that surpass the 1 wpf stage must generally progress to maturity. In contrast, most fertilization events derived from crosses between wild-type females and Mpbell(2)34 males resulted in sporophyte development that only progressed to a globular stage characteristic of the first week of wild-type development, followed by sporophyte arrest. A smaller number of fertilization events were aborted at zygote formation, resembling the phenotype observed in female Mpknox1-6ge mutants (Figure 5A-B). However, mature sporophytes with viable spores were produced in about 17% of fertilization events. In individual crosses in which mature sporophytes form, the corresponding archegoniophores remain viable until after the sporophytes have matured, while unfertilized archegoniophores undergo senescence. The archegoniophores with only arrested sporophytes senesced in a manner similar to unfertilized archegoniophores.
Since both MpBELL3 and MpBELL4 expression was detected in the egg cell, we performed reciprocal crosses between Mpbell(2)34 females and wild-type males and observed a similar distribution of sporophyte phenotypes compared to wild-type females crossed with Mpbell234 males: aborted zygotes, aborted multicellular embryos, and phenotypically normal sporophytes (Figure 5A-B). However, in crosses where both the male and female harbored mutations in both MpBELL3 and MpBELL4, no mature sporophytes were formed, with a majority of embryos arrested at the zygote stage (Figure 5A-B), a phenotype similar to, albeit less severe than, that observed for Mpknox1 mutants. Similar sporophyte phenotype ratios in reciprocal crosses between Mpknox1 and Mpbell34 have been described independently (Hisanaga, 2021).
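The stage percentages reported for each cross (Figure 5B) are proportions of the N observed fertilization events. As a minimal sketch of that tally, assuming per-event stage labels are available as a simple list, the following Python snippet computes them. The function name, the label strings, and the example tallies are hypothetical illustrations and are not data from this study; only the 3/129 figure is taken from the text above.

```python
from collections import Counter

# Hypothetical stage labels, mirroring the three classes scored per cross.
STAGES = ("aborted zygote", "arrested sporophyte", "mature sporophyte")


def stage_percentages(observations):
    """Return {stage: percentage of N} for a list of per-event stage labels."""
    counts = Counter(observations)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage label(s): {sorted(unknown)}")
    n = sum(counts.values())  # N = total number of observed fertilization events
    if n == 0:
        raise ValueError("no fertilization events observed")
    return {stage: 100.0 * counts[stage] / n for stage in STAGES}


# Made-up tallies for illustration only (NOT data from this study).
example = (
    ["mature sporophyte"] * 6
    + ["arrested sporophyte"] * 3
    + ["aborted zygote"] * 1
)
print(stage_percentages(example))

# From the text: 3 arrested multicellular embryos in 129 fertilization events.
print(round(100.0 * 3 / 129, 1))  # 2.3
```

Keeping N explicit matters here because the crosses differ in the number of fertilization events scored, so raw counts are not directly comparable between crosses.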
We next examined whether MpKNOX1 and MpBELL proteins could interact. We chose antheridial-expressed full-length MpBELL4 for analysis due to the truncated nature of MpBELL2 and the extreme length of MpBELL3. In a split Yellow Fluorescent Protein (YFP) BiFC assay in Nicotiana benthamiana leaf epidermal cells, MpKNOX1 on its own was cytoplasmically localized, but when co-expressed with nuclear-localized MpBELL4, MpKNOX1 signal became nuclear (Figure 5C-F). A similar interaction was observed with MpKNOX2 and MpBELL1 (Figure 5-figure supplement 1). These interactions are selective, since no interaction was observed between MpKNOX1 and MpBELL1 or between MpKNOX2 and MpBELL4 (Figure 5-figure supplement 1). In contrast to full-length MpBELL4, we did not observe any efficient interaction between the shorter sporophyte-expressed MpBELL3 and MpKNOX1 in this heterologous system (Figure 5G-H). Surprisingly, however, in this system, full-length MpBELL4 and sporophyte-expressed MpBELL3 interacted (Figure 5I-J).

The online version of this article includes figure supplement(s) for figure 4.

Table 1. Alleles used in this study. Nomenclature is as outlined previously (Bowman et al., 2016c; Montgomery et al., 2020).

Activation of sporophyte gene expression in the vegetative gametophyte
The expression patterns and mutant phenotypes of MpBELL2/3/4 and MpKNOX1 genes are consistent with a role in activating diploid gene expression, and thus we examined whether ectopic (co-)expression of these genes in the vegetative gametophyte is sufficient to activate diploid gene expression (Figure 5-figure supplement 2A). Ectopic expression of MpKNOX1 alone does not alter expression of MpKNOX2, MpBELL1, or MpBELL3 in the vegetative gametophyte (Figure 5-figure supplement 2B). However, either ectopic expression of MpBELL3 alone or co-expression of MpKNOX1 and MpBELL3 in the vegetative gametophyte for 72 hr is sufficient to activate both MpKNOX2 and MpBELL1, whose expression is normally limited to sporophyte development or, in the case of MpBELL1, also occurs after continuous far-red light induction (Inoue et al., 2019; Figure 5-figure supplement 2B). Thus, in this context, MpBELL3 alone can activate MpKNOX1 expression, a scenario perhaps reminiscent of post-zygotic activation of MpKNOX1 after fertilization (Figure 3C, Figure 3-figure supplement 2D).

An ancestral function for TALE-HD genes in Viridiplantae
The eukaryotic life cycle alternates between haploid and diploid phases, initiated by meiosis and gamete fusion, respectively. Organisms spanning the phylogenetic diversity of eukaryotes, including ascomycete and basidiomycete fungi, Amoebozoa, brown algae, and chlorophyte algae, utilize paralogous homeodomain proteins that heterodimerize following gamete fusion to initiate the diploid genetic program, lending support for the idea that this may have been an ancestral function of homeodomain proteins. Our observations in M. polymorpha indicate that TALE-HD proteins, specifically MpKNOX1 and MpBELL2/3/4, are initially supplied in gametes, and that this gametic expression is required for diploid sporophyte development (Figure 6). MpKNOX1 is absolutely required in the egg cell, as is MpBELL3/4 in either the sperm or egg. This scenario is reminiscent of that characterized in Chlamydomonas, wherein KNOX is supplied by the (-) gamete and BELL by the (+) gamete, and once the two are together in the zygote, the diploid genetic program can be activated. Thus, the basic tenets of the genetic regulation of the haploid-to-diploid transition are conserved in two widely divergent Viridiplantae lineages, consistent with the notion that such a system was present in the common ancestor of extant Viridiplantae.

Figure 5 (legend, partial). the 1 wpf stage. See Table 1 for details on mutant alleles. (B) Percentages of developmental stages observed for each of the crosses. Plants were crossed once and then examined at 1, 2, or 3 wpf. Observed embryos were grouped into the following developmental stages: aborted zygotes, sporophytes that were arrested at a stage younger than the maximum stage possible, and sporophytes that had reached the maximum stage expected. N = the total number of observed fertilization events.

Figure 6 (legend, partial). MpBELL1 and MpKNOX2 form a distinct heterodimer that acts at later stages of sporophyte development. The homeodomain-lacking MpKNOX1A and MpKNOX1B genes, expressed in the antheridia along with MpBELL2/3/4, potentially prevent functionality by sequestering MpBELL2/3/4 proteins. The function of MpBELL5 is unknown. Images are obtained from Marchant, 1713; Mirbel, 1835; Thuret, 1851; and Unger, 1837. The online version of this article includes figure supplement(s) for figure 6.
The obvious major difference between the roles of KNOX and BELL in Marchantia when compared to Chlamydomonas is the activity of MpBELL34 in both the egg and sperm, rather than being confined to the sperm. Thus, questions on how the system operates mechanistically and why it might have evolved are paramount. The functions of maternal MpKNOX1 and paternal MpBELL34 could be presumed to represent the ancestral condition, with the role of maternal MpBELL34 being a derived condition. The absolute requirement for maternal MpKNOX1 (Figure 3) and its ability to heterodimerize with full-length MpBELL3/4 (Figure 5), which is expressed only paternally (Figure 4), suggest that this ancestral system functions in Marchantia, albeit with a reduced efficacy. In Marchantia, although neither maternal nor paternal MpBELL is absolutely required, both are necessary for efficient sporophyte development. One plausible hypothesis is that maternally supplied MpBELL3/4 evolved as a backup system to ensure diploid development following fertilization, concomitant with the evolution of sperm with reduced cytoplasmic content and highly condensed nuclei, anatomical features that could reduce the efficiency of MpBELL protein delivery by sperm. That both MpKNOX1 and short MpBELL3 are present in the egg cell prior to fertilization and that they do not efficiently interact, at least in a split YFP BiFC assay in Nicotiana leaf cells (Figure 5), suggests that the act of fertilization may trigger a biochemical change required for their interaction. Although short MpBELL3 can interact with full-length MpBELL4 (Figure 5), it is likely not critical in planta, given the absolute requirement of MpKNOX1 for sporophyte development. The requirement of both maternal and paternal MpBELL for efficient sporophyte production may also provide a mechanism to ensure that activation of sporophyte development only occurs following fertilization.
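The genetic logic described above can be summarized as a small decision rule. The following Python sketch is a deliberately coarse, qualitative restatement of the cross outcomes reported in this study, not a mechanistic or quantitative model. The class, field, and function names are hypothetical; `bell34=False` is assumed to mean loss of function of both MpBELL3 and MpBELL4, and the sketch does not cover single Mpbell mutants, which showed no aberrant phenotype as male parents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parent:
    """Hypothetical genotype summary for one parent of a cross."""
    knox1: bool = True   # functional MpKNOX1 (only the maternal copy is considered)
    bell34: bool = True  # False = both MpBELL3 and MpBELL4 are loss-of-function


def expected_outcome(female: Parent, male: Parent) -> str:
    """Qualitative outcome class for a cross, as summarized from the text."""
    if not female.knox1:
        # Maternal MpKNOX1 is absolutely required.
        return "zygotic arrest; no mature sporophytes"
    if not female.bell34 and not male.bell34:
        # Neither gamete supplies MpBELL3/4.
        return "no mature sporophytes; mostly zygotic arrest"
    if not female.bell34 or not male.bell34:
        # Only one gamete supplies MpBELL3/4.
        return "mature sporophytes at reduced frequency"
    return "mostly mature sporophytes"


wild_type = Parent()
for label, female, male in [
    ("WT x WT", wild_type, wild_type),
    ("Mpknox1 female x WT male", Parent(knox1=False), wild_type),
    ("WT female x Mpbell34 male", wild_type, Parent(bell34=False)),
    ("Mpbell34 female x WT male", Parent(bell34=False), wild_type),
    ("Mpbell34 female x Mpbell34 male", Parent(bell34=False), Parent(bell34=False)),
]:
    print(f"{label}: {expected_outcome(female, male)}")
```

Checking the maternal MpKNOX1 condition first reflects the observation that its loss gives the most severe phenotype regardless of the MpBELL genotype of either parent.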
Finally, it is of note that a similar system, wherein two TALE-HD genes are both expressed in male and female gametes, has been described for the brown alga Ectocarpus (Arun et al., 2019). As Ectocarpus also possesses multicellular haploid and diploid generations and is anisogamous, perhaps similar evolutionary pressures contributed to the independent evolution of similar systems.
Given the phylogenetic affinity of mosses as a sister group to liverworts (Renzaglia et al., 2018), we might expect a similar KNOX/BELL program to regulate the haploid-to-diploid transition in Physcomitrium. The phenotype of Mpknox1 alleles is more extreme than triple-null loss-of-function alleles of all PpKNOX1 loci where a sporophyte with viable spores forms (Sakakibara et al., 2008), suggesting that PpKNOX2 may also play a role in the transition in this species (Sakakibara et al., 2013). In Physcomitrium, PpBELL1 expression has been observed in sperm (Ortiz-Ramírez et al., 2017) via transcriptome analysis and in the egg via reporter gene (Horst et al., 2016). While the latter study reported a lack of male expression, this could be due to a loss of expression in this Physcomitrium accession (Meyberg et al., 2020), or alternatively, if the Physcomitrium PpBELL1 genomic architecture is similar to MpBELL3/4, the reporter gene insertion may have excluded regulatory sequences required for alternative transcription start sites. Hence, aspects of the ancestral system regulating the haploid-to-diploid transition could be present in multiple early diverging land plant lineages.
In contrast to the situation in bryophytes, in angiosperms, paternally supplied pluripotency factors related to BABYBOOM, an AP2/ERF transcription factor, can activate the zygotic genetic program (Khanday et al., 2019; Conner et al., 2017; Conner et al., 2015). Furthermore, despite the extensive literature on KNOX/BELL function in angiosperms, the only potential known vestige of this ancestral system existing in angiosperms is the presence of KNOX2 in the Arabidopsis female gametophyte, possibly in the egg cell (Pagnussat et al., 2007). Thus, it seems that the ancestral function has been lost entirely in some derived land plants, as it has presumably in the metazoan lineage (Bowman et al., 2016b).
Since liverworts are placental organisms, with the diploid sporophyte nourished by the maternal haploid gametophyte, there is opportunity for extensive intergenerational communication. For example, that the maternal gametophytic calyptra and pseudoperianth initiate their development in Mpknox1 mutants implicates a MpKNOX1-independent non-cell autonomous zygote-derived signal activated post-fertilization. In contrast, the premature senescence of Mpknox1 archegoniophores suggests a MpKNOX1-dependent intra-gametophytic signal for the maintenance of archegonial viability. Further, the premature senescence of wild-type archegoniophores bearing only aborted embryos suggests a signal emanating from older sporophytes to maintain the viability of the maternal archegoniophore tissues until sporophyte development is completed. Finally, that aborted embryos are found even in crosses involving only wild-type parents suggests possible maternal control over allocation of resources across multiple embryos, a phenomenon similar to maternal control over fruit set in angiosperms (Stephenson, 1981).

Roles of MpKNOX1 in Marchantia sporophyte
A key land plant innovation was the evolution of a multicellular diploid generation, the embryo, via mitoses, interpolated between gamete fusion and meiosis (Bower, 1908). In seed plants, ferns, and mosses, KNOX1 activity is associated with continued sporophyte cell proliferation, including sporophyte apical meristem activity (Hay and Tsiantis, 2010; Sakakibara et al., 2008; Sano et al., 2005). While M. polymorpha and other liverworts lack apical meristems during sporophyte development (Kienitz-Gerloff, 1874), MpKNOX1 expression is detected throughout developing sporophytes until the inception of sporogenous cell differentiation (Figure 3), consistent with MpKNOX1 having a role in cell proliferation during M. polymorpha sporophyte development. Thus, while MpKNOX1 retains the ancestral function in the haploid-to-diploid transition, the lack of zygotic cell division in Mpknox1 mutants and the MpKNOX1 sporophytic expression pattern suggest that neofunctionalization of KNOX1 contributed to the evolution of the embryo via stimulation of cell proliferation. That MpBELL3/4 are also expressed during these stages of sporophyte development (Figure 4; Figure 4-figure supplements 1, 2, and 3) suggests these genes encode the heterodimeric partners of MpKNOX1 during sporophyte development. This is supported by the subfunctionalization of MpBELL genes with respect to heterodimerization partners: MpBELL3/4 specifically heterodimerize with MpKNOX1, while MpBELL1 interacts only with MpKNOX2 (Figure 5-figure supplement 2). This subfunctionalization resembles that observed for KNOX/BELL interactions in angiosperms (Furumizu et al., 2015) but represents an independent evolutionary event, since all angiosperm BELL genes are MpBELL1 orthologs.
In eukaryotic lineages in which multicellularity evolved in the diploid phase of the life cycle, genes involved in the haploid-to-diploid transition, or paralogs thereof, have been repeatedly co-opted into roles directing development in the diploid generation. Most conspicuously, despite the loss of their role in zygotic gene activation (see below), both non-TALE (e.g., Hox) and TALE-HD genes act to pattern the metazoan body (Merabet and Mann, 2016; Pearson et al., 2005; Lewis, 1978). Likewise, in the basidiomycete fungus Coprinopsis, the same HD heterodimer that initiates the diploid genetic program also directs early developmental stages in the multicellular diploid (Kamada, 2002). In the syncytial chlorophyte alga Caulerpa lentillifera, differential TALE-HD expression in the diploid body was speculated to influence differentiation of fronds and stolons, but functional data are lacking (Arimoto, 2019). Finally, in land plants, as the multicellular sporophyte evolved increasing complexity, KNOX/BELL genetic modules were co-opted to direct development of novel organs and tissues via regulation of cell proliferation (KNOX1) or differentiation (KNOX2) in conjunction with other gene regulatory networks (Furumizu et al., 2015; Hay and Tsiantis, 2010).

The remaining TALE-HD genes of M. polymorpha
The two KNOX1-related sequences lacking homeodomains (MpKNOX1A and MpKNOX1B) are reminiscent of similar, albeit independently evolved, proteins in angiosperms that act as inhibitors of KNOX/BELL function by forming inactive heterodimers primarily with BELL partners (Magnani and Hake, 2008; Kimura et al., 2008). The antheridial expression of MpKNOX1A and MpKNOX1B prompts the hypothesis that the encoded proteins may act as a safeguard against the inappropriate activity of MpBELL in the male gametophyte. This is particularly pertinent since ectopic expression of MpBELL3 alone in the vegetative gametophyte can activate MpKNOX1, MpKNOX2, and MpBELL1 (Figure 5-figure supplement 2). The lack of an aberrant phenotype observed in lines constitutively expressing MpKNOX1 in male gametophytes would be consistent with this hypothesis.
In contrast to the other MpBELL genes, MpBELL5 transcripts are only detected in archegoniophores. Given the strong heterodimerization affinity preferences of BELL proteins for KNOX proteins (Bellaoui et al., 2001; Joo et al., 2018), the only conspicuous potential partner in M. polymorpha would be MpKNOX1. Hence, MpBELL5 might play an as-yet-undefined role in the activation of zygotic gene expression. At the MpBELL5 locus, a convergently transcribed locus is expressed predominantly in antheridia (Figure 4-figure supplement 1), suggesting MpBELL5 may be repressed by an antisense transcript in a manner similar to MpKNOX1 (Figure 3) and MpFGMYB (Hisanaga, 2019). Such male-expressed antisense transcript-mediated repression apparently provides a general mechanism for female-specific expression of autosomal genes, with sex-specific regulation ultimately being linked to the feminizing locus on the female sex chromosome (Bowman, 2016a; Knapp, 1935; Lorbeer, 1936).

Marchantia BELL gene diversity resembles that of charophyte algae
The phylogenetic diversity of BELL genes in M. polymorpha, and liverworts more broadly, more closely resembles the diversity observed in charophyte algae than that of other land plants (Joo et al., 2018; Lee et al., 2008). All previously described land plant BELL genes form a single clade that evolved within the charophycean algae (Figure 2). The single M. polymorpha gene in this clade, MpBELL1, is the only MpBELL gene predominantly expressed during sporophyte development. The other MpBELL genes reside in two other phylogenetically distinct clades, GLX-basal and BELL-gamete, both of which include charophyte sequences (Figure 2). This phylogenetic pattern suggests an ancient proliferation of BELL paralogs within the charophycean algal ancestor, with M. polymorpha (and other liverworts and some mosses) retaining this diversity that was subsequently lost in most other land plant lineages. The structural diversity (two or three homeodomains) of the MpBELL2/3/4 paralogs and the extensive sequence diversity of BELL-gamete clade genes are consistent with rapid evolution of genes involved in reproductive processes (Swanson and Vacquier, 2002). Finally, the expression pattern of MpBELL5, a member of the GLX-basal clade that evolved early in the charophytes, also suggests an as-yet-unresolved function in reproduction. Similarly, the Chlamydomonas genome encodes a second BELL paralog, HDG1, that is expressed in both (+) and (-) gametes, suggesting a role in reproduction, but whose function is unknown.

An ancestral function for TALE-HD genes in eukaryotes
In Viridiplantae (Chlamydomonas and Marchantia), the haploid-to-diploid transition is mediated by two TALE-HD genes. In the red alga Pyropia yezoensis, KNOX gene expression is detected in the diploid conchosporangium, but not in haploid thalli (Mikami et al., 2019). The phylogenetic distribution of KNOX and BELL subfamilies and their heterodimerization affinities (Joo et al., 2018) suggest that the Archaeplastida common ancestor utilized KNOX/BELL TALE-HD genes to mediate the haploid-to-diploid transition. Likewise, in the brown alga Ectocarpus, two TALE-HD proteins, OUROBOROS and SAMSARA, mediate the transition (Arun et al., 2019). In contrast, in both ascomycete and basidiomycete fungi, the haploid-to-diploid transition is mediated by heterodimerization of a TALE-HD and a non-TALE-HD protein (Gillissen et al., 1992; Goutte and Johnson, 1988; Herskowitz, 1989; Hull et al., 2005; Kues et al., 1992; Spit et al., 1998; Urban, 1996). In the Amoebozoa Dictyostelium, the two homeodomain-like proteins controlling the haploid-to-diploid transition are highly divergent, rendering the phylogenetic affinities enigmatic (Hedgethorne et al., 2017). As these taxa span eukaryotic phylogenetic diversity, one role of homeodomain genes in the ancestral eukaryote was to regulate the haploid-to-diploid transition; the evolution of the homeodomain in the ancestral eukaryote was associated with evolution of a novel life cycle (Figure 6-figure supplement 1). In arguably the most intensively studied taxon, the Metazoa, zygotic gene activation has been replaced by maternally derived pluripotency factors (Schulz and Harrison, 2019), but homeodomain genes have been retained.
Given that the ancestral eukaryote possessed a minimum of two HD genes, including one TALE-HD and one non-TALE-HD (Bharathan et al., 1997; Bürglin, 1997; Derelle et al., 2007), it is an intriguing question whether, ancestrally, both TALE and non-TALE paralogs acted in the haploid-to-diploid transition. Each possible ancestral scenario (TALE + non-TALE or TALE + TALE) requires a subsequent change in the other lineages: either co-option of a pre-existing paralog (e.g., non-TALE) or a TALE gene duplication and transference of function to the new paralog, respectively. To resolve the ancestral condition, broader phylogenetic sampling and functional analyses across additional unicellular Bikont lineages, particularly in the Excavata and the paraphyletic sister groups of the Metazoa (choanoflagellates, Filasterea, and Ichthyosporea), might be informative (Figure 6-figure supplement 1). Resolution of the ancestral condition could inform whether the primary ancestral function of both TALE-HD and non-TALE-HD was regulating the haploid-to-diploid transition or whether non-TALE-HD genes had another fundamental function in ancestral eukaryotes.

Materials and methods
Ecotypes, plant growth, transformation, and induction
M. polymorpha ssp ruderalis, ecotype BoGa, obtained from the Botanical Garden of Osnabrueck, Germany, was used in the SZ lab. Plant cultivation, transformation, and induction were performed according to Ishizaki et al., 2008 and Althoff, 2014. M. polymorpha ssp ruderalis, ecotype MEL, was used in the JLB lab and grown, transformed, and induced according to Ishizaki et al., 2008 and Flores-Sandoval et al., 2015. In case of double transformations, sporelings were co-transformed using two constructs featuring different selectable markers. Spores were drained and plated on selection media for approximately 2 weeks and then subjected to a second round of selection before being transferred to ½ B5 plates. For induction of reproductive organs, plants were transferred to white light supplemented with far-red light (735 nm; 45 µmol/m²/s) on ½ B5 media supplemented with 1% glucose.

Genotyping, RNA extraction, cDNA synthesis, and semiquantitative reverse transcription-polymerase chain reaction
DNA was extracted using a modified protocol from Edwards et al., 1991. Instead of vacuum drying, the pelleted DNA was air-dried. Amplicons were directly sequenced. RNA was extracted from ≈100 mg wet weight tissue using the RNeasy kit (Qiagen) following the manufacturer's instructions, including an on-column DNase I digestion. The same amount of total RNA was used for complementary DNA (cDNA) synthesis using either Superscript II (Invitrogen) (SZ lab) or Bioscript (Bioline) (JLB lab) reverse transcriptase according to the manufacturer's instructions with an oligo-dT15 primer. 5' and 3' RACE-ready cDNA synthesis was performed using SMARTScribe reverse transcriptase (Takara) according to the manufacturer's instructions but with the addition of random primer mix (NEB) to allow reverse transcription of long mRNA templates. See Supplementary file 1, Table 1a for primer sequences.

In situ hybridization, sectioning, staining, and microscopy
M. polymorpha ssp ruderalis, ecotype BoGa, tissue fixation, embedding, sectioning, and hybridization with digoxigenin (DIG)-labeled antisense RNA probes were performed according to Zachgo, 2002. Sections were either stained in toluidine blue O (TBO) alone or counterstained with ruthenium red (RR) according to Retamales and Scharaschkin, 2014. Staining time for RR and TBO was 90 s and 60 s, respectively. Microscopic slides were observed using a Leica MZ16 FA microscope and pictures were taken with a Leica DFC490 camera. Plants were observed using either a Lumar dissecting microscope (Zeiss) and photographed with AxioCam HRc and AxioVision software (both Zeiss) or a Leica stereomicroscope (Leica M165 FC) and photographed with an integrated Leica DFC490 camera. See Supplementary file 1, Table 1b for probe sequences.

Bimolecular fluorescence complementation
Coding sequences of KNOX and BELL genes were seamlessly cloned into the pGREEN derivatives (Hellens et al., 2000) pAMON/pSUR for N-terminal split VENUS (I152L) fusion and into pURIL/pDOX for C-terminal split VENUS (I152L) fusion (Lampugnani et al., 2016). All constructs feature the 35S promoter driving the N- (pAMON/pURIL) or C-terminal (pSUR/pDOX) part of the split VENUS (I152L) fluorophore for translational fusion of the gene-of-interest in frame with the split VENUS (I152L). All constructs were transformed into Agrobacterium tumefaciens strain GV3101 for transfection of plant leaves (Bracha-Drori et al., 2004

Overexpression and transcriptional and translational fusion constructs
Primers used for overexpression with the endogenous, constitutive MpEF1α promoter (Althoff, 2014), inducible overexpression, and transcriptional and translational fusion constructs are shown in Supplementary file 1, Table 1c. The complete 1.8 kb MpKNOX1 coding sequence and a 4.8 kb fragment of the MpBELL3 coding sequence were amplified from cDNA and seamlessly cloned (NEBuilder, NEB) into pENTR-D (Invitrogen) via NotI/AscI sites. The MpKNOX1 coding sequence was subsequently recombined into pMpGWB403 (Addgene entry #68668) for constitutive expression, whereas the MpBELL3 coding sequence was recombined into the estrogen-inducible binary pHART XVE (Flores-Sandoval et al., 2016). Regulatory sequences upstream of the transcriptional start site of MpKNOX1 (4.6 kb), MpBELL3 (5.6 kb), and MpBELL4 (5.8 kb) and internal of the MpBELL4 locus (4.1 kb) were amplified and cloned into pENTR-D and subsequently recombined using LR-clonase II (Invitrogen) into pMpGWB104 (Addgene entry #68558) featuring the GUS reporter gene. To elucidate the transcriptional regulation at the MpKNOX1/MpSUK1 locus, the 4.6 kb regulatory region of MpKNOX1 was seamlessly cloned (NEBuilder, NEB) into the HindIII/XbaI site 5' of the Gateway cassette of pMpGWB401 (Addgene entry #68666). The reversely transcribed MpSUK1 locus (2.7 kb) was seamlessly cloned into either the SacI site or AscI site (5' or 3' of the NOS terminator). Subsequently, the GUS reporter gene was recombined using pENTR-GUS (Invitrogen). For translational fusions of MpBELL2, MpKNOX1A, and MpKNOX1B, the regulatory sequence 5' upstream of the transcriptional start site including the genomic gene locus and excluding the stop codon was amplified and cloned into pRITA, a shuttle vector featuring the GUS reporter gene and NOS terminator. MpBELL2 and MpKNOX1A were cloned into the KpnI or SalI/KpnI site, respectively. MpKNOX1B was seamlessly cloned using Infusion cloning (Clontech). 
All constructs were subsequently transferred to the binary vector pHART using NotI.

GUS staining
Generally, detection of GUS activity in a minimum of three independent lines was performed overnight at 37 °C in GUS staining solution (0.05 M NaPO4, pH 7.
Complete or partial coding nucleotide sequences were manually aligned as amino acid translations using Se-Al v2.0a11 for Macintosh (http://tree.bio.ed.ac.uk/software/seal/). We excluded ambiguously aligned sequences to produce an alignment of 231 nucleotides (77 amino acids) for 132 BELL sequences. Alignments of KNOX genes included the homeodomain, MEINOX (KNOX), and ELK domains (Joo et al., 2018), comprising 630 nucleotides (210 amino acids) for 124 KNOX sequences. Alignments of nucleotides and amino acids were employed in the subsequent Bayesian analysis. Bayesian phylogenetic analysis was performed using MrBayes 3.2.1. The Bayesian analyses for the nucleotide data sets were run for 15,000,000 (BELL) or 5,000,000 (KNOX) generations, which was sufficient for convergence of the two simultaneous runs (BELL, 0.0357; KNOX, 0.0423). In both cases, to allow for the burn-in phase, 50% of the total number of saved trees were discarded. The graphic representation of the trees was generated using FigTree (version 1.4.0) (http://tree.bio.ed.ac.uk/software/figtree/). Sequence alignments and command files used to run the Bayesian phylogenetic analyses can be provided upon request.

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
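
The alignment sizes and burn-in described for the Bayesian analyses above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' command files: it confirms that each nucleotide alignment is three times its amino acid length and estimates how many trees remain after discarding 50% as burn-in. The sampling interval (`ASSUMED_SAMPLE_EVERY`) is a hypothetical value chosen for illustration, because the text does not report it, and the function name `summarize` is likewise an assumption.

```python
# Alignment sizes and run lengths are taken from the text.
# ASSUMED_SAMPLE_EVERY is hypothetical: the sampling interval is not reported.
ANALYSES = {
    "BELL": {"amino_acids": 77, "nucleotides": 231, "generations": 15_000_000},
    "KNOX": {"amino_acids": 210, "nucleotides": 630, "generations": 5_000_000},
}
ASSUMED_SAMPLE_EVERY = 1_000  # generations between saved trees (assumption)
BURN_IN_FRACTION = 0.5        # 50% of saved trees discarded, as stated in the text


def summarize(name: str, sample_every: int = ASSUMED_SAMPLE_EVERY) -> dict:
    """Check codon-alignment length and count trees kept after burn-in."""
    analysis = ANALYSES[name]
    # A codon alignment has exactly three nucleotide columns per amino acid.
    codon_length_ok = 3 * analysis["amino_acids"] == analysis["nucleotides"]
    saved = analysis["generations"] // sample_every
    discarded = int(saved * BURN_IN_FRACTION)
    return {
        "codon_length_ok": codon_length_ok,
        "saved": saved,
        "discarded": discarded,
        "kept": saved - discarded,
    }


if __name__ == "__main__":
    for name in ANALYSES:
        print(name, summarize(name))
```

Under the assumed interval of 1,000 generations, the BELL run would save 15,000 trees and keep 7,500, and the KNOX run would save 5,000 and keep 2,500; a different interval scales these counts proportionally.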

Additional files
Supplementary files
• Supplementary file 1. Primers and guide RNAs used in this study. Table 1a. Primer name, gene number, sequence, and the corresponding figure used for sqRT-PCR and RACE. Table 1b. Primer name, gene name, and sequence used for in situ probe preparation. Table 1c. Primer name, gene name, and sequence used for overexpression, inducible overexpression, and transcriptional and translational β-glucuronidase reporter constructs. Table 1d. Guide RNA sequences used to generate mutant alleles.
• Transparent reporting form

Data availability
All data generated or analysed during this study are included in the manuscript and supporting files.