MYB-related transcription factors control chloroplast biogenesis

Chloroplast biogenesis is dependent on master regulators from the GOLDEN2-LIKE (GLK) family of transcription factors. However, glk mutants contain residual chlorophyll, indicating that other proteins must be involved. Here, we identify MYB-related transcription factors as regulators of chloroplast biogenesis in the liverwort Marchantia polymorpha and the angiosperm Arabidopsis thaliana. In both species, double-mutant alleles in MYB-related genes show very limited chloroplast development, and photosynthesis gene expression is perturbed to a greater extent than in GLK mutants. Genes encoding enzymes of chlorophyll biosynthesis are controlled by MYB-related and GLK proteins, whereas those allowing CO2 fixation, photorespiration, and photosystem assembly and repair require MYB-related proteins. Regulation between the MYB-related and GLK transcription factors appears more extensive in A. thaliana than in M. polymorpha. Thus, MYB-related and GLK genes have overlapping as well as distinct targets. We conclude that MYB-related and GLK transcription factors orchestrate chloroplast development in land plants.


In brief
Analysis of a liverwort with a streamlined genome identified MYB-related transcription factors as regulators of chloroplast development in land plants.

INTRODUCTION
Photosynthesis is fundamental to life, and in eukaryotes, it takes place in organelles known as chloroplasts. [2][3] Since then, significant elaborations to the control of photosynthesis gene expression have taken place. For example, in plants, the majority of genes allowing chloroplast biogenesis are encoded in the nucleus such that thousands are post-translationally imported into the chloroplast. 4,5 Improving photosynthetic efficiency is considered an important target for crop improvement and, thus, global food security, 6,7 but to predictively engineer this key process, an improved knowledge of photosynthesis gene regulatory networks is required.
Expression of photosynthesis-associated nuclear genes is responsive to light as well as processes intrinsic to the cell. For example, in angiosperms, light is required for chloroplast formation, but hormones amplify or repress this response. 8 These exogenous and endogenous inputs are integrated by key transcriptional regulators, including the bZIP transcription factor elongated hypocotyl 5 (HY5) that acts antagonistically with phytochrome interacting factors (PIFs) to activate chloroplast development in the presence of light. [10][11][12][13] Yet, glk mutants in Arabidopsis thaliana (A. thaliana), 10 rice, 14 and also non-seed plants such as Physcomitrium patens 15 and Marchantia polymorpha 16 contain chlorophyll. Moreover, A. thaliana mutants lacking functional GLK and GATA genes are not albino. 9,17 In summary, other actors must allow assembly of the photosynthetic apparatus in the absence of these known regulators.
We sought to identify transcription factors operating alongside the master regulator GLK. As forward genetics has failed to identify such proteins, we reasoned that genetic redundancy had hindered their identification, and that analysis of a species with a more compact genome would circumvent this issue. Marchantia polymorpha has a streamlined genome with many transcription factors represented by one or two copies, and the dominant form of its lifecycle is haploid. 18 Although in M. polymorpha GATA orthologs are not required for greening, consistent with analysis of A. thaliana, Mpglk knockouts contain low but detectable levels of chlorophyll. 16 We therefore hypothesized that an unknown transcription factor conserved between M. polymorpha and A. thaliana acts alongside GLK to control chloroplast biogenesis during photomorphogenesis.
[Figure legend fragment: ... 20 and Marchantia polymorpha during spore germination 19 were examined. Transcription factors (TFs) upregulated in response to light in both ...]
After re-examination of publicly available RNA sequencing (RNA-seq) data, gene editing of transcription factors, and phenotypic analysis, we identify two RR-type myoblastoma related (RR-MYB; "R" refers to domain repeat) transcription factors as regulators of chloroplast biogenesis and photosynthesis gene expression in M. polymorpha and A. thaliana. In contrast to the GLK proteins that regulate expression of genes allowing chlorophyll biosynthesis and function of photosystems I and II, the RR-MYBs have a broader set of targets that extends to genes allowing CO2 fixation, photorespiration, photosystem assembly, and repair. We conclude that these proteins function as master regulators of chloroplast biogenesis and photosynthesis gene expression. The data have implications for understanding chloroplast biogenesis and photosynthesis as well as other processes taking place in plastids such as nitrogen and sulfur assimilation, the biosynthesis of amino acids, fatty acids, and carotenoids.

MpRR-MYB5 regulates chloroplast development synergistically with its paralog MpRR-MYB2
We interrogated publicly available gene expression data sampled during the transition from non-photosynthetic to photosynthetic growth in M. polymorpha 19 as well as A. thaliana (Figures 1A and 1B). 20 This identified 108 and 144 transcription factors upregulated after exposure to light in M. polymorpha and A. thaliana, respectively (Tables S1 and S2). Orthologs upregulated in both datasets with an unknown or chlorophyll-related annotation and belonging to a multigene family in A. thaliana were selected (Figures 1A and 1B). Fourteen candidates from M. polymorpha were identified (Figure 1B; Table S2). Two of these (MpGLK and MpGATA4) are homologs of known photosynthesis regulators in A. thaliana, 10,21 and MpGLK has a confirmed role in M. polymorpha. 16 The remainder included a number of B-box (BBX) domain proteins known to interact with the master regulator of photomorphogenesis HY5, 22 a homeobox-leucine zipper (HD-ZIP) protein whose Arabidopsis thaliana homeobox 17 (ATHB17) ortholog regulates photosynthesis-associated nuclear genes in response to abiotic stress, 23 a C2H2-type zinc finger transcription factor with unknown function in A. thaliana, and a MYB-related gene predicted to regulate photosynthesis gene expression. 24,25 We used M. polymorpha as a testbed by subjecting each of these candidates to CRISPR-Cas9-mediated editing. With the exception of MpGLK, which has previously been reported to lead to a pale phenotype when mutated, 16 only one other candidate had low chlorophyll. This was Mp5g11830, annotated as MpRR-MYB5 in the M. polymorpha genome database, but previously also referred to as a circadian clock associated-like (CCA1-like) RR-MYB-related transcription factor. 26,27
MpRR-MYB5 has a single paralog (MpRR-MYB2; Figures 1B and 1C) showing high similarity to MpRR-MYB5 at the amino acid level, with, for example, the CCA1-like/RR-Myb domain being 93% identical (Figure 1C). Mutant alleles of MpRR-MYB5 but not MpRR-MYB2 appeared pale (Figures 1D-1F, S1A, and S1B), and analysis of chlorophyll content confirmed this (Figure 1H). All lines in which insertions or deletions introduced premature stop codons in MpRR-MYB5 (Figure S1A) had 40%-50% less chlorophyll than controls (Figures 1E and 1H). The Mprr-myb5 mutant was complemented when MpRR-MYB5 was expressed from its own promoter (Figure S1C), indicating that the pale phenotype was unlikely to result from off-target Cas9 editing. Mutating MpRR-MYB5 and MpRR-MYB2 simultaneously (Figure S1D) led to extremely pale plants with chlorophyll content reduced by ~95% compared with controls (Figures 1G and 1H). Analysis of a time course associated with the first hours after exposure to light 19 indicated that transcripts derived from MpRR-MYB5, MpRR-MYB2, and MpGLK were induced by 24 h after light was perceived but that MpRR-MYB5 and MpRR-MYB2 expression was maintained for longer (Figure 1I). To test whether the photosynthetic apparatus was functional in the single Mprr-myb5 and double Mprr-myb5,2 mutants, we applied the inhibitor di-chlorophenyl di-methyl urea (DCMU) that blocks photosynthetic electron transport 28 and measured activity of photosystem II via chlorophyll fluorescence imaging (Figure 1J). Although the mutants had low levels of chlorophyll, the photosynthetic apparatus was operational, suggesting that any remaining chlorophyll was functionally associated with photosystem II reaction centers. There was a small but significant reduction in Fv/Fm in the double Mprr-myb5,2 mutant (Figure 1J), indicating compromised function in this genotype. Consistent with low chlorophyll levels in Mprr-myb5 and Mprr-myb5,2 mutants, chloroplasts were smaller and thylakoids underdeveloped compared with controls and Mprr-myb2 (Figures 1K-1S and S1E). Poorly developed chloroplasts from Mprr-myb5,2 mutants contained more starch (Figure 1T).
[Figure legend fragment: ... species with either unknown or related to chlorophyll function were retained. As we hypothesized that functional redundancy had hindered identification of such regulators via forward genetic screens, an additional criterion was that each family should be represented by multiple copies in A. thaliana.]
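The Fv/Fm parameter reported above is conventionally calculated from minimal (F0) and maximal (Fm) chlorophyll fluorescence of dark-adapted tissue as (Fm - F0)/Fm. A minimal sketch of that calculation follows; the function name and the fluorescence readings are hypothetical and are not taken from the study.

```python
def fv_fm(f0: float, fm: float) -> float:
    """Maximum quantum efficiency of photosystem II: Fv/Fm = (Fm - F0) / Fm."""
    if fm <= 0 or f0 < 0 or f0 > fm:
        raise ValueError("expected 0 <= F0 <= Fm and Fm > 0")
    return (fm - f0) / fm


# Hypothetical readings (arbitrary fluorescence units), for illustration only.
control = fv_fm(f0=200.0, fm=1000.0)
mutant = fv_fm(f0=300.0, fm=1000.0)
print(f"control Fv/Fm = {control:.2f}, mutant Fv/Fm = {mutant:.2f}")
```

A lower value in a mutant than in the control, as in the made-up numbers here, is the kind of difference described for the double mutant.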
To determine whether MpRR-MYB5 and MpRR-MYB2 limit greening, we generated over-expression lines driven by the strong MpUBE2 constitutive promoter 29 and used eGFP to mark the plasma membrane (Figures S2A-S2E). Although quantitative polymerase chain reactions confirmed each transgene was over-expressed (Figures S2F-S2N), plants appeared similar to controls, and there were no evident perturbations to chlorophyll content, chloroplast size, or morphology (Figures S2O-S2P). We conclude that MpRR-MYB5 and MpRR-MYB2 act redundantly and are necessary for chloroplast biogenesis, but in contrast with MpGLK, 16 they are not sufficient to activate this process. Moreover, in the absence of both MpRR-MYB5 and MpRR-MYB2, assembly of the photosynthetic apparatus was very limited.

MpRR-MYB5 and MpRR-MYB2 act with MpGLK to control chloroplast biogenesis
As double Mprr-myb5,2 mutants showed residual chloroplast development, we hypothesized that their limited photoautotrophic growth was associated with activity of the previously characterized master regulator GLK. To test this, we attempted to generate higher-order mutants that combined mutant alleles of Mpglk, Mprr-myb5, and Mprr-myb2. We were able to knock out MpRR-MYB2 or MpRR-MYB5 in the presence of Mpglk mutant alleles (Figures S3A-S3C), and these double mutants were similar to or paler than the Mpglk mutant (Figures 2A-2E and S3D). Although previous analysis indicated that GATAs are not required for chloroplast biogenesis in M. polymorpha, 16 we produced double Mpgata4,rr-myb5 mutants. Chlorophyll content in these lines was similar to that of Mprr-myb5, and they showed perturbations to thallus morphology similar to those evident in Mpgata4 mutants (Figures S3E and S3F). This implies that MpGATA4 is unlikely to control chloroplast development with MpRR-MYB5. Application of DCMU confirmed that photosystem II was functional in Mpglk,rr-myb5 double mutants (Figure 2F). Chloroplasts were smaller and contained fewer thylakoid membranes with reduced granal stacking in Mpglk,rr-myb5 compared with each single mutant (Figures 2G-2O). Thus, in the absence of both MpRR-MYB5 and MpGLK, very limited assembly of the photosynthetic apparatus took place.
We were unable to generate triple Mpglk,rr-myb5,2 mutants, implying that this allelic combination could be lethal. For example, after super-transforming Mpglk,rr-myb5 mutants with a vector allowing expression of the same guide RNA used to generate the Mprr-myb2 mutants reported above, 91 lines were obtained. However, none were paler than Mpglk,rr-myb5 mutants, and when genotyped, 86 lines had no edits in MpRR-MYB2. Of the five lines that were edited in MpRR-MYB2 (as well as MpGLK and MpRR-MYB5), mutations had limited impact on the MpRR-MYB2 protein, with reading frame maintained and amino acids being modified in a poorly conserved region of the protein (Figures S3G and S3H). By contrast, when the original Mprr-myb2 mutants were identified (Figure 1F), more than 60% of plants contained mutations that introduced early stop codons or disturbed reading frame (Data S1). We propose that absence of all three proteins (MpGLK, MpRR-MYB5, and MpRR-MYB2) is likely lethal, possibly due to the lack of chloroplast biogenesis.

MpRR-MYB transcription factors regulate genes allowing carbon fixation, photorespiration, and photosystem function
To provide insight into the types of genes regulated by MpRR-MYB5 and MpRR-MYB2, we performed RNA-seq of over-expressing lines as well as single and double mutants. Over-expression of MpRR-MYB2 and MpRR-MYB5 led to upregulation of 71 and 11 genes, respectively (padj ≤ 0.01, log2 FC ≥ 1) (Figures S4A and S4B; Data S1), and there was limited overlap between these two datasets (Figures 3A, S4C, and S4D). By comparison, over-expression of MpGLK led to the upregulation of 493 genes (Figure 3A and Yelina et al. 16).
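The thresholds quoted above (padj ≤ 0.01 and a log2 fold change of at least 1) can be applied to a differential-expression table as sketched below. This is a minimal illustration, not the authors' pipeline: the record layout, function name, and gene identifiers are hypothetical, and it is assumed that downregulated genes are called with the mirrored cutoff (log2 FC ≤ -1).

```python
from typing import Iterable, List, NamedTuple, Tuple


class DEResult(NamedTuple):
    gene: str      # gene identifier (hypothetical IDs below)
    log2fc: float  # log2 fold change, mutant or over-expressor vs. control
    padj: float    # multiple-testing adjusted p value


def split_de(results: Iterable[DEResult],
             padj_max: float = 0.01,
             lfc_min: float = 1.0) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) gene IDs passing both cutoffs."""
    up, down = [], []
    for r in results:
        if r.padj > padj_max:
            continue  # not significant at the chosen adjusted p value
        if r.log2fc >= lfc_min:
            up.append(r.gene)
        elif r.log2fc <= -lfc_min:
            down.append(r.gene)
    return up, down


# Made-up example rows, for illustration only.
table = [
    DEResult("geneA", 2.3, 0.0004),
    DEResult("geneB", -1.7, 0.002),
    DEResult("geneC", 1.5, 0.2),     # fails padj cutoff
    DEResult("geneD", -0.4, 0.001),  # fails fold-change cutoff
]
up, down = split_de(table)
print(up, down)
```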
In loss-of-function mutants for MpRR-MYB2 or MpRR-MYB5, 65 and 823 genes showed reductions in transcript abundance, respectively, compared with controls (padj ≤ 0.01, log2 FC ≥ 1) (Figures 3B, S4E, and S4F; Data S1). Knocking out MpGLK had greater impact with 1,065 genes being downregulated (Figures 3B and S4G). In double Mpglk,rr-myb5 mutants (Figure S4H), 1,161 genes had lower transcript abundance than controls, and in the double Mprr-myb5,2 mutants, this was increased to 1,744 (Figures 3B and S4I). The largest overlap between genotypes (524 genes) was detected for Mpglk,rr-myb5 and Mprr-myb5,2 mutants (Figures 3B and S4J), further supporting synergistic action of MpRR-MYBs and MpGLK. Gene Ontology (GO) terms were used to provide insight into classes of genes impacted by over-expression or loss of function of MpRR-MYBs and MpGLK. Consistent with the lack of detectable phenotype after over-expression of MpRR-MYB5 or loss of MpRR-MYB2 function, no distinct GO terms were impacted in these lines. When genes downregulated in Mprr-myb5 and Mpglk mutants were assessed, the response to oxidative stress and hydrogen peroxide catabolism terms were over-represented (Figure 3C; Data S2). Mprr-myb5 mutants also showed changes to protein phosphorylation (Figure 3C; Data S2), while in Mpglk photosynthesis, light harvesting and chlorophyll biosynthesis terms were affected (Figure 3C). It was notable that for genes downregulated in the Mpglk, Mpglk,rr-myb5, and Mprr-myb5,2 mutants, similar terms were over-represented (Figure 3C; Data S2). For genes upregulated in Mpglk mutants, DNA replication-related GO terms were over-represented, whereas response to auxin and transmembrane transport were affected in Mprr-myb5,2 and Mpglk,rr-myb5 mutants, respectively (Figure S4K; Data S2). Transcripts associated with proteins operating in similar cellular locations were impacted in Mpglk, Mpglk,rr-myb5, and Mprr-myb5,2 mutants, including photosystem I, thylakoid membranes, and also the ribosome (Data S2). In the case of Mprr-myb5,2, however, a more extensive effect on chloroplast and photosystem-related GO terms was evident. For upregulated genes, no GO terms were impacted in the Mprr-myb5,2 and Mpglk,rr-myb5 mutants, while for Mpglk, the nucleosome GO term was most impacted (Data S2). Loss of function of either the MpRR-MYBs or MpGLK therefore primarily caused changes in overlapping GO terms, exemplified by those associated with photosynthesis, responses to oxidative stress, and hydrogen peroxide catabolic processes (Data S2). However, some terms were specific to each mutant, with, for example, iron ion transmembrane transport, translation, and translational elongation being enriched in Mprr-myb5,2, while the L-phenylalanine catabolic process term was specific to Mpglk (Data S2). Overall, loss-of-function Mpglk, Mpglk,rr-myb5, and Mprr-myb5,2 mutants showed changes in GO terms primarily associated with photosynthesis (Data S2). For the common impacted GO categories, ~40%-60% of genes were co-regulated by both MpRR-MYBs and MpGLK (Data S2).
Since chlorophyll content was reduced in Mprr-myb5 and Mprr-myb5,2 mutants, we examined transcript abundance of the nineteen annotated chlorophyll biosynthesis genes (Figure 3D). With the exception of HEMA (encoding the glutamyl-tRNA reductase) in Mprr-myb5 mutants, knocking out either MpRR-MYB5 or MpRR-MYB2 did not significantly affect transcript abundance. By contrast, in Mpglk mutants, transcript abundance from seventeen of the genes was reduced, and in Mprr-myb5,2 all nineteen genes were significantly downregulated (Figure 3D). The effect on transcript abundance was more pronounced in Mprr-myb5,2 than in Mpglk,rr-myb2 (Figure 3D). We next examined the impact of loss of the MpRR-MYBs on approximately 200 other genes annotated as photosynthesis-related (Data S1). This included genes associated with CO2 fixation, the light-harvesting apparatus, and their assembly and repair. In the Mprr-myb5 and Mprr-myb2 mutant alleles, there was a limited effect on photosynthesis-associated genes (Figures 3E, 4A, and 4B). For example, in Mprr-myb2, expression of only petE (Mp4g02720) was significantly perturbed (log2 FC ≥ 1) (Data S1). In Mprr-myb5 mutants, a small number of genes were impacted, including those encoding a small subunit of RuBisCO (RbcS) (Mp4g09890) and a chlorophyll a/b binding protein (Mp7g05530) (Figure 3E; Data S1). As expected, changes to photosynthesis transcripts were more evident in Mpglk (Figures 3E, 4A, and 4B) and even more severe when both MpRR-MYB5 and MpGLK were absent (Figures 3E, 4A, and 4B). Strikingly, when MpRR-MYB2 and MpRR-MYB5 were simultaneously knocked out, the effect on photosynthesis-associated genes was extensive and more widespread than in Mpglk,rr-myb5. For example, in Mprr-myb5,2 double mutants, the majority of genes encoding enzymes involved in the Calvin-Benson-Bassham cycle and photorespiration were downregulated (Figures 3E and S4L). Moreover, genes encoding components of both photosystems and their respective light-harvesting complexes as well as the cytochrome b6f complex were downregulated (Figures 4A and 4B). We also found that genes associated with assembly of RuBisCO, non-photochemical quenching, as well as granal stacking and repair of photosystem II, were impacted in Mprr-myb5,2 (Figure S4M). This contrasts with the Mpglk single mutant, where the greatest changes were associated with genes encoding components of the photosystems and their light-harvesting complexes (Figures 4A and 4B). For genes that were less expressed in Mpglk, e.g., those associated with RuBisCO assembly, granal stacking, or repair of PSII, the level of perturbation was much lower than in Mprr-myb5,2 (Figure S4M; Data S1). Immunoblotting demonstrated that LHCA, PsbA, PsaC, and PsbS proteins were less abundant in Mpglk and Mprr-myb5 mutants, and in the double Mprr-myb5,2, this effect was enhanced (Figure 4C).

MpRR-MYBs can activate photosynthesis genes
The analysis above indicated that loss of MpRR-MYBs had widespread impact on photosynthesis gene expression that overlapped with that seen in Mpglk. However, it was not clear if these effects were associated with direct interaction between the MpRR-MYB transcription factors and these structural genes. We therefore used DNA affinity purification and sequencing (DAP-seq) 32 to identify genome-wide binding sites of MpRR-MYB2 and MpRR-MYB5 and the DNA sequence motifs that they bind. This identified a total of 6,804 and 2,839 binding sites for MpRR-MYB2 and MpRR-MYB5, respectively (Data S3). 43% of MpRR-MYB2 and 35% of MpRR-MYB5 binding sites were located within 3 kb upstream of predicted translational start sites (Figure 4D). For MpRR-MYB2 and MpRR-MYB5, 890 and 301 peaks mapping to promoter regions, respectively, were associated with genes misregulated in the Mprr-myb5,2 mutant (Figure 4D). Fisher's exact test and permutation testing indicated that this overlap was significantly greater than would be expected by chance (Figure S5A). Motif enrichment analysis for MpRR-MYB2 and MpRR-MYB5 binding sites showed that they shared a TTATC consensus (Figure 4E).
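To illustrate how an overlap between bound and misregulated genes can be compared with chance, the sketch below computes the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table. It is a minimal example under stated assumptions: the function name and the gene counts are hypothetical, and the background gene universe used in the study is not specified here.

```python
from math import comb


def overlap_pvalue(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """P(X >= overlap) when set_b genes are drawn at random from `universe`
    genes, of which set_a are marked (e.g. bound by a transcription factor)."""
    if not (0 <= overlap <= min(set_a, set_b) <= universe):
        raise ValueError("inconsistent set sizes")
    max_k = min(set_a, set_b)
    # Sum exact integer counts first, then divide once to limit rounding error.
    favourable = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_k + 1)
    )
    return favourable / comb(universe, set_b)


# Hypothetical counts, for illustration only: 20,000 genes in the genome,
# 1,000 bound genes, 1,500 misregulated genes, 150 genes in both sets.
p = overlap_pvalue(universe=20_000, set_a=1_000, set_b=1_500, overlap=150)
print(f"one-sided p = {p:.3g}")
```

With these made-up numbers the expected overlap is 75 genes, so observing 150 gives a very small p value; a permutation test addresses the same question by resampling gene sets instead of using the closed-form distribution.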
We next asked whether the MpRR-MYB consensus binding site was more enriched in photosynthesis genes than would be expected by chance. Fisher's exact test and permutation analysis indicated that while the GLK binding site was not over-represented, there was a small but statistically significant over-representation of the RR-MYB consensus site in promoters of photosynthesis genes (Figure 4F). MpRR-MYB5 and MpRR-MYB2 were able to bind promoters of photosynthesis genes (22 and 10 genes, respectively; Data S3), including MpPsbQ, MpPsaD, and MpLHCB1 (Figure 4G). Moreover, when MpRR-MYB5 was used in trans-activation assays as an effector, it was sufficient to activate expression from all three genes (Figure 4G) as well as MpPsaN and MpPetA (Figure S5B). MpRR-MYB2 activated MpLHCB1 (Figure 4G). MpRR-MYB5 also activated expression from MpDVR (encoding divinyl chlorophyllide a 8-vinyl-reductase) and MpPORA (encoding protochlorophyllide oxidoreductase A), both of which encode enzymes allowing chlorophyll biosynthesis (Figure S5B). Trans-activation assays showed that MpGLK was sufficient to activate both MpRR-MYB5 and MpRR-MYB2 (Figure 4H). However, there was no strong evidence that MpRR-MYB2 bound its own promoter or activated it (Figure 4H), and while both MpRR-MYB2 and MpRR-MYB5 appeared able to bind the promoter of MpRR-MYB5, neither activated its expression (Figure 4H). It is possible that in vivo other factors interact with MpRR-MYB5 and MpRR-MYB2 to control their expression to allow co-ordination with MpGLK. In support of this, analysis of the RNA-seq data indicated that MpRR-MYBs and MpGLK may be linked in a gene regulatory network. For example, in Mpglk, transcripts derived from MpRR-MYB5 and MpRR-MYB2 were upregulated (Figure 4I), and MpGLK transcripts were downregulated in Mprr-myb5,2 (Figure 4I). We note an MpRR-MYB binding site in the sixth intron of MpGLK (Figure S5C). Overall, DAP-seq and effector assays provide evidence that MpRR-MYBs can bind promoters of photosynthesis genes and that they are sufficient to enhance expression from these targets.
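As an illustration of how occurrences of the TTATC consensus might be located in a promoter, the sketch below scans a sequence on both strands by also searching for the reverse complement (GATAA). It is a simplified exact-match search rather than the position-weight-matrix scoring typically used for DAP-seq motifs; the function names and the example sequence are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_hits(seq: str, motif: str = "TTATC") -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of `motif`,
    including overlapping matches, on the forward (+) and reverse (-) strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Made-up promoter fragment, for illustration only.
promoter = "AATTATCGGGATAACC"
print(motif_hits(promoter))
```

Counting such hits per promoter for photosynthesis genes and for a background gene set gives the contingency table on which an enrichment test can be run.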

RR-MYBs control chloroplast biogenesis in A. thaliana
The RR-MYB/CCA1-like subfamily of MYB-related transcription factors 27,33,34 containing MpRR-MYB5 and MpRR-MYB2 is characterized by a conserved SHAQK(Y/F)F motif (Figure S6A). We identified eleven members of this group in A. thaliana (Figures 6A and S6B-S6E) of which AtMYBS1, AtMYBS2, and AT5G23650 were the closest homologs of MpRR-MYB5 and MpRR-MYB2 (Figures S6B-S6E). Re-analysis of publicly available data indicated that AT5G23650 is not expressed in photosynthetic tissues, so we focused our analysis on AtMYBS1 and AtMYBS2. Due to the functional redundancy evident for MpRR-MYB5 and MpRR-MYB2 above, double Atmybs1,mybs2 mutants were identified after CRISPR-Cas9-mediated gene editing (Figures 6B, S7A, and S7B) and analyzed in parallel with previously generated single Atmybs1 and Atmybs2 mutants. There were no detectable changes to rosette phenotype in the single mutants, but Atmybs1,mybs2 mutants were pale (Figures 6C-6F), and this was most noticeable after bolting (Figures 6G and 6H). There were no detectable changes in chloroplast size or number in mesophyll cells of single mutants, but chloroplasts were smaller in Atmybs1,mybs2 (Figures 6C-6F and 6I). Chloroplasts of Atmybs1,mybs2 contained underdeveloped thylakoids (Figure 6J). Consistent with these findings, chlorophyll content in the single mutants was indistinguishable from wild type but was ~40% lower in the double Atmybs1,mybs2 mutant (Figure 6K). Unfolding of the apical hook and greening of cotyledons were evident in both Atglk1,glk2 and Atmybs1,mybs2 double mutants, but greening appeared slower than in controls (Figure 6L). For the first 12 h after light was perceived, chlorophyll content was indistinguishable in Atglk1,glk2 and Atmybs1,mybs2 double mutants, but by 24 h Atglk1,glk2 had less chlorophyll than Atmybs1,mybs2 (Figure 6M). Fv/Fm was reduced for the first 6 h of light in both mutant backgrounds, but this was more apparent in Atmybs1,mybs2 (Figure 6N). AtGLK1 showed an increase in transcript abundance by 30 min of light that was maintained over 24 h (Figure 6O). By contrast, transcripts derived from AtMYBS2 had a higher abundance before exposure to light (Figures 6O and S6F).
Consensus binding sites for AtMYBS1&2 have previously been defined from DAP-seq, 35 and along with those we defined for M. polymorpha, they also comprise a TTATC core (Figure 7A). Fisher's exact and permutation tests indicated that this motif is more common in promoters of photosynthesis genes than would be expected by chance (Figure 7B). RNA-seq of Atmybs1,mybs2 (Data S1) identified 2,470 transcripts that were less abundant compared with controls (Figure 7C; padj ≤ 0.01, LFC ≥ 0.5), and of these, 122 can be bound by AtMYBS1 or AtMYBS2 (Figure 7C; Data S1; 35). Fisher's exact test and permutation analysis indicated that this is a greater overlap than expected by chance (Figure S7C). Genes downregulated in the Atmybs1,mybs2 mutant compared with wild type (Figures S7D and S7E) included those encoding enzymes of the chlorophyll biosynthesis pathway and components of the light-harvesting complexes as well as photorespiration and the Calvin-Benson-Bassham cycle (Figures 7D and S7F). We selected six photosynthesis genes that appeared to be under control of AtMYBS1 and AtMYBS2 for testing in effector assays.
In each case, either AtMYBS1 or AtMYBS2 was able to activate expression (Figure 7E). Re-examination of publicly available DAP-seq data 35 showed that AtMYBS1, AtMYBS2, or both bind strongly to the promoters of these photosynthesis genes as well as to the promoters of 49 additional photosynthesis genes (Figure S7G; Data S3). We also tested whether AtMYBS1 or AtMYBS2 can activate themselves as well as AtGLK1 and AtGLK2. Although the magnitude of response varied, all four transcription factors could activate AtMYBS1 and AtMYBS2 (Figure 7F). Moreover, AtGLK2 and both AtMYBS proteins activated AtGLK1, and AtMYBS1 and AtMYBS2 activated AtGLK2 (Figure 7F). This capacity for AtMYBS1 and AtMYBS2 to recognize their own promoters as well as AtGLK1 is consistent with binding detected from DAP-seq (Figure S7H). We note that no binding of AtMYBS1 or AtMYBS2 to the AtGLK2 promoter was detected (Figure S7H), despite their capacity to activate this promoter in effector assays, perhaps due to low affinity binding. As in M. polymorpha, transcript profiling indicated that more complex links likely operate between the AtRR-MYBs and AtGLK in vivo, with AtGLK2 being upregulated in the double Atmybs1,mybs2 mutant (Figure 7G). Of the 98 genes downregulated in both Atmybs1,mybs2 and Atglk1,glk2, 25 more were shared than would be expected by chance (Figures 7H and S7I; Data S1), and of these, 27 were related to photosynthesis, encoding, for example, components of the light-harvesting complexes and the photosystems (Figure S7J; Data S1). Again, consistent with a complex regulatory network, there was no clear correlation between expression profiles of these 98 genes and the RR-MYBs and GLKs after exposure to light (Figure S7K). Based on all the data above, we therefore propose that AtMYBS1&2 and AtGLK1&2 have the capacity to control chloroplast biogenesis in A. thaliana, to bind and regulate each other, and that in planta, this is likely part of a complex regulatory framework composed of both direct and indirect links allowing photosynthesis gene expression to be tuned to the environment.
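The permutation analyses referred to in this section ask how often a randomly drawn gene set of the same size overlaps a reference set at least as much as the observed set does. A minimal sketch of one such test is given below; it is not the authors' implementation, and the function name, gene identifiers, set sizes, and number of permutations are all hypothetical.

```python
import random
from typing import Iterable, Sequence, Tuple


def permutation_overlap_p(universe: Sequence[str],
                          set_a: Iterable[str],
                          set_b: Iterable[str],
                          n_perm: int = 2000,
                          seed: int = 0) -> Tuple[int, float]:
    """Empirical one-sided p value for the overlap between set_a and set_b.

    set_b is replaced n_perm times by a random sample of equal size from
    `universe`; the p value is the fraction of samples whose overlap with
    set_a is at least the observed overlap (with a +1 correction)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    a, b = set(set_a), set(set_b)
    observed = len(a & b)
    genes = list(universe)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(b))
        if sum(g in a for g in sample) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Made-up gene sets, for illustration only.
universe = [f"g{i}" for i in range(1000)]
downregulated = universe[:50]
bound = universe[:30] + universe[500:510]
observed, p = permutation_overlap_p(universe, downregulated, bound)
print(observed, p)
```

The +1 correction keeps the empirical p value from being reported as exactly zero when no permuted set reaches the observed overlap.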

DISCUSSION
[37][38][39][40] Moreover, a targeted reengineering of the process could contribute to crop development. 6,7 Indeed, improvements in yield have been reported after over-expression of sedoheptulose bisphosphatase, 41,42 faster relaxation of non-photochemical quenching of photosystem II, 43 and rerouting of photorespiration. 44 A complementary approach to improving photosynthesis predicted to increase yield by up to 50% would be to convert C3 crops to use the more efficient C4 pathway. 45,46 Introducing C4 photosynthesis in C3 crops such as rice would require a remodeling of chloroplast biogenesis in mesophyll and bundle sheath cells. 47 In land plants, the GLK family of transcription factors are master regulators of chloroplast biogenesis, and CGA1 and GNC from the GATA family are considered ancillary players. 8,10,48 Over-expression of GLK in rice is sufficient to increase chloroplast occupancy of cells such as the bundle sheath and thus to partially phenocopy traits associated with the efficient C4 pathway. 38 However, we have an incomplete understanding of transcription factors allowing chloroplast development. For example, GLK and CGA1/GNC loss-of-function mutants still possess small chloroplasts, 9 indicating that either these mutants are hypomorphic or that additional unidentified actors control chloroplast biogenesis. Here, we report two RR-MYB-related transcription factors that act redundantly to control chlorophyll biosynthesis and photosynthesis-associated gene expression in the bryophyte M. polymorpha. Homologs control chloroplast biogenesis in A. thaliana, indicating functional conservation between these species. Interestingly, we were unable to identify null mutants lacking both GLK and RR-MYB in M. polymorpha. Indeed, although we used super-transformation of existing mutant alleles when attempting to generate triple mutants, we did not observe white sectors. Our inability to recover triple Mpglk,rr-myb5,2 mutants mirrors attempts to generate loss-of-function mutations in plastidial pathways such as amino acid, vitamin, nucleotide, or fatty acid biosynthesis and those involved in chloroplast protein translation that result in an arrest of embryo development in A. thaliana. 49,50 This often appears to coincide with the globular-to-heart transition stage when chloroplasts start to differentiate. Mutants in genes encoding plastidial proteins required for import, modification, and localization of indispensable proteins in the chloroplast are also often associated with embryo lethality. 49,50 It is therefore possible that plants lacking both GLK and the RR-MYBs are unable to differentiate chloroplasts from proplastids. We did not detect any effect on roots or rhizoids in the mutants that we analyzed, and so at present, we have no evidence that GLKs or RR-MYBs impact plastid development in non-photosynthetic organs.
[Figure legend fragment: See also Figure S7.]
The precise architecture of the gene regulatory network involving the RR-MYBs and GLK will need to be fully elucidated. Our current data indicate that in both M. polymorpha and A. thaliana, there is overlap in the types of photosynthesis genes controlled by RR-MYBs and GLKs. However, the RR-MYBs appear to regulate a broader set of targets, allowing the response of genes associated with carbon fixation, photorespiration, and repair of protein complexes to be regulated during photomorphogenesis. Although MpRR-MYB5 over-expression failed to upregulate MpGLK transcripts, MYB transcription factors commonly act in multimeric complexes involving basic-helix-loop-helix (bHLH) and WD40 proteins 51 or with myelocytomatosis (MYC) proteins. 52 It is therefore possible that additional partners need to be over-expressed in combination with the RR-MYBs to increase expression of MpGLK. It is also well documented that GLK is subject to multiple levels of regulation that can be overcome when non-native versions of the gene are mis-expressed. 38 When MpGLK was mis-expressed in the double Mprr-myb5,2 mutant, although chlorophyll content remained low (90% of wild type), it was statistically significantly increased. We interpret these data in two ways. Either both classes of transcription factors are needed to drive full photosynthesis gene expression, or MpGLK is permissive for very early stages of chloroplast biogenesis, but full assembly of the photosynthetic apparatus is strengthened by RR-MYBs. A permissive role for GLK in initiating chloroplast biogenesis is consistent with its ability to convert normally non-photosynthetic cells to contain a large chloroplast compartment. 38 Our current data therefore support a model in which GLK permits early stages of chloroplast biogenesis, and RR-MYBs then extend targets to activate accumulation of additional photosynthesis transcripts. Although this conditioning can be partially overcome by over-expression of MpGLK in the presence of RR-MYBs, 16 in the absence of MpRR-MYBs, the impact of MpGLK over-expression is limited.
As would be expected from their evolutionary distance, rewiring has taken place between these transcription factors and the structural photosynthesis genes they target in A. thaliana and M. polymorpha. In A. thaliana, this regulatory system is more complex, with regulation of AtGLK by AtMYBS1&2 being more evident, and members of the GATA transcription factor family also controlling photosynthesis gene expression. Moreover, inducible over-expression of AtMYBS1 in A. thaliana has been reported to increase expression of photosynthesis genes. 24 Although similar sets of genes were downregulated in loss-of-function Mprr-myb5,2 mutants, we did not detect widespread upregulation of photosynthesis genes after over-expression of M. polymorpha RR-MYBs. Also consistent with rewiring between these species is the fact that, compared with Mprr-myb2, the pale phenotype of Mprr-myb5 mutants indicates that MpRR-MYB5 plays a dominant role in chloroplast biogenesis, while in A. thaliana, neither single mutant was pale.
The role of RR-MYBs in controlling chloroplast biogenesis is supported by previous work. For example, in tomato, LeMYBI has been reported to bind the promoter of the RBCS gene. 53 And, although no effect on chloroplast biogenesis was reported, reduced expression of the RBCS and chlorophyll a/b binding protein 1 (CAB1) genes has been reported in Atmybs1 mutants. 54 Along with transcription factors belonging to the GLK, BBX, and nuclear factor-Y families, random forest analysis of gene expression recently predicted that RR-MYBs regulate photosynthesis gene expression. 24 Moreover, an inducible AtMYBS1 over-expressor line showed upregulation of photosynthesis genes that we detected as downregulated in Atmybs1,mybs2.
Penetrance of the RR-MYBs on photosynthesis gene expression and chloroplast biogenesis in M. polymorpha was striking, with MpRR-MYBs activating genes allowing CO2 fixation as well as light harvesting. Chloroplasts of Mpmyb5,2 mutants were ~30% smaller than those of Mpglk mutants, and double Mpmyb5,2 mutants were paler than Mpglk mutants. This appears to be because RR-MYBs control an overlapping but broader set of photosynthesis genes than those downstream of GLK. Previous work supports the notion that these two classes of transcription factors have shared targets, as co-binding of RR-MYB and GLK to photosynthesis genes has been proposed. 55 Such cooperative binding of transcription factors is thought to allow a greater variety of expression outputs. The reach of the RR-MYBs appears extensive in that they control genes encoding enzymes of the Calvin-Benson-Bassham cycle and photorespiration but also assembly and repair of RuBisCO. It seems likely that the large number of genes targeted by the RR-MYBs, which encode a wide range of components underpinning photosynthesis, contributes to the severe perturbation to phenotype when their function is removed. Overall, the data are consistent with overlapping as well as distinct roles for these two classes of transcription factor.
In summary, from the analysis of M. polymorpha and A. thaliana, whose last common ancestor dates to around 400 million years ago, we propose a model in which both RR-MYBs and GLKs operate as master regulators of photosynthesis gene expression. In both species, the RR-MYBs play a conserved role in controlling photosynthesis gene expression, and their targets are broader than those documented for GLK. As RR-MYBs appear ubiquitous in land plants, 27 it seems plausible they play a conserved role in chloroplast biogenesis. Although we were unable to detect MpRR-MYBs in the Zygnematophyceae algae that are sister to the land plants, 56,57 they are in fact present in the Klebsormidiophyceae and Charophyceae 57,58 (Figure S6B), representing the other two most closely related algal lineages to land plants. GLK homologs are present in green algae, 56 and both GLK and RR-MYB motifs are present in promoter regions of these genes in K. flaccidum and C. braunii (Figure S7L). These data imply that RR-MYBs operated alongside GLK to control chloroplast biogenesis before the colonization of land by plants.

Limitations of the study
Identification of RR-MYBs as regulators of chloroplast biogenesis and photosynthesis gene expression opens up a number of areas to be addressed in the future. These include, for example, analyses such as chromatin immunoprecipitation sequencing (ChIP-seq) to define their targets in space and time and allow downstream networks to be quantitatively defined, and an understanding of how these networks are rewired in response to developmental and environmental signals. We also do not currently know how the RR-MYBs are regulated by the light signaling networks that act to initiate photosynthesis gene expression during de-etiolation, nor by the hormonal networks that tune chloroplast development in different cell types. Increased visibility of low chlorophyll in A. thaliana after reproductive growth implies that the RR-MYBs affect senescence, and so whether they interact with known players such as oresara 1 (ORE1) and Arabidopsis thaliana activating factor 1 (ATAF1) 59,60 will need to be defined. It will also be important to discover whether the RR-MYBs are integrated into retrograde signalling 61 from the chloroplast. Lastly, it remains to be determined which proteins the RR-MYBs interact with, how they do so, and whether they are post-translationally regulated. Defining such complexes will help understand not only the mode of action of these transcription factors but also how they themselves are regulated.

STAR★METHODS
Detailed methods are provided in the online version of this paper.
MpRR-MYB2 and MpRR-MYB5 coding sequences were synthesised and cloned into the pUAP4 vector. 29 For complementation, gRNA-resistant MpGLK, MpRR-MYB2 and MpRR-MYB5 coding sequences were synthesised and cloned into the pUAP4 vector. OpenPlant parts used are listed in STAR Methods. To generate Atmybs1,mybs2 mutants, two gRNAs per gene were cloned into pEn-Chimera 83 as described previously. 84 This placed guides into the pRU294 vector, which has a codon-optimised and intron-containing version of Cas9 driven by the egg-cell-specific pEC1.2 promoter. 85 A. thaliana was transformed by floral dipping 86 and genotyping performed as reported previously. 87 T2 plants with confirmed edits were analysed. For M. polymorpha transformation, 5 mL LB media were inoculated with 3-4 Agrobacterium colonies (GV3101: 50 µg/mL rifampicin, 25 µg/mL gentamicin) and the plasmid-specific selection antibiotic. The preculture was incubated at 28 °C for 2 days at 110 rpm.
5 mL of 2 d Agrobacterium culture were centrifuged for 7 min at 2,000 × g. The supernatant was removed and the pellet re-suspended in 5 mL liquid KNOP (0.25 g/L KH2PO4, 0.25 g/L KCl, 0.25 g/L MgSO4·7H2O, 1 g/L Ca(NO3)2·4H2O, 12.5 mg/L FeSO4·7H2O, 30 mM MES, pH 5.5) plus 1% (w/v) sucrose and 100 µM acetosyringone. The culture was then incubated with shaking (120 rpm) at 28 °C for 3-4 hours. Around 20 thallus fragments were transferred into a 6-well plate with 5 mL liquid KNOP medium supplemented with 1% (w/v) sucrose, 30 mM MES, pH 5.5, 80 µL of Agrobacterium culture and acetosyringone at a final concentration of 100 µM. Tissue was co-cultivated with Agrobacterium for 3 days on a shaker at 110 rpm, at 22 °C with ambient light. Using a sterile plastic pipette, liquid was removed from each well and the thallus fragments were transferred onto plates with growth media containing the appropriate antibiotic (Chlorsulfuron 0.5 µM and Cefotaxime 100 µg/mL). To facilitate spreading of thallus fragments, 1-2 mL sterile water was added to the petri dish. To genotype M. polymorpha, 3 × 3 mm pieces of thalli from individual plants were placed in 1.5 mL tubes and crushed in 100 µL genotyping buffer (100 mM Tris-HCl, 1 M KCl, and 10 mM EDTA, pH 9.5). Tubes were then placed at 70 °C for 15-20 min and 380 µL sterile water added to each tube. 5 µL aliquots of the extract were used as a template for polymerase chain reactions.
Chlorophyll content, fluorescence imaging, microscopy and starch quantification
For chlorophyll quantification in M. polymorpha, ~30-50 mg of 10-14 day old gemmalings were used with five biological replicates per genotype. Tissue was blotted dry before weighing and then transferred into a 1.5 mL microfuge tube containing 1 mL of dimethyl sulfoxide (DMSO) (D8418, Sigma Aldrich) and incubated in the dark at 65 °C for 45 minutes. Samples were allowed to cool to room temperature for approximately one hour. Chlorophyll content was then measured using a NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer (ThermoFisher) following the manufacturer's instructions. Chlorophyll fluorescence measurements were carried out using a CF imager (Technologica Ltd, UK). M. polymorpha plants were placed in the dark for 20 min and a minimum weak measuring light beam (<1 µmol m−2 s−1) applied to evaluate dark-adapted minimum fluorescence (Fo), and a subsequent saturating pulse of 3,000 µmol m−2 s−1 used to evaluate dark-adapted maximum fluorescence (Fm). A total of three plants were measured per genotype and treatment. 20 µM DCMU (#45463, Sigma Aldrich) was added to half-strength MS media, and thalli placed in DCMU for 24 h before chlorophyll fluorescence measurements were obtained.
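As a minimal sketch of how the two dark-adapted readings described above combine, the maximum quantum efficiency of photosystem II is conventionally calculated as Fv/Fm = (Fm − Fo)/Fm. The function name and the example readings below are hypothetical illustrations and are not taken from this study or from the imaging software.

```python
def fv_fm(f_o: float, f_m: float) -> float:
    """Return Fv/Fm = (Fm - Fo) / Fm from dark-adapted fluorescence readings."""
    if f_m <= 0:
        raise ValueError("Fm must be positive")
    if not 0 <= f_o <= f_m:
        raise ValueError("Fo must lie between 0 and Fm")
    return (f_m - f_o) / f_m


# Hypothetical readings in arbitrary fluorescence units (not data from this study).
example = fv_fm(f_o=200.0, f_m=1000.0)
print(f"Fv/Fm = {example:.2f}")
```

In this convention, a value approaching zero, as expected after DCMU treatment or in strongly impaired tissue, indicates that Fm is barely higher than Fo.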
To assess de-etiolation in A. thaliana, seeds were sown on plates, stratified at 4 °C in darkness for 3 days and then incubated at 22 °C for 3 days before transfer to continuous white light with an intensity of 100 µmol m−2 s−1. Chlorophyll was measured at 0, 1.5, 3, 6, 12 and 24 hours after exposure to light by pooling ~50 cotyledon pairs into a 1.5 mL microfuge tube containing 100 µL of dimethyl sulfoxide (DMSO) (D8418, Sigma Aldrich), with three biological replicates per time point. Tubes were incubated in the dark at 65 °C for 30 minutes, samples were allowed to cool to room temperature for approximately one hour and chlorophyll was quantified using a NanoDrop following the same procedure as described for M. polymorpha.
For confocal laser scanning microscopy of M. polymorpha, five to seven gemmae were placed within a medium-filled gene frame together with 30 µL water prior to being sealed with a cover slip. Plants were imaged immediately using a Leica SP8X spectral fluorescent confocal microscope with either a 10× air objective (HC PL APO 10×/0.40 CS2) or 20× air objective (HC PL APO 20×/0.75 CS2). Excitation laser wavelengths and captured emission windows were 488 nm and 498–516 nm for GFP, and 488 or 515 nm and 670–700 nm for chlorophyll autofluorescence. For electron microscopy, ~2 mm² sections of 5-6 individual 3-week-old thalli or 2-week-old leaves were harvested, fixed, embedded and imaged as previously described. 88 Chloroplast area was measured using ImageJ and the macro is deposited on Mendeley.
To quantify starch, M. polymorpha thallus tissue (300-400 mg) that had been grown for two weeks under continuous light with a light intensity of 100 µmol m−2 s−1 was harvested into tubes containing 2 metal beads, flash-frozen in liquid nitrogen, and ground using a tissue lyser (settings: 2:30 min; 28 Hz). 1,000 µL 0.7 M perchloric acid was added into the tubes prior to vigorous vortexing, grinding again in a tissue lyser (settings: 1:30 min; 28 Hz) and then being centrifuged for 5 minutes. The supernatant was transferred into a fresh tube, and 400 µL neutralisation buffer [2 M KOH, 400 mM MES] added to achieve pH 6-7. The precipitate was spun down for 4 minutes at maximum speed using a benchtop centrifuge. The pellet was washed with 1 mL of dH2O, resuspended by vortexing, and spun at 3,000 × g for 3 minutes. The pellet was then washed by adding 1 mL 80% (v/v) EtOH and resuspended by vortexing. A total of three ethanol washes were performed, or until the final wash was clear. Excess EtOH was evaporated from the pellet by leaving the tube open in a fume hood for 20 minutes. The pellet was resuspended in 400 µL of dH2O, and two aliquots of 200 µL were placed into 1.5 mL tubes and incubated at 100 °C for 15 minutes. Controls comprised 190 µL NaAc buffer and 10 µL H2O. Sample digests were set up by adding 190 µL 0.22 M Na-acetate buffer (pH 4.8) and 10 µL of a 9:1 mix of amyloglucosidase:α-amylase (Roche #10102857001 and Roche #10102814001). Reactions were incubated at 37 °C for 2 hours. Insoluble material was spun out (3 min, max speed). A master mix was prepared consisting of 50 mM HEPES-NaOH (pH 7.4-7.6), 1 mM MgCl2, 1 mM ATP, 1 mM NAD, and 1.4 units of hexokinase (Roche #11426362001). Reactions consisted of 50 µL master mix, 50 µL sample, 3 µL of glucose-6-phosphate (glucose-6-phosphate dehydrogenase (Roche #10165875001) diluted 4× in water) and dH2O to a final volume of 200 µL. The reaction was conducted in a GREINER 96 F-BOTTOM microtiter plate using a CLARIOstar plate reader. The initial absorbance at 340 nm was monitored. Using a multichannel pipette, 2 µL of diluted G-6-P dehydrogenase was dispensed. Immediately after, the absorbance was monitored at 340 nm for minutes.

RNA extraction and sequencing
For M. polymorpha, RNA was extracted from 3-4 two-week old gemmae using the RNeasy Plant kit (#74903, Qiagen) with RLT buffer supplemented with beta-mercaptoethanol, and residual genomic DNA removed using the Turbo DNA-free kit (#AM1907, Invitrogen). 500 ng of DNase-treated RNA was used as template for cDNA preparation (SuperScript™ IV First-Strand Synthesis System, #18091050, Invitrogen) according to the manufacturer's instructions except that reverse transcription was 40 minutes and used oligo(dT)18 primers. qPCR was performed using the SYBR Green JumpStart Taq Ready Mix (#S4438, Sigma Aldrich) and a CFX384 RT System (Bio-Rad) thermal cycler. cDNA was diluted six times and oligonucleotides (Table S3) used at a final concentration of 0.5 µM. Reaction conditions comprised initial denaturation at 94 °C for 2 minutes followed by 40 cycles of 94 °C for 15 seconds (denaturation) and 60 °C for 1 minute (annealing, extension, and fluorescence reading). Primer sequences are in Table S3. Library preparation and RNA sequencing was performed by Novogene (Cambridge, UK). Briefly, messenger RNA was purified from total RNA using poly-T oligo-attached magnetic beads. After fragmentation, first strand cDNA was synthesised using random hexamer primers. Library concentration was measured on a Qubit instrument using the manufacturer's procedure (Thermo Fisher Scientific) followed by real-time qPCR quantification. Library size distribution was analysed on a bioanalyzer (Agilent) following the manufacturer's protocol. Quantified libraries were pooled and sequenced on a NovaSeq PE150 Illumina platform and 6G raw data per sample obtained. Adapter sequences were: 5' Adapter: 5'-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT-3'. 3' Adapter: 5'-GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGATGACTATCTCGTATGCCGTCTTCTGCTTG-3'. FastQC was used to assess read quality and TrimGalore (https://doi.org/10.5281/zenodo.5127899) to remove low-quality reads and adapters. Reads were pseudo-aligned using Kallisto 73 to the M. polymorpha genome version 5 (transcripts only, obtained from MarpolBase). 89 Mapping statistics for each library are provided in Data S1. DGE analysis was performed with DESeq2, 63 with padj-values < 0.01.
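To illustrate what the padj < 0.01 threshold means in practice, the sketch below applies the Benjamini-Hochberg procedure, which is the default adjustment DESeq2 uses for its padj column, to a handful of raw p-values and then filters at 0.01. It is a simplified illustration only: the gene names and p-values are made up, and DESeq2's independent filtering step is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five transcripts (not data from this study).
raw = {"geneA": 0.0001, "geneB": 0.002, "geneC": 0.03, "geneD": 0.2, "geneE": 0.9}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [gene for gene, q in padj.items() if q < 0.01]
print(significant)
```

Because each adjusted value scales the raw p-value by the number of tests divided by its rank, a transcript with a raw p-value of 0.03 does not pass the 0.01 cut-off in this toy example.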
For A. thaliana, RNA was extracted from two-week old seedlings following the same pipeline. Library preparation and RNA sequencing was performed by Novogene (Cambridge, UK). FastQC was used to assess read quality and TrimGalore to remove low-quality reads and adapters. Reads were pseudo-aligned using Kallisto 73 to the A. thaliana TAIR10 genome (transcripts only). Mapping statistics for each library are provided in Data S1. DGE analysis was performed with DESeq2, 63 with padj-values < 0.01. For qPCR analysis, seeds were sown on plates and stratified at 4 °C in darkness for 3 days, prior to incubation at 22 °C for 3 days and transfer to continuous white light with an intensity of 100 µmol m−2 s−1. Approximately 70-80 cotyledon pairs were collected into 1.5 mL microfuge tubes at 0 h, 30 min, 3 h, 6 h, and 24 h after exposure to light. RNA was extracted using the RNeasy Plant kit (#74903, Qiagen) with RLT buffer supplemented with beta-mercaptoethanol, and residual genomic DNA removed using the Turbo DNA-free kit (#AM1907, Invitrogen). Primers used for qPCR are listed in Table S3.
Mapping genome-wide binding capacity of MpRR-MYB5 and MpRR-MYB2
M. polymorpha genomic DNA was extracted as described previously. 95 DAP-seq was performed by CD Genomics. Briefly, genomic DNA (gDNA, 5 µg) was fragmented to ~200 bp. Fragmented gDNA was purified (Cat# A63880, Beckman) and overhangs resulting from fragmentation converted into blunt ends using T4 DNA polymerase, Klenow Fragment, and T4 Polynucleotide Kinase. After adding an 'A' base to the 3' end of the blunt phosphorylated DNA fragments, adapters were ligated to the ends of the DNA fragments. The adapter-ligated gDNA fragments were purified using the Agencourt AMPure XP Kit (Cat# A63880, Beckman). For protein expression, the coding sequences of MpRR-MYB2 and MpRR-MYB5 were cloned into a pFN19K HaloTag T7 SP6 Flexi expression vector. Halo-MpRR-MYB2 or 5 fusion proteins were expressed using the TNT SP6 Coupled Wheat Germ Extract System (Promega) following the manufacturer's specifications for expression in a 50 µL reaction with 2 hours incubation at 37 °C.
Expressed proteins were directly captured using MagneHalo Tag Beads (Promega). The protein-bound beads were incubated with 50 ng of adapter-ligated gDNA fragments on a rotator for 1 hour at room temperature in 50 µL wash/bind buffer. Beads were washed three times using the same wash buffer to remove unbound DNA fragments. The HaloTag beads were resuspended in 30 µL of the elution buffer and heated to 98 °C for 10 min to denature protein and release the bound DNA fragments into solution. The supernatant was then transferred to a new well, and 25 µL used in a 50 µL PCR employing the KAPA HiFi HotStart ReadyMix PCR Kit (Roche, Basel, Switzerland) for 10 cycles. PCR primers consisted of the full-length Illumina TruSeq Universal primer (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) and an Illumina TruSeq Index primer (5′-CAAGCAGAAGACGGCATACGAGAT-NNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′) where NNNNNN represents the 6 bp sequence index used for sample identification. PCR products were purified and selected using the Agencourt AMPure XP Kit (Cat# A63880, Beckman), and resuspended in 20 µL nuclease-free water. DNA concentration was determined using a Qubit (Life Technologies, Burlington, Ontario, Canada). Eluted DNA fragments were sequenced on an Illumina NovaSeq6000 platform. Negative control mock (Input) DAP-seq libraries were prepared without the addition of protein to the beads. Data provided were then mapped against the M. polymorpha v6 genome or the TAIR10 A. thaliana genome using bowtie2 74 version 2.5.3, sorted and converted to bam format with samtools 75 version 1.19.2. Mapped reads were visualised using IGV_2.17.2, and peaks called with MACS3 77 version 3.0.0, with bedtools 76 used to retrieve genomic coordinates of peaks. Nearest genes around the peaks were retrieved using the ChIPseeker R package. 78
MpRR-MYB2 and MpRR-MYB5 binding sites were identified by motif enrichment analysis (using meme-chip-5.5.5) using the 250 bp flanking regions of the DAP-seq identified peaks. AtMYBS1 and MYBS2 binding sites (motifs MA1186.1 and MA1399.1) were obtained from JASPAR. 65 Motifs were merged to create RR-MYB combined motifs and visualised using the Ceqlogo tool from MEME. 64 The MpGLK motif was obtained from a previous report. 16 To assess occurrence of GLK and RR-MYB motifs, gene promoter sequences (500 bp and 1.5 kb upstream of the TSS) were scanned using FIMO. 66 500 bp was selected in order to focus analysis on core promoters rather than long-distance enhancer elements and to reduce the background signal associated with the increased probability of finding any motif by chance as the search space increases. Promoters were then scored for presence or absence of each motif and the percentage of photosynthesis genes containing each was calculated. The Fisher exact test and permutation testing were used to quantify over-representation of RR-MYB motifs in photosynthesis genes. For the latter, to test the background distribution of RR-MYB motifs, a set of random promoters of the same size as the list of photosynthesis genes from M. polymorpha or A. thaliana was extracted and FIMO run to determine the presence of these motifs. This process was iterated 1,000 times and distributions of the frequency of these motifs plotted. The frequency of motifs in promoters of photosynthesis genes was also determined by searching each promoter with FIMO. Permutation tests were performed to test whether the frequency of motifs was significantly different from the frequency found in randomly selected promoters. In Data S2 the FIMO tool 66 was used to scan promoter sequences of A. thaliana and M. polymorpha for matches to transcription factor binding motifs found in the JASPAR motif database. 65 To account for input sequence composition, a background model was generated using the fasta-get-markov tool from the MEME suite. 64 FIMO was then run with default parameters and a P value cut-off of 1 × 10−4. Matches to GLK and RR-MYB combined motifs were highlighted in each output.
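The permutation procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the analysis code used in the study: motif presence per promoter is assumed to have been pre-computed (for example from FIMO output) and stored in a hypothetical dictionary `has_motif`, the function names are invented for this example, the toy gene set is made up, and the add-one correction to the empirical p-value is a choice made here for the sketch.

```python
import random


def motif_frequency(genes, has_motif):
    """Fraction of genes whose promoter contains at least one motif match."""
    return sum(has_motif[g] for g in genes) / len(genes)


def permutation_test(target_genes, has_motif, n_iter=1000, seed=0):
    """Empirical one-sided p-value for motif over-representation.

    has_motif maps every gene ID to True/False (assumed to be derived
    beforehand from motif scanning of each promoter).
    """
    rng = random.Random(seed)
    all_genes = sorted(has_motif)
    observed = motif_frequency(target_genes, has_motif)
    # Null distribution: random promoter sets of the same size as the target set.
    null = [
        motif_frequency(rng.sample(all_genes, len(target_genes)), has_motif)
        for _ in range(n_iter)
    ]
    # Add-one correction so the estimated p-value is never exactly zero.
    exceed = sum(freq >= observed for freq in null)
    p_value = (exceed + 1) / (n_iter + 1)
    return observed, null, p_value


# Toy, made-up data: 1,000 genes; the first 50 stand in for photosynthesis genes.
has_motif = {f"g{i}": (i < 50 or i % 5 == 0) for i in range(1000)}
photosynthesis_genes = [f"g{i}" for i in range(50)]
observed, null, p_value = permutation_test(photosynthesis_genes, has_motif)
print(f"observed={observed:.2f}, null mean={sum(null) / len(null):.2f}, p={p_value:.4f}")
```

Pre-computing motif presence once, rather than rescanning sequences in every iteration, keeps the 1,000 resampling rounds inexpensive; the list `null` can be plotted to visualise the background distribution.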

Trans-activation assays
Rice protoplasts were subjected to transformation as previously described. 96 Golden gate level 1 modules for transformation were isolated using the ZymoPURE™ II Plasmid Midiprep Kit, with ZmUBIpro::GUS-T35S (maize ubiquitin promoter and CaMV35S terminator module) serving as the transformation control. Transcription factor coding sequences were amplified using CTAB-extracted M. polymorpha gDNA or synthesised by IDT, with BsaI and BpiI sites mutated, and then cloned into Golden gate SC level 0 modules. These modules were assembled into a level 1 module with a maize ubiquitin promoter and CaMV35S terminator module. Details of promoter fragments cloned can be found on Mendeley and in STAR Methods. For each transformation, 2 µg of transformation control plasmid, 5 µg of reporter plasmid, and 5 µg of effector plasmid were combined and mixed with 180 µL protoplasts. After 20 hours incubation in the light with an intensity of 40 µmol m−2 s−1, at 22 °C, proteins were extracted using a passive lysis buffer (Promega). Quantification of GUS activity was performed using a fluorometric MUG assay. 97
The coloring used for each column depends on the fraction of that column made up of amino acids as follows: black 100% similar, dark gray 80%-100% similar, lighter gray 60%-80% similar, white less than 60% similar.

Figure 1. MpRR-MYB5 and MpRR-MYB2 act redundantly to regulate chloroplast biogenesis in M. polymorpha
(A) Schematic of pipeline used to identify candidate transcription factors regulating photosynthesis gene expression. RNA-seq data from Arabidopsis thaliana during de-etiolation 20 and Marchantia polymorpha during spore germination 19 were examined. Transcription factors (TFs) upregulated in response to light in both (B) Heatmaps showing transcript abundance (Z score) of candidates that were selected to generate knockout mutants in M. polymorpha. Protein name shown in brackets after each gene identifier. (C) Top: schematic of MpRR-MYB5 and MpRR-MYB2 gene structure (exons represented as black boxes) with guide (g) RNAs for CRISPR represented by arrows. Bottom: amino acid sequence alignments of MpRR-MYB5 and MpRR-MYB2 proteins with the characteristic RR-MYB/CCA1-like domain highlighted in blue. (D-G) Representative images of control and Mprr-myb5, Mprr-myb2, and Mprr-myb5,2 mutants. Scale bars represent 2 mm. (H) Chlorophyll content of Mprr-myb5, Mprr-myb2, and Mprr-myb5,2 mutants. Letters show ranking using a post hoc Tukey test with different letters indicating statistically significant differences at p < 0.01. Data presented as means and standard error of the mean, n = 5. (I) Heatmaps showing transcript abundance (Z score) of MpGLK, MpRR-MYB2, and MpRR-MYB5 in germinating spores during the first 96 h after exposure to light. 19 (J) Representative images and quantification after imaging of chlorophyll fluorescence parameter Fv/Fm after treatment with the inhibitor di-chlorophenyl di-methyl urea (DCMU). Asterisks indicate statistically significant difference using a two-tailed t test, ***p ≤ 0.0001, n.s., non-significant, n = 8. (K-N) Representative images after confocal laser scanning microscopy of Mprr-myb5, Mprr-myb2, and Mprr-myb5,2 mutants with chlorophyll autofluorescence shown in magenta. Cell borders are marked with dashed white lines. Scale bars represent 10 µm. (O-R) Representative images after transmission electron microscopy of control, Mprr-myb5, Mprr-myb2, and Mprr-myb5,2 mutants. Scale bars represent 1 µm. Dashed area depicted in each chloroplast is enlarged, and granal stacks indicated with red arrows. (S) Violin plots of chloroplast area for control, Mprr-myb5, Mprr-myb2, and Mprr-myb5,2. Box and whiskers represent the 25th to 75th percentiles and minimum-maximum distributions of the data. Letters show ranking using a post hoc Tukey test with different letters indicating statistically significant differences at p < 0.01. n = 150. (T) Starch levels in control, Mprr-myb2, Mprr-myb5, and Mprr-myb5,2 mutants. FW: fresh weight. Data presented as means and standard error of the mean, n = 3. See also Figures S1 and S2.

Figure 2. MpRR-MYB5 acts synergistically with MpGLK to control chloroplast biogenesis
(A-D) Representative images of control (same as Figure 1D), Mprr-myb5, Mpglk, and Mpglk,rr-myb5 mutants. Scale bars represent 7 mm. (E) Chlorophyll content in Mprr-myb5, Mpglk, and the double Mpglk,rr-myb5 mutants compared with controls. Letters show ranking using a post hoc Tukey test with different letters indicating statistically significant differences at p < 0.01. Data presented as means and standard error of the mean, n = 5. (F) Representative images and quantification after imaging of the chlorophyll fluorescence parameter Fv/Fm after treatment with the inhibitor di-chlorophenyl di-methyl urea (DCMU). n.s., non-significant statistical difference using a two-tailed t test, n = 6. (G-J) Representative images after confocal laser scanning microscopy of control, Mprr-myb5, Mpglk, and Mpglk,rr-myb5 mutants. Chlorophyll autofluorescence shown in magenta, cell borders marked with dashed white lines. Scale bars represent 10 µm. (K-N) Representative transmission electron microscopy images of control, Mprr-myb5, Mpglk, and Mpglk,rr-myb5 mutants. Scale bars represent 1 µm. Dashed area depicted in each chloroplast is enlarged, and granal stacks indicated with red arrows. (O) Chloroplast area of Mpglk and Mpglk,rr-myb5 mutants. Box and whiskers represent the 25th to 75th percentiles and minimum-maximum distributions of the data. Letters show ranking using a post hoc Tukey test with different letters indicating statistically significant differences at p < 0.01. n = 330. See also Figure S3.

Figure 6. RR-MYBs control chlorophyll biogenesis in A. thaliana
(A) RR-MYB-related transcription factors in A. thaliana, rice, and bryophytes. Amino acid alignment of MpRR-MYB5&2 from M. polymorpha and AtMYBS1 and AtMYBS2 from A. thaliana. Characteristic SHAQK(Y/F)F DNA binding motif containing domains highlighted with blue shading. (B) Schematic representation of AtMYBS1 and AtMYBS2 gene structure showing exons as gray rectangles and guide (g) RNA positions for gene editing. (C-F) Images of 2-week old seedlings of wild-type, Atmybs1, Atmybs2, and Atmybs1,mybs2 mutants. Scale bars represent 7 mm. Representative images after confocal laser scanning microscopy also shown below, with scale bars representing 25 µm. (G and H) Representative images of wild-type and Atmybs1,mybs2 mutants with inflorescence. Scale bars represent 1.5 cm. (I) Violin plots of chloroplast area for wild-type, Atmybs1, Atmybs2, and Atmybs1,mybs2 mutants. Box and whiskers represent the 25th to 75th percentiles and minimum-maximum distributions of the data. Letters show ranking using a post hoc Tukey test (with different letters indicating statistically significant differences at p < 0.01), n = 100. (J) Representative transmission electron microscopy images of wild-type and Atmybs1,mybs2 mutants. Scale bars represent 5 µm. (K) Chlorophyll content of wild-type, Atmybs1, Atmybs2, and Atmybs1,mybs2 mutants (three independent lines were used for measurements). Letters show ranking using a post hoc Tukey test (with different letters indicating statistically significant differences at p < 0.01). Data presented as means and standard error of the mean, n = 5. (L) Representative images of wild-type, Atglk1,glk2, and Atmybs1,mybs2 seedlings undergoing de-etiolation. Scale bar represents 500 µm. (M) Chlorophyll content of wild-type, Atglk1,glk2, and Atmybs1,mybs2 seedlings during the de-etiolation time course. Data presented as mean and standard error of the mean, n = 3. (N) Chlorophyll fluorescence parameter Fv/Fm during de-etiolation of wild-type, Atglk1,glk2, and Atmybs1,mybs2. (O) Transcript abundance of AtGLK1, AtGLK2, AtMYBS1, and AtMYBS2 genes during de-etiolation. UBP6 (At1g51710) was used as internal control. Data presented as means and standard error of the mean, n = 3. See also Figure S6.

(A) Schematic representation of MpRR-MYB5 gene with exons shown as black boxes, position of gRNA shown with arrow, and sequence analysis of Mprr-myb5 knockout lines. Wild-type M. polymorpha Cam-1 sequence shown at the top, with the 20 bp gRNA target sequence highlighted by a red line. Amino acid sequence depicted below nucleotide sequence. (B) Schematic representation of MpRR-MYB2 gene with exons shown as black boxes, position of gRNA shown with arrow, and sequence analysis of Mprr-myb2 knockout lines. Wild-type M. polymorpha Cam-1 sequence shown at the top, with the 20 bp gRNA target sequence highlighted by a red line. Amino acid sequence depicted below nucleotide sequence. (C) Schematic representation of control construct and that used to express MpRR-MYB5 from the 1,792 bp fragment upstream of the predicted translational start codon. Representative phenotypes of control and Mprr-myb5 mutants complemented with the pro MpRR-MYB5::MpRR-MYB5. Scale bars represent 5 mm. Chlorophyll content in control, and Mprr-myb5 mutants complemented with the pro MpRR-MYB5::MpRR-MYB5. Letters show ranking using a post hoc Tukey test (with different letters indicating statistically significant differences at p < 0.01), n = 5. (D) Sequence analysis of Mprr-myb5,2 knockout mutant lines. Wild-type M. polymorpha Cam-1 sequence shown at the top, with the 20 bp gRNA target sequence highlighted by a red line. Amino acid sequence depicted below nucleotide sequence. (E) Scanning electron micrograph maps of control, Mprr-myb2, Mprr-myb5, and Mprr-myb5,2 thallus cross sections. Scale bars represent 20 µm.

(A) Top: schematic of MpGLK gene with exons indicated by black boxes, position of gRNA shown with an arrow. Bottom: sequence analysis of Mpglk knockout lines. Wild-type M. polymorpha Cam-1 sequence shown at the top, with the 20 bp gRNA target sequence highlighted by a red line. Amino acid sequence depicted below nucleotide sequence. (B) Top: schematic representation of MpGLK and MpRR-MYB5 genes with exons indicated by black boxes. Position of gRNA shown with an arrow. Bottom: sequence analysis of Mpglk,rr-myb5 double-mutant lines. Wild-type M. polymorpha Cam-1 sequence shown at the top, with the 20 bp gRNA target sequence highlighted by a red line. Amino acid sequence depicted below nucleotide sequence. (C) Schematic of MpGLK gene as shown in (B) along with sequence analysis of Mpglk mutations in Mpglk,rr-myb2 knockout mutant lines. (D) Representative images of Mpglk and Mpglk,rr-myb2 mutants (scale bars represent 5 mm), and chlorophyll content of Mpglk and Mpglk,rr-myb2 mutants. Letters show ranking using a post hoc Tukey test (with different letters indicating statistically significant differences at p < 0.01), n = 5. (E) Gene structure and sequence analysis of Mpgata4,rr-myb5 knockout mutants. (F) Representative images of Mpgata4, Mprr-myb5, and Mpgata4,rr-myb5 mutants (scale bars represent 5 mm), and chlorophyll content of Mpgata4, Mprr-myb5, and Mpgata4,rr-myb5 mutants. Letters show ranking derived from a post hoc Tukey test (with different letters indicating statistically significant differences at p < 0.01), n = 5. (G) Gene structure and sequence analysis of Mpglk,rr-myb5,2 knockout mutants. Wild-type M. polymorpha Cam-1 sequence shown at top. Amino acid sequence depicted below nucleotide sequence. (H) MAFFT alignment of the N terminus of representative RR subclass RR-MYB/CCA1-like proteins.

TABLE
● RESOURCE AVAILABILITY
  ○ Lead contact
  ○ Materials availability
  ○ Data and code availability
● METHOD DETAILS
  ○ Plant growth and transformation
  ○ Chlorophyll content, fluorescence imaging, microscopy and starch quantification
  ○ RNA extraction and sequencing
  ○ Immunoblotting
  ○ Phylogenetic analysis