Simultaneous cellular and molecular phenotyping of embryonic mutants using single-cell regulatory trajectories

Summary Developmental progression and cellular diversity are largely driven by transcription factors (TFs); yet, characterizing their loss-of-function phenotypes remains challenging and often disconnected from their underlying molecular mechanisms. Here, we combine single-cell regulatory genomics with loss-of-function mutants to jointly assess both cellular and molecular phenotypes. Performing sci-ATAC-seq at eight overlapping time points during Drosophila mesoderm development could reconstruct the developmental trajectories of all major muscle types and reveal the TFs and enhancers involved. To systematically assess mutant phenotypes, we developed a single-nucleus genotyping strategy to process embryo pools of mixed genotypes. Applying this to four TF mutants could identify and quantify their characterized phenotypes de novo and discover new ones, while simultaneously revealing their regulatory input and mode of action. Our approach is a general framework to dissect the functional input of TFs in a systematic, unbiased manner, identifying both cellular and molecular phenotypes at a scale and resolution that has not been feasible before.


In brief
By applying scATAC-seq over a dense time course of mesoderm development, Secchia et al. generate regulatory trajectories of multiple muscle lineages in Drosophila. They demonstrate that integrating this with loss-of-function transcription-factor mutants is sufficient to uncover cellular phenotypes de novo, while simultaneously providing new insights into the factor's molecular function.

INTRODUCTION
Understanding the progression and regulation of cell lineages is a major goal of developmental biology. Genetic studies traditionally addressed this either at the cellular level, describing high-level tissue abnormalities by immunostaining mutant embryos, or at a molecular level using genomics or biochemical approaches, with limited integration between the two. The development of single-cell methods provides new opportunities to change this, having the potential to obtain a more fine-grained description of mutant phenotypes at both cellular and molecular levels. Recent studies have demonstrated the power of single-cell transcriptomics to uncover cellular diversity, identify new cell states, and chart cellular trajectories during embryonic development. For example, single-cell RNA-seq (scRNA-seq) can map the development of tissues-e.g., the Drosophila optic lobe (Ö zel et al., 2021), aging brain (Davie et al., 2018), and mouse heart (Tyser et al., 2021)-and whole organisms, as shown for mice (Argelaguet et al., 2019;Pijuan-Sala et al., 2019), zebrafish (Farrell et al., 2018;Wagner et al., 2018), Xenopus , and planarians (Plass et al., 2018).
Expression states are regulated by transcription factors (TFs), which drive both the diversity and progression of cell lineages.
Many TFs act through enhancers to regulate a specific pattern of gene expression. TFs thereby specify and maintain cellular identity, giving a cell or tissue much of its morphological and functional characteristics (Farley et al., 2015;Furlong and Levine, 2018;Reiter et al., 2017;Spitz and Furlong, 2012). Deciphering how developmental lineages are regulated therefore requires an understanding of both the TFs involved and the enhancers they regulate. While scRNA-seq provides information on the former, the function and direct contribution of these factors need to be inferred. Single-cell regulatory genomics methods, such as single-cell ATAC-seq (sci-ATAC-seq) (Minnoye et al., 2021), provide a direct approach to uncover enhancer usage in different tissues and stages of development, as recently demonstrated during Drosophila (Cusanovich et al., 2018a), mouse (Argelaguet et al., 2019;Pijuan-Sala et al., 2020), and human (Domcke et al., 2020) embryogenesis. Combining such information, obtained through a high-resolution time course of embryogenesis, with loss-of-function mutants holds the promise of uncovering the functional role of developmental factors at both a cellular and molecular level.
To explore this, we selected mesoderm specification into different muscle primordia as a well-studied model system. This germ layer gives rise to all major muscle types from flies to humans, and the key TFs regulating the subdivision of the mesoderm into different muscle lineages are known and highly conserved. Seminal genetic screens in Drosophila uncovered the functional requirement of many of these factors, describing high-level phenotypes such as missing or abnormal muscles (Azpiazu and Frasch, 1993;Bour et al., 1995;Lilly et al., 1995;Zaffran et al., 2001). There are also extensive molecular data describing TF binding and enhancer usage during wild-type mesoderm-to-muscle development during Drosophila embryogenesis (Ciglar et al., 2014;Jakobsen et al., 2007;Junion et al., 2012;Liu et al., 2009;Reddington et al., 2020;Sandmann et al., 2006Sandmann et al., , 2007aZinzen et al., 2009).
To better integrate these two, i.e., high-level tissue phenotypes with the molecular input of their developmental regulators, we first generated a dense time course of single-cell regulatory changes during mesoderm development in wild-type Drosophila embryos. Performing sci-ATAC-seq across eight overlapping embryonic time points captured a continuum of regulatory transitions as cells move from multipotency down different developmental lineages. We then use these regulatory trajectories to examine the phenotypes of four essential TFs in loss-of-function mutants, providing a high-resolution view of their functional impact at both a cellular and molecular level. We demonstrate that this approach can pinpoint and quantify known tissue defects de novo and uncover new more subtle phenotypes, while simultaneously obtaining a deeper understanding of the molecular features and regulatory role of the TFs' input. Taken together, this study provides a framework to systematically assess the role of developmental regulators at multiple levels and serves as a forerunner for building predictive networks of tissue and organismal development.

RESULTS
A single-cell atlas of chromatin accessibility during a comprehensive time course of mesoderm development To capture the regulatory trajectories of different muscle lineages, we profiled sci-ATAC-seq in eight overlapping, rather than adjacent, 2 h tightly staged embryo collections to resolve continuous single-cell trajectories ( Figure 1A). This time course initiates shortly after gastrulation when cells are multipotent (3-5 h) and continues through the stages of lineage commitment to terminal differentiation (10-12 h), ensuring that all major developmental transitions are captured. The nuclei in each time point were obtained from hundreds of embryos and naturally sample an even finer range of developmental states. For each 2-h staged collection, embryos were formaldehyde fixed and intact nuclei were isolated and stained with a mesodermal/muscle marker (myocyte enhancer factor 2 [Mef2]) and FAC sorted to >95% purity ( Figure 1A) using our optimized batch isolation of tissue-specific (BiTS) nuclei protocol (Reddington et al., 2020).
We performed two sci-ATAC-seq ''replicates'' per time point, consisting of independent embryo collections and mesodermal nuclei sorting, which resulted in a combined dataset of 24,032 single mesodermal nuclei (subsequently referred to as cells) that passed quality filters (STAR Methods; Figure S1A; Table S1). Through protocol optimizations, we roughly doubled the read coverage per cell compared with our previous study (Cusanovich et al., 2018a), obtaining a median of 21,649 unique reads/cell. This is comparable with sci-ATAC-seq studies in mice (Cusanovich et al., 2018b), but as the Drosophila genome is $25-fold smaller, this yields a comprehensive coverage of occupied enhancers in each individual cell. The total number of cells, and median coverage per cell, was generally consistent across time (Figures S1B and S1C). Four lines of evidence attest to the high quality of these sci-ATAC-seq data: (1) the insert size has the expected nucleosomal distribution ( Figure S1D); (2) pseudobulk profiles, created by aggregating single cells, are highly correlated among ''replicate'' batches (Figure S1E); and (3) importantly our singlecell mesodermal data are highly correlated with bulk DNase-seq profiles from FAC-purified muscle populations (Figure S1F); and (4) they overlap characterized mesoderm/muscle enhancers (Figures 1B and S1G).
To maximize sensitivity to detect open chromatin regions over the entire time course, we performed a first round of clustering of each time point separately (yielding 61 clusters in total) and called peaks for each cell cluster, generating a merged set of 50,261 unique peaks (STAR Methods; Figure S2A). Cell clustering was performed on the resulting cell-by-peak accessibility matrix using cisTopic (Bravo Gonzá lez-Blas et al., 2019), which outperforms other methods on continuous populations (Chen et al., 2019). Unsupervised clustering of all cells from all time points revealed 15 clusters organized in a tree-like structure that naturally reflects the temporal order of the embryonic collections (Figures 1C,S2B,and S2C). Cells from the early time points are relatively uniform in their regulatory landscape and form a trunk that represents nascent unspecified mesoderm. The accessibility landscape markedly diversifies at $6-8 h of development, causing cells to branch along different trajectories ( Figure 1C). Although the clustering algorithm had no prior information, this matches the time window uncovered by genetic studies in the 1990's for the subdivision of the mesoderm into different muscle primordia (Azpiazu and Frasch, 1993;Azpiazu et al., 1996;Riechmann et al., 1997).
To determine the identity of the 15 major cell clusters (Figure S2D), we assessed the overrepresentation of tissue terms using two extensive resources: (1) curated enhancers with embryonic activity in vivo Kvon et al., 2014;Rivera et al., 2019) and (2) gene expression throughout embryogenesis  ( Figure S2E). These independent approaches gave highly concordant annotations and resolved the early mesoderm population and the three major myogenic lineages: the somatic, visceral, and cardiac muscles ( Figure 1D), indicating that they have distinct chromatin accessibility landscapes and require differential usage of regulatory elements. As specification begins halfway through our time course, unspecified mesodermal cells constitute the largest population of cells, while the cardiogenic mesoderm and resulting heart muscle is the least abundant ( Figure 1D). The fact that we can detect cardiomyocytes, a rare cell population consisting of only 104 cells per embryo (Reim and Frasch, 2010), indicates that we have comprehensively sampled the diversity of myogenic cell types at these stages. The continuum of muscle development is reflected in the distribution of different cell populations, which shifts over time with the early uncommitted progenitors being gradually replaced by the terminal muscle types ( Figure 1E). We also identified three small nonmyogenic populations. Fat body and hemocytes are both derived from the mesoderm and accordingly are present at the specification stages but absent at later time points ( Figure 1E). The third nonmyogenic population are neural cells, which may come from a subpopulation of Mef2-expressing cells within the mushroom body of the brain (Crittenden et al., 2018). We observed no sex biases between different muscle clusters ( Figure S2F).
To validate the cell-type assignment, we first FAC sorted cells from the visceral muscle using the lineage-specific marker Biniou and obtained high-quality sci-ATAC-seq profiles for 1,295 visceral muscle cells. Highlighting these cells on the UMAP shows that they cluster together with the cells assigned as visceral muscle (Figures 1F,orange and 1D,purple). Second, known marker genes for both lineage specification and differentiation show high accessibility in the expected muscle populations at the correct stages ( Figures 1G and S2G), including tinman (tin), pannier (pnr), and Doc3 (Doc3) in the cardiac muscle; (F) Same as (C), gray cells were FAC sorted using a general mesoderm/muscle marker (Mef2), orange cells using a visceral-muscle-specific marker (Biniou).
(G) Dotplot of marker-gene accessibility in each muscle type. Color scale indicates gene average accessibility (Z score); dot size, the percentage of cells in which the gene is accessible. Nonmyogenic populations are shown in Figure S2G.
Dynamic changes in single-cell chromatin accessibility reflects dynamic transcription-actor binding and identifies new regulators and enhancers in each lineage To resolve TFs likely responsible for these regulatory changes, we first examined dynamic changes in accessibility overlapping regions bound by muscle-specific factors by calculating TF deviation scores (Schep et al., 2017). These inferred TF activities match the expected patterns for each TF (Figures 2A and 2B). For example, tinman is broadly expressed throughout the trunk mesoderm at early stages before being restricted to the dorsal mesoderm and cardiac muscle at later stages (Azpiazu and Frasch, 1993;Yin and Frasch, 1998). Concordantly, Tinman bound sites from bulk ChIP data (Liu et al., 2009;Zinzen et al., 2009) at 2-4 and 4-6 h show high accessibility in cells in the early mesoderm, while Tinman 6-8 h ChIP peaks are restricted to the cardiac cells (Figure 2A). Similarly, Biniou and Bagpipe, the two factors required for visceral mesoderm specification (Azpiazu and Frasch, 1993;Zaffran et al., 2001), show specific activity in cells in the visceral-muscle lineage ( Figure 2A). Cells with open chromatin sites bound by the panmuscle factor Mef2 at either 2-4, 6-8, or 10-12 h of development  display concordant dynamic changes in accessibility over time, being more accessible at early, mid, and late points in our single-cell time course, respectively ( Figure 2B, upper). Similar temporal specific activity is seen for cells with regions overlapping twist binding at 4-6 h and Biniou and Lame-duck binding at 6-8 h ( Figure 2B, lower) (ChIP data from Cunha et al., 2010;Jakobsen et al., 2007;Sandmann et al., 2007a;Zinzen et al., 2009). Taken together, this indicates that our single-cell atlas accurately recapitulates the underlying temporal and spatial patterns of TF binding during mesoderm development.
To explore this further and identify new potential regulators, we integrated TF occupancy data for 280 factors (Kudron et al., 2018). These bulk ChIP-seq data were generated from whole embryos collected over large time windows spanning either half or all of embryogenesis, and thereby averages signal over different cell types and time points. Nevertheless, the high resolution of our single-cell data could resolve both the cell type and rough time window of occupancy of several factors ( Figure 2C). For example, resolving whole-embryo ChIP data for Tinman and Tailup to the cardiac muscle (Azpiazu and Frasch, 1993;Tao et al., 2007;Zmojdzian and Jagla, 2013), Nautilus and Pdp1 to the somatic and visceral muscle (Abmayr and Keller, 1998;Lin et al., 1997), and Org-1 and FoxL1 to visceral muscle (Hanlon and Andrew, 2016;Schaub and Frasch, 2013), consistent with the role of these TFs in the corresponding cell types. Nau is a good example of refining the temporal-window-the ChIP was performed on whole embryos spanning almost all of embryogenesis (4-24 h)-however, integration with our single-cell time course resolved the time window to 10-12 h and the tissue to the somatic and visceral muscle ( Figure 2C).
We next used this high-resolution atlas to identify new genes and enhancers that are differentially accessible in a specific muscle subpopulation ( Figure 2D; Table S2). Of the 5,180 differential peaks (representing 36% of all tested peaks), 78% (4,027/5,180) are distal from an annotated promoter and likely enhancers. In agreement with this, 20% (790/4,027) overlap previously characterized embryonic enhancers in vivo, suggesting that many of the other 80% are also likely enhancers. Remarkably, 19% (752/4,027) of distal differential elements were not discovered in our previous whole-embryo shotgun sci-ATAC-seq (Cusanovich et al., 2018a) or bulk tissue-specific DNase-seq (Reddington et al., 2020) studies ( Figure S2H), thereby increasing the discovery of mesoderm/muscle-specific regulatory elements.
In addition, 864 genes are differentially accessible across their gene body between cell types ( Figure 2D, right; Table S2), which can serve as a proxy for changes in gene expression. Many of these are components of signaling pathways or TFs, including known regulators of mesoderm/muscle development. For example, the transcription factors pnr, Doc3, tup, and apt in the cardiac; bap, bin, H2.0, and hand in the visceral; and Pdp1 and cf2 in the somatic muscle (Table S2). In addition, we uncovered many new potential regulators in specific tissues, including luna, Nk7.1, and CG14655. Our sci-ATAC data predict that luna is expressed during visceral-muscle development and Nk7.1 and CG14655 during somatic muscle development. In situ hy-bridization of each gene with a mesoderm/muscle marker confirms the expression of all three TFs in the tissues predicted by their differential accessibility ( Figure 2E). Concordantly, a function for CG14655 during flight-muscle development was recently reported (Meiler et al., 2021).
Dynamic changes in regulatory elements are sufficient to reconstruct diverse lineage trajectories We next exploited the continuous temporal resolution of our time course to reconstruct regulatory trajectories for three muscle lineages, starting from unspecified mesodermal cells ( Figure 3A, yellow dot). Ordering cells along pseudotime revealed extensive and dynamic temporal changes in accessibility for both regulatory elements and genes along each lineage's trajectory ( Figure 3B; Table S3). The loci of many upstream identity genes change in accessibility (in both directions) as the development of each lineage progresses. This includes lmd, kah, NK7.1,Pdp1,tx,and nau (dMyoD) in the somatic trajectory, in addition to more downstream effector genes required for differentiated muscle function (Mhc, Mlc, and Tropomyosin). Cardiomyocytes are specified by a highly conserved set of TFs from flies to humans (Davidson and Douglas, 2006), including members of the NKx2.5 (tinman [tin] in  3D). Both the somatic and visceral muscles are formed from two populations of cells-founder cells (FCs), which give the muscle its identity, and fusion-competent myoblasts (FCMs), which fuse to FCs during differentiation to form a multinucleated syncytium (Deng et al., 2017;Lee and Chen, 2019). We observed two trajectories for the visceral-muscle lineage (Figure 1D, purple cells). Starting from common precursors, one lineage branches toward the somatic body wall muscle ( Figure 1D (ii)), while the other has a distinctive lineage that remains separate from the rest of the muscle populations ( Figure 1D (i)).
To explore the visceral-muscle lineages further, we combined all annotated visceral-muscle cells from the Mef2+ time course with the Bin+ FAC sorted cells and reclustered these 4,187 visceral-muscle cells separately, revealing a more complex multi-branched structure ( Figure 3C). The visceral muscle represents a collection of muscles with different developmental origins ( Figure 3C, embryo scheme). The FCs and FCMs of the circular trunk visceral muscle (CVM) are specified at stages 10 and 11 (6-8 h), after which they migrate laterally and undergo myoblast fusion (stage 12) to form a continuous muscle that encloses the gut (Lee et al., 2006). The longitudinal VM (LVM) is formed from FCs in the caudal mesoderm toward the end of the embryo, which migrate on top of the circular VM, using it as a scaffold (Zaffran et al., 2001). Our single-cell trajectory captures these diverse origins and the temporal delay of the LVM development. The main branch ( Figure 3C, cluster 0) consists of both FCs and FCMs, as seen by their enrichment in markers for visceral-muscle FC specification (including Alk, numb, bap, and bin) and FCM specific genes (lmd and sns) ( Figure 3D). A subset of cells progress to a more differentiated state ( Figure 3C, moving from cluster 0 / 1 / 2), expressing specific myoblastfusion genes in the intermediate state before differentiating to contractile muscle, indicated by the expression of sarcomere protein genes such as MHC, Tmod, Tm1, and Tm2 ( Figure 3D).
Given the timing of these transitions and the presence of bap (a TF not expressed in LVM), this likely reflects circular VM development. The LVM and hindgut VM (HVM) branches have different developmental origins, as expected, and are enriched in longitudinal (cluster 5) and hindgut (cluster 4) visceral-muscle markers (Bae et al., 2017;San Martin and Bate, 2001) ( Figure 3D). The precursor population has a second branch ( Figure 3C, cluster 0 / 3), expressing a broader set of myoblast-fusion genes (Figure 3D), perhaps representing a more mature or more diverse FCM population. Some FCMs were proposed to remain unfused and wait for the LVM founder cells to migrate over the circular VM and then fuse during late stage 12 and stage 13 ($10-12 h) (Klapper et al., 2002). Our results support this and suggest that a proportion of these cells may also fuse to the overlying somatic-muscle FCs, given their expression of Vrp1/wip and wasp, two genes specific to myoblast fusion of the somatic muscle (Rudolf et al., 2014), and their progressive acquisition of a chromatin state similar to the somatic muscle (cluster 3 overlaps branch (ii) in Figure 1D).
Using single-nucleus genotyping to systematically profile homozygous mutants from mixed embryo populations To determine the functional impact of TF mutants on both the developmental trajectories of cells and their regulatory programs, we applied sci-ATAC-seq to loss-of-function mutant em-bryos of four mesodermal TFs and assessed the cells' behavior by integrating them with the wild-type developmental trajectory ( Figure 1D). Myocyte enhancer factor 2 (Mef2) is essential for myoblast fusion and terminal differentiation of all muscle types ( Figure 4A). Tinman (Nkx2-5) is essential for the subdivision of the dorsal mesoderm into cardiac and visceral muscle, through the activation of bagpipe (NKx3-2), which initiates biniou (FoxF2) expression and the visceral-muscle lineage. Although the occupancy of each TF has been examined in bulk (Jakobsen et al., 2007;Junion et al., 2012;Liu et al., 2009;Sandmann et al., 2006Sandmann et al., , 2007aZinzen et al., 2009), their contribution to enhancer accessibility and to an individual cell's state, remains unknown.
Recessive lethal mutants in animal models must be maintained as heterozygotes; therefore, only 25% of offspring embryos are homozygous for the mutation of interest. Here, we first developed an easy and generalizable approach to profile single-cell genomic measurements from embryos of mixed genotypes (Figure 4B). Rather than genotyping and hand sorting homozygous mutant embryos, we retained all embryos representing a pool of three genotypes (homozygous loss-of-function mutant, heterozygous, and homozygous nonmutant) ( Figure 4B) and performed sci-ATAC-seq on their dissociated pooled nuclei. In Drosophila, such heterozygous mutants are maintained over balancer chromosomes (Miller et al., 2016(Miller et al., , 2018. We sequenced the genetic background of the loss-of-function mutants (this study, STAR Methods) and the balancer chromosome (Ghavi-Helm (I) UMAP visualization of the muscle populations identified in (D), including Mef2 +/À and À/À cells, co-clustered with the wild-type muscle time course from Figure 1D. Cell clusters in (D) (Mef2 embryos) are plotted on the wild-type time course (gray). The somatic, cardiac, and visceral populations from the wild-type time course are enclosed by blue, red, and purple dashed lines, respectively. Note, Mef2 À/À clusters 1 and 2 (black arrows) are located close to the somatic muscle (blue) but off the wild-type trajectory.
et al., 2019), allowing each nucleus to be genotyped based on informative SNPs in the sci-ATAC reads. In contrast to standard allelic imbalance studies, this requires relatively few informative reads per nucleus, as described below.
The characterized mutations of these TFs were generated over 20 years ago and will have accumulated additional mutations that could impact chromatin accessibility independently of the TFs' function. To circumvent this, we first used CRISPR-Cas9 editing with single-stranded oligo donors (ssODNs) (Gratz et al., 2015) to recreate the characterized loss-of-function mutations for each factor in a common isogenic genetic background ( Figure S3A; Table S4; STAR Methods). These new alleles were sequence verified and recapitulate the expected characterized muscle phenotypes ( Figure S3B). As essential factors, they are homozygous lethal, and, importantly, our new alleles do not complement the characterized loss-of-function allele when placed in trans, confirming that the lethality is due to the mutation of the TF and not a CRISPR off-target effect.
To phenotype mutants ( Figure 4B), we collected staged embryos from the heterozygous adults (mutant CRISPR /balancer chromosome), which were formaldehyde fixed and processed for sci-ATAC-seq. The dissociated nuclei thereby come from a pool of F1 embryos which represent 25% homozygous loss-offunction mutant/mutant, 50% heterozygous mutant/balancer, and 25% homozygous balancer/balancer. After sci-ATAC-seq, each nucleus was genotyped de novo based on the fraction of reads mapping to the mutant or balancer chromosomes. The genotype assignment is based on over 450,000 genetic variants ( Figure S4A) between the balancer and the mutants genetic background. With a median of roughly 1,000 variants covered per cell, we could genotype 99.9% of all theoretically assignable nuclei ( Figure S4B) with high confidence (>0.9 posterior probability) ( Figure S4C). The genotype assignments were very robust, being 98% identical when performing the assigments using two sets of variants identified with a stringent or a more lenient filtering threshold ( Figure S4D).
Our de novo single-nucleus genotyping approach has a number of advantages for single-cell profiling of mutants. It eliminates the need, and associated experimental time, to hand select embryos of the correct genotype and is therefore faster and more reliable. Profiling nuclei from homozygous and heterozygous siblings in the same experiment has an additional advantage to aid in batch correction. As the heterozygous nuclei are essentially wild type, they can be used to align mutant data from the same batch to the wild-type reference trajectory, avoiding ''over fitting'' by batch aligners of biologically real mutant phenotypes.
Loss of the transcription factor Mef2 leads to a new cell state Mef2 regulates differentiation of all major muscle types (Bour et al., 1995) ( Figure 4A). To determine the functional impact of Mef2 on chromatin accessibility and mesodermal cell fate, we performed sci-ATAC-seq on Mef2 mutant embryos (a pool of homozygous and heterozygous) at 10-12 h (mainly stage 13) during the initiation of terminal muscle differentiation in the somatic and visceral muscle. Our de novo genotyping strategy assigned the expected proportion of profiled nuclei as homozygous mutant (expected: 25%, observed: 26%) ( Figure 4C).
As these experiments were performed on whole embryos, we did a first round of clustering to identify muscle cells. The profiled 12,926 cells cluster by cell type, rather than by genotype, revealing 8 broad cell states, including one large muscle population (Figures S5A and S5B; Table S5). Selecting the muscle cells, we then identified genotype and cluster-specific peaks and reclustered those 2,567 cells ( Figure 4D; STAR Methods). This resulted in five cell clusters with distinct chromatin accessibility, three of which could be identified as somatic, cardiac, and visceral muscle ( Figure 4D). Digitally genotyped Mef2À/À mutant nuclei are almost completely absent from the somaticmuscle cluster ( Figures 4E and 4F) and instead are highly enriched in two additional ''muscle clusters,'' which appear close to, but distinct from, somatic-muscle cells (Mutant1 and Mutant2 clusters). Mutant1 is composed of 89%, while Mutant2 56%, Mef2À/À cells ( Figure 4E). This indicates that in the absence of Mef2, mesodermal cells are unable to establish the regulatory landscape to become somatic muscle, and instead form a new altered state. Cells in the Mutant2 cluster have lower ($2.2-fold) coverage than the other clusters (Figure S5C). This may represent cells in a more naive state or undergoing apoptosis, although the clustering of these cells might be driven primarily by their lower coverage. The proportion of homozygous mutant cells is also partially reduced in the visceral muscle, while it is largely unaffected in the heart, at these stages ( Figures 4E and 4F). These tissue-specific differences likely reflect differences in the timing of differentiation between muscle lineages. Terminal differentiation of the somatic and visceral muscle begins after myoblast fusion at $10-12 h (stage 13), while it cannot occur in the heart until later stages, after dorsal closure is complete at $13 h (stage 15).
The somatic and visceral muscle were also the two tissues that displayed the highest relationship between Mef2 binding at 10-12 h and open chromatin signatures (chromVAR scores) in wild-type embryos (Figure 2A), suggesting that such scores are a good indicator of the tissues and time points that will most likely be affected by the removal of a TF. For example, while Mef2 is also expressed in the cardiac muscle, Mef2-bound regions are generally less accessible compared with the somatic and visceral muscle ( Figure 2A) and, concordantly, the proportion of mutant cells remains close to normal in the cardiac muscle ( Figure 4E). However, such computational inference from wildtype embryos cannot predict all mutant phenotypes. While the visceral and somatic muscle display similar accessibility (ChromVar scores) at Mef2-bound regions in wild-type embryos (Figure 2A), the mutant analyses revealed clear differences in the response of both tissues to loss of Mef2; mutant cells are almost completely absent in the somatic muscle but only partially reduced in visceral ( Figures 4E and 4F), indicating a difference in Mef2 dependency for accessibility between these two tissues, which was not evident in wild-type conditions.
To experimentally test the accuracy of our de novo genotyping strategy, we hand-sorted homozygous mutant embryos from the Mef2 mutant based on a GFP-marked balancer chromosome and performed sci-ATAC-seq on these 100% Mef2À/À nuclei. The hand-sorted mutant nuclei show the same properties as the genotyped mutant nuclei ( Figures 4E and 4F); they are absent from the somatic muscle and accumulate in two mutant clusters-mainly in Mutant1, where 88% of the homozygous mutant cells reside, similar to the digital genotyping above ( Figure 4E). Both the genotyped and hand-sorted homozygous mutant nuclei display the same alterations in chromatin accessibility at individual loci, as shown for two muscle contractile proteins, Mlc1 and Msp300 (Figure 4G). Both genes have multiple Mef2-bound regions overlapping open chromatin in cells from the somatic cluster, which are almost completely closed in both the digitally genotyped and hand-sorted Mutant1 cells ( Figure 4G). We also observe concordant gains in accessibility at regulatory regions in Mef2À/À cells, including the enhancer VT30021 ( Figure 4G), which is embryonically active, but normally not in muscle tissues. This proof of principle indicates that our de novo nuclear genotyping strategy correctly assigns homozygous mutant nuclei.
The whole-embryo single-cell data allowed us to explore if Mef2 mutant cells adopt another cell state, either from within the mesoderm or another germ layer. To assess this, we computed clusterwise accessibility correlations of the mutant clusters against all cell types (both mesodermal and nonmesodermal cell clusters) in embryos at 10-12 h (Cusanovich et al., 2018a) ( Figure 4H). Both Mutant1 and Mutant2 are most highly correlated to clusters within the myogenic mesoderm, in particular the somatic muscle, and are clearly separated from the nonmyogenic mesoderm, ectoderm, and endoderm lineages ( Figure 4H), indicating that these cells are specified to become muscle, but appear blocked in their development. To determine if they are stuck in an earlier myogenic state, we combined the mutant cell data (combining the digitally genotyped and hand-sorted mutant cells, given that they appear identical) and the heterozygous cells with all cells in our wild-type reference trajectory ( Figure 1D) and reclustered the data ( Figure 4I). The heterozygous cells (the wild-type clusters in Figure 4D) behave indistinguishably from the reference cells, falling within the expected wild-type populations on the trajectory. In contrast, Mef2À/À Mutant1 and Mutant2 cells cluster separately, off the wild-type muscle trajectory, but roughly at the appropriate ''temporal'' time point ( Figure 4I). If these cells were blocked in their developmental progression, we would expect them to cluster on the trajectory at some earlier time point in muscle development, which is not what we observe. To explore this further, we computed clusterwise accessibility correlations of the mutant and muscle clusters for each time point, which confirmed that both Mutant1 and Mutant2 are progressing to the appropriate developmental stage ( Figure S5D). This indicates that Mef2 mutant cells are not simply immature muscle cells but rather have developed a new abnormal ''muscle-like'' state, which is likely defined by the inactivation or decreased expression of late muscle function genes, combined with the inappropriate activation of nonmuscle enhancers and genes ( Figure 4G; Table S6). Taken together, this suggests that Mef2 is not only required as a differentiation factor to regulate the expression of muscle contractile genes, but also to prevent muscle cells from undergoing other cell-state changes.
Loss of tinman, bagpipe, and biniou differentially alters cellular composition We applied the same approach to three other loss-of-function mutants for TFs involved in the specification of the dorsal mesoderm (tinman) and its derived visceral muscle (bagpipe and biniou) that forms the gut musculature (Azpiazu and Frasch, 1993;Zaffran et al., 2001). These TFs have a hierarchical relationship between them, where Tinman regulates Bagpipe expression at stage 10 (6-8 h), which in turn regulates Biniou expression ( Figure 5A). To examine the function of these TFs, we assessed bagpipe and biniou mutants at 6-8 h of development, which coincides with the initiation of their expression and with the specification of the visceral muscle (stages 10 and 11). As Tinman acts upstream of both bap and bin, we shifted the time window 1 h earlier (5-7 h; stage 9, 10) to capture these events. For all three mutants, sci-ATAC-seq was performed on a pool of homozygous and heterozygous embryos, as above. Staged bagpipe and biniou mutant embryos were collected at 6-8 h (late stage 10, mainly stage 11) and the mesodermal population isolated by Mef2 FAC sorting, as in the wild-type trajectory, obtaining high-quality profiles for 6,306 and 5,833 mesodermal cells, respectively. Presorting for the mesodermal population was not possible for tinman, as it regulates Mef2 expression. We therefore performed sci-ATAC-seq on whole embryos of tinman mutants and performed a first round of clustering to identify 6,786 high-quality mesodermal cells.
To assess the fate of the mutant cells, we directly compared their development with the wild-type trajectory by co-clustering the combined mutant data, representing 18,925 homozygous and heterozygous cells, together with our wild-type mesoderm time course, correcting for batch-level effects (Korsunsky et al., 2019) Reclustering and reannotation of this joint dataset (Table S5), representing 40,232 cells, revealed a structure that is generally consistent with the wild-type trajectory ( Figure 5B). Nuclei from heterozygous mutant cells (+/À) for tinman are present in the cardiac (cluster C1) and early visceral-muscle clusters (clusters V1-3, plus V5 and V6) ( Figure 5C, middle, black). Similarly, bagpipe and biniou heterozygous mutant cells are present in the visceral mesoderm, spanning both branches, and extending to later stages of embryogenesis (clusters V1-4, plus V5 and V6) (Figures 5D and 5E,middle,black).
Examining the homozygous mutant nuclei (À/À) revealed that, in contrast to Mef2, the proportion was significantly lower than the expected 25%, representing 15%, 19%, and 17% for tinman, bagpipe, and biniou, respectively (Figures 5C-5E, left). This indicates that a proportion of homozygous mutant cells is not maintained and likely undergos apoptosis as they cannot progress in their development. The trajectories of the remaining mutant cells (À/À) are very different from their heterozygous siblings; tinman À/À cells are completely absent from the cardiac lineage and late-stage visceral muscle (V3-V6), with few remaining cells in the early visceral-muscle clusters ( Figure 5C, middle, red; V1 and V2). Moreover, there is a significant reduction of homozygous mutant cells in late mesoderm stages (clusters M4 and M5), which likely represents the dorsal mesoderm. Similarly, the bagpipe and biniou homozygous mutant nuclei are absent from visceral-muscle clusters at later stages of development (clusters V2 and V3, Figures 5D and 5E, middle, red). The early visceral cells (cluster V1) are more prominently affected in tinman mutants and to a lesser extent in bagpipe and biniou mutants, reflecting the hierarchical position of these TFs with Tinman acting upstream of both factors. These findings indicate that the circular VM cells are initially specified in bagpipe and biniou mutant embryos but are blocked from further expansion and differentiation, resulting in a loss of the VM at later stages (clusters V2 and V3). Interestingly, the hindgut and longitudinal visceral muscles This molecular data can therefore phenotype all three mutants de novo, identifying the gross phenotypes described by immunostaining of mutant embryos ( Figure S3B) (Azpiazu and Frasch, 1993;Zaffran et al., 2001). In addition, our single-cell approach provides more fine-grained quantitative information on the proportion of missing cells at a given stage ( Figures 5C-5E, right), in addition to revealing more subtle phenotypes not previously observed, including a gain of mutant cells in other muscle lineages. For example, there is a significant overrepresentation of tinman mutant cells in the early mesoderm (cluster M2) and in the somatic lineage (cluster S1) ( Figure 5C, right). The removal of these TFs thereby not only results in a loss of tissue (one cell fate) but also a more subtle gain of cells dispersed in other tissues from different mesodermal trajectories, highlighting the plasticity of cell fates within the myogenic mesoderm.
Mef2 is required for chromatin accessibility at its highaffinity sites and for gene expression Applying single-cell ATAC-seq to TF mutants in the context of a developing embryo allowed us to explore the extent to which such single-cell data can discern regulatory properties of the TF or its enhancers. As a proof of principle, we focus on the Mef2À/À mutant cell cluster (Mutant1), as it contains the highest number of mutant cells (943 cells). The somatic-muscle cluster is the closet cell type to Mutant 1 ( Figure 4H). Of the 8,725 accessible regions in both cell clusters, 408 have significant differential accessibility (DA) in Mef2À/À mutant cell (log 2 fold-change > ±0.5, Bonferroni corrected p value < 0.05) ( Figure 6A; Table S6). The majority of DA sites have reduced accessibility (67% [274/408]) and reduced sites often have a larger fold-change ( Figure 6A). Mef2-responsive sites are generally more gene distal, compared with unchanged sites ( Figure S6A) and are overrepresented in muscle enhancers: Mef2 À/À DA regions more frequently overlap (1) characterized muscle enhancers ( Figure S6B), and (2) two large collections of putative muscle enhancers defined by ChIP ) ( Figure S6C) and DNase-seq of FACS-sorted muscle cells (Reddington et al., 2020) (Figure S6D), compared with non-DA regions.
Ninety percent of the 8,725 accessible regions in either the somatic and/or Mutant1 cluster were identified in bulk DNase-seq from FACS-sorted muscle cells purified at different stages of embryogenesis (Reddington et al., 2020) (Figure S6D). While this highlights the quality of our sci-ATAC-seq data, having such information at a single-cell resolution goes beyond the detection of regulatory regions, as it reveals enhancer usage along developmental trajectories of specific cell types (as shown above) and enables a detailed analysis of the functional input of TFs to enhancers (discussed below). In addition, single-cell data have enhanced the sensitivity to identify regions that change in mutant embryos and enhanced the precision to uncover the cellular context that is susceptible to that change. To demonstrate this, we repeated the analysis in Figure 6A by testing for differential accessibility between nonmutant and mutant cells across the whole embryo and across the whole muscle population, in order to mimic samples profiled by bulk ATAC-seq in either whole embryos or FACS-purified muscle, respectively. Of the 408 DA sites in Mef2 mutants, only 7% (28) and 43% (176) are identified as significantly changed (using the same parameters as Figure 6A) compared with whole-embryo and muscle mimic samples, respectively ( Figure 6B). Our single-cell approach therefore has enhanced sensitivity to reveal regulatory changes that are not discoverable by traditional approaches, unless complex and extensive FAC sorting is used.
To explore the 408 DA sites further, we first categorized them into Mef2-bound and -unbound sites, using bulk Mef2 ChIP data at multiple time points of embryonic development . Almost half of the DA regions (48%, 197/408) are bound by Mef2 at this stage or earlier in embryogenesis. Mef2-bound DA sites almost exclusively lose accessibility ( Figure 6C), consisting of 66% of all DA sites with reduced accessibility (Figure S6E). In contrast, regions that gain accessibility are generally not bound by Mef2 ( Figure 6C) and involve regions less related to muscle function ( Figures S6F-S6H).
Removal of Mef2 affects the accessibility of only a specific subset (15%) of all Mef2-bound regions at 10-12 h. Mef2-bound sites that are sensitive to, or resistant to, Mef2 removal could depend on either their extent of co-occupancy by other factors or on the affinity of Mef2 binding. To distinguish between these (B) Proportion of 408 DA sites discovered as DA comparing mutant and nonmutant cells in whole-embryo or muscle-mimic samples, using logistic regression. two possibilities, we first examined the co-occupancy of ten TFs active in mesoderm. Susceptible sites are generally less frequently occupied by these TFs compared to nonsusceptible sites ( Figure 6D). The fraction of DA sites tends to decrease as sites are bound by an increasing number of mesodermal TFs ( Figure 6E). Examining the occupancy of a much larger set of TFs, 280 factors from modERN (Kudron et al., 2018) also shows an inverse relationship between differential chromatin accessibility in Mef2À/À cells and the number of bound TFs ( Figure 6F): the median number of bound TFs is 3 for the DA class and 40 for non-DA sites ( Figure 6G). Therefore, sites that require Mef2 for their accessibility tend to be bound by Mef2 alone or with a small number of other factors, perhaps cooperatively, suggesting that these regions are very Mef2 dependent. To investigate this further, we used the quantitative Mef2 ChIP signal as a proxy for Mef2 affinity. Susceptible (DA) sites have significantly higher Mef2 ChIP signal compared with nonsusceptible (non-DA) sites ( Figure 6H). Moreover, the proportion of DA sites steadily increases as the Mef2 ChIP signal increases, going from 5% of DA sites in the lowest to 35% in the highest ChIP quantile (Figure 6I). This indicates that sites bound more strongly by Mef2 are more likely to have reduced chromatin accessibility upon Mef2 removal. Although both classes are occupied by Mef2, susceptible sites have a 2.5-fold enrichment (61% in the DA group versus 17% in the non-DA group, Fisher's exact test p value = 2 3 10 À25 ) in the presence of a Mef2 motif ( Figure 6J). These results indicate that Mef2 is required to establish and/or maintain chromatin accessibility at a large fraction of its high-affinity sites.
Many of these regions overlap characterized ( Figure 6K) or putative ( Figure 6L) muscle enhancers. The loss of accessibility at these sites may therefore lead to changes in the expression of mesoderm/muscle genes ( Figure 6M), which most likely contributes to the mutant phenotype. To examine this, we integrated bulk RNA-expression data from Mef2 mutant embryos  and looked for genes with a Mef2bound site in their vicinity (defined as 5 kb upstream and intronic regions). Using this metric, 1,705 differentially expressed genes are associated with at least one Mef2-bound open chromatin region (Table S6). Of these, those with significantly downregulated, but not upregulated, expression (log 2 foldchange > ±0.7, q < 0.05) in Mef2 À/À mutants are highly overrepresented for a loss or reduction in chromatin accessibility in at least one of their Mef2-bound associated peaks ( Figure 6N). Moreover, genes with reduced Mef2-bound sites have significantly stronger changes in both their chromatin accessibility (Wilcoxon p value = 4 3 10 À13 ) and gene expression (Wilcoxon p value = 9 3 10 À4 ), compared with genes with unchanged Mef2-bound sites ( Figure 6O). Many known Mef2 target genes are among this set, including Mhc, Mlc1/2, Tm1, Mp20, Mlp60A, and Msp300. In addition, their expression changes become more severe with increasing numbers of associated regulatory regions with reduced accessibility ( Figure 6P). These findings indicate that Mef2 functions primarily as an activator and as the predominant regulator for the expression of these genes, which in turn likely leads to the muscle defects in Mef2 mutant embryos. It is also a rare example demonstrating that a single TF can affect the regulation of many genes by having a cumulative effect on their expression through the action of multiple dependent enhancers.

DISCUSSION
Here, we present a general framework to obtain a fine-grained view of TF function at both a cellular and molecular level using a systematic, unbiased approach. Phenotypes of developmental mutants are typically assessed by immunostaining with tissue markers and often described in qualitative and somewhat arbitrary terms. There are many examples where other phenotypes were missed as the tissue was outside the interests or scope of the study, and in some cases, suitable tissue markers were not available because they are downstream of the mutated TF. When they are, translating such coarse-grained tissue defects to the underlying molecular function of the TF remains a challenge, and typically the regulatory input is only assessed by occupancy in wild-type embryos compared with gene-expression changes in the mutant. Here, we show how single-cell regulatory trajectories, obtained by a dense time course of developmental stages, provide a new opportunity to map developmental mutants to much more precise cell states, thereby providing more fine-grained insights into mutant phenotypes. In the four mutants studied here, this approach not only revealed the loss of the expected cell types but could also quantify the proportion of cells lost, pinpoint the development stages, and reveal more subtle phenotypes, such as a gain of some mutant cells in seemingly normal trajectories of other tissues. This highlights the plasticity of mesodermal cell states and also a high degree of canalization to developmental programming, even upon mutation of these essential TFs. This could increase the overall robustness of embryogenesis, for example, by providing an excess of cells that can partly compensate for the loss of others when defects occur. The data provide a rich resource of regulatory changes associated with each step of mesoderm specification and differentiation into different muscle types, which we provide as easy to search, interactive UMAPs for further exploration (http://furlonglab. embl.de/ss/Drosophila-Mesoderm-Chromatin-Accessibility/). Going forward, this approach could be applied to reassess phenotypes and regulatory programs of ''classic'' developmental mutants and also to uncover phenotypes of completely uncharacterized mutants de novo, beckoning a new era for joint cellular and molecular phenotyping.

Limitations of the study
Although the approach presented here can readily identify cellular phenotypes from mutant embryos, it might not be possible to study molecular phenotypes in mutants in cases where specific cell types are not specified or maintained. This limitation could be overcome by using conditional depletion/ knockout strategies (Kö gler et al., 2021). Dissection of molecular phenotypes using scATAC-seq could also be masked by the binding of other TFs to the same enhancers. Here, using single-cell ChIP against H3K27ac and/or nascent RNA-seq to measure eRNA at single-cell resolution would help, although both approaches are still very challenging to apply to embryos.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: Generation of CRISPR strains Balancer chromosomes are highly inverted chromosomes that prevent the recovery of recombinants by suppressing genetic recombination between homologous chromosomes during meiosis. They are typically homozygous lethal, and maintained in trans to a nonbalancer homologous chromosome. When placed in trans to a recessive lethal mutation, the only embryos that can survive to adulthood are the trans-heterozygous mutation/balancer, with the homozygous mutation/mutation and balancer/balancer offspring being lethal. Recessive lethal mutations can thereby be maintained in trans to a balancer for decades (indefinitely). However, as any additional spurious mutations on mutant of interest's chromosome also cannot recombine off the chromosome due to the presence of the balancer chromosome, old mutant stocks naturally accumulate other deleterious mutations.
As the loss-of-function lines for all four TFs assessed in this study were generated twenty or more years ago, they will have accumulated many additional mutations, which are also maintained by the balancer chromosomes. In addition, as they were made by different labs at different stages, they also have different genetic backgrounds. We therefore initiated this study by generating clean loss-of-function mutants for all four TFs in the same isogenic background. As the previous alleles were molecularly characterized and demonstrated to be loss-of-function, we used CRISPR induced template directed homology to regenerate the same loss-of-function alleles for each factor. Specifically, we regenerated the Mef2 22.21 allele (Flybase ID FBal0033789; (Bour et al., 1995)), tinman EC40 allele (Flybase ID FBal0032861; (Bodmer, 1993)) and biniou R22 allele (Flybase ID FBal0043738; (Zaffran et al., 2001)). These are all single-nucleotide nonsense mutations that introduce a premature stop codon and are therefore protein nulls. As the gene bagpipe (Flybase ID FBgn0004862) does not have a characterized loss-of-function allele, the mutant phenotype was characterized using a deficiency, we applied the same CRISPR approach to introduce a nonsense mutation at 3R:G21389189T, which is located in the first exon and 180 bp before the TF's DNA binding domain. These mutations were introduced in a clean, isogenic and fully sequenced fly line that expresses Cas9 in the germline under the Vasa promoter w[1118]; PBac{y[+mDint2]=vas-Cas9}VK00027 (Bloomington stock 51324).
A single stranded oligonucleotide (ssODN) was designed for each locus (Table S4) to serve as a template for HDR following the Cas9 induced double strand break, based on the protocol available on the flyCRISPR website (https://flycrispr.org/protocols/ ssodn/) (Gratz et al., 2015). The template ssODNs were designed to include additional features besides the intended single-nucleotide nonsense mutation ( Figure S3A): (1) a restriction site for SacI (GAGCTC; NEB) was introduced downstream the premature stop codon to be used for screening. (2) A thymine nucleotide was inserted immediately upstream the SacI restriction site, causing a frame-shift mutation that generates a second premature stop codon (TGA); this would serve to terminate translation in case of read-through after the first stop codon. (3) point mutations in the PAM or the gRNA seed to prevent re-cutting by Cas9. The ssODNs were synthesized by Integrated DNA Technologies (Coralville, IA, United States of America) (IDT). A single gRNA against each target locus (Table S4) was designed using flyCRISPR target finder (http://targetfinder.flycrispr.neuro.brown.edu/) (Gratz et al., 2014). The gRNAs were cloned in the vector pBs-U6-gRNA-BbsI (Addgene #45946) and injected together with the ssODNs into embryos of the Vasa-Cas9 fly line described above.
Emerging flies were crossed back to the same isogenic Vasa-Cas9 line in trans to a sequenced balancer for chromosome 2 (If/Cyo, IsoVasCas9; Mef2 allele) or chromosome 3 (IsoVasCas9, Sb/TM3 Ser; other alleles). Screening was performed by SacI restriction digestion of the PCR amplified locus. We confirmed that the correct mutant alleles were regenerated for each locus by three independent methods; (1) the intended nonsense mutations were confirmed by Sanger sequencing, (2) by a genetic complementation test, which showed that the new alleles non-complement the lines carrying the ''original'' mutant alleles described above, as expected, and (3) immunostaining showed that the new alleles recapitulate the known mutant phenotypes ( Figure S3B).

METHOD DETAILS
Embryo fixation and nuclear isolation Drosophila melanogaster embryos were collected and fixed as previously described (Bonn et al., 2012bSandmann et al., 2007b). In summary, embryos were collected in staged two-hour windows following three one-hour pre-lays to clear the females and synchronize the collections, which were aged at 25 C to the corresponding time window (3-5 hr, 4-6 hr, 5-7 hr, 6-8 hr, 7-9 hr, 8-10 hr, 9-11 hr and 10-12 hr for wild-type (Canton S) collections, 10-12 hr for Mef2, 5-7 hr for tinman and 6-8 hr for biniou and bagpipe mutant embryo collections). Embryos were dechorionated in 50% bleach for 2 min and fixed with 1.8% formaldehyde for 15 min. For Mef2 hand-sorted mutant embryo collections, the Mef2 mutant allele was placed in trans to a GFP marked balancer chromosome (CyO, twi-Gal4, UAS-GFP) and the homozygous mutant embryos (GFP negative) were hand-sorted from their siblings under a dissection microscope prior to fixation. After 15 minutes of formaldehyde fixation, embryos were quenched with glycine, washed, dried, snap frozen in liquid nitrogen and stored at À 80 C. Embryo dissociation and nuclear isolation were performed using a dounce homogenizer and a 22G needle as previously described (Bonn et al., 2012b). Nuclei were resuspended in nuclear freezing buffer (50 mM Tris at pH 8.0, 25% glycerol, 5 mM Mg(OAc)2, 0.1 mM EDTA, 5 mM DTT, 13 protease inhibitor cocktail (Roche #11697498001)), snap frozen in liquid nitrogen and stored at -80 C.
Nuclear staining for FANS of mesoderm / muscle populations One day prior to the sci-ATAC-seq experiments, aliquots of 10 million nuclei obtained from wild-type, bap and bin collections were prepared for Fluorescence-Activated Nuclear Sorting (FANS) using an improved BiTS protocol as described previously (Reddington ll OPEN ACCESS Article et al., 2020). Primary antibody staining was performed overnight at 4 C in 400 mL 1X PBS supplemented with 5% BSA, 0.1% TritonX-100 and 13 protease inhibitor cocktail (Roche #11697498001). Rabbit primary antibodies anti-Mef2 and anti-Biniou (both from (Reddington et al., 2020); 1:1000 dilution) were used to mark all mesoderm or myogenic mesoderm depending on stage and to mark visceral muscle (VM) primordia respectively. Secondary antibody staining was performed by incubation with fluorescently labelled donkey anti-rabbit IgG-PE conjugate (Biolegend #406421; 1:200 dilution) for 1 hour at 4 C in 400 ml 1X PBS supplemented with 5% BSA, 0.1% TritonX-100 and 13 protease inhibitor cocktail (Roche #11697498001).
Generation of sci-ATAC-seq libraries Generation of sci-ATAC-seq libraries was performed largely as previously described (Cusanovich et al., 2018a), with some modifications. Nuclei were washed twice by pelleting and resuspending in 1 mL 1X PBS supplemented with 0.1% TritonX-100 and 13 protease inhibitor cocktail (Roche #11697498001). Nuclei were stained with 3 mM DAPI and 2,500 DAPI+ (and Mef2+ or Biniou+ for mesoderm / visceral muscle sorting) nuclei were sorted into each well of a 96-well plate containing 5 mL of Omni-ATAC buffer (Corces et al., 2017) supplemented with 13 protease inhibitor cocktail (Roche #11697498001) and 12 mL of TD buffer (Illumina #15027866) in each well. Tagmentation was performed by adding 2 ml of each of the 96 custom and uniquely indexed Tn5 transposomes (2.5 uM; provided by Illumina as part of a collaborative agreement) and incubating at 55 C for 2 hours. Following the second sorting step and reverse-crosslinking, tagmented DNA was PCR amplified by adding 5 mL of 5 mM forward and reverse indexed primers (Cusanovich et al., 2018a), 7.5 ml of NPM polymerase master mix (Illumina #15027920) and BSA (2X final concentration) to each well and by running the following cycling conditions: 72 C 5 min, 98 C 30 s; 98 C 10 s, 63 C 30 s, 19-20 cycles; 72 C 1 min, hold at 10 C. The optimal number of cycles for each library was determined beforehand by monitoring amplification on a qPCR machine for a set of test wells. Libraries were sequenced on an Illumina NextSeq 500 sequencer High Capacity 150 PE kit (Illumina) using custom primers (sequencing primers for read 1-2 and for index 1-2) and a custom sequencing recipe previously described (Amini et al., 2014).

QUANTIFICATION AND STATISTICAL ANALYSIS
Raw data processing and cell assignment Processing of raw sequencing data was performed with the pipeline provided in (Cusanovich et al., 2018a). In brief, read barcodes presenting sequencing or PCR amplification errors were first corrected to their presumptive match (Levenshtein distance < 3 and distance to next best match > 2); all other barcodes were classified as ambiguous or unknown. Reads were trimmed with Trimmomatic v0.32 (Bolger et al., 2014) and mapped to the dm6 reference genome using Bowtie2 v2.3.4.1 (Langmead and Salzberg, 2012) (with options -X 2000 -3 1).
After removal of PCR duplicates, we proceeded to identify cells and exclude low quality cells by applying three stringent filters (Figure S1A): (1) barcodes were classified as genuine cells if there was no more than 5% uncertainty that they belonged to the higher read depth component of a Gaussian mixture model fitted to the distribution of read counts per barcode (as in (Cusanovich et al., 2018a), QC step 1). (2) As the insert size distribution was evident even in single cells ( Figure S1A), we quantified nucleosomal banding per-cell using a fast Fourier transform-based metric with the scripts provided in (Cusanovich et al., 2018b) and retained cells with clear nucleosomal banding (QC step 2). (3) We realized that cells missing a clear sub-nucleosomal peak can still receive a good nucleosomal banding score based on this metric alone ( Figure S1A) and therefore applied a third filter to remove these cells (QC step 3), based on per-cell quantification of sub-nucleosomal fragments.
Clustering of the wild-type time course After cell filtering based on the QC metrics described above, the two sci-ATAC-seq replicates were merged giving a combined data set of 24,032 wild-type cells. To maximise the resolution of the mesoderm/muscle chromatin accessibility landscape over the whole time course, we performed two rounds of clustering ( Figure S2A; Table S1): first each time point was clustered individually with Seurat v3.2.2 function 'FindClusters' based on accessibility quantified on a merged set of 42,076 peaks that were called separately for each time point. This process allowed the identification of 61 Seurat clusters, from which we excluded low quality and suspected collision clusters (12 clusters comprising 2,725 cells) based on the read depth and sex ratio metrics, as described in (Cusanovich et al., 2018a). Accessibility was quantified again at 50,261 merged peaks identified separately for each cluster and the resulting count matrix was used for clustering the full time course. Batch correction was not necessary, as we did not observe a clustering bias for the two replicates ( Figure S2B).

Calculation of gene activity scores
Gene activity scores were calculated with script 'sc_atac_window_counter.py' from (Cusanovich et al., 2018a) by computing accessibility over the whole gene body plus 500 bp upstream, for genes in the R Bionconductor package 'TxDb.Dmelanogaster.UCSC.d-m6.ensGene_3.4.4'. The resulting gene by cell accessibility matrices were imported in Seurat for downstream analysis.
Cell type annotation ATAC-seq peaks and genes were tested for differential accessibility in each cluster by logistic regression using Seurat v3.2.2 function 'FindAllMarkers', with the total counts per cell given as a latent variable. Features with a positive log fold-change and a Bonferroni adjusted P-value below 0.05 were considered to be markers of a given cluster, while the non-significant features were retained as background. The cluster markers and the background features were matched to activity terms of characterized in vivo enhancer activity Kvon et al., 2014;Rivera et al., 2019) or BDGP gene expression , and each activity term in a given cluster was tested for over-representation against the background with a Fisher's one-tailed test. As many of the activity terms are highly overlapping, the Fisher's test p-values were not formally corrected for multiple comparison, and instead to assign cell types we focused on large and consistent enrichments of similar activity terms.

Transcription factor deviation scores
To investigate the relationship between accessibility changes and transcription factor occupancy, we retrieved 16 ChIP datasets of mesodermal factors from our lab (Cunha et al., 2010;Jakobsen et al., 2007;Junion et al., 2012;Zinzen et al., 2009) (Figures 2A and 2B) and 280 ChIP datasets on diverse factors from the modERN database (Kudron et al., 2018) ( Figure 2C). In all cases, the peak sets already defined by the authors were used. For analysis with the modERN data, we followed our previously reported strategy (Reddington et al., 2020) to exclude ATAC-seq peaks occupied by any TF with ubiquitous expression, which resulted in 30,318 peaks being retained for analysis from our original list of 50,261 peaks. Deviations in accessibility were calculated with chromVAR v1.10.0 (Schep et al., 2017) and averaged per cell type in Figure 2A and per cell type and time point in Figure 2C.

Pseudotime analysis
To order cells in pseudotime, we identified trajectories for the myogenic lineages and aligned single cells along the trajectories following the approach outlined in (Satpathy et al., 2019). We used the function 'alignTrajectory' provided in (Satpathy et al., 2019) to construct trajectories in the UMAP subspace for the somatic (clusters 5, 0, 2, 3, 6), cardiac (clusters 5, 0, 15) and visceral lineages (clusters 5, 13, 8, 7) and to calculate pseudotime along the aligned cells (Table S3). We used functions 'getTrajectory ' and 'plotTrajectoryHeatmap' from ArchR v1.0 (Granja et al., 2021) to reconstruct feature trends across pseudotime and plot the peaks and genes heatmaps in Figure 3B. The binarized accessibility matrix was used as input for peaks and the log-normalized gene activity matrix for genes. For heatmap visualization, we selected the top 10% and 20% most variable peaks and genes across pseudotime. in features per cell). Features with Bonferroni adjusted p-value < 0.001 and log 2 fold-change > 0.5 were considered differentially accessible (Table S2). The log 2 fold-change was calculated using counts matrices scaled by the total counts in features per cell, in order to correct for potential coverage differences among clusters, and using a pseudocount of 10 À6 . To generate the heatmaps in Figure 2D, the scaled accessibility was averaged per cluster with Seurat function 'AverageExpression'.

Identification
Single-nucleus genotyping A set of discriminatory variants, separating the Vasa-Cas9 and the Balancer chromosomes, was used following the GATK best practice pipeline. Since the balancer chromosomes are homozygous lethal, we analyzed an F1 cross between the balancer and the virginizer line together with the homozygous virginizer line to retrieve the Balancer variants (data from (Ghavi-Helm et al., 2019)). Genomic DNA from more than 30 adult flies from the Vasa-Cas9 line was extracted and prepared for whole-genome sequencing, following the protocol in (Ghavi-Helm et al., 2019).
Balancer chromosome alleles were inferred from heterozygous variants in the cross between virginizer and balancer lines. Finally, we excluded contigs other than chromosomes 2, 3 and X, obtaining a total of 104,913 single nucleotide variants (SNVs) separating the Vasa-Cas9 and balancer chromosomes when applying the stringent filters and 465,110 when applying the lenient filters.
Genotypes were demultiplexed using Vireo v0.5.6 (Huang et al., 2019) based on the lenient set of discriminatory SNVs identified between the balancer chromosomes and the common isogenic Vasa-Cas9 genetic background of each mutant. First, cellSNP (https://github.com/single-cell-genetics/cellSNP) was used to pileup aligned reads at each variant for every cell, filtering variants with less than 50 reads or a minor allele frequency of less than 10% (options -minMAF 0.1 -minCOUNT 50 -UMItag=None). This resulted in a set of 201,349 variants for the Mef2 mutant (chromosome 2 balancer) and between 86,912 and 149,914 variants for tin, bin and bap mutants (chromosome 3 balancer) ( Figure S4). Second, allele-specific counts were passed to Vireo in order to assign nuclei to one of the three genotypes using the known reference genomes. Doublet detection was turned off (option -noDoublet) as Cas9/ Balancer cells are indistinguishable from barcode collisions between Cas9 and Balancer cells. The barcodes that passed the QC steps described above were further filtered for having been assigned a genotype, while unassigned nuclei were excluded from downstream analysis.
Clustering of the Mef2 mutant dataset After cell QC and genotype assignment (Table S5), chromatin accessibility for Mef2 nuclei was quantified at the 53,133 peaks previously identified in the whole-embryo sci-ATAC-seq dataset (Cusanovich et al., 2018a) and the mesoderm / muscle cells were identified by clustering as described above (5 clusters comprising 2,567 cells). Peaks were called as described above for each mesoderm / muscle cluster and merged with peaks called on each genotype, resulting in 54,609 merged peaks. This peak set was used to quantify accessibility in the 2,567 muscle cells and 739 additional muscle cells that were identified by clustering the Mef2 hand-sorted sample, totally 3,306 muscle cells. We performed one round of clustering followed by cell type annotation. In the final clustering displayed in Figure 4D, we removed an unrelated neuronal population (78 cells) and a group of 216 interspersed cells that showed poor clustering based on silhouette analysis (Rousseeuw, 1987), a common method to evaluate cluster cohesion and separation (all clusters had an average silhouette width above zero (mean = 0.13) except the removed cells, which had a negative average silhouette width (mean = -0.09), as calculated with R function 'silhouette' from package 'cluster' v2.1.2). For co-clustering with the wild-type time course in Figure 4I (Table S5), chromatin accessibility was quantified at 66,105 peaks obtained from merging wild-type peaks (50,261) with the lists of Mef2 genotype-cluster peaks defined above. Topic modelling was applied to the count matrix with cisTopic v0.2.2 (Bravo Gonzá lez-Blas et al., 2019) and the topic by cell matrix was batch corrected with Harmony v1.0 (Korsunsky et al., 2019) (theta = 0; wild-type cells were provided as reference with option 'reference_values') prior to UMAP visualization.
Clustering of tinman, bagpipe and biniou mutant datasets After cell quality control and genotype assignment (Table S5), chromatin accessibility for bap and bin mutant cells was initially quantified at the 50,261 peaks previously defined for the wild-type time course, and the resulting count matrices were clustered. Peaks were then called as described above for each cell cluster and merged with peaks called on each genotype, resulting in 57,878 and 55,490 merged peaks for bap and bin respectively. Similar to Mef2 processing, chromatin accessibility for tin was quantified at 53,133 peaks previously identified in a whole-embryo sci-ATAC-seq dataset (Cusanovich et al., 2018a) and the mesoderm / muscle cells were identified by clustering (6 clusters comprising 6,786 cells). Peaks were called for each mesoderm / muscle cluster and merged with peaks called on each genotype, resulting in 60,104 merged peaks. For co-clustering with the wild-type time course  (Reddington et al., 2020) and aggregated pseudobulk sci-ATAC-seq profiles for time-matched samples. (G) sci-ATAC-seq coverage at characterized mesoderm/muscle enhancers (red) and enhancers active in other embryonic tissues (non-muscle, grey). Figures 1-2): Constructing, annotating and analyzing an atlas of chromatin accessibility during a time-course of mesoderm/muscle development (A) Overview of data processing and clustering strategy. After cell filtering (Fig. S1A), the two sci-ATAC-seq batches were merged giving a combined set of 24,032 mesodermal (Mef2+) cells. To maximise sensitivity to detect peaks at each stage and cell type, peaks were first called separately for each timepoint (yielding a combined set of 42,076 peaks), which were used to cluster each time point individually (using Seurat v3.2.2), yielding 61 cell clusters. Accessibility peaks were identified again in each cell cluster, identifying 50,261 merged peaks used for clustering the whole wild-type time-course. (B) UMAP visualization of the whole mesoderm/muscle time course, highlighting cells belonging to each batch in purple (rep 1) or red (rep 2). Cells from each batch are evenly distributed throughout the time-course, suggesting little or no batch effect, keeping with the high correlation between batches (Fig. S1E). (C) Cell distribution over the first UMAP dimension per time point, showing the gradual progression along developmental time. (D) UMAP visualization of the full mesoderm/muscle time course with cells coloured by their respective Seurat defined cluster. (E) Enrichment for enhancers with tissue activity (left) and genes with tissue expression (right) terms in each cluster. Fisher's exact test P-value (-log10) displayed as a heatmap. Cell clusters are grouped by the inferred cell type annotations (top bars), colours correspond to Figure 1D. (F) As embryo collections are a mix of male and females, we assessed if there are sex-dependent effects. Each nucleus was molecularly sexed using the fraction of reads mapping to the X Chr, as female nuclei have roughly double the number of X-reads than males. Density plots show the fraction of reads from the X Chr (x-axis) for nuclei in each cluster (corresponding cluster number from (D) indicated on top). Each cluster has a bimodal distribution among nuclei in the ratio of X-Chr reads, indicating little or no sexbiases in each muscle cluster. (G) Marker genes accessibility for non-myogenic cell populations (top to bottom: neuro, fat body, hemocytes). The colour scale indicates gene average accessibility (Z-score), while dot size indicates the percentage of cells in which the gene is accessible. Annotation colours correspond to Figure 1D. (H) Upset plot showing the intersection between differentially accessible (DA) ATAC-seq peaks discovered here (5,180 peaks from Fig. 2D) compared to 63,157 peaks from tissue-specific bulk DNase-seq (Reddington et al., 2020) and 53,132 peaks from whole-embryo (WE) shot-gun sci-ATAC-seq (Cusanovich et al., 2018a). The DA sites are divided into distal (left, >500bp from a Transcriptional Start Site (TSS)) and TSS-proximal sites (right). An intersection is counted if at least 25% of a peak in this study overlaps a peak in another dataset.

Figure S3 (related to Figures 4-5): Generation and validation of new loss-of-function mutants for four mesodermal transcription factors (A)
Design of the single stranded oligonucleotides (ssODNs) for the Mef2, tin (tinman), bap (bagpipe), and bin (biniou) loci, which were used as homology directed repair (HDR) templates following a Cas9 induced double strand break. Nucleotides that differ from the isogenic vasa-Cas9 locus are indicated in red. For Mef2, tin and bin, the introduced mutation recapitulates the characterised loss-of-function allele, but is now generated in a clean isogenic background common to all four mutants. This previously characterized single point nonsense mutation (indicated by the red "STOP" box), in addition to a restriction site for SacI (GAGCTC; blue box) and a thymine nucleotide causing a nonsense frame-shift mutation (orange box) were introduced in each locus. Additional single point mutations were introduced in either the PAM site or the gRNA seed to prevent re-cutting by Cas9. (B) All four CRISPR generated new alleles recapitulate the characterised loss-of-function phenotypes for each gene. Upper left panels: Double immunostaining of Mef2 (red) and Tropomyosin (yellow) in heterozygous (+/-; identified by LacZ expression on the balancer chromosome) and homozygous (-/-) Mef2 mutant stage 16 embryos. While the Mef2 +/-embryo has stereotypically patterned somatic muscle (SM), the Mef2 -/-embryo has an absence of Mef2 protein and little or no differentiated somatic muscle. Upper right panels: Immunostaining of Mef2 (red) in tin heterozygous (+/-) and homozygous (-/-) mutant stage 16 embryos. The dorsal somatic muscle (DSM) and heart muscle (Ht) indicated in tin +/-embryos are missing in the tin -/embryo, in addition to the visceral muscle (not shown). Lower panels: Immunostaining of Fas3 (yellow) as a marker for visceral muscle (VM) in bap (left) and bin (right) heterozygous (+/-) and homozygous (-/-) mutant embryos. Note the complete absence of circular trunk visceral muscle in both mutants, while the hindgut VM (HVM) is unaffected. All embryos are orientated anterior to the left, and dorsal up.