Continuous transcriptome analysis reveals novel patterns of early gene expression in Drosophila embryos

Summary The transformative events during early organismal development lay the foundation for body formation and long-term phenotype. The rapid progression of events and the limited material available present major barriers to studying these earliest stages of development. Herein, we report an operationally simple RNA sequencing approach for high-resolution, time-sensitive transcriptome analysis in early (≤3 h) Drosophila embryos. This method does not require embryo staging but relies on single-embryo RNA sequencing and transcriptome ordering along a developmental trajectory (pseudo-time). The resulting high-resolution, time-sensitive mRNA expression profiles reveal the exact onset of transcription and degradation for thousands of transcripts. Further, using sex-specific transcription signatures, embryos can be sexed directly, eliminating the need for Y chromosome genotyping and revealing patterns of sex-biased transcription from the beginning of zygotic transcription. Our data provide an unparalleled resolution of gene expression during early development and enhance the current understanding of early transcriptional processes.


In brief
Pé rez-Mojica et al. have developed an operationally simple RNA sequencing approach for enabling high-resolution, time-sensitive transcriptome analysis in early-stage Drosophila embryos (0-3 h). Their analysis provides detailed insights into gene expression during early development and enhances the current understanding of the earliest sex-specific transcriptional signatures.

INTRODUCTION
In many animal species, the zygote relies on maternally deposited transcripts to progress through the earliest stages of development. 1 It is not until later that the zygote takes control of its own development, a process referred to as the maternal-tozygotic transition. An important element of this transition is the start of transcription from the zygotic genome, also referred to as zygotic genome activation (ZGA). Previous studies have shown that the ZGA is a progressive event. It starts with the transcription of just a handful of genes (minor ZGA) and increases to hundreds of genes shortly thereafter (major ZGA). [2][3][4] In Drosophila melanogaster, it is generally accepted that sisterless A (sis A) and snail (sna) are transcribed early in development during nuclear cycle (NC) 8. 5,6 Evidence suggests that scute (sc) is an additional early transcribed gene, but reports differ on the timing of transcriptional onset. 5,[7][8][9] The total number of zygotic transcribed genes reported to be expressed by NC 9 is 20 and increases to 63 by the end of NC 10. 4 This so-called minor wave of transcription coincides with other important developmental processes, such as the migration of nuclei to the posterior pole of the embryo and the generation of pole cells. 10 With the onset of the minor ZGA, nuclear cycles become progressively longer. While the first eight NCs on average take 8 min, the duration continuously increases until it reaches 65 min at NC 14. 10 NC 14 marks the beginning of the major ZGA, and the number of genes transcribed increases significantly to 3,540. 4 The major ZGA is accompanied by important developmental processes such as cellularization, first gastrulation movements, end of synchronous nuclear divisions, and the start of dosage compensation by Male-Specific Lethal (MSL) complex. 2,11,12 Of note, sex-specific transcription is observed as early as NC 11. 2 The exact onset and sequence of transcriptional events leading up to this differential gene expression remain poorly understood.
In parallel to ZGA, maternal transcripts are being degraded in an organized manner. This process of clearing the maternally deposited mRNAs is essential for proper development. It is important to note that the degradation of certain maternal transcripts occurs even in unfertilized eggs. 13 Design The limited size of embryos and rapid progression of developmental processes make a quantitative assessment of transcriptional events challenging. Previous studies have investigated the timing, extent, and sex specificity of the ZGA using different methods such as RNA radioactive 14 or metabolic labeling, 4 in situ hybridization, 15,16 RNA sequencing (RNA-seq), 2,17 qPCR-based experiments, 9 and direct count of mRNA molecules. 3,18 Notably, all the different methods rely on the staging of embryos, which can be time consuming, error prone, and requires a well-optimized protocol to guarantee the fast collection of embryos when working with fresh samples. Currently, the only available option to avoid manual embryo staging is to rely on short egg-laying times and incubation to the desired developmental stages. The results of these studies, however, can be biased due to the rate of egg fertilization, regularity of oviposition, and embryo withholding. The latter has been shown to differ for more than 10 h in some Drosophila species. 19 The technical difficulties of existing protocols have led to inconsistencies between findings from different laboratories. For instance, more than half of the transcripts assigned to the minor ZGA in one study 9 were likely due to the contamination of a sample with an older embryo. 4 To address these technical limitations and ensure increased data reproducibility, we developed a single-embryo RNA-seq method to measure zygotic transcripts on a continuous timescale. Using an analysis pipeline designed for single-cell RNAseq, we utilize the transcriptome to determine the biological age and sex for each embryo, eliminating human and technical errors introduced by visual staging. The data produced using this method can be corroborated through comparison with published data and provide the first continuous timeline of transcript levels during early development (%3 h) in D. melanogaster.

RESULTS
Single-embryo RNA-seq In order to study early embryonic transcription in a continuous manner, we performed single-embryo RNA-seq on a total of 192 embryos. The embryos were collected from two different cages in three consecutive 1-h time intervals and allowed to develop further for 0, 1, or 2 h. This resulted in an approximate collection time window of 10 min to 3 h.
RNA was isolated from individual embryos to perform singleembryo RNA-seq using a modified CEL-Seq22 protocol. 20,21 The sequencing data were analyzed using the RaceID/StemID/ FateID single-cell analysis tool ( Figure 1A). 22 Embryos with less than 250,000 reads were excluded from the analysis, leaving 122 embryos for final analysis. In total, we identified 9,777 genes with R3 unique molecular identifier (UMI) corrected read counts in R5 embryos. Unsupervised k-medoids clustering of our data, according to transcriptome similarities, resulted in 14 clusters ( Figure 1B).
Dimensionality reduction of the single-embryo RNA-seq data using t-distributed stochastic neighbor embedding (t-SNE) produced a map where all samples assembled in a linear pattern ( Figure 1B). A similar arrangement was confirmed by other dimensionality reduction methods, including a Fruchterman-Reingold force directed layout and principal-component analysis (PCA) (Figures S1A and S1B).
We then used StemID2, an algorithm developed for the derivation of differentiation trajectories in single-cell data, to generate a lineage tree object, where each embryo receives a relative coordinate on the inferred inter-cluster links according to their transcriptome. The ordering of embryos along this computed developmental trajectory is also called pseudo-time. Because mated females can lay unfertilized eggs, which would compro-mise our analysis, we identified and excluded such embryos based on their position in the pseudo-timeline and the lack of expression of previously reported early transcribed genes (e.g., scw, sc, and esg). At the beginning of development, mRNA decay is the dominating process determining the pseudo-temporal ordering of samples in our analysis. It was previously reported that the decay of certain maternally deposited transcripts occurs even in unfertilized eggs 13 ; however, only fertilized eggs will start transcribing genes. This allows us to identify and exclude samples that, due to mRNA decay patterns, are assigned a high pseudo-time but lack expression of these early expressed genes (Figures S1C-S1E). The number of unfertilized eggs (n = 5) in our dataset matches the actual fertility rate measured on the same day as sample collection ( Figure S1F). Analyzing only fertilized embryos resulted in a layout similar to the one observed for all samples (117 embryos, Figure 1C), and both t-SNE and pseudo-time analysis show that there is no cage batch effect in either analysis (Figures S1G and S1H).
For our purposes, pseudo-time encompasses the time it takes from the earliest embryo measurements, 10 min after fertilization (i.e., a 10-min delay is necessitated by sample processing time), to the developmental stage represented by the last embryo on the spanning tree ( Figure 1D). We subsequently compared this computationally derived pseudo-timeline with our three sample collection time intervals. Even though embryos were collected in three defined 1-h intervals, their position in pseudo-time was not restricted to their respective collection time windows (Figure 1E; Table 1). These results confirm the propensity of mated females to withhold fertilized eggs for extended periods of time. To avoid contamination of samples by withheld embryos, published related methods currently rely on hand staging the developing embryos. Our method enables us to identify these withheld embryos and to assign them to their correct developmental times without the use of labeling protocols or the need to hand stage the embryos. Together, these results show that we have successfully developed a single-embryo RNA-seq protocol and analysis pipeline for characterizing the developmental progress within single embryos.
Single-embryo sequencing and pseudo-time analysis allow for the continuous assessment of transcription profiles during early embryogenesis Next, to investigate the expression of genes across early development, we focused our analysis only on embryos up to approximately 3 h of age (84 embryos). This analysis revealed nine stable clusters of embryos based on their transcription (Figures 2A and 2B) and provided a more refined look at the developmental trajectory. To determine if the computationally derived pseudo-time was in agreement with the biological age of the embryos, we assessed the expression of genes that have been previously reported to be activated during the minor 4 or major 3 wave of ZGA (Table S1). Plotting the combined expression of these genes onto our two-dimensional layout (Figures 2C and 2D) and along the pseudo-timeline ( Figures 2E and 2F) reveals that these two major transcriptional events coincide with the increased distance in our t-SNE map between clusters 1 and 2 and 4 and 5. This is expected as gaps are expected between major transcriptional shifts, and their presence within our pseudo-timeline validates our computational approach.
Plotting gene expression values along the pseudo-time axis provides a detailed insight into the dynamic expression patterns of these early transcribed genes. The published minor ZGA gene dataset 4 utilized in this analysis covered a tight developmental time window between NC 7 and 9, providing a static picture of an approximately 30-min-long developmental window. In contrast, our analysis provides previously unprecedented resolution of the minor ZGA, showing a staggered onset of transcription for these genes ( Figures S2A and S2B). Intriguingly, many of the 20 minor ZGA genes share a similar sharp, transient peak of expression within the 3-h time window, indicative of their roles in early developmental processes. The exceptions are E(spl)m4-BFM, a member of the Notch signaling pathway; sisA, a gene involved in sex determination; and CG6885, a gene of unknown function.
To identify the start of the major ZGA in our timeline, we used the combined expression levels of 17 genes that reportedly increase (R5-fold) between NC 14A and NC 14B ( Figures 2D  and 2F). These genes show increased expression at the transition from cluster 4 to 5 in our data ( Figure 2F). Although the published gene list was curated using embryos within a 15-min developmental time window, carefully staged according to time elapsed in interphase, nuclear elongation, and progression of cellularization, 3 our continuous analysis shows that some of these genes actually increase transcription at earlier time points ( Figures S2C and S2D). Taken together, our results provide the To confirm the dynamic nature of expression patterns uncovered in our dataset, we compared the expression dynamics among a select group of genes that are known to be transcribed early (screw, zerknullt, spitz, deadhead, stumps, and yolkless). These genes were previously shown to exhibit dynamic expression during development in different datasets that relied on the visual assessment and manual separation of samples into specific developmental categories or stages. 2, 3 We plotted the normalized expression levels for these genes, according to the stages disclosed in datasets published by Sandler and Stathopoulos 3 ( Figure 2G) and Lott et al. 2 ( Figure 2H) and according to our new computationally determined timeline ( Figure 2I). Graphs revealed that the transcriptional changes uncovered by our pseudo-time order are in good agreement with the previously published data. Pseudo-time order, cluster number, and sample ID for each embryo are shown in Table S2. Together, the results show that our method provides a high-resolution, time-sensitive picture of transcriptional events during early embryonic development.
Single-embryo RNA-seq reveals novel early transcribed genes To determine if our method can identify novel early expressed genes, we compared expression of genes between embryos from cluster 1 and cluster 2 of the t-SNE map displayed in Figure 2A. Within our dataset, we found the differential upregulation of transcription for 66 genes (adjusted p [padj] <0.01, Log2FC > 1) in this time frame ( Figure 3A; Table S3). Over-representation analysis (ORA) shows that these genes are involved in sex determination and developmental processes ( Figure 3B).
To validate their early expression and determine the biological age at which these transcripts are activated, we compared our dataset with the most comprehensive dataset on early zygotic transcription published to date 4 and performed qPCR on a subset of genes from samples obtained from hand-staged fixed embryos spanning NC 6 to 11. The results from our single-embryo and qPCR analysis confirm the published evidence that sc is one of the earliest expressed genes at NC 7 ( Figure S3A). Our results also corroborate the early expression of the additional 19 genes previously reported to be expressed during NC 7-9. However, our analysis identified many other genes that are upregulated between clusters 1 and 2, which were previously reported to be expressed at significantly later time points, including 31 genes at NC 9-10 and 15 genes at syncytial blasto-derm. Importantly, qPCR results validated our single-embryo analysis approach and confirmed the observed earlier onset of expression for a randomly selected subset of genes (Figures S3B-S3E; ato and CG13465 were previously reported at NC 9-10 and halo and dpn were previously reported during syncytial blastoderm). We next wanted to exclude the possibility that these findings were due to the contamination of our qPCR samples with older embryos. To this end, we measured the levels of two gene transcripts (hrg, bnb) identified to be expressed at a later time point in both our own temporal analysis ( Figures 3D, S3F, and S3G, right panels) and in other published datasets. 2,4 Using this approach, we detected no increase in expression for either hrg or bnb in the NC 7 and NC 8 samples ( Figures S3F and S3G, left panels), which therefore excludes the possibility of contamination of our NC 7 and 8 samples with older embryos. Further analysis revealed an additional upregulation of 37 genes between clusters 2 and 3, including hrg and bnb ( Figure 3D), and 111 genes between clusters 3 and 4 ( Figure 3G; Table S3) with an enrichment in pathways related to early developmental processes ( Figures 3E and 3H). Our analysis identifies 214 genes that are significantly upregulated between clusters before the onset of the major ZGA. Taken together, these results show that our approach is able to identify the accurate onset of transcription of early-expressed genes with high sensitivity.
We next compared expression between clusters 4 and 5 to identify genes activated at the beginning of the major ZGA ( Figure 3J; Table S3). We identified 153 significantly upregulated transcripts in the cluster 5 embryos (padj < 0.1, Log2FC > 1). ORA revealed developmental pathways involved in tissue development, sex differentiation, and signaling pathways ( Figure 3K).
Previous studies have shown that a small subset of zygotically transcribed genes are dependent on the transcription factor Zelda (Zld), 23 while a majority of zygotically transcribed genes are Zld independent but enriched for the histone variant H2A.Z. 24 To explore Zld dependency and H2A.Z occupancy behavior over time in our dataset, we quantified the overlap between our up-and downregulated transcripts in Figures 3A, 3D, 3G, and 3J in terms of being Zld dependent or independent while being bound or unbound by H2A.Z (Zld dependent and Zld independent, which were divided into H2A.Z positive or negative). 24 Our analysis shows that the earliest minor ZGA genes are mostly Zld dependent ( Figures 3C and 3F) and that the share of Zld-dependent genes decreases sharply with the onset of the major ZGA. In contrast, the share of Zld-independent genes, both H2A.Z positive and negative, increases with the onset of the major ZGA ( Figures 3I and 3L). These observations are not true for downregulated transcripts, which showed a similar distribution between being Zld dependent and independent, which was similar to all analyzed transcripts ( Figure S3H).
In this way, we have demonstrated that our single-embryo RNA-seq methodology and analysis are a highly sensitive approach for identifying the accurate onset of gene transcription. Further, our analysis is able to define important transcriptional events and identify signatures of gene regulation during early development. Technology ll OPEN ACCESS mRNA decay of maternally deposited transcripts In addition to transcription, our dataset also reveals patterns of maternal RNA decay. While our method is unable to distinguish maternal mRNA from zygotic mRNA, it can be used to study maternal mRNA decay before the zygote takes complete control of its own transcription. In order to identify maternally degraded mRNAs only, we compared cluster 1 (youngest embryos) with cluster 5 (onset of major ZGA) and selected only maternally deposited transcripts for analysis. Maternally deposited transcripts were defined as those with an averaged normalized read count >1 on the first 10 samples in our pseudo-time.
Using this method, a total of 2,621 significantly degraded transcripts were identified (padj < 0.01, Log2FC < À1). Ninety-two percent of these significantly degraded transcripts had also been shown to be degraded in a previously published dataset (classes II, III, IV, and V) ( Figure 4A); 13 only 35, 33, or 13 genes were within the Thomsen stable (class I), purely zygotic group of transcripts, or preloaded and transcribed, respectively.
ORA of all 2,621 degraded transcripts revealed mainly pathways related to metabolism ( Figure S4). This result likely reflects the elimination of transcripts important during oogenesis but that are no longer needed for development. While patterns of transcript abundance differed before and after cluster 5, there is a clear inflection point at around value 50 of our pseudo-timeline. This time point coincides with the onset of the major ZGA. Maternal transcripts are deposited in oocytes at very different levels. To determine if degradation rates in the zygote are related to the initial quantity of deposited transcripts, we divided the significantly downregulated genes into four quartiles by their level of transcript abundance in cluster 1. We then determined the total number of normalized reads for each quartile in each individual embryo. Mean read counts plotted in Figure 4B show the progressive nature of the maternal mRNA decay up to cluster 5. Plotting the ratio of cluster 5 to cluster 1 for the different quartiles shows that the rate of decay is directly proportional to initial  (Table S1).  Figure 4C), meaning that transcripts of low and high abundance are degraded at the same rate. This also provides evidence that our single-embryo approach does not introduce any bias toward lowly expressed genes, which would be affected most significantly by an overall increase in RNA content due to ZGA. Overall, we showed that maternal mRNA decay before the major ZGA is a progressive process. Degradation of maternal mRNA is proportional to transcript levels, suggesting that mRNA abundance is not related to degradation rate.

X/Y chromosome genotyping uncovers transcriptional dynamics of primary sex determination
Our prior analysis of the earliest transcribed genes indicates sex determination as the most enriched pathway ( Figure 3B). The current model for primary sex determination is based on the tightly controlled sex-specific expression levels of genes such as Sex lethal (Sxl) and male-specific lethal (msl-2) 25 during early development. This made us wonder if there are additional detectable differences in transcription between male and female embryos during our early developmental time window. To define the sex of each individual embryo, we isolated DNA from the organic phase after TRIzol extraction of RNA and performed qPCR using primers specific for the X and Y chromosomes. Due to low DNA content of younger embryos, we only get consistent PCR results after embryo number 23 in our pseudo-time analysis. Based on these results, we categorized all embryos (after pseudo-time position 23) according to sex (Table S2). To determine the differential expression of genes between male and female embryos, we used splineTimeR (see STAR Methods), which is particularly designed for identification of expression changes in longitudinal data. Our analysis identified 120 transcripts that were differentially expressed between male and female embryos (padj < 0.01) ( Figure 5A). Although a large number of the differentially regulated genes are located on the X chromosome (44%), more than half the genes are located on autosomes and rDNA (56%). Several known regulators of primary sex determination, such as Sxl, sc, sisA, and msl-2, were also identified as significantly expressed in our analysis (Table S4). Indeed, ORA shows that sex differentiation is the most enriched pathway (Figure 5B). We selected and plotted known regulators of sex determination using our pseudo-timescale ( Figures 5C and 5D). This analysis shows that differential transcription of sc and sisA (Figure 5D) precedes the expression of Sxl. This agrees with the role of sc and sisA as activators of Sxl expression in females (Figure 5C). Additionally, we identify other differentially expressed transcripts that precede Sxl transcription, such as CG14427. Ex-amples for male-specific expression are plotted in Figure 5E. These data show that the start of differential expression of stonewall (stwl, chromosome 3L) matches the start of differential expression of Sxl and msl-2 and that a pre-rRNA gene (pre-rRNA:CR45847) is expressed only in males shortly after Sxl and msl-2.
Another important process linked to primary sex determination is dosage compensation, which ensures equal expression of X-linked genes in males and females. In Drosophila, this is accomplished by the 2-fold upregulation of the X chromosome in males. It has previously been reported that X-lined genes are upregulated in males as early as NC 14, 2 but functional studies suggest the likely onset of canonical dosage compensation at NC 15. 12 To assess the onset of dosage compensation in our dataset, we excluded all maternally deposited transcripts and determined the total number of normalized reads for the remaining genes on the X chromosome and autosomes for each individual embryo. Average read counts of male and female embryos within each cluster are plotted in Figures 5G and 5H. From cluster 4 to 7, we observed a 1.53 higher expression of X-linked genes in female compared with male embryos, but no difference in autosomal reads. This difference was reduced to 1.13 in cluster 9 (gastrulation onset) ( Figure 5G), probably due to the start of canonical dosage compensation. Two components of the dosage compensation complex have been shown to have male-biased transcription, msl-2 and long non-coding RNA on the X 1 (lncRNA:roX1). msl-2 is expressed at higher levels in males almost from the moment it begins to be transcribed ( Figure 5C) and well before we detect dosage compensation. lncRNA:roX1 is expressed at higher levels in female embryos at first (Figure 5F), which can be explained by its localization on the X chromosome; levels only start to be higher in males once we see evidence for dosage compensation (cluster 8).
To further investigate how transcript levels are influenced by their chromosomal localization, we plotted early transcribed genes from our prior analysis according to sex and chromosome location (Figures 2A and 2D). Analysis shows that early transcribed genes from the X chromosome, but not autosomes, tend to have higher levels in females compared with males (Data S1). The time point at which expression levels in females Technology ll OPEN ACCESS are higher than in males varies between genes, with some differences detected as soon as transcription of a gene starts (e.g., ac, acheate) and others occurring later in transcription (e.g., run, runt) (Data S1).
Beyond the biological relevance of sex-specific transcription, we asked whether sex-specific gene expression could serve as a tool to determine the sex of individual embryos. This approach would eliminate the need for utilizing the standard time-consuming genotyping approach. To this end, we plotted Sxl and msl-2 transcript levels and observed a clear separation of embryos according to their sex (Figure 5I). We further validated this approach by applying it to a published single-embryo sequencing dataset, 26 confirming that using Sxl and msl-2 expression is sufficient to determine the sex in embryos of different Drosophila species ( Figure 5J). Of note, this approach only works in embryos after  26 Metadata for sex-specific analysis can be found in Table S9. the onset of Sxl and msl-2 transcription, around NC 12 and NC 14D, respectively (Data S1D).
Overall, our analysis detects sex-specific transcription as early as the minor ZGA. Capitalizing on this differential gene expression, a simple strategy to sex embryos has been developed.

DISCUSSION
Studying early development is challenging due to the rapid progression of biological processes and the limited amount of material available. To overcome these limitations, we developed a single-embryo RNA-seq and analysis approach, using the transcriptome as a measure of developmental progress (pseudo-time) to determine the biological age of the embryo. The high sensitivity of our method allows us to provide an accurate assessment of zygotic transcription and uncover the dynamic patterns for hundreds of genes. Our single-embryo approach also enables us to determine sex-specific differences in transcript abundance. Utilizing these sex differences, we developed a new strategy to determine the sex of each embryo, without the need for genotyping. Together, we established an operationally simple method to document gene expression changes at unprecedented temporal resolution and provide a continuous assessment of transcriptional processes during early development.
An operationally simple, single-embryo sequencing method Previous studies investigating zygotic transcription relied on precise collection time windows and/or elaborate manual staging of the embryos under a microscope. Drosophila females, however, are known to lay unfertilized eggs and withhold embryos. 19 Human error in staging of embryos and irregular laying patterns of females can lead to the inclusion of mis-staged embryos in the analysis. Recent studies have highlighted the advantages of single-embryo sequencing approaches over working with pooled samples, 26,27 specifically the ability to detect and exclude older embryos from the analysis, therefore providing more accurate data.
In this work, we present an optimized single-cell sequencing protocol for use with Drosophila embryos along with a singlecell bioinformatic pipeline for their analysis. We assign a computationally derived age to each embryo, based upon their transcriptome, thereby circumventing the need for elaborate and error-prone staging procedures. Indeed, we show that our computationally derived pseudo-timeline reflects the biological age of the embryos, by comparison with previously established datasets. In addition, our protocol reduces reagent and sequencing costs due to the low-volume nature of the experiments and the inclusion of UMIs. 21 Further, this method requires no special instrumentation beyond a micro-pipetting device, and the analysis utilizes established tools. 22 Taken together, we show that sequencing 192 individual embryos provides a sufficient dataset to carry out detailed analysis of gene expression patterns over the first 3 h of development. Typically, it cost us around US$36 to generate the library and sequence a single embryo at 6 M reads, making it an affordable alternative to sequencing bulk samples of manually staged embryos. Together, these advantages make the method reported here the most accessible methodology developed to date, opening up this type of research to almost any Drosophila lab. This single-embryo sequencing approach will, ultimately, lead to improved reproducibility of developmental studies between experiments and laboratories.
An accurate characterization of early transcriptional events Our pseudo-time approach allows us to identify the exact onset of transcription even for lowly expressed genes and reveals that previously reported, as well as many novel, transcripts are expressed as early as NC 7. One example is the early expression of halo, a cofactor for the molecular motor, kinesin, and a regulator of lipid droplet movement, which was previously reported to be actively transcribed during syncytial blastoderm (after NC 11), 4 but was identified as one of the earliest transcribed genes in our dataset. Our analysis also reveals the dynamic nature of transcriptional events and provides information about expression for thousands of genes at a temporal resolution unchallenged by other methods. Recently, a single-cell dataset was published covering all of embryogenesis and providing insights into cell-type-specific transcriptional changes during development. 28 While this dataset provides an extremely detailed insight into Drosophila development, it only detected a median of 399 UMIs and 274 genes per cell, likely only covering very highly expressed genes. In contrast, we detected a median of over 600,000 UMIs and identified over 7,200 genes per embryo, leading to a total 9,777 identified genes across the whole dataset. Thus, our dataset gives a much more complete picture of transcriptional changes during early development.

Sex-specific gene expression
Our single-embryo method also distinguishes between male and female embryos, allowing for investigations into sex-specific transcriptional effects. X-signal elements (XSEs) have been shown to control the early sex-specific expression of Sxl and to drive primary sex determination. We show differential expression of sc and sisA, two strong XSEs, from the very first moment of ZGA (NC 7). Of note, CG14427, an X-linked gene with unknown function, is also differentially expressed between males and females starting at NC 7, making it a potential candidate as a novel XSE. Our data also allow for further insights into primary sex determination. An early expressed XSE, run, was previously reported to undergo a non-canonical form of dosage compensation and expression that was proposed to be under the direct control of Sxl. 29,30 Our results show that run is one of the earliest transcribed genes, preceding expression of Sxl. Our data also show that expression of run peaks after Sxl peak expression. As such, our data support a role of run as a regulator of Sxl 31 rather than Sxl controlling expression of run. 29,30 Surprisingly, we also identified several autosomal encoded genes as differentially expressed between males and females. While differential expression of X chromosome genes in females can be explained by their different dosage (2X in females versus 1X in males), this is not the case for autosomal genes, which are present at equal dosage in both males and females. These results suggest additional players in primary sex determination. Further studies will be needed to confirm these results and investigate the underlying mechanisms. Importantly, our newly developed strategy to determine the sex of single embryos by using the expression of known regulators of primary sex determination (Sxl and msl-2) eliminates the need for elaborate genotyping procedures in future sequencing datasets.
Limitations of the study While our method detects more transcripts (9,777) and achieves a higher sequencing depth (600,000) than single-cell methods, it may still be more difficult to detect lowly expressed genes compared with bulk RNA-seq datasets. Further, our dataset lacks cell-type-specific information. However, this is likely of only minor relevance during early development, as it is known that all nuclei share the same cytoplasm until cellularization, and the previously reported single-cell sequencing dataset only identified three different types of cells (anlage in statu nascendi, aminoesra anlage, ectoderm anlage) during the first 4 h of development. 28 That said, we acknowledge that cell-specific expression patterns become more important at later stages in development, and, while our method does not allow for this kind of analysis, our approach remains much more accessible than single-cell sequencing of thousands of individual cells at different developmental time points. Our method also does not reveal information about the spatial distribution of transcripts. Transcript gradients, however, have been shown to be of particular importance during early development. Once changes in gene expression have been determined using our methods, these patterns could be further investigated using in situ hybridization to assess whether the spatial distribution of a particular transcript is altered.
Taken together, we believe our method is the most accessible, high-throughput, transcriptomic technology published to date to Cell Genomics 3, 100265, March 8, 2023 9 Technology ll OPEN ACCESS study early gene expression in Drosophila. We suggest that our methodology provides the optimal tool to investigate the transcriptional consequences of mutations in developmental genes, providing gene expression data at a depth and temporal resolution that was previously inaccessible.

Materials availability
This study did not generate new unique reagents.
Data and code availability d Single-embryo RNA-seq data from this study has been deposited at NCBI Gene Expression Omnibus (GEO). Accession numbers are listed in the key resources table. Normalized reads, gene details, and metadata for sex-specific analysis can be found in Tables S7, S8, and S9. d All original code is available in this paper's supplemental information. d Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Organisms Drosophila genetic reference panel (DGRP)-737 line from Bloomington Stock Center (#83729) was kept in incubators at 25 C with 60% humidity and a 12-h light-dark cycle. All flies were raised at constant densities on standardized cornmeal food (Bloomington recipe), Fly food M (LabExpress, Michigan, USA), and transferred into cages 1-2 days after eclosion.

METHOD DETAILS
Embryo collection 8-9 day old flies were put in embryo collection cages (Genesee Scientific, Cat #: 59-100). Food plates were changed and discarded twice before embryo collection started on. DGRP-737 line showed minimal egg laying (n = 0-2) in the first 30 min after changing the plates (data not shown), therefore, plates were changed every 90 min and processed immediately (0-1 h embryos) or incubated 1 or 2 more hours at 25 C for 1-2 h or 2-3 h timepoints respectively. Embryos were transferred into a pluriStrainer 150 mM cell strainer (pluriSelect, USA) and washed with tap water. Embryos were dechorionated by incubation in 3% sodium hypochlorite (PURE BRIGHT bleach, KIK international LLC) for 4 min, washed in 120 mM NaCl (Sigma-Aldrich, USA), 0.03% Triton X-100 (Thermo Fisher Scientific, USA) solution, and finally washed in ultrapure water (PURELAB Ultra, ELGA). For RNA-seq, single embryos were transferred into 2 mL pre-labelled (on side and top) screw-cap microtubes containing 0.2 g lysing matrix D beads (MP Biomedicals, USA) using a 20/0 liner brush (Royal & Langnickel, USA), snap-frozen on dry ice, and stored at À80 C. Embryos for qPCR experiments were fixed immediately.
RNA isolation from single embryos 500 mL TRIzol Reagent (Thermo Fisher Scientific, USA) and 50 mL Gibco 1x PBS (phosphate buffered saline) at pH 7.4 were added to microtubes with frozen single embryos. Samples were homogenized by bead-beating with 0.2 g lysing matrix D beads (MP Biomedicals, USA) at 6 m/s for 30 s using a FastPrep-24 instrument (MP Biomedicals, USA). RNA was then isolated following a miniaturized version of the manufacturer's instructions. Briefly, 100 mL chloroform (Sigma-Aldrich, USA) was added, samples mixed by vortex, incubated 2 min at room temperature (RT), and centrifuged for 15 min at 12,000 3 g at 4 C. The aqueous (upper) phase was transferred to a new pre-labelled 1.5 mL microtube, keep samples on ice. At this step, the organic phase was stored at À80 C for later DNA extraction. RNA was precipitated by adding 250 mL ice-cold isopropanol (Sigma-Aldrich, USA) and 2 mL GlycoBlue (Thermo Fisher Scientific, USA) as coprecipitant. If processing more than 14 samples, it is recommended to split samples into two groups after this step. Samples were mixed by hand, incubated for 10 min at RT, and centrifuged for 10 min at 12,000 3 g at 4 C. RNA pellets were washed with 1 mL 75% (v/v) ethanol (Pharmco, USA), dried, and stored at À80 C until further use.
Library preparation and RNA-seq RNA-seq was carried out following a miniaturized version of the sensitive highly-multiplexed single-cell RNA-seq (CEL-Seq2) protocol 20,22 with modifications.
Dried RNA was resuspended in 8 mL nuclease-free water (Thermo Fisher Scientific, USA) and a nanoliter-scale liquid handler was used to dispensed 120 nL of RNA sample into a 384-well plate holding 240 nL of primer-mix, ERCC RNA Spike-In Mix (Thermo Fisher Scientific, USA), dNTPs, CEL-Seq2 oligoT with different barcodes and unique molecular identifiers (UMI) and 1200 nL VaporLock (Qiagen, USA) to avoid evaporation. All pipetting steps referring to nL were carried out using a nano-scale automated liquid handler, and we have successfully used the i.DOT (CELLINK) and the mosquito LV (sptlabtech).