Single-nucleus atlas of the Artemia female reproductive system suggests germline repression of the Z chromosome

Our understanding of the molecular pathways that regulate oogenesis and define cellular identity in the Arthropod female reproductive system and the extent of their conservation is currently very limited. This is due to the focus on model systems, including Drosophila and Daphnia, which do not reflect the observed diversity of morphologies, reproductive modes, and sex chromosome systems. We use single-nucleus RNA and ATAC sequencing to produce a comprehensive single nucleus atlas of the adult Artemia franciscana female reproductive system. We map our data to the Fly Cell Atlas single-nucleus dataset of the Drosophila melanogaster ovary, shedding light on the conserved regulatory programs between the two distantly related Arthropod species. We identify the major cell types known to be present in the Artemia ovary, including germ cells, follicle cells, and ovarian muscle cells. Additionally, we use the germ cells to explore gene regulation and expression of the Z chromosome during meiosis, highlighting its unique regulatory dynamics and allowing us to explore the presence of meiotic sex chromosome silencing in this group.

mainly of studies in Drosophila and, to a lesser extent, Lepidoptera and Daphnia.The only part of the manuscript that I found difficult to understand was the section "Downregulation of the Z chromosome in the germline", which I will comment on below.
Thank you for your enthusiasm and suggestions for improvements, which we implement below.
1.I am confused by the classification scheme presented in lines 295-300.After going over the 3 bullet points multiple times, I think there are several mistakes: a) The first bullet point uses "log2(S0/Auto)", while the others use "S0/Auto".I assume it should be log2 in all cases, since some of the values given are negative.b) In bullet point 2, the "and" statement doesn't make sense, because both percentile cutoffs are based on S0/Auto.Should the second part be "somatic/2".c) Also in bullet point 2, should the first value be "<= -0.91", which would match the value in the first bullet point?Thank you for the comment.As you pointed out, the classification scheme had some typos, which we have now fixed.We have also adopted your suggestion to use the arbitrary criteria for the main text, and we have moved the figure with the percentiles' approach to the supplementary material and added the description of the approach in the methods section.We have also generated a separate figure (Supplementary figure 14) for the bootstrapping procedure (the non-overlapping autosomal windows approach that we had originally used).
We added the following to results section:

"In order to check if any cells have expression patterns consistent with additional repression of the S0, we classified all cells in each cluster into bins of decreasing S0:A expression ratio that should reflect the presence of dosage compensation and/or repression of the S0 (S0/Auto ratio for all cells is adjusted by adding 1-median(S0/Auto of somatic clusters)):
• Complete or partial dosage compensation: S0/Auto > 0.66 • Lack of dosage compensation: S0/Auto <= 0.66 and S0/Auto > 0.33.
• Repressed: S0/Auto <= 0.33." We have also added this sentence to the results: "As an additional measure, we used percentile-based cutoffs to control for the heterogeneity of Z chromosome regulation status caused by noise, and we recovered the same enrichment patterns (See methods section "Z-chromosome regulation status using percentile-based cutoffs" and Supplementary figure 15)." And the following section to the methods:

"Z-chromosome regulation status using percentile-based cutoffs
As the within cluster variation in the status of Z-chromosome expression is possibly driven by noise, we implemented more conservative thresholds that apply a 5% false positive rate to the first category in each comparison to provide a noise-sensitive estimate of the cluster-specific enrichments (Supplementary Figure 15): • Complete or Partial dosage compensation: S0/Auto > 5th percentile of S0/Auto in somatic clusters (>0.57).
• Repressed: log2(S0/Auto) <= (5th percentile of S0/Auto)in somatic clusters)/2 (<=0.28)." 2. If I now understand the classification scheme as corrected above, it is rather conservative in classifying "Lack of DC" relative to "Complete DC", and "Repressed" relative to "Lack of DC".This is because the authors use an approach that essentially applies a 5% false positive rate to the first category in each comparison.I understand that this is a conservative approach and should not have a large effect on the statistical analysis that compares proportions between cell types with a chi-square test (although might reduce power in cases where there are a small number of cells in a category).However, it may give misleading results in terms of the percentage of cells in each category as given in lines 303-307.I don't think the authors need to change the way the results are presented in the text/figures, but they should acknowledge in the text that their approach will underestimate the percentage of genes in the "Lack of DC" and "Repressed" categories.Maybe they could add a supplemental figure similar to figure 5, but using more relaxed (albeit arbitrary) definitions that don't rely on outliers to an empirical distribution.For example (on a non-log2 scale), Complete DC: S0/Auto > 0.66 Lack of DC: S0/Auto < 0.66 and S0/Auto > 0.33 Repressed: S0/Auto < 0.33 We have now used the suggested thresholds in Figure 5. Below are the differences between the percentiles approach and the arbitrary approach.We do see that the percentiles approach underestimates the percentage of repressed cells and cells that lack dosage compensation.However, in both cases, the enrichment of germ cells B for cells lacking dosage compensated and repressed cells remains significant.3. The above classification is also confusing, because the authors consider a log2(S0/Auto) > -0.91 as "complete" dosage compensation.However, at the low end this means the X has only slightly higher than half the expression of the autosomes.It might be better to consider this "Complete or partial DC".
We agree that "Complete or partial DC" is a better description, and we now use that throughout the paper.
Minor points line 20: "Melanogaster" should not be capitalized Thank you for the comment.We fixed this in the text.
line 87: I don't think "Meiotic" needs to be capitalized here.Similarly, "Meiosis" does not need to be capitalized in line 181.
We fixed both in the text.
line 198: here "drosophila" should be capitalized and in italics Fixed in the text.
line 289: "The median log2 of the expression ratio in Germ cells B is around -1, which is overall more consistent with lack of dosage compensation than with true Z inactivation".This observation does match the expectation based on gene dose, however, previous studies in Drosophila and birds have observed that the ratio is not as extreme in the absence of DC.It appears that the X (or Z)/Auto is closer to -0.6 (e.g. ).This would be expected if some genes had gene-specific regulatory mechanisms that balanced their expression.Many genes are part of regulatory networks that have feedback loops to maintain appropriate expression levels.Thus, I find it a bit surprising that Artemia seems to lack this type of regulation and expression appears to be almost entirely controlled by gene dose?If one assumes an expected value of -0.6 for lack of DC, then the observation of -1 might suggest a combination of lack of DC and sex chromosome repression.
We agree with your remarks, and we have added the following sentences to the discussion: "Additionally, lack of dosage compensation in other species seems to result in less than 2 fold decrease in the expression of X/Z-linked genes (~1.5 in the Drosophila testes and ~1.6 in the chicken gonads) (1)(2)(3)(4).In our analysis, the distribution of S0/Autosome ratios per cell in the germ cells B cluster is centered at 0.

(2-fold decrease in expression). If one assumes an expected value of 0.66 for lack of DC, then the observation of 0.might suggest a combination of lack of DC and sex chromosome repression."
Reviewer #2: Overview: Elkrewi and Vicoso generated the snRNA-seq and snATAC-seq for Artemia ovaries.Uniquely, Artemia has the ZW sex chromosome system which provides the opportunity to determine whether there is MSCI of the Z.Despite little being known about the regulatory programs of Artemia germline, the authors used a myriad of cutting edge SC methods to establish cell types based on similarities to the Drosophila ovaries, including two germ cell populations (A and B).
While the expression genes in the S0 region (differentiated and presumably non-recombining)of the Z decreases (relative to autosomes) in the germ cell populations, the reduction is within the range of loss of dosage compensation instead of MSCI.Interestingly, the authors found a reduced accessibility of the PAR, despite DC being unnecessary, suggesting a chromosome-wide silencing/downregulation mechanism consistent with MSCI. Comments: As the authors described, the status of MSCI has been highly contested in several organisms (flies and chicken), despite being clear in others (C.elegans and eutherian mammals).The ZW sex chromosome system of Artemia is a great system to examine this enigmatic germline regulation.The absence of well understood expression of ovarian genes makes cell cluster identity classification particularly challenging.But the authors used several sophisticated approaches to map the cell clusters to the melanogaster ovarian atlas, revealing that despite the great evolutionary distance there is enough similarity in gene expression programs of cell types to enable cell type assignment.I really appreciate the lengths to which the authors took to ensure proper inference of cell types.My primary concerns pertain to how X:A ratio is inferred and how to interpret reduced expression in relation to the dosage compensation status and how accessibility on the Z is interpreted, especially across stratas.
Thank you for the helpful comments and suggestions.As you will see below, we have improved the inference of the Z:A ratio, and clarified the interpretation of the results.
Primary concerns: I have a hard time understanding how the authors estimated the Z:A ratio using autosomal "bins".For each cell they first averaged across the 446 genes in the S0 region.But I do not understand how the denominator (autosomal gene expression) is generated in "bins with the same number of genes across the whole genome".Since comparisons of Z:A ratio is a key part of the study, the authors must provide a better explanation of their procedure.In addition, what would the distributions look like with a simple ratio of S0 median to autosomal median?
Thank you for the comment.We are now simply plotting the distribution of mean(S0)/mean(All autosomal genes) in the main figure.We use the mean, as medians with single cell data tend to be 0s due to the high dropout rate.We moved the binning approach (which we use essentially as a bootstrapping procedure) to generate an additional measure of confidence for our S0/Auto cluster medians (Supplementary Figure 14).For that approach, we divided each chromosome to non-overlapping sliding windows with the same number of genes as the S0 (446 genes), instead of random sampling of 446 autosomal genes, and we repeated the estimates of S0/Auto for all the windows.The procedure is now described in the methods section:

"Estimation of S0/Autosomal ratio using non-overlapping autosomal windows
To ensure that our S0/Autosomal expression estimates are not affected by the low expression throughout the genome, as is the case for some germline cells (see results section 3), we divided the genome into 48 non-overlapping sliding windows with the same number of genes as the S0 (446 genes).We reasoned that regions of similar gene counts as the S0 are as susceptible to the low detection rates and non-biological zeros that affect single-cell RNAseq data, and can therefore be used to ensure the overall patterns are not technical artifacts.In supplementary figure 14, we show the distribution of the per cluster medians of S0/Autosomal window for all the 48 windows." Supplementary Figure 14: Cluster medians of mean(S0)/mean(window) per cell) estimated using 48 non-overlapping genomic windows with the same number of genes as the S0.
Please note that these different approaches to analyze the data give qualitatively similar results, supporting the robustness of our conclusions.
The authors divided the germline cells into those showing complete dosage compensation (of S0), lack of dosage compensation, and repressed.Cells are considered to have "complete dosage compensation" if their log2(S0/Auto) is >-0.91.This equates to a lower-bound ratio of 0.532 which is typically considered no dosage compensation.The lack of dosage compensation is categorized by the cutoff of "<=0.91 and >-1.91" which equates to upperbound and lowerbound of 1.879 and 0.266, respectively which is nonsensical (Did the authors mean <= -0.91).The repressed category has a cutoff of <= -1.91 equating to a lower-bound ratio 0.266.I am not sure if there are typos, but these expression range classifiers are strange.There are also instances of switching from log scale to linear scale.E.g. "S0/Auto in somatic clusters -1", do they mean log(S0/Auto) -1?
Thanks for the comment and we apologize for the confusion caused by the typos.As you pointed out, our original thresholds were: • The lack of dosage compensation cutoff was log2(S0/Auto) <=-0.91 and >-1.91, which would be S0/Auto <=0.53 and >0.266.• The repressed category cutoff was log2(S0/Auto) <= -1.91 or S0/Auto <0.266.
Based on the comments of reviewer 1 and yours (see also our response to them above), for the main text (Figure 5), we now use: • The lack of dosage compensation cutoff is S0/Auto <=0.66 and >0.33.
For the Supplementary Figure 15, we used the more conservative percentile based cutoffs: • The lack of dosage compensation cutoff is S0/Auto <=0.57and >0.28.• The repressed category cutoff is S0/Auto <0.28.
While specific percentages necessarily change, this does not affect our conclusions (Germ cells A seem to lack dosage compensation, Germ cells B lack dosage compensation but a subset also seems to show stronger repression of the Z).
Odd cutoffs aside, classification of cell types like this implies that cells within the same cluster can have different Z chromosome regulation status.While this could be the case, alternatively, the cell-by-cell variation in Z:A ratio could just reflect a noisy distribution (especially when overall readcount/expression is low as is the case for germ cells B) and the median/mean is closer to the true state of the cell population.A diagnostic way to show that there are indeed different population of cells (some with DC, some no DC, and some repressed/inactivated) in germ cells B would be a clear multi-modal distribution.A unimodal distribution would suggest that noise is driving the spread and cell type classification.
We agree with your concern, which is why we used the percentile-based cutoffs, i.e. we assume that the heterogeneity of Z chromosome regulation status in the non-germline clusters is caused by noise and therefore can be used as the baseline for comparison with the germline cells.But we agree that it is possible that we observe cells with different Z expression status because of a broad distribution rather than because of an actual biological difference (such as the presence of nurse cells and oocytes in the germline, or germline cells at different timepoints in their developmental trajectory).We have plotted the distributions of mean S0/autosomes for different cell clusters as you suggested.Germ cells A have a median of 0.77 with a shoulder towards lower values, which suggests that some cells are compensated and some are not.On the other hand, the whole distribution of germ cells B is shifted (median = 0.53), suggesting a complete absence of dosage compensation.Furthermore, there is a more prominent bump on the left side (a similar bump is seen in germ cells A, but to a lesser extent), which might suggest that the repressed cells are indeed a distinct group of cells.While we want to be careful with our interpretations as it is tricky to characterize what is actually happening at a stage of overall reduction in transcriptional activity, these distributions are generally in line with our conclusions.In addition, it is also worth noting, if the Z-repressed cells indeed represent a distinct group of cells, the number of cells undergoing MSCI (or more accurately MSCRepression) appear to be a very small population of the germ cells.
We agree, and we now clearly say that it is a small subset of cells:

"While these results point to a small subset of germline cells (less than 20% of the cells in the cluster) showing at least partial Z-inactivation, it is notoriously difficult to fully exclude that absence of dosage compensation, along with sparse and noisy data, could create such a pattern."
The authors wisely looked at the PAR and younger stratas on the Y to infer MSCI.Loss of dosage compensation and MSCI should manifest in different patterns.The significant decrease of PAR expression is perhaps the most convincing evidence of repression to me, since it doesn't need dosage compensation.As such, I would recommend the authors display Figure S12 as a main figure (and perhaps a schematic breakdown of the Z chromosome architecture based on their earlier study).However, there are other oddities that make it hard to interpret the PAR expression, including significantly reduced expression across many somatic cells, with prefolicle being significant.What would explain the unexpected downregulation in a somatic cell type?How many genes are on the PAR? Thank you for the remark.The pseudoautosomal region in our annotation has around 1378 genes.We have added the number of genes in the different regions to Supplementary Figure 16 and a schematic of the Z-chromosome.As the figures in the manuscript are quite dense already, we decided to leave this one in the supplementary, but we would be happy to add a panel to Figure 5 if you think that is more suitable.
We have made an additional figure, where the PAR/Auto values were not adjusted by the median of somatic clusters, and where the different clusters were split by their Z expression classification.Review Figure 4 (below) shows that the distribution of PAR/Auto values for the majority of clusters is centered at around 1 (median PAR/Auto is 1.1 for Follicle cells, 0.987 for Tracheal cells, 1.076 for Prefollicle cells).Muscle cells seem to have higher expression of the PAR region compared to the other cell types (median PAR/Auto =1.457), which is why even prefollicle cells are significant when muscle cells are included in the somatic controls.What is worth noting is that even the repressed cells in the non-germline clusters have median(PAR/Auto per cell) around 1 (with the exception of the repressed category in Tracheal cells, which has 6 cells only).This suggests that their classification is likely due to noise and not to an actual reduction of the expression of their Z chromosome.The germ cell clusters, on the other hand, seem to have PAR/Auto values consistent with their S0/Auto expression.Despite all of that, we do agree that prefollicle cells seem to show many similarities with the germline cells, which is consistent with their assignment to group 1 based the dendrogram and heatmap of the correlation matrix of the mean expression values per cluster (Supplementary Figure 5).We mention this in the results (L183-189) and (L416-419) in the discussion.Other concerns/questions/comments: In the Drosophila testes, Y-linked genes are upregulated in the meiotic stages.This is particularly striking on a recently formed neo-Y chromosome and is thought to result from the masculinization of the male-specific chromosome (perhaps due to sexual conflict).In other testes cell types Y-linked genes are typically lowly expressed.However, the authors observed downregulation of the W in the meiotic stages instead and "dosage-compensated" expression in several of the somatic tissues.This is very counter-intuitive.How many genes are found on the W, and do the authors have an explanation for why they appear dosage compensated and become down regulated?
In mammals, both the X and the Y are repressed during meiosis, but as you point out this is not the case in Drosophila, so it is not completely straightforward to set expectations for the Artemia W. As all cell types have one W chromosome in the female somatic tissues, there is no reason for the W to not have the same overall expression levels across the different cell types.Additionally, as the W is only in females, there is no dosage compensation per say, and the W genes are individually regulated to their optimal level (which may differ across cell types).
We have included 107 W-genes in this analysis (Review Figure 5), based on the W scaffolds described in (5).The majority of those genes seem to be lowly expressed overall.However, some of them seem to have germline specific expression patterns, consistent with a minor feminization of the W. Thank you for the comment, we have replaced "in the male germline" with "in spermatocytes" in the following sentence: "In the case of fruit flies, a recent study showed that the X chromosome of Drosophila melanogaster is not enriched for silencing marks in spermatocytes, suggesting the absence of MSCI ( 6)" In the other instance, we do mention the specific cell types, where the absence of dosage compensation has been reported: "In Drosophila males, the lack of dosage compensation manifests in the absence of X chromosome upregulation in primordial germ cells, spermatocytes and spermatids (4,7,8)" Additionally, we now added that we mainly observe an enrichment in cells lacking DC in the meiotic and post-meiotic cells types in the Drosophila testes dataset: "We used the same approach and thresholds to check whether we see a similar pattern in the Drosophila testes dataset (9)

, and we only observe an enrichment in cells lacking dosage compensation in some of the germline clusters (mainly in the meiotic and post-meiotic cell types), but no cells show extreme repression of the X chromosome (Supplementary figure 20)."
In Line 74-75.The description of dosage compensation in the female germline of mammals should be better described than merely "disappear/reverse".It is "reconfigured" and rather complex, including a period of hyper-transcription and then dosage balance via expression of both Xs.
Thank you for bringing this to our attention, we have now rephrased the sentence as follows: "This sex chromosome-specific epigenetic regulation seems to disappear/reverse in the germline cells of Drosophila males (7,8) and undergo extensive reconfiguration in mammalian females, with a period of hyper-transcription before reaching dosage balance (X:A ~1) (10,11)." The authors should consider discussing the results from a recent publication investigating germline dosage compensation loss vs MSCI in a Drosophila species with neo-sex chromosomes.
We have now added a sentence discussing the findings from this paper: "In the case of fruit flies, a recent study showed that the X chromosome of Drosophila melanogaster is not enriched for silencing marks in spermatocytes, suggesting the absence of MSCI (6).Another study used single-cell RNAseq in Drosophila miranda, which has the ancestral Drosophila X (Muller element A), along with two younger X-linked chromosome arms (Muller AD and Muller C), found that all three X chromosomes have expression patterns consistent with a lack of dosage compensation in late spermatocytes and spermatids (12)." The cell grouping (1,2, and 3) is flipped in Figure 5B.
Thank you for pointing this out.We have now changed the groups in the Figure .Line 259-260, the authors state "RNA counts decrease across the germline pseudotime".However Figure 4E shows an increase in expression at the end of the pseudotime.
Yes, we agree that we should have commented on the increase.We added the following sentence to the main text: "The RNA counts seem to rise towards the end the germline pseudotime, which could be a sign of transient transcriptional reactivation, similar to what happens in Drosophila oocytes during oogenesis, between stages 9 and 11 (prophase I arrest ends at stage 13) (13).However, as the ATAC counts do not show a similar pattern, the pattern may not be biological." In Figure 6 What are red dotted lines supposed to represent?
We moved the red line to -1, where it signifies a two fold decrease in the number of fragments, and we added that information to the legend.Reviewer #3: In this manuscript, the authors performed single nucleus RNA-seq and ATAC-seq of the Artemia female reproductive system.This study presented a comprehensive single nucleus atlas of the adult Artemia franciscana female reproductive system.For example, the study identified different cell clusters, uncovered the gene expression dynamics across different cells, and shed light on the status of MSCI in the ZW system.Together with flycellatlas data, the authors also showed the conservation and divergence of female reproductive system between Artemia and Drosophila.However, I have some concerns before I would recommend the manuscript to be published.
Thank you for the support and for bringing up several aspects of the data that we have now explored in more detail.
1.There are a decent number of cells in the identified Germ cells B that can map to Artefact cells in flycellatlas data.Including these cells might be problematic, as these cells, according to the flycellatlas paper, expressed nearly all genes.The authors should clarify if the Artefact cells were included in subsequent analysis.This is a very good point.First, we should note that our own data shows no evidence of including artifacts, and we map whole clusters across species, not individual cells.Therefore even if a subset of cells is marked as artifact in Drosophila, this should not invalidate the classification of our own clusters.
That said, we have actually been puzzled by the cells labeled artefact in the Drosophila ovaries, as they cluster with the germline cells and show highly similar expression patterns to these cells (Supplementary figure 14 for a correlation analysis of the original Drosophila data).In the main text of the fly cell atlas paper, they mention the artefact cells in the context of the dataset integrating all tissues, as a cluster where they see the expression of three incompatible markers (grh for epithelial cells, Mhc for muscle cells, and Syt1 for neurons).We do not see any expression of those markers in the artefact cells in the ovary (Review Figure 6).
Below you can find the figure generated using the arbitrary approach (now Figure5B and C): Review figure 1: A) The mean(S0)/mean(Autosomes) expression per cell estimated using the normalized counts matrix.The stars show the significance values for group 3 and somatic clusters comparisons (wilcoxon rank-sum test).C) The percentage of cells that are dosage compensated (DC), lack dosage compensation (Lack of DC) and repressed (Repression).The stars show the significance values comparing group 3 clusters and somatic clusters (%Lack dosage compensation vs rest, and %Repression vs rest using Chi-square contingency test).For the stars, *** denotes p-value <= 0.001, ** denotes p-value <= 0.01, and * denotes p-value <=0.05.. Below you can find the figure generated using the percentiles approach (Panel B is now Supplementary Figure 15): Review figure 2: A) The mean(S0)/mean(Autosomes) expression per cell estimated using the normalized counts matrix.The stars show the significance values for group 3 and somatic clusters comparisons (wilcoxon rank-sum test).C) The percentage of cells that are dosage compensated (DC), lack dosage compensation (Lack of DC) and repressed (Repression) using percentile-based cutoffs.The stars show the significance values comparing group 1 clusters and somatic clusters (%Lack dosage compensation vs rest, and %Repression vs rest using Chi-square contingency test).For the stars, *** denotes p-value <= 0.001, ** denotes p-value <= 0.01, and * denotes p-value <=0.05.

Review figure 3 :
Distribution of mean(S0)/mean(Auto) values per cell in the different clusters.Dashed-lines are at 1.

Figure 6 :
Figure 6: Decreased Z-chromosome accessibility in the germline clusters.A) log2(fragments) for autosomal, PAR, and S0 in windows of 500,000 bp estimated from pseudo-bulks of all nuclei within a cluster adjusted by subtracting the median(log2(autosomal windows)) of each cluster.The stars show the significance values for the comparisons between group 3 clusters and somatic clusters (wilcoxon rank-sum test).B) and C) log2(fragments) for autosomal, PAR, and S0 in windows of 500,000 in Germ cells A and B split into pseudo-bulks based on the expression zone of the Z chromosome adjusted by subtracting the median(log2(autosomal windows)) of each cluster.The stars show the significance values for comparisons between the different categories (wilcoxon rank-sum test).For the stars, *** denotes p-value <= 0.001, ** denotes p-value <= 0.01, and * denotes p-value <=0.05.The red line is at -1, where it signifies a two fold decrease in the number of fragments.

Review figure 5 :
Expression patterns of the W genes on the putative W-scaffolds from (5).At several points in the manuscript, the authors state/imply there is no dosage compensation in the Drosophila male germline and cite several papers such as Witt et al 2021.Recent genomic and scRNA-seq studies have pretty clearly demonstrated the presence of DC during at least the early stages of the male germline, including Witt et al 2021, Mahadevaraju 2021, Anderson et al 2023, Wei et al 2024, and others.

table 1 :
Percentage of cells in the different categories of Z chromosomes expression using the two different approaches: arbitrary and percentiles approach.