A single-parasite transcriptional landscape of asexual development in 1 Toxoplasma gondii 2

(150 Toxoplasma gondii , a protozoan parasite, undergoes a complex and poorly understood developmental process that is critical for completing its intricate life cycle, 15 including establishing a chronic infection in its intermediate hosts. Here, we applied 16 single-cell RNA-sequencing (scRNA-seq) to > 5,000 Toxoplasma at single-parasite 17 resolution in tachyzoite and bradyzoite stages using three widely studied strains. We 18 resolve the oscillatory nature of cell cycle progression in an asynchronized population of the type I strain, RH. Using scRNA-seq, we also construct a comprehensive atlas of 20 asexual development and cell-cycle in the Type II strains, Pru and ME49, revealing 21 hidden heterogeneity in the course of development and transcription factors associated 22 with each developmental state. Lastly, we combined projection scoring with noise analysis to show that the expression of a subset of parasite-specific genes, including ones that encode surface antigens, varies independently of measurement noise, cell 25 cycle, and asexual development. Overall, our results reveal an unprecedented and surprising level of heterogeneity in Toxoplasma gondii and provide a molecular resource for understanding protozoan parasite development.

with each developmental state. Lastly, we combined projection scoring with noise 23 analysis to show that the expression of a subset of parasite-specific genes, including 24 ones that encode surface antigens, varies independently of measurement noise, cell 25 cycle, and asexual development. Overall, our results reveal an unprecedented and 26 surprising level of heterogeneity in Toxoplasma gondii and provide a molecular resource 27 for understanding protozoan parasite development.
Main text (5000 words) 33 Introduction 34 Toxoplasma gondii is an intracellular protozoan parasite that is thought to infect 35 over a quarter of the world's population 1 . Like some of its Apicomplexan cousins, 36 Toxoplasma undergoes a complex developmental transition inside the host. In  6,7 . Bulk transcriptomic analyses of Toxoplasma gondii at 58 distinct asexual stages reveal genetic modules that are expressed in each stage 8-15 , 59 including AP2 transcription factors that are thought to play a role in differentiation 16,17 ; 60 however, transitioning parasites convert to the bradyzoite stage asynchronously and 61 display a high degree of heterogeneity along the developmental pathway and in gene 62 expression 18,19 . Furthermore, parasites within the same tissue cysts have been shown 63 to display heterogeneity in the expression of bradyzoite marker proteins 20 . The transition 64 of tachyzoites to the bradyzoite stage results in an overwhelming majority of mature 65 bradyzoites in the G1 phase of the cell cycle that divide slowly, if at all 21,22 . Furthermore, 66 tachyzoites exhibit slower growth kinetics immediately prior to the bradyzoite 67 transition 21,23 . This suggests that parasites exit the cell cycle to differentiate into 68 bradyzoites, a pattern consistent with developmental processes in several other 69 eukaryotic organisms 24,25 . Dissecting these cell cycle aspects of stage conversion 70 requires a more detailed analysis than has been possible with bulk measurement of 71 tachyzoite or bradyzoite populations, or with the use of genetically modified parasites 72 coupled with chemical synchronization of cell cycle progression 13,[26][27][28] . This is because 73 the latter approaches require large quantities of synchronized parasites and can 74 potentially introduce artificial perturbations. Furthermore, bulk measurement fails to 75 distinguish parasite-to-parasite variation that is independent of cell cycle or known 76 developmental processes, potentially missing the phenotypic diversity intrinsic to a 77 population of cells. 78 Single-cell RNA sequencing (scRNA-seq) offers a powerful and unbiased 79 approach to reveal the underlying heterogeneity in an asynchronous population of cells. 80 Droplet and FACS-based approaches have already been applied towards multicellular 81 parasites such as Schistosoma to reveal developmental changes within different 82 hosts 29 . Recently, scRNA-seq has revealed a surprising degree of heterogeneity in 83 another apicomplexan parasite, Plasmodium 30-32 . Analyses derived from these single-84 parasite measurements uncovered rare and critical transition events in parasite 85 development that were undetectable in bulk measurements. Combined with novel 86 analytical development and increases in measurement throughput, scRNA-seq is 87 becoming a widely adopted tool for resolving cellular changes in a quantitative and 88 system-wide fashion. 89 Here, we performed scRNA-seq to reconstruct transcriptional dynamics of 90 asynchronous Toxoplasma parasites in the course of cell cycle and asexual 91 development in vitro. We benchmarked the purity of isolation, as well as sensitivity and 92 accuracy of our measurements, demonstrating that this experimental approach can 93 isolate single parasites to resolve the transcriptional variation of biological processes. 94 We show that cell cycle status can be accurately inferred from the transcriptional 95 signatures of an asynchronous population of Type I (RH) tachyzoites at single-parasite 96 resolution. Using Type II strains (Pru and ME49) switching to bradyzoites under alkaline 97 induction, we resolved a comprehensive single-parasite atlas of asexual development 98 together with cell cycle state annotation, identifying transcription factors that are 99 associated with each developmental state and revealing previously hidden 100 heterogeneity in the parasite. Furthermore, we identify a class of highly variable genes, 101 including ones that encode surface antigens and dense granule effectors, which exhibit 102 parasite-to-parasite variability that cannot be explained by either measurement noise, 103 cell cycle, or asexual development. Our combined results suggest that this prevalent 104 protozoan parasite may exhibit much greater heterogeneity than previously appreciated.

106
Technical validation of single-parasite sorting and sequencing 107 There are more than a dozen approaches available for single-cell isolation and 108 transcriptome amplification. Based on benchmark comparisons, Smart-seq2 generally 109 has higher sensitivity than competing droplet-based approaches 33,34 . We reasoned that 110 sensitive measurement is crucial in our study given that single Toxoplasma gondii 111 parasites are at least 50-fold smaller in volume than a typical mammalian cell, and thus 112 the average parasite gene is likely expressed with much lower copy number per cell 113 than a typical mammalian gene. For our initial studies, we used the common Type I lab 114 strain of Toxoplasma, RH, grown in vitro in human foreskin fibroblasts (HFFs). Following 115 such growth, individual tachyzoites were released by passage through a narrow gauge 116 needle and then purified by fluorescence activated cell sorting (FACS) into 384-well 117 plates. We then synthesized, amplified, and barcoded cDNA using Smart-seq2 and 118 Illumina Nextera protocols. Reaction in 384-well plates effectively reduced the reagent 119 cost by four-fold compared to the 96-well format. The sequenced reads were 120 bioinformatically deconvolved and grouped into individual parasites for analysis using 121 modified bcl2fastq and custom python scripts (Materials and methods). A schematic to 122 illustrate our experimental workflow is shown in Figure 1. 123 To ensure that our workflow efficiently captures single Toxoplasma parasites, we 124 mixed equal numbers of two transgenic lines of RH, one expressing GFP and the other 125 expressing mCherry, and sorted individual parasites into a 384 well plate based on the 126 presence of either green or red signals without a filter for those that were both red and 127 green. After Smart-seq2 amplification, we quantified the expression of GFP and 128 mCherry mRNAs using quantitative polymerase chain reaction (qPCR). Across all 301 129 wells that we measured, we observed the presence of both GFP and mCherry mRNA in 130 only one well, indicating that the rate of doublet events is below 1% (  Type I parasites, while Pru and ME49 were aligned to the ME49 strain Type II genome   145 reference. Because many genes encoding Toxoplasma secretion factors and surface 146 proteins are evolutionary products of gene duplication events 35 , we expected high 147 sequence similarity amongst a substantial portion of the parasite genes. Thus, we 148 modified our gene counting pipeline to account for duplicated genes (Materials and 149 methods). A comparison of counting methods does not reveal significant differences in 150 the observed counts (Figure 1 -Supplementary Figure 1c). Further analysis reveals 151 that our modified pipeline recovered the detection of a few more parasite genes than 152 default parameters (Figure 1 -Supplementary Figure 1d). 153 To ensure that poorly amplified or sequenced parasites did not confound our   Table   250 3). We observe high discordance of pseudotime expression for several genes in each 251 annotated organelle sets, suggesting that the current Toxoplasma annotation may need 252 significant revision. Our scRNA-seq data provide an important resource to help identify 253 mis-annotated genes and infer putative functions of uncharacterized proteins.

255
Toxoplasma has one of the most complicated developmental programs of any 256 single-celled organism; however, it is unknown how synchronized the transition is 257 between developmental states. To address this, we assessed the inherent 258 heterogeneity within asexually developing Pru, a type II strain that is capable of forming 259 tissue cysts in vitro upon growth in alkaline conditions 18,41 . We applied scRNA-seq to 260 measure and analyze Pru parasites grown in HFFs as tachyzoites ("uninduced") and 261 after inducing the switch to bradyzoites by growth in alkaline media for 3, 5, and 7 days.  Figure 3b). 290 To determine the gene modules specific to a given cluster, we conducted 291 differential gene expression for each cluster (Figure 3c and Supplementary Table 4).

292
P1 cluster is correlated with the expression of bradyzoite-specific genes while P2-5 are 293 correlated with the expression of tachyzoite-specific or cell cycle-associated genes 294 (Figure 3d). In our scRNA-seq data, we also observe a small portion of BAG1 + 295 bradyzoites (7.1%) annotated as either S, M, or C states, indicating that they are 296 replicating (Figure 3 -Supplementary Figure 4d). Our data supports the notion that 297 bradyzoites can undergo cell cycle progression, as posited by a previous report 19 . 298 Interestingly, we observe a group of AP2 transcription factors that are differentially 299 expressed across different clusters, some of which are implicated in Toxoplasma 300 development (Figure 3e). In particular, we identify AP2Ib-1, AP2IX-1, AP2IX-6, and 301 AP2VI-2 as over-expressed in P1, suggesting their potential roles in the regulation of 302 . CC-BY-NC-ND 4.0 International license certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was not this version posted June 3, 2019. . https://doi.org/10.1101/656165 doi: bioRxiv preprint bradyzoite transition, while AP2-domain protein (TGME49_215895), AP2IX-9, 303 AP2VIIa-6, AP2XI-1, AP2IX-3, and AP2VIII-7 are highly expressed in P6, hinting at their 304 possible roles in defining this distinct cluster of parasites. 305 The most highly expressed genes in P6 include genes enriched in P2 as well as 306 bradyzoite-specific genes found in P1 (Figure 3c). To identify genes that are specifically 307 expressed in P6, we used Wilcoxon's test (Figure 3 -Supplementary Figure 6a (Figure 4a). 365 Applying this approach to our Pru data set, we first see that, as expected for an 366 asynchronous population at different cell cycle and developmental states, mRNA for 367 many genes has a low detection rate even though those genes have a mean 368 abundance across all cells that is above our threshold for detection (Figure 4b) 369 (Materials and methods). Comparison between RH and Pru demonstrates congruence 370 of many "variant" genes: 213 shared genes are more variable than the ERCC spike-ins 371 in both datasets (Figure 4 -Supplementary Figure 8a). Some degree of disagreement 372 between the two datasets is expected as the Pru data include differentiating parasites 373 while the RH data do not. Starting from the list of "variant" genes that we identified in The copyright holder for this preprint (which was not this version posted June 3, 2019. . https://doi.org/10.1101/656165 doi: bioRxiv preprint on PCA and UMAP projections are highly correlated (bottom panel in Figure 4c); 382 however, projection scoring identifies a subset of genes whose variation depends 383 exclusively on asexual development, but not on cell cycle, including ones that we 384 identified previously as enriched in bradyzoites (Supplementary Table 4). This shows 385 that projection scoring can be used to discover genes that may differ in regulation 386 across different dimensions of intrinsic biological variability. 387 To determine the variation dependence on cell cycle and asexual development in 388 Pru, we quantified projection scores for organelle gene sets of "variant" genes and 389 ERCC spike-ins (top panel in Figure 4d) genes. Interestingly, amongst these projection-independent genes, we found the 472 expression of several SRS surface antigen genes, which are known to play a role in 473 host attachment, and dense granule genes, which are known to play a role in 474 intracellular interaction with host, to be highly variable between cells of similar 475 developmental states. We also find other non-parasite-specific genes, including genes 476 encoding metabolic enzymes, to be highly projection-independent. While we cannot 477 exclude the possibility that variation in these cells is due to stochastic bursts of 478 transcription from these genes, especially given the small size of Toxoplasma, it is To filter out cells with poor amplification or sequencing reaction and doublet cells, 596 we discarded cells based on gene counts (>0 reads), total reads sum, percent reads 597 mapped to Toxoplasma genome, percent ERCC reads, and percent ribosomal RNA 598 reads. Next, we filtered "ribosomal RNA" genes from the gene count matrix. Gene count 599 matrices are normalized as counts per median (CPM): where X is the gene count matrix, sum(X) is the read sum for each cell, and 604 median(sum(X)) is the median of read sums. Normalized data are added with a 605 pseudocount of 1 and log transformed (e.g. log2(Xnorm+1)). To determine the detection 606 limit (e.g. 50% detection rate), we modeled the detection probability of ERCC standards 607 with a logistic regression as a function of spike-in amount 33 .

609
. CC-BY-NC-ND 4.0 International license certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was not this version posted June 3, 2019. . https://doi.org/10.1101/656165 doi: bioRxiv preprint We calculated an estimate of absolute molecular abundance for all genes by fitting a 610 linear regression to ERCC spike-ins: 611 612 2 ( ) = ⋅ 2 ( + 1) + (2) 613 614 where Xnorm, ERCC>0.5 is the observed CPM value for ERCC spike-ins above the detection 615 limit, Y is the amount of ERCC spike-in, m is the regression coefficient, and b is the 616 intercept. To reduce the influence of measurement noise, we fit the model only to ERCC 617 spike-ins with mean expression above the detection limit.

619
Cell cycle analysis and annotation 620 To determine the transcriptional variation associated with cell cycle, we applied Self-     The copyright holder for this preprint (which was not this version posted June 3, 2019. . https://doi.org/10.1101/656165 doi: bioRxiv preprint