In vitro cellular reprogramming to model gonad development and its disorders

During embryonic development, mutually antagonistic signaling cascades determine gonadal fate toward a testicular or ovarian identity. Errors in this process result in disorders of sex development (DSDs), characterized by discordance between chromosomal, gonadal, and anatomical sex. The absence of an appropriate, accessible in vitro system is a major obstacle in understanding mechanisms of sex-determination/DSDs. Here, we describe protocols for differentiation of mouse and human pluripotent cells toward gonadal progenitors. Transcriptomic analysis reveals that the in vitro–derived murine gonadal cells are equivalent to embryonic day 11.5 in vivo progenitors. Using similar conditions, Sertoli-like cells derived from 46,XY human induced pluripotent stem cells (hiPSCs) exhibit sustained expression of testis-specific genes, secrete anti-Müllerian hormone, migrate, and form tubular structures. Cells derived from 46,XY DSD female hiPSCs, carrying an NR5A1 variant, show aberrant gene expression and absence of tubule formation. CRISPR-Cas9–mediated variant correction rescued the phenotype. This is a robust tool to understand mechanisms of sex determination and model DSDs.

Following 3.5 days of IM differentiation, each 6 well was split into 2 new wells and viruses produced in IM media were added onto the newly seeded cells. Viruses were left on the cells for 24 hr after which media was changed to "Sertoli media" supplemented with 2 µg/ml Doxycycline (D9891, Sigma).

Flow cytometry of mouse differentiated CFP positive cells
CFP expressing early gonadal progenitors were FACS sorted 4 days following Dox addition. Cells were prepared into single cell suspension using trypsinization and filtered via a 30 µm filter (CellTrics, 04-004-2326). Sorting was performed on the BD FACSARIA TM III cell sorter using a 450 laser. CFP negative IM3 cell sample was used to set up the gating.

RNA isolation, cDNA preparation and Quantitative Real-Time Polymerase chain reaction (qRT-PCR) for mouse differentiated cells
Total RNA was extracted using Tri Reagent (Sigma, T9424) according to the manufacturer's protocol. RNA yield was quantified using a NanoDrop spectrophotometer (NanoDrop Technologies), and 1500 ng RNA was treated with RQ1 DNase (Promega) and used to synthesize cDNA using the SuperScript™ III Reverse Transcriptase kit (Invitrogen). qRT-PCR reactions were performed in duplicate using SYBR Green PCR master mix (Invitrogen) and 150 nM each of forward and reverse primers and analysed on the Applied Biosystems 7500 Real-Time PCR System (Thermo Fischer Scientific). Primers are listed in Supp Table S2. Relative mRNA levels were determined by calculating 2 -ΔΔCt values relative to the normalizer gene Pbgd. Relative gene expression is presented as the mean 2 -ΔΔCt values (error bars are SEM of the 2 -ΔΔCt ) for triple biological repeats. Statistical analysis was performed using one-way ANOVA on the 2 -ΔΔCt values using the Prism 9 software (GraphPad) (* < 0.05, ** < 0.01, *** < 0.001, **** < 0.0001).

RNA-Sequencing
Total RNA was extracted Tri Reagent as described above . Triplicate of RNA samples of ESC,   EpiSC-like cells, M-like cells, IM-like cells (IM1-5) and CFP sorted Sertoli-like cells were used for RNA-Seq. RNA quality and quantity was analysed on the Qubit (Thermo Fischer Scientific) and Tapestation (Agilent). Libraries were prepared using the TrueSeq Library Prep kit V2 (Illumina) according to the manufacturer's instructions. Sequencing was performed on the Illumina HiSeq 4000 system (Paired end, 75 bp).

Bioinformatic analysis
The mouse ESC differentiation RNA-Seq data has been deposited to the GEO under the accession number: GSE165133.

Data exploration and differential expression
Gene-level RSEM estimated counts and average transcripts lengths were imported into R 3.6.0 using the tximport function from the tximport package (76). These were used to create a DESeqDataSet object for further analysis using the Bioconductor package DESeq2 (72). Data were normalised for differing library size using DESeq2's default method, leveraging the average transcript length information. Differential gene expression analysis between replicate groups was assessed using the default Wald test. Genes were called significant it they passed a combined filter of i) FDR<=0.01, ii) fold change >= +/-2, iii) base-mean >= 100 from the Wald test results and iv) a mean normalised read count of >= 100 in at least one of the tested replicate groups.
Principal Components Analysis (PCA) was used to assess the relationship between gene expression across samples using the PCAtools package's "pca" function (center=TRUE, scale=FALSE, removeVar=0.9). Data were first variance stabilised using DESeq2's "vst" function, which is roughly similar to putting the data on the log2 scale, while also dealing with the sampling variability of low counts. Only the top 10% most variant genes across selected samples were used to generate the visualisations.
While majority of the samples were generated in-vitro there are a number that were generated invivo, specifically: E10.5, E11.5, E12.5, E13.5 gonads and E15.5 sorted Sertoli cells. This difference in protocol appeared to dominate the first principal component (PC1, 47.47% variation) when combining both in-vivo and in-vitro data. A bi-plot of PC2 and PC3 better reflected the expected biology.
The in-vitro/in-vivo protocol specific effect was modelled and removed from the variance stabilised data using the limma package's "removeBatchEffect" function for the purposes of visualisation of combined in-vitro and in-vivo samples only. PCA analysis of the corrected data showed that PC1 was analogous to PC2 of the uncorrected data.
Sample similarity was assessed using a Poisson dissimilarity matrix constructed from the uncorrected normalised counts of all samples using the "PoissonDistance" function from the PoiClaClu package. All genes with a count greater than 0 in at least a single sample were included in the generation of the matrix unless otherwise stated.
Heatmaps were generated using the variance stabilised data. Data were additionally scaled per gene using a z-score to aid visualisation. Columns (samples) and rows (genes) were each hierarchically clustered using a "complete" clustering method on a set of Euclidean distances.

Exploration of the genes driving the change using differential expression analysis
Differential expression analysis between the in vitro reprogramming cells were performed using DEseq2 with the "LRT" test on library size normalised read counts. Genes presenting an adjusted p-value < 0.05 were plotted as a heatmap using the Pretty Heatmap R package and clustered into 15 profiles using "ward.D2" algorithm (P1-15, Supplementary Data S2-3). Genes with roughly similar expression profiles were merged and subjected to a GO term enrichment analysis using ClusterProfiler (Biological processes). GO term similarity were reduced using the similarity function from ClusterProfiler to remove redundant information (Supplementary Data S4-6).

Comparison to scRNA-seq gonadal data
Single-cell RNA-seq data from Stévant et al. (15) (GSE97519) were mapped on the mouse reference genome (GRCm38) using GemTools, duplicated reads and non-uniquely mapped reads were discarded with Samtools, and gene expression was assessed using an in-house pipeline as and operate gradual transcriptomic changes that restrict their fate to steroidogenic precursors that eventually give rise to fetal Leydig cells (C5).

Generation of pseudobulks
To be able to compare bulk RNA-seq with the single-cell RNA-seq data from Stévant et al. (15), we generated pseudobulks per cell types by summing the read count per genes for each of the six cell clusters identified in the original study. For each dataset, we selected the protein coding genes from the read count per gene matrices. For each comparison, the compared dataset read counts were normalised using DEseq2 size factor. To compare the samples, we computed pairwise Spearman correlations and generated a heatmap using Pretty Heatmap R package (default clustering parameters). Because the single-cell data show a strong batch effect comparing to bulk RNA-seq, we thought to repeat our analysis on the genes constituting the cell type identity rather than the whole transcriptome. For that, we extracted the genes that are over-expressed in each of the cell clusters present in the Stévant et al. (15) study (C1-C6) (Supplementary Data S7) with a strict threshold of adjusted p-value < 0.001. We selected these genes in each of the RNA-seq datasets and re-ran the Spearman pairwise correlation analyses.

Immunofluorescence staining for mouse differentiated cells
Immuno staining was performed onto glass chamber slides. Cells were fixed for 10 min in 4% PFA (Sigma, P6148) at RT. Blocking was performed in 5% Donkey serum (Sigma, D9663) in PBS + 0.1% Triton (PBST) solution for 1 hr at RT. Primary antibodies (listed in  Movies 5-7 of reconstituted 3D images were realized using the 3D project plugging.

Quantitative Real-Time Polymerase chain reaction (qRT-PCR) for undifferentiated hiPSCs and derivatives
Total RNA was extracted from undifferentiated hiPSC and cells during the course of differentiation, using TRIzol reagent (#15596026, ThermoFisher Scientific
followed by Bonferroni comparison.

Measurement of AMH concentrations
AMH (Anti-Müllerian Hormone) was measured by a one-step sandwich enzyme-linked immunosorbent assay (Access AMH, Beckman Coulter Company, Marseille, France (73)). AMH is sandwiched between two anti-AMH monoclonal antibodies, one conjugated to alkaline phosphatase, the other coated with paramagnetic particles. A Lumi-Phos 530 chemiluminescent developer was used to read the light output proportional to the concentration of AMH in the sample relative to recombinant human AMH that was used as standard. The intra-assay coefficients of variation range from 1.41% to 3.3% and the inter-assay coefficients of variation from 3.04% to 5.76%.

Soft Matrigel substrates
Organoid grade Matrigel (#354263, Corning) was diluted to 50% (v/v) in supporting medium. 50µl domes of Matrigel were cast onto a well of chilled 12-well plate and left for gelation for 30 min at 37°C. 50µl of concentrated cells (1x10 6 cells/ml) were pippeted onto the domes slowly and incubated at 37°C for 30min, after which 50µl of supporting medium was added carefully, and the plates were incubated for 48-72 hours. Cells self-aggregate on the top of the dome and make 3D structure and excess cells grow as a layer at the bottom of the dome. Images were obtained with a Leica Microsystems DMI4000B microscope at 40x, 63x and 100X (with oil) magnifications.

GONAChip microfluidic device and migration assay
GONAChip was composed of three channels (   Heatmap of genes most differentially expressed between the E10.5 and E13.5 XX female gonads following batch correction (16). IM3/IM4 cluster closely to the E11.5 in vivo gonadal cells. Genelevel expression across samples is shown as a z-score running from red (high) to blue (low). Three biological replicates were analysed for each sample type.      Other Supplementary Materials for this manuscript include the following: Table S4: Measurement of wound closure measured by calculating the decrease in the wound width over time.

Supplementary
Supp data S1. Selected Gene marker list used in Figure S2D Supp data S2. DE_genes_Gonen_LRT