Delineating the heterogeneity of matrix-directed differentiation toward soft and stiff tissue lineages via single-cell profiling

Significance The clinical utility of mesenchymal stromal/stem cells (MSCs) in mediating immunosuppressive effects and supporting regenerative processes is broadly established. However, the inherent heterogeneity of MSCs compromises its biomedical efficacy and reproducibility. To study how cellular variation affects fate decision-making processes, we perform single-cell RNA sequencing at multiple time points during bipotential matrix-directed differentiation toward soft- and stiff tissue lineages. In this manner, we identify distinctive MSC subpopulations that are characterized by their multipotent differentiation capacity and mechanosensitivity. Also, whole-genome screening highlights TPM1 as a potent mechanotransducer of matrix signals and regulator of cell differentiation. Thus, by introducing single-cell methodologies into mechanobiology, we delineate the complexity of adult stem cell responses to extracellular cues in tissue regeneration and immunomodulation.

Single-stranded DNA primers and primer dimers were digested using 1 unit µL -1 ExoI (New England Biolabs) and 1 U µL -1 HinFI (Thermo Fisher Scientific). mRNA:cDNA hybrids were purified using Agencout AMPure XP beads (Beckman Coulter Genomics) according to the manufacturer's instructions. Second-strand cDNA synthesis was performed using NEBNext Ultra II Non-Directional RNA Second Strand Synthesis Module (New England Biolabs) and amplified by in vitro transcription (HiScribe, New England Biolabs). RNA products were fragmented using Ambion RNA Fragmentation Reagents (AM8740, Thermo Fischer Scientific), purified using Agencout AMPure XP beads, and reverse-transcribed using random hexamer primers (SuperScript III, Thermo Fisher Scientific). Single-stranded DNA fragments were linearly amplified via 14 PCR cycles (HiFi HotStart ReadyMix, Kapa Biosystems).

Sequencing and processing of single-cell RNA data: Non-conditioned, matrix-conditioned and early differentiating cells
scRNA profiling of non-conditioned, matrix-conditioned and early differentiating cells was performed using MSCs that were obtained from a male donor, age 40, weight 73 kg, height 173 cm. Libraries were paired-end sequenced (PE5: 45 bp and PE7: 35 bp) on a NextSeq 500 platform using a High Output kit (75 cycles, Illumina). Sequences were demultiplexed using indices and barcodes, permitting ≤ 1 nucleotide alignment substitution. After removing low-complexity poly-adenylated sequences, transcripts were aligned against an indexed reference genome (GRCh38.p12) using Bowtie 0.12.8, and gene intensities were evaluated based on UMI frequency.
Cells with high representation of mitochondrial RNA (>25%) or low gene number (≤1,900 per 15,000 UMIs) were discarded. Mitochondrial and ribosomal genes were discarded as well as lowfrequency genes that were detected in ≤ 5 cells. Finally, UMI count was down-sampled to 15,000 per cell, thus normalizing gene expression and minimizing batch-specific confounders.

Clustering MSC subpopulations
Computational and statistical analyses were performed using established protocols in R (R Foundation for Statistical Computing) and Matlab (Matlab 2019 Mathworks Ltd). We distinguished highly variable genes from technical noise as described previously by Brennecke and colleagues. 3 Outlier expression values were truncated by winsorizing, the squared coefficient of variation (CV 2 ) versus the mean, , was fitted by � + , 3 and highly variable genes were identified using the Chisquared test (FDR < 10 -3 ). UMI counts (CPM) of highly variable genes were log 2 ( + 1) transformed and single-cell transcriptomes were dimension-reduced via PCA. The number of significant PCs was determined based on a screen test. Unsupervised cell clustering was performed using kmeans clustering in the PCA space. The number of clusters was determined based on withincluster sum of squares (Scree plot). Genes that were differentially expressed across clusters were assigned based on the corresponding log-fold ratios of expression levels and the adjusted twotailed Wilcoxon rank-sum test p-values (Benjamini-Hochberg method). Functional enrichment analysis was performed using Gene Set Enrichment Analysis database. 4

Diffusion pseudotime analysis
Diffusion mapping was performed using the Destiny package in R. 5 Only genes that varied significantly across soft and stiff matrix-conditioned cells, which we identified using the CV 2 plot as described above (FDR < 10 -3 ), were retained. Diffusion mapping was calculated separately for cells that were cultured on soft matrices (NV, D3Soft and D6Soft; Fig. 3a) and cells that were cultured on stiff matrices (NV, D3Stiff and D6Stiff; Fig. 3b). For each matrix, we performed PCA dimensionality reduction, and the diffusion components were calculated based on the top twenty principle components that maximized variance between transcriptomes with the local sigma calculated by nearest-neighbour approximation (k = 50).
Matrix-directed cell differentiation dynamics was analysed using the MeRLOT package in R. 6 Soft and stiff matrix scaffold trees were reconstructed using the first two diffusion components (CalculateScaffoldTree function in R). Principle elastic trees were calculated, and cells were assigned to 50 discrete support nodes that were aligned along the bifurcating branches (CalculateElasticTree function in R). Setting the first support nodes that are associated with nonconditioned MSC state as the initial pseudotime, pseudotime values were further propagated along the bifurcating branches of the support trees in proportion with the distance from 0 and cells were assigned accordingly (CalculatePseudotimes function in R). Next, cells were binned into pseudotime sets (25 bins for the trunk and for each of the branches), and dynamic gene expression profiles were averaged within each bin along the bifurcating trajectories of the soft and stiff matrix trees. 7 Gene trajectories were further smoothed by fitting a natural cubic spline with three degrees of freedom (Figs. 4a and Fig. S5b).
Gene trajectories along the Soft-Adipo and Stiff-Osteo were clustered using k-means, and the clusters were chronologically ordered according to the average pseudotime point of maximal gene expression. Genes that were differentially expressed between cell subpopulations (lfc > 1, adjusted FDR < 10 -5 ) were ranked by similarity using a cosine-distance metric within each cluster, and trajectories of most similar genes within each cluster were plotted with respect to SoftAD (

Permutation test of mechanically diverging genes
The statistical significance of gene divergence along the SoftAD versus StiffOS trajectories was evaluated using a permutation test. Specifically, cells were randomly reassigned a branch label while maintaining the correct overall distribution of cells between branches. New branch curves were then calculated, and the normalized divergence area between the curves was calculated as described above. A distribution of the divergence area was obtained based on 1,000 random permutations, and p-value of the non-permutated divergence was calculated.

Pathway analysis
The activation of SRF and YAP1 signaling was evaluated by measuring the expression levels of respective target genes. 8,9 Single-cell expression levels of target genes showing a minimum average expression (UMI > 0.5) were linearly rescaled to [0,1], and signaling pathway activation was scored in each cell by the average normalized intensity.

Transcriptome-based scoring of adipogenic-osteogenic differentiation
Adipogenic and osteogenic differentiation was scored based on a subset of established markers

Transcriptome-based scoring of cell cycling
Cell cycling scoring of single-cell transcriptomes based on a combined group of gene markers of S phase (43 genes) and G2M phase (54 genes) as listed in Supplementary Dataset S3. 10 The single-cell expression intensities of these gene markers were z-scored normalized across cells and cell-cycling scores were obtained by averaging all gene markers per transcriptome. Cells in G1-phase will not express S phase nor G2M phase markers.

Transcriptome-based scoring of oxidative phosphorylation state
Oxidative phosphorylation was scoredusing a set of 23 genes that encode proteins of the electron transport chain (complexes I-V) and the ATP synapse complex. The single-cell expression intensities of these genes were z-score normalized and averaged. D3soft subpopulation was divided into low oxidative (< −0.23), mid (> −0.23 and > 0.13) and high (> 0.13) oxidative phosphorylation cells.

Sequencing and processing of single-cell RNA data: TPM1.7, shTPM1, and Control cells
scRNA profiling of TPM1.7, shTPM1 and control conditions was performed using MSCs that were isolated from two donors (Donor-1: Female age 43, weight 88 kg, height 179 cm. Donor-2: Female age 35). Cells were seeded at 5,000 cells cm -2 in polystyrene 75 cm 2 flasks and cultured in basal medium for three days and three days in bi-potential induction medium. Droplet-based scRNA seq was performed as described above. Low-complexity poly-adenylated sequences were filtered out, the remaining sequences were de-multiplexed using the dropEst pipeline, 11 and aligned to a human indexed reference genome (GRCh38.p12) using TopHat2 2.0.08. 12 Gene expression loom file was generated based on the aligned transcript UMIs that included both spliced and unspliced sequence counts. Cell quality control was performed using the R software package Seurat3 to remove transcriptomes with < 500 genes or < 1,000 unique UMIs or cells with >25% mitochondrial RNA. 13 Log-scaled gene expression intensities were normalized according to transcript number. PCA dimension reduction of the single-cell transcriptomes was performed with respect to the highly variable genes and the top twenty principal components were UMAP-visualized. Graph-based clustering was performed in the PCA space according to the Seruat3 pipeline.

Population-level RNA sequencing
Population-level RNA profiling was performed using MSCs that were isolated from two donors      , short hairpin silencing RNA targeting TPM1 gene (shTPM1), and the corresponding controls expressing Dendra2 (DDR) and non-hairpin control sequence (Ctrl shRNA). Cells were immunostained against tropomyosin isoforms. Scale bar: 100 μm. (ii) Quantitative immunofluorescence analysis of tropomyosin levels shows approximately twofold upregulation in TPM1.7 and twofold downregulation in shTPM1 cells compared with controls. b, Tropomyosin immunoblotting shows (i) ectopic overexpression above endogenous levels that increases protein levels twofold in TPM1.7 relative to DDR control, and (ii) twofold knockdown relative to Ctrl shRNA cells. Total protein levels were adjusted and normalized to GAPDH loading control. DDR: Dendra2 control. DDR-TPM1.7: Dendra2conjugated tropomyosin 1.7. Ctrl shRNA: Non-hairpin insert sequence. shTPM1: shRNA targeting TPM1. MSCs were derived from a male donor, age 40.