Evaluation of the impact of iPSC differentiation protocols on transcriptomic signatures

the iPSC derived models to better understand and utilise them for toxicological relevant applications. In our study, iPSC (SBAD2 or SBAD3 lines obtained from StemBANCC project) were differentiated towards toxicolog-ically relevant cell types: alveolar macrophages, brain capillary endothelial cells, brain cells, endothelial cells, hepatocytes, lung airway epithelium, monocytes, podocytes and renal proximal tubular cells. A targeted transcriptomic approach was employed to understand the effects of differentiation protocols on these cell types. Pearson correlation and principal component analysis (PCA) separated most of the intended target cell types and undifferentiated iPSC models as distinct groups with a high correlation among replicates from the same model. Based on PCA, the intended target cell types could also be separated into the three germ layer groups (ectoderm, endoderm and mesoderm). Differential expression analysis (DESeq2) presented the upregulated genes in each intended target cell types that allowed the evaluation of the differentiation to certain degree and the selection of key differentiation markers. In conclusion, these data confirm the versatile use of iPSC differentiated cell types as standardizable and relevant model systems for in vitro toxicology.


Introduction
In 2006, induced pluripotent stem cells (iPSC) were first developed by reprogramming mouse adult somatic cells with integrative retroviral transduction of four key transcription factors named "Yamanaka factors" consisting of Oct3/4, Sox2, Klf4 and c-Myc (Takahashi and Yamanaka, 2006).Over years, this technology was further improved by other laboratories to reprogram human adult somatic cells, from sources such as skin, hair follicles, peripheral blood mononuclear cells and urine (Jiang et al., 2017;Re et al., 2018;Vlahos et al., 2019;Zhou et al., 2011) to produce iPSC by genetic reprogramming using integrating viral vectors such as retroviral or lentiviral vectors (Toivonen et al., 2013) and non-integrated viral vectors such as Sendai virus, transfection with episomal vectors, synthetic mRNA, miRNA, small molecules to obtain transgene-free iPSC (Ban et al., 2011;Bang et al., 2018;Fusaki et al., 2009;Rauch et al., 2018;Schlaeger et al., 2015;Subramanyam et al., 2011;Warren et al., 2010;Yu et al., 2011).These iPSC have similar molecular, morphological, and functional profile to embryonic stem cells (ESC) and closely mimic them with their indefinite ability to selfrenew and differentiate into any desired cell type.Also, this technology offers an easy accessibility to source material and a possibility to obtain an endless supply of human tissues which will be especially beneficial for obtaining inaccessible material such as brain tissues without major ethical concerns.Thus, facilitating patient specific stem cell therapies, regenerative medicines, drug discoveries and understanding disease models with the aid of recent advancements in CRISPR/ CAS9 gene editing technologies (Shi et al., 2017).
Models such as iPSC-derived skin keratinocytes and cardiomyocytes have been well established and found to be beneficial in regenerative medicine, disease modelling and chemical/drug screening (Ali et al., 2020;Burridge et al., 2016;Itoh et al., 2013;Kim et al., 2018;Sharma et al., 2018).However, majority of the existing iPSC-derived models are in the early stages of development and often have immature or fetal-like characteristics which might limit their use (Luo et al., 2016).Some models, although not fully established to replace existing animal or in vitro models, are found to be useful in investigating and understanding disease states.For example, iPSC-derived neuronal models have been used for the study of neurodegenerative diseases using patient-derived stem cells, and also to investigate developmental neurotoxicity (Bal-Price et al., 2018;Kamata et al., 2020;Penney et al., 2020;Pistollato et al., 2020).In order to explore the potential of these models for toxicological testing and risk assessment, it is imperative to perform indepth characterisation, to understand cellular differentiation status, and how well the model represents its in vivo counterpart.
With the steady development of next generation sequencing methods, transcriptomics techniques have become more cost effective.As such, they have become a more readily available high throughput and data rich method to interrogate complex systems and gene expression patterns (Cahan et al., 2014;Yu et al., 2015).Their application in dissecting the transcriptional networks of iPSC-derived cell types represents an important opportunity to provide an in-depth assessment of cellular differentiation status at a scale not previously attempted.
Additional outputs from these data can also aid in identifying transcription factors, and in further understanding genetic regulation.This would help in improving the differentiation efficiency and it could identify uniquely expressed genes in each model which could be used as markers to track the differentiation stages.This approach will also aid in understanding disease state models (Brennand et al., 2011;Burke et al., 2020;Griesi-Oliveira et al., 2021;Wellens et al., 2021;Yeo and Ng, 2011).
With the collective resources and knowledge obtained from previous projects, mainly StemBANCC (Morrison et al., 2015) (Innovative Medicines Initiative (IMI) funded) that primarily focussed on producing a well-characterised, standardised iPSC repository and in developing iPSC derived models, in3 (Integrated in vitro and in silico tools), a European Union funded project, focussed on utilising iPSC derived target tissues of toxicological relevance such as brain, cells of the immune system, lung, liver, kidney and vasculature.The study presented here, is part of this project, and was conducted to investigate and compare the transcriptomic changes across different target cell types resulting from a range of cell specific iPSC differentiation protocols.The cell types evaluated in this study were alveolar macrophages (immune system), brain capillary endothelial cells (brain/vasculature), BrainSpheres (brain), endothelial cells (vasculature), hepatocytes (liver), lung airway epithelium (lung), monocytes (immune system), neuronal cells (brain), podocytes and proximal tubular cells (kidney).Further, the possibilities of utilising the transcriptomic signatures of the cells resulting from each differentiation protocol to examine the success of the differentiation was explored.

iPSC culture
Within the framework of the in3 project, two iPSC lines: SBAD2, SBAD3 provided by the StemBANCC project (http://stembancc.org) were used.These iPSC lines were generated by Lyle Armstrong Laboratory, Newcastle University, by reprogramming human dermal fibroblasts into iPSC (SBAD2 and SBAD3) using non-integrated Sendai viral transfection with Cytotune 2.0 (Thermofisher).The SBAD2 line was further genetically modified by adding green fluorescent protein (GFP) downstream of HMOX1 gene by András Dinnyés laboratory, Biotalentum.The SBAD2-HMOX1-GFP cell line was used for the two kidney models and this fluorescent property was not exploited in this particular study.For hepatocytes differentiation, genetically modified SBAD2-3× line as described in Boon et al. was used (Boon et al., 2020;Morrison et al., 2015;Snijders et al., 2021)and transgenesis was performed in these lines using recombinase-mediated cassette exchange (Ordovás et al., 2015).
All four cell lines were cultured on either lactose dehydrogenase elevating virus (LDEV) -free reduced growth factor basement membrane (Geltrex) or Matrigel (both extracted from murine Engelbreth-Holm-Swarm (EHS) tumours) coated plates using mTeSR1 medium (cat no: 85850, StemCell technologies) and were routinely passaged every three V. Chandrasekaran et al. to four days using 0.02% EDTA as described previously in (Rauch et al., 2018) and detailed in the EURL ECVAM DB-ALM Protocol n • 215.

Differentiation of iPSC to intended targets/ Differentiation protocols
As most of these iPSC differentiation protocols are under development and in vitro cells only represent models of the in vivo situation, it is important to note that the iPSC derived target cell types are considered and indicated as 'intended target cell types' in this study.
Monocytes were produced, as previously described in (Wilgenburg et al., 2013), using BMP4, SCF, VEGF in mTeSR in ultra-low attachment (ULA) plates for 4 days followed by culture in X-VIVO base medium with M-CSF and IL3.Cells were harvested using a 40 μm cell strainer from day 21.For alveolar macrophages (MAC), the harvested monocytes were cultured for an additional 7 days using RPMI medium containing myeloid differentiation factors IL6 and M-CSF.TGFβ and GM-CSF were also included to improve differentiation towards an alveolar or airway macrophage phenotype (Yu et al., 2017).
Brain capillary endothelial cells were obtained by slightly adjusting the Qian protocol (Qian et al., 2017) to the SBAD3 cell line as previously reported in Wellens et al. (Wellens et al., 2021).Directed differentiation of iPSC into BBB was executed by forming mesodermderived endothelial progenitors through CHIR99021 (CHIR) and subsequently differentiated using small molecules and growth factors like bFGF, retinoic acid (RA) and B27 in human endothelial serum free medium.
BrainSpheres and Neuronal cells were obtained by first producing neuronal progenitor cells.For the BS, containing neurons, astrocytes and oligodendrocytes, Gibco PSC Neural Induction kit (Gibco, A1647801) was used for the differentiation of iPSC to neuronal progenitor cells as described by the manufacturer.The neuronal progenitor cells were expanded and used between passage 10 and 15 for the differentiation to 3D BrainSpheres.The differentiation, for 43 days, was performed as previously described (Pamies et al., 2017) with the cells under constant gyratory shaking (86 rpm) in differentiation medium (Neurobasal® Electro Medium (Gibco) supplemented with 5% B-27® Electrophysiology (Gibco), 1% glutamax (Gibco), 0.02 μg/ml human recombinant GDNF (Gemini), 0.02 μg/ml human recombinant BDNF (Gemini)).For the neuronal cells, progenitor cells were created through SMAD inhibition with SB4315412, LDN1913189, bFGF and additional medium supplements as described in (Ochalek et al., 2017).At day 10 of differentiation, neural rosettes were manually selected and expanded to neuronal progenitor cells (NPCs) on poly-L-ornithine/laminin coatedplates in neural maintenance medium, containing epidermal growth factor (EGF) and bFGF.To further generate human neurons, NPCs were plated on poly-L-ornithine/laminin coated-plates in neural maintenance medium containing 0.2 mM ascorbic acid (without EGF and bFGF).
Endothelial cells were differentiated from iPSC as described in Gholami et al., 2021(Gholami et al., 2021).In brief, iPSC were trypsinised and transferred to ultra-low attachment plates in mTeSR medium (Gholami et al., 2021).iPSC cell aggregates, with a diameter of 200 μM, were differentiated to endothelial cells by using small molecules CHIR00921, purmorphamine, BMP4, SB431542 and VEGFA in RMPI medium with B27 Plus.After 4 days in RPMI medium, the medium was changed to Endothelial Cell Growth Medium-2 (EGM-2, Lonza) with additional VEGF-A.At day 11 of differentiation, the cells were sorted by flow cytometry for vascular endothelial growth factor receptor (VEGFR2) and seeded on collagen I coated plates in EGM-2.
For hepatocytes, a recombinant cell line was produced as described in (Boon et al., 2020) using zinc finger mediated integration of recombinase cassette by nucleofection.This recombinant cell line was further utilized to differentiate into hepatocytes as described in Boon et al., 2020 and(Ordovás et al., 2016).Cells with 70-80% confluence state were differentiated using several differentiation factors Wnt3a, activin A, BMP4, aFGF, HGF, doxycycline etc., at different intervals in liver differentiation medium (LDM) or amino acids and glycine supplemented LDM.In the beginning 0.6% DMSO was used until day 12 and it was changed to 2% DMSO on day 12.From day 14 onwards, cells were maintained without additional factors in LDMAAGly medium until day 40.
Lung airway epithelium was produced by first differentiating iPSC into a mixed population of lung progenitors during a 14 day period using a published protocol by (Konishi et al., 2016).The mixed progenitors were trypsinised and plated on NIH/3 T3 cells in the presence of ROCKi in BEGM media which allows expansion of basal like cells.After 3 passages, the cells were plated on thincert inserts and an air liquid interface was induced to promote respiratory epithelium differentiation during weeks in pneumacult medium supplemented with heparin and HCS (Djidrovski et al., 2021).
Podocytes were produced as described previously (Murphy et al., 2019;Rauch et al., 2018).iPSC, after reaching about 70% confluency, were detached using accutase and replated onto the Costar 24 well plate format in podocyte culture medium (PCM) supplemented with retinoic acid (RA), activin A, BMP7 along with 1.25% FBS until day 10.By day 10, cells had stopped proliferating and podocytes were maintained in PCM medium with 1.25% FBS and utilized without any differentiation factor.
Proximal tubular cells were obtained by differentiating iPSC using the protocol described in (Chandrasekaran et al., 2021).To summarize, iPSC, grown to about 70-80% confluence, were detached using accutase and replated on to Matrigel coated plates with the small molecules CHIR992021 and TTNPB to induce intermediate mesoderm.Renal lineage was produced by day 6 with addition of FGF9 along with growth factors supplementation, such as EGF and hydrocortisone.After day 6, FGF9 was removed, and cells were maintained in the growth factor supplementation until day 14 to form proximal tubule like cells.

Sample preparation
All of the 10 above mentioned intended target cell types were grown on culture formats as mentioned in Fig. 1 and incubated for 24 h in medium containing 0.1% DMSO.The addition of 0.1% DMSO in the medium enable the same samples to be used to study the effects of 24 h treatments with compounds dissolved in 0.1% DMSO on target specific cell models reported in other publications from the in3 project.After h, supernatants were removed, and the cells were lysed with the required amount of lysis buffer (to obtain minimum of 0.25 million cells/mL).The lysates were frozen at − 80 • C until shipment.In this study, the targeted transcriptomes of the 10 different intended target cell types and undifferentiated SBAD 3 iPSC were compared using three biological replicates of each cell type lysed after 24 h of incubation in medium containing 0.1% DMSO.The day of differentiation/day of lysis mentioned in the article represents the total number of days after which the intended target cell type was achieved following differentiation with the additional 24 h exposure period.
All the samples were then sent to BioClavis technologies, Glasgow, UK to perform TempO-Seq targeted transcriptome assay and processed simultaneously, as a single batch TempO-Seq targeted transcriptome assay analysis was performed using the EU-ToxRisk v2.1 panel containing 3565 probes (representing 3257 genes) as mentioned in (Limonciel et al., 2018).Standard quality control tests were performed by BioClavis on read counts for all the samples.Generated FASTQ files were aligned using Bowtie aligner to generate read counts per gene for each sample (Li and Durbin, 2009).

Data management and harmonization
A central data management system was adopted using Edelweiss-Data™ management system (SaferWorldbyDesign2020) to store and access the data.This provides options to interoperate the data across different labs using the Python and R based workflows which were hosted on the Jupyter notebooks (https://jupyter.org).The raw data file containing the read count information for each sample with unique sampleID obtained from BioClavis were hosted on the EdelweissData™ management system.This was complemented with a corresponding metadata file containing all the necessary information on cell lines, intended target cell types, organs, plate format, dimension of cells, number of replicates, and day of differentiation during lysis etc.An automatic sample selection and filtering was done with the help of information available in the metadate file using the python workflow.This allowed the selection of iPSC derived control samples for the 10 different intended target cell types along with an iPSC line (SBAD3) as a reference for this study.In total, 33 samples representing three biological replicates for each of the 11 cell types (10 intended target cell types +1 undifferentiated iPSC line)were selected and used for further analysis.

Normalization and differential expression analysis
Raw read count distribution box plots were used to look at the distribution of the raw read counts and find outliers within the replicate samples from same group.The low read count probes with values less than the median value (median value: 12) of all probes across all samples were removed.After applying this median cut-off, 3066 probes were selected out of 3565 probes for further analysis.Normalization was performed for the selected list using DESeq2 analysis using the median ratio method as described previously (Love et al., 2014;Singh et al., 2021).
Further differential expression analysis was performed to find out the top 10 differentially expressed genes with significant p-adjusted value (p.adj) <0.05 in each of the 11 cell types (10 intended target cell types +1 undifferentiated iPSC line).As control group, an averaged cell model was created by combining the samples of all other cell models except the one under investigation, i.e., the target cell type of interest is taken as the test group (n = 3) and compared to the control group consisting of the mean of all other cell types (n = 30).
A principal component analysis (PCA) of all probes of the triplicates of the 10 intended target cell types and undifferentiated iPSC using raw read counts transformed using the regularized log function (r-log) of DESeq2 was performed and a similarity matrix was calculated using Pearson correlation coefficients between the normalised expression data of samples and clustering was done using the average Euclidean distance method.This was generated using the Morpheus online tool provided by the Broad institute (https://software.broadinstitute.org/morpheus/).
Finally, a visualization of the the read counts for the top 500 genes across all samples was performed to see the variance between them (Supplementary fig.1).

Differentiation to intended cell types and correlation
Human iPSCs were differentiated into multiple cell types as shown in Fig. 1.The differentiation protocols for each of these intended target cell types are displayed together with the source iPSC cell line, details on the culturing conditions, days of differentiation.Data obtained from TempO-Seq assay showed variations in total raw read counts distribution across different intended target cell types and, in some cases, within the triplicates of the same intended target cell type as shown in Supplementary Fig. 1.However, DESeq2 normalised read counts displayed high similarity between triplicates of the same sample as well as good separation between different intended target cell types, in both Pearson similarity correlation analysis (Fig. 2A) and in PCA (Fig. 2B), indicating the performance of DESeq2 on eliminating the sample variances.For example, monocytes which showed some variations between the triplicates in the read count distribution (as shown in Supplementary Fig. 1) displayed high Pearson correlation score (0.98) with the DESeq2 normalised read counts.
Correlation between the different intended target cell types and iPSC differed between the models, with the highest mean Pearson correlation (0.66) for the BBB triplicates and lowest (0.09) for the HC triplicates.The HC and EC triplicates also clearly cluster separately from the other intended target cell types in the Euclidean clustering (Fig. 2A) as well as in the PCA plot (Fig. 2B).Some of the intended target cell type groups displayed a higher similarity with each other, like neuronal cell and BrainSpheres; monocytes and alveolar macrophages; podocytes, proximal tubular cells and brain capillary endothelial cells, which was reflected in the PCA by clustering in close proximity of each other and by a higher Pearson correlation score.In addition, a general distinction according to the germ layers of these intended cell types could be made with ectodermal (BS and NC), mesodermal (MAC, MON, EC, BBB, PODO and PT) and endodermal (AE and HC) cell targets as distinct clusters based on the first principal component (PC1).

Top 10 differentially expressed genes
Differential expression analysis (DESeq2) was used to identify the differentially expressed probes for each intended target cell type compared to the average of all the other target cell types.For each intended target cell type, the DESeq2 normalised read counts of the top ten differentially expressed probes with p-adjusted value of <0.05 were plotted, in order of decreasing fold change, in a heatmap and accompanying table (Fig. 3).

Discussion
In this study, we evaluated differences in transcriptomic signatures of 10 different intended target cell types derived from human iPSC.
Correlation and variability between the different target cell types was assessed using Pearson correlation and PCA.Cell types like brain capillary endothelial cells and proximal tubular cells shared epithelial/ endothelial characteristics, like transporters, and thus clustered closely; neuronal cells and BrainSpheres as expected clustered together, since they present a very similar composition of brain parenchymal cell types (neurons and astrocytes in both models, and oligodendrocytes only in BS).Moreover, PCA also showed separation of cell types based on germ layer origin.We further identified the top ten differentially expressed genes for each model in comparison with the average expression in all other models combined, allowing the investigation of genes that were distinctly expressed in one model.

(Red (high expression); white (low expression)). Read count values used in the heatmap are displayed in table together with standard deviation (SD). Information on probe, gene description and ENSEMBL ID are displayed on the left side. On the right side, the individual replicate normalised read counts of the top two differentially expressed probes per intended target cell type are plotted. Scatterplots were created in Prism (v 9.0.2).
The undifferentiated iPSC displayed, as expected, an enrichment of stemness factors like POU5F1, ESRG, LEFTY2, TERT, LCK, LEFTY1 and FOXD3 (Kim et al., 2014;Lowry et al., 2008;Palma et al., 2013;Takahashi et al., 2007;Zhu et al., 2014).However, two genes enriched in the iPSC, SLC18A2 and NPTX2 are associated with neuronal tissue suggesting that the iPSCs are more prone to differentiate towards neuronal cells.SLC18A2, also known as Vesicular Monoamine Transporter 2 (VMAT2), is involved in monoamine transport, for example, in dopaminergic neurons (Playne et al., 2018).Neuronal Pentraxin 2 (NPTX2 or NARP), is thought to be involved in axon guidance (Pelkey et al., 2015) and was also enriched in the neuronal cell model but it was not expressed in BrainSpheres.
In macrophages, IGSF6, a member of the immunoglobulin superfamily, was presented as top differentially expressed probe while it was also highly expressed in monocytes.This type I membrane protein was found to be specifically expressed in hematopoietic tissue, more particular in myeloid and immature dendritic cells (Bates et al., 2000), and is consistent with the myeloid differentiation trajectory of the MON and MAC iPSC protocols.The hematopoietic specificity of this gene was also reflected in the lack of expression in the other models.Chemokine genes CCL22 and CCL24 (eotaxin-2) were specifically upregulated in MAC, again consistent with in vivo and primary macrophage cell specific expression (Fransen and Leonard, 2021;Van den Bossche et al., 2016;Xuan et al., 2015).CCL22 acts as chemoattractant for monocytes, dendritic cells, T cells and natural killer cells (Ushio et al., 2018), while CCL24 attracts mainly eosinophils and other CCR3 expressing cells (Dai et al., 2016).A number of prototypical markers of macrophages were not present within the TempO-Seq panel, including MRC1 and ITGAX.Increased expression of these markers was however observed by RT-PCR compared to undifferentiated iPSC (data not shown), supporting the success of the MAC differentiation protocol.
The differentially expressed gene with the highest fold change identified for the BBB model was hyaluronan and proteoglycan link protein 1 (HAPLN1), a linking protein detected in part of the brain extracellular matrix together with hyaluronic acid, lecticans and tenascins (Wong et al., 2013;Zimmermann and Dours-Zimmermann, 2008).This probe was also found to be differentially expressed in the podocytes.The gene with the second highest fold change for the BBB was CGA, encoding for the alpha subunit of the complex glycoprotein hormones (human chorionic gonadotropin, luteinizing hormone, follicle-stimulating hormone and thyroid-stimulating hormone) and was also found to be expressed in the EC model.There are currently no studies about the possible expression of CGA in the BBB.ABCG2 (also known as BCRP), known to be enriched in the human blood brain barrier, is an important efflux transporter that limits the accumulation of variety of drugs in the brain (Suhy et al., 2017;Warren et al., 2009).It was found in the top 10 differentially expressed probes in BBB.
The two brain models, BrainSpheres (3D cell cultures) and neuronal cells (2D cell cultures) showed as expected much greater similarities with each other than with any of the other models, even though the differentiation protocols differ.NEFL, SLC1A2, MAP2 and NCAM1 were identified as top expressed genes in human neurons, while NCAN, S100B, SLC1A2, MAP2, MT3 and NCAM1 were (also) identified in the astrocytes and S100B, SLC1A2 and NCAM1 were found to be expressed in oligodendrocytes (McKenzie et al., 2018).The S100B gene, known to be expressed in astrocytes and oligodendrocytes (McKenzie et al., 2018;Michetti et al., 2019), was only expressed in the BS, reflecting the difference with the NC model.The GAGE family members GAGE2A and GAGE1 were also uniquely expressed in the BS.An increased expression of the GAGE family members was reported in paediatric and high-grade brain tumours although GAGE2A and GAGE1 were not specifically investigated (Gjerstorff and Ditzel, 2008;Jacobs et al., 2008;Scarcella et al., 1999).The metallothionein 3 (MT3) gene was only expressed in the NC model while this gene is found to be expressed in both neurons and astrocytes (Vašák and Meloni, 2017;Yamada et al., 1996).This difference could possibly be attributed the different differentiation protocols and cell sources which were used (i.e SBAD2 for NC and SBAD3 for BS).
Three of the top ten differentially expressed probes identified in the endothelial cells belong to the GIMAP family, GTPase of the immunity associated protein family (Krücken et al., 2004).This family of genes was found to be expressed mainly in spleen and lymph nodes but as well in other organs like heart, muscle, digestive tract and placenta.As this family has only recently been identified, not much is known on their expression in endothelial cells, especially not for the highest differentially expressed GIMAP-1-GIMAP-5 transcriptional readthrough.The second most differentially expressed gene identified for the EC model is CALM3.This protein is a moderator of calcium signalling, serving as a transductor or sensor of Ca 2+ signals (Alaimo and Villarroel, 2018;Toutenhoofd et al., 1998).Calmodulin is widely expressed throughout tissues and no study, to our knowledge, addresses the specific expression of CALM3 in endothelium.Von Willebrand factor (vWF), a multimeric glycoprotein produced by endothelial cells was also displayed in the top 10 differentially expressed probes of EC.
Hepatocytes exhibited most of their typical markers in the top 10 upregulated probes.The most highly expressed gene in hepatocytes was TTR (Transthyretin).It is a secretory protein mainly produced by hepatocytes in liver and is responsible for the transport of thyroxine and retinol.Existing iPSC-derived hepatocyte models are known to have similar expression of TTR as primary hepatocytes and are being utilized in studying TTR related amyloidosis (Niemietz et al., 2018).The second most expressed gene was AFP, alpha-fetal protein.Although the secretory level of AFP is known to decrease significantly after development, gene expression level remains significantly high in the hepatocytes.But this could also represent marker for retrograde differentiation (Kuhlmann and Peschke, 2006).Other genes in the top 10 list in hepatocytes are SERPINA1, a protease inhibitor belonging to serpin superfamily; serum amyloid A2 (SAA2) and apolipoprotein A1 (APOA1) which belongs to the family of apolipoproteins; glycoproteins such as transferrin (TF) and fibrinogen gamma chain (FGG) and albumin (ALB), a gold standard liver marker which is exclusively produced in liver.These genes are known to be enriched in human liver (Andrews et al., 2021).This distinct and high expression of various liver specific genes in comparison to other models cluster the hepatocytes separately.
In the lung airway epithelium model, cytokeratins which are found in the cytoskeleton of the epithelial cells such as KRT4, KRT15, KRT17, were upregulated in comparison to other models.Type II cytokeratin, KRT4 is specifically detected in mucosal epithelial cells but not in the basal cells (Blobel et al., 1984).However, type I cytokeratins, KRT15 and KRT17 are found mainly in the basal cells (Miller et al., 2020).These cytokeratins offer resistance to mechanical stress and play a vital role in differentiation of epithelial cells.Also, other markers like CSTA, SCNN1B, SLC6A14, CYP4B1 were highly expressed in airway epithelium.SCNN1B, which is a sodium channel epithelial 1 beta subunit, is known to be enriched in alveolar type 1 cell and plays an essential role in the maintenance of air-liquid homeostasis (Hummler and Planès, 2010).CYP4B1, cytochrome P450 family 4 gene and SLC6A14, a solute carrier family 6 gene are predominantly expressed in bronchial airway epithelium (Boei et al., 2017) and recently SLC6A14 has been shown to be involved in regulating cystic fibrosis (Ruffin et al., 2020).Many of the prototypical genes used to characterise conducting airway epithelial differentiation were not present within the panel, including secretory (MUC5B, MUC5AC, SCGB1A1), multiciliated (FOXJ1, PIFO) markers.Further characterisation of the model was addressed elsewhere (Djidrovski et al., 2022;Djidrovski et al., 2021).
Out of the top 10 differentially expressed genes for the immune models MON and MAC, seven were found to be expressed in both.As the monocytes were further differentiated towards alveolar macrophages, it is unsurprising that these models contain an overlap in gene expression.AQP1 and CD14 were highly expressed in both MAC and MON.Human monocyte antigen, CD14 is a pattern recognition receptor which is important for the innate immune response to pathogens (Villani et al., V. Chandrasekaran et al. 2017;Wilgenburg et al., 2013;Wu et al., 2019).It is commonly employed as a marker for monocytes but is also identified in alveolar macrophages although to a lesser extent (Haugen et al., 1998).The water channel aquaporin 1 (AQP1) is essential for the cellular water regulation and plays a role in the immune response (to lipopolysaccharide (LPS)) in both monocytes and macrophages (Rump et al., 2013;Rump and Adamzik, 2018).TYROBP which is a tyrosine kinase binding protein was one of the top upregulated genes in MON and is also well known to be highly expressed in them (Bakker et al., 1999).However, there were no strong evidence in the literature for their presence in monocytes for the genes FMO1, SULT1C2, AMHR2.
In podocytes, ACTA1 (actin alpha 1) and HAPLN1 (Hyaluronan and proteoglycan link protein 1) were identified as the top 2 upregulated genes.Both actin cytoskeleton and extracellular matrix components help in maintaining cellular structure and integrity and thereby help to maintain intact glomerular basement membrane in order to provide effective barrier.Recent study also confirms the presence of HAPLN1 in human glomerular organoids (Hale et al., 2018).Other genes like, PRUNE2, CDH6, TXNRD1 were among the top highly expressed genes in podocytes but are not podocyte specific.Recent single cell sequencing study shows expression of CDH6 in early glomerular development; however, it declines over podocyte maturation (Harder et al., 2019).PRUNE2 has been shown to be present in iPSC-derived podocytes but not in human glomeruli in vivo (Sharmin et al., 2016).
In proximal tubular cells EMX2 and HOXA5 were the top two upregulated genes.Empty spiracles homeobox (EMX2), a homeodomain transcription factor, has been shown to be present in renal progenitors and assists in directing renal nephrogenesis.EMX2 has recently been utilized together with other transcription factors to produce renal tissues from stem cells through direct reprogramming or using synthetic mRNAs (Hiratsuka et al., 2019;Kaminski et al., 2016).Homeobox genes are important in embryonic development.Even though it has been reported that Homeobox A5 (HOXA5) is expressed in the developing kidney (Dony and Gruss, 1987;Hershko et al., 2003), we did not find solid evidence for its expression in proximal tubular cells.Other genes like retinol binding protein 1 (RBP1) and retinoic acid receptor alpha (RARA) and antioxidants related genes like metallothionein (MT1M, MT1E) and peroxiredoxin (PRDX2) were highly expressed in PT in comparison to other cell types.
Brain capillary endothelial cells, podocytes and proximal tubular cells clustered closely in the PCA.This could be due to the fact that BBB and proximal tubular cells share most of the ABC and SLC transporters, metabolizing enzymes and junctional proteins.Several studies, including ours, recently questioned the endothelial characteristics of multiple BBB differentiation protocols, reporting the lack of expression of vascular lineage genes and the presence of epithelial associated genes which could contribute to the proximity in the clustering (Delsing et al., 2018;Lippmann et al., 2020;Lu et al., 2021;Vatine et al., 2019;Wellens et al., 2021).As blood-brain barrier endothelial cells develop in close proximity of other cells of the neurovascular unit (pericytes and astrocytes), co-culturing with these cells of the neurovascular unit might improve the model (Wellens et al., 2022).Likewise, renal cell types, podocytes and proximal tubular cells, shared many common features, especially extracellular matrix proteins.This impedes the appearance of cell type specific genes in the top 10 list in these cell types particularly in PT as these cells share common genes across several other cell types.Also, in some of the models mainly PT and BBB, typical markers were not present in the top upregulated genes list.This is largely because this TempO-Seq targeted gene list (supplementary information 2) was specifically designed for studying toxicological mechanisms.Thus, key cell type-specific characterisation markers for some of the cell types (for example, LRP2 for PT, NPHS1, NPHS2 for podocytes, alveolar specific secretory components etc.,) were absent in the list.
It is also critical to note that some of the genes that were upregulated, could be directly related to differentiation factors used in the respective protocols.For example, retinoic acid receptor agonists induced transcription of genes related to retinoic acid signalling.In the case of podocytes, PT and BBB there is a strong upregulation of genes like HAPLN1.Retinoic acid stimulates human chorionic gonadotropin (hCG) secretion which might in turn increase HAPLN1 expression (Chou et al., 1983;Kato and Braunstein, 1991;Liu et al., 2010;Roulier et al., 1996).This might provide a possible explanation for HAPLN1 being one of the highest differentially expressed genes in both BBB and PODO and showing increased expression in PT (Fig. 3), as all three protocols make use of retinoic acid (Fig. 1).

Conclusion
Cross-comparison performed in this study on transcriptomic level alteration is first of its kind to understand the differences and similarities between different iPSC-derived intended target models.Targeted transcriptomic expression profiles are not sufficient to fully explore the degree of similarities and differences between the models derived from iPSC.Nevertheless, most of the genes identified as uniquely expressed in one model, using an unbiased approach, appear relevant for characterisation of the intended target cell types.This global cross comparison across different models provides in depth information on how these differentiation protocols affect the cells.Additionally, they allow to identify markers that are able to characterise these cells which could complement gold-standard cell-specific characterisation markers.Besides characterisation of intended cell types, the transcriptional expression of those genes, considered as markers, can also serve as a control to assess the reproducibility of differentiation protocols, in particular when different iPSC lines are used.In addition, it gives information on how each models behave, which pathways are activated in certain cell types in comparison to others which could be explored further to improve the models.Full transcriptomic testing will be employed in the future to get thorough knowledge on these models and also to compare with in vivo tissue specific expression profiles.Finally, the possibility to separate the intended target cell types into the three germ layer groups further illustrate the great potential of the use of iPSCderived cell models to address a wide range of biological questions including the nature of the mechanisms responsible for tissue specific response to chemicals in toxicology.

Declaration of competing interest
The authors declare no competing interests.JG and RGV are cofounders of Evercyte.RGV and ZM are employees of Evercyte.

Fig. 2 .
Fig. 2. Sample quality control and correlation.A) Pearson correlation heatmap with Euclidean average clustering (created using the Morpheus online tool provided by the Broad institute (https://software.broadinstitute.org/morpheus/) for normalised read counts of triplicates of all the intended target cell types and iPSC.B) Principal Component Analysis (PCA) of all probes of the triplicates of the 10 intended target cell types and undifferentiated iPSC.Germ layers separation of the intended target cell types are indicated on the graph with dotted lines.PCA plot was created in R.

Fig. 3 .
Fig. 3. Top 10 significantly upregulated genes across all intended target cell types.The normalised read counts of top 10 differentially expressed probes per intended target cell type are shown in the heatmap.Heatmap was generated using Microsoft Excel 365 with conditional formatting applied to each row using 2-color system.(Red(high expression); white (low expression)).Read count values used in the heatmap are displayed in table together with standard deviation (SD).Information on probe, gene description and ENSEMBL IDare displayed on the left side.On the right side, the individual replicate normalised read counts of the top two differentially expressed probes per intended target cell type are plotted.Scatterplots were created in Prism (v 9.0.2).