A high-content platform to characterise human induced pluripotent stem cell lines

Introduction
Human induced pluripotent stem cells (iPSCs) offer tremendous potential not only for cell therapy but also for developing platforms for medical research. In particular, patient-derived iPSCs can be used to obtain selected differentiated cell types to model diseases and discover new therapeutics [1]. Heterogeneity in gene expression has been described within a specific iPSC line [2], between different donors [3,4] and through the reprogramming process [5,6]. Furthermore, several studies have focused on the differences between a small number of lines from patients and controls or used isogenic lines [7]. However, despite recent examples in this direction [8], the phenotypic heterogeneity within one cell line and among lines derived from the same donor or diverse individuals is yet to be fully explored.
Nonetheless, a clear definition of the genetic and epigenetic variance and how each of these affects cell behaviour in large panels of iPSCs is crucial for stem cell biology. Moreover, assessing the phenotypic variance observed in cell populations from multiple donors will facilitate scaling up culture systems as well as the development of quality control and automation protocols with undoubted value for the maintenance of pluripotent stem cells and controlled differentiation towards specific cell types.
The human induced pluripotent stem cells initiative (HipSci) is generating iPSCs from hundreds of healthy individuals as well as patients diagnosed with selected diseases. This represents a powerful resource to evaluate and quantify cell responses to chemical, physical and biological stimuli using novel assays and artificial microenvironments. Within this framework, phenotypic data are being collated with genomics, epigenomics and proteomics data to discover the impact of their variation on the cellular phenotype.
Here we describe the development of a simple assay (including methods, workflow and set-up) to capture and quantify phenotypic features of iPSCs exposed to different extracellular matrix conditions.

iPSC quality control and maintenance
iPSCs are received from the Wellcome Trust Sanger Institute. There, cells are reprogrammed from fibroblasts using the Sendai virus method [9]. After reprogramming, each clone is genotyped and tested for copy number variations (CNVs). Pluripotency is assessed based on expression profiling [10], detection of pluripotency markers in culture and response to differentiation-inducing conditions [11]. Data reported in this study refer to multiple replicate experiments of a single cell line [12]. For iPSC maintenance, inactivated MEFs are seeded as feeders on a 6-well microplate (Falcon) coated with 0.1% gelatin from porcine skin type A (Sigma-Aldrich) at a density of 10^6 cells per 6-well plate and allowed to attach overnight.

Extracellular matrix coating conditions
To develop our assay, we first sought to identify favourable extracellular matrix substrates. We screened and examined a total of 74 diverse conditions from two sources (see Table 1). A customised array plate acquired from Orla Protein Technologies (Sarstedt cat. No. 02XECM-96) contained 9 conditions derived from single ECM proteins including fibronectin, laminin, collagen, vitronectin, osteopontin, tenascin C and bone sialoprotein and 17 conditions presented as a mixture of different ECMs. For this set, triplicates of a single concentration of approximately 25 µg/ml per condition were coated on wells. Additionally, we created an array plate containing, in duplicate, fragments of fibrillin-1, fibrillin-2 and agrin as well as cellular and plasma fibronectin at a range of 1, 10 and 25 µg/ml. The plate also contained single concentrations of the following ECM proteins: LTBP-1 C-terminal fragment, MAGP-1, syndecan-2 extracellular domain, syndecan-4 extracellular domain and fibulin 4. Extracellular matrix proteins diluted in 80 µl PBS were incubated overnight on 96-well µClear black tissue culture plates (Greiner cat. No. 655090) at 4°C. The supernatant was removed and the well coating was blocked by the addition of 10 mg/ml BSA for 1 h. Upon removal of the BSA solution, the plates were stored at -80°C prior to use. Bovine Serum Albumin (BSA) and uncoated tissue culture plastic (TCP) were used as controls.
For the fibronectin assay, 96-well µClear plates (Greiner) are coated with 1, 5 and 25 µg/ml human plasma fibronectin (Corning) and stored at 4°C (overnight or up to 14 days). We will refer to these conditions as Fn1, Fn5 and Fn25, respectively (whereas Fn10 was only used in the screening). Each is present in a technical triplicate on the same vessel, randomised per column using diverse patterns (i.e. Fn1-Fn5-Fn25, Fn1-Fn25-Fn5, Fn5-Fn25-Fn1, Fn5-Fn1-Fn25, Fn25-Fn1-Fn5, Fn25-Fn5-Fn1). Border wells are avoided to reduce edge effects. Before use, fibronectin is removed and wells are washed with DPBS (Sigma-Aldrich).
(1:5000, 1 µg/ml final concentration, Life Technologies). Plates are then washed with DPBS and stored at 4°C. EdU was used according to the manufacturer's instructions, except that the concentration of the azide reagent was halved. A period of half an hour was chosen in line with the cell cycle period described in the literature for human iPSCs [6]. As a control, cells exposed to the same reagents in the absence of EdU incorporation showed background intensity values comparable to those of the cells considered EdU negative by our analysis.
Acquisition parameters and the image analysis pipeline are described in detail in Section 3.1 and Section 3.3, respectively. For endpoint analysis, stained plates are imaged using an Operetta® (Perkin Elmer) high content device. Images are acquired in wide field mode using 4 channels (DAPI, 488, 647, Brightfield as control). On Greiner µClear plates, we optimised the focal height settings for brightfield, DAPI, EdU and CellMask (11, 20, 9 and 10 µm, respectively) following the sharpest focal plane guided by the highest intensity of signal. Exposure times (100, 200, 300 and 10 ms, respectively) were chosen to minimise the time of acquisition and the amount of reagents used. Incucyte (Essen Bioscience) images were acquired largely as described in [13].
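The column-wise arrangement of Fn1, Fn5 and Fn25 described above can be illustrated with a minimal sketch. It assumes one block of three adjacent columns by three rows on a 96-well plate, with each column carrying a different ordering taken from the listed patterns, so that every condition appears three times and no border well is used. The block position, the function names and the choice of cyclic orderings are illustrative assumptions, not the layout actually used in the study.

```python
ROWS = "ABCDEFGH"          # 96-well plate rows
N_COLUMNS = 12             # 96-well plate columns
CONDITIONS = ["Fn1", "Fn5", "Fn25"]


def is_border(well):
    """Return True for wells on the outer ring of the plate."""
    row, col = well[0], int(well[1:])
    return row in (ROWS[0], ROWS[-1]) or col in (1, N_COLUMNS)


def triplicate_block(first_row="B", first_col=2):
    """Lay out a 3 x 3 block; each column holds a cyclic shift of CONDITIONS.

    Hypothetical helper: the cyclic shifts give Fn1-Fn5-Fn25, Fn5-Fn25-Fn1
    and Fn25-Fn1-Fn5, three of the six orderings listed in the text.
    """
    layout = {}
    r0 = ROWS.index(first_row)
    for c in range(3):
        pattern = CONDITIONS[c:] + CONDITIONS[:c]
        for r, condition in enumerate(pattern):
            layout[f"{ROWS[r0 + r]}{first_col + c}"] = condition
    return layout


layout = triplicate_block()
for well in sorted(layout):
    print(well, layout[well])
```

With cyclic orderings each concentration occurs once per row and once per column of the block, which is one simple way to keep position effects from coinciding with a single concentration.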

Results and discussion
We first aimed to obtain a robust read out to evaluate response of undifferentiated iPSCs to controlled changes in the microenvironment. Furthermore, we aimed to develop a set of procedures to effectively extract from images relevant phenotypic features, which can be quantified and interrogated in downstream phases of the analysis. As a proof of principle for this protocol, we used here dissociated iPSCs from a single control line in undifferentiated culture conditions. This study serves as a foundation to build phenotypic signatures of large panels of iPSC lines from multiple donors which can be collated to complementary and matched datasets containing genomic and proteomic information. Similar approaches can be readily tailored to study cells differentiated from pluripotent stem cells or generated by other reprogramming strategies.

Screening for optimal extracellular matrix protein conditions
Cell behaviour is heavily influenced by genetics and by the surrounding environment [14,15]. In order to evaluate specific differences on cell behaviour, we reasoned that diverse coating concentrations on multiwell plates could be exploited. Thus, as a prerequisite to build a scalable workflow suitable for the characterisation of large panels of iPSCs, we first set out to identify an effective, robust and inexpensive substrate. We searched for an extracellular matrix (ECM) protein or peptide that could be used at different concentrations ranging from unfavourable to permissive for cell attachment and cell spreading. Furthermore, we searched for conditions to robustly detect a sufficient number of single cells when plating the same number of iPSCs on a range of diverse concentrations. In addition to these criteria, we also aimed to keep the concentrations as low as possible to minimise the presence of potential contaminants (e.g.: growth factors).
We tested 74 conditions (described in Section 2.2 and Table 1) by seeding cells, imaging them live every hour for 24 h and fixing and staining with DAPI, EdU and CellMask. Several conditions such as fragments lacking RGD sites (e.g. from Agrin, Syndecan, Fibulin 4 and MAGP-1) yielded poor numbers of cells, similar to the BSA or tissue culture plastic controls (Table 1, left insert). Others, such as vitronectin, allowed for attachment, spreading and survival and yet cells were rarely found as single cells, appearing mostly in clumps (Table 1, right insert). Importantly, some of the conditions tested allowed the attachment, spreading and survival of single cells and additionally demonstrated a dose-dependent response. These results suggest that varying concentrations of one single substrate may lead to the establishment of assays tuning cell response in terms of attachment, spreading, proliferation and intercellular adhesion. For our screening experiments we used two different vessel types: Sarstedt 96w (for the Orla plate) and Greiner µClear (for the custom plate). Images acquired on the Operetta appeared sharper for the Greiner µClear. The 4×, 20× and 40× objectives were impractical or gave an unfavourable ratio of cells on the borders versus cells in the field. We therefore used a 10× long working distance (WD) objective for 9 fields of view per well, excluding the peripheral fields, with excitation at 50% and transmission at 50% according to the manufacturer's instructions to avoid photo-bleaching.

Assay development using a gradient of fibronectin concentrations
Among all promising substrates (Table 1, #65-74), we focused on those that, for practical reasons such as cost and robustness, are suitable for a large number of iPSCs. Furthermore, these might facilitate the scale up and industrial application of similar strategies. Fibronectin is a large glycoprotein generally in the form of an insoluble dimer. Several reports suggest an essential role for fibronectin during vertebrate embryonic development [16] and tissue regeneration [17]. In human iPSCs, several studies indicate that fibronectin is a permissive substrate for the maintenance of pluripotency [18][19][20][21][22][23]. In addition, its adsorption on tissue culture plates has been shown to give rise to diverse structures with diverse surface density affecting the number of focal adhesion contacts [24]. In the human body, fibronectin exists in two forms: the plasma form circulates in the blood whereas cellular fibronectin is physically associated with the cell surface [25]. Both cellular and plasma fibronectin as well as other conditions presenting recombinant peptides gave similar results in terms of total number of cells attached and number of single cells. Fn1 and Fn25 conditions yielded remarkable differences indicating an environment conducive to low versus high cellular adhesion respectively, whereas Fn10 appeared very similar to Fn25. We therefore sourced plasma fibronectin in conditions Fn1-Fn5-Fn25 to validate our observations. To achieve statistical significance, these conditions were each replicated three times on the same vessel in a randomised pattern (Fig. 3A). We report here results from numerous replicate experiments (n = 41).
Having chosen the extracellular matrix and a suitable range of concentrations, we tested the initial cell density. A seeding density below 3000 cells per well resulted in a suboptimal number of single cells. On the other hand, when seeding 6000 cells on Fn25 we observed a comparable number of single cells to those observed when plating 3000 cells on Fn25. We also observed a high number of clumps forming when plating 6000 cells on Fn1 (Fig. 2A). We thus chose to plate 3000 cells per well as this seeding density yielded an optimal number of single cells in all conditions tested.
To determine the most appropriate duration for our endpoint assays we performed live imaging of cells plated in Fn1, Fn5 or Fn25. The time chosen should be long enough for the cells to adhere and be in line with the cell cycle described for human iPSCs [26] to minimise complete cytokinesis. Ideally in this condition, the total number of cells observed is deemed to depend more on cell adherence and survival rather than on cell division. DNA replication can then be validly assessed with a 30 min pulse of EdU. Cells appeared to start adhering and spreading around 4 h post-plating at all fibronectin concentrations. At 24 h, the vast majority of the attached cells will have completed their spreading with minor cell divisions observed (Fig. 2B). We therefore opted to run the endpoint assays 24 h after plating.

Image analysis pipeline
Image acquisition parameters using Operetta have been detailed in Sections 2 and 3. We here describe in detail the image analysis pipeline we have built using the Harmony 3.5.2 software (summarised in the workflow diagram in Fig. 3B). Similar strategies could be easily transferred to other, possibly open-source, image analysis software platforms.
Fig. 3. (A) Step-by-step experimental conditions for the assay set-up. KOSR = KnockOut Serum Replacement, RT = room temperature, BSA = bovine serum albumin. (B) Summary of the image analysis pipeline detailed in Section 3.3. Input images are segmented to identify nuclei and cytoplasm. Border objects and artefacts are discarded via morphology and intensity assessment on nuclei and on cells. The modify population module is employed to identify clumps based on cell-to-cell proximity and capture the number of cells in each clump as a context feature.
We first used the Input Image module and proceeded to minimise background by processing individual planes with a basic flatfield correction and no quick tune. The find nuclei module segments nuclei using the DAPI channel and the M Method. An 18 µm diameter was chosen with 0.40 splitting coefficient and 0.05 common threshold. These parameters allowed efficient segmentation of an output population named 'Nuclei', which still carried artefacts to be discarded based on size and signal intensity. Within the calculate morphology properties module we used the standard methods on the nuclei population and on the nucleus region to define area, roundness and ratio width to length and output properties as 'Nucleus'. We then used calculate intensity properties modules for the Alexa488 channel (EdU), the brightfield and the DAPI, outputting median intensities as properties for each object. Using the select population module we then filtered by properties as follows: nucleus area between 60 µm² and 600 µm², EdU median intensity below 10,000 and DAPI median intensity between 500 and 10,000 with brightfield over 0. This refined output population was named 'Nuclei 2'. We then used the find cytoplasm module on channel CellMask Deepred on 'Nuclei 2', choosing method A with threshold 0.05. We applied select population again to 'Nuclei 2', removing border objects (common filters) on the cell region. The output population was named 'Cell unselected'. We then calculated morphology properties for 'Cell unselected' on the cell region using the standard method and area, roundness and ratio width to length. The output properties were named 'Cell'. We then applied the select population module to 'Cell unselected', filtering by property to keep cells with cell area below 6000 µm².
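The property filters above are applied inside Harmony, but their logic can be sketched outside it for readers porting the pipeline to other software. The sketch below is only an illustration: the dictionary keys are hypothetical names for per-object measurements, the 'between' bounds are assumed to be inclusive, and the removal of border objects is not modelled.

```python
# Thresholds taken from the text; field names below are hypothetical.
NUCLEUS_AREA_UM2 = (60.0, 600.0)   # kept if within this range
EDU_MEDIAN_MAX = 10_000            # kept if below
DAPI_MEDIAN = (500, 10_000)        # kept if within this range
CELL_AREA_MAX_UM2 = 6000.0         # kept if below


def passes_filters(obj):
    """Return True if a segmented object survives all property filters."""
    lo_area, hi_area = NUCLEUS_AREA_UM2
    lo_dapi, hi_dapi = DAPI_MEDIAN
    return (
        lo_area <= obj["nucleus_area_um2"] <= hi_area
        and obj["edu_median"] < EDU_MEDIAN_MAX
        and lo_dapi <= obj["dapi_median"] <= hi_dapi
        and obj["brightfield_median"] > 0
        and obj["cell_area_um2"] < CELL_AREA_MAX_UM2
    )


# Made-up example objects, for illustration only.
objects = [
    {"nucleus_area_um2": 150, "edu_median": 800, "dapi_median": 3000,
     "brightfield_median": 120, "cell_area_um2": 900},    # plausible cell
    {"nucleus_area_um2": 30, "edu_median": 500, "dapi_median": 2500,
     "brightfield_median": 110, "cell_area_um2": 200},    # small debris
    {"nucleus_area_um2": 400, "edu_median": 700, "dapi_median": 2800,
     "brightfield_median": 130, "cell_area_um2": 7500},   # oversized cell
]
cells = [obj for obj in objects if passes_filters(obj)]
print(len(cells), "of", len(objects), "objects kept")
```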
Thus, filters are applied discarding objects of nucleus area below 60 µm² and over 600 µm², EdU median intensity over 10,000 AU, DAPI median intensity under 500 or over 10,000 AU and cell area bigger than 6000 µm² (the latter includes the vast majority of the few feeder cells present in the wells, see Fig. 3B). From the modify population module we used the cluster by distance method on the 'cell' population on the cell region. Distance was set to 0 and area to over 0 px, with no fill holes. The output population was named 'Clumps + Singles'. We then used the calculate properties module on the 'Clumps + Singles' population, choosing the by related population method. The related population was 'cell' and the number of cells was taken as a property. The output properties were defined 'per clump'. We then applied the calculate properties module to the 'cell' population using by related population and the number of cells per clump, leaving the output property blank. In practical terms, a population is here modified to conglomerate 'cells' into a related 'clump', to assess the number of 'cells' in each 'clump' and to tag this value back from the related population onto each 'cell' object; the value equals one for single cells. We adapted this strategy from similar approaches [27]. We then used the define results module, exporting single cell results for the 'cell' population only, with the following parameters selected for all cells. In total, for each 'cell' object, 9 phenotypic features are defined: 6 morphology, 2 intensity and 1 context feature. The morphology features selected are: nucleus area, nucleus roundness, nucleus width to length ratio, cell area, cell roundness and cell width to length ratio. The intensity features are: DAPI median and EdU (488) median. The context feature captured is: Number of cell per clump sum (see Fig. 4).
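The clump-tagging step can be approximated outside Harmony by treating touching cells as connected components. The following is a minimal sketch under stated assumptions: it presumes that a list of pairs of touching cells (distance 0) is already available from the segmentation, and it uses a union-find structure as a stand-in for the cluster by distance method. The function name and input format are hypothetical.

```python
def cells_per_clump(cell_ids, contacts):
    """Tag each cell with the number of cells in its clump.

    cell_ids: iterable of cell identifiers.
    contacts: iterable of (cell_a, cell_b) pairs of touching cells
              (assumed to be derived from the segmentation masks).
    Returns {cell_id: clump_size}; single cells get the value 1.
    """
    cell_ids = list(cell_ids)
    parent = {c: c for c in cell_ids}

    def find(c):
        # Follow parent links to the clump representative (path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b   # merge the two clumps

    sizes = {}
    for c in cell_ids:
        root = find(c)
        sizes[root] = sizes.get(root, 0) + 1
    return {c: sizes[find(c)] for c in cell_ids}


# Cells 1-3 touch in a chain and form one clump; cells 4 and 5 are single.
tags = cells_per_clump([1, 2, 3, 4, 5], [(1, 2), (2, 3)])
print(tags)
```

Storing the clump size on every cell, rather than keeping clumps as separate records, is what lets single cells and cells in clumps be queried from one table.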

Phenotypic features and their aggregation
For structured access, the output of the Harmony image analysis pipeline was stored in a MySQL database along with experimental metadata including fibronectin concentration per well and experiment number. Further processing, analysis and data visualisation were performed within the statistical computing framework R, directly accessing the database. The cell phenotypic features described above were suitably normalised in value (log10 or square transformation) and aggregated across the cells for each well by taking average and standard deviation (Fig. 4A). The cell number was directly acquired from the Harmony data. For EdU, median intensity raw values were grouped and characterised on a well-based measure by the fraction of positive cells. We opted to quantify this as the area under the empirical density not explained by a Gaussian main peak representing EdU negative cells (see Fig. 4B). The tendency of cells to form clumps was described for each well by two summary statistics: the fraction of single cells and the inverse of the mean clump size. From the distribution of clumps over clump sizes (Fig. 4C) we observed exponentially fewer clumps of bigger sizes. The inverse of the mean clump size, as the defining parameter of that geometric distribution, therefore proves valid. Additionally, wells with fewer cells overall tend to have more single cells in proportion, as the chance of cells coming into contact is lower. The percentage of single cells was also used as a measure of the ability of cells to form clumps.
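The two clumping statistics can be derived from the per-cell context feature alone. The sketch below is an illustration in Python rather than the R code used for the analysis, and it assumes that single cells count as clumps of size one when the mean clump size is computed; the function name is hypothetical. Under a geometric distribution on sizes 1, 2, 3, ..., the inverse of the mean size is the usual estimate of its parameter.

```python
from collections import Counter


def well_clump_summary(per_cell_clump_size):
    """Summarise clumping in one well from the per-cell clump-size tags.

    per_cell_clump_size: list with one entry per cell, giving the number
    of cells in the clump that cell belongs to (1 for single cells).
    """
    n_cells = len(per_cell_clump_size)
    if n_cells == 0:
        raise ValueError("well contains no cells")
    counts = Counter(per_cell_clump_size)
    # A clump of size k contributes k cells tagged k, so the number of
    # clumps of size k is (cells tagged k) / k. Singles count as size 1.
    n_clumps = sum(n // size for size, n in counts.items())
    return {
        "fraction_single": counts[1] / n_cells,
        "inverse_mean_clump_size": n_clumps / n_cells,
    }


# One clump of three cells plus two single cells.
summary = well_clump_summary([3, 3, 3, 1, 1])
print(summary)
```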
We used the Number of cell per clump sum as a context feature to maintain all objects in a common database interrogating separately data from single cells versus from cells in clumps (Fig. 5). Our assay demonstrated over 41 replicates a fraction of single cells per well of 37% indicating the effectiveness of the chosen substrate conditions in preserving a population of single cells (Fig. 5A). Exploiting this strategy, we first asked whether cell-cell contact affects phenotypic features. Cell area appeared different depending on the context feature and showed a bimodal curve in single cells (Fig. 5B, C, E) indicating the presence of a population of smaller single cells likely not spreading. We postulated that cells that come in contact with neighbouring cells may show distinct morphology from single cells as they are constrained in shape by the presence of their neighbours and cannot elongate as single cells. In fact, cell roundness (Fig. 5C, D) and cell width to length (Fig. 5E, F) showed a wider range for single cells than for cells in clump. These results indicate that our assay is suited to capture differences in features emerging upon intercellular adhesion. We next sought to observe and quantify the effect on phenotypic features of different fibronectin concentrations.

Diverse fibronectin concentrations trigger specific phenotypic responses
We therefore asked whether changes in the concentration of fibronectin affected the number of cells on the plate. As expected, we observed that the total number of cells was higher in Fn25, intermediate in Fn5 and lower in Fn1 (Fig. 6A). We seed the same number of cells in every condition, and the period of observation is relatively short and consistent with the time required to attach and spread before a cycle (see Section 3.2). Thus, we speculate that this difference in the number of cells retrieved is more likely due to differences in adherence and/or survival and less likely due to proliferation. Accordingly, the fraction of EdU positive cells was comparable (Fig. 6B). Also, we found an increased tendency to form clumps in Fn25 in terms of inverse of mean clump size (Fig. 6C). We cannot rule out that this may be an indirect effect of fibronectin through cell number or cell migration. In agreement with visual inspection, both cell and nuclear morphology varied substantially in these diverse conditions. We found wider ranges in cell area on Fn25 whereas cells appeared more round on Fn1 (Fig. 6C). Similar observations were made for width to length (not shown). Density plots of nuclear area (Fig. 6D), roundness (Fig. 6E) and width to length (Fig. 6E) also presented differences in these features at a population level.
We finally asked whether the developed assay and the cell phenotypic features obtained were sufficient to separate, in a high dimensional feature space, cells in different fibronectin concentrations. We thus performed principal component analysis (PCA, Fig. 7A). The defined features affected the two components variably (Fig. 7B), which together explained approximately two thirds of the variance (Fig. 7C). An elliptic area representing 68% of the sample space showed non-overlapping Fn1 and Fn25 conditions whereas Fn5 appeared intermediate. Altogether these results demonstrate that the high content platform we developed is suited to quantify phenotypic feature changes in human iPSCs that depend on cell-cell contact and biological responses triggered by different substrate conditions.

Potential modifications to the current pipeline and conclusions
We deliberately chose a simple cytochemistry based read-out using only dyes to set up our workflow. Nonetheless, the content of this assay read-out can be easily increased using antibodies and other reporters. We have observed that the binding to fibronectin is partially blocked by disturbance with an anti-β1 integrin antibody, indicating a specific effect of the substrate on the cell surface of iPSCs. Blocking antibodies, inhibitors as well as other different environments could be used in similar approaches to challenge cell responses. The methods were developed on feeder-dependent iPSC lines and also successfully tested on feeder-free cells. One possible confounding factor is the presence of sparse feeder cells among the iPSCs. Reassuringly, and as a negative control, dissociation with dispase and collagenase did not dissociate a cell culture composed of feeder cells only (data not shown). The rare contaminating feeder cells are in the vast majority discarded based on their size. Changes in the pipeline such as the introduction of machine learning classifiers were therefore deemed not necessary for this level of contamination but may be considered in the future for similar studies. It is also possible to exclude clumps over a certain size to contain artefacts from suboptimal seeding, although this was also deemed not necessary with the current experimental conditions. To enrich the panel of morphological features analysed to train classifiers, an extensive array of other morphological features could also be derived in combination with the described morphology features. Specific phenotypic traits can be examined further; for example, improving the adherence would result in larger production of cells in a faster, more efficient and cost-effective manner. Furthermore, controlling the distribution of single cells versus clumps may help the development of methods for homogeneous delivery of factors to the cells and a more stringent control of differentiation protocols.
In conclusion, the characterisation of large panels of iPSCs is an important and challenging task. The method we describe here can be applied to test large panels of iPSCs to benchmark and characterise their phenotype and can readily be tailored to the acquisition of other parameters and the analysis of differentiated cell types.