Quantitative Single-Cell Transcript Assessment of Biomarkers Supports Cellular Heterogeneity in the Bovine IVD

Severe and chronic low back pain is often associated with intervertebral disc (IVD) degeneration. While imposing a considerable socio-economic burden worldwide, IVD degeneration is also severely impacting on the quality of life of affected individuals. Cell-based regenerative medicine approaches have moved into clinical trials, yet IVD cell identities in the mature disc remain to be fully elucidated and tissue heterogeneity exists, requiring a better characterization of IVD cells. The bovine coccygeal IVD is an accepted research model to study IVD mechano-biology and disc homeostasis. Recently, we identified novel IVD biomarkers in the outer annulus fibrosus (AF) and nucleus pulposus (NP) of the mature bovine coccygeal IVD through RNA in situ hybridization (AP-RISH) and z-proportion test. Here we follow up on Lam1, Thy1, Gli1, Gli3, Noto, Ptprc, Scx, Sox2 and Zscan10 with fluorescent RNA in situ hybridization (FL-RISH) and confocal microscopy. This permits sub-cellular transcript localization and the addition of quantitative single-cell derived values of mRNA expression levels to our previous analysis. Lastly, we used a Gaussian mixture modeling approach for the exploratory analysis of IVD cells. This work complements our earlier cell population proportion-based study, confirms the previously proposed biomarkers and indicates even further heterogeneity of cells in the outer AF and NP of a mature IVD.


Introduction
Cell-based tissue regeneration approaches to address degeneration of intervertebral discs (IVD) are a glimpse of hope for a large percentage of the population suffering from severe and chronic lower back pain due to age or injury related degenerative disc disease (DDD). The IVD is a tissue of strength, resilience and capable of reducing the impact of movement on the rigid vertebral bodies that stabilize our spine [1]. In this field of regenerative medicine, strategies are pursued to refurbish ailing IVDs with autologous or allogenic cells, derived either from the disc itself or other sources, with special interest in mesenchymal stem cells (MSC) derived from bone marrow, articular cartilage or adipose tissue. Alternatively, strategies to stimulate intrinsic disc cells through the injection of growth factors or the use of gene therapy to replace degenerating extracellular matrix (ECM) molecules are envisioned as ways to "rejuvenate" IVD function as a shock absorber in the spinal joints [2][3][4][5][6]. On first glance the IVD might appear as an anatomically simple structure, comprised of concentric rings in the annulus fibrosus (AF) and gradually transitioning towards a central nucleus pulposus (NP) [1,7,8]. However, compared to other vertebrate organs, the IVD is unique in many ways: The avascular nature of the IVD is reflected in a lack of coloration (see Figure 1A in [3]) and the few remaining cells in the mature IVD are embedded in a vast amount of ECM, likely experiencing only minor cell-cell contact [3]. These few resident cells within the outer AF and NP of mature IVDs display a heterogeneous profile when assessed on a single-cell level for their transcriptional activities [2,3]. Given several ongoing clinical trials [5,[9][10][11] and an alarming emergence of stem cell tourism worldwide [12] there remains a need to further characterize these cells on a molecular level, should they be the subject or target of regenerative treatments [3,[13][14][15].
Animals, especially Mus musculus, have long been extremely helpful as model organisms to study human development and genetic mutations underlying congenital diseases [16]. While important work has been accomplished in mouse to understand the early notochord to NP transition [17][18][19], the cell composition of the NP of mature caudal IVDs differs significantly between human and rodents or porcine [3,13,20] and points to alternative animal models like Bos Taurus. Respecting the 3R guidelines in research-replacement, reduction, and refinement-bovine tails are an ideal IVD source, as abattoirs often discard them. Bovine coccygeal discs provide a very suitable research model to study cell populations of the mature healthy IVD (Figure 1 in [20]). The coccygeal bovine IVD of a skeletally mature animal is considered similar to a human lumbar disc of a healthy young adult on an anatomical, histological, biochemical and biomechanical level [13,[20][21][22][23][24] and represents an ethically more acceptable tissue source to study healthy cells compared to human IVD tissue.
In need for further characterization of resident cells in the mature IVD, we recently proposed a set of novel IVD biomarkers based on the proportion of cells within the outer AF and NP tissue of bovine coccygeal IVDs being either positive or negative for the proposed biomarker transcript [3]: Laminin1 (Lam1) belongs to a group of glycoproteins of high molecular weight and is present in the ECM of the basal lamina with the ability to bind to collagens, integrins and proteoglycans [25]. Glioma-associated oncogene 1 (Gli1) and 3 (Gli3) belong to a family of transcription factors (TF) known as downstream mediators of hedgehog signaling [26][27][28]. Notochord (Noto) is a homeobox TF involved in early notochord development, acts downstream of brachyury [29] and is conserved during notochord development. Noto cell lineage tracing in mouse indicated that the NP originates from the notochord [30]. Scleraxis (Scx) is a basic helix-loop-helix TF otherwise found in connective tissues including tendons and ligaments and is implicated in skeletogenesis during mouse embryonic development [31,32]. Sex determining region Y-box 2 (Sox2) is essential for pluripotency of stem cells and involved with self-renewal capacity [33,34]. Zscan10 (Zinc finger and SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain containing) is a TF and proposed multipotency marker in mouse [35]. Tyrosine phosphate receptor type C (Ptprc or CD45) and thymocyte differentiation antigen 1 (Thy1 or CD90), are part of a marker panel defining multipotent mesenchymal stromal cells [36,37].
Analyzing these genes with RNA in situ hybridization (RISH), we point to heterogeneity among cells within the outer AF or NP, which is typically not accounted for by methods involving cell pooling for RNA extraction, such as qRT-PCR, microarray expression profiling or non-single-cell RNA sequencing [2,3,38]. Here, we also explore the use of fluorescent (FL) transcript tagging to allow for transcript quantification of proposed biomarkers through both population averaging and single-cell analysis and we propose that this analysis based on FL values enables further evaluation of cellular heterogeneity within the population of cells actively transcribing a biomarker. Lastly, we provide evidence that transcriptional heterogeneity in the mature IVD is not simply attributable to cells undergoing senescence.

Materials and Methods
All procedures were performed according to ethical standards of Clarkson University (NIH Office of Laboratory Animal Welfare PHS Approved Animal Welfare Assurance Clarkson University-Assurance Number D16-00780 (A4536-01). No human material was included in this study.

Tissue Collection and IVD Isolation
Tails of skeletally mature bovine animals were retrieved fresh from local abattoirs, transported on ice and processed within two hours. All procedures were carried out strictly under ribonuclease free conditions [39]. Coccygeal IVDs were isolated and fixed in 4% (w/v) paraformaldehyde (PFA), dehydrated through a gradient of ethanol baths and embedded in paraffin [40]. Sections with a thickness of 7 µm were cut on a rotary microtome and mounted on VistaVision TM Histobond R glass slides (VWR, Radnor, PA, USA) [41].

Fluorescent RNA in situ Hybridization (FL-RISH) and Immunohistochemistry (IHC)
Templates for RNA probes were amplified by polymerase chain reaction (PCR) from bovine genomic DNA as described [3]. The enzymatic activity of T7 RNA polymerase facilitated labeling of RNA antisense probes with a digoxigenin (DIG) epitope (Roche, Basel, Switzerland) essentially as described in [20]. The epitope was detected by an anti-digoxin primary antibody (JIR, West Grove, PA, USA) and visualized with an Alexa-488 secondary antibody (JIR), while nuclei were stained with 4 ,6-diamidino-2-phenylindole (DAPI) (Thermo Fisher, Waltham, MA, USA). Experiments omitting the use of a probe or the primary antibody served as controls for unspecific binding. For a combined FL-RISH/IHC analysis, a polyclonal rabbit anti-Ki67 antibody (Thermo Fisher) was added after the RISH washing steps and visualized with an Alexa-647 secondary antibody (Thermo Fisher). To visualize subcellular locations of mRNA transcripts as well as for Ki67 protein detection, cells were imaged at 63× magnification with zoom factor 3 using a DMi8 confocal microscope (Leica, Wetzlar, Germany).

Selection of Investigated Biomarkers
Previously we had investigated 50 genes based on their occurrence in IVD related literature [17,36,[42][43][44] or based on data from our work in Mus musculus [9,35,[45][46][47][48][49][50]. Through RNA in situ hybridization (AP-RISH) and z-proportion test we had identified a total of three predominantly outer AF and fifteen predominantly NP associated transcripts, confirming several existing biomarkers such as Col1a1 in the outer AF and Acan in the NP. Of those 18 genes, two were novel biomarkers in the outer AF (Lam1 and Thy1) and eight in NP tissue (Gli1, Gli3, Noto, Scx, Ptprc, Sox2, Zscan10 and Klhl14as (LOC101904175)) [3]. Here, using FL-RISH in a modified approach, we extended our analysis of the nine protein-encoding novel biomarkers to provide quantification of mRNA transcript levels per cell.

Data Collection and Analysis
For transcript quantification of each proposed biomarker we chose 20 random fields of view within the outer AF and NP of three IVDs to capture fluorescent signal intensities, similar to the approach taken for Col2a1 and LOC101904175 [3]. Quantitative transcript data was collected under 40× magnification on a DMi8 confocal microscope (Leica) as maximum projection of four z-stack images. Background was accounted for through thresholding in Image J. Total fluorescence for each cell was calculated in Image J as the product of mean fluorescence and area values. For each biomarker and IVD, the mean total fluorescence was established for cells in the outer AF and NP, analyzed for significance using Student's t-test and graphed with GraphPad PRISM 5 (GraphPad, San Diego, CA, USA).
The Kernel density estimates of the data were graphed in R using a log2 transformation of the total fluorescence intensity for each transcriptionally active cell. The Shapiro-Wilk test was employed to assess the validity of a normality assumption for the data collected. When the data was not normally distributed, k-means clustering was performed to see if the data could be fit to a mixture of Gaussian populations using the expectation-maximization (EM) algorithm. Subsequently, if cells were found to be normally distributed within the sub-clusters, no further subdivision was pursued and the resulting Gaussian mixture model (GMM) was kept. Of note, we have excluded the possibility of artificial sub-clustering due to technical errors through natural data clustering. That is, we found those clusters to be uncorrelated with the frame number or individual IVD sections ( Figure S1).

Morphology of Outer AF and NP Cells in their ECM Environment
Cells of the mature AF and NP reside in a large amount of ECM. Fibers of structural molecules in the ECM appear aligned in a linear manner in the outer AF and show a more crisscross pattern in the NP ( Figure 1). NP cells are typically of round appearance and often described as chondrocyte-like. Larger and smaller cell bodies are seen side-by-side indicating morphological heterogeneity. Often long, bulging cell protrusions are visible in vivo and in vitro ( Figure 1 and [2,20]). Cells in the AF are elongated, resembling more the shape of fibroblasts ( Figure 1). This difference in cell morphology was also reflected by the shape of the cell nuclei, which is typically elongated in cells of the outer AF and round in NP cells as seen after DAPI staining ( Figure 2). The Kernel density estimates of the data were graphed in R using a log2 transformation of the total fluorescence intensity for each transcriptionally active cell. The Shapiro-Wilk test was employed to assess the validity of a normality assumption for the data collected. When the data was not normally distributed, k-means clustering was performed to see if the data could be fit to a mixture of Gaussian populations using the expectation-maximization (EM) algorithm. Subsequently, if cells were found to be normally distributed within the sub-clusters, no further subdivision was pursued and the resulting Gaussian mixture model (GMM) was kept. Of note, we have excluded the possibility of artificial sub-clustering due to technical errors through natural data clustering. That is, we found those clusters to be uncorrelated with the frame number or individual IVD sections ( Figure  S1).

Morphology of Outer AF and NP Cells in their ECM Environment
Cells of the mature AF and NP reside in a large amount of ECM. Fibers of structural molecules in the ECM appear aligned in a linear manner in the outer AF and show a more crisscross pattern in the NP ( Figure 1). NP cells are typically of round appearance and often described as chondrocytelike. Larger and smaller cell bodies are seen side-by-side indicating morphological heterogeneity. Often long, bulging cell protrusions are visible in vivo and in vitro ( Figure 1 and [2,20]). Cells in the AF are elongated, resembling more the shape of fibroblasts ( Figure 1). This difference in cell morphology was also reflected by the shape of the cell nuclei, which is typically elongated in cells of the outer AF and round in NP cells as seen after DAPI staining ( Figure 2).

Subcellular Transcript Localization of Proposed Biomarkers in Outer AF and NP Cells
Owing to the high resolution of confocal imaging and the availability of 3D manipulation software tools, the intracellular location of gene transcripts could be visualized after FL-RISH and imaging at 63× magnification ( Figure 2). According to the dogma of biology, primary protein encoding mRNA transcripts are first generated in the nucleus, processed and then exported into the cytoplasm for translation. As shown here, some transcripts were mostly localized in the nucleus (Thy1, Ptprc and Sox2 in cells of the outer AF), some were found predominantly in the cytoplasm (Gli1 in outer AF cells and Zscan10 in both cell types analyzed) and others were typically found in both cellular compartments (Thy 1 in outer AF cells, Gli1, Ptprc and Sox2 in NP cells or Gli3, Noto and Scx in both cell types) ( Figure 2). The intracellular transcript localization of the transcription factors Gli1,

Subcellular Transcript Localization of Proposed Biomarkers in Outer AF and NP Cells
Owing to the high resolution of confocal imaging and the availability of 3D manipulation software tools, the intracellular location of gene transcripts could be visualized after FL-RISH and imaging at 63× magnification ( Figure 2). According to the dogma of biology, primary protein encoding mRNA transcripts are first generated in the nucleus, processed and then exported into the cytoplasm for translation. As shown here, some transcripts were mostly localized in the nucleus (Thy1, Ptprc and Sox2 in cells of the outer AF), some were found predominantly in the cytoplasm (Gli1 in outer AF cells and Zscan10 in both cell types analyzed) and others were typically found in both cellular compartments (Thy 1 in outer AF cells, Gli1, Ptprc and Sox2 in NP cells or Gli3, Noto and Scx in both cell types) (Figure 2). The intracellular transcript localization of the transcription factors Gli1, Gli3, Sox2 and Zscan10 showed distinct bright foci, while transcripts of the other genes analyzed often had a more homogeneous appearance ( Figure 2). Gli3, Sox2 and Zscan10 showed distinct bright foci, while transcripts of the other genes analyzed often had a more homogeneous appearance ( Figure 2).

Confirmation of Biomarkers Using Transcript Quantification Through Population Averaging
We further investigated our recently proposed protein-encoding biomarkers in the outer AF (Lam1 and Thy1) and NP tissue (Gli1, Gli3, Noto, Scx, Ptprc, Sox2 and Zscan10) using fluorescent tagging of the transcripts (FL-RISH) and confocal imaging ( Figure 3 and Table 1 and Table S1). We first used the approach of identifying positive and negative cells and subjected these data to z-proportion analysis (Table 1). Next, using transcript quantification per cell, and comparing the mean of the cell population average for total fluorescence in the outer AF and NP, the proposed outer AF biomarkers Lam1 and Thy1 alongside the proposed NP biomarkers were confirmed as significant using data generated by FL-RISH ( Figure 3, Table 1 and Table S1).

Confirmation of Biomarkers Using Transcript Quantification Through Population Averaging
We further investigated our recently proposed protein-encoding biomarkers in the outer AF (Lam1 and Thy1) and NP tissue (Gli1, Gli3, Noto, Scx, Ptprc, Sox2 and Zscan10) using fluorescent tagging of the transcripts (FL-RISH) and confocal imaging (Figure 3 and Tables 1 and S1). We first used the approach of identifying positive and negative cells and subjected these data to z-proportion analysis (Table 1). Next, using transcript quantification per cell, and comparing the mean of the cell population average for total fluorescence in the outer AF and NP, the proposed outer AF biomarkers Lam1 and Thy1 alongside the proposed NP biomarkers were confirmed as significant using data generated by FL-RISH (Figure 3, Tables 1 and S1). Quantification of nine proposed biomarkers in the outer AF (green) and NP (blue) of a mature bovine IVD using a population averaging approach. Statistical significance was determined using Student's t-test. Cell types are reflected on the x-axis while the average total fluorescence is displayed on the y-axis. * p < 0.05 and ** p < 0.01. Laminin1 (Lam1); Thymocyte differentiation antigen 1 (Thy1); Glioma-associated oncogene 1 (Gli1); Glioma-associated oncogene 3 (Gli3); Notochord (Noto); Tyrosine phosphate receptor type C (Ptprc); Scleraxis (Scx); Sex determining region Y-box 2 (Sox2); Zinc finger and SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain containing transcription factor 10 (Zscan10). Table 1. Comparison of NP/AF ratio data for nine proposed biomarkers in cells of the outer annulus fibrosus (AF) and nucleus pulposus (NP) of a mature bovine IVD generated through z-proportion test and population averaging comparing data from Li et al [3] using RNA in situ hybridization (AP-RISH) (purple) and this work using Fluorescent RNA in situ hybridization (FL-RISH) (green). Laminin1 (Lam1); Thymocyte differentiation antigen 1 (Thy1); Glioma-associated oncogene 1 (Gli1); Glioma-associated oncogene 3 (Gli3); Notochord (Noto); Tyrosine phosphate receptor type C (Ptprc); Figure 3. Quantification of nine proposed biomarkers in the outer AF (green) and NP (blue) of a mature bovine IVD using a population averaging approach. Statistical significance was determined using Student's t-test. Cell types are reflected on the x-axis while the average total fluorescence is displayed on the y-axis. * p < 0.05 and ** p < 0.01. Laminin1 (Lam1); Thymocyte differentiation antigen 1 (Thy1); Glioma-associated oncogene 1 (Gli1); Glioma-associated oncogene 3 (Gli3); Notochord (Noto); Tyrosine phosphate receptor type C (Ptprc); Scleraxis (Scx); Sex determining region Y-box 2 (Sox2); Zinc finger and SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain containing transcription factor 10 (Zscan10).

Transcriptional Heterogeneity in IVD Cells is Not a Sole Consequence of Cell Cycle Arrest
Heterogeneity was confirmed with the FL-RISH approach as shown for Zscan10 expression in the AF and Ptprc expression in the NP (Figure 2) and by z-proportion analysis (Table 1). We also performed a combined FL-RISH and FL-IHC analysis using Ki67 as an indicator for cells in the active phase of the cell cycle ( Figure 4). As shown here for Gli1, Gli3, Sox2 and Zscan10, we found Ki67 positive NP-cells in both the presence and absence of these transcripts. Table 1. Comparison of NP/AF ratio data for nine proposed biomarkers in cells of the outer annulus fibrosus (AF) and nucleus pulposus (NP) of a mature bovine IVD generated through z-proportion test and population averaging comparing data from Li et al [3] using RNA in situ hybridization (AP-RISH) (purple) and this work using Fluorescent RNA in situ hybridization (FL-RISH) (green). Laminin1 (Lam1); Thymocyte differentiation antigen 1 (Thy1); Glioma-associated oncogene 1 (Gli1); Glioma-associated oncogene 3 (Gli3); Notochord (Noto); Tyrosine phosphate receptor type C (Ptprc); Scleraxis (Scx); Sex determining region Y-box 2 (Sox2); Zinc finger and SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain containing transcription factor 10 (Zscan10).

Single Cell Transcript Analysis Indicates Further Heterogeneity in Transcriptionally Active Cells.
To further investigate the pool of transcriptionally active cells and to provide assessment of cell heterogeneity with single-cell resolution, the total fluorescence intensity of each positive cell was graphed using a Kernel density distribution plot. Some of our biomarkers showed a non-normal distribution in transcriptionally active outer AF cells: Gli3, Noto, and Zscan10 ( Figure 5A) or NP cells: Thy1, Gli1, Gli3, Scx, Sox2 and Zscan10 ( Figure 5B) as determined through the use of the Shapiro-Wilk test; k-means clustering was performed when data was not normally distributed to fit data to a Gaussian mixture model (GMM) (Tables S2 and S3). Using this approach and assessing one gene at a time we identified distinct sub-populations with at least a two-fold difference of transcript levels as represented by the total fluorescence intensity of Gli3, Noto and Zscan10 expressing cells in the outer AF cell population ( Figure 6A) and for Thy1, Gli1, Gli3, Scx, Sox2 and Zscan10 in the NP cell population ( Figure 6B).

Discussion
Discogenic, severe and chronic lower back pain due to disc trauma and degeneration is a debilitating condition that patients, once affected, often have to endure for a significant part of their life. So far, partial alleviation of symptoms but no complete cure or full restoration of athletic abilities or lifestyle choices is available [51,52]. Cell therapy-based regenerative medicine holds promises to delay or revert the degeneration of aging or injured body parts and animal studies as well as clinical trials for IVD regeneration are ongoing (www.clinicaltrials.gov) [9][10][11]53]. At the same time a more profound understanding of the pathophysiology of DDD and the cell types involved is required [37]. DDD is accompanied by a decrease in proteoglycan and collagen II synthesis alongside the degeneration of existing collagen II and an increase of collagen I in the NP [15,[54][55][56][57][58]. In the context of pathophysiology, a recent study suggested a link between iron deficiency and DDD [59], while on the contrary, an iron-overload and related free radical generation was considered a factor leading to DDD in patients with beta thalassemia major [60]. With regards to IVD development, progress has come from transgenic mouse models pointing to the contribution of notochord cells to the NP; however, in the mature human or bovine NP those cells are not retained to the same extent as they are in mouse or other rodents [3,[17][18][19][20]61,62]. The degeneration of ECM macromolecules like glycosaminoglycans (GAG) and the resulting loss of hydration could be a consequence thereof [7,8,11,55,63,64]. Nowadays human NP and AF cells are commercially available (www.sciencellonline.com), however their original fate and gene expression patterns might change in vitro without appropriate ECM and 3D-culture systems in place, as seen for primary bovine cells derived from IVD tissue and maintained in monolayer culture [20]. Furthermore, ethical concerns preclude transgenic lineage tracing of human IVD cells or higher primates in vivo, and not all animal models studied provide accurate cell composition, disc size or loadbearing conditions in the adult [11,52]. The bovine coccygeal IVD is an accepted research model to study cells in the environment of a healthy IVD, resembling closely the histological characteristics of a human lumbar IVD from a healthy young adult [20][21][22][23].
Aside from their exposure to mechanical, physical and chemical constraints [55,[65][66][67][68] AF and NP cells exist like eremites in unique microenvironments. Cells in the mature IVD are sparse, either isolated or present as small cell clusters embedded in a vast amount of ECM ( Figure 1) and [3,58,69], with their survival depending on the diffusion of nutrients or signaling cues through the extensive meshwork of ECM molecules [13,70,71]. Morphological heterogeneity has been described: NP cells are typically of round appearance with larger and vacuoled or smaller cell bodies, notochord cell-derived or chondrocyte-like, while cells in the AF are elongated, resembling more the shape of fibroblasts ( Figure 1) and [3,6,61,62,69,[72][73][74]. An ongoing debate concerns the fate of notochord cells during maturation of the human or bovine NP [13,62,75]. Are those cells replaced over time by migrants from adjacent AF or endplate tissue, as suggested in a mouse injury model [76], and do they subsequently undergo a change in morphology in response to differences in ECM stiffness and fiber alignment between the AF and NP, or do notochord cells eventually differentiate into small chondrocyte-like cells as indicated by qRT-PCR on two morphologically different cell populations which maintain expression of genes like KRT19 and T (brachyury) as a population [13,37]. We have previously identified positive and negative cells for T expression in cells of the AF and NP, however, with no significant difference in cell population proportions, indicating that assessment of heterogeneity within a cell population remains important to address the question of NP cell origin and heterogeneity [3], whether within their native environment or in culture. We have frequently isolated and propagated bovine cells from the outer AF and the NP through non-enzymatic explant outgrowth culture [20,77]. These cells exhibit some criteria established for mesenchymal stem cells, but continue to display heterogeneity when phenotyped by a selection of transcripts, by morphology or by cell movement [2,20,[77][78][79]. In this context others and we find morphological, molecular and "behavioral" differences in vivo and in vitro pointing to at least two cell populations within the outer AF and NP [2,3,61,68,73]. Since a more complex heterogeneity is apparent on the transcriptome level, the molecular identities and developmental origins of cells remaining in the healthy mature human or bovine IVD still need to be addressed further. Hence, the definition of biomarkers to identify AF and NP cells is of strong interest to the field [3,20,44,61,80,81]. In an attempt to contribute to the molecular characterization of IVD cells, we recently proposed a set of novel bovine IVD biomarkers out of 50 genes investigated by chromogenic AP-RISH using a cell population proportion based approach, where transcriptional heterogeneity was indicated through cells expressing (denoted positive) or not expressing (denoted negative) a gene transcript within the same tissue type [3]. These genes were expressed in a significantly larger proportion of cells in the outer AF (Lam1, Thy1) or NP (Gli1, Gli3, Noto, Ptprc, Scx, Sox2 and Zscan10). Other biomarkers previously proposed by others in the AF, like Col1a1 or in the NP, like Acan, Col1a2, Col2a1, Krt18, Krt19, Shh and Ca12 were also confirmed with our approach, while some did not show any significant difference in the proportion of positive cells between the outer AF and NP, for example Foxf1, Hif1a, Krt8, Snap25, Sox5, Sox6, Sox9, Pax1, T and Tmnd. Since our approach to analyze cell population proportions had differed from the common practice of cell pooling and expression profiling of cell population averages which uses methods like microarray, RNAseq or qRT-PCR [42][43][44],we developed here a modification of AP-RISH by using fluorescently tagged RNA probes in combination with statistical data analysis that facilitated both, analysis by cell averaging to be comparable to existing data and single-cell assessment. For comparison of our AP-and FL-RISH data, we initially also identified positive and negative cells and subjected the data to z-proportion analysis and observed a similar trend as described before (Table 1 and [3]). The FL-RISH based method can be applied in vitro and to cells in their natural surroundings in vivo, as demonstrated here. Both RISH methods confirmed Lam1 and Thy1 as potential AF and Gli1, Gli3, Noto, Scx, Ptprc, Sox2 and Zscan10 as potential NP biomarkers. To demonstrate that the observed transcriptional heterogeneity cannot simply be attributed to cells leaving the active phase of the cell cycle, we used the widely accepted cell proliferation marker Ki67 [82,83] in combination with FL-RISH analysis of our proposed biomarkers. We have demonstrated that cells without mRNA expression of the analyzed biomarker are not necessarily in G0. Instead, this transcriptional heterogeneity supports sub-populations within outer AF and NP cells. Additionally, by using a combination of FL-RISH and a GMM approach to further analyze the transcriptionally active cells in their native environment, we noticed for some transcripts in cells of the NP (Thy1, Gli1, Gli3, Scx, Sox2 and Zscan10) or the outer AF (Gli3, Noto and Zscan10), additional heterogeneity based on fluorescent intensities and hence transcript levels. While neither Gli3, Noto, nor Zscan10 are considered AF-specific, neither is Thy1 considered a NP biomarker, the indication of further heterogeneity could point to mixed cell sub-populations of shared developmental origin.
Despite strong evidence pointing to heterogeneity of outer AF and NP cells in the mature IVD, currently most expression profiling data available for the IVD is based on cell pooling and population averaging technologies [14,[42][43][44]81,84]. Here, by using FL-RISH with single cell resolution, we focused on the further evaluation and quantification of our recently proposed novel biomarkers identified through chromogenic AP-RISH and z-proportion analysis [3]. Previously, we provided a detailed comparison of our data on biomarkers to work by Minogue et al. [42,43] and van den Akker et al. [44], as seen in Table 2 of Li et al. [3]. We hypothesized that RNA extraction from pooled cells as source for microarray or qRT-PCR transcriptome profiling could explain some of the noted differences especially for biomarkers identified through microarray technology [85]. An observation from an earlier study suggested Krt8, Krt18 and Krt19 as uniquely expressed in embryonic and fetal notochord cells in the human spine [86]. We identified Krt8, Krt18 and Krt19 positive cells in both outer AF and NP tissue of mature bovine IVDs by AP-RISH, yet only Krt18 and Krt19 were expressed by a significantly larger proportion of cells in the NP [3]. Other markers like OVOS2, GDF10, GPC3, HBB or NCAM1, which were identified as human NP-specific over articular cartilage (AC) were not part of our initial panel of 50 genes and not investigated in our studies [3,13,15,42,43,61,62,87,88]. Classically known markers of the chondrogenic lineage like Col2a1, Acan and Sox9 have often been used to identify mature NP cells [13,89], yet taking cellular heterogeneity into account, Sox9 was not detected in a significantly higher proportion of cells in the NP over the AF in our previous study [3]. Recently, a different set of biomarkers was proposed based on a meta-analysis of the membranome, comparing transcripts of retrovirally immortalized, clonal cell lines derived through enzymatic tissue digestion with datasets of primary human AF and NP cells acquired in an earlier study [81]. The study proposed CLDN11, TMEFF2, CA12, ANAX2, CD44, EFNA1, NETO2 and SLC2A1 as NP-specific cell surface markers to assess cellular heterogeneity of NP cells in vivo and suggested CHIC1, COLEC12, LPAR1, LRP4, LRP5 and FZD2 as markers to define cellular heterogeneity in AF cells. The immortalized NP cell lines were previously employed by this group to identify human NP subpopulations through qRT-PCR based expression profiling, pointing to FOXF1 and CA12 as useful markers with significant difference between clonal subtypes [90]. The approach to focus on the membranome is helpful as cell sorting based on cell surface proteins is often used to isolate different cell types [91], however, effects on gene expression through random retroviral insertion as well as comparing expression profiles obtained through different array platforms can be challenging [85]. A different microarray-based transcriptome analysis of human embryonic and fetal notochord cells recently aimed to identify regulators of IVD development [84]. The identified notochord specific markers CD24, STMN2, RTN1, PRPH, CXC12, IGF1, ISL1, CLDN1, THBS2 and MAP1B were mostly down regulated in adult NP cells, with the exception of the latter. Using cell pooling for RNA extraction and genome wide microarray analysis another recent study also focused on the molecular discrimination between human AF and NP tissue suggesting six novel NP markers: CDKN2B, ARAP2, ERFE, DSC3, DEFB1, SPTLC3 and five novel AF markers: OLFML2A, ANKRD29, EMCN, ADGRL4 and LDB2 with a significant difference in expression levels between the two tissue types [14]. While we confirmed several existing biomarkers established by others through methods employing cell pooling, the minimal overlap with biomarkers of recent studies is not surprising, as the methods, tissue sources and objectives varied between study designs and none of these more recently proposed biomarkers other than Ca12 overlapped with our initial panel of 50 investigated genes [3], which led to the further investigation of the nine biomarkers Lam1, Thy1, Gli1, Gli3, Noto, Ptprc, Scx, Sox2 and Zscan10 presented here. While we cannot draw any further conclusions at this point, it would be interesting to investigate the novel markers proposed in the studies above in the context of cellular heterogeneity going forward.
Lastly, the combination of FL-RISH and the high resolution of confocal imaging allows for a glimpse at sub-cellular transcript location. It is known for some transcripts, like non-coding RNAs, that transcript location could point to regulatory functions [92]. In case of protein-coding mRNAs, an association between cytoplasmic location and function is also considered and found conserved between bacteria, yeast and higher eukaryotic cells [93][94][95]. Less information is available regarding a role of nuclear locations of primary RNA transcripts during processing prior to the export into the cytoplasm for translation. However sub-nuclear hotspots like the nucleoi for ribosomal RNA synthesis and assembly [96] or nuclear speckles, affiliated with the pre-mRNA splicing machinery [97] have been described. Interesting transcript accumulations as seen for the transcription factors Sox2 (NP), Zscan10 (AF, NP) or Gli3 (AF, NP) could point to the proximity of such functional nuclear domains and possibly indicate active transcription and processing of transcripts. The novel markers Lam1, Thy1, Gli1, Gli3, Noto, Ptprc, Scx, Sox2 and Zscan10 suggested here in our study, should be considered as addition to a growing repertoire of biomarker panels for future characterization of outer AF and NP cells along with morphological and cell behavior data, like cell movement, velocity and clustering ability [2].

Conclusions
Future regenerative cell-based therapies are of great potential if a safe outcome is predictable. Whole genome transcriptome analysis is a powerful screening tool to identify novel biomarkers, however phenotyping and genotyping of heterogeneous cell populations with single-cell assessment is an important quality control measure in achieving this predictability. Animal research models like the bovine IVD are invaluable tools to better characterize resident cells within a healthy IVD environment. FL-RISH in combination with confocal microscopy and statistical data analysis can provide both, transcript analysis through population averaging and on a single-cell level. Both were used here to confirm our previously proposed biomarkers Lam1, Thy1, Gli1, Gli3, Noto, Ptprc, Scx, Sox2 and Zscan10.
We believe that a combination of the many biomarkers that have been established so far and those that will be determined in the future will be helpful to clarify the molecular identity of cells in the mature IVD, however, data re-assessment from a cell-by-cell perspective taking heterogeneity into account might become necessary.
Funding: This work was supported by the Bayard and Virginia Clarkson Endowment Fund granted to Thomas Lufkin.