Induced in vivo transdifferentiation of vertebrate muscle into early endoderm-like cells

The extent to which differentiated cells, while remaining in their native microenvironment, can be reprogrammed to assume a different identity will reveal fundamental insight into cellular plasticity and impact regenerative medicine. To investigate in vivo cell lineage potential, we leveraged the zebrafish as a practical vertebrate platform to determine factors and mechanisms necessary to induce differentiated cells of one germ layer to adopt the lineage of another. We discovered that ectopic co-expression of Sox32 and Oct4 in several non-endoderm lineages, including skeletal muscle, can specifically trigger an early endoderm genetic program in a cell-autonomous manner. Gene expression, live imaging, and functional studies reveal that the endoderm-induced muscle cells lose muscle gene expression and morphology, while specifically gaining endoderm organogenesis lineage markers, such as the pancreatic specification genes hhex and ptf1a, via a mechanism resembling normal development. Endoderm induction by a pluripotency-defective form of Oct4, endoderm markers appearing prior to loss of muscle cell morphology, and a lack of mesoderm, ectoderm, dedifferentiation, and pluripotency gene activation together suggest that reprogramming is endoderm specific and occurs via direct transdifferentiation. Our work demonstrates that within a living vertebrate animal, differentiated cells can be induced to directly adopt the identity of a completely unrelated cell lineage, while remaining in a distinct microenvironment, suggesting that differentiated cells in vivo may be more amenable to lineage conversion than previously appreciated. This discovery of extensive lineage potential of differentiated cells, in vivo, challenges our understanding of cell lineage restriction and may pave the way towards new in vivo sources of replacement cells for degenerative diseases such as diabetes.


Introduction
In animals, from flatworms to humans, nearly all cell lineages develop from one of three germ layers, established during the earliest stages of development. Despite several hundred million years of evolution, the types of specialized cells (e.g., neuronal, muscle, or pancreatic cells) that can develop from each of the distinct germ layers (ectoderm, mesoderm, or endoderm, respectively) remain well conserved among highly divergent animals, suggesting that lineage identities are restricted to specific germ layers.
This paradigm is consistent with the prevailing interpretation of Waddington's epigenetic landscape model 1 , which suggests that during animal development, a cell's lineage potential becomes increasingly restricted as it differentiates. Furthermore, lineage restriction within germ layers appears to also apply to induced in vivo cell lineage reprogramming of vertebrate cells, as directed lineage transdifferentiation has only been shown among cell types from the same germ layer 2 . However, in the invertebrate roundworm Caenorhabditis elegans, it was demonstrated that two specific non-endoderm cell lineages can be artificially converted into intestinal cells, which are normally of endoderm origin 3,4 . It remains unclear whether in vivo inter-germ layer lineage conversion is unique to these cell types or to this and other invertebrates 5 . The ability to directly reprogram a differentiated vertebrate cell identity in vivo, without spatial or lineage limitations, would indicate that the developmental origin of a cell and its natural microenvironment do not absolutely limit its potential identity, challenging the dogma of in vivo developmental lineage restriction and opening potential new avenues for regenerative medicine.

In vivo platform to identify endoderm inducing factors
To investigate whether differentiated vertebrate cells can be lineage converted in vivo into unrelated cell types (across germ layers and in distinct microenvironments), we leveraged the zebrafish embryo. This animal model is highly amenable to transgenic modification, and due to its rapid embryonic development, functionally differentiated muscle cells are already present at 24 hpf (hours post fertilization), as indicated by a beating heart and body movements. We aimed to induce endoderm lineages from differentiated, non-endoderm-derived cells and in tissues not closely localized to gut endoderm-derived tissues. Unlike differentiated cells originating from the mesoderm and ectoderm, which are often intermingled, differentiated cells of the endoderm lineage remain largely contiguous with the gut tube, facilitating the identification of ectopically induced endoderm cells in non-endoderm-derived tissues.
In contrast to previous in vivo reprogramming strategies to induce a specific mature subtype of endoderm cell, such as a beta cell 6 , or a pluripotent intermediate, our novel strategy is to directly induce early endoderm-like cells, which are defined by their ability to give rise to a variety of endoderm lineages. With this approach, we predict that the induced early-endoderm cells will have the functional potential to progress developmentally towards various endoderm lineages and ultimately adopt distinct endoderm cell fates, such as pancreatic progenitors. Based on our understanding of normal endoderm development, zygotes were injected with DNA expression constructs containing candidate genes implicated in endoderm specification, including Foxa2, Foxa3, Gata5, Oct4, Sox17, and Sox32 7 . These transgenes are driven by the heat shock inducible hsp70l promoter 8 , allowing for temporally controlled, transient, and spatially mosaic expression in differentiated cells throughout the entire body of the fish (Fig. 1A-A"). With this approach, transgene expression levels will vary extensively between cells due to variable transgene dose. Therefore, this in vivo platform allows for highly diverse conditions of transgene expression, in a variety of different cell types, and at different states of differentiation, thereby enhancing the probability of discovering factors with lineage conversion potential.
Upon closer examination of sox17:GFP-positive cells within the myotomes, we found that most have multiple nuclei and an elongated and striated cell morphology (see Fig. 1D'). These cells also coexpressed the transgene reporters for both sox32 (nuclear mCherry) and Oct4 (membrane mCherry), indicating that differentiated skeletal muscle cells can be cell-autonomously induced by Sox32 and Oct4 to express sox17. Because sox17 is also normally expressed in a sub-population of endothelial cells during development, we examined endothelial markers in muscle cells of the myotome co-expressing Oct4 and sox32 and found no ectopic induction of fli1a:GFP or kdrl:GFP (Supplement Fig. 1A-D'), suggesting that the ectopically induced sox17 expression does not indicate endothelial identity, but more likely endoderm. Further, forced coexpression of both sox32 and Oct4 can also cell-autonomously induce ectopic foxa3 (foxa3:GFP), a gene normally expressed in a broad subset of the endoderm (Fig. 1E-E'). Consistent with normal endoderm development, we find that ectopic foxa3:GFP expression appeared later than sox17. These results indicate that Sox32 and Oct4 can induce the earliest genetic markers of the endoderm lineage.
To increase the number of cells expressing both Oct4 and Sox32, the coding sequence for each factor, in addition to mCherry (or H2B::mCherry), was placed within the same polycistronic construct, downstream of the hsp70l promoter (hsp70:Oct4-P2A-mCherry-P2A-sox32). Using this combined expression construct, we find that approximately 70% of animals display ectopic sox17:GFP expression, with 30% of these animals exhibiting 1-3 sox17:GFP-expressing muscle cells and 40% showing between 3 and 20 cells (Supplement Fig. 2A-A"). Moreover, we consistently found ectopic sox17:GFP in p63-expressing cells (epidermal) and Elavl3/4-expressing cells (neuronal), suggesting that cells originating from the surface ectoderm and neural ectoderm, respectively, can also be induced by Sox32 and Oct4 to express sox17 (Supplement Fig. 2B-C"). These findings demonstrate that sox17:GFP can be induced in cells derived from mesoderm, surface ectoderm, and neural ectoderm.
Next, we sought to broadly investigate the cell types that could be induced to up-regulate endoderm-associated markers. We performed single cell RNA-seq on FACS-sorted H2B::mCherry + cells isolated from injected 48-50 hpf zebrafish larvae (Supplement Fig. 3A). The Seurat (v. 3.1) package for R was used for sequence data analysis, with standard quality control (QC) parameters applied to exclude low quality cells and doublets 13 . Cells passing QC were then clustered and annotated by label transfer using the zebrafish single cell atlas to assign cell type and germ layer identity 14 (Supplement Fig. 3B, D). 15,344 cells from animals expressing Oct4, H2B::mCherry, and sox32 and 8,592 cells from control animals expressing only H2B::mCherry passed QC and were used for downstream analysis. Despite a low representation of the endoderm population compared to those of ectoderm and mesoderm, we found a significant increase in the number of cells annotated as the endoderm lineage (Supplement Fig. 3C; Fisher's exact test: p < 3.34e-6). This finding is consistent with our whole mount expression studies showing the induction of endoderm identity markers. By comparing the number of experimental and control cells expressing any of the endoderm markers established by endoderm developmental genetics studies, and particularly from recently reported mouse endoderm development scRNA-seq data 15 , we were able to address the issue of drop-outs, which is especially problematic for lower expressing endoderm genes. We examined endoderm gene expression in groups of cells clustered into 9 general cell types and found that a greater number of cells expressed endoderm marker genes in clusters labelled as muscle cell/muscle cell progenitors, macrophages, and neural crest, as well as those classified as endoderm (Supplement Fig. 3D-F; Fisher's exact test: p < 5.45e-05).
For example, experimental cells clustered as muscle/muscle progenitor have significantly more cells expressing endoderm-related transcripts compared to control (foxi1, BH-adjusted Fisher's exact p < 0.01; fabp1b.1, BH-adjusted Fisher's exact p < 0.002; Supplement Fig. 3F). Consistent with our observations shown above, these findings suggest that endoderm factors can be induced in cells originating from non-endoderm populations. Importantly, the cells expressing endoderm genes that were identified in non-endoderm clusters were not annotated as endoderm, likely because the reprogramming was incomplete at the time of analysis. Conversely, cells whose identity was more completely reprogrammed would not cluster with their original cell types, but with the endoderm cluster instead, consistent with the increased proportion of the endoderm cluster. However, drop-outs, as well as low level and transient gene expression, limit the ability to track their transdifferentiation progress using pseudotime analysis.
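The count comparisons in this section rest on Fisher's exact test followed by Benjamini-Hochberg (BH) adjustment. The following is a minimal sketch of both calculations using only the Python standard library; it is not the analysis code used in this study. The two-sided formulation, the function names, and the endoderm-annotated counts in the usage example are assumptions made for illustration, and only the two QC-passing cell totals come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Cell totals are from the text; the endoderm counts are HYPOTHETICAL.
    experimental_total, control_total = 15344, 8592
    experimental_endoderm, control_endoderm = 120, 30  # hypothetical counts

    p = fisher_exact_two_sided(
        experimental_endoderm, experimental_total - experimental_endoderm,
        control_endoderm, control_total - control_endoderm,
    )
    # The other two p-values are placeholders standing in for other genes.
    print(p, benjamini_hochberg([p, 0.01, 0.2]))
```

In practice the per-gene tests would be run once per marker and cluster, and the full list of resulting p-values passed to the adjustment step together, since the BH correction depends on how many tests were performed.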

Differentiated skeletal muscle cells rapidly converted into early endoderm-like cells
Given that genetic and cellular changes induced by ectopic expression of Oct4 and sox32 were consistently observed by both whole mount and scRNA-seq expression data, we focused our analysis on skeletal muscle cells in the trunk of the animals, allowing us to genetically and visually track a specific differentiated cell type during reprogramming. Differentiated skeletal muscle cells can be identified based on their elongated shape, striations, and multiple, evenly spaced nuclei, and are found in a consistent, parallel pattern spanning individual myotomes. The appearance of ectopic sox17:GFP and foxa3:GFP expression in muscle cells with these characteristics suggests that differentiated cells are indeed being induced to activate expression of these early endoderm genes. Importantly, it also suggests that a complete loss of muscle morphology and cell division is not a prerequisite for induction of endoderm genes. However, a subset of skeletal muscle cells expressing Oct4 and sox32 transgenes do exhibit a variable loss of muscle morphology, including partial or complete loss of striation and elongated rectangular cell shape, and adoption of a more stellate shape (Fig. 1E and J). Interestingly, the nuclei in reprogrammed muscle cells appear to aggregate together towards the center of the cell (Supplement Fig. 4), reminiscent of striated muscle regeneration 16 . Live imaging from 48 to 72 hpf, using light sheet microscopy, shows highly dynamic changes in cell morphology (Supplement Movie 1). In particular, some muscle cells appear to be separating into two individual cells, potentially becoming mononucleated (Fig. 1I-I"; white and yellow arrows, Supplement Movie 2). Further, we observed that cells with induced sox17:GFP form highly dynamic cellular extensions protruding in multiple directions, similar to the behavior of normal sox17-expressing endoderm cells during gastrulation (Supplement Fig. 5) 17 .
Intriguingly, live imaging revealed that transgenic mCherry expression can be rapidly cleared, suggesting an active process for protein turnover in cells that are being reprogrammed (Fig. 1J-J"; double arrows, Supplement Movie 3). Consistent with the rapid clearance of transgenic mCherry protein, endogenous Myosin proteins (and transcripts), which are normally abundantly expressed, can become undetectable in most muscle cells with induced sox17:GFP expression (Supplement Fig. 6A-B"). Rapid turnover of structural proteins such as Myosin would explain the rapid loss of the skeletal muscle morphology, as these cells are dense with structural proteins necessary for their shape and striations.

Loss of muscle identity and specific induction of the early endoderm genetic program
To investigate whether induced muscle cells with sox17:GFP expression can function like early endoderm cells and progress in a development-like manner, we examined later markers of endoderm regionalization and organogenesis. Normal, endogenous endoderm sox17 mRNA expression is temporally restricted to a short 3-hour time window before the end of gastrulation (~7-10 hpf) and is therefore not coexpressed with later endoderm genes 18 such as foxa2, hhex, and ptf1a. Consistent with the early and brief duration of sox17 transcript expression observed during normal endoderm development, the ectopic sox17 mRNA expression in the myotomes was limited to within the initial several hours following heat shock induction of the Oct4 and sox32 transgenes (Supplement Fig. 7). However, perdurance of GFP allows for tracking of sox17:GFP-positive cells beyond the loss of ectopic sox17 transcript expression. By 48 hpf, a subset of the induced sox17:GFP-positive cells in the myotome can be found to coexpress transcripts of the foregut endoderm gene foxa2 and the pancreas specification genes hhex and ptf1a (Fig. 1F-H), demonstrating that these induced cells can function like early endoderm cells by differentiating into later, more defined endoderm lineages. In contrast, transcripts specific to muscle, such as myh7 (myosin heavy chain 7) mRNA, are lost from most sox17:GFP-positive muscle cells (Supplement Fig. 6A-B"). These findings suggest that ectopic Oct4 and sox32 expression can inhibit muscle-specific transcripts while triggering the endoderm developmental genetic program.
To express transgenes in a specific type of differentiating skeletal muscle cell, the mylpfa (myosin light chain, phosphorylatable, fast skeletal muscle a) promoter 19 was used to drive Oct4 and Sox32 expression. With a construct containing the mylpfa promoter driving mCherry alone, we confirmed that the activity of this promoter is almost exclusively restricted to fast muscle cells in injected embryos (Fig. 2A) and does not induce expression of endoderm genes. With Oct4, sox32, and mCherry all driven by this lineage-specific promoter (mylpfa:Oct4-P2A-mCherry-P2A-sox32), we found ectopic sox17:GFP and foxa3:GFP induced only in the fast skeletal muscle layer of the myotomes (Fig. 2B, C), but not in other tissues. Flow cytometry analysis of mCherry-positive cells from these injected embryos at 48 hpf reveals that approximately 30% of muscle cells expressing sox32/Oct4 also ectopically express sox17:GFP (Fig. 2D). Given that sox17:GFP-positive cells continue to arise after 48 hpf and that mCherry expression can also be rapidly lost (see Fig. 1J-J", Supplement Movie 1), the actual level of efficiency is likely higher, although efficiency does vary among individual embryos (see Supplement Fig. 1). In contrast, under the same experimental conditions, no significant induction of the endothelial marker fli1a:GFP or the neural marker elavl3:GFP was detected (Fig. 2D), suggesting that early endoderm transcripts are specifically induced.
To more broadly assess transcriptional changes induced in reprogrammed muscle cells, we pooled sorted mCherry-positive cells from fish injected with either the mylpfa promoter driving mCherry alone (control) or together with Oct4 and sox32, and carried out qPCR analysis. Consistent with our whole mount histological studies using fluorescent reporter lines and in situ hybridization, mRNA expression of the early and late endoderm genes foxa3, hnf1ba, hnf4a, and ptf1a is significantly upregulated in sorted muscle cells with sox32 and Oct4 transgene expression (Fig. 2E, Supplement Fig. 7). These results suggest that in fast muscle cells, the endoderm specification genes sox32 and Oct4 can induce early endoderm-like cells that are functional in their ability to proceed developmentally down multiple genetic programs of various endoderm lineages. We also examined markers of other germ layers to determine the specificity of the endoderm induction. Mesoderm genes, tbxta (T; brachyury) and meox1, the surface ectoderm gene p63, and neural ectoderm genes, sox1a, zic2.2, and others, are not upregulated, suggesting that neither mesoderm nor ectoderm lineages are induced (Fig. 2E). However, we found that the muscle genes myod, myhz2, tnnt3a, and mylpfa are downregulated, consistent with the loss of muscle cell morphology and myosin protein/mRNA expression described above. Together, these gene expression studies indicate that forced expression of sox32 and Oct4 in fast skeletal muscle cells leads to a loss of the muscle genetic program and to a specific induction of early endoderm-like cells.

Induced endoderm cells proceed through a developmental mechanism for pancreatic lineage commitment
To explore the mechanism by which the induced endoderm (iEndo) cells proceed to commit to a specific endoderm organogenesis lineage, we focused on reprogrammed muscle cells with ectopic expression of the pancreas specification gene ptf1a. Misexpression of sox32 and Oct4 can lead to muscle cells with ptf1a:GFP by 48-72 hpf in about one in five injected embryos. These ptf1a:GFP + cells appear to have only one nucleus and continue to express GFP up to at least 10 dpf, suggesting a persistent lineage conversion. iEndo muscle cells with ectopic ptf1a:GFP are most often found in the posterior third of the fish. The low frequency and spatial propensity of induced ptf1a:GFP expression led us to posit that only certain iEndo cells with optimal intrinsic (intracellular) and extrinsic (extracellular/microenvironment) conditions will proceed toward a specific endoderm lineage pathway such as liver, intestine, or pancreas. This influence by the microenvironment on pancreas lineage commitment is analogous to the regional specificity observed for induced in vivo transdifferentiation of astrocytes into dopaminergic neurons 20 . The skeletal muscle microenvironment has previously been demonstrated to be permissive for human pancreas cell differentiation and function 21-24 . Because we already observed that neuronal genes are not induced by Oct4 and Sox32 (shown above), the ectopic ptf1a expression observed is unlikely to represent the normal cerebellar or retinal domains of ptf1a expression. Furthermore, because iEndo muscle cells with ectopic ptf1a expression can be found to coexpress sox17:GFP (see Fig. 1H), they are more likely to be pancreatic endoderm.
The temporal sequence of endoderm genes expressed in iEndo cells (sox17 initially, followed by foxa2 and hhex; Fig. 1F-H) appears to recapitulate that of endogenous pancreatic endoderm lineage commitment. This observation led us to functionally assess whether induction of ectopic ptf1a expression occurred via a genetic mechanism also required for endogenous pancreas development. FGF signaling was shown to be necessary for foregut endoderm cells to adopt the ventral pancreatic endoderm lineage in normal zebrafish development and in human stem cell differentiation models 25,26 . Further, FGF signaling is highest posteriorly, where ectopic ptf1a:GFP-positive muscle cells are most often observed 27 . Compared to DMSO-treated controls (Fig. 3C-C"), blocking FGF receptor tyrosine kinase signaling with the inhibitor SU5402 prevents endogenous foregut endoderm (compare Fig. 3B", D", inset), as well as iEndo muscle cells (Fig. 3D-D"), from expressing ptf1a:GFP, suggesting that FGF signaling is required for both normal and induced pancreatic lineages. Note that SU5402 had no obvious effect on neural ptf1a:GFP expression (compare Fig. 3A" with 3B" and 3D", white arrows), consistent with a pancreas-specific loss of ptf1a:GFP. Moreover, inhibition of FGF signaling does not prevent ectopic sox17:GFP (Fig. 3E-F"), showing that induction of early endoderm cells was not hindered by loss of FGF signaling. These findings suggest that iEndo cells do not spontaneously express ptf1a, but rather proceed through a step-wise genetic program resembling normal, FGF signaling-dependent pancreas lineage commitment.

In vivo induced endoderm is independent of a pluripotency mechanism
Examples of in vivo cell lineage plasticity, including natural transdifferentiation in worms, fin regeneration in zebrafish, and maintenance of the neural crest lineage potential in frogs, have implicated a pluripotency mechanism 28-30 . These findings led us to functionally assess whether our induction of muscle into endoderm lineage conversion using Oct4 and Sox32 also involves a pluripotency mechanism. The lack of induction of the skeletal muscle progenitor genes pax3 and pax7 (Fig. 2G) suggests that lineage conversion does not involve a dedifferentiation mechanism. The appearance of endoderm gene expression prior to loss of muscle morphology suggests that cell division is not required for iEndo cells. Together with a lack of mesoderm and ectoderm gene activation, we posit that lineage conversion of muscle to endoderm does not involve a pluripotent intermediate. Consistently, qPCR expression analysis does not show upregulation of standard pluripotent mRNA markers, including endogenous oct4, myca, vasa, or nanog (Fig. 2E). However, it may be possible that Oct4 functions as a pluripotent factor without requiring the iEndo cells to have gone through a detectable pluripotent intermediate. Mammalian Oct4, in combination with other Yamanaka factors, was previously shown to be able to reprogram fish cells in culture to pluripotency 31 . To assess the requirement of Oct4's pluripotency function for reprogramming muscle into endoderm in vivo, we used a modified form of mouse Oct4 which has an amino acid substitution in the linker domain, Oct4(L80A), previously shown to be transcriptionally active but unable to induce pluripotency (Fig. 4A) 32 . As with wild-type Oct4, misexpression of mutant Oct4(L80A) with Sox32 (hsp70:Oct4(L80A)-P2A-mCherry-P2A-sox32) can induce muscle cells to express sox17:GFP, as well as lead to cell shape changes (Fig. 4B-B").
Moreover, like wild-type Oct4 iEndo cells, these Oct4(L80A) iEndo cells can also proceed to exhibit ptf1a:GFP expression and lose myosin expression (Fig. 4C-E", Supplemental Fig. 6F-F"), suggesting that they can progress towards a pancreas lineage genetic program, while rapidly losing muscle protein expression. These findings demonstrate that a robust pluripotency mechanism via Oct4 is not required for induction of muscle cells into early endoderm-like cells, further supporting our conclusion that in vivo lineage conversion across germ layers can be induced directly, independent of a pluripotency mechanism. This finding also functionally demonstrates that the role of Oct4 in endoderm specification may be distinct from its well recognized role in pluripotency.

Discussion
Within a vertebrate embryo, we demonstrate that differentiated mesoderm-derived skeletal muscle cells can be induced by ectopic expression of just two transcription factors, Sox32 and Oct4, to cell-autonomously trigger an early endoderm-like developmental program. These early endoderm-induced cells can rapidly lose muscle cell morphology and gene expression while progressing through an endoderm lineage program resembling endogenous endoderm development. This lineage conversion process appears to be direct, as no evidence was found to support a pluripotent or dedifferentiated intermediate state. As with normal endogenous pancreatic ptf1a expression, expression of ptf1a in iEndo muscle cells also requires FGF signaling. Expression of ptf1a, in addition to other endoderm organogenesis specification factors, shows that these iEndo cells can function to give rise to distinct endoderm lineage genetic programs. Expression of organogenesis lineage markers also suggests the potential for these iEndo cells to be further coaxed, with additional factors, into a specific functional differentiated endoderm organ cell type such as pancreatic beta cells, which would be useful for treating diabetes.
Waddington's epigenetic landscape model suggests that as cells differentiate during normal development, they become more restricted from adopting other lineage identities. However, transdifferentiation across germ layers using in vitro approaches requires only a few transcription (intrinsic) factors and has proven to be surprisingly easier than the Waddington model would predict, challenging this paradigm of limited lineage potential of differentiated cells 33, 34 . Yet, with in vitro lineage reprogramming, removing the cells from their native microenvironment and exposing them to artificial (extrinsic) culture conditions may compromise their lineage stability, facilitating their reprogramming. Importantly, Waddington's model addresses cell lineage constraint in the context of an embryo, where a cell's normal microenvironment may also be restricting its lineage potential. Although in vivo lineage reprogramming during development in fruit flies suggests great plasticity, those efforts were restricted to converting undifferentiated cells within a germ layer 35,36 . Our work, using zebrafish embryos, demonstrates that despite the differentiated muscle cells remaining in their native microenvironment, ectopic expression of only two transcription factors is able to repress muscle identity while triggering the early endoderm genetic program, suggesting that both intrinsic and extrinsic factors maintaining muscle cell identity can be surmounted to induce conversion towards an unrelated lineage identity, as previously predicted 37 . Because we observe that ectopic sox17 expression is also induced by Oct4 and Sox32 in other differentiated, non-muscle cell types in other regions of the embryo, it is likely that other cell lineages, in other distinct microenvironments, are also amenable to induced in vivo lineage conversion.
Analogous to the commonly used MEFs (mouse embryonic fibroblasts) for in vitro lineage reprogramming studies, differentiated cells within the zebrafish embryo were used here as a practical vertebrate in vivo discovery platform to identify lineage reprogramming factors and investigate their mechanisms. This in vivo platform allows for transient and mosaic expression of transgenes, at greatly varying levels, in a large number of cell types within a large number of animals, thereby increasing the likelihood of discovering combinations of factors capable of inducing lineage conversion. Testing reprogramming factors on differentiated cells throughout the embryonic fish has key advantages. The rapid development and transparency of the zebrafish embryo, together with transgenic fluorescent lineage reporter lines, allow for quick assessment of candidate reprogramming factors in live animals. In contrast to MEFs, the particular cell lineage reprogrammed, and its specific microenvironment, can be definitively identified, allowing for the assessment of how intrinsic and extrinsic factors influence the lineage conversion process. But similar to MEFs, differentiated cells at developmental stages are presumably more amenable to cell lineage reprogramming, providing a sensitized background for revealing cell lineage reprogramming factors that would otherwise be difficult to uncover. Additional factors, including epigenetic modifying small molecules, may be necessary for reprogramming more mature or aged differentiated cells, as previously shown in adult mice 38 .
Although we find that there is on average a high efficiency (30%) of endoderm induction within pooled samples (Fig. 2D), the variability of reprogramming efficiency we observed among individual embryos indicates great potential in the number of differentiated cells that are amenable to transdifferentiation. Conversely, the high variability of reprogramming efficiency also indicates that there are factors and conditions yet to be uncovered that may allow for more consistent and efficient induction of lineage conversion. Uncovering these factors to enhance lineage reprogramming will undoubtedly lead to further mechanistic insight into direct lineage conversion, both in vivo and in vitro. Testing additional reprogramming factors may also allow for induction of a distinct endoderm lineage such as pancreatic beta cells, which will potentially have biomedical applications. Transplantation of replacement beta cells into various locations in the body 39 , including in skeletal muscles in humans 23,24 , suggests that beta cells can function outside their normal microenvironment to help maintain blood glucose levels. The ability to induce cells outside the foregut to adopt endoderm-like identity, as demonstrated here, is a significant step towards ultimately generating replacement beta cells from potentially any cell type, directly in the body of patients with diabetes. Direct in vivo lineage conversion to generate replacement cells 40,41 may bypass safety 42 and efficacy risks associated with transplantation of in vitro engineered pluripotent cells 43-47 . An unrestricted ability for directly reprogramming any differentiated vertebrate cells in vivo into any cell type, in any microenvironment, would greatly expand the potential therapeutic applications of direct, induced in vivo lineage conversion for regenerative medicine.

Materials And Methods
Animal husbandry: Adult zebrafish and embryos were cared for and maintained under standard conditions. All research activity involving zebrafish was reviewed and approved by SBP
Microscope setup: Zebrafish embryos were imaged with a custom-built light sheet microscope 61 . A laser engine (Toptica MLE, 488 nm, 561 nm) was used as the excitation light source. The fluorescence signal was captured by a water dipping objective (Olympus, UMPLFLN 20XW, 20X/0.5) placed perpendicular to the light sheet direction. Collected signal was filtered (Chroma ET BP525/50, ET LP575) prior to image acquisition.
Sample embedding: Zebrafish embryos were dechorionated at 48 hpf and transferred into a low-melting agarose (0.6%) solution prepared with E3 medium and tricaine (maintained at 37°C). Using a syringe and needle, samples were drawn into a cleaned FEP tube (inner diameter: 0.8 mm; wall thickness: 0.4 mm; Bola) and the bottom of the tube was plugged with solidified 2% agarose for additional support during imaging. The plugged FEP tube was mounted on the stage assembly so that the sample was positioned vertically at the intersection of the illumination and detection optical paths of the light sheet microscope. The sample chamber temperature was maintained at 28°C using a custom-built perfusion-based temperature control system.
Time-lapse acquisition: Samples were moved along the detection axis while being illuminated by the excitation light sheet. Images were taken every 2 microns with a high speed sCMOS camera (Zyla 4.2 PLUS, Andor) at 100 frames per second. The sample was imaged every minute and different excitation wavelengths were imaged sequentially. A typical z-stack of around 300 images was required to cover the region of interest. The overall length of the time lapse recording was approximately 16 hours, resulting in 960 individual 3D stacks in each channel.
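The acquisition figures quoted for the time-lapse recording can be checked with simple arithmetic. The sketch below is an illustration only: the function and variable names are hypothetical, the inputs are the approximate values stated in the text, and the per-stack time counts camera frames only (it ignores stage movement and switching between excitation wavelengths).

```python
def timelapse_summary(duration_h, interval_min, planes_per_stack,
                      z_step_um, frames_per_s):
    """Derive stack count, depth, and timing from the stated settings."""
    stacks_per_channel = int(duration_h * 60 / interval_min)
    return {
        # One 3D stack per time point in each channel.
        "stacks_per_channel": stacks_per_channel,
        # Approximate depth covered by one stack along the detection axis.
        "depth_um": planes_per_stack * z_step_um,
        # Camera time for one stack in one channel (frames only).
        "seconds_per_stack": planes_per_stack / frames_per_s,
        # Total 2D images recorded per channel over the whole recording.
        "images_per_channel": stacks_per_channel * planes_per_stack,
    }


if __name__ == "__main__":
    # Values taken from the text: ~16 h, 1-minute interval, ~300 planes,
    # 2 micron steps, 100 frames per second.
    summary = timelapse_summary(16, 1, 300, 2, 100)
    for name, value in summary.items():
        print(name, value)
```

With these inputs the stack count matches the 960 stacks per channel stated above, and each stack spans roughly 600 microns while taking about 3 seconds of camera time, which fits comfortably within the one-minute imaging interval for two sequential channels.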
Fluorescence-activated cell sorting: To isolate single cells, fluorescence-activated cell sorting (FACS) was performed on wild-type embryos injected at the 1-cell stage with either mylpfa:mCherry or mylpfa:Oct4-P2A-mCherry-P2A-sox32. At 24 hpf, injected embryos were placed in 0.003% phenylthiourea (PTU) to inhibit melanocyte formation and prevent pigmentation. At 48 hpf, embryos were inspected, and only healthy, developmentally normal embryos were manually dechorionated. Following collection in 1.5 ml Eppendorf tubes, pooled embryos were washed in 1x PBS, incubated in 1x PBS (+Mg/Ca) + 0.05 mg/ml Liberase TM (Roche) at 37°C for 60 min, and triturated with a P1000 pipette. The resulting suspension was filtered with a 30 μm CellTrics cell strainer (Sysmex), spun down (300 g for 10 min at 4°C), and resuspended in ice-cold 1x PBS + 0.9% FBS (Gibco). SYTOX Red (Thermo Fisher Scientific) was added at 1:1000 immediately prior to sorting to exclude dead cells, and cells were sorted for either qPCR analysis or quantification of induction efficiency.
Single cell RNA-seq library preparation and sequencing: FACS for scRNA-seq was performed at the SBP Medical Discovery Institute FACS core with slight modifications. Embryos were injected with hsp:Oct4-P2A-H2B::mCherry-sox32 (30-35 ng) or hsp:H2B::mCherry (30-35 ng). At 50 hpf, experimental and control embryos were pooled separately and dissociated by incubating with FACSmax Cell Dissociation Solution (Genlantis) at 28°C for 15-20 minutes with gentle trituration every 5 minutes. Sorted cells were captured in FACS buffer and concentrated to recommended levels suitable for use with the 10X Chromium platform.
In collaboration with the SBP Medical Discovery Institute Genomics Core, cDNA libraries were generated with the Chromium Next GEM Single Cell 3' Reagent Kits (v. 3.1) according to the manufacturer's protocol. cDNA quantity and quality were confirmed using a Bioanalyzer (Agilent). Following library construction, samples were pooled and sequenced on a full Illumina NovaSeq SP flow cell in a 28x91 configuration, yielding an average of 1,752 (Oct4-P2A-H2B::mCherry-sox32) and 2,129 (H2B::mCherry) reads per cell. Sequencing was performed on an Illumina NovaSeq 6000 in collaboration with the La Jolla Institute for Immunology Sequencing Core.
scRNA-seq data analysis: Bioinformatics analysis was performed in the Sanford Burnham Prebys Bioinformatics Core. Raw 10X single cell sequencing reads were processed and aligned against the zebrafish genome (GRCz11) using the Cell Ranger software pipeline version 3.1.0 (10x Genomics). The resulting sequencing data from Oct4/H2B::mCherry/sox32 treated cells and H2B::mCherry treated cells were merged using the Cell Ranger aggr command, and the merged count matrix was analyzed using Seurat v3 (Butler et al., 2018).
Gene features having at least one read in more than three cells were used in the analysis. Quality control to eliminate low-quality cells and doublets was performed using standard cutoffs: > 200 and < 5000 gene features with at least one read, and < 100,000 total read counts. The resulting count matrix was then analyzed using Principal Component Analysis (PCA), resulting in 105 principal components based on a JackStraw significance of p < 0.01. For cell clustering, the Seurat FindNeighbors and FindClusters functions were applied to the 105 principal components with a resolution of 0.5 using the default Louvain algorithm. This clustering analysis identified 32 cell clusters. Dimension reduction and cell clustering visualization were performed with the UMAP algorithm. Cell cluster biomarkers were examined using the Seurat FindMarkers function. To annotate and label cells from the 32 identified clusters, we compared our single cell sequencing data to the zebrafish single cell atlas 14. The Seurat FindTransferAnchors and TransferData functions were used to integrate our single cell sequencing data set with the zebrafish single cell atlas and transfer cell annotation labels. The 32 identified cell clusters were annotated based on the cell types defined by the transferred cell labels 62. For analysis, these 32 clusters were manually merged into 9 based on cell identity labels. A two-sided Fisher's exact test was performed to compare the proportions of cells originating from different germ layers or different cell clusters. To overcome dropouts of low-expressing endoderm genes, we used known endoderm factors and also extracted endoderm biomarkers from the endoderm single-cell literature 15 to identify any potential endoderm cells in non-endoderm clusters from Oct4/H2B::mCherry/sox32 treated cells and H2B::mCherry treated cells. We then performed a two-sided Fisher's exact test on cells expressing any of the known endoderm biomarkers to estimate the difference in the proportion of endoderm cells between the two cell populations.
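For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table (treatment versus control, marker-positive versus marker-negative cells) can be sketched as below. This is an illustrative standard-library implementation, not the code used in the analysis. The function name and the example counts are hypothetical, and the two-sided p-value follows the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, treated vs. control cells, and columns
    marker-positive vs. marker-negative cells (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum over all tables at least as extreme (no more probable) than the
    # observed one; the small tolerance guards against float rounding.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    return min(p_value, 1.0)


# Hypothetical counts, for illustration only (not data from this study).
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

With real cell counts the same function can be applied per germ layer or per cluster. For very large tables, an established statistics package is a more practical choice than this sketch.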
Quantitative PCR: For qPCR experiments, cells were sorted with a FACSAria II (BD Biosciences). Following FACS, sorted cells were homogenized with a QIAshredder (Qiagen), and total RNA was extracted using the RNeasy Mini Kit (Qiagen). cDNA was synthesized using iScript Supermix (Bio-Rad), and qPCR was performed using iQ SYBR Green Supermix (Bio-Rad) according to the manufacturers' protocols. Samples were loaded on a 384-well plate and analyzed on an ABI 7900HT (Applied Biosystems). Primers to detect zebrafish transcripts were designed and are described in the Supplementary Materials and Methods. Relative expression levels of genes were calculated by the following formula: relative expression = 2−(Ct[gene of interest]