EmbryoNet: using deep learning to link embryonic phenotypes to signaling pathways

Evolutionarily conserved signaling pathways are essential for early embryogenesis, and reducing or abolishing their activity leads to characteristic developmental defects. Classification of phenotypic defects can identify the underlying signaling mechanisms, but this requires expert knowledge and the classification schemes have not been standardized. Here we use a machine learning approach for automated phenotyping to train a deep convolutional neural network, EmbryoNet, to accurately identify zebrafish signaling mutants in an unbiased manner. Combined with a model of time-dependent developmental trajectories, this approach identifies and classifies with high precision phenotypic defects caused by loss of function of the seven major signaling pathways relevant for vertebrate development. Our classification algorithms have wide applications in developmental biology and robustly identify signaling defects in evolutionarily distant species. Furthermore, using automated phenotyping in high-throughput drug screens, we show that EmbryoNet can resolve the mechanism of action of pharmaceutical substances. As part of this work, we freely provide more than 2 million images that were used to train and test EmbryoNet.


Overview of developmental signaling pathways and early zebrafish embryogenesis
The zebrafish embryo develops from a single cell to a segmented larva with well-distinguishable organ primordia within 24 hpf 1 and is therefore a prime model system for our approach. After the cleavage stages, the future germ layers and the body axes become patterned during the blastula period. During gastrulation, these pre-determined cell populations migrate in a well-coordinated fashion to give rise to the later body plan: the ectoderm and enveloping cell layer engulf the whole yolk sphere during epiboly, the presumptive mesoderm and endoderm ingress below the ectoderm to later form interior organs, and all tissues together move towards the dorsal side to establish the elongated shape of the embryo in a process called convergence and extension. In the following segmentation stages, the embryo elongates further, and somites and primary organs form 1 .
The ligands of signaling pathways regulating these processes are dynamically expressed from specific source tissues in the embryo (Fig. 1a). Nodal expression starts in nuclei of the yolk syncytial layer after the mid-blastula transition around 3 hpf and expands through the marginal blastoderm by positive feedback with peak expression around sphere stage at about 4 hpf 2,3 . FGF expression is induced by Nodal signaling and thus similarly localized around the margin starting from sphere stage 4,5 . During gastrulation, a second FGF expression domain in the future hindbrain is established 4 . BMP expression starts as a shallow ventral-to-dorsal gradient at late blastula stages around 4 hpf and steepens during gastrulation 6-9 . Canonical Wnt expression becomes visible around 5 hpf at 50% epiboly along the blastoderm margin 10 . Starting at mid-gastrulation, Wnt ligands are also expressed in the prospective neuroepithelium, and the posterior expression domain is maintained in the dorsal paraxial tailbud region 10,11 . PCP ligands regulate convergence and extension movements and are expressed in the germ ring at shield stage around 6 hpf and in the paraxial mesoderm during gastrulation 12,13 . Sonic hedgehog is expressed first in the shield at 60% epiboly around 7 hpf, and expression continues in the notochord and the neural tube floorplate 14 . Retinoic acid is synthesized by the enzyme Raldh2, which is first detectable around 30% epiboly at about 5 hpf in the marginal zone and later in the posterior presomitic, somatic and lateral plate mesoderm 15 .
Together, the expression patterns and activities of these signaling molecules cover the embryo (Fig. 1a) and jointly orchestrate the emergence of a body plan from an initially nearly uniform ball of cells.

[43][44][45]) have been validated for specificity and widely applied in previous studies. mRNA injections of pathway antagonists also induce bona fide signaling pathway loss-of-function phenotypes, and ectopically provided lefty and chordin mRNA can even rescue the respective zebrafish mutants (e.g. Supplementary Ref. [46][47][48][49]). In addition, the morpholino that we used to induce the -PCP phenotype has been extensively validated in previous studies (e.g. Supplementary Ref. [50][51][52][53][54]).

Chemical genetics to modulate the activity of signaling pathways
To further validate our approach, we directly compared phenotypes induced by small-molecule inhibitors, pathway antagonists or mutants. These were then classified by EmbryoNet. Nodal phenotypes induced by small-molecule inhibitor treatment (SB-505124, n = 33), injection of a pathway antagonist (lefty1 mRNA, n = 27), or in a receptor mutant (MZoep, n = 27) were all classified as -Nodal with similar accuracy (Extended Data Fig. 1f). Similarly, BMP phenotypes induced by small-molecule inhibitor treatment (LDN-193189, n = 45), pathway antagonist injection (chordin mRNA, n = 26), or in a pathway ligand mutant (swirl -/-, n = 13), were all classified by EmbryoNet as -BMP with similar accuracy (Extended Data Fig. 1g). Importantly, -BMP phenotypes generated by overexpression of the BMP inhibitor Chordin were properly identified by EmbryoNet, even though such treatments had not been used for the training of the network. Furthermore, EmbryoNet recognized -Shh phenotypes with similar accuracy for both small-molecule treatment using Cyclopamine (82%) and zGli3R-GFP mRNA injection (72%).

Detection of known and novel developmental defect features by EmbryoNet-Prime
Using our class activation map (CAM) visualization approach (see Materials and Methods), we found that EmbryoNet-Prime often detected well-known features of defective signaling pathways. For example, it is well known that Wnt mutants have defective heads and tailbuds 11,55 , and EmbryoNet-Prime was indeed positively activated in these regions at later stages, while during early segmentation the whole body axis showed positive activation (Supplementary Video 15). In embryos where the dickkopf enlarged head phenotype was less pronounced, the head displayed negative activation for -Wnt classification in agreement with human assessment (Supplementary Video 16). Interestingly, positive activation of the network in the head region was often restricted to the mid-hindbrain boundary, and positive activation in the tail region seemed to target the yolk extension and the space between tail and body, suggesting a potential role for the angle between the two structures. Most (7 of 10 analyzed) -Wnt embryos were detected earlier by EmbryoNet-Prime than by human assessors. In these cases, the CAM visualization approach showed positive activation in spots across and next to the embryo (Supplementary Video 16).
Similarly, early detection of -Nodal (5.5 hpf, Supplementary Video 19) was based on latent features not recognized by human assessors (primarily the border between yolk and blastoderm, and spots directly outside the embryo proper), while later classification was linked to established structures, such as the ectodermal thickening from late gastrulation, followed by positive activation in head, tail and trunk regions (Supplementary Video 20, n = 10). While the cyclopic eye showed positive activation in the -Nodal class, it interestingly remained neutral for -Shh embryos (Supplementary Videos 21-22, n = 10), where positive activation in the CAM visualization was apparent at the somites and yolk extension. Consequently, -Shh samples were classified at various times during somitogenesis.
-BMP embryos were frequently (6 of 10 analyzed samples) first detected in late gastrulation, shortly before they started their characteristic elongation, accompanied by positive activation spots at the yolk (Fig. 3). When elongation started, the yolk, head and tail showed stronger positive activation, which was later maintained in head, tail or both (Supplementary Videos 11-12). Once -BMP embryos disintegrated, the classification immediately switched to negative activation.
Similarly, +RA was first identified at the end of gastrulation, when the area around the tailbud or the head was positively activated in CAM visualizations (Fig. 3, Supplementary Videos 13-14). Half of the analyzed samples (n = 10) were only classified once elongation became visible to some extent. As in the -BMP class, tail and head frequently remained positive, with the signal often being situated directly outside the respective structure. Interestingly, the signal near the tail looked different from the one in -Wnt samples, supporting the idea that the angle between body and tail is sampled. Most (7 of 10) Normal embryos were identified at the end of gastrulation, and these embryos displayed positive activation in the head and tailbud (Fig. 3, Supplementary Videos 9-10). A second wave of identification (3 of 10) occurred when somites became visible. In later development, the tail and the surrounding space were activated most consistently; in this case, the movement of the embryo seemed to be identified.
The -FGF class was only reliably detected around 15 hpf, showing positive activation mostly in regions at the yolk and tail (Fig. 3, Supplementary Videos 17-18, n = 10). Interestingly, the tail was later neutral or even showed negative activation. Overall, negative activation was more dominant in -FGF embryos than in most other classes.
-PCP embryos were often classified correctly directly after gastrulation, when they showed mediolateral widening. The classification then tended to change frequently (Fig. 3, Supplementary Videos 23-24, n = 10) and remained consistent only at late segmentation stages. At these stages, the yolk extension and the somites displayed some positive activation. Overall, -PCP showed the least positive activation.
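The positive, neutral and negative activations described in this section can be illustrated with a minimal sketch of a class activation map. It assumes the generic CAM formulation, in which the final convolutional feature maps are summed with the class-specific weights of the output layer; the exact procedure used for EmbryoNet-Prime is described in Materials and Methods and may differ. The function names, the toy feature maps and the threshold are hypothetical and serve only as illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class (generic CAM, assumed here).

    feature_maps: list of C maps, each a list of H rows with W floats.
    class_weights: list of C floats, one weight per feature map.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def label_activation(cam, threshold=0.5):
    """Label each position as positive, negative or neutral (hypothetical threshold)."""
    def label(value):
        if value > threshold:
            return "positive"
        if value < -threshold:
            return "negative"
        return "neutral"
    return [[label(value) for value in row] for row in cam]


# Toy example: two 2x2 feature maps and the weights of one class.
toy_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
toy_weights = [1.0, -2.0]
toy_cam = class_activation_map(toy_maps, toy_weights)
for row in label_activation(toy_cam):
    print(row)
```

In this sketch, regions that support the class receive positive values, regions that argue against it receive negative values, and values near zero are treated as neutral.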

Use of statins in humans and in animal studies
For the last 40 years, statins have been the first-choice medication to treat hyperlipidemia in humans 56,57 . Statin treatment in pregnant women, however, is not recommended by most public health agencies due to potential teratogenic effects. For example, the Pharmacovigilance and Counseling Center for Embryonic Toxicology at the Charité, supported by the German Federal Ministry of Health, recommends avoiding atorvastatin and suggests the use of simvastatin if therapy with statins cannot be circumvented (https://www.embryotox.de/arzneimittel/details/ansicht/medikament/atorvastatin, accessed on August 23rd, 2022). However, in July 2021 the Food and Drug Administration (FDA) requested that statin manufacturers remove the FDA warning concerning statin usage during pregnancy (https://www.fda.gov/media/150774/download, accessed on August 23rd, 2022).
A recent meta-analysis continues to suggest an association of statin use with premature delivery and decreased birth weight 58 , whereas other reviews and meta-analyses have stated that there is no clear conclusion regarding the teratogenicity of statins [59][60][61] . Due to the lack of interventional studies in pregnant women, it is unclear whether teratogenic effects might depend on variables such as exposure duration or dosage. A potential effect of statins on FGF signaling, as observed in our zebrafish experiments (Fig. 4c-e), has previously been discussed for cultured human cells [62][63][64][65][66] .
In previous animal studies, statins were applied during late organogenesis stages at doses much higher than those typically used in humans. The recommended daily human intake ranges between 10 and 80 mg/d for atorvastatin (0.13-1.0 mg/kg/d or 0.32-2.56 µM), between 5 and 80 mg/d for simvastatin (0.06-1.0 mg/kg/d or 0.21-3.41 µM) and between 20 and 60 mg/d for lovastatin (0.25-0.75 mg/kg/d or 0.88-2.65 µM); all conversions assume a body weight of 80 kg with 70% water content. In contrast, previous animal studies have typically used doses between 2 and 200 mg/kg/d, corresponding to micro- to millimolar daily concentrations. When applied during late organogenesis stages in the gestation period, no teratogenicity of atorvastatin could be found in rats, whereas a slight tendency for increased fetal loss and decreased birth weight was observed in rabbits 67 . Similar results were obtained for simvastatin and lovastatin 68 . However, data for statin exposure at earlier embryonic stages are not available.
In our zebrafish embryo experiments, strong dorsal-ventral patterning defects (Fig. 4c) were evident at concentrations comparable to human therapeutic doses and as low as 0.4 µM (see Materials and Methods). However, the bioavailability in zebrafish embryos compared to human cells and tissues is currently unclear.
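The conversion from daily dose to micromolar concentration quoted above for the recommended human intake can be retraced with a short calculation. The sketch below assumes that the dose distributes evenly in the body water (80 kg body weight, 70% water content, 1 kg of water taken as 1 L). The molar masses are not stated in the text; standard values for the free compounds are assumed here, and the function name is hypothetical.

```python
# Molar masses in g/mol. ASSUMPTION: not given in the text; standard values
# for the free compounds are used.
MOLAR_MASS_G_PER_MOL = {
    "atorvastatin": 558.64,
    "simvastatin": 418.57,
    "lovastatin": 404.54,
}

# Recommended daily human intake ranges in mg/d, as stated in the text.
DAILY_DOSE_RANGE_MG = {
    "atorvastatin": (10.0, 80.0),
    "simvastatin": (5.0, 80.0),
    "lovastatin": (20.0, 60.0),
}


def dose_to_micromolar(dose_mg, molar_mass_g_per_mol,
                       body_weight_kg=80.0, water_fraction=0.7):
    """Convert a daily dose to a concentration in µM.

    Assumes even distribution in body water and 1 kg of water = 1 L.
    """
    volume_l = body_weight_kg * water_fraction
    # mg divided by g/mol gives mmol; multiply by 1000 to obtain µmol.
    amount_umol = dose_mg / molar_mass_g_per_mol * 1000.0
    return amount_umol / volume_l


for drug, (low_mg, high_mg) in DAILY_DOSE_RANGE_MG.items():
    molar_mass = MOLAR_MASS_G_PER_MOL[drug]
    low_um = dose_to_micromolar(low_mg, molar_mass)
    high_um = dose_to_micromolar(high_mg, molar_mass)
    print(f"{drug}: {low_um:.2f} - {high_um:.2f} µM")
```

Under these assumptions, the calculation reproduces the micromolar ranges given in the text to two decimal places. It does not account for absorption, metabolism or protein binding, so it is only an upper-level estimate of exposure.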