No Predictable Relationship Between Presacral Vertebral Number and Hes7 Intron Number Across Mammals

In mammals, the number of vertebrae, and of the somites from which they derive, is highly constrained. Nevertheless, some lineages have an increased number of presacral vertebrae and thus an elongated trunk. This suggests that somitogenesis, the process of somite formation in early development, is altered in these lineages. According to the ‘clock and wavefront’ model of somitogenesis, temporal information for somite boundary formation is generated by a traveling wave of cyclic expression of oscillator genes. Hes7 has been suggested to be a key oscillator gene of this molecular segmentation clock. A previous study showed that reducing the number of introns within the Hes7 gene results in a more rapid tempo of Hes7 oscillation and an increased number of presacral vertebrae. Variation in Hes7 intron number could therefore be a potential evolutionary mechanism for varying vertebral number across mammals. To test this hypothesis, Hes7 intron number is here compared to presacral vertebral number across a variety of mammals. No significant relationship between the two metrics could be detected, as their variation across the mammalian phylogeny is fundamentally different. Integrating our data into the previously published mathematical model of Hes7 oscillation confirms the finding that variation in intron number does not predict variation in presacral vertebral number, rendering a direct causal relationship unlikely. However, our data support the previous suggestion that at least two introns are required for the pace-making function of Hes7 in the segmentation clock.


Introduction
The number of vertebrae in mammals is highly limited. In the neck, it is constrained to seven in all of the ~6400 species except sloths and manatees [1,2]. Similarly, the number of dorsal vertebrae (sum of thoracic and lumbar vertebrae) is limited to 19 or 20 in the majority of mammals [2,3]. Thus, nearly all mammals have 26 or 27 presacral vertebrae (PSV, sum of cervical, thoracic and lumbar vertebrae). As vertebrae derive from somites, the number of (trunk) somites is accordingly conserved across mammals, too. This contrasts with birds and reptiles, in which PSV number as well as the number within the individual vertebral regions is highly variable (e.g., [4]). It has been suggested that a combination of developmental and biomechanical innovations set this meristic constraint very early in mammalian evolution [3,5]. Nevertheless, a few lineages broke these constraints and evolved an increased number of dorsal (and thus presacral) vertebrae: elephants (23-24 dorsal vertebrae), hyraxes (28-31), sea cows (24), golden moles (22-24; these four lineages are part of the Afrotheria), horses and rhinos (perissodactyls, 22-24), and hero shrews (22-25; [6,7]). As an increase of even one or two vertebrae is extremely rare among most mammals, these taxa represent significant trunk elongations. Somites (as the developmental precursors of vertebrae) are formed in early development via the process of somitogenesis. During somitogenesis, a pair of somites buds off from the paraxial, presomitic mesoderm every two hours in mouse embryos, suggesting that somite segmentation is controlled by a biological clock with a two-hour cycle [8,9]. The process of somitogenesis has been formalized in a theoretical model, the 'clock and wavefront' model [10]. In this model, temporal and spatial information are integrated to determine somite boundaries.
Spatial information is provided by the wavefront of Fgf expression, which continuously regresses in the posterior direction as the body axis elongates posteriorly [11,12]. Temporal information is generated by a traveling wave of cyclic activation/expression of oscillator genes from posterior to anterior [11,12]. When oscillator expression and wavefront meet, somites are periodically generated. Thus, the period for the formation of one somite and the size of each somite are defined by the period of the oscillator and the distance that the wavefront moves during one cycle of the oscillation, respectively.
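The two relationships stated above (somite period equals clock period; somite size equals the distance the wavefront travels per cycle) can be illustrated with a minimal Python sketch. The 120 min period is the two-hour mouse value mentioned in the text; the 24-hour time window, the faster 100 min clock and the wavefront speed are hypothetical numbers chosen only for illustration, and the function names are our own.

```python
def somite_count(duration_min, clock_period_min):
    """Number of complete clock cycles (one somite pair each) in a time window."""
    return int(duration_min // clock_period_min)


def somite_length(wavefront_speed_um_per_min, clock_period_min):
    """Distance the wavefront regresses during one clock cycle."""
    return wavefront_speed_um_per_min * clock_period_min


# Two-hour clock (mouse value from the text) over a hypothetical 24 h window.
print(somite_count(24 * 60, 120))   # somite pairs with a 120 min period
# Hypothetical faster clock: more, but shorter, somites in the same window.
print(somite_count(24 * 60, 100))
print(somite_length(1.0, 120), somite_length(1.0, 100))  # speed is assumed
```

Under these assumptions a shorter clock period yields more somites in the same time window, each of them shorter, which is the intuition behind linking oscillation tempo to vertebral number.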
The basic helix-loop-helix factor Hes7 is a key effector of Notch signaling during somitogenesis. Its expression follows a two-hour oscillatory cycle controlled by negative feedback, and each cycle of Hes7 expression coincides with the generation of one pair of somites [13]. This Hes7 oscillation is proposed to be the molecular basis for the somite segmentation clock [13-15]. An important aspect of the negative feedback control is the transcriptional delay between the pre-mRNA and the final protein due to the transcription, splicing, translation, and transport of the mRNA [13,15-20]. The transcriptional delay of a gene is suggested to be affected by the number of introns that have to be removed during splicing. Variation in intron number would lead to variation in mRNA maturation time and, thus, in transcriptional delay. Using transgenic mice, Harima and colleagues [21] showed that reducing the number of introns within the Hes7 gene from three (wild-type condition) to two or one actually shortens the delay and results in the acceleration of both Hes7 oscillation and somite segmentation. This eventually led to an increase in the number of somites and vertebrae in the cervical and upper thoracic region, thus increasing total PSV number. Their results suggested that the number of introns is important for the appropriate tempo of oscillatory expression and that Hes7 is a key regulator of the pace of the segmentation clock [16,21]. Variation in Hes7 intron number could therefore be a potential evolutionary mechanism for varying PSV number across mammals. In order to test this hypothesis, we inferred Hes7 intron number from published genomes across mammals with varying PSV number.

Results and Discussion
Due to the reduction of the sacrum and hindlimbs, PSV number cannot reliably be determined in fully aquatic mammals (cetaceans and sirenians). For this reason, we used data for all mammals (aquatic + non-aquatic) to examine variation in Hes7 intron number, but data for non-aquatic mammals only for the comparison with presacral vertebral number. We retrieved information on Hes7 intron number from the genomes of 55 mammals (nine aquatic + 46 non-aquatic) available on NCBI GenBank (see Table S1 in Additional File 1). Although 32 of 55 mammals have three introns like the mouse, there is considerable variation in the remaining species, ranging from two introns in the horse (Equus caballus) to six in the Weddell seal (Leptonychotes weddellii) (Fig. 1). In addition, Hes7 splicing variants have been predicted in 13 out of 55 mammalian genomes, resulting in a variable number of introns, splicing events, and thus mRNA maturation time. For instance, there are variants requiring the splicing of two, three or four introns in the gray short-tailed opossum (Monodelphis domestica), whereas there are two variants with five introns plus one with three introns in the sperm whale (Physeter macrocephalus). Variation in Hes7 intron number shows no significant phylogenetic signal regardless of whether minimum, mean or maximum intron number is tested (all mammals: K = 0.28, p = 0.12; K = 0.25, p = 0.18; K = 0.21, p = 0.35, respectively; non-aquatic mammals: K = 0.30, p = 0.13; K = 0.26, p = 0.24; K = 0.20, p = 0.49, respectively). Because a Blomberg's K of 1 indicates that a trait follows a Brownian motion model of evolution [22], the low and non-significant K values observed here suggest that phylogenetic relatedness cannot explain variation in Hes7 intron number. They further suggest that intron number might be quite different in closely related lineages (Fig. 1).
PSV number was collected for the same 46 non-aquatic mammalian species from the literature [2,3,6]. Thirty-five species show the conserved number of 26 or 27 PSV, but in the remaining ones it varies from 22 in the nine-banded armadillo (Dasypus novemcinctus) to 31 in the horse (Equus caballus) and the Cape golden mole (Chrysochloris asiatica) (Fig. 1). There is a strong and significant phylogenetic signal in PSV (K = 0.84, p = 0.001), indicating that phylogenetic relatedness is a good predictor of variation in PSV. The signal is even stronger when thoracic (K = 0.92, p = 0.001) and lumbar number (K = 0.92, p = 0.001) are tested separately. These findings are largely driven by the shared increase in vertebral number among afrotheres (elephants, tenrecs and relatives) and the shared decrease in apes (humans, chimpanzees and relatives) (Fig. 1). The relationship between Hes7 intron number and PSV number was tested using phylogenetically informed generalized least squares analysis (PGLS) to account for the phylogenetic non-independence of the variables (particularly PSV number). There is no significant relationship detectable between Hes7 intron number and PSV across mammals (Fig. 2A).
To further explore the impact of our findings, we implemented the mathematical model of Hes7 oscillating expression presented by Hirata and colleagues [15]. As periodic peaks in protein concentration are suggested to produce somite boundaries when they meet the wavefront [13], the number of peaks in a given time (which is determined by the oscillation period) is a direct representation of the resulting somites in this model. Among others, the model includes the parameter of transcriptional delay (Tm). Tm = 29 min has been shown to result in a pattern of somite formation closely resembling the condition in wild-type mice with three introns [21]. Reducing Tm by five minutes (Tm = 24 min) decreases the oscillation period of the Hes7 protein and would result in an additional somite in a given time (but with decreased amplitudes in protein concentration). With this parameter set, the model mimics the condition in transgenic mouse lines with two introns [21]. Further decreasing Tm, however, dampens the oscillation until no somite formation is expected any more at Tm = 10 min [16,21]. Input values were taken from [21], but Tm was varied according to the range of intron number in our empirical data set. Whereas two introns result in an additional peak (and expected somite/vertebra), the number of Hes7 protein peaks, and thus the expected somite number, decreases by one peak/somite per additional intron (Fig. 2B). According to the model, the Weddell seal (Leptonychotes weddellii) is expected to lack four somites (and thus vertebrae) given six introns in the Hes7 gene. However, it has 27 PSV instead of 23. Conversely, the African elephant (Loxodonta africana) with four introns is expected to have 25 PSV but has 29. Accordingly, it is not possible to predict PSV number from Hes7 intron number, particularly in mammals deviating from 26/27 PSV. The horse (Equus caballus) might be the only exception, as its reduction to only two introns matches its increased PSV count. A deeper look into a possible correlation between Hes7 intron number and PSV number among odd-toed ungulates (Perissodactyla: horses and donkeys, tapirs, rhinos) might therefore be interesting in future studies. The pilot whale (Globicephala melas) is the only other mammal examined that has only splicing variants with two introns (all other species with a minimum number of two introns in Fig. 1 have at least one other variant with more introns). Although its number of PSV cannot be determined exactly, it has only a medium total number of vertebrae among cetaceans [23].
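The mismatch between model expectation and observation can be made explicit with a short Python sketch. It encodes only the rule described above (one expected somite fewer per additional intron relative to three introns). The baseline of 26 PSV at three introns is an assumption inferred from the expected values of 23 and 25 quoted in the text, and the function name is hypothetical.

```python
WILDTYPE_INTRONS = 3
BASELINE_PSV = 26  # assumption: expectation at three introns, inferred from the text


def expected_psv(n_introns, baseline=BASELINE_PSV):
    """Expected PSV if each additional intron removes one somite (and vice versa)."""
    return baseline + (WILDTYPE_INTRONS - n_introns)


# (intron number, observed PSV) as reported in the text
observations = {
    "Leptonychotes weddellii": (6, 27),
    "Loxodonta africana": (4, 29),
}

for species, (introns, observed) in observations.items():
    predicted = expected_psv(introns)
    print(f"{species}: expected {predicted}, observed {observed}, "
          f"difference {observed - predicted}")
```

For both species the observed count exceeds the expectation by four vertebrae, which illustrates why intron number alone does not predict PSV number in this framework.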

Conclusions
Our findings suggest that changes in the number of Hes7 introns are not generally causative for the evolutionary variation of PSV across mammals, at least in the experimental and theoretical framework presented by Harima and colleagues [21]. They further highlight that outcomes from evo-devo studies on mice might not always be transferable to evolutionary patterns in mammals in general. However, this does not diminish the evolutionary relevance of the results of Harima and colleagues [21], as they provide crucial insights into the developmental mechanics of axial patterning through the segmentation clock. Their transgenic mouse lines show that simple changes in the transcriptional delay of key oscillator genes are sufficient to modify the number of produced somites without resulting in deleterious pleiotropic effects due to complete gene loss. Our results further support the authors' hypothesis that at least two introns are required for the pace-making function of Hes7 in the segmentation clock. Transgenic mice lacking two or all introns show severe segmentation defects [16,21]. According to their mathematical model, this is due to the dampened oscillation of Hes7 protein concentration, which vanishes after a few cycles (Fig. 2B). In accordance with this, no species among the investigated mammals has fewer than two introns. Other genetic mechanisms resulting in varying body axis length in vertebrates have previously been discussed. These include the alteration of the basic body patterning into head, trunk, and tail [24], disturbance of the Notch signaling pathway [25], and elongated growth of the presomitic mesoderm [26]. The genetic basis underlying the elongated trunk in afrotheres and perissodactyls, however, remains to be determined in future investigations.

Methods
In order to identify orthologs among mammals, the mouse Hes7 protein (BAB39526.1) was searched with blastp against all protein sequences of annotated mammalian genomes available on NCBI GenBank. BLAST hits with less than 65% query coverage were discarded to avoid including proteins of other members of the Hes gene family that match only the conserved helix-loop-helix domain. The remaining sequences were realigned using MAFFT v7.310 [27]. A protein tree was subsequently computed in RAxML 8.2.11 [28] using the PROTGAMMABLOSUM62 substitution model. A group of proteins from phylogenetically diverse species clustered closely together but was separated from the other protein sequences by an unusually long branch. We therefore assumed that these sequences represent other members of the Hes gene family that have been misannotated as Hes7 and discarded them from the data set. Nevertheless, for every species from which sequences were discarded in this step, another putative Hes7 protein was always available.
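The coverage filter described above can be sketched in a few lines of Python. This is an illustration only: the record layout, the field names and the accessions are hypothetical, and only the 65% threshold is taken from the text.

```python
MIN_QUERY_COVERAGE = 65.0  # percent; threshold stated in the text


def filter_hits(hits, min_coverage=MIN_QUERY_COVERAGE):
    """Keep hits whose query coverage is at least the threshold."""
    return [hit for hit in hits if hit["query_coverage"] >= min_coverage]


# Hypothetical hit records; real BLAST output would need to be parsed first.
hits = [
    {"accession": "hypothetical_1", "query_coverage": 97.0},
    {"accession": "hypothetical_2", "query_coverage": 65.0},
    {"accession": "hypothetical_3", "query_coverage": 41.0},
]

kept = filter_hits(hits)
print([hit["accession"] for hit in kept])
```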
The remaining sequences were realigned more accurately in MAFFT using the E-INS-i algorithm, which is intended for sequences with local similarities separated by dissimilar regions. A final protein tree was then recomputed in RAxML (see Fig. S1 in Additional File 1). Genomic sequences of Hes7 genes were subsequently identified based on protein IDs in the annotation files. Numbers of predicted exons for each splicing variant were collected. The number of introns (i.e., required splicing events between the collected exons) was inferred from these exon counts for every splicing variant in every species.
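The inference of intron number from exon counts, and the per-species minimum, mean and maximum used in the analyses, can be sketched as follows. The exon counts shown are not taken from the annotation files; they are back-calculated from the intron numbers reported for the two example species (introns + 1), and the function names are our own.

```python
def introns_from_exons(n_exons):
    """A transcript with n exons requires n - 1 splicing events."""
    if n_exons < 1:
        raise ValueError("a transcript needs at least one exon")
    return n_exons - 1


def summarize_species(exon_counts):
    """Return (minimum, mean, maximum) intron number over all splicing variants."""
    introns = [introns_from_exons(n) for n in exon_counts]
    return min(introns), sum(introns) / len(introns), max(introns)


# Exon counts per predicted splicing variant (derived from the reported introns)
exon_counts_by_species = {
    "Monodelphis domestica": [3, 4, 5],   # variants with 2, 3 and 4 introns
    "Physeter macrocephalus": [6, 6, 4],  # variants with 5, 5 and 3 introns
}

for species, exon_counts in exon_counts_by_species.items():
    print(species, summarize_species(exon_counts))
```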
The phylogenetic species tree and divergence times were gathered from the TimeTree database [29]. All phylogenetic comparative analyses were conducted in R 3.6.0 [30] using the packages ape [31], geiger [32], nlme [33], and phytools [34]. The phylogenetic signal was calculated using Blomberg's K [22] and its significance estimated in 1000 simulations. PGLS was conducted by applying generalized least squares regression with a covariance structure following Brownian motion. Model parameters were estimated using maximum likelihood. The mathematical model of Hes7 was implemented in Matlab R2020 (MathWorks, Inc.) using the equations presented in [15,21]. Starting from Tm = 29 min as the equivalent of the transcriptional delay resulting from splicing three introns, changes in intron number were implemented by changing Tm by five minutes per intron (following [21]).
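As an illustration of this procedure, the following Python sketch maps intron number to Tm as described above and integrates a generic delayed negative-feedback oscillator with a simple Euler scheme. It is not the Matlab implementation used here: the equations are a commonly used two-variable delay form that we assume for illustration, and all parameter values other than the Tm mapping (Tp, a, k, p0, the half-lives, the time window and step size) are placeholders that are not taken from [15,21].

```python
import math


def tm_for_introns(n_introns, tm_wildtype=29.0, step=5.0, wildtype_introns=3):
    """Transcriptional delay (min): 29 min at three introns, 5 min per intron."""
    return tm_wildtype + step * (n_introns - wildtype_introns)


def simulate_hes7(tm, tp=8.0, a=4.5, k=33.0, p0=40.0,
                  protein_half_life=20.0, mrna_half_life=3.0,
                  total_min=720.0, dt=0.1):
    """Euler integration of an assumed delay model (placeholder parameters):
         dp/dt = a * m(t - tp) - b * p(t)
         dm/dt = k / (1 + (p(t - tm) / p0)**2) - c * m(t)
       Returns the protein trajectory sampled every dt minutes."""
    b = math.log(2) / protein_half_life   # protein degradation rate
    c = math.log(2) / mrna_half_life      # mRNA degradation rate
    n_steps = int(round(total_min / dt))
    lag_m = int(round(tm / dt))           # delay steps for transcription
    lag_p = int(round(tp / dt))           # delay steps for translation
    p = [0.0] * (n_steps + 1)
    m = [0.0] * (n_steps + 1)
    for i in range(n_steps):
        # History before t = 0 is taken to be zero.
        p_delayed = p[i - lag_m] if i >= lag_m else 0.0
        m_delayed = m[i - lag_p] if i >= lag_p else 0.0
        dm = k / (1.0 + (p_delayed / p0) ** 2) - c * m[i]
        dp = a * m_delayed - b * p[i]
        m[i + 1] = m[i] + dt * dm
        p[i + 1] = p[i] + dt * dp
    return p


def count_peaks(series):
    """Count interior local maxima of a sampled trajectory."""
    return sum(1 for i in range(1, len(series) - 1)
               if series[i - 1] < series[i] >= series[i + 1])


for n in range(2, 7):  # range of intron numbers in the empirical data set
    tm = tm_for_introns(n)
    print(n, tm, count_peaks(simulate_hes7(tm)))
```

Because the parameters are placeholders, the printed peak counts should not be read as reproducing Fig. 2B; the sketch only shows how intron number enters the model through Tm and how peaks within a fixed time window can be counted.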

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated and/or analyzed during the current study are included in this published article and its additional information files.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable.

Authors' contributions
PA designed the study, did statistical analyses, implemented the model, prepared the figures and drafted the manuscript. MG collected and analyzed the sequences to retrieve intron numbers. All authors approved the final version of the manuscript.

Figure 2. A) The phylogenetically informed generalized least squares analysis (PGLS) shows no significant relationship between Hes7 minimum intron number and presacral vertebral number across mammals. B) Mathematical model of Hes7 cyclic expression [15,21]. Increasing transcriptional delay (Tm), representing increasing intron number, results in a decreasing number of expression peaks within a certain time (i.e., increasing period, decreasing frequency) that would meet the wavefront to form somites (i.e., decreasing number of somites). Tm corresponding to only one intron results in an oscillation that is quickly dampened.

Supplementary Files
This is a list of supplementary files associated with this preprint: additional file1.pdf