Longer or shorter spines: Reciprocal trait evolution in stickleback via triallelic regulatory changes in Stanniocalcin2a

Significance Across a broad range of species, evolution has modified a common vertebrate body plan to produce endless forms most beautiful. A key unanswered question is whether diverse morphological changes in a common structure arise from modifying different genes or the same genes in different ways. Many natural populations of threespine stickleback have evolved either longer or shorter dorsal and pelvic spines. Here, we identify reciprocal regulatory changes in an ancient enhancer of the bone growth inhibitor, Stanniocalcin2a, as an underlying genetic cause. Many other stickleback loci similarly show three or more major classes of variants across populations; we suggest that diverse alleles at key loci may represent a common mechanism for producing diverse phenotypes from a smaller toolkit of genes.


Principal component analysis
Following variant calling, we performed dimensionality reduction through principal component analysis (PCA). For each genomic window of interest containing N variable sites, we transformed genotype into an N-dimensional vector for each sample. We excluded sites where >1/3 of samples were missing genotype calls and imputed any missing data at other sites using a Knearest neighbors algorithm (with k=3) on up to 100 neighboring sites in the genomic window. We performed PCA using the sklearn python library and extracted both the variance explained by each principal component axis and the coordinates of each sample along these axes using the inbuilt library methods.

Sample collection and measurements
Fish were collected from Little Meadow Creek and Matanuska Lake, euthanized in tricaine (MS-222, Syndel), immediately fixed in 70% ethanol, and measured for standard length, dorsal spine 1, dorsal spine 2, dorsal spine 3, anal spine, and the right pelvic spine. All fish collected in 2007 were measured in triplicate with digital calipers. Follow-up sampling from 2018 yielded an excess of fish, so fish with more visually extreme phenotypes were selected for measurement and genotyping. The 2018 fish had their right pelvic spine measured in triplicate with digital calipers and other measurements taken by X-ray, and later all verified by triplicate caliber measurement. All fish were regressed against standard body length and sex only within their sample cohort to control for any differences in measurement and effects of long-term ethanol storage.

Transgenic stickleback assays
Transgenic stickleback were generated as previously described (3), through microinjection of plasmids with Tol2 transposase into single cell stickleback embryos from Matadero Creek, California. Larvae were anaesthetized with tricaine (MS-222, Syndel) and imaged for GFP expression approximately once a week throughout the first two months of development. Images were taken with a MZFLIII fluorescent microscope (Leica Microsystems, Bannockburn, IL) using GFP2 filters and a ProgResCF camera (Jenoptik AG, Jena, Germany).

Triallelic overlap significance
The triallelic regions were shuffled by bedtools, and the number of CSS regions (2% False Discovery Rate)(4) overlapping a shuffled triallelic region was counted across 100,000 simulations. No random permutations had as many overlaps as observed, so p < 1e-5 was taken as an upper bound.

Gene ontology
Stickleback genes were mapped to the corresponding human 1-to-1 orthologs (by Ensembl 94). Stickleback genes without 1-to-1 human orthologs were not used for gene ontology analysis. The associated tag counts were then evaluated by hypergeometric distribution and p-values adjusted for multiple comparisons by false discovery rate.  (5). Maser is the peak of the globally shared genomic region, but only part of a much larger haplotype found across the Pacific Northwest.  (6). c. This locus is only 400 kb downstream of Maser and located between FoxI1 (5 kb upstream) and Wnt8b (two copies, located 1.4 kb and 3.6 kb downstream). d. A large (89 kb) triallelic region overlapping Mirlet7a3, Mirlet7b, Pparaa, Cdpf1, and 5.4 kb from Wnt7b. e. Overlapping a cluster of copies of Slc47a1 and just downstream of Vgfl1 (9 kb). f. 2.5 kb upstream of Kitlg, a major determinant of pigmentation in stickleback and humans (7). g. A smaller 4 kb region intronic to Bckdhb, and directly overlapping a previously characterized 4 kb enhancer of Gdf6 that regulates armor plate development (8). h. An example of a triallelic region that does not involve marine-freshwater differentiation, overlapping Fam184a. Note how in several of these examples, the two major freshwater clusters are separated by PC1, while marine and freshwater are largely separated by PC2.  Creek shows association with anal spine length. c-f. Association of spine length residuals with genotype after controlling for genotype at the peak marker (indicated). The substantial remaining signal indicates they are not significant in the earlier plots only by linkage disequilibrium, but also contain additional information. Fig. S5. Phenotypic stratification by population, trait, marker, and sex. a-j. Fish are stratified by genotype at the indicated marker in Little Meadow Creek (LMCK) and Matanuska Lake (MNKA). oGK724/725 is the primary peak marker and oGK768/769 is the secondary peak marker in Figure 3f. k-l. Fish are stratified by sex. Sex is removed from the phenotypic regression in panels k and l. All phenotypes are significantly associated with the genotype at the indicated markers except sex in Little Meadow Creek (see Figure 3). In this and other figures, the box plots show the median value (orange line) and 1 st and 3 rd quartiles, with whiskers extending to 1.5x the interquartile range.   (Table S12).

Fig. S8. Orientation and spacing changes around Stc2a in acanthomorphs (red).
Arrows represent gene orientation for Stc2a and its immediate neighbors. Intergenic distances are listed between adjacent loci for each species. The vertical grey bars represent the interval that contains Maser orthologous sequences (which are not identifiable outside ray-finned fish). Note that an inversion of Stc2a in Acanthomorpha results in a large increase in absolute (middle column) and relative distance (right column) to Nkx2.5.
Dataset S1 (separate file). Spine and standard length measurements in fish with previously sequenced genomes (5). Dataset S6 (separate file): Percent Variance Explained (PVE) and Dominance at the two peak markers in Maser. Dominance is defined here as the heterozygous phenotype being the same as Long/Long at h=1 and the same as Short/Short at h=0. These values are reported as observed in the wild populations and may differ from those that a traditional QTL cross would yield due to the different allelic ratios. Specifically, the Short/Short genotype is relatively rare in these samples, leading to an underestimation of PVE. Dataset S8 (separate file): Gene ontology analysis on genes called as differentially expressed between the two crosses using DESeq2 (potential trans downstream genes).
Dataset S10 (separate file): Gene ontology analysis on genes showing reciprocal expression changes.
Dataset S11 (separate file): Population group assignments for selected triallelic regions.