microCT-Based Skeletal Phenomics in Zebrafish Reveals Virtues of Deep Phenotyping at the Whole-Organism Scale

Phenomics—in-depth phenotyping at the whole-organism scale—holds promise to enhance our fundamental understanding of genes and genomic variation, yet methods in vertebrates are limited. Here, we demonstrate rapid whole-body profiling of hundreds of traits in the axial skeleton of adult zebrafish. We show the potential for vertebral patterns to confer heightened sensitivity, with similar specificity, in discriminating mutant populations compared to analyzing individual vertebrae in isolation, even when the latter is performed at higher resolution. We identify phenotypes associated with human brittle bone disease and thyroid stimulating hormone receptor hyperactivity. Finally, we develop allometric models and show their potential to discriminate mutant phenotypes masked by growth alterations in growth. Our studies demonstrate virtues of whole-body phenomic pattern analysis in a single organ system. The high sensitivity may increase productivity in genetic screens, and facilitate the study genetic variants of smaller effect size, such as those that underlie complex diseases.


INTRODUCTION
Advances in genomic sequencing have revolutionized our ability to identify genes and genetic variations important for health, yet our ability to characterize vertebrate phenomes -i.e., to acquire in-depth phenotypic profiles at the scale of the whole organism [1] -remains limited. The development of vertebrate phenotypic assays that approach the scale and efficiency of genomic technologies hold promise to fundamentally enhance our functional understanding of genes and genomic variation.
For instance, they could rapidly accelerate genetic and drug discovery by enabling organism-wide, unbiased analysis of a large number of traits. In addition, they could expand our understanding of functional relationships between genes by enabling the clustering of gene mutations into common pathways based on similarities in phenotypic signatures. Finally, there are likely to be benefits of phenomic profiling that are difficult to anticipate [1]. Specifically, since our functional understanding of genes and genomic variation is directly coupled to the depth in which we are able to characterize their effects on phenotype, our understanding of gene function is fundamentally limited by the tendency for phenotypic assays to be restricted to a few readouts [1][2][3]. A better understanding of the biological insights that may be attained by profiling changes in patterns in a large number of traits is essential to both guide and drive the development of next-generation phenotyping technologies.
phenotyping in zebrafish models of brittle bone disease and thyroid stimulating hormone receptor (tshr) hyperactivity, and identify multivariate phenotypes in zebrafish associated with human skeletal disease. Finally, we developed phenome-based allometric models and show that they are able to discriminate mutant phenotypes masked by alterations in growth. We have integrated our methods into a software package, FishCuT, that is available as a beta release upon request from the authors.

A microCT-Based WorkFlow For Rapid Phenomic Profiling in Adult Zebrafish
Due to their small size, zebrafish are conducive to whole-body microCT imaging at high resolution [12]. Previous studies have demonstrated radiopacity in adult zebrafish vertebrae [13] and the viability of imaging vertebral morphology in adult zebrafish via microCT [9, 10, 14-16]. In our studies, we found that the small size of zebrafish can be exploited by imaging multiple fish simultaneously. For instance, we routinely image two fish at a time (enabling acquisition of whole body scans at 21µm resolution in ~20min/fish), and have found that up to eight animals can be scanned simultaneously at 21µm resolution, resulting in an effective scan time of ~5min/fish. In this context, the primary bottleneck to whole-body skeletal analysis is not the time required for microCT scan acquisition, but the time required for segmenting large microCT datasets. As an example, we found that ~60min was required to manually segment a single vertebra of an adult fish scanned at 21µm resolution (~60 image slices) using a user-assisted segmentation tool, a rate that would require >24hrs to segment the ~27 precaudal and caudal vertebrae that comprise the zebrafish vertebral column.
To overcome this barrier, we developed FishCuT, a microCT analysis toolkit that couples supervised segmentation with connectivity analysis to enable rapid computation of hundreds of descriptors of bone morphology, mass, and mineralization in the vertebral column of adult zebrafish (Fig 1). To segment vertebral bodies, FishCuT employs a region-growing algorithm which takes user-specified line regions of interest (ROIs) specifying "seed" locations of vertebral boundaries, and uses them to isolate individual vertebrae. This segmentation is achieved by iteratively growing a separation boundary such that each vertebra is composed of connected components that do not contain voxels from different vertebrae. Once each vertebral body is segmented (Fig 1B), a supervised algorithm is used to segment each vertebral body into three skeletal elements: the neural arch (Neur), centrum (Cent), and haemal arch/ribs (Haem) ( Fig   1C). For each skeletal element, four primary parameters are computed: Tissue Mineral Density (TMD, mgHA/cm 3 ), Volume (Vol, µm 3 ), Thickness (Th, µm), and Surface Area (SA, µm 2 ) ( Fig 1D). In addition to the above traits, FishCuT computes centrum length (Cent.Le, µm), as well as intra-specimen variation in TMD and thickness (i.e. TMD.sd and Th.sd, respectively [11, 17]) for each skeletal element. For each trait, the "total" value (e.g., Tot.TMD) within a single vertebra (i.e., across the centrum, haemal arch/ribs, and neural arch) is also computed. For analysis, we developed a statistical testing procedure where each combination of element/outcome (e.g., Cent.TMD) is computed as a function of vertebra number, and the global test [18, 19], a regression-based statistical test designed for data sets in which many covariates (or features) have been measured for the same subjects, is used to assess whether the pattern across vertebrae is significantly different between groups. This output is provided graphically via a custom R script that computes plots of phenotypic trends (Fig 1E), and color codes them to highlight which trends are statistically significant in the global test. Finally, for each combination of outcome/element, a standard score is computed as the difference between its value in each vertebral body and its mean value across all vertebrae in the control population, divided by the standard deviation across all vertebrae in the control population. These data are arranged into matrix constructs that we have termed "skeletal barcodes". These data can be input into graphing software to generate heat maps to facilitate visualization of phenotypic trends both within individuals and across groups ( Fig   1F). Using this process, we are able to segment 24 vertebrae into 72 different skeletal compartments to quantify ~600 different traits, with image analysis usually taking <5min/fish (~100x faster than manual segmentation).
To assess segmentation quality, we analyzed a single wildtype zebrafish (strain, AB; standard length, S.L.: 24.8mm) using two different approaches --FishCuT and manual segmentation --and used the Dice similarity coefficient (DSC) [20] to evaluate spatial overlap in segmentations produced by the two approaches. We computed a DSC of 0.932±0.001 (mean±SD, n=2 vertebrae) when we excluded pixels with intensities less than the threshold used in the FishCuT analysis, exceeding the value of 0.7 suggested to indicate excellent agreement between manual and automated segmentation approaches [21]. We computed a mean difference of 3.4±1.2% (mean±SD, n=3 fish) in centrum length, a measure that is calculated directly from user-specified "seed" locations of the vertebral boundaries, when we serially performed primary and secondary scans of the same animal. This suggests that intra-operator reproducibility during this step was high.
Finally, to assess independence among traits, we computed Pearson correlation coefficients for each pair of traits across all 34 fish analyzed in this study. We observed a relatively low median absolute correlation of 0.34 among traits (Fig 2), similar to the value of 0.3 attained by Pardo-Martin et al. during high-content profiling of the zebrafish craniofacial skeleton [8]. This quantity was relatively invariant with increasing number of vertebrae used for analysis (Fig S1), consistent with the notion that each vertebra confers non-redundant information.

MicroCT Imaging at Medium and High Resolution Exhibit Comparable Ability to Distinguish Mutant Phenotypes
Due to the large volumes that must be analyzed, whole-body phenomics requires a balance between image resolution and throughput. We assessed a) the correlation in traits quantified using scans performed at 21µm (medium) vs. 10.5µm (high) nominal resolution, and b) the sensitivity of these two resolutions in discriminating mutant phenotypes in known skeletal mutants. For testing, we used bmp1a -/mutant fish. In at medium and high resolutions, analyzed scans using FishCuT, and compared quantities from individual vertebrae via linear regressions (Fig 3). In general, we observed extremely high correlations between values attained at the two scanning resolutions, with R 2 values ranging from 0.98 to >0.99. In most cases, slopes deviated slightly from unity: slopes ranged from 0.90 to 0.91 for TMD measurements, while it ranged 1.18 to 1.21 for thickness measurements (we did not observe a consistent trend for volumetric measurements). This deviation from unity is likely to due to partial volume effects (PVEs; an inherent property of microCT images that emerges from projecting a continuous object onto a discrete grid [23]). However, due to the high correlation in measurements attained at each resolution, we observed minimal impact of PVEs on sensitivity in discriminating mutant phenotypes, as t-tests for differences in single vertebrae between WT and bmp1a -/fish yielded similar p-values regardless of scan resolution used (see Fig 3 for values). Collectively, these analyses suggest that microCT scans at high and medium resolutions provided highly correlated information on each trait, and were comparable in their ability to discriminate WT from mutant phenotypes.

Phenomic Profiling Enhances Sensitivity in Discriminating Mutant Populations
As described above, for statistical testing we developed a procedure whereby each combination of element/outcome is computed as a function of vertebra number, and the global test is used to assess whether the pattern across vertebrae is significantly different between groups. We hypothesized that phenomic profiling, in concert with the global test, may provide greater sensitivity in distinguishing mutant populations compared to a) t-tests of individual vertebrae, and b) t-tests of quantities averaged across all vertebrae.
To test this, we used the n=3 WT and bmp1a -/fish from the previous section to estimate means and covariances for each trait in each population and used them to construct distributions for Monte Carlo simulations (10,000-100,000 simulations per analysis). For our outcome measure, we used Tot.TMD since all TMD measures (i.e., Cent.TMD, Haem.TMD, and Neur.TMD) were found to be significantly different using ttests in the previous section. First, we estimated the power of the global test in discriminating high Tot.TMD in bmp1a -/fish as a function of sample size and alpha value ( Fig 4A). We found that a power of >0.8 was attained with a sample size of n=3, 4, and 6 fish/group for alpha values of 0.05, 0.01, and 0.001, respectively. Next, using a sample size of n=3 fish/group and alpha=0.05, we estimated test sensitivity (fraction of times in which p<0.05 when comparing bmp1a -/fish to WT fish) ( Fig 4B) and specificity (1fraction of times in which p<0.05 when comparing WT to WT fish) ( Fig 4C) for a) the global test (using vertebrae 1:20, scanned at 21µm resolution), b) t-tests of averaged quantities (using the mean of vertebrae 1:20, scanned at 21µm resolution), and c) t-tests 1 0 the global test conferred higher sensitivity, with similar specificity, compared to t-tests of averaged quantities as well as t-tests of individual vertebrae, even when the latter was performed at higher resolution. For instance, at alpha=0.05, the sensitivity of the global test, t-test of individual vertebrae, and t-test of averaged quantities was 0.92, 0.87, and 0.82. At alpha=0.01, the sensitivity of the global test, t-test of individual vertebrae, and ttest of averaged quantities was 0.56, 0.42, and 0.37 (respectively). Sensitivity values were consistent with those predicted from specified levels of alpha: as an example, at alpha=0.01, specificity was 0.99 in all three cases. For alpha=0.05 and n=3, we estimated a false discovery rate of 4.4% for the global test procedure. Finally, we compared the power of the different testing procedures as a function of effect size ( Fig   4D,E). We found that the global test conferred higher power compared to t-tests of averaged quantities and t-tests of individual vertebrae, with the relative increase greatest at small effect sizes (e.g., ~1.3 to 1.4 fold greater at an effect size of 0.5, compared to ~1.1 to 1.2 fold greater at an effect size of 3).
To corroborate our simulations experimentally, we used the n=3 WT and bmp1a -/fish described in the previous section to form 15 different groupings with n=2 fish/group (Supplementary Table 1): 9 groupings compared fish of different genotypes (i.e., WT vs. mutant), and 6 groupings compared fish of the same genotype (WT vs. WT or mutant vs. mutant). When we computed p-values using the different groupings (Fig 4F), we found that global tests conferred lower p-values when comparing WT vs. mutant (suggesting higher sensitivity), with similar p-values when comparing fish of the same genotype (suggesting similar sensitivity), compared to both t-tests of averaged quantities as well as t-tests of individual of vertebrae scanned at high resolution. Notably, for 4/9 (44%) of the WT vs. mutant comparisons we found a significant difference (i.e., p<0.05) between groups, comparable to the power of 0.58 we estimated for a sample size of n=2 fish/group in Fig 4A. In contrast, only 1/9 (11%) of the WT vs. mutant comparisons were 1 significantly different when performing t-tests of averaged quantities, and 0/9 (0%) when using individual vertebrae scanned at high resolution.

Identification of Novel Phenotypic Features in Known Axial Skeletal Mutants
We next performed comprehensive phenotypic characterization of bmp1a -/fish, as well as another mutant associated with human brittle bone disease, plod2 -/- [14].
Mutations in PLOD2 are associated with Bruck syndrome, a recessive condition resembling OI. Previous studies revealed high vertebral BMD in adult zebrafish with mutants in bmp1a, [15], however, it is unclear whether this high BMD is attributable to an increase in bone mass, and/or mineralization. For plod2 -/mutants, adult animals were found to exhibit vertebral compressions, distorted vertebrae, and excessive bone formation [14]. Previous microCT analyses showed higher Centrum TMD in these mutants. However, due to the lack of robust methods to analyze the highly dysmorphic vertebrae in plod2 -/fish, previous microCT analyses were performed in two dimensional maximum intensity projections [14], and an in-depth, three dimensional phenotypic characterization of plod2 -/fish has yet to be performed.
Using FishCuT, we analyzed whole-body scans of bmp1a -/and plod2 -/mutant fish, and compared them to WT sibling controls (n=3/group) (Fig 5). Note that for the rest of our studies, to focus our analyses we restrict our presentation of results to ten traits (the nine possible combinations of (Cent, HA, NA) x (Vol, TMD, and Th), plus Cent.Le) and the 16 most anteriorly-located vertebrae. In bmp1a -/mutants, analysis revealed significant increases in both bone mass and mineralization. Specifically, in regard to bone mass, centrum volume was significantly increased (Cent.Vol: p<0.01). Further, bone thickness was significantly elevated in all skeletal elements (Cent.Th: p<0.001, Haem.Th: p<0.05, Neur.Th: p<0.01). In regard to bone mineralization, TMD was also significantly elevated in all skeletal elements (Cent.TMD: p<0.01, Haem.TMD: p<0.01, Neur.TMD: p<0.01). While Cent.TMD and Neur.TMD appeared to be uniformly elevated across all vertebrae, Haem.TMD appeared to be differentially elevated in precaudal (first ten) vertebrae. We used the covariates functionality in the globaltest package in R to identify vertebral clusters that exhibit a significant association with genotype. We found that only vertebrae 2, 3, 4, and 7 were significantly associated with genotype (Fig S2), consistent with the notion that precaudal vertebrae were differentially affected.

Allometric Models Aid in Discriminating Mutant Phenotypes Masked by Alterations in Growth
In zebrafish, developmental progress is more closely related to standard length than to age [24,25]. In analyzing mutants that exhibit differences in body size (e.g., plod2 -/mutants exhibited severely diminished body size compared to WT siblings), it is difficult to discriminate to what degree altered phenotypes are attributable to differences in developmental progress, versus specific effects on skeletal function. Furthermore, although morphological developmental milestones can sometimes allow staging despite genotype-specific differences in size and growth during the larval-to-adult transformation, few such milestones have been identified, particularly during later stages, and some milestones are themselves skeletal traits, necessitating an alternative approach. To help address these challenges, we developed allometric models to control for effects of body size during phenomic analysis. We performed an ontogenetic series by phenomically profiling WT fish over a range of standard lengths (18.4mm to 31.8mm, n=12) (Fig S3). To model each trait in WT fish as a function of standard length, we used a standard power-law relationship for allometric modeling [26]: y=ax b , where y is the trait of interest, x is standard length, and a and b are empirically-derived parameters. The scaling exponent b is directly interpretable when quantities are associated with mass, length, area, and volume. Thus, we converted TMD to a mass-based quantity by computing tissue mineral content (TMC, mgHA) as the product of volume and TMD in each skeletal element (e.g., Cent.TMC = Cent.Vol*Cent.TMD). In general, we found that most traits significantly deviated from isometric growth (i.e., proportional relationships were not preserved with growth) with respect to standard length (Table 1). Specifically, traits associated with thickness and volume exhibited negative allometry (scaling exponents lower than those expected for isometric growth), while TMC exhibited positive allometry. We used the following relationship to normalize for allometric effects of growth [26]: y* = y (x*/x) b , where y is the trait of interest, x is standard length, x* is a reference standard length, and y* is the transformed value. When we applied this relationship to the phenomic profiles from the WT animals in our ontogenetic series (using the mean standard length of all WT fish as the reference standard length), we found that the coefficient of variability was substantially reduced compared to unnormalized values, as well as when quantities were normalized by an alternate transformation, 1/S.L. (Fig 7).
Next, we used the above normalization procedure to re-analyze plod2 -/mutants.
We scaled phenotypic data in WT sibling controls by applying Eq. 2, using the mean standard length of plod2 -/mutants as the reference length. In our unnormalized analysis of plod2 -/fish (i.e., Fig 6), we observed significant decreases in several morphological 1 4 quantities including Haem.Vol, Haem.Th, Neur.Vol, Neur.Th, and Cent.Le. In contrast to these low bone mass phenotypes we did not observe any differences in these traits following normalization (Fig 8). Instead, we observed a significant increase in Cent.Vol (p<0.01). Since Cent.Le and Cent.Th were similar in plod2 -/fish and WT siblings, we surmised that the increase in Cent.Vol may be attributable to an increase in centrum diameter. Consistent with this notion, when we manually examined transverse sections in microCT images (Fig 8L), we observed a clear increase in centrum diameter in plod2 -/mutants relative to similarly-sized, non-sibling WT animals (and to a lesser extent, larger, sibling controls). This phenotype was not previously identified during the initial characterization of the plod2 -/mutant line [27], and is reminiscent of the cortical expansion that occurs in mammals due to coupled bone formation and resorption on the periosteal and endosteal surface, respectively. Collectively, these analyses identify a novel phenotypic feature, centrum expansion, in plod2 -/mutants, and suggest the utility of phenomic-based allometric models as a complimentary analytical tool to reveal mutant phenotypes masked by variations in growth.

Identification of opallus as a Novel Axial Skeletal Mutant
Finally, based on the potential for FishCuT to identify novel phenotypes in known skeletal mutants, we examined the potential for FishCuT to identify novel axial skeletal mutants among fish populations derived from forward genetic screens. opallus was derived from a forward genetic screen, and exhibits pigmentation abnormalities characterized by excessive xanthophores and depleted melanophores, as well as jaw hypertrophy [28]. opallus harbors a mutation in thyroid stimulating hormone receptor (tshr) identical to a human mutation causing constitutive TSHR activity and hyperthyroidism [28]. These two conditions have been associated with opposing effects on human BMD. Specifically, while hyperthyroidism is traditionally associated with low 1 5 whether opallus exhibited an axial skeletal phenotype. In a first cohort where we performed an initial screen of n=2 fish (Fig 9A), we observed a clear increase in TMD in opallus that was not present in standard length-matched AB controls, or in another mutant derived in a forward genetic screen that exhibits pigmentation defects, pissarro This decrease was most pronounced in anterior vertebrae, with covariate analysis revealing significant associations between vertebrae 1-5 and genotype (Fig S4).

DISCUSSION
In this study, we developed microCT-based methods to rapidly profile hundreds of phenotypic traits in adult zebrafish. We analyzed over 20,000 different phenotypic data points derived WT fish of different stages of ontogenetic growth, as well as zebrafish mutants associated with human skeletal disease. Our studies suggest the potential to systematically map gene-to-phenome relationships in zebrafish as a means to advance our understanding of the genetic basis of adult skeletal health, and reveal virtues of deep phenotyping within a single organ system at the whole-organism scale.
A challenge to the analysis of high-dimensional phenotypic data is the curse of dimensionality: when testing for changes in each trait individually, as a consequence of multiple testing correction, the number of samples required for a statistically reliable result increases exponentially as the number of traits is increased. In lieu of analyzing each trait in isolation, we examined whether patterns of element/outcome combinations varied across vertebral bodies were altered in mutant populations. In Monte Carlo simulations, when holding alpha constant, power in the global test was consistently higher compared to t-tests of averaged quantities and t-tests of individual vertebrae, with the relative increase in power greatest at small effect sizes. Further, simulated levels of specificity were consistent with specified values of alpha. Our studies suggest that vertebral phenomic patterns may confer enhanced sensitivity in discriminating mutant phenotypes relative to analyzing individual vertebrae, even when the latter is performed at higher resolution. This attribute may increase productivity in forward genetic screens, as well as provide opportunities to study genetic variants of smaller effect size, such as those which underlie the overwhelming majority of complex diseases [31]. Our studies also challenge the concept that higher sensitivity in discriminating mutant populations must come at the cost of higher microCT scan resolution and hence, lower throughput, since microCT scanning of the whole body at medium resolution can be performed in a similar amount of time as a few vertebrae at high-resolution. While our workflow is highly sensitive in discriminating mutants that exhibit subtle phenotypic alterations in a large number of bones, other scan resolutions and statistical testing may be appropriate in some cases. For instance, while we found that scans acquired at 21µm and 10.5µm resolution were comparable in their analysis of bmp1a -/mutants, it is possible that other mutant lines may present extremely small, thin, or hypomineralized structures that require greater scanning resolution to resolve. Finally, given the fact that the global test exhibits optimal power when many features exhibit minor changes [18], our workflow will be most optimal when there are many small changes in a large number of vertebrae, and other statistical tests may be more powerful when testing for larger effects in a single (or few) vertebrae. Since developmental progress in zebrafish is more closely related to standard length than to age, the interpretation of mutant phenotypes can substantially differ depending on whether mutants are compared to age-matched WT siblings (which may differ in standard length and thus developmental progress), or non-sibling WT animals matched by standard length (which could mask genetic alterations on developmental progress, and exhibit greater variation in genetics and environmental influences). In high-throughput settings where resource conservation is critical, it is not practical to dedicate resources for both experimental comparisons. Our studies demonstrate that allometric modeling is effective in transforming WT sibling data to a "virtual" phenome scaled to the mean standard length of age-matched mutants, providing a computational means to enable both length-and age-matched fish comparisons from a single control group. Using this approach, we identified expanded centrum diameter in plod2 -/-mutants. In mammals, cortical expansion arises due to bone remodeling, with bone formation on the periosteal surface coupled with resorption on the endosteal surface.
Thus, our studies bring forth the question of whether the increased centrum diameter in plod2 -/mutants is attributable to accelerated bone remodeling.
In addition to identifying novel phenotypes in known skeletal mutants, we also identified a novel axial skeletal mutant, opallus, harboring a TSHR gain-of-function mutation. Excess TSHR activity has been associated with high BMD in humans [29].
Notably, unlike plod2 -/and bmp1a -/mutants, opallus exhibited high TMD in the presence of mostly normal bone mass and morphology, providing evidence of the potential for these traits to be decoupled in zebrafish mutants. By virtue of the wealth of genetic resources in zebrafish (as of January 2017, >17,000 mutant alleles are available to order through the Zebrafish Mutation Project [34]), the ability to quickly generate new mutant alleles using CRISPR/Cas9, and the ability for individual labs to house thousands of adult animals, our studies establish successful practices to merge high-content phenotyping with genetic analysis to identify new genes mediating human skeletal health. In this context, the construction of large collections of phenomes in zebrafish mutants whose orthologs are associated with mammalian bone mass and mineral accrual is likely to facilitate not only the identification of novel regulators of human bone mass, but also phenotypic signatures in zebrafish that are predictive of human skeletal disease.
A future challenge is to increase the throughput and content of our workflow.
Since FishCuT supports the DICOM standard, it is readily ported to other microCT systems. In this context, commercial microCT systems optimized for rapid throughput imaging (e.g., through the use larger detectors to increase the number of animals per field of view, higher power x-ray sources to decrease sampling time per image, and robot-based sample changing systems) have been shown to increase imaging throughput by 10 fold or more [35]. In regard to image analysis, machine learning-based approaches enable fully automated localization of boundaries of vertebral bodies in human CT/MR data [36], and such an approach might be used to automate seeding of segment boundaries, particularly if analysis is restricted to mutants that do not exhibit severe dysmorphic phenotypes. Finally, a long-term challenge is to extend analysis to other skeletal structures, including the craniofacial skeleton. Notably, as microCT scans are archived, these image libraries can be retroactively analyzed as new algorithms are developed, and re-analyzed to identify new genotype-to-phenotype associations.
In conclusion, we have developed a sensitive, rapid workflow for microCT-based skeletal phenomics in adult zebrafish. Our studies provide a foundation to systematically map genotype-to-phenome relationships as a path toward rapid advances in our understanding of the genetic basis of adult skeletal health.

Zebrafish Rearing
All studies were performed on an approved protocol in accordance with the
High-resolution scans (10.5µm voxel resolution) were acquired using the following settings: 55kVp, 145μA, 2048 samples, 1000proj/180°, 200ms integration time. DICOM files of individual fish were generated using Scanco software, and analyzed using the custom software described below. In general, at least two fish were scanned simultaneously in each acquisition.

Image Analysis
Image processing methods were implemented as custom software developed in MATLAB (scripts were tested in v2016.a). To encourage open-source development, we implemented the MIJI package to enable calls to libraries and functions developed in FIJI/ImageJ [38,39]. Further, we developed a graphical user interface (GUI) to facilitate user interaction. Analysis consists of several stages. Stage 1 consists of preprocessing, for which we have implemented a preprocessing module in which the user is able to specify a rotation along the anteroposterior axis to orient specimens to an upright position. This module also enables slice-by-slice visualization, as well as mean or maximum intensity projections of unprocessed DICOM images. 1 Stage 2 consists of thresholding. In general, we have found that fish within and across clutches can exhibit significant differences in mineralization, and thus cannot be reliably analyzed using a uniform threshold value. Thus, we calculate thresholds for each animal using a semi-automatic approach. To filter out background, the user draws a ROI outlining the fish in a maximum intensity projection, all values outside this region of interest are set to 0, and the threshold is calculated using the IsoData algorithm in ImageJ [19,20]. The threshold value may be adjusted by multiplying it by a correction factor to provide more conservative or stringent thresholding, depending on user needs.
Based on a comparison of user defined thresholds and those computed using the approach described above, we multiplied the IsoData threshold by 0.73 across all experiments.
Stage 3 consists of vertebral segmentation. Our approach is to isolate individual vertebrae so that each vertebra is composed of one or more connected components that do not contain voxels from different vertebra. The user initiates planes of separation between vertebra by drawing a "separation line" between each pair of centra. Voxels within a plane defined by the separation line are set to 0, the connected components are computed, and connected component labels are tallied for each of the two volumes separated by the plane. If the connected components with the plurality of votes in the two regions are distinct, the algorithm stops; otherwise, the separation line is extended, and the process repeated. While we found this approach robustly separated centra in all samples (including those that exhibited significant morphological deficits, see below), we encountered some cases in which ribs close to pterygiophores and associated fin rays would be considered as a single connected structure. In this case, we have implemented a manual "cutting" tool to provide the user with the ability to sever connections between skeletal elements.           hierarchically clustered based on association with mutation, and a multiplicity-corrected procedure was used to identify portions of the graph that exhibit a significant association with mutation (highlighted in black). Significant associations are observed for vertebrae 2, 3, 4, and 7. Note that for vertebrae 9/10 significant branches did not reach all the way to the "leaf nodes"; in this case, we are able to infer that at least one of the vertebrae 9 or 10 was significantly associated with genotype, but not which one.