Does 3D Phenotyping Yield Substantial Insights in the Genetics of the Mouse Mandible Shape?

We describe the application of high-resolution 3D microcomputed tomography, together with 3D landmarks and geometric morphometrics, to validate and further improve previous quantitative genetic studies that reported QTL responsible for variation in the mandible shape of laboratory mice using a new backcross between C57BL/6J and A/J inbred strains. Despite the increasing availability of 3D imaging techniques, artificial flattening of the mandible by 2D imaging techniques seems at first an acceptable compromise for large-scale phenotyping protocols, thanks to an abundance of low-cost digital imaging systems such as microscopes or digital cameras. We evaluated the gain of information from considering explicitly this additional third dimension, and also from capturing variation on the bone surface where no precise anatomical landmark can be marked. Multivariate QTL mapping conducted with different landmark configurations (2D vs. 3D; manual vs. semilandmarks) broadly agreed with the findings of previous studies. Significantly more QTL (23) were identified and more precisely mapped when the mandible shape was captured with a large set of semilandmarks coupled with manual landmarks. It appears that finer phenotypic characterization of the mandibular shape with 3D landmarks, along with higher density genotyping, yields better insights into the genetic architecture of mandibular development. Most of the main variation is, nonetheless, preferentially embedded in the natural 2D plane of the hemi-mandible, reinforcing the results of earlier influential investigations.

3D geometric morphometrics multivariate QTL mapping Mus musculus mandible shape Geometric morphometric methods based on landmarks offer a convenient statistical framework to conduct quantitative genetic analyses of the shape of complex morphological structures such as the skull and mandible (Klingenberg 2010). Among model systems in the genetics and development of complex traits, the mouse mandible is probably one of the most extensively used (Atchley et al. 1985;Atchley and Hall 1991). The primary reason for this success is its intermediate level of complexity (Klingenberg and Navarro 2012) between rather complex models, such as the skull (Hallgrímsson et al. 2014), and simpler models, such as the Drosophila wing (Dworkin et al. 2011;Debat et al. 2009) or other insect appendages (e.g., Khila et al. 2009;Emília Santos et al. 2015;Prud'homme et al. 2012). One historical reason, and probably a major constituent of that success, is the relative flatness of this bone, allowing rudimentary 2D imaging techniques to be used effectively.
Genetic architecture, imprinting effects, integration, and modularity of the mouse mandible have been investigated in fairly high detail using geometric morphometrics (Klingenberg et al. , 2004Leamy et al. 2008;Suto 2009;Boell et al. , 2013Boell 2013;. One common assumption in the majority of these studies is the approximation of the mandible to a 2D shape based on a photograph of either the medial or the buccal sides. The 2D imaging of 3D shapes is well known to incur information loss and errors due to object projection (see Cardini 2014, for a review of this source of error). This flattening may represent a major factor of variation in the sample because of its intricate interaction with the positioning of the object and its shape itself. Nonetheless, for years it has been a common practice in the community to collect data in 2D, mainly because of technical availability and feasibility as well as processing speed compared to 3D data. Despite potential consequences, the 2D projection error inherent to this practice has rarely been explicitly assessed in morphometric protocols compared to other kind of digitization and observer errors that are routinely controlled (e.g., Muñoz-Muñoz and Perpiñan 2010;Yezerinac et al. 1992). For the mandible, the sole study that we are aware of reports that 2D approximation is actually accurate for marmot data (Cardini 2014). With the increased availability and reduced cost of both surface and volumetric high-resolution imaging, it is becoming more and more feasible to conduct studies using 3D landmarks, thus reducing the risks of issues related to artificial 2D flattening.
At the same time, in genetics, there is growing interest in deciphering the genetic architecture of within-population variation and local adaptation using genome-wide association studies in either natural populations or outbred stocks (Mott and Flint 2013;Flint and Eskin 2012). In such populations, variants segregate at variable frequencies and linkage disequilibrium is in the order of a few dozen kb (Yalcin et al. 2010), and this molecular variation needs to be captured. Nowadays, dense SNP maps obtained from genotyping arrays that are commercially available, or from diverse genotyping-by-sequencing techniques such as whole-genome resequencing (Huang et al. 2009), RAD-seq (Baird et al. 2008;Peterson et al. 2012;Miller et al. 2007), multiplexed shotgun sequencing (Andolfatto et al. 2011;Cande et al. 2012), or targeted capture (Jones and Good 2016;Linnen et al. 2013;Olson 2007;Chevalier et al. 2014;Hodges et al. 2007;Gnirke et al. 2009), may yielded thousands to millions of SNPs even in species lacking a reference genome (Elshire et al. 2011). This structure of molecular variation requires the use of large sample sizes that are in the order of a few thousands in the best case scenario of average minor allele frequencies (e.g., Valdar et al. 2006;Navarro and Klingenberg 2007). Failure to reach these high sample sizes will undoubtedly result in difficulties with reaching significance and being able to make decisions between noise and signal in associations (Ledur et al. 2009). These studies may finally come up with mixed results, mapping the main players but accounting only for a low fraction of the total genetic variance, suggesting a large amount of missing heritability (e.g., Pallares et al. 2014).
Whether the imaging modality is 2D or 3D, the process of acquiring landmark coordinates has remained predominantly the same over the last 30 years, i.e., tedious, manual expert annotation of the anatomy. Moreover, the third dimension presents unique challenges (e.g., dealing with the accurate projection of 3D structures on a 2D medium like a computer screen and/or artifacts relating to the orthographic or perspective rendering of the specimen), and the amount of time required to access the accuracy of landmarking (i.e., making sure the selected landmark is actually located on the specimen, not an artifact of the 3D rendering angle) clearly counterbalances the gains by a dramatic cost on the actual feasible sample size. Landmarking in 3D requires 10-60 sec/landmark depending on the expertise and software used (Bromiley et al. 2014). These numbers are also in the order of time required for the digitization of a complete mandible in 2D. Modern phenomics need large sample sizes to follow high-throughput genomic technologies and questions of modern genetics (Houle et al. 2010). A variety of alternatives to manual expert annotation exist or are currently under development depending on the field and the imaging modalities used (Perakis et al. 2014(Perakis et al. , 2010Liu et al. 2008;Aneja et al. 2015;Guo et al. 2013). For instance, attempts have been made to annotate landmarks on CT scans of new specimens using a machine learning algorithm applied to an initial training set created by experts (Bromiley et al. 2014), or using registrations of whole surfaces or volumes (Rolfe et al. 2011), sometimes coupled with multiple atlases, allowing precision comparable to manual editing (Young and Maga 2015). Most approaches are actually based on the dense registration of whole surfaces or volumes and therefore encapsulate homologies at the level of the whole structure. In such phenotyping protocols, constraining the registration of the whole structure with some expert annotation improves point correspondence by ensuring homologies of some anatomical features (McCane 2013). The semilandmark approach (Bookstein 1997;Gunz et al. 2005;Gunz and Mitteroecker 2013) employs such points on curves and surfaces for which homology conditions are relaxed. These landmarks do not have a one-to-one correspondence but quantify anatomical regions where precise manual annotation is not feasible or possible. The technique uses true landmarks to anchor the homology, and the optimal placement of these points is obtained by sliding them locally on the surface until either the Procrustes distance or the Bending energy is minimized (Gunz and Mitteroecker 2013).
Here, we want to revisit some of the QTL studies of mandibular shape in mice using landmarks acquired in 3D. We chose the mandible as our structure of interest because it is a well-studied model system, on which several QTL mapping studies of geometric shape have already been conducted in the past in inbred intercrosses Leamy et al. 2008). Since none of these studies have associated 3D data for their samples, we turned to a new mouse backcross between A/J and C57BL/6J that we recently reported . We wanted to reassess mandibular shape genetics using this new dataset and 3D phenotyping, and evaluate the gain of information (if any) from the third dimension, given that mandible flattening seems at first an acceptable compromise for large-scale study. As a secondary goal, we wanted to increase the phenotypic coverage of the mandible using a template of semilandmarks that was tied into the already collected expert landmarks and assess whether any further benefit was obtained from dense phenotypic coverage.

MATERIALS AND METHODS
Experimental design and statistical shape analysis All aspects of the experimental design, genotyping, the rationale for mapping shape loci using multivariate techniques, and a complete development of the multiple QTL mapping approach used are detailed in an open paper . Briefly, skulls of 433 (A/J · C57BL/ 6J) · A/J 28-day-old individuals were microCT scanned at 18 mm spatial resolution, and genotyped from liver tissue at 882 informative autosomal SNPs using the Illumina medium density linkage panel. After phenotyping and removing six incomplete specimens (see below for specific details relative to the mandible), a full generalized Procrustes analysis (Dryden and Mardia 1998) was performed on these 3D landmarks using the R/Morpho package (Schlager 2015a), and then multivariate shape QTL mapping was done using the R/shapeQTL package (Navarro 2015) of R statistical software (R Core Team 2015). All animal protocols were approved by the University of Washington's Institutional Animal Care and Use Committee.
3D phenotyping: manual landmarks and simulated 2D phenotyping Thirteen 3D landmarks from the right mandible ( Figure 1) were acquired twice from 3D renderings of original image stacks of the complete skull using 3D Slicer (Fedorov et al. 2012, http://www.slicer.org). The sets were averaged as the best estimate of the landmark location. These landmarks correspond to the classical set of 15 landmarks used in previous studies of mouse mandible genetics (Atchley et al. 1985;Klingenberg et al. 2004;Leamy et al. 2008). We initially acquired more landmarks but we found some gross or systematic errors on several of them, which were consequently removed. On the remaining set of 13, no systematic error between landmarking sessions was found (F 32;821 ¼ 1:28; p ¼ 0:14), and the percentage of measurement error was 4.38% and ranged from 2.6-7.55% per landmark.
For a fair comparison with existing results based on 2D imaging, and to better evaluate the gain from the third dimension independently of the gain of denser genotyping, we artificially flattened the mandibular 3D landmarks. To do that, we first aligned each mandible to its main axes, then chose three landmarks to start (dark gray and purple landmarks in Figure 1): the lower point of the angular process, the lower end of symphysis, and an additional landmark (purple landmark) at the top of the inner ridge meeting the molar alveolus. Positions of the first two were refined iteratively based on the innermost vertex of the mesh along the normal to the plane they defined with the third landmark. These three landmarks defined an approximate natural plane on which the mandible would lay. All 13 landmarks were then orthogonally projected according to the normal to this plane.
3D phenotyping: semilandmarks Right mandibles were segmented from the articulated heads. Image voxel resolution was reduced from 18 mm to 36 mm to make further image processing and computations more feasible. We applied a global threshold to remove nonbone material followed by a watershed algorithm to fill any gaps, since watertight meshes are critical for generating semilandmarks on the bone surface. A 3D Gaussian filter with s ¼ 0:2 was applied to reduce the noise in voxel data. Along this process, six specimens were identified with incomplete mandible scans and removed from the analysis.
Because the manual landmarks were annotated on the original fullresolution, articulated mouse heads, small differences may exist between the 3D surface and the 3D landmarks due to the image processing pipeline employed. Therefore, we back-projected the averaged landmarks onto the mandibular surface based on the shortest Euclidean distance between the landmark and the 3D mesh. This ensured that the manual landmarks also existed on the hemi-mandible meshes generated for this analysis.
A template of uniformly distributed semilandmarks was generated using poisson-disk sampling on the closest individual to the mean shape of 3D landmarks. As an alternative, targeted templates using curves and patches may have been developed to focus on specific anatomical features like ridges on the surface (e.g., masseteric ridge) or borders (Swiderski and Zelditch 2013;Anderson et al. 2014), but our aim was to model the bone surface densely. Points lying on the incisors or molars were also manually identified and removed from the template. In the end, we retained 579 semilandmarks (green landmarks in Figure 1) in addition to our 13 expert-annotated landmarks. This template was transferred onto new samples by thin-plate splines based on these 13 landmarks.
Semilandmarks were further slided based on the bending energy and back-projected on the actual mesh surfaces after the sliding relaxation (Gunz et al. 2005). Both the expert-annotated landmark and the semilandmark data were subjected to two independent full GPA. The described procedure made use of the vcgSample function from the Rvcg n  (Cox et al. 2009) and were converted from earlier map or physical position using the Jackson Laboratory's Marker Query Tool.

Shape QTL mapping
The effect at the locus l was estimated using the multivariate linear where x ic is the value of the covariate c and p ij ¼ Prðg i ¼ jjM i Þ is the probability of the QTL genotypes given the flanking markers M for individual i. These probabilities were computed using R=qtl (Broman et al. 2003). The effects b are the q-dimensional effect of the covariate c (i.e., log of the centroid size, gender and direction-of-cross) or of the genotype j representing the direction and magnitude of the shape change of the overall configuration of landmarks within the shape space.
A forward/backward algorithm was used for multiple QTL model searching. This procedure drops and refines positions of additive QTL without any prior knowledge of their number per chromosome, and compares models based on a penalized LOD score (Broman and Speed 2002;Broman and Sen 2009;Manichaikul et al. 2009), pLODðgÞ ¼ 2 log 10 p 2 Tjgj. The forward search was repeated up to a model g including 50 QTL. The penalty T for each additional QTL was evaluated using the 1000 permutations approach (Churchill and Doerge 1994). One classic inferential approach in geometric morphometrics uses the sum of the residual sum of squares (Goodall 1991).
Here, pLOD scores are derived from the Pillai's trace, a classic multivariate statistic, which makes use of covariances and is proven to be fairly robust in a variety of situations (Olson 1974(Olson , 1976Tabachnick and Fidell 2013). However, it is important to note that recent approaches to decipher the basis of adaptation or speciation using RAD-seq or similar techniques (see Jones and Good 2016, for a review) in nonmodel organisms may have fairly small sample sizes compared to the dimensionality of phenotypes (e.g., Huber et al. 2015). Such n p studies might gain in robustness by focusing just on variances, dropping the cost of estimating covariances, as suggested also in other contexts (Adams 2014). Our multivariate approach for mapping multiple QTL is very similar to the one developed for the mapping of functionvalued traits, with pLOD based on the sum of the residual sum of squares (Kwak et al. 2014;Gray et al. 2015). Bayes credible intervals of QTL were computed from the 10 LODðuÞ profile (Dupuis and Siegmund 1999;Sen and Churchill 2001;Manichaikul et al. 2006).
The magnitude of shape changes associated with each QTL was expressed first in units of Procrustes distance as the norm of the additive Workman et al. 2002). Also, the amount of variation accounted for, given all the other QTL and covariates, was reported as the percentage of total Procrustes variance (%SST in Supplemental Material, Table S1 and Table S2), a statistic routinely used with multivariate linear models (e.g., Monteiro 1999). We also reported the effect size as a percentage of variance accounted for in the specific direction defined by the additive vector b j in the shape space (%SS proj Scores in Table S1 and Table S2). For that, we defined a new shape variable, v ¼ Yb t j ðb j b t j Þ 20:5 , corresponding to the shape variable most associated with the shape changes defined by b j and containing both the effect and the residuals in that specific direction (Drake and Klingenberg 2008). The proportion of those projection scores explained by the QTL j is then the ratio of variance between the Eðvjp j Þ and the scores v. Its expectation with backcross and unlinked QTL is 21 . QTL-based G matrices were compared across the three approaches using only the xyðzÞ-coordinates of the 13 manual landmarks that were common across the set of matrices to be compared. Many approaches for matrix similarity have been used in the literature for comparing G matrices across populations (e.g., Aguirre et al. 2013;Teplitsky et al. 2014) or between levels of variation (see for example Debat et al. 2009, for contrasting canalization and developmental stability). Overall distance between G matrices was assessed with the root Euclidean distance (Dryden et al. 2009 where ULU 21 is the spectral decomposition of G. Then, the similarity between g max was measured as their angle, which was compared to 100; 000 pairs of random vectors to assess the significance of whether two g max were more similar than pairs of random vectors . To extend this comparison beyond g max and pairwise comparisons, we computed the common subspace H across the three G (Krzanowski 1979) where A i corresponds to the first q i eigenvectors of G i . We chose q i based on the cumulative amount of variance accounted for ( $ 90%). Briefly, the spectral decomposition of H provided axes maximizing the similarity among matrices D ¼  heritability. With multidimensional traits, heritability is also multidimensional (Klingenberg 2010), meaning that beyond an overall amount of heritable variance there are also directions across the shape space accounting for a varying degree of heritable variation. The multivariate analog to h 2 is GP 2 , where 2 stands for the Moore-Penrose generalized inverse. Its spectral decomposition ULU 21 provided these directions in the shape space maximizing heritability .
These shape features u h as well as any eigenvector of G, P, or QTL effects b j might then be amplified and visualized as a colorized 3D surface using thin-plate spline mapping from the mean shape and its mesh model, and by computing the signed distance between these extrapolated meshes. This visualization procedure made use of tps3d and meshDist functions from the R=Morpho package (Schlager 2015a).
Comparison with previous mapping data Mandible shape QTL have already been assessed in several studies using 2D landmarks on F2 mice from a LG/J · SM/J intercross (Klingenberg et al. , 2004 or on F3 mice from the same cross (Leamy et al. 2008). Despite missing confidence intervals for the discovered QTL, the latter was kept for comparison, because it represented a follow-up study on the former two with more markers and more individuals. A genomewide association study using the first generation of wild-caught mice from a hybrid zone (Pallares et al. 2014) was also included in comparisons. It differs from the three others by its use of outbred hybrid mice, 3D data, but smaller sample size (Table 1).
The closest proximal and distal markers given for the confidence intervals in earlier studies were converted to the current genetic map (Cox et al. 2009) using the Jackson Laboratory's Marker Query Tool. The SNPs used in the mapping from F3 of the LG/J · SM/J intercross (Leamy et al. 2008) were converted to the Cox map using the Mouse Map Converter tool of the Jackson Laboratory using the SNP IDs. Six SNPs were not recovered from their IDs in the conversion. Genomic positions in the NCBI build37 for these SNPs were known from the updated heterogeneous stock data (Shifman et al. 2006;Valdar et al. 2006). They were then converted from those NCBI build37 genomic coordinates to the Cox map using the Mouse Map Converter tool. The genomic positions of loci discovered in Pallares et al. (2014) were converted from the GRCm38 coordinates to the Cox map using the same tool.
Some markers used in LG · SM studies (Klingenberg et al. , 2004 are now known to be syntenic and do not localize to a specific location in the genome. Overall, three QTL were removed from the comparison, and left or right positions of the confidence interval were imputed with values from the central position for three others. The central positions of two other QTL were treated as missing because the new map positions fell outside the confidence interval, and the closest flanking marker was several dozen cM away. All these imputations led to slightly underestimated descriptive statistics on confidence intervals from previous studies. Such bias is nonetheless against any evidence of gain in the precision of QTL locations with more recent data.

Data availability
Genotypes and phenotypes are available from File S1 as a cross object readable by R=qtl or R=shapeQTL.

Mandible shape variation
Variation from expert-annotated 3D landmarks was dispersed with 22 out of 32 principal components (PC) with a variance higher than 1%, and explaining 94% of the total Procrustes variance. Two first PCs explained about 13% each (Figure 2A). Interestingly, the interpolated shape changes (using thin-plate-spline) associated with PC1 show strong correlated deformations in muscle insertion regions where no landmarks are digitized to actually capture genuine variation in these regions. Shape variation in 2D was dispersed with 18 out of 22 PCs with a variance higher than 1%, explaining 97% of the total Procrustes variance and the first three PCs explaining between 10-14% each. Based only on the xy-coordinates of the 13 landmarks, major axes of variation are more similar than two random vectors (a ¼ 25:3°; p , 10 25 ) once the permutation between PC1 and PC2 on 3D landmarks, which account for almost the same amount of variance, is controlled. Figure 3 Percentage of shape changes within the xy-plane (black) and along the z direction (gray). The proportion of shape effects b that lie along the m th dimension (m= 1, 2, or 3) is the sum of the squared effects over the k landmarks on the m th dimension normalized by the norm of the effect, P k i¼1 b 2 i;m =kbk. The figure may be understood as the proportion of changes that are embedded in the flat plane (xy) or that get out of this plane (z). Such parametrization is sensical here, despite the fact that shapes are invariant to rotation by definition, only because they were specifically oriented according to this specific coordinate system. (A) Principal components (PC), covariate, and QTL (quantitative trait loci) effects for 3D landmarks. (B) Principal components, covariate, and QTL effects for semilandmarks. Covariates are noted CS for the log of the centroid size, X for the direction of the cross, and G for gender.
Semilandmark variation was most dispersed with 426 nonnull eigenvalues from which 19 PCs, with a percentage of explained variance higher than 1%, explained 79% of the total Procrustes variance, the first two explaining 13% and 14% each. We chose to use the first 71 PCs that accounted for 95% of the total variance. Based only on fixed landmarks, the main axis of phenotypic variation is very similar to PC1 from 2D landmarks (a ¼ 40:9°; p ¼ 2 · 10 25 ), or to PC2 from the 3D landmarks (a ¼ 26:6°; p , 10 25 ).

Covariate analysis
The main effects of centroid size, gender, and directionality of the cross on mandible shape were found to be significant (p , 0.0001) but no significant interactions among them were found. Altogether, they explained 4.5% of the total Procrustes variance. The direction-of-cross was the major effect in this sample, explaining 2.5% of the total Procrustes variance (Table S1). Covariate results with 2D shapes were similar to the 3D data (Table S1). Results from the multivariate linear model of the three covariates were similar to those from the 3D manual landmark only, with the covariates explaining 5.5% of the variance altogether.
2D embedding of shape variation 3D shapes from manual landmarks were oriented according to the reference 2D plane allowing the decomposition of each effect according to the antero-posterior and bucco-lingual axes (Figure 3). The variation of the 3D landmarks on each PC in the bucco-lingual direction is highly variable (light gray amount in Figure 3). However, major axes of variation (PC1 to PC10; 73.6% of the total Procrustes variance) are embedded mainly in the antero-posterior/dorso-ventral plane of the mandible (black and dark gray), with very little of this variation in the bucco-lingual direction (4.4%). Accordingly, only 9-13% of covariate effects (size, sex, or direction-of-cross) were along this bucco-lingual axis. Shape variation from the semilandmarks was again mostly embedded in the flat plane with only 5.9% of the variation in the bucco-lingual direction, and covariate effects were mostly within this plane (90% of the effect).

QTL mapping of mandible shape
In all three cases (2D, Manual 3D landmarks, and Semilandmarks), the three covariates (log of the centroid size, gender, and direction-of-cross) were included in the QTL mapping as additive covariates. Multivariate QTL mapping identified between 17 QTL for 2D landmarks, 19 QTL for 3D manual landmarks, and 23 QTL for the 3D semilandmark analysis. All autosomes but two (chromosomes 18 and 19) harbor at least one, and in some cases up to three, QTL (chromosome 11) depending on the phenotyping method used. Summaries of the positions of the discovered QTL (location, nearest marker, and the confidence intervals) are provided in Table 2 for semilandmarks. Summaries for all three cases are provided in Table S2 and plotted on Figure 4. The median widths of the confidence intervals were 9 cM for both 3D landmarks and their 2D projections, and 5 cM for the semilandmark dataset, and three quarters of the confidence intervals were smaller than 17.3 cM, 20 cM, and 9.2 cM, respectively, for these three datasets (Table 1). Thirteen QTL were replicated across the three approaches, all 2D QTL were replicated in 3D but some were split in two with the semilandmark data or were not captured, eight were only mapped with the semilandmarks, and two only with the 3D landmarks ( Figure 4 and Table S2).
Effect sizes of QTL were small (1% of total Procrustes variance; Table S2) but explained on average 12% of the variance in the specific direction defined by the QTL effect (regression projection scores).
n a Number of protein-coding genes in the interval. b Number of candidate genes annotated for "mandible" in the MGI databases in the QTL confidence interval from 2D or 3D datasets. c Candidate genes annotated for "mandible" in the MGI databases in the QTL confidence interval for the semilandmark analysis. Candidates with nonsynonymous or splice-site variants between AJ and C57BL/6J are indicated in bold.
Replicated QTL between 2D and 3D data explained a similar percentage of total Procrustes variance (Wilcoxon test: V 16 ¼ 57; p ¼ 0:82), but 3D effects explained a significantly greater percentage of variance in the specific direction of the QTL (projection scores) than 2D effects (Wilcoxon test: V 16 ¼ 153; p ¼ 7:63 · 10 26 ). According to shape variation and covariate effects, QTL act mainly in the 2D plane ( Figure 3). However, the contribution of the third dimension may be as high as 30% for some QTL with the manual landmarks or the semilandmarks (Figure 3). It is important to note that these QTL were not detected in the 2D analysis (Table S2). Visualization of QTL effects from 3D landmarks only or semilandmarks are provided in supplementary Figure S1 and Figure S2.

Comparison of QTL-based G matrices
Altogether, these QTL accounted for 14.6% (2D), 15.1% (3D), and 16.9% (semilandmarks) of the total Procrustes variance. From those, 12.8% (3D) and 14.4% (semilandmarks) of the genetic variance are along the third dimension. Overall differences among the three G 2k matrices measured by the root Euclidean distances d H are low, the semilandmark matrix appearing as the most different (d Semi 2 2D=3D ¼ 0:012 and d 2D 2 3D ¼ 0:002). In agreement with this observation and similarly to P matrices, G matrices presented a very similar eigenvalue profile among the three approaches ( Figure 2) with g max explaining about 20% of the variance. Here, there was also a permutation between the two first PC in the 3D landmarks compared to the two other datasets. Once this permutation was controlled, these three g max were more similar than random vectors ða 2D=3D ¼ 18:2°; p , 10 25 ; a 2D=Semi ¼ 61:4°; p ¼ 0:012; a 3D=Semi ¼ 69°; p ¼ 0:025Þ based only on the xy-coordinates of the 13 fixed landmarks for the two first comparisons and on xyz-coordinates for the last one. The Krzanowski's common subspace analysis confirmed these observations. Angles between the first seven eigenvectors of the common subspace H and the q PCs accounting for 90% of the variance for each matrices (q ¼ 9 for 2D landmarks or 10 for the G 2k based on 3D landmarks or semilandmark datasets) were small (d ranging from 1.6-16°). Their associated eigenvalues were very close to their maximum value of three (D ranging from 2.997-2.859). This means that the common subspace may be almost perfectly recovered from linear combinations of the q i eigenvectors of any of the G 2k matrices. The semilandmark G 2k matrix appeared again to be the quickest to diverge, underlying the additional information we get from semilandmarks even if they were not explicitly taken into account in the construction of the common subspace.
Multivariate heritabilities are systematically greater with 3D than 2D data ( Figure 5) but in the same order ranging from 0.55-0.01. Semilandmarks show dimensions with a heritability ranging from 0.76-0.15. The shape changes associated with the most heritable dimension were very different to the one observed on 3D landmarks only ( Figure 5). Such discrepancies seem easily explained as the strong changes appeared to map on mandible surface and ridges where no manual landmarks could be easily captured.

DISCUSSION
Previous studies using QTL mapping of mandible size and shape in mouse have relied typically on 2D landmarks and sparse sampling of the genome using microsatellite markers (Klingenberg et al. , 2004 or a few hundred SNPs (Leamy et al. 2008). Our primary purpose in this study was to validate QTL responsible for the variation in mandibular shape observed in mice using 3D phenotyping with denser genotyping than previously attempted, and assess the gains (if any) from the consideration of the third dimension.
We detected marginally fewer QTL than studies based on the LG/J · SM/J intercross (Table 1). This might be expected as we fitted a model with all QTL simultaneously, whereas the LG/J · SM/J intercross studies modeled the genetics of the trait at the chromosome-wide level and used sample sizes about three times bigger. Effects of sample size and/or of allele frequencies seem striking in the study of Pallares et al. (2014), where only 10 loci were discovered but with greater precision, thanks to their use of an outbred population and a dense SNP map. Overall, about half of the 23 QTL we discovered had already been reported in the literature. Only one region on chromosome 15 seemed to be replicated across the three different crosses (Figure 4). Another GWAS locus on chromosome 16 was replicated in our cross. Two QTL previously detected with imprinting due to maternal genetic effects (Leamy et al. 2008) were also replicated here. At the time of the submission, a 3D geometric morphometric study using about 700 laboratory outbred mice, 80,000 SNPs, and univariate association mapping of the main principal components was published (Pallares et al. 2015). They mapped seven loci for mandible shape with great precision and their main finding, which was related to the Mn1 gene, seems to be replicated here based on gene content (SH8 on chromosome 5 in Table 2).
It may initially appear that not much difference exists between the mapping results based on 3D manual landmarks and their 2D projections, due to their significant overlap (Figure 4). However, the benefit of adding the third dimension is demonstrated by the reduction in the confidence intervals of the QTL estimates. The median CI Figure 4 QTL from 2D, 3D, and semilandmark analyses. Results from earlier studies from the LG/J · SM/J intercrosses (Klingenberg et al. , 2004Leamy et al. 2008) and from the Pallares et al. (2014) GWAS (genome-wide association study) are plotted in the gray boxes. estimate for our 2D projection results was 11 cM. Yet, when we conducted the mapping on 3D landmarks, extreme CIs were marginally reduced and median CI was down to 6.3 cM when we used semilandmarks. The difference was even more striking when we considered the total length of the QTL intervals, which dropped by 120 cM, representing a 38% drop in the number of known protein coding genes with 75% of QTL covering at most 146 protein coding genes (358 with 2D and 271 with 3D landmarks). The 2D projection data yielded very similar results to the one of 13.3 cM obtained from the remaining 24 QTL reported in earlier studies (Klingenberg et al. , 2004) despite a marked difference in molecular marker density (10-fold).
The consideration of the third dimension leads to the discovery, with 3D landmarks, of two additional QTL on chromosome 1 and 14, both QTL covering a few mandible-annotated genes (Disp1, Hhat, Ifr6, and Kat6b). Adding one dimension to each landmark (i.e., 10 independent dimensions to the shape space) increases detection power. This increase in power is not just related to bigger effect sizes or higher dimensionality, which could be more costly than useful in some cases (Healy 1969;Adams 2014). The two additional QTL actually have 26% and 32% of their effects along these additional dimensions. Thus, the specific consideration of the third dimension in the mapping is clearly very informative in these cases. The relaxed constraints on homology of semilandmarks and their ability to densely model the surface of bone led to the discovery of eight additional QTL. Three of them show between 20-30% of their effect on the third dimension. Three of those additional QTL cover one to four mandible-annotated candidate genes: Lef1, Rere, and in the same QTL Akap13, Kif7, Serpinh1, and Folr1. The overall additive genetic variation (G) is consistent between 2D and 3D data, both in terms of the amount of total variation captured and the directionality of major variations. Beyond an overall similarity with other data, semilandmarks capture original shape features related to the bone surface, and those features drive the pattern of multivariate heritability (GP 2 ).
Multivariate approaches have been repeatedly shown to be more powerful than analyzing individual PCs (or univariate traits) independently in various contexts (see for instance in GWAS, Galesloot et al. 2014;Gao et al. 2014;Stephens 2013). However, this univariate PC approach is commonly used, at least for operational reasons (e.g., Pallares et al. 2014Pallares et al. , 2015Percival et al. 2016, with geometric morphometric data). An incidental result of our study is a plea for the use of a fully multivariate approach with shape data, not only because shape is a single multidimensional trait (Klingenberg and Gidaszewski 2010;Collyer et al. 2015). Major phenotypic PCs appear to have only minor components of variance on the third dimension, whereas some QTL present up to a third of their variation in that specific direction. Prior selection of shape variables based on phenotypic PCA will reduce effect sizes by a third for those QTL, thus reducing power. We have chosen to reduce the dimensionality of the semilandmark data using such a technique. However, we kept most of the total variance (95%) while discarding a large amount of nonnull dimensions (355). Our assumption is that the remaining variance was only random variation. Such noisiness is inherent to the registration process of semilandmarks. This assumption leads us to not reestimate the effects and their associated effect sizes on the complete shape, as doing so will only marginally change our estimates. However, this reestimation of QTL effects on the complete shape may be good practice when the QTL detection is, for some reason, done on a strongly reduced shape space (Pallares et al. 2014(Pallares et al. , 2015Liu et al. 2014;Zeng et al. 2000;Langlade et al. 2005;Mezey et al. 2005;Franchini et al. 2013). Major QTL may be detected with the first PCs only, but they also have pleiotropic effects on additional dimensions.
In conclusion, the congruence of our results pleads for robustness of our knowledge on the genetic architecture of the mouse mandible, built over a few decades and initially based on 2D imaging techniques. There are many benefits of performing 2D morphometrics in large phenotyping programs, which can be summarized as the simplicity of the techniques and the time saved, but they come at the price of accuracy. Despite inherent difficulties and workloads that may impede the broader use of 3D techniques, information can be gained even from fairly flat structures like the mouse mandible. Once these technical and cost difficulties have been overcome, it appears that making the most of new technologies by opting for denser phenotyping is worth the supplementary technical expertise that is required, and may provide some new insights on the genetics of shape.

ACKNOWLEDGMENTS
We acknowledge Erika Jessett for her help in image processing, Sarah Park for her help in DNA sample preparation, and we would like to thank two anonymous reviewers for their constructive comments. We acknowledge the following institutions and endowments for sponsoring portions of this research: a National Institutes of Health/National Institute of Dental and Craniofacial Research Pathways to Independence award to A.