Estimating heritability in honeybees: Comparison of three major methods based on empirical and simulated datasets

The genetic contribution to phenotypic variation (namely the heritability) affects the response to selection. In honeybee, the haplodiploid sex determination does not allow the straightforward use of classical quantitative genetics methods to estimate heritability and genetic correlation. Nevertheless, specific methods have been devel oped for about 40 years. In particular, sibling analyses are frequently used with three main methods: an historical model using the average colony relatedness, a half- sib/ full- sib model, and the more recent animal model. We compared those three methods using experimental and simulated datasets to see which performs the best. Our experimental dataset is composed of 10 colonies with a total sample of 853 workers. All individuals were genotyped to reconstitute the pedigree, and phenotypic traits were measured: labial palpus and wing cubital veins lengths. We also simulated phenotypic datasets with varying levels of heritability, common environment effect, and genetic correlation between traits. The simulation approach showed that the average colony relatedness was highly biased in presence of common environment effect whereas the half- sib/full- sib and the animal model gave reliable estimates of heritability. The animal model provided the greatest precision in genetic correlations. Using this latter method, we found that wing vein lengths had high heritabilities whereas the palpus length had lower heritability due to larger environmental variance and/or measurement error. Finally, significant genetic correlations among measured traits indicate that they do not evolve independently.

the epistatic variance (due to interaction of alleles at different loci).
The response to selection depends only on the additive part of the genetic variance as indicated by the (univariate) breeder's equation R = h 2 S, where R is the expected change in trait means per generation, h 2 the narrow-sense heritability and S the selection differential (Lush & Hubbs, 1945). Quantitative genetics analyses allow breeders to improve economically important characteristics in animals or plants and evolutionary biologists to understand or predict trait evolution.
Similarly, studies of heritability in the honeybee, Apis mellifera, had the primary goal of increasing honey production, disease resistance, and decreasing swarming and aggression (Koffler et al., 2017). Such economical concerns had motivated the development of specific solutions to apply quantitative genetics analyses to this haplodiploid polyandrous species. First, the morphological, physiological, and behavioral differences between castes preclude the use of straightforward parent-offspring regression approach (but coloration traits can be analyzed, e.g., Szabo & Lefkovitch, 1992). A noticeable exception is the Cape honeybee, Apis mellifera capensis, in which laying workers produce female workers, allowing parent-offspring regression to be performed (Brandes, 1988;Le Conte et al., 1994;Moritz & Hillesheim, 1985;Moritz & Klepsch, 1985). More importantly, the general framework of quantitative genetics is best developed for diploid organisms, reproducing randomly. If noncompliance with the last assumption has rather minor consequences on heritability estimates, the asymmetrical inheritance of parental genomes, due to haplodiploid sex determination, greatly complicates analyses.
For the last 40 years, adaptations of existing sibling analyses methods and protocols available for diploid species have been proposed. In 1977, Rinderer proposed to use artificial insemination of queens to control relatedness of the workers (Rinderer, 1977). In the case of single-drone inseminations, a colony consists of a fullsib family of workers with known relatedness r = 3 4 whereas in the case of multiple-drone inseminations (with more than 20 drones), the colony approximates a diploid-diploid half-sib system (with = 1 4 ) (e.g., Collins et al., ,1984Collins et al., , , 1987Harbo, 1992;Rinderer et al., 1983).
Heritability is then estimated from the intracolony correlation t (also called the sibling correlation) such that h 2 = t r . Rinderer's method was soon improved to avoid time-consuming and expensive artificial insemination. In 1983, Oldroyd and Moran published a general formulae to calculate the average colony relatedness when a queen is mated with m drones, r = 1 4 + 1 2m (also published by Laidlaw & Page, 1984or Milne & Friars, 1984. This approach has been widely used (until 2012(until , Goudie et al., 2012 if it was early recognized that it produced upward bias in h 2 estimates (Diniz-Filho et al., 1993;Goudie et al., 2012;Melo et al., 1997;Oldroyd & Moran, 1983;Oldroyd et al., 1991;Poklukar & Kezić, 1994). Indeed, sib workers within a colony are raised in a common environment which further increases phenotypic resemblance beyond genetic effects (Oldroyd et al., 1991). Actually, this common environment is similar to a maternal (phenotypic) effect in which sibling's phenotypes are influenced by the maternal phenotype and by the environment provided by the mother.
When paternal lineages within colonies can be artificially manipulated (thanks to controlled fertilization) or identified based on genotyping, a typical half-sib/full-sib design can be applied. Liu and Smith derived heritability calculation of this sib design (and three other experimental designs) in the case of one male mated with several females (Liu & Smith, 2000). The reverse situation (one femalequeen mated with several male-drones) is the natural situation in the honeybee and this leads to different formulae, as pointed out by Fjerdingstad (Fjerdingstad, 2005;Harpur et al., 2014;Laloi & Pham-Delegue, 2010).
With molecular genotyping, the additive genetic relationship matrix among individuals can be derived from the pedigree and included as random factor into the so-called animal model, a linear mixed model (Wilson et al., 2010). This approach is more flexible than the half-sib/full-sib method since it considers all known degrees of relatedness between measured individuals and it does not require a balanced breeding design. However, the main issue when using the animal model is the inversion of the additive genetic matrix (which is not diagonal) in haplodiploid species. Bienefield et al. (Bienefeld et al., 2007applied in Costa-Maia et al., 2011Faquinello et al., 2011;Pernal et al., 2012) proposed a first solution using an approximation which was further improved by Brascamp and Bijma (Brascamp & Bijma, 2014;Brascamp et al., 2016). Even more recently, Bernstein et al. (Bernstein et al., 2018) published a new algorithm to facilitate the previous method on large datasets. Another approach to invert the additive genetic matrix is based on methods specifically developed for sex chromosome inheritance, which is functionally equivalent to haplodiploidy (Bohidar, 1964;Crow & Roberts, 1950;Crozier, 1970;Fernando & Grossman, 1990). In haplodiploid species, a mother contributes to the genome of both sons and daughters, whereas a father contributes only to the genome of daughters.
Similarly, in sex chromosome, genes linked to the X-chromosome are transmitted to daughters and sons, whereas genes linked to the Y-chromosomes are only transmitted to sons. This method of matrix inversion is implemented in the R package nadiv (Wolak, 2012), which has been applied to wasps but never to honeybees (Sheehan et al., 2017).
However, many traits under selection are genetically correlated and will not evolve independently from each other, due to linkage or pleiotropy (one loci influencing several traits). In such case, estimating genetic correlation between traits allows a better understanding of trait evolution. They are calculated following the same approaches as heritability estimates but require larger sample size to obtain reliable estimates (around 1,000 individuals when considering only two traits, Brown, 1969).
These three sibling analysis methods, average colony relatedness, half-sib/full-sib, and animal model, have allowed estimating heritability and genetic correlations of a vast number of traits in honeybee (e.g., morphology, behavior, life-history traits). Koffler et al. (2017) reviewed published estimates in honeybee (and other bee, ant, and wasp species), and they showed that heritability estimates varied between trait types. This is expected from Fisher's fundamental theorem saying that traits more closely related to fitness should be more submitted to selection which decreases genetic variation (Fisher, 1958). However, their meta-analysis did not detect any effect of the analytical methods used (among which sibling analyses and parent-offspring regression) on heritability estimates. This result, contrasting with previously published studies (e.g., Postma, 2014), could either be due to restricted statistical power or to biases inherent to meta-analysis: datasets diverge on several factors (analytical method, sample size, and traits) resulting in confounded effects.
In addition, a few studies compared different methods in the same honeybee dataset and found that those methods yielded heritability estimates in the same range (Brandes, 1988;Moritz, 1985;Moritz & Klepsch, 1985). Genetic correlation estimates are scarce (14 studies only in honeybee), and half of the estimates are not significantly different from 0 or were not provided with any significance tests or standard errors (Koffler et al., 2017). Overall, previous works do not allow to draw any firm conclusion on the performance of different methods, sample size, or experimental design to estimate genetic correlation and heritabilities. One option is to use simulation studies to compare available methods (as well as sample size and design) based on their accuracy of estimates and on their statistical power to detect heritability and genetic correlation (De Villemereuil et al., 2013;Holand & Steinsland, 2016;Morrissey et al., 2007).
In this paper, we used both empirical and simulated datasets to compare the performance of three sibling analysis methods, average colony relatedness, half-sib/full-sib, and animal model, in the estimation of heritability and genetic correlations. Our empirical dataset is composed of three morphological traits measured on 853 workers from ten colonies of Apis mellifera unicolor sampled in two islands of the South-West Indian Ocean. The three traits (associated with mouthpart and forewing morphometry) are used to discriminate between species or populations, indicating relatively fast evolutionary rates (Cornuet et al., 1975;Ruttner, 1988;Ruttner et al., 1978). We also generated simulated phenotypic datasets with varying levels of heritability, genetic correlation, and common environment effect.

| Empirical dataset
Honeybee workers were sampled in Mauritius and La Reunion, two islands of the Mascareignes archipelago situated in the South-West Indian Ocean. In this area, the local honeybee subspecies is A. m. unicolor belonging to the African lineage with recent import of the A. m. carnica, ligustica, and mellifera subspecies belonging to the European lineage (Techer, Clémencet, Simiand, Turpin, et al., 2017).
We sampled 95 worker honey bees per colony, from the frames of the hives, in a total of ten colonies. Those workers are a mixture of full-sisters and half-sisters (sharing the same queen mother but different fathers). Five colonies were sampled on Mauritius in October 2014, whereas five colonies were sampled on La Reunion from 2011 to 2018 (Appendix S1). Individuals were kept in 95% ethanol at −80°C until they were processed.

| Morphometric measurements
For each worker, the mouthparts and the right forewing were dissected and digitally photographed. The right forewing of each bee was cut at its base and mounted in water between micrometer blade and cover. The mouthparts were dissected and then mounted on a micrometer slide with a strip of adhesive tape.
Pictures were acquired using a video camera, and measurements were made on the AxioVision SE4 software 4.7. To study the variation in size of the mouthparts, we measured the length of the long segment of the right palpus (Appendix S3) which is not influenced by the extension of the proboscis and hence more repeatable than the proboscis (Morimoto, 1968). For the wings, we measured the length of the cubital veins A and B and computed the cubital index (CuI) as the ratio A B , an index informing on the shape of the wing. This index allows to discriminate between species, subspecies, and even populations (Cornuet et al., 1975;Ruttner, 1988;Ruttner et al., 1978).
The palpus length and cubital cell of 36 individuals were measured 4 times with two montages and two measurements for each of these montages. Measurement errors were mainly due to slide mounting, and it was almost null for the cubital cell (A, B, and CuI) and less than 2% for the palpus length. Therefore, experimental noise would not artificially increase the residual variance and bias heritability.
All measured traits (except for CuI) significantly differed between colonies and between islands (Appendix S4). Morphological measurements were globally larger on La Reunion than on Mauritius.

| Quantitative genetics analysis methods
Three commonly used methods based on sibling analysis were compared for their performances in estimating heritabilities and genetic correlations: (a) a simple linear model considering a colony effect and an average colony relatedness between individuals (Oldroyd & Moran, 1983); (b) a nested model of patrilines in the colonies equivalent to a half-sib/full-sib approach (Fjerdingstad, 2005); and (c) an animal model based on the pedigree of individuals (Sheehan et al., 2017).
In the three methods, dominance variance (V D ) cannot be separated from additive variance (V A ) due to haplodiploidy (Fjerdingstad, 2005;Liu & Smith, 2000). Thus, only the broad sense heritability is estimated: In addition, those methods do not model the epistatic variance which will contribute to the genetic variance. This epistatic variance is very difficult to estimate and assumed to be negligible, at least for diploid organisms (Lynch & Walsh, 1998).
The first method does not require the identification of patrilines of the measured workers but an estimate of the number of efficient matings (m) is needed. Heritability is then calculated as follows: where r is the average colony relatedness r = 1 4 + 1 2m (Oldroyd & Moran, 1983) and t the intracolony correlation t = with V col the variance among colonies and V R the (residual) variance within colonies (Lynch & Walsh, 1998). Note that the estimate of r changes little when m is greater than 8, which appears to be usually the case for freely mating queens (Oldroyd & Moran, 1983).
The genetic correlation (r G ) is given by: with cov col t1t2 the covariance between traits t 1 and t 2 among colonies.
The second method is based on the identification of the patriline of each worker and applies a mixed-effects model with patrilines nested within colonies as random effect (Fjerdingstad, 2005).
Heritability is calculated according to the following formula: (with V pat the variance among patrilines, V CE the variance associated with the common environment of the colony and V R the residual variance) owing that V pat = 1 2 V A + 1 2 V D . Note that the coefficient for the dominance variance is 1 2 because all paternal half-sibs get the same alleles from their father.
The genetic correlation is estimated by: with cov pat t1t2 the covariance between traits t 1 and t 2 among patrilines.
The third method is based on a mixed-effect linear model where the pedigree is used to derive the additive genetic relationship matrix among individuals, fitted as random effect. To create the inverse of the additive genetic matrix required by asreml-R, we used the makeS function of the nadiv package implemented in R (Wolak, 2012). This function returns the inverse of the additive genetic relationship matrix for the sex chromosomes since haplodiploidy is equivalent to total sex-linkage of all genes. Here diploid females are coded as the heterogametic sex whereas haploid males are the homogametic sex.
A "colony" (common environment) random effect is added to distinguish the effect of common environment from the genetic effect of the queen. Thus, heritability can be calculated as follows: where V A is the variance associated with the additive genetic matrix (remember that in our case, V A is confounded with V D ), V CE the variance associated with the common environment of the colony and V R is the residual variance.
The genetic correlation between two traits (t 1 and t 2 ) is calculated based on bivariate animal models according to the formula: with cov G the genetic covariance between traits t 1 and t 2 .
In the second and the third methods, the amount of variance attributable to the common environment effect (CE 2 ) can be calculated as follows: To test whether heritability and genetic correlation estimates are significantly different from 0, log-likelihood ratio tests between nested models (with and without colony/patriline/additive matrix/ genetic covariance) were performed.

| Quantitative genetics analyses of the empirical dataset
We applied the three methods described above to calculate heritabilities and genetic correlations between our measured traits (A, B, CuI, and palpus). The ASReml-R package (Gilmour et al., 2009)  Using the animal model, we also tested if the additive, common environment, and residual variances differed between islands using a bivariate model. Log-likelihood ratio tests allowed testing if variances were significantly different between islands. The approximate standard errors of heritabilities and genetic correlations were estimated with the nadiv R package, applying the delta method (Lynch & Walsh, 1998;Wolak, 2012).

| Quantitative genetics analyses of simulated datasets
We performed simulations to test the performances of the three methods in estimating heritability and genetic correlations. Our hypotheses were that ( and ε i , the residual variation (normally distributed with V R variance).
The genetic values A i were generated according to the pedigree (see details below) and V A using the grfx function of the nadiv package (Wolak, 2012). We did not simulate any epistatic variance.
In each scenario studied, the heritability (h 2 ) and common en- For each scenario, 1,000 phenotypic datasets were simulated using one of the three tested pedigrees. The first one is our unbalanced experimental pedigree (with 853 workers in 10 colonies). The second one is a simulated balanced pedigree. This simulated pedigree was built to be of similar size and design to our experimental pedigree: It is composed of 840 phenotyped individuals from 10 queens each crossed with 42 males (against an average of 38 males in our real pedigree), with a full-sib family size of two workers. The third pedigree was a modified pedigree from our experimental dataset (i.e., with 853 individuals), where three queens of three different colonies (out of the 10 colonies in the dataset) were considered to be full-sisters, therefore adding relatedness relationship between those three colonies.
Therefore we generated 48,000 datasets (16 phenotypic scenarios × 3 pedigrees × 1,000 datasets). We estimated the heritability, confidence intervals, and significance of the "genetic" effect (corresponding to the additive genetic matrix) of each simulated phenotypic dataset with the three previously described methods (average colony relatedness, half-sib/full-sib, and animal model), except for datasets generated with the third pedigree (with related queens) for which we compared only the half-sib/full-sib and animal model.
For each dataset, we computed the mean and 2.5%, 25%, 75%, and 97.5% quantiles to quantify the bias and the dispersion of heritability estimates. We also calculated the power to detect heritability as the percentage of simulated datasets with a significant genetic effect. However, when estimates are biased, power is not a suitable parameter to assess the performance of a method and should be replaced by the coverage of 95% of confidence intervals (Bolker, 2008). Confidence intervals were calculated assuming a normal distribution of estimates (h 2 ± 1.96 × SE). Coverages were computed as the proportion of datasets in which simulated value of heritability was within the 95% confidence interval.

| Genetic correlation
Using the grfx function of the nadiv R package, we simulated two correlated traits with heritabilities being a combination of 0.1, 0.3, and 0.5 (hence nine combinations). We skipped the null value since there could not be genetic correlation between traits that are not heritable. We did not include any common environmental effect.
We simulated only positive genetic correlations between traits, considering that the results would be symmetrical with negative genetic correlations. The simulated genetic correlation values were 0, 0.1, 0.3, and 0.5 (according to the genetic correlation estimates published in honeybee (Koffler et al., 2017)). The statistical performances of the three methods were evaluated as above (precision, power, and coverage).

| Empirical dataset
All the studied traits showed significant heritability but estimates varied according to the method used ( Using the animal model, variance estimates were not significantly different between years (p = 1). Variance estimates differed between islands for palpus length but not for wing-related traits (Table 3). Indeed, palpus length heritability was lower on Mauritius (H 2 = 0.18) than on La Reunion (H 2 = 0.46). This difference was mostly due to an almost 10 times higher common environment variance on Mauritius.

| Simulations
Analyses carried out with the three methods with simulated data- timates (though with slight upward bias of around 8% for CE 2 = 0.5, TA B L E 1 Estimated genetic parameters of the measured traits estimated (the cubital veins A and B, the cubital index (CuI) and the length of the long segment of the right palpus) using three methods: an average colony relatedness, a half-sib/full-sib model (HS/FS), and an animal model Year and Island were included as fixed effects. All genetic effects were significantly different from zero as tested by likelihood ratio tests.

TA B L E 2 Genetic correlation, standard error (SE), and 95% confidence interval (CI) between pairs of measured traits (the cubital veins
A and B, the cubital index (CuI) and the length of the long segment of the right palpus) estimated using three methods: an average colony relatedness, a half-sib/full-sib model and an animal model (without island as random effect)  Figure 1). Similar results were found using a balanced pedigree (Appendix S6).
For the three methods (and the three tested pedigrees), the rate of false positive (probability to obtain a significant genetic effect whereas h 2 = 0) was low (<3%), except for the average colony relatedness method in presence of a common environment effect due to large bias in heritability estimates (as mentioned above, Figure 1. right side). Accordingly, coverage was very low for the average colony relatedness when the common environment effect was moderate to high (CE 2 > 0.3). The two other methods led to greater coverage but still below 95%, whatever the value of the true heritability. Finally, the statistical power was low (between 35% and 90%) for heritabilities of 0.1 but very high (>98%) for moderate to high heritabilities Using the simulated pedigree with related queens, we found that the half-sib/full-sib and the animal model methods yielded similar results as mentioned above for our empirical pedigree. We noticed only a very slight decrease in the precision of estimates for the halfsib/full-sib method (Appendix S7).
Concerning genetic correlations, the average colony relatedness method gave estimates with upward bias, low precision, and low power ( Figure 2a). On the contrary, the half-sib/full-sib and the animal model methods gave fairly good estimates, in particular when the two considered traits had high heritabilities. However, the half-sib/full-sib method has a slight upward bias hence lower coverage of 95% CI (Figure 2a,c).
The statistical powers of half-sib/full-sib and the animal model methods were low when genetic correlation was lower than 0.5 except when the two traits were highly heritable (Figure 2b). Note that the half-sib/ TA B L E 3 Estimated genetic parameters of the measured traits (the cubital veins A and B, the cubital index (CuI) and the length of the long segment of the right palpus) estimated using a bivariate animal model, with distinct parameters for each island Trait Mauritius Reunion We presented values for genetic variance (V G ), variance associated with common environment (V CE ) and residual variance (V R ), heritability (H 2 ), and its standard error (SE) for each island. p-values indicated if variances significantly differed between islands, as tested by likelihood ratio tests.

F I G U R E 1
Performance of heritability estimates for the different models used (average colony relatedness, half-sib/full-sib, and animal model, white, gray, and black bars, respectively) depending on the levels of simulated heritabilities and of common environment effects (CE 2 , vertical panels). (a) Distribution of heritability estimates. Boxes represent 25 and 75% quantiles, and the middle line represents the mean. Whiskers show 2.5% (bottom) and 97.5% (top) quantiles. Note that the scales of the Y-axis vary for the different levels of CE 2 to facilitate comparison between the three models. (b) Power to determine genetic effect and (c) coverage of 95% confidence interval of heritability estimates. Simulated datasets were based on our experimental pedigree full-sib method was subject to convergence failure (when using default convergence parameters Appendix S8). This was largely determined by the estimated heritabilies of the two correlated traits: failure to converge was high when heritabilies were low.

| D ISCUSS I ON
We compared the performances of three sibling analysis meth-

ods, average colony relatedness, half-sib/full-sib, and animal model,
in the estimation of heritability and genetic correlations. Of the three methods tested, the average colony relatedness relies the least on patriline reconstruction, since it is based on a mean number of patrilines per colony and the calculation of average relatedness is little impacted by the effective number of matings (Oldroyd & Moran, 1983). Hence, it is not necessary to genotype all individuals which constitutes a real practical and financial advantage. The two other methods were influenced by the quality of pedigree reconstruction which depends of genotyping errors, the presence of null alleles, or informativeness of genetic markers (Csilléry et al., 2006;Pemberton, 2008;Wang, 2006). Here, the number of patrilines estimated for each colony in our experimental dataset fluctuates according to the number microsatellite markers used: expectedly, with seven markers instead of eight, the number of patrilines estimates per colony was lower. Similarly, heritability estimates were lower with seven microsatellite markers (Appendix S9). Indeed, when the number of patrilines estimated is less than the actual number of fathers, half-sisters will be considered as full-sisters while their resemblance will be weaker than expected, thus leading to a lower heritability (Firth et al., 2015).
However, the magnitude of underestimation associated with a lower number of microsatellite markers was small (typically between 3% and 15% depending on the considered trait). Those values are in line with already published biases associated with pedigree errors (Bérénos et al., 2014;Charmantier & Réale, 2005;Firth et al., 2015) and have little impact on general interpretations.
Despite its practical simplicity, the average colony relatedness method suffers from a major flaw: an upward bias in heritability estimates (even yielding estimates higher than 1) caused by common environmental rearing condition of the workers (Poklukar & Kezić, 1994). This major limitation had already been acknowledged before, and this bias was drastically reduced by cross-fostering of offspring workers into different rearing environment (Oldroyd et al., 1991).
Another solution is to reduce the differences in rearing/developing conditions between colonies, using a "common garden" experiment (see, e.g. Moritz & Hillesheim, 1985;Oldroyd & Moran, 1983). Under performance of the two methods with unbalanced pedigree was contrary to our expectations. First, this may be due to the fact that our experimental design is not heavily unbalanced. Second, we used restricted maximum likelihood procedures that are relatively insensitive to unbalanced designs (contrary to ANOVA approaches). In such situation, using either of both methods are equally pertinent and people who are not familiar with the animal model (and the inversion of the additive matrix with appropriate tools) may prefer to use the half-sib/full-sib method.
In our experimental pedigree, we reconstructed a fairly simple pedigree (similar to a half-sib/full-sib design) and considered that fathers and queens were unrelated to each other. This assumption was reasonable since mating occurs during nuptial flight in congregation area where thousands of drones gather, from a large number of colonies in the surroundings, resulting in an almost perfect panmixis (Baudry et al., 1998). In such a situation, the copulation of the queen with one drone related to herself or with two drones related to each other is very unlikely. In addition, when colonies are sampled in distant areas, queens are less susceptible to be related.
However, colonies sampled in the same apiary (as is the case for two colonies of our dataset sampled in Le Baril) may be related, if they originated from the division of one hive into daughter colonies. To explore this situation, we simulated a pedigree with related queens. We expected the half-sib/full-sib method, which ignores such additional relationship, to be more biased than the animal model, which takes into account all types of relationships between individuals. Surprisingly, this was not the case and both methods provided reliable estimates. This may be due to the fact that the added relatedness links were negligible, and it echoes the demonstration of Liu and Smith (2000) showing that moderate inbreeding may not notably bias the genetic parameters estimated by sib analysis. Therefore, in most cases, half-sib/full-sib and animal model methods may be used but the animal model will yield slightly more precise estimates (i.e., with the smallest RMSE). In addition, the animal model method can handle complex models which may be required to study colony traits (Bienefeld et al., 2007;Bienefeld & Pirchner, 1990) but also to deal with dominance (Wolak & Keller, 2014).
In the case of a haplodiploid species, the genetic variance estimated by the animal model is composed of an additive and a dominance component. According to Fisher's assumption, in population with large effective size, the variance of dominance is supposed to be negligible (Wolak & Keller, 2014). Indeed, the dominance effect is linked with the genetic background in which it is expressed (Fisher, 1958). Thus, in a population of infinite size, every possible genetic backgrounds are represented and the dominance effects average out to zero. In the case of La Reunion and Mauritius, the effective honeybee population size seems to be large (Techer, Clémencet, Simiand, Turpin, et al., 2017). Hence, the dominance variation, which are likely to contribute to our heritability estimates, will have a limited effect on the genetic variance of the population.
One meta-analysis showed that the dominance variance is only important for domesticated species and is generally low (0.15, Wolak & Keller, 2014). More recently, Class and Brommer (2020) confirmed that the dominance variance represents a modest amount of genetic variance in a wild population of blue tits.
As mentioned above, it is theoretically possible to determine the dominance variance in an animal model using the dominance matrix which can be obtained using the makeSd function of the nadiv R package (designed for sex chromosome inheritance and thus appropriate for haplodiploid organisms) (Wolak, 2012). In a design like ours (half-sib/full-sib), the additive matrix is identical to the dominance matrix, which prevents to separate the two variances V A and V D . A more complex pedigree (with more complex kinship relationships) would resolve this constraint (Wolak & Keller, 2014).
Considering genetic correlation, our simulation approach clearly demonstrates that the average colony relatedness method is not suitable for such task even in the absence of common environment effect: estimates were biased and dispersed (except when the two traits were highly heritable), and the statistical power was very low. The half-sib/ full-sib and the animal model were more appropriate with the latter less prone to false positive and providing more accurate estimates.
We did not test the impact of sample size on the performances of the three methods (either on genetic correlation or heritability) but it has been regularly demonstrated that quantitative genetics require large sample size (at minimum 250 individuals and preferably over 500), in particular when estimating genetic correlation (which requires rather almost 1,000 individuals (Brown, 1969;Lynch & Walsh, 1998;De Villemereuil et al., 2013). In honeybee, most colonies are easily accessible and colony sizes are very large. This allows sampling a large number of individuals, families, and colonies, and hence, the only limitation to obtain robust estimates is the technical and financial ability to phenotype and genotype all sampled individuals.
According to the simulation results, we decided to discuss only the estimates provided by the animal model in our experimental dataset. This model yielded high heritability estimates for the four measured traits, which is consistent with the honeybee literature (Moritz & Klepsch, 1985;Mostajeran et al., 2006;Oldroyd et al., 1991;Poklukar & Kezić, 1994;reviewed in Koffler et al. 2017).
In general, morphological traits are known to display higher heritabilities than fitness-related traits (Mousseau & Roff, 1987). The lower heritability of the palpus length could be in part due to errors of measurement contributing to residual variance. We also found that variance estimates are not significantly different for wing traits between the two islands indicating little influence of environmental factors. In contrast, palpus length is less heritable on Mauritius than on La Reunion due an almost 10 times larger common environment variance. Accordingly, the proboscis is the morphological character showing the largest geographic variability (Ruttner, 1988;Ruttner et al., 1978) supporting a great phenotypic plasticity. In addition, colonies came from more diverse environments on Mauritius compared to La Reunion where two areas were sampled twice (Le Barril and Ligne Paradis/Bassin Plat).
Our results indicated that the A vein was genetically correlated with the B vein and with the palpus. However, genetic correlation estimates between A and palpus obtained with the half-sib/full-sib and the animal model were not congruent and this may be due to the fact that the heritability of palpus is relatively low, resulting in less precise genetic estimates correlations as shown by our simulation study.
Phenotypic and genetic correlations between the A and B wing veins are in the same direction. This is probably a result of developmental constraints on the shape of wing cells to ensure efficient flight.
Our results are in line with previous studies showing genetic correlation between morphometric traits in honeybee (Collins et al., 1984;Poklukar & Kezić, 1994).
To summarize, wing traits are highly heritable and seem robust to environmental variation. These traits also show divergent pattern between populations and subspecies and are therefore useful for morphological determination. On the contrary, palpus length is less heritable (particularly in Mauritius) and displays higher phenotypic plasticity: it cannot be used to classify specimens in the South-West Indian Ocean.

ACK N OWLED G M ENTS
We thank Olivier Esnault for providing honeybee samples and

Meenowa Deodass and Sookar Preeaduth for their help in fieldwork
in Mauritius. We also thank Elodie Chapuis and two anonymous reviewers for their comments on the manuscript and Matthew Wolak for his help with nadiv package. The authors acknowledge the South Green Platform (http://www.south green.fr) for providing computational resources that have contributed to the research results reported within this paper. This study has been financially supported by the CIRAD, the "Conseil Régional de La Réunion" and the European Agricultural Fund for Rural Development (EAFRD).
The authors acknowledge the Plant Protection Platform (3P, IBISA) where molecular experiments were conducted.

CO N FLI C T O F I NTE R E S T
The authors have declared that no conflict of interest exists.

DATA AVA I L A B I L I T Y S TAT E M E N T
Morphological data, genotype, and pedigree are available on Dryad https://doi.org/10.5061/dryad.tx95x 69vd.