Flu vaccine production gets a shot in the arm.

The world is holding its collective breath. Governments and health organizations are feverishly preparing, stockpiling drugs and vaccines, and formulating contingency plans. Headlines are coughing out dire predictions of up to 100 million deaths worldwide and devastating economic consequences. Experts are chillingly warning that it’s not a matter of if, but when—when a pandemic of avian influenza will strike the human population. 
 
Although the H5N1 avian flu is already a serious problem in Asia, it will not become a major threat to human health worldwide until and unless the virus mutates into a strain that is both highly virulent and highly communicable from human to human. At present that is not the case, but influenza viruses are notorious for their ability to mutate via a process called antigenic drift. 
 
Mutation into a strain with the potential for pandemic may never happen, but if it does, mortality could be extremely high. Rapid global travel could spread the disease quickly, and, unlike with the seasonal flu strains that come around every winter, our bodies are not immunologically familiar with this type of avian influenza—no one will have native defenses to ward off or minimize infection. 
 
Massive efforts are under way both in the United States and internationally to respond should an avian flu pandemic occur. One of the most important elements in controlling a pandemic will be the development and production of an effective vaccine. Now Yoshihiro Kawaoka and his colleagues at the University of Wisconsin–Madison School of Veterinary Medicine and the University of Tokyo have perfected an advanced method of producing the inactivated “seed” virus used to produce influenza vaccine, a technique known as reverse genetics. This breakthrough may represent a critical step forward in accelerating the production of enough vaccine in a short enough time to prevent massive loss of life.


Background
Use of linear models for the estimation of genetic parameters for categorical traits violates basic assumptions of mixed linear model methodology. Algorithms have been developed to transform linear model estimates to the underlying liability scale in order to compensate for the estimation bias caused by analysis of non-linear traits in linear models, but transformed genetic parameter estimates might still be significantly biased [e.g., [1][2][3][4]]. Use of threshold models for estimation of genetic parameters directly accounts for the non-linear nature of categorical traits, and threshold model estimates should be more reliable than linear model estimates or transformed linear model estimates [e.g., [5][6][7]].
Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling (GS) make it feasible to implement multivariate threshold models or multivariate mixed linearthreshold models. Animal models fully use the available pedigree information, but accuracy of genetic variance and covariance estimates and convergence of GS chains might be a problem in the case of low trait prevalences and few observations per individual [e.g., [5,[8][9][10]]. Therefore, in practical situations implementation of multivariate animal threshold models is not always straightforward. The inclusion of continuous traits, i.e. use of a multivariate linear-threshold model, is expected to increase the reliability of genetic parameter estimates [e.g., [2,[11][12][13]].
In animal breeding, health data are often recorded using discrete categories, whilst most performance traits are continuous. In the horse, binary coding has been used for investigations on radiographic findings in the equine limbs, and high prevalences of radiologically visible alterations, mostly in the range of 10-25%, have been determined in the limbs of young horses [e.g. [14][15][16][17]]. This promoted the search for preventive rather than curative measures. Because strength and soundness of the equine locomotory system is of great importance in all sectors of the horse industry, inclusion of radiological health traits in current breeding schemes of the Warmblood riding horse has been suggested [18,19]. Reliably estimated genetic parameters for these categorical traits provide the basis to do so. For the analyzed radiographic health data, applicability of transformation factors according to Dempster and Lerner [20] and Vinson et al. [21] to linear animal model estimates obtained with residual maximum likelihood (REML) has been proven via simulation [22]. However, rate of over-or underestimation, which is caused by analysis of binary traits in linear models and has to be compensated via transformation, depends on data structure. Re-evaluation of transformation procedure will therefore be required for analyses of data of different structure with respect to distribution and kind of available information. Implementation of a Bayesian multivariate animal threshold or mixed linear-threshold model for (co)variance component estimation may provide a worthwhile alternative.
Quantitative trait loci (QTL), i.e. genome regions which include genes that influence the phenotype of an individual with respect to a particular trait, have been identified for production and health traits in different species [23]. Increasing knowledge on genetic determination of radiological health traits [24,25] implies further opportunities for improvement of genetic evaluation and selection schemes in the horse. Conditions for efficient use of marker-assisted selection have been described [26], but the effects of combined use of phenotypic and genotypic data on the accuracy of genetic parameter estimation for categorical traits has not been studied yet. Requirements for successful implementation of multivariate animal threshold models including phenotypic and genotypic information are unknown.
The aim of this study was to characterize the properties of multivariate estimation of genetic parameters for categorical, continuous and molecular genetic data using linearthreshold animal models and Gibbs sampling. Impact of data structure and quality of molecular genetic data on the accuracy of genetic parameter estimates was studied in the context of important radiological health traits in Warmblood horses.

Gibbs chains
The number of rounds of Gibbs sampling that had to be discarded as burn-in ranged between 5000 and 53000. Mean lengths of burn-in ranged between 8100 and 13866 rounds in the analyses of the different datasets. Mean, minimum and maximum length of burn-in were lowest in the analyses of dataset A2 and highest in the analyses of dataset B1.
Mean, minimum and maximum ESS of heritabilities and additive genetic correlations by dataset and quality of genotype information on T2 are given in Tables 1 and 2. For heritability of the continuous trait (T1) ESS ranged between 286.9 and 858.4 in the analyses of data on animals from one generation (datasets A1, B1 and C1) and between 1222.7 and 1949.0 in the analyses of data on animals from two subsequent generations (datasets A2, B2 and C2). For heritability of the binary traits ESS ranged between 23.0 and 139.3 for T2, between 24.1 and 147.5 for T3, between 15.4 and 115.2 for T4, and between 50.6 and 289.6 for T5 in the analyses of all datasets. Mean ESS was 810.7 to 1325.4 for the additive genetic correlation between T1 and T5, 175.7 to 500.7 for the additive genetic correlations of T1 with T2, T3 and T4, and 86.4 to 376.2 for the additive genetic correlation between traits T2, T3, T4 and T5. Significant influence on ESS was determined for the analyzed dataset (P < 0.001), but not for the quality of genetic marker information. There were no significant differences between ESS for all heritabilities and all additive genetic correlations in analyses of datasets B1 and C1 and between ESS for all heritabilities and all additive genetic correlations but r g12 in analyses of datasets B2 and C2.
Mean MCE was 0.016 to 0.017 for heritability of T1 in analyses of datasets B1 and C1, and 0.002 to 0.010 for heritability of T1 in analyses of datasets A1, A2, B2 and C2 and heritabilities of the binary traits (T2 to T5) in analyses of all datasets. Mean MCE for additive genetic correlations ranged between 0.001 and 0.004 in all analyses.
Mean, minimum and maximum bias of heritabilities and additive genetic correlations by dataset and quality of genotype information on T2 are given in Tables 3 and 4. Bias of heritability estimates for T1 ranged between -0.075 and 0.221 in the analyses of datasets A1, B1 and C1, and between -0.089 and -0.021 in the analyses of datasets A2, B2 and C2. Mean bias of heritability estimates for T2 was -0.134 to 0.205 in the analyses of datasets A1, A2, B1 and B2, 0.94 to 0.99 in analyses of dataset C1, and 0.47 to 0.52 in analyses of dataset C2. Mean bias of heritability estimates was -0.057 to 0.024 for T3, 0.065 to 0.231 for T4, and -0.018 to 0.095 for T5 in all analyses. Extremes of negative and positive bias of heritability estimates were -0.414 and 1.811 for T2, -0.233 and 0.316 for T3, -0.201 and 0.649 for T4, and -0.156 and 0.423 for T5. Additive genetic correlations between T1 and T2, T1 and T3, and T1 and T4 had a mean bias of -0.503 to 0.177 and a bias range from -0.958 to 0.637. Additive genetic correlation between T4 and T5 had a mean bias of 0.127 to 0.376 and a bias range from -1.748 to 1.532. The analyzed dataset had a significant influence on the bias of heritability estimates for T1, T2, T4 and T5 (P < 0.01) and of estimated additive genetic correlations between T1 and T2, T1 and T3, and T1 and T4 (P < 0.001). Bias of heritability estimate for T2 was further significantly dependent on the quality of genetic marker information (P < 0.01), with lower means in scenario r0p7 than in scenarios r0p9 and r1p9.

Mixed linear-threshold model analyses via Gibbs sampling
Simulated categorical, continuous and molecular genetic data were used for multivariate estimation of genetic parameters in mixed linear-threshold animal models via Gibbs sampling. General statements on the properties of multivariate analyses in mixed linear-threshold models require validation using independent sample datasets. For this study, simulation of multiple populations per study : heritability of binary trait T2 (T3, T4, T5); QTL scenarios: different scenarios with respect to recombination rate between markers and QTL (r) and polymorphism information content (PIC) of markers with r = 0.00 and PIC = 0.9 in the first line, r = 0.01 and PIC = 0.9 in the second line, and r = 0.00 and PIC = 0.7 in the third line. For details on datasets A1 to C2 see Table 6.
scenario and random sampling of one set of datasets per population was straightforward, but computationally expensive. In a preliminary study we therefore tested an alternative approach, with simulation of one population per study scenario and repeated random sampling of sets of datasets from one population. Because results regarding effective sample sizes and biases were not statistically different under these two approaches (results not shown), the computationally less expensive approach was chosen for the main study, i.e. replicate datasets were generated by repeated random sampling from single multi-generation populations.
Datasets used for the analyses differed in the number of animals with trait information, the distribution of animals with trait information and the availability of marker genotype information. The simulated pedigree structure resembled that of the Hanoverian Warmblood horse, and simulated data resembled the distribution of radiographic findings in the limbs of young Warmblood riding horses [18]. The traits of interest were binary, but one continuous trait was included in the multivariate analyses in order to improve mixing and convergence of the Gibbs sampler. Even in the analyses of the smallest datasets burn-in periods of 5000 to 15000 rounds were mostly sufficient for the Gibbs chains to converge. However, effective sample size of heritability of the continuous trait was considerably larger than effective sample sizes of heritabilities and additive genetic correlations of the binary traits. Effective sample sizes mentioned in literature for heritabilities of binary traits and uni-or bivariate animal threshold models were often in the same order despite larger numbers of informative animals and longer chain lengths [27][28][29][30]. In a previous simulation study inclusion of a continuous QTL: quantitative trait locus; r g12 (r g13 , r g14 , r g15 ): additive genetic correlation between continuous trait T1 and binary trait T2 (T3, T4, T5); r g23 (r g24 , r g25 , r g34 , r g35 , r g45 ): additive genetic correlations between binary traits T2 and T3 (T2 and T4, T2 and T5, T3 and T4, T3 and T5, T4 and T5); QTL scenarios: different scenarios with respect to recombination rate between markers and QTL (r) and polymorphism information content (PIC) of markers with r = 0.00 and PIC = 0.9 in the first line, r = 0.01 and PIC = 0.9 in the second line, and r = 0.00 and PIC = 0.7 in the third line. For details on datasets A1 to C2 see Table 6.  For details on datasets A1 to C2 see Table 6.  For details on datasets A1 to C2 see Table 6.
trait in animal threshold model analyses of binary traits had a positive effect on convergence, but not on effective sample size [8]. We found significant dependence of effective sample sizes on amount and distribution of available information. Given a certain number of animals with trait records, effective sample sizes for heritabilities were larger when animals were from two subsequent generations instead of one generation. There was no such clear effect on effective sample sizes for additive genetic correlations. Despite fixation of residual covariances to zero, effective sample sizes for additive genetic covariances and correlations were in many cases larger than effective sample sizes for additive genetic variances and heritabilities of the binary traits. Results from literature are not consistent in this respect [31].

Genetic parameter estimates
Bias of genetic parameter estimates is the major argument against transformation of linear model estimates and in favor of threshold model analysis of binary traits [e.g.
[1, 2]]. Heritabilities of the continuous trait and the moderately heritable binary traits (h 2 = 0.25) were over-or underestimated by maximally 10 percent. If heritability of the binary trait was lower (h 2 = 0.10) and only phenotype information was considered, mean bias increased to up to 25 percent over-or underestimation. These values are in agreement with literature [1, 10,32] and clearly smaller than [2] or similar to [22] reported bias of transformed linear model estimates. Inclusion of fixed genotype effect in the model resulted in marked overestimation of heritability of the QTL trait. If trait information was available on animals in only one generation, upward bias was larger than 90 percent, indicating failure to reliably estimate heritability of the polygenic component. Heritability estimates were close to simulated overall heritabilities. If trait information was available on animals in two subsequent generations, i.e. to parents and offspring, upward bias of polygenic heritability estimates was reduced to about 50 percent.
Bias of additive genetic correlations between continuous and binary traits was dependent on heritability of the binary trait, but largely independent of the sign of the simulated correlation. Low heritability of the binary trait (h 2 = 0.10) resulted in downward bias of continuous-binary correlation estimates by up to 50 percent, i.e. estimates for both positive and negative additive genetic correlations were closer to zero than the simulated values. Estimates for the positive additive genetic correlation between the continuous and the moderately heritable binary trait (h 2 = 0.25) were on average less biased, with upward bias of up to 18 percent in the small datasets and downward bias of up to 8 percent in the larger datasets. Ranges of biases were similar for all continuous-binary correlation estimates. Estimates for the positive additive genetic correla-tion between two of the binary traits showed considerable variation of bias and were on average overestimated by 13 to 38 percent. Variation of bias was larger if information on animals from only one generation was considered, and overestimation was larger if information on animals from two subsequent generations was considered. Underestimation of positive additive genetic correlation between one continuous trait of moderate heritability (h 2 = 0.25) and binary traits of low heritability (h 2 = 0.05 or 0.10) and low prevalence (0.05 or 0.15) has been reported previously [33]. However, in this and a similar study estimates for positive additive genetic correlation between the two binary traits were biased downward as well [2,33]. Differences in simulation parameters and method of analysis may be responsible for the different results. In contrast with our study, simulated additive genetic correlations were all 0.5, residual correlations of 0.3 or 0.2 or -0.2 were simulated, and analyses were performed in linear animal model with residual maximum likelihood (REML). Theoretically, estimates for additive genetic correlations should not be affected by violation of assumption of normality when using linear models for analysis of binary data [21,34]. However, opposite directions of biases seem to arise from use of linear and threshold models for estimation of genetic correlations between binary traits.

Amount and structure of data
Amount and structure of available information had little influence on accuracy of heritability estimates for the continuous trait and the binary traits of moderate heritability. Increase of the amount of available information increased accuracy of heritability estimates for the binary traits of low heritability and of additive genetic correlation estimates between continuous and binary traits, but also increased the upward bias of additive genetic correlation estimates between two binary traits. Influence of quality of genetic marker information on accuracy of heritability estimates for the binary QTL trait and additive genetic correlation estimates between the continuous trait and the binary QTL trait was less distinct. Given high PIC of genetic markers, estimates were largely unaffected by presence or absence of recombination between markers and QTL. Decrease of PIC resulted in lower heritability estimates, i.e. increased negative bias or decreased positive bias of heritability estimates, and larger range of bias of heritability estimates for the QTL trait. Estimates of additive genetic correlations between the continuous trait and the binary QTL trait were least biased when phenotype and genotype information was considered, genotype information included highly informative markers, and trait information referred to animals from two subsequent generations. There are no reports of similar investigations on influences on accuracy of genetic parameter estimates. Simulated heritabilities and additive genetic correlations were aligned with results of previous real data analyses and kept constant in this study. Furthermore, we simulated only four of the ten additive genetic correlations and no residual correlations in the multivariate setting of one continuous and four binary traits. Correlations between the binary QTL trait and another binary trait were not simulated. Impact of quality of genetic marker information on accuracy of additive genetic correlation estimates between two binary traits, one of which is influenced by QTL, requires further investigation. Simulation can be extended by modification of heritability levels, closeness and sign of genetic correlations and inclusion of more than one QTL trait. This study documents general feasibility of combined use of phenotype and genotype data for estimation of genetic parameters in the horse using multivariate mixed linear-threshold animal models and Gibbs sampling.

Conclusion
It is feasible to perform multivariate estimation of genetic parameters for categorical, continuous and molecular genetic data in mixed linear-threshold animal models via Gibbs sampling using data and pedigree structures similar to those encountered in the Hanoverian Warmblood horse. Use of trait information on two subsequent generations of animals can increase accuracy of parameter estimates more than merely increasing the number of informative animals from one generation. Impact of different quality of genetic marker information with respect to recombination rate and PIC was minor in the scenarios studied. Consideration of marker genotype information as a fixed effect in the model is likely to result in overestimation of polygenic heritability of a QTL trait, but may be advantageous for quantification of additive genetic correlations between traits of interest. Improved identification of agonistic and antagonistic genetic correlations between traits of interests will facilitate design of effective multiple trait selection schemes.

Data simulation
Simulated data, which resembled the situation encountered in the population of the Hanoverian Warmblood horse, were used for this study. Characteristics of this population include a high percentage of artificial insemination and coexistence of many small breeding farms and few large studs. The number of horses used for breeding is rather low when compared to the total number of horses, but data collection for genetic analyses is not limited to broodmares and sires. We simulated fixed effects, residual and additive genetic variances for one continuous trait (T1) and liabilities of four categorical traits (T2 to T5), and QTL effects for the liability of one of the categorical traits (T2). Heritabilities were set to 0.50 (T1), 0.25 (T3, T5) and 0.10 (T2, T4). Simulated additive genetic correlations were positive between T1 and T2 and between T1 and T3 (r g = 0.20), and negative between T1 and T4 and between T4 and T5 (r g = -0.20). For T2 two QTL and two flanking markers per QTL with five randomly distributed and equally prevalent alleles each were simulated. Linkage between markers and corresponding QTL was assumed, with one of the marker alleles being associated with the unfavorable QTL allele, therewith indicating increased probability of affection with regard to T2. We assumed that the closely linked markers and the unfavorable allele of each of the markers had been identified in previous association studies, allowing the distinction between individuals homozygous positive, heterozygous or homozygous negative for the unfavorable allele of the respective marker. Only additive effects and no dominance effects were simulated, and total QTL variance was set equal to the polygenic variance. In order to study the effects of different quality of QTL information on the estimation of genetic parameters, three scenarios with respect to recombination rate between markers and QTL (r) and polymorphism information content (PIC) of markers were simulated: r = 0.00 and PIC = 0.9 for all markers (r0p9); r = 0.01 and PIC = 0.9 for all markers (r1p9); r = 0.00 and PIC = 0.7 for all markers (r0p7). In this context, a recombination rate of 0.01 between a marker and the QTL will mean that given the unfavorable marker allele, the QTL allele will be unfavorable with probability p = 0.99. A recombination rate of 0.00 between a marker and the QTL will mean that given the unfavorable marker allele, the QTL allele will be unfavorable with probability p = 1.00. This case is identical with the situation in which the causal mutation is definitely known. For the categorical traits, simulation of liabilities on the linear scale was followed by dichotomization in order to obtain trait prevalences of 25% (T 2 , T 5 ) or 10% (T 3 , T 4 ). Details on the simulation procedure are summarized in Table 5.
Each of the three simulated populations included 7 generations and 40000 animals per generation. Samples of 10000 animals were drawn at random from the fourth generation, and the pedigree of these animals was traced back over three generations. Within each of ten replicates which were generated this way, six different datasets were created for the genetic analyses. Dataset A1 included all 10000 animals with records for the continuous trait and the four binary traits, information on the fixed effects of sex and contemporary group, and pedigree information over three generations. Dataset B1 included 5000 animals, randomly chosen from the animals included in dataset A1, with respective information on traits, sex, contemporary group and pedigree. Dataset C1 included the same animals and the same information as dataset B1, and additional information on the marker genotype of the animals. Dataset B2 included the same animals as datasets B1 and C1 plus their parents with information on traits, sex, contemporary group and pedigree. Dataset C2 included the same animals and the same information as dataset B2, and additional information on the marker genotype of the animals. Dataset A2 included the same animals as datasets B2 and C2 plus the additional 5000 animals which were included in dataset A1 with information on traits, sex, contemporary group and pedigree. The average size of paternal halfsib groups ranged between 16.30 and 29.28, and the average size of maternal halfsib groups ranged between 1.23 and 1.55 among the animals with trait records in the six datasets. Distribution of trait and pedigree information in the different datasets used for the genetic analyses is summarized in Table 6.

Statistical analyses
Genetic parameters were estimated using Gibbs sampling with the threshold version of the Multiple Trait Gibbs Sampler for Animal Models (MTGSAM) [35], a software which supports multivariate genetic analyses of any combination of continuous and categorical traits. Random and residual effects are assumed to be normally distributed, and flat priors are used for the fixed effects. The user can specify starting values and priors for additive genetic and residual (co)variance matrices. For our analyses, a starting value of one was chosen for all additive genetic variances, a starting value of zero was chosen for all additive genetic covariances, and the residual covariances between all traits were fixed to zero. In uni-and multivariate binary threshold models identifiability of the model is ensured by fixing the values for the thresholds and the residual variances to values of zero and one, respectively [36]. Methods for effective sampling of the residual covariance matrix subject to the restriction of diagonal elements fixed at one are still under development. However, fixation of the residual covariances for this study was justified by the results of previous real data analyses [14] and fit our simulated data. Because residual covariances were negligible in the real data and were accordingly set to zero in the data simulation, possible gain of further information seemed to be disproportionate to the additional costs of sampling of the residual covariances. Residual covariances were therefore fixed to zero in this study. For the genetic covariance matrix a proper prior using an inverse Wishart distribution with minimum shape parameter (i.e. ν IW = 7) was adopted in order to ensure posterior propriety. The fixed effects of sex and contemporary group were considered in all analyses. The fixed effect of marker genotype was considered in the analyses of datasets C1 and C2 only, distinguishing between individuals homozygous negative for the unfavorable alleles of all genetic markers, individuals heterozygous for the unfavorable allele of at least one of the genetic markers, and individuals  homozygous for the unfavorable allele of at least one of the four genetic markers. y ijlm = µ + SEX i + CONT j + a l + e ijlm (datasets A1, A2, B1 and B2), and y ijklm = µ + SEX i + CONT j + QTL k + a l + e ijklm (datasets C1 and C2), with y ijlm (y ijklm ) = observation on trait T1 (continuous) or on trait T2, T3, T4 or T5 (binary) for the l th animal, µ = model constant, SEX i = fixed effect of the sex of the animal (i = 1-2), CONT j = fixed effect of the contemporary group (j = 1-5 in analyses of datasets A1, B1 and C1; j = 1-8 in analyses of datasets A2, B2 and C2), QTL k = fixed effect of the QTL marker genotype (n = 1-3), a l = random additive genetic effect of the l th animal (l = 1-30533 to 30766 for dataset A1, l = 1-37028 to 37292 for dataset A2, l = 1-19791 to 20050 for datasets B1 and C1, and l = 1-26253 to 26664 for datasets B2 and C2), and e ijlm (e ijklm ) = random residual.
The total length of the Gibbs chain was set to 205000 in all analyses, and all samples after 5000 rounds of burn-in were saved. Convergence of the Gibbs chain and the need for additional rounds of burn-in to be discarded was checked by visual inspection of sample plots. Effective sample size (ESS) and Monte Carlo error (MCE) was cal-culated for all (co)variance estimates by the times series method implemented in the post-Gibbs analysis program POSTGIBBSF90 [37] with a thinning rate of ten. Unthinned chains were used to calculate posterior means of additive genetic (co)variance, heritability and additive genetic correlation estimates. Posterior means rather than modes were chosen as point estimates, because in preliminary analyses the means were in most cases closer to the true, i.e. simulated, values than the modes, a finding which is in agreement with previous studies [8,38]. Bias of heritability and additive genetic correlation estimates was calculated as the mean relative deviation of the estimated values (par est ) from the true, i.e. simulated, values (partrue ). bias = (par est -par true )/par true The influence of data structure and quality of genetic marker information on ESS and bias was tested via analysis of variance using the procedure GLM of Statistical Analysis Systems, (SAS), version 9.1.3 (SAS Institute, Cary, NC, USA, 2005). Effective sample size or bias were considered as dependent variable, and dataset (A1, A2, B1, B2, C1, C2) and quality of genetic marker information (r0p9, r1p9, r0p7) were considered as fixed effect.

Authors' contributions
KFS designed and carried out the simulation study, and drafted the manuscript. OD conceived of the study, participated in its design, and revised the manuscript. IH participated in conception and design of the study, coordinated it, and revised the manuscript. All authors read and approved the final manuscript.