Functional Multi-Locus QTL Mapping of Temporal Trends in Scots Pine Wood Traits

Quantitative trait loci (QTL) mapping of wood properties in conifer species has focused on single time point measurements or on trait means based on heterogeneous wood samples (e.g., increment cores), thus ignoring systematic within-tree trends. In this study, functional QTL mapping was performed for a set of important wood properties in increment cores from a 17-yr-old Scots pine (Pinus sylvestris L.) full-sib family with the aim of detecting wood trait QTL for general intercepts (means) and for linear slopes by increasing cambial age. Two multi-locus functional QTL analysis approaches were proposed and their performances were compared on trait datasets comprising 2 to 9 time points, 91 to 455 individual tree measurements and genotype datasets of amplified length polymorphisms (AFLP), and single nucleotide polymorphism (SNP) markers. The first method was a multilevel LASSO analysis whereby trend parameter estimation and QTL mapping were conducted consecutively; the second method was our Bayesian linear mixed model whereby trends and underlying genetic effects were estimated simultaneously. We also compared several different hypothesis testing methods under either the LASSO or the Bayesian framework to perform QTL inference. In total, five and four significant QTL were observed for the intercepts and slopes, respectively, across wood traits such as earlywood percentage, wood density, radial fiberwidth, and spiral grain angle. Four of these QTL were represented by candidate gene SNPs, thus providing promising targets for future research in QTL mapping and molecular function. Bayesian and LASSO methods both detected similar sets of QTL given datasets that comprised large numbers of individuals.


Marker mapping supplementary
Marker sorting and mapping was performed with all the available genotype data simultaneously with the aim of constructing one pure AFLP dataset (A-set) for the larger subset of individuals and one mixed SNP-AFLP dataset (S+A-set) intended for the smaller subset. Because the studied full-sib family was generated by two non-inbred and highly heterozygotic parents, a two-way pseudo-testcross mapping strategy was employed (e.g. Grattapaglia and Sederoff, 1994;Grattapaglia et al., 1995). Markers for which genotyping scoring success was inadequate (< 80%) and poorly genotyped individuals (< 70%) were excluded from further study. By checking parental heterozygosity and genotypic frequencies of the offspring, an additional selection was performed where codominant SNP markers were accepted given that genotypes segregated according to 1:1 proportions (one of the parents being heterozygote) or to 1:2:1 proportions (both parents heterozygotic). For dominant AFLP however, only a 1:1 band presence/absence segregation proportion was accepted (only one parent heterozygote). Selection was performed based on submitting genotype proportions to χ 2 -tests using 0.01 as a threshold for the p-value.
Among the selected markers, recombination fractions were calculated for all pairs of markers in order to estimate marker linkage and phase. For most marker pairs this could be done by division, but for pairs of 1:2:1-segregating SNPs, recombination frequencies were estimated via the maximum likelihood EM-algorithm (Dempster et al., 1977). Linkage between markers was declared based on a log-likelihood ratio of 23.03 or higher (equivalent to LOD>5). Even though the inclusion of 1:2:1-segregating markers made the construction of a consensus map possible, it was nonetheless reasonable to assume that different sets of QTL segregated within each of the unrelated and heterozygotic parents and it was thus more meaningful to separate the marker linkage mapping into maternal and paternal sections. The 1:1-segregating markers were assigned to linkage groups based on the parent for which they exhibited heterozygosity as well as on linkage. The 1:2:1-segregating markers were instead duplicated into two identical 1:1segregating markers each assigned to male and female linkage groups respectively. Heterozygote genotypes of 1:2:1-segregating markers were henceforth treated as unknown (replaced by missing 1 values). Consequently all marker genotypes were recoded as numeric values of 0, 1 or missing value.
Genetic distances between markers within each linkage group were estimated using the Haldane mapping function and markers were then ordered by minimizing the sum of adjacent genetic distances using the branch-and-bound searching algorithm. Preliminary analyses showed that genetic distances estimated using the more common Kosambi mapping function behaved in a less additive manner than the Haldane function and therefore the latter was used. Markers that were strongly correlated (|r| > 0.95 approximately equivalent to a 2.5 cM genetic distance) to another marker with a better genotype coverage were excluded from further analysis due to redundancy.
In summary 153 AFLP markers genotyped on 455 individuals (the A-dataset) and 153 AFLP and 166 SNP markers genotyped on 91 individuals (the S+A-dataset) were retained in the analysis after filtering and sorting. 251 markers were distributed on 26 maternal and 24 paternal linkage groups while 68 markers could not be assigned to any linkage group (unclustered).
By cross-examining the linkages of the duplicated 1:2:1-segregating SNPs, 15 maternal linkage groups were successfully associated with the corresponding paternal linkage groups. Missing genotype values were imputed by their conditional expectation estimated from flanking markers with known genotypes in accordance with Haley and Knott (1992), but taking marker phase (coupling/repulsion) into account. The imputation of missing heterozygote values for the duplicated 1:2:1-segregating markers was conducted by selecting the marker duplicate that exhibited the closest linkage with flanking markers and imputing its missing values conventionally (x 1 ).
Subsequently, the missing values of the second marker duplicate located in the linkage group of the opposite sex, were imputed using the opposite values of the first duplicate imputation (x 2 = 1 − x 1 ). Using this mirror imputation technique it was ensured that the separation of 1:2:1-segregating markers into pairs of 1:1-segregating markers was made as complete as possible and to break down the artefactual correlation between duplicated markers.
Equation (2.2) can be written as the following likelihood function where N = n i=1 m i , θ represents all the parameters in the model. Furthermore, the priors are • p(σ 2 ) ∝ 1 σ 2 , • p(α i |Σ 2×2 ) = MVN(α i |0, Σ 2×2 ), for i = 1, ..., n, In Bayesian statistics, our target is the posterior distribution p(θ|Y ) ∝ p(Y |θ)p(θ), proportional to the product of the likelihood function and priors. We use the Gibbs sampling method (Geman and Geman, 1984), a Markov Chain Monte Carlo (MCMC) algorithm, to evaluate the full posterior distribution. The following is a skeleton of the Gibbs sampler: i) assign an initial state of the unknown parameters θ 0 = θ 0 1 , ..., θ 0 M (M represents the total number of all the single parameters).
Note that during step (2), it is also possible to update a group of parameters as a block, if it is possible to sample from their joint full conditional posterior distribution.
Next, all the required full conditional posterior distributions for the LMM model are provided.

Posterior summarization
For each data set, we all-together simulated 50000 MCMC samples from the Gibbs sampler, by using the first 10000 were considered as burn-in, the remaining were stored in every 20th, so that . Based on the Rao-Blackwell theorem, this treatment helps to reduce the sample variance (Guan and Stephens, 2011).