Bayesian analysis of random regression models to model test-day somatic cell score of primiparous Holstein cattle in Iran

ABSTRACT Several functions (Ali-Schaeffer curve, Legendre polynomials, and the combination of Legendre polynomial and the Wilmink function) were used to adjust the lactation curve of somatic cell score (SCS) of primiparous Holstein cows in Iran. Data included 204,374 test-day records collected on 29,215 animals over 10 freshening years, from 2002 to 2011. The model included herd-year of calving (HY), additive genetic (AG), permanent environmental (PE) and residual effects as random and age-season of calving (A-S) as fixed regression effects. In addition, contemporary group effect (herd-year-month of test-day) was included as fixed effect. Residual variances were modelled considering 1, 4, 7, 10 or 20 classes of days in milk (DIM). Estimates of residual variance were quite similar, rating from 1.30 to 1.56. Estimates of daily heritability were practically constant until DIM 250 of lactation (around 0.03) and increased toward the end of lactation (0.07), thereby indicating the presence of low AG variation for SCS in this population of Holstein cattle. Regarding low heritabilities, a relatively low genetic progress will be expected following selection. The chosen model (according to the deviance- and Bayesian-information criterion) had Legendre polynomials with six coefficients for A-S effect, four coefficients for HY effect, five coefficients for additive effect, and six coefficients for PE effect with a homogeneous error structure.


Introduction
Mastitis is an important infectious disease that affects primarily cattle. Average annual economic losses to dairymen in the Iran attributable to mastitis have been estimated at $80 per Holstein cow (Sadeghi-Sefidmazgi et al. 2011). Thus, with a current cow population of approximately 8 million (Sadeghi-Sefidmazgi et al. 2012), the annual cost of this disease to the dairy industry approaches $0.6 billion. Over the past couple of decades, strong emphasis has been placed on the development of mastitis control programs. For instance, the Norwegian prevention programm, partly described by Østerås et al. (2007), has been successful at decreasing the prevalence of mastitis. The authors described three principal reasons for this decrease: effect of preventive management programs, effect of changed attitudes of the farmers toward treatment, and genetic improvement. In contrast to the first two reasons, the effect of genetic change is permanent to the dairy population (Sørensen et al. 2010). Therefore, it is of interest to increase the genetic gain of resistance to mastitis.
In fact, selection for increased mastitis resistance contributes to reduce production costs and is consistent with the goal of maximizing genetic improvement for total economic merit (Heringstad et al. 2000). Selection on clinical mastitis can be done directly and indirectly (Heringstad et al. 2000;Ødegard et al. 2003). For direct selection data on mastitis cases are needed and to be collected on the farms. However, while clinical mastitis is rather easy to detect, animals with subclinical mastitis are often difficult to find since there is a lack of the reliable diagnostic method (Leitner et al. 2004). In the Iran no registration of mastitis infections is available on the national level, therefore, direct selection to reduce mastitis incidence is not possible. Fortunately, somatic cell count (SCC) has a high genetic correlation with mastitis (Carlén et al. 2004;Wolf et al. 2010), therefore, the monitoring of SCC of cow's milk at the farm level and used as an indicator of both clinical and subclinical mastitis (Emanuelson 1988;Emanuelson et al. 1988;Mrode and Swanson 1996;Heringstad et al. 2000) seems to be helpful to increase mastitis resistance of dairy cows more efficiently. In addition to its importance as an indicator of udder infection, avoiding high SCC in milk has associated advantages. High SCC is associated with the reduced processing ability of milk; it reduces cheese yield (Politis and Ng-Kwai-Hang 1988) and decreases the shelf-life of consumer milk (Ma et al. 2000). Reduced cure rate (Barkema et al. 2006), economic losses, and poor milk quality, consequently, form incentives to reduce the SCC. Genetic evaluation of SCC, however, is mostly based on SCS; a logarithmic transformation of SCC to achieve normality of distribution (Ali and Shook 1980).
Considering SCS-based selection in dairy cattle for improved udder health, however, requires accurate estimates of genetic parameters for cow milk SCS. Bayesian approach is gaining popularity because of its more comprehensive assumptions than those of classical approaches and its flexibility in resolving a wide range of biological problems (Ben Gara et al. 2006). Therefore, the objective of the present study was to model variations in test-day (TD) SCS of first lactations of Holstein cows by random regression (RR) models using different functions and Bayesian inference in order to fit adequate and parsimonious models which could be used for the genetic evaluation for test-day SCS of dairy Holstein cows in Iran.

Data
In Iran, SCC is recorded in most herds on a bimonthly basis. Records for cows included in a pedigree file for the Iranian Holstein population, stored in the Animal Breeding Center of Iran (ABCI, Karaj), were used in the analyses. Data were restricted to primiparous cows, calving between 21 March 2002 and 20 March 2011, that met the following requirements: (1) age at first calving from 20 to 38 month, (2) test days from 6 to 305 days in milk (DIM), and (3) milking three times daily (i.e. morning, noon, and evening). Daily SCC records <1000 or >1,000,000 cells/mL were discarded (0.05%). SCC records were included if there were at least two valid monthly tests, the first monthly test was <90 DIM, and the difference between any two consecutive tests were >15 d. Edits were on the number of cows per herd (>100), per herd-year of calving (HY) class (>10), and per sire (>10) and number of records per-lactation (≥4). Studies have indicated that the logarithmic transformation of SCC produces data with a more normal distribution, increased heritability, and reduced heterogeneity of within-herd and within-sire variances (Shook 2006). Consequently, individual SCC records for each TD were transformed to SCS as SCS = 3 + log 2 (SCC × 10 −5 ). After the edits, more than 200,000 TD records remained, which gave on average: 383.4 records per HY, 54.8 cows per HY, and 5.7 cows per herd-test-date (HTD). Because of the small herd sizes, the herd effect was modelled by fixed herd-year-month of TD and random HY components. Descriptive statistics of the dataset are given in Table 1.

Model
The data were analysed using RR test-day models with different RR functions. All models had the following general characteristics: where Y ijklm is the lth test-day SCS of animal m within herd-yearmonth of TD (hym) effect i, belonging to age-season of calving (A-S) class j and the kth class of herd-year of calving (HY), b jn are fixed regression coefficients (FRC) specific to age-season subclass j (19 age, four seasons), c kn are random regression coefficients (RRC) specific to herd-year of calving k, a mn are RRC specific to the additive genetic (AG) effect of cow m, p mn is the random permanent environmental (PE) effect, and e ijklm is the residual effect for each observation. The φ mnl (t) are the nth covariate depending on DIM t, and n b , n c , n a , and n p represent the order of fit for A-S, HY, AG, and PE curves, respectively. Assume that elements of a m and p m (the vectors of AG and PE regression coefficients for cow m, respectively) are ordered within regression coefficients, and let var(c k ) = H, var(a m ) = G, var(p m ) = P, and var(e ijklm ) = R denote HY, AG, PE, and residual covariance matrices, respectively. This gives var(c) = I ⊗ H, var(a) = A ⊗ G, var(p) = I ⊗ P, and var(e) = I ⊗ R, where A is a numerator relationship matrix, I is an identity matrix, and ⊗ depicts the Kronecker product operator.
The following mathematical functions were used to the model FRC and RRC effects: (a) Ali and Schaeffer (AS) logarithmic function (Ali and Schaeffer 1987); where t = DIM.
where x = −1 + 2((t − t min )/(t max − t min )), and the coefficients α 0 -α 2 are coefficients of the second-order Legendre polynomial at DIM. In the present study, the fourth parameter (α 3 ) was assumed equal to −0.05. (c) in addition, 22 RR models fitting orthogonal Legendre polynomials (LEG) of different orders (Kirkpatrick et al. 1990) to both fixed (A-S) and random (i.e. HY, AG and PE) effects were considered. Parameters were estimated up to a fifth order (quintic) of fit. The notation that will subsequently be used to refer to the RR model based on Legendre polynomials is Lijkl, where i (i = 2, 6), j ( j = 2, 6), k (k = 2, 6), and l (l = 2, 6), respectively, indicate the order of fit for the submodel fitted to the A-S, HY, AG, and PE effects. Models with more than 6 RR coefficients for fixed and random effects were not used due to computing limitations.
All specific models were fitted by the Gibbs sampling method using the GIBBS3F90 software package (Misztal 2010). A chain of 200,000 samples was generated for each model studied. Using a conservative approach, the first 20,000 samples were discarded and were considered as burn-in period. Using a thin the output by 100, inferences were performed on the remaining 1800 samples. Convergence of Gibbs chains was monitored by graphic inspection of samples × iterations. For illustration, the genetic parameters for days of lactation were obtained for each sample. The highest posterior density interval was also determined for each parameter of the model with 95% confidence. As suggested by Van der Werf (2001), the eigenvalues and their corresponding eigenfunctions (EF) of the covariance matrix of the genetic additive random regression coefficients were used to indicate the pattern of genetic variation at all covariate values. The EF of genetic covariance function (CF) were obtained by EF i (t) = n a n=1 w mnl (t)E imnl , where E imnl are the elements of the ith eigenvector of genetic CF (Meyer and Kirkpatrick 2005).
The RR models were compared using the deviance (DIC) and Bayesian (BIC) information criteria. The information criteria can be described as DIC = D(u) + p D (Spiegelhalter et al. 2002); (Schwarz 1978), where D(u) is the posterior expectation of the Bayesian deviance, p D is the effective number of parameters, par is the number of parameters in the model, and log(n) is the natural logarithm of the total number of observations. Models having smaller DIC and BIC are favoured because it indicates a better goodness-of-fit at a lower degree of complexity. The model selection process was split into two steps. In the first step, all models were analysed (i.e. AS, LW, and 22 RR models fitting orthogonal Legendre polynomials), assuming homogeneous residual variances (results not shown). Based on the results of this analysis, three selected models (the three best models based on DIC and BIC criteria; i.e. AS, L6455, L6456) were further studied for heterogeneous residual variances (Table 2). For the heteroscedastic models, the pattern of variation was analysed to define step functions with 4, 7, 10 or 20 various classes depending on DIM. In this case, the DIM were grouped as follows: 6-20, … , and 291-305 DIM for the model with 20 classes, 6-35, … , and 276-305 DIM for the model with 10 classes, 6-50, … , 231-275 and 276-305 DIM for the model with seven classes, and 6-30, 31-120, 121-240 and 241-305 DIM for the model with four classes. Each of these models (i.e. AS, L6455, L6456) followed by ME (measurement error) and the number of classes (x: 1, 4, 7, 10 or 20) of residual variance (RV). For instance, ASME20 identifies the Ali-Schaeffer curve adjusted with 20 classes of ME.

Results
The overall average test-day SCS was 2.03, with a standard deviation of 1.63. After the beginning of lactation, SCS decreases to a minimum at around 60 DIM and increase thereafter (Figure 1).
In all cases, the model with AS showed the greater performance compared with the LW model. However, Legendre polynomials showed better performance than lactation curves with the same number of parameters (results not presented here). Results for DIC and BIC from the three best models analysed with both homogeneous and heterogeneous residual variances are presented in Table 2. As the cited table shows, the structure of residuals (i.e. homogeneous vs. heterogeneous) for the estimated covariance components varied across models. In general, the pattern of differences in residual structure did not indicate any clear association between model and the both information criteria. For instance, for the series of LEG models with the different orders of covariance functions for AG and PE effects (denoted by L6456MEx), the model considering RV homogeneity (e.g. L6456ME1) provided the best fit, whilst BIC and DIC selected models with ME = 1 and ME = 7 as the best for L6455 (denoted by L6455ME1 and L6455ME7, respectively, in Table 2). For AS function the ranking on the BIC and DIC criteria was also the same as for the LEG, with a permutation between the homogeneous and heterogeneous of residuals. However, both DIC and BIC suggested that a model with n = 6, 4, 5, 5 and ME = 1, involving a total of 47 parameters to be estimated, would the best to describe the covariance structure of this data. Based on these results, four models were selected for further comparisons: L6456ME1, L6456ME4, L6456ME7 and L6455ME7.
The RV estimates obtained for the DIM through the adjustment of the L6456ME7 (L6455ME7) model were similar, varying from 1.33 (1.30) for ME7 to 1.52 (1.56) for ME1 ( Figure 2). The average RV obtained from the seven ME groups was 1.39 (1.40), with a standard deviation of 0.07 (0.08). RV estimates obtained through the adjustment of the L6456ME4 model varied from 1.34 for ME4 to 1.47 for ME1. The average RV obtained from this model was 1.40, with a standard deviation of 0.06. The posterior mean of RV for the L6456ME1 was 1.39.
The PE variances obtained with the selected models showed a similar trend across lactation, with the observation of discretely lower values in the middle third of lactation ( Figure 2). The estimates obtained with models L6456ME1 and L6456ME4 for the first days of lactation, however, were slightly higher than those estimated with the other models (i.e. L6456ME7 and L6455ME7). On the other hand, the estimates obtained with model L6455ME7 were overestimate at the end of lactation. The amplitudes of variation in the AG estimates of the models ranged from 0.06 to 0.15, with the lowest genetic estimates obtained in the mid-lactation (130 to 150 DIM) and the highest at the end of lactation. Overall, most of the curves estimated rise at the edge. These growths were particularly high for beginning of daily PE variances. The posterior means of daily heritability (Figure 3) showed a similar trend to that observed for genetic variances and ranged from 0.030 ± 0.004 to 0.071 ± 0.012 for model L6456ME1, from 0.028 ± 0.005 to 0.069 ± 0.011 for model L6456ME4, from 0.029 ± 0.005 to 0.071 ± 0.020 for model L6456ME7 and from 0.030 ± 0.004 to 0.064 ± 0.012 for model L6455ME7. The average estimate of this parameter was 0.036 for the four compared models. The heritability estimates obtained with the four  models were practically constant until DIM 250 of lactation, increasing slightly thereafter until the end of lactation (see Figure 3).
The posterior means of genetic correlations ranged from 0.46 to 0.99 (L6456ME1), from 0.38 to 0.99 (L6456ME4), from 0.44 to 0.99 (L6456ME7) and from 0.30 to 0.99 (L6455ME7). As expected, genetic correlations were high between adjacent yields (especially in late lactation) and decreased as the interval between yields increased (Figure 4). Similarly, the PE correlation estimates obtained with the four models were about 0.99 between adjacent TD records and decreased to 0.10 between test days at the beginning and at the end of lactation (Figure 4).
For AG effects, the three principal eigenvalues of the covariance matrix between TD records explained 95.7% (L6456ME7) to 98.4% (L6456ME4) of the total genetic variation between animals. When the L6456ME1 model was used, each of the three principal eigenvalues explained 86.8%, 6.2% and 4% of the variation, respectively. It means that little variation was associated with the other coefficients, for the AG random effect. In the same way, for the permanent environment effect, the largest variation was associated with the three principal eigenvalues, corresponding to 94.7% of the total variation. Models L6456ME4, L6456ME7, and L6455ME7 provided similar results. Regardless of the model, the last eigenvalue for each CF estimated was close to zero (not shown), indicating that the three main eigenvalues of the covariance matrix between TD records explained almost all the genetic variation of this trait; therefore, a possibility exists of fitting a reduced rank model with a smaller number of parameters. Figure 5 displays estimates of the first and second genetic eigenfunctions for different orders of fit. Estimates of the first genetic eigenfunction obtained with the four models presented similar trends. In general the leading eigenfunction was close to zero at the beginning of lactation and increased slightly over the trajectory. Estimates of the second eigenfunction, however, changed considerably (with an opposite trend) from ME = 1 or 4 ME = 7.

Discussion
According to Ødegard et al. (2003), the results of the present study show that Legendre polynomials well suited for genetic analysis of SCS compared with other RR functions. This has also been found in the genetic parameters for milk production traits (López-Romero and Carabaño 2003;. In all models, the maximum variation was obtained at the periphery of lactation (see Figure 2), which supports a previous hypothesis, which found that yield at the extremes of lactation were subject to more temporary environmental variation than yield in mid-lactation, which is influenced more by genetic and PE differences between animals (Van Vleck and Henderson 1961). The decrease in PE variability after the onset of lactation has been commonly observed in studies with dairy cows (Haile-Mariam et al. 2001;Ødegard et al. 2003;Rzewuska et al. 2011;. Likewise, it is common to find higher estimates of genetic variance at the beginning and at the end of lactation (Ptak et al. 2007;Rzewuska et al. 2011). The maximum value of AG variance (0.76) in Ptak et al. (2007) was on the fifth day after calving, and it gradually decreased to the minimum value of 0.37 in the first month of lactation. Our results were more constant across lactation with no   artefacts. This could be due to a better modelling of nongenetic factors by the use of a high-order polynomial for the PE effect (Strabel and Jamrozik 2006).
Trends of the heritability estimates followed those reported in the literature, i.e. higher estimates at the end of lactation (Haile-Mariam et al. 2001;Ødegard et al. 2003;. This was mainly an effect of decreasing residual and PE variances but also, in some cases, due to increasing genetic variances. The low posterior means of daily heritability, which were in the range from 0.029 to 0.071, are in good agreement with the recent estimates of 0.04-0.08 (Dadpasand et al. 2013) and 0.03-0.07  in Iranian. Mrode and Swanson (2003) also reported low daily heritability (0.04 to 0.09) for first lactation SCS of United Kingdom Holstein-Friesian cows. However, heritabilities around 0.20 (on average in the first parity) estimated by Bohmanova et al. (2008) and Samoré et al. (2008) are clearly higher than our estimates, but these differences may be explained by differences in populations, statistical models and estimation methods (Zavadilová et al. 2011).
It seems clear from results in this and other studies (Haile-Mariam et al. 2001;Mrode and Swanson 2003;Rzewuska et al. 2011) that genetic correlations were stronger than corresponding PE correlations and both declined as DIM got further apart. This was in agreement with the hypothesis that correlations of yields on consecutive TD are affected by the interval between tests (Schaeffer and Burnside 1976).
As pointed above, the first eigenvalue accounted for about 85% of total genetic variation. A large eigenvalue (proportional to the other eigenvalues) indicates that the change caused by selection of the associated eigenfunction will happen faster (Kirkpatrick et al. 1990). Thus, the trend of EFs is an indication of how the factors associated with each eigenvalue affects the pattern of variation at all covariate values (Meyer and Kirkpatrick 2005). As illustrated in Figure 5, the leading eigenfunction was positive and slightly increased toward the end of lactation. This implies that an increase or a decrease of the mean at all DIM is relatively easy to produce (Meyer and Kirkpatrick 2005). Conversely, selection to increase (or decrease) mean SCS at any day will result in a corresponding change in value for all other DIM as a correlated response to selection.
For the heteroscedastic models, this study and results by Haile-Mariam et al. (2001) and Ødegard et al. (2003) have observed relatively higher residual variances in early lactation (Figure 2), ranging from 1.30 to 1.56. However, a similar posterior mean and a small variation for RV estimates among models with different measurement errors were observed, indicating that there is no need to fit for heterogeneity of RV. Absorption is also seen from Figure 2, depicting the PE variance and the RV by DIM for homogeneous and heterogeneous RV. As could be seen in Figure 2, for the model L6456ME1, the slope of the PE variance trajectory is obviously affected by change in RV throughout lactation, demonstrating the ability of the PE effects to absorb heterogeneity of the residuals (Ødegard et al. 2003). Considering these results, the L6456ME1 was characterized as the most parsimonious and of the lower computational demand to model test-day SCS of Holstein cows in Iran by RR models.

Conclusion
Overall, RR models using Legendre polynomials seem well suited for the genetic analysis of SCS compared with other RR functions. It was shown that models with lower order of fit for additive effects than for PE effects had performance. Hence, the fifth and sixth order of fit, respectively, was preferred for additive and PE effects. Although there were no differences in estimated variance components for models assuming either heterogeneous or homogeneous RV, model comparison indicated that the models assuming homogeneous residual variances were favoured on basis of both information criteria. Hence, a model with homogeneous RV assumption was preferred. For the chosen model with the fourth-, fifth-and sixth-order Legendre polynomials for HY, AG, and PE effects, respectively, the heritability estimates of SCS were practically constant until DIM 250 of lactation (around 0.03) and slightly increased towards the end of lactation (0.07).
Because of the low posterior means of daily heritability (mean 0.036), direct selection on SCS will result in slow response to selection for resistance to mastitis. However, when setting up a breeding program, also low heritability traits should be included if they are of interest (and in order to avoid deterioration in the case of antagonistic relationships to other traits). Low heritability does not necessarily mean that there is not much genetic variation but that phenotypic variation is higher. In other words, the lack of genetic variation has led to recommendations that the improvement of somatic cells level should rather be based on modification of the environment to create conditions suitable for health of udder.