Transforming the Tradition of Discrete Milk Yield Correction Factors: A Continuous One-step DeLorenzo-Wiggans Method

: Over the past decades, various methods have been proposed to estimate daily milk yields from partial yields. Many of these methods divide milking interval time into varied classes, assuming that the yield correction factors are constant within classes but vary between classes. The DeLorenzo and Wiggans (D-W) method has been widely used in the United States, typically following a 2-step process. It calculates discrete yield factors for segmented milking interval classes and then refines them through a follow-up smoothing step. Such a 2-step approach is computationally inefficient, and discrete yield correction factors introduce biases. This study explored strategies to integrate continuous yield factors into established methods, exemplified by the D-W method. The renovated method, also called the polynomial-interaction regression (PIR) model, postulates multiplicative yield correction factors as a linear or quadratic function of milking interval time, operating on interactions with partial yields. It utilizes all available data in a single step, exhibiting greater computability efficiency and higher estimation accuracy. A reparameterization leads to a linear model, making estimating the model parameters convenient. We evaluated the performance of the revised methods using a previous data set of milking records from Holstein cows compared with some existing methods. The results showed that the refurbished model gave more accurate estimates of daily milk yields


Short Communication Genetics
The list of standard abbreviations for JDSC is available at adsa.org/jdsc-abbreviations-24.Nonstandard abbreviations are available in the Notes.
Abstract: Over the past decades, various methods have been proposed to estimate daily milk yields from partial yields.Many of these methods divide milking interval time into varied classes, assuming that the yield correction factors are constant within classes but vary between classes.The DeLorenzo and Wiggans (D-W) method has been widely used in the United States, typically following a 2-step process.It calculates discrete yield factors for segmented milking interval classes and then refines them through a follow-up smoothing step.Such a 2-step approach is computationally inefficient, and discrete yield correction factors introduce biases.This study explored strategies to integrate continuous yield factors into established methods, exemplified by the D-W method.The renovated method, also called the polynomial-interaction regression (PIR) model, postulates multiplicative yield correction factors as a linear or quadratic function of milking interval time, operating on interactions with partial yields.It utilizes all available data in a single step, exhibiting greater computability efficiency and higher estimation accuracy.A reparameterization leads to a linear model, making estimating the model parameters convenient.We evaluated the performance of the revised methods using a previous data set of milking records from Holstein cows compared with some existing methods.The results showed that the refurbished model gave more accurate estimates of daily milk yields.
T he 1960s marked a significant shift in milk testing plans in the United States and other countries.The transition moved from the standard supervised twice-daily, monthly testing scheme toward cost-efficient milk sampling plans, motivated to minimize costs associated with Dairy Herd Improvement Association supervisor visits.The frequency of test-day recordings varied, adapting to different herd management strategies.Typically, a cow undergoes multiple milkings on a test day, yet not all these milkings are weighed.One common approach is the AM-PM milk sampling method, which alternates milk sampling between morning (AM) or evening (PM) on test days across lactation.Historically, each testday milk yield was calculated as twice a single yield.Such a practice assumes equal milk secretion rates for AM and PM milkings and equal AM and PM milking sessions, each spanning precisely 12 h apart.In the case of unequal morning and evening milking intervals, the biases are assumed to be offset by complementary unevenness between AM and PM milkings.However, these assumptions are not practically plausible (Putnam and Gilmore, 1970).
Various statistical methods have been proposed for daily milk yield computations from partial daily yields (reviewed by Wu et al., 2023c).Traditionally, these methods assume that yield correction factors are constant within milking interval classes (MIC) but vary between classes.One widely used method in the United States was proposed by DeLorenzo and Wiggans (D-W;1986) to derive multiplicative correction factors (MCF) for cows milked twice daily.First, it divides the entire milking interval time range into various classes of equal length and calculates MCF for each specific MIC.Then, these MCF are smoothed through a linear or quadratic polynomial function.In practice, such a 2-step approach is often computationally inefficient, and discrete MCF introduces estimation biases (see the review by Wu et al., 2023c).In this study, we explored strategies to integrate continuous correction factors into existing methods, exemplified by the DeLorenzo and Wiggans (D-W) method.The model renovation involved replacing separate local regressions with a single global linear or quadratic regression, operating on the interactions with partial yields.It thus circumvented the need to smooth MCF.Further model extensions were also discussed.
The traditional D-W method fits separate linear regressions without an intercept, one at a time for each MIC, as follows: Here, y ik and x ik are the total and partial (AM or PM) daily yield, respectively, for a particular animal i measured on the k-th discrete MIC, F k is the regression coefficient corresponding to the MCF specific to the k-th MIC, and ε ik is the error term, assumed to have an expected value of zero, E(ε ik ) = 0. Here, we omit the subscript for AM or PM milking as we initially present the method for analyzing AM and PM milkings separately.A joint analysis of AM and PM milkings is discussed later.The regression coefficient, F k , can

Transforming the Tradition of Discrete Milk Yield Correction Factors: A Continuous One-step DeLorenzo-Wiggans Method
Xiao-Lin Wu, 1,2 * Malia J. Caputo, 1 George R. Wiggans, 1 H. Duane Norman, 1 Asha M. Miles, 3 Curtis P. Van Tassell, 3 Ransom L. Baldwin VI, 3 Michael M. Schutz, 4 Javier Burchard, 1 and João Dürr 1 also be expressed as the ratio of the expected value of a daily milk yield over the expected value of a partial yield.The latter form corresponds to the empirical interpretation of MCF according to Shook et al. (1980).
Additional covariates and categorical variables can be included to account for secondary sources of variation in daily milk yields.For instance, incorporating days in milk (DIM) as a covariate leads to the following model: Here, γ k denotes the regression coefficient of DIM and d 0 is a constant value, which can be the DIM mean or an empirical value (e.g., d 0 = 158; DeLorenzo and Wiggans, 1986;Liu et al., 2000).Assume E(d ik − d 0 ) = 0; the interpretation of F k remains the same as that based on model (1).Otherwise, F k is the ratio of the expected value of the adjusted daily yield over the partial yield: [3] Fitting separate linear regressions to various MIC with the D-W approach has disadvantages.First, MCF tend to fluctuate between neighboring MIC when directly obtained from separate linear regressions.Second, the data volume can vary substantially between MIC, and there may not be sufficient data to compute MCF reliably for all MIC, particularly for MIC featuring small and large milking interval times.Hence, after the initial model fitting, a smoothing step is often employed to reduce the fluctuation and impute MCF for MIC where the data information is insufficient.Smoothing is usually conducted by fitting the reciprocal of the "raw" MCF as a linear function (DeLorenzo and Wiggans, 1986) or as a quadratic polynomial function (Shook et al., 1980) of milking interval time (precisely speaking, mid-points of milking interval classes).
In this study, we proposed modifying the D-W method by introducing a one-step procedure that leverages all available data information to calculate continuous MCF.This new approach employs a global regression that operates on the interactions between partial yields and milking interval time, contrasting the tradition of restricting MCF calculations to data specific to each MIC.Instructively, the model development is explained as follows.First, we define continuous MCF as a function of milking interval time, F t = f(t), specific for every time point.Note that the switch of the subscript of F from k to t is fundamental, implying a transition from the traditional discrete MCF to continuous MCF with computational and modeling advantages.It allows the model to utilize all available data more effectively than the traditional methods that limit computing MCF to each specific MIC.Then, we need to decide on appropriate forms for F t = f(t).The lactation biology and previous studies have indicated that daily milk yield (including fat and solids-not-fat) tends to be linear with milking interval time within 12 h, but such a linear relationship will no longer hold when milking interval exceeds 12 h (e.g., Ragsdale et al., 1924;Bailey et al., 1955;Elliott and Brumby, 1955;Schmidt, 1960;Wiggans, 1986).Therefore, we applied a linear function for F t = f(t) in thricemilkings, in which the milking interval time is mostly less than 12 h, and a quadratic function in twice-milkings, in which the milking interval time can go beyond 12 h.
Consider the quadratic form and include DIM as a covariate.Then, the D-W model is reformatted as the following: Here, t i denotes the milking interval time for the i-th animal.Including a quadratic term t i 2 ( ) allows the model to account for the nonlinear effects of the time interval on the daily yield.Then, MCF are calculated pertaining to specific milking interval time (t) as follows: Note that setting b 2 = 0 yields a linear function for deriving MCF applicable to cows milked thrice daily or more frequently.Given a partial yield and computed continuous MCF, the corresponding daily is calculated as follows: In this paper, this modified D-W model is referred to as polynomial-interactions regression (PIR) because it operates on the interactions between partial milk yields and milking interval time in linear and quadratic forms.Mathematically, this new model is a nonlinear function, but all model parameters can conveniently be estimated with re-parameterization by introducing 2 new independent variables.Let τ i = (t i * x i ), which represents the interaction term between linear milking interval time and a partial yield pertaining to each animal.Similarly, let δ i i i t x = 2 * , for the interaction between quadratic milking interval time and a partial yield.Then, re-arranging model (4) leads to the following: [7] Further model extensions are straightforward.For instance, a joint analysis of both milkings can be implemented as follows: Here, the subscript j indexes AM (j = 1) or PM (j = 2) milkings.Based on our preliminary analysis of the US Holstein cattle population, the above model assumes heterogeneous linear and quadratic polynomial functions of milking interval time in interaction with an AM or PM partial yield and a common DIM effect for AM and PM milkings.However, various settings of the joint analysis model can be made possible depending on the data.The performance of the renovated model was evaluated using a previous data set (Wu et al., 2023b).This data set consisted of 15,888 Holstein milking records (7,544 a.m. and 7,544 p.m.) collected from 3,717 animals.This data set was randomly sampled from 23 herds in 11 states in the US The records covered the first 3 lactations (39.8%,59.4%,and 0.8%) in 4 consecutive years (2006 The polynomial-interaction regression models were implemented under 2 scenarios.The model in the first scenario (PIR) had milking interval time and partial (AM or PM) yields as the predictor variables.In the second scenario, the model included months in milk, parities, and geological regions (MPR) as additional predictor variables, thus denoted as PIR_MPR.Here, we approximated the categorical effect of months in milk, bypassing the complexity of the nonlinear effects of days in milk (Wu et al., 2023b).For comparison, benchmark models included the D-W model, the George Wiggans (1986) model (denoted as GW), and linear regression (Liu et al., 2000).The D-W and the GW models represented traditional MCF models without considering additional predictor variables.Similar to the PIR models, linear regression was implemented in 2 scenarios: one without MPR (denoted as LR) and the other with MPR (denoted as LR_MPR).
Model comparison was conducted using bootstrapping samples and validation samples.Each bootstrapping sample was generated by sampling with replacements, maintaining the same number of milking records as the entire data set, based on unique cows.The unsampled milking records served as the validation set.Bootstrapping was randomly repeated 100 times, producing 100 bootstrap-ping samples and 100 validation samples.The performance of these models was evaluated and compared using 3 criteria: mean squared errors (MSE), correlation between actual and estimated daily milk yields, and R 2 accuracy as defined below: R y y where y i and AE y i stand for the actual and estimated daily milk yields for the i-th animal, respectively, and y is the mean actual yield.
Statistically, R 2 measures the proportion of variance in the dependent variable that is predictable from the independent variable(s).A higher R 2 indicates a better fit in the bootstrapping samples and better prediction accuracy in the validation samples.
Discrete and continuous MCF were obtained and compared between the D-W and PIR models and compared (See the Graphi-cAbstract Figure , middle).For the D-W model without the smoothing step, F s k ' varied from 1.40 to 2.23 for AM MIC and from 1.73 to 3.37 for PM MIC.These F s k ' were discrete and fluctuated substantially between neighboring MIC.Another challenge with the D-W model was insufficient data to compute MCF reliably for all MIC.We arbitrarily set the MCF as the mean of all MCF for a MIC with less than 20 milking records so that they stand identifiable as "outliers."In practice, "missing MCF" are imputed by linear or quadratic smoothing instead (Shook et al., 1980;DeLorenzo and Wiggans, 1098).The D-W approach with follow-up smoothing also yielded discrete MCF because it computed MCF on the midpoints of each MIC.The MCF obtained from the 2 D-W models, implemented with or without the smoothing step, generally agreed with each other.However, there were noticeable discrepancies between both approaches for milking interval classes with insufficient data and for short (less than 9 h) or long (greater than 15 h) milking intervals.The latter differences were due to the quadratic smoothing with the D-W_1 model.In contrast, the PIR model produced a smooth curve of MCF as they were derived from a global model, simultaneously fitted all the data, and evaluated on every milking interval time unit.
Compared with the traditional D-W method, the renovated models (PIR and PIR_MPR) had smaller MSE and greater R 2 accuracies (Table 1).The renovated models had an approximately 4.0 -15.0%reduction in MSE and an up to around 2% increment in R 2 accuracy.The increase in R 2 accuracy was more significant with uneven AM and PM milking intervals and more pronounced at individual levels.For the latter comparison, we computed individual R 2 accuracy as: R y y Compared with the D-W model, the PIR_MPR models had 0.24% of the animals showing ≥1% improvement in individual R 2 accuracy and 0.06% showing ≥2% improvement in individual R 2 accuracy.The maximum improvement in individual R 2 accuracy was 4.10%.In contrast, the maximum negative change in individual R 2 accuracy, favoring the D-W model, was −0.67%, meaning that no animals had an individual R 2 accuracy from the D-W model that was 1% higher than that from the PIR_MPR model.The GW model performed roughly similarly to the D-W model.The LR model without MPR slightly outperformed the PIR model concerning MSR and R 2 .
Nevertheless, the PIR model with MPR outperformed LR with MPR.In all the 6 models, PIR_MPR had the smallest MSE and the greatest R 2 .Mean squared errors decreased with lactation months, while correlations varied only slightly.R 2 accuracies were relatively higher in the beginning (1-3) and ending months (9-11+) of lactation because the daily yield variances were higher in these months compared with lactation mo 4-8.
The differences in the correlations between actual and estimated daily milk yields were less significant among the 6 models (Table 1).Also, the conclusions from the correlation and R 2 measures were not consistent with each other.For instance, without accounting for MPR, LR had slightly greater R 2 accuracy than the 2 traditional MCF models and the PIR model.Yet, it had the smallest correlations among the 6 models, leading to inconsistent conclusions.Statistically, correlation is a measure of associations between 2 quantities, not a precise measure of errors.For instance, let the correlation be r cor y y = ( ) , .
ˆ Then, adding a nonzero constant (Δ) to all the predicted values increases the errors, but it does not change the correlation: cor y y c or y y r , , .ˆ+

( )= ( )= ∆
The renovated model outperformed the 2 traditional MCF models because continuous MCF minimized the biases inherent in the traditional discrete MCF.Discrete MCF introduces biases and, therefore, leads to more significant errors.Consider an animal, say i, with a partial yield collected for milking interval time t.Let the MCF be a quadratic polynomial function of milking interval time.A precise estimate of MCF for milking interval time t is the following: 2 . [10] In the above, t t ≠ , and t is the mid-point time in that milking interval class.Then, the daily milk yield is estimated by: where F b b t b t k = + + ˆˆˆ.11).Yet, it ignores the second term on the right-hand side, which then becomes the bias.
The bottom left Figure in the GraphicAbstract showed the R2 accuracies obtained from the D-W model with varying MIC size.The R 2 accuracy decreased as the MIC size increased from 0.05 h to 1 h.When comparing the test-day yield estimates obtained from the discrete MCF model (D-W) and the continuous MCF model (PIR) models, the discrepancies in yield estimates between the 2 models showed recurrent patterns.The difference was minimum at the midpoint of each MIC, increased as the milking interval moved off the midpoint, and reached the maximum at the boundaries between 2 MCF.(Liu et al., 2000); LR_MPR = linear regression accounting for months in milk, parities, and geological regions (MPR); PIR = polynomial-interaction regression; PIR_MPR = PIR accounting for MPR.
, where E k stands for taking the expectation operation locally for the k-th milking interval class.
In practice, precisely adjusting for their effects is complicated.Instead, ignoring the term E k (f(z)) works only for the "perfectly" balanced data blending such that E k (f(z)) = 0. Otherwise, it induces "systematic" biases.
In the present study, accommodating more covariates and factors with LR and PIR significantly decreased MSE and improved R 2 accuracies.This was evidence that the data blending was not "perfect."The PIR_MPR outperforms LR_MPR, possibly because the former captured the interactions between partial yields and milk interval time.Analysis of variance based on the reparametrized polynomial-interaction model with MPR effects showed a highly significant impact of partial yields on test-day yields .The interaction effect between a partial yield and linear milking interval time was highly significant (Pr < 2.2e-16).The interaction effect between a partial yield and quadratic milking interval time was suggestive (Pr = 0.081) for AM milkings and highly significant (Pr <2.2e-16) for PM milkings.These findings support the inclusion of interactions in the modified D-W model.Months in milk significantly affected daily milk yields (Pr < 2.2e-16).The ANOVA results also showed highly significant parity effects (AM: Pr = 1.76e-07;PM: Pr = 2.18e-04) and regional effects (AM: Pr = 1.44e-07;PM <2.2e-16) on daily milk yields.In the present study, the MPR variables are considered secondary.Accounting for the difference due to the secondary predictor variables can improve the accuracy when the blending is unsuccessful.
Finally, the PIR model has no intercept.This model setting is subject to the criticism that it would unrealistically enforce x = 0 and y = 0 (Liu et al., 2000).Statistically, linear regression with an intercept would allow the model to fit the data flexibly.In particular, linear regression with an intercept can be more useful when the dependent variable does not naturally pass through the origin when x = 0.It can also account for any baseline level of y independent of x.Probably because of the above reasons, the PIR without intercept performed slightly worse than the LR with intercept in terms of MSE and R 2 accuracies.Also, it should be noted that the R 2 accuracy defined in ( 9) is appropriate for linear regression with an intercept in the sense that the variance of estimates is smaller than the total (actual) variance.A more appropriate accuracy measure for linear regression with an intercept would be the R 2 , which resembles the liability measure (Wu et al., 2023b) because the estimated variance is often greater than the actual variance.It should be noted that the latter measure could give slightly greater accuracy for the MCF models.
In theory, the purpose of omitting the intercept is to produce MCF instead of ACF (Wu et al., 2023a).Its primary impact was observed in the variance of estimated daily milk yields.A linear model with an intercept tends to generate a smaller estimate variance than the actual variance.Again, let y = a + bx + e.We have In practice, either reduced or inflated variance of estimated daily milk yields has a non-ignorable impact on genetic evaluations because it can lead to varied genetic variance.Therefore, estimated daily milk yields must be rescaled to be comparable to the actual variance.A simple practice is to apply a common rescaling factor equaling the square root of the estimated to actual daily yield variance ratio.Down-scaling the variance often reduces the errors, whereas inflating the variance tends to increase the errors further.In the study, rescaling the variance reversed the conclusion of the model comparison: PIR had around 1% higher R 2 accuracy than LR.
In conclusion, we proposed a strategy to incorporate continuous MCF into the established methods, exemplified by the DeLorenzo- Wiggans (1986) model.The traditional D-W method represents a 2-step procedure, where the smoothing uses data information captured only by means, possibly leading to oversimplified and potentially biased results due to the loss of detailed information.In contrast, the renovated D-W method fully leveraged the available data and better captured the underlying relationship between the variables through modeling interactions between partial yields and milking interval time.The latter formed the basis for accurately computing MCF and estimating daily milk yields.Utilizing all available data information also allows for more accurate estimating of the error term, which is crucial for understanding the model's predictive accuracy and any statistical tests.We demonstrated the methodology with the AM-PM milking plans, yet this modeling strategy is universally applicable across various milking plans, subject to some necessary modifications.
Wu et al. | Transforming the Tradition…  to 2009).These 11 states represent 4 of the 5 geological climate regions (https: / / codes .iccsafe.org/content/ IECC2021P1/ chapter -3 -ce -general -requirements).The blending of milking records was also intended to average out the effects of secondary variables such as days in milk, parities, and geological regions.The number of AM and PM milking records per lactation month varied between 996 and 1,372 for up to 305 d, including an additional 2,890 milking records for prolonged lactations.The distributions of AM or PM milking records per 30-min milking time bin are shown in Figure1.The 95% interval of milking interval time was 10.51 -15.06 h for AM milkings and 8.06 -13.36 h for PM milkings.The maximum number of milking records per 30-min milking time bin was 1,995 (AM) and 2,130 (PM), respectively.The mean (standard deviation; SD) of the milking interval was 12.3 (1.12) hours for AM milkings and 11.6 (1.14) hours for PM milkings.The means (SD) of AM and PM milk yields were 16.4 (5.58) kg and 15.3 (5.45) kg, respectively.

Figure 1 .
Figure 1.Distributions of milking records by months in milk (upper) and by milking interval time in 30-min intervals (bottom).
the first term on the right-hand side of the equation (

1
MSE = mean squared errors; COR = correlation between estimated and actual daily milk yields; y i and ŷi stand for the actual and estimated daily milk yields for the i-th animal. 2 D-W = DeLorenzo-Wiggans (1986) model; GW = Wiggans (1986) model; LR = Linear regression Wu et al. | Transforming the Tradition…

=
In contrast, the model without intercept can inflate the variance of estimates due to reasons such as lack of centering of predictor variables, multicollinearity, inappropriate scaling and nonzero means of variables, and heteroscedasticity of the residuals.Often, omitting the intercept leads to a significantly larger ˆ, b which, in turn, can give a larger estimate variance than the actual variance.When we evaluated the variance ratio per month in milk across the entire data set, the mean (range) of the variance ratios were 1.134 (1.086 -1.183) with D-W, 1. 157 (1.103 -1.215) with GW, 0.950 (0.909 -0.998) with LR, 1.132 (1.076 -1.192) with PIR, 0.895 (0.856 -0.944) with LR_MPR, and 0.898 (0.858 -0.945) with PIR_MPR.

Table 1 .
Wu et al. | Transforming the Tradition…Comparing accuracies of estimated daily milk yields using six models respectively1,2