Is it possible to predict the methane emission intensity of Swedish dairy cows from milk spectra?

Introduction
The carbon footprint of cattle production has been intensively discussed in recent decades and many efforts have been made to reduce greenhouse gas (GHG) emissions from ruminant production [1][2][3]. Production and losses of methane (CH4) contribute to GHG emissions and also represent a loss of energy for the animal, corresponding to roughly 6-8% of gross energy consumed. Methane production is related to the ability of ruminants to convert human-inedible feeds and by-products into nutritious foods for human consumption (i.e. milk and meat), which is achieved mainly through fermentation of fibrous components by rumen microorganisms. Differences in rumen fermentation result in different concentrations and proportions of volatile fatty acids (VFA) being absorbed from the rumen, which also affects milk composition, e.g. fatty acid composition [4]. Thus, it can be hypothesised that there is a relationship between milk composition and CH4 production, because both are affected by the amount and proportions of hydrogen and VFA formed by microbial fermentation in the rumen [5,6].
It has been suggested that breeding for animals with lower CH4 production per unit product would increase production efficiency and contribute to GHG mitigation [7,8]. There can be large variation between individuals in terms of CH4 production, even when feeding and stage of lactation are taken into account [9], and CH4 production has moderate heritability [10].
As a first step in investigating CH4 production on the individual cow or farm level or in evaluating strategies for reducing CH4 production, accurate measurements of CH4 are needed [11]. Methods for estimating CH4 production from individual cows on a large scale would therefore be valuable. Emissions of CH4 can be measured with several techniques, such as respiration chambers (RC), sulphur hexafluoride (SF6) tracer, laser methane detector (LMD), infrared sniffers (IRS) [12], the GreenFeed (GF) system [13] and others [14]. Although continuous measurements in RC give the most accurate estimates of CH4 production, this technique has the major drawbacks that it is expensive, can be used on only a limited number of animals and is laborious. The use of spot sampling techniques such as IRS to measure CH4 provides a greater number of observations, involves reduced animal handling and is more cost-efficient [12,15,16].
While direct measurements can be performed in a cost-efficient way on many animals, they still require calibration and maintenance, and thus their use is not likely to be widespread in commercial farming. Thus, identification of an easily measured proxy for CH4 emissions is required. Several equations have been developed to estimate CH4 [17,18], but these are often based on variables for which data are not available on commercial farms, such as dry matter intake. On modern dairy farms, many dairy cows are enrolled in a milk recording scheme, where milk is regularly sampled and analysed by mid-infrared spectroscopy (MIRS) to determine its content of fat, protein, lactose, urea and casein [19][20][21]. In addition to these compounds, the MIRS analysis could be extended to other milk-related or cow-related traits. Many studies have used MIRS or Fourier transform infrared (FTIR) spectroscopy to predict milk species [22], feed intake [23], energy balance [24], mastitis [25] and many other parameters that could be challenging to measure directly. Moreover, many models using milk MIRS to predict CH4 emissions have been developed, with varying results [26][27][28][29][30]. The measurement period in those studies was often short and different animals contributed data at different lactation stages, which could have increased random variation and lowered accuracy.
Therefore, the aim in this study was to evaluate the potential for using information generated from MIRS analysis of milk samples to estimate CH4 emission intensity during lactation in dairy cows.

Animals and experimental design
The data used were taken from a previously published study by [31] in which diets with a low starch content were fed to dairy cows at the Swedish Livestock Research Centre at Lövsta, Uppsala, Sweden. All handling of animals was approved by the Uppsala Ethics Committee for Animal Research, Sweden (diary number C 99/16). In total, 37 cows (13 Swedish Holstein and 24 Swedish Red) were included in the study for one full lactation period. On average (mean ± SD), the total daily dry matter intake was 25.8 ± 5.2 kg, daily CH4 production was 406.9 ± 56.3 g and daily milk yield was 32.9 ± 6.9 kg. The cows were all multiparous, with 20 in their second lactation and 17 in their third to seventh lactation. The cows were randomly divided into two treatment groups fed two different levels of byproduct-based concentrates and given ad libitum access to grass-clover silage. Full details of the experimental design, diet formulation and chemical composition, and the main findings related to production, energy balance, feed efficiency and fertility, can be found in [31]. The cows calved between February and July 2017, and milk MIRS data and CH4 data were collected between April 2017 and May 2018.

Methane and milk MIRS data collection and processing
Individual CH4 emissions were measured using the infrared sniffer (IRS) method [12] with a set-up similar to that described previously [32]. In brief, a CH4 analyser (Guardian Plus; Edinburgh Instruments Ltd., Livingston, UK) was calibrated using standard mixtures of CH4 in nitrogen. The analyser was attached to the automatic milking system (AMS) and the sampling tube was attached to the concentrate trough within the AMS. The CH4 concentration was logged every second on a data logger (Simex SRD-99; Simex Sp. z o.o., Gdansk, Poland) and then visualised using logging software (Loggy Soft; Simex Sp. z o.o.). Times of entry to the milking station and cow ID were recognised using the data management software DelPro (version 5.1/5.2.1; DeLaval International AB), and the values were coupled with corresponding CH4 values from the logger. To reduce the risk of variation in head position, only data from the first five minutes of the visit were used, on the assumption that the cow kept her head still in the feed bin and did not finish the concentrates provided in less than five minutes. On average, milking data were recorded 2.6 times a day for each cow. Methane production for every visit (g/d) was calculated using the equation developed by [12].
For each milking, mean peak height and integral were calculated, together with peak frequency (eructation rate). Milking occasions with fewer than three recorded peaks were removed from the analysis. On average, 2.2 readings of CH4 per animal and day were recorded. Daily milk yield was calculated as the total yield over 24 h. Milk samples were taken fortnightly from one milking between 12 am and 12 pm, preserved with bronopol and analysed within three days by MIRS (CombiScope FTIR 300 HP, Delta Instruments B. V., Drachten, the Netherlands). Each full MIRS dataset consisted of 935 absorbances at wavenumbers ranging from 397.307 to 4000.071 cm⁻¹. Spectral data were examined by principal component analysis (PCA) to identify outliers, but none were observed. Mean CH4 intensity (MI, g/kg milk yield) was averaged into one value for two weeks (one week before and one week after milk sampling) to match the MIRS data. In total, after preprocessing (averaged fortnightly and merged), 593 records from the first week of lactation until the 46th week were used for further data analysis.
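The filtering and fortnightly averaging described above can be illustrated with a minimal Python sketch. The record layout (keys `day`, `n_peaks`, `ch4_g_d`, `milk_kg_d`) and the function name are hypothetical, and the sketch assumes that MI per reading is CH4 production (g/d) divided by daily milk yield (kg/d); the exact order of division and averaging used in the study is not stated in the text.

```python
from statistics import mean

def fortnightly_mi(visits, sample_day, half_window=7, min_peaks=3):
    """Average methane intensity (g CH4/kg milk) around one milk sampling day.

    visits: list of dicts with hypothetical keys
        'day'       - day in lactation of the reading
        'n_peaks'   - number of eructation peaks recorded at that milking
        'ch4_g_d'   - CH4 production estimated for the visit (g/d)
        'milk_kg_d' - milk yield on that day (kg/d)
    Readings with fewer than min_peaks peaks are dropped, and only readings
    within half_window days of the sampling day are averaged.
    """
    kept = [
        v for v in visits
        if v["n_peaks"] >= min_peaks and abs(v["day"] - sample_day) <= half_window
    ]
    if not kept:
        return None  # no usable CH4 reading to pair with this milk sample
    # Assumption: MI is computed per reading and then averaged
    return mean(v["ch4_g_d"] / v["milk_kg_d"] for v in kept)

example_visits = [
    {"day": 10, "n_peaks": 5, "ch4_g_d": 400.0, "milk_kg_d": 40.0},
    {"day": 12, "n_peaks": 2, "ch4_g_d": 390.0, "milk_kg_d": 39.0},  # too few peaks
    {"day": 16, "n_peaks": 4, "ch4_g_d": 420.0, "milk_kg_d": 35.0},
    {"day": 30, "n_peaks": 6, "ch4_g_d": 410.0, "milk_kg_d": 36.0},  # outside window
]
mi_value = fortnightly_mi(example_visits, sample_day=14)
```

In this made-up example only the readings on days 10 and 16 contribute to the value paired with the milk sample taken on day 14.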

Prediction analysis using partial least squares regression (PLSR)
The prediction analyses were performed by partitioning the lactation into six-week intervals, which created seven lactation sub-periods in total. The Caret package version 6.0-92 [33] in R software [34] was used for the prediction analysis. The partial least squares regression (PLSR) method was used as the tool for prediction from the multivariate MIRS data. The prediction analyses of CH4 intensity were validated by leave-one-cow-out (LOCO) cross-validation, which was performed by calibrating the model on data from 36 of the cows and then using it to predict data from the remaining cow. This procedure was repeated until data had been predicted once for every individual cow. Using this validation strategy, each cow was predicted from a model built on data for the other 36 cows. Individual cow MI was predicted for every lactation sub-period and for the complete lactation.
From the LOCO procedure, the prediction error was calculated as:
$$\hat{e}_{ij} = Y_{ij} - \hat{Y}_{i'j'} \qquad (1)$$
where $\hat{e}_{ij}$ is the prediction error of MI for observation j of cow i, $Y_{ij}$ is observed MI value j for cow i, and $\hat{Y}_{i'j'}$ is the predicted value of MI for observation j of cow i from the model built without cow i. In this case, a positive prediction error implies model underprediction and a negative prediction error implies model overprediction. The coefficient of determination (R²) for LOCO cross-validation was calculated as the square of the correlation between predicted and observed values.
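The leave-one-cow-out loop can be sketched in a few lines of Python. This is only an illustration of the splitting logic, not the R/Caret workflow used in the study: the `fit` and `predict` arguments are hypothetical hooks, and the mean predictor below is a placeholder standing in for PLSR.

```python
from statistics import mean

def loco_predictions(records, fit, predict):
    """Leave-one-cow-out cross-validation.

    records: list of (cow_id, x, y) tuples, where x holds the predictors
             (e.g. a spectrum) and y is the observed MI.
    fit:     function taking a list of (x, y) pairs and returning a model.
    predict: function taking (model, x) and returning a predicted y.
    Returns a list of (cow_id, y_observed, y_predicted), with every record
    predicted exactly once by a model that never saw that cow.
    """
    results = []
    for held_out in sorted({cow for cow, _, _ in records}):
        train = [(x, y) for cow, x, y in records if cow != held_out]
        test = [(x, y) for cow, x, y in records if cow == held_out]
        model = fit(train)
        results.extend((held_out, y, predict(model, x)) for x, y in test)
    return results

# Placeholder model (NOT PLSR): always predicts the mean of the training MI
def fit_mean(train):
    return mean(y for _, y in train)

def predict_mean(model, x):
    return model

toy_records = [("A", None, 10.0), ("A", None, 12.0), ("B", None, 14.0), ("C", None, 18.0)]
loco_out = loco_predictions(toy_records, fit_mean, predict_mean)
```

Splitting by cow rather than by record keeps repeated measurements of the same animal out of the calibration set, which is the point of the LOCO design.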
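A short Python sketch of these two quantities follows, with the prediction error taken as observed minus predicted (so that positive values mean underprediction, as stated above) and R² as the squared Pearson correlation. The function names are illustrative.

```python
from math import sqrt
from statistics import mean

def prediction_errors(observed, predicted):
    """Observed minus predicted: positive = underprediction."""
    return [o - p for o, p in zip(observed, predicted)]

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(observed, predicted):
    """Cross-validation R2 as the squared correlation."""
    return pearson_r(observed, predicted) ** 2

obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 12.0, 13.0, 14.0]
errors = prediction_errors(obs, pred)
r2 = r_squared(obs, pred)
```

In this toy example the predictions are a perfect linear function of the observations, so R² equals 1 even though the individual errors are not zero. An R² defined as a squared correlation therefore measures association only, which is why the additional agreement diagnostics are useful.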

Data variation and model evaluation
The random variation in predicted and observed values of MI was evaluated according to two models:
$$Y_{ij} = \mu + C_i + \varepsilon_{(i)j} \qquad (2)$$
$$Y_{ijk} = \mu + G_i + C_{(i)j} + \varepsilon_{(ij)k} \qquad (3)$$
where $Y_{ij}$ is the MI corresponding to the observed or predicted value j for cow i, $\mu$ is the general constant (fixed effect), $C_i$ is the effect of cow i, $\varepsilon_{(i)j}$ is the random error for model (2), $Y_{ijk}$ is the observed or predicted value k for cow j during lactation sub-period i, $G_i$ is the effect of lactation sub-period i, $C_{(i)j}$ is the effect of cow j nested within lactation sub-period i, and $\varepsilon_{(ij)k}$ is the random error for model (3). Models (2) and (3) were applied to the datasets for the seven lactation sub-periods and the full lactation dataset, respectively. All effects in the models (except the general constant) were considered random and assumed to be normally distributed. The variances associated with each effect were estimated by the restricted maximum likelihood method. The analyses were performed using the MIXED procedure in SAS.
The performance of the models was evaluated using the Model Evaluation Software (MES) developed by [35]. Model validation was performed using four different approaches. The first approach consisted of evaluating the significance of the mean prediction error, using a simple t-test with a two-sided alternative hypothesis (α = 0.05).
The second validation approach was based on fitting a linear regression of observed (Y) on predicted (X) values. The fitted model was evaluated according to the hypotheses:
$$H_0: \beta_0 = 0 \quad \text{and} \quad H_a: \beta_0 \neq 0 \qquad (4)$$
$$H_0: \beta_1 = 1 \quad \text{and} \quad H_a: \beta_1 \neq 1 \qquad (5)$$
where $\beta_0$ and $\beta_1$ are the intercept and slope of the model, respectively.
Predicted and observed values were assumed to be equal when neither null hypothesis was rejected (P > 0.05).
The third approach was based on calculation of the concordance correlation coefficient (CCC) and its components [36] according to the equations:
$$CCC = \rho \times C_b$$
$$C_b = \frac{2}{v + 1/v + u^2}$$
$$v = \frac{s_o}{s_p}$$
$$u = \frac{\bar{Y}_o - \bar{Y}_p}{\sqrt{s_o s_p}}$$
where CCC is within the range -1 ≤ CCC ≤ 1, $\rho$ is the correlation between predicted and observed values, $C_b$ is the bias correction factor (0 < $C_b$ ≤ 1), $v$ is the scale shift, $s_o$ and $s_p$ are the standard deviation of observed and predicted values, respectively, $u$ is the location shift, and $\bar{Y}_o$ and $\bar{Y}_p$ are the mean of observed and predicted values, respectively. The fourth validation approach was based on decomposition of the mean squared error of prediction following an existing method [37]:
$$MSEP = SB + U + I = (\bar{Y}_p - \bar{Y}_o)^2 + (s_p - s_o)^2 + 2 s_p s_o (1 - \rho)$$
where MSEP is the mean squared error of prediction, SB is the squared bias, U is the component of MSEP associated with unequal variances, and I is the component of MSEP associated with incomplete (co)variation. The other terms are as defined previously. The terms SB, U and I were estimated as percentages of MSEP.
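To make the between-cow and within-cow partition concrete, the sketch below estimates the two variance components of the one-way random model (cow effect plus residual) with the classical ANOVA (method-of-moments) estimator. This is a deliberately simpler stand-in for the restricted maximum likelihood fit in SAS MIXED, not the procedure used in the study; the function name and data layout are hypothetical.

```python
from statistics import mean

def variance_components(groups):
    """One-way random-effects ANOVA estimates.

    groups: dict mapping cow_id -> list of MI values (observed or predicted).
    Returns (between_cow_variance, within_cow_variance).
    """
    k = len(groups)
    sizes = [len(values) for values in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size; equals the common group size when data are balanced
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    # Negative moment estimates are truncated at zero
    between = max((ms_between - ms_within) / n0, 0.0)
    return between, ms_within

toy_mi = {"A": [10.0, 12.0], "B": [14.0, 16.0], "C": [18.0, 20.0]}
between_cow, within_cow = variance_components(toy_mi)
```

Applying the same function to observed and to predicted MI would give the kind of side-by-side comparison of variance components that the text describes, although the numerical values can differ from REML estimates when the data are unbalanced.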
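The regression check can be sketched as follows: observed values are regressed on predicted values by ordinary least squares, and t statistics are formed for an intercept of 0 and a slope of 1. The sketch stops at the t statistics and degrees of freedom, because the Python standard library has no t distribution; P-values would have to come from statistical tables or a statistics package. All names are illustrative.

```python
from math import sqrt
from statistics import mean

def regress_observed_on_predicted(observed, predicted):
    """OLS of observed (Y) on predicted (X) with tests of b0 = 0 and b1 = 1."""
    n = len(observed)
    mx, my = mean(predicted), mean(observed)
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))

    b1 = sxy / sxx              # slope
    b0 = my - b1 * mx           # intercept

    residuals = [y - (b0 + b1 * x) for x, y in zip(predicted, observed)]
    s2 = sum(r * r for r in residuals) / (n - 2)   # residual variance
    se_b1 = sqrt(s2 / sxx)
    se_b0 = sqrt(s2 * (1.0 / n + mx * mx / sxx))

    return {
        "b0": b0,
        "b1": b1,
        "t_b0": b0 / se_b0,          # tests H0: intercept = 0
        "t_b1": (b1 - 1.0) / se_b1,  # tests H0: slope = 1
        "df": n - 2,
    }

fit = regress_observed_on_predicted(
    observed=[1.1, 1.9, 3.2, 3.8],
    predicted=[1.0, 2.0, 3.0, 4.0],
)
```

Each t statistic is compared with the two-sided critical value of the t distribution with `df` degrees of freedom at α = 0.05.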
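The agreement statistics can be computed together in a short Python sketch. It assumes the standard form of Lin's concordance correlation coefficient (CCC = ρ × Cb, with scale shift v = s_o/s_p and location shift u) and a three-part MSEP decomposition into squared bias, unequal variances and incomplete (co)variation; whether these match the cited methods term by term is an assumption. Population standard deviations (divisor n) are used so that the three components sum exactly to MSEP.

```python
from math import sqrt
from statistics import mean, pstdev

def agreement_stats(observed, predicted):
    """CCC with its components, and the MSEP decomposition."""
    mo, mp = mean(observed), mean(predicted)
    so, sp = pstdev(observed), pstdev(predicted)   # population SDs (divisor n)
    cov = mean((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    rho = cov / (so * sp)

    v = so / sp                        # scale shift
    u = (mo - mp) / sqrt(so * sp)      # location shift
    cb = 2.0 / (v + 1.0 / v + u * u)   # bias correction factor

    msep = mean((p - o) ** 2 for o, p in zip(observed, predicted))
    sb = (mp - mo) ** 2                          # squared bias
    unequal = (sp - so) ** 2                     # unequal variances
    incomplete = 2.0 * sp * so * (1.0 - rho)     # incomplete (co)variation

    return {
        "rho": rho, "v": v, "u": u, "Cb": cb, "CCC": rho * cb,
        "MSEP": msep, "SB": sb, "U": unequal, "I": incomplete,
    }

stats = agreement_stats(
    observed=[10.0, 12.0, 14.0, 16.0],
    predicted=[12.0, 12.0, 13.0, 13.0],
)
shares = {k: 100.0 * stats[k] / stats["MSEP"] for k in ("SB", "U", "I")}
```

In this toy example the predictions have a much narrower range than the observations, so the scale shift is large and most of MSEP falls in the unequal-variances component, while the squared bias is small.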

Descriptive evaluation of random variation
The observed values of MI indicated that, in general, the variation between cows was larger than the variation within cows (Fig. 1) and this predominance of variation between rather than within cows tended to increase from early to late lactation. This could have been due to differences in dry matter intake (DMI) to meet nutritional requirements depending on lactation stage and milk production level [38]. It is well known that DMI is positively correlated with CH4 production [39]. DMI is also associated with milk yield [40], and hence these parameters are intercorrelated. Thus predicted MI might not only reflect the amount of CH4 produced, but also variations in milk production between cows.

S. Mohamad Salleh et
The variation in predicted MI showed a different pattern to that in observed MI. Except for the measurements taken between lactation weeks 25 and 36, most of the random variability in predicted values was associated with variations within cows (i.e., among measurements) instead of between cows. The variation in predicted MI was also substantially smaller than that in observed MI. The models were thus not able to reproduce the individual variation in MI. For 16 out of 17 variance components, the estimates associated with predicted MI were numerically lower than those for observed MI. The only exception to this was the variation between measurements obtained from the 7th to 12th week of lactation. This overall pattern provided the first evidence that MI prediction from milk MIRS was unable to account for the variation found in the observed data. The individual variation in MI was not captured in the predictions based on milk MIRS for any of the lactation sub-periods.
After pre-processing of the milk MIRS data, we tested several ways of selecting the variables (wavenumbers) to include in the prediction model. The more wavenumbers that were included as explanatory variables in the prediction model, the better the prediction accuracy. Similarly, a study by [27] found that the coefficient of determination (R²) was improved when full spectra were used instead of selected wavenumbers. To ensure that no important information was excluded, all 935 wavenumbers in the MIRS data were used for predictions in the present study. In some previous studies [23,41,42], milk composition parameters (e.g. fat and protein) have been included as variables together with milk MIRS values. However, since milk composition was derived from milk MIRS, and thus reflected in the spectral data, we did not include more variables related to milk composition in the prediction model, to avoid redundant information in the model.

Evaluation of model performance
There are various ways of evaluating the performance of prediction models. The most common methods, which also make it easy to understand model outcomes, are explained by [35], who tested various methods for summarising and evaluating mathematical models specifically used in agriculture. In the present study, we used regression analysis, CCC analysis and decomposition of MSEP to evaluate the models, based on the dataset where data were partitioned into seven lactation sub-periods.
Overall, MI showed a numerical increase as the lactation period progressed (Table 1) and this trend was observed for both observed and predicted values. The average predicted and observed values were close to each other, resulting in mean prediction errors that were nonsignificant (P ≥ 0.81) and numerically close to zero. However, despite the similarity in average values, the R² for LOCO cross-validation was low, indicating poor prediction capacity of the models.
Corroborating the pattern seen with LOCO cross-validation R², the regression analysis indicated disagreement between predicted and observed MI values, with an intercept different from 0 and a slope different from 1 (P ≤ 0.046), regardless of the lactation sub-period (Table 1). This indicates that in all lactation sub-periods, the relationship between predicted and observed values was different from unity. When the full dataset was used for prediction, neither of the null hypotheses (Eqs. (4) and (5), testing the difference of the intercept from 0 and the difference of the slope from 1) was rejected (P ≥ 0.13). However, graphical evaluation of the ordered pairs showed quite a dispersed pattern that was far from an ideal relationship (Fig. 2). This pattern was confirmed by the numerical estimates of the intercept and slope, which were far from the ideal parametric values (Eqs. (4) and (5)), with the standard error for intercept and slope comprising 66% and 17% of the respective estimates for the full dataset. The poor quality of MI prediction from milk MIRS was also indicated by weak CCC, which ranged in value from -0.224 to 0.122 (Table 1). All bias correction factor estimates were far from unity, confirming the deviation from the parametric slope in the linear relationship described above. This indicates a high degree of bias in the linear relationship between predicted and observed MI. On the other hand, the values associated with location shifts were low and close to zero, corroborating the findings for mean prediction bias. The main constraint identified was high values of the scale shift characteristic, in agreement with the observation that the variation in observed values was wider than that in predicted values. Joint evaluation of these two characteristics indicated that milk MIRS information was able to produce adequate mean values of MI, both within lactation sub-periods and for the full lactation, but was not able to adequately simulate the pattern and range of data variation. Thus, using milk MIRS data to predict MI would result in a small range of predicted values that would not deviate far from the average observed values. In addition, it appears that the model used in the present study had limitations in accurately predicting data with high MI values. The evaluation of MSEP confirmed the findings for model performance obtained using the other diagnostics (Table 1). Prediction of MI from milk MIRS had no bias, which allowed accurate prediction of the average values. The main constraints in prediction of values were associated with reproducing the variation in actual data in terms of range (unequal variances) and direction (incomplete (co)variation).

Table 1
Statistics for comparison between observed methane intensity (g/kg milk yield ± SD) in Swedish dairy cows and values predicted using information from milk mid-infrared spectroscopy.
Physiologically, many events occur during the milk production process, from rumen fermentation to synthesis of milk compounds [32]. Theoretically, when one mol of glucose from cellulose is completely fermented to acetate, there is net production of one mol of methane [43], so the correlation between acetate production and methane production should be strong and positive. However, the final metabolic fate of acetate is not deterministic, as it can be used in many different metabolic pathways. Synthesis of milk components is one such metabolic pathway but, while all acetate can be used for milk fat synthesis, the exact type of fatty acid in which it is incorporated cannot be predicted. This partitioning of fermentation products into different metabolic pathways, including milk synthesis, would weaken the association between rumen fermentation pattern and milk composition, thus affecting the relationship between MI and milk MIRS.
The dataset used in the present study was unique, because the data were collected from 37 individual cows for which MI observations throughout the whole lactation were available. This made it possible to study differences in the predictive ability of models built for different lactation sub-periods, and for the full lactation. Despite the limitations with using milk MIRS to predict individual MI identified in this work, one specific pattern emerged from the different validation processes, namely that there was no bias in predicting average MI based on the full dataset. This indicates that while MIRS information cannot be used to predict MI for an individual cow, it may provide an accurate estimate of average MI at herd level. If the average predictions across animals in a group or herd could be used, this would be useful for different applications, such as to discriminate MI between herds or to provide information for MI inventory. For methane inventories in particular, predicting methane using the information from milk MIRS could provide benefits, as it is a cheap, fast, high-throughput and easily available method [20,44]. However, the potential for using milk MIRS for this purpose requires further evaluation. In addition, it was evident that the MI measurements made during the first part of the lactation showed higher variation than those made later in the lactation. Therefore, measurements should perhaps be performed later in lactation if the aim is to evaluate differences in MI between cows.
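The stoichiometry behind the statement of one mol of methane per mol of glucose fermented to acetate can be written out as a worked balance. This is the textbook idealisation, assuming that all hydrogen released during acetate formation is used by methanogens to reduce carbon dioxide; it is not a result from the present dataset.

```latex
% Fermentation of glucose to acetate releases CO2 and H2
\mathrm{C_6H_{12}O_6} + 2\,\mathrm{H_2O} \rightarrow 2\,\mathrm{CH_3COOH} + 2\,\mathrm{CO_2} + 4\,\mathrm{H_2}

% Methanogenesis consumes the H2
\mathrm{CO_2} + 4\,\mathrm{H_2} \rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O}

% Net reaction: one mol CH4 per mol glucose
\mathrm{C_6H_{12}O_6} \rightarrow 2\,\mathrm{CH_3COOH} + \mathrm{CO_2} + \mathrm{CH_4}
```

Under this idealisation two mol of acetate accompany each mol of methane, which is the basis for the expected positive correlation between acetate and methane production.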

Conclusions
Information from milk MIRS sampled fortnightly at morning milkings proved to be unsuitable for predicting methane intensity (g/kg milk production) in individual multiparous cows at any stage of lactation, with low prediction accuracy and poor capacity to reproduce between-cow variation. Predicted between-cow variation was closer to that in observed values in the latter half of lactation, so that period might be more suitable for evaluating methane intensity in individual cows. The average prediction values for the present dataset were consistently accurate, suggesting that predictions on herd level using milk MIRS may be achievable. The potential to use milk MIRS for this purpose should be evaluated in future studies.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Partitioning of total variance in model-observed (O) values of CH4 intensity (MI, g/kg milk) in Swedish dairy cows and model-predicted (P) values based on information from milk mid-infrared spectroscopy (numbers inside bars are absolute values of variance).

Fig. 2. Relationship between observed values of CH4 intensity (MI, g/kg milk yield) in Swedish dairy cows and predicted values based on information from milk mid-infrared spectroscopy (solid and dashed lines correspond to the equality line and the least squares regression line, respectively; for details, see Table 1).