Evaluation of reticuloruminal pH measurements from individual cattle: Sampling strategies for the assessment of herd status

The application of pH observations to clinical practice in dairy cattle is based on criteria derived primarily from single time-point observations more than 20 years ago. The aims of this study were to evaluate these criteria using data collected using continuous recording methods; to make recommendations that might improve their interpretation; and to determine the relationship between the number of devices deployed in a herd and the accuracy of the resulting estimate of the herd-mean reticuloruminal pH. The study made use of 815,475 observations of reticuloruminal pH values obtained from 75 cattle in three herds (one beef and two twice-daily milking herds) to assess sampling strategies for the diagnosis of sub-acute rumen acidosis (SARA), and to evaluate the ability of different numbers of bolus devices to accurately estimate the true herd-mean reticuloruminal pH value at any time. The traditional criteria for SARA provide low diagnostic utility, the probability of detection of animals with pH values below specified thresholds being affected by a strong effect of time of day and herd. The analysis suggests that regardless of time of feeding, sampling should be carried out in the late afternoon or evening to obtain a reasonable probability of detection of animals with pH values below the threshold level. The among-cow variation varied strongly between herds, but for a typical herd, if using reticuloruminal pH boluses to detect a predisposition to fermentation disorders while feeding a diet that is high in rapidly fermentable carbohydrates, it is recommended to use a minimum of nine boluses.


Introduction
Single time-point (discontinuous) measurement of reticuloruminal fluid pH has been used by veterinarians for the last 25 years to assess the possible contribution of reticuloruminal acidosis to health problems such as laminitis, left-displaced abomasum, diarrhoea and poor performance (Dirksen and Smith, 1987; Nordlund and Garrett, 1994; Nordlund et al., 1995; Garrett et al., 1999; Plaizier et al., 2009). The use of discontinuous sampling strategies of reticuloruminal pH for the detection of nutritionally induced acidosis has been reviewed previously (Enemark et al., 2004; Enemark, 2009; Tajik and Nazifi, 2011; Kleen and Cannizzo, 2012). The criteria that have been most widely applied are based on recommendations by K.V. Nordlund, G.R. Oetzel and E.F. Garrett from the University of Wisconsin (Nordlund and Garrett, 1994; Nordlund et al., 1995), who first recommended that six cows in the late dry period, six cows in the immediate postpartum period and six cows 21-60 days postpartum should be sampled by rumenocentesis. The recommended sampling time was 2-5 h after feeding with grain if individual components were fed separately or 5-8 h post feeding if a total mixed ration (TMR) was used. The later paper (Nordlund et al., 1995) makes similar recommendations but specifies two groups (early postpartum and adapted) and that samples be obtained 2-4 h post feeding with concentrates (rather than 2-5 h) and 4-7 h after TMR access (rather than 5-8 h). In both papers, the pH thresholds for rumenocentesis-derived samples were: ≤5.5 = abnormal; 5.6-5.8 = marginal; >5.8 = normal. A herd was classified as having a problem with SARA if one or more groups had two or more animals with a pH ≤5.5. The original diagnostic criteria for SARA were revised by analysis of sequential rumenocentesis pH values (Garrett et al., 1999). It was concluded that a reasonable compromise for test performance was obtained if 12 animals were sampled and three of them had pH ≤5.5. In that case, the probability of correctly classifying the herd as being at risk of SARA ranged from 0.25 to 0.98, being highest when the prevalence of low reticuloruminal pH was either less than 0.1 or higher than 0.9.
Reticuloruminal pH can now be monitored continuously using commercially available boluses in the reticulum (Gasteiner et al., 2012). The cost of these devices will likely preclude their widespread use within the foreseeable future, but experimental data obtained from continuous monitoring devices can be used to refine the current recommendations in relation to interpretation of rumenocentesis data. One challenge is that the pH of reticuloruminal fluid varies with location within the rumen (Duffield et al., 2004), and while rumenocentesis procedures as recommended by Garrett et al. (1999) would be expected to produce a sample from the ventral sac or the caudoventral blind sac of the rumen, continuous monitoring devices for reticuloruminal pH are typically retained in the reticulum (Gasteiner et al., 2012). Although the authors concluded that it was not possible to recommend a fixed conversion factor, recent work (Falk et al., 2016) found that the pH was on average 0.24 pH units higher in the reticulum than in the rumen.
This study was based on continuously monitored reticuloruminal pH values obtained from beef and dairy cattle enrolled in feeding trials. These populations were used to assess the performance of previously defined sampling strategies for suboptimal reticuloruminal pH, using transformed ruminal pH data that were derived from continuous observations of reticular pH. We used the same dataset to evaluate the precision of estimation of a herd's true reticuloruminal pH values to provide guidance on deployment rates of continuous monitoring devices for reticuloruminal pH. The aims were to evaluate current criteria for the diagnosis of SARA by single time-point reticuloruminal pH observations, and to determine the relationship between the number of devices deployed in a herd and the accuracy of the resultant herd-mean reticuloruminal pH estimate.

Cattle and diets
This study is an opportunistic, retrospective study using data that are broadly representative of contemporary intensive cattle management in Europe. All of the observations were obtained from cattle that were considered to be clinically normal. Continuously monitored reticuloruminal pH observations were obtained from 24, 28 and 23 animals, respectively, from Farm A, a German dairy farm at low risk of SARA; Farm B, a UK beef research farm in which a group of animals received a diet formulated to induce SARA; and Farm C, a UK dairy research farm in which the cows were also fed a diet intended to induce SARA. Boluses for monitoring reticuloruminal pH are commercially available and recommended for use in routine husbandry. They were used in compliance with the relevant animal ethical regulations in each country (UK: UK Home Office licences PPL 60/4378 (issued 8 November 2012) and PPL 60/4156 (issued 29 January 2013) respectively for Farms B and C; boluses were used on German dairy farms and data were captured in the course of their routine husbandry procedures and were therefore outside the requirement for ethical approval). Table 1 shows the dietary inputs for each group of animals. On Farm A - Dairy Low, the cows were in early lactation and were maintained on a single, silage-based total mixed ration (TMR) in which the concentrate:roughage ratio was constant throughout the monitoring period, and which was therefore intended to present a low risk of SARA. On this farm, the boluses were being used as a management tool. The average daily milk yield of cows on Farm A during the study was 35.8 L/day, with a standard deviation of 5.6 L/day. Cattle on Farm B - Beef High were provided with a high-concentrate, beef finishing dietary regime expected to predispose the animals to SARA. The cattle were subjected to a transition from a basal forage-based diet (3 weeks at a 48:52 forage:concentrate ratio), through a 2-week transition ration (27:73 forage:concentrate ratio) to 4 weeks on a very high energy density, concentrate-rich, beef cattle fattening diet (9:91 forage:concentrate ratio). Thirty-six steers of 13-15 months of age were used in the experiment (Aberdeen Angus × Limousin), of which data from 28 were considered to be free of artefacts and therefore suitable for inclusion in the present analysis. In the present study, we use only the data from the final period when cattle were fed 91% of DM as concentrate. The mean bodyweight at commencement of the study was 597 ± 39 kg and at conclusion of the 100-day feeding period it was 677 ± 43 kg, with an average daily liveweight gain during the high concentrate period of 1.67 ± 0.49 kg/day. The 23 cows on Farm C - Dairy High were offered concentrate feed in the parlour and also provided with ad libitum access to a partial mixed ration (PMR). The challenge diet achieved a high concentrate to forage ratio and a low percentage of neutral detergent fibre (NDF) from forage, which was expected to result in a high risk of SARA. Cows ranged widely in age and stage of lactation. Their average milk yield was 36.61 ± 3.07 L/day.

Devices
The devices used on Farms A and C were the SMAXTEC SENSOR +PH bolus (smaXtec animal care sales GmbH) and those used on Farm B were the Well Cow pH/temperature Bolus (Well Cow Limited). All devices were calibrated according to the manufacturers' instructions before administration via balling gun to the animal. The Well Cow boluses in beef cattle were calibrated before administration, recovered at time of slaughter and checked for electrode drift and clock synchronisation. All data from malfunctioning devices were discarded. Data from all the smaXtec boluses in the dairy cattle were recorded telemetrically during the course of the study. All observations from all farms were screened by graphical representation and if device malfunction was suspected on the basis of either fixation at any single value or a sustained drift away from an animal's modal ruminal pH value (in the absence of the characteristic diurnal rhythm), all observations from that animal were discarded. All observations during the first 12 h of deployment and any observations after the 50-day manufacturer's guarantee period were also discarded.

Statistical analysis
All data were analysed using R (R Core Team, 2015), using plots to visualise the diurnal patterns of pH measurements through the day for each animal. Separate linear models incorporating a sine wave describing the time of day were fitted individually to the data observed from each animal in a manner similar to that described by Denwood et al. (2018), and estimates of the amplitude and times of peaks and troughs in pH (phase shift) were extracted from these models and compared between farms using t tests. Observed mean pH measurements were calculated for each animal, and these mean estimates were compared pairwise between farms using t tests.
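The per-animal sine-wave model described above can be illustrated with a minimal sketch. The original analysis was carried out in R; the Python code below is only an assumed, simplified equivalent with a single 24 h harmonic, and the function name `fit_diurnal_sine` and the synthetic data are hypothetical rather than taken from the study.

```python
import numpy as np

def fit_diurnal_sine(hours, ph):
    """Least-squares fit of pH = mean + a*sin(w*t) + b*cos(w*t), 24 h period.

    hours: time of day (or hours since midnight of day 0) for each observation.
    ph:    pH observations from a single animal.
    """
    hours = np.asarray(hours, dtype=float)
    ph = np.asarray(ph, dtype=float)
    w = 2.0 * np.pi / 24.0  # angular frequency of a 24 h cycle
    # Design matrix of an ordinary linear model: intercept, sine and cosine terms.
    X = np.column_stack([np.ones_like(hours), np.sin(w * hours), np.cos(w * hours)])
    (mean, a, b), *_ = np.linalg.lstsq(X, ph, rcond=None)
    amplitude = np.hypot(a, b)
    # a*sin(wt) + b*cos(wt) = amplitude * cos(wt - phi), with phi = atan2(a, b).
    phi = np.arctan2(a, b)
    peak_hour = (phi / w) % 24.0
    trough_hour = (peak_hour + 12.0) % 24.0
    return {
        "mean": float(mean),
        "amplitude": float(amplitude),
        "peak_hour": float(peak_hour),
        "trough_hour": float(trough_hour),
    }

# Synthetic example (made-up values): two days of 15-min readings, peak at 06:00.
t = np.arange(0.0, 48.0, 0.25)
example_ph = 6.3 + 0.3 * np.cos(2.0 * np.pi * (t - 6.0) / 24.0)
print(fit_diurnal_sine(t, example_ph))
```

Fitting the sine and cosine terms together keeps the model linear in its coefficients, so amplitude and phase can be recovered afterwards without non-linear optimisation.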
To estimate the probability of diagnosing SARA on the basis of a single observation per animal below a specified diagnostic threshold, firstly a single pH observation from within 15 min of a given time of day from each animal was sampled, ignoring the date on which the sample was taken. These observations were then transformed by deducting 0.24 pH units, which corresponds to the average difference between reticular and ruminal pH values (Falk et al., 2016). A given number of animals (from the same farm) were randomly sampled from these data without replacement, and the number of transformed observations below the specified threshold was determined. A total of 10,000 iterations were used to give an estimate of the probability of diagnosing SARA for each combination of time of day, recommended number of animals, pH threshold, and proportion of tested animals below the threshold. This was done separately for each farm, using the following values for pH threshold: 5.0, 5.5, 5.8, 6.0; number of sampled animals: 3, 4, 6, 12; proportion of animals required to be below the threshold for a positive diagnosis of SARA: 1/3, 1/4, 1/6, 2/4, 2/6, 3/6, 6/12.
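The resampling procedure above can be sketched as follows. This is an illustrative Python outline, not the R code used for the analysis: the function name, the input layout (one array per animal holding the reticular pH readings recorded within 15 min of the chosen time of day) and the synthetic herd are all assumptions made for the example.

```python
import numpy as np

# Average reticulum-minus-rumen difference used in the text (Falk et al., 2016).
RETICULAR_TO_RUMINAL_OFFSET = 0.24

def probability_of_positive_diagnosis(reticular_ph_by_animal, n_animals,
                                      n_required, threshold,
                                      n_iter=10_000, rng=None):
    """Monte Carlo estimate of P(at least n_required of n_animals below threshold).

    reticular_ph_by_animal: list with one 1-D array per animal, containing the
        readings taken within 15 min of the time of day of interest (any date).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_herd = len(reticular_ph_by_animal)
    positives = 0
    for _ in range(n_iter):
        # One observation per animal, ignoring the date it was recorded on.
        one_obs = np.array([rng.choice(obs) for obs in reticular_ph_by_animal])
        # Transform reticular pH to an assumed ruminal equivalent.
        ruminal = one_obs - RETICULAR_TO_RUMINAL_OFFSET
        # Sample animals from the same farm without replacement.
        chosen = rng.choice(n_herd, size=n_animals, replace=False)
        # "Below the threshold" is coded as strict <; use <= if preferred.
        if np.count_nonzero(ruminal[chosen] < threshold) >= n_required:
            positives += 1
    return positives / n_iter

# Synthetic herd (made-up values): 24 animals, 30 readings each.
demo_rng = np.random.default_rng(42)
herd = [demo_rng.normal(6.0, 0.25, size=30) for _ in range(24)]
print(probability_of_positive_diagnosis(herd, 12, 3, 5.5, n_iter=500, rng=demo_rng))
```

Repeating the call for each time of day, threshold and animal-count combination would give one probability curve per criterion under these assumptions.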
To quantify the effect of number of devices deployed on the accuracy of the estimation of a herd's reticuloruminal pH status, a set of pH measurements taken from animals in the same herd at the same time and date was sampled with replacement from the available measurements. The absolute difference between the mean of the sample and the true mean of the observed pH from all animals at that date and time in that herd was recorded. One thousand iterations were used to give median, 75% quantile and 99% quantile estimates for the mean absolute error associated with each combination of time of day, herd, and number of animals sampled. To identify the optimum number of boluses for each herd, the benefit of adding one more bolus was calculated for each sample size as the relative decrease in median estimate of the absolute difference. A threshold was arbitrarily set where the decrease in absolute difference associated with adding a single extra bolus fell to less than 5% of its value.
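A minimal sketch of this calculation is given below. It is written in Python for illustration only (the study itself used R), the function names are hypothetical, and the way the 5% rule is coded is one possible reading of the description above rather than the authors' exact implementation.

```python
import numpy as np

def absolute_error_quantiles(herd_ph, n_sampled, n_iter=1000, rng=None):
    """Median, 75% and 99% quantiles of |sample mean - herd mean|.

    herd_ph: pH readings from all animals in one herd at one date and time.
    """
    rng = np.random.default_rng() if rng is None else rng
    herd_ph = np.asarray(herd_ph, dtype=float)
    true_mean = herd_ph.mean()
    # Draw n_iter samples of n_sampled animals, with replacement.
    samples = rng.choice(herd_ph, size=(n_iter, n_sampled), replace=True)
    abs_err = np.abs(samples.mean(axis=1) - true_mean)
    q50, q75, q99 = np.quantile(abs_err, [0.5, 0.75, 0.99])
    return float(q50), float(q75), float(q99)

def first_n_below_cutoff(median_error_by_n, cutoff=0.05):
    """Smallest n at which one extra bolus cuts the median error by < cutoff.

    median_error_by_n[i] is the median absolute error for n = i + 1 boluses.
    Returns None if the relative decrease never falls below the cutoff.
    """
    for i in range(len(median_error_by_n) - 1):
        current, nxt = median_error_by_n[i], median_error_by_n[i + 1]
        if current > 0 and (current - nxt) / current < cutoff:
            return i + 1
    return None

# Synthetic herd at one time point (made-up values).
demo_rng = np.random.default_rng(7)
herd_now = demo_rng.normal(6.2, 0.3, size=28)
medians = [absolute_error_quantiles(herd_now, n, rng=demo_rng)[0] for n in range(1, 13)]
print(medians)
```

With only 1000 iterations the median error is itself noisy, so in practice the relative decrease would need to be averaged over many time points before applying the cutoff.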

Results
There were no overt cases of clinical acidosis or indigestion despite the challenge imposed on Farms B and C. After elimination of faulty devices, data were used from 24 of 24 deployed devices on Farm A, 28 of 39 deployed devices on Farm B and 23 of 23 deployed devices on Farm C. Fig. 1 shows summary statistics of the individual cow reticuloruminal pH values. The animal mean pH values, amplitude and time of lowest pH all differed significantly on beef Farm B relative to both dairy farms (both P < 0.001), but not significantly between the dairy farms (P = 0.26 for mean; P = 0.70 for amplitude; P = 0.70 for time of lowest pH). Each farm differed significantly from each of the others in respect of the phase shift (all P < 0.001). Fig. 2 shows a time series plot from one representative cow on dairy Farm C, showing the fitted sine curve in red superimposed on all of the observed traces for each day in grey and the mean of the instantaneous observations in blue. Plots from all animals are included in the online supplementary information. Fig. 3 shows the probability that a sample will have a pH below a specified diagnostic threshold depending on the time of day. Given the accepted criterion of 3 out of 12 animals with pH ≤5.5, there was close to 100% probability of a positive diagnosis [...]. At no point in time would Dairy Farm A be considered to be at risk of SARA according to this definition. On Dairy Farm C, which was exposed to a SARA challenge diet, the probability of detection at the recommended criteria never exceeded 12%. Adjusting the criteria by increasing the pH threshold naturally resulted in increased probabilities for detecting values below the threshold, and reducing the requirement for 3 animals below the threshold to only 1 animal below the threshold also resulted in increased probabilities of detection, particularly for Dairy Farm C.
The expected absolute error in instantaneous pH sampled from between 1 and 12 cows relative to the actual mean of the whole herd is shown in Fig. 4. There is a low overall variability among animals in Farms A and C relative to Farm B, resulting in a lower absolute difference for all sample sizes. The mean pH from a sample of 3 animals from Farm A was within 0.1 pH units of the true mean 50% of the time, and within 0.25 pH units 99% of the time. In contrast, the mean of a sample of 3 animals from Farm B was within 0.25 pH units only 50% of the time, and within 0.8 pH units 99% of the time. Shading on Fig. 4 shows the parameter region where the decrease in absolute difference associated with adding a single extra bolus fell to less than 5% of its value, which demonstrates the diminishing returns associated with increasing sample size. According to this criterion, the optimal number of animals was 9 for both Farms A and C, but 10 animals for Farm B.
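As a rough cross-check on the 5% criterion (an idealised sketch that was not part of the original analysis), assume that the sampled animals are drawn independently, so that the expected absolute error of the sample mean, written here as E(n) for n boluses, scales approximately with 1/sqrt(n). The symbols E(n) and Δ(n) are introduced only for this illustration. Under that assumption the relative benefit of one extra bolus depends only on n:

```latex
\[
  E(n) \propto \frac{1}{\sqrt{n}}, \qquad
  \Delta(n) = \frac{E(n) - E(n+1)}{E(n)} = 1 - \sqrt{\frac{n}{n+1}}
\]
\[
  \Delta(8) \approx 0.057, \qquad
  \Delta(9) \approx 0.051, \qquad
  \Delta(10) \approx 0.047
\]
```

Under this idealisation the relative benefit drops below 5% at around 9 to 10 boluses, which is in line with the simulation-based values reported here; simulated estimates can deviate from it because they rest on finite numbers of animals and iterations.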

Discussion
The currently accepted criterion for the assessment of reticuloruminal pH as part of a diagnosis of SARA is that 3 out of 12 randomly selected animals from a herd or group of cattle have a reticuloruminal pH ≤5.5 at a specified range of times after feeding, measured by rumenocentesis. This study suggests that this criterion will likely have only moderate diagnostic utility. The diurnal cycle strongly affects the probability of detection of reticuloruminal pH values below a specified threshold, to the extent that the same group of animals had a probability of either 100% or 30% of being identified as having sub-threshold pH values at different hours of the day. Given the high concentrate ration fed to the Beef Farm B animals, and the very conservative ration that was fed to the Dairy Farm A animals, the desired profile from Fig. 3 would be one in which the probability of detection of SARA is constant and close to 0% for Farm A and 100% for Farm B. Dairy Farm C should be expected to fit somewhere in between, depending on the true SARA status of the herd. However, as seen in Fig. 3, this is not the case for any of the alternative sets of criteria. The trace for 3 of 12 animals below a ruminal pH threshold of 5.8 (reticular pH equivalent of 6.04) provides a good approximation of the ideal, but only for the period 16:00 to 21:00 h. In the Beef Farm B animals that were fed at approximately 07:00 h, the Garrett et al. (1999) recommendation of sampling 5-8 h after making fresh TMR available to animals would mean sampling between 12:00 h (when the probability of a positive diagnosis in this herd was <50%) and 15:00 h (when the probability of a positive diagnosis had climbed to just under 90%). Our analysis suggests that on this farm, the probability of rumenocentesis samples being below the threshold set by Garrett et al. (1999) would be highest if samples were taken between about 16:00 h and 05:00 h.
The currently accepted criteria for the definition and diagnosis of SARA are heavily based on clinical studies from Wisconsin in the mid to late 1990s (Nordlund and Garrett, 1994; Nordlund et al., 1995; Garrett et al., 1999). In all cases the diagnosis was made at a herd or group level rather than individual animal and the intention was to identify the presence of herd or group-level risk factors that contributed to low reticuloruminal pH. The authors made it clear that reticuloruminal pH values were only one component of a diagnosis that also included anamnestic, environmental, historical and clinical information. The recommendations regarding reticuloruminal pH were made on a clinical and inferential rather than statistical basis and without the benefit of the recently available knowledge of the variation in reticuloruminal pH of cattle on farms under diverse environmental conditions (Denwood et al., 2018). The main justification for the threshold of pH 5.5 was that it was the level below which feed intake was reported to be depressed in a small study (four rumen-fistulated beef steers) on the adaptation to corn or wheat diets (Fulton et al., 1979). Feed intake declined as pH fell below 5.5, and causality was apparently assumed, although it was clear that corn and wheat had very different effects on pH and on intake. Nonetheless, in the absence of more robust data, this threshold seems to have assumed general acceptance among the clinical and research community. More recently, Aschenbach et al. (2011) examined the basis for defining tolerable pH thresholds and concluded that a first threshold at pH = 5.8 is appropriate due to strong evidence of an increase in the proportion of lactate-utilising micro-organisms below this level, and that this approximately coincides with the level at which early inflammatory processes can occur. They proposed a second threshold at a level of pH = 5.0, at which protozoa die, the microbiome changes again and epithelial transport and barrier function are compromised. The first threshold proposed would be roughly consistent with the concept of SARA, whereas the second threshold would be more consistent with previous definitions of acute acidosis.
The treatment of diurnal variation in pH within the literature shows possible confirmation bias. Early studies using discontinuous sampling methods reported that pH nadirs occurred some hours after feeding (e.g. Nordlund and Garrett, 1994; Nordlund et al., 1995). Other authors have tended to confirm this relationship although their data do not always provide strong support (e.g. Duffield et al., 2004). An alternative hypothesis that is more consistent with our data is that the strongest determinant of the diurnal cycle is the night-time period of high rates of rumination in company with reduced feeding, likely causing increased saliva production and allowing the removal of acidic products of carbohydrate ingestion from the rumen. This is consistent with our observation on all farms that peak reticuloruminal pH values were seen in the early morning, and nadirs were seen in the late evening. We have recently taken a statistical modelling approach to describe the temporal variation in reticular pH in cattle on dairy farms that shows a strong and predictable diurnal rhythm in pH with a similar peak time during the day (Denwood et al., 2018). Farm, animal and time of day were the dominant terms, exceeding the effect of intervals between milking times (which is believed to be a proxy for feeding time intervals).
Fig. 3. Probability of a sample falling below a specified diagnostic threshold by time of day and using alternative thresholds and numbers of animals observed. The shaded panel shows the recommended criteria for risk of sub-acute rumen acidosis (SARA) as given by Garrett et al. (1999). Farms B and C represent herds in which SARA was at a very high risk according to dietary inputs. The figure shows that even in beef herd B, which was on a 90% concentrate diet, the probability of meeting the recommended criteria (3/12 animals with pH ≤5.5) was <25% between 09:00 h and 15:00 h. On dairy farm C, which was on a SARA challenge diet, the probability of detection at the recommended criteria never exceeded 25%, but if the threshold was set at pH = 5.8, the probability of diagnosis with 3/12 animals would exceed 50% between 15:00 h and 21:00 h.
The results in relation to the effect of the number of devices deployed and the precision of estimation of reticuloruminal pH confirm that the accuracy of herd-mean pH estimates is lower in herds with a higher degree of among-cow variation than those that are more consistent between cows. The observations are broadly consistent with the manufacturers' claims that the smaXtec boluses have an accuracy of ±0.2 pH units from deployment until 90 days of life, and the Well Cow boluses have an accuracy of ±0.3 pH units. Considering the relative decrease in absolute difference with increasing number of boluses deployed, the optimal number of boluses was between 9 and 10 in all three of the herds examined. On the basis of our analysis, we would recommend the use of 9 boluses. Even with high variability among animals, this would ensure that the estimate of herd-mean pH was within 0.5 pH units of the true value 99% of the time.
This study used data derived from two types of bolus and, as it was opportunistic and retrospective, we did not have a means for standardising their performance. However, all the Well Cow boluses in beef cattle were calibrated before administration, recovered at time of slaughter and checked for electrode drift and clock synchronisation. All data from 8 malfunctioning devices were discarded. The smaXtec boluses were all calibrated before insertion, and daily data recovery and examination showed that the devices performed consistently with no evidence of drift during the experimental period. Non-surgical recovery and post hoc calibration are not possible with these devices in lactating dairy cows. Nonetheless, to minimise any possibility of electrode drift affecting the results, we applied filtering criteria that were conservative relative to manufacturers' claims.

Conclusions
This study suggests that rumenocentesis should be carried out late in the afternoon or in the evening to maximise the probability of detecting low pH values, regardless of time of feeding. The previously published criteria of 3 animals from a group of 12 with a pH ≤5.5 do not provide a robust diagnosis. Our results also suggest that a minimum of 9 boluses would provide a reasonable estimate of the true mean pH for herds at high risk of rapidly fermentable carbohydrate-induced fermentation disorders, assuming that the intra- and inter-animal variation in instantaneous pH is similar to that observed in our data.

Conflict of interest
None of the authors has any financial or personal relationships that could inappropriately influence or bias the content of the paper.