Dairy management practices associated with multi-drug resistant fecal commensals and Salmonella in cull cows: a machine learning approach

Background Understanding the effects of herd management practices on the prevalence of multidrug-resistant pathogenic Salmonella and commensals Enterococcus spp. and Escherichia coli in dairy cattle is key in reducing antibacterial resistant infections in humans originating from food animals. Our objective was to explore the herd and cow level features associated with the multi-drug resistant, and resistance phenotypes shared between Salmonella, E. coli and Enterococcus spp. using machine learning algorithms. Methods Randomly collected fecal samples from cull dairy cows from six dairy farms in central California were tested for multi-drug resistance phenotypes of Salmonella, E. coli and Enterococcus spp. Using data on herd management practices collected from a questionnaire, we built three machine learning algorithms (decision tree classifier, random forest, and gradient boosting decision trees) to predict the cows shedding multidrug-resistant Salmonella and commensal bacteria. Results The decision tree classifier identified rolling herd average milk production as an important feature for predicting fecal shedding of multi-drug resistance in Salmonella or commensal bacteria. The number of culled animals, monthly culling frequency and percentage, herd size, and proportion of Holstein cows in the herd were found to be influential herd characteristics predicting fecal shedding of multidrug-resistant phenotypes based on random forest models for Salmonella and commensal bacteria. Gradient boosting models showed that higher culling frequency and monthly culling percentages were associated with fecal shedding of multidrug resistant Salmonella or commensal bacteria. In contrast, an overall increase in the number of culled animals on a culling day showed a negative trend with classifying a cow as shedding multidrug-resistant bacteria. Increasing rolling herd average milk production and spring season were positively associated with fecal shedding of multidrug- resistant Salmonella. Only six individual cows were detected sharing tetracycline resistance phenotypes between Salmonella and either of the commensal bacteria. Discussion Percent culled and culling rate reflect the increase in culling over time adjusting for herd size and were associated with shedding multidrug resistant bacteria. In contrast, number culled was negatively associated with shedding multidrug resistant bacteria which may reflect producer decisions to prioritize the culling of otherwise healthy but low-producing cows based on milk or beef prices (with respect to dairy beef), amongst other factors. Using a data-driven suite of machine learning algorithms we identified generalizable and distant associations between antimicrobial resistance in Salmonella and fecal commensal bacteria, that can help develop a producer-friendly and data-informed risk assessment tool to reduce shedding of multidrug-resistant bacteria in cull dairy cows.


INTRODUCTION
The Centers for Disease Control and Prevention (CDC) estimates that more than 2.8 million antimicrobial resistant infections occur in the U.S. with more than 35,000 deaths annually (Control CfD, & Prevention, 2019). Amongst the resistant bacteria, the CDC classifies nontyphoidal Salmonella enterica as a serious public health threat (Control CfD, & Prevention, 2019). Salmonella is an important foodborne zoonotic agent in the U.S. (Scallan et al., 2011) and several studies reported on its prevalence in cull cattle. Troutt et al. found that the prevalence of Salmonella in cecal contents of dairy cattle at slaughter in the Western U.S. ranged between 9.6% and 35.6% in the winter, and between 32.3% and 93% in the summer (Troutt et al., 2001). More recent studies reported on the associations between herd management and seasonal differences on the prevalence of Salmonella in fecal samples of cull dairy cows collected quarterly from seven California dairies with an overall Salmonella shedding prevalence of 3.42% (95% CI [1. 28-5.56]) (Abu Aboud et al., 2016). Pereira et al. (2019) followed six of the same California dairies for a second year showing an increase in Salmonella shedding prevalence (30.6%; 95% CI ). The increase in the prevalence was speculated to be due to increased rainfall and drier summer season and herd changes that occurred during the latter study (Adaska et al., 2020). The latter study explored how the study herd and cow level features were associated with shedding of antimicrobial resistant Salmonella and although 60% of the isolates were pan-susceptible, the remaining isolates were found to be resistant to different medically important antimicrobial drugs (MIAD) defined as antimicrobial drugs (AMD) that are important for therapeutic use in humans, with 12% of the isolates being multidrug resistance to more than two drug classes.
Fecal commensals such as E. coli and Enterococcus spp. can acquire mobile gene elements that encode antimicrobial resistance to these species (Aidara-Kane et al., 2018). Given the documented antimicrobial resistance in Salmonella isolated from cull dairy cows, further research on the similarity between resistance patterns of fecal commensals and Salmonella shed in feces of dairy cattle is needed.
Traditional risk factor approaches (mixed-effects modeling) can often have limitations due to high-dimensional, imbalanced, and non-linear data and can perform poorly in cases of large number predictive variables. To overcome these, we used a suite of classification tree models, a group of supervised machine learning models that can handle various types of data and handle interactions between predictive variables. Classification tree models like random forest, gradient boosting have been found useful in investigating the prevalence and associated risk factors of bovine viral diarrhea virus (Machado, Mendoza & Corbellini, 2015), swine pneumonia (Mollenhorst et al., 2019), and mastitis in dairy cattle (Hyde et al., 2020). In the study reported here, our objective was to explore the herd and cow level features associated with the multi-drug resistance and resistance phenotype shared between Salmonella, E. coli, or Enterococcus spp. using machine learning algorithms.

Farms surveys and sampling
The study was approved by the University of California, Davis's Institutional Animal Care and Use Committee (protocol number 18019). Six dairy farms located in the San Joaquin Valley of California were reenrolled as a convenience sample and followed up for a second year after being part of an earlier study (Abu Aboud et al., 2016). Briefly, cull cows were identified for fecal sampling once during each season between 2015 and 2016, specifically during summer (July 1-September 30, 2015), fall (October 1-December 31, 2015), winter (January 1-March 31, 2016) and spring (April 1-June 30, 2016). The choice of sampling week to collect fecal samples from the cull cows during any of the four seasons was also by convenience. From the list of cows selected by the dairy farms for culling, 10 cows were randomly selected for fecal sampling on the day of their removal from the herd using a random number generator (Excel; Microsoft Corp., Redmond, WA, USA). Several lists of random numbers were prepared in multiples of 10 ranging from 11-20 to 91-100 cows and the respective list was selected depending on the number of cows presented for culling on the day of sampling. If a producer had less than 11 cows presented for culling, all the cows were sampled. An individual disposable sleeve was used to manually collect fecal samples from the rectum of each selected cow. The fecal samples were transported to the Aly Lab (Dairy Epi Lab) on wet ice for processing on the same day.
A survey was also completed with the help of the herd manager on the same sampling day. The survey questions were described in an earlier report (Pereira et al., 2019). Briefly, the questions targeted management of the herd in the previous 4 months and collected information on herd size, breed, rolling herd average, cull rate, frequency of culling per month, the proportion of cows sold for beef (compared to as dairy), proportion and reasons for condemnation of culled cows. The survey also collected information on the proportion of culled cows that received injectable medical treatments in the 3 weeks prior to culling, the role of dairy staff allowed to treat cows on the dairy, practices to avoid drug residue violations (use of specific drug types, following withdrawal times, milk and/or urine testing prior to the cow being culling, or other practices), tracking of drug withdrawal intervals, having a drug inventory system in place, and extra-label drug use (ELDU, frequency and familiarity). Finally, a backup of the herd record software was obtained to collect information on the culled cows' milk production and health events. A relational database was used to house and merge data from the surveys, dairy records, and test results (Access; Microsoft Corp., Redmond, WA, USA).

Bacteriological culture
The California Animal Health and Food Safety (CAHFS) laboratory conducted all the study sample testing for Salmonella as described by Adaska et al. (2020). Briefly, 1 g of feces was inoculated into tetrathionate selective enrichment broth and incubated at 37 ± 2 C. The next morning a cotton swab was used to inoculate the overnight broth onto XLD and XLT-4 plates and these were incubated overnight at 37 ± 2 C. H 2 S positive, Salmonella suspect colonies from each set of plates were subcultured onto individual bi-plates (5% sheep blood agar-MacConkey agar) and incubated overnight at 37 ± 2 C. One colony from each bi-plate was used for biochemical testing which included triple sugar iron, urea, motility indole ornithine, citrate, O-nitrophenyl-beta-D-galactopyranoside, and lysine iron agar slants. Serogrouping and serotyping were performed, using the White-Kauffmann-Le Minor scheme (Grimont & Weill, 2007) on colonies with biochemical test results consistent with Salmonella (Quinn et al., 2002).
Fecal samples were also cultured for E. coli and Enterococcus spp. isolation as described previously (Li et al., 2014). Briefly, a 40 mL solution of buffered peptone water containing 5 g of feces in a 50 mL polypropylene tube was homogenized using a mechanical shaker for 15 min before filtering using gauze. A total of 1000, 100 and 10 ml of the filtered solution were then streaked on CHROMAgar ECC (Chromagar, Paris) and Enterococcus Indoxyl-β-D-Glucoside agar (mEI) plates (Becton, Dickinson and Company, Franklin Lakes, NJ, USA) both incubated at 37 C for 24 h. Reference strains ATCC 25922 (E. coli) and ATCC 29212 (Enterococcus faecalis) were plated on agar plates as positive controls. Two pure colonies were isolated of each species after presumptive colonies were confirmed using biochemical tests (E. coli were confirmed using urea, indole, triple sugar iron, Methyl Red-Voges-Proskauer and Simmons citrate; Enterococcus were confirmed using bile esculin, brain heart infusion agar, and growth in broth with and without 6.5% NaCl).

Antimicrobial susceptibility testing
Salmonella and E. coli bacterial resistance was evaluated with a broth microdilution method using a Gram-negative Sensititre plate (CMV2AGNF) and Enterococcus spp. evaluated on gram-positive Sensititre plate (CMV3AGPF) (Sensititre, Thermo Fisher Scientific, MA, USA) according to the manufacturer's instructions and as described in previous studies (Li et al., 2018;Pereira et al., 2019). The minimum inhibitory concentration (MIC) values were the lowest concentrations of AMD that inhibited visible growth of bacteria. Interpretations of antimicrobial resistance were based on breakpoints recommended by the National Antimicrobial Resistance Monitoring System (https://www.cdc.gov/narms/antibiotics-tested.html; and https://www.ars.usda.gov/ ARSUserFiles/60400520/NARMS/ABXEntero.pdf) and the Clinical and Laboratory Standards Institute (CLSI, 2014;CLSI, 2018). Due to the inherent resistance of Salmonella to cephalosporins, aminoglycosides, lincosamides, oxazolidinones and glycopeptides, drugs from these classes were excluded from the antimicrobial resistance analysis. In addition, the following drug classes to which E. coli is inherently resistant were excluded from the analysis: lincosamides, oxazolidinones, penicillins, streptogramins, glycopeptides. Similarly, the drug classes exclude due to the inherent resistance of Enterococcus spp. included cephalosporins, lincosamides, fluoroquinolones, aminoglycosides, aminocyclitols, sulfonamides, folate pathway antagonists. Isolates from any of the three species (Salmonella, E. coli, Enterococcus spp.) were identified as multi-drug resistant if resistance to at least one AMD in each of three or more drug classes was observed (Magiorakos et al., 2012).

Development of classification algorithms
Classification tree models were developed to test if herd management practices and features related to dairy cows can predict multi-drug resistance phenotype in Salmonella and fecal commensal E. coli, and Enterococcus spp. isolated from the same cows. A cow was considered to shed MDR bacteria if any of its Salmonella, Enterococcus spp. and/or E. coli isolates showed resistance for three or more antimicrobial classes (regardless of whether the species with resistance was Salmonella, Enterococcus spp. and/or E. coli), in which case the cow was labeled as 'shedding bacteria with multi-drug resistance (MDR) phenotype' (numerical label = 2). In contrast, if a cow shed bacteria that were resistant to only one or two antimicrobial classes (regardless of species) the bacteria isolated were labeled as 'shedding bacteria with antimicrobial resistance (AMR) phenotype' (numerical label = 1). If the bacteria isolated from a cow did not show any resistance across all three bacterial species to any AMD, the bacteria were labeled as non-resistant (numerical label = 0). The mutually exclusive definitions were necessary to develop a single model that predicts one of three resistance states MDR, AMR, or no resistance. Similarly, cows were also classified based on resistance phenotypes separately observed in each bacterial species isolated (three separate classification labels based on resistance phenotypes of Salmonella, Enterococcus spp. and E. coli isolates). Finally, classification labels were also generated based on resistant phenotypes observed in either commensal bacteria (Enterococcus spp. and E. coli) shed in feces collectively. Using features from herd surveys, classification tree models were trained to predict the MDR phenotype of bacteria shed in the feces of the study cows.
Three machine learning algorithms, specifically decision tree classifier (DTC), random forest (RF) and gradient boosting (GB) were developed to explore the risk factors for resistance phenotypes in the study isolates using herd survey data, specifically to predict the multilabel outcome based on the resistance phenotype of bacteria shed by cows (non-resistant, AMR, and MDR). For each algorithm, data from the entire study cohort described by Pereira et al. (238 cows) were used, except models specific to predicting AMR phenotypes in Salmonella were restricted to the cohort of 58 Salmonella positive cows. Table 1 describes all classification algorithms trained and developed for various bacteria-specific outcomes.
For all three classification algorithms, 25 features related to herd management and 12 features related to individual cows collected from the survey were used as predictive features ( Table 2). The DTC generates an optimum tree based on attributes by recursive selections to split data into classes and it was only used to visualize the optimum tree and as a contrast to the remaining classification tree models (RF and GB) that prevent overfitting, unlike DTC. The RF and GB algorithms both generate a series of recursive trees of binary splits for randomly sampled predictor variables. While all tree classification algorithms handle interaction effects between predictors, within GB, boosting builds and combines collective models improving the predictive performance of many weak models substantially, and fits complex nonlinear relationships (Elith, Leathwick & Hastie, 2008). For the validation of each algorithm, the data was split into training and validation datasets. To identify the best hyperparameters of the classification algorithms (hyper-tuning), a grid search was implemented on the training dataset. Grid search is a tuning technique that computes optimum values of model parameters automatically by an exhaustive search performed on a set of parameter values. Training datasets, composed of 80% of the data, were randomly selected for gird search, maintaining the outcome proportional to the original dataset, with three-fold cross-validation (internal validation). Model parameters tested for hyper-tuning of DTC, RF, and GB are given in Table 1. The best performing model parameters were chosen based on the accuracy of the model. For each algorithm, the external validation of the best performing model was done on the validation dataset (20% remaining random sample of the original dataset) to quantify the performance of the model on the completely independent validation dataset. The DTC and RF models were implemented using the Scikit-learn machine learning package (Pedregosa et al., 2011) and GB was implanted using XGBoost in python (Chen & Guestrin, 2016).
Validated models were eventually fit on the complete dataset to produce the final predictions. The relative influence (importance) of features for the random forest model was estimated using average gini importance, permutation, and feature drop methods. The gini importance for a feature is defined as the sum over the number of splits (across all tress) that include the feature, proportionally to the number of samples it splits (Breiman, 2001). The gradient boosting model with the XGboost platform was evaluated using Shapely Additive Explanations (SHAP) that assigned each predictive feature an additive feature unifying six existing methods (Lundberg & Lee, 2017). Partial dependence of gradient boosted model prediction on model features (expected output response trend as a function of feature) was explored to understand the associations of the herd and cow-related features with predictions (Friedman, 2001). The Python code used for pre-processing the data, training and validating models and generating figures can be found here in the Zenodo repository: DOI 10.5281/zenodo.4387017.

RESULTS
From the six dairy herds, Salmonella isolates were detected in the feces of 58 cows (24.4% ± 2.8 out of 238 cows). Two herds had no Salmonella positive samples throughout the study period while for others the prevalence ranged from 12.5% (±5.2, n = 40) to 70.0% (±7.2, n = 40). The most common reason reported for culling was low milk production (65.1%, ±3.1) followed by poor reproduction (31.1%, ±3.0). Lameness (10.5%, ±1.9) and mastitis (10.1%, ±1.9) were the other reasons reported in the survey. Of the sampled cull cows, 15.5% (±2.3) were reported as having received AMD as part of a treatment protocol for the condition resulting in their culling decision. In contrast, 5.9% (±1.5) received anti-inflammatory drugs for the condition resulting in their culling decision.

Antimicrobial resistance phenotype shared between bacterial isolates
Study found no perfectly shared resistance phenotype between the study isolates. Within the commensal bacteria (E. coli or Enterococcus spp.), resistance to tetracycline was the most prevalent (11 cows), while shared resistance to each of the drugs kanamycin, streptomycin, and chloramphenicol was observed once. However, six individual cows were detected sharing tetracycline resistance phenotypes between Salmonella and one or both commensal bacteria (E. coli or Enterococcus spp.). Of the six cows, five were from a single herd (herd 4) and three shed MDR bacteria (Table 5). However, Salmonella, and one or both commensal species, were found to share resistance to only one drug, tetracycline, in six cows across the study. Of the six cows, only three had bacterial species with MDR phenotypes detected and in two of these cows the MDR profile included tetracycline. The sensitivity of models ranged from 0.47 to 0.74 with precision ranging from 0.46 to 0.66. Details related to hyperparameters, best performing decision tree classifiers, random forest, and XGboost models and the performance of the selected models are given in Table 1.

Decision tree classification models
For all DTC models, the impurity (gini), a measure of the heterogeneity of the outcome in a subset of samples resulting from a split in a decision tree, was reduced in samples the most by the rolling herd average milk production, denoting its highest position in decision trees generated by all five models (Fig. 1, Figs. S2-S5). Other management features that formed nodes showing high measures of split quality (gini) including the proportion of Jersey cows in the herd, administration of tetracyclines, monthly culling percentage, culling frequency, and the number of culled individuals. Of the five DCT models, all but the model for overall resistance across Salmonella and commensal bacterial species were able to generate nodes that classify cows based on their fecal shedding of bacteria to a single class of AMD, or not resistant (Figs. S2-S5) with 100% purity (homogenous subsets). While only the DCT model for commensal bacteria, resulted in pure nodes for MDR cows, none of the other models were able to classify cows into pure MDR nodes (Fig. S2). Figure 2 shows the optimum decision tree for classifying cows into MDR, AMR, or not resistant based on phenotypes of all bacteria (Salmonella, Enterococcus spp., E. coli).

Random forest models
Results of random forest models indicated common herd management practices that influence the shedding of MDR and AMR phenotypes of Salmonella, Enterococcus spp., and E. coli collectively, as well as individually (Fig. 3). The number of culled animals, lower than greater than Figure 2 Optimum decision tree to classify cows shedding multi-drug resistant (MDR), antimicrobial-resistant (AMR), and non-resistant Salmonella, Enterococcus spp., E. coli based on management practices observed in Californian dairy herds. Nodes are represented by stacked histograms depicting distribution samples in the data (AMR, MDR, no resistance), followed by optimum decision point (pointer on the x-axis). Arrows on the left and right indicate lesser and greater than the decision point respectively. Final nodes are represented by pie chart with distribution of samples. Factor acronym definitions are described in Table 2. Full-size  DOI: 10.7717/peerj.11732/ fig-2 monthly culling frequency and percentage, herd size, and proportion of Holstein cows in the herd were found to be influential herd characteristics (top ten features by relative influence) predicting MDR phenotypes in all algorithms. Random forest algorithms for predicting AMR phenotype in Enterococcus spp., commensal bacterial species combined, and in all bacterial species showed the same top ten most influential herd management features. Individual-level features such as culling due to milk production, mastitis, reproductive and other reasons appeared important for predicting resistance phenotypes in Salmonella. The use of the chalk method for withdrawal determination was in the top ten most influential features and for the E. coli and Salmonella models.

Gradient boosting classification models
Herd characteristics that showed higher variable importance in SHAP values for predicting cows shedding MDR resistance bacteria based on all bacterial species were herd size, the proportion of Jersey cows, sampling season, the frequency of extra-label drug use, rolling herd average, and culling related features. These features also showed high marginal contributions in predicting AMR phenotypes. For predicting AMR phenotypes, features such as the number of monthly veterinary treatments, antibiotic dose tracking, and the proportion of Holstein cows also showed higher SHAP values (Fig. 4). Partial dependence of gradient boosting model prediction on important features showed the possible relationships of these herd and cow-related features in predicting the AMR of their fecal bacteria as MDR. Higher culling frequency and monthly culling percentages were associated with cows with MDR phenotypes from all bacteria, whereas an overall increase in the number of culled animals from the herd showed a negative trend  Figure 3 Top ten herd management practices based on variable importance (Gini coefficient) in classifying cows shedding multi-drug resistant (MDR), antimicrobial-resistant (AMR), and non-resistant for Salmonella, Enterococcus spp., E. coli in Cal. Factor acronym definitions described in Table 2.
Full-size  DOI: 10.7717/peerj.11732/ fig-3 with classifying a cow as shedding MDR bacteria. Winter season was negatively associated with MDR phenotype bacteria shed by cows compared to cows sampled in Spring. Similarly, herds with more than 10,432 kgs of rolling average milk production showed a positive trend with MDR positive cows. Cows from herds with a higher proportion of Jersey cows were negatively associated with being classified as shedding MDR bacteria by the gradient boosting algorithm (Fig. 5). Within other herd characteristics that were identified as important, herd size showed a varying trend with classifying cows shedding MDR bacteria, with some herd sizes showing a positive association of classifying cows as shedding MDR bacteria (Fig. 5).
For predicting MDR phenotypes in Salmonella shed by cows, rolling herd average milk production, sampling season, chalk methods for tracking withdrawal, monthly culling Figure 4 Mean SHAP values depicting the impact of herd management practices on predicting multi-drug resistant phenotype in either Salmonella, Enterococcus spp., and E. coli shed in dairy cows for Gradient boosting classification (XGboost) model. Factor acronym definitions described in Table 2.
Full-size  DOI: 10.7717/peerj.11732/ fig-4 frequency, and Salmonella vaccine were found to be important predictive features with high SHAP values (Fig. 6). The exploration into partial dependence of these features gave insights into relationships of feature values with classifying a cow as shedding MDR phenotype Salmonella (Fig. 7). Increasing rolling herd average milk production and monthly culling percentage was positively associated with MDR phenotypic Salmonella in cow feces. Similarly, cows sampled in spring were more likely to be classified as shedding MDR Salmonella. Models predicting phenotypes in commensals E. coli, and Enterococcus spp.; the number of culled animals in the previous year was always in the top ten most important features to classify cows as shedding MDR bacteria (Figs. S5-S8). Rolling herd average milk production features describing culling practices, and herd size, consistently featured as important in classifying cows as shedding MDR phenotypic bacteria for all the other three models (Commensals, E. coli and Enterococcus spp.). The frequency of extra-label drug use was an important feature for models separately predicting MDR in Enterococcus spp. (Fig. S7) and E. coli shed by cows.

DISCUSSION
The current study investigated antimicrobial resistance phenotypes between bacteria shed in dairy cattle using decision tree classification algorithms. A simple decision tree model does tend to find the best fit for the training data, but the splitting process rarely is Figure 5 Partial dependence indicating the association of top six predictive herd management practices in classifying cows as multi-drug resistant phenotype in either Salmonella, Enterococcus spp. and E. coli shed in dairy cows for Gradient boost. Partial dependence plots are generated for values presented in the data resulting in the non-linear x-axis. Blue shaded region and error bars represents standard deviation of partial dependence (n = 238).
Full-size  DOI: 10.7717/peerj.11732/ fig-5 generalizable to other data. A random forest model, which is bagging of decision trees, and a boosting classification model, which is boosting decision trees, tend to perform better on testing data and can help us identify the generalizable conclusion. In this analysis we explored these models step by step from a simple decision tree to generalized boosting trees to find important management-related factors that might affect the distribution of multidrug resistance in dairy cattle herds. A unique aspect of the current study is use of the aforementioned algorithms to distinguish between the resistance profiles (no resistance, antimicrobial resistance and multidrug resistance) of a pathogenic bacteria, Salmonella, and commensal bacteria (E. coli and Enterococcus spp.) isolates from feces of cull dairy cows. Previous investigations were restricted to understanding herd and cow level characteristics with the resistant Figure 6 Mean SHAP values depicting the impact of herd management practices on predicting multi-drug resistant phenotype in Salmonella shed in dairy cows for Gradient boosting classification (XGboost) model. Factor acronym definitions described in Table 2.
Full-size  DOI: 10.7717/peerj.11732/ fig-6 phenotypes in Salmonella shed by cull dairy cows (Pereira et al., 2019). We explored weather considering the resistance phenotypes for these three bacterial species together, and separately, to identify associations between herd management practices and prevalence of resistance phenotypes. Although, none of the isolates had shared phenotype resistance, the six cows that had tetracycline resistance in at least two of the three bacteria studied should be explored further with whole genome sequencing and follow up studies that employ metagenomic analyses on the microbiome.
The three machine learning algorithms tested in this study indicated that the overall distributions of three resistance phenotypes classified in this study as MDR, AMR, or no resistance were mainly governed by resistance in Enterococcus spp., which showed the highest prevalence of MDR and AMR phenotypes compared to Salmonella and E. coli. The latter may be explained by Enterococcus spp. that are known to have frequent MDR phenotypes such as E. faecium. Comparative feature importance plots for all random forest models developed for these bacterial groups indicated the same, where similar features are found to be important for the model predicting MDR in all bacteria together, in commensals together, and in Enterococcus spp..
Herd size has been already identified as associated with higher odds of detecting Salmonella resistant to tetracycline (Pereira et al., 2019). In the current study, we showed that herd size was also positively associated with detecting MDR phenotypes in Salmonella as well as the commensal bacteria E. coli and Enterococcus spp. Salmonellosis is known to be associated with poor milk production and reproduction and hence increased risk for diseases, such as mastitis and infertility, and AMD treatments which may explain the 12% MDR in Salmonella isolates from the study cows (Lanzas et al., 2010). Similarly, Salmonella has been associated with clinical disease in both adult and young dairy cattle and beef cows (Divers & Peek, 2007;Pender, 2003;Roy et al., 2001;Smith, 2014). Percent cull and rate would reflect the increase in culling due to diseases better than actual numbers culled. This is evident from the importance of all three in the random forest model for Salmonella MDR phenotypes. In contrast, the total number culled was negatively associated with shedding multidrug resistant bacteria which may reflect producer decisions to prioritize the culling of otherwise healthy but low-producing cows based on milk or beef market value (with respect to dairy beef), amongst other factors.
Other than market dynamics including milk or beef demand, underlying herd health or management reasons may explain the opposing trends of both percent and number culled, and the outcome MDR in cull cow fecal bacteria. Herds with MDR in their cull cow fecal bacteria hence need to be explored further to identify mechanisms that eventually increase culling rates. It is worthy to note that the random forest algorithm identified diseases of relative importance for MDR in Salmonella but not commensals. However, caution should be exercised with interpreting findings from this specific analysis due to the inability to ascertain that such diseases preceded the Salmonella shedding and specifically MDR status. The combined random forest model for all 3 species however did not show diseases as correlated with MDR, which may be due to the overall effect of the commensals in the dataset. SHAP values ranked variables by importance for classifying resistance type (either MDR, AMR, or no resistance). However, in the case of rolling herd average milk production (RHA), SHAP values were allocated only for predicting MDR or AMR in models for Salmonella and Enterococcus and commensals, showing zero importance in predicting absence of any resistance. The RHA is hence more important in terms of identifying any resistance type (MDR or AMR) versus no resistance. All three algorithms (DCT, RF and XGboost) indicated a high importance of RHA in predicting MDR in fecal bacteria of cull cows. RHA is a dairy performance indicator affected by multiple herd characteristics such as age and breed structure. Studies have indicated stress related to higher production which may increase the chance of certain health conditions subsequently increasing the risk of antimicrobial drug use and hence bacterial resistance (Robbins et al., 2016). In addition, the association between RHA and MDR presented by all the models here is conditioned upon other features that follow the splits further down the classification tree. In contrast, for E. coli, RHA was important in identifying all resistance classifications (AMR, MDR, no resistance).
XGBoost results show that season was correlated with MDR in all 3 species. Specifically, Spring and Fall had a greater correlation with MDR compared to Summer and Winter, with Winter being the least correlated. Similar findings have been observed with a risk of disease in calves in Spring and Fall with bovine respiratory disease (Cummings et al., 2019;Dubrovsky et al., 2019;Maier et al., 2019). The current study's bacterial species, specifically Salmonella, E. coli and Enterococcus spp., shared no specific MDR profiles; however, shared tetracycline resistance was detected. Further studies employing whole genome sequencing and metagenomics on the microbiome may explore factors that explain such shared resistance.

CONCLUSIONS
The current study characterized dairy cattle herd management practices that were associated with fecal shedding of multi-drug resistant bacteria. We identified generalizable and distant associations between pathogenic Salmonella and commensal bacteria. The data-driven suite of machine learning algorithms used here can help develop data-informed tools for better decision making, and risk assessment related to antibacterial resistant shedding by cows.