Modelling the impact of antimicrobial use and external introductions on commensal E. coli colistin resistance in small‐scale chicken farms of the Mekong delta of Vietnam

Abstract Colistin is a critically important antimicrobial for human medicine, and colistin‐resistant Escherichia coli are commonly found in poultry and poultry products in Southeast Asia. Here, we aim at disentangling the within‐farm and outside‐farm drivers of colistin resistance in small‐scale chicken farms of the Mekong delta of Vietnam. Nineteen Vietnamese chicken farms were followed up along a whole production cycle, during which weekly antimicrobial use data were recorded. At the beginning, middle and end of each production cycle, commensal E. coli samples from birds were collected, pooled and tested for colistin resistance. Twelve models were fitted to the data using an expectation–maximization algorithm and compared. We further tested the spatial clustering of the occurrence of resistance importations from external sources using the local Moran's I statistic. In the best model, colistin resistance in E. coli from chickens was found to be mostly affected by importations of resistance, and, to a lesser extent, by the use of antimicrobials in the last 1.73 weeks [0.00; 2.90], but not by the use of antimicrobials in day‐olds, nor their colistin resistance carriage from hatchery. The occurrence of external source importations proved to be sometimes spatially clustered, suggesting a role of local environmental sources of colistin resistance.


Figure S2. Distribution of resistance metric values for each colistin concentration. For each 19
concentration, the resistance metric is the area between the growth curve under this concentration of 20 colistin, and the growth curve in absence of colistin (reference). Only the distributions for 1, 2 and 4 21 mg/L are displayed, as they are the only concentrations presenting similar amounts of high-growth and 22 low-growth curves (see Figure S1). The y position of dots is random and just for displaying purposes. 23 For each concentration, we use a two-component Gaussian Mixture Model (GMM)

64
The algorithm is depicted in Figure S4. For a given model, Y is the observed variables, and Ym 65 their value for sample m=(i,j), including Rm and the explanatory variables detailed in section 66 2.3 (main document). Z is a latent binary variable such that Zm = 1 if an importation of 67 resistance from external sources occurred for sample m, and Zm = 0 otherwise. θ is the set of 68 parameters for the model. 69 At any step s of the E-M algorithm, the complete-data LogLikelihood, named Q function, is 70 defined as (Dempster et al., 1977): 71 72 where E[▪ |Y,θ (s) ] is the conditional expectation with respect to P(Z |Y,θ (s) ), i.e.: 73

74
The E-M algorithm is composed of two steps repeated in loop until convergence ( Figure S4).

75
The algorithm is started with P(Zm = 1 |Ym, θ (0) ) = 0.01, but the exact value chosen here does 76 not affect the convergence (tests were done with values ranging from 0.01 to 0.99). Then, the 77 following steps are repeated until convergence: 78 • Maximization step: because the value of P(Zm = 1 |Ym, θ (s) ) is known from the previous 79 E-step (or from initialization if s = 0), it is possible to find the set of parameters θ (s+1) 80 maximizing the complete-data LogLikelihood Q(θ,θ (s) ): 81 82 • Expectation step: The probability of resistance importation from external source, i.e. 83 P(Zm = 1 |Ym, θ (s+1) ) (the distribution of the missing data), is computed for each sample 84 using the set of parameters θ (s+1) estimated from the previous Maximization step: 85 where η is the probability of resistance when an external source importation occurs (see Table  87 1, main document), Mm,s+1 the probability of resistance predicted by the logistic function (with 88 parameters θ (s+1) ) for sample m when no external source importation occurs (see section 2.3, 89 main document), and Rm is the resistance observed (0 or 1) for sample m. 90 And: 91

92
Steps E and M are repeated until the complete-data LogLikelihood Q converges. The 93 convergence is defined as having five times in a row the difference between the complete-data 94 LogLikelihood values of steps s+1 and s less than 0.01. The M-step and E-step are alternated until convergence of the complete-data LogLikelihood. 103

SM4. Model selection in a missing data situation 104
Once the convergence of the E-M algorithm has been reached for all models, we aim at 105 comparing the models. The Akaike Information Criterion (AIC) is based on the observed data 106 likelihood. Here, as there is a latent variable Z, we maximize the complete-data (and not 107 observed data) LogLikelihood at each Maximization step. Instead, we use the ICH,Q criterion 108 defined in (Ibrahim et al., 2008). 109 First, it has to be noted that, for any step s of the algorithm: where θ* is the converged set of parameters. 116 (Ibrahim et al., 2008) define the ICH,Q as: 117 118 where C*(θ*) is the model penalty term that we set to two times the number of parameters. 119 Therefore, we retrieve the AIC. 120 Q(θ*,θ*) is directly available from the Maximization step at convergence. In section 2.6 (main 121 document), we defined for each sample m: P inf m = P(Zm=1|Ym, θ*), then: 122 123 124 125

SM5. Method validation 126
To validate the method, we simulate mock resistance data according to four scenarios, and 127 apply the E-M algorithm and model selection process to each of these scenarios. The method 128 is validated if, for each scenario, the best model retrieves the simulated relationships between 129 explanatory variables and the outcome. 130  Scenario 1: 131 In the first scenario, we simulate random resistance data, representing a situation with no 132 relationship between the explanatory variables and the resistance. In this scenario, for each 133 After application of the E-M algorithm and comparison of models, the best model selected 135 (with the lowest ICH,Q) is Model 1, i.e. the null model, as expected ( Figure S5). 136

 Scenario 2: 137
In the second scenario, we simulate resistance data such that, using model 7, recent AMU 138 predicts the resistance with a 85% positive predictive value and a 85% negative predictive 139 value. In details, let: When Rij is simulated as such, and our method is applied on the mock data, the best model 144 selected in this scenario is Model 7 ( Figure S5). Therefore, the simulated relationship between 145 the explanatory variables and the outcome is retrieved with our method. 146

 Scenario 3: 147
In the third scenario, we simulate resistance data such that, using model 4, initial AMU 148 predicts the resistance in later samples of the same flock, with a 85% positive predictive value 149 and a 85% negative predictive value. In details, let: When Rij is simulated as such, and our method is applied on the mock data, the best model 154 selected in this scenario is Model 4 ( Figure S5). Therefore, the simulated relationship between 155 the explanatory variables and the outcome is retrieved with our method. 156

 Scenario 4: 157
In the fourth scenario, we simulate resistance data such that, using model 2, the carriage of 158 colistin resistance in day-old chicks (arriving from hatchery) predicts the resistance with a 85% 159 positive predictive value and a 85% negative predictive value. In details, let: When Rij is simulated as such, and our method is applied on the mock data, the best model 164 selected in this scenario is Model 2 ( Figure S5). Therefore, the simulated relationship between 165 the explanatory variables and the outcome is retrieved with our method. 166 167 Figure S5. Ranking

189
Using c = 2 mg/L (resp. c = 4 mg/L), 10 (resp. 9) samples are classified as susceptible whereas 190 they are classified as resistant in the baseline analysis with c = 1 mg/L, including 3 (resp. 2) 191 belonging to the first round of sampling, 4 (resp. 4) to the second and 3 (resp. 3) to the third. 192 Therefore, for a given value of η, the results of the E-M algorithm are the same with c = 2 mg/L 193 and c = 4 mg/L in models 1, 4, 7 and 10 only, i.e. the models that do not include the variable 194 Ri1 (see Table 2, main document). As a consequence, in Model 7, for any given value of η, the 195 parameter estimates and the local spatial clusters detected are the same with c = 2 mg/L and 196 c = 4 mg/L. 197 The results of the model selection for the five tested values of (c; η) are presented in Table S2. 198 Model 7 shows the lowest ICH,Q value in all cases and is therefore selected as the best model. 199 Table S3 presents the points estimates of Model 7 parameters for all tested values of (c; η). 200 Finally, Figure S6 shows the local spatial clustering of the probability of external source 201 importation of resistance, to compare with Figure 3C of the main document.   coordinates. Compared to the baseline analysis (Figure 3, main document), we always detect a high-high 232 cluster of the probability of importation, in the same geographical area. However, low-low spatial 233 clustering is not always retrieved as in the baseline analysis. 234