The reliability of observational approaches for detecting interspecific parasite interactions: comparison with experimental results

Abstract
Interactions among coinfecting parasites have the potential to alter host susceptibility to infection, the progression of disease and the efficacy of disease control measures. It is therefore essential to be able to accurately infer the occurrence and direction of such interactions from parasitological data. Due to logistical constraints, perturbation experiments are rarely undertaken to directly detect interactions; a variety of approaches are therefore commonly used to infer them from patterns of parasite association in observational data. However, the reliability of these various approaches is not known. We assess the ability of a range of standard analytical approaches to detect known interactions between infections of nematodes and intestinal coccidia (Eimeria) in natural small-mammal populations, as revealed by experimental perturbations. We show that correlation-based approaches are highly unreliable, often predicting strong and highly significant associations between nematodes and Eimeria in the opposite direction to the underlying interaction. The most reliable methods involved longitudinal analyses, in which the nematode infection status of individuals at one month is related to the infection status by Eimeria the next month. Even then, however, we suggest these approaches are only viable for certain types of infections and datasets. Overall we suggest that, in the absence of experimental approaches, careful consideration be given to the choice of statistical approach when attempting to infer interspecific interactions from observational data.


In both studies the parasite communities were perturbed by treating a subset of animals with the anthelmintic drug ivermectin to reduce their gastrointestinal nematodes (>90% reduction in nematode prevalence over a period of 4 weeks in Peromyscus, and 71% reduction in prevalence within 3 weeks of treatment in Apodemus). Only a subset of animals was treated on the treatment grids in both studies; the remaining animals were left as untreated controls. In addition, in both studies there were untreated control grids (two grids in the Peromyscus study and two grids in the Apodemus study) on which no animals received treatment; there was no evidence that infections in untreated mice on treatment grids differed from those of animals from the untreated control grids. Comparing non-target parasite infections at subsequent captures between treated and untreated mice showed, in both studies, that coccidial parasites from the genus Eimeria (a genus of directly-transmitted protozoa that inhabit the gastrointestinal tract of small mammals) increased following anti-nematode treatment. Specifically, in Peromyscus, Eimeria showed a 20% increase in prevalence post-treatment (Fig. 1A) whereas in Apodemus, Eimeria increased 15-fold in intensity post-treatment (Fig. 1B). These classic perturbation experiments therefore provide clear evidence of negative interactions between nematodes and Eimeria in two separate host-parasite systems. Given this, we can then ask whether any of the standard approaches used for inferring interspecific parasite interactions from observational data would suggest the presence of these interactions.

Analytical approaches to infer interspecific interactions from observational data
Data were analysed from the untreated mice in the Peromyscus and Apodemus studies using five standard approaches (plus variants) that cover the broad range of techniques typically used in such analyses. These examined either qualitative (presence/absence) or quantitative measures of infection; the quantitative measures were parasite intensity (excluding uninfected individuals) or abundance (including uninfected individuals), both based on eggs or oocysts per gram of faeces (EPG). Note that we were restricted to estimating abundance and intensity data indirectly using EPG, since we used non-destructive sampling to allow longitudinal analyses of each individual's infection status. For each approach we considered whether we would reasonably infer the negative interactions between nematodes and Eimeria revealed by our experimental results. In all cases model assumptions for the analyses (e.g., normality of residuals, homoscedasticity etc) were checked and found to be upheld. Analyses [...]

Analysis of residuals controlling for potential confounding factors
Confounding variables (e.g., age, sex, sampling location) may create spurious associations between parasites. One way to control for these effects has been to conduct two ANOVA (or equivalent) analyses, one with each parasite species as the response variable, on parasite intensity data with potential confounders (see Table 1 for lists of covariates for each study) as explanatory variables (e.g., Behnke et al., 2005). A significant correlation between the residuals from each analysis is then used as evidence of an interspecific interaction independent of the confounding factors. It should be noted that using residuals in this way can result in biased parameter estimates and has been discouraged (Freckleton, 2002); an alternative approach is to fit a Generalised Linear Model (GLM) which directly controls for covariates in the analysis (e.g., Analysis 4, below). This method now tends not to be used, but has been used previously, and was included here for completeness. This analysis was run on nematode and Eimeria intensity data (EPG counts among coinfected hosts).
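The residual-based procedure described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: it assumes a single continuous confounder (a hypothetical age score) in place of the full ANOVA covariate set, and the data values and the function names `ols_residuals` and `pearson_r` are invented for illustration only.

```python
import math

def ols_residuals(y, x):
    """Residuals from a least-squares regression of y on a single covariate x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical coinfected hosts: an age score and log-transformed EPG counts.
age = [1, 2, 3, 4, 5, 6]
log_nematode_epg = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
log_eimeria_epg = [2.0, 2.9, 4.1, 5.2, 5.8, 7.1]

# Raw correlation, ignoring the confounder.
raw_r = pearson_r(log_nematode_epg, log_eimeria_epg)

# Correlation between the residuals once age has been regressed out of each parasite.
nematode_resid = ols_residuals(log_nematode_epg, age)
eimeria_resid = ols_residuals(log_eimeria_epg, age)
resid_r = pearson_r(nematode_resid, eimeria_resid)
```

In this invented example both parasites increase with age, so the raw correlation is strongly positive while the residual correlation is much weaker; this shows only the mechanics of the method and says nothing about the study's data.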

T-test comparison (cross-sectional)
Here, an interaction is inferred from a significant difference in infection levels of one parasite between hosts infected and not infected by the other. Unpaired two-sample Student's t-tests on logged Eimeria EPG from Eimeria-infected hosts (i.e., using Eimeria intensity data) were used to compare nematode-infected with nematode-uninfected hosts. Based on our experimental results, we would expect nematode-infected hosts to have significantly lower Eimeria EPG counts than hosts without nematode infections. Again, this analysis does not account for potential confounding covariates, and an alternative approach using GLMs to control for covariates is conducted later (Analysis 4). However, as with the correlation approach, it has been used previously (Chappell, 1969; Hendrickson and Curtis, 2002), and was included here for completeness.

[...] When the response variable (Eimeria) was qualitative a binomial GLM was used, and when quantitative (log(Eimeria EPG), restricted to Eimeria-infected individuals) a Gaussian GLM was used. These models were simplified by backwards stepwise deletion (using the function 'step' in R), until a minimal model was reached. For these results to match our experimental results, we would expect a significant negative relationship between nematodes and Eimeria.

Note that a further version of this analysis was explored, controlling for potential non-linear effects of host age, using body length and body mass as proxies (Fenton et al., 2010; Supplementary Data S1), but we found it did not significantly change model log-likelihood. We therefore only present the results from the standard GLMs.
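The unpaired t-test comparison described in this section can be sketched in Python as follows. This is an illustrative reconstruction rather than the code used in the study: the EPG values are hypothetical, the natural logarithm is an assumption (the base of the log transformation is not stated), and `student_t` is a name introduced here.

```python
import math

def student_t(group_a, group_b):
    """Unpaired two-sample Student's t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom); t is negative when group_a has the lower mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Hypothetical Eimeria EPG counts from Eimeria-infected hosts only (intensity data).
eimeria_epg_nematode_infected = [120, 300, 80, 150]
eimeria_epg_nematode_uninfected = [900, 400, 1500, 700, 650]

t_stat, df = student_t(
    [math.log(v) for v in eimeria_epg_nematode_infected],
    [math.log(v) for v in eimeria_epg_nematode_uninfected],
)
```

Under the experimentally revealed negative interaction, the expectation is a negative `t_stat` (lower logged Eimeria EPG in nematode-infected hosts); a P-value would then be obtained from the t distribution with `df` degrees of freedom.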

Longitudinal GLM
All analyses considered so far have been cross-sectional, examining the contemporary associations between Eimeria and nematodes. For the Apodemus study an additional, longitudinal analysis was carried out, using the same four baseline models as the cross-sectional GLMs above, but here the 'Nematodes' explanatory variable referred to infection status the previous month. Once again, model simplification was used to reach a minimal model, and we sought evidence that nematode infection one month reduced Eimeria infection the next.
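The data preparation implied by the longitudinal analysis, pairing each animal's nematode status in one month with its Eimeria status in the next, can be sketched as below. The record layout (animal identifier, integer month index, presence/absence flags) and the function name `lagged_pairs` are assumptions made for illustration; the study's actual data structure is not described here.

```python
def lagged_pairs(captures):
    """Pair nematode status at month t with Eimeria status at month t + 1.

    captures: list of (animal_id, month_index, nematode_present, eimeria_present).
    Returns a list of (animal_id, month_of_response, nematode_previous_month,
    eimeria_this_month), using only captures of the same animal in consecutive months.
    """
    by_key = {(animal, month): (nem, eim) for animal, month, nem, eim in captures}
    pairs = []
    for (animal, month), (nem_prev, _) in sorted(by_key.items()):
        following = by_key.get((animal, month + 1))
        if following is not None:
            pairs.append((animal, month + 1, nem_prev, following[1]))
    return pairs

# Hypothetical capture histories for two animals.
captures = [
    ("A", 1, True, False),
    ("A", 2, False, True),
    ("A", 4, True, True),   # month 3 missing, so no pair ends at month 4
    ("B", 1, False, True),
    ("B", 2, False, True),
    ("B", 3, True, False),
]

pairs = lagged_pairs(captures)
```

Each resulting row supplies the lagged 'Nematodes' predictor and the Eimeria response for one animal-month; animals lacking captures in consecutive months contribute nothing, which is one reason such analyses demand well-resolved recapture data.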

Controlling for pseudoreplication arising from multiple captures of individuals
The full data included multiple captures of some individuals, which are not independent from each other (i.e., pseudoreplication at the level of the individual). In Supplementary Data S1 we describe a range of approaches we explored to control for this pseudoreplication. However, in all cases the results (the terms remaining in the minimal models, and effect sizes of those terms) were very similar for all three methods (Supplementary Data S1; Supplementary Fig. S1), presumably due to the relatively low numbers of recaptures in the data. We therefore concentrate on the results from the full datasets here.
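One way of removing pseudoreplication, repeatedly subsampling a single capture per individual and recording how often the association is significant (the Sig% reported in Table 2), can be sketched as follows. This is a hedged illustration only: the record layout, the function names and the caller-supplied `p_value_fn` are hypothetical, and the exact procedures used are those described in Supplementary Data S1.

```python
import random

def one_capture_per_individual(captures, rng):
    """Randomly keep a single capture record for each animal.

    captures: list of (animal_id, record) tuples; rng: a random.Random instance.
    Returns a dict mapping animal_id to the retained record.
    """
    by_animal = {}
    for animal, record in captures:
        by_animal.setdefault(animal, []).append(record)
    return {animal: rng.choice(records) for animal, records in by_animal.items()}

def percent_significant(captures, p_value_fn, runs=100, alpha=0.05, seed=1):
    """Percentage of subsampled runs in which p_value_fn returns P < alpha.

    p_value_fn is any caller-supplied function that takes a list of records
    (one per animal) and returns the P-value of the chosen test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        sample = one_capture_per_individual(captures, rng)
        if p_value_fn(list(sample.values())) < alpha:
            hits += 1
    return 100.0 * hits / runs
```

Any of the tests considered here (correlation, t-test or GLM) could be wrapped as `p_value_fn`; fixing the seed makes the subsampling reproducible.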

Assessing the reliability of each analytical technique
The reliability at inferring the experimentally-revealed negative interaction between nematodes and Eimeria was assessed for each of the above approaches. Because our observational data may not be optimal for inferring interactions using a given technique (e.g., the sample size may be too small, or of insufficient temporal resolution), a broad approach was taken to assess reliability. First, we used a simple qualitative assessment, asking whether each technique predicted the correct direction (negative) of association between nematodes and Eimeria. We then used a quantitative assessment of the statistical significance (P < 0.05) of association between nematodes and Eimeria. Finally, we sought a quantitative measure of the magnitude of reported effects of nematodes on Eimeria. Since the various approaches return different statistical metrics, these metrics were converted to a common, standardised effect size, Hedges' g (Borenstein, 1994; Nakagawa and Cuthill, 2007). This allows effect sizes from different tests to be presented on the same scale, aiding comparison with the experimental results; a negative value of Hedges' g in these analyses implies a negative association between nematodes and Eimeria, matching the experimental results.

Note that, to maximise sample size for these analyses, data were used from a wider range of years and study grids than were used in the experiments (Table 1). To check whether this explains any discrepancies between these analyses and the experimental results, we re-ran our analyses of the Apodemus data restricted to the same year as the experiment was conducted, and found the results were little affected (Supplementary Data S1; Supplementary Fig. S2). Finally, we emphasise that our results are only directly applicable to the host-parasite systems examined and the quality and resolution of data available to us (we return to this point in the Discussion). However, we suggest that many of our conclusions are likely to be applicable to many other empirical systems where similarly structured data are used to infer the existence of interspecific parasite interactions.
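For the simplest case of a two-group comparison, the standardised effect size can be computed as sketched below. This uses the commonly cited form of Hedges' g (Cohen's d multiplied by the small-sample correction J = 1 - 3/(4 df - 1)); the conversions from other statistics such as correlation coefficients or odds ratios follow the cited references and are not reproduced. The function name `hedges_g` and the example values are illustrative assumptions.

```python
import math

def hedges_g(group_a, group_b):
    """Hedges' g for two independent groups (group_a minus group_b).

    Computes Cohen's d from the pooled standard deviation, then applies the
    usual approximate small-sample correction J = 1 - 3 / (4 * df - 1).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = n_a + n_b - 2
    pooled_sd = math.sqrt((ss_a + ss_b) / df)
    cohens_d = (mean_a - mean_b) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d * correction

# Hypothetical logged Eimeria EPG: nematode-infected hosts first, uninfected second.
g = hedges_g([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With nematode-infected hosts as the first group, a negative `g` corresponds to the negative association that matches the experimental results.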

Results
Here we summarise the reliability of each technique in comparison to our experimental results, leaving more detailed descriptions of each analysis in Supplementary Data S1.

Overall, there was considerable variation between the different approaches in the predicted association between nematodes and Eimeria (Fig. 2; Table 2), with relatively few tests matching the experimental results by returning negative associations (5/17 tests for Apodemus (two significant at P < 0.05), and 0/13 tests for Peromyscus; Fig. 2). Indeed, the majority of tests returned positive associations between nematodes and Eimeria (12/17 for Apodemus and 13/13 for Peromyscus), the opposite direction of that seen with experimental manipulation.

The least reliable techniques were the correlation-based ones, where all variations reported positive associations between nematodes and Eimeria (Fig. 2, Table 2). Furthermore, even the cross-sectional GLMs, which controlled as much as possible for potential confounders, fared poorly; one variant (where both nematode and Eimeria infections were analysed as EPG) resulted in the strongest positive effect size out of all tests for both Apodemus and Peromyscus (Fig. 2), and the other cross-sectional GLMs returned effect sizes around zero. This suggests that adding covariates into the analysis does not necessarily improve model accuracy. For example, the cross-sectional 'Eim(EPG)~Nematode(PA)' GLM is closely related to the t-test analysis (both have Eimeria EPG as the response variable and nematode presence/absence as the predictor), except that the GLM controls for covariates, whereas the t-test does not. However there was no evidence that the GLM performed any better (predicted effect sizes were not stronger, and confidence intervals were not narrower) than the t-test. Similarly, the cross-sectional 'Eim(EPG)~Nematode(EPG)' GLM is related to the standard correlations in the nature of the response and predictor variables, but there was no evidence that the GLM, which controls for covariates, performed any better than the correlation approach (Fig. 2).

Overall the most reliable methods tended to be longitudinally-based, which examined the association between nematode infections one month and Eimeria infections the following month; three out of four of these analyses predicted a negative association between nematodes and Eimeria and two were statistically significant (Fig. 2). However, the form of analysis that most closely matches the experimental result for Apodemus, in terms of the nature of the response and predictor variables (Eimeria EPG and nematode presence/absence), although predicting a negative effect size, had a wide confidence interval and was not statistically significant (Fig. 2, Table 2).

Discussion
Few of the observation-based statistical approaches tested were successful at inferring the experimentally-revealed negative interaction between nematodes and Eimeria. In particular, most cross-sectional approaches, especially the correlation-based ones, performed extremely poorly, often returning highly significant but strongly positive associations between the parasites. This was most apparent for the Peromyscus dataset (Fig. 2), which had a lower sample size than the Apodemus dataset. These results match, and extend, our previous theoretical analyses which showed that correlation-based approaches can perform very poorly when attempting to detect negative interactions between parasites (Fenton et al., 2010). While we may expect detection of genuine interactions in real-world data to be difficult, it is highly concerning that such frequently-used techniques can lead to the inference of significant associations in the opposite direction to the genuine interaction. Therefore, we strongly advise against using correlation-based approaches, even those that attempt to control for confounding factors, when seeking interspecific interactions from ecological data (parasitological or otherwise).

The most reliable approaches examined were longitudinally-based, which sought associations between nematodes one month and Eimeria the next. Intuitively this makes sense, as it reflects the cause and effect of the underlying interaction; Eimeria levels will decline following nematode infection. As such, we advocate the use of longitudinal methods where possible in the inference of interspecific interactions.
However, this should be tempered with the recognition that, [...] data are likely to be beyond the scope of many studies of natural parasite communities (including those of humans), therefore great caution should be taken either when applying longitudinal approaches to more restricted datasets, or when having to resort to less desirable cross-sectional approaches.

Why then did so many of the other approaches perform so badly? There are several, not mutually exclusive, explanations. One possibility is that the methods may be reliable given the right data, but the datasets used here are lacking in terms of sample size, resolution (frequency of sampling) or type of data available (egg count data, which are an indirect and potentially unreliable proxy for parasite abundance). For these reasons we used liberal assessments of reliability, tending to base our conclusions on directions of effects, rather than strict statistical significance. Hence, if a given approach is reasonable, but our sample size was inadequate, we may expect a predicted effect in the correct direction even if it was not statistically significant. However, our results do not suggest a mere lack of statistical power, as the associations we found were not necessarily small or insignificant but were, in the majority of cases, in the opposite direction to the experimentally-observed interactions. It is certainly true that two of the most prominent observational studies reporting strong associations among coinfecting parasites (Lello et al., 2004; Telfer et al., 2010) had particularly large sample sizes. However, the sample sizes of our datasets were not particularly different from those used in many observational studies of parasite communities, and so it seems reasonable to suggest that if our datasets were inadequate for these statistical methods then the same may apply to other studies. An alternative explanation is that observational studies do not adequately control for confounding factors that either obscure genuine interactions, or generate spurious associations. In our GLM analyses we attempted to control for such effects as much as possible (e.g., host age, sex, sampling location, time-point etc) but these analyses did not necessarily perform better than the equivalent analyses that did not control for covariates (e.g., correlations or t-tests). Clearly there may be other important factors that we did not account for (e.g., exposure, host genetic resistance/susceptibility, local spatial heterogeneity etc), but the factors we controlled for are consistent with those used in many other studies. Thus these approaches may not be expected to be any more reliable for other, similar studies. Finally, it is possible that interactions among parasites are non-linear, meaning that linear statistical models, as are commonly used for inferring the existence of parasite interactions (and as were used here) are not adequate to detect the true relationships between parasites. In particular, if interactions are strongest at low infection intensities (i.e., between nematode-free hosts and those with light nematode infections) then the typically higher burdens seen in untreated individuals may reduce the ability of observational approaches to detect those interactions. To assess this possibility we re-ran our cross-sectional and longitudinal GLM analyses with a quadratic term for nematode EPG as the predictor variable (for both Eimeria EPG and presence/absence as the response variables). However in no cases did the quadratic term stay in the final model, suggesting there was no detectable non-linearity in the nematode-Eimeria interaction that could have caused the differences between our observational and experimental results.
An alternative explanation for the mismatch between experimental and observational analyses is that the experimental results are incorrect, while the observational results reflect the true interaction. For example, the administered drug (ivermectin, a broad-spectrum nematocidal drug) may directly affect Eimeria, generating the apparent interactions we saw experimentally. However, this seems highly unlikely, as ivermectin is one of the most widely used anthelmintic drugs for both medical and veterinary usage and has been tested multiple times and in a diverse array of systems, yet we have not found any reported direct effects on coccidia. Furthermore, if ivermectin did directly affect Eimeria it would have to have a positive effect (increase Eimeria infection status) to create the post-treatment effects that we found in both datasets. Ivermectin targets the nervous system, and it is hard to envisage how that mode of action would directly lead to an increase in the abundance of Eimeria. Alternatively, ivermectin may be affecting a nematode that is not detected in the observational samples (i.e., that does not pass eggs in the host's faeces), but that is affecting [...]

Indicates the year and/or grids from which the experimental data were taken.

Table 2. Summary of results in this study. P-values refer to associations between nematodes and Eimeria. 'All data' includes multiple captures per individual. Bootstrapped results are from 100 random subsamples, each including one capture per individual, presenting the percentage of runs (Sig%) returning a significant association between nematodes and Eimeria. Cell shadings show analyses that may imply an association between nematodes and Eimeria (based on P < ~0.05 for all data, or >50% of runs being significant for bootstrapped data); blue, positive association; yellow, negative association.
Eim(EPG) ~ Nem(PA): P = 0.518; Sig% = 0%
Eim(EPG) ~ Nem(EPG): P = 0.510; Sig% = 1% (all +ve)

a Pearson's correlations. r is the correlation coefficient. PA, the parasite is coded as present/absent (categorical variable); EPG, eggs per gram (continuous variable); Eim, Eimeria (the response variable); and Nem, nematodes (the predictor).
b P, value at which 'nematodes' drops out or is retained in the final model.
c OR, Odds Ratio (and 95% Confidence Intervals) for the effect of nematodes on Eimeria from models where 'nematodes' is retained. +ve, positive; -ve, negative.