Campbell Systematic Reviews

Vivian A. Welch | Elizabeth Ghogomu | Alomgir Hossain | Alison Riddle | Michelle Gaffey | Paul Arora | Omar Dewidar | Rehana Salaam | Simon Cousens | Robert Black | T. Déirdre Hollingsworth | Sue Horton | Peter Tugwell | Donald Bundy | Mary Christine Castro | Alison Eliott | Henrik Friis | Huong T. Le | Chengfang Liu | Emily K. Rousham | Fabian Rohner | Charles King | Erliyani Sartono | Taniawati Supali | Peter Steinmann | Emily Webb | Franck Wieringa | Pattanee Winnichagoon | Maria Yazdanbakhsh | Zulfiqar A. Bhutta* | George Wells*


| PLAIN LANGUAGE SUMMARY
Mass deworming programmes have little effect on nutritional status and cognitive development on a population level

| The Campbell review in brief
The effectiveness and cost-effectiveness of mass deworming of children to improve child health and other outcomes is debated. This independent analysis reinforces the case against mass deworming at a population-level, finding little effect on nutritional status or cognition. However, children with heavier intensity infections may benefit more.

| What is this review about?
Soil-transmitted helminthiasis (STH) and schistosomiasis affect over 800 million people. There is ongoing debate about whether mass deworming of children improves child nutritional status and cognitive development in endemic areas.

| What studies are included?
Randomised trials of mass deworming for STH (alone or in combination with other drugs or child health interventions) for children aged 6 months to 16 years were eligible if they reported at least one of the following outcomes: growth, haemoglobin, serum ferritin, or cognitive processing or development. Trials had to collect data on baseline STH infection intensity, since the main purpose of this review was to assess effect modification across intensity of infection.
Individual participant data (IPD) was obtained from 19 out of 41 eligible randomised trials. These 19 trials included 31,945 participants and had an overall low risk of bias.
A secondary analysis added new data to the meta-analysis of STH deworming versus placebo of a previous Campbell review by the same authors. This analysis included 29 randomised trials, with data from two studies which had not published weight gain data and updated effect estimates from three studies based on the data provided by authors.
These studies were conducted in 11 low and middle income countries. Most programmes conducted deworming every 4 months or more frequently. Seven out of 19 studies gave a single dose of deworming. Children were school-age, with a median of 11 years of age.

| Does deworming improve child health and other welfare outcomes?
Mass deworming for STHs compared to placebo probably has little to no effect on nutritional status or cognitive development (moderate certainty evidence). Children with moderate to heavy intensity infections of Ascaris lumbricoides or Trichuris trichiura may experience greater weight gain (very low certainty evidence). No other differences in effects were found across age, sex or baseline nutritional status.
Findings are consistent for studies at low risk of bias and for other methodological considerations such as completer analyses.
There was no trend in effect according to publication year, baseline A. lumbricoides prevalence or T. trichiura prevalence in the full dataset of 29 studies. Higher baseline hookworm prevalence was weakly associated with greater effects of STH deworming.

| What are the implications of this review for policy makers and decision makers?
This analysis replicates the prior findings of small effects of mass deworming at the population level. In areas where there are children with moderate to heavy intensity infections, which are increasingly uncommon, mass deworming may be beneficial, but this analysis was limited by the small number of children with heavy intensity infections in this sample (<1,000). In areas with light intensity infections, mass deworming programmes probably have very small effects on weight for these children and additional policy options need to be explored to improve child health and nutrition in these areas.

| What are the research implications of this review?
This analysis was severely limited by not being able to obtain IPD for many older studies, which may have included children with heavier intensity infections. Greater adoption of calls for open, structured data from trials could maximise the benefit of research to understand effects in the most vulnerable and marginalised populations within these trials.

Mass deworming is applied widely to reduce the consequences of helminth infection, and there have been numerous studies on the effects of deworming on growth, cognition and learning outcomes in children over the past several decades. Systematic reviews and meta-analyses based on aggregate results of the effect of mass deworming on health and education outcomes are conflicting, with some showing benefit (Croke, Hicks, Hsu, Kremer, & Miguel, 2016; Hall, Hewitt, Tuffrey, & de, 2008) and others not (Taylor-Robinson, Maayan, Soares-Weiser, Donegan, & Garner, 2015; Welch et al., 2017). Debate has ensued about whether these conflicting results are due to variations in effect across individual-level characteristics, such as whether children are infected or not and intensity of infection (Bundy, Kremer, Bleakley, Jukes, & Miguel, 2009; Hotez et al., 2007; Montresor et al., 2015), as well as setting characteristics such as the sanitation environment and rapidity of reinfection (Campbell et al., 2016).

| The intervention
The updated World Health Organization guidelines recommend mass deworming for STH infection and schistosomiasis one to four times per year, depending on the prevalence of worm infection, in order to reduce worm burden in endemic areas (WHO, 2017). These updated WHO guidelines cite the Campbell and Cochrane systematic reviews on deworming, which both concluded there was little to no effect of deworming on child health outcomes, including growth, anaemia and cognitive outcomes (Taylor-Robinson et al., 2015; Welch et al., 2016). Mass deworming can be applied to school-aged children or whole communities. Selective treatment of infected individuals is rarely done due to the high cost of screening for infection. Rajagopal, Hotez, & Bundy, 2014;. In addition, water and sanitation measures may be implemented with mass deworming to reduce exposure and transmission of infections.

| How the intervention might work
Even with heavy infections, the nutritional requirements of intestinal worms relative to their human hosts are small. The harm to child welfare is expected to be caused by three factors: (a) malabsorption, (b) tissue damage and bleeding and (c) loss of appetite (Crawley, 2004). STH infections may cause malabsorption of nutrients in their hosts because of damage to the gastrointestinal surfaces. Hookworm infections are associated with anaemia, thought to be due to hookworm feeding on host tissue and to bleeding when they move from one site to another (Hall et al., 2008). Intestinal infections may also lead to reduced appetite which may negatively influence both growth and attention in school.
Deworming drugs are over 90% effective at reducing the worm load in individuals and are expected to reduce the prevalence of worm infection in the community as well as the intensity of infection in individuals (Figure 1). Reducing the prevalence and intensity of infection is expected to improve child nutritional status due to the mechanisms described above of reducing blood loss, reducing damage to gastrointestinal surfaces and improving appetite. Improved nutritional status and appetite are expected to improve attention in school and cognitive outcomes. Some have argued that deworming alone is insufficient to improve child health outcomes since the nutritional deficiencies caused by infections must be corrected with food and/or micronutrients (Hall et al., 2008).
Many potential effect modifiers have been described in the literature. Younger children may benefit more from deworming since they are smaller in size and the impact of infections on them may be greater (Hall et al., 2008). Girls may benefit less from deworming if they have lower school attendance (thus not receiving deworming given at school) and if there is preferential distribution of food or other resources at home which could influence child welfare. Children who are stunted for age at three years of age may not be able to benefit as much in terms of growth. Conversely, children who are underweight may benefit more from deworming than those of normal weight (Hall et al., 2008). It is expected that benefits of deworming would only accrue to those who are infected, and even more so to those with heavier infection intensity (Hall et al., 2008). Low socioeconomic status is expected to be correlated with other features such as exposure to repeat intestinal infections, including those that cause diarrhoea, and thus children with lower socioeconomic status may not achieve as much benefit as less poor children.
Reinfection is expected to depend on the prevalence and intensity of infection as well as environmental factors such as the water and sanitation environment and hygiene practices in the community.

Our previous Campbell review, which used network meta-analysis (NMA) and included 47 randomised trials and >1 million children, found little to no overall effect on growth, attention and school attendance (Welch et al., 2016). With NMA, we were able to explore the size of effect with different types and frequency of drugs and their combination with food or micronutrients; none of these contributed to larger effects. Our review also did not find larger effects in subgroups of children at the aggregate level across characteristics such as age, baseline nutritional status, prevalence or intensity of infection that have been postulated to be important (Welch et al., 2016). These analyses were conducted at the study level, rather than using data for each individual child, which limits the power to detect effect modification by individual participant characteristics. This review was therefore unable to identify whether mass deworming was more effective for children with certain characteristics. There was substantial unexplained heterogeneity between studies, with some studies finding larger effects than others, and no single individual-level, setting-level or methodology characteristic explaining this variation. Thus, we concluded that our analysis of effect modifiers was limited by the aggregate level data.
Our previous review was conducted using NMA, which allowed the comparison of treatments which had not been directly compared in head-to-head trials. NMA also allowed for the assessment of the role of multicomponent interventions (such as deworming combined with other parasite control interventions, food or micronutrients).
Because there are several drugs used for mass deworming, this allowed the assessment of heterogeneity related to the type of drug, frequency and use of concomitant interventions.
IPD meta-analysis has been called the "gold standard" in meta-analyses for exploring individual level characteristics and their association with effects (Stewart, 1995). Advantages of IPD meta-analysis include improving data quality, enabling standardisation of outcomes, clarifying risk of bias and increasing the power to assess the interaction of participant characteristics with effect size (Dagne, Brown, Howe, Kellam, & Liu, 2016; Stewart et al., 2015). Furthermore, IPD analysis can explore the size and direction of differences in effect, thus assessing whether there is a greater benefit for some participants (Early Breast Cancer Trialists' Collaborative Group, 1990). Another advantage of IPD meta-analyses is that they usually require an international collaborative effort involving trial authors, who may help to identify more relevant trials and also contribute to an agreed analysis plan and shared understanding of the results.
FIGURE 1 Logic model for deworming effects
While failure to obtain some datasets may lead to selection bias if there are systematic reasons why some studies do not provide full data, methods have been developed to combine IPD with aggregate data (when IPD is not available for some studies) in NMA (Donegan, Williamson, D'Alessandro, & Smith, 2012; Sutton, Kendrick, & Coupland, 2008).
We decided in collaboration with several authors of primary trials that there would be value in conducting an IPD meta-analysis to explore the question of whether mass deworming is more effective for subgroups of children defined by characteristics such as infection intensity or status, age or nutritional status. This understanding could help to develop targeted strategies to reach these children better with deworming and guide policy regarding deworming.

| OBJECTIVES
The primary objective is to use IPD NMA to explore whether the effects of different types and frequency of deworming drugs, as well as their combination with food or micronutrients, on anaemia, cognition and growth vary with child-level and study-level characteristics (see Table 1).

| METHODOLOGY
The protocol was registered with the Campbell Collaboration (Welch et al., 2018) and reported according to the preferred reporting items for systematic reviews and meta-analyses for protocols (PRISMA-P; Moher et al., 2015). Results of the review are reported using the PRISMA of individual patient data (PRISMA-IPD) statement and the PRISMA for network meta-analyses (PRISMA-NMA).

| Criteria for including and excluding studies
We included studies which met the following eligibility criteria: We included studies with combined approaches to parasite elimination such as albendazole and praziquantel. Also, because deworming may be used in combination with iron, food or hygiene promotion, we included studies with multiple component interventions.
Studies were included with placebo, control, or other active interventions (e.g., vitamin A, iron, hygiene promotion) as comparators.
As NMA depends on the assumption of transitivity (that participants could be randomised to any one of the treatments; Salanti, 2012), we planned to construct two evidence networks of jointly randomizable interventions of drugs given for two indications. First, we assessed the evidence network of interventions given for STH, which includes different frequencies of albendazole, mebendazole, levamisole, pyrantel, piperazine, ivermectin and tetramisole with or without micronutrients or food. These are considered jointly randomizable because they are given for the same indication, and many have been compared in multiarm trials (Salanti, 2012). Secondly, we considered the evidence network of interventions given for schistosomiasis (praziquantel, metrifonate, hycanthone) with or without micronutrients or food.

| Types of outcome measures
The primary health outcomes were change from baseline in: weight (kg), height (cm), plasma ferritin, cognition and haemoglobin (g/L). We included studies which measured weight, haemoglobin, plasma ferritin, cognition or height. Cognition could be measured using scales that measured development (e.g., Raven's matrices) or tests that assessed attention using digit recall.
We did not exclude on the basis of reported outcomes since some measured outcomes may not be reported in trial reports or abstracts.
We used the available data on age and sex to calculate height for

| Description of methods used in primary research
Randomized controlled trials (RCTs) of deworming include two-arm trials as well as factorial trials, with children allocated either individually or by cluster-randomisation (e.g., by village or school).

| Details of study coding categories
Details of the populations, interventions, comparators, outcomes and study design were extracted in duplicate by two reviewers, using a pretested form, designed for a previous Campbell review on deworming for children (Welch et al., 2016). This extraction includes details about the context, setting and environment, as well as sociodemographic details, and details about the frequency, delivery method and dose of interventions.
Two independent reviewers appraised each study with the Cochrane risk of bias tool which assesses selection bias, performance bias, detection bias, attrition bias and reporting bias (Cochrane Handbook; Higgins, Altman, & Sterne, 2011). Disagreements were resolved by discussion or consultation with a third reviewer.
Two independent reviewers appraised the GRADE certainty for each outcome for each comparison, using the GRADE approach for NMA (Puhan et al., 2014). GRADE certainty (quality) "reflects our confidence that the estimates of the effect are correct. In the context of recommendations, quality reflects our confidence that the effect estimates are adequate to support a particular recommendation. Quality as used in GRADE means more than risk of bias and so may also be compromised by imprecision, inconsistency, indirectness of study results, and publication bias" (Balshem et al., 2011). The two reviewers discussed ratings and reached consensus. Disagreements were resolved by consulting a third reviewer.
We developed a summary of findings table.
We accounted for clusters (such as villages, schools or households) as nested within each study.
We analysed IPD datasets to check for comparability with the primary published papers. We calculated the standardised difference between the published data and the IPD received from authors for baseline characteristics and baseline outcome assessment. For endline, we replicated the effect measures reported in study publications and calculated the standardised difference between the IPD received and the study report (Austin, 2009).
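The comparison between published values and the IPD received can be illustrated with a short sketch. It assumes the commonly used pooled-standard-deviation form of the standardised difference for means and for proportions; the function names are hypothetical and the review's own computation may have differed in detail.

```python
import math


def standardised_difference_means(mean_a, sd_a, mean_b, sd_b):
    """Standardised difference between two means, scaled by the pooled SD.

    Assumed form: (mean_a - mean_b) / sqrt((sd_a^2 + sd_b^2) / 2).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("both standard deviations are zero")
    return (mean_a - mean_b) / pooled_sd


def standardised_difference_proportions(p_a, p_b):
    """Standardised difference between two proportions (each in 0..1)."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("both proportions have zero variance")
    return (p_a - p_b) / pooled_sd


if __name__ == "__main__":
    # Hypothetical values: published baseline weight vs. weight in the IPD.
    print(standardised_difference_means(20.1, 3.0, 20.0, 3.1))
```

A value near zero suggests the dataset received matches the publication for that variable; larger absolute values would prompt a query to the trial authors.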
As with our previous Campbell review, we used a two-step approach to meta-analysis. We conducted pairwise analyses for each comparison of interest by entering all IPD into a multilevel model, with each study as one cluster. We expected considerable heterogeneity between studies for each outcome based on our Campbell review; therefore, we used a random effects model. We assessed mean differences in change from baseline for weight (kg), height (cm) and haemoglobin (g/L). We intended to assess plasma ferritin (mcg/L) but too few studies reported this outcome (seven studies with 6,318 participants). The advisory board, based on clinical and methodological expertise, decided that there were insufficient studies to conduct effect modification analyses and that basic random effects meta-analysis could be misleading.
For cognition, we analysed measures of motor and cognitive development separately. We analysed measures of attention separately from developmental outcomes. We did not combine different measures of cognition.
We accounted for clustering as above by nesting clusters within studies. We decided on a set of predefined covariates with advice from our advisory board and coauthors. We accounted for the covariates of sex, age, infection intensity for each type of agent, socioeconomic status, maternal education and baseline nutritional status in the model. We assessed heterogeneity using visual inspection of forest plots for pairwise analyses as well as statistical tests of heterogeneity (I²).
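As an illustration of the heterogeneity statistic, the following sketch computes I² from study-level effect estimates and their standard errors via Cochran's Q with inverse-variance weights. This is the standard aggregate-data definition, not the multilevel model used in the review, and the function name is hypothetical.

```python
def i_squared(effects, std_errors):
    """Return I-squared (as a percentage) from study effects and standard errors.

    Uses Cochran's Q with inverse-variance weights:
    I2 = max(0, (Q - df) / Q) * 100, where df = number of studies - 1.
    """
    if len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    if len(effects) < 2:
        raise ValueError("at least two studies are needed")

    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical mean differences in weight gain (kg) and standard errors.
    print(round(i_squared([0.05, 0.10, 0.60], [0.05, 0.06, 0.10]), 1))
```

Re-running such a function with one study removed is a simple way to see how much a single outlying study contributes to I².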
We conducted NMA with IPD, using a frequentist approach for

| Assessment of transitivity across treatment comparisons
Transitivity cannot be assessed statistically. With IPD, we have more opportunity to account for and model heterogeneity. As proposed by Salanti (2012), we used IPD to assess the distribution of the child-level effect modifiers from Table 1 in each comparison to assess the plausibility of the transitivity assumption.

| Publication bias
A funnel plot would have been plotted for comparisons and outcomes with >10 studies. We used Egger's test for asymmetry and visual inspection to assess the presence of publication bias and/or selective reporting in the entire corpus of randomised trials of deworming versus control for children (which includes some studies that were not eligible due to missing baseline data on infection intensity and some studies which were eligible but did not provide data).
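The regression behind Egger's test can be sketched as follows: each study's standardised effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a minimal ordinary least squares sketch with hypothetical names; it returns the t statistic without a p-value, which would need a t distribution with n − 2 degrees of freedom.

```python
import math


def egger_regression(effects, std_errors):
    """Regress effect/SE on 1/SE; return (intercept, slope, t_intercept)."""
    n = len(effects)
    if n != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    if n < 3:
        raise ValueError("at least three studies are needed")

    x = [1.0 / se for se in std_errors]               # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardised effect
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))

    slope = sxz / sxx
    intercept = mean_z - slope * mean_x

    # Standard error of the intercept from the residual variance.
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + mean_x ** 2 / sxx))
    t_intercept = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_intercept
```

In this parameterisation the slope estimates the underlying effect and the intercept captures small-study effects.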
We did not rank interventions because there is controversy as to the utility of ranking.

| Subgroup analyses
Provided sufficient data was available to inform the evidence network, subgroup analyses were conducted to assess effects across both child-level and environment-level characteristics. We compared the results of models with subgroup analyses by assessing the size of quantitative or qualitative differences in effects, the statistical significance of tests for interactions, the between-study variance and the goodness of fit of the models using the likelihood ratio.
using BAZ > −2.0, BAZ < −2.0 to −3.0, BAZ < −3.0),
• Anaemia (using WHO cutoffs by age and altitude of nonanaemic, mild, moderate and severe, http://www.who.int/vmnis/indicators/haemoglobin.pdf)
• Age (<5 and ≥5 years of age)
• Sex (male/female)
• Socioeconomic status: socioeconomic status is measured in different ways in studies (e.g., questionnaires, asset indices, quintiles). We planned to assess whether the measurement of socioeconomic status could be compared across study settings and time. We decided this was not possible; therefore, we did not conduct the planned sensitivity analysis with children in the poorest tertile.
Before conducting subgroup analyses, we assessed the distribution of each variable. If there were insufficient children in some categories, the levels were combined (see results).
We planned to assess socioeconomic status of household or parents and maternal education as effect modifiers, but data was insufficient (see results).

Environment level:
• Study-level sanitation and hygiene environment, as reported by studies, was assessed to consider whether environments can be classified according to a consistent system
• Study-level prevalence (using WHO cut-offs for each worm-type, as above)
• Study-level intensity of infection (using WHO cut-offs for each worm-type, as above)
As noted in Table 1, environment-level characteristics were not entered into the model. They were assessed by sensitivity analyses.
We expected poor reporting on these details in the articles based on our prior Campbell review, but some studies may have collected information on this at the study level that was not reported in the published papers. We assessed whether there was sufficient data on the geographic location and date of the studies to assess study-level prevalence generated by the Global Atlas of Helminth Infections.

| Sensitivity analyses
Provided sufficient data was available to inform the evidence network, we conducted sensitivity analyses to assess robustness of results when restricted to studies at low risk of bias for sequence generation, allocation concealment and blinding of participants. We assessed whether results were robust to excluding imputed data (i.e., complete case analysis). We assessed sensitivity to restricting to studies published in 2008 or later (last 10 years).
Data were housed at a secure data warehouse at the Bruyère Research Institute, following the personal health information act.
Data were transferred to SAS as a common platform for all studies, using a common data dictionary. V. W. checked the IPD for consistency immediately upon receiving datasets. For example, we checked for outlier individuals (e.g., with ages outside of eligibility criteria, duplicate participant IDs, unrealistic date ranges). We compared the IPD from authors with the aggregate data reported in the articles. Any missing or unusual data were flagged for discussion with the trial author or statistician by V. W. We asked for clarification from the authors to establish reasons for the errors, and corrected them if possible. Any requests for authors were discussed when the data was provided, such as clarification of trial risk of bias, conduct or eligibility criteria. We ran the same statistical analysis as the authors to check for consistency with the published paper.
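The kind of consistency check described above can be sketched in a few lines. The review used SAS; this Python version is only an illustration, and the record layout (`participant_id`, `age_years`) is hypothetical. The default age range reflects the eligibility criterion of 6 months to 16 years.

```python
from collections import Counter


def flag_ipd_records(records, min_age_years=0.5, max_age_years=16.0):
    """Return a list of (row_index, reasons) for records needing a query.

    `records` is a list of dicts with hypothetical keys
    'participant_id' and 'age_years'.
    """
    id_counts = Counter(record["participant_id"] for record in records)
    flagged = []
    for index, record in enumerate(records):
        reasons = []
        if id_counts[record["participant_id"]] > 1:
            reasons.append("duplicate participant ID")
        age = record.get("age_years")
        if age is None:
            reasons.append("missing age")
        elif not (min_age_years <= age <= max_age_years):
            reasons.append("age outside eligibility range")
        if reasons:
            flagged.append((index, reasons))
    return flagged


if __name__ == "__main__":
    sample = [
        {"participant_id": "A1", "age_years": 7.0},
        {"participant_id": "A1", "age_years": 7.5},
        {"participant_id": "A2", "age_years": 19.0},
    ]
    for row, reasons in flag_ipd_records(sample):
        print(row, "; ".join(reasons))
```

Flagged rows would be raised with the trial author or statistician rather than corrected automatically.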
We requested statements of ethics approval from each study. No studies were identified that did not receive ethics approval. We requested that all data be transferred without any identifiers.

| Treatment of qualitative research
We did not include qualitative research.

| RESULTS
The results of this review are reported according to the PRISMA-IPD and PRISMA-NMA reporting guidelines (checklists in Table S1).
We screened 14,034 records for inclusion. We screened 340 studies in full-text. We assessed 41 studies of deworming for STH and 14 studies of schistosomiasis treatment as eligible for inclusion.
One study included treatments for both STH and schistosomiasis, and is included in both counts (Figure 2).
A total of 285 studies were excluded because they did not meet eligibility criteria, due to lack of infection intensity data (n = 14), duration of <3 months (n = 9) and wrong study design (n = 262; Table S3). We identified one ongoing study of albendazole (Table S4). Figure 3).

| Contacting authors and yield of studies
The retrieval of data was better for studies conducted after 2000, with a yield of 15 out of 22 published studies (68%) and 90% of participants randomised to eligible studies (Figure 4).
For studies conducted before 2000, we received only four out of 19 studies (21%), and 39% of participants randomised (Figure 5).
For schistosomiasis, we received data from only two out of 14 studies (14%) (Table S6), representing 37% of participants randomised to eligible studies (Figure 6). We decided not to pursue an analysis of schistosomiasis studies because of the risk of misleading results with an inadequate representation of available studies.
All study authors who provided data signed a data transfer agreement (Appendix 2).
Three of these studies were screen and treat (SAT) studies: Yap et al. (2014), Beasley et al. (1999) and Beasley (1995). We decided to include these in the model since our model is designed to adjust for infection intensity.
Additional child and setting characteristics for the 19 studies with <50% missing data are in Table S2.

| Characteristics of STH deworming studies which did not provide data
Characteristics of studies that did not provide data are shown in
Abbreviations: epg, eggs per gram of stool; STH, soil-transmitted helminthiasis.
a "Anyworm" is a variable indicating children with no detected STH infection of any type of STH, light intensity using WHO cut-offs for each type of STH, or moderate or heavy infection intensity for any type of STH.
b Anaemia cut points defined on the basis of age and sex using WHO guidelines.

| Compared to the 2016 aggregate data Campbell review
Seventeen studies which were included in our prior Campbell review (Welch et al., 2016) were excluded because they were not randomised or quasirandomised trials (n = 2) or had no baseline infection intensity data (n = 15). These studies are summarised in Table S9.

| Aggregate effect estimates of studies not providing IPD
We compared the effect estimates of the studies which were eligible but did not provide data, those that provided data and those which were not eligible (no infection intensity, too small or too short).
Results for STH deworming versus placebo for weight gain (kg) are shown in Figure 7. As can be seen on visual inspection, two studies had much larger effects on weight gain than any others (Stephenson, Latham, Adams, Kinoti, & Pertet, 1993; Stephenson, Latham, Kurz, Kinoti, & Brigham, 1989). The heterogeneity with both of these studies included was 90% as assessed by I², suggesting that statistical combining of these studies is inappropriate. As in our previous Campbell review, we removed outliers to assess contribution to I².
Removing Stephenson (1989), which we earlier assessed as having imbalance in baseline covariates which may have influenced results (Welch et al., 2016), resulted in an I² of 71%, which we considered acceptable for statistical pooling, following the Cochrane Handbook guidance (Higgins et al., 2011). The test for interaction of effect was not statistically significant (p = .10).
Details for height and haemoglobin for STH versus placebo are shown in Appendix 3, comparison 1. The interaction test for subgroup effects was not statistically significant for any of these outcomes. However, the studies which were not included

| Feasibility of conducting IPD meta-analysis
We judged that we had insufficient data to conduct analysis of the studies of deworming for schistosomiasis since we received only two studies out of 14 eligible for analysis, and this represented 36% of participants randomised to eligible studies.
For deworming for STH, we received IPD from 19 studies out of 41 considered eligible (46%) and 31,945 out of 40,525 participants randomised (79%). We considered this was sufficient data to pursue IPD meta-analysis.

| Quality of studies
Overall, there was low risk of selection and performance bias in 47% (9 of 19) of studies. A further 47% (9 studies) had unclear risk of bias due to lack of detail on allocation method or method of blinding. Overall, there was a high risk of attrition bias in 37% (seven studies) of the included studies. Attrition bias was judged high risk due to loss to follow-up of >20% of participants in these studies. Detection bias could not be assessed in 58% (11 studies) of the studies and selective reporting could not be assessed in 79% (15 studies) due to insufficient information. No major baseline imbalance was found in 74% (14 studies) of the studies, judged according to the description of baseline characteristics (Figures 8 and 9).
The overall risk of bias was similar for the studies for which we were unable to obtain data, with two exceptions: selection bias was low risk in only 4.5% (1 of 22 studies) and unclear in 91% (20 studies), and blinding of personnel was low risk in 18% (four studies) and unclear in 72% (16 studies), due to lack of description of the method of allocation or blinding (Figures 10 and 11).

| Preparation, replication, imputation, measurement and estimation
As described in the methods, we followed four steps to prepare, replicate, impute and calculate anthropometric Z scores.
FIGURE 8 Risk of bias graph for 19 studies that provided data
FIGURE 9 Risk of bias summary for 19 studies that provided data

| Preparation: missingness analysis
Of the 19 studies that met this review's inclusion criteria, 14 studies were missing <50% of data for outcomes and covariates at baseline and endline, and were included in the main analysis (Table 4). For the studies included in the main analysis, there was an average of 4% missing data at baseline (range, 0-42%), and an average of 9% missing data at endline (range, 0-31%). Five studies (Hall et al., 2006; Kirwan et al., 2009; Miguel & Kremer, 2004; Rousham & Mascie-Taylor, 1994; Wiria et al., 2013) were missing more than 50% of outcome or covariate data at baseline or endline, and were included in the complete case analysis only. Wiria et al. (2013), Hall et al. (2006), and Miguel and Kremer (2004) were missing more than 50% of data for all STH counts at baseline. Hall et al. (2006) and Rousham and Mascie-Taylor (1994) did not collect haemoglobin at baseline or endline. Wiria et al. (2013) and Kirwan et al. (2009) were missing more than 50% of data on all outcome variables at endline.
(2004) respectively. For every study, there was at least one instance where the standardised difference could not be calculated at baseline or endline because the published results did not report the covariate or outcome measure in question (indicated as "NA" in Table 6).

| Imputation
We used multiple imputation for missing data at baseline and endline and created five completed datasets.
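Estimates from multiply imputed datasets are conventionally combined with Rubin's rules. The text does not state how the five sets of results were pooled, so the sketch below is an assumption about a typical workflow rather than a description of the review's procedure; the function name is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates with Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error), where
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    if m < 2:
        raise ValueError("at least two imputed datasets are needed")

    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical mean differences in weight (kg) from five imputed datasets.
    print(pool_rubin([0.10, 0.12, 0.08, 0.11, 0.09], [0.05] * 5))
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values, which a single imputed dataset would understate.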

| Measurement and estimation
Two studies (Ebenezer et al., 2013 and Yap et al., 2014) required adjustments to haemoglobin measures due to high altitude. The altitude correction method applied was:
and for infection intensity (any helminth).
The cutoffs used for BMI-for-age z score, weight-for-age z score, and height-for-age z scores were adjusted to include only two levels (≤−2SD, >−2SD) to accommodate the lack of children with extreme scores at either end of the distribution. Anaemia status was adjusted to two levels (not anaemic, anaemic) 1 for the same reason.
lumbricoides prevalence at endline (Beasley et al., 1999; Le Huong et al., 2007; Stoltzfus et al., 1997, 2004) and these were included in a sensitivity analysis to assess whether greater impact on A. lumbricoides prevalence was associated with greater effects on growth or haemoglobin.
1 Anaemia cutoffs were sex- and age-specific as per the WHO's Haemoglobin concentrations for the diagnosis of anaemia and assessment of severity (World Health Organization, 2013).

| Effect of deworming on infection intensity
One study did not assess endline infection intensity (Liu et al., 2017).

| NMA-IPD model development
We planned our analysis model a priori based on consultation with the advisory group and our research team to consider study design elements, outcomes, covariates and effect modifiers.
Six out of 14 studies (Ebenezer et al., 2013; Liu et al., 2017; Rohner et al., 2010; Solon et al., 2003; Stoltzfus et al., 1997) measured effects on cognition outcomes. However, the specific measures and methods used to assess cognition varied by study. At the December 2017 meeting of the review investigators and advisors (London, UK), it was decided to assess cognition (using measures for attention and development) on a study-by-study basis.
Where measures were described with the same name (e.g., working memory in Liu et al., 2017; Ndibazza et al., 2012), the Advisory Group recommended not combining results across studies since the translation and different contexts of the studies could influence the tool's application. Cognitive measures were categorised as related to: (a) short-term attention (e.g., digit recall), (b) scholastic performance (e.g., math, language tests) or (c) developmental outcomes (e.g., motor development, Raven's index).
An insufficient number of eligible studies included measures for maternal education and socioeconomic status, and the specific measures used varied by study (Table 8). Five studies (Ebenezer et al., 2013; Liu et al., 2017; Ndibazza et al., 2012; Yap et al., 2014) included a measure for maternal education, and seven studies (Beasley et al., 1999; Beasley, 1995; Liu et al., 2017; Ebenezer et al., 2013; Ndibazza et al., 2012; Yap et al., 2014) included a measure for socioeconomic status. Given the limited number of studies and the variability in measures, these measures were not included as covariates in the model.
The weight-for-height z score for children under 5 years that was originally planned as a covariate was replaced by BMI-for-age on the recommendation of the Advisory Group to avoid collinearity between weight-for-age and height-for-age z scores.
Indicators for water and sanitation were not included as effect modifiers because not all studies described water and sanitation conditions (see Table 2-Characteristics of Included Studies). The studies that did provide descriptions did not do so in a quantifiable way that would allow comparison.

| Evidence network and feasibility assessment for NMA
The full evidence network included 18 nodes (Figure 12) due to different types of deworming (e.g., albendazole, mebendazole and praziquantel), cointerventions (e.g., micronutrients) and frequency of deworming. We considered the control arm of two studies as equivalent to placebo (Liu et al., 2017; Miguel & Kremer, 2004).

| Evidence network refinement
We ran the NMA-IPD for the full network with 18 nodes, as above.
We excluded five studies with >50% missing data since we could not impute missing data and adjusted analyses would be biased due to the amount of missing data. We decided to include these five studies in a "complete case" sensitivity analysis.
The results with the full evidence network were presented at a meeting of the Advisory Group in June 2017. [...] The network was further reduced to six nodes (Figure 13).
Since the results may be influenced by these decisions about collapsing across nodes, the Advisory Group and research team decided to also analyse the full network as a sensitivity analysis.
5.9.1 | Assessing feasibility of NMA with IPD

Assumptions of transitivity and consistency
Transitivity was considered plausible because we assessed the distribution of child-level effect modifiers across studies, and found  (Figures 2-13).
In addition, we found that the distribution of effect modifiers was balanced across comparisons (
As planned, we constructed a funnel plot to assess the presence of publication bias. To do this, we included all studies of STH versus placebo from our previous Campbell review of deworming (Welch et al., 2016) to compare the received data with the data which was either not received or ineligible (due to lack of baseline infection intensity data).
The funnel plot of STH deworming versus placebo for the studies for which we received data (circles) shows that the studies we received include both positive and negative studies (Figure 14). The studies which were not received (diamonds) were smaller and had larger effects on weight gain.
The Egger test for publication bias on the aggregate data of the entire sample (n = 30 studies) was not statistically significant (p = .249) for small study effects.
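The Egger test regresses each study's standardised effect (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests small study effects. The sketch below is a minimal illustration with hypothetical inputs, not the code used for the review, and `egger_intercept` is an illustrative helper name. A p value would come from comparing the t statistic with a t distribution on n − 2 degrees of freedom.

```python
import math


def egger_intercept(effects, ses):
    """Return (intercept, SE of intercept) from regressing effect/SE on 1/SE."""
    y = [e / s for e, s in zip(effects, ses)]   # standardised effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)   # residual variance
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, se_int


# Hypothetical study-level effects on weight gain (kg) and standard errors
effects = [0.02, 0.05, 0.10, 0.01, 0.20]
ses = [0.03, 0.05, 0.08, 0.04, 0.15]

b0, se_b0 = egger_intercept(effects, ses)
if se_b0 > 0:
    print(f"Egger intercept = {b0:.2f}, t = {b0 / se_b0:.2f} on {len(ses) - 2} df")
```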

| Main effects
This section provides the overall results on our four primary outcomes: weight, height, haemoglobin and cognition, using the collapsed evidence network. These findings are summarised in three summary of findings tables.
Following this section, we describe effect modifier analyses for each planned effect modifier for each outcome of interest.
A road map of all analyses is described in Table S10. Results for main effects of NMA with IPD for the base case are in Table S11.

| Weight
Base case IPD-NMA analysis
There were no statistically significant effects on weight gain (kg) for any of the deworming combinations compared to placebo. For STH deworming versus placebo, the effect on weight gain was 0.01 kg (95% CI: −0.08, 0.11; Figure 15).
The head-to-head comparisons of deworming treatment combinations produced results that were consistent in direction and size with the results of the treatment versus placebo comparisons.

Direct evidence-aggregate and IPD
For each comparison, we compared the IPD-NMA result with the results for the direct evidence from study results pooled at the aggregate level (adjusted for covariates) and the direct evidence pooled using IPD (adjusted for covariates).
In all cases, the effect estimates from direct evidence were of similar size and direction as the IPD-NMA indirect + direct effect estimates (Table 10), and the heterogeneity of direct comparisons was below an I² of 75% (Table 11). The forest plot for one comparison (deworming for STH vs. placebo) is shown in Figure 16.
The effect estimates for all other comparisons, with details for each study in each comparison, are shown in Appendix 3.
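For readers less familiar with the I² statistic used above, the sketch below shows how a pairwise random effects pooled estimate and I² can be computed from study-level effects and standard errors. It assumes the DerSimonian-Laird estimator of between-study variance, which is an assumption on our part since the estimator is not stated in this section, and the example inputs are hypothetical.

```python
import math


def random_effects(effects, ses):
    """DerSimonian-Laird random effects pooling; returns (pooled, se, tau2, i2_percent)."""
    w = [1 / s ** 2 for s in ses]                       # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in ses]           # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2


# Hypothetical direct comparisons of STH deworming versus placebo (weight gain, kg)
pooled, se, tau2, i2 = random_effects([0.05, -0.02, 0.10, 0.00], [0.06, 0.05, 0.09, 0.04])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled = {pooled:.2f} kg (95% CI: {low:.2f}, {high:.2f}), I2 = {i2:.0f}%")
```

An I² near 0% indicates that the observed spread of study results is about what sampling error alone would produce; values approaching 75% or more are conventionally read as considerable heterogeneity.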

Sensitivity analyses
There were no qualitative (different directions of effects) nor quantitative (different sizes of effects) differences in the analyses conducted with no covariates (unadjusted analyses; Table S12). For example, the unadjusted effect on weight gain was 0.01 kg (95% CI: −0.08, 0.11) for STH versus placebo.
The results of a complete case analysis with the same 14 studies from the base case, where missing data were not imputed, were congruent with the main effects described above (Table S12). For example, the effect on weight gain for STH deworming versus placebo was 0.03 kg (95% CI: −0.07, 0.13).
Analysis of the NMA model restricted to studies at low risk of bias yielded similar results (Table S12) for weight gain (kg) for STH versus placebo (0.01 kg, 95% CI: −0.10, 0.12). However, there were [...] None of these effects were statistically significant.
FIGURE 13 Final collapsed evidence network. *Five studies with >50% missing data are not shown in this figure. Four of these included STH deworming versus placebo (Wiria, Kirwan, Miguel, Rousham) and one assessed STH deworming + micronutrients versus micronutrients. STH, soil-transmitted helminthiasis
TABLE 9 Comparison of node constitution in full network, June 2017 collapsed network and November 2017 network model
One study had very precise results and received a lot of weight in the meta-analyses for weight and height gain. We conducted a sensitivity analysis without this study and found the same effect on weight gain for STH versus placebo (0.01 kg, 95% CI: −0.08, 0.11). Other effect sizes were also of a similar magnitude and direction as the base case.
As described above, there was variation in effect of deworming on infection prevalence at endline. We conducted a sensitivity analysis restricted to studies which were more effective at reducing infection prevalence, defined as a relative risk of 0.80 or lower when compared to the placebo group in A. lumbricoides prevalence at endline (Beasley et al., 1999; Le Huong et al., 2007; Stoltzfus et al., 1997, 2004). The results of this sensitivity analysis show that for STH deworming versus placebo, the effect on weight gain was 0.08 kg (95% CI: −0.10, 0.26), whereas our base case analysis finding was 0.01 kg (95% CI: −0.08, 0.11).
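The selection rule for this sensitivity analysis can be illustrated with a short Python sketch. The 0.80 threshold comes from the text; the study labels and counts below are hypothetical placeholders, not data from the included trials.

```python
def relative_risk(infected_trt, n_trt, infected_ctl, n_ctl):
    """Relative risk of infection at endline, treatment versus placebo."""
    return (infected_trt / n_trt) / (infected_ctl / n_ctl)


RR_THRESHOLD = 0.80  # "more effective" studies have a relative risk at or below this value

# Hypothetical endline counts: (infected treated, n treated, infected placebo, n placebo)
endline = {
    "Study A": (30, 100, 60, 100),   # RR = 0.50, retained
    "Study B": (45, 100, 50, 100),   # RR = 0.90, excluded
}

retained = [name for name, counts in endline.items()
            if relative_risk(*counts) <= RR_THRESHOLD]
print(retained)
```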

Comparison of effect sizes for weight gain (kg) between received data and studies that were not included in the analysis
We assessed whether the effects on weight gain were similar between the studies for which we did not have data (either because the authors did not provide it or because the studies did not meet eligibility criteria) and the studies which did provide data (Appendix 3). The interaction test for subgroup differences was not statistically significant (p = .10). The pooled effect of all 28 studies was 0.07 kg (95% CI: 0.00, 0.13).
The above analysis only includes studies which randomised STH alone compared to a placebo or control arm. Studies with vitamin A, iron or praziquantel as cointerventions are not included in this analysis since we decided that STH deworming and cointerventions should be considered as separate nodes. The latter analysis omits one study (Stephenson et al., 1989) which was also omitted from our previous meta-analysis (Welch et al., 2016).

Height-base case analysis
The effect on height gain for STH deworming versus placebo was 0.09 cm (95% CI: −0.08, 0.27). The effects for the other comparisons were of similar magnitude (Figure 17).
The head-to-head comparisons of STH deworming treatment combinations produced results that were consistent in expected direction and size with the results of the treatment versus placebo comparisons.

Direct evidence-aggregate and IPD
Comparison of the analyses of height gain for STH versus placebo for aggregate data, IPD direct estimates and IPD-NMA estimates are congruent in size and direction of effect (Table 12).
The forest plots for each direct evidence comparison were of acceptable heterogeneity to carry out NMA (Table 13).

Sensitivity analyses
[...] nodes and (e) complete case with additional five studies that had too much missing data to be included in the adjusted models (Tables S12 and S13).
As an example, the effect sizes for the STH versus placebo comparison for height gain are given in Table 14.

Comparison of effect sizes for height gain (cm) between received data and studies that did not provide data
We assessed whether the effects on height gain were similar between the studies for which we did not have data (either because the authors did not provide it or because the studies did not meet eligibility criteria) and the studies which did provide data (Appendix 3). The test for interaction

Sensitivity analyses
All sensitivity analyses were congruent with these main findings including the complete case model (14 studies, six nodes, unadjusted), complete case with additional five studies (unadjusted) and the unadjusted 14 study model.

Comparison of effect sizes for haemoglobin (g/L) between received data and studies that did not provide data
We assessed whether the effects on haemoglobin were similar between the studies for which we did not have data (either because the authors did not provide it or because the studies did not meet eligibility criteria) and the studies which did provide data (Appendix 3). The test for interaction for subgroup differences for STH deworming versus placebo was not statistically significant (p = .33). The effect size for studies which provided data was 0.05 g/L (95% CI: −0.02, 0.11), compared to 0.00 g/L (95% CI: −0.05, 0.06) for studies for which data were not received. Also, when sorted by year of publication, there was no pattern in effect size based on the year in which the study was published.
The baseline means and range of minimum and maximum scores at baseline are given below to aid in interpreting the effect sizes observed (Table 17). Digit forward was 0.38 units higher (95% CI: 0.06, 0.71) for albendazole + fortified biscuit compared to unfortified biscuit, and was also improved for fortified biscuit alone compared to unfortified biscuit (0.57, 95% CI: 0.25, 0.88). All other outcomes had nonsignificant effects (see Table 18).

| Effect modifier analyses
We conducted subgroup analyses across each of the nine factors that were deemed important by our advisory group.

Height
Tests for interaction were not statistically significant across BMI for age for height gain in cm for any comparison (Figure S14; Tables S15 and S18).

Haemoglobin
There were no statistically significant subgroup differences across BMI for age for change in haemoglobin for any comparison (Figure S15 and Table S15).

Cognition
The test for interaction was not statistically significant across BMI for age for cognition for any comparison (Table S16).

Weight
The test for interaction was not statistically significant across levels of height for age for weight gain for any comparison (Figure S16 and Table S18).

Height
The test for interaction was not statistically significant across levels of height for age for height gain for any comparison (Figure S17 and Table S18).

Haemoglobin
The test for interaction was not statistically significant across levels of height for age for change in haemoglobin for any comparison (Figure S18 and Table S18).

Weight
The test for interaction for subgroup effects was not statistically significant across sex for weight gain for any comparison (Figure S19 and Table S16).

Height
The test for interaction for subgroup effects was not statistically significant across sex for height gain for any comparison (Figure S20 and

Haemoglobin
The test for interaction for subgroup effects was not statistically significant across sex for change in haemoglobin for any comparison (Figure S21 and Table S16).

Cognition
Tests for interaction for subgroup effects across sex were not statistically significant for cognition for any outcome measure or any comparison (Table S16).

| Age, as effect modifier
Weight
Tests for interaction for subgroup effects across age were not statistically significant for weight gain for any comparison (Figure S22 and Table S16).

Height
The relatively small number of participants <5 years of age led to wide CIs for estimates in this age group. Tests for interaction for subgroup effects across age were not statistically significant for height gain for any comparison (Figure S23 and Table S16).

Haemoglobin
Some comparisons did not have any children <5 years of age. Tests for interaction for subgroup effects across age were not statistically significant for change in haemoglobin for any comparison (Figure S24 and Table S16).

Cognition
Studies that reported cognition outcomes did not have children <5 years.

| A. lumbricoides, as effect modifier
We conducted two analyses because of the limited number of children with moderate or heavy intensity infections: 1) NMA with IPD using cutoffs based on the distribution of intensity in the sample of three levels, and 2) Direct evidence analysis with IPD using WHO cutoffs for intensity of infection.

Weight
For the NMA-IPD, tests for interaction for subgroup effects across A. lumbricoides intensity were not statistically significant for weight gain. When using cut-offs for intensity of infection for A. lumbricoides [...]
For the analysis of STH deworming versus placebo using direct evidence only for weight across three levels of A. lumbricoides infection using WHO cutoffs: none detected, light (1-4,999 epg) and moderate/heavy (≥5,000 epg), the interaction test for subgroup effects was not statistically significant. The effect for children with moderate or heavy intensity of A. lumbricoides infection was 0.12 kg (95% CI: −0.05, 0.28), which is higher than the effect for those with no detected infection (−0.01 kg, 95% CI: −0.11, 0.09) or those with light infection intensity (0.04 kg, 95% CI: −0.07, 0.15) (Figure S25).
In order to explore the role of A. lumbricoides prevalence further, we conducted a meta-regression according to prevalence of A. lumbricoides at the study level using aggregate data for all 30 studies available with STH deworming versus placebo. We chose this comparison since it is the comparison with the most data. The results yielded a coefficient of 0.18 (SE 0.24; p = .455; 95% CI: −0.313, 0.68) with an adjusted R² of −2.97% (proportion of between-study variance explained by prevalence of ascaris). These results indicate that A. lumbricoides prevalence was not a significant predictor of the effectiveness of deworming (Figure 19).
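To make the idea of a study-level meta-regression concrete, the sketch below regresses study effects on a moderator (here, prevalence) using inverse-variance weights. It is a simplified fixed-effect weighted least squares model with hypothetical inputs, so it is not the model used in the review and does not produce the adjusted R² reported above, which requires a random effects meta-regression.

```python
import math


def weighted_meta_regression(effects, ses, moderator):
    """Inverse-variance weighted regression of effect on a study-level moderator.

    Returns (intercept, slope, SE of slope) under a fixed-effect model.
    """
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, moderator)) / sw   # weighted mean of moderator
    my = sum(wi * yi for wi, yi in zip(w, effects)) / sw     # weighted mean of effects
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - mx) * (yi - my)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = my - slope * mx
    se_slope = math.sqrt(1 / sxx)   # valid when the study variances are treated as known
    return intercept, slope, se_slope


# Hypothetical studies: effect on weight gain (kg), its SE, and baseline prevalence (0 to 1)
effects = [0.02, 0.10, 0.05, 0.15]
ses = [0.05, 0.10, 0.05, 0.10]
prevalence = [0.10, 0.30, 0.50, 0.70]

b0, b1, se_b1 = weighted_meta_regression(effects, ses, prevalence)
print(f"slope = {b1:.2f} kg per unit prevalence (SE {se_b1:.2f})")
```

The slope is the estimated change in treatment effect per unit change in prevalence. Because the moderator is measured at the study level, an association here does not establish that individual children with infection benefit more.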

Height
Tests for interaction for subgroup effects across A. lumbricoides intensity were not statistically significant for height gain for the NMA. When using cut-offs for intensity of infection for A. lumbricoides based on the median distribution across three levels: none detected, lighter intensity (1-1,776 epg), and higher intensity (>1,776 epg), the effect modification for children with higher intensity was 0.04 cm (95% CI: −0.22, 0.30).
For the direct evidence analysis using WHO cutoffs, the effect for children with moderate or heavy intensity of A. lumbricoides infection was 0.07 cm (95% CI: −0.07, 0.22) (Figure S26).
FIGURE 19 Meta-regression according to prevalence of Ascaris lumbricoides for difference in weight gain at aggregate level

Haemoglobin
Tests for interaction for subgroup effects across A. lumbricoides intensity were not statistically significant for change in haemoglobin for the NMA. When using cut-offs for intensity of infection for A.
For the posthoc analysis of direct evidence of STH deworming versus placebo for haemoglobin across three levels of A. lumbricoides infection, using WHO cutoffs: none detected, light (1-4,999 epg) and moderate/heavy (≥5,000 epg), the interaction test for subgroup effects was not statistically significant. The effect for children with moderate or heavy intensity of A. lumbricoides infection was −0.44 g/L (95% CI: −2.49, 1.60) (Figure S27).

Cognition
Tests for interaction for subgroup effects across A. lumbricoides intensity were not statistically significant for single digit attention scores, math scores, Tamil language scores, processing speed index, working memory index, TIMSS z score, digit forward, digit back, block score and code score (Table S16).

| Hookworm, as effect modifier
Two analyses were conducted: (a) NMA using cutoffs based on the distribution of intensity in the sample of three levels and (b) direct evidence analysis using WHO cutoffs for intensity of infection.

Weight
Tests for interaction for subgroup effects in the NMA across hookworm intensity were not statistically significant for weight gain for any comparison. When using cut-offs for intensity of infection for hookworm based on the median distribution across three levels: none detected, lighter intensity (1-384 epg), and higher intensity (>384 epg), the effect modification for children with higher intensity was 0.16 kg (95% CI: −0.13, 0.46) (Table S15).
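The sample-based three-level categorisation used in the NMA can be sketched as follows. We assume here that the cutoff is the median egg count among children with a detected infection, which is our reading of "median distribution" rather than something the text states outright; the egg counts are hypothetical.

```python
from statistics import median


def median_cutoff(epg_values):
    """Median eggs per gram (epg) among children with a detected infection (assumed rule)."""
    positives = [v for v in epg_values if v > 0]
    return median(positives)


def intensity_level(epg, cutoff):
    """Three levels: none detected, lighter intensity (1 to cutoff), higher intensity (> cutoff)."""
    if epg == 0:
        return "none detected"
    return "lighter intensity" if epg <= cutoff else "higher intensity"


# Hypothetical hookworm egg counts for five children
sample = [0, 0, 96, 384, 1200]
cutoff = median_cutoff(sample)
print(cutoff, [intensity_level(v, cutoff) for v in sample])
```

Splitting at the sample median keeps the lighter and higher groups of similar size, which is useful when few children reach the WHO thresholds for moderate or heavy infection.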
For the direct evidence, posthoc analysis using random effects pairwise meta-analysis of STH deworming versus placebo for weight gain across three levels of hookworm infection using WHO cutoffs: none detected, light (1-1,999 epg) and moderate/heavy (≥2,000 epg), the interaction test for subgroup effects was not statistically significant. The effect for children with moderate or heavy intensity of hookworm infection was −0.53 kg (95% CI: −2.09, 1.03) (Figure S28).
To further assess the role of prevalence of hookworm, we conducted meta-regression using aggregate level data for 23 studies with data on hookworm prevalence for the comparison of STH deworming versus placebo. The proportion of variance explained is 54%, p = .014, showing a positive relationship of weight gain with hookworm infection prevalence (Figure 20).

Height
Tests for interaction for subgroup effects across hookworm intensity were not statistically significant for height for any comparison in the NMA. When using cut-offs for intensity of infection for hookworm based on the median distribution across three levels: none detected, lighter intensity (1-384 epg), and higher intensity (>384 epg), the effect modification for children with higher intensity was 0.20 cm (95% CI: −0.13, 0.52) (Table S15).
For the direct evidence, posthoc analysis of STH deworming versus placebo for height gain across three levels of hookworm infection, using WHO cutoffs: none detected, light (1-1,999 epg) and moderate/heavy (≥2,000 epg), the interaction test for subgroup effects was not statistically significant. The effect for children with moderate or heavy intensity of hookworm infection was −0.17 cm (95% CI: −0.52, 0.18) (Figure S29).

Haemoglobin
Tests for interaction for subgroup effects across hookworm intensity were not statistically significant for change in haemoglobin for any comparison in the NMA. When using cut-offs for intensity of infection for hookworm based on the median distribution across three levels: none detected, lighter intensity (1-384 epg) and higher intensity (>384 epg), the effect modification for children with higher intensity was 3.58 g/L (95% CI: 0.13, 7.02) (Table S15).
For the direct evidence, posthoc analysis of STH deworming versus placebo for haemoglobin across three levels of hookworm infection using WHO cutoffs: none detected, light (1-1,999 epg) and moderate/heavy (≥2,000 epg), the interaction test for subgroup effects was not statistically significant. The effect for children with moderate or heavy intensity of hookworm infection was −0.56 g/L (95% CI: −6.39, 5.27) (Figure S30).
FIGURE 20 Meta-regression of hookworm prevalence at aggregate level for 23 studies with data on STH versus placebo. STH, soil-transmitted helminthiasis

Cognition
Tests for interaction for subgroup effects across hookworm intensity were not statistically significant for any comparison for single digit attention scores, math scores, Tamil language scores, processing speed index, working memory index, TIMSS z score, digit forward, digit back, block score and code score (Table S16).

5.12.7 | T. trichiura, as effect modifier
We conducted two analyses: (a) NMA using cutoffs based on the distribution of intensity in the sample of three levels and (b) direct evidence analysis using WHO cutoffs for intensity of infection.

Weight
Tests for interaction for subgroup effects across T. trichiura intensity were not statistically significant for weight gain for any comparison in the NMA. When using cut-offs for intensity of infection for T. trichiura based on the median distribution across three levels: none detected, lighter intensity (1-288 epg), and higher intensity (>288 epg), the effect modification for children with higher intensity was 0.17 kg (95% CI: −0.06, 0.41) (Table S15).
For the direct evidence, posthoc analysis of STH deworming versus placebo for weight gain across three levels of T. trichiura infection using WHO cutoffs: none detected, light (1-999 epg) and moderate/heavy (≥1,000 epg), the interaction test for subgroup effects was not statistically significant. However, the effect for children with moderate or heavy intensity of T. trichiura infection was 0.11 kg (95% CI: −0.14, 0.35), which was higher than for those with no detected infection (Figure S31).
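The WHO-based categorisation used in the direct evidence analyses for the three helminth species can be summarised in a few lines of Python. The thresholds are those quoted in this section; the dictionary and function names are illustrative only.

```python
# Lower bound (eggs per gram) of the moderate/heavy category, as quoted in the text
WHO_MODERATE_HEAVY_EPG = {
    "A. lumbricoides": 5000,
    "hookworm": 2000,
    "T. trichiura": 1000,
}


def who_intensity(species, epg):
    """Classify an egg count as none detected, light, or moderate/heavy."""
    if epg <= 0:
        return "none detected"
    if epg >= WHO_MODERATE_HEAVY_EPG[species]:
        return "moderate/heavy"
    return "light"


print(who_intensity("A. lumbricoides", 4999))  # light
print(who_intensity("hookworm", 2000))         # moderate/heavy
print(who_intensity("T. trichiura", 0))        # none detected
```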

Height
Tests for interaction for subgroup effects across T. trichiura intensity were not statistically significant for height for any comparison in the NMA models. When using cut-offs for intensity of infection for T. trichiura based on the median distribution across three levels: none detected, lighter intensity (1-288 epg), and higher intensity (>288 epg), the effect modification for children with higher intensity was 0.

Haemoglobin
Tests for interaction for subgroup effects across T. trichiura intensity were not statistically significant for haemoglobin for any comparison in the NMA models. When using cut-offs for intensity of infection for T. trichiura based on the median distribution across three levels: none detected, lighter intensity (1-288 epg), and higher intensity (>288 epg), the effect modification for children with higher intensity was 1.33 g/L (95% CI: −1.14, 3.81) (Figure S33).

Cognition
Tests for interaction for subgroup effects across T. trichiura intensity were not statistically significant for any comparison for single digit attention scores, math scores, Tamil language scores, processing speed index, working memory index, TIMSS z score, digit forward, digit back, block score and code score (Table S16).

| Any helminth infection, as effect modifier
Weight
Tests for interaction for subgroup effects across a composite category of intensity of infection for any parasite were not statistically significant for weight gain for any comparison in the NMA-IPD model (Figure S34 and Table S15).

Height
Tests for interaction for subgroup effects across a composite category of intensity of infection for any parasite were not statistically significant for height gain for any comparison (Figure S35 and Table S15).

Haemoglobin
Tests for interaction for subgroup effects across a composite category of intensity of infection for any parasite were not statistically significant for change in haemoglobin for any comparison (Figure S36 and Table S15).

Cognition
Tests for interaction for subgroup effects across a composite category of intensity of infection for any parasite were not statistically significant for any comparison for single digit attention scores, math scores, Tamil language scores, processing speed index, working memory index, TIMSS z score, digit forward, digit back, block score and code score (Table S16).

| Anaemia as an effect modifier
Weight
Tests for interaction for subgroup effects across anaemia were not statistically significant for weight gain for any comparison (Figure S37 and Table S15).

Height
Tests for interaction for subgroup effects across anaemia were not statistically significant for height gain for any comparison (Figure S38 and Table S15).

Haemoglobin
Tests for interaction for subgroup effects across anaemia were not statistically significant for change in haemoglobin for any comparison (Table S15; Figures 21 and S39).

Cognition
Tests for interaction for subgroup effects across anaemia were not statistically significant for any comparison for single digit attention scores, math scores, Tamil language scores, working memory index, TIMSS z score, digit forward, digit back, block score and code score (Table S16).

| Year of publication
We planned to restrict our IPD-NMA to studies conducted in 2008 or later. However, we decided that it would be more informative to conduct a meta-regression using aggregate data according to year of publication to include older studies for which we were unable to obtain individual participant datasets.
This analysis shows a negative association, with a greater effect in older studies, which was not statistically significant (p = .05) and explained 7.88% of the variance between studies. The graph shows a concentration of more recent studies with smaller effects on weight gain.

| Comparison with other recent systematic reviews for STH deworming versus placebo
We compared our findings for weight gain, height gain, haemoglobin and cognition to a Cochrane review (Taylor-Robinson et al., 2015) and our prior Campbell review (Welch et al., 2016), which both used aggregate level data (Table 19).
Also, the Welch et al. (2016) review assessed the relationship of aggregate data with prevalence of each type of helminth infection, and found no relationship using two different methods. The findings of this systematic review and IPD-NMA are in agreement with this, using IPD-NMA effect modification tests for subgroup effects, and aggregate data subgroup analysis as well as meta-regression across prevalence of ascaris. Unlike our prior systematic review, we did find a statistically significant relationship between hookworm prevalence and effect on weight gain.

Cognition
Little to no effect 0.23 points on a 100 point scale (95% CI: −0.6, 0.14) Little to no effect
Abbreviation: IPD, individual participant data.
Studies included by Croke, not by Welch et al. Awasthi 1995 3,712 (50 clusters
Our primary analysis was deworming versus placebo (without cointerventions) because it was the closest match to our NMA.

5.13.1 | Notes for those with different point estimates and standard errors
Awasthi and Pande (2001): We used 1 year data reported in the Taylor-Robinson, Maayan, Soares-Weiser, Donegan, and Garner (2012) systematic review. Croke et al. (2016) used 2 year data.
Donnen et al. (1998): We used adjusted estimates reported by Donnen et al. (1998). Croke et al. (2016) used unadjusted estimates provided to the Cochrane authors.
Kruger et al. (1996): Kruger et al. (1996) [...] deworming + iron versus iron (Dossa et al., 2001).
Because we used a NMA approach, these studies are included in the node for STH + micronutrients or iron compared to micronutrients or iron. In a sensitivity analysis, we included these studies to assess the influence on our results, and our random effects meta-analysis was 0.10 kg (95% CI: 0.03, 0.17). Note: these studies are included in the NMA-IPD presented in this paper if they had baseline infection intensity and provided data (that is, Hall et al. (2006) [...]
[...] lumbricoides and hookworm, we also assessed whether there was a relationship between prevalence and effects on weight using meta-regression for aggregate data for all studies with weight data for STH deworming versus placebo. These analyses did not show a relationship with A. lumbricoides prevalence at the study level. There was an association of higher hookworm prevalence with effect on weight.
These meta-regressions must be interpreted with caution since they are using data at the aggregate level (Debray et al., 2018).
We conducted an extensive search of electronic databases, with advice from the Campbell Collaboration International Development Group information scientist. We screened 16,613 articles and updated this search to March 27, 2018. We report the systematic review according to the reporting guidelines for IPD meta-analysis (PRISMA-IPD) and network-meta-analysis (PRISMA NMA).
We published and followed an a priori protocol (Welch et al., 2018). Our data suggest that there is publication bias in the deworming literature with failure to report growth data, since we obtained weight and height data from eight studies which had not previously reported these (Beasley et al., 1999; Beasley, 1995; Ebenezer et al., 2013; Kirwan et al., 2009; Le Huong et al., 2007; Rohner et al., 2010; Solon et al., 2003). We also report cognition data that was not previously published from one study (Rohner et al., 2010). Given our findings of selective outcome reporting, it is still possible that there are additional older studies with negative findings.
We compared the effect sizes observed in the studies that we retrieved to those which were excluded (due to missing baseline infection intensity) or which we were unable to obtain from the trial authors (due to lost datasets, administrative hurdles or nonresponse from the authors). We found that the test for interaction for subgroup differences was not statistically significant for weight, height or haemoglobin, but the effect on weight was higher in the studies which were not obtained, which were mostly older studies.
For schistosomiasis deworming, we received only two of 14 eligible studies. We decided that meta-analysis of these two studies would be misleading and did not pursue IPD meta-analysis for schistosomiasis deworming. We did include nodes in our evidence network for combinations of schistosomiasis deworming and STH deworming, but these had relatively fewer studies and participants.
Small amounts of calories were provided in three studies in the form of unfortified or fortified biscuits, noodles (Le Huong et al., 2007) or beverage (Solon et al., 2003). In each of these studies, the comparator groups received the unfortified food or beverage. We did not identify any studies that looked at providing substantive meals or snacks with deworming. Thus, we cannot draw conclusions on the effects of deworming when combined with feeding programmes in comparison to not providing feeding.

| Quality of the evidence
We included only RCTs. About 40% of trials did not provide enough information to assess the adequacy of randomisation and allocation concealment. We considered the included studies to be at overall low risk of bias. The quality of evidence, as assessed using the GRADE framework, was moderate across all outcomes and comparisons for the main effects. Quality of evidence was downgraded because of uncertainty about selective reporting bias across the evidence base and because we were not able to obtain data from all eligible studies. Subgroup effect analyses were judged to be of very low certainty due to imprecision and our inability to obtain all eligible studies.
Sensitivity analyses across adequacy of allocation concealment were congruent with our main findings for weight, height and haemoglobin for all comparisons.

| Limitations and potential biases in the review process
One limitation of this review is that we did not receive data from all eligible studies. We compared published results of the studies received for STH deworming versus placebo with the studies that were not received and those that were not eligible to assess the potential influence of these missing studies on our findings. The test for interaction was not statistically significant but the effect on weight gain overall was larger in studies that were not received, which limits the ability of this analysis to assess the overall, population level effects. However, this should not affect the effect modification analyses since these are based on individual level covariates. There was no trend in effect size or direction across the year of publication for weight, height or haemoglobin.
The assumptions of transitivity and consistency were assessed and considered plausible by assessing distribution of effect modifiers, assessing within comparison heterogeneity in direct evidence and by comparing direct and indirect evidence.
Another limitation is that different diagnostic tools with different measurement properties, including Kato-Katz, polymerase chain reaction (PCR) and other techniques, were used to assess infection intensity across the studies, which may lead to measurement error.
Only one study used PCR, and we used its infection intensity estimates in analyses with other studies, recognising that there may be differences in sensitivity of these tests.
Cognitive outcomes were measured using diverse tools, some of which were translated for use in these studies. For this reason, we presented each cognitive outcome for each study separately without combining them in a meta-analysis. Because results could not be combined across studies, these analyses are under-powered for cognitive outcomes.
We were unable to assess effect modification for infection
The study durations were short, with a median duration of 12 months (ranging from 4 to 45 months), and this may have limited our ability to detect changes in height or weight gain. However, since the two earlier studies with large effects on weight mentioned previously (Stephenson et al., 1989, 1993) were only 6 and 8 months in duration, we consider the study durations to have been sufficient to assess differences in weight gain. It is unlikely that these study durations are sufficient to assess differences in linear growth.
Single dose trials of short duration may not be able to detect positive effects due to high re-infection rates in endemic areas.
In our collapsed model, we collapsed across frequency of deworming which limits our ability to assess whether high frequency STH deworming is more effective than regular frequency deworming.
As described above, our preliminary models with frequency of administration as separate nodes did not show differences in effects on weight, height or haemoglobin between high frequency and regular frequency deworming.
Two studies in Kenya have shown large effects on weight gain of 1 kg or more (Stephenson et al., 1989, 1993). The reason for these large effects is unclear. In a prior systematic review (Welch et al., 2016), analysis of heterogeneity led us to exclude the Stephenson et al. (1989) study because of baseline imbalance. The conditions in which those two trials were carried out may have been different from other trials, including characteristics such as intensity of infection, sanitation, and participant and investigator adherence to protocols.
However, 25 other studies are available on STH versus placebo, and when all are combined, the overall effect in our analyses is 70 g.
The older studies of deworming suggested stronger effects on nutrition and other health outcomes than we have found in our analysis. Given that stunting is associated with adverse health and cognitive outcomes, and since deworming drugs are inexpensive, those findings implied that deworming is cost-effective. However, our study casts doubt on this, since at moderate levels of infection we could not discern significant impacts on key nutrition outcomes such as stunting and wasting. Our systematic review cannot predict outcomes and cost-effectiveness for chemoprophylaxis where infection is severe, since <2% of our sample had heavy intensity infections.
Our study did not look at school attendance, which has been used in previous cost-effectiveness analyses of deworming. There has been an intense debate on this topic, in which an independent replication identified smaller benefits than previously thought (Hicks, Kremer, & Miguel, 2015). Also, our prior systematic review found an average effect on school attendance of 1% (95% CI: −1, 3%) (Welch et al., 2016). We also identified problems with the methods of measuring school attendance in these studies.
The implication is that the cost-effectiveness/cost benefit of deworming on the basis of school attendance is not proven.
The exclusion of studies with <100 participants may lead to small study bias. However, only three studies had <100 participants; including them would not affect the main analyses or effect modification analyses.

| Agreements and disagreements with other studies or reviews
A Cochrane review (Taylor-Robinson et al., 2015) and a Campbell review (Welch et al., 2016) on mass deworming for children both concluded that there was little to no effect on weight and height for STH deworming. The effects observed in Taylor-Robinson et al. (2015) were a mean difference of 0.08 kg (95% CI: −0.11, 0.27) on weight, a mean difference of 0.02 cm (95% CI: −0.14, 0.17) on height and a mean difference of 0.02 g/dL (95% CI: −0.08, 0.04) on haemoglobin for regular treatment, and little to no effect on formal tests of cognition. In Welch et al. (2016)
Our findings for STH deworming versus placebo for height, cognition and haemoglobin are similar to these two prior reviews.
Our IPD-NMA effect on weight gain of 0.01 kg (95% CI: −0.08, 0.11) is lower than in these reviews, which is likely because we were not able to retrieve data from all eligible studies. Our meta-regression of year of publication and weight gain did not show a statistically significant effect of year of publication, but this must be interpreted with caution since meta-regression suffers from low power and was based on aggregate data. The smaller effect seen in our analysis may be related both to publication bias in the previous reviews, since we obtained unpublished data, which are known to be associated with negative findings (defined as smaller effects or nonstatistically significant; Hopewell, Loudon, Clarke, Oxman, & Dickersin, 2009), and to the fact that we did not receive data from all available studies.
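As a rough illustration of how far apart the two weight estimates quoted above are relative to their precision, the standard errors can be backed out from the reported 95% confidence intervals and the difference expressed as a z statistic. This is a minimal sketch and not part of the review's analysis: the function names are hypothetical, the intervals are assumed to be symmetric Wald-type intervals, and the two estimates are treated as independent, which they are not, since the reviews share trials. The result is therefore indicative only.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Back out a standard error from a symmetric Wald-type 95% CI."""
    return (upper - lower) / (2.0 * z)


def z_for_difference(est_a, se_a, est_b, se_b):
    """z statistic for the difference of two estimates.

    ASSUMPTION: the estimates are independent. The two reviews share
    trials, so this is only a rough gauge, not a formal test.
    """
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Weight gain (kg) as quoted in the text above.
se_cochrane = se_from_ci(-0.11, 0.27)  # Taylor-Robinson et al. (2015): 0.08 kg
se_ipd_nma = se_from_ci(-0.08, 0.11)   # this review's IPD-NMA: 0.01 kg

z = z_for_difference(0.08, se_cochrane, 0.01, se_ipd_nma)
p = two_sided_p(z)

print(f"SE (Cochrane) = {se_cochrane:.3f} kg, SE (IPD-NMA) = {se_ipd_nma:.3f} kg")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the difference of 0.07 kg is well within what the two confidence intervals allow, which is consistent with describing the estimates as differing in size without claiming a statistically demonstrated discrepancy.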
Our finding on weight gain with our IPD-NMA of 0.01 kg is considerably smaller than that of the meta-analysis by Croke et al. (2016) (http://www.nber.org/papers/w22382.pdf), which found an average overall effect on weight gain of 0.134 kg (95% CI:
The quality of evidence is rated as moderate for our findings, mainly due to the possibility of selective reporting and publication bias in the body of literature. Further research to obtain additional unpublished data on growth and cognition could change our findings.
For schistosomiasis deworming, we were unable to obtain the majority of studies, thus we did not carry out these analyses.
Further short-term studies of STH deworming in lightly infected populations are not likely to change the certainty or sizes of effects observed in this systematic review or in other systematic reviews of deworming.
Ideally, future studies would be designed with duplicate methods to measure exposure and outcome reliably. For example, they could use more sensitive diagnostic tools (e.g., PCR). Also, for cognition, proper cultural translation and validation of measurement tools are important.

ACKNOWLEDGMENTS
We would like to thank Celia Holland for being part of our Advisory

SOURCES OF SUPPORT
This review is funded by the Bill and Melinda Gates Foundation (funding reference number: OPP1140742).

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.