General contextual effects on neglected tropical disease risk in rural Kenya

The neglected tropical diseases (NTDs) are characterized by their tendency to cluster within groups of people, typically the poorest and most marginalized. Despite this, measures of clustering, such as within-group correlation or between-group heterogeneity, are rarely reported from community-based studies of NTD risk. We describe a general contextual analysis that uses multi-level models to partition and quantify variation in individual NTD risk at multiple grouping levels in rural Kenya. The importance of general contextual effects (GCE) in structuring variation in individual infection with Schistosoma mansoni, the soil-transmitted helminths, Taenia species, and Entamoeba histolytica/dispar was examined at the household-, sublocation- and constituency-levels using variance partition/intra-class correlation co-efficients and median odds ratios. These were compared with GCE for HIV, Plasmodium falciparum and Mycobacterium tuberculosis. The role of place of residence in shaping infection risk was further assessed using the spatial scan statistic. Individuals from the same household showed correlation in infection for all pathogens, and this was consistently highest for the gastrointestinal helminths. The lowest levels of household clustering were observed for E. histolytica/dispar, P. falciparum and M. tuberculosis. Substantial heterogeneity in individual infection risk was observed between sublocations for S. mansoni and Taenia solium cysticercosis and between constituencies for infection with S. mansoni, Trichuris trichiura and Ascaris lumbricoides. Large overlapping spatial clusters were detected for S. mansoni, T. trichiura, A. lumbricoides, and Taenia spp., which overlapped a large cluster of elevated HIV risk. Important place-based heterogeneities in infection risk exist in this community, and these GCEs are greater for the NTDs and HIV than for TB and malaria. 
Our findings suggest that broad-scale contextual drivers shape infectious disease risk in this population, but these effects operate at different grouping-levels for different pathogens. A general contextual analysis can provide a foundation for understanding the complex ecology of NTDs and contribute to the targeting of interventions.


Introduction
People living in rural areas in sub-Saharan Africa are often at high risk of infection with a range of pathogens [1][2][3]. The burden of preventable infectious disease in many of these communities can perpetuate poverty [4], reduce well-being [5,6], and contribute to high rates of mortality [7]. An individual's risk of infection with any pathogen depends on a complex interplay of factors that relate to their exposure and susceptibility [8]. The individual-level characteristics that determine the likelihood of encountering a particular pathogen, and of infection following exposure, are often greatly influenced by the social, cultural, political, economic and/or environmental contextual conditions in which a person lives [9][10][11]. Since individuals living in the same geographic, administrative or institutional setting are generally exposed to the same contextual conditions (although not necessarily in the same way), adverse health outcomes commonly cluster within particular grouping levels. Hence, all else being equal, two people living in the same group will tend to be more similar in their health status than two people living in different groups [12]. Such clustering effects are often large for infectious diseases, and particularly so at the household-level for pathogens that are spread through poor sanitation, contaminated water, endophagic vectors, and unhygienic practices [13][14][15][16][17][18].
Clustering of infection within groups, and the contextual effects that drive it, such as marginalization, poverty and access to health services, is integral to the conceptualization of an infectious disease as 'neglected' [19]. However, it has been suggested that effects acting at the group-level are often forgotten in the epidemiological study of NTD infection risk [20,21], or indeed of infectious disease risk more broadly [22,23]. This apparent deficit in "contextual thinking" has occurred despite the widespread use of multi-level models, also called random effect or hierarchical models, in community-based studies of infectious disease risk in low income settings. One possible explanation for the absence of explicit contextual thinking is that the condition that necessitates the use of random effects in these multi-level models, namely the presence of within-group correlation in the outcome of interest, is almost never reported as an outcome of substantive interest in studies of NTD risk. Intra-group correlation (and the need for group-level random effects) is evidence that the grouping level chosen, be it household or geographic or administrative area, has a role in shaping variation in risk, and therefore points to the importance of group-level effects on individual infection.
A number of authors have described the value (and, it could be argued, the need [24]) of considering and reporting measures of general contextual effect, such as within-group correlation or between-group heterogeneity, from multi-level studies of disease risk [12,24][25][26][27][28][29]. Such effects are described as "general" as they refer only to the influence of the cluster boundaries, rather than the specific contextual characteristics of the cluster [28]. Quantification of the extent and level at which infection risk varies between these clusters of individuals can contribute to the development of research questions that are explicitly contextual, and which therefore seek to better understand how the conditions in which people live impact upon their health [27,28]. Moreover, if health inequalities can be defined as differences in health status between groups of individuals [30], estimating general contextual effects (GCE), such as the median odds ratio or the intra-cluster correlation coefficient, can also provide a simple and standardized means with which to quantify and compare health inequalities within and between populations, and for different health outcomes [31]. Estimation of these group-level effects is straightforward to integrate into the multi-level analysis of community-based disease risk [32][33][34][35], and can provide fundamental information on the levels of variation that exist within a population.
Here, we describe a general contextual analysis that seeks to quantify the role of group-level effects in shaping variation in endemic NTD risk at a range of levels of aggregation in a rural farming community in Kenya. Since the NTDs commonly co-occur with HIV/AIDS, tuberculosis (TB) and malaria [36], we compare the GCE observed for NTDs with those observed for infection with the pathogens causing these three diseases. In addition to describing the levels of variation in helminth, bacterial, protozoal and viral infection risk that exist within a single population, our aim is to use this analysis to demonstrate the value that can be added to the multi-level analysis of NTD risk through the quantification of GCE.

Methods
Data were collected as part of the 'People, Animals and their Zoonoses' (PAZ) study [37]. This was a large cross-sectional survey of all eligible and consenting members of 416 randomly selected households in a single, mixed farming community in western Kenya. In total, 2113 people of all ages meeting the inclusion criteria (≥ 5 years and without conditions that may have made blood sampling harmful) were included and sampled between September 2010 and July 2012. Samples from participants were tested for current infection with a range of pathogens. A questionnaire was conducted with all recruited participants in their preferred language (Kiswahili, Dholuo, Kiluhya or English).
The average reported household size was 7.6 people (range 1 to 30), from which our average sample size was 5.1 (range 1 to 21). Households were selected from within sublocations, the smallest administrative unit in Kenya. We sampled between 1 and 8 households in all 141 sublocations in the study area. The PAZ study focused on zoonotic disease risk, and the number of households selected per sublocation was proportional to the cattle population (see [37] for further details). Sublocations were nested within constituencies, the level at which government funding for development, and particularly for poverty alleviation, is allocated in Kenya. There were a total of 13 constituencies in the study area. On the basis of the 2009 census (OpenData, http://www.opendata.go.ke), sublocations in the study area had a median total population of 4,809 (range 1,187-33,352) and a median area of 10.8 km² (range 0.96-64.6). Constituencies were made up of between 7 and 22 sublocations. The geographic distribution of sampled households, sublocations and constituencies is shown in Fig 1.
Individuals were classified as infected with P. falciparum, the only agent of malaria identified in the study area, if parasites were observed by light microscopy on thick or thin blood smears stained with Giemsa. Infection with the soil-transmitted helminths (hookworm, A. lumbricoides, T. trichiura) and S. mansoni was defined as the presence of at least one egg in a single faecal sample examined following preparation using the Kato-Katz (KK) [38] and formol-ether concentration (FEC) techniques [39]. Infection with E. histolytica/dispar was defined as the presence of at least one cyst in a single faecal sample prepared using the FEC technique. M. tuberculosis infection was determined using a gamma-interferon assay (QuantiFERON-TB test, Cellestis) and HIV infection was diagnosed using a rapid strip test (SD Bioline HIV 1/2 3.0, Standard Diagnostics). Infection with Taenia species (causing taeniasis, or the presence of an adult tapeworm in the gastrointestinal tract) was defined on the basis of a non-species-specific copro-antigen ELISA [40], whilst cysticercosis due to T. solium (the presence of encysted larvae) was determined using a HP10-Ag ELISA on serum [41].

Ethical approval
Ethical approval for this study was granted by the Kenya Medical Research Institute (KEMRI) Ethical Review Board (SCC1701). All participants or their guardians provided written informed consent. Individuals found to be infected with helminths or protozoa (including P. falciparum) were offered treatment free of charge by study clinical officers. Referral to local health facilities was provided where necessary.

Model specification
The entire sample of 2113 people was used for the general contextual analysis. Missing-ness was present in all outcome measures and ranged from 0.05% (for P. falciparum) to 11.1% (for M. tuberculosis). Missing-ness was related to an absence of a particular sample type (blood or faeces), typically due to inadequate volumes collected or because of participant unwillingness to provide it.
Four-level logistic regression models were specified with infection as a binary outcome (infected/not infected) for each pathogen. Probability of infection was related to a set of predictors at the individual level and random effects at the household, sublocation and constituency levels. These models estimated the log odds of individual infection together with the variance at the intercept for the household ($\sigma^2_H$), sublocation ($\sigma^2_{SL}$) and constituency ($\sigma^2_C$) levels for an individual i living in household j in sublocation k in constituency l. The regression equation can be summarised as $\mathrm{logit}(\pi_{ijkl}) = \beta_0 + \beta X + H_{0jkl} + SL_{0kl} + C_{0l}$. Our primary motivation for this analysis was to quantify general (rather than specific) contextual effects operating at each of the three grouping levels. However, age, sex, education status and ethnicity were included as fixed effects, X, at the individual level in order to assess the impact of within-household composition on between-group variation. Models with and without fixed effects were estimated for each pathogen. A quadratic term was included for the continuous predictor age (recorded as 5-year intervals) based on the expectation of non-linear relationships with infection risk for several pathogens [37]. The continuous age variable was scaled to have a mean of zero and standard deviation of one. Models were estimated for each pathogen in WinBUGS 1.4.3 (http://www.mrc-bsu.cam.ac.uk/software/bugs/) using weakly informative normal priors for all fixed and random effects. The standard deviation for each of the group-level random effects was defined using a wide uniform hyper-prior (i.e. Uniform(1,100)). Model convergence was confirmed by visual assessment of MCMC chains. Inference was based on 3 chains that were allowed to run for at least 70,000 iterations after a burn-in of at least 30,000 with a thinning interval of at least 10.
We derived the median and the 2.5th and 97.5th percentiles from the posterior distribution of each parameter for point estimates and 95% credibility intervals, respectively. All data manipulation was performed in the R statistical environment (R version 3.1.1, http://cran.r-project.org/) with logistic regression models estimated via the R2WinBUGS package [42]. Estimation was performed within a Bayesian framework based on MCMC to reduce bias in the estimates for random effect parameters [43], and for ease of estimation of the associated uncertainty for GCE.
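The nested structure of the regression equation can be illustrated with a short simulation. The following is a minimal sketch only, written in Python rather than the WinBUGS code used for this study. It draws binary outcomes from a null (intercept-only) model with random intercepts for household, sublocation and constituency. The function name, the balanced design and all parameter values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def simulate_null_model(n_const=13, subloc_per_const=10, hh_per_subloc=3,
                        people_per_hh=5, beta0=-1.0,
                        sd_h=1.0, sd_sl=0.5, sd_c=0.8, seed=0):
    """Simulate infection from logit(pi) = beta0 + H + SL + C (no fixed effects).

    All defaults are illustrative assumptions, not estimates from the study.
    """
    rng = np.random.default_rng(seed)
    n_sl = n_const * subloc_per_const
    n_hh = n_sl * hh_per_subloc

    # Index maps encoding the nesting: person -> household -> sublocation -> constituency
    hh_of_person = np.repeat(np.arange(n_hh), people_per_hh)
    sl_of_hh = np.repeat(np.arange(n_sl), hh_per_subloc)
    c_of_sl = np.repeat(np.arange(n_const), subloc_per_const)

    # Independent normal random intercepts at each grouping level
    h = rng.normal(0.0, sd_h, n_hh)
    sl = rng.normal(0.0, sd_sl, n_sl)
    c = rng.normal(0.0, sd_c, n_const)

    sl_of_person = sl_of_hh[hh_of_person]
    c_of_person = c_of_sl[sl_of_person]

    # Linear predictor on the log-odds scale, then inverse logit
    eta = beta0 + h[hh_of_person] + sl[sl_of_person] + c[c_of_person]
    prob = 1.0 / (1.0 + np.exp(-eta))
    infected = rng.binomial(1, prob)
    return infected, prob, hh_of_person


infected, prob, hh_of_person = simulate_null_model()
```

Because every member of a household shares the same household, sublocation and constituency intercepts, simulated outcomes are correlated within groups, which is the feature that the general contextual measures are intended to summarise.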
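The posterior summaries described above amount to taking percentiles of the pooled MCMC draws. A minimal sketch in Python is given below, assuming the draws for a parameter are available as an array (for example, chains by iterations). The function name and the example draws are hypothetical, and the analysis itself was run in R and WinBUGS.

```python
import numpy as np


def summarise_posterior(draws):
    """Return the posterior median and 95% credibility interval.

    `draws` may have any shape (e.g. chains x iterations); all draws are pooled.
    """
    draws = np.asarray(draws, dtype=float)
    lower, median, upper = np.percentile(draws, [2.5, 50.0, 97.5])
    return {"median": float(median), "lower": float(lower), "upper": float(upper)}


# Illustrative draws only: three chains of 1,000 retained samples each
rng = np.random.default_rng(1)
example_draws = rng.gamma(shape=2.0, scale=0.5, size=(3, 1000))
summary = summarise_posterior(example_draws)
```

The same function can be applied to any quantity derived draw by draw from the variance parameters, which is how uncertainty in the general contextual measures can be carried through from the model.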

Quantifying general contextual effects
Variance partition coefficient. The variance partition co-efficient (VPC) was calculated from the outputs of each multi-level logistic regression model for each pathogen using the latent variable method [44,45]. This approach assumes that the propensity for individual infection is on a continuous scale and that only those people for whom a threshold is exceeded can be considered to acquire infection. Whilst it has been suggested that such a justification is difficult to make for truly discrete outcomes [44], such as infection, interacting thresholds relating to exposure (for example, infectious dose) and susceptibility (such as immunity) could be envisaged. The unobserved latent variable (or probability of infection) is assumed to follow a logistic distribution, with variance equal to $\pi^2/3$ (i.e. 3.29). Using this approach, the VPC at the household (H), sublocation (SL) and constituency (C) levels were [27]:

$$\mathrm{VPC}_H = \frac{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C}{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C + \pi^2/3}$$

$$\mathrm{VPC}_{SL} = \frac{\sigma^2_{SL} + \sigma^2_C}{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C + \pi^2/3}$$

$$\mathrm{VPC}_C = \frac{\sigma^2_C}{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C + \pi^2/3}$$

The VPC represents the correlation in the probability of infection between two individuals randomly selected from the same household ($\mathrm{VPC}_H$), sublocation ($\mathrm{VPC}_{SL}$) or constituency ($\mathrm{VPC}_C$). For the models described, the VPC can be considered to be equivalent to the intra-class correlation co-efficient (ICC).
In order to further evaluate the importance of higher contextual levels in structuring variation in individual infection, we also calculated the proportion of variance (PTV) at the sublocation and household level as a fraction of total variation:

$$\mathrm{PTV}_{SL} = \frac{\sigma^2_{SL}}{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C + \pi^2/3}$$

$$\mathrm{PTV}_H = \frac{\sigma^2_H}{\sigma^2_H + \sigma^2_{SL} + \sigma^2_C + \pi^2/3}$$

We do not directly estimate $\mathrm{PTV}_C$ since this is equivalent to $\mathrm{VPC}_C$.
Median odds ratio. The median odds ratio (MOR) provides a measure of heterogeneity in an individual-level outcome between groups. It represents the median value of the odds ratio comparing group-level residuals from randomly selected pairs of individuals living in a group at higher risk and those from a group at lower risk [27]. The MOR can be considered to represent the (median) difference in odds when moving between groups. It can be calculated as [26]:

$$\mathrm{MOR} = \exp\left(\sqrt{2\sigma^2} \times 0.6745\right) \approx \exp\left(0.95\sqrt{\sigma^2}\right)$$

where $\sigma^2$ is the variance at the grouping level of interest and 0.6745 is the 75th percentile of the standard normal distribution. Where there is little variation in individual risk between groups, the MOR will be close to one.
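The variance partitioning described in this section can be sketched in a few lines of Python. The sketch assumes the cumulative nested form, in which two people in the same household also share a sublocation and a constituency, so that the household-level VPC includes all three group-level variances. This form is consistent with the statement that the constituency-level PTV equals the constituency-level VPC. The function name and the input variances are hypothetical.

```python
import math

# Variance of the standard logistic distribution (latent variable method), ~3.29
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def partition_variance(var_h, var_sl, var_c):
    """Return VPC and PTV values from household, sublocation and constituency variances."""
    total = var_h + var_sl + var_c + LOGISTIC_VARIANCE
    return {
        # Correlation between two individuals sharing each grouping level
        "VPC_H": (var_h + var_sl + var_c) / total,
        "VPC_SL": (var_sl + var_c) / total,
        "VPC_C": var_c / total,
        # Share of total variance attributable to a single level
        "PTV_H": var_h / total,
        "PTV_SL": var_sl / total,
    }


# Illustrative variances only (not estimates from this study)
result = partition_variance(var_h=1.0, var_sl=0.5, var_c=0.5)
```

Under this form the household-level VPC is the sum of the household and sublocation PTVs and the constituency-level VPC, which gives a quick check on any implementation.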
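A minimal sketch of the MOR calculation is shown below, using the standard expression in which the group-level variance is scaled by the 75th percentile of the standard normal distribution. The function name is hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745)
Z75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(variance):
    """Median odds ratio for a random-intercept variance on the log-odds scale."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return math.exp(math.sqrt(2.0 * variance) * Z75)


# With no between-group variance the MOR is exactly one
mor_no_variation = median_odds_ratio(0.0)
# Illustrative variance of 1.0 on the log-odds scale
mor_unit_variance = median_odds_ratio(1.0)
```

Applying the function to each posterior draw of a variance parameter, and then taking percentiles of the results, would give a credibility interval for the MOR directly.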

Spatial clustering
Geographic effects not captured in the non-spatial multi-level logistic regression models were identified by testing the standardised sublocation-level residual log odds for evidence of spatial clustering in high or low values using the spatial scan statistic [46]. The default maximum cluster size of 50% of the sample was chosen using a circular spatial window. The sublocation was used as the highest contextual level for the exploration of spatial clustering due to the small number of groups at the constituency level (n = 13). We used a normal model in SaTScan version 9.4.4 (www.satscan.org). To account for differences in sample sizes, the number of individuals sampled in each sublocation was included as a model weight [47]. Sublocation residuals for spatial analysis were drawn from a three-level logistic regression model (with random effects for household and sublocation only) with and without adjustment for within-household compositional effects.

General characteristics
The variation in prevalence of each infectious agent across the range of variables included as fixed effects is shown in Table 1. Variation in prevalence of infection between self-reported members of the different ethnic groups was particularly apparent, and most notably so for A. lumbricoides, T. trichiura, Taenia spp. (causing taeniasis) and HIV. Heterogeneity in the prevalence of infection with each of these pathogens, and with S. mansoni and T. solium (causing cysticercosis), was also evident between constituencies.

Fixed effects
Co-efficients from the adjusted models (M2) for each pathogen are shown in Table 2 (STH and S. mansoni), Table 3 (E. histolytica/dispar and Taenia species) and Table 4 (HIV, P. falciparum and M. tuberculosis). Having an education beyond primary school tended to reduce odds of infection for the majority of pathogens under study, although this was only significant in the case of hookworm (Table 2). There were strong relationships between ethnicity and infection for several pathogens, including substantially reduced odds among people of Samia and Teso ethnicity for A. lumbricoides compared to the Luhya baseline. Odds of T. trichiura infection were reduced among people of Teso ethnicity and elevated among people of Luo ethnicity when compared to the Luhya baseline. The odds of HIV infection were also higher among individuals of Luo ethnicity than the Luhya baseline.

General contextual effects
The posterior distributions of household-, sublocation- and constituency-level variance, VPCs, MORs and PTVs are shown in Table 2 for the gastrointestinal nematodes and S. mansoni, in Table 3 for E. histolytica/dispar and Taenia species, and in Table 4 for HIV, P. falciparum and M. tuberculosis. Some degree of clustering at the household level was apparent for all pathogens. This was consistently highest for the helminth parasites (Fig 2), for which there was substantial heterogeneity in risk of infection between individuals in different households, as evidenced by MORs which exceeded 3.5 for each helminth infection in both the null and adjusted models (Table 2 and Table 3). To put these effects into context, we would expect that were an individual to permanently move from one household to another with higher risk anywhere in the study area, their odds of infection with the helminth parasites under study would increase (in median) by at least 3.5 times. This household clustering effect was particularly large for S. mansoni (Table 2) and T. solium cysticercosis (Table 3). The partitioning of group-level variation was generally largest at the household level, although the greatest proportion of individual variation was partitioned at the constituency level ($\mathrm{VPC}_C$) in null models for T. trichiura (Table 2) and HIV (Table 4), and at the sublocation level for T. solium cysticercosis ($\mathrm{PTV}_{SL}$) (Table 3). Using MORs, these higher-level contextual effects could be interpreted as an almost five- and three-fold change in the odds of infection for an individual that permanently moves to a higher risk constituency for T. trichiura and HIV, respectively. Similarly, the odds of T. solium cysticercosis for an individual permanently moving to a higher risk sublocation could be expected to increase (in median) by around eight times.
Control for individual-level fixed effects resulted in declines in within-constituency correlation ($\mathrm{VPC}_C$) and between-constituency heterogeneity ($\mathrm{MOR}_C$) for infection with several of the pathogens under study, most notably for A. lumbricoides and T. trichiura (Table 2) and HIV (Table 4).
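To relate the reported MORs back to the variance scale, the MOR expression can be inverted. The sketch below is illustrative arithmetic only. It converts a MOR of 3.5, the lower bound quoted above for household-level heterogeneity in helminth infection, into the random-intercept variance that would produce it. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745)
Z75 = NormalDist().inv_cdf(0.75)


def variance_from_mor(mor):
    """Invert MOR = exp(sqrt(2 * variance) * Z75) to recover the variance."""
    if mor < 1:
        raise ValueError("a median odds ratio cannot be below one")
    return (math.log(mor) / (math.sqrt(2.0) * Z75)) ** 2


# Variance on the log-odds scale implied by a MOR of 3.5
implied_variance = variance_from_mor(3.5)

# Share of total latent variance this would represent if it were the only
# group-level variance in the model (a simplifying assumption)
implied_share = implied_variance / (implied_variance + math.pi ** 2 / 3)
```

Under the simplifying assumption of a single grouping level, a MOR of 3.5 corresponds to a variance of roughly 1.7 on the log-odds scale, or about a third of the total latent variance.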

Spatial clustering
The spatial distribution of sublocations with evidence for clustering in high or low values of residual log odds of infection is shown in Fig 3. Large spatial clusters of both high and low values were observed from null models for T. trichiura, S. mansoni, A. lumbricoides, and Taenia spp. There was substantial overlap in clusters for all of these pathogens and a large cluster of sublocations with elevated risk of individual HIV infection. We found no evidence of spatial structuring in the sublocation-level residual log odds of infection with M. tuberculosis or T. solium and relatively small clusters for P. falciparum, hookworm and E. histolytica/dispar (Fig 3). The spatial extent of the clusters of both high and low sublocation residual log odds was reduced when controlling for individual-level fixed effects in the case of HIV. Adjustment for these fixed effects resulted in a loss of significance in spatial clusters of both high and low values from the model for A. lumbricoides, and of high values for T. trichiura. Only the spatial cluster of positive sublocation residual log odds remained significant in the case of S. mansoni (Fig 3).

Discussion
In this general contextual analysis, we demonstrate the value of summarizing variation in individual infectious disease risk at one or more biologically relevant grouping levels using the outputs from multi-level regression. Deriving statistics such as the MOR and VPC (or ICC) as part of an exploratory analysis of infectious disease risk is straightforward, and can contribute important information about the heterogeneity that underlies population-level averages, such as prevalence [26][27][28][33]. Using this approach, we show that variation in individual infection risk is partitioned at the household, sublocation and constituency levels for a range of NTDs in a rural population in Kenya. These findings point to the importance of social and/or environmental contextual conditions in shaping infection at each of these levels, which may provide actionable targets for public health interventions seeking to reduce both the prevalence of infection and the health inequalities observed. An important limitation that should be recognised when interpreting these findings, and particularly when making comparisons between pathogens, is the lack of precision in many of our estimates of GCE, particularly at higher contextual levels. Hence, whilst estimates of VPC and MOR at the constituency level were substantially different between, for example, hookworm and S. mansoni infection, the 95% credibility intervals overlap. This is a limitation of the sample available, both in terms of the number of individuals and the number of groups at the higher contextual levels. The magnitude of the MOR or VPC provides useful information on the importance of a particular level in structuring risk [28], and for the example of hookworm and S. mansoni, strongly suggests contextual drivers operating at the constituency level are more important for the latter than the former.
However, when interpreting differences between pathogens at these higher contextual levels, or between different contextual levels for the same pathogen, it should be noted that the statistical support for many of the differences we observed was often limited.
A general contextual analysis can provide a tool for exploring the levels at which pathogen transmission occurs within a population [16]. For example, we show that the majority of variation in individual hookworm infection was partitioned at the household level, with comparatively smaller amounts at sublocation and constituency levels. This suggests clustering at higher contextual levels is less important for this parasite in this population than for the other STHs. Individual infection with A. lumbricoides, for example, was partitioned at both the household and constituency levels, and therefore household clusters of infection can also be considered to cluster by constituency. Household clustering was less important for T. trichiura, but there was substantial variation in infection between constituencies, and to a lesser extent between sublocations within constituencies. Understanding these patterns of partitioning in infection risk may assist in the design of interventions that seek to reduce both the prevalence and health inequalities observed. For pathogens with limited evidence for higher level GCE, such as hookworm or E. histolytica/dispar, it is likely that households in all parts of the study area would need to be targeted. Interventions in high risk constituencies are likely to be more cost effective for T. trichiura, A. lumbricoides and S. mansoni, potentially including a focus in high risk sublocations for the latter two pathogens. The general contextual analysis approach described here could be particularly valuable in monitoring the effectiveness of an intervention, such as mass drug administration. For example, a decline in population-level prevalence but persistence of, or increase in, general contextual effects at particular grouping levels would point to ongoing or new health inequalities. Moreover, such a finding would suggest the presence of hotspots of transmission that may impact elimination [48].
Wider usage of general contextual analysis in the study of NTD risk could therefore contribute to the post-2020 NTD roadmap that sees a transition from monitoring programme coverage to measuring impact [49].
Clustering in T. solium cysticercosis and Taenia spp. taeniasis was observed at both the household and sublocation levels. This was particularly large at the sublocation level for T. solium cysticercosis, but not between constituencies. Hence, while spatially heterogeneous factors appear to influence cysticercosis risk, these effects are likely to operate at small spatial scales (i.e. at the sublocation level). Cases of human cysticercosis commonly cluster around human tapeworm carriers [50], and Okello et al. [51] reported hyper-endemic hotspots for T. solium infection in Lao PDR. The importance of non-spatially-structured sublocation effects in our own study area could therefore be hypothesised to reflect small-scale differences in pork consumption practices, or the existence of slaughterhouses in particular sublocations with inadequate meat inspection practices. Sublocation-level residuals for taeniasis showed substantial spatial structuring on the basis of the spatial scan statistic, and the lack of a similar finding for cysticercosis may point to a preponderance of the beef tapeworm, T. saginata (which does not cause human cysticercosis) over T. solium in the study area.
The nesting of variation in individual HIV infection at the constituency level supports the growing recognition that HIV epidemiology can be characterized as a number of diverse epidemics, often with substantial variation in prevalence even at small spatial scales [52,53]. In this part of western Kenya, individual risk of HIV infection was most concentrated in constituencies in the south-western part of the study area. Further work is needed to explore the important clustering observed, including the compositional effect of ethnicity; the Luo community who, as a group, have previously been described to be heavily burdened by HIV [54], reside primarily in the southern part of the study area [37]. Schistosoma haematobium, which we did not test for but which is known to be an important co-factor for HIV infection in sub-Saharan Africa [55], is also likely to be common in the swampy area around Lake Victoria [56], and may also contribute to the clustering observed. There were substantial overlaps in the spatial distribution of HIV infection risk and that for several NTDs, most notably S. mansoni, A. lumbricoides and T. trichiura. This supports earlier analysis of the same data that showed overlapping spatial clustering in household-level infection with these pathogens [37]. The observed co-distribution of these pathogens may point to the existence of shared environmental, cultural, behavioural or social conditions leading to poly-parasitism [19]. Alternatively, it may suggest immunological interactions between HIV and these helminth parasites that influence transmission dynamics, a hypothesis supported by a growing number of field and laboratory based studies [57].
Interestingly, between-group levels of variation were considerably lower for P. falciparum and M. tuberculosis than for any of the NTDs, with the exception of infection with E. histolytica/dispar. Previous studies on M. tuberculosis have suggested that the majority (>80%) of transmission events for the pathogen occur in the public (or community) rather than domestic domain [58][59][60]. The comparatively small levels of individual variation partitioned at the household level (particularly compared to the helminth pathogens under study) provide further support for these findings. Moreover, in the absence of higher level GCEs, we show there is little variation in community-level transmission between different parts of the study area for M. tuberculosis. Although we found evidence for a small cluster of sublocations with reduced risk of P. falciparum infection, the absence of higher-level contextual effects (at the sublocation and constituency levels) for this pathogen suggests geographic or administrative place of residence does not have a major influence on infection risk. This is supported by a recent study from neighbouring Eastern Uganda which, using highly sensitive molecular-based diagnostic tests, demonstrated that the vast majority of community residents, regardless of age, demography and geographic location, were infected with malaria parasites [61].
We have explored only a limited set of fixed effects at the individual level in this analysis, and no specific contextual effects (i.e. predictors operating at group-level). Having demonstrated the importance of these grouping-levels in structuring infectious disease risk, the next analytical step would be to integrate specific contextual effects, including household, sublocation and constituency-level indicators of social or environmental conditions that may explain the variation observed. The inclusion of individual-level predictors resulted in substantial decreases in the variation at higher contextual levels for pathogens such as A. lumbricoides, T. trichiura and HIV. There were large, overlapping spatial clusters for each of these pathogens, the size of which was reduced or made to be non-significant following the inclusion of individual level predictors. All of these pathogens had strong relationships with ethnicity, which is known to be highly spatially structured in the study area [37]. Disentangling the importance of individual level cultural and behavioural practices and local social and environmental conditions would therefore help to better understand the general contextual effects observed.

Conclusion
Quantification of general contextual effects provides a means to evaluate the importance of social and environmental conditions in structuring infectious disease risk within a population. Such an approach encourages the explicit consideration of group-level, contextual effects on individual health and can form the basis for subsequent analyses that seek to explain the variation observed. Using a general contextual analysis, we have demonstrated the existence of important place-based contextual effects for a range of pathogens in a rural farming community in Kenya and shown that these are particularly large for the NTDs and HIV. This study provides evidence for important variation in infectious disease risk in this underprivileged population that points to the existence of health inequalities at a range of grouping levels.