An exploration of the disease burden due to Cryptosporidium in consumed surface water for sub-Saharan Africa

The protozoan pathogen Cryptosporidium is an important cause of diarrhoeal disease, but in many contexts its burden remains uncertain. The Global Waterborne Pathogen model for Cryptosporidium (GloWPa-Crypto) predicts oocyst concentrations in surface water at 0.5° × 0.5° (longitude by latitude) resolution, allowing us to assess the burden specifically associated with the consumption of contaminated surface water at a large scale. In this study, data produced by the GloWPa-Crypto model were used in a quantitative microbial risk assessment (QMRA) for sub-Saharan Africa, one of the regions most severely affected by diarrhoeal disease. We first estimated the number of people consuming surface water in this region, distinguishing between direct consumption and consumption from a piped (treated) supply. The disease burden was expressed in disability-adjusted life years (DALYs). We estimate 4.3 × 10^7 (95% uncertainty interval [UI] 7.4 × 10^6–5.4 × 10^7) cases annually, representing 1.6 × 10^6 (95% UI 3.2 × 10^5–2.3 × 10^6) DALYs. Relative disease burden (DALYs per 100,000 persons) varies widely, from 1.3 (95% UI 0.1–5.7) for Senegal to 1.0 × 10^3 (95% UI 4.2 × 10^2–1.4 × 10^3) for Eswatini. The countries with the highest relative disease burden are located primarily in south and south-east sub-Saharan Africa and are characterised by a relatively high HIV/AIDS prevalence. Direct surface water consumption accounts for the vast majority of cases, but the results also point towards the importance of stable drinking water treatment performance. To our knowledge, this is the first study to use modelled data on pathogen concentrations in a large-scale QMRA. It demonstrates the potential value of such data in epidemiological research, particularly regarding disease aetiology.


Introduction
Cryptosporidium is increasingly recognised as a leading cause of diarrhoeal disease (Checkley et al., 2015; Shirley et al., 2012). Although it occurs worldwide, its largest burden falls on children in low-income regions and on people with a weakened immune system, particularly HIV/AIDS patients (Shirley et al., 2012). The highly contagious oocysts shed by an infected person or animal are transmitted through the faecal-oral route, by either direct or indirect contact (Casemore, 1990; Chappell et al., 1999). Contaminated water is an important indirect transmission route, as Cryptosporidium oocysts are well suited to waterborne transmission. Surface water in particular can pose a risk: it is an important source of drinking water, is sometimes consumed without treatment, and is prone to faecal contamination. The recently developed Global Waterborne Pathogen model for Cryptosporidium (GloWPa-Crypto) produces global estimates of Cryptosporidium oocyst concentrations in surface water at a 0.5° × 0.5° (longitude by latitude) resolution (Vermeulen et al., 2018). This makes it possible to assess the risk and disease burden due to Cryptosporidium in surface water used as drinking water at a larger scale than before. Quantitative microbial risk assessment (QMRA) is the primary tool for such assessments and has been widely applied to study the risk associated with microbial contamination of drinking water. The development and application of large-scale QMRA models using data such as those produced by the GloWPa model was identified as a priority for gaining a better understanding of the risks and disease burden associated with waterborne pathogens (Hofstra et al., 2019).
With this study we aimed to provide further insight into the still uncertain disease burden attributable to Cryptosporidium, and to demonstrate a practical application of modelled waterborne pathogen data. Sub-Saharan Africa was identified as a region for which such data could be particularly useful, given its low overall rates of access to safely managed drinking water and the poor availability of water quality data (United Nations Environment Programme, 2016). Additionally, diarrhoeal disease, including cryptosporidiosis, remains a significant health issue within this region (Squire and Ryan, 2017; Troeger et al., 2018); the most recent Global Burden of Disease study (GBD) attributed 7.9 million disability-adjusted life years (DALYs) in sub-Saharan Africa to Cryptosporidium, 16.7% of the total diarrhoeal disease burden. By conducting a QMRA with GloWPa-Crypto output as an input, we produced a first exploration of the risk and disease burden (in DALYs) attributable to Cryptosporidium in consumed surface water for sub-Saharan Africa. We examined two transmission pathways: direct consumption, and consumption of treated surface water from a piped supply. A sensitivity analysis indicated which QMRA parameters most strongly influenced the final outcome.

Methodology
A QMRA generally consists of four steps: hazard identification, exposure assessment, dose-response assessment and risk characterisation (Medema et al., 2009). The hazard was identified as the presence of infectious Cryptosporidium oocysts in surface water used as drinking water; the other steps are described in this section. Some QMRA parameters were represented by a statistical distribution rather than a fixed value, and Monte Carlo simulation was used to sample from these distributions. Following the surface water concentration data produced by the GloWPa-Crypto model, the study area was divided into 0.5° × 0.5° (longitude by latitude) grid cells. For each grid cell and each month, the Monte Carlo simulation produced 10,000 risk values. The QMRA was programmed in R using RStudio (RStudio Team, 2016). Forty-three countries were included in the assessment; owing to limited data availability, Comoros, Djibouti, Mauritius, Sao Tome and Principe, and the Seychelles were excluded. The main results are presented at the country level as the median with a 95% uncertainty interval (95% UI). Table 1 provides an overview of all QMRA inputs.
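The reporting step described above (median and 95% UI from 10,000 Monte Carlo draws) can be sketched as follows. This is a Python/NumPy illustration rather than the authors' R code; it assumes the 95% UI is taken as the 2.5th and 97.5th percentiles of the draws, and the example draws are synthetic. How grid-level draws were combined into country totals is not described here, so the sketch covers only the final summary.

```python
import numpy as np


def summarise_draws(draws):
    """Median and 95% uncertainty interval (assumed: 2.5th and 97.5th percentiles)."""
    draws = np.asarray(draws, dtype=float)
    median = float(np.median(draws))
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return median, float(lower), float(upper)


# Hypothetical example: 10,000 synthetic Monte Carlo draws of annual cases for one country
rng = np.random.default_rng(seed=1)
draws = rng.lognormal(mean=np.log(1_000), sigma=0.5, size=10_000)
median, lo, hi = summarise_draws(draws)
print(f"median {median:.0f} (95% UI {lo:.0f}-{hi:.0f})")
```

NumPy's default linear interpolation between order statistics is used for the percentiles; other percentile definitions give slightly different bounds.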

Surface water usage in sub-Saharan Africa
The Joint Monitoring Programme (JMP) produces estimates of direct surface water consumption and access to a piped supply using nationally representative survey data. These estimates were retrieved for 2016 from the JMP website (http://washdata.org/data). We then estimated the share of the population with access to a piped supply whose piped water was sourced from surface water. Data from Döll et al. (2012) were the primary source for this assessment. These data include the average groundwater share of all water abstracted for domestic purposes (assuming only surface- and groundwater sources) between 1998 and 2002 for 0.5° × 0.5° (longitude by latitude) grid cells. We assumed that this included all water for consumption and that these shares have remained constant over time. The average groundwater share was determined per country and subtracted from 100% to obtain the surface water share. For some countries these figures were complemented or replaced by more recent figures: AQUASTAT provided additional information for two countries, and the International Water Association (IWA) Water Statistics and Economics website provided data on drinking water abstraction sources for six countries (IWA, 2014). The Nature Conservancy (TNC) assessed the drinking water sources of thirty cities in sub-Saharan Africa in its Urban Water Blueprint study (The Nature Conservancy, 2016). These data covered five major cities in both South Africa and Nigeria and were used to determine the urban surface water share for these two countries. We assumed that people consumed either surface water or groundwater exclusively, and received it from either a direct or a piped source, with no variability. All surface water was assumed to be sourced from within the grid cell in which it was consumed, and surface water shares were assumed to be equal for every urban or every rural grid cell within a country.
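A minimal sketch of how the shares above could combine for one population group (for example the urban population of a country). The function name and arguments are hypothetical; it assumes the JMP shares are fractions between 0 and 1 and that, as stated in the text, the piped surface water share is 100% minus the groundwater share.

```python
def surface_water_consumers(population, direct_share, piped_share, groundwater_pct):
    """Hypothetical helper: split one population group into people drinking surface
    water directly and people drinking piped water sourced from surface water.

    direct_share, piped_share: JMP fractions (0-1) for direct surface water use and piped supply
    groundwater_pct: groundwater share (%) of domestic abstraction, e.g. a country average
    """
    surface_water_share = (100.0 - groundwater_pct) / 100.0  # surface water share of piped supply
    direct = population * direct_share
    piped = population * piped_share * surface_water_share
    return direct, piped


# Hypothetical urban population of 1 million: 20% direct use, 50% piped, 40% groundwater
direct, piped = surface_water_consumers(1_000_000, 0.20, 0.50, 40.0)
print(direct, piped)
```

Running it separately for the urban and rural populations mirrors the assumption that shares are equal across all urban or all rural grid cells of a country.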
No surface water consumption was assumed to take place in The Gambia as a result of saltwater infiltration (FAO, 2005). Fig. 1a provides an overview of the estimated shares of the population consuming surface water, with additional detail provided in Supplement A.

Source water concentrations
Source water oocyst concentrations were modelled for 0.5° × 0.5° (longitude by latitude) grid cells for each month of the year using the most recent version of the GloWPa-Crypto model, which accounts for both human and animal emissions (Vermeulen et al., 2018). Oocyst concentration means were lognormally distributed with a standard deviation of 1. Only the species C. hominis and C. parvum were considered, as these most commonly cause infection in humans. The GloWPa-Crypto model assumes that these species represent 50% of the oocysts excreted by cattle and buffalo calves, 20% of those excreted by lambs and goat kids, and 5% of those excreted by adult cattle and buffaloes. All oocysts were assumed to be viable prior to treatment. The GloWPa-Crypto concentration data approximately represent the year 2010, as they were developed using sanitation, wastewater treatment, gridded population and livestock data from around 2010.
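The text does not specify the lognormal parameterisation, so the following is only a sketch under two stated assumptions: the GloWPa-Crypto value for a grid cell and month is treated as the arithmetic mean concentration (oocysts per litre), and the standard deviation of 1 is taken on the natural-log scale. The log-scale location is then shifted by σ²/2 so that the draws average to the modelled value.

```python
import numpy as np


def sample_concentrations(modelled_mean, sigma=1.0, n=10_000, rng=None):
    """Draw source water concentrations (assumed oocysts/L) from a lognormal distribution.

    Assumptions: `modelled_mean` is the arithmetic mean for one grid cell and month,
    and `sigma` is the standard deviation of the natural log of the concentration.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.log(modelled_mean) - sigma**2 / 2.0  # chosen so the arithmetic mean equals modelled_mean
    return rng.lognormal(mean=mu, sigma=sigma, size=n)


draws = sample_concentrations(0.8, n=200_000, rng=np.random.default_rng(42))
print(draws.mean())
```

If the modelled value were instead interpreted as the median, `mu` would simply be `np.log(modelled_mean)`.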

Drinking water treatment
We assumed that all water from a piped supply had received coagulation, sedimentation and filtration (basic secondary treatment), with no significant removal during primary treatment (Cummins et al., 2010), followed by chlorine disinfection. The overall treatment removal was calculated using Eq. (1):

$$R_t = R_s + R_f + R_c \qquad (1)$$

with R_t the overall log removal, R_s the log removal due to sedimentation, R_f the log removal due to coagulation and filtration, and R_c the log removal due to chlorination (assumed to be 0; Medema et al., 2009). R_s was sampled from a triangular distribution with min. = 0.5, mode = 1 and max. = 2 log credits (Cummins et al., 2010). R_f was sampled from a uniform distribution with min. = 2 and max. = 3 log credits (Brown and Clasen, 2012).
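Sampling Eq. (1) with the distributions given above can be sketched in NumPy as follows (an illustration, not the authors' R implementation). Because log removals add, the fraction of oocysts passing treatment is 10^(-R_t).

```python
import numpy as np


def sample_treatment_log_removal(n, rng):
    """Sample the overall log removal R_t = R_s + R_f + R_c (Eq. 1)."""
    r_s = rng.triangular(left=0.5, mode=1.0, right=2.0, size=n)  # sedimentation
    r_f = rng.uniform(low=2.0, high=3.0, size=n)                 # coagulation and filtration
    r_c = 0.0                                                    # chlorination: no credit assumed
    return r_s + r_f + r_c


rng = np.random.default_rng(3)
r_t = sample_treatment_log_removal(10_000, rng)
passing_fraction = 10.0 ** (-r_t)  # share of oocysts surviving treatment
print(r_t.min(), r_t.max(), np.median(passing_fraction))
```

By construction R_t lies between 2.5 and 5 log units, with an expected value of about 3.67.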
The practice of boiling water at home before consumption was included in both pathways, with an assumed log removal of 6 (Medema et al., 2009). The prevalence of this practice was determined for a number of countries by Rosa and Clasen (2010), who found an overall average for Africa of 4.5% (7% for urban and 3.6% for rural populations). These averages were applied to countries for which no specific data were available.

Daily oocyst intake
Daily oocyst intake was calculated using Eq. (2):

$$N_d = C_{sw} \times 10^{-(R_t + R_b)} \times V \qquad (2)$$

with N_d the daily oocyst intake, C_sw the oocyst concentration in source water, R_t the log removal of treatment, R_b the log removal of boiling, and V the volume of water ingested. A review of previous QMRAs conducted in sub-Saharan Africa (Van Abel and Taylor, 2018) indicates that applied daily water intake values have ranged between 100 ml/day and 2.9 l/day, with only one country-specific value (100 ml/day for South Africa, based on unpublished data). Water consumption is likely to vary considerably between individuals. In the absence of country-specific data, we followed the recommendation of Mons et al. (2007) and sampled water intake from a Poisson distribution with λ = 3.49 glasses (250 ml) of water per day (Robertson et al., 2000). Incidental ingestion (e.g. during recreation or tooth brushing) was not included.
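Eq. (2) with a Poisson-distributed number of 250 ml glasses can be sketched as below. The function name is hypothetical; it assumes C_sw is in oocysts per litre and that R_t = 0 for direct consumption. A Poisson draw can be zero, which corresponds to a day without intake from this source.

```python
import numpy as np

GLASS_VOLUME_L = 0.25  # one glass = 250 ml


def sample_daily_intake(c_sw, r_t, boiled, n, rng):
    """Daily oocyst intake N_d = C_sw * 10**-(R_t + R_b) * V (Eq. 2).

    c_sw: source water concentration (assumed oocysts/L), scalar or array of length n
    r_t: treatment log removal (0 for direct consumption), scalar or array of length n
    boiled: if True, apply the 6-log boiling removal (R_b = 6), otherwise R_b = 0
    """
    r_b = 6.0 if boiled else 0.0
    volume_l = GLASS_VOLUME_L * rng.poisson(lam=3.49, size=n)  # litres per day
    return c_sw * 10.0 ** (-(r_t + r_b)) * volume_l


rng = np.random.default_rng(5)
intake = sample_daily_intake(c_sw=1.0, r_t=0.0, boiled=False, n=10_000, rng=rng)
print(intake.mean())  # roughly 3.49 * 0.25 L times 1 oocyst/L
```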

Dose-response assessment
An exponential dose-response relationship was used to describe the probability of infection given a certain number of ingested oocysts (Eq. (3)):

$$P_{d,i} = 1 - \exp(-r \, N_d) \qquad (3)$$

with P_d,i the daily probability of infection and r the dose-response parameter. For the immunocompetent population, r was sampled from a triangular distribution with min. = 0.00024, mode = 0.00419 and max. = 0.0573 (Daniels et al., 2018). For the immunocompromised population, r = 0.354 was applied (Pouillot et al., 2004).
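Eq. (3) translates directly into code. This sketch uses `numpy.expm1` so that 1 - exp(-x) stays accurate for the very small doses typical of treated or boiled water; the r values are those given above.

```python
import numpy as np


def p_infection_exponential(dose, r):
    """Exponential dose-response: P_d,i = 1 - exp(-r * N_d) (Eq. 3)."""
    return -np.expm1(-r * np.asarray(dose, dtype=float))


rng = np.random.default_rng(7)
r_competent = rng.triangular(0.00024, 0.00419, 0.0573, size=10_000)  # immunocompetent
r_compromised = 0.354                                                # immunocompromised

dose = 1.0  # one ingested oocyst
print(np.median(p_infection_exponential(dose, r_competent)),
      p_infection_exponential(dose, r_compromised))
```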

Risk characterisation
For each grid cell, the daily probability of infection was used to calculate the monthly and annual probabilities of infection (Eqs. (4) and (5)), which were subsequently converted into an annual probability of illness (Eq. (6)):

$$P_{m,i} = 1 - (1 - P_{d,i})^{d_m} \qquad (4)$$

$$P_{y,i} = 1 - \prod_{m=1}^{12} (1 - P_{m,i}) \qquad (5)$$

$$P_{y,ill} = P_{y,i} \times P_{ill,inf} \qquad (6)$$

with P_m,i and P_y,i the monthly and annual probabilities of infection, respectively, d_m the number of days in month m, P_y,ill the annual probability of illness and P_ill,inf the probability of illness given infection. P_ill,inf was set at 0.7 for the immunocompetent population and 1 for the immunocompromised population (Havelaar and Melse, 2003; Xiao et al., 2012). Eight risk values were calculated, four each for the immunocompetent and immunocompromised populations, representing the different pathways (consumption from a piped supply with or without boiling, and direct consumption with or without boiling).
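The chain from daily infection risk to annual illness risk can be sketched as follows. It assumes a 365-day year (leap years ignored) and that each month's daily probability applies to every day of that month; it works on a single set of 12 monthly values or on a (12, n_draws) array of Monte Carlo draws.

```python
import numpy as np

DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])  # assumed non-leap year


def annual_p_illness(p_daily_by_month, p_ill_given_inf):
    """Daily -> monthly -> annual probability of infection, then illness.

    p_daily_by_month: shape (12,) or (12, n_draws), daily infection probability per month
    p_ill_given_inf: 0.7 (immunocompetent) or 1.0 (immunocompromised)
    """
    p_daily = np.asarray(p_daily_by_month, dtype=float)
    days = DAYS_IN_MONTH.reshape((12,) + (1,) * (p_daily.ndim - 1))
    p_month = 1.0 - (1.0 - p_daily) ** days          # monthly infection probability
    p_year = 1.0 - np.prod(1.0 - p_month, axis=0)    # annual infection probability
    return p_year * p_ill_given_inf                  # annual illness probability


print(annual_p_illness(np.full(12, 1e-3), 0.7))
```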

Disease burden
The disease burden was expressed in DALYs, which combine the years of life lost (YLL) and years lived with disability (YLD) associated with a health condition. For cryptosporidiosis, the standard DALY value is 1.47 (YLL = 0.13; YLD = 1.34) per 1000 cases (Havelaar and Melse, 2003). It has been suggested that this value is not always appropriate: for countries with a large immunocompromised population in particular, a higher value better reflects the actual burden because of the more severe disease progression in this group. Given the relatively high HIV/AIDS prevalence in sub-Saharan Africa, we adopted the approach of previous QMRAs (Howard et al., 2006; Labite et al., 2010) and calculated a new DALY value for each country, combining the same YLD value with a new YLL value. We assumed a mortality rate of 10% for HIV/AIDS patients not receiving (or without access to) antiretroviral therapy (ART). As most HIV/AIDS prevalence data were only available for two age groups, 0-14 years and 15-49 years, all cases were assumed to occur within these age groups.
With a life expectancy of 1.5 years for AIDS patients, this gave a median age at death of 8.5 years and 33.5 years, respectively (Labite et al., 2010). Using life expectancy at birth (obtained from the World Bank DataBank), an average weighted YLL per death was determined for each country and subsequently used to determine a base DALY value per 1000 cases (Fig. 1b). Supplement B provides an overview of the values used. The number of DALYs per grid cell was calculated by multiplying the base DALY value by the number of cases (divided by 1000) in that grid cell. The values for all grid cells within a country were summed to obtain the total number of DALYs per country.
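A hypothetical sketch of the country-specific DALY calculation described above. The text does not give the exact weighting, so several things here are assumptions: the age-group weights, the share of cases occurring in HIV/AIDS patients without ART, the absence of discounting and age weighting, and the choice that the new YLL term replaces the standard 0.13 while the YLD of 1.34 is kept.

```python
# Values from the text: YLD per 1000 cases, 10% mortality without ART, median ages at death.
STANDARD_YLD_PER_1000 = 1.34
AGE_AT_DEATH = {"0-14": 8.5, "15-49": 33.5}


def deaths_per_1000_cases(share_untreated_hiv, mortality=0.10):
    """Hypothetical: deaths per 1000 cases if a share of cases occurs in untreated HIV/AIDS patients."""
    return 1000.0 * share_untreated_hiv * mortality


def base_daly_per_1000(life_expectancy, age_weights, deaths_per_1000):
    """Country DALYs per 1000 cases: standard YLD plus a new YLL term (assumed undiscounted).

    age_weights: hypothetical weights per age group (summing to 1) for the average YLL per death
    """
    yll_per_death = sum(
        weight * max(life_expectancy - AGE_AT_DEATH[group], 0.0)
        for group, weight in age_weights.items()
    )
    return STANDARD_YLD_PER_1000 + deaths_per_1000 * yll_per_death


def country_dalys(cases_per_grid, daly_per_1000):
    """Sum DALYs over the grid cells of one country: base value x cases / 1000."""
    return sum(daly_per_1000 * cases / 1000.0 for cases in cases_per_grid)


# Hypothetical country: life expectancy 60 years, equal age weights, 5% of cases in untreated HIV/AIDS
daly_1000 = base_daly_per_1000(60.0, {"0-14": 0.5, "15-49": 0.5}, deaths_per_1000_cases(0.05))
print(daly_1000, country_dalys([1000, 2500], daly_1000))
```

Even a small share of untreated HIV/AIDS cases dominates the YLD term, which is consistent with the large difference reported between this approach and the standard value of 1.47.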

Sensitivity analysis
The sensitivity of the risk calculations to their inputs was explored through a nominal range sensitivity analysis (NRSA), in which input parameter values were increased or decreased relative to the base model, one input at a time (Table 2). We opted for an NRSA because it provides quantitative insight into the individual impact of each model parameter on the outcome. Parameters were varied over a plausible range based on the sensitivity of the GloWPa-Crypto model (for concentrations; Vermeulen et al., 2018), the variation in water intake values applied in previous QMRAs (Van Abel and Taylor, 2018), and possible variation in treatment performance (Cummins et al., 2010). The latter was limited to a 0.5 log change, based on the lower bound of the distribution describing the performance of the sedimentation step (triangular [0.5, 1, 2]). Although the treatment parameters were varied individually, their influence on the outcome was identical, so they are reported as a single value (see Table 2 for the individual parameter values). The effect of accounting for HIV/AIDS prevalence in the disease burden calculation was also assessed by using the standard DALY value of 1.47 instead. As we were primarily interested in parameters associated with the exposure pathway, the dose-response parameter (r) and the probability of illness given infection (P_ill,inf) were not included in the sensitivity analysis; moreover, there was too little information to vary these inputs realistically beyond the variability already included in the model.
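The one-at-a-time logic of an NRSA can be sketched generically. The `toy_burden` model below is a hypothetical stand-in (a single person's annual illness probability from the exponential dose-response), not the study's model; it only illustrates the design, and how a saturating dose-response can make equal upward and downward changes produce unequal effects.

```python
import numpy as np


def toy_burden(log_conc=0.0, log_removal=0.0, intake_factor=1.0, r=0.00419, glasses=3.49):
    """Hypothetical toy model: annual illness probability for one immunocompetent person."""
    dose = 10.0 ** (log_conc - log_removal) * 0.25 * glasses * intake_factor  # oocysts/day
    p_daily = -np.expm1(-r * dose)
    return 0.7 * (1.0 - (1.0 - p_daily) ** 365)


def nrsa(model, base, variations):
    """Nominal range sensitivity: vary one input at a time, report % change from the base output."""
    base_out = model(**base)
    results = {}
    for name, values in variations.items():
        for value in values:
            params = dict(base, **{name: value})  # copy base, override a single input
            results[(name, value)] = 100.0 * (model(**params) - base_out) / base_out
    return results


base = {"log_conc": 1.0, "log_removal": 0.0, "intake_factor": 1.0}
res = nrsa(toy_burden, base, {"log_conc": [0.0, 2.0], "intake_factor": [0.8, 1.2]})
for key, pct in res.items():
    print(key, round(pct, 3))
```

With this base input the toy risk is already near saturation, so a 1 log increase barely changes it while a 1 log decrease lowers it substantially.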

QMRA
We estimated 4.3 × 10^7 (95% UI 7.4 × 10^6–5.4 × 10^7) cases of illness annually across all included countries (Table 3). A breakdown by group (Table 4) reveals that the vast majority of cases result from direct surface water consumption and consequently occur in the rural population. The wide uncertainty interval around the number of cases in the piped population points towards the significance of stable treatment performance. The four countries with the highest number of cases (Nigeria, Kenya, Tanzania and Ethiopia) together account for over 50% of all cases.

Sensitivity analysis
The results of the sensitivity analysis are displayed in Fig. 3. Increasing or decreasing the oocyst concentrations by 1 log produced the largest changes in disease burden, approximately 15.1% higher and 17.5% lower, respectively. As expected, improving treatment performance had a noticeably smaller effect (4.8% lower burden at a 0.5 log increase) than degrading it (13.4% higher burden at a 0.5 log decrease), which again highlights the importance of stable treatment performance. Changing water intake by 20% had a limited effect on the final disease burden (2.9% higher or 3.5% lower).
Using the standard DALY value of 1.47 in the disease burden calculations yields an overall disease burden of 6.3 × 10^4 DALYs (95% UI 1.1 × 10^4–7.9 × 10^4), with relative disease burdens (DALYs per 100,000 persons) ranging from 0.2 (95% UI 0.02-0.7) for Cameroon to 24.1 (95% UI 5.3-25.8) for Kenya. Comparing these values with the main findings highlights the potentially substantial contribution of the immunocompromised population to the burden of Cryptosporidium. However, it also underlines the need to further quantify the relationship between immunocompromising conditions such as HIV/AIDS and infection with Cryptosporidium.

Discussion
This study demonstrates the use of modelled concentration data to explore the disease burden associated with Cryptosporidium in consumed surface water for sub-Saharan Africa. While Cryptosporidium is known to be an important cause of diarrhoeal disease, the contribution of this specific exposure pathway had not previously been quantified at a comparable scale. Our findings highlight the significance of direct surface water consumption and immune status as risk factors for Cryptosporidium infection and disease burden.
Comparing our simulations with observational data or other model outputs is difficult, because such data are unavailable. In addition, the extent to which unsafe drinking water contributes to the risk of Cryptosporidium infection remains very uncertain. When various risk […] transmission could be responsible for 35-37% of cryptosporidiosis cases in sub-Saharan Africa, with similar figures for other world regions. As this percentage includes all waterborne transmission pathways, we expect the share attributable to surface water to be lower. To evaluate our results, we estimated this share by dividing our estimated number of cases in children under 5 by the GBD number of cases in this age category for each included country (Fig. 4a). Country-specific GBD data for this calculation were obtained through the Global Health Data Exchange GBD Query Tool (http://ghdx.healthdata.org/gbd-resultstool). We find a total of 6.8 × 10^6 (95% UI 9.3 × 10^5–8.0 × 10^6) annual cases in children under 5, which represents, on average, 20.2% of the under-5 annual incidence estimated by GBD. Using the WHO expert elicitation estimate, this would equal, on average, ∼59% of all waterborne cases. No relevant source attribution studies exist for comparison. However, given the prevalence of direct surface water consumption reported by JMP (which causes the vast majority of cases) and the moderating, unaccounted-for effects of factors such as immunity, we consider this a reasonable, if somewhat high, percentage. The majority of the cases in our study occur in the population drinking surface water directly; only a fraction occur in the population drinking piped surface water (6.9 × 10^4 vs 4.3 × 10^7, see Table 3).

Table 2 Inputs of sensitivity analysis of QMRA and disease burden calculations.

| Group | Number of cases (95% UI) | Total population in group | Cases per person per year in group (95% UI) |
|---|---|---|---|
| Urban | 3.7 × 10^6 (1.1 × 10^6–9.1 × 10^6) | 124,805,031 | 0.03 (0.009-0.07) |
| Rural | 3.9 × 10^7 (6.3 × 10^6–4.5 × 10^7) | 132,434,283 | 0.3 (0.05-0.3) |
| Piped | 6.9 × 10^4 (1.7 × 10^2–2.9 × 10^6) | 177,511,247 | 3.9 × 10^-4 (9.7 × 10^-7–0.02) |
| Direct | 4.3 × 10^7 (7.4 × 10^6–5.1 × 10^7) | 79,728,068 | 0.5 (0.09-0.6) |
| Immunocompetent | 4.1 × 10^7 (5.4 × 10^6–4.9 × 10^7) | 247,642,999 | 0.2 (0.02-0.2) |
| Immunocompromised | 2.2 × 10^6 (2.0 × 10^6–5.0 × 10^6) | 9,596,316 | 0.[…] |
Butler et al. (2016) found that in Canada approximately 20.4% of waterborne cryptosporidiosis cases could be due to consumption of treated surface water (from large [8.4%] and small [12%] municipal systems). This percentage is plausible in a high-income setting, given the absence of other major (water-associated) risk factors besides recreation, but it is consequently very high compared with our estimates. For sub-Saharan Africa, our results seem more reasonable.
For the disease burden, we can compare our results with the all-age GBD estimates by dividing our disease burden by the GBD disease burden. The resulting shares range between 0.20% and > 100%, with an average of 19.4% (Fig. 4b). For several countries in sub-Saharan Africa, our disease burden due to surface water consumption alone is higher than the GBD disease burden, which includes all exposure pathways. Although the GBD estimates are not conclusive either, especially for many of the countries included in this study, we believe that where our estimates exceed the GBD estimates (> 100% in Fig. 4b), our estimates are too high. This could be because the majority of cases and burden occur in children under 5, which likely causes an overestimation of our all-age burden (and incidence). The correction of the standard DALY value for HIV/AIDS prevalence, as applied in previous studies (Howard et al., 2006; Labite et al., 2010), may play an additional role: the sensitivity analysis (Section 3.2) indicated that our overall disease burden estimate was 25 times higher than the estimate obtained with the standard DALY value for cryptosporidiosis, and for individual countries the differences can be larger. More research is required on the DALY values appropriate for low-income settings and for the immunocompromised population.
The number of cases and the disease burden are uncertain because of uncertainty in, and limited availability of, input data. We used statistical distributions instead of fixed values to capture some of this uncertainty. Sub-Saharan Africa is, generally, a data-poor region with few relevant country- or region-specific inputs. We partially mitigated this issue by using simulated Cryptosporidium concentrations on a 0.5° × 0.5° grid, together with country-specific surface water consumption and DALY values. Nevertheless, our methodology does not fully capture all differences between and within countries. Therefore, while the overall share discussed above appears plausible, we cannot vouch for the plausibility of the shares found at the national level.
Our sensitivity analysis also highlights the importance of other input variables. The concentrations used contribute to the uncertainty: validation of the GloWPa-Crypto model has shown that, for various reasons, it typically predicts concentrations higher than those observed, by around 1.5-2 log units (Vermeulen et al., 2018). A 1 log reduction in concentration resulted in a 17.5% reduction in the disease burden. Conversely, some factors would raise the estimates if they were included, such as intermittent treatment performance or failure, or (further) contamination of the water supply after treatment. Additionally, recent findings point towards potentially significant long-term health consequences for young children following cryptosporidiosis, which were not considered here. Community- or individual-level variability in drinking water source, access to and quality of care, health status, or other environmental exposures could shift the outcome in either direction. It does not currently appear possible to include all of these components; however, for future assessments of this nature we recommend establishing, at least, a more comprehensive exposure pathway.
To our knowledge, this is the first time modelled data on waterborne pathogen concentrations have been used in a large-scale, comprehensive QMRA. We have demonstrated a practical application of such data in a global health context. In the future, model-based assessments could become an important complement to 'traditional' disease burden estimation, especially when attributing cases to a source. Here, surface water consumption is singled out as a source, which has not previously been done for Cryptosporidium. Spatially explicit data, such as those used in this study, make it possible to map the distribution of disease (burden). Additionally, models such as GloWPa-Crypto allow us to explore the effect of future change on pathogen occurrence and, subsequently, on the associated health risks. Future research could explore, for example, the effects of climate change or population growth, which have been identified as two important drivers of increased pathogen presence in water in sub-Saharan Africa (Squire and Ryan, 2017). We also see opportunities to use more advanced methods to describe exposure and infection, such as susceptible-infected-recovered (SIR) modelling, as used by Daniels et al. (2018), or agent-based modelling, which allows better incorporation of individual behaviour and interaction (Hofstra et al., 2019).

Conclusion
We present a first exploration of the disease burden of Cryptosporidium specifically attributable to surface water consumption for sub-Saharan Africa. This transmission pathway could be responsible for an estimated 43.1 million cases annually which represent 1.6 million DALYs, with large variation between countries. The majority of cryptosporidiosis cases in our study population occur as a result of direct surface water consumption. The relative burden distribution and sensitivity analysis highlight the potentially substantial contribution of the immunocompromised population to the overall burden. Further refinement of the disease burden estimate is possible, for example by including a more comprehensive exposure pathway. This study demonstrates the potential of modelled data on pathogen concentrations in the context of disease burden estimation, particularly regarding disease aetiology.