Modelling the age-prevalence relationship in schistosomiasis: A secondary data analysis of school-aged-children in Mangochi District, Lake Malawi

Schistosomiasis is an aquatic snail-borne parasitic disease, with intestinal schistosomiasis (IS) and urogenital schistosomiasis (UGS) caused by Schistosoma mansoni and S. haematobium infections, respectively. School-aged-children (SAC) are a known vulnerable group and can also suffer from co-infections. Along the shoreline of Lake Malawi, a newly emerging outbreak of IS is occurring with increasing UGS co-infection rates. Age-prevalence (co)infection profiles are not fully understood. To shed light on these (co)infection trends by Schistosoma species and by age of child, we conducted a secondary data analysis of primary epidemiological data collected from SAC in Mangochi District, Lake Malawi, as published previously. Available diagnostic data by child were converted into binary response infection profiles for 520 children, aged 6–15, across 12 sampled schools. Generalised additive models were then fitted to mono- and dual-infections. These were used to identify consistent population trends, finding that the prevalence of IS increased significantly [p = 8.45e-4] up to 11 years of age and decreased thereafter. A similar age-prevalence association was observed for co-infection [p = 7.81e-3]. By contrast, no clear age-infection pattern for UGS was found [p = 0.114]. Peak prevalence of Schistosoma infection typically occurs around adolescence; however, in this newly established IS outbreak with rising prevalence of UGS co-infections, the peak appears to occur earlier, around the age of 11 years. As the outbreak of IS fulminates, further temporal analysis of the age-relationship with Schistosoma infection is justified. This should refer to age-prevalence models, which could better reveal newly emerging transmission trends and Schistosoma species dynamics. Dynamical modelling of infections, alongside malacological niche mapping, should be considered to guide future primary data collection and intervention programmes.


Introduction
School-aged-children (SAC) are known to be one of the most vulnerable groups for schistosomiasis, which can lead to severe morbidity, and in some cases mortality. Standard infection and transmission rates in SAC are 3-4 times higher than in adults (Colley et al., 2014). Children are thought to be first infected soon after birth upon freshwater contact(s) with prevalence increasing with cumulative parasite exposure(s) up to adolescence (WHO, 2022). Over time, ongoing inflammation within the tissues, from accumulating trapped eggs, can lead SAC to suffer from malnutrition, anaemia, and neurological and developmental delays (Mawa et al., 2021). Furthermore, acute and chronic infection with urogenital schistosomiasis (UGS) and/or intestinal schistosomiasis (IS) can lead to debilitating symptoms and signs such as stunting, but whether chronic co-infections are truly synergistic is equivocal (Gouvras et al., 2013).
To counter schistosomiasis, the WHO recommends preventive chemotherapy by mass drug administration (MDA) with the anthelmintic praziquantel. MDA treatment programmes can avert and reverse some of these disease manifestations as well as diminish transmission. However, praziquantel is only effective against adult worms, leaving immature (drug tolerant) worms to remain within the body (Colley et al., 2014). Since MDA does not guard against reinfection, SAC often reacquire infection upon subsequent water contact, with persistent "hotspots" occurring (Mawa et al., 2021; Stothard et al., 2013; Makaula et al., 2014; McManus et al., 2018). As a consequence of ongoing persistent schistosomiasis infection, SAC are often absent from school and have delayed learning, affecting their ability to work as they enter adulthood (WHO, 2022). This further hinders the socio-economic advancement of a geographical area, and low socio-economic status is itself a known risk factor for schistosomiasis (Mawa et al., 2021).
A decrease in prevalence of infection is known to occur after young adolescence, which is typical of community age-prevalence relationships (WHO, 2022; Woolhouse, 1998). This is thought to be due to the development of partial immunity over time given repeated exposure, as well as decreased contact with water or more enigmatic changes in skin texture, for example (Colley et al., 2014; Dawaki et al., 2016; Oso and Odaibo, 2020). Prevalence rates among SAC and the wider population vary considerably between geographical areas, often with localised rates among each community (Colley et al., 2014). There are many factors that influence the transmission rates in a specific area, such as demographic and environmental factors, MDA, and snail-schistosome ecology (Mawa et al., 2021). Consequently, prevalence data can be noisy but pooling across schools allows for inferences to be extracted. The heterogeneity of transmission in a geographical area within a community influences the age at which prevalence and intensity of infection are at their highest in SAC, leading to some SAC being burdened more than others (Gryseels et al., 2006). The identification of areas with high prevalence and intensity of infection is essential to allow for more appropriate application of control interventions, such as MDA (Kittur et al., 2017).
Fig. 1. Locations of the schools sampled in the primary study, a) red markers represent a repeat of the previous collection (80 SAC sampled), green markers represent collections newly known to Biomphalaria intermediate host locations (60 SAC sampled) and yellow markers represent rapid mapping of the shoreline (30 SAC sampled), b) map indicating the location of Mangochi District. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The southern part of the Lake Malawi shoreline in Mangochi District has been reported to have increasing schistosomiasis infection rates since the 1980s, with known UGS endemicity in the region (Madsen et al., 2011). Al-Harbi et al. (Alharbi et al., 2019) and Kayuni et al. (Kayuni et al., 2020) reported the emergence and an outbreak of IS since 2017 in this region, in part due to the newly detected presence of Biomphalaria, a keystone snail intermediate host for Schistosoma mansoni. They suggested that better inspection of age-infection dynamics is needed before intensification of current control methods is advised. Similarly, with the recent endorsement of urine-CCA testing for prevalence mapping of IS (Bärenbold et al., 2018), closer inspection of infection data by age would further underpin its guiding role. To our knowledge, however, there are no studies that have analysed the age-prevalence relationship of IS within SAC in the context of a newly emerging focus of infection set against a background of UGS.
In this secondary analysis of primary data reported by Kayuni et al. (Kayuni et al., 2020), our two aims were: i) to determine if general relationships between age of SAC and prevalence of IS, UGS and co-infection could be determined, and ii) to assess heterogeneities in infection-age profiles across sampled schools.

Dataset
The primary dataset reported by Kayuni et al. (Kayuni et al., 2020), on which this secondary analysis is based, was originally collected in late May/June 2019 from cross-sectional school-based surveys in Mangochi District along the shoreline of southern Malawi (Fig. 1). In brief, the study carried out a mixture of rapid diagnostic tests, parasitological examinations and questionnaire surveys on 520 primary school children, aged 6-15 years old, in twelve schools, after parental consent was given. The study was split into three phases during May/June 2019: 80 SAC each from Samama and Mchoka schools, as an annual follow-up (Alharbi et al., 2019); 60 SAC each from Moet and Koche schools, as an assessment of the two schools near known locations for Biomphalaria; and 30 SAC each from 8 further schools along the lake shoreline, as a rapid surveillance exercise. The SAC were randomly sampled after being stratified by gender and age, with sample sizes at each school calculated by standard sample size methodology (Kayuni et al., 2020). As reported by Kayuni et al. (Kayuni et al., 2020), all participants provided a urine sample. Sampling was accompanied by a questionnaire on demographics, water contact behaviour, praziquantel treatment history and travel. A visual inspection of the urine samples was carried out before samples underwent on-site testing using the circulating cathodic antigen (CCA) test for IS, and 10 ml of well-mixed urine was filtered for UGS (Kayuni et al., 2020). The former was used to estimate prevalence of IS and the latter for UGS (Bärenbold et al., 2018; WHO, 1993).
Ethical approval for this study was obtained from the National Health Sciences Research Committee, Mangochi District Health Office Research Committee and LSTM's Research Ethics Committee.

Statistical analysis
The primary data were cross-checked, with any ambiguities resolved against paper records; secondary analyses were then carried out in R version 3.6.1 with RStudio. The CCA test and urine filtration results were used as binary response variables to measure the prevalence of infection. The responses were categorised into two groups, '1 = Positive' and '0 = Negative'. For CCA tests in the original study, an additional 'trace' result was recorded. We therefore carried out two analyses: 'T+' (trace positive), in which all trace responses are considered 'Positive', and 'T-' (trace negative), in which all trace responses are considered 'Negative'.
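As an illustration of the T+/T- coding described above, the following is a minimal Python sketch rather than the code used in the analysis (which was run in R). The string labels 'positive', 'trace' and 'negative' and the function name encode_cca are hypothetical and chosen only for this example.

```python
def encode_cca(result, trace_positive):
    """Map a CCA reading to 1 (positive) or 0 (negative).

    `trace_positive=True` gives the T+ coding (trace counted as positive);
    `trace_positive=False` gives the T- coding (trace counted as negative).
    The label strings are hypothetical; the primary dataset may differ.
    """
    label = result.strip().lower()
    if label == "positive":
        return 1
    if label == "negative":
        return 0
    if label == "trace":
        return 1 if trace_positive else 0
    raise ValueError(f"Unrecognised CCA result: {result!r}")


# Toy readings, not data from the study.
readings = ["negative", "trace", "positive", "trace"]
t_plus = [encode_cca(r, trace_positive=True) for r in readings]
t_minus = [encode_cca(r, trace_positive=False) for r in readings]
print(t_plus, t_minus)
```

Running both codings side by side in this way makes it easy to check how sensitive any downstream prevalence estimate is to the handling of trace readings.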
As a visual exploration tool, heatmaps were used to inspect the empirical age-prevalence profile of S. mansoni and S. haematobium in each school. The order of the schools on the heatmaps reflected a highest to lowest prevalence ranking.
For both Schistosoma species assessed, our response data were binary: an individual was denoted positive (1) or negative (0) for infection and for co-infection an individual was positive (1 and 1) for both infections. Our response data were from independent tests detecting different infections. We assumed therefore, given the characteristics of a child, their age and school, that test results are independent between children. The CCA and urine filtration tests behave the same with respect to school and age, but sensitivity and specificity can vary with prevalence. Despite this, we assumed that the sensitivity and specificity do not change with respect to age of the children or by school.
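The binary responses and the co-infection indicator described above, together with the prevalence ranking used to order schools on the heatmaps, could be derived along the following lines. This is a hedged Python sketch on invented toy records; the school names, the record layout and the function name prevalence_by_school are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy records: (school, S. mansoni result, S. haematobium result),
# each result coded 1 = positive, 0 = negative. Not data from the study.
records = [
    ("School A", 1, 1),
    ("School A", 1, 0),
    ("School B", 0, 0),
    ("School B", 1, 0),
]


def prevalence_by_school(rows):
    """Return {school: (S. mansoni, S. haematobium, co-infection) prevalence}."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # n, mansoni, haematobium, both
    for school, sm, sh in rows:
        c = counts[school]
        c[0] += 1
        c[1] += sm
        c[2] += sh
        c[3] += sm * sh  # co-infected only if both results are 1
    return {s: (c[1] / c[0], c[2] / c[0], c[3] / c[0]) for s, c in counts.items()}


by_school = prevalence_by_school(records)
# Order schools from highest to lowest S. mansoni prevalence.
ranked = sorted(by_school, key=lambda s: by_school[s][0], reverse=True)
pooled_coinfection = sum(sm * sh for _, sm, sh in records) / len(records)
print(by_school, ranked, pooled_coinfection)
```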
We assumed the diagnostic data followed a Bernoulli distribution and therefore used a logistic regression framework. Since our exploratory data analysis (Appendix A Figs. A.1 and A.2) suggested a non-linear relationship between log odds of infection and age, we fitted age using a thin plate spline. School was additionally fitted as a categorical explanatory variable to adjust for systematic school-level variation in baseline prevalence. The resulting logistic generalised additive model (GAM) enables estimation of a smooth, though non-linear, relationship between age and prevalence as a trend summary of our otherwise noisy observational data (Hastie and Tibshirani, 1990). GAMs were fitted using the 'mgcv' package in R version 3.6.1 (Wood, 2019) (Appendix B). After fitting these models, smooth age-prevalence curves were reconstructed for each outcome. Model fit was assessed by plotting the average of binned residuals against the fitted values, as shown in Appendix E (Gelman and Hill, 2007).
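For reference, the model described above can be written out as follows. The notation is ours and is only a sketch of the structure implied by the text: y_i is the binary infection status of child i, s(i) is that child's school, beta_0 is the intercept, beta_{s(i)} is the school effect (fixed at zero for a reference school), and f is the thin plate spline smooth of age. Appendix B remains the authoritative specification.

```latex
\[
  y_i \sim \mathrm{Bernoulli}(p_i), \qquad
  \log\!\left(\frac{p_i}{1 - p_i}\right)
    = \beta_0 + \beta_{s(i)} + f(\mathrm{age}_i)
\]
```

In 'mgcv' a model of this shape would typically be requested with a formula of the form y ~ s(age, bs = "tp") + school and a binomial family; this is stated as an assumption about the general form, not as the exact call used.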

Results
As reported in the primary study, 520 children were tested using urine CCA-dipsticks for S. mansoni and urine filtration for S. haematobium (Kayuni et al., 2020). Our provisional secondary analysis found that the prevalence of S. mansoni at each school ranged from 67.5% to 96.7%, with overall pooled prevalence of 82.5% [T+]. Schistosoma haematobium prevalence ranged from 3.3% to 60.0%, with an overall pooled prevalence of 24.0%. Co-infection prevalence by school ranged from 1.67% to 56.7%, with overall pooled prevalence of 21.0% (Table 1). Ages of the SAC were between 6 and 15 years, with a mean age of 10.4 years. Ndembo school had the lowest mean age sampled at 9.77 years, whereas Mtengza had the highest at 10.7 years. The trace negative [T-] prevalence summary can be found in Appendix Table D.1.

Generalised additive models
The thin-plate spline for age, adjusted for school, used in the GAM enables us to construct a smooth function of the log odds ratio of infection with respect to age. For the average binned residuals, no evidence of outliers or of systematic lack of fit was found, suggesting that the models fitted the data adequately (Appendix E Figs. E.1 and E.2). Fig. 3a shows very strong evidence for a non-linear relationship between S. mansoni infection and age [T+: p = 8.45e-4], and Fig. 3b shows strong evidence for a non-linear relationship between co-infection and age [T+: p = 7.81e-3], whereas there is no evidence to suggest an increase or decrease of S. haematobium infection with age [p = 0.114]. This is visualised in Fig. 3a, S. mansoni [T+], where the estimated smooth for age goes from negative to positive from ages 6 to 11 before decreasing back to negative, and similarly for co-infection in Fig. 3b. For S. haematobium there was no clear pattern between prevalence and age across the schools (Fig. 3c).
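The binned-residual check referred to above (Gelman and Hill, 2007) can be sketched as follows. This is an illustrative Python version on toy numbers, not the authors' R code: observations are ordered by fitted probability, split into equal-sized bins, and the mean residual in each bin is compared with the mean fitted value. The function name binned_residuals and the choice of two bins are assumptions for the example.

```python
def binned_residuals(fitted, observed, n_bins):
    """Return a list of (mean fitted value, mean residual) per bin.

    Observations are sorted by fitted probability and split into `n_bins`
    groups of near-equal size; the residual is observed minus fitted.
    """
    pairs = sorted(zip(fitted, observed))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when n_bins exceeds the sample size
        mean_fit = sum(p for p, _ in chunk) / len(chunk)
        mean_res = sum(y - p for p, y in chunk) / len(chunk)
        bins.append((mean_fit, mean_res))
    return bins


# Toy fitted probabilities and binary outcomes, not data from the study.
fitted = [0.2, 0.2, 0.8, 0.8]
observed = [0, 1, 1, 1]
bins = binned_residuals(fitted, observed, n_bins=2)
print(bins)
```

Mean residuals scattered around zero with no trend across the fitted values are what one would hope to see; a systematic drift would point to a poorly captured relationship.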

Discussion
To our knowledge, the secondary analysis reported here is the first to analyse the IS infection age-relationship within SAC in a newly established and novel co-infection focus. The newly emerging focus of IS was first noted by Al-Harbi et al. (Alharbi et al., 2019) then described in greater detail by Kayuni et al. (Kayuni et al., 2020). Even though MDA has been ongoing, this focus of IS, and co-infections thereof, is being further documented as it seemingly spreads along the southern part of the shoreline of Lake Malawi in Mangochi District. For S. mansoni infection detected by CCA dipsticks in the primary data (Kayuni et al., 2020), our secondary analysis finds a positive association between IS prevalence and age up to the age of 11, after which there was a decreasing trend [T+: p = 8.45e-4]. As might be expected, co-infection showed a similar pattern [T+: p = 7.81e-3], largely mirroring the IS pattern. By contrast, no clear age-infection pattern for UGS was identified [p = 0.114].
Other studies on Schistosoma infection carried out in sub-Saharan Africa have found varied peak infection profiles, all expected to arise around early to mid-adolescence [10-15 years] (Colley et al., 2014; Madsen et al., 2011; Singh and Muddasiru, 2014; Wilson, 2020; Mazigo et al., 2021). The earlier observed peak in S. mansoni and co-infection prevalence at the age of 11 in this study, compared with up to 15 years elsewhere, may be a result of the newly established transmission potential of this species locally alongside growing acquired immunity in exposed children. Such immunity arises from cumulative exposure to parasite antigens as adult worms die within the body after treatment or natural senescence, or as egg antigens are presented. The literature reports varied age-infection profiles, considered to depend on transmission rates and focality (Woolhouse, 1998).
In classic infection epidemiology of schistosomiasis, changes in the 'peak shift' are known (Woolhouse, 1998; Anderson and May, 1985). These can be explained by site-specific factors, for instance, water exposure, environmental, socio-economic and genetic factors, MDA compliance, as well as age and gender profiles within a community (Mawa et al., 2021). In our instance, the expansion of the underlying distribution of Bi. pfeifferi both in time and space is an influential driver of IS transmission potential. A key observation is the contrasting age-prevalence profiles by schistosome species, even though each shares a common infection pathway, viz. exposure to unsafe water. The occurrence of each snail species in unsafe water reflects underlying heterogeneity in the fine-scale distributions of the intermediate snail hosts, viz. Biomphalaria for S. mansoni and Bulinus for S. haematobium. The latter genus of snail is also undergoing a reappraisal for cryptic species, with as yet unknown transmission potentials (Alharbi et al., 2022). Whilst Kayuni et al. (Kayuni et al., 2020) and Al-Harbi et al. (Alharbi et al., 2019) presented information on the presence and absence of Biomphalaria, a similarly precise map for Bulinus is starting to emerge (Alharbi et al., 2022). A recent study of Bi. pfeifferi has confirmed a year-on-year expanding distribution of this species along the shoreline of the lake, with clear evidence of schistosome DNA in examined snails from several independent locations (Alharbi et al., 2023).
[Figure caption, start missing] Order of schools on heatmap was by highest to lowest prevalence and showed that there was considerable heterogeneity between the schools. Further, S. haematobium shows a similar pattern of prevalence among SAC to co-infection.
[Figure caption, start missing] A general trend towards SAC between 9 and 12 years old having the highest odds of infection was seen in all cases, though that for UGS is not statistically significant.
It is reasonable to speculate that further transmission foci for intestinal schistosomiasis will continue to appear in the lake and along its periphery. Heterogeneities in prevalence among the schools were also found in our secondary analysis, with Mchoka School having the lowest and St Augustine 2 the highest prevalence of S. mansoni infection. Clearly, this heterogeneity indicates that there are many further unconsidered factors that affect the transmission of Schistosoma infection within SAC, such as location, local environmental and socio-economic factors. For instance, SAC living and attending school in areas near the lake shoreline have been found to have increased and different age-infection profiles compared to inland villages (Madsen et al., 2011). As longitudinal data were not collected in the primary study, we were not able to assess seasonal or long-term variation in prevalence; however, it is possible that the force of infection could vary spatially and temporally. For instance, environmental changes such as increases or decreases in water levels of the lake, flooding events or fluctuations in aquatic vegetation could impact SAC water contact (Alharbi et al., 2019; Kayuni et al., 2020). Another factor not studied was reinfection after preventive chemotherapy, as praziquantel only affects the adult worms, and any immediate snail exposure could therefore lead to reinfection. Future studies of the relationship between age, water exposure rates and treatment could enhance our perspective of age-infection profiles. Identifying the peak of infection prevalence within SAC at school level using GAMs increases the interpretability of our findings by turning noisy data into useful and assessable information, which in turn will improve our understanding of the epidemiology of infection and of control methods along the southern part of the shoreline (Mwinzi et al., 2015).
A further limitation of this secondary analysis was that, owing to available resourcing, sample size taken from each school in the primary data was constrained (Kayuni et al., 2020). More generally, GAMs are sometimes known to smooth out underlying relationships excessively. Also, they have higher computational load compared to linear models and unstable behaviours at the boundaries of smooth splines (Laskowski et al., 2020). Nevertheless, for the purpose of our study, the general age-prevalence relationships were detected adequately, and provide a useful insight for future research into the causal mechanisms driving this infection biology.

Conclusion
Our study, a secondary analysis of recently collected epidemiological data concerning a newly emergent focus of IS against an existing background of UGS, provides evidence for the peak of prevalence for Schistosoma infection being around 11 years for both S. mansoni mono-infection and co-infection with S. haematobium along the southern part of Lake Malawi in Mangochi District. However, considerable heterogeneity still remains in terms of baseline prevalence between schools, and investigating this in terms of demographics and Schistosoma transmission dynamics requires further research. In particular, understanding how SAC exposure is related to water access will require both further prevalence and malacological niche mapping. Coupling these conclusions with statistically grounded infection modelling techniques will advance the understanding of the dynamics of Schistosoma infection, and hence inform future intervention programmes.

Declaration of Competing Interest
None.

Data availability
Anonymised epidemiological data are available from the corresponding author upon request.