Model-based spatial-temporal mapping of opisthorchiasis in endemic countries of Southeast Asia

Opisthorchiasis is an overlooked public health threat in Southeast Asia. High-resolution disease risk maps are critical but have not been available for the region. Georeferenced disease data and data on potential influencing factors were collected through a systematic review of the literature and from open-access databases, respectively. Bayesian spatial-temporal joint models were developed to analyze both point- and area-level disease data, within a logit regression framework combining potential influencing factors and spatial-temporal random effects. The model-based risk mapping identified areas of low, moderate, and high prevalence across the study region. Even though the overall population-adjusted estimated prevalence showed a downward trend, a total of 12.39 million (95% Bayesian credible interval [BCI]: 10.10–15.06) people were estimated to be infected with O. viverrini in 2018 in the four major endemic countries (i.e., Thailand, Laos, Cambodia, and Vietnam). This highlights the public health importance of the disease in the study region. The high-resolution risk maps provide valuable information for the spatial targeting of opisthorchiasis control interventions.


Introduction
Ending the epidemics of neglected tropical diseases (NTDs) by 2030 is one of the targets of the Sustainable Development Goals (SDGs) endorsed by the United Nations. This target strengthens the efforts of developing countries to combat NTDs (UN, 2015). To date, 20 diseases have been listed as NTDs, and opisthorchiasis falls under the umbrella of food-borne trematodiases (Ogorodova et al., 2015). Two Opisthorchis species are of public health significance:
- Opisthorchis felineus (O. felineus), endemic in eastern Europe and Russia;
- Opisthorchis viverrini (O. viverrini), endemic in Southeast Asian countries (Petney et al., 2013).

The latter species is the focus of the current article.
According to a conservative WHO estimate, the overall disease burden due to opisthorchiasis was 188,346 disability-adjusted life years (DALYs) in 2010 (Havelaar et al., 2015). Fürst and colleagues estimated that more than 99% of the global burden is attributable to O. viverrini infection in Southeast Asia (Fürst et al., 2012). Five countries in Southeast Asia (Cambodia, Lao PDR, Myanmar, Thailand, and Vietnam) are endemic for opisthorchiasis, with an estimated 67.3 million people at risk (Keiser and Utzinger, 2005). It is well documented that chronic and repeated infection with O. viverrini can lead to fatal bile duct cancer (cholangiocarcinoma) (International Agency for Research on Cancer, 1994).
The life cycle of O. viverrini involves two intermediate hosts: freshwater snails of the genus Bithynia (first) and freshwater cyprinoid fish (second). Humans and other carnivores (e.g., cats and dogs) are the final hosts. They become infected by consuming raw or insufficiently cooked infected fish (Andrews et al., 2008; Saijuntha et al., 2014). Behavioral, environmental, and socioeconomic factors affect the transmission of O. viverrini (Grundy-Warr et al., 2012; Phimpraphai et al., 2017; Phimpraphai et al., 2018; Prueksapanich et al., 2018). Consumption of raw or insufficiently cooked fish is culturally rooted in endemic countries and is strongly associated with the occurrence of the disease (Andrews et al., 2008; Grundy-Warr et al., 2012). Poor hygienic conditions increase the risk of infection, especially in areas where raw fish is commonly eaten (Grundy-Warr et al., 2012). In addition, environmental and climatic factors such as temperature, precipitation, and landscape can potentially influence the risk of human infection. They do so by affecting either snail and fish populations or the growth of the parasites inside the intermediate hosts (Forrer et al., 2012; Suwannatrai et al., 2017). Important control strategies for O. viverrini infection include preventive chemotherapy, health education, environmental modification, improved sanitation, and comprehensive approaches combining these measures (Saijuntha et al., 2014). For public health control, WHO recommends preventive chemotherapy once a year or once every 2 years, depending on the prevalence in the population. These campaigns should be complemented by interventions such as health education and improved sanitation (WHO, 2009).
Understanding the geographical distribution of O. viverrini infection risk at high spatial resolution is critical for preventing and controlling the disease cost-effectively in priority areas. Thailand conducted national surveys of O. viverrini prevalence in 1981, 1991, 2001, 2009, and 2014 (Echaubard et al., 2016; Suwannatrai et al., 2018). However, the results of these surveys were presented at the province level, which is less informative for precisely targeting control interventions. Suwannatrai and colleagues used climatic data and O. viverrini presence data to produce climatic suitability maps for O. viverrini in Thailand with the MaxEnt modeling approach (Suwannatrai et al., 2017). These maps helped identify areas with a high probability of O. viverrini occurrence. However, they did not provide direct information on the prevalence of O. viverrini in the population (Elith et al., 2011). Forrer and colleagues presented a risk map of O. viverrini infection in Champasack province, Lao PDR (Forrer et al., 2012). To our knowledge, high-resolution, model-based risk estimates of O. viverrini infection are unavailable for the whole endemic region of Southeast Asia.
Bayesian geostatistical modeling is one of the most rigorous inferential approaches for producing high-resolution maps of disease risk (Karagiannis-Voules et al., 2015). Geostatistical modeling relates geo-referenced disease data to potential influencing factors (e.g., socioeconomic and environmental factors) and estimates the infection risk in areas without observed data (Gelfand and Banerjee, 2017). Standard geostatistical models are based on point-referenced survey data (Banerjee et al., 2014). In practice, disease data collected from various sources often consist of both point-referenced and area-aggregated data. Bayesian geostatistical joint modeling approaches provide a flexible framework for jointly analyzing both kinds of data (Moraga et al., 2017; Smith et al., 2008). This study had two aims:
1. to collect all available survey data on the prevalence of O. viverrini infection at point or area level in Southeast Asia through a systematic review;
2. to estimate the spatial-temporal distribution of the disease risk at high spatial resolution, using an advanced Bayesian geostatistical joint modeling approach.

Results
A total of 2690 references were identified through a systematic review of the peer-reviewed literature, and 13 additional references were gathered from other sources. After applying the inclusion and exclusion criteria, 168 records were included. These covered five endemic countries of Southeast Asia (i.e., Cambodia, Lao PDR, Myanmar, Thailand, and Vietnam) and yielded the following surveys (Figure 1):
- 580 ADM1-level surveys in 174 areas;
- 210 ADM2-level surveys in 142 areas;
- 53 ADM3-level surveys in 51 areas;
- 251 point-level surveys at 207 locations.

Around 70% and 15% of surveys were conducted in Thailand and Lao PDR, respectively. Only two relevant records were obtained from Myanmar. To avoid large estimation errors, we did not include these data in the final geostatistical analysis. All surveys were conducted after 1970, and around 75% after 1998. Most surveys (95%) were community-based. Around 40% of surveys used the Kato-Katz technique for diagnosis, while another 42% did not specify the diagnostic approach. The mean prevalence calculated directly from the survey data was 16.74% across the study region. A summary of the survey data is given in Table 1. Survey locations and observed prevalence in each period are shown in Figure 2. Area-level data cover all regions of Thailand and Lao PDR and most regions of Cambodia and Vietnam. Point-referenced data are absent in most areas of Vietnam, the western part of Cambodia, and the southern part of Thailand. Around 70% of eligible records scored 7 or higher, indicating good overall quality of the eligible literature (Figure 2-figure supplement 1).
Seven variables were selected for the final model through the Bayesian variable selection process (Table 2). The infection risk in the community was 2.61 (95% BCI: 2.10-3.42) times that in school-aged children. Surveys using FECT (formalin-ethyl acetate concentration technique) showed a lower prevalence than those using the Kato-Katz method (OR 0.76, 95% BCI: 0.61-0.93). No significant difference was found between Kato-Katz and the other diagnostic methods. The human influence index (HII) and elevation were negatively correlated with infection risk:
- each unit increase in HII was associated with a 0.01 (95% BCI: 0.003-0.02) decrease in the logit of the prevalence;
- each 1 m increase in elevation was associated with a 0.003 (95% BCI: 0.001-0.005) decrease in the logit of the prevalence.

The spatial range was estimated at 83.55 km (95% BCI: 81.34-86.61). The spatial variance $\sigma^2_f$ was 12.59 (95% BCI: 11.96-13.56), and the variance of the beta likelihood $\sigma^2_b$ was 0.15 (95% BCI: 0.14-0.15). The temporal correlation coefficient was 0.66 (95% BCI: 0.65-0.67). In model validation, 79.61% of validation locations fell within the 95% BCI of the estimates, indicating reasonable predictive accuracy. The final model had a mean error (ME), mean absolute error (MAE), and mean squared error (MSE) of 0.24%, 9.06%, and 2.38%, respectively. A model based only on point-referenced data gave −7.14%, 16.67%, and 5.09%, so the final model performed better. However, a Monte Carlo test suggested that preferential sampling may exist for survey locations in one third (6/18) of the survey years (Figure 2-source data 2).
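The validation statistics above can be made concrete with a small sketch. It assumes that each validation location provides four values: an observed prevalence, a posterior median, and the 2.5% and 97.5% posterior quantiles. It also assumes that errors are defined as observed minus predicted on the prevalence scale. The paper does not state these conventions, so `HeldOut` and `validation_metrics` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeldOut:
    """One validation location (hypothetical structure, proportions in 0-1)."""
    observed: float  # observed prevalence
    median: float    # posterior median of the predicted prevalence
    lower: float     # 2.5% posterior quantile
    upper: float     # 97.5% posterior quantile


def validation_metrics(points: List[HeldOut]) -> Dict[str, float]:
    """Coverage of the 95% BCI plus ME, MAE and MSE (error = observed - median)."""
    n = len(points)
    errors = [p.observed - p.median for p in points]
    covered = sum(1 for p in points if p.lower <= p.observed <= p.upper)
    return {
        "coverage_95": covered / n,              # share of locations inside the 95% BCI
        "ME": sum(errors) / n,                   # mean error (bias)
        "MAE": sum(abs(e) for e in errors) / n,  # mean absolute error
        "MSE": sum(e * e for e in errors) / n,   # mean squared error
    }


# Two made-up validation locations: the first is covered, the second is not.
example = [HeldOut(0.20, 0.18, 0.10, 0.30), HeldOut(0.05, 0.09, 0.06, 0.15)]
print(validation_metrics(example))
```

Reporting coverage alongside the error metrics matters because a model can have a small ME yet poorly calibrated credible intervals.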
The estimated risk maps of O. viverrini infection in selected years (i.e., 1978, 1983, 1988, 1993, 1998, 2003, 2008, 2013, and 2018) are presented in Figure 3. In 2018, high infection risk (prevalence >25%) was estimated mainly in three areas:
- the southern, central, and north-central parts of Lao PDR;
- some areas in the east-central part of Cambodia;
- some areas in the northeastern and northern parts of Thailand.

The southern part of Thailand, the northern part of Lao PDR, and the western part of Cambodia showed low risk estimates (prevalence <5%) of O. viverrini infection. Estimation uncertainty was concentrated mainly in the central part of Lao PDR, the northern and eastern parts of Thailand, and the central parts of Cambodia and Vietnam (Figure 4).
In addition, the infection risk varied over time across the study region (Figure 5). Areas of northern Thailand showed an increasing trend in the periods 1978-1988 and 1993-2003, while most areas of the country showed a considerable decrease in infection risk after 2008. The infection risk first increased and then decreased in the northern, central, and southern parts of Lao PDR and in the central parts of Vietnam. The east-central and western parts of Cambodia showed an increasing trend in recent years.

Discussion
In this study, we produced model-based, high-resolution risk estimates of opisthorchiasis across the endemic countries of Southeast Asia. The disease is the most important foodborne trematodiasis in the study region (Sripa et al., 2010), accounting for most of the global disease burden of opisthorchiasis (Fürst et al., 2012). We obtained the estimates by systematically reviewing all available geo-referenced survey data. We then applied a Bayesian geostatistical modeling approach that jointly analyzes point-referenced and area-aggregated disease data together with environmental and socioeconomic predictors. Our findings can guide cost-effective control and intervention and serve as a baseline for assessing future progress. Our estimates suggested an overall decrease in O. viverrini infection in Southeast Asia from 1995 onwards, which may be largely attributable to the decline in infection prevalence in Thailand. This decline was probably due to the national opisthorchiasis control program launched by the Ministry of Public Health of Thailand in 1987 (Jongsuksuntigul and Imsomboon, 2003). Our high-resolution risk estimates for Thailand in 2018 showed a pattern similar to the climatic suitability map produced by Suwannatrai and colleagues (Suwannatrai et al., 2017). Unlike that map, we estimated the prevalence in the population rather than the occurrence probability of the parasite. Prevalence gives decision makers more direct epidemiological information for guiding control and intervention. The national surveys in Thailand reported a prevalence of 8.7% and 5.2% in 2009 and 2014, respectively (MOPH, 2014; Wongsaroj et al., 2014). Our estimates for those years were higher: 12.44% (95% BCI: 10.79-14.26%) in 2009 and 9.34% (95% BCI: 7.88-11.02%) in 2014.
Even though the national surveys covered most provinces in Thailand, estimates were based on simply calculating the percentage of positive cases among all the participants (Wongsaroj et al., 2014), and the remote areas might not be included (Maipanich et al., 2004). Instead, our estimates were based on rigorous Bayesian geostatistical modeling of available survey data with environmental and socioeconomic predictors, accounting for heterogeneous distribution of infection risk and population density when aggregating country-level prevalence.
Our findings suggested that the overall prevalence of O. viverrini remained high (>20%) in Lao PDR throughout the study period, consistent with the conclusions of Suwannatrai et al., 2018. We estimated that 2.45 (95% BCI: 1.98-2.83) million people living in Lao PDR were infected with O. viverrini, equivalent to the number estimated by WHO in 2004 (WHO, 2002). In addition, our risk map for Champasack province shows a pattern similar to that produced by Forrer and colleagues (Forrer et al., 2012). A national survey in Cambodia during 2006-2011 reported an infection rate of 5.7% (Yong et al., 2014), lower than our estimate of 8.34% (95% BCI: 5.25-14.95%) for 2011. The survey may have underestimated the prevalence because more than 77% of its participants were schoolchildren (Yong et al., 2014). Another large survey in five provinces of Cambodia found large intra-district variation, which makes identifying endemic areas difficult (Miyamoto et al., 2014). Our high-resolution estimates for Cambodia help to differentiate intra-district risk. However, the estimates should be interpreted with caution because of large district-level variances and a relatively small number of surveys. Indeed, O. viverrini infection has been underreported in Cambodia (Khieu et al., 2019), and further point-referenced survey data are needed for more conclusive results.
The overall prevalence estimated for Vietnam in 2018 was low (2.15%, 95% BCI: 0.73-4.40%). Even so, it corresponds to 2.07 million (95% BCI: 0.70-4.24 million) infected people. This is comparable to the number in Lao PDR, mainly because Vietnam has a larger population. The risk maps showed moderate- to high-risk areas in central Vietnam, with particularly high risk in Phu Yen province over many years. This agrees with previous studies that considered the province a 'hotspot' (Doanh and Nawa, 2016). Of note, there was no evidence of O. viverrini infection in the northern part of the country. However, Clonorchis sinensis, another important liver fluke species, is endemic in that region. We did not provide estimates for Myanmar, to avoid large estimation errors. Only two relevant papers were identified by our systematic review:
- one showed low to moderate prevalence in three regions of Lower Myanmar (Aung et al., 2017);
- the other found low endemicity of O. viverrini infection in three districts of Yangon (Sohn et al., 2019).

Nationwide epidemiological studies are urgently needed for a more comprehensive understanding of the disease in Myanmar.
We identified several important factors associated with O. viverrini infection in Southeast Asia, which may provide insights for the prevention and control of the disease. The infection risk was higher in the entire community than in schoolchildren, consistent with multiple studies (Aung et al., 2017; Forrer et al., 2012; Miyamoto et al., 2014; Van De, 2004; Wongsaroj et al., 2014). A negative association was found between O. viverrini infection and elevation. This suggests the disease was more likely to occur in low-altitude areas, consistent with a previous study (Wang et al., 2013). HII is a measure of direct human influence on ecosystems (Sanderson et al., 2002). It showed a negative relationship with O. viverrini infection risk, indicating that the disease was more likely to occur in areas with low levels of human activity. Such areas are often remote and economically underdeveloped. The habit of eating raw or insufficiently cooked fish is more common in rural areas than in economically developed ones, which could partially explain our findings (Grundy-Warr et al., 2012; Keiser, 2019). Indeed, this culturally rooted habit is one of the determinants of human opisthorchiasis (Kaewpitoon et al., 2008; Ziegler et al., 2011). However, precise geographical data on this habit are unavailable, so we could not use it as a covariate in this study.
Our estimate of the number of people infected with O. viverrini is higher than that of a previous study (12.39 million vs 8.6 million [Qian and Zhou, 2019]). This emphasizes the public health importance of this neglected disease in Southeast Asia and suggests that more effective control interventions are needed, particularly in high-risk areas. Thailand's successful intervention experience may serve as a reference for other endemic countries in the region. The national opisthorchiasis control program, supported by the government of Thailand, applied three interrelated approaches (Jongsuksuntigul and Imsomboon, 2003):
- stool examination and treatment of positive cases;
- health education promoting the consumption of cooked fish;
- environmental sanitation to improve hygienic defecation.

In addition, Sripa and colleagues developed a new strategy for areas where infection risk has been difficult to reduce. Their EcoHealth approach combines anthelmintic treatment, novel intensive health education in both communities and schools, ecosystem monitoring, and active community participation. This 'Lawa model' has proved effective in the Lawa Lake area, where the liver fluke was highly endemic. Furthermore, common integrated control interventions are applicable not only to opisthorchiasis but also to other NTDs prevalent in the study region, such as soil-transmitted helminth infection and schistosomiasis (Dunn et al., 2016; Gordon et al., 2019). Examples include preventive chemotherapy with praziquantel combined with improved sanitation and water sources and with health education. Implementing such interventions in co-endemic areas could be cost-effective (Linehan et al., 2011; WHO, 2012).
Our study has several limitations. We collected data from different sources, and survey locations might not have been selected at random, so preferential sampling may exist. A risk-preferential sampling test suggested that preferential sampling might exist for survey locations in one third (6/18) of the survey years (Figure 2-source data 2). Its potential impacts include improper variogram estimation, biased parameter estimates, and unreliable exposure surface estimates (Diggle et al., 2010; Pati et al., 2011; Gelfand et al., 2012). To avoid a more complex model, we did not account for preferential sampling in the final model, because model validation showed reasonable predictive accuracy. Nevertheless, this potential bias should be kept in mind when interpreting the results. We set clear criteria for selecting all qualified surveys and did not exclude surveys that reported prevalence in intervals without exact observed values. A sensitivity analysis showed that using the midpoints of these intervals had little effect on the final results (Figure 3-figure supplement 1). Surveys covering a large area are likely to use complex designs, such as random sampling from subgroups of the population under a well-designed scheme. Drawing simple random samples from the whole area is impractical. In such cases, respondents may have unequal probabilities of selection, so weighting should be used to generalize the results to the entire area. The observed disease data we collected came from surveys either at point level (i.e., community or school) or aggregated over areas. The point-level surveys covered quite small study areas and mostly used simple sampling designs. Area-level surveys, particularly those aggregated across ADM1, likely used complex designs.
However, most of the corresponding surveys reported only raw prevalence, or prevalence without clarifying whether weighting was applied. Thus, we did not have enough information to address the design effect of each individual survey. Separately, because population density varied across the study region, we computed country- and provincial-level prevalence as the population-density-weighted average of the estimated pixel-level prevalence. This accounted for the heterogeneity of population density across areas in the regional summaries of the estimates.
We assumed similar age and gender distributions across surveys, because most surveys only reported prevalence aggregated over age and gender. Nevertheless, infection risk may differ between the whole population and schoolchildren, so we categorized surveys as community- or school-based. Furthermore, our analysis combined survey data obtained with different diagnostic methods. The sensitivity and specificity of the same diagnostic method may differ across studies (Charoensuk et al., 2019; Laoprom et al., 2016; Sayasone et al., 2015), while different diagnostic methods may yield different results in the same survey. To partially account for this diversity, we assumed that a given diagnostic method has similar sensitivity and specificity across surveys, and we included the type of diagnostic method as a covariate in the model. The odds of infection with FECT were significantly lower than with Kato-Katz, consistent with the findings of Lovis et al., 2009. In addition, most diagnostic methods in the surveys relied on fecal microscopic examination of eggs. This technique cannot effectively distinguish O. viverrini from minute intestinal flukes such as heterophyids and lecithodendriids (Charoensuk et al., 2019; Sato et al., 2010). Thus, our results may overestimate O. viverrini infection risk in areas where heterophyids and lecithodendriids are endemic, such as:
- Phongsaly, Saravane, and Champasak provinces in Lao PDR (Sato et al., 2010; Chai et al., 2010; Chai et al., 2013);
- Nan and Lampang provinces in Thailand (Wijit et al., 2013);
- Takeo province in Cambodia (Sohn et al., 2011).

There is an urgent need for more powerful diagnostic methods with higher sensitivity and specificity, such as PCR, to better detect the true O. viverrini prevalence (Lovis et al., 2009; Lu et al., 2017; Sato et al., 2010).
Nevertheless, the treatment and prevention strategies for O. viverrini and minute intestinal flukes are similar (Keiser and Utzinger, 2010). Our risk maps are therefore also valuable for areas co-endemic with these flukes.
In conclusion, this study contributes to a better understanding of the spatial-temporal characteristics of O. viverrini infection in the major endemic countries of Southeast Asia. It provides valuable information for guiding control and intervention and serves as a baseline for assessing future progress. The estimates were based on a rigorous geostatistical framework that jointly analyzes point- and area-level survey data with potential predictors. The higher number of infected people we estimated highlights the public health importance of this neglected disease in the study region. More comprehensive epidemiological studies are urgently needed in endemic areas with scant survey data.
We followed a protocol (Figure 1-figure supplement 1) for the inclusion, exclusion, and extraction of survey data. First, we screened titles and abstracts to identify potentially relevant articles. We excluded in vitro studies, studies not involving humans, and studies not reporting on the disease. For quality control, a random 20% of the papers classified as irrelevant were re-checked. Second, potentially relevant articles underwent full-text review. We excluded publications meeting any of the following conditions:
- absence of prevalence data;
- studies in specific patient groups (e.g., patients with specific diseases);
- studies in specific population groups (e.g., travelers, military personnel, expatriates, nomads, displaced or migrating populations);
- studies with specific designs (e.g., case reports, case-control studies, clinical trials, autopsy studies);
- drug efficacy or intervention studies (except for baseline data or control groups);
- population deworming within 1 year;
- a survey time interval of more than 10 years;
- data based only on the direct smear method (owing to its low sensitivity) or on serum diagnostics (owing to its inability to distinguish past from active infection).

During the full-text review, potentially relevant references cited in the articles were also screened.
Studies were included if they reported survey data at provincial level and below, such as administrative divisions of level 1 (ADM1: province, state, etc.), 2 (ADM2: city, etc.), and 3 (ADM3: county, etc.), and at point-level (village, town, school, etc.). Duplicates were checked and removed. The quality assessment of each individual record included in the final geostatistical analysis was performed by two independent reviewers, based on a nine-point quality evaluation checklist (Figure 2-figure supplement 1-source data 1).

Materials and methods
We followed the GATHER checklist (Supplementary file 1B; Stevens et al., 2016) for data extraction. Detailed information on each record was extracted into a database, including:
- literature information (e.g., journal, authors, publication date, title, volume, and issue);
- survey information (e.g., survey type, community- or school-based, and year of survey);
- location information (e.g., location name, location type, and coordinates);
- disease-related data (e.g., parasite species, diagnostic method, population age, number examined, number positive, and percentage positive).

The coordinates of survey locations were obtained from Google Maps (https://www.google.com/maps/). For surveys that reported prevalence in intervals without exact observed values, the midpoints of the intervals were assigned.

Environmental, socioeconomic, and demographic data
Environmental, socioeconomic, and demographic data for Southeast Asia were downloaded from open data sources (Figure 7-source data 1):
- environmental data: annual precipitation, distance to the nearest open water body, elevation, land cover, daytime and nighttime land surface temperature (LST), and normalized difference vegetation index (NDVI);
- socioeconomic data: human influence index, survey type, and travel time to the nearest big city;
- demographic data.

For each pixel, land cover was summarized as the most frequent category over 2001-2018. We merged similar land cover classes into five categories: (i) croplands; (ii) forests; (iii) shrub and grass; (iv) urban; and (v) others. Daytime and nighttime LST and NDVI were averaged over 2000-2018. All data were aligned to a 5 × 5 km grid across the study region (Figure 7). Covariate values were extracted at point-referenced survey locations. Some divisions (i.e., ADM1, ADM2, or ADM3) reported only an aggregated outcome of interest (i.e., infection prevalence). For these, covariate values were averaged over the pixels within each division. The data processing was done with the R package 'raster' (https://cran.r-project.org/web/packages/raster) in R (version 3.5.0).
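The covariate processing described above can be sketched in plain Python. This is an illustrative sketch, not the authors' R 'raster' workflow. The class codes in `REGROUP` and all function names are hypothetical; the real codes depend on the land-cover product listed in Figure 7-source data 1.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical mapping from original land-cover class codes to the grouped
# categories; any code not listed falls into "others".
REGROUP: Dict[int, str] = {10: "croplands", 20: "forests",
                           30: "shrub and grass", 40: "urban"}


def modal_class(yearly_classes: Sequence[int]) -> int:
    """Most frequent land-cover class of one pixel across years.

    Counter.most_common orders ties by first occurrence.
    """
    return Counter(yearly_classes).most_common(1)[0][0]


def regroup(code: int) -> str:
    """Collapse an original class code into one of the five categories."""
    return REGROUP.get(code, "others")


def temporal_mean(yearly_values: Sequence[float]) -> float:
    """Multi-year average for one pixel, as described for LST and NDVI."""
    return sum(yearly_values) / len(yearly_values)


def zonal_mean(values: Sequence[float], zone_ids: Sequence[str]) -> Dict[str, float]:
    """Average pixel-level covariate values within each ADM1/ADM2/ADM3 division."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for value, zone in zip(values, zone_ids):
        sums[zone] = sums.get(zone, 0.0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# A pixel classified as cropland in most years keeps the cropland label.
print(regroup(modal_class([10, 20, 10, 40])))
# Elevation averaged over two hypothetical divisions.
print(zonal_mean([1.0, 3.0, 10.0], ["A", "A", "B"]))
```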

Model fitting and variable selection
Our outcome of interest came from both point-referenced and area-aggregated surveys. We therefore applied a bivariate Bayesian geostatistical joint modeling approach to analyze area-level and point-level survey data together (Moraga et al., 2017; Utazi et al., 2019). The approach accommodates both data reporting numbers examined and positive and data reporting only prevalence.
We defined $p_{it}$ as the probability of infection at location $i$ and time period $t$, where $i$ indexes either the location of point-referenced data or the area of area-level data. Two likelihoods were used:
- Data reporting numbers examined and positive: the number of positives $Y_{it}$ was assumed to follow a binomial distribution, $Y_{it} \sim \mathrm{Bin}(p_{it}, N_{it})$, where $N_{it}$ denotes the number examined.
- Data reporting only the observed prevalence: the observed prevalence $ob_{it}$ was assumed to follow a beta distribution, $ob_{it} \sim \mathrm{Be}(p_{it}, \sigma^2_b)$.

The study period ran from 1978 to 2018. We modeled $p_{it}$ on the logit scale. Following the method proposed by Cameletti and colleagues (Krainski, 2019; Cameletti et al., 2013), we built a spatial-temporal model with covariates. It uses an SPDE (stochastic partial differential equation) model for the spatial domain and an AR1 model for the time dimension. A standard 5 × 5 km² grid was overlaid on each survey area, so that each area is represented by a set of pixels. We assumed that survey locations and pixels within survey areas shared the same spatial-temporal process. We also assumed that the infection risk in a given area was constant within a 1-year period. Different observations from the same year in the same area can therefore be treated as realizations of the same random spatial-temporal process. Let $i = 1, \ldots, n_A, n_A + 1, \ldots, n_A + n_p$, where $n_A$ is the total number of areas for area-level surveys and $n_p$ is the total number of locations for point-referenced surveys. Regarding area-level data, …, $X_{it}$ is the vector of covariate values and $\omega(s_i, t)$ is the spatial-temporal random effect for the $i$th location in time period $t$.
Figure 7. Images of spatial covariates used in the present study. The online version of this article includes the following source data for figure 7: Source data 1. The sources of covariate layers.
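The two likelihood components can be sketched as log-likelihood functions that share one logit-scale linear predictor. This is an illustration, not the authors' INLA model. It assumes the beta distribution is parameterized by its mean $p_{it}$ and a precision $\phi = 1/\sigma^2_b$, with shape parameters $p\phi$ and $(1-p)\phi$; this matches the mean-precision form of INLA's beta family. All function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple


def inv_logit(eta: float) -> float:
    """Map the logit-scale linear predictor back to a probability p_it."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(y: int, n: int, p: float) -> float:
    """log P(Y_it = y) for Y_it ~ Bin(p_it, N_it = n): y positives among n examined."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log1p(-p)


def beta_loglik(ob: float, p: float, sigma2_b: float) -> float:
    """log density of an observed prevalence ob in (0, 1) under ob_it ~ Be(p_it, sigma2_b).

    Assumed mean/precision form: phi = 1 / sigma2_b, shapes a = p*phi, b = (1-p)*phi.
    """
    phi = 1.0 / sigma2_b
    a, b = p * phi, (1.0 - p) * phi
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(ob) + (b - 1.0) * math.log1p(-ob) - log_beta_fn


def joint_loglik(eta: Dict[Hashable, float],
                 counts: Dict[Hashable, Tuple[int, int]],
                 prevalences: Dict[Hashable, float],
                 sigma2_b: float) -> float:
    """Sum both likelihood parts.

    Keys are (location or area, period) pairs; each pair has one linear
    predictor eta = logit(p_it) shared by whichever data type it reports.
    """
    total = 0.0
    for key, (y, n) in counts.items():
        total += binomial_loglik(y, n, inv_logit(eta[key]))
    for key, ob in prevalences.items():
        total += beta_loglik(ob, inv_logit(eta[key]), sigma2_b)
    return total
```

Because both data types enter through the same $p_{it}$, surveys that report only a percentage still inform the covariate effects and the latent field.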
To reduce the computational burden, we built the Gaussian Markov random field (GMRF) of the SPDE framework on regular temporal knots, that is, $\omega = (\omega_{t=1978}, \omega_{t=1983}, \omega_{t=1988}, \omega_{t=1993}, \omega_{t=1998}, \omega_{t=2003}, \omega_{t=2008}, \omega_{t=2013}, \omega_{t=2018})'$ (Cameletti et al., 2013; Krainski, 2019). We assumed that the spatio-temporal random effect $\omega(s, t)$ follows a zero-mean Gaussian process, $\omega \sim \mathrm{GP}(0, K_{space} \otimes K_{time})$. The spatial covariance matrix is given by a stationary Matérn covariance function, $K_{space} = \frac{\sigma^2_f}{2^{\nu-1}\Gamma(\nu)} (\kappa D)^{\nu} K_{\nu}(\kappa D)$. The temporal covariance matrix is $K_{time} = \rho^{|t_u - t_o|}$ with $|\rho| < 1$, corresponding to a first-order autoregressive (AR1) process. The covariance is therefore separable in space and time, that is, $\mathrm{Cov}(\omega(s_i, t_u), \omega(s_j, t_o)) = K_{space}(s_i, s_j)\, K_{time}(t_u, t_o)$. The remaining symbols are defined as follows:
- $D$ denotes the Euclidean distance matrix;
- $\kappa$ is a scaling parameter;
- the range $r = \sqrt{8\nu}/\kappa$ is the distance at which the spatial correlation becomes negligible (approximately 0.1);
- $K_{\nu}$ is the modified Bessel function of the second kind;
- the smoothness parameter $\nu$ was fixed at 1.

The latent fields for other years were approximated by projecting $\omega$ with B-spline basis functions of degree two, that is, $\omega(s, t) = \sum_{k} \psi^{(m)}_k(t)\, \omega(s, t_k)$. Here $t_k$ are the temporal knots, $\psi^{(m)}_k$ are the B-spline basis functions, and $m = 2$ is the degree (Krainski, 2019; Cameletti et al., 2013). We formulated the model in a Bayesian framework. Minimally informative priors were specified for parameters and hyperparameters as follows: $\beta \sim N(0, 10^5 I)$, $\log(1/\sigma^2_b) \sim \mathrm{logGamma}(1, 0.1)$, …, where … is the median distance between the predicted grids. We then applied a variable selection procedure to identify the best set of predictors for a parsimonious model. First, we selected the best functional form (continuous or categorical) of each continuous variable. For each variable, we fitted univariate Bayesian spatial-temporal models with either form as the independent variable and kept the form with the lowest log score (Pettit, 1990).
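The sketch below illustrates the meaning of the range parameter. It evaluates the Matérn correlation with $\nu = 1$ at the distance $r = \sqrt{8\nu}/\kappa$, together with the AR1 correlation between temporal knots. $K_\nu$ is computed from its integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ with a trapezoid rule, so no external library is needed. Treating the AR1 lag as a number of knot steps is an assumption, and the function names are hypothetical.

```python
import math


def bessel_k(nu: float, x: float, upper: float = 20.0, steps: int = 20_000) -> float:
    """K_nu(x) for x > 0 (not extremely small), via the integral representation.

    The integrand decays very fast in t, so truncating at `upper` and using
    the trapezoid rule is accurate enough for illustration.
    """
    def integrand(t: float) -> float:
        return math.exp(-x * math.cosh(t)) * math.cosh(nu * t)

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h


def matern_correlation(d: float, kappa: float, nu: float = 1.0) -> float:
    """Matérn correlation 2^(1-nu) / Gamma(nu) * (kappa d)^nu * K_nu(kappa d)."""
    if d == 0.0:
        return 1.0  # limit as d -> 0
    u = kappa * d
    return 2.0 ** (1.0 - nu) / math.gamma(nu) * u ** nu * bessel_k(nu, u)


def ar1_correlation(lag: int, rho: float) -> float:
    """Correlation rho^|lag| between latent fields `lag` temporal knots apart."""
    return rho ** abs(lag)


range_km = 83.55                        # posterior range reported in the Results
kappa = math.sqrt(8.0 * 1.0) / range_km
print(round(matern_correlation(range_km, kappa), 3))  # about 0.14
print(ar1_correlation(2, 0.66))                       # two knots apart
```

At the estimated range the correlation is about 0.14. This is why the range is described as the distance where the correlation drops to roughly 0.1, rather than exactly below 0.1.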
Second, the best subset method was used to identify the best combination of predictors for the final model. Previous studies suggest that infection risk may differ between communities and schools (Aung et al., 2017; Forrer et al., 2012; Miyamoto et al., 2014; Wongsaroj et al., 2014) and that different diagnostic methods may yield different observed prevalence (Charoensuk et al., 2019; Laoprom et al., 2016; Sayasone et al., 2015). Thus, survey type (community- or school-based) and diagnostic method (Kato-Katz, FECT, or other methods) were kept in all candidate models, while the other 10 environmental and socioeconomic variables were entered into the Bayesian variable selection process. The model with the minimum log score was chosen as the final model.
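The best subset search can be sketched as an enumeration of all candidate predictor sets in which survey type and diagnostic method are always included. The sketch assumes the log score is computed as minus the mean log conditional predictive ordinate (CPO), one common definition; `fit_cpo` is a hypothetical stand-in for fitting the spatial-temporal model (for example with INLA) and returning the CPO of each observation, and the variable names are placeholders.

```python
import itertools
import math

FORCED = ("survey_type", "diagnostic_method")  # kept in every candidate model


def log_score(cpo_values) -> float:
    """Log score: minus the mean log CPO (lower is better)."""
    return -sum(math.log(c) for c in cpo_values) / len(cpo_values)


def candidate_models(optional_vars):
    """Yield every predictor set: the forced covariates plus each subset of the rest."""
    for size in range(len(optional_vars) + 1):
        for subset in itertools.combinations(optional_vars, size):
            yield FORCED + subset


def select_best(models, fit_cpo):
    """Return the model whose fitted CPO values give the minimum log score.

    fit_cpo is a hypothetical callable: predictor tuple -> list of CPO values.
    """
    return min(models, key=lambda m: log_score(fit_cpo(m)))


# With 10 hypothetical optional variables there are 2**10 candidate models.
optional = [f"x{i}" for i in range(1, 11)]
print(sum(1 for _ in candidate_models(optional)))  # 1024
```

Exhaustive enumeration is feasible here only because the number of optional variables is small; each of the 1024 candidates requires a full model fit.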
Model fitting and variable selection were conducted with the INLA-SPDE approach (Lindgren et al., 2011; Rue et al., 2009), using the INLA package in R (version 3.5.0). The risk of O. viverrini infection in each year of the study period was estimated over a grid with a cell size of 5 × 5 km². The relative change in prevalence at pixel $s$ between an earlier year $t_i$ and a later year $t_j$ was calculated as $(pp_{s,t_j} - pp_{s,t_i})/pp_{s,t_i}$, where $pp$ denotes the median of the posterior distribution of the estimated infection risk. The corresponding risk maps and prevalence-change maps were produced using ArcGIS (version 10.2). Because population density varied across the study region, the population-adjusted estimated prevalence and number of infected individuals in 2018 were calculated at the country and provincial levels by averaging the estimated pixel-level prevalence weighted by population density, that is, $pp_A = \sum_{i \in A} w_i\, pp_i \big/ \sum_{i \in A} w_i$, where $pp_A$, $pp_i$, and $w_i$ are the estimated prevalence in area $A$, the estimated prevalence at pixel $i$, and the population density at pixel $i$, respectively, with pixel $i$ belonging to area $A$. Based on previous studies, for the provinces in Vietnam with no evidence of O. viverrini infection, the final estimated prevalence was set to zero, i.e., the model estimates were multiplied by zero (Doanh and Nawa, 2016). The R code used for model fitting is publicly available on GitHub (https://github.com/SYSU-Opisthorchiasis/Spatial-temporal-mapping-of-opisthorchiasis) and archived in Software Heritage (Zhao, 2021; copy archived at swh:1:rev:6493df4ba60c1f2f1aaaad979174a3a5d928627a).
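The pixel-level relative change and the population-density-weighted area prevalence described above can be sketched in a few lines of Python. The inputs are assumed to be posterior medians for pixels already assigned to one area; the numbers are hypothetical.

```python
import numpy as np


def relative_change(pp_earlier, pp_later) -> np.ndarray:
    """Pixel-wise relative change (pp_{s,t_j} - pp_{s,t_i}) / pp_{s,t_i}."""
    earlier = np.asarray(pp_earlier, dtype=float)
    later = np.asarray(pp_later, dtype=float)
    return (later - earlier) / earlier


def weighted_area_prevalence(pp_pixels, density) -> float:
    """pp_A = sum_i w_i * pp_i / sum_i w_i over the pixels i belonging to area A."""
    pp = np.asarray(pp_pixels, dtype=float)
    w = np.asarray(density, dtype=float)
    return float(np.sum(w * pp) / np.sum(w))


# Hypothetical posterior-median prevalences for three pixels in one province.
print(relative_change([0.20, 0.10, 0.40], [0.10, 0.10, 0.50]))  # -50%, 0% and +25%
print(weighted_area_prevalence([0.10, 0.20, 0.40], [100, 300, 100]))  # approximately 0.22
```

The densely populated middle pixel pulls the area estimate towards its own prevalence (0.22 instead of the unweighted mean of about 0.23), which is the purpose of the population adjustment.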

Model validation, sensitivity analysis, and test of preferential sampling
Model validation was conducted using 5-fold out-of-sample cross-validation. The mean error (ME), …, and the coverage rate of observations within the 95% BCI were calculated to evaluate the performance of the model. Furthermore, a Bayesian geostatistical model based only on point-referenced data was fitted and validated to compare its performance with our joint modeling approach. In addition, one study (Suwannatrai et al., 2018) reported the observed prevalence of O. viverrini infection as intervals, for which we used the interval midpoints as the observed prevalence. A sensitivity analysis evaluated the effect of this choice by refitting the model with the lower and the upper limits of the intervals instead.
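A minimal Python sketch of the fold assignment and two of the validation summaries follows. The random assignment to five folds and the sign convention for the mean error (predicted minus observed) are assumptions, and the model fitting itself is left as a placeholder comment.

```python
import numpy as np


def five_fold_ids(n_obs: int, seed: int = 1) -> np.ndarray:
    """Randomly assign each observation to one of 5 folds of near-equal size."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.arange(n_obs) % 5)


def mean_error(observed, predicted) -> float:
    """Mean error; the sign convention mean(predicted - observed) is an assumption."""
    return float(np.mean(np.asarray(predicted, float) - np.asarray(observed, float)))


def coverage_rate(observed, lower, upper) -> float:
    """Share of held-out observations lying inside their 95% BCI [lower, upper]."""
    obs = np.asarray(observed, float)
    inside = (obs >= np.asarray(lower, float)) & (obs <= np.asarray(upper, float))
    return float(np.mean(inside))


# Hypothetical loop structure: fold k is held out while the model is fitted on the rest.
folds = five_fold_ids(23)
for k in range(5):
    test_idx = np.flatnonzero(folds == k)
    train_idx = np.flatnonzero(folds != k)
    # fit on train_idx, predict test_idx, then pool the predictions for the metrics
```

Pooling the held-out predictions from all five folds means every observation is predicted exactly once by a model that did not see it.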
Because the data in this study were drawn from different studies, preferential sampling may exist, so we tested the data for it. To our knowledge, no method has been developed to test for preferential sampling in observations combined at point and areal levels. As a compromise, we used the centers of the areas with survey data as their locations in the test. We adopted the Monte Carlo test developed by Watson because it is fast, intuitive, and applicable to data arising from various distributions. We assumed that $S_t$ (the collection of sampled points at time $t$) is a realization of an inhomogeneous Poisson process (IPP) conditional on $\omega(s, t)$ (the spatial-temporal Gaussian random field), that is, $[S_t \mid \omega(s, t)] = \mathrm{IPP}(\lambda(s, t))$ and $\log(\lambda(s, t)) = \alpha_0 + h(\omega(s, t))$, where $h$ is a monotonic function of $\omega(s, t)$. When $h \equiv 0$, the sampling process is independent of $\omega(s, t)$, so there is no preferential sampling. Detecting preferential sampling can therefore be framed as testing the null hypothesis $h \equiv 0$. If $h \equiv 0$ is false, for example, if $h$ is monotonically increasing in $\omega(s, t)$, then the point patterns $S_t$ are expected to show excess clustering in areas with higher $\omega(s, t)$, and a positive association can be detected between the local amount of clustering and the estimated $\omega(s, t)$. First, we used the mean distance to the $K$ nearest points ($D_K$) to measure the clustering of locations and calculated the rank correlation $r_t(K)$ between $D_K$ and the estimated $\omega(s, t)$ for survey year $t$, where the estimated $\omega(s, t)$ was obtained by fitting the Bayesian spatial-temporal joint model.
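The clustering statistic $D_K$ and its rank correlation with the estimated field can be sketched in Python with a k-d tree and Spearman's rank correlation. Because tighter clustering gives a smaller $D_K$, preferential sampling towards high $\omega(s, t)$ shows up here as a negative correlation between $D_K$ and the estimated $\omega(s, t)$. This is a sketch of the statistic, not the PStestR implementation; the coordinates and field values are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import spearmanr


def mean_knn_distance(coords: np.ndarray, k: int) -> np.ndarray:
    """D_K: mean distance from each point to its K nearest other points."""
    tree = cKDTree(coords)
    # Query k + 1 neighbours because the nearest "neighbour" of a point is itself.
    dist, _ = tree.query(coords, k=k + 1)
    return dist[:, 1:].mean(axis=1)


def clustering_rank_correlation(coords, omega_hat, k: int) -> float:
    """Spearman rank correlation r_t(K) between D_K and the estimated field."""
    d_k = mean_knn_distance(np.asarray(coords, dtype=float), k)
    rho, _ = spearmanr(d_k, omega_hat)
    return float(rho)


# Hypothetical survey year: 5 tightly clustered sites with high estimated risk
# and 5 widely spaced sites with low estimated risk.
cluster = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05]]
spread = [[10, 0], [20, 0], [30, 0], [40, 0], [50, 0]]
coords = np.array(cluster + spread, dtype=float)
omega_hat = np.array([5.0] * 5 + [0.0] * 5)
print(clustering_rank_correlation(coords, omega_hat, k=1))  # strongly negative
```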
Next, the Monte Carlo method was used to sample realizations from the IPP under the null hypothesis ($h \equiv 0$), from which a set of rank correlations $r_t^{M}(K)$ was calculated, approximating the distribution of $r_t(K)$ under $h \equiv 0$. In this way, the nonstandard sampling distribution of the test statistic can be approximated. Finally, we computed the empirical p-value as the proportion of Monte Carlo-sampled $r_t^{M}(K)$ that are more extreme than $r_t(K)$. We used 1000 Monte Carlo samples for each test. We considered $K = 1, \ldots, 8$ to measure the clustering of locations, which yielded eight p-values, one per $K$, for each survey year. If any of these p-values was less than or equal to 0.05, we considered preferential sampling to exist in that survey year. Since our model estimates the disease risk for each year of the study period, the test was performed for each survey year with at least 10 survey locations (i.e., 1978, 1981, 1991, 1995, 1998, 2000, 2001, 2004, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, and 2016). The test was conducted using the R package 'PStestR' (R version 3.6.3) (Watson, 2020).
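The Monte Carlo step and the decision rule can be sketched in Python as follows. Under $h \equiv 0$ the intensity reduces to the constant $\exp(\alpha_0)$, so null patterns can be drawn uniformly; as simplifying assumptions, the sketch fixes the number of points, uses the bounding box of the observed locations as the study window, and reads "more extreme" as a larger absolute correlation. `omega_fn` is a hypothetical callable returning the estimated field at arbitrary coordinates. Requiring at least 10 locations also guarantees that every point has the 8 other neighbours needed for $K = 8$.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import spearmanr


def rank_corrs(coords, omega_hat, ks):
    """Spearman correlation between D_K and omega_hat for each K in ks."""
    dist, _ = cKDTree(coords).query(coords, k=max(ks) + 1)
    out = {}
    for k in ks:
        rho, _ = spearmanr(dist[:, 1 : k + 1].mean(axis=1), omega_hat)
        out[k] = float(rho)
    return out


def mc_pvalues(coords, omega_fn, ks=range(1, 9), n_sim=1000, seed=0):
    """Empirical p-value for each K under uniform (non-preferential) sampling."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    observed = rank_corrs(coords, omega_fn(coords), ks)
    sims = {k: [] for k in ks}
    for _ in range(n_sim):
        pts = rng.uniform(lo, hi, size=coords.shape)  # one null realization
        for k, r in rank_corrs(pts, omega_fn(pts), ks).items():
            sims[k].append(r)
    # Two-sided: share of simulated |r| at least as large as the observed |r|.
    return {k: float(np.mean(np.abs(sims[k]) >= abs(observed[k]))) for k in ks}


def preferential_sampling_flag(pvalues, alpha=0.05) -> bool:
    """Flag a survey year if any of the K-specific p-values is <= alpha."""
    return min(pvalues.values()) <= alpha
```

Taking the minimum of eight p-values without adjustment makes the rule more liberal than a single test at 0.05, which matches the conservative intent of flagging any possible preferential sampling.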

Grant 2017A030313704 (Ying-Si Lai). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

We evaluated the quality of each publication included in the final geostatistical modeling analysis using a nine-item checklist:
Q1: provides specific inclusion and exclusion criteria.
Q2: provides basic characteristics of the investigated population (gender, age, etc.).
Q3: provides the prevalence rate of the survey.
Q4: provides the number of positive patients and the number of examined people in the survey.
Q5: provides the diagnostic method used in the survey.
Q6: provides the survey type.
Q7: provides the time of the survey.
Q8: describes or discusses possible biases of the survey or how confounders are controlled.
Q9: is indexed in the Science Citation Index Expanded database.
Each item is scored 1 if the publication meets it and 0 otherwise, and the item scores are summed to give the publication's quality score. The score for each publication is listed in Figure 2-figure supplement 1-source data 1.

Author contributions
Transparent reporting form

Data availability
All data generated or analysed during this study are included in the manuscript and supporting files.