Spatio-temporal detection for dengue outbreaks in the Central Region of Malaysia using climatic drivers at mesoscale and synoptic scale

The disease dengue is associated with both mesoscale and synoptic scale meteorology. However, previous studies for south-east Asia have found a very limited association between synoptic variables and the reported number of dengue cases. Hence there is an urgent need to establish a more clear association with dengue incidence rates and the most relevant meteorological variables in order to institute an early warning system. This article develops a rigorous Bayesian modelling framework to identify the most important covariates and their lagged effects for constructing an early warning system for the Central Region of Malaysia where the case rates have increased substantially in the recent past. Our modelling includes multiple synoptic scale Ni ˜ no indices, which are related to the phenomenon of El Ni ˜ no Southern Oscillation (ENSO), along with other relevant mesoscale environmental measurements and an unobserved variable derived from reanalysis data. An empirically well validated hierarchical Bayesian spatio-temporal is used to build a probabilistic early warning system for detecting an upcoming dengue epidemic. Our study finds a 46.87% increase in dengue cases due to one degree increase in the central equatorial Pacific sea surface temperature with a lag time of six weeks. We discover the existence of a mild association with relative risk 0.9774 (CI: 0.9602, 0.9947) between the rate of cases and a distant lagged cooling effect in the region of coastal South America related to a phenomenon called El Ni ˜ no Modoki. The Bayesian model also establishes that the synoptic meteorological drivers can enhance short-term early detection of dengue outbreaks and these can also potentially be used to provide longer-term forecasts.


Introduction
Dengue is a very harmful mosquito-borne viral infection worldwide.This viral infectious disease can lead to a wide spectrum of clinical manifestations such as acute onset high fever, muscle and joint pain, myalgia, cutaneous rash, hemorrhagic episodes and circulatory shock (Hasan et al., 2016).Thus the disease causes an enormous burden to pubic health systems.Bhatt (2013) estimate that there are 390 million total annual infections throughout the world.During the year 1990 to 2019, the number of dengue cases has increased by 85.5% according to Yang et al. (2021).
In Asia, dengue fever has been reported earliest by Skae (1902), followed by dengue hemorrhagic fever and dengue shock syndrome epidemics in the twentieth century (Henchal and Putnak, 1990).Gubler (1998) describes that the dengue virus (DENV) is transmitted by the bite of female Aedes aegypti mosquitoes.To a lesser extent, Aedes albopictus is also a vector of indoor transmission (Noor et al., 2018).Four serotypes of virus DENV-1, DENV-2, DENV-3 and DENV-4 following the human cycle are genetically similar (Mustafa et al., 2015).
Climate is a crucial determinant of dengue disease transmission by affecting its vector dynamics (Morin et al., 2013).Local and global climate not only influence the spatial distribution of infections (Johansson et al., 2009) but also the interanuual variability (Cazelles et al., 2005).The climatic impact to a dengue outbreak is also known to be cumulative and delayed (Lowe et al., 2018).Hii et al. (2016) emphasise that dengue is a climate-sensitive infectious disease.The rapid change in climate drivers increases the risk of dengue outbreaks in the past decade.A climate-based early warning system (EWS) has the potential to enhance surveillance and control of the disease.Simple statistical models, which do not adequately account for complex spatio-dynamic dependencies, cannot be used to construct a reliable EWS.Hence there is an urgent need to develop complex Bayesian models which validate well in empirical studies.
In this article our study region is the Central Region in Malaysia which constitutes of the Selangor state together with the two adjacent federal territories, namely Kuala Lumpur and Putrajaya, see Fig. 1.This region contributes most to the national dengue hospitalisation numbers in Malaysia.The dengue infection rates in this region have increased significantly in the past decade as reported by AbdMajid et al. (2021) and Salim (2021).
A significant relationship has been found between dengue hospitalisation rates and covariates such as precipitation, temperature, number of monthly rain days and ENSO for 12 states of West Malaysia (Che Him et al., 2018).A similar study by Che Him et al. (2018) identifies two distinct spatial clusters via two generalised additive models (GAM) for nine districts of the state of Selangor.Ahmad et al. (2018) conducted a large scale study on dengue outbreaks in Malaysia by collecting mosquito samples using a device called ovitrap and rain gauge data.A linear regression model is used to identify the entomological, epidemiological and environmental drivers that contributed to the dengue outbreak of two locations in Selangor state.Salim (2021) develop a support vector machine model that incorporates environmental variables including temperature, wind speed, humidity, and rainfall to predict dengue outbreaks.
Hierarchical Bayesian spatio-temporal models for areal data are widely used in dengue disease mapping and prediction (Lowe et al., 2011;Lowe et al., 2013;Lowe et al., 2014;Stewart-Ibarra and Lowe, 2013).Using spatio-temporal model as a toolkit, Bayesian modelling can have a better capacity to handle explicit contribution from the covariates and latent spatio-temporal dependency.One popular choice of structured prior to capture spatially spill-over effect is a conditional intrinsic Gaussian autoregressive prior (CAR; Besag et al., 1991).Spatio-temporal autocorrelations are used to capture the facts that the disease rates in closer areal units and temporally close time periods tend to have more similar values (Lee et al., 2018).With the fact that spatial and temporal components are intrinsically interacted, a variety of CAR-based spatio-temporal model is developed to tackle many real-world applications as investigated by Bernardinelli et al. (1995), Knorr-Held (2000), Napier et al. (2016), Rushworth et al. (2014), Rushworth et al. (2017) and Sahu (2022).
ENSO is a global scale of climate variation where the cycles have lasted between two and seven years.Several previous studies have found significant associations between dengue outbreaks and ENSO in some specific study regions (Kovats et al., 2003).The oscillations of the sea surface temperature (SST) in different regions in the equatorial Pacific are used to define ENSO (Rasmusson and Carpenter, 1982).Ashok et al. (2007) define the anomalous warming events that occur in the central equatorial Pacific (Niño4 region) as an alternative type of El Niño called El Niño Modoki which is different from the conventional study region Niño3.4(5 (McGregor and Ebi, 2018) highlight that the contrasting rainfall fields for conventional El Niño and El Niño Modoki events hint at potential spatio-temporal inconsistencies of ENSO-health associations.Salimun et al. (2014) find that, although displayed much warmer SST anomalies in the Indian Ocean and regional seas in the Maritime Continent, the impact on the winter rainfall during conventional El Niño in boreal winter season over Peninsular Malaysia is minimal but significant higher during El Niño Modoki.Tangang et al. (2017) show that, during winter, a strong La Niña leads to a significant decrease in wet precipitation extremes over the Peninsular Malaysia due to the anomalous cyclonic circulation over strong La Niña.Nevertheless, Hanley et al. (2003) demonstrate that Niño4 index is more relevant to La Niña but poorly explain El Niño whilst the Niño1 + 2 index has the opposite characteristics.These two SST indices altogether cover different types of ENSO and their impact on dengue transmission.
Most high impact weather in synoptic scale occurs where the atmosphere is experiencing rising motion.The vertical velocity measured by the omega equation is associated with high impact weather and cyclones (Dostalek et al., 2017).In a study of the impact of meteorological factors to the air pollution in China, Hou et al. (2018) indicate that the vertical velocity has a short-term influence on PM 2.5 level in the Pearl River Delta.It is expected that the unobserved meteorological variable would add value to our understanding of environmental association with the disease.Wong et al., 2011 use a lagged 22 day mean air temperature to capture the second generation gonotrophic cycle of Aedes mosquitoes to predict ovitrap index.(Cheong et al., 2013) study the effects of temperature, rainfall and wind speed in Selangor with emphasis on their lag times.The lag times of 51 days minimum daily temperature and 28 days bi-weekly cumulated rainfall present a positive association with dengue hospitalisations.The effect from mesoscale local temperature and rainfall is related to some other major synoptic climate oscillations which influences the regional climate of Malaysia such as Indian Ocean Dipole (IOD; Tangang et al., 2012).IOD can happen in conjunction with ENSO or independently.Hong and Jin (2014) discover that the IOD-ENSO interaction is the cause of the generation of Super El Niños.Hameed et al. (2018) also show that the IOD lagged Niño3.4 by three to six months.
The paper by Hameed et al. (2018) conjectures that air pollution, especially ozone, has a profound effect on the mosquito vectors.Thiruchelvam et al. (2018) study the relationships between air quality and dengue hospitalisations.It is asserted that the air pollution index (API) levels do not have a significant effect on the reported cases.However, ozone is proven to have a repellent effect on both Aedes aegypti and Aedes albopictus (Wan-Norafikah et al., 2016).The API used by the Malaysian government follows the Pollutant Standard Index (PSI; Swamee and Tyagi, 1999) by the United States Environmental Protection Agency (USEPA).It considers five pollutants namely carbon dioxide, ozone, nitrogen dioxide, sulphur dioxide and particulate matter with a diameter of less than 10 microns.Individual air quality scores of each pollutant are assigned according to a gold standard.The API is the highest of those given individual scores.Its impact on humans has been thoroughly studied but its applicability to dengue transmission is still questionable.Without knowing which pollutant it refers to, the lagged value of API is meaningless for studying dengue incidence rates.For this reason, it is worthwhile to investigate each of the pollutants separately.
The remainder of this paper is organised as follows.Section 2 describes the data used in this study.Section 3 illustrates the components considered in the Bayesian spatio-temporal models.Model implementation, validation, evaluation of the EWS are discussed in Section 4. A detailed discussion of the results and their implications is provided in Section 5.

Data source
Our data set contains the weekly counts of hospital admissions for dengue fever (Y kt ) in Selangor State and two federal territories in Malaysia (indexed by k) from 2013 to 2019 (indexed by t) obtained from the Ministry of Health (MOH) Malaysia.We also use relevant demographic information obtained from the Department of Statistics Malaysia (DOSM).Nine Selangor districts and two federal territories, namely Kuala Lumpur and Putrajaya, constitute our study region of the Central Region.
Environmental variables such as air pollution index, ozone concentration level (in parts per billion) and temperature are provided by the Malaysian Department of Environment (DOE), Ministry of Environment and Water, whilst rainfall information (in mm) is provided by the Malaysian Meteorological Department (MetMalaysia).Ozone concentration data from the federal territories of Kuala Lumpur and Putrajaya are not available to us.That is why we have replaced these missing values by imputing the average values from the adjacent districts.The Niño4 and Niño1 + 2 SST indices (Huang et al., 2021) capturing sea surface temperature anomalies in the central equatorial Pacific region (5 • N-5 • S, 160 • E-150 • W) and the region of coastal South America (0 • -10 • S, 90 • W-80 • W) are obtained from NOAA Climate Prediction Center.
Gridded (2.5 • × 2.5 • ) reanalysis daily mean vertical velocity in pressure coordinates obtained from the NCAR/NCEP Reanalysis (Kalnay et al., 1996) is aggregated into weekly scale according to the epidemiological week (Epi week) defined by MOH.
Finally, the administrative district areal boundaries are extracted from The Humanitarian Data Exchange and all studied districts are within one (2.5 • × 2.5 • ) grid cell in the reanalysis dataset.

Basic characteristics
A total of 414,284 dengue fever cases are reported in the nine districts of Selangor and two federal territories from January 2013 to December 2019 in the Central Region.The total numbers of cases vary from 26,422 in 2013 to 87,967 in 2019.Since the outbreaks after summer in 2013, there is no clear annual trend until a severe upsurge in 2019 which surpassed three fold of the total cases in 2013 (Fig. 2).

Temporal evolution and lagged effect dependency
The seasonality of dengue incidence rate (DIR) across Selangor is not as obvious as in other geographical regions, such as Thailand, in the existing literature see e.g.Lowe et al. (2016).A weak but consistent seasonality can be seen found in the study period.The weekly mean DIR peaks in the winter and finds another peak in the summer (Fig. 3).
Preceded by a weak El Niño event in 2014, historical observation shows that, the unusual 2015-2016 El Niño was one of the strongest El Niño in history (Lian et al., 2017).The SST anomalies in the central equatorial Pacific acts as a good predictor of El Niño but Trenberth et al. (2002) also suggest that El Niño should be identified along with the difference between the Niño4 and Niño12 indices.The difference measures the gradient in SST anomalies between the central and eastern equatorial Pacific termed El Modoki (Ashok et al., 2007).Fig. 4 shows that both Niño4 and Niño12 indices peaked in late 2015 when a strong El Niño event occurred.An El Niño Modoki event also began in late 2018 where the eastern equatorial SST became cooler.The DIR is closely related to this upward trend of both Niño1 + 2 and Niño4 indices during El Niño (Fig. 4).Taking out the effect of Niño4, the partial correlation between DIR and Niño1 + 2 index is 0.0578 only, although the Niño1 + 2 and Niño4 are highly correlated (Fig. 5).The Niño indices, representing the SST in the central/eastern equatorial Pacific, exert a delayed effect on Peninsular Malaysia local climate.
Following Cheong et al. (2013), a distributed lag non-linear model (DLNM; Gasparrini et al., 2010;Gasparrini, 2011) is used as an exploratory tool.The number of dengue hospitalisation and environmental variables are aggregated into a set of single region timeseries.The lag time with quartic B-splines for the predictors and lag stratifications are then evaluated through a DLNM analysis.The relative risk (RR) at 90% quantile of temperature, ozone, rainfall, ozone, omega, Niño1 + 2 and Niño4 reach their maximum at a lag of 1, 10, 7, 15, 28, and 6 weeks respectively (Fig. 6).Capturing the effect of La Niña, for Niño4, the RR at 10% quantile reaches a local minimum at the 10 week lag.The formation of ozone is heavily influenced by sunlight and temperature (Ghazali et al., 2010).Since     high temperature and presence of sunlight are the confounding factors, ozone has a strong immediate effect on dengue (Fig. 6c).The incremental cumulative RRs of rainfall has a monotonic increasing trend and have a long-range dependency throughout a long lag time.Fig. 6b shows a different pattern of short-term drought and wet scenario, with a very strong and immediate effect during drought (lag time 0-5) and a more delayed association with wet weather peak at a lag of 7 weeks.On the other hand, Fig. 6d shows a strong positive impact from the vertical velocity with a lag time of 9 weeks.It is understood that rainfall and vertical velocity are related to ground-level hydrology.A possible explanation is that drought makes people store water (Gagnon et al., 2001).Pontes et al. (2000) also suggest that household storage of water during the drought is correlated with the increase of Aedes aegypti vector abundance.

Regional variations
The Central Region area, especially for districts adjacent to Kuala Lumpur, became hyper-endemic of dengue transmission due to years of neglect (AhmadMeer et al., 2018).Fig. 7 plots a map of DIR, temperature, rainfall level, ground-level ozone concentration level from 2013 to 2019.The districts of Gombak, Petaling, Klang and Hulu Langat generally recorded higher DIR, mean temperature and ozone concentration level compared to other districts.However, the capital city Kuala Lumpur has shown significant reduction in cases although among the wettest in the region.This regional variation is regarded as a function of degree of urbanisation.An explicit formulation of this type of function is generally infeasible (Chandler, 2005).An entomological explanation to this variation is related to the abundance of the breeding areas of Aedes aegypti and Aedes albopictus.In an entomological surveillance study for two villages in Selangor, Noor et al. (2018) show that two species are indoor and outdoor breeders respectively.The transmission of the dengue vector is a combined effect of two species.Hence, this socio-economic difference between the two districts is a source of the step change in the cases count.Both the federal territories, Kuala Lumpur and Putrajaya, have lower rate of cases than the surrounding districts.This may be attributed to the increased activity of the enforcement agencies and anti-dengue campaigns conducted in the capital city (Hassan et al., 2012).

Spatial dependency
We first consider an independent Poisson generalised linear model for the disease count Y kt defined in Section 2.1 and covariates found in the exploratory analysis through DLNM of the form: where μ kt is the expected number of cases in the district k at time t, e kt is the population size in the district k at time t, Niño12 t is the Niño1 + 2 index at time t, Niño4 t is the Niño4 index at time t, Ozone kt is the ground-level ozone concentration level at time t in district k, Capital k is a binary variable indicates whether the district is Kuala Lumpur, Temp kt and Rain kt is the temperature and total rainfall in the week t, omega t is vertical velocity of air motion derived from weather model at time t.Note that here a lag of three weeks is used for temperature as one week is not a practical lag time for an EWS.We calculated the associated Moran's I statistics (Moran, 1950) for both spatial and spatio-temporal neighbourhood matrices.The statistics are 0.7075 and 0.72402 with p-values less than 0.001%.It indicates both the spatial and spatio-temporal variations have not adequately been captured through the generalised linear model.

Overdispersion
Overdispersion behaviour (Lawless, 1987) often exists in many disease count datasets.Lowe et al. (2011) suggested that a negative binomial model would adequately accommodate an extra-Poisson variation in the dengue case.We fit a negative binomial model using maximum likelihood estimation through a built-in R (R Core Team, 2021) function glm.nb with Eq. ( 1) and the estimated dispersion parameter is 2.61.The amount of overdispersion is quite high.It appears that the Poisson distribution is better suited to explain the "number of infected groups" rather than the total disease count.Represented as a compound Poisson distribution with a logarithmically distributed count per group (Quenouille, 1949), the negative binomial distribution turns out to be a reasonable model.

Model developments
The Bayesian hierarchical modelling approach is a flexible framework to describe the statistical properties in the previous Section.The components of the model formulation can be individually specified conditional to other parameters and data.In this Section, we will go through the key components of our candidate models.

Negative binomial regression
To overcome overdispersion, we use the negative binomial parametrisation which introduces r as a universal control parameter for overdispersion (Gelman et al., 1995).The probability mass function is given by: where the mean and variance of the random variable are E[Y kt ] = μ kt and Var[Y kt ] = μ kt + μ 2 kt /r.As r goes to infinity, the distribution of Y kt converges to the Poisson distribution.

Besag-York-Mollié model
The Besag-York-Mollié model (BYM; Besag et al., 1991;Besag and Kooperberg, 1995) specifies the additive relationship of the overall risk level as an intercept, the fixed effect by the covariates, the pure random effect θ kt and the spatial variation component ϕ k : where θ kt is a normally distributed unstructured error and ϕ k is the structured error modelled by an intrinsic conditionally autoregressive model (ICAR).It has a conditional specification that is normally distributed with a mean equal to the average of its neighbours (ϕ k∼j ) and its variance decreases as the number of neighbours d k increases: An alternative form of BYM (BYM2) model proposed by Riebler et al. (2016) pure residual randomness in which the terms ϕ k and θ kt combined to one entity ϕ kt : where logit(ρ) ∼ N(0, 1), ϕ * k is the ICAR model, θ * kt ∼ N(0, 1), s is the scaling factor computed from the neighbourhood graph.Meanwhile, σ is the overall standard deviation of the two random effects.

Dynamic structure
Dynamic linear model (West and Harrison, 2006) is a special type of state-space model that enables a sequential model definition in the time series context and information propagates conditional to existing information.Taking a negative binomial model as a starting point, Eq. ( 2) is now a top-level observation equation, the spatio-temporal structure is defined as follows: where the overdispersion parameter follows a Gamma distribution with hyperparameters a and b, α is the autoregressive (AR) parameter to control temporal dependency between adjacent time points, ω kt is the Gaussian distributed evolution error.Initial information is required for this temporal structure, s is the scaling parameter controls the proportion of a spatial and non-spatial variation, W is the neighbourhood information formulated as a connected graph.The AR(1) model in the system equation could be understood as a moving average model of infinite order MA(∞) which aggregates all its lagged unexplained residuals as an additional piece of information.Sahu et al. (2009) impute the initial mean by the observed grand mean for a spatial point reference modelling problem.Alternatively, we choose to estimate the initial mean and set log(λ k0 ) to follow a normal distribution with a non-informative prior for both m 0 and σ 2 0 .

Modelling results
We consider five models with different levels of complexity (Table 1).The regression part of the model x kt ′ β is specified by the following setup: No-U-turn sampler (NUTS) is used for Markov chain Monte Carlo (MCMC) sampling Hoffman et al. (2014).Nishio and Arakawa (2019) suggest that NUTS performance is better than Gibbs sampling due to the high effective sample sizes and low autocorrelations in some statistical applications.

Model assessment
Expected log pointwise predictive density (elpd; Vehtari et al. (2017)) is used to compare model performance.The criterion is estimated by leave-one-out cross-validation to mimic out-of-sample prediction data and the looic = − 2 êlpd loo will be used for reporting to provide typical Bayesian conventional scale of deviance information criterion (DIC; Spiegelhalter et al., 2002).The overall fit of each model is summarised in Table 1.The negative binomial family of models (Model B, C, D, E) outperforms the Poisson model (Model A).The negative binomial dynamic model (Model D) with the lowest looic fits the data better than the other four models.The looic of the negative binomial spatial dynamic model (Model E) and Model D differ by within one standard error.
Models B and C are well-specified because the effective number of parameters (pLOO; Vehtari et al., 2017) is smaller than the actual total number of parameters in the models whilst Model A is misspecified due to failure to capture overdispersion.Bürkner et al. (2020) show that elpd/looic estimates are overly optimistic because the future observation has an influence to predictions of the past.Since the pLOO is the difference between elpd and the non-cross-validated log posterior predictive density, thus the pLOO is overestimated

Table 1
Model performance by the LOO information criterion (looic), where pLOO is the estimated effective number of parameters of the model.

Model
System equation looic pLOO under any dynamic setting.The evidence is inconclusive to determine whether Model D and E are well-specified or not.A further model validation procedure is required to check their validity.

Environmental and regional risk factors
Note that models A, B and C possess lower looic values when those are compared to the dynamic models.However, they preserve a considerable explanatory power.This is evident by taking a closer look at the coefficient estimates of Model C, which are presented in the form of relative risk (RR) in Table 2.The covariate lag(Nino4, 6) and lag(Nino4, 10) have a strong positive relationship with the disease, for each degree increase of the indices, the RRs increase by 46.87% and 8.44% respectively.Meanwhile, the Niño1 + 2 index of lag time 28 weeks decreases by 2.26% for each degree increase.The pollutant ozone has a negative effect on the disease.For every 10 ppb increase in concentration level, there is 3.13% decrease in dengue incidence.Kuala Lumpur has 40.40% expected cases lower than other regions.The local weather-related variables have a lesser impact on the RR with only 0.90% and − 3.83% for a unit change in rainfall and temperature.The vertical velocity has a mild impact with only 3.61% RR increment for each 0.01 unit increase.The Niño4 index is the dominant factor and a negative temperature effect is seen as an adjustment to ENSO's impact.With a positive Niño4 and a negative Niño1 + 2 RR, although of different lag times, this provides a shred of indirect evidence that the central equatorial ENSO exerts a stronger impact on dengue disease than the convention ENSO.

Prediction for dengue epidemics and an early warning system
Four model validation criteria: root mean square error (RMSE), mean absolute error (MAE), continuous ranked probability score (CRPS; Hersbach, 2000), coverage at 95% nominal level (CVG; Sahu, 2022) are used for comparing out-of-sample model performance.The first two criteria evaluate the model performance in terms of mean response.The latter two are related to probabilistic forecasts.CRPS measures the discrepancy between the observations and the whole predictive distributions whilst the CVG detects underfitting and overfitting if the criterion drifts away to the nominal coverage probability of 95%.For the criteria RMSE, MAE and CRPS, better predictions correspond to their corresponding lower values.All these criteria values are calculated using the R package bmstdr developed by Sahu (2021).
Provided by European Centre for Medium-Range Weather Forecasts (ECMWF), a high resolution (HRES) integrated forecast system run every 12 h generates up to 10 days forecast (Owens and Hewson, 2018).In other words, replacing the daily mean temperature at lag time of 3 weeks by the forecasts provided by ECMWF, an EWS will have a capacity to produce outbreak detection signals at least four weeks in advance.One of the useful ways to disseminate outbreak detection is to use a visualisation called epidemic channel (Runge-Ranzinger, 2016).The probability of the dengue cases exceeding a certain threshold h in the region k at the time t is the complementary cumulative distribution of the random variable Y kt .It can be computed by obtaining its empirical samples from the MCMC outputs with the equation: where y kt and M are the mth empirical samples of Y kt and the number of empirical samples obtained from the MCMC algorithm respectively, I( ⋅ ) is an indicator function that the set A is the condition when the empirical sample from the posterior predictive distribution exceeds the threshold h (i.e.: A = {x ∈ N 0 ; x > h}).Once the predictive probability with a threshold level between 0.08 − 0.2 (Bowman, 2016) for future dengue cases exceeds a certain alarm value (e.g.: cases more than two standard deviation of the five-year average), an alarm signal forms when the weekly case numbers enter the "alarm zone".
An out-of-sample probability forecast for the weekly reported cases in 11 districts and federal territories in the first four weeks in 2019 is generated from all model candidates.Table 3 summarises the values of the four model validation criteria from the fitted models using 2013-2018 data.Model D is the best model in terms of RMSE and MAE.Model E, although not being optimal in the first three criteria, appears to be the most adequate model if we consider its CVG.The sensitivity and specificity (Bowman et al., 2016;Lowe et al., 2016) represent the hit rate and true negative rate of an EWS.In order to achieve the goal of identifying potential outbreaks with high sensitivity, the probability threshold level is set to a relatively small value.This is also due to the concern that a single miss of a disease outbreak is costly from a disease surveillance point of view.
A receiver operating characteristic (ROC) analysis shows how sensitivity changes with the specificity.The optimal probability threshold level which balances the trade-off between sensitivity and specificity is defined as the probability threshold of issuing an alert that maximises the product of sensitivity and specificity (CZ value;Liu, 2012).The ROC curve achieves the maximum CZ value at the point the probability threshold p = 0.15 as the out-of-sample forecasts considered in the first four weeks in 2019 (Fig. 8).Setting the probability threshold level to 0.15, it means the posterior predictive distribution at 85 percentile exceeds the predefined alarm values of the reported cases greater than two standard deviations of the five-year average at each district driving an alarm signal.Using the same out-of-sample probability estimates for model validation statistics, Model E exhibits the highest sensitivity and a moderate specificity.A careful look at both Model D and E shows that the differences among the models with regard to the looic and model validation criteria are quite small.Model E appears to be preferable after evaluating the overall performance measures.
We find that the Niño4 index provides a high predictability for the number of dengue cases.It makes a longer-term prediction of dengue outbreaks feasible.Meng et al. (2020) show that a complexity-based approach allows us to forecast the magnitude of an ENSO S. Yip et al. event one year in advance.Ham et al. (2019) utilise a convolution neural network (CNN) to predict zonal SST (in their example, Niño3.4 region) by learning from historical simulations of a multi-model ensemble (Bellenger et al., 2014).Ballpark figures generated from a simpler EWS with ENSO information can be then assessed by the government agency.A longer-term climate uncertainty analysis (Yip et al., 2011;Northrop and Chandler, 2014) can be easily plugged into a disease mapping setting (e.g.: Baker et al., 2021).

Discussion
This paper presents a Bayesian spatio-temporal modelling framework leading to a full implementation of an EWS for dengue outbreaks from upstream data source to production.We propose to utilise some global and regional climate observation and variables derived from reanalysis data for a more accurate forecast.Exploratory data analysis methods show a long range of lag times is required   for some synoptic scale meteorological variables namely Niño12 and Niño4 indices.Our proposed holistic assessment goes beyond a single cross-validation metric.The whole assessment consists of calculation of out-of-sample predictive accuracy in multiple ways and alert signal evaluation.The methodology developed in this study can potentially be used to build a similar EWS in other countries or regions in Malaysia.Dissimilar to horizontal wind, the vertical motion is often neglected due to its unobserved nature.In contrast to the environmental variables considered in previous studies, this study also considers a vertical velocity of air motion derived from reanalysis data and reveals to have a mild effect to the epidemics.Similar to many other studies, temperature and rainfall are used in the regression formula.Contrary to the findings of other studies (e.g.Lowe et al., 2016), the estimated coefficient of lagged temperature is negative.This appears to be a case of a local adjustment to a larger scale regional effect dominated by ENSO.The estimate of lagged ozone ties well with the biological argument based on Aedes's gonotrophic cycle in Wong et al. (2011).
The RR estimates from the Model C exhibit that strong lagged anomalous warming in the Niño4 region has a strong positive effect on dengue hospitalisations.Consistent with our present findings, Gagnon et al. (2001) also report a significant positive correlation between El Niño and dengue epidemics in multiple countries.With a less-than-one RR for the lagged Niño1 + 2 index, cooling in the eastern tropical Pacific contributes to the increased dengue.This distinct relationship suggests that both El Niño and El Niño Modoki play a role in the epidemics.A previous study by Petrova et al. (2019) mentions that dengue epidemics can be associated with different teleconnections for different time lags.Dengue transmission is sensitive to the variability of rainfall due to its cumulative nature.A recent finding shows that a strong positive IOD which leads to drought (Amirudin et al., 2020) can be linked with the pre-existing El Niño Modoki with lead time up to one year (Doi et al., 2020).Our results align to these claims.
Due to data limitations, the impact from spatio-temporal variations of virus serotype are missing from the study.An anomalous upsurge happens twice in our study period, the first one occurred in the 2013 summer is verified by microbiology evidence (Ng et al., 2015).The second one observed in early 2019 is thought to be due to another serotype shift.A self-service EWS received a user feedback that change of predominant serotype alone attains a 50% of sensitivity of outbreak detection (Hussain-Alkhateeb et al., 2018).Currently a passive surveillance system is being applied in Malaysia.Hence, there is a very high possibility of underestimation of the public health burden because of the combined effect of delayed and under-reporting of cases.The actual number of cases can be four to five times of the number of reported cases (Shepard et al., 2012).Another data limitation due to the change of support problem (Gelfand et al., 2001) on the point-referenced rainfall and ozone concentration levels data concerns with the spatial misalignment of data used to infer district-level dengue counts.The point-to-district matching brings extra uncertainty into the prediction.However, as the data recorded are averaged out temporally by week, the effect can be seen as negligible.
Although the transmission dynamics is proven to be temperature-dependent (e.g.: Chen and Hsieh, 2012), the relationship between entomological parameters and the environment variables has not yet been clearly studied.A recent article by Sun et al. ( 2021) studies a residential-block-level dengue vector population in Singapore.It is shown that the Aedes abundance is heavily associated with the building age and managed vegetation cover.With modern geographic information systems (GIS) technology, these information can be incorporated in the future work.
Thanks to the flexibility of our modelling framework, joint modelling on multiple diseases is a possible methodological extension.Caminade et al. (2017) show the mosquito-borne transmission of Zika in South America is fuelled by the El Niño climate phenomenon.Funk et al. (2016) suggest, with their extensive sensitivity analysis, models for dengue transmission can be useful for handling the dynamics of Zika transmission.Held et al. (2005) demonstrate that the joint modelling approach on multiple diseases achieves a gain in precision of the RR estimates.Niriella et al. (2021) spot a sharp decrease in dengue cases for the second quarter of 2020 compared with pre-COVID-19 peaks in Sri Lanka.The drastic measures imposed by the Sri Lanka government regarding COVID-19 outbreaks help the reduction of hospitalisations.An identical pattern is also found during the first five phrases COVID-19 lockdown in Malaysia (Ong et al., 2021).
Contrast to the No-U-turn sampling scheme explained earlier, integrated nested Laplace approximation (INLA; Rue et al., 2009) algorithm is a fast alternative to our existing methodology (e.g.: Lowe et al., 2018;Lowe et al., 2021).The model A, B and C can be naturally fitted by INLA with a significant faster speed.However, The dynamic models (model D and E) are not currently implemented in the INLA R package (Lindgren and Rue, 2015) due to its complex state-space structure.A vector autoregression component (VAR; Spencer, 1993) can be also added to our current setup to incorporate dynamic lagged effect from other variables.Implemented in Stan language (Carpenter et al., 2017), conditional dependence such as spatial heterogeneity, temporal dynamics and covariate structure can be simply introduced and modified under the hierarchical Bayesian modelling paradigm, allowing for greater modelling flexibility.

Conclusion
The SST anomalies with a lag time of six weeks in the central equatorial Pacific region is the most crucial driver to the Central Region of Malaysia dengue hospitalisations.The EWS built on a Bayesian spatio-temporal hierarchical model yields reliable forecasts to help out dengue disease outbreak surveillance for at least four weeks in advance.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig. 1.Geographical distribution and population in the Central Region of Malaysia in 2019.

Fig. 4 .
Fig. 4. Time series of (a) average DIR , (b) lagged four-week average of Niño4 index, (c) lagged four-week average of Niño1 + 2 index in the Central Region, Malaysia for the period 2013-2019.

Fig. 5 .
Fig. 5. Pairwise scatter plots of the DIR along with the covariates used in the models.

Fig. 6 .
Fig.6.RR surface of dengue hospitalisations by six variables.The variable Temp is the temperature, Rain is total precipitation, Ozone is the groundlevel ozone concentration level, omega is vertical velocity of air motion derived from omega equation, nino12 is the Niño1 + 2 index, nino4 is the Niño4 index.

Fig. 8 .
Fig. 8.The receiver operator characteristic curve for the number of dengue cases forecasts for the first four weeks in 2019 in the Central region of Malaysia.

Table 2
Parameter estimates of RR for the negative binomial spatial model (Model C).

Table 3
Model validation statistics and performance measures of early warning system derived from five models, A-E.