Bayesian analysis of zero inflated spatiotemporal HIV/TB child mortality data through the INLA and SPDE approaches: Applied to data observed between 1992 and 2010 in rural North East South Africa

Highlights ► South Africa's HIV/TB burden in children and modelling which results in mortality risk maps. ► Analyses of hierarchical spatiotemporal data with zero inflated outcomes. ► Resolving the “Big N” on Gaussian fields (GF) by converting to Gaussian Markov Random Fields (GMRF). ► Use of fast and accurate approximate Bayesian algorithms, i.e. INLA and SPDE in lieu of MCMC. ► Medical or natural assumptions driven modelling that informs further research and policy.


Introduction
Public Health data on mortality have been growing increasingly rich as more accurate information on "who", "where" and "when" becomes available. These form hierarchical (multilevel) data structures which are correlated such that person-level ("who") information can be repeated, geo-statistical ("where") data often has spatial correlation and temporal ("when") data can be autocorrelated. Classical statistical techniques are usually based on independent observations, but when applied to multilevel data structures they often underestimate the standard errors. As a result of this the statistical significance is overestimated leading to erroneous results and subsequent inferences (Cressie, 1993).
This defeats the main goal in epidemiological analysis, which is to identify and quantify correctly any exposures, behaviours and characteristics that may modify a population's or individuals risk and use these to implement more appropriate interventions (Rose, 2001).
In modelling hierarchical data we can take into account spatial and temporal correlations by introducing in the model spatiotemporal random effects. Several other hurdles have to be overcome when modelling hierarchical mortality data such as: zero inflation when there is a greater proportion of non-occurrence for an outcome, handling large data structures, repeated measures and estimating many parameters rapidly and accurately. Bayesian techniques with the aid of the Markov chain Monte Carlo (MCMC) simulation methods have successfully overcome these hurdles and fit spatiotemporal random effects for reasonably sized geolocations of Gaussian fields (GF) (Berliner et al., 2000;Gilks et al., 1996;Casella and Robert, 1999;Wikle, 2003;Wikle et al., 1998). However as the number of geo-locations increases, MCMC computations of a dense GF m × m spatial correlation matrix become infeasible or extremely slow in the order of power three (O(m 3 )), this problem is popularly known as the "big m" or "big N" (Banerjee et al., 2004). Several approaches have been used to resolve the "big m". Banerjee et al. (2004) give brief summaries of these: subsampling, spectral, lattice, dimension reduction and course fine coupling methods (Banerjee et al., 2004(Banerjee et al., , 2008Banerjee and Carlin, 2003;Kamman and Wand, 2001;Johnson et al., 1990;French et al., 2002). Generally these techniques attempt to reduce the dimension of the GF by selecting a "representative" sub-sample or fixing some parameters or changing the scale from continuous to discrete with the aim of reducing the computational burden in running MCMC simulations.
We addressed this problem using techniques proposed by Rue and Held (2005) who changed the continuous scale GF to a discrete scale Gaussian Markov Random Field (GMRF), for the Matérn family of covariance structures (Rue and Held, 2005). More recently Lindgren et al. (2011) provides the detail of how the GF and GMRF relate via Stochastic Partial Differential Equations (SPDE) using basis functions Cameletti et al., 2012). Secondly we performed inference and prediction using Integrated Nested Laplace Approximation (INLA) well suited for GMRF as opposed to the commonly used MCMC (Rue et al., 2009). Hence we greatly reduced the computational burden and could do in hours what usually took days having reduced the computational operations for a spatiotemporal model from power 3 to power 1.5 (O(m 3 ) → O(m 3/2 )).
The aim of this paper is to discuss and apply a Bayesian model that can handle large zero-inflated spatiotemporal observational data on mortality producing reliable estimates speedily. In Section 2 we explore the Bayesian methods and model fitting inference, prediction and goodness of fit. Section 3 we apply the discussed Bayesian spatiotemporal model to the data from Agincourt in rural South Africa which has 71,057 children aged 0-9 years living in 15,703 households over the years 1992-2010. In Section 4 we discuss the merits of our model and distil the Public Health implications of our results in interventional studies.

Spatiotemporal model structure
The outcome y i (s j , t) was the observed HIV/TB related death of a child i = 1, . . ., N from a given household j = 1, . . ., m in a specific year t = 1, . . ., T which is a realisation of the spatio-temporal process y(. , .) ∈ Y(. , .). Assuming the outcomes distribution belongs to the exponential family of distributions, we can fit flexible structural additive models belonging to the generalized linear mixed models (GLMM) (Brezger and Lang, 2006;Fahrmeir and Lang, 2001). Our data may be represented by the equation: where X(s j , t) is the design matrix with fixed p covariates, ˇ = (ˇ0, . . . ˇp) is the regression coefficients vector, f(.) which is one of the f (j) used to relax the linear relationship or introduce random effects or both and ε i (s j , t)∼N(0, 2 ε ) are the error terms which are neither temporally nor spatially correlated (Cameletti et al., 2012). As our data are spatially and temporally correlated we can introduce random effects f(.) = f(s j , t) a Gaussian random field with a first order autoregressive temporal effect (s j , t − 1) and coefficient and zero mean multivariate normal (temporally independent) spatial effects ω(s j , t)∼MVN(0, ˙ = 2 ω C(||s j − s k || = h); j / = k) resulting in the equation: where | |<1 in case of stationarity, (s j , 1)∼N(0, 2 ω /(1 − 2 ) = 1/ ω (1 − 2 )) and the spatial effect is second order stationary and isotropic. When the spatial correlation follows a Matérn covariance structure we obtain C(h) = (1/( ( )2 −1 ))(Äh) K (Äh) for the Euclidean distance lags h. The parameter measures the degree of smoothness and also the order of the modified Bessel function (when > 0) of second kind K and finally Ä > 0 is the scaling parameter with a range = ( √ 8 /Ä) where the spatial correlation is close to 0.1 for each Rue et al., 2009).

Zero inflated Poisson and Binomial spatiotemporal models
Observational binary outcome data are commonly analysed using the logistic regression model, which has a logit linear predictor in the GLMMs canonical link structure. However this model has problems of instability especially with spatial random effects, which would be exacerbated due to zero inflation (Agarwal et al., 2002). In epidemiological cohort studies a relative risk is more preferred than an odds ratio as this caters for temporality and also a more intuitive measure of burden of morbidity or mortality (Barros and Hirakata, 2003;Fekedulegn et al., 2010). In light of this and in an endeavour to have better fitting models, two models that can handle zero inflation were employed. The two conditionally independent models fit were the zero inflated Poisson and zero inflated Binomial with a log and a logit canonical link functions respectively: The mortality outcome (count/binary) data y t (s j ) observed at the households in the Agincourt area are zero inflated and assumed to follow either a Poisson (count) or Binomial distribution (binary). We will occasionally drop the s j for notational convenience in the rest of the article. We therefore resorted to the zero inflated models to cater for the imbalance due to many zeros. The model that takes care of zero inflation (Â t ) can then be represented as: With canonical links of the expected means: g(E(y t (.)) = log( t ) = Á t and g(E(y t (.)) = logit(p t ) = Á t with means are t = exp(Á t ) and p t = (exp(Á t )/(1 + exp(Á t ))) for the Poisson and Binomial distributions, respectively. A spatiotemporal canonical link (linear predictor) model can be expressed as where ε t (s j )∼N(0, 2 ε I m ) with identity matrix I m of dimension m × m, ω t ∼N(0, ˙ = 2 ω˜) with a stationary AR(1) process 1 ∼ N( 0, /(1 − 2 )) (Cameletti et al., 2012). The ˙ is a dense GF m by m dimensional matrix from a Matérn distribution with scale and smoothness parameters Ä and (which is fixed in all our computations) respectively. As the size of m increases computations become increasingly more difficult due to the "big m" as previously highlighted.
2.3. Solving the "big m" using the SPDEs to estimate the spatial random effects To resolve the computational burden associated with the GF Matérn covariance function we used a technique that changes this to a GMRF proposed by Rue and Held (2005). Briefly the locations are converted into areal triangulations firstly by making them the initial triangle vertices before adding more vertices for proper triangulation which extends the grid and very useful for prediction. Fig. 1 shows how we employed triangulation to our data, the diagram on the right was made assuming households within 500 m were similar we fit 488 vertices and 938 triangles.
The SPDE technique redefines the Matérn field as a finite "representative" linear combination of basis functions on a triangulation of the locations (Cameletti et al., 2012). Hence for our spatiotemporal random effects in GMRF representation we get ω t ∼N(0, ˙ = Q −1 s ) and 1 ∼N(0, where Q s is a sparse time-invariant precision matrix with dimension m* vertices from the triangulations. A joint spatiotemporal GMRFf t ∼N(0, Q −1 = (Q T ⊗ Q s ) −1 ) whose precision matrix is the Kronecker product of the temporal and spatial precision matrices; is such thatf t ≈ Bf t where the basis B is a sparse matrix with unit elements for matching triangle vertices and zero's elsewhere . Therefore Eq. (4) becomes Á t = X tˇ + Bf t + ε t , where we let x t = {ˇ, B, Â t } be the vector of latent Gaussian fields and ϕ t = { 2 ω , , Ä, 2 ε } being a vector of unknown parameters. We can thus express our model into a hierarchical Gaussian latent variable fashion as follows, stage 1 -observational equation, stage 2 -latent Gaussian field and stage 3 -parameter model (Simpson et al., 2011): where the precision matrices Q −1 . (.) are either small enough (for easier multiple factorisation) or sparse (Simpson et al., 2011b). These models cover a wide range of models and are easily estimable using INLA as shown in the next subsection.

Bayesian inference using INLA
In accordance with the Bayesian paradigm we aim to find the posterior distribution of the processes and parameters updated by data (Wikle, 2003). This could be expressed as: Applying this expression to our model, letting = {ˇ, , Â t , 2 ω , Ä, 2 ε } denote the vector of all parameters and dropping the subscripts to present in vector form, = { t } and data y = {y t } (Cameletti et al., 2012;Rue et al., 2009), their joint posterior distribution is thus: Fig. 2 gives a simplified pictographical view, where level 1 are the data and assumed distributions, level 2 is a process, level 3 and 4 are parameters and level 5 gives the default hyper-parameters used in the INLA package.
The posterior marginals are required, standard (i), nested approximation (ii) and numerical integrations (iii) for latent fields x = {ˇ, f, Â} and hyper-parameters ϕ = { , 2 ω , Ä, 2 ε } respectively (Rue and Held, 2005;Rue et al., 2009) Using this technique we aim to initially get the terms "nested" inside the integrand in Eq. (7ii) left hand side without integration (Simpson et al., 2011b). To do so we firstly estimate the marginal˜ G (x|ϕ, y) which is a Gaussian approximation of x with mode x * (ϕ), for a given ϕ. Secondly we estimate˜ (x i |ϕ, y) = N(x i ; i (ϕ), 2 i (ϕ)) using either Gaussian or Laplace or a simplified Laplace approximations (Rue et al., 2009). We computed these marginals using the R package Integrated Nested Laplace Approximation (INLA) which uses the simplified Laplace approximations (Rue and Held, 2005;Rue et al., 2009;R-cran, 2010). The INLA procedure also enables easier spatial prediction since it computes posterior conditionals for the spatial random effects on all triangulation vertices including the extensions as shown in Fig. 1.

Model goodness of fit and convergence diagnostics
We assessed the accuracy of˜ (ϕ|y) using the effective number of parameters, which can be approximated as the difference between the dimension of the normalised integral {˜ (ϕ|y)} n and the trace of the product of the prior precision matrix and the posterior covariance (Spiegelhalter et al., 2002): The deviance information criteria (DIC) was also used which is defined as the difference between twice the mean of the deviance and the deviance of the mean according to Spiegelhalter et al. (2002) and expressed as: Interpretations of these is quite straight forward the smaller the effective number of parameters the more parsimonious the model and the smaller the DIC the better the model fit, more-so most parsimonious is not always the best model.

Rural South Africa Agincourt HDSS data, study design and ethics
The Agincourt health and demographic surveillance system (HDSS) site was set up in the Agincourt sub-district in 1992 due to its remote location, availability of several clinics and presence of Mozambican in-migrants (Tollman et al., 1999). By 2010, the Agincourt HDSS had a population of over 84,000 persons living in approximately 17,000 households scattered throughout 27 neighbouring villages. Cause of death data were obtained through verbal autopsies conducted on every recorded death (Clark et al., 2007). Interviews were conducted by trained field worker. This was done  within 1 year after a death, with the closest caregiver of the deceased in their mother tongue. Cause of death was independently determined by two medical practitioners with the third as a tie breaker. Their consensus cause of death was classified according to the World Health Organization's International Classification of Diseases (ICD10) (Kahn et al., 2000). HIV/TB mortality in children was ascertained by the reported signs and symptoms, and in some instances this was verified through the mother's cause of death (Kahn et al., 2000). Over 90% of the HDSS households were geo-coded by 1992 and by 2010 all the households were geocoded, thus enabling spatial analyses at household as well as village level. The study design was a retrospective cohort study covering households observed from the onset of the site to December 2010. The Agincourt HDSS site was granted ethical clearance by the University of the Witwatersrand's Committee for Research on Human Subjects (No. 960720). This work was also granted ethical clearance by the University of the Witwatersrand's Committee for Research on Human Subjects (M081145). Verbal informed consent was obtained when the census rounds were conducted and also when verbal autopsy data were collected from a close relative of the deceased.

Dependent and independent variables
The persons included in the study were all the children aged between 0 and under 10 years who lived or had lived in the Agincourt HDSS between January 1992 and December 2010. The independent variables used were: child's gender, birth-weight category, age category, slum (availability of water, electricity and toilet) and number of mother's antenatal clinic visits during pregnancy and year of observation. The household's latitude and longitude were used to construct the latent variables for the spatial random effects and year the AR(1) temporal effects. The dependent variable was death due to HIV and or tuberculosis (TB) determined by the WHOs ICD10 verbal autopsy codes A16-A19 1 for HIV and B20-B24 2 for TB. These data were extracted using Structured Query 1 A16 = respiratory tuberculosis; not confirmed bacteriologically or histologically; A17 = tuberculosis of nervous system; A18 = tuberculosis of other organs; A19 = miliary tuberculosis. 2 B20 = human immunodeficiency virus (HIV) disease resulting in infectious and parasitic diseases; B21 = human immunodeficiency virus (HIV) disease resulting in Table 1 Univariate and multiple regression results models using zero inflated Poisson adjusting for spatiotemporal random effects.

Descriptive statistics
There were similar proportions of boys and girls (50.3%) with an average age of 6.14 years (standard deviation of 3.38). Almost 9% of the births were low-weight, the majority (60.3%) of the children were only observed after their 5th birthday. Only 8.09% had access to all three: clean tap water, flush toilet inside house and electricity, which is expected in typical rural areas. Health checkups during pregnancy averaged about 6 visits, with the majority (73%) still not going at all. A total of 456 deaths were HIV/AIDS (including HIV/tuberculosis) which was a small proportion of the population indicative of the presence of zero inflation. Mortality was moderate in the early to mid 1990s before gradually growing from 1999 and reaching a peak in 2001, remaining high for several years before gradually declining from 2007 as shown in Fig. 3.

Zero inflated Poisson and Binomial spatiotemporal modelling results
We performed univariate and multiple variable analyses using both the ZIP and ZIB models. Multiple variable models were fit systematically starting with one without random effects, spatial only, temporal only and finally spatiotemporal random effects. The Rcodes using INLA are given in Appendix 1. in Tables 1 and 2, our discussion will be centred on the ZIP spatiotemporal model which was the best fitting model since this had the lowest DIC (4532.31) and catered for spatiotemporal random effects, as later indicated in Section 3.5.
Two variables consistently showed significant associations with child HIV/TB mortality, firstly the age category of the child, which showed a decrease in mortality with increase in age (Chi-Square p-value < 0.001). Those aged 1 to under 5 (127/19,619) were 82% {0.18(0.14; 0.22)} less likely to experience a death due to HIV/TB compared to those under 1(262/8580) and those over 5 and under10 (67/42,858) were 96%{0.04(0.03;0.05)} less likely to die     {0.73(0.53;0.99)} less likely to die due to HIV/TB relative to those who were born with a low birth weight of less than 2.5 kg (76/6329) independent of other risk factors. The only other significant protective factor was the number of antenatal clinic visits made by the mother; for every additional visit the mother attended the risk of losing the child from HIV/TB decreased by 4% {0.96(0.94;0.97)} keeping other variables constant as shown by the spatial only and no-random effects model. Those who had access to all three electricity, water and flush toilet (who in this area are the more affluent) experienced fewer deaths due to the HIV/TB compared to those who had none. We show the posterior marginal point estimates from the ZIP and ZIB models in Fig. 4 top and bottom respectively. As shown in Fig. 4, the region with the greatest risk of child HIV/TB mortality is the central region with a trend towards the south-westerly. In the context of our findings, this is the region whose households have lower birth weights, greater under 1 mortality and having lower visits to the health facility compared to households in other regions. The lowest risk region is the top northeasterly with the ZIP model showing few clusters than the ZIB model. The standard error maps indicate the greatest errors on the peripherals which are also the regions where the extensions of the triangulation shown in Fig. 1. Also consistent with our results from Tables 1 and 2 there is a greater margin of error with the ZIB model as compared with the ZIP model.

Model assumptions, goodness of fit and convergence diagnostics
Using the final spatiotemporal models in both ZIB and ZIP we investigated the presence of zero inflation, temporal and spatial correlation. The null hypothesis for zero inflation was that the inflation parameter Â t is zero (no zero inflation) our results in Tables 1 and 2 indicate significant presence of zero inflation of 0.04 and 0.42 for ZIP and ZIB respectively. Investigation of temporality was done by testing the coefficient of the first order autocorrelation model with null hypothesis that the process was not stationary | |>1. Both our models indicated significance presence of stationarity that is 0.86 and 0.87 for ZIP and ZIB respectively. Fig. 5 shows the posterior means plus 95% credible bands, time series plots from the modelling which resembles the observed data shown in Fig. 3. Lastly on checking model assumptions we investigated the presence of spatial effects by inspecting the significance of the components (Ä and ) of the Matérn covariance (see Tables 1 and 2). Also this was investigated with the classical Moran's indices on the spatial residuals testing the null hypothesis that there is zero spatial autocorrelation which from both tests yielded significant results, hence rejecting the null.
We compared the two models the ZIB and ZIP and found that the latter fitted better in all the variants from the no-random effects model to the spatiotemporal as this yielded smaller DICs and model parsimony. Some of the parameter estimates were similar, except for the zero inflated parameter, age categories and birth weights. The ZIP model also showed more stable estimates and consistent results in the adjusted relative risks (adjusted RRs) compared to the ZIB and narrower credible intervals. As a way of showing the convergence pictographically for some parameter estimates we show the fixed effects logarithmic form posterior means density plots in Figs. 6 and 7. The plots in Figs. 6 and 7 show that the parameters fitted well as GMRF since they are symmetrical about thus the mean equals the mean and mode.

Conclusions
The Public Health rational of our results is centred on lack of health seeking behaviour in rural areas. Interventions could target two groups of mothers (also caregivers) those not yet infected and those already infected. For both groups maternal attendance of health facility can minimize the risk of under one mortality and possibly giving birth to low-weight children. For those mothers' already infected several options are there; before child birth (prevention of mother to child) and after birth (anti-retro-virals -ARV uptake) and also early treatment on ARVs for the parent to increase their longevity. A significant decline in mortality can already be observed in our cohort post 2007 the year when ARVs were started in the area. Poverty however still remains the greatest challenge in the developing countries and mostly interventions when available are very scarce. As such our results from spatiotemporal Bayesian modelling coupled with maps can be very handy in allocation of the limited resources of aid.
Modelling that controls for potential confounding is an addon to epidemiological studies and gives strength to the estimates derived from such modelling procedures. Our modelling approach caters for potential spatial and temporal confounders. This approach was able also to cater for "large" zero inflated spatiotemporal data. The years 1992-2010 cover a span of 19 years (76%) of the United Nation's Millennium Development Goals (MDG) period of 25 years. Goal 4.1 aims at reducing child (under five) mortality by two thirds over that period (Black et al., 2003). Our results demonstrate such a significant decline in mortality due to HIV/TB.
A major limitation in developing nations is lack of reliable and user friendly analyses software, which was addressed by using INLA a package available on the public domain. Zero inflated Poisson (ZIP) and zero inflated Negative Binomial (ZINB) models were used for log link function linear predictors and also catered for spatial random effects (Lambart, 1992;Ridout et al., 2001). It has also been noted from other studies that the log link function linear predictor models were more stable than logit link function models when they contained a spatial component. This finding was also confirmed in our study as we observed that models from the exponential family with a log-linear predictor were more stable and able to converge better than the logit-link function model. This motivated us to treat the outcome as a discrete measure as opposed to binary mostly used for logistic regression (Agarwal et al., 2002). We demonstrated that the "big m" can be resolved with great ease for Public Health mortality hierarchical structures with the aid of SPDEs and INLA.