Studying the probability of spruce beetle caused mortality in Colorado's spruce forests using Bayesian hierarchical models

The association between environmental factors and Engelmann spruce mortality in western Colorado was investigated using Bayesian hierarchical zero‐and‐one inflated beta regressive model. The results indicated that the probability of mortality occurrence rate was positively associated with moderate to warm temperature zones, moderate precipitation, and single canopy stands with the smaller size class of spruce within stands. The probability of full mortality rate was positively associated with warm and humid climate zones with spruce as subdominant species in the vertically complex stands. On the other hand, higher mortality rate was more associated with local stand characteristics than climate factors that low mortality rate associated with stands dominated by spruce of a large size classes. The correlation model offers an approach to create predicted risk map for probability and rate of spruce mortality that could help manager in precedent planning for the upcoming outbreak events.

(1996) used a tree-based classification approach to model spruce beetle hazard in Lutz spruce in Alaska and showed that beetle incidence was associated with total stand basal area, dominance of spruce, spruce age classes, elevation, and aspect. Hebertson and Jenkins (2008) used a similar tree-based approach in Utah to show that temperature in winter and drought severity index precedently associated with outbreak years. DeRose and Long (2012) found that latitude, slope, aspect, stand structure, species composition, site productivity, winter minimum temperature, and summer maximum temperature were strongly related to spruce mortality in southern Utah in the early stages of an infestation, but eventually be weakened with time. A detailed examination of the complex mechanisms of outbreak were presented by Raffa et al. (2008) illustrated that outbreaks were results of a series of excessive internal and external factors over the thresholds, driving cross-scale interactions, feedback, and nonlinear relationships.
Probabilistic models are common statistics tools used to study ecological patterns those characterize spatially explicit occupancy of organisms (Hoeting, Leecaster, & Bowden, 2000;MacKenzie et al., 2002). These models can assess the influences of hypothesized environmental covariates on the presence, absence, or abundance of targeted taxa and can spatially address the probability of occurrence across the study area. For instance, generalized linear model (GLM) was used to describe the association between spatially explicit responses of organisms and associated environmental covariates (Guisan & Thuiller, 2005). Bayesian-likelihood models, in the other hand, allow multiple problems within a hierarchical structure to be linked via a normalized likelihood function and the results, the posteriors, were sampled with the Markov chain Monte Carlo (MCMC) approaches (Gelman & Hill, 2006;Hooten & Hobbs, 2015). Structure of spatial dependency can also be integrated into the probabilistic models to address the autocorrelated errors in which could increase the efficiency of the model, with Gaussian process being often used (Banerjee, Carlin, & Gelfand, 2014;Banerjee, Yandell, & Yi, 2008;Chelgren, Michael, Bailey, & Bury, 2011).
Hierarchical modeling approaches have been applied in the studying the impact of disturbances, and management interventions on species richness; effects of community composition on diseases of individual trees; effects of logging and sanitation thinning on density of bark beetle infestations (Giovanini, Andrew, Jones, Altman, & Arnett, 2013;Haas, Hooten, Rizzo, & Meentemeyer, 2011;Stadelmann, Bugmann, Wermelinger, Meier, & Bigler, 2013), and so forth. To mitigate the excessive zero data, which detection of presence is uncommon, the probabilistic models often be depended on zero-inflated modeling structure (Ridout, Demétrio, & Hinde, 1998). The inordinate absence in data are separately modeled using binomial likelihood. Sometimes, model was decomposed into multiple likelihood parts to account for both extreme and intermediate values in which each part is separately modeled using the logistic regression (Nishii & Tanaka, 2012). The general class of models with binomial-beta framework was proposed by Ferrari (2010, 2012) to address the extreme values at both ends of the data by applying multiple stages logistic regression for proportional data that was separated into three processes consist of modeling of presence-absence, occurrence of 100-percent rate, and intermediate proportional values.
In this study, a spatially explicit Bayesian multistage model was developed to investigate the mortality of P. engelmannii infested by D. rufipennis. The relationships between mortality, environmental factors at local and landscape scale, and spatial dependency were statistically examined. A zero-and-one inflated beta regression model was formulated using three consecutive likelihood structures based on the framework presented by Ospina and Ferrari (2012) to account for presence-absence and rate of Engelmann spruce mortality associated with climate, stand-level covariates, and spatial dependency. The aim of this study is to develop an approach to assess the effects of combined multi-scale environmental variables composed of climate data, stand structure, and composition on Engelmann spruce mortality, and to create a predicted risk map based on the associations described by the best model.

| Study area and data collection
Field data were collected from 55 sites in 8 National Forests in western Colorado during July 2013 and August 2014 by cluster sampling (Figure 1). A sampling site was composed of three variable size plots approximately 50 m separated from each other, and each site was randomly located at least 50 m from the edge of forest road with 1.5 km minimum distance between sites. P. engelmannii distribution map obtained from part of the landcover data (Gap Analysis Program; https://gapanalysis.usgs.gov/gaplandcover/) was used for study sites placement conditional on spruce-fir forest type. In each plot, all individual trees, both alive and dead, designated by relascope to be within the plot were tallied for species and measured diameter at breast height. The presence of D. rufipennis larval feeding galleries was searched and identified on every suspected tree. Stand characteristic data were summarized for average basal area per tree, species richness, species dominance, vertical stand complexity (number of canopy layers), and mortality rate as proportion of total basal area.
F I G U R E 1 Study area in western Colorado and the locations of 55 study sites (black dots). The study area was delineated by the forested areas in Colorado Numeric plot data were discretized to account for the possible nonlinearity without adding excessive complexity to the model function. Class intervals were determined to attain equal frequencies among classes. Relative dominance (RD) of P. engelmannii was divided into four classes: RD1 for [0%, 25%], RD2 for (25%, 50%], RD3 for (50%, 75%], and RD4 for (75%, 100%]. Average basal area per tree (BT) was classified into three categories: BT1 for between [0, 1] square feet/tree, BT2 for between (1, 2] square feet /tree, and BT3 for >2 square feet /tree. The vertical structure (S) of stand was categorized into three classes: S1 for stands with one story, S2 for stands with two stories, and S3 for stands with at least three stories. Climate data were obtained from a previous study (Aquirre-Bravo & Reich, 2006) by spatially intersecting locations of study plots with the predicted climate map. Precipitation (P, from P1 to P5) and average temperature (T, from T1 to T5) data were categorically combined into 25 climate zones (5-by-5 combinations) based on geostatistic interpolation obtained from previous studies (Figure 2; Aquirre-Bravo & Reich, 2006;Reich et al., 2014). Delineation of climate zones provide an opportunity to examine the nonlinear effect of climatic factors and avoiding the use of highlybias numeric climate data (Pongpattananurak, Reich, Khosla, & Aguirre-Bravo, 2012;Reich et al., 2014).

| Model specification
A hierarchical Bayesian framework was implemented to accommodate multiple steps of GLMs. Definition of model parameters were shown in Table 1. The first step in the formulation of model is zero-inflated component, which describe the binary response of spruce mortality occurrence (y 0 i ≥ ), given by z i . The second binary response was then modeled to inflate full mortality (y = 1 i ), given by v i . Occurrence of mortality and full mortality among sample plots can be formulated as zero-and-one inflated model based on hierarchical binomial process. The estimated parameters p i and ϕ i represent the probability of occurrence of mortality (z = 1 i ) and F I G U R E 2 Maps represent temperature zones (left) and precipitation zones (right). The lower factor values of climate zones variables represent lower temperature and precipitation and vice versa probability of full mortality (v = 1 i ) at plot i, respectively. The following equations capture the relationships within zero-and-one inflated structures: The first part (Equation 1-3) is zero-inflated binomial process created to describe the presence of mortality, z i , with proportion parameter p i . The subsequent structure (Equations 4 and 5) described the probability ϕ i to inflate extreme mortality rate (y = 1 i ). Since neither y = i 0 nor 1 are in the domain of beta distribution, absence and extreme value must be modeled separately leaving out the value of mortality rate in-between. Equation (6) shows the full probability of the zero-andone inflated structure. The formulation of beta model follows after zero-and-one binomial process with conditional probability of ϕ 1 − i (Equation 6), to address the value of mortality rate between 0 and 1 given the mortality occurs. Then, beta likelihood was incorporated in the last part of hierarchical structure to model mortality rate, y i , given y 0 < < 1. The model can be formulated as in Equation (7), parameterized by mean, μ y i , and variances, σ y To address the association between mortality and environmental factors, logistic regression was implemented to model the probability of mortality occurrence, p, probability full mortality, ϕ, Effect of spatial covariance λ Exponential decay rate and average mortality rate, μ y , as the responses in logit form (Banerjee et al., 2014;Finley, Banerjee, & McRoberts, 2007). Equation (8) represents the logistic regression with X as design matrix and γ as model coefficients. To account for the spatial dependence, a normal distribution of spatial errors with exponential decay covariance function was implemented in the following structure (Equations 9 and 10): Given that the value of logit response μ ( ) is a linear function of environmental covariates with spatial error terms (w) described by latent Gaussian process. Spatial covariance matrix (∑) address spatial dependence between pairs of observed locations described by variance effect parameter (σ w 2 ), exponential diminishing function of Euclidean distance, D, between pairs of points, and exponential decay parameter, λ. Probability of mortality occurrence, probability of full mortality, and average mortality rate were all modeled in the same approach. Directed acyclic graph of hierarchical structure was shown in Figure 3 and full likelihood functions of binomial zero-and-one inflated model and beta model could be, respectively, written as the following equations: beta( | ( ( , )), ( ))normal( , ( , )).
The hierarchical model was developed under the Bayesian framework (Gelman & Hill, 2006). Given f α and f β are the moment matching function for beta mean and variance of mortality rate. The coefficients, γ, were assigned by vague normal priors centered at zero with large variances. The spatial variance effect term, σ w 2 , received vague gamma priors centered at one with large variances. Centers of priors for regression parameters were specified by a preliminary analysis of fitted GLM model without spatial dependency. Then, the residuals obtained from the preliminary GLM model were used to fit spatial semi-variogram model with GeoR package (R Core Team, 2014; Ribeiro & Diggle, 2001). The estimated range parameter from preliminary semi-variogram model were used as center of the uniform prior for spatial decay parameter, λ, with value between zero and two times of center's value. Model fitting was performed by Just another Gibbs sampler Software (JAGS; Plummer, 2004) through R via rjags package (Plummer, 2013). MCMC with Metropolis and Gibbs sampling algorithm implemented to update the posteriors. The sampling was performed on three independent MCMC chains of 100,000 iterations with 10,000 burn-in samples. Parameters for each chain were randomly initialized within a feasible range according to priors. Gelman-Rubin diagnostics were applied to assess the convergence of posteriors. Model evaluation and selection was done using deviance information criterion (DIC; Spiegelhalter, Nicola, Carlin, & Van Der Linde, 2002). Candidate model with the lowest DIC value suggest to be the most optimal. Furthermore, the other evaluation methods were also used to address for accuracy of the model without involving in model selection. Mean squared prediction error (MSPE) was used in posterior predictive check to assess the accuracy of the prediction. Besides, log predictive density (LPD) was estimated by averaging over the iteration of total likelihood conditional on parameters within MCMC chains.

| Prediction of mortality at regional scale
The probability of mortality occurrence, probability of full mortality, and mortality rate were predicted from 0.5 quantile of posteriors across MCMC conditional on the presence of host species (20,547 of the 1-km grids). First, the regional climate covariates (temperature and precipitation zones) were known throughout whole study area, however, prediction for the whole study area required the covariates of stand-level characteristics to be estimated for the whole spatial prediction domain. The stand-level covariates were estimated by linear spatial interpolation using the inverse distance weight (IDW) method considering 12 nearest neighbors (Figure 4). The interpolation processes were applied using ArcMap (Environmental Systems F I G U R E 3 Conceptual diagram of the hierarchical Bayesian model (Directed Acyclic Graph) of zeroand one-inflated beta model. Above is the model for p and ϕ, the zero-and-one inflated binomial process. Below is the beta model for continuous proportion of mortality, y. Solid lines represent stochastic relationship between parameters while dashed lines represent deterministic relationship Research Institute, 2011) on all grids that P. engelmannii was presence in the study region based on the nearest observed locations.

| RESULTS
The hierarchical Bayesian zero-and-one inflated beta regression models were fitted to examine the relationship between environmental factors and spruce mortality. All candidate models were validated by DIC, LPD, and MSPE posterior predictive checks. The best five candidates for separate models were selected based on the information criteria of DIC to incorporate tradeoff between model predictive power versus complexity result in the optimal simple model. The comparable of information criterion for the best models were shown in Table 2.
The best zero-inflated model describing the environmental association with the probability of mortality occurrence was composed of temperature zone (T), precipitation zone (P), vertical structure of stand (S), and average basal area per tree (BT; Figure 5). For the spatial parameters, F I G U R E 4 Map of interpolated covariates from linear inverse distance weight (IDW) with 12 nearest neighbors. Top left is basal area per tree (BT). Top right is stand structure (S). The bottom map shows relative dominance (RD) of spruce. The study area was delineated by the forested areas in Colorado posterior of exponential decay rate (λ) and spatial dependence effect (σ w 2 ) were median of 0.017 with 95% credible interval of [0.014, 0.021], and 7.028 [6.522, 7.555], respectively ( Figure 6). The derived spatial effect was shown in Figure 6. Posterior density for coefficient of each temperature and precipitation zone was compared against middle baseline at temperature zone (T3) and middle precipitation zone (P3), respectively. Note: The posterior predictive check (MSPE and LPD) was listed. Three stages of modeling consist of zero-inflated model for dealing with the presence of spruce mortality, one-inflated model for the presence of full mortality conditional on the presence of mortality, and beta regression model estimating the proportion of partial mortality. The covariates include temperature zones (T), precipitation zones (P), number of stories (S), relative dominance (RD), and average basal area perper tree (BT).
F I G U R E 5 Posterior distribution of coefficient parameters of the best-fitted zero-inflated (white boxes) and beta regression model (gray boxes), compared with baseline (gray solid lines). The covariates consist of temperature zone (T), precipitation zones (P), relative dominance (RD), number of stories (S), and average basal area per tree (BT) were included in the best-fitted model. Error bars showed 95% credible interval of posterior distributions basal area per tree were compared against the minimum baseline at the single level class (S1) and the smallest size class (BT1 , were significantly negative related to the occurrence of mortality relative to S1, with a monotonic relationship. The best one-inflated model describing the probability of full mortality was composed of the coefficients of T, P, S, and RD. The posterior densities of model parameters according to the best models are shown in Figure 7. For the spatial parameters, posterior of exponential decay rate (λ) and spatial dependence effect (σ w 2 ) were 0.  (10) showed a mild unimodal pattern when being compared with minimum baseline at S1. The coefficients of all classes with high relative dominance of P. engelmannii classes were negative compare with minimum baseline at RD1, however, The predicted risk maps for probability of occurrence of mortality, probability of full mortality, and mortality rate were created conditional on MCMC chains of model coefficients F I G U R E 8 Prediction map at 0.50 quantile of response values for the best-fitted zero-and-one inflated beta regression model. Each map represents the prediction of predicted probability and proportion for separated process of the model; zero-inflated (ZIM), one-inflated (OIM), and beta model (BRM). The study area was delineated by the occurrence of spruce-fir forest in Colorado following model with the lowest DIC (Figure 8). The regional prediction was done for median (0.50 quantile), lower bound of credible interval (0.025 quantile), and upper bound of credible interval (0.975 quantile). Table 3 showed the proportion of study area that the value fell in the 0 and 1 interval for each quantile predicted risk map. There was a significant difference of predicted value between quantile level of zero-inflated and beta model, while there was a less difference of prediction between quantile level of one-inflated model.

| DISCUSSION AND CONCLUSIONS
The causes of bark beetle outbreaks are often difficult to be deciphered because vulnerable host populations and aggressive beetle populations must co-occur within a favorable spatially varying environments. Temperature and precipitation play especially significant roles in determining the occurrence, distributions, and severity of outbreaks across forested regions. This study showed that a Bayesian framework can be used to account for the occurrence and severity of beetle caused spruce mortality by applying zero-and-one inflated binomial and beta likelihood with spatial dependency of errors described by Gaussian process.
The results indicated that the occurrence and rate of spruce mortality were associated with regional-scale temperature and precipitation zones, and with community-level stand structure, relative dominance, and tree size class (Allen et al., 2010;Breshears et al., 2005;Hanson & Weltzin, 2000;Rennenberg et al., 2006). Low temperature was negatively related to both the probability of spruce mortality occurrence and full mortality rate, while warm temperature was positively associated with full mortality rate. For the relations with local stand characteristics, probability of occurrence and mortality were both higher in single-story stands with medium to large size class of stands, while full mortality rate was associated with higher stand structure complexity and low relative density of Engelmann spruce. These showed that relationships of environmental covariates at multiple scales with mortality occurrence and rate were necessary to predict the risk spruce mortality at landscape level in western Colorado. The spatial dependence of the model showed a very strong and mild spatial dependency in zero-inflated model and beta process model, respectively, with the range around 200 km. This showed a large scale spatial dependence of errors in predicting the occurrence and rate of mortality. Nevertheless, there was a very weak spatial dependency in one-inflated model addressing full mortality rate which was not common in observations and implied that local scale factors might be more important driving such extreme mortality rate. Nevertheless, the clustery observed locations in this study which based on sparse dispersion of available host and protected area cause the lack of the intermediate distance of around 30-80 km. This made the spatial dependency at intermediate distance to be underrated.
Past studies have explained some of the correlations between susceptibility of host tree, population dynamics of bark beetle, and environmental factors. This study provides an empirical evidence that deviation from optimal climatic conditions could either alter physiological processes of hosts and increase their susceptibility to bark beetles, or increase population of bark beetle that would overcome even the non-susceptible trees (Huberty & Denno, 2004;Park Williams et al., 2013). Several hypotheses have explained these relationships, for example, Plant Stress Hypothesis (PSH) in which tree under stress generate higher concentrations of amino acids providing more palatable food source for mortality causing agents (Mattson & Haack, 1987;White, 1993). Moreover, high physiological stress also decreases hosts' ability to synthesize defensive secondary chemicals (Rhoades, 1983). Higher palatability and weakened defensive mechanisms could make plants to become more susceptible to insects and diseases. Our study found that mortality rate was lower in cold places and the other way around. Higher evaporation in warmer climate zones could cause higher level of water vapor deficit leading to physiological stress of host species. Reich et al. (2016) referred warm and dry climate as the marginal sites in which the both host population and site suitability is low, on the other hand, host has higher population and be less susceptible in niche center. Spatial distribution of marginal sites across could have impact to the landscape heterogeneity of mortality occurrence and rate, especially in regime of climate shifting (Mietkiewicz, Kulakowski, & Veblen, 2018). The results can also be conjectured by Climate Release Hypothesis (CRH). The CRH states that population dynamics of insects and other disease causing agents are directly related to abiotic factors (Price, 1991), for example, the increasing of dietary substances due to climate induced stress of host species; or warmer climate accelerates life cycle of agents (Burnett, 1949;Ayres & Lombardero, 2000;Mattson & Addy, 1975;Bentz & Six, 2006). This increase suitability for insect populations that eventually respond to the conditions by altering assemblage composition, range extension, ecological interactions, and timing of life history. These ecological shifting could enhance the aggressiveness or virulence of mortality causing agents to overcome defensive mechanism of host regardless of susceptibility (Burnett, 1949;Danks, 1992;Weed, Ayres, & Hicke, 2013).
Spatial data and statistical modeling could be applied to investigate the influence of the environmental factors on the spatial extent, distribution, and severity of mortality related to forest health. The insight of host-agent-environment interactions could help manager made a spatially explicit prediction of risk, severity, and spreading of mortality in the regional level based on existence of information at multiple scales. The predicted map created from the best model showed that predicted probability of mortality occurrence and the mortality rate are higher in the northern part of study area, while the southern part had higher probability of full mortality. According to field data, there is a gradual change in pattern of forest community structure and composition across the study area in north-south direction (see Figures 2 and 4). This also implied the pattern of spatial distribution of local environmental factors affect the heterogeneity of risk and severity of spruce mortality across the region conditional on the suitable climate conditions. The predicted risk of this study is also congruent with the several historical major outbreak events in central Rocky Mountain in the mid-20th century. This field data of this study captured the rising period of spruce beetle outbreak which was at its peak in 2017 that impacted more 67,000 acres of Engelmann spruce forest, and revealed the spreading of outbreak into historically undamaged stands those strongly link to proximity to the extent of historic outbreak, and shifting in climate regime (Hart, Veblen, Schneider, & Molotch, 2017). Moreover, there were clusters of severe mortality patch of spruce forest in the southern part of Colorado, predicted with high probability of full mortality while probability of mortality occurrence is low. According to the outbreak event in 2017, the Southern Colorado forest suffered very high rate of mortality. Therefore, understanding the importance of environmental drivers at both local and regional-scale could provide us an ability to predict occurrence and severity of bark beetle epidemic.
Spruce beetle outbreaks have been one of the most damaging disturbances of forests in Western North America. Understanding what drivers cause the onset of an outbreak, sustain it, and cause it to subside would greatly help managers to be better prepared for the future outbreaks. This study helps improve our ability to predict spruce mortality by using hierarchical models to address for the interactions between spruce bark beetles, hosts, and environmental factors that could lead to useful prediction of risk occurrence and severity of the devastating outbreaks.

ACKNOWLEDGMENTS
I would like to express my very great appreciation to late Dr. Robin Reich for his invaluable guidance during my years at Colorado State University and for suggestions in planning, development, and conducting fieldwork of this study. I would also like to thank my colleagues at the Department of Forest and Rangeland Stewardship, Kristina Hughes and Ryan Davy, who gave valuable support during fieldwork of this study that related to their projects.