Spatial patterns and controls on burned area for two contrasting fire regimes in Southern California

An improved understanding of the mechanisms that regulate wildfire risk at local to regional scales is needed for the design of effective fire and ecosystem management. We investigated the spatial distribution of burned area in Southern California during 1960–2009 using five different data-driven methods: multiple linear regression, generalized additive models (GAMs), GAMs with spatial autocorrelation, non-linear multiplicative models, and random forest models. We used each method to separately develop burned area risk maps for Southern California’s two distinct wildfire regimes: Santa Ana fires (SA fires), which occur during high wind events mostly in autumn, and non-Santa Ana fires (non-SA fires), which occur mostly during the hot and dry Mediterranean-climate summer. The different methods explained 38–63% of the spatial variance in burned area for SA fires and 21–48% for non-SA fires. The two fire regimes had contrasting drivers, with Fosberg fire weather index, relative humidity, minimum temperature, and distance to housing most important for SA fires, and shrub cover, road density, and distance to minor and major roads most important for non-SA fires. Our modeling framework carries implications for the strategic placement of fire suppression resources, and for prevention planning in areas facing increasing human and climate pressures.

complex, and consequently quantitative models of the spatial patterns of fire risk remain highly uncertain.
The relative importance of the physical factors controlling large wildfires in Southern California has been vigorously debated. One school of thought has argued that large fires are the result of past fire suppression (Minnich 1983, Minnich and Chou 1997, Goforth and Minnich 2007). This perspective has emphasized the buildup of vegetation and detritus, which is thought to contribute to the development of large fires. A second school of thought has emphasized the importance of extreme weather (Davis and Michaelsen 1995, Moritz 1997, Keeley and Fotheringham 2001, Keeley and Zedler 2009, Moritz et al. 2010). The literature documenting the latter perspective has noted that large fires often occur during brief episodes from September through January, when strong Santa Ana winds blow out of Southern California's eastern deserts and mountains (Moritz et al. 2010, Peterson et al. 2011). Santa Ana winds often exceed 60 km/h and relative humidity may drop below 10%, resulting in fires that can spread at rates exceeding 10,000 ha/h (Keane et al. 2008). Fire spread during these extreme conditions is often comparatively insensitive to landscape variations in fuel loads. Approximately half of the total burned area in Southern California occurs during Santa Ana events; almost all of the remaining burned area occurs during hot and dry summer months when the winds are predominantly onshore.
Unique sets of environmental factors may drive the fire regime in the different ecoregions of California (Skinner 2003, Stavros et al. 2014a,b). A conceptual diagram of the way physical and human factors interact to influence burned area is shown in Fig. 1. Interactions between meteorology, fuel structure and composition, and the frequency, spread, and severity of fire are well known (Keeley and Fotheringham 2001, Moritz 2003, Collins et al. 2007, Meyn et al. 2007, Archibald et al. 2009, Parisien and Moritz 2012). Within Southern California, fire susceptibility is known to be strongly related to vapor pressure deficit and relative humidity, which covary with attributes such as elevation and mean annual precipitation (Parisien and Moritz 2012). Human activity also exerts a strong influence on fire frequency and burned area (Syphard et al. 2007). Variation in ignition frequency, for example, is positively related to the extent of human development and negatively related to the distance from infrastructure such as housing and roads (Syphard et al. 2008; Faivre et al. 2014). Roads also may serve as a barrier to fire spread, particularly for non-Santa Ana fires (Jin et al. 2015).

Fig. 1. Conceptual model of the major factors controlling burned area in Southern California (adapted from Archibald et al. 2009). The diagram categorizes the controls among human-related and biophysical variables and shows how they relate to fire regime and influence burned area.

v www.esajournals.org FAIVRE ET AL.
Three recent advances provide a foundation for more accurately modeling fire risk in Southern California. First, increasing quality and availability of geographic information system data sets describing human variables makes it possible to develop more sophisticated approaches for representing the interactions between human and environmental drivers shown in Fig. 1. Limited access to this type of information in previous regional assessments in Southern California and elsewhere may have led to an overreliance on climate and weather drivers. Second, the use of high resolution meteorology has made it possible for the first time to quantitatively separate Santa Ana and non-Santa Ana fires in the fire record. This is important because the way climate and other environmental variables influence these two fire types is considerably different (Jin et al. 2015). Third, new statistical techniques have the potential to improve model formulation and yield insight about the relationship between driver variables. Ecological studies comparing the predictive ability of regression models to explain species distributions and fire dynamics have shown considerable variation, depending on methodology (Segurado and Araujo 2004, Elith and Graham 2009, Syphard and Franklin 2009). Prasad et al. (2006) concluded that machine learning methods, such as random forest or boosted regression trees, produce more accurate results in ecological studies than linear or additive models. Studies comparing multiple regression to ensemble learning techniques such as random forest modeling (Breiman 2001) for burned area are currently lacking.
Here, we investigate the relationship between burned area and human and biophysical controls in Southern California using an array of modeling techniques, including (1) multiple linear regression, (2) generalized additive models (GAMs), (3) GAMs with spatial autocorrelation, (4) non-linear multiplicative models, and (5) random forest models. Our analysis focuses on the spatial pattern of mean annual burned area during 1960-2009, and begins by partitioning this area into Santa Ana (SA) and non-Santa Ana (non-SA) components. Our use of multiple modeling approaches allowed us to test the ability of each technique to accurately predict the spatial pattern of burned area for both fire regimes and to quantify the relative contribution of the most important controls. Our analysis carries implications for the effect of climate change and further wildland-urban interface (WUI) development on Southern California fire risk, while also contributing to regional assessments of fire risk and more effective strategies for fire and ecosystem management.

Study area
Our study domain spanned 36,500 km² of wildland and developed areas in Southern California within Santa Barbara, Ventura, Los Angeles, San Bernardino, Orange, Riverside and San Diego counties. Southern California's Mediterranean-type climate is characterized by a dry summer followed by a relatively brief and mild rainy season (Bailey 1966). Spatial gradients of temperature and rainfall result in a variety of vegetation habitats (Franklin 1998) (Di Castri et al. 1981, Arroyo et al. 1995, Davis and Richardson 1995). Southern California has experienced intense population pressure and urban growth around the major metropolitan areas during the past five decades; this has created widespread urban communities interspersed with wildland areas and connected by an extensive road network. Over 22 million people lived in Southern California in 2010 (source: US Census Bureau 2012). We focused our analysis on predicting the regional burned area patterns throughout Southern California after excluding dense urban areas and deserts. Urban areas in the study domain represented less than 8% of the total land area, while the wildland-urban interface (WUI) accounted for 17%. Two-thirds of the area within the WUI consisted of housing in the vicinity of contiguous wildland vegetation and the remaining third was interspersed housing and vegetation.

Data sets: Wildfire data
We assessed burned area using the digitized perimeters of all reported fires >40 ha compiled by the California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2010). We focused on the 50-yr period from 1960 to 2009; the fire records during this period were more reliable than earlier records, and the period overlapped with the availability of information on human and biophysical factors.
We carried out our analysis at a 3 × 3 km resolution to match the spatial resolution of complementary downscaled meteorological data sets that were important for characterizing regional variations in fire weather. A sensitivity test was done during a preliminary analysis stage to quantify the effect of spatial resolution. We found that the 3-km resolution did not produce results that were systematically different from those using a finer resolution of 1 km. The 3-km resolution resulted in a sample total of 3590 grid cells that had a large and well-distributed range of burned area fractions, which aided model development. The dependent variable was the burned area fraction, defined as the total area burned during 1960-2009 within each 3 × 3 km grid cell divided by the grid cell area. Multiple human and environmental variables, which are described below, were the predictors. The ArcGIS overlay geoprocessing tool was used to intersect the polygon layer of the grid cell boundaries with all fire polygons during 1960-2009, and the areas of all intersected new polygons within each individual grid cell were then summed. For example, if two different fires during the study period each burned half of the grid area, the resulting burned area fraction would be one. We classified the historic record of fire perimeters into SA fires and non-SA fires using the start date reported in the FRAP database and a continuous historic time series of days with Santa Ana conditions (Jin et al. 2014; Fig. 2). Santa Ana days were determined using a downscaled meteorological time series that was obtained by driving the MM5 mesoscale model with the ERA-40 and North American Regional Reanalysis data sets. Santa Ana days were identified when the northeasterly component of the daily mean wind speed was greater than 6 m/s at the exit of the largest gap across the Santa Monica Mountains.
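The burned area fraction and the SA/non-SA labeling described above can be sketched in a few lines. This is a minimal illustration, not the ArcGIS workflow used in the study: the function names, the 9 km² default cell area, and the example dates are assumptions made for the sketch.

```python
from datetime import date

def burned_area_fraction(intersect_areas_km2, cell_area_km2=9.0):
    """Sum the areas of all fire-perimeter pieces inside one grid cell and
    divide by the cell area. Every fire is counted, so a cell that burned
    repeatedly can have a fraction greater than one."""
    return sum(intersect_areas_km2) / cell_area_km2

def classify_fire(start_date, santa_ana_days):
    """Label a fire 'SA' if its reported start date is a Santa Ana day."""
    return "SA" if start_date in santa_ana_days else "non-SA"

# Two fires that each burned half of a 3 x 3 km cell give a fraction of one.
fraction = burned_area_fraction([4.5, 4.5])

# Hypothetical Santa Ana days and fire start dates, for illustration only.
sa_days = {date(2007, 10, 21), date(2007, 10, 22)}
labels = [classify_fire(d, sa_days) for d in (date(2007, 10, 21), date(2007, 7, 4))]
print(fraction, labels)
```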

Data sets: Human factors
Humans can influence wildland fire regimes through several different pathways (Radeloff et al. 2010). WUI areas and road networks, for example, influence fuel continuity, the patterns of ignition, and access for suppression (Lloret et al. 2002, Rollins et al. 2002, Ryu et al. 2007). We defined the WUI as areas with less than 50% vegetation and at least 6.2 houses/km² (1 house per 40 acres) that are located within 2.4 km of a 5 km² (or greater) area that is more than 75% vegetated.
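As a quick check of the housing density threshold, the conversion from 1 house per 40 acres to houses per km² can be worked out directly. The helper names below are hypothetical, and the proximity-to-wildland condition is reduced to a boolean flag that would in practice come from a GIS query.

```python
ACRE_IN_M2 = 4046.8564224  # international acre in square metres

def houses_per_km2(houses, acres):
    """Convert a housing count over an area in acres to houses per km^2."""
    area_km2 = acres * ACRE_IN_M2 / 1e6
    return houses / area_km2

def is_wui(vegetation_fraction, housing_density_km2, near_large_wildland):
    """Apply the WUI definition quoted in the text. `near_large_wildland`
    stands in for the spatial test (within 2.4 km of a 5 km^2 or larger
    area that is more than 75% vegetated)."""
    return (vegetation_fraction < 0.5
            and housing_density_km2 >= 6.2
            and near_large_wildland)

density = houses_per_km2(1, 40)
print(round(density, 1))
```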
We considered seven variables to describe the human influence on burned area: (1) distance of cell center to a major road, (2) distance of cell center to a minor road, (3) road density, (4) population density, (5) distance of cell center to low-density housing, (6) wildland-urban land fragmentation, and (7) ignition frequency. We derived these seven variables using the best available statewide data. Topologically Integrated Geographic Encoding and Referencing road data (TIGER; US Census Bureau 2000) were used to calculate the road density per grid cell and the distance to the nearest road from the cell centroid. We computed average population and housing density per 3 × 3 km grid cell for 1960-2009 using the 1990 and 2000 U.S. decennial census spatial data, along with consistent decadal projections of past growth trends for 1960, 1970, and 1980 (see Hammer et al. 2004 for details). We used the distance from cell centroid to the nearest housing area with a density greater than 6.2 housing units/km² as an indicator of the proximity to low-density housing within the WUI.
Wildland-urban land fragmentation was calculated using an edge density metric that represented the degree of spatial heterogeneity in the landscape. We used a land cover data set at 100 m resolution from the California Department of Forestry and Fire Protection's Fire Resource Assessment Program (FRAP 2002) to aggregate vector-based layers describing urban and nonurban land cover types. The resulting binary map was then processed using the FRAGSTATS software package (McGarigal and Marks 1995) to analyze the spatial arrangement of wildland-urban patterns. We tested several landscape metrics including the patch density, mean patch size, mean shape index, edge density, mean nearest neighbor distance between similar patches and the interspersion and juxtaposition index (see     metrics). We found that edge density was the best proxy for quantifying the complexity of wildland patches imbricated within urban areas. Edge density (ED) is a shape index that indicates whether the wildland-urban boundary is simple and compact (low value) or irregular and convoluted (high value). We computed the mean for each predictor within each 3 × 3 km grid cell by applying the zonal statistics tool in ArcGIS Spatial Analyst (Fig. 3). Consistent region-wide ignition data outside of the National Forests were unavailable, and we estimated ignition frequency for each 3 × 3 km grid using the spatial modeling approach developed in Faivre et al. (2014). Poisson regression analyses were used to model ignition frequency as a function of the dominant human and biophysical covariates (Syphard et al. 2008).
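To make the edge density idea concrete, the sketch below counts the boundaries between unlike cells in a binary urban/wildland raster and divides the resulting edge length by the landscape area. It is a simplified stand-in for the FRAGSTATS metric, not its implementation: it ignores the outer landscape boundary, and the function name and the 100 m default cell size are assumptions.

```python
def edge_density(grid, cell_size_m=100.0):
    """Metres of urban/wildland edge per hectare for a binary raster
    (1 = urban, 0 = wildland). Only interior cell boundaries are counted."""
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            # Compare each cell with its right and lower neighbour so that
            # every shared boundary is visited exactly once.
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                edges += 1
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                edges += 1
    edge_length_m = edges * cell_size_m
    area_ha = rows * cols * cell_size_m ** 2 / 10_000.0
    return edge_length_m / area_ha

# A compact boundary (low value) versus a convoluted one (high value).
compact = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(edge_density(compact), edge_density(checkerboard))
```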

Data sets: Vegetation and biophysical factors
We used a set of 12 environmental variables that were expected to influence the physical characteristics of fuel, including continuity, moisture, or loading (Fig. 1). These variables can be sorted into three main categories: topography (1-elevation, 2-slope), land cover (fractional cover of 3-forest, 4-shrubland, 5-grassland, and 6-other), and meteorology (annual average daily 7-maximum and 8-minimum temperature, 9-cumulative winter precipitation, 10-wind speed, 11-relative humidity, and 12-Fosberg fire weather index (FFWI); Table 1). The FFWI is a non-linear construct of meteorological conditions (i.e., temperature, relative humidity, and wind speed), which is widely used to infer wildfire potential from the short-term weather conditions (Fosberg 1978). FFWI values range from 0 to 100; values ≥50 indicate a significant threat of wildfire incidence and spread.
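For readers unfamiliar with the index, the following sketch shows how an FFWI value can be computed from temperature, relative humidity, and wind speed. The coefficients are those commonly quoted for the Fosberg index (temperature in °F, wind in miles per hour) and should be checked against Fosberg (1978) before any real use; the function names are assumptions, and this is not the MM5-based processing used in the study.

```python
import math

def equilibrium_moisture(temp_f, rh):
    """Equilibrium fuel moisture content (%) as a piecewise function of
    relative humidity (%) and air temperature (deg F)."""
    if rh < 10:
        return 0.03229 + 0.281073 * rh - 0.000578 * rh * temp_f
    if rh <= 50:
        return 2.22749 + 0.160107 * rh - 0.01478 * temp_f
    return 21.0606 + 0.005565 * rh ** 2 - 0.00035 * rh * temp_f - 0.483199 * rh

def ffwi(temp_f, rh, wind_mph):
    """Fosberg fire weather index, clipped to the 0-100 range."""
    m = equilibrium_moisture(temp_f, rh) / 30.0
    moisture_damping = 1 - 2 * m + 1.5 * m ** 2 - 0.5 * m ** 3
    value = moisture_damping * math.sqrt(1 + wind_mph ** 2) / 0.3002
    return min(max(value, 0.0), 100.0)

# Hypothetical conditions: a hot, dry, windy day versus a mild, humid, calm day.
windy_dry = ffwi(85, 8, 35)
calm_humid = ffwi(70, 60, 5)
print(windy_dry, calm_humid)
```

The index is scaled so that very dry fuel combined with a 30 mph wind gives a value close to 100, which is why values of 50 or more are read as a significant threat.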
The topographic variables (elevation and slope) were calculated for each 3 × 3 km grid cell using the three arc-second digital elevation model from the U.S. Geological Survey National Elevation Dataset (NED). We assessed vegetation characteristics using the recent and comprehensive land cover data set at 100 m resolution from the California Department of Forestry and Fire Protection's Fire Resource Assessment Program (FRAP 2002). We classified the mixed vegetation of wildland areas into three major types: "shrubland" (comprising 52% of the study area), "forest/woodland" (19%), and "grassland" (8%). The remaining non-vegetated land cover types (21%) were grouped as "other"; this category included agricultural land, urban, desert, wetland, water, and barren soil. We calculated the fraction of each class within each 3 × 3 km grid cell.
We derived several of the meteorological variables from the monthly gridded Parameter-Elevation Regressions on Independent Slopes Model (PRISM) data set that has a native resolution of 800 m (Daly et al. 2002; Oregon State University PRISM Group). Winter precipitation was estimated using monthly mean of precipitation during September through March for each 3 × 3 km cell over the 1960-2009 period. Similarly, variables representing the annual mean of daily maximum temperature and the annual mean of daily minimum temperature were calculated over the period by averaging all the available monthly files.
To capture the spatial pattern of meteorological conditions that typically occur during SA and non-SA fires, we estimated daily relative humidity, wind speed, and the Fosberg Fire Weather Index using 3-hourly model outputs from the Mesoscale Model version 5 (MM5) forced with reanalysis data sets as described by Jin et al. (2014). Santa Ana days were identified using winds at the exit of the largest gap in the Santa Monica Mountains. Most of the Santa Ana events occurred in late autumn and early winter in Southern California, and most SA fires occurred in a 3-month window from September to November. We therefore quantified the meteorological conditions that typically occur during SA fires by averaging each of these three variables from the MM5 daily time series during Santa Ana days from September to November. For non-SA fires, we calculated these same variables during non-Santa Ana days from June to August. We resampled all meteorological data to the common 3 × 3 km grids.
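The conditional averaging described above can be illustrated for a single grid cell. The sketch assumes a daily record of (date, northeasterly wind component at the gap exit, variable value); the record layout, names, and numbers are hypothetical, while the 6 m/s threshold and the month windows come from the text.

```python
from datetime import date

SA_WIND_THRESHOLD = 6.0  # m/s, northeasterly component of daily mean wind

def is_santa_ana_day(ne_wind_ms):
    """A day counts as a Santa Ana day when the NE wind exceeds the threshold."""
    return ne_wind_ms > SA_WIND_THRESHOLD

def mean_for_regime(records, regime):
    """Average a daily variable over Santa Ana days in Sep-Nov ('SA') or
    over non-Santa Ana days in Jun-Aug ('non-SA')."""
    months = (9, 10, 11) if regime == "SA" else (6, 7, 8)
    want_sa = regime == "SA"
    selected = [value for day, wind, value in records
                if day.month in months and is_santa_ana_day(wind) == want_sa]
    return sum(selected) / len(selected) if selected else None

# Hypothetical daily relative humidity (%) at one grid cell.
records = [
    (date(2003, 10, 25), 9.0, 12.0),  # Santa Ana day in autumn
    (date(2003, 10, 26), 8.0, 8.0),   # Santa Ana day in autumn
    (date(2003, 10, 1), 2.0, 55.0),   # autumn day without Santa Ana winds
    (date(2003, 7, 10), 1.0, 40.0),   # summer day
    (date(2003, 7, 11), 0.5, 44.0),   # summer day
]
sa_rh = mean_for_regime(records, "SA")
non_sa_rh = mean_for_regime(records, "non-SA")
print(sa_rh, non_sa_rh)
```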

Modeling approaches to predict burned area
We built, tested, and compared five modeling approaches separately for SA and non-SA fires: multiple linear regression (MLR), generalized additive models (GAMs), GAMs incorporating spatial autocorrelation (GAMspA), non-linear multiplicative models (NMM), and random forest models (RF).
MLR has been used extensively to analyze the relationship between burned area and environmental controls (Larsen 1996, Carvalho et al. 2008, Camia and Amatulli 2009). Empirical studies often predict a high proportion of the variation in burned area using MLR (Harrington 1988, Turner and Romme 1994, Larsen 1996). However, linear regression assumes the variance of the response variable is constant across observations and the errors follow a normal (Gaussian) distribution; these assumptions may be invalid for the estimation of burned area or other ecological variables (Viegas and Viegas 1994, Li et al. 1997, McCarthy et al. 2001). We therefore also considered generalized additive models (GAMs), which are comparatively flexible and often are better-suited for analyzing ecological data based on non-linear responses to predictor variables (Hastie and Tibshirani 1986).
These modeling approaches assume spatial stationarity (i.e., effects of environmental correlates are constant across the region) and isotropic spatial autocorrelation (i.e., the process resulting in spatial autocorrelation acts in the same way in all directions). Anisotropic spatial autocorrelation arises when the variables of interest in nearby sample units are not independent of each other (Griffith 1987), as is common in ecological data. Such spatial patterns are usually explained by environmental features such as climatic or habitat structure variables that are themselves spatially structured (e.g., directionality and intensity of wind patterns). It is often impossible to measure all spatially structured variables, and this issue affects the uncertainty of statistical models (Legendre 1993, Legendre et al. 2002). A positive spatial autocorrelation (i.e., closer locations having more similar residual values than others) tends to underestimate the true standard error of parameters, which inflates the apparent significance of the regression coefficients.
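One common way to check whether model residuals are spatially autocorrelated is Moran's I. The sketch below computes it for values on a regular grid using rook (edge-sharing) neighbours; it is offered as an illustration of the concept, and the text does not state that the authors used this statistic. The function name and the example grids are assumptions.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid of values with rook contiguity.
    Values near +1 indicate clustering of similar values, values near -1
    indicate alternation, and values near 0 indicate no spatial structure."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    z = [[v - mean for v in row] for row in grid]
    cross = 0.0       # sum of z_i * z_j over all neighbouring pairs
    weight_sum = 0    # number of (directed) neighbour links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += z[r][c] * z[rr][cc]
                    weight_sum += 1
    denom = sum(v * v for row in z for v in row)
    return (n / weight_sum) * (cross / denom)

# Hypothetical residual maps: clustered values versus alternating values.
clustered = [[1, 1, 0, 0] for _ in range(4)]
alternating = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered), morans_i(alternating))
```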
We thus constructed a version of the GAM model accounting for spatial autocorrelation to better represent gradually changing spatial variability in environmental correlates. We implemented these autocovariate GAMs by calculating locally weighted regressions within a moving window spanning the entire study domain. We included a two-dimensional smoothing function f(x_i, y_i) in the GAMs, using the two geographic coordinates (i.e., latitude and longitude) as a single variable, along with the other terms in the model (Wood and Augustin 2002, Wood 2003).
As an alternative approach to GAM, we investigated the use of non-linear multiplicative regression models. Previous modeling studies have shown that the rate of fire spread has a positive exponential relationship with slope (Junpen et al. 2013) and fuel load (Cheney et al. 1998, Martins Fernandes 2001), and a negative exponential relationship with fuel moisture content (Junpen et al. 2013). Thus, we tested several non-linear multiplicative models including power functions (of the form y = x^r), rational functions (quotients of polynomial functions), exponential decay and growth functions (Eq. 1), logistic functions (Eq. 2) and combined forms. We performed an optimization of the model equations using the nonlinear least squares solver (nls; Bates and Watts 1988), which iteratively estimates the coefficients of the explanatory variables to find the best fit (i.e., highest correlation) with the response variable.

$$y = C_1 \times e^{(C_2 \times x)} \tag{1}$$

$$y = C_1 \times \frac{1}{1 + e^{(C_2 \times x)}} \tag{2}$$

The previous modeling approaches are sensitive to collinearity among predictors, which can hinder the variable selection process and reduce model predictive power. A promising alternative is the use of classification and regression tree techniques; these approaches are generally more robust to the inclusion of correlated variables, and are complementary to generalized linear and additive models (Archer and Kimes 2008). Consequently, we implemented a random forest model (R package "randomForest") by generating a large number of bootstrapped trees (using a randomized subset of predictors), and reserving 30% of the data for testing (Breiman 2001). We trained 1000 trees using 70% of the data and selected six predictors at each split. We used a default minimum node size of five to prevent the creation of small sample nodes without increasing the overall relative error (i.e., misclassification rate; Breiman 2001). The model predictions were obtained using the reserved data each time a tree was grown. The final predictions consisted of the average of all predictions from the 1000 regression trees.
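The exponential and logistic forms used in the multiplicative models can be written as small functions and scored with a sum-of-squares objective, which is the quantity a nonlinear least squares solver minimizes. The sketch below is illustrative only: the coefficient values are hypothetical, and the coarse grid search is a deliberately simple stand-in for the iterative algorithm in R's nls, not a reproduction of it.

```python
import math

def exponential_form(coefs, predictors):
    """y = C1 * exp(C2*x1 + C3*x2 + ...)."""
    c1, *slopes = coefs
    return c1 * math.exp(sum(c * x for c, x in zip(slopes, predictors)))

def logistic_form(coefs, predictors):
    """y = C1 / (1 + exp(C2*x1 + C3*x2 + ...))."""
    c1, *slopes = coefs
    return c1 / (1.0 + math.exp(sum(c * x for c, x in zip(slopes, predictors))))

def sum_of_squares(form, coefs, rows, observed):
    """Least squares objective: squared misfit summed over all samples."""
    return sum((form(coefs, x) - y) ** 2 for x, y in zip(rows, observed))

# Synthetic one-predictor data generated from known (hypothetical) coefficients.
rows = [(0.0,), (1.0,), (2.0,)]
true_coefs = (0.5, -1.0)
observed = [logistic_form(true_coefs, x) for x in rows]

# Coarse grid search over C2 with C1 held at its true value.
candidates = [i * 0.5 for i in range(-4, 5)]
best_c2 = min(candidates,
              key=lambda c2: sum_of_squares(logistic_form, (0.5, c2), rows, observed))
print(best_c2)
```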

Variable selection and model validation
We performed initial univariate regressions between the response variable and all predictors with the goal of identifying the relative importance of each predictor independent of its interactions with others (Table 2). We also examined the correlation matrix among explanatory variables for high pairwise correlations to detect multicollinearity issues and to narrow the selection of useful covariates.
We used the following methods to select the most relevant predictors from the entire set. The selection of terms for deletion from the MLR model was based on Akaike's Information Criterion (AIC). The selection of terms for the GAM analysis used the automatic term selection procedure (Wood and Augustin 2002), which imposed a penalty on smooth functions and thus effectively removed terms from the model. The selection of terms in the multiplicative models relied on sequentially adding terms based on an incremental improvement to model fit (i.e., maximizing cross-validated R²).
We used 70% of the data (n = 2495), randomly selected, for the development of each model. The remaining data in reserve (30%, n = 1095) were used to quantify model performance, using cross-validated R² values (model predictions against the validation data subset), root mean square errors (RMSE), percent bias and AIC values. We repeated this process 500 times for each model type (except for RF, where the iteration process is integrated) while maintaining the 70:30 ratio, to obtain statistically meaningful estimates of the mean and spread of each performance measure. Finally, we estimated the number of degrees of freedom (Tables 3 and 4). Model building and statistical analyses were carried out using R software (R Development Core Team 2012; "mgcv" package for GAM, "rpart" and "randomForest" packages for RF).
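The repeated 70:30 validation procedure can be sketched with a one-predictor linear model and synthetic data. Everything specific here is an assumption made for illustration: the simple linear fit stands in for the five model types, percent bias is computed as 100 × Σ(predicted − observed) / Σ(observed), and the data are simulated rather than taken from the study.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def percent_bias(obs, pred):
    return 100 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)

def repeated_holdout(xs, ys, n_repeats=500, train_share=0.7, seed=0):
    """Fit on a random 70% of samples, score on the remaining 30%, repeat."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(round(train_share * len(idx)))
        train, test = idx[:cut], idx[cut:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in test]
        pred = [slope * xs[i] + intercept for i in test]
        scores.append((r_squared(obs, pred), rmse(obs, pred), percent_bias(obs, pred)))
    return scores

# Synthetic data: a linear signal plus Gaussian noise.
noise = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + noise.gauss(0, 0.5) for x in xs]
scores = repeated_holdout(xs, ys, n_repeats=50)
mean_r2 = sum(s[0] for s in scores) / len(scores)
print(round(mean_r2, 2))
```

Averaging the scores over many random splits reduces the dependence of the reported performance on any single partition of the grid cells.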

Evaluation of relative importance of variables
We estimated the contribution of predictors by analyzing the deviance (AIC value) of nested models (i.e., models successively excluding the least relevant predictor) for all modeling approaches except random forest. In the RF approach, we used the 70:30 ratio to split the data sets for model calibration and validation (Breiman 2001), and used the percent decrease in accuracy (i.e., the increase in mean square error when the values of a predictor are permuted) as a measure of variable importance. Then, we conducted several analyses to better understand the relationship between driver variables, important splitting points, and the predicted spatial pattern of burned area. First, we ran an additional regression tree using the average (final) predictions from the random forest as input data. We then pruned this tree using a complexity parameter of 0.01 (see the documentation of R package "rpart" for an explanation of this parameter). This "summary tree" explained significantly more variance in the input data (P < 0.001) than any random regression tree of equal complexity generated from the random forest (Rejwan et al. 1999). The tree structure enabled us to investigate the explanatory nature of the dominant controls on burned area. We analyzed the splits and nodes of this regression tree and determined the combinations of human and biophysical conditions resulting in high and low burned area fractions across the region. Finally, we used predictive maps to spatially characterize the combined influence of climate, fuel, and human conditions.

Notes to Table 4: See Table 1 for a full description of the explanatory variables retained in the models. For the MLR model, AIC = 3892, adjusted R² = 0.21 [0.16, 0.24], percent bias = 0.28, RMSE = 0.52, df = 9; for the GAM, AIC = 3714, adjusted R² = 0.27 [0.25, 0.34], percent bias = 0.29, RMSE = 0.49, df = 31; for the GAMspA, AIC = 3585, adjusted R² = 0.32 [0.27, 0.36], percent bias = 0.30, RMSE = 0.48, df = 37; for the NMM, AIC = 3655, adjusted R² = 0.23 [0.18, 0.28], percent bias = 0.22, RMSE = 0.51, df = 37; for the RF model, adjusted R² = 0.48, percent bias = 0.28, RMSE = 0.31.
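The permutation idea behind the random forest importance measure can be sketched independently of any particular model: shuffle one predictor, recompute the error, and record how much it grows. This is a simplified illustration with a hypothetical stand-in model and synthetic rows; the randomForest package works on out-of-bag samples and scales the result, which is not reproduced here.

```python
import random

def mse(model, rows, targets):
    """Mean square error of a prediction function over a set of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, column, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling the values of one predictor."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats

# Hypothetical model that uses only the first of two predictors.
def toy_model(row):
    return 3.0 * row[0]

rows = [(float(i), float(i % 3)) for i in range(30)]
targets = [toy_model(r) for r in rows]
importance_used = permutation_importance(toy_model, rows, targets, column=0)
importance_unused = permutation_importance(toy_model, rows, targets, column=1)
print(importance_used > importance_unused)
```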

Burned area patterns
We observed contrasting spatial patterns of burned area for SA and non-SA fires (Fig. 2). The characteristic location, size, shape, and overlap of individual fire perimeters differed markedly by fire type. SA fires accounted for most of the burned area in four regions: the Santa Monica Mountains and Simi Hills, the Cajon Pass between the San Gabriel Mountains and the San Bernardino Mountains (Fig. 3), the Santa Ana Mountains, and the eastern part of San Diego County. High wind speeds occur through these mountain passes on Santa Ana days (Moritz et al. 2010), which translates into a FFWI above 21 (Fig. 4a). SA fires burned repeatedly near developed areas, resulting in aggregated fire mosaics with high burn frequencies in areas close to the wildland-urban interface. Non-SA fires were mostly confined to inland areas with low summer relative humidity (Fig. 4b). A total of 370 SA fires and 890 non-SA fires were recorded between 1960 and 2009. The average size of SA fires during 1960-2009 was 2700 ha (median 723 ha), and 53% of the large fires (≥5000 ha) were SA. Non-SA fires were typically smaller, with a mean size of 900 ha (median 356 ha), and were typically scattered across remote and rugged areas, such as the central part of Los Padres National Forest and the San Gabriel Mountains. A relatively high frequency of non-SA fires occurred in the San Gorgonio Pass.
Contrasting sets of variables were required to predict the spatial burned area patterns for SA and non-SA fires (Table 2). SA fire burned area was positively associated with variables emphasizing human presence and proximity to urban development (Table 2; Fig. 5a), whereas non-SA burned area often had a negative relationship with these variables (Table 2; Fig. 5b). Both fire types had a positive relationship with variables related to the amount and composition of fuels, such as shrub cover. The relationship between meteorological variables and burned area was more pronounced for SA fires, with wind speed, temperature, and precipitation having a positive influence, and relative humidity having a negative influence (Table 2; Fig. 5).

Comparison of modeling approaches for SA fires
Performance characteristics and input parameters used to predict burned area patterns for each model are shown in Table 3 for SA fires and Table 4 for non-SA fires. The best compromise between model complexity and model performance was achieved using seven variables for SA fires and eight variables for non-SA fires. The variables retained for SA or non-SA fires were generally consistent regardless of modeling approach, although the relative importance of variables typically varied with method (Tables 3 and 4).
All five modeling methods captured a significant amount of variance in the spatial distribution of SA burned area (Table 3). The burned area variance explained by models ranged from 39% for the MLR model to 63% for the RF model. Compared with MLR, GAMs increased the adjusted R² to 43% and reduced bias but also had considerably more degrees of freedom (i.e., the number of components in the model that need to be known). Incorporating spatial autocorrelation in GAMs improved model performance, explaining 51% of the variance. We caution that the primary influence of the spatial autocorrelation term in the augmented GAM (Table 3) may be confounded with the influence of spatially structured variables such as wind speed, relative humidity, and to a lesser extent, elevation and fuel distributions (Fig. 4a). Indeed, the burned area patterns of SA fires varied by latitude and longitude along a south-western directional gradient. The non-linear multiplicative model fit explained 44% of the variance and also had a lower bias compared to MLR, with the same degrees of freedom. The multiplicative model for SA fires had the form as seen in Eq. 3 below.
Multiple linear regression produced spatial patterns that had excessive spatial smoothing relative to the observations (Fig. 6). In contrast, nonlinear multiplicative and random forest models captured more of the fine scale spatial structure.

Comparison of modeling approaches for non-SA fires
Non-SA model performance was somewhat weaker, ranging from 21% of the variance explained for the MLR to 48% for the RF model. Both GAMs (27%) and non-linear multiplicative models (23%) yielded slight improvements over MLR (i.e., had higher correlation coefficients and decreased AIC; Table 4). The non-linear multiplicative model developed for non-Santa Ana fires had the form as seen in Eq. 4 below.
Adding a spatial autocorrelation term had little effect on the overall performance of GAM, explaining 32% of the variance. For non-SA fires, the RF model also had the lowest RMSE and bias values, and resolved more of the observed patterns (Fig. 7).

Relative importance of biophysical and human variables
We found that the relative importance of variables influencing burned area differed between SA and non-SA fires. All models for SA fires identified FFWI as the variable that explained the most variance in burned area (i.e., from 28% of the total model predicted variance in MLR model to 40% for RF; Table 3). Wind speed, relative humidity, distance to housing, and shrub cover were comparatively strong contributors to model performance, and precipitation and tree cover were weaker factors. Shrub cover, relative humidity, temperature, wind speed and precipitation were the most important determining factors for predicting non-SA fires (Table 4). Road density was the strongest human variable influencing the spatial distribution of non-SA fires. Distance to housing and ignition frequency contributed to a lesser degree, though both factors are highly correlated with distance to roads.

Predictive mapping and split conditions
The "summary" trees created from the RF predictions for SA and non-SA fires were effective at predicting burned area for extreme cases, where particularly small or large proportions of an area were predicted to burn. A possible explanation is that the environmental and human conditions resulting in either high or low burned areas were easily identified for both fire types (Fig. 8). For SA fires, areas with FFWI <21 and located at a distance ≥5.8 km from low-density housing had the lowest mean predicted burned fraction (0.1% per yr) while representing nearly 30% of the domain (Table 5). Low SA burned fractions (<0.25% per yr) were also predicted in another 22% of the domain within areas that were close to urban development (where distance to housing was <5.8 km). Shrub cover in these regions was <48% and FFWI was <17 (Table 5). Intermediate SA burned area predictions coincided with areas at low elevations (<900 m), with shrubland cover greater than 48%, and in close proximity to the wildland-urban interface (d.hou <5.8 km) (Table 5). Areas with higher predicted burned area were often located at a distance <6.2 km from low-density housing, with FFWI ≥21 and low relative humidity (<49%). These fire-prone conditions were especially common in the Santa Monica Mountains (Fig. 8a).
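The split conditions described above for SA fires can be read as a set of nested rules. The sketch below encodes only the thresholds stated in the text and returns a qualitative class. The function name, the order in which the rules are checked, and the "unclassified" fallback are assumptions; the actual tree structure is given in Table 5.

```python
def sa_burn_class(ffwi, dist_housing_km, shrub_cover_pct=None,
                  elevation_m=None, rel_humidity_pct=None):
    """Qualitative SA burned-area class from the split thresholds in the text.

    The rule order is an assumption; the published tree may nest the
    splits differently.
    """
    # Higher burned area: strong fire weather near low-density housing.
    if (ffwi >= 21 and dist_housing_km < 6.2
            and rel_humidity_pct is not None and rel_humidity_pct < 49):
        return "higher"
    # Lowest burned fraction (about 0.1% per yr): weak fire weather, remote.
    if ffwi < 21 and dist_housing_km >= 5.8:
        return "lowest"
    if dist_housing_km < 5.8 and shrub_cover_pct is not None:
        # Low burned fraction (<0.25% per yr) close to urban development.
        if shrub_cover_pct < 48 and ffwi < 17:
            return "low"
        # Intermediate: dense shrubland at low elevation near the WUI.
        if (shrub_cover_pct > 48 and elevation_m is not None
                and elevation_m < 900):
            return "intermediate"
    return "unclassified"

# Hypothetical grid cells.
remote_cell = sa_burn_class(ffwi=15, dist_housing_km=8.0)
wui_cell = sa_burn_class(ffwi=25, dist_housing_km=4.0, rel_humidity_pct=30)
```

Cells that match none of the stated conditions fall through to "unclassified", since the text does not describe every node of the tree.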
The occurrence of non-SA burns was mostly discriminated by fuel type (the amount of shrub cover) and relative humidity (Table 6). High humidity (≥62%) and low shrub cover (<40%) led to predictions of low to moderate non-SA burned area (i.e., mean annual fraction burned between 1% and 3%). Denser shrub cover (≥40%) and lower relative humidity (<63%) were associated with intermediate to high burned areas. The fire probability within this group was further increased by an annual average of daily minimum temperatures ≥6.7°C. Some areas showed extensive non-SA burning despite lower minimum temperatures; these areas were associated with low landscape fragmentation (ED <0.2). Recurrent and extensive non-SA fires were predicted in areas characterized by relative humidity <60%, dense shrub cover (≥70%), an annual rainfall ≥438 mm and an annual average of daily minimum temperatures ≥8.9°C (Fig. 8b; Table 6, nodes 9 and 10). These conditions were typical of the northern part of the Los Padres National Forest and the western part of the Angeles National Forest (Fig. 8b).

$$y = C_1 \times \frac{1}{1 + e^{\left(C_2 \times \mathrm{ffwi} + C_3 \times \mathrm{wind.s} + C_4 \times \mathrm{prec} + C_5 \times \mathrm{shr} + C_6 \times \mathrm{d.hou} + C_7 \times \mathrm{tmax} + C_8 \times \mathrm{rel.h}\right)}} \qquad (3)$$

$$y = C_1 \times e^{\left(C_2 \times \mathrm{shr} + C_3 \times \mathrm{rel.h} + C_4 \times \mathrm{tmin} + C_5 \times \mathrm{rd.den} + C_6 \times \mathrm{wind.s} + C_7 \times \mathrm{tmax} + C_8 \times \mathrm{d.hou} + C_9 \times \mathrm{pred.ign}\right)} \qquad (4)$$

Fig. 6. Geospatial model predictions of SA fire burned area. Panels show: (a) the observed burned area, (b) the burned area predicted using multiple linear regression, (c) the area predicted using a generalized additive model, (d) the area predicted using a generalized additive model with spatial autocorrelation, (e) the area predicted using a non-linear multiplicative model, and (f) the area predicted using a random forest model.

Contrasting patterns of SA and non-SA fires
We characterized the spatial patterns of burned area and investigated the associated drivers for SA and non-SA fires in Southern California; these two fire regimes overlap spatially but are temporally distinct. The environmental and human-related driver variables influence the two types of fire in markedly different ways. Jin et al. (2014) described a comprehensive analysis of the environmental controls on the temporal dynamics of SA and non-SA fires. Low relative humidity and strong wind promote ignition and increase the rate of fire spread within dry fuels, especially for SA fires. The cumulative precipitation during both the current year and the preceding 3 yr exerts a strong influence on fine fuel accumulation, which increases the likelihood of non-SA fire occurrence.

Fig. 7. Geospatial model predictions of non-SA fire burned area. Panels show: (a) the observed burned area, (b) the burned area predicted using multiple linear regression, (c) the area predicted using a generalized additive model, (d) the area predicted using a generalized additive model with spatial autocorrelation, (e) the area predicted using a non-linear multiplicative model, and (f) the area predicted using a random forest model.
Our research builds on previous studies that have provided an understanding of how meteorological factors (i.e., temperature and precipitation) constrain the temporal dynamics of fuel characteristics and fire activity. Temperature modulates fuel moisture directly through evapotranspiration, and indirectly at higher elevation through snowpack accumulation and melt (Westerling 2006). Westerling and Bryant (2008) proposed two basic fire regimes: "energy-limited", which occur in relatively wet and dense forested ecosystems where fuel flammability is the limiting factor, and "fuel-limited", which occur in low-density shrubland where spread is limited by fuel availability. Meteorological conditions during preceding years can have an important effect on fuel accumulation and thus fire spread in "fuel-limited" systems (Littell et al. 2009, Stavros et al. 2014a). Our analysis expands on previous work to show that the spatial distributions of SA and non-SA fires in Southern California respond differently to environmental and human drivers and their interactions. Ignitions in Southern California are clustered around urban development and transportation corridors, and are more widely scattered within wildland areas.

Fig. 8. Spatial clustering of observed data using regression tree classification. Panels show the areas where fire regimes of Santa Ana (a) and non-Santa Ana fires (b) are under varying degrees of human, fuel and climatic controls. The mean predicted burned area fraction of each node is listed in the legend, and the corresponding sets of human and biophysical conditions with each node number, shown in parentheses, are described in Table 5 for Santa Ana fires and Table 6 for non-Santa Ana fires.
The occurrence of very large fires (e.g., 10 000 acres) in the foothills and lower montane ecosystems of the San Gabriel and Castaic ranges reflects the likelihood of periurban ignition, and the influence of continuous, chaparral-dominated fuels that facilitate fire spread (Fig. 2; Fig. 4). The combination of suitable fire weather during summer and fall, wet winters that promote vegetation growth, steep terrain and interspersed fuels allows ignitions to grow into large wildland fires. These conditions are particularly effective at promoting fire growth in parts of the Los Padres National Forest and Angeles National Forest. The spatial configuration and location of SA burns differ from those of non-SA fires, owing, in part, to a north-south gradient of the meteorological and topographic factors influencing SA fires (Minnich 1995, Moritz 1997). Santa Ana winds may reach 10-20 m/s as the northeasterly flow is channeled through passes and canyons and are usually accompanied by very low relative humidity (i.e., 5-20%; Raphael 2003, Hughes and. Ignitions starting at the WUI of the Los Angeles Basin can develop into large fires across the San Gabriel Valley and the Santa Monica Mountains (Fig. 4a). Similarly, the terrain-amplified flow of easterly downslope winds over San Diego County's Laguna Mountains is responsible for the spread of large chaparral fires toward the coast and WUI (Fovell 2012). Our results build on earlier work by Moritz et al. (2010) and provide further evidence for strong meteorological forcing on the spatial distribution of SA fires. We found the FFWI was especially effective at capturing the combined influence of wind velocity, relative humidity, and temperature on SA burned area (Fig. 4a, Table 4). Short-term (hourly to daily) variations in fire weather (i.e., relative humidity, precipitation, temperature, wind velocity) have been associated with local fire behavior through their influence on fire spread and intensity (Flannigan and Harrington 1988, Bessie and Johnson 1995, Keeley 2004, Schoennagel et al. 2004).

Note (Table 5): These splits identify break points in the predictor variables that are important for explaining burned area spatial patterns for Santa Ana fires. The reliability of this regression tree is slightly decreased from the original random forest predictions (R²_SA = 0.56, P < 0.001). Please refer to Table 1 for the definition of acronyms and a full description of explanatory variables.

Note (Table 6): These splits identify break points in the predictor variables that are important for explaining burned area spatial patterns for non-Santa Ana fires. The reliability of this regression tree is slightly decreased from the original random forest predictions (R²_nonSA = 0.42, P < 0.001). Please refer to Table 1 for the definition of acronyms and a full description of explanatory variables.

Model comparison
We developed five different classes of fire model by regressing observed burned area on human, meteorological, and biophysical variables using a stepwise approach. We found little difference in the set of predictors retained by the different models, yet the relative importance of explanatory variables varied considerably. The models differed significantly in the amount of variance explained, underscoring the value of using a suite of approaches for predicting the spatial patterns of burned area, as well as diagnosing the importance of controlling variables.
The comparison of model performance (Tables 3 and 4) revealed that random forest models performed significantly better than MLR, GAMs, and non-linear multiplicative models. Classification and regression tree (CART) procedures such as random forest can find optimal binary splits in the selected covariates to partition the sample recursively into increasingly homogeneous clusters (Cutler et al. 2007). As a consequence, this technique may be more effective at distinguishing presence and absence (areas that are fire-prone vs. those that are unsuitable for burning) than models with continuous outputs such as GAMs. Random forest yielded the most accurate predictions, but did not perform well when used on spatially independent test data or when the sample size of the training data set was varied. This suggests that RF models suffered more from overfitting than the other models (Dormann 2011). Non-linear multiplicative models, in contrast, offered a good compromise between complexity and performance. They performed well compared to RF (i.e., low overfitting bias and high cross-validated R²) with relatively few degrees of freedom.
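Testing a model on spatially independent data generally means holding out whole regions rather than randomly chosen cells, so that test cells are not immediate neighbors of training cells. The sketch below shows one common way to build such a split by grouping grid cells into square blocks. The block size, fold count, coordinates, and function names are illustrative assumptions and do not describe the validation design used in the study.

```python
def spatial_block_folds(cells, block_size_km, n_folds):
    """Assign each (x_km, y_km) cell to a fold based on its spatial block."""
    block_of = [(int(x // block_size_km), int(y // block_size_km))
                for x, y in cells]
    # Blocks are numbered in sorted order and dealt out round-robin to folds.
    unique_blocks = sorted(set(block_of))
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in block_of]

def train_test_indices(folds, held_out):
    """Split cell indices into training and test sets for one held-out fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    test = [i for i, f in enumerate(folds) if f == held_out]
    return train, test

# Hypothetical cell centers in km on a projected grid.
cells = [(1, 1), (2, 3), (12, 1), (14, 4), (1, 12), (13, 13)]
folds = spatial_block_folds(cells, block_size_km=10, n_folds=2)
train_idx, test_idx = train_test_indices(folds, held_out=1)
```

Cells that share a block always land in the same fold, which is what prevents a flexible model such as a random forest from being scored on cells adjacent to its training data.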
Our results showed that integrating a spatial autocorrelation term significantly increased the variance explained for SA fires (Table 3), likely as a consequence of resolving areas where a maximum neighborhood effect existed between predictors (i.e., areas where the spatial processes were explained by the surrounding influence of biophysical and human factors). Indeed, the spatial autocorrelation of SA fire weather variables induced a strong clustering effect in specific areas for SA fires. SA fires were most common in areas where FFWI was ≥21 and relative humidity <49%. In contrast, the human and biophysical conditions associated with non-SA fires were widely distributed across the region. Hence, a broader combination of factors explained the distribution of non-SA fire patterns, which limited the influence of neighborhood effects. A decrease in the average size of non-SA fires in the latter half of the 20th century may be the result of effective fire suppression, limiting most fires to very small sizes and generating fine-grain fire mosaics (Conard and Weise 1998). This translated into scattered non-SA fire patterns, as observed in the upper-elevation areas of the San Bernardino and Cleveland National Forests, where fires are actively suppressed.
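The contrast between clustered and scattered burn patterns can be quantified with a spatial autocorrelation statistic. The sketch below computes Moran's I on a small grid using rook (edge-sharing) neighbors. Moran's I is used here only as a familiar diagnostic and is an assumption; the text does not state how the autocorrelation term in the GAM was constructed, and the grids are invented.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid with binary rook-adjacency weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            # Visit the four edge-sharing neighbors that lie inside the grid.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (num / denom)

# Hypothetical burned (1) / unburned (0) patterns.
clustered = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [1, 1, 0, 0],
             [1, 1, 0, 0]]
checkerboard = [[1, 0],
                [0, 1]]
i_clustered = morans_i(clustered)
i_checker = morans_i(checkerboard)
```

Values near +1 indicate that similar values cluster together, values near 0 indicate no spatial structure, and negative values indicate that neighboring cells tend to differ.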

Potential for change in fire activity in Southern California
Projections of the impact of climate change on wildfire activity in Southern California have yielded contradictory results (Lenihan et al. 2008, Westerling and Bryant 2008). Uncertainty in future synoptic meteorological conditions, and the effect of complex topography on local surface wind speed, complicate efforts to predict future trends of Santa Ana wind occurrence and intensity (Miller and Schlegel 2006). Wildfire activity during the summer fire season is strongly associated with immediate drought conditions and, to a lesser extent, a moisture deficit from the preceding year (Westerling and Swetnam 2003). Most climate models project significant increases in surface temperatures for Southern California in the coming decades, while mean precipitation is expected to remain constant (Hayhoe et al. 2004, Cayan et al. 2008). Temperature projections indicate an annual warming of 1.5 to 5°C by 2100, with a fall median of 2°C (Yue et al. 2014).
The fire predictions based on the regression trees provide a simplified illustration of the potential responses of fire under changing human and climatic influences. The models associated a high burning probability for non-SA fires with dense shrub cover (≥70%), an annual rainfall ≥438 mm and an annual average daily minimum temperature ≥8.9°C (Fig. 8b; Table 6). Such conditions are already typical for large swaths of the foothills and mountain ranges of Southern California (e.g., San Bernardino and Angeles National Forests), and the burned area may increase further as a result of warming at higher elevation (Yue et al. 2014). Rising temperatures will facilitate earlier snow-melt, runoff and green-up, desiccating fuels earlier and creating a longer fire season (McKenzie et al. 2004, Flannigan et al. 2005, Westerling 2006, Littell et al. 2009, Parisien and Moritz 2012, Stavros et al. 2014b). Yue et al. (2014) projected that median area burned in Southern California will likely double as a consequence of rising temperatures and increased length of wildfire season.
Rising temperatures coupled with an increased ignition probability and an expanding WUI will also impact the SA fire regime and may generate more frequent, larger, and higher severity fires in Southern California. The regression tree mapping of SA fires identified areas near the WUI with temperatures >10.8°C as especially hazardous for the spread of SA fires. However, the occurrence of Santa Ana events is projected to decrease by 2100 (Hughes et al. 2011) and their peak occurrence is projected to shift from September-October to November-December with the decrease in the temperature gradient between the desert and ocean (Miller and Schlegel 2006). Consequently, wildfires spreading under SA conditions are expected to be less frequent. Widespread burning by SA fires in the coastal ranges (e.g., Santa Ana and Santa Monica mountains) may accelerate the expansion of grassland at the expense of shrublands, and an important next step is to integrate these types of vegetation feedbacks into predictive fire models.

Conclusion
We partitioned wildfires in Southern California into those coincident with SA and non-SA conditions and separately modeled the spatial patterns of mean annual area burned during 1960-2009. Five different regression methods, including a random forest model, were tested. We found that these different methods explained 38-63% of the spatial variance in the area burned by SA fires and 21-48% of the variance for non-SA fires. Further work is needed to investigate how fire suppression or other factors, such as time-since-last-fire, contribute to the spatial patterns of non-SA fires. Our study implies that a separate consideration of SA and non-SA fire regimes should improve assessments of fire probability, and may be a useful consideration for the development of wildfire policy in Southern California. Fuel reduction treatments intended to mitigate large fire hazard may prove comparatively ineffective in preventing fire spread under SA conditions. Syphard et al. (2012) noted that the majority of fire-related property losses occur within areas of low-fuel volume, such as grasslands, which have low-heat requirements for ignition and the potential to carry fires to nearby shrubland and woodlands. Further research is needed to integrate climate and urban development trends for predicting future burned area patterns.

Acknowledgments
This study was supported by NASA Interdisciplinary Science grant NNX10AL14G to the University of California, Irvine. We thank the USFS and the California Department of Forestry and Fire Protection for providing the fire perimeter dataset. We thank researchers from UCI's Earth System Science Department for their comments on earlier versions of the manuscript.

Literature cited