Evaluation of CMIP6 model performances in simulating fire weather spatiotemporal variability on global and regional scales

Abstract. Weather and climate play an important role in shaping global wildfire regimes and geographical distributions of burnable area. As projected by the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6), in the near future, fire danger is likely to increase in many regions due to warmer temperatures and drier conditions. General circulation models (GCMs) are an important resource in understanding how fire danger will evolve in a changing climate, but, to date, the development of fire risk scenarios has not fully accounted for systematic GCM errors and biases. This study presents a comprehensive global evaluation of the spatiotemporal representation of fire weather indicators from the Canadian Forest Fire Weather Index System simulated by 16 GCMs from the sixth Coupled Model Intercomparison Project (CMIP6). While at the global scale, the ensemble mean is able to represent variability, magnitude and spatial extent of different fire weather indicators reasonably well when compared to the latest global fire reanalysis, there is considerable regional and seasonal dependence in the performance of each GCM. To support the GCM selection and application for impact studies, the evaluation results are combined to generate global and regional rankings of individual GCM performance. The findings highlight the value of GCM evaluation and selection in developing more reliable projections of future climate-driven fire danger, thereby enabling decision makers and forest managers to take targeted action and respond to future fire events.


1 Introduction
Wildfires burn hundreds of millions of hectares each year around the world (Giglio et al., 2013; Yang et al., 2014; van Lierop et al., 2015; van Wees et al., 2021). Their impacts include profound effects on ecosystems, damage to infrastructure, high costs associated with suppression activities and risk to human lives. In recent years, the impacts of devastating individual events have been widely reported. For instance, the 2016 wildfire in Fort McMurray (Alberta, Canada) resulted in the destruction of around 2400 buildings, the evacuation of 88 000 people and financial costs of more than USD 3.5 billion (Mamuji and Rozdilsky, 2019). In California, during the 2020 wildfire season, around 1.7 × 10^6 ha burned, causing 33 casualties and damaging more than 10 000 infrastructure elements (Department of Forestry and Fire Protection, 2021). Responding to present and future fire risks is of critical importance, particularly in the world's most vulnerable regions. Given the strong influence of weather and climate on temporal and spatial patterns of wildfire occurrence (Flannigan and Wotton, 2001; Zumbrunnen et al., 2009; Masrur et al., 2018), a better understanding of the impact of climate change on wildfire risk, and the tools used to quantify this impact, is an important step in formulating such responses.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Wildfires are associated with a multitude of drivers, including land use; vegetation type; topography; and, quite significantly, human activity linked to ignitions (Camia et al., 2013; Balch et al., 2017; Gaboriau et al., 2020; Fernández-Guisuraga et al., 2021). In addition, wildfire occurrence, spread and impact (in terms of area burned) are highly dependent on climate and weather conditions (Littell et al., 2009; Abatzoglou and Kolden, 2013; San-Miguel-Ayanz et al., 2013; Harris et al., 2019; Mueller et al., 2020). Across the globe, long-established spatiotemporal patterns of wildfire are being altered by changing land use; population rise; and, perhaps most importantly, changes to the climate system in a warming world (United Nations Environment Programme, 2022). While wildfires cannot strictly be defined as meteorological hazards in the same way as droughts, floods and storms, fire danger is greater during periods of high temperature, minimal precipitation, low relative humidity and strong winds. Notably, higher temperatures are significantly related to wildfire occurrence and a large extent of burned areas (Westerling et al., 2006; Littell et al., 2009; Koutsias et al., 2013; Cardil et al., 2015). The same positive relationship between drought and wildfires has also been documented (Littell et al., 2016). Similarly, lower precipitation and increased dry days intensify wildfire activity (Flannigan and Harrington, 1988; Holden et al., 2018).
Disentangling the respective contribution of different meteorological variables to fire risks is challenging, particularly in a changing climate. It is understood that increases in the intensity and frequency of hot extremes (e.g. heat waves) are an expected consequence of a warmer world, and changes in mean precipitation will vary geographically (IPCC, 2021b). On a global scale, weather conditions may become more favourable to wildfire activity (Jolly et al., 2015; de Rigo et al., 2017; Mueller et al., 2020) and extend over longer periods (Jolly et al., 2015). To better understand past, present and future changes, it is usually preferable to combine the hot, dry and windy conditions that are conducive to fire. The term fire weather was coined to describe the collective influence of local specific weather conditions that may lead to effective ignition and fire spread (Schroeder and Buck, 1970). Fire weather is typically quantified as a series of indicators, generated based on meteorological input variables and established empirical relationships, which can be used to estimate wildfire danger.
Future changes in fire weather will most likely represent an increase in wildfire danger in many regions of the world (de Rigo et al., 2017; Arias et al., 2021). Understanding future meteorologically driven wildfire danger under climate change scenarios relies on projections from general circulation models (GCMs). As mathematical representations of the climate system and its processes, GCMs are the most important tool in understanding how the world's climate has varied in the past and how it will respond to different future scenarios associated with anthropogenic climate change. GCMs have frequently been used to quantify the link between wildfire activity and weather conditions (Bedia et al., 2015; Williams and Abatzoglou, 2016), specifically, to simulate fire weather both in the past and under future climate change scenarios (Moritz et al., 2012; Flannigan et al., 2013; Bedia et al., 2015; Littell et al., 2018; Abatzoglou et al., 2019) and also in recent attribution studies to assess the influence of anthropogenic climate change on fire weather (Barbero et al., 2020; Liu et al., 2022). However, all GCMs are associated with performance limitations that manifest as systematic biases and, ultimately, as uncertainty in GCM projections (Hawkins and Sutton, 2009; Lehner et al., 2020). Evaluation of model outputs, whether generated by individual GCMs or as part of a multi-GCM ensemble, is a continuous challenge and has been the subject of numerous studies (Johns et al., 2006; Flato et al., 2013; Baker and Taylor, 2016; Kotlarski et al., 2019). It is especially important for climate impact studies to (a) use projections from multiple GCMs and (b) evaluate the capacity of each individual GCM to represent characteristics of climate variables or phenomena that are relevant to the impact under investigation. To date, fire weather projections have frequently been based on single GCMs (e.g. Krawchuk et al., 2009; Amatulli et al., 2013), and, even when multiple GCMs have been used (e.g. Moritz et al., 2012; Dowdy et al., 2019), the capacity of each GCM to simulate realistic conditions (i.e. comparable to observed fire weather conditions) has not been thoroughly evaluated. In the absence of a comprehensive GCM evaluation, it is not possible to characterise and quantify the uncertainties that may affect the reliability of multi-GCM means and projections (Moritz et al., 2012; Bedia et al., 2015; Dowdy et al., 2019).
This study aims to evaluate the performance of the latest generation of GCMs from the sixth phase of the Coupled Model Intercomparison Project (CMIP6) in simulating a range of fire weather indicators across all fire-prone regions of the world (see Sect. 2.4). The analysis represents the first global evaluation of GCM capacity to realistically simulate spatiotemporal variability in meteorologically driven wildfire danger. Evaluation is performed at the global and regional scales, accounting for model performance in simulating both mean and extreme fire weather conditions. The results generated are relevant for wildfire risk assessment studies and more informed decision-making and planning to respond to future fire danger. In the context of the ongoing global climate change, more tailored fire management strategies are key to better adapt to future fire weather conditions.
The remainder of this paper is organised into four sections. Section 2 gives an overview of the chosen set of fire weather indicators, the CMIP6 models and the reference datasets used as the basis for evaluation, alongside a description of the evaluation methodology. Section 3 presents the results of the model evaluation on both global and regional scales, initially for the multi-GCM mean and seasonality and subsequently for inter-model performance. Section 4 includes a synthesis and discussion of the implications of the results. Section 5 provides a set of conclusions and an outlook.
2 Data and methods

2.1 Fire weather indicators
The long-established relationship between climate and wildfire has led to the development of a range of meteorology-based indicators to describe fire weather (and consequently fire danger) in different parts of the world (e.g. McArthur, 1967; Deeming et al., 1972; Van Wagner, 1974). Throughout this study, indicators of fire weather are represented by the Canadian Fire Weather Index System (CFWIS). While originally developed for a standard pine forest in Canada (Van Wagner, 1974, 1987; Wotton, 2009), this system has been proven to be applicable in other regions (Carvalho et al., 2008; Di Giuseppe et al., 2016; Bowman et al., 2017) and is being used by the European Commission for fire weather statistics in Europe (European Forest Fire Information System) and worldwide (Global Wildfire Information System). It is also widely used for projections of future fire weather (Bedia et al., 2015; Camia et al., 2017; Dupuy et al., 2020).
The CFWIS consists of a set of different components, each of them calculated using a combination of daily meteorological variables (Van Wagner, 1987; Fig. 1): temperature, wind speed, relative humidity and precipitation. Firstly, a set of fuel moisture codes describe the quantity of moisture contained by fire fuels: the Fine Fuel Moisture Code (FFMC) represents the moisture content of litter and other fine fuels, indicating the relative ease of ignition and the flammability of fine fuel; the Duff Moisture Code (DMC) represents the average moisture content of loosely compacted organic layers of moderate depth; and the Drought Code (DC) represents the average moisture content of deep, compact organic layers. The following components describe weather-driven fire behaviour: the Initial Spread Index (ISI) represents the expected rate of fire spread, combining the effects of wind and FFMC on the rate of spread without the influence of variable quantities of fuel; and the Buildup Index (BUI) represents the total amount of fuel available for combustion, combining DMC and DC. Finally, two indices are calculated: the Fire Weather Index (FWI) represents fire intensity, combining ISI and BUI, and is often used as the main fire danger indicator (Padilla and Vega-García, 2011; Bedia et al., 2015; de Rigo et al., 2017); the Daily Severity Rating (DSR), an extension of the CFWIS, is a transformation of the daily FWI value, representing the effort required for suppression. All fire weather components of the system are numeric ratings, and a higher number represents a higher potential fire danger. A detailed description of the system and its individual components can be found in Van Wagner (1987).
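As an illustration of how one component is derived from another, the DSR transformation of daily FWI can be sketched in a few lines. This is a minimal sketch, assuming the standard power-law form DSR = 0.0272 × FWI^1.77 given by Van Wagner (1987); the function name and the input values are hypothetical, and the study itself used the R package cffdrs rather than this code.

```python
def daily_severity_rating(fwi: float) -> float:
    """Return the Daily Severity Rating for one daily FWI value.

    Assumes the power-law form DSR = 0.0272 * FWI**1.77 (Van Wagner, 1987).
    """
    if fwi < 0:
        raise ValueError("FWI must be non-negative")
    return 0.0272 * fwi ** 1.77


# Hypothetical daily FWI values for a single grid cell.
fwi_series = [0.0, 5.0, 20.0, 40.0]
dsr_series = [daily_severity_rating(v) for v in fwi_series]
```

Because the exponent is greater than 1, the transformation weights high FWI days more heavily than low ones, which is consistent with DSR being read as a measure of suppression effort.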

2.2 CMIP6 models
During recent decades, the development and dissemination of a growing number of GCMs from numerous modelling centres around the world have been coordinated by CMIP (Meehl et al., 2000, 2007; Taylor et al., 2012; Eyring et al., 2016). CMIP supports climate change assessments at national and international levels and brings about climate model improvements. CMIP results have consequently been used to prepare the Intergovernmental Panel on Climate Change (IPCC) assessment reports (IPCC, 2021a). CMIP's sixth and current phase (CMIP6) (Eyring et al., 2016) includes the participation of more institutions (and model versions) in comparison to the project's fifth phase (CMIP5).
We calculated the CFWIS components using the R package cffdrs (Wang et al., 2017). The CFWIS typically requires observations of temperature, relative humidity and wind speed taken at noon local time, in addition to 24 h accumulated precipitation. For a consistent approach to the global analysis, daily values for maximum temperature, mean wind speed, minimum relative humidity and total precipitation were used as proxies for noon conditions. This approach is similar to that taken by Jolly et al. (2015) and Calheiros et al. (2021). At the time of analysis, the required input fields were available for 16 CMIP6 models (Eyring et al., 2016). Given the disparity in ensemble size among the available models, our analysis is limited to a single ensemble member for each model. The full set of models, developed by a total of 13 institutions, is detailed in Table 1.
Following the calculation of the CFWIS components, to permit comparison between CMIP6 models and the reference data, all data were re-gridded to a 2° × 2° resolution, using bilinear interpolation.
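The re-gridding step can be illustrated with a small, self-contained sketch of bilinear interpolation on a regular latitude-longitude grid. The function names, grid values and target coordinates below are hypothetical; the sketch ignores longitude wrap-around, missing values and land-sea masking, and it is not the tool used to produce the results in this study.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field[i][j], defined at (lats[i], lons[j]), to (lat, lon).

    lats and lons must be strictly increasing, and the target point must lie
    inside the source grid.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("target point lies outside the source grid")
    # Index of the lower-left corner of the enclosing grid cell.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the target inside that cell (0 to 1).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    low = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    high = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return low * (1 - ty) + high * ty


def regrid(lats, lons, field, new_lats, new_lons):
    """Evaluate the source field at every point of a new regular grid."""
    return [[bilinear(lats, lons, field, la, lo) for lo in new_lons]
            for la in new_lats]


# Hypothetical 1-degree source field, regridded to points spaced 2 degrees apart.
src_lats = [0.0, 1.0, 2.0, 3.0, 4.0]
src_lons = [0.0, 1.0, 2.0, 3.0, 4.0]
src = [[la + lo for lo in src_lons] for la in src_lats]
coarse = regrid(src_lats, src_lons, src, [1.0, 3.0], [1.0, 3.0])
```

Bilinear interpolation reproduces linear fields exactly, so it preserves smooth large-scale gradients, but it also smooths local extremes when moving from a fine grid (such as 0.25°) to a coarse one.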

2.3 Fire danger reanalysis
An obvious choice for observational reference for fire weather is CFWIS data from the Global ECMWF Fire Forecast model (hereafter GEFF-ERA5) (Vitolo et al., 2020). Produced by the European Forest Fire Information System of the Copernicus Emergency Management Service, GEFF-ERA5 offers daily continuous fire weather data of the different CFWIS components at a spatial resolution of 0.25° throughout the world's land area. GEFF-ERA5 has been driven by input fields from the ERA5 Reanalysis (ERA5; Hersbach et al., 2020) from 1979 to present and replaces the previous global fire danger reanalysis driven by ERA-Interim (Vitolo et al., 2019). In general, ERA5 provides a realistic and temporally coherent approximation of real-world weather states, with higher spatial and temporal resolutions and better estimates of meteorological variables compared to ERA-Interim (Dee et al., 2011; Hersbach et al., 2019), reducing biases and increasing correlation with observations (Graham et al., 2019; Gleixner et al., 2020; Tarek et al., 2020). GEFF-ERA5 and other reanalysis-derived fire weather indicators have been shown to represent fire danger well. For instance, McElhinny et al. (2020) found a generally good agreement between FWI values and station observations in Canada. In our case, as the CFWIS indicators generated from CMIP6 rely on daily values for the four meteorological components as proxies for noon conditions, and to ensure a fair comparison, we generate CFWIS indicators for ERA5 using the same input components. We make a comparison between ERA5 and GEFF-ERA5 to illustrate the consistency between the two sources of CFWIS information.
Table 1. List of the 16 models used to simulate the CFWIS components and their original resolutions.

2.4 Model evaluation
Model evaluation is limited to the areas of the world considered vulnerable to fire activity. Such fire-prone areas of the world are here defined according to the historical evidence of fire activity, determined using burned area data from version 4 of the Global Fire Emissions Database (GFED4) (Giglio et al., 2013; Poulter et al., 2015; Mezuman et al., 2020). GFED4 burned area data are available for the 1996-2016 period. Following the approach of Liu et al. (2022) in isolating burnable area, all grid points within a 50 km radius of a record of burned area are identified as fire-prone in order to account for the spatial randomness of fire activity and the relatively short record of the GFED4 data.
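The 50 km rule can be sketched as a distance test between each grid point and the locations with recorded burned area. The sketch below is illustrative only: it assumes great-circle (haversine) distance on a spherical Earth of radius 6371 km, uses a brute-force search, and all function names and coordinates are hypothetical rather than taken from the GFED4 processing used here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fire_prone_mask(grid_points, burned_points, radius_km=50.0):
    """Flag each (lat, lon) grid point lying within radius_km of any burned point."""
    return [
        any(haversine_km(glat, glon, blat, blon) <= radius_km
            for blat, blon in burned_points)
        for glat, glon in grid_points
    ]


# Hypothetical example: one burned-area record on the Equator.
grid = [(0.0, 0.0), (0.0, 0.4), (0.0, 1.0)]
burned = [(0.0, 0.0)]
mask = fire_prone_mask(grid, burned)
```

For global grids a spatial index would be preferable to the brute-force loop, but the logic of the mask is the same.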
To understand the overall model representation of all CFWIS components (Fig. 1), historical simulations from each GCM are then compared to corresponding ERA5-calculated fields between 1980 and 2014, the maximum period for which ERA5 and CMIP6 data are concurrently available. Model performance is then quantified through the ability of GCMs to simulate monthly mean climatologies of daily values of each CFWIS indicator with ERA5 used as a reference. Additionally, to account for severe fire weather, performance is also quantified by representation of the 90th percentile, constructed for each month using daily CFWIS values across all years. Evaluation of model representation of spatial and seasonal patterns is undertaken for all CFWIS components at both the global and regional scales, firstly, concerning the multi-model mean (Sect. 3.1 and 3.2) and, secondly, with respect to the inter-model spread (Sect. 3.3). Multiple model performance metrics are used, including (i) spatial correlation to assess the representation of spatial variability, (ii) root mean squared error (RMSE) to assess the representation of mean states and the extent of model bias, and (iii) the ratio of the observed and simulated standard deviations to assess the representation of spatial variance. Taylor diagrams (Taylor, 2001; Grimmond et al., 2010; Abbasian et al., 2019) are used to visualise and quantify inter-model relative performance in terms of each model's capacity to reproduce the mean, variance and spatial variability of each CFWIS component. Regional analysis is based on 14 GFED-defined fire regions originally presented by Giglio et al. (2006) and Van der Werf et al. (2006) and widely used in subsequent work (e.g. Giglio et al., 2010, 2013; Andela et al., 2019; Mezuman et al., 2020; Grillakis et al., 2022; Liu et al., 2022). To isolate CMIP6 performance during periods that are most conducive to fire activity, a fire season was established for each region based on available GFED4 burned area data. For each GFED-defined region, the fire season was defined by those months for which the total burned area is greater than 50 % of the maximum burned area across all months, averaged for each month over the available 1996-2016 period.
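The fire season rule described above can be written as a short routine: average burned area by calendar month over the available years, then keep the months exceeding 50 % of the largest monthly mean. This is a minimal sketch under that reading of the definition; the function names and burned-area values are hypothetical.

```python
def mean_monthly_burned_area(burned_by_year):
    """Average burned area per calendar month.

    burned_by_year is a list with one 12-value list (Jan..Dec) per year.
    """
    n_years = len(burned_by_year)
    return [sum(year[m] for year in burned_by_year) / n_years
            for m in range(12)]


def fire_season_months(monthly_mean, threshold=0.5):
    """Return the months (1 = Jan) whose mean burned area exceeds
    threshold times the maximum monthly mean."""
    if len(monthly_mean) != 12:
        raise ValueError("expected 12 monthly values")
    peak = max(monthly_mean)
    if peak <= 0:
        return []  # no recorded burning, so no fire season
    return [month for month, area in enumerate(monthly_mean, start=1)
            if area > threshold * peak]


# Hypothetical burned area (arbitrary units) for two years in one region.
years = [
    [0, 0, 0, 0, 2, 8, 10, 6, 2, 0, 0, 0],
    [0, 0, 0, 0, 4, 10, 10, 10, 2, 0, 0, 0],
]
season = fire_season_months(mean_monthly_burned_area(years))
```

In this hypothetical case the monthly means peak in July, and only June, July and August exceed half of that peak.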

3.1 Evaluation of multi-model CFWIS representation
The ERA5 data suggest that wildfire danger is the largest in dry tropical and subtropical regions such as Australia, sub-Saharan Africa, South America, southern Asia, the Mediterranean Basin and western North America (Fig. 2; second column). These patterns compare favourably to those of the GEFF-ERA5 dataset (Fig. 2; first column). For all CFWIS components, global patterns of the CMIP6 multi-model mean are generally similar for both the multi-annual monthly mean (Fig. 2; third column) and 90th percentile statistics of daily values (Fig. 3; third column).
The CMIP6 multi-model mean reproduces observed spatial patterns, i.e. regions where fire danger is the highest, reasonably well (Figs. 2 and 3). Nevertheless, compared to ERA5 data (Figs. 2 and 3; second column), there is a tendency for CMIP6 models to overestimate fire-prone weather conditions within the tropics, particularly in parts of South America, sub-Saharan Africa and southeast Asia (Figs. 2 and 3). There is also a general tendency for the CMIP6 multi-model mean to underestimate fire danger in South Africa, the western part of North America, some areas of the east of boreal Asia and Australia (Fig. 2h, l, t and x).
Regional contrasts are also identified in simulating the fire weather indicators. Looking at the indices describing the quantity of moisture contained by fire fuels, FFMC is overestimated in wet tropical and subtropical regions, such as South America, sub-Saharan Africa and India, for both the mean (Fig. 2d) and, to a lesser extent, the 90th percentile (Fig. 3d). Meanwhile, the same index is particularly underestimated in cold and temperate regions, such as North America, Europe and boreal Asia. DMC is overestimated in South America, sub-Saharan Africa and southeast Asia, while underestimations are found in northern Australia, the southwestern part of North America and southern Africa (Figs. 2h and 3h). DC is generally underestimated in Australia, southern Africa, the east of Central Asia, the western part of northern America and eastern Brazil, whereas overestimation appears in areas of South America, Central America, southeast and western part of Central Asia, southern Europe, and Africa for both the mean (Fig. 2l) and 90th percentile (Fig. 3l).
Regarding fire behaviour indices, ISI is generally well represented across the world, but the mean is overestimated in a number of regions, including southeast Asia, the Middle East, southern Europe, Central and South America, Africa, the greater part of Australia, and some central areas of temperate North America (Fig. 2p). By contrast, ISI is underestimated in some areas of Central Asia, temperate North America, the northern part of Australia, some areas in Brazil, and the southernmost parts of South America and South Africa (Fig. 2p). For BUI, areas of overestimation include South America, southeast Asia and Northern Hemisphere Africa, with underestimation apparent in Australia, the western part of central and temperate North America, and the southernmost parts of South America and South Africa (Fig. 2t). For FWI and DSR, the pattern is similar to that of the other CFWIS components. FWI and DSR are overestimated in southern Australia, southeast Asia, some areas of Central Asia, the Middle East, southern Europe, Northern and Southern Hemisphere Africa, South and Central America, and the central area of temperate North America (Fig. 2x and bb). Meanwhile, FWI and DSR are underestimated in northern Australia, the western part of Central and temperate North America, southernmost South Africa and South America, eastern Brazil, and some areas of Central Asia (Fig. 2x and bb). In the case of FWI, this underestimation is more widespread in North America and eastern boreal and Central Asia (Fig. 2x).
https://doi.org/10.5194/gmd-16-3103-2023 Geosci. Model Dev., 16, 3103-3122, 2023
The biases are driven by multi-model representation of the four meteorological components required as input for the CFWIS indicators: daily values for maximum temperature, mean wind speed, minimum relative humidity and total precipitation. The representation of these fields in ERA5 and CMIP6 is shown in Fig. S1 in the Supplement. Biases are apparent in all four fields, most strikingly in the representation of relative humidity in the Northern Hemisphere (Fig. S1i). However, cooler maximum temperatures in boreal Eurasia (Fig. S1c) do not appear to have an impact on the representation of fire weather (Figs. 2 and 3; fourth column). Overestimation of precipitation in southern Africa (Fig. S1f) may be responsible for an underrepresentation of DC and DMC in particular (Figs. 2 and 3; fourth column).
(i-l), ISI (m-p), BUI (q-t), FWI (u-x) and DSR (y-bb). The lighter yellow colour represents lower danger, and darker brown represents higher danger. Meanwhile, the white colour represents lower bias and darker blue (red) higher negative (positive) bias.

3.2 Seasonality in multi-model biases
As model bias could exhibit strong seasonal and regional dependencies, we examine how CMIP6 models perform throughout the year for each of the 14 GFED fire regions in Fig. 4. As in Sect. 3.1, model performances are assessed by quantifying the model discrepancy with respect to ERA5. Throughout the year, the results support those already determined from Figs. 2 and 3. CMIP6-simulated CFWIS components generally agree with ERA5 in boreal and temperate North America (BONA and TENA; Fig. 4a and b), Southern Hemisphere Africa (SHAF; Fig. 4i), and Australia (AUST; Fig. 4n). However, CMIP6 overestimation is found in South America (Fig. 4d and e), as well as southeast and equatorial Asia (Fig. 4l and m) and, to a lesser extent, Northern Hemisphere Africa (Fig. 4h) and Europe (Fig. 4f) for all CFWIS components, except for FFMC.
There are some clear seasonal differences in model performances. In boreal North America (BONA) and boreal Asia (BOAS), several CFWIS components, including DMC, BUI, FWI and DSR, are underestimated during the first half of the year and agree quite well with ERA5 for the rest of the year, except for DSR, which is overestimated from July to October (Fig. 4a and j). Biases for Central America (CEAM) vary during the year, with higher positive biases from July to September and a general underestimation from November to May (Fig. 4c). In the Middle East (MIDE) region, model biases are positive; however, they present lower values during the fire season and higher values from January to April for all indicators except for FFMC (Fig. 4g).
Looking at the regions with lower bias, in temperate North America (TENA), CFWIS components show good agreement overall, with moderate underestimation evident from December to May and moderate overestimation evident from July to October (Fig. 4b). CMIP6 performance is strong for all CFWIS components in Southern Hemisphere Africa (SHAF), showing marginal underestimation for most indicators and some slight overestimation for ISI, FWI and DSR from August to November (Fig. 4i). In Australia (AUST), CMIP6-simulated CFWIS components show good performances (Fig. 4n), with the lowest negative bias in FFMC, and the rest of the indicators show a low negative bias, except for November-February where biases are positive. In Central Asia (CEAS), the CMIP6 ensemble generally agrees with ERA5 data but exhibits overestimation from June to November, representing most of the fire season (Fig. 4k).
The remaining regions present higher, positive biases, with FFMC being the component with lower values. In Northern Hemisphere South America (NHSA), CFWIS components present a very large positive bias throughout the year, with lower values for FFMC, especially for the 90th percentile (Fig. 4d). In Southern Hemisphere South America (SHSA), indicators also show positive biases, especially in DMC, BUI and DSR (Fig. 4e), which are, however, lower than in NHSA. In Europe (EURO), most simulated indices (DMC, ISI, BUI, FWI, DSR) are especially overestimated compared to observations from June to October, which corresponds exactly to the fire season (Fig. 4f). Similarly, biases in simulating CFWIS components in Northern Hemisphere Africa (NHAF) are generally positive (Fig. 4h). Lastly, in both southeast (SEAS) and equatorial Asia (EQAS) (Fig. 4l and m), model biases are large and positive throughout the year, in particular in the months of the fire season in SEAS (May to November) and from October to April in EQAS.

3.3 Evaluation of inter-model performance
As shown in Sect. 3.1 and 3.2, the CMIP6 multi-model ensemble shows overall good agreement with ERA5 in terms of spatial patterns for both the mean and 90th percentile. In this section, the focus is thus given to the performance of each CMIP6 model in simulating CFWIS components at both global and regional scales. This evaluation is again applied to simulated mean and 90th percentile values for all CFWIS components and is based on spatial correlation, the normalised root mean squared error (RMSE), and the ratio of the observed and simulated standard deviations, which are summarised using Taylor diagrams (Figs. 5 and 6).
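The three statistics can be computed from paired simulated and reference values at the same grid points, as in the sketch below. Several choices here are assumptions rather than details confirmed by the text: the RMSE is normalised by the reference standard deviation, the standard deviation ratio is taken as simulated over reference, and grid points are unweighted (no latitude-dependent area weighting). All names and values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def skill_metrics(sim, ref):
    """Spatial correlation, normalised RMSE and standard deviation ratio.

    sim and ref are equal-length sequences of values at the same grid points.
    Assumptions: RMSE is divided by the reference standard deviation, and the
    ratio is simulated / reference.
    """
    if len(sim) != len(ref) or len(sim) < 2:
        raise ValueError("sim and ref must have the same length (>= 2)")
    mean_sim, mean_ref = _mean(sim), _mean(ref)
    std_sim, std_ref = _std(sim), _std(ref)
    if std_sim == 0 or std_ref == 0:
        raise ValueError("fields must not be spatially constant")
    cov = sum((s - mean_sim) * (r - mean_ref)
              for s, r in zip(sim, ref)) / len(sim)
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(sim))
    return {
        "correlation": cov / (std_sim * std_ref),
        "nrmse": rmse / std_ref,
        "std_ratio": std_sim / std_ref,
    }


# Hypothetical FWI climatology at four grid points.
reference = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]  # same pattern, doubled amplitude
metrics = skill_metrics(simulated, reference)
```

The example shows why the three metrics are used together: a model with a perfect spatial pattern (correlation of 1) can still have a large error and an exaggerated variance.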
At the global scale, the representations of DMC, DC and BUI present similar patterns, with greater inter-model variability and thus greater uncertainty than the other indices, for both monthly mean (Fig. 5b, c and e) and 90th percentile annual values (Fig. 6b, c and e). Inter-model variability and uncertainty are smaller for FFMC, ISI, FWI and DSR (Figs. 5 and 6a, d, f and g), for which most models reproduce spatial patterns reasonably well, with a normalised RMSE around 0.5 and a correlation ranging from 0.80 to 0.96.
Regarding the 90th percentile over the different CFWIS components (Fig. 6), individual model performance varies slightly, but patterns of models across regions remain very similar to the fire season mean simulations (Fig. 5).
The CMIP6 ensemble mean results show considerable regional dependencies, and one would expect such differences to be apparent in the performance of individual models. To understand and quantify the relative performance of each model, Fig. 7 details the same set of spatial correlation, normalised RMSE and standard deviation ratio shown in Figs. 5 and 6, this time for each of the 14 GFED regions. Unlike the global analysis shown in Figs. 5 and 6, the results in Fig. 7 only consider the corresponding fire season of each region based on historical burned area (as determined in Fig. 4).
The values of the three evaluation metrics, both for the mean and 90th percentile, vary greatly from region to region and across individual models (Fig. 7). Looking at the spatial correlation (Fig. 7a) for instance, Australia and southeast Asia are consistently in good agreement with observations across the different models, while for others like Central and South America all models show much weaker performance. For the normalised RMSE (Fig. 7b), most models in Central and South America show larger values, and central and southeast Asia present lower values overall. In the case of the standard deviation (Fig. 7c), there are no clear patterns, and the values are quite heterogeneous both among models and among regions.
Following the approach taken by Dieppois et al. (2015) in the evaluation of CMIP5 models, all three different statistics from Fig. 7 are combined to rank the individual model performance. Models are ranked for each of the three spatiotemporal skill metrics for seasonal mean and 90th percentile in each CFWIS component and each region, with a comprehensive ranking matrix shown in Fig. 8. The overall relative performance of individual models exhibits a strong degree of heterogeneity across the different regions but, in most cases, is consistent among the different CFWIS components (Fig. 8). There are some models (e.g. INM models, IPSL-CM6A-LR and MPI-ESM-1-2-HAM) that consistently show weaker performance in most of the regions (Fig. 8). The CNRM models, for instance, perform relatively poorly in many regions but perform reasonably well in Australia (Fig. 8). By contrast, there are some models, such as ACCESS-CM2, GFDL-CM4 and MRI-ESM2-0, that show better performance in most regions, with some exceptions (Fig. 8).
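A per-metric ranking of this kind can be sketched as follows. The ordering conventions are assumptions made for illustration: higher correlation is better, lower normalised RMSE is better, and a standard deviation ratio closer to 1 is better. Summing the three ranks is only one possible way of combining them and is not necessarily the combination used for Fig. 8; the model names and scores are hypothetical.

```python
def rank_models(scores, key, reverse=False):
    """Return {model: rank}, where rank 1 is the best model.

    key maps a model's score dict to a sortable value; set reverse=True
    when larger values are better. Ties keep the input order.
    """
    ordered = sorted(scores, key=lambda model: key(scores[model]),
                     reverse=reverse)
    return {model: position + 1 for position, model in enumerate(ordered)}


# Hypothetical scores for one region, one CFWIS component and one statistic.
scores = {
    "model_a": {"correlation": 0.90, "nrmse": 0.5, "std_ratio": 1.1},
    "model_b": {"correlation": 0.80, "nrmse": 0.4, "std_ratio": 0.7},
    "model_c": {"correlation": 0.95, "nrmse": 0.7, "std_ratio": 1.0},
}

by_correlation = rank_models(scores, lambda s: s["correlation"], reverse=True)
by_nrmse = rank_models(scores, lambda s: s["nrmse"])
by_std_ratio = rank_models(scores, lambda s: abs(s["std_ratio"] - 1.0))

# One possible combination (an assumption): the sum of the three ranks.
rank_sum = {m: by_correlation[m] + by_nrmse[m] + by_std_ratio[m]
            for m in scores}
```

In this hypothetical case no model is best on every metric, which illustrates why a combined ranking across several metrics is more informative than any single one.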

4 Synthesis and discussion
To support applications that seek to justify the selection of one or more models on which to base an impact study, we generated a set of rankings inspired by those produced for the evaluation of the EURO-CORDEX ensemble by Vautard et al. (2021). All 16 models were ranked according to two different measurements: (1) the count of the number of times in which each model falls into the upper tercile in terms of all three skill metrics (i.e. correlation, normalised RMSE and the ratio of standard deviation) for the seasonal mean and 90th percentile in each of the seven CFWIS components and across each of the 14 GFED fire regions (Fig. 9a) and (2) the count of the number of times in which a model falls into the lower tercile, indicating which models exhibit poorer performance more frequently (Fig. 9b).
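The two counts can be sketched as a tally over evaluation cases. With three metrics, two statistics (seasonal mean and 90th percentile) and seven CFWIS components, each region contributes 3 × 2 × 7 = 42 cases, or 588 cases across the 14 regions. The tercile boundaries used below (rank at most N/3 for the upper tercile, rank above 2N/3 for the lower tercile, with N models) are an assumption, as are the function name and the toy rankings.

```python
from collections import Counter


def tercile_counts(rankings):
    """Count how often each model falls in the upper and lower tercile.

    rankings is a list of {model: rank} dicts, one per evaluation case,
    where rank 1 is the best model. Tercile boundaries are an assumption:
    upper means rank <= n / 3, lower means rank > 2 * n / 3.
    """
    upper, lower = Counter(), Counter()
    for ranks in rankings:
        n = len(ranks)
        for model, rank in ranks.items():
            if rank <= n / 3:
                upper[model] += 1
            elif rank > 2 * n / 3:
                lower[model] += 1
    return upper, lower


# Hypothetical rankings for three models over two evaluation cases.
cases = [
    {"model_a": 1, "model_b": 2, "model_c": 3},
    {"model_a": 1, "model_b": 3, "model_c": 2},
]
upper, lower = tercile_counts(cases)
```

Reporting both counts is useful because a model can rarely appear among the best performers without often being among the worst, and the two tallies separate those situations.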
Only three models appear in the upper tercile more than 50 % of the time: GFDL-CM4, ACCESS-CM2 and MRI-ESM2-0 (Fig. 9a). GFDL-CM4 is a strong performer in Central Asia, as well as in Europe (EURO), Northern Hemisphere Africa (NHAF) and Australia (AUST), but is far weaker in Central America (CEAM) and equatorial Asia (EQAS). ACCESS-CM2 features in the upper tercile at least 35 out of 42 times in the Europe (EURO) and Central America (CEAM) regions. In boreal (BONA) and temperate North America (TENA), the standout model is MPI-ESM1-2-HR, along with KIOST-ESM for TENA. In Australia (AUST), CNRM-CM6-1 and GFDL-CM4 perform the best overall. Overall, the two INM models and MPI-ESM-1-2-HAM feature in the upper tercile on less than 20 % of occasions, and there are no individual regions where these models are shown to perform well. MPI-ESM-1-2-HAM and the two INM models also appear in the lower-tercile category more than 300 times (Fig. 9b). GFDL-CM4 and ACCESS-CM2 are the strongest performers in this respect, falling in the lower tercile fewer than 100 times.
In addition, models perform well in simulating some variables but not others. Individual model performance also exhibits a strong regional dependence: for several models, performance was found to be strong across some regions and poorer in others. It is difficult to identify systematic reasons for the inter-model differences based on spatial resolution or shared pathways of model development, otherwise referred to as model genealogy (Masson and Knutti, 2011). Performance is similar between the INM and CNRM model families, but there are considerable differences among the three MPI models. MPI-ESM1-2-HR consistently performs better than its companion lower-resolution models (MPI-ESM1-2-LR and MPI-ESM-1-2-HAM). It is also notable that the CanESM5 model has the lowest resolution (2.8° × 2.8°) but outperforms many higher-resolution models in several regions, particularly in boreal North America (BONA) and Central America (CEAM). However, this observation aside, there is little evidence that a model's original spatial resolution is an important factor in its performance. Comparison of different models does not provide an ideal framework to draw conclusions, as the impact of resolution is likely to be driven by internal model physics and dynamics.
The models performing better across a wider set of regions are GFDL-CM4, ACCESS-CM2 and MRI-ESM2-0 when assessing model performance region by region and for each region's fire season (Fig. 8). MPI-ESM1-2-HR shows good skill annually and at a global scale (except for DMC and BUI), and it is one of the models performing well in several regions (Figs. 8 and 9). The models that show the poorest skill in most regions are INM-CM4-8, INM-CM5-0 and MPI-ESM-1-2-HAM, and they are also often found in the lower part of the global ranking distribution (lower tercile, Fig. 9). It is advisable not to include models consistently performing poorly, when simulating CFWIS components both at global and at regional scales, in a multi-model study unless for specific regions where they present better skill. Careful consideration should be given to model selection, taking into account the study area and the chosen fire weather indicators under analysis.
Our synthesis does not consider model representation of the meteorological components taken as input in deriving the CFWIS indicators. A first-order analysis of multi-model biases in these fields is given in Sect. 3.1 and Fig. S1, but more in-depth analysis of the relative contribution of biases in each field to the overall representation of fire weather is beyond the scope of this study. Clearly, model development aimed at improving the representation of fire weather, especially in a changing world, should consider the reasons for model biases in key fire-prone regions. This includes the representation of temperature highs and relative humidity lows in large parts of the Northern Hemisphere.

Conclusions and outlook
Changes in the intensity and spatial distribution of wildfires are a likely consequence of a changing global climate. Producing reliable projections of meteorologically driven wildfire danger is crucial for establishing forest management and restoration strategies that will remain resilient in future decades. We presented a detailed evaluation of the performance of a subset of CMIP6 models in simulating spatiotemporal variability in fire weather across all parts of the world currently vulnerable to wildfire. A set of fire weather indicators, defined by the CFWIS, were generated for 16 different CMIP6 models and compared with corresponding fields from the ERA5 fire danger reanalysis for the period 1980-2014. Models were analysed collectively as part of an ensemble mean and in terms of their individual performance on both global and regional scales according to a set of performance criteria. At the global scale, the ensemble mean was found to simulate the set of CFWIS components well, reproducing similar spatial patterns to the ERA5 reference dataset. This is broadly encouraging for the use of the CMIP6 ensemble as a tool for understanding future changes in fire weather associated with a changing climate. At the regional scale, model results showed seasonal and regional variability, with some regions exhibiting very little model bias (e.g. Australia or Southern Hemisphere Africa) and others exhibiting larger bias (e.g. Northern Hemisphere South America or southeast Asia).
Our results also have important implications for the use of CMIP6-derived simulations of past, present and future climate-driven fire danger. It is anticipated that the evaluation presented here, while based solely on historical spatiotemporal variability, will serve as an important resource for users of model-simulated fire weather, both during the CMIP6 era and beyond, in three different ways. Firstly, the extent to which any given model performs well is sensitive to the fire weather indicator being evaluated. Ultimately, different indicators, including the CFWIS set evaluated here, have different meanings in meteorological terms, and strong model performance for one indicator does not necessarily mean strong performance for another. At the global scale, FFMC, ISI, FWI and DSR tend to be reproduced with lower uncertainty. The results shown here catalogue where, and for which models, skill is sufficiently strong for a range of fire weather indicators. Secondly, model performance can vary dramatically from one region to another. The evaluation highlights regions where the capacity to reproduce fire weather is strong, at least in a subset of models. These differences should be fully accounted for in regional-scale fire weather studies. Thirdly, the large differences in model performance highlight the importance of a comprehensive model selection. This could significantly affect the conclusions provided in previous assessments of global wildfire projections using a single model (e.g. Krawchuk et al., 2009) or using a multi-model mean (e.g. Moritz et al., 2012; Dowdy et al., 2019). For instance, projected trends derived from a multi-model mean could be significantly impacted by outlier models presenting unrealistic mean, variability and trends.
Comprehensive characterisation and quantification of model uncertainties are thus ethically crucial for robust decision-making (Knutti, 2010; Daron et al., 2021). The results presented here not only demonstrate the value of model selection but also provide a potential foundation for projections that take individual model skill and/or independence into account (e.g. Eyring et al., 2019). Future analysis will explore how the multi-model mean bias could potentially be reduced using a weighted mean or a multi-model mean restricted to those models showing better performance and see how this is reflected in the projections for different shared socioeconomic pathway (SSP) scenarios.
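The idea of a skill-weighted multi-model mean can be sketched in Python as follows. This is only an illustration under stated assumptions and not a method applied in this study: the weights are taken to be proportional to a hypothetical per-model skill score (for example an upper-tercile count), model independence is ignored, and the function name `weighted_multi_model_mean` and the toy fields are invented for the example.

```python
import numpy as np

def weighted_multi_model_mean(fields, skill):
    """Average model fields with weights proportional to a skill score.

    fields: array of shape (n_models, ny, nx) on a common grid.
    skill: non-negative score per model; a score of 0 excludes the model.
    """
    fields = np.asarray(fields, dtype=float)
    weights = np.asarray(skill, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Contract the model axis: sum over m of weights[m] * fields[m].
    return np.tensordot(weights, fields, axes=1)

# Toy example: three models with constant fields on a 2 x 2 grid.
fields = np.array([np.full((2, 2), 10.0),
                   np.full((2, 2), 20.0),
                   np.full((2, 2), 40.0)])
skill = [3.0, 1.0, 0.0]  # hypothetical skill scores

weighted = weighted_multi_model_mean(fields, skill)
unweighted = fields.mean(axis=0)
```

Setting the skill score of poorly performing models to zero reduces this to a multi-model mean over a selected subset, which corresponds to the second option mentioned above.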
While here we provide a robust, meaningful and useful global evaluation of CMIP6-simulated fire weather, it is necessary to outline potential caveats and opportunities for expansion. The availability of the input fields necessary to construct the full set of CFWIS components limited the evaluation to 16 CMIP6 models out of more than 50. Further study may consider additional models that contribute to CMIP6 for which input data may become available in the future. Furthermore, as some of the models only had one realisation available, we only consider here differences between single members, which could potentially affect the model variability on regional scales (Deser, 2020). The currently used CFWIS indicators (Van Wagner, 1987) were first defined for specific stand conditions at noon time. In order to update the system so that it provides better fire danger information, moisture codes and behaviour indices are being reviewed to consider peak daily burning conditions, and a new version of the system will be released by 2025 (Canadian Forest Service Fire Danger Group, 2021). In addition, analysis of fire weather indicators from other risk assessment systems would complement the results presented here. Global analysis of the CFWIS (e.g. Liu et al., 2022) has recommended extension to fire weather indicators from systems such as the McArthur Forest Fire Danger Index from the Centre for Australia Weather and Climate Research (McArthur, 1967), the Keetch-Byram drought index from the US Department of Agriculture's Forest Service (Keetch and Byram, 1968), and the Energy Release Component from the US National Fire Danger Rating System (Deeming et al., 1972). To truly understand the sources of error and biases for a given index, an in-depth analysis of the relative contribution of the meteorological fields used to construct it is required. Such an analysis is not trivial and should be an important focus for future study. A final point concerns the GFED fire regions taken as the basis for the regional-scale analysis: while they are a useful categorisation for the purpose of this evaluation, fire regimes vary substantially at the intra-regional scale. Potential alternative categorisations, in Europe for example, include the fire regimes defined by Galizia et al. (2021), while fire-prone areas may be better isolated using high-resolution land surface data (e.g. normalised difference vegetation index). It is important for studies requiring GCM-simulated fire weather data to consider that such intra-regional variability will likely extend to model performance. We also note that CMIP6 models have been found to show a greater warming extent than CMIP5 (Coppola et al., 2020; Hausfather et al., 2022), with several models exhibiting far greater equilibrium climate sensitivity (Forster et al., 2020; Zelinka et al., 2020). It remains unclear to what extent some warming rates may be unrealistic and how this might manifest in the calculated indicators.
Wildfires are complex events that involve not only forest dynamics but also climate conditions and human activity, so their projection under climate change is challenging. Given the predicted changes in fire regimes, their intensity and spatial distribution, current forest management and restoration strategies may not be effective for future conditions. This is particularly crucial as changes in wildfire activity become more evident both in fire-prone regions and in regions where wildfire danger was previously minimal (Mamuji and Rozdilsky, 2019; Boer et al., 2020; McCarty et al., 2020). The approach presented here aimed to characterise uncertainty in the latest generation of GCMs (CMIP6) when simulating fire weather and to evaluate model fidelity in order to reduce those uncertainties when informing future projections. Evaluation and model selection will support more appropriate and informed decision-making and aid forest managers in formulating strategies to respond to future wildfire events.
Figure 1. Fire weather components of the Canadian Fire Weather Index System (CFWIS). Adapted from Natural Resources Canada (2021).

Figure 2. Multi-annual monthly mean for GEFF-ERA5 (first column), ERA5 (second column) and the CMIP6 multi-model mean (third column), as well as bias in the CMIP6 multi-model mean with respect to ERA5 (fourth column) for FFMC (a-d), DMC (e-h), DC (i-l), ISI (m-p), BUI (q-t), FWI (u-x) and DSR (y-bb). The lighter yellow colour represents lower danger, and darker brown represents higher danger. Meanwhile, the white colour represents lower bias, and darker blue (red) higher negative (positive) bias.

Figure 3. Multi-annual monthly 90th percentile for GEFF-ERA5 (first column), ERA5 (second column) and the CMIP6 multi-model mean (third column), as well as bias in the CMIP6 multi-model mean with respect to ERA5 (fourth column) for FFMC (a-d), DMC (e-h), DC (i-l), ISI (m-p), BUI (q-t), FWI (u-x) and DSR (y-bb). The lighter yellow colour represents lower danger, and darker brown represents higher danger. Meanwhile, the white colour represents lower bias, and darker blue (red) higher negative (positive) bias.

Figure 4. Bias in monthly means and 90th percentiles in seven CFWIS components simulated by the CMIP6 multi-model mean with respect to ERA5 across 14 GFED fire regions: (a) boreal North America (BONA), (b) temperate North America (TENA), (c) Central America (CEAM), (d) Northern Hemisphere South America (NHSA), (e) Southern Hemisphere South America (SHSA), (f) Europe (EURO), (g) the Middle East (MIDE), (h) Northern Hemisphere Africa (NHAF), (i) Southern Hemisphere Africa (SHAF), (j) boreal Asia (BOAS), (k) Central Asia (CEAS), (l) southeast Asia (SEAS), (m) equatorial Asia (EQAS), and (n) Australia and New Zealand (AUST). Results show overall model performance, with blue shading indicating underestimation and red shading overestimation. The lower-right triangle represents the monthly mean and the upper-left triangle the monthly 90th percentile. Bar plots show the average monthly burned area for each GFED region, represented as a fraction of the monthly maximum. Black bars highlight months that constitute the fire season, defined as those months for which the average burned area is greater than 50 % of the monthly maximum.

Figure 5. Taylor diagrams showing the capacity of 16 CMIP6 models to simulate annual means in the seven CFWIS indices. The correlation coefficient is plotted in relation to the polar axis, the normalised RMSE in relation to the internal circular axis and the normalised standard deviation in relation to the horizontal axis. ERA5 is represented by an empty dot on the horizontal axis.

Figure 6. Taylor diagrams showing the capacity of 16 CMIP6 models to simulate annual 90th percentiles in the seven CFWIS indices. The correlation coefficient is plotted in relation to the polar axis, the normalised RMSE in relation to the internal circular axis and the normalised standard deviation in relation to the horizontal axis. ERA5 is represented by an empty dot on the horizontal axis.

Figure 7. Individual CMIP6 model (a) correlation, (b) RMSE, and (c) absolute log of the ratio of standard deviation with respect to ERA5 for the fire season mean and 90th percentile across each of the seven CFWIS indices and each of the 14 GFED fire regions. Darker colours show higher spatial correlations and lighter colours lower. The fire season for each region is defined as those months for which the average burned area is greater than 50 % of the monthly maximum (see Fig. 4).

Figure 8. CMIP6 inter-model ranking for 14 GFED regions, 7 CFWIS components and 3 × 2 skill metrics (correlation, RMSE, and ratio of standard deviation for the mean and 90th percentile). For a given region and CFWIS component, models are ranked from 1 (the strongest) to 16 (the weakest) according to a given skill metric. Blue (red) shading is thus indicative of strong (weak) model performance.

Figure 9. (a) Counts of the number of times that each CMIP6 model is ranked in the upper tercile (top five) across all seven CFWIS components and 3 × 2 skill metrics (correlation, RMSE, and ratio of standard deviation for the mean and 90th percentile). The grid (left) shows the breakdown of total counts for each of the 14 GFED regions. The bars (right) indicate the total counts across all regions. (b) As (a) but for the lower tercile (bottom five).