Spatial and temporal characteristics of extreme rainfall: Added benefits with sub‐kilometre‐resolution climate model simulations?

Local, short‐duration extreme precipitation events can cause flooding and have massive economic consequences. The climate change impact on such events is of great interest, but due to the small spatio‐temporal scales involved, these are challenging to properly represent in climate models. This study analyses a new sub‐kilometre (750 m) HARMONIE‐Climate model simulation driven by ERA5 reanalysis data. Three convection‐permitting models at 750 m, 3 and 5 km grid distance are analysed and compared with driving reanalyses, intermediate model simulations and a dense rain gauge network. The representation of convective events is analysed by a range of metrics categorised as spatial, temporal and event‐focused. Precipitation events are analysed at both hourly and sub‐hourly scales and a clear difference between model performance on these scales is found. Overall, we find a better performance for HCLIM750m for most metrics, yet the added benefits of the computationally intensive sub‐kilometre scale simulation seem limited compared to the convection‐permitting models at 3 and 5 km.


INTRODUCTION
Extreme precipitation events with a temporal resolution of only a few hours and small spatial coverage may cause serious flooding with massive social and socioeconomic costs (IPCC, 2022). Such extreme events are often convective events caused by the uplifting of warm moist air resulting in heavy precipitation. Convective events can occur as a part of frontal systems or as an effect of solar-heated air becoming more buoyant than the surroundings. A recent attribution study indicates that the observed warming has already increased the risk of extreme precipitation events in Denmark (Matte et al., 2022). Climate change will increase the occurrence and size of precipitation events and, as a consequence, further increase the risk of pluvial flooding in northern Europe (IPCC, 2021; Christensen et al., 2022). Climate models' ability to represent extreme events at high temporal and spatial resolution is of great interest to better understand how convective events are influenced by climate change. A convective cell has a spatial extent of 1-10 km and may occur as a single cell, multicell or supercell clusters (Doswell, 2005). Due to the scale at which extreme precipitation evolves, climate models struggle to represent this phenomenon well, as the scale of climate models often does not match the scale of convective events (Lucas-Picher et al., 2021).
Today, three general types of climate models are primarily used to assess changes in regional extreme precipitation: high-resolution Global Circulation Models (GCMs) with a typical grid cell scale of approximately 50 km, Regional Climate Models (RCMs) with a typical scale of approximately 10 km and so-called Convection-Permitting RCMs (CPMs) at a typical scale of 2-4 km. RCMs benefit from availability of large ensembles, which give vital information about uncertainties and variability (e.g. Christensen and Christensen, 2007;Jacob et al., 2014). Due to the resolution of RCMs, convection is typically parameterised, as the scale of convection is smaller than the grid cell size. Studies have found that the parameterisation of convection limits RCMs' ability to represent intense rainfall events realistically (Frei et al., 2006;Fowler and Ekström, 2009). CPMs have a grid cell resolution where convection can potentially be modelled explicitly, explaining why convective parameterisation is often turned off. In some models, both shallow and deep convection is turned off, while others still parameterise shallow convection (Kendon et al., 2017). CPMs have become more numerous since Kendon et al. (2012) showed the benefits of these models. Nevertheless, a drawback of CPMs is still the limited number of models and simulations, primarily due to the high computational cost, which results in only a few CPM ensembles being available for analyses (e.g. Fosser et al., 2019;Coppola et al., 2020;Ban et al., 2021). Several studies have shown that CPMs often improve the representation of extreme rainfall compared to RCMs (Prein et al., 2013;Chan et al., 2014;Lind et al., 2016;Olsson et al., 2021;Médus et al., 2022). However, these studies also point out that there is room for improvement in order to represent hourly and sub-hourly rainfall events realistically. 
Few studies have analysed sub-hourly rainfall, with varying methods and inconsistent results (Brisson et al., 2018; Purr et al., 2019; Meredith et al., 2020; Vergara-Temprado et al., 2021). As their name indicates, the typical grid cell scale of CPMs suggests that they are 'convection-permitting' but not necessarily 'convection-resolving'. Some studies have analysed the improvement of rainfall statistics in sub-kilometre CPMs; however, these studies have all focused on single events or short simulation periods of less than 40 days (e.g. Hanley et al., 2015; Moseley et al., 2020; Prein et al., 2021). It has now been more than a decade since the first CPM was introduced. Recent studies have suggested future research steps with CPMs to overcome challenges such as too intense heavy precipitation, parametrisation of sub-kilometre processes and too few and too small ensembles (Lucas-Picher et al., 2021).
To assess climate models' ability to represent rainfall, high-resolution observational data are necessary. Observational data can be, for example, radar data, rain gauge data, satellite data or data from microwave link networks. Radar data can give information on spatial structures and movement of rainfall, but the intensity estimates are less certain as a radar measures reflectivity (Einfalt et al., 2004; Thorndahl et al., 2017). Rain gauge data are expected to give a more accurate estimate of intensities during a rainfall event, and have proven to exhibit the same spatio-temporal properties as bias-corrected radar observations. Depending on the metric analysed and on data availability, one type of observational data might be preferred over the other. Previous studies have suggested several metrics to assess climate models' ability to represent rainfall (Gregersen et al., 2013; Sunyer et al., 2017; Médus et al., 2022; Thomassen et al., 2022). When assessing added benefits, the conclusion may depend on the selected metric as well as the temporal and spatial scale at which the metric is analysed.
In this study, the objective is to assess the benefits of employing sub-kilometre resolution for CPMs with respect to precipitation extremes at sub-hourly, hourly and daily durations. We analyse dynamically downscaled reanalysis data from a set of experiments with HARMONIE-Climate (HCLIM, Belušić et al., 2020). The sub-kilometre simulation is made specifically for this study and consists of a reanalysis simulation downscaled to 750 m over Denmark (nested within 5 km grid spacing intermediate downscaling) for five years of heavy convective precipitation seasons (April-October). The study will also use data from an existing reanalysis-driven simulation over Fenno-Scandinavia run at 3 and 12 km resolution as well as the driving global reanalysis. To assess the representation of extreme rainfall in the models, we apply a broad range of metrics which can be summarised in three main types: spatial analyses, temporal analyses and event analyses.

Data
This study compares four simulations produced with the same climate model but at different resolutions, two reanalyses and one observational dataset. We limit the analysis to periods of complete spatial and temporal overlap, analysing only land cells over Denmark (see Figure 1). The analysed datasets consist of five years of data from April to October, stretching over the period where most heavy convective precipitation events in Denmark occur. Although heavy convective precipitation events do not normally occur in April (Åström et al., 2016), and April also includes the spin-up of the 5-km and the 750-m simulations, we include it in the analysis due to the limited simulation length. The available data from April to October are hereafter referred to as the heavy convective precipitation seasons. The analysed years were selected to include some of the most extreme rainfall events observed in Denmark, to make sure extreme convective events happen in all analysed years. The years are therefore not representative years for a climatology. Denmark is relatively flat, with altitudes ranging from 0 to 170 m above sea level. Mean annual precipitation ranges from 550 mm in eastern Denmark to 950 mm in western Denmark. The observational dataset (henceforth, POINT) is from a network of tipping bucket rain gauges from The Water Pollution Committee of The Society of Danish Engineers (SVK) (Gregersen et al., 2013; Madsen et al., 2017). The POINT data have an accuracy of 0.2 mm and one-min temporal resolution. In the time period of interest, 98 stations are part of the network. Some stations suffer from intermittent breakdowns, due to technical problems or maintenance, and the knowledge of the affected intervals is used in the analysis to account for missing data (see Section 2.2). The quality control procedure is outlined in Jørgensen et al. (1998). Spatial metrics of POINT have been validated against radar observations in a previous study.
The four climate model simulations in this study are driven at the lateral boundaries by two different reanalysis datasets from the European Centre for Medium-range Weather Forecasts (ECMWF), see Figure 1 top left and Table 1. All downscaling simulations have been performed with the HARMONIE-Climate model, cycle 38 (Belušić et al., 2020); two different sets of physical parameterisations have been used: ALADIN for intermediate resolution, and AROME for high resolution (Termonia et al., 2018).
The global reanalysis ERA-Interim (ERAI) is downscaled to 12 km (HCLIM12km) and further downscaled in a double-nested setup to 3 km (HCLIM3km). ERAI produced by the ECMWF has a temporal range from 1979 to 2019 and further details can be found in Table 1 and in Dee et al. (2011). HCLIM12km and HCLIM3km use ALADIN physics and AROME physics, respectively. The ALADIN physics parameterises all convective precipitation and makes HCLIM12km a traditional RCM, whereas HCLIM3km with AROME physics is a CPM with only shallow convection parameterised. HCLIM12km and HCLIM3km have been produced in the Nordic Convection Permitting Climate Projections project (NorCP), and data cover the entire Fenno-Scandinavia. HCLIM12km and HCLIM3km are continuous simulations covering the period 1998-2018. For full documentation of these experiments, we refer to Lind et al. (2020).
The second reanalysis employed is ERA5, which is downscaled first to 5 km (HCLIM5km) and further to 750 m (HCLIM750m) in a so-called double-nested setup (Table 1). ERA5 currently extends from 1959 to the present and is the state-of-the-art reanalysis dataset from the ECMWF (Hersbach et al., 2018, 2020). The simulations HCLIM5km and HCLIM750m have been produced for this study, and consist of the five heavy convective precipitation seasons in 2007, 2011, 2014, 2015 and 2017. The seasonal focus is chosen assuming that periods with low convection activity will be well represented at coarser resolutions. HCLIM5km simulations are started one week before 1 April in each of the simulated years, while HCLIM750m simulations are started directly on 1 April. Data from 1 April to 31 October are used in the analysis to avoid reducing the limited amount of data, so no spin-up period is excluded. This is not optimal, since soil moisture may influence heavy precipitation (e.g. Hohenegger et al., 2009), but the very heavy computational load renders multimonth soil spin-up impractical. Soil moisture and other initial values for the soil scheme are taken from ERA5 to HCLIM5km and from this simulation to HCLIM750m, making these as realistic and balanced as practically possible. Note that heavy convective precipitation events are not expected to occur in April (Åström et al., 2016) and we only aim to spin up the atmosphere, since the soil initialisation is expected to have only small effects on the climate in such a small domain. Note also that most weather systems move into the domain from the sea, which further reduces the importance of soil initialisation.
HCLIM5km and HCLIM750m are both run with AROME physics with only shallow convection parameterised. The HCLIM750m data span a 480 by 570 km area over Denmark and southern Sweden. The data domains for HCLIM5km and HCLIM750m are shown in Figure 2; for data domains of the other simulations, we refer to the studies mentioned, in which the simulations are introduced. For HCLIM750m we only include every fourth cell in each direction, reducing the number of cells analysed to one every 3 km in each direction. This data-thinning procedure greatly reduces the computational burden of the analyses while retaining the full ability to compare with the other simulations, which are also analysed on the highest possible resolution. As the map projections are different between HCLIM750m and HCLIM3km, the sampling of HCLIM750m does not match the cell centres of HCLIM3km. Hereafter, HCLIM750m will refer to the HCLIM750m data with a sampling of every 16th cell. All analyses have been done on this resampled dataset.
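The data-thinning step can be illustrated with a minimal sketch. It assumes the 750 m field is stored as a plain two-dimensional list of rows; the function name `thin_grid` and the toy 8 by 12 grid are hypothetical and only serve to show that keeping every fourth cell in each direction retains one cell in 16.

```python
def thin_grid(field, step=4):
    """Keep every `step`-th cell in both directions of a 2-D grid (list of rows)."""
    return [row[::step] for row in field[::step]]


# Hypothetical 8 x 12 grid of 750 m cells; each value is simply the cell index.
grid = [[r * 12 + c for c in range(12)] for r in range(8)]
thinned = thin_grid(grid, step=4)

n_full = sum(len(row) for row in grid)
n_thin = sum(len(row) for row in thinned)
# With a 750 m grid, the retained cells are 4 * 0.75 km = 3 km apart.
print(n_thin, "of", n_full, "cells kept")
```

The thinned grid keeps the original values untouched (no averaging), so the retained cells still represent 750 m grid boxes rather than 3 km means.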

Methods
This study is an analysis of an ensemble of opportunity, in which available simulations were analysed against a new sub-kilometre CPM (750 m). This implies that the models have different domain sizes and are driven by different reanalysis products. The influence of the reanalysis on the lateral boundaries is not analysed, but is discussed in Section 3.6. A set of metrics is analysed in order to quantify and assess the added benefits of higher resolutions. The metrics aim at covering temporal, spatial and event-based aspects of precipitation. Monthly precipitation and frequency of wet days are analysed to understand the representation of average precipitation properties in the models. The diurnal cycle is used to assess the presence of convective processes during the day. Extreme events are sampled as exceedance series, to analyse intensities, duration and spatial variation of extreme events. The spatial correlation of extreme events is calculated to evaluate their spatial extent. This is, in turn, used to assess whether the climate models produce extremes which in spatial size are comparable to observed extremes. The non-central moments are used to analyse the mean and extreme properties of the time series, which are used to test whether climate models have a tendency to over- or underestimate these properties across different levels of aggregation. The metrics described below have all been used in previous studies aimed at studying and quantifying the importance of spatio-temporal resolution in climate models with a focus on convective precipitation. They are briefly discussed below with reference to benchmark studies.

TABLE 1 Data overview of the observational dataset (POINT) and the two sets of climate model data
To evaluate model performance for each metric and as an overall performance assessment across all metrics, a simple error estimate is used. The error estimates compare climate model simulations against observations to quantify the performance of the models for each metric. We assume perfect observations, despite knowing that the chosen observational product can influence the conclusion (e.g. Sunyer et al., 2013). The error estimate is calculated as the mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{model}_i - \mathrm{obs}_i\right|$$

where n is the number of points evaluated for the given metric, obs is the observational value for the given metric and model is the climate model-simulated value for the metric. For metrics shown as maps, the error estimate is calculated based on the underlying boxplot. Error statistics on boxplots are calculated over n = 3 points: the first, second and third quartiles (Q1, Q2, Q3). For metrics shown as lines, MAE is calculated on the underlying point data, for example the differences between intensities for different simulations for each of the durations analysed. For the spatial correlation, MAE is calculated on the e-folding distances for the different durations.
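As an illustration of the error estimate for boxplot-based metrics, the sketch below computes the MAE over the three quartiles of two small samples. The numbers are invented for illustration, and the quartile definition (Python's `statistics.quantiles` with the inclusive method) is an assumption, since the text does not state which quartile estimator is used.

```python
import statistics


def mean_absolute_error(model, obs):
    """MAE over paired points: mean of |model_i - obs_i|."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(model)


def quartiles(values):
    """Q1, Q2, Q3 using the inclusive method (an assumed estimator)."""
    return statistics.quantiles(values, n=4, method="inclusive")


# Hypothetical monthly precipitation values (mm) at five locations.
obs_values = [60, 65, 70, 75, 80]
model_values = [62, 70, 74, 80, 90]

obs_q = quartiles(obs_values)      # [65.0, 70.0, 75.0]
model_q = quartiles(model_values)  # [70.0, 74.0, 80.0]
mae = mean_absolute_error(model_q, obs_q)
print(round(mae, 3))
```

Using only Q1, Q2 and Q3 makes the estimate insensitive to outliers beyond the box, which is a design choice worth keeping in mind when comparing it with an MAE computed on all grid cells.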

Monthly precipitation and frequency of wet days
Monthly average precipitation is calculated solely based on precipitation within the five heavy convective precipitation seasons considered in this study. Accumulated rainfall within the period is divided by the length of data (35 months for climate model data, POINT data vary with missing data).
Wet days are defined as days with a total rainfall of more than 1 mm for all datasets to exclude drizzling (Kjellström et al., 2010). The average monthly number of wet days is calculated as the accumulated number of wet days for the five heavy convective precipitation seasons divided by the number of months.
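The two averages can be sketched as follows, assuming a series of daily totals in which missing days are marked as `None` and the number of months with data is supplied by the caller. The function name, the toy series and the handling of missing data are illustrative assumptions, not the procedure used for POINT.

```python
WET_DAY_THRESHOLD_MM = 1.0  # a wet day has strictly more than 1 mm


def monthly_stats(daily_totals_mm, n_months):
    """Return (mean monthly precipitation, mean monthly number of wet days)."""
    valid = [p for p in daily_totals_mm if p is not None]  # skip missing days
    total = sum(valid)
    wet_days = sum(1 for p in valid if p > WET_DAY_THRESHOLD_MM)
    return total / n_months, wet_days / n_months


# Hypothetical 60-day record (two months); exactly 1.0 mm does not count as wet.
daily = [0.0, 0.4, 1.0, 2.5, 12.0, 0.0] * 10
mean_precip, mean_wet_days = monthly_stats(daily, n_months=2)
print(round(mean_precip, 1), mean_wet_days)
```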

Diurnal cycle
Two types of diurnal cycle of precipitation are calculated: a diurnal cycle of the average rainfall amount and a diurnal cycle visualising extreme intensity over the course of the day. The diurnal cycle is not calculated for the global reanalysis models (ERA5 and ERAI). The ERAI dataset does not contain hourly precipitation and is therefore not used, so for consistency, hourly precipitation rates from ERA5 are not used either. Only the grid cells closest to the rain gauge stations were selected, in order to increase intercomparability by using datasets of equal sizes. The first type of diurnal cycle is characterised by the average rainfall amount (mm) for each hour of the day, using the method from Olsson et al. (2021). The average rainfall amount for each hour (h) in mm/hour is given by:

$$R_{\mathrm{tot}}(h) = \frac{1}{D}\sum_{d=1}^{D} R(d, h)$$

where R(d, h) is the rainfall amount in hour h of day d, and D denotes the total number of days in the data. To be able to compare diurnal cycles between stations, $R_{\mathrm{tot}}(h)$ is normalised by the 24-h average (Yin et al., 2009). The second type of diurnal cycle is calculated as the 99.5th percentile hourly intensity in mm/hour. Data are aggregated into hourly time steps and for each hour of the day, the 99.5th percentile is calculated separately (Médus et al., 2022).
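Both diurnal cycles can be sketched for a single station or grid cell as below. The sketch assumes the hourly data are arranged as one list of 24 values per day; the percentile level is passed as a parameter, and the linear-interpolation percentile is an assumed estimator. The three synthetic days are invented for illustration.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def diurnal_cycles(hourly_by_day, p):
    """Return (normalised mean cycle, p-th percentile cycle), 24 values each.

    Assumes at least some rain in the record, so the 24-h average is non-zero.
    """
    n_days = len(hourly_by_day)
    mean_by_hour = [sum(day[h] for day in hourly_by_day) / n_days for h in range(24)]
    daily_mean = sum(mean_by_hour) / 24
    normalised = [m / daily_mean for m in mean_by_hour]
    extreme = [percentile([day[h] for day in hourly_by_day], p) for h in range(24)]
    return normalised, extreme


# Three hypothetical days: an afternoon shower, steady drizzle, a weaker shower.
day_a = [0.0] * 24
day_a[15] = 6.0
day_b = [0.1] * 24
day_c = [0.0] * 24
day_c[3], day_c[15] = 1.0, 3.0

normalised, extreme = diurnal_cycles([day_a, day_b, day_c], p=99.5)
peak_hour = normalised.index(max(normalised))
print(peak_hour)
```

Because the mean cycle is divided by its own 24-h average, it averages to 1 by construction, which is what allows cycles from stations with different rainfall totals to be compared.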

2.2.3 Ranks of exceedances
Extreme events are sampled with a peak-over-threshold (POT) censoring type II method (Mikkelsen et al., 1995; Gregersen et al., 2013), that is, sampling a fixed number of the largest events within each grid point. This allows a flexible threshold and a set number of events within each dataset. An average of three events per year is chosen in each grid point. In this study, the heavy convective precipitation seasons are treated as full years, since convective activity is stronger during warmer months, resulting in a sample of the 15 most extreme events per grid cell. The intensity of the least intense of the 15 events in each grid cell or rain gauge station is named the cut-off value, as this intensity becomes the border between extremes and non-extremes. Extreme events are sampled for five individual intensity durations ranging from 15 min to 6 h (15, 30, 60, 180 and 360 min). Events are considered independent if a dry period between events is at minimum the same length as the intensity duration period: for example, if 60-min extreme events are sampled, the dry period must be of minimum 60 min between independent events. For climate model data, an interevent threshold of 0.2 mm/h is set to ease the separation of precipitation events. This threshold should not be confused with the 1 mm/day threshold used earlier to define dry days. Periods with rainfall intensities below the 0.2 mm/h threshold are considered dry when selecting extreme events.
As the dataset only consists of heavy convective precipitation seasons, the calculation of return periods for the sampled extreme events will be biased. For sub-hourly durations, the calculated return period will approximately represent the annual return period, whereas for the long durations (∼24 h), the calculated return period will be more biased because of the lack of information about heavy frontal storms during the seasons not simulated. Throughout the study, the rank of the sampled extreme events will therefore be used to compare intensities across models. This means each sampled event has an intensity, a duration and a rank associated with it.
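The sampling logic can be sketched in simplified form. The code below is not the Mikkelsen et al. (1995) implementation; it is a minimal illustration under stated assumptions: the series has a fixed time step, the duration is given as a number of steps (`window`), an event's magnitude is the largest rainfall depth in any `window` consecutive steps within the event, and the dry threshold is expressed per time step. All names and numbers are hypothetical.

```python
def split_events(series, dry_threshold, min_dry_steps):
    """Split a series into events separated by >= min_dry_steps dry steps.

    Returns (start, end) index pairs with `end` exclusive.
    """
    events, start, last_wet, dry_run = [], None, None, 0
    for i, value in enumerate(series):
        if value >= dry_threshold:          # wet step
            if start is None:
                start = i
            last_wet, dry_run = i, 0
        elif start is not None:             # dry step inside a candidate event
            dry_run += 1
            if dry_run >= min_dry_steps:
                events.append((start, last_wet + 1))
                start, dry_run = None, 0
    if start is not None:
        events.append((start, last_wet + 1))
    return events


def event_peak(series, start, end, window):
    """Largest depth in any `window` consecutive steps within the event."""
    return max(sum(series[i:min(i + window, end)]) for i in range(start, end))


def sample_extremes(series, k, window, dry_threshold):
    """Type II censoring: keep the k largest independent events, ranked 1..k."""
    events = split_events(series, dry_threshold, min_dry_steps=window)
    peaks = sorted((event_peak(series, s, e, window) for s, e in events), reverse=True)
    return peaks[:k]


# Hypothetical series (mm per step); the duration of interest is 2 steps.
series = [0, 0, 3, 5, 0, 0, 0, 1, 0, 2, 0, 0, 0, 6, 0]
ranked = sample_extremes(series, k=2, window=2, dry_threshold=0.2)
cut_off = ranked[-1]  # least intense of the sampled events
print(ranked, cut_off)
```

In this toy case three independent events are found, the two largest are kept, and the smaller of those two defines the cut-off value. The rank of an event is simply its position in the sorted list.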

2.2.4 Spatial correlation of extreme events
The unconditional spatial correlation of extremes is calculated by applying the framework developed by Mikkelsen et al. (1996). It is calculated for each dataset separately and describes the spatial extent of extreme events within each dataset. The comparison between models is done by plotting the separate results for each dataset. The spatial correlation is calculated by sampling extreme events (Section 2.2.3) and estimating the correlation of extreme events that are concurrent and therefore interpreted as meteorologically dependent. This metric has been used frequently to assess climate models' ability to reproduce the spatial extent of extreme rainfall systems (Gregersen et al., 2013; Mayer et al., 2015; Thomassen et al., 2022). The sampled extreme events (Section 2.2.3) are paired between all pairs of separate locations. Events are considered concurrent by considering a lag time relative to the start time of the events. Introducing a lag time allows events to be considered concurrent even if the events do not overlap exactly in time, allowing an event to travel over the case area. A lag time of 11 h is applied as in Mikkelsen et al. (1996) and Gregersen et al. (2013). Thomassen et al. (2022) showed that the results were not sensitive to the choice of lag time for lag times above 0 h (events overlap exactly in time).
The correlation is assumed to decay with increasing distance. We fit a two-parameter generalised exponential function to the correlation pairs based on the distance, d, between them. The two parameters of the exponential function are chosen to secure a correlation of 1 at the distance d = 0. Finally, the e-folding distance is determined from the fitted exponential function as the distance at which the correlation is decreased to 1∕e. This e-folding distance provides a simple metric to compare spatial correlation across different durations and datasets.
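To make the e-folding distance concrete, the sketch below fits one possible two-parameter form, ρ(d) = exp(−(d/a)^b), which equals 1 at d = 0. This functional form, the parameter names `a` and `b`, and the log-linearised least-squares fit are assumptions made for illustration; the exact function and fitting procedure of the study are not reproduced here. For this assumed form, the correlation drops to 1/e exactly at d = a.

```python
import math


def fit_generalised_exponential(distances, correlations):
    """Fit rho(d) = exp(-(d / a) ** b) by linear regression.

    Uses ln(-ln(rho)) = b * ln(d) - b * ln(a); requires d > 0 and 0 < rho < 1.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(-math.log(r)) for r in correlations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    b = slope
    a = math.exp(-intercept / slope)
    return a, b


# Synthetic correlation pairs generated from a = 20 km, b = 0.8 (hypothetical).
true_a, true_b = 20.0, 0.8
dists = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
corrs = [math.exp(-((d / true_a) ** true_b)) for d in dists]

a_hat, b_hat = fit_generalised_exponential(dists, corrs)
e_folding_distance = a_hat  # rho(a) = exp(-1) for the assumed form
print(round(e_folding_distance, 3), round(b_hat, 3))
```

With noisy empirical correlations, a non-linear least-squares fit on the untransformed values may be preferable, since the double logarithm amplifies errors for correlations close to 0 or 1.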

2.2.5 Non-central sampling moments
The non-central sampling moments of the observational dataset and the six climate model and reanalysis datasets are compared at different temporal aggregations. The non-central sampling moments of order 1-3 are compared between datasets to evaluate the temporal scaling behaviour of precipitation (Sunyer et al., 2017), that is, a comparison of mean, variance and skewness across the datasets. The moments are estimated for temporal aggregations of 1-48 h (60, 180, 360, 720, 1,440 and 2,880 min). The non-central moment of order q is computed following Molnar and Burlando (2005) from X_t, the rain series with the unit mm at the temporal aggregation t, where N is the length of data at the temporal resolution t and 'seasons' is the number of heavy convective precipitation seasons. For all model data, 'seasons' is five but for POINT it may be lower due to missing data. A power law relationship over temporal resolutions of the non-central sampling moment has often been identified from daily to hourly scale on observations and model outputs (Gupta and Waymire, 1993; Molnar and Burlando, 2005; Onof and Arnbjerg-Nielsen, 2009) and used to characterise statistical scale-invariance or multifractality (Schertzer and Lovejoy, 1987; Olsson and Niemczynowicz, 1996; Mayer et al., 2015; Sunyer et al., 2017). To the knowledge of the authors, such moment analysis has not been carried out on CPMs before. Following the notation in Molnar and Burlando (2005), the temporal scale is defined as $\lambda = 2^{-n}$, where n is the level of subdivision in the aggregation of data, starting from the highest level of aggregation (2,880 min). Given this, n = 0 for 2,880 min, n = 1 for 1,440 min and n = 5.667 for 60 min. It should be noted that the power law relationship is not expected to hold for sub-hourly scale (Nguyen et al., 2007), yet in some cases it has proven to work (Olsson, 1995).
Moments of order q > 1 are standardised by moment q = 1, to attempt a fair comparison between point data and gridded data. The standardised moments follow the definition in Sunyer et al. (2017).
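The aggregation and moment calculation can be sketched as follows. The sketch makes two labelled assumptions: the moment is taken as the plain sample mean of X^q (the normalisation by the number of seasons used in the study is not reproduced), and the subdivision level is computed as a base-2 logarithm of the ratio to the coarsest aggregation. The short hourly series is invented for illustration.

```python
import math


def aggregate(series, factor):
    """Sum consecutive blocks of `factor` steps; an incomplete last block is dropped."""
    n_blocks = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def noncentral_moment(series, q):
    """Sample non-central moment of order q, taken here as mean(x ** q) (assumed form)."""
    return sum(x ** q for x in series) / len(series)


def subdivision_level(aggregation_min, coarsest_min=2880):
    """Level n such that aggregation = coarsest * 2 ** (-n)."""
    return math.log2(coarsest_min / aggregation_min)


# Hypothetical hourly rainfall depths (mm), aggregated to 3-hourly values.
hourly = [0, 0, 1, 3, 0, 0, 2, 0, 0, 0, 0, 6]
three_hourly = aggregate(hourly, 3)
m2 = noncentral_moment(three_hourly, q=2)

levels = {t: subdivision_level(t) for t in (2880, 1440, 720, 360, 180, 60)}
print(three_hourly, m2, round(levels[60], 3))
```

Note that the strict base-2 logarithm gives n ≈ 5.585 for the 60-min aggregation, so the value of 5.667 quoted in the text presumably follows a different convention, for example linear interpolation between levels 5 (90 min) and 6 (45 min).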

Monthly precipitation and wet days
The average monthly precipitation within the considered five heavy convective precipitation seasons indicates that ERA5 and HCLIM12km are both too wet, while HCLIM3km seems a bit too dry, especially in eastern Denmark (Figure 3). HCLIM750m seems to have a good representation of the higher monthly precipitation in mid-Jutland and in the northernmost part of Jutland, which is also quite well represented in HCLIM5km while not as clear in HCLIM3km. ERAI also performs favourably, as confirmed by the small scatterplots comparing rain gauge stations with the corresponding grid cell in the climate simulations (Figure 3). Here, the best fits are found for ERAI and HCLIM750m. Analysing the difference in average monthly precipitation between simulations compared to POINT data, we look at the boxplot of all data, as described in Section 2.2. The boxplot shows a close fit between POINT data and HCLIM750m, as well as a reasonable fit between POINT and ERAI and HCLIM5km (see Figure 4). While it might seem surprising that ERAI performs well compared to the finer-resolution climate models, it is expected that all models would represent average precipitation well. The important point is that the HCLIM12km and HCLIM3km simulations exhibit wet and dry biases, respectively. The wet and dry bias could be due to the large domains which HCLIM3km and HCLIM12km are run on, while reanalysis products benefit from data assimilation. Looking at the maps and scatter plots of the average monthly number of wet days, ERAI, ERA5 and HCLIM12km seem to have too many wet days, while HCLIM750m seems to represent the number of wet days in POINT the best (Figure 5). HCLIM3km seems to be skewed towards fewer wet days and HCLIM5km does not seem to be able to capture the spatial variation in wet days.

FIGURE 4 Boxplot of average monthly precipitation for POINT data and the six simulations. Boxes show the interquartile range between Q1 and Q3 (25th to 75th percentile) of all data over Denmark. The 50th percentile is shown inside the box. Whiskers extend to the highest/lowest value in the dataset within the range Q1 − 1.5 (Q3 − Q1) to Q3 + 1.5 (Q3 − Q1); values outside the whiskers are plotted as separate points.

Diurnal cycle
The diurnal cycle in European summer is characterised by a late afternoon peak in precipitation, caused by convective processes growing and organising during the day (Vergara-Temprado et al., 2020). However, Médus et al. (2022) showed that the afternoon peak was not very clear in Danish observations. RCMs have generally failed to reproduce this afternoon peak, as the convective processes are not well simulated in such models (Lucas-Picher et al., 2021). For the diurnal cycle of the normalised rainfall amount (Figure 6 left), all models seem to perform reasonably well. HCLIM12km and HCLIM750m have a too large mid-day peak, while HCLIM3km has an evening peak which is not seen in POINT data. Calculating MAE at each hour, HCLIM750m performs best for the day as a whole (MAE values not shown). The diurnal cycle for the 99.5th percentile hourly intensity is better represented by HCLIM750m than by the other models (Figure 6 right). HCLIM12km seems to underestimate the hourly intensity in the diurnal cycle at the 99.5th percentile, while HCLIM3km has a late evening peak not seen in the observations (Figure 6 right). Due to the short simulation period, it should be noted that the 99.5th percentile only corresponds to the five highest values. In general, we see that the amplitude of the diurnal cycle is not very large in the observational data. This is in accordance with the Médus et al. (2022) study of HCLIM3km and HCLIM12km using eight full years of data, showing a much lower signal of the diurnal cycle in Denmark compared to Norway and Sweden.

Extreme events
From the intensity-duration curves, we compare the intensity of the sampled extreme events across datasets for different ranks. For the most extreme event sampled (Rank 1 = R1), corresponding approximately to a return period of seven years (when using the median plotting position [Rosbjerg et al., 1992]), the POINT data have larger intensities than the climate simulations for durations below 180 min (Figure 7). This is very likely due to the very intense event which hit Copenhagen on 2 July 2011. The event broke the rainfall record for the period from 1933 to 2016 and has been estimated to have a return period of more than 2000 years (Arnbjerg-Nielsen et al., 2014; Ziersen et al., 2017). For more frequent events, the three CPMs (HCLIM5km, HCLIM3km and HCLIM750m) represent the intensity of the extreme events in POINT data well for all but the shortest durations (Figure 7). HCLIM12km, ERA5 and ERAI all have lower intensities than POINT for all ranks for all durations. This indicates an added benefit of a convection-permitting resolution, corresponding to previous findings (Chan et al., 2014; Lind et al., 2020; Thomassen et al., 2021; Médus et al., 2022), but no noticeable added benefits of a higher resolution between the CPMs. The boxplot of the variability in intensities across Denmark for cut-off threshold intensity and rank 2 and 6 events for 15, 60 and 360 min shows overall that the CPMs represent the variability and levels in the intensities well for durations above 15 min (Figure 8). On the other hand, HCLIM12km, ERA5 and ERAI struggle to represent the range in intensities and in general have too low intensities (Figure 8). For the 15-min duration, both HCLIM750m and HCLIM3km have too low intensities compared to POINT data for both the cut-off level and R6. The MAE calculated on Q1, Q2 and Q3, as described in Section 2.2, shows that HCLIM3km performs slightly better than the other CPMs as an average over all three durations and the three levels of extremes (MAE values not shown).
HCLIM3km produces slightly higher intensities and a better range at 15 min duration, while excluding this duration, HCLIM750m performs better as an average for durations of 60 and 360 min together.
The spatial variation in the threshold intensity (cut-off) for sampled extreme events for POINT data and the six models for durations 15-360 min is shown in Figure 9.
For the shortest duration, 15 min, the climate models (HCLIM750m and HCLIM3km) are unable to reproduce the intensities seen in POINT data (Figure 9, first row). For the longer durations, there is a better overlap in the intensity range between POINT data and CPMs, while HCLIM12km, ERA5 and ERAI have too low intensities. Across all durations, POINT data show lower threshold intensities in an area in the northern part of Jutland, which to some extent is represented in HCLIM3km and HCLIM750m. Both HCLIM750m and HCLIM3km show a band of higher intensities up through Jutland, but where HCLIM750m places this band close to the west coast of Jutland, HCLIM3km places it more inland, somewhat along the central, more elevated ridge of Jutland. This suggests a better performance of HCLIM3km based on the authors' knowledge of rainfall distribution over Jutland. Nevertheless, comparing the correlation between intensity in POINT data and different climate model datasets (only selecting the grid cells closest to the location of POINT data) does not show any correlation between either of the CPMs and observations (results not shown).

FIGURE 8 Boxplot of the variation in intensity of the extreme events over Denmark for the seven datasets. Extreme event intensities for threshold level (cut-off ∼0.33-year event), rank 6 (R6 ∼ one-year event) and rank 2 (R2 ∼ three-year event) are compared for durations of 15, 60 and 360 min (columns). The resampled datasets only contain the grid cells at which a rain gauge station is located. Due to the coarse resolution of ERA5 and ERAI, there are no resampled datasets for these. For an explanation of boxplot features, please refer to Figure 4.

Spatial correlation of extreme events
For the unconditional spatial correlation of extreme events, we see clear added benefits when moving to convection-permitting models, yet no clear differences between the three CPMs (HCLIM750m, HCLIM3km and HCLIM5km, Figure 10). The e-folding distances show a slight improvement in HCLIM750m and HCLIM3km compared to HCLIM5km (Table 2). The improved representation of the spatial correlation with increasing resolution is in accordance with previous studies (Gregersen et al., 2013; Thomassen et al., 2022), which could have suggested an improvement from HCLIM3km to HCLIM750m. However, this improvement is limited, and all models show longer correlation distances in extreme events compared to observed data, with larger disagreement between model and observations at coarser resolution. While rain gauges do not per se give information on spatial properties, Thomassen et al. (2022) showed that rain gauge data represented the spatial correlation structure of measured rainfall very well when compared to radar data.

FIGURE 9 Spatial variation of threshold level (cut-off intensity) across Denmark for the seven datasets (columns) and for durations of 15-360 min (rows).
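The e-folding distance is the separation at which the correlation has dropped to 1/e. As an illustration only, it can be estimated by fitting an exponential decay rho(d) = exp(-d / L) to pairs of station distance and correlation. The fitting procedure used in the study is not described in this section, so the least-squares fit and the function name below are assumptions.

```python
import math


def e_folding_distance(distances_km, correlations):
    """Fit rho(d) = exp(-d / L) and return L in km.

    Uses least squares on ln(rho) = -d / L with the line forced through the
    origin. Pairs with non-positive correlation are skipped because their
    logarithm is undefined.
    """
    sum_dd = 0.0
    sum_dlog = 0.0
    for d, rho in zip(distances_km, correlations):
        if rho <= 0.0:
            continue
        sum_dd += d * d
        sum_dlog += d * math.log(rho)
    if sum_dlog >= 0.0:
        raise ValueError("correlations do not decay with distance")
    return -sum_dd / sum_dlog


# Synthetic example: correlations generated with a known e-folding distance.
distances = [5.0, 10.0, 20.0, 40.0]
correlations = [math.exp(-d / 15.0) for d in distances]
print(e_folding_distance(distances, correlations))
```

A longer fitted L means that extremes stay correlated over larger distances, which is the direction of the model bias reported above.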

Non-central sampling moments
The average non-central sampling moments show that the CPMs (HCLIM750m, HCLIM3km and HCLIM5km) are very similar to POINT (Figure 11). HCLIM12km, ERA5 and ERAI are also very similar to each other, but for q > 1 they are distinctly different from the POINT data and CPMs. For the mean precipitation (q = 1), all models except HCLIM12km underestimate this, while HCLIM12km overestimates it but also has the smallest error compared to POINT. For q > 1, HCLIM750m has the lowest absolute error over the six aggregation levels. The CPMs all overestimate the variance (q = 2) and the skewness (q = 3), which can be interpreted as these models overestimating extreme precipitation for longer durations. The ERA products and the traditional RCM (HCLIM12km) underestimate q = 2 and q = 3, meaning that these models underestimate extreme precipitation. All models perform increasingly better with higher aggregation level for q > 1, yet for HCLIM750m and HCLIM5km, there are smaller biases between POINT and model for the 60-min aggregation level than for 180 min, with decreasing bias for aggregation levels above 180 min. For HCLIM3km, this pattern is seen only for q = 2, whereas for q = 3 biases decrease over all aggregation levels. This small bias suggests that HCLIM750m and HCLIM5km have an improvement in hourly precipitation regarding variance and extreme precipitation. However, this hourly improvement is not seen for the three-hour resolution and could even compromise the performance for higher aggregation levels. For HCLIM750m, this results in biases for the 60-min aggregation level comparable to the biases for the 1,440- and 2,880-min aggregation levels. For q > 1 the power law behaviour does not seem to hold, as a negative curvature can be seen for all datasets (including POINT). A clear negative curvature is seen especially for the ERA products and HCLIM12km.
Previous studies have shown increasingly better moments (compared to observational data) with increasing spatial resolution (Mayer et al., 2015; Sunyer et al., 2017), which is confirmed here, yet the difference between the CPMs is minimal. When moments have been computed for precipitation time series from across the world, the mean value (q = 1), the variance (q = 2) and the skewness (q = 3) have generally varied smoothly as a (near) log-log linear function of the temporal resolution. The graphs in Figure 11 thus show that, in contrast to the CPM data, both the reanalysis datasets and HCLIM12km simply do not exhibit the variation and skewness in the time series that observations have. This may be interpreted as the actual temporal resolution of these datasets being much lower than the nominal resolution.
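To make the moment analysis concrete, the sketch below computes the non-central sampling moment of order q, taken here as the mean of x^q, for a series aggregated to several temporal resolutions, and the slope of its log-log relation with aggregation level. It assumes that aggregation means averaging intensities over non-overlapping windows; the study's exact definition is given in its methods section and may differ, and all names are hypothetical.

```python
import math


def aggregate(series, k):
    """Average consecutive non-overlapping blocks of k values (incomplete tail dropped)."""
    n_blocks = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


def noncentral_moment(series, q):
    """Sample non-central moment of order q: the mean of x**q."""
    return sum(x ** q for x in series) / len(series)


def scaling_slope(series, q, block_sizes):
    """Ordinary least-squares slope of log(moment) against log(block size).

    Requires every moment to be positive, otherwise the logarithm fails.
    """
    xs = [math.log(k) for k in block_sizes]
    ys = [math.log(noncentral_moment(aggregate(series, k), q)) for k in block_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy intensity series (arbitrary units), not data from the study.
toy = [0, 0, 4, 0, 0, 0, 2, 2]
for q in (1, 2, 3):
    print(q, scaling_slope(toy, q, [1, 2, 4]))
```

With averaged intensities the q = 1 moment is unchanged by aggregation, while higher-order moments decrease as intermittent peaks are smoothed out. A dataset whose higher-order moments are too low at fine resolutions behaves as if it had already been smoothed.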

Performance assessment
Based on the metrics analysed in this study, we apply a simple performance diagnostic (MAE, see Section 2.2) to give an overview of the overall performance of each model simulation. Considering all metrics as a whole, HCLIM750m clearly has the best performance in MAE across all model simulations, and added benefits can be found moving to the sub-kilometre scale in this setup (Table 3). However, such added benefits are, in this setup and for the metrics currently studied, rather small. It is thus questionable whether long-term projections in this setup would be worth the higher computational and storage costs. While HCLIM750m performs best for most metrics, HCLIM3km is often very close in performance to HCLIM750m, in turn followed closely by HCLIM5km. This shows clear added benefits of CPMs over traditional RCMs, but little added benefit with higher resolutions within the CPMs.
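As a generic illustration of an MAE-based diagnostic, the sketch below computes the mean absolute error between model and observed values of one metric and ranks models by it. The exact diagnostic is defined in Section 2.2 and may aggregate or normalise differently; the model names and numbers here are placeholders, not results from the study.

```python
def mean_absolute_error(model_values, observed_values):
    """Mean absolute difference between paired model and observed values."""
    if not model_values or len(model_values) != len(observed_values):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - o) for m, o in zip(model_values, observed_values))
    return total / len(model_values)


def rank_models(values_by_model, observed_values):
    """Return (model name, MAE) pairs sorted from lowest to highest MAE."""
    scores = {
        name: mean_absolute_error(values, observed_values)
        for name, values in values_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder values for one metric at three locations.
observed = [10.0, 20.0, 30.0]
models = {
    "model_a": [11.0, 19.0, 33.0],
    "model_b": [14.0, 20.0, 30.0],
}
print(rank_models(models, observed))
```

Because MAE averages over locations, two models with very different spatial error patterns can receive similar scores, which is one reason to read such a summary alongside the individual metrics.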
Regarding the intensity estimates of the extremes (R2 and R6) and the threshold level, HCLIM3km has the best performance (Table 3). As seen in Figure 8, the better performance of HCLIM3km is due to higher intensities in HCLIM3km for 15-min duration, which is closer to observations. For durations from one hour and above (d = 60, 180 and 360 min), HCLIM750m performs better than HCLIM3km (30-min duration shows better performance for HCLIM3km, results not shown). Many studies have found that CPMs tend to have too intense heavy rainfall (e.g. Kendon et al., 2017; Prein et al., 2017), which has not been found in this study for sub-hourly resolutions. An explanation could be that sub-hourly precipitation extremes are rarely assessed in these papers (due to limitations in available temporal output resolution or observations). Another explanation could be the selection of years studied here, which are not representative of a classical 30-year climatology. The selected years contain some of the most severe extreme events observed, which are less likely to be overestimated in the simulations. The better performance for sub-hourly extreme event intensities in HCLIM3km might thus not have been found if the same analysis was performed on a more representative time period. As stated in Section 2.2, this study does not analyse the influence of the different reanalyses driving the simulations, ERA5 and ERAI. We acknowledge that the different reanalyses may influence the results, and therefore we have included the reanalysis products in our analysis. However, we believe that the behaviour of extreme precipitation is largely a property of the innermost model and that our results suggest that the choice of driving model does not appear to be the dominant influence.
We base this on the fact that the best-performing simulations (HCLIM750m and HCLIM3km) are driven by two different reanalyses, and that HCLIM5km, driven by the same reanalysis as HCLIM750m, does not outperform HCLIM3km. Finally, it should be stressed that this analysis is based on a small geographical area with a low variation in topography. Therefore, the results cannot necessarily be generalised to other regions or climates.

CONCLUSION
The six simulations are nested in a way that to some extent makes it difficult to disentangle differences due to resolution from differences due to driving data. Nevertheless, the differences are sufficient to conclude that this study has identified added benefits in sub-kilometre resolution convection-permitting climate models. However, the added benefits of moving to sub-kilometre resolution seem surprisingly limited, given earlier studies of the added value of higher resolutions (Prein et al., 2013; Chan et al., 2014; Lind et al., 2016; Olsson et al., 2021; Médus et al., 2022). The limited added benefits could be due to the details of the model setup, which is similar to the model setup for HCLIM3km apart from the higher resolution. To obtain more significant added benefits with sub-kilometre resolution, one might need to consider the vertical resolution, the parametrisation of shallow convection or the need for an even higher horizontal resolution. Vergara-Temprado et al. (2020) suggest that the explicit modelling of deep convection works better for resolutions coarser than expected and that grey-zone resolution might extend to even sub-kilometre resolution. On the other hand, HCLIM750m has also shown lower intensities than HCLIM3km for sub-hourly extreme events. This might indicate a lower tendency towards too intense extreme events, although, in this study, the lower intensities also result in worse performance when compared to observations.
The study furthermore found differing conclusions for hourly and sub-hourly extreme events regarding the performance of CPMs. The CPMs in general represented hourly extreme events much better than sub-hourly extreme events. This indicates that some precipitation processes are still not well represented and emphasises the importance of analysing sub-hourly precipitation in climate model simulations. This study concludes that added benefits from sub-kilometre simulations must be further assessed. HCLIM750m does perform better than HCLIM3km, but in light of the limited added value found here, sub-kilometre model setups would benefit from a more detailed evaluation to assess the trade-off between improved performance and the additional computational cost.

ACKNOWLEDGEMENTS
Emma D. Thomassen, Rasmus A. Pedersen and Ole B. Christensen received funding from the Danish State through the Danish Climate Atlas. Peter L. Langen gratefully acknowledges the financial contributions of Aarhus University Interdisciplinary Centre for Climate Change (iClimate, Aarhus University). Jonas Olsson was supported by the Swedish Ministry of the Environment and Energy through Grant 1:10 for climate adaptation. The POINT dataset is a product of The Water Pollution Committee of The Society of Danish Engineers made freely available for research purposes. Access to data is governed by the Danish Meteorological Institute, and they should be contacted for enquiries regarding data access. ERA Interim is a data product of the European Centre for Medium-Range Weather Forecasts (ECMWF) and published under Creative Commons Attribution 4.0 International (CC BY 4.0). ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use. ERA5 (Hersbach et al., 2018) was downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store. The results contain modified Copernicus Climate Change Service information 2020. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. HCLIM12km and HCLIM3km simulations were performed by the NorCP (Nordic Convection Permitting Climate Projections) project group, a collaboration between the Danish Meteorological Institute (DMI), the Finnish Meteorological Institute (FMI), the Norwegian Meteorological Institute (MET Norway), and the Swedish Meteorological and Hydrological Institute (SMHI). HCLIM5km and HCLIM750m have been produced by the Danish Meteorological Institute. The use of ECMWF's computing and archive facilities in this research is acknowledged.