Regional frequency analysis of annual daily rainfall maxima in Skåne, Sweden

Extreme daily rainfall events are critical for urban drainage systems, human life, agriculture and small catchments. Information about extreme rainfall magnitudes and frequencies is immensely important for civil engineers, city planners, scientists working on water management, rescue operations and flood control works. This study presents the results of a regional frequency analysis (RFA) of annual maximum daily rainfall (AMDR) in Skåne County, Sweden. The L-moment-based heterogeneity measure (H) reveals that Skåne County is a homogeneous region. Based on the L-moment ratio diagram and the $Z^{\mathrm{Dist}}$ statistic, the generalized normal (GNO) distribution is selected as the most suitable regional distribution. The accuracy measures used in K-fold cross validation indicate that the support vector machine (SVM) model is an appropriate model for estimating the index rainfall at ungauged sites in the region. The site characteristics elevation and latitude are identified as the most important variables for explaining the variation in mean annual maximum daily rainfall (MAMDR). Finally, spatial maps of predicted MAMDR for different return periods are constructed by combining the index rainfall with the regional quantiles. The spatial maps offer an overall view of the expected MAMDR in the region, which is helpful for many decision makers, including infrastructure planners, city planners, emergency managers and engineers.

event, and at the same time are quite vulnerable to natural disasters like extreme rainfall, floods and droughts. Therefore, it is critical to have precise estimates of rainfall quantiles in this region.
Skåne County experienced flash floods on 31 August 2014. According to the Swedish Meteorological and Hydrological Institute (SMHI), Malmö city received a month's worth of rainfall (88 mm). Another town, Falsterbo, received 43 mm of rain during the night and early morning (see Radio Sweden (2014) and The Local (2014)). Flash floods may cause numerous deaths because they occur with little warning, which leaves too little time to alert and evacuate people. Therefore, the annual maximum daily rainfall (AMDR) is considered in this study. It provides useful knowledge for water management, urban planning, drainage design, agricultural water management, flood management planning, insurance studies, the tourism industry, preventive measures against natural disasters and the construction of hydraulic structures (bridges, spillways, embankments, etc.).
In practice, there are two main approaches to frequency analysis: at-site analysis and regional analysis. At-site extreme value extrapolation needs a long rainfall data series, which is rarely available in practice. In this case, an efficient statistical method should be preferred for extreme value prediction. Regional frequency analysis (RFA) can reduce the prediction uncertainty if the region containing the rainfall-gauging sites can be established as a homogeneous region. One vital advantage of RFA over at-site analysis is that the regional quantile estimates can be used to estimate the quantiles at ungauged sites. In RFA using the L-moment method, the data from all sites in a region, regardless of different record lengths and missing values, can be combined to estimate the rainfall quantile for different return periods at any location in the region (Hosking and Wallis, 1997).
The estimation of rainfall magnitude at ungauged sites for different return periods is a key issue. It requires further modelling to estimate the index rainfall at ungauged sites in the region (Hosking and Wallis, 1997). Simple or multiple linear regression (MLR) models are commonly used in the literature to estimate extreme quantiles at gauged and ungauged sites (see e.g., Al Mamun et al. 2018; Hussain et al. 2017; Modarres 2008; Um et al. 2010; Naoum and Tsanis 2004b; Hailegeorgis and Alfredsen 2017). To estimate flood quantiles at ungauged sites, Desai and Ouarda (2021) have utilized random forest (RF) regression. Malekinezhad and Zare-Garizi (2014) and Forestieri et al. (2018) use mean annual precipitation as an explanatory variable in a simple linear regression model to predict index rainfall. Szolgay et al. (2009) and Hussain et al. (2017) develop a linear regression model to predict index rainfall using elevation as an explanatory variable.
Four machine learning models are considered in the present study, along with one geostatistical interpolation model, for estimating rainfall quantiles at ungauged sites. The machine learning models are the MLR, RF, Gaussian process (GP) and support vector machine (SVM) models, which require some characteristics of the gauging sites (e.g., elevation and average annual temperature) to predict average rainfall at ungauged sites. Among geostatistical models, the inverse distance weighting (IDW) model is considered, which does not require site characteristics. To evaluate the prediction performance of the models, a K-fold cross-validation (CV) algorithm is used. K-fold CV is a robust method for estimating the accuracy of a model (James et al., 2013). The best predictive model is chosen based on the accuracy measures defined in Section 3.7. This is the first study of regional rainfall frequency analysis (RRFA) in the Skåne region. We use the methodology proposed by Hosking and Wallis (1997). The main objectives of this study are to identify the homogeneous region(s), to select the most suitable regional frequency distribution and to choose the most appropriate interpolation model for estimating the mean annual maximum daily rainfall (MAMDR) at ungauged sites in the region. Lastly, we estimate the extreme quantiles for different return periods (non-exceedance probabilities) at gauged and ungauged sites in the Skåne region by combining the regional quantiles with the average rainfall magnitude predicted by the selected model. Spatial maps of the expected MAMDR in the region for different return periods are also presented.

| STUDY REGION AND DATA
County Skåne (English: Scania) of Sweden is located in the southeast of the country and has a total area of 11,303 km². The left panel of Figure 1 shows the location of Skåne in Sweden. It borders the counties of Halland, Kronoberg and Blekinge, and it is connected with the capital of Denmark through the Öresund bridge. Skåne is one of the warmest regions of Sweden, with an average daily high temperature of 12 °C. The daily rainfall data for each site are collected from the Swedish Meteorological and Hydrological Institute (SMHI) (www.smhi.se). The maximum value of the daily rainfall record of each year, for the complete span of data at each site, is used in this study. A map of the rainfall-gauged sites used in this study is presented in Figure 1. Ystad is the only site where 6 years (1984-1989) ...

RFA based on L-moments is used in this study. The L-moments can be defined for any random variable $z$ if its mean exists (Hosking, 1990). The probability weighted moment (PWM) of $r$th order ($\alpha_r$) is defined as

$$\alpha_r = E\left[z\,\{1 - F(z)\}^{r}\right], \qquad r = 0, 1, 2, \ldots,$$

where $F(z)$ is the cumulative distribution function of $z$. The first four L-moments in terms of PWMs can be computed by using the following relationships:

$$\lambda_1 = \alpha_0, \qquad \lambda_2 = \alpha_0 - 2\alpha_1, \qquad \lambda_3 = \alpha_0 - 6\alpha_1 + 6\alpha_2, \qquad \lambda_4 = \alpha_0 - 12\alpha_1 + 30\alpha_2 - 20\alpha_3,$$

where $\lambda_1$ represents the location and $\lambda_2$ is a measure of dispersion. The L-moment ratios can be defined as

$$\tau = \frac{\lambda_2}{\lambda_1}, \qquad \tau_3 = \frac{\lambda_3}{\lambda_2}, \qquad \tau_4 = \frac{\lambda_4}{\lambda_2}.$$

The unbiased estimates of the first four PWMs ($\hat{\alpha}_r$, $r = 0, 1, 2, 3$) for any probability distribution can be computed as

$$\hat{\alpha}_r = \frac{1}{n} \sum_{p=1}^{n} \frac{\binom{n-p}{r}}{\binom{n-1}{r}}\, z_{p:n},$$

where $z_{p:n}$, $p = 1, \ldots, n$, is the data set in ascending order. Hosking (1990) defines the sample L-moments, denoted by $l_1$, $l_2$, $l_3$ and $l_4$, by replacing $\alpha_r$ with $\hat{\alpha}_r$ in the relationships above. The sample L-moment ratios are defined by Hosking and Wallis (1997) as

$$t = \frac{l_2}{l_1}, \qquad t_3 = \frac{l_3}{l_2}, \qquad t_4 = \frac{l_4}{l_2},$$

where $t$ is the sample L-coefficient of variation (L-CV), $t_3$ is the sample L-coefficient of skewness (L-skewness) and $t_4$ is the sample L-coefficient of kurtosis (L-kurtosis).
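As a minimal sketch of how the sample L-moments and L-moment ratios of one site could be computed from an annual maximum series, the Python function below uses the standard unbiased PWM estimator of Hosking and Wallis (1997). The function name `sample_lmoments` is hypothetical, and the code is an illustration rather than the implementation used in the study (which relies on the R package lmomRFA).

```python
from math import comb


def sample_lmoments(data):
    """Return (l1, l2, l3, l4, t, t3, t4) from unbiased PWM estimates."""
    z = sorted(data)  # z_{1:n} <= ... <= z_{n:n}
    n = len(z)
    if n < 4:
        raise ValueError("at least 4 observations are needed for l4")
    # a_r = (1/n) * sum_p [C(n-p, r) / C(n-1, r)] * z_{p:n}, with p = 1..n
    a = [
        sum(comb(n - p, r) * z[p - 1] for p in range(1, n + 1)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = a[0]
    l2 = a[0] - 2 * a[1]
    l3 = a[0] - 6 * a[1] + 6 * a[2]
    l4 = a[0] - 12 * a[1] + 30 * a[2] - 20 * a[3]
    # L-CV, L-skewness and L-kurtosis
    return l1, l2, l3, l4, l2 / l1, l3 / l2, l4 / l2
```

For a perfectly symmetric sample such as `[1, 2, 3, 4, 5]` the function returns a mean of 3 and an L-skewness of 0, which is a quick sanity check of the signs in the PWM relationships.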

| Discordancy measure
The discordancy measure $D_i$, which is based on L-moments, is used to identify discordant sites in the region. It is defined by

$$D_i = \frac{1}{3} N\, (u_i - \bar{u})^{T} A^{-1} (u_i - \bar{u}),$$

where $N$ is the number of sites in the region, $u_i = [t^{(i)}, t_3^{(i)}, t_4^{(i)}]^{T}$ is the vector of sample L-moment ratios of site $i$, $\bar{u} = N^{-1} \sum_{i=1}^{N} u_i$ and $A = \sum_{i=1}^{N} (u_i - \bar{u})(u_i - \bar{u})^{T}$. Site $i$ is considered discordant if $D_i$ is greater than the critical value. For $N \geq 15$ the critical value for $D_i$ is 3. Hosking and Wallis (1997) provide a table of critical values of $D_i$ for $N < 15$.

| Heterogeneity measures
Hosking and Wallis (1997) propose the heterogeneity measure (H) for the identification of a homogeneous region. H is computed as

$$H = \frac{V - \mu_V}{\sigma_V}, \qquad V = \left\{ \frac{\sum_{i=1}^{N} n_i \left(t^{(i)} - t^{(R)}\right)^2}{\sum_{i=1}^{N} n_i} \right\}^{1/2},$$

where $\mu_V$ and $\sigma_V$ are the mean and SD of the simulated values of $V$, respectively.
The values of $\mu_V$ and $\sigma_V$ are obtained through simulation. $N$ is the number of sites in the region, $n_i$ is the record length of site $i$, $t^{(i)}$ is the L-CV of site $i$ and $t^{(R)}$ is the record-length-weighted regional average L-CV.
To perform the simulation, we first fit a Kappa distribution using the regional average L-moment ratios $l^{(R)}$, $t^{(R)}$, $t_3^{(R)}$ and $t_4^{(R)}$ and use this distribution to simulate the region. In a simulated region, the sites have the same record lengths as in the original data. We calculate the value of $V$ for each simulated region.
For a sufficiently large value of H, we declare the region heterogeneous. The region under study is acceptably homogeneous for H < 1, possibly heterogeneous for 1 ≤ H < 2 and definitely heterogeneous for H ≥ 2.
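The two screening statistics of this and the preceding subsection can be sketched in a few lines of Python with NumPy. The sketch assumes the at-site L-moment ratios are already available and that the simulated values of V are supplied by the caller; the Kappa fitting and simulation step is not reproduced here. The function names, and the use of the sample SD with `ddof=1` for the simulated V values, are assumptions made for illustration.

```python
import numpy as np


def discordancy(u):
    """D_i for each site; u is an (N, 3) array with rows (t, t3, t4)."""
    u = np.asarray(u, dtype=float)
    n_sites = u.shape[0]
    dev = u - u.mean(axis=0)          # u_i - u_bar
    a = dev.T @ dev                   # A = sum_i (u_i - u_bar)(u_i - u_bar)^T
    a_inv = np.linalg.inv(a)
    # quadratic form (u_i - u_bar)^T A^{-1} (u_i - u_bar) for every site
    return n_sites / 3.0 * np.einsum("ij,jk,ik->i", dev, a_inv, dev)


def v_statistic(t, n):
    """Record-length-weighted SD of the at-site L-CVs."""
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)
    t_regional = np.sum(n * t) / np.sum(n)
    return np.sqrt(np.sum(n * (t - t_regional) ** 2) / np.sum(n))


def heterogeneity(v_observed, v_simulated):
    """H = (V - mu_V) / sigma_V, with mu_V and sigma_V from simulated regions."""
    v_simulated = np.asarray(v_simulated, dtype=float)
    return (v_observed - v_simulated.mean()) / v_simulated.std(ddof=1)
```

A useful check is that the $D_i$ values of any region sum to $N$, so their average is always 1; a site is flagged only when its own value stands well above that average.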

| Goodness-of-fit methods
Goodness-of-fit (GOF) methods are used to identify the most suitable regional frequency distribution. In this study, we consider the L-moment ratio diagram and the $Z^{\mathrm{Dist}}$ criterion to select the most appropriate regional frequency distribution.
The L-moment ratio diagram, which is gaining much attention in the literature, is a means of finding a suitable distribution for RFA (Peel et al., 2001). It is a scatter plot of L-skewness ($t_3$) versus L-kurtosis ($t_4$) of each site in the region, along with the point of their regional average. We can assess the appropriateness of a particular probability distribution by comparing the sample L-moment ratios ($t_3$, $t_4$) with the theoretical L-moment ratios of various probability distributions. These relationships are described by Hosking and Wallis (1997). A probability distribution is appropriate for the region if its theoretical curve lies close to the regional average of the sample L-moment ratios.
The $Z^{\mathrm{Dist}}$ criterion is another method proposed by Hosking and Wallis (1997) to select the appropriate distribution for the region. It is defined as

$$Z^{\mathrm{Dist}} = \frac{\tau_4^{\mathrm{Dist}} - t_4^{(R)} + B_4}{\sigma_4},$$

where $\tau_4^{\mathrm{Dist}}$ is the L-kurtosis of the fitted distribution, $t_4^{(R)}$ is the weighted regional average L-kurtosis, $B_4$ is the bias of $t_4^{(R)}$ and $\sigma_4$ is the SD of $t_4^{(R)}$; $B_4$ and $\sigma_4$ are obtained by simulation.
A candidate distribution is accepted as suitable for the region if $|Z^{\mathrm{Dist}}| < 1.64$. It is possible that several probability distributions are accepted for the region. In that case, the most appropriate distribution among the accepted ones is the one with the smallest value of $|Z^{\mathrm{Dist}}|$.
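The selection rule above can be written as a short Python sketch. It assumes that the L-kurtosis of each fitted candidate and the simulated bias and SD of the regional L-kurtosis are already available; the function names and the candidate labels in the usage note are hypothetical.

```python
def z_dist(tau4_dist, t4_regional, bias4, sigma4):
    """Z statistic of one candidate distribution."""
    return (tau4_dist - t4_regional + bias4) / sigma4


def select_distribution(tau4_by_dist, t4_regional, bias4, sigma4, critical=1.64):
    """Return all Z values and the accepted candidate with the smallest |Z|."""
    z_values = {
        name: z_dist(tau4, t4_regional, bias4, sigma4)
        for name, tau4 in tau4_by_dist.items()
    }
    accepted = {name: z for name, z in z_values.items() if abs(z) < critical}
    # None signals that no candidate passes the |Z| < critical screen
    best = min(accepted, key=lambda name: abs(accepted[name])) if accepted else None
    return z_values, best
```

With made-up inputs such as `select_distribution({"dist_a": 0.150, "dist_b": 0.165, "dist_c": 0.200}, 0.152, 0.002, 0.012)`, the first two candidates pass the screen and `dist_a` is returned because its |Z| is smallest. These numbers are purely illustrative and are not taken from the study.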

| Quantiles estimate at gauged/ ungauged sites
After the selection of the most appropriate regional frequency distribution, the main goal of the RRFA is to estimate precise quantiles at gauged and ungauged sites in the region. The quantile estimate for the $i$th site, $\hat{Q}_i(F)$, with non-exceedance probability $F$ can be computed as

$$\hat{Q}_i(F) = l_1\, \hat{q}_r(F), \tag{1}$$

where $l_1$ is the MAMDR (index rainfall) of site $i$ and $\hat{q}_r(F)$ represents the regional quantile estimate with non-exceedance probability $F$. To find the quantile at an ungauged site, we need the value of the MAMDR at that site, which cannot be computed directly because no data are available there. In this situation, we can model the relationship between the MAMDR at gauged sites ($l_1$) and some geographical or climate characteristics of the gauged sites.
To estimate the MAMDR at ungauged sites in the Skåne region, we have used the inverse distance weighting (IDW), MLR, RF, GP and SVM models. For the machine learning models (MLR, RF, GP and SVM), we have to identify suitable explanatory variables (site characteristics) that explain the variation in $l_1$. The details of IDW, MLR, GP, RF and SVM are provided in Section 3.7. K-fold cross validation is used to evaluate the prediction performance of these models, and the best model is selected based on the accuracy measures in K-fold CV. Finally, the most suitable model is used to estimate the MAMDR ($\hat{l}_1$) at ungauged sites in the region. To find the quantiles at ungauged sites for different return periods with non-exceedance probability $F$, we replace $l_1$ in Equation (1) by $\hat{l}_1$.
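To make the index-rainfall scaling concrete, the following Python sketch multiplies an at-site (or predicted) MAMDR by a regional growth-curve quantile. It assumes the regional distribution is a generalized normal distribution in Hosking's parameterization (location `xi`, scale `alpha`, shape `k`); the function names and any parameter values passed in are hypothetical and are not the fitted values of this study.

```python
from math import exp
from statistics import NormalDist


def nonexceedance(return_period_years):
    """Non-exceedance probability F for a return period T: F = 1 - 1/T."""
    return 1.0 - 1.0 / return_period_years


def gno_quantile(f, xi, alpha, k):
    """Quantile of the generalized normal distribution (Hosking's form)."""
    y = NormalDist().inv_cdf(f)       # standard normal quantile
    if abs(k) < 1e-12:                # k = 0 reduces to the normal distribution
        return xi + alpha * y
    return xi + alpha * (1.0 - exp(-k * y)) / k


def site_quantile(index_rainfall, f, xi, alpha, k):
    """Equation (1): site quantile = index rainfall * regional quantile."""
    return index_rainfall * gno_quantile(f, xi, alpha, k)
```

For example, `site_quantile(35.0, nonexceedance(100), xi, alpha, k)` would give the 100-year daily rainfall at a site whose MAMDR is 35 mm, once `xi`, `alpha` and `k` are set to the regional parameter estimates.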

| Accuracy measures
To evaluate the model prediction performance, we have used the mean absolute error (MAE), root mean square error (RMSE), prediction error rate (PER) and correlation coefficient ($R^2$) as accuracy measures. In the definitions of these measures, $z_j$ are the observed values, $\hat{z}_j$ are the expected values from the model and $\bar{z}$ is the sample mean of $z_j$.
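A minimal Python sketch of three of these measures is given below. MAE and RMSE follow their usual definitions; for $R^2$ the sketch assumes the common form $1 - \mathrm{SSE}/\mathrm{SST}$, which may differ from the exact definition used in the study, and PER is not included because its formula is not restated here.

```python
from math import sqrt


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(z - zh) for z, zh in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return sqrt(sum((z - zh) ** 2 for z, zh in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """R^2 as 1 - SSE/SST (assumed form, see the note above)."""
    z_bar = sum(observed) / len(observed)
    sse = sum((z - zh) ** 2 for z, zh in zip(observed, predicted))
    sst = sum((z - z_bar) ** 2 for z in observed)
    return 1.0 - sse / sst
```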

| IDW
The IDW method is one of the oldest, least complicated and most commonly used interpolation methods. It is a deterministic model that, when estimating the value at an unobserved point, gives more weight to the observations that are closer to that point. The IDW formula is defined as

$$\hat{z}(s_0) = \frac{\sum_{i=1}^{N} z(s_i)\, d(s_0, s_i)^{-m}}{\sum_{i=1}^{N} d(s_0, s_i)^{-m}},$$

where $\hat{z}(s_0)$ is the estimated average rainfall value at the ungauged position $s_0$, $z(s_i)$ is the observed rainfall value at the gauged location $s_i$, $d(s_0, s_i)$ is the distance from the ungauged position $s_0$ to the gauging point $s_i$, $N$ is the number of gauged sites and $m$ is a coefficient that adjusts the weight according to distance. For more detail on the IDW method see Johnston et al. (2001).
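The interpolation rule can be sketched in Python as follows. The sketch assumes planar coordinates with Euclidean distance and a default power of `m = 2`; both are illustrative choices rather than settings reported in the study, and the function name `idw` is hypothetical.

```python
from math import hypot


def idw(target, sites, values, m=2.0):
    """Inverse-distance-weighted estimate at `target` from gauged `sites`."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), z in zip(sites, values):
        d = hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return z                  # target coincides with a gauged site
        weight = d ** (-m)            # closer sites receive larger weights
        numerator += weight * z
        denominator += weight
    return numerator / denominator
```

A point halfway between two gauges receives the plain average of their values, while a larger `m` makes the estimate follow the nearest gauge more closely.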

| MLR model
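MLR describes the linear relationship between a response variable and two or more explanatory variables by fitting a linear equation to the observed data. The MLR model can be defined as

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon,$$

where $z$ is the response (dependent) variable, $x_1, x_2, \ldots, x_p$ are the independent or explanatory variables, $\beta_0, \beta_1, \ldots, \beta_p$ are the regression parameters and $\varepsilon$ is the error term. To estimate the parameters with the least squares method, we minimize the sum of squared differences between the observed and expected responses, that is, we minimize

$$\sum_{j=1}^{n} \left( z_j - \beta_0 - \beta_1 x_{j1} - \cdots - \beta_p x_{jp} \right)^2.$$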
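A least squares fit of this kind can be sketched with NumPy as below. The function names are hypothetical, and in the setting of this study the columns of `x` would be site characteristics such as latitude and elevation and `z` the at-site MAMDR.

```python
import numpy as np


def fit_mlr(x, z):
    """Least squares estimates (beta_0, beta_1, ..., beta_p)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones(len(z)), x])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    return beta


def predict_mlr(beta, x):
    """Predicted response for one row or several rows of explanatory variables."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return beta[0] + x @ beta[1:]
```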

| GP
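GP regression is a nonparametric Bayesian approach to regression. It is a kernel-based probabilistic model and can be viewed as a Bayesian version of SVM models. GP regression models are robust against overfitting (Thapa et al., 2020). The model can be summarized as follows.
The observational model:

$$z \mid f, \varphi \sim \prod_{i=1}^{n} p(z_i \mid f_i, \varphi).$$

We assume a GP prior:

$$f(x) \mid \theta \sim \mathcal{GP}\left(m(x),\, k(x, x' \mid \theta)\right),$$

and a hyperprior:

$$\theta, \varphi \sim p(\theta)\, p(\varphi),$$

where $m(x)$ is a mean function, $k(x, x' \mid \theta)$ is the covariance function of the process $f(x)$, $\theta$ represents the kernel parameters and $\varphi$ indicates the parameters of the observational model. The joint distribution of the test output ($\tilde{f}$) and the training output ($f$) based on the prior is expressed as

$$\begin{bmatrix} f \\ \tilde{f} \end{bmatrix} \Bigm| X, \tilde{X}, \theta \sim \mathcal{N}\left( \begin{bmatrix} m(X) \\ m(\tilde{X}) \end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, \tilde{X}) \\ K(\tilde{X}, X) & K(\tilde{X}, \tilde{X}) \end{bmatrix} \right).$$

In the predictive equation of $\tilde{f}$, one could use different kernel functions, for example, Gaussian, linear, Bessel and polynomial kernels. In this study, we have used a polynomial kernel.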
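As an illustration of how such a model produces predictions, the NumPy sketch below computes the GP posterior mean and covariance at test inputs. It assumes a Gaussian observation model with known noise variance, a constant mean function equal to the training mean, and a polynomial kernel of the form $(\text{scale} \cdot x^{T}x' + \text{offset})^{\text{degree}}$; these choices, the fixed hyperparameters and the function names are assumptions for illustration and not the configuration used in the study.

```python
import numpy as np


def poly_kernel(a, b, degree=2, scale=1.0, offset=1.0):
    """Polynomial covariance between the rows of a and the rows of b."""
    return (scale * a @ b.T + offset) ** degree


def gp_predict(x_train, z_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of the latent function at x_test."""
    x_train = np.asarray(x_train, dtype=float)
    z_train = np.asarray(z_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    mean = z_train.mean()                         # constant mean function (assumed)
    k_ff = poly_kernel(x_train, x_train, **kernel_args)
    k_ff = k_ff + noise_var * np.eye(len(z_train))   # add Gaussian noise variance
    k_sf = poly_kernel(x_test, x_train, **kernel_args)
    k_ss = poly_kernel(x_test, x_test, **kernel_args)
    post_mean = mean + k_sf @ np.linalg.solve(k_ff, z_train - mean)
    post_cov = k_ss - k_sf @ np.linalg.solve(k_ff, k_sf.T)
    return post_mean, post_cov
```

In practice the hyperparameters would be inferred from the data rather than fixed, which is where the hyperprior enters.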

| SVM
The SVM is a well-known supervised machine learning method that is used in regression, pattern recognition and classification problems. It is an efficient and robust machine learning algorithm for flood prediction (Dehghani et al., 2014). The SVM algorithm uses structural risk minimization theory and reduces the overfitting problem (Yu et al., 2006). The objective of SVM for regression is to find a regression function that best approximates the response observations ($z$) with an error tolerance value $\varepsilon$. The decision function can be defined by the following equation:

$$f(x) = w^{T} \phi(x) + \delta,$$

where $w$ is the weight vector and $\delta$ is the bias term of the function, and $\phi(x)$ is a non-linear function. The quality of estimation is measured by the loss function $L(z_i)$. SVM regression uses Vapnik's $\varepsilon$-insensitive loss function:

$$L(z_i) = \begin{cases} 0, & |z_i - f(x_i)| \leq \varepsilon, \\ |z_i - f(x_i)| - \varepsilon, & \text{otherwise.} \end{cases}$$

The SVM problem can be expressed as the following convex optimization problem:

$$\min_{w, \delta, \xi, \xi^{*}} \ \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})$$

$$\text{subject to} \quad z_i - w^{T}\phi(x_i) - \delta \leq \varepsilon + \xi_i, \qquad w^{T}\phi(x_i) + \delta - z_i \leq \varepsilon + \xi_i^{*}, \qquad \xi_i, \xi_i^{*} \geq 0, \quad i = 1, \ldots, n,$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables indicating the upper and lower training errors subject to the error tolerance $\varepsilon$, $n$ is the number of training observations and $C$ is a positive constant that determines the degree of penalized loss when a training error occurs. The optimization problem can be transformed into a dual problem using a Lagrangian function, which leads to the following dual form of the decision function:

$$f(x) = \sum_{i=1}^{m} (\gamma_i - \gamma_i^{*})\, K(x_i, x) + \delta,$$

where $\gamma_i$ and $\gamma_i^{*}$ are Lagrange multipliers, $m$ represents the number of support vectors and $K(x_i, x)$ is a kernel function. One could select different kernel functions, for example, linear, Gaussian, sigmoid and polynomial kernels. In this study, we have used a polynomial kernel. For further details of the SVM method see Vapnik (1995).
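The role of the error tolerance and of the penalty constant can be seen in a short Python sketch that evaluates the ε-insensitive loss and the resulting primal objective for given parameters. It does not solve the optimization problem; it only assumes, as is standard, that for fixed $w$ and $\delta$ the optimal slack of each observation equals its ε-insensitive loss. The function names and the explicit feature vectors are hypothetical.

```python
def eps_insensitive_loss(z, f, eps):
    """Vapnik's loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(z - f) - eps)


def primal_objective(w, delta, features, z, eps, c):
    """0.5 * ||w||^2 + C * sum of eps-insensitive losses.

    `features` holds the mapped inputs phi(x_i), one list per observation.
    """
    predictions = [
        sum(wj * pj for wj, pj in zip(w, phi)) + delta for phi in features
    ]
    penalty = sum(eps_insensitive_loss(zi, fi, eps) for zi, fi in zip(z, predictions))
    return 0.5 * sum(wj * wj for wj in w) + c * penalty
```

Observations whose residual stays within ε contribute nothing to the objective, which is why only the points on or outside the tube end up as support vectors.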

| RF
The RF is one of the most popular and efficient decision tree-based machine learning predictive models. It is an industrial workhorse of machine learning.
The RF algorithm can be summarized by the following steps:
a. Select n data points and p features (explanatory variables) randomly from the training data set.
b. Construct a decision tree based on the data set in step (a).
c. Select the number of trees (N) that we want to build and repeat steps (a) and (b).
d. To predict a value (z) for a new data point, use each of the N trees to predict the value of z and take the average across all of the predicted z values. This average serves as the predicted value for the new data point.
The RF is competitive with SVM and neural networks in many practical prediction tasks. However, it is much faster to train as it has fewer parameters. There is less risk of overfitting in RF because it reduces the variance by training on different randomly selected samples from the training data. For further detail on the RF method see Hastie et al. (2009).
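Steps (a) to (d) can be sketched in plain Python as a small ensemble of depth-limited regression trees. This is a simplified stand-in, not a full RF implementation: each tree draws one bootstrap sample and one random feature subset, as in step (a), whereas common RF implementations re-draw the feature subset at every split. All function names and default settings are hypothetical.

```python
import random


def _sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _best_split(rows, targets, features):
    """Return the (feature, threshold) with the smallest summed squared error."""
    best, best_sse = None, float("inf")
    for j in features:
        # excluding the largest value keeps both sides of the split non-empty
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            sse = _sse(left) + _sse(right)
            if sse < best_sse:
                best, best_sse = (j, threshold), sse
    return best


def _grow(rows, targets, features, depth):
    split = _best_split(rows, targets, features) if depth > 0 and len(rows) > 1 else None
    if split is None:
        return sum(targets) / len(targets)        # leaf: mean response
    j, thr = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > thr]
    return (
        j,
        thr,
        _grow([r for r, _ in left], [t for _, t in left], features, depth - 1),
        _grow([r for r, _ in right], [t for _, t in right], features, depth - 1),
    )


def _predict_tree(node, row):
    while isinstance(node, tuple):                # internal node
        j, thr, left, right = node
        node = left if row[j] <= thr else right
    return node


def fit_forest(rows, targets, n_trees=50, n_features=1, max_depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):                      # step (c)
        idx = [rng.randrange(len(rows)) for _ in rows]        # step (a): bootstrap rows
        feats = rng.sample(range(len(rows[0])), n_features)   # step (a): random features
        forest.append(                                        # step (b)
            _grow([rows[i] for i in idx], [targets[i] for i in idx], feats, max_depth)
        )
    return forest


def predict_forest(forest, row):
    """Step (d): average the predictions of all trees."""
    return sum(_predict_tree(tree, row) for tree in forest) / len(forest)
```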

| K-fold CV
K-fold CV is a robust method for estimating the accuracy of a model. It often provides more accurate estimates of the test error rate than leave-one-out cross validation (LOOCV) (James et al., 2013). In this study, K-fold CV is used to examine the prediction performance of the models. The K-fold CV algorithm has the following steps:
• Randomly divide the data into K groups (subsets or folds).
• Treat one group as the test or validation data and train the model on the remaining K − 1 groups (the training data).
• Test the model on the validation subset and compute some accuracy measures, for example, RMSE, $R^2$, MAE and PER, as evaluation metrics.
• Repeat this procedure K times, each time with a different group of observations serving as the validation data.
• Take the average of each accuracy measure over the K repetitions. This average serves as the model performance measure.
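The steps above can be sketched in plain Python as follows. The sketch uses RMSE as the only accuracy measure and takes the model as a pair of user-supplied `fit` and `predict` callables; these names, and the way the folds are formed by shuffling indices, are assumptions made for illustration.

```python
import random
from math import sqrt


def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, z, fit, predict, k=10, seed=0):
    """Average validation RMSE over k folds."""
    scores = []
    for fold in k_fold_indices(len(z), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(z)) if i not in held_out]
        model = fit([x[i] for i in train], [z[i] for i in train])
        errors = [z[i] - predict(model, x[i]) for i in fold]
        scores.append(sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(scores) / len(scores)
```

A simple baseline is a model that always predicts the training mean: `cross_validate(x, z, lambda xs, zs: sum(zs) / len(zs), lambda model, xi: model)`. Any candidate model should achieve a lower cross-validated RMSE than this baseline to be worth considering.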

| Spatial map
Spatial mapping of rainfall is a helpful tool to depict the rainfall frequency for different return periods throughout the study region. It is a map of the region where colours indicate the intensity of expected rainfall in different areas of the region.
To construct the map, we use gridded data of the Skåne region with 23,231 grid cells. We estimate the MAMDR value $\hat{l}_1$ for each grid cell using the best of the models described in Section 3.7. The quantile estimate for each grid cell can then be computed using Equation (1) by replacing $l_1$ with $\hat{l}_1$. Finally, grid-cell values that lie in different numerical ranges are represented by different colours.

| RESULTS AND DISCUSSION
The RRFA is performed using annual maximum daily rainfall (AMDR) series of 24 sites located in Skåne County, Sweden. The flowchart (Figure 2) describes the main steps involved in analysing the data. It is always crucial to review the appropriateness of data before applying any statistical technique.
The principal statistical assumptions of at-site analysis and RFA include independence, stationarity and randomness of the data series at each gauging site (see e.g., Ul Hassan et al. 2019; Hussain et al. 2017; Shahzadi et al. 2013 and Kousar et al. 2020). To investigate the serial correlation of the observed data series at each gauging site, we use the lag-1 correlation coefficient (r). The corresponding p values of the lag-1 correlation coefficients indicate no serial correlation of the data series at any site (see Table 2). We use the Mann-Kendall test and the run test to examine the stationarity and randomness, respectively, of the observed data series at each gauging site. The results of these tests are presented in Table 2. According to the p values of these tests, the data series at each gauging site satisfies the assumptions of stationarity and randomness.
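Two of these screening statistics can be sketched in plain Python. The lag-1 coefficient is computed here as the Pearson correlation between the series and its one-year shift, and the Mann-Kendall statistic uses the normal approximation with continuity correction and no correction for ties; both are common textbook forms and are assumptions about, not a record of, the exact procedure used in the study. The run test is not covered.

```python
from math import sqrt
from statistics import NormalDist, mean


def lag1_correlation(series):
    """Pearson correlation between z_1..z_{n-1} and z_2..z_n."""
    a, b = series[:-1], series[1:]
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def mann_kendall(series):
    """Return (S, Z, two-sided p value) of the Mann-Kendall trend test."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0      # variance without tie correction
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

A large p value from `mann_kendall` gives no evidence of a monotonic trend, which is the behaviour expected of a stationary annual maximum series.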
To identify discordant sites in the region, the discordancy measure $D_i$, which is based on L-moments, is used. The summary statistics along with the discordancy measure of each gauging site are presented in Table 3. The $D_i$ values of all sites are less than the critical value of 3 suggested by Hosking and Wallis (1997). This indicates that there is no discordant site in the region. The results of the various tests at all sites of the Skåne region demonstrate that the data at each site are suitable for RFA. The next step is to establish that these 24 study sites of Skåne can be considered as a single region. The heterogeneity measure (H) is calculated using the procedure in Section 3.3. To compute H, 500 simulations are carried out using the four-parameter Kappa distribution. The estimated parameters of the Kappa distribution used in the simulation are ξ = 0.801, α = 0.265, k = −0.090 and h = 0.165. The value of H is −0.10, which is less than 1. So, the entire Skåne region based on these 24 sites can be regarded as a homogeneous region, which suggests that it is suitable for RFA. We expected this because of the similar climatological and topographical characteristics across the Skåne region.
The terrain of Skåne County is almost flat. The MAMDR of the gauged sites in the region lies within a small range, 30.083-40.789 mm. This one climate characteristic is enough to constitute the boundary of the region. Many studies have formed homogeneous regions based on a small range of mean annual precipitation (see e.g., Wallis et al. 2007 and Schaefer et al. 2006). Regarding the seasonality of the AMDR extremes in the region, 81% of them overall, and at least 67% at each site, occur in the period from July to October. A previous study on the regional analysis of short-duration rainfall extremes finds the south-west counties of Sweden, which include Skåne, to be one homogeneous region (Olsson et al., 2019).

FIGURE 2 Flowchart summarizing the main steps of the methodology
After identification of the homogeneous region, the next step in the RFA is to choose the most suitable frequency distribution for the region. In RFA, a single frequency distribution is fitted to all the sites in the region, chosen to provide the most accurate quantile estimates at each site. To determine the most appropriate distribution for the region, we use the L-moment ratio diagram and the $Z^{\mathrm{Dist}}$ criterion. In the L-moment ratio diagram (Figure 3), the regional average L-skewness ($t_3^{(R)}$) and L-kurtosis ($t_4^{(R)}$) are represented by a red dot. This dot lies closest to the theoretical curve of the generalized normal (GNO) distribution. Therefore, the GNO distribution seems to fit the Skåne region well. Table 4 provides the $|Z^{\mathrm{Dist}}|$ values of various three-parameter distributions for the region. The value of $|Z^{\mathrm{Dist}}|$ is less than 1.64 for the GNO and GEV distributions, so these distributions are accepted as suitable candidates for the region. Furthermore, $|Z^{\mathrm{Dist}}|$ is smaller for GNO than for GEV. Therefore, GNO is the most appropriate distribution for the region, which also confirms the conclusion from the L-moment ratio diagram. We have used the R package lmomRFA developed by Hosking (2019) for the L-moment analysis.
After selecting the most appropriate regional frequency distribution, the next step is to estimate the regional quantiles for different return periods using the parameter estimates of the GNO distribution in Table 4. The regional quantiles for different values of the non-exceedance probability (F), along with 90% error bounds and RMSE, are presented in Table 5. The RMSE and 90% error bounds are obtained by Monte Carlo simulation (Hosking and Wallis, 1997). The error bounds and RMSE indicate that there is more uncertainty around the regional quantile estimates at high return periods. At-site quantiles can be estimated using Equation (1) in Section 3.5.

Note (Table 2): r represents the Pearson correlation coefficient and n indicates the sample size of the data series.
The quantile estimates at ungauged sites in the region are of paramount importance to engineers, city planners and scientists working on water management. To estimate the quantiles at ungauged sites, the MAMDR is needed at these sites, but it cannot be computed there because no data are available. Thus, further modelling is required to estimate the MAMDR. For this purpose, we can use the relationship between the MAMDR at gauged sites ($l_1$) in the region and their characteristics. In this study, three characteristics of the gauged sites (longitude, latitude and elevation) are used. Unfortunately, these are the only characteristics available for the ungauged sites in the region, which is a limitation of this research. Therefore, only the best explanatory variables among these characteristics are selected for the models that estimate the index rainfall at each ungauged site. The correlation matrix of the site characteristics shows that $l_1$ is highly correlated with latitude and elevation. Therefore, latitude and elevation are important variables for explaining the variation in $l_1$. Figure 4 shows that there is more rain in elevated areas and that the MAMDR increases with latitude, which is consistent with the correlation matrix. Therefore, these variables are used as explanatory variables for the MLR, RF, GP and SVM models. Polynomial kernels are used in the GP and SVM models in order to capture the nonlinear relationship between the MAMDR and the explanatory variables.

TABLE 3 Summary statistics for annual maximum of daily rainfall (in millimetres) of 24 sites of Skåne
Note: n is the length of the data series, $l_1$ represents the first sample L-moment (mean), t is the sample L-CV, $t_3$ is the sample L-skewness, $t_4$ is the sample L-kurtosis and $D_i$ is the discordancy measure.
Many studies of rainfall in different areas of the world find a strong effect of latitude on the spatial distribution of rainfall and use it as an important variable for predicting average rainfall (see e.g., Becerra (2016) and Lv and Zhou (2016)). According to the Water Encyclopedia, the formation of clouds and precipitation is a function of latitude, topography and some other factors (Water Encyclopedia 2021). Sweden is located in northern Europe and extends over a wide range of latitudes with different climate regions along it, which makes latitude all the more relevant to rainfall magnitude.
Machine learning models are data hungry, and their predictive performance relies heavily on the size of the training data set. Many research studies demonstrate that it is hard to obtain a training data set of sufficient size. This study has used 20 observations to train the models, which is a limitation of the study. Small sample sizes give rise to the problem of overfitting. To avoid overfitting and to produce a model with good predictive ability, simple models (IDW and MLR) and models that resist overfitting (RF, SVM and GP) are used. Moreover, K-fold CV and the relevant features of the models are used.
The 10-fold CV algorithm described in Section 3.8 is used to identify the model with the best predictive performance. For 10-fold CV, we consider 80% of the observed data as training data and 20% as test data. The model with the lowest RMSE, MAE and PER values and the highest $R^2$ value is considered the best predictive model. The accuracy measures in Table 6 indicate that the IDW method, which does not use any explanatory variable, performs about as well as the MLR model. Based on the accuracy measures, the SVM model is chosen as the best predictive model. Therefore, this model is used to estimate the MAMDR at any ungauged site ($\hat{l}_1$) in the region. The range of the estimated MAMDR over all ungauged sites is 30.731-40.984 mm, which is close to the range of the MAMDR at gauged sites. This shows that all ungauged sites should be part of this region. To construct the spatial map of the region, the SVM model is used to estimate the index rainfall all over the Skåne region. Lastly, by using Equation (1), isopluvial precipitation maps of Skåne County (Figure 5) are generated for different return periods with non-exceedance probability F. The maps indicate that areas close to the south coast are expected to receive less rain, while the northern parts of Skåne are expected to receive more rain. It is also observed that highly elevated areas receive more rain. Overall, there is an increasing trend of expected rain with increasing latitude.

TABLE 5 Regional quantile estimates with accuracy measures using the best-fitted distribution for homogeneous region Skåne

| CONCLUSION
Daily rainfall data are prime information for urban drainage systems and the management of flash floods. In this study, RFA is carried out using annual maximum daily rainfall (AMDR) data for 24 gauging sites in Skåne County, Sweden. None of the sites is discordant according to the discordancy measure. All the assumptions of RFA are satisfied for all rainfall gauging sites. The L-moment-based heterogeneity measure reveals that Skåne County is a homogeneous region. This result was expected because of the similar climate and topography across the study area. The GOF methods indicate that GNO is the most suitable regional frequency distribution for this region.
The results show that latitude and elevation are the important variables, among the available set of site characteristics, for explaining the variation in the MAMDR at gauged sites ($l_1$). Therefore, these variables are used as explanatory variables in the machine learning models to predict the MAMDR at ungauged sites. To identify the best predictive model, K-fold cross validation is carried out. The accuracy measures indicate that the SVM model offers the best prediction performance. Therefore, the SVM model is used to estimate the index rainfall (at-site MAMDR) in Skåne County. The index rainfall combined with the regional quantiles is used to estimate the rainfall quantiles at ungauged sites for different return periods in the region. Finally, spatial maps of the expected AMDR for different return periods are constructed. The maps show that areas of high elevation receive more rainfall and that rainfall magnitude increases with latitude.
The results of this study can be useful for the management of flash floods, water resource planning, the design of urban drainage systems, crop insurance and many other purposes related to water management in this region. The findings could be used as guidelines for RFA of other regions in Sweden that are similar to Skåne County. The predictive model (SVM) for the index rainfall is based on only two explanatory variables. Therefore, the predictions of the MAMDR could be improved by adding more site characteristics, if available. This study is limited to daily rainfall data. One may consider sub-daily, monthly and seasonal data for further research in this region.