Rainfall regionalization and variability of extreme precipitation using artificial neural networks: a case study from western central Morocco

Here, we investigate precipitation regionalization and the spatial variability of rainfall extremes using a 47-year station-based dataset from western central Morocco, a region with marked topographic and climatic variations. Principal component analysis revealed three homogeneous rainfall regimes consistent with topographic features: the coastal area receives heavy rainfall during autumn and winter; the inner lowlands, in the middle of the study area, are characterized by an overall rainfall deficit despite their high water demand for irrigation; and the highest rainfall amounts occur in the mid-mountain area, including in summer. Furthermore, the frequency analysis of daily rainfall extremes revealed high ten-year daily precipitation amounts in the coastal region (about 88 mm) and exceptional daily precipitation for longer return periods (182 mm for a 100-year return period). Using artificial neural networks, the spatialization of these extreme precipitation events shows that they increase from the plain towards the Atlas Mountains and especially from the plain towards the Atlantic Ocean. The spatial distribution of extreme precipitation highlights the areas where stormwater management, such as efficient stormwater drainage, needs to be improved, and where floods are more likely to occur in the future.


INTRODUCTION
Water resources management for agricultural, industrial, or domestic use is closely linked to a thorough knowledge of climatic variables, including precipitation (Hiez ), a knowledge that is often complicated by the variability of rainfall intensity.
Indeed, variability has become a more influential climatic feature than the long-term average (Whitford ). This is particularly the case in arid and semi-arid areas, where populations must adapt to recurrent dry periods that are often interrupted by short, extreme precipitation events. Hence, delineating homogeneous rainfall areas is essential to understand regional climate regimes and to better manage meteoric water resources.
In this context, rainfall regionalization has become necessary in semi-arid and arid environments (such as central and southern Morocco, respectively) for various purposes, in particular agricultural planning, drought analysis, the design of water management structures, and land use planning. One of the most widely used methods for distinguishing rainfall regimes is multivariate […] precipitation that occurs in the form of intense showers, which are separated by long dry sequences and are harmful to soils and agricultural yields. This is concerning, considering that rainfall studies, for example those aimed at the inventory and management of water resources, have focused on trends in average rainfall amounts without paying enough attention to the behavior of precipitation extremes. The lack of research on precipitation extremes often stems from a lack of access to short time-step data (Goula et al. ), especially in developing countries.
In this study, we aim to analyze the frequency of daily rainfall extremes and their probability of occurrence in western central Morocco, an important agricultural region where water scarcity is expected to be one of the key challenges. Using principal component analysis (PCA), a rainfall regionalization will be performed to distinguish homogeneous rainfall regions.
Furthermore, the spatial variability of precipitation extremes will be examined using artificial neural networks (ANN) in order to provide a regional overview of how these extremes are organized.

Study area and data
The Tensift basin is a watershed in western central Morocco whose main stream drains into the Atlantic Ocean. It lies between latitudes 30°50′ and 32°10′ N and longitudes 7°25′ and 9°25′ W (Figure 1). The basin consists of a broad, generally arid alluvial plain and a rugged mountainous area in the south, which collects and transports most of the surface water to the plain. The main stream flows from east to west over a total length of approximately 260 km. This temporary wadi (an Arabic term for a valley that is usually dry except during the rainy season) drains a catchment of 18,500 km², where altitudes range from 43 to 4,167 m above sea level (m a.s.l.), the latter being the highest peak in North Africa. Slopes generally become steeper from the plain towards the High Atlas Mountains, with an average exceeding 20°. The most common slope orientations are north, west, and north-west.
The Tensift catchment is exposed to rain-bearing disturbances originating from the Atlantic Ocean. However, the inner plain is relatively arid (less than 250 mm of rainfall per year). This aridity is accentuated by the low altitude and by the latitude, close to the Sahara. In contrast, the mountains receive heavy rainfall (more than 500 mm per year) and have perennial streamflow. The seasonal contrast is very marked in the mountains, and rainfall events are usually more frequent during autumn (Sep-Oct-Nov) and winter (Dec-Jan-Feb) (Saidi et al. ). These events are irregular, hard to predict, and sometimes intense and violent. During the rest of the year, drought mainly affects the lowland area, where temperatures and evaporation rates are high. The annual temperature range is also considerable, with temperatures reaching up to 45 °C in summer (Jun-Jul-Aug) and dropping below 5 °C in winter.
The data used in this study consist of daily and monthly precipitation records from 16 meteorological stations located at altitudes ranging from 53 to 1,100 m (Figure 1, Table 1). These data cover a 47-year period (1970/71 to 2016/17).
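The frequency analysis of daily extremes works on one maximum daily value per year and station, so the daily records must first be reduced to annual maxima. The record is given in hydrological years (1970/71 to 2016/17), but the start month is not stated; the minimal sketch below assumes, hypothetically, a hydrological year starting on 1 September, consistent with the rainy season beginning in autumn. The function names `hydro_year` and `annual_maxima` are illustrative, and missing-value handling is left out.

```python
from collections import defaultdict
from datetime import date

# Assumption (not stated in the text): hydrological years start on 1 September.
HYDRO_YEAR_START_MONTH = 9


def hydro_year(d: date) -> int:
    """Starting calendar year of the hydrological year containing d (e.g. 1970 for 1970/71)."""
    return d.year if d.month >= HYDRO_YEAR_START_MONTH else d.year - 1


def annual_maxima(records):
    """Reduce (date, daily rainfall in mm) records of one station to
    {hydrological year: maximum daily rainfall in mm}."""
    maxima = defaultdict(float)  # rainfall is non-negative, so 0.0 is a safe start
    for d, mm in records:
        year = hydro_year(d)
        maxima[year] = max(maxima[year], mm)
    return dict(maxima)


# Hypothetical records for one station.
records = [(date(1970, 10, 3), 12.5), (date(1971, 1, 15), 40.0), (date(1971, 9, 2), 8.0)]
print(annual_maxima(records))  # {1970: 40.0, 1971: 8.0}
print(len(range(1970, 2017)))  # 47 hydrological years, 1970/71 to 2016/17
```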

Principal component analysis
PCA is a descriptive statistical method that extracts as much information as possible from a data table, such as an optimal graphical representation of the individuals (rows) and the variables, by best explaining the initial relationships between these variables (Smith ; Ringnér ). In our case, this table consists of individuals (rainfall data from 16 stations) and variables (47 years of monthly rainfall amounts). The first principal component is the one for which the variance of the observations is maximal, i.e., the one that best describes their dispersion. The subsequent components are ranked according to the share of the variation of the observations that they explain.
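A minimal sketch of PCA on a stations × variables table, using NumPy's singular value decomposition. It assumes a standardized (correlation-based) PCA, which the text does not specify, and the input table is synthetic, not the Tensift data. The squared singular values give the share of variance explained by each axis, which is the quantity reported for the first two axes in the results.

```python
import numpy as np


def pca(table: np.ndarray):
    """PCA of a (stations x variables) table via the singular value decomposition.

    Assumption: columns are centered and scaled to unit variance (correlation-based
    PCA). Returns the station coordinates on the principal axes and the fraction of
    total variance explained by each axis, in decreasing order.
    """
    centered = table - table.mean(axis=0)
    standardized = centered / centered.std(axis=0, ddof=1)
    # standardized = U @ diag(s) @ Vt; the rows of Vt (unused here) are the axes.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # variance share of each principal axis
    scores = u * s                   # station coordinates (principal components)
    return scores, explained


# Hypothetical example: 16 stations x 47 variables of synthetic rainfall (mm).
rng = np.random.default_rng(0)
table = rng.gamma(shape=2.0, scale=20.0, size=(16, 47))
scores, explained = pca(table)
print("cumulative % on axes 1-2:", np.round(100 * np.cumsum(explained)[:2], 2))
```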

Frequency analysis
For the frequency analysis, many statistical distributions can be fitted to extreme weather events, in order […], as the estimation of its parameters has shown very good statistical properties for large samples. We therefore chose the WM method to fit our models.
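Once a distribution has been fitted, a T-year return level is the quantile with non-exceedance probability $1 - 1/T$, i.e. $x_T = F^{-1}(1 - 1/T)$. The sketch below illustrates this for a two-parameter log-normal fitted by maximum likelihood; this is a stand-in, since the WM method named above is not detailed in the available text. The sample of annual maxima is hypothetical, and `fit_lognormal` and `return_level` are illustrative names.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(maxima):
    """Maximum-likelihood (mu, sigma) of ln(x) for a two-parameter log-normal."""
    logs = [math.log(x) for x in maxima]
    return fmean(logs), pstdev(logs)  # MLE uses the population standard deviation


def return_level(mu, sigma, period_years):
    """Daily rainfall exceeded on average once every T years:
    x_T = F^{-1}(1 - 1/T) = exp(mu + sigma * z), z the standard normal quantile."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / period_years)
    return math.exp(mu + sigma * z)


# Hypothetical annual maximum daily rainfall (mm) at one station.
annual_max_mm = [35.0, 52.5, 41.0, 88.0, 29.5, 60.0, 47.0, 73.5, 38.0, 55.0]
mu, sigma = fit_lognormal(annual_max_mm)
for T in (10, 100):
    print(f"{T}-year daily rainfall: {return_level(mu, sigma, T):.1f} mm")
```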
After fitting the models, the graphical results must be confirmed numerically in order to select the most suitable frequency models. This selection can be formalized as follows:
• A sample of size $n$, $D = \{x_1, \dots, x_n\}$, is available in ascending order.
• The sample is drawn from an unknown parent distribution $f(x)$.
• $M_j$, $j = 1, \dots, N_m$, are the candidate models used to represent the observed data.
• The models take the form of probability distributions, $M_j = g_j(x; \hat{\theta})$.
• $\hat{\theta}$ are the parameters estimated from the available data sample $D$.
The purpose of model selection is to identify the optimal model ($M_{opt}$), i.e., the model best suited to represent the data and closest to the parent distribution $f(x)$.
However, evaluation criteria are required to assess how well these distributions suit the analyzed samples because, with many possible models, there would be different statistical combinations of explanatory variables. Two of the most widely used selection criteria in the literature are considered: the Akaike information criterion (AIC) (Akaike ) and the Bayesian information criterion (BIC) (Schwarz ). These criteria are given by the following equations.

Akaike's information criterion (Akaike ):

$$\mathrm{AIC} = -2\ln(L) + 2K$$

where $L$ is the maximized value of the likelihood function for the estimated model and $K$ is the number of parameters in the estimated model. For ARMA($p$, $q$) models, $K = p + q$, and the AIC can be calculated, up to an additive constant, as:

$$\mathrm{AIC} = T\ln(\hat{\sigma}^2) + 2(p + q)$$

where $\hat{\sigma}^2$ is the estimated variance of the innovation process and $T$ is the number of observations.

Bayesian information criterion (Schwarz ):

$$\mathrm{BIC} = -2\ln(L) + K\ln(T)$$

For ARMA($p$, $q$) models, $K = p + q$, and the BIC can be calculated, up to an additive constant, as:

$$\mathrm{BIC} = T\ln(\hat{\sigma}^2) + (p + q)\ln(T)$$

The AIC therefore represents a compromise between bias (which decreases with the number of parameters) and parsimony (description of the data with the minimum possible number of parameters) (Lancelot & Lesnoff ). A few years later, Schwarz () developed the BIC, derived from the AIC; unlike the latter, its penalty depends on the sample size and not only on the number of parameters. We therefore use these two criteria to choose the appropriate distribution.
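To illustrate the selection step, the sketch below fits a few candidate distributions to one sample by maximum likelihood and ranks them with $\mathrm{AIC} = -2\ln L + 2K$ and $\mathrm{BIC} = -2\ln L + K\ln T$, $T$ being the sample size. The candidate set (log-normal, Gumbel, exponential) is a hypothetical choice; the log-logistic law mentioned in the results is omitted only to keep the example free of numerical-optimization dependencies, and the sample is invented.

```python
import math
from statistics import fmean, pstdev


def lognormal_fit_loglik(x):
    """Closed-form MLE of a two-parameter log-normal; returns (max log-likelihood, K)."""
    logs = [math.log(v) for v in x]
    mu, sigma = fmean(logs), pstdev(logs)
    ll = sum(-lv - math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (lv - mu) ** 2 / (2 * sigma ** 2) for lv in logs)
    return ll, 2


def gumbel_loglik_at(x, mu, beta):
    """Gumbel (EV1) log-likelihood for location mu and scale beta."""
    return sum(-math.log(beta) - z - math.exp(-z) for z in ((v - mu) / beta for v in x))


def gumbel_mle(x):
    """Gumbel MLE: beta solves beta = mean(x) - sum(x*w)/sum(w), w = exp(-x/beta).
    The difference of both sides increases with beta, so bisection finds the root.
    Values are shifted by min(x) to avoid underflow (requires a non-constant sample)."""
    xmin, xbar = min(x), fmean(x)

    def excess(beta):
        w = [math.exp(-(v - xmin) / beta) for v in x]
        return beta - xbar + sum(v * wi for v, wi in zip(x, w)) / sum(w)

    lo, hi = 1e-6 * (xbar - xmin), (xbar - xmin) + 1.0  # excess(lo) < 0 < excess(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    mu = xmin - beta * math.log(fmean([math.exp(-(v - xmin) / beta) for v in x]))
    return mu, beta


def gumbel_fit_loglik(x):
    mu, beta = gumbel_mle(x)
    return gumbel_loglik_at(x, mu, beta), 2


def exponential_fit_loglik(x):
    """Exponential with rate 1/mean(x), its MLE; one parameter."""
    n, xbar = len(x), fmean(x)
    return -n * math.log(xbar) - n, 1


def aic_bic(loglik, k, n):
    """AIC = -2 ln L + 2K and BIC = -2 ln L + K ln(n)."""
    return -2.0 * loglik + 2 * k, -2.0 * loglik + k * math.log(n)


CANDIDATES = {
    "log-normal": lognormal_fit_loglik,
    "Gumbel": gumbel_fit_loglik,
    "exponential": exponential_fit_loglik,
}


def select_distribution(x):
    """Return {name: (AIC, BIC)} and the names minimizing AIC and BIC."""
    scores = {name: aic_bic(*fit(x), len(x)) for name, fit in CANDIDATES.items()}
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_bic = min(scores, key=lambda name: scores[name][1])
    return scores, best_aic, best_bic


annual_max_mm = [35.0, 52.5, 41.0, 88.0, 29.5, 60.0, 47.0, 73.5, 38.0, 55.0]  # hypothetical
scores, best_aic, best_bic = select_distribution(annual_max_mm)
for name, (aic, bic) in scores.items():
    print(f"{name:12s} AIC={aic:7.2f} BIC={bic:7.2f}")
print("AIC selects:", best_aic, "| BIC selects:", best_bic)
```

The lowest AIC or BIC identifies the preferred model; because the BIC penalty grows with $\ln T$, it favors fewer parameters more strongly than the AIC once $T > e^2 \approx 7.4$.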

Artificial neural networks (ANN)
The spatial distribution of a variable is a classical problem of estimating a function $f(\mathbf{x})$, where $\mathbf{x} = (x, y)$, at a point $\mathbf{x}_p$ of the plane from the known values of $f$ at a number $m$ of surrounding points $\mathbf{x}_i$:

$$f(\mathbf{x}_p) = \sum_{i=1}^{m} w_i\, f(\mathbf{x}_i)$$

The problem is to determine the weight $w_i$ of each surrounding point. There are many ways to choose these weights, the two best-known being linear interpolation (based on inverse distance weighting) and cubic spline interpolation (fitting of cubic polynomials). However, these approaches cannot take into account the distance from the shore or altitudinal effects. We therefore propose a spatial distribution method based on a neural network approach. Fitting an ANN relies on a learning mechanism that varies the parameters of the parameterized functions (called neurons) of the network in order to minimize a predefined criterion called the cost function, usually the mean squared error.
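For comparison with the ANN approach, inverse distance weighting, one of the baseline methods mentioned above, can be sketched as follows: each weight is proportional to $d(\mathbf{x}_p, \mathbf{x}_i)^{-p}$ and the weights sum to one. Because the weights depend only on horizontal distance, the method has no way to use altitude or distance from the coast. The exponent `power=2` is a common default assumed here, and the coordinates in the example are hypothetical.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse distance weighting: f(x_p) = sum_i w_i f(x_i),
    with w_i proportional to d(x_p, x_i)**-power and sum_i w_i = 1.

    target: (x, y); stations: list of ((x, y), value).
    """
    weights, values = [], []
    for (xi, yi), fi in stations:
        d = math.hypot(target[0] - xi, target[1] - yi)
        if d == 0.0:
            return fi  # the target coincides with a station
        weights.append(d ** -power)
        values.append(fi)
    return sum(w * f for w, f in zip(weights, values)) / sum(weights)


# Hypothetical stations ((x, y) in km, 100-year daily rainfall in mm).
stations = [((1.0, 0.0), 10.0), ((-1.0, 0.0), 30.0), ((0.0, 2.0), 50.0)]
print(idw((0.2, 0.3), stations))
```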

Rainfall regionalization
The first PCA axis accounts for 61.55% of the total data variability (Figure 3). Together with the second axis, which adds 24.65%, it reflects 86.2% of the variability of the precipitation data. The residual variability described by the remaining axes is therefore weak and is not considered in the following discussion.

To detect a potential rainfall deficit, we will analyze the evolution of rainfall over the 47 years and its trends.
For this purpose, we will analyze the temporal variability of rainfall and compute the standardized precipitation index (SPI) for each homogeneous rainfall class (Figure 6).
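The text does not state which trend method was used. As a minimal, hypothetical illustration, an ordinary least-squares slope of annual totals against time gives the direction and magnitude of a trend; nonparametric tests such as Mann-Kendall are a common alternative for rainfall series. The function name `linear_trend` and the series below are illustrative only.

```python
from statistics import fmean


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    ybar, vbar = fmean(years), fmean(values)
    sxy = sum((y - ybar) * (v - vbar) for y, v in zip(years, values))
    sxx = sum((y - ybar) ** 2 for y in years)
    return sxy / sxx


# Hypothetical annual rainfall totals (mm) for a short, decreasing series.
years = [1971, 1972, 1973, 1974, 1975]
totals = [260.0, 245.0, 250.0, 230.0, 225.0]
slope = linear_trend(years, totals)
print(f"trend: {slope:.1f} mm/year ({10 * slope:.0f} mm/decade)")
```

A negative slope indicates a downward trend; its statistical significance would still have to be tested.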

The SPI is an index developed by McKee et al. (). It is given by the deviation of precipitation from the mean, divided by the standard deviation:

$$\mathrm{SPI} = \frac{P_i - P_m}{\sigma}$$

where $P_i$ is the precipitation of year $i$, $P_m$ is the average precipitation over the whole study period, and $\sigma$ is the standard deviation.
Negative SPI values represent a precipitation deficit, whereas positive values indicate that precipitation was above the historical average; a drought is therefore noted when the index becomes negative. Several authors have defined SPI value ranges to characterize dry or wet conditions; the classification proposed by Lloyd-Hughes & Saunders () is given in Table 2.
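The SPI formula above is a standardized anomaly of annual precipitation. Note that McKee et al.'s original SPI first fits a probability distribution (usually gamma) to precipitation totals and then transforms it to a standard normal variate; the sketch below follows the simpler formula given in the text. Using the sample standard deviation is an assumption, the function names are illustrative, and the annual totals are hypothetical.

```python
from statistics import fmean, stdev


def spi(annual_totals):
    """Standardized anomaly as defined in the text: (P_i - P_m) / sigma."""
    p_m = fmean(annual_totals)
    sigma = stdev(annual_totals)  # sample standard deviation (assumption)
    return [(p - p_m) / sigma for p in annual_totals]


def deficit_years(years, annual_totals):
    """Years with a negative SPI, i.e. a precipitation deficit."""
    return [y for y, s in zip(years, spi(annual_totals)) if s < 0]


# Hypothetical annual totals (mm) for one homogeneous rainfall class.
years = [2000, 2001, 2002, 2003]
totals = [310.0, 180.0, 260.0, 150.0]
print([round(s, 2) for s in spi(totals)])
print(deficit_years(years, totals))
```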
Annual precipitation amounts vary widely from year to year, but overall, over the 47 years of record, there is a downward trend in the mid-mountain area (Aghbalou) and in the plain (Marrakech). However, for the coastal station […] better to the log-logistic law (Table 3).
The log-normal distribution therefore seems to be the most suitable probability distribution for modeling maximum […]

In the Tensift watershed, frequency analysis allowed the estimation of 100-year rainfall. To spatialize this precipitation, we used a black-box model because of its parsimony and universal approximation properties (Piron et al. ). Artificial neural networks are one example of such models and can solve identification and prediction problems (Ruano ). A multilayer network has a learning algorithm and good approximation and generalization capabilities (Huang ). The simplest network consists of three types of layers: an input layer, one or more hidden layers, and an output layer (Figure 9(a)).
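A network of the kind described above can be sketched in NumPy: the inputs are the coordinates X, Y and the altitude Z, followed by one tanh hidden layer and a linear output, trained by full-batch gradient descent on the mean squared error (the cost function). The layer size, learning rate, number of epochs, and the synthetic training data are all assumptions for illustration; this is not the network configuration used in the study.

```python
import numpy as np


def train_mlp(inputs, targets, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer perceptron (tanh hidden units, linear output) trained by
    full-batch gradient descent on the mean squared error.

    inputs: (n, 3) array of X, Y, Z; targets: (n,) array, e.g. 100-year quantiles.
    Returns the parameters, the input/target scaling and the MSE history.
    """
    rng = np.random.default_rng(seed)
    x_mean, x_std = inputs.mean(axis=0), inputs.std(axis=0)
    t_mean, t_std = targets.mean(), targets.std()
    x = (inputs - x_mean) / x_std  # standardized inputs
    t = (targets - t_mean) / t_std  # standardized targets
    n, d = x.shape
    w1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)  # hidden layer
        err = h @ w2 + b2 - t     # linear output minus target
        history.append(float(np.mean(err**2)))
        # Backpropagation of the MSE gradient.
        g_y = 2.0 * err / n
        g_h = np.outer(g_y, w2) * (1.0 - h**2)
        w2 -= lr * (h.T @ g_y)
        b2 -= lr * g_y.sum()
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), (x_mean, x_std, t_mean, t_std), history


def predict(params, scaling, inputs):
    """Apply the trained network to new (X, Y, Z) points, in original units."""
    w1, b1, w2, b2 = params
    x_mean, x_std, t_mean, t_std = scaling
    h = np.tanh(((inputs - x_mean) / x_std) @ w1 + b1)
    return (h @ w2 + b2) * t_std + t_mean


# Synthetic data for 16 hypothetical stations (not the study data).
rng = np.random.default_rng(1)
xyz = np.column_stack([rng.uniform(0, 200, 16),     # X (km)
                       rng.uniform(0, 150, 16),     # Y (km)
                       rng.uniform(50, 1100, 16)])  # Z (m)
q100 = 180 - 0.4 * xyz[:, 0] + 0.03 * xyz[:, 2] + rng.normal(0, 5, 16)
params, scaling, history = train_mlp(xyz, q100)
print(f"MSE (standardized): {history[0]:.3f} -> {history[-1]:.3f}")
```

With only 16 stations, a network with 41 weights can easily overfit, so in practice a held-out or cross-validated subset of stations would be needed to judge generalization.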
Our […] Model outputs also show how centennial precipitation changes with longitude (X), latitude (Y), and altitude (Z) (Figure 11). Consistent with the results of the PCA and the frequency analysis, these centennial quantiles vary much more with longitude (distance from the ocean) than with latitude (north-south direction); a slight increase is noted over the High Atlas Mountains, whereas precipitation decreases over the plain (Figure 11(a)). Altitude (Z) also affects these quantiles, but more moderately than longitude (Figure 11