The spatial correlation structure of rainfall at the local scale over southern Ghana

Study region: Ghana is located on the coast of west Africa, and the majority of its annual rainfall comes from the west African monsoon. Study focus: Thanks to a new dense, long-term dataset, the spatial structure of rainfall during the different phases of the monsoon has been investigated. Previous studies have only considered a general decorrelation range, whereas this study implements a novel approach that estimates the decorrelation rate depending on the intensity of the rainfall event. The anisotropic pattern at the subweekly and local scale was also modelled. New hydrological insights for the region: The spatial correlation structure of rainfall varies greatly with the intensity of the rainfall event and the phase of the monsoon, with a much shorter range for low intensity rainfall than for other intensities. At the very local scale (∼10 km), there is a much larger variation over the year at the lower intensities than at the heavier ones, indicating a larger variation in the structure of the convective systems generating low rainfall amounts than in heavy rainfall systems. The westward propagation of convective systems can be seen even at short aggregation periods and at the local scale.


Introduction
Rainfall over west Africa has received a lot of interest over the past decades because of the limited possibility of irrigation for the many farmers who depend on rain-fed crops. In Ghana, 50% of the population depend on rain-fed crops (SRID Ministry of Food and Agriculture, 2017) and a large part of the country's energy comes from hydropower from Lake Volta (Nyarko Kumi, 2017), which makes the hydrological cycle of great importance. Because of the sparse rain gauge network over most parts of Africa (Washington et al., 2004), some research has described the rainfall distribution over time at a specific station and then extrapolated this knowledge to the surrounding region (e.g. Nicholson et al., 2000). One problem with this approach is the highly variable weather over west Africa, which makes it difficult to extrapolate knowledge outside a very small region and leads to large uncertainties. The majority of the west African rainfall comes from the west African monsoon, which is controlled by the movement of the inter tropical convergence zone (ITCZ), an area where the south-east and north-east trade winds meet and where a belt of convective clouds is present because of this convergence and the high amount of energy from the sun. The movement of the ITCZ results in high variability in rainfall over the year, and the convective nature of the rainfall is one of the reasons for the high variability in both time and space on a daily scale.
There has also been a lot of research using satellite rainfall estimates calibrated against the sparse network of ground stations, or using reanalysis products, which in general perform worse at finer scales in the tropics when compared to gauge measurements (Diro et al., 2009; Maidment et al., 2017 and references therein). To get a more realistic description of the spatial rainfall distribution, and to generate spatially accurate satellite rainfall estimates, several papers have modelled the spatial covariance over either all of Africa or a specific country or region. This has been done using satellite data (e.g. Funk et al., 2015; Smith et al., 2005) and rain gauge data (Moron et al., 2007; Ricciardulli and Sardeshmukh, 2002; Greatrex et al., 2014). Both of these types of dataset have their individual issues when collected over Africa. Satellite products may not represent fine scale variability accurately (Maidment et al., 2017), whereas rain gauge data are usually very sparse, which again leads to issues when modelling the small scale behaviour (Greatrex et al., 2014; Moron et al., 2007). A general problem is the lack of long time series, which makes it difficult to describe the full interannual variability (Greatrex et al., 2014). Ricciardulli and Sardeshmukh (2002) used 3 years of cloud observation data transformed into a "deep convection activity index", with a resolution of 0.35° × 0.7° covering the entire tropics, to model the correlation distance. They focused only on modelling the decorrelation distance for deep convection clouds over all active months, and hence made no distinction between the different phases of the monsoon cycle. With this method, they were not able to capture rainfall events related to any processes other than deep convection (Young et al., 2014). Two methods were used: one estimating the distance at which the correlation drops below 1/e, and one estimating the distance at which the conditional probability of rainfall, given that it rains at the grid point, approaches the overall probability of rainfall. In south-west Africa, the decorrelation distance was estimated at 150-180 km, with the second method giving larger distances than the first. Funk et al. (2015) instead used 0.05° resolution 5-day cumulative cold cloud duration (CCD) data to estimate the decorrelation distance for each month separately. Their method estimates the average correlation at 1.5° around the grid point and then calculates the decorrelation slope by assuming the correlation to be 1 at distance 0. From this slope, the distance at which the correlation should be 0 was estimated. This method results in decorrelation distances of 500-800 km over south-west Africa. This longer range is likely to be due to the use of 5-day, rather than daily, values. One limitation of using CCD instead of gauge measurements is that the decorrelation range in CCD is not equivalent to the decorrelation range in rainfall. In contrast to the other two papers, Moron et al. (2007) calculated the decorrelation distance between measured rainfall at stations instead of satellite grid points. This was calculated for five different tropical regions, to assess the generality of the results, on amount and occurrence data separately, using Pearson's correlation for amount data and the phi correlation for occurrence. One major limitation of this paper is the small number of stations in each region (9, 11, 13, 28 and 81), which results in very wide distance bins (100 km) and only a few station pairs in each bin.
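The two threshold-style estimators discussed above can be written compactly. The following is only a sketch: the symbols $\rho(d)$, $\bar{\rho}_{d_1}$, $d_e$ and $d_0$ are notation introduced here rather than taken from the cited papers, and the numbers in the last line are hypothetical values chosen purely for illustration.

```latex
% e-folding distance (first method of Ricciardulli and Sardeshmukh, 2002)
d_e = \min\{\, d : \rho(d) < 1/e \,\}

% linear-slope extrapolation (as described for Funk et al., 2015):
% assume \rho(0) = 1 and a mean correlation \bar{\rho}_{d_1} at distance d_1
\rho(d) \approx 1 - \frac{1 - \bar{\rho}_{d_1}}{d_1}\, d
\quad\Longrightarrow\quad
d_0 = \frac{d_1}{1 - \bar{\rho}_{d_1}}

% hypothetical illustration: d_1 = 1.5^\circ \approx 167\ \mathrm{km},\ \bar{\rho}_{d_1} = 0.75
d_0 \approx \frac{167}{0.25} \approx 668\ \mathrm{km}
```

Because the linear extrapolation runs all the way to zero correlation, it will generally give a longer distance than a 1/e threshold applied to the same correlation curve, which is worth keeping in mind when comparing the ranges quoted above.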
Another method for estimating the correlation distance is to derive a variogram. A variogram describes the variance structure between locations depending on the distance between them. In both Greatrex et al. (2014) and Teo and Grimes (2007), climatological variograms are calculated on rain gauge data to estimate the range of rainfall over Ethiopia and the Gambia. Both of these papers split the analysis into occurrence and positive rainfall amounts, similar to Moron et al. (2007). This is because the dependence structures in occurrence and amount have been shown not necessarily to be equal, and furthermore the total rainfall amount over some period is not entirely dependent on the frequency of rainfall events. Teo and Grimes (2007) face the same limitation as Moron et al. (2007) with only 20 stations, although these are distributed over a small area, resulting in a relatively dense network. The ranges for occurrence and positive rainfall amounts are 50 km and 150 km, hence substantially shorter than the ones estimated by Funk et al. (2015) but similar to Ricciardulli and Sardeshmukh (2002). Greatrex et al. (2014) have a much larger dataset of 276 stations but are limited to only 5 years of data, and the stations are very unevenly distributed over a country with complex topography. Many of the variograms in the paper do not have a clearly defined range and thus a clear correlation distance is difficult to determine. The difference in correlation distance can partly be explained by the use of different methods and data types, but some of it stems from the different countries studied, since the decorrelation range varies greatly across Africa (Funk et al., 2015).
Two common assumptions are that the rainfall distribution is equal for all rainfall events and that the distribution is equal in all directions. However, the recent paper of Maranan et al. (2018) showed that even though the vast majority of the annual amount of rainfall comes from mesoscale convective systems (MCSs), moderate and strong convection has the highest frequency of events. Many rainfall processes are moreover anisotropic, especially at the daily scale. There has been some work on anisotropy in Africa, but this has either been done only on very small areas (Gyasi-Agyei and Pegram, 2014; Ali et al., 2003) or using covariates to remove the spatial variability (Laux et al., 2009).
The results in this paper will help to close the knowledge gap between the statistics on a station level and on the large scale behaviour (∼400 km) by describing the covariance behaviour of daily rainfall at scales of 10-150 km. This analysis is made possible by a completely new and unique dataset from the Ghana Met Agency comprising 590 stations with daily rainfall measurements. Ghana is chosen as our study region due to this unique dataset in combination with its varying rainfall behaviour both in time and space due to the ITCZ. We will be using the method of conditional probabilities from Ricciardulli and Sardeshmukh (2002) because of the easily interpretable results and the possibility to establish a rainfall reference probability. The results will also provide a better understanding of the spatial behaviour of different intensities of rainfall events over west Africa by modelling their dependence structures separately. The final contribution is an anisotropic description of rainfall, showing the impact of large scale drivers on the local scale.
The remainder of the paper is organised as follows: an introduction of the study area, the dataset and the methods used to model the co-occurrence will be presented in Section 2, results on the rainfall climatology and the spatial distribution will be given in Section 3 and the paper will end with a discussion in Section 4.

Study area
Ghana is located on the Guinea coast and borders Burkina Faso, Côte d'Ivoire and Togo (Fig. 1). It has five distinct geographical areas: low plains in the south, the Volta Basin in the centre with the artificial lake 'Lake Volta', the Akwapim-Togo ranges to the east of the Volta Basin with many heights and folded strata, the Ashanti Uplands to the west and high plains in the north (Boateng et al., 2018). The temperature peaks around February-March and is at its lowest around August. The rainfall is mainly associated with the west African monsoon, which is controlled by the movement of the ITCZ. The country is under the influence of the tropical maritime air mass from March to October, during which the rainy season occurs. South of 8° N, there are two rainy seasons with a short drier period in August. North of 8° N only one long rainy season occurs (see Fig. 6). From November to February/March the country is affected by the prevailing southward winds, called the Harmattan, which bring dry and dusty air from the Sahara and give rise to the dry season. The majority of the rainfall is generated by convective clouds, with a higher contributing proportion in the north. The coast experiences, aside from convective rainfall, warm rain processes and advective rainfall from the Atlantic Ocean.

The dataset
The daily rainfall dataset used in this report is provided by the GMet (Ghana Meteorological Agency) and consists of daily rainfall amounts recorded at 590 stations covering all of Ghana, with a much higher density of stations in the southern half of the country. Extensive quality control was performed by the manuscript authors, along with a team of experts from the GMet. The dataset was assessed for errors in station locations, location shifts over time (over coastal data), erroneous data, the relationship with neighbouring stations, and erroneous statistics and outliers. Any data flagged in this process were then checked against the original written records and other sources such as Google Earth imagery for locations. Where data were clearly erroneous, the station (or a subset of its record) was removed. The original dataset consisted of 598 stations and 17,008,530 individual station-day data points. The quality control process led to a reduction of 1.55% of available data, with the controlled analysis using 590 stations and 16,744,082 individual data points. The dataset spans from 1940 until the end of 2017, with all the time series containing some missing values, although several of them have only a few missing values in the period 1950-2017. Fig. 2 shows the number of stations for each month with less than 10% missing values, which we will refer to as valid stations. Fig. 4 shows the number of valid stations, i.e. stations with less than 10% missing values, for each proportion of valid months in the dataset. There is a significant increase in the number of valid stations from 1950 (Fig. 2) and then a steep decrease during the 80s, similar to the station pattern found in the datasets used in Nicholson et al. (2018). Fig. 3 shows the distribution of the median number of daily reporting stations during the year with the most available stations (1976) and in 2017. The station density has decreased coherently across the whole country; however, the very sparse network in the north even during 1976 results in extremely few current stations. Due to the large increase in the number of valid stations in the 50s, our statistical analysis can be improved by only including data in the period 1950-2017, which still leaves us with longer records than most other studies and includes data both before and after the Sahelian drought in the 70s and early 80s (Brooks, 2004).

Rainfall climatology
Because of the missing values, some adjustment must be made to the annual and monthly amount totals to compensate for the missing data points. Instead of using data fill methods, such as replacing missing values with the mean, median or most frequent measurement, the measured annual or monthly amounts are adjusted with a parameter proportional to the number of missing values in that time period. This scaling method is favoured over the fill method since we are not attempting to fill in the gaps, but rather to compensate for the expected lower amount total due to the lower number of recorded days. For the monthly totals, each rainy day measurement is multiplied by $1/(1 - \xi_m)$, where $\xi_m$ is the proportion of missing observations within that month. Similarly for the annual totals, the total amount is multiplied by $1/(1 - \xi_y)$, where $\xi_y$ is the proportion of missing observations within that year. Only individual years with less than 20% missing values are used for these maps. This avoids including years in which the recorded days do not accurately represent the full year.
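A minimal sketch of this scaling is given below, assuming the correction factor is $1/(1 - \xi)$ with $\xi$ the proportion of missing days in the period. The function name `adjusted_total` and the use of NaN to mark missing days are choices made here for illustration. Scaling the period total is equivalent to scaling each rainy day measurement by the same factor.

```python
import numpy as np

def adjusted_total(daily_mm, max_missing=0.2):
    """Scale a period total by 1 / (1 - xi), xi = share of missing days.

    daily_mm: 1-D sequence of daily amounts for one month or year,
              with NaN marking a missing observation (assumed encoding).
    Returns NaN when the missing share is max_missing or more, mirroring
    the "less than 20% missing" rule used for the annual maps.
    """
    daily_mm = np.asarray(daily_mm, dtype=float)
    xi = np.isnan(daily_mm).mean()          # proportion of missing days
    if xi >= max_missing:
        return float("nan")
    return float(np.nansum(daily_mm) / (1.0 - xi))
```

For example, a 10-day record with one missing day and 30 mm measured gives an adjusted total of 30 / 0.9 ≈ 33.3 mm.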
The coefficient of variation is defined as $\mathrm{CV} = s_x / \bar{x}$, where $s_x$ is the standard deviation of daily rainfall ≥1 mm and $\bar{x}$ is the average rainfall over rainy days. This is calculated both for daily values and monthly aggregated values for each station with at least 23 years of data, only using the months outside of the dry season. For the estimation of the daily CV, values from all years are used, excluding missing values and amounts lower than 1 mm. For the monthly CV estimation, only months with ≤3 missing values are used.
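The daily CV described above can be sketched as follows. The function name `daily_cv`, the NaN encoding of missing values and the use of the sample standard deviation (`ddof=1`) are assumptions made here; the text does not state which standard deviation estimator was used.

```python
import numpy as np

def daily_cv(daily_mm, wet_threshold=1.0):
    """Coefficient of variation of rainy-day amounts (>= wet_threshold mm).

    Missing values (NaN) and amounts below the threshold are excluded,
    as described for the daily CV. ddof=1 is an assumption.
    """
    x = np.asarray(daily_mm, dtype=float)
    x = x[~np.isnan(x)]                     # drop missing observations
    wet = x[x >= wet_threshold]             # keep rainy days only
    if wet.size < 2:
        return float("nan")
    return float(wet.std(ddof=1) / wet.mean())
```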

Spatial variability in the occurrence of rainfall of varying intensity
The analysis is done separately for each month to remove some of the variability due to the different phases of the monsoon, and only using every 5th day to work with independent events (see Fig. S1 in the supplementary materials for autocorrelation plots). This is done because we are interested in modelling the spatial dependence and not the dependence in time. To reduce the noise from large differences in absolute amounts between nearby stations, correlograms are estimated from occurrences within amount intervals instead of covariograms on the measured amounts.
The amount intervals, hereafter denoted intensity classes, are defined as $S_1 = [1, 10)$ mm, $S_2 = [10, 30)$ mm, $S_3 = [30, 50)$ mm and $S_4 = [50, \infty)$ mm, representing low, moderate, heavy and very heavy rainfall. In order to study how the rainfall dependence changes with the intensity of the convective system, we calculate the proportion of rain-rain occurrences, hereafter denoted co-occurrence, both within an intensity class and between an intensity class and stations in higher or lower intensity classes. That is, we estimate the co-occurrence probability between stations that are within the same intensity class only, and in the setting with the origin station in one intensity class and the surrounding stations in the same or higher/lower intensity classes. Because only every 5th day is used, we end up with 408 independent time steps (68 years, 6 days per month except February) for each month. Only stations south of 8° N with less than 50% missing values in the period 1950-2017 have been used, which gives us 232 locations for our analysis. The reason for only using stations south of 8° N is two-fold. Firstly, by excluding the northern region with a single rainy season, all the stations will be in the same rainfall regime (rainy or non-rainy). Secondly, the dataset is much denser in the southern region, which provides us with more robust estimates. A threshold of 50% missing values is chosen as a trade-off between including stations with records only a few years long, which might skew the results, and discarding information. Since our method involves taking the average over a very large number of estimates of co-occurrences, we determined that stations with up to 50% missing values will not negatively impact the results.
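As an illustration of the class definitions above, the following sketch maps daily amounts to class labels with right-open intervals. The label 0 for dry or missing days, the name `intensity_class` and the NaN encoding of missing values are conventions introduced here.

```python
import numpy as np

# Right-open class edges in mm: S1=[1,10), S2=[10,30), S3=[30,50), S4=[50,inf)
EDGES = np.array([1.0, 10.0, 30.0, 50.0])

def intensity_class(amount_mm):
    """Return 0 for dry (< 1 mm) or missing days, otherwise 1..4 for S1..S4."""
    a = np.asarray(amount_mm, dtype=float)
    cls = np.digitize(a, EDGES)             # EDGES[i-1] <= a < EDGES[i] -> i
    return np.where(np.isnan(a), 0, cls)    # NaN would otherwise land in S4

# Number of thinned time steps per calendar month quoted in the text.
N_TIME_STEPS = 68 * 6                       # 68 years x 6 days = 408
```

Using `np.digitize` with its default `right=False` keeps the lower bound of each class inclusive and the upper bound exclusive, matching the interval notation.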
To model the spatial dependence structure of co-occurring rainfall events within an intensity class, the second method, based on conditional probabilities, in Ricciardulli and Sardeshmukh (2002) was used. The full algorithms are found in Appendix A and a summary is presented below.

Algorithm 1
1. For each unique day, transform all amounts that are within our chosen intensity class to a 1 and all other amounts to 0.
2. Choose one of the stations assigned a 1 to be the origin station and calculate the distance from this station to all other stations.
3. Within each 10 km distance bin, calculate the proportion of stations assigned a 1.
4. Calculate the proportion for all stations assigned a 1 in step 1 and repeat for each unique day.
By calculating the average in each distance bin, we can get a climatological average on the probability of observing rainfall of the same intensity as the origin station for a given distance.
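The summary of Algorithm 1 can be sketched in code as follows. This is an illustrative reading rather than the implementation in Appendix A: the function name, the array layout, the NaN encoding of missing values and the unweighted average over all origin-day combinations are assumptions made here.

```python
import numpy as np

def cooccurrence_by_distance(flags, dist_km, bin_km=10.0, max_km=150.0):
    """Mean share of flagged stations per distance bin around flagged origins.

    flags:   (n_days, n_stations) array, 1 = amount in the chosen class,
             0 = any other amount, NaN = missing (assumed encoding).
    dist_km: (n_stations, n_stations) matrix of pairwise distances in km.
    Returns one value per bin; bins that never contain a station are NaN.
    """
    flags = np.asarray(flags, dtype=float)
    n_bins = int(max_km // bin_km)
    bin_idx = (np.asarray(dist_km) // bin_km).astype(int)
    total = np.zeros(n_bins)
    count = np.zeros(n_bins)
    for day in flags:                                # each unique day
        valid = ~np.isnan(day)
        for origin in np.flatnonzero(day == 1):      # each flagged origin
            others = valid.copy()
            others[origin] = False                   # exclude the origin itself
            for b in range(n_bins):
                in_bin = others & (bin_idx[origin] == b)
                if in_bin.any():
                    total[b] += day[in_bin].mean()   # proportion of 1's in bin
                    count[b] += 1
    # Average over all origin-day combinations (equal weights, an assumption).
    return np.divide(total, count, out=np.full(n_bins, np.nan), where=count > 0)
```

The explicit loops favour readability over speed; with a few hundred stations and a few hundred time steps per month this remains tractable, but a vectorised version would be preferable for repeated runs.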
To model the dependence structure of co-occurring rainfall events between an intensity class and either lower or higher intensity classes, the above method is used with a few changes. Transform all amounts that are within our chosen intensity class to a 1, all amounts in any lower (higher) intensity class to a 2 and all other amounts to 0. Then calculate the proportion of 1's and 2's instead of just 1's. The rest of the algorithm is identical. Just as for measurements within an intensity class, by taking the average in each distance bin we get a climatological average of the probability of observing rainfall of the same or lower (higher) intensity as the origin station at a given distance.
To compare our co-occurrence probabilities with the climatology background state, a 2-step sampling method is used. A summary of the algorithm for calculating the background state within intensities is given below.

Algorithm 2
1. For each unique day, calculate the proportion of rainy stations.
2. Transform all amounts that are within our chosen intensity class to a 1 and all other amounts to 0.
3. Choose one of the stations that are assigned a 1 to be the origin station and calculate the distance from this station to all other stations.
4. Randomly assign rain or no rain to all other stations so that the proportion equals the measured proportion, and for each rainy station randomly assign a measured rainfall amount from that station in that unique month.
5. Assign a 1 to all stations in the chosen intensity class and calculate the proportion of 1's in each 10 km distance bin.
6. Repeat for all stations initially assigned a 1 and for each unique day.
For the calculation between intensity classes apply the same modification as described for Algorithm 1.
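A possible reading of the within-class version of Algorithm 2 is sketched below for the data of a single calendar month. Several details are assumptions made here and are not confirmed by the summary: the number of simulated rainy stations is the rounded product of the observed wet proportion and the number of other stations, amounts are resampled from each station's own wet-day record in the supplied array, and all names are illustrative.

```python
import numpy as np

def baseline_by_distance(amounts, dist_km, lo, hi, bin_km=10.0,
                         max_km=150.0, wet_mm=1.0, seed=0):
    """Random-sampling baseline for class [lo, hi) in one calendar month.

    amounts: (n_days, n_stations) daily amounts in mm, NaN = missing.
    dist_km: (n_stations, n_stations) pairwise distances in km.
    """
    rng = np.random.default_rng(seed)
    amounts = np.asarray(amounts, dtype=float)
    n_days, n_st = amounts.shape
    n_bins = int(max_km // bin_km)
    bin_idx = (np.asarray(dist_km) // bin_km).astype(int)
    valid = ~np.isnan(amounts)
    filled = np.nan_to_num(amounts, nan=0.0)
    wet = filled >= wet_mm
    in_class = (filled >= lo) & (filled < hi)
    # Pool of measured wet-day amounts per station for this month.
    pools = [filled[wet[:, j], j] for j in range(n_st)]
    total, count = np.zeros(n_bins), np.zeros(n_bins)
    for t in range(n_days):
        if not valid[t].any():
            continue
        p_wet = wet[t, valid[t]].mean()              # step 1: observed wet share
        for origin in np.flatnonzero(in_class[t]):   # steps 2-3: each origin
            others = np.flatnonzero(valid[t])
            others = others[others != origin]
            if others.size == 0:
                continue
            # Step 4: random wet/dry pattern, then resampled amounts.
            n_wet = int(round(p_wet * others.size))
            rainy = rng.choice(others, size=n_wet, replace=False)
            sim = np.zeros(n_st)
            for j in rainy:
                if pools[j].size:
                    sim[j] = rng.choice(pools[j])
            hit = (sim >= lo) & (sim < hi)           # step 5: simulated class flags
            bins_here = bin_idx[origin, others]
            for b in np.unique(bins_here):
                if b < n_bins:
                    total[b] += hit[others[bins_here == b]].mean()
                    count[b] += 1
    return np.divide(total, count, out=np.full(n_bins, np.nan), where=count > 0)
```

Subtracting this baseline from the measured co-occurrence curve gives the kind of anomaly curve that the comparison with the background state relies on.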
A schematic overview of the two algorithms can be found in Figs. B.15-B.17 in Appendix B.

Anisotropy in spatial rainfall variability
To estimate the variation in rainfall variability, semivariograms were derived on 2-, 3- and 5-day rainy aggregated amounts, using all intensities, to study how the small scale rainfall variability structure is influenced by large scale drivers.
In our setting, let $Z(s_i)$ represent the k-day aggregated rainfall amount at station $s_i$. Assuming that the mean and variance of $Z(s)$ are finite, the semivariogram is defined as the half mean squared difference
$$\gamma(s_i, s_j) = \frac{1}{2} E\left[\left(Z(s_i) - Z(s_j)\right)^2\right]. \tag{1}$$
Here we are going to assume that the underlying process $\{Z(s)\}_{s \geq 0}$ generating the k-day aggregated rainfall amount is intrinsically stationary and isotropic. This simply means that the mean is constant, i.e. $E(Z(s)) = \mu$, and that the semivariogram only depends on the distance between two locations and not the direction. Hence
$$\gamma(h) = \frac{1}{2} E\left[\left(Z(s_i) - Z(s_j)\right)^2\right], \quad h = \lVert s_i - s_j \rVert. \tag{2}$$
A natural method to estimate the semivariogram is the method-of-moments semivariogram, which essentially stems from replacing theoretical expectations with the analogous sample averages. The corresponding sample semivariogram to Eq. (1), notably Matheron's classical estimator (Matheron and Blondel, 1962), is defined as
$$\hat{\gamma}(h) = \frac{1}{2\,|N(h \pm \delta)|} \sum_{(i,j) \in N(h \pm \delta)} (z_i - z_j)^2, \tag{3}$$
where $z_i$ is the observed k-day aggregated rainfall amount at station $s_i$, $N(h \pm \delta)$ is the set of pairs within the spatial lag $h \pm \delta$ and $|N(h \pm \delta)|$ is the number of pairs in that distance range.
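As an illustration of Matheron's classical estimator described above, the following sketch computes the sample semivariogram for one k-day period. The function name, the NaN encoding of missing values and the use of each station pair once are choices made here for illustration.

```python
import numpy as np

def sample_semivariogram(z, dist_km, lags_km, delta_km):
    """Matheron estimator: half the mean squared difference over all
    station pairs whose separation lies within h +/- delta for each lag h.

    z:       (n_stations,) k-day aggregated amounts, NaN = missing.
    dist_km: (n_stations, n_stations) pairwise distances in km.
    """
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(z.size, k=1)              # each pair counted once
    ok = ~np.isnan(z[i]) & ~np.isnan(z[j])
    d = np.asarray(dist_km)[i[ok], j[ok]]
    sq = (z[i[ok]] - z[j[ok]]) ** 2
    gamma = np.full(len(lags_km), np.nan)
    for k, h in enumerate(lags_km):
        in_lag = np.abs(d - h) <= delta_km           # pairs in N(h +/- delta)
        if in_lag.any():
            gamma[k] = 0.5 * sq[in_lag].mean()
    return gamma
```

For three stations on a line at 0, 10 and 20 km with amounts 1, 3 and 7 mm, the two pairs at 10 km give a semivariance of 0.5 × (4 + 16) / 2 = 5 and the single pair at 20 km gives 0.5 × 36 = 18.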
The general shape of semivariograms can characterise and explain the dependence structure in terms of its three main indicators: nugget, range and sill. The nugget is the variance at the close-to-0 distance, representing the variability at distances of a couple of metres and measurement error. The nugget is expected to be small in rainfall data because of very high correlation at small distances. The sill is the value that the semivariogram converges to as the distance increases (dependence decreases) and the range is the distance at which two stations are independent and the sill is reached (or 95% of it if only approached asymptotically) (Cressie and Wikle, 2011). The maximum distance is set to 160 km since we are interested in modelling the small scale behaviour and due to the small area over which this is estimated.
The estimated semivariance values will be used to calculate the covariogram, defined as
$$C(h) = \sigma^2 - \gamma(h),$$
where $\sigma^2$ is the 0 distance variance. Because the variance is linear in the number of aggregated days, $C(h)$ is divided by $\sigma^2$ to enable us to compare the results from the different aggregation periods. By using the covariogram instead of the semivariogram, we can again compare our estimated values against the theoretical convergence value 0, which is reached when there is no dependence left.
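The conversion from semivariance to a normalised covariogram can be sketched as below, assuming the relation $C(h) = \sigma^2 - \gamma(h)$. The function name is illustrative, and how $\sigma^2$ is estimated is left to the caller.

```python
import numpy as np

def normalised_covariogram(gamma, sigma2):
    """Return C(h) / sigma^2 = 1 - gamma(h) / sigma^2.

    gamma:  semivariance values at a set of lags.
    sigma2: the zero-distance variance for the same aggregation period.
    The result is 1 where gamma is 0 and 0 once gamma reaches sigma2.
    """
    gamma = np.asarray(gamma, dtype=float)
    return (sigma2 - gamma) / sigma2
```

Dividing by $\sigma^2$ puts the 2-, 3- and 5-day curves on a common scale between 1 (full dependence) and 0 (no dependence left).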
For quantities with spatial dependence varying with direction, an anisotropic model must be applied instead of the isotropic model described in Eq. (2). Isotropic models, i.e. models only depending on distance and not direction, can be turned into anisotropic models by replacing the distance parameter $h$ with a distance vector $\mathbf{h}$, which is then associated with both a length and a direction. The bin $\mathbf{h} \pm \delta$ now represents both a distance range and an angular tolerance, e.g. all stations in a 45° segment.
If we have a dense station network and want more detailed information about the spatial distribution, we can estimate a covariogram map. A covariogram map is a lattice where each square represents a distance and an angle, and it is symmetric because of the square term in Eq. (2). To construct a covariogram map, select one of the stations and place a square lattice over the region with the chosen station in the centre. Calculate the difference in k-day aggregated amount between the selected station and all other stations. Calculate the average within each square of the lattice and apply Eq. (3). Repeat this for all stations and all k-day periods. After taking the average of all the individual maps, the resulting map describes the mean behaviour in all directions as we move away from a station, hence displaying directions with a stronger correlation.
A minimum threshold of 1000 pairs for each square is used to make the estimation more robust.
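The construction of such a map can be sketched as follows. This is a simplified variant, not the procedure as stated: instead of averaging individual per-station maps, it pools all station pairs and periods that fall in the same lattice square, and it returns the semivariance per square, which can then be converted to a covariogram. The function name, the projected km coordinates and the cell size are assumptions made here.

```python
import numpy as np

def semivariance_map(z, x_km, y_km, cell_km=10.0, half_width_km=160.0,
                     min_pairs=1000):
    """Directional semivariance on a square lattice centred on the origin.

    z:          (n_periods, n_stations) k-day aggregated amounts, NaN = missing.
    x_km, y_km: station coordinates in km (projected; an assumption).
    Returns (gamma, n_pairs); squares with fewer than min_pairs are NaN.
    """
    z = np.asarray(z, dtype=float)
    x_km = np.asarray(x_km, dtype=float)
    y_km = np.asarray(y_km, dtype=float)
    n_half = int(half_width_km // cell_km)
    n_cells = 2 * n_half + 1                       # odd, so the origin is central
    dx = x_km[None, :] - x_km[:, None]             # offset of station j from i
    dy = y_km[None, :] - y_km[:, None]
    col = np.rint(dx / cell_km).astype(int) + n_half
    row = np.rint(dy / cell_km).astype(int) + n_half
    inside = (col >= 0) & (col < n_cells) & (row >= 0) & (row < n_cells)
    np.fill_diagonal(inside, False)                # drop self-pairs
    sq_sum = np.zeros((n_cells, n_cells))
    n_pairs = np.zeros((n_cells, n_cells))
    for zt in z:                                   # each k-day period
        sq = (zt[None, :] - zt[:, None]) ** 2
        ok = inside & ~np.isnan(sq)
        np.add.at(sq_sum, (row[ok], col[ok]), sq[ok])
        np.add.at(n_pairs, (row[ok], col[ok]), 1)
    gamma = np.full((n_cells, n_cells), np.nan)
    enough = n_pairs >= min_pairs                  # robustness threshold
    gamma[enough] = 0.5 * sq_sum[enough] / n_pairs[enough]
    return gamma, n_pairs
```

Because every pair contributes once with offset (dx, dy) and once with (-dx, -dy), the resulting map is point-symmetric about the central square, consistent with the symmetry noted for the covariogram map.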

Climatology of rainfall in Ghana
To describe the climatology of the rainfall in each agro-ecological zone shown in Fig. 5 and defined by GMet (Owusu and Waylen, 2009), the four stations marked with red dots are used. Annual total amount time series, box and whisker plots of the monthly total amounts, maps of key rainfall estimates and maps of the coefficient of variation (CV) for each month not in the dry season are used to demonstrate this. A day is defined as rainy if the measured amount is ≥1 mm, as used by the Expert Team on Climate Change Detection and Indices (ETCCDI) and in several other papers (e.g. Moron et al., 2007; Sillmann et al., 2013). 30 mm is chosen as the threshold for heavy rainfall because this equals the 10-20% heaviest amounts on rainy days for most of the country.
Combining the information in Figs. 6 and 7, one can clearly see the reason for the partition, with the dry Coast region, the Forest region with a high proportion of rainy days, the Transition region with much fewer rainy days but still a bimodal rainy season and the North with only one rainy season. In some papers, the South-Western coastal region is classified as a separate region, which Fig. 6c confirms with the much higher proportion of days with heavy rainfall in that area. One can notice a diagonal band of higher proportion of rainy days from the SW coast up to Lake Volta, with a significantly drier region along the coast. This was noted already in Acheampong (1982) and was explained by the complex atmospheric interaction in that region. We can also see the decreasing trend in rainfall as we move northward in Fig. 6a, matching the movement of the ITCZ and a decreasing presence of advective rainfall.
In Fig. 7 we can clearly see the different rainfall modes, with a unimodal rainy season in the north zone and a bimodal one in the rest of the country. The difference between the bimodal regions is clearly visible, with both the major and the minor rainy season being of equal intensity in the Forest region whereas the Transition region has a slightly more intense minor season and the Coast has a much [...] A common feature for all regions is the large interannual variation in monthly and annual total amount. For all months during the rainy season, the range of the whiskers is around 300 mm and the mean rainfall is between 100 and 250 mm. June in Accra, which is at the peak of the rainy season, has the largest range, 450 mm, with a mean of 175 mm. There is also a common pattern of a slow increase in mean rainfall up until the peak of the rainy seasons and then a quick decrease as the ITCZ retracts.
Studying the total annual rainfall in Fig. 8, one can again see a very large interannual variation for all locations, with the biggest spread in the Forest and Transition regions, both of which have two intense rainy seasons. Despite Tamale only experiencing one rainy season, its mean annual rainfall is higher than that of Accra, due to the longer rainy season. Accra has the lowest mean annual rainfall of the four stations, at around 750 mm/year. Both Kintampo and Effiduase have a mean annual rainfall of around 1400 mm/year; however, Effiduase has a varying pattern, so the amount fluctuates over the studied time period. Accra has an interannual range of about 1000 mm, Tamale 800 mm and Kintampo and Effiduase nearly 1500 mm. In contrast to Owusu and Waylen (2009) but similar to Lacombe et al. (2012) and Torgbor et al. (2018), it does not appear that the Sahelian drought in the 70s and 80s had a big impact on the average annual amount in any of the regions, but there is a strong decrease in the variability at Tamale and Kintampo during this period.
There is a distinct pattern of lower CV over monthly aggregated values (Fig. 10) than daily values (Fig. 9), meaning that the fluctuations around the long term rainfall mean are much larger than the fluctuations around the monthly mean. This is expected due to the convective nature of the rainfall resulting in short intense storms. We can also see a more coherent pattern over space in the monthly compared to the daily values. The monthly CV values are however still large, with a mean value of 1 or higher for nearly all months, meaning that the variation around the long term mean is of the same magnitude as or larger than the mean. This is similar to the results in Arvind et al. (2017), which looked at monthly values in a region of India, but a lot higher than in Ayanlade et al. (2018), which studied a region in Nigeria. The most northern part of Ghana is under the influence of the Harmattan in March and November as seen in Fig. 7, hence the monthly average rainfall is very low, which inflates the CV value. For the daily values, we can see a larger spread during the monsoon phase (May-October), with the exception of June, which has one of the smallest spreads. This is because the higher mean in June lowers the CV estimate, even if the absolute variation is the same as in May and July. There are no obvious [...] The monthly values exhibit a different pattern. Outside the main monsoon in the north (October-April), there is a bimodal pattern with high CV values in the north and along the coast and lower values inland. May and June are very incoherent, but still with high values along the coast line. During the peak of the north rainy season (July-September), there is a clear gradient with high values in the south and low values in the north. There is also a West-East gradient in the southern part for all months except July-October, with lower values in the West where we have higher annual rainfall and more rainy days.

Spatial distribution of rainfall events
In the papers mentioned in the introduction, all rainfall events were assumed to have the same spatial variability. This need not be true, since we might expect a moderately intense rainfall event to have a larger extent than a low intensity rainfall event or an intense shower storm. In order to study this, the method of conditional probability curves will be applied to the intensity classes described in Section 2.4. By comparing our measured results with a random sampling baseline, we can separate the increased probability of rainfall at a station because of rainfall at nearby stations from the overall probability of rainfall due to the time of the year. This gives information about both the very local behaviour and the spatial extent for the different intensities. In the first section we will model the variability, placing the highest intensity class in the origin, and in the second section we will model the variability within each intensity class.

Extent of rainfall events
By modelling the conditional probability of observing rainfall amounts that are both higher (Fig. 11) and lower (Fig. S2 in the Supplementary material) than at the origin station, we can get information about both the extent of a large rainfall event and the probabilities of observing higher rainfall amounts close to low rainfall. By subtracting the reference rainfall probabilities from our measured co-occurrence probabilities, we can study how the anomalies change over the season.
The results shown in Fig. 11 suggest that low rainfall events are localised whilst heavy rainfall events have a larger spatial structure. In Fig. 11(a) and (c) we can see that the 5 km line very closely follows the baseline, showing that the small scale behaviour depends on the season. This would indicate that low rainfall events are very local, resulting in the co-occurrence probability depending on the overall probability of rain. The peak in August in (a) is most likely due to the significant difference in the rain intensity distribution in Accra. For all months except July and August, the histograms of rainfall amounts (not included) are very similar for the three southerly stations used in Section 3.1. However, in July and August there is a much higher probability of low intensity rainfall in Accra, and since there is a high station density along the coast, the results from that region have a higher weighting compared to those from further north. This strong connection between rainfall distribution and co-occurrence probability supports the idea of local low intensity events.
If we instead consider Fig. 11(e) and (g), the 5 km line is nearly constant over the year; hence the small scale behaviour of heavy events is independent of the overall rainfall probability. This suggests that heavier rain events have a larger spatial structure that dominates the area-wide behaviour. From the anomalies (right column of Fig. 11), we can see that the co-occurrence probabilities follow the baseline from about 50 km for moderately intense rainfall and from around 100 km for heavy and very heavy rainfall. This further supports the view that heavier rainfall has a larger spatial structure. The seasonal behaviour in the baseline comes from the changes in the probability of rainfall, presented in Table 1, which affect the large scale co-occurrence probability through a higher proportion of dry stations. Table 2 gives the number of time steps in which each intensity is observed and the total number of observations for each intensity. The ratio of these two gives the average number of stations observing a certain intensity, e.g. 30-50 mm, given that at least one station observes that same intensity. For very heavy rainfall, this ratio varies between 1:2 in the dry season and 1:6 in June. The average is 1:4 during the rainy season except in August, when it drops to 1:3. This indicates that during the dry season it is more common for just one heavy storm to occur, whereas in the rainy season there are on average 4 stations affected by very heavy storms on the same day. This can explain the peak in June in the baseline probability and the relatively constant behaviour during the rest of the rainy season except August.
Similarly, the peak in October for moderately intense rainfall (Fig. 11(c)) can be explained by an increase in this ratio from September to October, meaning that, given that it rains, there is a much higher probability of moderately intense rainfall at other locations in October than in September.
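The ratio read from Table 2 can be written down in a few lines. The sketch below assumes a boolean day-by-station array `in_band` (a hypothetical name) that marks where the daily amount falls in the chosen intensity class, restricted to the days of one month.

```python
import numpy as np


def stations_per_event_day(in_band):
    """Return (time steps with the intensity, total occurrences, their ratio).

    in_band: boolean array of shape (n_days, n_stations).
    """
    per_day = in_band.sum(axis=1)          # stations in the class on each day
    n_steps = int((per_day > 0).sum())     # days with at least one such station
    n_obs = int(per_day.sum())             # all station-day occurrences
    ratio = n_obs / n_steps if n_steps else float("nan")
    return n_steps, n_obs, ratio
```

A returned ratio of 4.0 corresponds to the 1:4 quoted in the text, i.e. four stations on average in the class on a day when at least one station is.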
From the right column of Fig. 11 we can see that the decorrelation distance increases with rainfall intensity, strengthening our claim. No formal statistical test has been applied to test the difference between the anomalies and 0, but by eye we can see that for low intensity rainfall the anomaly is only around 0.05 already at 50 km. For moderately intense rainfall, the same value is not reached until 100 km. Heavy rainfall exhibits a very similar range to moderate rainfall, except for a slightly larger peak in August. Very heavy rainfall has not fully converged, and therefore has not reached its decorrelation range, even at a distance of 150 km, demonstrating a large-scale impact on the rainfall probability.
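Since the convergence of the anomalies is judged by eye, one possible way to make such a reading reproducible is to report the shortest distance beyond which the anomaly stays within a tolerance. The sketch below is an assumption of this kind and not a method used in the study; the function name and the default tolerance of 0.05 (borrowed from the value quoted above) are illustrative only.

```python
import numpy as np


def convergence_distance(distances_km, anomalies, tol=0.05):
    """Shortest distance from which |anomaly| <= tol at all larger distances.

    Returns None if the anomaly is still above tol at the largest distance.
    """
    d = np.asarray(distances_km, dtype=float)
    a = np.abs(np.asarray(anomalies, dtype=float))
    order = np.argsort(d)
    d, a = d[order], a[order]
    below = a <= tol
    # stays_below[k] is True when every anomaly from index k outwards is <= tol.
    stays_below = np.flip(np.logical_and.accumulate(np.flip(below)))
    idx = np.flatnonzero(stays_below)
    return float(d[idx[0]]) if idx.size else None
```

With anomalies that decay monotonically this simply returns the first distance at or below the tolerance; a curve that dips below and rises again is not counted as converged until its final crossing.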

Spatial variability of rainfall occurrence
To better understand the spatial variability of rainfall occurrence for various intensities, and thereby the differences in spatial extent of areas with the same rainfall intensity, conditional probabilities for the separate intensity classes were calculated. The same method as in the previous section is used, which again enables us to compare the measured probabilities with the climatology.
Fig. 12 confirms the pattern in Fig. 11: the co-occurrence probability for heavy and very heavy rainfall does not vary much with the season, even at long distances, whereas low and moderate rainfall have a strong seasonal pattern from small to large scales. The seasonal pattern in the baseline for low and moderate rainfall again highlights the varying overall probability during the monsoon season.
At all distances, there is a peak in the probability in August for low (brown) intensity but a trough for all other intensities. This is most likely explained by the higher frequency of small rainfall events compared to other intensities in the short dry season, which increases the probability of co-occurring low intensity events and decreases that of the other intensities. The climatological co-occurrence probabilities of low and moderately (blue) intense rainfall are, however, close to identical during the build-up phase, March-May. For moderately intense rainfall, the overall probability increases until June, decreases during July and August and then increases again. For heavy (green) and very heavy (orange) rainfall, the climatological probability is relatively constant over time. Low intensity rainfall has nearly converged to the climatology at 100 km, whereas moderately intense rainfall has converged at 150 km. Hence persistent moderately intense rainfall has a larger extent than persistent low intensity rainfall. Neither heavy nor very heavy rainfall has converged, indicating that rainfall events that persistently release more than 30 mm of rain have a spatial dependence even at 150 km.

Spatial shape of rainfall events
To study whether large scale drivers such as the African easterly jet can also be seen at the small scale, we estimated covariogram maps using 2-, 3- and 5-day aggregated June data. Since the aim here is to look at the potential anisotropic pattern generated by all rainfall, we now include all days without splitting them into intensity classes. We know from Fig. 6(a) that there is a strong rainfall gradient in the NW-SE direction in the annual amount, but we want to see whether the spatial variability pattern differs when looking at accumulations over only a few days. Because of the significantly larger mean and variance in the south-west corner (Fig. 13), the following analysis is restricted to the region indicated in Fig. 13(b), so that the assumption of equal mean and variance over the entire region is reasonable. This difference in mean and variance did not affect our previous results, since we worked with intensity occurrence instead of amounts. Covariogram maps were also estimated over the coastal region, but are excluded because of their very noisy pattern.
The covariogram maps in Fig. 14 are very similar for all aggregation periods, with some small differences. One can clearly see a longer correlation distance in the E-W direction than in the N-S direction for all aggregation periods. There is, however, no clear difference between the NE-SW and NW-SE directions. Hence even at a 2-day scale we can see the pattern of predominantly westward propagating convective systems. The correlation drops off very rapidly, to around 0.5 just 20 km away, and as we increase the aggregation period the correlation increases coherently in all directions. The area with a correlation of 0.25, however, extends much further in the E-W direction than in the N-S direction, demonstrating an increased anisotropic pattern for longer aggregation periods. Even at 160 km there is some correlation, which, as in the previous results, is due to the climatology. The higher correlation in the E-W direction is not an artefact of the region being wider than it is long, which was checked by repeating the calculation over a square region.
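A covariogram map of this kind can be sketched in Python as follows. The code is an illustration under stated assumptions and not the procedure used to produce Fig. 14: it sums non-overlapping k-day blocks, removes each station's own mean, and averages the pairwise covariances on a grid of eastward and northward separations with 20 km squares. The function names and the coordinate convention (km east and north) are hypothetical.

```python
import numpy as np


def aggregate_days(daily, k):
    """Sum consecutive, non-overlapping k-day blocks (trailing days dropped).

    daily: array of shape (n_days, n_stations).
    """
    n_blocks = daily.shape[0] // k
    trimmed = daily[: n_blocks * k]
    return trimmed.reshape(n_blocks, k, daily.shape[1]).sum(axis=1)


def covariogram_map(values, coords, bin_km=20.0, max_km=160.0):
    """Average station-pair covariance in (northward, eastward) lag bins.

    values: (n_steps, n_stations) aggregated rainfall.
    coords: (n_stations, 2) station positions as (east km, north km).
    Returns the map, indexed [north bin, east bin], and the bin edges.
    """
    anomalies = values - values.mean(axis=0)
    # ASSUMPTION: population covariance (divide by n) between all station pairs.
    cov = anomalies.T @ anomalies / values.shape[0]
    dx = coords[:, 0][None, :] - coords[:, 0][:, None]   # eastward separation
    dy = coords[:, 1][None, :] - coords[:, 1][:, None]   # northward separation
    # Bins are centred on zero lag, so the middle bin covers [-10, 10) km.
    edges = np.arange(-max_km - bin_km / 2, max_km + bin_km, bin_km)
    n = len(edges) - 1
    ix = np.digitize(dx, edges) - 1
    iy = np.digitize(dy, edges) - 1
    valid = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    np.fill_diagonal(valid, False)                        # drop self pairs
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    np.add.at(total, (iy[valid], ix[valid]), cov[valid])
    np.add.at(count, (iy[valid], ix[valid]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        cmap = np.where(count > 0, total / count, np.nan)  # NaN where no pairs
    return cmap, edges
```

Because every station pair enters once in each direction, the map is symmetric under point reflection through the centre bin. An E-W elongation like the one described above would appear as larger values along the middle row than along the middle column.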

Discussion and conclusion
Estimating and predicting rainfall over west Africa will probably remain a difficult task for some years ahead due to the sparse and degrading rain gauge network. In this paper, we have provided some insights into the spatial behaviour of daily rainfall within Ghana. In contrast to previous studies, we have not assumed that all rainfall events have the same spatial structure, but instead studied the rainfall events split into four different intensity classes to understand differences in the co-occurrence structure at small scales over the season. We have shown that the conditional probability of observing rainfall of the same intensity varies seasonally for low and moderately intense rainfall, but not for heavy and very heavy rainfall. This might partly be explained by changes in the proportion of rainfall events in each intensity class over the season; however, the same pattern is observed when treating all lower intensities as occurrences. This shows that heavy rainfall events have a stronger influence at the local scale and are therefore not affected by the overall probability of rainfall, whereas low and moderate rainfall are much more localised. The anomaly structure, on the other hand, is very similar for all intensities except low, demonstrating a different structure in drizzle events compared to other rainfall events.

Fig. 12. Seasonal evolution of the conditional probability at different distances for stations south of 8° N, using Algorithm 1 in Section 2.3. The solid lines are the probabilities from the original dataset and the dashed lines the probabilities from the random sampling method. The rain-rain occurrence is 1 if both the distant station and the origin station are in the given intensity class.
Our results show a decorrelation distance similar to the one obtained by Ricciardulli and Sardeshmukh (2002) and to the range for positive amounts in Teo and Grimes (2007), but about three times longer than their occurrence range. This difference probably results from several factors, such as the scale difference between gridded data and station data, the density of the dataset and the method used. A non-negligible part most likely also comes from the use of different regions. Because of the complex atmospheric systems over central Africa, the rainfall structure varies greatly, as shown in Funk et al. (2015). This makes it difficult to directly compare large scale estimates, such as correlation ranges, with previous studies. The small scale estimates, on the other hand, which mostly depend on the rainfall system and not on the current rainfall state, could be more comparable, assuming the convective systems are similar across the Tropics. It would, however, be difficult to compare with European studies, since the rainfall there mostly comes from advective systems.
For the spatial shape of rainfall events, even at the small scale it is possible to see the influence of large scale drivers such as the African easterly jet. As we increase the accumulation period, the covariance range increases in the E-W direction, which is the direction of stronger covariance. The pattern of stronger correlation at longer accumulation periods was also noted by Bacchi and Kottegoda (1995) and can be explained by a decreased dependence on the individual rainfall events, which are local scale events, and an increased dependence on the large scale drivers, which usually affect a whole region. The results in this paper demonstrate the issues with describing all rainfall events with the same correlation structure, but also that we can assume isotropy for short accumulation periods. We hope that this method will be applied in other regions, since it is easy to adapt by changing the intensity classes to levels suitable for the country and the results from different studies can be directly compared. It would be especially interesting to see how this compares to other tropical regions where we might expect to see the same type of rainfall systems but with a different occurrence distribution compared to our study region.

Fig. B.16. Schematic figure of the method to calculate the spatial co-occurrence dependence between lower (higher) intensity bands. The green stations are within the chosen intensity band and assigned a 1, the orange stations are in the lower (higher) intensity bands and assigned a 2, and the black stations are of other amounts and assigned a 0. The pink dot is the station chosen as the origin station. Steps 2 and 3 are repeated for each green station in step 1 and steps 1-3 are repeated for all 408 unique days.

Fig. B.17. Schematic figure of the method to calculate the spatial co-occurrence dependence within an intensity band. The green stations are within the chosen intensity band and assigned a 1 and the black stations are of other amounts and assigned a 0. The pink dot is the station chosen as the origin station. Steps 2 and 3 are repeated for each green station in step 1 and steps 1-3 are repeated for all 408 unique days.

Fig. 1. Map highlighting the location of Ghana on the African continent.

Fig. 2. Temporal evolution of the number of stations with less than 10% missing values per month. Each vertical line marks the beginning of a year. There are 590 stations in total.

Fig. 4. Absolute frequencies of stations against the proportion of valid months, i.e. months with less than 10% missing values.There are 590 stations and 936 months in total.

Fig. 5. Map of Ghana showing the four agro-ecological zones defined by GMet (Owusu and Waylen, 2009) and the location of the 100 stations with the least number of missing values.The red stations are the stations used in the following climatology analysis and the enclosed region is used for the rest of the analysis.

Fig. 6. Maps of Ghana. (a) Average annual total amount, (b) the distribution of the proportion of rainy days (≥1 mm) and (c) the proportion of heavy rainfall days (≥30 mm). (a) and (c) use only stations with at least 23 years of data. Note the different scales.

Fig. 7. Box and whisker plots of the distribution of the total rainfall in each month. The locations of the stations are displayed in Fig. 5 (red dots). Months with any missing values have been removed and the most extreme outliers are excluded.

Fig. 8. Time series over the full annual total amount for one station in each agro-ecological zone.Gaps in the time series are years with more than 20% missing values.

Fig. 9. Maps of the distribution of coefficient of variation for daily values per month.Note the different scales on the scale bars.

Fig. 10. Maps of the distribution of coefficient of variation for monthly aggregated values per month. Note the different scales on the scale bars.

Fig. 11. Seasonal evolution of the conditional occurrence probability for stations south of 8° N. The left column shows the raw co-occurrence probabilities and the right the anomalies from the baseline. The solid lines are distances away from the origin and the dashed line is the random sampling baseline at 50 km in the left column and 0 in the right. The intensity bands are as described in Section 2.3. The rain-rain occurrence is 1 if the distant station is in the same or lower intensity class (see Algorithm 2 in Section 2.3). Note the different scales on the y-axis in the left column.

Fig. 13. Distribution maps in June over Ghana. The maps show (a) the mean of 5-day aggregated values and (b) the variance of 5-day aggregated values together with the region used to estimate the covariogram maps.

Fig. 14. Covariogram maps over the mid region in June. Each square is the average covariance value in that distance and direction bin and each distance bin is 20 km. The panels show (a) 2-day, (b) 3-day and (c) 5-day accumulated daily data.

Table 1
Probability of rain of ≥1 mm on any day in each month.

Table 2
Top row is the number of time steps with at least one station in the given intensity and the bottom row is the total number of occurrences in the given intensity. The maximum number of time steps is 408 and the maximum number of occurrences is 408 × 232.