Blending high-resolution satellite rainfall estimates over urban catchment using Bayesian Model Averaging approach

Study region: Akaki is a headwater catchment of the Awash River Basin that hosts the capital city of Ethiopia, Addis Ababa. The area encompasses several agglomerated towns, water supply and hydropower reservoirs, and is characterized by a chain of mountains and floodplains. Due to basin rainfall and the expansion of urbanized areas, the catchment is frequently affected by flooding.
Study focus: This study evaluates a dynamic Bayesian Model Averaging (BMA) approach to improve rainfall estimation over the catchment by blending four high-resolution satellite rainfall estimate (SRE) products. Using daily data (2003-2019) observed at thirteen stations as a reference, seven statistical metrics served to assess the point and spatial scale accuracy of the rainfall estimates.
New hydrological insights: Main findings from this study are: (i) the blended product outperformed the individual SRE products by notably improving correlation with in-situ observed rainfall and reducing the error of the estimated rainfall, (ii) the blended and individual SRE products performed better in the highlands than in the lowlands of the catchment, and (iii) the amount of daily rainfall during the main-rainy season was mostly overestimated by the individual SRE products but was fairly estimated by the blended product. This study showed that no individual SRE product surpasses the others and emphasized the blending of several products to gain the best from each product.


Introduction
Quality of rainfall data in terms of accuracy and reliability plays an important role in water, climate, and environmental studies and applications. Across Africa in particular, access to quality data is challenging, mainly due to a lack of support (i.e., technical, financial, and administrative) and political instabilities (World Bank, 2012; WMO, 2020) that often cause disruptions in the collection of data. Over the past three decades, the number of rain gauge stations fell from 50 to below 10 in the Democratic Republic of Congo (Washington et al., 2013) and from 400 to fewer than 50 in Madagascar (Dinku, 2019). Even though 3000 stations are supposed to provide reliable time series across the African continent, only 744 are installed and only 25% of them are up to the required standard (Satgé et al., 2020). Therefore, to complement the existing rain gauge observations, rainfall estimates by satellite products have been advocated as an alternative data source to fill the rainfall data gaps (e.g., Koriche and Rientjes, 2016; Dembélé et al., 2020; Dosio et al., 2021).
Earlier studies that evaluated the performance of satellite rainfall estimate (SRE) products over different parts of Africa yielded highly discrepant outcomes across topographic settings, climatic zones, rainfall intensities, seasons, and types of sensors. Gebremicael et al. (2019) showed that CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) exhibited adequate rainfall estimation performance over mountainous parts of northern Ethiopia as compared to other products. In contrast, Ayehu et al. (2018), Gebrechorkos et al. (2018), and Belete et al. (2020) compared several products and reported good performance of CHIRPS in different parts of Eastern Africa, irrespective of topographic variability. Gebere et al. (2015) compared three products and revealed better performance of GSMaP (Global Satellite Mapping of Precipitation) and PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) in flat areas than in mountainous areas of eastern Ethiopia. The effect of topographic features on the performance of three SRE products is reported by a study conducted in two lake catchments in the Nile River Basin (Haile et al., 2013). The study revealed that CMORPH (Climate Prediction Center MORPHing technique) and TRMM (Tropical Rainfall Measuring Mission) products underestimated hourly rainfall over the lakes and nearby landscapes (i.e., shores and islands), whereas the products overestimated rainfall in mountainous areas. Ayugi et al. (2019) showed that the accuracy of SRE products can be related to climate zones, because the study reveals good performance of CHIRPS, PERSIANN, and ARC (Africa Rainfall Climatology) over arid/semi-arid, humid, and highland areas of Kenya, respectively. According to Haile et al. (2010) and Mekonnen et al. (2021), the performance of SRE products differs for respective rainfall rates. The study considered seven SRE products over the Upper Awash basin of Ethiopia and showed better performance of the products in estimating rainfall rates < 10 mm/day than rainfall rates ≥ 10 mm/day. A study conducted over an arid region of Egypt showed satisfactory performance of CHIRPS in estimating rainfall at intensity < 1 mm/day, but ARC and GSMaP performed well for rainfall intensity ≥ 1 mm/day (Nashwan et al., 2020). The influence of season on the accuracy of SRE products is underlined in several studies. A study conducted over the Lake Tana basin of Ethiopia by Fenta et al. (2018) indicated good performance of TAMSAT (Tropical Applications of Meteorology using SATellite data and ground-based observations) during the main rainy season and of CHIRPS in the short rainy season, as compared to the dry season. Gebere et al. (2015) showed that PERSIANN captured accumulated rainfall well during the dry season, whereas TRMM performed well during the short rainy and long rainy seasons over eastern Ethiopia. The type of sensor is another factor that affects the performance of SRE products over a specific region. Studies conducted in different parts of Ethiopia with highland topography indicated a consistently good performance of microwave sensor-based products such as CMORPH and infrared sensor-based products such as PERSIANN in estimating daily rainfall at higher and lower altitudes, respectively (Romilly and Gebremichael, 2011; Mekonnen et al., 2021). In general, this review indicates that no single SRE product consistently outperforms the other products (e.g., Maggioni et al., 2016; Le Coz and van de Giesen, 2020).
To overcome the limitations of using a single SRE product, combining information from gauged measurements and multiple SRE products based on different sensors has become an emerging approach, widely known as 'blending' (e.g., Beck et al., 2017; Zhou et al., 2021). Blending involves the optimal use of satellite sensors as well as satellite rainfall products to provide optimal rainfall estimates, where gauged rainfall serves as a reference to train the estimation algorithm. When multiple satellite rainfall products are used, defining a proportional, optimal, and uncertainty-disfavoring weightage for each SRE product is essential to create a blended rainfall estimate that better matches the gauged rainfall. The principle of blending is that the blended product is the outcome of the simultaneous weighing of the individual SRE products that contribute to the blended estimate, for an optimum match with rain gauge observations. For each time instant, the weights of the individual SRE products sum to 1, but the weights may change from one time instant to the next to seek optimum performance of the blending algorithm.
Blending approaches can be broadly categorized as either geo-statistical or non-parametric. Geo-statistical approaches consist of classical, relatively simple, and commonly applied merging approaches such as Kriging-based interpolations (e.g., Chappell et al., 2013) and Geographically Weighted Regression (e.g., Chao et al., 2018). Limitations of such approaches relate to their underlying assumptions, which consider only stationary and Gaussian types of data collected from well-distributed and dense rain gauge networks (Erdin et al., 2012; Shi and Wang, 2021). Non-parametric approaches encompass data-driven machine-learning approaches such as Kernel Smoothing (e.g., Long et al., 2016), Quantile Regression Forests (e.g., Bhuiyan et al., 2018), Random Forest (e.g., Baez-Villanueva et al., 2020) and Artificial Neural Networks (e.g., Hong et al., 2021). Among such approaches, Bayesian Model Averaging (BMA) recently emerged and proved to be reliable, robust, and stable in its performance (see Ma et al., 2021; Yumnam et al., 2022). The BMA approach applies an optimal weightage for individual SRE products by simultaneously computing weightages at each time step and at the location of each gauging station. Hu et al. (2019) and Ochoa-Rodriguez et al. (2019) provide detailed descriptions of the differences between blending approaches.
Studies that focused on blending rainfall products have exhibited several shortcomings related to their methods, materials, and outcomes. Studies often used SRE products with relatively coarse spatial (i.e., usually 0.1° × 0.1° to 0.25° × 0.25°; e.g., Kumar et al., 2019) or temporal (i.e., usually monthly; e.g., Woldemeskel et al., 2013; Chua et al., 2022) resolutions to produce blended rainfall products at similarly coarse resolutions for short periods of 1 to 5 years (see Zhang et al., 2021; Zhou et al., 2021). Evaluation of blended rainfall products has often been carried out for large river basins or regions spanning international boundaries, with areas of half a million km² and beyond (e.g., Shen et al., 2019; Rahman and Shang, 2020). As a result, blending high-resolution SRE products (i.e., < 0.1° × 0.1°) at smaller spatial scales (i.e., < 2000 km²) for extended periods is rare, but is warranted if SREs are to serve local scale hydrological assessments such as flash-flood hazard modelling or regional scale rainfall-runoff simulations. Some algorithms, in particular those categorized under geo-statistical approaches, apply static weightages that only vary with grid cells (e.g., Li and Shao, 2010; Verdin et al., 2016). Reference data for some SRE products that use rain gauge observations to improve the rainfall estimates rely on global gridded rainfall data from databases such as the Climate Research Unit (CRU) and the Global Precipitation Climatology Centre (GPCC). These databases often encompass unevenly distributed and less representative rain gauge stations across Africa (Nikulin et al., 2012; Eklund et al., 2016). As a result, the accuracy of rainfall products that use gridded data as a reference can be lower than that of some individual SRE products. For instance, the CHIRPS SRE product (Funk et al., 2015) consistently performed better than the Multi-Source Weighted-Ensemble Precipitation (MSWEP; Beck et al., 2017) rainfall product across continental Africa (Awange et al., 2019) and its river basins located in Ethiopia (Taye et al., 2020) and Kenya (Omonge et al., 2022). Although the performance of SRE products in capturing rainfall varies, many studies merged a single SRE product with gauged data (e.g., Teng et al., 2017; Lu et al., 2020) rather than incorporating multiple products.
The overall aim of this study is to generate a high-quality gridded rainfall estimation dataset to best capture spatiotemporal patterns and the magnitude of rainfall over a highly urbanized catchment. To generate the dataset, four SRE products (i.e., CHIRPS, CMORPH, PERSIANN, and TAMSAT) were blended using the BMA framework. The SRE products were selected because of their data accessibility over the study area at high spatial resolution (i.e., ~8 km × 8 km or finer), their availability at a daily time step, and their long overlapping period (2003-2019). Using seven statistical performance indicators, this study: (i) evaluates the quality of the blended rainfall product and individual SRE products against daily gauged data that serve as a reference, and (ii) examines the performance of rainfall estimates in representing spatial variability of monthly, seasonal and annual rainfall using visual and statistical comparisons.

Study area
The area of study is the Akaki catchment, which is located in the central part of Ethiopia. Akaki catchment is one of the headwater catchments of the Awash River Basin as part of the Great East African Rift Valley System. The catchment is geographically situated between 8.76°N-9.22°N and 38.57°E-39.07°E and has an elevation that ranges between 2000 m and 3400 m (Fig. 1). Akaki catchment has a size of about 1500 km² at the Aba Samuel hydropower dam, which is the hydrological outlet of the catchment. The catchment has a shared drainage boundary with two upland sub-basins of the Upper Blue Nile Basin, the Guder and Muger catchments, in the northwestern and northeastern parts, respectively. The area is targeted for this study because it has relatively good rainfall data from a network of gauging stations, it has complex land cover and topography, and it often experiences significant rainfall variability. Akaki catchment hosts Addis Ababa (i.e., the capital city of Ethiopia) and many small agglomerated towns. In addition to the urban land, agriculture and forest land covers are of significant extent.
Akaki catchment experiences a unimodal rainfall pattern with three distinct seasons (Mengistu et al., 2019; Shawul and Chakma, 2020). These seasons are (i) the short-rainy season (Belg) from February to May, (ii) the main-rainy season (Kiremt) from June to September, and (iii) the dry season (Bega) from October to December and January of the subsequent year. This seasonal variability is mainly governed by the movement of the Inter-tropical Convergence Zone (ITCZ; Knoche et al., 2014; Jin et al., 2021). Fig. 2 illustrates the three seasons using long-term daily rainfall distribution at five selected stations in the catchment.
The outer periphery of the Akaki catchment is surrounded by mountainous ridges including Mount Intoto in the northern, Mount Wechecha in the western, Mount Furi in the south-western, Mount Berek (Rufi) in the north-eastern and Mount Erer in the south-eastern directions. Malby et al. (2007) and Napoli et al. (2019) describe how mountains contribute markedly to the formation of rainfall over a catchment by creating orographic clouds that result in frequent rainfall. As a result, stations near the mountain chains in the north (i.e., Mount Intoto and Berek) received the highest annual rainfall (1100-1250 mm). The lowest annual rainfall (i.e., 670-900 mm) is recorded at the stations in the southern part of the catchment, which is mainly characterized by flat topography at lower elevations. In general, the spatial rainfall pattern over the Akaki catchment is topography related, and rainfall gradually decreases from the mountainous areas in the northern part of the catchment to the low-lying and deprived area in the south.
As a headwater catchment, the Akaki river system is the main contributor to the Awash River, which serves irrigation activities but is also known to cause devastating floods. Two major river systems, Big Akaki and Little Akaki, which flow from the eastern and western escarpments of the catchment, respectively, originate from the mountains and flow through the Akaki catchment to ultimately drain into the Aba Samuel hydropower reservoir. Three water supply reservoirs, Gefersa, Dire, and Legedadi, are built on these river networks. Since the mountainous and urbanized catchment is frequently exposed to extreme rainfall events, flash floods are frequent (Adugna et al., 2019; Jemberie and Melesse, 2021; Bekele et al., 2022). Details of the stations are provided in Table 1.

Data from rainfall gauging stations
In this study, daily rainfall data (2003-2019) were obtained from the Ethiopian Meteorology Institute (http://www.ethiomet.gov.et/), which is responsible for monitoring the stations and managing the recordings. The gauging network comprised twenty-six stations, of which thirteen were selected for further use (Table 1). The other thirteen stations were discarded because of their short recording span or a high number of missing values. Among the selected stations, two (i.e., CNO and SLT) are located in the Upper Blue Nile Basin and the remaining are located in the Awash Basin. Table 1 shows the location coordinates and elevation of the selected stations with their corresponding name and distinct identifier code as used in this study. Missing data for all stations is less than 17% except for SDF. Since the SDF station is the only station that captured the rainfall near the mountain range of Mount Berek, the station is not discarded.

Data from satellite rainfall estimates products
For this study, four microwave and infrared sensor-based SRE products were selected (Table 2). Three products provide satellite-sensor-derived estimates only, and one of the products incorporates information from a low-density network of rain gauges that was made available to the product developers. The products provide data covering 18 to 40 years at daily or smaller time steps with relatively high spatial resolutions (i.e., ~8 km × 8 km or finer). Data from 2003 to 2019 is used in this study because for this period data is available for all SRE products as well as for all selected gauging stations. The SRE products can briefly be described as follows.
1. CHIRP stands for Climate Hazards group InfraRed Precipitation product that is developed by the US Geological Survey (USGS) and the Climate Hazards Group at the University of California (Funk et al., 2015). Data sources used to provide rainfall estimates include two global geosynchronous thermal infrared satellite observations, namely the Globally Gridded Satellite (GridSat) and Climate Prediction Center (CPC) datasets, the monthly precipitation climatology (CHPclim) data, the Cold Cloud Duration (CCD) information based on thermal infrared data, and the TMPA 3B42 precipitation data (Funk et al., 2015). The product provides precipitation data with a precision of 0.1 mm at quasi-global coverage of 50 [...]
2. CMORPH (Climate Prediction Center MORPHing technique) [...] This product uses passive microwave observations from the sensors of multiple low-orbit satellites for estimating precipitation (Joyce et al., 2004). Infrared Radiation (IR) observations from multiple geostationary satellites are only used for interpolating rainfall intensity fields from consecutive microwave sensor data but are not applied in the process of estimating the rainfall. The CMORPH precipitation product has quasi-global coverage of 60 [...]
3. PERSIANN-CCS (hereinafter referred to as PERSIANN) is an infrared radiation-based satellite rainfall estimate product with a full nomenclature of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) Cloud Classification System (CCS) (Hong et al., 2007). Cloud imagery collected from several infrared geostationary satellites is used to develop the product by relating cloud-top brightness temperature with rainfall rates. The product is available with 1 mm precision for the areas between 60°N and 60°S and can be accessed from http://chrs.web.uci.edu/ in NetCDF file format.
4. TAMSAT stands for Tropical Applications of Meteorology using SATellite, which is a rainfall estimate product over Africa developed by the University of Reading (Tarnavsky et al., 2014). The product is developed by considering Meteosat thermal infrared images derived from cold cloud-top temperatures for identifying rain-inducing clouds, and observed station data are used for the calibration of the TAMSAT rainfall estimation algorithm (Maidment et al., 2017). A recent version of the product (i.e., TAMSAT v3.1), which is used in this study, is available with a precision of 0.1 mm on https://www.tamsat.org.uk/ in the NetCDF file format.

Methodology
The methodological approach adopted in this study is illustrated in Fig. 3. Salient steps include preprocessing datasets, blending SRE products, and performance evaluation of the estimated rainfall dataset over the Akaki catchment. Detailed descriptions of these steps are provided in the respective sections below.

Pre-processing of gauged rainfall data
To verify the reliability and usability of the data, gauged daily rainfall data from thirteen stations were tested for consistency and possible errors. For each month, a box and whisker plot served to statistically identify outliers using daily rainfall data from thirteen stations (Fig. 4). While the boxes represented the quartiles (25th and 75th percentiles) and the median (50th percentile), the whiskers above the 75th (i.e., upper whisker) or below the 25th (i.e., lower whisker) percentiles indicated values within 1.5 times the inter-quartile range. In this study, rainfall recordings that are lower than the lower whiskers (i.e., < 0.5 mm) were ignored because of their irrelevance to the research aims of this study. Recordings higher than the upper whiskers were only nine in number and occurred at a few stations. After visual inspection, these recordings were excluded from further analysis as the values may result from erroneous recording by observers, or from typing errors during digitization.
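The 1.5 × inter-quartile-range rule behind the whiskers can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function names and the sample values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which one was applied.

```python
import statistics


def whisker_fences(values, k=1.5):
    """Return the (lower, upper) limits at k times the inter-quartile range."""
    # Assumption: quartiles follow the "inclusive" convention.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the whisker limits."""
    lower, upper = whisker_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily rainfall values (mm) for one month across stations.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(flag_outliers(sample))
```

Values flagged this way would still need the visual inspection described above before being excluded.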
Time series of gauged data were also evaluated by applying tests for homogeneity and stationarity to verify the existence of break points and abrupt changes, respectively. A homogeneity test served to identify station data with long-term systematic shifts due to factors such as relocation of the gauging site, and changes in measuring instrument and technique of measurement. This test was performed using the transPMFred algorithm as implemented in the R environment-based RHtests package (Wang and Feng, 2013). At a 95% nominal level of confidence, the result showed that daily rainfall time series at twelve stations have no or insignificant change points and thus the data is considered homogeneous. The time series at one of the stations (i.e., CNO) shows a significant shift that was corrected using the Quantile-Matching adjustments method available within the RHtests package. Stationarity of the data at all stations is verified by the widely used Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) Unit Root tests in the R environment. As suggested by Longman et al. (2020), missing values within the quality-controlled rainfall data were filled in before interpolation using the daily time scale-based linear regression (LR) method (see Annex 1). The method is selected as the interstation distances are relatively short in the small-scale Akaki catchment.

Pre-processing of SRE products
The SRE products used in this study were available at various time intervals (i.e., half-hourly to daily), spatial resolutions (i.e., 0.0375° to 0.0727°), and file formats (i.e., binary and NetCDF). To enable the blending of the datasets, each file was decompressed and different preprocessing stages were implemented in sequence. The salient stages included format conversion, aggregation of data to a daily time step, and clipping to the study domain. Finally, all the datasets were re-gridded to a unified spatial resolution of 0.0375° × 0.0375° (~4 km × 4 km) using the most commonly applied bilinear interpolation operator (e.g., Yang et al., 2020; Tadesse et al., 2022) so that the resolution matches the resolution of the TAMSAT and PERSIANN products. All pre-processing stages were conducted using Linux machine-based Climate Data Operator (CDO; Schulzweida, 2020) and NetCDF Operator (NCO; Zender, 2008) tools. These tools were developed at the Max Planck Institute for Meteorology and the University of California, respectively.
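For readers unfamiliar with bilinear interpolation, the idea behind the re-gridding step can be sketched for a single target point surrounded by four source grid nodes. This is only an illustration of the principle, not the CDO operator itself; the function name, argument order, and numbers are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Interpolate a value at (x, y) inside the cell [x0, x1] x [y0, y1].

    q00, q10, q01 and q11 are the source values at (x0, y0), (x1, y0),
    (x0, y1) and (x1, y1), respectively.
    """
    tx = (x - x0) / (x1 - x0)  # fractional position along x
    ty = (y - y0) / (y1 - y0)  # fractional position along y
    bottom = q00 * (1 - tx) + q10 * tx  # interpolate along x at y0
    top = q01 * (1 - tx) + q11 * tx     # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # then interpolate along y


# Hypothetical rainfall values (mm) at the four corners of a unit cell.
print(bilinear(0.5, 0.5, 0, 1, 0, 1, 0.0, 2.0, 4.0, 6.0))
```

At the cell centre the result is the mean of the four corner values, and at a corner it reproduces the corner value.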

Dynamic Bayesian Model Averaging (BMA)
For blending SRE products with observed rainfall data, a BMA model (Raftery et al., 2005; Sloughter et al., 2007) with space and time-varying weightages was implemented in this study. In this Bayes theorem-based blending approach, the four SRE products are treated as independent and competing prediction members. The weightage for SRE product members is determined according to their relative contribution to predictive skill while using rain gauge sample training data corresponding to a certain predefined training window (Vrugt et al., 2008; Fraley et al., 2010). Steps involved in the determination of the weightages are (i) specifying parameters such as the number of iterations, number of non-zero observations, convergence tolerance, and power to transform data, and (ii) determining weightages by introducing the time series data of input variables (i.e., SRE products), control parameters, and training window into the blending algorithm, which has a gamma modeling function that considers the probability of zero rainfall. The algorithm accounts for three variables that are (i) a dependent variable to be blended (i.e., rainfall), (ii) the corresponding observed rainfall data (gauge rainfall) with a recording period T ($G = y_1, y_2, \ldots, y_T$), and (iii) an ensemble derived from K SRE products (i.e., four in this study) ($s = s_1, s_2, \ldots, s_K$).
Fig. 4. [...] Daily rainfall values outside the upper and lower whiskers are illustrated as red dots.
Based on the law of total probability, the expression of the BMA predictive probability density function (PDF) for generating blended rainfall data (y) reads:
$$p(y \mid s_1, s_2, \ldots, s_K, G) = \sum_{k=1}^{K} p(s_k \mid G)\, p(y \mid s_k, G)$$
where $p(s_k \mid G)$ is the posterior probability (likelihood) of the ensemble member (SRE product) $s_k$ and $p(y \mid s_k, G)$ is the posterior distribution of the blended rainfall data (y) given the member $s_k$ and the gauged training data, G. The first term is also known as a fractional statistical weightage, $w_k$, that shows how well the member matches the gauge, and the sum of these weightages is equal to 1. In this study, the Expectation-Maximization (EM) algorithm is used for automated optimal weightage iteration (Ma et al., 2018a).
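To make the EM iteration for the weightages concrete, a minimal sketch follows. It is not the algorithm used in this study: the study relies on a gamma modeling function that considers the probability of zero rainfall, whereas the sketch swaps in a Gaussian kernel with one shared spread (as in the original BMA formulation of Raftery et al., 2005) to stay short. The function names, the convergence settings, and the rainfall values are hypothetical.

```python
import math


def normal_pdf(y, mean, sigma):
    """Gaussian density; a simplifying stand-in for the gamma kernel."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_em(obs, members, n_iter=200, tol=1e-6, min_sigma=1e-3):
    """Estimate BMA weights w_k and a shared spread sigma by EM.

    obs     : list of T gauge observations (training window)
    members : list of K lists, each holding T estimates of one product
    """
    K, T = len(members), len(obs)
    weights = [1.0 / K] * K
    sq = [(s - y) ** 2 for m in members for s, y in zip(m, obs)]
    sigma = max(min_sigma, math.sqrt(sum(sq) / len(sq)))
    for _ in range(n_iter):
        # E-step: responsibility of member k for observation t.
        z = [[0.0] * T for _ in range(K)]
        for t in range(T):
            dens = [weights[k] * normal_pdf(obs[t], members[k][t], sigma)
                    for k in range(K)]
            total = sum(dens)
            for k in range(K):
                z[k][t] = dens[k] / total if total > 0 else 1.0 / K
        # M-step: update the weights and the shared spread.
        new_weights = [sum(z[k]) / T for k in range(K)]
        var = sum(z[k][t] * (obs[t] - members[k][t]) ** 2
                  for k in range(K) for t in range(T)) / T
        sigma = max(min_sigma, math.sqrt(var))
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    return weights, sigma


# Hypothetical training data: one member tracks the gauge, one does not.
gauge = [0.0, 5.0, 12.0, 3.0, 8.0, 1.0, 20.0, 7.0]
close = [0.5, 4.5, 12.5, 2.5, 8.5, 1.5, 19.5, 7.5]
poor = [6.0, 0.0, 2.0, 11.0, 1.0, 9.0, 5.0, 15.0]
w, s = bma_em(gauge, [close, poor])
print(w, s)
```

Because each observation's responsibilities sum to 1, the updated weights also sum to 1, which mirrors the constraint on $w_k$ stated above.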
Defining a training window with an optimum length of days and selecting a training dataset is critical to train the BMA model with the objective of error reduction as a result of optimized weightages (Fang and Li, 2016; Qi et al., 2019). However, the process of determining the optimum length of a training window is not straightforward as there are no standard requirements for the selection (Courtney et al., 2013). Also, window length may change subject to the type of predictive variables, the study area (Liu and Xie, 2014), and the objective functions used for training. For this study, the window is determined by examining the sensitivity of nineteen model training windows that range in length from 10 to 100 days with discrete time increments of 5 days. Five stations that are well distributed in the catchment (i.e., AAB, CFD, DRL, INT, and SBT) were selected as training sites to ensure the robustness and representativeness of respective training windows across different locations. In addition, considering their largest number of rainy days, the four non-consecutive wettest years over the catchment (i.e., 2004, 2006, 2010, and 2013) were identified as periods for training to test the training windows for different climatic periods and rain distributions.
Among the tested training windows, the one that satisfies the following three criteria was selected: (i) it yields minimal error as measured using objective functions, (ii) it shows an insignificant error difference with its neighboring training windows (i.e., an indication of stability with the change in training window), and (iii) it is as short as possible. Shorter training windows are considered to reduce the loss of information that occurs due to rapid temporal changes in the pattern and regime of rainfall (Berrocal et al., 2008), while longer training windows contain more data to better estimate BMA parameters but may not cope when the regime of rainfall changes rapidly (Raftery et al., 2005). After setting the length of the training window, the BMA weightages of the four SRE products were computed at the location of each gauging station for the entire study period when at least two satellites recorded a rainfall event. As such, if a training window T spans days 1 to N, then the weightage for day N + 1 is determined using gauged rainfall data recorded during those N days. For computing the weightage for day N + 2, the rainfall data of day 1 is dropped and the training window T spans from day 2 to day N + 1 (Fig. 5). The process recursively moves forward, generating dynamic weightages until the end of the study period.
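The candidate window lengths and the forward-moving training window can be sketched as index bookkeeping. This is a minimal illustration with hypothetical function names and 0-based day indices; it does not compute any weightages.

```python
def candidate_windows(start=10, stop=100, step=5):
    """Window lengths in days: 10, 15, ..., 100."""
    return list(range(start, stop + 1, step))


def sliding_windows(n_days, window):
    """Yield (train_start, train_end, target_day) with 0-based indices.

    Training covers days train_start..train_end inclusive, and the
    weightage is computed for target_day, the day after the window.
    """
    for target in range(window, n_days):
        yield target - window, target - 1, target


print(len(candidate_windows()))      # number of candidate windows
print(list(sliding_windows(5, 3)))   # windows for a 5-day toy record
```

Each step drops the oldest day and adds the newest one, so every day after the first window receives its own set of weightages.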
The daily optimal weightages generated at each of the stations were subsequently interpolated over the study area by applying a universal coordinate system (i.e., World Geodetic System 1984 (WGS84) and Universal Transverse Mercator (UTM) zone 37 N; https://epsg.io/32637) and normalized so that the sum equals one. Interpolation of the daily weightages data was conducted at a spatial resolution that unifies all SRE products (i.e., 4 km × 4 km) using the Inverse Distance Weighting (IDW) method. The underlying principle of the method is that observations (O) at closer stations have a higher contribution to the estimation of interpolated values (I) than observations from more remote stations. The IDW equation reads:
$$I = \frac{\sum_{i=1}^{N} O_i / D_i^{P}}{\sum_{i=1}^{N} 1 / D_i^{P}}$$
where $O_i$ is the observation at gauge i, $D_i$ is the Euclidean distance between the location of gauge i and the ungauged grid point, P is the distance weighting power (i.e., 2 in this study), and N is the number of gauging stations.
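A minimal sketch of IDW interpolation followed by normalization of the interpolated weightages is given here. The function names and coordinates are hypothetical; coordinates are assumed to be in a projected system (such as UTM) so that Euclidean distances are meaningful.

```python
import math


def idw(target, stations, values, power=2):
    """Inverse Distance Weighting estimate at `target` (x, y).

    stations : list of (x, y) gauge coordinates
    values   : list of observations at those gauges
    """
    num = den = 0.0
    for (x, y), v in zip(stations, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a gauge
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def normalize(weights):
    """Rescale weightages so that they sum to one (assumes a non-zero sum)."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: two gauges at 1 and 2 distance units from the cell.
print(idw((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0)], [10.0, 20.0]))
print(normalize([0.2, 0.2, 0.4, 0.4]))
```

With a power of 2, the gauge at distance 1 counts four times as much as the gauge at distance 2.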
To generate a blended daily rainfall estimate at each of the grid elements that cover the study area, weightages are tied to their corresponding SRE products. Hereafter, this blended rainfall product is labeled as "TAM-PERCHIMOR", which stands for either the first or middle three consecutive letters of the individual SRE products, ordered by their original spatial resolution from finer to coarser (see Table 2). In addition, a hyphen is inserted to differentiate the satellite + gauge TAMSAT product from the satellite-only products (i.e., PERSIANN, CHIRP, and CMORPH).
Table 3. Statistical performance indicators used in this study.
MBE: Measures the average bias of predictions in yielding overestimation and underestimation. However, it has a limitation in indicating the exact performance because of the summation of positive and negative errors.
MAE: Quantifies the average error of a prediction with less sensitivity to the higher error magnitudes, but ignores the sign (overestimation and underestimation).
RMSE: Penalizes larger errors with less sensitivity towards the smaller errors. Drawbacks of this indicator are: (i) sensitivity to high errors even though they are few in number and not representative samples of the whole data, and (ii) it is not possible to differentiate overestimation from underestimation.
PBIAS: Also known as relative bias; describes the averaged overestimation or underestimation tendency of an SRE product when compared to the gauge-based counterparts.
CC: Indicates the extent of agreement between a given SRE product-derived dataset and its corresponding rain gauge observation.
NSE: Shows the magnitude of the SRE product residual variance as compared to the variance of the observed data. It indicates how the estimated and observed rainfall datasets are fitted on a 1:1 line (Nash and Sutcliffe, 1970).
KGE: Measures the performance of an estimated rainfall from correlation, variability, and bias perspectives (Kling et al., 2012).
Note: N stands for the number of data points; i is an index number for each data point; E and O are the satellite-derived estimated rainfall and ground-based gauge-observed rainfall datasets, respectively; $\bar{E}$ and $\bar{O}$ are the arithmetic means of the E and O datasets, respectively; $SD_E$ and $SD_O$ stand for the standard deviations of the E and O datasets, respectively.
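The blending step, in which the weightages are tied to their corresponding SRE products, can be read as a weighted sum per grid cell and day. The sketch rests on that assumption and leaves out any bias correction or zero-rainfall handling performed by the gamma-based model; the function name and all numbers are hypothetical.

```python
def blend(weights, estimates):
    """Weighted sum of member estimates at one grid cell and day.

    The weights are renormalized so that they sum to one before use.
    """
    total = sum(weights)
    return sum((w / total) * s for w, s in zip(weights, estimates))


# Hypothetical weightages and rainfall estimates (mm) for four products.
print(blend([0.5, 0.3, 0.1, 0.1], [10.0, 20.0, 0.0, 5.0]))
```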

Performance assessment
To examine the performance of the blended rainfall product as compared to the individual SRE products, point-to-pixel comparison and cross-validation performance assessment approaches were implemented. In the point-to-pixel comparison approach, the average error of estimated rainfall datasets was defined by comparing SRE products-derived pixel values against gauged data recorded at the same geographic locations, with the assumption that the recorded point time series data are reference observations for the counterpart pixels of the SRE products. To examine the performance of rainfall estimates at ungauged grid points, the leave-one-out cross-validation (LOOCV) technique is selected (e.g., Cai et al., 2019; Ossa-Moreno et al., 2019). In LOOCV, the SRE products-based daily time series of one station is excluded by assuming it as un-gauged. In this study, rainfall at the location of the excluded station is estimated by the IDW interpolation technique using daily rainfall from the remaining twelve stations. This procedure is repeated for all thirteen stations, so the time series of each station is sequentially excluded from the analysis. The accuracy of the interpolated daily time series for the excluded stations is assessed using different performance indicators by using the counterpart observed rain gauge data as a reference.
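The leave-one-out procedure can be sketched as follows, assuming IDW with a power of 2 as the interpolator. Function names, coordinates, and values are hypothetical, and the toy example uses three stations instead of thirteen.

```python
import math


def idw(target, stations, values, power=2):
    """Inverse Distance Weighting estimate at `target` (x, y)."""
    num = den = 0.0
    for (x, y), v in zip(stations, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def loocv(stations, series):
    """Predict each station's daily series from all other stations.

    stations : list of (x, y) coordinates
    series   : series[i][t] is the value at station i on day t
    """
    predictions = []
    for i, site in enumerate(stations):
        others = [j for j in range(len(stations)) if j != i]
        coords = [stations[j] for j in others]
        predicted = [idw(site, coords, [series[j][t] for j in others])
                     for t in range(len(series[i]))]
        predictions.append(predicted)
    return predictions


# Hypothetical example: three stations on a line, one day of data.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
daily = [[1.0], [2.0], [3.0]]
print(loocv(sites, daily))
```

The predicted series of each excluded station can then be compared with its observed series using the performance indicators.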
Results of the performance evaluation approaches are presented using seven statistical performance indicators (Table 3). These include Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Bias Error (MBE), Percent Bias (PBIAS), Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), and Pearson's Correlation Coefficient (CC). Different statistical indicators were considered so that the error of the rainfall estimates could be examined from different perspectives. Analysis of these performance indicators is conducted using the hydroGOF package in the R environment. To assess the space and time difference between the observed and estimated rainfall, gridded maps are created at annual, seasonal, and monthly time steps. At each grid cell of the entire catchment, residual differences are computed by deducting the gridded SRE products from the counterpart IDW interpolated rain gauge data. Lastly, for each time step and rainfall estimate, bar plots are produced to show the magnitude and distribution of the residual differences, with the magnitude of the residual difference on the abscissa (x-axis) and the respective number of grid cells on the ordinate (y-axis). In addition, the mean of residual difference (MRD) is determined for each time step and rainfall estimate by dividing the total residual difference by the total number of grid cells. This shows how far the overall averaged residual difference deviates from the targeted zero difference.
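The seven indicators can be sketched in Python with NumPy as follows. The study itself uses the hydroGOF package in R; the function name `performance_indicators` and the sign convention (estimate minus observation, so that positive MBE and PBIAS mean overestimation) are assumptions of this sketch, and KGE is written in the modified form of Kling et al. (2012), which uses the ratio of the coefficients of variation.

```python
import numpy as np

def performance_indicators(obs, est):
    """Return the seven indicators as a dict (hypothetical helper).

    Assumes paired series without missing values and non-zero means.
    """
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - obs                                   # assumed sign: estimate - observation
    cc = np.corrcoef(obs, est)[0, 1]                  # Pearson correlation
    beta = est.mean() / obs.mean()                    # bias ratio
    gamma = (est.std() / est.mean()) / (obs.std() / obs.mean())  # variability (CV) ratio
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MBE": np.mean(err),
        "PBIAS": 100.0 * err.sum() / obs.sum(),
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "KGE": 1.0 - np.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2),
        "CC": cc,
    }
```

A perfect estimate gives zero for MAE, RMSE, MBE, and PBIAS and one for NSE, KGE, and CC, which are the target values against which the products are judged.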

Length of training window
Determining the length of the training window is critical and marks the initial step in defining the parameters of the BMA algorithm. These parameters involve the weightages for each SRE product. In this study, to identify the optimum BMA training window, nineteen training windows are evaluated for the four wettest years (i.e., 2004, 2006, 2010, and 2013) and for five gauging stations (i.e., AAB, CFD, DRL, INT, and SBT) that are distributed across the study area. Results of the evaluation of the training windows as obtained from three commonly used objective functions (i.e., MAE, RMSE, and MBE; Willmott and Matsuura, 2006) are illustrated using box and whisker plots (Fig. 6). The box and whisker plots are based on twenty data points from four training periods (i.e., the wettest years) and five calibration sites (i.e., the gauging stations). MAE and RMSE show the magnitude of the mean error whereas MBE indicates the direction of the error bias in terms of underestimation and overestimation. Box-whisker plots for each of the three objective functions differ in box size and distribution and thus indicate the effect of increasing the length of the training window from 10 days to 100 days.
The median of the plots exhibited an inconsistent pattern for the shorter training windows (i.e., less than 30 days), which is caused by the high rainfall variability. The median of the objective functions shows a smooth transition for medium and large training windows (i.e., greater than 30 days), which reveals that rainfall variability has less influence as the training data change between consecutive training windows. As the length of the training window increases, the median for MAE and RMSE shows a gradually decaying smooth pattern. However, median values for MBE follow an overall increasing pattern with the length of the training window. Both the overall decreasing (i.e., for MAE and RMSE) and increasing patterns (i.e., for MBE) point toward the desired value of each objective function (i.e., zero). This indicates an overall error reduction in response to the increment in the length of the training window, which occurs due to the enlargement of the sample size of the training data. The difference in the interquartile range (i.e., the length of boxes) of the three objective functions is another important feature of the plots that can be attributed to the variability of data values. MAE and RMSE show boxes of shorter length, but with relatively larger value ranges for RMSE that can be attributed to the effect of squaring, which assigns larger weights as errors increase. In contrast, MBE exhibits boxes of longer length due to the expected range of variability between inconsistent overestimations and underestimations across different years of the training period and the locations of gauging stations.
The shortest and most stable training windows appeared from 35 to 45 days and from 40 to 50 days at the median of MAE and RMSE, respectively. This shows that windows of 40 and 45 days are consistently stable in terms of both objective functions. Of these two windows, 45 days showed better performance in terms of MBE and thus 45 days is selected as the optimum training window for this study. The error may reduce even further if a training window longer than 60 days is selected. However, that would compromise the aim of this study, because overly long windows obstruct the ability of the BMA model to detect rapid changes in the occurrence of rainfall over the relatively small area of the Akaki catchment. The selected 45-day training window agrees well with the range of training windows of 30-55 days reported by Ji et al. (2019), Rahman et al. (2020a), and Yin et al. (2021).

Spatial and temporal pattern of weightages
In BMA approaches, weightages are assigned based on the performance of the respective SRE products in reference to the gauged data. Therefore, weightages directly indicate the relative contribution of each SRE product to the blended rainfall estimate and the predictive skill of the approach. Fig. 7 illustrates the intra-annual variability of daily accumulated fractional weightages for the four SRE products and daily rainfall data. The rainfall data were gauged at five well-distributed stations over the catchment during the wettest year (i.e., 2010).
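To make the link between product performance and weightage concrete, the following Python sketch derives weights over a training window and applies them to the next day. It is not the study's implementation: it assumes a Gaussian BMA with a single common variance fitted by expectation-maximization, which is a simplification for skewed daily rainfall, and the function names `bma_weights` and `blend_next_day` are hypothetical. The 45-day default mirrors the training window selected above.

```python
import numpy as np

def bma_weights(members, obs, n_iter=200, tol=1e-6):
    """EM estimate of BMA weights (Gaussian kernel, common variance; assumed form).

    members : (T, K) estimates of K products over T training days
    obs     : (T,) gauged rainfall on the same days
    Returns (weights of shape (K,), sigma).
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    T, K = members.shape
    w = np.full(K, 1.0 / K)                            # start from equal weights
    err2 = (obs[:, None] - members) ** 2
    sigma2 = max(err2.mean(), 1e-12)
    for _ in range(n_iter):
        # E-step: responsibility of each product for each day
        dens = np.exp(-0.5 * err2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        z = w * dens
        z /= z.sum(axis=1, keepdims=True) + 1e-300
        # M-step: weights are mean responsibilities; variance is the weighted error
        w_new = z.mean(axis=0)
        sigma2 = max((z * err2).sum() / T, 1e-12)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, float(np.sqrt(sigma2))

def blend_next_day(members, obs, t, window=45):
    """Blend day t with weights trained on the `window` days before t."""
    w, _ = bma_weights(members[t - window:t], obs[t - window:t])
    return float(members[t] @ w), w
```

Because the weights are re-estimated as the window moves forward, a product that tracks the gauges well in one season can lose weight in another, which is the behaviour examined in Fig. 7.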
In all plots, the weightages of SRE products at stations with similar topographic settings follow a similar pattern. This may be attributed to the true rainfall pattern over the study area, which is influenced by catchment orography due to the presence of mountain ranges (i.e., Mount Intoto and Mount Berek) in the northern and northwestern parts of the catchment. As a result, stations in highly elevated areas (i.e., AAB and INT) received a higher amount of rainfall. However, a lower amount of rainfall was recorded at DRL and CFD stations, which are located in the lowland plain areas in the south and southeastern part of the catchment, including the outlet of the Akaki River. As an exception, the remaining selected station (i.e., SBT) follows the pattern of the high mountain stations in some seasons and that of the lowland stations in others. This may be attributed to its transitional topography between high and low altitude, which receives a moderate rainfall amount.
During the main-rainy season, which stretches from June to September, the weightage of TAMSAT exceeds that of other SRE products at all stations except the station that received the lowest rainfall (i.e., DRL). The dominance of the TAMSAT weightage changed slightly as the season advanced towards August and September due to an increased weightage of PERSIANN, particularly at DRL and SBT stations. Regardless of space and time, the weightage of the CHIRP product shows an inverse relation with the amount of rainfall, decreasing as the rainfall increases. This indicates a notable performance of CHIRP in capturing low-rainfall events. In contrast, the weightage of TAMSAT increases with the magnitude of rainfall events, yet it still holds the major contribution at a few stations (i.e., AAB and INT) during periods of no or low rainfall.
During the dry season, which stretches from October to January, the contribution of CMORPH is dominant at stations that receive low and moderate rainfall (CFD, DRL, and SBT). At the other stations (AAB and INT), the weightage of TAMSAT dominated that of the other SRE products. In both situations, with CMORPH or TAMSAT holding the higher weightage depending on location, CHIRP consistently held most of the remaining weightage, sharing the weightage accumulation with CMORPH and TAMSAT. A significant weightage of PERSIANN is noticeable at most of the stations during periods of intermittent rainfall events, which often occurred in the short-rainy season (i.e., February to May).
In the elevated area in the central and northern part of the catchment, the product with the highest weightage varied with topography during the dry season. The weightage of CHIRP and TAMSAT was highest in elevated areas in the central (i.e., AAB) and the northern (i.e., INT) parts of the catchment, respectively. In general, the SRE products show large variability in weightage for all seasons and stations, which results from the varying performance of SRE products across time and space. Therefore, this finding emphasizes the necessity of blending several products to improve the quality of rainfall estimates.
In most of the sub-plots, highly correlated data points are shown in the range of moderate rainfall intensities (i.e., 5-20 mm/day; Zambrano-Bigiarini et al., 2017). This indicates that the SRE products and the blended rainfall product captured moderate rainfall well. The highest correlation coefficients (i.e., 0.46-0.53) and dense data points are observed near the 1:1 best-fitting line at AAB, INT, and SBT stations, particularly for the blended rainfall product, but also for CHIRP and TAMSAT products. In contrast, for CFD and DRL stations, data points are more scattered and correlation coefficients are lower than for any other data pair. In particular, the correlation coefficient at the DRL station for all rainfall estimates (i.e., 0.21-0.28) is lower by approximately half when compared to other stations. This illustrates that the blended and individual rainfall estimates were promising in capturing rainfall amounts over a mountainous area, but
The result from Pearson's correlation coefficient shows that the blended rainfall product surpassed the satellite-only products (i.e., CHIRP, CMORPH, and PERSIANN) but coefficients are comparable with the satellite + gauge product (i.e., TAMSAT). The higher correlation coefficient attained by the TAMSAT product is attributed to the higher number of gauging stations that are used to develop the product for central parts of Ethiopia (Tarnavsky et al., 2014), which includes the area of the Akaki catchment. Other studies also show good performance of TAMSAT over Ethiopia (e.g., Young et al., 2014; Fenta et al., 2018; Dinku et al., 2018).
Although the correlation coefficient is a useful performance indicator, it only describes the linearity between the observed and estimated rainfall. Therefore, further analysis was performed to validate the blended rainfall product and the individual SRE products (Table 4). Two approaches (i.e., point-to-pixel and LOOCV) and six performance indicators were applied using daily data (2003-2019) from the five stations as a reference. The six statistical indicators were selected to measure average error (i.e., MAE and RMSE), to indicate over/underestimation (MBE and PBIAS), and to describe the extent of agreement between the gauged and estimated rainfall (NSE and KGE).
Based on point-to-pixel and LOOCV-based comparisons, the error of rainfall estimation is reduced (see MAE and RMSE) and the extent of agreement has improved (see NSE and KGE) in the blended rainfall product as compared to the individual SRE products (Table 4: see green cells). According to the point-to-pixel based error indices, the blended rainfall product reduced (i) MAE of CHIRP, CMORPH, PERSIANN, and TAMSAT by 88, 146, 40, and 102 mm/year, respectively, and (ii) RMSE by 66, 343, 361, and 274 mm/year, respectively. Based on the LOOCV-based assessment, the blended rainfall product shows error reduction in terms of (i) MAE of CHIRP, CMORPH, PERSIANN, and TAMSAT by 128, 336, 4, and 102 mm/year, respectively, and (ii) RMSE by 197, 482, 259, and 325 mm/year, respectively. Similarly, the extent of agreement between the gauged data and the blended data significantly improved over the individual SRE products, with the exception of TAMSAT. Unlike the satellite-only products, the NSE and KGE values for the satellite + gauge TAMSAT product were close to those of the blended rainfall product (Table 4: see blue cells).
In terms of the direction of bias, only PERSIANN distinctively underestimated observed rainfall (Table 4: see orange cells). In contrast, the blended rainfall product and the remaining individual SRE products overestimate the observed rainfall amount over the Akaki catchment. Overall, CMORPH and PERSIANN performed the least whereas TAMSAT and CHIRP exhibited better performance. This finding agrees well with other studies conducted in the Awash River Basin, Ethiopia, and East Africa at large (e.g., Young et al., 2014; Bayissa et al., 2017; Dinku et al., 2018; Fenta et al., 2018; Mekonnen et al., 2021). The main conclusion is that the blended rainfall product shows significant improvement over the individual satellite products.

Wet season-based comparison of rainfall datasets
The main rainy season of the Akaki catchment extends from June to September with substantial differences in rainfall recorded across the gauging stations. The starting (June) and ending (September) months are characterized by moderate rainfall whereas July and August commonly experience a higher intensity and a large amount of accumulated rainfall. Fig. 9 shows box and whisker plots that incorporate data from the blended and individual SRE products with the daily rainfall recorded at AAB, CFD, DRL, INT, and SBT gauging stations for the main rainy seasons (2003-2019). All rainfall products mimicked the pattern of observed rainfall, with smaller amounts of rainfall during the outward months (June and September) and higher amounts during the middle months (July and August).
In the first three consecutive months of the main rainy season, CMORPH and PERSIANN consistently underestimated the median amount of daily rainfall. In September, CMORPH slightly overestimated the median of daily rainfall recorded during that month, but PERSIANN consistently underestimated the median daily rainfall regardless of location and month. Across the main rainy season and at many gauging stations, the blended rainfall product, TAMSAT, and CHIRP overestimated the gauged median daily rainfall, particularly in the middle two months. During the outward months, the order of overestimation changed to blended, CHIRP, and TAMSAT at many of the stations. The blended rainfall product and TAMSAT have shown similar medians in September. Similarly, TAMSAT and CHIRP exhibited an equivalent median of daily rainfall in August.
The length of boxes in the plot (i.e., inter-quartile range) shows the variation for the sample of observed data. Box length deviations of the four SRE products from the gauged data are smaller during the middle two months than during the first and last months. During the entire main-rainy season, PERSIANN and TAMSAT consistently showed tangible underestimation and overestimation of rainfall amounts, respectively. The remaining individual SRE products (i.e., CHIRP and CMORPH) showed an overestimation but with a smaller margin. In particular, the blended rainfall product fairly captured the inter-quartile range of the
The occurrence of very high daily rainfall is represented by the upper whisker. In all sub-plots, CMORPH and PERSIANN have shown low to high underestimation of heavy rainfall across all locations and datasets, with the exception of an overestimation that occurred at SBT and DRL for July and September, respectively. TAMSAT consistently overestimated observed data, as its upper whisker is consistently higher, but shows an underestimation at INT station, particularly in June and July. A slight underestimation by CHIRP and the blended rainfall product was noticed at CFD and DRL stations during the first two months of the main-rainy season (June and July), which later changed to an overestimation for August and September. In addition, during the last two months, the rainfall products consistently show smaller underestimation at AAB and INT stations but overestimated the observed heavy rainfall at CFD, DRL, and SBT stations. Overall, the blended rainfall product showed a significant improvement over the individual SRE products in capturing both median and very high daily rainfall during the main rainy season. Among the individual SRE products, CMORPH and CHIRP performed relatively well in capturing median and very high daily rainfall during the main rainy season, respectively.

Spatial variability of rainfall datasets
Fig. 10 shows the spatial distribution of seasonally categorized mean monthly rainfall (2003-2019) over the Akaki catchment for gauge observations, the blended rainfall product, and individual SRE products. During months of the dry and short-rainy seasons, the spatial pattern of gauge-based rainfall over the study area exhibited relatively low variability and a uniform distribution across the catchment. This spatial pattern of mean monthly rainfall is well captured by the individual SRE products and the blended dataset because satellites are often able to estimate the no/low rainfall amounts that are mostly expected during these seasons. In contrast to the other rainfall estimates, CMORPH alone strongly overestimated mean monthly rainfall during these seasons in the western part of the Akaki catchment.
During the main-rainy season, the gauge-based spatial rainfall pattern over the study area exhibited a decrease in rainfall from the mountainous northwestern parts of the catchment to the low-lying areas in the southeastern parts. This decreasing rainfall trend was shown by the blended rainfall product as well as the individual SRE products but was relatively better captured by the blended rainfall product and CMORPH. Compared to the observed mean monthly rainfall, PERSIANN showed high underestimation across many grid elements, with the exception of a few grids in the south. In this part of the catchment, which receives less rainfall, PERSIANN and CMORPH captured the observed mean monthly rainfall well. The remaining SRE products (i.e., CHIRP and TAMSAT) showed a good performance in the northwestern part of the catchment but with a slight overestimation of mean monthly rainfall in other parts of the catchment. In general, significant bias of the rainfall products occurred more prominently during the main-rainy season than during the dry or short-rainy seasons of the catchment. This implies that the estimation of rainfall during the main rainy season is associated with higher uncertainty because of difficulties in capturing extreme rainfall events, which prominently occurred during this season.
All SRE products and the blended rainfall product show significant spatial variability and perform better over the northwestern than the southeastern part of the catchment for mean monthly rainfall, particularly during the main rainy season. The northwestern part is mainly characterized by mountainous ridges dominated by urban and forest land cover and receives a higher amount of rainfall than the flat plain areas in the southeast of the study area. Therefore, it can be concluded that the rainfall estimates were good at capturing rainfall over mountainous areas. Over the central area, which mainly encompasses Addis Ababa city, the mean monthly rainfall during the main rainy season was well captured by all rainfall products except PERSIANN, which highly underestimated it. These results show that the blended rainfall product provided more reliable spatial precipitation estimates in many parts of the catchment and across various temporal scales by capturing both the spatial variability and the amount of observed rainfall.

Spatial difference of observed and estimated rainfall
Fig. 11 illustrates the residual difference between the satellite-based rainfall products and spatially interpolated gauged rainfall. Differences are indicated as spatially distributed residual differences of rainfall at monthly, seasonal, and annual time scales. In the first column, the residual difference of gridded mean monthly rainfall data is represented for each SRE product and the blended dataset. In this column of bar plots, the residuals for the considered rainfall estimates are concentrated near the targeted zero value but the means of the residuals differ slightly. The mean of residual difference (MRD) for monthly CHIRP and TAMSAT is −9.8 and −13.3 mm, respectively, which is negative and shows an underestimation. In contrast, PERSIANN overestimated the interpolated observed rainfall data with an MRD of 19.9 mm. Although the MRD for CMORPH is 1.1 mm, which nearly overlaps the targeted zero difference, this result arises because overestimated grid cells are offset by underestimated grid cells. In the case of the blended dataset, the distribution of residual differences is nearly flat, the mean residual nearly matches the zero difference with an MRD of −1.5 mm, and the bars are relatively short, revealing a low number of grid cells for the respective value of residual differences.
The second column of bar plots shows the distribution of residual difference from gridded mean seasonal rainfall, which was estimated using individual SRE products and the blended dataset. Except for CMORPH, the distribution of residual difference for the remaining rainfall estimates is nearly flat with a symmetrical distribution of bars in the regions of overestimation and underestimation. In this column, the position of the MRD with respect to the targeted zero difference for all rainfall estimates follows a similar trend to the MRD in the monthly column described above. The third column of bar plots shows the distribution of residuals from spatially distributed mean annual rainfall during the study period (2003-2019). The plots predominantly indicate a concentrated distribution of the residuals for all rainfall estimates. While the distribution is skewed towards underestimation for CHIRP and TAMSAT, the distribution is skewed to the region of overestimation in the case of CMORPH and PERSIANN. For the blended dataset, the MRD approached the target zero difference more closely than for all individual SRE products, without notably skewing towards overestimation or underestimation residuals. In general, the blended dataset brought an improvement by significantly reducing the residual difference when compared to the individual SRE products at all the evaluated time scales (i.e., monthly, seasonal, and annual).
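The residual bar plots and the MRD discussed in this section can be reproduced in outline with the Python sketch below. The function name `residual_summary` and the 10 mm bin width are assumptions, and the residual is taken as interpolated gauge rainfall minus SRE rainfall, following the description in the performance assessment; the sign convention should be checked against Fig. 11 before interpreting over- or underestimation.

```python
import numpy as np

def residual_summary(gauge_grid, sre_grid, bin_width=10.0):
    """MRD and bar-plot counts for one product and time scale (hypothetical helper).

    gauge_grid, sre_grid : arrays of equal shape in mm; NaN marks cells
    outside the catchment.
    Returns (mrd, counts, edges), where counts[i] is the number of grid
    cells whose residual falls between edges[i] and edges[i + 1].
    """
    residual = np.asarray(gauge_grid, dtype=float) - np.asarray(sre_grid, dtype=float)
    valid = residual[~np.isnan(residual)]
    mrd = valid.sum() / valid.size                    # total residual / number of cells
    lo = np.floor(valid.min() / bin_width) * bin_width
    hi = np.ceil(valid.max() / bin_width) * bin_width
    if hi == lo:                                      # all residuals in a single bin
        hi = lo + bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(valid, bins=edges)
    return mrd, counts, edges
```

Plotting `counts` against the bin edges gives the bar plot for one panel, and `mrd` gives the position of the mean line relative to the targeted zero difference.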

Discussion
This study is one of the first to test and evaluate the blending of high-resolution satellite products for an urbanized catchment with complex topography. As a result, comparing the findings of this study with others is hampered by the lack of comparable studies. However, there are a few studies on a larger spatial domain that encompasses the Akaki catchment. These cover the Awash River Basin (e.g., Hirpa et al., 2010; Romilly and Gebremichael, 2011; Adane et al., 2021) and the Upper Awash sub-basin (e.g., Mekonnen et al., 2021). In addition, since the Akaki catchment is located in the central part of Ethiopia where the Awash and the Blue Nile Basins meet, areas within both river basins share similar characteristics, namely high rainfall and complex topography (Romilly and Gebremichael, 2011). Therefore, the findings of this study are further discussed with reference to these studies.
In this study, the effect of topographic variation on the performance of the rainfall estimates is demonstrated. Both the blended and the four individual SRE products performed relatively better in capturing rainfall over the mountainous highland areas in the north than over the lowland parts in the south of the Akaki catchment. The results of studies previously conducted in the Awash and Upper Blue Nile basins strongly support this finding (e.g., Romilly and Gebremichael, 2011; Abera et al., 2016; Belay et al., 2019; Mekonnen et al., 2021). According to Gebremichael et al. (2014) and Young et al. (2014), such a strong influence of topographic variation on the performance of SRE products is probably related to the efficiency of the signal retrieval algorithm in detecting various processes in the rainfall formation systems over different topographic settings. However, a contrasting result was reported by Fenta et al. (2018), who showed better performance of SRE products in the lowlands than over the highlands. As indicated by Wedajo et al. (2021), this can likely be attributed to the difference in topographic, land cover, and climatic conditions between the study areas.
The four individual high-resolution SRE products (i.e., TAMSAT, CHIRP, PERSIANN, and CMORPH) considered in this study performed differently. In particular, most of the SRE products, including TAMSAT, CHIRP, and CMORPH, showed an overall overestimation of rainfall over the Akaki catchment. Similarly, Wedajo et al. (2021) compared four products (i.e., CHIRPS, IMERG, TRMM, and TAMSAT) for an area in the Upper Blue Nile basin and also revealed an overestimation by the products. In contrast, PERSIANN is the only product that underestimates the rainfall over the Akaki catchment. Such results can be found in studies conducted for the Awash and Upper Blue Nile basins by Hirpa et al. (2010) and Romilly and Gebremichael (2011), respectively. Among the products, TAMSAT was the strongest in capturing the spatiotemporal pattern of rainfall over the Akaki catchment. Similarly, the outstanding performance of TAMSAT over different parts of Ethiopia has been shown elsewhere (Young et al., 2014; Dinku et al., 2018; Mekonnen et al., 2021). The high number of gauging stations used from the central part of Ethiopia and the efficiency of the deployed algorithm (i.e., applying geographically varying local calibration) for developing TAMSAT contributed to its good performance (Greatrex et al., 2014; Fenta et al., 2018; Dinku et al., 2018). Next to TAMSAT, CHIRP also performed well in capturing rainfall over the Akaki catchment. Similar to the finding of this study, other studies also reported comparable performance of TAMSAT and CHIRP (Fenta et al., 2018; Tadesse et al., 2022). Belete et al. (2020) also showed similar performance, but in that study CHIRPS performed slightly better than TAMSAT. According to Dinku et al. (2018), the close performance of the two products likely occurred because they shared the same dataset for bias correction (i.e., CHPclim), and their difference emerged from the different algorithms used in the product development.
In contrast to the best SRE products for the Akaki catchment, PERSIANN exhibited the weakest performance in estimating rainfall over the study area. Other studies also depicted poor performance of the product over regions of the Upper Blue Nile basin (Bitew and Gebremichael, 2011; Bayissa et al., 2017). Compared to PERSIANN, CMORPH performed better in estimating rainfall over the Akaki catchment, and Bartsotas et al. (2018) also reported this in their study conducted over the subtropical area of the Blue Nile. Overall, the salient findings from the performance evaluation of the individual SRE products strongly agree with studies conducted over nearby regions. The most important finding of the current study is that BMA resulted in an improved blended rainfall product. A few other studies over large-scale regions in China and Pakistan (e.g., Ma et al., 2018b; Rahman et al., 2020b; Yin et al., 2021; Li et al., 2021) also indicated the favorable performance of blending as compared to single products. Further testing of BMA, as applied in this study, in smaller catchments of different regions is highly recommended to improve the spatiotemporal quality of rainfall estimates. This study also advocates further enriching the blending algorithm by developing more effective methodologies.

Conclusion
This study successfully tested Bayesian Model Averaging (BMA) to develop a high-quality blended rainfall dataset (2003-2019) using four high-resolution SRE products (CHIRP, CMORPH, PERSIANN, and TAMSAT). The dataset is prepared at a daily time step for a highly urbanized, 1500 km² catchment in Ethiopia. The catchment experiences significant annual rainfall variability (670-1250 mm). For blending, BMA-derived weightages were generated at daily time steps and combined with their SRE products. Daily, wet season, and mean annual rainfall performance evaluations were conducted for the blended rainfall product and the individual SRE products, which were intercompared and referenced to recorded rain gauge data. The main conclusions are:
• Time series patterns of BMA-derived weightages for each SRE product show large variability. Weightages of TAMSAT and CHIRP were consistently dominant across the rain gauge locations for periods of heavy and light rainfall, respectively. Analysis of weightages of individual SRE products during the dry season shows that weightages can be related to the topography of the catchment for the mountainous, plain, and low-lying areas, where TAMSAT, CHIRP, and CMORPH show the highest weights, respectively. This indicates that no individual rainfall product outperforms the others throughout and signifies the need to evaluate the blending of several products for optimal utilization of each product.
• Findings reveal that (i) the quality of the blended rainfall dataset significantly improved when compared to the individual SRE products, (ii) the satellite + gauge product (i.e., TAMSAT) showed comparable performance with the blended rainfall product, (iii) except for PERSIANN, which underestimated the gauged rainfall, the remaining SRE products show overestimation, and (iv) CMORPH and PERSIANN performed the least whereas TAMSAT and CHIRP showed good performance in capturing daily rainfall over the Akaki catchment.
• Assessment of the spatial variability of mean annual and wet season rainfall indicated that the blended rainfall product and CMORPH captured observed rainfall well across the catchment but the remaining individual SRE products showed less spatial variability. In terms of rainfall amount, the blended rainfall product matched the gauged data well in many parts of the catchment, including the mountainous areas in the northern part of the catchment.
• Assessing the spatially distributed residual difference between the interpolated rain gauge data and SRE products showed that the blended rainfall product significantly reduced the residual difference. At monthly, seasonal, and annual time scales, the distribution of residual difference for the blended product is nearly flat, the mean of residuals closely approached the targeted zero difference, and the number of grid cells for the respective value of residual differences is relatively smaller when compared to the individual SRE products.
• Overall, none of the individual satellite products outperformed the other individual products in capturing various aspects of the gauged rainfall data. However, the blended product outperformed the other products in many aspects. This indicates that rainfall estimation largely improved from blending multiple satellite products, which results in higher quality rainfall representations that better match rain gauge rainfall.

Fig. 1. (a) Location of Awash River Basin in Ethiopia, (b) location of Akaki catchment in Awash River Basin, and (c) geographic setting of Akaki catchment relative to its hosting Awash River Basin and the adjacent Blue Nile River Basin. The catchment is illustrated with its features including elevation, mountains, reservoir, and river systems, surrounding meteorological stations, and the boundary of Addis Ababa. The symbol size of the meteorological stations varies with their corresponding mean annual rainfall (1990-2019).

Fig. 2. Long-term and seasonally discrete daily rainfall distribution recorded at the selected meteorological stations in the Akaki catchment. Details of the stations are provided in Table 1.

Fig. 3. Flowchart for blending SRE products, evaluating its performance, and detecting spatial and temporal rainfall patterns over the Akaki catchment.

Fig. 5. Recursive process of computing daily weightages using gauged daily rainfall data corresponding to a forward-moving training window.
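The recursion in Fig. 5 can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a Gaussian kernel with one shared variance and fits the weights with a basic expectation-maximization loop, whereas BMA applied to rainfall often uses a different conditional distribution. The function names `bma_weights` and `moving_window_blend` are hypothetical; the 45-day default window follows Fig. 6.

```python
import math

def bma_weights(obs, est, n_iter=200, tol=1e-8):
    """EM estimate of BMA weights (assumed Gaussian kernel, shared variance).

    obs: list of T gauge values; est: list of T rows with K product values each.
    """
    T, K = len(obs), len(est[0])
    err2 = [[(obs[t] - est[t][k]) ** 2 for k in range(K)] for t in range(T)]
    w = [1.0 / K] * K
    sigma2 = max(sum(map(sum, err2)) / (T * K), 1e-6)
    for _ in range(n_iter):
        z_sum = [0.0] * K
        s2 = 0.0
        for t in range(T):
            m = min(err2[t])  # shift exponents for numerical stability
            like = [w[k] * math.exp(-0.5 * (err2[t][k] - m) / sigma2)
                    for k in range(K)]
            tot = sum(like)
            for k in range(K):
                z = like[k] / tot          # E-step: membership probability
                z_sum[k] += z
                s2 += z * err2[t][k]
        # M-step: update weights (floored, then renormalized) and variance
        w_new = [max(zs / T, 1e-12) for zs in z_sum]
        norm = sum(w_new)
        w_new = [v / norm for v in w_new]
        sigma2 = max(s2 / T, 1e-6)
        converged = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if converged:
            break
    return w

def moving_window_blend(obs, est, window=45):
    """Weights for day t are trained on the `window` days preceding t."""
    weights = [None] * len(obs)
    blended = [None] * len(obs)
    for t in range(window, len(obs)):
        w = bma_weights(obs[t - window:t], est[t - window:t])
        weights[t] = w
        blended[t] = sum(wk * fk for wk, fk in zip(w, est[t]))
    return weights, blended
```

Days earlier than the first full window are left as `None` here; how such days are handled in practice is a separate design choice.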

Fig. 6. Evaluation of length of training windows by MAE, RMSE, and MBE performance indicators. Box and whisker plots apply to the four wettest years and five stations (i.e., twenty data points). The optimum training window (i.e., 45 days) is marked in green.

Fig. 7. Accumulated daily fractional weightage of individual satellite rainfall estimates for the year 2010. Recorded daily rainfall at the five stations is plotted on the secondary axis in black broken lines.

Fig. 9. Box and whisker plots of daily rainfall (2003-2019) of the wet season as observed at five gauging stations and counterparts of the blended rainfall product and individual rainfall products (i.e., CHIRP, CMORPH, PERSIANN, and TAMSAT).

Fig. 11. Bar plots showing the amount and distribution of the residual difference between the IDW interpolated observation data and gridded SRE products over the Akaki catchment. Comparison is at monthly, seasonal, and annual time scales for the study period (2003-2019). The green broken lines and the red solid lines indicate the targeted zero difference and the mean of residual difference (MRD), respectively.
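The residual computation behind Fig. 11 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the IDW power of 2, the sign convention (SRE minus interpolated gauge value), and the names `idw`, `residuals`, and `mean_residual` are all hypothetical choices.

```python
def idw(x, y, stations, power=2.0):
    """Inverse distance weighted value at (x, y).

    stations: list of (xs, ys, value) tuples in the same planar units as x, y.
    """
    num = 0.0
    den = 0.0
    for xs, ys, value in stations:
        d2 = (x - xs) ** 2 + (y - ys) ** 2
        if d2 == 0.0:
            return value  # the point coincides with a station
        weight = 1.0 / d2 ** (power / 2.0)
        num += weight * value
        den += weight
    return num / den

def residuals(grid, stations, power=2.0):
    """Residual per grid cell, assumed here as SRE minus interpolated gauge.

    grid: list of (x, y, sre_value) tuples, one per grid cell.
    """
    return [sre - idw(x, y, stations, power) for x, y, sre in grid]

def mean_residual(res):
    """Mean of residual difference (MRD); zero is the targeted value."""
    return sum(res) / len(res)
```

For example, with two stations at equal distance from a cell, the interpolated value is their simple average, and the residual is the cell's SRE value minus that average.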

Table 1
Percentage of missing daily rainfall data recorded during 2003-2019 at the selected stations with their corresponding geographic coordinates, elevation, and unique identifiers, as used in this study.

Table 2
Satellite precipitation products used in this study with their sensor types (i.e., IR and MW that stand for infrared and passive microwave, respectively), data sources, spatiotemporal resolution, period of data availability, and number of grid elements that overlay the study area.

Table 3
List of performance indicators used in this study with their respective equation, range of value, and desired (i.e., optimum) value.
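Three of the indicators named in this study (MAE, RMSE, and MBE) can be sketched with their standard textbook definitions. The exact equations of Table 3 are not reproduced here, so the forms below, including the sign convention of MBE (estimate minus observation), are assumptions. Each indicator has an optimum value of zero.

```python
import math

def mae(obs, est):
    """Mean absolute error between estimates and observations."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)

def rmse(obs, est):
    """Root mean square error between estimates and observations."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def mbe(obs, est):
    """Mean bias error; positive values indicate overestimation (assumed sign)."""
    return sum(e - o for o, e in zip(obs, est)) / len(obs)
```

With observations `[0, 2, 4]` and estimates `[1, 2, 2]`, the errors are 1, 0, and -2, so the MAE is 1 and the MBE is negative, indicating net underestimation.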