Novel approach to integrate daily satellite rainfall with in-situ rainfall, Upper Tekeze Basin, Ethiopia

Daily rainfall is the most important and most demanded input of water resources studies, yet it is often constrained by the low density and/or poor quality of in-situ observations. Satellite earth observation, through freely available web-based products, can provide complementary rainfall data. Such data are, however, typically affected by substantial error, particularly at daily temporal resolution. Effective methods and protocols for rainfall downscaling, validation and bias correction are therefore needed. The aims of this study were to: i) validate two downscaled satellite-derived daily rainfall products, CHIRPS and MPEG, against in-situ observations; ii) merge the downscaled products with in-situ observations to improve their accuracy, and evaluate the merged products to select the better-performing one. The study was conducted in the topographically complex Upper Tekeze Basin (UTB), separately for the wet and dry seasons, over the period 1 January 2015 to 31 December 2018. Validation of the products, downscaled by nearest-neighbor (NN) and bilinear (BL) methods, was carried out using descriptive statistics, categorical statistics and bias decomposition, introducing a novel protocol with new bias indicators for each evaluation method. The validation showed large biases in both CHIRPS and MPEG: larger for CHIRPS than for MPEG, larger in the dry than in the wet season, and slightly larger for NN than for BL. To correct the biases of the downscaled CHIRPS and MPEG, each product was merged with in-situ observed rainfall using the Geographically Weighted Regression (GWR) algorithm, with the dependence of rainfall on altitude as the explanatory variable. The GWR merging method substantially improved the accuracy of both MPEG and CHIRPS, with slightly better final accuracy for MPEG than for CHIRPS, and better accuracy in the wet than in the dry season. This study confirmed that GWR merging can substantially reduce the daily bias of satellite rainfall products, even in topographically complex areas such as the UTB.
The application of the method can be further improved by densifying the rain gauge network and, eventually, by adding accuracy-effective explanatory variable(s).


Introduction
Water resources studies are challenged by the need for accurate data at the spatial and temporal coverage required by the problem at hand (Gebere et al., 2015). Rainfall is the key driving force in hydrological studies, required to understand the complexity of the water cycle and to manage water resources (Haile et al., 2009; Zambrano-Bigiarini et al., 2017). African agriculture and other water-related activities depend on the quantity of rainfall and also on its spatiotemporal variability. Although good-quality, well-distributed rainfall measurement is critical, meteorological gauging networks in Africa are insufficient in number and poorly distributed in space (Luetkemeier et al., 2018). They are particularly scarce in mountainous areas, where rainfall intensity is substantially higher than in lowlands and more variable in time and space (Rahmawati and Lubczynski, 2017). Obtaining high-quality rainfall data in mountainous areas at fine spatial and temporal (e.g. daily) resolution is therefore a demanding and challenging task (Hu et al., 2019).
In developing countries, particularly in elevated areas with complex topography, rainfall is highly variable in space and time while in-situ observations are scarce or unavailable (and, where available, representative only locally). Installing a dense, spatially representative rain gauge network in such areas is not realistic, as it requires a large number of locally installed rain gauges (Dinku et al., 2014; Alazzy et al., 2017). A common practice in water resources studies is to interpolate in-situ rainfall observations (Keblouti et al., 2012; Camera et al., 2013; Bárdossy and Pegram, 2013; Gebremedhin et al., 2018; Kahsay et al., 2019). However, rainfall interpolated from sparse observations is uncertain (Rahmawati and Lubczynski, 2017). This limitation is particularly critical in the Ethiopian highlands, where this study is situated, because rainfall there is relatively high and spatially variable (Haile et al., 2009) while the gauging network is sparse. Moreover, obtaining continuous daily rainfall from in-situ observations in the Ethiopian highlands is difficult because of frequent gaps in rain gauge records (Fenta et al., 2018). Hence, alternative data sources and methods of rainfall assessment are needed for water resources management and research, both of which require reliable spatio-temporal estimates of water input.
https://doi.org/10.1016/j.atmosres.2020.105135 Received 4 May 2020; Received in revised form 20 June 2020; Accepted 12 July 2020
Satellite-derived rainfall is a potential alternative or complementary source of spatio-temporal rainfall data, especially over complex and inaccessible terrain. Standard algorithms retrieve rainfall from the data of satellite Thermal Infrared (TIR) and Passive Microwave (PMW) sensors (Kidd, 2001). TIR-based rainfall rates are inferred from cloud-top temperatures, assuming that rainfall and cold-cloud duration are linearly correlated. The PMW-based approach, by contrast, estimates atmospheric liquid water content and rain rates by penetrating clouds. Both have limitations. TIR sensors do not penetrate clouds (Kidd, 2001), underestimate warm orographic rain and discriminate poorly between cirrus and rain clouds (Thiemig et al., 2013; Dinku et al., 2014). PMW sensors can confuse very cold surfaces, such as ice-covered mountain tops, with rainfall (Toté et al., 2015).
Despite ongoing improvement of satellite-derived rainfall products, site-specific validation of preselected products is needed to choose the most appropriate one for a given study area (Zambrano-Bigiarini et al., 2017; Fenta et al., 2018). Once a product is selected, it is usually downscaled and bias-corrected using ground-based rainfall measurements before being used in hydrological studies. Satellite-derived rainfall products are subject to errors, reflected in weak relationships between the remotely retrieved, pixel-wise rainfall rates and the ground-based, point-wise ones. Such errors can be due to: i) the scale difference between point-wise in-situ observations and pixel-wise satellite rainfall products (Dinku et al., 2014; Rahmawati and Lubczynski, 2017; Lekula et al., 2018); ii) technical constraints of satellite sensors (mentioned above); iii) the impact of environmental factors; and iv) temporal sampling constraints of satellites, i.e. a decline of accuracy with increasing temporal resolution (Kidd, 2001).
Satellite-derived rainfall products are typically available at a coarser resolution (larger pixel size) than water resources projects require, so they have to be downscaled. Nearest neighbor, bilinear and bicubic interpolation are common methods for downscaling the spatial resolution of pixels (Getreuer, 2011). For instance, Kimani et al. (2017) downscaled satellite-derived rainfall products over East Africa using nearest neighbor and bilinear methods. They found that nearest neighbor outperformed bilinear downscaling in representing ground-measured monthly rainfall. Other studies favored bilinear downscaling to generate smoothly interpolated satellite-derived rainfall before merging it with in-situ observations (e.g. Ulloa et al., 2017; Chen et al., 2019). The choice of downscaling method therefore requires site-specific testing.
Merging satellite-derived rainfall with locally available in-situ observations can improve the accuracy of a rainfall product (Dinku et al., 2011). Several approaches exist to merge satellite-derived rainfall products with in-situ rainfall. These include mean bias correction (Lekula et al., 2018), Bayesian merging (Todini, 2001; Ma et al., 2018; Kimani et al., 2018), objective analysis (Rozante et al., 2010) and optimal interpolation (Shen et al., 2014). However, such approaches do not consider explanatory variables that strongly influence rainfall, such as orography (Shi et al., 2020). In contrast, regression-based algorithms can include explanatory variables during merging. Recent literature introduced a regression-based method, Geographically Weighted Regression (GWR), which is also applied in this study. GWR can describe the spatially nonstationary relationship between rainfall and its explanatory variables (Hu et al., 2019), and it is a robust approach that can efficiently improve the quality of satellite-derived rainfall (Wu et al., 2015; Chao et al., 2018).
Studies in Ethiopia have validated satellite-derived rainfall by comparing in-situ point station records with pixel rainfall values at different spatial and temporal scales. The most frequently validated satellite-derived products are the Tropical Rainfall Measuring Mission (TRMM), the Climate Prediction Center Morphing technique (CMORPH) and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Fenta et al., 2018). Dinku et al. (2008) evaluated these products over the very complex topography of Ethiopia and the less rugged topography of Zimbabwe. Their results indicated that all these satellite products performed poorly at daily resolution, particularly over the complex topography of Ethiopia. According to Dinku et al. (2011), this poor performance could be attributed to coarse spatial resolution: pixels that average wet and dry areas might be misidentified as non-raining pixels.
The accuracy of satellite products in detecting rainfall increases towards coarser temporal resolution (Zambrano-Bigiarini et al., 2017; Gebrechorkos et al., 2018). This is due to temporal accumulation of the data (over a week or, even better, a month) and the resulting cancellation of errors (Tian et al., 2007; Aghakouchak et al., 2012; Fenta et al., 2018). It is therefore challenging to find even a relatively accurate satellite-derived rainfall product at a daily time scale. Yet daily rainfall is nowadays the most demanded in hydrology and water resources management, because it is typically required as the driving input of integrated hydrological models (IHM) used in water management.
Only a few satellite-based studies have focused on daily rainfall assessment, and even fewer on Africa. One example is the study by Worqlul et al. (2018) of the Multi-Sensor Precipitation Estimate-Geostationary satellite (MPEG) product in the highlands of the upper Blue Nile Basin. Their results indicated that, before any bias correction, MPEG poorly captured the in-situ observations, and that even after mean bias correction (MBC) it still substantially underestimated rainfall. In another daily rainfall assessment, Fenta et al. (2018) evaluated Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) in the Lake Tana Basin (the source of the Blue Nile). They reported that CHIRPS, despite its relatively high spatio-temporal resolution, weakly captured daily rainfall variability in that topographically complex area. Previous validation approaches, especially in the Blue Nile Basin (Ethiopia), were not suitable for the Upper Tekeze Basin (UTB), because none of them fulfilled all the conditions required by this study, i.e.: i) high accuracy of rainfall data in topographically complex (mountainous) terrain; ii) daily temporal resolution; and iii) a spatial resolution of at least 1 km.
Evaluations of daily satellite rainfall products against in-situ observations have typically revealed large biases. The weak performance of satellite-derived products at the daily time scale has mainly been attributed to three factors: the fine temporal resolution, high rainfall variability over topographically complex areas, and the scale difference between satellite-derived rainfall products and in-situ observations (Rahmawati and Lubczynski, 2017). Moreover, selecting a suboptimal evaluation method or a stationary bias correction method (e.g. MBC) can lead to incorrect conclusions about the performance of satellite products. Feeding daily satellite-derived rainfall directly into hydrological models can therefore propagate errors that severely affect model results. A method is needed that merges daily satellite and ground-based data to substantially improve daily satellite rainfall estimates.
This study focused on daily rainfall data from two satellite rainfall products, MPEG and CHIRPS, which provide rainfall estimates at 3 km and about 5 km spatial resolution, respectively. These products were selected because they provide daily data at the highest available spatial resolution over the UTB. The objectives of this study were to: evaluate the daily performance of the two products after downscaling their daily estimates to 1 km spatial resolution; improve their accuracy by merging the downscaled products with in-situ observations; and evaluate the merged products to select the better-performing one. The selected product will provide accurate daily input data for the IHM of the UTB (not part of this paper).
The main novelty of this study lies in: i) new testing indices introduced into all steps of the validation of daily satellite rainfall products; ii) use of the GWR method in a semi-arid, topographically complex, data-scarce area; iii) important performance-related information for potential users of MPEG and CHIRPS products; and iv) the first assessment of daily satellite rainfall in the UTB.

Study area
The study area, UTB (Northern Ethiopia), is located between latitudes 12° 38′ 12″ and 13° 20′ 16″ N and longitudes 38° 59′ 23″ and 39° 40′ 05″ E. It covers an area of about 3500 km² (Fig. 1). According to the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), elevation ranges from 1230 m a.s.l. at the western catchment outlet to 3948 m a.s.l. in the eastern mountains. More than 50% of the study area has slopes of less than 15°, while ~12% has slopes greater than 30°.
Rainfall in the study area is highly variable in space and time. According to in-situ gauge measurements (Fig. 1) from 2015 to 2018, mean annual rainfall varied from ~400 mm in the western lowlands to ~940 mm in the eastern mountains. Elevation and mean annual rainfall are positively correlated in the study area. The correlation is r = 0.65 when all in-situ observations are considered and much higher (r = 0.9) when the outlying Adigudem and Debub rainfall stations are excluded. The study area has two main seasons. The short, four-month wet season lasts from June to September; more than 80% of the yearly rainfall falls during it, with maximum rainfall in August. The remaining months form the dry season. The southeastern area around Maichew and Hashenge differs slightly. There, the wet season contributes relatively less rainfall, 73% and 78% of the yearly total, respectively, and dry season rainfall is relatively higher. Mean annual temperatures vary from about 11 °C in the eastern mountains to about 31 °C in the western lowlands. The highest mean monthly temperature occurs in May and the lowest in December.

Data acquisition
Ground-based rainfall data and CHIRPS and MPEG remote sensing rainfall data were acquired at daily time resolution for the four years from 1 January 2015 to 31 December 2018.

Ground-based data
Daily meteorological data for the ten stations (Fig. 1) were obtained from the Ethiopian National Meteorological Agency (NMA) in Tigray regional state. Records for nine stations cover 1 January 2015 to 31 December 2018, and the record for one station (Finarawa) covers 1 January 2015 to 31 December 2017. Of the 10 stations, one is an automatic weather station that records weather parameters hourly (referred to as Class 1). Five stations record daily rainfall and maximum and minimum temperature (Class 3), and the remaining four record only daily rainfall (Class 4). The geographical coordinates of the stations obtained from NMA were inconsistent with those in other published sources. The station locations were therefore verified during fieldwork for this study using a Garmin eTrex GPS with ±5 m accuracy, and the verified locations are presented in Fig. 1. The spatial distribution of the rainfall stations is not uniform: they are very sparse in the western part but, fortunately, relatively dense in the topographically complex eastern area. The quality of the station data was checked. All stations except Adigudem, Maichew and Wedisemero have some missing records, with the largest gaps at Finarawa (Table 1). In general, more of the missing data fall in the dry season (~73%) than in the wet season (~27%).
Fig. 1. Location of the UTB with its elevation and location of in-situ rainfall gauging stations used in this study.
M.A. Gebremedhin, et al. Atmospheric Research 248 (2021) 105135

Remote sensing data
The CHIRPS satellite rainfall product is a quasi-global (50°S to 50°N) gridded product with a spatial resolution of 0.05° (~5 km) and daily, pentadal, dekadal and monthly temporal resolution. The CHIRPS algorithm integrates data from the Climate Hazards Group Precipitation Climatology (CHPclim), TIR-based satellite rainfall and in-situ measurements. CHPclim uses long-term average satellite rainfall fields as guides to derive climatological surfaces. The daily CHIRPS rainfall product was freely downloaded from the Climate Hazards Group at https://data.chc.ucsb.edu.
The MPEG satellite rainfall product covers the Earth from 60°S to 60°N with a spatial resolution of 3 km and a high temporal resolution of 15 min. It is produced by the meteorological product extraction facility (MPEF) of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). MPEG is derived from infrared data of the EUMETSAT geostationary satellites, with the algorithm continuously recalibrated against rain rates derived from polar-orbiting microwave sensors (Worqlul et al., 2018). The algorithm combines Meteosat Second Generation (MSG) images from the infrared IR10.8 μm channel with passive microwave data from the Special Sensor Microwave/Imager (SSM/I) on the polar satellites of the United States Defense Meteorological Satellite Program (DMSP). MPEG data are available through GEONETCast, a near-real-time global network of satellite-based data dissemination systems designed to distribute space-based, airborne and in-situ data. The MPEG product can be obtained free of charge from the file transfer server of the International Institute for Geo-Information Science and Earth Observation (ITC): https://filetransfer.itc.nl/pub/mpe/. For this study, the 15 min MPEG time steps were aggregated to a daily time interval (00:00-23:45 UTC) using the Integrated Land and Water Information System (ILWIS) software. Since 1 July 2019, EUMETSAT-MPEG has been succeeded by the Hydrology SAF H05B product, which can be accessed through http://hsaf.meteoam.it or https://filetransfer.itc.nl/pub/mpe/.
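The daily aggregation of MPEG can be illustrated with a minimal NumPy sketch. It is not the ILWIS procedure used in the study. It assumes that each 15 min MPEG value is a rain rate in mm/h held constant over its slot, and that missing slots contribute zero; the function name `daily_total_from_rates` is hypothetical.

```python
import numpy as np

SLOTS_PER_DAY = 96        # quarter-hour slots from 00:00 to 23:45 UTC
SLOT_HOURS = 0.25         # duration of one slot in hours


def daily_total_from_rates(rates_mm_per_h: np.ndarray) -> np.ndarray:
    """Aggregate one UTC day of 15-min rain-rate grids to a daily depth (mm/day).

    rates_mm_per_h has shape (96, ny, nx); NaN marks a missing slot.
    Assumption: each value is a rain rate in mm/h held constant over its
    15-min slot, so each slot contributes rate * 0.25 h of rain depth.
    """
    if rates_mm_per_h.shape[0] != SLOTS_PER_DAY:
        raise ValueError("expected 96 quarter-hour slots for one UTC day")
    # Missing slots contribute zero here (assumption); a stricter rule could
    # instead reject days with too many gaps.
    return np.nansum(rates_mm_per_h * SLOT_HOURS, axis=0)


# Usage: a day with a constant 2 mm/h rate gives 96 * 0.25 * 2 = 48 mm per cell.
day = np.full((SLOTS_PER_DAY, 2, 3), 2.0)
print(daily_total_from_rates(day))
```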

General approach
Pearson's product-moment correlation coefficient (r) of daily rainfall was used to assess the correlation among in-situ UTB rainfall observations, as an indicator of spatial rainfall variability. The relationship between the correlation coefficient (r) of each pair of gauges and the distance between them was also tested.
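A minimal sketch of this pairwise gauge analysis follows. It assumes each station's daily series is a NumPy array aligned on the same dates, with NaN for missing days, and computes great-circle (haversine) distances from latitude/longitude. Function names and the toy data in the usage lines are hypothetical.

```python
import itertools
import math

import numpy as np


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pairwise_r_and_distance(series, coords):
    """series: name -> daily rainfall array (NaN = missing), same dates for all.
    coords: name -> (lat, lon). Returns (name_a, name_b, r, distance_km) tuples."""
    results = []
    for a, b in itertools.combinations(sorted(series), 2):
        x = np.asarray(series[a], float)
        y = np.asarray(series[b], float)
        both = ~np.isnan(x) & ~np.isnan(y)      # days recorded by both gauges
        r = np.corrcoef(x[both], y[both])[0, 1]
        d = haversine_km(*coords[a], *coords[b])
        results.append((a, b, float(r), d))
    return results


# Usage with toy values (not UTB data):
series = {"A": [0, 1, 2, 3, np.nan], "B": [0, 2, 4, 6, 1], "C": [3, 2, 1, 0, 0]}
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 0.0)}
for row in pairwise_r_and_distance(series, coords):
    print(row)
```

Plotting r against distance for all pairs then shows whether correlation decays with separation.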
Preprocessing and analysis of the daily satellite rainfall were performed using the open-source ILWIS Water and Food Security Ethiopia Toolbox and ArcGIS. Satellite-derived rainfall has to be validated and bias-corrected before use in hydrological applications (Rahmawati and Lubczynski, 2017; Kimani et al., 2017; Lekula et al., 2018). The MPEG and CHIRPS products were therefore first validated against ground measurements to evaluate and compare their performance in the UTB. For CHIRPS, Navas et al. (2019) recommended that, despite its inherent blending procedure, the product can still be improved by merging it with in-situ observations not used in that blending. The list of rainfall stations used in this study was checked. Except for the Maichew gauge record in 2018, none of the 2015-2018 in-situ observations (Fig. 1 and Table 1) was used in the CHIRPS blending procedure. A flowchart in Fig. 2 illustrates the main steps followed to validate the satellite-derived rainfall and to merge the products with the in-situ observations. The evaluation methods used in the validation step were used again after merging to evaluate the performance of the merged rainfall data.

Satellite-derived rainfall evaluation
The daily MPEG and CHIRPS rainfall estimates were downscaled to 1 km resolution using nearest neighbor (NN) and bilinear (BL) interpolation to enhance the spatial resolution of the satellite-derived products. To evaluate the downscaling approaches, the original and downscaled pixel-wise rainfall estimates of MPEG and CHIRPS were compared with the coinciding point-wise rainfall measured at each in-situ station. Where an in-situ observation was not available, the corresponding satellite pixel value was discarded from the analysis. Descriptive statistics, categorical statistics and bias decomposition were applied to evaluate the performance of the satellite rainfall products (Ayehu et al., 2018; Dinku et al., 2018; Lekula et al., 2018). The evaluation was carried out on a daily basis and per season, i.e. separately for the wet (1 June to 30 September) and dry (1 October to 31 May) seasons, because the UTB receives most of its rain during the short wet season and comparatively little during the long dry season.
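The two downscaling methods and the seasonal split can be sketched with NumPy alone. This is an illustrative implementation, not the ILWIS/ArcGIS resampling used in the study. It assumes ascending, evenly spaced cell-centre coordinates and clamps points beyond the outer cell centres to the edge values. All function names are hypothetical.

```python
import numpy as np


def _frac_index(coord, centres):
    """Fractional position of coord along ascending cell-centre coordinates.
    np.interp clamps values outside the range to the edge cells."""
    return np.interp(coord, centres, np.arange(centres.size, dtype=float))


def nearest(grid, yc, xc, y, x):
    """Nearest-neighbour (NN) sample of a coarse grid at points (y, x)."""
    iy = np.rint(_frac_index(y, yc)).astype(int)
    ix = np.rint(_frac_index(x, xc)).astype(int)
    return grid[iy, ix]


def bilinear(grid, yc, xc, y, x):
    """Bilinear (BL) sample of a coarse grid at points (y, x)."""
    fy, fx = _frac_index(y, yc), _frac_index(x, xc)
    y0, x0 = np.floor(fy).astype(int), np.floor(fx).astype(int)
    y1, x1 = np.minimum(y0 + 1, yc.size - 1), np.minimum(x0 + 1, xc.size - 1)
    ty, tx = fy - y0, fx - x0
    top = grid[y0, x0] * (1 - tx) + grid[y0, x1] * tx
    bottom = grid[y1, x0] * (1 - tx) + grid[y1, x1] * tx
    return top * (1 - ty) + bottom * ty


def downscale(grid, yc, xc, factor, method="bilinear"):
    """Resample onto cells `factor` times smaller (e.g. 5 for ~5 km -> ~1 km).
    Returns the fine cell-centre coordinates and the resampled grid."""
    dy, dx = yc[1] - yc[0], xc[1] - xc[0]
    fine_y = yc[0] - dy / 2 + (np.arange(yc.size * factor) + 0.5) * dy / factor
    fine_x = xc[0] - dx / 2 + (np.arange(xc.size * factor) + 0.5) * dx / factor
    Y, X = np.meshgrid(fine_y, fine_x, indexing="ij")
    sampler = nearest if method == "nearest" else bilinear
    return fine_y, fine_x, sampler(grid, yc, xc, Y, X)


def season(month):
    """Wet season: 1 June to 30 September; dry season: 1 October to 31 May."""
    return "wet" if 6 <= month <= 9 else "dry"
```

NN copies each coarse value into all of its sub-cells, while BL produces a smooth surface bounded by the surrounding coarse values. This difference is why the two methods can score differently against point gauges.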

Descriptive statistics
Descriptive statistics were used to compare the satellite-derived rainfall at pixel level with the corresponding in-situ observations. They include: i) Pearson's product-moment correlation coefficient (r); ii) Mean Error (ME); iii) Mean Absolute Error (MAE); iv) Root Mean Square Error (RMSE); v) Normalized Mean Absolute Error (N_MAE); and vi) Normalized Root Mean Square Error (N_RMSE). The equations of the descriptive statistics are as follows:
$$r = \frac{\sum_{i=1}^{T}\left(P_{gi}-\bar{P}_g\right)\left(P_{si}-\bar{P}_s\right)}{\sqrt{\sum_{i=1}^{T}\left(P_{gi}-\bar{P}_g\right)^2}\,\sqrt{\sum_{i=1}^{T}\left(P_{si}-\bar{P}_s\right)^2}}$$
$$\mathrm{ME} = \frac{1}{T}\sum_{i=1}^{T}\left(P_{si}-P_{gi}\right)$$
$$\mathrm{MAE} = \frac{1}{T}\sum_{i=1}^{T}\left|P_{si}-P_{gi}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i=1}^{T}\left(P_{si}-P_{gi}\right)^2}$$
$$N_{MAE} = \frac{\mathrm{MAE}}{\bar{P}_g}, \qquad N_{RMSE} = \frac{\mathrm{RMSE}}{\bar{P}_g}$$
where P_si is satellite-derived rainfall on day i; P_gi is in-situ rainfall on day i; P̄_g and P̄_s are the average in-situ and average satellite rainfall, respectively; and T is the total number of daily data pairs. r ranges from −1 to +1. A value of 0 indicates no linear relationship between the satellite-derived rainfall and in-situ observations, +1 a perfect positive linear correlation, and −1 a perfect negative linear correlation. ME ranges from −∞ to +∞. Positive values indicate overestimation of rainfall by the satellite product, negative values underestimation, and values close to zero close agreement. MAE, RMSE, N_MAE and N_RMSE range from 0 to ∞, with the best value at 0. MAE measures the average magnitude of the errors between satellite and in-situ rainfall without considering their direction. RMSE is a quadratic scoring rule and is therefore more sensitive to rainfall outliers than MAE. N_MAE and N_RMSE express MAE and RMSE relative to P̄_g.
Table 1. Geographical locations of rainfall stations as in Fig. 1.
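A minimal sketch of these statistics for one station follows. It assumes paired daily arrays in mm/day with NaN for missing records, and takes the normalizing quantity to be the mean in-situ rainfall. The function name is hypothetical.

```python
import numpy as np


def descriptive_stats(p_sat, p_gauge):
    """r, ME, MAE, RMSE, N_MAE and N_RMSE for paired daily rainfall (mm/day).

    Days where either value is NaN are dropped, mirroring the rule that a
    satellite pixel value is discarded when the gauge has no record.
    """
    ps, pg = np.asarray(p_sat, float), np.asarray(p_gauge, float)
    valid = ~np.isnan(ps) & ~np.isnan(pg)
    ps, pg = ps[valid], pg[valid]
    err = ps - pg                        # positive = satellite overestimation
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    pg_mean = pg.mean()                  # mean in-situ rainfall (P_g bar)
    return {
        "r": float(np.corrcoef(ps, pg)[0, 1]),
        "ME": float(err.mean()),
        "MAE": float(mae),
        "RMSE": float(rmse),
        "N_MAE": float(mae / pg_mean),
        "N_RMSE": float(rmse / pg_mean),
    }


# Usage with toy values: the satellite is exactly 1 mm/day too high on valid days.
print(descriptive_stats([1, 3, 5, 7, 9], [0, 2, 4, 6, np.nan]))
```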

Categorical statistics
Categorical statistics are a standard way to evaluate the detection capability of satellite-derived rainfall (Fenta et al., 2018; Lekula et al., 2018). They include the following indices: i) Probability of Detection (POD); ii) False Alarm Ratio (FAR); iii) Critical Success Index (CSI); iv) Frequency Bias (FBS); and v) Heidke Skill Score (HSS). These statistics are computed from four combinations of satellite-derived rainfall and in-situ observations, which verify the frequency of correct and incorrect rainfall detection at a given temporal resolution. The four combinations are: Hit (H; rainfall measured in situ and detected by the satellite); Miss (M; rainfall measured in situ but not detected by the satellite); False alarm (FA; no rainfall measured in situ but rainfall detected by the satellite); and Correct Negative (CN; no rainfall measured in situ and none detected by the satellite). The expressions of the categorical statistics based on a contingency table are presented in Table 2.
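The contingency counts and standard scores can be sketched as follows. The rain/no-rain threshold of 1 mm/day is a hypothetical illustration value, since the text does not state one here. The score expressions are the standard textbook forms rather than a copy of Table 2, and the function names are hypothetical.

```python
import numpy as np


def contingency(p_sat, p_gauge, threshold=1.0):
    """Count Hit, Miss, False alarm and Correct Negative days.

    threshold (mm/day) separating rain from no-rain days is a hypothetical
    illustration value. Days with NaN in either series are ignored.
    """
    ps, pg = np.asarray(p_sat, float), np.asarray(p_gauge, float)
    valid = ~np.isnan(ps) & ~np.isnan(pg)
    sat_wet, gauge_wet = ps[valid] >= threshold, pg[valid] >= threshold
    H = int(np.sum(sat_wet & gauge_wet))
    M = int(np.sum(~sat_wet & gauge_wet))
    FA = int(np.sum(sat_wet & ~gauge_wet))
    CN = int(np.sum(~sat_wet & ~gauge_wet))
    return H, M, FA, CN


def categorical_scores(H, M, FA, CN):
    """Standard contingency-table scores (undefined if a denominator is zero)."""
    return {
        "POD": H / (H + M),
        "FAR": FA / (H + FA),
        "CSI": H / (H + M + FA),
        "FBS": (H + FA) / (H + M),
        "HSS": 2 * (H * CN - FA * M)
        / ((H + M) * (M + CN) + (H + FA) * (FA + CN)),
    }


# Usage with toy counts:
print(categorical_scores(10, 5, 5, 80))
```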
All the categorical indices above except POD depend on FA, which is not an adequate indicator for evaluating the performance of satellite-derived rainfall products (Rahmawati and Lubczynski, 2017). A rain cloud may move through an analyzed pixel without covering the location of the in-situ gauge (Lekula et al., 2018). In that case the gauge records no rainfall, although there is rainfall within the pixel that the satellite may record properly. Standard categorical statistics then interpret such an event as FA, even though the satellite in reality detected a rain event correctly. This study therefore proposes two new categorical indicators. The first is the fraction of miss (FM), which is similar to POD but has the more adequate M in the numerator instead of H (see Discussion). The second is the fraction of detection (FD), which involves H, M and Correct Negative (CN) and is used to evaluate the overall detection capability of a satellite product.
$$\mathrm{FM} = \frac{M}{H + M}$$
FM is the ratio of the number of wet days not detected by the satellite to the total number of wet days recorded by a gauge (H + M). It ranges from 0 to 1, with 0 representing perfect satellite rainfall detection. FD is the overall capability of a satellite to detect rain and no-rain events and ranges from 0 to a perfect value of 1.
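The two proposed indicators can be written as small functions. FM follows directly from its definition above. For FD, the text lists the terms (H, M and CN) but not the exact expression, so the form (H + CN)/(H + M + CN) used here is an assumption. Function names are hypothetical.

```python
def fraction_of_miss(H, M):
    """FM = M / (H + M): share of gauge-recorded wet days that the satellite
    missed (0 = perfect detection)."""
    return M / (H + M)


def fraction_of_detection(H, M, CN):
    """FD built from H, M and CN only, so false alarms do not enter the score.

    ASSUMED form (H + CN) / (H + M + CN): the text names the terms involved,
    but the exact expression is not reproduced in this section.
    """
    return (H + CN) / (H + M + CN)


# Usage with toy counts: H=10, M=5, CN=80.
print(fraction_of_miss(10, 5), fraction_of_detection(10, 5, 80))
```

Because POD = H/(H + M), FM is simply 1 − POD. It reports the same information from the miss side.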

Bias decomposition
To gain more insight into the sources of satellite rainfall errors, the total bias between satellite-derived and in-situ observed rainfall can be decomposed into Hit bias (HB), Miss bias (MB) and False alarm bias (FAB) (Habib et al., 2009). However, FAB, which is based on FA, cannot be considered a reliable bias indicator because of the same problem with FA explained above. This study therefore used HB, Absolute Hit Bias (AHB) and MB to decompose and analyze the biases between satellite-derived estimates and in-situ observed rainfall. In addition, two new bias decomposition indicators are proposed: Normalized Absolute Hit Bias (N_AHB) and Normalized Miss Bias (N_MB). They normalize the absolute hit and miss biases by the total rainfall accumulated at a given gauge.
$$\mathrm{HB} = \sum_{i \in \mathcal{H}} \left(P_{si} - P_{gi}\right), \qquad \mathrm{AHB} = \sum_{i \in \mathcal{H}} \left|P_{si} - P_{gi}\right|$$
$$\mathrm{MB} = -\sum_{i \in \mathcal{M}} P_{gi}$$
$$N_{AHB} = \frac{\mathrm{AHB}}{\sum_{i=1}^{T} P_{gi}}, \qquad N_{MB} = \frac{\mathrm{MB}}{\sum_{i=1}^{T} P_{gi}}$$
where P_si and P_gi are satellite and in-situ rainfall, respectively, on valid day i; 𝓗 and 𝓜 are the sets of hit and miss days among the valid days; and T is the total number of valid daily data pairs.
HB is positive if the satellite-derived rainfall is higher than the in-situ rainfall. This can be interpreted as satellite overestimation of correctly detected ground-based rainfall, while a negative HB indicates satellite underestimation of correctly detected rainfall. MB accumulates the rainfall missed by the satellite product, and its sign is always negative. AHB and N_AHB are always positive, while N_MB has the same sign as MB.
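The decomposition for one gauge can be sketched as below. It assumes paired daily arrays with NaN for missing records and a hypothetical 1 mm/day rain/no-rain threshold for classifying hit and miss days. The function name is hypothetical.

```python
import numpy as np


def bias_decomposition(p_sat, p_gauge, threshold=1.0):
    """HB, AHB, MB, N_AHB and N_MB for one gauge, summed over valid days.

    threshold (mm/day) is a hypothetical rain/no-rain cut-off used to
    classify hit and miss days.
    """
    ps, pg = np.asarray(p_sat, float), np.asarray(p_gauge, float)
    valid = ~np.isnan(ps) & ~np.isnan(pg)
    ps, pg = ps[valid], pg[valid]
    hit = (ps >= threshold) & (pg >= threshold)
    miss = (ps < threshold) & (pg >= threshold)
    hb = np.sum(ps[hit] - pg[hit])            # signed hit bias
    ahb = np.sum(np.abs(ps[hit] - pg[hit]))   # absolute hit bias, >= 0
    mb = -np.sum(pg[miss])                    # missed gauge rainfall, <= 0
    total = np.sum(pg)                        # total rainfall at the gauge
    return {"HB": hb, "AHB": ahb, "MB": mb,
            "N_AHB": ahb / total, "N_MB": mb / total}


# Usage with toy values: one hit (3 vs 2 mm), one false alarm, one miss (5 mm).
print(bias_decomposition([0, 3, 2, 0], [0, 2, 0, 5]))
```

The false alarm day contributes to none of the indicators, consistent with excluding FAB.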

Merging satellite-derived and in-situ rainfall
This study applied a geographically weighted regression (GWR) merging approach for bias correction of the satellite rainfall products. GWR is a robust algorithm that has been used to merge satellite-derived rainfall products with in-situ rainfall (Hu et al., 2015; Chao et al., 2018). Moreover, GWR is suitable for analyzing the spatial relationship between rainfall events and their influencing factors, such as topography (Lv and Zhou, 2016). The general formula of GWR by Brunsdon et al. (1996) is as follows:
$$Y_i = \beta_{i0}(u_i, v_i) + \sum_{j=1}^{m} \beta_{ij}(u_i, v_i)\, X_{ij} + \varepsilon_i, \qquad i = 1, \dots, n \tag{14}$$
where Y_i is the dependent and X_ij the independent variable at location i; u_i and v_i are the geographical coordinates; β_i0 is the intercept; β_ij(u_i, v_i) is the local regression coefficient for X_ij; ε_i is the regression residual at the i-th location; m is the number of independent variables; and n is the number of observations. Taking elevation (E_i) as the independent explanatory variable and the bias between satellite-derived rainfall and in-situ observations (B_i) as the dependent variable, Eq. 14 can be rewritten as Eq. 15:
$$B_i = \beta_{i0}(u_i, v_i) + \beta_{i1}(u_i, v_i)\, E_i + \varepsilon_i \tag{15}$$
The regression coefficients are estimated using the following equation, after Fotheringham et al. (1998):
$$\hat{\boldsymbol{\beta}}(u_i, v_i) = \left(\mathbf{X}^{\mathsf{T}} \mathbf{W}(u_i, v_i)\, \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{W}(u_i, v_i)\, \mathbf{B}$$
where X is the n × 2 matrix of explanatory data (a column of ones for the intercept and a column of station elevations E_i), W(u_i, v_i) is the n × n diagonal matrix of geographical weights for location i, and B is the bias vector (difference between downscaled satellite rainfall and in-situ rainfall).
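The following minimal NumPy sketch computes the local estimate of Eq. 15, using the adaptive Gaussian kernel and leave-one-out bandwidth selection specified in the next paragraph. It is not the ArcGIS implementation used in the study. Here the adaptive bandwidth is taken as the distance to the k-th nearest gauge, the cross-validation score is a plain sum of squared errors, and coordinates are assumed to be projected (e.g. in km). Function names and the synthetic data are hypothetical.

```python
import numpy as np


def gwr_coefficients(xy_obs, elev_obs, bias_obs, xy_target, k):
    """Local [b0, b1] of B = b0 + b1 * E at each target location (Eq. 15 form).

    Adaptive Gaussian kernel: the bandwidth at a target is the distance to its
    k-th nearest gauge, and the weights are w = exp(-0.5 * (d / bandwidth)^2).
    Scaling rows by sqrt(w) and solving least squares is equivalent to
    (X' W X)^-1 X' W B.
    """
    X = np.column_stack([np.ones(len(elev_obs)), elev_obs])
    out = np.empty((len(xy_target), 2))
    for t, point in enumerate(xy_target):
        d = np.linalg.norm(xy_obs - point, axis=1)
        bandwidth = max(np.sort(d)[k - 1], 1e-12)   # guard against zero distance
        sw = np.sqrt(np.exp(-0.5 * (d / bandwidth) ** 2))
        out[t] = np.linalg.lstsq(X * sw[:, None], bias_obs * sw, rcond=None)[0]
    return out


def select_k_loocv(xy_obs, elev_obs, bias_obs, candidates):
    """Choose k by leave-one-out cross-validation; candidates must be <= n - 1."""
    n = len(bias_obs)
    best_k, best_cv = None, np.inf
    for k in candidates:
        cv = 0.0
        for i in range(n):
            keep = np.arange(n) != i           # drop gauge i, then predict it
            b0, b1 = gwr_coefficients(xy_obs[keep], elev_obs[keep],
                                      bias_obs[keep], xy_obs[i:i + 1], k)[0]
            cv += (bias_obs[i] - (b0 + b1 * elev_obs[i])) ** 2
        if cv < best_cv:
            best_k, best_cv = k, cv
    return best_k, best_cv


# Usage with synthetic, hypothetical data (10 gauges, coordinates in km):
rng = np.random.default_rng(0)
xy = rng.uniform(0, 60, size=(10, 2))
elev = rng.uniform(1230, 3948, size=10)
bias = 2.0 - 0.001 * elev + rng.normal(0, 0.1, size=10)
k, cv = select_k_loocv(xy, elev, bias, candidates=range(3, 10))
beta = gwr_coefficients(xy, elev, bias, xy, k)
residuals = bias - (beta[:, 0] + beta[:, 1] * elev)   # input to residual interpolation
```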
In GWR, the kernel type and bandwidth method need to be specified to define the geographical weighting function. In this study, an adaptive Gaussian kernel function (Fotheringham et al., 1998) was used to compute the weights. This kernel adapts well to sparse observation data and adjusts the bandwidth to optimize the spatial weight matrix (Kerm, 2003). The optimal bandwidth was determined by leave-one-out cross-validation, as suggested by Lv and Zhou (2016) for local regression; the cross-validation also indicates the uncertainty inherent in the GWR method. The inverse distance weighted (IDW) method was applied to interpolate the GWR residuals from the observation locations, because this method can optimally smooth GWR residuals in topographically complex areas.
Table 3. Daily correlation coefficients and distances between in-situ rainfall observations at UTB within 1 January 2015 to 31 December 2018, except for the Finarawa record, which ends on 31 December 2017. Numbers before parentheses are correlation coefficients; numbers in parentheses are distances between stations in kilometres.
The scale difference between point-wise in-situ measured rainfall and pixel-wise satellite rainfall estimates is one of the most important reasons for overestimation or underestimation of rainfall by satellite products (Haile et al., 2013; Dinku et al., 2014). To reduce that difference, it is important to downscale the product to a finer resolution before merging (Shi et al., 2020). Accordingly, after validating the MPEG and CHIRPS products against the available ground-based data, this study merged the products downscaled to a 1 km grid with the in-situ observations. The merging was performed in four steps. First, the differences (biases) between satellite rainfall and in-situ rainfall at all available stations were computed.
Second, the GWR model was used to estimate the spatial distribution of regression coefficients and regression residuals at each observation point, assuming that the biases in the satellite products depend on elevation. Third, the spatial regression coefficients and regression residuals were substituted into Eq. 15 to compute the bias at each grid cell. A sample of such uncertainty maps, for the last dates of the wet and dry seasons only, is given in Appendix 1. Fourth, the spatial biases were subtracted from the downscaled satellite-derived rainfall to obtain bias-corrected rainfall at a daily time interval.
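Steps 1, 3 and 4 can be sketched as below. The sketch takes local coefficients already estimated for every grid cell (step 2) and GWR residuals interpolated to the cells with IDW. An IDW power of 2 and clipping of negative corrected rainfall to zero are assumptions not stated in the text. Function names are hypothetical.

```python
import numpy as np


def station_bias(sat_at_gauges, gauge_rain):
    """Step 1: bias = downscaled satellite rainfall minus in-situ rainfall."""
    return np.asarray(sat_at_gauges, float) - np.asarray(gauge_rain, float)


def idw(xy_obs, values, xy_target, power=2.0):
    """Inverse-distance-weighted interpolation of gauge residuals to targets.
    power=2 is a common default and an assumption here."""
    out = np.empty(len(xy_target))
    for t, point in enumerate(xy_target):
        d = np.linalg.norm(xy_obs - point, axis=1)
        if np.any(d == 0):                 # target sits on a gauge: use its value
            out[t] = values[np.argmin(d)]
        else:
            w = d ** -power
            out[t] = np.sum(w * values) / np.sum(w)
    return out


def merge_step(sat_fine, elev_fine, beta_fine, resid_fine, clip_negative=True):
    """Steps 3-4: bias = b0 + b1 * E + residual (Eq. 15), then subtract it.

    sat_fine, elev_fine, resid_fine: grid-cell values of the same shape;
    beta_fine: local [b0, b1] per cell, shape (..., 2).
    """
    bias = beta_fine[..., 0] + beta_fine[..., 1] * elev_fine + resid_fine
    corrected = sat_fine - bias
    # Assumption: negative depths are set to 0 mm; the text does not state this.
    return np.maximum(corrected, 0.0) if clip_negative else corrected
```

Applying the last function day by day to each downscaled product yields the daily bias-corrected rainfall.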
Finally, the bias-corrected CHIRPS and MPEG rainfall was validated with the descriptive and bias decomposition statistics, following the same procedure as in the validation stage. The aims were to: i) evaluate the degree of improvement in satellite performance by comparing the merged with the non-merged rainfall products; ii) compare the performances of CHIRPS and MPEG and select the one to be used further for rainfall assessment in UTB. The pre-merging processes, such as projecting, clipping and downscaling, were carried out in the ILWIS and ArcGIS environments. ArcGIS ModelBuilder was used to execute the GWR merging processes.
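The four merging steps can be sketched in NumPy for a single day. This is a minimal illustration under stated assumptions, not the authors' ArcGIS ModelBuilder workflow. Eq. 15 is not reproduced in this section, so the grid bias is assumed to be the local GWR prediction `beta0 + beta1 * elevation` plus the IDW-interpolated station residuals. Several further choices are hypothetical: the adaptive bandwidth rule (distance to the k-th nearest station), leave-one-out selection of k, IDW power 2, and clipping negative corrected rainfall to zero. Coordinates are assumed to be projected, in km.

```python
import numpy as np


def local_fit(xy_obs, z_obs, b_obs, xy_target, k, exclude=None):
    """Weighted least-squares fit of bias = beta0 + beta1 * elevation at one location.

    Assumed adaptive Gaussian kernel: h = distance from the target to its k-th
    nearest station, w = exp(-0.5 * (d / h) ** 2).
    """
    d = np.hypot(xy_obs[:, 0] - xy_target[0], xy_obs[:, 1] - xy_target[1])
    keep = np.ones(len(d), dtype=bool)
    if exclude is not None:
        keep[exclude] = False  # leave-one-out: hide the station being predicted
    h = max(np.sort(d[keep])[k - 1], 1e-9)  # adaptive bandwidth, guarded against 0
    w = np.where(keep, np.exp(-0.5 * (d / h) ** 2), 0.0)
    X = np.column_stack([np.ones_like(z_obs), z_obs])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], b_obs * sw, rcond=None)
    return beta  # [intercept, elevation slope]


def select_k_loocv(xy_obs, z_obs, b_obs, k_candidates):
    """Return the neighbour count k with the smallest leave-one-out squared error."""
    best_k, best_err = None, np.inf
    for k in k_candidates:
        err = 0.0
        for i in range(len(b_obs)):
            b0, b1 = local_fit(xy_obs, z_obs, b_obs, xy_obs[i], k, exclude=i)
            err += (b_obs[i] - (b0 + b1 * z_obs[i])) ** 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k


def idw(xy_obs, values, xy_grid, power=2.0):
    """Inverse distance weighted interpolation of station values to grid cells."""
    d = np.hypot(xy_grid[:, None, 0] - xy_obs[None, :, 0],
                 xy_grid[:, None, 1] - xy_obs[None, :, 1])
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w @ values) / w.sum(axis=1)


def gwr_merge(sat_grid, z_grid, xy_grid, sat_obs, gauge_obs, z_obs, xy_obs):
    """Bias-correct one day of downscaled satellite rainfall (needs >= 4 stations)."""
    b_obs = sat_obs - gauge_obs                                  # step 1: station biases
    k = select_k_loocv(xy_obs, z_obs, b_obs, range(3, len(b_obs)))
    beta_obs = np.array([local_fit(xy_obs, z_obs, b_obs, p, k) for p in xy_obs])
    resid = b_obs - (beta_obs[:, 0] + beta_obs[:, 1] * z_obs)    # step 2: GWR residuals
    beta_grid = np.array([local_fit(xy_obs, z_obs, b_obs, p, k) for p in xy_grid])
    bias_grid = (beta_grid[:, 0] + beta_grid[:, 1] * z_grid
                 + idw(xy_obs, resid, xy_grid))                   # step 3: grid bias
    return np.maximum(sat_grid - bias_grid, 0.0)                 # step 4 (clip: assumption)
```

Where the bias depends exactly linearly on elevation, the local fits recover it and the corrected rainfall at the gauges equals the gauge values. With real data, the IDW residual term pulls grid cells near each gauge towards that gauge's observation.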

In-situ gauge rainfall variability
The Pearson product-moment correlation coefficient matrix of the ten in-situ daily rainfall records is presented in Table 3. The correlations range from 0.12 to 0.46, and the distances between stations range from 14 km to 88 km. Correlation generally decreases with distance between stations, but this trend is not always followed. For instance, the correlation coefficient between the most distant (88 km) stations, Hashenge and Gijet, is 0.26, whereas that between the much closer (31 km) Gijet and Finarawa is only 0.15. These results show that inter-station distance alone does not explain the spatial variability of rainfall.
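A matrix like Table 3 can be computed with a short sketch such as the one below. It assumes projected station coordinates in km (for example UTM) and NaN-marked gaps. Correlations are computed over pairwise-complete days, so a shorter record such as Finarawa's still pairs with the others. The function names are hypothetical.

```python
import numpy as np


def station_correlation_and_distance(rain, xy_km):
    """Pairwise Pearson r of daily gauge rainfall and inter-station distance.

    rain  : (n_days, n_stations) array, NaN where a record is missing
    xy_km : (n_stations, 2) projected coordinates in km (assumed, e.g. UTM)
    """
    n = rain.shape[1]
    r = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            ok = ~np.isnan(rain[:, i]) & ~np.isnan(rain[:, j])  # days both stations report
            r[i, j] = r[j, i] = np.corrcoef(rain[ok, i], rain[ok, j])[0, 1]
    diff = xy_km[:, None, :] - xy_km[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return r, dist


def upper_pairs(r, dist):
    """Flatten the upper triangle into (distance, r) pairs for a correlation-distance plot."""
    iu = np.triu_indices_from(r, k=1)
    return dist[iu], r[iu]
```

Plotting the pairs returned by `upper_pairs` gives a direct visual check of whether correlation decays with distance, as examined in the paragraph above.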

Satellite-derived rainfall evaluation using descriptive statistics
The variability of the correlation coefficient (r) between the downscaled CHIRPS and MPEG rainfall and the corresponding ground rainfall measurements during the wet and dry seasons is presented in Fig. 3. It can be observed that: i) the correlation is generally better in the wet season than in the dry season, but the spatial correlation patterns are similar; ii) CHIRPS correlates better with in-situ observations than MPEG at higher elevations, but worse at lower elevations; iii) BL-CHIRPS has a slightly higher r than NN-CHIRPS, while BL-MPEG and NN-MPEG are similar.
The ME, MAE and RMSE for the wet and dry seasons are presented in Table 4 and Table 5, respectively. In the wet season (Table 4), the ME differs between the altitudes of the in-situ observations for both products and both downscaling methods. The ME of CHIRPS shows a large overestimation at the lowland station (Finarawa), with BL values closer to zero than NN values. The ME of MPEG also shows overestimation at the lowland (Finarawa). In contrast to CHIRPS, however, MPEG underestimates rainfall at all other, higher-elevation stations. The mean underestimation by MPEG was larger than the mean overestimation by CHIRPS, but for both satellite products BL downscaling gave a slightly better ME than NN downscaling. The wet season averages of MAE and RMSE (Table 4) are generally lower for MPEG than for CHIRPS and lower for BL than for NN, although a few stations (e.g. Adigudem and Gijet) show small opposite differences. The wet season N_MAE and N_RMSE also clearly show that MPEG outperforms CHIRPS. In the dry season (Table 5), the ME of CHIRPS shows underestimation at the higher-elevation stations and slight overestimation at the lowland stations. On average, CHIRPS underestimated rainfall during the dry season, unlike its overestimation in the wet season. The ME of MPEG shows underestimation at all stations, on average larger than that of CHIRPS. For CHIRPS, NN downscaling gave a slightly better ME than BL downscaling, while for MPEG the ME was the same for both methods. As in the wet season, the dry season averages of MAE and RMSE were lower for MPEG than for CHIRPS and lower for BL than for NN. The dry season normalized MAE and RMSE (N_MAE and N_RMSE) show that MPEG outperformed CHIRPS. They also show that these errors were lower in the wet season than in the dry season.
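The descriptive error measures can be sketched as follows. The paper's equations for N_MAE and N_RMSE are not reproduced in this section. Here they are assumed to be MAE and RMSE divided by P_g, the mean daily gauge rainfall, in line with the later statement that these indicators "take into account P_g".

```python
import numpy as np


def descriptive_stats(sat, gauge):
    """ME, MAE, RMSE and their normalized forms for paired daily rainfall (mm/day).

    Assumption: N_MAE = MAE / P_g and N_RMSE = RMSE / P_g, where P_g is the mean
    daily gauge rainfall of the analysed period.
    """
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    err = sat - gauge                      # positive = satellite overestimates
    p_g = gauge.mean()
    mae = np.abs(err).mean()
    rmse = np.sqrt(np.mean(err ** 2))
    return {"ME": err.mean(), "MAE": mae, "RMSE": rmse,
            "N_MAE": mae / p_g, "N_RMSE": rmse / p_g}
```

For example, `descriptive_stats([2, 0], [1, 1])` gives ME = 0 but MAE = 1. The over- and underestimation cancel in the ME, which is why the ME alone is not a conclusive error measure.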

Satellite-derived rainfall detection capabilities using categorical statistics
The frequencies of rainfall detection success (H, hit) and failure (M, miss), together with the indices FM (fraction of miss) and FD (fraction of detection), of CHIRPS and MPEG at daily resolution for the wet and dry seasons are presented in Fig. 4. Fig. 4 shows that: i) MPEG has a better rainfall detection capability than CHIRPS, with substantially higher H, lower M, substantially lower FM and higher FD, and this superiority is more distinct in the wet than in the dry season; ii) BL downscaling generally provides a better rainfall detection capability than NN downscaling, except for CHIRPS in the dry season, when NN downscaling performs better. Overall, the BL downscaling method outperforms NN in both the descriptive and the categorical evaluations. Therefore, only BL downscaling is used in the subsequent analysis (bias decomposition and merging).
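The contingency counts and the two false-alarm-free indices can be sketched as below. Equation 8 and the exact FM/FD definitions are not reproduced in this section, so FM = M / (H + M) and FD = (H + CN) / (H + M + CN) are assumptions consistent with the text: a lower FM follows from lower M and higher H, and FD loses sensitivity when CN is large. The rain-day threshold `wet_threshold` is a hypothetical parameter.

```python
import numpy as np


def categorical_scores(sat, gauge, wet_threshold=1.0):
    """Contingency counts and assumed FM/FD indices for daily rain/no-rain detection.

    wet_threshold (mm/day) is a hypothetical rain-day threshold.
    Assumed: FM = M / (H + M), FD = (H + CN) / (H + M + CN); both ignore false alarms.
    """
    s = np.asarray(sat, dtype=float) >= wet_threshold
    g = np.asarray(gauge, dtype=float) >= wet_threshold
    H = int(np.sum(s & g))      # hit: both report rain
    M = int(np.sum(~s & g))     # miss: gauge rain, satellite dry
    F = int(np.sum(s & ~g))     # false alarm: counted but not used in FM/FD
    CN = int(np.sum(~s & ~g))   # correct negative: both dry
    FM = M / (H + M) if H + M else float("nan")
    FD = (H + CN) / (H + M + CN) if H + M + CN else float("nan")
    return {"H": H, "M": M, "F": F, "CN": CN, "FM": FM, "FD": FD}
```

Appending many dry days (correct negatives) leaves FM unchanged but pushes FD towards 1. This illustrates why FD is less informative in a long dry season.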

Bias decomposition of satellite-derived rainfall
The biases of CHIRPS and MPEG were decomposed into HB, AHB and MB to quantify the magnitudes of correctly detected and of missed rainfall, separately for the wet (Table 6) and dry (Table 7) seasons. In the wet season, CHIRPS overestimates the observations at all stations except Hashenge, while MPEG substantially underestimates them at all stations. In terms of AHB, the overall bias is ~31% higher in MPEG than in CHIRPS, but the MB is much lower in MPEG than in CHIRPS at all gauges.
The HB, AHB and even MB show large differences between the satellite rainfall products and the in-situ observed rainfall. These differences are attributed not only to satellite performance but also to differences in gauge accumulated total rainfall (P_T) between stations, which justifies the introduction of N_AHB and N_MB (Eqs. 12 and 13; see Table 6 and Table 7). In the wet season (Table 6), N_AHB indicated poor performance of both rainfall products (> 50% bias), worse for MPEG. In contrast, N_MB was better for MPEG than for CHIRPS.
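The bias decomposition can be sketched per station and season as below. The paper's Eqs. 9 to 13 are not reproduced in this section. HB and AHB are assumed to be accumulated over hit days, and MB is assumed to be the gauge rainfall accumulated over missed days, reported as a positive magnitude. N_AHB and N_MB follow the text in dividing by P_T, the total gauge accumulated rainfall. The rain-day threshold is a hypothetical parameter.

```python
import numpy as np


def bias_decomposition(sat, gauge, wet_threshold=1.0):
    """Accumulated hit bias, absolute hit bias and miss bias (mm) over a season.

    Assumed: HB = sum(S - G) over hits, AHB = sum(|S - G|) over hits,
    MB = sum(G) over misses; N_AHB = AHB / P_T and N_MB = MB / P_T.
    """
    s = np.asarray(sat, dtype=float)
    g = np.asarray(gauge, dtype=float)
    hit = (s >= wet_threshold) & (g >= wet_threshold)
    miss = (s < wet_threshold) & (g >= wet_threshold)
    p_t = g.sum()                               # gauge accumulated total rainfall
    hb = np.sum(s[hit] - g[hit])                # signed: over/underestimates can cancel
    ahb = np.sum(np.abs(s[hit] - g[hit]))       # magnitude of hit errors
    mb = np.sum(g[miss])                        # gauge rain the satellite missed
    return {"P_T": p_t, "HB": hb, "AHB": ahb, "MB": mb,
            "N_AHB": ahb / p_t, "N_MB": mb / p_t}
```

With satellite values `[6, 2, 0, 3]` and gauge values `[4, 4, 2, 0]`, the two hits give HB = 0 (+2 and −2 cancel) but AHB = 4. This is why AHB is preferred over HB as a measure of magnitude.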
During the dry season (Table 7), the average HB shows that CHIRPS underestimated rainfall, unlike in the wet season, when it overestimated, with a small bias at the lowland (Finarawa). MPEG underestimated rainfall at all stations, consistent with its wet season behaviour. In terms of AHB, the overall bias is ~50% higher in MPEG than in CHIRPS, but the MB in MPEG is much lower than in CHIRPS at all gauges. The maximum MB for CHIRPS and …

Table 6
Bias decomposition of the daily CHIRPS and MPEG rainfall (in mm), totalized over the wet seasons (1 June – 30 September) within 1 January 2015 – 31 December 2018: P_T, gauge accumulated total rainfall (mm); HB, hit bias (mm); AHB, absolute hit bias (mm); MB, miss bias (mm); N_AHB, normalized AHB; N_MB, normalized MB.

… (Table 7), as in the wet season, N_AHB was better for CHIRPS but, in contrast, N_MB was better for MPEG than for CHIRPS.
In the dry season, as expected (because of the generally lower P_T), HB, AHB and MB are smaller than in the wet season. More interesting, however, is the comparison of the normalized characteristics of the dry and wet seasons. N_AHB indicates better performance of both CHIRPS and MPEG in the dry season than in the wet season, whereas N_MB indicates the opposite, i.e. better performance in the wet season.

Merging satellite and in-situ observation rainfall
The correlation coefficient (r) between the GWR-merged CHIRPS and MPEG rainfall and the pixel-corresponding ground rainfall measurements during the wet and dry seasons is presented in Fig. 5. Compared with the downscaled rainfall products, r is noticeably improved and approaches the perfect linear correlation value (~1). This indicates that, after GWR merging, CHIRPS and MPEG capture the daily rainfall well in both the wet and dry seasons.
The descriptive error statistics after GWR merging are presented in Table 8 and Table 9 for the wet and dry seasons, respectively. All the measures show that GWR merging substantially improved the accuracy of both rainfall products. In the wet season, the ME shows only a slight overestimation by CHIRPS and MPEG. The MAE and RMSE also show that GWR merging improved the accuracy of the rainfall data at all in-situ observation sites. The largest MAE and RMSE were at the highlands. For example, in the wet season at Wedisemero, they were 7.08 mm day⁻¹ and 12.96 mm day⁻¹ for CHIRPS and 6.17 mm day⁻¹ and 12.14 mm day⁻¹ for MPEG before merging (Table 4). After merging, they decreased to 0.45 mm day⁻¹ and 1.06 mm day⁻¹ for CHIRPS and to 0.42 mm day⁻¹ and 1.04 mm day⁻¹ for MPEG, respectively (Table 8). The average pre-merging ME of MPEG, which indicated underestimation (−0.93 mm day⁻¹), was also substantially improved, to a slight overestimation (0.11 mm day⁻¹). In terms of average ME, GWR merging improved CHIRPS and MPEG similarly, while in terms of average MAE and RMSE it improved MPEG slightly more than CHIRPS. N_MAE and N_RMSE also show that MPEG outperformed CHIRPS after merging during the wet season.
In the dry season, the ME showed a slight overestimation by CHIRPS and MPEG after GWR merging. This represents a substantial error reduction compared with before merging, with similar improvement for both products. The MAE and RMSE also show that GWR merging improved the accuracy of rainfall at all in-situ observation sites. For example, before merging, the largest MAE and RMSE were at the highlands: at Hashenge station, they were 1.82 mm day⁻¹ and 6.03 mm day⁻¹ for CHIRPS and 1.34 mm day⁻¹ and 5.32 mm day⁻¹ for MPEG (Table 5). After merging, they decreased to 0.02 mm day⁻¹ and 0.06 mm day⁻¹ for CHIRPS and to 0.02 mm day⁻¹ and 0.07 mm day⁻¹ for MPEG, respectively (Table 9). The reduction of the dry season MAE and RMSE indicates that GWR merging improved MPEG more than CHIRPS. The dry season MAE and RMSE are smaller than in the wet season because of the lower P_g. However, N_MAE and N_RMSE are lower in the wet season than in the dry season, which indicates better performance of both products in the wet season. They are also lower for MPEG than for CHIRPS in both seasons, which indicates generally better performance of MPEG than of CHIRPS.

Gebremedhin, et al. Atmospheric Research 248 (2021) 105135

The wet season bias estimates of the GWR-merged CHIRPS and MPEG are presented in Table 10. They indicate that: i) all bias estimates (HB, AHB and MB) improved substantially after GWR merging in both CHIRPS and MPEG; ii) the HB of CHIRPS and MPEG has both positive and negative values, so no specific trend can be observed; iii) the AHB of MPEG is lower than that of CHIRPS at most stations; iv) the MB is lower in MPEG than in CHIRPS at all gauges; v) N_AHB and N_MB are generally very low, and lower for MPEG than for CHIRPS.
The dry season bias estimates of the GWR-merged CHIRPS and MPEG are presented in Table 11. They indicate that: i) all bias estimates (HB, AHB and MB) improved substantially after GWR merging in both CHIRPS and MPEG; ii) all biases are smaller in the dry season than in the wet season because of the smaller P_T; iii) the HB of CHIRPS and MPEG has both positive and negative values, so, as in the wet season, no specific trend can be observed; iv) the AHB is lower in MPEG than in CHIRPS at most stations; v) the MB is small in CHIRPS (< 2 mm) and negligible in MPEG at all stations; vi) N_AHB and N_MB are generally very low, and lower for MPEG than for CHIRPS.

Discussion
The relationship among the in-situ rainfall station measurements (Fig. 1) in UTB does not depend on the distances between the stations. This indicates that spatial rainfall variability can differ from region to region because of other influencing factors (e.g. altitude). The in-situ observations in UTB are better represented by altitudinal variation than by distance. However, they are too sparsely distributed to represent the spatial rainfall variability properly, for example by interpolation. Hence, satellite-derived rainfall products with high temporal and relatively high spatial resolution, such as CHIRPS (~5 km) and MPEG (3 km), offer a source of complementary rainfall data. However, CHIRPS and MPEG, like any other rainfall product, are prone to errors. In UTB, these errors arise mainly from the high daily rainfall variability and from the scale difference between pixel-wise satellite-derived rainfall and point-wise in-situ rainfall observations. Downscaling methods can narrow this scale difference, particularly in topographically complex areas such as UTB. The NN downscaling method does not change the original pixel rainfall of either product, because each new pixel takes the value of the original pixel at the corresponding position, so the original values are preserved. Conversely, the BL method changes the original pixel rainfall values of both products, because it assigns each new pixel a weighted average of the four nearest original pixels. Taking the in-situ observations as reference, this study showed that the BL method downscaled the CHIRPS and MPEG products slightly better than NN. This result disagrees with the study by Kimani et al. (2017), in which the NN method outperformed the BL method at the monthly time scale, with a large RMSE difference. This disagreement might be attributed to the difference in temporal resolution (daily in this study versus monthly in Kimani et al., 2017).

Table 10
Bias decomposition of the daily GWR-merged rainfall (in mm), totalized over the wet seasons (1 June – 30 September) within 1 January 2015 – 31 December 2018: P_T, gauge accumulated total rainfall (mm); HB, hit bias (mm); AHB, absolute hit bias (mm); MB, miss bias (mm); N_AHB, normalized AHB; N_MB, normalized MB.

Din et al. (2008) and Ulloa et al. (2017) demonstrated that in-situ observed rainfall correlated better with BL-downscaled satellite-derived rainfall than with the original rainfall product, which was also the case in this study.
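The difference between NN and BL downscaling discussed above can be illustrated with a minimal NumPy sketch. It assumes an integer refinement factor and cell-centre alignment, with edge cells clamped to the outermost coarse values. This is a simplification: the actual CHIRPS (0.05°) to 1 km regridding in ILWIS/ArcGIS is not an integer-factor operation. NN only copies coarse values, while BL produces new intermediate values.

```python
import numpy as np


def nearest_neighbour(coarse, factor):
    """Each fine cell copies its parent coarse cell, so the original values are preserved."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def _axis(n, factor):
    # Fine-cell centres expressed as fractional coarse-cell indices (centre-aligned),
    # clamped at the edges so border cells reuse the outermost coarse values.
    c = np.clip((np.arange(n * factor) + 0.5) / factor - 0.5, 0, n - 1)
    i0 = np.minimum(np.floor(c).astype(int), n - 2)
    return i0, c - i0


def bilinear(coarse, factor):
    """Weighted average of the four surrounding coarse-cell centres (needs a >= 2x2 grid)."""
    r0, tr = _axis(coarse.shape[0], factor)
    c0, tc = _axis(coarse.shape[1], factor)
    top = coarse[r0][:, c0] * (1 - tc) + coarse[r0][:, c0 + 1] * tc
    bot = coarse[r0 + 1][:, c0] * (1 - tc) + coarse[r0 + 1][:, c0 + 1] * tc
    return top * (1 - tr)[:, None] + bot * tr[:, None]
```

For a 2×2 grid `[[0, 2], [4, 6]]` refined by a factor of 2, NN returns only the values 0, 2, 4 and 6. BL returns intermediate values such as 1.5 at the second row and column, which smooths rainfall between pixel centres.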
Among the descriptive statistics, the mean error (ME) is a good measure of systematic error, characterizing underestimation or overestimation by satellite-derived rainfall products. For example, in the wet season it indicated that MPEG rainfall was strongly underestimated, with a bias larger in magnitude than that of CHIRPS. However, as a quantitative error measure, the ME is not conclusive, because underestimation and overestimation errors can cancel out in a time series. The MAE and RMSE are more reliable quantitative error measures. They were slightly smaller for BL than for NN and smaller for MPEG than for CHIRPS, with the BL-MPEG combination giving the smallest errors. Notably, however, the MAE and RMSE exceeded the mean daily rainfall (P_g) at all in-situ gauges.
The MAE and RMSE of CHIRPS and MPEG were lower in the dry season than in the wet season, but only because there was less rainfall in the dry season. This indicates that the absolute error measures MAE and RMSE cannot provide an objective judgment of the performance of rainfall products. In contrast, the N_MAE and N_RMSE indicators introduced in this study evaluate the performance of satellite products reliably, because they take P_g into account. N_MAE and N_RMSE indicated that both CHIRPS and MPEG performed better in the wet season than in the dry season (Fig. 6a, b), consistent with the observation made by Rahmawati and Lubczynski (2017) on Bali Island.
N_RMSE is considerably higher in the dry season than in the wet season. Erratic dry season rainfall events, which are prone to bias, affect RMSE more than MAE, because RMSE is more sensitive to outliers.
According to the categorical statistics, MPEG provided a higher frequency of hit rainfall events (H) and fewer missed rainfall events (M) than CHIRPS in both the wet and dry seasons. A low M is a more reliable indicator of satellite rainfall product performance than a high H, as it reflects few failures of the satellite algorithm to detect rainfall recorded at in-situ stations. H is also valuable for evaluating the ability of satellite products to detect rainfall events recorded on the ground. However, there is a risk that the satellite detects different showers than those measured on the ground (Rahmawati and Lubczynski, 2017; Lekula et al., 2018). Accordingly, MPEG performed better than CHIRPS during both seasons, which might be attributed to its higher spatial and temporal resolution of data acquisition. The fraction of miss (FM), a new indicator introduced in this study, also suggests that MPEG outperformed CHIRPS in detecting the true rainfall occurrence on the ground in both the wet and dry seasons (Fig. 4). The lower FM of MPEG results from its lower M and higher H compared with CHIRPS. The higher FD of MPEG than of CHIRPS in both seasons indicates that the former captures rainfall better than the latter. However, compared with FM, FD is a less reliable indicator (Eq. 8) because of its low sensitivity to satellite performance when the number of correct negatives (CN) is high. This is the case in the long dry season in UTB, which has only a few scattered rainfall events but many CN. The categorical indicators do not consider rainfall magnitude, which bias decomposition does provide. They consider only the frequency agreement between daily satellite and in-situ rainfall events.

Table 11
Bias decomposition of the daily GWR-merged rainfall (in mm), totalized over the dry seasons (1 October – 31 May) within 1 January 2015 – 31 December 2018: P_T, gauge accumulated total rainfall (mm); HB, hit bias (mm); AHB, absolute hit bias (mm); MB, miss bias (mm); N_AHB, normalized AHB; N_MB, normalized MB.

The bias decomposition indicated inconsistent performance of the two downscaled products. The hit bias (HB) and the absolute hit bias (AHB) were lower for CHIRPS, but the miss bias (MB) was lower for MPEG. However, HB and AHB are less reliable bias indicators than MB, because a satellite algorithm can detect different showers than a gauge (Lekula et al., 2018). Moreover, in HB (Eq. 9), overestimation events by the satellite algorithm can cancel out underestimation events when accumulated over a time series. AHB (Eq. 10) is therefore a better indicator than HB for quantifying the magnitude of satellite rainfall product biases. The increase of AHB for CHIRPS and MPEG towards the higher-elevation stations in both the wet and dry seasons is related to the high frequency and variability of rainfall occurrence there. To account for the dependence of bias on rainfall quantity, this study introduced the normalized bias indicators N_AHB and N_MB. They are computed as the ratios of accumulated AHB and MB, respectively, to the total gauge accumulated rainfall (Fig. 7). These are more reliable bias indicators than the standard AHB and MB, because they express rainfall biases relative to the rainfall quantity. The higher N_AHB and the lower N_MB of CHIRPS and MPEG in the wet season than in the dry season (Fig. 7a, b) indicate better performance of both satellite rainfall products in the wet season. This is because bias increases with the rainfall intensity of individual showers (Hu et al., 2019), which is typically lower in the wet season, also in UTB.
The overestimation by CHIRPS relative to in-situ rainfall can be attributed to the limited ability of the TIR sensor to distinguish cirrus clouds from rain clouds (Thiemig et al., 2013; Young et al., 2014; Dinku et al., 2014; Toté et al., 2015). Kimani et al. (2017) found that CHIRPS underestimated in-situ observations at altitudes < 2500 m a.s.l. during the wet season. In contrast, CHIRPS overestimated in-situ observations at UTB during the wet season. This supports the view of Zambrano-Bigiarini et al. (2017) and Fenta et al. (2018) that satellite-derived products can perform differently from region to region because of regional and/or local factors. Such products therefore always require validation and bias correction before being used in hydrological studies.
The performance of MPEG in this study is in line with the studies by Derin and Yilmaz (2014), Dhib et al. (2017) and Worqlul et al. (2018), i.e. MPEG substantially underestimates rainfall compared with in-situ observations. That underestimation, especially high in the highlands, is likely attributable to the limited ability of the product to detect the warm orographic rainfall expected in mountainous areas (Worqlul et al., 2014; Dhib et al., 2017). Moreover, in the highland parts of Ethiopia, including UTB, localized convective light rainfall might not have been captured, because PMW sensors tend to underestimate light rainfall (Rahmawati and Lubczynski, 2017).
Even at relatively high spatio-temporal resolution (downscaled to 1 km), both satellite-derived rainfall products showed large biases at UTB compared with gauge-measured rainfall. Both products therefore required improvement, i.e. bias correction. There are different bias correction methods. Mean bias correction (MBC) is a commonly used approach that derives additive or multiplicative factors from the difference or ratio between in-situ observations and satellite-derived rainfall (Habib et al., 2014; Lekula et al., 2018; Worqlul et al., 2018). However, MBC cannot reflect the non-stationary spatial relationship between satellite-derived and in-situ rainfall, which is influenced by land surface conditions. Such approaches assume a simplistic, uniform bias distribution over the spatial domain (Nerini et al., 2015; Lv and Zhou, 2016). In this study, the GWR method was used to merge CHIRPS and MPEG with in-situ observations, with altitude as the explanatory variable, because the spatial rainfall variability at UTB is correlated with altitudinal variation.
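For contrast with GWR, the additive and multiplicative forms of MBC can be sketched as below. It is a generic illustration, not the exact formulation of the cited studies: a single factor is derived from all gauges and applied unchanged to every pixel. This is the spatially uniform bias assumption criticized in the paragraph above.

```python
import numpy as np


def mean_bias_correction(sat_grid, sat_at_gauges, gauges, mode="additive"):
    """Spatially uniform MBC: one factor from all gauges is applied to every pixel."""
    sat_grid = np.asarray(sat_grid, dtype=float)
    s = np.asarray(sat_at_gauges, dtype=float)
    g = np.asarray(gauges, dtype=float)
    if mode == "additive":
        return sat_grid + (g.mean() - s.mean())    # shift by the mean difference
    if mode == "multiplicative":
        return sat_grid * (g.sum() / s.sum())      # scale by the ratio of totals
    raise ValueError("mode must be 'additive' or 'multiplicative'")
```

Because the correction ignores elevation and location, a highland pixel and a lowland pixel receive the same shift or scaling. GWR merging lets the correction vary in space.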
Fig. 6 shows that after GWR merging (c, d), N_MAE and N_RMSE were substantially improved in both the wet and dry seasons compared with the original, pre-merging state of the rainfall (a, b). It also shows better performance of MPEG than of CHIRPS, and better performance of both products in the wet than in the dry season, consistent with the pre-merging N_MAE and N_RMSE. Notably, the GWR-merged N_RMSE of CHIRPS in the dry season (and its variability range) is relatively large, much larger than the corresponding N_RMSE of MPEG. This is attributed to the better response of MPEG than of CHIRPS to the scattered, intense (outlier) dry season rainfall in UTB.
Fig. 7 shows a great improvement of N_AHB and N_MB after GWR merging for both CHIRPS and MPEG. It also shows similar normalized bias magnitudes (for N_MB, close to zero) in both the wet and dry seasons (c, d), unlike before merging (a, b), when both the seasonal differences and the normalized biases themselves were much larger. The very small N_MB after GWR merging of MPEG, and also of CHIRPS, indicates that the biases, at least at locations close to the stations, are small relative to the rainfall quantities at these locations in the respective seasons.
The example time series graphs compare the in-situ daily rainfall at Bora with the corresponding daily CHIRPS (Appendix 2) and MPEG (Appendix 3) rainfall before and after merging. They indicate that the GWR approach substantially improved the reproduction of the rainfall time series compared with the original, uncorrected products, more so for MPEG than for CHIRPS. The general improvement of rainfall by GWR agrees well with the study by Chao et al. (2018). They stated that merging in-situ observations with satellite rainfall using GWR is an effective tool for producing improved rainfall data, particularly in areas with sparse gauge observations and high rainfall variability. The improvement of CHIRPS rainfall data after GWR merging also confirms the observation by Navas et al. (2019) that the CHIRPS rainfall product can still be improved by merging new in-situ observations, despite its inherent blending procedure. In contrast, MPEG is a pure satellite product with no inherent ground-based calibration. It nevertheless provided better rainfall accuracy than CHIRPS both before and after GWR merging, which can be attributed to its higher spatio-temporal resolution and hence better capability to detect the frequency of rainfall events. The GWR results, tested at the 10 gauges available in the UTB, demonstrated the effectiveness and high accuracy of the merging method. However, uncertainty remains away from the gauges, and it could be mitigated by installing additional gauges. For the GWR implementation in the UTB, further improvement using other explanatory variables could also be tested.

Conclusion
This study validated and bias-corrected the CHIRPS and MPEG satellite-derived rainfall products, downscaled to 1 km spatial resolution, at daily resolution. The downscaled products were validated using descriptive statistics, categorical statistics and bias decomposition methods, separately for the wet and dry seasons, and new bias indicators were introduced for each of the evaluation methods. To improve the accuracy of the rainfall products, bias correction was carried out with the GWR method, merging satellite rainfall with in-situ observations and using the dependence of rainfall on altitude as the explanatory variable. The following main conclusions can be drawn from this study:
1. Satellite rainfall products are a source of rainfall data complementary to ground-based observations, particularly in areas with sparse rain gauges and high spatio-temporal rainfall variability. This holds provided they are downscaled (if necessary), validated and bias corrected using effective methods appropriate for the study area of concern (such as GWR for UTB).
2. In UTB, the bilinear downscaling method worked slightly better than nearest-neighbor for both satellite products.
3. The ME is a useful indicator of systematic error, characterizing underestimation or overestimation by satellite products. However, it is not quantitatively conclusive, because underestimation and overestimation errors can cancel out when analysing rainfall time series. The MAE and RMSE are reliable quantitative measures of rainfall performance, but they depend on the quantity of rainfall analysed. They are therefore not suitable for comparing different rainfall data sets, for example the rainfall of the wet and dry seasons. That role was performed well by the normalized measures of MAE and RMSE, i.e. N_MAE and N_RMSE, introduced in this study.
4. Among rainfall frequency measures, the 'miss' is more reliable than the 'hit', while the 'false alarm' is not reliable and so was not used in this study. To avoid the 'false alarm' in commonly used indices, the modified indices 'fraction of miss' and 'fraction of detection' were introduced. All the frequency measures indicated better performance of MPEG than of CHIRPS in the UTB.
5. Bias decomposition is a reliable method for evaluating the performance of satellite rainfall products, as it quantifies the accumulated magnitude of biases. The 'absolute hit bias' and the 'miss bias' are reliable measures of rainfall performance, but they depend on the quantity of rainfall analysed. They are therefore not suitable for comparing different rainfall data sets, such as the rainfall of the wet and dry seasons. That role was performed well by the normalized measures of AHB and MB, i.e. N_AHB and N_MB, introduced in this study.
6. Bias correction of the MPEG and CHIRPS rainfall products by the GWR merging algorithm, with the dependence of rainfall on altitude as the explanatory variable, showed substantial improvement. It decreased the bias between satellite-derived rainfall and in-situ observations for both products, with slightly better final accuracy for MPEG than for CHIRPS.
7. The GWR method includes an inherent assessment of uncertainty, which typically increases with distance from the rainfall observation points. The uncertainty of a merged satellite product can be reduced by increasing the number of merged ground stations and, if relevant, by conditioning the GWR method with more explanatory variables.

Declaration of Competing Interest
None.