Precipitation Diurnal Cycle Assessment of Satellite-Based Estimates over Brazil

The main objective of this study is to assess the ability of several high-resolution satellite-based precipitation estimates to represent the Precipitation Diurnal Cycle (PDC) over Brazil during the 2014–2018 period, after the launch of the Global Precipitation Measurement (GPM) satellite. The selected algorithms are the Global Satellite Mapping of Precipitation (GSMaP), the Integrated Multi-satellitE Retrievals for GPM (IMERG) and the Climate Prediction Center (CPC) MORPHing technique (CMORPH). Hourly rain gauge data from different national and regional networks were used as the reference dataset after passing rigorous quality control tests. All datasets were interpolated to a common 0.1° × 0.1° grid every 3 h for comparison. After a hierarchical cluster analysis, seven regions with different PDC characteristics (amplitude and phase) were selected for this study. The main results can be summarized as follows: (i) in regions where thermal heating produces deep convective clouds, the PDC is better represented by all algorithms (in terms of amplitude and phase) than in regions driven by shallow convection or low-level circulation; (ii) the GSMaP suite (GSMaP-Gauge (G) and GSMaP-Motion Vector Kalman (MVK)) in general outperforms the other algorithms, with lower bias and less dispersion. In this case, the gauge-adjusted version improves on the satellite-only retrievals of the same algorithm, suggesting that daily gauge analysis is useful for reducing the bias at a sub-daily scale; (iii) the IMERG suite (IMERG-Late (L) and IMERG-Final (F)) overestimates rainfall at almost all times and in all regions, and the satellite-only version provides better results than the final version; (iv) CMORPH has the best performance for a transitional regime between a coastal land-sea breeze regime and a continental Amazonian regime.
Further research is needed to understand how shallow cloud processes and convective/stratiform classification are handled in each algorithm, in order to improve the representation of the diurnal cycle.


Introduction
Precipitation, and its time and space distributions, is of paramount importance to any country, in particular to those of continental size such as Brazil. Indeed, rain gauge data are required in almost all areas of activity: water resources management (with emphasis on potable water), agriculture and energy generation by hydroelectric power plants, to mention just a few. Reliable planning for the operation and maintenance of these activities requires consistent and accurate data [1]. On the [...] technique (CMORPH) from CPC/NOAA and the Integrated Multi-satellitE Retrievals for GPM (IMERG) from NASA. This article is organized as follows: Section 2 presents information about the study area, while the datasets and statistics are presented in Section 3. Section 4 presents the main results of this research and their discussion. The conclusions are provided in Section 5.

Study Area
Brazil covers an area of 8,515,759 km² in South America, lying approximately within the domain 37°S-8°N and 35°-73°W, as shown in Figure 1. Due to its continental dimensions, it presents a great diversity of landscapes, topography, biodiversity and climates, as well as different precipitation regimes [1]. These precipitation regimes are mainly associated with large-scale systems whose activity varies with the season. The most important are the South Atlantic Convergence Zone (SACZ), which acts in the southern summer and extends from the southeast of the country to the far west of the Amazonian region; the Intertropical Convergence Zone (ITCZ), which acts mainly in the fall over the north and northeast regions; and transient (baroclinic) systems, which act mainly in the winter over most of Brazil. A complete description of these precipitation regimes can be found in [15].

Ground Gauge and Quality Control
The present study covers the period from 2014 to 2018. The initial date was chosen because of the availability of products derived from NASA's GPM measurements. During this period, hourly precipitation data were obtained from the rain gauge networks of the Brazilian National Institute of Meteorology (INMET; http://www.inmet.gov.br), the National Water Agency (ANA; www.ana.gov.br), Companhia Energética de Minas Gerais (CEMIG; http://www.cemig.com.br), the Agronomic Institute (IAC; http://www.iac.sp.gov.br/) and Sistema Meteorológico do Paraná (SIMEPAR; http://www.simepar.br/). All precipitation data are stored in the database of the National Institute for Space Research (INPE; https://www.cptec.inpe.br/) in Brazil. The rain gauge data used in this study cover different periods, and their metadata (sources, space coverage, time resolution and number of stations) are summarized in Table 1. Studies of hourly and tri-hourly precipitation, especially those involving the diurnal cycle, are infrequent because of the low reliability and quality of such data. Therefore, a quality control (QC) procedure is essential; the schematic flow diagram of this process is given in Figure 2a, while the space distribution and percentage of available reference data (after the QC tests and gridding procedure) are shown in Figure 2b. A concise description of the quality control procedures used in this study is given below:
1. Missing and unrealistic values were detected in the reference dataset. In some networks (INMET, CEMIG and SIMEPAR) such data are flagged as 9999.99, while the other networks use a spurious value (i.e., 650 mm h⁻¹);
2. An interval between 10 mm h⁻¹ and 120 mm h⁻¹ was established for convective rainfall (also adopted at SIMEPAR), within which specific quality control tests are applied, according to [16];
3. For rainfall rates within this interval, the physical characteristics of the convective clouds were compared with the corresponding satellite imagery [17] using different channels (mainly infrared and visible, when available) from GOES 13 and 16. This imagery was provided by the Satellite Division and Environmental Systems (DSA/INPE);
4. The reference datasets, which have different time resolutions, were accumulated over three-hour periods following the WMO guidelines (i.e., 00-03 UTC, 03-06 UTC, and so on);
5. Daily values (12:00-12:00 UTC) were compared with the accumulated values from the previous step at each station to satisfy INPE's quality control tests [1].
Based on the above criteria, data were eliminated when they were missing (1), when the cloudiness at the coordinates of the gauge did not correspond to convection in the GOES images (3), or when the daily rainfall value did not match the tri-hourly accumulation for the respective day (5). A station was discarded entirely if its percentage of failures exceeded 5%.
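The QC steps above can be sketched in a few lines of Python. This is only an illustrative outline, not the procedure actually run at INPE: the function names, the handling of negative values and the 0.1 mm tolerance of the daily check are assumptions, and the comparison with GOES imagery (step 3) is not reproduced. The daily check also assumes that the eight 3-h totals passed in are already aligned with the 12:00-12:00 UTC day.

```python
MISSING_FLAG = 9999.99        # flag reported for INMET, CEMIG and SIMEPAR
SPURIOUS_RATE = 650.0         # mm/h, spurious value cited for the other networks
MAX_FAILURE_FRACTION = 0.05   # a station is discarded above 5% failures


def clean_hourly(values):
    """Replace flagged, negative or unrealistic hourly values with None."""
    cleaned = []
    for v in values:
        if v is None or v == MISSING_FLAG or v < 0 or v >= SPURIOUS_RATE:
            cleaned.append(None)
        else:
            cleaned.append(v)
    return cleaned


def accumulate_3h(hourly):
    """Sum consecutive 3-h blocks (00-03, 03-06, ...); None if any hour is missing."""
    totals = []
    for start in range(0, len(hourly) - len(hourly) % 3, 3):
        block = hourly[start:start + 3]
        totals.append(None if any(v is None for v in block) else sum(block))
    return totals


def daily_consistent(three_hourly, daily_total, tolerance=0.1):
    """True when the 3-h totals of one day add up to the reported daily value.

    The tolerance (in mm) is an assumed value, not taken from the study.
    """
    if any(v is None for v in three_hourly):
        return False
    return abs(sum(three_hourly) - daily_total) <= tolerance


def keep_station(n_failed, n_total):
    """Keep a station only if its failure fraction does not exceed 5%."""
    return n_total > 0 and n_failed / n_total <= MAX_FAILURE_FRACTION
```

For instance, `accumulate_3h(clean_hourly(series))` turns a raw hourly series into 3-h totals in which any block containing a flagged hour is marked as missing.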
Once the first set of tests had been applied to validate the intense precipitation events, all precipitation values (including those below 10 mm h⁻¹) were analyzed using statistical techniques [18] such as quantile methods and the frequency distribution of rain thresholds. At the end of the process, 1261 stations were selected for the next step, with the following institutional distribution: INMET, 592 stations; ANA, 499 stations; CEMIG, 21 stations; IAC, 128 stations; and SIMEPAR, 21 stations (Figure 1).
The data were interpolated to a uniform 0.1° × 0.1° grid using the simple average of the rain gauge stations available at each grid point (Figure 2b). Grid points with no rain gauges nearby were removed from the series. This approach makes the best use of the available information and, at the same time, allows a fair comparison with the SPE values. Figure 2b shows a reasonable space distribution of the rain gauge-based precipitation data, which is suitable for the purpose of this study.
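A minimal sketch of this gridding step is given below. It assumes that a station contributes to the 0.1° cell that contains it; the text only states that a simple average of the gauges available at each grid point was used, so the cell-assignment rule and the function names are illustrative.

```python
import math
from collections import defaultdict

GRID_STEP = 0.1  # grid spacing in degrees


def cell_index(lat, lon, step=GRID_STEP):
    """Integer (row, col) index of the grid cell that contains a station."""
    # Rounding before floor avoids artefacts such as -235.00000000000003.
    row = math.floor(round(lat / step, 6))
    col = math.floor(round(lon / step, 6))
    return row, col


def grid_average(stations, step=GRID_STEP):
    """Average the valid gauge values that fall inside each grid cell.

    stations: iterable of (lat, lon, value); value is None when rejected by QC.
    Cells without any valid gauge are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in stations:
        if value is None:
            continue
        cell = cell_index(lat, lon, step)
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

Leaving empty cells out of the result mirrors the removal of grid points without nearby gauges.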
Data availability is quite uneven across the country, with relatively low values (less than 40% of the total series) at some points in southeastern and southern Brazil due to the short observation period at the IAC and SIMEPAR stations, while the INMET stations show a better record of accepted observations after the QC procedure (Figure 2b and Table 1).

Satellite-Based Precipitation Estimates (SPE)
Satellite-based precipitation estimates with high temporal and spatial resolution and almost global coverage were used in this study. The products are precipitation estimates obtained from a constellation of polar and equatorial (low orbit) satellites with on-board passive and active microwave sensors, adjusted using data from the Dual-frequency Precipitation Radar (DPR) of the GPM satellite. These products are generated by morphing algorithms [19], which use geostationary IR data to attain high temporal resolution (using cloud motion vectors) and high spatial resolution (a Kalman filter applied to low-resolution precipitation rate data [20]). The algorithms used are: (1) Global Satellite Mapping of Precipitation (GSMaP) from JAXA (http://sharaku.eorc.jaxa.jp/GSMaP/), which uses scattering algorithms with polarization corrected temperatures (PCTs) at 85.5 and 37 GHz, where PCT85 is recommended for light (stratiform) rainfall and PCT37 for heavier (convective) precipitation [21]; (2) Integrated Multi-satellitE Retrievals for GPM (IMERG) from NASA (http://pmm.nasa.gov/data-access/downloads/gpm); and (3) the Morphing Technique (CMORPH) developed by CPC/NCEP/NOAA (ftp://ftp.cpc.ncep.noaa.gov). Both CMORPH and IMERG use the Goddard Profiling Algorithm (GPROF) to calculate instantaneous rainfall rates [22] and the algorithm of [23] for the (cross-track) sounder estimates. The main characteristics of these algorithms are given in Table 2.

Some of these satellite-based precipitation estimates use gauge data to correct their bias [6]. In the GSMaP suite, the GSMaP-Gauge product (hereafter GSMaP-G) [25] adjusts the GSMaP-Motion Vector Kalman (MVK) version with daily data from the NOAA CPC Unified Gauge-Based Analysis of Global Daily Precipitation. The final version of IMERG (IMERG-F) is generated by adjusting the IMERG-Late version with monthly data from the Global Precipitation Climatology Center (GPCC)/Deutscher Wetterdienst (DWD). CMORPH was used only in its non-adjusted version (hereafter CMORPH).

Data Standardization
Data from all sources were classified by season: southern summer (DJF), fall (MAM), winter (JJA) and spring (SON), for the 5-year period from 2014 to 2018. Gauge data were interpolated to a uniform 0.1° × 0.1° grid (as described in Section 2.2) and used as the reference for comparison against the values estimated by the SPEs.
The IMERG (IMERG-F and IMERG-L) and GSMaP (GSMaP-G and GSMaP-MVK) products are available on a grid with the same resolution, but CMORPH data are provided on an irregular grid of 0.08° × 0.07° (Table 2). To facilitate the evaluation and intercomparison of the different databases, the CMORPH data were interpolated to the 0.1° resolution using a bilinear interpolation method. All products were accumulated over three-hour periods (as was done for the gauge data) according to the WMO guidelines.
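For readers unfamiliar with the method, bilinear interpolation of a gridded field to a target point can be sketched as follows. This is a generic textbook version written for illustration; the helper names are hypothetical, the source axes are assumed to be ascending, and missing values are not handled. It is not the regridding code used in the study.

```python
from bisect import bisect_right


def _lower_index(axis, x):
    """Index i with axis[i] <= x <= axis[i + 1], clamped to the axis range."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def bilinear(field, lats, lons, lat, lon):
    """Interpolate field[i][j], defined at (lats[i], lons[j]), to (lat, lon).

    Both axes must be ascending. Points outside the source grid are
    extrapolated from the edge cell; a real workflow would mask them instead.
    """
    i = _lower_index(lats, lat)
    j = _lower_index(lons, lon)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on the two bounding rows, then along latitude.
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty
```

Calling `bilinear` once for every node of the 0.1° target grid yields the regridded field.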
The interpolated gauge data were also used to mask out non-valid grid points from the SPEs, allowing a direct comparison between the reference and estimated values at each grid point using the statistical indices presented in the next section.
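The masking step amounts to keeping an estimate only where the gridded reference has a valid value. The sketch below assumes that both fields are stored as equally shaped nested lists and that missing points are marked with None; both choices and the function names are illustrative.

```python
def mask_to_reference(estimate, reference):
    """Keep estimate values only where the gridded reference is valid."""
    return [
        [e if r is not None else None for e, r in zip(est_row, ref_row)]
        for est_row, ref_row in zip(estimate, reference)
    ]


def paired_samples(estimate, reference):
    """Flatten two grids into (estimate, reference) pairs at valid points."""
    pairs = []
    for est_row, ref_row in zip(estimate, reference):
        for e, r in zip(est_row, ref_row):
            if e is not None and r is not None:
                pairs.append((e, r))
    return pairs
```

The pairs returned by `paired_samples` are the inputs that point-by-point statistics would be computed from.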

Cluster Analysis
A hierarchical cluster analysis technique [26] was used to determine subregions, hereafter called boxes, with a homogeneous PDC. The PDC is characterized by its amplitude and phase. The cluster analysis was performed on each valid grid point in the database for the 5-year period. The valid grid points are shown in Figure 2b.
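To make the idea concrete, the sketch below derives an amplitude and a phase from a mean 3-hourly cycle and groups grid points with a small agglomerative (hierarchical) clustering. Several choices here are assumptions made for illustration only, since the text does not specify them: amplitude is taken as maximum minus minimum, phase as the UTC hour of the maximum, phase is encoded on the unit circle, and clusters are merged by average linkage with Euclidean distance.

```python
import math


def pdc_features(cycle):
    """Amplitude (max - min) and phase (UTC hour of the maximum).

    cycle: eight mean 3-h accumulations for 00, 03, ..., 21 UTC.
    """
    amplitude = max(cycle) - min(cycle)
    phase_hour = 3 * cycle.index(max(cycle))
    return amplitude, phase_hour


def feature_vector(cycle):
    """Encode phase on the unit circle so that 21 and 00 UTC are neighbours."""
    amplitude, phase_hour = pdc_features(cycle)
    angle = 2 * math.pi * phase_hour / 24
    return (amplitude, math.cos(angle), math.sin(angle))


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

With `points = [feature_vector(c) for c in cycles]`, calling `agglomerate(points, 7)` would return seven groups of grid-point indices. This brute-force version is only practical for small samples.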
After the cluster analysis, seven boxes with different PDC characteristics were selected for this study. To verify the rainy period of each selected box, INMET climatological data were used, as shown in Figure 4. Table 3 specifies the domain, the season (which coincides with the rainy period of the box) and the number of grid points for each box. The number of points used in the statistical analyses is closely related to the number of available points in a given box.
As mentioned before, the cluster analysis selected seven groups with different PDC characteristics. The following paragraphs describe the main characteristics of each PDC and the main meteorological systems acting on those regions.
The largest precipitation accumulation in boxes 1 and 2 occurs during the Southern Hemisphere summer (308.46 mm and 290.30 mm, respectively). It is modulated by the presence of the SACZ, which extends from western Amazonia to the Atlantic Ocean [27][28][29], and by surface radiative heating. Box 1, located in the southeastern region of Brazil, is characterized by a mean maximum intensity of 1.25 mm/3 h at 2100 UTC (1800 Local Time, LT), after the maximum solar heating. This suggests that thermal forcing is the main driver of the PDC. The minimum precipitation is observed at 1200 UTC (0900 LT), with less than 0.5 mm/3 h.
In box 2, on the other hand, which is located in the western part of the Amazonian region, the peak is observed at 1800 UTC (1300 LT) with a mean maximum value of more than 2 mm/3 h, while the minimum value is observed at 0300 UTC with 0.7 mm/3 h. In this case, rainfall episodes are characterized by regimes of "low-level easterly" and "westerly" winds in the context of the large-scale circulation (enhancement or suppression of the SACZ) [30].
During the winter season, transients and, in particular, moisture advection from the South Atlantic subtropical anticyclone are the key mechanisms explaining the precipitation regime in the regions of boxes 3 and 4 [31]. However, precipitation is more intense near the coast (224.21 mm inside box 3) than further inland (77.44 mm inside box 4).
The convergence of the trade winds and the land breeze in the coastal region of NE Brazil in the early morning (more pronounced in winter) and the development and inland propagation of the sea breeze (although less intense) during the late morning and afternoon (box 4) are the main mechanisms that modulate the PDC in both boxes [32,33]. Transient phenomena such as frontal systems and easterly waves also affect the PDC in NE Brazil [34].
The observed data for the regions of boxes 5, 6 and 7 show a rainfall maximum during the austral fall season (290.15 mm, 358.52 mm and 312.77 mm, respectively), when the ITCZ reaches its southernmost position [35,36]. Convection is also induced over the northern coast of Brazil by the sea breeze, and the interaction of the trade winds with those breezes produces tropical squall lines that propagate into the continent, as described in [32,37,38].
The PDC of this region was extensively studied by Brito et al. [11] and Janowiak et al. [4]. In our case, box 5 lies over the continent close to the coast and represents the continental coastal regime, with maximum precipitation around 2100-0000 UTC (1800-2100 LT). The precipitation then propagates in phase, with a maximum value of 3.5 mm h⁻¹ at 0900 UTC in box 6 (coast-inland regime). These systems are known as tropical squall lines and were also studied by Rickenbach et al. [39]. The rainfall propagation continues into the Amazonian region, and the cycle changes from a non-uniform to a quasi-uniform regime in box 7 (inland regime), with maximum precipitation between 1200 and 1800 UTC. Another interesting feature is that the minimum precipitation increases as the precipitation moves inland, from almost zero in box 5 to approximately 0.4 and 0.7 mm h⁻¹ in boxes 6 and 7, respectively.

Statistical Indices
Various statistical indices were used for the regions defined in Section 3.2 in order to compare quantitatively the observations and the precipitation estimated by the different algorithms. The equations for the indices, as well as their interpretation, can be found in Wilks [18] and are summarized in Table 4. Pearson's correlation coefficient (CC) measures the agreement between the estimated precipitation and the observations at the gauge sites. The root-mean-square error (RMSE) measures the mean magnitude of the error, while the standard deviation (SD) measures the dispersion of the results for a given algorithm. Finally, the bias was used to calculate the systematic and random components of the error in the algorithm products. For each hour, all indices are presented in the tables and highlighted in the Taylor [40] diagrams, after normalization as in Taylor [41].

Table 4. List of the statistical indices used to assess the quality of the satellite-based precipitation estimates.
Statistic Index Formula Unit Perfect Value
Notation: n is the sample size; S_i is the SPE estimated precipitation; G_i is the gridded reference data (gauges).

Figure 5 shows the PDC average values for all algorithms and the reference dataset in the boxes defined in Section 3.2. For clarity, the dispersion of the mean values (expressed as a normalized value of the standard deviation) for each time is not shown in Figure 5; it is presented separately in Table 5 and Figure 6.
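As a rough illustration of the indices named above, the sketch below uses their standard textbook definitions, with S_i the estimates and G_i the gridded gauge values. Two points are assumptions rather than statements about the study: the bias is taken as the mean difference S - G, and the standard deviation uses the population form (division by n). The normalized SD is the ratio of the estimate SD to the reference SD, as commonly used in Taylor diagrams.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def bias(s, g):
    """Mean difference between estimates and reference (perfect value 0)."""
    return sum(si - gi for si, gi in zip(s, g)) / len(s)


def rmse(s, g):
    """Root-mean-square error (perfect value 0)."""
    return math.sqrt(sum((si - gi) ** 2 for si, gi in zip(s, g)) / len(s))


def std(xs):
    """Population standard deviation (division by n is an assumption)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson_cc(s, g):
    """Pearson correlation coefficient (perfect value 1)."""
    ms, mg = mean(s), mean(g)
    cov = sum((si - ms) * (gi - mg) for si, gi in zip(s, g)) / len(s)
    return cov / (std(s) * std(g))


def normalized_sd(s, g):
    """SD of the estimates divided by the SD of the reference (perfect value 1)."""
    return std(s) / std(g)
```

A positive bias under this convention corresponds to overestimation by the satellite product, and a normalized SD below 1 indicates less variability than in the gauge reference.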

Precipitation Diurnal Cycle Comparison
The PDC in the first region (box 1), located in the subtropical region and modulated by the SACZ and solar heating (maximum during late evening), is quite well represented by all algorithms. All of them match the time of maximum precipitation (between 2100 and 0000 UTC), but the minimum value around 1200 UTC (0900 LT) tends to be shifted by about three hours, to 1500 UTC, with the exception of CMORPH (Figure 5a). However, the bias differs considerably among the algorithms (see the first column of Table 5 and Figure 6a). While CMORPH and the IMERG suite overestimate the precipitation at all times (positive bias), the GSMaP suite presents the lowest bias values at almost all times, with the best performance for GSMaP-G. This result is expected because GSMaP-G is adjusted with daily gauge data, which helps to reduce the bias. It is also important to note that the GSMaP suite has the largest CC and the smallest dispersion (lowest normalized SD) when compared with the other algorithms (Table 5 and Figure 6a).
The second region (box 2), located in the far west of the Amazonian region, is largely influenced by the low-level circulation with the enhancement or suppression of SACZ activity, as mentioned before. In this case, none of the algorithms adequately reproduces the main characteristics of the PDC. The relative (and absolute) maximum of precipitation observed at 1800 UTC is not well represented by any algorithm (Figure 5b). This suggests that this dynamically driven precipitation process is not correctly represented by any of the SPEs. However, the GSMaP suite has the best agreement with the reference dataset during the period of minimum values (between 0000 and 1200 UTC), with minimum bias, larger CC and less dispersion. CMORPH and the IMERG suite agree better during the peak hour, while the decrease in precipitation rates at 2100 UTC is missed by all of them (they show the opposite behavior). It is worth mentioning that CMORPH and the IMERG suite have larger dispersion values and smaller CC than GSMaP at almost all times, indicating a larger degree of uncertainty in those retrievals (Table 5 and Figure 6b).

Boxes 3 and 4 (Figure 5c,d, respectively) are located in the northeastern region of Brazil. The PDC of this region and the physical drivers associated with this regime have been studied by Araujo [42]. That study clearly states that the fraction of precipitation from shallow convection is larger than the fraction from deep convective and stratiform clouds in a very thin region along the coast, which closely matches box 3. The frequency of precipitating deep convective clouds is relatively larger inland, but the absolute number of events is much smaller than in any other region, resulting in very low values of accumulated precipitation. These results are in good agreement with those obtained for box 4.
In regimes where shallow convection is the main physical driver of precipitation (box 3, in our case), the SPEs tend to fail in retrieving rainfall because ice scattering, the main technique used to retrieve rainfall over the continent, is not efficient for rainfall retrievals from water clouds. In that case, almost all algorithms underestimate the mean rainfall value (negative bias) at all times, with the exception of IMERG-F at 1800 and 2100 UTC (Table 5 and Figure 6c). None of the algorithms properly represents the amplitude and phase of the PDC (Figure 5c). It is also well known in the scientific community that water-land transitions and shallow convection regimes are among the most difficult challenges to be addressed in future versions of the SPEs.

Box 4, mainly located inland over the northeastern region of Brazil, is the region with the lowest average rainfall when compared with the other regions (Figure 5d). In this case, most of the algorithms reproduce those low precipitation values with low bias (well below 1 mm; see Table 5 and Figure 6d) at all times (note that Figure 5d has a different scale, which enhances the differences among algorithms). However, in general terms, the values of the normalized standard deviation are among the largest of all regions (Table 5 and Figure 6d). This result suggests a larger dispersion and, consequently, a larger uncertainty in the average value (sometimes four times the mean precipitation), which makes it very difficult to evaluate whether the PDC fits the reference dataset.
The last three boxes, located in the northern region, can be analyzed in terms of the propagation and dissipation of tropical squall lines from the coast to the Amazon region [11,39]. Box 5, located on the continental coast, is mainly driven by a land-sea breeze process and the formation of deep convective clouds after the maximum heating (Figure 5e). The phase of this regime is very well captured by all algorithms (with a peak at 2100 UTC, 1800 LT), while the amplitude is better represented by the GSMaP suite (MVK and G), with lower bias and less dispersion (lower standardized SD values; Table 5 and Figure 6e). The transition from a continental coastal to a coastal-inland region (box 6, Figure 5f), mainly driven by the inland displacement of tropical squall lines due to the trade winds, is better represented by CMORPH (in phase and amplitude). All other algorithms also estimate the phase of the peak hour at 1200 UTC (0900 LT) very well but fall short in amplitude. In this particular case, the IMERG suite overestimates the rainfall at all times (with a better fit for the Late version), while the GSMaP suite underestimates the maximum value and fits the minimum values better (Table 5 and Figure 6f).
The PDC in the last region (box 7) is not well represented by any algorithm (Figure 5g). While the amplitude of the GSMaP suite is well represented in magnitude, its phase is the opposite of the observed one. CMORPH and the IMERG suite all overestimate the average precipitation at all times, and the phase is completely missed when compared with the reference database (Table 5 and Figure 6g). The dissipation of some of the squall lines formed on the coast and their interaction with the low-level circulation are the main factors that modulate the PDC.

Discussion and Conclusions
This study assessed the ability of several high-resolution satellite-based precipitation estimates to represent the Precipitation Diurnal Cycle (PDC) over Brazil during the 2014-2018 period. To perform this task, rigorous quality control tests were applied to hourly rain gauge data from different national and regional networks, which were used as the reference dataset. A hierarchical cluster analysis applied to the rain gauge dataset yielded seven regions with different PDC characteristics (amplitude and phase), in which the performance of three different satellite-based precipitation algorithms (two of them with a gauge-adjusted version) was evaluated during the rainy season of each region.
The performance of all the SPEs analyzed is directly related to the characteristics of the most frequent rainy systems acting in a given region. Generally speaking, in regions where thermal heating produces deep convective clouds (i.e., boxes 1 and 5), the diurnal cycle is better represented in terms of amplitude and phase, while the PDC of systems driven by shallower convection and low-level circulation (i.e., box 3 and box 7) is poorly characterized by satellite-based retrievals. This result was expected because most of the algorithms rely on ice scattering techniques to retrieve rainfall over land, and they fail when the amount of ice is not directly related to the accumulated precipitation. A recent study by Costa et al. [43] over the Amazonian region showed that the rainfall overestimation error is a function only of the Ice Water Path (IWP), and Palharini et al. [44] concluded that shallow clouds are the dominant systems over the Brazilian southeastern coast. These points are discussed below for each region.
In areas where deep convective clouds are responsible for most of the accumulated rainfall (boxes 1 and 5), and where the best results are observed, some differences among the algorithms can be pointed out: (i) the GSMaP suite performs better than the IMERG suite and CMORPH, with lower bias, larger correlation coefficient and lower dispersion (blue colors in Table 5); (ii) GSMaP-G is slightly better than GSMaP-MVK due to the inclusion of daily gauge data, which reduces the bias (mainly during peak hours); (iii) the IMERG suite and CMORPH overestimate rainfall at all times (positive bias) and also have a larger dispersion. IMERG-F (adjusted with monthly rain gauge data) does not outperform IMERG-L, which suggests that the gauge analysis, in this particular case, does not improve the satellite-only retrieval.
In the Amazonian regions far from the coast (boxes 2 and 7), where rainfall episodes are characterized by regimes of low-level easterly and westerly winds in the context of the large-scale circulation, none of the algorithms adequately represents the amplitude and phase of the diurnal cycle. The IMERG suite and CMORPH (all of which rely on GPROF retrievals for passive microwave sensors) overestimate the observed rainfall. This could be due to errors in the IWP estimation, which lead to rainfall overestimation, as reported by Costa et al. [43], who also used GPROF retrievals in their study. IMERG-L outperforms IMERG-F in this region, as also observed in box 1. The GSMaP suite has, in general terms, lower bias and lower dispersion (Table 5) than the other algorithms.
Region 3 is dominated by shallow convective clouds, which are responsible for most of the accumulated rainfall. Because these clouds contain little or no ice, none of the algorithms reproduces the diurnal cycle properly, and the observed values are largely underestimated, mainly during the peak hour (0600 UTC). In this case, the gauge-adjusted versions (GSMaP-G and IMERG-F) perform better than the respective satellite-only versions. Region 4, also located in northeastern Brazil, has the lowest accumulated rainfall and the flattest diurnal cycle in terms of phase and amplitude.
Region 6 is characterized by the transition from a coastal land-sea breeze regime to a continental Amazonian regime. In this case, the inland displacement of tropical squall lines generated over the coast is the main driver of precipitation events. All algorithms, to different degrees, properly represent the phase of the diurnal cycle. However, the amplitude is overestimated by the IMERG suite (with better results for IMERG-L) and underestimated by the GSMaP suite (the bias-adjusted version outperforms the satellite-only version). CMORPH has the best statistics when compared with the other algorithms.
Future research should focus on understanding how shallow cloud processes and convective/stratiform classification are handled in each algorithm, in order to improve the representation of the diurnal cycle.

Funding:
This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior Brazil (CAPES)-Finance Code 001.

Acknowledgments:
The authors express their sincere thanks to the scientists responsible for the development of the GSMaP, CMORPH and IMERG algorithms. They also acknowledge the National Institute for Space Research (INPE) for the rain gauge database used in this study. The authors are thankful to the Ministry of Telecommunications, Information Technology, and Social Communication of Angola for sponsoring the publication of this article. The second author would like to acknowledge the São Paulo Research Foundation (FAPESP) for supporting this study through the project "Hydrometeorological Monitoring System (HMS) Based on Remote Sensing Products-2018/11160-2".

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: