Comprehensive analysis of GEO-KOMPSAT-2A and FengYun satellite-based precipitation estimates across Northeast Asia

ABSTRACT Geostationary meteorological satellites provide precipitation estimates with high spatio-temporal resolution, which is important for near real-time precipitation monitoring. This study systematically evaluated geostationary orbit (GEO) satellite-based quantitative precipitation estimates (QPEs) from the Chinese Fengyun-2G (FY-2G) and Fengyun-4A (FY-4A) satellites and the South Korean Geo-KOMPSAT-2A (GK-2A) satellite across Northeast Asia in 2020. Compared against ground-based rain gauges at a 6-hourly scale, FY-2G provided the highest accuracy in the China region, with a high correlation coefficient (R = 0.53) and a low bias (−0.26 mm), owing to the ground calibration process applied to FY-2G. Conversely, GK-2A provided more accurate precipitation estimates at stations in South Korea and Japan. FY-4A QPE generally showed a large positive bias throughout different seasons, although it provided satisfactory R and categorical statistics. FY-based QPEs slightly overestimated summer precipitation, especially over South Korea and Japan, while GK-2A tended to underestimate summer precipitation. All examined QPEs showed poor accuracy during the winter season due to frozen particles and ice clouds. Intensity analysis revealed that FY-based QPEs tended to overestimate the occurrence of no-rain and heavy-rain cases, whereas GK-2A underestimated no-rain and heavy-rain cases and overestimated the occurrence of light rain. All examined QPEs captured the temporal variation of precipitation during storm events, although FY-based products overestimated heavy precipitation peaks and GK-2A underestimated peak precipitation. The findings of this study provide valuable information for further improving current infrared precipitation retrieval algorithms.


Introduction
Northeast Asia is one of the regions greatly influenced by extreme precipitation (Kusunoki and Mizuta 2013; Takahashi and Fujinami 2021). The monsoon circulation and southwesterly winds during the summer contribute a large portion (~60%) of annual precipitation in this region (Iwao and Takahashi 2006). With the increasing trend of extreme precipitation associated with climate change, Northeast Asia has experienced more frequent and severe flood events (Preethi et al. 2017; Si et al. 2021). For instance, in 2020 China suffered its most severe flood since 1998, and the annual mean precipitation in 2020 (694.8 mm) was 10.3% more than in normal years (Chinese Meteorological Administration 2021). South Korea reported its most prolonged monsoon period of the past seven years in 2020, with over 46 days of heavy rainfall. In Japan, the annual occurrence of short, intense precipitation (i.e. larger than 50 mm/hr) has become approximately 1.4 times higher within the past 30 years (https://www.mlit.go.jp/river/bousai/bousai-gensaihonbu/2kai/pdf/siryou01-2.pdf). Therefore, understanding the spatio-temporal characteristics of precipitation, especially extreme precipitation, in near real time is vital for hazard mitigation.
Conventional rain gauges directly measure precipitation with relatively high accuracy and high temporal frequency. However, they only provide point measurements and incur high operational and staffing costs (Sun et al. 2018). Ground-based radar instruments provide continuous spatial coverage of precipitation at high resolution in real time (Germann et al. 2006) but suffer from beam blockage by terrain and insufficient coverage across the globe (Habib et al. 2012). In recent years, different types of satellite-based precipitation products have been increasingly used in various fields (Gummadi et al. 2022). Passive microwave (PMW) sensors onboard low Earth orbit (LEO) satellites can "see" through clouds and yield realistic instantaneous precipitation estimates, but they are limited by coarse temporal resolution (Xu, Adler, and Wang 2013).
Geostationary (GEO) meteorological satellites provide quantitative precipitation estimates (QPE) by building a relationship between infrared (IR) cloud top temperature and the probability and intensity of precipitation on the ground (Upadhyaya et al. 2020). Cold cloud tops usually suggest a larger vertical development in the cloud, and thus more precipitation (Sun et al. 2018). GEO-based QPEs provide large-scale coverage with high resolution in both space (~2 km) and time (5-30 minutes). Therefore, they are essential sources to fill the gaps in PMW QPE to generate spatially continuous global precipitation products with high temporal resolution. However, the relationship between cloud top temperature and precipitation is indirect and not always correct (Sun et al. 2018).
The two main GEO meteorological satellite series focusing on Northeast Asia are the Chinese Fengyun (FY) satellites and the South Korean Geostationary Korean Multi-Purpose Satellite (GK). Despite the valuable information provided by these two satellite series, their QPE products have rarely been examined. A limited number of studies (Lu et al. 2020) evaluated and compared FY-2G QPE products with other satellite-based precipitation products, such as the Integrated Multi-satellite Retrievals for Global Precipitation Measurement (GPM IMERG; Hou et al. 2014) precipitation products, over the China region. Results showed that FY-2G generally underestimated summer precipitation in China (Lu et al. 2020), and that FY-2G detected light rainfall (0.2-5 mm/hr) with higher accuracy than IMERG. As for FY-4A, Gao et al. (2020) found that FY-4A failed to capture the spatio-temporal characteristics of precipitation and severely overestimated summer precipitation in southern China. Ren et al. (2021) found that FY-4A underestimated hourly precipitation over western China. Even fewer studies have evaluated the QPE product from the GK satellite series. Baik and Choi (2015) assessed the precipitation product from the first-generation GK series over the Korean peninsula and found that the QPE overestimated precipitation before the monsoon season and underestimated it afterward.
FY and GK satellites provide near-real-time precipitation estimates, which are important tools for hazard monitoring. The accuracy of FY and GK QPEs also strongly affects the quality of merged global precipitation products. However, as stated above, the QPE products from FY and GK satellites have seldom been examined, and most studies have focused on sub-regional to regional scales (Baik and Choi 2015; Xu et al. 2019; Lu et al. 2020; Gao et al. 2020; Ren et al. 2021). A systematic evaluation and comparison of the two main GEO-based QPEs across Northeast Asia has not yet been conducted. Therefore, this study evaluated QPEs from the Chinese FY-2G and FY-4A and the South Korean GK-2A at the continental scale. To the best of our knowledge, this study is also the first evaluation of the GK-2A QPE product since the satellite's launch in 2018. Moreover, the findings of this study will provide valuable information for further improving current GEO-based precipitation retrieval algorithms. The remainder of this paper is organized as follows: Section 2 describes the study area and data sets used in this study. Section 3 introduces the statistical metrics used to evaluate the QPEs. Section 4 assesses the performance of FY-2G, FY-4A, and GK-2A QPEs by season, by rainfall intensity, and during tropical storm periods. A discussion of results follows in Section 5, and Section 6 summarizes the key findings of the study.

Study area and ground-based stations
This study selected Northeast Asia, including parts of China, Japan, and South Korea, as the study domain, bounded by 18°N to 52°N and 75°E to 147°E. Figure 1 illustrates the elevation over the study domain along with the ground-based stations used to evaluate the satellite-based precipitation estimates. More than 60% of the study domain is classified as mountainous, with highest elevations of 8,848 m, 3,776 m, and 1,950 m in China, Japan, and South Korea, respectively. Summer precipitation across the study area is influenced by the East Asian summer monsoon (Si et al. 2021). This rainy season lasts approximately 20-30 days and brings heavy rainfall and tropical storms; as a result, summer precipitation accounts for more than half of the total annual precipitation over the study domain (Baik and Choi 2015; Liu et al. 2021; Yihui and Chan 2005). Accordingly, the study area is highly prone to natural hazards such as floods and precipitation-induced landslides (Kundzewicz et al. 2019; Ma et al. 2021; Lei et al. 2021), which hinders the development of an efficient water management plan.

Satellite-based quantitative precipitation estimates (QPE)
Geo-KOMPSAT-2A (GK-2A), a geostationary satellite managed by the Korean Meteorological Administration (KMA) as the successor of the Communication, Ocean, and Meteorological Satellite (COMS), was launched in December 2018 to continuously monitor meteorological and oceanic conditions and provide communication services (Kim et al. 2021). GK-2A operates in geostationary orbit near the equator at a longitude of 128.2°E with a mean altitude of 35,686 km and an inclination of 0° (Magnes et al. 2020). It carries the Advanced Meteorological Imager (AMI), an advanced version of the Meteorological Imager (MI) onboard COMS. Specifically, COMS MI has five spectral bands (i.e. one visible and four infrared bands), while GK-2A AMI has 16 spectral bands (i.e. four visible, two near-infrared, and 10 infrared bands) with a spatial resolution ranging from 0.5 km to 2 km (Kim et al. 2021; Jee et al. 2020). GK-2A uses brightness temperature (Tb) differences from four bands (i.e. 6.24, 8.59, 11.21, and 12.36 μm) and Tb from the water vapor band to classify clouds into five types (i.e. shallow, tall cold, tall colder, taller cold, and taller colder cloud) across four latitude intervals (i.e. 80°S-30°S, 30°S-0°, 0°-30°N, and 30°N-80°N). The classified cloud information from GK-2A is then used with a prior probability density function from the GPM dual-frequency precipitation radar (DPR) in a Bayesian inversion to estimate precipitation. Finally, a cumulative distribution function matching method is applied to the estimated precipitation from GK-2A using GPM DPR-based precipitation in 2016 (Shin, Seo, and Kim 2019). GK-2A QPE has been available since 31 October 2019 from the KMA website (https://data.kma.go.kr/data/rmt/rmtList.do?code=21&pgmNo=683). This study used the GK-2A QPE for 2020 with a temporal resolution of 10 min and a spatial resolution of 4 km (full-disk).
The Fengyun 2 Meteorological Satellite Series (FY2) is the first generation of GEO meteorological satellites developed by China, and FY-2G is one of the operational GEO satellites, launched in December 2014 for monitoring natural disasters (Lu et al. 2020). The FY-2G satellite currently operates in geosynchronous orbit at 99.2°E (since April 2018) near the equator at an altitude of 35,786 km. The Stretched Visible and Infrared Spin Scan Radiometer (SVISSR) onboard FY-2G has five channels covering the visible, middle wavelength infrared, and thermal infrared spectra. FY-2G QPE is post-processed with ground-based stations located in mainland China, considering the directionality and intensity of precipitation, in order to bias-correct the estimates (Lu et al. 2020). This study utilized the FY-2G level 2 hourly QPE for 2020 with a spatial resolution of 5 km (full-disk), which can be accessed via the China Meteorological Administration (CMA) National Satellite Meteorological Center (NSMC) webpage (https://satellite.nsmc.org.cn/portalsite/default.aspx).
The Fengyun 4 Meteorological Satellite Series (FY4) is the second generation of GEO meteorological satellites developed by China, and FY-4A is the first operational GEO meteorological satellite in the FY4 series, launched on 11 December 2016 (Yang et al. 2017). The FY-4A satellite currently operates in geosynchronous orbit at a longitude of 104.7°E. It carries the Advanced Geosynchronous Radiation Imager (AGRI) with 14 channels covering the visible, near-infrared, shortwave infrared, middle wavelength infrared, water vapor, and longwave infrared spectra (Ren et al. 2021; Yang et al. 2017). This study utilized the level 2 hourly QPE for 2020 with a spatial resolution of 4 km at nadir (full-disk), which can also be accessed via the CMA NSMC webpage.

Ground-based measurements and validation procedures
The Integrated Surface Database (ISD) precipitation datasets provided by the National Oceanic and Atmospheric Administration National Centers for Environmental Information (NOAA NCEI; https://www.ncdc.noaa.gov/isd) were used to evaluate GEO-based QPEs in China and Japan. For South Korea, the Automated Surface Observation System (ASOS) precipitation dataset provided by the KMA (https://data.kma.go.kr/data/) was used. NOAA ISD provides 6-hourly accumulated precipitation in Coordinated Universal Time (UTC), while ASOS provides hourly precipitation in the local time zone. Accordingly, the ASOS precipitation dataset was converted to UTC and subsequently aggregated to 6-hourly cumulative precipitation to match the ISD precipitation measurements. For quality control, the data quality flag provided with the ISD datasets was used to assist station selection. Additionally, this study excluded stations with consecutive missing data for longer than 3 days in 2020. Afterward, a double mass curve analysis was conducted to ensure the consistency of the precipitation datasets. As a result, a total of 304 ground-based stations (134, 75, and 95 stations in China, Japan, and South Korea, respectively) located within the study domain were selected, as shown in Figure 1.
To validate GEO-based QPEs against ground-based measurements, all the examined precipitation estimates (i.e. FY-2G, FY-4A, and GK-2A) were temporally aggregated to 6 hours. Afterward, a point-to-pixel analysis (Thiemig et al. 2012; Logah et al. 2021) was conducted by comparing observed precipitation at each rain gauge with the GEO-based precipitation estimate at the pixel in which the rain gauge is located. The mismatch in spatial footprint between a rain gauge and a GEO-based precipitation estimate may introduce uncertainty into the statistical evaluation. However, spatial interpolation of ground-based precipitation also introduces uncertainties and causes a loss of accuracy, especially across orographic areas (Hu et al. 2016; Herrera et al. 2019). Considering the complex terrain in Northeast Asia as well as the relatively sparse station network (304 stations) in this study, we chose to conduct the point-to-pixel analysis.
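The time-zone conversion and 6-hourly aggregation of the hourly gauge data can be sketched as follows. This is a minimal illustration, not the processing code used in the study. It assumes that each hourly value is the accumulation over the hour ending at its timestamp, that the local time is Korea Standard Time (UTC+9), and that windows end at 00, 06, 12, and 18 UTC; the function name `to_six_hourly_utc` is hypothetical, and missing-data handling is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Korea Standard Time (UTC+9); assumed to be the local time zone of the hourly records.
KST = timezone(timedelta(hours=9))


def to_six_hourly_utc(hourly_records):
    """Aggregate (naive_local_datetime, mm) hourly records into 6-hourly UTC totals.

    Assumption: each value is the accumulation over the hour ENDING at its
    timestamp. Totals are keyed by the end of the 6-hour window in UTC.
    """
    totals = defaultdict(float)
    for local_time, amount_mm in hourly_records:
        utc_end = local_time.replace(tzinfo=KST).astimezone(timezone.utc)
        # Use the start of the hour so that a value ending exactly at 06 UTC
        # is assigned to the 00-06 UTC window, not the following one.
        start = utc_end - timedelta(hours=1)
        window_start = start.replace(
            hour=start.hour // 6 * 6, minute=0, second=0, microsecond=0
        )
        totals[window_start + timedelta(hours=6)] += amount_mm
    return dict(totals)
```

For example, six hourly values of 1.0 mm stamped 10:00 to 15:00 local time fall into the window ending at 06 UTC and sum to 6.0 mm.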
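The point-to-pixel matching can be illustrated with the short sketch below. It assumes, purely for illustration, that the QPE field has already been resampled onto a regular latitude/longitude grid whose first row is the northernmost one; the native full-disk grids of the satellites are not regular in latitude and longitude, so operational code would need the appropriate projection. The function names and the tuple layout of `stations` are hypothetical.

```python
def containing_pixel_index(lat, lon, lat_north, lon_west, dlat, dlon, n_rows, n_cols):
    """Return (row, col) of the grid cell containing a point, or None if outside.

    lat_north / lon_west are the northern and western EDGES of the grid;
    dlat / dlon are positive cell sizes in degrees; rows run north to south.
    """
    row = int((lat_north - lat) // dlat)
    col = int((lon - lon_west) // dlon)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


def point_to_pixel_pairs(stations, grid, lat_north, lon_west, dlat, dlon):
    """Pair each gauge value with the QPE value of the pixel containing the gauge.

    stations: iterable of (name, lat, lon, gauge_mm); grid: list of rows of mm values.
    Stations falling outside the grid are skipped.
    """
    pairs = []
    for name, lat, lon, gauge_mm in stations:
        idx = containing_pixel_index(
            lat, lon, lat_north, lon_west, dlat, dlon, len(grid), len(grid[0])
        )
        if idx is not None:
            pairs.append((name, gauge_mm, grid[idx[0]][idx[1]]))
    return pairs
```

No interpolation is performed: each gauge is compared only with the single pixel that contains it, which is the defining feature of the point-to-pixel approach.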

Statistical evaluation
This study implemented (1) goodness of fit statistics and (2) categorical statistics (e.g. contingency table) for the quantitative evaluation of FY-2G, FY-4A, and GK-2A QPEs with ground-based measurements following previous studies (Gao et al. 2020; Shahid et al. 2021).

Goodness of fit statistics
GEO-based QPEs collocated with the ground-based measurements were analyzed using bias, root mean square error (RMSE), and Pearson's correlation coefficient (R) as follows:
$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(R_i - O_i\right)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - O_i\right)^2}$$
$$R = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - \bar{R}\right)\left(O_i - \bar{O}\right)}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}$$
where $R_i$ and $\bar{R}$ represent the instantaneous value and mean of the 6-hourly satellite-based QPEs (i.e. FY-2G, FY-4A, and GK-2A), and $O_i$ and $\bar{O}$ represent the instantaneous value and mean of precipitation measured at the ground-based stations, respectively. n denotes the sample size used to calculate the statistics over the entire study period at each point.
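As a worked illustration of these three statistics, the sketch below computes them for one station from paired 6-hourly series. It assumes the conventional definitions (bias as the mean satellite-minus-gauge difference, consistent with the millimetre units reported in this study); the function name `goodness_of_fit` is hypothetical.

```python
import math


def goodness_of_fit(sat, obs):
    """Return (bias, rmse, r) for paired satellite and gauge series in mm.

    bias is the mean of (satellite - gauge); r is Pearson's correlation
    coefficient, returned as NaN when either series has zero variance.
    """
    n = len(sat)
    if n != len(obs) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    bias = sum(s - o for s, o in zip(sat, obs)) / n
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / n)
    mean_s = sum(sat) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_s > 0 and var_o > 0:
        r = cov / math.sqrt(var_s * var_o)
    else:
        r = float("nan")
    return bias, rmse, r
```

A series that is perfectly correlated with the gauge data can still carry a large bias and RMSE, which is why the three statistics are reported together.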

Categorical statistics
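The contingency table has been extensively utilized in hydrologic fields to analyze the hits and misses of a simulated dataset compared to observations. However, the usual contingency table statistics, such as the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI), do not consider the actual volume of the target variable in the calculation. Thus, this study implemented the extended contingency table introduced by AghaKouchak and Mehran (2013), which provides an overall measure of volumetric performance and has been demonstrated to be helpful for evaluating gridded data sets. More specifically, the volumetric hit index (VHI), volumetric false alarm ratio (VFAR), and volumetric critical success index (VCSI) were calculated as follows:
$$\mathrm{VHI} = \frac{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i > t\right)}{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i > t\right) + \sum_{i=1}^{n}\left(O_i \mid R_i \le t \,\&\, O_i > t\right)}$$
$$\mathrm{VFAR} = \frac{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i \le t\right)}{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i > t\right) + \sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i \le t\right)}$$
$$\mathrm{VCSI} = \frac{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i > t\right)}{\sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i > t\right) + \sum_{i=1}^{n}\left(O_i \mid R_i \le t \,\&\, O_i > t\right) + \sum_{i=1}^{n}\left(R_i \mid R_i > t \,\&\, O_i \le t\right)}$$
where $R_i$ and $O_i$ are the satellite-based and observed 6-hourly precipitation, and t represents the threshold, which is set as 0.6 mm for 6-hourly accumulated precipitation (following the threshold of 0.1 mm for hourly precipitation in Behrangi et al. 2010). Even though all the categorical indices range from 0 to 1, they have different meanings: VHI and VCSI values closer to 1 indicate that the volume of simulated precipitation is close to that of observed precipitation, while VFAR values closer to 1 indicate that the volume of simulated precipitation does not match that of observed precipitation.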
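The volumetric indices can be sketched in a few lines of code. The version below follows the general formulation of AghaKouchak and Mehran (2013) as we understand it, with hit, miss, and false-alarm volumes defined relative to the threshold t; it is an illustrative sketch rather than the code used in this study, and the function name `volumetric_indices` is hypothetical.

```python
def volumetric_indices(sat, obs, t=0.6):
    """Return (VHI, VFAR, VCSI) for paired satellite and gauge series in mm.

    hit_vol:   satellite volume where both satellite and gauge exceed t
    miss_vol:  gauge volume where the gauge exceeds t but the satellite does not
    false_vol: satellite volume where the satellite exceeds t but the gauge does not
    An index is NaN when its denominator is zero.
    """
    nan = float("nan")
    hit_vol = sum(s for s, o in zip(sat, obs) if s > t and o > t)
    miss_vol = sum(o for s, o in zip(sat, obs) if s <= t and o > t)
    false_vol = sum(s for s, o in zip(sat, obs) if s > t and o <= t)
    vhi = hit_vol / (hit_vol + miss_vol) if hit_vol + miss_vol > 0 else nan
    vfar = false_vol / (hit_vol + false_vol) if hit_vol + false_vol > 0 else nan
    total = hit_vol + miss_vol + false_vol
    vcsi = hit_vol / total if total > 0 else nan
    return vhi, vfar, vcsi
```

For the series sat = [5, 0, 3, 0] and obs = [4, 2, 0, 0] with t = 0.6 mm, the hit, miss, and false-alarm volumes are 5, 2, and 3 mm, giving VHI = 5/7, VFAR = 3/8, and VCSI = 0.5.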

Overall performance of satellite-based QPEs over Northeast Asia
Six-hourly cumulative precipitation estimates from FY-2G, FY-4A, and GK-2A were evaluated against ground-based measurements, and station-averaged statistical metrics are summarized in Table 1. FY-2G provided a higher consistency with ground-based measurements at most stations with a larger station-averaged R value (R FY-2G = 0.46) and a smaller magnitude of bias (bias FY-2G = 0.  -4) showed that the performance of FY-2G is highly region-dependent, with higher R as well as smaller RMSE and bias detected in the China region. However, such region-dependent behavior was not observed in GK-2A. FY-4A did not present significant regional dependence in terms of R, except that some stations in western China showed large R values. Nevertheless, FY-4A provided large RMSE and bias for stations located in southeastern China, South Korea, and Japan, where the annual precipitation is relatively larger. Maps of categorical statistics (Figures S1-S3 in the supplementary material) indicated that, with a threshold of 0.6 mm, FY-2G outperformed FY-4A and GK-2A in the China region. Although FY-4A yielded a large bias in southeastern China, South Korea, and Japan, it still provided high VHI and low VFAR relative to GK-2A in these regions. Given the different behaviors of FY-2G for stations located inside versus outside the China region, station-averaged statistics for China, South Korea, and Japan were computed separately and are summarized in Table 1. Results suggested that FY-2G consistently yielded better statistics in the China region for all examined metrics, with higher consistency (e.g. R FY-2G_China = 0.53, R FY-2G_Korea = 0.46, and R FY-2G_Japan = 0.33) and smaller errors (e.g. RMSE FY-2G_China = 4.22 mm, RMSE FY-2G_Korea = 8.07 mm, and RMSE FY-2G_Japan = 9.24 mm) when compared to ground-based measurements. However, in South Korea and Japan, FY-2G provided inferior agreement and larger errors relative to GK-2A (e.g. R GK-2A_Korea = 0.48 and RMSE GK-2A_Korea = 5.37 mm).

Seasonal analysis of satellite-based QPEs over Northeast Asia
Six-hourly cumulative precipitation averaged for each season (hereafter referred to as the seasonal average) derived from FY-2G, FY-4A, and GK-2A was evaluated against ground-based measurements at each station location. As only precipitation in 2020 was used in this study, 6-hourly precipitation from March to May, June to August, and September to November was used to compute the seasonal average in spring, summer, and autumn, respectively. For the winter season, precipitation estimates in January, February, and December of 2020 were used to calculate the average.
Scatter plots of seasonally averaged 6-hourly precipitation from GEO-based QPEs against ground-based measurements are shown in Figure 5, and the corresponding bias maps are shown in Figure 6. During the spring season, FY-2G and GK-2A agreed better with ground-based measurements than FY-4A, with most stations located along the 1:1 line (Figure 5a). However, GK-2A overestimated precipitation at some stations located in southeastern China, where FY-2G exhibited a slight underestimation (Figure 6a, b). As for FY-4A, a positive bias was observed at most stations.
During the summer season, FY-based QPEs behaved opposite to GK-2A QPE. An overall underestimation of summer precipitation was found in GK-2A, whereas FY-2G and FY-4A overestimated precipitation at most stations, especially in South Korea and Japan (Figure 6d-f). All QPEs underestimated precipitation in the China region during the autumn, but FY-based QPEs still overestimated precipitation outside of China. All examined QPEs (i.e. FY-2G, FY-4A, and GK-2A) yielded their worst performance during the winter season, with most stations deviating from the 1:1 line (Figure 5d). FY-2G and GK-2A both underestimated precipitation across most stations, while FY-4A largely overestimated precipitation in southeastern China, South Korea, and Japan (Figure 6j-l).
Temporal statistics of 6-hourly precipitation between GEO-based QPEs and ground-based measurements at each station location were computed for the different seasons, and station-averaged statistics are listed in Table 2. FY-2G, FY-4A, and GK-2A all provided better statistics during the summer season, when there is more precipitation at most stations, with relatively larger R values (R FY-2G_summer = 0.49, R FY-4A_summer = 0.43, and R GK-2A_summer = 0.41), a higher hit rate (VHI FY-2G_summer = 0.80, VHI FY-4A_summer = 0.79, and VHI GK-2A_summer = 0.75), and a lower false alarm ratio (VFAR FY-2G_summer = 0.26, VFAR FY-4A_summer = 0.29, and VFAR GK-2A_summer = 0.43) than in other seasons. The accuracy of precipitation estimates from FY-2G, FY-4A, and GK-2A during the winter is evidently inferior relative to other seasons (e.g. R FY-2G_winter = 0.26, R FY-4A_winter = 0.26, and R GK-2A_winter = 0.28). The inferior performance of QPEs during the winter season is further discussed in Section 5.
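The seasonal grouping described above, in which winter combines January, February, and December of the same calendar year, can be sketched as follows. The function name `seasonal_means` and the record layout are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict
from datetime import datetime

# Month-to-season mapping; winter uses Jan, Feb, and Dec of the SAME year,
# because only data from 2020 are available.
SEASON_BY_MONTH = {
    1: "winter", 2: "winter", 12: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(records):
    """Return {season: mean 6-hourly precipitation in mm} from (datetime, mm) records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, amount_mm in records:
        season = SEASON_BY_MONTH[timestamp.month]
        sums[season] += amount_mm
        counts[season] += 1
    return {season: sums[season] / counts[season] for season in sums}
```

Because the winter months are not contiguous within a single calendar year, the winter average mixes the end of one cold season with the start of the next.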
Additionally, an intensity analysis was conducted for the 6-hourly cumulative precipitation derived from ground stations, FY-2G, FY-4A, and GK-2A during different seasons and the whole year in 2020 (Figure 7). FY-2G generally showed a distribution of rainfall intensity between 1 and 20 mm more similar to that of ground-based measurements than GK-2A did, except for the winter season. During the winter season, FY-2G consistently underestimated the frequency of 6-hourly cumulative precipitation between 0.1 and 50 mm (Figure 7d). It is also noted that FY-2G showed a slightly higher frequency of no rain (6-hourly cumulative precipitation < 0.1 mm) and heavy rainfall (>50 mm) events relative to ground-based measurements throughout different seasons. FY-4A satisfactorily represented the frequency of precipitation between 1 and 20 mm compared to ground-based measurements. However, it largely underestimated the frequency of very light rainfall (0.1-1 mm) and overestimated the frequency of both moderate (20-50 mm) and heavy rainfall (>50 mm), especially during the winter season. In contrast to FY-based QPEs, GK-2A tended to produce fewer no-rain and heavy rainfall cases across different seasons (Figure 7a-d). However, it produced more light rainfall cases (0.1-5 mm) relative to ground-based measurements, especially during the summer season.
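An intensity analysis of this kind amounts to computing the relative frequency of 6-hourly totals in a set of intensity classes. The sketch below uses class edges taken from the ranges mentioned in the text (0.1, 1, 20, and 50 mm); the exact classes used for Figure 7 and the treatment of values falling exactly on an edge are assumptions, and the function name `intensity_frequency` is hypothetical.

```python
from bisect import bisect_right

# Class edges in mm per 6 h; each class is [lower, upper), an assumed convention.
EDGES = [0.1, 1.0, 20.0, 50.0]
LABELS = ["<0.1 (no rain)", "0.1-1", "1-20", "20-50", ">=50"]


def intensity_frequency(values_mm):
    """Return {class label: relative frequency} for a series of 6-hourly totals."""
    if not values_mm:
        raise ValueError("at least one value is required")
    counts = [0] * len(LABELS)
    for value in values_mm:
        # bisect_right counts how many edges are <= value, which is the class index.
        counts[bisect_right(EDGES, value)] += 1
    total = len(values_mm)
    return {label: count / total for label, count in zip(LABELS, counts)}
```

Applying the same function to the gauge series and to each QPE series yields directly comparable frequency distributions.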

Analysis of satellite-based QPEs during the storm event
As geostationary satellites provide precipitation estimates with relatively high resolution in both space and time, GEO-based QPEs are vital for near real-time precipitation monitoring, especially during storm events. This section evaluated the performance of FY-2G, FY-4A, and GK-2A QPEs during two storm periods in 2020: (1) tropical storms Mekkhala and Higos (August 9 to 21 over Fujian and Guangdong provinces, China) and (2) tropical storms Bavi, Maysak, and Haishen (August 26 to September 8 in South Korea). Statistical evaluation was conducted at three time steps (i.e. 6-hourly, 12-hourly, and daily) over Fujian and Guangdong provinces, China, while four time steps (i.e. hourly, 6-hourly, 12-hourly, and daily) were used over South Korea, as KMA provides ground-based measurements of hourly precipitation.
Station-averaged time series of 6-hourly precipitation estimates from the three GEO-based QPEs were compared against ground-based measurements within Fujian and Guangdong provinces during the storm period (Figure 9). The result confirmed that all examined QPEs generally followed a temporal variation similar to that of the ground-based measurements. In terms of magnitude, GK-2A QPE consistently underestimated precipitation, while FY-2G and FY-4A QPEs overestimated it. In particular, FY-2G and FY-4A QPEs tended to overestimate peak precipitation during 18-19 August, which coincided with the passage of tropical storm Higos over Fujian and Guangdong provinces. Detailed discussions of the individual GEO-based QPEs during the storm period are given in Section 5.
Figure 10 presents the statistical performance of the three GEO-based QPEs at 95 ground-based stations located in South Korea from 26 August 2020 to 8 September 2020. The range of station-averaged R did not differ significantly among the three QPE products (Figure 10c). However, GK-2A QPE showed the smallest magnitude of bias and RMSE, followed by FY-2G and FY-4A (Figure 10a). More specifically, the bias of FY-4A ranged from −2.39 to 27.9 mm, while that of GK-2A ranged from −9.4 to 1.57 mm. Comparing the bias at different time steps indicated that GK-2A QPE showed a positive bias at the hourly time step and a negative bias at the daily time step, while the opposite behavior was found for FY-based products. In terms of RMSE, GK-2A showed the lowest values (4.91 to 25.4 mm), followed by FY-2G (7.96 to 44.9 mm) and FY-4A (9.98 to 70.9 mm) across South Korea (Figure 10b). For the categorical indices illustrated in Figure 10d-f, all three examined QPEs showed a similar trend: VHI and VCSI increased and VFAR decreased as the time step lengthened, except that VHI decreased from the hourly to the 6-hourly time scale. Among the examined QPEs, FY-4A yielded the lowest VHI and VCSI and the highest VFAR, suggesting that FY-4A performed worst in capturing the storm event over South Korea.

Tropical storms Bavi, Maysak, and Haishen
Again, time series of station-averaged hourly and 6-hourly precipitation estimates from FY-2G, FY-4A, and GK-2A were compared against ground-based precipitation measurements (Figure 11). Although all examined QPE products were able to capture the temporal variability, the magnitude of precipitation showed discrepancies when compared against the ground-based measurements. In the case of GK-2A, even though its station-averaged goodness-of-fit statistics were better than those of the FY-based products in Figure 10, GK-2A consistently underestimated precipitation at both hourly and 6-hourly time steps (Figure 11). In contrast, FY-2G and FY-4A QPEs overestimated the peak precipitation on 27 August, 2-3 September, and 6-7 September 2020, which coincided with the occurrence of the storm events across Korea. Additionally, FY-based products deviated more from ground-based measurements than GK-2A did, which in turn resulted in slightly larger station-averaged bias and RMSE, as summarized in Figure 10b.
In addition, diurnal variations of the station-averaged hourly precipitation from satellite-based QPEs were assessed against ground-based measurements (Figure 12). A similar diurnal variation was observed in the ground- and satellite-based precipitation, in that relatively large variation occurred from the late afternoon to the evening (i.e. 15-23 h local time). Among the three GEO-based QPEs, FY-2G showed the diurnal variation most similar to that of the ground-based measurements, followed by GK-2A and FY-4A. GK-2A yielded the least variation among the three examined QPEs, with a constant underestimation of precipitation intensity that is also depicted in Figures 10a and 11a. Both FY-4A and FY-2G overestimated precipitation from the late afternoon to the evening. Notably, FY-4A QPE showed larger fluctuations in the diurnal variation of precipitation, resulting in the largest station-averaged bias and RMSE depicted in Figure 10.
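A diurnal cycle of this kind is obtained by averaging the hourly values by local hour of day over the event period. The sketch below assumes time-zone-aware UTC timestamps and a fixed offset of +9 hours for Korean local time; the function name `diurnal_cycle` is hypothetical, and the station averaging performed in the study is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def diurnal_cycle(hourly_utc, utc_offset_hours=9):
    """Return {local hour: mean hourly precipitation in mm}.

    hourly_utc: iterable of (timezone-aware UTC datetime, mm) records.
    utc_offset_hours: offset of local time from UTC (+9 assumed for Korea).
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, amount_mm in hourly_utc:
        local_hour = timestamp.astimezone(local_zone).hour
        sums[local_hour] += amount_mm
        counts[local_hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}
```

Hours with no records are simply absent from the result, so a complete 24-value cycle requires data coverage for every local hour.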

Discussion
FY-2G, FY-4A, and GK-2A QPEs generally exhibited a similar variation of precipitation when compared against ground-based measurements. However, errors and uncertainties in the QPEs are not neglectable, which are mainly associated with the precipitation retrieval algorithm. GEO-based QPE is estimated based on the relationship assumed between the cloud-top infrared brightness temperature and rainfall intensity (Upadhyaya et al. 2020), which is not always correct. For example, it may mistakenly identify nonprecipitating cirrus clouds as precipitating clouds due to their low brightness temperature (Scofield and Kuligowski 2003), resulting in an overestimation of precipitation. The accuracy of GEObased QPEs varied in different seasons with the worst performance found during the winter season. Previous studies (Vicente, Scofield, and Menzel 1998;Upadhyaya et al. 2020) demonstrated that the infrared rainfall estimation technique performed poorly in estimating stratiform precipitation such as underestimating rainfall rate in warm-top stratiform cloud systems. However, mid-level stratiform clouds are prevalent over East Asia in cold seasons (Wu and Chen 2021), which may explain the relatively poorer performance of the examined QPEs during the winter season. Additionally, the increased dust (Kaufman, Tanré, and Boucher 2002) and the existence of frozen particles and ice clouds (Prigent 2010) during the winter complicate the physical rainfall process, which further causes problems for the established cloud top temperature and rainfall intensity relationship (Vicente, Scofield, and Menzel 1998).
Previous studies (Zhu et al. 2018; found the good performance of FY-2G over China. For instance,  showed that FY-2G yielded the best statistics at daily scale (R = 0.73 and RMSE = 5.80 mm/day) across mainland China relative to FY-2 F and two IMERG products (i.e. IMERG-Late and IMERG-Final). However, there were no studies analyzing the accuracy of FY-2G QPE outside of China. This study evaluated the performance of FY-2G both inside and outside of the China region. Similar good performance of FY-2G in the China region was also witnessed in this study, whereas FY-2G exhibited inferior performance in Japan or South Korea (Table 1). The regional dependency of FY-2G statistics is associated with the ground calibration implemented to FY-2G QPE in China (Zhu et al. 2018). FY-2G QPE includes a fusion strategy to calibrate satellitebased precipitation estimates with ground-based measurements. The calibration process uses over 2000 meteorological stations in China with both distance and the angle between the target grid and stations considered (Xu et al. 2008;). However, this ground calibration process was only conducted in the China region. Therefore, FY-2G QPE yielded higher accuracy in the China region than in South Korea or Japan. It is also noted that ISD stations used for evaluation in the China region may be partially included as the ground calibration data for FY-2G. Furthermore, some ISD stations that are not directly utilized in calibration process might still lead to optimistic evaluation if they are located close to stations used for the calibration.
FY-4A generally yielded inferior accuracy relative to FY-2G and GK-2A, with larger bias and RMSE values. The relatively poor performance of FY-4A found here is consistent with previous studies (Gao et al. 2020; Ren et al. 2021; Qiu et al. 2021). Gao et al. (2020) found that FY-4A failed to capture the spatial characteristics of summer precipitation in southern China, with the lowest R and largest RMSE at a daily scale (R = 0.36 and RMSE = 27.51 mm/day). Ren et al. (2021) also observed the limited capability of FY-4A QPE in representing summer precipitation in western China at an hourly scale (R = 0.21 and RMSE = 13.78 mm/hr). They found that FY-4A underestimated the summer precipitation in western China, with all hourly precipitation estimates less than 30 mm. The relatively poor performance of FY-4A QPE may be related to the lack of a ground calibration process (Gao et al. 2020; Wang et al. 2021). Additionally, the AGRI onboard FY-4A is not equipped with a high-precision calibration system in the visible and infrared channels, which can influence the radiometric performance and lead to the misclassification of clouds (Zhong et al. 2021). The misclassification of clouds may degrade FY-4A QPE accuracy because cloud products were used to estimate precipitation (Min et al. 2017).
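For reference, the continuous statistics quoted in this discussion (R, bias, and RMSE) can be computed as in the following minimal sketch. It assumes the standard definitions, with bias taken as the mean of estimate minus observation; the function name and the sample values are hypothetical and are not data from this study.

```python
import math


def continuous_stats(estimates, observations):
    """Return (R, bias, RMSE) for paired satellite estimates and gauge
    observations. Bias is assumed to be mean(estimate - observation)."""
    n = len(estimates)
    if n != len(observations) or n < 2:
        raise ValueError("need two equally sized samples with n >= 2")

    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    bias = mean_e - mean_o
    rmse = math.sqrt(
        sum((e - o) ** 2 for e, o in zip(estimates, observations)) / n
    )

    cov = sum((e - mean_e) * (o - mean_o)
              for e, o in zip(estimates, observations))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    # Pearson R is undefined when either series is constant.
    if var_e == 0.0 or var_o == 0.0:
        r = float("nan")
    else:
        r = cov / math.sqrt(var_e * var_o)
    return r, bias, rmse


if __name__ == "__main__":
    # Hypothetical 6-hourly accumulations (mm), for illustration only.
    qpe = [0.0, 2.0, 4.0, 6.0]
    gauge = [0.0, 1.0, 2.0, 3.0]
    r, bias, rmse = continuous_stats(qpe, gauge)
    print(f"R = {r:.2f}, bias = {bias:.2f} mm, RMSE = {rmse:.2f} mm")
```

The hypothetical example shows why the statistics need to be read together: an estimate that doubles every gauge value still yields a perfect R while carrying a large positive bias and RMSE, a pattern similar to that reported above for FY-4A.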
GK-2A generally provided a consistent performance across Northeast Asia with satisfactory accuracy. Instead of ground calibration, GK-2A calibrates QPE using the GPM DPR rainfall product. The DPR instrument uses two frequencies (Ka-band at 35.5 GHz and Ku-band at 13.6 GHz) to obtain the three-dimensional structure of precipitation, which improves the profiling of rainfall size, shape, and distribution, especially for rainfall events associated with deep convection that cannot be explained by single scattering (Battaglia et al. 2015). The quality of the GPM DPR rainfall product greatly influenced the accuracy of GK-2A QPE. As noted in Section 4.2, GK-2A overestimated the frequency of light rain and underestimated the frequency of heavy rain. Gao et al. (2021) found that the DPR rain rate product clearly overestimated the occurrence of rainfall rates between 0.5 and 1 mm/hr, while it underestimated the frequency of rainfall rates larger than 1 mm/hr. As the rainfall rate distribution of GK-2A QPE was calibrated with DPR rainfall estimates based on cumulative distribution function matching (Shin, Seo, and Kim 2019), GK-2A QPE showed distribution characteristics similar to those of the DPR rainfall product.
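The way a calibrated product inherits the distribution of its reference can be illustrated with a minimal sketch of empirical cumulative distribution function matching. This is a generic quantile-mapping example under stated assumptions, not the GK-2A implementation of Shin, Seo, and Kim (2019); the function name, the nearest-rank quantile rule, and the sample values are hypothetical.

```python
from bisect import bisect_right


def cdf_match(value, source_sample, reference_sample):
    """Map `value` from the source distribution onto the reference
    distribution by matching empirical non-exceedance probabilities."""
    if not source_sample or not reference_sample:
        raise ValueError("both samples must be non-empty")
    src = sorted(source_sample)
    ref = sorted(reference_sample)

    # Number of source values <= value (empirical CDF numerator).
    rank = bisect_right(src, value)
    # Nearest-rank reference quantile, using integer ceiling division
    # to avoid floating-point rounding: idx = ceil(rank * m / n) - 1.
    idx = -(-rank * len(ref) // len(src)) - 1
    idx = min(len(ref) - 1, max(0, idx))
    return ref[idx]


if __name__ == "__main__":
    # Hypothetical rain rates (mm/hr), for illustration only.
    ir_based = [0.2, 1.0, 3.0, 8.0, 20.0]
    reference = [0.5, 0.8, 1.5, 4.0, 9.0]
    for rate in ir_based:
        matched = cdf_match(rate, ir_based, reference)
        print(f"{rate:5.1f} -> {matched:4.1f}")
```

After matching, the calibrated values follow the reference distribution regardless of the original IR-based values, so any excess of light rain or deficit of heavy rain in the reference sample is carried over to the calibrated product.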
Additionally, the slight underestimation of summer precipitation in GK-2A found in Sections 4.2 and 4.3 may be associated with the short period of the GPM DPR rainfall product that was used for calibration. As mentioned in Section 2.2, an a priori probability distribution function based on the rainfall intensity from GPM DPR from March 2016 to February 2017 was used for GK-2A QPE retrieval. However, according to the Korea National Typhoon Center, South Korea experienced tropical storms of higher intensity with abnormally delayed landfall, from late September to October, due to the strong El Niño during 2015-2016 (Hu and Fedorov 2017). Therefore, a calibration of GK-2A that relies on the rainfall intensity observed by GPM DPR from March 2016 to February 2017 is insufficient to reflect the inter-annual precipitation climatology, which can increase the uncertainty of GK-2A.

Conclusion
Geostationary meteorological satellite-based QPEs offer great potential for near real-time precipitation monitoring, which is crucial for mitigating hazards associated with extreme precipitation. This study evaluated and compared QPEs from FY-2G, FY-4A, and GK-2A against ground-based measurements at a continental scale over Northeast Asia during 2020. In particular, this study provided the first quantitative evaluation of QPE from GK-2A since its launch in 2018.
The overall comparison indicated that FY-2G outperformed FY-4A and GK-2A in the China region due to the ground calibration included in the FY-2G QPE retrieval and the potential dependency between FY-2G and the evaluation data. GK-2A calibrated IR-based precipitation estimates with the GPM DPR rainfall product and exhibited a more consistent performance across the study area with satisfactory accuracy. The FY-4A QPE used in the study is a raw precipitation estimate without calibration, and thus it provided the lowest accuracy with a large bias. The findings suggest the importance of including a calibration process for IR-based precipitation estimates to improve the overall accuracy.
The performance of FY-2G, FY-4A, and GK-2A QPEs varied with season and rainfall intensity. FY-2G largely overestimated summer precipitation while GK-2A underestimated summer precipitation at most stations. FY-4A tended to overestimate precipitation in all seasons. Furthermore, FY-2G, FY-4A, and GK-2A all showed the poorest accuracy during the winter season due to the prevalence of stratiform clouds over East Asia and the existence of frozen particles and ice. The errors in QPEs are mainly attributed to the precipitation retrieval algorithm as well as the quality of the calibration data products. Intensity analysis revealed that FY-based QPEs showed better consistency with ground-based measurements for 6-hourly cumulative rainfall between 1 and 20 mm. FY-based QPEs usually overestimated the frequency of no rain (<0.1 mm) or heavy rainfall (>50 mm) cases. In contrast, GK-2A produced more 6-hourly rainfall cases between 0.1 and 5 mm and underestimated the frequency of no rain or heavy rainfall events.
Storm event analysis indicated that FY-2G, FY-4A, and GK-2A successfully captured the extreme rainfall events, but discrepancies in phase and magnitude exist. More specifically, all examined QPEs showed the largest uncertainties during the rainfall peak, which is related to the different retrieval algorithms and calibration schemes of the different satellite-based QPEs. FY-2G better captured the temporal variation of precipitation during the storm events (i.e. larger R) while GK-2A showed smaller quantitative errors (i.e. bias and RMSE). In terms of diurnal variation, FY-2G was more consistent with ground-based measurements during storm events than GK-2A and FY-4A.
Because GEO-based QPEs are an important tool for near real-time precipitation monitoring and a crucial data source for generating IR-microwave merged global precipitation estimate products, understanding their accuracy is indispensable. Results from the study indicated the need to further improve IR-based precipitation estimation algorithms, especially during the winter season. Current IR-based precipitation retrieval algorithms mainly rely on the relationship between cloud-top brightness temperature and rainfall intensity, which may not contain adequate information to represent the complex precipitation process. An additional calibration step can effectively correct the bias in QPEs and improve estimation accuracy, but only to an extent that depends on the accuracy of the calibration data. Using observations from multiple bands with machine learning methods may be one potential technique to further improve IR-based QPEs (Xue et al. 2021). With improved accuracy, QPEs from FY-2G, FY-4A, and GK-2A can be used as a crucial meteorological data set for broader hydrological applications such as drought monitoring (e.g. Standardized Precipitation Index and Palmer Drought Severity Index) and flash flood warning.