Mapping the representativeness of precipitation measurements in Mainland China

Meteorological observations provide essential data for weather forecasting and climate change studies. Whether the measured data can accurately support such applications depends closely on their representativeness, which is determined by both the scale of observation and the density of the measurement network. Precipitation occurs as discrete events and is discontinuous in both time and space. Gauge observations provide fundamental precipitation data but, as point measurements, have difficulty quantifying the scale of precipitation systems. Assessments of the representativeness of precipitation at synoptic and climatological scales are therefore still needed. Here, we present the first high-resolution map of the representativeness of precipitation over Mainland China based on the latest satellite data. Our results show that daily precipitation spatial consistency is highest in eastern China and lowest on the Tibetan Plateau. The pattern of monthly spatial consistency, however, is different: it is highest over the Northeast China Plain, the Loess Plateau, and the Middle–Lower Yangtze Plain. By comparing these results with the density of rain gauges, we find that the current national station network of ∼2400 stations still has difficulty supporting synoptic studies in western China. For climate change studies based on monthly data, however, the density of the national reference climatological station network is sufficient, except in the western Tibetan Plateau and in deserts with no available stations. For climatological studies, the quality of precipitation gauge observations is therefore more important than their spatial density. Our results could be of practical significance for planning the layout of rain gauges.


Introduction
Precipitation, a vital component of the water and energy cycles, is one of the most common weather phenomena in our daily lives (Trenberth et al 2003, Kidd and Huffman 2011, Hou et al 2014). Unlike other meteorological fields, precipitation is a discontinuous variable that occurs in the form of events (Li et al 2021). Because of its various formation mechanisms, precipitation may also vary greatly with distance. Precipitation therefore exhibits significant heterogeneity in both time and space.
Gauge observations of precipitation collected at meteorological stations provide essential data for operational weather forecasting, climate change studies, water resource management, and natural disaster prevention and reduction (Kidd 2001, Jiang et al 2012, Kidd et al 2017). However, can gauge precipitation truly support these applications? To answer this key question, we need to know the representativeness of precipitation, a fundamental parameter for designing meteorological station networks. The latest Guide to Meteorological Instruments and Methods of Observation, published by the World Meteorological Organization, defines the representativeness of an observation as the degree of accuracy to which it describes the value of the variable needed for a specific purpose (Organization 2018). For instance, synoptic observations should typically be representative of an area of up to 100 km around a station (Organization 2018).
However, is 100 km enough for synoptic observations of precipitation over China? What is required of precipitation observations for climate change studies? Can current weather station networks support operational weather forecasting and climate change research? Although some existing studies have evaluated the accuracy of precipitation observational networks in China (Wang et al 2012, Li and Li 2017), these key questions have long remained unanswered. This is mainly because these studies used only gauge observations to calculate the representativeness of precipitation, for example, the correlation between neighboring gauge observations in a certain area (Nappo et al 1982, Jacobs 1989, Wang et al 2011, Li and Li 2017). Gauge observations, however, cannot provide spatially continuous precipitation information and thus have difficulty quantifying the scale of precipitation systems. One related study analyzed the spatial representativeness of precipitation observations over Southwest China based on station network coverage and the spatial differences in interannual precipitation variation (Li and Li 2017). However, that study only counted the number of representative grid boxes instead of directly showing the representative scale, and it used only interannual precipitation data over a specific region of China.
We have tried to answer such questions from the perspective of precipitation systems, according to their spatial characteristics, based on the latest satellite retrievals of precipitation. A previous study investigated the global distribution of precipitation system size and found that most large precipitation systems (>10⁶ km²) occurred over the ocean, whereas land areas were mostly characterized by medium-size precipitation systems (10⁴-10⁶ km²). Although the precipitation system scale can reflect the representativeness of precipitation in each grid box to some extent, it only reveals the spatial character of different precipitation systems. The spatial variability of precipitation within the same system is still not captured, especially for large-scale precipitation systems with complex embedded structures.
Furthermore, existing studies of precipitation system scale have mostly calculated the scale by grouping contiguous rainy grid boxes via various criteria (Liu et al 2008, Zhang and …). In other words, this method considers only whether a grid box is rainy and ignores the precipitation amount. For practical purposes, the rainfall amount matters more than whether it rains (Yilmaz et al 2010, Jiang et al 2012, Liu et al 2017). It is therefore necessary to develop a method that combines precipitation amount with precipitation spatial scale. In this study, we calculate the distribution of precipitation spatial consistency at different time scales over Mainland China. The main motivation is to obtain the first high-resolution map of the representativeness of precipitation measurements over Mainland China. First, we investigate the spatial consistency of precipitation at different time scales using the recent merged satellite precipitation product of the Global Precipitation Measurement (GPM) mission. Then, we compare the correlated scale of precipitation spatial consistency with the closest distance between adjacent stations of the existing precipitation observational networks in Mainland China to examine whether the existing gauge density meets the needs of basic synoptic and climatological studies on precipitation. Finally, we discuss the factors that might affect the distribution of precipitation spatial consistency.

Data
The Integrated Multi-satellitE Retrievals for GPM (IMERG) is a level-3 gridded product with a temporal resolution of 30 min and a spatial resolution of 0.1° × 0.1° (Huffman et al 2019b). It is the successor of the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis, which has a temporal resolution of three hours and a spatial resolution of 0.25° × 0.25° over 50° N-50° S (Kummerow et al 1998). IMERG uses precipitation estimates from an international constellation of passive microwave sensors on low Earth orbit satellites and infrared sensors on geosynchronous Earth orbit satellites. Furthermore, with the first spaceborne dual-frequency precipitation radar (DPR) and the advanced multifrequency, multi-polarization GPM Microwave Imager (GMI), the GPM Core Observatory provides greatly improved precipitation estimates. Because the estimates from the other sensors in IMERG are uniformly calibrated against those of the GPM Core Observatory, their quality is also greatly enhanced. IMERG also integrates several advanced merging algorithms, such as the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) recalibration scheme (Ashouri et al 2015) and the Climate Prediction Center (CPC) Morphing-Kalman Filter (CMORPH-KF) Lagrangian time interpolation scheme (Joyce and Xie 2011). IMERG is therefore currently considered one of the most advanced merged satellite precipitation datasets (Guo et al 2016, Dezfuli et al 2017, Li et al 2018, Prakash et al 2018, Chen et al 2019), and it can provide new insight into precipitation spatial representativeness at higher temporal and spatial resolutions than previous precipitation products.
Here, we use the latest Final Run Version 06B of IMERG, which is adjusted with the Global Precipitation Climatology Project Satellite-Gauge product and is considered the most accurate of the three IMERG runs (Huffman et al 2019a). In addition, the global 30-arc-second elevation dataset (GTOPO30), at a resolution of 0.05° × 0.05°, is used to analyze the effect of topography on the distribution of precipitation spatial consistency.
The precipitation gauges evaluated in this study are the surface meteorological stations across Mainland China, including national reference climatological stations, national basic synoptic stations, national meteorological observing stations, and automatic weather stations (AWSs). The distribution of the gauge network is shown in figure S1. The ∼210 national reference climatological stations were set up according to the national climatic regionalization and the requirements of the Global Climate Observing System (GCOS) to obtain fully representative, long-term, and continuous climatological data. The ∼840 national basic synoptic stations form the main body of the national meteorological observation network across Mainland China and were set up for national climate analysis and weather forecasting. The ∼2400 national meteorological observing stations were set up according to the administrative divisions of Mainland China, and their data are mainly used for local meteorological services. The ∼55 000 AWSs were set up where it is inconvenient to establish national observation stations, to increase the spatial density of the national weather and climate station network (Administration 2003).

Methods
Because the GPM-era IMERG product begins in June 2014, we use the full years 2015-2020 as our study period. A threshold of 0.1 mm h⁻¹ is used to define precipitation, to avoid the large uncertainties associated with light precipitation (Li et al 2020, Tapiador et al 2020). In the past, the spatial representativeness of observations was often calculated from the consistency of gauge observations within a certain area (e.g. Nappo et al 1982). In this study, we investigate representativeness from an object-based perspective: we quantify how well the precipitation at a single location represents the precipitation over a larger surrounding area, that is, how strongly it relates to and covaries with other points at different time scales. We therefore borrow the concepts of precipitation systems and correlation to analyze the representativeness of precipitation over Mainland China.
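The thresholding step can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: it assumes half-hourly IMERG rain rates (mm h⁻¹) are already loaded into an array shaped (time, lat, lon) that starts at the beginning of a day, that half-hourly rates below 0.1 mm h⁻¹ are set to zero before accumulation, and that a day counts as a rainy sample when its thresholded total is positive. The function names are hypothetical.

```python
import numpy as np

RAIN_THRESHOLD_MM_PER_H = 0.1  # precipitation threshold stated in the text
STEP_H = 0.5                   # IMERG time step: 30 min
STEPS_PER_DAY = 48

def daily_totals(rate_mm_per_h: np.ndarray) -> np.ndarray:
    """Daily accumulations (mm) from half-hourly rain rates shaped (time, lat, lon).

    Assumption: the time axis starts at the beginning of a day and holds whole
    days; half-hourly rates below the threshold are treated as no rain.
    """
    n_steps = rate_mm_per_h.shape[0]
    if n_steps % STEPS_PER_DAY != 0:
        raise ValueError("time axis must hold whole days of 48 half-hour steps")
    rate = np.where(rate_mm_per_h >= RAIN_THRESHOLD_MM_PER_H, rate_mm_per_h, 0.0)
    depth_mm = rate * STEP_H  # rate (mm/h) x 0.5 h = depth per step
    per_day = depth_mm.reshape(n_steps // STEPS_PER_DAY, STEPS_PER_DAY,
                               *rate.shape[1:])
    return per_day.sum(axis=1)

def rainy_mask(daily_mm: np.ndarray) -> np.ndarray:
    """Hypothetical rule: a day is a rainy sample if its thresholded total is positive."""
    return daily_mm > 0.0
```

Thresholding at the half-hourly level rather than on daily totals is itself an assumption; the text states the threshold as a rate, which is why it is applied to rates here.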
First, we calculate the daily and monthly anomaly series of precipitation for 2015-2020 over Mainland China to remove the effect of the seasonal cycle. Next, we calculate the correlation coefficients between each grid box and all other grid boxes within the surrounding 15° × 15° box using the daily and monthly anomaly series. Although the 2192 days and 72 months of 2015-2020 are sufficient for correlation analysis, the correlations are computed from rainy samples, and the number of these samples determines how reliable the correlation coefficients are. We therefore checked the rainy sample sizes of the daily and monthly anomaly series. Over Mainland China, 94.84% of the grid boxes have more than 1000 rainy samples in the daily anomaly series. For the monthly anomaly series, 99.94% and 94.80% of the grid boxes have more than 50 and 70 rainy samples, respectively. The sample sizes are therefore sufficient over most of the study area, and, given the degrees of freedom of the rainy samples, the high correlation coefficients obtained are statistically meaningful. We also discard correlation coefficients with p-values greater than 0.01 to ensure the statistical significance of the correlations.
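The anomaly and significance steps might look as follows. In this sketch, each calendar month's 2015-2020 mean is subtracted to remove the seasonal cycle, and a Pearson correlation is kept only when its p-value does not exceed 0.01. Assumptions: the monthly series starts in January 2015 and covers whole years, and `use` is a hypothetical boolean mask of rainy samples, since the text does not specify whether one or both grid boxes must be rainy. The daily case is analogous, with calendar-day means in place of calendar-month means.

```python
import numpy as np
from scipy import stats

def monthly_anomalies(monthly: np.ndarray) -> np.ndarray:
    """Subtract the multi-year mean of each calendar month (seasonal cycle).

    Assumes axis 0 is time, starting in January and spanning whole years.
    """
    n_months = monthly.shape[0]
    if n_months % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    by_year = monthly.reshape(n_months // 12, 12, *monthly.shape[1:])
    climatology = by_year.mean(axis=0)  # calendar-month means, shape (12, ...)
    return (by_year - climatology).reshape(monthly.shape)

def significant_r(x, y, use, alpha=0.01, min_samples=3):
    """Pearson r of two anomaly series over the samples flagged by `use`.

    `use` is a hypothetical rainy-sample mask. Returns NaN when there are too
    few samples or when p > alpha (the text discards p-values above 0.01).
    """
    x = np.asarray(x, dtype=float)[use]
    y = np.asarray(y, dtype=float)[use]
    if x.size < min_samples:
        return np.nan
    r, p = stats.pearsonr(x, y)
    return float(r) if p <= alpha else np.nan
```

Repeating `significant_r` between a central grid box and every box in its 15° × 15° window yields the correlation map from which the correlated area is extracted.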
Then, we select three thresholds for the correlation coefficient (R), namely R² = 0.9, 0.7, and 0.5. In previous precipitation studies, the choice of correlation threshold has been empirical rather than based on a fixed criterion, and in most cases R > 0.7 is considered a high positive correlation (e.g. Meng et al 2014, Trenberth and Zhang 2018, Mao et al 2022). Here, we select the threshold according to the traditional method of calculating the representativeness of point measurements (Nappo et al 1982), in which representativeness is the probability that a point measurement lies within one standard deviation of the area-average value; for a normal distribution this probability is about 68%, which can be regarded as analogous to R² = 0.7. We therefore adopt R² = 0.7 as the main threshold and, to test the robustness of the results, also compute R² = 0.9 and 0.5 for comparison. For each central grid box, the contiguous grid boxes that contain the central point of the 15° × 15° box and whose correlation coefficients with the central point exceed a given threshold form the 'correlated area', which denotes the precipitation spatial consistency of the central grid box under that threshold. The correlated area is computed by summing the areas represented by the individual grid boxes, and its square root is used as the 'correlated scale', as shown in equation (1):
$$L = \sqrt{\sum_{i=1}^{ngrid} S_i} \qquad (1)$$
where L is the correlated scale, S_i is the area that each grid box represents, and ngrid is the number of grid boxes that are correlated with, and belong to the contiguous area of, the central grid box. Next, we calculate the closest distance between adjacent stations of the precipitation observational networks in Mainland China and compare it with the corresponding correlated scale of precipitation spatial consistency.
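Equation (1) can be sketched with SciPy's connected-component labelling. This sketch assumes 4-connectivity for 'contiguous' grid boxes (the text does not state the connectivity), a spherical Earth of radius 6371 km for the area S_i of each 0.1° box, and that non-significant correlations have already been set to NaN. `correlated_scale` and `cell_area_km2` are hypothetical names, not part of any published code.

```python
import numpy as np
from scipy import ndimage

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption

def cell_area_km2(lat_deg, dlat=0.1, dlon=0.1):
    """Area S_i (km^2) of lat-lon cells of size dlat x dlon centred at lat_deg."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    half = np.deg2rad(dlat) / 2.0
    return (EARTH_RADIUS_KM ** 2 * np.deg2rad(dlon)
            * (np.sin(lat + half) - np.sin(lat - half)))

def correlated_scale(r_window, lat_window, center, r2_threshold=0.7):
    """Correlated scale (km) of the centre cell, following equation (1).

    r_window:   2-D correlations with the centre cell (NaN = not significant)
    lat_window: 2-D cell-centre latitudes (degrees), same shape
    center:     (row, col) of the centre cell inside the window
    """
    r_threshold = np.sqrt(r2_threshold)  # R^2 = 0.7 -> R of about 0.837
    r = np.nan_to_num(np.asarray(r_window, dtype=float), nan=-1.0)
    labels, _ = ndimage.label(r > r_threshold)  # default: 4-connectivity
    center_label = labels[center]
    if center_label == 0:                # centre itself below threshold
        return 0.0
    region = labels == center_label      # contiguous 'correlated area'
    areas = cell_area_km2(lat_window)    # S_i for every cell in the window
    return float(np.sqrt(areas[region].sum()))
```

Passing `structure=np.ones((3, 3))` to `ndimage.label` would switch to 8-connectivity, which can only merge more cells and therefore yields equal or larger correlated areas.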
Here, we use the correlated scale at the threshold of R² = 0.7 for comparison with the density of rain gauges.
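The station comparison reduces to a nearest-neighbour search. The sketch below computes great-circle (haversine) distances by brute force, which is fine for a few thousand stations; for the ∼55 000 AWSs the n × n distance matrix alone would need roughly 24 GB, so a spatial index would be preferable. It assumes a spherical Earth and that each station is compared with the correlated scale of the grid box containing it; both function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption

def nearest_station_distance_km(lat_deg, lon_deg):
    """Great-circle distance from each station to its closest neighbour.

    Brute-force O(n^2) haversine; for very large networks a spatial index
    such as scikit-learn's BallTree with metric="haversine" scales better.
    """
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))[:, None]
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    d = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    np.fill_diagonal(d, np.inf)  # ignore each station's distance to itself
    return d.min(axis=1)

def dense_enough(nearest_km, correlated_scale_km):
    """True where the nearest neighbour lies within the correlated scale
    (solid markers in figure 3), False otherwise (hollow markers)."""
    return np.asarray(nearest_km) < np.asarray(correlated_scale_km)
```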

Precipitation spatial consistency
We obtain the distribution of precipitation spatial consistency from the daily and monthly anomaly series for 2015-2020 over Mainland China (figure 1). Figure 1 shows evident differences between the two time scales, which can be explained by the fact that precipitation accumulated over longer periods and larger areas generally has higher spatial coherence (Bell et al 1990). Figures 1(a1-a3) show the distribution of precipitation spatial consistency at the daily scale. The pattern is broadly consistent with the climate regionalization and the humid-arid zones of China (Zheng et al 2013). The largest correlated scales appear over southeastern China, especially the Middle-Lower Yangtze Plain and the southeastern coastal areas. These larger scales may be related to the abundant frontal precipitation associated with the East Asian summer monsoon, whose systems typically have large spatial extents (Ding and Johnny 2005, Ninomiya and Shibagaki 2007, He and Liu 2016). In contrast, the smallest correlated scales occur over the Tibetan Plateau, where precipitation is scarce, light, and highly local, driven primarily by local forcing such as low-level atmospheric instability and moisture convergence related to surface solar heating (Yu et al 2004, 2007) and active convection triggered by mountain-valley breezes over complex terrain (Fujinami et al 2005, Singh and Nakamura 2009, Guo et al 2014). The distribution of precipitation spatial consistency at the monthly scale is clearly different from that at the daily scale (figures 1(b1-b3)).
The largest correlated scales lie along the Heihe (50° N, 127° E)-Tengchong (25° N, 98° E) Line (the Hu Line, which roughly follows the 400 mm isohyet), including the Northeast China Plain and the Loess Plateau, as well as parts of the Middle-Lower Yangtze Plain. In contrast, precipitation over humid regions such as southern and central China has smaller correlated scales at the monthly scale. Although these regions are dominated by larger precipitation systems and more rainfall, the spatial consistency of their precipitation does not increase quickly with temporal upscaling, likely because of the dominant role of local convection.
Precipitation spatial consistency also shows significant seasonal variation at both time scales. Figure 2 shows the distribution of precipitation spatial consistency at daily and monthly scales in the warm season (April-September) and the cold season (October-March). In general, correlated scales are larger in the cold season over most of China at both daily and monthly scales, consistent with the results of other studies (Dzotsi et al 2014, Fan et al 2021).
The results indicate that precipitation in warm seasons over Mainland China is more likely to be localized, while precipitation in cold seasons is more likely to be induced by large-scale driving forces.
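Splitting the series into warm (April-September) and cold (October-March) seasons only requires a calendar-month mask. This sketch uses NumPy datetime64 arithmetic and assumes the daily series runs from 1 January 2015 to 31 December 2020 and the monthly series starts in January 2015; the function names are hypothetical.

```python
import numpy as np

WARM_MONTHS = [4, 5, 6, 7, 8, 9]  # April-September; all other months are "cold"

def daily_season_masks(start="2015-01-01", end="2021-01-01"):
    """Warm/cold boolean masks for a daily series covering [start, end)."""
    days = np.arange(start, end, dtype="datetime64[D]")
    # datetime64[M] counts months since 1970-01, so % 12 gives 0 for January
    calendar_month = days.astype("datetime64[M]").astype(int) % 12 + 1
    warm = np.isin(calendar_month, WARM_MONTHS)
    return warm, ~warm

def monthly_season_masks(n_months=72, start_month=1):
    """Warm/cold boolean masks for consecutive months beginning at start_month."""
    calendar_month = (np.arange(n_months) + start_month - 1) % 12 + 1
    warm = np.isin(calendar_month, WARM_MONTHS)
    return warm, ~warm
```

Indexing an anomaly series with the warm mask (for example `anomalies[warm]`) restricts the correlation calculation to warm-season samples.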

Density of rain gauges
Based on the distribution of precipitation spatial consistency over Mainland China, we further investigate whether the existing surface meteorological stations are representative enough to meet the needs of basic synoptic and climatological research on precipitation. We calculate the closest distance between adjacent stations and compare it with the corresponding correlated scale at the threshold of R² = 0.7 (figure 3). Figures 3(a1, a2) show that the national reference climatological stations and national basic synoptic stations are not dense enough over most of Mainland China relative to the correlated scale of daily precipitation spatial consistency. For the national meteorological observing station network (∼2400 stations), which is the most commonly used in synoptic research, only the station density over eastern and central China matches the correlated scale of daily precipitation spatial consistency (figure 3(a3)). The results indicate that national rain gauges are conspicuously sparse over the Tibetan Plateau and northwestern China for the purpose of basic synoptic precipitation research. For the AWSs, the density over Xinjiang is evidently higher than that of the manned national stations, and the AWS network is dense enough over most of Mainland China except the Tibetan Plateau (figure 3(a4)).
For climatological precipitation studies, we compare the correlated scale of monthly precipitation spatial consistency with the closest distance between adjacent stations (figures 3(b1-b4)). The results show that even the national reference climatological station network, with only ∼210 stations, is mostly representative from the perspective of precipitation spatial consistency, except over the western Tibetan Plateau and the deserts where no stations are available. Ren and Ren (2012) found that these observational networks produce fairly consistent spatial patterns of precipitation trends; our results help explain this finding, because the networks are dense enough for basic precipitation climatology and climate change research across Mainland China. Therefore, for climatological and climate change research, the quality of precipitation gauge observations is more important than their spatial density.

Discussion
The spatial consistency of precipitation is affected by many factors, such as topographic, meteorological, hydrological, and ecological factors. Among these, topography may be one of the most important influences on the spatial consistency of precipitation in China. Figure 4 shows the basic topography of Mainland China, including the elevation and the standard deviation of the elevation within each 0.1° grid box.
Figure 2. The distribution of the precipitation spatial consistency using the daily (a1-a3, b1-b3) and monthly (c1-c3, d1-d3) anomaly series in warm seasons (a1-a3, c1-c3) and cold seasons (b1-b3, d1-d3) during the period of 2015-2020 in Mainland China.
At the daily scale, the distribution of precipitation spatial consistency (figures 1(a1-a3)) broadly follows the three-step terrain of China (figure 4(a)). The correlated scale of daily precipitation is also significantly negatively correlated with elevation (correlation coefficient of −0.71), indicating that the large-scale topography shapes the spatial pattern of precipitation scale at the synoptic scale. At the monthly scale, however, the distribution of correlated scales is quite different, with the largest correlated scales along the Heihe-Tengchong Line, which suggests that the effect of topography on precipitation spatial consistency is more evident at the synoptic scale.
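The spatial correlation between the daily correlated scale and elevation (−0.71 in the text) can be computed across grid boxes. A minimal sketch, assuming a Pearson correlation and that both fields are on the same 0.1° grid with NaN outside Mainland China; `spatial_correlation` is a hypothetical name. Neighbouring grid boxes are not independent, so the p-value returned here overstates significance and should be read with caution.

```python
import numpy as np
from scipy import stats

def spatial_correlation(field_a, field_b):
    """Pearson correlation between two co-registered 2-D fields, using only
    grid boxes where both fields are finite (e.g. land boxes in the domain)."""
    a = np.asarray(field_a, dtype=float).ravel()
    b = np.asarray(field_b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)
    r, p = stats.pearsonr(a[valid], b[valid])
    return float(r), float(p)
```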
Figure 3. The distribution of (a1, b1) national reference climatological stations, (a2, b2) national basic synoptic stations, (a3, b3) national meteorological observing stations, and (a4, b4) AWSs in Mainland China. The solid '*' points represent stations whose distance to the nearest station is less than the correlated scale; the hollow points represent the opposite. The correlated scale of precipitation spatial consistency is calculated with the square root of 0.7 as the threshold for the correlation coefficient, using the daily (a1-a4) and monthly (b1-b4) anomaly series.
Apart from the east-west variation in precipitation spatial consistency, figures 1(a1-a3) show some 'spots' that make the daily-scale distribution less smooth, especially over the edges of the Tibetan Plateau, the Tianshan Mountains, and the Hengduan Mountains. The distribution of these 'spots' resembles that of regions with drastic topographic relief (figure 4(b)). This suggests that local topographic relief, especially complex topography, may also influence precipitation spatial consistency, as reported in several previous studies (Li and Li 2017, Bertini et al 2021). We therefore further examined the correlation between the elevation standard deviation and the correlated scale of precipitation. Several regions with large topographic relief show a significant positive correlation, but not all such regions do. A possible reason is that precipitation is affected by many factors besides topography, so topographic relief alone does not correspond well to precipitation spatial consistency.
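The elevation standard deviation of each 0.1° grid box can be obtained by block-aggregating the 0.05° elevation grid, so that each 0.1° box contains 2 × 2 fine cells. The sketch assumes the grid dimensions are divisible by the aggregation factor and uses the population standard deviation (ddof = 0); the text does not state which estimator was used, and `block_std` is a hypothetical name.

```python
import numpy as np

def block_std(elev: np.ndarray, factor: int = 2) -> np.ndarray:
    """Standard deviation of fine-grid elevation within each coarse box.

    factor=2 aggregates a 0.05-degree grid to 0.1-degree boxes (2 x 2 cells).
    NaN cells (e.g. outside the domain) are ignored via nanstd.
    """
    ny, nx = elev.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # axes 1 and 3 index the fine cells inside each coarse box
    blocks = elev.reshape(ny // factor, factor, nx // factor, factor)
    return np.nanstd(blocks, axis=(1, 3))
```

The resulting relief field can then be correlated with the correlated-scale field region by region, as described above.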
In addition to topography, many other meteorological factors, such as atmospheric circulation, moisture transport, and atmospheric instability, may also affect the distribution of precipitation spatial consistency. However, unlike other precipitation properties, precipitation spatial consistency describes the coordinated spatial variability of precipitation, which is more complicated; simple correlation analysis or the distribution of atmospheric structures alone is not enough to explain it. More detailed and systematic experiments with additional meteorological data are therefore needed to identify the specific mechanisms.

Summary
The spatial representativeness of precipitation measurements refers to the size of the surrounding area that a precipitation measurement at a single location can represent. However, the specific representativeness requirements for precipitation measurements in synoptic and climatological research across Mainland China have remained unclear. We therefore use a recent high-resolution precipitation product to obtain the first high-resolution map of the representativeness of precipitation, based on the spatial consistency of precipitation at different time scales over Mainland China. We also examine whether the existing precipitation gauges over Mainland China are dense enough to meet the needs of basic synoptic and climatological research on precipitation, and we discuss the factors that might affect the distribution of precipitation spatial consistency.
The distribution of precipitation spatial consistency differs evidently between time scales. At the daily scale, it shows a clear east-west contrast that broadly follows the topography of Mainland China, with the largest correlated scales over southern and eastern China and the smallest over western China, especially the Tibetan Plateau.
The largest correlated scales of monthly precipitation spatial consistency occur along the Heihe-Tengchong Line, including the Northeast China Plain and the Loess Plateau, as well as parts of the Middle-Lower Yangtze Plain. Precipitation over humid regions, such as southern and central China, has smaller correlated scales at the monthly scale, indicating that although these regions are dominated by larger precipitation systems and more rainfall, their precipitation spatial consistency does not increase quickly with temporal upscaling.
Based on the precipitation spatial consistency at different time scales, we further analyze the representativeness of the existing rain gauges across Mainland China. The current national station network is conspicuously sparse over the Tibetan Plateau and northwestern China for the purpose of basic synoptic precipitation research. For climatological research, however, even the national reference climatological station network is dense enough, except over the western Tibetan Plateau and in deserts with no available stations, indicating that the quality of precipitation gauge observations is more important than their spatial density for climatological and climate change research.
With the Tibetan Plateau to the west and the Pacific Ocean to the east, China has particularly complex topography and underlying surfaces (Yu et al 2014). These complex topographic conditions make the distribution of weather stations in China very uneven, with dense stations over the eastern plains and scarce stations over the western mountainous areas (Li et al 2021). In the past, many studies investigated the spatial representativeness of surface meteorological stations using gauge observations alone, such as the correlation between neighboring gauge observations in a certain area (Nappo et al 1982, Jacobs 1989, Wang et al 2011, Li and Li 2017). This study offers a new, object-based perspective on spatial representativeness and compares it with the closest distance between adjacent stations. Besides the object scale and station network density, another factor that may affect the representativeness of gauge data is the exposure conditions of stations, such as the impact of urbanization on observed surface wind speed and temperature (Jiang et al 2020, Zhang and …). For precipitation, for example, strong winds can cause rain gauges to undercatch the actual rainfall. Therefore, given that our results show the density of the current station network to be sufficient for basic climatological research across Mainland China and for basic synoptic studies over eastern China, the quality of the observed data, such as the effect of station exposure on the accuracy of observed precipitation, is also worth investigating in the future.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://pmm.nasa.gov/data-access/downloads/gpm.