Uncertainty in different precipitation products in the case of two atmospheric river events

One of the World Climate Research Programme Grand Challenges is to evaluate whether existing observations are enough to underpin the assessment of weather and climate extremes. In this study, we focus on extremes associated with atmospheric rivers (ARs). ARs are characterized by intense moisture transport, usually from the tropics to the extra-tropics. They can either be beneficial, providing critical water supply, or hazardous, when excessive precipitation accumulation leads to floods. Here, we examine the uncertainty in gridded precipitation products included in the Frequent Rainfall Observations on GridS (FROGS) database during two atmospheric river events in distinct Mediterranean climates: one in California, USA, and another in Portugal. FROGS is composed of gridded daily-precipitation products on a common 1° × 1° grid to facilitate intercomparison and assessment exercises. The database includes satellite, ground-based and reanalysis (RE) products. Results show that the precipitation products based on satellite data, individually or combined with other products, perform least well in capturing daily precipitation totals over land during both cases studied here. The RE and the gauge-based products show the best agreement with local ground stations. As expected, there is an overall underestimation of precipitation by the different products. For the Portuguese AR, the multi-product ensembles reveal mean absolute percentage errors between −25% and −60%. For the western US case, the range is from −60% to −100%.


Introduction
Mediterranean climates are characterized by warm and hot summers combined with mild and rainy winters. In addition to the Mediterranean basin area, Mediterranean climates are found throughout the planet, including parts of the western United States, southwestern Africa, Central Chile, and southwestern Australia (Peel et al 2007).
A feature shared by all Mediterranean climate regions is the occurrence of atmospheric rivers (ARs) and their related wind and precipitation extremes (Waliser 2015, Waliser and Guan 2017). ARs are shallow (∼1-3 km in height) and narrow (∼300-500 km in width) plumes with high water vapor content, stretching over thousands of kilometers. These features can be dynamically linked to the development and movement of extratropical cyclones, and they are often associated with large-scale dynamics, which means they are generally more frequent during the winter months than during the summer (Gimeno et al 2014, Eiras-Barca et al 2018). The transport of moisture from the oceans to the continents is the primary component of the atmospheric branch of the water cycle and links evaporation from the ocean to precipitation over the continents (Peixoto and Oort 1992, Gimeno et al 2020). Zhu and Newell (1998) showed that ARs convey more than 90% of the total mid-latitude vertically integrated water vapor transport (IVT) and can lead to intense precipitation episodes, which can produce high impact weather in different regions of the globe (Ralph et al 2016, Waliser and Guan 2017).
Through orographic ascent, ARs can produce large amounts of precipitation when reaching land (Hu et al 2017). However, this is not the only mechanism inducing the upward motion of moisture: mesoscale processes can play an important role in the intensification of precipitation via associated convection cells, or via mesoscale frontal waves that can modify the orientation, intensity, or duration of ARs (Ralph et al 2011, Hu et al 2017, Martin et al 2019). Observational studies of ARs and their contribution to extreme precipitation have been restricted to a few areas of the world, with a strong focus on the North Pacific and the impact on the west coast of North America (e.g. Neiman et al 2008, Ralph et al 2016, Hatchett et al 2017). In the South Pacific, studies stress the importance of ARs for extreme precipitation in South America, particularly in Chile (Garreaud 2013, Viale et al 2018). There is a growing interest in understanding the contribution of ARs in the Atlantic Ocean to extreme precipitation and floods in western Europe (Lavers et al 2012, Ramos et al 2015, Pereira et al 2018) and, recently, in South Africa (Blamey et al 2018, Ramos et al 2019).
A better understanding of both weather and climate extremes was recently identified as one of the World Climate Research Programme Grand Challenges (Sillmann et al 2017). The variability of weather extremes across different temporal and spatial scales is one area ripe for significant advances; however, one needs to account for uncertainties not only in model simulations (Knutti and Sedláček 2013, Soares et al 2017, Cardoso et al 2019) but also in observations (e.g. Lockhoff et al 2014, Herold et al 2017, Hénin et al 2018, Herrera et al 2019). As a result, the observation-based precipitation community is engaged in a broad scope assessment (Haddad and Roca 2017) with a dedicated focus on extreme precipitation (Alexander et al 2018). In support of these objectives, the Frequent Rainfall Observations on GridS (FROGS) database has been developed. It is a unique repository of various 1° × 1° gridded daily products originating from in situ, RE and satellite products.
Considering the importance of uncertainty assessment in the climate research community, a collaborative project entitled the Atmospheric River Tracking Method Intercomparison Project (ARTMIP) is currently ongoing. The goal of ARTMIP is to understand and quantify uncertainties in AR research based on the choice of the detection/tracking methodology (Shields et al 2018). The climatological characteristics of ARs, such as frequency, duration, intensity, and seasonality, are all strongly dependent on the method used to identify ARs (Ralph et al 2018a).
In the present study, we aim to better understand the uncertainties related to AR impacts by quantifying the extreme precipitation associated with two AR landfall cases in two Mediterranean climates, using a large set of precipitation products from the FROGS database.
The remaining sections are organized as follows: section 2 introduces the various gridded products and the in situ precipitation data, while the two AR events and their socio-economic impacts are presented in section 3. The results of the comparison between the precipitation estimates from reference gauges and the gridded products are presented in section 4, and section 5 provides the overall conclusions.

Precipitation datasets and methodology
The current study focuses on two Mediterranean climates, Portugal and the western U.S., where strong precipitation has been shown to be highly connected to AR landfall interacting with complex terrain (e.g. Neiman et al 2013, Ramos et al 2015). In these cases, local ground stations are the best reference data, provided that their spatial and temporal distributions are appropriate. Taking this into account, and to ensure the maximum number of local ground stations (especially in the Portuguese domain), we chose two case studies in 2016.
Using in situ reference datasets avoids some of the caveats linked with regular observational gridded products, such as temporal inhomogeneities due to a changing station network and the smoothing of extremes due to interpolation methods (e.g. Belo-Pereira et al 2011, Herrera et al 2012). In fact, observational uncertainty in gridded precipitation datasets is substantial and comparable to that linked to climate models. Here, in the case of Portugal, the ground-station data are from the Portuguese Institute of Meteorology, while for the western United States we use the Global Historical Climatology Network-Daily (GHCN-Daily) database (Menne et al 2012a). More information regarding each dataset is given in sections 2.1 and 2.2.

Portuguese station data
The precipitation for Portugal was provided by the Portuguese Institute of Meteorology (Instituto Português do Mar e da Atmosfera, IPMA), which provides accumulated rainfall reported every 10 min from 72 automatic weather stations. These stations were chosen based on a combination of tests for temporal completeness over the period of interest in this case (see section 3), quality (Santo et al 2014), and their spatial distribution over mainland Portugal. The daily precipitation was accumulated between 0000 UTC and 2359 UTC.
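The daily accumulation step can be illustrated with a minimal sketch. The record layout, the function name `daily_totals`, and the sample values are assumptions made for illustration and do not reflect the IPMA data format. The sketch also assumes that each 10 min report is assigned to the UTC calendar day of its timestamp.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Sum 10 min rainfall amounts (mm) into 0000-2359 UTC daily totals.

    `records` is an iterable of (timestamp, amount_mm) pairs, where each
    timestamp is a UTC datetime. This layout is hypothetical.
    """
    totals = defaultdict(float)
    for timestamp, amount_mm in records:
        if amount_mm is None:  # skip missing reports
            continue
        totals[timestamp.date()] += amount_mm
    return dict(totals)


# Hypothetical reports around the day boundary (values in mm).
sample = [
    (datetime(2016, 2, 12, 0, 0), 0.5),
    (datetime(2016, 2, 12, 23, 50), 1.5),
    (datetime(2016, 2, 13, 0, 0), 2.0),
    (datetime(2016, 2, 13, 0, 10), None),
]
totals = daily_totals(sample)
```

A real workflow would also need a completeness test per day, since a daily total built from too few 10 min reports underestimates the accumulation.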

Western United States station data
The precipitation dataset used to evaluate the AR that impacted the western U.S. is the GHCN-Daily database (Menne et al 2012a). Observations in this database are integrated from around 30 different data sources (about a dozen within the U.S.). Updates occur 7 days a week, and the entire database is reconstructed approximately weekly, due to the growing list of networks that contribute to it. During this reconstruction, a consistent suite of over 20 different quality checks is applied to the data. Each version of this reconstructed dataset is archived for future retrieval. Generally, data in the U.S. are finalized 45-60 days after the end of the month, so the values accessed for this study of the October 2016 event should no longer change. Many more details on this database can be found in the overview published by Menne et al (2012b). Some of the original information from source datasets, such as original quality flags, is not provided in this database. However, even if it were provided, the contributing networks have extremely diverse quality control procedures (Jaffrés 2019). GHCN-Daily attempts to provide a dataset with clear quality checks that work consistently across many different networks. Thus, we consider it to be the most relevant and comprehensive option for assessing precipitation during the event discussed here. In order to be conservative, all data with any quality flag in the GHCN-Daily dataset are excluded from this analysis.
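The conservative screening described above can be sketched as follows. GHCN-Daily stores precipitation (element PRCP) in tenths of a millimetre, uses -9999 for missing values, and attaches a quality flag that is blank when a value did not fail any check. The dictionary layout, the function name `usable_precip_mm`, and the station identifiers are hypothetical.

```python
MISSING = -9999  # GHCN-Daily missing-value sentinel


def usable_precip_mm(observations):
    """Keep daily PRCP values with a blank quality flag and convert to mm.

    Each observation is a dict with keys 'station', 'date', 'element',
    'value' (tenths of mm) and 'qflag'. This layout is an assumption.
    """
    kept = {}
    for obs in observations:
        if obs["element"] != "PRCP":
            continue  # not a precipitation record
        if obs["value"] == MISSING:
            continue  # no measurement reported
        if obs["qflag"].strip():
            continue  # any quality flag at all leads to exclusion
        kept[(obs["station"], obs["date"])] = obs["value"] / 10.0
    return kept


# Hypothetical records for illustration only.
observations = [
    {"station": "STN_A", "date": "2016-10-14", "element": "PRCP", "value": 254, "qflag": " "},
    {"station": "STN_B", "date": "2016-10-14", "element": "PRCP", "value": 900, "qflag": "O"},
    {"station": "STN_C", "date": "2016-10-14", "element": "PRCP", "value": -9999, "qflag": " "},
    {"station": "STN_A", "date": "2016-10-14", "element": "TMAX", "value": 150, "qflag": " "},
]
clean = usable_precip_mm(observations)
```

Dropping every flagged value trades sample size for reliability, which matches the conservative choice stated in the text.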

The FROGS dataset
Twenty-two daily precipitation estimates originating from multiple sources, including in situ, atmospheric RE, and satellite-based products, are analyzed in this study. Each individual dataset has been gridded to the same 1° × 1° regular longitude-latitude grid as part of the FROGS database effort (Roca et al 2019). More information on the FROGS initiative can be found at http://frogs.ipsl.fr. Table 1 summarizes the various products used. Following previous investigations (e.g. Alexander et al 2020), the products have been clustered into four groups according to data type to facilitate the assessment. The first two groups are in situ gauges only (GO) and RE. The satellite products have been further split into satellite only (SO) products and satellite-with-gauges (SG) products.
The in situ gridded products rely on the use of operational rain-gauge measurements that are quality controlled and combined to provide an areal estimate of the accumulated precipitation at the daily scale. The products used here differ in the sources of in situ measurements, the quality control procedures, and the algorithms used to map the rain gauges onto the regular grid. The RE products consist of the precipitation fields from various atmospheric RE systems. The products used here assimilate different data types and use varying assimilation schemes. The products also differ due to the particular sets of physical parametrizations used in each individual RE production center. The satellite-derived products span a wide range of methodologies and sources of satellite observations. Some products make use of infrared observations from geostationary satellites as their main data source, while others rely mainly on the constellations of passive microwave imagers and sounders. Some products use multiple types of satellite observations. Finally, some products use in situ rain gauges to bias correct the satellite estimates. These products differ in their input data, in the algorithms used to combine these data into a precipitation estimate, and in the weights and methodologies used to merge the in situ measurements. The details of each product are available in the references provided in table 1, and a summary for each product is available in Roca et al (2019).
These 22 products hence offer a unique, almost comprehensive, state of the art ensemble of daily precipitation on a 1° × 1° grid that is assessed here using reference high resolution in situ data (see sections 2.1 and 2.2). Figure 1 shows the geographical distribution of these reference data and their density in 1° × 1° grid boxes.
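The station density per grid box can be obtained with a simple binning step, sketched below. The sketch assumes that grid-box edges fall on integer degrees, which the text does not state for FROGS, and the station names and coordinates are hypothetical.

```python
import math
from collections import Counter


def grid_box(lat, lon):
    """Return the south-west corner of the 1 degree box containing a point.

    Assumes box edges on integer degrees (an assumption, not taken from FROGS).
    """
    return (math.floor(lat), math.floor(lon))


def station_density(stations):
    """Count the stations that fall in each 1 degree x 1 degree grid box."""
    return Counter(grid_box(lat, lon) for lat, lon in stations.values())


# Hypothetical station coordinates (latitude, longitude in degrees).
stations = {
    "A": (41.7, -8.2),
    "B": (41.2, -8.9),
    "C": (38.7, -9.1),
}
density = station_density(stations)
```

Using `math.floor` rather than integer truncation matters for negative longitudes, where truncation would place a station in the wrong box.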

Evaluation methods
The evaluation of the FROGS precipitation products is conducted with the following pooling-together methodology (pool_all). First, for each FROGS 1° × 1° grid box, the ground stations located within this grid box are identified and combined individually with the FROGS grid box, i.e. a pair record is built composed of the values of the station observations and FROGS as many times as the number of ground stations in the grid box (length = number of stations × number of days). Next, we extend this analysis to all FROGS grid boxes (N = length = number of stations × number of days × number of FROGS grid-boxes).
Subsequently, the following standard error statistics are computed to compare the FROGS precipitation against the reference station observations: bias (1), mean absolute error (MAE) (2), mean absolute percentage error (MAPE) (3) and root mean square error (RMSE) (4). In these statistics, o_k represents the observed values, p_k the FROGS product values, and N the number of paired values. The bias offers a view of the overall deviation between observed and product values and, together with the MAE, allows the identification of systematic errors. The percentage error measure, MAPE, gives a relative measure of those errors with reference to the mean observed values. Finally, the RMSE, because the errors are squared before averaging, emphasizes the larger deviations between observation and FROGS data.
For the ensemble of each product type (RE, SO, SG and GO) the same strategy is applied, that is, all the data for each grid box are pooled together without taking the mean. Additionally, for comparison purposes, we computed the average of the observational values ('av' from now on) at each 1° grid point (the same as the grid box of the FROGS dataset). This mean observational value is then compared directly with the values of the FROGS regular gridded products, and the errors are averaged to obtain the mean error of each product.
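The four error statistics named above can be written out as follows. This is a reconstruction from the standard definitions, not a copy of the original equations. The sign convention (product minus observation) is chosen so that a negative bias means underestimation, as in the results, and the normalisation of MAPE by the mean observed value is an assumption based on the description in the text.

```latex
\begin{align}
\mathrm{Bias} &= \frac{1}{N}\sum_{k=1}^{N}\left(p_k - o_k\right) \tag{1}\\
\mathrm{MAE}  &= \frac{1}{N}\sum_{k=1}^{N}\left|p_k - o_k\right| \tag{2}\\
\mathrm{MAPE} &= \frac{100}{\bar{o}}\,\frac{1}{N}\sum_{k=1}^{N}\left|p_k - o_k\right|,
\qquad \bar{o} = \frac{1}{N}\sum_{k=1}^{N} o_k \tag{3}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(p_k - o_k\right)^{2}} \tag{4}
\end{align}
```

Under this assumed form, MAPE is the MAE expressed as a percentage of the mean observed precipitation, so it stays defined even when individual daily observations are zero.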

Iberian Peninsula
The AR event that made landfall on the Iberian Peninsula on the 12th and 13th of February 2016 was chosen due to its socio-economic consequences. In terms of social and economic impacts, this violent storm resulted in offshore high waves, floods and landslides. The Portuguese National Civil Protection issued warnings for heavy rain, snow, strong winds and high waves for 12 February and 13 February, in areas north of the Tagus River. Several rivers in northern Portugal overflowed and produced floods, where one person drowned after he was swept away by flood water. The AR's low-level jet produced strong winds leading to fallen trees that caused disruption to road and rail links. Finally, the heavy precipitation caused a landslide in northern Portugal where four houses were damaged and 12 people were displaced, according to different Portuguese local newspapers.
We identified the AR objectively, applying the Ramos et al (2015) detection algorithm to ERA5 RE (Hersbach et al 2020) on both days, with IVT landfall values ranging from 700 kg m⁻¹ s⁻¹ to 800 kg m⁻¹ s⁻¹. We also ensured that the AR meets the recently published definition criteria (Ralph et al 2018b, AMS Glossary, https://glossary.ametsoc.org/wiki/Atmospheric_river). The integrated water vapor (IWV) field from the Special Sensor Microwave Imager Sounder (Wentz 2013) shows a narrow plume of high water vapor content, stretching from the Caribbean to western Iberia with maximum values around 45 mm (figure 2). Furthermore, enhanced storm track activity north of the Iberian Peninsula was observed during these days, with several frontal systems (not shown) leading to vertical instability in the region. The combination of high moisture availability provided by the AR, the vertical instability provided by the frontal systems, and orographic lifting ultimately led to extreme precipitation in the western Iberian region. Some regions received more than 100 mm of precipitation in 24 h on each day of the event (figure 3). The AR made landfall north of Portugal and persisted at the same location for 18 h (figure 2). Most of the extreme precipitation occurred on 12 February. Several stations recorded more than 100 mm, especially in the mountainous regions in northern Portugal, due to enhanced orographic lifting. In addition, the inland heavy precipitation accumulation extends to near the Spanish border, where it drops to values below 30 mm. On 13 February, the AR core moved south (figure 2); however, it still produced high precipitation in the northern part of Portugal, with some stations recording 24 h precipitation above 70 mm.
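For readers unfamiliar with IVT, the quantity is commonly defined as the magnitude of the vertically integrated horizontal moisture flux, IVT = (1/g) |∫ q V dp|. The sketch below evaluates it with the trapezoidal rule for a single column. The 1000-300 hPa integration range, the function name `ivt_magnitude`, and the idealised profile are assumptions for illustration. The sketch is not the Ramos et al (2015) detection algorithm itself.

```python
import math

G = 9.80665  # standard gravity, m s^-2


def ivt_magnitude(pressure_pa, q, u, v):
    """Trapezoidal estimate of IVT (kg m^-1 s^-1) for one atmospheric column.

    pressure_pa: pressure levels in Pa
    q:           specific humidity in kg kg^-1 at each level
    u, v:        zonal and meridional wind in m s^-1 at each level
    """
    flux_u = 0.0
    flux_v = 0.0
    for i in range(len(pressure_pa) - 1):
        dp = abs(pressure_pa[i + 1] - pressure_pa[i])
        flux_u += 0.5 * (q[i] * u[i] + q[i + 1] * u[i + 1]) * dp
        flux_v += 0.5 * (q[i] * v[i] + q[i + 1] * v[i + 1]) * dp
    return math.hypot(flux_u, flux_v) / G


# Idealised, hypothetical profile: uniform 5 g kg^-1 moisture and a
# 20 m s^-1 westerly wind between 1000 hPa and 300 hPa.
levels = [100000.0, 85000.0, 70000.0, 50000.0, 30000.0]
q = [0.005] * len(levels)
u = [20.0] * len(levels)
v = [0.0] * len(levels)
ivt = ivt_magnitude(levels, q, u, v)
```

With this idealised profile the result is about 714 kg m⁻¹ s⁻¹, the same order of magnitude as the landfall values quoted for the event. Real profiles have moisture concentrated in the lowest few kilometres, so the integral is dominated by the lower levels.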

United States West Coast
The ARs that affected the western United States during the 13-16 October 2016 period first made landfall on the U.S. West Coast in the Pacific Northwest on 13 October. The first AR moved south and a second AR made landfall in northern California on 15 October. As in the Iberian Peninsula event, these ARs can be identified as plumes of enhanced IWV in SSMI satellite imagery stretching across much of the Pacific (figure 4). Following the same criteria as in section 3.1, we ensured that the ARs meet the definition criteria using the Rutz et al (2014) identification method and by analyzing the data from the SSMI and the corresponding sea level pressure (SLP) for the 13-16 October period.
During most of the month of October, there was a persistent low-pressure anomaly off the northern U.S. West Coast, which is associated with an AR storm track that impacts northern California (Guirguis et al 2018). In this case, the second AR received much of its moisture from the remnants of Super Typhoon Songda. Typhoon remnants can be an important moisture source for impactful ARs affecting the west coast (Hatchett 2018).
AR conditions were sustained in parts of the northern California coast for over 24 h per Atmospheric River Observatory observations (White et al 2013), with IWV values exceeding 40 mm and IVT exceeding 750 kg m⁻¹ s⁻¹. Maximum precipitation accumulations from the first AR (13-14 October) exceeded 120 mm near the Oregon border (figure 5). Over the entire period, accumulations reached R-Cat 3 levels (exceeding 400 mm over a 3 day period) in some areas (Ralph and Dettinger 2012). These ARs contributed to an October precipitation total of over 200% of normal rainfall for much of northern California, and over 400% of normal for parts of coastal Oregon and Washington (not shown). Several climate divisions in Washington and western Oregon experienced their wettest October on record, partially due to this event. In California, precipitation totals were high both at the coast and inland over the Sierra Nevada mountain range (figure 5).
This storm was primarily beneficial in the Pacific Northwest, as much of the region was still under drought conditions and this was one of the first storms of the season. Negative impacts from this event were relatively small. Many rivers set daily flow records; however, damaging flooding did not occur because of the large available storage space in soils and streams. Most of the negative impacts resulted from the high winds associated with these ARs, with gusts as high as 46 m s⁻¹ and sustained winds reaching 9-18 m s⁻¹, and not from the precipitation accumulation.

Evaluation results
For both atmospheric river cases (Portugal and western U.S.), the different products show significant errors relative to local observations when the pooling-together approach is used (figure 6 and table 2; pool_all). From an ensemble perspective, the satellite-only products (SO) present the worst results for both locations, but in particular for Portugal. For the western U.S., the gauge-based gridded product group performs best, and for Portugal the RE group is best. The three groups RE, SG and GO are rather comparable. In the two locations there is a general systematic underestimation of precipitation by the different products. The absolute errors are larger for the Portuguese case study than for California, but in contrast the relative errors (MAPE) are smaller for Portugal. This is, of course, linked to the higher precipitation intensities of the Portuguese AR case. All these considerations also hold for the simple average methodology (figure 6; av).
From an ensemble view, for Portugal, the biases range from −15 mm for the SO product to ∼+1 mm for the GO. MAEs are between ∼14 mm and ∼17 mm, which corresponds to large MAPEs between 59% and ∼73%, for the RE and SO, respectively. Finally, RMSEs range between 26 and 32 mm. For the western U.S., the biases lie between ∼−1 mm and ∼−3 mm, for the RE and SG products, respectively. MAEs span from 8 mm (GO) to 12 mm (SO), with quite large associated MAPEs (69% and 101%). Accordingly, RMSEs vary from ∼16 mm (RE, GO) to ∼38 mm (SO). As expected, when looking at the average-based errors, a small reduction may be seen in the bias values, but a significant one in all other error measures. In general, the MAEs, MAPEs and RMSEs for this methodology are much smaller in the Portuguese case, due to the rather small number of available observations for each grid point. However, the relative ranking of each ensemble of regular gridded products remains almost unchanged, with only a few exceptions, again related to the different observational sampling of each grid point.
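The contrast between the pool_all and av strategies can be illustrated with a minimal sketch. The function names, the grid-box keys, and all precipitation values are hypothetical, and MAPE is computed as the MAE relative to the mean observed value, which is an assumption based on the description of the metric in the evaluation methods.

```python
import math


def error_stats(pairs):
    """Return bias, MAE, MAPE (%) and RMSE for (observed, product) pairs."""
    n = len(pairs)
    diffs = [p - o for o, p in pairs]  # product minus observation
    mean_obs = sum(o for o, _ in pairs) / n
    mae = sum(abs(d) for d in diffs) / n
    return {
        "bias": sum(diffs) / n,
        "mae": mae,
        "mape": 100.0 * mae / mean_obs,  # assumed normalisation
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


def pool_all(stations_by_box, product_by_box):
    """One pair per station: each station is compared with its grid-box value."""
    return [(obs, product_by_box[box])
            for box, observations in stations_by_box.items()
            for obs in observations]


def box_average(stations_by_box, product_by_box):
    """One pair per grid box: the station mean is compared with the grid-box value."""
    return [(sum(observations) / len(observations), product_by_box[box])
            for box, observations in stations_by_box.items()]


# Hypothetical one-day example (mm): two grid boxes with uneven station counts.
stations_by_box = {(41, -9): [100.0, 60.0, 20.0], (38, -10): [10.0, 30.0]}
product_by_box = {(41, -9): 40.0, (38, -10): 20.0}

pooled = error_stats(pool_all(stations_by_box, product_by_box))
averaged = error_stats(box_average(stations_by_box, product_by_box))
```

In this toy case the bias changes only from −12 mm to −10 mm, while the MAE drops from 24 mm to 10 mm. Averaging the stations first removes the sub-grid spread that a single 1° value cannot represent, which is why the absolute-error measures fall much more than the bias.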
Looking at the individual products, the SO products reveal the largest error variability for Portugal. This is not the case for the western US, where the high ensemble error is largely due to a single product (SO6). This product presents outlier precipitation values at the higher latitudes of the western US domain, ∼48° N (not shown). Such very intense precipitation cases over mid-latitude land appear in this product family in various regions (Bador et al 2020). They might be associated with mis-detection issues over orographic areas (Yamamoto et al 2017). Interestingly, this extreme outlier is almost fully mitigated when the rain gauge correction to the satellite products is applied (SG5, not shown). In contrast, for Portugal SO6 is the best satellite product, followed by SO4. In the western US, the best satellite product is SO5, closely followed by the others, except SO6. The individual RE and GO products show rather similar error values for the two areas. Finally, the inconsistencies of the SO grids appear to be partially transferred to the SG grids; the latter also show a significant variability in errors. The absolute and relative error magnitudes give a clear idea of the scale mismatch between the coarse resolution of the gridded products (1°) and the local character of precipitation linked to the ARs. Moreover, the different number of stations in each FROGS grid box may be relevant here. We did a preliminary analysis to assess the error sensitivity to the minimum number of observational values included in each grid box, using the same error metrics. Results show that the error sensitivity to the minimum number of weather observations is low (supplementary figures S1 (Portugal) and S2 (western United States), available online at stacks.iop.org/ERL/16/045012/mmedia). A thorough assessment of the effect of station density would require a much larger number of days in the analysis and is therefore out of the scope of the current study.
One must keep in mind that the results shown in this section are only valid for these two case studies and these regions. For a more general assessment one would have to use more AR case studies and consider that ARs can have very different impacts based on their intensity and duration. Therefore, this general assessment would not be straightforward.

Discussion and summary
A comparison between the new FROGS daily precipitation dataset, which includes mostly satellite derived products, and local in situ weather stations is made for two mid-latitude regions (western US and Portugal) for two specific AR landfall case studies. Different authors have shown that, at the daily scale, satellite gridded products exhibit significant skill in the tropics (Roca et al 2010, Gosset et al 2018). However, at the daily scale, the skill of extreme precipitation derived from satellite and RE at mid-latitudes is not well known. Lockhoff et al (2019) used a limited number of datasets and a systematic multi-seasonal assessment to reveal the complexity of the products' behaviour, with skill depending on the season and the underlying climatological regime. Our results complete the global picture by focusing on ARs and confirm the general outcome of the previous analysis.
The two case studies were selected considering the observation of IWV from satellite data and SLP from different RE to meet the criteria of the AR definition (AMS Glossary, https://glossary.ametsoc.org/wiki/Atmospheric_river). In addition, the authors would like to stress that both cases had socio-economic impacts and were associated with extreme precipitation values. Even though there are only two selected case studies in this work, it is the first time that the FROGS database is used to study extreme precipitation within ARs. We acknowledge that the conclusions cannot be extrapolated to other ARs that impacted the selected regions. This could be done using a large AR dataset, but keeping in mind that each AR is unique and falls into a different category based on intensity and persistence, as shown in Ralph et al (2019). Therefore, further analysis using a large set of ARs should separate them into categories to allow a fair comparison between the AR precipitation measured by the rain gauges and the FROGS dataset.
Two different studies over the western United States quantified AR-driven precipitation using different satellite products at a sub-daily scale (Behrangi et al 2016, Wen et al 2018). These studies show that the satellite products usually underestimate heavy precipitation compared to gauge measurements, while some are able to capture the orographic enhancement over the California mountains. In addition, for extremely heavy precipitation (3-hourly precipitation rate >5 mm h⁻¹), none of the products show good performance in quantifying the precipitation intensity (Wen et al 2018). Regarding the Iberian Peninsula, as far as we know, no specific comparison has been made between satellite precipitation products during an AR event. Hénin et al (2018) show that the Tropical Rainfall Measuring Mission (TRMM) product overestimates (underestimates) daily precipitation sums for the least (most) extreme events over the Iberian Peninsula.
In these two case studies of ARs affecting two different Mediterranean climates, among the FROGS products, those based on satellite data, alone or combined with other data, perform the poorest in capturing the daily precipitation over land in the western US and Portugal. The RE and the gauge-based products show the best agreement with local ground stations. As expected, there is an overall underestimation of precipitation by the different products. For Portugal, MAPEs reach values between ∼60% and 70%, and for the western US values from 60% to 100%. The larger MAPE values correspond to the SO product. The large errors illustrate not only the mismatch between the coarse resolution of the FROGS products and the local character of the precipitation but also the products' shortcomings in describing the spatial structure and intensities of the strong precipitation linked to these two ARs in the western USA and Portugal. RE and gauge products also reveal significant errors in the selected cases.
This study points out the need to develop higher resolution and more accurate gridded products to capture the spatial and temporal properties and variability of precipitation due to ARs. FROGS products are mostly available at a global scale; therefore, the same methodology used here can be applied to any other Mediterranean area where AR precipitation is relevant, such as Chile (Viale et al 2018) or western South Africa (Blamey et al 2018).

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.