Ensemble Postprocessing of Daily Precipitation Sums over Complex Terrain Using Censored High-Resolution Standardized Anomalies

Probabilistic forecasts provided by numerical ensemble prediction systems have systematic errors and are typically underdispersive. This is especially true over complex topography with extensive terrain-induced small-scale effects, which cannot be resolved by the ensemble system. To alleviate these errors, statistical post- processing methods are often applied to calibrate the forecasts. This article presents a new full-distributional spatial postprocessing method for daily precipitation sums based on the standardized anomaly model output statistics (SAMOS) approach. Observations and forecasts are transformed into standardized anomalies by subtracting the long-term climatological mean and dividing by the climatological standard deviation. This removes all site-speciﬁc characteristics from the data and makes it possible to ﬁt one single regression model for all stations at once. As the model does not depend on the station locations, it directly allows the creation of probabilistic forecasts for any arbitrary location. SAMOS uses a left-censored power-transformed logistic re-sponsedistributiontoaccountforthelargefractionofzeroobservations(drydays),thelimitationtononnegative values, andthepositive skewnessofthe data. ECMWFreforecastsareusedformodel trainingandtocorrectthe ECMWF ensemble forecasts with the big advantage that SAMOS does not require an extensive archive of past ensemble forecastsas onlythe mostrecent four reforecastsare needed,anditautomaticallyadapts tochangesin the ECMWF ensemble model. The application of the new method to the central Alps shows that the new method is able to depict the small-scale properties and returns accurate fully probabilistic spatial forecasts.


Introduction
In mountainous regions, large amounts of precipitation can lead to severe floods and landslides during spring and summer and to dangerous avalanche conditions during winter. Accurate and reliable knowledge about the expected precipitation can therefore be crucial for strategic planning and to raise awareness among the public.
Precipitation forecasts, or weather forecasts in general, are typically provided by numerical weather prediction models. Nowadays, most forecast centers also compute probabilistic forecasts based on numerical ensemble prediction systems (EPSs; Epstein 1969;Buizza et al. 2005) as probabilistic information can be crucial, for example, for strategic planning or decision-makers. An ensemble consists of several (independent) forecast runs with slightly different initial conditions, model physics, and/or parameterizations. The goal of an EPS is not only to provide one single forecast but also to provide additional information about the weather-situation-dependent forecast uncertainty. Although EPSs are undergoing constant improvements, they are not able to provide fully reliable forecasts and are typically underdispersive (Mullen and Buizza 2001;Hagedorn et al. 2012).
To correct for systematic errors and to correct the uncertainty provided by the EPS, postprocessing methods are often applied. A variety of ensemble postprocessing methods for precipitation are available nowadays, such as analog methods (Hamill et al. 2006, ensemble dressing (Roulston and Smith 2003), Bayesian model averaging (BMA; Sloughter et al. 2007;Fraley et al. 2010), extended logistic regression (Wilks 2009;Ben Bouallègue and Theis 2014;Messner et al. 2014b), or nonhomogeneous regression (Gneiting et al. 2005). Several extensions exist for nonnormally distributed variables (Thorarinsdottir and Gneiting 2010;Lerch and Thorarinsdottir 2013;Scheuerer 2014;Scheuerer and Hamill 2015). For precipitation, Messner et al. (2014a) show that a censored logistic regression fits well, while Scheuerer (2014) and Scheuerer and Hamill (2015) use a left-censored generalized extreme value (GEV) distribution or a left-censored shifted gamma distribution, respectively.
These postprocessing methods are often applied on a station or gridpoint level such that for each location, one set of regression coefficients is estimated to correct the ensemble forecasts. However, for a wide range of applications, predictions for locations between observational sites are of great interest. Therefore, the regression models have to be extended such that spatial probabilistic predictions can be made.
In this article, a new spatial statistical postprocessing method for daily precipitation sums over complex terrain is presented. Even on a small spatial scale, two neighboring stations can show very different characteristics in terms of observed precipitation sums. These differences can be caused by topographically induced flow regimes, orographic lifting and shading effects, convective regimes, and many other factors. Most of these processes cannot yet be resolved by global EPS models. To account for these small-scale spatial variabilities among all stations, we are using an adapted version of the anomaly approach first published by Scheuerer and Büermann (2014) and further extended by Dabernig et al. (2017). Observations and ensemble forecasts are transformed into standardized anomalies by subtracting the long-term climatological mean and dividing by the climatological standard deviation. This removes the station-dependent characteristics from the data and makes it possible to fit one single regression model for all stations at once. As the model does not rely on site-specific characteristics anymore, the corrections can be applied to future ensemble forecasts to create probabilistic forecasts for any arbitrary location within the area of interest.
Following Dabernig et al. (2017), we use the standardized anomaly model output statistics (SAMOS) approach and extend the framework to fulfill all requirements needed for precipitation postprocessing. SAMOS offers a simple and computationally efficient framework for fully probabilistic spatial postprocessing and is applied to the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble in combination with the ECMWF reforecasts. The approach presented qualifies for an operational system as no extensive archive of historical forecasts is required. SAMOS uses a rolling 4-week time window as a training dataset so that only the reforecasts of the most recent month from the operational ECMWF data dissemination have to be retained, which currently (in 2016) consist of eight independent reforecast runs covering the previous 20 years. Because of this rolling training dataset, SAMOS automatically adapts itself to the latest ensemble model version within a very short time period.

a. Study area
To develop and validate the new method presented in this study, we focus on the governmental area of Tyrol, Austria. Tyrol has a size of about 12 500 km 2 and is home to approximately 740 000 inhabitants (Statistik Austria 2016) living in the two separated parts, with North Tyrol on the north side of the main Alpine ridge and East Tyrol south of the main Alpine ridge. The study area is located in the eastern part of the Alps, showing a highly complex topography. Figure 1 shows the state borders of Tyrol and the topography reaching from 465 to 3798 m MSL, including some of the highest mountains in Austria. Because of the high population density and the strong economic focus on tourism (.10 million tourists in 2014; Kaiser et al. 2014), there is a high demand for accurate weather forecasts.

b. Observational data
The local hydrographical service provides a dense precipitation measurement network, whereof 117 stations in Tyrol and its surroundings will be used for model training and validation spanning September 1971 through the end of 2012. The mean distance to the four closest stations in the surroundings is only about 10 km. Locations of the observation sites are highlighted in Fig. 1. The hydrographical service performs rigorous quality controls on the observations and makes them freely available for any noncommercial use on the maintainers' website (Bundesministerium für Land und Forstwirtschaft, Umwelt und Wasserwirtschaft 2016).

c. Numerical weather forecast data
The numerical forecasts are obtained from the ECMWF, including the operational ensemble (ENS; 0000 UTC initial), which consists of 50 1 1 individual forecasts based on perturbed initial conditions (50 forecasts plus control run) and the ECMWF reforecast dataset. The ECMWF reforecast dataset has existed since 18 February 2010 and was slightly extended over the years. Until 14 June 2012, the reforecast was computed once a week, providing ensemble reforecasts consisting of 4 1 1 members for the most recent 18 years.
FIG. 1. The black line shows the state borders of Tyrol. Each marker represents an observation site (total 117), the marker type indicates the altitude: square (#1000 m MSL), bullet (1000-1500 m MSL), and triangle ($1500 m MSL) with respect to the underlying topography. The background shows the (a) real topography (Jarvis et al. 2008) and the (b) ECMWF EPS model topography with a 0.58 resolution as used between February 2010 and December 2012.
From 21 June 2012 through the end of 2012, the number of years was extended from 18 to 20. This reforecast is designed to provide the model climate of the latest ECMWF ENS version and is often used for model calibration (e.g., Hamill et al. 2008;. In this study, the time period from February 2010 to December 2012 is used. Every Thursday, the reforecasts for the same date two weeks in advance have been computed, including a 4 1 1 member ensemble for the most recent 18-20 years. As an example, on Thursday 1 November 2012, the reforecast for 15 November has become available for the most recent 20 years, namely 15 November 2011, 15 November 2010, . . . , 15 November 1992, with 4 1 1 members each.

d. Training and verification dataset
The ECMWF reforecasts are used to compute the climatology of the ECMWF ensemble, which will be used as background information and to train the statistical postprocessing, including the most recent four reforecast runs centered around the current date (computed every Thursday; section 2c). Therefore, the model climatology is based on 4 runs 3 5 members 3 20 yr 5 400 individual forecasts (details in section 3c). For the training dataset, the reforecasts are bilinearly interpolated to each of the 117 observation sites. Out of each interpolated reforecast ensemble (daywise, 4 1 1 members), the mean and standard deviation is used later to build the training dataset. We use the most recent four 0000 UTC reforecast runs yielding to a training sample of up to 4 runs 3 20 yr 5 80 data pairs per station, or 4 runs 3 20 yr 3 117 stations 5 9360 observationreforecast pairs for the full spatial SAMOS (details in section 4b).
Once the regression coefficients are estimated, the correction can be applied to future EPS forecasts using the mean and standard deviation of the 50 1 1 members of the ECMWF ENS.
Because of the availability of the observations (section 2b) and the ECMWF reforecasts (section 2c), the time period between 26 February 2010 and 31 December 2012 will be used for verification, with an overall data availability of 99.4% and roughly 120 500 unique observation-forecast pairs.

a. Censored nonhomogeneous logistic regression (CNLR)
The distribution of precipitation observations at a particular observation site shows three main properties: it is limited to nonnegative values, has a large fraction of 0 observations (dry days), and is strongly positively skewed. We take the nonhomogeneous Gaussian regression (NGR; Gneiting et al. 2005) as our base model and extend the NGR framework to suit spatial precipitation postprocessing.
In contrast to the original NGR, a logistic response distribution is assumed. The logistic distribution shows a similar bell shape as the Gaussian distribution but has slightly heavier tails. The logistic distribution is defined by two parameters: the location m describing the mean and the scale s describing the width of the distribution. To remove the positive skewness, a power transformation 1/p is applied to the observations and to every ensemble member (Box and Cox 1964). Different power parameters p have already been suggested in the literature for precipitation applications such as p 5 3 (Stidd 1973) or p 5 2 (Hutchinson 1998). However, the optimal power parameter is a function of the data, the model assumptions, and the application. For this study, the power parameter p has been set to p 5 1:35, which turned out to fit best for the dataset and distribution used (details in section 3c).
Furthermore, the response is assumed to be left censored at 0 to account for the nonnegative observations and the large fraction of 0 observations. The concept of left censoring assumes that there is an underlying latent (unobservable) process driving the observable response, which can be described by a linear predictor. While the latent response y is allowed to become negative, the observable response ''precipitation'' is simply 0 if the latent response y is below zero or the inverse powertransformed latent response y p otherwise. For simplicity, the zero left-censored nonhomogeneous logistic regression will be denoted as CNLR from now on.
Both distributional parameters (m, s) are expressed by a linear predictor including the covariates or explanatory variables. As suggested by Gneiting et al. (2005), the mean of the ensemble forecast drives the location m, and the standard deviation of the ensemble drives the scale s. For this study, we only use the forecasted daily accumulated total precipitation from the ensemble (section 2c) as the meteorological predictor variable. In Eq.
(1), m denotes the mean, and s denotes the standard deviation of the forecasted power-transformed daily total precipitation amounts of the ensemble members.
Following the idea of Gebetsberger et al. (2016), a second covariate z has been included. The term z is a binary split variable, which takes 1 if all forecast members in the training dataset predict less than 0.01 mm day 21 (z 5 1; ''no'' precipitation) or 0 otherwise. This allows us to handle dry and wet cases differently and has a positive impact on the results. It furthermore solves the problem of taking the logarithm of the ensemble standard deviation if all members predict 0 mm, which leads to s 5 0. The log transformation on the scale s is used to ensure nonnegative-scale values during optimization. The full CNLR assumptions can then be written as (1) In case of a dry ensemble forecast (z 5 1), the linear predictors collapse to m 5 b 0 1 b 1 and log(s) 5 g 0 such that the model only consists of two estimated constants describing the climatological distribution of the response conditional on all cases where z 5 1. The variable b 1 typically becomes strongly negative, which leads to a strongly negative latent location m and overall small expected amounts of precipitation for the case z 5 1. For wet cases (z 5 0), the linear predictors become m 5 b 1 1 b 2 3 m and log(s) 5 g 0 1 g 1 3 log(s), which corresponds to the NGR model proposed by Gneiting et al. (2005). These assumptions allow us to correct the bias but also a possible overdispersion or underdispersion of the ensemble as the scale s depends on the predicted ensemble standard deviation. Even if the two cases are not independent and connected via the scale part, discontinuities occur at the transition where z goes from 0 to 1. As this only happens in regions with very small predicted amounts of precipitation, the effect on the results is marginal.
The model as specified in Eq.
(1) can be applied at every arbitrary location where both historical observations and historical ensemble forecasts are available. For pointwise ensemble postprocessing, one CNLR model has to be fitted at each observation site. In this case, all CNLR models are independent and have their own regression coefficients b d and g d . As these coefficients are site specific, spatial predictions are not directly possible and would require an additional interpolation method, which allows us to account for supplementary covariates, such as terrain or surface properties.
Instead of a two-step approach of performing stationwise estimates and interpolating/extrapolating the resulting coefficients afterward, we extend the model to include the training data of all stations at once and fit one simple and computationally efficient model for fully probabilistic spatial estimates.

b. SAMOS
The statistical method presented in this article is based on the anomaly approach first published by Scheuerer and Büermann (2014) and further extended by Dabernig et al. (2017), focusing on temperature forecasts across Germany and northern Italy, respectively. We extend the SAMOS approach by Dabernig et al. (2017), yielding to a censored SAMOS version for precipitation postprocessing.
Climatological properties between two precipitation observation sites may vary in mean (location) and variability (scale). This is especially true over complex terrain where only a few kilometers between a valley and a mountain station can result in very large climatological differences (Frei and Schär 1998;Isotta et al. 2014;Stauffer et al. 2017). These small-scale features influence daily precipitation sums but are not yet fully resolved by global numerical ENSs. Therefore, a highresolution spatiotemporal climatology is used as background information to provide small-scale features at any location within the study area. Instead of modeling the relationship between past observations and past numerical weather forecasts directly, the statistical model uses high-resolution standardized anomalies. Anomalies are defined as the short-term deviation from the local long-term climate. These anomalies can be divided by the local climatological variability to obtain standardized anomalies. Standardized anomalies of the observations (precipitation) are defined as where m obs,clim and s obs,clim describe the long-term climatological properties of daily observations and will be discussed in detail in section 3c. The term y* denotes the resulting latent response on the standardized anomaly scale, which follows a standard logistic distribution L (0; 1). Equivalent to Eq. (2), standardized anomalies of the ensemble forecasts (ens) can be computed with the climatological properties m ens,clim and s ens,clim of the ensemble using ens* 5 ens 1/p 2 m ens,clim s ens,clim . ( The ensemble climatology (m ens,clim , s ens,clim ) is described in section 3c.
Because of standardization, the censoring point on the anomaly scale becomes a function of the observed climatology. While the censoring point is on 0 (no precipitation) on the original or power-transformed scale [Eq. (1)], the censoring threshold becomes 2m obs,clim / s obs,clim after standardizing the data. Figure 2a shows the power-transformed observations with a constant censoring threshold of 0 throughout the whole year. Figure 2b shows all standardized anomalies and the shifted censoring threshold indicated by the solid line. As observations below the censoring threshold never occur, all data points lie on or above this line. Figure 2c is an extension of Fig. 2b, where all observations on the censoring threshold (0 mm day 21 on the original scale) were simulated from the standard logistic distribution for visual justification. As shown in the density plot, the standardized anomalies now follow a latent standard logistic distribution L (0, 1). As each of the 117 stations is standardized using its specific climatological properties m obs,clim and s obs,clim , the standardized anomalies of all stations show the same distribution [L (0, 1)]. Thus, the standardization removes site-specific features from the data and brings the data of all stations onto a comparable level.
Combining the CNLR model from Eq. (1) with the concept of standardized anomalies [Eqs. (2) and (3)] leads to the full specification of the SAMOS model with a left-censored logistic response: As on the power-transformed scale, the standardized anomalies y* are still assumed to follow a logistic distribution. The linear predictors for location m* and log(s*) on the standardized anomaly scale depend on the standardized ensemble anomalies (ens*) and the binary split indicator z. In this study, total precipitation forecasts are used as the only meteorological variable. The covariates m* and s* therefore correspond to the empirical mean and standard deviation of the standardized total precipitation forecast anomalies [Eq. (3)]. Once all covariates are known, the regression coefficients of the SAMOS model given by Eqs.
(2)-(4) can be estimated using censored maximum likelihood optimization as offered by the R package crch  or similar software. The climatological estimates required to create the standardized anomalies are explained in detail in section 3c.
Given all regression parameters b d and g d of the SAMOS model [Eq. (4)], the correction can be applied to future ensemble forecasts. As the SAMOS model returns both parameters on the standardized anomaly scale, they have to be destandardized with respect to the spatial climatology: m* 3 s obs,clim 1 m obs,clim |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} location , s* 3 s obs,clim |fflfflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflffl ffl} The destandardized zero left-censored distribution L 0 (⋯) describes the full postprocessed ENS forecast distribution on the power-transformed scale. Since the SAMOS regression coefficients are location independent, the postprocessed predictions can be computed at any location within the study area where both ENS forecasts and climatological estimates (m d,clim , s d,clim ) are available. As spatiotemporal climatologies are used (details in section 3c), the only limitation for the postprocessed ENS forecasts is the horizontal grid spacing of the spatial climatology, which itself only depends on the resolution of the available digital elevation model (see Stauffer et al. 2017). From the full-probabilistic SAMOS forecasts, different properties can then be derived, such as the mean or expectation, quantiles, probability of precipitation, or probabilities exceeding a certain threshold. To retrieve the corrected forecasts on the original scale in millimeters per day, the inverse power transformation has to be taken into account. Details can be found in appendix A.
In the limiting case that the ensemble would not provide any information at all, m* approaches 0 and s* approaches 1, resulting in m 5 m obs,clim and s 5 s obs,clim , which corresponds to the underlying high-resolution climatology-the most reliable information available in this case.

c. Climatological estimates
The climatological properties m d,clim and s d,clim for both the observations and the ensemble forecasts have to be specified to be able to derive the standardized anomalies y* and ens* [Eqs. (2) and (3)]. The computation of the observed climatology is based on Stauffer et al. (2017) but uses a left-censored logistic instead of Gaussian distribution and consequently a modified power-transformation parameter. As in Stauffer et al. (2017), the optimal power parameter was chosen using a power-adjusted maximum likelihood approach optimizing 117 stationwise climatologies. Since the optimal power parameters did not show a distinct spatial or altitudinal dependency, the median among all 117 estimates was selected using a constant p 5 1:35 in this study.
The observed spatiotemporal climatology is based on all 117 stations (Fig. 1) and uses daily precipitation measurements from 1971 through the end of 2009, yielding to roughly 1.5 million individual observations. Data from the years 2010-13 are set aside for verification.
The climatology is based on a nonhomogeneous regression model similar to the SAMOS method. In contrast to Eqs. (1) and (4), the linear predictors of the climatological model include smooth one-dimensional and multidimensional spline effects to depict all features of the climatology. In addition to the global intercepts (b, g), an altitudinal effect (s 1 , t 1 ), an effect to describe the seasonality based on the day of the year (s 2 , t 2 ), a spatial effect on dependent longitude and latitude (s 3 , t 3 ), and a three-dimensional effect to describe spatial variations in the seasonal pattern (s 4 , t 4 ) are included. Further details can be found in Stauffer et al. (2017). The full model specification of the observation climatology can be expressed as precipitation 5 0 if y # 0 y p else , y ; L (m obs,clim , s obs,clim ), m obs,clim 5 b 1 s 1 (alt) 1 s 2 (yday) 1 s 3 (lon, lat ) 1 s 4 (yday, lon, lat ), log(s obs,clim ) 5 g 1 t 1 (alt) 1 t 2 (yday) 1 t 3 (lon, lat ) 1 t 4 (yday, lon, lat ).
Again, both parameters of the power-transformed left-censored logistic distribution (location m obs,clim and scale s obs,clim ) are modeled. This is required as they are used for the standardization of the SAMOS model. Although the climatology model (section 3c) is quite complex, estimation only takes about 30 h and has to be done rarely, for example, once a year.
In addition to climatological estimates of the observations, climatological estimates m ens,clim and s ens,clim are required to compute standardized anomalies of the ensemble forecasts as in Eq. (3). The two parameters represent the long-term climatology of the ECMWF EPS (section 2c) and are computed from the ECMWF reforecast dataset. The mean and standard deviation are The climatological location m ens,clim is simply the empirical mean; the climatological scale s ens,clim is the ''standard deviation'' of the reforecast used. The factor C 5 ffiffi ffi 3 p /p is used to get the empirical scale of a logistic distribution to be on the same scale as the estimated scale of the observation climatology s obs,clim [Eq. (6)].

Results and verification
a. SAMOS results Figure 3 shows an example of the climatologies used for 18 May 2010 and the resulting spatial SAMOS predictions. It can be seen in all climatological estimates (Figs. 1a-d) that the altitudinal dependency is the most dominant effect for this day (cf. Fig. 1). The ENS with a horizontal grid spacing of ;40 km 3 40 km is only able to resolve the main Alpine ridge leading to the smooth north-south transitions in the left column of Fig. 3. The ensemble climatology correctly shows larger location m (Fig. 3a) and scale s (Fig. 3c) toward the prealpine flatland to the north and the south; however, this is only a very rough approximation of what is actually observed (Figs. 3b,d).
Figures 3e-h show the predictions for 18 May 2010, when a cold front hit the Alps from the north driven by a strongly pronounced low pressure system east of the study area. As a result, the forecasts show larger precipitation amounts north of the area due to orographic lifting and blocking. As the ENS is only able to represent the topography as one smooth ridge (Fig. 1), the only feature that can be identified in the ENS prediction is a gradual decrease of precipitation from north to south over the main Alpine ridge. In reality, a first mountain ridge alongside the northern boundary of the study area is blocking the air mass. Larger amounts of precipitation are typically observed in southern Germany north of Tyrol, while the well-marked Alpine valleys in Tyrol typically receive less precipitation. This can be seen in the observed climatology (Fig. 3b) but also for this particular day in the corrected SAMOS forecasts (Figs. 3f,h). South of the largest valley with a west-east orientation, increased forecasted amounts and probabilities can be seen in the corrected SAMOS predictions related to a secondary lifting of the air masses at the high mountains close to the main Alpine ridge.
The example shows that SAMOS is able to add interpretable and meaningful features to the ENS during the postprocessing procedure. However, the performance cannot be evaluated with a single case alone. Section 4b therefore contains a detailed analysis and verification on a 3-yr independent dataset.

b. Verification
For verification, the predictions of four different methods will be compared with unused (out of sample) data between February 2010 and December 2012. As two baseline methods, the climatologies (CLIM; section 3c) and the raw total precipitation predictions from the ECMWF ENS will be used. The empirical frequency of the 50 1 1 ensemble members is used as probability to compute the Brier scores shown in the results. Furthermore, a stationwise postprocessing (STN) is included based on Eq. (1). For STN, a separate CNLR model is estimated for each of the 117 stations in the dataset.
The predictions of all methods are out of sample such that the data used for verification are not included in the training dataset, which is used to estimate the regression coefficients. CLIM is based on all available observations except that the years 2010-13 are excluded (section 3c). Therefore, CLIM predictions are spatially in sample but temporally out of sample. STN is using the latest four available reforecast runs yielding to spatially in-sample but temporally out-of-sample predictions. SAMOS is the only method whose predictions can be verified both spatially and temporally out of sample. Therefore, a leave-one-out cross validation is performed. For each station, the SAMOS regression coefficients were estimated based on the most recent four reforecast runs excluding this one specific station. Forecasts were then made for the excluded station only. Table 1 contains a summary of all four methods and shows their sample behavior. Full in-sample SAMOS results are omitted as hardly any differences can be seen compared to the cross-validated out-of-sample results.
The continuous ranked probability score (CRPS; appendix B) of all predictions is shown as a continuous ranked probability skill score (CRPSS) in Fig. 4 using CLIM as reference. Values below zero indicate less predictive skill than CLIM. The higher the score, the better the performance of the corresponding method. As the CRPS is a fully probabilistic score, it penalizes for a possible dislocation of the predicted distribution but also for the wrongly predicted width or sharpness. The scores show an overall decrease with increasing forecast horizon for all three methods, slowly approaching the skill of the climatology. The two postprocessing methods STN and SAMOS show a significant improvement with respect to the ENS up to the 6-dayahead forecasts. SAMOS outperforms the STN method, even if it is verified fully out of sample. The differences between STN and SAMOS are small but all significant (paired two-sided t test, 5% significance level; not shown).
In addition to the CRPSS, Fig. 5 shows the Brier skill scores (BSSs) for three different thresholds using CLIM as the reference method again. Positive BSSs show that the method has more predictive skill than the reference; values below zero show less skill than CLIM. For threshold 0 mm day 21 (precipitation yes/no), it can be seen that the ENS performs poorly, even worse than the climatology. This is mainly caused by a wet bias in the ENS (not shown), which depends on the design of the ENS predicting an average over a relatively large grid cell. Both postprocessing methods perform significantly better than the climatology. Overall, SAMOS shows the best performance even for long forecast horizons. Figures 5b and 5c show the same verification for 1 and 10 mm day 21 , respectively. For these thresholds, ENS is better than CLIM but outperformed by the postprocessing methods. For large thresholds (Fig. 5c) and large forecast horizons, all methods become very similar. Differences between them are no longer significant. As last measure of performance, verification rank histograms and probability integral transform (PIT) histograms are shown in Fig. 6 for the ENS and SAMOS 1-day-ahead and 6-day-ahead forecasts to assess the calibration . In general, a more uniformly distributed histogram shows better calibration. A concave shape indicates that the forecasted distribution is too tight (underdispersive); a convex shape indicates that the distribution is too wide (overdispersive).
The verification rank histogram assesses the calibration of discrete distributions as provided by the 50 1 1 members of the ENS, yielding to 52 possible ranks. For each pair of total precipitation forecasts and observations, the rank is evaluated. Observations falling below the lowest ensemble member forecast are assigned to rank 1; observations falling above the highest ensemble member forecast are assigned to rank 52. All others are assigned to the ranks 2-51 with respect to the ensemble distribution as shown in Figs. 6a and 6c. The pronounced concave shape of the rank histogram indicates a strong underdispersion of the raw ENS such that a large fraction falls into the tails of the distribution or even outside.
The PIT histogram shows a similar measure for probabilistic forecasts. For each observation/forecast pair, the quantile conditional on the observed value is evaluated ([0:0 2 1:0]) and pooled into equidistant bins. For easy comparison with the rank histogram, we have chosen 52 uniformly distributed bins as shown in Figs. 6b and 6d. SAMOS is much better calibrated than the ENS, but the convex shape indicates that the distribution of the SAMOS is slightly wider than what is observed (overdispersive).

Discussion and conclusions
In this study, the standardized anomaly model output statistics (SAMOS) model has been extended and The boxes show the model performance for 1-day-ahead to 6-day-ahead forecasts. Each box contains three boxand-whisker plots for the (left) raw ENS, and the two postprocessing methods (middle) STN and (right) SAMOS. Each one contains 117 stationwise-mean skill scores. The boxes show the upper and lower quartile, and the whiskers show the 1.5 interquartile range. Additionally, the median (black bar) and the outliers (circles) are plotted. Values below 0 indicate stations with less skill than the climatology. The higher the values, the better the performance of the method. applied to daily precipitation sums. It has been shown that the concept of using standardized anomalies (Scheuerer and Büermann 2014;Dabernig et al. 2017) can be used to correct precipitation forecasts of numerical ensemble forecast models. The SAMOS postprocessing method is able to create accurate spatial predictions of daily precipitation sums over complex terrain. SAMOS uses high-resolution spatial climatologies as background information to transform the data (observations and ensemble forecasts) into standardized anomalies. This (i) removes location-dependent climatological features from the data and (ii) brings all data to a comparable level to account for the small-scale features in the study area, which are not yet resolved by the ensemble model. SAMOS returns fully probabilistic predictions for any arbitrary location within the study area, even for regions without observational sites.
To create the standardized anomalies, daily estimates of the climatological mean (location m d,clim ) and variability (scale s d,clim ) are required. The observed climatology estimate is based on the method presented by Stauffer et al. (2017) using a censored logistic rather than censored Gaussian response distribution. The censored logistic distribution has been chosen for this study as the spatiotemporal climatology showed slightly better calibration. Both distributions are very similar except that the logistic distribution has somewhat heavier tails, which is partly compensated by the FIG. 5. BSSs for three different thresholds using climatology from Eq. (6) as reference: (a) 0, (b) 1, and (c) 10 mm day 21 . The specifications of the box-and-whisker plots are as in Fig. 4. The frequency of the daily total precipitation is used for ENS, whereas the probabilities for the two postprocessing methods STN and SAMOS are derived from the predicted distribution. (from left to right) Scores for 1-day-ahead to 6-day-ahead forecasts. The higher the values, the better the performance of the method.

MARCH 2017
S T A U F F E R E T A L .
additional power transformation. Overall (not shown), the predictive skill of the SAMOS using either a censored logistic distribution or a censored Gaussian distribution is very similar. The climatology of the ECMWF ensemble model is provided by the ECMWF reforecast dataset [Eq. (7)] with one reforecast run per week consisting of 4 1 1 members and covering the past 18-20 years. Once both climatologies are known, the observations and the ensemble forecasts can be converted into standardized anomalies such that all data follow a standard logistic distribution. As all location-dependent characteristics are removed, this allows us to apply one simple regression model including all data at once. Since SAMOS uses the empirical mean and standard deviation of the standardized anomalies for training, which are based on the reforecasts, these firstand second-order moments are based on 4 1 1 members only (Roulin and Vannitsem 2012). Because of this small sample, the estimates are less precise than on current reforecasts runs, which provide 10 1 1 different ensemble members. The effect of having a larger reforecast ensemble could not be tested because of lack of overlapping data (section 2b).
The results show that the spatial SAMOS outperforms the STNs even if the SAMOS predictions are (unlike STNs) spatially out of sample. This is mainly related to the training dataset. While the STN only includes interpolated forecasts of one location, the SAMOS training dataset includes the data of all stations, leading to more robust estimates. The SAMOS calibration indicates that the assumed response distribution is not optimal. A different distribution might improve the skill and remove the need of the power transformation (Scheuerer 2014;Hamill et al. 2015).
The goal of this study is to use the SAMOS approach proposed by (Dabernig et al. 2017) and to extend the method for the application of precipitation sums or censored responses in general. While only focusing on daily precipitation sums up to day 6 in this study, it would be worthwhile to extend the forecast horizon and the study area but also to include additional covariates and to apply the SAMOS approach to other meteorological parameters. A further SAMOS extension to account for spatiotemporal correlation structures would be of great interest. Because of the standardization, SAMOS corrects for a possible underprediction or overprediction of the ensemble over long time scales but not on a single event as only the spatial correlation structure of the EPS is considered at this stage.
As the estimation of the SAMOS requires only little computational time, the SAMOS can easily be refitted as soon as a new reforecast run is available. This ensures that the SAMOS automatically adapts itself to the latest ECMWF ensemble model version within a very short transition period. Nowadays, the ECMWF reforecast (ECMWF 2016) is run twice a week, providing 10 1 1 members, which could further improve the performance of the SAMOS but could not have been tested. project funded by the Austrian Science Fund (FWF), Grant TRP 290. The computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC). The observation dataset was provided by the Tyrol hydrographical service (http://ehyd.gv.at/).

Properties of the Power-Transformed Left-Censored Logistic Distribution
The probability density function l and the cumulative distribution function L of a noncensored logistic distribution are defined as l(x j m, s) 5 exp 2 x 2 m s L(x j m, s) 5 1 The density l 0 and distribution function L 0 of a zero left-censored logistic distribution including the power transformation 1/p can then be written as L 0 (x i j m i , s i , p) 5

<
: 0 for all: x i , 0 where both are set to zero below the censoring point at 0. For x 1/p i $ 0, both follow the density and distribution function of the noncensored logistic distribution (l and L respectively) except that the density l 0 (x i 5 0 j m, s) depicts the point mass at the censoring point, which conforms the distribution function evaluated at 0. This also directly specifies the probability of precipitation P(x i . 0) defined as the probability that precipitation will be observed at a certain location/time: P(x i . 0 j m i , s i ) 5 1 2 P(x i # 0) 5 1 2 L(0 j m i , s i ). (A5) The probability of exceeding a certain threshold can be derived for any threshold k $ 0: P(x i . k j m i , s i , p) 5 1 2 P(x i # k) 5 1 2 L(k 1/p j m i , s i ).
(A6) Furthermore, the expectation of the distribution on the original scale has to be evaluated. The expectation on the original scale x in millimeters per day can be retrieved using A last property of interest is the median of the distribution, again on the original scale x in mm day 21 . Parameter m in Eqs. (1) and (4) describes the latent unobservable location. The median is then given as median(x j m, p) 0 for m # 0 m p else .
APPENDIX B

Error Measures Used for Verification
As a fully probabilistic score, the CRPS is shown in the verification section of this article. The mean CRPS of a zero left-censored power-transformed logistic distribution (see appendix A) can be written as where x is the response variable on the original scale in millimeters per day, N is the number of forecasts included, L 0 is the CDF of the forecasted distribution [Eq. (A4)], and H is the CDF of the observation represented by a Heaviside step function, which takes 0 for all x , observations and 1 otherwise. While x is on the original scale (mm day 21 ), both distributional parameters, location m and scale s, are on the powertransformed scale. Therefore, the power transformation 1/p is required to evaluate the CDF L 0 . As no analytic solution has been found, the CRPS is evaluated by quantile sampling with n 5 2000. The CRPS is shown as a skill score (CRPSS) in this article. A skill score shows the performance against a reference method. As the CRPS can only take nonnegative values, the CRPSS can be written as where the CRPS of the method to test is in the numerator, and the CRPS of the reference method is in the denominator. Values below 0 indicate that the tested method performs worse than the reference. CPRSS values can take values in the range of [ 2', 1]. As a second measure, the BSS is shown to verify the skill of the forecast probabilities. One of the most frequently used thresholds for precipitation forecasts is 0 mm, also known as the probability of precipitation. As an EPS does not provide fully probabilistic forecasts, the frequency of the daily total precipitation sum is used as an estimator of the probability. As an example, if half of all ensemble members predict no precipitation, the other half does; the frequency is 0.5 and can be seen as a probability of ;50% if the number of ensemble members is sufficiently large. The Brier score can then be written as where N is again the number of forecasts included, P i the predicted probability that an event exceeds threshold k [Eq. (A6)], and o i the binary observation which takes 0 for all observations x i # k and 1 otherwise. Correspondingly, the Brier skill score is defined as