Precipitation forecast over China for different thresholds using the multimodel bias-removed ensemble mean

Based on daily accumulated precipitation from the ensemble forecasts of three meteorological agencies and on CMORPH observational data, bias-removed ensemble mean (BREM) experiments are carried out on samples classified by precipitation threshold. The results are as follows: (1) The Classified BREM (CBREM) shows higher precipitation forecast skill than the BREM. The most visible improvements are found for light precipitation, whereas there is a negative impact for moderate precipitation. (2) Choosing optimal grading thresholds for each grid point further improves the forecast skill of the CBREM, with the greatest gain for moderate precipitation, whose threat score improves by more than 20% on average.


Introduction
Precipitation is considered one of the most important factors affecting agriculture and human society [1][2][3]. Against the background of conspicuous climate change, the demand for precipitation forecasts has been increasing [4][5][6]. In recent decades, numerical models have been developed to improve simulations and forecasts of various atmospheric systems [7][8]. To make the best use of numerical weather prediction outputs from multiple meteorological agencies, statistical post-processing methods such as the multimodel ensemble have been proposed, which improve the forecast skill for temperature, precipitation, wind and many other variables [9][10][11][12][13]. Moreover, considering the discontinuity and locality of precipitation, its forecast skill can also be improved by dividing the precipitation samples by threshold and then calibrating each class separately.
In this study, two typical multimodel ensemble methods, the simple ensemble mean (EMN) and the bias-removed ensemble mean (BREM) [14], are employed for precipitation forecasting. The BREM is additionally applied to samples classified by precipitation threshold. Furthermore, the influences of the reference forecast used to split the samples and of the grading thresholds are studied.
This paper is organized as follows. The data and methods are briefly introduced in Section 2. In Section 3, we assess the forecasts of the raw models, the EMN and the BREM. The BREM for different precipitation thresholds and different threshold grading criteria is also described in this section. Finally, the conclusion and discussion are presented in Section 4.

Data
The forecast dataset of daily accumulated precipitation is obtained from the ensemble forecasts initialized at 0000 UTC by the European Centre for Medium-Range Weather Forecasts (ECMWF), the Japan Meteorological Agency (JMA), and the United Kingdom Meteorological Office (UKMO) under the THORPEX Interactive Grand Global Ensemble (TIGGE) framework, with a horizontal resolution of 1° × 1°. The lead times range from 1 to 7 days. In addition, the daily accumulated precipitation data produced by the CPC MORPHing technique (CMORPH) are used as observations. The CMORPH data are produced with a horizontal resolution of 0.1° × 0.1° and are widely used in precipitation-related research. In this study, a resolution of 1° × 1° is used and the common period is 1 January-31 August 2016. Note that the two months of July and August are used to verify the forecasts after post-processing.

The multimodel ensemble methods
Multimodel ensemble forecasts generally show higher skill than individual models. Among them, the EMN is the most common method:
$$\mathrm{EMN} = \frac{1}{N}\sum_{i=1}^{N} F_i \qquad (1)$$
where EMN is the output of the EMN, $N$ is the number of ensemble members, and $F_i$ is the forecast of model member $i$.
In addition, assuming that the systematic errors of the models can be removed through a training phase, the BREM has been proposed as follows:
$$\mathrm{BREM} = \overline{O} + \frac{1}{N}\sum_{i=1}^{N}\left(F_i - \overline{F_i}\right) \qquad (2)$$
where BREM is the output of the BREM, $\overline{O}$ is the mean of the observations in the training phase, $F_i$ is the forecast of model member $i$, and $\overline{F_i}$ is the forecast mean of model $i$ in the training phase. Importantly, a running training phase is applied in our study so that the calibration follows the latest forecast performance of the models.
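As a minimal sketch of the two ensemble methods, the following NumPy code assumes that the EMN is the arithmetic mean of the member forecasts and that the BREM adds the mean member anomaly (each forecast minus that model's training-phase mean) to the training-phase observed mean. The function names, argument names and array layout are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def ensemble_mean(forecasts):
    """EMN: arithmetic mean over the model axis (axis 0)."""
    return np.mean(forecasts, axis=0)

def bias_removed_ensemble_mean(forecasts, train_forecasts, train_obs):
    """BREM for one target day (assumed array layout, see below).

    forecasts:       (n_models, ...)          forecasts to be calibrated
    train_forecasts: (n_models, n_days, ...)  forecasts in the training window
    train_obs:       (n_days, ...)            observations in the training window
    """
    obs_mean = np.mean(train_obs, axis=0)            # training-phase observed mean
    model_means = np.mean(train_forecasts, axis=1)   # training-phase mean of each model
    anomalies = forecasts - model_means              # bias-removed member forecasts
    return obs_mean + np.mean(anomalies, axis=0)
```

For a running training phase, the training arrays would simply be re-sliced to the most recent days before each target day. The sum of the observed mean and the mean anomaly can become negative for precipitation; whether and how such values are clipped is not stated in the text and is left out of this sketch.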
Considering that precipitation of different intensities varies in its spatial and temporal characteristics, we classify the samples of 24 h accumulated precipitation into four categories according to the criteria of the China Meteorological Administration (CMA), i.e. light precipitation (< 10 mm), moderate precipitation (10-25 mm), heavy precipitation (25-50 mm), and rainstorm (≥ 50 mm). Afterwards, each category is calibrated separately with the BREM method to obtain an overall better forecast.
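The classification and the per-category calibration could be sketched as follows for a single grid point. The category edges are the CMA values quoted above. The rule for selecting training samples (training days whose reference forecast falls in the same category as the target day) and the fallback to all training days are assumptions made for illustration, as are all function and argument names.

```python
import numpy as np

# CMA category edges in mm per 24 h: light < 10, moderate 10-25,
# heavy 25-50, rainstorm >= 50.
CMA_EDGES = np.array([10.0, 25.0, 50.0])

def classify_precipitation(precip_mm, edges=CMA_EDGES):
    """Return the category index (0 = light ... 3 = rainstorm).

    np.digitize with the default right=False puts a value x into bin i
    when edges[i-1] <= x < edges[i], so 10 mm counts as moderate and
    50 mm as rainstorm. Values below 10 mm, including zero, fall in bin 0.
    """
    return np.digitize(precip_mm, edges)

def classified_brem_point(forecasts, reference, train_forecasts,
                          train_reference, train_obs, edges=CMA_EDGES):
    """Classified BREM at one grid point (illustrative sketch).

    forecasts:       (n_models,)         member forecasts for the target day
    reference:       scalar              reference forecast for the target day
    train_forecasts: (n_models, n_days)  member forecasts in the training window
    train_reference: (n_days,)           reference forecasts in the training window
    train_obs:       (n_days,)           observations in the training window
    """
    category = classify_precipitation(reference, edges)
    same = classify_precipitation(train_reference, edges) == category
    if not same.any():
        # Assumption: fall back to all training days when the category is empty.
        same = np.ones(same.shape, dtype=bool)
    obs_mean = train_obs[same].mean()
    model_means = train_forecasts[:, same].mean(axis=1)
    return obs_mean + np.mean(forecasts - model_means)
```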

The verification methods
The multimodel ensemble results are verified with the root-mean-square error (RMSE) and the threat score (TS), which are widely used in precipitation-related studies [14].
The RMSE describes the magnitude of the deviation of the forecast from the observation. A smaller RMSE indicates higher forecast skill.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - O_i\right)^2} \qquad (3)$$
where $n$ is the total number of grid points, $F_i$ is the forecast at grid point $i$, and $O_i$ is the observation at grid point $i$.
The TS reflects the forecast skill of precipitation over different thresholds. The higher TS indicates the higher forecast skill.
$$\mathrm{TS} = \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses} + \mathrm{alarms}} \qquad (4)$$
where hits is the number of grid points where both the forecast and the observation reach a certain precipitation threshold, and misses is the number of grid points where the observation reaches the threshold but the forecast does not. Similarly, alarms is the number of grid points where the forecast reaches the threshold but the observation does not (false alarms).
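The two scores can be sketched in NumPy as below. The sketch assumes that "reaching a threshold" means a value greater than or equal to it, and it returns NaN for the TS when there are no hits, misses or false alarms. Both conventions, and the function names, are assumptions rather than details given in the text.

```python
import numpy as np

def rmse(forecast, observation):
    """Root-mean-square error over all grid points."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(observation, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def threat_score(forecast, observation, threshold):
    """TS = hits / (hits + misses + false alarms) for one threshold."""
    fcst_yes = np.asarray(forecast) >= threshold
    obs_yes = np.asarray(observation) >= threshold
    hits = np.sum(fcst_yes & obs_yes)
    misses = np.sum(~fcst_yes & obs_yes)
    alarms = np.sum(fcst_yes & ~obs_yes)
    total = hits + misses + alarms
    # Assumption: the score is undefined (NaN) when no event is forecast or observed.
    return float(hits / total) if total > 0 else float("nan")
```

For example, with forecasts of 0, 12, 30 and 5 mm against observations of 0, 15, 8 and 11 mm and a 10 mm threshold, there is one hit, one miss and one false alarm, so the TS is 1/3.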

Improvement of precipitation forecast by BREM
The RMSEs of daily accumulated precipitation from the different models and multimodel ensembles, averaged over the period 1 July-31 August 2016, are displayed for lead times from 1 to 7 days in Figure 1. In general, the RMSE increases with lead time. The ECMWF forecast shows the lowest RMSEs among the three raw models, indicating its superiority over the other two centers. In addition, the EMN and BREM have lower RMSEs than the raw models, which indicates the advantage of multimodel ensembles, with the BREM performing better than the EMN.

Frequency distribution of precipitation over different thresholds
Precipitation differs greatly across China, and so does the probability of precipitation for different thresholds. Figure 2 shows the spatial distributions of precipitation frequency for different thresholds over China. Precipitation in China is clearly dominated by light rain, with frequencies mainly greater than 0.6. Moderate precipitation is also common in some areas, with frequencies ranging from 0.3 to 0.5. In contrast, the frequencies of heavy precipitation and rainstorm are relatively low, especially for rainstorm. The samples of heavy precipitation and rainstorm could therefore be too few for forecast post-processing. Accordingly, a spatial window of 3° × 3° is used to increase the number of heavy precipitation and rainstorm samples.
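One way to see how a spatial window increases the sample size is to count, for each grid point, the samples of a given category within the surrounding window. The sketch below assumes a 1° grid, so that a 3° × 3° window corresponds to 3 × 3 grid points centred on the target point, and it treats points outside the domain as contributing no samples. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def pooled_sample_count(precip, lower, upper=np.inf, half_width=1):
    """Count samples with lower <= precip < upper in a square window.

    precip:     (n_days, n_lat, n_lon) daily accumulated precipitation in mm
    half_width: window half-width in grid points (1 gives a 3 x 3 window)
    Returns an (n_lat, n_lon) array of pooled sample counts.
    """
    in_category = (precip >= lower) & (precip < upper)
    counts = in_category.sum(axis=0)                   # samples per grid point
    padded = np.pad(counts, half_width, mode="constant", constant_values=0)
    n_lat, n_lon = counts.shape
    pooled = np.zeros_like(counts)
    # Sum the shifted copies of the count field that make up the window.
    for di in range(2 * half_width + 1):
        for dj in range(2 * half_width + 1):
            pooled += padded[di:di + n_lat, dj:dj + n_lon]
    return pooled
```

Calling `pooled_sample_count(precip, 50.0)` would give the number of rainstorm samples available to each grid point once its eight neighbours are included.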

BREM based on different precipitation thresholds
Precipitation of different thresholds is characterized by different spatial and temporal features. Therefore, the Classified BREM (CBREM) is used to take precipitation samples of different thresholds into consideration. The CBREM results can depend strongly on which forecast is used as the reference for classification, i.e. a raw forecast or an ensemble. According to the results in Section 3.1, the ECMWF, EMN and BREM forecasts are superior to the others in precipitation forecasting, so they are used as classification references in the following CBREM experiments. The CBREM forecasts whose thresholds are determined by ECMWF, EMN and BREM are hereafter abbreviated as c-ECMWF, c-EMN and c-BREM, respectively. Importantly, for rainstorm forecasts, given the small number of samples and their potentially serious effects on human society, overestimation is considered more acceptable than underestimation. Therefore, if a rainstorm occurs in the classification reference forecast, it is taken directly as the final forecast without any post-processing, because the BREM may smooth it out.
Figure 3 shows the TSs of the CBREM under different classification references for different precipitation thresholds. In general, the TSs decrease with increasing lead time. For light precipitation, the TSs of the CBREM are always higher than those of the BREM, and the c-EMN is the most skillful among the results. The improvement magnitudes decrease with the

The optimal grading thresholds
Given the considerable improvements obtained with the CBREM, the choice of precipitation thresholds could also affect the forecast results, which is investigated in this section.
In this study, 66 schemes of threshold selection are employed. That is, for the light precipitation threshold of < 10 mm, we define 6 candidate upper limits from 5 to 10 mm in steps of 1 mm. For moderate precipitation, the upper limit is varied from 15 to 25 mm in steps of 1 mm, giving 11 levels. In total, the combinations of light and moderate precipitation thresholds give 6 × 11 = 66 schemes. All 66 schemes are applied at every grid point to determine the optimal one for each point. The percentage changes in TS after applying the optimal threshold scheme at each grid point in China are displayed in Figure 4. Note that although the CBREM is performed with the newly generated threshold schemes, the evaluation uses the traditional CMA precipitation thresholds of 10 mm, 25 mm and 50 mm. Because the rainstorm threshold of 50 mm is unchanged, the distribution of TS changes for rainstorm is omitted from Figure 4. The figure shows conspicuous improvements in TS over the whole area and for all thresholds. For light precipitation, the improvements range from 5% to 15%, with generally lower values in the west. For moderate precipitation, the most visible improvements, with percentages greater than 20%, are located in eastern China and are larger than those for light precipitation. For heavy precipitation, the improvements are the smallest among the three thresholds, with magnitudes of 20% found over only a few parts of Southeast China.
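The enumeration of the 66 schemes and the per-grid-point selection could be sketched as follows. The text does not state which score is used to pick the optimal scheme, so the sketch simply assumes that some skill score (higher is better) has already been computed for every scheme at every grid point. All names are illustrative.

```python
from itertools import product

import numpy as np

LIGHT_UPPER_MM = range(5, 11)      # 5, 6, ..., 10 mm: 6 candidate limits
MODERATE_UPPER_MM = range(15, 26)  # 15, 16, ..., 25 mm: 11 candidate limits
RAINSTORM_MM = 50                  # kept fixed in every scheme

def threshold_schemes():
    """Return all (light, moderate, rainstorm) threshold triples."""
    return [(light, moderate, RAINSTORM_MM)
            for light, moderate in product(LIGHT_UPPER_MM, MODERATE_UPPER_MM)]

def optimal_scheme_index(scores):
    """Pick the best scheme at each grid point.

    scores: (n_schemes, n_lat, n_lon) skill score of each scheme
            (assumption: higher is better and there are no NaN values).
    Returns an (n_lat, n_lon) array of indices into threshold_schemes().
    """
    return np.argmax(scores, axis=0)
```

The list returned by `threshold_schemes()` has 6 × 11 = 66 entries, from (5, 15, 50) to (10, 25, 50), which matches the count given above.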

Summary
Using the daily accumulated precipitation data provided by the ensemble forecasts of ECMWF, JMA, UKMO, and the CMORPH observational data, the experiments of ensemble methods such as EMN,