Four-dimensional variational (4DVAR) performance test with assimilation satellite and radar data (case study of a heavy rainfall Bengkulu March 4, 2019)

Four-dimensional variational (4DVAR) is one of the assimilation techniques considering time integration to distribute observational data at time window intervals. In this study, we aim to evaluate the 4DVAR assimilation technique using satellite and radar data to simulate a heavy rainfall case in Bengkulu on March 4, 2019. The result shows that radar data assimilation (DA-RAD) can improve rainfall pattern over Bengkulu mainland areas, while the satellite data assimilation (DA-SAT) enhances rainfall over the ocean. In addition, for temporal analysis, the DA-RAD successfully correct the initial time of the event, producing the smallest error and the best correlation in statistical verification, also a small bias and higher accuracy for discrete verification. However, DA-SAT is more capable to improve rainfall accumulation with the lowest FAR value. In conclusion, compared to others, both satellite and radar can be used as assimilation data for 4DVAR methods as they have different roles in increasing the quality of model performance.


Introduction
Indonesia is a country located in a tropical region where various global, regional, and local factors, such as El Nino/La Nina, Madden-Julian Oscillation, Indian Ocean Dipole, monsoon, and topography can trigger extreme weather conditions [1]. In particular, the Bengkulu region which is directly adjacent to the Indian Ocean with a varied topography can delivered in frequent hydrometeorological disasters to these areas. According to BNPB [2], floods and landslides in Bengkulu that occurred in March 2019 were included in the three major hydrometeorological disasters in early 2019. Thus, this study uses the WRF model to simulate heavy rain events in Bengkulu.
Weather Research and Forecasting (WRF) is one of the numerical weather models which can be forecast on a mesoscale [3]. WRF is a flexible application in various systems, inexpensive in infrastructure, open-source, and easy to operate [4]. However, there are still some shortcomings in the initial conditions of WRF due to the uneven distribution of the observation data spatially and temporally [5]. WRF Data Assimilation (WRFDA) is a data assimilation system in WRF that can improve this limitation [6]. WRFDA has various assimilation techniques, such as Three-Dimensional Variational (3DVAR), Four-Dimensional Variational (4DVAR), ensemble, Rapid Update Cycle (RUC), and hybrid. Among those assimilation techniques, 4DVAR outperforms in terms of performance [5,7]. The 4DVAR technique is a successor 3DVAR assimilation method, which includes time integration for the  [8]. Previous study about this technique has been carried out by Yi et al. [9] using TRMM 3B42 and GPM IMERG satellite data. The evaluation results show that the experiment with the 4DVAR technique increased the accuracy of rainfall prediction compared to non-assimilation. In addition, Mazzarella et al. [10] has also conducted research on the assimilation of radar data using 3DVAR and 4DVAR techniques to simulate a case study of heavy rain in central Italy using WRF. The result showed the use of reflectivity and radial velocity data on 4DVAR can increase the prediction skill for high-intensity rain.
Based on this background, this study is focused on the satellite and radar data for WRF model assimilation using 4DVAR technique during phase a heavy rain event on 4 March 2019 in Bengkulu. The aim of this study is to see the performance of each WRF model result before and after the assimilation was carried out using satellite (DA-SAT) and radar (DA-RAD) data with 4DVAR techniques spatially and temporally. The outline of this research will begin with data and WRF model discussed in section 2. An explanation of the methods in section 3. The results and discussions are described in section 4. The conclusions of the research are presented in section 5.

Data and WRF Model
This study uses High-Performance Computing (HPC) belonging to the Indonesian Institute of Sciences (LIPI) with the specifications of the Intel Xeon E-5 2609, 2.4 GHz, 100 cores CPU, 32 GB RAM, and 100 GB memory storage capacity. The tools and data used are divided into 2 groups, namely observational which is the data verifier of the model results, and assimilation which is the data used for WRF and WRFDA models version 4.1.2.

Observational
The obtained rain data from Automatic Weather Station (AWS) and 4 Automatic Rain Gauges (ARG) in Bengkulu used as a temporal verifier of the model results (table 1). We also use GSMaP rainfall accumulation data on March 4, 2019 with a grid of 0.1° x 0.1° (ftp://hokusai.eorc.jaxa.jp/realtime/) [11] This data is processed using the NCAR Command Language (NCL) application for spatial verifier of the model results.  [12]. 4DVAR assimilation is only done in the second domain. The time interval for the assimilation data used is shown in figure 1.
For assimilation purposes we use radiation data from the Advanced Microwave Sounding Unit-A (AMSU-A) and Microwave Humidity Sounder (MHS) sensors on the NOAA 15-19 satellite which can be downloaded from https://rda.ucar.edu/datasets/ds735.0/ [13]. AMSU-A data is designed to monitor 3 the temperature profile of the atmosphere, while MHS is to observe the atmospheric humidity and precipitation levels [14]. It has a temporal resolution of 6 hours and have a data length of 12 hours. This satellite passes through Indonesian territory twice in one day at 00 and 12 UTC [15].
We also use reflectivity (Z) and radial velocity (V) obtained from BMKG doppler radar. It has data length with an hourly temporal resolution. This data needs to be processed using Python to define the Z CAPPI [16]. The resulting data NetCDF extension, then need to combine and converted using the R-Studio to get ASCII data.
The model configuration used is shown in table 2. We use NCL to post-processing model output. Verification was carried out using eyeball verification to see the distribution pattern of rainfall. Besides, we also use the statistical verification of correlation coefficient and Root Mean Square Error (RMSE). In addition, discrete verification is utilized to determine the quality of forecast results against the verifier at the observation points. This verification method uses several attributes such as bias, accuracy, and False Alarms Ratio (FAR).

Results and Discussions
These results and discussions are divided into 2 parts, namely spatial analysis of daily rainfall distribution and temporal analysis for the quality of model output performance.

Spatial Analysis of Daily Rainfall Distribution
Heavy rain that occurred on March 4, 2019 in Bengkulu Province resulted in a diverse daily rainfall accumulation value at several selected AWS and ARG points. Rain event is spatially depicted from GSMaP data. As can be seen in figure 3 (a), the distribution of daily observed rainfall (GSMaP) occurs almost all along Bengkulu mainland to the ocean. The maximum rainfall seen around the ocean of Southeast Bengkulu with a value of 75-85 mm/day. Whilst, the rain that occurs along the Bengkulu mainland is dominated by rainfall with a value of 5-35 mm/day. At observation point 1, the value of daily accumulated rainfall is 15-25 mm/day. Observation points 2, 3, and 4 produce the same daily rainfall accumulation values with number varies between 5 and 15 mm/day. Among those points, observation point 5 yields the highest 24 hours rainfall accumulation of 25-35 mm/day. However, the rainfall produced by NON-DA scheme is distributed over the research domain ( figure  3 (b)). Maximum rainfall in the NON-DA scheme seen in the central part of Bengkulu with a value of >100 mm/day, though GSMaP did not shows any high rainfall intensity on the central part of Bengkulu mainland. Rainfall along the Bengkulu mainland in the NON-DA scheme has a number of 5-45 mm/day and is unevenly attributed. This scheme has also not been able to detect the presence of high rainfall in the Bengkulu ocean. At observation point 1, the value of daily accumulated rainfall is between 5-15 mm/day. Observation points 2 and 5 produce accumulated rainfall <5 mm/day. Meanwhile, at observation points 3 and 4, the accumulated daily rainfall is between 15-25 mm/day. Among five observation points, none of NON-DA results has the same amount of rainfall accumulated GSMaP. It indicates that is the NON-DA scheme unable to capture the spatial distribution of daily rainfall in Bengkulu. This is because of the use of default settings for local characteristics in the model [18] where the Bengkulu region has a varied topography like coastal areas and Barisan hill. Compared to GSMaP, the rainfall pattern of the DA-SAT scheme is also distributed over part of the research domains ( figure 3 (c)). However, the distribution of rainfall number tends to decrease in the northeastern land area. The maximum rainfall occurs almost along the central part of Bengkulu mainland with a value of >100 mm/day. However, the DA-SAT can detect high rainfall in the Bengkulu ocean with a value of 75-85 mm/day. The rain in mainland Bengkulu on the DA-SAT scheme is dominated by rainfall with a value of 15-45 mm/day. At observation points 1, 2, and 5, the accumulated rainfall varies between 15-25 mm/day. Observation point 3 yields a rainfall accumulation value of 5-15 mm/day. At observation point 4, the accumulated rainfall is 25-35 mm/day. Among those observation points, there are two observation points which can produce the same range of accumulated rainfall values as GSMaP values (observation points 1 and 3).
Whilst, the DA-RAD scheme has a rainfall pattern that is distributed along the Bengkulu mainland to part of the ocean ( figure 3 (d)). Compared to other schemes, the DA-RAD has succeeded in detecting the absence of rain pattern in the central to northeastern part of Bengkulu. However, this scheme is not able to detect the presence of high rainfall in the oceans. The maximum rainfall in the DA-RAD scheme was detected in the northwest and southeast Bengkulu mainland. With a number of the rainfall along the Bengkulu mainland, values between 5-35 mm/day. At observation points 1, 2, and 3, the resulting accumulated rainfall values is <5 mm/day. Observation points 4 and 5 produce rainfall accumulated values of 25-35 mm/day. Among those points, only observation point 5 can produce a range of accumulated rainfall values that are the same as GSMaP. Overall, the DA-RAD scheme is able to produce rainfall pattern that is closest to the GSMaP. This scheme also can produce the same range of accumulated on GSMaP over the Bengkulu mainland. However, DA-RAD scheme poorly detects the rainfall distribution over the Bengkulu ocean area. This is because the DA-RAD scheme is limited by the range of radar observation [19]. Whilst, DA-SAT is limited by the data availability due to polar orbit satellite coverage [20]. Since the DA-SAT scheme can detect the presence of high rainfall in the Bengkulu ocean, it can improve the rainfall model results over the Bengkulu ocean area.

Temporal Analysis for Quality of Model Output Performance
Based on the observation points, time-series graphs are used to depicts the intensity of rainfall per hour and accumulated rainfall (figure 4). Based on figure 4 (1), the DA-RAD scheme is unable to predict the occurrence of rain at this observation point. This is because the Stasiun Klimatologi Bengkulu AWS is in the cone of silence radar area, resulting a data gap above the radar center [21]. However, the DA-SAT scheme has a value that is closest to the accumulated value of the observed rainfall data with the number of 15.28 mm. Whilst, the NON-DA scheme is successful in predicting the start time of rain to start at around 11.00 UTC. Figure 4 (2) shows that DA-SAT has the closest accumulated rainfall value to observation with a number of 21.44 mm. Whilst, DA-SAT and DA-RAD can predict the start time of rain event at 12.00 UTC. At Bukit Kaba ARG point ( figure 4 (3)), NON-DA is the scheme with the closest accumulated rainfall to the observation of 22.98 mm. For the start time of the rain event at 9.00  From the time-series analysis, we know that DA-SAT and DA-RAD are successfully predicting the start of rain event at one observation point (Rekayasa Sukaraja Seluma ARG). At Bukit Kaba ARG, Air Rami ARG, and Kedurang Ilir ARG, DA-SAT and DA-RAD can improve the start time of rain event from NON-DA. This is in line with research by Sun and Wang [22] which stated that the 4DVAR assimilation technique is successful in improving and balancing the initial conditions in the model. The DA-SAT scheme is capable of predicting the accumulated rainfall closest to the observation value at 4 points. The DA-SAT scheme is also capable of predicting the presence of rainfall at all observation points. Whilst, the DA-RAD scheme is successful in predicting the start time of rain events three times. Based on that analysis, the use of satellite and radar assimilation data results in an increase in model performance. However, DA-SAT and DA-RAD reduce the quality of accumulated rainfall value from NON-DA at Bukit Kaba ARG point. This is because the topography at the observation point has the characteristics of a plateau (shown in figure 3 and table 1), so that should be a specific setting on the model [18]. In addition, DA-RAD has not been maximized in predicting rainfall accumulation at Stasiun Klimatologi Bengkulu AWS and Rekayasa Sukaraja Seluma ARG which are located at the cone of silence area [21]. Also, the Sukaraja Seluma ARG point, which is located ± 15 km from the center of the radar.  In statistical verification, RMSE and correlation calculation are used between the modeled rainfall data and the combined rainfall data from five observation points. Based on table 3, three schemes produce errors value from 0.8972 to 0.9448 mm. From the resulting RMSE value, the DA-SAT and DA-RAD schemes have succeeded in reducing the error value of the NON-DA scheme. This is in line with the results of the study by Xu et al. [23] which states that the use of remote sensing assimilation data such as satellite and radar data with the 4DVAR technique can reduce the error value of the model results. The correlation value between the model results and the observed rainfall data has varied between 0.2737 to 0.0645. This shows that each scheme is not capable of predicting in the diurnal variations of rainfall correctly. However, among those schemes, the DA-RAD has the best RMSE value and correlation in the number of 0.8972 mm and 0.2737, respectively. This error value is the smallest error value among other experimental schemes. These results are in line with previous studies on assimilation satellite [24] and radar [25] data with 3DVAR technique in Indonesia which can also increase the statistical values of the model. For discrete verification, this process is carried out by dichotomy between the rainfall data from each experimental scheme against the observed rainfall data. The bias, accuracy, and FAR values obtained from each experimental scheme are displayed in the form of a diagram called the Forecast Skill Score (FSS). Based on figure 5, the bias values of the three schemes tend to be overforecast. However, the best bias value is generated by the DA-RAD scheme with a value of 1.0583 followed by the NON-DA and DA-SAT schemes with a value of 1.1845 and 1.5146. Then, the respectively highest rainfall event prediction accuracy value is generated by the DA-RAD scheme with a value of 0.8250 followed by the DA-SAT and NON-DA schemes with a value of 0.8153 and 0.7931. This is in line with the research of Pan et al. [26] who stated that the assimilation of remote sensing data using 4DVAR technique can increase the accuracy value of the WRF model. Nonetheless, the lowest FAR value that shows the predicted rain event against non-rain event is generated by the DA-SAT scheme with a value of 0.5962 followed by the DA-RAD and NON-DA schemes with a value of 0.6055 and 0.6885. From this discrete verification, the best model performance is produced by the DA-RAD scheme noted by small bias and high accuracy.  Based on those verification techniques, each of DA-SAT and DA-RAD scheme has different influence in improving model performance. The DA-RAD scheme can improve the initial rainfall occurrence time in the model after assimilation, also producing the smallest error value and the best correlation value. In discrete verification, the DA-RAD scheme is also the best scheme because it produces almost perfect bias value and the highest accuracy value. However, the DA-RAD scheme is limited by radar coverage. The presence of a cone of silence area and a maximum distance of 150 km (radar limitation) from the center of the radar affects so does the assimilation results [21]. Meanwhile, the DA-SAT scheme uses satellite assimilation data with a wide range can cover the entire domain [27]. This make the DA-SAT scheme success in detecting the presence of rainfall at each observation point. The ability of the two data assimilations that be used can improve model performance in different ways. This is because the 4DVAR technique can accommodate a time window to distribute the assimilation data at certain time intervals. Thus, the density of data assimilation in the 4DVAR technique greatly affects the model performance [28].

Conclusions
Spatially, the DA-RAD scheme can approximate the distribution pattern of the daily observed rainfall and improve the rain events along Bengkulu mainland. Whilst, the DA-SAT scheme can improve the rain events in Bengkulu ocean area. Temporally, the DA-SAT scheme can improve the accumulated rainfall value from the model and produce the lowest FAR value of 0.5962. Whilst, the DA-RAD scheme can predict the onset of rain events, reduce the error value, and increase the correlation value of 0.8972 mm and 0.2737, respectively. It also produces the lowest bias value and the highest accuracy of 1.0583 and 82.5%, respectively.
However, there are still some obstacles in this research that need to be fixed in the future. This study is decreasing the number of research domains due to limitations in the RAM capacity of the devices used for processing this 4DVAR technique. In further research, a device with a very large RAM capacity is needed to get optimal results. In addition, CV7 as BE data can also maximize the calculations for U and V variables in the radar assimilation data [29]. From this study, it can be found that each data assimilation can improve the model results in different ways. Therefore, further research is needed by using combination of satellite and radar data with 4DVAR technique.