Progress, challenges, and future steps in data assimilation for convection-permitting numerical weather prediction: Report on the virtual meeting held on 10 and 12 November 2021

In November 2021, the Royal Meteorological Society Data Assimilation (DA) Special Interest Group and the University of Reading hosted a virtual meeting on the topic of DA for convection-permitting numerical weather prediction. The goal of the meeting was to discuss recent developments and review the challenges, including methodological developments and progress in making the best use of observations. The meeting took place over two half days on 10 and 12 November, and consisted of six talks and a panel discussion. The scientific presentations highlighted some recent work from Europe and the USA on convection-permitting DA, including novel developments in the assimilation of observations such as cloud-affected satellite radiances in visible channels, ground-based profiling networks, aircraft data, and radar reflectivity data, as well as methodological advancements in background and observation error covariance modelling and progress in operational systems. The panel discussion focused on key future challenges including the handling of multiple scales (synoptic-, meso-, and convective-scales), ensemble design, the specification of background and observation error covariances, and better use of observations. These will be critical issues to address in order to improve short-range forecasts and nowcasts of hazardous weather.


| INTRODUCTION
Convection-permitting (or convective-scale or storm-scale) data assimilation (DA) refers to DA in regional numerical weather prediction (NWP) systems with horizontal grid lengths of around 1-4 km, where convection is modelled explicitly rather than parametrized. Such systems have been used in research and operational NWP for more than fifteen years (e.g., Ballard et al., 2016; Dance, 2004; Gustafsson et al., 2018; Park & Županski, 2003; Sun, 2005). These systems can provide improved short-term (0-36 h) nowcasts and forecasts (Milan et al., 2020), particularly for hazardous weather such as convective storms and fog (Clark et al., 2008).
Convection-permitting DA differs in four main aspects from global DA. First, there is a need for observation information on appropriate scales (e.g., roughly 1 km horizontal spacing, 250 m vertical spacing, and every 15 min in time in the boundary layer; WMO OSCAR, 2022). There are a variety of observations available that may provide some of the required information (e.g., geostationary satellite, radar, and ground-based remote-sensing observations). However, assimilating these observations can be challenging due to the need to develop complex observation operators (e.g., Hawkness-Smith & Simonin, 2021) and to properly represent the observation uncertainties (Simonin et al., 2019). Furthermore, there are few suitable observation impact measures to help guide future observing network design for these systems (Fowler et al., 2020). Second, the convection-permitting DA problem spans multiple scales (synoptic-, meso-, and convective-scales). It is an open question to what extent we should attempt to analyze all of these scales in regional prediction systems (Baxter et al., 2011; Caron et al., 2019; Gustafsson et al., 2018; Wang et al., 2021). Third, the nonlinearity of convective processes leads to an increased need for nonlinear and flow-dependent DA techniques (e.g., Bishop, 2016; Hodyss, 2011; van Leeuwen, 2009). The tools developed for global DA (such as background error covariance modelling using linear balance constraints) are no longer appropriate. Fourth, systematic errors in the model representation of hydrometeors (and their radiative properties) present significant challenges (e.g., Grabowski et al., 2019).
In November 2021, the Royal Meteorological Society (RMetS) DA Special Interest Group (SIG) hosted a virtual meeting with the goal of discussing recent developments and the continuing challenges of improving convection-permitting DA. This meeting was held on 10 and 12 November 2021. Over 70 people registered for the meeting from weather services, research institutes, and universities in seventeen countries. The meeting consisted of six presentations and a panel discussion on topics proposed by the participants and organizers. One of the panel topics was: Which scientific questions should we focus on for convection-permitting DA methods in the next 5 years (and why)?
In the rest of this report, we summarize the recent progress presented during the meeting (section 2), ongoing challenges (section 3), and recommendations for future research (section 4).

| PROGRESS
This section presents operational systems used by some meteorological centers (section 2.1), studies on the assimilation of novel remote-sensing and aircraft observations (section 2.2), and research on the modelling of background and observation error covariances (section 2.3). Instead of providing a comprehensive review, we have synthesized the material presented at the meeting. The reader is referred to Gustafsson et al. (2018) for a broader perspective on operational convection-permitting DA.

| The Met Office UKV hourly 4D-Var system
Convection-permitting DA has been operational in the UK since 2005. In July 2017, hourly-cycling four-dimensional variational data assimilation (4D-Var) was implemented operationally in the Met Office's convection-permitting (approximately 1.5 km) forecast model known as the UKV (Milan et al., 2020). The previous operational system used Latent Heat Nudging (LHN) for radar-derived surface rain rate and 3D-Var-FGAT (First Guess at Appropriate Time) plus Incremental Analysis Updating (IAU) for all other observations. The motivation for using hourly 4D-Var was to improve post-processing products in the 0-6 h forecast period and hourly forecasts up to 12 h. The hourly 4D-Var has been found to bring positive impacts to forecasts of storms and precipitation, and it is an affordable single operational system that covers both nowcasting and "day one" timescales (Milan et al., 2020).
Due to the small domain size, it is questionable how well analyses can fit large-scale information coming from the observations (e.g., Baxter et al., 2011). To address this issue, the Met Office will incorporate large-scale blending into the convection-permitting DA system for operational forecasting in early 2022. The global analysis will first be downscaled and then blended with the background from a Limited Area Model (LAM). The large-scale blending algorithm calculates a blended background increment, $\delta x_h$, from $x_h$, the model state downscaled from the global analysis, and $x_b$, the LAM background, using two linear operators, $S$ and $G$, which denote a low-pass filter and a reconfiguration function respectively. The low-pass filter removes small scales and the reconfiguration function interpolates $x_h$ to the same grid as the LAM background, accounting for the surface terrain. The observation innovations are then calculated with respect to the merged background. Large-scale blending has been shown to improve the fit between the merged background and observations.
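The blending step described above can be illustrated with a minimal one-dimensional sketch. The exact equations are not reproduced here, so the sketch assumes the increment takes the form $\delta x_h = S(G x_h - x_b)$ and that innovations are computed as $d = y - H(x_b + \delta x_h)$; both forms are assumptions consistent with the definitions of $S$ and $G$, not the operational formulation. The periodic running mean standing in for $S$, the linear interpolation standing in for $G$, the point-selection operator $H$, and all function names and values are hypothetical.

```python
import math


def reconfigure(coarse, n_fine):
    """G (assumed): linearly interpolate a periodic coarse field onto n_fine points."""
    n_coarse = len(coarse)
    fine = []
    for i in range(n_fine):
        pos = i * n_coarse / n_fine      # position in coarse-grid index units
        j = math.floor(pos)
        w = pos - j                      # linear interpolation weight
        fine.append((1 - w) * coarse[j % n_coarse] + w * coarse[(j + 1) % n_coarse])
    return fine


def low_pass(field, half_width):
    """S (assumed): periodic running mean over 2 * half_width + 1 points."""
    n = len(field)
    width = 2 * half_width + 1
    return [
        sum(field[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
        for i in range(n)
    ]


def blended_increment(x_h, x_b, half_width):
    """Assumed form of the blended increment: S(G(x_h) - x_b)."""
    g_xh = reconfigure(x_h, len(x_b))
    return low_pass([h - b for h, b in zip(g_xh, x_b)], half_width)


def innovations(y, obs_index, x_b, delta_x_h):
    """Assumed form: d = y - H(x_b + delta_x_h), with H selecting grid points."""
    merged = [b + d for b, d in zip(x_b, delta_x_h)]
    return [obs - merged[i] for obs, i in zip(y, obs_index)]


# Hypothetical case: the LAM background carries the small scales but its
# large-scale value (0) differs from the global analysis value (1).
n_fine = 24
small_scale = [0.5 * math.cos(2 * math.pi * i / 3) for i in range(n_fine)]
x_b = small_scale[:]                     # LAM background on the fine grid
x_h = [1.0] * 8                          # global analysis on a coarse grid
delta = blended_increment(x_h, x_b, half_width=1)
merged = [b + d for b, d in zip(x_b, delta)]
d = innovations([1.5, 0.75], [0, 1], x_b, delta)
```

In this toy case the three-point running mean removes the period-three small-scale wave entirely, so the increment only corrects the large-scale offset and the merged background keeps the LAM small scales.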

| The NOAA experimental warn-on-forecast system
Currently, in the USA, warnings for severe storms, tornadoes, and intense rainfall and flash floods are usually based on radar- and spotter-based detections. Guidance from numerical models has not been geared toward these warnings. Therefore, NOAA is developing an ensemble analysis and forecast system that can provide probabilistic forecasts of individual thunderstorms and their hazards from the time they are generated until 6 h later (Stensrud et al., 2009; Stensrud et al., 2013). Table 1 shows the configuration of NOAA's experimental Warn-on-Forecast System (WoFS). WoFS produces forecast graphics every 5 min, measuring the probability and the severity of events. Experimental results indicate that WoFS can predict thunderstorm events with associated hazards reasonably well at 0-6 h lead time and from regional to local spatial scales (Clark et al., 2021; Yussouf et al., 2020; Yussouf & Knopfmeier, 2019).
The AROME Ensemble Data Assimilation (EDA; Brousseau et al., 2012) is an ensemble of independent 3D-Var data assimilations that are performed by randomly perturbing observations, forecast model, and lateral boundary conditions. The AROME 3DEnVar and 4DEnVar use flow-dependent background error covariance matrices that are computed using the ensemble members from AROME EDA. The AROME 3DEnVar has been shown to improve over 3D-Var, which uses a static background error covariance matrix, in forecasting many meteorological variables such as geopotential height, temperature, wind and humidity (Michel & Brousseau, 2021). The 3DEnVar will undergo intensive testing for its final operational implementation in 2023. A case study over France on 26 May 2018 showed that 4DEnVar produced a 24-hour rainfall accumulation closer to radar observations than 3D-Var and 3DEnVar. Thus, the 4DEnVar will be further tested in 2023 for possible operational use in 2024.
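The idea of an ensemble of perturbed-observation analyses supplying a background error covariance estimate can be sketched in a few lines. This is a toy illustration under stated assumptions, not the AROME implementation: the state has two variables, only the first is observed, each member uses the closed-form solution of a linear 3D-Var with one observation, and perturbations of the forecast model and lateral boundary conditions are omitted. The function names, ensemble size and all numerical values are hypothetical.

```python
import random


def sample_covariance(members):
    """Unbiased sample covariance of a list of equal-length state vectors."""
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[j] for m in members) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for m in members:
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += (m[a] - mean[a]) * (m[b] - mean[b]) / (n - 1)
    return cov


def eda_analyses(backgrounds, y, obs_var, b_static, rng):
    """One perturbed-observation analysis per member.

    Only state variable 0 is observed, so the gain is the first column of
    the static covariance divided by (b_static[0][0] + obs_var).
    """
    dim = len(b_static)
    gain = [b_static[j][0] / (b_static[0][0] + obs_var) for j in range(dim)]
    analyses = []
    for x_b in backgrounds:
        y_pert = y + rng.gauss(0.0, obs_var ** 0.5)   # randomly perturbed observation
        innovation = y_pert - x_b[0]
        analyses.append([x_b[j] + gain[j] * innovation for j in range(dim)])
    return analyses


rng = random.Random(0)
b_static = [[1.0, 0.8], [0.8, 1.0]]
backgrounds = []
for _ in range(200):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    backgrounds.append([z1, 0.8 * z1 + 0.6 * z2])     # true covariance equals b_static
analyses = eda_analyses(backgrounds, y=0.5, obs_var=1.0, b_static=b_static, rng=rng)
b_ens = sample_covariance(backgrounds)                # ensemble estimate before analysis
a_ens = sample_covariance(analyses)                   # ensemble estimate after analysis
```

In a cycled system the analysis members would be propagated by the forecast model, and the covariance of the resulting forecasts would provide the flow-dependent estimate used by an EnVar scheme.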
Another area of research is model error representation, which is currently based on Stochastically Perturbed Parametrization Tendencies (Palmer et al., 2009). Météo-France is undertaking work on using model parameter perturbations to represent model uncertainties for AROME-EPS (and later for EDA). This allows model uncertainties to also be represented in areas where physical tendencies are small.

| The DWD COSMO-KENDA and ICON-D2 systems
In February 2021, the Deutscher Wetterdienst (DWD) convection-permitting ensemble prediction system COSMO-D2(-EPS) was replaced with ICON-D2(-EPS), an ICOsahedral Nonhydrostatic (ICON) model with a horizontal resolution of about 2.2 km (Reinert et al., 2020; Zängl et al., 2015). The operational DA system, known as KENDA, provides hourly analyses using a Local Ensemble Transform Kalman Filter scheme (Schraff et al., 2016). In operational predictions, it assimilates radiosonde ascent and descent profiles, AMDAR and Mode-S aircraft data, wind profiler data, observations from surface stations, and Doppler radar winds and reflectivity from the German radar network. In addition, a latent heat nudging scheme (Stephan et al., 2008) assimilates radar-derived precipitation rates from the European radars within the model domain between analysis steps, during the first 30 min of the forecast. A separate system updates sea surface temperatures once per day and snow depth every 6 h.

| Novel observations
Current observing networks do not meet user requirements for convection-permitting DA (WMO OSCAR, 2022). This section describes some efforts to reduce data gaps by assimilating novel observations.

| Cloud-affected satellite radiances
Many centers are moving toward an "all-sky" approach for satellite DA in operational forecasting, in which the satellite radiances that are affected by cloud are directly assimilated. This could improve forecasts of weather phenomena that are poorly observed by conventional instruments, such as low stratus clouds and convective precipitation (e.g., Geer et al., 2018). Idealized experiments using the COSMO-KENDA system and simulated observations showed that assimilating cloud-affected satellite observations can bring improvements that are of similar magnitude to the benefits of radar assimilation (Bachmann et al., 2019; Bachmann et al., 2020; Schröttle et al., 2020). These benefits usually lasted longer than the lifetime of a convective system. These experiments also showed that assimilating both infrared and visible radiances was more effective than assimilating only infrared radiances.
In addition to these idealized experiments, real-observation experiments using the ICON-D2 system have been carried out (Geiss, 2021). The observations assimilated consisted of all operational observations, plus the visible channel of the Spinning Enhanced Visible and Infrared Imager (SEVIRI). These experiments showed that assimilating SEVIRI visible-channel satellite observations improved the forecasts of satellite-specific quantities such as solar reflectance as well as meteorological quantities such as precipitation (up to 12 h). Furthermore, the assimilation improved the prediction of global horizontal irradiance at the Earth's surface, which is expected to benefit solar energy forecasting.

| Ground-based remote-sensing observations
Many operational centers have been improving their treatment of radar observations (e.g., Simonin et al., 2019; Zeng et al., 2021). At the Met Office, LHN of surface rain rate has been applied for 25 years (Jones & Macpherson, 1997). Following development of improved observation operators and better treatment of observation errors, direct 4D-Var assimilation of radar reflectivity became part of the Met Office operational system in May 2022. Trial results showed that directly assimilating radar reflectivity improves the analysis and forecast of organized bands of convection (Hawkness-Smith & Simonin, 2021).
In an experimental study in the USA, Chipilski et al. (2022) explored the impacts of assimilating ground-based remote-sensing observations on the forecasts of bore-generating nocturnal convection using the GSI-EnKF-WRF system (Johnson et al., 2015). The observations assimilated were from Radar Wind Profilers, Doppler Wind Lidar, Atmospheric Emitted Radiance Interferometers, and radiosondes. They found that assimilating all of the observations considered brought a larger benefit to precipitation forecasts than assimilating observations from any single instrument. Assimilating observations from single instruments was shown to have neutral impacts due to (1) forecast sensitivity to the initial moisture and wind fields, (2) deficiencies in the EnKF algorithm for nonlinear processes and (3) insufficient temporal frequency of radiosonde data. Overall, the promising findings from these experiments are in agreement with earlier work (e.g., Chipilski et al., 2020; Degelia et al., 2020) and pave the way for the integration of these instruments in operational convective-scale NWP systems.

| Mode-S EHS aircraft data
Mode-S EHS (enhanced surveillance) aircraft data allow the derivation of wind and temperature observations from air traffic management reports (e.g., de Haan, 2011). At the Met Office, Mode-S EHS wind observations have been assimilated operationally in the UKV convection-permitting system since 2018. Li (2021) showed that assimilating Mode-S winds benefits forecast skill for wind profiles in the first 6 h of the forecast, and for hourly precipitation accumulations up to 9 h into the forecast. The assimilation of Mode-S EHS temperature data is more challenging as the temperature observations have been shown to be of lower quality, particularly in the boundary layer (Mirza et al., 2016; Mirza et al., 2019; Mirza et al., 2021). However, these data can be used after some processing (de Haan, 2013; de Haan & Stoffelen, 2012). The Met Office brought these temperature observations into operational use in May 2022.

| Balance relationships in background error covariance modelling
In global DA, extensive use is made of balance relationships in modelling multivariate relationships in background error statistics. Geostrophic and hydrostatic balances are, however, weaker and less relevant for convective events (e.g., Vetra-Carvalho et al., 2012). A simplified model of convective-scale flow developed from the Euler equations (the "ABC model"; Petrie et al., 2017) and its DA system have been used to investigate the role of these geophysical balances in DA. These experiments showed that switching on the geophysical balances minimizes errors in the large-scale components of the analyzed flow fields. This allows wind and pressure observations of the large-scale flow to complement each other. On the other hand, switching off these balances is beneficial for the small scales (smaller than a few tens of km). This implies that the assimilation problem should be split into two parts: one analyzing the larger scales, where geophysical balances provide useful information, and another analyzing the smaller scales, where geophysical balances are not relevant and can in fact be harmful.

| Modelling spatial correlations in observation errors
Many observation types have spatially correlated observation representation errors (e.g., Cordoba et al., 2017; Janjić et al., 2018; Michel, 2018; Waller et al., 2019; Zeng et al., 2021). It has been shown that accounting for spatial observation error correlations allows more observation information to be extracted in idealized systems (Fowler et al., 2018; Rainwater et al., 2015; Stewart et al., 2008; Stewart et al., 2013) and leads to improved forecast skill in operational systems. A spatially correlated observation error covariance matrix model has been proposed by Guillet et al. (2019), based on a finite-element discretization of a diffusion operator. The performance of the method depends on the distribution of the observations, as this determines the mesh for the finite element technique. New observation thinning strategies and their impact on the observation distribution are currently being investigated, with possible application to radar reflectivity.
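The benefit of accounting for observation error correlations can be illustrated with a two-variable toy problem. The sketch below is an idealized example under stated assumptions, not a result from the cited studies: both state variables are observed directly (the observation operator is the identity), the background errors are uncorrelated with unit variance, and the true observation errors have unit variance and a correlation of 0.8. It compares the trace of the analysis error covariance, $A = (I - K) B (I - K)^T + K R_{true} K^T$, when the gain $K$ is built from the true $R$ and when it is built from a diagonal approximation. All names and values are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def analysis_error_trace(b, r_true, r_assumed):
    """Trace of the analysis error covariance for a gain built from r_assumed.

    H is the identity, so K = B (B + R_assumed)^-1, and the actual error is
    A = (I - K) B (I - K)^T + K R_true K^T.
    """
    k = matmul(b, inv2(add(b, r_assumed)))
    i_minus_k = sub(IDENTITY, k)
    a = add(
        matmul(matmul(i_minus_k, b), transpose(i_minus_k)),
        matmul(matmul(k, r_true), transpose(k)),
    )
    return a[0][0] + a[1][1]


b = [[1.0, 0.0], [0.0, 1.0]]              # uncorrelated background errors (assumed)
r_true = [[1.0, 0.8], [0.8, 1.0]]         # correlated observation errors (assumed)
r_diag = [[1.0, 0.0], [0.0, 1.0]]         # correlations ignored in the assimilation

trace_with_correlations = analysis_error_trace(b, r_true, r_true)
trace_ignoring_correlations = analysis_error_trace(b, r_true, r_diag)
```

In this configuration the total analysis error variance is about 0.81 when the correlations are used and 1.0 when they are ignored. The size of the benefit depends on the assumed background and observation error structures.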

| CHALLENGES
Forecasting of low stratus clouds, fog, convective precipitation, and storms is a major challenge for convection-permitting NWP (e.g., Dance et al., 2019; Hu & Franzke, 2020). The prediction of these fast processes requires rapid DA cycling and careful treatment of many aspects of the system. In this section, we present some of the challenges discussed at the meeting.

| The handling of multiple spatial scales
Convection-permitting DA may require knowledge from both synoptic and meso scales. However, it is very difficult to correct all scales with a LAM (e.g., Baxter et al., 2011; Johnson et al., 2015). Some discussion at the meeting addressed whether we should focus our efforts on improving just the small scales (incorporating larger scales by blending with a large-scale analysis) or whether truly multiscale assimilation techniques should be pursued. Can we produce ensembles that represent small-scale background error statistics well? On the other hand, accurate large-scale information can be important even for forecasts of very short periods (Durran & Gingrich, 2014). Furthermore, tropical regions may require different approaches from midlatitude regions.

| Model errors
In ICON-D2 simulations, model deficiencies in representing cloud statistics are observed in the following aspects: (1) too few mid-level and semi-transparent clouds; (2) too many thick ice clouds; and (3) too many clouds with low brightness temperatures. These issues have also been found in many other weather prediction models (Geiss et al., 2021). Thus, improving the representation of clouds in weather prediction models is of utmost importance. It is also important for DA algorithms to be able to take account of known model deficiencies, by accounting for model errors, through weak constraint variational DA (Trémolet, 2007), ensembles (Raynaud et al., 2012), model bias correction (Bell et al., 2004) or other approaches (e.g., Brajard et al., 2021). Development of NWP models and DA systems is a continuously ongoing process. Closer interactions between modelers and DA scientists may lead to better systems.

| Background uncertainty
Small-scale atmospheric processes, such as convection and cloud microphysical processes, are usually strongly nonlinear, so that models describing these processes can produce non-Gaussian forecast errors (e.g., following gamma or inverse-gamma distributions; Posselt & Bishop, 2018). In addition, the nonlinearity of the model enhances the need for flow-dependent background error covariances. Therefore, forecast ensembles are likely to benefit the estimation of background error statistics. The ensembles replace proxies such as forecast differences (Berre et al., 2006; Parrish & Derber, 1992). However, unlike in synoptic-scale DA, the ensemble mean should not be used as the best estimate in this non-Gaussian case (Lorenc & Payne, 2007). For instance, positive variables like rainfall amount may deviate considerably from their mean. Moreover, for highly complex distributions, one would ideally need to obtain a representative sample and the notion of a single best estimate may not be useful.
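The point about the ensemble mean can be made concrete with a small, purely illustrative rainfall ensemble. The member values below are hypothetical and chosen only to mimic an intermittent, positive variable: most members are dry and a few produce heavy rain. The sketch compares the ensemble mean with the median, the probability of exceeding a threshold, and the distance between the mean and the nearest member.

```python
import statistics

# Hypothetical 10-member ensemble of hourly rainfall accumulation (mm).
rain_mm = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 12.0, 15.0]


def summarize(members, threshold):
    """Return mean, median, exceedance probability and mean-to-nearest-member gap."""
    mean = statistics.mean(members)
    median = statistics.median(members)
    prob = sum(m > threshold for m in members) / len(members)
    nearest_gap = min(abs(m - mean) for m in members)
    return mean, median, prob, nearest_gap


mean, median, prob_rain, nearest_gap = summarize(rain_mm, threshold=1.0)
```

Here the mean is 3.6 mm, the median is 0 mm, and the probability of more than 1 mm is 0.3. No member lies within 3.6 mm of the mean, so the mean describes a light-rain state that none of the members predicts, which is why a representative sample or a probability is more informative than a single best estimate in such cases.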

| Observation uncertainty
The assimilation of geostationary satellite and radar observations has brought great benefits to convection-permitting NWP (Gustafsson et al., 2018). However, the assimilation of these observations can be a challenge due to the non-Gaussian characteristics of observation errors and strong spatial observation error correlations.
The non-Gaussianity of observation errors needs to be carefully introduced into convection-permitting DA systems, because it will result in differently shaped error distributions (e.g., Bocquet et al., 2010).
Many recent works have addressed the issue of including the spatial observation error correlations in convective-scale DA systems. In addition to the work by Guillet et al. (2019) on modelling wind error correlations (see section 2.3.2), methods such as eigenvalue decomposition (e.g., Fowler, 2019; Michel, 2018; Stewart et al., 2013), spatial difference observations (Bédard & Buehner, 2020), and spectral transformation (e.g., Chabot et al., 2020; Ying, 2020) have also been studied. Moreover, pragmatic parallelization strategies and numerical approximation methods have been explored in order to reduce computational costs (particularly parallel communication costs). While the approach of Simonin et al. (2019) is already used for operational assimilation of Doppler radar winds at the Met Office, the challenge going forward is to extend these methods to other operational centers and observation types.

| Satellite observation operators
While geostationary satellites provide spatially dense and frequent-in-time observations, many of these data are not used in DA. Many observations are discarded due to cloud-affected radiances, a lack of understanding of land-surface emissivity, a lack of knowledge on how to treat observations in visible bands and systematic model errors in representing the observed quantities. The problem of assimilating cloud-affected radiances has already been addressed in section 2.2.1. Land-surface emissivity atlases for use with fast radiative transfer schemes have recently been improved (Borbas & Feltz, 2019), but further research is needed to allow for a greater proportion of observations over land to be used in operations. An efficient and accurate forward operator for visible geostationary satellite observations has been developed over the last ten years (VISOP; Kostka et al., 2014; Scheck et al., 2016; Scheck et al., 2018; Geiss et al., 2021). It is based on a method for fast 1D radiative transfer (Scheck et al., 2016) and is now implemented in RTTOV (radiative transfer for TOVS), which makes it available for operational use. Several weather services are planning to use it for monitoring in the near future. However, there is still ongoing development to account for 3D effects in this 1D operator (Scheck et al., 2018).

| OUTLOOK AND RECOMMENDATIONS
A number of future steps for convection-permitting DA research were discussed at the meeting. This section provides some outlook and recommendations for the future, focusing on the use of novel observations and better generation of ensembles.

4.1 | Improving the use of currently available observations
Despite the exciting progress described in section 2.2, work is still needed to improve the use of currently available observations via improvements in satellite observation operators (see section 3.5), and increasing understanding of polarimetric radar observations such as nonprecipitation echoes (e.g., Augros et al., 2018; Rennie et al., 2011), radar refractivity and differential phase (Augros et al., 2018). Waller et al. (2021) showed that representation error biases and correlations may be critical for convection-permitting NWP. However, computationally feasible methods for treating large datasets with long spatial error correlation lengths still need to be developed for operational purposes (see sections 2.3.2 and 3.4).

| Assimilating new and emerging observation types
There are many gaps in the observing network that affect our ability to forecast on convection-permitting scales (WMO OSCAR, 2022). There are few suitable observation impact measures to help guide strategic future observing network design for these systems (Fowler et al., 2020) and more work needs to be done to provide these tools and the evidence for new international observing networks. However, it is known that storm prediction requires accurate model representation of rapid changes in the near-storm environment. Ground-based remote-sensing instruments (see section 2.2.2) and unmanned aircraft systems could provide well-resolved information about these environments. The use of these observations could improve the prediction of convection initiation as well as the evolution of storms. New observation operators may need to be developed for effective assimilation of ground-based remote-sensing observations (for instance, the use of raw observations instead of retrievals).
Crowdsourcing may provide new, inexpensive sources of observations (Hintz, Vedel, & Kaas, 2019). For example, private citizens' automatic weather stations (Chapman et al., 2017), surface pressure observations from mobile phones (e.g., Hintz, O'Boyle, et al., 2019) and temperature observations from cars or other vehicles (Bell et al., 2022; Siems-Anderson et al., 2020) are potentially useful sources of observations for convection-permitting DA. However, complex issues regarding data ownership and privacy, quality control (particularly for moving observing platforms such as mobile phones and cars), and the handling of large data volumes must be resolved before these data see widespread use in NWP.

| Ensemble design
The use of ensembles is important for the provision of non-Gaussian, flow-dependent estimates of background uncertainty (see section 3.3), and for the provision of seamless probability forecasts (such as WoFS in section 2.1.2). Hence, there is a need for ensembles that can better describe the error statistics of small-scale atmospheric processes. Stochastic approaches and multiphysics could be considered as part of the future ensemble generation system.

| SUMMARY
This article reports on the RMetS DA SIG meeting on convection-permitting DA held in November 2021. Progress in operational DA systems at several centers, the assimilation of novel observations, and the estimation and treatment of background and observation error covariances were addressed in this report. A number of future steps for convection-permitting DA research were discussed at the meeting with a particular focus on improving the observing network in the boundary layer, better observation operators for existing observations, better treatment of observation uncertainty and better ensemble design. It is essential that these challenges are addressed to protect lives and livelihoods from hazardous weather events.