Assimilation of Meteosat Third Generation (MTG) Lightning Imager (LI) pseudo-observations in AROME-France – Proof of Concept

. This study develops a Lightning Data Assimilation (LDA) scheme for the regional, convection-permitting NWP model AROME-France. The LDA scheme intends to assimilate total lightning, i.e., cloud-to-ground (CG) and inter-and intra-cloud (IC), of the future Meteosat Third Generation (MTG) Lightning Imager (LI). MTG-LI proxy data are created and Flash Extent Density (FED) fields are derived. An FED forward observation operator (FFO) is trained based on modeled, column integrated graupel mass from 24 storm days in 2018. The FFO is successfully verified for 2 independent storm days. With the 5 FFO, the LDA adapts a 1-dimensional Bayesian (1DBay) retrieval followed by a 3-dimensional variational (3DVar) assimilation approach that is currently run operationally in AROME-France for radar reflectivity data. The 1DBay retrieval derives relative humidity profiles from the background by comparing the FED observations to the FED inferred from the background. Retrieved relative humidity profiles are assimilated as sounding data. The evaluation of the LDA comprises different LDA experiments and four case studies. It is found that all LDA experiments can increase the background integrated water vapor (IWV) in 10 regions where the observed FED exceeds the FED inferred from AROME-France outputs. In addition, IWV can be reduced where spurious FED is modeled. A qualitative analysis of 6-hour accumulated rainfall fields reveals that the LDA is capable of locating and initiating some local


Introduction
Convective weather phenomena such as thunderstorms threaten the society by producing severe weather and related impacts, e.g., flash floods, large hail, tornadoes, and strong winds.Cloud electrification and subsequent lightning discharges are caused by interactions of different ice particles inside convective clouds.The process makes lightning an effective tracer of deep to peaks of the flash rate density, and suggested a blended solution using both proxies.Yair et al. (2010) introduced a lightning potential index (LPI) that is calculated using the simulated grid-scale vertical velocity and simulated hydrometeor mass mixing ratios of liquid water, cloud ice, snow, and graupel.Barthe et al. (2010) concluded that none of precipitation ice mass, ice water path, ice mass flux product, updraft volume, maximum vertical velocity, and cloud top height could predict the lightning flash rates and trends well for both of their two cases.However, more recent studies of Formenton et al. (2013); Federico et al. (2014); Bovalo et al. (2019) confirm a key role of graupel in the cloud electrification and use ice and /or graupel content for lightning simulation and proxy definition.
With the lightning proxies, lightning observation operators are created to compare the model output to the lightning observations and assimilate the lightning data.Most of the recent LDA schemes use lightning operators based on ice hydrometeors or updraft characteristics, e.g., graupel mass or updraft volume.In the following, a brief overview of LDA techniques is presented, with the focus on 3-dimensional variational (3DVar) LDA and studies to assimilate GLM data.Fierro et al. (2012) put forward a widely used (e.g., Dixon et al., 2016;Wang et al., 2017a;Federico et al., 2017;Wang et al., 2017b) concept of lightning proxy calculating water vapor mixing ratio from simulated graupel mass and observed flash rates.They then increased the water vapor mixing ratio in the 0°C to -20°C layer where lightning was observed and the relative humidity (RH) of the background was less than 81%.Precipitating convection was better correlated with observed reflectivity fields for the LDA than for the control experiment.The method was further tested by Fierro et al. (2014) using 10-minute Flash Extent Density (FED).Lynn et al. (2015); Marchand and Fuelberg (2014) refined the nudging technique of Fierro et al. (2012) to increase the virtual temperature perturbation and favor static instability or by warming the source layer of the convective updraft, respectively.Lynn et al. (2015) also introduced an extension to suppress spurious convection.Other nudging-based LDA make use of ice-phase particle mixing ratios (e.g., Qie et al., 2014;Wang et al., 2018) and pseudo radar reflectivity (Wang et al., 2014).Mansell (2014); Allen et al. (2016) assimilated synthetic GLM total lightning represented as FED using an Ensemble Kalman Filter (EnKF).Mansell (2014) used both the simulated flash rate and a linear relationship between total lightning and graupel volume as observation operators.Allen et al. (2016) recommend a linear best fit operator based on graupel mass and graupel volume.In preparation of a hybrid variational-ensemble LDA technique for GLM, Apodaca et al. (2014) assimilated World Wide Lightning Location Network (WWLLN) data in WRF.However, the use of WWLLN as a GLM proxy is debatable as WWLLN mainly detects CG flashes, whereas GLM detects total lightning.A 3DVar LDA was applied by Wang et al. (2017b) using the relationship given by Fierro et al. (2012) and assimilated pseudo RH profiles in the form of sounding data.They noted increased forecasts skills in general, but also that the method still needs improvement to suppress spurious convection.Fierro et al. (2016) replaced their previous nudging by a 3DVar LDA with 10minute FED.They found that the Radar Data Assimilation (RDA) yielded better forecasts of convective cells during the first 30min of the forecast, while the LDA gave better storm structures one hour into the forecast.The combination of both RDA and LDA provided the highest forecast skill.Fierro et al. (2019) tested the 3DVar LDA technique developed for ground-based data with GLM total lightning observations.It adjusts water vapor mass mixing ratio (q v ) in regions of lightning by setting RH to 95% in a layer of 3km above the LCL if the background RH is less than 95%.Both LDA and RDA improved the short-term accumulated precipitation and radar reflectivity composite.Hu et al. (2020) adopted the technique of Fierro and found, as also previously reported, a wet bias in the model that increased with the forecast time.The method still misses a suppression of spurious convection in regions without lightning.They also conducted a layer depth sensitivity study with similar results for adding q v in layers of 2km to 10km depth.Kong et al. (2020) present an LDA of real GLM data in an EnKF framework.
FED at 10km pixel resolution is assimilated using both graupel mass and graupel volume-based observation operators, with positive results on the convection forecasts.LDA of Chinese satellite Fengyun-4 (FY4) lightning data was realized by Liu et al. (2020) through creating pseudo-RH profiles and by Chen et al. (2020) through retrieving maximum proxy-reflectivity and finally pseudo-reflectivity profiles from the lightning data.
The main goal of this work is to develop a 3DVar LDA for MTG-LI data to improve analyses and forecasts, especially in convective situations.Pseudo-observations of the MTG-LI are generated to form the lightning data base (Section 3).A novel LDA scheme is developed for the regional, operational, convective-scale model AROME-France of Météo-France.Various approaches in which FED is assimilated together with or without radar reflectivity and Doppler wind are compared to assess the added value of the developed LDA.Recent Var LDAs using GLM data, e.g., Fierro et al. (2019); Hu et al. (2020) cannot suppress spurious convection.Our LDA of GEO lightning data should promote convection if needed and also suppress spurious convection.
The NWP model configuration and lightning data as used for this work are briefly explained in Sections 2 and 3, respectively.Section 4 introduces the lightning observation operator developed for this study.Then, our LDA method and the model experiments are explained in Sections 5 and 6.Section 7 describes the AROME-France analysis resulting from LDA.One of four case studies is detailed using different assimilation experiments, including the new LDA.Finally, Section 9 concludes this work.
It provides 36 to 42-hour forecasts five times a day (00 UTC, 03 UTC, 06 UTC, 12 UTC, 18 UTC).After an update in 2015, the model grid comprises 1440 × 1536 grid points in the horizontal with uniform 1.3km horizontal resolution.The physical model domain and model topography are shown in Figure 1.In the vertical, the lowest model level is situated at around 5m above ground.Each column extends up to pressure level 10hPa.The vertical resolution is refined homogeneously from top to bottom by a factor of 1.5.In total, 90 vertical levels (33 levels below 2000m) are computed.Model time steps equal 50s.
Model dynamics are non-hydrostatic, semi-implicit, and semi-Lagrangian.Lateral boundary conditions (LBCs) are extracted from the global model ARPEGE (Bouyssel et al., 2022).AROME-France features a Davies relaxation (Davies, 1976) coupling and ARPEGE synchronization.The initial conditions rely on a 3DVar data assimilation technique (Section 5).
Deep convection is expected to be mostly resolved on the model grid (Fischer et al., 2018).Parametrization of sub-grid scale shallow convection is based on Pergaud et al. (2009).AROME-France uses a mixed-phase microphysical scheme with riming processes and graupel (Seity et al., 2011).In particular, the microphysics scheme of AROME-France separates five prognostic hydrometeor variables, that are specific contents of precipitating species rain (q r ), snow (q s ), and graupel (q g ) and the two non-precipitating species ice crystals (q i ) and cloud droplets (q c ).In addition, the water vapor specific content q v (also termed specific humidity) is computed.Hail is assumed to behave as large graupel particles.Overall, more than 25 processes are parametrized into the microphysics scheme (Lascaux et al., 2006).AROME-France physics include a 1-dimensional (1D) turbulence parametrization as combination of a prognostic turbulent kinetic energy (TKE) equation with a diagnostic mixing length.An Externalized Surface (SURFEX) scheme and the European Center for Medium-Range Weather Forecasts (ECMWF) radiation parametrization are other components of AROME-France model physics.Details can be found in Seity et al. (2011); Brousseau et al. (2016).

Lightning data
This work adapts the GEO lightning pseudo-observation generator as developed by Erdmann et al. (2022).It was trained using low frequency (LF) ground-based National Lightning Detection Network (NLDN) records collected over the South-East US.
The LF ground-based lightning observations in this study are provided by the French network Meteorage (Schulz et al., 2016;Pédeboy, 2015) as input.Meteorage locates total lightning with a discrimination between CG strokes and IC pulses.They are clustered to the flash level data using the same method as Erdmann et al. (2020), i.e., a spatiotemporal clustering with criteria of 20km and 0.4s.To verify Meteorage data as suitable input to the generator, Meteorage and NLDN observations were compared to space-borne ISS-LIS data as common reference (Erdmann, 2020): The flash detection effiency (DE) of NLDN relative to ISS-LIS was about 76%.ISS-LIS detected about 59% of the NLDN flashes.In France, Meteorage flash DE relative to ISS-LIS was about 83% while ISS-LIS detected about 57% of the Meteorage flashes.It should be mentioned that the low earth orbit, optical ISS-LIS observes lightning in a different way than the ground-based LF networks, with different detection efficiencies for CG and IC flashes, respectively (e.g., Erdmann et al., 2020;Zhang et al., 2019).The statistical evaluation of general flash characteristics revealed that NLDN and Meteorage flashes feature similar distributions of flash extent (average of about 7.0km), flash duration (average of about 0.2s), stroke/pulse number per flash (average of about 3.6).Meteorage, with shorter baseline distance than NLDN, recorded on average higher LF currents for the occurring flashes (average of 9.5kA and 8.3kA for Meteorage and NLDN, respectively).In all cases, flashes with large extent, long duration, or with a high number of ISS-LIS events or LF strokes/pulses were more likely detected by both LLSs than the small, short duration flashes.Overall, Meteorage was validated as a suitable input to the GEO lightning pseudo-observation generator.Pseudo-observations of MTG-LI are then generated on a regular latitude-longitude grid with average pixel resolution of 7km, which approximates the expected MTG-LI resolution over France (personal communication Bartolomeo Viticchiè, 2020).FEDs are calculated per 5-minute interval that can then be summed as needed for the assimilation.This work uses a 10-minute interval of FED data in the LDA that is centered at the time of the analysis (as done by Fierro et al., 2016;Hu et al., 2020).The short period around the analysis reduces displacement errors in the analysis.The domain is limited to 40°N to 51°N and 5.5°W to 10°E, which is inside the AROME- France physical model domain.The pseudo MTG-LI FED is referred to as FED observation hereafter to avoid confusion with the pseudo-observations (POs) created by the 1-dimensional Bayesian (1DBay) retrieval.
An example of simulated MTG-LI is provided in Figure 2  Previous studies suggest that graupel is a reliable proxy for lightning (see Section 1).Our observation operator is trained using the relationship found between MTG-LI FED observations and simulated graupel mass (m g ) from the 1-hour forecast of AROME-France, for 24 days in 2018 (2 days per month).FED time periods of 10min are used (e.g., as in Fierro et al., 2016;Hu et al., 2020) centered at the corresponding time of the AROME-France analysis.The m g profile is extracted from the AROME-France grid point closest to the FED pixel center.Thus, one specific m g profile is related to one FED value.This gives an equal count of FED and column integrated m g values that are used to train and validate our observation operator.Following Deierling et al. (2008), m g is taken from layers where the temperature was below -5°C.As a reminder, AROME-France graupel mass combines graupel and hail.All FED and column graupel mass values are further processed as climatological distributions (see Figure 3) regardless of observation location and time.This approach is different from Deierling et al. (2008); Barthe et al.Analyzing the training data, the linear relationship fits the majority of the data well.A discrepancy is identified for the largest graupel mass and FED values (Figure 3).Here, the observation operator tends to underestimate the FED for a given graupel mass.The high values (of both FED and graupel mass) are rare relative to the lower values as indicated by the colored pixels in the 24-day training dataset.Furthermore, the validation data comprises one single meteorological situation, while the training data includes several weather situations in different seasons.The observed discrepancy is considered during the evaluation of results.

1DBay+3DVar assimilation method
FED cannot be assimilated directly since lightning is not predicted by AROME-France.The previously described observation operator relates the FED observations to the prognostic variables of the NWP model and allows for comparing background and observations.The final step prior to the assimilation creates pseudo observations that can be ingested in the 3DVar system.
Unlike Fierro et al. (2019); Hu et al. (2020) and others who used an empirical method to adjust moisture in thunderstorms, the expected water vapor PO is retrieved for each model grid column (applying Bayes' Theorem) in a 1DBay approach.This allows to i) replace the humidity field in spurious convection areas with that of their convection-free environment, and thus in principle abolish the wet bias that results from data assimilation techniques that only consider the occurrence of lightning (e.g., Fierro et al., 2012Fierro et al., , 2019;;Hu et al., 2020), and ii) make use of the FED value to modulate the humidity field in observed lightning areas by leveraging the model's ability to create consistent, flow-dependent humidity and graupel profiles near the observation.Technically speaking, synthetic profiles are created and assimilated as sounding data.

1DBay retrieval
All FED values are transformed to units of dB as 10 • log 10 (FED/(7km × 7km 10min) −1 ) to account for the large range of scales, referred to as dBFED.FED equal to zero is transformed to dBFED of -10dB.In general, dBFED below 0dB means linear FED less than 1 1 (7km × 7km 10min) −1 , i.e., no lightning activity.The 1DBay retrieval of POs of relative humidity RH, y RH po , is defined as with the weights W i for each point as where i, j are counters for the grid points within the defined area, referred to as vicinity, FFO means the FED forward operator as specific observation operator, and σ o the standard deviation of the 1DBay retrieval.dBFFO(x i ) then defines the simulated AROME-France dBFED by converting the output of the FFO to dB units.
Equations ( 1) and ( 2) can be used with different model variables or diagnostics, e.g., the sensitivity test for σ o as described in the following used x FED i rather than x RH i .
Each RH PO is the best estimation from a weighted linear combination of RH profiles taken from the model background in the vicinity of the FED observation.
The method was proposed by Caumont et al. (2010), used in the operational model AROME-France (Wattrelot et al., 2014), and applied by Borderies et al. (2019) for radar reflectivity data, and Duruisseau et al. (2019) for microwave radiances.Here, weights are calculated based on differences between observed and simulated FED.The 1DBay retrieves the best estimate of RH at the center of each observed FED pixel from the background using Bayes' Theorem.It is expected that the model can predict a quantity similar to the observation within a certain area in most cases.The vicinity is initially fixed to a square of 160km × 160km centered at the observation.This size follows the suggestion of Borderies et al. (2018Borderies et al. ( , 2019)).However, the FED pixels (about 7km) are significantly larger than the radar reflectivity pixels (about 1km) and a given vicinity contains less FED than radar datapoints to retrieve the expected profile.Hence, the vicinity is adapted to the specifications of the FED data.
1DBay vicinities of 160km, 320km, and 500km were tested regarding the RMSE between observed FED and retrieved FED (not shown).It was found that larger vicinities give lower overall RMSE as there are more background profiles to feed the retrieval and to find grid points that are similar to the FED observation.In consequence, all LDA experiments presented in the following use a 1DBay vicinity of 500km.
The database comprises model forecasts at the observation time and in the vicinity of the observation location.Since FED has no vertical dimension, the covariance matrix is reduced to a single value of variance, σ 2 o , which is assumed to be constant.A small σ o means that the retrieval favors columns with values close to the FED observation.This can produce accurate retrievals; however, no retrieval can be created when the difference between all simulated FED in the vicinity and the observed FED is large relative to σ o (see Equation ( 2) where weights approach 0 in that case).Large values of σ o cause smoothing over all grid points in the vicinity of the observation.In that case, the likelihood to retrieve POs is high, at the cost of a less accurate retrieval potentially independent of the observation.
An assumption of the Bayesian retrieval is that differences between the observation and the model background, referred to as innovations, are Gaussian distributed.Figure 4 shows the PDF of innovations (dBFED observation minus AROME_dBFED) for the 24-hour assimilation cycles on 08 Aug. 2018.The distribution is bell-shaped and centered at 0. It is symmetric and the skewness is close to 0. Although the kurtosis is higher than for a classical Gaussian distribution, this distribution of innovations sufficiently fulfills the assumption of the 1DBay method to justify its use in this study.
The value for the standard deviation σ o of the observation and observation operator is inferred from a sensitivity study.It aims at minimizing the root mean square error (RMSE) between observed and retrieved pseudo FED for the 24 training days also used in Section 4. Pixel-to-pixel RMSE is used for all non-zero FED pixels and 10-minute intervals centered at each full hour.The pseudo FED is computed from joint use of the observation operator and the 1DBay retrieval.Figure 5(a) shows the curve of the RMSE between dBFED observation (from pseudo MTG-LI) and retrieved pseudo dBFED from the 1DBay retrieval for different σ o .The RMSE between AROME-France background dBFED (AROME_dBFED), that was obtained from the observation operator without additional 1DBay, is shown as reference.Including the 1DBay retrieval (blue curve in Figure 5(a)) produces pseudo dBFED much closer to the dBFED observation, i.e., lower RMSE, than the AROME_dBFED (orange curve in Figure 5(a)).The minimum RMSE for the retrieval is found at σ o of 2.0dB.This value is used for retrieving the pseudo-RH profiles in the following.
Case studies of fields of observed, background, and retrieved dBFED are conducted to visualize the effect of σ o on the pseudo dBFED.The example of 07 Oct. 2018, 00:00 UTC is presented in Figure 5(b-d).The dBFED observation in Figure 5  of lightning observations, however, also widespread near the center of the domain and over western France.The map of the 1DBay retrieved pseudo dBFED in Figure 5(d) demonstrates that the method effectively reduces the spurious dBFED in these regions (retrieved dBFED below 0, grey and marine blue).In some regions, the 1DBay retrieves pseudo dBFED to completely remove the spurious lightning, e.g., over western France and for some grid points over Switzerland.Spurious high values of AROME_dBFED near the center of the domain (Figure 5c) are effectively decreased to negative dBFED (marine blue in Figure 5d) in other words no lightning activity, similar to the observation in that region (Figure 5b).Furthermore, the pseudo dBFED values and areas of positive pseudo dBFED in the south center of the domain closely match the dBFED observation.
The observed, weak dBFED over the Gulf of Genoa could not be reproduced by the 1DBay retrieval as the background does not provide any positive dBFED in the retrieval vicinity.
An additional method, the humidity adjustment (HA), is applied if lightning (i.e., positive FED) is observed but all background FED values in the vicinity are zero.In this case, there is no estimated profile at this point and all W i in the vicinity equal zero.The HA is also applied if all background dBFED values within the vicinity of the FED observation are at least 10dB smaller than the observed dBFED value.Although the sum of W i can become greater than zero in this case, the retrieval would generate a profile that is too dry with respect to the FED observation.To produce RH POs, the layer between lifted condensation level (LCL) and 13km is saturated (i.e., RH set to 100%) at all levels where the modeled RH is less than 100%.This is conceptually similar to the method of Fierro et al. (2019), among others.However, the HA is only applied for few pseudo-RH profiles where the 1DBay method did not retrieve POs.For instance, the HA only contributes to 1.6% of almost 20,000 assimilated profiles during 08 Aug. 2018, the study case that is detailed below.
Another case with Equations ( 1) and (2) equal to 0 for all weights may occur if the observed FED equals 0 and the background FED is positive for all grid points in its vicinity.This behavior was observed for the initial 160-km vicinity, but not within the 320-km and 500-km vicinity.Thus, if the vicinity has a sufficient size, there are background grid points without lightning activity that can be used in the 1DBay retrieval.If, however, the specified vicinity were too small, one would need to artificially remove spurious FED, i.e., convection and humidity, from the model.
Eventually, no pseudo-RH profiles are created if both the observed FED and the closest (in space) AROME-France background FED equal zero.
It should be mentioned that the 1DBay retrieval method was initially developed and applied to retrieve humidity and cloud profiles from passive and active remote sensing data (e.g., Olson et al., 1996;Kummerow et al., 2001).Caumont et al. (2010) brought this approach forward by restricting the 1DBay method to use model profiles at the forecast time and in the neighborhood of the observation.This approach was successfully applied by Wattrelot et al. (2014); Borderies et al. (2019) for radar reflectivity assimilation in AROME-France.However, a 1DBay has not yet been used to retrieve humidity profiles from FED data.Whereas an FED greater than zero is always related to the presence of graupel and thus a RH profile with a cloud, an FED equal to zero does not necessarily mean a location without cloud coverage.It is the same problem which is faced for radar reflectivity, but presumably more marked than for the centimeter wavelength radars (Caumont et al., 2010) and even more than for the millimeter wavelength radars which are even more sensitive to the small hydrometeors (Borderies et al., 2019).In addition, FED is a 2D variable without vertical extent while the 1DBay retrieves vertical RH profiles.The use of integrated column graupel mass as proxy addresses the latter aspect by converting the 3D AROME-France outputs into a 2D variable comparable to FED.RH-profiles are assimilated since assimilating hydrometeor contents when VAR is able to update these variables often results in poor performances because the cross correlations with key variables such as temperature and humidity are poorly represented in climatological B matrices.To mitigate this effect, some recent studies assimilate humidity along with hydrometeor contents (see, e.g., Wang et al., 2013;Do et al., 2022, for radar reflectivity).

3DVar assimilation
The retrieved pseudo-RH profiles are assimilated as sounding data in the 3DVar assimilation system of AROME-France.
AROME-France uses a one-hour assimilation window.The short assimilation cycles aim at partially overcoming the missing temporal dimension and at allowing the assimilation of more high-frequency observations that can improve the initial conditions especially on the convective scale.AROME-France operationally assimilates surface (e.g., ground-stations, ships, buoys) and aircraft measurements, Global Positioning System (GPS) Zenith Tropospheric Delay (ZTD) data, satellite brightness temperatures of several polar orbiting satellites and from Meteosat Second Generation (MSG) SEVIRI, satellite-based atmospheric motion vectors, and radar velocity and reflectivity data (Seity et al., 2011;Brousseau et al., 2016).The control variables are temperature, specific humidity, surface pressure, and horizontal wind components.The 3DVar system minimizes the 3DVar cost function J of the state vector x: with the state vector of the background x b , the observation vector y o , the observation operator H, and the observation error covariance matrix R. The climatological background error covariance matrix B is inferred from offline AROME-France ensemble assimilation as a multivariate set of calculations for the control variable covariances and cross-covariances (Brousseau et al., 2014).

Experimental set-up
This section evaluates the effect of LDA relative to RDA.Since the application of the 1DBay retrieval for FED data constitutes a new approach, different experiments of AROME-France with respect to LDA are conducted.All experiments are initiated at 0000 UTC and run for a forecast period of 30 hours until 0600 UTC the following day.While analyzing first LDA experiments, it was evident that the use of all FED observations led to wrong results, i.e., the changes to the AROME-France background humidity contradicted the assimilated pseudo-RH profiles (not shown).This behavior is known to occur because the R matrix is diagonal whereas observation error cross-correlations are actually present, which leads to sub-optimal solutions (e.g., Rabier, 2006).It could be mitigated by reducing the number of assimilated observations (thinning Järvinen and Undén, 1997).In this work, FED data are thinned by a factor of 2 in latitude and longitude direction, i.e., 1 in 4 observations are assimilated, so that no observations of adjacent FED pixels are used.Tests revealed that this thinning was sufficient to eliminate the FED observation error correlations.For comparison, 1 in 64 radar observations, with a higher horizontal resolution than our FED data, are assimilated in AROME-France (Michel, 2018).The thinning, thus, prevents from assimilating observations with correlated observation errors, which would contradict the assumption of a diagonal R matrix.The resulting LDA without RDA is labeled LDA.
The second LDA experiment without RDA ,LDAnC, adds a so-called noCloud-filter.This noCloud-filter is utilized for locations where the observed FED equals zero but the AROME_FED exhibits lightning activity.Then, the distance to the closest positive FED observation, d FED , is computed.If d FED remains within 21km, i.e., a maximum of three FED pixels, it is assumed that the profile is still situated within the same thundercloud being responsible for observed FED greater than zero.In this case, the RH profile of the background is kept to avoid reducing the RH if observed FED equals zero but the location is likely associated with a cloud.In the case where d FED exceeds 21km, AROME-France profiles within 21km are not considered in the 1DBay as they probably belong to spurious simulated thunderclouds.Hence, the noCloud-filter should help to effectively reduce background humidity in spurious convection.All model experiments are initiated one day prior to the start of the model forecast.During the first 24 hours, the 3DVar assimilation system of AROME-France creates 23 analyses (there is no analysis for the time of initialization).The reference, LDA, and RDA experiments are conducted for 23 hours prior to the evaluated analysis.This time period has been chosen because convection was continuously observed inside the model domain.In addition, the long assimilation period allows AROME-France to efficiently ingest all available observations.

The AROME-France analysis and forecast assimilating FED observations
This section describes the LDA ability to modify and update the model background.The following sections briefly introduce the four test cases, detail one case, and discuss i) the LDA effects on the AROME-France background and ii) AROME-France rainfall forecasts for this selected case.AROME-France simulations are mainly analyzed for the first 12 hours of the forecast since the strongest impacts of RDA and LDA are expected during these forecast hours, as seen in Fierro et al. (2019), for example.The influence of the lateral boundary conditions becomes predominant after 12 hours (Vié et al., 2011).LDA experiment reduces the model IWV at rare locations where the AROME_dBFED exceeded the observed dBFED, e.g., due to a slight spatial shift of the local dBFED maxima at about 44.3°N and 0.8°E (compare Figure 7a,c and g at this location).
The 1DBay method aims at finding the profiles that are physically consistent with FED observations in terms of relative humidity.Indeed, the analysis (Figure 7f) adds humidity to the background over the southwestern regions and for the cell over eastern France.It can also be seen in the difference between analysis IWV and background IWV (Figure 7i) where LDA reduces model IWV where spurious lightning activity was predicted.In addition, Figure 7(h) displays the IWV as output of the 1DBay retrieval method where no background IWV is included.It is intended that the 1DBay retrieves complete profiles, whereas the HA only adds humidity to certain layers.A smaller IWV from pure PO (Figure 7(h)) than from background plus PO (Figure 7(e)) means that the HA method was used.The comparison of Figure 7(e) and (h) shows the same IWV values for most points with PO.This result means that the HA is rarely used, and the preferred 1DBay method, that retrieves flow consistent PO profiles, is mostly used to get the POs.  as evidenced by IWV was added to the background since AROME_FED was lower than the FED observation.RH is reduced in columns near 4°E where spurious AROME_FED was found (Figure 7a,c).Figure 8 also shows that the RH is not changed when observed FED was higher than AROME_FED but the background RH was at least 100% (e.g., altitudes up to 4km and from 4°W to 2°W).All in all, the vertical cross-section shows that changes induced by the LDA result in physically consistent analysis in the vertical struture.

AROME-France rainfall forecasts
The Antilope rainfall accumulation (RA) combines the data of the operational radar network of Météo-France and rain gauges on a 0.01°resolution grid at each hour (Laurantin, 2008(Laurantin, , 2013)).RA maps for the first 6 hours of the forecast are shown in area of high RA in the northwest of the domain for track 1. Experiments RDA (Figure 9c) and LDA (Figure 9d) overestimate the maximum RA over this region by about 20mm and 5mm, respectively.Track 2 is best predicted by experiment LDA, with good agreement in maximum RA and area of high RAs to the observation (Figure 9a).Experiment LDAnC (Figure 9e) also predicts the extent of the RAs related to track 2, but underestimates the RA amounts especially for the northern part of the track, i.e., most recent storm location to the prediction time.Experiment CTRL and the experiment using RDA without LDA poorly predict the high RAs related to track 2. It is not clear whether the storm is modeled at all or placed too far south.This case study implies that LDA without both RDA and the noCloud-filter of FED has the highest Fraction Skill Score (FSS, not shown, see Section 8) for predicting heavy precipitation.In that it differs from the other three case studies where the noCloud-filter improves the LDA experiment skill.A future analysis might detail why the RDA and LDA with noCloud-filter have lower skill than the LDA.Such an analysis is beyond the present proof-of-concept scope.
with the observation and forecast binary fields I O and I M that equal 0 if the field value is smaller than the threshold, and 1 otherwise.Our implementation uses the fast calculation of FSS in Python as proposed by Faggian et al. (2015).The FSS is calculated hourly for 6-hour RAs with a sliding 6-hour time window for the 30-hour forecast period.Forecasts are initiated at 00:00 UTC.The FSS can be generalized to yield an average score if the numerator and denominator are averaged separately and then the FSS is calculated (Faggian et al., 2015).This allows to achieve overall FSSs including forecasts of all four case studies.
Here, 6-hour RA thresholds of 0.1mm, 1.0mm, and 10.0mm are used to represent different RA categories.An FSS neighborhood of 0.5°is used.Figure 10 shows the FSSs for the entire forecast period of 30 hours and combination of the four cases.
FSS of 0.8 to 0.9 during the first 15 hours of the forecast for the RA thresholds 0.1mm and 1.0mm (Figure 10a and b, respectively) indicate that regions with precipitation were equally well identified by all six AROME-France experiments.
RDA or LDA effects on the FSS diminish beyond 12 hours of the forecast as the effect of the boundary conditions becomes predominant.The LDA without RDA and without the noCloud-filter (LDA) gains the highest FSS for the high RA threshold and during the first 12 hours of the forecast (Figure 10c).This finding demonstrates the high potential of LDA in AROME-France.
The noCloud-filter cannot always improve the LDA.RDA exhibits the lowest FSS during the first 6 hours of the forecast due to a low skill during the Aug case where high RAs of storm track 2 over les Pays de la Loire were not predicted (see also Figure 9).The combination of LDA and RDA (RDA_LDA) gives FSS between the skill of RDA and LDA.
This proof-of-concept study uses four cases, a rather low amount of data.Usually, scores are calculated over several months for evaluating whether a new method improves an existing NWP model.FSS curves appeared noisy for high thresholds and short accumulation periods (not shown) as a result.The main conclusion here is that there is likely not enough data to show significant differences, however, the encouraging point being that the effect of the assimilation is neutral.

Conclusions
The objective of this work is to design an assimilation technique for the upcoming MTG-LI data for the regional, convectionpermitting model AROME-France.To date, AROME-France applies a 3DVar assimilation system.A tailored 1DBay+3DVar assimilation technique (Caumont et al., 2010) is used to assimilate pseudo MTG-LI flash extent density (FED) in AROME-France.A similar assimilation technique is currently used operationally for radar reflectivity data assimilation in AROME- France, but has not yet been tested for LDA.
This work first generated MTG-LI data that are used to create the FED observations (Erdmann et al., 2022).Then, an observation operator for FED is developed based on a linear, climatological relationship between observed FED and the column integrated AROME-France graupel mass, m g , above the -5°C isotherm.The operator is trained for 24 days in 2018, and validated for 2 independent days in 2018.Pearson correlation coefficients of 0.97 and 0.92 for the training and validation data, respectively, reveal a very strong relation between the distributions of observed FED and model m g .Nevertheless, the observation operator systematically overestimates the FED for m g values greater than 1.5e7kg per AROME-France grid cell of 1.3km × 1.3km.More sophisticated observation operators are currently tested (Combarnous et al., 2022) but have not been included in this work yet.
The observation operator is then used to compare AROME-France-derived background FED (AROME_FED) to the FED observations.The 1DBay method identifies the best estimation of the FED from the background to in turn create pseudoobservations (POs) of RH-profiles based on both the FED observation and the AROME-France background fields.As background profiles are processed, the 1DBay method maintains model physics and flow characteristics.
The PO RH profiles add humidity to the AROME-France background where the observed FED exceeds the AROME_FED.
It is further found that the 1DBay retrieval leads to reduction of humidity where the observed FED equals zero and the AROME_FED is positive, i.e., in regions of spurious convection with substantial m g .Hence, the LDA technique improves the AROME-France background humidity.It is capable of both promoting convection in regions with lightning and suppressing spurious convection.This was successfully verified in four case studies where the new LDA technique provided similar skill than the operational RDA in AROME-France.
FED exhibits the highest values near the convective core of a thunderstorm and the lightning activity does not always cover the entire cloud size.In fact, zero FED exists at cloudy locations.In order to address the specific nature of FED data, the 1DBay retrieval method is adapted.In detail, a wider vicinity is used to identify vertical profiles in the 1DBay method and a so-called noCloud-filter is introduced.First results reveal that this adaption of the method can help to more effectively reduce the background humidity in regions of spurious convection, and to avoid a reduction of the background humidity if the profile occurs at the location of a cloud.Nonetheless, one of the four case studies revealed more skill in the LDA without the noCloud-filter than the LDA using this modification.The noCloud-filter is a first approach trying to overcome this issue by keeping the background humidity constant when lightning was correctly simulated but not observed.The authors encourage further research on how the specifications of FED data can best be addressed by the LDA scheme, i.e., through correlations between lightning locations and cloud cover.
In addition, this study found that thinning of the FED observations (on a grid of resolution of 7km) was necessary to avoid the effect of correlations between the observation errors that violates the assumptions of a diagonal observation error matrix in the AROME-France 3DVar data assimilation system (not shown).
Finally, forecasts of 6-hour rain accumulations are evaluated through FSS analysis.The developed LDA scheme can in general compete with the established RDA, an encouraging result for further testing and development of the LDA in AROME-France.Positive effects on the forecast of rainfall by both RDA and LDA are found mainly for the high precipitation threshold and during the first 9 to 12 hours of the forecast.Longer forecast times show small spread in FSS between control run, RDA, and LDA for the three RA thresholds indicating that the assimilated radar and FED data do not significant effect the model forecast after 12 hours.The FSS of the combination of RDA and LDA indicate in most cases skill between the RDA and the LDA.The combined RDA-LDA approach provides the best trade-off in general sense and with respect to the assimilated observations.The case of 02 February 2018 was unique in that neither RDA nor LDA improved the FSS of the control run.This case was significantly influenced by weather phenomena over the Atlantic Ocean, with limited radar and Meteorage coverage.
Real MTG-LI data will provide more accurate lightning records over the Atlantic than the synthetic MTG-LI data based on Meteorage observations used in this study, and the LDA performance might improve in such situations.
It could be further studied whether the radar reflectivity and LDA can be coupled.A coupled approach may help to overcome some issues explained for zero FED at cloudy locations in general or at least for precipitating clouds, and, at the same time benefit from the successful promotion and suppression of convection by both RDA and LDA.As the LDA explored here is based on satellite observations, such a coupled approach will also benefit forecasts over regions with limited radar coverage.
The specific improvement over regions such as oceans and mountain ranges has not been studied here since our pseudo MTG-LI observations are based on ground-based Meteorage records.A study addressing in particular the skill over such regions would be of great interest as soon as real MTG-LI data become available.

Figure 1 .
Figure 1.AROME-France physical domain and model topography.The AROME-France horizontal grid is equidistant at 1.3km resolution.

Figure 2 .
Figure 2. Simulated MTG-LI FED in the entire domain (a) and zoomed on the region of the maximum FED value (b) on 7km × 7km grid.The Meteorage strokes and pulses that were used to generate the MTG-LI flashes are superimposed on the FED in (c).Simulated MTG-LI flash centroids and the corresponding Meteorage strokes+pulses in (d).Example for the period from 09 Aug. 2018, 13:55 UTC to 14:00 UTC.
Figure 2(b).Throughout the domain, the FED pixels have a size of roughly 7km times 7km.

Figure 2
Figure 2(c) superimposes the zoomed-in simulated FED and the Meteorage CG strokes and IC pulses that were used as input to the GEO lightning pseudo-observation generator.The vast majority of strokes and pulses lies within the area of non-zero FED. Figure 2(d) illustrates that the simulated MTG-LI flash centroids are situated within the corresponding Meteorage stroke and pulse distribution.

( 2010 )
who used storm-based relationships of case studies, and fromMcCaul et al. (2009) who used the domain-wide peak values.Pixel-to-pixel m g and FED were barely correlated in our study, with about 0.09 Pearson correlation coefficient, likely as a consequence of a typical displacement of convection in the model by more than the FED spatial resolution of 7km.The second approach with domain-wide peak values was not further tested as it reduces the sample size for a regression analysis drastically.Our observation operator uses a simple linear regression between observed FED and simulated column graupel mass.It is unbiased by definition.Our approach optimizes both the slope factor and the y-intercept of the regression, whereas, e.g.,McCaul et al. (2009) only used a proportionality between FED and a proxy.

Figure 3
Figure 3 presents the analyzed linear relationship between FED and column integrated graupel mass m g .The functions are obtained after sorting both the FED and m g data individually.Paired datapoints in Figure 3 are geographically independent values as described in Combarnous et al. (2022).It should be noted that all pixels with either FED equal to zero or m g equal to zero are removed from the data.The observation operator represents the cases when lightning was actually observed.The training data (24 days) results are shown in Figure 3(a) and the results for 06 and 07 October 2018 as validation period can be seen in Figure 3(b).The Pearson correlation coefficient equals 0.97 for the training data and 0.92 for the validation data (0.96 combined).

Figure 3 Figure 3 .
Figure3(a).Hence, this observed discrepancy litte affects the Pearson correlation coefficient.It should still be considered that high FED values are systematically underestimated.It is further noted that the y-intercept is negative, meaning that statistically a certain mass of graupel is required to initiate lightning.This result is well in accordance with the widely accepted noninductive charging as main electrification process in extratropical storms.The validation data (Figure3(b)) roughly follow the regression line inferred from the training data.One can see, however, that the slope is smaller than that of the black regression line, i.e., observed values of FED are always lower for a given m g than the training data implies.The number of FED-m g -pairs is significantly lower for the 48-hour validation case than for

Figure 4 .
Figure 4. Innovations of FED observation minus AROME_FED for the 24-hour assimialtion cycles on 08 Aug. 2018.Skewness, kurtosis, and the sample size N are given.The red line shows the Gaussian fit on the PDF of the innovations.
(b) shows lightning activity mainly in the south center of the domain.Figure 5(c) indicates positive background dBFED in the region

Figure 5 .
Figure 5. (a) The sensitivity test for standard deviation σo of the 1DBay retrieval as inferred from the pixel-to-pixel RMSE between the dBFED observation and the AROME_dBFED (background) as well as the 1DBay retrieved pseudo dBFED (pseudo).07 Oct. 2018, 00:00 UTC, case with the MTG-LI dBFED observation (b), the model background AROME_dBFED (c), and the 1DBay retrieved pseudo dBFED with a σo of 2 (d).Grey color means no lightning.

Figure 7 .
Figure 7. (a) The background AROME_dBFED, (b) the 1DBay retrieved dBFED, (c) the MTG-LI dBFED observation, (d) the AROME-France background IWV, (e) PO IWV merged with the background where no profiles were retrieved, (f) the IWV of the analysis, (g) the difference between the PO IWV minus the background IWV, and (h) the 1DBay-only IWV including only points where RH was retrieved, and (i) the difference in IWV of analysis and background.Results for 08 Aug. 2018, 23:00 UTC and model experiment LDAnC.

Figure 8 .
Figure 8. Vertical cross-sections along 44°N of (a) the AROME background RH and (b) the difference of analysis minus background RH.Sawtooth features visible at 4°E are interpolation artefacts caused by the grid's irregularity.Results for 08 Aug. 2018, 23:00 UTC and model experiment LDAnC.

Figure 8
Figure8illustrates RH vertical structure on the 44°N latitude between 4°W et 5°E at the same time as fields in Figure7are taken.AROME-France background RH (a) and the difference of analysis minus background (b) are included.The latitude was chosen as the analysis increases and decreases the background IWV here (Figure7i).The latitude also exhibits regions where AROME-France over-and underestimated the FED (Figure7a,c).In Figure8(b), higher RH in the analysis than the background (positive values) from 4°W to 2°W (-4 to -2 in the figure) corresponds well with Figure 7(g, i) where humidity

FigureFigure 9
Figure 9a to d for (a)Antilope and experiments (b)CTRL, (c)RDA, (d)LDA, and (e)LDAnC.Three major thunderstorm tracks, labeled 1 to 3 from north to south hereafter, produce the bands of high RAs over the northwestern part of the domain (Figure 9a).The local maximum with very high RA up to 150mm per 6 hours over southern France is caused by a quasi-stationary thunderstorm development.The AROME-France experiments (Figure 9b-e) predict the RA of track 1 relatively well.Experiment CTRL (Figure 9b) and the two LDA experiments (Figure 9d,e) underestimate the

Figure 10 .
Figure 10.FSS average of 6-hour RAs calculated hourly for a sliding window during the 30-hour forecast initiated 00:00 UTC for the four study cases.The colors and line style indicate the model experiments as defined in Section 6. 3 different RA thresholds are used, (a) 0.1mm, (b) 1.0mm, and (c) 10.0mm.The size of the neighborhood used to calculate the FSS was set to 0.5°.

Table 1
lists the six different assimilation experiments.First, there is a control experiment used as reference without RDA, without Doppler wind velocity assimilation, and without LDA.It is called CTRL.The second experiment includes the use of radar data and is similar to the current operational AROME-France.It is referred to as RDA.All RDA experiments shown here assimilate both reflectivity and Doppler wind velocity.

Table 1 .
Simulation and assimilation techniques of the different AROME-France experiments.They differ by the use of radar data assimilation (RDA) and lightning data assimilation (LDA), as well as the application of the noCloud(nC)-filter as described in the text.
RAs of track 3 are similarly predicted by the RDA and LDA AROME-France experiments that all underestimate the RA of the southern part, i.e., during the beginning of the forecast period, and overestimate the RA of the northern part of the track, i.e., the most recent RA.Experiment CTRL predicts the southern part of track 3 arguably better than the RDA and LDA experiments.6-hour RAs related to the local thunderstorm over southern France are best predicted by CTRL and RDA.Both LDA experiments somewhat underestimate the local RA maximum and the area of high RAs.In addition, the LDA experiment produces a spurious, local high RA cluster over northeastern Spain.

Table 2 .
Summary of the 4 case studies.All case studies are summarized in Table2.Besides the detailed case 1, the other case studies analalyzed a cyclone with mainly frontal precipitation in autumn (case 2), shallow convection in winter (case 3), and widespread deep convection in late spring (case 4).The evaluation of these three cases revealed that the RDA and LDA improve the RA forecast skill for convection within the study domain over western Europe.Once RDA and once LDAnC provide the best RA forecast.Experiments combining RDA and LDA exhibit FSS values between RDA and LDA, thus, mean the best trade-off overall.In the future, a coupled assimilation of RDA and LDA could be tested.For example, PO RH profiles could be retrieved from a weighted product that includes both radar reflectivity and FED thereby adapting the 1DBay.In situations, where convection from the Atlantic Ocean is advected into the study domain from the west, i.e., case 3, RDA and LDA could not improve the control run.Both RDA and LDA currently rely on ground-based observations with limited coverage over the Atlantic ocean.It is expected that the real MTG-LI observations will help to predict cloud systems forming over the ocean with higher efficiency than the LDA using the generated MTG-LI observations derived from ground-based This section compares the predicted 6-hour RAs of different AROME-France experiments (Table1) and the observations in a statistical way.The forecast skill is quantified calculating Fractions Skill Scores (FSSs).The FSS was introduced by Roberts and Lean (2008): The FSS can be calculated as skill score from the mean squared error (MSE) for the observed and forecast fraction O (n) and M (n) , respectively, from a neighborhood of length n as