Simulating atmospheric transport of the 2011 Grímsvötn ash cloud using a data insertion update scheme

Abstract Effective modelling of atmospheric volcanic ash dispersion is important to ensure aircraft safety, and has been the subject of much study since the Eyjafjallajokull ash crisis in Europe in 2010. In this paper, a novel modelling method is presented, where the atmospheric transport of the 2011 Grimsvotn ash cloud is simulated using a data insertion update scheme. Output from the volcanic ash transport and dispersion model, NAME, is updated using satellite retrievals and the results of a probabilistic ash, cloud and clear sky classification algorithm. A range of configurations of the scheme are compared with each other, in addition to a simple data insertion method presented in a previous study. Results show that simulations in which ash layer heights and depths are updated using the model output generally perform worse in relation to satellite derived ash coverage and ash column loading than simulations that use satellite-retrieved heights and an assumed layer depth of 1.0 km. Simulated ash column loading and concentration tends to be under-predicted using this update scheme, but the timing of the arrival of the ash cloud at Stockholm is well captured, as shown by comparison with lidar-derived mass concentration profiles. Most of the updated simulations in this comparison make small gains in skill on the simple data insertion scheme.


The Grímsv€ otn eruption
Airborne volcanic ash poses a hazard to aircraft, but in restricting airspace to avoid ash, the civil aviation industry risks large economic losses, as in the case of the April and May 2010 eruption of Eyjafjallaj€ okull, Iceland (Oxford Economics, 2010). The following year in Iceland, a sub-glacial eruption of the Grímsv€ otn volcano started on the evening of 21 May 2011 and continued for 7 days. The eruption quickly broke through the glacial ice cover and the basaltic magma and glacial water interaction caused explosive tephra formation. The eruption produced higher plumes (Arason et al., 2011;Petersen et al., 2012), but was shorter lived than the Eyjafjallaj€ okull 2010 eruption, with coarser ash particles (Icelandic Met Office, 2011;Ansmann et al., 2012). During late May 2011 there were a number of low pressure weather systems which caused rainfall to the south of Iceland and variation in wind direction (Stevenson et al., 2013). The atmospheric conditions were expected to lead to rapid fall out of ash particles and a lesser impact on Europe than the 2010 Eyjafjallaj€ okull eruption (Icelandic Met Office, 2011;Tesche et al., 2012). Air traffic was disrupted in Iceland, Greenland, northern UK and Ireland from 24 May and northern Germany on 25 May .

Satellite measurements of ash
Volcanic Ash Advisory Centres (VAACs) provide safety advisories to civil aviation authorities informed by output from volcanic ash transport and dispersion models (VATDMs). Infrared satellite observations are valuable tools for VAACs to monitor ash clouds and validate model output. However, infrared detection of ash can be influenced by meteorological clouds, which, when overlying an ash layer, can obscure the characteristic negative brightness temperature difference (BTD; the brightness temperature at the~11 mm channel minus the brightness temperature at the~12 mm channel) of silicate ash (Prata, 1989). Water or ice (which normally exhibit positive BTD) within an ash cloud can decrease the magnitude of the negative BTD, and acting as condensation nuclei, the characteristic signature of ash particles can be obscured by becoming encased in ice (Rose et al., 1995). Kylling et al. (2015) studied the effects of meteorological cloud on ash detection in simulated imagery from the Meteosat Second Generation Spinning Enhanced Visible and Infrared Imager (SEVIRI) for the Eyjafjallaj€ okull 2010 and Grímsv€ otn 2011 eruptions. The authors simulated both cloudy and cloudless scenes containing ash using a radiative transfer model. Using the BTD ash detection approach, the presence of cloud led to an average 6e12% reduction in the detection of ash containing pixels, and up to 40% of those pixels in some scenes. The detection efficiency was greater for the Eyjafjallaj€ okull ash cloud. For the Grímsv€ otn scenes, the study indicated that the main cause of the false negatives was the small thermal difference between the top of the ash cloud and the Earth's surface, and mixing with clouds at low altitude overlying or colocated with the ash layer (see also Kylling et al., 2013).
In data insertion, observations are used to create a model state from which to begin a simulation. This paper is a feasibility study that expands on previous data insertion work on the Eyjafjallaj€ okull eruption (Wilkins et al., 2014(Wilkins et al., , 2016, which includes no information on the possible location of meteorological cloud that could inhibit ash detection. (The Francis et al. (2012) retrieval used in those studies does provide an estimate of aerosol optical depth, but that information was not utilised.) In those studies, model simulations are initialised from a series of satellite retrieved ash cloud properties, and the resulting outputs are combined to form composite simulations of ash observed in all of the scenes. Here, the dispersion of the 2011 Grímsv€ otn ash cloud is simulated using the Met Office's Numerical Atmospheric Modelling Environment (NAME; Jones et al., 2007) forced by Met Office Unified Model (MetUM) numerical weather prediction data. Model output errors can accumulate with increasing forecast length due to errors in the meteorological forcing and removal processes (Durant, 2015). In an attempt to constrain some of the cumulative errors, simulations are updated periodically using a variation on the data insertion method. As meteorological clouds are likely to inhibit detection of parts of the Grímsv€ otn ash cloud in the satellite imagery, a series of satellite retrievals and the results of a probabilistic ash, cloud and clear sky classification algorithm are incorporated into the updates. Details of the scheme are given below.

Methods
Owing to the high temporal resolution of the SEVIRI sensor, for the update scheme both the retrieval and classification are performed using data from that instrument. In theory other instruments and measurements could be employed in a similar way. Using SEVIRI infrared channels, ash is detected at a given time and the ash cloud height and column loading are retrieved using a one dimensional variational method (1D-Var; for full details see Francis et al., 2012), assuming an andesite refractive index (Pollack et al., 1973). These data are used to initialise NAME, where only positive ash flags are inserted into the model. For later times, the retrieval data and a Bayesian atmospheric classification scheme  are used to update the model state in a portion of the model domain.
The Bayesian scheme classifies satellite pixels by the probability of being free of meteorological cloud and ash, containing cloud or containing ash. The combined probabilities sum to unity. Specific prior information about the atmosphere within each pixel is used in the scheme, which does not include user-defined brightness temperature thresholds for ash detection. Based on the prior information and the satellite data, pixels are assigned a probability of being in each of the three states. For a complete description of the method please see Mackie and Watson (2014). The atmospheric states in the classifier are mutually exclusive. In this study, pixels are flagged as clear, cloud or ash according to which class has the highest posterior probability. Pixels are assigned as ambiguous where there is a <0.2 difference in probability between the assigned class and the next most probable.

Sequential update scheme
A step-by-step outline of the modelling methodology is given below and shown on the flow chart in Fig. 1. 1. The 1D-Var detection and retrieval scheme is run for time t 0 (0615 UTC 23 May 2011) and the output is inserted into NAME following the method outlined in Wilkins et al. (2016). Retrieved ash cloud height, ash column loading and an assumed ash layer depth are used to create a representation of the downwind ash cloud for the NAME source term. Each pixel represents a source and all sources are released into the model atmosphere. Output is obtained for time t 1 (1215 UTC 23 May 2011), forming an a priori forecast. NAME ash column loading results are output onto a 0.25 Â 0.25 grid and ash concentration is output at a Insert into NAME as instantaneous release of ash at me t 0 , run forward to me t 1 Run volcanic ash retrieval on flagged pixels for me t 1 Run classifica on scheme for me t 1 Insert into NAME as instantaneous release of ash at me t 1 , run forward to me t 2 Run ini al volcanic ash retrieval on SEVIRI imagery for me t 0 Run volcanic ash retrieval on flagged pixels for me t 2 Run classifica on scheme for me t 2 Insert into NAME as instantaneous release of ash at me t 2 , run forward to me t 3 … Update t 1 forecast with classifica on scheme/retrieval Update t 2 forecast with classifica on scheme/retrieval Con nue to t n … 200 m vertical resolution. NAME output represents an average over the hour preceding the time indicated. 2. The atmospheric classification scheme and 1D-Var retrieval are run on a SEVIRI image for time t 1 . 3. The model state at t 1 is updated using the classified and retrieved data for that time. Classified probabilities and retrieved ash column loading and ash cloud height are mapped onto the NAME output grid using nearest neighbour interpolation. The classification data covers the region within 50 to 70 N, À25 to 25 E. This is deemed the region of interest as it covers the main body of ash that travels towards the UK and Scandinavia over the time period studied. Selecting a subset of the classification data also reduces computational time. Depending on the pixel class, the model state is updated as per below: (a) If a grid cell is classified as clear, modelled ash is removed from that cell. (b) If a grid cell contains observed ash, the model state is updated with the 1D-Var retrieved value for that cell. The classification scheme is fairly conservative and classifies fewer ash pixels than the detection algorithm within the 1D-Var retrieval scheme. For this reason, ash is assumed to be present in the pixel if it is flagged as ash in the retrieval scheme and the classifier is in a cloud, ash or ambiguous state, i.e. not clear. (c) If a grid cell is classified as cloud or ambiguous and there is no ash detected within the 1D-Var retrieval, the model state for that grid cell is left unchanged. Modelled column loading values <0.01 g m À2 , are cleared to reduce computational cost when re-inserting into NAME. The SEVIRI imagery is unable to detect ash at this column loading magnitude. 4. The updated model state is then inserted into NAME as in Step 1 and run forward in time to create a new model state for time t 2 , using assumptions on the particle size distribution (PSD), ash layer depth and height (details given below). Particle density is assumed to be 2300 kg m À3 Devenish, 2013) 2.1.1. Ash layer height and depth To reconcile 2-dimensional observations and 3-dimensional NAME output, some simplifications and assumptions are made. Either option 1 or 2 below is implemented for grid cells that contain modelled and/or retrieved ash: 1. NAME height (NH) cases -At locations where non-zero ash is predicted by NAME (whether or not the column load from observations or from NAME is to be used), the effective ash layer height and depth are calculated from the model output and used for the updated state to be re-inserted into NAME. The peak concentration in each vertical NAME profile is assumed to lie at the centre (z N ) of the layer. The effective layer depth (dz N ) is calculated as the ratio of the ash column loading to the peak concentration within that profile (as in Devenish et al., 2012a;Dacre et al., 2015). Retrieved height (RH) values are used where ash is observed but not modelled (see option 2 below). 2. Retrieved height (RH) cases -At locations where ash is observed (here the observed value will always be used for the column load), the retrieved height (h R ) and a user-defined layer depth (dz R ) are used for the updated state. The centre of the layer, z R , is given by z R ¼ h R À dzR 2 , where dz R is assumed to be 1.0 km and h R is the retrieved height, assumed to be the top of the ash layer. Where ash is modelled but not observed, the NAME height (z N ) and depth (dz N ) are used. z N and dz N are calculated in the same way as in option 1, but the decision as to whether to use z R and dz R or to use z N and dz N is made differently.
In both cases, a single ash layer with a Gaussian vertical distribution is assumed, with a standard deviation of dz= ffiffiffiffiffiffi 12 p , which matches that of a uniform distribution over the layer depth (dz). This will place small amounts of ash above and below the calculated layer. Where z and dz place ash < 0.0 km above sea level (asl), dz is adjusted to be dz ¼ 2z to avoid ash being released below sea level (0 km asl). To reduce computational cost, other local topography is not taken into account. Ash released below ground level will be reflected back above ground, but as most of the ash in this study is located above sea, omitting the topography should not have a significant effect on the output.

Particle size distribution
During the sequential update, different options for the PSD are used; AA, BB, AN and BN. The first letter of the acronym is the PSD used for the initial run and in the updates for grid cells in which ash is observed but not modelled at the update time. The London VAAC (based at the Met Office) use PSD A as their default volcanic ash distribution (see Webster et al., 2012). PSD B is based on a distribution of the Eyjafjallaj€ okull ash cloud measured on a Facility for Airborne Atmospheric Measurements (FAAM) BAe-146 research aircraft flight on 14 May 2010 (see Table 1 in Dacre et al., 2013). Both PSDs are shown in Table 1.
The second letter of the acronym refers to the PSD used in the updated runs for the grid cells containing modelled (not observed) ash. To maintain the ratio of mass per bin, PSD N is compiled for each grid cell using the fraction of output mass from each of the bins in PSD A or B. NAME is used to output the column load per size bin. The update then attributes a proportion of the new mass to each bin according to the output, i.e. if 100% of the output mass fell in the 0.1e0.3 mm size bin, 100% of the updated mass would be ascribed to that bin. Note that output is not separated by bin within the vertical ash layer (this is possible in NAME but is not attempted here). Also note that PSD A includes 6 size bins and PSD B has 4. The bins are separated into different distributions in the NAME input file, each distribution having just one bin, to allow size-segregated input. However, as the configuration of NAME used here accepts only 5 distributions, the two bins representing the smallest particles in PSD A (0.1e0.3 mm and 0.3e1.0 mm) are grouped into one distribution during updates. These two bins contain the least mass and combine to equal the same mass fraction as the 0.3e1.0 mm bin in PSD B. The sedimentation rate of particles < 3 mm in diameter is very low (Dacre et al., 2015), so grouping the smallest size bins should have negligible affect on the modelled ash concentrations.

Initialisation from source
As well as the above simulations, a simulation from the volcano source with no updates, and some simulations combining the emissions from the volcano with updates are also undertaken. For the volcanic emissions, a simulation is initialised from the volcano vent using a set of eruption source parameters (ESPs) informed by 30 min C-and X-band radar plume height measurements (Petersen et al., 2012) and observations reported by the Icelandic Met Office (Icelandic Met Office, 2011). Eruption rates are calculated using an empirical relationship between plume height and eruption rate (Mastin et al., 2009), the assumed fine ash fraction is 5% (the default value used at the London VAAC; Webster et al., 2012) and PSD A is used. The resulting simulation for 0945 UTC 24 May 2011 is shown in Fig. 2a.
The radar measurements in Petersen et al. (2012) indicate that plume height was > 10 km asl for the majority of the first 1.5 days of the eruption. However, these measurements include substantial uncertainty (Woodhouse et al., 2015). These plume heights cause much of the ash in Fig. 2a to be transported north-eastwards of Iceland and less ash south and eastwards. In reality, mainly SO 2 , which was detected in 1D-Var retrievals by Cooke et al. (2014), was transported on a northwards trajectory at the beginning of the eruption, where only a few pixels were detected as ash (Kylling et al., 2015). From 23e25 May, the ash was mainly transported in a south easterly direction (Moxnes et al., 2014), and little SO 2 was detected by Cooke et al. (2014) south of Iceland. Inversion modelling by Moxnes et al. (2014) indicates that ash on the south easterly trajectory was emitted below 4 km. Therefore, it is likely that most of the southward ash was emitted in the lower part of the plume, and radar plume height measurements may not be suitable for calculating the source term for this type of eruption (Witham, 2016).
The plume heights used in the simulation shown in Fig. 2a are halved (from the vent height) given the evidence above, and eruption rates re-calculated accordingly. This significantly reduces the amount of ash released, so the fine ash fraction is doubled to 10% (Fig. 2b). This simulation is denoted ESP half and is shown in Fig. 2b, where much less ash is released and less travels northwards compared to Fig. 2a.
It is clear that, due to large uncertainties in the plume height measurements and the calculation of eruption rate from those heights, the modelled ash column load should not simply be tuned by adjusting the fraction of fine ash, but producing a perfect simulation from the vent ESPs is not the intention here. This simulation is intended as a reasonable a priori which can be updated using the satellite imagery. The aim is to determine whether the update scheme is successful when NAME is initialised from the volcano itself, where the simulations are less likely to miss ash if it is consistently unobserved. Output from ESP half is sequentially updated using PSD A and both the NAME height (to help maintain the modelled ash cloud height downwind from the vent) and retrieved height methods. Ash is released from the vent throughout the updates. These updated simulations are denoted NH ESP AA and RH ESP AA respectively.

Results and discussion
In the following, a subset of the simulations described above is chosen to simplify the discussion. A brief description of the simulation acronyms is given in Table 2. These simulations are compared against a simple data insertion scheme described in Wilkins et al. (2016). The simple scheme is denoted here as DI max B. It is a composite simulation where runs are initialised from 6-hourly retrievals using PSD B, starting at 1215 UTC 23 May. The final result in each output grid cell is taken as the maximum of those simulations at a given time, i.e. the maximum of 4 simulations run forward to 0945 UTC 24 May. This composite type was chosen for comparison because it was designed to help overcome the effect of potential cloud coverage in one or more of the contributing retrievals. The main differences between the updated simulations and the simple scheme, in no particular order, are: i) use of PSD A and PSD N, ii) the NAME height, iii) the release of material from the volcano, iv) clearing of ash according to the classifier, and v) sequential update of simulations rather than combination at the run end time. The most similar updated simulation to the simple scheme in terms of the values used is RH BB.

Qualitative comparison
Simulated ash column loading at 0945 UTC 24 May 2011 is shown in Fig. 3. Fig. 3a shows the simple insertion scheme, DI max B. Fig. 3bee are initialised from a retrieval at 0615 UTC 23 May and updated with the 1D-Var and classification data 6-hourly thereafter. Fig. 3f is initialised from the volcano vent using the radar informed ESPs (halved) and updated with the 1D-Var and classification data 6-hourly thereafter.
Observations of the ash cloud at 0945 UTC 24 May 2011 are shown in Fig. 4, where Fig. 4a is a SEVIRI ash column loading retrieval, and the associated BTD image is shown in Fig. 4b. Note that the minimum ash detection threshold for SEVIRI is considered to be 0.2e1.0 g m À2 (Devenish et al., 2012a;Francis et al., 2012;Prata and Prata, 2012). Fig. 4c is an ash column loading retrieval from Infrared Atmospheric Sounding Interferometer (IASI) imagery using the method of Western et al. (2016), in which the classification scheme described above is used for detecting the ash. Fig. 4d is a composite of IASI BTD images around 0945 UTC. IASI imagery is included here to provide an independent set of satellite observations to compare to the model simulations.
In Fig. 3aee, the simulations show similar transport patterns and most ash follows a south-easterly trajectory. The NH BB simulation in Fig. 3e has greater lateral spread than Fig. 3bed.
Differences in ash position may be due to the influence of wind shear acting on ash occurring at different vertical levels within the simulations. In the RH ESP AA simulation (Fig. 3f), more ash travels on a northward trajectory. The area west of À25 E and north of 70 N is not covered by the set of classified data used here. If that data set covered a larger region, ash on the northward trajectory may be subject to greater changes during the updates.

Quantitative comparison
A series of the simulations is compared to the SEVIRI and IASI retrievals of ash column loading for~0945 UTC 24 May 2011 within the region 50 to 70 N, À25 to 25 E in Fig. 5. Both sets of retrievals are interpolated onto the NAME grid using nearest neighbour interpolation. Three comparison metrics are employed and a brief Table 2 Acronyms of the simulations compared in Section 3. Update height indicates whether the NAME Height or Retrieved Height method (these are described in Section 2.1.1) is used to determine the vertical distribution during the update. Initial PSD indicates the PSD used for the initial run and in the updated run for grid cells in which ash is observed but not modelled. Update PSD is the PSD used during the update for grid cells where ash is modelled.  Where there are no pairs of data exceeding the threshold, no skill score is produced. Fig. 5 shows the scores of the simulations at 0945 UTC 24 May 2011 after 1e4 updates between 1215 UTC 23 May and 0615 UTC 24 May, where they were either initialised from a retrieval at 0615 UTC 23 May or from the volcano vent. Scores for the ESP half and DI max B simulations are also plotted. The absolute differences in NMSE and bias in the simulations after the final update are small, so relative differences are given here. In terms of the RH BB vs. SEVIRI (IASI) scores, the NH BB simulation shows a decrease in skill of 42% (179%) in NMSE, 48% (41%) in bias and 12% (9%) in FMS. Hence, using the NAME height scheme does not produce an overall improvement in skill for ash column loading (the NH AA simulation performs marginally better than RH AA in FMS only, not shown). Comparing the PSD A vs. PSD B simulations, RH BB shows an increase in skill in terms of the RH AA scores of 6% (19%) for NMSE, 9% (7%) for bias and a decrease in skill of 17% (20%) for FMS. The PSD A simulations tend to score better than the PSD B simulations according to the spatial metric (RH AN has the best FMS score), but not for pairwise column loading comparisons. The simulation using the PSD update scheme (PSD N) produces results with marginal differences to those in which the PSD is not updated; RH AA achieves 1e7% lower skill than RH AN for all metrics. (Although the same is not always true for the omitted simulations, where the scores are very similar and neither scheme is obviously favourable.) Fig. 5 shows a trend of increasing skill with the inclusion of subsequent updates, with the exception of the second update at 1815 UTC 23 May in Fig. 5a and b and the final update in Fig. 5f, which may indicate bad retrievals for those times. In terms of the DI max B scores, the updated simulations out-perform DI max B by 7e25% in NMSE and bias (except NH BB which shows a reduction in skill of 18e108%). RH AN and RH AA improve on the FMS score of DI max B by 14e23%, whereas the other simulations show a reduction in skill of 5e48%. There is one set of statistics for non-updated simulations initialised at 0615 UTC 23 May 2011 vs. the SEVIRI retrieval (where the simulations were initialised at 0615 UTC 23 May and not updated; not shown), but none for IASI. They are BB simulations and there is an increase in skill between these and simulations which have one update at 1215 UTC for NMSE and bias but a slight decrease in skill for FMS.
The model vs. SEVIRI retrieval comparisons (Fig. 5aec), perform better than in the model vs. IASI retrieval comparisons (Fig. 5def).  standard deviation (s) of the effective NAME ash cloud top heights (h N ) and depths (dz N ) in km are calculated using the method in Section 2. SEVIRI retrieved height: m ¼ 3.8 km, This is expected as 1D-Var SEVIRI data is used to update the simulations, and they are therefore likely to be more similar to later 1D-Var retrievals. The retrieved column loading from IASI is the order 2.0e4.0 g m À2 greater than for SEVIRI, leading to larger NMSE values and more negative bias scores. Also, for IASI the ash is detected using the same probabilistic methodology as the classification scheme, which as mentioned above, is fairly conservative (and appears to be more so for IASI than SEVIRI). This gives a smaller area classified as ash, leading to lower FMS scores. Bias scores for all simulations compared are negative, showing that they under-predict the ash column load. Under-prediction of mass may be due to over-estimation of ash column loading in the retrievals at the comparison time. It could also be due to failures in capturing the full extent of the ash cloud where it is obscured from view of the sensor during updates, and the subsequent spreading of the captured ash in the model, which reduces the mean column load. With the exception of the ESP runs where ash is continually emitted, if a region of ash is consistently obscured in the observations, it will not be inserted into the simulations. Under-prediction could also be due to model transport errors and/or excessive lateral spreading of ash in the simulations, reducing the mean column load. Also, in these retrieval schemes it is assumed that each ash flagged pixel corresponds to full ash coverage. In reality the coverage may be less than 100% of the pixel, leading to an underestimation of the ash column loading inserted into the model. Combined with high model diffusion, this could lead to negative bias in the simulations.

Effective layer heights
Cross sections of the simulated ash clouds at 60 N, between À8 and 2 E are shown in Fig. 6, with SEVIRI retrieved heights overplotted with black crosses at the SEVIRI pixel locations. Mean (m) effective ash cloud top heights (h N ) and depths (dz N ), calculated as in Section 2 above, and their standard deviations (s) are also shown. The location of the cross sections is indicated with a red line in Fig. 4a. Retrieved heights are taken to be ash cloud top heights, where the ash layer is assumed to be optically thick, but could lie beneath the top where the ash layer is optically thin (Francis et al., 2012).
The effective model heights in Fig. 6 broadly agree with the heights of NAME particle trajectories over the same region shown in Fig. 1 of Stevenson et al. (2013) and Fig. 2 of Cooke et al. (2014). They are also in reasonable agreement with the SEVIRI retrieved heights (m ¼ 3.8 km, s ¼ 0.7 km); RH simulations are within0 .5 km and the NH simulation is within~0.9 km. The RH simulations have lower mean layer top heights and thinner layers than the NH BB simulation (Fig. 6e) by approximately 1.0 km for both. As the simulations progress, the cumulative effects of turbulent mixing and meander can act to deepen ash layers in NAME (Devenish et al., 2012b;Dacre et al., 2015). For NH BB the effective depth is calculated for each grid cell from the previous run, whereas for the RH runs the ash is inserted back into NAME with a 1.0 km vertical depth. This means that both the depth and variance of depth are likely to be higher for NH BB. It is not possible to tell from these experiments whether wind shear has played a part in the depth of the simulated layers, but it is likely to have had an effect on the increased lateral spread of the deeper NH BB layer (Fig. 3e) compared to the RH layers (Dacre et al., 2015).
The NH ESP AA simulation (not shown) places most of the ash in a sea level layer at the same time and location as in Fig. 6. This simulation appears to have been subject to a "runaway" effect beginning around midnight on 24 May, where layer heights are initially similar to the other simulations, but are then consistently calculated too low, and/or the ash settles too far within the NAME atmosphere. In this case, the NAME height configuration fails to maintain the altitude of the ash. However, the lateral location of the NH ESP AA ash cloud is similar to that of the RH ESP AA ash cloud in Fig. 3f, which shows that the sequential update scheme allows some preservation of the 2-dimensional aerial structure of the ash cloud, even when the layer height is not well captured. Fig. 6d shows that using PSD B gives a marginally higher and thinner mean ash layer than its PSD A counterpart in Fig. 6c. This may be attributable to the larger particles (30e100 mm) and higher proportion of 10e30 mm particles included in PSD A. The larger particles will have greater sedimentation velocities than smaller particles of the same shape and density (Webster and Thomson, 2011), leading to deeper ash layers as the large particles settle out faster. A recent study using NAME to simulate ash clouds with different PSDs assessed how differential settling of particles could lead to changes in ash layer depth (Dacre et al., 2015). In that study, it was found that the greater sedimentation rate of particles with diameters > 30 mm had a negligible effect on layer depth (at distances of at least~1500 km from the source) because they will likely have already been deposited in proximity to the source. In contrast to the Dacre et al. (2015) study, the simulations in Fig. 6aee are initialised downwind of the vent as opposed to the source itself.
Also, the PSD A simulations are updated with particles > 30 mm far from the vent which may act to "recharge" that size bin. Hence, the influence of large particles on the layer depth is not ruled out.

Particle size distributions at Stockholm
Simulated PSDs shown in Fig. 7a and b represent the average ash mass fraction up to 0.5 km asl, over a 0.25 Â 0.25 grid cell centred on Stockholm (59.4 N,18.1 E) between 2200 UTC 24 May and 0600 UTC 25 May 2011. The simulations include 5 updates up to 1215 UTC 24 May. PSD A and B are shown by the grey histograms. Particulate matter measured at the surface using an optical particle counter (OPC) at Hornsgatan, Stockholm between 2200 UTC 24 May and 0600 UTC 25 May 2011 was reported by Tesche et al. (2012) (see Fig. 10 of that paper). The OPC measures particles between 0.25 and 32 mm diameter. The OPC data is shown in Fig. 7c as a volume fraction (this is equivalent to mass fraction if a constant particle density is assumed). It is noted by Tesche et al. (2012) that the particle measurements should be taken as approximate with at least a factor of two uncertainty .
All of the simulated PSDs are shifted towards larger particle sizes compared to the observations. PSD B, which contains no particles >30 mm, is in best agreement with the position and shape of the measured OPC distribution. Tesche et al. (2012) report that during this pollution episode, there was a peak in the volume size distribution at 3.5 mm, however, loss of large particles in the instrument inlet could mean that the observed peak is shifted towards a smaller diameter. Those authors also state that particles from 1 to 7 mm diameter showed more than a 5-fold increase compared to 24 h before the start of the episode (dashed line in Fig. 7c).
At < 0.5 km asl all of the simulated PSDs are shifted towards larger particle sizes than the PSD used to initialise them due to differential settling. NH BB has deviated the least from PSD B here, which could be explained by its average layer depth and height. During an update, a grid cell from NH BB is given the same mass and PSD as that grid cell in RH BB, but it is shown in Fig. 6e that the NH scheme produces deeper and higher ash layers due to the effective height and depth estimates. In a deep layer, mixed particles of different sizes will take longer to vertically sort than in a thin layer. Therefore, at the same location downwind, there will be a greater proportion of large particles in the lower part of a thin ash cloud than in a deeper one of equivalent mass and PSD.

Ash mass concentrations at Stockholm
Lidar measurements have high vertical resolution and are useful in distinguishing the vertical structure of ash clouds (e.g. Winker et al., 2012). In this section, ash mass concentrations estimated from ground-based Raman lidar at Stockholm on 25 May 2011 (for full details see Tesche et al., 2012) are compared to the simulated concentrations. Coloured lines in Fig. 8 show 1-h average ash concentrations from the same simulations shown in Fig. 7. The DI max B profile now incorporates an additional simulation initialised from a retrieval at 1215 UTC 24 May. The upper and lower bounds of the grey shading in Fig. 8 are 1-h averages of ash mass concentrations estimated by Tesche et al. (2012) from 30 min mean lidar measurements on 25 May. The lower bounds assume a specific mass extinction coefficient of K ext ¼ 0.64 m 2 g À1 and a coarse mode fraction of f c ¼ 0.3 (i.e., 30% of aerosols are assumed to be coarse volcanic ash). For the upper bounds K ext ¼ 0.64 m 2 g À1 and f c ¼ 0.6. These bounds represent the most likely mass range according to Tesche et al. (2012). Those authors consider the lower bound to be the most reliable and partly attribute the peak in mass concentration at~1.0 km height from~0330 UTC onward to unscreened thin clouds and swollen aerosols. Fig. 8aec shows that between 0200 and 0600 UTC the simulated profiles capture the position of the peak in ash concentration centred at~2.5 km altitude. The position of the lower peak at0 .5 km is reasonably well captured at 0200e0400 UTC. The appearance of these double peaks in the simulations shows that NAME is able to model a bimodal profile from a single layer. However, the ash concentrations are significantly under-predicted below 1.0 km. Under-prediction within the boundary layer is also seen in the a priori and inversion derived a posteriori estimates of Moxnes et al. (2014) (see Fig. 12 of that paper), although the peak of their a posteriori estimate is larger than both lidar peaks and lies between the two in height. Those authors attribute the respective over-and under-prediction of ash mass above and below 1.0 km altitude to possible under-estimation of ash entrainment into the boundary layer by FLEXPART (FLEXible PARTicle dispersion model; Stohl et al., 2011). As a peak does appear at that level in the simulations in this study, entrainment is unlikely to be an issue here. This scheme does not account for uncertainty in the retrievals during the model initialisation and update. The uncertainty in column loading retrievals has previously been estimated at 50% (e.g., Kylling et al., 2014). A factor of two uncertainty is associated with individual mass concentration profiles derived from lidar observations . Taking these into account, a good fit between simulated and measured mass concentration values is achieved above the boundary layer. Below the boundary layer top the fit is less convincing, particularly from 0500 UTC onwards. Some of the observed material below the boundary layer top could possibly be attributable to normal pollution, but night-time measurements were chosen for the comparison to minimise contribution of particulates from traffic.

Applicability and limitations
In the method described, uncertainties will be introduced in the use of the classification scheme and the associated probability thresholds by which the pixel state is assigned. The retrieved values for ash cloud top height and ash column loading are also associated with uncertainties that will be introduced to the model state when it is updated, although no attempt has been made to quantify the effect of these here. Simplification of the model state for re-insertion into NAME will also introduce uncertainties. The ash layer is re-inserted as a single layer with a Gaussian vertical distribution, regardless of the structure predicted by the model, meaning detail will be lost at each update. Particles of all sizes will be fully mixed within the layer after the update regardless of their distribution within the layer beforehand. The PSD for individual grid cells is taken to be PSD A, PSD B or a distribution determined using the size-segregated output from NAME (PSD N). Altering particle sizes mid-run will impact the rate of ash fall-out and hence column loading. In the NAME configuration used here, particles within the boundary layer are subject to dry deposition (Webster and Thomson, 2011). Therefore, wrongly revising the layer height in the vicinity of the boundary layer will affect the rate of dry deposition and hence the longevity of the ash cloud. Similarly, if the height adjustment wrongly places ash co-located with or below meteorological cloud (or likewise, wrongly removes ash from these regions), ash column loading may also be affected by changing wet deposition rates (in cloud rain-out and below cloud wash-out, see Webster and Thomson, 2014). (This could also happen if the meteorological cloud is at the wrong height, but this is not investigated here.) Simulation skill generally improves with an increasing number of updates, but compared to the simple insertion scheme (DI max B), which is created using the same type of SEVIRI retrievals, the gains in skill are small (if at all), especially for a low number of updates. If the "Most Recent" composite simulation type from Wilkins et al. (2016) (where the simulation initialised from the most recent retrieval takes priority over those initialised from older ones) were used here, the differences in skill may be smaller still. The most time consuming runs are the updated ESP simulations, but it has been shown that after a series of updates the skill of the RH ESP AA simulation is no higher than those which use satellite data only. However, this comparison is only interested in the satellite observed ash and the ESP methodology is likely to be more accurate in predicting the presence of ash closer to the volcano.
In this study the NAME simulations are run forward for 6 h before being updated. For a~50 Â 50 subset of the SEVIRI image disk, the classifier code takes a few minutes and the 1D-Var code takes the order of an hour to run on a standard desktop PC. Once this data is already processed into an update-ready format, a single configuration of the update scheme (i.e., 4 to 5 updates of NAME) takes~2e3 h to run, with the majority of this time attributable to NAME. In an operational setting, where the update code would be optimised and the observation data processed upon being received, the update scheme would have a shorter running time than a series of simple data insertion simulations of say, 24, 18, 12 and 6 h in length. However, an operational forecaster may wish to run the simulations further ahead and update them when new observations become available, as in Fig. 5. The update scheme would then take slightly longer to run than the simple scheme. (One could save time by saving the output from before the end of the run, so that when new data is available, the model data is ready to be updated.) Of course, the resolution of the output and number of particles released will impact the computational cost and time. Reducing the resolution and increasing computing power would reduce run time.

Conclusions
A novel method of simulating the atmospheric dispersion of volcanic ash is presented using the NAME model. Physical properties of the Grímsv€ otn ash cloud retrieved from SEVIRI satellite imagery and a probabilistic ash/clear/cloud classification scheme are used to initialise the simulations and/or update them every 6 h. The aim of using the retrieved data to update the model state is to constrain positional and magnitude errors periodically, while the classification scheme is used to remove modelled ash where the scene is classified as clear, and allow the model to take priority where the scene is cloudy or ambiguous. With this method, satellite-data centred simulations which include little prior information about the eruption can be produced to be in good agreement with observations.
To assess the utility of different PSDs, initialisation and update methods, several configurations of this scheme are presented. Statistical comparison of the simulations with ash column loading retrievals from SEVIRI and IASI imagery is performed, in addition to qualitative comparison with ash layer height estimates, particulate matter and mass concentration measurements. Simulation skill improves after a series of updates and most scores show a small improvement on the simple data insertion scheme. The results show that simulations in which retrieved ash cloud height is utilised where possible (with an assumed layer depth of 1.0 km) tend to achieve higher skill than a simulation in which the layer height and depth is calculated from the output of the previous NAME run, although the latter performs well in a comparison of ground-based mass and PSD measurements. The PSD based on FAAM measurements provides simulations which, in general, score marginally better than those using the default PSD used by the London VAAC for NMSE and bias, but not FMS. For the simulations compared in this study, the PSD scheme (using NAME output PSD to determine the input for the update run) produces a simulation with marginal increases in NMSE, bias and FMS score compared to its counterpart in which the PSD is not updated.
The simulations under-predict ash mass concentration at Stockholm, particularly within the boundary layer. However, the observed bimodal vertical profile is captured by NAME in some of the simulations. Taking into account uncertainty on the lidar measurement-derived profiles and an assumed uncertainty in the simulated profiles, the simulations compared are in good agreement with the lidar observations.