Using data insertion with the NAME model to simulate the 8 May 2010 Eyjafjallajökull volcanic ash cloud

A data insertion method, where a dispersion model is initialized from ash properties derived from a series of satellite observations, is used to model the 8 May 2010 Eyjafjallajökull volcanic ash cloud which extended from Iceland to northern Spain. We also briefly discuss the application of this method to the April 2010 phase of the Eyjafjallajökull eruption and the May 2011 Grímsvötn eruption. An advantage of this method is that very little knowledge about the eruption itself is required because some of the usual eruption source parameters are not used. The method may therefore be useful for remote volcanoes where good satellite observations of the erupted material are available, but little is known about the properties of the actual eruption. It does, however, have a number of limitations related to the quality and availability of the observations. We demonstrate that, using certain configurations, the data insertion method is able to capture the structure of a thin filament of ash extending over northern Spain that is not fully captured by other modeling methods. It also verifies well against the satellite observations according to the quantitative object‐based quality metric, SAL—structure, amplitude, location, and the spatial coverage metric, Figure of Merit in Space.


Introduction
The eruption of Eyjafjallajökull in Iceland (19.60 ∘ W, 63.62 ∘ N) during April and May 2010 caused major disruption to air traffic as the ash cloud was transported over Europe and the North Atlantic, costing $1.8 billion to the aviation industry and approximately $5 billion to the global economy [International Air Transport Association, 2011].Volcanic ash transport and dispersion models (VATDMs) are important tools which are used by Volcanic Ash Advisory Centers (VAACs) for forecasting the transport and dispersion of ash in order to provide safety advisories to civil aviation authorities.The Numerical Atmospheric-dispersion Modelling Environment (NAME) [Jones et al., 2007] is a Lagrangian VATDM used operationally by the London VAAC, based at the UK Met Office.The eruption source parameters (ESPs; e.g., mass eruption rate, plume altitude and particle-size distribution) that form the source term required by models such as NAME can often be highly uncertain, which, in addition to structural uncertainty inherent in the model and its parameterizations, can lead to considerable uncertainty in model output [e.g., Devenish et al., 2012a].

Uncertainties in Eruption Source Parameters
NAME does not include some of the complex physical processes that govern the behavior of the plume in proximity to the volcanic vent, e.g., particle aggregation.In standard practice an "effective" source is therefore calculated to represent ash which has already undergone near-source processes and is transported distally.Note that here the term "ash cloud" is used to describe distal ash which has been transported downwind, and "plume" is used to describe the column of ash erupting from the vent.
The difficulties in calculating effective source terms include estimating an effective particle-size distribution (PSD).PSDs calculated from ash deposits do not necessarily represent ash within distal ash clouds.Calculated PSDs can also be highly method dependent and are usually measured after an eruption, rather than during the eruption when VATDMs are run operationally [Bonadonna and Houghton, 2005].An effective PSD that has been used operationally by the London VAAC is based on empirical measurements from Hobbs et al. [1991] and represents only the percentage of fine ash that survives near-source fallout (the fine ash fraction) [Webster et al., 2012].In addition to the PSD, particle density (g m −3 ) determines the mass and hence sedimentation 10.1002/2015JD023895 velocity of particles, but this is also very difficult to determine in an operational situation.A default density of 2300 kg m −3 is assumed by the London VAAC for all eruptions unless other information is available.
Often, mass eruption rate (MER) is calculated using an empirical relationship between the height of the plume above the volcano summit and the MER of material [e.g., Sparks et al., 1997;Mastin et al., 2009].One such relationship is given by: Q m = 140.876z 4.15  max , ( where z max is plume height above the vent in kilometer and Q m is MER in kg s −1 [Mastin et al., 2009].The calculated MER and an estimation of the fine ash fraction can then be used to calculate the effective source strength required by NAME.Typically, the mass of ash in the distal cloud is the order of ∼5% of the total erupted mass, although this percentage is usually not well known and can therefore cause large uncertainties in the modeled concentration of distal ash clouds [Devenish et al., 2012a].The empirical relationship in equation ( 1) does not take into account meteorological conditions at the time of the eruption or the effects of moisture at the source which influence the characteristics of an eruption plume [Devenish et al., 2012a].The relationship may also be biased toward larger events which are easier to study [Woodhouse et al., 2013].Eruptions in moist atmospheres, weak eruptions subject to high wind speeds, or those with a high proportion of large particles may therefore vary significantly from the calculated MER [Sparks et al., 1997;Woodhouse et al., 2013].Modeling methods have been developed to deal with some of these uncertainties.Plume models such as PLUMERIA [Mastin, 2007] and PlumeRise [Woodhouse et al., 2013] include some of the physical processes that govern the behavior of an erupting plume and can be used to make estimates of ESPs to use within VATDMs [e.g., Kristiansen et al., 2012].Inversion methods have also been used to estimate effective ESPs using satellite ima gery for both volcanic ash and SO 2 [Eckhardt et al., 2008;Kristiansen et al., 2010;Stohl et al., 2011;Kristiansen et al., 2012;Boichu et al., 2014;Moxnes et al., 2014;Kristiansen et al., 2015;Pelley et al., 2015].
Plume rise height determines the level at which ash enters the atmosphere and hence which direction it will travel due to the wind direction at that height.Radar plume height measurements can be used for estimating the plume rise height, but they are subject to some uncertainty due to interpolation between discrete bands, the curvature of the Earth, and local topography [Arason et al., 2011;Folch et al., 2012].Also, the height at which ash spreads laterally into the atmosphere may not be the maximum measured plume height.This could be due to strong winds causing a weak plume to bend over, moving the location of the maximum plume height to downwind of the vent or due to the vertical momentum of the eruptive column causing it to extend higher than the level of neutral buoyancy before relaxing back down to that level [Webster et al., 2012].The plume can then spread laterally and thin under gravity (a gravity current), until it approaches the ambient wind speed [Sparks, 1986;Bursik et al., 1992].The physics describing gravity currents is not currently included in most VATDMs (including NAME), but for small to medium-sized eruptions such as Eyjafjallajökull (2010), gravity currents may only dominate transport (over the effects of advection and turbulence) in the earliest phase [Bursik et al., 1992;Costa et al., 2013].
One basic effective emission profile in NAME is a column of ash extending from the vent to the maximum plume height with a uniform vertical distribution, but the emission profile can be prescribed in different ways to represent effective plumes affected by the near-source processes mentioned above.A number of studies have investigated the effect of the emission profile on the downwind ash layer thickness and on modeled ash concentration [e.g., Dacre et al., 2011;Devenish et al., 2012aDevenish et al., , 2012b;;Grant et al., 2012;Dacre et al., 2015].It has been found that ash layer thickness far from the source can be dominated by subgrid scale diffusion, wind shear and vertical resolution of the model output, over the effects of assumed emission profile shape and PSD [Devenish et al., 2012a[Devenish et al., , 2012b;;Dacre et al., 2015].
In calculating an effective source to account for the near-source processes that are not included in VATDMs, a number of uncertainties will be introduced.In this study we implement a data insertion modeling method to create a downwind "virtual" ash source that does not require the calculation of an effective eruption source.This does, however, require assumptions to be made about the downwind source, including ash layer depth, vertical distribution, and PSD.This method was introduced in Wilkins et al. [2014] for the 2010 Eyjafjallajökull eruption and showed promising initial results.Here we expand on that work to qualitatively and quantitatively compare the results of this method with three other previously studied simulations described in section 2.

Data Insertion
In data insertion, the model state is set to observed values where they are available, and the model is used to propagate that information with time.Here NAME is initialized using the physical properties of downwind ash clouds estimated from satellite observations.Instead of using these data to inform us about the eruption source, we create a downwind source term in NAME, where a source is created for each satellite pixel, and a set number of particles are released per source.Ash from all sources is released into the model atmosphere instantaneously.This method requires little prior knowledge of the eruption and hence allows us to run volcanic ash simulations without the usual NAME source term.In doing this we can ignore some of the uncertainties associated with estimating an effective source.However, this does not take account of observation uncertainty and therefore introduces uncertainties attributable to the derivation of the observation data, in mapping observations to formats required by the model, and in the assumptions about the downwind source mentioned above.
Ash column loading (ACL) and ash cloud height estimates for a range of times during the May 2010 phase of the Eyjafjallajökull eruption (see Table 1) are retrieved from Meteosat Second Generation Spinning Enhanced Visible and InfraRed Imager (SEVIRI) infrared channels.The retrieval algorithm optimally estimates ash cloud properties with a one-dimensional variational (1D-Var) method using the 8.7, 10.8, 12.0, and 13.4 μm wavelength channels (for details see Francis et al. [2012]).Some example ACL retrievals are shown in Figure 1.The retrieval is run on the pixels in which ash has been detected, using four different refractive index data sets; 1.5% Haematite mineral dust [Balkanski et al., 2007], volcanic dust [Volz, 1973], andesite and obsidian [Pollack et al., 1973], and the algorithm provides solution costs for each.The Pollack et al. [1973] andesite data set is used here because all of the retrievals in Table 1 attained the lowest total solution costs using that data set.This result is in agreement with the Francis et al. [2012] study and with Millington et al. [2012], who found that andesite refractive indices gave the best results for simulation of SEVIRI imagery for this eruption.
As the retrieval data is two dimensional, the vertical structure of the ash and the thickness of the layer are estimated based on observations of the Eyjafjallajökull ash cloud from various times in May 2010 reported in a number of studies [Marenco et al., 2011;Grant et al., 2012;Johnson et al., 2012].Here three options for the ash layer thicknesses are assumed: 0.5, 1.0, or 2.0 km for the entire ash cloud.The mass is distribute vertically following a Gaussian distribution with the standard deviation matching the standard deviation of a uniform distribution over the nominal thickness.This distribution is used so that the peak of the distribution is around

Forecasts
In this section the construction of our two main types of data insertion forecasts, which we refer to as layered model (LM) or single retrieval (SR) forecasts, are described.The SR forecasts are simulations initialized with data from one retrieval.They are named according to the retrieval in Table 1 used to initialize them, e.g., SR1 was initialized using retrieval R1.
For the LM forecasts, output from a number of SR simulations are combined into a composite simulation in order to capture ash erupted at different times and ash potentially obscured by cloud in certain retrievals.The maximum ACL value for each output grid cell from all SR simulations is used for the LM forecast, i.e., for a set of N SR simulations ending at the same time, the LM forecast ACL value for grid cell xy is given by where ACL xy is grid cell xy in the set of simulations SR1 to N (ACL SR1 xy , … , ACL SR N xy ).We only use ash-flagged pixels to create the SR forecasts and do not include a meteorological cloud detection scheme to indicate the presence of cloud (i.e., where ash may be obscured).Therefore, taking the maximum value per grid cell from a number of simulations in the LM method should provide more conservative forecasts (for observed ash) than the SR method.Take the example case where ash is detected in an early retrieval and inserted into NAME, but undetected due to cloud cover in a later retrieval, and hence not inserted into NAME.In this case, assuming that transport errors are not prohibitively large, we would like to take the maximum value at the location at which the observed ash has been transported in the model, rather than preferentially choosing the value from the most recent simulation.A cloud detection mask would provide more information on the areas that are likely to be clear or to contain cloud, which could be used to indicate which areas are likely to give reliable ash (or no ash) retrievals.We aim to employ such a mask in future work.It is noted that using the maximum value method, ash mass will not be conserved in the layered forecasts.
For short-lived or instantaneous eruptions, or where there is minimal cloud cover, gains from using a large number of retrievals to create simulations may be negligible, and a small number of retrievals may be sufficient to capture much of the ash cloud mass and structure.For long-lived eruptions with some cloud cover, or short-lived eruptions with a lot of cloud cover, it may be appropriate to allow a larger number of SR simulations to contribute to the final LM forecast.It is noted that this method will be most effective at times with little cloud cover and ineffective when parts of the ash cloud are consistently undetected in the retrievals.
The LM forecasts are named according to the number of contributing SR simulations, starting with SR1, e.g., LM(3) is a combination of SR1, SR2, and SR3.For both SR and LM forecasts the number denoted in kilometer is the ash layer thickness assumed at initialization (e.g., SR1 1.0 km).
The final layered forecasts could be created in a number of ways, so alongside taking the maximum ACL, we will show three other alternative options.The first is a weighted average of the simulations, where ACL values for each xy grid cell are weighted according to the time since their insertion: ACL forecast where w is an N length vector of weights and t is an N length vector of the duration in hours of simulations SR1 ∶ N. The neighborhood is somewhat arbitrarily chosen to be 48 h, beyond which simulations would have no impact on the forecast (the longest running simulation used here is 38 h).It is noted that a different neighborhood value would provide different weights for each simulation.For the second alternative forecast option we take the mean of simulated values to give the forecast ACL value for xy.To create the final forecast option, the ACL value for grid cell xy from the most recent simulation replaces an ACL value from an older simulation, but only if it is greater than zero.These methods will be denoted the weighted average, mean, and recent value methods, respectively.
We compare the results of the data insertion forecasts with three other forecasts.The NAME "best guess" forecast is a NAME simulation initialized using a posteruption analysis of plume eruption height measurements and a fine ash fraction of 5% [see Webster et al., 2012].The NAME a posteriori and FLEXible PARTicle dispersion model (FLEXPART) a posteriori forecasts are simulations using NAME and FLEXPART [Stohl et al., 1998], respectively, initialized using ESPs derived with the inversion method of Kristiansen et al. [2012].Details of these methods are summarized in Table 2.

SAL
SAL is a quality measure comprising three components: structure (S), amplitude (A), and location (L) [Wernli et al., 2008].This diagnostic tool has previously been applied to NAME output by Dacre [2011] for air quality forecasts and has been adapted for use here.Perfect forecasts will have scores of 0 for all components of the assessment, with ±2 being the worst possible score for both S and A and +2 being the worst score for L. Objects from the observed and modeled ACL fields must be identified for S and L evaluations.An object is a group of adjacent grid cells which have an ACL value above a given threshold.In Wernli et al. [2008] the threshold is taken as one fifteenth of the maximum value in the field.The SEVIRI ash detection limit is considered to be ∼0.2-0.3 g m −2 or more conservatively 0.5-1.0g m −2 [Devenish et al., 2012a;Francis et al., 2012;Prata and Prata, 2012], so the threshold used here is chosen to be 0.5 g m −2 .Model ACL values below this threshold are excluded from all S, A, and L calculations.Here groups of at least two adjacent grid cells constitute an object so that the number of objects is not increased by spurious single ash-containing grid cells.Objects are given as where M is the number of objects in the domain.The domain is -45 to 50 ∘ E, 30 to 80 ∘ N.

Structure (S)
The structure component compares normalized object sizes and shapes.A scaled ash mass (scaled areaintegrated ACL), V n , is calculated for each modeled and observed object: where xy is the grid cell location within the modeled or observed field, R xy is the area-integrated ash column loading (i.e., ash mass) in grid cell xy, R n = ∑ (xy)∈O n R xy is the ash mass in object O n and R max n is the maximum grid cell ash mass within the object O n .Weighted means (V) of the scaled masses (V n ) for all objects are calculated separately for the modeled and observed fields.Each object has a weight proportional to its ash mass R n : S is calculated as the normalized difference in observed (obs) and modeled (mod) V values: S scores range between −2 and +2.A negative S score indicates that predicted fields cover too small an area or are too peaked, which can be achieved when the maximum modeled ACLs are much higher than the observed or where the observed field is more widespread [Dacre, 2011].A positive S score indicates a predicted field that is too spread out and/or flat, which can be achieved when there are many more objects in the observed field, giving a smaller average observed object size [Dacre, 2011].

Amplitude (A)
The amplitude component of SAL compares the normalized difference between observed and modeled ash mass averaged over the domain area.A scores range between −2 (−1 is a factor of 3 underprediction of the domain-averaged values) and +2 (+1 is a factor of 3 overprediction) [Wernli et al., 2008], with 0 indicating a perfect match in amplitude.A is given as follows: where R mod and R obs are the modeled and observed ash masses averaged over all grid cells in the domain (−45 to 50 ∘ E, 30 to 80 ∘ N), respectively, i.e., R = ∑ (xy)∈Domain R xy / domain area.Zero ash mass is assumed where no ash is detected.

Location (L)
The location component compares the distribution of the modeled and observed masses.It is made up of two parts, L 1 and L 2 .L 1 is a normalized distance measure, comparing the centers of mass between modeled and observed fields: where |C mod − C obs | is the distance between the modeled and observed centers of mass.D is the maximum distance within the entire domain.L 1 ranges between 0 (centers of mass in the same location) and 1 (centers of mass are as far apart as they can be within the domain).For a perfect match L 1 = 0, however, a field containing two separate ash clouds could have the same center of mass as a field with one ash cloud.L 2 measures the difference in distribution of masses in the modeled and observed fields about the respective centers of mass.
For both fields, the weighted average distance (H) between the center of mass (C) and the center of mass of each object within the field (C n ) is given as follows: 10.1002/2015JD023895 where |C − C n | is the distance between the center of mass of the field and the center of mass of the object n.
If there is only one object in the domain, H = 0. L 2 is calculated based on the difference between modeled and observed H values: If both the observed and modeled fields have only one object, L 2 = 0.A factor of 2 is used to scale L 2 to the same range as L 1 .L is given by the sum of the two components, L 1 and L 2 , and has a range between 0 and 2: We note that the magnitude of L is dependent on the size of the domain.However, we use the same domain for each calculation so that the scores can be considered relative to each other; therefore, the size of the domain will not affect our interpretation of the results.L does not take account of the objects rotation or rotation about the center of mass [Wernli et al., 2008].An absolute score (|S| + |A| + L) can range from 0 to 6, with values closest to 0 indicating the best agreement between the model and observation.

FMS
The Figure of Merit in Space (FMS) is used to measure the spatial coverage of modeled compared with observed fields [Galmarini et al., 2010] and is defined as the area of intersection of modeled and observed fields / the area of union (equation ( 13)).The method of interpolation from observation to model grid will therefore affect the results [Mosca et al., 1998], but again, we consider the results in relation to each other, so this will not affect our interpretation.Spatial ash coverage of both the modeled and interpolated observation fields is calculated for grid cells with ACL values exceeding 0.5 g m −2 .Good model output will have a high FMS value, but a low FMS could correspond to two similar shapes shifted in space [Mosca et al., 1998].Hence, this is taken to be a good complement to the SAL metric which does not measure spatial overlap.FMS is given as follows: where B mod and B obs are the modeled and observed ash areas, respectively.

Observed and Retrieved Ash
A thin filament of ash can be seen extending over northern Spain at 0900 UTC 8 May 2010 in the brightness temperature difference (BTD) image shown in Figure 2a.This filament is not well captured using the ash detection method in the 1D-Var retrieval (Figure 2b).However, turning off the final ash detection test which utilizes  ratios adds many more ash flagged pixels, and the structure is partially observed at that time (Figure 2c). ratios are ratios of effective absorption optical depth between pairs of spectral channels [Pavolonis et al., 2013].They facilitate isolation of cloud microphysical properties from contributions from the surface and atmosphere through conversion of spectral radiance to effective absorption optical depth [Pavolonis, 2010].The calculation of  ratios negates the requirement for large offline look-up tables in the retrieval of ash physical properties [Pavolonis et al., 2013], and they are used in the 1D-Var retrieval as a means of removing flags from pixels that have previously been tentatively flagged as containing ash [Francis et al., 2012].This technique largely removes the need for algorithm tuning and, under some conditions, has higher skill scores than simple band differencing but, under some circumstances, overconstrains ash cloud extent.Indeed, the ash detection method used within the Prata and Prata [2012] retrieval, which does not utilize  ratios, detects a larger amount of ash over Spain (Figure 2d).For an arid land surface such as Spain, emissivity is low and quite variable, so the 8.7 μm signal to which the  ratio test is sensitive is also likely to be variable.This means that turning the test off could be reasonable in this case (P.Francis, personal communication, 2015).Note that the retrievals we use in creating the data insertion simulations have the  ratio test turned on.[Francis et al., 2012], (c) same as Figure 2b with the  ratio test removed from detection steps (see text), and (d) retrieved ash column loading using the look-up-table method of Prata and Prata [2012].

Forecasts 3.2.1. Qualitative Assessment
The NAME best guess forecast (Figure 3a) places ash in a similar location to the NAME a posteriori forecast (Figure 3b) with increased ACL in general.The filament of ash extending over northern Spain is apparent in both of these simulations, but the best guess extends it slightly farther east.The overall shape of the FLEXPART a posteriori field (Figure 3c), which uses a different meteorology to the others, is similar to the other forecasts, but the filament of ash over northern Spain is not visible.
The LM data insertion forecast using retrievals R1-R5 (see Table 1), where a 1.0 km thick ash layer was assumed during insertion (hence, the forecast is denoted LM(5) 1.0 km), captures the ash cloud structure over Spain well, as seen in Figure 3d.(This forecast has effectively a 6 h lead time; the latest image to be inserted into NAME is valid from 6 h previous to the forecast time.)Qualitatively, the position of the ash agrees well with the BTD imagery, but the ACL values over Spain are generally lower than those in Figures 2c and 2d.
The ash filament over Spain in the LM forecast (Figure 3d) is more horizontally spread than the NAME best guess simulation and the observations.The LM forecast is a combination of a number of simulations, so it is likely to have more horizontal spread than a single simulation, particularly if ash is added in the wrong location or height in one or more of the simulations due to errors in the retrieval or layer thickness estimates.Errors in the meteorological fields at the ash insertion heights could also contribute to the increased spread.As mentioned above, the SEVIRI ash detection limit is considered to be ∼0.2-1.0 g m −2 , so some ash may not have been detected around the periphery of the filament over Spain.Much of the modeled ash over Spain is below the lower bound of the conservative threshold (0.5 g m −2 ).The LM forecast has placed some ash over WILKINS ET AL.
ASH CLOUD MODELING USING DATA INSERTION  1) assuming a 1.0 km thick ash layer.Red lines show the outlines of objects identified for the SAL analysis.
Spain above that threshold, although this is shifted slightly to the south of the observed filament, while the other forecasts placed very little, if any, ash above that threshold over Spain.
Vertical cross sections of the modeled ash concentrations over northern Spain at 43 ∘ N between −10 and 0 ∘ E (with 200 m vertical resolution) are shown in Figure 4. Figure 4a shows that the NAME best guess ash layer is centered around ∼5 km altitude and is 2-3 km thick.The NAME a posteriori ash layer is similar, centered at ∼5 km altitude, and is 1-2 km thick (Figure 4b).The FLEXPART forecast (not shown) has a thin layer centered at ∼2.5 km and another region of ash at ∼9 km altitude.The concentrations for these layers are very low (<4 μg m −3 ).The LM(5) 1.0 km forecast layer is centered at ∼5.5 km altitude and is ∼2-3 km thick, with an area of low concentration ash at −2 to 0 ∘ E, ∼8-11 km altitude (Figure 4c).The 1D-Var retrieved ash cloud height values (with the  ratio test turned off ) at this latitude are shown as black asterisks.This shows that the three NAME ash layers and the retrieved ash cloud height are in reasonable agreement over this region.
The difference in ash position across Spain between the NAME and FLEXPART a posteriori forecasts may be due to the accuracy of the MetUM and ECMWF wind fields at the altitudes at which the ash is transported.There are differences in the derived ESPs which could also be causing this discrepancy.These differences are due to the inversion algorithm finding different optimal ESPs based on the different VATDMs and meteorologies used.
Figure 5 shows a series of SR forecasts with the observed ash clouds that were used to initialize those simulations outlined in red.It is apparent from Figure 5a that much of the ash cloud coverage over Spain in the LM forecast (Figure 3d), specifically at the eastern edge, has been contributed by SR1.The SR1 forecast does not, however, include ash in the region 47 to 60  ash had not yet been emitted at the time of the R1 retrieval.This shows that in this case, it has been necessary to use retrieval data from different times to create the LM forecast.
Figure 6 shows the three alternative data insertion forecast methods.Figure 6a shows the weighted average of the same five simulations used to create LM(5) 1.0 km, Figure 6b is the mean of the same five simulations, and in Figure 6c preference has been given to recent, nonzero values.The weighted average and mean fields have generally lower ACL values and are spread wider than the recent value forecast field.Figure 6c is the most similar to the observation for that time (Figure 2b), as would be expected.None of the fields are as spread as the maximum value method (Figure 3d) nor do they show ash >0.05 g m −2 (the lower contour level in the figure) as far east in the filament over northern Spain.As the ash in the tip of this filament has been shown to mainly originate from the SR1 simulation, creating the forecasts using the mean of the simulations dilutes the ash from SR1 in that region.In the recent value method, much of the ash in this region is below the range shown in the plot due to the influence of the most recent simulations, which have very low ACLs here.
The data insertion forecasts do not include newly erupted ash south of Iceland, probably for two reasons.First, there is no constant ash injection from the volcano vent included in the data insertion method; therefore, newly erupted ash will only contribute if new images are used.eruption plume reported in Arason et al. [2011] show that the plume height was ∼4-5 km from 6 to 8 May, indicating that the eruption rate was fairly continuous over that time.Second, the newly erupted ash is not fully visible in a number of the retrievals.This may be due to a high optical thickness of the newly erupted ash cloud making it opaque and hence undetectable using the BTD method; the difference between the two channels tends toward zero for increasingly opaque clouds [Prata and Grant, 2001].Also, it is possible that water is present in the plume, that there is meteorological cloud obscuring the ash signal, or that there is a high concentration of large particles making the detection difficult [Rose et al., 1995;Stevenson et al., 2015].

Quantitative Assessment 3.2.2.1. SAL
SAL is performed on the model output for 0900 UTC 8 May 2010.The "observed" field compared here comprises the maximum values over the previous 1 h from each pixel retrieved using 1D-Var.The aim of this is to create a slightly smoother observation field because the retrievals are generally patchier than modeled fields.This period coincides with the 1 h averaging time of the NAME forecasts.Observed and modeled fields must be compared on the same grid; therefore, the "smoothed" observed field is mapped onto the model grids using nearest neighbor interpolation.
Absolute SAL scores are shown in Figure 7a.The four model runs with lowest (best) absolute SAL scores are SR5 1.0 km, SR5 0.5 km, SR6 0.5 km, and SR6 1.0 km (see section 2.2 for a description of the notation).Hence,  according to this measure, the SR simulations with short lead times are the best performing of the forecasts studied here.The highest (worst) overall SAL scores were achieved by SR1 0.5 km, SR1 2.0 km, SR1 1.0 km, and NAME best guess.Hence, using later observations to initialize the model in the SR forecasts generally improves the SAL score, i.e., forecasts get worse with longer forecast times (as would be expected).Figure 7a shows that assuming a thinner layer depth for the entire ash cloud during insertion gives a slightly better score for the SR forecasts, indicating that 0.5 km may be a reasonable assumption for the layer thickness at those times.However, this increase in skill is very small.
Negative S values are achieved by the three former of these forecasts and the FLEXPART a posteriori forecast.This indicates that those forecasts have predicted fields that cover too small an area and/or are too peaked.It can be seen in Figure 3c that the FLEXPART a posteriori field is peaked immediately south of Iceland.
The NAME best guess and NAME a posteriori fields in Figures 3a and 3b, respectively, are also peaked at that location, but these and all other forecasts have positive S values, indicating they are too spread and/or flat.The spread of the fields in Figures 3a and 3b compensate for the peaked values, giving a positive S score.It can also be seen that the NAME data insertion LM(5) 1.0 km forecast (Figure 3d) is more spread and has larger objects than the observed field, resulting in a positive S score.All of the model fields contain fewer objects than the observation field (not shown).
Forecasts with the best A scores (Figure 7c) are LM(5) 2.0 km, LM(5) 1.0 km (Figure 3d), LM(5) 0.5 km, and LM(6) 2.0 km.LM forecasts show a general improvement in the A score compared with the SR forecasts but have a worse overall SAL score due to the large S component, which is probably contributed by earlier retrievals.To test whether the poor S score is contributed by earlier retrievals, simulations SR1, SR2, and SR3 are removed from the LM(5) 1.0 km forecast (Table 3).This gives improved S scores for the three forecasts tested, LM(SR2-SR5) gives an improved A score, and LM(SR3-SR5) and LM(SR4-SR5) give improved L scores.All absolute SAL scores are improved by excluding the simulations initialized using the earliest retrievals from the forecast.However, excluding these also removes some of the ash over northern Spain (not shown).  1 for retrieval times.LM(5) (Figure 3d) is included for comparison.The bottom three rows show SAL scores for the weighted average, mean, and recent value forecasts combining simulations SR1-SR5.In all simulations a 1.0 km thick ash layer is assumed.See text for details.
For the LM forecasts in Figure 7c the best A score is achieved using a combination of SR1-SR5.The NAME best guess A score indicates an overprediction of ash mass.This could be due to the satellite sensor failing to detect ash where the model has included it or too much ash having been released in the model.The combination of positive S scores and negative A scores for many of the other forecasts could indicate that there is too much diffusion in NAME, ash having been inserted at the wrong height, or transport errors leading to spread out clouds with large objects compared with the observation.NAME output represents an ensemble average over unresolved motions, and this is likely to be wider than an instantaneous ash plume.The overspreading leads to large areas of ACL below the lower threshold used here (i.e., ash outside the red outlines in Figure 3).That ash is not included in the SAL calculations, reducing the modeled ash mass available for comparison, and making the A score more negative.For the LM forecasts, the high S scores could be due to the same reasons mentioned above or due to the number of simulations contributing to the forecast leading to a wider ash cloud, as previously discussed.
The best L scores (Figure 7d) are achieved by SR runs utilizing retrievals from 3-6 h previous to the forecast time: SR6 0.5 km, SR6 2.0 km, SR6 1.0 km, SR5 2.0 km, and SR5 1.0 km.This is likely to be because observed data have been propagated by the model for a short period, and the ash detection is consistent over that time.The worst L scores are achieved by NAME best guess, SR2 2.0 km, SR2 1.0 km, and SR2 0.5 km.This may indicate that R2 was a poor retrieval or that the assumptions made during insertion into NAME were unsuitable for that time.
The SAL scores for the weighted average, mean, and recent value data insertion forecast methods are shown in Table 3.This shows that these three alternative methods score better overall than the LM(5) 1.0 km forecast (created using the maximum value method), due to smaller S and L scores, but do not improve on the A score.This is likely to be because the fields for these methods do not contain as much ash around the peripheries of the ash clouds as the LM(5) 1.0 km forecast, reducing the area of ash exceeding the 0.5 g m −2 ACL threshold for which the S component is calculated.The recent value method has the best SAL scores of the three methods, which are the same as those for SR5 1.0 km.

FMS
Using the same model and observation fields as for SAL above, the Figure of Merit in Space is also calculated.The result is shown in Figure 8.This indicates that the SR simulations tend to have the best degree of spatial overlap with the observations, generally improving as the time since model initialization decreases.However, SR2 simulations show worse skill than all other data insertion simulations, again indicating that SR2 may be a poor simulation.The LM forecasts perform similarly to the a posteriori forecasts.The reason for the LM forecasts generally scoring worse than the SR forecasts is attributable to wider forecast fields resulting from combining a number of simulations.The NAME best guess forecast has the lowest score, as it has the widest spatial ash coverage (and the least input of observed data).
One reason for the general trend in increasing simulation skill with decreasing time between the last retrieval used and the forecast time is that the most recent inserted retrieval has been propagated by the model for a shorter period.That retrieval and the one used for comparison should have a high correlation, i.e., similar parts of the ash cloud are likely to be observed/obscured, and hence, simulations using the later retrievals WILKINS ET AL. should score better.Another possible reason is that, in this case, the ash cloud may be better observed in the later retrievals.A good earlier retrieval (observing most of the ash cloud and with very little cloud to obscure the observation) may give a better forecast than a later, poor retrieval.This may be the reason for the SR2 simulations generally scoring worse than the SR1 simulations.
The FMS scores for the weighted average, mean, and recent value forecast methods using SR1-SR5 are 54%, 52%, and 59% (same as SR5 1.0 km), respectively.These are higher scores than those achieved by the LM(5) 1.0 km forecast.Again, this is likely to be for the same reason as for the increase in skill for the S component of SAL; the fields for these methods do not contain as much ash as the LM(5) 1.0 km forecast, reducing the area of ash exceeding the 0.5 g m −2 ACL threshold for which the FMS score is calculated.

Applicability
For near-instantaneous eruptions, one or two good retrievals (and meteorological data) may be sufficient to simulate ash transport at long-range.For long-lived eruptions, such as Eyjafjallajökull, a good quality recent retrieval may be sufficient to predict ash transport in the short term, but to predict at a number of times over a longer range, a number of retrievals are required to maintain the addition of ash mass.For example, Figure 5a shows that a single retrieval from 38 h previous to the forecast time gives model output in which a large amount of ash is missing due to the continued eruption.Frequent updates would also be useful for the short emission to constrain advection errors.
To further demonstrate the applicability of the data insertion method, the shorter eruption of Grímsvötn (Iceland, May 2011) has been modeled, and the ACL results are shown in Figure 9a, with the corresponding SEVIRI BTD image shown in Figure 9b.In Figures 9c and 9d, part of the April 2010 phase of the Eyjafjallajökull eruption is shown.In both cases, four SR simulations assuming 1.0 km thick ash clouds have been combined to create LM forecasts (using the maximum value method) at 1200 UTC 24 May 2011 and 1200 UTC 16 April 2010, respectively.Both forecasts are shown 24 h after the first retrieval was inserted and 6 h after the last.In a study of the effects of clouds on detection of ash in simulated SEVIRI imagery using the BTD method assuming a detection threshold of 0.2 g m −2 ACL, Kylling et al. [2015] found that on average clouds reduced the detection of ash affected pixels by ∼12% in the period 14-21 April 2010 (i.e.∼25% were detected in cloudless scenes and ∼13% in cloudy scenes) and up to 40% in some images for the Eyjafjallajökull eruption.The effects of clouds on images from 15 April 2010 to 6-8 May 2010 were found to be small or negligible but were larger on 16 April 2010.For the eruption of Grímsvötn in May 2011, fewer ash contaminated pixels were detected and the average reduction in detection of those pixels due to clouds was ∼6% (i.e. from 10.0% of pixels detected in cloudless scenes to 3.6% in those with clouds).
In the early phase of the Grímsvötn eruption in May 2011, SO 2 traveled northward [Kylling et al., 2015], and a small amount of ash was detected north of Iceland in SEVIRI imagery by Cooke et al. [2014].During the later phase, differential settling and different emission altitudes of ash and SO 2 particles caused SO 2 to travel toward the north and most of the ash southeastward [Moxnes et al., 2014].As retrievals of the early phase are not included in this study, ash only traveled in a southeasterly direction in the data insertion simulations used to create the forecast shown in Figure 9a.Qualitatively, the position of the observed ash cloud, which shows as a dark blue region over the North Sea between Scotland and Norway in Figure 9b, agrees well with the forecast in Figure 9a.
For the April 2010 Eyjafjallajökull case in Figure 9c (note a different color scale is used), it can be seen that the transport pattern of the ash is similar to the pattern of negative BTD in Figure 9d.However, as meteorological cloud may have significantly reduced the ash detection on 16 April, it is likely that additional information is required to capture the majority of ash in this case.

Advantages and Limitations
While simpler than other data assimilation methods, data insertion is more complex than methods based solely on eruption height, and it introduces different issues.The data insertion method is strongly dependent on good satellite observations.Compared to methods such as inversion modeling, which accounts for errors in the observations and the model forecast, this method is largely data centered and does not consider errors in the satellite retrievals.Ash will be missed from the simulations if it has erupted since the retrieval, if it is obscured by cloud, or if it is near to the volcano, where it may be undetected either because it is too optically thick or has a high water content (see Figure 3d).The issue of missing ash close to the source could be overcome by including ash initialized from the volcano location in NAME.This is beyond the scope of this paper but will be addressed in future work.
While missing ash is a major limitation, especially close to the source where the concentration could be very high, the data insertion method only relies on images of downwind ash and does not make any assumptions about the source (e.g., eruption plume height, MER and fine ash fraction) nor does it require modeling of 10.1002/2015JD023895 near-source processes (e.g., particle aggregation, transport by gravity currents, and others).This means that it could be applicable to volcanoes in remote areas which are not well observed via ground-based monitoring.It is reasonable to assume that this method could also be applied to atmospheric SO 2 transport as long as good satellite retrievals can be performed, but this has not been attempted here.
Another limitation of the data insertion method is that it requires estimation of the layer thickness, vertical distribution of ash, and PSD, which may be difficult to obtain in an operational setting.Where these parameters are unknown, it is suggested that a set of default values could be used.However, the use of a set of default values could introduce large errors in the forecast position.Also, these are properties that can generally only be measured for a point location or along a trajectory (e.g., ground based lidar, research flights and others) and are not available for the entire ash cloud.A small number of observations (if available) must, therefore, be applied to a larger region of the ash cloud.The data insertion method does, though, have the advantage of enabling the modeler to update the forecast with new retrievals or different observations as they become available.Updating the model in this way could allow inclusion of observed features even if they are not resolved by the model, and if a good retrieval close to the forecast time is available, model transport errors may be relatively small.
In the NAME configuration used here, vertical distributions other than uniform or Gaussian would require the separation of sources into a number of layers in order to represent the vertical ash profile in the NAME input file.This would involve some simplifications of the profile and increase the number of sources released by the number of vertical layers used.Although beyond the scope of this paper, a next step would be to use Cloud-Aerosol LIdar with Orthogonal Polarization data to better characterize the vertical distribution of the ash layer and determine its effect on the simulated ash cloud.
The manner in which simulations are used to create the forecast will have a significant effect on the result.As discussed above, the LM maximum value method is conservative and is used to overcome obscured ash in one or more retrievals.The LM method does not remove false alarms which could be present due to errors in the observation, the translation from observation to model input format, the dispersion model itself, or the meteorological data used to drive the model.The weighted average method places more trust in newer simulations initialized from later retrievals, which are likely to contain fewer cumulative transport errors than earlier ones, but also does not allow distinction between good and bad retrievals.By taking the mean, we give all simulations equal weight, but a simulation in which ash is obscured by cloud would reduce the column load in the forecast.The recent value method gives the same results as simply taking the SR5 simulation here, although this may not always be the case.

Conclusions
We discuss the application of a data insertion method, where the NAME dispersion model is initialized from satellite retrievals of downwind ash clouds, using the 8 May 2010 ash cloud from the eruption of Eyjafjallajökull as the main case study.The model results are compared against other established methods and satellite imagery.We also briefly discuss the application of this method to the April 2010 phase of the Eyjafjallajökull eruption and the May 2011 Grímsvötn eruption.
For the 8 May 2010 Eyjafjallajökull case, the data insertion forecasts perform as well, according to the SAL and FMS metrics, as the NAME best guess forecast, which is initialized from the volcanic source alone and includes measured eruption plume height data in its ESPs.
Layered model (LM) data insertion forecasts score worse overall than the single retrieval (SR) forecasts using SAL and FMS.Poorer S, L, and FMS scores reflect the broader, more extensive ash clouds that are generally provided by the LM forecasts compared with the observations.The LM forecasts thus provide a more conservative estimate designed to ensure most observable ash is captured.The weighted average, mean, and recent value forecast methods give better absolute SAL and FMS scores than the LM method, for the same reasons as above.However, the LM method achieves significantly better A scores and produces a more similar pattern of ash over northern Spain on 8 May 2010 to that observed in SEVIRI imagery.
For satellite images in which a lot of ash is obscured, the data insertion method alone is likely to be insufficient, and more information such as a cloud mask or a combination of data insertion and initialization from the volcano source may be necessary.

2.
Observed Eyjafjallajökull ash cloud at 0900 UTC 8 May 2010.(a) Brightness temperature difference image with negative values shown in blue, (b) 1D-Var ash column loading retrieval

Figure 3 .
Figure 3. Ash column loading forecasts for 0900 UTC 8 May 2010 described in Table 2: (a) NAME best guess, (b) NAME a posteriori, (c) FLEXPART a posteriori, and (d) NAME data insertion forecast LM(5) 1.0 km; a combination of SR simulations initialized using retrievals R1-R5 (see Table1) assuming a 1.0 km thick ash layer.Red lines show the outlines of objects identified for the SAL analysis.

Figure 4 .
Figure 4. Cross sections of modeled ash concentrations at 0900 UTC 8 May 2010, 43 ∘ N over Spain.(a) NAME best guess forecast, (b) NAME a posteriori forecast, (c) NAME data insertion forecast LM(5) 1.0 km shown in Figure 3d.1D-Var retrieved ash cloud heights (with the  ratio test turned off ) from between 42.95 and 43.05 ∘ N are plotted with black asterisks.

Figure 6 .
Figure 6.Ash column loading forecasts for 0900 UTC 8 May 2010 by combining simulations SR1-SR5 using the (a) weighted average, (b) mean, and (c) recent value methods.

Figure 7 .
Figure 7. SAL scores for the smoothed observed field versus modeled fields at 0900 UTC 8 May 2010.FLEXap, NAMEap, and NAMEbg are the FLEXPART and NAME a posteriori forecasts and NAME best guess forecast, respectively.(a) Absolute SAL scores, (b) structure, (c) amplitude and (d) location.

Figure 8 .
Figure 8. Figure of Merit in Space scores for the smoothed observed field versus modeled fields at 0900 UTC 8 May 2010.The nomenclature is the same as Figure 7.

Figure 9 .
Figure 9. (a) Grímsvötn ash cloud at 1200 UTC 24 May 2011 modeled using the layered model (LM) data insertion method, (b) the corresponding BTD SEVIRI image for that time, and (c and d) the same as Figures 9a and 9b, respectively, but for the Eyjafjallajökull ash cloud at 1200 UTC 16 April 2010.

Table 1 .
SEVIRI 1D-Var Retrievals of the Eyjafjallajökull Ash Cloud Used for 2, specifically focussing on the 8 May 2010 Eyjafjallajökull ash cloud.WILKINS ET AL.ASH CLOUD MODELING USING DATA INSERTION 307

Table 2 .
Previously Studied Model Simulations Used Here for Comparison to the Data Insertion Forecasts WILKINS ET AL.ASH CLOUD MODELING USING DATA INSERTION 309 FLEXPART ECMWF Same as NAME a posteriori but with FLEXPART a posteriori replacing NAME, both in applying the inversion method to estimate the source and in calculating the forecast, and with the European Centre for Medium-Range Weather Forecasts (ECMWF) meteorological data.

Table 3 .
SAL Scores for LM Forecasts (Created Using the Maximum Method), Combining Simulations Initialized From Different Retrievals Are Shown in the Top Three Rows a a See Table WILKINS ET AL.ASH CLOUD MODELING USING DATA INSERTION Kate Wilkins is supported by the Natural Environment Research Council [Consortium on Risk in the Environment: Diagnostics, Integration, Benchmarking, Learning and Elicitation (CREDIBLE) and Cooperative Awards in Science and Technology (CASE, Met Office); grant reference NE/J017450/1].We acknowledge use of the NAME atmospheric dispersion model, associated NWP meteorological data sets and 1D-Var code made available by the Met Office, and also EUMETSAT for the provision of SEVIRI data via the Earth Observation Portal (available upon registration).Data used to generate figures, graphs, plots, and tables are freely available via contacting the lead author: kate.wilkins@bristol.ac.uk.Thank you to Pete Francis for the help with implementing the 1D-Var method used here, to Heini Wernli for use of the SAL code, and to Shona Mackie, Natalie Harvey, and Luke Western for helpful comments on a draft of this paper.Thank you also to the reviewers for helpful comments.