A new framework for evaluating dust emission model development using dichotomous satellite observations of dust emission

Dust models are essential for understanding the impact of mineral dust on Earth's systems, human health, and global economies, but dust emission modelling has large uncertainties. Satellite observations of dust emission point sources (DPS) provide a valuable dichotomous inventory of regional dust emissions. We develop a framework for evaluating dust emission model performance using existing DPS data before routine calibration of dust models. To illustrate this framework's utility and arising insights, we evaluated the albedo-based dust emission model (AEM) with its areal (MODIS 500 m) estimates of soil surface wind friction velocity (u_s*) and common, poorly constrained grain-scale entrainment threshold (u_*ts) adjusted by a function of soil moisture (H).
The AEM simulations are reduced to their frequency of occurrence, P(u_s* > u_*ts H). The spatio-temporal variability in observed dust emission frequency is described by the collation of nine existing DPS datasets. Observed dust emission occurs rarely, even in North Africa and the Middle East, where DPS frequency averages 1.8 % (~7 days y⁻¹), indicating extreme, large wind speed events. The AEM coincided with observed dust emission ~71.4 % of the time, but simulated dust emission ~27.4 % of the time when no dust emission was observed, while dust emission occurrence was over-estimated by up to 2 orders of magnitude. For estimates to match observations, results showed that grain-

Introduction
Atmospheric mineral dust has an important impact on many of Earth's systems, human health, and global economies (Li et al., 2018; Pi et al., 2020; Tegen and Schepanski, 2018). The scale of this impact is, at least in part, prescribed by the location and environmental controls of the emission source (Ackerman, 1997; Schepanski et al., 2012). Dust emission models have been developed over decades to resolve spatial patterns and trends of aeolian processes (emission, transport, and deposition) in the dust cycle (Shao et al., 2011; Chen et al., 2017; Yuan et al., 2019). Dust emission models are crucial for simulation of aeolian processes at unsampled / unmonitored locations for comparison with indicators and benchmarks to understand the impact of management on environmental changes (Pi et al., 2020). Dust emission models are also essential for making hindcasts in palaeo-environmental reconstructions (Mahowald et al., 2010) and forecasts in dust-climate interactions in Earth System Models (ESMs).
Global dust emission models were developed more than two decades ago (Marticorena and Bergametti, 1995) and have been rapidly adopted into large scale dust cycle models as part of ESMs, where their fidelity requires necessary compromise and simplification within their parameterisations (Raupach and Lu, 2004). These ESMs comprise a dust emission (production) module, a module describing horizontal and vertical transport of dust aerosol (advection scheme) and a module parameterising dust removal processes (dry and wet deposition). Dust emission and dust deposition processes are the critical factors which ultimately determine the net atmospheric dust concentration (Textor et al., 2006). Accordingly, an accurate estimate of dust feedbacks on, e.g., radiation and cloud formation processes requires an accurate representation of dust emission (Chappell et al., 2023a; Chappell et al., 2023b).
Early dust emission models assumed the Earth's surface was devoid of vegetation and did not change over time. That assumption has been partially alleviated with the use of lateral cover (Raupach, 1992; Raupach et al., 1993), which, however, only very crudely represents the aerodynamics of drag partition (Chappell et al., 2023a). Currently, two key simplifying assumptions remain: i) a grain-scale entrainment threshold remains constant within soil types and static over time; ii) an infinite supply of sediment for transport is available everywhere. These assumptions cause ESMs to continually over-estimate dust in the atmosphere (Zender, 2003). Since ESMs focus on dust in the atmosphere, modelled atmospheric dust is reduced by comparison with observed dust optical depth (DOD).
Importantly, DOD is not a direct measurement of dust emission magnitude or frequency, key components which together underpin the sediment transport equation (Wolman and Miller, 1960; Lee and Tchakerian, 1995). Rather, DOD measures the concentration of dust in a specific column of atmosphere at a given moment. Extended atmospheric residence of dust (days to weeks) can exacerbate bias away from dust emission, towards atmospheric dust (Schepanski et al., 2012). Consequently, synoptic circulation may increase concentrations within pressure systems, maintaining aerosol optical depth (AOD) over specific areas without any significant further emission (Schepanski et al., 2012). While the deficiencies in existing dust emission modelling are somewhat understood, the inconsistency of evaluating dust emission model performance against DOD conceals which critical factors need to improve to increase dust emission model fidelity. Notably, current uncertainties in CMIP6 models are larger than in previous generations, providing a timely indication that dust process parameterisations are becoming more uncertain as models develop (Zhao et al., 2022). For clarity, the preceding description is directed solely at dust emission modelling, and we do not dispute the utility and benefits of dust aerosol loading to calibrate ESMs. To isolate the performance of the dust emission modelling, we introduce a framework for evaluating dust emission models before the routine calibration of dust cycle models against DOD.
Satellites observe atmospheric dust. Additional expert inspection of satellite imagery enables dust plumes to be identified and traced through space and time back to the location from which they were emitted. Consequently, this use of satellite observed dust emission point sources (DPS) is distinct from satellite observed optical depth in the atmosphere, which is not related directly to dust emission. Dust emission typically occurs infrequently (e.g., Hennen et al., 2019), and in remote and inhospitable areas. Field measurements of dust emission rely either on a limited number of ground stations or on serendipitous observations. For these reasons, satellite-based remote sensing is ideally positioned to monitor and identify the source of these emissions. Currently, automated approaches are not well-established to accurately distinguish satellite observed DPS at the head of the plume. Therefore, DPS identification is performed by expert analysis, where an expert observer can study the shape of the plume, recognise any atmospheric opacity (clouds, smoke, dust, or fog) and precisely locate the dust emission. Consequently, DPS data currently represent the most robust set of dust emission observations from which to evaluate the performance of a global dust emission model (Johnson et al., 2011; Tegen et al., 2013; Laurent et al., 2010).
The aim here is to demonstrate that dust emission models should be evaluated against observed dust emission data and ultimately provide correctly calibrated dust emission modules prior to inclusion in ESMs. We seek to evaluate the performance of global dust emission models against global dust emission observations at appropriate scales. Our novel evaluation framework is based on two innovative approaches. The first approach collates nine extant observed DPS datasets from peer-reviewed studies into a new global dataset of dust emission sources (Baddock et al., 2009; Bullard et al., 2008; Eckardt et al., 2020; Hennen et al., 2019; Kandakji et al., 2020; Lee et al., 2012; Nobakht et al., 2019; Schepanski et al., 2007; von Holdt et al., 2017). These DPS data describe dust emissions occurring over a wider range of conditions (soil and vegetation types and climates) than previously considered in dust emission modelling of only desert type conditions. These DPS data describe dust emission dichotomously (presence = 1) for studied areas at selected times. The second approach is to apply, for the first time, established dichotomous evaluations from numerical weather forecasting to dust emission predictions to evaluate dust emission model performance. We determine the coincidence in observed and modelled outcome at each DPS location for every day of the respective study duration. The second approach requires the novel use, in this field, of a contingency table to determine model performance through the respective number of daily 'hits' (Observed and Modelled dust), 'misses' (Observed dust, not Modelled), 'false positives' (Modelled, not Observed dust), and 'correct negatives' (no dust Observed or Modelled).
To enable the use of these novel approaches with dust emission models we reduced the continuous dust emission models to the binary occurrence when modelled soil surface wind friction velocity (u_s*) exceeds the entrainment threshold (u_*ts) adjusted by a function of soil moisture (H). This approach is emerging as a powerful new mechanism to overcome the poorly constrained dust frequency distribution and for calibrating dust emission models whilst dust emission parameterisations are improved (Hennen et al., 2022, 2023; Chappell et al., 2023a, 2023b). Our analyses are here compared regionally, with dust emission model performance in different soil-climate environments (in dryland regions with a range of soil types, vegetation density and wind speeds; Fig. 1), demonstrating how modelled and observed dust events coincide over time. These approaches enable us to identify how changes in dust emission model development improve dust emission model performance related to environmental controls, specifically the variability in when dust emission occurs due to the soil surface wind friction velocity exceeding the sediment entrainment threshold adjusted by soil moisture, P(u_s* > u_*ts H), and the dynamic erodibility of the soil. These analyses provide both i) a robust examination of contrasting dust emission model approaches and ii) critical information on the fidelity of wind friction velocity thresholds and sediment supply across dust source regions. These approaches will also improve the understanding of process representation in dust emission modelling, e.g., if a dust emission model consistently fails to reproduce a certain dust emission event, our approach identifies the need and provides a mechanism for how to improve the model.
We propose this new approach to routinely evaluate dust emission model development, particularly whilst the aeolian research community is tackling those two key simplifying dust emission model assumptions about threshold and sediment supply. We recognise that dust emission model developments may not be sufficiently rapid to keep pace with applications, e.g., in ESMs, whilst the dust emission models are poorly constrained. Consequently, we recommend our recently established approach to using DPS data to calibrate dust emission model estimates and improve their performance before being used in the ESMs (Hennen et al., 2022; Hennen et al., 2023; Chappell et al., 2023a, 2023b).

Validation datasets
We collated nine datasets from published studies across multiple dust emitting regions around the world (Fig. 1). This global satellite observed dust emission point source (DPS) dataset includes the location and timing of dust emission events from many, but not all, of the major global dust producing drylands. For each study, satellite-derived data were acquired at regular intervals and subjectively inspected by an operator to identify the presence of dust plumes. Identification of elevated dust over a desert surface is particularly challenging in visible wavelengths, due to the spectral similarities of elevated dust and bare soil in the visible spectrum (Hsu et al., 2004). Therefore, images are typically converted into false colour composites, enhancing the image with spectral bands outside the visible wavelengths, specifically in the thermal infrared (TIR) bands (Lensky and Rosenfeld, 2008; Miller, 2003). Using these dust enhancement products, operators visually identify the point(s) where a dust plume originated and digitize each of these locations as a dust emission point source (DPS). The exception is North Africa (Schepanski et al., 2007), where the area of dust emission is observed sub-daily, within a 1° grid (i.e., frequency of local emission: maximum 1 per day). In this case, the centroid position within the grid box is taken as the dust emission source. The DPS identification protocol was the same for all DPS data sets.
The DPS data collection can be classified into two methodological groups, defined by the type of satellite data used. The majority (6 out of 9) of these studies used Moderate Resolution Imaging Spectroradiometer (MODIS) multispectral imagery, which offers twice daily (daylight) imagery of the Earth's surface from each (Aqua and Terra) NASA satellite. These passive optical sensors provide a maximum spatial resolution of 250 m (level 1), recording surface reflectance in 36 individual spectral bands ranging from 0.4 μm (near ultraviolet) to 14.4 μm thermal infrared (TIR; NASA). Their sun-synchronous orbits permit repeat observations at the same mean solar time, with Terra and Aqua spacecraft crossing the equator at 10:30 am and 1:30 pm (local time) respectively. For dust plume identification, a dust enhancement product is produced using brightness temperature differences (BTD) between a combination of visible bands (B1: visible red, 0.645 μm; B3: visible blue, 0.470 μm; B4: visible green, 0.555 μm), near infrared (NIR, B26: 1.375 μm) and TIR bands (B31: 11.03 μm and B32: 12.02 μm) to distinguish dust plumes from the surface and other atmospheric conditions (e.g., clouds, biomass burning) (Nobakht et al., 2019). These BTDs distinguish the elevated plume as a thermal anomaly from the desert surface below. The calculated value (dimensionless) is included as the red beam of an RGB false colour composite (FCC) image, with blue and green beams using visible bands B3 and B4 (Fig. 2a).
The three other datasets cover North Africa, the Middle East, and areas in southern Africa, using the Spinning Enhanced Visible and Infrared Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) satellite. This satellite operates in a geostationary orbit, with a spatial resolution of 3 km at nadir and frequent repeat observation (15 min). Atmospheric dust is identified within the narrow band thermal infrared (TIR) wavelengths (8.7 μm to 12.0 μm) by its spectral signature, like MODIS DPS (Ackerman, 1997; Banks et al., 2018, 2019; Volz, 1973). Atmospheric dust produces a distinctive reduction in thermal emissivity, when compared to clear sky conditions, across each of the TIR channels, with maximum absorption around 10.8 μm (Brindley et al., 2012; Sokolik, 2002). Again, the SEVIRI dust RGB product is rendered through BTDs, with red and green beams described through the difference between 10.8 μm and adjacent TIR bands 8.7 μm and 12.0 μm, while the blue beam is limited by the BT at 10.8 μm (Lensky and Rosenfeld, 2008).
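The SEVIRI dust RGB rendering described above can be sketched as follows. This is a minimal illustration, not the processing chain of any of the collated studies: the function names are hypothetical, and the scaling ranges and the gamma value are assumptions taken from the commonly quoted EUMETSAT Dust RGB recipe rather than from this paper.

```python
import numpy as np

def scale(x, lo, hi, gamma=1.0):
    """Map [lo, hi] linearly onto [0, 1], clip, then apply a gamma stretch."""
    y = np.clip((np.asarray(x, dtype=float) - lo) / (hi - lo), 0.0, 1.0)
    return y ** (1.0 / gamma)

def dust_rgb(bt087, bt108, bt120):
    """Build a dust RGB from brightness temperatures (K) at 8.7, 10.8 and 12.0 um.

    Ranges and gamma are ASSUMED (commonly quoted EUMETSAT recipe values).
    """
    bt087, bt108, bt120 = (np.asarray(b, dtype=float) for b in (bt087, bt108, bt120))
    red = scale(bt120 - bt108, -4.0, 2.0)               # BTD 12.0 - 10.8
    green = scale(bt108 - bt087, 0.0, 15.0, gamma=2.5)  # BTD 10.8 - 8.7
    blue = scale(bt108, 261.0, 289.0)                   # BT at 10.8
    return np.stack([red, green, blue], axis=-1)

# One illustrative pixel: large red, small green, large blue gives the pink/magenta dust tone.
pixel = dust_rgb([285.0], [290.0], [292.0])
```

With strong red and blue components and a suppressed green component, the pixel renders in the pink to magenta tones that operators associate with dust.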
The physical basis for this approach is given by the spectral variability of the refractive index for mineral dust particles across the TIR (Ackerman, 1997). Due to this variability, the spectral difference of the indices differs for individual wavelength bands. Hence, calculated BTDs indicate the presence of mineral dust aerosol. During dusty conditions, absorption in the 10.8 μm channel is greater than in the 8.7 μm and 12.0 μm channels, increasing BTD (12.0 μm − 10.8 μm) and decreasing BTD (10.8 μm − 8.7 μm), creating a distinctive pink coloration of dust plumes in the RGB images (Banks et al., 2018, 2019), while clouds appear as red or orange and the land surface as cyan (Fig. 2b). The thermal dust index is essentially sensitive to mineral dust aerosol due to the refractive index being spectrally variable. As the refractive index barely varies spectrally for soot, this index is not sensitive to soot aerosol such as from biomass burning. However, it shows a sensitivity to volcanic aerosols (Ackerman, 1997); due to the colour rendering these aerosols may appear in a red (ash), yellow-greenish (SO2 gas) or yellow colour (ash + SO2 gas mixed), which can be clearly separated from the magenta colour indicating mineral dust aerosol (EUMETSAT RGB quick guides).

Figure caption (partially recovered): ... grid boxes where frequency is described by a minimum of one DPS observation per day (maximum = 0.43; details are provided in main text below). Source: North America (Baddock et al., 2009; Kandakji et al., 2020; Lee et al., 2012); North Africa (Schepanski et al., 2007); Middle East (Hennen et al., 2019); Namibia (von Holdt et al., 2017); South Africa (Eckardt et al., 2020); Central Asia (Nobakht et al., 2021); Australia (Bullard et al., 2008).
Absorption across the TIR wavelengths due to water vapour reduces the cooling trend created by atmospheric dust, presenting a potential limitation for each method (Brindley et al., 2012). The presence of meteorological cloud or elevated dust emission from upwind sources can obscure observation of the source of emission in a single image. Using SEVIRI's high (15-min) observation frequency, the observer will 'backtrack' plume position and size in sequential images to identify the location where it first appears, allowing clear delineation of overlapping plumes (Hennen et al., 2019). The fine spatial resolution (250 m) of MODIS data describes the plume in great detail, partially mitigating the limitation of overlapping plumes as the observer can identify individual plume shapes (Baddock et al., 2009). Spatial changes in surface condition (vegetation, geology) cause variations in surface TIR emissivity, potentially obscuring typical plume BTD profiles in RGB renderings (Banks et al., 2018, 2019; Banks and Brindley, 2013). Subjective interpretation can effectively mitigate many of these limitations, providing a better interpretation of plume dynamics than non-dynamic automated retrieval algorithms, which are constrained by the need to work in all surface and atmospheric conditions (Schepanski et al., 2012).
The ability of human operators to interpret plume shape and make decisions on potential false positives currently exceeds that of automated approaches, although not without caveats (Sinclair and LeGrand, 2019); those caveats, however, do not account for our grid box aggregations (see Section 3.3). Importantly, DPS studies typically set specific criteria for determining an emission event, including i) the deflation surface is clearly identifiable at the head of the emission plume; and ii) meteorological clouds or upwind dust emission plumes must not obscure the source of the emission plume. Therefore, these data represent the cutting-edge of dust emission observations, allowing spatial verification by genuine emission events. These data represent a dichotomous account of dust emission, where only dust events are recorded (DPS = 1). The absence of dust emission is not recorded. Consequently, there is an inherent bias in these data towards the occurrence of dust emission from observable events, and in their quantitative analysis we must account for this bias using (weather forecast evaluation) statistics designed to handle this bias in dichotomous data (see Section 3.3).
Importantly, DOD data share many of the limitations that affect DPS observations. In particular, optically thin dust is not readily detected in DOD, producing a bias towards large dust events, as with DPS. Commonly, ESM simulation calibration and/or performance evaluation is performed using ground-based AOD (AERONET) or satellite-based data. However, only a few of the many AERONET ground-stations are located near dust emission sources. Accordingly, validation of dust emission model results and DPS data with AERONET station data is inappropriate, due to their displacement from emission source dynamics and their reliance on transported/atmospheric dust. In contrast, satellite derived DOD estimates are continuous, providing measurements across all global dust source regions (Ginoux et al., 2012). However, as DOD measures the total column of atmospheric dust, it is difficult to distinguish between transported (aged) and freshly emitted (new) dust plumes. Consequently, DOD is also not consistent with DPS and dust emission model results (Chappell et al., 2023a, 2023b).
We do not use DOD estimated from satellite observations here because the spatio-temporal variation in the dust emission processes is not directly represented. The DOD concentrations are only partly related to emission processes, as a product of emission frequency and magnitude. However, residence time is critical, as near surface winds and size-dependent deposition rates continually alter plume composition following an emission event. Furthermore, automated DOD (Deep Blue product) collection processes are well known (cf. Ginoux et al., 2012, end of paragraph 46) to be predominantly constrained to highly reflective areas (e.g., sand), with reduced reliability over water and vegetated surfaces.
In Eq. (1), ρ_a is air density (1.23 kg m⁻³), g is gravitational acceleration (9.81 m s⁻²), c is a dimensionless fitting parameter (set to 1), and u_*ts(d) is threshold wind friction velocity (m s⁻¹). The soil surface wind friction velocity u_s* is the momentum remaining after the removal of momentum by roughness elements at all larger scales (topography, vegetation). The entrainment threshold u_*ts (Marticorena and Bergametti, 1995) is described and explained in detail in standard workflows (Darmenova et al., 2009). H(w) is a function which adjusts u_*ts when soil moisture (w) inhibits entrainment, following Fécan et al. (1999). Eq. (1) describes how the magnitude of sediment transport is calculated and adjusted by the frequency of occurrence (0 or 1), i.e., u_s* > u_*ts H.
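The occurrence test u_s* > u_*ts H can be sketched with the widely used Fécan et al. (1999) moisture correction. This is an illustrative sketch under stated assumptions, not the AEM code: the function names are hypothetical, and it assumes w is gravimetric soil moisture in % and that the residual moisture w' is derived from clay content in %, as in the commonly cited form of that parameterisation.

```python
import math

def fecan_h(w, clay_pct):
    """Moisture correction H(w) >= 1 (commonly cited Fecan et al., 1999 form).

    ASSUMED units: w is gravimetric soil moisture (%), clay_pct is clay content (%).
    """
    w_prime = 0.0014 * clay_pct ** 2 + 0.17 * clay_pct  # residual moisture (%)
    if w <= w_prime:
        return 1.0  # soil too dry for moisture to raise the threshold
    return math.sqrt(1.0 + 1.21 * (w - w_prime) ** 0.68)

def emission_occurs(u_s_star, u_star_ts, w, clay_pct):
    """Binary occurrence: True when u_s* exceeds the moisture-adjusted threshold."""
    return u_s_star > u_star_ts * fecan_h(w, clay_pct)

dry = emission_occurs(0.30, 0.20, w=0.5, clay_pct=10.0)  # threshold unchanged
wet = emission_occurs(0.30, 0.20, w=5.0, clay_pct=10.0)  # threshold raised by H
```

In the dry case the threshold is unchanged and emission occurs; in the wet case H raises the threshold above the same u_s* and emission is suppressed.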
We used a robust direct estimation of the coupled parameter u_s*/U_h as a function of ω_ns (Eq. (2); Chappell and Webb, 2016), with an estimation uncertainty of 0.0027 m s⁻¹. Here ω_ns is the normalised and rescaled albedo (ω), translated and scaled (ω_n) from a MODIS range (ω_nmin = 0, ω_nmax = 35) for a given illumination zenith angle (θ = 0°) to that of the calibration data (a = 0.0001 to b = 0.1) using a rescaling equation (Eq. (3); Chappell and Webb, 2016). Shadow is the complement of waveband (λ) dependent albedo, 1 − ω_dir(0°, λ), and the spectral influences due to, e.g., soil moisture, mineralogy and soil organic carbon were removed by normalizing (Chappell et al., 2018) with the directional reflectance viewed and illuminated at nadir, ρ(0°, λ):

ω_n = [1 − ω_dir(0°, λ)] / ρ(0°, λ)

This approach can be implemented with any type / scale of albedo measurement. Here the approach was implemented by making use of the available MODIS black sky albedo to estimate ω_n, and the shadow is normalised by dividing it by the MODIS isotropic parameter f_iso (MCD43A1 Collection 6, daily at 500 m) to remove the spectral influences:

ω_n = [1 − ω_dir(0°, λ)] / f_iso(λ)

The f_iso is a MODIS parameter that contains information on spectral composition as distinct from structural information (Chappell et al., 2018). Theory, field and laboratory-based measurements demonstrate the structural information is waveband independent (Chappell et al., 2007; Jacquemoud et al., 1992; Pinty et al., 1989). The normalization of MODIS data using this parameter and that of MODIS Nadir BRDF-Adjusted Reflectance (NBAR) is similarly sufficient to remove the spectral content using all bands examined (Chappell et al., 2018). In practice, we calculated ω_n using MODIS band 1 (620-670 nm). Notably, this approach will work with albedo from ground measurements (Ziegler et al., 2020), monitored from airborne and satellite remote sensing, or modelled prognostically in energy-driven ESMs. Consequently, this approach enables the simulation of dust in a past or future climate. To retrieve u_s* as a function of U_h, the daily maximum wind speed at h = 10 m above the soil surface is provided by ECMWF Climate Reanalysis, ERA5-Land hourly wind field data at 11 km spatial resolution (Muñoz Sabater, 2019).
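The normalisation and rescaling steps described above can be sketched as below. This is a hedged illustration: the function names are hypothetical, the rescaling is assumed to be a simple linear map sending ω_nmin to a and ω_nmax to b, and the clipping of out-of-range values is an added assumption. The calibrated relation between ω_ns and u_s*/U_h is deliberately not reproduced here.

```python
import numpy as np

def normalised_shadow(omega_dir, f_iso):
    """Shadow (1 - black-sky albedo) normalised by the MODIS isotropic parameter."""
    return (1.0 - np.asarray(omega_dir, dtype=float)) / np.asarray(f_iso, dtype=float)

def rescale(omega_n, n_min=0.0, n_max=35.0, a=0.0001, b=0.1):
    """Linearly map omega_n from [n_min, n_max] to [a, b].

    ASSUMPTIONS: linear map with n_min -> a and n_max -> b; values outside
    the MODIS range are clipped (the clipping is not stated in the text).
    """
    omega_n = np.clip(np.asarray(omega_n, dtype=float), n_min, n_max)
    return a + (omega_n - n_min) * (b - a) / (n_max - n_min)

omega_n = normalised_shadow(omega_dir=0.2, f_iso=0.25)  # illustrative band 1 values
omega_ns = rescale(omega_n)
```

The default arguments mirror the ranges quoted in the text (0 to 35 for the MODIS range, 0.0001 to 0.1 for the calibration data), so only the input albedo and f_iso need to be supplied per pixel.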
Dust emission flux F (<10 μm; kg m⁻² s⁻¹) is calculated from the sediment transport (Eq. (1)). The clay % was restricted to 20 %, consistent with previous work (Marticorena and Bergametti, 1995) which, when applied in a regional model calibrated to dust optical depth, showed reasonable results (Woodward, 2001). We calculated F for 0.1 < d < 10 μm and adjusted the mass in the assumed global, tri-modal, log-normally distributed source modes by M = 0.87 following Zender (2003). In each pixel, the coverage of snow (A_s) and whether the soil surface is frozen (A_f) are used to reduce dust emission and are obtained from daily ERA5-Land model data. Unlike existing dust models, the use of ω_ns to dynamically estimate u_s* removes the need for vegetation indices and fixed vegetation coefficients to determine effective aerodynamic roughness (Hennen et al., 2022, 2023; Chappell et al., 2023a, 2023b). Furthermore, as u_s* is spatially explicit, it is unnecessary to pre-condition dust emission by applying a preferential dust source mask, i.e., a positive bias in areas perceived to have more erodible soils (e.g., surface depressions).
Here we use a new approach to tackle the inconsistency of evaluating dust emission model performance against dust optical depth. By using satellite observed dust emission point source (DPS) frequency, this approach enables us to investigate the impact on dust emission modelling of the assumptions that the soil surface is smooth and covered with an infinite supply of loose erodible material which, when mobilised by sufficient u_s*, causes transport and dust emission. This (energy-limited) assumption is rarely justified in dust source regions, where the soil surface is rough due to soil aggregates, rocks, or gravels, sealed with biogeochemical crusts, or where loose sediment is largely unavailable. This new approach has enabled the dust emission model to be calibrated by replacing the frequency distribution of these traditional approximations of threshold and sediment supply with the frequency distribution from DPS data (Hennen et al., 2022, 2023; Chappell et al., 2023a, 2023b). For clarity, here we do not calibrate the AEM using the DPS data. First, we calculated the DPS probability of occurrence P(DPS > 0), a first order approximation of the probability of sediment transport P(Q > 0), which is directly proportional to the probability of dust emission P(F > 0) at those locations. Next, over the study durations, we equated this probability to the frequency with which u_s* exceeds u_*ts adjusted by H (Eq. (7)). During each simulation, the correct response, P(F > 0) ∈ {1, 0}, depends on the correct u_*ts H obtained from the DPS data.
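The frequency of occurrence P(u_s* > u_*ts H) over a study duration can be sketched as the fraction of valid days on which the threshold is exceeded. This is a minimal illustration with a hypothetical function name; treating NaN days as excluded is an assumption that mirrors the handling of missing data described elsewhere in the text.

```python
import numpy as np

def exceedance_frequency(u_s_star, threshold):
    """Fraction of valid days with u_s* > threshold (scalar or per-day array).

    Days where either value is NaN are ASSUMED missing and are excluded.
    """
    u = np.asarray(u_s_star, dtype=float)
    thr = np.broadcast_to(np.asarray(threshold, dtype=float), u.shape)
    valid = ~np.isnan(u) & ~np.isnan(thr)
    if not valid.any():
        return float("nan")
    return float(np.mean(u[valid] > thr[valid]))

# Five days at one location, one of them masked; illustrative values in m s^-1.
daily_u_s_star = [0.10, 0.25, np.nan, 0.30, 0.15]
p_emission = exceedance_frequency(daily_u_s_star, threshold=0.2)
```

Passing a per-day array as the threshold allows u_*ts H to vary with soil moisture from day to day.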

Dichotomous testing
At each of the satellite-derived DPS locations we used the AEM to predict dust emission daily across the entire time period. The AEM dust emission at these locations was converted to dust emission occurrence (0 = no dust; 1 = dust) for comparison with the DPS using dichotomous tests. Dichotomous tests are used where the prediction and observation variables contain a maximum of two distinct outcomes. This categorical verification is used in numerical weather forecasting, typically for specific meteorological events (e.g., tornado, rain, or snow), where the verification question is "Did/Will this event occur?" In each instance, observation and simulation provide a binary response (i.e., 1 = Yes, it will/did occur; 0 = No, it did not / will not occur). These responses can be compared in a contingency table, where the responses are categorised as either a Hit (observation = 1, simulation = 1), a Miss (observation = 1, simulation = 0), a False Positive (observation = 0, simulation = 1) or a Correct Negative (observation = 0, simulation = 0; Table 1). We simulate the presence or absence of dust emission at each DPS location for every day of observation, aggregated at 1° resolution, where if any of the DPS (observed or simulated) locations produces dust (Eq. (7)), then that grid box scores 1 on that day. Dichotomous statistics compare the coincidence of these ones. NaN (not-a-number) boxes, which describe data lost due to remote sensing issues (cloud mask, bright pixel mask), are excluded from the analysis. For clarity, the numbers per region are described in the results. Aggregating to these 1° grid boxes overcomes discrepancies between operators in the precision of locating the dust emission point source (Sinclair and LeGrand, 2019). These grid boxes also overcome the broader issue that the sample support of individual DPS data is too small for tolerable within and between class variance (Gotway and Young, 2002).
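The grid box aggregation and contingency table described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names, array layout (points by days) and the precomputed box index per DPS location are all assumptions.

```python
import numpy as np

def aggregate_any(point_flags, box_index, n_boxes):
    """A grid box scores 1 on a day if ANY of its DPS locations scores 1.

    point_flags: (n_points, n_days) array of 0/1; box_index: (n_points,) ints.
    """
    flags = np.asarray(point_flags, dtype=bool)
    box_index = np.asarray(box_index)
    out = np.zeros((n_boxes, flags.shape[1]), dtype=bool)
    for b in range(n_boxes):
        out[b] = flags[box_index == b].any(axis=0)
    return out

def contingency(obs, sim, valid=None):
    """Count the four outcomes over valid (non-missing) box-days."""
    obs, sim = np.asarray(obs, dtype=bool), np.asarray(sim, dtype=bool)
    valid = np.ones(obs.shape, dtype=bool) if valid is None else np.asarray(valid, dtype=bool)
    return {
        "hits": int(np.sum(obs & sim & valid)),
        "misses": int(np.sum(obs & ~sim & valid)),
        "false_positives": int(np.sum(~obs & sim & valid)),
        "correct_negatives": int(np.sum(~obs & ~sim & valid)),
    }

# Three DPS locations in two grid boxes over three days (illustrative values).
box_index = [0, 0, 1]
obs_boxes = aggregate_any([[1, 0, 0], [0, 0, 0], [0, 1, 0]], box_index, n_boxes=2)
sim_boxes = aggregate_any([[1, 1, 0], [0, 0, 0], [0, 0, 0]], box_index, n_boxes=2)
table = contingency(obs_boxes, sim_boxes)
```

The optional `valid` mask is where box-days lost to cloud or bright pixel masking would be excluded from both observed and simulated counts.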
We use P(u_s* > u_*ts H) to describe the relative conditions of each grid box, with 'windier' locations providing a larger probability of exceeding threshold. We chose this metric over mean u_s* because dust emission is expected to be a rare event, and the long-term mean would obscure the diversity in extreme wind conditions.

Satellite observed dust emission point source (DPS) frequency
The frequency of satellite observed dust emission point source (DPS) data and albedo-based dust emission model (AEM) estimates were calculated for DPS locations in 6 global dryland regions (using 9 studies). Table 2 describes regional DPS observations as probabilities, where the total number of opportunities is calculated as the number of DPS locations multiplied by the number of days (minus the number of missing data; see Methods section). A total of 37,352 unique DPS locations were identified across the nine studies, covering 1945 unique 1° grid boxes. By applying Eq. (7), a total of 59,688 dust emissions were identified. Missing data are defined as the number of days on which the grid cell is unable to produce a forecast. That is, every DPS point within a given cell on a given day produces a null value due to a missing parameter in the model on that day. This is typically caused by masking in the MODIS daily imagery (due to cloud) preventing a description of surface roughness. On average, 34.4 % of data were missing across the nine regions, with North Africa producing the fewest (18.9 %), and Central Asia producing the most (54.5 %). Corresponding missing data were removed from both modelled and observed data to maintain consistency in results.

Table 2
Dust event frequency data from dust point source (DPS) locations observed in nine separate studies. Data describe the relative probability of occurrence P(F > 0) during dust emission point source (DPS) observation and from albedo-based dust emission model (AEM) forecasts at the same location and time period, identified in the same way (Eq. (7)).

M. Hennen et al.

Categorical dust emission model performance
The performance of the albedo-based dust emission model (AEM) is assessed through the coincidence of simulated and observed occurrence (or lack) of dust emission. These results are described globally in Table 3, where results from all regions are collated into a contingency table describing the proportion of each of four outcomes (see Table 1 for outcome descriptions). Dust emission observations account for only 1.8 % of all possibilities (grid boxes multiplied by days). In comparison, the AEM over-predicts the frequency of dust emission by more than an order of magnitude relative to the DPS observations, producing dust emission 28 % of the time. The model and observations agree 71.4 % of the time, comprising 0.6 % where both model and observations produce dust ('hits') and 70.8 % where neither produces dust emission ('correct negatives'). The remaining 28.6 % comprises 27.4 % where the model predicts dust emission but none was observed ('false positives') and 1.2 % where the model fails to predict observed dust emission ('misses'; Table 3).
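The four outcomes and the summary percentages quoted above can be reproduced with a short sketch. The function name and the toy series are hypothetical. The proportions in `reported` are taken from the text, and the sums show how the 71.4 %, 1.8 % and 28 % figures follow from them.

```python
def contingency_proportions(observed, simulated):
    """Proportion of each dichotomous outcome for paired daily series."""
    counts = {"hit": 0, "miss": 0, "false_positive": 0, "correct_negative": 0}
    pairs = list(zip(observed, simulated))
    for obs, sim in pairs:
        if obs and sim:
            counts["hit"] += 1
        elif obs and not sim:
            counts["miss"] += 1
        elif sim:
            counts["false_positive"] += 1
        else:
            counts["correct_negative"] += 1
    return {name: count / len(pairs) for name, count in counts.items()}


# Toy example (not real data): 1 = dust emission, 0 = no dust emission.
observed = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
simulated = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
toy = contingency_proportions(observed, simulated)

# Global proportions as reported in the text.
reported = {"hit": 0.006, "miss": 0.012,
            "false_positive": 0.274, "correct_negative": 0.708}
agreement = reported["hit"] + reported["correct_negative"]     # about 0.714
observed_freq = reported["hit"] + reported["miss"]             # about 0.018
simulated_freq = reported["hit"] + reported["false_positive"]  # about 0.280
```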
The variation in modelled dust emission frequency between global regions is explained by the varying empirical cumulative distribution functions (ECDFs) of wind shear velocity (u s*) conditions at the soil surface (Fig. 3). In Fig. 3, the probability of dust emission is defined by the intersection of the distribution of u s* conditions and the entrainment threshold adjusted by the soil moisture function (approximated for the visualisation as u *ts H = 0.2 m s⁻¹; vertical black line), where all simulations greater than the varying thresholds generate dust emission (i.e., F > 0). In each case, u s* is influenced by the roughness (u s*/U h) and the surface wind speed (U h). The results show a range of conditions between the regions. Along the Namibian coastline (von Holdt), u s* is distinctly larger than in all other regions (mean 0.23 m s⁻¹). In contrast, South African (Eckardt) dust sources have predominantly small u s* (mean 0.11 m s⁻¹; Fig. 3a). In the arid south-west of North America, average u s* remains consistent across each of the three regions (0.19 m s⁻¹), and marginally greater than in Australia and the Middle East (each ~0.17 m s⁻¹). Despite producing the same mean, the frequency at which North American regions exceed threshold varies. These regional data suggest that the Chihuahuan Desert (Baddock) produces a larger proportion of u s* conditions at extreme values (small and large values of u s*), whereas the Southern High Plains (Kandakji and Lee) produce a larger frequency closer to the mean. Along with South Africa, u s* conditions in Central Asia (mean = 0.14 m s⁻¹) and North Africa (mean = 0.13 m s⁻¹) are the smallest, with u s* values proportionally smaller than the collective global distribution (dashed black line).
Fig. 3b shows the distribution of u s* conditions during observation periods (locations and days with observed dust only). These data determine the proportion of 'hits' (coinciding observed and simulated dust) by P(u s* > u *ts H). With a greater proportion of large u s* values relative to the u *ts H threshold of 0.2 m s⁻¹ approximated for the visualisation (vertical black line), the North American regions generate a high probability (0.97-0.99) of 'hits'. In contrast, North Africa, Central Asia, South Africa, and the Namibian Coast all produce 'hit' probabilities below 0.5, due to the smaller frequency of large u s* conditions. The Middle East (0.55) and Australia (0.84) have larger probabilities but continue to 'miss' a significant proportion of observed dust events. These results show that a large proportion of the observations (up to 79 % in North Africa) occur during u s* conditions below the fixed threshold, with all regions except North America (see Lee and Baddock datasets) producing a minimum observed u s* below the fixed threshold. Some u s* values are so small that dust emission is extremely unlikely, which indicates that some wind speeds at the scale of the 11 km pixel are inadequate.
To demonstrate the impact of u *ts on the probability of dust emission, we consider the adjustment of regional u *ts to match the global DPS frequency, P(u s*all > u *ts H) = 0.018, where u s*all is the ECDF of u s* conditions at all locations during all days (black dashed line in Fig. 3a). Fig. 3a shows that the global combined distribution of u s* conditions would require u *ts H = 0.36 m s⁻¹ (red dashed vertical line). We compared the intersections of each ECDF at u *ts H = 0.2 m s⁻¹ (approximated for the visualisation; black vertical line) with those at u *ts H = 0.36 m s⁻¹ (red dashed vertical line) to illustrate the differences in percentage 'hits' (Fig. 3b). The 'hits' reduced in all regions, with a maximum reduction of 55 percentage points in Australia (from 84 % with u *ts H = 0.2 m s⁻¹ to 29 % with u *ts H = 0.36 m s⁻¹) and a minimum reduction of 20 percentage points in North Africa (from 21 % to 1 %). North America retains the largest percentage of 'hits' (57-71 %), while all observed events are missed in South Africa ('hit' = 0 %). In all other regions, the proportion of 'hits' falls below 10 %.
Overall, during all observed dust events (black dashed ECDF; Fig. 3b), u s* < u *ts H 68 % of the time, indicating that wind speeds are too small more than two thirds of the time when we know dust emission has occurred (i.e., DPS > 0).
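Reading a threshold off an ECDF so that the exceedance probability matches a target frequency can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, the friction velocities are synthetic, and the threshold is restricted to sampled values.

```python
from bisect import bisect_right


def exceedance_probability(sorted_values, threshold):
    """Fraction of samples strictly greater than threshold (1 - ECDF)."""
    n = len(sorted_values)
    return (n - bisect_right(sorted_values, threshold)) / n


def threshold_for_frequency(values, target):
    """Smallest sampled value whose exceedance probability is <= target."""
    ordered = sorted(values)
    for candidate in ordered:
        if exceedance_probability(ordered, candidate) <= target:
            return candidate
    return ordered[-1]  # not reached for target >= 0; kept as a safeguard


# Synthetic friction velocities (m/s): 0.01, 0.02, ..., 1.00. Not real data.
u_all = [i / 100 for i in range(1, 101)]

p_fixed = exceedance_probability(u_all, 0.2)   # share of samples above 0.2
matched = threshold_for_frequency(u_all, 0.1)  # threshold giving 10 % exceedance
print(p_fixed, matched)
```

Applying the same idea to the pooled u s* distribution with a target of 0.018 corresponds to the adjustment described above.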

Dust emission model variability at a local (1°) scale
The ECDF analysis in Fig. 3 indicates that u s* conditions are underestimated most of the time during observed dust events. Consequently, 68 % of known dust events are not modelled. Regionally, this value varies depending on the range of u s* conditions during observed events (Fig. 3b). Fig. 4a describes P obs (u s* > u *ts H) during observed dust events in each 1° grid box. By considering only days which are known to produce dust, these data describe how well the model captures blowing dust conditions, where perfect performance would produce 1 in each box (i.e., every grid box would be dark red in Fig. 4a). Fig. 4a therefore identifies spatial patterns in model performance within regions. The variability in grid box P obs is independent of regional conditions and instead elucidates spatial patterns in u s* conditions during known dust emission events.
During observed days, P obs is consistently large (>0.8) across North America, and the southerly reaches of Australia including the Lake Eyre Basin, the Simpson and Strzelecki Deserts to the south (Fig. 4a). Across North Africa, P obs remains generally small, increasing in the north (0.4-0.6) along the Mediterranean coast, and decreasing to a minimum (<0.2) in the south and east. The Bodélé Depression is not evident as a hotspot because of its unique dust-producing mechanisms, which are not represented in the AEM (see Discussion section for more details). In the Middle East, P obs is large (>0.6) across large areas of the Arabian Peninsula, including Mesopotamia in the north, the Red Sea Coast in the west and Oman to the south. Iran has large variability, with small P obs (<0.2) in the north-east, increasing (P obs > 0.6) in the Sistan Basin to the east, along the Makran Coast to the south and on the shores of the Caspian Sea to the north. Central Asia produces the largest variability, peaking (P obs > 0.8) in the Gobi Desert (China) in the east and in the Kara-Kum Desert and Aral Sea area (Kazakhstan) to the west, while many central areas, including the Taklamakan Desert, produce small P obs. In the Namib Desert, along the Namibian Coast, P obs peaks (>0.6) to the north, while inland P obs reduces significantly (<0.2). In the interior of southern Africa, P obs is generally small (<0.4), with a peak (>0.6) in the southeastern extent of the Kalahari Desert.
For comparison, Fig. 4b describes P all (u s* > u *ts H) during all days at each 1° grid box. These data include observed events, which comprise only a small proportion (<1.8 %) of all days (Table 3). Accordingly, these results reveal how likely the model is to create false positive dust events, where good model performance would produce very small P all values (i.e., zero false positives / grid box = white; Fig. 4b). Here, large spatial variability in P all occurs across Australia, North America, and the Middle East. P all remains consistently small (<0.4) in North Africa, and parts of north-east Iran, Central Asia, and South Africa. P all peaks (>0.8) in the Namib Desert (Namibia), western Arabian Plateau (Saudi Arabia), Mesopotamia (Iraq / Syria), Makran Coast (Iran), Sistan Basin (Iran/Afghanistan) and discrete parts of the Kara-Kum (Kazakhstan), Taklamakan (China), and Gobi (China) Deserts.

Table 4
Description of categorical albedo-based dust emission model (AEM) outputs due to varying probabilities of u s* > u *ts H during observed DPS dust days and all days. Colours indicate the symbology applied to 1° grid boxes in Fig. 4.
The difference between P obs and P all (ΔP; Eq. (8); Fig. 4c) describes how distinct the u s* conditions are in each grid box during each period, assuming (for simplicity and consistent with the approximation for previous visualisations) u *ts H = 0.2 m s⁻¹. ΔP values close to 0 indicate no, or very small, differences in u s* conditions, indicating that the AEM does not recognise a difference in the probability of u s* exceeding threshold. These conditions occur across large parts of the Sahara Desert and Central Asia, where small u s* conditions continue (P obs and P all < 0.2). In parts of the Arabian Peninsula (including northern Mesopotamia), ΔP remains small as u s* conditions continue to exceed threshold most of the time (P obs and P all > 0.6). Positive ΔP indicates an increase in u s* during observed dust emission days compared to all days. These conditions occur in most dust sources in Australia, North America, and the Western Arabian Peninsula (Jordan, north-west Saudi Arabia), where u s* conditions are large during dust events (P obs > 0.8) and smaller during all days (P all < 0.4). The ΔP remains positive in the south-eastern Kalahari Desert, Central Iran, and the Mediterranean coast of North Africa, where smaller u s* conditions during observed dust events (P obs 0.4-0.8) remain distinctly larger than on all days (P all < 0.2). Negative ΔP indicates larger u s* during all days compared to observed days. These conditions occur throughout the Namib Desert, the coast of the Arabian Gulf (Saudi Arabia), the Makran Coast and Dasht-e-Lut Desert (Iran), where u s* conditions exceed threshold most of the time (P all > 0.8) and are relatively small during observed dust events (P obs < 0.6). Discrete areas of the Kara-Kum, Taklamakan, and Gobi Deserts also produce negative ΔP, as large peaks in u s* conditions during all days (P all > 0.8) exceed those on observed days (P obs < 0.6).
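For a single grid box, the comparison above can be sketched as follows, assuming Eq. (8) takes the form ΔP = P obs − P all, which matches the sign convention described in the text. The function names and the friction velocity samples are hypothetical, and the 0.2 m s⁻¹ threshold is the approximation used for the visualisations.

```python
def exceedance(u_values, threshold=0.2):
    """P(u > threshold) for a sample of friction velocities (m/s)."""
    return sum(u > threshold for u in u_values) / len(u_values)


def delta_p(u_observed_days, u_all_days, threshold=0.2):
    """Assumed form of Eq. (8): P_obs minus P_all for one grid box."""
    return exceedance(u_observed_days, threshold) - exceedance(u_all_days, threshold)


# Hypothetical grid box: all days, and the subset with observed dust emission.
u_all = [0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30, 0.11, 0.09, 0.14]
u_obs = [0.22, 0.25, 0.30, 0.18]

p_all = exceedance(u_all)   # 3 of 10 days exceed the threshold
p_obs = exceedance(u_obs)   # 3 of 4 observed dust days exceed it
dp = delta_p(u_obs, u_all)  # positive: windier on observed dust days
print(p_obs, p_all, round(dp, 2))
```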

Diagnostic dust emission model performance relative to dust emission observations
The P(u s* > u *ts H) during DPS events describes the model accuracy in terms of either the u s* conditions known to have created dust emission (i.e., DPS = 1) or the correct dust entrainment threshold (Fig. 4a; top row in Table 4). By plotting the combinations of these occurrences, we can understand which meteorological events are best described by dust emission models and/or where the dust entrainment threshold is poorly constrained. In common with other dust emission models, the AEM has no description of the spatio-temporal variation in soil erodibility and assumes an infinite sediment supply at all locations. Consequently, whenever u s* > u *ts H the AEM simulates dust emission. By comparing P(u s* > u *ts H) during DPS observations with that on all modelled days (Fig. 4b), we can determine areas where sediment supply is poorly described by the assumption of infinite sediment availability, i.e., where there is no difference in P(u s* > u *ts H) between observed days and all days (top left and bottom right in Table 4) or where P(u s* > u *ts H) is larger during all days than during DPS observations (bottom left in Table 4).
Where there is no clear separation in u s* conditions during observed events and all days, we can interpret these results in two ways, depending on the P(u s* > u *ts H). If that P is large during both periods (bottom right in Table 4), the model will correctly simulate dust most of the time during DPS observations ('hits' are large). In this case, dust-producing u s* conditions are well described, but the lack of erodibility parametrisation means dust emission will continue to be simulated beyond those days observed in the DPS data ('false positives' are large). If that P is small (top left in Table 4), dust-producing u s* conditions are not well described ('hits' are small), and u s* conditions on observed DPS days are not distinguished from those on all days ('false positives' remain small).
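The four diagnostic cases referred to in Table 4 can be expressed as a small classifier. This is a sketch only: the function name and the single 0.5 cut-off separating 'small' from 'large' probabilities are assumptions made for illustration, whereas the cell positions and their interpretations follow the text.

```python
def classify_grid_box(p_obs, p_all, cutoff=0.5):
    """Map (P_obs, P_all) to the Table 4 cell named in the text.

    cutoff is a hypothetical boundary between 'small' and 'large'.
    """
    obs_large = p_obs >= cutoff
    all_large = p_all >= cutoff
    if obs_large and all_large:
        # Winds described, but dust simulated far too often.
        return "bottom right"
    if obs_large:
        # Winds on dust days are distinct from those on all days.
        return "top right"
    if all_large:
        # Threshold exceeded more often on all days than on dust days.
        return "bottom left"
    # Dust-producing winds are not captured.
    return "top left"


print(classify_grid_box(0.9, 0.2))  # distinct dust-day winds
print(classify_grid_box(0.1, 0.1))  # winds under-described
```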

Discussion
The collective dust emission frequency from nine separate studies demonstrates that dust emission is a rare event (on average 1.8 % of all space-time occurrences; Table 3), indicating extreme conditions (e.g., large wind speeds) even in the more readily recognised dust emission areas (e.g., the Sahara Desert, the Arabian Peninsula; <100 days y⁻¹). Notably, an independent study using the Multiangle Imaging Spectroradiometer (MISR) also found spatially patchy dust plume distribution, with frequency of <135 days y⁻¹ (Yu et al., 2018). In comparison, the albedo-based dust emission model (AEM) simulates dust emission 28 % of the time. This AEM over-estimation is consistent with the need for ESMs to be globally tuned down by several orders of magnitude to match dust optical depth (Zender, 2003). The AEM over-estimation, particularly in North Africa, has considerable implications for ESMs because they are "…generally tuned to fit the observations in a given part of the world and often this tuning is done with observations from North Africa" (Huneeus et al., 2011; p. 7809). Consequently, the ESMs are very likely to be simulating dust emission too frequently, with too little intensity and with reduced diversity in the contributing dust with different mineralogy from other regions (Chappell et al., 2023a, 2023b). These results confirm earlier findings that dust emission models must first be calibrated against DPS data before being calibrated against dust optical depth (Chappell et al., 2023a, 2023b).
Notably, the over-estimation remains despite the AEM using a calibration of wind friction against aerodynamic roughness (Chappell and Webb, 2016; Webb et al., 2020; Chappell et al., 2023a, 2023b). There are two components of this AEM over-estimation that we think need to be considered: (i) it is systematic across dryland dust sources globally; (ii) the disparity is in both total frequency and daily coincidence of observed and simulated emission. Without considering both components together, it will be difficult for dust emission model developments to determine whether the dust emission model simulated the correct frequency by chance (i.e., same frequency on different days), and under which environmental conditions the model performs well.
The AEM coincides with DPS occurrences (observed and not observed) 71.4 % of the time. However, coincidence during observed dust events ('hits') accounts for only 0.6 % of all opportunities, which is about one third of the 1.8 % in which dust emission was observed. Since the AEM provides a realistic (calibrated) representation of u s*, these results suggest that the inconsistency in modelled and observed frequencies is due to a combination of three broad factors: (1) discrepancies in the formulation of the entrainment threshold adjusted by soil moisture (u *ts H); (2) incompatible scales in dust emission modelling, e.g., the grain-scale u *ts is incompatible with the areal u s* (MODIS 500 m) and areal wind speed at a larger scale (ERA5-Land; 11 km); and (3) the inadequate assumption of an infinite supply of loose, erodible sediments. Each of these three main factors can be interpreted by comparing the conditions which exceed entrainment threshold, P(u s* > u *ts H), during observed DPS days with those on all days (Table 4) at multiple scales, including regional (Fig. 3) and local (Fig. 4). We should also not exclude a possible bias in the DPS data towards few (detectable) large magnitude events and away from smaller magnitude, larger frequency occurrences below the detectable limit of these DPS observations. However, there is limited quantitative information available, and we raise awareness of these sources of uncertainty below.

Discrepancies in the formulation of entrainment threshold (u *ts H)
By using dichotomous descriptions of dust emission frequency, we provide an assessment of model performance which emphasises the coincidence of events rather than just a comparison of total frequency. This assessment distinguishes simulations associated with dust emission events from simulations of dust emission on all days, providing a powerful means of separating dust-emitting conditions from those on all days. Our results show that modelled dust emission occurs regularly, i.e., u s* > u *ts H, where and when no dust emission is observed (27.4 % of all simulations; Table 3). These findings invite the alluring suggestion that dust emission model performance can be improved by matching u *ts H to the correct global frequency of observed dust emissions (globally = 1.8 %; u *ts H = 0.36 m s⁻¹). However, reducing the number of 'false positives' in this way will systematically reduce the proportion of correct observations (i.e., 'hits') in all regions by as much as 55 percentage points (Australia), with only 1 % of all observations in North Africa correctly simulated. An alternative might be to adjust u *ts H to maximise the number of 'hits', i.e., P(u s*obs > u *ts H) = 1, which globally would require a fixed u *ts H = 0.006 m s⁻¹. However, this alternative will increase the proportion of 'false positives' to 99.9 %. Neither of these approaches is recommended for the reasons described.
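The trade-off between 'hits' and 'false positives' under a single fixed threshold can be illustrated with a short sweep. The samples below are synthetic and the function names are hypothetical. The false positive rate here is expressed relative to non-dust days rather than all opportunities, which is a simplification of the proportions in Table 3.

```python
def hit_rate(u_dust_days, threshold):
    """Share of observed dust days on which the model simulates dust."""
    return sum(u > threshold for u in u_dust_days) / len(u_dust_days)


def false_positive_rate(u_non_dust_days, threshold):
    """Share of non-dust days on which the model simulates dust."""
    return sum(u > threshold for u in u_non_dust_days) / len(u_non_dust_days)


# Synthetic friction velocities (m/s); not real data.
u_dust = [0.05, 0.15, 0.25, 0.40]
u_non_dust = [0.04, 0.08, 0.12, 0.18, 0.22, 0.30, 0.10, 0.16]

# Raising the threshold removes false positives but also removes hits.
sweep = {
    t: (hit_rate(u_dust, t), false_positive_rate(u_non_dust, t))
    for t in (0.02, 0.20, 0.36)
}
for threshold, (hits, false_pos) in sweep.items():
    print(threshold, hits, false_pos)
```

In this toy case no single threshold gives both a high hit rate and a low false positive rate, because the two samples overlap. This mirrors the argument above against tuning a fixed global threshold.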
Despite the rarity of dust observations (occurring only 1.8 % of the time; Table 3), the ECDF data show that dust emission events rarely represent extreme u s* conditions (P(u s*obs) ≤ P(u s*all); Fig. 3), because in most cases there is no distinct difference in u s* conditions between observed days and all days. These results demonstrate that there is no reasonable basis to calibrate model performance through an adjustment to a fixed global threshold (u *ts H). Whilst this may seem axiomatic to some, the assumption of a global and fixed u *ts has endured for more than two decades since dust emission models were first developed. In contrast, a threshold varying in space-time in response to erodibility dynamics should improve model performance in areas where there is a clear positive change in frequency of occurrence (i.e., top right in Table 4; u s*obs > u s*all). Our regional results indicate that this condition (u s*obs > u s*all) occurs in North America and Australia, where the AEM identifies an increased mean u s* during observed DPS events (Fig. 3). In both regions, dust emission occurs during the passage of large frontal systems (Rivera Rivera et al., 2009; Strong et al., 2011) in response to cyclonic activity. The ability to accurately model these synoptic conditions would allow u *ts H to be adapted (increased) to reduce the number of 'false positive' simulations without negatively affecting the model's ability to simulate 'hits'. However, calibration of u *ts H in this way is also not recommended because it fundamentally tunes the model response to those specific conditions. It would not enable modelling to be physically based in responding to a changing environment, which is essential for use in understanding past and future climate projections.

Incompatible scales in dust emission modelling
By describing the ECDFs of u s* during observed days and locations (Fig. 3b), a new understanding emerges, assuming that the coupled property, wind friction velocity normalised by wind speed (u s*/U h), is well constrained by its calibration against aerodynamic wind tunnel measurements (Chappell and Webb, 2016). Wind speeds used in the AEM are too small to enable u s* to exceed u *ts H during roughly 2 out of 3 observed events (u s* < u *ts H 68 % of the time; Fig. 3b). For example, North American DPS are from predominantly barren parts of the region and show little variation in u s*/U h, either spatially or temporally (Hennen et al., 2022). This characteristic of DPS data extends globally, with most dust emission point sources coinciding with barren conditions (u s*/U h > 0.028) which do not change much, most of the time (standard deviation <0.002), either within or between the few years of measurements available. In these locations, variation in u s* conditions of the DPS locations is created mainly by variation in U h. Accordingly, when a dust event is observed but u s* does not exceed u *ts H, we assume that the AEM has not correctly simulated the associated dust-producing wind conditions at that location. In the text which follows, we elaborate on regional conditions and AEM performance given these assumptions.
North Africa produces the smallest probability of dust-producing winds during observed dust events (P = 0.2; Fig. 4a). However, there is large spatial variability in P, with larger values along the Mediterranean coast and western Africa (P > 0.4) than inland, where eastern parts of the Sahel have P < 0.2. Dust emission in the north of the region occurs through cyclogenesis and associated formation of fronts (Schepanski et al., 2009). Specifically, Sharav cyclones (also named Mediterranean cyclones) track across the Maghreb region towards the eastern Mediterranean Basin (Caton Harrison et al., 2021; Knippertz and Todd, 2012). These conditions are often associated with an active warm front, characterised by pronounced dust uplift (Schepanski et al., 2009). Saharan Depressions are also found anticyclonically over Western Africa, where they ultimately transit north and east into a Mediterranean cyclone (Schepanski and Knippertz, 2011). These synoptic scale meteorological conditions are described well in the AEM, with a distinct change in u s* (increasing P) during observed dust events compared to all days (Fig. 4c, top right in Table 4).
In parts of the Sahel, dust emission is associated with mesoscale meteorological drivers, including the diurnal break-down of the nocturnal low-level jet (LLJ) (Schepanski et al., 2009) and the sudden increase in wind speeds at the leading edge of cold-pool density currents, formed from deep moist convection (Knippertz and Todd, 2012; Lawson, 1971). Fig. 4a shows that neither of these conditions is frequently identified in the AEM, with P < 0.2 during observed events. These small P values very likely arise from our use of ERA5-Land global wind field data (11 km pixels; daily maximum winds) which, like most global modelled wind field data, will struggle to describe episodic, mesoscale events such as LLJs and cold pooling (Fan et al., 2020; Caton Harrison et al., 2021). Instead, these wind data describe a single spatial mean value per 11 km pixel, which is subsequently used to form u s*, which is then compared to u *ts H (at the grain scale without adjustment). The problem with this mean value is not that it is provided at 11 km, but that the spatial mean wind is derived from the 40 m 'blending' height. When that 11 km spatial mean value is provided by ERA5-Land at the 10 m height, it is assumed that the aerodynamic roughness length is static over time and fixed over space (for a given land cover type; see technical ECMWF details). Our AEM used maximum daily ERA5-Land wind speeds to increase the chance of simulating dust-producing winds. However, maximum values still describe the spatial mean across the 11 km pixel during that period. If peak wind speeds occur suddenly and/or in only a portion of an 11 km pixel, the mean pixel value will not capture the magnitude of those peak wind conditions at a given point dust source. Consequently, no distinct change in peak u s* conditions can be identified during local (discrete) or sudden dust emission events, as demonstrated by the parity in P(u s* > u *ts H) during observed dust events and all days (neutral ΔP; Fig. 4c). These results indicate that the (ERA5-Land) downscaling of wind using simplifying assumptions about aerodynamic roughness is limiting our ability to tackle sub-grid scale heterogeneity in wind fields and related applications in dust emission, e.g., the impact of wildfires on mineral dust emission (Menut et al., 2022). For clarity, we do not interpret this to mean that we should use finer resolution information. That will not tackle the sub-grid scale heterogeneity. We interpret this to mean that the downscaling of the wind field aerodynamic roughness needs to be improved.
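The averaging effect described above, where a pixel-mean wind hides a localised peak, can be illustrated numerically. All values here are hypothetical: the sub-pixel wind speeds are invented, the ratio u s*/U h = 0.03 is an assumed value for a barren surface (the text reports > 0.028), and the 0.2 m s⁻¹ threshold is the approximation used for the visualisations. The sketch illustrates only the averaging, not the roughness-based downscaling that the text identifies as the underlying issue.

```python
RATIO = 0.03     # assumed u_s*/U_h for a barren surface (hypothetical)
THRESHOLD = 0.2  # approximate u_*ts H (m/s) used for the visualisations

# Hypothetical wind speeds (m/s) in ten equal parts of one 11 km pixel:
# calm conditions everywhere except one part affected by a gust front.
sub_pixel_wind = [4.0] * 9 + [12.0]

pixel_mean_wind = sum(sub_pixel_wind) / len(sub_pixel_wind)
peak_wind = max(sub_pixel_wind)

u_from_mean = RATIO * pixel_mean_wind  # what a pixel-mean wind would give
u_from_peak = RATIO * peak_wind        # what the dust source experiences

print(u_from_mean > THRESHOLD)  # emission is not simulated from the mean
print(u_from_peak > THRESHOLD)  # emission would occur at the peak
```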

Inadequate assumption of infinite supply of fine sediments
The AEM over-estimated the frequency of dust emission at all DPS sites. However, it also failed to simulate all of the observed dust emission events. The dichotomous statistics demonstrate that AEM dust emission occurs predominantly when no observation was made (27.4 % of the time; Table 3). At these dust sources, P(u s* > u *ts H) is large all the time (bottom row in Table 4). As the AEM has no description of the availability of dry, loose material to generate sediment transport (soil erodibility), it will produce dust emission whenever u s* conditions are large enough to exceed u *ts (many false positives). The entrainment threshold is exceeded more frequently in areas where the prevailing wind speeds remain frequently large. Our results show large daily P(u s* > u *ts H) across Mesopotamia, the Sistan Basin (Iran / Afghanistan) and the Namib Desert (Fig. 4b), where dust emission is simulated >80 % of the time in response to frequent large winds. These occur in the northwesterly Shamal winds of Mesopotamia (Bou Karam Francis et al., 2017; Yu et al., 2016), the Sistan winds in eastern Iran (Rezazadeh et al., 2013) and the Berg winds across the Namibian coast (von Holdt et al., 2017). The DPS observations peak in some of these regions, yet continue to occur infrequently, with P(DPS > 0) < 0.3. With sufficient u s* to initiate dust emission 80 % of the time, the scarcity of observations indicates an absence of erodible material. Despite an assumed infinite supply of loose material in the model, dryland environments are well known to be supply-limited (Bullard et al., 2011; Klose et al., 2019; Parajuli et al., 2014; von Holdt and Eckardt, 2018; Zender, 2003). Ephemeral processes and the preferential transposition of fine materials are often considered key in the episodic nature of dust emission (Rashki et al., 2017). In supply-limited areas, once these fine materials are deposited, there exists a finite period of increased dust emission potential. During the intervening periods, supply is either exhausted or protected from erosive winds by the formation of biogeophysical crusts (Vos et al., 2020) or surface 'armouring'. Accordingly, dust source areas like the Sistan Basin, Tigris-Euphrates Basin (Syria/Iraq), and the Kuiseb River catchment (Namibia), where ephemeral or fluvial systems (with variable flow rates) occur, will tend to be limited by the production of fine materials (von Holdt and Eckardt, 2018). While the impact caused by the simplistic model assumption of infinite sediment supply is most apparent in frequently windy areas, our results (27.4 % 'false positive' simulations) suggest that the mismatch between the assumption and the DPS observations of dust emission occurs in all dryland areas (Fig. 4b).

Uncertainties in evaluating dust emission models
Several sources of uncertainty are associated with the evaluation of dust emission estimates, including the use of dust emission point source (DPS) data in this work. The uncertainties surrounding DPS data are known but largely unquantified as we develop this new framework for the evaluation of dust emission model development. We provide a description of these DPS uncertainties below, which includes the rationale for new and additional work to support this new framework. First, we place the uncertainties of this new framework into the broader context of how dust emission models are currently evaluated and how much (total) uncertainty is known and quantified with dust emission modelling itself. This ensures that the value of this new framework is appreciated.
At the largest ESM scale, the dust emission model itself is not evaluated; instead, the dust cycle model is evaluated against dust optical depth. This approach assumes that there is no global spatio-temporal bias in the dust emission model (Huneeus et al., 2011; p. 7809). Recent evidence indicates that at this global scale that assumption is not valid, and that uncertainty in the dust emission model was largely unrecognised and much larger than expected (Chappell et al., 2023a, 2023b). Recent work using global DPS data to calibrate the AEM has provided the first quantitative estimate of dust emission model uncertainty (Chappell et al., 2023a). At the regional scale, numerical weather prediction models are typically evaluated using dust optical depth (LeGrand et al., 2023). Despite being at a fine spatio-temporal resolution, this approach also does not enable the dust emission model itself to be evaluated. At the field scale, active dust concentration measurements are used. Whilst this approach brings the evaluation closer to the dust emission process, uncertainty remains as to the difference between proximal and distal dust being measured. Therefore, it seems reasonable to conclude that we have very poor constraints on dust emission model uncertainty across these different scales.
For our new framework, we have described the methodologies for establishing dust emission point source (DPS) data, which include protocols for consistent and repeatable identifications. The reproducibility issues raised by earlier studies of DPS data (e.g., Sinclair and LeGrand, 2019) are avoided here with our use of 1° grid cells. The difference between previous approaches and our approach is similar to the way in which incompatible spatial data are combined (Gotway and Young, 2002). By using large 1° grid cells we have many samples of dust emission observations across relatively few cells, which adequately represents the within-class variance. Without this large support size, the large number of samples would be spread over very many smaller cells, reducing the number of samples per cell, which would increase the within-class variance and hamper the ability to reliably detect differences. In other words, uncertainties associated with individual DPS identifications (or ground-based dust optical depth measurements) are reduced considerably by our 1° grid cell aggregated approach.
Uncertainty in the validity of DPS data focusses on how they represent the magnitude and frequency of dust emissions. Hennen et al. (2019; Fig. 2) assigned a level of confidence (to their SEVIRI DPS data of the Middle East) which serves as a useful basis for discussing these sources of uncertainty. Here, low confidence levels are primarily indicative of difficult observation conditions, including the presence of meteorological clouds, and night-day temperature differences (Murray et al., 2016).
Level 1 data have the greatest confidence ascribed to them. The DPS data associated with MODIS are sun-synchronous, which excludes dust emissions during the night. Similarly, reduced land surface 'skin' temperature and night-time atmospheric temperature inversion reduce thermal contrast during night-time conditions, potentially precluding a portion of SEVIRI DPS events at night (Hennen et al., 2019). However, we did not filter the daily wind speed maxima used in the albedo-based dust emission model to be sun-synchronous. Whilst night-time dust observations are omitted from MODIS DPS, they are included in SEVIRI DPS. Across the global DPS dataset this reduces the possibility of a systematic bias.
Level 2 and 3 confidence in DPS data are associated with partial cloud cover close to the dust emission source, where upwind dust emission activation does not obscure the observed emission surface.The detection of these level 3 emissions typically occurs downwind of other dust plumes, or within challenging surface conditions, where DPS identification requires meticulous monitoring of dust plume evolution through sequential images (Hennen et al., 2019).The MODIS DPS data are unable to provide this plume tracking and are very likely to be biased away from these types of dust.However, across the global DPS data, the mixture of MODIS and SEVIRI data is unlikely to include a systematic bias.Furthermore, counting dust emission events in the model above a small threshold emission flux may help bound the bias in the DPS data.For each of these different levels of confidence, we have been unable to find any information in the literature on the magnitude of any potential bias in DPS data.

Conclusions
Several new insights for model performance have arisen from this work, with implications for the prospects of dust emission modelling. Satellite observed dust emission point source (DPS) data, aggregated to be compatible with the scale of dust emission model simulations, demonstrate that dust emission is rare, even in regions with many dust sources (e.g., North Africa, Middle East). Notwithstanding recent improvements in dust emission modelling using the albedo-based approach, the AEM currently over-estimates dust emission occurrence by up to two orders of magnitude. We describe elsewhere how these over-estimates are reduced by calibration with DPS data.
Our AEM over-estimation of dust emission is globally systematic, which we interpret here to be due to (i) the consistent difference between the scale of the wind friction velocity (using MODIS albedo at 500 m) and the scale of wind field data (using ERA5-Land at 11 km pixels) and (ii) estimates of wind speed (downscaled from 40 m to 10 m height) based on land surface roughness values that are static over time and fixed over land cover classes. Similarly, we know that the entrainment threshold is derived at the grain scale, which is incompatible with those areal estimates of wind and wind friction velocity. Furthermore, the longstanding dust emission modelling assumption of an infinite supply of dry, loose and available sediment is evidently unreasonable and causes some of the discrepancy between dust emission modelling and DPS data.
Our results demonstrate that the following future improvements in dust emission modelling will be most effectively tackled in an integrated approach because of the interaction between magnitude and frequency of sediment transport and dust emission, by:
• evaluating how various atmospheric conditions are represented by DPS data, by conducting DPS studies across a wider range of dust source types in alternate dust source regions. This work may usefully include an attempt to quantify the bias in DPS data.
• overcoming the current incompatibility of grain-scale entrainment threshold with one which is area-weighted and varies over space and time with soil surface conditions. Applying linear scaling of the normalised shadow data before those normalised shadow data are calibrated to the wind friction velocity will overcome the longstanding non-linearity in sediment transport and dust emission modelling.
• improving the parametrisation of sediment supply / availability by spatially area-weighting, changing over space, and scaling linearly for consistency with other model data.
Our results suggest that routine evaluation of dust emission model performance should be made against dust emission measurements, for which we now have a large database of satellite observed dust emission point source (DPS) data. We emphasise in our evaluation the important difference between dust emission observations and atmospheric dust concentrations, and the role each type of data plays in identifying model performance. Rather than evaluate developments in dust emission model parameterisation by assessment against dust in the atmosphere (e.g., dust optical depth), we recommend that evaluations of global dust emission modelling improvements are made first against global dust emission point source data. Dust emission model fidelity will then be described by the coincidence in space and time with those dust emission observations. In due course, we expect this new approach to re-balance dust emission modelling towards the skill of dust emission schema. As new data sources emerge and emission schema develop, this new evaluation approach will benefit the dust modelling research community by avoiding enduring modelling deficiencies through objective critical re-evaluation.

• Satellite-observed dust emissions (DPS) are rare (1.8 %) even in North Africa.
• Albedo-based dust emission model (AEM) coincided 71 % with DPS data.
• The AEM simulated dust emission 27 % when no dust emission was observed.
• Incompatible scales and crude model assumptions caused false positives.
• DPS provide consistent and reproducible framework for dust emission model development.
number of simulations (daily grid box) lost due to missing albedo data.

Fig. 3. Empirical cumulative distribution functions (ECDF) of satellite observed dust emission point sources (DPS) from 9 studies across 6 dryland regions compared to MODIS (500 m pixels) albedo-derived wind friction velocity (u s* ) estimated using ERA5-Land (11 km pixels) wind speed at 10 m height. The vertical black line approximates (for the visualisation) the actual model entrainment threshold (u *ts ), which is fixed over space (for given soil types) and static over time. The distribution of u s* either side of the black line (u *ts ) represents the probability of modelled dust emission during (a) all modelled days during the duration of the respective study, and (b) observed days, including only modelled u s* conditions at locations and days where DPS emissions were observed. The red dashed line describes the theoretical u *ts required to omit 98 % (blue horizontal line) of occurrences from the global combined distribution of u s* conditions (black dashed line), matching the observed frequency of the 9 regional studies (combined) (Table 3).
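The threshold construction in the caption above can be sketched as follows. This is an illustration under stated assumptions, not the AEM implementation: the u s* samples are synthetic, the fixed threshold of 0.2 is a hypothetical value, the soil moisture function H is taken as 1, and the function names are invented for this example. The sketch computes the frequency of exceedance P(u s* > u *ts) and the smallest sampled threshold that leaves no more than a target fraction (here 2 %) of values above it.

```python
def exceedance_probability(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def threshold_for_target(values, target_frequency):
    """Smallest sampled value whose exceedance fraction is <= target_frequency."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of samples allowed to exceed the threshold
    # (the small epsilon guards against floating-point rounding).
    allowed = int(target_frequency * n + 1e-9)
    return ordered[max(n - allowed - 1, 0)]


# Synthetic friction velocities 0.01 ... 1.00 (illustrative, not model output).
u_s = [i / 100 for i in range(1, 101)]

u_ts_fixed = 0.2  # hypothetical fixed entrainment threshold, with H assumed to be 1
p_model = exceedance_probability(u_s, u_ts_fixed)

u_ts_required = threshold_for_target(u_s, 0.02)  # omit 98 % of occurrences
p_required = exceedance_probability(u_s, u_ts_required)
```

In this synthetic case the fixed threshold is exceeded on most samples, whereas the threshold read from the upper tail of the empirical distribution reduces the exceedance frequency to the target. The same quantile logic underlies the red dashed line in the figure.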

Fig. 4. Maps describing the probability of dust emission P(u s* > u *ts H) at a 1° grid resolution, during (A) observed days and locations where dust point source (DPS) emissions were observed, and (B) all days and locations during the length of the respective study. The difference (C) in ΔP between observed and all days describes the relative difference in u s* conditions during each period. Red grid boxes describe positive ΔP, meaning winds are stronger during DPS dust events than during all modelled days. Blue grid boxes describe negative ΔP, indicating winds are weaker during DPS events than during all modelled days. Light blue, yellow, and orange grid boxes describe neutral ΔP, indicating no, or very little, discernible difference between wind conditions during DPS events and all modelled days.

Table 1
Contingency table describing the frequency of occurrences in the observations and simulations. The joint distribution boxes (Hit, False Positive, Miss, Correct Negative) compare the binary responses of the observations and simulations. The totals describe the marginal distribution for either observation or simulation and are independent of each other.
Marginal totals: 'Hit' + false positive; 'Miss' + correct negative; Grand total.
M. Hennen et al.
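A contingency table of this kind can be built from paired dichotomous series, as in the minimal sketch below. The input values are made up, the function names are illustrative, and the two rates use the standard forecast-verification definitions (hit rate as hits over all observed events; false positive rate as false positives over all observed non-events). These definitions are an assumption here and may differ from the exact statistics reported in Table 3.

```python
def contingency(observed, simulated):
    """Count (hits, false positives, misses, correct negatives) for paired binary series."""
    hits = false_pos = misses = correct_neg = 0
    for o, s in zip(observed, simulated):
        if s and o:
            hits += 1          # simulated and observed
        elif s:
            false_pos += 1     # simulated but not observed
        elif o:
            misses += 1        # observed but not simulated
        else:
            correct_neg += 1   # neither simulated nor observed
    return hits, false_pos, misses, correct_neg


def hit_rate(hits, misses):
    """Fraction of observed events that were also simulated."""
    return hits / (hits + misses)


def false_positive_rate(false_pos, correct_neg):
    """Fraction of observed non-events on which emission was simulated."""
    return false_pos / (false_pos + correct_neg)


# Made-up daily grid-box outcomes (1 = dust emission, 0 = none).
obs = [1, 1, 0, 0, 0, 0, 1, 0]
sim = [1, 0, 1, 0, 0, 1, 1, 0]

hits, false_pos, misses, correct_neg = contingency(obs, sim)
pod = hit_rate(hits, misses)
fpr = false_positive_rate(false_pos, correct_neg)
```

The row totals (hits + false positives, misses + correct negatives) give the marginal distribution of the simulations, and the corresponding column totals give that of the observations.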

Table 3
Categorical statistics for albedo-based dust emission model (AEM) simulations (F > 0) when compared to all satellite observed dust emission point sources (DPS) combined.