A novel approach to the development of 1-hour threshold concentrations for exposure to particulate matter during episodic air pollution events.

Episodic air pollution events that occur because of wildfires, dust storms and industrial incidents can expose populations to particulate matter (PM) concentrations in the thousands of µg m-3. Such events have increased in frequency and duration over recent years, with this trend predicted to continue in the short to medium term because of climate warming. The human health cost of episodic PM events can be significant, and inflammatory responses are measurable even after only a few hours of exposure. Consequently, advice for the protection of public health should be available as quickly as possible, yet the shortest averaging period for which PM exposure guideline values (GVs) are available is 24-h. To address this problem, we have developed a novel approach, based on Receiver Operating Characteristic (ROC) statistical analysis, that derives 1-h threshold concentrations that have a probabilistic relationship with 24-h GVs. The ROC analysis was carried out on PM10 and PM2.5 monitoring data from across the US for the period 2014-2019. Validation of the model against US Air Quality Index (AQI) 24-h breakpoint concentrations for PM showed that the maximum-observed 1-h PM concentration in any rolling 24-h averaging period is an excellent predictor of exceedances of 24-h GVs.


Introduction
In this paper we present a novel approach to the development of 1-hour threshold concentrations (TCs) for exposure to particulate matter (PM) during episodic air pollution events, as might occur during wildfires (Rappold et al., 2017), dust storms (Milford et al., 2020, Zhang et al., 2016, Rublee et al., 2020) or incidents at industrial facilities (Griffiths et al., 2018). Populations exposed to episodic air pollution events can experience PM concentrations in the hundreds and even thousands of µg m-3 (Griffiths et al., 2018). Our approach uses a model that is developed using Receiver Operating Characteristic (ROC) statistical analysis of ambient monitoring data from the US over the period 2014 to 2019. The development of 1-hour TCs is needed because health effects of elevated PM exposure are apparent at a timescale of hours, as evidenced by measurable inflammatory responses in volunteers exposed to PM in the 100 to 300 µg m-3 range over short durations (Behndig et al., 2006, Tong et al., 2014, Ghio et al., 2000, Salvi et al., 1999, Stenfors et al., 2004). Short-term (hours) health effects have also been noted in firefighters (Greven et al., 2012, Swiston et al., 2008, Main et al., 2020). Nevertheless, unlike nitrogen dioxide and sulphur dioxide, which have health-based GVs for exposure durations of 1 hour or less (US EPA, 2014, WHO, 2006b), and many chemical substances for which there are Acute Exposure Guideline Levels (AEGLs) for periods as short as 10 minutes (Stewart-Evans et al., 2016), no such values are available for PM10 and PM2.5.
The need for the development of short-term PM exposure guidance has become more pressing in recent years because periods of highly elevated PM concentrations have increased in frequency, duration and extent, especially during wildfires (Balmes, 2018, Dodd et al., 2018, Ford et al., 2018, Howard et al., 2021). These events are responsible for causing significant ill health effects (Reid et al., 2016, Haikerwal et al., 2015, Faustini et al., 2015, Black et al., 2017, Cascio, 2018), particularly in the more vulnerable residents of an exposed area (Finlay et al., 2012, Holm et al., 2020, Wakefield, 2010). In addition, the toxicity of particulate emissions during combustion-related episodic pollution events has been found to be higher than for equivalent concentrations of ambient particulates (Wegesser et al., 2009). This enhanced toxicity is due to the wide range of chemical toxins present in PM that originate from combustion processes, including PAHs and benzene (Balmes, 2018, Wegesser et al., 2009). The incidence of wildfires, globally, is predicted to increase in the medium term as a result of a warming climate (Moritz et al., 2012) and there is also evidence that fires at waste management sites are more frequent during warmer conditions (Griffiths et al., 2018).
The health impacts of such changes may be considerable, with a recent study suggesting that premature deaths due to PM2.5 exposure during wildfires in the US alone could increase from the current 17,000 per year to 42,000 per year by 2050 (Ford et al., 2018). The associated health-related economic costs are also expected to be significant (Johnston et al., 2020, Kochi et al., 2016).

J o u r n a l P r e -p r o o f
The absence of short-term GVs for PM10 and PM2.5 has been acknowledged in the literature and there have been several studies that have derived surrogate exposure guidance for periods as short as one hour (Griffiths et al., 2018, Stieb et al., 2008, Mintz et al., 2013, European Union, 2020b, Connolly and Willis, 2013). A common theme to these approaches has been the relationship between the maximum hourly concentration within a 24-hour period and the corresponding mean value. A notable example is the European Union's (EU) Common Air Quality Index (CAQI), which has five classes ranging from 'Very Low' to 'Very High', each with corresponding concentration thresholds for PM10 and PM2.5, both for 1-hour and 24-hour measured concentrations (European Union, 2007). The 1-hour thresholds between the CAQI categories for PM10 were derived from the 24-hour limits by dividing the latter by a factor of 0.55, which is the ratio between the mean 24-hour concentration and the maximum hourly concentration within the same period. The ratio of 0.55 is based on European ambient monitoring data from 52 urban monitoring stations for the period 2001-2004 (European Union, 2007). Thus, for the 24-hour PM10 category boundary between 'medium' and 'high' (set at 50 µg m-3, which is the same as the EU/WHO 24-hour ambient guideline value) the calculated 1-hour category boundary is set at 90 µg m-3 (after rounding). For PM2.5, the class boundaries are based on those of PM10, applying a factor of 0.6, which is the fraction of PM10 that is PM2.5, again based on European monitoring data (European Union, 2020b). Stieb et al. (2008) employed a similar approach to CAQI in determining short-term (3-hour) TCs. A numerically identical ratio was observed between the 24-hour mean concentration and the 3-hour maximum concentration for PM10 or PM2.5 (based on monitoring data collected in Canada over the period 1998 to 2000). Stieb et al. (2008) illustrated their approach using the US Air Quality Index (AQI) boundary between 'moderate' and 'unhealthy for sensitive groups' (AQI = 100), at which the corresponding 24-hour PM10 sub-index guideline value of 150 µg m-3 has an equivalent 3-hour TC of 275 µg m-3.
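The arithmetic behind the fixed-factor approach can be sketched as follows. This is only an illustration of the division described above; the function name `ratio_threshold` is our own and does not come from the cited studies, and the published values (90 and 275 µg m-3) reflect rounding choices made by their authors.

```python
def ratio_threshold(gv_24h, ratio):
    """Short-term threshold implied by dividing a 24-hour GV by a mean:max ratio.

    `ratio` is the mean 24-hour concentration divided by the maximum
    short-term concentration in the same period (0.55 in the cited studies).
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must lie in the interval (0, 1]")
    return gv_24h / ratio


# CAQI PM10 'medium'/'high' boundary: 50 / 0.55 is about 90.9 (published as 90)
caqi_pm10 = ratio_threshold(50, 0.55)

# Stieb et al. (2008) example: 150 / 0.55 is about 272.7 (published as 275)
stieb_pm10 = ratio_threshold(150, 0.55)
```

A single fixed ratio gives one threshold per GV, with no statement of how often that threshold actually anticipates an exceedance of the 24-hour value.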
In the UK, the Department of Environment, Food and Rural Affairs (DEFRA) has derived 1-hour 'trigger' concentrations as a component of the UK's Daily Air Quality Index (DAQI) (Connolly and Willis, 2013, Holgate, 2011). The trigger concentrations establish a relationship between 1-hour measurements and the 24-hour mean concentration ranges that correspond to the DAQI air pollution categories of 'low', 'moderate', 'high' or 'very high'. Under the DAQI methodology, if two consecutive 1-hour measurements breach a 'trigger' concentration, this is taken to indicate that current air quality falls within the relevant DAQI category, thus providing a 'real-time' element to public information about air pollution levels in the UK. The trigger concentrations were derived using a categorical model based on 270,000 days of PM10 data and 27,000 days of PM2.5 data from automatic monitoring stations across the UK for the period 2004 to 2009.
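The two-consecutive-hours rule can be expressed as a short check on an hourly series. The sketch below is a hypothetical illustration, not the DEFRA implementation: the function name is ours, missing hours are represented as `None`, and treating a 'breach' as strictly greater than the trigger is an assumption.

```python
def first_trigger_hour(hourly, trigger):
    """Return the index of the hour at which two consecutive 1-hour values
    have both breached `trigger`, or None if that never happens.

    Assumption: a breach means strictly greater than the trigger value,
    and a missing hour (None) resets the run.
    """
    for i in range(1, len(hourly)):
        prev, curr = hourly[i - 1], hourly[i]
        if prev is None or curr is None:
            continue
        if prev > trigger and curr > trigger:
            return i
    return None


# Illustrative (invented) PM10 series in µg m-3, tested against a trigger of 107
hour = first_trigger_hour([50, 120, 90, 110, 130], 107)
```

In this invented series the isolated peak at index 1 does not raise the trigger; only the pair at indices 3 and 4 does.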

In our own work in this field, we have developed a similar categorical model to DAQI, though one that is based on the higher concentration ranges that are observed during major incident fires (Griffiths et al., 2018). The model uses 1-hour PM10 and PM2.5 measurements to predict exceedances of 24-hour guideline and threshold concentrations that relate to public health advice during such incidents. The model development was based on monitoring data obtained from the UK's Air Quality in Major Incidents (AQiMI) programme (Griffiths et al., 2018), which coordinates field monitoring of a range of atmospheric pollutants arising from fires and loss of containment incidents at industrial facilities and waste disposal sites in the UK. Both authors of the present paper were involved in the AQiMI programme (Griffiths et al., 2018). The model demonstrated that there is a threshold concentration of 1-hour measured PM that, when breached, gives a defined probability that a 24-hour guideline value will also be exceeded.
Other approaches to providing more responsive information on ambient PM exposure include: remote sensing (Krstic and Henderson, 2015), real-time dispersion modelling, e.g. the BlueSky wildfire smoke forecasting service used in Canada (Yao et al., 2013), predictive models based on autoregression neural networks (Videnova et al., 2006), non-linear models using 'big data' (Xu et al., 2020), textual analysis of social media postings (Sachdeva and McCaffrey, 2018) and the USEPA Nowcast methodology which contributes to the US AirNow forecasting service (Mintz et al., 2013).
The latter uses short term monitoring data (the previous 12 hours) to predict an equivalent 'instantaneous' 24-hour PM concentration, which can then be compared to various health criteria.
This method is designed to be responsive at times of rapidly changing pollution conditions. It does this by giving greater weightings to the three most recent hours of air pollution data at times when the air quality is very variable but gives more equal weighting to the previous 12 hours of air pollution data when pollution concentrations are more stable.
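The weighting behaviour described above can be sketched in a few lines. The paper describes NowCast only qualitatively; the specific weight factor (minimum divided by maximum concentration over the 12 hours, with a floor of 0.5) and the requirement for at least two valid values in the three most recent hours are taken from the commonly published description of the PM NowCast and should be treated as assumptions here, not as part of the present study.

```python
def nowcast(last_12h):
    """Weighted average of up to 12 hourly PM values.

    last_12h[0] is the most recent hour; missing hours are None.
    Assumed form: weight w = max(c_min / c_max, 0.5), and hour i
    (0 = most recent) contributes with weight w ** i.
    """
    valid = [(i, c) for i, c in enumerate(last_12h[:12]) if c is not None]
    # Assumed validity rule: at least two of the three most recent hours present
    if sum(1 for i, _ in valid if i < 3) < 2:
        return None
    values = [c for _, c in valid]
    c_max, c_min = max(values), min(values)
    if c_max <= 0:
        return 0.0
    w = max(c_min / c_max, 0.5)
    numerator = sum((w ** i) * c for i, c in valid)
    denominator = sum(w ** i for i, _ in valid)
    return numerator / denominator


stable = nowcast([40.0] * 12)             # stable air quality: equal weights
rising = nowcast([200.0] + [20.0] * 11)   # sudden peak in the latest hour
```

With stable concentrations the weight is 1 and all 12 hours count equally; with a sudden peak the weight falls to 0.5, so the three most recent hours carry most of the total weight.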
The present paper builds upon our previous work (Griffiths et al., 2018) by using Receiver Operating Characteristic (ROC) analysis (Fawcett, 2006) of US EPA data for the period 2014 to 2019 to develop a probabilistic model from which 1-hour TCs can be derived for PM10 and PM2.5. The US Air Quality Index (AQI), and associated breakpoint concentrations for the PM sub-indices, were used as the source of GVs, as they represent a wide range of pollutant concentrations (0 to 500 µg m-3 for PM2.5 and 0 to 605 µg m-3 for PM10) and are appropriate to the US monitoring data.

Principles of ROC analysis, as applied to air quality data
The ROC model development work described in this paper was carried out as follows. Firstly, we built a model based on ROC analysis of PM10 and PM2.5 ambient concentration measurements from across the US for the period 2014 to 2019. This allowed us to derive 1-hour TCs that gave defined probabilities that selected 24-hour GVs would be exceeded. We then evaluated the performance of the model using a cross-validation approach and also carried out a separate evaluation of an ROC model that was developed using monitoring data from California only.
ROC analysis is a classification technique that assesses the ability of a predictive parameter, or 'classifier', to discriminate between two outcomes. It has been widely used for health-related diagnostic analysis (Hajian-Tilaki, 2013, Phillips et al., 2010), for example to identify biomarkers in serum related to PM10 exposure (Lee et al., 2015). In the environmental field, ROC analysis has been applied to the development of a model to predict the quality of beach water for swimming, based on either the previous day's rainfall or bacterial counts (Morrison et al., 2003). We used a variant of this analysis when developing our original model for AQiMI data (Griffiths et al., 2018). For that work, and in the present study, the classifier is the maximum 1-hour concentration in a rolling 24-hour period. The outcome is whether (or not) the mean concentration of the corresponding 24-hour period exceeds a selected 24-hour GV. The value of the GV can be selected from a range of health-based values developed by the WHO, USEPA, EU and other bodies.
The utilisation of ROC analysis is illustrated in Figure 1, which shows a typical output in which the true positive rate (TPR, or sensitivity) is plotted against the false positive rate (FPR, or 1 - specificity) (Fawcett, 2006). In formal terms, TPR and FPR are defined in Equations 1 and 2 respectively, where for a given set of analysed data, TP is the number of true positives (i.e. the model correctly predicts an exceedance of a 24-hour guideline value), FN is the number of false negatives (the model incorrectly predicts that the 24-hour guideline value is not exceeded), FP is the number of false positives (the model incorrectly predicts an exceedance of a 24-hour guideline value) and TN is the number of true negatives (the model correctly predicts that the 24-hour guideline value is not exceeded).

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad \text{(Eq. 1)}$$

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{TN} + \mathrm{FP}} \qquad \text{(Eq. 2)}$$

Figure 1: Example ROC plot. The solid blue line shows an ROC curve for the analysis of an illustrative set of PM10 data where the 'outcome' is whether (or not) a defined 24-hour guideline value has been exceeded. The 'classifier' is the maximum 1-hour concentration within the same 24-hour period. The dotted line shows an example of an ROC analysis where the 'classifier' correctly predicts 100% of outcomes (area under the curve, AUC = 1.0), whereas the dashed line shows an example curve where only 50% of outcomes are correctly predicted by the classifier (AUC = 0.5). The vertical lines indicate different selections of TPR, together with the corresponding FPR. The optimal situation is to have a TPR as close as possible to 1, whilst minimising the FPR. Also shown is the numerical value of the 'classifier' that corresponds to the selected TPR/FPR.
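The definitions of TPR and FPR given above can be made concrete with a minimal sketch that counts TP, FN, FP and TN for one candidate 1-hour threshold. The function and variable names are our own, the data are invented, and the convention that a prediction is positive when the maximum 1-hour value is at or above the threshold is an assumption.

```python
def confusion_rates(c_max24, c_24, threshold, gv):
    """Return (TPR, FPR) for one candidate 1-hour threshold.

    c_max24: maximum 1-hour concentration in each rolling 24-hour period
    c_24:    mean concentration of the same 24-hour periods
    A period is predicted positive if c_max24 >= threshold (assumed
    convention) and is actually positive if c_24 > gv.
    """
    tp = fn = fp = tn = 0
    for cmax, cmean in zip(c_max24, c_24):
        predicted = cmax >= threshold
        actual = cmean > gv
        if predicted and actual:
            tp += 1
        elif not predicted and actual:
            fn += 1
        elif predicted and not actual:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (tn + fp) if (tn + fp) else float("nan")
    return tpr, fpr


# Invented example: six rolling periods, threshold 255 µg m-3, GV 150 µg m-3
tpr, fpr = confusion_rates(
    [300, 260, 200, 120, 90, 400],
    [160, 155, 100, 60, 40, 140],
    threshold=255,
    gv=150,
)
```

Repeating this calculation over every possible threshold value traces out the ROC curve.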
The area under the curve (AUC) for the solid line in Figure 1 is an important parameter for ROC analysis, representing the overall probability that the chosen classifier parameter will rank a randomly chosen true positive instance above a randomly chosen true negative instance (Fawcett, 2006). For our illustrative dataset the AUC is 0.843, meaning that in 84.3% of cases the maximum 1-hour concentration of a randomly chosen 24-hour period that exceeds the GV is higher than that of a randomly chosen period that does not.
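The ranking interpretation of the AUC can be checked directly by comparing every positive instance with every negative instance. The sketch below is a brute-force illustration on invented values, suitable only for small samples; counting ties as half a win is the usual convention and is assumed here.

```python
def auc_by_ranking(scores_pos, scores_neg):
    """Probability that a randomly chosen positive scores above a randomly
    chosen negative (ties count as 0.5). Equivalent to the area under the
    ROC curve, computed here by exhaustive pairwise comparison.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented maximum 1-hour values for periods that did / did not exceed the GV
auc = auc_by_ranking([300, 260, 180], [200, 120, 90])
```

Here eight of the nine positive/negative pairs are ranked correctly, so the AUC is 8/9; only the pair (180, 200) is mis-ranked.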
Two other example curves are shown in Figure 1: a dotted line with an AUC of 1.0 (100% probability of distinguishing between two outcomes) and a diagonal, dashed line with an AUC of 0.5, which means there is only a 50% probability of correctly discriminating between two outcomes, i.e. no better than chance. Curves that appear below the diagonal represent situations where the model classifier is giving rise to a reciprocal classification. The ROC curve can also be used to decide on the acceptable TPR that will at the same time minimise the FPR, as shown by the red and blue vertical lines, and their relationship with the solid curve. For example, we could decide it is essential that all true positive values are correctly identified (red line) and accept an 80% false positive rate, or we could compromise on a lower level of true positive identification, which has the advantage of a lower false positive rate, as shown by the blue line, where TPR = 95% and FPR = 60%. However, the important point is that for each selected TPR, there is an associated value of the classifier (the maximum 1-hour concentration in any 24-hour period).
Thus, for the PM10 data shown in Figure 1, the value of classifier concentration that will give a true positive rate of 95% (and a false positive rate of 60%) is 255 µg m-3, whereas to achieve a true positive rate of 100% (and a false positive rate of 80%), we would need to lower the threshold to 180 µg m-3. The value of the classifier at a given TPR is, de facto, a 1-hour TC that has a probabilistic link to the selected 24-hour GV.
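Reading a 1-hour TC from the ROC output for a chosen TPR can be sketched as follows. This is a simplified stand-in for the ROC output table, not the procedure used in the statistical package: the function name and data are invented, and a prediction is assumed to be positive when the maximum 1-hour value is at or above the threshold.

```python
import math


def threshold_for_tpr(pos_cmax, neg_cmax, target_tpr):
    """Return (threshold, FPR) for the highest threshold whose TPR is at
    least `target_tpr`.

    pos_cmax: maximum 1-hour values for periods that exceeded the 24-hour GV
    neg_cmax: maximum 1-hour values for periods that did not
    """
    ranked = sorted(pos_cmax, reverse=True)
    # Number of positives that must sit at or above the threshold; the small
    # offset guards against floating-point overshoot in the product.
    k = max(math.ceil(target_tpr * len(ranked) - 1e-9), 1)
    threshold = ranked[k - 1]
    fpr = sum(1 for c in neg_cmax if c >= threshold) / len(neg_cmax)
    return threshold, fpr


# Invented data in µg m-3 (not taken from Figure 1)
positives = [180, 255, 260, 270, 300, 320, 350, 400, 450, 500]
negatives = [50, 100, 150, 200, 260]
tc_100, fpr_100 = threshold_for_tpr(positives, negatives, 1.0)
tc_90, fpr_90 = threshold_for_tpr(positives, negatives, 0.9)
```

As in the text, demanding a higher TPR forces the threshold down and the FPR up: in this invented sample the threshold falls from 255 to 180 µg m-3 and the FPR rises from 0.2 to 0.4.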
Regarding selection of an appropriate TPR/FPR value, the decision must be made on the basis of an acceptable scale of risk that balances public health protection against the resources that are available for incident response. The advantage of our ROC analysis approach to defining 1-hour TCs is the ability to specifically define the probabilities on which these decisions are made.
It is important to emphasise that because we use the maximum 1-hour concentration (in any 24-hour period) as the classifier parameter in the ROC analysis, and that this value can occur at any position in a rolling 24-hour period, the results obtained from the models on which this analysis is based must have an equivalent interpretation. In other words, a 1-hour TC could be triggered at the beginning, middle or end of the 24-hour period that is predicted to exceed the relevant 24-hour GV. A preliminary analysis using SPSS showed that the median position for the maximum 1-hour concentration was at hour 12 for both PM10 and PM2.5, as might be expected.
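The pairing of classifier and outcome for each rolling period can be illustrated with a short sketch that returns the 24-hour mean and the maximum 1-hour value for every rolling window. The function name is ours, and the minimum number of valid hours per window (`min_valid=18`) is a hypothetical data-capture rule, not one stated in this paper.

```python
def rolling_24h(hourly, min_valid=18):
    """For each rolling 24-hour window return (mean, maximum), or None if
    fewer than `min_valid` hours are present.

    hourly: list of hourly concentrations, with None for missing hours.
    min_valid: assumed data-capture requirement (hypothetical value).
    """
    results = []
    for end in range(24, len(hourly) + 1):
        window = [c for c in hourly[end - 24:end] if c is not None]
        if len(window) < min_valid:
            results.append(None)
        else:
            results.append((sum(window) / len(window), max(window)))
    return results


# Invented series: a single 250 µg m-3 hour in an otherwise clean record
series = [10.0] * 23 + [250.0] + [10.0]
windows = rolling_24h(series)
```

Because the windows roll forward one hour at a time, the same peak hour is the maximum of up to 24 successive windows, appearing at a different position in each.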

Initial model development using ROC analysis of PM10 and PM2.5 measurements from monitoring stations across the US.
Pre-generated data files of hourly PM10 (USEPA parameter code 81102) and PM2.5 (USEPA parameter code 88101) concentration data for the years 2014 to 2019, from ambient monitoring stations across the whole of the US, were downloaded in Comma Separated Values (CSV) file format from the US Environmental Protection Agency (US EPA) website (US EPA, 2021). The files were imported into Microsoft Access for further analysis. Table 1 summarises the number of monitoring stations in each state that were used in this study to provide 1-hour measurement data for PM10 and PM2.5 concentrations. The stations form part of a larger network of monitoring stations within these states. Additionally, Table S1 […]
Pre-generated data files are available from the US EPA website in a format that has the null-data lines stripped out, whereas we required contiguous datasets at each monitoring station to calculate the rolling 24-hour averages that are required for the ROC analysis. We reconstructed contiguous datasets for each year, with null values reinstated, using an SQL query in Microsoft Access based on a contiguous hourly time series for that year together with the pre-generated USEPA data file.
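The reinstatement of null values can be sketched as a join between a complete hourly calendar and the available records. The example below is a Python analogue for one station and one year, offered as an illustration only; it is not the SQL query used in this study, and the function name and data layout are assumptions.

```python
from datetime import datetime, timedelta


def contiguous_hourly(records, year):
    """Return a list of (timestamp, value) covering every hour of `year`.

    records: dict mapping datetime (on the hour) to a concentration, as
             might be read from a file with null-data lines stripped out.
    Hours with no record are returned with value None.
    """
    series = []
    t = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    while t < end:
        series.append((t, records.get(t)))
        t += timedelta(hours=1)
    return series


# Invented records for one station: the 01:00 value is missing
sparse = {datetime(2018, 1, 1, 0): 12.0, datetime(2018, 1, 1, 2): 15.0}
full_2018 = contiguous_hourly(sparse, 2018)
```

The resulting series has a fixed length (8760 hours, or 8784 in a leap year), so a rolling 24-hour window always spans exactly 24 clock hours regardless of gaps in the data.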
Subsequently, these reconstructed hourly datasets of PM10 and PM2.5 concentrations at each monitoring station were analysed in Excel after exporting as a CSV file from Microsoft Access.
For each year and each monitoring station, the data files were prepared for ROC analysis in SPSS

We were also interested in quantifying the probabilities associated with the EU (European Union, 2020a) and Stieb et al. (2008) approach to generating 1-hour TCs, i.e. the division of the 24-hour guideline by a factor of 0.55 (the ratio C24(i) : Cmax24(i)). This was done by reading the TPR and FPR corresponding to the calculated 1-hour TC from the ROC output table. In addition, we repeated this analysis for C24(i) : Cmax24(i) ratios calculated from the US EPA dataset used in the current study.
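A ratio of this kind can be computed from paired window statistics and then applied to a 24-hour GV. The sketch below uses invented values; averaging the per-window ratios (as opposed to, say, taking the ratio of overall means) is an assumption, since the exact aggregation is not described in this section, and the function name is ours.

```python
def mean_to_max_ratio(windows):
    """Average of C24 / Cmax24 over all windows with a positive maximum.

    windows: iterable of (c_24, c_max24) pairs for rolling 24-hour periods.
    Assumption: the overall ratio is the mean of the per-window ratios.
    """
    ratios = [c24 / cmax for c24, cmax in windows if cmax > 0]
    if not ratios:
        raise ValueError("no windows with a positive maximum")
    return sum(ratios) / len(ratios)


# Invented (C24, Cmax24) pairs in µg m-3
ratio = mean_to_max_ratio([(20, 40), (30, 50), (10, 25)])

# Ratio-derived 1-hour TC for an illustrative 24-hour GV of 150 µg m-3
tc_from_ratio = 150 / ratio
```

The TPR and FPR for `tc_from_ratio` would then be read from the ROC output in the same way as for any other candidate threshold.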

Validation
The ROC probability approach to deriving 1-hour TCs was validated using a cross-validation design (Schaffer, 1993). […] (Table 1) and also experiences a large number of wildfires (Rappold et al., 2017).
Therefore, the rationale for this separate validation was to see whether a model based on this profile is transferable to other states, where sources of elevated PM may differ, for example the dust storms in Arizona and New Mexico (Hyde et al., 2018, Raman et al., 2014).

Exceedance statistics for data used in the ROC analysis
ROC analysis was used to evaluate the effectiveness of the Cmax24 classifier to predict exceedances of […]
As part of the process of compiling monitoring data for the model development and validation stages of this paper, some overall summary data on exceedances of various health-based 24-hour GVs were produced. Figure 2 shows the relative proportion of exceedances, by state, of selected 24-hour […]
For each state, the number of exceedances of any GV will be a function of both the number of monitoring sites that are in operation for PM10 and PM2.5 and the propensity for acute air pollution events. For example, California has by far the greatest number of PM10 and PM2.5 monitoring sites (see Table 1) but also has a large number of wildfires each year (Rappold et al., 2017), hence a large number of exceedances across the range of GVs. Thus, for the AQI 'unhealthy' and AQI 'very unhealthy' 24-hour GVs (the two highest 24-hour GVs used in this analysis for each PM fraction), California contributes more exceedances than all the other US states combined. For AQI 'unhealthy for sensitive groups' (150 µg m-3 and 35.5 µg m-3 for PM10 and PM2.5 respectively), California contributes just under half of the exceedances.
For PM10, the other states that have high numbers of exceedances of the 24-hour exposure GVs include Arizona and New Mexico, which have frequent dust storms (Hyde et al., 2018, Raman et al., 2014) but also have a relatively high number of monitoring stations (see Table 1). For PM2.5, Montana and Washington make the greatest contributions to exceedances after California, reflecting the relatively large number of PM2.5 monitoring stations in each of these states, but also the relatively high incidence of wildfires (Fann et al., 2018, Rappold et al., 2017). Finally, a clear trend from Figure 2, particularly for PM2.5, is that for the lower 24-hour GVs there is a far greater diversity of states contributing exceedance data. In contrast, for AQI 'Unhealthy', for both […]
[…] considered 'good', whereas values greater than 0.8 could be considered 'very good' and those higher than 0.9, 'excellent' (Bekkar et al., 2013). A more stringent interpretation of AUC is given by Zhu et al. (2010), who set the boundary for 'good' at 0.8. For our ROC analysis, the AUC values are all above 0.99, demonstrating that Cmax24 is an excellent classifier parameter for determining whether selected 24-hour GVs will be exceeded.
Tables 2 and 3 show, for PM10 and PM2.5 respectively, the values of Cmax24 that will achieve TPRs of 90%, 95%, 99% and 100% for the selected 24-hour GVs. For clarity, we have denoted this value as Cmax24(TPR). The predicted FPR is also shown. The values of Cmax24(TPR) and FPR in Tables 2 and 3 were obtained from the ROC analysis output table for each corresponding TPR. Cmax24(TPR) is, in effect, a 1-hour TC that has a probabilistic link to a 24-hour GV. Thus, from […] Table 3 can be similarly interpreted. Likewise for the PM2.5 24-hour GVs in Table 4.

The nature of elevated air pollution events means that false positives are likely to come in clusters, as illustrated by Figure 4, which shows an analysis, by date and site, of FPs associated with the PM2.5 24-hour GV of 150 µg m-3 for 2018. Across 40 sites at which FPs were recorded, the FPs are clustered around dates corresponding to the Carr Fire in July/August (Lareau et al., 2018) and the Woolsey and Camp Fires in November (Keeley and Syphard, 2019, Wong et al., 2020). These are also the dates at which most TPs were recorded. Nevertheless, for each site, there are long periods of the year where no FPs are recorded.
Table 2: Results for the overall ROC analysis for PM10, based on over 16 million rolling 24-hour periods from monitoring sites across the US. Cmax24(TPR) is the value of Cmax24 that will achieve true positive rates (TPRs) of 90%, 95%, 99% and 100% for the selected 24-hour GVs. The predicted false positive rate (FPR) is also shown. Cmax24(TPR) can be considered to be a 1-hour TC.
The decision on which TPR is acceptable depends on the proposed public health response to predicted exceedances of the 24-hour value. For example, if the response is to send advisory health warnings to the affected population through media sources (Kochi et al., 2016, Mott et al., 2002), or to display the information through a mobile phone app, then the resource implications are clearly much less significant than if physical measures were taken to evacuate people (Stares et al., 2014). A higher FPR is likely to be tolerated in the former case, though there is a concern that too many false alarms will undermine confidence in the public health response system that is in place. Nevertheless, the results of the analysis in Figure 4 show that FPs are associated with specific events, and so FPs might not necessarily be registered by the public as false alarms. An additional aspect of the statistics on FPs, as shown in Tables 2 and 3, is that lower FPRs are associated with higher 24-hour GVs, i.e. where public health advice is most needed.
Figure 4: […] across sites in California in 2018. Only those sites recording one or more FPs are shown (40 in total). For each day, up to 24 FPs can be recorded at any particular site because the exceedances relate to rolling 24-hour periods. In total, across all sites, there were 1567 FPs and 832 TPs.

Model validation
Figure 5 shows the results of the cross-validation study that was carried out on the overall dataset. There is excellent agreement between the predicted (red line) and observed (black line) TPRs, with the observed performance for several of the 24-hour GVs exceeding that of the predicted.
Nevertheless, we see that the standard deviation of the mean observed TPR values increases for lower values of predicted TPR, i.e. at 90% and 95%; this is because there are far fewer data points at the higher values of C24 and Cmax24 that are associated with these lower TPRs. Consequently, predictions made at TPRs of 95% and 90% are less reliable, though still likely to capture over 80% of exceedances for any particular year. The full set of cross-validation data for PM10 and PM2.5 is available in Tables S6 and S7, respectively.
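The comparison of predicted and observed TPRs can be illustrated with a leave-one-year-out sketch. The fold structure of the cross-validation is not recoverable from this section, so holding out one year at a time is an assumption made purely for illustration, as are the function name, the invented data, and the convention that a period is flagged when its maximum 1-hour value is at or above the threshold.

```python
import math


def leave_one_year_out(pos_by_year, target_tpr):
    """Observed TPR in each held-out year for a threshold derived from the
    remaining years at `target_tpr`.

    pos_by_year: dict mapping year to the Cmax24 values of rolling periods
                 whose 24-hour mean exceeded the GV (positives only).
    """
    observed = {}
    for held_out, test_pos in pos_by_year.items():
        train = sorted(
            (c for year, vals in pos_by_year.items() if year != held_out for c in vals),
            reverse=True,
        )
        if not train or not test_pos:
            continue
        # Highest training threshold that still captures target_tpr of positives
        k = max(math.ceil(target_tpr * len(train) - 1e-9), 1)
        threshold = train[k - 1]
        observed[held_out] = sum(1 for c in test_pos if c >= threshold) / len(test_pos)
    return observed


# Invented Cmax24 values (µg m-3) for exceedance periods in three years
example = {
    2014: [200, 300, 400, 500],
    2015: [250, 350, 450, 550],
    2016: [100, 600, 700, 800],
}
observed_tpr = leave_one_year_out(example, target_tpr=1.0)
```

In this invented case a threshold trained to capture 100% of exceedances in 2014 and 2015 captures only 75% of those in 2016, showing how observed and predicted TPRs can diverge when a held-out year contains values outside the training range.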
We also investigated the predictive performance of an ROC model based only on data from California. The results of this validation study are shown in Figure 6, with the full dataset, including FPRs, available in Tables S8 and S9 for PM10 and PM2.5 respectively. For PM10, at all the 24-hour GVs, the observed TPRs track above the predicted curves. The agreement between predicted and observed TPRs is particularly noteworthy because the model developed for California was based mainly on exceedances due to wildfires, whereas for the validation set, other sources such as dust storms were major reasons for exceedances, notably the large number of exceedances contributed by Arizona and New Mexico (see Figure 2). The observed FPRs are also somewhat lower than the predicted FPRs. For PM2.5, the agreement is less good, with the curve for the observed TPRs tracking below that of the predicted; this is especially notable for the AQI 'very unhealthy' 24-hour GV (150 µg m-3). The derived 1-hour TCs at each probability level (Tables S6 and S7) are numerically very similar to those derived for the whole model, though for PM10, there are some deviations at the 95% and 90% levels.
An additional piece of confirmation work for our ROC model would be to examine the statistical relationship between exceedances of 1-hour TCs and selected health endpoints taken from population-based health data for the affected areas. In so doing, the 1-hour TCs could be assessed for their efficacy as health-relevant indicators of PM exposure in their own right.

Comparison with other approaches to calculating 1-hour TCs
As discussed in the introduction, an alternative approach to the development of 1-hour TCs in the literature has been to divide the relevant 24-hour GV by a fixed factor, i.e. the ratio of the mean 24-hour concentration to the maximum hourly concentration for the same period. This factor has been calculated to be 0.55 for PM10, based on monitoring data from urban background and roadside monitoring stations in Europe (European Union, 2007). Separately, a numerically identical value has been derived for data from Canada (Stieb et al., 2008). For the European study, the purpose of producing a 1-hour TC was to ensure that "the PM10 sub-index based on hourly values on a given day will be (on average) consistent with the daily value once it is calculated (the next day)" (European Union, 2007). Nevertheless, this approach, and that of Stieb et al. (2008), cannot provide any information on the proportion of exceedances of 24-hour GVs that are predicted by their derived 1-hour TCs. However, since our ROC approach also uses the maximum 1-hour concentration (Cmax24), we can calculate TPRs for 1-hour TCs derived using the European Union (2007) and Stieb et al. (2008) approach. One caveat is that the CAQI index for which 1-hour TCs were derived does not range as high as the 24-hour GVs considered in the current paper: for PM10, the boundary between CAQI High / Very High occurs at 100 µg m-3.
We can also make comparisons with 1-hour TCs calculated using ratios derived from the US data in the current study. Ratios of 0.46 and 0.48 were calculated for PM10 and PM2.5 respectively, though these overall values mask considerable variation between states, as shown in Tables S10 and S11. Table 4 summarises the predicted TPRs (and FPRs) for PM10, based on 1-hour TCs that were derived from 24-hour guidelines using the C24 : Cmax24 approach. The 1-hour TCs derived in this way do predict a high proportion of exceedances, with the accuracy increasing at the higher corresponding 24-hour GVs, i.e. up to 99.7% for the 355 µg m-3 GV. The 1-hour TC derived using US C24 : Cmax24 data gives a slightly lower accuracy, reflecting its higher value.
However, for PM2.5, Table 5 shows that TPRs predicted using the C24 : Cmax24 ratio-derived 1-hour TCs show a much lower accuracy, which decreases at the higher 24-hour GVs. Thus, for the 150 µg m-3 24-hour GV, the corresponding 1-hour TC (factor = 0.55) predicts only 47.9% of exceedances.
Table 4: Predicted TPRs (and FPRs) for 1-hour PM10 TCs that were derived by dividing 24-hour guidelines by a fixed factor, based on the ratio of the mean 24-hour concentration to the maximum hourly concentration for the same period. Two fixed factors are used: 0.55, derived from European data, and 0.46, derived from US monitoring data from the current study.
[…] (Oakley et al., 2018, Zhou and Erdogan, 2019). Finally, from Table 6, we note that at the UK DAQI PM10 boundary concentration for 'high' (24-hour average = 75 µg m-3), the corresponding DAQI 1-hour trigger concentration of 107 µg m-3 (Connolly and Willis, 2013) will give a TPR of about 99% using our ROC approach (i.e. comparing against the closest corresponding […] TPRs for threshold (trigger) concentrations derived using an alternative methodology and dataset (DAQI uses UK ambient air quality data) gives us further confidence in the applicability of the ROC approach developed in this paper.

The public health response
It is acknowledged that decisions on public health interventions for episodic PM events need to be taken as quickly as possible (WHO, 2006b). Nevertheless, there is a problem that the shortest averaging period for which there are epidemiologically based PM GVs is 24 hours (WHO, 2006b, Federal Register, 2012, WHO, 2006a). Therefore, the ability to robustly relate PM measurements gathered over shorter averaging periods to the epidemiologically based 24-hour PM guidelines provides a way of addressing this shortcoming. Our ROC-based probability model allows precisely this type of relationship to be established and potentially speeds up the public health risk assessment and decision-making processes. The question then remains as to what public health response is required during episodic events, and at which 24-hour GV (and associated 1-hour TC) this should be triggered.
Several studies have found that communicating simple advisory measures about the need to restrict physical activity and to shelter indoors has beneficial effects (Kolbe and Gilchrist, 2009), including an association with reduced respiratory effects (Mott et al., 2002). In the US, states are already required to report 24-hour AQI values for each metropolitan area with a population exceeding 350,000 (Code of Federal Regulations, 2016). For many states, this is done through the web-based NowCast system, which has the added advantage of predicting 24-hour concentrations based on the J o u r n a l P r e -p r o o f previous 12 hours of data (US EPA, 2018, Mintz et al., 2013. Each AQI category has a specified cautionary statement, which depends on the sub-index that has been breached (i.e, those for PM 10 , PM 2.5 , O 3 , CO, SO 2 or NO 2 ). These statements are particularly targeted at vulnerable people, urging them to stay indoors and to minimise physical activity. It is important that the care is taken to reach the more marginalised members of society, who might also be the most affected (Santana et al., 2020, WHO, 2006b.
A more timely prediction that certain 24-hour GVs will be exceeded allows for better planning, for example in ensuring that decisions are taken on identifying and safely evacuating the more vulnerable members of a population (Stares et al., 2014), in closing schools (Holm et al., 2020), advising on the wearing of masks (WHO, 2006b, Kolbe andGilchrist, 2009), or on the use of air cleaning systems in homes, schools workplaces and smoke refuges (Stares et al., 2014, Mott et al., 2002, Holm et al., 2020, Barn et al., 2016.
Regarding the most extreme of these measures, evacuation, the highest AQI category, 'Hazardous' (PM 10 , 425 µg m -3 ; PM 2.5 , 250.5 µg m -3 ), does not contain a specific requirement to evacuate, rather the advice is that the vulnerable should stay indoors and keep physical activity levels low (US EPA, 2018). However, guidance given to public health officials in the event of wildfires in the US does specify the possible evacuation 'at-risk populations' when the 24-hour PM 2.5 concentration reaches the AQI 'Hazardous' category (Stone et al., 2019). The British Columbia Centre for Disease Control also recommends that evacuation is considered when US AQI 'Hazardous' is reached, though that the likely duration, plume toxicity, and presence of vulnerable subgroups, such as the elderly, children and those with underlying health conditions, are taken into account (Stares et al., 2014).
The practice of having threshold concentrations for evacuation is one that is used in other countries, for example the UK, where a 24-hour PM 10 'trigger to evacuate' of 320 µg m -3 has been used during major incident fires (Brunt and Russell, 2012), and for which we have previously calculated a corresponding 1-hour TC of 550 µg m -3 (100% TPR) (Griffiths et al., 2018). This 1-hour TC is somewhat higher than the value of 439 µg m -3 obtained from the ROC model using US data in the current work (Table 6). Nevertheless, it is similar to the 99% TPR, 1-hour TC of 590 µg m -3 in the current paper and also the now withdrawn 1 to 3-hour average 'Recommended Action Level' for the closure of public buildings and possible evacuation, which was set of 526 μg m -3 for PM 10 /PM 2.5 during wildfires (Lipsett et al., 2008).

Any decisions on evacuation should be carefully considered because there is evidence that they have a significant effect on mental and emotional health (Dodd et al., 2018, Krstic and Henderson, 2015) and may also involve other risks, such as exposure of vulnerable populations and responders to air pollutants during the evacuation process (Stewart-Evans et al., 2016). Other mitigating measures may be more effective, such as sheltering in place (Stewart-Evans et al., 2016), combined with the use of portable air cleaning devices (Barn et al., 2016, Mott et al., 2002).

Conclusions
In this paper we have demonstrated the application of ROC analysis to derive 1-hour TCs that have a probabilistic relationship with PM10 and PM2.5 health-based 24-hour exposure GVs. The analysis, based on 16 million and 22 million rolling 24-hour periods for PM10 and PM2.5 respectively, and involving a cross-validation design, shows that the maximum-observed 1-hour PM concentration in any rolling 24-hour averaging period is an excellent predictor of exceedances of 24-hour GVs. An ROC analysis based on data only from California also provided a good basis for the prediction of exceedances from across the remaining states of the US.
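As an illustration of the quantities involved, the following minimal sketch builds rolling 24-hour periods from an hourly series and returns, for each period, the maximum 1-hour concentration and the 24-hour mean. It assumes a gap-free hourly record and a window that advances by one hour; the function name and the example series are hypothetical, and the handling of missing data in the monitoring records is not addressed.

```python
def rolling_windows(hourly, window=24):
    """Return a list of (max_1h, mean) pairs, one per full rolling window.

    Assumptions: 'hourly' is a gap-free sequence of 1-hour concentrations
    and the window advances one hour at a time.
    """
    pairs = []
    for end in range(window, len(hourly) + 1):
        values = hourly[end - window:end]
        pairs.append((max(values), sum(values) / window))
    return pairs


# Hypothetical 25-hour series (µg m-3) with a single 1-hour spike.
series = [10.0] * 23 + [490.0, 10.0]
for max_1h, mean_24h in rolling_windows(series):
    print(f"max 1-h = {max_1h:.0f}, 24-h mean = {mean_24h:.1f}")
```

Each pair produced in this way can then be compared against a candidate 1-hour TC and a 24-hour GV respectively.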
The main advantages of our ROC method are as follows: (i) the high degree of accuracy that ROC-generated TCs can achieve in predicting exceedances of health-based 24-hour guidelines; (ii) the consistency of year-on-year comparisons, as demonstrated by the validation analysis; (iii) the 'tunability' of the ROC method in generating TCs, i.e. the use of the ROC output table to select a TC that balances the need to achieve as high a TPR as possible, whilst also minimising the FPR; (iv) the transferability of this methodology to other datasets, e.g. in different countries, and to other pollutants for any 24-hour health-based GV for which the corresponding 1-hour TC is required; and (v) the ease of use of the ROC model in generating TCs. The main disadvantage of the ROC approach is the high FPRs that are generated for the 24-hour GVs at the lower end of the harmfulness scale, though we have also shown that false predictions tend to be clustered around specific episodic events, coincident with real exceedances, and thus might not be registered as a false alarm by the affected population.
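The 'tunability' described in point (iii) can be sketched as follows. This is an illustrative reconstruction rather than the code used in the study: it assumes that each rolling period is summarised as a (maximum 1-hour, 24-hour mean) pair, that '>=' defines both predictions and exceedances, and that the chosen TC is the highest candidate meeting a target TPR, since raising the TC can only lower (or leave unchanged) the FPR. All names and data are hypothetical.

```python
def roc_rows(windows, gv_24h, candidate_tcs):
    """Build (TC, TPR, FPR) rows for a set of candidate 1-hour TCs."""
    positives = [m for m, mean in windows if mean >= gv_24h]
    negatives = [m for m, mean in windows if mean < gv_24h]
    if not positives or not negatives:
        raise ValueError("need both exceedance and non-exceedance windows")
    rows = []
    for tc in sorted(candidate_tcs):
        tpr = sum(m >= tc for m in positives) / len(positives)
        fpr = sum(m >= tc for m in negatives) / len(negatives)
        rows.append((tc, tpr, fpr))
    return rows


def select_tc(rows, min_tpr):
    """Pick the highest TC whose TPR still meets the target, else None."""
    eligible = [row for row in rows if row[1] >= min_tpr]
    return max(eligible, key=lambda row: row[0]) if eligible else None


# Hypothetical (max_1h, mean_24h) windows in µg m-3, for illustration only.
toy_windows = [(300.0, 160.0), (250.0, 155.0), (280.0, 100.0),
               (100.0, 50.0), (90.0, 40.0)]
table = roc_rows(toy_windows, 150.0, [100.0, 250.0, 280.0, 300.0])
for tc, tpr, fpr in table:
    print(f"TC = {tc:.0f}  TPR = {tpr:.2f}  FPR = {fpr:.2f}")
print("selected:", select_tc(table, 0.99))
```

Lowering the target TPR in `select_tc` trades missed exceedances for fewer false alarms, which is the balance a user of the ROC output table would need to strike.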
Elevated PM during episodic air pollution events is associated with significant short-term health impacts, including mortality, and so the ability to provide timely public health guidance on appropriate remedial measures for affected populations is vital. We hope that the straightforward approach to developing 1-hour TCs that we have outlined in this paper might assist in this process.

Role of the sponsor
The sponsors had no role in the design and/or conduct of the study; in the collection, analysis, and interpretation of the data; or in the preparation, review, and approval of the manuscript.

Conflicts of interest
All authors declare they have no actual or potential competing financial interest.

Declaration of Competing Interest
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: