Impact of Regional Mobility on Air Quality during COVID-19 Lockdown in Mississippi, USA Using Machine Learning

Social distancing measures and shelter-in-place orders to limit mobility and transportation were among the strategic measures taken to control the rapid spread of COVID-19. In major metropolitan areas, there was an estimated decrease of 50 to 90 percent in transit use. As a secondary effect, the COVID-19 lockdown was expected to improve air quality, leading to a decrease in respiratory diseases. The present study examines the impact of mobility on air quality during the COVID-19 lockdown in the state of Mississippi (MS), USA. The study region was selected because of its non-metropolitan and non-industrial setting. Concentrations of six air pollutants, namely particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), and carbon monoxide (CO), were collected from the Environmental Protection Agency, USA, for the period from 2011 to 2020. Because of limitations in data availability, the air quality data of Jackson, MS, were assumed to be representative of the entire state. Weather data (temperature, humidity, pressure, precipitation, wind speed, and wind direction) were collected from the National Oceanic and Atmospheric Administration, USA. Traffic-related data (transit) were taken from Google for the year 2020. The statistical and machine learning tools of RStudio were applied to the data to study the changes in air quality, if any, during the lockdown period. Weather-normalized machine learning modeling simulating a business-as-usual (BAU) scenario showed a significant difference between the means of the observed and predicted values for NO2, O3, and CO (p < 0.05). Due to the lockdown, the mean concentrations of NO2 and CO decreased by 4.1 ppb and 0.088 ppm, respectively, while the mean concentration of O3 increased by 0.002 ppm.
The observed and predicted air quality results agree with the observed decrease in transit of up to 50.5% relative to the baseline, and with the observed decrease in the prevalence of asthma in MS during the lockdown. This study demonstrates the validity and use of simple, easy, and versatile analytical tools to assist policymakers in estimating changes in air quality during a pandemic or natural hazard, and in taking mitigating measures if a deterioration of air quality is detected.


Materials and Methods
Air quality data for the study region were obtained from the US Environmental Protection Agency (EPA) [33] for the period from 2011 to 2020, for which data were available. The air quality data consist of the concentrations of six pollutants: PM2.5, PM10, O3, NO2, SO2, and CO. Because of the limited number of air quality monitoring stations and the large amount of missing data, the available air quality data are assumed to be representative of the entire state.
Weather data were collected from the National Oceanic and Atmospheric Administration (NOAA) [34]. The weather data consist of the values of six meteorological variables: temperature, humidity, pressure, precipitation, wind speed, and wind direction.
Traffic is related to vehicle emissions such as nitrogen oxides (NOx), CO, and black carbon (BC) that affect air quality. Traffic-related data (transit) for the study region were collected from the Google community mobility reports [35][36][37] and from 'Our World in Data' [38]. The COVID-19 lockdown in MS, USA, was implemented from 15 March to 1 June 2020 [39] in the form of transportation restrictions.
This study was carried out to examine whether the COVID-19 lockdown affected the environment and air quality even in a region with a non-metropolitan and non-industrial setting. Exploratory data analysis was carried out using statistical methods. Machine learning modeling with weather normalization, simulating a business-as-usual (BAU) scenario, was carried out to make predictions of air quality. The statistical analysis and machine learning modeling were performed in R using RStudio [40], running on a PC with an Intel Core i7 processor, 16 GB of RAM, and a 500 GB hard drive.

Results
The results of the time series study are given in Section 3.1, and the results of the machine learning model are given in Section 3.2.

Time Series Study and Results
The observed data on air pollutants, weather, and mobility were studied and organized as follows:

Time Series Study of Air Pollutants
The time series of air pollutants are shown as a grid plot arranged in subplots of six rows in Figure 1. From the top, rows 1 to 6 show the time variations of PM2.5, PM10, NO2, O3, SO2, and CO, respectively. The units of measurement for the air pollutants are µg/m³, µg/m³, ppm, ppb, ppb, and ppm for PM2.5, PM10, O3, NO2, SO2, and CO, respectively.
The results of the descriptive statistics on the air pollutant data are shown in Table 1. The quantities describing the distribution of the data are minimum (Min), maximum (Max), median, mean, first quartile (25%), third quartile (75%), and missing data (NAN). The first and third quartiles determine the interquartile range, which is a measure of the variability of the data around the median. The mean values of PM2.5 ...
The correlations between the air pollutants during the study period are presented in Table 2. The highest correlation is 0.82, between PM2.5 and PM10. The correlation between NO2 and PM2.5 is 0.20. Overall, the correlations between the air pollutants are positive but small.
Figure 1. Time series of air pollutants arranged in subplots consisting of six rows. From the top, rows 1 to 6 represent the time variation of PM2.5, PM10, NO2, O3, SO2, and CO, respectively. The units of measurement for the air pollutants are µg/m³, µg/m³, ppm, ppb, ppb, and ppm for PM2.5, PM10, O3, NO2, SO2, and CO, respectively.
The time variation of air pollutant concentrations before and during the lockdown year (2020), and their difference, as a function of monthly and day-of-the-week averages are shown in Figures 2-7 for the air pollutants PM2.5, PM10, NO2, O3, SO2, and CO, respectively. For each figure, column 1 and column 2 show monthly and day-of-the-week averages, respectively. The time duration considered for the lockdown year is from 1 January to 31 December 2020. For the pre-lockdown period, the time duration considered is from 1 January 2011 to 31 December 2019. The colors green, red, and blue represent the lockdown year, the period before the lockdown year, and their difference, respectively. During the lockdown period in 2020 (March to June), the difference line shows a decrease in the pollutant concentrations except for PM10 and O3. PM2.5 shows a decrease in some months and an increase in other months, and much the same occurs with the rest of the pollutants, except for SO2, which shows a year-round value below zero. In Figures 2 and 3, it is also observed that in the midweek (Wednesday and Thursday), the difference line shows a decrease in the pollutant concentrations except for PM10. Perhaps the lockdown largely restricted mobility during the week, compared to the weekend, when people might have had more mobility to procure weekly provisions.

Time Series Study of Meteorological Trends
The time series of the meteorological parameters are shown as a grid plot arranged in subplots of six rows in Figure 8. From the top, row 1, row 2, row 3, row 4, row 5, and row 6 show the time variation of temperature, humidity, wind speed, wind direction, pressure, and precipitation, respectively. The units for the variables are Fahrenheit, percentage, mph, degrees, in. Hg, and inches, respectively. The time series of the temperature, humidity, pressure, and wind speed show general seasonality patterns. For example, the temperatures are low in the winter and high in the summer. The relative humidity pattern is generally low when the temperatures are high, showing a mirror image. The pressure and wind speed show a general pattern of highs in the winter and lows in the summer. The precipitation shows variability, but a high amount of rainfall is normally associated with cool seasons. However, the impact of the weather on the air quality during lockdown can only be established by eliminating seasonal contributions using machine learning.

The results of the descriptive statistics on the meteorological data are shown in Table 3. The correlations between the meteorological variables during the study period are presented in Table 4. In particular, the temperature has positive correlations of 0.44 and 0.13 with humidity and precipitation, respectively, but negative correlations of −0.63, −0.10, and −0.08 with pressure, wind speed, and wind direction, respectively.

Mobility: Time Series Study and Results
Vehicle emissions are sources of air pollutants such as oxides of nitrogen (NOx), CO, and black carbon (BC) that affect air quality. Hence, transit (traffic-related) data were collected from the Google community mobility report [35][36][37]. Figure 9 shows the time series of the transit data as a percentage change relative to a baseline [18,37], and its summary statistics are given in Table 5. The mean, maximum, and minimum percentage changes in transit are −2.79, 20.00, and −50.50, respectively. The largest observed decrease in transit, 50.5% below the baseline, is expected to have contributed to the changes in the air quality during the COVID-19 lockdown.

Figure 9. Time series plot of percentage change in transit, 2020.
Table 5. Summary statistics of transit as percentage change from a baseline. Transit data were collected from the Google community mobility report [35][36][37] for the year 2020.
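The percentage change plotted in Figure 9 and summarized in Table 5 can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in the study: it assumes a separate baseline value for each day of the week, and every number and name in it (`baseline_by_weekday`, `observed`, `percent_change`) is hypothetical.

```python
from datetime import date
from statistics import mean


def percent_change(value, baseline):
    """Percentage change of a value relative to its baseline."""
    return 100.0 * (value - baseline) / baseline


# Hypothetical baseline transit counts per weekday (0 = Monday ... 6 = Sunday).
baseline_by_weekday = {0: 200.0, 1: 200.0, 2: 200.0, 3: 200.0,
                       4: 220.0, 5: 150.0, 6: 100.0}

# Hypothetical daily transit counts during a lockdown.
observed = {
    date(2020, 4, 6): 120.0,   # a Monday
    date(2020, 4, 7): 90.0,    # a Tuesday
    date(2020, 4, 11): 165.0,  # a Saturday
}

# Compare each day with the baseline for the same day of the week.
changes = {
    day: percent_change(value, baseline_by_weekday[day.weekday()])
    for day, value in observed.items()
}

# Summary statistics of the kind reported for transit.
summary = {
    "min": min(changes.values()),
    "max": max(changes.values()),
    "mean": mean(changes.values()),
}
```

A negative value means less transit than on the corresponding baseline weekday, so the minimum of the series is the largest observed reduction.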

Machine Learning Modeling: Business as Usual Scenario Model
Apart from the lockdown intervention, meteorological conditions also play a role in affecting air pollutant concentrations. Favorable weather conditions such as increased wind and rain lower the air pollutant concentrations below those of a normal day, whereas unfavorable conditions of low winds and a stable atmosphere may elevate the concentrations. Quantifying the changes caused by the lockdown by comparing the air pollutant concentrations before and after the intervention (such as in Table 3) may lead to wrong conclusions, since the meteorological changes may mask the variation in concentrations caused by the intervention. Hence, it would be difficult to determine from the observed data alone whether the changes in the concentrations (increase or decrease) are caused by weather conditions or by the traffic regulations implemented during the lockdown. By using machine learning models, one can subtract the weather component from the observations to obtain weather-normalized data that show the underlying causes of the change in the concentrations, simulating a business-as-usual (BAU) scenario [18,25–32]. Weather normalization can be achieved by using random forest (RF) regression models [41] via the 'randomForest' package in R [42]. RF regression is a type of ensemble learning method that combines many "weak" predictors into a forest of decision trees to obtain good prediction accuracy [32].
In the present study, the weather normalization of Grange et al. was implemented using rmweather, an RF model-based R package [25][26][27]. The major steps used to execute the model are as follows:
1. Install and load the packages in RStudio [40].
2. Load the data set for each pollutant and the independent variables.
   a. Features or independent variables.
3. Run the random forest model for training and create a meteorologically normalized trend.
   a. Define the training data set period.
   b. Split the data into a training set and a test set; the test set is for validation.
      i. Input features: meteorological and temporal variables.
   c. Run the RF model using weather normalization.
      ii. For each day, resample the meteorological explanatory variables a certain number of times (say 300).
      iii. Aggregate the predicted values from each iteration to obtain the meteorologically normalized concentration.
      iv. The estimated values represent the emission changes rather than the changes due to meteorological effects.
      v. Repeat the resampling for every day in the data set.
   d. Set the hyper-parameters of the model.
4. Check the model performance on the test part of the training set.
5. Plot the variable importance, decide which variables are more important, and remove insignificant variables if required.
6. Run the model prediction and check whether the model has suffered from overfitting.
7. Check for partial dependencies and remove missing variables.
8. Run the weather-normalized and trained RF model on the test data set (lockdown period) to obtain the predicted values of the pollutant concentrations.
9. Collect the observed and predicted pollutant values, compute the pollutant change due to the lockdown, and plot for visualization.
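The resampling and aggregation in step 3c can be sketched in a few lines. This is an illustration under stated assumptions, written in Python rather than the R package used in the study: `toy_model` is a hypothetical stand-in for a trained random forest, and the function and field names (`weather_normalise`, `trend`, `wind_speed`) are invented for this example.

```python
import random
from statistics import mean


def weather_normalise(predict, days, weather_pool, n_samples=300, seed=0):
    """Average predictions over randomly resampled weather for each day.

    The temporal features of each day are kept fixed; only the weather
    is replaced by rows drawn at random from the pool.
    """
    rng = random.Random(seed)
    normalised = []
    for temporal in days:
        predictions = [
            predict(temporal, rng.choice(weather_pool))
            for _ in range(n_samples)
        ]
        normalised.append(mean(predictions))
    return normalised


def toy_model(temporal, weather):
    """Hypothetical stand-in for a trained random forest (assumption)."""
    return 10.0 + 0.01 * temporal["trend"] - 0.5 * weather["wind_speed"]


# Hypothetical inputs: five days and a small pool of observed weather rows.
days = [{"trend": t} for t in range(5)]
weather_pool = [{"wind_speed": w} for w in (2.0, 4.0, 6.0, 8.0)]

normalised = weather_normalise(toy_model, days, weather_pool)
```

Because each day is predicted under many randomly drawn weather conditions, the average no longer depends on the weather that actually occurred on that day, which is the sense in which the resulting series is weather-normalized.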
For each pollutant, the model was trained on the past data of meteorological and temporal variables for the period from 1 January 2011 to 29 February 2020. Beyond the training period (29 February 2020), the trained model was used to predict the pollutant concentration using observed meteorological variables to generate a "counterfactual" time series that represents the estimation of concentrations under a business-as-usual scenario.
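The counterfactual comparison described above can be sketched as follows. This is a simplified illustration, not the study's code: the training cut-off and lockdown dates are taken from the text, while the records, the field names, and `toy_bau_model` (a stand-in for the trained RF model) are hypothetical.

```python
from datetime import date
from statistics import mean

TRAIN_END = date(2020, 2, 29)       # last training day, from the text
LOCKDOWN_START = date(2020, 3, 15)  # lockdown period, from the text
LOCKDOWN_END = date(2020, 6, 1)


def split_by_date(records, train_end=TRAIN_END):
    """Split records into a training part and a prediction part by date."""
    train = [r for r in records if r["date"] <= train_end]
    later = [r for r in records if r["date"] > train_end]
    return train, later


def lockdown_change(records, predict):
    """Mean observed minus mean BAU-predicted concentration in lockdown."""
    rows = [r for r in records if LOCKDOWN_START <= r["date"] <= LOCKDOWN_END]
    observed = mean(r["observed"] for r in rows)
    predicted = mean(predict(r) for r in rows)
    return observed - predicted


def toy_bau_model(row):
    """Hypothetical stand-in for the trained model (assumption)."""
    return 14.0 - 0.5 * row["wind_speed"]


# Hypothetical daily records: observed concentration plus observed weather.
records = [
    {"date": date(2020, 2, 28), "observed": 12.0, "wind_speed": 4.0},
    {"date": date(2020, 2, 29), "observed": 11.0, "wind_speed": 6.0},
    {"date": date(2020, 3, 20), "observed": 7.0, "wind_speed": 4.0},
    {"date": date(2020, 4, 20), "observed": 6.0, "wind_speed": 6.0},
]

train, later = split_by_date(records)
change = lockdown_change(records, toy_bau_model)
```

A negative `change` means the observed concentrations were below the business-as-usual estimate for the same weather, which is how a lockdown-related reduction would appear.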
The RF model was run for each of the six air pollutants (PM2.5, PM10, O3, NO2, SO2, and CO). The meteorological features of the model are temperature, humidity, pressure, precipitation, wind speed, and wind direction. The temporal features consist of a trend term (Unix date), a seasonal term (Julian day), the weekday, the week, and the month. The observed and predicted results are shown in Figures 10-15 for PM2.5, PM10, NO2, O3, SO2, and CO, respectively. In the figures, the blue line represents the smoothed predicted values, with the confidence intervals shown by the green shading.
The boxplot distributions of the observed and predicted concentrations for each air pollutant are shown in Figures 16 and 17, each arranged as subplots of three rows and two columns, with the observed concentrations in column 1 and the predicted concentrations in column 2. In Figure 16, starting from the top, rows 1, 2, and 3 show the results for PM2.5, PM10, and NO2, respectively. In Figure 17, starting from the top, rows 1, 2, and 3 show the results for O3, SO2, and CO, respectively. The boxplot distribution summary statistics are shown in Table 6, where the suffixes _obs and _prd represent the observed and predicted values, respectively, of each of the pollutants.
The RF hyperparameters were kept the same for each model and were based on the best results: the number of trees is 300, the number of predictors randomly sampled to determine each split (mtry) is two, and the minimum node size is five (see Table 7). The model metrics are also shown in Table 7. The root mean square error (RMSE) ranged between 0.008 and 7.56.
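The temporal features listed above can be derived from a calendar date alone. The sketch below shows one possible derivation in Python and is not the study's code: the key names are hypothetical, "Julian day" is taken to mean the day of the year, and the weekday and week follow the ISO convention, which is an assumption.

```python
from datetime import date, datetime, timezone


def temporal_features(day):
    """Derive trend, seasonal, and calendar features from a date."""
    midnight_utc = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return {
        "date_unix": int(midnight_utc.timestamp()),  # trend term
        "day_julian": day.timetuple().tm_yday,       # seasonal term, 1-366
        "weekday": day.isoweekday(),                 # 1 = Monday ... 7 = Sunday
        "week": day.isocalendar()[1],                # ISO week number
        "month": day.month,
    }


# Example: the first day of the lockdown period given in the text.
features = temporal_features(date(2020, 3, 15))
```

The Unix date increases steadily and lets the model capture a long-term trend, while the day of the year repeats annually and captures seasonality.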
Statistical testing (t-test) was performed on the time series of the observed and predicted values for each pollutant to check whether they were significantly distinct. The results of the statistical testing are shown in Table 8. For NO2, O3, and CO, the means of the observed and predicted data were found to be significantly distinct (p < 0.05). Due to the lockdown, the mean concentrations of NO2 and CO decreased by 4.1 ppb and 0.088 ppm, respectively, while the mean concentration of O3 increased by 0.002 ppm (see Figures 16 and 17).
Figure 17. Boxplot distribution of observed and predicted air pollutants during the lockdown as subplots of three rows and two columns, in pairs of observed and predicted concentrations for each pollutant. Starting from the top, rows 1, 2, and 3 show the observed (column 1) and predicted (column 2) results for O3, SO2, and CO, respectively. The suffixes _obs and _prd represent the observed and predicted values, respectively, of each of the pollutants. The units of O3, SO2, and CO are ppm, ppb, and ppm, respectively.
The units of PM2.5, PM10, O3, NO2, SO2, and CO are µg/m³, µg/m³, ppm, ppb, ppb, and ppm, respectively. 'ci' stands for confidence interval. For all six air pollutants, the RF hyperparameters were kept the same: the number of trees is 300, the number of predictors randomly sampled to determine each split (mtry) is two, and the minimum node size is five.
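A comparison of the observed and predicted means of this kind can be sketched as follows. The text does not state which variant of the t-test was used, so this example assumes Welch's unequal-variance form; it also approximates the two-sided p-value with a normal distribution, which is reasonable only for large samples. All values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical observed and BAU-predicted concentrations for one pollutant.
observed = [6.0, 7.0, 8.0, 9.0, 10.0]
predicted = [10.0, 11.0, 12.0, 13.0, 14.0]

t_stat, dof = welch_t(observed, predicted)
p_value = approx_two_sided_p(t_stat)
```

With only five values per group, as in this toy input, an exact p-value would need the t distribution with `dof` degrees of freedom; the normal approximation is used here only to keep the sketch within the standard library.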

Discussion
Onyeaka et al. [13] reported a reduction of up to 30% in environmental pollution as a result of half of the world's population experiencing some form of lockdown, with an attendant reduction in mobility of up to 90%. A study of air quality during COVID-19 showed that the concentrations of PM2.5 and PM10 decreased by 12% and 37% in Los Angeles, and 24% in New York, while NO2 decreased by 25% in Sao Paulo, 38% in Los Angeles, and 24% in New York [15]. In a comparative study of the impact of the COVID-19 onset on air quality in several cities of the world, Washington, DC, is reported to have had a reduction in PM2.5 of about 10% [17]. Mohammed et al. [43] reported that NO2 emissions were reduced by up to 30%, based on NASA satellite images over the northeastern USA before and after the lockdown. Chen et al. [44] studied the quarantine in China and found that PM2.5 dropped by 1.4 µg/m³ in Wuhan but by 18.9 µg/m³ across 367 cities, and that NO2 dropped by 22.8 µg/m³ and 12.9 µg/m³ in Wuhan and China, respectively. Ghahremanloo et al. [45] reported decreases in PM2.5 of 18%, 13.53%, and 20.7% in Boston, Detroit, and New York, respectively. In a study of air quality at urban sites in Spain, slight reductions in PM10 (−4.1%) and PM2.5 (−2.3%) levels were observed during the lockdown, and a maximum reduction of more than 50% was observed for NOx, whereas a maximum increase of 23.9% was observed for O3, in contrast with the decrease in NOx [46].
In the present study, weather-normalized RF machine learning modeling is used to take into account the influence of meteorological changes. Lower temperatures (less than 45 °F) were observed more frequently with southerly winds than with other wind directions, while higher temperatures (greater than 70 °F) were associated more with northerly and northwesterly (NW) winds. Similar observations were made for humidity. During the lockdown, the southerly and southwesterly (SW) winds were stronger and more frequent (in number of occurrences) than in the pre-lockdown years. This would affect air pollution through the dispersion frequency, that is, the number of times the wind blew in a given direction. The weather-normalized RF model estimates the pollutant concentrations for the lockdown period by subtracting the weather influences and predicting the pollutant concentrations under the BAU scenario.
Using weather-normalized RF machine learning modeling, it was observed that the means of the observed and predicted air pollutant values are significantly distinct (p < 0.05) for NO2, O3, and CO. Due to the lockdown, the mean concentrations of NO2 (Figure 12, row 3 of Figure 16) and CO (Figure 15, row 3 of Figure 17) decreased by 4.1 ppb and 0.088 ppm, respectively, while the mean concentration of O3 (Figure 13, row 1 of Figure 17) increased by 0.002 ppm, leading to a partial improvement in the air quality due to the lockdown. The decrease in the concentrations of NO2 reflects the restricted transit during the lockdown because NO2 and diesel soot are directly related to vehicular traffic. Since the RF model predictions are based on weather normalization, the decrease in NO2 and CO can be attributed to the lower emissions caused by less traffic during the lockdown. There were changes in the mean concentrations of particulate matter, but they cannot be considered significant since their p-values were greater than 0.05. In the present study, there was an increase in the mean concentration of O3 of 0.002 ppm, similar to that observed by others [28,46]. Generally, ozone concentrations depend on precursors such as nitrogen oxides (NOx). Lower NOx emissions may increase O3 because it is broken down less frequently, but it is difficult to establish the lockdown impact on O3 based on NOx emissions only. Normally, asthma prevalence is associated with particulate matter. In MS, the percentage of adults reported as having asthma was 9.9% in 2019 but decreased to 8.9% in 2020 [47], which is consistent with the model prediction of a decrease in the mean concentration of PM2.5 (−0.1 µg/m³).
The methodology used in the present study is one of the two recent approaches developed during the post-pandemic period [28]. The two approaches differ in the way the effects of meteorology are analyzed to deduce pollutant changes under external influences such as the lockdown. In the first approach, a base case is defined using a reference measurement period from the past, such as a similar period before the lockdown. The changes in pollutant concentrations are then deduced from the difference between the base case and the lockdown period. This method, although simple, does not completely eliminate the meteorological effects. The second approach uses predictive machine learning models to isolate the effect of the lockdown intervention on the air pollutant concentrations [25][26][27][28][29][30][31]. In the present study, the second approach is applied by comparing the predicted results for 2020 with the actual observations made during the lockdown period. Hence, the results of this method are a more direct measure of the relative changes due to the lockdown intervention than the results obtained using the base case method. Other statistical models, artificial neural network models, and classification and regression machine learning models can also be used to analyze the meteorological effects on air quality; these were not explored in the present study and will be attempted later if sufficient data are available.
The present study confirms the hypothesis that, in principle, the COVID-19 lockdown affected the air quality even in the non-metropolitan and non-industrial environment of MS. To study the spatial distribution of air quality, it would be good to have air quality data from multiple stations covering rural, urban, and traffic locations. Furthermore, non-missing hourly observations are required for each station to validate the machine learning modeling accurately. In spite of these two limitations, the present study demonstrates an easy and simple methodology that uses versatile statistical and machine learning tools for investigating the changes in air quality caused by pandemics or natural hazards. The methodology and results show the importance of taking mitigating measures for a sustainable improvement in air quality.

Conclusions
Statistical and weather-normalized random forest modeling methods were used to study the impact of the COVID-19 lockdown on air quality in MS. Weather-normalized modeling mimics a business-as-usual scenario to establish the changes in the air quality caused by the traffic regulations during the lockdown, by subtracting the weather component from the values of the observed air pollutants. For the pollutants NO2, CO, and O3, significant differences were found between their observed and predicted mean concentrations. The mean concentrations decreased for NO2 and CO but increased for O3. The decrease in the NO2 and CO concentrations reflects a partial improvement in the air quality due to the lockdown. The observed decrease in the pollutant concentrations and the predicted air quality results show that the decrease in transit is associated with the decrease in asthma prevalence during the lockdown. The decrease in NO2 and the increase in O3 suggest that different measures are needed to mitigate these pollutants because one is a consequence of the other. The study examines the effect on air quality of the lockdowns imposed to control the COVID-19 pandemic, particularly in urban regions of non-metropolitan and non-industrial settings. The study demonstrates the utility of simple, easy, and versatile analytical tools to generate scientific data to assist policymakers in making informed decisions regarding the assessment and management of air quality during a pandemic or natural disaster.