The Impact of Air Quality and Meteorology on COVID-19 Cases at Kuala Lumpur and Selangor, Malaysia and Prediction Using Machine Learning

Emissions from motor vehicles and industrial sources have contributed to air pollution worldwide. Chronic exposure to air pollution is associated with the severity of COVID-19 infection. This ecological investigation explored the relationship between meteorological parameters, air pollutants, and COVID-19 cases among residents in Selangor and Kuala Lumpur between 18 March and 1 June in the years 2019 and 2020. The air pollutants considered in this study comprised particulate matter (PM 2.5, PM 10), sulfur dioxide (SO 2), nitrogen dioxide (NO 2), ozone (O 3), and carbon monoxide (CO), whereas wind direction (WD), ambient temperature (AT), relative humidity (RH), solar radiation (SR), and wind speed (WS) were analyzed for meteorological information. On average, air pollutants demonstrated lower concentrations in 2020 than in 2019 for both locations, except PM 2.5 in Kuala Lumpur. The cumulative COVID-19 cases were negatively correlated with SR and WS but positively correlated with O 3, NO 2, RH, PM 10, and PM 2.5. Overall, RH (r = 0.494; p < 0.001) and PM 2.5 (r = −0.396, p < 0.001) were identified as the most significant parameters that correlated positively and negatively with the total cases of COVID-19 in Kuala Lumpur and Selangor, respectively. The Boosted Trees (BT) model achieved the lowest Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE) and the highest R-squared (R 2) between actual and predicted COVID-19 cases with a learning rate of 0.2, a minimum leaf size of 7, and 30 learners. The model yielded an R 2 value of 0.81, an RMSE of 0.44, an MSE of 0.19, and an MAE of 0.35. Using the BT predictive model, the number of COVID-19 cases in Selangor was projected with an R 2 value of 0.77. This study aligns with the existing notion of connecting meteorological factors and chronic exposure to airborne pollutants with the incidence of COVID-19. 
Integrated governance for holistic approaches would be needed for air quality management post-COVID-19 in Malaysia.


Introduction
The novel Coronavirus Disease 2019 (COVID-19) was announced by the World Health Organization (WHO) as a global health emergency on 11 March 2020. COVID-19 is an infectious disease caused by the SARS-CoV-2 virus and was initially reported in Wuhan, China, on 1 December 2019 [1]. The virus is highly infectious and has spread rapidly to many countries, affecting the world's economy and ecology in many ways and not only as a health problem [2]. There are four main structural proteins in SARS-CoV-2, namely the ...
In air quality monitoring, machine learning algorithms can analyze large amounts of air quality data to identify patterns and predict air pollution levels, helping cities and governments take proactive measures to reduce air pollution [27,28]. This study employs a machine learning technique called Boosted Trees (BT) to assess the influence of air quality parameters and meteorological factors on COVID-19 cases in Kuala Lumpur. Subsequently, the Selangor cases are predicted using the model. BT uses an ensemble of decision trees to make predictions, building a sequence of trees in which each successive tree is fitted to the negative gradient of the loss function with respect to the current predictions. The final prediction is obtained by combining the predictions of all trees in the sequence.
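The boosting idea described above can be illustrated with a minimal sketch. It is not the implementation used in this study (which relied on MATLAB); it assumes a squared-error loss, for which the negative gradient equals the residual, a single predictor, and one-split trees (stumps). All function names are hypothetical, and the defaults of 30 learners and a learning rate of 0.2 simply mirror the values reported in the abstract.

```python
from statistics import mean


def fit_stump(xs, residuals, min_leaf=1):
    """Fit a one-split regression tree (stump) to the residuals.

    Returns (threshold, left_value, right_value): the stump predicts
    left_value when x <= threshold and right_value otherwise.
    """
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lv, rv)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = mean(residuals)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, lv, rv = stump
    return lv if x <= threshold else rv


def fit_boosted_trees(xs, ys, n_learners=30, learning_rate=0.2, min_leaf=1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_learners):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals, min_leaf)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    """Combine the base value with the shrunken predictions of all stumps."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy data (hypothetical, not from the study): a step function.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [1, 1, 1, 1, 5, 5, 5, 5]
toy_model = fit_boosted_trees(toy_x, toy_y)
```

Each stump corrects only a fraction (the learning rate) of the remaining error, which is why several learners are needed before the ensemble approaches the targets.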

COVID-19 Incidence Data
Data on COVID-19 cases in Selangor and Kuala Lumpur were retrieved from the official website of Malaysia's Ministry of Health available at http://COVID-19.moh.gov.my (accessed on 31 May 2020). The duration of the cumulative daily cases of COVID-19 was taken from 18 March to 26 May 2020.

Environmental Condition Data
Meteorological and air pollution data were obtained from the Department of Environment (DOE), Ministry of Environment and Water, Malaysia. Specifically, the data were sourced from two stations: Petaling Jaya Station in Selangor (coordinates: 3.1094° N, 101.6388° E) and Batu Muda Station in Kuala Lumpur (coordinates: 3.2124° N, 101.6822° E). The locations of these monitoring stations are visualized in Figure 1. Batu Muda Station in Kuala Lumpur was selected because it represents an urban area and is close to the populous city of Kuala Lumpur, which has high COVID-19 cases. Petaling Jaya Station in Selangor was chosen because it covers the industrial area and highly populated zone in Selangor. The data spanned 4 March 2019 to 26 May 2020 and were averaged over 24 h each day. Meteorological parameters included relative humidity (RH), wind direction (WD), wind speed (WS), solar radiation (SR), and ambient temperature (AT). The air pollutant components were PM 2.5 (particulate matter with ≤2.5 µm diameter), PM 10 (particulate matter with ≤10 µm diameter), sulfur dioxide (SO 2), nitrogen dioxide (NO 2), ozone (O 3), and carbon monoxide (CO).
The hourly air pollution dataset recorded at the Continuous Air Quality Monitoring Station (CAQMS) was obtained from the Malaysian Department of Environment (DOE). The instrument used for the measurements of PM 10 and PM 2.5 was a Thermo Scientific Model TEOM 1450-DF [29,30], while for SO 2, NO 2, CO, and O 3, the instruments were Thermo Scientific Models 43i, 42i, 48i, and 49i, respectively. Each instrument was calibrated monthly to ensure accuracy and precision. The concentration of each pollutant was determined at 10-min intervals and then averaged over 1 h. Meanwhile, NO 2 concentrations were detected using the NO 2 analyzer Model 200A based on chemiluminescence detection principles. The Teledyne API Model 400/400E instrument, via a UV absorption (Beer-Lambert) technique with a precision level of 0.5% and a detection limit of 0.4 ppb, was utilized in measuring the O 3 concentrations [31,32]. Measurements of SO 2 concentrations were conducted using the Teledyne API Model 100A/100E with the lowest detection level at 0.04 ppb by the UV fluorescence approach, while the Teledyne API Model 300/300E was employed to determine CO concentrations with a 0.04 ppm detection limit and 0.5% precision level by the non-dispersive infrared absorption (Beer-Lambert) method [32]. The meteorological parameters were measured using the Met One 062 sensor, the Met One 083D sensor, and the Met One 010C sensor for measurements of AT, RH, and WS, respectively [33]. All monitoring instruments were calibrated daily using zero air and standard gas concentrations to certify and validate the monitored data, which were reviewed before they were handed to the DOE [34].

Statistical Analysis
Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) software version 25. The paired-samples t-test was used to test for statistically significant differences between two related samples that were normally distributed. Data not conforming to normality assumptions were analyzed using the Wilcoxon signed-rank test. Furthermore, Spearman's correlation test was employed to determine the bivariate relationship (correlation coefficient; r) between the cumulative number of COVID-19 cases, air pollutants, and meteorological parameters. Relationships were considered statistically significant at p < 0.05.
The data on air pollutants in 2019 were analyzed against the data on cumulative COVID-19 cases in 2020 to investigate the long-term effect of air pollutants. On the other hand, data on meteorological factors in 2020 were analyzed against the data on cumulative COVID-19 cases in 2020. After that, multiple linear regression was applied to investigate the relationship between cumulative COVID-19 cases and predictor variables.
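As an illustration of the bivariate step, the sketch below computes Spearman's rank correlation using only the Python standard library. The analysis in this study was run in SPSS; the function names and the example numbers here are hypothetical and serve only to show the calculation (rank both series with average ranks for ties, then take the Pearson correlation of the ranks).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's r: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical example values (not study data).
humidity = [70.0, 75.0, 80.0, 85.0, 90.0]
cumulative_cases = [120, 150, 400, 410, 900]
r = spearman(humidity, cumulative_cases)
```

Because only the ordering matters, any monotonically increasing relationship yields r = 1 regardless of its shape, which is why the rank-based test suits data that are not normally distributed.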

Machine Learning Approach
During the period 18 March to 1 June 2020, BT regression was trained to assess the influence of air quality and meteorological data on COVID-19 cases. BT was chosen for its numerous benefits, such as the capability to fine-tune the model using errors from preceding trees, enhanced effectiveness in handling imbalanced datasets, shorter training time since trees are trained one at a time, and implicit feature selection by assigning greater significance to features that are more informative for prediction. Training and validation were carried out in MATLAB R2020a, developed and released by MathWorks. The predictors were months, PM 10, PM 2.5, SO 2, NO 2, O 3, CO, and the meteorological parameters WD, WS, RH, SR, and AT. Other parameters are shown in Table 1. The operational flow of the machine learning algorithm is depicted in Figure 2.
The process of BT prediction began with the collection and preparation of data for the algorithm. The data were then normalized using Equation (1), which rescaled them to a range between 0 and 1. Data normalization is an important preprocessing step in machine learning because it scales the features to a similar range, which reduces the influence of features that have significantly larger values than others. This prevents some features from dominating the objective function and improves the training of the model.
$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$
where X is a data point, X min is the minimum value in the dataset, and X max is the maximum value in the dataset. Then, the missing values were investigated. Many statistical and machine learning methods require complete datasets to generate good predictions; therefore, dealing with missing data is a required step. The BT model was chosen, and an ensemble of decision trees was generated using a boosting algorithm to combine multiple weak learners, which were decision trees in this case, into a strong learner. After training, the model's performance was evaluated before it was employed to predict the response variable for new data. The performance of the machine learning models was evaluated and compared using statistical error metrics: Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and R-squared (R 2). The RMSE measures the difference between the predicted and true values of the response variable, with lower values indicating better model performance. It is calculated by taking the square root of the mean of the squared differences between the predicted and true values. Unlike RMSE, MAE calculates the average of the absolute differences between predicted and actual values, as shown in Equation (2).
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right| \qquad (2)$$
where x is the actual value, y is the predicted value, and n is the number of samples in the dataset. The R 2 is utilized to show how well the regression model fits the data, where a higher R 2 implies that a larger proportion of the variance in the dependent variable can be accounted for by the independent variable(s). In this study, the dependent and independent variables are actual and predicted values. The R 2 value ranges from 0 to 1, where a value of 0 indicates that none of the variation in the dependent variable is explained by the independent variable(s), and a value of 1 indicates that all of the variation in the dependent variable is explained by the independent variable(s). Equation (3) shows how the R 2 is calculated.
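The preprocessing and evaluation steps described in this section can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the MATLAB workflow used in the study: the function names are hypothetical, and R 2 is computed here as one minus the ratio of the residual to the total sum of squares, which is a common definition and is assumed rather than taken from the paper's Equation (3).

```python
from math import sqrt


def min_max_normalize(values):
    """Rescale values to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def regression_metrics(actual, predicted):
    """Return RMSE, MSE, MAE, and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [x - y for x, y in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((x - mean_actual) ** 2 for x in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot  # assumed definition of R2
    return {"RMSE": sqrt(mse), "MSE": mse, "MAE": mae, "R2": r2}


# Hypothetical example values (not study data).
scaled = min_max_normalize([2, 4, 6])
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

Since RMSE is the square root of MSE, the two reported values can be checked against each other: an RMSE of 0.44 corresponds to an MSE of about 0.19, as stated in the abstract.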

Status of the COVID-19 Outbreak Situation in Malaysia
The trend of cumulative COVID-19 cases in Kuala Lumpur and Selangor during the study duration is shown in Figure 3. The Movement Control Order (MCO) in Malaysia began on 18 March 2020. On the first day, 119 cases and 192 cases were reported in Kuala Lumpur and Selangor, respectively; thus, at the beginning of this study, Selangor recorded more cases than Kuala Lumpur. By 1 June 2020, 7857 cumulative cases of COVID-19 had been reported in Malaysia. Kuala Lumpur recorded the highest number of cases, with 2039 cases, while Selangor had 1920 cases.
A total of 115 COVID-19-related deaths were reported in the country between 18 March 2020 and 1 June 2020. As highlighted by Malaysia's Ministry of Health, some of the patients who succumbed to COVID-19 had at least two chronic illnesses, such as heart disease, diabetes, asthma, stroke, dementia, and kidney disease [35]. Exposure to poor air quality can lead to various health issues, especially among vulnerable populations such as immunocompromised patients. Moreover, air pollution can enhance the likelihood of COVID-19 infection due to comorbidities or other respiratory illnesses [36]. When a person has comorbidities, their immune system may be impaired or require extra care, which may expose them to other infections.
Cilia and upper airway defenses could have been weakened by persistent exposure to air pollution, which may have encouraged viral invasion by allowing viruses to invade lower airways, increasing COVID-19 occurrence and lethality [7]. The immune system could be severely compromised by a highly infectious virus, such as the novel SARS-CoV-2. This is particularly evident among people residing in locations with extreme levels of air pollution. Zhu et al. [8] reported a statistically significant association between COVID-19 infection and air pollution. Likewise, Liu et al. [37] demonstrated that COVID-19 community spread could be favored by low temperatures, low humidity, and mild diurnal temperature ranges.

Concentrations of Air Pollutants and Meteorological Parameters before and during MCO
Many sources contribute to air pollution, including mobile sources (aircraft, trucks, automobiles, and other engines), power plants, industries, and household heating systems. The released chemicals and harmful gases interact with sunlight, which increases their toxicity [38]. As predicted, the lockdown, better known as the MCO in Malaysia, improved the air quality in the country. During the MCO, social interactions were restricted and non-essential industries were closed; hence, particular air contaminants, primarily those controlled by primary sources, were temporarily reduced. Several studies also reported a significant reduction in air pollution and improved air quality in their regions during lockdown [16,39-41].
Tables 2 and 3 depict the descriptive statistics (mean, standard deviation, median, interquartile range, maximum, and minimum) of the daily meteorological and air pollutant parameters between 18 March and 1 June in 2019 and 2020 in the study locations. This 11-week period represents the MCO in Malaysia, which started on 18 March 2020. A paired-samples t-test (effect size, d) was employed to compare the mean values of all parameters that were normally distributed between the two years, whereas non-normally distributed parameters were compared using a Wilcoxon signed-rank test (effect size, r). In 2020, all studied air pollutants in Kuala Lumpur showed lower concentrations than in 2019, except for PM 2.5, as shown in Table 2. The mean PM 2.5 concentration in 2020 was 0.22 µg/m³ higher than in 2019, and the results were statistically significant (t or Z) at p < 0.05. In other words, the MCO was influential in reducing air pollutant levels. The effect size of NO 2 was large (d ≥ 0.80 or r ≥ 0.80), whereas CO and SO 2 recorded a medium effect size (r ≥ 0.50). All meteorological parameters except AT also differed significantly (t or Z) at p < 0.05. Among all variables, SR had the largest effect size, a medium effect at r = 0.68.
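The paired comparison and its effect size can be sketched as follows. The study's analysis was run in SPSS; this standard-library sketch uses hypothetical function names and hypothetical numbers, and it assumes one common definition of Cohen's d for paired data (the mean of the paired differences divided by their standard deviation), which may differ from the variant reported in the tables.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(before, after):
    """Return the paired-samples t statistic and Cohen's d (assumed d_z form)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)               # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / sqrt(n))  # t statistic with n - 1 degrees of freedom
    d = mean_diff / sd_diff              # standardized mean difference
    return t, d


# Hypothetical daily means for the same calendar days in two years.
values_2019 = [10.0, 12.0, 14.0, 16.0]
values_2020 = [8.0, 11.0, 11.0, 14.0]
t_stat, cohen_d = paired_t_and_d(values_2019, values_2020)
```

A negative t and d here indicate that the second year's values are lower than the first year's on the same days; by the thresholds quoted above, an absolute d of at least 0.80 would be read as a large effect.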
In Selangor, on average, all air quality parameters in 2020 showed lower concentrations than in 2019, as shown in Table 3. These results were statistically significant at p < 0.05, reflecting the impact of the MCO on enhancing air quality. Whereas PM 2.5 and PM 10 recorded a medium effect size (r ≥ 0.50), a large effect size was observed for NO 2, O 3, and CO (d ≥ 0.80 or r ≥ 0.80). With the exception of SR, the meteorological parameters showed no statistically significant differences (t or Z) at p < 0.05. SR had the largest effect size among the variables, a medium effect at r = 0.70.
Table 3. Descriptive statistics and differences analysis of air pollutants and meteorological data in Selangor.
The new Malaysia Ambient Air Quality Standard for 2020 from the DOE (2020) was compared with all six air pollutant parameters. None of these air pollution parameters exceeded the standards (PM 10 ...
Industrial areas and heavy road traffic surround the monitoring station, so when the MCO was implemented, the restriction of movement and the suspension of industrial operations decreased the production of particulate matter in the atmosphere. However, trucks used to transport food and other essential goods during the MCO were operating as usual. Consequently, the concentrations of PM 2.5 and PM 10 in Kuala Lumpur did not decrease during the MCO due to the ongoing essential activities in the region [30]. As a result of reduced manufacturing operations in Malaysia throughout the MCO, SO 2 concentrations also decreased. Nevertheless, the SO 2 concentration increased drastically in Week 13 in Selangor. This finding could be due to the reopening of many economic sectors and activities when the MCO moved to a new phase, the Conditional Movement Control Order (CMCO). In addition, concentrations of CO and NO 2 in Kuala Lumpur and Selangor were higher before the MCO than during it. Fewer vehicles on the road meant fewer motor vehicle emissions, which explains the significant reductions in NO 2 and CO. Previous studies reported a CO reduction in the megacity of Delhi, India, and a decline in NO 2 levels in the city of Rio de Janeiro, Brazil, due to a reduction in vehicle movement and the closure of industrial complexes and power plants during the COVID-19 lockdown [42,43]. Therefore, the shutdown of the transport and industrial sectors largely explains why these pollutants declined sharply during the lockdown phase.
Furthermore, Lefohn et al. [44] and Paoletti et al. [45] indicated that a reduction in nitrogen oxide (NO x) concentrations is accompanied by an increase in O 3 concentrations. Conversely, concentrations of O 3 fell rapidly at higher concentrations of NO x [46,47]. In the presence of sunlight, photochemical reactions involving NO x and volatile organic compounds (VOCs) produce O 3 [48]. As a result of restricted movements and operations during the lockdown phase, decreased NO 2 emissions increased O 3 concentrations [49,50].
The climate of Malaysia is hot and humid throughout the year because the country lies just north of the equator, with average temperatures around 28 °C on the mainland. There are two monsoon seasons: the southwest monsoon from May to September and the northeast monsoon from October to March. These monsoon seasons bring more rainfall, with a higher RH, lower SR, and lower AT on average than the hot seasons. Kuala Lumpur experienced the transitional inter-monsoon season during March and April of the study period. For the meteorological parameters, excluding WD and RH, the values of all variables decreased during the first week of the MCO. The RH increase was expected to continue until May 2020 due to the monsoon season, which can cause frequent rain. In early May, the early stage of the southwest monsoon began; hence, more rainfall occurred, accompanied by increasing RH and decreasing SR and AT.
Nonetheless, on average, WS showed a slightly decreasing trend at the end of the MCO for Kuala Lumpur (2.65 to 2.17 m s⁻¹) and Selangor (1.11 to 2.33 m s⁻¹), which might be attributed to seasonal change. The RH value was slightly higher during the MCO compared to the pre-MCO period, ranging from 88.40% to 97.47% and from 89.87% to 97.40% in Selangor and Kuala Lumpur, respectively. In comparison, AT during the MCO followed a similar trend in both states: higher at the beginning of the MCO and lower at the end.
In a broader context, these movement restrictions served not only to control the outbreak of COVID-19 but also to reduce air pollution in Malaysia. The restrictions on pollution-generating activities and human mobility during the lockdown period resulted in an overall improvement in air quality throughout the world, including Malaysia. The present results are consistent with an earlier report [51], in which low levels of contaminants (CO, PM 2.5, and PM 10) were documented in the Klang Valley during the MCO. Similarly, studies from Delhi, India [43] and Barcelona, Spain [50] reported a reduction of PM 2.5, PM 10, CO, and NO 2 but an increase in O 3 during the lockdown period. In addition, studies from major cities in China and Morocco also showed a significant reduction in concentrations of PM 2.5, PM 10, NO 2, SO 2, and CO during the lockdown phase [8,52].

Relationship between Air Quality, Meteorological Factors, and COVID-19 Cases
COVID-19 has an incubation period of 1 to 14 days [53], and the impact of meteorological parameters could last for a few days [4]. Another study evaluated the correlations between COVID-19 cases, related mortality, and the concentrations of air pollutants between the years in which COVID-19 cases occurred and the years before the pandemic [54]. In line with their findings, applying a lag effect to the air quality and meteorological variables was reasonable in this study. Assuming that the usual duration from virus transmission to infection is 7 days, the average daily air quality and meteorological parameters from 7 days earlier were compared with the total reported cases during the MCO. For instance, COVID-19 cases on 18 March 2020 were analyzed against the average air pollution parameters on 11 March 2019 and meteorological factors on 11 March 2020, as applied previously in a local study by Suhaimi et al. [55].
Table 4
The variables that were statistically significant at the bivariate level were then analyzed at the multivariate level by applying multiple linear regression. Kuala Lumpur had six predictor variables (PM 10, PM 2.5, NO 2, WS, RH, and SR), whereas Selangor had five predictor variables (PM 10, PM 2.5, NO 2, O 3, and SR). When the model was fitted with the stepwise method, only variables with p < 0.05 were retained. Tables 5 and 6 show, for each region, the air quality and meteorological variables that were strongly linked to cumulative COVID-19 cases in Kuala Lumpur and Selangor. From the analysis, RH was the meteorological indicator that contributed most to the incidence of COVID-19 in Kuala Lumpur, as expressed in Equation (4).
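The lag alignment described above can be sketched as a small date calculation. The function and variable names are hypothetical, the 7-day lag and the 2019 pollutant year follow the text, and every numeric value other than the dates is made up for illustration.

```python
from datetime import date, timedelta


def lagged_dates(case_date, lag_days=7, pollutant_year=2019):
    """Return (pollutant_date, met_date) matched to a case-report date.

    Meteorological data come from lag_days before the case date; air
    pollutant data come from the same calendar day in pollutant_year.
    Note: replace() raises ValueError for 29 February when the target
    year is not a leap year; the study period (from 18 March) avoids it.
    """
    met_date = case_date - timedelta(days=lag_days)
    pollutant_date = met_date.replace(year=pollutant_year)
    return pollutant_date, met_date


def align(cases, pollutants_2019, met_2020, lag_days=7):
    """Build (case_date, cases, pollutant, met) rows where all data exist."""
    rows = []
    for case_date, n_cases in sorted(cases.items()):
        p_date, m_date = lagged_dates(case_date, lag_days)
        if p_date in pollutants_2019 and m_date in met_2020:
            rows.append((case_date, n_cases,
                         pollutants_2019[p_date], met_2020[m_date]))
    return rows


# Hypothetical values keyed by date (not study data).
cases = {date(2020, 3, 18): 100, date(2020, 3, 19): 110}
pm25_2019 = {date(2019, 3, 11): 20.0}
rh_2020 = {date(2020, 3, 11): 85.0, date(2020, 3, 12): 86.0}
rows = align(cases, pm25_2019, rh_2020)
```

Only the first case date produces a row in this toy example, because the second has no matching 2019 pollutant value; skipping incomplete pairs is one simple way to handle missing days, not necessarily the approach taken in the study.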
As for Selangor, PM 2.5 was strongly linked to cumulative COVID-19 cases. The equation is portrayed in Equation (5).
$$\text{Total COVID-19 cases} = 1885.26 + 13.25\,(\mathrm{PM}_{2.5}) \qquad (5)$$
For every unit (1 µg m⁻³) increase in PM 2.5, COVID-19 cases increase by 13.25. Beta values were significant at 0.05. VIF readings were less than five, which showed no multicollinearity concern. Moreover, 20.1% of the variance in COVID-19 cases can be explained by PM 2.5, R 2 = 0.244, F (4, 71) = 5.72, p < 0.001. A combined effect of this magnitude can be considered large (f 2 = 0.73); hence, PM 2.5 has a strong influence on COVID-19 cases. However, several factors that were not considered in this study might influence the incidence of COVID-19.
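As a brief worked illustration, the Selangor regression line reported above can be evaluated directly. The function name is hypothetical, and the input values are arbitrary examples rather than observations from the study; the sketch only shows how the intercept and slope translate into predicted counts.

```python
def predicted_cases_selangor(pm25):
    """Evaluate the reported Selangor regression line for a PM2.5 value.

    Intercept (1885.26) and slope (13.25 cases per 1 µg/m3) are the
    coefficients quoted in the text.
    """
    return 1885.26 + 13.25 * pm25


# Arbitrary example inputs (µg/m3), one unit apart.
low = predicted_cases_selangor(19.0)
high = predicted_cases_selangor(20.0)
difference = high - low  # equals the slope of the regression line
```

The difference between the two predictions equals the slope, which is the sense in which each additional 1 µg m⁻³ of PM 2.5 is associated with 13.25 more cumulative cases. Such a line describes an association within the observed range and should not be extrapolated beyond it.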
Our findings demonstrated links between cumulative COVID-19 cases, air contaminants, and meteorological factors. This study supports the evidence that chronic illnesses are linked to environmental pollution, particularly in urban areas. Air pollution is a well-known contributor to chronic inflammation, resulting in an overactive innate immune system [7]. Long-term air pollution exposure can lead to persistent immune system disturbances [56]. It may weaken circulatory and respiratory defenses against viral invasion, thus increasing the risk of severe COVID-19 outcomes [7].
In our study, the correlation test showed that air pollutant parameters PM 10 , PM 2.5 , NO 2 , and O 3 had positively significant correlations with COVID-19 cumulative cases. Results from our study are comparable with those from a prior investigation that revealed a positive relationship between cumulative COVID-19 cases and O 3 in China [9]. Moreover, a study discovered that a rise in the long-term O 3 average is linked to COVID-19 mortality and morbidity [54].
On the other hand, our findings contrast with those of previous studies by Sahoo et al. [57] in India, Zhu et al. [8] in China, and Bashir et al. [58] in California, who reported that the air pollutants (PM 10, PM 2.5, and NO 2) were negatively and significantly correlated with COVID-19 cases; another study reported that O 3 was negatively correlated with daily COVID-19 cases [59]. PM 2.5 and PM 10 have been related to several health effects, including inflammatory responses, oxidative damage, DNA damage, and respiratory, cardiovascular, and nervous system problems [60]. PM 2.5 impairs bronchial immunity and affects the integrity of the epithelial cells [61], and these events reduce the capacity of antibodies to combat viruses and increase susceptibility to respiratory diseases. Hence, the current findings regarding the positive correlation between cumulative COVID-19 cases and both particulate matter components (PM 2.5 and PM 10) are consistent with the report from Milan, Italy [60], whose authors also found a positive correlation between PM 2.5 and PM 10 and daily cases of COVID-19. Atmospheric PM can serve as a carrier for viruses, facilitating their spread in aerosol form and creating an environment that is conducive to their survival. This is because PM 10 and PM 2.5 can be inhaled, along with any associated microorganisms. Studies have shown that particle concentration and dimension can have a significant impact on the composition and concentration of microbial communities. When particles are inhaled, particularly those smaller than 2.5 microns, such as PM 2.5 and ultrafine particles (UFPs), they can penetrate deep into the lungs, allowing viruses to develop within the respiratory tract and cause infections.
It has been reported that NO 2 , SO 2 , and CO emissions are connected to an increased prevalence of lung and cardiovascular diseases [62]. Nonetheless, the present findings show no significant correlation between CO or SO 2 and the cumulative cases of COVID-19. Meanwhile, positive correlations were found between cumulative COVID-19 cases and NO 2 , aligning with research conducted in China and Italy [8,63]. Furthermore, a study in England showed that exposure to such pollutants could impair pulmonary antimicrobial responses, limiting virus clearance from the lungs and increasing infectivity [54]. The authors also stated that 3.3% of cases and 3.1% of deaths were linked to an increase of 1 mg m −3 of NO 2 concentration in 2018.
Regarding susceptibility to disease from air pollution exposure, a previous study demonstrated how chronic exposure to air pollutants can cause respiratory symptoms and lead to COVID-19 infection [64]. That study revealed that the expression level of Angiotensin-Converting Enzyme 2 (ACE-2) in the alveolar cells of the lungs is a strong determinant of the different categories of severity exhibited by COVID-19 patients. These could range from asymptomatic to mildly symptomatic to severely symptomatic if the ACE-2 expression in the aforementioned location is low (↑), moderate (↑↑), or high (↑↑↑) for NO 2 , PM 2.5 , and NO X , respectively. Exposure to these air pollutants may contribute to low host defenses and immunity, increase susceptibility to diseases, and cause a high viral load of the SARS-CoV-2 virus.
In addition, meteorological factors are considered influential determinants of virus viability, transmission, and transmission range [65,66]. These meteorological indicators can also affect droplet stability in the environment and virus survival; hence, they influence coronavirus transmission [10]. In our findings, the meteorological factors SR, RH, and WS were correlated with cumulative COVID-19 cases in Kuala Lumpur and Selangor.
A study from Jordan found a higher infection rate at low levels of WS, RH, and SR, conditions that promote the coronavirus's survival [67].
In this study, a moderate positive correlation was detected between the cumulative COVID-19 cases and RH (r = 0.494, p < 0.001), indicating that cases increased as RH increased. Our findings are consistent with those of researchers in India [68], who demonstrated that cumulative cases increased rapidly with RH. Similarly, previous studies in Singapore and Thailand, which are neighboring countries of Malaysia, showed significant positive correlations between RH and daily COVID-19 cases [9,69]. However, another group of researchers found a negative association between RH and daily new COVID-19 cases [70]. Taken together, the findings from these tropical nations suggest that high RH supported COVID-19 spread in tropical nations, such as Malaysia, Singapore, and Thailand, but not in colder regions, such as Europe and the United States of America, as previously stated by Suhaimi et al. [55].
Regarding another meteorological indicator, SR had a weak negative correlation with COVID-19 cases in Kuala Lumpur (r = −0.368, p = 0.001) and Selangor (r = −0.249, p = 0.030). These findings imply that the number of confirmed cases decreased as SR increased. Moreover, these outcomes may partly reflect our study period, which included a monsoon season in which cloud cover reduced SR (Figures 4j and 5j). UV rays, especially in the summer period, might be vital for the prevention of COVID-19 transmission given their deleterious effects on a variety of viruses such as SARS and MERS [71].
Furthermore, one study found that the growth of SARS-CoV-2 can be promoted by lower levels of UV rays [72]. In another study in Jordan, the researchers established an inverse correlation between SR and COVID-19 cases and found that SR plays a crucial role in COVID-19 outbreaks, which matched our findings [67]. Overall, meteorological factors contributed more to SARS-CoV-2 transmission in regions and months with colder and drier conditions and lower UV radiation than in regions and months with warmer, wetter seasons and higher UV radiation levels, as previously reported [73].
A negative correlation was detected between the cumulative COVID-19 cases and WS in Kuala Lumpur (r = −0.311, p = 0.006), but no such association was found in Selangor. Our findings in Kuala Lumpur agree with the outcomes reported by Alkhowailed et al. [74], who reported a negative relationship between WS and the incidence of COVID-19. They also claimed that WS influenced COVID-19 transmission in cities with a high population, which could be due to a low WS in these areas favoring the spread of the SARS-CoV-2 virus among persons living near congested areas as compared to areas with a higher WS. In Selangor, WS showed no significant correlation with COVID-19 cases, although the relationship was positive, which is in line with a previous study in Africa [75]. On days when AT was cooler, increased WS might cause people to remain indoors, reducing the spread of COVID-19 [76].
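As a rough plausibility check on the correlation coefficients and p-values quoted above, the following sketch recomputes an approximate two-sided p-value from r alone. It assumes one observation per day over the study period of 18 March to 1 June (n = 76), which this section does not state explicitly, and it uses the Fisher z-transformation with a normal approximation rather than whichever test the original analysis used. The function names are illustrative.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z, normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Assumption: daily observations from 18 March to 1 June inclusive.
N_DAYS = 76

# Coefficients as quoted in the text above.
reported = [
    ("RH, overall", 0.494),
    ("SR, Kuala Lumpur", -0.368),
    ("SR, Selangor", -0.249),
    ("WS, Kuala Lumpur", -0.311),
]
for label, r in reported:
    print(f"{label}: r = {r:+.3f}, approximate p = {approx_p_value(r, N_DAYS):.4f}")
```

Under these assumptions the approximate p-values fall close to those quoted in the text, which suggests the reported coefficients and significance levels are mutually consistent for a sample of this size.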

Boosted Tree Prediction
In this study, the impact of air quality and meteorological factors on COVID-19 cases was assessed using a machine learning method. The model's performance was evaluated at various leaf sizes, numbers of learners, and learning rates. As shown in Table 7, the BT method performed best in this investigation with a minimum leaf size of seven, 30 learners, a learning rate of 0.2, and Principal Component Analysis (PCA) disabled. Figure 7 shows the response plot of the predictors and response (COVID-19 cases) variables. The plotted response indicates the outcomes of the regression model. Because cross-validation was employed, these predictions are made on unseen data points; in other words, each prediction is generated by a model that was trained without the corresponding observation. Figure 8 shows the plot of the predicted versus actual data, followed by Figure 9, where the residual plot is shown. The residual plot shows the difference between the predicted response and the true response. The residual errors fall between −1 and 1 and are scattered around 0. No outliers are visible, and there are no marked changes in the residuals along the x-axis, which suggests an acceptable prediction model. Finally, the BT regression model was used to forecast the COVID-19 cases at Selangor Station from the air quality and meteorological data of the region. For the forecast cases, the RMSE, R 2 , MAE, and MSE were 0.47, 0.77, 0.39, and 0.22, respectively, as shown in Table 8. R 2 has become a commonly used metric for evaluating regression analyses in different scientific fields because it provides accurate and informative results [77]. An R 2 value higher than 0.75 is considered substantial, indicating that the model fits the COVID-19 case data well [78].
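The boosting and cross-validation procedure described above can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes least-squares boosting with single-split trees (stumps) as the weak learners, a five-fold split, and synthetic data in place of the air quality and meteorological measurements. Only the three hyperparameter values (30 learners, learning rate 0.2, minimum leaf size 7) are taken from the text; all function names are hypothetical.

```python
import random


def fit_stump(X, residuals, min_leaf):
    """Best single-split least-squares tree with at least min_leaf rows per leaf."""
    n, best = len(X), None
    total = sum(residuals)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            if k < min_leaf or n - k < min_leaf:
                continue  # enforce the minimum leaf size
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximizing this quantity is equivalent to minimizing squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:] if best else None


def fit_boosted(X, y, n_learners=30, learning_rate=0.2, min_leaf=7):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_learners):
        residuals = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(X, residuals, min_leaf)
        if stump is None:
            break
        j, thr, left, right = stump
        stumps.append(stump)
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(left if row[j] <= thr else right
                           for j, thr, left, right in stumps)


def cross_val_predict(X, y, k=5, seed=0, **params):
    """Out-of-fold predictions: each row is predicted by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    out = [None] * len(X)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_boosted([X[i] for i in train], [y[i] for i in train], **params)
        for i in test:
            out[i] = predict(model, X[i])
    return out


# Synthetic stand-in data (two predictors, 76 rows); not data from the study.
rng = random.Random(1)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(76)]
y = [2.0 * a - b + rng.gauss(0, 0.1) for a, b in X]

oof = cross_val_predict(X, y, k=5)
mse_model = sum((t - p) ** 2 for t, p in zip(y, oof)) / len(y)
mean_y = sum(y) / len(y)
mse_mean = sum((t - mean_y) ** 2 for t in y) / len(y)
print(f"out-of-fold MSE: {mse_model:.3f} (mean-only baseline: {mse_mean:.3f})")
```

The out-of-fold predictions are what a response plot or residual plot of a cross-validated model would display, since no prediction is made by a model that was trained on the row being predicted.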
Figure 9 illustrates a comparison between the predicted cases and the actual cases recorded at Selangor Station. The graph shows that the predicted cases follow a similar trend to the actual cases, although the actual cases exhibit some variability not captured by the model. Nevertheless, the fact that the predicted cases track the actual cases quite closely suggests that the BT regression model was successful in forecasting COVID-19 cases in Selangor using air quality and meteorological data. Overall, this study demonstrates that environmental factors can be used to predict the spread of COVID-19 in a given region, which could be useful for policymakers in implementing targeted interventions to control the spread of the virus.
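The four error metrics used to compare predicted and actual cases can be computed as in the sketch below. It assumes that R 2 is the coefficient of determination (one minus the ratio of residual to total sum of squares); the text does not give the formula used, and the sample values are hypothetical. The final loop checks that the reported MSE values agree with the squares of the reported RMSE values to within rounding.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MSE, MAE, and R2 (coefficient of determination)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "R2": 1.0 - ss_res / ss_tot}


# Hypothetical values for illustration only; not data from the study.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5]
metrics = regression_metrics(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})

# Reported (RMSE, MSE) pairs: best cross-validated model and Selangor forecast.
reported_pairs = [(0.44, 0.19), (0.47, 0.22)]
consistent = [abs(rmse ** 2 - mse) < 0.005 for rmse, mse in reported_pairs]
print(consistent)
```

Because MSE is the square of RMSE, the two metrics always rank models identically; MAE and R 2 add information about typical error size and explained variance, respectively.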
Figure 7. The relationship between a response variable and predictor variables. Response (cases) refers to COVID-19 cases.
Figure 9. The predicted and actual COVID-19 cases in Selangor. The predicted cases were based on air quality and meteorological data at Selangor Station using the developed BT regression model.

Limitations
Despite its findings, this study has limitations. First, the study only considered two major locations (Kuala Lumpur and Selangor), so some outcomes may differ from the true influence of meteorological factors and ambient pollution on the transmission of the novel SARS-CoV-2 in Malaysia. Second, only one station was studied for the whole state. More data and relationships could be explored if more stations were included. Third, the data on air quality parameters and meteorological factors were only studied for 11 weeks to compare when COVID-19 was absent and when COVID-19 was present. A longer study period would better represent the years without and with COVID-19. Fourth, the ecological study design used in this research may be subject to the ecological fallacy. Individual-level data on air pollutant exposure and coexisting health conditions were not collected, so only limited inferences can be drawn from the group-level analysis of the available data.
These limitations need to be addressed in future studies involving cohort groups in which factors such as gender, age, occupation, underlying conditions, and high-risk or vulnerable groups are considered in the Malaysian context. Examples of high-risk individuals include those with tuberculosis, cardiovascular diseases, diabetes, asthma, and chronic obstructive pulmonary disease. Chronic exposure to polluted air can lead to a compromised immune system; hence, the affected individuals will be more susceptible to any kind of respiratory disease, including COVID-19. Moreover, changes in the SARS-CoV-2 virus have been detected over time and need more study, particularly on its transmission as it is related to the environment, such as meteorological factors.
Given its potential effects, air pollution could be considered a confounding factor in the association between COVID-19 transmission and factors such as close interaction among people, population density, and eating and drinking behaviors. Air pollution can weaken the respiratory and immune systems, which may heighten the vulnerability of individuals to infectious diseases such as COVID-19. Additionally, people living in areas with high levels of air pollution might be more likely to spend time indoors or in overcrowded spaces, amplifying the risk of transmission. Therefore, when studying the relationship between these factors and disease spread, it is important to consider the potential confounding effects of air pollution. This may involve controlling for air pollution levels in the analysis or conducting stratified analyses based on air pollution levels. By taking into account the potential effects of air pollution, researchers can better understand the true relationship between these factors and develop appropriate interventions to prevent the spread of infectious diseases.
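One simple way to control for a confounder such as air pollution is the first-order partial correlation, which removes the linear effect of a third variable from the correlation between two others. The sketch below is an illustration under stated assumptions rather than an analysis performed in this study: the variable names (`pollution`, `contact`, `cases`) and the synthetic data are hypothetical, and the data are constructed so that pollution drives both of the other variables.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example: pollution influences both variables, which are otherwise unrelated.
rng = random.Random(42)
pollution = [rng.gauss(0, 1) for _ in range(500)]
contact = [z + rng.gauss(0, 0.5) for z in pollution]
cases = [z + rng.gauss(0, 0.5) for z in pollution]

raw = pearson_r(contact, cases)
adjusted = partial_corr(contact, cases, pollution)
print(f"raw correlation: {raw:.2f}, controlling for pollution: {adjusted:.2f}")
```

In this constructed example the raw correlation is strong while the partial correlation is close to zero, which is the pattern expected when an apparent association is produced entirely by a shared confounder. A stratified analysis would instead compute the correlation separately within bands of pollution level.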

Conclusions
In conclusion, this study highlights the significant correlation between air pollution, meteorological parameters, and COVID-19 cases in Malaysia. PM 2.5 , PM 10 , NO 2 , O 3 , RH, WS, and SR were found to be significantly correlated with COVID-19 cases. The findings also indicate that COVID-19 cases were positively correlated with O 3 , NO 2 , RH, PM 10 , and PM 2.5 but negatively correlated with SR and WS. The use of the BT regression model in forecasting COVID-19 cases in Selangor was successful, with an R 2 value of 0.77 indicating substantial accuracy. Although the actual cases exhibited some variability not captured by the model, the predicted cases tracked the actual cases closely, suggesting that the model was effective in forecasting COVID-19 cases in Selangor using air quality and meteorological data.
This study could provide valuable insights for future research in countries with similar climates and population densities. However, preventive measures, such as handwashing, sanitization, and physical distancing, are still crucial during the epidemic phase. Wearing masks and getting vaccinated against COVID-19 are also highly recommended to increase herd immunity and prevent the severe impacts of COVID-19. The findings of this study could benefit policymakers in developing a more systematic policy for Malaysia based on pollution sources and mitigation measures to improve air quality. Integrated efforts to control emissions and minimize exposure to air pollutants can improve human health in general and alleviate the public health burden of COVID-19 specifically. The Malaysian government's COVID-19 mitigation measures had a significant impact on air pollutant concentrations in the country. Reduced human activities, vehicle emissions, industrial emissions, and coal-fired power plant emissions were the main factors that led to cleaner air during the MCO period. This study's results could help the government devise systematic policies that take into account the pollution sources and characteristics of pollutants. One such policy could be to encourage the use of cleaner alternatives and new vehicle technology, given the positive trend in air quality observed during the MCO period due to the reduced number of vehicles. Local authorities can also adopt their own measures to reduce air pollution on a smaller scale, which can be extended to a larger scale over time. In general, this study underscores the need for ongoing efforts to reduce air pollution in Malaysia and the potential advantages of implementing emission reduction policies over the long term.