Long-term exposure to air pollution and severe COVID-19 in Catalonia: a population-based cohort study

The association between long-term exposure to ambient air pollutants and severe COVID-19 is uncertain. We followed 4,660,502 adults from the general population in 2020 in Catalonia, Spain. Cox proportional models were fit to evaluate the association between annual averages of PM2.5, NO2, BC, and O3 at each participant’s residential address and severe COVID-19. Higher exposure to PM2.5, NO2, and BC was associated with an increased risk of COVID-19 hospitalization, ICU admission, death, and hospital length of stay. An increase of 3.2 µg/m3 of PM2.5 was associated with a 19% (95% CI, 16–21) increase in hospitalizations. An increase of 16.1 µg/m3 of NO2 was associated with a 42% (95% CI, 30–55) increase in ICU admissions. An increase of 0.7 µg/m3 of BC was associated with a 6% (95% CI, 0–13) increase in deaths. O3 was positively associated with severe outcomes when adjusted by NO2. Our study contributes robust evidence that long-term exposure to air pollutants is associated with severe COVID-19.

The purpose of the current study was to assess associations of particulate matter, nitrogen dioxide, ozone, and black carbon on COVID-19 severity (hospitalizations, ICU admissions, and death). The manuscript has great potential to add to the literature on environmental exposures and COVID-19 and provides novel findings in a large cohort setting. The analysis is well executed, but requires some additional details to make it reproducible. Most of my comments are minor (below), however, please carefully consider point 6, 7, and 14, which may require rerunning some analyses. Please also note that the following comments are not intended to degrade the researchers but are provided to improve the quality of this work for future submission.
1. Black carbon and hospital length of stay should be mentioned in abstract if they are assessed in the main manuscript. 2. For readers unfamiliar with Catalonia, it may be helpful to have a little more relevant information on the general population (e.g. % insured, % covered versus % not covered by the public healthcare system in 2015). 3. Please add a few lines on the proposed mechanism/rational for study inclusion for each of the three pollutants in the introduction. 4. Are the COVAIR-CAT exposure assessment models published or validated elsewhere? If so, please add references after the phrase, "using machine learning methods". If not, perhaps a couple of sentences could be added to the methods to briefly describe the exposure assessment (spatial resolution, inputs, etc.). 5. It is not clear why the air pollution was assigned to addresses after follow-up (2021) rather than before, was 2019 address information unavailable? Please add either rationale or limitation to the main text. 6. My main concern is that the term COVID-19 hospitalization is misleading. All cause hospitalization following a COVID-19 infection is not comparable with COVID-19 hospitalization, with the latter being used in almost all of the cited literature. Most studies use COVID-19 ICD-10 in a primary position at diagnosis. For comparability with other cited studies, the definition of COVID-19 hospitalization should be "COVID-19 as main cause of admission 33,981 (72.0%)", from Table S3. Table S4 groups respiratory cause admissions, including chronic respiratory, with COVID-19 admissions, but it is important that COVID-19 hospitalizations are reported separately from other respiratory disease admissions. That is, for comparability with other literature, COVID-19 hospitalizations should be the main analysis that is compared to other studies in the discussion. 7. L180 "For the main analysis, we considered a missing value on tobacco smoking as non-smoker because the value is most often omitted for non-smokers in the primary care service", requires a reference to support this decision. Otherwise, to avoid exclusions or imputation, perhaps missing smoker could be coded using a missing indicator e.g. "never", "ever", "smoker", "missing". See https://arxiv.org/abs/2111.00138 for more information on why I suggest this approach (no need to cite). 8. Please consider if time-to-event is relevant for long-term time invariant exposure (air pollution) and a short follow-up period? Is the time from March 1st 2020 until COVID diagnosis meaningful? 9. When comparing results with other studies please report if they reported HR or OR, etc. before each estimate. 10. L373 "Although we did not perform a formal causal mediation analysis because of the lack of established methods for time to event analysis". Causal mediation is possible with Cox PH, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010419/, please correct this statement. 11. This study reports effect estimates that are substantially larger than cited studies with adequate confounder adjustment. Given the infectious nature of COVID-19, perhaps the categorical Urbanicity variable does not fully adjust for the confounding by population density (and perhaps housing conditions). If you agree, this consideration could be discussed in limitations or be worked into the interpretation in the discussion. 12. How was 6 degrees of freedom selected for the age spline? 13. Please comment on the protective effect of O3 in single pollutant models. ICU admission, LOS, and death are 0, regardless of their air pollution levels, demographic characteristics, or others. When conducting survival analyses, the models should be conducted among participants who are at risk at baseline. When the outcomes are COVID-19-related hospitalization, LOS, and death, the study sample should be those who tested positive; when the outcome is ICU admission, the sample should be those who were hospitalized due to COVID-19 (assuming that ICU admission were all transfered from regular admissions).
The estimates for O3 are confusing and may be misleading. It is hard to interpret O3 as some sort of "good air". What are the correlation coefficients between PM2.5, NO2, and O3? I'm not sure if it is a good idea to put the results of O3 in a paper that is primarily targeting PM2. 5  Claiming acute myocardial infarction as a chronic comorbidity does not make sense to me (acute is an antonym to chronic).
Is it possible to tell different COVID variants for the 340,608 COVID-19-positive individuals in the analysis?
Page 15, line 321: please refrain from using "RR" for logistic regression or generalized linear models, they are not the same thing (even though they are close in certain situations). The paper is very well-written. Some key strengths of this study include its use of state-of-the-art methods for air pollution exposure assessment and the population-based cohort study design that consisted of nearly entire adult population in Catalonia, Spain. The findings suggest that long-term exposures to PM2.5 and NO2 were associated with severe COVID-19 outcomes in this cohort. The evidence concerning O3 exposure is less clear. The authors are commended for conducting a number of sensitivity analyses that resulted in similar findings indicating the robustness of the results. Although several individual-level factors such as race/ethnicity and occupation could not be controlled for, the set of other covariates was ample.
I only have few minor comments and suggestions that are listed below.
Methods/Covariates 1. For many readers who are unfamiliar with Catalonia, please provide more details about this study area, especially the size of various geographic regions used in this study (eg, health regions and primary care service areas). It would helpful to also discuss the appropriateness of adjusting for variables derived at these geographic levels. 2. Could the minimal impact of further adjusting for preexisting comorbidities on the air pollution-COVID-19 associations be due to the fact that the main model already included health risk group? As it was described, the latter variable encompassed multimorbidity. Please clarify. 3. As the primary outcome of interest is COVID-19 hospitalizations, could the authors consider a sensitivity analysis to adjust for distance to a nearest acute-care hospital? 4. Page 9. The authors "accounted for the competing event of death when evaluating COVID-19 related hospitalization and ICU admission by censoring at a death event". Since ambient air pollution has been linked to various mortality outcomes, the censoring may introduce selection bias under the null. Could the authors consider a sensitivity analysis to account for the censoring using the measured risk factors for the outcome of interest (for example, by IP censoring weighting)? Discussion 5. Page 18. The authors discussed "Another hypothesis is that air pollution exposure could facilitate SARS-CoV2 binding, because there is evidence that exposure to particulate matter upregulates the expression of SARS-CoV-2 receptors in the lung (e.g., angiotensin-converting enzyme 2)". This manuscript can benefit from a more thorough discussion about this possible mechanism, since its results suggested that the impact of air pollution on increasing the prevalence of chronic comorbidities associated with severe COVID-19 was likely minimal. This will help improve the biological plausibility of the study results.

Reviewer #1 (Remarks to the Author):
Thank you for the opportunity to review this manuscript on long-term air pollution exposure and COVID-19 severity.
The purpose of the current study was to assess associations of particulate matter, nitrogen dioxide, ozone, and black carbon on COVID-19 severity (hospitalizations, ICU admissions, and death). The manuscript has great potential to add to the literature on environmental exposures and COVID-19 and provides novel findings in a large cohort setting. The analysis is well executed, but requires some additional details to make it reproducible. Most of my comments are minor (below), however, please carefully consider point 6, 7, and 14, which may require rerunning some analyses. Please also note that the following comments are not intended to degrade the researchers but are provided to improve the quality of this work for future submission.
ANSWER: Thank you for your comments.

Black carbon and hospital length of stay should be mentioned in abstract if they are assessed in the main manuscript.
ANSWER: We added black carbon and hospital length of stay results to the abstract.

For readers unfamiliar with
Catalonia, it may be helpful to have a little more relevant information on the general population (e.g. % insured, % covered versus % not covered by the public healthcare system in 2015).

ANSWER:
We agree and added this information in the methods about the general population in Catalonia in 2015. In 2015, 98.8% of the total population was covered by the public healthcare system.

Please add a few lines on the proposed mechanism/rational for study inclusion for each of the three pollutants in the introduction.
ANSWER: We added the mechanistic rationale for analysing them in the introduction.

4.
Are the COVAIR-CAT exposure assessment models published or validated elsewhere? If so, please add references after the phrase, "using machine learning methods". If not, perhaps a couple of sentences could be added to the methods to briefly describe the exposure assessment (spatial resolution, inputs, etc.).

ANSWER:
The COVAIR-CAT exposure modeling manuscript is about to be submitted for publication. We improved the description of the modeling approach in the methods. The original submission included a description of the models in the supplementary methods.

It is not clear why the air pollution was assigned to addresses after follow-up (2021) rather than before, was 2019 address information unavailable? Please add either rationale or limitation to the main text.
ANSWER: We agree this was not sufficiently clear. We used the residential address of each individual at the start of 2021 or the last available, because we did not have an updated residential address for the start of 2020. The address registry for 2020 in Catalonia was disrupted by the pandemic. We had access to residential addresses at the start of 2015 or 2021. We selected the start of 2021, as it was closer to 2020 and run a sensitivity analysis accounting for those that did not move between 2015 and January 2021. We have now clarified the text.
6. My main concern is that the term COVID-19 hospitalization is misleading. All cause hospitalization following a COVID-19 infection is not comparable with COVID-19 hospitalization, with the latter being used in almost all of the cited literature. Most studies use COVID-19 ICD-10 in a primary position at diagnosis. For comparability with other cited studies, the definition of COVID-19 hospitalization should be "COVID-19 as main cause of admission 33,981 (72.0%)", from Table S3. Table S4 groups respiratory cause admissions, including chronic respiratory, with COVID-19 admissions, but it is important that COVID-19 hospitalizations are reported separately from other respiratory disease admissions. That is, for comparability with other literature, COVID-19 hospitalizations should be the main analysis that is compared to other studies in the discussion. ANSWER: Thanks for the opportunity to clarify this point. We have discussed this approach during the protocol planning. We chose the time-defined window, ie, any hospitalization within 30 days of COVID-19 diagnosis, because this definition has been used since the beginning of the pandemic in clinical trials for treatment, 1-3 as well as, by national public health agencies (e.g., UK) 4 and other environmental epidemiological studies. 5,6 One of the main advantages of this approach is to avoid the problem of ICD-10 coding practice that can vary within and between hospitals; this is particularly important during the first wave. Being a new disease in 2020, hospitalizations that today would be coded as due to COVID-19 were not consistently coded in the first wave due to lack of knowledge and standardized coding (e.g., an acute respiratory failure with "happy or silent hypoxaemia" 7 clinical presentationsevere hypoxemia and absence of dyspnoea -or a stroke in a young patient infected SARS-CoV-2 patients). For instance, we have 2,524 hospital admissions as Respiratory cause as main cause (first position of ICD-10 code), in individuals with a recent COVID-19 diagnosis. Additionally, we did not analyse the Omicron period, when incidental hospitalizations could occur, thus decreasing the likelihood of COVID-19 not being responsible for that hospitalization.
We agree with the reviewer that this is an important topic, and now also show in the Main text (New Table 4) the results of the new analysis requested, using the main model adjustment and having COVID-19 as the main/first cause of hospitalization. The results are consistent to our primary analysis and add robustness to our manuscript. However, we did not change the time-window definition in our main analysis in order to remain consistent with our original study protocol. 7. L180 "For the main analysis, we considered a missing value on tobacco smoking as non-smoker because the value is most often omitted for non-smokers in the primary care service", requires a reference to support this decision. Otherwise, to avoid exclusions or imputation, perhaps missing smoker could be coded using a missing indicator e.g. "never", "ever", "smoker", "missing". See https://arxiv.org/abs/2111.00138 for more information on why I suggest this approach (no need to cite).

ANSWER:
We discussed this topic during the development of our analysis plan and based our approach on recommendations from agencies responsible for the collection and analysis of administrative health data in Catalonia. Unfortunately, we did not find any data or references that directly support this approach. Nevertheless, our original submissions included a sensitivity analysis using multiple imputation, showing similar results. In the revised version, we run an additional sensitivity analysis using the missing indicator and show it in the supplement (supplementary tables S3, S4, S5); again, with results consistent with our main analysis. While the reference mentioned by the reviewer supports the use of a missing indicator, others references highlight its potential problems. 8

Please consider if time-to-event is relevant for long-term time invariant exposure (air pollution) and a short follow-up period? Is the time from March 1st 2020 until COVID diagnosis meaningful?
ANSWER: Thanks for this comment. Adjusting for calendar/epidemic time is important to capture the dynamics of the epidemic in terms of temporal changes in transmissibility, diagnosis, treatment, etc,. Thus, we used a time-to-event analysis with daily time steps because of the advantages of accounting for censoring and time trends in the baseline hazard due to the dynamics of the epidemic.

When comparing results with other studies please report if they reported HR or OR, etc. before each estimate.
ANSWER: Thanks for this. We revised the discussion and added the measured estimate for all values.
10. L373 "Although we did not perform a formal causal mediation analysis because of the lack of established methods for time to event analysis". Causal mediation is possible with Cox PH, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010419/, please correct this statement.

ANSWER:
The reviewer is correct, and we deleted that sentence.

This study reports effect estimates that are substantially larger than cited studies with adequate confounder adjustment. Given the infectious nature of COVID-19, perhaps the categorical Urbanicity variable does not fully adjust for the confounding by population density (and perhaps housing conditions). If you agree, this consideration could be discussed in limitations or be worked into the interpretation in the discussion.
ANSWER: We run a sensitivity analysis with population density at the census tract level, now reported in supplementary tables S3, S4, S5. The results are robust to this further adjustment.

How was 6 degrees of freedom selected for the age spline?
ANSWER: We evaluated from 3 to 6 degrees of freedom and chose the best fit by using AIC criteria. We clarified this in the revised version.

Please comment on the protective effect of O3 in single pollutant models.
ANSWER: Thanks for this comment. Following the editor and reviewers' comments, we present all O 3 results in the supplement. We agreed it is difficult to interpret the O 3 coefficients given the high negative correlation with the other pollutants. We highlight the issue of potential confounding by correlated co-pollutants in the revised version of the manuscript, particularly in the interpretation of changes in the O 3 estimates when coadjusting by NO 2 .
The correlation between the air pollutants is in the supplementary appendix and shown below. Major comment: Page 6, lines 131-132: since the participants were followed through December 31st, 2020, aren't the authors supposed to define the first COVID-19 diagnosis from March 1st, 2020 to December 1st, 2020 (instead of December 31st) to allow for sufficient time to observe the health outcomes, given that the time range of COVID-19 related events after testing positive is 30 days.
ANSWER: Thanks for this comment. We now explore this point in a sensitivity analysis. The majority of hospital admissions occurred very shortly after the COVID-19 diagnosis: mean 3.8 ± 5 days, 83% occurred in the first week after diagnosis and 95% within 2 weeks. We therefore did not anticipate a large impact of using the 31 st Dec as the administrative censoring data and preferred to keep the whole year to include more COVID-19 diagnoses.
In the sensitivity analysis, the results are robust to changing the censoring date (supplementary tables S3, S4, S5 ANSWER: Thanks for these suggestions. We agree the performance statistics for PM 2.5 models were not ideal and reflected the characteristics of the PM 2.5 monitoring network in Catalonia. More than 80% of the PM 2.5 observations in the region come from manual stations, which typically record daily average PM 2.5 with frequency lower than daily. This makes their use for exposure modeling challenging, since they do not have enough temporal coverage to model long-term concentrations. Even though we use the best from the data by applying advanced modeling, which borrowed information from PM 10 . As a sensitivity analysis we have the analysis with ELAPSE PM 2.5 models, yielding similar results. The reference suggested are monthly global estimates with an spatial resolution of 0.1* x 0.1* degrees (~ 11km), while our modeling was customized to the region and had a resolution of 250 meters.
Regarding NO 2 , the exposure metrics the reviewer suggested are mostly based on 5P TROPOMI tropospheric NO 2 columns (which our model also includes) at 1km resolution. Our model incorporates many other local relevant predictors (e.g. road network, land cover, population density, impervious surfaces) and offers better spatial resolution (250m) than the suggested exposure estimates, which is critical for long-term estimation of NO 2 , which may vary a lot at short distances. Nonetheless and similarly to PM 2.5 , we also linked ELAPSE NO 2 estimates (100 m resolution, year 2010), which were highly correlated with COVAIR exposures and achieved similar results in epidemiological models.

Page 9, Lines 195-197, why would the authors use negative binomial regressions for LOS? negative binomial models are commonly used when the dependent variable is a count variable, and LOS does not look like a count variable to me.
ANSWER: Thanks for the opportunity to clarify this point. Length-of-stay is usually a count variable (1, 2, 3, 4, 5 days spent in the hospital). In our analysis, we generated counts of days in hospital based on date of admission and date of discharge. We recognize that there is not a consensus in the literature on how to model LOS; however, negative binomial is one of the most commonly used modelling approach in several benchmark papers. [9][10][11] We have referenced it in the manuscript.
The authors did not mention the sample for each outcome. My understanding is that the authors conducted all the models based on all the 4660502 participants included in the study (please correct me if I'm wrong). If so, those who were never diagnosed with COVID-19 should not be in the risk set. Because they are COVID-19 negative, their risks of COVID-19-related hospitalization, ICU admission, LOS, and death are 0, regardless of their air pollution levels, demographic characteristics, or others. When conducting survival analyses, the models should be conducted among participants who are at risk at baseline. When the outcomes are COVID-19-related hospitalization, LOS, and death, the study sample should be those who tested positive; when the outcome is ICU admission, the sample should be those who were hospitalized due to COVID-19 (assuming that ICU admission were all transfered from regular admissions).
ANSWER: Thank you for the opportunity to discuss this topic. The denominator of the analysis of COVID-19 related hospitalization, ICU admission and deaths is the whole cohort (n=4,660,502), while for hospital length-of-stay, the denominator was everyone hospitalized. We discussed this issue during our protocol and analysis plan. We discuss the reasons to analyse the whole cohort below: 1 -To estimate the association in the target population in which all individuals are at risk. 12 Our outcome is a composite/conditional outcome: being diagnosed with COVID-19 plus having a severe event. 12 Those that have not been diagnosed are still at risk of the composite outcome, because they are at risk of being infected, then at risk of having the severe event within 30 days (COVID-19 diagnosis plus severe event). As our goal was to derive estimates for the target population, which was the full population of Catalonia. Similarly, we aimed to estimate mortality rates (for analysis focused on deaths) rather than case-fatality rates, to increase the broader public health relevance of the findings. 12 2-To avoid an expected and likely important selection bias when analysing only COVID-19 cases: Particularly for the first wave, and even in the second wave, the number of those undiagnosed with COVID-19 is high. 13 Probability of testing is likely associated with air pollution levels (in our cohort, those diagnosed with COVID-19 have higher levels of air pollution) and there are likely unmeasured factors associated with both, this gives risk to selection/collider bias. These issues have been discussed in these two methodological papers, 14,15 which recommend against restricting analysis to those diagnosed with COVID-19. As we don't expect severe COVID to be undiagnosed (people will go to hospital if they are very ill), we select to analyse severe COVID-19.
3 -A large proportion of hospitalized individuals had their diagnosis upon hospitalization: 14,556 (31%) of the 47,174 hospitalizations had their first COVID-19 diagnosis at the time of hospitalization. These individuals would have 0 time to event if we included only diagnosed individuals. These individuals could be those at higher risk, or have less access to primary care, which is likely associated with the exposure, potentially creating an additional source of selection bias.
We evaluated the results for hospitalization, ICU admission and deaths among those with a COVID-19 diagnosis. We present the results for all COVID-19 diagnosis, attributing 0.5 to time to event for those with event same day of diagnosis; and an analysis among those diagnosed through primary care (supplementary tables S9, S10, S11). As can be observed, there are still positive associations for hospital and ICU admissions, and depending on the model, for death. In wave-specific analyses, there is an impact of wave particularly for death, reflecting our previous arguments that the analysis of COVID-19 cases can be biased, particularly during the first wave.
The estimates for O3 are confusing and may be misleading. It is hard to interpret O3 as some sort of "good air". What are the correlation coefficients between PM2.5, NO2, and O3? I'm not sure if it is a good idea to put the results of O3 in a paper that is primarily targeting PM2.5 and NO2.
ANSWER: Thanks for this comment. Following the editor and reviewers' comments, we present all O 3 results in the supplement. We agreed it is difficult to interpret the O 3 coefficients given the high negative correlation with the other pollutants. We highlight the issue of potential confounding by correlated co-pollutants in the revised version of the manuscript, particularly in the interpretation of changes in the O 3 estimates when coadjusting by NO 2 .
The correlation between the air pollutants is in the supplementary appendix and shown below.