Association between smoking and COVID-19 severity: A multicentre retrospective observational study

The relationship between smoking and coronavirus disease 2019 (COVID-19) severity remains unclear. This study aimed to investigate the effect of smoking status (current smoking and a smoking history) on the clinical severity of COVID-19. Data of all 588 enrolled patients, who were referred to 25 hospitals in Jiangsu province between January 10, 2020 and March 14, 2020, were retrospectively reviewed. Univariate and multivariate regression, random forest algorithms, and additive interaction analysis were used to estimate the importance of selected predictor variables in the relationship between smoking and COVID-19 severity. In the univariate analysis, the proportion of patients with a current smoking status in the severe group was significantly higher than that in the non-severe group. In the multivariate analysis, current smoking remained a risk factor for severe COVID-19. Data from the interaction analysis showed a strong interaction between the number of comorbidities in patients with COVID-19 and smoking. However, no significant interaction was found between smoking and specific comorbidities, such as hypertension and diabetes. In the random forest model, smoking history was ranked sixth in mean decrease accuracy. Active smoking may be significantly associated with an increased risk of COVID-19 progression towards severe disease. However, additional prospective studies are needed to clarify the complex relationship between smoking and COVID-19 severity.


Introduction
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a global public health emergency by the World Health Organization in January 2020. [1] Whole genome sequencing and pathogenic nucleic acid analysis confirmed that the virus was the seventh member of the coronavirus family that can infect humans. [2] As of March 2021, the number of persons with COVID-19 worldwide was approximately 115 million, with more than 2.5 million deaths. [3] COVID-19 has several clinical manifestations. The main presenting symptoms are fever and respiratory symptoms, such as cough, sputum production, and shortness of breath. Some patients present with extrapulmonary symptoms, such as diarrhoea, cardiac function injury, and liver and kidney function injury. Some patients with severe disease may develop acute respiratory distress syndrome or multiple organ dysfunction syndrome, and may even die. [4] SARS-CoV-2 is a β coronavirus, which is 88% identical to two bat-derived severe acute respiratory syndrome (SARS)-like coronaviruses, bat-SL-CoVZC45 and bat-SL-CoVZXC21, at the whole genome level. However, the S receptor-binding domain structure of SARS-CoV-2 is similar to that of SARS-CoV, despite the amino acid variation at some key residues. [5] The receptor of both SARS-CoV-2 and SARS-CoV is angiotensin-converting enzyme 2 (ACE2). [6] The viruses invade cells through endosomal or non-endosomal pathways with the help of different host cell proteases, which provides a structural basis for direct injury of extrapulmonary organs or tissues. Because of the cell specificity of ACE2 and host cell proteases, the affinity of SARS-CoV-2 for different organs varies. [7,8] Smoking seriously damages human health and is the main risk factor for respiratory and cardiovascular diseases. [9] ACE and ACE2 are important components of the renin-angiotensin system.
ACE is an enzyme that catalyses the conversion of angiotensin (Ang) I to Ang II, which exerts a strong vasoconstrictive effect. Ang II favours vasoconstriction, cellular proliferation, inflammation, and fibrosis when it binds to the Ang II type 1 receptor. Previous studies have suggested that long-term smoking changes renin-angiotensin system homeostasis by up-regulating the detrimental ACE/Ang II/Ang II type 1 receptor axis and down-regulating the compensatory ACE2/Ang-(1-7)/Mas receptor axis, thus favouring the development of cardiopulmonary diseases. [9,10] Thus, it would be expected that a large proportion of patients with COVID-19 are smokers. Chakladar et al found that smoking-mediated upregulation of the androgen pathway leads to increased SARS-CoV-2 susceptibility. [11] Umnuaypornlert et al conducted a meta-analysis and found that smoking, whether current or former, significantly increases the risk of COVID-19 severity and death. [12] In contrast, a systematic review and meta-analyses found an unexpectedly low prevalence of current smoking among hospitalised patients with COVID-19. [13,14] A low prevalence of current smokers among hospitalised patients with COVID-19 has also been reported in several individual studies. [15,16] The relationship between smoking and COVID-19 severity therefore remains unclear. To address this important clinical question, this study aimed to evaluate the effect of smoking status (current smoking and former smoking) on the clinical severity of COVID-19, which may provide additional evidence for active and effective interventions and treatment measures in the early stages of disease and for reducing the mortality rate of patients with severe disease.

Study design and participants
A total of 588 patients who were referred to 25 hospitals in Jiangsu province between January 10, 2020 and March 14, 2020 were enrolled consecutively, and their data were analysed retrospectively. According to the government's arrangements, all tertiary hospitals provided treatment for patients with COVID-19, which was diagnosed using the World Health Organization interim guidance [17,18] and the guidelines of COVID-19 diagnosis and treatment trial (5th edition) of the National Health Commission of the People's Republic of China. [18,19] This study was performed in accordance with the Helsinki Declaration and was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University. Written informed consent was obtained from participants or their families for data collection. Based on disease severity, the patients were divided into four groups: mild, moderate, severe, and critically ill. The criteria for this clinical classification can be found in previous studies. [20,21]

Data collection and study definitions
The variables of interest included (1) participants' general information: age, sex, smoking history, comorbidities (such as chronic obstructive pulmonary disease, hypertension, diabetes, cardiovascular disease, cerebrovascular disease, hepatitis B, cancer, chronic kidney disease, and immunodeficiency disease), therapeutic drugs, respiratory support, and disease outcome; (2) patients' main clinical symptoms and signs; (3) results of laboratory tests performed within 48 hours of admission to the hospital or intensive care unit: routine blood examination, levels of C-reactive protein (CRP), procalcitonin, lactate dehydrogenase, aspartate aminotransferase (AST), alanine aminotransferase, total bilirubin, creatine kinase, creatinine, and D-dimer, erythrocyte sedimentation rate (ESR), and finger pulse oxygen saturation (SaO2); and (4) imaging findings, such as chest computed tomography findings. Data were obtained from the electronic medical records and initially evaluated by trained physicians. Full recovery and discharge, disease regression from critical/severe to non-severe status, conversion of the polymerase chain reaction result from positive to negative, and maintenance of non-severe status were considered as disease improvement or a favourable clinical outcome.

Smoking history
Smoking was quantified as pack-years (number of packs of cigarettes smoked per day multiplied by the number of years of smoking). For example, smoking one pack a day for 10 years was defined as 10 pack-years. Any patient with a smoking quantity of at least 10 pack-years was considered as having a significant smoking history. Based on the patients' smoking history, the study population was divided into two groups: the ≥10 pack-years group (16 patients), which included current smokers with a smoking quantity of ≥10 pack-years, and the <10 pack-years group (572 patients), which included non-smokers, former smokers, and current smokers with a smoking quantity of <10 pack-years.
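The pack-year definition and the grouping rule above can be illustrated with a short sketch. It is written in Python for illustration only and is not the authors' code. The pack size of 20 cigarettes is an assumption (a common convention that the text does not state), and the function names are hypothetical.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float,
               cigarettes_per_pack: int = 20) -> float:
    """Pack-years = packs smoked per day multiplied by years of smoking.

    cigarettes_per_pack = 20 is an assumed convention, not stated in the text.
    """
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / cigarettes_per_pack * years_smoked


def smoking_group(cigarettes_per_day: float, years_smoked: float,
                  current_smoker: bool) -> str:
    """Assign a patient to one of the two study groups.

    Only current smokers with at least 10 pack-years enter the heavier group;
    non-smokers and former smokers are always placed in the other group.
    """
    heavy = current_smoker and pack_years(cigarettes_per_day, years_smoked) >= 10
    return ">=10 pack-years" if heavy else "<10 pack-years"


# One pack (20 cigarettes) a day for 10 years, as in the example in the text.
example = pack_years(20, 10)
```

Under these assumptions, a current smoker of half a pack a day for 10 years (5 pack-years) falls in the <10 pack-years group, as does a former smoker regardless of cumulative exposure.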

Statistical analysis
Continuous variables are expressed as medians and interquartile ranges or simple ranges, as appropriate. Categorical variables are summarised as counts and percentages. For continuous variables, the Student t test or Mann-Whitney U test was used, while the chi-square test or Fisher exact test was used for categorical variables. No imputation was made for missing data. All the statistics in this study are descriptive because the patient cohort was not derived from random selection. Odds ratios and 95% confidence intervals were calculated using univariate and multivariate logistic regression models. The randomForest package in R software was used to perform a random forest classification of the data. [22-24] To assess the relationship between independent and dependent variables, COVID-19 severity was considered as the dependent variable and clinical characteristics as the independent variables. The number of classification trees was set at 1000. For each tree, 588 patients were sampled randomly with replacement to construct the classification tree. At each split, the number of variables sampled randomly as candidates (mtry) was set as the square root of the total number of variables. The variables were assessed and ranked by Mean Decrease Accuracy (MDA), which measures the effect of perturbing a variable on classification accuracy, and by Mean Decrease Gini (MDG). Comorbidity-smoking behaviour interaction analysis was performed using a logistic regression model. All analyses were implemented with R 3.12 software (R Foundation for Statistical Computing, Beijing, China; http://www.Rproject.org). [24] Two-sided P-values <.05 were considered statistically significant.
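The two sampling steps in the random forest procedure described above (bootstrap sampling of 588 patients per tree and random selection of candidate variables at each split) can be sketched as follows. This is a simplified Python illustration, not the randomForest implementation used in the study. The variable names are hypothetical because the full predictor list is not given here, and the fixed seed is an assumption made only for reproducibility.

```python
import math
import random


def bootstrap_sample(n_patients: int, rng: random.Random) -> list[int]:
    """Draw n_patients row indices with replacement (one sample per tree)."""
    return [rng.randrange(n_patients) for _ in range(n_patients)]


def out_of_bag(n_patients: int, sample: list[int]) -> list[int]:
    """Rows never drawn into the bootstrap sample for this tree."""
    drawn = set(sample)
    return [i for i in range(n_patients) if i not in drawn]


def candidate_variables(variables: list[str], rng: random.Random) -> list[str]:
    """Pick floor(sqrt(p)) variables at random as candidates for one split."""
    mtry = max(1, math.floor(math.sqrt(len(variables))))
    return rng.sample(variables, mtry)


rng = random.Random(0)  # fixed seed: an assumption, for reproducibility only
sample = bootstrap_sample(588, rng)
oob = out_of_bag(588, sample)

# Hypothetical variable names; the actual predictor set is not listed in the text.
variables = ["age", "sex", "smoking_history", "lymphocyte_count",
             "shortness_of_breath", "sao2_below_95", "hypertension",
             "diabetes", "crp"]
candidates = candidate_variables(variables, rng)
```

With nine hypothetical variables, three candidates are drawn at each split. The patients left out of a given bootstrap sample are the out-of-bag cases, on which permutation-based accuracy measures are conventionally evaluated.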

Demographic and clinical characteristics
All 588 patients had positive reverse transcriptase polymerase chain reaction tests. By the end of the study, all the patients had recovered and were discharged. The severe group consisted of 46 patients with severe or critical disease, while the non-severe group consisted of 542 patients with mild or moderate disease. The demographic and clinical characteristics of the patients in the two groups are shown in Table 1. There was no significant difference in sex between the two groups. The median age of the patients was 46 (interquartile range, 33-56) years. The largest age category in the severe group was patients above 65 years (34.8%, 16/46), while most of the patients in the non-severe group were within the 15 to 49 years age range (58.9%, 319/542). On admission, cough (87.0%, 40/46) and fever (43.5%, 20/46) were the most common symptoms in the severe group, while cough (64.0%, 347/542) and fever (30.8%, 167/542) were the most common in the non-severe group. Hypertension and diabetes were the most common comorbidities in the overall patient cohort (16.3%, 96/588 and 7.8%, 46/588, respectively), in the severe group (34.8%, 16/46 and 26.1%, 12/46), and in the non-severe group (14.8%, 80/542 and 6.3%, 34/542).

Relationship between smoking and COVID-19 severity
In the univariate analysis, the proportion of patients with a current smoking status in the severe group (13.0%, 6/46) was significantly higher than that in the non-severe group (4.4%, 24/542). Current smoking status remained a risk factor for COVID-19 severity in the multivariate analysis (P < .05). The differences in the demographic and clinical characteristics between the ≥10 pack-years and <10 pack-years groups are shown in Table 3. There were significant differences in age, sex, diabetes, cerebrovascular disease, median haemoglobin level, and CD4+ T cell counts between the two groups.
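As a worked illustration of the univariate comparison, the counts quoted above (6/46 current smokers in the severe group and 24/542 in the non-severe group) can be turned into a crude odds ratio with a Woolf-type (log-based) 95% confidence interval. This Python sketch is not the authors' analysis: the non-current-smoker cells are obtained by subtraction, the interval method is an assumption, and the result is unadjusted, so it should not be read as the estimate from the multivariate model.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio and log-based confidence interval for a 2x2 table.

    a: exposed cases,      b: unexposed cases,
    c: exposed non-cases,  d: unexposed non-cases.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the text; the remaining cells follow by subtraction.
severe_smokers, severe_total = 6, 46
nonsevere_smokers, nonsevere_total = 24, 542

crude_or, ci_low, ci_high = odds_ratio_ci(
    severe_smokers,
    severe_total - severe_smokers,
    nonsevere_smokers,
    nonsevere_total - nonsevere_smokers,
)
print(f"crude OR = {crude_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The crude odds ratio from these counts is about 3.2, with an interval that excludes 1, which is in line with the direction of the univariate finding reported above.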
Epidemiological studies describe an interaction among factors when the combined effect of two or more risk factors on a disease differs from the sum of their independent effects. Biological interaction has attracted much attention due to its practicality in biological and clinical settings, and its assessment relies heavily on statistical interactions. [22] The evaluation of statistical interactions is mainly based on multiplicative and additive interactions, with additive interaction having a greater biological and public health significance. [23] Through the interaction analysis, we found that only the number of comorbidities had an interaction with smoking (P = .043), with an interaction estimate (odds ratio) of 0.58 (95% confidence interval: 0.31-0.95) (Table 4). There was no significant interaction between smoking and chronic obstructive pulmonary disease, hypertension, diabetes, cardiovascular disease, cerebrovascular disease, hepatitis B, cancer, chronic kidney disease, or tuberculosis (P > .05).
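To make the notion of additive interaction concrete, the sketch below computes three commonly used summary measures from odds ratios: the relative excess risk due to interaction (RERI), the attributable proportion (AP), and the synergy index (S). The text does not state which measure was used in this study, so the choice of these three is an assumption, and the odds ratios in the example are hypothetical values, not study estimates.

```python
def additive_interaction(or_both: float, or_a_only: float, or_b_only: float):
    """Return (RERI, AP, S) for two binary exposures A and B.

    All odds ratios are relative to the group exposed to neither factor.
    With no additive interaction, RERI = 0, AP = 0, and S = 1.
    """
    reri = or_both - or_a_only - or_b_only + 1.0
    ap = reri / or_both
    denominator = (or_a_only - 1.0) + (or_b_only - 1.0)
    # S is undefined when neither exposure alone changes the odds.
    s = (or_both - 1.0) / denominator if denominator != 0 else float("nan")
    return reri, ap, s


# Hypothetical odds ratios, NOT estimates from this study.
reri, ap, s = additive_interaction(or_both=6.0, or_a_only=2.0, or_b_only=3.0)
```

In this hypothetical case the joint odds ratio (6.0) exceeds what the two separate effects would add up to (2.0 + 3.0 - 1 = 4.0), giving a positive RERI, which is the pattern described as a departure from additivity.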
The random forest algorithm can be used to analyse nonlinear, collinear, and interactive data effectively. As one of the classical algorithms of machine learning, the random forest algorithm has a high accuracy in risk prediction and diagnosis of diseases. It is currently widely used in molecular and genetic fields and other medical fields. [24,25] In MDA, the greater the decrease in accuracy after permutation of a variable, the more important the predictor is. MDG is the sum of all decreases in Gini impurity attributable to a variable. MDG values show the importance of the variable in COVID-19 severity prediction. The top three predictors in the MDA analysis were lymphocyte count, shortness of breath, and age. The top three variables in the MDG analysis were lymphocyte count, age, and SaO2 <95%. Smoking history was ranked sixth in the MDA analysis and twenty-second in the MDG analysis (Fig. 1).
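The two importance measures can be illustrated on toy data. The Python sketch below is a simplification: it shows the Gini decrease for a single split and the accuracy drop after permuting one column for a single fixed predictor, whereas a random forest accumulates or averages these quantities over many trees. The toy rows, labels, and the `predict` stand-in are hypothetical and unrelated to the study data.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_accuracy_drop(predict, rows, labels, column, rng):
    """Accuracy before minus accuracy after shuffling the values of one column."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = [row[:column] + [value] + row[column + 1:]
                     for row, value in zip(rows, shuffled)]
    return baseline - accuracy(predict, permuted_rows, labels)


# Toy data (hypothetical): column 0 determines the label, column 1 is noise.
rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
labels = [1, 0, 1, 0]
predict = lambda row: row[0]  # stand-in for a fitted classifier

perfect_split = gini_decrease(labels, [1, 1], [0, 0])  # separates the classes
useless_split = gini_decrease(labels, [1, 0], [1, 0])  # leaves them mixed
noise_drop = permutation_accuracy_drop(predict, rows, labels, 1, random.Random(0))
signal_drop = permutation_accuracy_drop(predict, rows, labels, 0, random.Random(0))
```

A split that separates the classes removes all of the parent impurity, while a split that leaves them equally mixed removes none. Likewise, permuting the noise column cannot change the accuracy of this predictor, whereas permuting the informative column can only lower it or leave it unchanged.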

Discussion
In the univariate analysis in this study, the proportion of patients with a current smoking status in the severe group was significantly higher than that in the non-severe group. Current smoking status remained a risk factor for severe COVID-19 in the multivariate analysis. The interaction analysis showed a strong interaction between the number of comorbidities in patients with COVID-19 and smoking. However, there was no significant interaction between smoking and specific comorbidities, such as hypertension and diabetes. In the random forest model, smoking history was ranked sixth in MDA.
The tobacco industry has created millions of jobs around the world and provided a large amount of tax revenue to governments. At the same time, about 50% of tobacco consumers die from smoking-related causes, which places a heavy burden on healthcare systems. During the COVID-19 pandemic, the link between smoking and the risk of acute respiratory infections has again attracted a lot of interest. [26,27] Smith et al demonstrated that ACE2 is expressed in a subset of secretory cells in the respiratory tract. Chronic smoke exposure triggers an increase in the population of these cells and a concomitant increase in ACE2 expression. ACE2 expression is responsive to inflammatory signalling and can be upregulated by viral infections or interferon treatment. They suggested that SARS-CoV-2 infections could create positive feedback loops that increase ACE2 levels and facilitate viral dissemination. [28] These mechanisms may partially explain why smokers are particularly susceptible to severe SARS-CoV-2 infections. Findings from Leung et al suggested that quitting smoking can reduce the probability of COVID-19 progression to severe disease. In that study, patients who were smokers or who had chronic obstructive pulmonary disease had higher ACE2 levels, increasing the probability of viral entry into the host cells and infection. They also found that ex-smokers and never-smokers had similar ACE2 levels. These findings support the view that quitting smoking immediately is optimal. [29] Arunima also found that smoking causes more severe SARS-CoV-2 infection, at least in part, by blocking the activity of interferons, the messenger proteins of the immune system. Interferons play a crucial role in the body's early immune response.
A French clinical observational study reported that current smokers had a lower susceptibility to SARS-CoV-2 infection, although the disease was severe once they were infected. [31] Some meta-analyses also reported a low prevalence of current smoking among hospitalised patients with COVID-19. [14] The mechanism of action of nicotine may explain the paradox of the relationship between smoking and COVID-19. When COVID-19 becomes severe, excessive lung inflammation may occur due to a virus-activated "cytokine storm." A cholinergic anti-inflammatory pathway that modulates the inflammatory response during systemic inflammation has been demonstrated. In addition, the α7-nicotinic acetylcholine receptor is essential in attenuating the inflammatory response. [32] Nicotine is the main active substance in tobacco. It has been reported that nicotine, an agonist of this receptor, plays an anti-inflammatory role in mice with acute lung injury. [33] Some studies suggested that nicotine may represent a potential therapeutic agent for the improvement of cytokine storms and attenuation of dysregulated inflammatory responses in patients with COVID-19. [32,34] Perhaps this complex biological mechanism of nicotine can explain the different relationships between smoking and COVID-19 severity observed in epidemiological studies.
It is believed that older patients with chronic diseases, such as diabetes, cardiovascular diseases, and hypertension, are susceptible to respiratory failure and may have a poorer outcome. [15,35] In our study, in the univariate analysis, the proportion of patients with hypertension, diabetes, cerebrovascular disease, or cancer in the severe group was significantly higher than that in the non-severe group. Hypertension, diabetes, cerebrovascular disease, and cancer remained risk factors for severe COVID-19 in the multivariate analysis. Interestingly, there were statistically significant differences in diabetes and cerebrovascular disease between the patients who were current smokers with a smoking quantity of ≥10 pack-years and the other patients. The mechanisms linking diabetes, cerebrovascular disease, and smoking need further research.
This study has several limitations that must be acknowledged. First, due to the study's retrospective nature and the limited number of patients, our conclusions need to be further verified in prospective studies with large sample sizes. Second, data on prognosis were unavailable at the time of the analysis, and a longer follow-up time would have provided more detailed information on the potential risk factors that could interfere with clinical outcomes. Third, the study focused on patients with obvious clinical symptoms who went to the hospital for treatment; thus, asymptomatic patients who might have been super-spreaders or patients with mild symptoms may have been missed. Fourth, smoking is a behavioural factor, and COVID-19 has infected people throughout the world irrespective of their habits. Our retrospective analysis can only suggest an association between smoking status and the risk of severe COVID-19, and not a cause-and-effect relation. [36] Considering the descriptive nature of the current reports, the absence of a control group, and other factors, our findings should be considered hypothesis-generating and indicate the need for further studies. [37] Fifth, patients with lung cancer are more susceptible to SARS-CoV-2 infection and more likely to develop severe COVID-19 once infected. [38,39] Further studies enrolling more patients with COVID-19 complicated by lung cancer may therefore be required to investigate the relationship between lung cancer, smoking, and COVID-19.

Conclusions
In conclusion, the results of this multicentre retrospective observational study of Chinese patients suggest that active smoking may have a significant association with a greater risk of COVID-19 progressing towards severe disease. Physicians and public health professionals should urgently take effective preventive measures.

Figure 1. The ranking chart of the two accuracy indicators. Variable selection by a random forest using mean decreases in accuracy and the Gini index, according to which the importance score of each variable was calculated.