Spirometric classifications of COPD severity as predictive markers for clinical outcomes: the HUNT Study

ABSTRACT Rationale GOLD grades based on percent-predicted FEV1 (ppFEV1) poorly predict mortality. Studies have recommended alternative expressions of FEV1 for the classification of COPD severity, and these warrant investigation. Objective To compare the abilities of ppFEV1 (ppFEV1 quartiles, GOLD grades, ATS/ERS grades), FEV1 z-score (FEV1 z-score quartiles, FEV1 z-score grades), FEV1.Ht-2 (FEV1.Ht-2 quartiles, FEV1.Ht-2 grades), FEV1.Ht-3 (FEV1.Ht-3 quartiles), and FEV1Q (FEV1Q quartiles) to predict clinical outcomes. Methods People aged ≥40 years with COPD (n=890) who participated in the HUNT Study (1995-1997) were followed for 5 years (short-term) and up to 20.4 years (long-term). Survival analysis and time-dependent area under the curve (AUC) were used to compare the predictive abilities. A regression tree approach was applied to obtain optimal cut-offs of different expressions of FEV1. The UK Biobank (n=6495) was used as a replication cohort with a 5-year follow-up. Measurements and Main Results As a continuous variable, FEV1Q had the highest AUCs for all-cause mortality (short-term 70.2, long-term 68.3), respiratory mortality (short-term 68.4, long-term 67.7), cardiovascular mortality (short-term 63.1, long-term 62.3), COPD hospitalization (short-term 71.3, long-term 70.9), and pneumonia hospitalization (short-term 67.8, long-term 66.6), followed by FEV1.Ht-2 or FEV1.Ht-3. Generally, similar results were observed for FEV1Q quartiles. The optimal cut-offs of FEV1Q had higher AUCs than GOLD grades for predicting short-term and long-term clinical outcomes. Similar results were found in UK Biobank. Conclusions FEV1Q best predicted the clinical outcomes and could improve the classification of COPD severity.

FEV1 z-score has been recommended as it is not susceptible to variation in people's age, sex, height, and race (3,4,13). Other expressions of FEV1 that have been recommended include FEV1 standardized by squared height (FEV1.Ht-2) (6, 12) and cubic height (FEV1.Ht-3) (9,11,14), which account for size differences and indirectly for some sex differences. Miller et al. (9) recommended the FEV1 quotient (FEV1Q), where FEV1 is standardized by the sex-specific lowest percentile of the FEV1 distribution, which takes account of sex and some size differences in lung function. Some studies have investigated these FEV1 expressions and/or their respective classifications of COPD severity (5-7, 9, 10, 14-17). Among them, only Huang et al. (5) and Hegendorfer et al. (15) have compared the FEV1 expressions mentioned above to predict mortality and exacerbation (5) or all-cause hospitalizations (15). However, Huang et al. (5) had a relatively small sample of people with COPD (n=296), and the study by Hegendorfer et al. (15) included 501 people, among whom only 70 had asthma/COPD. No study has compared the predictive abilities of a broad range of FEV1 expressions for multiple clinical outcomes.
We aimed to compare the predictive abilities of ppFEV1, FEV1 z-score, FEV1.Ht-2, FEV1.Ht-3, and FEV1Q and their respective methods of classification of COPD severity for clinical outcomes in a Norwegian COPD cohort followed for 5 years (short-term) and up to 20.4 years (long-term).
The clinical outcomes were all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization.

medRxiv preprint doi: https://doi.org/10.1101/2020.11.03.20221432; this version posted November 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Study population
Trøndelag is a county in central Norway with a homogeneous and stable population. The HUNT Study invited the entire adult population (≥20 years) of northern Trøndelag to attend clinical examinations and answer questionnaires. This study included people aged ≥40 years who participated in HUNT2 (n=44,384, 75.2% participation). A 5% random sample (n=2300) and people reporting asthma-related symptoms, diagnosis, or use of medication (n=7123) were invited to perform spirometry (19). Participants from rural municipalities and participants from urban municipalities with airflow limitation (pre-BD FEV1/forced vital capacity (FVC)<0.75 or ppFEV1<80 using the European Coal and Steel Community (ECSC) equations (20)) were invited to attend post-BD spirometry (n=5678). These airflow limitation criteria were used to allow for future changes in the diagnostic and severity classification of COPD. Among those performing post-BD spirometry (n=4178, 73.6% of invited), 3840 (91.9%) had acceptable manoeuvres. There were 1350 people with COPD when we defined COPD as post-BD FEV1/FVC<0.70 (fixed-ratio criteria) together with respiratory symptoms (daily cough in periods, cough with phlegm, wheezing, or dyspnoea) and/or self-reported doctor-diagnosed COPD (1). There were 894 people with COPD when we applied the lower limit of normal (LLN) criteria, i.e. post-BD FEV1/FVC z-score < -1.645 (13). For the analysis, we included the 890 people with COPD who met both the fixed-ratio and LLN criteria so that the same sample was used in all analyses regardless of COPD classification (Supplementary Figure E1).
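The inclusion rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the `Participant` fields and function names are hypothetical, and it assumes that the symptom/diagnosis requirement is carried by the fixed-ratio definition, as the text states it there.

```python
from dataclasses import dataclass

FIXED_RATIO_CUTOFF = 0.70  # post-BD FEV1/FVC threshold stated in the text
LLN_Z_CUTOFF = -1.645      # lower limit of normal on the z-score scale


@dataclass
class Participant:  # hypothetical record layout, for illustration only
    fev1_fvc: float          # post-bronchodilator FEV1/FVC ratio
    fev1_fvc_z: float        # post-bronchodilator FEV1/FVC z-score
    has_symptoms: bool       # any of the listed respiratory symptoms
    doctor_diagnosed: bool   # self-reported doctor-diagnosed COPD


def meets_fixed_ratio(p: Participant) -> bool:
    """Fixed-ratio COPD: ratio below 0.70 plus symptoms and/or a diagnosis."""
    return p.fev1_fvc < FIXED_RATIO_CUTOFF and (p.has_symptoms or p.doctor_diagnosed)


def meets_lln(p: Participant) -> bool:
    """LLN COPD: FEV1/FVC z-score below -1.645."""
    return p.fev1_fvc_z < LLN_Z_CUTOFF


def in_analysis_sample(p: Participant) -> bool:
    """Analysis sample: participants meeting both definitions."""
    return meets_fixed_ratio(p) and meets_lln(p)
```

Requiring both criteria yields a single sample that is valid under either definition, which is why the analysis sample (n=890) is slightly smaller than the LLN group (n=894).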

Spirometry and lung function classification
Post-BD spirometry was performed 30 minutes after inhalation of 1 mg terbutaline according to the 1994 ATS-guidelines (21,22). Quality assurance of spirometric measurements is described in detail elsewhere (22,23).
We categorized ppFEV1 according to GOLD to define GOLD grades (1) and according to ATS/ERS to define ATS/ERS grades (2) for the classification of COPD severity. We defined FEV1 z-score grades according to the recommendation of Quanjer et al. (4). FEV1.Ht-2 grades were defined as suggested by Miller et al. (6). For FEV1.Ht-3 and FEV1Q, no widely accepted cut-points for the classification of COPD severity have been recommended (5-7, 9, 10, 14, 15). For comparison, we generated quartiles of the ppFEV1, FEV1 z-score, FEV1.Ht-2, FEV1.Ht-3, and FEV1Q distributions in our study cohort for the classification of COPD severity.
The different expressions of FEV1 and their respective methods of classification of COPD severity are presented in Supplementary Table E1.
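As a rough illustration of how the reference-independent expressions differ from ppFEV1, the sketch below computes each from raw measurements. The sex-specific denominators for FEV1Q (0.5 L for men, 0.4 L for women) and the GOLD thresholds (80, 50, 30) are assumptions taken from commonly cited definitions, not from this text; the definitions actually used are those in Supplementary Table E1. All function names are hypothetical.

```python
# Assumed sex-specific "bottom line" FEV1 values in litres (not stated in this text).
FEV1Q_DENOMINATOR_L = {"male": 0.5, "female": 0.4}


def pp_fev1(fev1_l: float, predicted_fev1_l: float) -> float:
    """Percent-predicted FEV1; depends on the chosen reference equation."""
    return 100.0 * fev1_l / predicted_fev1_l


def fev1_ht2(fev1_l: float, height_m: float) -> float:
    """FEV1 standardized by squared height (L/m^2)."""
    return fev1_l / height_m ** 2


def fev1_ht3(fev1_l: float, height_m: float) -> float:
    """FEV1 standardized by cubic height (L/m^3)."""
    return fev1_l / height_m ** 3


def fev1q(fev1_l: float, sex: str) -> float:
    """FEV1 quotient: FEV1 divided by the sex-specific lowest percentile."""
    return fev1_l / FEV1Q_DENOMINATOR_L[sex]


def gold_grade(ppfev1: float) -> int:
    """GOLD grade 1 (mild) to 4 (very severe) from ppFEV1 (assumed thresholds)."""
    if ppfev1 >= 80:
        return 1
    if ppfev1 >= 50:
        return 2
    if ppfev1 >= 30:
        return 3
    return 4
```

Note that only `pp_fev1` needs a predicted value; the other three expressions require nothing beyond measured FEV1, height, and sex.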

Clinical examination and questionnaires
From clinical examinations and questionnaires, information on age, sex, body mass index (BMI, kg/m2), smoking status, smoking pack-years, physical activity, education, diabetes ever, asthma ever, cardiovascular disease, systolic blood pressure (mmHg), and non-fasting total serum cholesterol (mmol/L) was recorded.
Age in years was recorded to one decimal place. Height and weight were measured with light clothing and without shoes. Height was rounded to the nearest centimetre and weight to the nearest half kilogram (19,24). Cardiovascular disease included self-reported angina pectoris, myocardial infarction, and stroke. From three measurements of systolic blood pressure, the mean of the last two measurements was used (24).

Follow-up and outcomes
The study outcomes were all-cause mortality, respiratory mortality, cardiovascular mortality, the first unplanned COPD hospitalization, and the first unplanned pneumonia hospitalization. Follow-up for all outcomes began at the date of participation in HUNT2 and ended at the date of the outcome or at right-censoring, whichever came first. Right-censoring events were emigration (n=3) or end of follow-up. For the short-term follow-up, participants were followed for 5 years; for the long-term follow-up, participants were followed until 31 December 2015. There was no other loss to follow-up. Cause-specific mortality and hospitalizations were identified from International Statistical Classification of Diseases and Related Health Problems (ICD) codes in medical records and are presented in Supplementary Table E2 (25). Dates of death and of hospitalization were obtained from the Norwegian Cause of Death Registry and the Nord-Trøndelag Hospital Trust, respectively.
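The follow-up rules above translate into a small time-to-event calculation. The sketch below is illustrative only: the function names are hypothetical, a year is approximated as 365.25 days, and the incidence rate is expressed per 1000 person-years as an assumed unit.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25               # approximation used in this sketch
LONG_TERM_END = date(2015, 12, 31)   # administrative end of follow-up from the text


def follow_up(entry, outcome_date=None, emigration_date=None, horizon_years=None):
    """Return (years of follow-up, outcome observed?) for one participant.

    horizon_years=5 gives the short-term follow-up; None gives the long-term one.
    """
    end = LONG_TERM_END
    if horizon_years is not None:
        end = min(end, entry + timedelta(days=round(horizon_years * DAYS_PER_YEAR)))
    if emigration_date is not None:
        end = min(end, emigration_date)  # emigration right-censors the participant
    if outcome_date is not None and outcome_date <= end:
        return (outcome_date - entry).days / DAYS_PER_YEAR, True
    return (end - entry).days / DAYS_PER_YEAR, False


def incidence_rate_per_1000(records):
    """Events per 1000 person-years from (years, event) pairs."""
    events = sum(1 for _, event in records if event)
    person_years = sum(years for years, _ in records)
    return 1000.0 * events / person_years
```

For example, a participant who entered on 1 January 1996 and died in June 2003 counts as an event in the long-term analysis but is censored at 5 years in the short-term analysis.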

Statistical analysis
Incidence rates of all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization were calculated. Cumulative incidence curves for all-cause mortality were constructed from Kaplan-Meier estimates, and log-rank tests were used to test differences. For cause-specific mortality and hospitalization, cumulative incidence curves were constructed using the methods of Fine and Gray (26) to account for competing events, and Gray's tests (26) were used to test differences between the curves. For respiratory mortality, deaths from other causes were considered competing events. Likewise, for cardiovascular mortality, deaths from other causes were considered competing events. For hospitalization (COPD hospitalization or pneumonia hospitalization), all-cause mortality was considered a competing event. The classifications of COPD severity were used as continuous variables to test for trends with smoking status, the major risk factor for COPD (1).
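To make the role of competing events concrete, the sketch below implements the standard nonparametric cumulative incidence estimator (often called the Aalen-Johansen estimator). It is a swapped-in illustration, not the authors' implementation: it does not perform Fine and Gray regression or Gray's test, and the coding of causes (0 for censored, positive integers for event types) is an assumption of this example.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Cumulative incidence of one cause in the presence of competing events.

    times  : follow-up time for each participant
    causes : 0 if censored, otherwise an integer code for the event type
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = d_any = n_leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            n_leaving += 1
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_interest += 1
            i += 1
        if d_any:
            # Only those still event-free can experience the cause of interest.
            cif += event_free * d_interest / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_leaving
    return curve
```

With no competing events, this estimator reduces to one minus the Kaplan-Meier estimate; treating competing deaths as censored instead would overestimate the cumulative incidence.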
Proportional hazards assumptions were tested using log-log survival curves and Schoenfeld residual tests (27). Multicollinearity was assessed using the variance inflation factor (VIF), which was less than 1.2 in all models (28,29).
A regression tree method (30) that accounts for time and multiple outcomes was applied to obtain optimal cut-offs of FEV1 expressions. We defined models for each FEV1 expression in predicting multiple outcomes (respiratory mortality, cardiovascular mortality, other cause-related mortality, and COPD hospitalization) over the follow-up time.
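For readers unfamiliar with the VIF, the sketch below computes it for a set of continuous covariates using the identity that the VIFs are the diagonal elements of the inverse correlation matrix. This is a minimal stdlib-only illustration with hypothetical function names; it does not reproduce how categorical covariates were handled in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF for each covariate: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]
```

A VIF of 1 means a covariate is uncorrelated with the others, so values below 1.2 indicate very little collinearity.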
The clinical outcomes are time-dependent; for example, a person who is healthy at baseline may develop disease during follow-up. Hence, we applied incident/dynamic time-dependent areas under the receiver operating characteristic curve (AUCs), which account for time, to compare the abilities of FEV1 expressions and their respective methods of classification of COPD severity to predict all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization (31-34). For cause-specific mortality and hospitalization, AUCs accounting for competing risks were calculated (33). We used crude models to compare AUCs because the clinical decision does not explicitly take other factors into account (9). We used 10,000 bootstrap iterations to calculate 95% CIs for AUCs (35). A general bootstrap algorithm (gBA) (36) was applied to compare the AUCs.
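The bootstrap step can be illustrated with a deliberately simplified sketch. It computes an ordinary AUC for a binary outcome at a fixed horizon and a percentile bootstrap interval; it is not the incident/dynamic time-dependent AUC, the competing-risks AUC, or the gBA comparison used in the study. Function names and defaults are hypothetical. Because lower lung function means higher risk, a caller would pass the negated FEV1 expression as the score.

```python
import random


def auc(scores, labels):
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for s, l in zip(scores, labels) if l]
    controls = [s for s, l in zip(scores, labels) if not l]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    if all(labels) or not any(labels):
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        resampled_labels = [labels[i] for i in idx]
        if all(resampled_labels) or not any(resampled_labels):
            continue  # AUC is undefined without both classes; draw again
        estimates.append(auc([scores[i] for i in idx], resampled_labels))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

The study used 10,000 iterations; the smaller default here only keeps the example fast.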

Replication cohort
We used the UK Biobank as a replication cohort. Here, 6495 people with COPD were followed for 5 years to investigate the predictive abilities of FEV1 expressions and their respective methods of classification of COPD severity to predict all-cause, respiratory, and cardiovascular mortality. The optimal cut-offs of FEV1Q generated in HUNT were tested in UK Biobank. The details of the COPD cohort from UK Biobank are presented in the Supplementary Text E1.

Ethics
Ethical approval was obtained from the Regional Committees for Medical and Health Research Ethics (2015/1461/REK midt). All participants gave informed written consent.

RESULTS
This population-based COPD cohort (n=890) was followed up for 5 years (short-term) and up to 20.4 years (long-term). During the long-term follow-up period, 615, 195, and 184 participants died from all causes, respiratory diseases, and cardiovascular diseases, respectively, and 428 and 311 were hospitalized due to COPD and pneumonia, respectively. At baseline, the average age of participants was 63.8 years, six out of ten participants were men, and more than half were current smokers (Table 1, Supplementary Table E3). A trend for increasing mean smoking pack-years (except for FEV1.Ht-2 grades, p value=0.062) and increasing cumulative incidence of all-cause mortality, respiratory mortality, cardiovascular mortality (only for FEV1.Ht-2 quartiles, FEV1.Ht-3 quartiles, and FEV1Q quartiles), COPD hospitalization, and pneumonia hospitalization was observed with worsening categories of classifications of COPD severity (Table 1, Figure 1). Similar results were observed in cumulative incidence curves ( In long-term follow-up, the HRs (95% CI) for all-cause mortality for the lowest compared to the highest grade or quartile were 3.97 (3.11-5.
When using FEV1 expressions as classifications of COPD severity, the short-term and long-term AUCs for COPD classifications based on reference-independent lung function (FEV1.Ht-2 quartiles, FEV1.Ht-3 quartiles, and FEV1Q quartiles) were higher than those for COPD classifications based on reference-dependent lung function (ppFEV1 quartiles, GOLD grades, ATS/ERS grades, FEV1 z-score quartiles, and FEV1 z-score grades) in predicting all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization (Figure 4, Figure 5). However, the AUC for FEV1.Ht-2 grades, which are based on reference-independent lung function (FEV1.Ht-2), was generally lower than that for GOLD grades, which are based on reference-dependent lung function (ppFEV1) (Figure 4, Figure 5).
Additionally, we identified optimal cut-offs of the reference-independent FEV1 expressions through a regression tree approach (Supplementary Figure E7). The short-term and long-term AUCs for the optimal cut-offs of FEV1Q (2.8, 4.1, 5.2), termed FEV1Q grades (Figure 6), were higher than those for the optimal cut-offs of FEV1.Ht-2 and FEV1.Ht-3 in predicting the clinical outcomes studied (data not shown). The FEV1Q grades captured more very severe cases than GOLD grades (Figure 1, Figure 6). Generally, in long-term follow-up the HRs for the lowest grade of FEV1Q were higher than those for the lowest grade of GOLD in predicting the clinical outcomes studied (Figure 3, Figure 6). The short-term and long-term AUCs of FEV1Q grades were higher than those of the GOLD grades (p value <0.001) (Figure 4, Figure 5, Figure 6).
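Applying the reported cut-offs is a simple lookup, sketched below. The cut-off values come from the text; the grade numbering (1 for the highest FEV1Q, 4 for the lowest) and the treatment of values exactly on a boundary are assumptions of this example, since the text does not state them.

```python
from bisect import bisect_right

FEV1Q_CUTOFFS = (2.8, 4.1, 5.2)  # optimal cut-offs reported in the text


def fev1q_grade(fev1q: float) -> int:
    """Map FEV1Q to a grade from 1 (least severe) to 4 (most severe).

    Assumed convention: a value equal to a cut-off falls in the milder grade.
    """
    return len(FEV1Q_CUTOFFS) + 1 - bisect_right(FEV1Q_CUTOFFS, fev1q)
```

Under these assumptions, an FEV1Q of 3.0 maps to grade 3 and an FEV1Q of 2.0 to grade 4.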

Replication cohort
Similar results were found in a COPD cohort from UK Biobank followed for 5 years (Supplementary Figure E8 and Figure E9). The short-term AUC for FEV1Q as a continuous variable or FEV1Q grades as classification of COPD severity was the highest for all-cause mortality, respiratory mortality, and cardiovascular mortality (Supplementary Figure E9).

DISCUSSION
In this population-based study, we found that among all FEV1 expressions, FEV1Q was the best predictor of clinical outcomes (all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization), followed by FEV1.Ht-2 or FEV1.Ht-3, across 5 years and up to 20.4 years of follow-up. The optimal cut-offs of FEV1Q (FEV1Q grades) had significantly higher short-term and long-term predictive abilities than GOLD grades for these outcomes.
Others have observed similar results for all-cause mortality (5-7, 9, 10, 14, 15), exacerbation (5), and all-cause hospitalization (14,15). We observed that, compared to the highest quartile, the risk of all-cause mortality for the lowest quartile of FEV1Q (adjusted HR 3.18) was higher than for the lowest quartile of ppFEV1 (adjusted HR 2.65) and other FEV1 expressions. In our study, we found that FEV1Q, which is independent of reference values, was the best predictor for the clinical outcomes studied. Participants with COPD in this population-based cohort (HUNT Study) and the replication cohort (UK Biobank) had mean ages of 63.8 and 57.5 years, respectively, and included 18 (2.0%) and 108 (1.7%) very severe cases (GOLD grade 4), respectively, which might best represent a primary health care setting. Our findings correspond to those of Hegendorfer et al. (15) for predicting mortality, all-cause hospitalization, and mental and physical decline. However, they studied people aged ≥80 years with only 14% of participants having asthma and/or COPD. Other studies have investigated the prediction performance of categorical FEV1 expressions (5, 7). Huang et al. (5) studied a hospital-based cohort and found that FEV1Q as quartiles was the best predictor of mortality and exacerbation in a limited sample of 296 people with COPD. Pedone et al. (7) studied a hospital-based cohort using the concordance index and observed results similar to ours for mortality when FEV1 expressions were categorized into quintiles among a limited sample of 318 people with COPD aged ≥65 years.
FEV1 is a continuous variable, and expressions of FEV1 are used to indicate lung function impairment in respiratory medicine; ppFEV1 is most commonly used for this purpose (1). Furthermore, the classification of COPD severity based on an FEV1 expression has been used clinically to guide therapy and predict the outcomes of COPD patients (1). The GOLD grades, based on ppFEV1, have been widely used for clinical purposes in classifying COPD severity (1). However, they have been criticized for their susceptibility to physiological variation and poor predictive ability (3-6). The FEV1 z-score avoids this bias due to physiological variation (3, 4). Vaz Fragoso et al. (3) used the reference equation from NHANES III (37) and found that severe COPD based on FEV1 z-score was associated with a high risk of respiratory symptoms and death. Tejero et al. (16) found that FEV1 z-score predicted mortality worse than ppFEV1 when the predicted values were calculated from the GLI reference equation (13). The ppFEV1 and FEV1 z-score are based on reference values and depend on the choice of reference equation (13,22,38,39). Furthermore, the performance of the methods of classification of COPD severity based on these FEV1 expressions might vary with reference values. Miller et al. (6,9,10) found that FEV1 expressions such as FEV1.Ht-2, FEV1.Ht-3, and FEV1Q, which are independent of reference equations, were better correlated with mortality than those that depend on reference equations. Additionally, Miller et al. (9) found that FEV1Q predicted mortality better than other FEV1 expressions. Extending this knowledge, our study investigated short-term and long-term predictive ability for several clinical outcomes, supporting FEV1Q as a stronger predictor than other FEV1 expressions. This indicates that severity in people with COPD appears to be better related to how far a person's FEV1 is from the "bottom line" than to how far it is from a "predicted value".
The predictive ability of a classification of COPD severity based on an FEV1 expression largely depends on the choice of cut-offs. For example, the GOLD grades, ATS/ERS grades, and ppFEV1 quartiles had different predictive abilities in our study even though all are derived from ppFEV1. Huang et al. (5) observed similar results. Therefore, the optimal cut-offs of FEV1 expressions for classification of COPD severity were investigated in this study, and we found that the cut-offs for FEV1Q (2.8, 4.1, 5.2; FEV1Q grades) were generally best in predicting short-term and long-term clinical outcomes. The optimal cut-offs should be further investigated in a large multiethnic population with a wide age range. In a clinical setting, information such as the age, sex, and height of COPD patients is easily available. Therefore, using FEV1Q (or other expressions of FEV1 that are independent of reference equations) for risk classification of COPD patients might be easy to apply and would avoid variation due to dependence on reference equations (9). Furthermore, multidimensional prognostic indices that combine reference-independent FEV1 expressions with symptoms, exacerbations, risk factors, and/or biomarkers should be investigated further.
This study had several strengths. To our knowledge, this is the first study that has extensively studied the short-term and long-term predictive abilities of a range of FEV1 expressions both as categories or quartiles and as continuous measures to predict several clinical outcomes. We had complete information on mortality and there was no loss to follow-up other than very few emigrations (3 out of 890 participants). To reduce measurement error, quality assurance of spirometry curves was performed (22,23). Notably, we have replicated our results in a large COPD cohort from UK Biobank.
This study also had certain limitations. We had information on COPD hospitalizations only from the hospitals in the study area (northern Trøndelag) and lacked data from other hospitals in Norway. Additionally, information on some covariates was missing; to avoid sample loss in adjusted models, a missing indicator variable (missing information coded as an 'unknown' category) was used, which might bias the associations between the FEV1 expressions and the clinical outcomes studied. Our methods may not capture non-linear associations between FEV1 expressions and mortality (16) or hospitalization, and further studies investigating these approaches are needed.
In summary, FEV1Q was the best predictor for clinical outcomes such as all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization compared to a broad range of commonly applied FEV1 expressions. The findings highlight improved prediction of outcomes by use of FEV1Q for expressing spirometric lung function impairment and the classification of COPD severity.

Authors' contributions
LB, LL, AL and BMB conceived and designed the study. LB analysed the data. LB wrote the first draft of the manuscript. All authors interpreted the results, revised and approved the manuscript for submission. LB and BMB are accountable for the accuracy and integrity of all parts of the work. As project leader for the HUNT2 Lung Study, AL was responsible for planning, data collection and quality assurance of data in the Lung Study.

and the University of Bristol (MC_UU_12013/1). David Carslake works in a unit funded by the UK Medical Research Council (MC_UU_00011/1) and the University of Bristol.

Competing interests
None declared.

Acknowledgements
The Nord-Trøndelag Health Study (HUNT) is a collaboration between HUNT Research Centre (Faculty of Medicine and Health Science, Norwegian University of Science and Technology NTNU), Nord-Trøndelag County Council and the Norwegian Institute of Public Health. The HUNT2 Lung Study was partly funded through a non-demanding grant from AstraZeneca Norway.
Table 1. Baseline characteristics of participants with COPD aged ≥40 years in the HUNT2 study (1995-1997)

Figure 2. Cumulative incidence curves of classifications of COPD severity for all-cause mortality among participants with COPD aged ≥40 years in the HUNT2 study (1995-1997) followed for up to 20.4 years.

Figure 3. Crude hazard ratios for different expressions of FEV1 and their respective methods of classification of COPD severity for all-cause mortality, respiratory mortality, cardiovascular mortality, COPD hospitalization, and pneumonia hospitalization among participants with COPD aged ≥40 years in the HUNT2 study (1995-1997) followed for up to 20.4 years.