Development and Validation of a Comprehensive Multiparameter-based Scoring System to Assess Pulmonary Fibrosis Severity

Background Survival time varies greatly among patients with idiopathic pulmonary fibrosis (IPF). A method that can accurately assess the severity and prognosis of pulmonary fibrosis is currently lacking. This study aimed to develop a new method that can be easily used to assess pulmonary fibrosis severity.
Method 1. Development of an HRCT combined with pulmonary function and physiological parameter (CTPF) assessment method. The method included two parts: 1) CT-based fibrosis staging: four representative lung CT sections were selected and evenly divided into approximately 100 small areas, and the percentage of honeycomb lesion area in the four sections determined the fibrosis stage; 2) PF-based severity grade: FVC%pred, DLco%pred, SpO2%, age, and gender were used to assess the PF severity grade. 2. Validation of the new method: the method was used to assess 192 patients with IPF. Two radiologists used the CT-based fibrosis staging method to determine the fibrosis stage, and a pulmonologist determined the PF severity grade. 3. Statistical analyses: the intraclass correlation coefficient was used to estimate the consistency between the CT scores from the two radiologists; the Spearman correlation coefficient was used to evaluate the correlation between CT scores and lung function parameters; and the Fine-Gray competing risk model was used to analyze the relationship between CT-based stage/PF-based grade and prognosis. CT-based stage, PF-based grade, and GAP stage were used as predictors of death risk.
Results 1. The intraclass correlation coefficient of the CT scores of the two radiologists was 0.95 (P < 0.05). 2. The CT scores negatively correlated with pulmonary function. 3. The CTPF comprehensive model showed higher predictive accuracy.
Conclusion The CTPF method, which combines CT-based staging and PF-based grading, can be adopted easily in clinical practice and can assess IPF severity and predict death risk more accurately.
Background Survival time varies greatly among patients with idiopathic pulmonary fibrosis (IPF). Some patients have slow disease progression and remain stable for a long time, whereas others develop acute exacerbations and die quickly [1,2]. How to accurately assess IPF severity and predict prognosis remains unanswered, and a commonly accepted method that can accurately assess IPF severity and prognosis is still lacking [3].
The currently available IPF severity scoring methods include the following. 1) The clinical-radiographic-physiologic (CRP) scoring method published in 1986 by Leslie C. Watters et al. [4,5]. The CRP method uses 7 variables: degree of dyspnea, chest x-ray score, forced vital capacity (FVC), forced expiratory volume in one second (FEV1), intrathoracic gas volume (Vtg), the ratio of the diffusing capacity of the lung for carbon monoxide (DLco) to alveolar volume (VA) (DLco/VA), and the resting alveolar-arterial oxygen partial pressure difference (AaPO2). This method has several disadvantages: it requires many parameters, uses a complex calculation, and relies on chest x-ray, which often fails to reveal lesions such as fibrosis. In 2001, Talmadge E. King et al. [6] modified the CRP scoring method by adding parameters such as gender, age, smoking status, and finger clubbing, which further increased its complexity. 2) In 2002, Athol U. Wells et al. [7] proposed the composite physiologic index (CPI) to assess interstitial lung disease (ILD) severity by combining chest computed tomography (CT) results and pulmonary function parameters. They showed that the CPI is superior to pulmonary function parameters alone for estimating fibrosis severity and predicting survival. However, the CPI formula is complex, which limits its adoption in clinical practice. 3) Brett Ley et al. [8] proposed the gender, age, and physiology (GAP) method, which is based on gender, age, FVC, and DLco. However, the GAP method does not include chest HRCT data, which are critical in IPF assessment; thus, its assessment accuracy is compromised. 4) Ryo Okuda et al. [9] in Japan proposed using two key arterial blood gas indicators, the arterial partial pressure of oxygen (PaO2) and oxyhemoglobin saturation (SaO2%), to assess IPF severity. However, this method includes neither HRCT data nor pulmonary function parameters. In 2017, Hasti Robbie et al. [10] analyzed the contributions of physiological parameters, histopathological parameters, imaging parameters, and biomarkers to the assessment of IPF severity and concluded that using a single type of parameter to assess IPF severity has serious limitations.
In this study, based on the available scoring methods, we chose parameters that have been proven to have a good prognostic value and can be acquired easily in clinical practice to develop a new scoring method to assess pulmonary fibrosis severity (patent application number: 201910514972.5). In this method, we combined the data of HRCT, pulmonary function, and arterial oxygen saturation to estimate pulmonary fibrosis severity comprehensively, and we also validated the new scoring method (www.clinicaltrials.gov: ChiCTR-RRC-17010683). The new method may improve the accuracy of pulmonary fibrosis severity assessment and IPF prognosis prediction. This method is simple and can be adopted easily in clinical practice.

Method
Development of a New Scoring Method
Pulmonary Fibrosis Staging by Chest High-resolution Computed Tomography (HRCT) (CT-based fibrosis staging, Fig. 1)
According to the latest (2018) IPF guidelines [1], the typical pathological pattern of IPF is usual interstitial pneumonia (UIP). On imaging, IPF is characterized by reticular opacities, linear opacities, honeycombing, and traction bronchiectasis. The severity and extent of the lesions on chest CT images are important predictors of IPF mortality [11]. Previous imaging studies of interstitial pneumonia and IPF have proposed that honeycombing and traction bronchiectasis predict survival and prognosis better than other imaging features such as reticular and linear opacities. Moreover, honeycombing and traction bronchiectasis are the most representative imaging manifestations of pulmonary fibrosis [12-14]. Therefore, we chose the extent of honeycombing and traction bronchiectasis to evaluate the extent of pulmonary fibrosis.
We combined the approaches proposed in previous studies [15,16] with principles of calculus to design a "four-section honeycomb lung percentage" method. We selected the following four representative lung CT sections to semi-quantitatively estimate the extent of honeycomb lesions in the entire lung: the aortic arch section, the tracheal bifurcation section, the section of the basal (dorsal) segment of the tracheal bifurcation at the inferior lobes, and the section below the right lung apex. Each section included both the left and the right lungs. The largest transverse diameter of each lung section was evenly divided into three parts, and each lung section was then divided into inner, middle, and outer sections by drawing lines from the dividing points along the contour of the patient's thorax.
The outer lung section was then evenly divided into 6 small areas, the middle lung section into 4-5 areas (5 areas for a large middle section), and the inner lung section into 2 areas. Thus, the 4 CT sections comprised 8 lung sections (4 left + 4 right), which were evenly divided into approximately 100 small areas in total (12 or 13 per lung section × 8). Each small area was scored 1 if it contained a honeycomb lesion, and the sum over the entire lung was used as the total honeycomb lung score. The traction bronchiectasis score was calculated in the same way. The honeycomb lung percentage was calculated as (total honeycomb score + total traction bronchiectasis score) ÷ total number of small areas × 100%. For example, if the 8 lung sections were divided into 100 small areas and 30 of them were scored positive for honeycomb lung or traction bronchiectasis, the honeycomb lung percentage was 30%. According to Lynch et al. [17], lung fibrosis can be staged by the following CT characteristics: stage I, reticular and linear opacities without honeycomb lesions; stage II, honeycomb lesion area < 25% of the entire lung; stage III, 25%-49%; stage IV, 50%-75%; stage V, > 75%.
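The scoring arithmetic above can be sketched in Python. This is a minimal illustration, assuming each of the approximately 100 small areas has already been read as a 0/1 flag for honeycombing and for traction bronchiectasis; the function names `honeycomb_percentage` and `lynch_stage` are hypothetical, and the handling of fractional percentages that fall between the published stage bounds (e.g. 49.5%) is our assumption.

```python
from typing import Sequence


def honeycomb_percentage(honeycomb: Sequence[int], traction: Sequence[int]) -> float:
    """(total honeycomb score + total traction bronchiectasis score) / areas x 100.

    honeycomb[i] and traction[i] are hypothetical 0/1 flags for small area i,
    pooled over the 8 lung sections. As in the formula in the text, an area
    positive for both findings contributes 2 to the numerator.
    """
    if len(honeycomb) != len(traction) or not honeycomb:
        raise ValueError("expected two non-empty flag lists of equal length")
    positive = sum(honeycomb) + sum(traction)
    return positive * 100.0 / len(honeycomb)


def lynch_stage(percentage: float) -> str:
    """Map a honeycomb lung percentage to a fibrosis stage (after Lynch et al.).

    Assumptions: 0% is reported as stage I (reticular/linear opacities are
    presumed present); values between 49 and 50 are counted as stage III.
    """
    if percentage <= 0:
        return "I"
    if percentage < 25:
        return "II"
    if percentage < 50:
        return "III"
    if percentage <= 75:
        return "IV"
    return "V"


# Worked example from the text: 30 of 100 areas positive -> 30% -> stage III.
honeycomb = [1] * 20 + [0] * 80
traction = [0] * 20 + [1] * 10 + [0] * 70
pct = honeycomb_percentage(honeycomb, traction)
print(pct, lynch_stage(pct))  # prints: 30.0 III
```

When exactly 100 areas are used, the count of positive areas and the percentage coincide, which is why the text can use "CT score" and "honeycomb lung percentage" interchangeably.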
The HRCT section thickness was 1-1.5 mm, and the section spacing was 2 cm. Patients were scanned in the supine or prone position. The minimum exposure was 200 mAs. Honeycomb lung was defined according to the criteria recommended by the Fleischner Society [11].

Assessing Pulmonary Fibrosis Severity Using a Multiparameter-based Comprehensive Scoring Method
Patients' baseline physiological condition and lung function parameters are important predictors of survival [7-9,18]. We compared the advantages and disadvantages of existing pulmonary fibrosis severity scoring methods, including CRP, GAP, CPI, and JRS (Table 1), and chose 5 parameters that have important predictive value and are relatively easy to collect in clinical practice: FVC%pred, DLco%pred, peripheral oxygen saturation (SpO2%), age, and gender. Following previous studies [4-9], we used these 5 parameters to define multiparameter-based comprehensive scoring criteria (based on pulmonary function and physiological parameters; PF-based grading) to estimate disease severity. We then combined this PF-based grading method with the CT-based pulmonary fibrosis staging method to develop a new scoring method (CTPF) to assess pulmonary fibrosis severity (Table 2).

Scoring the Clinical Data
Two radiologists used the CT-based pulmonary fibrosis staging method described above to evaluate the patients' chest HRCT images. The average of the two radiologists' scores was used as each patient's final lung fibrosis score, which was then used to stage pulmonary fibrosis according to the criteria in Table 2. Patients' age, gender, FVC%pred, DLco%pred, and SpO2% were scored according to the criteria in Table 2, and the total score was used to determine the PF-based severity grade. The grades were defined as follows: score 0-3, grade a (mild); score 4-6, grade b (moderate); score 7-10, grade c (severe).
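The grading rule can be expressed as a small lookup. This sketch assumes the per-item points from Table 2 (not reproduced here) have already been summed into a total PF score of 0-10; the thresholds follow the grade definitions given in this paper, and the helper names `final_ct_score`, `pf_grade`, and `ctpf_label` are hypothetical.

```python
def final_ct_score(reviewer1: float, reviewer2: float) -> float:
    """Final CT score: mean of the two radiologists' scores."""
    return (reviewer1 + reviewer2) / 2


def pf_grade(pf_score: int) -> str:
    """Map a total PF score (0-10) to grade a (mild), b (moderate) or c (severe)."""
    if not 0 <= pf_score <= 10:
        raise ValueError("PF score must lie between 0 and 10")
    if pf_score <= 3:
        return "a"
    if pf_score <= 6:
        return "b"
    return "c"


def ctpf_label(ct_stage: str, pf_score: int) -> str:
    """Combine a CT-based stage (e.g. "III") with the PF-based grade, e.g. "III c"."""
    return f"{ct_stage} {pf_grade(pf_score)}"


print(final_ct_score(24, 27), ctpf_label("III", 8))  # prints: 25.5 III c
```

Keeping the CT stage and PF grade as separate components of the label mirrors the two-axis CTPF stages (e.g. "III c", "II a") used in the results.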

Statistical Analyses
Measurement data are expressed as mean ± standard deviation (SD), and count data as percentages (%). The intraclass correlation coefficient (ICC) was calculated to estimate the consistency of CT scores between the two radiologists [19,20]. The Spearman correlation coefficient was calculated to analyze the correlation between CT-based fibrosis scores and pulmonary function parameters (FVC%pred, DLco%pred, SpO2%) and the CPI. The competing risk (Fine-Gray) model was used to analyze the relationship between prognosis (cumulative mortality) and the CT-based fibrosis stage and PF-based severity grade [21]. Survival time, in months, was defined as the interval from data acquisition to the death endpoint or the last follow-up visit. The death endpoint was defined as death caused by lung disease (IPF exacerbation or IPF combined with lung cancer). Because lung transplantation is considered the most effective treatment for IPF, its occurrence was treated as a competing risk event [22]. All other outcomes were treated as censored.
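Because lung transplantation is treated as a competing event, cumulative mortality cannot simply be read off as 1 minus a Kaplan-Meier curve. The sketch below shows the nonparametric Aalen-Johansen estimate of the cumulative incidence function, the quantity whose subdistribution hazard the Fine-Gray model relates to covariates. It is not the Fine-Gray regression itself, which in practice is usually fitted with a dedicated package such as `crr` in the R package `cmprsk`. The status coding (0 censored, 1 death from lung disease, 2 lung transplantation) and the toy data are assumptions for illustration.

```python
from collections import Counter


def cumulative_incidence(times, status, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    times: follow-up in months; status: 0 = censored, 1 = death from lung
    disease, 2 = lung transplantation (competing event). Returns a list of
    (time, cumulative incidence) pairs at each time with an observed event.
    """
    counts = Counter(zip(times, status))
    event_codes = {s for s in status if s != 0}
    at_risk = len(times)
    event_free = 1.0  # overall event-free survival just before time t
    incidence = 0.0
    curve = []
    for t in sorted(set(times)):
        d_any = sum(counts[(t, s)] for s in event_codes)
        if d_any:
            incidence += event_free * counts[(t, cause)] / at_risk
            event_free *= 1.0 - d_any / at_risk
            curve.append((t, incidence))
        # Subjects with an event or censored at t leave the risk set afterwards.
        at_risk -= d_any + counts[(t, 0)]
    return curve


# Toy data: deaths at 1 and 4 months, a transplant at 2, censoring at 3 and 5.
times = [1, 2, 3, 4, 5]
status = [1, 2, 0, 1, 0]
print(cumulative_incidence(times, status))
```

With no competing events and no censoring, the estimate reduces to the empirical proportion of deaths; with a transplant in the data, the death and transplant incidences plus the event-free probability sum to 1.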
We used the following strategies to develop and evaluate prognostic prediction models. (1) With lung transplantation treated as a competing risk event, CT-based stage, PF-based grade, and CTPF comprehensive stage were used as predictors. For comparison, we also included the GAP staging method proposed by Ley et al. [8]. Using all the data and Fine-Gray regression, we established 4 death-risk prediction models: the CT-based fibrosis stage model, the PF-based severity grade model, the CTPF combined stage model, and the GAP stage model. (2) Bootstrap cross-validation with 1000 repetitions was used to validate the predictive performance of the 4 models and to obtain the following average indexes of prediction accuracy: area under the ROC curve (AUC), Brier score, and calibration curve. The AUC reflects a model's discrimination; a model is generally considered to discriminate death risk satisfactorily when the AUC is > 75%. The calibration curve reflects the agreement between predicted and observed risk. The Brier score reflects both discrimination and calibration; the smaller the Brier score, the better the model's discrimination and calibration [21].
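A Brier score at a fixed horizon has to account for censoring. One common approach, sketched below under our own assumptions, weights each subject by the inverse of the Kaplan-Meier estimate of the censoring distribution (inverse probability of censoring weighting, IPCW). The paper does not state which estimator its software used, so this is an illustration rather than a reproduction. `risk` holds a model's predicted cumulative incidence of death at `horizon` for each subject; in bootstrap cross-validation the score would be computed for each replicate and averaged. The function names are hypothetical.

```python
def km_censoring(times, status):
    """Kaplan-Meier estimate of the censoring survival G(u) = P(C > u).

    Censoring (status 0) is treated as the 'event'. Returns G as a function;
    G(u, left=True) gives the left limit G(u-), excluding any jump at u.
    """
    steps = []
    g = 1.0
    for s in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, d in zip(times, status) if t == s and d == 0)
        g *= 1.0 - censored / at_risk
        steps.append((s, g))

    def G(u, left=False):
        value = 1.0
        for s, g_s in steps:
            if s < u or (s == u and not left):
                value = g_s
            else:
                break
        return value

    return G


def ipcw_brier(times, status, risk, horizon, cause=1):
    """IPCW Brier score at `horizon` for predicted cumulative incidence `risk`.

    status: 0 = censored, 1 = death from lung disease, 2 = lung transplantation.
    """
    G = km_censoring(times, status)
    total = 0.0
    for t, d, p in zip(times, status, risk):
        if t <= horizon and d != 0:  # any event observed by the horizon
            y, g = float(d == cause), G(t, left=True)
        elif t > horizon:  # still event-free at the horizon
            y, g = 0.0, G(horizon)
        else:  # censored before the horizon: weight 0
            continue
        if g == 0.0:
            raise ValueError("horizon lies beyond the censoring follow-up")
        total += (y - p) ** 2 / g
    return total / len(times)


print(ipcw_brier([1, 2, 3, 4], [1, 2, 1, 1], [0.5] * 4, horizon=2.5))  # prints: 0.25
```

Without censoring all weights equal 1 and the score reduces to the ordinary mean squared difference between predicted risk and observed outcome, which is why a constant prediction of 0.5 gives 0.25.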

Patients' Clinical Characteristics
The patient screening flow chart is displayed in Fig. 2.
CT Score values by reviewer 1 and reviewer 2: the scores assigned by the two radiologists using the "4-section honeycomb lung percentage" method to the patients' HRCT images. CT-based stage: determined from the average score of the two radiologists according to the criteria in Table 2. PF-based grade: determined from the pulmonary function and physiological parameters (age, gender, FVC%pred, DLco%pred, and SpO2%) according to Table 2, and defined as mild (a), moderate (b), or severe (c). GAP (gender, age, and physiology) stage: determined as recommended by Ley et al. [8]; a higher stage represents a greater death risk. CPI: composite physiologic index, proposed by Athol U. Wells and colleagues in 2002 to assess the severity of interstitial lung diseases (ILDs) by combining chest CT and pulmonary function parameters; a higher CPI represents more severe ILD.

The Relationship Between CT-based Stage/PF-based Severity and Pulmonary Function and Death Risk
The mean CT scores of the 192 patients from the two radiologists using the "4-section honeycomb percentage" method were 24.4 ± 14.1 and 24.7 ± 14.4, respectively; the highest scores were 67 and 65, and the lowest were 1 and 3, respectively (Table 3). The intraclass correlation coefficient of the two radiologists' scores was 0.95 (P < 0.05). For each patient, the mean of the two radiologists' CT scores was used as the final CT score, and the final CT scores were used in the Spearman correlation analysis of CT scores and pulmonary function parameters (Fig. 3). The CT scores negatively correlated with FVC%pred (rs = -0.47, P < 0.01, Fig. 3A), DLco%pred (rs = -0.66, P < 0.01, Fig. 3B), and SpO2% (rs = -0.40, P < 0.01, Fig. 3C), and positively correlated with the CPI (rs = 0.63, P < 0.01, Fig. 3D), which reflects ILD severity. These data suggest that the "4-section honeycomb lung percentage" scoring method can effectively represent the severity of pulmonary fibrosis.
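For reference, the agreement statistic can be computed as below. The paper does not specify which ICC form was used; this sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in the Shrout and Fleiss notation, and the function name `icc_2_1` is hypothetical. The toy example shows that a constant one-point offset between raters lowers absolute agreement even though their rankings are identical.

```python
def icc_2_1(*raters):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Each argument is one rater's scores for the same n subjects.
    """
    k = len(raters)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(raters[0])
    if n < 2 or any(len(r) != n for r in raters):
        raise ValueError("raters must score the same >= 2 subjects")
    rows = list(zip(*raters))  # one tuple of k scores per subject
    grand = sum(map(sum, rows)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)
    ss_cols = n * sum((sum(c) / n - grand) ** 2 for c in raters)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


print(round(icc_2_1([1, 2, 3], [2, 3, 4]), 4))  # prints: 0.6667
```

A consistency-type ICC would rate the same toy data as perfectly concordant, so the choice of form matters when reporting values such as 0.95.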
To analyze the association between CT-based stage and death risk, we performed univariate Fine-Gray regression (Fig. 4A) and multivariate Fine-Gray regression adjusting for the potential confounding effect of PF-based grade (Fig. 4B). Both analyses revealed that CT-based stage was positively associated with death risk. Similarly, both univariate Fine-Gray regression (Fig. 4C) and multivariate regression adjusting for CT-based stage (Fig. 4D) showed that PF-based grade was also positively associated with death risk.

CTPF stage
HRCT images of two representative cases are displayed in the Additional file (Figures S1 and S2). Figure 6A shows the nomogram for CTPF-based death risk prediction, constructed from the coefficients of the multivariate Fine-Gray regression including CT-based stage and PF-based grade. Figures 6B, 6C, and 6D show the calibration curves of the four prediction models after Bootstrap cross-validation. The CTPF model had the best stability. The one-, two-, and three-year cumulative death risks of patients at different CTPF stages are displayed in Table 5. When patients had the same CT-based stage, their cumulative death risk increased as their PF-based grade increased.
When patients had the same PF-based grade, their cumulative death risk increased as their CT-based stage increased. Thus, combining CT-based stage and PF-based grade could improve the accuracy of death risk prediction. These data and the calibration curves after cross-validation (Fig. 6) suggest that the CTPF stage may be more accurate for predicting death risk than the other 3 models.
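A nomogram such as the one in Figure 6A is a graphical form of a fitted Fine-Gray model, under which the predicted cumulative incidence at time t is F(t | x) = 1 - exp(-Λ0(t) · exp(β·x)), where Λ0(t) is the baseline cumulative subdistribution hazard. The sketch below evaluates this formula for a CTPF stage dummy-coded against assumed reference levels (stage II, grade a; stage I is not handled). The coefficients and baseline value are placeholders, not the study's estimates, which are not reported in this section; the helper names are hypothetical.

```python
import math

CT_LEVELS = ("II", "III", "IV", "V")  # reference level "II" (assumption)
PF_LEVELS = ("a", "b", "c")  # reference level "a" (assumption)


def design_row(ct_stage: str, pf_grade: str) -> list[float]:
    """Dummy-code a CTPF stage such as ("III", "c") against the reference levels."""
    if ct_stage not in CT_LEVELS or pf_grade not in PF_LEVELS:
        raise ValueError("unknown CT-based stage or PF-based grade")
    return [float(ct_stage == s) for s in CT_LEVELS[1:]] + [
        float(pf_grade == g) for g in PF_LEVELS[1:]
    ]


def fine_gray_cif(base_cum_subhazard: float, coefs: list[float], x: list[float]) -> float:
    """F(t | x) = 1 - exp(-Lambda0(t) * exp(beta . x)) under a Fine-Gray model."""
    linear_predictor = sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 - math.exp(-base_cum_subhazard * math.exp(linear_predictor))


# Placeholder values only, NOT the study's estimates.
coefs = [0.6, 1.1, 1.6, 0.5, 1.0]  # effects for III, IV, V, b, c
lambda0_24m = 0.08  # hypothetical baseline cumulative subhazard at 24 months
for stage in [("II", "a"), ("III", "c")]:
    print(stage, round(fine_gray_cif(lambda0_24m, coefs, design_row(*stage)), 3))
```

Because the linear predictor is additive in the stage and grade terms, a higher PF-based grade raises predicted risk at every CT-based stage and vice versa, which is the pattern described for Table 5.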

Discussion
Comparison of the available IPF staging methods (Table 1) shows that some methods, such as GAP and JRS, include too few parameters to accurately reflect IPF severity and predict prognosis, whereas the calculations of the CRP and CPI scoring systems are too complex to be adopted in clinical practice [10]. Therefore, a new scoring method that can accurately assess IPF severity, predict prognosis, and be used easily is greatly needed.
Chest HRCT is one of the common clinical examinations used to diagnose IPF and assess its severity and prognosis. Honeycomb lung is the most representative lesion of pulmonary fibrosis, and the extent of honeycomb lesions correlates directly with IPF prognosis [11-14,16].
Currently, CT scoring for IPF patients includes manual semi-quantitative evaluation and fully quantitative evaluation by artificial intelligence. Although the manual method is simple to use, its results are susceptible to wide variation between evaluators [24-26]. With clinical applicability in mind, we drew on principles of calculus to develop a "four-section honeycomb lung percentage" method, which can determine the proportion of honeycomb lung accurately and reduce inter-evaluator variation. In the current study, two radiologists independently reviewed the patients' HRCT images and determined the honeycomb lung percentage. The intraclass correlation coefficient of the two radiologists' scores was 0.95 (P < 0.05), and the fibrosis stages derived from the honeycomb percentage were also consistent between the two radiologists. In addition, the CT-based stage negatively correlated with patients' lung function parameters (FVC%pred, DLco%pred, and SpO2%) and positively correlated with the CPI (Fig. 3), which reflects IPF severity. Patients with a higher CT-based stage had a greater cumulative death risk.
These results indicate that our CT-based fibrosis staging method may effectively reflect IPF severity and prognosis.
Previous studies have shown that age, gender, oxygen use at rest, lower FVC%pred, and lower DLco%pred are closely associated with the risk of death in patients with IPF [4-8,18,27]. Thus, we selected five important and clinically accessible lung function and physiological parameters (FVC%pred, DLco%pred, SpO2%, age, and gender) to assess the IPF severity grade (PF-based severity grade). Both our univariate and multivariate regression analyses revealed that the PF-based severity grade was an independent risk factor for death from IPF.
Compared with the CT-based fibrosis staging method, the PF-based severity grading method, and the GAP staging method, the CTPF comprehensive staging method, which combines CT-based fibrosis staging and PF-based severity grading, showed the best AUC, Brier score, and stability for predicting death risk. For example, the patient presented in Figure S1 was CTPF stage III c, corresponding to a predicted 2-year death risk of 49.65% according to Table 5; he died of acute exacerbation of IPF 23 months after his clinical data were collected for this study. The patient in Figure S2 was CTPF stage II a, corresponding to a predicted 3-year death risk of only 17.50%; he was still alive and well 39 months after his data were collected.
These results suggest that our CTPF comprehensive staging method can accurately predict patients' death risk.
Lung transplantation is considered an effective treatment for improving the survival of patients with IPF. Thus, we treated lung transplantation as a competing risk of death when validating the new CTPF comprehensive staging method. However, lung transplantation also carries a risk of death [21]. In 2015, Yusen et al. [28] reported that the global one-year and three-year death risks after lung transplantation were 20% and 35%, respectively. When the death risk (

Ethics Statement
The study was approved by the Institutional Ethics Committee of Shanghai Pulmonary Hospital (No.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was funded by grants from the National Science Foundation of China (
CT score: mean CT score from the two radiologists. CT-based stage: determined by using the average score of the two radiologists and following the criteria described in Table 2; honeycomb lung < 25%, Stage II; 25%-49%, Stage III; 50%-75%, Stage IV; > 75%, Stage V. PF-based grade: determined by using the pulmonary function and physiological parameters (age, gender, FVC%pred, DLco%pred, and SpO2%) and following the description in Table 2; PF score 0-3, mild (a); 4-6, moderate (b); 7-10, severe (c).