FibroBox: a novel noninvasive tool for predicting significant liver fibrosis and cirrhosis in HBV infected patients

China is a highly endemic area for chronic hepatitis B (CHB). The accuracy of existing noninvasive biomarkers, including transient elastography (TE), APRI and FIB-4, for staging fibrosis is not high enough in Chinese cohorts. Using liver biopsy as the gold standard, a novel noninvasive indicator was developed from laboratory tests, ultrasound measurements and liver stiffness measurements with machine learning techniques to predict significant fibrosis and cirrhosis in CHB patients in northern and eastern China. We retrospectively evaluated the diagnostic performance of the novel indicator, named FibroBox, together with Fibroscan, the aspartate transaminase-to-platelet ratio index (APRI) and the fibrosis-4 index (FIB-4) in CHB patients from Jilin and Huai'an (training sets) and from Anhui and Beijing (validation sets). Of 1289 eligible HBV patients who had liver histological data, 63.2% had significant fibrosis and 22.5% had cirrhosis. Using LASSO logistic regression and filter methods, Fibroscan results, platelet count, alanine transaminase (ALT), prothrombin time (PT), type III procollagen aminoterminal peptide (PIIINP), type IV collagen, laminin, hyaluronic acid (HA) and diameter of the splenic vein were finally selected as input variables for FibroBox. The area under the receiver operating characteristic curve (AUROC) of the resulting FibroBox model was significantly higher than that of TE, APRI and FIB-4 for predicting significant fibrosis and cirrhosis. In the Anhui and Beijing cohorts, the AUROC of FibroBox was 0.88 (95% CI, 0.72–0.82) and 0.87 (95% CI, 0.83–0.91) for significant fibrosis and 0.87 (95% CI, 0.82–0.92) and 0.90 (95% CI, 0.85–0.94) for cirrhosis, respectively. In the validation cohorts, FibroBox accurately diagnosed 81% of significant fibrosis and 84% of cirrhosis. FibroBox performs better than the existing markers in predicting liver fibrosis in Chinese cohorts with CHB and may serve as a feasible alternative to liver biopsy.


Introduction
Hepatitis B virus (HBV) infection has become a major public health threat because of its high prevalence (affecting 257 million people worldwide in 2016) [1]. The major complications of chronic hepatitis B (CHB) include cirrhosis and hepatocellular carcinoma, which lead to poor prognosis [2]. CHB is highly endemic in China, with over 74 million hepatitis B surface antigen (HBsAg)-positive patients [2,3]. The number of CHB patients undergoing antiviral treatment remains uncalculated [4]. To control the spread of CHB in China, early diagnosis of and intervention in HBV infection are essential.
Fibrosis staging, an approach to assessing HBV-induced liver disease, is useful for estimating the prognosis of patients and identifying those who require antiviral treatment [5]. Liver biopsy is traditionally recommended as the standard for staging fibrosis [6], but it is limited by invasiveness, cost [7,8] and unavoidable sampling errors [9,10]. Therefore, a variety of noninvasive tests have been developed in recent years.
As summarized in the EASL-ALEH clinical practice guidelines [11], noninvasive staging usually depends on mathematical calculations based on serum biomarkers and on elasticity-based imaging techniques, such as transient elastography (TE) and magnetic resonance elastography (MRE). Although several strategies combining TE with computer algorithms are introduced in the guidelines, they are only applicable to patients infected with hepatitis C virus (HCV). Moreover, no measurements or macroscopic characteristics from imaging methods have been included in these strategies.
Because machine learning can tease out complex, non-linear relationships in data [12,13], we conducted a retrospective multicenter study and established a novel multivariate algorithmic model, named FibroBox, in a cohort of CHB patients from Huai'an and Jilin, and then evaluated its predictive accuracy in external validation sets from Anhui and Beijing.

Patients
We selected 1843 treatment-naïve CHB patients who underwent liver biopsy, blood tests, B-mode ultrasound examination and Fibroscan (FS402, Echosens, France) at four centers, including Huai'an Fourth People's Hospital. Their clinical data were retrospectively collected through the hospital information systems. Patients were included if they underwent liver biopsy and met at least one of the following criteria: aspartate transaminase (AST) or alanine transaminase (ALT) ≥ 40 IU/L, liver stiffness ≥ 6.5 kPa, HBV DNA ≥ 2000 IU/mL, or a family history of liver disease. The exclusion criteria were co-infection with HCV, hepatitis D virus (HDV) or human immunodeficiency virus (HIV); focal hepatic lesions (e.g. HCC, hepatic tuberculosis or any other); significant alcohol intake (> 20 g/day); severe hepatic failure (complications such as jaundice and ascites, or transaminase levels over 10 times the upper limit of normal (ULN)); acute heart failure; pregnancy; and BMI greater than 30 kg/m².

Liver biopsy
Percutaneous liver biopsy (LB) was performed under ultrasound guidance by experienced ultrasonologists. Liver samples were formalin-fixed and paraffin-embedded for subsequent histological analysis. Histological analysis was performed by three senior pathologists in every center. If three different results came from one sample, the consensus was taken as the final decision. Liver samples with fewer than three portal tracts were considered to be of poor quality and were excluded from the analysis. All the pathologists were blinded to the clinical information. Liver fibrosis was staged using the Metavir system [14]. F ≥ 2 was considered significant fibrosis and F4 cirrhosis.

Transient elastography (Fibroscan)
All liver stiffness measurements (LSMs) were performed using Fibroscan devices (FS402, Echosens, France) by skilled technicians according to the manufacturer's protocol [15]. The TE results were expressed in kilopascals (kPa). For each patient, the median of 10 successfully measured TE values was regarded as the final TE. A measurement was considered invalid if its TE median was > 7.1 kPa and the ratio of the interquartile range (IQR) to the LSM was > 0.30 [16].
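The reliability rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `summarize_lsm` is hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since the text does not state how the IQR was computed on the device.

```python
from statistics import median, quantiles


def summarize_lsm(values_kpa):
    """Return (median kPa, IQR/median, is_reliable) for one patient's TE readings.

    Hypothetical helper: the quartile method is an assumption, not taken
    from the manufacturer's protocol.
    """
    if len(values_kpa) < 10:
        raise ValueError("10 successful measurements are expected")
    lsm = median(values_kpa)
    q1, _, q3 = quantiles(values_kpa, n=4, method="inclusive")
    ratio = (q3 - q1) / lsm
    # Rule as stated in the text: invalid if median > 7.1 kPa AND IQR/LSM > 0.30.
    invalid = lsm > 7.1 and ratio > 0.30
    return lsm, ratio, not invalid


# Made-up readings for two hypothetical patients.
stable = summarize_lsm([5.0] * 10)
noisy = summarize_lsm([8, 8, 8, 8, 8, 12, 12, 12, 12, 12])
```

With these made-up readings, the first patient is flagged as reliable and the second as invalid, because the second has both a high median and a wide spread.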

Traditional serum index calculation
The aspartate transaminase (AST)-to-platelet ratio index (APRI) [17] and the fibrosis-4 index (FIB-4) [18] are two common composite surrogates that use simple formulas to score easily acquired parameters. The formulas of APRI and FIB-4 are as follows:
APRI = [(AST / ULN of AST) / platelet count (10^9/L)] × 100
FIB-4 = [age (years) × AST (IU/L)] / [platelet count (10^9/L) × √ALT (IU/L)]
The relevant input parameters were measured when patients were admitted to the hospitals, before any intervention.
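As a worked illustration, the two indices can be computed with the standard published definitions, APRI = (AST / ULN of AST) / platelet count × 100 and FIB-4 = (age × AST) / (platelet count × √ALT). The function names and the patient values below are hypothetical, and the AST upper limit of normal of 40 IU/L is an assumption that varies between laboratories.

```python
import math


def apri(ast_iu_l, ast_uln_iu_l, platelets_1e9_l):
    """AST-to-platelet ratio index."""
    return (ast_iu_l / ast_uln_iu_l) / platelets_1e9_l * 100.0


def fib4(age_years, ast_iu_l, alt_iu_l, platelets_1e9_l):
    """Fibrosis-4 index."""
    return (age_years * ast_iu_l) / (platelets_1e9_l * math.sqrt(alt_iu_l))


# Hypothetical patient: 50 years old, AST 80 IU/L, ALT 64 IU/L,
# platelets 100 x 10^9/L, assumed AST ULN of 40 IU/L.
apri_score = apri(80, 40, 100)
fib4_score = fib4(50, 80, 64, 100)
```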

Ultrasonic measurement
In this study, the parameters measured during ultrasound examinations included the size of the spleen (mm², length × thickness), the diameter of the splenic vein (mm) and the diameter of the portal vein (mm). Every parameter was measured at least three times by experienced ultrasonologists, and the mean value was taken as the final value of each measurement.

Training sets
Two training data sets of treatment-naïve HBV-infected patients from Huai'an and Jilin who fully met the study criteria (n = 549) were used to train the algorithmic model (FibroBox). The sets were not strictly comparable, but the model could normalize them.
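One common way to make data from different centers comparable is to standardize each variable within its own center. The sketch below shows per-center z-scoring; this is an assumption for illustration only, since the actual normalization used by FibroBox is described in the supplement and not in this text. The function name, field names and ALT values are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_by_center(rows, center_key, value_key):
    """Add a per-center z-score of `value_key` to each row (illustrative only)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[center_key], []).append(row[value_key])
    stats = {c: (mean(v), pstdev(v)) for c, v in groups.items()}
    out = []
    for row in rows:
        mu, sd = stats[row[center_key]]
        # Guard against a constant variable within a center.
        z = 0.0 if sd == 0 else (row[value_key] - mu) / sd
        out.append({**row, value_key + "_z": z})
    return out


# Made-up ALT values from two centers with different measurement scales.
rows = [
    {"center": "Huai'an", "alt": 40.0},
    {"center": "Huai'an", "alt": 60.0},
    {"center": "Jilin", "alt": 100.0},
    {"center": "Jilin", "alt": 140.0},
]
normalized = zscore_by_center(rows, "center", "alt")
z_values = [r["alt_z"] for r in normalized]
```

After this step both centers contribute values on the same unitless scale, regardless of systematic offsets between laboratories.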

Validation sets
The diagnostic performance of FibroBox and the other noninvasive markers was evaluated in external validation sets from the Anhui and Beijing cohorts. In the Anhui (n = 408) and Beijing (n = 332) cohorts, CHB patients who underwent biopsy and had available data on TE, AST, ALT and platelet count were included in the analysis.

FibroBox construction
The data characteristics, preprocessing and training/testing procedures of FibroBox are described in Supplement Material 1. All variables were normalized in order to minimize systematic errors between centers. Algorithmic models (Supplement Material 1) were then used to select significant variables and to conduct training and validation. The machine learning algorithm was implemented using Python 3.7 (Amsterdam, Netherlands).
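To illustrate how an L1 (LASSO) penalty selects variables in a logistic model, the following is a minimal pure-Python sketch using proximal gradient descent. It is not the FibroBox implementation, which is described only in the supplement; the function names, learning rate, penalty values and toy data are all assumptions, and no cross-validation of the penalty is performed here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink `value` toward zero by `amount`; exact zero inside the band."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(d)]
        b -= lr * grad_b
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, -1.0], [-0.5, 1.0],
     [0.5, 1.0], [1.0, -1.0], [1.5, -1.0], [2.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w_mild, _ = fit_l1_logistic(X, y, lam=0.1)
w_strong, _ = fit_l1_logistic(X, y, lam=1.0)
```

With the mild penalty only the informative feature keeps a nonzero coefficient, and with the strong penalty every coefficient is shrunk to zero, which is the behavior that lets the penalty act as a feature selector.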

Statistical analysis
The diagnostic accuracy of FibroBox and conventional fibrosis markers (APRI, FIB-4 and Fibroscan) was estimated using the area under the receiver operating characteristic curve (AUROC) and the rate of correctly classified fibrosis/cirrhosis. DeLong's test [19] with a significance level of 0.05 was used to compare the AUROC values of FibroBox and the other markers. Agreement between them was described using Cohen's kappa coefficient. The decision curve analysis (DCA) and ROC analysis were computed with R 3.5.1. Other statistical analyses were conducted using SPSS 19.0 (SPSS Inc., Chicago, IL, USA).
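For readers unfamiliar with these metrics, the sketch below computes an AUROC through its pairwise (Mann-Whitney) interpretation and Cohen's kappa for two binary classifications. The study itself used R and SPSS; this Python version, its function names and its toy data are illustrative assumptions, and DeLong's test is not implemented here.

```python
def auroc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(a, b):
    """Chance-corrected agreement between two categorical ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up scores and classifications.
auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
kappa = cohen_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```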

Study population
Between July 2008 and November 2017, 1843 HBV-infected patients were retrospectively enrolled in this study (Fig. 1). After exclusion of patients with HCC or other tumors (n = 193), patients with liver abscess (n = 86) and 171 (9.3%) patients who refused to participate in this study, histological specimens of 1393 (75.6%) patients were eligible. After the investigation of clinical information, 14 patients were found to be co-infected with HDV and 26 with HIV (Fig. 1). The data of 64 patients were incomplete. Therefore, 1289 patients were finally included in the study. The TE results of all the included patients were reliable according to the guidelines proposed by Boursier et al. [16]. The main characteristics of the study patients are summarized in Table 1.
Fig. 1 Flow diagram of the study population and reasons for exclusion. CHB, chronic HBV; HCC, hepatocellular carcinoma; HDV, hepatitis D virus

Histopathology
No complications were reported after liver biopsy. Significant fibrosis and cirrhosis accounted for 63.2% (815) and 22.5% (290) of all included patients, respectively. Nearly 30% of patients (382; 29.6%) had Metavir activity grade A2/A3, and no steatosis was reported by the histopathologists. For 994 (77.1%) specimens, two pathologists gave consistent results; for the remaining specimens with discrepant readings, the final diagnosis was determined by a third experienced histopathologist.

Training sets in Huai'an and Jilin
In Spearman correlation analyses of the original variables, the stage of liver fibrosis was associated with age, AST, GGT, total bilirubin, platelet count, WBC, PT, ALP, albumin, INR, PIIINP, type IV collagen, laminin, HA, size of the spleen, diameter of the splenic vein, diameter of the portal vein, velocity of the portal vein and Fibroscan results (Table 2). Subsequent multivariable analysis using the least absolute shrinkage and selection operator (LASSO) logistic regression (Fig. 2) and the filter method [20]
Across the range of reasonable threshold probabilities in this cohort, DCA graphically demonstrated that FibroBox provided a larger net benefit than TE, APRI and FIB-4 in diagnosing significant fibrosis and cirrhosis (Fig. 4a). This served as supplementary evidence for the comparison of FibroBox and TE (p = 0.058) in predicting cirrhosis.
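The net benefit plotted in a decision curve has a simple closed form: at threshold probability pt, net benefit = TP/n − (FP/n) × pt / (1 − pt). The sketch below evaluates it for a model and for the "treat all" reference strategy. The function names and the toy predictions are hypothetical; the study's own DCA was computed in R.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up outcomes (1 = disease present) and predicted probabilities.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(labels, probs, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is preferable wherever its curve lies above both the "treat all" and the "treat none" (net benefit of zero) references.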

Discussion
In China, assessing the severity of CHB infection is a critical step before timely intervention [4]. TE has also been widely applied in Chinese hospitals in recent years, despite its high price.
To stage liver fibrosis noninvasively in patients with HBV, our study established and validated a multivariable model based on machine learning that incorporates Fibroscan results, serum biomarker indices and ultrasound measurements. This FibroBox model demonstrated favorable diagnostic performance in two external validation cohorts for the prediction of significant fibrosis, where it was superior to TE, APRI and FIB-4. The diagnostic performance of FibroBox for predicting cirrhosis was potentially better than that of TE, but this requires further validation.
Fig. 2 Feature selection using a parametric method, the least absolute shrinkage and selection operator (LASSO) regression. a Significant fibrosis: selection of the tuning parameter (λ) in the LASSO model used 10-fold cross-validation via minimum criteria. The AUC curve was plotted versus log(λ). Dotted vertical lines were drawn at the optimal values by using the minimum criteria and the 1 standard error of the minimum criteria (the 1-standard error criteria). The optimal log(λ) of −3.96 was chosen. b Cirrhosis feature selection; the optimal log(λ) of −4.83 was chosen. c LASSO coefficient profiles of the 18 initially selected features. A vertical line was plotted at the optimal λ value, which resulted in 9 features with nonzero coefficients. d LASSO coefficient profiles of the 16 initially selected features. A vertical line was plotted at the optimal λ value, which resulted in 9 features with nonzero coefficients
It has been reported that Fibroscan performs better than serum biomarker indices in predicting significant fibrosis and cirrhosis in Chinese cohorts [21,22]. In our study, TE measurements were obtained within a month after liver biopsy. The optimal cut-off values of Fibroscan for significant fibrosis and cirrhosis in both validation sets were 7.8 and 11.3 kPa, both close to those proposed in other countries [23][24][25].
Regardless of set types and prediction goals, all the AUROC results of TE were over 0.8, which was acceptable but not efficient enough. Our study excluded obese patients (BMI ≥ 30 kg/m²), thus ruling out one source of unreliable TE results. Fibroscan is not universally available because of its high cost (€34,000 for a portable device and €5000 for its annual maintenance), but its high diagnostic efficiency also makes it recommendable [5,26]. FibroBox performed better than TE according to AUROC comparisons (Table 3, Fig. 3) and DCA curves (Fig. 4). Although the difference between FibroBox and TE for cirrhosis was not significant, the imbalance of the data may also have affected the validation results. For instance, fewer than a quarter of the included patients were cirrhotic (Anhui: 14.5%; Beijing: 22.3%).
The application of Fibroscan is limited by ascites, and it is less reliable than two-dimensional (2D) shear wave elastography (SWE) [27,28]. However, 2D-SWE has not been as widely applied as Fibroscan in China. Therefore, this study took TE as the only elastography input variable. In addition, TE has the advantage of staging liver fibrosis regardless of cause (HBV, HCV or nonalcoholic fatty liver disease [NAFLD]). FibroBox focused only on HBV-induced liver fibrosis, and similar studies are required for other kinds of fibrosis.
The prediction accuracy of APRI and FIB-4 observed in this study was unacceptable. The AUROC of APRI was 0.66 (0.60 to 0.73) in the Anhui cohort and 0.70 (0.65 to 0.76) in the Beijing cohort in predicting significant fibrosis, and 0.72 (0.65 to 0.79) in the Anhui cohort and 0.75 (0.67 to 0.82) in the Beijing cohort in predicting cirrhosis. APRI performed better in predicting cirrhosis than in predicting significant fibrosis. The AUROC value of FIB-4 in predicting cirrhosis in the Anhui cohort was significantly higher than that of APRI (P = 0.009), indicating that FIB-4 might have a prediction efficiency between those of APRI and TE. In addition, the optimal cut-off values of APRI and FIB-4 were both calculated with the Youden index (sensitivity + specificity − 1), and the optimal cut-off value of APRI was quite different from that recommended by the WHO guidelines [29], reminding …
There are several limitations in this study. First, the robustness of the data was limited by the retrospective design. However, the data set is large and four centers participated in this study, which supports the applicability and reliability of the established models. We designed a two-validation-set study similar to that conducted by Lemoine et al. [25]. Second, inconsistency between the data samples affected the model validations. For instance, the proportion of cirrhosis was only 14.5% in the Anhui cohort, meaning that it cannot be taken as a training set, because this proportion is not enough to … Third, FibroBox is complicated and involves 10 parameters. However, its cost-effectiveness might not be poor, because these 10 input parameters can be obtained through clinical examinations and the run time of FibroBox is only a few seconds. Finally, several parameters such as PIIINP, type IV collagen, laminin and HA are not readily available in clinical laboratories.
Several easily obtained ratios could be developed instead, similar to the study conducted by Yuan et al. [30]. Future versions of FibroBox should focus on simplification while preserving accuracy.
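The Youden-index cut-off selection mentioned in the Discussion (J = sensitivity + specificity − 1) can be sketched as follows. The function name, the rule that a score at or above the cut-off counts as positive, and the toy stiffness values are assumptions for illustration only and do not reproduce the study data.

```python
def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximizing sensitivity + specificity - 1.

    A score >= cut-off is classified as positive; the first maximum wins ties.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cut and l == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Made-up liver stiffness values (kPa) with biopsy labels (1 = cirrhosis).
stiffness = [5.0, 6.2, 7.0, 8.1, 9.4, 12.0]
cirrhosis = [0, 0, 0, 1, 1, 1]
cutoff, j_stat = youden_cutoff(stiffness, cirrhosis)
```

Because the Youden index weights sensitivity and specificity equally, a cut-off chosen this way depends on the cohort's score distribution, which is one reason cut-offs derived in a single study can differ from guideline values.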

Conclusions
In conclusion, compared with TE, APRI and FIB-4, FibroBox may be a superior noninvasive indicator for predicting the fibrosis stage in Chinese patients with CHB. FibroBox requires further validation in other parts of China and in other countries.