How effective are experienced hepatologists at staging fibrosis using non‐invasive fibrosis tests in patients with metabolic dysfunction‐associated steatotic liver disease?

Sequential use of non‐invasive fibrosis tests (NITs) to identify patients with advanced hepatic fibrosis is recommended. However, it remains unclear how reliable clinicians are at staging liver fibrosis using combinations of NITs.


Metabolic dysfunction-associated steatotic liver disease (MASLD; formerly non-alcohol-related fatty liver disease [NAFLD]) is the most common liver disease in many developed countries, affecting 20%-30% of the population. 1 The natural history of MASLD is variable, but secondary care studies suggest that up to 40% of patients with biopsy-proven MASLD develop progressive fibrosis that can result in cirrhosis. 2 Individuals with advanced fibrosis or cirrhosis have an increased risk of liver-related complications and mortality in the short to medium term, so identifying them is important to enable treatment, optimisation of cardiovascular risk and/or surveillance for liver-related complications. 3 Unfortunately, a large proportion of people with advanced MASLD remain undiagnosed and case-finding programmes are emerging to address this. 4 Multiple non-invasive tests for liver fibrosis (NITs) have been developed and combinations of these are used to stage liver fibrosis in primary and secondary care. 5-9 These tests can differentiate between fibrosis stages F0-2 and F3-4 with reasonable accuracy, but sensitivity and specificity vary depending on the diagnostic cut-offs employed. 10 Sequential use of NITs, particularly the Fibrosis-4 index (FIB-4) followed by vibration-controlled transient elastography (VCTE) or the Enhanced Liver Fibrosis (ELF) test, has emerged as a clinically effective and cost-effective method of staging MASLD patients and is recommended by guidelines. 5,11,12 However, NITs have limitations that can affect their performance. For example, FIB-4 has a high false positive rate in older individuals and VCTE may be less accurate in individuals with high body mass index or severe steatosis. 13,14 As a result, some clinicians may adapt NIT cut-offs in different clinical scenarios. Moreover, clinicians need to use clinical judgement when NIT results are discordant. Despite the widespread use of NITs in fibrosis staging pathways, it remains unknown how reliable clinicians are in interpreting their results.
Our aim was to assess the concordance between 'clinician fibrosis assessment' (CFA) by hepatologists using NITs and histology in patients with MASLD and to compare this with purely algorithmic approaches based on the strict application of evidence-based NIT thresholds.

PATIENTS AND METHODS
For this study, 230 patients representing the full histological spectrum of MASLD were randomly selected from the LITMUS Metacohort of the European NAFLD Registry. 15 Details of the whole cohort have been previously reported. 16 In brief, this study included well-phenotyped patients with biopsy-proven MASLD who were recruited prospectively from specialist centres across Europe between 2010 and 2017. Participants provided written informed consent before inclusion and studies contributing to the Metacohort were approved by ethics committees in the participating countries.
Only liver biopsy samples that were of adequate size and quality as judged by the reporting pathologist were included and they were histologically examined in each centre by expert liver pathologists.
Liver fibrosis was staged according to the NASH Clinical Research Network criteria. 17 Advanced fibrosis was defined as F3 and cirrhosis as F4.
FIB-4 was calculated from laboratory data as previously described. 7 ELF tests were analysed using the ADVIA Centaur CP system (Siemens, Munich, Germany). Liver stiffness measurement, determined by VCTE (FibroScan, Echosens, Paris, France), was obtained using the M or XL probe as per the manufacturer's recommendations. Blood tests and VCTE were performed at the time of liver biopsy or within 6 months.
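The text above cites the FIB-4 calculation without restating it. As a minimal sketch, assuming the commonly published formula (age × AST divided by platelet count × the square root of ALT), it could be computed as follows. The function name, argument names and the example patient are hypothetical and are not taken from the study.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float,
         platelets_10e9_l: float) -> float:
    """Return the FIB-4 index.

    Assumed formula: (age [years] x AST [U/L]) /
                     (platelets [10^9/L] x sqrt(ALT [U/L])).
    """
    if alt_u_l <= 0 or platelets_10e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient (illustrative values only, not study data):
# age 54 years, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
example_score = fib4(54, 40, 36, 200)
print(round(example_score, 2))  # 1.8, which is above the 1.3 cut-off
```

Because age sits in the numerator, the same laboratory values give a higher score in an older patient, which is consistent with the higher false positive rate in older individuals noted in the introduction.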
Six consultant hepatologists from the same tertiary referral liver transplant centre were provided with four anonymised datasets of the patients. Each dataset included basic clinical information (including age, sex, BMI, alanine aminotransferase [ALT], aspartate aminotransferase [AST], alkaline phosphatase [ALP], albumin and platelets) along with one combination of NIT results (FIB-4 only, FIB-4 plus ELF, FIB-4 plus VCTE, or FIB-4 plus ELF and VCTE). Laboratory and NIT results were provided as specific values rather than 'above' or 'below' cut-offs. For VCTE, data included the probe type used (M or XL) and the interquartile range/median ratio. There were no missing data. Using the available clinical and laboratory information, each hepatologist independently staged fibrosis, categorising patients as F0-2 vs. F3-4. They were blinded to the histology and each other's results. The exercise was conducted four times with the same patients randomly ordered, firstly using FIB-4 alone, then FIB-4 plus ELF, then FIB-4 plus VCTE and finally with all three NITs (FIB-4, ELF and VCTE). Clinicians performed the fibrosis assessment in batches according to their individual preference.
Overall, it took clinicians approximately 2 h to complete each dataset of 230 patients (i.e. approximately 8 h to score all four datasets).
We also assessed the performance of purely algorithmic staging of F0-2 vs. F3-4 using the following NIT permutations: (i) FIB-4 score alone using a single cut-off of 1.3; (ii) ELF alone using single cut-offs of 9.5 or 9.8; (iii) VCTE alone using single cut-offs of 8, 10 or 12 kPa; (iv) sequential use of FIB-4 with a single cut-off of 1.3 followed by VCTE (i.e. those with FIB-4 < 1.3 have F3-4 ruled out and those with FIB-4 ≥ 1.3 have VCTE utilised) using cut-offs of 8, 10 and 12 kPa; (v) the sequential use of FIB-4 using a single cut-off of 1.3 followed by ELF using cut-offs of

RESULTS
The demographic and clinical data for the 230 patients are shown in Table 1. Table 2 shows the concordance between histology and CFA or algorithmic NIT fibrosis assessments for F0-2 vs. F3-4. Overall, concordance between the individual CFAs and histology varied across all the test combinations. The largest variability was seen when clinicians used the FIB-4 score alone, where concordance with histology varied between 70% and 80% (median 74%).
Purely algorithmic approaches tended to perform more consistently than the CFAs, with sequential use of FIB-4 followed by VCTE having the highest concordance with histology at 84%. When using only a single test to stage fibrosis, ELF had the highest concordance with histology at 82%. Interestingly, only one clinician had marginally higher concordance with histology (85%) than the best performing algorithmic approach, both when using FIB-4 plus VCTE and when using all three NITs.
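To make the reported quantities concrete, the following is a minimal sketch of how concordance (diagnostic accuracy), sensitivity, specificity and Cohen's kappa can be derived from a 2×2 table of a staging method against histology. The function name and the counts are hypothetical and chosen only for easy arithmetic; they are not the study's data, and the study itself used SPSS.

```python
def concordance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summarise a 2x2 table of a staging method versus histology.

    tp: F3-4 on both; fp: method F3-4, histology F0-2;
    fn: method F0-2, histology F3-4; tn: F0-2 on both.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n          # concordance with histology
    sensitivity = tp / (tp + fn)      # F3-4 cases correctly identified
    specificity = tn / (tn + fp)      # F0-2 cases correctly identified

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_both_f34 = ((tp + fp) / n) * ((tp + fn) / n)
    p_both_f02 = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_both_f34 + p_both_f02
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "kappa": kappa,
    }


# Made-up counts for illustration only (not from Table 2).
metrics = concordance_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these illustrative counts, accuracy is 0.80 but kappa is about 0.60, which shows why the two measures are reported separately: kappa discounts the agreement that would be expected by chance alone.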

DISCUSSION
The overall aim of this study was to assess how reliable clinicians are at staging liver fibrosis using NITs in patients with MASLD. Six experienced hepatologists were provided with basic clinical information, liver blood tests and combinations of NIT results (FIB-4 alone, then FIB-4 plus ELF, then FIB-4 plus VCTE and finally all three of FIB-4, ELF and VCTE) and were asked to stage liver fibrosis for patients with biopsy-proven MASLD. Overall, we found good concordance between the clinicians when staging F0-2 vs. F3-4 with all NIT combinations (kappa range: 0.64-0.70). However, despite this apparent good concordance between the clinicians, we identified marked variability between the individual CFAs and histology for F0-2 vs. F3-4.
Moreover, there were notable differences in sensitivity and specificity between the hepatologists, with some taking an approach that had high specificity (few false positives) but low sensitivity (many false negatives) and others taking the opposite approach. Most of the clinicians had higher concordance with histology when they used FIB-4 plus VCTE or ELF, or all three tests, rather than FIB-4 alone. This likely reflects prior experience using combinations of NITs to stage fibrosis. 18 Moreover, FIB-4 typically yields indeterminate results in a significant proportion of patients, 9 and it is widely accepted that second-line testing is required for these patients. 5,11
Another key finding in this study was that algorithmic approaches tended to perform more accurately than the CFAs. Overall, algorithmic sequential use of FIB-4 followed by VCTE using cut-offs of 8 or 10 kPa had the highest numerical concordance with histology at 83%-84%, with the 8 kPa cut-off being a little more sensitive, but slightly less specific, than 10 kPa. ELF was the best performing single test at 82%. Only one of the hepatologists had marginally higher concordance (85%) than the best performing algorithmic approach.
Given the large burden of undiagnosed liver disease in the community, there is a need to develop standardised NIT pathways in primary care to efficiently identify patients with advanced liver fibrosis. Automated systems, such as the intelligent liver function test (iLFT), which was developed to aid diagnosis and staging of liver disease among patients with raised liver blood tests in primary care, have proven highly effective at improving liver disease diagnosis and reducing variability in care. 19 Our findings support the integration of algorithms using NITs into automated pathways for the diagnosis and staging of fibrosis.
This study has some limitations. Firstly, it was conducted using a selected cohort that had a high prevalence of advanced fibrosis. However, it was essential to use a well-characterised cohort to perform this analysis and, typically, secondary care cohorts of patients with biopsy-proven MASLD are enriched with advanced disease. Secondly, histology was used as the reference standard, and it is generally recognised that histology is an imperfect standard due to sampling variability and inter- and intra-observer variability. 20 It is therefore possible that some of the patients whose fibrosis assessment was not concordant with histology were more accurately staged with NITs. This is an inherent problem with all studies that use histology as the reference. However, the raw performance of the NITs was high in this study, indicating that the histological assessment was likely to have been robust. Thirdly, some of the tests used, particularly FIB-4, have an indeterminate zone where guidelines recommend use of a second-line test. We were keen to determine how effective a CFA using only FIB-4 was, as second-line NITs are not available in some areas. Fourthly, ELF is not routinely available in our region, so some of the hepatologists were less familiar with its use, which may have affected their performance when using it for the CFAs. Moreover, it remains unclear whether these results can be extrapolated to clinicians from other settings such as primary care or those less familiar with the NITs assessed.
In conclusion, adhering to the recommended algorithmic NIT approaches to stage fibrosis tended to perform more accurately than less-structured 'clinical' NIT-based assessments conducted by consultant hepatologists. Adoption of algorithmic approaches will reduce variability in patient care and help conserve healthcare resources by focussing them on those who need them most.

9.5 or 9.8; (vi) sequential use of FIB-4 using two cut-offs (1.3 and 2.67) followed by VCTE (8 kPa) or ELF (9.5 or 9.8) (i.e. those with FIB-4 < 1.3 have F3-4 ruled out, those with FIB-4 ≥ 2.67 have F3-4 diagnosed, and cases with indeterminate FIB-4 have VCTE or ELF utilised); (vii) the combined use of FIB-4 and ELF (patients with FIB-4 < 1.3 and ELF < 9.8 had F3-4 excluded and those with either or both test results above the cut-off were staged as F3-4); or (viii) sequential use of FIB-4, ELF and VCTE (advanced fibrosis was excluded with FIB-4 < 1.3, or with FIB-4 ≥ 1.3 and ELF < 9.8, or with FIB-4 ≥ 1.3, ELF ≥ 9.8 and VCTE < 8 kPa, while advanced fibrosis was diagnosed with FIB-4 ≥ 1.3, ELF ≥ 9.8 and VCTE ≥ 8 kPa).
All statistical analyses were performed using SPSS software version 28.0 (IBM). Continuous normally distributed variables were represented as mean ± SD. Categorical and non-normal variables were summarised as percentage or median and interquartile range. Histology was the reference standard for determining fibrosis stage (F0-2 vs. F3-4). The diagnostic performance of the raw NITs for F3-4 was assessed by receiver operating characteristic (ROC) curve analysis. Concordance between the six hepatologists' CFAs was assessed using the Fleiss multi-rater kappa test. Concordance between histology and the CFAs or algorithmic staging methods was assessed by diagnostic accuracy and Cohen's kappa test. Sensitivity, specificity, false positive and false negative rates were displayed for each modality.
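The sequential staging rules described above lend themselves to a short illustration. The sketch below encodes permutation (iv), a single FIB-4 cut-off followed by VCTE, and permutation (vi), two FIB-4 cut-offs followed by a second-line test. The function and constant names are hypothetical, and treating a second-line value equal to the cut-off as F3-4 is an assumption based on the "≥" convention used in permutation (viii).

```python
F0_2 = "F0-2"
F3_4 = "F3-4"


def stage_single_cutoff_then_vcte(fib4: float, vcte_kpa: float,
                                  vcte_cutoff: float = 8.0,
                                  fib4_cutoff: float = 1.3) -> str:
    """Permutation (iv): FIB-4 below the cut-off rules out F3-4;
    otherwise the VCTE result decides."""
    if fib4 < fib4_cutoff:
        return F0_2
    return F3_4 if vcte_kpa >= vcte_cutoff else F0_2


def stage_two_cutoffs_then_second_line(fib4: float,
                                       second_line_value: float,
                                       second_line_cutoff: float,
                                       low: float = 1.3,
                                       high: float = 2.67) -> str:
    """Permutation (vi): FIB-4 < low rules out F3-4, FIB-4 >= high
    diagnoses F3-4, and indeterminate results go to VCTE or ELF."""
    if fib4 < low:
        return F0_2
    if fib4 >= high:
        return F3_4
    return F3_4 if second_line_value >= second_line_cutoff else F0_2


# Illustrative values only (not study data).
print(stage_single_cutoff_then_vcte(fib4=1.8, vcte_kpa=9.0))            # F3-4
print(stage_single_cutoff_then_vcte(1.8, 9.0, vcte_cutoff=10.0))        # F0-2
print(stage_two_cutoffs_then_second_line(2.0, 9.6, second_line_cutoff=9.5))  # F3-4
```

The second call shows how the same patient can be reclassified simply by moving the VCTE cut-off from 8 to 10 kPa, which is the sensitivity versus specificity trade-off that the choice of cut-off controls.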

Table 1.
The median age of the cohort was 54 [range: 22-78] years