The threshold of alpha-fetoprotein (AFP) for the diagnosis of hepatocellular carcinoma: A systematic review and meta-analysis

Objective Hepatocellular carcinoma (HCC) has become a pressing global health problem because of its high morbidity, high mortality, and late discovery. Although alpha-fetoprotein (AFP) is used as a diagnostic criterion for HCC, its exact threshold is controversial. This study therefore aimed to systematically estimate the performance of AFP in diagnosing HCC and to clarify its optimal threshold. Methods The Medline and Embase databases were searched for articles indexed up to November 2019. English-language studies were included if they provided both the sensitivity and the specificity of AFP in the diagnosis of HCC. Basic information and accuracy data were extracted from the included studies. Combined estimates of sensitivity and specificity were calculated with a random-effects model using MetaDisc 1.4 and Stata 15.0 at the prespecified thresholds of 400 ng/mL, 200 ng/mL, and the range of 20–100 ng/mL. The optimal threshold was evaluated by the area under the curve (AUC) of the summary receiver operating characteristic (SROC) curve. Results We retrieved 29,828 articles and included 59 studies and 1 review, with a total of 11,731 HCC cases confirmed by histomorphology and 21,972 control cases without HCC. The included studies showed an overall judgment of at risk of bias. Four studies with an AFP threshold of 400 ng/mL showed a summary sensitivity and specificity of 0.32 (95%CI 0.31–0.34) and 0.99 (95%CI 0.98–0.99), respectively. Four studies with an AFP threshold of 200 ng/mL showed a summary sensitivity and specificity of 0.49 (95%CI 0.47–0.50) and 0.98 (95%CI 0.97–0.99), respectively. Forty-six studies with an AFP threshold of 20–100 ng/mL showed a summary sensitivity and specificity of 0.61 (95%CI 0.60–0.62) and 0.86 (95%CI 0.86–0.87), respectively. The AUC of the SROC curve and the Q index for the 400 ng/mL threshold were 0.9368 and 0.8734, respectively, which were significantly higher than those for the 200 ng/mL threshold (0.9311 and 0.8664, respectively) and higher than those for the 20–100 ng/mL threshold (0.8330 and 0.7654, respectively). Similar results favoring 400 ng/mL were found when AFP was combined with ultrasound. Conclusion Serum AFP levels showed good accuracy in HCC diagnosis, and the AFP threshold of 400 ng/mL was better than that of 200 ng/mL in terms of sensitivity and specificity, whether AFP was used alone or combined with ultrasound.

Introduction
Hepatocellular carcinoma (HCC) remains one of the most invasive cancers in humans. It occurs mostly in patients with chronic liver disease and is the third leading cause of cancer-related death worldwide [1]. Although its causes, prevention, and treatment strategies are addressed in guidelines, HCC is expected to become a pressing global health problem in the coming decades [1,2]. Although researchers are making strides in HCC monitoring and treatment, survival in patients with HCC has improved little. In the United States, the 5-year survival rate of patients with HCC is still less than 12% [3]. Effective therapies are very limited for advanced HCC, in which the survival rate decreases significantly [4], whereas several treatments are available for early-stage HCC, such as radical resection or liver transplantation; the 5-year survival rate after liver transplantation in HCC patients who met the Milan criteria (a single nodule < 5 cm or up to three nodules, each < 3 cm in diameter) was more than 70% [5,6]. Early discovery of HCC is therefore important, and early detection of HCC has been reported to improve clinical outcomes [7]. Based on the evidence of benefit from early detection, guidelines from American, Asian Pacific, and Japanese associations recommend HCC monitoring in high-risk patients for early diagnosis [8][9][10][11].
Serum alpha-fetoprotein (AFP) is a currently available diagnostic marker for HCC detection. In patients with chronic liver disease, a sustained increase in the serum AFP level has been shown to be one of the risk factors for HCC and has been used to help identify high-risk subgroups [12]. In patients with liver cirrhosis, fluctuations in AFP levels may reflect the sudden onset of viral hepatitis, the deterioration of the underlying liver disease, or the development of HCC [13]. In addition, the AFP level has been reported to be associated with some molecular subtypes of invasive HCC, such as EpCAM-positive HCC [14][15][16]. Because multiple factors can contribute to the AFP level, identifying the threshold is difficult. With a cutoff value of 20 ng/mL, detection showed relatively good sensitivity but poor specificity, whereas with a cutoff value of 200 ng/mL, specificity was high but sensitivity decreased significantly [17]. In the 2001 and 2017 Chinese diagnostic and staging standards for HCC, AFP 400 ng/mL was used as the diagnostic threshold [18]. However, a meta-analysis [19] showed that the diagnostic efficiency of AFP ≥ 200 ng/mL may be higher, partly because some early HCC [20] may be missed in the population with low AFP concentrations (20 to 200 ng/mL) if 400 ng/mL is still used as the criterion in HCC screening. The optimal AFP threshold for the diagnosis of HCC therefore remains controversial [21][22][23].
In addition, AFP combined with ultrasound has been reported to improve the detection rate of HCC [24]. Both the American Association for the Study of Liver Diseases (AASLD) and the European Association for the Study of the Liver (EASL) suggest that high-risk patients should be monitored for HCC partly by abdominal ultrasonography every six months, but the use of AFP as an auxiliary monitoring test is debated, and no AFP threshold has been identified for monitoring HCC with the combination of AFP and ultrasound [25,26].
Therefore, it is particularly important to explore the optimal screening and diagnostic threshold of serum AFP, with or without ultrasound, for the early diagnosis of HCC. The purpose of this study was to identify the optimal diagnostic threshold of serum AFP by systematic review and meta-analysis. The study was performed according to the Meta-Analysis of Observational Studies in Epidemiology (MOOSE) guidelines and is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement [27,28]; the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to evaluate the quality of the diagnostic test studies [29].

Results
We retrieved 29,828 records from the database search and assessed 21,464 records after removing duplicates. Finally, 59 original articles on AFP alone and one systematic review on AFP in combination with ultrasound were included in the data synthesis, as shown in Fig 1. This systematic review yielded information on a total of 11,731 HCC cases confirmed by histomorphology and 21,972 control cases without HCC.

Basic information and quality assessment
The basic information of the included studies is shown in Table 1. In all, we summarized the results from 4 studies using an AFP threshold of 400 ng/mL, 4 studies using an AFP threshold of 200 ng/mL, and 46 studies using an AFP threshold of 20-100 ng/mL. Forty-three studies measured AFP in serum, while the remaining studies used plasma. The 59 included studies were conducted in diverse countries: China (n = 15), USA (n = 11), Japan (n = 9), Korea (n = 8), Egypt (n = 5), Italy (n = 2), Thailand (n = 2), France (n = 2), South Africa (n = 1), Turkey (n = 1), India (n = 1), Germany (n = 1), Indonesia (n = 1), and Australia (n = 1). Thirty-seven studies used samples from Asian populations, while twenty-three used samples from Caucasian populations. As for the etiology of HCC, 16 studies [49,51,52,55,57,59,62,64,66,69,73,75,79-81,86] covered only HBV or HCV hepatitis, the etiology was not available for one study, and the remaining 42 studies [30-48, 50, 53, 54, 56, 58, 60, 61, 63, 65, 68, 70-72, 74, 76-78, 82-84, 86, 87] had mixed etiologies, including HBV infection, HCV infection, alcohol, and others. Table 2 shows the estimates of sensitivity and specificity and the numbers of true positives, false positives, false negatives, and true negatives for AFP in HCC diagnosis. The quality assessment with the QUADAS-2 tool revealed an overall judgment of low risk of bias for the included studies (S3 Table). Specifically, the domains of patient selection, index test, and flow and timing showed a low risk of bias, the reference standard domain showed a potential for bias, and the applicability concerns were rated as low.

Meta-analysis of diagnostic accuracy estimates
As shown in Table 3 and Figs 2-4, four studies with an AFP threshold of 400 ng/mL showed a summary sensitivity and specificity of 0.32 (95%CI 0.31-0.34) and 0.99 (95%CI 0.98-0.99), respectively. Eighteen studies evaluated the 400 ng/mL threshold plus ultrasound. With AFP alone as the marker, the specificity of the 400 ng/mL threshold was the highest (99.0%), but its sensitivity was the lowest (32.0%). The specificity of the 200 ng/mL threshold was 1.0 percentage point lower than that of 400 ng/mL, but its sensitivity rose to 49.0%, and its dOR was the highest (42.06). The 20-100 ng/mL threshold had the greatest sensitivity (61.0%), but its specificity and dOR were lower than those of 200 ng/mL and 400 ng/mL.
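The trade-off described above can be checked with simple arithmetic. The following is a minimal sketch, not the pooled analysis itself: it takes the rounded summary sensitivity and specificity for AFP alone and derives the likelihood ratios and diagnostic odds ratio they imply. The function name `derived_measures` is illustrative, and because the inputs are rounded summary points, the derived dOR values will not reproduce the pooled dOR reported in Table 3.

```python
# Rounded summary (sensitivity, specificity) for AFP alone, as reported in the text.
summary = {
    "400 ng/mL": (0.32, 0.99),
    "200 ng/mL": (0.49, 0.98),
    "20-100 ng/mL": (0.61, 0.86),
}


def derived_measures(sensitivity, specificity):
    """Return (+LR, -LR, dOR) implied by one sensitivity/specificity pair."""
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    diagnostic_or = positive_lr / negative_lr
    return positive_lr, negative_lr, diagnostic_or


results = {name: derived_measures(se, sp) for name, (se, sp) in summary.items()}

for name, (plr, nlr, dor) in results.items():
    print(f"{name}: +LR={plr:.1f}, -LR={nlr:.2f}, dOR={dor:.1f}")
```

Under these rounded inputs, the two higher thresholds give similar implied dOR values, while the 20-100 ng/mL range gives a clearly lower one, which matches the direction of the pooled results.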

Threshold identification by SROC analysis
As shown in Table 3 and Fig 5, the AUC of the SROC curve and the Q index for the 400 ng/mL threshold were 0.9368 and 0.8734, respectively, which were significantly higher than those for the 200 ng/mL threshold (0.9311 and 0.8664, respectively) and higher than those for the 20-100 ng/mL threshold (0.8330 and 0.7654, respectively). Similarly, when AFP was combined with ultrasound, the AUC of the SROC curve and the Q index for the 400 ng/mL threshold were 0.9394 and 0.8767, respectively, which were significantly higher than those for the 200 ng/mL threshold (0.9359 and 0.8723, respectively) and higher than those for the 20-100 ng/mL threshold (0.8464 and 0.7778, respectively).
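The link between the Q index and the AUC can be illustrated numerically. The sketch below assumes a symmetric SROC curve with a constant diagnostic odds ratio; whether the software fitted exactly this model is an assumption, not something stated in the text. Under that assumption, the Q index is the point where sensitivity equals specificity, so dOR = (Q / (1 - Q))^2, and the AUC has the closed form dOR / (dOR - 1)^2 × ((dOR - 1) - ln dOR). The function names are illustrative.

```python
import math


def dor_from_q(q_index):
    """Constant dOR implied by the Q index on a symmetric SROC curve."""
    odds = q_index / (1.0 - q_index)  # at Q, sensitivity == specificity == q_index
    return odds * odds


def auc_symmetric_sroc(dor):
    """Area under a symmetric SROC curve with a constant diagnostic odds ratio."""
    if math.isclose(dor, 1.0):
        return 0.5  # the curve coincides with the chance diagonal
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))


# (AUC, Q index) pairs for AFP alone, as reported in the text.
reported = {
    "400 ng/mL": (0.9368, 0.8734),
    "200 ng/mL": (0.9311, 0.8664),
    "20-100 ng/mL": (0.8330, 0.7654),
}

for name, (auc, q_index) in reported.items():
    implied_dor = dor_from_q(q_index)
    implied_auc = auc_symmetric_sroc(implied_dor)
    print(f"{name}: implied dOR={implied_dor:.2f}, implied AUC={implied_auc:.4f}, reported AUC={auc:.4f}")
```

The implied AUC values agree closely with the reported ones, which suggests the reported AUC and Q index pairs are mutually consistent under this symmetric model.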

Heterogeneity test and meta-regression analysis
There was no heterogeneity between the groups with different thresholds (p > 0.05), as shown in Table 4. However, heterogeneity in sensitivity, specificity, +LR, -LR, and dOR existed within the threshold groups, as shown in Table 5. This heterogeneity may be related to the diversity of the selected populations, which included hepatitis B (HBV), hepatitis C (HCV), and mixed cases, as well as to diverse detection methods, instruments, reagents, and standards. However, only a few indicators of potential heterogeneity sources (control, year, country, sample type, assay type, and etiology [HBV, HCV, or MIX]) could be extracted from the included articles. A P-value > 0.10 was regarded as indicating homogeneity [88], and none of these indicators had a statistically significant effect on heterogeneity in the three groups (P > 0.10), as shown in Table 6.
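To make the notion of within-group heterogeneity concrete, the following minimal sketch computes Cochran's Q, I², and a DerSimonian-Laird random-effects estimate for the log diagnostic odds ratio. The 2 × 2 counts are hypothetical and are not taken from the included studies; adding 0.5 to every cell is a simplifying assumption, and the function names are illustrative rather than the routines used by MetaDisc or Stata.

```python
import math

# Hypothetical (TP, FP, FN, TN) counts; NOT data from the included studies.
studies = [(45, 3, 55, 97), (60, 5, 40, 95), (30, 1, 70, 99), (52, 8, 48, 92)]


def log_dor_and_variance(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance, adding 0.5 to every cell."""
    tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2 (percent), and the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - df) / c)
    return q, i_squared, tau_squared


def random_effects_pool(effects, variances, tau_squared):
    """Inverse-variance weighted mean with between-study variance added."""
    weights = [1.0 / (v + tau_squared) for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


effects, variances = zip(*(log_dor_and_variance(*s) for s in studies))
q, i_squared, tau_squared = heterogeneity(effects, variances)
pooled_dor = math.exp(random_effects_pool(effects, variances, tau_squared))
print(f"Q={q:.2f}, I^2={i_squared:.1f}%, tau^2={tau_squared:.3f}, pooled dOR={pooled_dor:.1f}")
```

A large Q relative to its degrees of freedom (or a high I²) indicates that the studies in a group differ by more than sampling error alone, which is the situation meta-regression is then used to explore.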

Discussion
The disagreement among international guidelines over the AFP threshold for HCC diagnosis has continued for several decades and has not yet been resolved. This article comprehensively reviewed the evidence for the AFP threshold, and the results showed that the AFP threshold of 400 ng/mL yielded a summary sensitivity of 0.32 and a summary specificity of 0.99. It is well established that the AFP level is an important diagnostic marker for the early diagnosis of HCC because of its good sensitivity and specificity. However, besides HCC, other tumors, such as tumors of the reproductive system, can raise AFP levels; in addition, liver cell regeneration after acute inflammation can cause a sharp increase in AFP levels during the course of chronic liver diseases such as hepatitis and liver cirrhosis [89][90][91]. Therefore, further laboratory examinations and imaging tests should be combined with the AFP result to make a definite diagnosis [92,93]. Because of this, the AFP threshold for the diagnosis of HCC is still controversial.
AFP ≥ 400 ng/mL is recommended as the diagnostic criterion for HCC in the Chinese guideline for diagnosis and treatment of primary liver cancer (2017 edition) [94]. Nevertheless, Cedrone et al. [95] reported that the AFP level in patients with HCC was not affected by HBV or HCV and that a better threshold of serum AFP would be 50 ng/mL. Xu Jianye et al. [96] proposed that a diagnostic AFP threshold of 150 ng/mL showed better efficacy for HCC. Moreover, Zhang Jianhua et al. [20] showed that a low AFP concentration in the range of 20-200 ng/mL could be used for early screening in the high-risk population and could also be combined with ultrasound. However, the 2011 AASLD HCC guidelines no longer use AFP as a screening method for HCC [97]. It is worth noting that, unlike in the United States, the major cause of HCC in other countries such as China is viral hepatitis, so dynamic surveillance of the AFP level together with ultrasound for screening the HCC high-risk population [98] retains considerable clinical value [99,100]. What the next versions of the American, European, Asian-Pacific, and Chinese guidelines should address is the AFP threshold for each phase of HCC management.
This meta-analysis has strengths and limitations. This systematic review included 59 articles with a total of 11,731 HCC cases and 21,972 non-HCC cases, and to date it summarizes the evidence from the largest number of studies and participants, representing varied populations from around the world. All positive and negative cases in this review were confirmed by histomorphology, which ruled out misclassification bias, and the included studies showed a low risk of bias. However, the review is not without limitations. The articles in this meta-analysis were restricted to English-language publications, so studies published in other languages might have been missed. Notably, this review included 20,732 cases from Asia, 630 cases from Africa, 5,924 cases from Europe, and 8,666 cases from North America, so selection bias is possible when the conclusions are generalized to the whole population; however, the meta-regression of heterogeneity sources did not find any significant difference between countries.
Furthermore, we detected considerable heterogeneity within the three threshold groups, and the meta-regression model did not identify any statistically significant source of heterogeneity. The number of studies was also potentially imbalanced across the three threshold groups.
In conclusion, the present meta-analysis suggests that AFP levels show good accuracy in HCC diagnosis and that the AFP threshold of 400 ng/mL is better than 200 ng/mL and 20-100 ng/mL in terms of sensitivity and specificity, whether AFP is used alone or combined with ultrasound. Although the included studies showed a low risk of bias and there was no suggestion of publication bias, heterogeneity existed within groups, which might lead to different thresholds across geographic regions. Despite the current conclusion that an AFP threshold of 400 ng/mL should be used for the diagnosis of HCC, a threshold of 20 ng/mL, with its high sensitivity, could also be used to decide whether a patient should enter an HCC surveillance program. Future studies should pay more attention to the dynamic change of AFP as HCC advances, where artificial intelligence might be applied to build a model that predicts the prognosis of HCC.

Materials and methods
This systematic review was performed according to the MOOSE guidelines and reported in accordance with the PRISMA statement [27,28]. The protocol was registered at PROSPERO (CRD42019133742, http://www.crd.york.ac.uk/PROSPERO).

Search strategy and article screening
The Medline and EMBASE databases were searched from inception to November 2019 with the following terms: "alpha-Fetoproteins or AFP" AND "Carcinoma, Hepatocellular or Hepatocellular Carcinomas or Liver Cell Carcinoma" (the detailed search strategy is described in S1 Table and S2 Table). In addition, we reviewed the references of the identified articles for further potential studies. Two reviewers independently screened the titles and abstracts of all retrieved records to find potentially appropriate studies and then read the full text of the remaining records to identify studies suitable for data synthesis. Any disagreement was resolved by consensus or by an arbitrator.

Inclusion criteria
We finally included original articles that met the following criteria:
1. The study was a diagnostic accuracy study.
2. The study included both patients with HCC confirmed by pathological diagnosis (gold standard) as the case group and patients clinically diagnosed as not having liver cancer as the control group.
3. Indicators to be evaluated in the study included AFP.
4. There was a definite AFP measurement value in the article.

Information extraction and quality assessment
Basic information on each included study was extracted by two reviewers independently. Two reviewers also independently used QUADAS-2 to evaluate the quality of the diagnostic test literature [29]. The evaluation tool covers three aspects (variation, bias, and report quality) and eleven items, and each item is answered with one of three choices: "Yes," "No," and "Unclear." "Yes" means the study meets the criterion, "No" means the criterion is not satisfied or not mentioned, and "Unclear" means the criterion is partially satisfied or sufficient information cannot be obtained from the literature.

Data extraction and statistical processing
The 2 × 2 diagnostic table data (TN, FN, FP, and TP) were extracted from the included studies, and MetaDisc 1.4 and Stata 15.0 were used for statistical processing. A random-effects model was applied to summarize the accuracy estimates if there was heterogeneity, and a fixed-effect model was applied if there was not. We calculated summary estimates of sensitivity, specificity, diagnostic odds ratio (dOR), positive likelihood ratio (+LR), and negative likelihood ratio (-LR). A summary receiver operating characteristic (SROC) curve was also plotted, and the area under the curve (AUC) and the Q* index were used to determine the threshold. Meta-analysis was used to obtain the combined values of the accuracy indicators and their 95%CI; the significance level was α = 0.05. Heterogeneity caused by a threshold effect was examined by Spearman correlation analysis, and heterogeneity in sensitivity and specificity was examined by the chi-square test. Heterogeneity in -LR and +LR was examined by the Cochran Q test. Meta-regression analysis was used to detect the contributors to the heterogeneity. Deeks' funnel plot was used to assess publication bias, and a slope coefficient with p < 0.10 indicated significant bias.
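The threshold-effect check mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the 2 × 2 counts are hypothetical, the check is taken to be a Spearman correlation between the logit of sensitivity and the logit of (1 - specificity), and the helper names are illustrative. No continuity correction is applied because the hypothetical tables contain no zero cells.

```python
import math

# Hypothetical (TP, FP, FN, TN) counts for five studies; illustrative only.
tables = [(30, 2, 70, 98), (45, 5, 55, 95), (55, 9, 45, 91), (62, 15, 38, 85), (70, 22, 30, 78)]


def logit(p):
    return math.log(p / (1.0 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


logit_tpr = [logit(tp / (tp + fn)) for tp, fp, fn, tn in tables]
logit_fpr = [logit(fp / (fp + tn)) for tp, fp, fn, tn in tables]
rho = spearman(logit_tpr, logit_fpr)
print(f"Spearman rho between logit(TPR) and logit(FPR): {rho:.2f}")
```

A strong positive correlation, as in these hypothetical tables where sensitivity and false positive rate rise together, is the pattern expected when studies differ mainly in the cutoff they apply.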
Supporting information
S1 Prisma. PRISMA-P (Preferred reporting items for systematic review and meta-analysis protocols) 2015 checklist: Recommended items to address in a systematic review protocol.