Elevated Urinary Neutrophil Gelatinase-Associated Lipocalin Is a Biomarker for Lupus Nephritis: A Systematic Review and Meta-Analysis

Objective. Lupus nephritis (LN) is a major and severe complication of systemic lupus erythematosus (SLE). Neutrophil gelatinase-associated lipocalin (NGAL), a promising next-generation biomarker in clinical nephrology, has received extensive attention. However, its diagnostic performance in LN is highly variable. Therefore, we performed an updated meta-analysis to further evaluate the diagnostic accuracy of urinary NGAL (uNGAL). Materials and Methods. PubMed, Embase, and Cochrane Library were searched from inception to October 27, 2019. Meta-analysis was performed with a bivariate random effects model. Additionally, summary receiver operating characteristic (SROC) curves were constructed. The sources of heterogeneity were explored by meta-regression, subgroup analysis, and sensitivity analysis. Publication bias was assessed using the Deeks test. Results. Nineteen articles comprising 21 eligible studies were included. In diagnosing LN, the estimates (95% confidence interval (CI)) were as follows: sensitivity, 0.84 (0.71-0.91); specificity, 0.91 (0.70-0.98); and the SROC-AUC value, 0.92 (0.90-0.94). In identifying active LN, the estimates were as follows: sensitivity, 0.72 (0.56-0.84); specificity, 0.71 (0.51-0.84); and the AUC value, 0.77 (0.74-0.81). With respect to predicting renal flare, the estimates were as follows: sensitivity, 0.80 (0.57-0.92); specificity, 0.67 (0.58-0.75); and the AUC value, 0.74 (0.70-0.78). For the studies distinguishing proliferative LN, the estimates were as follows: sensitivity, 0.87 (0.66-0.97), and specificity, 0.69 (0.39-0.91). Deeks' funnel plot suggested that there was no significant publication bias. Conclusions. Our meta-analysis indicates that uNGAL is a useful biomarker for diagnosis, estimation of activity, and prediction of renal flare of LN. In addition, the usefulness of uNGAL in distinguishing pathological types of LN needs to be further investigated.


Introduction
Systemic lupus erythematosus (SLE) is a complex multisystem autoimmune disease characterized by the production of numerous antibodies to cellular components and marked by complicated manifestations, ranging from detectable laboratory abnormalities to multiorgan inflammation and failure [1]. Lupus nephritis (LN), a major risk factor for morbidity and mortality in SLE [2], is a real challenge in the management of SLE because effective methods for diagnosing subclinical onset and identifying relapses are lacking. Neutrophil gelatinase-associated lipocalin (NGAL, also known as lipocalin-2) is a 25 kDa lipocalin originally purified from human neutrophils [3]. NGAL is an acute-phase glycoprotein secreted in small amounts by neutrophils, epithelial cells, macrophages, hepatocytes, adipocytes, and neurons under physiological conditions, and its expression is significantly increased in response to cellular stress [4]. An elevated level of NGAL is associated with injury to epithelial cells in the gastrointestinal tract, respiratory tract, or renal tubules [5]. Its relatively small size, secreted nature, and reliable stability have made it a valuable diagnostic and prognostic biomarker in multiple diseases, including acute or chronic kidney diseases [6-8], sepsis [9], cardiovascular diseases [10,11], inflammatory bowel diseases [4], and cancer [12,13]. NGAL can be detected in both serum and urine.
Urinary biomarkers seem to be more promising than serum biomarkers in the diagnosis of kidney diseases, as the former are derived directly from the inflamed tissue [14].
A previous meta-analysis published in 2015 suggested that uNGAL was a potential biomarker in diagnosing LN and monitoring LN activity [15], but it included relatively few eligible studies and did not provide evidence about the role of NGAL in identifying proliferative LN. With evidence accumulating since then, a systematic review and updated meta-analysis is needed to further address the usefulness of uNGAL for diagnosis, monitoring, and prediction of LN.

Materials and Methods
2.1. Literature Search. This systematic review and meta-analysis was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) consensus statement [16]. Two independent reviewers conducted a comprehensive literature search of the electronic databases PubMed, Embase, and Cochrane Library up to October 27, 2019. Search strategies included Medical Subject Heading (MeSH) terms and keywords. The MeSH terms were "lupus erythematosus, systemic" and "lupus nephritis". The keywords included "lupus", "SLE", "LN", "neutrophil gelatinase-associated lipocalin", "NGAL", and "lipocalin". We also searched combinations of MeSH terms and keywords. No language restrictions were applied. In addition, we manually searched the reference lists of eligible papers to identify additional relevant studies. The detailed literature search methods are presented in supplementary Table 1 (Table S1).

2.2. Study Selection. The retrieved articles were evaluated by two independent reviewers. Unrelated articles were excluded after reading their titles and abstracts. If an article was relevant to our research topic, the full text was carefully read to determine whether it met the inclusion or exclusion criteria. Discrepancies were resolved by discussion or by consulting a third investigator. Articles were included if the studies fulfilled the following criteria: (1) the studies were observational studies; (2) the patients were diagnosed according to the American College of Rheumatology (ACR) or Systemic Lupus International Collaborating Clinics (SLICC) classification criteria for SLE; (3) the studies evaluated the diagnostic accuracy of uNGAL concentration in distinguishing LN vs. non-LN, active LN vs. inactive LN, renal flare vs. no renal flare, or proliferative LN vs. nonproliferative LN; (4) the studies provided data from which true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) values could be directly found or calculated; and (5) urine samples were obtained as spot urine. The exclusion criteria were as follows: (1) studies that were duplicates; (2) studies that were reviews, case reports, meta-analyses, conference abstracts, or animal or cell experiments; (3) studies with irrelevant contents; and (4) studies that did not provide the TP, FP, FN, and TN values needed to form a 2 × 2 contingency table.
2.3. Data Extraction and Quality Assessment. Two authors independently extracted the data from all the eligible studies, and they were both blind to the relevant contents of the included studies to reduce bias. The following items were extracted from the included studies: (1) basic characteristics of the studies: first author's name, year of publication, study design, region, population type, mean age, percentage of female patients, ethnicity, the method for the NGAL assay, pathological classification criteria, and renal disease activity score, and (2) outcomes of the studies: the optimal cut-off threshold and TP/FP/FN/TN values which were extracted directly or calculated by the Review Manager Software version 5.3 (RevMan 5.3).
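When a study reports only sensitivity, specificity, and group sizes, the 2 × 2 cell counts have to be back-calculated. The following is a minimal sketch of that arithmetic, not a reproduction of the RevMan routine; the function name, the rounding rule, and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    tp: int
    fp: int
    fn: int
    tn: int


def reconstruct_two_by_two(sensitivity: float, specificity: float,
                           n_diseased: int, n_non_diseased: int) -> TwoByTwo:
    """Back-calculate TP/FP/FN/TN from reported accuracy and group sizes.

    Assumption: counts are rounded to the nearest integer, so the result
    is only as precise as the reported sensitivity and specificity.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    tp = round(sensitivity * n_diseased)        # diseased, test positive
    tn = round(specificity * n_non_diseased)    # non-diseased, test negative
    return TwoByTwo(tp=tp, fp=n_non_diseased - tn, fn=n_diseased - tp, tn=tn)


# Hypothetical study: 50 patients with LN, 40 without LN.
table = reconstruct_two_by_two(0.80, 0.90, n_diseased=50, n_non_diseased=40)
print(table)
```

Because of rounding, a reconstructed table should be checked against any counts the study does report.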
The quality of the included studies was assessed with the quality assessment tool for diagnostic accuracy studies 2 (QUADAS-2) [17]. The tool is composed of 4 key domains: patient selection, index test, reference standard, and flow and timing. For the evaluation of each domain, the following judgments were used: yes, no, low risk, high risk, and unclear risk. We assigned 1 point to "yes" or "low risk" and 0 points to "no," "high risk," or "unclear risk" and calculated the total score. RevMan 5.3 was used for the analysis of the risk of bias and applicability concerns.
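The scoring rule described above can be sketched as follows. This is an illustrative helper only: the function name and the example judgments are hypothetical, and the number of items scored per study is not specified here.

```python
from typing import Iterable

# Judgments worth 1 point and 0 points, as defined in the text.
ONE_POINT = {"yes", "low risk"}
ZERO_POINTS = {"no", "high risk", "unclear risk"}


def quadas2_total(judgments: Iterable[str]) -> int:
    """Sum the per-item points for one study."""
    total = 0
    for judgment in judgments:
        key = judgment.strip().lower()
        if key in ONE_POINT:
            total += 1
        elif key not in ZERO_POINTS:
            raise ValueError(f"unrecognized judgment: {judgment!r}")
    return total


# Hypothetical judgments for a single study.
example = ["yes", "no", "low risk", "unclear risk", "high risk", "Yes"]
print(quadas2_total(example))
```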

2.4. Statistical Analysis. The diagnostic meta-analysis was performed using Stata version 12.0 software (Stata Corporation, College Station, TX, USA) and Meta-DiSc version 1.4 (XI Cochrane Colloquium, Barcelona, Spain). Heterogeneity was estimated using Cochran's Q test and the I-squared (I²) statistic. I² values of 25, 50, and 75% were taken to indicate low, moderate, and high heterogeneity, respectively [18]. If the heterogeneity was significant (P_Q < 0.05 or I² > 50%), a random effects model (DerSimonian-Laird method) was used to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs); otherwise, a fixed effects model (Mantel-Haenszel method) was used. Moreover, forest plots of sensitivity and specificity and summary receiver operating characteristic (SROC) curves with area under the curve (AUC) values were presented. The SROC curve was used to assess whether there was a "shoulder-arm" pattern. A typical "shoulder-arm" pattern would indicate the presence of a threshold effect. Furthermore, the Spearman correlation coefficient between sensitivity and specificity was used to further evaluate the threshold effect. An AUC ≥ 0.70 was defined as a useful risk predictor [19]. Additionally, meta-regression and subgroup analysis were performed to explore the sources of heterogeneity among the included studies. We also performed a sensitivity analysis to examine the stability of our meta-analysis. In addition, publication bias was assessed by Deeks' funnel plot method, and values of p < 0.05 were considered statistically significant.
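To make the quantities above concrete, the sketch below computes per-study accuracy measures from a 2 × 2 table and pools an effect size with the DerSimonian-Laird estimator. It is a simplified univariate illustration, not the Stata or Meta-DiSc implementation; the function names, the 0.5 continuity correction for zero cells, and the example inputs are assumptions made for this example.

```python
import math
from typing import Dict, Sequence, Tuple


def accuracy_measures(tp: float, fp: float, fn: float, tn: float) -> Dict[str, float]:
    """Sensitivity, specificity, PLR, NLR, DOR and var(log DOR) for one study."""
    if 0 in (tp, fp, fn, tn):
        # Assumed convention: add 0.5 to every cell when any cell is zero.
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "plr": sens / (1.0 - spec),
        "nlr": (1.0 - sens) / spec,
        "dor": (tp * tn) / (fp * fn),
        "var_log_dor": 1 / tp + 1 / fp + 1 / fn + 1 / tn,
    }


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (pooled effect, standard error, Q, I^2 in percent, tau^2)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, i2, tau2


# Hypothetical study table (not taken from the included studies).
m = accuracy_measures(tp=40, fp=4, fn=10, tn=36)
print(m["sensitivity"], m["specificity"], m["dor"])
```

In practice the per-study effects would be, for example, log DOR values with `var_log_dor` as their variances, and the pooled value would be exponentiated back to the DOR scale.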

Quality Assessment.
According to the graph of risk of bias and applicability concerns (Figure 2), the included studies had a high risk of bias in terms of the index test, as well as flow and timing, but a low risk of bias in patient selection and reference standard. The unclear risk of bias and the concern regarding the applicability of the patient selection were introduced because 2 studies [32,34]

BioMed Research International
3.4.1. Part 1: The Diagnostic Accuracy for uNGAL to Identify LN. As shown in Figure 3 and Figure 4, the AUC value of the SROC curve was 0.92 (95% CI, 0.90-0.94). The points in the plots did not show a "shoulder-arm" shape, suggesting the absence of a threshold effect. Furthermore, the Spearman correlation coefficient between the logit of sensitivity and the logit of 1 − specificity of uNGAL was -0.117 (p = 0.765), also indicating that there was no threshold effect.
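The threshold-effect check described above can be sketched as a Spearman rank correlation between logit(sensitivity) and logit(1 − specificity), where a strong positive correlation would suggest a threshold effect. This is a standard-library illustration rather than the Meta-DiSc routine; the helper names and the example values are hypothetical, and no p value is computed.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_spearman(sens: Sequence[float], spec: Sequence[float]) -> float:
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sens]
    y = [logit(1.0 - s) for s in spec]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values showing a perfect trade-off: sensitivity rises as
# specificity falls, which is the pattern a threshold effect produces.
rho = threshold_spearman([0.6, 0.7, 0.8, 0.9], [0.9, 0.8, 0.7, 0.6])
print(rho)
```

Because Spearman's coefficient depends only on ranks, the logit transform does not change its value; it is kept here to mirror the description in the text.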

3.4.2. Part 2: The Diagnostic Accuracy for uNGAL to Identify Active LN. As shown in Figure 3 and Table 3, the overall pooled sensitivity and specificity for uNGAL to identify active LN were 0.72 (95% CI, 0.56-0.84) and 0.71 (95% CI, 0.51-0.84), respectively. In addition, the pooled PLR was

3.4.3. Part 3: The Diagnostic Accuracy for uNGAL to Predict Renal Flare. As shown in Figure 3 and Table 3 ... The heterogeneity detected in the pooled sensitivity and specificity was Q = 18.18 (I² = 72.50%, p < 0.001) and Q = 14.77 (I² = 66.15%, p = 0.01), respectively. As shown in Figure 4, the SROC-AUC value was 0.74 (95% CI, 0.70-0.78), and the SROC curve did not have a "shoulder-arm" shape. Also, there was no threshold effect according to the Spearman rank correlation analysis (Spearman correlation coefficient: -0.771, p value = 0.072).
Figure 3: Forest plots for sensitivity and specificity for uNGAL in part 1 to part 3. Forest plot for sensitivity and specificity of uNGAL to identify LN (a, b). Forest plot for sensitivity and specificity of uNGAL to identify active LN (c, d). Forest plot for sensitivity and specificity of uNGAL to predict renal flare (e, f).
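As a consistency check, the reported I² values can be recovered from the Q statistics with the usual relation between the two. The calculation below assumes six studies in part 3 (df = 5); that count is inferred from the reported figures and is not stated in this section.

```latex
\[
I^2 = \max\!\left(0,\ \frac{Q - \mathrm{df}}{Q}\right) \times 100\%,
\qquad \mathrm{df} = k - 1 = 5 \quad \text{(assumed, } k = 6\text{)}
\]
\[
\text{Sensitivity: } \frac{18.18 - 5}{18.18} \times 100\% \approx 72.50\%,
\qquad
\text{Specificity: } \frac{14.77 - 5}{14.77} \times 100\% \approx 66.15\%
\]
```

Both results match the reported values, which supports the assumed number of studies.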

3.4.4. Part 4: The Diagnostic Accuracy for uNGAL to Identify Proliferative LN. As shown in Table 3 ...
3.5. Heterogeneity Analysis. Heterogeneity was significant in all parts of the meta-analysis. Therefore, meta-regression (when the number of included studies was greater than or equal to 10), subgroup analysis, and sensitivity analysis were conducted to explore possible sources of heterogeneity for part 1 to part 3. In part 1, as shown in Table 3, in the subgroup of four studies [20,21,33,38] with QUADAS-2 scores ≥ 13, the pooled sensitivity decreased from 0.84 to 0.81, and specificity increased from 0.91 to 0.95. Specifically, the heterogeneity of sensitivity increased from 86.20% to 89.90%, whereas the heterogeneity of specificity decreased from 88.93% to 62.70%.
In part 2, the following covariates were used as predictor variables in the meta-regression analysis: patient type (children (n = 1) or adults (n = 9)), design type (prospective cohort study (n = 6) or cross-sectional study (n = 4)), publication year (2010 and before (n = 3) or after 2010 (n = 7)), reference standard (renal Systemic Lupus Erythematosus Disease Activity Index (R-SLEDAI) (n = 6) or others (n = 4)), and quality of study (QUADAS-2 scores < 13 (n = 4) or QUADAS-2 scores ≥ 13 (n = 6)). The coefficients and p values of these variables are listed in supplementary Table 2 (Table S2). Of note, the p value of the design type was 0.0284, indicating that it was a potential source of heterogeneity among these studies. Subgroup analysis was then performed according to the design type, and the results showed that the pooled sensitivity and specificity in the cross-sectional subgroup were higher than those in the prospective cohort subgroup (0.87 vs. 0.57 and 0.82 vs. 0.61, respectively). The heterogeneity of sensitivity and specificity in the cross-sectional subgroup was also lower than in the pooled results of all ten studies (84.56% vs. 91.17% and 84.04% vs. 94.24%, respectively) (Table 3).
In part 3, based on the reference standard, the pooled sensitivity and specificity in the three studies [22,23,31] of the R-SLEDAI subgroup increased from 0.80 to 0.90 and from 0.67 to 0.74, respectively, and the heterogeneity in sensitivity and specificity of the R-SLEDAI subgroup decreased from 72.50% to 55.40% and from 66.15% to 21.17%, respectively (Table 3). As shown in supplementary Figure 1 (Figure S1), after removing the study of Elewa et al. [31], the heterogeneity in sensitivity and specificity was lower than before (65.05% vs. 72.50% and 43.96% vs. 66.15%, respectively).
3.6. Publication Bias. As shown in Figure 5, the evaluation of publication bias according to the Deeks funnel plot asymmetry test showed that there was no potential bias in part 1 to part 3 (p value = 0.861, 0.254, and 0.465, respectively).
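For readers unfamiliar with the Deeks funnel plot asymmetry test, its core is a regression of the log DOR on 1/sqrt(ESS), weighted by the effective sample size (ESS), where a slope far from zero suggests small-study effects. The sketch below illustrates that regression under stated assumptions: it is not the Stata implementation, the function name and the example tables are hypothetical, a 0.5 continuity correction is assumed for zero cells, and the p value (from a t distribution with k − 2 degrees of freedom) is left out to keep the code standard-library only.

```python
import math
from typing import Sequence, Tuple


def deeks_regression(tables: Sequence[Tuple[float, float, float, float]]
                     ) -> Tuple[float, float, float]:
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    Each table is (TP, FP, FN, TN). Returns (slope, standard error, t).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)   # effective sample size
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("at least three studies are needed")
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    slope = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys)) / sxx
    intercept = ybar - slope * xbar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (k - 2) / sxx)
    t_stat = slope / se if se > 0 else (0.0 if slope == 0 else math.inf)
    return slope, se, t_stat


# Hypothetical tables in which smaller studies report larger DORs.
asymmetric = [(80, 20, 20, 80), (40, 8, 10, 42), (20, 2, 5, 23)]
slope, se, t_stat = deeks_regression(asymmetric)
print(slope, se, t_stat)
```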

Discussion
LN, a severe complication of SLE, poses a real challenge in the management of SLE patients because of the difficulty in early diagnosis and identification of relapses [39]. On the one hand, traditional clinical parameters such as proteinuria, glomerular filtration rate (GFR), urine sediments, anti-dsDNA, and complement levels are not sensitive or specific enough for diagnosis, monitoring of disease activity, and detection of early relapse of nephritis [14]. On the other hand, renal biopsy, the gold standard for diagnosis and prognosis of LN, is invasive and may cause complications [40]. Therefore, there is an urgent need to identify reliable noninvasive biomarkers with good sensitivity and specificity that contribute to the diagnosis and monitoring of LN. In the past twenty years, NGAL has been the most widely studied biomarker in acute kidney injury (AKI) and has been demonstrated to possess excellent diagnostic performance. Previous studies have shown that urine and serum concentrations of NGAL are sensitive, specific, and highly predictive biomarkers for AKI after cardiac surgery [41,42], in kidney transplantation [43], and in critically ill patients [44]. Since 2006, an increasing number of studies have demonstrated the usefulness of urinary NGAL (uNGAL) in the diagnosis and monitoring of LN, but there is wide variability in uNGAL's diagnostic performance. An existing meta-analysis [15] evaluated the diagnostic performance of uNGAL in LN. For diagnosing LN, it included only 4 eligible studies, and for estimating LN activity, it included 8 studies. Because the number of studies has increased since 2015, we performed an updated meta-analysis to derive a more accurate estimate for the diagnosis and prognosis of LN and also provided evidence on the ability of uNGAL to identify proliferative LN.
The main results of our current meta-analysis can be summarized as follows: uNGAL performed well in all parts investigated, with the pooled sensitivity ranging from 0.72 to 0.87 and the pooled specificity ranging from 0.67 to 0.91. The AUC values of the SROC curves for diagnosing LN, active LN, and renal flare were all above 0.70. Of note, meta-analysis of SROC curves revealed a high diagnostic profile for uNGAL to identify LN (AUC value = 0.92). Despite this valuable diagnostic performance, there was significant heterogeneity in all parts of our meta-analysis. In the meta-analysis of diagnostic tests, the threshold effect is an important source of heterogeneity [45]. We tested for the threshold effect in all parts of our meta-analysis and found no obvious threshold effects, which indicated that threshold effects might not be a source of heterogeneity in our meta-analysis. To explore other possible sources of heterogeneity, we conducted meta-regression, subgroup analysis, and sensitivity analysis in part 1 to part 3. After removing the studies with a QUADAS-2 score < 13, the remaining subgroups showed better diagnostic accuracy for the diagnosis of LN, suggesting that the quality of the studies may be a potential source of heterogeneity. In addition, the application of the blinding method and the storage time and temperature of uNGAL samples might also introduce potential bias as assessed by QUADAS-2. A meta-regression analysis was conducted in part 2 using the following covariates: patient type, design type, publication year, reference standard, and quality of the study. It indicated that design type may be a potential source of heterogeneity. The cross-sectional subgroup in distinguishing active LN had better diagnostic accuracy and lower heterogeneity. The R-SLEDAI subgroup in part 3 showed increased sensitivity and specificity, as well as significantly decreased heterogeneity.
According to the results of the sensitivity analysis, removing the study of Elewa et al. [31] decreased the heterogeneity and made the summary results more robust. Moreover, Deeks' funnel plots revealed no obvious heterogeneity produced by publication bias. Apart from uNGAL, serum NGAL was also measured in several included studies [21,22,26,29,33]. Most of these studies [21,26,33] showed that serum levels of NGAL were not significantly different between LN and controls, one study reported that serum NGAL levels were statistically different between patients with active LN and those with nonactive SLE, and another study [22] indicated that serum NGAL levels increased significantly before worsening of LN as measured by the BILAG renal score. However, the number of studies exploring the diagnostic accuracy of serum NGAL in LN is relatively small, and its accuracy still needs to be evaluated in future studies.
Although uNGAL has been verified to be a satisfactory diagnostic biomarker for LN, identifying new biomarkers or a combination of relevant biomarkers to diagnose and predict LN in a more sensitive and specific way remains an unmet need. Studies have demonstrated that the level of pentraxin 3 (PTX3), a regulator of the innate immune system that participates in tubulointerstitial inflammation, was significantly increased in patients with active LN and might be a biomarker for disease progression [46]. Other biomarkers such as monocyte chemotactic protein 1 [47], ceruloplasmin [48], adiponectin [49], and kidney injury molecule 1 [50] were also verified to be valuable biomarkers in the diagnosis and monitoring of LN. Brunner and his colleagues [51] developed a Renal Activity Index for Lupus (RAIL) based solely on laboratory measures, including uNGAL and other biomarkers, which could accurately reflect histologic LN activity in children. The performance of RAIL in the prediction of LN activity in adults was also demonstrated to be excellent, indicating the promising value of combined biomarkers in the diagnosis and prognosis of LN [52].
Additionally, the diagnostic value of uNGAL still needs to be tested in studies of better quality. Firstly, the studies we included in our meta-analysis are mainly single-center studies; therefore, multicenter studies are urgently needed to confirm the association between uNGAL level and LN and to confirm its role in predicting the progression of LN. Secondly, studies including a greater number of patients will provide greater insight into the potential usefulness of urinary lipocalin-2 in patients with LN. Thirdly, studies should predetermine their cut-offs according to the values proposed in the present review in order to improve their quality and reliability. Furthermore, the RAIL combination of laboratory biomarkers should be assessed in more validation cohorts.

Limitations
This study was limited by certain factors. Firstly, the number of eligible studies for identifying proliferative LN was too small to establish an SROC curve, so we presented the diagnostic accuracy merely by description. Secondly, there was significant heterogeneity in all parts of the meta-analysis, and meta-regression, subgroup analysis, and sensitivity analysis could explain only part of it. Thirdly, many of the included studies did not use a blinding method, and the reference standards varied across studies, which might introduce potential bias in the summary of results. Lastly, the fact that uNGAL/Cr instead of absolute values of uNGAL was measured in some of the included studies might also contribute to the heterogeneity.

Conclusion
In conclusion, our updated meta-analysis indicates that uNGAL is a useful biomarker for diagnosis, estimation of activity, and prediction of renal flare of LN and that its diagnostic value for diagnosing LN was superior to that in the other settings. In addition, the usefulness of uNGAL in distinguishing pathological types of LN needs to be further investigated.

Conflicts of Interest
No potential conflict of interest was reported by the authors.

Authors' Contributions
Yueming Gao, Bin Wang, and Bicheng Liu contributed to the conceptualization of the study. Yueming Gao, Songtao Feng, and Jingyuan Cao are involved in the data curation. Yueming Gao and Bin Wang are involved in the formal analysis. Yueming Gao, Bin Wang, and Songtao Feng are involved in the investigation. Yueming Gao, Bin Wang, and Bicheng Liu contributed to the methodology. Yueming Gao and Jingyuan Cao are responsible for the resources. Yueming Gao, Bin Wang, and Songtao Feng are responsible for the software.

Figure S1: plot of sensitivity analysis in part 3. Sensitivity analysis plot of uNGAL to predict renal flare. The results showed that the study of Elewa in part 3 might influence the robustness of the meta-analysis. Table S1: the detailed literature search methods for the meta-analysis.