The Diagnostic Performance of Afirma Gene Expression Classifier for the Indeterminate Thyroid Nodules: A Meta-Analysis

Background Approximately 15 to 30% of thyroid nodules evaluated by fine-needle aspiration (FNA) are classified as indeterminate, and accurate molecular diagnosis of these nodules remains a challenge. We aimed to evaluate the diagnostic performance of the Afirma gene expression classifier (GEC) for indeterminate thyroid nodules (ITNs). Methods Studies published from January 2005 to December 2018 were systematically reviewed. The reference standard was the histopathologic diagnosis of thyroidectomy specimens. MetaDisc software was used to calculate the pooled sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and diagnostic odds ratio (DOR) and to construct summary receiver operating characteristic (SROC) curves. Results A total of 18 studies involving 5290 patients with 3290 cases of ITNs were included. The pooled sensitivity of GEC was 95.5% (95% CI 93.3%–97.0%, p < 0.001), the specificity was 22.1% (95% CI 19.4%–24.9%, p < 0.001), the NPV was 88.2% (95% CI 83.3%–92.1%, p < 0.001), the PPV was 44.3% (95% CI 41.6%–47.1%, p < 0.001), and the DOR was 5.25 (95% CI 3.42–8.04, p = 0.855). Conclusion The GEC has a high sensitivity of 95.5% but a low specificity of 22.1%. The high sensitivity makes it useful for ruling out malignancy. However, over half of nodules with GEC-suspicious results still require further evaluation, such as additional molecular markers, diagnostic surgery, or long-term follow-up, which limits its use in future clinical practice.


Introduction
Approximately 15 to 30% of thyroid nodules evaluated by fine-needle aspiration (FNA) are classified as indeterminate, including atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS, category III), follicular neoplasm or suspicious for follicular neoplasm (FN/SFN, category IV), and suspicious for malignancy (SM, category V) according to the Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) [1]. Current guidelines recommend repeat FNA for category III lesions and lobectomy for category IV lesions and for lesions that are again classified as category III on repeat FNA [2][3][4]. However, the malignancy risk in TBSRTC categories III and IV ranges between 5% and 30% after surgery [5]. Patients with cytological ITNs are often referred for diagnostic surgery, though most of these nodules finally prove to be benign [6]. The Afirma gene expression classifier (GEC) measures the expression of 167 gene transcripts to determine whether a nodule is benign or malignant [7]. In 2012, a prospective, multicenter validation trial of the Afirma GEC involving 265 ITNs demonstrated a sensitivity of 92% and a specificity of 52% in TBSRTC III/IV nodules [7]. In the last decade, several studies [8][9][10] have evaluated its performance in ITNs, but the results were inconsistent, probably because the rate of indeterminate biopsy results varies among hospitals and tertiary centers [11]. In 2016, a meta-analysis including seven studies of GEC reported a pooled sensitivity of 95.7% and a specificity of 30.5% and concluded that GEC is a rule-out test for malignancy [9]. However, Sacks et al. found no significant changes in surgery rates or malignancy prevalence when comparing pre-Afirma and post-Afirma cases [12]. We searched the databases and included 18 newly published studies to provide a more comprehensive analysis of the diagnostic performance of GEC and to discuss its role in the decision-making process for thyroid surgery.

Exclusion Criteria for Studies
(1) Opinions, reviews, commentaries, case reports, and studies with insufficient data.
(2) Lack of clinical characteristics of nodules or of clear inclusion and exclusion criteria.
We screened the studies following the process illustrated in Figure 1. A total of 18 studies met the inclusion criteria after evaluation with the QUADAS-2 questionnaire [29].

Data Extraction.
Two authors independently reviewed the literature from the PubMed and Embase databases according to the inclusion criteria. All conflicts were resolved through consensus within the group. A third reviewer assessed all discrepant items, and the majority opinion was used to resolve disagreements between the reviewers.

Statistical Analysis.
The present work followed the structure of the PRISMA statement. Analyses were conducted using MetaDisc 1.4. We calculated pooled sensitivity, specificity, DOR, and SROC curves, and the prediction ellipses for the hierarchical ordinal regression for ROC curves (HROC) model. We also used Cochrane Review Manager Version 5.3 (RevMan; 2014) to evaluate the risk of bias of the studies included in this meta-analysis. Deeks' funnel plot asymmetry test was used to evaluate publication bias in each section of the analysis.
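The accuracy measures named above can be illustrated with a minimal sketch. It assumes each study contributes a 2x2 table of GEC result against histopathology, applies a 0.5 continuity correction when a cell is zero, and pools sensitivity and specificity by summing counts across studies. The class `TwoByTwo`, the helper names, and all counts are hypothetical; this is not a description of the MetaDisc internals or of the data in the included studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study: index test (GEC) versus histopathology."""
    tp: float  # GEC-suspicious, malignant on histology
    fp: float  # GEC-suspicious, benign on histology
    fn: float  # GEC-benign, malignant on histology
    tn: float  # GEC-benign, benign on histology


def accuracy_measures(t: TwoByTwo) -> dict:
    # Add 0.5 to every cell when any cell is zero so the odds ratio is defined.
    if min(t.tp, t.fp, t.fn, t.tn) == 0:
        t = TwoByTwo(t.tp + 0.5, t.fp + 0.5, t.fn + 0.5, t.tn + 0.5)
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
    }


def pooled_proportion(events, totals):
    # Simple pooling: sum of events over sum of totals across studies.
    return sum(events) / sum(totals)


# Hypothetical counts for three studies (not data from the included studies).
studies = [TwoByTwo(40, 60, 2, 20), TwoByTwo(25, 45, 1, 12), TwoByTwo(30, 50, 3, 15)]
pooled_sens = pooled_proportion([s.tp for s in studies], [s.tp + s.fn for s in studies])
pooled_spec = pooled_proportion([s.tn for s in studies], [s.tn + s.fp for s in studies])
print(f"pooled sensitivity = {pooled_sens:.3f}, pooled specificity = {pooled_spec:.3f}")
```

Random-effects or bivariate models weight studies differently, so pooled values from such models will generally differ from this simple sum of counts.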

Summary Estimates of Sensitivity, Specificity, NPV, PPV, DOR, and Summary ROC Curves.
The analysis of the diagnostic threshold yielded a Spearman correlation coefficient of 0.414 (p = 0.111), so we concluded that there was no threshold effect in this meta-analysis. Table 6 shows the pooled sensitivity, specificity, confidence intervals, and heterogeneity results of the test (Figures 2(a)-2(e)). Since the false negative and true negative values of two included studies [17,27] were 0, the original data of these two studies were dropped by the MetaDisc software.
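A threshold-effect check of this kind is commonly computed as the Spearman correlation between the logit of sensitivity and the logit of (1 - specificity) across studies, where a strong positive correlation suggests a threshold effect. The sketch below is a minimal pure-Python version under that assumption; the function names and the per-study values are hypothetical and are not taken from the included studies.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def ranks(values):
    # Average ranks (1-based), with ties sharing the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


# Hypothetical per-study estimates (not values from the included studies).
sens = [0.95, 0.90, 0.97, 0.88, 0.93]
spec = [0.20, 0.35, 0.15, 0.40, 0.25]
rho = spearman([logit(s) for s in sens], [logit(1 - c) for c in spec])
print(f"Spearman rho = {rho:.3f}")
```

In this made-up example sensitivity rises exactly as specificity falls, which is the pattern a threshold effect would produce; the coefficient of 0.414 with p = 0.111 reported above was not statistically significant.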
Since the I² values of the sensitivity, specificity, PLR, and NPV were more than 50%, we conducted a meta-regression analysis (inverse variance weights) to investigate the sources of heterogeneity. The meta-regression revealed that whether the original GEC studies were conducted in a single center or in multiple centers was the main source of heterogeneity (p = 0.032) (Table 7).
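For readers unfamiliar with the I² statistic, the following minimal sketch shows how Cochran's Q and I² can be obtained from per-study effects with inverse-variance weights, here using the log DOR as the effect. The function names and the counts are hypothetical, and the sketch is an illustration of the standard formulas rather than the exact procedure used by the software.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    # Log diagnostic odds ratio and its approximate variance
    # (sum of the reciprocals of the four cells).
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def cochran_q_and_i2(effects, variances):
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical 2x2 counts (tp, fp, fn, tn) for four studies.
counts = [(40, 60, 2, 20), (25, 45, 1, 12), (30, 50, 3, 15), (55, 40, 6, 30)]
effects, variances = zip(*(log_dor_and_variance(*c) for c in counts))
pooled, q, i2 = cochran_q_and_i2(effects, variances)
print(f"pooled DOR = {math.exp(pooled):.2f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
```

An I² above 50% is the conventional cue, used in the text above, to look for sources of heterogeneity through meta-regression.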
The bivariate logistic regression is described in Table 8. The ROC plane is shown in Figure 3. The SROC curve is shown in Figure 4 with prediction and confidence contours. The area under the curve (AUC) is 0.73. The evaluation of bias in this meta-analysis is shown in Figure 5.

Publication Bias.
We conducted Deeks' funnel plot asymmetry test to evaluate publication bias in each section of the analysis (Figure 6). As the p-value was 0.34, we concluded that there was no obvious publication bias in any section of this meta-analysis.

Discussion
Cytologically indeterminate thyroid nodules are usually referred for thyroidectomy or lobectomy, and up to 74% of patients with such nodules undergo surgery [5]. To some extent, ultrasound-guided FNA with on-site cytopathology improves both the adequacy and the accuracy of preoperative diagnoses in ITNs.
One earlier meta-analysis [9] assessed the performance of GEC. By adding recently published studies of GEC and postoperative pathological results, our analysis found that the sensitivity of GEC was 95.4% and the specificity was 22.3%. The diagnostic profiling of GEC is mainly limited to papillary and follicular thyroid carcinoma, partly because of the relatively low prevalence of medullary and anaplastic thyroid cancer. Our data revealed that the pooled NPV of GEC was not as high as in previous studies [7,9,32]. The present study summarized the final pathological outcomes of nodules tested with GEC after surgery. The high sensitivity and NPV make GEC an effective approach to rule out malignancy in thyroid nodules with indeterminate cytology. Based on the pooled postoperative pathological data, most GEC-suspicious nodules that proved benign after surgery were follicular adenomas (31.2%), benign follicular nodules (15.6%), and adenomatoid nodules (13.0%). Histologically, the adenomatoid nodule is a densely cellular follicular proliferation that lacks a capsule. In the TBSRTC, the adenomatoid nodule is assigned to category III or category IV [1]. According to a study of 234 thyroid FNAs, adenomatoid nodules were easily misdiagnosed as follicular neoplasms [33]. Chronic thyroid inflammation is commonly regarded as chronic lymphocytic thyroiditis (CLT), which is characterized by diffuse lymphocytic infiltration of the thyroid gland. The impact of CLT on the clinical and pathological outcomes of differentiated thyroid carcinoma (DTC) remains unknown [34]. Some studies suggested that DTC patients with CLT had a better prognosis than those without CLT [35]. Most nodules with benign pathological results and well-differentiated papillary thyroid carcinomas (PTC) arise from thyroid follicular cells. Benign nodules include follicular adenoma and oncocytic adenoma.
According to Table 5, follicular adenoma is the most common benign thyroid lesion (31.2%), followed by benign follicular nodules (16.0%). Malignant lesions such as cvPTC (44.3%) and fvPTC (38.3%) are classified as well-differentiated PTC.
An individual study [36] demonstrated that a predominance of Hürthle cells led to an increased rate of suspicious GEC results, with a lower malignancy risk than in AUS/FLUS or FN/SFN nodules. HCNs partly contributed to the false positive rate of GEC. Encapsulated fvPTC was recently reclassified as "noninvasive follicular neoplasm with papillary-like nuclear features (NIFTP)", but prior studies seldom reclassified fvPTC as NIFTP, which could give rise to unreliable estimates of cancer prevalence and PPV [37]. Moreover, only limited data are available to evaluate the accuracy of GEC in HCN or NIFTP cases.
The Thyroid Imaging Reporting and Data System (TI-RADS) was designed to quantify the malignancy risk of thyroid nodules [38,39]. It is based on suspicious ultrasound features such as solid component, hypoechogenicity or marked hypoechogenicity, irregular margins, microcalcifications or mixed calcifications, and taller-than-wide shape. Pooled data on thyroid nodules showed that the sensitivity of TI-RADS was 97.4-99.1% and the NPV was 98.1-99.1% [40,41]. TI-RADS and the American Thyroid Association (ATA) guideline have greatly helped physicians stratify the malignancy risk of ITNs. Recently, molecular tests with higher accuracy have been applied together with TI-RADS to ITNs to decrease the false positive rates.
The BRAF V600E mutation is detected in more than half of papillary thyroid cancers. The BRAF mutation has a low prevalence in FN/SFN and AUS/FLUS lesions but a high prevalence in thyroid lesions with SM cytology [42,43]. However, adding the BRAF V600E mutation to GEC did not improve the diagnostic sensitivity or specificity [44].
The next-generation sequencing panel ThyroSeq v2 detects 14 cancer gene mutations with more than 1000 hotspots and 42 types of gene fusions or rearrangements in thyroid cancer [45]. A meta-analysis evaluated GEC in 1086 nodules and ThyroSeq v2 in 459 nodules to assess the preoperative diagnostic accuracy for ITNs [46]. The pooled sensitivity was 98% for GEC and 84% for ThyroSeq v2, and the pooled specificity was 12% and 78%, respectively. In that meta-analysis, the pooled sensitivity of GEC was higher than in our analysis, while the pooled specificity was lower. Therefore, the strength of the GEC test lies in ruling out malignancy (higher sensitivity), whereas ThyroSeq is a better test for ruling in thyroid neoplasm (higher specificity).
The risk of malignancy in ITNs was nearly 38.6% in our analysis, indicating that over half of the patients had undergone potentially unnecessary surgery and that conservative approaches could be considered for ITNs. The final decision between diagnostic surgery and follow-up depends on ultrasound (US) features, histological characteristics, and molecular test results.
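The link between the pooled accuracy estimates and the malignancy risk can be made concrete with Bayes' rule. The sketch below takes the pooled sensitivity (95.5%) and specificity (22.1%) from the abstract and the 38.6% malignancy risk stated above, and derives the predictive values they imply. The function `predictive_values` and the alternative prevalence values are illustrative assumptions, not part of the original analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # malignant, test suspicious
    fn = (1 - sensitivity) * prevalence        # malignant, test benign
    tn = specificity * (1 - prevalence)        # benign, test benign
    fp = (1 - specificity) * (1 - prevalence)  # benign, test suspicious
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates reported in this meta-analysis.
ppv, npv = predictive_values(0.955, 0.221, 0.386)
print(f"prevalence 38.6%: PPV = {ppv:.3f}, NPV = {npv:.3f}")

# Hypothetical prevalences, to show how strongly NPV depends on pretest risk.
for prevalence in (0.10, 0.20, 0.30):
    p, n = predictive_values(0.955, 0.221, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {p:.3f}, NPV = {n:.3f}")
```

At a prevalence of 38.6% this gives a PPV of roughly 0.44 and an NPV of roughly 0.89, close to the pooled values reported in this analysis. Lower prevalences raise the NPV and lower the PPV, which is why the rule-out value of a GEC-benign result depends on the pretest malignancy risk of the population tested.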

Conclusions
The present meta-analysis has summarized the previously reported performance of GEC. We regard GEC as an effective approach to rule out malignant lesions in ITNs. Since most benign nodules with GEC-suspicious results are follicular adenomas, benign follicular nodules, and adenomatoid nodules, it is essential to combine GEC with other molecular markers to improve its specificity. The probability of malignancy and the clinical management of nodules with GEC-suspicious results still need further investigation.

Limitations
Our study has several limitations. First, we could not obtain the pathologic diagnosis of all the resected nodules because some of the included studies did not report it. Second, it is unclear whether GEC results vary by geography, race, or region, and none of the included studies reported the race of participants. Finally, some of the included studies lack information on long-term follow-up of GEC-benign nodules or on when the nodules underwent FNA during follow-up.

Abbreviations
SROC: Summary receiver operating characteristic
FNA: Fine-needle aspiration
AUS/FLUS: Atypia of undetermined significance/follicular lesion of undetermined significance
FN/SFN: Follicular neoplasm or suspicious for follicular neoplasm
SM: Suspicious for malignancy
ITNs: Indeterminate thyroid nodules
GEC: Gene expression classifier
PLR: Positive likelihood ratio
NLR: Negative likelihood ratio
ATA: American Thyroid Association
TI-RADS: The thyroid imaging reporting and data system.

Data Availability
The datasets used or analyzed during the current study are available from the corresponding authors on reasonable request.

Ethical Approval
All studies included in this systematic review were stated to be in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflicts of Interest
All authors declare that they have no conflicts of interest.

Authors' Contributions
Ying Liu and Da Fang independently searched the database and Ying Liu drafted the manuscript. Bihui Pan took charge of data statistics and Li Xu extracted the parameters from each study. Xianghua Ma and Hui Lu participated in the manuscript revision. All authors read and approved the final manuscript.