Tumor Digital Masking Allows Precise Patient Triaging: A Study Based on Ki-67 Scoring in Gastrointestinal Stromal Tumors

Background. Technological advances constantly provide cutting-edge tools that enhance diagnostic capabilities. Gastrointestinal stromal tumors (GISTs) belong to a family of mesenchymal tumors in which patient triaging is still based on traditional criteria such as mitotic count, tumor size, and tumor location. The limitations of the human eye and the randomness in the choice of area for counting mitotic figures compel us to seek more objective solutions such as digital image analysis. Presently, the labelling of proliferative activity is becoming a routine task in many cancers. The purpose of the present study was to compare the traditional method of prediction based on mitotic count with digital image analysis of cell cycle-dependent proteins.
Methods. Fifty-seven eligible cases were enrolled. A digital analysis of previously performed whole tissue section immunohistochemical assays was carried out. Digital labelling covered both hotspots and not-hotspots equally.
Results. We noted a significant diversity of proliferative activities, and the results pointed to a Ki-67 score of 6.5%, counted in hotspots, as the optimal cut-off between low- and high-grade GIST. ROC analysis (AUC = 0.913; 95% CI: 0.828–0.997, p < 0.00001) and the odds ratio (OR = 40.0, 95% CI: 6.7–237.3, p < 0.0001) pointed to a Ki-67 score of 16% as the cut-off for very high-grade (groups 5–6) cases. With the help of a tumor digital map, we revealed possible errors resulting from a wrong choice of field for analysis. We confirmed that Ki-67 scores are in line with the level of intracellular metabolism, which could be used as an additional biomarker.
Conclusions. Tumor digital masking is a very promising solution for repeatable and objective labelling. Software adjustments of nuclear shape, outline, size, etc. help to omit other Ki-67-positive cells, especially small lymphocytes. Our results pointed to Ki-67 as a good biomarker in GIST, but concurrently, we noted significant differences between the digital approaches used, which could lead to equivocal results.


Background
Technological advancement constantly provides cutting-edge tools that enhance diagnostic capability, yet the subcategorisation of many mesenchymal tumors, including gastrointestinal stromal tumors (GISTs), is still based on the mitotic count (MC). According to the current Miettinen classification and the European Society for Medical Oncology (ESMO) guidelines, which in fact recapitulate Miettinen's principles, MC is crucial for assigning a tumor to the low- or high-risk recurrence group [1,2]. An arbitrary cut-off point set at 5 mitoses per 50 hpf (per 5 mm² according to the updated 2014 ESMO guidelines) might be risky in borderline cases and could lead to underestimation. Conventional microscopic analysis has natural technical restrictions. It does not allow us to identify the fields with the highest mitotic activity, because the analysed area is chosen at random. These limitations of the human eye and the restricted field of view of the microscope lens could be countered with more repeatable and efficient tools such as digital image analysis.
With time, the labelling of proliferative activity with Ki-67 has become a crucial biomarker. Presently, this is most apparent in breast cancer, neuroendocrine tumors (NETs), and brain tumors, for which it has become a gold standard test that strongly influences tumor grading and is gradually replacing MC [3-7]. It could be said that the time of the MC has passed and that it is gradually being replaced by more effective biomarkers. The implementation of Ki-67 as a predictor encouraged many researchers to study its application in other malignancies. A vast spectrum of malignancies has been examined, and notably, most studies confirmed its prognostic value [8-14]. Ki-67 gained a new image after the study by Cuylen et al. They described the unique role of that protein during the cell cycle, namely, its surfactant-like function, which allows chromosome separation, and its quantitative increase during the cell cycle [15]. This gave a foundation for counting not only strong immunohistochemical reactions but also weak ones.
On the other hand, the reported differences among immunohistochemical Ki-67 antibodies and counting methodologies create new issues with correct tumor categorisation [16-18]. Basically, the achieved scores partially depend on the antibody clone or manufacturer, and once thresholds are assumed, the same tumor could be categorised differently. Here, digital support seems necessary; there is no room for random and unrepeatable estimation. The next equally important question is the variable landscape of tumor heterogeneity, with the presence of hotspots and the many approaches to Ki-67 counting that have divided researchers. Clearly, the variability of the reported predictive cut-off values may arise from the absence of standardized, objective, and controlled methods of measurement [18-23]. A new approach to heterogeneity, especially in the context of hotspot presence, reflects the importance of the most active cell clones in tumors [24,25]. The use of objective methods of analysis could help to avoid subjective judgement by the pathologist.

Objective
The purpose of the present study was to use tumor digital mapping to evaluate Ki-67 and other cell cycle-dependent proteins as prognostic biomarkers in GIST. Whole slide scanning and digital analysis with detailed mathematical calculations were planned to reveal intratumoral heterogeneity of proliferative activity.

Methods
This study is a natural continuation of our previously published research concerning the significance and usefulness of GLUT-1, CD9, and CD63 in GIST. The inclusion criteria were previously described in detail [26]. There was no selection bias. This time, we pooled fifty-seven subjects who were eligible for enrolment. In accordance with our previous results, the Miettinen classification and the 2012 ESMO guidelines were applied [1,2]. All cases were reanalysed and recategorised as low- or high-grade GIST. Mitotic count was tabulated as 0-5 or above 5 mitotic figures per 50 hpf. Simultaneously, immunohistochemical assays for Ki-67, p21, p27, and cyclin D1 were performed, scanned, and digitally analysed.
Immunohistochemistry. Classical immunohistochemical assays with antibodies against Ki-67, p21, p27, and cyclin D1 were performed. All assays were fully validated for in vitro use. The details of the antibodies used are presented in Table 1.
All reactions were performed on the BenchMark XT (Ventana Medical Systems; Roche Group, Tucson, USA). After fully automated deparaffinisation and rehydration of the samples, antigen unmasking with CC1 (Ventana Medical Systems; Roche Group, Tucson, USA), incubation with primary antibodies, and further routine steps were performed; the time and temperature of both antigen retrieval and primary antibody incubation were strictly in accordance with the manufacturer's recommendations. We used the Ventana ultraView Universal DAB Detection Kit.

Digital Image Analysis of Whole Section Assays. The Hamamatsu NanoZoomer S210 (Hamamatsu®, Hamamatsu City, Shizuoka Pref., Japan) scanner was used for slide scanning. The subsequent dedicated digital image analysis was performed using the Visiopharm nuclei plus application (Visiopharm®, Hoersholm, Denmark). A whole tissue section was scanned, which provided great insight into the intratumoral heterogeneity and allowed us to detect the hotspots. We performed two separate calculations, one for hotspot and one for not-hotspot foci. The whole slide analysis with digitally adjusted magnification, from low to high, allowed us to point to the most active fields (hotspots). The digital templates were set according to training cycles, with settings for the intensity of the nuclear reaction and simultaneously for nuclear shape and size. The intertumoral heterogeneity of cell shape and size typical of GIST, as well as the spindle vs. epithelioid cell type, required a correction of the previous settings in some cases. A strong nuclear reaction was coded as "strong," a weak nuclear reaction was coded as "weak," and lack of reaction was coded as "negative." Both weak and strong nuclear reactions were counted as positive. Digital objects of interest (DOIs) were defined as five 1500 μm × 1500 μm areas corresponding to the commonly analysed 50 hpf and covering ca. 10-11 mm². Importantly, the chosen DOIs covered hotspot fields and, separately, not-hotspot fields for their comparative analysis. All areas in the vicinity of mucosal ulceration and granulation tissue, as well as areas rich in stromal Ki-67-positive lymphocytes, were rigorously excluded to avoid falsely raised results. To depict the intratumoral heterogeneity of proliferative activity, the calculations were done as follows: the hotspot-positive ratio (HSPR%) is the ratio of all positive cells counted in hotspots to all analysed tumor cells in the DOI; similarly, the not-hotspot-positive ratio (nHSPR%) is the ratio of all positive cells counted in not-hotspots to all analysed cells in the DOI.
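The two ratios defined above can be illustrated with a minimal Python sketch. It is not the Visiopharm implementation: the `FieldCounts` container, the `positive_ratio` helper, and all counts are hypothetical, and the sketch assumes that each ratio pools the positive (weak plus strong) nuclei over the five fields of the respective DOI and divides by all nuclei counted in those fields.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FieldCounts:
    """Nuclei counted in one analysed field (hypothetical container)."""
    strong: int
    weak: int
    negative: int

    @property
    def positive(self) -> int:
        # Both weak and strong nuclear reactions count as positive.
        return self.strong + self.weak

    @property
    def total(self) -> int:
        return self.strong + self.weak + self.negative


def positive_ratio(fields: Iterable[FieldCounts]) -> float:
    """Percentage of positive nuclei among all nuclei pooled over the fields."""
    fields = list(fields)
    total = sum(f.total for f in fields)
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * sum(f.positive for f in fields) / total


# Invented counts for five hotspot and five not-hotspot fields.
hotspot_fields = [FieldCounts(strong=300, weak=700, negative=9000)] * 5
not_hotspot_fields = [FieldCounts(strong=50, weak=150, negative=9800)] * 5

hspr = positive_ratio(hotspot_fields)       # HSPR%
nhspr = positive_ratio(not_hotspot_fields)  # nHSPR%
print(f"HSPR% = {hspr:.1f}, nHSPR% = {nhspr:.1f}")
```

Keeping strong and weak counts separate in the container makes it possible to recompute the ratio for strong reactions only, which is the comparison addressed later in the Results.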

Statistical Methods.
Quantitative data were reported as minimum, maximum, median, and lower (Q1) and upper (Q3) quartiles (in the case of nonnormal distributions) or as means and standard deviations. Categorical data were expressed as numbers and percentages. The chi-square test or Fisher's exact test was applied to compare proportions, while the Mann-Whitney test or Kruskal-Wallis test was used to compare distributions of continuous variables. Correlations between continuous variables were assessed by Spearman's rank correlation coefficient. Receiver operating characteristic (ROC) curve analysis was performed to test the ability of the analysed variables to distinguish between low (≤3) and high (>3) ESMO risk groups. The area under the ROC curve (AUC) with 95% confidence interval (95% CI) was estimated, and the optimal cut-off values were determined. The odds ratios (OR) with 95% CI were also calculated. The recurrence-free survival (RFS) period in two groups was compared by a log-rank test. A two-tailed p value < 0.05 was considered statistically significant. All statistical analyses were performed using R (version 3.1.2; The R Foundation for Statistical Computing, Vienna, Austria) and Statistica (StatSoft Inc., 2014, version 12).
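The ROC and odds ratio steps can be sketched in plain Python. The analyses themselves were run in R and Statistica, so this is only an illustration under stated assumptions: the text does not say how the optimal cut-off was chosen, so the Youden index is assumed here; the confidence interval uses the Woolf (log) method, which is also an assumption; and the function names, scores, and 2×2 table are invented.

```python
import math


def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability that a high-group score
    exceeds a low-group score (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both groups are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Threshold t maximising sensitivity + specificity - 1 for the
    rule 'score >= t means high group'; the lowest t wins on ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) of a 2x2 table with a Woolf 95% CI."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Invented Ki-67 scores (%): label 0 = low group, 1 = high group.
ki67 = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0, 7.0, 8.0, 12.0, 20.0, 30.0]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

auc = roc_auc(ki67, group)
cutoff, youden_j = youden_cutoff(ki67, group)
print(f"AUC = {auc:.3f}, cut-off = {cutoff}, J = {youden_j:.3f}")

# Invented 2x2 table: rows = above/below cut-off, columns = high/low group.
print(odds_ratio_ci(a=20, b=4, c=5, d=20))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a cohort of a few dozen patients; a rank-based formula would be preferable for large data sets.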

Results
The average age of patients was 62.2 years (range 31 to 89; SD 13.8). The average number of analysed cells was 12,567 per single ROI (1500 μm × 1500 μm area), i.e., 5585 cells per 1 mm². The general characteristics of patients and tumors are presented in Table 2. In addition to the current calculations, the previously studied CD63 and GLUT-1 were included as well.
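The cell density quoted above follows directly from the field geometry. The short Python check below uses only the figures given in the text; the variable names are illustrative.

```python
FIELD_SIDE_UM = 1500          # side of one square field, in micrometres
FIELDS_PER_CASE = 5           # five fields are analysed per case
CELLS_PER_FIELD = 12_567      # reported average number of cells per field

field_area_mm2 = (FIELD_SIDE_UM / 1000) ** 2        # 1.5 mm x 1.5 mm
cells_per_mm2 = CELLS_PER_FIELD / field_area_mm2    # average density
cells_per_case = FIELDS_PER_CASE * CELLS_PER_FIELD  # all five fields

print(f"field area: {field_area_mm2} mm^2")
print(f"density: {cells_per_mm2:.0f} cells per mm^2")
print(f"cells per case: {cells_per_case}")
```

One field therefore covers 2.25 mm², which gives the reported 5585 cells per mm², and five fields give roughly 62,800 cells per case.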
Simultaneously, a tercile analysis splitting patients into three groups was undertaken. Median thresholds at 3.5% and 10% were extracted, giving the following groups: low-risk GIST (n = 27), moderate- to high-risk tumors (n = 17), and very high-risk tumors (n = 13). Here, we observed a problem with the accurate separation of low and moderate GISTs (Figure 1).
We compared the recurrence-free survival (RFS) period at the 6.5% median by log-rank test (p = 0.20), and in parallel, we performed a tercile analysis for median thresholds at 3.5% and 10%. The 10% cut-off separating groups 3-4 and 5-6 strongly correlated with RFS (p = 0.003), as presented in Figure 2.
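For readers unfamiliar with the log-rank comparison, a minimal two-group version is sketched below in Python. It is a textbook formulation, not the R or Statistica routine used for the analysis; the function name and the follow-up data are invented, and the p value relies on the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 = recurrence, 0 = censored.
    Returns (chi-square statistic, two-sided p value, 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented follow-up (months): group A relapses early, group B late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In the setting of this study, the two groups would be the patients below and above a given Ki-67 cut-off.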
The next task was to address the strong nuclear reaction paradigm. To estimate the impact of weak and strong nuclear reactions, Spearman's rank correlation analysis was applied. Figure 3 illustrates the digital map covering the slide scan. The results showed a strong positive correlation of HSPR% with all items obtained by digital analysis. The highest r values (0.94 and 0.93) were obtained for total (both weak and strong) positive cells and for weak nuclear reactions only, respectively.
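Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The Python sketch below shows that calculation; the helper names and the paired values are invented and do not reproduce the study data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sample has no rank correlation")
    return cov / (sx * sy)


# Invented example: HSPR% against the percentage of weakly positive nuclei.
hspr_values = [2.1, 3.4, 6.5, 9.8, 16.0, 22.5]
weak_values = [1.5, 2.9, 4.0, 7.7, 11.2, 15.3]
print(f"r = {spearman_r(hspr_values, weak_values):.2f}")
```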
A comparative analysis of cyclin D1, p21, and p27 with Ki-67 was performed to assess their power as biomarkers. Unsurprisingly, the expression levels of cyclin D1, p21, and p27 followed the progression of tumor malignancy, although Ki-67 HSPR% reached the highest significance (AUC 0.913, p < 0.001) (Table 3). The same results were obtained when MC was split into two groups: 0 to 5 mitoses and above 5 mitoses (for Ki-67, p < 0.0001).

Discussion
In the present paper, our attention has been focused on Ki-67 labelling as a standalone biomarker. There have been numerous attempts at Ki-67 labelling, and their conclusions are largely consistent. It seems that the diagnostic relevance of Ki-67 could outperform traditional MC and that Ki-67 is close to becoming a routine biomarker in mesenchymal tumors as well. The results of Cuylen et al. helped us to interpret the meaning of strong and weak immunohistochemical nuclear reactions, which was disputed in the past [15]. Indeed, a very strong nuclear reaction corresponds to the late phase of the cell cycle and could indicate proximity to mitosis. Our results did not confirm the key role of strong reactions; instead, we noted a stronger impact of the total positive reactions. Digital analysis gave us a very good means of precisely separating weak and strong reactions, counting them accurately, and finally performing precise mathematical calculations. It helped us to debunk the myth of the advantage of strong reactions.
An outright indication of a sharp cut-off value for mesenchymal tumors could impose a risk of underestimation or overestimation. Our relatively small cohort allowed us to achieve a sharp cut-off at 6.5% to extract the high-risk tumors according to Miettinen's principles. It is understandable that the final score could be distorted by stromal inflammatory or other Ki-67-positive cells, so perhaps a framework with a narrow grey zone would prove to be of better use. Obviously, the full implementation of a cut-off value needs much larger cohorts and a working group consensus.
The PubMed-available data concerning GIST and Ki-67 are highly diverse, especially in the methodology used. The results published by Liang et al. indicated a 5% cut-off as a biomarker of a worse outcome. The discrepancy with our data could result from their use of a microarray technique with no hotspot extraction and counting with the naked eye [27]. Belev et al. and Kemmerling et al. also suggested a 5% cut-off, but the authors analysed only 100 and 1000 cells [28,29]. Furthermore, Basilio-de-Oliveira and Pannain pointed to a threshold of 134.8 Ki-67-positive cells per mm² as a good predictor of outcome [30]. Zhao et al. applied two thresholds at 5% and 8%, and they reached conclusions similar to ours; however, they analysed only hotspots. The natural restriction of that study was the use of the microarray technique [31]. Basically, the available published Ki-67 scores are similar to or slightly different from ours. This requires a short explanation. We have found papers where calculations were based on completely different methodologies, such as a cost-saving microarray, covering only hotspots or randomly chosen areas, counting only strong reactions, or analysing a small number of cells. The use of so many different methods was bound to lead to different results. Firstly, we designed counting in line with Miettinen's rules, on 5 defined areas covering 10-11 mm²; secondly, on average, more than 60,000 cells were analysed per case; and finally, we differentiated between strong reactions, weak reactions, and hotspot and not-hotspot foci. In brief, we applied whole slide digital assays with hotspot field scoring, which was the reason behind the differences. It appears that the microarray technique could seriously contribute to interstudy discrepancies. A rhetorical question is the best answer: does a tissue core of 1.5-2 mm diameter truly correspond to real tumor complexity, including heterogeneity?
The next question was the choice of field for analysis, especially in light of the significance of hotspots versus randomly chosen areas. There is no global consensus on the principles of labelling. In our view, automated counting in hotspot fields seems to be the most repeatable solution. In the recent paper by Thakur et al., the methodology was similar to ours, namely whole slide scanning with hotspot extraction as an unbiased approach [18]. The issue of the Ki-67 clones used in immunohistochemistry was already considered in the last decade, when the MIB-1 clone was presented as having one of the highest sensitivities for the Ki-67 antigen. Subsequent papers covering the newest Ki-67 clones reported comparable results for the MIB-1 clone and the 30-9 clone [16,17].
We observed that both HSPR% and nHSPR% strongly correlated with the MC and malignancy levels, but concurrently, a significant difference (p < 0.0001) between HSPR% and nHSPR% was noted (Table 2 shows the large differences between groups). This is of key importance for the formulation of precise cut-off values. That is why we emphasise using a digital solution, whole slide scanning, and labelling only hot fields. This will be relevant for patient triaging based on Ki-67 cut-off values in the future. In light of Miettinen's classification, Ki-67 could become a promising biomarker. Our experience has taught us to be cautious about prediction based on MC in cases where only a small biopsy was taken; thus, we previously tried to focus our attention on other biomarkers [26,31-34]. In a very recent paper by Liu et al. covering 1022 patients, the cut-off value was set at 6%, which is very close to our data, although some methodological differences exist between the studies [35]. The most recently published paper, by Sugita et al., applied a methodology similar to ours and reached an 8% score for whole slide scanning and Ki-67 labelling. The authors concluded that whole slide analysis raises the Ki-67 score in high-grade tumors to 8%, and we assume that scoring in hotspots, as in our study, could reach 10% or more [36].
Although clinical oncology is focused mainly on Ki-67 as a crucial biomarker, other markers exist that are no less efficient. Phosphohistone H3 (pHH3) has been widely tested in many cancers as a sensitive marker for mitotic figure detection and could be seen as a successor to the traditionally scored MC in the near future. Interestingly, its significance has also been highlighted in many types of malignancies where the mitotic index is not evaluated routinely. In many papers, the pHH3 ratio was presented as a highly prognostic biomarker [37-42]. In that light, GISTs, as tumors whose categorisation is fundamentally based on MC, have also been investigated. If it is assumed that chromatin condensation with a positive pHH3 IHC reaction corresponds to an early stage of mitosis, the final pHH3/MC score must be higher than the traditional MC. The authors involved in that issue reported the necessity of reclassifying tumors according to Miettinen's rules because the number of pHH3-positive cells was higher than the original MC [43-45]. That is why immunohistochemical and digital support outperforms traditional manual counting of mitoses.
While we did not find a correlation with tumor measurements or tumor location, we noted that proliferative activity increased with the expression of intracellular metabolism-dependent proteins, namely, CD63 and GLUT-1. Why the Ki-67 scores differed between men and women (p < 0.05) remains unclear. Assuming the absolute lack of selection bias, we suppose that it could result from the cohort size.
The comparative analysis showed the clear advantage of Ki-67 over other cell cycle proteins as a clinically useful biomarker. There is a rising number of digital solutions, but the question of how Ki-67 scoring should be employed remains unanswered. Our experience leaves no doubt: digital masking of a large tumor area, instead of small-field microscope photography, allows us to extract the most active fields. The time necessary for analysis, including scanning and image analysis, was close to 10 minutes, making this way of scoring both precise and time saving while providing a repeatable and comparable solution.

Conclusion
Tumor digital masking is a promising solution for repeatable and objective labelling. Software adjustments of nuclear shape, outline, size, etc. help to omit other Ki-67-positive cells, especially small lymphocytes. Moreover, dedicated software allows precise mathematical calculations, making the results more accurate.
Our results point to Ki-67 as a good biomarker in GIST, but the common use of Ki-67 should follow the establishment of guidelines for cell counting and, finally, larger studies conducted to reach a consensus on the cut-off value. Recently published papers present similar results although they were based on different digital applications, which supports the advantage of digital labelling.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval
This study with the use of human tissue was in accordance with the ethical standards of the Declaration of Helsinki with its latest revision in 2004. Additionally, the study was approved by the Ethics Commission of the Faculty of the Medicine and Health Science, Jan Kochanowski University in Kielce, Poland (Decision No. 10/2016).

Consent
Written consent was obtained from the patients in all cases. Consent for publication is not applicable.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
The study design was handled by PL, SG, and JM. Gathering of the study cohort was handled by IW and MG-O. Histopathological examination was handled by PL. The statistical analysis was handled by MCh and the data analysis by PL, DK, MCh, SG, and JM. AH-L collected the references, and PL and AH-L prepared the manuscript. The study was supervised by PL and SG. All authors read and approved the final manuscript.