Reproducibility in PD-L1 Immunohistochemistry Quantification through the Tumor Proportion Score and the Combined Positive Score: Could Dual Immunostaining Help Pathologists?

Simple Summary
The quantification of PD-L1 expression in tumor samples through the tumor proportion score (TPS) and the combined positive score (CPS) conditions access to anti-PD-1/PD-L1 immunotherapy in patients with various solid cancers. Reproducibility studies have shown very heterogeneous inter- and intra-pathologist agreements, from poor to excellent, in TPS and CPS quantification. We studied the inter- and intra-pathologist agreements in TPS and CPS quantification, comparing single PD-L1 immunohistochemistry (S-IHC) with double immunohistochemistry (D-IHC) combining PD-L1 staining and tumor nuclear markers, in an attempt to improve the distinction between tumor cells and immune cells required for TPS and CPS calculations. Our study found excellent (TPS) to good (CPS) inter- and intra-pathologist agreements with both S-IHC and D-IHC, with slightly higher intraclass correlation coefficients using D-IHC. D-IHC could help pathologists to quantify PD-L1 expression through TPS and CPS for subsequent therapeutic choices in patients with advanced cancers.

Abstract
We studied the pathologists' agreements in quantifying PD-L1 expression through the tumor proportion score (TPS) and the combined positive score (CPS) using single PD-L1 immunohistochemistry (S-IHC) and double immunohistochemistry (D-IHC) combining PD-L1 staining and tumor cell markers. S-IHC and D-IHC were applied to 15 cancer samples to generate 60 digital IHC slides (30 whole slide images and 30 regions of interest of 1 mm²) for PD-L1 expression quantification using both TPS and CPS, twice by four pathologists. Agreements were estimated by calculating intraclass correlation coefficients (ICC). Analyses of both S-IHC and D-IHC slides resulted in excellent (for TPS, ICC > 0.9) to good (for CPS, ICC > 0.75) inter- and intra-pathologist agreements, with slightly higher ICC with D-IHC than with S-IHC. S-IHC resulted in higher TPS and CPS than D-IHC (+5.6 and +6.1 mean differences, respectively). High reproducibility in the quantification of PD-L1 expression is attainable using S-IHC and D-IHC.


Introduction
Targeting the PD-1/PD-L1 axis using immune checkpoint inhibitors (ICIs) is now approved to treat patients with different advanced solid cancers. Nevertheless, highly different responses to ICIs are observed in patients in terms of antitumor responses, from no response to complete remissions, but also in terms of immune-related adverse effects simulating autoimmune diseases in some patients [1]. To better stratify patients and anticipate their response to ICIs, predictive strategies have been proposed, such as the quantification of PD-L1 expression using immunohistochemistry (IHC) on tumor samples.
For some PD-1/PD-L1 ICIs, PD-L1 IHC serves as a companion diagnostic: it conditions whether the treatment can be used at all, or whether it can be used as a first-line treatment, provided that the tumor sample sufficiently expresses PD-L1. In other cases, it is only a complementary test of predictive and/or prognostic significance and does not condition the use of the treatment. For both these predictive and prognostic purposes, the guidelines for the quantification of PD-L1 expression vary from one cancer to another. Indeed, PD-L1 expression can be quantified in the tumor cells (TC) alone, leading to a "tumor proportion score" (TPS) expressed as the percentage of PD-L1-positive TC (with complete or incomplete membranous staining of any intensity), or in association with the expression by mononuclear immune cells (IC) except plasma cells (i.e., lymphocytes and monocytes/macrophages, with membranous and/or cytoplasmic IHC staining) through a "combined positive score" (CPS). Both TPS and CPS quantification therefore require distinguishing TC from IC, and PD-L1-positive from PD-L1-negative cells, which can sometimes be difficult. In addition to the various TPS or CPS quantification methods, positivity cut-off values vary from one cancer to another, but also from one ICI to another. Different PD-L1 IHC clones, protocols, and automated staining platforms can also produce variations in IHC signals. Adding the intra- and intertumor spatial and temporal heterogeneity of PD-L1 expression, there are several potential difficulties and sources of discrepancies in the scoring of PD-L1 expression [2].

Cases Selection
The cases included in this study were selected among patients with cancer samples (melanomas and NSCLC) tested for PD-L1 expression for diagnostic purposes in the Department of Pathology of CHU Brest. S-IHC and D-IHC had been performed on the same tissue block for each case, and the slides were collected from archives together with the corresponding hematoxylin-eosin-saffron (HES) slides to select a panel of 15 cases of cancers (1) of different organs and histological subtypes (6 melanomas and 9 NSCLC samples, see Table 1 for summary), and (2) with different levels of PD-L1 IHC expression according to initial pathology reports, from no staining to diffuse staining. Cases were deidentified for this reproducibility study. Clinical data about treatment choices and responses to treatments of the patients were not collected in this study, which was conducted in accordance with our national and institutional guidelines, in compliance with the Helsinki Declaration, and after approval by our institutional review board (CHRU Brest, CPP n° DC-2008-214).

Immunohistochemistry
The clone 22C3 (1:50 dilution; Dako, Glostrup, Denmark) was the anti-PD-L1 antibody used for IHC in the present study. The S-IHC analyses were performed on tissue sections, 3 µm thick, laid on Superfrost Plus slides using the Ventana Benchmark Ultra automated slide preparation system (Roche Diagnostics, Meylan, France) and OptiView DAB IHC Detection Kit (Roche Diagnostics). A PD-L1 positive control (tonsil) was added on each IHC slide. The S-IHC staining procedure included a pretreatment step with cell conditioner 1 (CC1), followed by incubation with the diluted anti-PD-L1 antibody at 37 °C. Antibody incubation and signal amplification were followed by counterstaining with hematoxylin, washing, and mounting. For D-IHC, the slides underwent an antibody denaturation step at 95 °C for 8 min after incubation with the PD-L1 antibody revealed in DAB, as described for S-IHC slides, and before the incubation with a second antibody, targeting a tumor nuclear marker and revealed in red using the ultraView Universal Alkaline Phosphatase Red Detection Kit (Roche Diagnostics). The nuclear markers used in our study were TTF-1 (clone 8G7G3/1, 1:50 dilution, Dako, 64 min CC1 pretreatment), p40 (polyclonal, 1:100 dilution, Clinisciences (Nanterre, France), 36 min CC1 pretreatment), and p53 (clone DO-7, 1:50 dilution, Dako, 64 min CC1 pretreatment) for NSCLC samples and SOX10 (clone SP267, prediluted, Cell-Marque, 64 min CC1 pretreatment) for melanoma samples. After the incubation of the second antibody, the slide was counterstained with hematoxylin, washed, and mounted.

Slides Digitalization
HES, S-IHC, and D-IHC slides were digitalized using a 3DHistech Pannoramic Midi scanner (3DHistech, Budapest, Hungary) at ×40 magnification, resulting in whole slide images (WSIs) in MRXS format. The CaseViewer software (3DHistech) was used to visualize the WSIs and to select, through the built-in annotation tool, small tissue regions of interest (ROIs) of 1 mm² (1 per IHC slide) on the basis of different PD-L1 staining abundance on the S-IHC slide, from no staining to diffuse staining, with intermediate proportions of stained cells. The same ROI selected on the S-IHC slide was selected on the corresponding D-IHC slide. Selected ROIs were then exported separately from their native WSIs into new MRXS files using the Pannoramic Viewer software (3DHistech). Once the WSI and ROI MRXS files had been produced, the whole set of MRXS files was duplicated into two image sets (set 1 and set 2), and slide labels were changed between the two image sets to prevent cross-identification by observers.
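The relabeling step can be illustrated with a minimal Python sketch. The text does not describe how the new labels were actually generated, so everything here is an assumption made for illustration: the function name `blind_labels`, the label format, the file names, and the use of a seeded shuffle. The idea is simply that each image set receives its own independent mapping from original file name to anonymous code, so that a given image carries unrelated labels in set 1 and set 2.

```python
import random


def blind_labels(slide_ids, set_name, seed):
    """Map each original slide ID to an anonymous code for one image set.

    Hypothetical helper: the study's real relabeling scheme is not described.
    """
    rng = random.Random(seed)  # a fixed seed keeps the key reproducible
    codes = [f"{set_name}-{i:03d}" for i in range(1, len(slide_ids) + 1)]
    rng.shuffle(codes)  # break any link between code order and slide order
    return dict(zip(slide_ids, codes))


# Illustrative file names only (not the study's real file names).
slides = [f"case{c:02d}_{m}_{t}.mrxs"
          for c in range(1, 16)
          for m in ("S-IHC", "D-IHC")
          for t in ("WSI", "ROI")]

key_set1 = blind_labels(slides, "SET1", seed=1)
key_set2 = blind_labels(slides, "SET2", seed=2)
```

The two dictionaries play the role of a decoding key kept by the study coordinator; the observers would only see the anonymous codes.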

PD-L1 Expression Quantification
PD-L1 expression was quantified by four pathologists. Each pathologist analyzed the whole image sets of ROI and WSI files following both the TPS and the CPS criteria. Each pathologist performed two analyses of each slide through separate interpretations of the set 1 and set 2 images, with at least 1 month between the analyses of the two image sets. For each IHC image, the corresponding HES image was also provided to the pathologists. The study design is summarized in Figure 1.
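As a quick check of the design arithmetic, the following Python sketch enumerates the images and the individual score values implied by the numbers given in the text (15 cases, two IHC methods, WSI and ROI images, two scores, two readings, four pathologists). The pathologist identifiers are placeholders, and the total of 960 values is derived here rather than stated by the authors.

```python
from itertools import product

cases = range(1, 16)                   # 15 cancer samples
methods = ("S-IHC", "D-IHC")           # single vs. dual immunostaining
image_types = ("WSI", "ROI")           # whole slide image vs. 1 mm² region
scores = ("TPS", "CPS")
readings = (1, 2)                      # image set 1 and image set 2
pathologists = ("P1", "P2", "P3", "P4")  # placeholder identifiers

# One digital IHC image per (case, method, image type) combination.
images = list(product(cases, methods, image_types))

# One collected value per (image, score, reading, pathologist) combination.
measurements = list(product(images, scores, readings, pathologists))

print(len(images), len(measurements))
```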
The percentage of TC with a membranous staining (complete or incomplete, of any intensity) evaluated on the whole MRXS image was used for the calculation of the TPS. CPS was defined as the number of PD-L1-stained cells (TC with membranous staining; IC, i.e., lymphocytes and macrophages, with membranous and/or cytoplasmic staining) divided by the total number of viable TC, multiplied by 100. Necrotic areas were not taken into account for these quantifications. For each observer and each analysis, exact count values of TPS and CPS were collected.
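The two definitions above can be written as a short Python sketch. The function and parameter names are hypothetical, and the cell counts in the usage lines are invented for illustration. The optional `cap` argument is an assumption that does not come from the text: CPS is commonly reported with a maximum of 100, but since the study does not mention a cap, the default here leaves the score uncapped.

```python
def tumor_proportion_score(pos_tc, viable_tc):
    """TPS: percentage of viable tumor cells with membranous PD-L1 staining."""
    if viable_tc <= 0:
        raise ValueError("viable_tc must be a positive count")
    if not 0 <= pos_tc <= viable_tc:
        raise ValueError("pos_tc must lie between 0 and viable_tc")
    return 100.0 * pos_tc / viable_tc


def combined_positive_score(pos_tc, pos_ic, viable_tc, cap=None):
    """CPS: PD-L1-stained tumor and immune cells per 100 viable tumor cells.

    `cap` is an assumption (not stated in the text); pass cap=100.0 to
    apply the conventional maximum.
    """
    if viable_tc <= 0:
        raise ValueError("viable_tc must be a positive count")
    if pos_tc < 0 or pos_ic < 0 or pos_tc > viable_tc:
        raise ValueError("invalid positive cell counts")
    score = 100.0 * (pos_tc + pos_ic) / viable_tc
    return score if cap is None else min(score, cap)


# Invented counts: 200 viable TC, 30 stained TC, 20 stained IC.
tps = tumor_proportion_score(30, 200)
cps = combined_positive_score(30, 20, 200)
```

Because stained immune cells enter the numerator but not the denominator, the CPS of a sample is always at least as high as its TPS, which is why misassigning a stained cell to TC or IC affects the two scores differently.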

Statistical Analyses
Statistical analyses were performed using MedCalc Statistical Software version 13.2.2 (MedCalc Software, Ostend, Belgium). Intraclass correlation coefficients (ICCs) were used to estimate the inter- and intra-observer agreements for TPS and CPS and were interpreted as follows: <0.5: poor reliability, 0.5-0.75: moderate reliability, 0.75-0.9: good reliability, >0.9: excellent reliability. Bland-Altman plots were used to represent the differences between the two IHC methods, and scatter diagrams were used to illustrate the standard deviations between the measurements of the four pathologists as a function of the means of their ratings for each image.
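To make the ICC computation and its interpretation scale concrete, here is a minimal NumPy sketch. It is not the MedCalc procedure: the text does not state which ICC model was selected, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption, as are the function names `icc_2_1` and `interpret_icc`. The handling of values falling exactly on 0.5, 0.75, or 0.9 is also a choice made here, since the published ranges share their boundaries. The example matrix is a small textbook-style data set, not study data.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) from an (n_images x k_raters) array with no missing values."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares of a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between images
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc):
    """Reliability label following the thresholds quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Illustrative ratings: 6 images (rows) scored by 4 raters (columns).
example = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
icc = icc_2_1(example)
label = interpret_icc(icc)
```

For intra-pathologist agreement, the same function could be applied to a two-column array holding one pathologist's first and second readings of each image.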

Intra-and Inter-Pathologist Reproducibility in TPS Quantification
The overall agreement between the four pathologists and for each pathologist was excellent for TPS scoring (i.e., ICC > 0.9), for both WSI and ROI images, although the analysis of melanoma samples raised more issues (resulting in poor inter-pathologist agreement and moderate intra-pathologist agreement) than that of NSCLC samples (excellent intra- and inter-pathologist agreements). The melanin pigmentation in tumor cells and melanophages, sometimes difficult to differentiate from brown DAB staining, could contribute to explaining the inferior performance in melanoma samples compared with NSCLC samples. The intra- and inter-pathologist agreements were excellent using either S-IHC or D-IHC, with slight trends toward higher ICC using D-IHC in comparison with S-IHC (see Table 2 for detailed values).

Table 2. Inter- and intra-pathologist agreement according to intraclass correlation coefficient (ICC) calculations (ICC values, 95% confidence intervals, and interpretation of the reliability). TPS: tumor proportion score; CPS: combined positive score; WSI: whole slide image; NSCLC: non-small-cell lung carcinomas; S-IHC: single PD-L1 immunohistochemistry slide; D-IHC: dual immunohistochemistry slide combining PD-L1 and tumor nuclear markers.

Intra-and Inter-Pathologist Reproducibility in CPS Quantification
The overall agreement between the four pathologists and for each pathologist was good for CPS scoring (i.e., ICC between 0.75 and 0.9), with the lowest ICC and intra- and inter-pathologist agreements obtained for the melanoma samples, as reported for TPS scoring. Of note, the CPS calculated on WSI resulted in higher intra- and inter-pathologist agreement (rated excellent according to the ICC interpretation) than the CPS calculated on ROI. Most ICCs were lower for CPS than for TPS, reflecting inferior intra- and inter-pathologist agreements for CPS. As for TPS, there were slight trends toward higher ICC using D-IHC in comparison with S-IHC, but both IHC methods resulted in good intra- and inter-pathologist agreements (see Table 2 for detailed values).

Differences in TPS and CPS between S-IHC and D-IHC Methods
Scatter diagrams illustrating the standard deviations as a function of the means of TPS and CPS data measured by the four pathologists indicated that higher standard deviations tended to be obtained in cases with TPS and CPS values far from 0 and 100, indicating greater inter-pathologist discrepancy in these cases. The same trend was observed for both S-IHC and D-IHC. The Bland-Altman plots were concordant with this trend and also showed that the TPS and CPS calculated on the basis of D-IHC tended to be lower than those calculated on the basis of S-IHC (mean differences of 5.6 for TPS and 6.1 for CPS, the differences being smaller near scores of 0 and 100 and more pronounced far from these values). See Figure 2 for the graphs and Figure 3 for images of paired S-IHC and D-IHC.
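The quantities behind these two kinds of plots can be sketched in a few lines of NumPy. The function names and the example scores below are invented for illustration and are not the study data. The 1.96 × SD limits of agreement are the usual Bland-Altman convention and are an assumption here, since the text reports only the mean differences.

```python
import numpy as np


def bland_altman(s_ihc, d_ihc):
    """Bias and limits of agreement for paired S-IHC and D-IHC scores."""
    a = np.asarray(s_ihc, dtype=float)
    b = np.asarray(d_ihc, dtype=float)
    diff = a - b                    # positive when S-IHC scores higher
    bias = diff.mean()              # the "mean difference" of the plot
    sd = diff.std(ddof=1)           # sample standard deviation
    return {
        "means": (a + b) / 2,       # x-axis of a Bland-Altman plot
        "diffs": diff,              # y-axis of a Bland-Altman plot
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


def rater_mean_and_sd(ratings):
    """Per-image mean and SD across raters (rows: images, columns: raters)."""
    x = np.asarray(ratings, dtype=float)
    return x.mean(axis=1), x.std(axis=1, ddof=1)


# Invented paired scores for five images, not taken from the study.
result = bland_altman([0, 10, 50, 90, 100], [0, 5, 40, 85, 100])

# Invented ratings of three images by four raters.
means, sds = rater_mean_and_sd([[0, 0, 1, 0], [40, 55, 60, 45], [100, 99, 100, 100]])
```

In this toy example the differences vanish at scores of 0 and 100 and peak in the middle of the scale, and the per-image SD is largest for the intermediate image, which is the same qualitative pattern as the one described above.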

Discussion
Several biomarkers can condition access to anti-PD-1/PD-L1 ICIs in patients with different advanced cancers, such as tumor mutational burden, microsatellite instability, and, more recently, the promising identification of mature tertiary lymphoid structures [16][17][18]. PD-L1 IHC expression quantification using either TPS or CPS, with positivity cut-offs that vary from one cancer type to another, is also one of these biomarkers and has been the subject of several reproducibility studies in the literature. These reproducibility studies are also very heterogeneous in terms of anti-PD-L1 IHC clones and platforms, the types of cancers, the number of cases and pathologists, the use of physical glass slides or digital ones, the need for exact quantification of positive cells or the use of semiquantitative scales, and the statistical methods. Inter- and intra-observer reproducibility is reported with different agreements, from poor to excellent, but all the aforementioned methodological variations make the different studies difficult to compare [3][4][5][6][7][8][9][10][11][12][13][14][15].
Of note, despite the interest and potential issues in differentiating TC and IC in tumor samples to calculate TPS and CPS, studies investigating the potential interest of coupling anti-PD-L1 IHC with TC nuclear markers using D-IHC were lacking in the literature. Thus, we chose to focus the present reproducibility study on this D-IHC method and its comparison with S-IHC. In our study, intra- and inter-pathologist reproducibility was good to excellent overall, with notably better results in assessing TPS than in assessing CPS, highlighting more issues in quantifying PD-L1 expression in IC and TC than in TC only. Although ICC interpretation conclusions were similar between S-IHC and D-IHC, ICC values were slightly higher with D-IHC than with S-IHC, pointing out the potential interest of D-IHC as an aid in differentiating between PD-L1-positive TC and IC. Comparisons of S-IHC and D-IHC also revealed that S-IHC TPS and CPS were higher than D-IHC ones, pointing out a potential reclassification of cells counted as PD-L1-positive TC on S-IHC images into PD-L1-positive IC using D-IHC. Because the use of D-IHC tends to result in lower PD-L1 scores than S-IHC, this could have consequences in terms of therapeutic decisions for tumor samples with scores below the currently established thresholds of "PD-L1 positivity" (established only on the basis of S-IHC to date) according to TPS and/or CPS in various cancers. Pathologists using D-IHC as a complementary analysis have to keep this in mind to avoid reclassifying patients who are candidates for ICI treatments from a "PD-L1-positive" status to a "PD-L1-negative" status.
The variations between analyses were also higher when the mean proportions of PD-L1-positive cells were far from the lowest and highest ones, pointing out particular potential issues for pathologists in the case of heterogeneous IHC staining. In the near future, novel artificial-intelligence-based software applied to digital pathology slides could be of great help to pathologists in analyzing those heterogeneous cases, allowing them to gain in reproducibility [19][20][21][22].

Conclusions
High intra- and inter-pathologist agreement is attainable in scoring PD-L1 expression through the TPS and CPS immunohistochemistry scores. In cases with difficult-to-differentiate tumor and immune cells, D-IHC can help pathologists to better calculate these scores, conditioning the access of patients to ICI treatments.

Informed Consent Statement: Not required for this methodological study with fully anonymized retrospective pathological and not clinical data.

Data Availability Statement: Data available upon reasonable request to the corresponding author.