Mid-infrared imaging in breast cancer tissue: an objective measure of grading breast cancer biopsies

Abstract
The majority of cancers are diagnosed using excised biopsy specimens. These are graded using a gold-standard histopathology protocol based on haematoxylin and eosin ('H + E') chemical staining. However, the grading is done by eye, and if the same biopsy is graded by different practitioners, they typically agree only ~70% of the time.
The resulting overtreatment problem constitutes a massive unmet need worldwide.
Our new "Digistain" technology uses mid-infrared imaging to map the fractional concentration of nucleic acids, i.e. the nuclear-to-cytoplasmic chemical ratio (NCR), across an unstained biopsy section. It allows a quantitative 'Digistain index' (DI) score, corresponding to the NCR, to be reproducibly extracted from an objective physical measurement of a cancer. Our objective here is to evaluate, for the first time, its potential for aiding cancer diagnosis. We correlate the DI scores with H + E grades in a double-blind clinical pilot trial.
Two adjacent slices were taken from 75 breast cancer FFPE blocks; one was graded with the standard H + E protocol, and also used to define a 'region of interest' (RoI). Digistain was then used to acquire a DI value averaged over the corresponding RoI on the other (unstained) slice and the results were statistically analysed.
We find the DI score correlates significantly (p = 0.0007) with tumor grade in a way that promises to significantly reduce the inherent subjectivity and variability in biopsy grading.
The NCR is elevated by increased mitotic activity because cells divide when they are younger and, on average, become smaller as the disease progresses. Also, extra DNA and RNA are generated as the nuclear transcription machinery goes awry and nuclear pleomorphism occurs. Both effects make the NCR a recognized biomarker for a wide range of tumors, so we expect Digistain will find application in a very wide range of cancers.
The decision to administer potentially harmful cytotoxic chemotherapy rests to a great extent on grading.
Currently, hematoxylin and eosin ('H + E') staining is widely used for the histopathological diagnosis of cancer [1]. Biopsy specimens are processed into formalin-fixed-paraffin-embedded (FFPE) blocks. These are sectioned, by microtome, into slices usually a few microns thick, before being deparaffinised and H + E stained. Hematoxylin binds to the acidic cell components, primarily DNA and RNA, and dyes the nuclei blue, while the eosin dyes the cytoplasmic proteins pink.
Typically the sections are graded subjectively by eye, using disease-specific grading protocols. Histological grading of breast cancer currently varies widely across multiple institutions and practitioners, because it relies on subjective criteria [2]. For example, analysis of >24 000 biopsies, graded by 732 breast cancer graders, found that they only agreed ~73% of the time [3] on average.
The mid-infrared (mid-IR) part of the electromagnetic spectrum roughly spans wavelengths between λ ~ 2.5 µm and λ ~ 25 µm. Molecules absorb these wavelengths linearly, by exciting vibrational transitions that are specific to their chemical bonds. In a large molecule each chemical moiety supports its own collection of localized vibrational modes, whose absorption wavelengths and strengths are well known. These give so-called 'fingerprint' absorption peaks in the molecules' mid-IR absorption spectra. IR chemists have long used these, often in automated systems [4], to analyze the chemical compositions of unknown mixtures, and to quantify the concentrations of various compounds in a mixture.
Digistain extends this idea by measuring the absorption of specially chosen mid-IR wavelengths [5] in a 2D image. The chosen wavelengths, λ ~ 8 µm and λ ~ 6 µm, are absorbed as vibrational excitations in the phosphate (nuclear) and amide (cytoplasmic) moieties, respectively, that are present in tissue (section 2, Methods). These raw absorption data are processed to produce a spatial map, across a tissue section, of the ratio of these two chemical components that is precise to the one percent level [6].
A computer then generates a false-colour 'digitally stained' image from this [7] that presents both the morphological and the chemical information (figure 1) in a visual form (section 2, Methods). This image then can be readily assimilated and used by histopathology personnel, and it can also be directly compared with the standard H + E images to augment the grading process.
A pathologist selects a 'region of interest' (RoI) by referencing the corresponding H + E slide, and the corresponding pixel values in the Digistain image of the tumor are averaged to generate a 'DI score', also precise to the percent level [6], which we then use to correlate with the clinical and histopathological data.
Here we report the results of an initial exploratory pilot trial, using archived FFPE blocks with follow-up, aimed at identifying the diagnostic capability of this Digistain imaging technology in breast cancer.
Table 1 shows the patient and disease characteristics. Patients were all female, with ages ranging from 30.4 to 83.7 years (mean 58.7) at diagnosis. More than half of the sample had grade 2 tumors (54.3%). The majority were HER2 negative (90.0%). Initially there were 75 patients in the sample. Four patients were recurrences and were excluded from the grading analysis. Figure 2 shows the distribution of the DI score (section 2, Methods) in the full sample of 75 patients. Also shown are the data subsets for those patients with and without 5-year follow-up. The DI score has a reasonably Normal distribution, with mean and median values slightly higher for those who died compared with those still alive at 5 years.

Association between DI scores and tumor grade
The full dataset was used to explore relationships between the potential prognostic variables. A significant relationship between DI score and grade was observed (p = 0.0007). No other significant relationships were observed between the other potential prognostic variables (table 2). Figure 3 shows that mean DI score increases with grade. The mean DI score for grade 1 patients is 0.58 (SD 0.08), the mean is increased to 0.61 (SD 0.07) for grade 2 patients and 0.68 (SD 0.09) for grade 3 patients.
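As a rough illustration of the trend just described, the sketch below fits a straight line through the three reported grade means. This is only a sketch under stated assumptions: the analysis reported here used patient-level data, which are not reproduced, so the fit to three group means cannot recover the reported p-value. The function name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    # np.polyfit returns coefficients with the highest power first.
    slope, intercept = np.polyfit(np.asarray(x, float), np.asarray(y, float), deg=1)
    return float(slope), float(intercept)

# Mean DI scores per tumor grade, as quoted in the text above.
grades = [1, 2, 3]
mean_di = [0.58, 0.61, 0.68]

slope, intercept = fit_line(grades, mean_di)
print(f"DI change per grade step: {slope:.3f}")
```

On these three means the fitted slope is about 0.05 DI units per grade step.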

Association of DI score with survival time
One patient had no follow up information so is excluded from these analyses. Twenty-one patients had died at the time of follow up and 49 were censored. The median survival time had not been reached and the low number of deaths in the study resulted in wide confidence intervals around the hazard ratio, and the DI score alone did not significantly influence the survival time (p = 0.19, Cox regression).
This is the first ever study of the value of the DI score in a clinical setting, and the DI score is a continuous variable, so there is no a priori way of choosing an appropriate DI cut point for stratifying the sample into 'low' and 'high' risk survival groups. As a first effort, we employed the Contal and O'Quigley method (section 1, Methods) as an unbiased way of identifying the optimal DI cut point. The actual DI cut point value returned by this analysis (DI = 0.668, p = 0.20) must be regarded only as preliminary, but nevertheless, using it in a log-rank test already suggests that Digistain can stratify significantly (p = 0.02) for risk.
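The idea of searching for a cut point can be sketched as a scan over candidate DI values, keeping the one that gives the largest two-group log-rank statistic. This is a simplified illustration and not the full Contal and O'Quigley procedure: it omits their adjusted p-value, which accounts for having tried many cut points. The function names, the `min_group` parameter and the toy data are all hypothetical.

```python
import numpy as np

def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    group = np.asarray(group, bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct death times
        at_risk = time >= t
        n = at_risk.sum()                     # at risk, both groups
        n1 = (at_risk & group).sum()          # at risk, "high" group
        died = event & (time == t)
        d = died.sum()                        # deaths, both groups
        d1 = (died & group).sum()             # deaths, "high" group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def scan_cut_points(marker, time, event, min_group=2):
    """Return (cut, statistic) maximizing the log-rank statistic."""
    marker = np.asarray(marker, float)
    best_cut, best_stat = None, -1.0
    for cut in np.unique(marker)[:-1]:        # "high" means marker > cut
        high = marker > cut
        if high.sum() < min_group or (~high).sum() < min_group:
            continue                          # skip very unbalanced splits
        stat = logrank_chi2(time, event, high)
        if stat > best_stat:
            best_cut, best_stat = float(cut), stat
    return best_cut, best_stat

# Toy data (made up): DI-like marker, survival time, death indicator.
cut, stat = scan_cut_points([0.7, 0.8, 0.5, 0.6], [1, 2, 3, 4], [1, 1, 1, 1])
print(cut, round(stat, 3))
```

Because the cut point is chosen to maximize the statistic, a naive p-value computed from it is optimistic, which is why the text treats the cut point as preliminary.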

Discussion
In summary, this study shows that this new Digistain technology for mapping out the NCR can measure the grade of a breast cancer biopsy objectively.
By processing and mapping the ratio of the concentration of the phosphate and amide moieties across the tissue section it is possible to generate an image. Direct comparison with the standard H + E images then allows a DI score to be produced that summarizes the tumor NCR, and we find that this correlates very strongly with tumor grade.
The significance of the correlation (p = 0.0007) observed here, in spite of the moderate N = 71 sample size, argues that, already, Digistain could form a useful adjunct to existing H + E breast cancer grading. The grading scheme used in the NPI is 33% weighted to mitotic activity, and 33% weighted to nuclear pleomorphism. Both of these are likely to be strongly correlated with the NCR, and we believe this is the mechanism behind the correlation that we find here.
The significant (p = 0.02) correlation with survival times is encouraging, and can be qualitatively explained by the same factors as the correlation with grade. However, since this is the first and only time the DI score has been used in this way, independent trials will be needed to exclude the possibility of overfitting artefacts before an appropriate DI cut point can be determined.
As seen in table 2, the significance of the correlation between grade and DI is rather greater than that between DI and many of the objective variables that are already used in diagnosis. This argues that augmenting the current subjective grading protocol with the DI information is likely to reduce overall diagnostic variability.
This new Digistain technique uses routinely processed, unstained paraffin sections of cancer tissue, and it images and quantifies the NCR chemical marker within minutes. The test can be performed by unskilled personnel, with a single mouse click, and the image information it generates can be stored, transmitted and analysed (including using 'big data' techniques) all in the digital domain.
These DI values come from specimens that are collected and prepared in a way that is already established and respected by the clinical profession. The majority of the overall clinical testing process that is needed is already part of the existing 'gold standard' H + E 'standard operating procedure' (SOP), so it is already accepted, approved and budgeted for.
The Digistain process adds no extra patient procedures to existing cancer diagnosis schemes, and it fits well with existing pathology laboratory workflows. As in this study, it can be conveniently evaluated post hoc, using a wide range of cancer samples that are already readily available in the form of archived FFPE blocks. This offers a route to clinical acceptance that has a minimum of cost and ethical implications.
The Digistain technology is label-free, and it uses low radiation intensities and photon-energies that are harmless to both the operator and the sample. The imaging machine requires only a 13A mains supply and a 30 × 30 cm bench top footprint. Future trials may allow the grading process to be partially or fully automated, especially since the DI images are already in a digital format that is compatible with machine-vision methods.
Constraining the data to only 4 pre-selected wavelengths saves time because only useable data is acquired, and the simplicity of the data processing means that we can test and understand the diagnostic value of the image data with confidence.
In addition, it should be noted that, because of the controlled and objective nature of the imaging measurement, the DI pixel values are absolute values that are directly relatable to the NCR. Since the NCR itself is a known cancer biomarker [8][9][10], this implies that the individual pixel values themselves could carry diagnostic (or even prognostic) value, over and above the morphology information in the image. This 'absolute value' feature is highly unusual in the field of biomedical imaging and it offers new avenues for future machine vision image analysis.
Finally, we note that pleomorphism and mitotic activity changes are common to a range of cancers, and we believe that future studies will reveal that the technology has both a diagnostic and a prognostic capability, with the potential to impact across a very wide spectrum of diseases and conditions that are currently assessed by histopathology.

Statistical methods
No formal sample size calculation was carried out for this, the first pilot trial of the Digistain technology. Samples were drawn from an audited collection of randomly selected breast cancer biopsy tissue blocks on the basis of available histological material and […]

Table 1. Patient and disease characteristics. The 70 patients included in the study were female, aged 30-84 years. 83% of the tumours were invasive ductal carcinoma (IDC) type. Grade corresponds to tumour grade. HER2, PgR and ER refer to human epidermal growth factor receptor, progesterone receptor and estrogen receptor statuses, respectively. 66% of the patients were LN + (lymph node positive). One patient opted out of surgery. A Digistain index (DI) was recorded for each patient.

Patient and disease characteristics are summarized descriptively. The DI score is summarized descriptively using mean, median, standard deviation and range for the overall sample and for each grade. Boxplots are used to visualize these data. The association of grade with DI score is explored using linear regression.
Kaplan-Meier curves (not shown) were used to visualise the length of survival and censoring in the patient sample. Cox regression was used to explore the association of DI score with survival time. The Contal and O'Quigley method was used to explore if a cut point for DI score could be found that was able to separate the survival curves into a high and low risk group.
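For readers unfamiliar with the Kaplan-Meier estimator mentioned above, a minimal sketch follows. It assumes right-censored data given as a follow-up time and a death indicator per patient; the function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each distinct death time t.

    time  : follow-up time per patient
    event : True/1 if the patient died at `time`, False/0 if censored
    """
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    surv = 1.0
    curve = []
    for t in np.unique(time[event]):
        n_at_risk = (time >= t).sum()           # still under observation at t
        deaths = (event & (time == t)).sum()    # deaths exactly at t
        surv *= 1.0 - deaths / n_at_risk        # product-limit update
        curve.append((float(t), surv))
    return curve

# Toy data (made up): five patients, two of them censored.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]):
    print(t, round(s, 2))
```

Censored patients contribute to the at-risk count up to their last follow-up but never trigger a drop in the curve.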
The work here is reported in accordance with the REMARK checklist [11] for prognostic marker studies.

Measuring the DI scores
Clinical samples were de-identified and two adjacent sections, each ~3 µm thick, were cut from each FFPE block. One was H + E stained and graded, as usual, under the Elston-Ellis method as used in the Nottingham prognostic index (NPI) system. It was also used by the histopathologist to visually delineate a RoI, which corresponded to the most significant grade of tissue in the section, i.e. the region in the tumour with the highest cellularity. The second section was mounted on an IR-transmitting CaF₂ slide, then de-paraffinised using the same process as defined in the standard biobank H + E staining SOP.
The Digistain imager uses bespoke software to register the H + E image with the Digistain one, before acquiring the DI score for the section by averaging the DI values of the pixels corresponding to the tissue in the RoI.
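The RoI averaging step can be sketched as a masked mean. This assumes the DI image and the RoI mask have already been registered to the same pixel grid, and that non-tissue pixels are marked as NaN; both conventions, and the function name `di_score`, are assumptions for illustration rather than details of the bespoke software.

```python
import numpy as np

def di_score(di_map, roi_mask):
    """Mean DI over the pixels inside the RoI, ignoring NaN (non-tissue) pixels."""
    di_map = np.asarray(di_map, float)
    roi_mask = np.asarray(roi_mask, bool)
    if di_map.shape != roi_mask.shape:
        raise ValueError("DI map and RoI mask must share one pixel grid")
    valid = roi_mask & np.isfinite(di_map)
    if not valid.any():
        raise ValueError("RoI contains no valid tissue pixels")
    return float(di_map[valid].mean())

# Toy 2 x 2 image (made up): one RoI pixel is non-tissue (NaN).
di = [[0.5, 0.7], [np.nan, 0.9]]
roi = [[True, True], [True, False]]
print(round(di_score(di, roi), 3))
```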
The Digistain imager [4] operates with an ~11 µm effective pixel size and takes pictures, in transmission, of the tissue section in wavelength bands specified by 4 IR interference filters. One filter records an image at the λ ~ 6.06 µm peak from the N-H 'amide 1' absorption, and a second records an image at a nearby wavelength, (λ ~ 6.23 µm) to serve as a background. The 4 images are accurately registered, and subtracting these two signals at a pixel-by-pixel level gives a value that is proportional to the concentration of N-H moieties in that part of the section. Repeating the process with mid-IR filters centered at λ ~ 8.13 µm and λ ~ 8.50 µm gives a measure of the areal concentration of the phosphodiester moiety.
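The peak-minus-background step can be sketched as a pixel-wise difference of two registered images. This is a sketch under stated assumptions: the images are taken to be transmitted intensities, so the absorbing band appears darker than its background, and the sign convention, the function name `band_difference` and the toy values are hypothetical.

```python
import numpy as np

def band_difference(peak_img, background_img):
    """Pixel-wise absorption signal for one band.

    Both inputs are registered transmission images. Stronger absorption
    lowers the transmitted intensity at the peak wavelength, so
    (background - peak) grows with the amount of absorbing moiety.
    """
    peak = np.asarray(peak_img, float)
    background = np.asarray(background_img, float)
    if peak.shape != background.shape:
        raise ValueError("images must be registered to the same pixel grid")
    return background - peak

# Toy 1 x 2 images (made up) for the four filters named in the text.
amide = band_difference([[0.60, 0.90]], [[1.00, 1.00]])      # 6.06 vs 6.23 µm
phosphate = band_difference([[0.75, 0.95]], [[1.00, 1.00]])  # 8.13 vs 8.50 µm
print(amide, phosphate)
```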
The N-H and PO₂⁻ moieties are mainly present in the cytoplasmic proteins and nuclear DNA backbone, respectively, so these two images could, in principle, be used to computer-generate the pink and blue components in a qualitative digital analogue of a conventional H + E image. However, 'Digistain' goes on to use a proprietary procedure [7] to generate a quantitative pixel value, the so-called DI value, that corresponds to the absolute ratio of the phosphodiester to the amide concentrations at that point in the section. The ratioing protocol is designed to produce a repeatable DI in a way that is robust against technical factors such as nonuniformities in illumination and detector sensitivity, thermal background radiation, and section thickness [4].
The DI measures the concentration ratios of phosphodiester to amide moieties, and because these are dominantly related to the amounts of nuclear and cytoplasmic material, respectively [4], the DI images can be regarded as 2D maps of the nuclear-to-cytoplasmic ratio, (NCR).
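To illustrate why a ratio is robust against factors such as section thickness, the sketch below divides a phosphodiester signal map by an amide signal map, pixel by pixel. It is a naive stand-in and not the proprietary procedure [7]: the plain ratio, the `min_amide` threshold used to flag non-tissue pixels, and the function name `ratio_map` are all assumptions, and the scaling of the real DI may differ.

```python
import numpy as np

def ratio_map(phosphate, amide, min_amide=1e-6):
    """Pixel-wise phosphate/amide ratio; NaN where there is no amide signal."""
    phosphate = np.asarray(phosphate, float)
    amide = np.asarray(amide, float)
    out = np.full(amide.shape, np.nan)
    tissue = amide > min_amide          # avoid dividing by ~0 off-tissue
    out[tissue] = phosphate[tissue] / amide[tissue]
    return out

# Toy signals (made up). Doubling both signals, as a section twice as
# thick would, leaves the ratio unchanged.
p = np.array([[0.25, 0.05, 0.00]])
a = np.array([[0.40, 0.10, 0.00]])
thin = ratio_map(p, a)
thick = ratio_map(2 * p, 2 * a)
print(thin, thick)
```

Any multiplicative factor common to both bands, such as thickness or illumination level, cancels in the ratio, which is the robustness property described above.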

Grant support
The work was part-funded (HA and CP) by an Engineering and Physical Sciences Research Council 'Impact Acceleration' grant, reference number EP/K503733/1. We also thank Imperial BRC and the Imperial CRUK Centre for support. The tissue samples were kindly provided by the Imperial College Healthcare NHS Trust Tissue Bank.

Author contributions
CP, HA, SS and RCC conceived, managed and designed the research programme. GT, MS and TM prepared and supplied the biopsy sections. HA and CP provided the Digistain image data. SS provided the histopathology data. LW-B, BR and KG supplied patient follow up data. KC and CW-B performed the statistical analysis. CP, HA, CC, SS and GT prepared the manuscript.