A novel FTIR analysis method for rapid high-confidence discrimination of esophageal cancer

It is demonstrated that a novel multivariate analysis technique can discriminate with accuracies in the range 81–97% between Fourier transform infrared (FTIR) images of esophageal cancer OE19 and OE21 cell lines, and between esophageal cancer associated myofibroblast (CAM) and adjacent tissue myofibroblast (ATM) cells. The latter cells are morphologically indistinguishable but are known to have functionally important differences in their capacity to stimulate cancer cell growth; this report provides the first accurate spectral discrimination between CAM and ATM cells taken from the same patient. Rapid and accurate discrimination between cell types was achieved, and key wavenumbers were identified which uniquely discriminate between all four cell types. This metrics-based analysis (MA) method is shown to be unique for distinguishing between cancer stromal cells from the same patient. The key wavenumbers differ significantly from those typically found to discriminate between various esophageal cell and tissue types. A comparison is made between the MA and the established Random Forest method, and the advantages of the MA are discussed. Crucially the findings suggest a novel method that allows cancer staging based discrimination of the stromal cell types that provide the niche for tumor


Introduction
Esophageal cancer is the sixth most common cause of cancer mortality [1][2][3] and is the cancer with the fastest rise in incidence in the western world. There are two main forms of esophageal cancer. One is squamous cell carcinoma, which is most common in Asia and is associated with smoking and poor diet. The other is adenocarcinoma, which is more common in the west and is associated with the gastro-esophageal reflux of acid and bile salts and the preneoplastic condition of Barrett's metaplasia of the esophagus [4,5]. Both cancers consist of malignant epithelial cells and stroma, and the latter is important for facilitating cancer progression. One of the most important cell types in the stroma is a specialized fibroblast called the myofibroblast, which produces growth factors and cytokines that promote cancer growth and metastasis [6,7]. The diagnosis of esophageal cancer follows the standard approach of examining images of excised tissue, obtained by endoscopy, after staining with Haematoxylin and Eosin (H&E). This highlights the nucleic acid and protein content of the specimen at blue and red visible wavelengths respectively. Typically, the interobserver discordance for the diagnosis of low-grade dysplasia, which is characteristic of the earliest preneoplastic stage of disease, is greater than 50% [8]. Although this discordance is reduced to ~15% for the diagnosis of the more serious condition of high-grade dysplasia, there is a need to improve the accuracy of diagnosis since false positives can give rise to unnecessary procedures and false negatives can be fatal [8][9][10][11][12][13]. As with all cancers, early detection is critical for the best patient outcome and there is a need for cheaper, more accurate and ideally automated methods for cancer diagnosis and for identification of those patients with Barrett's esophagus at most risk of progressing to dysplasia and cancer.
It has long been recognized that expanding the wavelength range of images of tissue will convey more information and there has been considerable progress in the application of infrared (IR) techniques to the examination of tissue in order to exploit the association of particular IR wavelengths with specific chemical moieties. Fourier transform infrared (FTIR) spectroscopy is one of the most successful techniques applied to studies of cancer and has shown considerable promise for development into a diagnostic tool [14][15][16][17][18][19]. In particular there have been a number of previous applications of FTIR spectroscopy to the study of normal and cancer associated esophageal tissues [8][9][10][11][12][13]. Wang et al. [8] applied a partial least-squares fitting procedure to determine the principal components of the FTIR spectra of squamous, Barrett's non-dysplasia, Barrett's dysplasia and gastric tissue and Maziak et al. [9] gave a direct comparison of the FTIR spectra of normal and cancerous tissue. Quaroni and Casson [10] combined confocal FTIR microscopy and an analysis of second derivative FTIR spectra to distinguish normal and Barrett's esophageal tissue from adenocarcinoma. Amrania et al. [11] have developed 'Digistain', an instrument for use in histopathology that simplifies the analysis of IR spectra by comparing the intensity of two spectral features. Recently Old et al. [12,13] have developed an automated analysis technique for rapid IR mapping that identifies Barrett's dysplasia or adenocarcinoma with high sensitivity and specificity. The conclusions of this previous work are discussed in detail later.
Imaging FTIR typically yields information at each pixel in a two-dimensional image at ~1000 wavelengths, with each spectrum containing information on the many excitation modes of the large number of different molecular species contained in the specimen. Most reported work has analyzed these large data sets using techniques such as principal component analysis and the identification of 'fingerprints' for characterizing specimens, rather than at the level of detailed assignments of individual vibrational modes that is possible with simpler molecular systems.
In this paper the results of applying a novel multivariate analysis technique to FTIR spectra are described for two esophageal cancer cell lines, OE19 and OE21, and two esophageal myofibroblast cell lines derived from the stroma of an esophageal adenocarcinoma patient. OE19 was derived from an adenocarcinoma from the esophago-gastric junction and OE21 was derived from a squamous cell esophageal cancer. Both were purchased from HPA Culture Collections [20] and maintained as described previously [6,7]. The two myofibroblast cell lines were cancer associated myofibroblasts (CAM) and adjacent tissue myofibroblasts (ATM) obtained from the same patient and previously characterized [6,7].
It is now well recognized that tumor formation requires not just the acquisition of DNA mutations by cancer cells but also an appropriate cellular microenvironment (the cancer cell niche) that facilitates tumor growth and metastasis. Different stromal cell types are implicated in niche formation including inflammatory and immune cells, microvascular cells and cells of fibroblastic lineages. Myofibroblasts are an important sub-set of fibroblasts; CAMs are morphologically similar to ATMs that have been obtained from normal tissue adjacent to the cancer, but they differ markedly in their biology and in particular are strong stimulants of aggressive behaviors by cancer cells [6,21]. Transcriptomic, proteomic and miRNA profiling studies have all provided a basis for understanding the functional differences between CAMs and ATMs [6,22,23]. However, there remains a pressing need for methods that allow the rapid and precise identification of these cell types, not least because this would facilitate the identification of the cellular microenvironments in which tumor formation occurs.
The analysis technique described in this paper is able to discriminate between all four cell types with high accuracy and speed. This is particularly important for CAM and ATM cells in view of the much stronger capacity of the former in stimulating cancer cell growth and invasion [21][22][23][24]. The data therefore support the feasibility of new staging methods for early tumor development based on identifying the presence of those myofibroblasts (CAMs) most likely to facilitate cancer cell growth. Since early diagnosis improves patient outcomes this approach should bring clear benefits.

Materials and methods
Experiments were conducted on two esophageal cancer cell lines (OE19 and OE21) and two esophageal myofibroblast cell lines denoted CAM (cancer associated) and ATM (adjacent tissue associated). The CAM and ATM cells were obtained from the same patient undergoing surgery for esophageal adenocarcinoma [6,7]. This work was approved by the Ethics Committee of the University of Szeged, Hungary. Primary myofibroblast cultures were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, 1% penicillin-streptomycin, 1% antibiotic-antimycotic and 1% non-essential amino acid solution as described previously [25]. The OE19 and OE21 human Caucasian esophageal cells were obtained from HPA Culture Collections (Sigma, Dorset, UK) [20].
OE19 and OE21 cells were cultured at 37 °C in a 5% CO2 atmosphere in Roswell Park Memorial Institute (RPMI 1640) growth media (Sigma) supplemented with 2 mM glutamine (Sigma), 10% v/v fetal bovine serum (FBS) (Invitrogen, Paisley, UK) and 1% v/v penicillin/streptomycin (Sigma) until they reached 70-80% confluence. The culture medium was replenished at two-day intervals. The myofibroblast cells were cultured at 37 °C in a 5% CO2 atmosphere in Dulbecco's modified Eagle medium with L-glutamine containing 10% v/v FBS, 1% v/v modified Eagle medium nonessential amino acid solution, 1% v/v penicillin/streptomycin, and 2% antibiotic-antimycotic. Medium was replaced routinely every 48-60 h and cells were passaged at confluence, up to 12 times. CaF2 discs (20 mm diameter × 2 mm thick, Crystran Ltd, Poole, UK) were sterilized using ethanol, rinsed with ultra-pure water and left to air-dry overnight. The discs were then irradiated with UV for 30 min to ensure sterility. One sterile disc was placed in each well of a twelve-well tissue culture plate (Corning, New York, USA). The cells (2 × 10^4 ml −1 ) were seeded on each disc and incubated in a 5% CO2 incubator at 37 °C for two days. After two days the media was removed and the cells were fixed with a 4% v/v paraformaldehyde (PFA) (Sigma) solution and stored in 1× phosphate buffered saline (PBS) solution at 4 °C until required. Prior to imaging, the CaF2 disc containing the fixed cells was rinsed at least three times with Millipore ultra-pure water (18 MΩ cm). The rinsed disc was then removed from the well plate, the back surface was wiped with ultra-pure water to ensure complete removal of any phosphate residue, and the disc was left to dry in the slide holder for a minimum of 90 min.
FTIR studies of the cell lines were carried out at room temperature in transmission mode with a Varian Cary 670-FTIR spectrometer in conjunction with a Varian Cary 620-FTIR imaging microscope produced by Varian (now Agilent Technologies, Santa Clara CA, USA) with a 128 × 128 pixel mercury-cadmium-telluride (MCT) focal plane array with a pixel size of 5.5 µm. The spectra were corrected for atmospheric and substrate absorption and the efficiencies of individual pixels in the array. FTIR images were acquired with a spectral range from 990 cm −1 to 3800 cm −1 with a resolution of 2 cm −1 , co-adding 256 scans. Infrared spectra were initially pre-processed using a principal component analysis based noise reduction algorithm. Substantial improvements in signal-to-noise were observed by retaining 10 principal components without the loss of biologically significant information. Spectra were then quality checked to remove those not attributable to the cell (including blank regions of the sample) or to a high degree of scattering. The quality check utilized a threshold based on the height of the Amide I band with spectra having absorbance between 0.03 and 1.00 being retained. Finally, infrared spectra were corrected for resonant Mie scattering with the RMieS-ESMC algorithm using 80 iterations and a matrigel reference spectrum [26][27][28][29].
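The Amide I quality check described above can be illustrated with a minimal sketch. The thresholds 0.03 and 1.00 are taken from the text; the window used to locate the Amide I band (1600 cm −1 to 1700 cm −1 ), the function names and the toy spectra are assumptions made only for illustration and are not the authors' implementation.

```python
# Sketch of a threshold-based quality check on the Amide I band height.
# ASSUMPTION: the band height is taken as the maximum absorbance in a
# 1600-1700 cm^-1 window; the source does not state how it was measured.

def amide_i_height(wavenumbers, absorbance, lo=1600.0, hi=1700.0):
    """Maximum absorbance inside the assumed Amide I window [lo, hi] cm^-1."""
    band = [a for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi]
    if not band:
        raise ValueError("spectrum does not cover the Amide I window")
    return max(band)

def passes_quality_check(wavenumbers, absorbance,
                         min_height=0.03, max_height=1.00):
    """Retain spectra whose Amide I height lies between the two thresholds."""
    height = amide_i_height(wavenumbers, absorbance)
    return min_height <= height <= max_height

# Hypothetical toy spectra on a very coarse wavenumber grid.
wavenumbers = [1500.0, 1600.0, 1650.0, 1700.0, 1800.0]
spectra = {
    "cell": [0.10, 0.20, 0.45, 0.15, 0.02],     # plausible cell spectrum
    "blank": [0.00, 0.01, 0.02, 0.01, 0.00],    # empty region, too weak
    "scatter": [0.80, 1.10, 1.40, 1.20, 0.90],  # too strong, rejected
}
kept = [name for name, spec in spectra.items()
        if passes_quality_check(wavenumbers, spec)]
```

In this toy case only the "cell" spectrum is retained; the blank pixel falls below the lower threshold and the strongly absorbing pixel exceeds the upper one.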

Data analysis method
An FTIR data cube was acquired for each cell type and was corrected for Mie scattering effects [26]. Each FTIR data cube comprises a set of images of i × j pixels, where typically i × j ~ 10^5 and on average ~50% of pixels pass the quality check and Mie scattering correction. The third dimension of the data cube is the FTIR spectra of ~1400 data points covering the range of wavenumbers ν = 990 cm −1 to 3800 cm −1 in 2 cm −1 steps. The FTIR image obtained from the OE19 sample is shown in Fig. 1(a). The FTIR spectra characterizing each cell type [Fig. 1(b)] over the "fingerprint region" of 1000 cm −1 to 1800 cm −1 were generated by averaging the spectra obtained from each pixel in the corresponding FTIR image of that cell type. This average does not include pixels from blank areas of the image. There are problems in deducing information from a direct comparison of these average profiles.
Firstly, due to variations in the total intensity of the spectra obtained from each specimen, it is necessary to normalize each profile to the same area under the curve. Since the effect of the normalization on the spectral profile depends on the wavelength range used this can hide or exacerbate differences between the profiles of different specimens. Secondly, the standard deviation of the absorbance of all pixels at a given wavenumber is significant and shows significant overlap between cell types (see Supplementary Fig. 1). Consequently a more sophisticated analysis is required to reveal the differences between the spectral profiles of the different cell lines. There is considerable interest in the application of machine learning algorithms and multivariate analysis techniques to such problems and there are several recent reviews of the application of such techniques to FTIR spectra [30,31]. In this work a novel multivariate analysis method hereafter referred to as Metrics Analysis (MA) is described. The metrics were chosen to be the ratios of the absorbance for a given pair of wavenumbers. One advantage of this approach is that the results are independent of absolute absorbance and thus insensitive to factors such as sample thickness or normalization of the spectra. Importantly, this MA method treats all the data equally and does not attribute any biological significance to any particular wavenumber, in contrast to other work such as Fernandez et al. [32,33] in which discrimination of prostate tissues used metrics that were defined to have a significance related to tissue biochemistry. By examining ratios at wavenumbers over the whole range of 1000 cm −1 to 1800 cm −1 , the MA demonstrates the existence of biomarkers at wavenumbers that have not been identified in previous studies using other analysis techniques. The MA method can be divided into three main parts: Stage 1: Training, Stage 2: Testing, and Stage 3: Analysis. 
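The insensitivity of a ratio metric to absolute absorbance can be shown with a short sketch. It assumes that a change in sample thickness multiplies the absorbance at every wavenumber by the same factor; the spectrum values and the helper name ratio_metric are hypothetical.

```python
# Sketch: an absorbance ratio is unchanged by a uniform multiplicative scaling.
# ASSUMPTION: thickness changes scale all absorbances by one common factor.

def ratio_metric(spectrum, nu1, nu2):
    """Absorbance ratio A(nu1) / A(nu2) for a {wavenumber: absorbance} dict."""
    return spectrum[nu1] / spectrum[nu2]

# Hypothetical absorbances at three wavenumbers (cm^-1).
spectrum = {1150: 0.12, 1546: 0.30, 1750: 0.06}

# Same composition, 2.5 times the absorbance everywhere (e.g. thicker sample).
thicker = {nu: 2.5 * a for nu, a in spectrum.items()}

m_thin = ratio_metric(spectrum, 1750, 1150)
m_thick = ratio_metric(thicker, 1750, 1150)
```

Both values equal 0.5 to within floating-point rounding. Note that only a common multiplicative factor cancels in the ratio; an additive baseline offset would not.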
For the results reported here, training was completed using 75% of the spectra in the data set, chosen at random, and testing was undertaken on the remaining 25%. Stage 1 parameterizes each cell type via the calculation of the absorbance ratio at two wavenumbers, the metric. This was done for all wavenumber combinations at a chosen step size over the range 1000 cm −1 to 1800 cm −1 . The step size was 6 cm −1 , as anything smaller has been shown [34] to be unnecessary. As a consequence there are a total of ~18000 metrics. In Stage 2 a score was associated with each metric to quantify how well the metric was able to discriminate between cell types. For each cell type, scores were calculated by making distribution histograms for the metrics (one for the cell type and one for each of the other cell types in the analysis), where a high score is obtained for distributions that are distinct and hence have relatively little overlap. The score is defined by
$$\text{score} = \text{success rate} \times (1 - \text{mislabeling rate})^{2}$$
where the success rate (often referred to as the sensitivity) is the rate at which the cell type is labeled correctly and the mislabeling rate (often referred to as the false positive rate) is the rate at which other cell types are labeled incorrectly as this cell type. Because the cell type is known for the 25% of spectra used in this testing phase, a success rate can be calculated, and the probabilities of identifying the other cell types are used to determine the mislabeling rate. The scores for each metric are used to rank the ability of that metric to distinguish a given cell type. Stage 3 determines the number of metrics that are needed by a voting system to give the best overall success rate for cell type discrimination. The overall success rate is plotted as a function of the number of metrics used, which indicates the optimal number of metrics required to achieve the best discrimination.
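As a rough check on the numbers above, the following sketch counts the metrics on a 6 cm −1 grid and evaluates the score. Two assumptions are made: that each metric is an ordered pair (ν 1 , ν 2 ), so that a ratio and its inverse count separately, and that the score is read as success rate × (1 − mislabeling rate) squared. The function names are hypothetical.

```python
# Sketch: count ratio metrics on a 6 cm^-1 grid and evaluate the metric score.
# ASSUMPTIONS: metrics are ordered wavenumber pairs; the score is
# success_rate * (1 - mislabeling_rate)**2.
from itertools import permutations

def wavenumber_grid(start=1000, stop=1800, step=6):
    """Wavenumbers from start up to at most stop, in increments of step."""
    return list(range(start, stop + 1, step))

def metric_score(success_rate, mislabeling_rate):
    """Score that rewards sensitivity and penalizes false positives."""
    return success_rate * (1.0 - mislabeling_rate) ** 2

grid = wavenumber_grid()                # 134 wavenumbers, 1000 ... 1798
metrics = list(permutations(grid, 2))   # ordered pairs (nu1, nu2), nu1 != nu2
n_metrics = len(metrics)                # 134 * 133 = 17822

example = metric_score(0.9, 0.1)        # 0.9 * 0.81
```

Under these assumptions the grid yields 17822 metrics, which is consistent with the quoted total of about 18000. Squaring the (1 − mislabeling rate) term makes the score fall quickly as false positives increase: a metric with a 90% success rate scores about 0.73 at a 10% mislabeling rate but only about 0.23 at 50%.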

Discrimination between cell types
The wavenumbers that the MA method finds to be most important for discrimination can be visualized in a plot of the metric scores against ν 1 and ν 2 , hereafter referred to as a Butterfly Plot. Two such plots, for CAM and ATM, are shown in Fig. 2.
All possible metrics are shown in these plots. The color-bar scale ranges from the least important (blue) to most important (red) metrics for discrimination. For the CAM and ATM samples, very different behavior is seen in the Butterfly plots, which highlights the clear discrimination achieved between these two cell types. This is a significant result since histopathologists find it difficult to distinguish between these cell types using the current standard method of optical microscopy on H&E stained samples [22]. For CAM, high scoring metrics are those that contain at least one high wavenumber around 1750 cm −1 (the red regions in Fig. 2(a)). The opposite situation is found for ATM, where high scoring metrics are often associated with at least one low wavenumber around 1150 cm −1 (the red regions in Fig. 2(b)).
While the scores for all the possible metrics (at the chosen step size) are evaluated and shown in Fig. 2, further insight can be obtained by limiting the results to a visualization of the best (highest-scoring) 100 metrics, hereafter referred to as Manhattan Plots. The plots for CAM and ATM are shown in Fig. 3, where the highest-ranked metrics for each cell type are shown plotted for ν 1 (red) and ν 2 (blue). These plots illustrate the combinations of wavenumbers that are used as a function of an increasing number of metrics from 1 to 100. It is clear that there are significant differences in the wavenumbers used for discrimination between these two cell types. In addition to visualizing the metric scores by Butterfly and Manhattan Plots, the success rate can be presented in a plot (Fig. 4) that shows how, for each cell type, the success rate varies with increasing number of metrics used in the analysis. In general, the success rate will eventually diminish due to poor metrics being added that compromise the success rate. Different variation is seen for the different cell types. For example, the success rate for ATM increases with the number of metrics used up to 24 metrics and subsequently decreases. In contrast, the success rate for OE19 is high for a low number of metrics and decreases as more metrics are used. For each cell type, the optimum number of metrics required for discrimination is given by the position of the maximum success rate.
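The way the success rate can rise and then fall as lower-ranked metrics are added may be illustrated with a toy voting sketch. The voting rule used here (a strict majority of the top-ranked metrics must label a spectrum as the target cell type) is an assumption, since the text does not specify the rule, and the vote table is invented for illustration.

```python
# Sketch: success rate versus number of metrics under majority voting.
# ASSUMPTION: a spectrum of the target cell type is labeled correctly when a
# strict majority of the n top-ranked metrics vote for that cell type.

def success_rate_curve(votes):
    """votes[m][s] is True if ranked metric m labels spectrum s correctly."""
    n_metrics = len(votes)
    n_spectra = len(votes[0])
    curve = []
    for n in range(1, n_metrics + 1):
        correct = 0
        for s in range(n_spectra):
            yes = sum(1 for m in range(n) if votes[m][s])
            if 2 * yes > n:          # strict majority of the first n metrics
                correct += 1
        curve.append(correct / n_spectra)
    return curve

def optimal_metric_count(curve):
    """Smallest number of metrics that reaches the maximum success rate."""
    return curve.index(max(curve)) + 1

# Hypothetical votes: 5 metrics (best first) on 4 spectra of the target type.
votes = [
    [True, True, False, True],
    [True, False, True, True],
    [True, True, True, False],
    [False, False, False, False],   # poor metric
    [False, False, False, True],    # poor metric
]
curve = success_rate_curve(votes)
best_n = optimal_metric_count(curve)
```

For this invented table the curve is [0.75, 0.5, 1.0, 0.25, 0.5], so the optimum is three metrics; adding the two poor metrics afterwards lowers the success rate, which mirrors the qualitative behavior described for Fig. 4.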
As the data were sampled from a single image for each cell line, there was concern over whether spectra from adjacent pixels, which may be correlated due to the finite spatial resolution of the imaging system, could potentially bias the analysis and hence result in unrealistically high scores. To check this, the spatially ordered spectra were split into training and testing sets in such a way that the vast majority of the training spectra were not adjacent to the testing spectra. This analysis returned results that were indistinguishable from the original sets, demonstrating that any such pixel correlations do not contribute any significant bias to the results.
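One simple way to build such a spatially separated split is sketched below. It is an assumed scheme, not necessarily the one used here: the image is divided by row so that only pixels on the boundary row of the training region touch a testing pixel. The 128 × 128 size matches the focal plane array; the quality check is ignored for simplicity.

```python
# Sketch: spatially contiguous train/test split of image pixels.
# ASSUMPTION: splitting by row is one possible scheme; the source does not
# describe the exact procedure. Quality-check filtering is ignored here.

def spatial_split(n_rows, n_cols, train_fraction=0.75):
    """Assign the first block of rows to training and the rest to testing."""
    split_row = int(n_rows * train_fraction)
    train = [(r, c) for r in range(split_row) for c in range(n_cols)]
    test = [(r, c) for r in range(split_row, n_rows) for c in range(n_cols)]
    return train, test

def fraction_adjacent(train, test):
    """Fraction of training pixels with at least one testing pixel among
    their eight nearest neighbors."""
    test_set = set(test)

    def touches_test(r, c):
        return any((r + dr, c + dc) in test_set
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))

    return sum(touches_test(r, c) for r, c in train) / len(train)

train, test = spatial_split(128, 128)
adjacent = fraction_adjacent(train, test)
```

With this scheme only one training row in 96 borders the testing region, so about 1% of training pixels have a testing neighbor, compared with nearly all of them under a purely random pixel-level split.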
To aid the interpretation of the wavenumbers that are found to be important in this analysis, the wavenumbers in the top five metrics were examined for each cell type. Five metrics were chosen to give an apposite number of wavenumbers to allow meaningful comparisons between values for different cell types. These wavenumbers are shown in Fig. 5 and summarized in Table 1, and will be discussed further in the Discussion section.

Comparison of metrics analysis with random forest
In order to compare the MA method with existing classification methods we chose a quantitative comparison with the well-established random forest (RF) method. This is the most appropriate comparison as RF encapsulates both feature extraction and classification, and is commonly used for FTIR data analysis in the biomedical field. The same data sets were analyzed using both techniques for the four cell lines. The RF method used was a standard RF classification algorithm [35], available from https://github.com/tingliu/randomforest-matlab, which was used to construct a classifier to discriminate between the different samples. Table 2 compares the MA and RF analysis results for the cell lines. The key wavenumbers found to be necessary for discrimination in both techniques showed some similarities. Little improvement in accuracy was seen when running the RF analysis for longer than ~30 s or when increasing the number of trees from 10 to 500. In general the MA method achieves greater accuracy in discrimination (particularly for ATM) in a shorter time (Table 2) than RF. For example, the MA of OE21 achieves a success rate of 79% within one minute whereas RF is limited to ~50%. It appears that RF is unable to distinguish ATM, with success rates no higher than would be expected from random chance (25%) when choosing one cell type from four possible types. These low success rates for the RF method are a consequence of the differing sizes of the data sets (the number of spectra) associated with each of the cell lines. The MA method gives high success rates regardless of whether the data sets are balanced and of comparable sizes, whereas the RF method is sensitive to this balance and gives poor success rates unless the data sets are rebalanced or the input data are reweighted.
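To make the reweighting remark concrete, the sketch below computes inverse-frequency class weights, a common heuristic for unbalanced training sets. It is offered only as an illustration of the idea: it is not the procedure used in this work, and the spectra counts are hypothetical.

```python
# Sketch: inverse-frequency class weights for an unbalanced data set.
# ASSUMPTIONS: the per-class spectra counts are invented, and this heuristic
# is an illustration, not the reweighting used by the authors.

def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Hypothetical numbers of quality-checked spectra per cell line.
counts = {"OE19": 40000, "OE21": 30000, "CAM": 20000, "ATM": 10000}

weights = balanced_class_weights(counts)
effective = {label: weights[label] * counts[label] for label in counts}
chance_level = 1 / len(counts)   # 0.25 for four cell types
```

After weighting, every class contributes the same effective number of spectra (25000 in this example), so the least represented class is no longer swamped by the others. The chance level of 0.25 corresponds to the random-guess success rate quoted above for four cell types.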

Discussion
There have been significant advances in the application of FTIR to the study of normal and cancerous esophageal tissues [8][9][10][11][12][13]. Maziak et al. [9] compared FTIR profiles of normal and cancerous tissue and revealed prominent absorption changes at certain wavenumbers. In particular, changes at 964 cm −1 and 1237 cm −1 were assigned to increased nucleic acid content in malignant tissue, and changes in the bands at 1024 cm −1 and 1049 cm −1 indicated that glycogen was clearly present in healthy tissue but almost completely depleted in cancerous tissue [9]. Wang et al. [8] showed using a partial least-squares fitting procedure that the principal components of the FTIR spectra of squamous, Barrett's non-dysplasia, Barrett's dysplasia and gastric tissue in the range 950 cm −1 to 1800 cm −1 arose from variations in the concentration of DNA, protein, glycogen and glycoprotein. They established that dysplasia was characterized by an increase in glycoprotein and DNA. A subsequent imaging study by Quaroni and Casson [10] using a combination of confocal FTIR microscopy and a hierarchical cluster analysis of second derivative FTIR spectra was able to distinguish normal and Barrett's esophageal tissue from adenocarcinoma. Old et al. [12,13] have developed a rapid IR mapping automated analysis technique that identifies Barrett's dysplasia or adenocarcinoma with 95.6% sensitivity and 86.4% specificity. Their analysis of second derivative FTIR spectra confirmed that normal squamous tissue had a high glycogen content, Barrett's tissue a high glycoprotein content and Barrett's dysplasia and adenocarcinoma a high DNA content.
The first thing to note from the results of the MA is that the wavenumbers that are found to discriminate between the different cell types (Table 1) differ significantly from the wavenumbers that have previously been used to characterize esophageal tissue types. For example, none of the glycogen, glycoprotein or DNA wavenumbers identified by Wang et al. [8] and Quaroni and Casson [10] or any of the ten characteristic wavenumbers identified in Table 7 of Old et al. [13] appear in Table 1. Also, only four of the twenty characteristic wavenumbers identified as distinguishing normal tissue from adenocarcinoma by Maziak et al. [9] appear in Table 1. This does not mean that the wavenumbers identified in previous work [8][9][10]13] are not valid discriminants (indeed, they are found by the MA when more metrics are included) but that they are not as significant as those found from the top five metrics.
The four wavenumbers common to this work and Maziak et al. [9] provide discriminants, to an accuracy of ± 1 cm −1 , of the following cells from all other cells: ATM (1049 cm −1 ), OE19 and ATM (1399 cm −1 ), OE19 and ATM (1465 cm −1 ) and OE21 (1545 cm −1 ). These wavenumbers are attributed, respectively, by Maziak et al. [9] to glycogen, lipids, lipids and proteins. The meaning of the wavenumbers found to discriminate between cell types in the MA is subtle since they are derived from a blind pairwise comparison of all the wavenumbers in the FTIR spectra of all the cell types. Consequently the discriminating wavenumbers must be interpreted with care. What is clear is that when used in combination with other metrics they provide excellent discrimination between all the cell types (Fig. 5). An analysis at the level of five metrics reveals twenty-four discriminating wavenumbers and, as described in detail above, only four of these wavenumbers have been used in previous work to characterize differences between esophageal tissue types. Five of the discriminating wavenumbers in Table 1 are common to more than one cell type. A wavenumber that is common to two cell types discriminates between those cells and all the others. This means that it is characteristic of a chemical moiety that is present in those cells at a concentration significantly different from its concentration in all other cells.
The finding from previous work [8][9][10]13] that malignancy is characterized by an increase in DNA and a large decrease in glycogen suggests that changes in the concentration of these molecules should provide important discriminants between the ATM cells, which can be taken to be representative of healthy tissue, and the CAM cells and two malignant cell lines. This draws attention to the region between 1000 cm −1 and 1200 cm −1 where there is significant overlap between strong contributions from both molecules [9,36] and Table 1 and Fig. 5 show a strong concentration of discriminating wavenumbers in this spectral region. Fig. 6 shows an overlay of the normalized spectral profiles of Fig. 1 for each cell type in this spectral region. As explained earlier such comparisons of spectra can be misleading due to the dependence of the profiles on the wavelength range over which the normalization is carried out. However by taking a third power derivative of the spectra obtained from normal and malignant tissue Maziak et al. [9] identified four key wavenumbers in this region, 1024 cm −1 , 1049 cm −1 , 1080 cm −1 and 1155 cm −1 which they attributed to glycogen, glycogen, nucleic acids and proteins, respectively. Only one of these wavenumbers, 1049 cm −1 , occurs in the list of discriminating wavenumbers of Table 1 and Fig. 5. A deeper analysis of the data at the optimum number of metrics, twenty-four, reveals a large increase in the number of discriminating wavenumbers in this range as shown in Fig. 6. None of these additional wavenumbers correspond to the wavenumbers identified by Maziak et al. [9]. It is possible that some of the discriminating wavenumbers shown in Fig. 6 arise from particular chemical or structural effects in the DNA of the OE19, OE21 and CAM cell lines which could not be identified from tables of wavenumbers known to arise from particular chemical moieties.
A comparison of the other wavenumbers that discriminate between the different cell types with the signatures of known chemical moieties [37,38] provides other insights into differences in chemical structure of the cells and tissues. For example, the OE19 and CAM cells, which are both derived from adenocarcinoma, share a discriminant at 1692 cm −1 associated with nucleic acids [37], which is absent from OE21 cells, which arise from squamous carcinoma. This wavenumber may therefore mark a moiety that is specific to adenocarcinoma. The OE21 and ATM cells share discriminating wavenumbers of 1466 cm −1 and 1472 cm −1 , which have been identified as characteristics of lipids [36][37][38].
It is particularly notable that the metrics approach provides excellent discrimination between cells derived from adenocarcinoma (OE19) and squamous cell carcinoma (OE21) and that ATM and CAM cells do not share a single one of the fifteen wavenumbers that discriminate between them and the other cell types. Clearly the identification of discriminating wavenumbers between the various cell types contains a wealth of information that is worthy of further study and may produce significant new insights into the chemical structure of esophageal and other cancers.

Conclusions
To summarize, we have demonstrated that a novel multivariate statistical analysis technique can discriminate with accuracies in the range 81% to 97% between FTIR images of OE19, OE21, CAM and ATM cell lines. This provides the first accurate spectral discrimination between CAM and ATM myofibroblast cells taken within 3 cm of tissue from the same patient. It should be stressed that these cell types are not readily distinguished by routine morphological approaches even though it is established that they have important biochemical differences that are relevant to the stimulation of cancer cell behavior [6]. The findings have potential clinical application in early diagnosis by identification of putative cancer cell microenvironments and by allowing the demarcation between tumor and adjacent tissue stroma without recourse to the analysis of biomarkers or extensive tissue processing. This is a significant result since histopathologists find it difficult to distinguish between these cell types using the current standard method of optical microscopy on H&E stained samples [22]. Moreover, the data indicate that it is now justified to conduct a much larger, appropriately powered, trial directed at the spectral discrimination of the important clinical groups, not least those Barrett's patients most at risk of progression including those with dysplastic lesions.
The MA method offers a new way of interpreting FTIR data. It has revealed wavenumbers which uniquely discriminate between all four cell types, many of which have not previously been identified with chemical moieties found in healthy tissue. The method discriminates between cell types with high accuracy and speed and has significant advantages over the RF approach. The method is expected to be widely applicable to other cell types and tissues.

Declaration of Competing Interest
There are no conflicts to declare.