Convenient determination of serum HER‐2 status in breast cancer patients using Raman spectroscopy

Given the significant therapeutic efficacy of anti‐HER‐2 treatment, the HER‐2 status is a crucial piece of information that must be obtained in breast cancer patients. Currently, as per guidelines, HER‐2 status is typically acquired from breast tissue of patients. However, there is growing interest in obtaining HER‐2 status from serum and other samples due to the convenience and potential for dynamic monitoring. In this study, we have developed a serum Raman spectroscopy technique that allows for the rapid acquisition of HER‐2 status in a convenient manner. The established HER‐2 negative and positive classification model achieved an area under the curve of 0.8334. To further validate the reliability of our method, we replicated the process using immunohistochemistry and in situ hybridization. The results demonstrate that serum Raman spectroscopy, coupled with artificial intelligence algorithms, is an effective technical approach for obtaining HER‐2 status.


| INTRODUCTION
Breast cancer is the most common malignant tumor in the world, and its incidence is increasing [1]. Breast cancer is highly heterogeneous, as reflected in gene mutations and differences in protein-receptor expression [2,3]. Treatment decisions are guided by molecular typing and TNM staging [4]. Immunohistochemistry (IHC), alone or combined with in situ hybridization (ISH), is recommended for obtaining HER-2 status [5,6].
Identifying HER-2 status plays an important role in breast cancer treatment, since anti-HER-2 therapy has greatly improved the prognosis for this molecular type of breast cancer [7]. The HER-2 status determined from an ultrasound-guided core needle biopsy (CNB) is highly consistent with the status determined using postoperative tissue specimens [8]. In addition to the HER-2 status, IHC can provide the estrogen receptor (ER), progesterone receptor (PR), and proliferation protein (Ki-67) statuses [9,10]. The availability of these high-quality tumor markers, which have been validated for efficacy and clinical utility, is crucial for clinical diagnosis and treatment [4,11]. Depending on the statuses of these markers, breast cancer can be divided into several molecular types (luminal A, luminal B, HER-2-positive, and triple-negative breast cancer [TNBC]), which further guide treatment plans and determine the prognosis.
The timely availability of serum samples provides a feasible way to realize real-time follow-up. Serum is extremely rich in material components and relatively easy to access, which makes it a strong candidate for disease diagnosis, prognosis prediction, and treatment guidance [12][13][14]. Several molecules important in the diagnosis, evaluation, and treatment of breast cancer have also been shown to exist in the serum [15,16]. The enzyme-linked immunosorbent assay (ELISA) is the first FDA-approved method for detecting HER-2 in the serum [17]. In addition to ELISA, surface plasmon resonance [18], electrochemical detection [19], mass spectrometry [20], and other methods can be used to obtain the HER-2 status from serum samples; these have high specificity but low sensitivity [15]. HER-2 is a transmembrane receptor tyrosine kinase. Like other receptors in the epidermal growth factor receptor (EGFR) family, it is a single-subunit transmembrane glycoprotein composed of three domains: a transmembrane domain, an intracellular tyrosine kinase catalytic domain, and an extracellular domain (ECD). Within the EGFR family, HER-2 has the strongest catalytic kinase activity, so heterodimers containing HER-2 have the strongest signaling activity [21,22], and the ECD of HER-2 can be cleaved and released into the serum. The serum HER-2 ECD levels of breast cancer patients are between 15 and 75 ng/mL, significantly higher than those of the normal population and HER-2-negative patients [23]. Owing to the heterogeneity of breast cancer, the uncertain critical value for the serum concentration of HER-2 ECD, the interference of complex components in the serum, and other factors, ASCO in 2007 did not recommend measuring HER-2 serum levels for clinical diagnosis and treatment [24]. However, this does not mean that the HER-2 status cannot be obtained from serum samples. Compared with other reported studies on detecting HER-2 status from serum, this study demonstrated advantages in accuracy and convenience [15].
IHC and ISH are the most recognized and recommended methods. When the IHC result is 2+, ISH is then performed. Fluorescence in situ hybridization (FISH) is the most widely used ISH method, and it was also the source of the HER-2 status labels in this study. In the case of IHC 2+, further evaluation of HER-2 by ISH is limited because each tissue section carries only a single label, so potentially meaningful information is missed. In addition, IHC-based biomarkers have drawbacks such as subjectivity, the invasiveness of the acquisition, differences in sampling location, poor reproducibility, and the exhaustion of tissue samples [25]. Although multiplex IHC/immunofluorescence (mIHC/IF) technologies can simultaneously obtain information on multiple markers from a single slice and acquire information about tissue samples with high throughput, obtaining the tissue samples remains invasive and the current price is high [26]. Next-generation sequencing (NGS) is also widely used by the academic community to verify the true status following IHC and ISH for HER-2. The results indicate that the NGS method is consistent with the current gold standard, but its disadvantage is that it consumes many medical resources [27,28].
Raman spectroscopy is an inelastic scattering detection technology based on the Raman scattering effect. It obtains spectral information through the coupling of molecular-bond oscillations with the incident electromagnetic field [29]. It has been widely used in biology in recent years to classify substances, and the obtained spectrum is called a fingerprint spectrum [30]. Combined with artificial intelligence, it can achieve good classification for early screening and prognosis prediction in many diseases [31]. It is worth noting that Raman spectroscopy is not destructive to the sample, has high sensitivity, and can effectively classify samples such as serum [32]. Because water molecules have a very low Raman scattering cross-section, they interfere little with the signal from the components of interest in the liquid [29,33]. Many studies have also applied Raman spectroscopy to serum samples, including screening for hyperthyroidism [34] and gastric cancer [35], and rat serum has been used to screen for Alzheimer's disease [36].
Although IHC+ISH is the gold standard for determining HER-2 status, it cannot be performed in real time. Considering the unique advantages of serum Raman spectroscopy, in this study we report the possibility of obtaining HER-2 status in real time from serum Raman spectra.

| METHODS
The workflow demonstrates the real-time and convenient acquisition of Raman spectra for HER-2 status. The workflow consists of two parts. The first part is the model establishment stage, which includes programs A, B, and C in Figure 1. Program A is the Raman spectrum data acquisition program, program B is the standard pathological acquisition program for HER-2 status, and program C is the model establishment program. The second part is the clinical application part, where a new sample can be validated for HER-2 status in real time using the established workflow, as shown on the right side of Figure 1.

| Ethics approval and consent to participate
This study was approved by the Ethics Committee of Sichuan Cancer Hospital (No: SCCHEC-02-2022-140). Every subject completed an informed consent form in accordance with the Declaration of Helsinki. All methods were carried out in accordance with relevant guidelines and regulations.

| Obtaining clinical information and serum samples
Human serum samples were obtained from patients who were hospitalized in Sichuan Cancer Hospital (Chengdu, China) from June 2021 to June 2022. Serum was separated within 3 h after the blood was drawn, by centrifugation at room temperature at 1500g for 10 min, and placed in cryopreservation tubes. Raman spectra were acquired within 1 h. HER-2 labels were obtained by standard pathology procedures.

| Experimental device
Raman spectroscopy was performed at the Sichuan Institute for Brain Science and Brain-Inspired Intelligence (Chengdu 611731, China). The spectrometer was custom designed and assembled. The system comprised a laser, a grating, a cryogenic CCD, a volume phase holographic (VPH) spectrometer, and a Raman probe. The SMA fiber adapter and the collimator formed the laser end, which was used to collimate the 785 nm laser. It contained a laser filter (Semrock, LL01-785-12.5) for forming a monochromatic laser, and used a microscope objective lens (50×, NA 0.5, WD 8.0, Sunnyoptical) to focus the laser. SMA fiber adapters and sequential bilateral filters made up the spectrometer end, which blocked all the backscattered Rayleigh signals from the sample and allowed only the Raman signal to pass through. The objective lens and spectrometer were connected by a 300-μm-diameter fiber, and the spectrometer (lens-based VPH grating type, F/2.2, EMvision) had a CCD (thermoelectric cooling, −60 °C, Andor iVac DR-316B-LDC-DD). The output power was 100 mW, and the power measured on the sample was less than 40 mW. Spectra were recorded at 400-1800 cm⁻¹. A special sample card slot was made to fix the cryotube, so the laser could pass through its wall to obtain the serum signal. The focus was set on the sample.
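As an illustration of what the recorded range means in wavelength terms, the following minimal sketch converts between Raman shift and scattered wavelength for the 785 nm excitation. The function names are hypothetical, and the relation used is the standard Stokes-shift definition rather than anything specific to this instrument.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Stokes Raman shift (cm^-1) from excitation and scattered wavelengths (nm)."""
    # 1e7 converts a wavelength in nm into a wavenumber in cm^-1.
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Scattered wavelength (nm) for a given Stokes Raman shift (cm^-1)."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# Wavelength window corresponding to 400-1800 cm^-1 with a 785 nm laser.
low_nm = scattered_wavelength_nm(785.0, 400.0)
high_nm = scattered_wavelength_nm(785.0, 1800.0)
print(f"{low_nm:.1f} nm to {high_nm:.1f} nm")
```

Under this relation, the reported spectral range corresponds to scattered light roughly between 810 and 914 nm.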

| Acquisition of Raman spectra data
Before the Raman spectrum data acquisition, we calibrated the instrument's wavenumbers using a neon lamp. The ethanol spectrum and the normal saline spectrum in the cryovial were collected for calibration at the beginning and end of each experiment, and each integration time was 1 s. The sample-fixing tank was provided with a light-shielding device, and cosmic rays were manually removed during collection. Each sample was analyzed in three rounds, and each round comprised five acquisitions, so a total of 15 spectra were obtained for each sample. Multiple acquisitions reduced the risk of errors arising from a single acquisition of the sample. The average value for each sample was recorded as an independent sample.
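The per-sample averaging described above can be sketched as follows. This is a minimal illustration that assumes the 15 acquisitions of one sample are stored as an array of shape (rounds, repeats, points); the function name and array layout are assumptions, not part of the original pipeline.

```python
import numpy as np


def average_sample_spectrum(acquisitions: np.ndarray) -> np.ndarray:
    """Average all acquisitions of one sample into a single spectrum.

    `acquisitions` is assumed to have shape (rounds, repeats, n_points),
    e.g. (3, 5, n_points) for three rounds of five acquisitions each.
    """
    acquisitions = np.asarray(acquisitions, dtype=float)
    if acquisitions.ndim != 3:
        raise ValueError("expected an array of shape (rounds, repeats, n_points)")
    rounds, repeats, n_points = acquisitions.shape
    # Flatten rounds and repeats, then average point by point.
    return acquisitions.reshape(rounds * repeats, n_points).mean(axis=0)
```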

| Acquisition of HER-2 status
The HER-2 status was first determined using standard methods (VENTANA anti-Her2/neu, 4B5), and the HER-2 expression score was classified as 0, 1+, 2+, or 3+ according to the American Society of Clinical Oncology (ASCO) guidelines [6]. ISH analysis was required in the case of 2+ to determine whether HER-2 was amplified. In this study, FISH was performed with a directly labeled probe specific for HER-2. Sections (4 μm) were deparaffinized in xylene and rehydrated through a series of graded ethanol solutions that we prepared. We observed the slides under a fluorescence microscope (Leica DM6000) equipped with multiple bandpass filters to visualize the colors simultaneously, following standard procedures recommended by the manufacturer. According to ASCO recommendations, samples with IHC scores of 3+ or FISH amplification were considered HER-2 positive, and other samples were considered HER-2 negative.
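The labeling rule in the last sentence can be written as a small function. This is a sketch only: the function name and string encodings are hypothetical, and raising an error for a 2+ sample without a FISH result is an assumption, since the text does not say how such samples were handled.

```python
from typing import Optional


def her2_label(ihc_score: str, fish_amplified: Optional[bool] = None) -> str:
    """Binary HER-2 label from an IHC score and, for 2+ cases, the FISH result."""
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        # Assumption: a 2+ case cannot be labeled without a FISH result.
        if fish_amplified is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return "positive" if fish_amplified else "negative"
    if ihc_score in ("0", "1+"):
        return "negative"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```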

| Data processing and SVM model
In this study, a total of 459 patient samples were ultimately included. They were divided by patient source into a training group and an independent validation group, with 70% and 30% of the data, respectively. The training group included 321 samples with a total of 4815 spectra, and the independent validation group had 138 samples with 2070 spectra. The sample labels of the independent validation set included 25 negative, 12 weak positive, 12 2+ FISH-positive, 31 2+ FISH-negative, and 34 positive patients. Prior to model establishment, all spectra underwent preprocessing, including automatic weighted least squares smoothing, baseline correction based on polynomial fitting, and fluorescence background subtraction after baseline correction. The processed spectra were then normalized using min-max scaling. Statistical tests were performed using analysis of variance (ANOVA). In the training group, 40% of the data was randomly sampled, and this sampling was repeated 100 times. Once the random samples were confirmed to follow a Gaussian distribution, ANOVA was used for statistical analysis. The wavenumber points that showed statistical significance in more than 50 of the 100 analyses between two comparison groups were selected as feature inputs for support vector machine (SVM) classification models. Based on the final clinical HER-2 results of the samples, we developed a clinically applicable model, denoted as M, for determining HER-2 status, and used k-fold cross-validation to evaluate its performance. Additionally, to simulate the clinical details of IHC and ISH, we divided the process into three steps, simulating each stage and establishing nine models, M1 to M9. The results are presented using the area under the curve (AUC) metric, and the model labels and classifications are summarized in Table 1.
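Part of the preprocessing chain can be sketched as below. This is a simplified illustration, not the authors' implementation: it uses an iterative polynomial fit as one common form of polynomial baseline correction, followed by min-max scaling, and it omits the weighted least squares smoothing step. The polynomial degree, iteration count, and function names are assumptions.

```python
import numpy as np


def polynomial_baseline(shifts, intensities, degree=5, n_iter=50):
    """Estimate a baseline by repeatedly fitting a polynomial and clipping peaks."""
    shifts = np.asarray(shifts, dtype=float)
    work = np.asarray(intensities, dtype=float).copy()
    # Rescale the axis so the polynomial fit stays well conditioned.
    x = (shifts - shifts.mean()) / shifts.std()
    fit = work
    for _ in range(max(1, n_iter)):
        fit = np.polyval(np.polyfit(x, work, degree), x)
        # Points above the fit are treated as peaks and clipped to the fit.
        work = np.minimum(work, fit)
    return fit


def min_max_normalize(values):
    """Scale a spectrum to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def preprocess(shifts, intensities, degree=5):
    """Baseline-correct and normalize one spectrum."""
    baseline = polynomial_baseline(shifts, intensities, degree=degree)
    return min_max_normalize(np.asarray(intensities, dtype=float) - baseline)
```

Applying `preprocess` to each spectrum before feature selection keeps all spectra on a common 0 to 1 intensity scale.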

| RESULTS
The research process is shown in Figure 1. Serum was obtained from 459 patients, and the IHC procedure was performed at the same time; for those with an IHC result of 2+, the FISH procedure was performed. The average age of the patients was 51.58 (28-80) years, and specific clinical data are shown in Table 2.
In total, 459 samples had HER-2 results, and 18 of the 2+ samples had no FISH results.

| Results for Raman spectra
With group labels obtained through IHC staining, averaging the spectra within each of the four large sample groups made individual differences nearly imperceptible, and the intergroup differences were also nearly imperceptible (on the order of 10⁻²), necessitating further exploration with statistical algorithms (Figure 2).
To make the data model reliable, we first used a twofold variance test to determine whether there were any statistically significant differences between data groups. After confirming the absence of statistical differences, we used analysis of variance to select Raman shifts that exhibited significant statistical differences as feature inputs for the SVM model (Figure 3). Each category was grouped by ANOVA statistics, and the top 5% of the feature points with stable differences were selected.
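The stability-based selection of Raman shifts can be sketched as follows, assuming the procedure stated in the Methods (40% random subsamples, 100 repetitions, significance in more than 50 runs). The use of scikit-learn's `f_classif` for the per-wavenumber ANOVA, the 0.05 significance level, and the function name are assumptions for illustration; the final top 5% ranking step is not shown.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def stable_features(X, y, n_repeats=100, fraction=0.4, alpha=0.05,
                    min_hits=50, seed=0):
    """Count how often each wavenumber is significant across random subsamples.

    X: (n_spectra, n_wavenumbers) preprocessed intensities; y: group labels.
    Returns the per-wavenumber hit counts and the indices with more than
    `min_hits` significant runs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    subset_size = max(2, int(round(fraction * n_samples)))
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_repeats):
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        if np.unique(y[idx]).size < 2:
            continue  # ANOVA needs at least two groups in the subsample
        _, p_values = f_classif(X[idx], y[idx])
        hits += (p_values < alpha).astype(int)
    return hits, np.flatnonzero(hits > min_hits)
```

The hit counts play the role of the waveform heights in Figure 3: a higher count means the difference at that Raman shift recurred in more of the random subsamples.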
Figure 3 shows the Raman shift sites at which each model produced stable statistical differences. The color bands mark where these stable sites appear, and the dark bands represent the statistically significant areas. To establish the SVM model, the top 5% of all the points with stable differences were selected as the feature input. The figure shows the location and strength of the feature points in the binary classification model.

| The SVM results
We first obtained a model for determining HER-2 clinical labels as negative or positive results. The final model underwent 10-fold cross-validation, yielding an average AUC of 0.8334, with a standard deviation (STD) of 0.0105, indicating good model stability.
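The evaluation reported here (mean and STD of the AUC over 10 folds) can be sketched with scikit-learn as below. The kernel, the feature standardization, and the stratified shuffling are assumptions, since the text does not specify the SVM configuration or how folds were formed.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_validated_auc(X, y, n_splits=10, seed=0):
    """Mean and standard deviation of the ROC AUC over stratified k folds."""
    # Assumed configuration: standardized features and an RBF-kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return float(scores.mean()), float(scores.std())
```

If several spectra from the same patient are kept as separate rows, forming folds by patient (for example with `GroupKFold`) would avoid placing one patient's spectra in both training and test folds; the text does not state which scheme was used.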
After selecting the stable feature points, we established nine models according to the steps, and we calculated the average AUC of these models and the STD of the AUC to check the stability of the model (Figure 4).
The nine models showed good separability and stability. The maximum AUC was 0.881, for M4, which classified the 1+ and 2+ statuses in IHC. The minimum AUC was obtained by M7, which classified the combined 1+ and FISH-negative 2+ statuses against an IHC score of 0. The STD represents the stability of the model, and the mean STD of all the models was 0.0168. The specific ROC results are shown in Figure 5.

| DISCUSSION
This study confirms that serum Raman spectroscopy can achieve an AUC of 0.83 when evaluated against labels from IHC+ISH. Furthermore, the results from 10-fold cross-validation demonstrate the stability of the model. This offers a rapid method for determining HER-2 status.
Obtaining important immunohistochemical indicators, such as the status of hormone receptors and HER-2, by CNB is common in clinical practice. In this study, we used serum samples and Raman spectroscopy combined with machine learning to obtain the HER-2 status, and the establishment of an independent validation set demonstrated the reliability of this approach. The ability to obtain the HER-2 status in real time has attracted the attention of researchers [37,38]. The greatest shortcoming of the currently used combination of IHC and ISH is the lack of "real-time" follow-up for HER-2 status [39].
The blue regions correspond to approximate positions reported in the literature. The peaks in the Raman shift coordinates represent the frequency of occurrence of these positions in the 200 feature extractions. It can be observed that the majority of the positions fall within the reported range.
These differences may result from stable changes in the major components of serum, which may include, but are not limited to, hydrolysis of the ECD of HER-2.
Figure 6 shows the Raman spectral bands for breast cancer as reported by Hanna et al. (provided by Renishaw) in the literature. It can be observed that most of the sites with statistically significant differences were consistent with those reported in the literature. These components were mainly cysteine (disulfide bridges), proteins, beta-carotene, C-C bond stretching in lipids, and other substances [31,40,41]. Based on literature reports, we conducted an analysis of compound and molecular bond vibrations at these differential points. The distribution of these molecular bonds closely aligns with the distribution of Raman differential points reported in Figure 5. Additionally, there are a few molecular vibration modes related to DNA, which we speculate may be due to the presence of free DNA in the serum.
In this study, we employed a single model to predict the clinical HER-2 status, and the results showed a high concordance with the gold standard. To further validate the reliability of our method, we dissected the IHC+ISH workflow. In the IHC step, we verified that serum Raman spectroscopy can distinguish between the four different staining states of the HER-2 protein. In the ISH step, we simulated the amplification status of ERBB2. Considering the clinical significance of antibody-drug conjugates in the context of low HER-2 expression, we combined the results of IHC and ISH to categorize HER-2 into three classes: HER-2 negative, HER-2 low expression, and HER-2 amplification. These results demonstrate that serum Raman spectroscopy can reliably reproduce the details of the IHC+ISH workflow.
IHC 1+ and 2+ FISH-negative patients were classified as patients with HER-2-low breast cancers [50]. Our results showed that, after merging the 2+ status, the separability of the weakly positive and positive groups decreased. The difference between M1 and M7 was that 98 samples without FISH amplification were added to the 1+ group, resulting in a decrease in the AUC. The difference between M3 and M8 was that 28 FISH-amplified samples were added to the 3+ samples, and the AUC decreased slightly. Considering that the number of FISH-amplified samples was lower than that of non-amplified samples, the mechanism of the decrease in AUC may be the same as that for M1 and M7. The difference between M5 and M9 was that 2+ FISH samples were added to the 1+ and 3+ samples, respectively, and the AUC decreased. We speculate that this may be because the expression changes in the serum of 2+ FISH-amplified samples were not obvious. In addition, the IHC+ISH method we used was slightly different from NGS [27]. Our final grouping was nevertheless consistent with literature reports. These findings suggest the existence of molecular subsets within the HER-2-positive, HER-2-low, and HER-2-zero subgroups that harbor certain mutational features, which could contribute to the molecular heterogeneity of these tumors [51].
There are many studies on obtaining HER-2 status through serum samples, with various biosensor-based methods capable of quantitative analysis, including electrochemical approaches, which can achieve detection at the picogram level [52,53]. In addition to quantitative analysis, serum also offers qualitative analysis methods. This qualitative strategy is entirely different from the quantitative one and involves the use of known markers combined with artificial intelligence to establish classification models, successfully modeling molecular subtypes of breast cancer [54]. This approach is similar to the main concept of our study, where we establish clinically applicable classification models for disease status based on known labels.
ANOVA helped us to screen valuable regions, but the correlation of shifted positions in Raman spectra with molecules was extremely difficult to interpret because a large number of organic molecules coexist and share some functional groups that affect Raman spectral features [55]. This study could not prove which substances caused these differences, and additional analysis is required in subsequent studies to further increase the accuracy.
Although this study has certain limitations, the results suggest that serum Raman spectroscopy may be a feasible way to obtain the HER-2 status from blood samples with enormous scalability and unparalleled convenience.

| CONCLUSION
Raman spectroscopy of serum samples can be conveniently and accurately used to determine the status of HER-2/neu. The results are consistent with those of the gold standard IHC/ISH detection and have higher scalability.
FIGURE 1 (A) Collection of Raman spectra of serum; (B) IHC program for HER-2 status; (C) model establishment program. A new case can obtain its HER-2 status through the model. IHC, immunohistochemistry.

FIGURE 2 Average Raman spectra of intergroup variations in HER-2 protein expression. Panels A and B, respectively, illustrate certain details of intergroup differences.

FIGURE 3 Distribution of Raman spectroscopy features mimicking the IHC+ISH process. M1 to M6 represent the protein staining steps in IHC, comprising six binary models for the four HER-2 protein states. M7 to M9 depict the model features for three HER-2 states when combining FISH results. The red markers indicate the top 5% differential points. The waveform of each model represents the statistical difference among the 100 models. The higher the waveform, the higher the frequency of the difference. The color bar shows the intensity of the frequency. FISH, fluorescence in situ hybridization; IHC, immunohistochemistry.

FIGURE 5 Model performance results for the IHC+ISH steps. IHC, immunohistochemistry; ISH, in situ hybridization.

FIGURE 6 Markings of differential Raman shifts from reference breast cancer and the top 5% differential locations of Model M.

TABLE 1 Models and label marks.
Abbreviation: FISH, fluorescence in situ hybridization.