Brain tumor grading diagnosis using transfer learning based on optical coherence tomography

In neurosurgery, accurately identifying brain tumor tissue is vital for reducing recurrence. Current imaging techniques have limitations, prompting the exploration of alternative methods. This study validated a binary hierarchical classification of brain tissues into four types: normal tissue, primary central nervous system lymphoma (PCNSL), high-grade glioma (HGG), and low-grade glioma (LGG), using transfer learning. Tumor specimens were measured with optical coherence tomography (OCT), and a pre-trained MobileNetV2 model was employed for classification. Surgeons could optimize predictions by adjusting decision thresholds according to their experience. The model showed robust classification performance and promising clinical value. A dynamic t-SNE visualized its performance, offering a new approach to neurosurgical decision-making regarding brain tumors.


Introduction
According to the Central Brain Tumor Registry of the United States (CBTRUS) statistics [1], gliomas, malignant brain tumors originating from glial cells such as astrocytomas, are the most prevalent malignant brain tumors. Their defining characteristic is invasive growth into the surrounding white matter of the brain, which makes it challenging to distinguish the tumor's border from normal brain tissue. Based on growth rate and invasiveness, gliomas are categorized into low-grade glioma (LGG) and high-grade glioma (HGG). LGG encompasses pilocytic astrocytoma (grade 1) and IDH-mutant astrocytoma (grade 2), while HGG includes IDH-mutant astrocytoma (grades 3 and 4) and IDH-wildtype glioblastoma (GBM, grade 4). Patients with HGG have an average life expectancy of approximately ten months [2]. For LGG, reported survival varies across studies, spanning from 61.1 to 90 months [3]. About 45% of LGGs evolve into the malignant variant (HGG) within five years [4,5]. Complete tumor resection represents a pivotal phase in the therapeutic process, facilitating the safe removal of a substantial tumor entity, mitigating neurological deficits, and establishing a precise tumor phenotype for subsequent treatment strategies. Statistics indicate a direct correlation between the extent of tumor resection and patient life expectancy [6][7][8][9], particularly for less malignant astrocytomas [10]. Retrospective studies have emphasized potential issues: confusion of primary central nervous system lymphoma (PCNSL) with glioma and inaccurate glioma grade classification, resulting in incorrect treatment decisions [11][12][13][14][15]. Treatment strategies for PCNSL and glioma differ, and within glioma, HGG and LGG follow distinct plans. Therefore, accurate clinical classification is crucial to minimize the risk of recurrence [16][17][18][19][20][21][22].
The general diagnosis of brain tumors involves surgery. A stereotactic neuronavigation system guides the surgeon intraoperatively to remove the entire tumor without harming other brain tissue. After resection, pathologists prepare paraffin sections (PS) for a definitive tissue diagnosis. While this method preserves tissue type and cellular characteristics, its significant drawback is that the process takes days. Since PS cannot provide tissue information intraoperatively, the frozen section (FS) was developed, providing surgical strategy insights in approximately 30 minutes. However, fast freezing can distort cellular structures, producing artifacts on the hematoxylin/eosin (H&E) staining images, so FS accuracy is inferior to PS. Hence, on-site determination of brain tumors is essential during surgery. Optical coherence tomography (OCT) is a real-time, non-invasive imaging technology widely applied in medicine. It provides micrometer-level cross-sectional image resolution and millimeter-scale penetration depth, filling a gap between magnetic resonance imaging (MRI) and fluorescence microscopy. Because it offers image resolution comparable to pathological results, this approach is sometimes called "optical biopsy." OCT eliminates the need for additional contrast agents, mitigating potential side effects and streamlining image acquisition. On-site OCT scanning of fresh ex-vivo brain tissue using a movable cart can potentially provide alternative histological information for clinicians. Therefore, OCT emerges as a secure option for facilitating intraoperative diagnoses in neurosurgery [23].
In recent years, there has been a significant surge of interest in deep learning. However, achieving rapid and effective convergence requires considerable computational power due to the immense computational complexity involved. Given these challenges, transfer learning is an approach for addressing problems across disparate but interconnected tasks by leveraging existing knowledge. Much research has been committed to applying transfer learning to brain MRI [24][25][26][27][28][29][30][31][32], and transfer learning for MRI brain image classification is well developed. However, relevant research on OCT imaging is lacking.
Real-time qualitative and quantitative cues offer clinicians valuable information for intraoperative brain tumor differentiation. Currently, the time-consuming nature of FS leaves room for improvement. Thus, rapid on-site diagnosis via OCT provides an alternative tool to overcome the insufficiency of FS-based diagnosis. This study represents a pioneering effort to combine OCT with transfer learning to classify high- and low-grade gliomas. Furthermore, the hierarchical binary classification aligns with clinicians' immediate needs. We aim to cultivate a robust OCT system tailored for intraoperative brain tumor assessment.

Materials and methods
In this study, we conducted our experiment on ex vivo specimens. Both glioma and PCNSL samples were excised during routine surgical operations. Unfortunately, no OCT dataset of normal brain tissue has been published. Given the unavailability of normal brain tissue from surgical routines, we selected the porcine brain as a surrogate for normal human brain tissue, as the porcine brain resembles the human brain in histological characteristics and is suitable for preliminary clinical trials [33,34]. Following OCT measurements of the specimens, we employed the pre-trained MobileNetV2 model for subsequent deep learning. This design ultimately yielded predictions for three probabilities. Model performance was evaluated using a confusion matrix and dynamic t-distributed stochastic neighbor embedding (t-SNE) scatter plots [35]. Experimental details are described in the following sections.
The utilized OCT system remained consistent with the one previously published [36]. It was designed using a single-mode fiber-based balanced Mach-Zehnder interferometer configuration. The high-speed swept-source laser (HSL-20-50, Santec Corp.) had a center wavelength of 1.31 µm with a full width at half maximum (FWHM) of 100 nm, reaching a theoretical axial resolution of 8 µm in air. The A-line scanning rate of 50 kHz was provided by the amplified photodetector (APD) (PDA05CF2, Thorlabs) to detect the reflection from the fiber Bragg grating (FBG) (FBGSMF-1266-80-0.2-A-(2)60F/E, L = 1 M, Tatsuta Electric Wire & Cable Co., Ltd.) at the wavelength of 1266.0 nm. The k-linearity calibration was performed using the built-in k-trigger of the laser.
Figure 1 shows the current swept-source OCT (SS-OCT) schematic diagram. The input light was split by the C1 coupler into a minor portion and a major portion, with a ratio of 1:99. The minor part was directed towards the FBG, resulting in 80% of the incoming beam being reflected at a wavelength of 1266.0 nm. The APD captured the reflected beam as the A-trigger signal. The major part passed the C2 coupler and entered the interferometer's reference and sample arms through two polarization-insensitive optical circulators (PICIR-1214-12-L-05-NE, OF-Link Communications Co., Ltd.), Cir1 and Cir2, with a ratio of 20:80. In the reference arm, the light was collimated by a fiber collimator (FC1) (F260APC-C, Thorlabs) and an achromatic lens (L1) (AC254-030-C-ML, Thorlabs) and reflected by a gold-coated mirror. In the sample arm, galvanometers (G1, G2) (GVS012, Thorlabs) and a fiber collimator (FC2, L2) were added to the optical path before the samples. The system had an approximate lateral resolution of 18 µm in air. The interference signal was formed in the C3 coupler by transmitting the beams from the two arms via the circulators and detected by a balanced photodetector (BPD) (PDB480-AC, Thorlabs) to acquire less contaminated signals. The system sensitivity achieved was 91.58 dB. Before entering the waveform digitizer (ATS9350, Alazar Technologies), the electric signals were filtered by a high-pass filter (HPF) (ZFHP-0R23-S+, Mini-Circuits International) and a low-pass filter (LPF) (BLP-90+, Mini-Circuits International), yielding a designated frequency band of 0.23 to 81 MHz. Finally, the interference signals were sampled linearly in k-space.
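The digitized fringe processing described above (background subtraction, then a Fourier transform of the k-linearly sampled interference signal into depth space) can be sketched in a few lines of Python. The Hanning window and the dB floor below are illustrative choices for a sketch, not settings taken from the system:

```python
import numpy as np

def reconstruct_ascan(fringe, background):
    """Reconstruct one OCT depth profile (A-scan) from a k-linear fringe.

    fringe:     1-D array of interference samples, already linear in k
                (the system uses the laser's built-in k-trigger for this).
    background: reference-arm-only signal recorded with the sample arm blocked.
    """
    signal = fringe - background               # remove DC / reference background
    window = np.hanning(len(signal))           # suppress FFT side lobes (illustrative)
    depth_profile = np.fft.fft(signal * window)
    half = len(depth_profile) // 2             # keep the positive-depth half only
    intensity = np.abs(depth_profile[:half])
    return 20.0 * np.log10(intensity + 1e-12)  # dB scale for display
```

A reflector at a single depth produces a single-frequency fringe, so its reconstructed A-scan peaks at the corresponding depth bin.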
Fig. 1. System diagram of the swept-source OCT (SS-OCT) system. Black curves, dotted curves, and spaced yellow regions represent the optical fiber path, electrical wires, and air-space beam transmission, respectively. HSL, high-speed laser; C1, 1:99 coupler; C2, 20:80 coupler; C3, 50:50 coupler; Cir1, Cir2, circulator; L1, L2, lens; APD, amplified photodetector; BPD, balanced photodetector; FBG, fiber Bragg grating; FC, fiber collimator; FG, function generator; G1, G2, Galvano scanner; GC, Galvano controller; HPF, high-pass filter; LPF, low-pass filter; M, mirror; S, sample; PC, personal computer.
Self-developed LabVIEW programs controlled all system functions (LabVIEW 2017, National Instruments). The function generator (FG) governed the two galvanometers, ensuring synchronization with the waveform digitizer for two-dimensional scanning. The scanning area covered 5 mm (width x) × 5 mm (width y), encompassing a C-scan comprising 1000 × 1000 A-scans. Each sample was scanned volumetrically in multiple directions to increase the data. The scanning area was adjusted for each OCT volume measurement to minimize the structural similarity among OCT volumes from the same specimens. In addition, frames exhibiting strong reflections were manually excluded from model training as they degraded image quality. This study aimed to confirm the feasibility of the proposed algorithm.
The Department of Neurosurgery at Taipei Veterans General Hospital in Taiwan recruited participants for this study, and all subjects provided written informed consent. Patients between the ages of 20 and 65 who required resection surgery were eligible for inclusion, while those with metastatic brain tumors or who had undergone chemotherapy or radiotherapy were excluded. Tumor specimens were obtained during routine surgical procedures and preserved in formalin solution, along with the porcine samples. We conducted OCT scanning ex vivo after one day in formalin solution. The size of each specimen was at least 5 mm × 5 mm × 5 mm (maximum 10 mm × 10 mm × 10 mm). Each scan covered a physical size of 5 mm (B-scan, width x) × 5 mm (C-scan, width y) × 5 mm (depth, z). The study received ethical approval from both the Institutional Review Board (IRB) of Taipei Veterans General Hospital (2019-07-022CC) and National Chiao Tung University (NCTU-REC-108-066E).
Figure 2 depicts the image preprocessing before deep learning. All images were captured from different angles, even for the same specimen. All data processing was done in Python v3.6 with CUDA GPU acceleration on a personal computer equipped with 16.0 GB RAM, an Intel Core i5-7500 CPU operating at 3.40 GHz, and an NVIDIA GeForce GTX 1660 GPU. Starting from the captured interference signal, background subtraction and k-linearity calibration were performed in the frequency domain, followed by a fast Fourier transform to the spatial domain. After that, speckle reduction and artifact removal generated the OCT images. Despeckling involved averaging seven adjacent B-scans after translational registration. We resized and normalized the despeckled images to 128 pixels (depth) × 256 pixels (width) (physical range of 2.5 mm × 5.0 mm) before training to achieve efficient training of the neural network. Data augmentation was implemented through random combinations of rotation, translation, horizontal flipping, and zooming. To preserve the morphological features in the OCT images, the translation and zooming effects were limited to a ratio of 0.1. As such, strict control was exercised over image quality to eliminate potential variables that could influence the results.
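A minimal sketch of the despeckling step (averaging adjacent B-scans after translational registration) might look like the following. Phase correlation is one common way to estimate an integer translation between frames; the text does not specify which registration algorithm was used, so this is an illustrative choice:

```python
import numpy as np

def estimate_shift(ref, img):
    """Estimate the integer (dy, dx) shift of img relative to ref via
    phase correlation (cross-power spectrum peak)."""
    F = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map wrapped peak indices to signed shifts
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx

def despeckle(bscans):
    """Average a stack of adjacent B-scans (seven in the paper) after
    translational registration to the middle frame."""
    ref = bscans[len(bscans) // 2]
    aligned = []
    for frame in bscans:
        dy, dx = estimate_shift(ref, frame)
        aligned.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)
```

Averaging registered frames suppresses speckle (which decorrelates between frames) while preserving the underlying structure.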
The model was grafted from MobileNetV2 pretrained on the ImageNet dataset. We removed the last layer of the MobileNetV2 model and then added three new layers, as shown in Fig. 3(a), to learn the relation between the features extracted by MobileNetV2 and the desired output labels. During the training process, a batch size of 32 images was employed, and the Adam optimizer was selected with a learning rate of 0.00001. Validation accuracy served as the criterion to evaluate performance, and training stopped when the validation accuracy ceased to increase for 20 consecutive epochs. It is worth noting that the activation function of the output layer was the sigmoid function, enabling adjustable thresholds for the individual classes. Figure 3(b) depicts our self-built binary hierarchical classification flowchart. The predictions were completed by sequentially deciding whether the tissue was normal, PCNSL, or HGG. The decision order was arranged by the pathological closeness of the labels. This technique is appropriate when the patient's prior probability is available from the surgeon's experience, and the user can adjust the thresholds accordingly to optimize the confidence of the prediction outcomes.
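A hedged sketch of this transfer-learning setup in tf.keras follows. The MobileNetV2 backbone, Adam optimizer, 1e-5 learning rate, and sigmoid outputs match the text; the widths of the three added layers are illustrative assumptions (the paper does not report them), and freezing the backbone is a common transfer-learning choice that the paper does not explicitly confirm:

```python
import tensorflow as tf

def build_classifier(input_shape=(128, 256, 3), weights="imagenet"):
    """MobileNetV2 backbone with a new three-layer head.

    The sigmoid output yields three independent probabilities
    (P_NOR, P_LYM, P_HGG), each with its own adjustable threshold.
    Dense layer widths (256, 64) are illustrative assumptions.
    """
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False,
        weights=weights, pooling="avg")
    base.trainable = False  # assumption: freeze the pretrained backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training would then use `model.fit(..., batch_size=32, callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=20, restore_best_weights=True)])` to stop once validation accuracy plateaus for 20 consecutive epochs, as described above.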
Two visualization techniques, the confusion matrix and t-SNE, were used to assess model performance. t-SNE stands out as a powerful dimensionality reduction method that preserves the local structure of the transformed data, making it particularly applicable for visualizing high-dimensional datasets. This research used t-SNE to examine data distributions. Moreover, Bokeh, a Python module, renders the graphs interactively rather than as static presentations. Through dragging or zooming with the mouse, each dot on the dynamic t-SNE plot shows the sample's true label, predicted label, and corresponding OCT image. We successfully designed a user-friendly interface. Each generated plot is saved as an HTML file; after closing the program, users can still open the HTML files to review the model's performance. The results of this segment will be presented in subsequent sections.


Results
Twelve patients diagnosed with glioma, comprising nine HGG and three LGG, and one patient with PCNSL were recruited from routine operations. Each specimen underwent multiple scans to augment the dataset. Recruitment details are listed in Table 1, where NOR represents normal brain tissue harvested from porcine brains. We divided the data into training and testing datasets to facilitate model training and testing. Within the training dataset, we employed five-fold cross-validation to assess the model's consistency and reliability, ensuring robust performance. Data splitting is shown in Table 2. For glioma, the data of one patient were never allocated to the training set and the testing set simultaneously, so model training proceeds without data leakage. Unfortunately, only one PCNSL patient was available, and the source of NOR is the porcine brain. Fortunately, the PCNSL patient provided seven separate specimens; we believe that in this case the data can be divided into training and testing sets on the basis of volume, even from the same patient. The number of volumes in the training set was controlled to be a multiple of five at the data-split level to implement five-fold cross-validation.
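The patient-level split described above can be expressed with scikit-learn's GroupKFold, which guarantees that all OCT volumes sharing a patient (or specimen) identifier fall on the same side of each fold. This is a sketch of the constraint, not the authors' code:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def patient_level_folds(n_volumes, patient_ids, n_splits=5):
    """Five-fold cross-validation in which all volumes from one patient
    stay on the same side of the split, preventing leakage between the
    training and testing sets."""
    X = np.zeros((n_volumes, 1))  # placeholder; only indices matter here
    splitter = GroupKFold(n_splits=n_splits)
    return list(splitter.split(X, groups=patient_ids))
```

For the single PCNSL patient, the same mechanism can be applied at the specimen level by passing specimen identifiers as the groups, mirroring the volume-based split described above.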
The accuracies of the proposed method on the training, validation, and testing data under the default probability threshold (0.5) were 93.09%, 89.56%, and 84.59%, with standard deviations of 4.1%, 3.5%, and 3.6%, respectively, demonstrating acceptable differentiation power. Figure 4 shows the confusion matrix of the testing data from one of the models. Normal tissue and LGG misclassify each other in only a small portion of cases, which may be caused by OCT scanning of inconspicuous lesions. In the third-layer classification under the default probability (0.5), the accuracy of distinguishing HGG from LGG reached 80%. An important factor contributing to this challenge is data imbalance; the primary image features are also similar, since both belong to the glioma category. More training data are required to enhance model performance. Fortunately, our proposed binary hierarchical classification method enables custom threshold adjustments to align with clinical requirements. The evaluation metrics for PCNSL (labeled LYM) exhibited sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 99.0%, 100.0%, 100.0%, and 99.6%, respectively. The corresponding metrics for HGG were 80.0%, 80.0%, 87.0%, and 90.4%, and for LGG 80.3%, 88.5%, 70.0%, and 93.1%, culminating in an overall accuracy of 86.4%. According to the receiver operating characteristic (ROC) curves of P_NOR, P_LYM, and P_HGG from the testing data, as depicted in Fig. 5, the mean areas under the curves (AUC) were 0.996, 1.000, and 0.898, showing good differentiation power for the targeted tumors.
Fig. 4. The confusion matrix of the testing data from one model. The sensitivities and specificities of PCNSL (labeled LYM) were 99.0% and 100%, respectively; for HGG, 80.0% and 80.0%; for LGG, 80.3% and 88.5%, leading to an overall accuracy of 86.4%.
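The per-class metrics reported above follow from one-vs-rest counts on the confusion matrix. A small helper illustrates the computation (a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def class_metrics(y_true, y_pred, positive):
    """One-vs-rest sensitivity, specificity, PPV, and NPV for one class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return dict(sensitivity=tp / (tp + fn),   # true positive rate
                specificity=tn / (tn + fp),   # true negative rate
                ppv=tp / (tp + fp),           # positive predictive value
                npv=tn / (tn + fn))           # negative predictive value
```

Applying this to each of the four labels in turn reproduces a metrics table of the kind reported above.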
We plotted the 2D distribution of the data at the last average-pooling layer using dynamic t-SNE, as depicted in Fig. 6. The plot features true boundaries that divide the entire representation into four ground-truth categories. The two axes correspond to the meta-features of the data, with each data point representing an OCT image color-coded according to its predicted label. According to the results in Fig. 6, normal tissue exhibited a homogeneous appearance, whereas glioma generally displayed irregular holes and abnormal attenuation, as reported before [36]. In contrast, instead of showing microstructural features, PCNSL surprisingly tended to be homogeneous with a few attenuation abnormalities. Comparing the histological staining images of HGG and LGG [Fig. 6(e) and (h)] in conjunction with the expert physician's commentary, pleomorphic tumor cells (blue arrow) and hypercellularity (yellow star) are typical microstructures of glioma, and the glioma entity generally exhibits a formless area without nuclei (green star). Although the structures are similar, the morphologies of LGG and HGG differ. As shown in Fig. 6(c) and (d), tissue structures in LGG are relatively denser, while Fig. 6(f) and (g) illustrate that they are sparser in HGG, with numerous vesicles (red arrow). The results are consistent with known histological findings and comparable with prior research [38]. Through the dynamic t-SNE graph, we can quickly inspect the information of the misclassified dots (Fig. 7). These misclassified points sit near the true boundary, showing that their features are too similar to another category. In clinical scenes, the model is saved on the PC of the movable OCT cart. After OCT measurement, data recording takes five minutes; loading the raw data and performing image pre-processing takes another five minutes. The model is then imported and tensorflow.keras's model.predict is called to obtain the results (one to five minutes, depending on the number of prediction images). With this visualization, the reliability and convincingness of our model were affirmed.

Discussion
The standard diagnosis of brain tumors involves surgery. Before the operation, the surgeon examines the patient's clinical symptoms and medical images, determining the type of brain tumor and pinpointing its location for preoperative surgical planning. Currently, the most common medical imaging examinations in clinics are computed tomography (CT) and MRI. Retrospective studies have pointed out that FS might confuse PCNSL with glioma and may inaccurately classify glioma grades, leading to incorrect treatment or over-treatment [11][12][13][14][15]. In a previous statistical comparison of HGG and LGG, the FS accuracy for LGG was only 78.4%, whereas that for HGG was 91.6% [15]. Out of 578 brain tumor cases, 13 were diagnosed as PCNSL by PS, but only four were correctly identified by FS [15]. This indicates that despite PCNSL's low prevalence among brain tumors, FS's sensitivity to PCNSL is notably poor, and PCNSL is often mistaken for glioma. In addition, the treatment strategies for PCNSL and glioma differ according to clinical recommendations: while gliomas are typically addressed with surgical resection, PCNSL is treated exclusively with chemotherapy. Within glioma, HGG and LGG adopt different grade-specific plans. HGG patients usually need appropriate adjuvant radiation therapy plus chemotherapy to avoid relapse after total tumor resection, whereas LGG patients require complete surgical resection alone. Hence, on-site determination of PCNSL versus glioma, and of HGG versus LGG, is essential during surgery, and new technology is awaited to address this challenge in the clinic.
Figure 6 shows the correctly classified results on the dynamic t-SNE plot, whereas Fig. 7 demonstrates the misclassifications. According to our previous findings, NOR OCT images display a homogeneous appearance without microstructure, whereas glioma exhibits abnormal microstructures, such as microcysts, calcification, and hemorrhaging within tumoral regions [36]. These features are relatively uncommon in PCNSL OCT images [38]. The results in Fig. 6 verify the reproducibility of the research. Furthermore, comparing the overall morphologies of LGG [Fig. 6(c) and (d)] and HGG [Fig. 6(f) and (g)], the tissue structures in LGG appear relatively denser, while those in HGG seem sparser. Numerous vesicles (red arrow) and pleomorphic tumor cells (blue arrow) can be observed in the HGG images. Furthermore, comparing the two types reveals that the dense hypercellularity structures (yellow star) with high reflectivity in the HGG samples are absent in the LGG samples. The remarkable correspondence between the OCT images and histology [Fig. 6(e) and (h)] further substantiates the potential of OCT as a diagnostic imaging tool for clinical glioma diagnosis.
From Fig. 7, LGG and normal tissue are occasionally misclassified as each other [Fig. 7(a) and (b)], as are HGG and LGG [Fig. 7(c) and (d)]. Straight-line artifacts might contribute to these misclassifications; hence, further steps involving speckle reduction and registration become essential. The misclassification between HGG and LGG likely results from their similarities, which aligns with our findings from visual inspection of the OCT images. Moreover, one LGG misclassified as PCNSL may result from scanning out of range; in this condition, specific image pre-processing is required to crop the additional regions. In our research, we used simple averaging and translational registration to suppress speckle, but the processed images still leave room for improvement. Generally speaking, the most fundamental approach is to filter the signal in hardware, while in software, the non-local means filter [39] or a conditional generative adversarial network (cGAN) [40] is a novel method that can eliminate speckle more comprehensively. Ultimately, classification could be further improved by slightly adjusting the decision boundary without obvious overfitting.
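As a concrete example of the software route mentioned above, scikit-image ships a non-local means filter. The parameters below are illustrative starting points for a sketch, not values validated on OCT data:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def nlm_despeckle(bscan):
    """Non-local means speckle reduction on a single B-scan.

    bscan: 2-D float image with values in [0, 1]. The noise level is
    estimated from the image itself; h, patch_size, and patch_distance
    are illustrative choices.
    """
    sigma = estimate_sigma(bscan)
    return denoise_nl_means(bscan, h=0.8 * sigma, sigma=sigma,
                            patch_size=5, patch_distance=6, fast_mode=True)
```

Non-local means averages each pixel with pixels from similar patches anywhere in the search window, which suppresses speckle more selectively than plain frame averaging.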
Figure 8(a)-(d) presents the confusion matrix performance for four custom threshold combinations. Detailed values of sensitivity, specificity, PPV, and NPV for each category are outlined in Table 3. The default probability thresholds yield an overall accuracy of 86.4%. We focus on adjusting Threshold_1 and Threshold_3 while keeping Threshold_2 constant. Lowering Threshold_3 to 0.3 enhances HGG sensitivity to 95.8%, while LGG sensitivity decreases to 63.51%. Lowering Threshold_1 to 0.3 increases Normal sensitivity to 96.9%. Remarkably, the combination (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.4) yields the highest overall accuracy at 88.0%. Varying the thresholds within the range of 0.4 to 0.7 (±0.2 around the default) has minimal impact on the overall accuracy, which remains above 80%; note, however, that the four categories exhibit distinct variations in sensitivity, specificity, PPV, and NPV. In one example, we set (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.7), making the diagnosis of HGG stricter. Although the overall accuracy decreased from 86.4% to 80.7%, the specificity of HGG increased from 80.0% to 97.1%, and the sensitivity of LGG improved from 80.3% to 88.7%, as expected. This pilot study provides a novel approach to using the model. To reduce the impact of subjectivity and variability, it is necessary to give surgeons a user-friendly operation interface, guide them on how to use the model and adjust the thresholds, and monitor and evaluate the model's use. In this way, cross-disciplinary cooperation between engineering and medicine becomes possible, putting translational medicine into practice. This design would be valuable when diagnostic evidence is on hand or when individual requirements (age, medical history, chief complaint, physical examination, symptoms, et cetera) must be considered. To sum up, the current implementation of transfer learning provides a novel and fast way to classify and differentiate various types and grades of brain tumors.
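The adjustable binary hierarchical classification described above can be sketched as a short cascade. This is an illustrative reconstruction, not the authors' code: it assumes the decisions cascade NOR → LYM → HGG, falling through to LGG, with Threshold_1 gating normal vs. tumoral, Threshold_2 gating PCNSL vs. glioma, and Threshold_3 gating HGG vs. LGG.

```python
# Sketch of the stepwise thresholding of Fig. 3(b); the exact ordering
# and comparison rules are assumptions for illustration.
def stepwise_classify(p_nor, p_lym, p_hgg, t1=0.5, t2=0.5, t3=0.5):
    if p_nor >= t1:   # Threshold_1: normal vs. tumoral tissue
        return "NOR"
    if p_lym >= t2:   # Threshold_2: PCNSL vs. glioma
        return "LYM"
    if p_hgg >= t3:   # Threshold_3: HGG vs. LGG
        return "HGG"
    return "LGG"

# Raising Threshold_3 makes the HGG call stricter, as in the (0.5, 0.5, 0.7) example
print(stepwise_classify(0.1, 0.2, 0.6))          # default thresholds -> HGG
print(stepwise_classify(0.1, 0.2, 0.6, t3=0.7))  # stricter threshold -> LGG
```

Because each threshold governs a single binary branch, a surgeon can tighten one decision (e.g., HGG specificity) without retraining the model, at the cost of shifting errors into the neighboring class.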
Compared with other quantitative indices such as the attenuation coefficient [41], co-channel attenuation, and forward cross-scattering [42,43], our method demonstrates an acceptable ability to distinguish between normal and tumoral tissues, achieving a commendable accuracy rate. Given OCT's ability to discern normal from tumoral tissue, this approach is also expected to aid surgeons in evaluating residual tumor tissue at the end of surgical excision in the future. Although differentiation among PCNSL, HGG, and LGG still has room for improvement owing to the limited number of samples, we anticipate achieving a high differentiation power based on the current insights into OCT image characteristics. This distinction is critical for clinically differentiating PCNSL, HGG, and LGG.
In this research on brain tumor classification, we showed that tumoral and normal tissues can be distinguished almost perfectly. According to previous reports [36], GBM displays necrotic areas, thus altering attenuation uniformity. Nonetheless, radiotherapy and surgical coagulation can also cause tissue necrosis [44]; patients treated with radiotherapy were excluded from this study, and the porcine brain was free of coagulation necrosis. Further studies are warranted to investigate tissue features under more general conditions, and patients treated with radiotherapy should be included in future recruitment criteria. Measuring normal human brain tissue in vivo is also among our plans to reach a more straightforward conclusion. Another concern is that, although the cancerous specimens should contain at most infiltrative tissue and no normal tissue according to the surgical guideline of maximal safe resection, a small amount of normal tissue may nevertheless be included and appear in single frames within an OCT volume; the predictive model should be able to identify such frames rather than assign them the class of the volume's overall label. As a suggestion for future research, this problem can be addressed by a data-processing design incorporating multiple instance learning, in which negative frames are allowed in a positive OCT volume. Thereby, the prediction accuracy could be further escalated.
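The multiple instance learning idea above can be sketched with a simple max-pooling bag rule. This is an illustrative assumption, not the authors' design: a volume (bag) is called tumoral when any of its frames (instances) exceeds a probability threshold, so normal-looking frames inside a tumoral volume no longer contradict the volume label.

```python
import numpy as np

def bag_prediction(frame_probs, threshold=0.5):
    """Max-pooling MIL aggregation over one OCT volume.

    frame_probs : per-frame tumor probabilities from the classifier.
    Returns True (tumoral) if any single frame is confidently tumoral.
    """
    return float(np.max(frame_probs)) >= threshold

# A tumoral volume may contain mostly normal-looking frames
mostly_normal_frames = np.array([0.1, 0.2, 0.05, 0.9, 0.15])
print(bag_prediction(mostly_normal_frames))   # one tumoral frame suffices
print(bag_prediction(np.array([0.1, 0.2])))   # no tumoral frame at all
```

Under this rule, negative frames are explicitly allowed inside a positive volume, matching the maximal-safe-resection scenario where a specimen labeled tumoral may still include a few normal frames.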

Conclusions
In this work, we recruited a limited number of cases to preliminarily validate the feasibility of using OCT to differentiate PCNSL and gliomas from normal tissues. Subsequently, we distinguished PCNSL from gliomas and further separated HGG from LGG among gliomas using the MobileNetV2 pre-trained model in an ex vivo experimental setup. Consistent with previous research, normal tissues displayed a homogeneous texture, while PCNSL showed no attenuation abnormalities along the transversal direction. In addition, HGG demonstrated clear vesicles, albeit with a sparser structural density. In contrast, LGG exhibited a distribution of floccules and dark strips, potentially due to hypercellularity and fibrosis.
Leveraging transfer learning, we achieved an overall testing accuracy of 86.4% for the four categories using the default thresholds. Users can modify arbitrary thresholds in real time through the customized binary hierarchical classification when diagnostic evidence is on hand or individual requirements must be considered. Furthermore, dynamic t-SNE provides on-site information about targeted data points. Through our custom interface, users can rapidly ascertain the model's performance by viewing each image's sorted number, true label, and predicted label. This user-friendly graphical user interface (GUI) offers tangible support for surgeons in clinical settings. Notably, we observed that the misclassified images all congregated at the boundaries between the four categories, suggesting that with sufficient data for model generalization, the classification accuracy could improve further.
To our knowledge, this study marks the first time normal tissue, PCNSL, HGG, and LGG have been observed in OCT imaging simultaneously, and these distinctive OCT imaging features might serve as an essential key during surgery. So far, our results show a promising outlook for combining OCT and transfer learning in PCNSL and glioma identification. Ultimately, the proposed methodology promises to assist surgeons during operations and to elevate patient outcomes.

Fig. 2 .
Fig. 2. Image processing flowchart. Multiple scans of the same specimens were acquired at different sections. The obtained images were preprocessed by speckle reduction, invalid-image removal, resizing, and normalization. Finally, data augmentation was employed during the model training process. The model was grafted from MobileNetV2 pre-trained on the ImageNet dataset: we removed the last layer of the MobileNetV2 model and then added three new layers, as shown in Fig. 3(a), to learn the relation between the features extracted by the MobileNetV2 model and the tissue categories.


Fig. 3 .
Fig. 3. Model design using the transfer learning technique. (a) The model was constructed by adding layers after the pre-trained MobileNetV2 model. (b) The final prediction output of the model was adjustable using the stepwise thresholding technique proposed in this study. NOR, normal tissue; LYM, PCNSL; HGG, high-grade glioma; LGG, low-grade glioma. P_NOR, P_LYM, P_HGG represent the probabilities given by the model output.


Fig. 4 .
Fig. 4. The confusion matrix of the testing data from one model. The sensitivity and specificity for PCNSL (labeled LYM) were 99.0% and 100%, respectively; for HGG, 80.0% and 80.0%; and for LGG, 80.3% and 88.5%, yielding an overall accuracy of 86.4%.
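The per-class sensitivity, specificity, PPV, and NPV reported throughout can all be derived from the confusion matrix by the standard one-vs-rest reduction. A minimal sketch follows; the 2×2 matrix is a toy example, not the paper's data.

```python
import numpy as np

def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k of a confusion matrix
    (rows: true labels, columns: predicted labels)."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp          # class-k samples predicted as other classes
    fp = cm[:, k].sum() - tp       # other-class samples predicted as class k
    tn = cm.sum() - tp - fn - fp   # everything else
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn)}

cm = np.array([[80, 20],
               [10, 90]])
m = per_class_metrics(cm, 0)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # 0.8 0.9
```

Applying this per class is exactly how a single confusion matrix such as Fig. 4 expands into the four-metric rows of Table 3.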

Fig. 5 .
Fig. 5. ROC curves of Normal, PCNSL, and Glioma for the MobileNetV2 transfer learning model on the testing data. The AUCs are 0.996, 1.000, and 0.898, respectively.


Fig. 6 .
Fig. 6. Dynamic t-SNE scatter plot of the data points at the last average pooling layer. The colors denote the label predicted by the model, and the pink lines show the true boundaries between the four categories. Examples of OCT intensity images of (a) Normal, (b) Lymphoma, (c)(d) LGG with the corresponding (e) H&E stain image, and (f)(g) HGG with the corresponding (h) H&E stain image. Yellow star: hypercellularity; green star: glioma entity; blue arrow: pleomorphic tumor cells; red arrow: vesicles. Scale bar: 1 mm.


Fig. 7 .
Fig. 7. Misclassified points on the dynamic t-SNE scatter plot. The colors denote the label predicted by the model, and the pink lines show the true boundaries between the four categories. Examples of OCT intensity images misclassified between LGG and Normal (a)(b) and between HGG and LGG (c)(d). One LGG image misclassified as PCNSL (e). Scale bar: 1 mm.

Table 1. Recruitment information
a Patient N is porcine, used as a surrogate for normal tissue. b "3 (1 × 1, 2 × 2)" means patient A has two specimens and three OCT volumes: specimen No. 1 has one OCT volume with 971 frames, and No. 2 has two volumes with 1942 frames. Several frames were manually deleted because they were affected by motion during the initial galvanometer return, so there are not exactly 1000 frames per volume.