Pulmonary Nodule Detection and Classification Using All-Optical Deep Diffractive Neural Network

A deep diffractive neural network (D2NN) is a fast optical computing architecture that has been widely used in image classification, logical operations, and other fields. Computed tomography (CT) imaging is a reliable method for detecting and analyzing pulmonary nodules. In this paper, we propose using an all-optical D2NN for pulmonary nodule detection and classification based on CT imaging for lung cancer. The network was trained on the LIDC-IDRI dataset, and its performance was evaluated on a test set. For pulmonary nodule detection, the presence of nodules in regions scanned from CT images was estimated with a two-class classification based on the network, achieving a recall rate of 91.08% on the test set. For pulmonary nodule classification, benign and malignant nodules were also classified with a two-class classification, reaching an accuracy of 76.77% and an area under the curve (AUC) of 0.8292. Our numerical simulations show the possibility of using optical neural networks for fast medical image processing and aided diagnosis.


Introduction
Artificial intelligence has become a highly researched and widely discussed topic in recent years. Deep neural networks have been utilized to solve various tasks such as natural language processing [1][2][3], image classification [4][5][6], object detection [7][8][9][10], and semantic segmentation [11][12][13]. As the complexity and size of deep neural networks increase, more parameters need to be computed, which requires more time to process the input data. However, real-time processing tasks such as autonomous driving [14,15] are in high demand, presenting a challenge to traditional parallel computing devices, e.g., graphics processing units (GPUs). Despite significant advances in GPU technology in recent years, it is increasingly difficult to achieve further gains with silicon-based processing technology.
Optical neural networks represent a new and exciting direction in deep learning architecture, utilizing the propagation of light waves and the modulation of the light field with optical devices to achieve ultra-fast computational speeds. Recent research has proposed various structures, including optical convolution networks [16,17], Mach-Zehnder interferometer-based optical networks [18][19][20], optical spiking neural networks [21,22], and diffractive deep neural networks (D2NNs) [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37]. Due to their simple structure, highly parallel operation, and low cost, D2NNs have attracted significant research interest over the past few years, including efforts to increase the networks' computation ability [24][25][26][27][28][29].

In this study, we investigated the feasibility of using visible light as the light source for the all-optical D2NN. A He-Ne laser with a 632.8 nm wavelength was selected for the networks, which comprise 5 diffractive layers in our numerical experiments. The neuron distribution of the diffractive layers was set to 200 × 200 (40,000 neurons per layer; each layer is 0.8 mm × 0.8 mm) for the detection task and 400 × 400 (160,000 neurons per layer; each layer is 1.6 mm × 1.6 mm) for the classification task. The axial distance between adjacent layers, including the detection plane, was set to 10 mm. Although the diffractive angle is not large enough to achieve full connectivity in the classification task [43], a sufficient number of neurons is available in the diffractive layers to modulate the secondary wave field created by the previous layer, and the networks still have a considerable number of trainable connections. For our experiments, we clipped the CT images to 50 × 50 pixels and resized them using nearest-neighbor interpolation to 200 × 200 pixels and 400 × 400 pixels, respectively. For training, we set a batch size of 64 and learning rates of 0.005 and 0.001 for pulmonary nodule detection and classification, respectively. The networks were trained for 120 epochs, and their inference performance was then analyzed on the blind test set. The result of each network is indicated by the maximum intensity among the designed regions of the detection plane, providing the computing result in real time.
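As an illustration of how such a network is simulated numerically, the following is a minimal Python sketch of one D2NN forward pass, assuming angular-spectrum free-space propagation (a standard way of simulating D2NNs [23]) and phase-only modulation; the random phase values stand in for the trained layer parameters, and all physical values follow the parameters stated above.

```python
import numpy as np

wavelength = 632.8e-9   # He-Ne laser, 632.8 nm
pitch = 4e-6            # 0.8 mm / 200 neurons = 4 um per neuron
n = 200                 # 200 x 200 neurons per layer (detection task)
z = 10e-3               # 10 mm axial spacing between layers

# Angular-spectrum transfer function for one 10 mm propagation step.
fx = np.fft.fftfreq(n, d=pitch)
FX, FY = np.meshgrid(fx, fx)
arg = 1.0 / wavelength**2 - FX**2 - FY**2
prop_mask = arg > 0     # drop evanescent components
H = np.exp(1j * 2 * np.pi * np.sqrt(np.where(prop_mask, arg, 0.0)) * z) * prop_mask

def propagate(field):
    """Free-space propagation over distance z via the angular spectrum method."""
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Five phase-only diffractive layers; random phases stand in for trained ones.
rng = np.random.default_rng(0)
layers = [np.exp(1j * 2 * np.pi * rng.random((n, n))) for _ in range(5)]

field = np.ones((n, n), dtype=complex)   # placeholder for an encoded CT patch
for phase in layers:
    field = propagate(field) * phase     # propagate 10 mm, then modulate
field = propagate(field)                 # final hop to the detection plane
intensity = np.abs(field) ** 2           # what the detectors measure
```

In training, the phase of each layer would be optimized by gradient descent through this differentiable forward model; at inference, the same computation happens physically at the speed of light.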

Pulmonary Nodule Detection
The network model was trained to detect the presence of nodules in CT images from the LIDC-IDRI dataset. Images were clipped around the center of each nodule and labeled as nodule regions, while images of the same size without nodules were also clipped and labeled as no-nodule regions. The number of images in both classes was balanced for training, and the dataset was divided into validation, test, and training sets in the ratio of 8:17:75.
During the training stage, the propagated light amplitudes in the 2 output detection regions were normalized using

$A'_i = A_i / (A_0 + A_1 + b_0) + b_1, \quad i = 0, 1,$

where $A_i$ is the sum amplitude of the light field in the $i$-th detector, and $b_0$ and $b_1$ are 2 bias factors. Regions without nodules may contain a large dark area, so the light intensity at the detectors may be close to zero; thus, the factors $b_0$ and $b_1$ were applied in the normalization equation. The softmax cross-entropy loss was applied to optimize the network, as described in Equation (4) below [33]:

$L = -\sum_i g_i \log \frac{e^{A'_i}}{\sum_j e^{A'_j}}, \quad (4)$

where $g_i$ is the ground-truth label of class $i$. The networks were trained to classify nodules by scanning the entire CT image slices (see Figure 2). Equation (5) was applied to analyze the output of the network, obtaining the probability of the nodules' existence as the score:

$s = \frac{e^{A'_1}}{e^{A'_0} + e^{A'_1}}. \quad (5)$
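For concreteness, a hedged PyTorch sketch of this detector read-out, loss, and score follows; the detector masks, tensor shapes, and helper names are our own illustration rather than the authors' code, and the normalization follows our reading of the partially garbled source equation above.

```python
import torch
import torch.nn.functional as F

def detector_amplitudes(intensity, masks):
    """Sum the detected signal inside each of the 2 detector regions."""
    return torch.stack([(intensity * m).sum() for m in masks])

def normalized_outputs(A, b0, b1):
    """A'_i = A_i / (A_0 + A_1 + b_0) + b_1 (our reading of the source
    equation); b_0 keeps the denominator away from zero in dark regions."""
    return A / (A.sum() + b0) + b1

def detection_loss(A_prime, target):
    """Softmax cross-entropy over the 2 normalized outputs (Equation (4))."""
    return F.cross_entropy(A_prime.unsqueeze(0), target.view(1))

def nodule_score(A_prime):
    """Equation (5): softmax probability at the 'nodule' detector as the score."""
    return torch.softmax(A_prime, dim=0)[1]

# Usage sketch with a random field and two hypothetical detector regions.
intensity = torch.rand(200, 200)
masks = [torch.zeros(200, 200), torch.zeros(200, 200)]
masks[0][90:110, 40:60] = 1.0      # "no nodule" detector (position illustrative)
masks[1][90:110, 140:160] = 1.0    # "nodule" detector (position illustrative)
A = detector_amplitudes(intensity, masks)
A_prime = normalized_outputs(A, b0=0.1, b1=0.1)   # illustrative bias values
loss = detection_loss(A_prime, torch.tensor(1))   # label 1 = nodule present
score = nodule_score(A_prime)
```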

Pulmonary Nodule Classification
The location and classification of the nodules are provided in XML files, which rate the nodules on a 5-point malignancy scale (labeled 1-5). Benign nodules were labeled as "1" or "2", while malignant nodules were labeled as "4" or "5"; nodules labeled as "3" were discarded. To prepare the images for training, the images were clipped to a size of 50 × 50 pixels, using the same method as mentioned in Section 2.3. The cases were also divided into validation, test, and training sets with a ratio of 8:17:75. In addition, traditional data augmentation methods, such as rotating and flipping the images, were utilized to increase the number of images in the training set.
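The label mapping and case split just described can be sketched as follows; the function names and the shuffling scheme are our own illustration, not the authors' code.

```python
import random

def malignancy_to_label(rating):
    """Map an LIDC-IDRI malignancy rating (1-5) to a binary label.
    Ratings 1-2 are benign, 4-5 malignant; rating 3 is discarded."""
    if rating in (1, 2):
        return 0            # benign
    if rating in (4, 5):
        return 1            # malignant
    return None             # indeterminate: dropped from the dataset

def split_cases(cases, seed=0):
    """Shuffle and split cases into validation/test/training sets (8:17:75)."""
    cases = list(cases)
    random.Random(seed).shuffle(cases)
    n_val = round(0.08 * len(cases))
    n_test = round(0.17 * len(cases))
    return (cases[:n_val],                  # validation (8%)
            cases[n_val:n_val + n_test],    # test (17%)
            cases[n_val + n_test:])         # training (75%)
```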
During the training process, the intensities of the 2 detectors in the output planes were also evaluated by using a factor $\alpha$, as follows:

$\alpha = A_1 / (A_0 + A_1), \quad (6)$

where $A_0$ and $A_1$ are the sum amplitudes in the 2 detectors' regions. The mean square error loss function (7) was applied to optimize the network as follows [23]:

$L = (\alpha - g)^2, \quad (7)$

where $g \in \{0, 1\}$ is the ground-truth label.

Results

Figure 2a,b present the training curves of the networks, with the accuracy on the validation set converging after a few epochs. The detailed results are presented in the first two rows of Table 1. The networks' accuracy on the test set is 89.67% in the two-class classification, and the recall rate reaches 91.08%. The dataset can also be split into 10 parts for 10-fold cross-validation, which yields a mean accuracy of 89.72% across the folds, close to the performance on the test set. The score of each nodule in the test set was calculated to estimate the possibility of the nodule's existence. Figure 2c shows the distribution of scores, indicating that most nodules have a score higher than 0.7. In this case, the score threshold can be set higher than 0.5 while most regions are still detected correctly. The outputs of the networks were obtained from the two detectors in the detection plane by comparing the amplitude of the light. Figure 3a,b show the real-time inference results; the classification result can be obtained simply by comparing the intensities at the two detectors directly.

The trained networks were also applied to scan the CT image slices to search for and detect nodules. The existence probability of the nodules was determined by the score of the clipped CT images, and a threshold was selected to assess the presence of nodules. Although there are many false-positive points in the results, almost all the nodules could be detected, as shown in Figure 3c. Increasing the threshold discards many false-positive points; however, the recall rate is also reduced, falling to 77.60% with a threshold of 0.7. Additionally, many regions without nodules are not included in the dataset, which further influences the result. To balance the ratio of images with and without nodules, the ratio was set to 1:4 to train and test the networks again. The training results and the confusion matrix are shown in Figure 4a, indicating the classification ability of the networks. The last two rows of Table 1 provide the detailed results of this trained network. The average accuracy in 10-fold cross-validation is 92.49%, which is close to the accuracy on the test set (92.86%). The scan result is shown in Figure 4b, with far fewer false-positive points than before. However, the recall rate is also reduced, to 70.07%, meaning that just 70.07% of the nodules in the test set are detected. In this case, both the threshold setting and the ratio of positive and negative samples influenced the networks' performance.
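A minimal sketch of this sliding-window scanning with a tunable threshold is shown below; the stride and the `score_patch` callback (which stands for the full optical forward pass plus the Equation (5) score) are our own assumptions.

```python
import numpy as np

def scan_slice(ct_slice, score_patch, window=50, stride=10, threshold=0.5):
    """Slide a window over a CT slice and flag patches whose network score
    exceeds the threshold as candidate nodule locations."""
    hits = []
    h, w = ct_slice.shape
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            s = score_patch(ct_slice[y:y + window, x:x + window])
            if s > threshold:
                hits.append((y, x, s))   # top-left corner and score
    return hits

# Raising the threshold (e.g., to 0.7) removes false positives at the cost
# of recall, as reported above.
candidates = scan_slice(np.zeros((512, 512)), lambda p: 0.0, threshold=0.7)
```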
The networks were also used to classify nodules into benign and malignant categories. Figure 5a shows the training results: the loss decreases quickly, and the network converges after a few epochs of training. Table 2 shows the performance of the trained network on the validation and test sets. The accuracy on the test set is 76.77%, and the recall rate reaches 65.97%, which differs slightly from that of the validation set. The reason may be that some difficult-to-classify, singular malignant nodules were split into the validation set, and there were not enough data to validate the performance of the trained network. Furthermore, 10-fold cross-validation was performed, showing that the maximum accuracy reaches 79.43%, with a mean accuracy of 74.59%. The confusion matrix and ROC curve for the test set are shown in Figure 5b, with an AUC of 0.8292, indicating a credible classification result. The field distributions produced when images were inferred on the networks are shown in Figure 5c,d, with the left detector representing benign nodules and the right detector representing malignant nodules. The real-time output is the label of the detector region with the highest intensity.
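As an illustration, the ROC curve and AUC reported above can be computed from per-nodule scores as in the following sketch; scikit-learn is our choice of tooling here, not necessarily the authors', and the labels and scores shown are placeholders.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Placeholder labels (0 = benign, 1 = malignant) and network output scores.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.20, 0.45, 0.80, 0.55, 0.90, 0.35, 0.70, 0.60]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
auc = roc_auc_score(y_true, y_score)                # the paper reports 0.8292
print(f"AUC = {auc:.4f}")
```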

Discussion
In this paper, we present the model of an all-optical deep diffractive neural network, which was trained and employed to perform nodule detection and classification tasks using the LIDC-IDRI dataset. The nodule detection task involved determining whether nodules were present, which was achieved with an accuracy of 87.78% and a recall rate of 90.47%. The trained networks were further used to scan CT image slices to detect nodules. Although the recall rate in this study is similar to that of other works, as shown in Table 3, it should be noted that traditional deep learning methods consider whole CT images during training, while our network focuses only on local sections of the CT images and takes only the centers of nodules as targets. This explains why many false-positive points were detected in this study. Although the performance of our all-optical network is slightly poorer than that of other computer-based methods, the classification of benign and malignant nodules achieved an accuracy of 76.77%, with an AUC of 0.8292, as shown in Table 4. The performance of the all-optical network could be improved by incorporating more non-linear computing sections. Overall, the simulation results demonstrate the potential of all-optical neural networks for real-time processing of medical images in aided diagnosis.
On the other hand, the network can be fabricated in experiments using optical devices, and its inference process can run at the speed of light propagation [23]. The network can be divided into three parts: the light source, the optical diffractive layers, and the detectors. The light source provides the input of the network, while the optical diffractive layers modulate the light field to perform the designed computation. The optical diffractive layers can be fabricated using 3D printing [23], a multi-step photolithography-etching method [43], or metasurface techniques [44]. The number of modulation units (trained parameters) does not affect the inference speed of the network, as the speed of light is constant. The detectors collect the final intensity of light; once the input image is loaded, the light intensity distribution appears at the detectors immediately, representing the result of the network's processing. Because of this light-speed forward inference, all-optical networks have been reported in many fields [34][35][36][37][49][50][51].

Table 3. Comparison with Other Studies in Nodule Detection Task.

Method                 Recall (Sensitivity) (%)    Runtime
Ali et al. [38]        58.9                        DPPU
Harsono et al. [39]    94.12                       DPPU
Cao et al. [40]        92                          -

In addition, the computation power of all-optical networks is currently limited due to the lack of non-linear computing. However, integration with optical non-linear materials, such as magneto-optical traps [52] and photorefractive crystals [53], offers the possibility of enhancing the computation power and further improving the precision of nodule detection and classification. Moreover, as manufacturing processes continue to develop, it may be possible to fabricate an integrated device that addresses both nodule detection and classification, provided that non-linear materials can be incorporated into the device. Hopefully, this approach of using all-optical fast computation devices for real-time processing of medical images in aided diagnosis will soon become a reality.

Data Availability Statement: Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.