Lung cancer diagnosis with quantitative DIC microscopy and a deep convolutional neural network.

We present a study on lung squamous cell carcinoma diagnosis using quantitative TI-DIC microscopy and a deep convolutional neural network (DCNN). The 2-D phase map of unstained tissue sections is first retrieved from through-focus differential interference contrast (DIC) images based on the transport of intensity equation (TIE). The spatially resolved optical properties are then computed from the 2-D phase map via the scattering-phase theorem. The scattering coefficient ($\mu_s$) and the reduced scattering coefficient ($\mu_s'$) are found to increase whereas the anisotropy factor (g) is found to decrease with cancer. A DCNN classifier is then developed to classify the tissue using either the DIC images or the 2-D optical property maps of $\mu_s$, $\mu_s'$ and g. The DCNN classifier with the optical property maps exhibits high accuracy, significantly outperforming the same DCNN classifier on the DIC images. Label-free quantitative phase microscopy together with deep learning may emerge as a promising approach for in situ rapid cancer diagnosis.


Introduction
Lung cancer has the highest morbidity and mortality of all cancers: of the 18.07 million new cancer cases and 9.55 million cancer deaths that occurred worldwide in 2018, it accounted for 2.09 million new cases (11.6% of the total) and 1.76 million deaths (18.4%) [1]. Squamous cell carcinoma accounts for approximately 30% of all lung cancers [2]. Pathological examination of excised tissue sections is currently the gold standard for cancer diagnosis. Traditional pathological diagnosis requires time-consuming multi-step tissue preparation and is not suitable for rapid diagnosis. It also suffers from inter- and intra-observer variance due to its subjective nature.
During the past two decades, much effort has been devoted to developing label-free optical techniques for in situ rapid diagnosis of cancer. Both quantitative phase imaging and tissue native fluorescence have shown great potential [3][4][5][6][7][8][9]. Recently, deep learning [10] has demonstrated significant potential in tissue imaging and diagnosis. Liu et al. and Wang et al. have designed deep convolutional neural networks (DCNN) to discriminate between cancerous and normal hematoxylin and eosin (H&E) stained pathological sections [11,12]. A DCNN classifier has also been developed to classify excised squamous cell carcinoma, thyroid cancer, and normal head and neck tissue samples based on hyperspectral imaging (HSI) [13]. That classifier labeled spectral patches as either normal or cancerous [13]. The spatial structural information contained in HSI was, however, discarded.
Light scattering by cells or tissues has important applications in disease diagnosis because the wavelength of light in the visible and near-infrared wavebands is close to the characteristic scale of the structures in cells and tissues [3][4][5][14]. Light scattering can reveal changes in morphology, composition and physiological state, and has been successfully used to detect sub-wavelength scale morphological and biochemical changes in tissues [15][16][17][18]. Phase imaging has been widely used to probe microstructural changes in thin specimens. Compared to other phase-imaging techniques, the differential interference contrast (DIC) microscope stands out owing to its better depth discrimination and its pseudo-3D relief images, which are free of artifacts. However, the images from a commercial DIC microscope cannot be used directly for quantitative analysis because the image intensity is not linearly proportional to the phase. Kou et al. found that, by taking a through-focus series of images and applying the transport-of-intensity equation (TIE), a quantitative phase image can be retrieved from DIC microscope images [19][20][21][22]. This TI-DIC approach is robust and requires no or minimal hardware modifications.
In this paper, we performed a study on lung cancer diagnosis using the TI-DIC method together with the scattering-phase theorem that we reported before [20,23]. Two-dimensional quantitative phase maps from 77 normal lung cases and 129 squamous cell lung cancer cases were first obtained with TI-DIC. The spatially resolved optical properties were then computed from the 2-D phase map via the scattering-phase theorem. The light scattering parameters were found to correlate significantly with cancer, with $\mu_s$ and $\mu_s'$ increasing and g decreasing in cancerous lung tissue relative to normal lung tissue. A DCNN was then designed to classify lung tissue using the 2-D maps of the optical scattering parameters $\mu_s$, $\mu_s'$ and g, and the original in-focus DIC images, respectively. The DCNN classifier discriminates normal from cancerous lung tissue with an accuracy of 96% using the 2-D maps of $\mu_s$, $\mu_s'$ and g, significantly outperforming the same DCNN classifier on the DIC images. The potential of quantitative phase microscopy with the aid of DCNN for cancer diagnosis is discussed at the end.

TI-DIC microscopy
TI-DIC has been presented elsewhere [19,21]. Here the principle of TI-DIC together with the scattering-phase theorem is briefly outlined. As a consequence of the free-space Helmholtz wave equation under the paraxial approximation, the TIE relates the phase to the intensity of the wavefront. The phase of the wave immediately after transmission through a thin, weakly scattering sample satisfies [21]:

$$-k\,\frac{\partial I(x,y,z)}{\partial z} = \nabla_\perp \cdot \left[ I(x,y,z)\,\nabla_\perp \phi(x,y,z) \right]$$

where $k$ is the wave number, and $I(x,y,z)$, $\phi(x,y,z)$, and $\partial I/\partial z$ denote the in-focus image intensity, the phase to be retrieved, and the longitudinal derivative of the intensity, respectively. The 2-D gradient operator $\nabla_\perp$ acts in the transverse direction alone. By applying the Fourier transform, the phase on the in-focus plane is obtained as follows [21]:

$$\phi(x,y) = k\,\mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\left[ \partial \ln I/\partial z \right]}{q^{2}} \right\}$$

where $q$ is the transverse spatial frequency (taken here as an angular frequency, so that $\nabla_\perp^2$ corresponds to $-q^2$), and $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and the inverse Fourier transform, respectively. The term $\partial \ln I/\partial z$ can be approximated by a finite difference of two measurements displaced by a small separation $\Delta z$.

The scattering-phase theorem [23] is then applied to determine $\mu_s$, $\mu_s'$ and $g$ from the measured phase map. The relationship between $\mu_s$, $\mu_s'$, $g$ and the 2-D phase map $\phi$ is given by the following formulas [23]:

$$\mu_s L = 2\left\langle 1 - \cos\Delta\phi \right\rangle, \qquad \mu_s' L = \frac{1}{2k^{2}}\left\langle \left| \nabla_\perp \phi \right|^{2} \right\rangle, \qquad g = 1 - \frac{\mu_s'}{\mu_s}$$

These relations originate from anomalous diffraction by the forward-peaked scattering of the thin specimen. Here $L$ is the thickness of the specimen, $\langle\cdot\rangle$ denotes the spatial average, and $\Delta\phi = \phi - \langle\phi\rangle$ is the deviation of the phase from its mean. The scattering-phase theorem is applicable to a slice of homogeneous or inhomogeneous medium. In the latter case, a map of $\mu_s$, $\mu_s'$ and $g$ can be computed from the phase map using spatial averaging over local regions rather than over the whole slice.
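The Fourier-domain phase retrieval outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the intensity varies slowly enough that the TIE reduces to a Poisson equation, $\nabla_\perp^2 \phi = -k\,\partial \ln I/\partial z$; it uses the periodic boundary conditions implied by the FFT; and it sets the undetermined mean phase to zero. The function name `tie_phase` and its parameters (including `n_medium`) are hypothetical.

```python
import numpy as np

def tie_phase(i_below, i_above, dz, wavelength, pixel_size, n_medium=1.0):
    """Retrieve a zero-mean phase map from two defocused images taken at -dz and +dz.

    Assumes laplacian(phi) = -k * d(ln I)/dz (slowly varying intensity) and
    periodic boundaries. All lengths must share one unit (e.g. micrometers).
    """
    k = 2.0 * np.pi * n_medium / wavelength
    # Central finite difference of ln I across the in-focus plane.
    dlni_dz = (np.log(i_above) - np.log(i_below)) / (2.0 * dz)

    ny, nx = dlni_dz.shape
    # Angular spatial frequencies, so that the Laplacian maps to -q^2.
    qx = 2.0 * np.pi * np.fft.fftfreq(nx, d=pixel_size)
    qy = 2.0 * np.pi * np.fft.fftfreq(ny, d=pixel_size)
    q2 = qx[np.newaxis, :] ** 2 + qy[:, np.newaxis] ** 2
    q2[0, 0] = 1.0  # placeholder to avoid division by zero at q = 0

    phi_hat = k * np.fft.fft2(dlni_dz) / q2
    phi_hat[0, 0] = 0.0  # the mean phase is not determined by the TIE
    return np.fft.ifft2(phi_hat).real
```

With measured data, the division by $q^2$ amplifies low-frequency noise, so some form of regularization (for example, adding a small constant to `q2`) is commonly applied; that choice is left out of this sketch.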

Experimental setup and validation
The system was built on a commercial DIC microscope (Observer A1, Zeiss). The light source was a 100 W halogen lamp filtered by a (550 ± 5) nm narrow-band filter under Köhler illumination. The numerical apertures of the condenser and the objective (Plan-Neofluar 40×) were 0.3 and 0.75, respectively. The images were recorded with a CCD camera from Zeiss (AxioCam ICC5) at a pixel size of 0.082 µm. A three-dimensional scanning stage with an encoder (ProScan III, Prior) was used to take in-focus and out-of-focus images (Δz = 1 µm) automatically.
To validate the performance and stability of the system, we first measured the light scattering properties of polystyrene spheres and compared the results with the theoretical prediction obtained by Mie theory. A suspension of polystyrene spheres (10 µm in diameter) was diluted with water, deposited on a glass microscope slide, and covered with a cover slip. Three images, one in-focus and two out-of-focus, were taken of the monolayer of polystyrene spheres. The out-of-focus images were taken on planes 1 µm below and above the in-focus plane. The quantitative phase map of the polystyrene sphere suspension was retrieved by the TI-DIC algorithm. The scattering properties of each individual sphere were analyzed by applying the scattering-phase theorem to the region of the phase map occupied by the sphere. The original image, the retrieved phase map, and the scattering properties of the sphere are shown in Fig. 1. The scattering coefficient, the reduced scattering coefficient and the anisotropy factor obtained were $\mu_s$ = 0.208 µm⁻¹, $\mu_s'$ = 0.0182 µm⁻¹, and g = 0.913, respectively. The results were in good agreement with the theoretical prediction ($\mu_s$ = 0.230 µm⁻¹, $\mu_s'$ = 0.0210 µm⁻¹, and g = 0.909) computed with a Mie code.
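The per-region analysis described above can be sketched as below. The sketch assumes the commonly quoted form of the scattering-phase theorem, $\mu_s L = 2\langle 1 - \cos\Delta\phi\rangle$ and $\mu_s' L = \langle|\nabla_\perp\phi|^2\rangle/(2k^2)$ with $g = 1 - \mu_s'/\mu_s$; the function name `scattering_from_phase`, the boolean `mask` argument, and the `n_medium` parameter are hypothetical and are not taken from the original work.

```python
import numpy as np

def scattering_from_phase(phi, thickness, wavelength, pixel_size,
                          mask=None, n_medium=1.0):
    """Estimate (mu_s, mu_s', g) from a phase map over an optional region.

    Assumed relations: mu_s * L = 2 <1 - cos(dphi)>,
    mu_s' * L = <|grad phi|^2> / (2 k^2), g = 1 - mu_s'/mu_s.
    All lengths must share one unit; results are in inverse length.
    """
    k = 2.0 * np.pi * n_medium / wavelength
    if mask is None:
        mask = np.ones(phi.shape, dtype=bool)

    # Phase deviation from the regional mean.
    dphi = phi - phi[mask].mean()
    # Finite-difference transverse gradient (axis 0 is y, axis 1 is x).
    gy, gx = np.gradient(phi, pixel_size)

    mu_s = 2.0 * np.mean(1.0 - np.cos(dphi[mask])) / thickness
    mu_s_reduced = np.mean(gx[mask] ** 2 + gy[mask] ** 2) / (2.0 * k ** 2 * thickness)
    g = 1.0 - mu_s_reduced / mu_s
    return mu_s, mu_s_reduced, g
```

Because $g = 1 - \mu_s'/\mu_s$, the three reported quantities are not independent. The measured sphere values are consistent with this relation: 0.208 µm⁻¹ × (1 − 0.913) ≈ 0.0181 µm⁻¹, close to the reported $\mu_s'$ of 0.0182 µm⁻¹.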

Data pre-processing
The data set contained 77 normal lung cases and 129 squamous cell lung cancer cases. A three-channel data cube of the optical scattering properties (size: 1722 × 1722 × 3) was obtained by stacking the 2-D maps of $\mu_s$, $\mu_s'$ and g for each case.
Before training, the data set of 206 images (129 cancer cases and 77 normal cases) was expanded. Each image was divided into 9 sub-images (size: 574 × 574 × 3), each covering about 16 lung cells, which increased the number of images from 206 to 1854. Sub-images at this scale contain sufficient information for pathological diagnosis. The 1854 sub-images were then randomly split into 3 parts: 928 images as the training data set, 463 images as the validation data set, and 463 images as the test data set. The training data set included 581 images of cancer cases and 347 images of normal cases, and the proportion of the two classes in the validation and test data sets was the same as in the training data set. The training data set was used to train the classifier, the validation data set was used to adjust the hyperparameters during training, and the test data set was used to test the classifier independently. The same procedure was also applied to the through-focus DIC images.
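The tiling and class-balanced split described above can be illustrated with a short sketch. It is a hypothetical reconstruction, not the authors' code: the function names `tile_3x3` and `stratified_split` are invented here, and the rounding rule (half of each class, rounded up, goes to training; the remainder is halved) is an assumption chosen because it reproduces the reported counts.

```python
import numpy as np

def tile_3x3(cube):
    """Split an (H, W, C) data cube into nine equal (H/3, W/3, C) sub-images."""
    h, w = cube.shape[0] // 3, cube.shape[1] // 3
    return [cube[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(3) for c in range(3)]

def stratified_split(labels, seed=0):
    """Return index arrays (train, val, test) with equal class proportions.

    Assumed rule: per class, half (rounded up) for training and the rest
    divided evenly between validation and test.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = (len(idx) + 1) // 2
        n_val = (len(idx) - n_train) // 2
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)

# 129 cancer and 77 normal cases, nine sub-images each (1 = cancer, 0 = normal).
labels = np.array([1] * (129 * 9) + [0] * (77 * 9))
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under these assumptions the split sizes are 928, 463 and 463, with 581 cancer and 347 normal sub-images in the training set, matching the numbers given in the text.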

Data augmentation
After data pre-processing, the sub-images (size: 574 × 574 × 3) in the training and validation sets were augmented by 16

DIC images and scattering characteristics of lung tissue
After validation, tissue microarrays of squamous cell lung cancer and normal lung tissue were imaged. The tissue microarrays (one stained and two accompanying unstained slides, Biomax Inc.) include pathology diagnosis information on grade, stage, and TNM classification. We first identified the characteristic location for each case using the H&E stained slide under the bright-field mode. In-focus and out-of-focus DIC images of the unstained slides were then taken at the corresponding location for each case, from which the 2-D phase maps and the scattering properties were computed. Typical $\mu_s$, $\mu_s'$ and g maps and the original DIC images of normal lung tissue and squamous cell lung cancer are shown in Fig. 3 and Fig. 4, respectively. The histograms of the optical properties (see Fig. 5) and the table of their average values (see Table 2) show that lung squamous cell carcinoma correlates strongly with $\mu_s$, $\mu_s'$ and g. With the progress of carcinogenesis, the scattering coefficient and the reduced scattering coefficient are found to increase, while the anisotropy factor is found to decrease.

Classification with DCNN
During DCNN training, the training loss, the test (validation) loss, and the test accuracy (the accuracy on the test data set) were calculated every 0.1 epoch. Figure 6 displays the training progress on the in-focus DIC images and on the optical property maps, respectively, without and with data augmentation. The fluctuation of the training loss in Fig. 6 occurs mainly within each epoch and disappears when only integer epochs are considered (one epoch means that the SGD optimization has iterated through the whole training data set once). A ROC (Receiver Operating Characteristic) curve was then used to assess the performance of the classifier. The ROC curves are plotted in Fig. 7 for the DCNN on DIC images and on 2-D scattering parameter maps, with data augmentation (green) and without data augmentation (red), respectively. The accuracies and AUC (Area Under the Curve) values of the DCNN are shown in Table 3.
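For readers unfamiliar with the metric, a ROC curve and its AUC can be computed from classifier scores as sketched below. This is a generic illustration of the standard definition, not the evaluation code used in the study; the function name `roc_curve_auc` and the example scores are hypothetical.

```python
import numpy as np

def roc_curve_auc(scores, labels):
    """Compute ROC points and AUC from scores (higher = more likely positive).

    labels must contain 1 for positive (e.g. cancer) and 0 for negative,
    with at least one sample of each class.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sweep the decision threshold from the highest score downwards.
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    tp = np.cumsum(y == 1)
    fp = np.cumsum(y == 0)

    # Keep one point per distinct score so that ties share a threshold.
    last = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tpr = np.r_[0.0, tp[last] / tp[-1]]
    fpr = np.r_[0.0, fp[last] / fp[-1]]

    # Trapezoidal area under the (fpr, tpr) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Hypothetical scores for four samples.
fpr, tpr, auc = roc_curve_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
```

An AUC of 1.0 corresponds to perfect separation of the two classes and 0.5 to chance-level ranking.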

Discussion
The results (see Fig. 5 and Table 2) show that the scattering coefficient and the reduced scattering coefficient increase while the anisotropy factor decreases with cancer. The decrease of the anisotropy factor with cancer suggests an overall fragmentation of structures in tissue with cancer progression [3][4][5]. TI-DIC microscopy together with the scattering-phase theorem is effective in quantifying the microstructural alterations in tissue. The classification performance of the DCNN suggests that the classifier using the 2-D images of optical parameters is superior to that using the raw DIC images. The DCNN has the potential to be applied to the automatic labeling of cancerous and normal tissue using the 2-D optical property images. One straightforward way of improving the performance of a deep neural network is to increase its size, that is, its depth (the number of levels of the network) and its width (the number of units at each level), with a potential tradeoff of overfitting [26]. We designed our DCNN on the basis of this principle. The designed network contains six convolutional layers and three fully connected layers. Data augmentation and dropout (ratio 0.5) were used to avoid overfitting and achieve a more robust network. Comparing the training loss and test loss curves in Fig. 6 clearly shows that data augmentation alleviates overfitting. Even with data augmentation, the test loss curve for the DCNN on the 2-D scattering parameters was observed to decline initially, increase between the 3rd and 5th epoch, and then level off. Training can therefore be terminated early at the 3rd epoch, when the test loss reaches its minimum value. The AUC at this termination point is found to be 96.8%, identical to the value reached by continuing training until the test loss is stable.
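The early-termination rule mentioned above, stopping where the test (validation) loss is lowest, can be expressed as a simple patience-based criterion. The sketch below is a generic illustration; the function name `early_stop_epoch`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and are not taken from Fig. 6.

```python
def early_stop_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (index, loss) of the best epoch under patience-based early stopping.

    Training is considered stopped once the loss has failed to improve by
    more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_index = -1
    waited = 0
    for index, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_index, waited = loss, index, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_index, best_loss

# Hypothetical per-epoch losses: decline, rise after the 3rd epoch, level off.
losses = [0.90, 0.60, 0.40, 0.50, 0.55, 0.50]
index, loss = early_stop_epoch(losses)
print(index + 1, loss)  # 3 0.4
```

In practice the model weights saved at the best epoch would be restored, so that the reported metrics correspond to the minimum of the validation loss rather than to the last epoch trained.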
In traditional machine learning, the Support Vector Machine (SVM) is one of the most widely used and successful classifiers. The accuracy of SVM in the classification of cancer images, such as breast cancer [18,27], lung cancer [28], and head and neck cancer [13], is usually about 85%-92%. We also applied SVM to the same data set with 4-fold cross validation. The features, including energy, contrast, entropy, correlation and texture mean, were extracted by the Gray-Level Co-occurrence Matrix (GLCM) method [29]. The SVM parameters, including the penalty parameter and the kernel parameter, were iteratively optimized. The accuracy of the SVM classification was found to be 71%, and the total computational time was around 32 hours using a 2 GHz CPU. The DCNN classifier thus performs much better than SVM. Its superior performance may be attributed to one important factor: the DCNN can exploit the spatial structure in the 2-D optical parameter maps whereas SVM cannot. Furthermore, SVM requires features to be extracted from the data manually, which may affect the classification accuracy, whereas the DCNN learns features from the data automatically.
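To make the hand-crafted features concrete, the sketch below computes a gray-level co-occurrence matrix for a single pixel offset and derives the five features named above. It is an illustration under assumptions rather than the pipeline used in the study: the number of gray levels, the offset, the symmetric normalization, and the exact feature definitions (for example, energy taken as the sum of squared probabilities) vary between implementations, and the function name `glcm_features` is hypothetical.

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Compute GLCM texture features of a 2-D image for offset (dx, dy) >= 0."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi > lo:
        # Quantize intensities to integer gray levels 0 .. levels-1.
        q = np.minimum((levels * (img - lo) / (hi - lo)).astype(int), levels - 1)
    else:
        q = np.zeros(img.shape, dtype=int)

    # Pair each pixel with its neighbor at the given offset.
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]
    b = q[dy:, dx:]
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    glcm = glcm + glcm.T          # count each pair in both directions
    p = glcm / glcm.sum()         # joint probability of gray-level pairs

    i, j = np.indices(p.shape)
    mean_i, mean_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mean_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mean_j) ** 2) * p).sum())
    nonzero = p[p > 0]
    if sd_i * sd_j > 0:
        correlation = ((i - mean_i) * (j - mean_j) * p).sum() / (sd_i * sd_j)
    else:
        correlation = 1.0         # convention assumed for a constant image
    return {
        "energy": float((p ** 2).sum()),
        "contrast": float((((i - j) ** 2) * p).sum()),
        "entropy": float(-(nonzero * np.log2(nonzero)).sum()),
        "correlation": float(correlation),
        "mean": float(mean_i),
    }
```

A feature vector of this kind, computed per image and possibly over several offsets, would then be passed to the SVM; the spatial arrangement of the tissue beyond the chosen offsets is lost at this step, which is consistent with the explanation given above for the lower SVM accuracy.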

Conclusion
In summary, we have demonstrated a method for the diagnosis of lung squamous cell carcinoma using TI-DIC microscopy and a deep convolutional neural network. TI-DIC microscopy together with the scattering-phase theorem revealed that the optical parameters of cancerous lung tissue differ significantly from those of normal lung tissue. The scattering coefficient and the reduced scattering coefficient increase while the anisotropy factor decreases with lung cancer. A DCNN classifier has been developed to classify the tissue using either the DIC images or the 2-D optical property maps of $\mu_s$, $\mu_s'$ and g. The DCNN classifier with the optical property maps is found to exhibit a high accuracy of 96%, significantly outperforming the same DCNN classifier on the DIC images. As a label-free modality applicable to fresh tissues, quantitative phase microscopy together with deep learning may emerge as a promising approach for in situ rapid cancer diagnosis.