Optical coherence tomography for thyroid pathology: 3D analysis of tissue microstructure

: To investigate the potential of optical coherence tomography (OCT) to distinguish between normal and pathologic thyroid tissue, 3D OCT images were acquired on ex vivo thyroid samples from adult subjects (n = 22) diagnosed with a variety of pathologies. The follicular structure was analyzed in terms of count, size, density and sphericity. Results showed that OCT images highly agreed with the corresponding histopatology and the calculated parameters were representative of the follicular structure variation. The analysis of OCT volumes provides quantitative information that could make automatic classiﬁcation possible. Thus, OCT can be beneﬁcial for intraoperative surgical guidance or in the pathology assessment routine.


Introduction
The thyroid gland regulates metabolic, growth and development processes. Its functional units are follicles that are composed of colloid or thyroglobulin surrounded by epithelial cells. The structure and size of the follicles are dependent on the location in the gland, the functional state of the gland's tissue, as well as age [1]. Disregarding these parameters, in a normal thyroid, the follicles are usually spherical, uniform in shape and distribution, and vary considerably in size [2], with diameters ranging between 50 and 100 µm in adult humans. The common diseases of the thyroid, benign and malignant, are goiter (enlargement of the thyroid), hyperthyroidism (overactive thyroid, for example Grave's disease), hypothyroidism (underactive thyroid), adenoma (tumor changes), thyroiditis (for example Hashimoto's disease) and carcinoma (mainly papillary, medullary and follicular) which in one way or another affect the thyroid's follicular structure and hormone production. It is reported that thyroid cancer accounts for 2.1% of all new cancer cases worldwide being the most common type of cancer of the endocrine system, with number of cases per year continuously increasing in most parts of the world [3,4]. Furthermore, non-cancerous thyroid pathologies are common with an associated negative impact on life quality [5]. As most of the diseases of the thyroid affect the follicles, analysis of the follicles' morphology may aid in determining the pathological condition of the thyroid.
In clinical practice, when a disease of the thyroid is suspected, ultrasonography guided (USG) fine needle aspiration and cytology examination is performed. Based on the resulting diagnosis the tissue sections are surgically extracted and histologically analyzed by a pathologist. In a significant number of patients, the routine preoperative investigation of thyroid nodules, involving USG fine needle aspiration, does not provide sufficient diagnostic information. For these patients, diagnostic operations are often performed in form of hemithyroidectomy. If cancer is verified in the specimen, the patient undergoes a second surgery for completing the thyroidectomy. During both thyroid and parathyroid surgery, differentiation between normal and pathologic tissue as well as the correct identification of the tissue type is challenging [6]. The major risks during these surgeries are the potential damage to the parathyroid glands and vocal cords as well as bleeding [7,8]. In addition to the risks, a total thyroidectomy, compared to hemithyroidectomy, exposes the patients to higher risks and is associated with a higher morbidity and requires life-long medication after the surgery. Intraoperative diagnostic tools could therefore be of value to reduce the operation risks, save the non-pathologic thyroid tissue by avoiding unnecessary thyroidectomies, reduce operation time, post-operative care and medication. In patients where USG fine needle aspiration is not suitable, intraoperative diagnosis of thyroid pathology would be of additional interest as it helps avoiding a second operation.
Despite the demand, there is no solution yet available that intraoperatively assists the surgeons with tissue identification. Several publications have addressed localization of the parathyroid glands by using different optical imaging methods. Prosst et al. reported the use of 5aminolevulinic acid (ALA) for inducing fluorescence in the parathyroid [9]. Tummers et al. used methylene blue and near infrared imaging for detection of parathyroids [10]. Recently, it was reported that parathyroid, and to a lower extent thyroid, emit a unique autofluorescence signal in the near infrared region [11][12][13][14]. A few studies [15][16][17][18][19] reported the optical coherence tomography (OCT) to be a fairly reliable method for parathyroid identification, showing that representative tissue structures were visible in the OCT images, in both ex vivo [15] and in vivo [16] investigations. OCT and optical coherence microscopy (OCM) have shown to be promising in providing histology-like images of thyroid, parathyroid, adipose and muscle tissue [17]. Sommerey et al. showed that, in addition to the visual assessment of 2D OCT images, the intensity profile can be used as a method to distinguish parathyroid glands from thyroid, adipose and muscle tissue [18]. Hou et al. automatically classified ex vivo parathyroid from thyroid, lymph node and adipose tissue using texture feature analysis and artificial neural networks [19].
Only few studies have addressed the thyroid tissue and its pathology in detail by using OCT [20,21]. Zhou et al. imaged both benign and malignant thyroid diseases using an integrated OCT and OCM system [20]. The authors qualitatively showed that microstructural features were distinguishable in 3D OCT images that could be used to differentiate malignant tissue from normal or benign lesions. According to a feasibility study performed by Erickson-Bhatt et al., OCT can be used for real-time intraoperative imaging of thyroid to prevent unnecessary thyroidectomies, which were observed in 25-50% of cases when cancerous tissue was suspected [21].
The novelty of this work is the computation of parameters describing the thyroid follicular structure in various pathologies by advanced analysis of 3D OCT images. The results of this quantitative analysis could be used to distinguish normal from pathologic tissue, to help to preserve thyroid tissue during surgical procedure. The aim of this work was to characterize the follicular structure using specific parameters of density, count, volume variability and sphericity to distinguish normal from pathologic thyroid tissue, by implementing a semi-automatic algorithm.

Tissue specimens
Thyroid tissue specimens from 20 female and 2 male random adult patients (n=22), age 53±18 years (mean ± standard deviation), undergoing thyroid surgery were studied ex vivo using the OCT system described in section 2.2. Tissues were kept in 4% formalin in between the excision and the measurement. The study was approved by the local ethical committee (Dnr 2012/237-31, 2014/452-32 and 2015/463-32) and written informed consent was received from the patients.

OCT system and scanning protocol
A spectral domain OCT imaging system (TELESTO II, Thorlabs, Inc., NJ, USA) with an LSM03 probe and central laser wavelength of 1310 nm was used to study the tissue microstructure on the tissue specimens. The system has a maximum resolution of 13 and 5.5 µm in the lateral and axial directions, respectively. The refractive index was set to 1.4, as the mean value between the refractive index of dry tissue and water [22]. Images were acquired in 3D, where the number of averages per B-scan (x-z plane) was set to 10.
The regions of interest (ROI) were selected with the aid of an initial overview scan that imaged a maximum volume of 10 × 10 × 2 mm with a voxel size of 20 × 20 × 2.53 µm in the x, y and z direction, resulting in volumes of 500 × 500 × 790 voxels, respectively. Using the overview acquisition, ROIs with a size of 2-3 mm in the x, y directions and depth of 1.5-2 mm were selected from homogeneous regions in the sample. For these acquisitions, the voxel size was set to 5-7 µm in the x and y direction and 2.53 µm in the z direction, resulting in volumes with minimum 285 × 285 × 590 number of voxels. A layer of transparent plastic foil topped with immersion oil was used between the sample and the Z-Spacer (OCT-IMM3, Thorlabs, Inc., NJ, USA). OCT volumes (n=192) were collected during the study and 47 of these were selected for the analysis to match the tissue regions corresponding to histopathology.

Histopathology
To ensure a relevant comparison between histopathology and data analysis, the pathology evaluation of the samples was performed on the homogeneous tissue regions selected by the ROIs, whose positions were registered on the white-light images. Using the white-light images as a guide, the tissue regions corresponding to the ROIs were marked with red tattoo ink that served for later identification of the tissue surface in the histopathology images. The stained tissue regions were extracted using a biopsy punch tool (ø=3 mm), and the samples were processed using standard paraffin embedding procedure for histopathology sample preparation. Several 4 µm thick sections were collected from each sample where the number of sections was decided based on the follicular density observed by OCT. The sections were stained following the hematoxylin and eosin (H&E) staining protocol, and subsequently scanned using a digital microscope (Hamamatsu NanoZoomer, Hamamatsu Photonics, K.K., Japan). One to three digital images corresponding to each ROI were evaluated by a specialist thyroid pathologist and the diagnosis was used as the reference diagnosis for the OCT image analysis. The overall clinical pathology diagnosis of the patient was used as a guide by the pathologist during the evaluation of the histology images.

Data analysis
The analysis of the OCT volumetric data for the extraction of the follicular morphology involved image denoising, segmentation of the follicular regions and computation of geometrical properties of the segmented follicles. The data analysis was performed in the MATLAB environment (MATLAB and Image Processing Toolbox Release 2018b, The MathWorks, Inc., Natick, Massachusetts, United States) and it started by manually selecting the sub-volume of the entire 3D dataset as the ROI for analysis. The average volume of the selected ROIs was 3±1.5 mm 3 . The manual selection of the 3D ROI ensured that only relevant portions of the sample volume were processed excluding the deepest sample regions that did not contain useful information. Noise reduction was carried out on all B-scans selected by the ROI, whereas contrast enhancement, image binarization and region separation were performed on the interpolated en face images. The latter were stacked together to form a rough 3D segmentation of the follicular regions. To improve the quality of the segmentation, the shape of each 3D segmented region was evaluated along with its depth in the corresponding ROI and its degree of intersection with the boundaries of the ROI. These evaluations were combined and used for the discrimination between follicle-like and noise-like segmented regions. 3D region properties were computed on the follicular-like regions, Fig. 1. Data analysis steps. The OCT volumetric data is processed slice-by-slice to obtain a binarized version of the en face images. These are then stacked together forming a rough segmentation, which is improved by excluding the 3D segmented regions that do not resemble follicles based on a shape, border and depth score. Morphological analysis is then performed on the improved 3D segmentation to quantify the follicular density, count, size variability and sphericity of each ROI. and used for distinguishing between normal and pathologic tissue. The data analysis diagram is presented in Fig. 1.

Filtering of the speckle noise
Speckle noise introduces a granular intensity pattern leading to a reduced image contrast [23]. The pattern of the speckle noise, caused by the interference of multiple scattered photons with random phases, is related to the imaging system and cannot be avoided in the current technologies [24,25]. It especially obscures object boundaries and microstructures, resulting in a reduced accuracy of image analysis. To achieve better segmentation of the images, suppression of the speckle noise is necessary.
Several filtering methods were implemented on the 2D OCT B-scans and evaluated to increase the smoothness of the image while preserving edge sharpness. The methods included enhanced resolution imaging (ERI) [26], adaptive Wiener filter, multi-frame wavelet transform (WT)s and discrete wavelet transform (DWT) [27]. Of these methods the multi-frame and DWT methods resulted in image quality improvements [28]. The advantage of wavelet transform is seen in the decomposition of the image into different sub-bands. By applying a complex DWT an even greater separation between the sub-bands is possible, such that a better differentiation of the signal from the speckle noise can be obtained. Since signal-to-noise ratio and contrast-to-noise ratio cannot be correctly calculated without a noiseless reference image [28], the edge profile gradient and the performance of the segmentation (DICE score) were used to select the optimal DWT filter threshold. DWT filtering was further used in this work and applied on the OCT B-scans forming the imaged volumes.

Follicular structure segmentation
Image binarization After DWT speckle noise reduction, each interpolated en face image was contrast-enhanced using adaptive histogram equalization through the adapthisteq function and then filtered by a median filter with kernel [3×3] to reduce contrast-enhancement induced artefacts. The image was then binarized using an intensity threshold. The threshold was selected to be the smaller out of two threshold values computed using the multithresh function that implements Otsu's method [29]. The selection of the smaller threshold ensured that only the dark regions in the image, i.e. low intensity pixel values, were segmented as possible follicular regions. The binarized image with pixel value 1 in the foreground and 0 in the background was processed through a series of morphological operations, including reconstructed image opening and closing, that aimed to remove isolated and spur pixels, fill the holes and smooth the border of the segmented regions.
Separation of connected regions A region separation procedure was applied on the binarized image as described in [30] to disconnect possible follicular regions that have been segmented as one in the binarization step. The kernel size set in the imextendedmin function was adapted from [30] to be dynamically selected for each segmented region based on its 2D shape (Eq. (1)). For large kernel sizes (e.g. [10×10]) the region separation algorithm is less sensitive to connected regions, whereas small kernel sizes (e.g. [3×3]) induce over-separation. For each region in the binarized image, a 2D property-score, 2D S p , was computed as where circularity, solidity, extent and eccentricity were computed using the regionprops function. To identify the weighting factor for each region property, principal component analysis was performed on a dataset of more than 1000 manually annotated regions that represented both follicle and noisy portions in several random OCT images, each described by a set of region properties that include circularity, solidity, extent and eccentricity. The weighting factors used in the equation above are the constants of the linear combination that generates the direction along which the dataset is most variable. For regions in the binarized image with 2D S p higher than 0.7, no region separation was performed since these regions were considered to have a follicular-like 2D shape. On regions with 0.5 <= 2D S p < 0.7, region separation was performed with kernel size [10×10], whereas regions with 0.3 <= 2D S p < 0.5 were treated with a region separation with kernel size [7×7]. For the remaining regions with 2D S p < 0.3, a stronger region separation was applied, were the kernel size was set to [5×5]. 2D S p intervals and the relative kernel size were arbitrarily selected based on multiple trials.

Non-follicular region exclusion
By stacking the en face images processed through the previous steps, a rough 3D follicular segmentation was obtained. To improve the quality of the segmentation, an initial region exclusion based on the size of the regions was performed. Regions whose number of voxels was lower than the one needed to cover a sphere of 30 µm in diameter were set as background. This exclusion was based on the minimum diameter size of follicles. Next, three scores were computed that considered the three-dimensionality of each segmented region. The shape-score, S s , describes the sphericity of a region, the border score, S b , summarizes its degree of intersection with the ROI borders and the depth-score, S d , accounts for how deep it is in the selected ROI. The combination of the three scores produces a final-score, S f , that ultimately was used for the discrimination between follicular-like and noisy-like regions.
Region shape-score The shape measure used in this implementation is a sphericity evaluation based on DICE score. For each 3D region, the centroid and the bounding box were computed using the regionprops3 function. A sphere was defined around the centroid using the mean between the minimum and maximum length of the region's bounding box as diameter. Then, voxels belonging to the 3D region and the sphere volume were counted as true-positive (TP), voxels belonging to the 3D region but not to the sphere volume were counted as false-positives (FP) and voxels belonging to the sphere volume but not to the 3D region were counted as false-negatives (FN). Using the definition of DICE in [31], S s of each 3D region was computed and set as voxel value. An illustration of how S s is calculated is shown in Fig. 2, where the region under evaluation is illustrated along with the sphere constructed in its centroid. Region border-score S b is an evaluation of the degree of intersection of each region with the borders of the selected 3D ROI and reflects the uncertainty in the evaluation of the follicular regions introduced by partial segmentation. S b was computed as where n SA is the number of region surface-area voxels, n I is the number of surface-area voxels that are not in contact with any ROI border, n T is the number of surface-area voxels that intersect the most superficial face of the ROI, n B is the number of surface-area voxels that intersect the deepest face of the ROI and n L is the number of surface-area voxels that intersect the remaining four lateral faces of the ROI. The weighting factors T, L and B in the equation above are arbitrarily selected as T < L < B, meaning that the voxels intersecting the bottom border will have more impact on the computation of the border-score, reflecting the susceptibility to noise and artefacts of the deeper regions. In this implementation T = 2, L = 3 and B = 4. S b ranged in [0,1] with 1 assigned to all the voxels of regions that did not intersect the borders of the ROI.
Region depth-score An S d value between one and zero was set to all voxels in a region, based on its depth in the ROI. Similar to S b , the deeper the region is in the ROI, the more prone it is to noise and artefacts. Region voxels in contact with the top surface of the ROI have S d = 1 and thus are more reliable than region voxels in contact with the bottom surface of the ROI, that have S d = 0.
Final-score By combining S s , S b and S d , the final-score S f was obtained and used to refine the rough segmentation by discriminating between follicle-like and noise-like regions. S f is computed as where a value between [0, Inf] was attributed to every voxel belonging to a 3D segmented region.
The definition of S f is the result of experimental trials in which the combination of S s , S b and S d , was adjusted to obtain a clear separation between the S f of follicle-like and noise-like regions.
Using the above definition, voxels belonging to follicule-like regions showed an S f smaller than 5.
In this implementation, a region was considered as a follicle, and evaluated during the follicular morphology analysis, if 90% of its voxels had S f < 5. The voxels belonging to regions considered non-follicles were set to background.

Morphological analysis of the segmented follicular structure
The follicular structure of the OCT segmented volumes was analyzed by evaluating the follicular count, density, follicular volume variability and mean sphericity (computed as described in 2.4.2). The function regionprops3 was used to obtain the number of voxels of each region and the follicular count. The follicular density accounts for all the segmented follicles in the ROI and was computed as the number of foreground voxels divided by the total number of voxels in the selected ROI. The follicular volume variability and mean sphericity, on the other hand, were calculated on the ensemble of segmented regions which had less than 10% of the surface area voxels belonging to the border of the ROI. This region exclusion accounts for the effects of partial segmentation, where the volume measurement and the shape of the follicles are distorted by the selection of the ROI. The results of the follicular morphological analysis for each OCT volume were saved and used for later analysis.

Segmentation validation
The effectiveness of the speckle noise reduction and the performance of the segmentation algorithm were validated by comparing the automatic segmentation results, obtained using different DWT filter thresholds, with the manual annotation of 18 raw 2D OCT images, hereafter referred to as validation images. DICE score as defined in [31] was used as a metric for quantifying the performance of the automatic segmentation. For each validation image the number of TP, FP and PN pixels were determined by overlapping the manual with the automatic segmentation, and used for the calculation of the aforementioned metrics. Manual annotation was performed using an ad-hoc function that was run on a touchscreen computer provided with a stylus pen. Note that the manual annotation was not performed by a pathologist since it is a time-consuming task. The repeatability of the manual annotation was assessed on four of the 18 validation images, by comparing the manual annotation performed independently by two observers.

Results
Representative OCT and histopathology images of the thyroid tissue are presented, followed by the results of the speckle noise reduction for different filter threshold values and an example of follicular structure segmentation. The evaluation of the segmentation algorithm performance is then summarized. Finally, the results of the follicular structure morphological analysis are reported for the different thyroid pathologies available in this study.

OCT images
Examples of OCT B-scans of the thyroid tissue are shown in Fig. 3, where variations in imaging depth and differences in intrafollicular intensity value are noticeable. In Fig. 3(a) and (c), differences in the intrafollicular colloid density appear as variations in the intrafollicular pixel intensity. Follicles with low colloid density showed clearly as dark structures in contrast to the brighter tissue surrounding the follicles. On the other hand, follicles with high colloid density appeared as slightly brighter structures in low contrast with the surrounding tissue (red arrows). Moreover, Fig. 3(b) shows a follicle containing solid colloid structures that appear as bright spots in the intrafollicular space. These structures might provide relevant information for the pathological evaluation of the tissue. Figure 3(c) is representative of the imaging depth dependency on the tissue structure. The dense follicular structure on the right side of the image was visible as deep as 1.2 mm, whereas in the central part of the image with a few follicles visible close to the sample surface, the signal intensity dropped after 0.5 mm. On the left side of the image, structures could be imaged up to 0.8 mm after which no follicles could be distinguished.

OCT vs histopathology
The follicular structure in the majority (44/47) of the OCT images agreed with their corresponding histology by qualitative evaluation. A representative comparison between B-scan OCT and histopathology images for normal and pathologic thyroid tissues is shown in Fig. 4. In normal samples ( Fig. 4(a) and (b)), a large number of follicles are visible in both OCT and histology images. Follicular structure was not easily distinguishable in neither the OCT image nor the histopathology in the case of papillary cancer (Fig. 4(c)), where papillary hyperplasia visible in the histology image replaces the follicular structure. In the case of adenoma shown in Fig. 4(d), both OCT and histopathology showed a depleted and atrophic structure. Enlarged follicles commonly seen in multinodular goiter (nodular hyperplasia of the parenchyma) are visible in OCT and histopathology images (Fig. 4(e)), where the follicular diameter is found to be larger than 500 µm. Other examples of goiter are presented to show the variability in follicular structure seen for this pathology (Fig. 4(f)-h). A group of follicles constrained by a nodule-like structure was imaged by OCT which was comparable to the histology (Fig. 4(h)). In cases where OCT and histopathology did not show similar structures, follicles visible in the histology images were below the depth limit reachable by the OCT system.

Filtering of the speckle noise
Results of the DWT filtering with filter threshold set to 1.5 and 4, are shown in Fig. 5. By increasing the threshold value, the filtered image becomes visibly smoother. The degree of smoothness introduced by the DWT filter is evaluated by comparing the edge profile of the raw and the filtered images. For a better comparison, the edge profile of the enhanced image is also plotted. For a threshold value of 1.5, small intensity variations caused by the speckle noise were reduced by the DWT filter without broadening the edge profile. Using a filter threshold value of 4 results in a very smooth edge profile, with both small and large intensity variations being flattened. The pixel intensity of the dark areas is increased, reducing the contrast in the image. The adaptive histogram equalization improves the image contrast, as shown by the decrease in pixel intensity in the darker regions.
Using the validation images, the segmentation performance was evaluated when using filter threshold values of 0, 1.5, 2, 2.5, 3, 3.5 and 4. The mean DICE score for filter threshold value of 0 was 0.63 which was increased by 11.1% and 15.9% when using a threshold values of 1.5 and 4, respectively. Despite the increase in the mean DICE score, the standard deviation evaluating the robustness of the segmentation, had insignificant variations with values being 0.19, 0.16 and 0.18, for thresholds 0, 1.5 and 4, respectively. As a result, the optimal filter threshold was set to 1.5 and used hereafter for data processing.

Follicular structure segmentation
The result of the speckle noise reduction along with the intermediate step results of the follicular structure segmentation are shown in Fig. 6. The intensity-based image binarization identified the follicular regions with a limited number of artefacts which mainly appeared as irregular region borders in Fig. 6(c). In the binarized image (Fig. 6(c)), eight regions are two-by-two connected where the red arrows indicate their connection points. The color maps of the S s and S b are presented in Fig. 6(d) and (e), respectively. Warm-colored pixels in the S s map identified regions whose sphericity measurement was close to 1. In the S b map, warm-colored pixels belonged to 3D regions that intersect the borders of the ROI with the minority of the surface area voxels. The combination of S s , S b and S d , the latter not shown here, resulted in the S f map pictured in Fig. 6(f), where follicular-like regions were identified by cold-colored pixels. In Fig. 6(g), the final result of the segmentation algorithm is presented with the contours of the identified follicular regions superimposed to the raw image. As can be seen from the final segmentation result, by applying the region separation procedure, the connected regions were disconnected allowing for an improved segmentation of the follicular structure in the sample.
The discrepancy between the 2D sectional view regarding the regions and the value of the score maps shows how a 3D analysis gives more information than a 2D analysis. Looking at the S b map for example, regions that do not touch the border of the image are expected to have a score equal to 1. However, the S b value is lower and this is because the 2D sectional view contains no information about the three-dimensionality of the follicles and how they develop in the ROI. In this case, the 3D regions viewed in the sectional view, touched the border of the ROI and thus had a S b value lower than 1.

Repeatability of the manual annotation
The comparison between the manual annotations resulted in a DICE score of 0.91±0.02. Therefore, the manual annotations of the validation images used later for the evaluation of the automatic segmentation performance can be considered repeatable.

Automatic segmentation performance
The automatic segmentation for four validation images is shown in Fig. 7. In the top row the manual annotations and the automatic segmentations are superimposed on the validation images. In all examples, TP, FN and FP pixels are marked in green, red and blue, respectively. The performance of the automatic segmentation varied among samples based on the follicular structure (i.e. number of follicles, their size, the amount of solid colloid in the intrafollicular space) and the depth at which the follicles were still distinguishable from the background (bottom row in Fig. 7). For samples with high follicular density or superficial follicles (Fig. 7(a)-c), the automatic segmentation had DICE scores > 0.82. On the other hand, in cases where the follicular density was low or the follicular structures were deep in the sample, the segmentation performed poorly with a DICE score of 0.43 (Fig. 7(d)). The overall DICE score for all of the validation images was 0.7±0.16. The number of FNs was 7% higher than the number of FPs, showing that the segmentation algorithm had a tendency of under-segmenting the follicular regions and thus underestimating the follicular density. The major cause for the underestimation was found to be the image binarization step when looking at the edge profile across the intrafollicular and extrafollicular spaces. This was particularly true for follicles with high colloid density, that show in low contrast with the surrounding tissue.
From the bottom row in Fig. 7, showing examples of OCT B-scans, it is clear that the imaging depth is highly dependent on the follicular structure. In Fig. 7(b) and (c) solid colloids were  visible inside many of the follicles (red arrows in the B-scan images). The segmentation of follicles that present solid colloids is challenging, especially in the cases when these are abundant.

Morphological analysis
The morphological analysis results for the OCT volumes is summarized in Fig. 8, where the boxplots for follicular density, count, volume variability and mean sphericity are presented separately for the different thyroid tissue pathologies. In the boxplots, normal (n=9), goiter (n=24), adenoma (n=6), Graves (n=3), Hashimoto (n=2) and cancer (n=3) are represented. Analysis shows that the follicular density ranged up to 0.54, with normal and goiter samples having substantially higher values than atrophic pathologies ( Fig. 8(a)). Follicular count is highest in normal samples, with a median of 86 follicles per mm 3 ( Fig. 8(b)). This value is lower for all the thyroid pathologies investigated here. Normal samples have a follicular volume variability (2.9×10 6 µm 3 ) lower than hyperplastic diseases (4.5×10 6 µm 3 ) but higher than pathologies where the follicular structure is atrophic (Fig. 8(c)). Median follicular sphericity does not show any significant variation among different pathologies (Fig. 8(d)). The high sphericity variability seen among the cancer cases is an artifact of threshold selection during the image binarization, where the absence of dark regions (i.e. follicular structures) in the selected ROI pushes the threshold to high values making the binarization more sensitive to noisy regions. These regions resemble deformed follicles and thus spread the range of the sphericity for the samples where few to no follicles are present. The median values, along with the ranges, are summarized in Table 1 for all pathologies and follicular morphology parameters.  Table 1.

Discussion
Identification of pathologic thyroid tissue could be of help during thyroid surgery. The combination of qualitative evaluation of OCT images and quantitative information describing the follicular structure, could aid the discrimination of normal and diseased thyroid tissue intraoperatively and thus prevent unnecessary tissue removal. In this study, the feasibility of the methods was investigated, limitations identified and possible solutions suggested.

Clinical implementation
The implementation of the proposed method in clinical practice is currently limited by the acquisition and processing time of 3D OCT data. Nevertheless, the proposed methodology should be easily translated to a 2D analysis, with a substantial reduction in acquisition and processing time. Intraoperative 2D OCT investigations of the thyroid are reported in literature, with promising stability and sufficient image resolution showing the follicular structure clearly [15][16][17]. Today, intraoperative OCT systems are available in the form of a needle probe [32], integrated into surgical microscopes (iOCT) or as a handheld-probe based system [33]. This latter system type is more suitable for thyroid surgical application due to the small incision site, while the needle probe can be suitable for biopsy. Zysk, et al. investigated the challenges and the possible clinical benefits in translating OCT into clinical tools for surgical aid [34]. The advancements in OCT systems and computer hardware along with processing software make 3D, and even 4D OCT, practical for intraoperative implementation [35][36][37][38].
The limited imaging depth of OCT constitutes another obstacle to the clinical application of this method. Furthermore, OCT does not reach the levels of preoperative diagnostic accuracy of USG for thyroid fine needle aspiration. However, due to the low invasiveness and the ability of imaging tissue volume without slicing, it can be used as an aid in intraoperative diagnostics in an organ such as the thyroid in which histological frozen section investigations are not always recommended [39,40]. Moreover, a technique similar to OCT for imaging a small tissue volume without the need to perform slicing can be useful in the pathology examination routine or even for the physiologic studies [1,2] of tissue without any direct clinical implementation.

OCT vs OCM
The context of the appearing structures is affected by the type of imaging system. Zhou, et al. showed the ability of OCM to resolve the cellular organization in thyroid samples diagnosed with malignant diseases such as papillary carcinoma and medullary carcinoma, where OCT only shows a homogeneous tissue layer [20]. However, the need to resolve microstructural features at different scales partly dictates the dimension of the FOV appropriate for the investigation that captures a representative portion of the sample. In the case of OCM, the acquisition and volume rendering time limit the FOV size, which is in the order of 10 −2 mm 2 [20,41]. To comply with the size of follicles that range between 50 and 800 µm, in this study a horizontal FOV of at least 2 mm 2 was used, which is achievable by OCT systems in a reasonable acquisition time. With the prospect of applying the proposed method intraoperatively, the imaging time becomes a critical factor that will promote OCT over OCM if a larger area is to be scanned.

Effect of optical settings on follicular volume calculation
The measurement of the follicular volume is affected by the OCT imaging system configuration and the value of the refractive index set in the acquisition software. The assessed thickness of the epithelial cell layer surrounding the follicle might change when comparing images acquired with light source of different wavelengths, as described by Ishida and Nishizawa [42]. At 800 and 1300 nm, the follicular epithelium was found to be thicker compared to when the laser light is at 1060, 1550 and 1700 nm. A possible explanation is that different tissue structures and cell populations are visible at different wavelengths.
The estimation of the follicular volume is influenced by the refractive index that is used to obtain geometrical distances in the axial dimension from the optical delay. Considering the two extreme cases of pure water (n=1.3) and dry tissue (n=1.5) [22], the voxel's axial dimension deviates by up to ±8% for each mm inversely with ∆n = 0.1. In this study, the refractive index was selected as 1.4 as the mean value between water and dry tissue. This is also consistent with the values published for tissue [43] and that sets the axial voxel dimension to 2.54 µm. Nevertheless, the volume calculations are eventually affected in the same manner, which allows comparison between the follicular volumes.

Challenges of segmentation
The ability of the automatic segmentation to find and properly segment the follicular regions affects the overall statistics involving follicular volume measurements as well as the follicular density estimation. The higher percentage of FN found in the segmentation validation indicates that the follicular volume measurements are in general slightly underestimated reflecting a lower follicular density. Moreover, the performance variability indicates that, as for the current implementation, the reliability of the follicular volume and density estimation changes between samples. This low performance can be attributed to two main factors, namely the variability in both the imaging depth and intrafollicular signal intensity value.
The variability in imaging depth that occurs in each B-scan, makes the segmentation of the follicular structure more challenging. Regions in the image with shallow imaging depth appear as dark regions that connect to the follicles in their proximity. These follicles are difficult to identify which makes them prone to be excluded during the discrimination of non-follicular like regions. The region separation procedure to some extent improves identifying the connected follicles, improving the estimation of the follicular density and follicular count that otherwise would be underestimated. With respect to the intrafollicular pixel intensity variability, the limitation of the method proposed here is that it is based on the intensity difference between follicles and surrounding tissue. Follicles with high colloid density had a low contrast with the surrounding tissue in the OCT images, making their identification non-trivial. The adaptive histogram equalization increased the contrast between follicles and surrounding tissue. However, in the case of follicles with high colloid density, the contrast enhancement was not sufficient to allow the intensity thresholding to identify these follicles. This led to increased FNs and lower DICE score when looking at the validation images, and to an overall underestimation of the follicular volumes and density. Degradation of the spatial resolution due to different processing steps, evaluated by looking at the edge profile across the intrafollicular and extrafollicular spaces, showed that the image binarization procedure is the main cause of a possible underestimation.
In this study, segmentation was performed in 3D in order to capture the multi-dimensional features of the thyroid tissue microstructure. When assessing follicular shape, round follicles in a 2D section view might be ellipsoidal when observed in 3D. Therefore, a follicle classified as normal in the 2D section view might be marked as deformed if observed in 3D. Moreover, the volumetric information helps the identification and the removal of non-follicular like regions during segmentation, improving the overall analysis and evaluation of the sample. The main disadvantage in using 3D data is the longer acquisition and processing time. A possible solution for shortening the analysis time is exploiting parallel processing using graphical processing units, especially when it comes to the adaptive histogram equalization step, which is the most time demanding step in the implementation.

Deep learning
The implementation of deep learning algorithms for the segmentation of the follicular structure could overcome the limitations of the current analysis. Several publications describe the application of deep learning on OCT data [44]. Convolutional neural networks (CNN) are applied on ophthalmology 2D OCT images for the segmentation and assessment of the retinal layers [45][46][47] and the identification of retinal pathological conditions [48,49], whereas Abdolmanafi, et al. used deep learning to identify coronal artery layers in OCT images [50]. Recently, 2D thyroid OCT images were used to test the performance of a CNN for medical image segmentation that was able to classify follicular, non-follicular and background regions [51]. Different CNN architectures were proposed for improving the segmentation performance by increasing the depth of the network and adjusting the number of convolutional and fully connected layers [52]. Such implementation may succeed in segmenting follicles that currently are inaccessible. As for the training data, a considerable number of B-scans were acquired during this study (94000 in 192 volumes) that may provide a starting point for the evaluation of the approach. A major challenge is the preparation of the ground truth data since accurate annotation of the follicular structure requires one-to-one comparison with histology.
Segmentation of each B-scan should be initially implemented for the evaluation of the method, with the prospect of being extended to 3D. Multiview CNNs [53] and volumetric CNNs [54] can process 3D data and are able to perform object segmentation and semantic segmentation. However, the required computational time makes their clinical applications challenging. Recently, a new approach called PointNet was proposed, where instead of feeding the deep leaning algorithm with a set of 3D voxels, the point cloud description of the data was used as input [55]. The authors showed that in this way the computational time and costs are drastically reduced, allowing the classification of more than one million points per second while maintaining the performance of conventional techniques. Such a solution could potentially allow intraoperative 3D segmentation of the follicular structure using deep learning.

OCT compared to histopathology
Qualitative comparison between OCT and histopatology images shows that the follicular structure details resolved by OCT are in good agreement with histology. Enlarged and deformed follicles, as well as atrophic and depleted follicular structures, visible in the histology images of thyroid tissue samples are also found in OCT images and volumes, allowing the assessment of follicular diameters, count and density without traditional tissue sectioning. Nodular structures, where follicles are enclosed by a fibrous capsule as in adenoma and multinodular goiter, are clearly visible in the OCT images. Furthermore, the agreement between OCT images and histopathology of thyroid tissue is confirmed by previous studies [20,21].
The ability of OCT in showing relevant microstructures without tissue sectioning can be used in guiding thyroid surgical procedure. The inspection of the fibrous capsule of thyroid nodules using OCT could provide an initial intraoperative assessment of capsule infiltration that characterizes malignant pathologies. Since no scanning was performed on samples with infiltration of the capsule, this study is limited in assessing the OCT capabilities in showing this diagnostic feature. The relatively coarse resolution of the OCT system used in this study, limits the diagnostic power of the OCT images that are insensitive to cell morphology and organization visible in histopathology images. These cellular features are used by pathologists for the classification of malignant diseases or the evaluation of lymphocytic infiltration.

Morphological analysis
Quantitative analysis of the OCT volumes shows that follicular density, volume variability and count are descriptive for the pathologies included in this study. Depleted follicular structures can be identified by a follicular density value lower than 0.1, which is substantially lower than the one of normal or hyperplastic cases. Moreover, follicular count and volume variability provide additional information that supports the identification of depleted follicular structures based on the density value. Differentiation between normal and hyperplastic diseases is not possible with the data extracted in this study. The extensive overlap between these cases for all the parameters makes their separation challenging. Automatic classification based on the computed parameters was not possible due to the low number of samples for each class. The analysis of a higher number of samples might delineate between normal and hyperplastic diseases that can add complementary quantitative information to the OCT images. Comparison of the follicular size in terms of diameter or volume with literature was not performed since, to the best of our knowledge, limited studies are available that report on follicle size in adult humans.

Conclusion
OCT images from ex vivo thyroid samples show details of follicular structure comparable to histology images without traditional tissue slicing and staining, suggesting OCT as a tool for surgical guidance to avoid unnecessary tissue removal. Moreover, the analysis of volumetric OCT data shows that follicular density, volume variability and count are parameters that describe normal and pathologic tissue samples. To overcome the limitations of the segmentation, deep learning approaches can be implemented to fully achieve an automatic method for thyroid pathology classification.