Segmentation of anatomical layers and imaging artifacts in intravascular polarization sensitive optical coherence tomography using attending physician and boundary cardinality losses

Intravascular ultrasound and optical coherence tomography are widely available for assessing coronary stenoses and provide critical information to optimize percutaneous coronary intervention. Intravascular polarization-sensitive optical coherence tomography (PS-OCT) measures the polarization state of the light scattered by the vessel wall in addition to conventional cross-sectional images of subsurface microstructure. This affords reconstruction of tissue polarization properties and reveals improved contrast between the layers of the vessel wall along with insight into collagen and smooth muscle content. Here, we propose a convolutional neural network model, optimized using two new loss terms (Boundary Cardinality and Attending Physician), that takes advantage of the additional polarization contrast and classifies the lumen, intima, and media layers in addition to guidewire and plaque shadows. Our model segments the media boundaries through fibrotic plaques and continues to estimate the outer media boundary behind shadows of lipid-rich plaques. We demonstrate that our multi-class classification model outperforms existing methods that exclusively use conventional OCT data, predominantly segment the lumen, and consider subsurface layers at most in regions of minimal disease. Segmentation of all anatomical layers throughout diseased vessels may facilitate stent sizing and will enable automated characterization of plaque polarization properties for investigation of the natural history and significance of coronary atheromas.


I. INTRODUCTION
Despite progress with effective therapies for treating acute coronary events, their prediction and prevention continue to present a major clinical challenge [1]. More than one million individuals suffered from acute coronary events last year in the United States alone [2]. In addition to pharmacological medical therapy, patients suffering an acute coronary event frequently receive percutaneous coronary intervention (PCI). PCI is similarly used in patients with chronic coronary syndrome.
Intravascular (IV) optical coherence tomography (OCT) is increasingly used for guiding PCI. It acquires high-resolution images of the subsurface microstructure of coronary atherosclerotic lesions [3], [4] and helps with identification of the culprit lesion, stent sizing, and confirming stent implantation. The use of IV-OCT achieves better physiological outcomes than using coronary angiography alone [5], [6], and can assess functional stenosis severity more accurately than intravascular ultrasound (IVUS) [7]. Nonetheless, clinical adoption of IV-OCT has been modest [8]. This contrasts with its important role as an essential clinical research instrument for investigating the pathophysiology and genesis of coronary atherosclerosis [9]. One factor impairing more widespread use may be the image contrast, signal statistics, and speckle characteristics specific to OCT, which complicate image interpretation. Extensive training affords refined interpretation of pullback data sets but is based nearly exclusively on subjective criteria that are difficult to learn, which results in modest intra- and inter-reader agreement [10]. Furthermore, the clinical workflow precludes time-consuming interpretation, emphasizing the need for automated analysis that presents the operator with clear indications to guide the intervention. However, until the recent introduction of Ultreon by Abbott [11], image processing algorithms used in the catheterization laboratories have primarily been limited to lumen segmentation, disregarding subsurface vessel morphology. The automatic detection of anatomical layers within the vessel wall and other features beyond the lumen promises to refine guidance of PCI and simplify translational research in the catheterization laboratory by eliminating the need for extensive training and time-consuming manual segmentation.
Fig. 1. Expert annotation of IV PS-OCT images. An example of a multichannel IV PS-OCT cross-sectional image, including backscatter signal intensity (A), birefringence (B), and depolarization (C) channels. The white arrow highlights a section of the media that exhibits little contrast in the intensity signal, but is readily identified in the birefringence image. D. Manual annotation by an expert using our Matlab graphical user interface. E. Inclusive pixel-level labels derived from the manual annotation (see Data Section). F. Equivalent exclusive labels defined in Table I. Scale bar: 2 mm.
In parallel to conventional OCT (Fig. 1.A), polarization-sensitive (PS) OCT performs polarimetric measurements to reconstruct images of tissue birefringence and depolarization (Fig. 1.B-C). Microscopic PS-OCT in human aortic plaques reports tissue birefringence that can quantify collagen and smooth muscle cell content, features that play an important role in plaque stability and vascular healing [12]. Moreover, catheter-based PS-OCT can perform intravascular polarimetry of coronary atherosclerosis [13]. In addition to tissue birefringence, intravascular polarimetry also measures depolarization, which is increased in tissues containing lipid particles, macrophage accumulations, or cholesterol crystals, as confirmed by correlation with histology in a human cadaver heart study [13]. Intravascular polarimetry is compatible with current intravascular imaging catheters, which facilitated its translation into the clinic [14], [15].
Imaging in patients confirmed the improved image contrast available to polarimetric measurements. In particular, the smooth muscle cell-rich tunica media features consistent and high birefringence, often separated from the adjacent intima and adventitia layers by fine bands of low birefringence, colocating with the internal and external elastic laminae (IEL, EEL). Depolarization enhances lipid-rich lesions and simplifies their differentiation from calcification, which can have a similar appearance in conventional intensity tomograms. The ability to consistently measure the EEL diameter along the entire coronary would be highly relevant for the sizing of stent diameter and length [16].
Intravascular polarimetry also offers a window of opportunity for prospective identification of remote lesions with a propensity for causing subsequent acute events. There remains a high rate of recurrent coronary events within only a few years following initial PCI, caused in about 50% of the cases by a lesion not involved in the original event [17]. Plaques that rupture typically are depleted of collagen [18], [19], and are expected to exhibit low birefringence. Indeed, the fibrous caps of target lesions in patients with chronic coronary syndrome featured significantly higher birefringence than the caps of patients with acute coronary syndrome [15]. In summary, intravascular polarimetry converts the polarization properties of tissues into endogenous imaging contrast that may facilitate segmentation of subsurface features and could in turn enable improved guidance of PCI, as well as refined assessment of remote lesions.
We build on the success of deep learning methods for segmentation tasks. This paper proposes a convolutional neural network and optimizes its performance using a new multi-term loss function. It uses the additional image contrasts available to intravascular polarimetry by appending the three channels (conventional intensity, birefringence, depolarization) into a multichannel image to automatically analyze coronary artery images.
The multi-term loss function includes two common segmentation loss terms, i.e., weighted cross-entropy loss and generalized multi-class Dice loss. In addition to the common segmentation loss terms, a boundary loss term focuses on the accuracy of the model only within the pixels close to the boundaries between the anatomical layers. The boundary loss is suitable for problems requiring precise object boundary detection. Similarly, the "boundary cardinality loss" penalizes the model from a topological point of view when the number of anatomical layers differs between the model's prediction and the ground truth, by counting the number of boundary pixels along the radial axis. The boundary cardinality loss imposes a topological prior on the layered tissues. Additionally, a feature denoted attending physician loss uses an independently trained critique model, which distinguishes between low- and high-quality labels. The attending physician loss enables the utilization of the auxiliary information embedded in datasets with heterogeneous manual labeling qualities. We trained and evaluated our method on a set of 984 images from 57 patients and compared it to the performance of state-of-the-art algorithms reported in the literature. Our work is the first demonstration of automatic segmentation of anatomical layers and the shadow artifacts arising from the guidewire and lipid-rich lesions using intravascular PS-OCT. It improves the boundary detection of the coronary lumen compared to other methods, and identifies the guidewire and plaque shadows in a single step. Furthermore, to the best of our knowledge, only two other studies [20], [21] reported on the detection accuracy of the intima-media and the media-adventitia boundaries, and only in regions of minimal disease, specifically excluding areas with thickened intima characteristic of atherosclerotic lesions.

II. RELATED WORK
The clear need for simplified interpretation of intravascular OCT images has motivated the development of automated methods, with a focus on lumen segmentation, that tackle the challenges and artifacts presented by typical intravascular OCT data.
OCT A-lines are independently acquired along the radial direction of the vessel in cylindrical coordinates as the probe rotates. Simultaneously, the core of the catheter is pulled back through the vessel, resulting in a helical scan pattern. Contemporary IV-OCT reconstructs individual A-lines from measurements in the frequency domain and visualizes the logarithm of the power of the reconstructed signal (dB). The presence of the guidewire, used to safely deploy the intravascular imaging catheter, casts a shadow on the vessel wall. The resulting intrinsic discontinuity even in the lumen signal creates an artifact that needs to be addressed by all analysis approaches. Prior signal processing techniques aimed to analyze A-lines independently in order to exploit the rich characteristics of the OCT signal [22]-[26].
Researchers proposed various classical segmentation methods to detect the vessel lumen in IV-OCT images. The region-based active contour segmentation methods with level-set energy functions utilize the prior cross-sectional information [27]-[29]. While dynamic programming [29], [30] or small artificial neural networks [31] can be used to correct the remaining artifacts after the application of the primary methods, level-set methods perform poorly in low signal-to-noise ratio (SNR) regions and produce non-smooth and imprecise lumen boundaries. Graphical models, such as graph-cuts, have also been used for segmentation of OCT images, as the various boundaries do not intersect, making the graphical models well suited for this problem [32], [33]. This approach has been used for the detection of internal anatomical layers, i.e., the inner and outer boundary of the media [21], or also including the outer adventitia boundary [20], in areas of minimal intimal thickness. The counterpart physics-based methods formulate the segmentation as a diffusion problem [34]-[36]. Despite the success of graphical methods, difficulties arise in low SNR regions with speckle, anatomical anomalies, and external objects, which limit the practical applications of these models and cause a cascade of increasing errors that require follow-up manual corrections by expert annotators in the post-processing stage [37].
Recently, deep learning models have emerged as a solution to many medical image analysis problems, including IV-OCT. Yong et al. [38] used a regression deep learning network to detect the vessel lumen along the radial direction in polar coordinates. Gharaibeh et al. [39] segmented vessel lumen and coronary calcifications in IV-OCT images using a U-Net architecture and post-processed the output with a conditional random field model. Abdolmanafi et al. [40], [41] used pretrained convolutional neural networks (CNN) to identify and classify several tissue types encountered in the coronary arteries using transfer learning. Specific attention has also been paid to the segmentation of stent struts to confirm stent placement and identify malapposition [27], [32], [36], [42], [43]. Our model builds on these previous approaches and extends the state-of-the-art deep learning methods to detect guidewire and plaque shadows as well as anatomical layers not only in minimally diseased vessels but also through thickened intima in coronary arteries imaged with intravascular polarimetry.

III. METHODS
This section introduces the loss functions tailored for vessel layer and artifact segmentation, describes the training and evaluation procedures, and provides implementation details. Fig. 2 illustrates the architecture of our model. As described below, one of our datasets was revised extensively to provide a curated set of high-quality segmentations. This practice resembles the scenario of an attending physician reviewing and scoring manual annotations performed by resident physicians to provide constructive feedback for training. By analogy, we trained a model to critique the multi-class labels conditioned on their input images by distinguishing between the initial and final revisions of the labels. The trained model was used as the attending physician loss term ($L_{AP}$) to critique the quality of the labels predicted by the main model.

A. Segmentation Loss
We developed a multi-term multivariate loss function that includes novel loss terms. The first loss term is the weighted cross-entropy ($L_{WCE}$), which measures the cross-entropy between the target label $y$ and the predicted label probabilities $\hat{y}$ over all $N$ pixels, where $i$ and $j$ are 2D matrix indices, $c$ is the class index, and $N_c$ is the number of classes. Each pixel's cross-entropy is weighted in proportion to the inverse of its class population. The second loss term ($L_{Dice}$) is a multi-class version of the generalized Dice loss function [44] that measures each label's segmentation accuracy similar to the Dice coefficient. Dice loss uses the prediction probability (e.g., softmax of logits) instead of the classification result and ranges from zero to one, with zero corresponding to the most accurate result. We add a small constant ($\epsilon$) for numerical stability. These segmentation loss functions are used widely in the field [44], [45].
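To make the two segmentation terms concrete, the following plain-Python sketch computes an inverse-population weighted cross-entropy and a soft multi-class Dice loss on a flat list of pixels. It is an illustration under stated assumptions, not the exact formulation of the paper: the weight normalization, the per-class averaging of the Dice score, the value of `EPS`, and all function names are hypothetical choices.

```python
import math

EPS = 1e-7  # assumed stabilizing constant; its value is not given in the text


def class_weights(target):
    """Inverse class-population weights, normalized to sum to one (assumed normalization).

    `target` is a flat list of integer class labels, one per pixel.
    """
    counts = {}
    for c in target:
        counts[c] = counts.get(c, 0) + 1
    inverse = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inverse.values())
    return {c: v / total for c, v in inverse.items()}


def weighted_cross_entropy(target, probs):
    """Mean over pixels of -w_c * log(p_c), with c the true class of each pixel."""
    w = class_weights(target)
    losses = [-w[c] * math.log(max(p[c], EPS)) for c, p in zip(target, probs)]
    return sum(losses) / len(losses)


def soft_dice_loss(target, probs, num_classes):
    """One minus the mean per-class soft Dice score; zero means a perfect match."""
    scores = []
    for c in range(num_classes):
        overlap = sum(p[c] for t, p in zip(target, probs) if t == c)
        size = sum(1 for t in target if t == c) + sum(p[c] for p in probs)
        scores.append((2.0 * overlap + EPS) / (size + EPS))
    return 1.0 - sum(scores) / num_classes
```

Both functions consume class probabilities rather than hard decisions, which is what keeps the Dice term usable as a training loss.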

B. Boundary Loss
The weighted cross-entropy and Dice loss terms are only marginally affected by errors on the boundary because the boundary pixels are a small portion of the target objects. The third loss term is based on the boundary segmentation accuracy and focuses the network's attention on the close vicinity of label boundaries. The boundary precision loss term ($L_{BP}$) utilizes a boundary neighborhood mask ($\beta$), which masks the cross-entropy loss values of the pixels that are not in close vicinity of the label boundaries along the radial axis (e.g., farther than $b = 10$ pixels). The mask is computed using an all-ones matrix ($\mathbf{1}$), convolution ($*$), and two logical operators, i.e., disjunction ($\lor$) and exclusive disjunction ($\oplus$). The boundary precision loss term is differentiable with respect to the model parameters as long as $\beta$ is a function of the ground-truth target label.
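The idea of a boundary neighborhood mask can be sketched on a single A-line. The snippet below marks the pixels within `b` samples of a label change and averages the cross-entropy only over those pixels. It builds the mask by direct indexing rather than by the convolution and logical operators described above, so it should be read as a functionally similar illustration with hypothetical function names, not as the authors' formulation.

```python
import math


def boundary_mask(labels, b):
    """Return 1 for pixels within `b` samples of a label change along one A-line.

    `labels` is a list of integer class labels ordered along the radial axis.
    """
    n = len(labels)
    # Positions k where the label changes between pixel k and pixel k + 1.
    edges = [k for k in range(n - 1) if labels[k] != labels[k + 1]]
    mask = [0] * n
    for k in edges:
        # Mark b pixels on each side of the edge, clipped to the A-line extent.
        for i in range(max(0, k - b + 1), min(n, k + b + 1)):
            mask[i] = 1
    return mask


def masked_cross_entropy(labels, probs, b, eps=1e-7):
    """Average -log p(true class) over the pixels selected by the boundary mask."""
    mask = boundary_mask(labels, b)
    total = sum(-m * math.log(max(p[c], eps)) for m, c, p in zip(mask, labels, probs))
    return total / max(1, sum(mask))
```

Because the mask depends only on the ground-truth labels, it acts as a constant multiplier during training, which is consistent with the differentiability remark above.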

C. Attending Physician Loss
The training of a critique model involves a loss function that measures the distance of model parameters from the optimal solution in the parameter space. Arjovsky et al. [46] proposed the Wasserstein-1 (a.k.a. Earth-Mover) distance,
$$W_1(P, P') = \inf_{\gamma \in \Pi(P, P')} \mathbb{E}_{(x, x') \sim \gamma}\left[\lVert x - x' \rVert\right], \tag{5}$$
where $\Pi(P, P')$ is the set of all joint distributions $\gamma(x, x')$ whose marginal distributions are equal to $P$ and $P'$. Wasserstein-1 is the optimal cost of transporting a mass with distribution $P$ to another mass with distribution $P'$ when the transport cost and transport distance are linearly related. Stable learning with a meaningful learning curve that avoids common problems, including mode collapse, can be obtained when the Wasserstein-1 distance is adopted [46]. Since the infimum in (5) is intractable, Kantorovich and Rubinstein [47] proposed a tractable dual problem,
$$W_1(P, P') = \sup_{\lVert f \rVert_L \le 1} \mathbb{E}_{x \sim P}\left[f(x)\right] - \mathbb{E}_{x' \sim P'}\left[f(x')\right],$$
where $f$ is a 1-Lipschitz function, mapping the support of $P$ and $P'$ to real numbers. Similarly, we can select $f$ from a family of parameterized functions ($\{f_w\}_{w \in W}$) that are at least $K$-Lipschitz for a constant $K$ and optimize the dual over the functional parameter space,
$$\max_{w \in W} \mathbb{E}_{x \sim P}\left[f_w(x)\right] - \mathbb{E}_{x' \sim P'}\left[f_w(x')\right].$$
The requirement of $f$ being $K$-Lipschitz for the function family of deep neural networks can be imposed by clipping the parameter values with an absolute value upper limit [46] or by constraining the norm of the gradient of $f_w$ with respect to its input to be 1 almost everywhere through a gradient penalty loss term [48]. Gulrajani et al. [48] showed that the gradient constraining method improves the learning process compared to the weight clipping method. We observed a similar effect while training our Attending Physician model, which is then used as the fourth loss term ($L_{AP}$). Therefore, we trained the main model from both types of manual segmentations by introducing the critique model (Fig. 2).
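A one-dimensional toy case helps build intuition for the Wasserstein-1 distance and its dual. For two empirical distributions on the real line with equally many samples, the optimal transport plan pairs the sorted samples, and any 1-Lipschitz function gives a lower bound through the dual. The sketch below is purely illustrative and unrelated to the image-label critique model itself; the function names are hypothetical.

```python
def wasserstein1_1d(xs, ys):
    """W1 between two equal-size empirical distributions on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    # In 1D the optimal transport plan pairs the sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


def dual_value(f, xs, ys):
    """Dual objective E_P[f(x)] - E_P'[f(x')] for a candidate critic f."""
    return sum(f(x) for x in xs) / len(xs) - sum(f(y) for y in ys) / len(ys)


real = [0.0, 1.0, 2.0]
shifted = [1.0, 2.0, 3.0]
primal = wasserstein1_1d(real, shifted)
# f(x) = -x is 1-Lipschitz and attains the supremum for a pure shift.
dual = dual_value(lambda x: -x, real, shifted)
```

In this example the primal and dual values coincide, whereas a poorly chosen critic such as `f(x) = 0` only yields the trivial lower bound of zero.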

D. Topological Loss
The last loss term examines the labels from a topological point of view. Ideally, the predicted labels along A-lines are composed of three or four connected components without any void, starting with the lumen label in the center and ending with the outside label. The area between the lumen and outside labels should be occupied by two adjacent solid anatomical layers (i.e., intima and media) or by one of the artifact labels (i.e., guidewire or plaque shadows). These configurations are distinguishable in terms of the number of label boundaries along the radial direction. The soft boundary cardinality loss term ($L_{BC}$) penalizes the discrepancy between the predicted and ground-truth labels based on the number of boundary pixels along the radial axis. We propose to employ a soft argmax that is a differentiable proxy for the arguments of the maxima and a saturated equivalent of the softmax function, i.e., $e^{x} / \lVert e^{x} \rVert_1$. The soft argmax admits the predicted class probabilities at each pixel and maps the probability of the most probable class and other classes to $\sim 1$ and $\sim 0$, respectively. The level of saturation is controlled by the large number $M$ and the precision of the probability values. Since the value of the soft argmax for a given class changes between two adjacent pixels at the label boundary, the soft boundary set cardinality along the radial axis approximates the number of class boundaries in each A-line.
The boundary cardinality loss function compares the prediction and ground-truth labels with respect to the number of boundaries, using a function $\sigma$ that measures the difference between two boundary cardinality (BC) vectors (e.g., the 1-norm). We considered 1, 100, and $100/\epsilon$ for $M$, where $\epsilon$ is the small number used for numerical stability in (2) and for softmax value clipping. For $\sigma$, we considered $\lVert \cdot \rVert_1$, $\lVert \cdot \rVert_2$, and $\max(\cdot)$. Based on the validation dataset and the convexity of $L_{BC}$, the 1-norm ($\lVert \cdot \rVert_1$) and $100/\epsilon$ are the optimal choices for $\sigma$ and $M$, respectively.
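The boundary counting described in this subsection can be illustrated on one A-line. In the sketch below, the soft argmax is taken to be a softmax of the class probabilities scaled by a large `m`, and the boundary cardinality of a class is the summed absolute difference of its soft-argmax value between radially adjacent pixels. Both choices are assumptions made for illustration, since the exact expressions are not reproduced here, and the function names are hypothetical.

```python
import math


def soft_argmax(probs, m):
    """Saturated softmax: exp(m * p) normalized by its 1-norm (max-shifted for stability)."""
    top = max(probs)
    exps = [math.exp(m * (p - top)) for p in probs]
    norm = sum(exps)
    return [e / norm for e in exps]


def boundary_cardinality(aline_probs, m):
    """Soft count of label changes along one A-line, one entry per class.

    `aline_probs[r][c]` is the predicted probability of class c at radial index r.
    """
    saturated = [soft_argmax(p, m) for p in aline_probs]
    num_classes = len(aline_probs[0])
    return [
        sum(abs(saturated[r + 1][c] - saturated[r][c]) for r in range(len(saturated) - 1))
        for c in range(num_classes)
    ]


def boundary_cardinality_loss(pred_probs, target_probs, m):
    """1-norm of the difference between predicted and target cardinality vectors."""
    bc_pred = boundary_cardinality(pred_probs, m)
    bc_true = boundary_cardinality(target_probs, m)
    return sum(abs(a - b) for a, b in zip(bc_pred, bc_true))
```

A prediction that inserts a spurious island of one class inside another adds extra boundaries along the A-line, so its cardinality vector departs from the target and the loss grows, even if only a few pixels are misclassified.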
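The final loss function combines all five loss terms:
$$L = \lambda_{WCE} L_{WCE} + \lambda_{Dice} L_{Dice} + \lambda_{BP} L_{BP} + \lambda_{AP} L_{AP} + \lambda_{BC} L_{BC},$$
in which the loss term weights ($\lambda_{\cdot}$) are selected within the range $[10^{-3}, 10^{3}]$ and optimized over their logarithmically spaced multidimensional grid using greedy algorithms.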
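One way to picture the weight selection is a coordinate-wise greedy search over a logarithmically spaced grid. The sketch below is only a guess at what such a procedure could look like: the text does not specify which greedy algorithm was used, and `score` stands for a hypothetical validation-error function (lower is better) that would, in practice, require training a model for each candidate setting.

```python
def log_grid(low_exp=-3, high_exp=3):
    """Logarithmically spaced candidates from 10**low_exp to 10**high_exp."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def total_loss(terms, weights):
    """Weighted sum of named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


def greedy_weight_search(names, score, grid, sweeps=2):
    """Coordinate-wise greedy search: tune one weight at a time, keep the best value."""
    weights = {name: 1.0 for name in names}
    for _ in range(sweeps):
        for name in names:
            best = min(grid, key=lambda g: score({**weights, name: g}))
            weights[name] = best
    return weights
```

Tuning one weight at a time evaluates on the order of `sweeps * len(names) * len(grid)` settings instead of the `len(grid) ** len(names)` points of the full grid, which is the usual motivation for a greedy strategy.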

E. CNN Architecture
The proposed network architecture scheme is based on the U-Net [49] and deep residual learning [50] models (Fig. 2). The auxiliary critique model was trained independently to distinguish low- and high-quality labels. Subsequently, the main model was trained by combining the trained critique model and other loss terms to segment three anatomical layers and two shadow artifacts (Table I). The optimized architecture contains multi-scale encoder and decoder sections with skip connections at each scale. The input consists of the three-channel images of conventional intensity, birefringence, and depolarization, in polar coordinates, downsampled to 512 by 512 pixels. The output consists of the six concatenated classes of the same pixel dimension. The convolutional complex contains three convolutional layers with a 3 × 3 pixel kernel size and a leaky version of the rectified linear unit (L-ReLU) activation function, which has a negative slope coefficient of 0.3. These three convolutional layers compute the residual values by using an internal skip connection. The max-pooling layers with a 2 × 2 pixel kernel size are applied after convolutional complexes in the encoding section for down-scaling, while the counterpart deconvolutional layers are applied for bilinear up-scaling within the decoding section. The encoding output and decoding input are connected through two convolutional complexes that operate at the latent representation level. The layers within each of the three scales and the latent representation layers have 8, 8, 16, and 16 features, respectively.
The critique model architecture accepts the concatenation of image channels and output label channels as the input and applies three convolutional complexes with 32, 64, and 128 features, respectively. Each complex consists of two convolutional layers with a 3 × 3 pixel kernel size and the ReLU activation function, followed by a max-pooling layer with a 2 × 2 pixel kernel size. The last complex's output is flattened and processed by a three-layer dense neural network with 1024, 256, and 128 hidden nodes, respectively, and the ReLU activation function. The final output has one feature and uses the hyperbolic tangent activation function (Fig. 2).

F. Training and Implementation
We randomly divided the annotated dataset into training, validation, and hold-out testing datasets by selecting 45, 6, and 6 patients (80%/10%/10%), respectively. Augmentation included random mirroring, rotation, multi-channel image intensity distribution manipulations ([-0.05, 0.05] brightness and [0.9, 1.1] contrast), and spatial scaling ([0.875, 1.125]). An element from the power set of the image augmentation set was applied to each given PS-OCT cross-section with randomly selected transformation parameters sampled uniformly and independently from the ranges above. The geometric transformations were defined in the Cartesian coordinate system, but they were implemented and applied in the polar coordinate system. The data augmentation methods were implemented and executed on a GPU to improve the model's runtime.
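A patient-level split and the random choice of an augmentation subset can be sketched as follows. The sketch assumes that each subset of the augmentation set is equally likely (an independent coin flip per augmentation) and that the upper bound of the scaling range is 1.125; mirroring and rotation are omitted because no parameter ranges are stated for them. Function names and the seed are hypothetical.

```python
import random


def split_by_patient(patient_ids, n_val, n_test, seed=0):
    """Shuffle unique patient IDs and partition them into train/val/test sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    test = set(ids[:n_test])
    val = set(ids[n_test:n_test + n_val])
    train = set(ids[n_test + n_val:])
    return train, val, test


def sample_augmentation(rng):
    """Pick a random subset of augmentations and draw their parameters uniformly."""
    ranges = {
        "brightness": (-0.05, 0.05),
        "contrast": (0.9, 1.1),
        "scale": (0.875, 1.125),  # assumed upper bound
    }
    chosen = {}
    for name, (low, high) in ranges.items():
        if rng.random() < 0.5:  # independent coin flips make every subset equally likely
            chosen[name] = rng.uniform(low, high)
    return chosen


train, val, test = split_by_patient(range(57), n_val=6, n_test=6)
```

Splitting by patient rather than by image keeps all cross-sections of one pullback in the same partition, so the hold-out test set contains no frames from patients seen during training.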
We implemented our model in Python using Keras™ and TensorFlow™. We commonly used the RMSprop optimizer with a learning rate between $10^{-4}$ and $10^{-3}$ and a mini-batch size of 20 per GPU. The GPU memory size was the limiting factor in the learning rate and mini-batch size selection. We used two NVIDIA® GeForce® RTX 2080 Ti or four NVIDIA® Tesla® V100 GPUs.

G. Post-processing
We investigated a post-processing procedure applied to the model output to enforce the known topology of the multi-class segmentations. Initially, small objects and holes within each class were removed, and their interfaces were smoothed. Then, a set of logical operations was applied to impose the topological relationships between the classes in the polar coordinate system. The proposed set includes the following constraints:
• Lumen is a single connected object without any 2D void. The same rule applies to both guidewire shadow and outside.
• Guidewire and plaque shadows are confined between the lumen, the outside, and two A-lines.
• The order of layers from the inside to the outside is lumen, intima, media, and outside.

H. Performance Metrics
Based on the ground-truth labels, we evaluated the performance of the multi-class prediction model using accuracy and the Dice coefficient, defined in terms of the set cardinality $|\cdot|$, the set $\hat{Y}_c$ of pixels predicted as class $c$, the set $Y_c$ of ground-truth pixels of class $c$, the set of classes $C$, and the intersection operation $\cap$. Furthermore, we evaluated the precision of interclass boundaries using the average distance error (ADE) along the radial direction and the modified Hausdorff distance (MHD) [51] in 2D within the cross-section:
$$\mathrm{MHD}(\hat{B}, B) = \max\{\mathrm{ADE}(\hat{B}; B), \mathrm{ADE}(B; \hat{B})\},$$
where $B$ and $\hat{B}$ are the sets of boundary pixels in the ground truth and prediction, respectively, and distances are measured with the Euclidean norm $\lVert \cdot \rVert_2$.
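The metrics can be illustrated with the standard textbook definitions, which may differ in detail from the ones used for the reported tables (for example, accuracy may be evaluated per class). In this sketch, `directed_distance` plays the role of the mean nearest-neighbor distance from one boundary set to the other, and the modified Hausdorff distance is the larger of the two directions; all function names are hypothetical.

```python
import math


def dice_coefficient(pred, truth, cls):
    """2|P ∩ T| / (|P| + |T|) for the pixels labeled `cls`."""
    p = {i for i, c in enumerate(pred) if c == cls}
    t = {i for i, c in enumerate(truth) if c == cls}
    if not p and not t:
        return 1.0  # class absent in both: treated as perfect agreement
    return 2.0 * len(p & t) / (len(p) + len(t))


def accuracy(pred, truth):
    """Fraction of pixels whose predicted class equals the ground truth."""
    return sum(a == b for a, b in zip(pred, truth)) / len(truth)


def directed_distance(a, b):
    """Mean over points in `a` of the Euclidean distance to the nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Maximum of the two directed mean nearest-neighbor distances."""
    return max(directed_distance(a, b), directed_distance(b, a))
```

Unlike the classical Hausdorff distance, which reports the single worst boundary point, this averaged variant is less sensitive to isolated outliers along the contour.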

IV. DATA
We demonstrate the method on images from an intravascular polarimetry pilot study, which included two cohorts and enrolled a total of 57 patients who underwent percutaneous coronary intervention and PS-OCT imaging at the Erasmus University Medical Center in Rotterdam. Of the 57 pullbacks, only segments of native vessel wall or containing old stents from previous interventions were included in this study. The Ethics Committee of Erasmus Medical Center approved the study protocol, and all procedures were performed in accordance with local and federal regulations and the Declaration of Helsinki.
The imaging system consists of "FastView" intravascular catheters (Terumo Co., Tokyo, Japan) interfaced with our custom-built PS-OCT system, operating at 1300 nm central wavelength, similar to commercially available clinical IV-OCT systems. The wavelength scanning range was 110 nm, achieving a radial resolution below 10 µm, assuming a tissue refractive index of 1.34. The dimension of the pixels in the reconstructed tomograms in the radial direction was 4.2 µm and 4.43 µm, respectively, for the two cohorts. The catheter's rotation speed was 100 RPS, with 1024 radial scans per rotation, and pullbacks were performed at 10 mm/s or 20 mm/s, at the operator's discretion. Non-ionic contrast solution was injected at a rate of 3-4 mL/s during the pullback to displace coronary blood and obtain an unperturbed view of the vessel wall.
Intravascular polarimetry was performed based on our earlier work [13], [15], [52]-[54]. Briefly, an electro-optic polarization modulator was used to alternate the polarization state of the light incident on the tissue between consecutive depth scans, and a polarization-diverse receiver enabled determination of the detected light's polarization state and intensity. Polarimetric analysis employed spectral binning [55] to reconstruct maps of tissue birefringence and depolarization. Birefringence is the difference in the refractive index experienced by polarization states aligned with and orthogonal to the tissue optic axis, respectively. Tissue depolarization measures the randomness of the detected light's polarization state as the complement to one of the degree of polarization.
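The relation between depolarization and the degree of polarization (DOP) can be illustrated with Stokes vectors. The sketch below uses the textbook DOP of a locally averaged Stokes vector $(I, Q, U, V)$ and defines depolarization as one minus that value. The averaging neighborhood and the function names are assumptions; the actual reconstruction pipeline, including spectral binning, is not reproduced here.

```python
import math


def degree_of_polarization(stokes_vectors):
    """DOP of the Stokes vector (I, Q, U, V) averaged over a small neighborhood."""
    n = len(stokes_vectors)
    i, q, u, v = (sum(s[k] for s in stokes_vectors) / n for k in range(4))
    return math.sqrt(q * q + u * u + v * v) / i


def depolarization(stokes_vectors):
    """Complement to one of the degree of polarization."""
    return 1.0 - degree_of_polarization(stokes_vectors)


uniform_states = [(1.0, 1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
scrambled_states = [(1.0, 1.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0)]
```

Neighboring pixels with identical polarization states give a depolarization of zero, while states that cancel on average give a value of one, which matches the notion of randomness of the detected polarization state.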
Initially, an expert interventional cardiologist (K.O.) excluded partial segments of 3D pullbacks that were uninterpretable and suffered from severe artifacts caused by insufficient blood clearing. The qualified pullback segments added up to 3936 mm of pullbacks at a 100 or 200 µm pitch. Subsequently, the expert annotated a total of 984 PS-OCT cross-sections spaced 4 mm apart using our in-house Matlab graphical user interface (Fig. 1.D), using the conventional OCT signal as well as the polarization channels. The manual annotations included the outer boundaries of the lumen, tunica intima (i.e., internal elastic lamina (IEL)), and tunica media (i.e., external elastic lamina (EEL)). The locations of the IEL and EEL within the plaque and guidewire shadows were extrapolated based on their visible segments (Fig. 1.E). Additionally, angular segments containing plaque, guidewire, stent struts, side branches, or thrombus were identified and used for segmentation or selective analysis without influencing the main label categories. Consequently, as summarized in Table I, the manual annotations were converted into six exclusive labels: outside, lumen, visible intima, visible media, plaque shadow, and guidewire shadow (Fig. 1.F).
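The acquisition numbers quoted in this section are mutually consistent, which the short arithmetic check below makes explicit: the frame pitch follows from the pullback speed divided by the rotation rate, and the number of annotated cross-sections follows from the total qualified length divided by the annotation spacing. The constant and function names are illustrative.

```python
ROTATION_RATE_HZ = 100            # catheter rotations per second
PULLBACK_SPEEDS_MM_S = (10, 20)   # pullback speeds
ANNOTATION_SPACING_MM = 4         # spacing between annotated cross-sections
TOTAL_LENGTH_MM = 3936            # qualified pullback length


def frame_pitch_um(speed_mm_s, rotation_rate_hz):
    """Distance travelled along the vessel during one rotation, in micrometers."""
    return 1000.0 * speed_mm_s / rotation_rate_hz


pitches = [frame_pitch_um(v, ROTATION_RATE_HZ) for v in PULLBACK_SPEEDS_MM_S]
annotated_frames = TOTAL_LENGTH_MM // ANNOTATION_SPACING_MM
```

The two pitches evaluate to 100 µm and 200 µm, and the frame count to 984, in agreement with the values stated in the text.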
To manage the workload, we annotated the total dataset in four separate batches and through three phases: initial annotation, high-precision annotation, and annotation approval. One of the batches was revised extensively at the pixel level, requiring four times as long as the other batches. The high-accuracy batch, in combination with its initial annotation, was utilized to train the proposed critique model and its resulting loss term.

V. RESULTS
We compared the model's automated annotation results to the expert's ground-truth annotations in Fig. 3 to qualitatively characterize our model, illustrate the model's strengths, and identify possible areas of improvement. Our model's annotations and the ground truth are overlaid on the gray-scale intensity image in blue and red outlines, respectively.
The most common complication for boundary annotation, particularly for the outer intima and outer media, is the presence of thick plaques or calcium (e.g., Fig. 3.A, green arrow) and thickened vessel walls (e.g., Fig. 3.G, green arrow) that cause a significant reduction in the detected signal. The background signal and statistical noise characteristics within the plaque regions impede the model's objective to annotate the anatomical layers and result in higher annotation variability (e.g., Fig. 3.F, yellow arrow; Fig. 3.G, yellow arrow; Fig. 3.K, yellow arrow).
Nonetheless, whenever the image information supports the ground-truth boundaries, the model matches well with the expert annotations even in these challenging cases (e.g., Fig. 3.D, both arrows). Correspondingly, the boundaries detected by the model may conform with the underlying multi-dimensional images more accurately than the ground-truth annotations (e.g., Fig. 3.F, green arrow), suggesting inconsistencies in the manual ground-truth segmentation.
The guidewire obstructs the probing light, causing a fuzzy signal at its boundaries, resulting in imprecise automatic and manual boundary detection (e.g., Fig. 3.B, green arrow). Moreover, the physical proximity of the vessel lumen to the guidewire and catheter leads to perturbed pixel-level delineation of the lumen boundary (e.g., Fig. 3.B, green arrow; Fig. 3.I, green arrow).
Side branches can appear in various locations of the field of view and could be expected to exhibit confusing features, yet our model analyzes these cases in concordance with the ground-truth annotation. Such vessels might appear outside the vessel wall (e.g., Fig. 3.B, yellow arrow; Fig. 3.E, green arrow), directly adjacent to the vessel wall boundary (e.g., Fig. 3.I, yellow arrow), inside the intima (e.g., Fig. 3.C, yellow arrow), or in direct communication with the lumen (e.g., Fig. 3.L, yellow arrow).
Even though non-ionic contrast solution is injected during catheter pullback to displace blood, residues of blood may persist in the vicinity of the vessel lumen (e.g., Fig. 3.A, yellow arrow; Fig. 3.E, yellow arrow). Blood clearance can be incomplete, especially at the onset or the end of contrast injection (e.g., Fig. 3.H, both arrows). Still, in all these cases, our model successfully detects the lumen outer boundaries. Equivalently, the dark and bright tissue patterns (e.g., Fig. 3.C, green arrow; Fig. 3.K, green arrow) that are observed beyond the media layer mimic the multi-layer vessel wall structures, but they do not distract the automatic boundary allocations.
While our study only included intravascular imaging prior to intervention, previously embedded stents are commonly encountered, owing to the high recurrence rate of acute coronary syndrome and myocardial infarction. Depending on the specific stent material and patient history, stents might appear embedded in the vessel wall (e.g., Fig. 3.J, green arrow) or protruding into the lumen (e.g., Fig. 3.J, yellow arrow; Fig. 3.L, green arrow). Stents generate diverse and strong image artifacts that impede the model's ability to correctly detect the boundaries. Exact layer segmentation behind stents presents challenges even for expert readers. With the exception of neointimal hyperplasia, previously stented segments are unlikely to reside in the culprit segment. Such segments were included in our data set merely to train the model to ignore the ensuing artifacts. Notably, there exists a distinct class of models designed to detect stent struts and verify correct stent deployment [27], [32], [36].
To complement the qualitative assessment of the model with quantitative metrics, Table II lists the model's performance for individual label classes, evaluated on the hold-out test set. The lumen segmentation achieved the best scores for all metrics, while the plaque shadow performance was influenced by the more ambiguous ground-truth labels owing to the lack of clear structural markers. Nonetheless, the individual metrics confirm the overall high quality of segmentation achieved by the model.
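As a minimal sketch of how a per-class Dice coefficient of the kind listed in Table II can be computed from a predicted and a ground-truth label map, consider the following Python snippet. The function name `per_class_dice`, the integer label encoding, and the convention of returning 1.0 for a class that is absent from both maps are illustrative assumptions and not the evaluation code used in this study.

```python
from typing import List, Sequence


def per_class_dice(pred: Sequence[Sequence[int]],
                   truth: Sequence[Sequence[int]],
                   num_classes: int) -> List[float]:
    """Dice = 2|P ∩ T| / (|P| + |T|), evaluated separately for each label."""
    inter = [0] * num_classes   # pixels where prediction and truth agree on class c
    n_pred = [0] * num_classes  # pixels predicted as class c
    n_true = [0] * num_classes  # pixels annotated as class c
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            n_pred[p] += 1
            n_true[t] += 1
            if p == t:
                inter[p] += 1
    scores = []
    for c in range(num_classes):
        denom = n_pred[c] + n_true[c]
        # Assumed convention: a class absent from both maps counts as perfect agreement.
        scores.append(2.0 * inter[c] / denom if denom else 1.0)
    return scores


# Toy 2 x 4 label maps with hypothetical labels 0 and 1.
truth = [[0, 0, 1, 1], [0, 0, 1, 1]]
pred = [[0, 0, 0, 1], [0, 0, 1, 1]]
print(per_class_dice(pred, truth, num_classes=2))
```

Averaging the returned list over all classes gives a single summary value, but thin classes with a high boundary-to-area ratio are penalized more strongly by a one-pixel boundary offset than large classes such as the lumen.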
To substantiate the design of the model, we conducted an ablation study to examine the individual effects of the various loss terms, i.e.,
1) The soft boundary cardinality loss term (L_BC),
2) The Attending Physician (a.k.a. Wasserstein critique model) loss term (L_AP),
3) The boundary precision loss term (L_BP),
4) The generalized soft multi-class Dice loss term (L_Dice), and
5) The weighted cross-entropy loss term (L_WCE).
We measured the accuracy, Dice coefficient, and modified Hausdorff distance (MHD) averaged among all label classes on the hold-out test dataset for models trained with a reduced number of loss terms. The results of the ablation study on the loss terms are tabulated in Table III and confirm that each loss term contributes to the performance of the method or the output quality.
Before developing and refining the individual loss terms, we set out to confirm the advantage of using intravascular polarimetry compared to conventional IV-OCT for the visualization and segmentation of anatomical layers. Using the proposed architecture, we trained the model with only the weighted cross-entropy loss function (L_WCE) and compared its performance to an adapted model that was trained with only the single intensity channel as input. The Dice coefficient of the media class using intravascular polarimetry data was 70.7%, while it was only 62.7% when using conventional IV-OCT. The subsequent optimization of the model's performance improved the Dice coefficient of the media class using PS-OCT to 79.5%. The significant gain in performance achieved by using the polarimetric channels, even with only the L_WCE loss term, confirms our previous qualitative observations of improved contrast for the media layer [13], [14].
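For readers unfamiliar with the modified Hausdorff distance used in the ablation study, the following Python sketch computes it for two boundaries given as lists of points. It assumes the common definition of Dubuisson and Jain (the larger of the two directed mean nearest-neighbor distances); the exact variant and implementation used in this work may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_mean_min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean, over the points of a, of the distance to the nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Assumed definition: maximum of the two directed mean distances."""
    return max(directed_mean_min_distance(a, b),
               directed_mean_min_distance(b, a))


# Two toy boundaries in pixel coordinates.
boundary_pred = [(0.0, 0.0), (0.0, 1.0)]
boundary_true = [(0.0, 0.0), (0.0, 3.0)]
print(modified_hausdorff(boundary_pred, boundary_true))
```

Because it averages nearest-neighbor distances instead of taking their maximum, this measure is less sensitive to a single outlying boundary point than the classical Hausdorff distance. The brute-force search is quadratic in the number of boundary points, which is acceptable for a sketch but would typically be replaced by a distance transform or a spatial index for full images.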
For comparison with previous segmentation efforts using conventional IV-OCT, we compiled the reported performance of previous studies that developed segmentation methods for the lumen in Table IV and of those that segmented the two additional anatomical layers in Table V. Many methods ([21]-[27], [29], [32], [35], [36], [38], [39], [59]) extract the lumen with a Dice coefficient of 95-95%, and our method outperforms them all at 99%. Moreover, Table V indicates that our model achieves a lower absolute distance error (ADE) for both the outer intima and the outer media boundaries compared to the two other studies that report on this task. Here, we excluded thickened vessel walls from the evaluation of the outer boundaries in Table V, in line with the analysis in [20], which only evaluated layer segmentation in 'healthy regions', and [21], which inspected allograft vessels with minimal intimal thickening. However, thickened vessel walls are the result of coronary atherosclerosis and are very common, especially in the population of patients likely to undergo intravascular imaging. Importantly, our model is able to segment cross-sections that include thickened vessel wall segments, although imaging through this additional tissue degrades the achieved ADE (2.60, 16.9, and 20.85 µm for the outer lumen, intima, and media boundaries, respectively). Still, these values are comparable to those of the previous methods that detect these layers only in segments with minimal disease.
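The absolute distance error can be illustrated with a short Python sketch. It assumes that each boundary is stored as one depth value per A-line in polar coordinates, that a hypothetical `pixel_size_um` converts depth pixels to micrometers, and that an optional boolean mask excludes angular positions (for example, thickened wall segments) from the evaluation. These are illustrative assumptions and not necessarily the procedure used to produce Table V.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def absolute_distance_error(pred_depth: Sequence[float],
                            true_depth: Sequence[float],
                            pixel_size_um: float,
                            include: Optional[Sequence[bool]] = None
                            ) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of |pred - truth| in micrometers."""
    if include is None:
        include = [True] * len(pred_depth)
    errors = [abs(p - t) * pixel_size_um
              for p, t, keep in zip(pred_depth, true_depth, include) if keep]
    return mean(errors), pstdev(errors)


# Toy boundary depths (in pixels) along three A-lines; 5 µm per pixel is hypothetical.
pred = [10, 12, 15]
truth = [10, 10, 18]
print(absolute_distance_error(pred, truth, pixel_size_um=5.0))
# Excluding the third A-line from the evaluation.
print(absolute_distance_error(pred, truth, 5.0, include=[True, True, False]))
```

Reporting the error with and without the mask mirrors the distinction made above between evaluation restricted to regions of minimal disease and evaluation over entire cross-sections.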

VI. DISCUSSION
PS-OCT complements the IV-OCT backscatter intensity signal by measuring the polarization state of reflected light and reconstructing tissue birefringence and depolarization signals. These polarimetric signals provide a more detailed characterization of the vessel wall and can help to differentiate tissue layers that have comparable scattering properties but distinct polarization features. PS-OCT enriches the visualization of anatomical layers and hence facilitates downstream image processing tasks. We proposed a convolutional neural network model with a new multi-term loss function that leverages the increased contrast available to PS-OCT to segment the vessel lumen as well as the intima-media and the media-adventitia boundaries. Furthermore, the model works on all plaque types and correctly segments the inner and outer media boundaries even through thickened vessel walls, as long as the plaque is not opaque. Conversely, angular segments of lipid-rich or calcified plaques that impede detection of the subluminal anatomical layers are identified as plaque shadows. The model, however, continues to estimate the outer media boundary throughout these opaque regions. The model also identifies guidewire shadows without interrupting the lumen and outer media segmentation.
Our comprehensive multi-class image segmentation model can support many downstream image analysis tasks. Automated and objective image segmentation simplifies clinical research and affords integration into the clinical workflow by removing the workload of manual segmentation. For guidance of PCI, robust and automated measurement of the EEL diameter would simplify stent sizing [6]. Evaluation of the intimal thickness along the vessel could enhance the common simplified visualization of the culprit vessel based on the lumen diameter with complementary information on the location and extent of plaques to select a suitable landing zone. In a clinical research setting, automated segmentation of the intimal thickness along entire coronary vessels would enable the formulation of questions that are currently impractical to address due to the workload of manual segmentation. Crucially, automated segmentation also enables evaluation of tissue polarization properties in distinct anatomical areas, which previously relied on tedious manual segmentation [13], [15], [52]. We anticipate that such volumetric analysis of polarization properties will offer refined insight into plaque composition and may enable the formulation of a polarization-informed plaque index similar to the lipid-core burden index of near-infrared spectroscopy [17].
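As a hedged illustration of how such measurements could be derived from segmented boundaries, the Python sketch below computes a radial intimal thickness profile and an area-equivalent diameter of a closed boundary. It assumes that each boundary is given as radii (in mm) sampled at equally spaced angles around the catheter and that the contour is crossed once per A-line. The function names are hypothetical, and the radial difference only approximates the true wall thickness when the catheter is eccentric.

```python
import math
from typing import List, Sequence


def radial_thickness(inner_radii: Sequence[float],
                     outer_radii: Sequence[float]) -> List[float]:
    """Per-angle distance between two boundaries, measured along each A-line."""
    return [outer - inner for inner, outer in zip(inner_radii, outer_radii)]


def area_equivalent_diameter(radii: Sequence[float]) -> float:
    """Diameter of the circle whose area equals that enclosed by the contour."""
    n = len(radii)
    dtheta = 2.0 * math.pi / n
    # Sum the areas of the triangles spanned by consecutive radial samples.
    area = 0.5 * math.sin(dtheta) * sum(radii[k] * radii[(k + 1) % n]
                                        for k in range(n))
    return 2.0 * math.sqrt(area / math.pi)


# Toy example: a circular outer media boundary of radius 2 mm, 360 A-lines.
outer_media = [2.0] * 360
print(area_equivalent_diameter(outer_media))  # close to 4 mm

# Toy lumen and outer intima radii (mm) at two angular positions.
print(radial_thickness([1.0, 1.5], [1.25, 2.0]))
```

The enclosed area does not depend on where the catheter sits inside the contour, which makes the area-equivalent diameter a convenient summary for an off-center catheter, whereas the thickness profile is best read as a relative indicator of plaque location along the circumference.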
The high-performance segmentation of the lumen and outside classes is an indication that we approached the limits imposed by using a single-reader ground truth. The media boundaries and shadow classes likely suffer from higher intra-reader ground-truth variability. The anatomical layers beyond the lumen are located in areas of decaying signal quality, and the shadows intrinsically have a poorly defined border. Furthermore, the increased boundary-to-area ratio of the media deteriorates typical segmentation metrics even without degradation in the boundary precision.
In addition to the use of a single-reader ground truth, the limitations of the current study include a modest number of pullbacks and a limited spectrum of atherosclerotic disease. Also, segmentation was performed on individual cross-sections. While segmentation of adjacent cross-sections enables volumetric segmentation, there may be information embedded in the volumetric data that escapes the current model. Lastly, intravascular polarimetry with PS-OCT relies on commercial clinical imaging catheters but currently requires a custom imaging console, which complicates clinical translation. Towards resolving this limitation, Xiong et al. [60] proposed a new method that may be compatible with existing imaging consoles and could accelerate the clinical translation of anatomical layer segmentation based on tissue polarization properties.

VII. CONCLUSION
We proposed a method for the segmentation of intravascular polarimetry images of coronary arteries. The performance of the method compares favorably with state-of-the-art baseline algorithms, which operate on conventional IV-OCT images. The additional polarization contrast available to PS-OCT affords improved segmentation across a wide range of atherosclerotic lesion types and significantly improves the segmentation of the media boundaries in diseased vessels. Intravascular polarimetry with automated segmentation could be used for refined lesion characterization and may simplify and improve guidance of percutaneous coronary interventions.

Fig. 2. The proposed model architecture. The main model takes the multi-channel polarimetric image as the input and produces a multi-class probability prediction as the output (Ŷ). The auxiliary critique model (a.k.a. Attending Physician) is trained independently with concatenated images (I) and ground-truth labels (Y) as its input to predict the quality level of manual labeling. This critique model then evaluates the segmentation during the main model's training by providing one of the loss terms.

Fig. 3. Qualitative assessment of PS-OCT cross-sections. The annotations of our model and the ground truth are overlaid on the gray-scale intensity image in blue and red outlines, respectively. See the text for detailed discussion. Scale bar: 1 mm.

TABLE I
THE DEFINITION OF THE SIX EXCLUSIVE LABELS THAT ARE BASED ON THE MANUAL EXPERT ANNOTATIONS AND SHOWN IN FIG. 1.D.

TABLE II
THE PERFORMANCE OF OUR MULTI-LABEL CLASSIFICATION MODEL BASED ON DIFFERENT PERFORMANCE METRICS.

TABLE V
COMPARISON OF OUR MODEL AND OTHER STUDIES THAT DETECT THE OUTER BOUNDARY OF LUMEN, INTIMA, AND MEDIA BY REPORTING MEAN ± STANDARD DEVIATION.