Automated stent coverage analysis in intravascular OCT (IVOCT) image volumes using a support vector machine and mesh growing

Absence of vascular-stent tissue coverage by IVOCT is a biomarker for potential stent-related thrombosis. We developed highly-automated algorithms to classify covered and uncovered struts and quantitatively evaluate stent apposition. We trained a machine learning model on 7,125 images, and included an active learning, relabeling step to improve noisy labels. We obtained uncovered strut classification sensitivity/specificity (94%/90%) comparable to analyst inter- and intra-observer variability, an AUC of 0.97, and tissue coverage thickness measurements arguably better than the commercial product. By comparing classification models from regular and relabeled data sets, we observed robustness of the support vector machine to noisy data. A graph-based algorithm detected clusters of uncovered struts, thought to pose a greater risk than isolated uncovered struts. The software enables highly-automated, objective, repeatable, comprehensive stent analysis.


Introduction
Stent implantation via percutaneous coronary intervention is the preferred coronary revascularization procedure for patients with a flow-limiting atherosclerotic lesion. The first generation of stents, bare metal stents, were subject to frequent restenosis (reoccurrence of vessel narrowing) [1]. Drug eluting stents (DES) were developed and widely used to avoid in-stent restenosis. However, DES have been related to a higher risk of late stent thrombosis (LST) caused by delayed arterial healing. Although infrequent, LST is a catastrophic clinical condition carrying a mortality rate of up to 45% [2,3]. To overcome these limitations, bioresorbable stents (BRS) were developed. BRS theoretically leave no pro-inflammatory substances or obstacles [4] and allow for restoration of vessel function [5]. However, BRS still face many challenges, including appropriate stent design, mechanical properties, biodegradation rate, and in vivo performance. In particular, how the mechanical properties of such stents change during biodegradation is unclear, and BRS have yet to achieve thinner struts, contributing to a high risk of thrombosis [6]. Some studies have reported that BRS carry a higher risk of device thrombosis than metallic stents [7-13], since thick stent struts are strongly correlated with blood-flow alterations and thrombogenicity. Therefore, new stent designs aim to aid appropriate vascular healing and prevent restenosis by combining pro-endothelialization coatings and anti-proliferative drugs [14,15]. Extensive preclinical and clinical studies are needed to evaluate the efficacy and safety of stent designs.
With its superior resolution, contrast, and imaging speed, intravascular optical coherence tomography (IVOCT) has been used for in vivo assessment of high-resolution lumen architecture and stent tissue coverage after stent implantation [16,17]. IVOCT collects volumetric data by scanning the side-viewing imaging catheter in a helical pattern as the catheter is pulled back through the vessel; we refer to this volumetric data set as a pullback. Stent struts can be considered as the continuous stent wire "discretized" by sampling in the longitudinal direction. Strut tissue coverage assessed by IVOCT has become an important surrogate biomarker of stent viability. A low percentage of covered stent struts is related to the occurrence of LST [2,18-21]. Recent studies showed that, at a similar percentage of uncovered struts, clustered uncovered struts might increase the risk of LST compared to a scattered distribution of uncovered struts [22].
Currently, IVOCT image analysis is primarily done manually, requiring 6-12 hours per stent. In addition, inter-observer and intra-observer variability is inevitable in manual analysis. In prior work, we reported that even skilled cardiologists may show up to 5% intra- and 6% inter-observer variability when detecting stent struts in OCT images [23]. Manual stent analysis research has also advocated that there is an urgent need for automated stent analysis tools [24].
Although there are multiple reports of stent strut detection algorithms [25-29], including those from our group [23,30], the more relevant problem of automated assessment of tissue coverage remains relatively unstudied. In one approach, we and others have assessed tissue coverage area in stents at later time points [23,29]. Typically, a virtual stent cross section is created, and the area of the lumen is subtracted from the area of the stent to obtain a tissue coverage area. This is only applicable to stents at later stages with substantive tissue coverage. Strut-level analysis has been done manually, where a cardiologist determines the presence of tissue coverage for each stent strut in a pullback. Ughi et al. reported a method for automated analysis that measures the distance between a detected strut and the lumen [31]. For rabbit iliac arteries, they obtained good correlations with tissue coverage thicknesses as assessed by histology. They pointed out that simply using the distances between struts and the lumen boundary is insufficient for distinguishing thinly covered and uncovered struts [31]. In another paper, Adriaenssens et al. proposed an agglomerative, nearest-neighbor clustering algorithm to define clusters of uncovered struts [32]. More recently, Nam et al. developed a stent-strut detection method, based on an artificial neural network, to quantify stent apposition and neointimal coverage [33]. They first determined stent strut candidates using first/second gradient information, and these were then classified as strut or non-strut based on machine learning of 6 statistical and 5 geometrical features. After classification, either the protrusion distance or neointimal thickness was measured for each stent strut. Automatic analysis of BRSs is challenging, especially when there is tissue coverage, as there are no super-bright reflections and tell-tale shadows as obtained with metallic stents.
Nevertheless, there are promising reports obtained using image analysis approaches that focus on BRS appearance [34,35].
Our group has developed multiple algorithms for computer-aided analysis of IVOCT image volumes, including semi/fully-automatic segmentation and quantification of coronary plaques [36-42], as well as automated stent strut detection [23,30]. Specifically, in [23] we successfully applied Bagged Decision Trees trained on manually marked struts to identify true stent struts from a large number of candidate struts obtained by image processing techniques. In this paper, we build on our stent strut detection software and develop additional stent strut analyses. Since thin tissue coverage of struts can be very difficult to call, we use machine learning to determine the presence of tissue coverage on the surface of struts. That is, we classify stent struts as covered or uncovered. We use various hand-crafted image features surrounding the struts, derived from criteria used by cardiologists. Our experimental data include many cases of thinly covered struts, stress testing our software. Because the manual task is laborious and analysts have variable criteria, we find it desirable to use an active learning, relabeling step to improve noisy labels. We also developed a mesh growing (MG) algorithm to detect clusters of uncovered struts, which are thought to pose a greater risk than isolated uncovered struts. The methods were evaluated on 80 stent IVOCT pullbacks.

Image analysis
We developed algorithms to classify stent struts as covered or uncovered, measure the thickness of coverage, and detect clusters of uncovered struts. The proposed algorithm consists of pre-processing, strut feature extraction, classification, coverage thickness measurement, and cluster detection. First, the lumen is segmented to provide the lumen border. Strut features are then extracted from pixel patches surrounding each detected strut and used to classify the strut into either the covered or uncovered class. Feature metrics were derived from criteria used by cardiologists and designed to increase the robustness of classification; representative features are described below.

(1) Distance and front-surface features
The distance from the strut center to the lumen boundary was a useful feature. Uncovered struts usually lie on the luminal side of the lumen boundary and covered struts tend to appear on the abluminal side. For example, uncovered struts floating in the lumen (malapposed struts) and struts with thick tissue coverage are on opposite sides of the lumen boundary, so their distances to the lumen boundary have different signs. However, this feature was not very effective in distinguishing thinly covered struts from uncovered, apposed struts, as the lumen boundary was always detected on the luminal side for both. Therefore, we refined the lumen boundary by interpolating the boundary locations at the two ends of the strut to replace the original boundary in the strut A-lines. The distance from the strut to the originally detected lumen boundary (SF 7) and the distance to the refined lumen boundary (SF 8) were both found to be useful. The front edge sharpness (SF 9) was the intensity gradient along the original lumen boundary in the strut A-lines. If the strut is uncovered, this edge tends to be sharper than for covered struts, because it is the direct transition from lumen to strut.
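To make the distance features concrete, the following is a minimal sketch of how SF 7 and SF 8 might be computed. The data layout (per-A-line lumen boundary positions, strut center in pixel coordinates) and all names are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def strut_lumen_distance_features(strut_center, strut_alines, lumen_boundary,
                                  pixel_size_um=9.5):
    """Sketch of the distance features SF 7 and SF 8.

    strut_center   : (row, aline) of the strut front surface, in pixels
    strut_alines   : indices of the A-lines covered by the strut
    lumen_boundary : per-A-line radial position of the detected lumen boundary
    Signs are chosen so struts on the luminal side of the boundary
    (uncovered / malapposed) get negative distances.
    """
    row, aline = strut_center

    # SF 7: signed distance to the originally detected lumen boundary.
    sf7 = (row - lumen_boundary[aline]) * pixel_size_um

    # SF 8: distance to a refined boundary, linearly interpolated between
    # the boundary positions just outside the two ends of the strut,
    # replacing the unreliable boundary inside the strut A-lines.
    left, right = strut_alines[0] - 1, strut_alines[-1] + 1
    refined = np.interp(aline, [left, right],
                        [lumen_boundary[left], lumen_boundary[right]])
    sf8 = (row - refined) * pixel_size_um
    return sf7, sf8
```

For a thinly covered strut, the original boundary (SF 7) hugs the strut front surface while the interpolated boundary (SF 8) follows the surrounding tissue, so the two features diverge in exactly the ambiguous cases.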

(2) Side patch features
To capture the tissue continuity at the strut-tissue boundary in the angular direction, we extracted a 5 by 10 patch at each end of the strut (Fig. 3, orange). The orientation of the lumen boundary relative to the A-line direction can introduce errors in side patch extraction. To reduce this effect, side patches were extracted along the direction from the lumen boundary to the strut front surface. The mean intensity, median intensity, and intensity variation of these two patches were computed (SFs 10-15).
The percentage of pixels with intensity similar to the background was another useful feature (SFs 16 and 17): uncovered struts were closer to the lumen, so their side patches contained more dark pixels. The horizontal distance between the top and bottom ends of the side patches (SFs 18 and 19) and the orientation of the lumen boundary (SFs 20 and 21) had some influence on the side patch intensity features. SFs 18-21 were found to improve algorithm performance, so they are included as features as well.
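The side patch intensity statistics described above (means, medians, intensity variation, and the dark-pixel percentage) can be sketched as follows; the axis-aligned patch geometry and the background threshold are illustrative simplifications of the boundary-aligned extraction in the text.

```python
import numpy as np

def side_patch_features(image, top_left, height=10, width=5, background_level=10):
    """Intensity statistics for one 5 x 10 side patch (in the spirit of SFs 10-17).

    `image` is a Cartesian-coordinate frame, `top_left` the patch origin in
    (row, col); `background_level` is an assumed threshold for "dark" pixels.
    """
    r, c = top_left
    patch = image[r:r + height, c:c + width].astype(float)
    mean_i = patch.mean()                           # cf. SF 10/11
    median_i = np.median(patch)                     # cf. SF 12/13
    std_i = patch.std()                             # intensity variation, cf. SF 14/15
    dark_frac = (patch <= background_level).mean()  # background-like fraction, cf. SF 16/17
    return mean_i, median_i, std_i, dark_frac
```

A covered strut's side patch sits in tissue (bright, low dark fraction), while an uncovered strut's side patch reaches into the lumen (dark, high dark fraction).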

Support vector machine (SVM) classifier
SVM is a widely used classifier proven successful in many applications. It is a linear classifier with maximum margin, where the margin is the combined distance from the linear decision surface to the nearest data point on either side. Maximizing the margin has been shown to reduce overfitting and improve generalization [43]. SVM handles non-separable data by adding slack variables, ξ_i, to the objective function to allow a certain amount of misclassification [44], i.e. the soft-margin SVM. The soft-margin SVM algorithm solves the constrained minimization problem in (1):

min_{w, b, ξ} (1/2)||w||^2 + C Σ_i ξ_i,  subject to  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,    (1)
where w is a vector normal to the decision boundary and (1/2)||w||^2 is inversely proportional to the margin width; C is a positive parameter that controls the trade-off between the slack variable penalty and the margin width; y_i is the true label; φ(x_i) is the transformed feature vector; and b is the bias parameter. SVM can produce a decision boundary that is nonlinear in the input. In (1), this is indicated by the feature map φ. However, instead of mapping the input explicitly using a function φ, a "kernel trick" can be used. Here, we defined a kernel matrix K(x_i, x_j) = φ(x_i)^T φ(x_j). Intuitively, the kernel matrix measures the similarity between the transformed feature vectors. In fact, it can be shown that any kernel matrix that satisfies Mercer's conditions corresponds to some feature map φ [45]. Further, the decision boundary can be expressed in terms of the kernel matrix, so the feature map φ does not need to be considered explicitly. This "trick" has great computational advantages: certain kernel matrices, such as the radial basis function (RBF) kernel, correspond to an underlying infinite-dimensional mapping function φ, yet the matrices themselves can be computed in polynomial time and can express a complex decision boundary. Various kernel functions, e.g. polynomial, RBF, and hyperbolic tangent, allow SVM to learn very complex relationships [44]. The RBF kernel in (2) was selected via a trial-and-error experiment:

K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2)),    (2)
where x_i and x_j in (2) are the feature vectors of two struts and σ controls the width of the kernel.

Tissue coverage thickness measurement
For covered struts, tissue coverage thickness was measured as the closest distance from the strut to the lumen boundary. Because the globally detected lumen boundary can be inaccurate near struts, a gradient map was calculated in a local region around each strut, candidate boundary points with high gradient were identified, and a local lumen boundary was refined from the previously detected boundary (Fig. 4(B)).
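The class-weighted, soft-margin RBF SVM described above can be sketched with scikit-learn. Here the per-class penalties C_1/C_2 are expressed via scikit-learn's `class_weight` scaling of C, σ of Eq. (2) maps onto `gamma`, and the two-feature toy data stand in for the 21 hand-crafted strut features; this is an illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC

# Toy sketch of the class-weighted, soft-margin RBF SVM described above.
# C1/C2 are expressed via scikit-learn's per-class `class_weight` scaling of C,
# and gamma = 1 / (2 * sigma**2) maps the sigma of Eq. (2) onto sklearn's RBF.
sigma, C1, C2 = 1.0, 1000.0, 1000.0
clf = SVC(kernel="rbf",
          C=1.0,
          gamma=1.0 / (2 * sigma ** 2),
          class_weight={0: C1, 1: C2},  # 0 = covered, 1 = uncovered
          probability=True)             # probabilistic output for the ROC curve

# Synthetic two-feature data standing in for the 21 hand-crafted strut features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 2)),   # "covered" cluster
               rng.normal(1, 0.3, (50, 2))])   # "uncovered" cluster
y = np.array([0] * 50 + [1] * 50)
clf.fit(X, y)
```

With `probability=True`, `clf.predict_proba` provides the probabilistic output used to trace the ROC curve.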

Data acquisition
Eighty manually analyzed IVOCT follow-up pullbacks, acquired 2-5 months after stent implantation, were used for training and assessment. The data set consisted of 7,125 2D image frames including 39,000 covered and 16,500 uncovered struts in total. Data were obtained via the Cardiovascular Imaging Core Lab at the Harrington Heart & Vascular Institute, University Hospitals Cleveland Medical Center, Cleveland, Ohio, which serves as the IVOCT image analysis center for numerous clinical stent evaluation trials; there are more than 2,500 manually analyzed stent IVOCT pullbacks in the Core Lab database. All images were acquired by a Fourier-Domain OCT (FD-OCT) system (C7-XR OCT Intravascular Imaging System, St. Jude Medical, Westford, MA). The system used a tunable laser light source sweeping from 1,250 nm to 1,370 nm, providing 15-µm resolution along the A-line and 20-40 µm lateral resolution. Pullback speed was 20 mm/sec over a distance of 54.2 mm, and the frame interval was 200 µm, giving 271 frames in total. Depending upon the length of the stent, 100 to 200 frames had stents present in each pullback. Each polar-coordinate (r,θ) image (Fig. 1(A)) consisted of 504 A-lines with 970 pixels along each A-line. Images were transformed into Cartesian coordinates (x,y) (Fig. 1(B)) to restore the dimensions and shapes of anatomical structures. The Cartesian coordinate image was 1,024 x 1,024 pixels and the pixel size was 9.5 × 9.5 µm². In polar coordinates, the strut shape is distorted and neighboring struts farther from the catheter tend to clump together due to fewer angular samples for the same arc length, making it hard to extract the features described above. We therefore used Cartesian coordinate images for strut feature extraction.
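The polar-to-Cartesian transformation described above can be sketched as follows. The geometry (output image centered on the catheter, maximum imaging depth mapped to half the output width) is an illustrative assumption, and the scanner-specific calibration is omitted.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def polar_to_cartesian(polar_img, out_size=1024):
    """Resample an (r, theta) IVOCT frame onto a Cartesian grid.

    polar_img: array of shape (n_alines, n_samples), e.g. (504, 970),
    with A-lines along axis 0 and radial samples along axis 1.
    Returns an out_size x out_size image centered on the catheter.
    """
    n_alines, n_samples = polar_img.shape
    padded = np.vstack([polar_img, polar_img[:1]])   # wrap the angular seam
    c = (out_size - 1) / 2.0
    yy, xx = np.mgrid[0:out_size, 0:out_size]
    dx, dy = xx - c, yy - c
    # Pixel radius -> radial sample index (half the width spans the full depth).
    radius = np.hypot(dx, dy) * (n_samples - 1) / c
    # Pixel angle -> A-line index.
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    aline = theta * n_alines / (2 * np.pi)
    # Bilinear interpolation; radii beyond the imaging depth become 0.
    return map_coordinates(padded, [aline, radius], order=1, mode="constant")
```

The inverse mapping (compute the polar source coordinate for every Cartesian pixel, then interpolate) avoids the holes that a forward scatter of polar pixels would leave far from the catheter.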

Cross validation and SVM parameter selection
We randomly divided the 80 image volumes into groups of 10 and performed 8-fold cross validation. Using this strategy, images from one volume never simultaneously belonged to the training and testing folds, mirroring the real-world scenario. In each round of training and testing, the values of C and σ were selected via cross validation within the training data set. For C_1 and C_2, tested values included 1, 2, 3, 10, 20, 30, 100, 200, 300, 1,000, 2,000, 3,000, 10,000, 20,000, and 30,000. For σ, we tested 0.1, 0.3, 1, 3, and 10. In all 8 rounds, C_1 = 1,000 (for covered struts), C_2 = 1,000 (for uncovered struts), and σ = 1 were selected as the best parameter set, giving balanced sensitivity and specificity. The classifiers were trained with these parameters and performance was evaluated on the corresponding testing data set. In our training data, the proportion of covered to uncovered struts was 2.4:1. To reduce the effect of class imbalance on the learned classifier, we randomly subsampled covered struts so that the final proportion was 1:1.
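The volume-wise cross validation with nested parameter selection can be sketched with scikit-learn's GroupKFold, which guarantees that struts from one pullback never appear in both training and testing folds. The data below are random placeholders, and the (reduced) parameter grid is illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.svm import SVC

# Pullback-wise 8-fold cross validation: GroupKFold keeps all struts from one
# image volume in the same fold, so a volume never appears in both training
# and testing. X, y, and the per-strut pullback IDs are random placeholders.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 21))           # 21 hand-crafted strut features (toy)
y = rng.integers(0, 2, size=400)         # 0 = covered, 1 = uncovered
groups = np.repeat(np.arange(80), 5)     # 80 pullbacks, 5 struts each (toy)

param_grid = {"C": [1, 10, 100, 1000, 10000],
              "gamma": [1.0 / (2 * s ** 2) for s in (0.1, 0.3, 1, 3, 10)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=GroupKFold(n_splits=8))
search.fit(X, y, groups=groups)          # groups are routed to GroupKFold
```

In the study, the grid search ran inside each of the 8 outer training folds; the single `GridSearchCV` here shows only the inner loop.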

Inter-and intra-observer variability
To set a benchmark for algorithm performance, three pullbacks were used to assess inter-and intra-observer variability. To assess inter-observer variability, we compared the manual analyses of three cardiologists against a "gold standard" established by the consensus of two experienced OCT image analysts. Intra-observer variability was assessed by having two cardiologists repeat the analysis at a later time. Intervals were one month and one year for the two cardiologists.

Active learning relabeling
The 80 pullbacks were analyzed by four different expert cardiologists. Manual analysis was very laborious, leading to potential analyst fatigue. Although analysts were consistently trained and had to achieve a good score on a test set prior to clinical evaluations, there could be inconsistent criteria among the four analysts over time. As a result, inter-observer variability inevitably introduced noise in the training data. Visual examination of false positives (FPs) and false negatives (FNs) revealed a small number of obvious manual labeling errors, possibly due to lack of experience at an early stage of analysis. Therefore, we highlighted all the FPs and FNs (15% of all 55,500 struts) in the same color and asked an experienced cardiologist to re-examine the labels determined by previous analysts. In this process, the cardiologist was blinded to the previous manual and automatic labels. Labels of 30% of the re-examined struts (4.5% of all struts analyzed) were changed by the cardiologist. After re-examination, 8-fold cross validation was performed again.
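The selection of struts for re-examination reduces to flagging disagreements between the manual labels and the classifier's predictions (the FPs and FNs). A minimal sketch, with names of our choosing:

```python
import numpy as np

def select_for_relabeling(manual_labels, predicted_labels):
    """Flag struts whose manual and automatic labels disagree (FPs + FNs),
    so an experienced analyst can re-examine them blinded to both labels.
    Returns the indices of the disagreements.
    """
    manual_labels = np.asarray(manual_labels)
    predicted_labels = np.asarray(predicted_labels)
    return np.flatnonzero(manual_labels != predicted_labels)
```

In the study, this disagreement set covered 15% of the 55,500 struts; presenting only these cases, without revealing either label, is what keeps the relabeling both tractable and unbiased.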

Covered vs. uncovered strut classification
Collecting results on all 80 pullbacks, sensitivity was 85 ± 2% and specificity was 85 ± 3% for identifying uncovered struts before training data improvement. To reduce training errors, the most experienced expert re-classified FPs and FNs without knowledge of the previous classifications; in this active learning, relabeling step, the expert changed the classification of only 30% of re-examined struts. After this gold standard improvement, sensitivity and specificity increased to 94 ± 3% and 90 ± 4%, respectively. Interestingly, after relabeling, algorithm performance became much less sensitive to changes in SVM parameters: for example, with σ fixed at 1, C_1 = [1,000, 2,000, 3,000] and C_2 = [1,000, 2,000, 3,000] produced very similar results (< 1% change in accuracy). Statistics were calculated for uncovered struts, because stent studies usually report the percentage of uncovered struts in a pullback. We used the probabilistic output of the SVM to plot the receiver operating characteristic (ROC) curve (Fig. 5). The area under the ROC curve was 0.92 before and 0.97 after improved training.
Figure 9 compares the tissue coverage thickness of covered struts obtained by the algorithm described in Section 2.4 and manual measurement using a commercial offline analysis tool (St. Jude Medical, Westford, MA). Figure 9(A) shows the histogram of errors on 39,000 struts; the mean and standard deviation of the errors were 2.5 µm and 29 µm, respectively. Figure 9(B) shows the correlation between automatic and manual measurements after removing 5% outliers; because it is impractical to plot 39,000 individual points, a color code indicates the number of samples at each location.
More than 95% of errors were smaller than 50 µm. The very few large errors were caused by lumen segmentation errors in images with blood along the vessel wall; these can be avoided with manual correction of the lumen contour. After removing the 5% of outliers due to lumen segmentation errors, the correlation coefficient between manual and automatic methods was 0.94 (Fig. 9(B)) and a linear fit gave a very small y-axis intercept. For strut-to-lumen distance measurements, the error was 0.1 ± 2.9 µm and the correlation coefficient was 1.00 (not shown); there was much less error in the distance measurement because it was not affected by lumen segmentation errors. Figure 10 shows the result of the MG cluster detection algorithm. Covered struts are shown in white, and single uncovered struts (small clusters) in red; the other colors represent uncovered struts belonging to different large clusters. Here we defined clusters with more than 15 uncovered struts as large clusters; another option would be a threshold on cluster area, and the cardiology field should determine this threshold in future studies. A convex hull was computed for each large cluster and drawn in the same color as its struts. Figure 11(A) is a 3D volume rendering of a stent implanted in a distal right coronary artery, with the vessel wall rendered in gold and stent struts in grey. In Fig. 11(B), the vessel wall is removed to better visualize the stent struts; the lumen is shown in red and covered struts in different levels of green and blue to indicate different coverage thicknesses. 3D volume rendering was carried out using Amira 6.0 software (Thermo Fisher Scientific, Waltham, MA, USA).

Discussion
Even experienced cardiologists disagreed with each other (Fig. 8(A)) and with their own earlier analyses (Fig. 8(B)), giving rise to inter- and intra-observer variability, respectively. Manual evaluation is extremely time consuming, taking 6 to 12 hours per pullback, possibly causing fatigue and reduced effort, which may also play a role in variability. Training and testing with unenhanced data prior to relabeling gave good results (sensitivity of 85 ± 2% and specificity of 85 ± 3%), but performance as measured against the four cardiologists could never exceed the inter- and intra-observer variability, as many algorithm "errors" were actually due to labeling errors.
To create the enhanced, relabeled data set, a single, very experienced cardiologist reviewed all FP and FN strut classifications and reclassified them without knowledge of prior analyst or computer classifications. In this enhanced training/test data set, inter-observer variability was greatly reduced, but probably not eliminated as the experienced analyst did not review all TP and TN results. With retraining on the enhanced data set, enhanced classifier performance against the single experienced expert very noticeably improved to sensitivity/specificity of 94 ± 3%/90 ± 4%. We believe that the improved performance statistics are indicative of actual algorithm performance.
To assess how SVM accommodates noisy training data, we compared the classifier trained on unenhanced data to the classifier trained on the relabeled, enhanced data, setting the latter as the gold standard. In this case, sensitivity/specificity improved from 85%/86% (8-fold cross validation on noisy data) to 93%/91% (validation against the classification result of the classifier trained on enhanced data). This markedly improved result was very similar to the 94%/90% obtained with 8-fold cross validation on enhanced data. This comparison indicates that SVM accommodated noisy training data well in this application, even though algorithms with a complex hypothesis space, e.g. SVM with a non-linear kernel, are considered more prone to over-fitting in the presence of noise. In our case, one can assert that the classifier trained on noisy data was in fact better than would be indicated by the 8-fold analysis on noisy data.
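The comparison above reduces to computing sensitivity and specificity of one label set against another, treating uncovered struts as the positive class. A minimal helper:

```python
import numpy as np

def sensitivity_specificity(true_labels, predicted, positive=1):
    """Sensitivity and specificity of `predicted` against `true_labels`,
    with the uncovered-strut class (`positive`) as the positive class.
    `true_labels` may be manual labels or, as above, the output of a
    classifier trained on relabeled data used as the gold standard.
    """
    true_labels = np.asarray(true_labels)
    predicted = np.asarray(predicted)
    pos = true_labels == positive
    sensitivity = (predicted[pos] == positive).mean()   # TP / (TP + FN)
    specificity = (predicted[~pos] != positive).mean()  # TN / (TN + FP)
    return sensitivity, specificity
```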
As an alternative to our strut classification approach, we tried obtaining a very good segmentation of the lumen and then thresholding the strut-to-lumen distance to determine covered versus uncovered. However, as seen in Fig. 3, both the intensity characteristics in the A-line direction and the tissue continuity at the strut-tissue junction in the angular direction provide decisive information for distinguishing covered from uncovered struts. When we classified the struts using strut-lumen-distance thresholds, poor areas under the ROC curve were obtained. Similarly, when we used the strut-to-lumen distance as a single feature to train our classifier, performance statistics decreased about 10 percentage points and the classifier significantly overcalled uncovered struts. Hence, the additional features focusing on tissue continuity at the strut-tissue junction in the angular direction are important for proper covered/uncovered classification.
The difference in measured tissue thickness between the automatic and manual methods arose mainly because we used a closest-point method, instead of measuring the thickness along the strut-to-lumen-centroid direction as done in the commercial offline analysis tool. Differences in the lumen boundaries and strut locations obtained by the two methods also contributed. Measuring tissue along the direction from strut to lumen center slightly overestimates tissue thickness when the lumen surface is not perpendicular to the line connecting the lumen centroid and the strut; the closest distance from strut to lumen boundary better reflects the physical tissue thickness. In spite of the difference in measurement methods, the thickness calculated by our algorithm correlated well with manual measurement, because only some of the pullbacks have non-circular lumens and only a portion of the stent struts are subject to thickness overestimation. In addition to this error source, ≈5% of the strut thicknesses were inaccurate due to lumen detection errors, typically caused by the presence of blood along the vessel wall. It is challenging to design a lumen boundary detection algorithm that works in every situation; considering the small percentage of errors, we can easily resort to the manual editing tool available in our software package.
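The two thickness conventions can be illustrated as follows: `coverage_thickness` implements the closest-point measure used here, while `centroid_ray_thickness` is a simplified stand-in for the commercial tool's centroid-ray (chord) measure, not its actual code. For a circular lumen the two agree; for a non-circular lumen the centroid-ray measure overestimates wherever the lumen surface is not perpendicular to the centroid-strut line.

```python
import numpy as np

def coverage_thickness(strut_xy, lumen_xy):
    """Closest-point tissue coverage thickness: the minimum distance
    from the strut to the sampled lumen contour."""
    d = np.linalg.norm(np.asarray(lumen_xy, float) - np.asarray(strut_xy, float), axis=1)
    return d.min()

def centroid_ray_thickness(strut_xy, lumen_xy):
    """Centroid-ray (chord-style) thickness: distance to the lumen point
    whose direction from the lumen centroid best matches the strut direction.
    Simplified illustration of the commercial tool's convention."""
    lumen_xy = np.asarray(lumen_xy, dtype=float)
    strut_xy = np.asarray(strut_xy, dtype=float)
    centroid = lumen_xy.mean(axis=0)
    dx, dy = strut_xy - centroid
    strut_dir = np.arctan2(dy, dx)
    angles = np.arctan2(lumen_xy[:, 1] - centroid[1], lumen_xy[:, 0] - centroid[0])
    # Pick the contour sample with the smallest wrapped angular difference.
    k = np.argmin(np.abs(np.angle(np.exp(1j * (angles - strut_dir)))))
    return np.linalg.norm(lumen_xy[k] - strut_xy)
```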
The MG algorithm identified spatially clustered uncovered struts. It is thought that regions of uncovered struts result in a thrombosis risk, and several measures have been proposed; for example, one can report the percentage of uncovered struts in each 2D image [47] or the maximum length of segments with uncovered struts [22]. In addition to an early potential thrombosis risk, regions of uncovered, apposed struts at deployment might well not cover over time, creating a longer-term risk. Our MG algorithm used concepts of graph methods and region growing, and identified clusters in 3D rather than 2D. With the parameters reported herein, the algorithm identifies clusters of uncovered struts connected within 1 mm × 1 mm neighborhoods. Other parameters are possible with the MG algorithm, and optimum values will require more analyses. We believe this algorithm will enable efficient cluster analysis in a large number of stent pullbacks, which will help determine whether "clusters of uncovered struts" could be a new, more powerful predictor of LST, as well as the critical size and percentage of coverage of the clusters.
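The MG-style cluster detection can be sketched as breadth-first region growing over strut positions. This toy version works on (longitudinal, circumferential) positions in millimeters, ignores circumferential wrap-around, and all names and thresholds are our illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from collections import deque

def uncovered_strut_clusters(positions, max_gap_mm=1.0):
    """Toy stand-in for the mesh growing (MG) cluster detection: group
    uncovered struts whose (longitudinal, circumferential) positions, in mm,
    lie within a max_gap_mm x max_gap_mm neighborhood of each other.
    Returns a cluster label per strut, assigned by breadth-first growing.
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:                       # grow the cluster from the seed
            i = queue.popleft()
            gaps = np.abs(positions - positions[i])
            neighbors = np.flatnonzero(
                (gaps <= max_gap_mm).all(axis=1) & (labels == -1))
            labels[neighbors] = current
            queue.extend(neighbors)
        current += 1
    return labels
```

Cluster sizes then follow from `np.bincount(labels)`, to which a large-cluster threshold (e.g. the > 15 struts used in the results) can be applied.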
With our highly-automated software, one manually sets the start and end frames, initiates processing, and then reviews and edits results. Processing took about 27 min on average, and actual operator time was approximately 5% of the time required for fully manual labeling, assuming 9 hours per full pullback. Only about 5-10% of frames in each pullback required manual correction. Our software was created for offline analysis in stent evaluation studies rather than real-time analysis in the clinic; however, the analysis speed could be significantly improved by rewriting it in a more efficient language, e.g. C++.
In addition to the reduction of manual effort, a great advantage of automated processing is the repeatability of the analysis. Inter-and intra-observer variability among analysts was significant (Fig. 8). Computer analysis will be exactly repeatable, and even with manual editing, it will likely be much more repeatable than using different cardiologists. This reduced analysis variability should reduce statistical variations and lead to improved statistical power for comparisons of types of stents and treatments, possibly reducing the number of trials for significance. In addition, currently most trials are done with at least two cohorts such as conventional and new stent. With objective, reproducible analysis, one might be able to reliably compare a new stent or treatment against previously acquired and analyzed OCT image data. This could reduce the cost of clinical trials.

Conclusion
We developed highly-automated software which enables objective, repeatable, comprehensive stent analysis with very substantially reduced manual labor as compared to commercial software. As compared to human analysis with intra- and inter-observer variability, the reproducibility of the automated approach will reduce measurement variance and will likely improve the ability to assess statistical differences between stent designs. Relabeling of noisily labeled data gave better statistical performance when measured against the improved ground truth. However, there was relatively little difference between classifiers trained on the original noisy labels and on the enhanced data obtained following relabeling, indicating that SVM machine learning is robust against noisy data. We obtained very good measurements of tissue thickness, with the exception of some outliers due to errors in lumen segmentation that can be easily edited. Our measurement of tissue thickness using the nearest point on the lumen is likely more meaningful than the chord method used in the commercial software. Finally, the MG algorithm provides a means of assessing clusters of uncovered struts, thought to present additional thrombosis risk as compared to isolated uncovered struts.