Learning-based distortion correction enables proximal-scanning endoscopic OCT elastography

Proximal rotary scanning is predominantly used in the clinical practice of endoscopic and intravascular OCT, mainly because of the much lower manufacturing cost of the probe compared to distal scanning. However, proximal scanning causes severe beam stability issues (also known as non-uniform rotational distortion, NURD), which hinders the extension of its applications to functional imaging, such as OCT elastography (OCE). In this work, we demonstrate the abilities of learning-based NURD correction methods to enable the imaging stability required for intensity-based OCE. Compared with the previous learning-based NURD correction methods that use pseudo distortion vectors for model training, we propose a method to extract real distortion vectors from a specific endoscopic OCT system, and validate its superiority in accuracy under both convolutional-neural-network- and transformer-based learning architectures. We further verify its effectiveness in elastography calculations (digital image correlation and optical flow) and the advantages of our method over other NURD correction methods. Using the air pressure of a balloon catheter as a mechanical stimulus, our proximal-scanning endoscopic OCE could effectively differentiate between areas of varying stiffness of atherosclerotic vascular phantoms. Compared with the existing endoscopic OCE methods that measure only in the radial direction, our method could achieve 2D displacement/strain distribution in both radial and circumferential directions.


Introduction
Elastography is a functional extension of medical imaging modalities that assesses the mechanical properties of tissues, which is predicated on the principle that diseased tissues often exhibit distinct elastic properties compared to healthy ones. By measuring the displacement or strain induced by an applied force, elastography can generate images that represent the stiffness or elasticity of the tissue under examination [1,2]. The success of elastography lies in its ability to detect early-stage diseases, which often manifest as changes in tissue elasticity before structural alterations are visible. Ultrasound and magnetic resonance-based elastography have shown high sensitivity and specificity in detecting and grading liver fibrosis, which is crucial for the management of chronic liver diseases [3]. Ultrasound-based elastography has been shown to improve the differentiation between malignant and benign breast lesions [4].
However, ultrasound and magnetic resonance-based elastography, with spatial resolutions of hundreds of micrometers to millimeters, face limitations in characterizing the mechanical properties of small lesions, such as those found on the inner surfaces of blood vessels and the gastrointestinal tract [5]. For example, atherosclerotic plaques, which can form on the inner surfaces of blood vessels, are crucial to monitor due to their potential to cause cardiovascular events. The limited resolution may hinder detailed evaluation of plaque vulnerability, a key factor in predicting the risk of plaque rupture and subsequent complications like heart attacks and strokes [6].
Optical coherence tomography-based elastography (OCE), on the other hand, inherits the resolution advantages of OCT and has been successfully applied in various medical contexts [7,8]. In oncology, OCE has demonstrated the ability to differentiate between healthy and cancerous breast tissues by revealing heterogeneous mechanical contrast not visible with OCT alone. This has significant implications for improving the accuracy of tumor identification during surgery, potentially reducing the risk of residual tumors after breast-conserving procedures [9,10]. Additionally, OCE has shown promise in ophthalmology, where it has been used to measure the elastic modulus of the cornea in patients with keratoconus, a condition characterized by corneal thinning and bulging. By providing detailed maps of tissue elasticity, OCE can help monitor the effectiveness of treatments such as collagen crosslinking, which aims to stiffen the cornea and halt the progression of the disease [11,12].
The endoscopic form of OCE expands its applications beyond ex vivo examination and ophthalmology, offering a new dimension in in vivo diagnosis and treatment inside the human body. To realize endoscopic OCE, the key technical challenge is to reliably extract mechanical stimulus-induced deformations from OCT temporal sequences. This is because, unlike free-space OCT optical systems that use galvanometers or microelectromechanical systems for beam scanning, fiber-form OCT endoscopes usually achieve beam scanning through circumferential rotation and axial translation of the proximal end of the probe. This proximal scanning mode has severe position instability issues (also known as Non-Uniform Rotational Distortion, NURD [13,14]) for the reasons below [15]: (1) the variations in scanning speed for both proximal and distal scanning, (2) the mechanical friction during the bending of the catheter for proximal scanning, and (3) the synchronization errors between image acquisition and scanning.
There have been several technical attempts to address the NURD issue for realizing endoscopic OCE. M-mode imaging, which uses static fiber probes to collect temporal A-line sequences, has been employed to measure tissue deformation only at a single angular or transverse location [16-18]. Neighboring A-lines, instead of neighboring B-frames, have been utilized to calculate tissue deformation [19]. This approach requires that the OCT A-line rate be synchronized with the period of the applied mechanical stimulus. Another method to characterize the tissue stiffness of the luminal structure is to track the changes in luminal shape [20], which minimizes the influence of the NURD but does not allow obtaining a two-dimensional strain distribution. Besides, instead of proximal scanning, Wang et al. utilized the distal scanning of a dual-layer micromotor to minimize beam scanning instability, which enables phase-based intravascular OCE [21]. However, such distal scanning imaging probes are still difficult to popularize in clinical applications due to their high cost [22].
In this paper, we demonstrate, to the best of our knowledge, the first proximal-scanning BM-mode endoscopic OCE. The BM-mode here means that the displacement/strain is calculated from two B-scans acquired before and after the application of the mechanical stimulus. We achieve this goal via learning-based NURD correction, which has previously been demonstrated to have superior correction accuracy [15,23] and real-time capability [23]. Here we further demonstrate its capability to provide the imaging stability required for intensity-based elastography [7]. Our technical contributions can be summarized as follows:
• Training NURD correction deep-learning models with real distortion vectors. Existing learning-based NURD correction methods [15,23] used feature tracking methods to extract pseudo distortion vectors from massive publicly-available endoscopic OCT datasets. This approach has the following drawbacks: (1) The accuracy of feature tracking-based NURD extraction depends on the reliable acquisition of image features, which is very difficult for image modalities such as OCT that are heavily affected by speckle noise [24]. As a result, the extracted distortion vectors are usually inaccurate, which is why we call them pseudo. (2) These publicly-available data were collected on endoscopic OCT systems with various imaging setups, probe designs, and application scenarios. Although the distortion vectors from them can reflect the general characteristics of NURD, the large feature variation may limit the performance of trained NURD correction models. By imaging geometric phantoms and applying a deep-learning-based post-processing pipeline, we are able to extract real distortion vectors from a targeted proximal-scanning endoscopic OCT system. We call them real because our proposed extraction method is able to obtain the NURD with very high accuracy. They are used in the training of NURD correction models, which improves the correction accuracy in both Convolutional Neural Network (CNN)- and transformer-based learning architectures.
• Measuring 2D displacement/strain distribution in both radial and circumferential directions. Previous endoscopic OCE methods [16-19,21] employed the intensity or phase of A-lines, and thus can only derive the displacement/strain along the radial direction. Leveraging the capability of learning-based NURD correction, our endoscopic OCE method can be applied to B-scans, which enables the extraction of radial and circumferential displacement/strain. Our results show that the displacement/strain, particularly in the circumferential direction, benefits from the NURD correction.
• Combining proximal-scanning microprobe and balloon air pressure for endoscopic OCE. Previous methods for endoscopic optical elastography used either acoustic radiation force or intraluminal pressure change as the mechanical stimulus [18,25-27]. For endoscopic OCE, the former needs the integration of optical and acoustical components [28], and the latter requires synchronization with imaging speed [19]. Here we utilize a balloon catheter, which is common in percutaneous interventional procedures [29]. By varying its air pressure, we are able to obtain the mechanical stimulus required for OCE. This approach does not require changes to the OCT endoscope design and has a low need for synchronization.
• Validating our NURD correction and endoscopic OCE methods on atherosclerotic vascular phantoms. We construct vascular phantoms that mimic the morphology and mechanical properties of atherosclerotic plaques. Using this phantom model, we have verified that our NURD correction method could provide superior imaging stability for calculating intensity-based elastography, compared with other NURD correction methods. It could effectively differentiate between areas of varying stiffness in proximal-scanning endoscopic OCT imaging.
The remainder of this paper is organized into the following sections. In Section 2, we introduce our methods for real distortion vector extraction, learning-based NURD correction, endoscopic OCE via balloon air pressure, and intensity-based elastography. We describe the materials and fabrication process for the atherosclerotic vascular phantoms and experimental settings in Section 3. We present our experimental results in Section 4 and discuss the perspectives and limitations of this work in Section 5. We draw our conclusions in Section 6.

Real distortion vector extraction
Figure 1 illustrates the overall pipeline of the real distortion vector extraction and application. Firstly, we utilize an imaging phantom for the distortion extraction. Here we employ a square quartz tube for its significant angular characteristics and axially invariant property. It should be noted that our method relies on a priori knowledge of the shape of the imaging object. Quartz tubes rather than biological tissues are used here as imaging objects because the former have a very well-defined and non-deformable structure. Other well-defined and rigid objects could also be used for the extraction of the real distortion vectors. We perform endoscopic imaging inside the tube using our home-built proximal-scanning endoscopic OCT system as described in Section 2.3. As shown in Fig. 1(a), after the OCT imaging, the four inner edges of the quartz tube are presented as four interconnected irregularly-shaped arcs. We crop the arc trajectory and reshape all images into a size of H × W. To extract the real distortion vectors from these arcs, we train a deep-learning-based segmentation model with the U-Net architecture [30]. Due to their morphological similarity, we only need to label 20 representative B-scans for training, which results in a segmentation accuracy of at least 99%. After obtaining all binary masks of the arc trajectory, we can extract A-line-level distortion vectors between two successive binary masks (the (n − 1)-th and n-th frames, where n is the frame index in the temporal sequence).
As shown in Fig. 1(b), for a patch with the size of H × w (w ≪ W) centered at the i-th A-line of the (n − 1)-th frame, we aim to find its correspondence inside the n-th frame, which can be used to derive the NURD-induced displacement. We calculate the similarity between this patch and the sliding-window patches from the centered A-line position i − d/2 to i + d/2 in the n-th frame. The Dice score is employed as the similarity measure, which can be written as:
$$\mathrm{Dice}(i, j) = \frac{2\,|P_i \cap G_j|}{|P_i| + |G_j|}$$
where P_i is the patch of the centered A-line position i in the (n − 1)-th frame and G_j is the patch of the centered A-line position j (j ∈ [i − d/2, i + d/2]) in the n-th frame. The Dice scores range from 0 (no overlap) to 1 (complete overlap). We obtain the index k of the Dice(i, j) maximum of the i-th A-line and take ∆ = k − d/2 as its mismatch. After calculating the mismatch values for each A-line, we can then form a real distortion vector between two consecutively acquired B-scan images and smooth it using a Gaussian filter. By repeating this process, we can collect numerous real distortion vectors for a specific endoscopic OCT system. Finally, we can build original/distorted pairs by applying a randomly selected distortion vector as the ground truth. The constructed dataset is then used for training the NURD correction models.
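The sliding-window Dice search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default patch width `w`, the search range `d`, the Gaussian `sigma`, and the circular wrap-around of A-line indices are all assumptions made for the example.

```python
import numpy as np

def dice(p, g):
    """Dice score between two binary patches (defined as 1.0 if both are empty)."""
    p = p.astype(bool)
    g = g.astype(bool)
    total = p.sum() + g.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(p, g).sum() / total

def aline_mismatch(prev_mask, curr_mask, w=9, d=20):
    """Per-A-line mismatch between two binary masks of shape (H, W).

    Columns are A-lines. Indices wrap around (assumed here because the
    scan is rotational).
    """
    _, W = prev_mask.shape
    half_w, half_d = w // 2, d // 2
    offsets = np.arange(-half_w, half_w + 1)
    delta = np.zeros(W)
    for i in range(W):
        patch = prev_mask[:, (i + offsets) % W]
        scores = [
            dice(patch, curr_mask[:, (j + offsets) % W])
            for j in range(i - half_d, i + half_d + 1)
        ]
        k = int(np.argmax(scores))   # index inside the search window
        delta[i] = k - half_d        # mismatch, Delta = k - d/2
    return delta

def gaussian_smooth(vec, sigma=2.0):
    """Circular Gaussian smoothing of a distortion vector."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.concatenate([vec[-radius:], vec, vec[:radius]])
    return np.convolve(padded, kernel, mode="valid")
```

Note that `np.argmax` returns the first maximum, so ties inside the window (for example along a flat segment of the arc) resolve toward the smaller index; a real pipeline may need a different tie-breaking rule.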

Learning-based NURD correction
CNN [31] and transformer [32] are two dominant architectures in deep learning. Here we use both of them in the verification of our proposed distortion extraction method. The transformer-based architecture [23] is inspired by the self-attention mechanism in natural language processing and computer vision [33]. By exploiting its ability to model long-distance dependencies, we can directly obtain spatial correlations between OCT A-lines at arbitrary distances, thus accelerating NURD corrections. During the training phase, original/distorted frame pairs are fed into the stacked cross-attention network to predict two distortion vectors: one is the distortion applied to the original frames to form the distorted frames (known ground truth); the other is the distortion applied to the distorted frames to form the original frames. With these, the original frames can be converted into distorted frames and vice versa. In the inference phase, the output distortion vector 1 is used to correct the NURD of the n-th frame.
Note that both CNN- and transformer-based methods require known distortion vectors as ground truth to construct the original/distorted pairs for NURD correction training. The previous works [15,23] employed the pseudo distortion vectors extracted from massive publicly-available endoscopic OCT datasets. In this work, we propose a method to extract real distortion vectors for a specific endoscopic OCT system, as described in Section 2.1.

Endoscopic OCE via balloon air pressure
Figure 3 illustrates the schematic of our home-built proximal-scanning endoscopic OCT system and its elastography function by balloon air pressure. It is a spectral-domain OCT system, which has a central wavelength of 840 nm, a bandwidth of 60 nm, and an A-line rate of 80 kHz. The corresponding axial resolution is ∼5 µm. A homemade side-view fiber probe is used as the sample arm, which has a length of 1.2 m and an outer diameter of 0.46 mm (for more details, please refer to our previous work [23]). It has a lateral resolution of 25 µm and a working distance of ∼2 mm. Proximal scanning is performed using a homemade fiber optic rotary joint (FORJ) driven by a motor with a rotation speed of 34 rps. For the mechanical stimulus required by the OCE, we use the balloon expansion pressure from a balloon catheter for the quasi-static compression. Our proximal-scanning OCT probe is inserted into the guidewire port of the balloon catheter for the distal-end imaging of the targeted positions of the lumen. The balloon inflation port of the catheter is connected to an air pump for controllable inflation.
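As a quick consistency check on the quoted specifications, the axial resolution and the number of A-lines acquired per revolution can be estimated as below. The estimate assumes a Gaussian source spectrum, imaging in air, and continuous acquisition at the full A-line rate; none of these are stated in the text.

```latex
\[
\delta z \;=\; \frac{2\ln 2}{\pi}\,\frac{\lambda_0^{2}}{\Delta\lambda}
\;\approx\; 0.441 \times \frac{(840\ \text{nm})^{2}}{60\ \text{nm}}
\;\approx\; 5.2\ \mu\text{m},
\qquad
N_{\text{A-lines/rev}} \;=\; \frac{80\,000\ \text{s}^{-1}}{34\ \text{s}^{-1}} \;\approx\; 2353 .
\]
```

The first value agrees with the stated axial resolution of about 5 µm.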
The use of the balloon in our endoscopic OCE should follow the clinical practice of the balloon angioplasty procedure [34]. Before the procedure, patients undergo diagnostic imaging such as angiography to confirm the location and severity of the arterial narrowing. Antiplatelet medications may be administered to reduce the risk of clot formation during the procedure. The doctors perform the angioplasty under fluoroscopic guidance, which allows real-time imaging of the blood vessels. The balloon is inflated to the desired pressure, and the resulting change in the artery's diameter is assessed. During the procedure, a doctor selects a balloon catheter of appropriate size, based on the reference vessel diameter and lesion characteristics, to ensure safe and effective dilation. The balloon is carefully guided to the site of the lesion via a catheter, inflated to compress the plaque against the arterial wall, and then deflated and removed. The choice of balloon size and inflation pressure is critical, typically starting with a balloon diameter 0.5-1 mm smaller than the vessel to minimize complications, and using pressures ranging from 6 to 10 atm for general dilation, with higher pressures for harder or calcified lesions.
In our experiments below, the balloon is inflated by increasing internal pressure to ensure close contact with the lumen wall. Benefiting from the resolution advantage of OCT, only micrometer-scale deformation of the lumen is required for elasticity computation. Here we employed balloon air pressures ranging from 6 to 8 atm.

Intensity-based elastography algorithms
Intensity-based elastography algorithms are widely used for assessing tissue mechanical properties by analyzing the deformation induced in response to external forces [7,35]. Two common techniques employed in intensity-based elastography to measure tissue deformation are Digital Image Correlation (DIC) and optical flow.
DIC measures tissue deformation by tracking patterns of intensity changes between the images before (reference) and after (deformed) applying the mechanical stimulus. The sampling points are set in the reference image. For each sampling point in the reference image, a subset I around it is chosen to search for another subset I′ in the deformed image with the maximum correlation coefficient. Six deformation variables, including the displacements and the local displacement gradients between the subsets, can be obtained with iterative computations. The cross-correlation can be written as:
$$C = \frac{\sum \left[f(x, y) - \bar{f}\right]\left[f'(x + \alpha, y + \beta) - \bar{f}'\right]}{\sqrt{\sum \left[f(x, y) - \bar{f}\right]^{2} \sum \left[f'(x + \alpha, y + \beta) - \bar{f}'\right]^{2}}}$$
where C is the cross-correlation coefficient, and u and v are the lateral and vertical displacements, respectively. ∂u/∂x, ∂u/∂y, ∂v/∂x, and ∂v/∂y are the local displacement gradients, from which the normal and shear strain can be obtained. f(x, y) and f′(x + α, y + β) are the intensity values of I and I′, where α = u + (∂u/∂x)∆x + (∂u/∂y)∆y and β = v + (∂v/∂x)∆x + (∂v/∂y)∆y. f̄ and f̄′ are their mean values, respectively.
Optical flow methods estimate tissue displacement by tracking the apparent motion of image features over time. Let I(x, y, t) be the intensity of a pixel at coordinates (x, y) in the reference image frame at time t, and let I′(x′, y′, t + ∆t) denote the intensity of the same pixel in the deformed image frame at time t + ∆t. The optical flow field v = (u, v) is estimated by solving the optical flow equation:
$$\nabla I \cdot \mathbf{v} + \frac{\partial I}{\partial t} = 0$$
where ∇I is the image gradient and ∂I/∂t is the temporal intensity change. This equation represents the constraint that the image intensity does not change for a moving pixel over a short time interval ∆t.
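To make the subset-matching idea concrete, the sketch below evaluates a zero-normalized cross-correlation and searches integer-pixel shifts for the best match. It is a deliberately simplified stand-in, not the DIC solver used in this work: it covers only the rigid-translation terms (u, v), ignores the displacement gradients and sub-pixel interpolation, and the function names and the `half` and `search` parameters are hypothetical.

```python
import numpy as np

def zncc(f, g):
    """Zero-normalized cross-correlation of two equally sized subsets."""
    f = f.astype(float) - f.mean()
    g = g.astype(float) - g.mean()
    denom = np.sqrt((f ** 2).sum() * (g ** 2).sum())
    return float((f * g).sum() / denom) if denom > 0 else 0.0

def match_subset(ref, deformed, center, half=7, search=5):
    """Integer-pixel displacement (u, v) of the subset around `center`.

    u is the lateral (column) shift and v the vertical (row) shift.
    The caller must keep the subset plus search range inside the image.
    """
    r, c = center
    subset = ref[r - half:r + half + 1, c - half:c + half + 1]
    best_score, best_uv = -2.0, (0, 0)
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            cand = deformed[r + v - half:r + v + half + 1,
                            c + u - half:c + u + half + 1]
            score = zncc(subset, cand)
            if score > best_score:
                best_score, best_uv = score, (u, v)
    return best_uv, best_score
```

An exhaustive integer search like this is commonly used only as an initial guess; iterative solvers then refine the full six-parameter deformation.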
During balloon inflation, the lumen undergoes radial and circumferential deformation. Because the elasticity is calculated before the polar transformation, we compute the radial (δ_rr) and circumferential (δ_θθ) strain based on the deformation field. For the DIC method, the strain can be obtained directly from the cross-correlation calculation. For the optical flow method, the strain is deduced from the estimated displacement fields.
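One common way to obtain radial and circumferential strain from displacement fields is the infinitesimal-strain definition in polar coordinates, ε_rr = ∂u_r/∂r and ε_θθ = (1/r)∂u_θ/∂θ + u_r/r. The sketch below applies these standard expressions on an (r, θ) grid; whether this matches the exact expressions used in this work is an assumption, and the function name and array layout are hypothetical.

```python
import numpy as np

def polar_strain(u_r, u_theta, r, theta):
    """Radial and circumferential strain on an (r, theta) grid.

    u_r, u_theta: displacement fields of shape (len(r), len(theta)),
    expressed in the same length unit as r. theta is in radians.
    """
    du_r_dr = np.gradient(u_r, r, axis=0)
    du_t_dtheta = np.gradient(u_theta, theta, axis=1)
    rr = r[:, None]                          # broadcast radius over angle
    strain_rr = du_r_dr
    strain_tt = du_t_dtheta / rr + u_r / rr
    return strain_rr, strain_tt
```

For a uniform expansion u_r = a·r both strains equal a, while a rigid rotation u_θ = ω·r produces zero strain, which gives two simple sanity checks for the displacement-to-strain step.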

Atherosclerotic vascular phantoms
In order to simulate the morphology and mechanical properties of atherosclerotic blood vessels, we fabricated a silicone-based soft body model containing a polydimethylsiloxane (PDMS) rigid inclusion. The optical properties and controlled elasticity of silicone materials mixed with inorganic scatterers are stable and widely used for OCE modeling [36]. We modeled blood vessels using a soft medical silicone tube with a Young's modulus of 350 kPa (commonly used as an instructional material for vascular suture training) with an inner diameter of 2.5 mm and a wall thickness of 0.5 mm, and a small amount of PDMS was applied to the wall portion of the tube to simulate stiff calcified plaques. The estimated Young's modulus of the PDMS inclusion was 15.5 MPa. The concentration ratio of PDMS to curing agent was 10:1. Alumina powder was added to the PDMS fluid at a concentration of 16 mg/mL to produce optical scattering to a similar degree as the container phantoms. Homogeneous mixing of the alumina in the PDMS was achieved by dispersing the particles in an ultrasonic bath. Figure 4(a) shows the photographs of our atherosclerotic vascular phantoms, including external and cross-sectional views. An OCT B-scan of the phantom acquired using our homemade system is given in Fig. 4(b). The plaque-mimic regions are marked with star symbols.

Training NURD correction models
We amassed a total of 7,731 endoscopic OCT B-scans sourced from publicly-available datasets [37-44] to facilitate the training of the NURD correction models. In preprocessing each B-scan, we implement a series of augmentation techniques to enhance the robustness and generalization of our models. This process involves introducing random shifts within the Cartesian coordinate domain and random rotations within the polar coordinate domain. Additionally, we incorporate random Gaussian noise, apply random horizontal flips, and introduce random small shift perturbations for each A-line. Then we apply a distortion vector to each A-line of an original frame to obtain the corresponding distorted frame. In the experiments, we compare the NURD correction models trained with the pseudo distortion vectors used in previous studies [15,23] and the real distortion vectors introduced in this work. By randomly applying these distortion vectors to the B-scans, we created 20,000 pairs of original/distorted images for training. We implement both the CNN- and transformer-based NURD correction models, the extraction of the distortion vectors, and the pre/post-processing steps using the PyTorch framework [45]. The training of the NURD correction models is conducted on a personal computer equipped with an Nvidia 3090 GPU (24 GB memory). Input data is structured to contain 1024 A-lines per B-frame, each with 512 data points. In order to enhance the convergence efficiency of the network, we initially pre-train the NURD correction networks on a modest dataset with a conservative learning rate of 1e-5. This dataset comprises 2,000 original/distorted pairs, generated through random sampling from 8 uniformly acquired images sourced from the entire dataset. Subsequently, the pre-trained network weights serve as the initialization for training on the 20,000 original/distorted pairs, for which we employ an elevated learning rate of 5e-4. The hyperparameters we used in training followed the settings used in the original implementations [15,23].
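The construction of an original/distorted training pair can be sketched as below. This is an illustrative assumption about how a distortion vector acts on a frame (each output A-line is resampled from a shifted angular position with linear interpolation and circular wrap-around); the sign convention and the names `apply_distortion`, `make_pair`, and `distortion_bank` are hypothetical and may differ from the original implementations [15,23].

```python
import numpy as np

def apply_distortion(frame, distortion):
    """Warp a B-scan of shape (H, W) with a per-A-line distortion vector.

    Output A-line i is taken from input position i + distortion[i],
    linearly interpolated between neighbouring A-lines, with circular
    wrap-around because the scan is rotational.
    """
    _, W = frame.shape
    src = (np.arange(W) + distortion) % W
    base = np.floor(src)
    lo = base.astype(int) % W            # guards against src rounding up to W
    hi = (lo + 1) % W
    frac = src - base
    return (1.0 - frac) * frame[:, lo] + frac * frame[:, hi]

def make_pair(frame, distortion_bank, rng):
    """Pick a random distortion vector; return (original, distorted, ground truth)."""
    gt = distortion_bank[rng.integers(len(distortion_bank))]
    return frame, apply_distortion(frame, gt), gt
```

With this convention, a constant distortion of 2 A-lines is equivalent to `np.roll(frame, -2, axis=1)`, which is a convenient check before plugging in extracted distortion vectors.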

Pipeline of proximal-scanning endoscopic OCE
Figure 5 illustrates the operation pipeline of our proximal-scanning endoscopic OCE. We integrate the homemade proximal-scanning OCT probe and an air pump with a balloon catheter for the balloon preparation. During the balloon inflation, the OCT system simultaneously performs continuous rotational scanning and acquisition. The NURD of the acquired OCT B-scans is corrected in real time via the inference of our learning-based models. The mechanical stimulus (balloon pressure) starts from the contact between the balloon and the inner wall of a lumen. An OCT B-scan acquired before this stimulus is selected as the reference frame. The B-scans under pressure are employed as the deformed frames. To minimize the influence of decorrelation noise [46], we use a correlation coefficient greater than 0.6 as a selection criterion for the deformed frames [47]. The subsequent pre-processing of the reference and deformed frames includes the suppression of background noise using an intensity threshold and the recurrent padding of A-lines at the start and ending edges of a B-scan. Finally, the elasticity calculation, including displacement and strain, is conducted using the DIC- and optical-flow-based algorithms.
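The frame selection and pre-processing steps in this pipeline might look like the sketch below. Several details are assumptions made for illustration: the correlation coefficient is taken as a frame-wide Pearson coefficient against the reference frame, the recurrent padding is interpreted as circular (wrap-around) padding along the A-line axis, and the `noise_floor` and `pad` values are placeholders.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two frames."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def select_deformed_frames(reference, frames, min_corr=0.6):
    """Keep frames whose correlation with the reference exceeds min_corr."""
    return [f for f in frames
            if correlation_coefficient(reference, f) > min_corr]

def preprocess(frame, noise_floor, pad=16):
    """Zero out background below noise_floor and wrap-pad the A-line axis."""
    cleaned = np.where(frame >= noise_floor, frame, 0.0)
    # mode="wrap" copies A-lines from the opposite edge of the B-scan,
    # so subsets near the first and last A-lines see continuous tissue.
    return np.pad(cleaned, ((0, 0), (pad, pad)), mode="wrap")
```

Padding only the A-line axis reflects the fact that a rotational B-scan is periodic in angle but not in depth.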

Implementation of DIC and optical flow algorithms
For the DIC-based displacement/strain estimation, we employ the µDIC toolkit [48], which uses B-spline elements for the discretization of displacement fields. It allows us to control the polynomial order and continuity degree, with the B-spline surface defined by a double summation over control points that are determined by the B-spline basis functions and their coordinates. To optimize the mapping and minimize the sum of squared differences (SSD) between reference and deformed images, it utilizes a modified Newton-Raphson iterative solver, which iteratively refines the control point positions based on the gradient of the SSD. During this optimization process, inter-pixel values are obtained through bi-cubic or bi-quintic spline interpolation to minimize bias error.
For the optical flow estimation, we employ the Gunnar Farneback algorithm [49], which computes dense motion fields by locally fitting quadratic polynomials to image brightness variations over time. The key idea of this algorithm is to approximate the spatial and temporal derivatives of image intensity using polynomial expansions, and then solve for the optical flow parameters that minimize the difference between these derivatives in consecutive frames. By iteratively refining these parameters using a pyramidal approach, the algorithm can accurately estimate the motion vectors for every pixel in the image. Besides, it incorporates a dense interpolation scheme to provide smooth and dense flow fields.
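As a small illustration of the brightness-constancy constraint that underlies optical flow, the sketch below solves Ix·u + Iy·v + It = 0 in the least-squares sense for a single global displacement. This is not the Farneback algorithm (there is no polynomial expansion, no pyramid, and no dense per-pixel field); it is a simplified gradient-based estimate that is only meaningful for small, roughly uniform shifts, and the function name is hypothetical.

```python
import numpy as np

def global_flow(ref, deformed):
    """Least-squares solution of Ix*u + Iy*v + It = 0 for one (u, v)."""
    ref = ref.astype(float)
    deformed = deformed.astype(float)
    avg = 0.5 * (ref + deformed)       # gradients at the temporal midpoint
    Iy, Ix = np.gradient(avg)          # axis 0 is y (rows), axis 1 is x
    It = deformed - ref
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)
```

Dense methods apply the same constraint locally, which is what allows a separate displacement to be reported for every pixel of a B-scan.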
Utilizing the above computational resources, the DIC and optical flow algorithms process a pair of OCT B-scans in approximately 0.9 seconds and 0.15 seconds, respectively.

Real vs. pseudo distortion vectors
We compare the performance of NURD correction using the deep-learning models trained with the real distortion vectors (this work) and the pseudo distortion vectors (the previous works [15,23]). The qualitative and quantitative results are given in Fig. 6 and Fig. 7, respectively. Before the inference of the deep-learning models, the OCT imaging data of the quartz tube (used for the real distortion vector extraction) are severely distorted, as demonstrated by the twisted morphology in the 3D rendering [Fig. 6(a)] and the colored appearance in the cross-sectional view [Fig. 6(d)]. Here, three B-scans sampled at an interval of 10 frames are mapped to the three channels of an RGB image, so the colored appearance indicates that the structural information in these channels is not well aligned. In contrast, after the NURD correction, the original geometry of the quartz tube, which has flat surfaces and right angles, is recovered, and the alignment of the structural information in the RGB channels is improved. When further comparing the results using the pseudo distortion vectors [Fig. 6(b) and (e)] and the real distortion vectors [Fig. 6(c) and (f)], it can be seen that the latter obtains a more accurate 3D morphological recovery, and the information of the different channels in the cross-sectional view overlaps better, thus approaching a grayscale display. In the qualitative comparison, we employed both the CNN- and transformer-based learning architectures and achieved similar results. We only give the results of the transformer-based model here for simplicity. We further conducted a quantitative comparison of the NURD correction performance using the real and pseudo distortion vectors; the results are shown in Fig. 7. The upper and lower panels are the results using the transformer-based and CNN-based learning architectures [15,23], respectively. As shown in Fig. 7(a) and (c), for an image sequence, the correction errors (in pixels) of the models trained with the real distortion vectors (green) are always smaller than those trained with the pseudo distortion vectors (red). The corresponding statistical box plots are given in Fig. 7(b) and (d).

NURD correction for elastography
To evaluate the effectiveness of the NURD correction in elastography, we captured a B-scan sequence of the homemade atherosclerotic vascular phantom (described in Section 3.1) using our proximal-scanning endoscopic OCT system (described in Section 2.3). Here we did not inflate the balloon, so the vascular phantom is in a static state, i.e., no displacement or strain. In this case, if the DIC or optical flow algorithm is applied to two neighboring frames in the sequence, the resulting displacement is that due to the NURD.
Figure 8 demonstrates the image sequence of the vascular phantom before and after the NURD correction. Hereafter we demonstrate the results of using our transformer-based learning architecture [23] unless stated otherwise. The upper row shows the axial maximum value projection (MVP) of the sequence, and the lower row gives representative B-scans. Their positions in the sequence are labeled by the dashed lines in the left column. As shown in this figure, before the NURD correction, the structural features drift along the circumferential direction during the acquisition of the image sequence. This phenomenon can be corrected via the learning-based algorithms. For the representative B-scans in the lower row, their orientation is altered during the NURD correction. In addition, the parallelism of the stripe features in the axial MVP images confirms the superiority of the model trained with the real distortion vectors. We then employed the neighboring B-scans in the lower row of Fig. 8 to calculate the circumferential and radial displacements between them. Because no mechanical stimulus was applied during their acquisition, the calculated displacement should be zero if there is no NURD. We compare the results of DIC and optical flow as demonstrated in Fig. 9. The colors in the graphs correspond to displacements in pixels. The color bar on the right gives the quantitative mapping between them. For the radial displacements, red (positive values) represents outward expansion and blue (negative values) represents inward contraction. For the circumferential displacements, red (positive values) represents clockwise displacements and blue (negative values) represents counterclockwise displacements. The values in the graphs are the average displacements. As shown in the figure, the displacements calculated via the DIC and optical flow algorithms are consistent. Before the NURD correction, there is a pronounced presence of circumferential displacements (an average of 14.83 pixels from the DIC and 14.91 pixels from the optical flow), while the radial displacement is relatively small (an average of 0.32 pixels from the DIC and 1.04 pixels from the optical flow). This is consistent with the NURD observed during proximal-scanning endoscopic OCT imaging [50]. After the NURD correction, the circumferential displacements are significantly reduced to less than 2 pixels, regardless of the types of distortion vectors and the displacement calculation methods. The NURD-induced circumferential displacements are further minimized using our real distortion vectors. On the other hand, the minimization of the NURD-induced radial displacements also benefits from the learning-based correction, probably because the displacements are estimated from 2D speckle patterns.

Comparison of NURD correction methods
To demonstrate the superiority of our method in enabling elastography computations, we compared its ability to minimize the NURD-induced displacements with that of two representative NURD correction methods [15,24]. We employed the same experimental setting as in Section 4.2. The results are shown in Fig. 10. For simplicity, we only give the results of the circumferential displacements here, which are more heavily contaminated by NURD. Our method is the combination of the transformer-based learning architecture we developed before [23] and the real distortion vectors (for training NURD correction models) proposed in this work. FT refers to the feature-tracking-based NURD correction method proposed in [24], which first uses the SURF algorithm [51] for feature extraction and then employs the KLT algorithm [52] for feature tracking and registration. De-NURD refers to the CNN-based NURD correction method proposed in [15], which combines the CNN-based learning architecture illustrated in Fig. 2(a) with pseudo distortion vectors for training NURD correction models. As shown in the figure, both the FT and De-NURD methods reduce the NURD-induced displacements. The De-NURD method achieves smaller average displacement values. However, its displacement estimations show opposite elastographic states. This suggests that not only the NURD itself but also inaccurate NURD corrections can lead to erroneous elastography results. Compared with these two methods, our method further minimizes the NURD-induced displacements, which helps to estimate the elastography parameters accurately.

Proximal-scanning endoscopic OCE
After validating the contribution of the NURD correction to the elastography displacement calculation, we tested its effectiveness in our proximal-scanning endoscopic OCE using the atherosclerotic vascular phantoms described in Section 3.1. Because the plaque-mimic regions have a Young's modulus of 15.5 MPa, which is much stiffer than the surrounding vessel material (a Young's modulus of 350 kPa), the displacement and strain in the plaque-mimic region will be smaller than those in the surrounding regions when balloon pressure is applied, which allows regions of different stiffness to be differentiated.
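As a rough order-of-magnitude illustration of this contrast, assume an idealized uniaxial, linear-elastic response in which both materials experience the same stress $\sigma$. This is a simplifying assumption: the real phantom has a non-uniform stress field, so the measured contrast will differ.

```latex
\varepsilon = \frac{\sigma}{E}
\quad\Longrightarrow\quad
\frac{\varepsilon_{\text{vessel}}}{\varepsilon_{\text{plaque}}}
= \frac{E_{\text{plaque}}}{E_{\text{vessel}}}
= \frac{15.5\ \text{MPa}}{0.35\ \text{MPa}} \approx 44
```

Under this idealization, the strain in the vessel material would be roughly 44 times larger than in the plaque-mimic material, which is why the plaque-mimic region is expected to appear with near-zero strain.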
Figure 11 shows the displacement and strain distributions of the phantom calculated from the reference frame and a deformed frame obtained while applying the balloon air pressure. Because the results of the DIC and optical flow algorithms are consistent, we only give the displacement and strain results from the DIC. On the reference frame in the upper left corner of the figure, we mark the plaque-mimic region with white lines. The positive values (red) in the strain maps refer to the expansion of materials, and the negative values (blue) refer to the compression of materials. In the radial direction, the plaque-mimic region has displacement and strain values close to zero. Because the balloon pressure is directed outward from the inner wall of the vascular phantom, the stiff plaque areas prevent the pressure from being transmitted to the soft vascular areas behind them, so the displacement and strain in these areas are also close to zero. Although the NURD has limited influence in the radial direction, applying the correction model still helps to characterize the plaque-mimic regions more accurately. For the circumferential positions that do not contain the plaque-mimic material, the displacement values are positive (red) and decrease along the radial direction, which indicates that the material is pushed outward by the balloon and the displacement decreases from inside to outside. Their strain values, on the other hand, are negative (blue), which indicates that the material is squeezed as the balloon inflates. In the circumferential direction, since the lumen of the vascular phantom is not regularly round, the force exerted by the balloon is not uniform, so the displacement and strain values vary along the circumferential direction. This phenomenon is clearly visible in the OCT sequence recorded during the inflation of the balloon (see Visualization 1). Note that the circumferential displacement values before the NURD correction are all negative (blue), which implies that the entire phantom is displaced counterclockwise. This is not consistent with reality and further verifies the necessity of the NURD correction.
Using the visualization of the dynamic response of the phantom during the balloon inflation, we further demonstrate the capabilities of proximal-scanning endoscopic OCE. As shown in Fig. 12, we performed the elastography calculations between the reference frame and four deformed frames acquired successively during the inflation of the balloon. These frames were acquired with our proximal-scanning endoscopic OCT working in the BM mode, and the time interval between them was about 0.2 seconds. The second through fifth rows show the calculated results for the radial displacement, circumferential displacement, radial strain, and circumferential strain, respectively. As the balloon expands, the radial outward displacement gradually increases except in the plaque-mimic region, and the corresponding radial strain shows that the material of the vascular phantom is gradually compressed. The circumferential displacement is not uniformly distributed because of the irregular shape of the lumen of the vascular phantom, but it also increases progressively as the balloon expands. The corresponding circumferential strain shows a similar trend.
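The relationship between the displacement and strain maps described above (outward displacement that decays with depth yields negative, i.e. compressive, radial strain) can be sketched as a spatial derivative of the displacement field. The following is a minimal illustration, assuming displacement maps on a polar pixel grid of shape (A-lines, depth). It treats the polar image as a regular grid and neglects the curvature term of the exact polar strain definition; `strain_maps` and its parameters are hypothetical names, not our implementation.

```python
import numpy as np


def strain_maps(u_r, u_c, dr=1.0, dc=1.0):
    """Pixel-space normal strains from displacement maps of shape (n_alines, n_depth).

    u_r: radial displacement map, u_c: circumferential displacement map.
    dr, dc: grid spacing along depth and along the A-line direction.
    ASSUMPTION: the polar image is treated as a regular grid, so the
    u_r / r curvature term of the exact circumferential strain is ignored.
    """
    # Radial strain: derivative of radial displacement along depth (axis 1).
    radial_strain = np.gradient(u_r, dr, axis=1)
    # Circumferential strain: derivative of circumferential displacement
    # along the A-line direction (axis 0).
    circumferential_strain = np.gradient(u_c, dc, axis=0)
    return radial_strain, circumferential_strain
```

With a positive radial displacement that decreases linearly from the inner wall outward, the radial strain returned by this sketch is negative everywhere, matching the sign convention of the strain maps (blue for compression).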

Discussion
In this study, we demonstrate that learning-based NURD correction can facilitate the development of proximal-scanning endoscopic OCE, which has important applications in the clinical diagnosis and treatment of vascular, esophageal, gastrointestinal, and other human lumens [19-21,24]. Benefiting from its high resolution, endoscopic OCE can be used to assess the biomechanical properties of tiny lesions in the lumen, which in turn can aid clinical decision-making, such as determining the vulnerability of atherosclerotic plaques [53]. In addition, based on the difference in stiffness between normal and diseased tissues, such as tumor tissues that usually have high stiffness [8], endoscopic OCE can be used as a novel means of in situ, label-free biopsy to enhance the efficiency of diagnosis and treatment. Endoscopic OCE can also provide a more accurate assessment of interventional procedures for both conventional and newer endoluminal implantable instruments, such as drug-eluting stents and balloons [54]. Compared to structural OCT information alone, OCE can provide a quantitative description of the mechanical properties of the material, the compliance of the tissue, and the degree of fit between them. If we consider the tissue deformation caused by minimally invasive interventions such as lasers, radiofrequency, and ultrasound as the mechanical stimulus required for elastography, endoscopic OCE can be used to monitor these processes and enhance the precision of the treatment [24,55]. In addition, our method works with proximal-scanning endoscopic OCT systems and therefore avoids the significant increase in the cost of clinical applications associated with distal-scanning probes [22]. Moreover, our approach does not involve hardware modifications and can therefore be adapted to commercially available endoscopic and cardiovascular OCT systems.
Technically, we propose a method to extract real distortion vectors from a specific endoscopic OCT system, which has been demonstrated to improve the accuracy of NURD correction models under both CNN- and transformer-based learning architectures. This advancement is crucial because it removes the reliance on pseudo distortion vectors [15,23], which are typically derived from publicly available datasets and may not accurately represent the specific characteristics of an endoscopic OCT system. We have also shown that our method can measure 2D displacement/strain distribution in both radial and circumferential directions, offering a significant advantage over existing endoscopic OCE methods that are limited to radial measurements [18-21]. This capability allows for a more comprehensive assessment of tissue elasticity, which is particularly beneficial for the detection and grading of diseases such as atherosclerosis, where the mechanical properties of tissues can vary significantly in different directions. Furthermore, we introduced a novel approach to endoscopic OCE by combining a proximal-scanning microprobe with balloon air pressure as a mechanical stimulus. This method simplifies the process by using a balloon catheter, which is a common tool in percutaneous interventional procedures, and avoids the need for complex integration of optical and acoustical components or synchronization with imaging speed [18,19]. Our experimental results, validated on atherosclerotic vascular phantoms, confirm that our NURD correction method provides superior imaging stability for calculating intensity-based elastography. This achievement is a significant step towards the clinical application of endoscopic OCE, as it allows for the effective differentiation between areas of varying stiffness within biological tissues.
Despite the many technical innovations and application potentials mentioned above, the present work still has multiple limitations: (1) The effectiveness of our proximal-scanning endoscopic OCE needs to be further validated on ex vivo biological tissues and in vivo animals. In this paper, we used atherosclerotic vascular phantoms to validate the contribution of NURD correction to accurately extract displacement and strain for elastography, and to demonstrate the ability of our method to differentiate between atherosclerotic plaque and normal vascular tissues. Although our phantom has strived to mimic the optical and mechanical properties of real biological tissues, it is difficult to take into account the effects of cellular viability, spontaneous body movements (e.g., heartbeat), and the curvature and possible friction during in vivo catheter use, which may be random or unpredictable in nature. We will try to perform atherosclerosis modeling in mice and rabbits in the future in order to validate the effectiveness of our method and bring it to the clinic.
(2) Our method remains to be validated with other types of mechanical stimulus (e.g., ultrasound shear waves) and with dynamic elastography. The use of balloon air pressure as a mechanical stimulus in our endoscopic OCE method is a novel approach; however, it also introduces potential limitations. The pressure application may not be uniform across all tissues, particularly in regions with complex geometries or stiff inclusions, such as atherosclerotic plaques. This could lead to variations in the measured displacement and strain values, which may need to be accounted for in the interpretation of elastography results. We are going to develop endoscopic OCT imaging probes that incorporate acoustic stimuli, laser thermal stimuli, etc. to promote the integration of interventional diagnosis and treatment.
(3) The processing speed of the elastography calculations, such as the DIC and optical flow algorithms, needs to be accelerated for real-time applications. As mentioned above, currently, these algorithms require a processing time of >100 ms. In the case of OCT-guided interventions, this delay could hinder the surgeon's ability to make immediate decisions based on tissue elasticity information. Several strategies can be employed to address this limitation. First, optimizing the existing algorithms through parallel computing techniques and efficient numerical methods can help reduce computation times. Second, exploring alternative, less computationally demanding methods or approximations that maintain a reasonable level of accuracy could be beneficial for real-time applications. Lastly, the development of dedicated hardware or application-specific integrated circuits (ASICs) tailored for elastography calculations could potentially offer the speed and efficiency required for real-time processing.

Conclusions
We have demonstrated that learning-based NURD correction advances the development of proximal-scanning endoscopic OCE. We have proposed a pipeline to extract real distortion vectors from a specific endoscopic OCT system, which enhances the accuracy of NURD correction models when applied to CNN- and transformer-based architectures. Our method has enabled the measurement of 2D displacement/strain distribution in radial and circumferential directions, providing a comprehensive assessment of tissue elasticity. By using balloon air pressure as a mechanical stimulus for elastography calculations, we have validated our method on atherosclerotic vascular phantoms. The results show that it can effectively differentiate tissue stiffness and is promising for many clinical applications such as on-site diagnosis, intraoperative monitoring, and therapeutic evaluation.

Fig. 1. (a) Illustration of the overall pipeline of real distortion vector extraction and application. (b) Detailed schematic of obtaining the real distortion vectors.
Figure 2(a) illustrates the CNN-based NURD correction architecture proposed in [15]. It has two branches. The first branch is used to estimate A-line-level shift vectors. The correlation matrix of the A-line sequences between the original frame (the (n-1)-th frame in the inference) and the distorted frame (the n-th frame in the inference) is first computed, and then the CNN is used to find a vector in the correlation matrix as the optimal path, which represents the shift of each A-line. The other branch is used to estimate the image-based group rotation values. The inputs to this CNN branch consist of the original frame (the corrected (n-1)-th frame in inference), the distorted frame (the n-th frame in inference), and the masked original frame (the first frame in reference). In the inference stage, the A-line-based shift vector and the image-based group rotation values are fused to estimate the distortion vector of the corrected n-th frame.
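To make the first branch more concrete, the sketch below builds an A-line correlation matrix between two polar frames and reads a shift for each A-line from it. It is a minimal illustration under stated assumptions: frames are NumPy arrays of shape (A-lines, depth), the search window `max_shift` is a hypothetical parameter, and a plain per-row argmax stands in for the CNN path finder of [15], which is not reproduced here.

```python
import numpy as np


def aline_correlation_matrix(ref, mov, max_shift=8):
    """Correlation between each A-line of `mov` and shifted A-lines of `ref`.

    ref, mov: polar frames of shape (n_alines, n_depth).
    Returns (corr, shifts) where corr[i, k] is the Pearson correlation between
    A-line i of `mov` and A-line (i - shifts[k]) of `ref` (circular indexing).
    """
    def normalize(frame):
        # Zero-mean, unit-norm A-lines so that dot products are Pearson correlations.
        centered = frame - frame.mean(axis=1, keepdims=True)
        return centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-12)

    ref_n, mov_n = normalize(ref), normalize(mov)
    shifts = np.arange(-max_shift, max_shift + 1)
    # np.roll(ref_n, s, axis=0)[i] equals ref_n[i - s].
    corr = np.stack(
        [(mov_n * np.roll(ref_n, s, axis=0)).sum(axis=1) for s in shifts], axis=1
    )
    return corr, shifts


def shift_vector(corr, shifts):
    """Per-A-line shift taken as the best-correlated candidate.

    ASSUMPTION: a greedy argmax replaces the learned optimal-path search.
    """
    return shifts[np.argmax(corr, axis=1)]
```

A greedy argmax ignores the smoothness of the path across neighboring A-lines, which is one reason a learned or regularized path search is preferable on noisy speckle data.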

Figure 2(b) illustrates the transformer-based NURD correction architecture we proposed recently [23]. This architecture is inspired by the self-attention mechanism in natural language processing and computer vision [33]. By exploiting its ability to model long-distance dependencies, we can directly obtain spatial correlations between OCT A-lines at arbitrary distances, thus accelerating NURD corrections. During the training phase, original/distorted frame pairs are fed into the stacked cross-attention network to predict two distortion vectors: one is the distortion applied to the original frames to form the distorted frames (known ground truth); the other is the distortion applied to the distorted frames to form the original frames. With these, the original frames can be converted into distorted frames and vice versa. In the inference phase, the output distortion vector 1 is used to correct the NURD of the n-th frame.
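The core operation behind this architecture, attention between A-lines of two frames at arbitrary distances, can be sketched with a single scaled dot-product cross-attention step. This is only an illustration of the mechanism: it uses raw A-lines as queries, keys, and values with no learned projections, no stacking, and no training, all of which are simplifying assumptions rather than the network in [23].

```python
import numpy as np


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) A-lines of the distorted frame.
    keys, values: (n_k, d) A-lines of the original frame.
    Returns (attended, weights) with weights of shape (n_q, n_k);
    each row of weights is a softmax distribution over original A-lines.
    ASSUMPTION: identity projections are used instead of learned ones.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

Each row of the weight matrix relates one distorted A-line to every original A-line in one step, regardless of how far apart they are, which is the long-distance property referred to above.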

Fig. 4. Atherosclerotic vascular phantoms used for validating our proximal-scanning endoscopic OCE. (a) Photographs of the phantoms, including external and cross-sectional views. (b) An OCT B-scan of the phantom acquired using our homemade system. The plaque-mimic regions are marked with star symbols.

Fig. 6. Qualitative comparison of the NURD correction using the models trained with the real and pseudo distortion vectors. (a)-(c) are 3D renderings of a quartz tube before and after the correction. (d)-(f) are cross-sectional views in which three B-scans, taken at a sampling interval of 10, are each mapped to one channel of an RGB image.

Fig. 7. Quantitative comparison of the NURD correction using the transformer- and CNN-based models trained with the real and pseudo distortion vectors. (a) and (c) are the correction errors (in pixels) of an image sequence via the real (green) and pseudo (red) distortion vectors. (b) and (d) are the corresponding statistical box plots.

Fig. 8. Static image sequences used for evaluating the NURD correction for elastography. The upper row shows the axial maximum value projection of the sequence, and the lower row gives representative B-scans. Their positions in the sequence are labeled by the dashed lines in the left column.

Fig. 9. Displacement results between neighboring frames in the absence of mechanical stimuli using the DIC and optical flow algorithms. The colors in the graphs correspond to displacements in pixels. The color bar on the right gives the quantitative mapping between them. For the radial displacements, red (positive values) represents outward expansion and blue (negative values) represents inward contraction. For the circumferential displacements, red (positive values) represents clockwise displacements and blue (negative values) represents counterclockwise displacements. The values in the graphs are the average displacements.

Fig. 10. Comparison of NURD correction methods for elastography computations. The upper and lower rows show the displacement results using the DIC and optical flow algorithms, respectively. FT: the feature-tracking-based NURD correction method proposed in [24]. De-NURD: the CNN-based NURD correction method proposed in [15].

Fig. 11. Displacement and strain distributions of the atherosclerotic vascular phantom before and after the NURD correction. The plaque-mimic region is marked with white lines on the reference frame in the upper left corner. The positive values (red) in the strain maps refer to the expansion of materials, and the negative values (blue) refer to the compression of materials.

Fig. 12. Visualization of the dynamic response of the phantom during balloon inflation. The elastography calculations were performed between the reference frame and four deformed frames acquired consecutively. The time interval between them is about 0.2 seconds. The second through fifth rows show the calculated results for the radial displacement, circumferential displacement, radial strain, and circumferential strain, respectively.

Funding.
National Natural Science Foundation of China (62105198).