Motion-compensated noninvasive periodontal health monitoring using handheld and motor-based photoacoustic-ultrasound imaging systems

Simultaneous visualization of the teeth and periodontium is of significant clinical interest for image-based monitoring of periodontal health. We recently reported the application of a dual-modality photoacoustic-ultrasound (PA-US) imaging system for resolving periodontal anatomy and periodontal pocket depths in humans. That work used a linear array transducer attached to a stepper motor to generate 3D images from 2D cross-sections. Here, we show that a modality-independent registration technique can be applied to motor-based and handheld scans to reduce the burden of operator skill and subject immobilization, showing potential for handheld clinical periodontal imaging.


Introduction
Periodontitis affects nearly 50% of Americans and exerts systemic effects on the body [1]. Chronic inflammation from periodontitis has been implicated as a risk factor for cardiovascular diseases [2][3][4], cancer [5], and dementia [6]. Thus, tools to diagnose periodontal disease at an early stage and new measurement techniques are urgently needed [7].
Ultrasound (US) imaging systems with frequencies ≤20 MHz have previously been used to image facial crestal bone or the cementoenamel junction but lack spatial resolution and contrast [8,9]. The integration of optical excitation via photoacoustic imaging significantly increases the potential applications in oral health [10]. We recently utilized a dual-modality photoacoustic-ultrasound (PA-US) system to non-invasively image the pocket depth and geometry [11,12]. This approach offered 0.01 mm precision and provided a full contour of the pocket. Even though promising results were obtained, larger-scale clinical translation of this approach is limited by shaking artifacts caused by motion from the subject and operator. In prior work, a medical head immobilizer was used to stabilize the subject, but this failed to completely eliminate motion artifacts while also causing some discomfort to subjects [11,12].
To address these issues, we implement here an image registration technique for deshaking periodontal PA-US images. Image registration methods try to align two or more images of a specific scene. In general, there are two methods for registration: interactive and automated [13]. In the interactive method, a set of landmarks is manually selected and used to estimate the transformation models between the two images. This method requires an experienced and accurate operator. The work can be tedious, repetitive, and time-consuming. Consequently, automated registration methods are essential, and various methods have been proposed in this field [14] including the Harris-Laplacian, scale invariant feature transform (SIFT), speeded up robust features (SURF), maximally stable extremal regions (MSER), and modality independent neighborhood descriptor (MIND).
Automated methods typically have at least three steps: feature detection, feature matching, and model estimation [15]. Most of the differences in image registration algorithms are related to feature detection. The Harris-Laplacian algorithm looks for corners as features [16]. It is invariant to scale changes and rotation. The SIFT algorithm extracts key points through a pyramid structure that is invariant to scale, rotation, and brightness [17]. However, the number of these points is undesirable, and it takes a long time to create the vector describing the features. SURF is also invariant to scale and rotation, and its performance is acceptable in terms of speed [16]. MSER uses regional features and is independent of geometric and radiometric changes [18]. MIND [19] is based on a concept called self-similarity. It can be used for linear and deformable registration processes. This algorithm uses edges, corner points, and texture for features. Of these algorithms, we applied MIND for deshaking because it provides more sensitivity to the structural information of images [19].
In this paper, we report on the results obtained with stepper-motor and handheld PA-US imaging for noninvasive profiling of dental and periodontal anatomy. Our approach uses an image registration method based on MIND to correct for shaking artifacts produced by irregular hand movements during handheld scanning and subject motion during motor-based scanning. Phantom, ex vivo, and human data were collected and compared for qualitative and quantitative evaluation. Our results demonstrate that shaking artifacts in 3D PA-US images of the periodontal anatomy, enamel pigmentation, and pocket depth can be algorithmically removed to allow for accurate measurement and visualization of periodontal features in clinically relevant imaging scenarios.
Fig. 1. (a) A laser-integrated photoacoustic-ultrasound system (Vevo LAZR, VisualSonics Inc.), (b) an LED-integrated photoacoustic-ultrasound system (AcousticX, Cyberdyne Inc.), or (c) an ultrasound-only system (Vantage, Verasonics, Inc.) was used to collect 3D images by scanning each system's linear array transducer via stepper motor and by hand. (d) Samples were first laterally scanned by a stepper motor as commonly performed. To induce motion artifacts in the motor-based imaging, the scan was repeated but the motor was perturbed by hand with lateral force at various intensities, mimicking an imaging setup with imperfect subject-scanner immobilization. Samples were then scanned by hand to mimic the most convenient clinical imaging scenario.
The goal of this study was to validate the correction of motion artifacts for PA-US imaging conditions in which a common stepper motor was used or 3D sweeping was performed completely by hand. Freehand scanning is common and requires shaking corrections, as does motor-based scanning when subject immobilization is imperfect [12]. Thus, we experimentally induced shaking motion artifacts by manually perturbing the stepper motor with periodic lateral force as the motor advanced [Fig. 1(d)]; this also caused shaking in the elevation and axial directions. This was done with varying intensities while maintaining similar periodicity to evaluate the tolerance of the algorithm to shaking amplitude. We also performed fully handheld scanning in which the operator's elbow was stabilized but was otherwise subject to imperfect motion (i.e., not strictly 1-dimensional). The operator mimicked the scanning speed of the motor during imaging.
Fresh swine jaws were acquired from an abattoir and prepared as previously described [11]. Briefly, the porcine head was sliced sagittally and the mandible was separated from the maxilla with a saw. The teeth and periodontal tissues were used as provided and immersed in a water bath for imaging [Fig. 2(f)].

Human periodontal and epidermal imaging
The study enrolled one male and one female healthy adult. These subjects provided written informed consent and all work was conducted with approval from the UCSD Institutional Review Board and was in accordance with the ethical guidelines for human subject research set forth by the Helsinki Declaration of 1975. To evaluate a non-linear imaging target, a fine-tip permanent marker was used to inscribe the text "TU" on the supinated forearm of the male subject [Fig. 2(e)]; this is called the "TU experiment" throughout the paper. The forearm was immersed in a water bath and imaging was performed as described above.
PA-US images of the periodontium were collected from a healthy female subject as previously described [12]. Briefly, the subject was seated in front of the laser-based PA-US system and the subject's head was immobilized upon a chin-level platform using a medical grade head immobilizer. The 40-MHz transducer was positioned at the gingiva perpendicular to the long axes of the central mandibular incisors. Sterile US gel was used for coupling. The stepper motor was then initiated to collect frames spanning the gingiva to the apical edge of teeth 24 and 25 (universal numbering system [see Fig. 2(g)]).

Image registration
In this paper, the reconstructed images are deshaken using the method introduced in [19], where computed tomography (CT) and magnetic resonance imaging (MRI) images of the lungs were registered. It is based on the self-similarity concept used for noise removal [20]. The noise removal equation is as follows:
$$NL(i, v) = \sum_{j \in N} \omega(i, j)\, v(j) \tag{1}$$
where the sum runs over the pixels j in the neighborhood N of pixel i, ω is the weight of each pixel (voxel in 3D) and the criterion for self-similarity, v is the noisy image, and NL is the denoised image [19]. First, a descriptor insensitive to the imaging modality (e.g., CT and MRI) and noise is defined as follows:
$$\mathrm{MIND}(I, x, r) = \frac{1}{n} \exp\left(-\frac{D_P(I, x, x + r)}{V(I, x)}\right), \quad r \in R \tag{2}$$
Here, R is a search area in which the distance between the patch (i.e., sub-image) I centered at x and another patch centered at x + r is denoted by D_P. The term n is a constant coefficient that normalizes the equation, V is the variance of the patch, and MIND is the modality independent neighborhood descriptor used for registration.
After finding the descriptor for each pixel of the images, the similarity term for each pixel in both patches (I and J) is calculated as follows:
$$S(x) = \frac{1}{|R|} \sum_{r \in R} \left| \mathrm{MIND}(I, x, r) - \mathrm{MIND}(J, x, r) \right| \tag{3}$$
The metric S can be adapted in any registration algorithm. To obtain better convergence, the Gauss-Newton optimization technique was used in this study. Readers are referred to [19,20] for more information. We refer to the deshaking technique as MIND throughout the rest of the paper.
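As a minimal sketch of the descriptor and similarity term described above, the following pure-Python example computes a simplified 2D MIND vector and the per-pixel dissimilarity S between two images. It is illustrative only and is not the implementation used in this study: the four-neighbor search region, the patch half-width `p`, the edge clamping, the stabilizing constant `eps`, and all function names are assumptions made here, and the Gauss-Newton optimization is omitted.

```python
import math

# Assumed search region R: the four nearest neighbours (a minimal 2D choice).
SEARCH_REGION = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _pixel(img, y, x):
    """Read a pixel with edge clamping so patches near the border stay valid."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def patch_distance(img, y, x, dy, dx, p=1):
    """D_P: sum of squared differences between the patches centred at (y, x) and (y+dy, x+dx)."""
    total = 0.0
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            d = _pixel(img, y + j, x + i) - _pixel(img, y + dy + j, x + dx + i)
            total += d * d
    return total


def mind_descriptor(img, y, x, p=1, eps=1e-6):
    """Simplified MIND vector at (y, x): one entry per offset r in the search region."""
    dists = [patch_distance(img, y, x, dy, dx, p) for dy, dx in SEARCH_REGION]
    variance = sum(dists) / len(dists) + eps  # local variance estimate V(I, x)
    raw = [math.exp(-d / variance) for d in dists]
    n = max(raw)  # normalisation constant so the largest entry equals 1
    return [v / n for v in raw]


def mind_dissimilarity(img_i, img_j, y, x, p=1):
    """S(x): mean absolute difference between the two descriptors (0 means identical structure)."""
    a = mind_descriptor(img_i, y, x, p)
    b = mind_descriptor(img_j, y, x, p)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


# A vertical edge, its intensity-inverted copy, and a horizontal edge.
vertical = [[0, 0, 10, 10, 10] for _ in range(5)]
inverted = [[255 - v for v in row] for row in vertical]
horizontal = [list(col) for col in zip(*vertical)]

same_structure = mind_dissimilarity(vertical, inverted, 2, 2)
different_structure = mind_dissimilarity(vertical, horizontal, 2, 2)
```

In this toy case the inverted image yields the same descriptor as the original because only patch differences enter the computation, which is the sense in which the descriptor is insensitive to how each modality maps structure to brightness.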

Phantom experiments
The maximum intensity projections (MIPs) of the reconstructed images using the depth phantom [see Figs. 2(a), 2(b)] are shown in Fig. 3. The intensity decreases from target 1 to target 5 [see Fig. 3(a)] due to lower laser fluence at depth. The effects of the shaking caused by the tapping (at three levels) and movement of the hand can be seen in Figs. 3(b)-3(e); the peak-to-peak distance (PPD) of the most intense target in Figs. 3(b)-3(d) is 0.7 mm, 1.9 mm, and 3.9 mm, respectively. MIND fails to correct the shaking in Fig. 3(h) because the level of shaking is too high [see Fig. 3(d)]. For the other two levels and the handheld sweeping, MIND compensates for the shaking and leads to an image visually close to the ground truth [Fig. 3(a)]. We expect no shaking artifacts in the ground truth because the phantom is stable and the minor shaking of the motor in such an ideal condition can be ignored.
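To make the two quantities used above concrete, here is a minimal sketch of how a MIP and a PPD could be computed from a stack of B-mode frames. The text does not spell out how the PPD was measured, so this example assumes it is the range of the brightest target's lateral position across frames; the array layout `volume[frame][depth][lateral]`, the function names, and the pixel pitch are likewise hypothetical.

```python
def max_intensity_projection(volume):
    """Collapse each frame along depth: volume[frame][depth][lateral] -> mip[frame][lateral]."""
    return [[max(column) for column in zip(*frame)] for frame in volume]


def peak_to_peak_distance(mip, pixel_pitch_mm):
    """Assumed PPD: range of the brightest pixel's lateral position over all frames, in mm."""
    positions = [row.index(max(row)) for row in mip]
    return (max(positions) - min(positions)) * pixel_pitch_mm


def make_frame(lateral_pos, depth=3, width=8):
    """Synthetic frame with a single bright target at the given lateral position."""
    frame = [[0] * width for _ in range(depth)]
    frame[1][lateral_pos] = 100
    return frame


# The target wanders laterally between frames, as it would in a shaky scan.
volume = [make_frame(pos) for pos in [3, 5, 2, 4]]
mip = max_intensity_projection(volume)
ppd_mm = peak_to_peak_distance(mip, pixel_pitch_mm=0.1)
```

With the assumed 0.1 mm pitch, the target moves between lateral indices 2 and 5, so the sketch reports a PPD of about 0.3 mm.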
The structural similarity index (SSIM) was used to quantitatively evaluate the performance of MIND [21]. SSIM is a metric for evaluating the similarity between two images. It works structurally and does not perform any point-to-point comparison. The maximum value of this index is 1 and occurs when the two images are exactly the same. This metric is available through the ssim function in MATLAB. The calculated SSIMs for all the tubes in Fig. 4(a) are higher than those in Fig. 4(b) because the level of shaking in Fig. 3(b) is lower than that in Fig. 3(c). MIND corrects the movements and results in an SSIM of about 0.8 on average for both shaking levels and the handheld scenario [see Fig. 4(a)-(c)]. The high-shaking image presented in Fig. 3(d) was not evaluated because MIND failed to correct the shakiness [Fig. 3(h)]. We also evaluated the contrast in Fig. 3. The ratio of brightness between a region on the most intense line and the background is 15.3 for both the shaky and processed MIP images.
Fig. 3. The photoacoustic maximum intensity projection (MIP) of the depth phantom generated by (a-d) motor with no, low, medium, and high shaking, respectively, as well as (e) hand sweeping. (f-i) The processed (i.e., deshaken) MIPs of (b-e), respectively. The processing was performed on the B-mode images and the MIPs were then generated. The LED-based imaging system was used for data acquisition.
The MIPs of the lateral resolution phantom [see Fig. 2(c), 2(d)] are presented in Fig. 6. Although low, medium, and high levels of shaking were used in both Fig. 3 and Fig. 6, the forces applied to the motor were not necessarily the same in these figures. The PPD caused by the shaking is about 1 mm, 1.7 mm, and 2.4 mm for Fig. 6(b)-6(d), respectively. The first two lines from the right side [see Fig. 2(d)] were not detected in the MIP images due to the resolution of the imaging system.
The intensity of all the targets is the same because the same laser fluence is incident on targets positioned at the same depth. Shaking causes the tubes to spatially mix together [see Fig. 6(b)-6(d); more specifically, the first two lines on the right side]. However, MIND still compensates for the motion artifacts and leads to MIPs [see Fig. 6(e)-6(g)] that are structurally close to the ground truth [Fig. 6(a)]. Note that even though high shaking was applied in Fig. 6(d), its PPD is still 1.5 mm lower than that of Fig. 3(d), which is why a reasonable image is obtained in Fig. 6(g). Even though the structure of the lines is preserved with MIND, there are some discontinuities [see the blue arrows] in the deshaken images [also visible in Fig. 6(a) at the scanning distance of 2 mm]. These issues could be due to the combined effects of the stepper motor shaking, inhomogeneities in the fluence, and lower performance of MIND in those regions.
Fig. 6. The photoacoustic maximum intensity projection (MIP) of the lateral resolution phantom generated by (a-d) motor with no, low, medium, and high shaking, respectively. (e-g) The processed MIPs of (b-d), respectively. The processing was performed on the B-mode images and then the MIPs were generated. The LED-based imaging system was used for data acquisition.
The MIPs of the TU experiment are provided in Fig. 7, where the shaking reduces the image quality in Fig. 7(b), but the deshaken image [Fig. 7(c)] has a quality comparable with Fig. 7(a), where no shaking was applied. Figure 7(b) shows that the shaking causes up to 1 mm error in the lateral and scanning directions. The lateral size of some structures [such as those shown with the blue arrows in Fig. 7(a)] is not even measurable in Fig. 7(b). MIND reduces the error to about 0.1 mm and 0.25 mm in the lateral and scanning directions, respectively [compare Fig. 7(a) and (c)]. Figure 7(a) is slightly tilted due to the movement and relative direction of handheld scanning in our experiment. This effect becomes more noticeable after deshaking in Figs. 7(b), 7(c).
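As a rough illustration of the SSIM scores reported in this section, the sketch below evaluates the standard SSIM formula once over a whole image. This is a simplification chosen here for brevity: MATLAB's ssim computes the index over local Gaussian-weighted windows and averages the result, so its values will generally differ from this global version. The function name and test images are hypothetical.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM over the whole image (no sliding window, unlike MATLAB's ssim)."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# A straight line target and a "shaky" copy whose line zigzags between frames.
straight = [[255 if x == 3 else 0 for x in range(7)] for _ in range(6)]
shaky = [[255 if x == (2 if y % 2 else 4) else 0 for x in range(7)] for y in range(6)]

score_identical = global_ssim(straight, straight)
score_shaky = global_ssim(straight, shaky)
```

Identical images score 1, while the zigzag copy scores lower even though both images have the same mean brightness, which mirrors how shaking lowers SSIM relative to the ground truth without changing overall intensity.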

Ex vivo swine experiments
The MIPs of the ultrasound images generated by the ex vivo swine experiments are provided in Fig. 8. The effects of motor shaking are visible in Figs. 8(b) and 8(c). MIND provides better and more accurate structural information (compare the green and red dashed ovals). Table 1 indicates that the SSIMs of the processed MIP images are higher than those of the shaky ones (the mean SSIM for the regions indicated by the green and red dashed ovals is presented).
To better understand the improvements, the sagittal planes of the MIP images [the red dashed line in Fig. 8(a)] are presented in Fig. 9, where the extent of enamel staining is measured by the photoacoustic imaging modality. The gray and red colormaps indicate the ultrasound and photoacoustic images, respectively. The image generated with the motor without shaking [Fig. 9(a)] was passed to MIND to evaluate the bias of the deshaking method. In the deshaken version [Fig. 9(a), second column], the image is smooth compared to Fig. 9(a), first column. However, the calculus depth is estimated at 4.9 mm in both images, which supports the accuracy of our deshaking method.
Fig. 7. The photoacoustic maximum intensity projection (MIP) of the TU experiment generated by (a) motor (no shaking), (b) handheld without processing, and (c) handheld with processing. The processing was performed on the B-mode images and then the MIPs were generated. The LED-based imaging system was used for data acquisition.
For the low-shaking dataset, there is no difference in the calculus depth measured by the two images [see Fig. 9(b)]. In the medium and high shaking levels, the calculus depth is over   Fig. 9].

In vivo human experiments
The results obtained for the in vivo experiment are presented in Fig. 10. In Fig. 10(a), no intentional motion artifact was applied. However, we expect to see motion artifacts [the dashed boxes in Fig. 10(a)] due to the movement of the subject (e.g., breathing and minor head movements). These artifacts are addressed in Fig. 10(b) with MIND [compare the boxes in Fig. 10(a) and Fig. 10(b)].
US and PA imaging modalities were used to detect the gingival margin and pocket depth, respectively [see Fig. 10(f) and its zoomed version]. Figures 10(d) and 10(e) show the sagittal cross-sections indicated with the blue and green dashed lines in Fig. 10(c), respectively, where the shaking reduces the image quality. Figures 10(f) and 10(g) show sagittal cross-sections of the processed MIPs where the images are deshaken, and the pocket depth is well estimated in agreement with our prior study [12]. Figure 11 shows the statistical brightness analysis conducted on R1 and R2 [see Fig. 10(a)]; R, P and S stand for region, processed and shaky, respectively. The mean brightness decreases by only about 3%, which could be due to displacement of pixels within the regions. The proposed deshaking method does not modify the brightness of pixels and only uses brightness as one of the features to look for similarities.
Fig. 10. In vivo results. Panels (e,g) are the processed versions of (d,f), respectively. The blue, black and red dashed boxes can be used for comparison of the structural information. The green box is used for statistical brightness analysis (R stands for region). The yellow arrows indicate the pocket depth. The US and PA images are shown in gray and hot colormaps, respectively. The Vevo imaging system was used here for data acquisition.
Fig. 11. The statistical brightness analysis for R1 and R2 shown in Fig. 10. R, P and S stand for region, processed and shaky, respectively. The US images were used for this comparison.

Discussion
The availability of a noninvasive imaging method to comprehensively profile oral anatomy is of great benefit for periodontal health monitoring. The feasibility of a dual-modality PA-US imaging system to this end was previously presented by our group [11,12]. In this follow-up study, we aimed to improve our imaging system. Different central frequencies will be used for different imaging scales, and the experiments described here were conducted with three probes with different central frequencies (i.e., 10, 20.2 and 40 MHz). Motion artifacts were induced by hand at different levels. An image registration algorithm based on MIND was used to deshake the images and provide more accurate structural information. The images of the depth phantom were deshaken with 80% similarity to the ground truth. Structural errors of about 1 mm were reduced to 0.1 mm and 0.25 mm in the lateral and scanning directions, respectively. Finally, the extent of enamel staining was measured with 0.1 mm error, and the in vivo results agreed with our prior study [12].
To deshake the images, the first image is defined as the reference, and image i+1 is registered onto image i. If a registration or alignment error occurs between images i and i+1, then it affects all subsequent steps. For applications with many images to be registered (e.g., our application), this makes accurate registration at each step critical. Here, the MIND algorithm was used mainly due to its simultaneous sensitivity to corners, textures, and edge features as well as its capability to perform deformable registration. Another advantage of this algorithm is that it operates regionally, which increases the processing speed and reduces the incidence of fundamental errors. The direction of the motion artifact does not influence MIND, as discussed in [19].
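The sequential scheme described above, in which each frame is registered onto its already-corrected predecessor, can be sketched as follows. To keep the example short, MIND-based deformable registration is replaced here with a much simpler stand-in: an exhaustive search for the integer lateral shift that minimizes the sum of squared differences between 1D profiles. The function names and the `search_radius` parameter are hypothetical.

```python
def shift_profile(profile, shift, fill=0.0):
    """Translate a 1D profile by an integer number of samples, padding with `fill`."""
    n = len(profile)
    return [profile[i - shift] if 0 <= i - shift < n else fill for i in range(n)]


def estimate_shift(reference, moving, search_radius):
    """Integer shift within +/- search_radius that best maps `moving` onto `reference` (SSD cost)."""
    best_shift, best_cost = 0, float("inf")
    for s in range(-search_radius, search_radius + 1):
        shifted = shift_profile(moving, s)
        cost = sum((r - m) ** 2 for r, m in zip(reference, shifted))
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def deshake(frames, search_radius=4):
    """Register frame i+1 onto the corrected frame i, starting from frame 0 as the reference."""
    corrected = [list(frames[0])]
    applied = [0]
    for frame in frames[1:]:
        s = estimate_shift(corrected[-1], frame, search_radius)
        corrected.append(shift_profile(frame, s))
        applied.append(s)
    return corrected, applied


def make_profile(peak_pos, length=20):
    """Synthetic lateral profile with a single bright target."""
    profile = [0.0] * length
    profile[peak_pos] = 1.0
    return profile


# The target sits at a different lateral position in every frame of the shaky scan.
shaky_frames = [make_profile(pos) for pos in [10, 12, 9, 11]]
aligned_frames, applied_shifts = deshake(shaky_frames, search_radius=4)
```

Because every frame is aligned to the corrected version of the frame before it, a wrong shift at one step would be inherited by all later frames, which is the error-propagation concern raised above. Displacements larger than the search radius cannot be recovered by this stand-in.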
The SSIM values calculated for the experiments conducted with the Vantage (F0 = 20.2 MHz) and AcousticX (F0 = 10 MHz) systems were almost the same (about 80%). This similarity can be used to generalize the error reduction obtained with MIND in AcousticX (0.1 mm and 0.25 mm in the lateral and scanning directions, respectively) to the Vantage system. The measurement error in the Vevo system (F0 = 40 MHz) was also reduced to 0.1 mm. Therefore, a measurement error of about 0.1 mm can be expected at all three investigated central frequencies. The performance of the proposed deshaking method might be independent of the central frequency of the probe, but further study is needed to confirm this.
Even though no shaking was applied in Fig. 9(a), MIND smooths some of the structural variations, especially in the region showing the calculus depth (PA signals). It is not clear to us whether this is due to minor internal shaking caused by the stepper motor or to bias in our deshaking technique. The structures indicated by the red dotted box are not smoothed in the second column of Fig. 9(a), so we believe the cause is the minor internal shaking of the stepper motor. This of course needs further investigation, but if the technique is biased, one solution could be to use machine learning and different datasets to teach MIND to avoid smoothing realistic structural variation. The extent of enamel staining measured in Fig. 9(a) does not change after applying MIND, which demonstrates the preservation of the structural information after deshaking.
The beamformer used for image formation and the contrast of the images, along with the signal-to-noise ratio (SNR), affect the performance of deshaking. In our study, a delay-and-sum (DAS) beamformer was used for image formation. More advanced image formation techniques [22][23][24][25][26] can be used to improve the contrast of the image and the performance of MIND, because MIND processes features locally and sidelobes might add unrealistic features. Machine learning and deep learning can further improve the performance of the deshaking by providing a better SNR [27,28]. The number of pixels used in the reconstruction of each image could also affect the performance of MIND. Our investigation (not presented in this paper) showed that a higher number reduced deshaking error.
In the ex vivo and in vivo experiments, the ultrasound probe was horizontal, and vertical scanning was conducted. To measure the staining, we could perform the imaging with a vertically held transducer, which would provide an artifact-free sagittal plane of the tooth. However, if a complete profile of the tooth is needed, then image deshaking is necessary to correctly align the sagittal planes and provide an artifact-free image. In our study, the reconstructed B-mode images were used as the input of our deshaking method. This makes our method suitable for coupling with commercial US/PA imaging systems without any need for RF data. This is an advantage because RF data is only available in research-based imaging systems, which are expensive to use in clinics. The proposed motion compensation technique could be used in other applications such as imaging spinal curvatures [29], wound staging [30], carotid imaging [31][32][33], and free-hand imaging systems in general [34,35], where a 1D ultrasound probe is held by hand (without any sensor to track the trajectory of the hand) to sweep the imaging medium and make 3D images.
Care must be taken to set the parameters of the deshaking method properly for different imaging scenarios. The PPD was a key parameter in our application and could be controlled by the level of force applied to the motor in our study. A higher PPD led to a higher possibility of failure. Our method failed in Fig. 3(h) mainly due to a large PPD (about 3.9 mm at the central frequency of 10 MHz). Given that the same measurement error of 0.1 mm was obtained at different frequencies (10, 20.2 and 40 MHz), a PPD of 3.9 mm would most probably lead to failure at other frequencies as well. Our evaluation showed that the maximum PPD that our deshaking method can handle is 3.6 mm.
One solution to this failure is to increase the search region [parameter R in Eq. (2)]. However, if a large R is selected, then more complex images such as those presented in the original work (see [19]) might fail due to non-rigid deformation, leading to poor image quality. Here, R was equal to 4, but using an R of 48 could still deshake Fig. 3(h) with a high similarity to the ground truth. Of course, all other results would be different as well.
A larger R also imposes a higher computational complexity. The boundary at which the MIND algorithm fails is not entirely clear and was also not discussed in prior work [19]. However, the clinicians using this approach have fine motor skills and will move the probe in a relatively straight trajectory. Thus, the expected tolerance of the hand will be well within the capacity of the proposed deshaking algorithm to improve the images; we have shown this in Figs. 3(e), 5(i) and Figs. 5(b), 5(d).
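To give a feel for how the cost of a larger search region could scale, the sketch below counts the candidate offsets r under one possible reading of R. It assumes, purely for illustration, that R denotes the half-width of a dense square search window in 2D; the text does not state how R is parameterized, and the function name is hypothetical.

```python
def dense_offsets(radius, dims=2):
    """Number of non-zero offsets r in a dense window of half-width `radius` (assumed layout)."""
    return (2 * radius + 1) ** dims - 1


offsets_r4 = dense_offsets(4)    # the value of R used in this study
offsets_r48 = dense_offsets(48)  # the larger value mentioned above
cost_ratio = offsets_r48 / offsets_r4
```

Under this assumption, moving from R = 4 to R = 48 raises the number of patch comparisons per pixel from 80 to 9408, roughly a 118-fold increase.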

Conclusion
In this follow-up study, we used an image registration technique to correctly align different B-mode PA-US images and create an artifact-free 2D profile of the tooth and its periodontium. The experimental results obtained with the depth phantom showed that 80% similarity to ground truth could be obtained. An error of about 1 mm in the TU experiment was reduced to 0.1 mm and 0.25 mm in the lateral and scanning directions, respectively. The results of the ex vivo experiment showed that the depth of calculus could be measured with 0.1 mm error. The deshaking technique shows potential for clinical collection of motion artifact-free oral PA-US images in both stepper motor and handheld configurations. This reduces the burden of technical skill on the operator and the need for stringent head immobilization.