Effective calibration of an endoscope to an optical tracking system for medical augmented reality

Background: We investigated methods of calibrating an endoscope to an optical tracking system (OTS) for high-accuracy augmented reality (AR)-based surgical navigation. We compared the possible calibration methods and suggest the best method in terms of accuracy and speed in a medical environment. Materials and methods: A calibration board with an attached OTS marker was used to acquire the pose data of the endoscope for the calibration. The transformation matrix from the endoscope to the OTS marker was calculated using these data. The calibration was performed by moving either the board or the endoscope through various placements. The re-projection error was used to evaluate the matrix. Results: From the statistical analysis, the method of moving the board was significantly more accurate than the method of moving the endoscope (p < 0.05). This difference resulted mainly from the uneven error distribution within the OTS measurement range and from hand tremor while holding the endoscope. Conclusions: To increase the accuracy of AR, camera-to-OTS calibration should be performed by moving the board, and both the board and the endoscope should be kept as close as possible to the OTS. This finding can contribute to improving the visualization accuracy in AR-based surgical navigation. Subjects: Biomedical Engineering; Medical Imaging; Medical Technology & Engineering


PUBLIC INTEREST STATEMENT
In implementing an augmented reality (AR)-based surgical navigation system, a key issue is overlaying reconstructed virtual 3-D models on their real counterparts with high accuracy. In particular, in surgical fields where an endoscope is used, aligning the coordinate systems of the endoscope and an optical tracking system (OTS), called camera-to-OTS calibration in this article, is a necessary procedure, and its result strongly affects the AR accuracy. The calibration is generally performed by moving the endoscope. This requires a large measurement range of the OTS and can therefore make the calibration result inaccurate, because the OTS has a spatial error that increases with the distance between the tracked objects and the OTS. This article proposes moving the calibration board instead of the endoscope and shows that doing so can significantly improve the performance of the calibration.

Introduction
Augmented reality (AR) is an emerging technology where virtual objects are superimposed onto camera images. Recently, AR has been used in surgical procedures that use medical cameras such as an endoscope or microscope. AR-based surgical navigation provides information on the shape or location of tumors, blood vessels or nerves which are difficult for surgeons to recognize by direct vision. AR information for specific organs or areas of interest is directly overlaid on endoscopic or microscopic images without using an extra monitor to display it, which avoids distracting the surgeon's vision (King et al., 2000; Sielhorst, Feuerstein, & Navab, 2008; Winne, Khan, Stopp, Jank, & Keeve, 2011).
There are two key elements that affect the accuracy of AR-based surgical navigation. The first is the registration to determine the relationship, T^P_I in Equation (1), between the patient and image frames (Arun, Huang, & Blostein, 1987; Besl & McKay, 1992; Horn, 1987; Nottmeier & Crosby, 2007; Schicho et al., 2007; West, Fitzpatrick, Toms, Maurer, & Maciunas, 2001). The second is the camera calibration with respect to the optical tracking system (OTS), which is the process of determining the relationship, T^CM_C in Equation (1), between the camera and the OTS marker attached to the camera. We refer to this calibration hereafter as "camera-to-OTS calibration".
Due to this second element, ensuring an acceptable accuracy in AR-based surgical navigation is more difficult than in virtual reality-based surgical navigation, which relies only on the first element. Nevertheless, camera-to-OTS calibration has been little studied so far compared to patient-image registration.
Camera-to-OTS calibration methods are fundamentally similar to the hand-eye calibration methods used in typical robotic applications to determine the relationship between the end-effector of a robot and a camera attached to the end-effector (Andreff, Horaud, & Espiau, 2001; Chen, 1991; Chou & Kamel, 1991; Daniilidis, 1999; Dornaika & Horaud, 1998; Li, Wang, & Wu, 2010; Shiu & Ahmad, 1989; Tsai & Lenz, 1989; Zhuang, Roth, & Sudhakar, 1994). Despite this similarity, camera-to-OTS calibration using the OTS for AR-based surgical navigation differs from calibration using a robotic system in two key ways. First, the camera must be moved manually, without the assistance of a robot mechanism, as shown in Figure 1(a) and (b). The weight and size of medical cameras therefore make the calibration procedure inconvenient and may cause significant error due to the user's hand tremor while moving the camera. The second difference, the main focus of this paper, arises from the use of the OTS instead of robot encoders to measure the pose of the camera. The OTS has a unique spatial error distribution that increases with the distance from the target. Several researchers have discussed the distribution of the spatial error of the OTS (Gerard & Collins, 2015; Khadem et al., 2000; Koivukangas, Katisko, & Koivukangas, 2013; Schmidt, Berg, Ploeg, & Ploeg, 2009; Wiles, Thompson, & Frantz, 2004) and reported measurement strategies to reduce its effect: tracking the target as close as possible to the OTS (Gerard & Collins, 2015; Khadem et al., 2000; Wiles et al., 2004) and keeping the measurement volume small (Schmidt et al., 2009). This spatial error also makes typical hand-eye calibration methods inaccurate, which is why a different approach should be considered.
In this study, we investigated the reason why the typical hand-eye calibration methods produce larger errors in clinical applications than in robotic applications. In addition, based on the comparison of possible camera-to-OTS calibration methods under three endoscope and OTS settings, the best approach was suggested for the AR display with an endoscope. Our results can also be utilized in solving other camera and external sensor calibration problems.

AR navigation and camera-to-OTS calibration
The basic concept of AR navigation is represented in Figure 2, where {I}, {P}, {O}, {CM}, and {C} denote the frames of the image, patient, OTS, endoscope-affixed OTS marker, and endoscope, respectively. Writing T^B_A for the HTM that maps coordinates in frame {A} to coordinates in frame {B}, and following Figure 2, we can build the following equation straightforwardly:

P_C = (T^CM_C)^−1 (T^O_CM)^−1 T^O_P T^P_I P_I (1)

where P_I and P_C are 3D points in the image and endoscope frames; T^O_P and T^O_CM are 4 × 4 homogeneous transformation matrices (HTMs) representing the poses (position and orientation) of the patient and the endoscope-affixed marker, which are obtained directly from the OTS; and T^P_I and T^CM_C are the HTMs obtained from the patient-image registration and the camera-to-OTS calibration, respectively.
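To make the frame chain of Equation (1) concrete, the following sketch (not from the paper; all values are random stand-ins) composes hypothetical transforms to map an image-frame point into the endoscope frame, under the convention that a variable `T_B_A` maps frame-{A} coordinates into frame {B}:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_htm(rng):
    """Random rigid transform: proper rotation via QR, plus a translation."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.linalg.det(q))   # ensure det = +1 (a proper rotation)
    htm = np.eye(4)
    htm[:3, :3] = q
    htm[:3, 3] = rng.normal(size=3)
    return htm

# Hypothetical stand-ins for the four transforms appearing in Equation (1).
T_P_I = random_htm(rng)    # patient-image registration result (image -> patient)
T_O_P = random_htm(rng)    # patient pose from the OTS (patient -> OTS)
T_O_CM = random_htm(rng)   # endoscope-marker pose from the OTS (marker -> OTS)
T_CM_C = random_htm(rng)   # camera-to-OTS calibration result (camera -> marker)

P_I = np.array([10.0, 20.0, 30.0, 1.0])   # a homogeneous point in the image frame

# Chain: image -> patient -> OTS -> marker -> camera (endoscope frame).
P_C = (np.linalg.inv(T_CM_C) @ np.linalg.inv(T_O_CM)
       @ T_O_P @ T_P_I @ P_I)
print(P_C[:3])
```

Applying the inverse transforms in reverse order recovers the original image-frame point, which is a quick consistency check on the chain.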
An arbitrary point P_I in the image frame is transformed to the endoscope frame by using Equation (1). A typical method of performing camera-to-OTS calibration is to solve AX = XB, as defined in Figure 1(c), where A is an HTM defined by the pose data from the OTS, B is an HTM defined by traditional camera calibration, and X is the target HTM to find, which is equal to T^CM_C in Equation (1). There is also a modified form of AX = XB, referred to as AX = YB, which is defined in Figure 1(d). The process of solving AX = YB is similar to that for AX = XB; however, in addition to X, the solution to AX = YB provides Y, the transformation from the OTS to the board, which can be used to find X directly by cascade multiplication without resorting to the least-squares sense. At least three pose pairs of A and B with different orientations and positions are required to solve AX = XB or AX = YB. For this, an endoscope with an attached OTS marker must be re-located multiple times while the OTS and the board remain fixed. After acquiring sufficient pose data, X is calculated from A and B in the least-squares sense.
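As a sanity check on the AX = XB constraint, the following sketch (illustrative only, not the paper's code) builds a synthetic ground-truth X, generates relative-motion pairs (A_i, B_i) that are exactly consistent with it, and verifies that the constraint holds for every pair:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_htm(rng):
    """Random 4x4 homogeneous transform (rotation via QR, random translation)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.linalg.det(q))   # ensure a proper rotation (det = +1)
    htm = np.eye(4)
    htm[:3, :3] = q
    htm[:3, 3] = rng.normal(size=3)
    return htm

# Hypothetical ground-truth marker-to-camera transform X.
X_true = random_htm(rng)

# Synthetic relative motions: each A_i is an OTS-side motion, and the
# corresponding camera-side motion satisfies B_i = X^-1 A_i X exactly.
A_list = [random_htm(rng) for _ in range(3)]
B_list = [np.linalg.inv(X_true) @ A @ X_true for A in A_list]

# Verify the AX = XB constraint for every pose pair (residual ~0).
residual = max(np.abs(A @ X_true - X_true @ B).max()
               for A, B in zip(A_list, B_list))
print(residual)
```

With real (noisy) pose data the constraint only holds approximately, which is why X is estimated in the least-squares sense from at least three pose pairs.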

AX = BYC configuration with a calibration board
Unlike measurements from robot encoders, OTS measurements have a spatial error that increases with the distance to the tracked target, as shown in Figure 3. As the endoscope is moved to locations far from the OTS to acquire the pose data necessary for the calibration, this spatial error strongly affects the accuracy. To reduce its influence, we proposed a method whereby the user moves the calibration board instead of the endoscope. The advantage of this method is explained below.
To move the calibration board instead of the endoscope, the OTS marker was attached to the calibration board before acquiring the pose data. The movement ranges of the endoscope and the calibration board are compared in Figure 4, where each is rotated from pose 1 to pose 2. At the same rotation angle θ, the displacement of the board is much smaller than that of the endoscope, because the displacement depends on the distance between the OTS marker and the axis of rotation: the marker is attached to the endoscope head, which is far from the distal end, whereas the board-affixed marker is close to the board's rotational axis. Schmidt et al. (2009) reported that a small measurement range is recommended to reduce the error. Moving the board instead of the endoscope is therefore expected to be less affected by the spatial error of the OTS, thanks to the relatively small movement range. Additionally, because the board is lighter and smaller than the endoscope, the calibration procedure is more convenient and faster.
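The lever-arm intuition can be illustrated numerically: for the same rotation angle, the chord swept by a marker grows with its distance from the rotation axis. The lever-arm lengths below are illustrative assumptions, not measurements from the paper:

```python
import numpy as np

def marker_displacement(lever_arm_mm, angle_deg):
    """Chord length swept by a point at distance r from the rotation axis."""
    theta = np.radians(angle_deg)
    return 2.0 * lever_arm_mm * np.sin(theta / 2.0)

# Hypothetical lever arms: the endoscope-affixed marker sits far from the
# pivot (near the scope head), while the board-affixed marker lies close to
# the board's rotational axis.
theta = 30.0                                      # same rotation angle for both
d_endoscope = marker_displacement(300.0, theta)   # ~155 mm
d_board = marker_displacement(50.0, theta)        # ~26 mm
print(d_endoscope, d_board)
```

For the same angular diversity of poses, the board-affixed marker therefore stays in a much smaller OTS measurement volume than the endoscope-affixed marker.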
By attaching an OTS marker to the board, AX = BYC, which is a modified version of AX = YB, is formulated as shown in Figure 4(c). {BM} and {B} represent the frames of the board-affixed marker and the board itself. In AX = BYC, A represents the HTM from the OTS to the endoscope-affixed marker, B represents the HTM from the OTS to the board-affixed marker, C represents the HTM from the origin of the board to the tip of the endoscope, Y represents the HTM from the board-affixed marker to its origin, and X is the target calibration matrix. We will use a symbolic equation AX = BYC to represent the method of moving the board instead of moving the endoscope.

Solving AX = BYC
Three steps are required to solve AX = BYC. The first step is to acquire n pose data by moving the board with a sufficient angle between the poses; according to Tsai and Lenz (1989), the accuracy of the camera-to-OTS calibration depends on this angle between the poses. The second step is to multiply both sides of AX = BYC by the inverse of B, yielding DX = YC, where D = B^−1 A, which has the same form as AX = YB. The last step is to solve DX = YC using the existing methods for AX = YB. In this step, the Kronecker product-based computation method was used (Li et al., 2010). This method begins by separating the rotation and the translation as follows:

R_D R_X = R_Y R_C (2)

R_D t_X + t_D = R_Y t_C + t_Y (3)

where R is a 3 × 3 rotation matrix and t is a 3 × 1 translation vector, and their subscripts indicate the HTMs before separation, i.e., R_D is the rotation matrix of D. Based on the definition of the Kronecker product and vectorization, Equations (2) and (3) can be expressed as Equations (4) and (5), respectively:

(I ⊗ R_D) vec(R_X) − (R_C^T ⊗ I) vec(R_Y) = 0 (4)

R_D t_X − (t_C^T ⊗ I) vec(R_Y) − t_Y = −t_D (5)
where ⊗ is the Kronecker product operator defined in Equation (6), vec(•) is a vectorization operator that reshapes an n × m matrix into an nm × 1 vector by stacking its columns, and I is an identity matrix:

A ⊗ B = [a_ij B], i = 1, …, n, j = 1, …, m (6)

where a_ij denotes the (i, j) element of the n × m matrix A.
Finally, Equations (4) and (5) yield the following equation Q_i v = p_i, where i is an index of the acquired pose data, generally known as a least-squares problem:

Q_i = [ I ⊗ R_D,i   0   −(R_C,i^T ⊗ I)   0 ; 0   R_D,i   −(t_C,i^T ⊗ I)   −I ] (7)

v = [ vec(R_X) ; t_X ; vec(R_Y) ; t_Y ], p_i = [ 0_9×1 ; −t_D,i ] (8)

The dimensions of Q_i, v, and p_i are 12 × 24, 24 × 1, and 12 × 1, respectively. With multiple pose data, Equation (7) can be stacked into Equation (9):

M v = N (9)

where M and N represent [Q_1 ⋯ Q_n]^T and [p_1 ⋯ p_n]^T, respectively, and n is the number of poses.
Equation (9) is solved using the pseudo-inverse of M for n > 2. Note that the vector v in Equation (9) contains both the X and Y information.
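A minimal numpy sketch of this Kronecker-product least-squares solution on synthetic, noise-free poses (all variable names and the consistent-data construction are illustrative, not the paper's code). It builds the Q_i and p_i blocks as in Equations (4)-(5), stacks them, and recovers X:

```python
import numpy as np

rng = np.random.default_rng(1)
I3 = np.eye(3)

def random_htm(rng):
    """Random rigid transform: proper rotation via QR, random translation."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.linalg.det(q))
    htm = np.eye(4)
    htm[:3, :3] = q
    htm[:3, 3] = rng.normal(size=3)
    return htm

# Hypothetical ground-truth X (marker-to-camera) and Y (board-marker-to-board).
X_true, Y_true = random_htm(rng), random_htm(rng)

rows_Q, rows_p = [], []
for _ in range(4):                               # n >= 3 poses required
    C = random_htm(rng)                          # per-pose board-camera transform
    D = Y_true @ C @ np.linalg.inv(X_true)       # consistent D (= B^-1 A)
    R_D, t_D = D[:3, :3], D[:3, 3]
    R_C, t_C = C[:3, :3], C[:3, 3]
    # Rotation rows: (I (x) R_D) vec(R_X) - (R_C^T (x) I) vec(R_Y) = 0
    top = np.hstack([np.kron(I3, R_D), np.zeros((9, 3)),
                     -np.kron(R_C.T, I3), np.zeros((9, 3))])
    # Translation rows: R_D t_X - (t_C^T (x) I) vec(R_Y) - t_Y = -t_D
    bot = np.hstack([np.zeros((3, 9)), R_D,
                     -np.kron(t_C[None, :], I3), -I3])
    rows_Q.append(np.vstack([top, bot]))                 # Q_i: 12 x 24
    rows_p.append(np.concatenate([np.zeros(9), -t_D]))   # p_i: 12 x 1

M, N = np.vstack(rows_Q), np.concatenate(rows_p)
v, *_ = np.linalg.lstsq(M, N, rcond=None)        # least-squares solve of Mv = N

# Unpack v = [vec(R_X); t_X; vec(R_Y); t_Y] (column-stacking vectorization).
R_X = v[:9].reshape(3, 3, order="F")
t_X = v[9:12]
err_R = np.abs(R_X - X_true[:3, :3]).max()
err_t = np.abs(t_X - X_true[:3, 3]).max()
print(err_R, err_t)
```

With noisy real data, the recovered R_X is generally not an exact rotation, so a projection onto SO(3) (e.g., via SVD) is a common post-processing step.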

Experimental setup
In the experiments, camera-to-OTS calibration was performed by using the following instruments: a calibration board with a 5 × 4 pattern array of corner points in which the distance between the corner points was 15 mm; an endoscopic system with 0 degree scope cylinder (1188HD, Stryker, Kalamazoo, MI, USA) to capture the pattern on the board; an OTS (Polaris Spectra, Northern Digital Inc., Waterloo, Canada) to track the poses of the markers attached to the endoscopes and the board.
Camera calibration was performed using Zhang's method before the camera-to-OTS calibration (Zhang, 2000). Thirty pattern images were used for the camera calibration. The image resolution was 720 × 576 pixels at 30 frames per second (fps). Different layouts were considered and applied, as shown in Figure 5, because the influence of the spatial error of the OTS changes with the positions of the endoscope, the OTS, and the board. Three layouts, named layouts 1, 2, and 3 in this study, were chosen after considering the actual positioning of the OTS and the endoscope in clinical environments. Figure 5(a), (b) and (c) show layouts 1, 2, and 3, respectively. Layout 1 simulates the case in which both the endoscope and the board are relatively close to the OTS. Because the spatial error of the OTS increases with distance, layout 1 should have the least spatial error due to the short distances. Layouts 2 and 3 simulate the cases in which either the endoscope or the board is relatively far from the OTS. In addition, by considering the line-of-sight problem of the OTS in each layout, the location and direction of the markers on the endoscope and the board were carefully selected.
In each layout, camera-to-OTS calibration was performed 30 times. For each trial, the board was moved to 30 locations with different positions and orientations. The same process was repeated by moving the endoscope instead of the board.
The re-projection error was calculated to measure the accuracy of the camera-to-OTS calibration. After the camera calibration, A, B, X, and Y were used to compute the re-projection in place of the extrinsic parameter C, as shown in Equation (10):

p′ ≃ [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ] [I | 0] (Y^−1 B^−1 A X) P (10)

where p′ is a 2D point in the image coordinates (obtained after perspective division); f_x and f_y are the focal lengths and c_x and c_y are the principal point coordinates; P is a 3D corner point of the board in homogeneous coordinates; A and B are pose data obtained from the OTS; X is the result of the camera-to-OTS calibration; and Y represents the transformation from the board-affixed marker to the board. Note that Y^−1 B^−1 A X equals C by the relation AX = BYC. As defined in Equation (10), the point of the board (P) in 3D is projected onto the image plane (p′) in 2D.
On the board used there were 20 corner points. The total re-projection error of the camera-to-OTS calibration, ε, is defined in Equation (11):

ε = (1 / mn) Σ_{i=1}^{m} Σ_{j=1}^{n} ‖ p_ij − p′_ij ‖ (11)

where m is the number of images, n is the number of corner points of the board, p_ij is the j-th corner point obtained by image processing of the i-th image, and p′_ij is the corresponding re-projection point computed from the results of the camera-to-OTS calibration. For the evaluation, 50 poses that had not been used for the camera-to-OTS calibration were selected. Note that we performed camera-to-OTS calibration 30 times with 30 poses for each trial, and the results were evaluated by using 50 different poses that were not used in the calibration process.
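This evaluation can be sketched on synthetic, perfectly consistent data: with hypothetical intrinsics and poses, the extrinsic C is replaced by Y^−1 B^−1 A X (from AX = BYC), so the resulting error is numerically zero. All numeric values below are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_htm(rng):
    """Random rigid transform: proper rotation via QR, random translation."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.linalg.det(q))
    htm = np.eye(4)
    htm[:3, :3] = q
    htm[:3, 3] = rng.normal(size=3)
    return htm

# Hypothetical intrinsics (focal lengths in pixels; principal point for 720x576).
K = np.array([[800.0, 0.0, 360.0],
              [0.0, 800.0, 288.0],
              [0.0, 0.0, 1.0]])

# 5 x 4 grid of board corners with 15 mm spacing, in the board frame (z = 0).
xs, ys = np.meshgrid(np.arange(5) * 15.0, np.arange(4) * 15.0)
P = np.stack([xs.ravel(), ys.ravel(), np.zeros(20), np.ones(20)])  # 4 x 20

def project(K, extrinsic, P):
    """Pinhole projection of homogeneous board points into pixel coordinates."""
    cam = (extrinsic @ P)[:3]        # board frame -> camera frame
    uv = K @ cam
    return uv[:2] / uv[2]            # perspective division

# Reference extrinsic: board about 0.5 m in front of the camera, facing it.
C_true = np.eye(4)
C_true[:3, 3] = [0.0, 0.0, 500.0]

# Synthetic A, B, X, Y made exactly consistent with AX = BYC.
X, Y, B = random_htm(rng), random_htm(rng), random_htm(rng)
A = B @ Y @ C_true @ np.linalg.inv(X)

detected = project(K, C_true, P)     # stand-in for corners found by image processing
reproj = project(K, np.linalg.inv(Y) @ np.linalg.inv(B) @ A @ X, P)

# Mean corner-wise Euclidean distance for a single image, in the spirit of Eq. (11).
eps = np.linalg.norm(detected - reproj, axis=0).mean()
print(eps)
```

With real data, OTS tracking noise and corner-detection error make ε non-zero, and averaging over many images (m) and corners (n) gives the total error.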
The solution from moving the board, referred to as AX = BYC, was compared with two solutions from moving the endoscope, referred to as AX = XB (Figure 1(c)) and AX = YB (Figure 1(d)). A t-test with a one-sided significance level of 5% was also performed to evaluate the statistical significance using OriginLab software (OriginPro 2015, Northampton, MA, USA). Note that moving the endoscope has two kinds of solutions while moving the board has only one solution, which produces three results altogether, as shown in Figure 6(a).
Figure 6(a) shows the re-projection errors for each layout over 30 trials. The AX = BYC method was more accurate than the conventional methods in all layouts (p < 0.05). The conventional methods exhibited several peak values in the error, whereas the AX = BYC method was consistent and showed no such peaks (see the standard deviation). Furthermore, the time required to perform camera-to-OTS calibration was measured for four volunteers who had little experience with the camera-to-OTS calibration procedure. Because the most time-consuming task in the procedure is collecting the necessary pose data, we measured the time required for this task. Each volunteer acquired 30 poses and repeated the acquisition task five times. As shown in Figure 6(b), an obvious difference was observed: with the AX = BYC method, the mean task time over all volunteers was approximately 93 s, whereas it was 141 s with the conventional method. This signifies that the AX = BYC method can reduce the calibration time.

Discussion
In this study, camera-to-OTS calibration methods for AR navigation with an OTS were investigated in terms of accuracy, speed, and convenience. Due to the inherent spatial error of the OTS, the conventional hand-eye calibration methods AX = XB and AX = YB, which require moving the camera to acquire pose data, did not perform well in clinical environments. In contrast, the method of moving the calibration board instead of the camera showed better performance thanks to its smaller movement range. This is easily achieved by attaching an additional OTS marker to the board and establishing AX = BYC, a modified form of AX = YB.
The AX = BYC method was most effective in the case of layout 1, which is the most sensitive to the spatial error of the OTS because the endoscope and the board mainly move along the z-axis of the OTS. On the other hand, in layouts 2 and 3, the influence of the spatial error is less than that in layout 1 because the board mainly moves in the x and y directions, while the spatial error mainly increases along the z-axis of the OTS. Additionally, in the case of layout 3, due to the position of the endoscope, the line of sight of the OTS was frequently disconnected. Therefore, the movement of the endoscope was significantly restricted. This restriction may have caused insufficient angles between the pose data, thus increasing the calibration error.
In comparison with AX = XB, the AX = BYC form has an additional source of error due to the OTS marker attached to the board. In addition, the AX = BYC form has one more unknown parameter, Y, which is not an essential element for AR navigation. Nevertheless, the AX = BYC form produced a more accurate result than AX = XB. This indicates that the advantage in handling the spatial error of the OTS outweighs the loss due to the additional marker on the board. To exploit this advantage more effectively, the marker must be attached at a location sufficiently close to the board's frame. Chen et al. (2012) used the method most similar to AX = BYC in that a board with an attached OTS marker was used. However, they solved X using an orthogonal Procrustes analysis method, which requires a manual task of taking several points on the board to solve Y. This is inconvenient for users, and the accuracy varies from individual to individual.
Although all of the experiments were performed with an endoscope, the same method is applicable to surgical microscopes and other camera systems for AR as well.

Conclusions
In this study, we tried to find the most effective method for the camera-to-OTS calibration to improve AR navigation with the OTS. Through the experiments using re-projection, it was found that moving the board showed significantly higher accuracy than moving the endoscope (p < 0.05), particularly when both the endoscope and the board are relatively close to the OTS. The AX = BYC method also improved the speed of the calibration process. This finding can contribute to improving the visualization accuracy in various AR applications.