Perception enhancement using importance-driven hybrid rendering for augmented reality based endoscopic surgical navigation

Misleading depth perception may greatly affect the correct identification of complex structures in image-guided surgery. In this study, we propose a novel importance-driven hybrid rendering method to enhance perception in navigated endoscopic surgery. First, the volume structures are enhanced using gradient-based shading, which reduces the color information in low-priority regions and improves the distinction between complicated structures. Second, an importance sorting method based on order-independent transparency rendering is introduced to intensify the perception of multiple surfaces. Third, volume data are adaptively truncated and emphasized with respect to the viewing orientation and the illustration of critical information, extending the viewing range. Experimental results show that, by combining volume and surface rendering, our method effectively improves the depth distinction of multiple objects in both simulated and clinical scenes. Our importance-driven surface rendering method achieved higher average ratings, with statistical significance, when rated by 15 participants (five clinicians and ten non-clinicians) on a five-point Likert scale. Furthermore, the average frame rate of hybrid rendering with thin-layer sectioning reaches 42 fps. Because the hybrid rendering process is fully automatic, it can be used in real-time surgical navigation to improve rendering efficiency and information validity.

Despite decades of study, misleading depth cues remain a challenge for AR systems, resulting in perceptual biases [4]. Incorrect depth perception can interfere with the understanding of the spatial relationships of objects in the rendered view. Various factors may contribute to these perceptual issues, such as the environment and the capturing and display methods [5,6]. In particular, three kinds of AR display methods have been used in medical applications in recent years: see-through, projector-camera, and video-based displays [1,5]. The see-through method uses optical transmissive technology to superimpose information onto a semi-transparent mirror, extending the surgeon's knowledge during in-vivo surgery. Bichlmeier et al. proposed an AR-based see-through method to improve the perception of 3D medical data [7,8]. By moving a virtual window, the depth cues of occlusion and motion parallax are intensified. However, the method relies on rapid motion of the surgeon's viewpoint, which may impose an additional physical burden. Moreover, simply overlaying semi-transparent images is perceptually suboptimal and may cause false judgments of the spatial relationships between the virtual structures and the real images. Liao et al. developed an MRI-guided AR system that superimposed an integral videography image on the surgical area, viewed through a half-silvered mirror [9]. However, the main issue with the see-through approach is that it requires additional motion tracking sensors to align the virtual and real views [10,11]. The projector-camera display allows surgeons to observe internal structures projected on the patient's skin surface [12,13]. Gavaghan et al. proposed a handheld image overlay device that projected images of 3D models onto the organ surface [14]. Even with position tracking sensors, the projection error of their method on a rugged liver phantom was uncontrollable. Tabrizi et al. proposed an AR neurosurgery system that projected the image of a virtual model onto the patient's skin, but the rendering lacked depth perception to such an extent that the lesion appeared to lie outside the skin surface [15]. The drawback of projector-camera systems is that a small offset between the surgeon's gaze and the optical center of the projector may lead to a significant perception error. The video-based display uses an external camera to capture video of the area around the lesion. This approach superimposes preoperative patient information directly on the endoscopic image along the surgeon's gaze, thereby minimizing the localization error of deep structures in the projection plane [16][17][18][19]. However, because the spatial relations between surgical tissue and organs at different depths are compressed by perspective projection, depth cues are difficult to perceive, which can further disrupt the AR rendering. Therefore, augmented visualization of preoperative 3D medical data has become a key issue in AR-based surgical navigation.
In clinical applications, various medical visualization techniques are available, including direct volume rendering, surface rendering, and hybrid rendering, which combines surface and volume rendering [20,21]. Direct volume rendering (DVR) provides a fast overview; however, emphasizing particular objects or regions is difficult. To help surgeons understand the rendered anatomical structures in clinical data and to remove unimportant details, non-photorealistic rendering methods have been proposed, and they are now used extensively in medical visualization. Volumetric non-photorealistic rendering, also called volume illustration [22], can process the original data without any segmentation, estimating the gradient direction from the differences between adjacent voxels. Volume illustration can improve the visualization of important structures and achieve different rendering styles for different features of interest. Ebert and Rheingans presented several illustrative techniques that enhanced structures and added depth and orientation cues [22]. In their applications, exponential functions were employed, showing that color blending of the volume could be more effective. Bruckner et al. proposed an interactively defined halo transfer function to enhance depth perception in GPU-based direct volume rendering [23], but at the expense of occluding other structures. Csébfalvi et al. visualized object contours based on the magnitude of local gradients and the angle between the viewing direction and the gradient vector through depth-shaded maximum intensity projection [24], which clearly shows the outlines of objects. Shape-from-shadow is another common approach to enhancing shape perception. Lee et al. used multiple lights and local illumination to adaptively enhance the shapes of different parts of structures [25].
However, local illumination may corrupt the rendering of different structures and cannot convey the correct location of the targets. Viola et al. introduced an importance-driven volume rendering method to emphasize important structures. By assigning each part of the volumetric data an importance factor that encoded its visibility priority, they generated a cut-away view that suppressed less important parts of the scene and revealed more important underlying information [26]. This provided an alternative way to intensify structures in complex data.
Surface rendering generally requires extracting object surfaces from the volume data. Because the objects can be rendered separately, the color and transparency of a specific object can be adjusted flexibly, so that important surfaces can be emphasized. However, simply summing the contributions of different transparent structures often produces incorrect images with false depth perception. Consequently, many researchers have proposed methods to improve transparency and depth perception in surface rendering. Porter et al. developed a partial coverage model that defined a set of compositing operations for images with coverage relationships [27] and composited multiple surfaces iteratively. Rendering methods for transparent surfaces use the alpha blending equation and arrange surfaces either front to back or back to front to obtain the correct result [28]. Meshkin et al. [29] first introduced blended order-independent transparency by formulating a weighted sum of the different components, which produced plausible images for low alpha values. However, because this approximation ignores the order-dependent terms, it can deviate significantly from correct blending. McGuire et al. proposed a depth weighting operator to improve the perception of shading and color [30]. However, their method can still mislead the depth perception of semi-transparent objects.
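The role of the order-dependent terms can be illustrated with a small sketch. It contrasts exact back-to-front "over" compositing with an order-independent weighted blend in the style of McGuire and Bavoil's weighted blended OIT; the function names and the constant default weights are illustrative assumptions, not the formulation of [29] or [30].

```python
import numpy as np

def over_back_to_front(colors, alphas, background):
    """Exact 'over' compositing; fragments must be listed back to front."""
    c = np.asarray(background, dtype=float)
    for ci, ai in zip(colors, alphas):
        c = ai * np.asarray(ci, dtype=float) + (1.0 - ai) * c
    return c

def weighted_blended_oit(colors, alphas, background, weights=None):
    """Order-independent approximation: a weighted average of the fragment
    colors, scaled by total coverage, plus the background seen through all
    fragments. Weights default to 1 (a depth-based weight could be used)."""
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    w = np.ones_like(alphas) if weights is None else np.asarray(weights, dtype=float)
    accum = (colors * (alphas * w)[:, None]).sum(axis=0)
    avg = accum / max((alphas * w).sum(), 1e-8)
    revealage = np.prod(1.0 - alphas)  # fraction of background still visible
    return avg * (1.0 - revealage) + np.asarray(background, dtype=float) * revealage
```

With a half-transparent red and blue fragment over black, the exact result changes with the fragment order, whereas the weighted blend returns the same color for either order. That invariance is what makes the approach fast, and it is also why the depth ordering cue is lost.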
Hybrid visualization of volumes and surfaces has been used extensively in computer graphics [31][32][33] and therefore has great potential for AR-based endoscopic surgery. In hybrid rendering, DVR presents the anatomical context, such as skeletal structures, soft tissue, and body cavities, whereas surface rendering depicts pre-segmented anatomical structures and surgical tools. To render hybrid data consisting of transparent surfaces and volumetric objects, Brecheisen et al. [34] proposed an iterative ray casting approach combined with surface rendering, which used depth peeling [35] to obtain the Z-buffer of the polygonal geometry. Although these methods have improved structural perception to some extent, exact blending of transparent objects with complex structures remains a problem [36]. Correct compositing of the color and opacity of multiple objects is essential for hybrid rendering, and the tradeoff between quality and speed must be considered carefully. Combining illustrative visualization with AR is a promising way to improve perception in surgical applications [6,37]. Hansen et al. proposed an AR-based 3D visualization of planning data that extended illustrative rendering [37]. Their distance encoding of silhouettes and surfaces produced accurate depth judgments but lacked immersion when overlaid on the laparoscopic images. Tietjen et al. presented a rendering method that combined hybrid data and silhouettes for surgical training and planning [33]. However, their method relies on the order of the nodes in the scene for depth rendering [38]; hence, a slight mistake in the order definition may make the spatial relations of different objects unidentifiable. Moreover, their method cannot render semi-transparent surfaces and the volume simultaneously, which weakens spatial perception.
Although applying transparency to every object may help, it also increases computational complexity. Thus, the rendering appearance and efficiency cannot satisfy the requirements of real-time surgical navigation and rapid interaction. In our previous study, we proposed a hybrid rendering method for dynamic endoscopic vision expansion, in which 2D endoscopic and 3D CT images were effectively fused during surgery [39]. Figure 1 shows a screenshot of our AR-based surgical navigation system in cadaver experiments. The circular area in the middle of the screen is the endoscopic image, and the surrounding area is the fused rendering of the hybrid data. However, three major problems remain. First, many aliasing artifacts, known as wood-grain effects (yellow rectangles), are caused by the large sampling step size of volume rendering. Second, the spatial relationships of different objects are difficult to identify because the scene is cluttered with miscellaneous anatomical structures and artifacts. Third, the rendering cost is high, resulting in a low frame rate for scene updates. This study addresses these problems through importance-driven hybrid rendering.
In this paper, an importance-driven hybrid rendering method is proposed to enhance structure and depth perception. The contributions of this study are threefold. First, we propose a gradient-based shading method to enhance structures in volume rendering; it suppresses color information in low-complexity regions while improving the recognition of important structures. Second, by incorporating surface priority into order-independent transparency rendering, we propose an importance sorting method that improves depth perception when rendering multiple surfaces. Third, we develop a real-time data sectioning method that accelerates data exploration and helps surgeons concentrate on important information. Our method has been applied to endoscopic surgical navigation systems to improve rendering efficiency and information validity. The proposed hybrid rendering process has four major steps, namely gradient-based volume shading, importance sorting-based surface rendering, real-time data sectioning, and 2D-3D image fusion, as indicated by the dashed rectangles in Fig. 2. The rendering method consists of two pipelines: volume rendering and surface rendering. In the volume rendering pipeline, the gradient magnitude and direction of each voxel are first calculated, and the volume data are then enhanced through gradient-based shading and edge-and-contour enhancement. In addition, stochastic jittering is used to eliminate aliasing artifacts. In the surface rendering pipeline, the surfaces of important structures are segmented from 3D CT data during preoperative preparation. The surfaces are then sorted by their importance factors and divided into priority and normal groups. The order-independent transparency (OIT) method is employed for the color synthesis of all surfaces.
Surfaces with different priorities are reinforced with different rendering options: higher-priority surfaces are enhanced with edge contour lines, whereas surfaces of normal importance undergo standard transparency processing. After rendering, the two branches are merged through hybrid rendering. Thus, the endoscopic image and the 3D sectioned rendering result can be effectively fused for AR-based navigated surgery.

Gradient-based volume shading
Unlike surface models, volume data contain no explicitly and discretely defined features that clearly distinguish regions [22]. The features indicated by volume characteristics are distributed continuously throughout the data set, which makes it considerably difficult to separate disparate regions. Nevertheless, the boundaries between different regions and the spatial relationships between different structures remain of great interest and are not yet identified. The local gradient is commonly used to indicate the transition between two different regions. Levoy [40] introduced gradient-based shading and opacity enhancement to volume rendering. In his approach, the opacity of each voxel was scaled by its gradient magnitude to emphasize the boundaries between different structures (such as tissue and vessels), while regions of constant density within the same organ, which have a small gradient, are suppressed. Assuming that the volume data contain a set of precomputed sample points, the value at location $P_i$ is

$$f(P_i) = f(x_i, y_i, z_i). \quad (1)$$
By computing the gradient $\nabla f(P_i)$ at that location, the opacity of the enhanced boundary can be computed from the gradient magnitude as

$$O_b = O_v\left(k_{bc} + k_{bw}\,\lVert \nabla f \rVert^{k_{be}}\right), \quad (2)$$

where $O_v$ is the original opacity and $\nabla f$ is the gradient of the volume data. $k_{bc}$ and $k_{bw}$ control the degree of boundary enhancement from no gradient to full gradient, and the exponent $k_{be}$ adjusts the slope of the opacity curve. However, depth perception is limited in translucent volume rendering because there is no correct occlusion cue to show a clear depth ordering. Inspired by shading concepts in graphics and technical illustration, we develop a gradient-based shading method to enhance structure and depth cues in volume rendering.
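A minimal NumPy sketch of gradient-based opacity enhancement follows. It assumes that Eq. (2) takes the boundary-enhancement form of Ebert and Rheingans [22], $O_b = O_v(k_{bc} + k_{bw}\lVert\nabla f\rVert^{k_{be}})$, with the gradient magnitude normalized to [0, 1]. The helper names `gradient_magnitude` and `boundary_opacity` are hypothetical.

```python
import numpy as np

def gradient_magnitude(volume, spacing=(1.0, 1.0, 1.0)):
    """Central-difference gradient magnitude of a 3D volume, normalized to [0, 1]."""
    gx, gy, gz = np.gradient(volume.astype(float), *spacing)
    mag = np.sqrt(gx**2 + gy**2 + gz**2)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def boundary_opacity(o_v, grad_mag, k_bc=0.0, k_bw=1.0, k_be=1.0):
    """Assumed form of Eq. (2): O_b = O_v * (k_bc + k_bw * |grad f| ** k_be)."""
    return np.clip(o_v * (k_bc + k_bw * grad_mag**k_be), 0.0, 1.0)
```

For a volume with a single step edge, the voxels next to the edge keep their original opacity, while homogeneous voxels become transparent. Setting `k_bc=1, k_bw=0` disables the enhancement.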
To highlight the color of structures with a high gradient magnitude, the gradient color is calculated as

$$C_g = C_v\left(k_c + k_s\,\lVert \nabla f \rVert^{k_e}\right), \quad (3)$$

where $C_v$ is the original color, and $k_c$ and $k_s$ allow users to define the degree of gradient enhancement ($k_c = 1$, $k_s = 0$ leaves the original color unchanged). By using the exponent $k_e$ to adjust the compression range of the gradient color shading, Eq. (3) improves the differentiation of structures in regions of high gradient magnitude.
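The color counterpart can be sketched the same way. It assumes that Eq. (3) has the form $C_g = C_v(k_c + k_s\lVert\nabla f\rVert^{k_e})$, with RGB colors in [0, 1] and a precomputed, normalized gradient magnitude. `gradient_color` is a hypothetical name.

```python
import numpy as np

def gradient_color(c_v, grad_mag, k_c=1.0, k_s=0.0, k_e=1.0):
    """Assumed form of Eq. (3): C_g = C_v * (k_c + k_s * |grad f| ** k_e).

    c_v:      array of RGB colors with shape (..., 3)
    grad_mag: normalized gradient magnitudes with shape (...)
    """
    grad_mag = np.asarray(grad_mag, dtype=float)
    gain = k_c + k_s * np.power(grad_mag, k_e)        # per-voxel color gain
    return np.clip(np.asarray(c_v, dtype=float) * gain[..., None], 0.0, 1.0)
```

With the defaults (`k_c=1, k_s=0`), the colors are returned unchanged. Raising `k_e` above 1 concentrates the brightening on the highest-gradient voxels, so low-gradient regions stay close to the `k_c` baseline.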
Additionally, a hierarchical filter based on an adaptive octree is implemented to reduce the computational burden and improve rendering quality. The volume data are first transformed into a hierarchical octree consisting of nodes with a constant block size. The visible nodes are those within the view frustum and clipping planes, which are detailed in Section 2.3. The invisible nodes are then skipped by traversing only the octree nodes that contain relevant data in the lookup table. Notably, the aliasing artifacts mainly come from a large sampling step size. An artifact suppression method based on stochastic ray jittering is used to reduce these artifacts with minimal influence on processing speed [41]. By randomly offsetting the starting depth of each ray, neighboring rays are not sampled at exactly the same depths, which minimizes the aliasing caused by large step sizes.
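Stochastic ray jittering can be sketched as follows. Each ray shares the same step size but receives its own random start offset in [0, step). This turns coherent wood-grain banding into fine noise. The function name and its ray-parameter interface (`t_near`, `t_far`) are assumptions for illustration.

```python
import numpy as np

def jittered_sample_depths(n_rays, t_near, t_far, step, rng=None):
    """Sample positions along each ray, offset by a per-ray random jitter.

    Returns an (n_rays, n_steps) array. All samples stay inside [t_near, t_far),
    and consecutive samples on a ray remain exactly `step` apart."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(0.0, step, size=n_rays)        # one offset per ray
    n_steps = int(np.floor((t_far - t_near) / step))
    base = t_near + step * np.arange(n_steps)            # shared regular grid
    return base[None, :] + offsets[:, None]
```

Because only the start of each ray moves, the number of samples, and therefore the cost, is unchanged. This is why the method has little influence on processing speed.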

Importance sorting-based OIT rendering
In surgical navigation and therapy planning, it is often necessary to explore structures that touch, cover, or contain one another. For instance, in tumor resection surgery, vascular structures are often distributed around the tumor and may penetrate into it. The optic nerve, vasculature, and tumors are often very close to each other [39]. In such complex situations, surgeons are chiefly concerned with the vasculature or optic nerve close to the tumor, so these structures tend to have higher priority than farther ones. Specifically, the properties of important structures, such as the depth and direction of vessels and nerves, need to be recognized accurately, while non-important structures can be ignored to avoid distraction [42,43]. Although many solutions exist for rendering multiple overlapping surfaces in complex scenes, their results are not visually correct [30,44]. Hence, this study introduces an importance sorting-based order-independent transparency surface rendering method.
In our method, objects are sorted according to their levels of importance, and the structures are highlighted based on their importance factors. The importance sorting of different targets is commonly completed during preoperative surgical planning. In our calculation, the importance ranking is determined by the size of each object and its distance to the tumor. The importance factor of each object, which determines the visibility of the structure, is set under the supervision of surgeons. Assuming that the importance factor of the i-th surface is $S_i$, the color composition $C_f$ of all the surfaces is calculated by Eq. (4), where $\alpha$ is the opacity factor and $k_i$ is the power exponent of the i-th surface in the cumulative function, which renders the color of the i-th surface distinctly when a target has at least two surfaces ($i \ge 2$). By enumerating $S_i$, occluded surfaces with higher importance factors are displayed more clearly. Figure 3 shows a schematic of surface peeling along the depth for the proposed method. The scene consists of one ellipse and two lines that represent different object surfaces. Along the line of sight, the depth of the scene increases from left to right and is normalized from 0 to 1, as shown in Fig. 3(a). According to their intersection relationships, the object surfaces are assigned different importance factors and drawn with different line styles, as shown in Fig. 3(b). The layers touched by the traveling ray are denoted Layer 0 to Layer i in the order in which they are touched, shown as bold lines in Figs. 3(c), 3(d), 3(e), and 3(f), respectively. A layer may consist of a series of facets that are touched by the ray as its i-th intersection.
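The layer numbering of Fig. 3 can be sketched for a single ray. The ray's intersections are sorted by normalized depth, and Layer 0, 1, 2, and so on are assigned in the order they are touched, which is what successive depth-peeling passes extract. The `Hit` record, the surface labels, and the importance values are hypothetical. This does not implement the cumulative color function of Eq. (4).

```python
from dataclasses import dataclass

@dataclass
class Hit:
    surface: str       # label of the surface the ray touched (hypothetical)
    depth: float       # normalized depth along the ray, 0 (near) to 1 (far)
    importance: float  # importance factor assigned during planning

def peel_layers(hits):
    """Number the ray's intersections Layer 0, 1, 2, ... in touching order."""
    ordered = sorted(hits, key=lambda h: h.depth)
    return {layer: hit for layer, hit in enumerate(ordered)}

def most_important_layer(layers):
    """Index of the layer whose surface has the highest importance factor."""
    return max(layers, key=lambda k: layers[k].importance)
```

In a scene like Fig. 3, a ray crossing one line, both walls of the ellipse, and a second line yields four layers. The most important surface may be the deepest one, and it is exactly this surface that should not be hidden by plain accumulation.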

Data sectioning and depth buffer rendering
To extend the visualization range of important structures and improve rendering efficiency, we propose a real-time sectioning method. The viewing range is adjustable in six degrees of freedom, which enables arbitrary perspective views of the medical data. During surgical navigation, the image registration process estimates the transformation that maps the coordinate system of the endoscope to that of the CT image. The endoscopic reference target (ERT, a customized instrument from Northern Digital Inc.) is tracked with an optical tracking system (OTS) to localize fiducial points on the patient and in the virtually generated scene. In world space, the patient fiducial target (PFT) is located together with the ERT via the OTS, and a 4 × 4 matrix $T_{CT}^{PFT}$ describing the transformation between the CT image and the PFT can be computed. The transformation from the OTS coordinate system to the PFT is written as $T_{OTS}^{PFT}$, and the pose of the ERT, i.e., the rigid transformation from the OTS to the ERT, is written as $T_{OTS}^{ERT}$. Hence, the transformation between the CT image and the ERT can be calculated as

$$T_{CT}^{ERT} = T_{OTS}^{ERT}\left(T_{OTS}^{PFT}\right)^{-1} T_{CT}^{PFT}. \quad (5)$$

After preoperative registration, the virtual endoscope is located in the same coordinate system as the CT image. The OTS can then obtain the position and orientation of the endoscope in real time using Eq. (5). Assume that the endoscope tip is located at $P_0(x_0, y_0, z_0)$ with view direction $W$ and view-up direction $V$, and that $U = V \times W$ is the orthogonal direction in a left-handed coordinate system, as shown in Fig. 4(a). All three directions are 3D unit vectors. To section the volume in an arbitrary direction, we place an $m \times n \times d$ sectioning cube at the tip of the endoscope, aligned with the directions $U$, $V$, and $W$. The final hybrid rendering, fusion, and navigation display are realized within this cube.
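Chaining the tracked transforms can be sketched with 4 × 4 homogeneous matrices. This sketch assumes the convention that $T_A^B$ maps coordinates in frame A to frame B and acts on column vectors, so the rightmost factor is applied first. Under that assumption, one consistent chain is CT → PFT → OTS → ERT. The helper names `rigid` and `ct_to_ert` are hypothetical.

```python
import numpy as np

def rigid(R, t):
    """4x4 homogeneous rigid transform from a 3x3 rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def ct_to_ert(T_ct_pft, T_ots_pft, T_ots_ert):
    """Chain CT -> PFT -> OTS -> ERT.

    Assumes T_a_b maps a-coordinates to b-coordinates (column vectors), so the
    PFT -> OTS step is the inverse of the tracked OTS -> PFT transform."""
    return T_ots_ert @ np.linalg.inv(T_ots_pft) @ T_ct_pft
```

Inverting the result maps the ERT origin back into CT coordinates, which gives the endoscope position used to place the virtual camera. If a tracking API reports poses in the opposite convention, each factor must be inverted accordingly.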
Based on the position of the endoscope tip, the three orthogonal planes $\pi_i$ through the point $P_0$ can be expressed as

$$\pi_i:\ \tau_i \cdot (P - P_0) = 0, \quad i = 1, 2, 3, \quad (6)$$

where $\tau_i$ is the normal vector of plane $\pi_i$, corresponding to the directions $U$, $V$, and $W$, respectively, as shown in Fig. 4(a). The six planes of the sectioning cube are then generated by translating each of the three planes in two opposite directions by half the corresponding side length of the cube:

$$\pi_i^{\pm}:\ \tau_i \cdot (P - P_0) = \pm \frac{l_i}{2}, \quad (7)$$

where $l_i$ is the side length of the cube along $\tau_i$, and $\pi_i^{+}$ and $\pi_i^{-}$ are the two parallel planes generated from the same direction. Figure 4(b) shows a schematic of the cube section, where the UW plane corresponds to the V direction and the VW plane corresponds to the U direction. The rectangular region shows the hybrid rendering of the volume and surface data located in the sectioning cube. The viewing depth m can be customized along the virtual endoscope direction W. As m decreases, the range of viewing depth narrows, so the influence of deep, irrelevant structures on the final rendering is effectively minimized, which improves the information efficiency of the display. By varying the viewing ranges m, n, and d, irrelevant information and objects are discarded, and complex structures are visualized more effectively. Data sectioning thus helps surgeons avoid incorrect perception and concentrate on important objects.
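The sectioning test reduces to signed distances against the three pairs of parallel planes. The sketch below assumes a cube centered at the tip $P_0$ with side lengths `sizes` along the unit vectors U, V, and W. A point is kept when its distance to each central plane is at most half the corresponding side length. `section_mask` is a hypothetical name, and the assignment of m, n, and d to axes is left to the caller.

```python
import numpy as np

def section_mask(points, p0, u, v, w, sizes):
    """Boolean mask of points inside the sectioning cube.

    Assumes the cube is centered at p0, with side lengths sizes = (s_u, s_v, s_w)
    along the orthonormal directions u, v, w. Each pair of faces is the plane set
    tau_i . (P - p0) = +/- s_i / 2."""
    axes = np.stack([u, v, w])                                 # rows: normals tau_i
    local = (np.asarray(points, dtype=float) - p0) @ axes.T    # signed distances
    half = np.asarray(sizes, dtype=float) / 2.0
    return np.all(np.abs(local) <= half, axis=1)
```

The same mask can be applied to voxel centers or surface vertices, so both branches of the hybrid pipeline are clipped to an identical region. In a left-handed frame, `u` would be computed as `np.cross(v, w)`.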
After the scene is sectioned, the hybrid data can be composited. First, we render the scene and fill the buffer with facets. These facets are then sorted and composited back to front to generate the pixel colors. We assume that each pixel in the composited image accumulates opacity $\alpha$ and color $(r, g, b)$ along the ray through the hybrid data.
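The pixel value $(r, g, b, \alpha)$ of each sample at depth $z$ can then be blended recursively as

$$\hat{C}_{z} = \alpha_z C_z + \left(1 - \alpha_z\right)\hat{C}_{z+1}, \quad (8)$$

where $C_z = (r, g, b)$ and $\alpha_z$ are the color and opacity of the sample at depth $z$, $\hat{C}_{z+1}$ is the cumulative color of the samples behind it, and $\hat{C}_z$ is the updated cumulative color value. As the opacity of a sample approaches 1, the samples behind it contribute less to the final pixel.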
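Below is a minimal sketch of recursive back-to-front blending with the standard "over" operator. Samples are given as `(r, g, b, alpha, z)` tuples, with larger `z` meaning farther from the viewer; both the tuple layout and the function name are assumptions.

```python
import numpy as np

def composite_back_to_front(samples):
    """Blend (r, g, b, alpha, z) samples far-to-near with the 'over' operator:
    C <- alpha * c + (1 - alpha) * C. Returns (rgb, accumulated alpha)."""
    rgb = np.zeros(3)
    acc_alpha = 0.0
    for r, g, b, a, z in sorted(samples, key=lambda s: s[4], reverse=True):
        rgb = a * np.array([r, g, b], dtype=float) + (1.0 - a) * rgb
        acc_alpha = a + (1.0 - a) * acc_alpha
    return rgb, acc_alpha
```

An opaque sample in front hides everything behind it, and a half-transparent sample in front of an opaque one mixes the two colors equally.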
The basic process of depth-based buffer rendering is shown in Fig. 5. The cube represents the CT data, and the contours represent two different surfaces. For rendering, $(r, g, b, \alpha, z)$ is the buffered value of the volume data. First, the rendering range is determined from the depth of each clipping plane in the hybrid data. Then, the surfaces are rasterized to extract $(r, g, b, \alpha, z)$, and the results are composited onto the image plane. Finally, the composited and endoscopic images are fused to expand the view.
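The merge step of depth-based buffer rendering can be sketched per pixel. Volume samples and rasterized surface fragments, both stored as `(r, g, b, alpha, z)` entries, are restricted to the depth range between the clipping planes and ordered far-to-near, ready for recursive blending. `merge_in_range` and its arguments are hypothetical.

```python
def merge_in_range(volume_samples, surface_fragments, z_near, z_far):
    """Keep (r, g, b, alpha, z) entries whose depth lies in [z_near, z_far]
    and return them ordered back to front (farthest first)."""
    merged = [s for s in list(volume_samples) + list(surface_fragments)
              if z_near <= s[4] <= z_far]
    return sorted(merged, key=lambda s: s[4], reverse=True)
```

Sorting a single list that mixes both data types is what lets a surface be correctly interleaved with the volume samples in front of and behind it. Compositing the volume and surfaces separately would lose that interleaving.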

Distance-weighted 2D-3D fusion
To fuse the 2D endoscopic image with the 3D scene, tracking-based 2D-3D registration is performed through the OTS. The real-time pose of the endoscope acquired by the OTS is multiplied by the intrinsic and extrinsic matrices of the camera to obtain the position of the endoscopic image in the virtual scene. For this purpose, the coordinate transformation between the patient and the CT data must be calculated before surgery. This is achieved by placing fiducial markers on the patient's head before acquiring the CT data set. The surgeon then manually defines or segments the positions of these fiducial markers in the recorded CT data set. During surgery, the surgeon touches each fiducial marker with the pointing device of the navigation system, which establishes a coordinate transformation between the intraoperative scene and the preoperative CT data set. With infrared reflective markers attached to the endoscope, the navigation system continuously provides the position and orientation of the endoscope relative to the referenced patient. The endoscopic image is then projected onto the image plane of the virtual camera by Eq. (9), where $l$ is the distance from the virtual camera coordinate origin to the image plane. The result of Eq. (9) is rounded to the nearest integer pixel position in the endoscopic image.
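A pinhole-model sketch of projection onto the image plane at distance $l$, followed by rounding to integer pixels, is shown below. It assumes camera-space coordinates with the optical axis along +z, `l` expressed in pixel units, and a hypothetical principal point `(cx, cy)`. The exact form of Eq. (9) may differ.

```python
def project_to_pixel(point_cam, l, cx, cy):
    """Project a camera-space point onto the image plane at distance l from the
    camera origin (pinhole model), then round to the nearest integer pixel.
    (cx, cy) is an assumed principal point in pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = l * x / z + cx
    v = l * y / z + cy
    return int(round(u)), int(round(v))
```

For example, a point at (1, -2, 10) with `l = 500` and principal point (320, 240) lands on pixel (370, 140).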
After the endoscope has been projected onto the image plane of the virtual view, AR image fusion is performed. By combining the pixels of the endoscopic image with the hybrid rendering scene, we can highlight spatial information and remove redundant structures. To display accurate information with more vivid image effects, we propose a distance-weighted fusion method. Because only the central circular region of the endoscopic image contains image information, a three-stage regional transparency modulation is performed. Assuming the endoscopic image is an $M \times N$ pixel matrix, the illustrative point $P_{GED}(i, j)$ within a semitransparent layer is set as the distance … $P_{Hybrid}(i, j)$. The average fusion method is used to blend the image pixels; it extracts the pixel information from each image and averages it to obtain the final result.
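A three-stage radial transparency modulation can be sketched as follows. Pixels within an inner radius keep the endoscopic image, a ring fades linearly into the hybrid rendering, and pixels beyond an outer radius show only the rendering. The radii `r_inner` and `r_outer`, the linear ramp, and the function name are hypothetical choices, since the distance function used in the text is not fully specified. At the midpoint of the ring, the weight is 0.5, which reduces to plain average fusion.

```python
import numpy as np

def radial_fusion(endo, hybrid, r_inner, r_outer):
    """Blend an M x N x 3 endoscopic image with the hybrid rendering.
    Stage 1: r <= r_inner keeps the endoscopic pixel.
    Stage 2: r_inner < r < r_outer fades linearly to the rendering.
    Stage 3: r >= r_outer shows only the hybrid rendering."""
    M, N = endo.shape[:2]
    i, j = np.mgrid[0:M, 0:N]
    r = np.hypot(i - (M - 1) / 2.0, j - (N - 1) / 2.0)         # distance to center
    w = np.clip((r_outer - r) / (r_outer - r_inner), 0.0, 1.0)  # endoscope weight
    return w[..., None] * endo + (1.0 - w[..., None]) * hybrid
```

The smooth ring hides the hard border of the circular endoscopic view, which is otherwise a strong, misleading edge in the fused image.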

Experiments
A series of experiments was performed to evaluate the rendering method, assessing image similarity, visualization effects, and computational efficiency on simulated and clinical data sets. The evaluation was conducted on a computer with an Intel Xeon E5-2620 CPU, 24 GB of RAM, and an NVIDIA Quadro K2000 graphics card. A 1000 × 1000 pixel viewport was used for all measurements. For gradient-based volume shading, a three-point global illumination system is used to enhance volume visualization [45]. The illumination consists of main, fill, and back lights. The main light is white, with an azimuth and altitude of 60° and 20°, respectively, as shown in Fig. 6(a). The fill light is yellow, with an azimuth and altitude of 90° and −135°. The back light is blue, with an azimuth and altitude of 180° and 90°. The main and fill lights shade the volume data set with appropriate contrast and depth perception. The back light highlights rims and relatively thin structures and illuminates certain shadowed regions from behind. Figure 6 shows the results of the comparative experiments. The direct volume rendering result in Fig. 6(b) contains no gradient information, and its low-texture area in the red rectangle carries little color or spatial relation information. This issue becomes more evident in the blue rectangle owing to the extremely intricate anatomy and disordered structural texture of the nasal cavity. The linear gradient shading method is commonly used for volume rendering, but its results show drastic color changes and do not distinguish the importance of different structures, as shown in Fig. 6(c). This may cause necessary information to be overlooked and can easily lead to visual fatigue. In our method, the structural perception of complex and thin areas is enhanced by the gradient-based volume shading method.
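The three-point rig can be written down as light directions plus a simple diffuse term. This sketch assumes the following: azimuth is measured about the vertical y-axis from +z, altitude is the elevation above the x-z plane, "yellow" and "blue" are pure RGB (1, 1, 0) and (0, 0, 1), and shading is Lambertian. None of these conventions are stated in the text. `light_direction`, `THREE_POINT_RIG`, and `shade` are hypothetical names.

```python
import numpy as np

def light_direction(azimuth_deg, altitude_deg):
    """Unit vector for a light at the given azimuth (about the y-axis, from +z)
    and altitude (elevation above the x-z plane)."""
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    return np.array([np.cos(alt) * np.sin(az),
                     np.sin(alt),
                     np.cos(alt) * np.cos(az)])

# (assumed RGB color, azimuth, altitude) for the lights described in the text
THREE_POINT_RIG = {
    "main": ((1.0, 1.0, 1.0), 60.0, 20.0),
    "fill": ((1.0, 1.0, 0.0), 90.0, -135.0),
    "back": ((0.0, 0.0, 1.0), 180.0, 90.0),
}

def shade(normal, base_color):
    """Sum of Lambertian contributions of the three lights, clipped to [0, 1]."""
    n = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    out = np.zeros(3)
    for color, az, alt in THREE_POINT_RIG.values():
        out += np.asarray(color) * max(np.dot(n, light_direction(az, alt)), 0.0)
    return np.clip(np.asarray(base_color, dtype=float) * out, 0.0, 1.0)
```

In a volume renderer, `normal` would be the normalized gradient of the voxel, so flat regions with a weak gradient receive little rim light from the back light.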
The gradient magnitude is computed and mapped to the color of the volume structures, which intensifies structural information without changing the color of small-gradient areas. Figures 6(d), 6(e), and 6(f) show the rendering results of our method: the gradient over the forehead of the skull changes minimally, so the color barely changes there. Conversely, in the rectangular areas, the color enhancement at the edges of structures is noticeable. The most noticeable effect is that the wood-grain artifacts have been removed completely, and the rendering results appear smooth and realistic. The rectangular areas in Fig. 6 indicate that areas with complex structures have better surface consistency. To evaluate the results quantitatively, we use the structural similarity index (SSIM) [46] to measure pairwise differences. Because SSIM uses a perception-based model, it is suitable for image quality assessment. Its values lie in the range [−1, 1], where −1 indicates that the two images are very different and 1 indicates that they are identical. All the SSIM indices in Fig. 6 are computed between the leftmost method in each row and the top method in each column; the larger the difference, the smaller the SSIM. The differences are shown with color mapping and histogram statistics. The most noticeable feature is that, in all the SSIM maps, structurally uniform regions are lightly colored, whereas structure edges are heavily shaded. This indicates that the proposed method achieves the desired effect of shading areas with a high gradient magnitude. Our method effectively suppresses shading in low-complexity areas while improving structural perception in areas of larger gradient magnitude. In addition, a statistical analysis of all the SSIM values is given in Table 1. The SSIM color map between Figs. 6(b) and 6(f), shown in 6(bf), indicates the most similar structural rendering; in other words, the two methods differ least in their ability to represent structure and texture. Figure 6(ce) has the largest structural difference, mainly because the linear gradient shading method acts on the entire volume data. Figures 6(d) and 6(f) look similar, but their SSIMs differ considerably. The mean SSIMs of (bd) and (bf) are 0.74 and 0.87, respectively, which indicates that the proposed volume shading methods of Figs. 6(d) and 6(f) generate almost the same rendering effects. However, the differences revealed in the color maps of (bd) and (bf) are located at the edges of structures, which shows that our method is effective in enhancing structure and depth information. Fig. 7(e) has the smallest mean index of 0.64, and its SD = 0.16 is also the smallest. This indicates that although the difference between the two figures is significant, the structures cannot be well differentiated. However, the mean SSIM value of 7(bd) is the greatest at 0.79 ± 0.21, indicating its ability to effectively highlight structural information. In comparison with linear shading in 7(c), the SSIM of 7(cd) is the smallest (0.74 ± 0.19), which indicates better structural prominence. All the assessments demonstrate that the gradient-based volume shading method can effectively enhance depth and shape perception.
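A local SSIM map, of the kind shown as color maps in Figs. 6 and 7, can be computed with NumPy alone. The sketch uses the SSIM formula of Wang et al. with the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$. It uses a uniform 7 × 7 window rather than a Gaussian one, and grayscale images in [0, 1]. The window size and weighting used in the experiments are not stated, so these are assumptions. The mean of the map corresponds to the mean SSIM values reported above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim_map(x, y, win=7, data_range=1.0):
    """Local SSIM with a uniform win x win window for two grayscale images.
    Returns a map of shape (H - win + 1, W - win + 1) with values in [-1, 1]."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    xw = sliding_window_view(np.asarray(x, dtype=float), (win, win))
    yw = sliding_window_view(np.asarray(y, dtype=float), (win, win))
    mx, my = xw.mean(axis=(-2, -1)), yw.mean(axis=(-2, -1))
    vx, vy = xw.var(axis=(-2, -1)), yw.var(axis=(-2, -1))
    cov = ((xw - mx[..., None, None]) * (yw - my[..., None, None])).mean(axis=(-2, -1))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
```

Identical images give a map of ones, and an inverted image gives negative local values, because the covariance term changes sign.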

Surface and hybrid rendering
In this work, simulated data are used to evaluate multi-surface rendering. The data consist of several translucent surfaces with different topological relationships, such as intersecting, traversing, and disjoint. A green sphere in the middle of the scene represents the tumor. Red cylinders with different radii simulate blood vessels and have three kinds of relationships with the tumor: the 1st vessel penetrates the tumor along the Y-axis, the 2nd vessel traverses the tumor from the inside to the outside, and the 3rd vessel lies completely inside the tumor. A blue pyramid inside the tumor simulates the target, and a yellow prism outside the tumor represents a nerve that is disjoint from the tumor. Table 2 lists the parameters of all surfaces in the simulated scene, such as location, size, rotation, and transparency; the scene is shown in Fig. 8(a). The rotation is expressed by a four-tuple $(x, y, z, \omega)$ in axis-angle form, where the first three terms define a vector from the origin $O_{\text{Simul}}$ of the simulated scene to the point $(x, y, z)$, and $\omega$ is an angle in radians. In the implementation, the surface is rotated about this vector by $\omega$ radians. The weighted average, blended OIT, and depth peeling methods are compared, and each rendering result is viewed along the Y-axis (left) and along the negative Y-axis (right), as shown in Fig. 8. The weighted average method produces correct color synthesis, but the spatial relationships and depths of the structures disappear completely in Fig. 8(b). Although the blended OIT method composites the yellow nerve and blue target correctly, it composites the vessels incorrectly, and their structure and direction cannot be identified in Fig. 8(c). Another apparent drawback is that its composites have incorrect colors regardless of the view direction. The blended OIT method also provides no depth perception.
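The axis-angle rotation used for the surfaces in Table 2 maps directly to a unit quaternion $q = (\cos\frac{\omega}{2}, \hat{\mathbf{u}}\sin\frac{\omega}{2})$, where $\hat{\mathbf{u}}$ is the normalized vector $(x, y, z)$. The sketch below is a generic illustration of that conversion and of rotating a vertex by $q\,p\,q^{*}$; the function names are hypothetical and not taken from the authors' code.

```python
import math


def axis_angle_to_quaternion(x, y, z, omega):
    """Unit quaternion (w, qx, qy, qz) for a rotation of omega radians about axis (x, y, z)."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    s = math.sin(omega / 2.0) / norm  # also normalizes the axis
    return (math.cos(omega / 2.0), x * s, y * s, z * s)


def quat_mul(p, q):
    """Hamilton product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def rotate_point(q, p):
    """Rotate 3D point p by unit quaternion q using q * (0, p) * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *p)), conj)
    return (rx, ry, rz)


q = axis_angle_to_quaternion(0.0, 0.0, 1.0, math.pi / 2)
print(rotate_point(q, (1.0, 0.0, 0.0)))  # approximately (0, 1, 0)
```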
The depth peeling method shows the correct spatial order of the surfaces: where the 1st vessel traverses the tumor, the colors of all three segments are correct in Fig. 8(d), and the location of the nerve is noticeable. However, its color synthesis is distorted, so the direction of the blue target cannot be perceived through the vascular structures, which is a limitation of the depth peeling method. We further evaluate our proposed method using different cumulative function parameters $k_i$ without sorting. The color composition is effectively improved when $k_i = 0.5$, as shown in Fig. 8(e), which improves depth and spatial perception: the relationship between the 1st vessel and the tumor can be clearly identified from its color distribution, and the relationship between the blue target and the 1st vessel is easy to recognize. When $k_i = 1.5$, color accumulation is reduced, and structures inside and behind the tumor can be easily perceived. We also use the silhouettes of the surfaces to depict the edges of the vessels and nerves so that the basic shapes of the structures can be identified.
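The baselines in Fig. 8(b) to 8(d) differ in how they combine the fragments that fall on one pixel. The sketch below composites a single pixel's fragment list three ways. The first is front-to-back "over" on depth-sorted fragments, which is the result depth peeling converges to once all layers are peeled; `max_layers` mimics a truncated peel. The second is weighted-average OIT, and the third is weighted blended OIT. The depth weight used in `weighted_blended` is an illustrative choice (an assumption), not the weight used in the paper's experiments.

```python
def over_sorted(fragments, background, max_layers=None):
    """Front-to-back 'over' compositing; fragments are (depth, (r, g, b), alpha)."""
    frags = sorted(fragments, key=lambda f: f[0])
    if max_layers is not None:
        frags = frags[:max_layers]  # peel only the nearest layers
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, rgb, a in frags:
        for c in range(3):
            color[c] += transmittance * a * rgb[c]
        transmittance *= 1.0 - a
    return tuple(color[c] + transmittance * background[c] for c in range(3))


def weighted_average(fragments, background):
    """Weighted-average OIT: order independent, but discards depth order entirely."""
    n = len(fragments)
    sum_a = sum(a for _, _, a in fragments)
    if n == 0 or sum_a == 0.0:
        return tuple(background)
    avg = [sum(rgb[c] * a for _, rgb, a in fragments) / sum_a for c in range(3)]
    t = (1.0 - sum_a / n) ** n  # approximate transmittance from the mean alpha
    return tuple(avg[c] * (1.0 - t) + background[c] * t for c in range(3))


def weighted_blended(fragments, background, weight=lambda z: 1.0 / (1e-3 + z * z)):
    """Weighted blended OIT with an illustrative depth weight (assumption)."""
    accum = [0.0, 0.0, 0.0]
    accum_a = 0.0
    revealage = 1.0
    for z, rgb, a in fragments:
        w = weight(z)
        for c in range(3):
            accum[c] += rgb[c] * a * w
        accum_a += a * w
        revealage *= 1.0 - a
    if accum_a == 0.0:
        return tuple(background)
    return tuple(accum[c] / accum_a * (1.0 - revealage) + background[c] * revealage
                 for c in range(3))


frags = [(1.0, (1.0, 0.0, 0.0), 0.5), (2.0, (0.0, 1.0, 0.0), 0.5), (3.0, (0.0, 0.0, 1.0), 0.5)]
bg = (0.0, 0.0, 0.0)
print(over_sorted(frags, bg), weighted_average(frags, bg), weighted_blended(frags, bg))
```

With three equally opaque red, green, and blue fragments, `weighted_average` returns equal channels regardless of which fragment is nearest. This illustrates one way the depth cues vanish in Fig. 8(b), whereas sorted "over" gives the nearest fragment the largest contribution.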
Furthermore, we sort the surfaces according to their level of importance and $k_i$, as shown in Fig. 8(g) and 8(h). In Fig. 8(g), the vessels are assigned the highest priority because their shapes and directions are complex, the nerve ranks below the vessels, and the tumor has the lowest priority. The relationships among the structures are clear: the 1st vessel penetrates the tumor, and the 2nd vessel traverses it. Although the blue target is located in front of the vessels, the contours and shapes of the vessels remain identifiable. In the second sorting, used in Fig. 8(h), the vessels are assigned different levels of importance. Compared with Fig. 8(g), the high-importance target and nerve appear sharper in Fig. 8(h), whereas the low-priority 3rd vessel is de-emphasized, so the surgeon can focus on the important objects. To quantitatively evaluate the depth and spatial perception of the proposed multi-surface importance sorting method, we designed a five-point Likert scale of increasing agreement, from strongly disagree (1) through uncertain (3) to strongly agree (5), as shown in Table 3. Five questions probed differences in depth perception, color composition, and shape recognition among the methods. Because the weighted average method provided no depth information, it was excluded from the experiment. Therefore, six rendering methods were considered: blended OIT, depth peeling, and four variants of our proposed importance sorting method. In total, 15 participants took part: 5 clinicians in otolaryngologic surgery and 10 non-clinical volunteers without prior experience in volume visualization. Each participant completed the questionnaire with randomly arranged pairs of images from the six methods, as shown in Fig. 8.
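The exact cumulative function and sorting rule are defined earlier in the paper and are not reproduced here. As a hypothetical stand-in, the sketch below composites fragments in priority order (highest importance first, ties broken by depth) and treats $k_i$ as an exponent on opacity ($\alpha^{k}$), so that $k > 1$ reduces accumulation and $k < 1$ increases it, matching the qualitative behavior described for $k_i = 1.5$ and $k_i = 0.5$. Both the ordering rule and the exponent form are assumptions for illustration only.

```python
def importance_composite(fragments, background, k=1.0, sort_by_importance=True):
    """HYPOTHETICAL importance-ordered compositing for one pixel.

    fragments: list of (depth, importance, (r, g, b), alpha); higher importance
    means higher priority. Assumption: k acts as an exponent on opacity.
    """
    if sort_by_importance:
        order = sorted(fragments, key=lambda f: (-f[1], f[0]))
    else:
        order = sorted(fragments, key=lambda f: f[0])  # plain depth order
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, _, rgb, a in order:
        a_eff = a ** k  # k > 1 lowers, k < 1 raises the effective opacity
        for c in range(3):
            color[c] += transmittance * a_eff * rgb[c]
        transmittance *= 1.0 - a_eff
    return tuple(color[c] + transmittance * background[c] for c in range(3))


tumor = (1.0, 1, (0.0, 1.0, 0.0), 0.6)   # nearer, low priority
vessel = (2.0, 3, (1.0, 0.0, 0.0), 0.6)  # farther, high priority
bg = (0.0, 0.0, 0.0)
print(importance_composite([tumor, vessel], bg, sort_by_importance=False))
print(importance_composite([tumor, vessel], bg))
```

In depth order the nearer green tumor dominates the pixel. In importance order the red vessel behind it dominates, which is the kind of emphasis described for Fig. 8(g).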
Because each method was rated on five questions by 15 respondents, 75 responses were collected per method and 450 in total for the six methods. The responses were analyzed by analysis of variance (ANOVA), as summarized in Table 3, and the histogram of respondents' scores is shown in Fig. 9(a). Most responses strongly disagreed (46/75, 61.3%) or disagreed (19/75, 25.3%) that the blended OIT method provided perception of depth or spatial relations (9 uncertain and 1 agree). Its average score was 1.5 (SD = 0.8), the lowest of the six methods, and a one-way ANOVA showed that it differed significantly from the others (p < 0.001). By contrast, 18.7% of responses strongly agreed (14/75) and 22.7% agreed (17/75) that the depth peeling method enhanced spatial-relation perception, while 26.7% were uncertain (20/75), 26.7% disagreed (20/75), and 5.3% strongly disagreed (4/75). The average score of the depth peeling method was 3.2 ± 1.2, indicating rather ambivalent performance in spatial judgment. Because the one-way ANOVA showed a significant difference (p < 0.001), depth peeling can be considered to improve depth perception to some extent. For our proposed method, all four variants showed a significant enhancement of depth and spatial perception. For the no-sorting group with different cumulative parameters $k_i$, 30% of responses (45/150) strongly agreed and 31.3% (47/150) agreed, but 26% were uncertain, 8% disagreed, and 2% strongly disagreed about its perceptual enhancement. In contrast, the majority of responses strongly agreed (44.7%, 67/150) or agreed (36.7%, 55/150) with our importance sorting-based method, while 12.7% were uncertain, and only 4.7% disagreed and 1.3% strongly disagreed.
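The reported means, SDs, and percentages can be recomputed from the per-level response counts. The sketch below expands the counts stated in the text into individual scores (1 = strongly disagree through 5 = strongly agree) and uses the sample SD from Python's `statistics` module. Treating each response as one independent score is an assumption about how the summary statistics were computed.

```python
from statistics import mean, stdev

LEVELS = (1, 2, 3, 4, 5)  # strongly disagree ... strongly agree


def expand(counts):
    """Turn per-level counts (SD, D, U, A, SA) into individual Likert scores."""
    return [level for level, n in zip(LEVELS, counts) for _ in range(n)]


def summarize(counts):
    """Return (mean, sample SD, per-level percentages), rounded to one decimal."""
    scores = expand(counts)
    pct = [round(100.0 * c / len(scores), 1) for c in counts]
    return round(mean(scores), 1), round(stdev(scores), 1), pct


# Counts stated in the text, ordered (strongly disagree, disagree, uncertain, agree, strongly agree).
REPORTED = {
    "blended OIT": (46, 19, 9, 1, 0),
    "depth peeling": (4, 20, 20, 17, 14),
    "importance sorting, Fig. 8(g)": (0, 0, 4, 33, 38),
}

for name, counts in REPORTED.items():
    print(name, summarize(counts))
```

This reproduces 1.5 ± 0.8, 3.2 ± 1.2, and 4.5 ± 0.6, and shows that 38 of 75 responses corresponds to 50.7%.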
One-way ANOVA tests of the two groups against the blended OIT method indicated that our methods significantly intensified depth and spatial perception (p < 0.001), as shown in Table 3. These results indicate that most participants subjectively preferred our method, which supports the effectiveness of our spatial perception enhancement. Furthermore, one-way ANOVA tests of the two groups against the depth peeling method mostly showed significant differences. Specifically, both sorting variants, with the two different cumulative parameters $k_i$, differed significantly (p < 0.001), whereas in the no-sorting group only the variant with $k_i = 0.5$ differed significantly (p < 0.001); the variant with $k_i = 1.5$ showed no significant difference from depth peeling (p = 0.128). These results indicate that the proposed cumulative parameters and sorting method are effective in enhancing depth and spatial perception. As the Likert-scale boxplot in Fig. 9(b) shows, the method in Fig. 8(g) was preferred by most participants, achieving the highest average score and the lowest dispersion (4.5 ± 0.6). For this importance sorting-based method, 50.7% of responses (38/75) strongly agreed and 44% (33/75) agreed; only 5.3% were uncertain about its effectiveness (none disagreed or strongly disagreed). Hence, our method is effective and robust for the spatial perception of complex structures.
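The one-way ANOVA comparisons above reduce to an F statistic: the between-group mean square divided by the within-group mean square. The sketch below computes F and its degrees of freedom for blended OIT versus depth peeling, using scores reconstructed from the reported counts. This pairing is an assumption for illustration, and Table 3 may group the data differently. The p-value requires the F distribution's CDF; if SciPy is available, `scipy.stats.f_oneway(*groups)` returns both F and p.

```python
def expand(counts):
    """Per-level counts (SD, D, U, A, SA) -> individual Likert scores 1..5."""
    return [level for level, n in zip((1, 2, 3, 4, 5), counts) for _ in range(n)]


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


blended_oit = expand((46, 19, 9, 1, 0))
depth_peeling = expand((4, 20, 20, 17, 14))
print(one_way_anova_f(blended_oit, depth_peeling))
```

With 1 and 148 degrees of freedom, an F of roughly 108 lies far beyond any conventional critical value, consistent with the reported p < 0.001.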
Subsequently, surface and hybrid rendering experiments are performed on clinical data. Surfaces extracted from clinical volume data are complex, rugged, and difficult to predict, and additional surface layers along the line of sight make it difficult for surgeons to identify color and opacity. The depth peeling results are incorrect: the white rectangles in Fig. 10(a) mark confusing spatial relations among surfaces. The green tumor should be located behind the blue eye orbit, and the yellow optic nerve should never appear in front of the tumor. Our method correctly presents the spatial order in Fig. 10(d), and the directions and shapes of the vessels are also easy to identify. In addition, the hybrid renderings in Fig. 10(c) and 10(d) both present explicit structures by using our gradient-based volume shading method, which demonstrates its ability to enhance the perception of additional anatomical information.
The real-time sectioning performance, 2D-3D image fusion effects, and hybrid rendering efficiency were evaluated using the surgical navigation system. For real-time sectioning, we tested the rendering with sections of different sizes, namely a cube and a thin layer. The CT volume used in the experiments consisted of 328 × 356 × 443 voxels with a spacing of 0.54 × 0.54 × 0.6 mm. The cube section spans 134 × 176 × 123 voxels, a volume ratio of 5.6%, as shown in Fig. 11(a). The surgery-related structures of interest to surgeons are clearly shown in the cubic section, and the spatial relationships among the vessels, tumors, and nerves can be clearly identified. The thin-layer section contains eight slices, and our method provides a stereoscopic representation of the CT images. Subsequently, we fuse the endoscopic image with the hybrid rendering results, as shown in Fig. 11(d) and 11(e).
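The 5.6% volume ratio follows directly from the voxel counts. The sketch below reproduces it and shows how a cube section can be cropped from a volume stored as nested lists indexed `[z][y][x]`. It assumes that the dimensions are given in (x, y, z) order and that the eight-slice thin layer spans the full in-plane extent. Neither assumption is stated explicitly in the text, and the crop offsets in the toy example are arbitrary.

```python
def crop(volume, start, size):
    """Return volume[z0:z0+dz][y0:y0+dy][x0:x0+dx] for nested lists indexed [z][y][x]."""
    (z0, y0, x0), (dz, dy, dx) = start, size
    return [[row[x0:x0 + dx] for row in plane[y0:y0 + dy]] for plane in volume[z0:z0 + dz]]


def voxel_count(dims):
    nx, ny, nz = dims
    return nx * ny * nz


def physical_extent_mm(dims, spacing):
    return tuple(n * s for n, s in zip(dims, spacing))


FULL = (328, 356, 443)       # voxels, assumed (x, y, z) order
SPACING = (0.54, 0.54, 0.6)  # mm per voxel
CUBE = (134, 176, 123)
THIN = (328, 356, 8)         # assumption: full in-plane extent, 8 slices

cube_ratio = voxel_count(CUBE) / voxel_count(FULL)
thin_ratio = voxel_count(THIN) / voxel_count(FULL)
print(f"cube {cube_ratio:.1%}, thin layer {thin_ratio:.1%}")
print("full extent (mm):", physical_extent_mm(FULL, SPACING))

toy = [[[z * 100 + y * 10 + x for x in range(5)] for y in range(5)] for z in range(5)]
print(crop(toy, (1, 2, 3), (2, 2, 2)))
```

Under these assumptions the thin layer keeps under 2% of the voxels, which helps explain why it renders faster than the cube section.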
The endoscopic image transitions smoothly into the rendered scene, and multiple targets can be easily identified. Finally, we compare the fusion of the thin-layer section with full-size CT volume rendering. The full-size results are shown in Fig. 11(d) and 11(e), and the corresponding thin-layer results from the same perspectives are shown in Fig. 11(f) and 11(g). Compared with full-size rendering, fusing the thin layer with the endoscopic image preserves the important structures in view and discards much of the irrelevant information, which is helpful during surgery.
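The fusion procedure itself is described earlier in the paper. One plausible way to obtain the smooth transition mentioned above is to give the endoscopic image full opacity near the image center and fade it into the rendering with a smoothstep falloff. The sketch below implements that radial blend. The radii, the smoothstep profile, and the function names are all hypothetical and are not the authors' fusion method.

```python
import math


def radial_alpha(x, y, cx, cy, r_inner, r_outer):
    """Endoscope opacity: 1 inside r_inner, smoothstep falloff to 0 at r_outer."""
    d = math.hypot(x - cx, y - cy)
    if d <= r_inner:
        return 1.0
    if d >= r_outer:
        return 0.0
    t = (d - r_inner) / (r_outer - r_inner)
    return 1.0 - t * t * (3.0 - 2.0 * t)


def fuse(endo, render, r_inner, r_outer):
    """Per-pixel blend of two equal-size RGB images given as lists of rows of tuples."""
    h, w = len(endo), len(endo[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            a = radial_alpha(x, y, cx, cy, r_inner, r_outer)
            row.append(tuple(a * e + (1.0 - a) * r for e, r in zip(endo[y][x], render[y][x])))
        out.append(row)
    return out


endo = [[(1.0, 1.0, 1.0)] * 5 for _ in range(5)]
render = [[(0.0, 0.0, 0.0)] * 5 for _ in range(5)]
fused = fuse(endo, render, r_inner=1.0, r_outer=2.5)
print(fused[2][2], fused[2][0], fused[0][0])
```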

Data sectioning and 2D-3D fusion
In addition, we recorded the frame rate of all the aforementioned rendering tests. The frame rate was sampled every 30° as the scene rotated from 0° to 360°, yielding 12 sample points per rotation axis. To distribute the sample points uniformly over the observation sphere, we define three Cartesian coordinate systems at the center of the hybrid data space: $O_{\text{Hybrid}}\text{-}XYZ$, $O_{\text{Hybrid}}\text{-}X_1Y_1Z_1$, and $O_{\text{Hybrid}}\text{-}X_2Y_2Z_2$ (cf. Fig. 8(a)). $X_1Y_1Z_1$ is obtained by successively rotating about the Y and Z axes by an angle $\theta$, while $X_2Y_2Z_2$ is obtained by successively rotating about the Y and X axes. In this study, we set $\theta = 45^\circ$ for these rotations. All nine axes of the three coordinate systems serve as rotation axes for frame-rate sampling, which yields 108 view angles over the observation sphere. Figure 12(a) shows the frame rates for all rotation angles, and Table 4 lists the rendering frame rates for the different data types, including the maximum and minimum. The average frame rate of multi-surface rendering is approximately 53 fps, whereas that of full-size volume rendering is only 19 fps. For hybrid rendering, the average frame rates of full-size, cube, and thin-layer section rendering are approximately 13, 25, and 43 fps, respectively, as shown in Fig. 12(b). Only surface rendering and thin-layer hybrid rendering achieve average frame rates above 30 fps, which meets the real-time interaction requirement of surgical navigation in clinical practice. The most straightforward way to accelerate hybrid rendering is to increase the sampling step size; however, this may introduce substantial aliasing artifacts. Another way is to truncate the volume data and exclude unnecessary data from rendering.
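The 108 sampled views follow from 3 coordinate systems × 3 axes × 12 angles. The sketch below generates them with Rodrigues' rotation formula. It interprets "rotating about the Y and Z axes by θ successively" as applying fixed-axis rotations to the base frame, and it assumes an initial view direction of $+Z$. Both are assumptions, since the exact convention is not given here. Some poses share the same view direction, for example every 0° sample, so the count refers to poses rather than unique directions.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def rotate(v, axis, angle):
    """Rodrigues rotation of vector v about unit vector `axis` by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1.0 - c) for i in range(3))


X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def rotated_frame(first, second, theta):
    """Rotate the base axes about `first`, then about `second`, both by theta (assumed convention)."""
    return [rotate(rotate(e, first, theta), second, theta) for e in (X, Y, Z)]


def sample_view_poses(theta_deg=45.0, step_deg=30, view=Z):
    theta = math.radians(theta_deg)
    frames = [[X, Y, Z], rotated_frame(Y, Z, theta), rotated_frame(Y, X, theta)]
    poses = []
    for frame in frames:
        for axis in frame:
            for deg in range(0, 360, step_deg):
                poses.append((axis, deg, rotate(view, axis, math.radians(deg))))
    return poses


poses = sample_view_poses()
print(len(poses))
```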
In clinical applications, redundant structures may disrupt the surgeon's judgment, so rendering the whole dataset is generally unnecessary. Hence, hybrid rendering of cube and thin-layer sections, rather than of the whole dataset, can meet the requirements of surgical navigation.

Conclusion and discussion
In this study, we proposed an importance-driven hybrid rendering method to enhance depth perception for AR-based endoscopic surgical navigation. The method reduces the misleading presentation of complex internal anatomical structures and lowers the rendering cost, which gives it considerable potential for clinical application. First, important structures in the volume data were highlighted using the gradient-based volume shading method, which suppresses color shading in low-complexity areas by means of an exponential function of the gradient magnitude. The structural similarity of different rendering results indicated that the proposed method effectively enhances structure and shape perception. Second, an importance sorting-based OIT method was introduced to improve the comprehension of multi-structure rendering. Pre-sorting the priorities of different surfaces ensured that the shapes and relations of the targets could be clearly identified in a complex scene. As rated by 15 participants (five clinicians and ten non-clinicians) on a five-point Likert scale, the importance sorting method demonstrated improved average performance and statistical significance. Moreover, the proposed 3D real-time sectioning method allows surgeons to concentrate on critical structures during surgery and reduces the cost of hybrid rendering. The frame rates for simulated multi-surface data and clinical data were evaluated, and the average frame rate of hybrid rendering with thin-layer sectioning reached 42 fps, indicating that the method can be used in real-time surgical navigation to improve rendering efficiency and information validity. The proposed method is expected to substantially improve structure and depth perception in hybrid rendering for image-guided surgery.