User Experience Evaluation of a Spinal Surgery Robot: Workload, Usability, and Satisfaction Study

Background: Robotic spine surgery has continued to evolve since its US Food and Drug Administration approval in 2004, with products now including real-time video guidance and navigation during surgery. As the market for robotic surgical devices evolves, it is important to consider usability factors. Objective: The primary objective of this study was to determine the user experience of a surgical-assistive robotic device. The secondary objective was to evaluate workload, usability, the After-Scenario Questionnaire (ASQ), and the System Usability Scale (SUS). In addition, this study compares the workload, usability, and satisfaction survey of the device among different occupational groups using the device. Methods: Doctors (n=15) and nurses (n=15), the intended users of the surgical assistant robot, participated in the usability evaluation. Participants performed essential scenarios for the surgical assistant robot and provided scenario-specific satisfaction (ASQ), workload (NASA Task Load Index), and usability (SUS) scores. Results: Both doctors and nurses had task success rates of 85% or higher for each scenario. ASQ results showed that both doctors and nurses were least satisfied with ease of completing the task of registration (group 1: mean 4.73, SD 1.57 and group 2: mean 4.47, SD 1.8), amount of time it took (group 1: mean 4.47, SD 1.63 and group 2: mean 4.40, SD 2.09), and support information satisfaction (group 1: mean 5.13, SD 1.50 and group 2: mean 5.13, SD 1.89). All participants had low workloads, and the overall Task Load Index score had a P value of .77, which is greater than .05. The SUS results showed that the overall usability mean for doctors was 64.17 (SD 16.52) and the mean for nurses was 61.67 (SD 19.18), with a P value of .84, which is greater than .05, indicating no difference between the 2 groups. Conclusions: In this study, doctors and nurses evaluated the interaction of the device in a simulated environment, the operating room. By evaluating the use experience and usability of the device with real intended users, we can develop a more effective and convenient user interface.


Background
Spine surgery is used to treat degenerative diseases and deformities of the spine, with 45 million surgeries performed annually in the United States [1].The use of robotic-assisted navigation is increasing as the number of patients undergoing lumbar spinal fixation increases [2].Spine surgery typically involves 7 people in the operating room, with an operator surgeon, a surgical first assistant (who may be a doctor or physician assistant nurse, depending on operating room staffing), and scrub nurse in the sterile area and a circulating nurse and radiologist in the nonsterile area.Nonoperative personnel include an anesthesiologist and an anesthesiologist assistant.The use of robotics in spine surgery is usually reserved for difficult anatomical areas where it is difficult to fix screws blindly.Spinal fusion surgery is the insertion and fixation of pedicle screws into the vertebrae to eliminate pain by preventing movement between vertebrae [3].It is also used for quick insertion in severe scoliosis, collapsed vertebrae, or long-level fusion in patients with difficult anatomy, usually at the iliac screw, C1, C2, C7, T1, and T2, or for other reasons.This is usually used for kyphosis and scoliosis correction.

Robot Spine Surgery
Surgical navigation systems are used to plan the procedure and guide the surgeon in inserting the screws [1].Robot spine surgery is popularly used to increase the accuracy of inserting screws in the spine, and the first robot used in spine surgery was the Spine Assist (Mazor Surgical Technologies), which received Food and Drug Administration clearance in 2004 [4].The third-generation Mazor X system was cleared by the Food and Drug Administration in 2016 and, compared to previous generations, has a robotic arm that is attached to the patient's body and can be viewed through a camera to ensure that the screws are inserted and the robotic arm is moving well [5].The Mazor X Stealth Edition technology, which adds real-time image guidance and navigation during surgery, was cleared in 2019 and combines the best of both worlds: traditional spinal robotic surgery guidance and real-time software confirmation [5,6].Figure 1 shows the evaluation device, which consists of a robotic arm, main console, and optional staff console, and is manufactured in South Korea.Like the Mazor X Stealth Edition technology, this product is capable of real-time image guidance and navigation during surgery.

Usability
According to IEC (International Electrotechnical Commission) 62366-1 [7], usability has the following meaning: a "characteristic of the user interface that facilitates use and thereby establishes effectiveness, efficiency and user satisfaction in the intended use environment."ISO (International Organization for Standardization) 9241-210 [8] defines usability as the "extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use."These documents demonstrate that good usability in medical device design is essential to preventing error-related risks.
Evaluating usability focuses on determining whether the test device is easy for users to use.To evaluate the user experience of the device, we used usability tests and surveys for effectiveness, efficiency, satisfaction, and workload.The usability test was mainly used to identify use errors and efficiency, while the After-Scenario Questionnaire (ASQ) was conducted to evaluate the satisfaction of each task.NASA Task Load Index (NASA TLX) was used to measure the workload of the device, and System Usability Scale (SUS) was used to XSL • FO RenderX check the overall system usability of the device.Both ASQ and SUS identified the satisfaction and efficiency of the device, but in this study, ASQ identified the efficiency and satisfaction of each scenario, while SUS evaluated the satisfaction and efficiency of the device in the overall workflow.

Recruitment
We recruited 30 medical staff from Severance Hospital in South Korea.There were 15 doctors and 15 nurses.The intended users of the device are doctors and operating room nurses.Due to the different tasks that doctors and nurses have to perform when using the device overall, both groups were selected to participate in the usability test.We recruited through recommendations from colleagues and notice on the bulletin board at the Future Medicine Research Center at Gangnam Severance Hospital.Doctors and nurses who have experience using a robotic surgical device or navigation system were selected.For doctors, we selected those with the necessary knowledge of spine surgery, and for nurses, we selected those with experience in the operating room.However, those who had worked in the operating room for less than 1 year were excluded.After confirming these inclusion and exclusion criteria, the screening was conducted.

Testing Procedure
One participant per session participated in the usability test, with the participant completing the assessment in a sequence guided by a facilitator, and an observer in the observation room videotaping the assessment and completing an observation sheet.
A total of 2 moderators and 2 observers participated in the evaluation, with one of the moderators acting as a nurse if a doctor participated in the evaluation, and one of the moderators acting as a doctor if a nurse participated in the evaluation.In addition, observers were used to reduce bias by having 2 observers observe the usability test to ensure that one person's opinion was not biased.
The facilitator introduced the participants to the usability test, obtained their informed consent, and trained them on the device for 20 minutes.Participants were allowed to interact with the device as much time as they needed.Afterward, 15 minutes were allowed between the training and evaluation to ensure that the training did not directly influence the evaluation [9].Doctors and nurses were given different tasks because of the different job duties they do when operating with assessment devices.The tasks given to nurses focus on doctors' instructions from preoperative preparation to surgery, while doctors focus on the surgery itself rather than preparing devices.
Participants then completed the evaluation for 40 minutes, with 8 scenarios (24 tasks) for doctors and 12 scenarios (41 tasks) for nurses.Doctors were asked to complete the following scenarios: preparation for use, preplanning of surgery, fixation of patient marker, scan, registration, verification, revisions of surgical planning, and navigation; nurses were asked to complete the following scenarios: preparation for use, system operation, initializing manipulator, drape, preparation of surgery, scan, registration, verification, planning of surgery, navigation, use of the emergency stop switch, and cleaning up after surgery.Tasks for each scenario are shown in Tables 1 and 2.

Preparation for use
Check the contents related to the emergency button, foot switch, and foot jamming in the user manual.Task Check the hand jamming label on the robotic arm manipulator.Task

Preplanning of surgery
Select the spine level as follows: After loading the first CT a data, check the CT data.Task Create an implant screw insertion path for target L4-L and L4-R and change the insertion path by moving the screw in the MPR b view.

Task
Create an implant screw insertion path for target L5-L and L5-R and change the insertion path using the arrow.Task Check for collision between each screw.Task

Fixation of patient marker
Attach the patient marker to the patient.Task

Scan
Attach the registration tool adapter to the end effector.Task After activating the hand guide function by pressing the AP c button, change the position of the end effector according to the guidance on the pop-up window.

Task
For C-Arm scan, attach the source calibrator to the end effector in the direction of AP and move the end effector to enable tracking by OTS d .

Task
Check if the ROI e includes the calibration marker, and the position and direction of the letters "R," "G," and "J" match the image, and then check pass or fail of registration.

Registration
Perform segmentation to distinguish the surgical target in the image.
Perform labeling to assign target level information of ROI of 3D image and 2D image segmented for each spine level.Task Adjust the ROI box so that the ROI of target L4 covers all the L4 vertebra area.Task Perform 2D and 3D image registration for each spine level.Task

Verification
After adjusting the CT image to overlay appropriately for target L4, check the registration result using the preview button and select whether to approve it.

Task
After selecting whether to approve for target L5, perform image registration again so that the ROI includes all the vertebra area.

Revision of surgical planning
Check the plan on 2D and 3D images, respectively.Task As a result of planning for the entire target, check whether the robot can move in an area.Task

Navigation
Insert the screw of target L4-L.Task Through the [PRE-OP] screen, indicate the values for the insertion depth of the L4-L tapper, the amount of force applied to the end effector, and the patient's movement.

Task
Through the [INTRA-OP] screen, indicate the values for the insertion depth of the L4-L screw, the amount of force applied to the end effector, and the patient's movement.

Task
Move the end effector to the ready position for screw insertion to the target L4-R.Task Table 2. Use scenarios of nurses.

Preparation for use
Check the contents related to the emergency button, foot switch, and foot jamming in the user manual.Task Check the hand jamming label on the robotic arm manipulator.Task Check if there are any abnormalities in the exterior of the robot marker frame and robotic arm.Task Place the main console and staff console in a convenient location during surgery.Task After checking the device and accessories in the operating room, assemble the marker ball.Task Assemble the surgical tools such as the tapper's driver and marker.Task Assemble the surgical tools such as screwdriver and marker.Task Assemble the clamp to be used to fix the patient marker.Task

System operation
Connect the power and cables of the robotic arm, main console, and staff console.Task After connecting the foot switch of the robotic arm, turn on the power of the robotic arm.Task

Initializing manipulator
After logging in, select the surgical method and imaging device.Task Check the connection status of the foot switch.Task After selecting the robot position as "right," initialize the manipulator (required to check movement notification sound and operation LED a ).

Task
Verify that the line laser on the robot marker intersects the area within range (required to check movement notification sound and operation LED).

Drape
Follow the on-screen instructions to drape the patient to prevent infection (proceed in order of manipulator drape, base drape, and robot marker drape).

Task
After installing the end effector of the robotic arm, assemble the marker ball where the robot marker drape is installed.Task Move the manipulator to the ready position.Task

Preparation for surgery
Move the robotic arm for patient surgery.Task Check the surgical tools through OTS b , and if all surgical tools are not checked by the OTS camera, check if they are within the operating area.

Task
The robot marker is not being recognized by the OTS camera due to damage to the marker ball.Replace with a new marker ball.

Task
Please load the surgical data.Task

Scan
Check if the ROI c includes the calibration marker, and the position and direction of the letters "R," "G," and "J" match the image, and then check pass or fail.

Registration
Perform segmentation to distinguish the surgical target in the image.
Perform labeling to assign target level information of ROI of the 3D image and 2D image segmented for each spine level.

Task
Adjust the box so that the ROI of target L4 covers all of the vertebra area.Task Perform 2D and 3D image registration for each spine level.Task

Task description
Use scenario/task number Use the slide control at the bottom of the image to check whether the 2D and 3D images match to check the registration result.

Task 27
For target L4, move the CT d (DRR e ) image by using the triangular button to adjust the 2 body images to be similar.Task 28 For target L4 whose registration result has been adjusted, use the preview button to check the registration result and select whether or not to approve it.

Task 29
After selecting whether or not to approve for target L5, perform image registration again so that the ROI includes all of the vertebra area (target L5: registration failed).

Task 30
After displaying the planned data on the screen through the preview button for each target for which the registration result has been adjusted, check if the registration is completed normally.

Task 31
Depending on the registration result of target L5, select whether or not to approve (target L5: registration completed normally).The test environment as shown in Figure 2 is organized to resemble the operating room.Participants used the device following prompts presented on a stand monitor.The test environment was organized similar to an operating room, considering the use environment of the robot spine surgery.An operating room bed, an upper torso dummy, and a patient monitoring device were prepared similar to the actual operating room environment.The temperature and humidity of the evaluation room were measured and recorded right before the evaluation.Similar to a real operating room, the temperature was kept between 20 °C and 24 °C, and the humidity was between 30% and 60%.
The evaluation facilitator guided the participant if they requested assistance with a use scenario, and an observer recorded all participant interactions from outside the test room with a 1-way mirror.The test observation environment setting is shown in Figure 3.The observer used a program from Media Express to record the progress of the usability evaluation.At the end of the evaluation, 3 types of questionnaires were administered.

ASQ Measure
After each scenario, the participants completed the ASQ created by Lewis [10] and developed from the ISO 9241-11 standard questionnaire [11].This is one of the most popular surveys for assessing usability because it is the simplest and its 3 items are easy for participants to understand [11].As shown in Textbox 1, the ASQ consists of 3 questions, each corresponding to the user's satisfaction with the ease, efficiency according to the time taken to complete the scenario, and validity of the information provided.Participants responded to each question on a 7-point Likert scale [10,12].Participants rated their satisfaction about the device's usability based on each task scenario [13].A score of "1" means strongly disagree, and a score of "7" means strongly agree [14].We found the mean and SD for the 3 questions participants asked ASQ.

XSL • FO
RenderX Textbox 1. After-Scenario Questionnaire (ASQ).ASQ1: Overall, I am satisfied with the ease of completing the tasks in this scenario.ASQ2: Overall, I am satisfied with the amount of time it took to complete the tasks in this scenario.ASQ3: Overall, I am satisfied with the support information (digital help, messages, and documentation) when completing the tasks.

NASA TLX Measure
NASA TLX measures cognitive workload, and like usability, workload is a complex construct that determines the amount of physical and mental effort required to use an interface [15,16].The workload is assessed by the US NASA TLX [15,17].The most effective way to assess a worker's perceived job difficulty is to ask questions directly to workers who have experienced the job.As shown in Figure 4, the Task Load Index (TLX) uses 6 dimensions to measure workload.The 6 metrics are mental demand, temporal demand, physical demand, performance, effort, and frustration [16,18].The NASA TLX scores are evaluated by dividing the score into 21 steps, subtracting 1 from the score, and multiplying it by 5 to express it on a scale of 0 to 100 [19,20].On a scale of 100, when the score is lower, the workload is lower.Less work means a less complex and easier-to-use user interface.On a 100-point scale, the workload can be described as low (0-9), medium (10-29), rather high (30-49), high (50-79), and very high (80-100) [21].In the NASA TLX, performance assesses satisfaction with task completion, with the lowest number representing perfect and the highest number representing failure [22].The point system for mental, physical, temporal, effort, and frustration part ranges from very low to very high [15,16,19,22,23].

SUS Measure
As shown in Textbox 2, the SUS consists of 10 items that assess the participant's level of agreement with the overall usability of the system, with odd-numbered items being positive and even-numbered items being negative [24].SUS is the most commonly used usability assessment questionnaire [24].Participants responded to each item on a 5-point Likert scale [13].The scale ranges from 1=strongly disagree to 5=strongly agree [25,26].To calculate the SUS score from the points acquired from the 5-point Likert scale, the following subtractions were used.For odd-numbered items, subtract 1 from the user response, and for even-numbered items, subtract the user responses from 5. With this calculation, the value range changes from 0 to 4. The most positive response is 4. The scores from each converted response were multiplied by 2.5 to a total possible point of 100.The SUS is percentage-based and divided into 5 levels: A (>80.3), B (68-80.3),C (68), D (51-68), and F (<51) [24].A score of 85 is considered very good usability, and a score of 68-84 is considered good usability [25,26].

Data Analysis
ASQ, NASA TLX, and SUS results were computed using SPSS (version 22; IBM Corp) [27].Descriptive statistics were performed on for doctor and nurse characteristics.Doctors and nurses were compared on age, gender, work experience, and use experience with similar devices.For the questionnaire items, values were compared between groups using 2-tailed t tests for normality and Mann-Whitney U tests for nonparametric tests.Figures are presented as the mean and SD, and P<.05 was considered significant.

Ethical Considerations
This study was approved by the institutional review board of Yonsei University Health System, Gangnam Severance Hospital (3-2022-0493).All the participants who passed the screening signed an informed consent form.Furthermore, all information collected about the participants was anonymized.This study complied with the Code of Ethics.Participants received monetary compensation for participating in the evaluation.

User Statistics
In total, 15 doctors and 15 nurses, each representing the intended users of the surgical assistant, participated in the evaluation.Participants were recruited from doctors and nurses at Severance Hospital.Table 3 shows the sociodemographic characteristics of participants.Both doctors and nurses were between the ages of 30 and 39 years.For both doctors and nurses, those with different experience levels were recruited, and opinions were collected from all the participants.In particular, those with more experience with surgical devices were able to gather relevant opinions because they were more familiar with the device or the existing surgical methods, while those with less experience focused on whether the device was easy to use without much experience.Doctors' professional experience ranged from 2 to 20 years, with an average of 7.53 (SD 5.45) years of professional experience.The nurses' professional experience ranged from 5 to 26 years, with an average of 12.93 (SD 6.43) years.The surgical assistants had used Medtronic (Medtronic), Stryker (Stryker Corp), and Curexo (Curexo, Inc), with an average of 3 (SD 2.36) years of experience.

Task Completion
The 15 doctors performed 24 tasks within 8 large scenarios, while the nurses performed a total of 41 tasks within 12 scenarios.As shown in Table 4, for the doctors, all 8 scenarios had a success rate of 90% or higher, with the lowest success rate for the revising a surgical plan scenario.As shown in Table 5, for nurses, all 11 scenarios except the surgical plan had a success rate of 90% or higher, with the planning of surgery scenario having an 87% success rate.Among the scenarios in which the doctors did not successfully complete a task during the assessment, the critical tasks were as follows: in task 11, the participant failed to follow the process of attaching the source calibrator in the opposite direction to track the optical tracking system and did not recognize the correct attachment method.In addition, pressing the anterior-posterior scan button and moving the end effector closer to the dummy proceeded correctly, but before attaching the source calibrator, the optical tracking system process could not be performed because it did not proceed in the existing pop-up window and proceeded to the data acquisition step.In task 18, participants selected the target but were unable to click the "Re-matching" button.To proceed with rematching, a target needs to be selected and pressed, but the Re-matching button could not be clicked because the target was not selected.In task 21, the participant did not recognize whether the robot movement was completed by continuously pressing the foot switch without releasing it.In this case, the participant said that he was unable to perform the task because there was no indication on the screen that the robot's movement was complete, and there was no visual or audible user interface.
The nurse was unable to complete task 28 due to difficulty using the image adjustment feature.Participants were asked to move the computed tomography image and adjust the body image to be similar but could not comprehend how to use the "Adjustment" function or the "Re-matching" function (the user did not recognize the intended function itself).Even when the "Adjustment" function was used, it was observed that the user could not use the "Adjustment" function in the way intended by trying to adjust the overlayed screen itself rather than adjusting the screen by pressing the button.If an accurate match is not made, the manipulator may move to a different location than the user's target location, causing potential harm.In addition, nurses had difficulty using the reassembly feature of task 30.The "Re-matching" function could not be used because the target was released while pressing the "Disapproved" button in the rematching task, or the "Adjustment" function was used rather than using the "Re-matching" function.Participants failed to perform the task because they did not recognize that the "Re-matching" function could only be used by resetting the target that was released when pressing the "Disapproved" button, or that "Re-matching" meant rematching.This caused potential harm by moving the manipulator to a location different from the user's target location.Nurses were unsuccessful in tasks such as registration, verification, and navigation because these tasks are usually performed through doctors' orders.During the scenarios, there were no given orders, forcing the nurses to make their own decisions, which they are not accustomed to.
Overall, 4 doctors said that when creating a screw position in the planning stage, the position is created in a completely different part from the actual location; thus, it would be better if the position could be created closer to the target, and when moving the position, that it would be better to be able to check other position paths at the same time.In total, 7 doctors said it would be better if there was notification or guidance for the arrival of the robot arm at the target so that moving to the guided position can be recognized.In addition, 5 doctors and 8 nurses found that in the overall process of selecting and adjusting the region of interest (ROI) box to the target area, it was inconvenient to select and release the box, and that it was difficult to adjust because of its excessive rotation.

Usability (ASQ)
After the usability evaluation, doctors and nurses were surveyed using the ASQ for each scenario.For both doctors and nurses, the ASQ for registration was divided into 2 parts: first, segmentation and labeling, and second, ROI setting and image matching.As shown in Tables 6 and 7, among the registration items, both doctors and nurses had the lowest scores for the ROI setting and image matching, followed by ease of completing the task (group 1: mean 4.73, SD 1.57 and group 2: mean 4.47, SD 1.89), amount of time it took (group 1: mean 4.47, SD 1.63 and group 2: mean 4.40, SD 2.09), and support information satisfaction (group 1: mean 5.13, SD 1.50 and group 2: mean 5.13, SD 1.89).The doctors' opinions were mainly that it was inconvenient to have to click on the line precisely; thus, the ease of adjustment should be improved.Nurses reported that they were less sensitive to the 360-degree rotation button at the XSL • FO RenderX top of the ROI box and had difficulty clarifying the image while adjusting the ROI box.Tables S1 and S2 in Multimedia Appendix 1 show categorization by use experience with similar devices.The ASQ results did not show significant differences in satisfaction based on use experience and years of experience with robotic surgical systems.For doctors, those with more than 3 years of experience using robotic surgical systems found it easier and faster to perform tasks.For nurses, participants with more experience using similar devices scored higher than those with less than 3 years of experience on the need to prepare before surgery.

Workload (NASA TLX)
Table 8 shows the results of the workload of the assistive robotic surgery devices by occupational group for mental demand, physical demand, temporal demand, performance, effort, frustration, and overall TLX.The Mann-Whitney U test comparing the TLX scores of the doctors and nurses, including mental demand (P=.81), physical demand (P=.90), temporal demand (P=.87), performance (P=.81), and frustration (P=.81) and the independent 2-sample t test comparing the TLX scores of effort (P=.64) and overall TLX (P=.77) showed no significant XSL • FO RenderX differences in the scores.Doctors' workload levels were generally in the medium (10-29), and nurses were also in the medium (10-29) except for effort.Figure 5 shows a boxplot of the NASA TLX results for doctors and nurses.In addition, Figure 6 shows the NASA TLX results for all evaluation participants (doctors and nurses).The box plots show the maximum (45-100), median (5-25) minimum (0), first quartile (0-17.5),and third quartile (12.5-52.5),with the center box showing the median of 50% of the cases.When comparing the workload of the doctors and nurses, there was no significant difference as shown in Table 8, and we can see that 3 categories, physical demand, temporal demand, and frustration, have lower workloads than the others.In Tables 9 and 10, NASA TLX scores were compared based on use experience with similar devices.Participants who had been using robotic surgical systems for more than 3 years, both doctors and nurses, reported that the evaluation device required a lot of effort to use.Participants had difficulty using the evaluation device because it was more complex than similar robotic surgical devices.However, when it comes to temporal demand, those who have been using robotic surgical systems for more than 3 years reported that it is not time-consuming.The workflow of the evaluation device is not much different from existing robotic surgical devices, and the graphical user interface is easy to use and can be performed quickly.

Usability (SUS)
Both doctors and nurses who participated in the usability test completed the SUS questionnaire.Table 11 shows that the mean score of SUS was 64.17 (SD 16.52) for doctors and 61.67 (SD 19.18) for nurses.The nonparametric Mann-Whitney U test was used to analyze U values and P values presented in Table 11.
When comparing the SUS scores of the doctors and nurses (P=.84, greater than .05),we can see that there was no significant difference between the 2 values.Figure 5 is a boxplot comparing the SUS scores of the doctors and nurses, showing a baseline of 68, which is the average SUS score.For the nurses and doctors, this corresponds to a grade of D on the SUS scale.When we compared participants' use experience with the device between those who had used it for more than 3 years and those who had used it for less than 3 years, we found that for doctors, those who had used it for more than 3 years had lower SUS scores than those who had used it for less than 3 years.For doctors, the average SUS score for participants with 3 or more years of experience is 55.5 (SD 19.13), while the average SUS score for those with less than 3 years is 68.5 (SD 13.05).For nurses, similar to doctors, we found that participants who had used a similar device for more than 3 years had lower scores than those who had used it for less than 3 years, at 53.75 versus 64.55 (Table 12).

Principal Findings
This study is a summative evaluation to examine the usability of the frameless stereotaxic navigation system (model CS200) that is used as an auxiliary tool for guiding the surgical tool to the target position and posture planned by the user in the incision or percutaneous spinal surgery.Regarding the use scenario, (1) task success, (2) use error, (3) satisfaction (ASQ), ( 4) workload (NASA TLX), and (5) SUS related to the usability of the test device were evaluated and analyzed.The usability test was conducted by professional medical staff who have completed specialized medical education and obtained professional medical qualifications.The participants in the usability test were doctors and nurses who have experience in using spinal surgery robots or navigation systems in operating rooms.
For doctors, all 8 scenarios had a success rate of at least 93% or higher, and for nurses, all 12 scenarios had a success rate of at least 87% or higher.Doctors had the lowest success rate of 93% on the "revision of surgical planning" scenario.In the ASQ results, the average score for "ease of completing the task" was 5.66 (SD 0.36), the average score for "amount of time it took" was 5.54 (SD 0.52), and the average score for "support information satisfaction" was 5.70 (SD 0.30).When performing the usability evaluation, the "revision of surgical planning" scenario had the lowest success rate; however, the 3 ASQ scores were higher than the average: 5.80 (SD 0.83), 5.93 (SD 0.68), and 5.93 (SD 1.29).

RenderX
The "image matching" scenario had the lowest score for each item in the satisfaction score, even though it had a 100% success rate.We found that a high success rate on the evaluation task does not necessarily indicate high usability satisfaction.Despite the high task success rate in the usability test, the low satisfaction rate in the questionnaire that evaluated the usability aspects of the device indicates a lack of satisfaction with the device.
Although there were no difficulties in performance, the task of adjusting the ROI within the "registration" scenario was criticized for its difficulty in accurately adjusting the ROI and its lack of usability.Nurses, like doctors, had the lowest success rate of 87% in the "revision of surgical planning" scenario.However, the ASQ survey results for the "registration" scenario, which had the highest success rate of 98%, showed the lowest scores for "ease of completing the task" with a mean of 4.47 (SD 1.89), "amount of time it took" with a mean of 4.40 (SD 2.09), and "support information satisfaction" with a mean of 5.13 (SD 1.89).Similar to the doctors, when adjusting the ROI box, many of the nurses commented that the 360-degree rotation button at the top was not sensitive, and the video was difficult to see clearly.In addition, both doctors and nurses reported that when moving the robot arm using the footswitch during the "navigation" scenario, there was no visual or audible indication of how far the robot arm had moved and whether it had completed its movement, resulting in collisions between the robot arm and the patient stack.If used with real patients, this could lead to a significant risk of patient injury.We believe that the usability of these screens needs to be improved.
When comparing the workloads of the 2 groups who primarily use the assessment tool, we found that there was no difference, and that the workloads of mental demand and performance are high for both groups.Doctors and nurses commonly commented that the process was too complicated and laborious and that they had doubts about the accuracy of the ROI adjustment.In addition, since the robot assists in surgery, we thought it would be a quick process, but we found that the robot needed more time to move than expected, which affected the workload.
The SUS also showed no significant difference between doctors and nurses, with slightly lower-than-average satisfaction scores.The doctor gave the system a low score on the usability scale because it was too time-consuming to use in the actual operating room.Nurses gave low scores because of the time-consuming setup prior to actual surgery.Both doctors and nurses gave low scores, especially on the items that they felt they would need technician support to use the system and that they would need to learn about the system before its use, because many of them had never used an assessment device before and were not familiar with it.In addition, the lack of usability, with no explanation of what to do next on the device and no prompts to prevent errors, contributed to the low scores.At the hospital where participants work, engineers who have no difficulty using similar devices are present to aid doctors and nurses in using the device.Because of their reliance on engineers, many of the participants had difficulty in using the device alone and commented that they needed the engineers' support.This suggests that the device needs to be highly usable with an easily understandable user interface and a screen design that is familiar to medical staff so that they can use the device without engineers' assistance.Overall, when evaluating usability, there was no significant difference between participants who had used similar devices for more than 3 years and those who had used them for less than 3 years, except for ASQ and SUS, which evaluate satisfaction, and NASA TLX, which evaluates the difficulty of the operator's job.It was found that there were differences in job duties when using the equipment that they were familiar with as well as differences in the time taken to perform the tasks.In other items, there seems to be no problem in using the device once they are familiar with it.
Table S3 in Multimedia Appendix 1 shows the improvements made to the device since the usability evaluation.The user interface was improved by quantifying and intuitively displaying data that used to be shown only as graphs.Confusing highlighted buttons that hindered the use of the device were rearranged to decrease the errors made by users.
To further increase the usability of the device, the footswitch needs to be improved to recognize how much movement is required to move the device by displaying the information on the interface.In particular, during the navigation phase, it would increase the usability if a notification or on-screen guidance appeared when the target was reached while moving to the location guided by the robotic arm.In addition, the drape is divided into 3 stages, while other similar devices only have 1 drape, increasing the risk of contamination.
Although the participants have experience using third-party equipment, they had difficulty using this evaluation device for the first time because they were not familiar with it; however, we do not think there will be any major problems once they are accustomed to the evaluation device.In addition, as a robot that guides the position of the screw in the patient's body, it should have a more accurate and simpler workflow, making it more competitive with other products.

Limitations
This study is limited by the fact that our evaluation took place at a single institution, Severance Hospital in South Korea.In South Korea, robotic surgery is not yet widely used, and many people have not used robotic surgical instruments before; thus, they are still unfamiliar with robotic surgical instruments.However, the strength of this study is that we conducted usability tests with doctors and nurses in the operating room, who are the closest users of the new system, the surgical assistant robot.

Conclusions
In group 1, a success rate of 93% or higher was observed in all 24 tasks.In group 2, a success rate of 87% or higher was observed in 38 of 41 tasks.A success rate of 80% was observed in the task related to marker ball view confirmation (task 18), 80% in the task related to the use of the "Adjustment" function (task 28), and 75% in the task related to using "Re-matching" (task 30).
In addition, subjective data such as follow-up questions and surveys were more effective in identifying shortcomings and judging the usability, satisfaction, and effectiveness of the device XSL • FO RenderX a link to the original publication on https://humanfactors.jmir.org,as well as this copyright and license information must be included.

Figure 1 .
Figure 1.The CUVIS spine robot system: robotic arm (left), main console with navigation optical infrared tracking camera (middle), and the control workstation (right).OTS: optical tracking system.
Check whether the robot can move to the planned position.Task 33 Navigation On the screen, move the end effector of the robotic arm to the planned guide position relative to target L4-L.Task 34 Move the end effector to the original position for the guide.Task 35 On the screen, move the end effector of the robotic arm to the planned guide position relative to target L4-R.Task 36 Use of the emergency stop switch (At the moment, the manipulator is positioned too close to the patient.)Press the emergency button.light emitting diode.b OTS: optical tracking system.c ROI: region of interest.d CT: computed tomography.e DRR: digitally reconstructed radiograph.

Figure 2 .
Figure 2. Test environment: the simulated environment is organized to resemble the operating room in which the evaluator is used.

Figure 3 .
Figure 3. Test observation environment: we set up monitoring equipment to observe and record the entire evaluation process in real time.
f N/A: not applicable.

Figure 5 .
Figure 5. Workload results by group: distribution of the NASA TLX scores for doctors and nurses.TLX: Task Load Index.

Figure 6 .
Figure 6.Workload results: distribution of the overall NASA TLX scores.TLX: Task Load Index.

Table 1 .
Use scenarios of doctors.

Table 3 .
Sociodemographic characteristics and experience of the test participants (N=30).

Table 4 .
Task completion rate in doctors.

Table 5 .
Task completion rate in nurses.

Table 6 .
After-Scenario Questionnaire result in group 1 (doctors).Not available; fixation of patient markers was not surveyed because they do not have any on-screen information.
a b ROI: region of interest.
a ROI: region of interest.

Table 8 .
Result of NASA Task Load Index.Because the data were normally distributed, a independent 2-sample t test was used.Because the data were not normally distributed, the Mann-Whitney U test was performed.P values were determined with the independent 2-sample t test and Mann-Whitney U test for continuous variables.
b Group 2: nurses.c d e

Table 9 .
Comparison of the NASA Task Load Index by use experience of doctors.

Table 10 .
Comparison of the NASA Task Load Index by use experience of nurses.

Table 11 .
Result of the System Usability Scale (SUS).Because data were not normally distributed, the Mann-Whitney U test was performed.P values were determined with an independent 2-sample t test and the Mann-Whitney U test for continuous variables.
b Group 2: nurses.c d

Table 12 .
System Usability Scale (SUS) comparison by use experience of a similar device.