Quantitative Assessment of the Learning Curve for Robotic Thyroid Surgery

With the increased utilization of robotic thyroidectomy in recent years, surgical proficiency has become a paramount consideration. However, there is no single perfect or ideal method for measuring surgical proficiency. In this study, we evaluated the learning curve of robotic thyroidectomy using various parameters. A total of 172 robotic total thyroidectomies were performed by a single surgeon between March 2014 and February 2018. Cumulative summation analysis revealed that it took 50 cases for the surgeon to significantly improve the operation time. Mean operation time was significantly shorter in the group that included the 51st to the 172nd case than in the group that included only the first 50 cases (132.8 ± 27.7 min vs. 166.9 ± 29.5 min; p < 0.001). On the other hand, the surgeon was competent only after the 75th case when postoperative transient hypoparathyroidism was used as the outcome measure. The incidence of hypoparathyroidism decreased from 52.0% for the first 75 cases to 40.2% from the 76th case onward. These results indicate that the criteria used to assess proficiency greatly influence the interpretation of the learning curve. Incorporation of the operation time, complications, and oncologic outcomes should be considered in learning curve assessment.


Introduction
Robotic thyroidectomy has been described as an alternative method for removing the thyroid gland without incising the neck. The bilateral axillo-breast approach (BABA) is one of the more popular techniques for robotic thyroidectomy [1]. Among the various remote-access approaches, BABA has several advantages [2]. BABA provides a symmetrical operative view, which is similar to that of open thyroidectomy. This midline approach allows optimal visualization and dissection of vital structures in both thyroid lobes. Moreover, BABA affords the largest operative angles for instrument insertion and manipulation, which can minimize instrument fighting or crowding. Evidence of the cosmetic superiority and surgical safety of BABA robotic thyroidectomy has also been widely reported [3][4][5]. With the increased utilization of robotic thyroidectomy in recent years, surgeon training and proficiency are paramount considerations.
The relationship between patient outcomes and surgeon experience has been extensively investigated over the last 10 years [6]. Many studies have demonstrated that more experienced surgeons are associated with decreased operation times and complication rates. Learning curves are commonly used to plot the number of cases necessary to acquire robotic skills and gain the mastery and proficiency of an experienced surgeon [7]. Previous studies evaluating the learning curve for robotic thyroid surgery indicated that operation times decrease gradually and reach a steady state only after an individual surgeon has performed 35-40 cases [8][9][10]. However, these studies used operation time as the only factor for determining the learning curve. Operation time may not be the most appropriate marker of learning, as speed does not equate to proficiency.
A recent systematic review indicated that learning curves for oncologic surgery could be analyzed using various outcomes [6]. These outcomes are classified into six main categories: operation times, intraoperative outcomes, postoperative outcomes, complications, oncologic outcomes, and surgical success outcomes. Of these categories, operation time was the most commonly used outcome because it is easily measured and compared [7,11]. However, learning curves should also refer to other important indicators for patient care, including functional or oncologic outcomes. As postoperative hypoparathyroidism is the most common complication after total thyroidectomy, we set the functional outcome variable as the incidence of transient hypoparathyroidism. The present study evaluated the learning curve for BABA robotic thyroidectomy using operation time and postoperative transient hypoparathyroidism.

Study Population
Our institutional review board approved this retrospective study and waived the requirement for written informed consent (Approval No. 2017-09-057). This study included 172 consecutive patients with thyroid cancer who underwent BABA robotic total thyroidectomy from March 2014 to February 2018. The BABA technique used in this series is described elsewhere [12]. The operative indications for BABA robotic thyroidectomy at our institution included tumor size under 2 cm, tumors without gross extrathyroidal extension, and no lymph node metastasis on ultrasound. Ipsilateral prophylactic central lymph node dissection was performed in all cases. All surgeries were performed by a single surgeon (H. Kwon), who had little prior experience with endoscopic thyroid surgery. The vocal cords were examined routinely 1 day before and 2 weeks after the thyroidectomy using video-assisted laryngoscopy. Serum concentrations of parathyroid hormone, calcium, phosphorus, and ionized calcium were measured 1 day before and 2 days after the surgery. All patients were followed up at 2 weeks, 3 months, and then 6 months. Follow-up tests included clinical evaluation for hypocalcemia, vocal cord integrity, and thyroid function. Demographic data, pathologic stage, operation time, operative complications, postoperative serum thyroglobulin level, and remnant thyroid tissue on ultrasound were recorded and analyzed.

Definitions
Operation time was defined as the time from skin flap elevation to the detachment of the robot from the patient. Transient hypoparathyroidism was defined as a postoperative serum calcium level <8.5 mg/dL (normal range, 8.5-10.2 mg/dL) with a subnormal parathyroid hormone level (<15 pg/mL). Permanent hypoparathyroidism was defined as a requirement for calcium or vitamin D supplementation for more than 6 months after surgery. Recurrent laryngeal nerve (RLN) injury, defined as impaired vocal cord motility on postoperative laryngoscopy, was considered permanent in patients exhibiting persistent impairment at 6 months after surgery.
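The definition of transient hypoparathyroidism above can be written as a simple classification rule. The following is only a minimal sketch: the record type `PostoperativeLabs` and the function name are hypothetical and introduced for illustration, while the two thresholds are taken from the definition in the text.

```python
from dataclasses import dataclass

CALCIUM_LOWER_LIMIT_MG_DL = 8.5   # lower bound of the normal range quoted in the text
PTH_LOWER_LIMIT_PG_ML = 15.0      # subnormal parathyroid hormone threshold from the text


@dataclass
class PostoperativeLabs:
    """Hypothetical record of the postoperative measurements for one patient."""
    serum_calcium_mg_dl: float
    parathyroid_hormone_pg_ml: float


def has_transient_hypoparathyroidism(labs: PostoperativeLabs) -> bool:
    """Both criteria must hold: low calcium AND subnormal parathyroid hormone."""
    return (
        labs.serum_calcium_mg_dl < CALCIUM_LOWER_LIMIT_MG_DL
        and labs.parathyroid_hormone_pg_ml < PTH_LOWER_LIMIT_PG_ML
    )
```

Under this sketch, a patient with low calcium but a normal parathyroid hormone level is not counted as an event, because the definition requires both findings together.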

Statistical Analysis
Cumulative summation (CUSUM) analysis was designed for the quantitative estimation of the learning curve. The CUSUM test shows the cumulative differences between the target data and the observed data. This method was used to plot the operation time and determine the learning curve for operation time [13]. Briefly, the 172 cases were ordered chronologically, from the earliest to the latest surgery date. The difference between the operation time of the ith case and the mean operation time was defined as S_i. The S_i values were sequentially summed and then plotted graphically using the equation CUSUM = ΣS_i [14].
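The CUSUM calculation for operation time described above can be sketched in a few lines. This is an illustration only: the function names and the eight operation times are made up and are not the study data. The curve rises while cases are slower than the overall mean and falls once cases become faster, so its peak marks the turning point of the learning curve.

```python
from typing import List


def cusum_of_operation_times(times_min: List[float]) -> List[float]:
    """Return the running sum of (time_i - mean time) for chronologically ordered cases."""
    mean_time = sum(times_min) / len(times_min)
    running_total = 0.0
    curve = []
    for t in times_min:
        running_total += t - mean_time  # per-case difference from the mean
        curve.append(running_total)
    return curve


def peak_case_number(curve: List[float]) -> int:
    """1-based case number at which the CUSUM curve is highest."""
    return max(range(len(curve)), key=lambda i: curve[i]) + 1


# Made-up operation times (minutes): slower early cases, faster later ones.
example_times = [170, 165, 160, 150, 130, 125, 120, 120]
example_curve = cusum_of_operation_times(example_times)
```

Because the differences are taken from the overall mean, the curve always returns to zero at the last case; only the position of the peak carries information about when the operation time started to fall below average.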
The learning curve for transient hypoparathyroidism was assessed using the learning curve CUSUM (LC-CUSUM) and standard CUSUM methods [15]. The LC-CUSUM test computes a score from successive outcomes; an operation without complications increases the score, and an operation with complications decreases the score. The LC-CUSUM method is helpful in determining whether proficiency in a procedure has reached an acceptable level. After surgical competence was achieved, we applied the standard CUSUM to ensure the complication rate did not deviate from optimal performance. A more detailed description of LC-CUSUM and standard CUSUM with regard to the formulation, performance, and limits is provided elsewhere [15][16][17]. Since the two largest series including over 1000 patients showed transient hypoparathyroidism rates of 48.1% and 39.1%, the unacceptable failure rate was set as 0.5 (transient hypoparathyroidism occurring in 50% of patients) during the learning curve period in the present study [18,19]. We also chose an acceptable failure rate of 0.4 and a control limit of 1.25, accordingly. During the control period using the standard CUSUM, the target failure rate was set at 0.4 with a control limit of 2.5.
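The scoring idea behind LC-CUSUM can be illustrated as follows. This is a sketch under stated assumptions, not the formulation used in the study, which is given in the cited references [15][16][17]: it assumes log-likelihood-ratio weights and a holding barrier at zero, which is one common way to build such a score. The default rates (0.5 and 0.4) and the limit (1.25) are the values quoted in the text; the function name is hypothetical.

```python
import math
from typing import List, Optional


def lc_cusum_signal_case(
    failures: List[bool],
    unacceptable_rate: float = 0.5,
    acceptable_rate: float = 0.4,
    limit: float = 1.25,
) -> Optional[int]:
    """Return the 1-based case at which the score first reaches the limit, or None."""
    # Assumed weights: log-likelihood ratio of "acceptable" versus "unacceptable" performance.
    success_weight = math.log((1 - acceptable_rate) / (1 - unacceptable_rate))  # > 0
    failure_weight = math.log(acceptable_rate / unacceptable_rate)              # < 0
    score = 0.0
    for case_number, failed in enumerate(failures, start=1):
        score += failure_weight if failed else success_weight
        score = max(0.0, score)  # holding barrier: the score never drops below zero
        if score >= limit:
            return case_number
    return None
```

With these assumed weights, each uncomplicated operation adds about 0.18 to the score and each complication subtracts about 0.22, so a run of uncomplicated cases is needed before the limit is reached. Changing the two rates or the limit shifts the signal point, which is why the choice of thresholds matters.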
We used SPSS version 22.0 (IBM Corp., Armonk, NY, USA) for all statistical analyses. Continuous data were compared using Student's t-tests and dichotomous data were compared using chi-squared tests. Correlation coefficients (R) were calculated using bivariate correlation analysis. A p-value less than 0.05 was considered statistically significant.
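As an illustration of the Student's t-test comparison, the test statistic can be recomputed from summary statistics alone. This is only a sketch: the analysis in the study was run in SPSS on the raw data, the function name is hypothetical, and equal variances are assumed. The input values are the group means, standard deviations, and sizes reported in the abstract.

```python
import math
from typing import Tuple


def pooled_t_statistic(
    mean_a: float, sd_a: float, n_a: int,
    mean_b: float, sd_b: float, n_b: int,
) -> Tuple[float, int]:
    """Student's t statistic (equal variances assumed) and its degrees of freedom."""
    dof = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / dof
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, dof


# Summary statistics as reported in the abstract (later 122 cases vs. first 50 cases).
t_value, degrees_of_freedom = pooled_t_statistic(132.8, 27.7, 122, 166.9, 29.5, 50)
```

The resulting statistic is roughly -7.2 on 170 degrees of freedom, which is consistent with the reported p < 0.001.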

Results
The characteristics of the included patients are summarized in Table 1. The operation times and CUSUM learning curves are plotted in Figure 1. The mean operation time was 142.7 ± 32.1 min. The best fit for the curve was a second-order polynomial with the equation CUSUM = −6.8908 + 9.1341 × (Case number) + 1.8883 × (Case number)², which had a high R² value of 0.983. The slope of the learning curve turned from positive to negative after the 50th case, which means that the surgeon required 50 cases to achieve mastery. The mean operation time was significantly shorter in the group that included the 51st to the 172nd case than in the group that included only the first 50 cases (132.8 ± 27.7 min vs. 166.9 ± 29.5 min; p < 0.001).
Other clinicopathological factors, including complication rates, showed no differences between the groups (Table 2).
The cumulative sums of transient hypoparathyroidism incidence are shown in Figure 2. The incidence of hypoparathyroidism was 52.0% for the first 50 cases, which decreased to 46.0% for the 51st to the 100th case, and to 40.3% for the 101st to the 172nd case. The LC-CUSUM analysis indicated that the surgeon was proficient at the 75th case. Until this point, 39 patients (52.0%) experienced transient hypoparathyroidism, which was above the unacceptable failure rate cutoff of 50%. However, from the 66th to the 75th case, the incidence of hypoparathyroidism was 30.0%, which was below the acceptable failure rate of 40.0%. The learning curve for minimizing transient hypoparathyroidism, therefore, required 75 cases. A standard CUSUM analysis was started after the 75th case to ensure that the surgeon did not deviate from ideal performance. Although an alarm was raised after case number 111, no further alarm was raised through the end of the 172 completed cases. As the LC-CUSUM analysis could be threshold-dependent, we further performed sensitivity analyses (Table 3). The learning curve for minimizing transient hypoparathyroidism varied widely from 4 to 163 cases, depending on the different threshold levels.
Most of the learning curves in the sensitivity analyses ranged from 73 to 76 cases, although lower target complication rates required more cases for proficiency. The learning curve of 75 cases in the present study, obtained with an unacceptable failure rate of 50.0%, an acceptable failure rate of 40.0%, and a decision limit of 1.25, was also comparable with these sensitivity analyses.

Discussion
This study demonstrated that a surgeon required more completed operations to achieve proficiency at BABA robotic thyroidectomy when the incidence of transient hypoparathyroidism, rather than operation time, was used as the marker of proficiency. Our findings also indicated that learning curve assessments should consider several variables, including functional or oncologic outcomes. In the field of robotic surgery, considerable interest has been directed at surgeon performance evaluation, with an emphasis on safety and outcomes. Less than half of the previously published research on this topic, however, employed proper methods and appropriate outcome measures [7]. Qualitative or subjective methods cannot formally or impartially evaluate surgical competence, and their reproducibility and objectivity are questionable [20]. Studies specifically investigating the learning curve of robotic thyroidectomy have had similar limitations: nonobjective methods without statistical modeling or stratification, and use of operation time as the only outcome measure [8][9][10]. To overcome these problems, we applied statistical modeling and compared learning curves that used operation time and postoperative hypoparathyroidism as markers of proficiency.
Observation by a tutor and graphical description of learning curves have been the most common ways of judging individual performance; however, both are prone to subjectivity [20,21]. Objective clinical performance can be measured using statistical methods, including the Shewhart p-chart, g-chart, exponentially weighted moving average chart, and CUSUM chart [22]. Among all of these methods, CUSUM has gained popularity and has been widely disseminated due to its simple formulation, its capability to detect small persistent changes, and its intuitive depiction [23]. The CUSUM method has also shown its capacity for detecting fatal errors, near misses, and suboptimal performance in a timely fashion [23]. We used LC-CUSUM, which was developed to enable quantitative assessment of individual performance and signals when a predefined level of performance has been achieved [15].
There is no single perfect or ideal method to measure surgical proficiency. However, learning curve evaluations should incorporate operation time, indicators of complications, and indicators of success [6]. Although we included operation time and a complication indicator in our analysis, no indicator of success was investigated in this study. Possible indicators of success for robotic thyroidectomy include postoperative serum thyroglobulin levels, remnant thyroid tissue on ultrasound, and recurrence of thyroid cancer. Monitoring of these factors, however, may be associated with poor statistical properties because of the limited number of expected events. In the present study, the suppressed thyroglobulin level at 3 months after surgery was <1.0 ng/mL in 98.3% of patients. Furthermore, no remnant thyroid tissue was found on ultrasound after robotic thyroidectomy in any patient. Accordingly, we were not able to incorporate these indicators of success in the present analysis.
A recent meta-analysis reported that the median incidence of postoperative transient hypoparathyroidism ranged from 19.0% to 38.0% after conventional open thyroidectomy [24]. This incidence is comparable to that of robotic thyroid surgery, which has been shown to range from 39.1% to 48.1% [18,19]. The causes of postoperative hypocalcemia include disruption of the parathyroid arterial supply or venous drainage, thermal or electrical injury, mechanical injury, and either intentional or inadvertent removal [25]. As the parathyroid blood supply is both delicate and complex, close attention and experienced surgeons are required to ensure its preservation and to prevent hypoparathyroidism [7,26]. An analysis of the US National Inpatient Sample data also found an association between higher surgeon experience and lower hypoparathyroidism rates [27,28]. The incidence of transient hypoparathyroidism, therefore, was used as the surrogate marker of complications for learning curve evaluation. Our results indicated that at least 75 cases of BABA robotic thyroidectomy might be needed to overcome the learning curve.
This study had some limitations. First, the learning curve data were derived from the robotic thyroidectomy procedures of only one surgeon. The LC-CUSUM analysis assumes that the observations have no serial correlation [17]. Although we tried not to violate this assumption, outcomes from a single novice operator are potentially serially correlated. Furthermore, learning curves may also vary between individual surgeons based on experience in laparoscopy, surgical skills, or familiarity with the procedure [29]. This can limit the external validity of our findings, and generalization of the results should be made with caution. Second, the acceptable complication rates were relatively high. In the LC-CUSUM analysis, the adequate and inadequate levels of performance can be arbitrarily chosen by consensus. Although we used the complication rates from the two largest series of over 1000 patients, the acceptable level may be lower in the future. Third, learning curves can be affected by various patient-related factors; body characteristics, complex anatomy, or tumor characteristics can affect the learning process [30]. The selection of more straightforward and easier cases for the initial robotic procedures can also lead to an inaccurate learning curve [31]. Further validation studies are needed to arrive at more precise conclusions.

Conclusions
More experience was required to achieve proficiency in robotic thyroid surgery when a complication, rather than operation time, was used as the marker of proficiency. Incorporation of the operation time, complications, and oncologic outcomes should be considered in learning curve assessment.