The Statistical Fragility of Operative vs Nonoperative Management for Achilles Tendon Rupture: A Systematic Review of Comparative Studies

Background: The statistical significance of randomized controlled trials (RCTs) and comparative studies is often conveyed using the P value. However, P values are an imperfect measure, and a small number of outcome reversals may be enough to alter statistical significance. Reporting a Fragility Index (FI) and Fragility Quotient (FQ) may aid interpretation of the statistical strength of these studies. This study examines the statistical stability of studies comparing operative vs nonoperative management for Achilles tendon rupture. Methods: A systematic search of 10 orthopaedic journals was performed for comparative studies published between 2000 and 2021 on the management of Achilles tendon rupture that reported dichotomous outcome measures. The FI for each outcome was determined as the number of event reversals necessary to alter significance (P < .05). The FQ was calculated by dividing the FI by the respective sample size. Additional subgroup analyses were performed. Results: Of 8020 studies screened, 1062 met initial search criteria, and 17 comparative studies were ultimately included for analysis, 10 of which were RCTs. A total of 40 outcomes were examined. Overall, the median FI was 2.5 (interquartile range [IQR] 2-4), the mean FI was 2.90 (±1.58), the median FQ was 0.032 (IQR 0.012-0.069), and the mean FQ was 0.049 (±0.062). The FI was less than the number of patients lost to follow-up for 78% of outcomes. Conclusion: Studies examining the efficacy of operative vs nonoperative management of Achilles tendon rupture may not be as statistically stable as previously thought. The average number of outcome reversals needed to alter the significance of a given outcome was 2.90. Future analyses may benefit from including the FI and FQ in their statistical analyses.


Introduction
The Achilles tendon is the most commonly ruptured tendon in the lower extremity, and the reported annual incidence of acute Achilles tendon rupture is increasing, reaching up to 40 per 100 000 persons per year. 19,24,37 Treatment options include nonoperative management with a cast-boot or functional brace and operative repair of the tendon. 59 Several randomized controlled trials (RCTs) have compared operative and nonoperative treatment, and many have found no differences in patient-reported outcomes or rerupture rates. 43,59,65 The American Academy of Orthopaedic Surgeons has yet to make a strong recommendation in favor of either operative or nonoperative management, and substantial practice variation among surgeons persists for this injury. 15,59
The P value is a commonly used statistical tool for evaluating outcomes in research. When the P value falls below a prespecified threshold, typically .05, the null hypothesis is rejected: if there were truly no difference between groups, a difference at least as large as the one observed would be expected less than 5% of the time. 4,16,63 Such a result is conventionally described as "statistically significant." However, the P value is vulnerable to pitfalls in study design and statistical power, as it does not convey effect size, strength of association, or the applicability of an outcome to a specific population. 25,63 Furthermore, 96% of MEDLINE articles containing P values report at least 1 value of .05 or less, likely because of factors including, but not limited to, multiple testing, P-hacking, publication bias, and underpowered studies. 2,7,46 Consequently, there is concern among medical professionals that the .05 threshold may be arbitrary or inappropriate and that relying on it alone to interpret a study may not be adequate.
The Fragility Index (FI) has therefore been introduced as a complement to traditional statistical analyses based on P values. The FI is calculated for dichotomous outcomes by reversing the outcome status of patients in one study arm, with the goal of determining the minimum number of outcome event reversals necessary to switch a finding from statistically significant to not statistically significant, or vice versa. 15,63 A large FI gives the reader more confidence in the statistical strength of an outcome, because a relatively large number of event reversals is required to alter the observed result. The meaning of a given FI depends on sample size and can therefore vary with the power of the study. For example, an FI of 10 carries more weight in a cohort study of 50 patients than in a population database study of 50 000 patients. Consequently, there is no specific FI threshold that indicates a robust study. 29 To address this issue, the Fragility Quotient (FQ) was introduced, dividing the FI by the sample size to obtain a measure of relative stability. The FQ therefore expresses the proportion of the sample whose outcomes would need to be reversed to alter statistical significance, and statistical stability is most effectively communicated by reporting both FI and FQ values. 1,15 Published fragility analyses of comparative studies have generally found low FI and FQ values, with multiple studies reporting FIs ranging from 2 to 5, a number that is usually less than the number of patients lost to follow-up. 3,20,26,28,32,35,39-41,43,45,54,61,62,64,65 Thus, the unknown outcomes of patients lost to follow-up alone could be enough to alter the significance of a result. 63
To date, no studies have used FI and FQ to evaluate the literature on operative vs nonoperative management of Achilles tendon ruptures.
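Written as a formula, the FQ divides the FI by the sample size N. Applied to the example of an FI of 10 given above, the same FI yields very different quotients in a 50-patient cohort and a 50 000-patient database study:

```latex
\mathrm{FQ} = \frac{\mathrm{FI}}{N}, \qquad
\mathrm{FQ}_{N=50} = \frac{10}{50} = 0.2 \ (20\%), \qquad
\mathrm{FQ}_{N=50\,000} = \frac{10}{50\,000} = 0.0002 \ (0.02\%)
```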
The purpose of the present study is to determine the statistical stability of studies comparing operative with nonoperative management of Achilles tendon rupture. The primary objective was to calculate the FI and FQ for dichotomous outcome measures, including tendon rerupture, in the included studies. The secondary objectives were to determine the proportion of outcomes for which the FI was less than the number of patients lost to follow-up (LTF) and to compare FI and FQ across subgroups. The authors hypothesize that more than half of the outcomes analyzed will have an LTF greater than the FI for that outcome.

Methods
Comparative studies and RCTs comparing outcomes of operative vs nonoperative management of Achilles tendon ruptures published in selected journals from 2000 to 2021 were identified and collected. The journals were selected for their prominence within orthopaedic surgery and foot and ankle surgery. 8 Studies from these journals were reviewed in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 33 The initial PubMed search was conducted journal by journal, using the "AND" operator to combine each journal with the terms Achilles, gastrocnemius, or soleus. For example, the search in Foot & Ankle International was as follows: ((("Foot ankle international"[Journal]) AND (achilles)) OR (gastrocnemius)) OR (soleus). The titles and abstracts of the retrieved studies were then screened independently by 2 authors (NF, CE), and any disagreements regarding article selection were resolved by the senior author (DW). Included studies compared operative vs nonoperative management of Achilles tendon ruptures. Studies were excluded if (1) the surgical technique was not explicitly described or referenced; (2) patients with an incomplete Achilles tendon tear were included; (3) patients underwent revision Achilles tendon repair; (4) the study was cadaveric, in vitro, or an animal study; (5) the study used population databases, national registries, or cross-sectional data; (6) no dichotomous outcomes were reported anywhere in the study; or (7) the study was not related to operative vs nonoperative outcomes (eg, blood loss, anesthesia time). From the studies meeting these criteria, all dichotomous outcomes were included. Nondichotomous outcomes were not included because they cannot be analyzed with current fragility methodology (Figure 1).
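The journal-by-journal search can be scripted. The sketch below is a minimal illustration under stated assumptions: it assumes the intent described above (all three terms restricted to each journal), and the helper name `build_journal_query` is hypothetical. Because PubMed processes Boolean operators from left to right unless terms are nested in parentheses, the sketch places the three OR terms in a single parenthesized group inside the AND.

```python
# Hypothetical helper for the journal-by-journal PubMed search described above.
SEARCH_TERMS = ("achilles", "gastrocnemius", "soleus")


def build_journal_query(journal: str, terms: tuple = SEARCH_TERMS) -> str:
    """Return a query string restricting every search term to one journal.

    The OR terms are nested in their own parentheses so that the AND with
    the journal field applies to all of them, not only the first term.
    """
    term_block = " OR ".join(f"({term})" for term in terms)
    return f'("{journal}"[Journal]) AND ({term_block})'


# Only Foot & Ankle International is named in the text; other journals are omitted.
print(build_journal_query("Foot ankle international"))
```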
The quality of included studies was assessed independently by 2 authors (NF, WL) using the Cochrane Risk of Bias for Randomized Trials (ROB-2) tool for randomized studies and the Methodological Index for Non-Randomized Studies (MINORS) criteria for nonrandomized studies. The ROB-2 tool examines risk of bias in 5 domains: (1) the randomization process, (2) deviations from the intended intervention, (3) missing outcome data, (4) measurement of the outcome, and (5) selection of the reported result. Each article is rated as low risk, some concerns, or high risk of bias in each domain. 30 MINORS is a validated scoring system for nonrandomized studies that scores each of 12 criteria as 0, 1, or 2, for a maximum score of 24 for comparative studies. 58 Data on dichotomous outcomes were extracted from each study, including the outcome measured, the number of patients in each outcome group, the total population size, and the number of patients lost to follow-up. The reported P value for each dichotomous outcome was recorded and verified with a Fisher exact test. Statistical significance was set at P < .05. Using a 2 × 2 contingency table, outcome events were reversed until significance changed. For example, if an outcome was reported as significant (P < .05), the number of outcome reversals needed to raise the P value to .05 or above was determined, and vice versa. The FI was recorded as the number of outcome reversals needed to change the significance of the outcome, and the FQ was calculated by dividing the FI by the respective sample size. Outcomes for which the FI was less than the number of patients lost to follow-up were identified.
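The reversal procedure above can be sketched in code. The example below is a minimal, dependency-free illustration, not the authors' implementation: it computes a two-sided Fisher exact P value from the hypergeometric distribution, then searches for the smallest number of event reversals within a single arm that moves the result across P = .05, checking both arms and both directions. Published fragility calculators sometimes apply a fixed rule instead (for example, modifying only the arm with fewer events), which can give a different FI for the same table. All function names and the toy counts are hypothetical.

```python
from math import comb
from typing import Optional

ALPHA = 0.05  # significance threshold (P < .05)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two study arms; columns are (event, no event). With the
    table margins fixed, the probabilities of all tables no more likely
    than the observed one are summed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each attainable event count in arm 1.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so ties are not lost to floating-point error.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def fragility_index(e1: int, n1: int, e2: int, n2: int,
                    alpha: float = ALPHA) -> Optional[int]:
    """Fewest event reversals, all within one arm, that flip significance.

    e1/n1 and e2/n2 are events/patients in each arm. A reversal turns one
    patient's event into a non-event or vice versa; arm sizes stay fixed.
    Returns None if no single-arm change crosses the threshold.
    """
    def p(x1: int, x2: int) -> float:
        return fisher_exact_two_sided(x1, n1 - x1, x2, n2 - x2)

    significant = p(e1, e2) < alpha
    for k in range(1, max(n1, n2) + 1):
        for x1, x2 in ((e1 + k, e2), (e1 - k, e2), (e1, e2 + k), (e1, e2 - k)):
            if 0 <= x1 <= n1 and 0 <= x2 <= n2 and (p(x1, x2) < alpha) != significant:
                return k
    return None


def fragility_quotient(fi: int, sample_size: int) -> float:
    """FQ: the FI divided by the sample size of the comparison."""
    return fi / sample_size


# Toy table with hypothetical counts: 0/4 events in arm 1 vs 4/4 in arm 2.
fi = fragility_index(0, 4, 4, 4)
print(fi, fragility_quotient(fi, 8))  # -> 1 0.125
lost_to_follow_up = 2  # hypothetical
print(fi < lost_to_follow_up)  # -> True
```

Searching every single-arm change makes the result the true minimum under this definition; a nonsignificant table that no single-arm change can make significant returns None rather than an arbitrary count.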
Six subgroup comparisons of FI and FQ were tested with independent t tests at a significance level of .05: (1) significant (P < .05) vs nonsignificant (P > .05) outcomes; (2) outcomes for which the FI was less than the number of patients lost to follow-up vs outcomes for which the FI was greater than the number of patients lost to follow-up; (3) rates of rerupture vs all other outcomes; (4) outcomes from RCTs vs those from nonrandomized comparative studies; (5) primary vs secondary outcomes; and (6) outcomes from studies rated as low risk of bias by the ROB-2 tool (ie, high-quality studies) vs outcomes from all other studies. Data analysis was performed in Microsoft Excel (version 16.37).
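The text does not state whether the Excel analysis used a pooled-variance (Student) or unequal-variance (Welch) t test. As a hedged, dependency-free sketch, the code below assumes the pooled-variance test and approximates the two-sided P value by integrating the t density numerically with Simpson's rule; the function names and subgroup values are hypothetical, and `scipy.stats.ttest_ind` would be the usual library choice.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def t_pdf(x: float, df: float) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided P value: 1 - 2 * (integral of the t density from 0 to |t|).

    The integral is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return min(1.0, max(0.0, 1.0 - 2.0 * total * h / 3))


def student_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Independent two-sample t test with pooled variance; returns (t, P)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, two_sided_p(t, df)


# Hypothetical FI values for two subgroups (illustrative, not study data).
t_stat, p_val = student_t_test([1, 2, 3], [4, 5, 6])
print(f"t = {t_stat:.3f}, P = {p_val:.3f}")
```

With only a handful of outcomes per subgroup, as here, the choice between the pooled and Welch variants can change the P value noticeably, so stating which was used aids reproducibility.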

Results
Of the 8020 studies identified, 1062 comparative studies were screened. Ultimately, 17 studies were included for the analysis, including 10 RCTs. Details of the included studies can be found in Appendix 1.
A summary of risk of bias for randomized studies assessed with the ROB-2 tool is shown in Figure 2, and MINORS scores for nonrandomized studies are shown in Table 1. Five of the 10 RCTs were rated as having some concerns for risk of bias. The average MINORS score for the nonrandomized comparative studies was 14 (range 13-16). A total of 40 dichotomous outcomes from the 17 included studies were analyzed. Across all outcomes, the median FI was 2.5 (interquartile range [IQR] 2-4), the median FQ was 0.032 (IQR 0.012-0.069), the mean FI was 2.9 (±1.58), and the mean FQ was 0.049 (±0.061). When the mean FI and FQ of each study were weighted equally, the mean FI was 2.81 (±1.31) and the mean FQ was 0.040 (±0.028). The FI was less than the number of patients lost to follow-up (LTF) for 78% of outcomes. The results of the subgroup analysis are shown in Table 2.
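The summary above combines outcome-level descriptives (median, IQR, mean ± SD), the share of outcomes with FI below LTF, and study-level means in which each study counts once. A minimal sketch of these computations with Python's `statistics` module is shown below, using hypothetical records rather than the study's data; the record layout is an assumption, and the quartile convention used in the original Excel analysis is not stated.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev

# Hypothetical per-outcome records (illustrative, not study data):
# (study id, FI, sample size, number lost to follow-up)
outcomes = [
    ("A", 2, 60, 5), ("A", 4, 60, 5),
    ("B", 1, 40, 0),
    ("C", 3, 100, 8), ("C", 2, 100, 8),
]

fis = [fi for _, fi, _, _ in outcomes]
fqs = [fi / n for _, fi, n, _ in outcomes]

# Outcome-level summaries. quantiles() defaults to method="exclusive";
# other quartile conventions can shift the IQR bounds slightly.
q1, _, q3 = quantiles(fis, n=4)
print(f"median FI {median(fis)} (IQR {q1}-{q3}), mean FI {mean(fis):.2f} (±{stdev(fis):.2f})")
print(f"median FQ {median(fqs):.3f}, mean FQ {mean(fqs):.3f} (±{stdev(fqs):.3f})")

# Share of outcomes whose FI is smaller than the number lost to follow-up.
share_fi_below_ltf = sum(fi < ltf for _, fi, _, ltf in outcomes) / len(outcomes)
print(f"FI < LTF for {share_fi_below_ltf:.0%} of outcomes")

# Study-level summary: average within each study, then weight studies equally.
fi_by_study = defaultdict(list)
for study, fi, _, _ in outcomes:
    fi_by_study[study].append(fi)
study_means = [mean(values) for values in fi_by_study.values()]
print(f"study-weighted mean FI {mean(study_means):.2f} (±{stdev(study_means):.2f})")
```

Weighting studies equally keeps a study that contributes many outcomes from dominating the average, which is why the outcome-level and study-level means can differ.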
No significant differences were found in any of the subgroup comparisons. The largest difference was in FI between outcomes from studies at low risk of bias (3.71 ± 1.25) and outcomes from all other studies (2.73 ± 1.61; P = .07). The next largest differences were in FQ between significant (P < .05) outcomes (0.022 ± 0.030) and nonsignificant (P > .05) outcomes (0.054 ± 0.065; P = .113), and in FQ between rerupture (0.035 ± 0.029) and all other outcomes (0.058 ± 0.074; P = .133).
Figure 2. Summary of risk of bias for randomized studies (ROB-2 tool). 60 The plus sign indicates a low risk of bias, and the question mark indicates that there is some concern for bias.

Discussion
This study expands on a discussion started by a recent fragility analysis of Achilles tendon injury studies in top orthopaedic journals. 48 In that review, Parisien et al analyzed outcomes across studies of Achilles tendon injury and found that these data lacked statistical stability. The current study narrowed the focus to a specific clinical question: operative vs nonoperative management of Achilles tendon rupture. Outcomes in operative vs nonoperative studies were more fragile (median FI = 2.5) than those in the overall Achilles tendon injury literature (median FI = 4). Furthermore, the proportion of outcomes with LTF > FI was higher in this analysis (78%) than in the Achilles tendon injury literature (70.5%). 48 These findings add to the growing body of evidence supporting the inclusion of fragility indices and quotients in studies of Achilles tendon rupture management and in the orthopaedic literature as a whole.
A recent systematic review and meta-analysis examined many of the trials included in this study and concluded that surgery decreases the risk of rerupture but increases the overall risk of surgery-related complications, and that the choice between operative and nonoperative management should be patient specific. 44 Multiple reviews have noted that heterogeneity in rehabilitation protocols, timing of weightbearing, and duration of follow-up can all contribute to the lack of consensus regarding which treatment modality is superior. 27,44,59 There is also substantial heterogeneity among surgical repair strategies, including open vs minimally invasive techniques and the use of suture anchors and biologics. Ultimately, future high-quality research examining each of these factors in both active and sedentary populations will be necessary to delineate any differences in outcomes between operative and nonoperative treatment of Achilles tendon ruptures. The results of this study further emphasize the need for high-quality research on this topic: outcomes from studies at low risk of bias tended to be less fragile than those from studies with a greater risk of bias, although this difference did not reach statistical significance (P = .07).
The fragility index has recently received some criticism, with some calling it a P value in disguise 6 and an oversimplification of the complex, nonlinear relationships between factors in a given study. 9 Indeed, the fragility index is derived from the P value and should therefore be treated as a metric that aids in interpreting the P value. 22 Other important indicators of a study's robustness, such as study design, prospective sample size calculations, preregistration of planned analyses, and transparent reporting of procedures and statistical analyses, should all be considered when interpreting the results of a study. FI and FQ should be viewed as additional tools in the clinician's arsenal for interpreting the statistical conclusions of a study.
This study should be interpreted in the context of its limitations. First, FI and FQ can only be calculated for dichotomous outcomes; therefore, the fragility of important continuous variables such as muscle dynamometry and Short Musculoskeletal Function Assessment scores cannot be determined with this method. Future analyses of continuous outcomes using the method recently developed by Caldwell et al 5 would benefit the literature. Because only dichotomous outcomes could be analyzed, 4 studies were excluded. Second, this study examined outcomes from articles published in the 10 highest-impact journals in sports and foot and ankle surgery. This may be considered both a strength and a weakness: the data from these high-impact journals represent some of the best evidence available on the topic, but relevant studies published outside these journals were not included in this analysis. Finally, although the inclusion of a majority of RCTs (10 of 17 studies) may be considered a strength, the heterogeneity of the included studies, in both surgical technique and patient population, may be considered a weakness of this analysis.

Conclusion
The statistical significance of studies examining operative vs nonoperative management of Achilles tendon ruptures is fragile. In particular, outcomes from studies with greater risk of bias tended to be more fragile than those from studies at low risk of bias, although this difference was not statistically significant. A focus on high-quality, statistically robust analyses of operative vs nonoperative management of Achilles tendon rupture may reduce this fragility in the future. These future studies may benefit from including the FI and FQ in their statistical analyses.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. ICMJE forms for all authors are available online.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.