Pre-operative Assessment of Shoulder Pathologies on MRI by a Radiologist and an Orthopaedic Surgeon

Introduction: Pathologies of the shoulder, e.g. rotator cuff tears and labral injuries, are very common. Most patients undergo an MRI examination prior to surgery. A correct assessment of pathologies is important for detailed patient education and for planning surgery.
Materials and methods: Sixty-nine patients were identified who had undergone both a standardised shoulder MRI and subsequent arthroscopic shoulder surgery in our hospital. For this retrospective comparative study, the MRIs were pseudonymised and evaluated separately by an orthopaedic surgeon and a radiologist. A third rater evaluated the images and reports of shoulder surgery, which served as positive control. The results of all raters were then compared. The aim was to analyse the rates of agreement and the diagnostic accuracy of pre-operative MRI assessment by a radiologist and an orthopaedic surgeon.
Results: The overall agreement with positive control in detecting transmural cuff tears was high (84% and 89%) and lower for partial tears (70-80%). Subscapularis tears were assessed with moderate rates of agreement (60-70%) compared to intra-operative findings. Labral pathologies were mostly detected correctly. SLAP lesions and pulley lesions of the LHB were identified with only moderate agreement (66.4% and 57.2%) and showed high inter-rater disagreement.
Conclusion: This study demonstrated that tears of the rotator cuff (supraspinatus, infraspinatus) and labral pathologies can be assessed with high accuracy on non-contrast pre-operative shoulder MRI images. This allows detailed planning of surgery and aftercare. Pathologies of the subscapularis tendon, SLAP lesions and biceps instabilities are more challenging to detect correctly. There were only small differences between the radiologic and the orthopaedic interpretation of the images.


INTRODUCTION
Shoulder pathologies are very common, especially in athletes with a high level of overhead activity (e.g. ice hockey, tennis, baseball) [1-3]. Asymptomatic professional baseball players had a 79% prevalence of labral abnormalities on MRI images [4]. Full thickness tears of the rotator cuff have a prevalence in the general population between 11.75% [5] and 22.2% [6]. It is even higher in overhead athletes due to the elevated torques and forces [7]. Partial tears of the rotator cuff have an overall incidence of 18-20% [5], which seems to increase with age, with reports of 4% under 40 years and 26% over 60 years [8,9]. It is the authors' clinical experience that many patients have already had a current, non-contrast MRI examination of the affected shoulder when consulting a specialised orthopaedic surgeon. It is crucial for the surgeon to interpret MRI findings correctly in order to better educate patients and to plan possible surgery and aftercare. In the authors' view, the radiologist's report not infrequently presents a slightly different evaluation. A possible explanation is that the radiologist does not examine the patient physically and has limited opportunity to verify the assessment against intra-operative findings.
We designed this retrospective analysis of our shoulder surgery cases to explore the accuracy of pre-operative assessment of MRI findings and to detect whether there are differences between a radiologist and an orthopaedic surgeon.

MATERIALS AND METHODS
year in our specialised department. We retrospectively screened for patients who had undergone both a standardised non-contrast shoulder MRI in our hospital and a subsequent shoulder arthroscopy in our department between 2016 and 2017.
Sixty-nine patients (n=69) were identified and included. This ensured uniform shoulder MRI images and surgical examination of all participants. All study patients underwent a non-contrast MRI examination of the shoulder joint with a standardised examination protocol. A 1.5 [...]
Data of the two raters' assessments at two timepoints were compared to positive control. The data comprised the involvement of substructures and the extent of injuries. The raters followed a two-stage assessment. First, is there a pathology (yes=1, no=0)? Second, the classification of the pathology (categorical data) was recorded on a numerical scale (0, 1, 2, 3 etc.). The mean difference (=disagreement) between raters and positive control was given as a score. A low score means low disagreement (=high accuracy). Continuous data were summarised as means +/- standard deviations or as medians [25th and 75th percentiles], as appropriate.
The intra-observer disagreement was used to assess differences between timepoints. It was calculated within each rater's assessments, and the differences were averaged across raters. The inter-observer disagreement was used to examine the disagreement between the positive control, rater 1 and rater 2. Differences between the measurements of all instances were averaged.
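The two disagreement measures described above can be illustrated with a minimal Python sketch. It is not the authors' R code: it assumes that disagreement is the mean absolute difference between two sets of ratings on the same patients, and all function names and the eight-patient ratings are hypothetical.

```python
from itertools import combinations

def mean_abs_difference(a, b):
    """Mean absolute difference between two rating lists of equal length."""
    if len(a) != len(b):
        raise ValueError("rating lists must cover the same patients")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def intra_observer_disagreement(by_rater):
    """by_rater maps a rater name to (ratings at T1, ratings at T2).

    Disagreement is computed within each rater and then averaged across raters.
    """
    per_rater = [mean_abs_difference(t1, t2) for t1, t2 in by_rater.values()]
    return sum(per_rater) / len(per_rater)

def inter_observer_disagreement(by_observer):
    """by_observer maps an observer name to ratings from one timepoint.

    Disagreement is averaged over all pairs of observers.
    """
    pairs = list(combinations(by_observer.values(), 2))
    return sum(mean_abs_difference(a, b) for a, b in pairs) / len(pairs)

# Hypothetical stage-one ratings (1 = pathology present) for eight patients.
pc    = [1, 0, 0, 1, 0, 1, 0, 0]   # positive control (surgery)
r1_t1 = [1, 0, 1, 1, 0, 1, 0, 0]   # rater 1, timepoint 1
r1_t2 = [1, 0, 1, 1, 0, 0, 0, 0]   # rater 1, timepoint 2
r2_t1 = [1, 1, 1, 1, 0, 1, 0, 0]   # rater 2, timepoint 1
r2_t2 = [1, 1, 1, 1, 0, 1, 0, 0]   # rater 2, timepoint 2

intra = intra_observer_disagreement({"R1": (r1_t1, r1_t2), "R2": (r2_t1, r2_t2)})
inter = inter_observer_disagreement({"PC": pc, "R1": r1_t1, "R2": r2_t1})
print(f"intra-observer: {intra:.3f}, inter-observer: {inter:.3f}")
```

The same functions work for the second-stage grading scores (0, 1, 2, 3 etc.), because the absolute difference is defined for any numerical scale.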
The methods are described in the script Biostatistics for Biomedical Research by Harrell and Slaughter [15]. Inter- and intra-observer disagreement estimates and 95% confidence intervals are based on bootstrap calculations. The summary statistics are shown in Tables I-IV, stratified by rater and time of assessment. Disagreement rates are given, and bootstrap confidence intervals are shown in brackets. All calculations were performed with the statistical analysis software R (R Core Team, 2020).
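As an illustration of a bootstrap confidence interval for a disagreement rate, the following Python sketch resamples patients with replacement and takes percentiles of the resampled means. This is a sketch under stated assumptions, not the analysis used in the study: the percentile method, the number of resamples, the seed and the example data (18 disagreements among 69 patients) are all hypothetical choices.

```python
import random

def bootstrap_ci(differences, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-patient differences."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(differences)
    # Resample patients with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(differences, k=n)) / n for _ in range(n_boot)
    )
    lower_idx = int((alpha / 2) * n_boot)
    upper_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return means[lower_idx], means[upper_idx]

# Hypothetical data: 1 = rater disagrees with positive control, 0 = agrees.
differences = [1] * 18 + [0] * 51
point = sum(differences) / len(differences)
low, high = bootstrap_ci(differences)
print(f"disagreement {point:.3f} [{low:.3f}, {high:.3f}]")
```

The printed form mirrors the table convention described in the text, with the disagreement rate followed by the bootstrap interval in brackets.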
The statistical significance level was alpha = 0.05. Differences between the raters' disagreement with positive control were evaluated using mixed logistic and mixed ordinal regression models. Table IV shows p-values to demonstrate significant differences for each study variable between rater 1 and rater 2.

RESULTS
Both raters identified the tear location with good accuracy (rates of agreement) compared to positive control, both for supraspinatus (71% vs 74%) and for infraspinatus tears (94% vs 81%). The inter-observer disagreement was 22.8% for supraspinatus and 14.5% for infraspinatus, with low intra-observer variability between the two timepoints (T1 and T2).
Both raters specified the extent of partial tears reliably. Articular-sided tears were assessed correctly in >80% of cases by both raters. Bursal-sided tears were detected with 72% accuracy by rater 1 and 83% by rater 2. Furthermore, transmural tears were evaluated highly accurately, with 84% for rater 1 and 88% for rater 2. All these assessments had a very low intra-observer variability.
However, there were difficulties in assessing the extent of transmural tears in more detail, with very high disagreement with positive control and also high inter-observer disagreement. In addition, the small number (n=4) of tears involving the infraspinatus (ISP) allows only limited interpretation of the data. The medial tendon retraction was evaluated more consistently and accurately by both raters, compared to intra-operative findings. Still, there was an elevated inter-observer disagreement (0.319).
For further details see Table I.
The results for the detection of subscapularis (SSC) tendon tears deviated from positive control. According to rater 3, who served as positive control, tears were described during surgery in only 8 patients (n=6 with Fox/Romeo type 1, n=1 with type 2 and n=1 with type 3 tears). In comparison, the two raters detected a higher, and also consistent, number of tears when evaluating the MRI images at both timepoints (T1 and T2). Rater 1, the orthopaedic surgeon, identified 19 (T1) and 20 (T2) SSC tears, whereas rater 2, the radiologist, identified 29 (T1) and 28 (T2) tears on the MRI images. The rates of agreement with positive control were therefore only moderate, with 74% for the orthopaedic surgeon (rater 1) and 65% for the radiologist (rater 2). Grading of the tears using the Fox/Romeo classification was also inconsistent for both raters compared to positive control, with elevated disagreement rates (0.442 vs 0.717). Due to the small number of eight confirmed SSC tears, the data should be interpreted with caution.
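The link between the tear counts and the agreement rates can be made explicit with a short Python sketch. It assumes, which the text does not state explicitly, that agreement is counted per patient over all 69 patients as a yes/no match with positive control; the function name is hypothetical. Under that assumption, a rater who flags more tears than were confirmed must have at least that many surplus mismatches, which caps the achievable agreement rate.

```python
def max_possible_agreement(n_patients, n_confirmed, n_flagged):
    """Upper bound on per-patient agreement with positive control.

    If a rater flags n_flagged tears but only n_confirmed exist, at least
    abs(n_flagged - n_confirmed) patients must be rated differently from
    positive control, whichever patients were flagged.
    """
    min_mismatches = abs(n_flagged - n_confirmed)
    return (n_patients - min_mismatches) / n_patients

# Counts reported in the text: 8 confirmed tears among 69 patients.
for rater, flagged in [("rater 1, T1", 19), ("rater 1, T2", 20),
                       ("rater 2, T1", 29), ("rater 2, T2", 28)]:
    bound = max_possible_agreement(69, 8, flagged)
    print(f"{rater}: flagged {flagged}, agreement at most {bound:.1%}")
```

Under this assumption, the reported agreement rates of 74% and 65% lie below the respective upper bounds, so the counts and rates are mutually consistent.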
According to the surgery reports, labral findings were present in only 6 of 69 study patients (9%). Both raters were able to identify labral pathologies on MRI images with good accuracy (92% vs 91%) and low intra-observer variability (<1%). Rater 1 falsely identified one bony Bankart lesion. Both raters mistook a sublabral foramen for a labral lesion.
Intra-operatively, only one SLAP lesion was identified. The MRI assessment had a high rate of agreement for this one detected SLAP lesion (96.3% for rater 1, 84.1% for rater 2). Results and agreement rates should be interpreted with caution due to the rather small sample size for labrum and SLAP lesions. For further details see Table II.
Pulley lesions of the LHB with instability were found during surgery in 17 of 69 study patients (25%). The correct pre-operative assessment of the pulley lesions on MRI images was weak for both raters (66.4% for rater 1 vs 57.2% for rater 2). There was a high inter-observer (38.9%) and intra-observer (10.9%) variability for pulley lesions. Grading of pulley lesions according to Habermeyer's classification was very inconsistent with positive control for both raters. The intra-observer variability was also very high. For further details see Table III.
Both raters demonstrated good agreement rates, compared to positive control, for numerous anatomical structures. Due to the small sample size and low numbers of certain pathologies, there were only indications of different diagnostic accuracy, but no clear differences. The agreement rates for subscapularis and infraspinatus tears were slightly higher for rater 1. Bursal-sided partial tears of the rotator cuff were assessed slightly more precisely by the radiologist. Both raters deviated from positive control in the assessment of subscapularis tears and in the classification of biceps pulley lesions.
For further details see Table IV.

DISCUSSION
Although ultrasound has a high accuracy in detecting rotator cuff and labral pathologies [16,17], MRI and MRA (MR arthrography) are often used prior to surgery, as they have a comparably high diagnostic value for evaluating shoulder injuries [18,19].
The aim of this retrospective study was to examine how reliably shoulder MRI images can be assessed, as this has direct consequences for the patient and the surgeon alike. We intended to use a setup that is very close to the daily routine of a specialised department. The data and findings from shoulder surgery and post-surgery treatment offer valuable feedback, which can be applied as positive control to evaluate diagnostic accuracy and possibly improve treatment planning. Furthermore, we were interested in whether the diagnostic assessment differed between an orthopaedic surgeon and a radiologist. We aimed to investigate and interpret varying assessments carefully by analysing the agreement of the observers, rather than calculating absolute rates of right and wrong answers.
Our study demonstrated an accuracy of 84% and 89% for detecting transmural rotator cuff tears and of 70-80% for partial tears for both raters. Although these numbers are slightly lower than results in the literature [20,21], they confirm that partial tears are assessed less precisely (with lower sensitivity) than full thickness tears. Smith et al examined 44 studies in a meta-analysis and found a pooled MRI sensitivity of 80% for partial cuff tears and 91% for transmural tears [21]. A 2019 meta-analysis found MRA and MRI equally effective for assessing bursal-sided partial rotator cuff tears [22]. New advanced MRI techniques were examined by Lazik-Palm et al [23] in patients who had both a pre-operative 7 Tesla shoulder MRI and arthroscopic surgery. In a small number of patients, they found a sensitivity of 86% (specificity 74%) for all structures, which is lower than 3 T MRI results [19,22]. They also described a tendency to overrate signal alterations of the tendon and noted that the magic angle effect should be kept in mind [23].
The assessment of subscapularis tears on pre-operative MRIs is challenging. We only had a very small number (n=8) of intra-operatively confirmed SSC tears in our study sample. Raters 1 and 2 detected a higher number of tears (ca. 20 vs 28) on the MRI images. This could mean either that tears were overinterpreted by raters 1 and 2, or that a certain number of SSC tears were missed by the surgeons. We used a 30° and not a 70° scope. As a direct consequence of these findings, we slightly altered our diagnostic assessment during arthroscopy by using the 30° scope and bringing the arm into anteversion and internal rotation, optionally with ventral pressure on the humeral head, to evaluate the SSC in more detail. The arm is rotated under visual control of the subscapularis, and the insertion area is probed from proximal to distal on both sides of the tendon. The follow-up reports of the study patients do not indicate enduring subscapularis pain, although this study was not specifically designed for a standardised follow-up. A recent systematic review on diagnosing subscapularis tears reported a sensitivity of only 0.68 (95% CI 0.64-0.72) for MRI [24]. This limited diagnostic precision for subscapularis tears might explain why both raters in our study had difficulties in correctly classifying detected tears. Furthermore, we intended to use the findings during surgery as feedback for our study. The Fox/Romeo classification for SSC tears is a rather surgical classification and potentially of only limited suitability for describing tears on MRI images.
Labral pathologies are common and can be detected even in 25% of shoulders of asymptomatic professional and collegiate ice hockey players [1]. A 2019 systematic review and meta-analysis found MRI to be the best diagnostic modality for acute labral pathologies, with a sensitivity of 0.77 (95% CI 0.70-0.84) [25]. Despite the small number of cases (n=6 of 69) in our study, both raters achieved a good accuracy, with more than 90% correctly identified labral injuries and >80% agreement for the one SLAP lesion, although we used a non-contrast MRI.
Pathologies of the LHB (17 pulley lesions) were detected by our raters with low rates of agreement (66.4% vs 57.2%) on 1.5 T MRI images and with high inter-rater disagreement. Further classification of the pulley lesions was very inconsistent. These results are in line with Baptista et al [26], who used a comparable study design and also found only a moderate overall accuracy of 1.5 T MRI for detecting LHB tears (71%-73%) and LHB tendon displacement (51%-58%). The value of MR arthrography appears to be controversial: Loock et al [27] found only a low accuracy of MRA for LHB pathologies and would not recommend it for pre-surgery imaging. In contrast, Barile et al recommend a 1.5 T MRA with the arm in the ABER position for pulley lesions [28]. Kim et al [29] described a high diagnostic accuracy of pre-operative 3 Tesla MRI for LHB pathologies.
The inter-observer comparison, orthopaedic surgeon vs radiologist, found only small significant differences in diagnostic accuracy and agreement rates. Subscapularis and infraspinatus tendon tears were detected slightly more accurately by the orthopaedic surgeon, although the sample size was small. Bursal-sided tears were assessed more accurately by the radiologist. Any further interpretation of these findings would be speculative due to the small study population.
A recent study established that information on the patient's clinical complaints helps in interpreting shoulder MRI, as degenerative findings seem to be age-related and very common [30]. Yazigi et al demonstrated that the experience of orthopaedic shoulder surgeons and musculoskeletal radiologists leads to better results in MRI diagnostics [31].
A meta-analysis of 44 studies by Smith et al described no difference in diagnostic accuracy between a musculoskeletal radiologist and a general radiologist in shoulder MRI for rotator cuff injuries [21].
This study has limitations. First, we only had a small study population (n=69) with few cases of certain pathologies (e.g. one SLAP lesion). A larger cohort is necessary to further explore and confirm our current findings. Second, the intra-operative assessment could have been inaccurate and of low quality as positive control. To avoid that, all surgeons followed standardised diagnostic steps and are experienced in their field. One of the two surgeons (MJR) of this study served as rater 3 (=positive control). Rater 1 (BH), an orthopaedic surgeon, was present during some of the surgeries. We conducted our study in a pseudonymised fashion and kept a time gap between the surgeries and our study (>5 months) to limit a possible bias. The results did not indicate that rater 1 had an advantage over rater 2.
Third, we initiated this analysis to learn from our daily routine.This meant that we assessed the diagnostic accuracy of the raters inversely by learning from the surgery images, reports and patient files and using this as control.Radiologic classifications often cannot be fully transferred to surgical findings, which leads to a certain indistinctness.
The idea for this study was to re-evaluate whether surgery was indicated correctly and whether findings on MRI images were confirmed, completely missed or graded differently during surgery. From a surgeon's point of view, the interpretation of MRI images should yield practical implications: Does the pathology or the extent of the finding require surgery (e.g. a subscapularis tear)? If so, is tendon reconstruction still possible, and how much surgery time or how many implants should be planned? Do other structures need to be addressed? How should the patient be educated?
Fourth, this study used 1.5 Tesla non-contrast MRI images. This represents a realistic daily routine in our clinic, as many patients have already had an MRI prior to consulting a shoulder specialist. A high-field MRI or MRA might increase diagnostic accuracy for certain pathologies [21,28]. A recent study suggests that saline could be used instead of gadolinium for MRA with equal results in detecting labral and rotator cuff pathologies [32]. In the near future, software algorithms could support orthopaedic surgeons and radiologists by analysing all diagnostic modalities to further enhance diagnostic accuracy [33].

CONCLUSION
In summary, this study aimed to learn from possibly different assessments of MRI images by a radiologist and an orthopaedic surgeon in order to improve our daily routine, rather than to draw conclusions by direct comparison. We found that tears of the rotator cuff and labral pathologies can be assessed on shoulder MRI images with high accuracy compared to intra-operative findings. A 1.5 Tesla MRI, with supine position of the arm, offered only moderate sensitivity for detecting pulley lesions of the LHB. There were only small differences between the radiologic and the orthopaedic interpretation of the images. Future studies with larger study groups can further explore these results.
for hidden, unstable tears or grade 2 or 3 tears. Then, the subacromial space was inspected. A lateral portal was established for introducing the shaver, and a bursectomy was performed. Afterwards, the scope was switched from the posterior to the lateral portal to evaluate the rotator cuff with a probe and a clear view of the rotator cuff. All patient data were pseudonymised.

Table I: Rotator cuff.

Table II: Labral pathologies. N is the number of non-missing values. Numbers after proportions are frequencies. Disagreement between rater (positive control) and within the rater. Numbers in brackets for disagreement rates are bootstrap confidence intervals. (PC: positive control, R: rater, T: time)

Table IV: Inter-rater comparison of diagnostic accuracy (differences between raters' disagreement to positive control).