Sex differences in the morphological failure patterns following hip resurfacing arthroplasty

Background: Metal-on-metal hybrid hip resurfacing arthroplasty (with a cementless acetabular component and a cemented femoral component) is offered as an alternative to traditional total hip arthroplasty for the young and active adult with advanced osteoarthritis. Although it has been suggested that women are less appropriate candidates for metal-on-metal arthroplasty, the mechanisms of prosthesis failure have not been fully explained. While specific failure patterns, particularly osteonecrosis and delayed-type hypersensitivity reactions, have been suggested to be specifically linked to the sex of the patient, we wished to examine the potential influence of sex, clinical diagnosis, age of the patient and the size of the femoral component on morphological failure patterns in a large cohort of retrieved specimens following aseptic failure of hip resurfacing arthroplasty.
Methods: Femoral remnants retrieved from 173 hips with known patient sex were morphologically analyzed for the cause of failure. The results were compared with the control group of the remaining 31 failures from patients of unknown sex. The odds ratios (OR) and 95% confidence intervals (CI) of the following morphologically defined variables were calculated using logistic regression analysis: periprosthetic fractures (n = 133), osteonecrosis (n = 151), the presence of excessive intraosseous lymphocyte infiltration (n = 11), and interface hyperosteoidosis (n = 30). Logistic regression analysis was performed both unadjusted and after adjustment for sex, age, the size of the femoral component, and preoperative clinical diagnosis.
Results: Femoral remnants from female patients had a smaller OR for fracture (adjusted OR: 0.29, 95% CI 0.11, 0.80, P for difference = 0.02) and for the presence of osteonecrosis (adjusted OR: 0.16, 95% CI 0.04, 0.63, P for difference = 0.01). However, women had a higher OR for both the presence of excessive intraosseous lymphocyte infiltration (adjusted OR: 10.22, 95% CI 0.79, 132.57, P for difference = 0.08) and interface hyperosteoidosis (adjusted OR: 4.19, 95% CI 1.14, 15.38, P for difference = 0.03).
Conclusions: Within the limitations of this study, we demonstrated substantial sex differences in distinct failure patterns of metal-on-metal hip resurfacing. Recognition of pathogenically distinct failure modes will enable further stratification of risk factors for certain failure mechanisms and thus affect future therapeutic options for selected patient groups.


Background
Gender medicine is a novel and rapidly evolving research discipline. Indeed, there has been an almost linear increase in the literature incorporating sex/gender differences [1]. Within the last few years, a lively discussion of possible sex differences has also begun in the orthopedic surgery community. Serious concerns have arisen regarding the potential adverse biological reactions to metal-bearing surfaces and to particular prosthesis designs such as hip resurfacing arthroplasty. In fact, metal-on-metal technology is now used in over one-third of all hip arthroplasties performed in the United States [2]. In recent years, hip resurfacing arthroplasty has become an accepted alternative to traditional stemmed total hip arthroplasty in young adults worldwide [3], although patient selection is important in order to avoid failure [4][5][6][7][8][9][10][11]. Most authors [2][3][4][5][6][8][9][10][11][12] consider men under the age of 65 with osteoarthritis to be the best candidates for hip resurfacing. However, recent reports from centers that design hip resurfacing arthroplasty [7,13] suggest that the smaller size of the femoral component rather than female sex is linked with worse outcomes for this procedure.
Author affiliation 1: Institute of Pathology, University Medical Center Hamburg-Eppendorf, Germany.
In our earlier studies on failed hip resurfacing arthroplasty, we observed some sex differences in a large collection of retrieved prostheses: men were more frequently revised for postnecrotic fractures [14], and the extent of osteonecrosis was larger than in specimens obtained from women [14]. However, women were more frequently revised for unexplained persistent groin pain, which was attributed to a suggested hypersensitivity reaction after the index surgery [15]. In the present study, we calculated the ORs for morphologic failure modes in the entire cohort after adjustment for sex, age, the size of the femoral component, and preoperative clinical diagnosis. We asked: is the previously reported sex dimorphism really linked with the sex of the patient?

Data collection
In an international multi-surgeon retrieval study on total hip resurfacing arthroplasty (THRA), we obtained 283 specimens between January 2004 and February 2010. When the study was designed in 2003, the proposed primary objective was a tribological investigation of the prosthesis surface, intended to demonstrate wear-induced failures of the kind frequently reported for the second generation of (metal-on-polyethylene) THRA. Therefore, several specimens, preferentially from the early phase of the Hamburg retrieval study on THRA, were obtained without bone tissue or without any standard fixation of the bone tissue (Table 1). Later, after we presented preliminary morphological results from the first dozen hips analyzed by the standard protocol, focusing on histopathological changes within the periprosthetic tissues and on potential adverse reactions to metal, the cooperating surgeons became substantially more diligent in submitting basic clinical data. Altogether, 46 specimens contained no bone remnant tissue under the cup at all; in 16 cases, focal rests of bone tissue (mostly less than 2 cm²) were severely mechanically damaged; and 11 specimens contained osseous tissue but were sent without fixation, so that the histopathology was non-informative. We also obtained 31 cases with minimal clinical data; in particular, the data on sex were completely missing. Finally, six cases were revised for periprosthetic infections and were not included in further analyses. After excluding all 79 cases with septic complications or insufficient quality of fixation of the femoral remnant bone tissue, as well as the hips with invalid demographic data, the present study cohort contained 85 women (median age 56 years, interquartile range (IQR) 49 to 60) and 88 men (median age 56 years, IQR 51 to 60; P = 0.584; Table 2).
Valid clinical data were obtained for the majority of the specimens in the study cohort: 97.1% (168) for age, 93.6% (162) for the duration of implantation, and 82.1% (142) for the preoperative clinical diagnosis. Most hips were treated for advanced stages of primary osteoarthritis (71.8%). Other conditions were developmental hip dysplasia (11.3%), femoral head osteonecrosis (7.0%), posttraumatic arthritis (4.9%), and rheumatoid arthritis (4.9%). The remaining 31 cases with unknown patient sex but informative results of the morphological analyses served as the control group. All revisions were unilateral. One hundred and fourteen revisions (66%) out of a total of 173 cases with valid data on patient sex were performed for periprosthetic fractures, 45 (26%) for non-fractural causes, and 14 (8%) for acetabular loosening (Table 3). Several cases had more than one reason for revision surgery; for example, several hips with pseudoarthrosis caused by chronic fracture and hidden under the femoral component were clinically or radiographically classified as loosening of the femoral component.
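The specimen accounting described above can be checked with a short tally. The following is a minimal sketch that uses only the counts stated in this section; the variable names and category labels are illustrative and not part of the study protocol.

```python
# Specimen flow as reported in the text (all counts taken from this section).
received = 283

# Exclusions for insufficient tissue quality or septic failure.
excluded = {
    "no bone remnant under the cup": 46,
    "mechanically damaged focal bone rests": 16,
    "unfixed tissue, non-informative histopathology": 11,
    "periprosthetic infection": 6,
}
unknown_sex = 31          # informative morphology, but patient sex missing
women, men = 85, 88       # study cohort with known patient sex

n_excluded = sum(excluded.values())
informative = received - n_excluded       # cases with usable morphology
known_sex = informative - unknown_sex     # cases entering the sex analysis

print(f"excluded: {n_excluded}")
print(f"informative morphology: {informative}")
print(f"known sex: {known_sex} (women {women} + men {men})")
```

The tally reproduces the 79 exclusions and the 173 hips with known patient sex, and it shows that the 31 hips of unknown sex are counted among the 204 morphologically informative cases rather than among the 79 exclusions.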

Morphological classification of failure patterns
Each specimen was cut using a water-cooled band saw and analyzed macroscopically, by contact radiography, and microscopically according to a highly standardized sampling protocol, as described previously [14][15][16][17][18]. Briefly, the femoral heads with in situ femoral components were cut in the coronal plane, X-rayed, and documented photographically. A second section was oriented perpendicular to the first. The coronal plane and the anterior section were embedded without decalcification in their full length and microscopically analyzed. In our previous work, we proposed classifications for both periprosthetic fractures [18] and the loosening of the femoral component [17] based mostly on the macroscopic and contact radiographic findings, which were subsequently confirmed microscopically (for example osteonecrosis, pseudoarthrosis). Histopathological analyses also revealed findings that could not be recognized by macroscopic assessment (for example intraosseous lymphocyte infiltration, hyperosteoidosis of the interface bone trabeculae). We summarized all the results of the histopathological analyses, both macroscopic and microscopic, and proposed classification schemas under the term "morphological patterns" of THRA failure.
Briefly, the periprosthetic fractures were morphologically classified [18] as postnecrotic, when advanced osteonecrosis was found in the complete femoral remnant proximal to the fracture line [14,15], or as biomechanical, when the bone tissue from both sides of the fracture line was proven viable by histopathology. In cases of acute fracture, no reparative reaction was present. In hips with chronic fracture, either the fracture callus (union) or pseudoarthrosis (non-union) was detected microscopically (Figure 1A) [18]. Advanced osteonecrosis was defined macroscopically by yellowish colored areas of the bone and confirmed microscopically by the presence of trabeculae without stainable osteocytes, disorganized bone marrow, and bordering fibrosis (Figure 1B). Because all osteonecrotic lesions showed contact with the surface of the femoral remnant under the prosthesis, we also measured the vertical distance between the bone remnant surface and bordering fibrosis [14].
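The fracture classification just described amounts to two decision rules, which can be sketched as follows. This is only an illustration of the stated criteria: the field names, the "unclassified" fallback, and the precedence of pseudoarthrosis over callus are assumptions introduced here, not part of the published classification [18].

```python
from dataclasses import dataclass


@dataclass
class FractureFindings:
    # Hypothetical field names for the histopathological findings.
    necrosis_complete_proximal: bool  # advanced osteonecrosis in the whole remnant proximal to the fracture line
    viable_both_sides: bool           # viable bone on both sides of the fracture line
    callus: bool                      # fracture callus detected (union)
    pseudoarthrosis: bool             # pseudoarthrosis detected (non-union)


def classify_cause(f: FractureFindings) -> str:
    """Postnecrotic versus biomechanical fracture, following the text."""
    if f.necrosis_complete_proximal:
        return "postnecrotic"
    if f.viable_both_sides:
        return "biomechanical"
    return "unclassified"  # assumption: the text does not name this case


def classify_stage(f: FractureFindings) -> str:
    """Acute fracture (no reparative reaction) versus chronic fracture."""
    if f.pseudoarthrosis:  # assumption: non-union takes precedence if both are seen
        return "chronic (non-union)"
    if f.callus:
        return "chronic (union)"
    return "acute"


example = FractureFindings(
    necrosis_complete_proximal=True,
    viable_both_sides=False,
    callus=False,
    pseudoarthrosis=True,
)
print(classify_cause(example), "/", classify_stage(example))
```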
Excessive intraosseous lymphocyte infiltration was characterized microscopically by the finding of more than 300 lymphocytes within one high power field of the microscope in areas with maximum intraosseous lymphocyte infiltration (Figure 1C) [15].
Interface hyperosteoidosis was defined microscopically by the presence of widened osteoid seams on the trabecular surface at the bone-cement interface. These areas represented compact but somewhat irregular nonmineralized bone tissue within lamellar structured viable superficial bone trabeculae (Figure 1D). These were oriented mostly parallel to the surface of the cement in the vicinity of the cement mantle and also next to intertrabecular cement interdigitations, irrespective of the direction of the intratrabecular lamellae [17].
Failures were defined as clinical complications leading to the revision surgery with loss of the THRA device. One hundred and thirty-three (65%) out of a total of 204 cases provided reproducible results of the morphological analyses showing failure due to periprosthetic fracture. Seventy-one hips were revised for reasons other than the fracture: loosening of the acetabular component (n = 15), loosening of the femoral component (n = 10), cement-socket debonding (n = 3), collapsed osteonecrosis (n = 5), macroscopic visible metallosis (n = 2) and unexplained groin pain (n = 36). Even though several potential causes of the groin pain have been discussed in the literature (for example femoro-acetabular impingement or hypersensitivity reaction), we did not obtain any further specific information and included such cases in the group of 'unexplained groin pain'.

Statistical methods
Descriptive statistics are reported as the median and interquartile range (IQR). As time to revision surgery, the vertical extent of osteonecrosis, and age deviated from a normal distribution, a non-parametric method was used (Mann-Whitney U test). Logistic regression analysis was used to estimate odds ratios (OR) and 95% confidence intervals (95% CI). In order to evaluate the possible influence of other variables on the failure pattern of THRA, such as the size of the femoral component (women commonly need smaller prostheses) and the clinical diagnosis, logistic regression analysis was also performed after adjustment for sex, age, size of the femoral component, and clinical diagnosis. In the adjusted models, age (in years) and the femoral component size (in millimeters) were used as continuous covariates; for the categorical variables sex and clinical diagnosis, all categories were compared to a reference category. We used a global F-test for clinical diagnosis to overcome the problem of sparse subgroups.
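For a single binary predictor such as sex, the unadjusted logistic regression OR equals the cross-product ratio of the 2×2 table, and its Wald interval is exp(ln OR ± z·SE) with SE = sqrt(1/a + 1/b + 1/c + 1/d). The sketch below applies this to the osteonecrosis counts given in this article (71 of 85 female and 80 of 88 male remnants), with men as the reference category. The function name and argument names are illustrative, and the adjusted models cannot be reproduced this way because they require patient-level data.

```python
import math


def odds_ratio_wald_ci(events, total, ref_events, ref_total, z=1.959964):
    """Unadjusted OR with a Wald 95% CI from 2x2 counts.

    Equivalent to a logistic regression with one binary predictor.
    """
    a, b = events, total - events              # index group: with / without finding
    c, d = ref_events, ref_total - ref_events  # reference group: with / without finding
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Osteonecrosis: women 71/85 versus men 80/88 (reference category).
or_f, lo_f, hi_f = odds_ratio_wald_ci(71, 85, 80, 88)
print(f"OR = {or_f:.3f}, 95% CI {lo_f:.3f} to {hi_f:.3f}")
```

Under these assumptions the result agrees with the unadjusted values reported for osteonecrosis in female patients (OR 0.507, 95% CI 0.201 to 1.280).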
Although the main focus of our study was not to report distinct morphological failure patterns for different clinical diagnoses but to investigate potential confounders of the examined sex effect, we included these factors in our adjusted models.

Results
(Table 4). Osteonecrosis was detected in the femoral remnants of 80 (90.9%) male and 71 (83.5%; OR: 0.507, 95% CI: 0.201, 1.280; P = 0.151) female patients. The vertical extent of osteonecrosis was, however, significantly larger in the femoral remnants of male patients (median vertical extent of osteonecrosis 15.3 mm, IQR: 3.6 to 24.2) compared with female patients (median vertical extent of osteonecrosis 6.2 mm, IQR: 2.6 to 14.6; P = 0.008). Moreover, 41 (63.1%) out of 65 hip fractures in men were defined as postnecrotic, with a slightly lower frequency in female patients (23 (46.9%) out of 49 periprosthetic fractures were postnecrotic, P = 0.086). Interestingly, after adjusting for sex, age, and size of the femoral component, the logistic regression analysis revealed a lower OR for the occurrence of osteonecrosis within the femoral remnants for female patients (adjusted OR: 0.159, 95% CI: 0.040, 0.634; P = 0.009) compared with men (Table 5).
Excessive intraosseous lymphocyte infiltration of femoral remnant bone tissue was observed in 11 (6.4%) hips. Ten patients with unexplained groin pain and excessive  Table 7).

Summary of main findings
We investigated the possible sex differences in failure patterns of the current generation of metal-on-metal hip resurfacing arthroplasty. We analyzed morphologically distinct failure modes in a large collection of retrieved hips and performed statistical analyses. We observed substantial sex differences in the failure patterns of hip resurfacing arthroplasty: male hips showed more frequent osteonecrosis with larger lesions than those of women and osteonecrosis led to fracture more frequently in men. On the other hand, the bone remnants of women were more likely to contain excessive lymphocyte infiltrations and to show interface hyperosteoidosis, both of which were linked to unexplained persistent groin pain associated with suggested hypersensitivity reaction.

Explaining the results and comparing them with those of other studies
Following improvements in metallurgy and surgical technique, patient selection remains an important tool with which to positively influence the outcome of metal-on-metal hip arthroplasty. Although men under the age of 65 with osteoarthritis are considered to be the best candidates for hip resurfacing based on data from registries and larger centers [2][3][4][5][6][8][9][10][11][12], such data are mostly relatively unstructured and do not adequately answer the question of how other factors, such as the diameter of the prosthesis, the age of the patient, or the clinical diagnosis, influence prosthesis failure. Recently, McBryde and associates computed a multivariate Cox proportional hazard survival model and found that increased risk was related to differences in the size of the femoral component in their cohort of 48 failures (out of a total of 2,123 implanted hips) [13]. Similarly, in their study cohort of 1,107 resurfaced hips, Amstutz and collaborators reported a higher revision rate in women, although the effect of sex disappeared after adjustment for component size and surgical technique [7]. In contrast to clinical studies, we analyzed a large cohort of retrieved hip resurfacing arthroplasties according to a standard protocol and focused on several morphologically well-defined lesions within the remnant tissue that had previously been suggested to show some degree of sexual dimorphism. It seems likely that further classification of characteristic failure modes into subgroups will enable further insight into the different biological reactions to prostheses in men and women. In line with our results [14], Little and colleagues [19] also found osteonecrosis in the majority of fractures in male patients. Moreover, the suggestion that female patients suffer from hypersensitivity reactions to prostheses more frequently than males is generally accepted [15,17,20,21].

Limitations of the study
We recognize several important limitations to the present study. First, we were unable to estimate the total population of patients with implanted THRA operated on by the cooperating surgeons, and therefore the prevalence, preoperative and postoperative functional scorings and other possible risk factors remain unknown. Moreover, we cannot exclude that some surgeons did not send all their retrieved hips to our laboratory or that some revision surgeries were performed by surgeons other than our cooperating surgeons (selection bias). However, to reduce further selection bias, all cases with informative morphological findings were included in the current study and we also present our complete data on all specimens submitted to the Hamburg retrieval study on hip resurfacing arthroplasty. Furthermore, as only 8 to 15 failures were obtained for four of the five studied designs, we did not further differentiate between the different designs of prostheses. However, it must be noted that in our study cohort, an association of THRA design with distinct failure modes was not observed. To minimize classification bias, all specimens were processed according to a highly standardized scheme and we performed histological analysis of three quadrants from each retrieved hip. In our previous work, we also investigated inter- and intra-observer agreement for qualitative diagnoses such as the presence of osteonecrosis [14,18], and the final diagnoses were assigned by consensus between two investigators (MA, JZ). In terms of the morphological changes associated with the potential delayed-type hypersensitivity reaction, it should be kept in mind that there is no consensus about the specific histopathological features of this complication. While some investigators suggested that anterior solid granulomatous pseudotumors [20,21] are specific for hypersensitivity, newer data point to malpositioning leading to the accumulation of metal wear particles directly within such lesions [22][23][24]. 
In the few cases in our study cohort that seemed to be associated with metal hypersensitivity, we observed proliferative desquamative synovitis linked with joint effusion under pressure and excessive intraosseous lymphocyte infiltration [15]. Because confidence intervals for some categories were quite wide, a reclassification of a single case (for example in a group of 11 cases showing excessive lymphocyte infiltration) may possibly change these substantially. To overcome the problem of inter-observer variability in semiquantitative diagnoses (for example moderate versus severe lymphocyte infiltration), we therefore defined intraosseous excessive lymphocyte infiltration quantitatively as more than 300 cells in one high power field [15], which represented a very conservative cutoff value. We also reported interface hyperosteoidosis [17] occurring preferentially in failures in female patients, but its possible association with the hypersensitivity reaction remains unclear until specific tests for metal allergy are available.

Implications for research and clinical practice
In the current study, we demonstrated that detailed classification of distinct failure patterns of prostheses might help to explain the differing pathogenesis of such complications and enable future stratification of risk factors as well as different therapeutic strategies for certain patient populations (gender medicine and/or personalized medicine). Specific diagnostics (as minimally invasive as possible) and therapy (immunomodulatory rather than operative) for the hypersensitivity reaction to prostheses remain an important issue for future interdisciplinary research in orthopedics.

Conclusions
Within the limitations of this study, we demonstrated a substantial sex difference in distinct failure patterns of metal-on-metal hip resurfacing. The recognition of pathogenically distinct failure modes will enable further stratification of risk factors for certain failure mechanisms and will influence future therapeutic options for selected patient groups.

Note
The study was supported by

Abbreviations
CI: confidence interval; IQR: interquartile range; OR: odds ratio; THRA: total hip resurfacing arthroplasty.