The Harris Hip Score CURRENT STATUS :


Background: The Harris Hip Score (HHS) is a widely used Patient-Reported Outcomes score. It measures pain and function levels in patients with hip pathologies.
Objectives: The main objective of this study is to translate and culturally adapt the HHS into Arabic, and to assess the reliability and validity of the translated version.
Material & Methods: 110 patients participated in this survey. Internal consistency was tested using Cronbach's alpha. Test-retest reliability (intra-class correlation coefficient), convergent construct validity, floor and ceiling effects, and responsiveness were also assessed. To measure the level of agreement, Bland-Altman plots and forest plots were produced.
Results: For WOMAC, reliability at the first testing occasion, calculated using Cronbach's alpha, was 0.98 for the pain subscale, 0.98 for the stiffness subscale, and 0.99 for the physical function subscale. At the second testing occasion, reliability was 0.99, 0.97, and 0.99 (pain, stiffness, and physical function, respectively). This indicates that WOMAC is an instrument with good reliability. The same calculation of Cronbach's alpha was used to test the reliability of the Harris Hip Score. For each of the three testing occasions the reliability was very good or excellent: α1 = 0.92, α2 = 0.91, and α3 = 0.90. The intra-class correlation coefficient was good, at 0.76 (95% CI 0.44-0.88).
Conclusion: Overall, the Arabic version of the HHS could be used as a diagnostic tool for patients with hip problems when information is needed about the overall condition of the patient, especially about improvement or deterioration. However, it is important to be cautious when using the HHS to investigate the magnitude of change in a patient's condition, since the HHS may overestimate the level of improvement.


Background
Patient-Reported Outcomes (PROs) have emerged as useful tools for measuring medical conditions, and have proven to be extremely useful in musculoskeletal disease clinics. 1 These well-structured questionnaires are completed by patients to reflect their own perspective. 2,3 Hip pain is a prevalent complaint, for which both the patient and the clinician could benefit from using a PRO to monitor the condition and decide on a management approach. [4][5] The Harris Hip Score is a widely used tool that combines the clinician's input with the patient-reported symptoms to generate a better clinical picture of the hip pathology at hand and to evaluate treatment options. 6 The questionnaire itself, however, is in English. Healthcare services in Arabic-speaking countries would not be able to use it; hence the need for a cross-cultural adaptation of the score. The authors of this study aim to establish the validity and reliability of the Arabic version of this score.

Translation
The translation process followed the recommendations of Guillemin's guidelines for validation and cross-cultural adaptation 9 after permission was obtained from the original HHS copyright holder. Two bilingual orthopaedic surgeons were responsible for the conceptual and literary translation of the original HHS. Two other versions were produced by independent translation companies with a background in scientific English. All the versions produced were similar. Modifications drawn from all the versions were incorporated into the final version, which was then reviewed by a professional Arabic grammar checker. The back-translation came close to the original score. After the Arabic version was approved by the translation committee, a pilot test was conducted on 10 random patients from the arthroplasty clinic. Both physicians interviewed the patients after they completed the questionnaire to address any issues or need for assistance.

Participants
A total of 110 patients completed the Harris Hip Score questionnaire and agreed to have their data analyzed for research purposes. The average age of the participants was 44.3 years, with a standard deviation of 15.4 years, implying that the majority of the sample was roughly between 30 and 60 years of age. The youngest participant was 16, and the oldest was 76 years of age.

Psychometric Properties and Data Analysis
For all of the analyses IBM SPSS Statistics 21 was used.
To estimate the reliability of the questionnaire, Cronbach's alpha was calculated. Since every patient completed the survey on three different occasions, Cronbach's alpha was calculated for each of the three test situations. In addition, the ICC (intra-class correlation coefficient) was used to assess test-retest reliability.
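The internal-consistency calculation can be illustrated with a minimal sketch. The study itself used SPSS; the function below is only an illustration of the standard Cronbach's alpha formula, and the name `cronbach_alpha` and the row-per-respondent layout are assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per questionnaire item (hypothetical layout)."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

Under this layout, the function would be called once per testing occasion, which mirrors the three separate alpha values reported in the paper.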
Content validity was tested by examining the shape of the data distribution, as well as floor and ceiling effects. The floor effect is the percentage of patients who obtained the lowest possible score (score of 0), and the ceiling effect is the percentage of those with the highest possible score (score of 100). If more than 30% of the respondents showed a floor or ceiling effect, the effect would be considered relevant.
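As a rough sketch of the floor and ceiling check described above, the percentages can be computed directly from the list of total scores. The function name, the returned dictionary, and the default arguments are assumptions for illustration; the 30% threshold is the one stated in the text.

```python
def floor_ceiling_effects(scores, min_score=0, max_score=100, threshold=30.0):
    """Percentage of respondents at the lowest and highest possible score.

    `threshold` is the percentage above which an effect is treated as
    relevant (30% in this study).
    """
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "relevant": floor_pct > threshold or ceiling_pct > threshold,
    }
```

For an instrument with a different range, such as the modified HHS used in this study (maximum 91), `max_score` would be set accordingly.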
To test the convergent validity of the HHS, Spearman's correlation coefficient between HHS and WOMAC was calculated. Since WOMAC has already been validated in Arabic-speaking countries, a strong correlation would support the convergent validity of the HHS. It is important to note, however, that a higher score on WOMAC indicates greater disability, while patients with greater disability score low on the HHS. This means that, for the HHS to be validated, we expect a negative correlation between the WOMAC and HHS scores.
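The expected negative correlation can be illustrated with a small, self-contained sketch of Spearman's coefficient (Pearson's correlation applied to average ranks). This is not the SPSS procedure used in the study; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

With WOMAC scores that rise as HHS scores fall, `spearman_rho` returns a value close to -1, which is the direction the validation argument relies on.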

Harris Hip Score
The HHS usually contains 12 questions covering four domains: pain, function, deformity, and range of motion. Questions are answered using a Likert scale, with the final score having a maximum of 100 points (best possible outcome) and a minimum of 0 points (extreme symptoms). Those 100 points are divided among the subdomains: pain receives 44 points, function 47 points, range of motion 5 points, and deformity 4 points; function is subdivided into activities of daily living (14 points) and gait (33 points).
A total HHS of <70 points is considered to be a poor result, 70 to 80 is fair, 80 to 90 is good, and 90 to 100 is excellent (Nilsdotter and Bremander, 2011). For the purposes of this study a modified HHS, which omits the deformity and range of motion subdomains, is used. Hence, the possible range for this instrument is not from 0 to 100, but from 0 to 91. This means that a ceiling effect would be documented for those patients who scored 91 points.
All 110 patients completed the HHS on at least two different occasions (T1 and T2), and 109 of them completed it a third time (T3). There were two and a half weeks between each of these three occasions.
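The point allocation and grading bands above can be written out as a short sketch. The domain points and the 91-point maximum come from the text; the names used here are hypothetical, and the handling of the boundary values (70, 80, and 90 assigned to the higher band) is an assumption, since the quoted bands overlap at their edges.

```python
# Domain maxima of the full HHS, as listed in the text.
HHS_DOMAIN_POINTS = {
    "pain": 44,
    "function": 47,  # activities of daily living (14) + gait (33)
    "range_of_motion": 5,
    "deformity": 4,
}

# The modified HHS keeps only the pain and function domains.
MODIFIED_HHS_DOMAINS = ("pain", "function")


def max_score(domains):
    """Highest attainable score over the given domains."""
    return sum(HHS_DOMAIN_POINTS[d] for d in domains)


def hhs_grade(total):
    """Grade a full (0-100) HHS total.

    Assumption: a score exactly on a boundary falls in the higher band.
    """
    if not 0 <= total <= 100:
        raise ValueError("full HHS total must lie between 0 and 100")
    if total < 70:
        return "poor"
    if total < 80:
        return "fair"
    if total < 90:
        return "good"
    return "excellent"
```

Here `max_score(HHS_DOMAIN_POINTS)` gives 100 and `max_score(MODIFIED_HHS_DOMAINS)` gives 91, matching the ranges stated above. The grading bands apply to the full 0-100 score, not to the 0-91 modified score.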

Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) 8
The WOMAC consists of 24 Likert-type items, and every patient receives three scores, one for each of its three subscales. The first subscale, pain, has 5 questions (score range 0-20); 2 questions address stiffness (score range 0-8); and physical function has 17 questions (score range 0-68). A score of 0 on a subscale means that the patient has essentially not felt any discomfort in his or her hip; a higher score suggests greater disability.
The survey was taken on two different occasions, with 2 weeks between the two testing situations.

WOMAC questionnaire
WOMAC has been validated in Arabic-speaking countries, and has since been used in practice.
Nevertheless, additional analyses were conducted to explore the psychometric characteristics of the WOMAC questionnaire used in this study.
Reliability at the first testing occasion, calculated using Cronbach's alpha, was 0.98 for the pain subscale, 0.98 for the stiffness subscale, and 0.99 for the physical function subscale. At the second testing occasion, reliability was 0.99, 0.97, and 0.99 (pain, stiffness, and physical function, respectively). This indicates that WOMAC is an instrument with good reliability.
To check content validity, floor and ceiling effects were examined. A floor effect was recorded for 10% of the patients on the pain subscale, 14% on the stiffness subscale, and 12% on the physical function subscale.
A ceiling effect, on the other hand, was recorded for 3% on the pain subscale, 3% on the stiffness subscale, and 3% on the physical function subscale. Since these percentages are far below 30% (the level considered relevant), this is an argument in favour of the content validity of WOMAC.

Harris Hip Score
To test the reliability of the instrument, Cronbach's alpha was calculated. For each of the three testing occasions the reliability was very good or excellent: α1 = 0.92, α2 = 0.91, and α3 = 0.90. The intra-class correlation coefficient was good, at 0.76 (95% CI 0.44-0.88).
A floor effect was recorded for 1% of the patients, and 2% showed a ceiling effect in the first week of testing. Two and a half weeks later, 1% of respondents again showed a ceiling effect, and no floor effect was recorded. On the third testing, 1% recorded a floor effect, and once more no ceiling effect was documented. The Shapiro-Wilk test was used to check whether the data deviate significantly from the normal distribution, and it showed that they did on all three testing occasions. A 2-week test-retest reliability assessment of the HHS was applied in the present study. Of the 110 patients who completed the questionnaire, 108 responded to the second assessment after the initial evaluation. Test-retest reliability was assessed using the intra-class correlation (ICC). The results (Table 2) indicated that the HHS has an acceptable intra-class correlation of 0.755 (95% CI 0.442, 0.876).
Considering the value of 0.902 (95% CI 0.704-0.955) for Cronbach's alpha, the internal consistency across the three assessments was very high.
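The paper does not state which ICC model was selected in SPSS, so the following is only a sketch under assumptions: it computes two common single-measure forms from a two-way ANOVA decomposition, ICC(2,1) for absolute agreement and ICC(3,1) for consistency. The function name and the patients-by-occasions layout are hypothetical.

```python
def icc_two_way(ratings):
    """Single-measure ICCs from a two-way layout.

    ratings: one row per patient, one column per testing occasion.
    Returns (icc_2_1, icc_3_1): absolute agreement and consistency.
    """
    n = len(ratings)        # patients
    k = len(ratings[0])     # occasions
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for patients, occasions, and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return icc_2_1, icc_3_1
```

The two forms can differ considerably when scores shift systematically between occasions, so the model choice matters when comparing a reported ICC against published thresholds.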
To compare the results of the WOMAC questionnaire with those of the HHS, it was important to standardize the WOMAC scores to the range of 0-100. In addition, the HHS scores, which were in the range of 0-91, were rescaled to 0-100 to match the WOMAC scores. Figure 1 illustrates the change in the mean level of the different subscales across the assessments, which were conducted 2 weeks apart. It is visually evident that the mean HHS score decreased, which corresponds to more pain and symptoms. At the same time, the mean WOMAC score shows an upward trend, which also corresponds to more pain and, in general, a worsened condition of the patient.
This illustrates a visual agreement between the two questionnaires.
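The rescaling step amounts to a linear transformation onto 0-100. A minimal sketch follows, assuming a simple proportional rescaling (the paper does not give the exact formula); the subscale maxima are taken from the instrument descriptions, and the names are hypothetical.

```python
# Maximum raw scores, from the instrument descriptions in the text.
WOMAC_SUBSCALE_MAX = {"pain": 20, "stiffness": 8, "physical_function": 68}
MODIFIED_HHS_MAX = 91


def rescale_to_100(score, max_score, min_score=0):
    """Linearly map a raw score from [min_score, max_score] onto [0, 100]."""
    if not min_score <= score <= max_score:
        raise ValueError("score outside the instrument's range")
    return 100.0 * (score - min_score) / (max_score - min_score)
```

For example, a WOMAC pain score of 10 and a physical function score of 34 both map to 50.0, and a modified HHS of 91 maps to 100.0. Rescaling changes only the units; the two instruments still run in opposite directions.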
(See Table 3.)

Level of Agreement between WOMAC & HHS
One of the best methods to measure the level of agreement between two measurement methods is the Bland-Altman plot. In this method, the differences between the WOMAC and HHS scores are plotted against the mean of the WOMAC and HHS scores. As shown in the graphs, overall mean difference between

Discussion
The primary objective of this study was to create a reliable and valid Arabic version of the HHS by translation and adaptation. For this purpose, the Arabic version of the HHS was compared with the results of the WOMAC questionnaire. Preliminary validity and reliability tests revealed a moderate inverse correlation between the WOMAC subscales and the HHS, which indicates that they are related in the expected direction, since their scores run in opposite directions (0 for WOMAC = no pain; 0 for HHS = extreme pain).
However, according to Altman and Bland's views regarding the correct analysis of data gathered in studies of this type, it is not enough to use the correlation coefficient between the two measurements as a measure of agreement. 18 They pointed out that methods can correlate well yet disagree greatly, as would occur if one method read consistently higher than the other. For this reason, the Bland-Altman plot was used to measure the level of agreement between WOMAC and HHS.
The Bland-Altman plots indicated that there is systematic bias between WOMAC and HHS, and the linear regression illustrated that, with increasing mean score, the Arabic version of the HHS tends to underestimate the results of WOMAC. According to McGrory et al., 19 differences in scores between hips were highly correlated for HHS and WOMAC total score, HHS pain and WOMAC pain subscores, and HHS function and WOMAC physical function subscores. However, they found that WOMAC stiffness and HHS range of motion were not significantly correlated. Overall, they concluded that patients with bilateral hip arthroplasty can apply the WOMAC osteoarthritis index questions to individual hips as effectively as the joint-specific HHS questions. The forest plots and effect sizes showed that HHS scores were generally higher than WOMAC scores. In general, the results of both methods lead the surgeon in the right direction when it comes to information about the overall condition of the patient, especially about improvement or deterioration. However, it is important to be cautious when using the HHS to investigate the magnitude of change in a patient's condition, since the HHS may overestimate the level of improvement.
The major outcome of this study is that the HHS Arabic version demonstrated high levels of validity
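The quantities behind a Bland-Altman analysis can be sketched as follows: the mean difference (bias), the 95% limits of agreement, and a least-squares line of the differences on the means, whose slope hints at bias that changes with score level. This is an illustration only; it assumes the two score lists are paired and already on a common scale, and the paper does not describe how the opposite scoring directions of WOMAC and HHS were handled before plotting. The function and key names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Bias, 95% limits of agreement, and regression of difference on mean.

    Assumes paired scores on a common scale (e.g. both rescaled to 0-100).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]

    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Least-squares fit: diff = intercept + slope * mean.
    mean_of_means = mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * mean_of_means

    return {
        "bias": bias,
        "limits_of_agreement": limits,
        "slope": slope,
        "intercept": intercept,
    }
```

A nonzero bias with a slope near zero suggests one method reads consistently higher than the other, while a clearly nonzero slope suggests the disagreement grows or shrinks with the score level.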

Conclusion
The primary purpose of this study was to create a reliable and valid Arabic version of the HHS by translation and adaptation. Its reliability, calculated through both Cronbach's alpha and the ICC, was good or moderate. Although the distributions for all subscales deviate from a normal one, no significant ceiling or floor effects were observed.
The Arabic version of the HHS is short and easily administered and interpreted, with minimal investment of time required from both the researcher and the clinician. It is our belief that the Arabic version of the HHS is sufficient to evaluate the state of a hip disease. Its levels of reliability and validity are acceptable, and we believe that it will facilitate assessment of functional limitations and symptoms experienced by Arabic-speaking individuals with a variety of hip disorders. There is need for further studies to assess responsiveness and to determine the minimum clinically relevant differences in the

Figure: Bland-Altman plot demonstrating the level of agreement between HHS and WOMAC (first, last, and average assessments). A linear regression line is also drawn to better demonstrate the systematic bias between the two methods.