Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81–0.88), test–re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88–0.95), and test–re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test–re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test–re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test–re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.


INTRODUCTION
Traditionally, pelvic tilt has often been measured in clinical practice to identify the presence of abnormal postures that may cause dysfunction and lead to chronic musculoskeletal pain conditions (Herrington, 2011), such as low back pain (Juhl, Cremin & Russell, 2004). However, in cross-sectional studies, anterior pelvic tilt has not often been identified as a risk factor for low back pain (Youdas et al., 2000; Chaléat-Valayer et al., 2011), although more recently Lim, Roh & Lee (2013) found that anterior pelvic tilt was lower in healthy individuals than in subjects with low back pain and Youdas et al. (2000) reported a significant correlation between the pelvic tilt angle and Oswestry Disability Index scores in females. Even so, Lim, Roh & Lee (2013) reported no similar, significant correlation when carrying out the same calculation. Such conflicting results may reflect the necessity for a biopsychosocial model to explain such conditions rather than a purely patho-anatomical one (O'Sullivan, 2012).
In addition, both in the literature and anecdotally, there are reports that greater anterior pelvic tilt may increase the risk of musculoskeletal injury during running (Schache et al., 1999; Schache, Blanch & Murphy, 2000; Schache et al., 2002). It has been suggested that such injuries could occur either through repetitive impingement of the vertebral facets (Schache et al., 1999; Schache et al., 2002) or by producing excessive lengthening of the hamstring, leading to strain injury (Schache et al., 1999; Schache et al., 2002). On this or another basis, some clinicians may decide to measure the extent of anterior pelvic tilt in their patients and clients, particularly those who undertake regular running activities.
Pelvic tilt can be measured either with a single measurement, at the center line, or with two measurements at either lateral border. Measurements taken in cadavers have shown that differences in bony anatomy lead to significant between-side differences in anterior pelvic tilt (Preece et al., 2008) and significant differences in pelvic tilt between sides have also been reported in live subjects (Herrington, 2011). The difference in pelvic tilt between sides has been taken as a measurement of pelvic torsion, which some investigations have associated with leg length discrepancy (Cummings, Scholz & Barnes, 1993; Young, Andrew & Cummings, 2000; Betsch et al., 2012; Wild et al., 2014). It has been variously suggested that pelvic torsion occurs as a natural adaptation to leg length discrepancy (Krawiec et al., 2003), that greater anterior pelvic tilt occurs on the side of the shorter leg compared to the contralateral leg (Knutson, 2005), and that this biomechanical feature may be common to both symptomatic and asymptomatic individuals alike (Herrington, 2011). Even so, the precise relationships between leg length discrepancy and pelvic torsion, as well as between leg length discrepancy and musculoskeletal injury risk, are contentious and remain poorly understood (Gurney, 2002; Juhl, Cremin & Russell, 2004; Knutson, 2005; Cooperstein & Lew, 2009).
Calliper-based inclinometers seem to be among the most common tools used by clinicians for measuring pelvic tilt, for several reasons. They display good reliability for measuring iliac crest height differences (Walker et al., 1987; Hagins et al., 1998; Petrone et al., 2003; Krawiec et al., 2003) and for measuring pelvic tilt (Heino, Godges & Carter, 1990; Crowell et al., 1994; Youdas et al., 1996; Gnat et al., 2009; Herrington, 2011; Fourchet et al., 2014). Using the intra-class correlation coefficient (ICC) to assess reliability, researchers investigating the use of calliper-based inclinometers in healthy adult volunteers have generally reported at least good (Walker et al., 1987; Heino, Godges & Carter, 1990; Herrington, 2011) if not excellent reliability (Youdas et al., 1996; Hagins et al., 1998; Krawiec et al., 2003; Gnat et al., 2009). Calliper-based inclinometers have also been found to display good convergent criterion reference validity by reference to radiography (Crowell et al., 1994; Petrone et al., 2003). Furthermore, these devices have several practical advantages to the clinician, being quickly and easily utilized (Crowell et al., 1994), as well as being small, portable, relatively safe compared to radiography, and inexpensive in comparison with low-dose digital stereoradiography and MRI scanning devices. Calliper-based inclinometers also permit measurements to be taken on both sides of the pelvis, which may be important given the differences between sides that have previously been observed (Preece et al., 2008; Herrington, 2011).
Different models of calliper-based inclinometer have been investigated in the literature. The Palpation Meter (PALM, Performance Attainment Associates, St. Paul, MN, USA) is the calliper-based inclinometer that has been most extensively explored (Hagins et al., 1998; Petrone et al., 2003; Krawiec et al., 2003; Gnat et al., 2009; Lee, Yoo & Gak, 2011; Herrington, 2011; Fourchet et al., 2014). Other models that have been investigated include those developed and modified by Walker et al. (1987) and Crowell et al. (1994). The model used and developed by Crowell et al. (1994) included a spirit level to permit readings relative to the ground, fingertip rings to allow superior palpation of the bony prominences, and a digital read-out for ease and speed of reading the output. The Digital Pelvic Inclinometer (DPI, Sub-4 Limited, UK) is a new, commercially-available, calliper-based inclinometer that is very similar to the model developed by Crowell et al. (1994) (Fig. 1). Like the model developed by Crowell et al. (1994), the DPI uses a digital display. This display allows the clinician to see the output of the device while performing the measurement procedure. In addition, the DPI has recessed calliper ends, which allow simultaneous palpation of the bony prominences with the hands and the calliper arms. Finally, the DPI contains a spirit level to facilitate measurements of pelvic angles relative to the ground as well as relative to the other side of the pelvis.
The purpose of this study was to investigate the inter-rater reliability and test-re-test reliability of the DPI in young, healthy males and females across two rating sessions with experienced, trained raters. The first hypothesis for this study was that inter-rater reliability for the DPI between two raters would be good. The second hypothesis was that test-re-test reliability for the DPI would be good by reference to three separate measurements taken in a single rating session. The third hypothesis was that test-re-test reliability for the DPI would be good by reference to the mean of the measurements taken in each of two rating sessions on separate occasions.

METHOD
Experimental approach
The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects on separate occasions. The dependent variables were the two angles of pelvic tilt (right and left sides). The independent variables were the test number (3 tests per session), the session (2 sessions), and the rater (2 raters).

Measurement procedures
The subjects arrived at the laboratory wearing athletic clothing. The subjects were tested by both raters in two sessions on two separate days, three weeks apart. The two raters were in separate rooms (bays) and were therefore blinded to the results recorded by the other. Each rater was supervised by an investigator to ensure that no information could be passed from one to the other, and all data were retained by the investigators to prevent any communication of results between raters. In addition, the subjects were blinded to the results, and could not pass details between raters. The subjects were measured while standing in a normal, relaxed position, wearing loose clothing but no footwear, on a level floor in the same room of the same building, at the same time of day on each occasion, as shown in Fig. 1. No specific instructions were provided to the subjects regarding posture, so that measurements could be recorded in a normal standing position. The raters used a DPI to take measurements for pelvic tilt on each side of the pelvis (right and left). The DPI is a hand-held, calliper-based inclinometer with a digital readout (Fig. 1). The DPI comprises two precision arms, which are mounted upon a main body. The main body contains a tri-axial accelerometer, which records the angle of pelvic tilt across the two precision arms. The output from the tri-axial accelerometer is shown as an angle in degrees, in numerical form on a liquid crystal display. For each measurement of pelvic tilt, standard instructions were used per the manufacturer's guidelines, as follows: "the practitioner places the index finger and thumb on each hand on each finger grip at the end of the DPI arms. With each index finger slightly prominent ready for concurrent palpation of the posterior superior iliac spine (PSIS) and anterior superior iliac spine (ASIS), the practitioner positions the DPI on the side of the innominate bone and takes a reading. The practitioner moves their index finger over the most prominent point of the iliac crests until the apex is established for the measuring. The practitioner then reads off the degree of inclination from the LCD."

Subjects and raters
Following a power analysis as described by Wolak, Fairbairn & Paulsen (2012), a convenience sample of 18 healthy subjects (12 males and 6 females) was recruited from a university physical therapy program. Of the 18 subjects, only 16 were included in the test-re-test reliability assessment between sessions (for subject characteristics relevant to each assessment, see Table 1).
Subjects qualified for the study if they met the following criteria: were ≥18 years of age, were able to stand unsupported for the duration of the measurement process (<10 min), were free from existing low back injuries, had not experienced any low back injuries within the previous 3 months, and had no medical condition leading to clinically meaningful leg length inequality. In accordance with ethical requirements, the subjects received an explanation of the nature, purpose, and risks of the study and were given the opportunity to ask questions. All subjects signed an informed consent document prior to participating in the study. Written ethical approval for the study was granted by the Faculty of Health Sciences Ethics Panel, Staffordshire University.
A convenience sample of two raters with similar experience in using the DPI were recruited. They completed the DPI measurements for all subjects. The first rater was a sports podiatrist with 26 years of experience in clinical practice, and 4 years of experience with using the DPI. The second rater was a podiatrist with 15 years of experience in clinical practice, and 6 months of experience with using the DPI.

Statistics
Intra-class correlation coefficients (ICC) were used to assess the inter-rater, intra-rater (between sessions) and intra-rater (within sessions) reliability of pelvic tilt measured using the DPI for both right and left sides. ICCs are suitable for use in fully-crossed study designs assessing reliability of interval variables (Hallgren, 2012). Since the raters were not randomly selected for each subject but were the same for all subjects, a two-way Analysis of Variance (ANOVA) model was used (Shrout & Fleiss, 1979). Since absolute rather than ranked values of pelvic tilt are of interest, the ICC model type was set to require absolute agreement (McGraw & Wong, 1996). The unit of measurement used in the model differed between the statistics calculated. Since clinical practice commonly involves taking multiple measurements and recording the mean, the mean of the three ratings taken for each subject in a single session was used for hypothesis testing for inter-rater reliability and test-re-test reliability between sessions. Inter-rater reliability was assessed by combining the results of both testing sessions. Test-re-test reliability between sessions was assessed by combining the results of both raters. In contrast, for test-re-test reliability within single sessions, the reliability of the single, individual ratings was assessed, although again the results of both raters were combined together (Shrout & Fleiss, 1979). Before commencing the trial, it was decided that interpretation of the reported values for each ICC would be based upon the following criteria: <0.50 = poor, 0.50-0.75 = moderate, and >0.75 = good (Walmsley & Amell, 1996; Batterham & George, 2003; Portney & Watkins, 2008). To enhance clinical interpretation of the results, the standard error of measurement (SEM) and minimum difference to be considered real (MD) were estimated (Weir, 2005). Descriptive statistics were calculated as means with standard deviation.
Statistical significance was set a priori at p < 0.05. All statistical analysis was performed using R, using the irr (Gamer et al., 2007) and ICC (Wolak, 2012) packages.
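As an illustration of the two-way, absolute-agreement ICC described above, the following is a minimal Python sketch and not the authors' analysis code (the study itself used the irr and ICC packages in R). It assumes the single-measure and average-measure forms given by McGraw & Wong (1996), computed from the ANOVA mean squares for subjects, raters and residual error. The function name `icc_absolute_agreement` and the example ratings are hypothetical.

```python
from statistics import mean


def icc_absolute_agreement(ratings):
    """Two-way ANOVA ICC with absolute agreement (illustrative sketch).

    ratings: list of rows, one row per subject, one column per rater or trial.
    Returns (single_measure_icc, average_measure_icc).
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters (or repeated trials)

    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater differences (MSC) count as error.
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


# Hypothetical pelvic tilt angles (degrees): rater 2 reads 1 degree higher
# than rater 1 for every subject.
example = [[8, 9], [10, 11], [12, 13]]
single_icc, average_icc = icc_absolute_agreement(example)
print(single_icc, average_icc)
```

In the hypothetical example the two raters rank the subjects identically, yet the coefficients fall below 1 because a constant offset between raters is penalized under absolute agreement. The average-measure form corresponds to reliability of a mean of several ratings, and the single-measure form to reliability of an individual rating.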

RESULTS
Descriptive statistics
Descriptive statistics (mean ± standard deviation) for pelvic tilt on the right and left sides are presented in Table 2.

Reliability
The ICC, SEM, and MD reported when measuring inter-rater reliability, test-re-test reliability (within sessions) and test-re-test reliability (between sessions) are presented in Table 3. Data for 18 subjects were available for inter-rater reliability and test-re-test reliability (within sessions) but data for only 16 subjects were available for test-re-test reliability (between sessions), as only 16 subjects attended both sessions. Subject attendance in each session, along with the raw data for the mean pelvic tilt on left and right sides, is shown in Table 4.

Table 3. Inter-rater and test-re-test reliabilities of the DPI. Inter-rater and test-re-test reliabilities (between sessions and within sessions) of the DPI for measuring pelvic tilt on the right and left sides, as assessed by intra-class correlation coefficient (ICC), standard error of measurement (SEM) and minimum difference (MD) to be considered real.

DISCUSSION
The purpose of this study was to investigate the inter-rater reliability and test-re-test reliability of the DPI for measuring pelvic tilt angle on both right and left sides of the pelvis in young, healthy males and females. The first hypothesis for this study was that inter-rater reliability for the DPI would be good. The second hypothesis was that test-re-test reliability for the DPI would be good within a single rating session. The third hypothesis was that test-re-test reliability for the DPI would be good between two rating sessions. According to the pre-determined criteria based on the magnitude of the ICC, the inter-rater reliability of the DPI for measuring pelvic tilt was designated as good on both sides (ICC = 0.81-0.88), the test-re-test reliability of the DPI for measuring pelvic tilt within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and the test-re-test reliability for the DPI for measuring pelvic tilt between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85).
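The designation of each ICC follows directly from the criteria stated in the Statistics section (<0.50 = poor, 0.50-0.75 = moderate, >0.75 = good). A minimal Python sketch of that mapping is given below; the function name `classify_icc` is hypothetical, and the treatment of the exact boundary values is an assumption based on the thresholds as written.

```python
def classify_icc(icc):
    """Map an ICC to the reliability label used in this study (illustrative)."""
    if icc < 0.50:
        return "poor"
    if icc <= 0.75:          # 0.50-0.75 inclusive, as written in the criteria
        return "moderate"
    return "good"            # > 0.75


# ICC values reported in this study.
for label, icc in [
    ("between sessions, left side", 0.65),
    ("between sessions, right side", 0.85),
    ("inter-rater, lower bound", 0.81),
    ("within session, upper bound", 0.95),
]:
    print(label, icc, classify_icc(icc))
```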
For inter-rater and test-re-test reliability, our findings ( 2014) reported good inter-rater and intra-rater reliability (coefficient of variation = 15.8%). The reliability of the PALM in assessing linear differences in iliac crest height has also been found to be good (Hagins et al., 1998; Petrone et al., 2003) but whether such findings can be considered as directly comparable with the measurement of pelvic tilt angle is unclear. The reliability of a three-dimensional (3D) camera-based motion capture system reported by Levine & Whittle (1996) was also found to be good but interestingly no better than the PALM (ICC = 0.95; SEM = 0.96 degrees; MD = 2.7 degrees) and the calliper-based system used by Gajdosik et al. (1985) also displayed similar reliability (ICC = 0.88; SEM = 1.4 degrees; MD = 4.0 degrees).
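The SEM and MD values quoted for these comparison studies can be related to the ICC with the forms commonly attributed to Weir (2005): SEM = SD × √(1 − ICC) and MD = SEM × 1.96 × √2. The paper does not state these formulas explicitly, so their use here is an assumption, as is the pairing of each ICC with the standard deviation reported for the same study in this discussion (4.3 degrees for Levine & Whittle, 1996; 4.1 degrees for Gajdosik et al., 1985). The function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC) (assumed form, after Weir, 2005)."""
    return sd * math.sqrt(1.0 - icc)


def minimum_difference(sem, z=1.96):
    """MD = SEM * z * sqrt(2), with z = 1.96 for a 95% confidence level."""
    return sem * z * math.sqrt(2.0)


# (study, SD in degrees, ICC) as quoted in the text.
for study, sd, icc in [
    ("Levine & Whittle (1996)", 4.3, 0.95),
    ("Gajdosik et al. (1985)", 4.1, 0.88),
]:
    sem = standard_error_of_measurement(sd, icc)
    md = minimum_difference(sem)
    print(study, round(sem, 2), round(md, 1))
```

Under these assumptions the Levine & Whittle figures are reproduced after rounding, while the Gajdosik et al. figures come out close to, but slightly below, the quoted MD, which is consistent with rounding of the intermediate SEM.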
Regarding pelvic tilt, our descriptive statistics (means of 10.5-10.6 degrees) are in line with the findings of other investigations, across various measurement devices. Using a PALM device, Herrington (2011) measured pelvic tilt in a population of 120 young, healthy subjects (65 males and 55 females, aged 23.8 years). It was reported that 85% of males and 75% of females displayed an anteriorly rotated pelvis, in the range of 6-7 degrees. Also using a PALM device, Lee, Yoo & Gak (2011) measured pelvic tilt in a population of 40 young, healthy subjects (23 males aged 23.8 years and 17 females aged 21.4 years) and found that anterior pelvic tilt was 7-8 degrees. Gajdosik et al. (1985) measured pelvic tilt in a population of 20 healthy males, aged 25.2 years, and reported a mean anterior pelvic tilt angle of 8.5 ± 4.1 degrees. Using a 3D camera-based motion capture system, Levine & Whittle (1996) measured pelvic tilt angle in a population of 20 healthy female subjects, aged 23.4 years, and reported a mean anterior pelvic tilt angle of 11.3 ± 4.3 degrees. Using radiography, Vaz et al. (2002) measured pelvic tilt angle in 100 healthy students from medical professions, aged 27 years, and reported a mean anterior pelvic tilt angle of 12.3 ± 5.9 degrees. From this very brief review, it seems that calliper or calliper-inclinometer systems (Gajdosik et al., 1985; Herrington, 2011; Lee, Yoo & Gak, 2011) tend to report slightly lower values of anterior pelvic tilt (6-8 degrees vs. 11-12 degrees) than those found using more sophisticated methods (Levine & Whittle, 1996; Vaz et al., 2002). It is interesting that the values reported here using the DPI (means of 10.5-10.6 degrees) are at the higher end of the spectrum reported in the literature and closer to those observed using more sophisticated methods.
Whether this is a feature of the population measured, the presence of a spirit level in the DPI to standardize measurements relative to the ground, systematic bias in the DPI, or systematic bias in the raters is unclear.
Regarding differences between right and left sides, this investigation reported descriptive statistics (mean of 0.1 degrees greater anterior pelvic tilt on the right side) that are within the range of values observed by others. The literature is conflicting regarding whether the left or right sides tend to be more anteriorly rotated, or whether no difference is the norm. In respect of the prevailing direction of greater anterior tilt, some studies have reported very small differences that are likely within the bounds of measurement error (Gnat et al., 2009; Lee, Yoo & Gak, 2011). Other investigators have reported greater mean anterior tilt on the right side (Krawiec et al., 2003), which has been predicted based upon the apparent tendency for the right leg to be shorter in many populations (Knutson, 2005). However, greater mean anterior tilt on the left side has also been reported (Barakatt et al., 1996). In respect of the magnitude of difference between sides, as noted above, some studies have reported very small differences (Gnat et al., 2009; Lee, Yoo & Gak, 2011), while others have reported differences of around 2 degrees (Barakatt et al., 1996; Krawiec et al., 2003). It is noteworthy that Gnat et al. (2009) reported low mean values for the difference between sides in quiet standing (<0.5 degrees) but much greater values after exercise, particularly jumping (4.65 ± 1.56 degrees).

LIMITATIONS
There are several key limitations to this investigation. The study design and consequently the forms of ICC used for statistical analysis do not permit the extrapolation of these results to any rater but rather limit their application to experienced and trained raters (Shrout & Fleiss, 1979). Different results might therefore be observed in untrained or in trained but inexperienced raters. In addition, the subjects who were assessed comprised young, healthy physical therapy students and investigations in other populations might yield differing findings. Care should therefore be taken in drawing inferences about the use of the DPI in the general population based on these results. There were also two key controls in which the study protocol was deficient. Firstly, the raters were not blinded to the values displayed on the DPI for each measurement, unlike some other studies assessing reliability in similar devices (Gnat et al., 2009). This limitation is of particular concern in relation to the test-re-test reliability measurement taken within a single session, where it was very easy for each rater to recall the previous measurement when taking additional measurements. Secondly, the activities of the subjects immediately prior to the measurements being taken were not controlled. Since mechanical loading has been found to affect pelvic tilt angle (Gnat & Saulicz, 2008;Gnat et al., 2009), this may have affected the reliability of the measurements taken between sessions. In addition, although our exclusion criteria prevented the inclusion of any subjects with medical conditions leading to clinically meaningful leg length discrepancies, our study was limited in that we did not perform any tests to assess whether any of the subjects had such leg length discrepancies, nor did we measure any other musculoskeletal parameters, such as hamstring and lumbopelvic flexibility using the sit-and-reach test, or actual hamstring muscle-tendon length. 
Such confounding factors might have affected our results.
In respect of the validity of the DPI, there are three substantial limitations of the present study. Firstly, criterion reference validity of the DPI for assessing anterior pelvic tilt on either side of the pelvis was not assessed. Future studies could explore this by correlating measurements taken using the DPI with measurements taken using gold standard methods (such as radiography) in the same group of subjects, as other investigators have done (Crowell et al., 1994;Petrone et al., 2003). Therefore, while the DPI displays good reliability between raters and between ratings taken in the same session, it may not produce valid measurements of pelvic tilt in comparison with values recorded using radiography or MRI. Secondly, the extent to which the measurements of anterior pelvic tilt on either side of the pelvis or the difference between these (pelvic torsion) might be predictive of increased injury risk or low back pain was not assessed. Thirdly, the extent to which measurements of anterior pelvic tilt on either side of the pelvis or the difference between these (pelvic torsion) might provide useful information about the extent of any existing leg length inequality was not explored.

CONCLUSIONS
The inter-rater reliability and test-re-test reliability of the DPI for measuring pelvic tilt angle on both right and left sides of the pelvis were assessed, in a convenience sample of young, healthy males and females. The inter-rater reliability of the DPI for measuring pelvic tilt was designated as good on both sides (ICC = 0.81-0.88); the test-re-test reliability of the DPI for measuring pelvic tilt within a single rating session was designated as good on both sides (ICC = 0.88-0.95); and the test-re-test reliability for the DPI for measuring pelvic tilt between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Given that the raters were not blinded to the measurements, our findings regarding the test-re-test reliability of the DPI for measuring pelvic tilt within a single rating session should be interpreted with caution. Nevertheless, these results indicate that the DPI produces acceptably reliable measurements, although further research is required to establish the validity of the DPI in measuring pelvic tilt.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
The authors received no funding for this work.

Competing Interests
Chris Beardsley is a Director of Strength and Conditioning Research Limited, a business that provides an online sports science encyclopedia, and Tim Egerton is the owner of Sport Science Tutor, a business that provides online sports science coaching for students.