Dynamic gait index post-stroke: What is the item hierarchy and what does it tell the clinician? A Rasch analysis

Stacey E. Aaron1, Ickpyo Hong1, Mark G. Bowden2, Chris M. Gregory2, Aaron E. Embry3, Craig A. Velozo4

Affiliations: 1PhD Candidate, Department of Health Sciences & Research, Medical University of South Carolina, Charleston, South Carolina, United States; 2Associate Professor, Department of Health Sciences & Research, Medical University of South Carolina, Charleston, South Carolina, United States; Division of Physical Therapy, Medical University of South Carolina, Charleston, South Carolina, United States; Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States; 3Research Associate, Department of Health Sciences & Research, Medical University of South Carolina, Charleston, South Carolina, United States; Division of Physical Therapy, Medical University of South Carolina, Charleston, South Carolina, United States; Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States; 4Occupational Therapy Division Director and Professor, Division of Occupational Therapy, Medical University of South Carolina, Charleston, South Carolina, United States.

Corresponding Author: Stacey Elizabeth Aaron, MS, 77 President St, MSC 700, Charleston, South Carolina, United States, 29425; E-mail: aarons@musc.edu

Received: 04 August 2016 Accepted: 29 August 2016 Published: 26 September 2016

Aims: The purpose of this study was to use the Rasch measurement model to determine (1) the dynamic gait index (DGI) item-level psychometrics, (2) if the item-difficulty hierarchical order is consistent with a clinically logical progression from easiest to hardest, and (3) if the range of tasks is sufficient to measure the functional ability levels of the sample.

Methods: Data were retrieved retrospectively from study records, which included initial DGI scores and subject demographics collected at multiple university laboratories. Individuals were eligible to participate if 18+ years of age, >6 months post-stroke, residual paresis in lower extremity, and ability to walk with/without assistive device (n=117). Psychometrics of the DGI were tested with confirmatory factor analysis (CFA) and Rasch measurement modeling.

Results: DGI demonstrated acceptable psychometric properties: unidimensionality (CFA: χ2/df =2.12, CFI=0.98, TLI=0.97, RMSEA=0.09), no misfit items to the Rasch model, local independence (all item residual correlations <0.2), and a good internal reliability (Cronbach alpha of 0.86). Item-level analysis revealed a clear item-difficulty hierarchical order that is consistent with clinical observation and expectations. While the instrument separates the sample into three significant strata, there was mismatch between the average of person ability distribution (0.86 logit) and the average of item difficulties (0.00 logit).

Conclusion: The DGI demonstrated good item-level psychometric properties and an expected item-difficulty hierarchical order. Order of administration and adding more challenging items may improve precision and person-item matching to better differentiate between individuals with higher ability levels.


INTRODUCTION
Multiple studies have shown a high incidence of falls among community-dwelling individuals with chronic stroke [1][2][3]. The proportion of individuals who fall post-stroke is approximately 34% at three to four months, as high as 73% at six months, and 70% at one-year follow-up [4][5][6]. Additionally, within 12 months of their first fall, 21 to 57% of stroke survivors are repeat fallers [6]. The high frequency of falls may be due to a combination of fall risk factors that existed before the stroke and impairments typically seen after stroke, such as decreased strength and balance, hemineglect, perceptual problems, and visual problems [7]. Therefore, rehabilitation after stroke has placed a large emphasis on developing interventions aimed at improving balance and mobility function.
The majority of falls occur during walking (40-90%), so the ability to properly assess dynamic balance and mobility post-stroke is extremely important [6,8,9]. One common tool used to assess changes in mobility and fall risk is the dynamic gait index (DGI). The DGI was developed by Shumway-Cook and Woollacott to evaluate functional stability during gait activities and risk of falling in older individuals with vestibular issues [10]. By evaluating a person's ability to alter gait in response to changing gait demands, this assessment detects problems that cannot be distinguished with less dynamic balance assessments (e.g., Berg Balance Scale) [10,11].
The psychometric properties of the DGI have been reported in many different patient populations, including community-dwelling older individuals [12][13][14] and individuals with vestibular dysfunction [15,16], multiple sclerosis [17,18], Parkinson's disease [19,20], and stroke [21,22]. In stroke, these properties include excellent test-retest reliability of 0.96 [21,22], excellent interrater reliability of 0.96 [21], excellent construct validity with the 10-meter walk test (r= -0.68 to -0.83) and postural assessment scale (r=0.76 to 0.85) [22], and moderate responsiveness in depicting changes at the second and fifth month after therapy [21]. Despite these findings, no study has used the Rasch measurement model to investigate the measurement properties of the DGI in the chronic stroke population. Further analysis may reveal other important psychometric characteristics, such as the item-difficulty hierarchical order, which may add to its clinical applicability and interpretability as a measure of walking balance ability.
The DGI is typically administered in the order of the tasks listed on the form. However, the tool was not built with an item-difficulty hierarchy in mind [12]. The hierarchical structure of tasks has been established for the DGI in community-dwelling older individuals [12,23], but has not yet been evaluated in stroke. Other researchers have determined that the item hierarchy in older adults differs from that in individuals with vestibular disorders, for whom the assessment was developed, and therefore there is a need to establish the DGI item hierarchy for other populations, such as stroke. Completion of walking tasks according to difficulty would be clinically useful, since the Rasch item hierarchical structure of tasks may be used to establish a logical order of item administration, beginning with the easiest and progressing to the hardest. Understanding and formalizing the order of administration could inform clinicians in three very important areas of clinical practice: (1) improving efficiency during examination, as not all items will need to be administered; (2) improving goal setting by utilizing an evidence-based progression of balance activities that is not currently published; and (3) providing a clearly identifiable progression of balance interventions based on current physical deficits.
The purpose of this study was to use Rasch analysis to examine the measurement properties of the DGI, including unidimensionality, fit statistics, local independence, test reliability, and person distribution. Additionally, we determined the item-difficulty hierarchical ordering of the DGI items and whether that order is consistent with a clinically logical progression from easiest to hardest. Finally, we determined whether the range of task difficulty is sufficient to measure our sample without a ceiling effect.

Participants
Participants had been involved in one of seven studies. Secondary data were retrieved retrospectively from study records, which included DGI scores and subject demographics (age, gender, paretic side). For the strictest study in the analysis, inclusion criteria were age over 18 years, more than six months post-stroke, residual paresis in the lower extremity (Fugl-Meyer lower extremity motor score <34), ability to sit unsupported for 30 seconds, ability to walk independently with or without an assistive device, successful completion of an exercise tolerance test, and ability to follow a three-step command. Exclusion criteria included living in a nursing home prior to stroke, serious cardiac conditions or arrhythmias, legal blindness or severe visual impairment, history of significant psychiatric illness, life expectancy less than one year, major post-stroke depression (PHQ-9 >10), severe hypertension with systolic greater than 200 mmHg and diastolic greater than 110 mmHg at rest, previous or current enrollment in a clinical trial to enhance stroke motor recovery, or concomitant neurological disorders. The other six studies had similar inclusion and exclusion criteria, but these were not as stringent. All subjects consented to participation, and the institutional review boards of each of the four participating institutions (University of Florida, Malcom Randall VA Medical Center, Medical University of South Carolina, and Ralph H. Johnson VA Medical Center; United States) approved all studies.
www.edoriumjournals.com/ej/dr Aaron et al. 107

Procedure
All subjects performed the English version of the DGI as part of their initial baseline assessments. The eight-item evaluation was administered according to protocol by licensed physical therapists. The assessment was performed in a 20-foot-long marked walking area, with required equipment consisting of a shoebox, two obstacles, and stairs. Depending on the person's ability, all eight tasks were completed in an average of 15 minutes. Scores were based on a 4-point scale (3 = no gait dysfunction, 2 = minimal impairment, 1 = moderate impairment, and 0 = severe impairment), with a possible total score ranging from 0 to 24 points. Tasks include steady-state walking, walking with changing speeds, walking with head turns both horizontally and vertically, walking while stepping over and around obstacles, pivoting while walking, and stair climbing.

Analyses
Rasch analysis with the Andrich rating scale model was conducted to test the psychometric properties of the DGI using Winsteps® Rasch Measurement, Version 3.91.0 [24,25]. Confirmatory factor analysis (CFA) with a one-factor model was conducted using Mplus software, version 7.11 [26]. Linacre (1998) suggests three methods of testing unidimensionality: point-biserial correlations, fit statistics, and factor analysis [27]. First, point-biserial correlations on test items were analyzed to detect negative correlations or correlations less than 0.3. Such values may indicate that the scoring direction of the item is opposite to the direction of the measurement construct, or may reflect data entry errors [24,28]. Second, the fit of the items to the Rasch model was assessed with infit (weighted) and outfit (unweighted) statistics [29]. Fit statistics are represented as mean square residuals (MnSq), or the ratio of observed to expected scores (the ideal value being 1.0), and indicate the amount of distortion of the measurement [29]. An acceptable range of infit and outfit MnSq for clinical observations is 0.50 to 1.70, with an acceptable standardized Z score (ZSTD; amount of randomness) ranging from -2.00 to 2.00 [29,30]. Finally, the dimensionality of an outcome measure containing items identified by point-biserial correlation and fit statistics was examined using factor analysis. The CFA was conducted with Weighted Least Squares with Adjustments for the Mean and Variance (WLSMV) estimation because the rating scales of the DGI were recorded as categorical responses [31]. The criteria for CFA model fit were as follows: chi-square (χ2)/degrees of freedom (df) test (<3.84, indicating p>.05), comparative fit index (CFI >0.95), Tucker-Lewis Index (TLI >0.95), and root mean square error of approximation (RMSEA <0.06) [32].
We hypothesized that the outcome measure consisted of a single construct and that all test items were locally independent (residual correlations <0.2) [33].
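To illustrate the Andrich rating scale model named above, the following minimal Python sketch computes the category probabilities for a single item scored 0 to 3. It is only an illustration of the model's form: the function names and the threshold values are hypothetical and are not estimates from this study, where estimation was carried out in Winsteps.

```python
import math

def rating_scale_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: person ability (logits)
    delta: item difficulty (logits)
    taus:  step thresholds shared by all items, one per step
    Returns a list of probabilities for categories 0..len(taus).
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(value - top) for value in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(probs):
    """Model-expected item score given the category probabilities."""
    return sum(category * p for category, p in enumerate(probs))

# Hypothetical thresholds for a 4-category (0-3) item, for illustration only.
example_taus = [-2.0, 0.0, 2.0]
probs = rating_scale_probs(theta=0.86, delta=1.23, taus=example_taus)
score = expected_score(probs)
```

In this form, a person whose ability exceeds the item difficulty has a higher expected score on that item, which is the property the item-person map relies on.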

Precision
Reliability of the instrument was represented as person reliability, which is similar to the conventional statistic Cronbach's alpha (α). For clinical application, a person reliability greater than 0.80 was considered satisfactory [24]. The precision of the instrument was presented as person strata, that is, how precisely the instrument separates the subjects into a number of distinct groups [25]. We considered at least three person strata to be acceptable precision, indicating that the instrument is able to statistically divide a sample into three distinct groups, which is equivalent to a reliability of 0.80 [34]. Lastly, we investigated the test information function across the continuum of person ability (theta), because the amount of test information indicates how closely the test items match a person's ability [24,35]. Mathematically, the test information function is the reciprocal of the squared standard error (SE), that is, SE = 1/√(Test Information Function); therefore, it is considered the estimated precision of the instrument across the spectrum of person ability levels of the sample [25].
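The relationships described above can be made concrete with a short Python sketch. It assumes the conventional Rasch formulas for person strata, (4G + 1)/3, and for reliability, G²/(1 + G²), where G is the person separation coefficient, together with the SE relationship stated in the text. The function names are hypothetical and chosen for illustration.

```python
import math

def person_strata(separation):
    """Number of statistically distinct ability levels: (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0

def reliability_from_separation(separation):
    """Person reliability implied by separation G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)

def information_from_se(se):
    """Test information at a given ability level: 1 / SE^2."""
    return 1.0 / se ** 2

def se_from_information(information):
    """Standard error at a given ability level: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(information)

# A separation of 2.0 corresponds to the three-strata, 0.80-reliability criterion.
criterion_strata = person_strata(2.0)
criterion_reliability = reliability_from_separation(2.0)
```

Under these formulas, a separation of 2.0 yields three strata and a reliability of 0.80, which matches the acceptance criterion used in this study.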

Construct validity
Rasch measurement tests construct validity by investigating whether the item-difficulty hierarchy remains constant regardless of the subjects tested [36], providing information about how well instruments are constructed. For example, the "Step over obstacle" and "Stairs" items are expected to be more challenging than "Change in speed" and "Horizontal head turn" because the former items require more coordinated and more dynamic movements. In addition, elements that require increased time in single-leg stance ("stepping over obstacle" and "stairs") have more extensive dynamic balance requirements than elements that assess only level walking. Through Rasch measurement, an item-person map places item-difficulty levels and person ability levels on the same linear continuum (log equivalent units, or logits). We investigated whether the item-difficulty hierarchical order of the test items was consistent with a clinically logical progression from easiest to hardest.

RESULTS
One hundred and seventeen participants were included in the analysis. Average age was 59 years (SD = 12.8), with a range of 21-83 years. The majority of the sample was male (59.8%) and had left-side hemiplegia (53.0%). The median DGI score was 15, indicating an increased fall risk (Table 1).

Precision
The DGI demonstrated acceptable person reliability (0.85). The separation coefficient was 2.41 and the person strata value was 3.5, which exceeds the criterion of three strata (equivalent to a reliability of 0.80). Figure 2 shows the level of test information across the continuum of person ability (theta). On the x-axis, A represents the participant with the lowest person ability (-4.33 logits) and B represents the participant with the highest person ability (5.70 logits).

Construct validity
The DGI demonstrated clinically logical construct validity. Among the DGI items, "Step over obstacle" (1.23 logits) and "Stairs" (0.94 logits) were the most difficult items, and "Change in speed" (-0.67 logits) and "Step around obstacle" (-1.48 logits) were the least difficult items (Table 2; Figure 4). The item-person map demonstrates that the range of item step threshold difficulty (-3.91 to 3.88 logits) did not fully cover the person ability measure range (-4.33 to 5.70 logits). The average item difficulty (0.00 logits) was slightly lower than the average person ability (0.86 logits). Four participants (3.4% of the sample) demonstrated ceiling effects.

DISCUSSION
This study presents an analysis of the DGI by Rasch modeling. The DGI demonstrated good psychometric properties with the Rasch measurement model. Although one of the four CFA fit indices (RMSEA) did not meet the model fit criteria, all test items demonstrated good point-biserial correlations, fit statistics, and local independence. The item hierarchical order of the DGI has a logical item-difficulty construct, and the instrument is able to separate the sample into three distinct groups.
The item-person map revealed that, on average, the sample performed 0.86 logit (about 1 standard deviation) higher than the average item difficulty, with four people obtaining the maximum measure score. The range of item step threshold difficulty (the average of difficulty between adjacent response categories) did not cover 6% of the sample (n=7), who had a high person ability. The average standard error (SE) of person measures in the sample was 0.63 logits. However, the average SE of the seven people who were not covered by the range of step threshold difficulty was 1.52 logits, which is equivalent to a test information function value of 0.43. Mathematically, as SE increases, reliability decreases [35]. In other words, the DGI cannot precisely measure those seven participants. Our ceiling effect is lower than that of Lin et al. [22], who reported a ceiling effect with the DGI as high as 10% of their 39 subjects with stroke after two months of therapy. Clinically, these results suggest that assessments used in the stroke population need to include more difficult items to match individuals who exhibit higher functional locomotor abilities.
To our knowledge, this is the first paper to report an item-difficulty hierarchical order of the DGI for individuals with chronic stroke. The results showed a hierarchical order of item difficulty, with "step over obstacle", "stairs", and "level surface" being the most difficult, "vertical head turn" and "horizontal head turn" being of medium difficulty, and "pivot turn", "change in speed", and "stepping around obstacles" being the least difficult. This item-difficulty hierarchical order represents a logical progression from hardest ("stepping over obstacles") to easiest ("stepping around obstacles"). The most difficult items ("step over obstacle" and "stairs") require single-limb support, as well as more strength, with the hemiparetic leg. Observationally, individuals with stroke tend to look down at their feet when they are walking, so it is not surprising that "vertical head turn" and "horizontal head turn" are of medium difficulty. The items "pivot turn", "change in speed", and "step around obstacles" do not require extended periods of single-limb support, drastic changes in the center of mass, or changes in the individual's visual preferences, and were therefore easier tasks for the individuals to perform. One item that may seem out of typical order is walking across a "level surface", which is placed as the third most difficult task. This may be due to how the task is scored: it assesses speed, abnormal gait patterns, and evidence of imbalance, whereas a task like "change in speed" is scored solely on the ability to change speed and loss of balance while changing speed, and does not assess abnormal gait patterns or speed during self-selected pace over a level surface.
As expected, our item-difficulty hierarchical order for chronic stroke is different from the reported hierarchical order for community-dwelling older adults [12]. Chiu et al. [12] reported the item-difficulty hierarchical progression from hardest to easiest as "horizontal head turn", "steps", "vertical head turn", "pivot turn", "over obstacle", "around obstacle", "speed change", and "level surface". These results suggest not only that the item-difficulty hierarchical order differs between populations, but also that the hierarchical structures are much different from the typical order of administration used in clinical settings. If the DGI is used in individuals with chronic stroke, these results suggest that the order in which the items are administered should be considered. For example, on the basis of the item hierarchy found in our analysis, an individual who is capable of "stepping over an obstacle" has a higher probability of being successful at "horizontal head turn". This suggests that if a person is successful at a more difficult task, it would be unnecessary to test the person on an easier task. This approach to item-difficulty selection could reduce the burden of testing on the individual and reduce test administration time [12,37].
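The reasoning behind skipping easier items can be illustrated with a short Python sketch. As a simplifying assumption, it uses the dichotomous Rasch model (success or failure) rather than the four-category rating scale model fitted in this study, so the probabilities are illustrative only. The item difficulties are the four values reported in the Results; the function and variable names are hypothetical.

```python
import math

def rasch_success_prob(theta, delta):
    """Dichotomous Rasch model: probability of success on an item.

    theta: person ability (logits); delta: item difficulty (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# Item difficulties (logits) reported in the Results for four DGI items.
item_difficulty = {
    "step over obstacle": 1.23,
    "stairs": 0.94,
    "change in speed": -0.67,
    "step around obstacle": -1.48,
}

# Assumed example: a person whose ability equals the hardest item's difficulty.
theta = 1.23
success = {
    item: rasch_success_prob(theta, delta)
    for item, delta in item_difficulty.items()
}

# Items ordered from the highest to the lowest probability of success.
ranked = sorted(success, key=success.get, reverse=True)
```

Under this simplification, a person with an even chance of succeeding on the hardest item has a markedly higher chance on the easiest items, which is why administering those easier items adds little information.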
Future directions include analysis of the Functional Gait Assessment (FGA) developed by Wrisley et al. [38], a modification of the DGI designed to address some of its shortcomings. To reduce the ceiling effect, items expected to be more difficult were added to the FGA. These tasks include "ambulating backwards", "gait with eyes closed", and "gait with narrow base of support". Furthermore, the DGI task "walk around obstacles" was removed from the FGA because it has been shown to be of insufficient difficulty [38]. Beninato and Ludlow [23] found that, by Rasch modeling standards, the FGA is a clinically appropriate and construct-valid measure of walking balance ability in older individuals. Lin et al. [22] analyzed ambulation measures used in the stroke population, comparing the psychometric properties of the DGI, the DGI-4 (a modified DGI with only 4 tasks), and the FGA. The FGA showed the strongest psychometric properties; however, Rasch analysis was not used, and the item-difficulty hierarchical order of the FGA is still not known. There is a great need to establish an FGA item hierarchy for the chronic stroke population.

Study Limitations
The major limitation of this study is the use of participants who were recruited for other studies with other purposes and with strict inclusion and exclusion criteria. Therefore, the generalizability of these findings is limited to individuals with chronic stroke who have mild to moderate mobility impairments.
In addition, the RMSEA value (0.09) was higher than the optimal cut-off of 0.06. However, the RMSEA is positively biased by a small sample size and small degrees of freedom, which inflates the RMSEA value [39]. One simulation study demonstrated that a small sample size (about n=100) tended to falsely reject a valid unidimensional model with the optimal RMSEA cut-off [40]. Brown [32] suggests that with a small sample size, an RMSEA value of 0.08 is of less concern and indicates a good model fit if all other fit indices meet the CFA model fit criteria. While the RMSEA of the DGI was slightly higher than 0.08, the other three fit indices met the model fit criteria and there were no low point-biserial correlations or high Rasch fit statistics. Therefore, we assumed that the DGI was essentially unidimensional. There is a need for future studies to validate our findings with a larger and more diverse sample.
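As a simple check of the point above, the following Python sketch compares the CFA fit values reported in the abstract against the cut-offs stated in the Analyses section. The variable names are hypothetical, and the sketch only restates the comparison; it does not refit the model.

```python
# CFA fit values reported for the DGI in this study.
reported = {"chi2_df": 2.12, "CFI": 0.98, "TLI": 0.97, "RMSEA": 0.09}

# Cut-offs stated in the Analyses section.
criteria = {
    "chi2_df": lambda value: value < 3.84,
    "CFI": lambda value: value > 0.95,
    "TLI": lambda value: value > 0.95,
    "RMSEA": lambda value: value < 0.06,
}

# True where the reported value satisfies its cut-off.
meets_criterion = {name: criteria[name](value) for name, value in reported.items()}

# Indices that fall outside their cut-off.
failed = [name for name, ok in meets_criterion.items() if not ok]
```

With these values, the RMSEA is the only index outside its cut-off, consistent with the statement that the other three fit indices met the model fit criteria.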

CONCLUSION
The results suggest that the dynamic gait index (DGI) has good item-level psychometric properties and a logical hierarchical order when used in individuals with chronic stroke. Within our sample, the individuals with the highest abilities were not precisely assessed, and adding more challenging items may improve precision and person-item matching. If the DGI is used, ordering the tasks by difficulty, instead of following the order listed on the form, may better assess dynamic function and may result in more representative ability scores. One assessment that may better assess this population is the Functional Gait Assessment; however, this measure has not been analyzed using the Rasch model in the chronic stroke population.