The Automatic Assessment of Strength and Mobility in Older Adults: A Test-Retest Reliability Study

Background: Simple field tests such as the Timed Up and Go test (TUG) and 30 s Chair Stand test are commonly used to evaluate physical function in the elderly, but they provide only crude outcome measures. Using an automatic chronometer, it is possible to obtain additional kinematic parameters that may provide extra information and support further conclusions. However, there is a lack of studies that evaluate the test-retest reliability of these parameters, which may help to judge and interpret changes caused by an intervention or differences between populations. Thus, the aim of this study was to evaluate the test-retest reliability of the TUG and 30 s Chair Stand test in healthy older adults. Methods: A total of 99 healthy older adults participated in this cross-sectional study. The TUG and the 30 s Chair Stand test were performed five times and twice, respectively, using an automatic chronometer. The sit-to-stand-to-sit cycle from the 30 s Chair Stand test was divided into two phases. Results: Overall, reliability for the 30 s Chair Stand test was good for almost every variable (intraclass correlation coefficient (ICC) >0.70). Furthermore, the use of an automatic chronometer improved the reliability for the TUG (ICC >0.86 for a manual stopwatch and ICC >0.88 for an automatic chronometer). Conclusions: The TUG and the 30 s Chair Stand test are reliable in older adults. The use of an automatic chronometer in the TUG is strongly recommended as it increased the reliability of the test. This device enables researchers to obtain relevant and reliable data from the 30 s Chair Stand test, such as the duration of the sit-to-stand-to-sit cycles and phases.


Introduction
The world's population has aged, and the proportion of older adults has grown rapidly in recent decades. The aging process leads to a reduction in physical function and muscle mass, as well as to a higher risk of disability, disease, loss of autonomy, and premature death [1]. In this regard, frailty is an age-associated health state characterized by vulnerability caused by aging (>65 years) and other processes [2].
Physical function in the elderly is crucial to autonomously perform activities of daily living [3,4]. The process of aging is particularly conditioned by healthy lifestyle habits, especially physical activity levels [5]. An objective evaluation of physical function is thus relevant to identify frail older adults who are at risk of losing their autonomy and independence [6]. In this regard, previous studies have identified muscle weakness as a factor related to a higher risk of falling [7][8][9][10].
Simple field tests such as the Timed Up and Go test (TUG) [9] and the 30 s Chair Stand test [10,11] are commonly used to evaluate physical function in the elderly. The score in the TUG is the time required to rise from a chair, walk as fast as possible towards a mark placed at 2.44 m (8 feet) from the chair, turn around, and sit down again. In this test, the time recorded may be influenced by the ability of the rater [12]. Therefore, the use of automatic chronometers is recommended in order to remove rater-related variability and increase the reliability of the test. However, to our knowledge, no study has evaluated the test-retest reliability of the TUG using automatic chronometers.
In the case of the 30 s Chair Stand test, the traditional test only provides a crude outcome measure (i.e., the number of times that a person is able to stand up from a chair and sit back down), which may limit the clinical relevance of the test. Using the same automatic chronometer, it is possible to obtain additional kinematic parameters that may provide extra information and support further conclusions [13]. Previous studies reported the association between kinematic performance of the 30 s Chair Stand test and frailty in older adults [14], fall status in healthy people [15], impact of chronic pain [16], or impaired postural control in patients with chronic obstructive pulmonary disease [17]. Although the clinical relevance of the kinematic performance in this test is well known, to our knowledge, there is a lack of studies that evaluate the test-retest reliability of these parameters, which may help to judge and interpret changes caused by an intervention or differences between populations. Therefore, studies are needed that report the reliability, standard error of measurement, and smallest real difference (SRD) of kinematic data obtained during the execution of the 30 s Chair Stand test in different populations.
The main objective of the present study is to provide reliability parameters for the TUG and the 30 s Chair Stand test in healthy older adults. This study also aims to compare the results recorded using a manual stopwatch and an automatic chronometer in the TUG, as well as to report the reliability, standard error of measurement, and smallest real difference using each method.

Participants
A total of 99 healthy older adults participated in the study. The inclusion criteria were (a) being aged 65 years or older and (b) being able to walk autonomously without any support. Participants were excluded if they (a) were institutionalized or (b) suffered from a disease or injury that might affect the results of the tests. All participants understood and signed the written informed consent. The study was conducted in the elderly association of Montemor-o-Novo and the elderly center Airpiffs of Évora. The protocol of the study was approved by the Committee of Bioethics and Biosecurity of the University and is in agreement with the guidelines and values of the updated Helsinki Declaration and with the national legislation on bioethics, biomedical research, and personal data confidentiality (Code: GD/42998/2016; Date: 16/11/2016).

Procedure
First, anthropometric measurements were taken using a SECA weighing device (SECA, Hamburg, Germany) and a measuring rod. Afterward, participants performed a light warm-up including self-paced walking and joint mobility for 5 min (see Figure 1).
All participants performed the TUG five times, with one minute of rest between consecutive repetitions. Five minutes after the end of the fifth repetition, participants were asked to perform the 30 s Chair Stand test. This test was carried out twice, with three minutes of rest between the two trials. In the TUG, an automatic chronometer was placed on the chair in order to assess the time required to complete the task. Specifically, the chronometer was the Chronopic (Chronojump, BoscoSystem®, Barcelona, Spain) [18]. This device is based on an electric circuit that can be opened and closed: when participants, wearing a vest with a metallic tape, are touching the device, the circuit is closed, whereas when participants lose contact with the device, the circuit is opened. The Chronopic tracks the amount of time the circuit is opened and closed. Therefore, in the TUG, participants started with their back touching the back support (circuit closed), then they stood up, losing contact (circuit open), and at the end of the task they returned to the initial position (circuit closed again). A trained rater also measured the time manually with a stopwatch.
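The timing logic described above can be sketched in a few lines. This is only an illustration: the event format (a list of timestamp and circuit-state pairs) and the function name `tug_time` are assumptions made for the example and do not reflect the actual output format of the Chronopic or the Chronojump software.

```python
from typing import List, Tuple

# Hypothetical event format: (timestamp in seconds, circuit_closed).
# This is an assumption for illustration, not the Chronopic's real output.
Event = Tuple[float, bool]


def tug_time(events: List[Event]) -> float:
    """Return the duration (s) of the first complete open-circuit interval.

    The circuit opens when the participant's back leaves the back support
    and closes again when the participant returns to the initial position.
    """
    open_start = None
    for timestamp, closed in events:
        if not closed and open_start is None:
            open_start = timestamp            # contact lost: timing starts
        elif closed and open_start is not None:
            return timestamp - open_start     # contact regained: timing stops
    raise ValueError("no complete open-circuit interval found")
```

For example, `tug_time([(0.0, True), (1.5, False), (8.5, True)])` gives 7.0 s: the interval during which the circuit was open.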
In the case of the 30 s Chair Stand test, the same device (Chronopic) was used to evaluate the time spent in each repetition. Two phases were identified in the sit-to-stand-to-sit cycle: the impulse phase, which is defined as the time elapsed from when the buttocks come into contact with the seat until the buttocks lose contact with the seat (i.e., all the time that the participant is seated) [14,16], and the no-contact phase, which is defined as the time elapsed from when the buttocks lose contact with the seat until contact is made again. The no-contact phase comprises two phases defined by Millor et al. [14]: the stand-up phase and the sit-down phase.
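The two-phase division can be expressed as a short sketch. It assumes a hypothetical list of contact events (timestamp, in contact with the seat) sorted by time, with the first event marking the seated start of the test; the function name `split_cycles` and the event format are illustrative and are not taken from the device software.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (timestamp in seconds, in_contact_with_seat).
Event = Tuple[float, bool]


def split_cycles(events: List[Event]) -> List[Dict[str, float]]:
    """Split contact events into sit-to-stand-to-sit cycles.

    A cycle is a contact -> no contact -> contact sequence:
      impulse    = time seated (contact made until contact lost)
      no_contact = time off the seat (contact lost until contact regained)
    """
    cycles = []
    for (t0, s0), (t1, s1), (t2, s2) in zip(events, events[1:], events[2:]):
        if s0 and not s1 and s2:
            cycles.append({
                "impulse": t1 - t0,
                "no_contact": t2 - t1,
                "total": t2 - t0,
            })
    return cycles
```

With the events `[(0.0, True), (1.0, False), (3.0, True), (4.5, False), (6.5, True)]`, the sketch returns two cycles with impulse phases of 1.0 s and 1.5 s and no-contact phases of 2.0 s each. The number of cycles returned corresponds to the crude score of the traditional test.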

Statistical Analysis
Descriptive statistics (mean and SD) of age and anthropometric measurements were calculated for the whole sample and separately for men and women. Parametric and non-parametric tests were conducted based on the results of Kolmogorov-Smirnov tests. Differences between test and retest were evaluated using the paired samples t-test or Wilcoxon test when appropriate. Recommendations by Weir [19] were followed in order to evaluate the reliability of the tasks. The selected intraclass correlation coefficient (ICC) was 3,1 (two-way mixed, single measures). Absolute reliability was determined by computing the standard error of measurement, which is calculated as SEM = SD·√(1 − ICC), where SEM is the standard error of measurement and SD is the mean SD of the two repetitions (test and retest). The smallest real difference was also calculated as SRD = 1.96·SEM·√2. Both the standard error of measurement and the smallest real difference were converted into percentages to enable comparisons with further studies.
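As a minimal sketch of these reliability calculations, the following pure-Python code computes ICC(3,1) from the two-way ANOVA mean squares (the standard consistency form, (MSR − MSE) / (MSR + (k − 1)·MSE)), followed by the SEM and SRD as defined above. The study itself used SPSS and Excel; the function names are illustrative, and expressing the percentages relative to the grand mean is an assumption, since the text does not state the denominator.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def icc_3_1(data: List[List[float]]) -> float:
    """ICC(3,1): two-way mixed, single measures, consistency.

    data holds one row per participant and one column per trial.
    """
    n, k = len(data), len(data[0])
    grand = mean([x for row in data for x in row])
    row_means = [mean(row) for row in data]
    col_means = [mean([row[j] for row in data]) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between trials
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def reliability(data: List[List[float]]) -> Dict[str, float]:
    """Return ICC, SEM, SRD and their percentages (of the grand mean)."""
    k = len(data[0])
    icc = icc_3_1(data)
    # Mean of the per-trial SDs, as described in the text.
    sd = mean([stdev([row[j] for row in data]) for j in range(k)])
    sem = sd * sqrt(max(0.0, 1.0 - icc))   # guard against rounding below zero
    srd = 1.96 * sem * sqrt(2)
    grand = mean([x for row in data for x in row])
    # Assumption: percentages are expressed relative to the grand mean.
    return {"icc": icc, "sem": sem, "srd": srd,
            "sem_pct": 100 * sem / grand, "srd_pct": 100 * srd / grand}
```

For instance, `reliability([[1, 2], [2, 1], [3, 4], [4, 3]])` yields an ICC of 0.6, so SEM = SD·√0.4 and SRD = 1.96·SEM·√2.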
For the TUG, differences between the result obtained using a manual stopwatch and that recorded using an automatic chronometer were tested with repeated measures ANOVA and the Friedman test. Pairwise comparison analyses were performed through paired samples t-test and Wilcoxon test when appropriate. In these tests, pairwise comparisons between different repetitions were evaluated in order to provide reliability parameters according to the number of repetitions performed and to identify the optimum number of repetitions that should be performed. Therefore, statistical differences and reliability analyses were computed between each repetition and the consecutive one (1 vs. 2, 2 vs. 3, 3 vs. 4, and 4 vs. 5). Other comparisons (2 vs. 4, 2 vs. 5, and 3 vs. 5) were computed in order to draw conclusions about the most suitable number of repetitions. This procedure was performed with data from the manual stopwatch and also with results from the automatic chronometer. Moreover, Spearman's Rho and Pearson's r correlation coefficients were extracted to evaluate the relation between the results of the manual stopwatch and the automatic chronometer for each of the repetitions.
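The comparison scheme can be illustrated with a short pure-Python sketch. It lists the repetition pairs named above and implements a paired-samples t statistic and Pearson's r by hand; the non-parametric alternatives (Wilcoxon, Spearman's Rho) and the p-values are omitted for brevity. The study used SPSS, so the function and variable names here are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List

# Repetition pairs compared in the analysis (1-based repetition numbers).
CONSECUTIVE_PAIRS = [(1, 2), (2, 3), (3, 4), (4, 5)]
ADDITIONAL_PAIRS = [(2, 4), (2, 5), (3, 5)]


def paired_t(x: List[float], y: List[float]) -> float:
    """Paired-samples t statistic (df = n - 1) for two repetitions."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation, e.g., manual vs. automatic times."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Given a hypothetical dictionary `times` that maps each repetition number to the list of participants' times, the consecutive comparisons would be obtained with `[paired_t(times[a], times[b]) for a, b in CONSECUTIVE_PAIRS]`.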
All analyses were performed using SPSS v21 (IBM, Armonk, NY, USA) and Microsoft Excel 2013. Significance level was set at p < 0.05.

Results
Table 1 shows the main characteristics of the participants. Mean age (SD) was 70.63 (5.57) years for men and 72.03 (6.83) years for women. Regarding anthropometric variables, women were shorter than men (162.03 vs. 172.55 cm) and weighed less than men (70.97 vs. 80.38 kg). There were no significant differences in body mass index (BMI) between genders (26.98 vs. 27.05).
Reliability parameters obtained for the 30 s Chair Stand test are summarized in Table 2. Almost every variable achieved good reliability considering the classification by Munro et al. [20] (i.e., 0.70 to 0.90), except all parameters from the initial repetition and the mean duration of the impulse phase from the last sit-to-stand-to-sit cycle of the test (<0.70). The smallest real difference ranged from 13.1% for mean total duration of the last sit-to-stand-to-sit cycle to 50.50% for mean duration of the impulse phase from the first repetition of the test. Table 3 shows the differences between the TUG assessed using an automatic chronometer and a manual stopwatch. The two methods differed significantly in some repetitions but not in all of them.
Reliability parameters obtained for the TUG are summarized in Table 4. All the repetitions, with both the automatic chronometer and the manual stopwatch, achieved good (>0.70) or excellent (>0.90) reliability considering the classification by Munro et al. [20]. With the manual stopwatch, the smallest real difference for mean total duration of the TUG ranged from 12.21% (second vs. third repetition) to 16.68% (second vs. fifth repetition). With the automatic chronometer, the smallest real difference ranged from 11.50% (second vs. third repetition) to 16.79% (third vs. fifth repetition). In addition, repeated measures ANOVA (F = 9.036; p < 0.001; η² = 0.084) and the Friedman test (p < 0.001) showed a significant effect across the five repetitions with the manual stopwatch. Similarly, with the automatic chronometer, repeated measures ANOVA (F = 5.885; p < 0.001; η² = 0.057) and the Friedman test (p < 0.001) showed a significant effect across the five repetitions. Pairwise analyses are detailed in Table 4. Significant between-repetition differences occurred only in comparisons that included the first or the fifth repetition. There were no differences among repetitions 2, 3, and 4.

Discussion
The main finding of the present study is that the 30 s Chair Stand test and the TUG are reliable in healthy older adults. Furthermore, this study provides the smallest real difference not only for the crude results of the tests, but also for the sit-to-stand-to-sit cycle, impulse phase, and no-contact phase at the beginning and at the end of the 30 s task. Results indicate that the number of repetitions and mean durations of the total sit-to-stand-to-sit cycle, impulse phase, and no-contact phase had very good reliability (ICC between 0.80 and 0.90). However, the reliability of those variables at the beginning or the end of the task was not always good, so they should be interpreted with caution.
As expected, reliability was always better when the automatic chronometer was used to evaluate the TUG. These results are in line with those reported by Collado-Mateo et al. [12], who stated that using an automatic chronometer to assess the TUG in women with fibromyalgia increased the ICC and reduced the standard error of measurement and the smallest real difference. The relevance of the TUG in the elderly is well known, as it is one of the physical fitness tests that correlates best with health-related quality of life [21].
The use of an automatic chronometer may be a low-cost alternative to assess kinematic parameters in physical fitness tests, providing additional data that may be relevant for clinicians and researchers. The present study showed that it is possible to assess the impulse phase using this kind of device, achieving good reliability (ICC = 0.821). Lack of muscle strength could increase the duration of the impulse phase over the course of the 30 s task. This could be an issue among older adults since the difference between the first and the last repetitions was large: the impulse phase increased from 1.09 s in the first cycle to 1.33 s in the last (22% increase), while the no-contact phase increased from 1.94 s to 2.02 s (4% increase). Therefore, further studies may investigate the relevance of the kinematic performance over the 30 s Chair Stand test. In this regard, fatigability is increased in older adults and may lead to a high risk of mobility loss and a risk of falling [22]. Lindemann et al. [23] also pointed out the relevance of this evaluation and suggested a protocol based on the velocity of the sit-to-stand performance. That protocol may be complementary to a protocol based on the duration of phases or cycles, since the automatic chronometer and a linear encoder may be used simultaneously.
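The percentage changes quoted above follow from a simple relative-change calculation, shown here as a brief check. The helper name `percent_change` is illustrative.

```python
def percent_change(first: float, last: float) -> float:
    """Relative change from the first to the last cycle, in percent."""
    return 100.0 * (last - first) / first


# Phase durations (s) reported for the first and last cycles.
impulse_change = percent_change(1.09, 1.33)      # about 22%
no_contact_change = percent_change(1.94, 2.02)   # about 4%
```

Rounded to the nearest integer, these give the 22% and 4% increases reported for the impulse and no-contact phases, respectively.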
Another relevant contribution of the present study is the calculation of reliability for five repetitions of the TUG. Most studies in the literature perform two or three repetitions and record the second, the third, the mean, or the best repetition. Our results indicate that the first repetition may be considered familiarization, since its mean time was significantly higher than that of the second repetition. Similarly, the mean time of the fifth repetition was significantly higher than that of the fourth. This could be due to the resting time, which was 1 min in this study. Thus, fatigue may appear in older adults after four repetitions separated by 60 s of rest. Therefore, according to the results in Tables 3 and 4, a minimum of two and a maximum of four repetitions should be performed in this population. There was no significant change from the second to the third repetition, nor from the third to the fourth. Furthermore, the third repetition was the best of the five repetitions. In summary, these results may indicate that the third repetition is the most adequate to be used as the final score in the TUG. Another adequate alternative would be to compute the average of the second, third, and fourth repetitions.
This study has some limitations. First, the sit-to-stand-to-sit cycle could only be divided into two phases (impulse phase and no-contact phase) using the Chronopic device. Thus, other relevant phases such as sit-to-stand or stand-to-sit could not be identified, since the key point for that division is the standing position, which a contact-based device cannot detect. Second, the height of the chair was the same for all participants regardless of their gender or height. Most studies using these tests follow this protocol, but it is known that the height of the chair might alter the kinematic performance of the sit-to-stand-to-sit cycle [24].
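The two scoring alternatives suggested in this paragraph can be written as a small helper. This is a sketch under the assumption that the five TUG times are stored in order of execution; the function name and the `method` labels are illustrative.

```python
from statistics import mean
from typing import List


def tug_final_score(times: List[float], method: str = "third") -> float:
    """Final TUG score from five ordered repetitions (seconds).

    method="third":       use the third repetition.
    method="mean_2_to_4": average the second, third and fourth repetitions,
                          discarding the first (familiarization) and the
                          fifth (possible fatigue).
    """
    if len(times) != 5:
        raise ValueError("expected exactly five repetitions")
    if method == "third":
        return times[2]
    if method == "mean_2_to_4":
        return mean(times[1:4])
    raise ValueError(f"unknown method: {method}")
```

For the hypothetical times `[9.0, 8.0, 7.5, 8.5, 9.5]`, the first option returns 7.5 s and the second returns 8.0 s.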

Conclusions
The reliability of the TUG and the 30 s Chair Stand test is good in healthy older adults. Using an automatic chronometer, it is possible to identify relevant and reliable phases in the sit-to-stand-to-sit cycles from the 30 s Chair Stand test. Furthermore, when the TUG is timed with that device, its reliability improves: the ICC increases and the smallest real difference and the standard error of measurement decrease. Therefore, the use of an automatic chronometer is recommended to evaluate the 30 s Chair Stand test and the TUG in the elderly.