Quantitative falls risk assessment in elderly people: results from a clinical study with distance based timed up-and-go test recordings

Objective: A third of people over 65 years experience at least one fall a year. The Timed Up-and-Go (TUG) test is commonly used to assess gait and balance and to evaluate an individual’s risk of falling. Approach: We conducted a clinical study with 46 older participants to evaluate the fall risk assessment capabilities of an ultrasound-based TUG test device. Fall protocols covering a period of one year were used to classify participants as fallers and non-fallers. For frailty evaluation, state-of-the-art questionnaires were used. Fall recordings were compared to six TUG test measurements that were recorded in fallers and non-fallers. Main results: TUG test data were available for 39 participants (36 f, age 84.2 ± 8.2, BMI 26.0 ± 5.1). Twenty-three participants fell at least once within the fall screening period. We fitted two different regression and probability models to a region of interest of the distance over time curve derived from the TUG device. We found that the coefficient of determination for Gaussian bell-shaped curves (p < 0.05, AUC = 0.71) and linear regression lines (p < 0.02, AUC = 0.74) significantly separated fallers from non-fallers. Subtasks of the TUG test like the sit-up time showed near significance (p < 0.07, AUC = 0.67). Significance: We found that specific features calculated from the TUG distance over time curve were significantly different between fallers and non-fallers in our study population. Automatic recording and analysis of TUG measurements could, therefore, reduce measurement time and improve precision compared to other methods currently used in the assessment of fall risk.


Introduction
Frailty and falls are the main causes of morbidity and disability in elderly people. Around one third of persons over 65 years fall at least once a year (Tinetti and Kumar 2010). Several studies have shown that 40% of falls in nursing homes are related to posture changes from sitting to standing (Rapp et al 2012, Goswami 2017). According to previous studies, the strongest risk factors for falling are previous falls, the strength of a person, gait characteristics, balance impairments and the usage of specific medications (Tinetti and Kumar 2010, Tinetti et al 1988).
The Timed Up-and-Go (TUG) test is a commonly used tool for evaluating elderly individuals' risk of falling (Panel on Prevention of Falls in Older Persons, American Geriatrics Society and British Geriatrics Society 2011, Kang et al 2017). It measures the time in seconds taken by an individual to stand up from a standard chair, walk for 3 meters, turn, walk back to the chair, and sit down. It includes a variety of functional mobility tasks (TUG subtasks), such as standing up, walking, turning, and sitting down. The TUG test has been recommended to assess gait and balance (Herman et al 2011). Numerous authors have investigated the ability of TUG to assess fall risk. Killough (2006) reported a receiver operating characteristic (ROC) curve for the analysis of previous falls with an area under the curve (AUC) of 0.64 for differentiating between fallers and non-fallers, and Andersson et al (2006) showed in 105 participants that TUG can be used to identify patients who tend to fall in order to carry out preventive measures (positive predictive value [PPV] = 59%). However, several studies looking at the total TUG time (that is, the time from standing up until sitting down again) have shown limited ability to predict falls (Barry et al 2014). Kojima et al (2015) monitored 259 participants over 6 months and concluded that the ability of TUG to predict future falls is limited, with an achieved AUC of 0.58. Viccaro et al (2011) reported that the TUG test did not add predictive ability beyond gait speed for fall classification in 457 participants over 1 year (both AUCs < 0.7). Beauchet et al (2011) also noted in their review that the value of TUG is debatable and that its predictive ability is limited. Nocera et al (2013) recommended including covariates like disease severity, quality of life and cognitive abilities to increase the number of correctly classified TUG test samples.
TUG tests combined with a secondary task, such as a cognitive task or a manual task like physical exercises, have been evaluated by different authors. Virtuoso et al (2014) performed a study with 82 physically active old people over 12 months and achieved an AUC of 65.3 and 58.1 with the cognitive TUG test for predicting the occurrence of falls. Cardon-Verbecq et al (2017) also did not find an improved predictive ability in a study with 157 participants, and Sailer (2016) did not find the cognitive TUG to be an effective measure of fall risk in 14 participants. Shumway-Cook et al (2000) reported in a study with 30 participants that TUG's ability to predict falls is not enhanced by adding a secondary task. Hofheinz and Mibs (2016) reported, in a study with 120 patients, AUC values of 0.65 for TUG with a manual task and 0.58 for standard TUG. Looking at gait characteristics, Greene et al (2010) used body-worn sensors while each patient performed the TUG test. This method offered an improvement by using gait characteristics to discriminate between fallers and non-fallers. In particular, gait variability seems to be associated with the risk of falling (Callisaya et al 2011, Hausdorff et al 2001). Van Schooten et al (2016) reported daily-life assessments and gait quality as useful predictors for falls with an AUC of up to 0.76.
Besides body-worn sensors, clinical trials using motion analysis systems with cameras and reflective markers placed on specific anatomic points to assess the time of TUG subtasks have also shown significant differences between fallers and non-fallers (Ansai et al 2018, Li et al 2018). Other automatic TUG test analysis technologies exist: Higashi et al (2008) investigated the detection of TUG subtasks using gyroscopes and accelerometers attached to the subjects' waists and lower limbs, Salarian et al (2010) published an instrumented version of TUG called iTUG, and Frenken et al (2011) provided a solution called aTUG, which is based on ambient sensor technologies like light barriers, force sensors, and a laser range scanner built into a single apparatus in a chair.
As an alternative to the complex video-based systems and body-worn sensors, we developed a method to evaluate the TUG test, different TUG subtasks and additional gait characteristics with an ultrasonic sensor (Ziegl et al 2018). This method provides a distance over time curve with features not used for fall prediction so far.
The present study compared the fall risk derived from state-of-the-art risk scores with the TUG time and specific parameters from the TUG distance over time curve.

Study design
The study was conducted between February and July 2018. During the 15 weeks of study, TUG measurements were recorded six times (one TUG measurement every three weeks). Demographic data were collected at baseline. Medication intake was recorded for each participant at baseline and at the date of the last TUG measurement. All falls of the participants were recorded with the exact date and time during the period from November 2017 to December 2018 (figure 1). The detailed timeline for each participant within the measurement period can be seen in figure 2.
Ethical approval was obtained from the ethics committee of the Medical University of Graz (GZ: ABT08-182942j2016 PN:8011; 30.1.2018).

Participants
Persons of age ≥ 65 years from four different nursing homes operated by the Geriatric Health Care Center Graz were enrolled in the study. Persons meeting the following criteria were included: 1. Age 65 years and older. 2. Mobile, able to walk the complete TUG distance and back (walking aids like walkers and walking sticks were allowed). 3. Living in one of the four participating geriatric nursing homes in Graz. 4. Cognitively competent to give a declaration of consent.
The exclusion criteria were: 1. Not living in one of the four participating geriatric nursing homes in Graz. 2. Suffering from a tumor or other severe illnesses. 3. Being immobile. 4. Suffering from severe dementia.
All participants gave written informed consent prior to inclusion in the study.

Frailty assessment, quality of life measurement and medication
For frailty assessment, two questionnaires were used, i.e. the Groningen Frailty Indicator (GFI) (Drubbel et al 2013) and a modified version of the Falls Efficacy Scale created by Tinetti et al (1990). The GFI is a 15-item questionnaire, with a score range from zero to fifteen, that assesses the physical, cognitive, social and psychological abilities of a person. With a GFI score of four or greater, a person can be categorized as frail.
The Falls Efficacy Scale consists of ten activity items that can each be rated with a score from 1 (very confident) to 10 (not confident at all). The item 'Prepare meals not requiring carrying heavy or hot objects' was excluded from the list of activities as the participants of this study did not perform this task in their daily life. Consequently, a maximum total score of 90 was possible. A total score of greater than 70 indicates that the person has a fear of falling.
To assess the quality of life, the EQ-VAS (5L version) questionnaire was used. This is a scale numbered from 0 to 100: 100 implies the best possible health status and 0 the worst possible health status (Feng et al 2014).
Prescribed medications of all patients were noted at the beginning of the measurement period and at the date of the last TUG measurement. The number of medications was calculated as the total sum of drugs. Furthermore, each medication was classified according to whether it contained Benzodiazepines or not, as Benzodiazepines are associated with a greater fall risk (José et al 2017).

Timed up-and-go (TUG) recording
The TUG measurements were done with a previously developed ultrasonic TUG device (Ziegl et al 2017).
Attached to the backrest of a chair, the device measured the distance to the participant at a sampling frequency of 10 Hz. The chair with the device was placed in front of a wall. After being switched on, the device guided the user to position it correctly, 3.5 meters away from the wall. When the participant was correctly sitting on the chair (distance < 10 cm for > 5 samples, i.e. > 0.5 s), the device gave an acoustic start signal. The participant walked to the wall, turned around and walked back to the chair. Upon sitting down (distance < 10 cm), the device stopped the recording and evaluated the test (Ziegl et al 2017). The sampled distance values for every test were stored directly on the device and later transmitted to a PC.
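The start and stop conditions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the device firmware: the function names are hypothetical, and it is assumed that the TUG time runs from the sample at which the start signal would be given to the first seated sample after the participant has left the chair.

```python
SAMPLE_RATE_HZ = 10      # sampling frequency stated in the text
SEATED_MAX_M = 0.10      # "distance < 10 cm" is treated as seated
MIN_SEATED_SAMPLES = 5   # start signal after more than 5 consecutive seated samples


def find_test_window(distances_m):
    """Return (start_index, stop_index) of one TUG run, or None if incomplete.

    start_index: first sample at which more than MIN_SEATED_SAMPLES
                 consecutive seated samples have been observed.
    stop_index:  first seated sample after the participant has left the chair.
    """
    seated_run = 0
    start = None
    left_chair = False
    for i, d in enumerate(distances_m):
        seated = d < SEATED_MAX_M
        if start is None:
            # Waiting for the participant to sit still long enough.
            seated_run = seated_run + 1 if seated else 0
            if seated_run > MIN_SEATED_SAMPLES:
                start = i
        elif not left_chair:
            # Start signal given; waiting for the participant to stand up.
            if not seated:
                left_chair = True
        elif seated:
            # Participant is back on the chair: the recording stops here.
            return start, i
    return None


def tug_time_s(distances_m):
    """Total TUG time in seconds (assumed definition, see lead-in)."""
    window = find_test_window(distances_m)
    if window is None:
        return None
    start, stop = window
    return (stop - start) / SAMPLE_RATE_HZ
```

A recording that never returns to a seated distance yields `None`, which makes incomplete runs easy to filter out before feature extraction.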

Statistical analysis
Statistical analysis was carried out using our Predictive Analytics Toolset for Health and Care Applications (PATH) (Hayn et al 2018), a MATLAB®-based system. In addition to statistics, PATH was used for data management, signal processing, and machine learning functionalities (Sams et al 2019).
We attempted to classify fallers and non-fallers using demographic data, frailty evaluations and TUG recordings with all their sub-features. Participants were classified as fallers if they had fallen at least once within the fall screening period, which includes some months before and after the measurement period. If more than one TUG test was performed by a single patient, the mean value of each feature across the consecutive tests was used for further analyses. To compare the results of fallers and non-fallers, we used two-sided unpaired U-tests. P < 0.05 was considered to indicate statistical significance. Descriptive statistics are presented as mean ± SD and as boxplots, which show the distribution of the measured parameter for fallers and non-fallers. Furthermore, the median (MED), the interquartile range (IQR) and the minimum (MIN) and maximum (MAX) values are displayed in the figures.
To evaluate the performance of different classifiers, we calculated the 'True Positive Rate' (TP) and 'False Positive Rate' (FP) for different thresholds. Plotting TP against FP results in the Receiver Operating Characteristic (ROC) curve. The area under this curve (AUC, also known as the C-statistic) was used as the measure of classification performance.
Figure 3 shows the signal processing and the first part of feature extraction from the 'Raw TUG data'. Spikes were detected as absolute gradient values exceeding a threshold of 4 m s⁻¹ and removed, resulting in a curve named 'TUG data after spike removal'. Absolute distances were used to mark segments corresponding to subtasks such as sitting up, walking forward, turning around, walking back and sitting down, as detailed in Ziegl et al (2018).
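The group comparison described above can be sketched as follows. This is a minimal illustration, not the PATH implementation: it assumes that the u-test refers to the Mann-Whitney U test, it uses the normal approximation for the p-value without tie or continuity correction (which is coarse for small groups), and all function names are hypothetical.

```python
import math
from statistics import mean


def mean_per_participant(tests):
    """Average one feature over repeated tests: {participant: [values]} -> {participant: mean}."""
    return {pid: mean(values) for pid, values in tests.items()}


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (u, p) where u counts the pairs in which a value from x
    exceeds a value from y (ties count one half).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0                                # expected U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided p-value
    return u, p
```

In use, each feature would first be averaged per participant with `mean_per_participant`, and the resulting values of fallers and non-fallers would then be passed to `mann_whitney_u` as the two groups.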
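As a sketch of the ROC evaluation described in this section, the following Python code sweeps a threshold over a single feature and integrates the resulting curve with the trapezoidal rule. The function names are hypothetical, and it is assumed that higher feature values indicate a faller; for a feature where lower values indicate a faller, the sign of the feature can be flipped first.

```python
def roc_points(scores, is_faller):
    """Return ROC points (FP rate, TP rate), one per distinct threshold.

    A participant is classified as faller if score >= threshold.
    The list starts at (0, 0) and ends at (1, 1).
    """
    n_pos = sum(1 for f in is_faller if f)
    n_neg = len(is_faller) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, f in zip(scores, is_faller) if f and s >= threshold)
        fp = sum(1 for s, f in zip(scores, is_faller) if not f and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of fallers and non-fallers.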

Normal distribution fitting
As can be seen in figure 3, a Gaussian bell-shaped curve (probability density function of a normally distributed random variable) ('Fitted curve') was fitted to a marked area ('Selected fitting data'). This area consisted of two parts (left and right) that reached from the sit-to-stand point to 50 cm before the turn as well as from 50 cm after the turn to the beginning of the sitting down period. The fitting was done using both segments of the 'Selected fitting data' of the curve. The coefficients a, b, c, and d of the normal distribution approximation were estimated. The proportion of the total sum of squares explained by the model was calculated as the scalar
$$R^2 = \frac{SSR}{TSS} = 1 - \frac{SSE}{TSS},$$
where SSE was the sum of squared errors, SSR the sum of squares accounted for by the regression and TSS the total sum of squares of the dependent variable. Alternative start ('Alt. start') and alternative stop ('Alt. stop') points were calculated by determining the intersections of tangents at the highest gradients of the fitted curve with the horizontal line at y = 0.
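As a worked illustration of the alternative start and stop points, assume the fitted curve is an offset Gaussian with amplitude a > 0, center b, width c > 0 and offset d. This parameterization is an assumption made here for illustration only; the exact model equation used in the study may differ.

$$f(x) = a\,\exp\!\left(-\left(\frac{x-b}{c}\right)^{2}\right) + d$$

The gradient is largest in magnitude where $f''(x) = 0$, i.e. at $x_\pm = b \pm c/\sqrt{2}$, with

$$f(x_\pm) = a\,e^{-1/2} + d, \qquad f'(x_\pm) = \mp\frac{\sqrt{2}\,a}{c}\,e^{-1/2}.$$

The tangent at $x_\pm$ crosses $y = 0$ at $x_\pm - f(x_\pm)/f'(x_\pm)$, which gives

$$x_{\text{Alt. start}} = b - \frac{c}{\sqrt{2}}\left(2 + \frac{d}{a}\,e^{1/2}\right), \qquad x_{\text{Alt. stop}} = b + \frac{c}{\sqrt{2}}\left(2 + \frac{d}{a}\,e^{1/2}\right).$$

For d = 0 this reduces to $b \mp \sqrt{2}\,c$, so under this assumed model the two points lie symmetrically around the center b.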

Linear fitting
In figure 4, two linear regression lines were fitted to the rising and falling slopes of the curve, again using the 'Selected fitting data'. The intersections of these fitted lines with the horizontal line at y = 0 indicated the alternative start ('Alt. start') and end ('Alt. stop') points. The point 'Center' was calculated as the mean of the alternative start and end points. While $y_1$ represented the regression line of the rising flank with its coefficients $\beta_0$ and $\beta_1$, $y_2$ represented the regression line of the falling one (coefficients $\beta_2$ and $\beta_3$):
$$y_1 = \beta_0 + \beta_1 x, \qquad y_2 = \beta_2 + \beta_3 x.$$
Based on these coefficients, the points of intersection with the x-axis were calculated for $y_1 = 0$ (Alt. start) and $y_2 = 0$ (Alt. stop):
$$x_{\text{Alt. start}} = -\frac{\beta_0}{\beta_1}, \qquad x_{\text{Alt. stop}} = -\frac{\beta_2}{\beta_3}.$$
The walking time was calculated by subtracting 'Alt. start' from 'Alt. stop'. The list of features extracted based on the fitting includes:
• RMSE (Root Mean Squared Error)
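The linear fitting step can be sketched in plain Python as follows. This is an illustration under assumptions rather than the study's MATLAB code: the function and key names are hypothetical, R² and RMSE are computed per flank (how the two flanks were combined into one value is not specified here), and the input segments are assumed to be the already selected rising and falling parts of the curve.

```python
from statistics import mean


def fit_line(t, y):
    """Least-squares fit y = intercept + slope * t.

    Returns (intercept, slope, r_squared, rmse).
    """
    t_mean, y_mean = mean(t), mean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - sse / tss          # proportion of TSS explained by the line
    rmse = (sse / len(y)) ** 0.5
    return intercept, slope, r_squared, rmse


def walking_features(t_rise, y_rise, t_fall, y_fall):
    """Features from the two regression lines (rising and falling flank)."""
    b0, b1, r2_rise, rmse_rise = fit_line(t_rise, y_rise)
    b2, b3, r2_fall, rmse_fall = fit_line(t_fall, y_fall)
    alt_start = -b0 / b1                 # rising line crosses y = 0
    alt_stop = -b2 / b3                  # falling line crosses y = 0
    return {
        "alt_start": alt_start,
        "alt_stop": alt_stop,
        "center": (alt_start + alt_stop) / 2.0,
        "walking_time": alt_stop - alt_start,
        "r2_rise": r2_rise,
        "r2_fall": r2_fall,
        "rmse_rise": rmse_rise,
        "rmse_fall": rmse_fall,
    }
```

For a perfectly regular walk, both flanks are straight lines and R² equals 1; irregular gait lowers R² and raises the RMSE.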

Results
We recruited 46 residents from four nursing homes in Graz. Seven of them were excluded due to a health status preventing them from performing the measurements or because they declined to participate in the study, leaving 39 participants. During the fall screening period, 23 participants fell at least once and 75 times in total (on average 3.3 ± 5.1 times).

Timed up-and-go (TUG) features as fall classifiers
Of all parameters, TUG signal features showed the best performance when applied as fall classifiers. The R² coefficient exhibited significant differences between fallers and non-fallers, with p < 0.05 for normal distribution fitting and p < 0.02 for linear regression fitting. Both distributions of R² values can be seen in figure 5. The complete TUG time values tended to be longer in fallers but did not reach significance (p = 0.19). Similar results were achieved for the total sit-to-stand time (p = 0.07). Both distributions are shown in figure 6. The R² parameters for normal distribution and linear regression fitting were selected due to their significance. All four parameters are evaluated as classifiers with a receiver operating characteristic in figure 7.

Assessment of the fitting
To evaluate the fitting process for the normal distribution and linear fitting, the upper adjacent, median and lower adjacent values of the distributions of R² Normal Distribution (left) and R² Linear Regression (right) were calculated. These values are shown in table 1. The distribution of these values is shown in figure 8 as boxplots. Figure 9 shows example signals from different participants with these parameters. All features evaluated for fall classification can be seen in table 2.

Discussion
With this study we have tried to gain new insights into the functional health state, and consequently the risk of falling, of elderly people. We compared various parameters derived from the TUG test, medication and state-of-the-art frailty assessment scales with respect to their link to falls. The coefficient of determination for Gaussian bell-shaped curves and linear regression lines significantly separated fallers from non-fallers. Subtasks of the Timed Up-and-Go test like the sit-up time showed near significance. Neither the amount of medication nor a binary value (Benzodiazepine yes/no) differed between fallers and non-fallers. The same was the case for the modified version of the Falls Efficacy Scale by Tinetti and the GFI. Calculating the subtask features from the signal, which is based on total distances, led to a better separation of fallers and non-fallers. A p-value close to significance (0.07) was found for the sit-up time. When analyzing the TUG curves, we visually identified that non-fallers seemed to have more regular and smooth TUG curves than fallers. These findings can be interpreted as gait variability, which is derived from fluctuations in the gait rhythm of the participant and is associated with the risk of falling in elderly people. We tried to quantify this variability by fitting a normal distribution to the distance over time TUG curve and calculating R², which describes how closely the fitted curve follows the actual values. The significantly different values for this parameter obtained from fallers and non-fallers indicate that this parameter may indeed help to separate fallers and non-fallers. It was noticeable that the shapes of the rising and falling flanks were often smooth but had different gradients. Therefore, linear regression for each flank was determined separately. The R² parameter for this type of modelling also led to a significant difference between fallers and non-fallers.
Our literature research showed different results for the ability of TUG to predict falls. The AUCs ranged from 0.58 to 0.76. The higher values were mostly achieved by considering additional tasks and parameters beyond the TUG time alone. Therefore, our resulting AUC of 0.74 for R² of the linear regression model and of 0.71 for R² of the normal distribution model can be considered strong markers compared to the state of the art. These results could potentially be translated into practice by identifying people who have a high risk of falling and offering them mobility programs to counteract muscle weakness, balance and gait problems to prevent them from falling. Due to the simplicity and non-invasiveness of the TUG measurement method, repeated use for the assessment of medium and long-term intraindividual changes is possible. However, the potential benefit for trend monitoring should be evaluated in a follow-up study. Originally, we planned to recruit 60 persons and expected a dropout rate of 20%. In the end, 46 persons signed an informed consent and 39 of them attended at least one measurement session. This resulted in a lower dropout rate of 15% (expected dropout 20%) but also a lower number of participants in total and a correspondingly lower statistical power to detect significant differences between fallers and non-fallers. Also, the numbers of male and female participants were unbalanced. With a median TUG time of 24.2 s, most of the participants in our study exhibited values well beyond 13.5 s, the cut-off time widely used in the existing literature to identify individuals at higher risk of falling. This generally low level of mobility in our cohort needs to be taken into account when transferring our findings to more vigorous cohorts with shorter TUG times.
Nevertheless, we found significant differences between fallers and non-fallers in features obtained from fitting normal distribution and linear regression models to the TUG distance over time curve. Neither the information from medication data, quality of life or frailty questionnaires nor the complete TUG time was significantly related to falling, although the latter came close to significance.
As the TUG distance values resulted from an ultrasonic measurement including noise, signal processing was necessary. Observed disturbances were mostly spikes that resulted from specular reflectance and acoustic noise. We managed this issue by looking at high derivative values in the signal and furthermore by detecting the beginning and the end of every spike, before removing it. This processing step worked well in the supervised setting, where all TUG tests were performed correctly. In an unsupervised setting, unusual TUG runs could happen which could bring the spike detection algorithm to its limits, potentially necessitating an adapted algorithm in such a setting.
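The spike handling described above can be sketched as follows. This is a simplified illustration, not the algorithm used in the study: the 4 m s⁻¹ gradient threshold and the 10 Hz sampling frequency are taken from the methods, while the function name, the assumption that the first sample is valid, and the replacement of removed samples by linear interpolation are choices made here for the example.

```python
SAMPLE_RATE_HZ = 10            # sampling frequency of the device
MAX_GRADIENT_M_PER_S = 4.0     # gradient threshold used for spike detection


def remove_spikes(distances_m):
    """Return a copy of the signal with spike samples replaced by interpolation.

    A sample is accepted if the gradient from the last accepted sample stays
    within the threshold; a spike begins at the first rejected sample and
    ends at the next accepted one.
    """
    cleaned = list(distances_m)
    good = [0]  # indices of accepted samples (first sample assumed valid)
    for i in range(1, len(cleaned)):
        elapsed_s = (i - good[-1]) / SAMPLE_RATE_HZ
        gradient = abs(distances_m[i] - distances_m[good[-1]]) / elapsed_s
        if gradient <= MAX_GRADIENT_M_PER_S:
            good.append(i)
    # Bridge every gap between accepted samples by linear interpolation.
    for left, right in zip(good, good[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            cleaned[i] = cleaned[left] + frac * (cleaned[right] - cleaned[left])
    # Samples after the last accepted one: hold the last accepted value.
    for i in range(good[-1] + 1, len(cleaned)):
        cleaned[i] = cleaned[good[-1]]
    return cleaned
```

Measuring the gradient against the last accepted sample, rather than the immediately preceding one, lets the sketch bridge spikes that last for several samples.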
Fall prediction based on automated TUG recordings could help to prevent falls in persons who perform the test at home. In conclusion, we have found completely new indicators that seem to be superior to previously investigated ones. Even if falls still remain hard to predict, these indicators could potentially open a new route for assessing elderly patients' risk of falling. If confirmed in a larger and potentially better-balanced population, this approach could lead to advances in falls risk prediction in terms of time consumption and precision as compared to existing methods.