Physical and functional measures predicting long-term mortality in community-dwelling older adults: a comparative evaluation in the Singapore Longitudinal Ageing Study

Measures of functional status are known to predict mortality more strongly than traditional disease risk markers in older adult populations. Few studies have compared the predictive accuracy of physical and functional measures for long-term mortality. In this prospective cohort study, community-dwelling older adults (N = 2906) aged 55+ (mean age 66.6 ± 7.7 years) were followed up for mortality for up to 9 years (mean 5.8 years). Baseline assessments included Timed Up-and-Go (TUG), gait velocity (GV), knee extension strength, Performance Oriented Mobility Assessment, forced expiratory volume in 1 second, Mini-Mental State Examination (MMSE), Geriatric Depression Scale, frailty, and medical morbidity. A total of 111 (3.8%) participants died during 16976.7 person-years of follow-up. TUG was significantly associated with mortality risk (HR = 2.60, 95% CI = 2.05–3.29 per SD increase; HR = 5.05, 95% CI = 3.27–7.80, for TUG score ≥ 9 s). In multivariable analysis, TUG remained significantly associated with mortality (HR = 1.64, 95% CI = 1.20–2.19 per SD increase; HR = 2.66, 95% CI = 1.67–4.23 for TUG score ≥ 9 s). In multivariable analyses, GV, MMSE, Frailty Index (FI), physical frailty, diabetes and multi-morbidity were also significantly associated with mortality. However, TUG (AUC = 0.737) demonstrated significantly higher discriminatory accuracy than GV (AUC = 0.666, p < 0.001), MMSE (AUC = 0.63, p < 0.001), FI (AUC = 0.62, p < 0.001), physical frailty (AUC = 0.610, p < 0.001), diabetes (AUC = 0.582, p < 0.001) and multi-morbidity (AUC = 0.589, p < 0.001). TUG shows surpassing predictive accuracy for long-term mortality in community-dwelling older adults.

AGING smoking-, alcohol- and obesity-related diseases. Indeed, among very old people (over-75s), paradoxical inverse mortality risks are sometimes found in association with obesity and cholesterol. Smokers, drinkers and obese individuals who survive into older age may have genetic and/or environmental characteristics that protect them against the toxic effects of harmful habits. Risk factors measured in old age may not reflect lifetime exposures, since non-smokers and non-drinkers may have stopped their habits for health-related reasons, and there may have been significant weight changes previously.
Clinical measures of health and functional statuses such as cognition [3], depression [4], impaired pulmonary function [5], slow gait velocity [6] and frailty [7] have been investigated and consistently shown to predict mortality among older adults. These measures are not only related to specific chronic disease(s) or multimorbidity, but also reflect the broad underlying intrinsic capacity of older people resulting from the interaction of physical and mental health declines. Few studies have evaluated various physical and functional measures together and compared their performance in predicting long-term mortality.
In this study, we evaluated the predictive accuracy of the Timed Up-and-Go (TUG) test for long-term mortality and compared its performance with that of other commonly used measures of physical strength, balance and gait, functional mobility, global cognition and depression in a cohort of over-55-year-olds participating in the Singapore Longitudinal Ageing Study 2 (SLAS-2), who were followed up for mortality for up to 9 years (mean of 5.8 years). We hypothesized that the TUG has surpassing accuracy for predicting long-term mortality over gait velocity (measured on the fast gait test), knee extension strength (KES), the Performance-Oriented Mobility Assessment (POMA), the Mini-Mental State Examination (MMSE), depressive symptoms (measured by the Geriatric Depression Scale), forced expiratory volume in one second (FEV1), as well as frailty (Frailty Index and physical frailty phenotype) and multi-morbidity, which are two other clinical diagnoses known to predict mortality.

We conducted further stratified analyses by sex and age groups and found consistent associations and similar predictive accuracy for both men and women and younger (<75) and older (≥75) individuals (Supplementary Tables 1 and 2).

DISCUSSION
In this study, we recapitulated previous observations that physical and functional measures predict mortality risk.
We showed that TUG, gait speed, KES, FEV1, and frailty were significantly associated with increased mortality, even after adjusting for sociodemographic, lifestyle, and traditional disease and health behavioural risk markers. Diabetes, cardiovascular disease, and multimorbidity were also associated with increased mortality risks but showed low predictive accuracy in this cohort. Notably, when the measures were compared in standardized units (per SD increase), TUG showed the strongest hazard ratio for mortality risk among physical and functional measures. The AUCs for all measures clearly showed that the discriminant accuracy for predicting mortality risk was highest for TUG. The finding remained consistent whether the TUG was analyzed as a continuous variable or as a binary categorical variable with a cut-off of 9 s.
Previous studies have similarly reported that TUG predicts mortality [17][18][19][20][21][22][23]. Among them, three studied only men [17] or women [18,19]; one studied middle-aged postmenopausal women [19]; three were Asian studies [21][22][23], of which one evaluated short-term 2-year mortality risk [22], and one evaluated cardiovascular mortality [23]. Only a few studies besides ours evaluated TUG alongside other physical or functional measures: one study evaluated two measures (TUG and handgrip strength) [22], another study evaluated four measures (TUG, handgrip strength, five times sit-to-stand test, standing balance) [17], and another study also evaluated four measures (TUG, usual gait velocity, functional reach, one-leg stance) [18]. Our finding that TUG has surpassing predictive accuracy for long-term mortality is consistent with the findings reported for older men aged 71-86 in Belgium [17], and for another cohort of older men and women aged 65-94 in Singapore [22]. However, differing results were reported by Idland et al., who followed up a small group of 300 community-dwelling older women (mean age 80.9 years) for 13.5 years and found that usual gait velocity was the strongest predictor of all-cause mortality [18]. We performed stratified analyses by sex and found consistent associations and predictive accuracy for both men and women.
The TUG is a complex test of functional mobility that reflects strength, balance and mobility through assessing the ability to transfer, sit-to-stand, walk, and turn [9,10]. The sit-to-stand component includes a sequence of multiple subtasks, requiring forward movement of the centre-of-mass while still seated (preparatory to standing), acceleration of the centre-of-mass in the anterior-posterior and vertical plane, push-off, and stabilization once standing is achieved. The walking component requires appropriate initiation of stepping, acceleration and deceleration, and preparation to turn twice. The first turning sequence and the final turning around to sit down require some level of planning, orientation in space and organization. The transfer and turning components are thus cognitively demanding, particularly on tasks of executive function [24].
The significant correlations between TUG and other physical and functional measures suggest that these measures capture partly overlapping domains of physical, cognitive and functional performance. TUG is less correlated with muscle strength (KES) than with gait speed. This accords with observations [25] that muscle strength only partially determines variations in gait performance, alongside other determinants such as reaction time, balance, and proprioception. Furthermore, physical test performance declines faster than muscle in the older population [26]. GV was also shown in this study to be more strongly predictive of mortality than muscle strength.
Muscle strength and gait speed are recommended for diagnosing sarcopenia and assessing its severity, respectively [27]. TUG's strong association with mortality is likely due to its ability to identify sarcopenia and frailty, both of which are documented to predict mortality [7,28]. Sarcopenia, involving the accelerated loss of generalized skeletal muscle mass and function, is considered a precursor and component [29,30] of frailty, which increases the vulnerability to adverse health outcomes. Sarcopenia is about twice as common as frailty, depending on the criteria used [29]; hence not all sarcopenic older people are frail. Two widely accepted operational conceptualizations of frailty were used in this study: the FI considers the cumulative deficits from all diagnosable health conditions, whereas the physical frailty phenotype is more closely related to sarcopenia but includes inactivity and exhaustion as additional criteria. Consistent with other studies [7], FI appears to be the stronger predictor of mortality of the two in this study.
Taken together, TUG thus provides more information in a single test than GV, POMA, FEV1 or MMSE alone. It also shows greater accuracy than these physical and functional tests, as well as known disease and health risk markers, in predicting mortality. Among the latter, only smoking showed a relatively high AUC of 0.662, whereas BMI and central obesity showed AUCs significantly below 0.50, consonant with their well-known paradoxical 'protective' effect on mortality that has been reported in numerous studies [31]. On the other hand, age showed a higher AUC of 0.730.
Although the TUG appears to have only a marginally higher AUC than age in predicting mortality, this does not detract from its potential clinical utility. TUG differs from age in being a modifiable risk predictor that provides clinically useful information for targeted intervention to reduce mortality risk.

TUG cut-off
Our results align with previous studies showing a monotonic increase of mortality risk per SD increase in TUG [28]. There is no recommended cut-off for mortality prediction. Various optimal cut-off points are recommended specifically for different predicted adverse outcomes and different population groups of healthy and unwell persons. For example, the American Geriatrics Society (AGS) and British Geriatrics Society (BGS) guidelines recommended a TUG cut-off of 13.5 s for fall risk prediction of community-dwelling older adults [32].
Asian older adults have shorter TUG times than the Caucasian population because of differences in habitual gait speed [33]. Two studies of Japanese and Singaporean older adults suggest appropriate cut-offs of 9.0 s or 9.5 s for ADL disability risk among Asians [33,34]. Consistent with these studies, a TUG cut-off of 9.0 s gave the optimal balance of sensitivity (0.656) and specificity (0.696) in our cohort, whereas a cut-off of 8.0 s increased the sensitivity to 0.856 (while lowering the specificity to 0.488), and a higher cut-off of 10.0 s increased the specificity to 0.804 (while lowering the sensitivity to 0.468).
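The trade-off between sensitivity and specificity at candidate cut-offs can be illustrated with a short sketch. It computes both quantities for a rule of the form "TUG at or above the cut-off" and then ranks the three cut-offs reported above by Youden's J (sensitivity + specificity - 1). Youden's J is an assumption made here for illustration; the text does not name the criterion used to define the "optimal balance". The function names and the toy data are hypothetical and are not study data.

```python
def sens_spec(tug_seconds, died, cutoff):
    """Sensitivity and specificity of the rule 'TUG >= cutoff' for predicting death."""
    tp = sum(1 for t, d in zip(tug_seconds, died) if d and t >= cutoff)
    fn = sum(1 for t, d in zip(tug_seconds, died) if d and t < cutoff)
    tn = sum(1 for t, d in zip(tug_seconds, died) if not d and t < cutoff)
    fp = sum(1 for t, d in zip(tug_seconds, died) if not d and t >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_j(sensitivity, specificity):
    """Youden's J statistic (assumed criterion, not stated in the paper)."""
    return sensitivity + specificity - 1.0


# Hypothetical toy data, NOT study data.
toy_tug = [7.2, 8.1, 8.6, 9.4, 10.5, 12.0, 7.9, 9.1]
toy_died = [False, False, True, False, True, True, False, True]
print(sens_spec(toy_tug, toy_died, 9.0))

# (sensitivity, specificity) pairs as reported in the text for each cut-off.
reported = {8.0: (0.856, 0.488), 9.0: (0.656, 0.696), 10.0: (0.468, 0.804)}
best = max(reported, key=lambda c: youden_j(*reported[c]))
for cutoff, (se, sp) in sorted(reported.items()):
    print(f"cut-off {cutoff:.1f} s: J = {youden_j(se, sp):.3f}")
print("cut-off with the highest J:", best)
```

Under this assumed criterion, the reported values give the highest J at 9.0 s, which agrees with the cut-off chosen in the study.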

Clinical implications
Our findings contribute to a greater appreciation of the TUG as a powerful clinical tool predicting not only physical and cognitive impairment, sarcopenia, frailty, and other adverse health outcomes [11][12][13][14][15], but long-term mortality as well. The TUG appears unique among the physical and functional measures commonly explored for use as prognostication tools in clinical research and practice. Its overall discriminant accuracy for mortality (AUC = 0.737) is no lower than that of other accepted risk prediction or prognostication tools such as the Framingham risk index for cardiovascular disease mortality (AUC = 0.61) [35] or the BODE score for chronic obstructive pulmonary disease (AUC = 0.71) [36].
Further studies should explore whether combinations of clinical and functional markers could improve its prognostication value. Already, the TUG has been recommended by the AGS and BGS guidelines for fall risk prediction of community-dwelling older adults. As such there is broader justification for routine screening with the TUG (cut-off of ≥9 s) for early comprehensive assessment and intervention, particularly with clinical consideration of patients' life expectancy during shared clinical decision making regarding chronic disease management, major surgeries and cancer screening.

Strengths and limitations
Our study is uniquely able to evaluate the TUG alongside many clinical measures of physical and functional health status to compare their relative strengths and limitations for clinical use. We could do this in a large community-based cohort with diverse demographic, socio-economic and health characteristics. Follow-up for mortality (up to 9 years) was complete, using computerized search for deaths via the National Death Registry. The results are reasonably generalizable to other Asian populations, but additional studies in non-Asian ethnic populations should be conducted.

Participants and setting
The Singapore Longitudinal Ageing Study is a prospective population-based study of ageing and health transitions of older adults aged 55 and above in Singapore. The current SLAS-2 study cohort was recruited between 2009 and 2013 from the South West and South Central regions of Singapore. A total of 3270 recruited participants underwent assessments for an extensive range of psychosocial, lifestyle and behaviour, medical, biological, physiological, diet and nutrition, physical and neurocognitive functioning, and health status variables. Previous publications have described the details of the participants' recruitment and measurements [37]. The present study involved 2906 participants who provided baseline data and were followed up for mortality for up to 9 years (mean of 5.8 years). Participants who were not included in the mortality follow-up study did not have complete baseline data for physical, cognitive, and functional tests and did not differ substantially in baseline characteristics from the participants in this study. The study was approved by the National University of Singapore Institutional Review Board, and written informed consent was obtained.

Physical and functional performance
Timed Up-and-Go (TUG) was measured as the time taken by the participant to stand up from an armchair (46 cm height), walk 3 metres, turn, walk back to the chair, and sit down again. The participants wore their regular footwear and used their customary walking aid, if required. Participants walked at their fastest pace with no physical assistance given. The test was administered twice, and the best performance time was used [8]. Various TUG cut-offs have been proposed or recommended for falls or disability prediction in specific populations, but there are no suggested TUG cut-offs for mortality prediction. Asians generally have shorter mean TUG times (faster gait speed) than Caucasians [33,34]. We used an optimal TUG cut-off of 9.0 s derived from receiver operating characteristic (ROC) analyses, consistent with a recommended cut-off of 9.0 s for predicting disability in Japanese older adults [38].
Gait velocity (GV) was measured by the time in seconds taken for the participant to walk 6 metres at their fastest pace, averaged for two trials. Participants performed the test with a dynamic start on a smooth, flat 10-metre walkway with red-tape markers placed at the 0-, 2-, 8- and 10-metre points, allowing for acceleration over the first 2 metres and deceleration over the last 2 metres. Timing with a stopwatch started when the toes of the leading foot crossed the 2-metre mark and stopped when the toes of the leading foot crossed the 8-metre mark. A cut-off of <1.0 m/s has been recommended for Asians by previous studies [39].
Knee extension strength (KES) was measured as the maximum isometric strength of the lower limb. It was measured with the participant seated, with the hip and knee angles at 90°, using the strap and strain gauge component of the Physiological Profile Assessment [40]; the average value of three trials on the dominant leg (in kilograms) was used. Cut-offs of 15 kg for males and 11 kg for females, based on the lowest quintile value stratified by sex, were used to define low KES [41].
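As a minimal sketch of the gait velocity calculation described above, the timed distance (6 metres, between the 2- and 8-metre marks) can be divided by the mean of the trial times and compared with the <1.0 m/s cut-off. Dividing the distance by the averaged time, rather than averaging per-trial velocities, is an assumption; the text only states that the time was averaged over two trials. The function names and example times are hypothetical.

```python
def gait_velocity(trial_times_s, distance_m=6.0):
    """Gait velocity in m/s: timed distance divided by the mean trial time.

    Assumption: the trial times are averaged first, then converted to velocity.
    """
    mean_time = sum(trial_times_s) / len(trial_times_s)
    return distance_m / mean_time


def is_slow_gait(velocity_m_s, cutoff_m_s=1.0):
    """True if velocity is below the <1.0 m/s cut-off cited for Asians."""
    return velocity_m_s < cutoff_m_s


# Hypothetical stopwatch times (seconds) for two trials.
velocity = gait_velocity([6.4, 6.8])
print(f"{velocity:.2f} m/s, slow gait: {is_slow_gait(velocity)}")
```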
The Performance Oriented Mobility Assessment (POMA) battery measures both static and dynamic balance, with separate subtests for balance and gait [42]. POMA is commonly used to predict falls and mortality in older adults [43,44]. A cut-off score of <25 indicates a medium to high fall risk.
The Geriatric Depression Scale (GDS) score (0-15) was used to identify the presence of depressive symptoms (GDS ≥5) [45], and the Mini-Mental State Examination (MMSE) was used to assess global cognition and identify cognitive impairment (MMSE <23) [46]. Pulmonary function was assessed with the forced expiratory volume in 1 second (FEV1). FEV1 below 70% of the value predicted by age, sex, ethnicity, and height using local population equations indicates airflow obstruction.

Frailty
Two widely accepted models were used to measure the frailty status of the participants:
i. Frailty Index (FI) [47]: a cumulative deficit model based on counts of dysfunction and impairment across multiple body systems. A total of 98 non-laboratory based evaluable health deficits were used to construct the index, expressed as a fractional value (number of observed deficits/number of evaluable deficits) from 0 (extremely robust) to 1 (extremely frail) (Supplementary Table 3). FI was analyzed as a continuous variable and as a binary variable using a cut-off of 0.15 or more to define frailty, based on calculations of stratum-specific likelihood ratios to determine the most appropriate cut-off to discriminate between frailty and non-frailty in predicting mortality in this cohort [44].
ii. Physical frailty: a physical phenotype model used in the Cardiovascular Health Study [48]. We used 5 operationally modified measures described in our previous study [41] for assessing shrinking, weakness, slowness, exhaustion and low activity. One point was assigned for the presence of each of the components, and the total summed score (from 0 to 5) was used to categorize participants as robust (0 points), prefrail (1-2 points) and frail (3-5 points).
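The two frailty scoring rules described above can be sketched as follows. The Frailty Index is the number of observed deficits divided by the number of evaluable deficits, with 0.15 or more defining frailty; the physical frailty score maps 0, 1-2 and 3-5 points to robust, prefrail and frail. The function names and the encoding of deficits as 1, 0 or None (not evaluable) are assumptions made for illustration, and the example deficit list is hypothetical rather than the study's 98-item list.

```python
def frailty_index(deficits):
    """Fraction of evaluable deficits that are present.

    deficits: iterable of 1 (present), 0 (absent) or None (not evaluable).
    The None encoding is an assumption of this sketch.
    """
    evaluable = [d for d in deficits if d is not None]
    if not evaluable:
        raise ValueError("no evaluable deficits")
    return sum(evaluable) / len(evaluable)


def is_frail_by_fi(fi, cutoff=0.15):
    """Binary frailty status using the FI cut-off of 0.15 or more."""
    return fi >= cutoff


def physical_frailty_category(score):
    """Map the 0-5 phenotype score to robust / prefrail / frail."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score == 0:
        return "robust"
    if score <= 2:
        return "prefrail"
    return "frail"


# Hypothetical participant: 10 items, 1 not evaluable, 2 deficits present.
fi = frailty_index([1, 0, 0, None, 1, 0, 0, 0, 0, 0])
print(round(fi, 3), is_frail_by_fi(fi), physical_frailty_category(2))
```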

Covariates
We collected baseline information such as age, sex and years of education. Participants' housing type (low-end 1-2 room public housing apartments, 3-room apartments, or higher-end housing with 4 rooms or others) was used as an indicator of socio-economic status based on the Singapore population census data [49]. Lifestyle factors included participation in 16 categories of physical, social and productive activities described in a previous publication, from which an aggregate score was derived based on the number of activities and frequency of participation (on a 5-point Likert scale), with a higher score representing a higher level of participation [50].

Mortality assessment
Participants' mortality status from baseline up to 31 Dec 2016 was determined using the participants' unique National Registration Identity Card number for computerized record linkage with the National Death Registry through the National Disease Registry Office of the Ministry of Health.
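As a rough illustration of how follow-up time accrues under this design, each participant contributes time from baseline to death or to the end of follow-up on 31 Dec 2016, whichever comes first. The sketch below also derives a crude mortality rate and the mean follow-up from the totals reported in the abstract (111 deaths, 16976.7 person-years, 2906 participants). The function name, the 365.25-day year and the example baseline date are assumptions, not details taken from the study.

```python
from datetime import date

CENSOR_DATE = date(2016, 12, 31)  # end of mortality follow-up stated in the text


def follow_up_years(baseline, death_date=None, censor_date=CENSOR_DATE):
    """Years from baseline to death or censoring, whichever comes first.

    Assumption: a year is taken as 365.25 days.
    """
    if death_date is not None and death_date <= censor_date:
        end = death_date
    else:
        end = censor_date
    return (end - baseline).days / 365.25


# Hypothetical participant recruited on 1 Jan 2010 and alive at the end of follow-up.
print(round(follow_up_years(date(2010, 1, 1)), 2))

# Totals as reported in the abstract.
deaths, person_years, n_participants = 111, 16976.7, 2906
rate_per_1000 = 1000 * deaths / person_years
mean_follow_up = person_years / n_participants
print(f"crude mortality rate: {rate_per_1000:.2f} per 1000 person-years")
print(f"mean follow-up: {mean_follow_up:.1f} years")
```

The derived mean follow-up of about 5.8 years matches the figure reported in the text.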

Statistical analysis
We used Cox proportional hazards models to evaluate the association of TUG, other physical and functional measures, and chronic disease and behavioural risk markers (multi-morbidity, heart disease, diabetes mellitus, hypertension, chronic kidney disease, smoking, BMI, central obesity, frailty index, physical frailty) with mortality in a crude model and two adjusted models. In Model 1, the mortality HR estimate associated with each predictor variable was adjusted for age and sex (but not for ethnicity, as no deaths were observed among the small numbers of non-Chinese participants). Model 2 adjusted for the covariates in Model 1 and additionally for education, housing status, living alone, smoking (but not alcohol, due to small sample size), physical activity, social activity, productive activity, heart disease, stroke, diabetes, hypertension, chronic kidney disease and multi-morbidity. Hazard ratios (HR) and 95% confidence intervals (95% CI) were estimated for each physical, functional and clinical predictor as a continuous variable and as a binary variable. The mortality HR varies with the cut-off chosen along the range of values of a given predictor, and with the measurement units of different predictors. Thus, for a valid comparison of the strengths of association with mortality between different predictors, we used a standardized approach, expressing the mortality HR per standard deviation (SD) increment.
The accuracy of the measures in predicting mortality was evaluated using receiver operating characteristic (ROC) curves, and the areas under the curves (AUCs) were compared using DeLong's method for significance testing [49]. An AUC between 0.7 and 0.8 is considered acceptable discrimination, between 0.8 and 0.9 is deemed excellent discrimination, and more than 0.9 is outstanding discrimination [51]. The discriminant accuracy of various optimal cut-off values was expressed as sensitivity, specificity, positive predictive value, and negative predictive value. Analysis of the data was performed using IBM SPSS version 25.
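The per-SD standardization can be illustrated with a small sketch. In a Cox model the log hazard ratio is linear in the predictor, so a hazard ratio per measurement unit can be rescaled to a hazard ratio per SD by raising it to the power of the SD, which is equivalent to fitting the model on the z-scored predictor. This is a generic property of the model and not a description of the authors' SPSS procedure; the function name and the numeric inputs are hypothetical.

```python
import math


def hr_per_sd(hr_per_unit, sd):
    """Rescale a hazard ratio per measurement unit to a hazard ratio per SD.

    With beta = ln(HR per unit), HR per SD = exp(beta * SD).
    """
    beta = math.log(hr_per_unit)
    return math.exp(beta * sd)


# Hypothetical inputs (NOT study estimates): two predictors on different scales.
print(round(hr_per_sd(1.10, 3.0), 3))   # e.g. HR 1.10 per second, SD of 3 s
print(round(hr_per_sd(0.95, 4.0), 3))   # e.g. HR 0.95 per point, SD of 4 points
```

After rescaling, both hazard ratios refer to the same kind of increment (one SD), which is what allows predictors measured in seconds, kilograms or test points to be compared directly.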
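To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen participant who died has a higher score than a randomly chosen survivor (ties counted as one half), and labels the result with the discrimination bands quoted above. It does not implement DeLong's test for comparing AUCs. The function names, the handling of the band boundaries, the label used for values below 0.7 and the toy data are assumptions; the toy data are not study data.

```python
def auc(scores, events):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as 0.5.
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def describe_auc(value):
    """Label an AUC using the bands quoted in the text.

    Boundary handling and the label for values below 0.7 are assumptions.
    """
    if value > 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    return "below the acceptable band"


# Hypothetical toy data, NOT study data.
toy_tug = [7.2, 8.1, 8.6, 9.4, 10.5, 12.0, 7.9, 9.1]
toy_died = [False, False, True, False, True, True, False, True]
print(auc(toy_tug, toy_died))

# AUCs reported in the text for TUG and gait velocity.
print(describe_auc(0.737), "/", describe_auc(0.666))
```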

CONCLUSIONS
Our study highlights the superior accuracy of TUG compared to other physical and functional measures in predicting long-term mortality among community-dwelling older adults. Taken together with evidence of the ability of the TUG to predict falls and other adverse health outcomes, the TUG appears to be uniquely positioned for use in early comprehensive geriatric assessment, and particularly in regard to shared clinical decision making requiring the prognostication of future life expectancy.

AUTHOR CONTRIBUTIONS
CYC and TPN reviewed the literature, designed the study, and drafted and revised the manuscript. TPN analyzed the data. PY, KBY, XYG and DQLC contributed to the study design and data collection. All authors reviewed the results and drafts, and approved the final manuscript.

Abbreviation: HR: hazard ratio; * p < 0.05; ** p < 0.01; *** p < 0.001. All physical and functional performance measures were included together in the same model. Binary cut-offs shown are commonly used in previous research and clinical applications.

Abbreviation: AUC: area under curve; * p < 0.05; ** p < 0.01; *** p < 0.001.