Biomarkers as precursors of disability
Economics and Human Biology

Some social surveys now collect physical measurements and markers derived from biological samples, in addition to self-reported health assessments. This information is expensive to collect; its value in medical epidemiology has been clearly established, but its potential contribution to social science research is less certain. We focused on disability, which results from biological processes but is defined in terms of its implications for social functioning and wellbeing. Using data from waves 2 and 3 of the UK Understanding Society panel survey as our baseline, we estimated predictive models for disability 2-4 years ahead, using a wide range of biomarkers in addition to self-assessed health (SAH) and other socio-economic covariates. We found a quantitatively and statistically significant predictive role for a large set of nurse-collected and blood-based biomarkers, over and above the strong predictive power of self-assessed health. We also applied a latent variable model accounting for the longitudinal nature of observed disability outcomes and measurement error in SAH and biomarkers. Although SAH performed well as a summary measure, it has shortcomings as a leading indicator of disability, since we found it to be biased in the sense of over- or under-sensitivity to certain biological pathways.
NatCen Social Research and Kantar Public. The research data are distributed by the UK Data Service. Participants gave informed consent for their blood to be taken for future scientific analysis. Biomarker collection was approved by the National Research Ethics Service (10/H0604/2). We are grateful to the Economic and Social Research Council for financial support for this research via project How can biomarkers and genetics improve our understanding of society and health? (award no. ES/M008592/1) and the MiSoC research centre (award no. ES/L009153/1). The funders, data creators and UK Data Service have no responsibility for the contents of this paper.
We are grateful to members of the project team for many helpful comments. Any remaining errors are our sole responsibility.


Introduction
An important recent development in research based on large-scale social surveys is the integration of physical health measurements and markers derived from biological samples, in addition to traditional self-reported health assessments. Biomarkers are objectively measured and evaluated as indicators of normal biological or pathogenic processes (Colburn et al., 2001), and they have potential advantages over self-assessments as early indicators of conditions that are below clinical diagnostic thresholds, or are pre-symptomatic and below individuals' threshold of perception (Colburn et al., 2001). Cardiovascular, metabolic, inflammatory, neuroendocrine and other biomarkers have been shown to be predictors of mortality and morbidity when used alone or alongside self-reported health assessments (Idler and Benyamini, 1997;Gruenewald et al., 2006;Ridker, 2007;Jylhä, 2009;Doiron et al., 2015). They have also been used to reveal the socioeconomic gradient in health risks (Seeman et al., 2004;Lee et al., 2015;Carrieri and Jones, 2017).
Despite their advantages, biomarkers impose significant additional costs of collection in the survey context and their potential contribution to economic and social research is not entirely clear. The wider social impacts of ill-health (on quality of life, personal and social functioning, and social costs of disease) depend critically on the duration and severity of disability prior to death, and there has been little research on the role of biomarkers in relation to disability. Disability is associated with loss of employment, early retirement and serious consequences for the families affected (Pudney et al., 2011;Jones, 2016;Christensen and Gupta, 2017) and typically implies long-lasting impairments that may prevent independent living and generate large social costs. This is particularly so in the UK where disability prevalence is well above the European Union average (Jones, 2016) and has been rising rapidly (from 11.9 to 13.3 million over 2013/14-2015/16; DWP, 2017). There is evidence of an increasing birth-cohort trend in functional difficulties for older individuals of low socioeconomic status (Morciano et al., 2015) and developed countries like the UK may face severe problems in supporting the projected future growth in the disabled population and providing public support to people with care needs (Commission on Funding of Care and Support, 2011). A crucial question for researchers and policymakers is whether the demand for care services will be curbed by gains in disability-free life expectancy alongside the projected continuing gains in longevity. An answer to this question requires a better understanding of the processes leading to disability, allowing the development of strategies and screening programmes to address disability more efficiently. The availability of biomarker information in population-representative data may contribute to that better understanding.
Despite the importance of disability trends for social policy planning, relatively little is known about the association between biomarkers and future disability. The World Health Organization (WHO) proposed a framework that portrays progression from diseases to functional disabilities (WHO, 1980), and Fried et al. (1991) hypothesized the existence of pre-clinical disability as an intermediate stage in which health impairments have an impact on general functioning. Few studies have explored the predictive role of biomarkers in relation to this disability process, and most are limited by being based on small samples or unrepresentative data (Brex et al., 2002;Reuben et al., 1999;Baylis et al., 2013;Seeman et al., 1994;Kallaur et al., 2017); or focused exclusively on older individuals (Reuben et al., 1999;Seeman et al., 1994;Baylis et al., 2013); or concerned with disability outcomes from a specific disease or condition (Brex et al., 2002;Kallaur et al., 2017). Another study by Pagan et al. (2016) tests the hypothesis that disability is a potential mediator in the link between obesity and job satisfaction.
We examined the predictive power of a wide range of biomarkers for future disability and specifically asked whether biomarkers offer incremental value in predicting disability outcomes beyond the contribution of the conventional self-assessed health (SAH) measure. SAH may be associated with disability outcomes in parallel with biomarkers by reflecting the impact on the individual of diagnosis of health conditions defined in terms of elevated biomarkers (Idler et al., 2004;Jylhä, 2009), or through bodily sensations that are sensitive to the biochemical processes measured by biomarkers (Jylhä, 2009). Besides confirming the value of biomarkers as leading indicators of disability, we also investigated the success of SAH as an overall summary of biomedical states relevant to disability and examined whether SAH is biased in the sense of over- or under-sensitivity to specific biological pathways.
The paper makes several new contributions to the research literature. To the best of our knowledge, this is the first study that provides a comprehensive analysis of this kind. Using baseline data from early waves of the UK Household Longitudinal Study (UKHLS, also known as Understanding Society), we estimate predictive models for disability two to four years ahead, exploiting a large set of nurse-collected and blood-based biomarkers. These biomarkers measure adiposity, grip strength, blood pressure, lung, kidney and liver functions, inflammation, steroid hormone levels, blood sugar and anaemia, giving an unusually broad picture of individuals' health states. The use of several alternative disability measures demonstrates the robustness of our results.
In addition to simple prediction models, we also develop a latent variable (LV) approach which is new to the literature. This LV model has a number of advantages. First, unlike simpler prediction models, it takes into account the longitudinal nature of our data on disability, allowing for correlation between disability variables two to four years after baseline. Second, it addresses measurement error bias by allowing for measurement noise in both SAH (Crossley and Kennedy, 2002) and the biomarkers (Zang et al., 2015). Measurement error in biomarkers normally causes attenuation of the estimated impact of the biological pathways on disability, and the LV model is expected to give more accurate estimates of these impacts. This may be of particular importance for developing policy strategies and interventions to reduce the personal and social burden of disability. Third, our LV approach is set up to identify any distinct dimension of health which influences future disability and is captured by the biomarkers but missed by SAH. The pattern of factor loadings tells us what underlying aspects of health tend to be under- or over-represented by the SAH measure and can therefore guide the interpretation of research findings related to SAH.

Data
The UKHLS is a large, nationally representative panel survey, running continuously from the initial wave in 2009-10, with each panel member interviewed annually. Its predecessor, the British Household Panel Survey (BHPS) was incorporated into the UKHLS from wave 2. A set of physical health measures and non-fasted blood samples were collected by nurses, five months on average after the wave 2 interview for UKHLS respondents and similarly at wave 3 for the BHPS sample (McFall et al., 2014). Respondents were eligible for nurse visits if, at the relevant wave, they were aged 16 or over, lived in England, Wales or Scotland and were not pregnant. Blood sample collections were further restricted to those who had no clotting or bleeding disorders and no history of fits. Participants gave informed written consent for their blood to be taken and stored for future scientific analysis. The UKHLS has been approved by the University of Essex Ethics Committee and the nurse data collection by the National Research Ethics Service (10/H0604/2).
We define wave 2 as the baseline for the main UKHLS panel and wave 3 as the baseline for the BHPS sub-panel and we refer to the timing of this baseline observation as t = 0; these baseline observations were spread through calendar years 2010-2013. We used baseline data on personal and household characteristics and bio-medical measures as predictors of disability observed subsequently at waves 4-6 for the UKHLS sample (t = 2, 3, 4) or waves 5-6 for the BHPS sample (t = 2, 3), where t denotes years since the baseline main interview. We did not use data from t = 1 since the time gap between collection of biomarkers and interview at t = 1 was less than 6 months for 75% of the t = 1 sample.
There were 15,632 and 5053 UKHLS and BHPS respondents who participated in the wave 2 or wave 3 nurse visits. For the UKHLS group, 13,404, 12,719 and 11,434 were followed up at waves 4, 5 and 6 respectively and had non-missing information on disability; 4513 and 4113 of the BHPS subsample were followed up at waves 5 and 6. We further conditioned the analysis on the absence of reported disability at baseline by excluding those reporting the relevant type of disability at baseline. Exclusion of these cases reduced the potential samples by 25%, 13% and 8% depending on the disability concept used. A detailed summary of available sample sizes is given in the Supplement Table S1.

Biomarkers
Measures of adiposity, grip strength, heart rate, blood pressure and lung function were collected during visits by trained nurses. We used the waist-to-height ratio (WHR) to measure adiposity. Grip strength was measured (in kg) using a hand dynamometer (McFall et al., 2014) and we took the highest reading from three repeated measurements for the dominant hand. Higher levels of grip strength are indicative of better physical functioning.
Three repeated measurements of resting heart rate (HR), systolic and diastolic blood pressure (SBP, DBP) were taken at intervals of one minute (McFall et al., 2014). We skipped the first reading, believed to impose upward biases, and computed HR, SBP and DBP as the average of the second and third readings. HR, which is sometimes regarded as a measure of fitness rather than health, was used as a continuous variable, and we also used a binary hypertension indicator recording SBP > 140, DBP > 90 and/or current use of anti-hypertensive medications (Johnston et al., 2009).
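The processing of the repeated readings described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the `on_medication` flag are hypothetical, and reading "SBP > 140, DBP > 90 and/or current use of anti-hypertensive medications" as a logical OR of the three criteria is an assumption.

```python
from statistics import mean


def average_later_readings(readings):
    """Drop the first of three repeated readings and average the other two."""
    if len(readings) != 3:
        raise ValueError("expected exactly three repeated readings")
    return mean(readings[1:])


def is_hypertensive(sbp_readings, dbp_readings, on_medication):
    """Binary hypertension indicator.

    Assumption: the indicator is 1 if ANY of the three criteria holds
    (raised SBP, raised DBP, or current anti-hypertensive medication).
    """
    sbp = average_later_readings(sbp_readings)
    dbp = average_later_readings(dbp_readings)
    return bool(sbp > 140 or dbp > 90 or on_medication)
```

A high first reading alone does not trigger the indicator here, because only the second and third readings enter the average.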
Lung function, assessed using spirometry equipment, was measured by the total amount of air forcibly blown out after a full inspiration (forced vital capacity; FVC), higher values indicating better lung function (Gray et al., 2013). Forced expiratory volume in one second (FEV1) is often used as an alternative to FVC. However, different equipment and measurement protocols were used in Scotland than in England and Wales, and comparison of matched samples showed FEV1 to be seriously affected by this, while FVC measures appeared comparable. Consequently we retained the Scottish sample and used FVC as our lung function measure.
We used blood-based biomarkers specific to inflammation, steroid hormone, cholesterol, blood sugar, kidney function, liver function and anaemia.
C-reactive protein (CRP) rises as part of the immune response to infection and is associated with general chronic or systemic inflammation. We excluded those with a CRP over 10 mg/L, because those values may reflect current transient infections rather than chronic processes (Pearson et al., 2003).
Dehydroepiandrosterone sulphate (DHEAS) is the most common steroid hormone in the body, considered as one of the primary mechanisms through which psychosocial stressors may affect individual health. Low levels of DHEAS are associated with cardiovascular (CVD) risks and all-cause mortality (Ohlsson et al., 2010).
High-density lipoprotein cholesterol (HDL) is known as "good" cholesterol, low levels being associated with increased CVD risks (Wannamethee et al., 2000).
Glycated haemoglobin (HbA1c) measures blood sugar, and is regarded as a validated diagnostic test for diabetes (WHO, 2011).
The estimated glomerular filtration rate (EGFR), calculated from the serum creatinine concentration, measures kidney function; higher EGFR levels indicate better kidney function (Levey et al., 2009).
We used albumin, the main protein made by the liver, as a liver function test; low albumin levels suggest impaired liver function (Howard and Sparks, 2016).
Haemoglobin (Hgb) is an iron-containing protein responsible for carrying oxygen throughout the body, and was used to proxy anaemia status; lower levels of Hgb are suggestive of anaemia (Balarajan et al., 2012).
In addition to specific markers, we also used two composite summary measures. One was an index of multi-system risk that measures the wear and tear on the body, approximating the allostatic load (Seeman et al., 2008). Our index combined the selected biomarkers for inflammation, blood pressure, HR, HbA1c, HDL cholesterol, albumin, DHEAS and the WHR (Seeman et al., 2008;Howard and Sparks, 2016). HDL, Albumin and DHEAS were converted to negative values to reflect ill health, and then each biomarker was transformed into z-scores and summed to calculate the overall index. The second index was a cumulative risk score for CVD, created by adding the relevant z-scores for WHR, blood pressure, HbA1c and CRP (Walsemann et al., 2016). A summary of all biomarkers by reported future disability state is given in the Supplement Tables S3 and S4.
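The construction of a summed z-score index, as described above, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the population standard deviation is used for standardisation (the text does not say which variant was used), and missing values are not handled.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of values to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; the paper does not specify
    return [(v - m) / s for v in values]


def composite_index(markers, protective=()):
    """Sum of z-scores across markers, one index value per individual.

    markers:    dict mapping marker name -> list of values (same person order)
    protective: names of markers for which HIGHER values mean BETTER health;
                these are sign-flipped first so that a higher index always
                reflects worse health.
    """
    n = len(next(iter(markers.values())))
    index = [0.0] * n
    for name, values in markers.items():
        signed = [-v for v in values] if name in protective else list(values)
        for i, z in enumerate(z_scores(signed)):
            index[i] += z
    return index
```

For example, `composite_index({"crp": [1, 2, 3], "hdl": [3, 2, 1]}, protective={"hdl"})` ranks the third person as the least healthy, since that person has both the highest CRP and the lowest HDL.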

Self-assessed health
SAH is considered a summary measure capturing the way that numerous aspects of health, both subjective and objective, are combined within the perceptual framework of the individual respondent (Jylhä, 2009). The SAH question asked respondents to rate their health on a five-point scale from "excellent" to "poor". It was collected in the self-completion instrument at baseline, approximately five months prior to biomarker collection. We group the lowest two SAH categories because of their small sample size, giving a four-point scale ranging from 1 = "excellent" to 4 = "fair" or "poor".

Disability measures
Our measures of disability were collected at UKHLS waves 4-6, so disability outcomes were observed for prediction horizons of t = 2, 3, 4 years for the main UKHLS sample or t = 2, 3 years for BHPS respondents. Respondents were asked about any longstanding physical or mental impairment that they might have and then the consequent functional difficulties, from a list of twelve provided (see Supplement Table S2 for the full list). We used the report of any functional difficulty as a dichotomous variable and the number of functional difficulties (coded as an ordinal variable: 0, 1, 2 or 3+) as an indicator of severity. Specific difficulty with mobility is also examined as a separate dichotomous indicator because of its relatively high prevalence and significance for functioning and independence in later life (Guralnik et al., 1993).
Our fourth disability measure came from the income module of the UKHLS questionnaire, constructed as a binary indicator of whether the respondent received income from private disability insurance or any of the UK disability benefit programmes. For all programmes, receipt requires a decision to apply for the benefit, the ability to craft a good-quality application and a positive assessment of need by the programme administrators. Consequently, while the benefit receipt indicator involves a rigorous external assessment of severity, that assessment is confounded to some degree by the incentive and capacity to apply for the benefit, which has a strong socioeconomic gradient (Hancock et al., 2016).

Covariates
The explanatory covariates that we used in our models have been found to be associated with disability (Hernández-Quevedo et al., 2008;Morciano et al., 2015), and also directly with biomarkers (Carrieri and Jones, 2017). The covariates were collected at baseline and are described and summarised in the Supplement Table S5. Gender and polynomials in age were used to capture demographic influences. Three indicators of socioeconomic status were included: educational attainment, home ownership and household income. We excluded disability benefits from income to avoid spurious correlation arising from the fact that disability creates eligibility for those benefits (see Morciano et al. (2015)). We also controlled for marital status, household composition, national and urban dummies. To assess the impact on future disability of lung function over and above smoking status, we estimated models with and without the inclusion of smoking variables.

Methods
Our analysis uses a sample of individuals who had no observed history of disability at baseline defined as the time of biomarker collection (t = 0). Conditioning the analysis in this way focuses attention on the transition from full physical functioning to disability, and it avoids the complications raised by the fact that, for people already disabled at baseline, we do not observe the evolution of their health and disability prior to joining the Understanding Society panel.
We apply two types of statistical models. First, in line with much of the health literature, we apply standard predictive models for the incidence of disability at a post-baseline horizon. Define $D_{i4}$ to be any one of three binary indicators of disability status (reported mobility difficulty; any reported functional difficulty; receipt of disability benefit) four years after baseline. For each indicator we applied a probit prediction model:

$$\Pr(D_{i4} = 1 \mid B_{i0}, S_{i0}, X_{i0}) = \Pr(\alpha_0 + \alpha_1 B_{i0} + S_{i0}\alpha_2 + X_{i0}\alpha_3 + u_{i4} > 0) = \Phi(\alpha_0 + \alpha_1 B_{i0} + S_{i0}\alpha_2 + X_{i0}\alpha_3) \qquad (1)$$

where $\Phi(\cdot)$ is the N(0, 1) distribution function; $B_{i0}$ is a specific biomarker or an index of biomarkers observed at the nurse visit at baseline t = 0; $S_{i0}$ is a set of dummy variables representing SAH in period 0; $X_{i0}$ is a set of covariates describing the individual and her/his history up to the nurse visit; $u_{it}$ is a random residual; and $\alpha_0, \ldots, \alpha_3$ are coefficients. Note that the use of a 4-year horizon excludes BHPS members who received the nurse visits at wave 3; however, they were included in the LV model presented later in this section.
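To make the mechanics of a probit prediction model such as (1) concrete, the sketch below evaluates a predicted probability and the mean change in that probability when the biomarker is raised by one standard deviation for everyone in a sample. It is a hedged illustration only: the coefficient layout in `coef`, the function names and all numerical values are hypothetical, and the model is assumed to take the standard form in which the disability probability is the N(0, 1) distribution function of a linear index in the biomarker, SAH dummies and covariates.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """N(0, 1) distribution function, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def probit_probability(biomarker, sah_dummies, covariates, coef):
    """Predicted probability of disability from a probit linear index.

    coef is a hypothetical dict with keys 'const', 'biomarker',
    'sah' (list) and 'covariates' (list).
    """
    index = coef["const"] + coef["biomarker"] * biomarker
    index += sum(a * s for a, s in zip(coef["sah"], sah_dummies))
    index += sum(a * x for a, x in zip(coef["covariates"], covariates))
    return std_normal_cdf(index)


def mean_impact_of_sd_increase(sample, coef, sd):
    """Average change in predicted probability when every individual's
    biomarker is raised by one standard deviation (sd).

    sample is a list of (biomarker, sah_dummies, covariates) tuples.
    """
    diffs = [
        probit_probability(b + sd, s, x, coef) - probit_probability(b, s, x, coef)
        for b, s, x in sample
    ]
    return sum(diffs) / len(diffs)
```

Dividing the result by the mean baseline predicted probability would express the impact as a percentage of prevalence, which is one way to read a "% impact on mean disability prevalence".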
In addition to the three binary indicators, we also extended model (1) to analyse the reported number of functional difficulties as a 4-level ordered probit model. Average marginal impacts of SAH and biomarkers on the probabilities of reporting 2+ or 3+ difficulties were then constructed from the estimated ordered probit.
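The threshold probabilities used with the ordered probit can be illustrated as follows. This is a sketch under assumptions, not the estimated model: the cutpoints and linear index are hypothetical inputs, and the standard ordered probit form is assumed, in which the probability of reporting at least k difficulties is one minus the N(0, 1) distribution function evaluated at the k-th cutpoint minus the linear index.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """N(0, 1) distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def prob_at_least(k, linear_index, cutpoints):
    """Pr(number of difficulties >= k) for k = 1, 2, 3 (category 3 means 3+).

    cutpoints is an increasing list of three hypothetical thresholds.
    """
    return 1.0 - std_normal_cdf(cutpoints[k - 1] - linear_index)


def category_probabilities(linear_index, cutpoints):
    """Probabilities of the four ordered categories 0, 1, 2 and 3+."""
    cdf = [std_normal_cdf(c - linear_index) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]
```

An average marginal impact at the 2+ or 3+ threshold is then the sample mean of the change in `prob_at_least(2, ...)` or `prob_at_least(3, ...)` when the linear index is shifted by the coefficient times the change in the predictor.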
We did not use all 12 specific biomarkers simultaneously as predictors in model (1), for three reasons. First, there was a significant loss of usable data when all markers are required to be observed. Second, the full set of health measures (SAH and biomarkers) displayed a substantial degree of collinearity, so there would have been a further loss of statistical precision. A third policy-relevant reason for considering each biomarker separately is that, in practice, it is unlikely that any screening programme would simultaneously check blood pressure, adiposity, blood sugar, cholesterol, haemoglobin, hormone levels, liver, kidney and lung function; so there is a practical interest in the predictive power of each specific marker on its own. However, we also separately used the two composite indexes for allostatic load and CVD risk to consider the potential performance of more comprehensive tests.
Although models like (1) are fairly standard in the literature, they are vulnerable to measurement error bias. Biomarkers and SAH are best seen as noisy indicators of the relevant health concepts rather than direct observations of those concepts. SAH may be subject to transient random variations in mood and perception, while biomarkers are affected by random variations in blood samples and measurement processes. Measurement noise may bias estimates of predictive models like (1), usually causing attenuation of the estimated impact of biomarkers and SAH. Our alternative latent variable approach offers a way of dealing with measurement error, by exploiting the multiplicity of biomarkers that reflect to varying degrees the underlying health state. Our aim here is to develop a form of the LV model that gives a clear indication of the predictive value of the biomarker information beyond that contained in the SAH measure.
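For intuition about the attenuation mentioned above, the textbook case of a linear regression with a single mismeasured regressor is shown below. It is an illustration under stated assumptions rather than a result for the probit or LV models used here: the symbols $B^{*}$, $e$, $\beta$, $\varepsilon$ and the variances are introduced only for this example, and the error $e$ is assumed to be classical (mean zero and independent of both $B^{*}$ and $\varepsilon$).

```latex
B = B^{*} + e, \qquad y = \beta B^{*} + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\mathrm{OLS}}
  = \beta \,\frac{\sigma^{2}_{B^{*}}}{\sigma^{2}_{B^{*}} + \sigma^{2}_{e}}
```

Because the ratio is below one whenever $\sigma^{2}_{e} > 0$, the estimated coefficient on the noisy measure $B$ is shrunk towards zero, and the shrinkage grows with the noise variance.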
We used a two-factor structural LV model in which a latent variable $h_{i0}$ reflects a dimension of general health at baseline, measured to varying degrees by SAH and the set of twelve biomarkers (not the indexes for allostatic load and CVD risk). To capture the incremental contribution of the biomarkers, we specified a second latent health variable, $b_{i0}$, representing any dimension of baseline health that the biomarkers succeed in measuring, but which is not captured by SAH. The outcome variables, $D_{i2}$, $D_{i3}$, $D_{i4}$, represent disability 2, 3 and 4 years after baseline; the correlation between them is captured by an unobserved random effect, $u_i$, which may have different impacts in different periods. The resulting model has the structure shown in Fig. 1 and set out algebraically in the Supplement.
We used the estimated LV model to construct two predictive probabilities, $p_1$ and $p_2$, based on different predictor sets, one comprising the baseline covariates and SAH, the other expanded to include also the twelve biomarkers. Omitting time subscripts, the predictive probabilities are:

$$p_1 = E_{h,b,u}\left[\Pr(D = 1 \mid h, b, u, X) \mid X, S\right], \qquad p_2 = E_{h,b,u}\left[\Pr(D = 1 \mid h, b, u, X) \mid X, S, B\right]$$

where the expectations with respect to the latent variables $h$, $b$, $u$ were approximated using Monte Carlo simulations with 100,000 replications.
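The idea of approximating an expectation over latent variables by Monte Carlo simulation can be sketched as below. This is a deliberately simplified illustration and not the paper's procedure: the latent draws are taken as independent N(0, 1) variables rather than from their distribution conditional on observed SAH and biomarkers, and `fixed_part` and `loadings` are hypothetical. In this simplified setting a closed form exists, which makes the simulation easy to check.

```python
import random
from math import erf, sqrt


def std_normal_cdf(x):
    """N(0, 1) distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def mc_predictive_probability(fixed_part, loadings, replications=100_000, seed=0):
    """Monte Carlo estimate of E[Phi(fixed_part + sum_k loading_k * z_k)],
    where the z_k are independent N(0, 1) draws standing in for latent
    variables (assumption: unconditional draws, unlike the paper's model).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        latent = sum(l * rng.gauss(0.0, 1.0) for l in loadings)
        total += std_normal_cdf(fixed_part + latent)
    return total / replications


def exact_probability(fixed_part, loadings):
    """Closed form for the same expectation: Phi(a / sqrt(1 + sum of l^2))."""
    return std_normal_cdf(fixed_part / sqrt(1.0 + sum(l * l for l in loadings)))
```

Comparing `mc_predictive_probability(-1.0, (0.8, 0.5, 1.0))` with `exact_probability(-1.0, (0.8, 0.5, 1.0))` gives a feel for the simulation error at a given number of replications.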

Simple predictive models
The results for model (1) at horizon t = 4 are presented in Table 1, which shows the percentage impact on the predicted number of people classified as disabled, of a 1-standard deviation increase in the relevant biomarker. Columns three (one or more functional difficulties), six (mobility) and seven (benefits) of the table were derived from binary probit models; each cell in those columns was based on a separate model for the relevant combination of biomarker and disability measure. The cells in each row of columns four and five came from the same ordered probit model for the number of reported disabilities, with the impact of each biomarker evaluated respectively at the 2+ and 3+ thresholds.
First note that, when SAH was excluded from the prediction models, almost all biomarkers had substantial and statistically significant (at least at the 5% level) predictive power.

Table 1. Four-year-ahead prediction models: % impact on mean disability prevalence of a standard deviation increase.
Column headings: Marker; SAH; No. of functional difficulties reported (1 or more; 2 or more †; 3 or more †); Mobility difficulty; Benefit receipt.
To test whether the predictive role of each biomarker and/or SAH varies by demographic characteristics and socioeconomic status (household income, education and house tenure), we tested the relevant interaction terms across our different model specifications. The tests found no systematic differences in the predictive role of biomarkers and SAH on future disability by age, gender or socioeconomic status. For example, P-values for the interactions of allostatic load with gender range between 0.471 and 0.716 across the 4-year prediction models of the different disability outcomes; for interactions with age, P-values ranged between 0.211 and 0.898. We also found no systematic interactions of our socioeconomic status variables with allostatic load (P-values between 0.170 and 0.905).
Both composite biomarker measures gave strong effects, with allostatic load having the strongest impact. The results for the benefit receipt measure of disability were an interesting exception to this: when SAH was included in the model, the magnitude of the biomarker effect halved and retained significance only at the 10% level. There may be two behavioural factors involved in that result. One is justification bias: receipt of benefit may lead some respondents to report a worse state of health in SAH to justify their receipt of disability benefit. Alternatively, some people may be reluctant to accept or admit that their health is poor, leading them both to under-represent their current health difficulties in SAH and avoid claiming their potential entitlement to disability benefit. Both of these behaviours would be likely to strengthen the empirical SAH effect relative to the estimated effect of allostatic load or CVD risk indexes (which are more highly correlated with SAH than individual biomarkers, and therefore more affected by bias in SAH).
Figs. 2 and 3 compare the magnitude of the SAH and biomarker impacts calculated from models where both SAH and the relevant biomarker were used as predictors (together with the covariates X). The SAH impact was calculated as the mean predicted impact of switching the individual from the best ("excellent") to worst ("poor/very poor") category of SAH; with the exception of the binary hypertension marker, the biomarker effect was calculated as the mean impact of switching from a value approximately 1.5 standard deviations better to 1.5 standard deviations worse than the mean (between the 5th and 95th percentiles). Impacts are calculated using the covariates X for each sampled individual and then averaged.
Figs. 2 and 3 (see also Table S9 of the Supplement) show that the biomarkers with statistically significant impacts shown in Table 1 made predictive contributions of about 20-25% of the SAH effect in most cases. For disability measures based on the number of disabilities reported and mobility difficulties, allostatic load made the largest contribution to prediction, both absolutely and as a proportion of the SAH impact: the impact of a three standard deviation change in allostatic load was approximately one third that of the hypothetical SAH shift. For the mobility-based disability criterion, allostatic load and WHR had the largest impacts in absolute terms (roughly 40% of the SAH effect). In accordance with the results in Table 1, allostatic load contributed less to the prediction of benefit receipt, while the markers for grip strength, adiposity and lung function all gave substantially greater impacts (around one third of the SAH effect).
Despite some differences between disability definitions in the pattern of results, the overall conclusion seems robust: biomarkers made a contribution to prediction of disability four years ahead that is significant both statistically and in terms of absolute magnitude. But that contribution is moderate in comparison with the information contained in SAH.
There is a risk that the act of observation could change the behaviour being observed: formal or informal feedback about the respondents' biomarker levels may prompt additional GP consultations and treatments for previously undiagnosed health conditions, or behavioural changes (Zhao et al., 2013). In the UKHLS, survey nurses were instructed not to discuss or interpret respondents' results in general or in relation to other people in the survey, and blood test results were not available to survey participants. However, participants received a Measurement Record Card with their blood pressures, height, weight, waist circumference, percent body fat, and grip strength. The survey protocol (National Centre for Social Research, 2010;McFall et al., 2014) also specified tailored blood pressure feedback: respondents were informed and advised to visit a GP within 2 months, 2 weeks or 5 days if their blood pressure was mildly raised (140 ≤ systolic blood pressure < 160 or 90 ≤ diastolic blood pressure < 100), moderately raised (160-180 or 100-115) or considerably raised (over 180 or over 115), respectively. Those with normal blood pressure measurements received reassuring feedback.
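The tailored feedback rule described above can be written as a small classification function. This is a sketch of one reading of the protocol, not the survey's own code: the function name is hypothetical, the more severe of the systolic and diastolic categories is assumed to determine the feedback, and the boundary values 180 and 115 are assumed to fall in the "moderately raised" band.

```python
def blood_pressure_feedback(sbp, dbp):
    """Return (category, advised time to visit a GP) for one respondent.

    Assumptions: either reading alone can trigger a category, the most
    severe applicable category wins, and lower band limits are inclusive.
    """
    if sbp > 180 or dbp > 115:
        return "considerably raised", "5 days"
    if sbp >= 160 or dbp >= 100:
        return "moderately raised", "2 weeks"
    if sbp >= 140 or dbp >= 90:
        return "mildly raised", "2 months"
    return "normal", None
```

A categorical variable built this way is the kind of feedback indicator that can be added to a prediction model or interacted with a biomarker in a sensitivity analysis.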
To explore the robustness of our results to the possibility of feedback effects, we carried out a number of sensitivity analyses. First, we added the categorical blood pressure feedback variable to all our predictive models, and found that the results remained practically identical to those presented in Table 1 and Figs. 2 and 3. We also interacted the categorical blood pressure feedback variable with our biomarkers, finding the interaction terms statistically insignificant (at the 10% level) across all models. For example, the P-values of interaction terms between our composite biomarker measure (allostatic load) and the categorical blood pressure feedback variable range from 0.678 to 0.904 across the different disability outcome models. Thus the predictive role of biomarkers does not systematically differ between those who were informed of an elevated blood pressure and those with normal blood pressure levels. We also re-estimated the prediction models after excluding respondents who received feedback of mildly, moderately or considerably raised blood pressure, showing only minor differences to our base case results (compare Tables S9 and S10 of the Supplement). Overall, this evidence alleviates concerns about potential distortions from survey feedback effects, although it is disappointing from the policy perspective that there is so little evidence of feedback generating health improvements. This bleak result is consistent with recent evidence suggesting that screening programmes providing health information are often relatively ineffective as a means of disease control (Chang et al., 2018;Kim et al., 2019).

Do SAH and biomarkers measure the same thing? Latent variable results
Our finding that SAH is a strong predictor of future disability parallels similar evidence on mortality risks (Glei et al., 2014;Idler and Benyamini, 1997;Jylhä, 2009), but it raises the issue of how SAH relates to more objective biological aspects of health. SAH is a subjective response that may reflect bodily sensations produced by biological disease processes, but also potentially many other things such as mood, self-image, past contacts with the healthcare system, and health shocks to friends and relatives (Idler and Benyamini, 1997;Jylhä, 2009). If used for policy purposes, SAH might also be reported subconsciously or strategically to achieve or justify a particular outcome. Even in its relation to biological processes, SAH may be biased in particular directions since not all disease processes are equally apparent to the sufferer.
No single biomarker or composite biomarker index can plausibly act as a direct comparator in a conventional measurement error validation study (Bound et al., 2001). A more productive approach is to use biomarker information alongside SAH to indicate the biological pathways to which SAH is insufficiently or excessively sensitive.¹ We use the LV approach outlined in Section 3 to integrate SAH and biomarkers within an appropriately broad measurement setting.
¹ Note that sensitivity in this context is judged in relation to future disability and may differ from the pattern of sensitivity that emerges relative to other future outcomes.
Fig. 3. Estimated mean partial impacts of deterioration in SAH (change from "excellent" to "poor/very poor") and biomarkers (3-standard deviation change centred on mean) on proportions reporting multiple functional difficulties 4 years later (Ordered probit model; percentage values are ratios of the biomarker effect to the SAH effect).
A. Davillas, S. Pudney / Economics and Human Biology xxx (2019) 100814
In practice biomarkers are also potentially error-prone measures, although not vulnerable to the behavioural reporting biases that may affect SAH. Random measurement errors in biomarkers used as predictors cause bias for the purposes of estimating causal links between health and future disability. But that bias is not a problem if we are concerned with prediction in the context of policy applications such as screening or monitoring programmes, since the biomarkers available to programme administrators are subject to the same measurement error process: the (causally biased) prediction model still gives the best prediction of future disability conditional on biomarker information as observed by the administrator. However, to understand the relationship between SAH and biological disease pathways we need estimates more closely related to the true causal processes. An advantage of the LV model is that it allows for measurement noise in both SAH and biomarkers and so avoids measurement error bias stemming from classical random measurement error.
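The distinction between causal bias and predictive validity can be illustrated with the textbook classical errors-in-variables case. This is a simplified linear sketch, not the probit specification used in the paper, and all symbols are assumptions introduced here: $B_i^*$ is the true biomarker level with mean $\mu$ and variance $\sigma_*^2$, $B_i = B_i^* + u_i$ is the observed value with independent zero-mean noise $u_i$ of variance $\sigma_u^2$, and $D_i = \alpha + \beta B_i^* + \varepsilon_i$ is the outcome.

$$
\lambda = \frac{\sigma_*^2}{\sigma_*^2 + \sigma_u^2}, \qquad
\frac{\operatorname{Cov}(D_i, B_i)}{\operatorname{Var}(B_i)} = \lambda\beta, \qquad
\hat{D}_i = \alpha + \beta(1-\lambda)\mu + \lambda\beta\, B_i .
$$

Because $0 < \lambda < 1$ whenever $\sigma_u^2 > 0$, the slope $\lambda\beta$ understates the causal effect $\beta$. Even so, $\hat{D}_i$ remains the best linear predictor of $D_i$ given the noisy $B_i$, which is the quantity an administrator who observes $B_i$ rather than $B_i^*$ actually needs.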
Tables 2 and 3 summarise results from the LV model. Table 2 gives the estimated factor loadings relating observable biomarkers to the two latent dimensions of health. Except for the binary hypertension measure, the loadings were normalised to give the impacts of latent health on observed indicators in standard deviation units. The loadings of the primary latent health factor $h_{i0}$ are mostly as expected, with disability risk raised significantly by WHR, hypertension, CRP and HbA1c; and lowered by grip strength, lung capacity, HDL cholesterol, DHEAS, EGFR and albumin.
The loadings on the second latent factor b must be interpreted carefully in relation to the SAH loadings. For any given marker, if the loading on h has the correct sign and the loading on b has the same sign, then this implies that the health concept implicitly captured by SAH understates the role of the biological pathway which the marker measures. Under this interpretation, our finding is that SAH strongly understates the importance of grip strength, lung function, DHEAS and liver function and weakly understates the effect of CRP.
If the loading on h has the correct sign and the loading on b has the opposite sign, then SAH can be interpreted as over-emphasising the pathway measured by the marker. This was the case for WHR, hypertension and HDL cholesterol and, more weakly, for HbA1c and EGFR.
In two cases, HR and Hgb, the loading on h was small, with an a priori wrong sign, which was strongly reversed in the loading for b. This finding can be interpreted to mean that, in these dimensions, SAH is a potentially misleading indicator of future disability.
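The interpretive rule for the two sets of loadings can be sketched as a small function. This is only an illustration of the sign logic described above: the function name, the `expected_sign` convention (+1 if the marker is expected a priori to raise disability risk, -1 if it is expected to lower it), the handling of zero loadings, and the numbers in the usage lines are all hypothetical and are not taken from Table 2.

```python
def classify_sah_sensitivity(loading_h: float, loading_b: float,
                             expected_sign: int) -> str:
    """Classify how SAH treats the pathway measured by one biomarker.

    loading_h     -- loading on the primary latent health factor h
    loading_b     -- loading on the second latent factor b
    expected_sign -- +1 or -1: a priori sign of the loading on h (assumed input)
    """
    if expected_sign not in (1, -1):
        raise ValueError("expected_sign must be +1 or -1")

    h_correct = loading_h * expected_sign > 0
    same_sign = loading_h * loading_b > 0
    opposite_sign = loading_h * loading_b < 0

    if h_correct and same_sign:
        # b reinforces h: SAH gives this pathway too little weight
        return "understated"
    if h_correct and opposite_sign:
        # b offsets h: SAH gives this pathway too much weight
        return "overstated"
    if not h_correct and opposite_sign:
        # wrong-signed h loading that is reversed in b (the HR and Hgb pattern)
        return "potentially misleading"
    # zero loadings or patterns not discussed in the text (assumed fallback)
    return "unclassified"


# Hypothetical loadings, for illustration only
print(classify_sah_sensitivity(-0.40, -0.20, expected_sign=-1))
print(classify_sah_sensitivity(0.30, -0.10, expected_sign=1))
print(classify_sah_sensitivity(-0.02, 0.25, expected_sign=1))
```

The three usage lines correspond, in order, to the "understated", "overstated" and "potentially misleading" patterns; significance of the loadings, which the text also weighs ("strongly" versus "weakly"), is deliberately left out of this sketch.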
Our finding from the simple predictive model (1) was that biomarkers provide significant and substantial predictive power which is nevertheless moderate in relation to SAH. The same result is also evident in results from the LV model. Table 3 gives summary statistics for the predictive probabilities of future disability conditional only on SAH and covariates ($p_1(X, S)$), and conditional on SAH, covariates and twelve biomarkers ($p_2(X, S, B_1, \ldots, B_{12})$). These predictive probabilities were calculated for all disability definitions and prediction horizons.
For three of the four disability measures, the mean predicted probability of disability rose as we varied the prediction horizon from t = 2 to t = 4, by approximately 20% (1 or more functional difficulties), 28% (mobility problem) or 62% (benefit receipt). This rise in disability risk over time is a natural reflection of the cumulative character of disability prevalence. However, there was no such rise in prevalence measured as the proportion of individuals reporting two or more functional difficulties. For all four disability measures, the predictive probabilities also became substantially more variable (their standard deviations increased) as the prediction horizon lengthened. For (almost) all of the predictions, there was a rise in the variability of the predictor when we expanded the baseline predictor set by adding the biomarkers. This reflects the fact that adding biomarkers gives a more detailed and diverse picture of each individual's disability risk. The correlations between $p_1$ and $p_2$ were generally high, reflecting the good (but not perfect) performance of SAH as a general health proxy.

Discussion and conclusions
We investigated the predictive power of objective nurse-collected biomeasures and blood-based biomarkers for functional disability, following individuals who reported no disability at baseline for up to four years after collection of the health measures. We used a wide range of biomarkers, and alternative measures of disability, covering the existence and number of functional disabilities, mobility difficulties and receipt of disability benefits. For almost all of the biomarkers, we found 4-year-ahead predictive effects that were substantial in magnitude and statistically significant. When SAH was introduced as a predictor alongside the biomarkers, the magnitude of the biomarker effects fell, in most cases by 20-40%, but remained important in magnitude and highly statistically significant for most of the biomarkers examined. Although there were some differences across biomarkers and disability measures, we found that measures of adiposity, grip strength, heart rate, lung functioning, cholesterol levels, inflammation, blood sugar and anaemia had strong predictive power for future disability risk, over and above SAH.
How should we interpret these predictive results? Causality and predictability are not the same thing, and there is a possibility that the association between health measures and future disability outcomes is partly the result of unobserved factors. In our view, causality can never be established unambiguously in observational data but, compared to cross-section analysis, the 4-year gap between our initial health measure and disability outcomes makes it much more likely that the predictive effects are causal rather than proxy in nature. This separation in time removes the possibility of reverse causation and weakens any ability of baseline biomarker levels to proxy omitted heterogeneity. Moreover, the most likely non-causal channels of association appear not to be important here: we found no evidence of an effect of health information fed back to respondents, nor of smoking behaviour on disability outcomes. It is also important not to over-emphasise the importance of causality. Predictability is extremely important for some major policy purposes: for example, a successful screening programme needs a good predictor to identify priority population groups, and that does not necessarily require a complete structural understanding of all the biochemical, behavioural and environmental processes involved in determining the target outcomes.
In addition to simple predictive models, we also developed a new latent variable approach capable of incorporating large numbers of biomarkers and longitudinal observation of disability outcomes, while allowing for measurement error in SAH and biomarkers. This approach allowed us to identify distortions in SAH as a measure of health, by detecting an additional predictive factor that is revealed by the biomarkers but not by SAH. The corresponding factor loadings indicate dimensions of biological function that are given too much or too little weight by SAH in predicting disability. We found that SAH is excessively sensitive to the biological pathways reflected in adiposity, hypertension and cholesterol, and insufficiently sensitive to strength, lung function, hormonal balance and liver function.
Nevertheless, SAH emerged as a good general health proxy in the sense that, when SAH and biomarkers were both used as predictors, the estimated biomarker impacts on future disability, although substantial absolutely, were moderate in comparison with the effects of SAH. For example, using a composite summary measure to proxy allostatic load (our strongest biomarker predictor), we found that moving from the best (excellent) to worst (fair/poor) category of SAH increased the risk of disability 4 years later by 5-18 percentage points on average, depending on the disability concept used, while an increase in allostatic load from the 5th to the 95th percentile (roughly a 3-standard deviation rise) increased disability risk by 2-7 points.

Limitations
Key strengths of our analysis come from its use of UKHLS data, which allowed us to use a large, nationally representative sample covering all adult ages. The bio-social character of UKHLS provided a wide range of nurse-collected and blood-based biomarkers, in addition to SAH, disability indicators and extensive measures of household characteristics and socio-economic status. This adds breadth and depth to the small body of evidence that already exists on biomarkers as predictors of future disability. Existing studies are more limited in terms of the range of biomarkers used and also the study population, which is mainly restricted to older people, non-representative samples or specific patient groups (Brex et al., 2002; Reuben et al., 1999; Baylis et al., 2013; Seeman et al., 1994). As far as we are aware, ours is the first study that makes an explicit evaluation of subjective SAH against objective biomarker information in relation to disability.
Despite these advantages, there are limitations. First, the available data follow individuals for a relatively short time horizon. We have found evidence of a rise in the estimated effect of biomarkers as the length of the prediction horizon increases, suggesting that our results may understate the full long-term predictive role of biomarkers. Second, functional disability is a slippery, hard-to-measure concept and the measures used in our analysis are necessarily limited. Our use of a range of alternative disability measures alleviates these concerns to some degree, but a complete solution requires further research. Finally, although we used an unusually extensive set of biomarkers, the multidimensional nature of the biomedical processes underlying disability means that there may remain significant aspects of physical health that are not covered by our analysis.