Towards a consensus definition of allostatic load: a multi-cohort, multi-system, multi-biomarker individual participant data (IPD) meta-analysis

Background: Allostatic load (AL) is a multi-system composite index for quantifying physiological dysregulation caused by life course stressors. For over 30 years, an extensive body of research has drawn on the AL framework but has been hampered by the lack of a consistent definition. Methods: This study analyses data for 67,126 individuals aged 40-111 years participating in 13 different cohort studies and 40 biomarkers across 12 physiological systems: hypothalamic-pituitary-adrenal (HPA) axis, sympathetic-adrenal-medullary (SAM) axis, parasympathetic nervous system functioning, oxidative stress, immunological/inflammatory, cardiovascular, respiratory, lipidemia, anthropometric, glucose metabolism, kidney, and liver. We use individual-participant-data meta-analysis and exploit natural heterogeneity in the number and type of biomarkers that have been used across studies, but a common set of health outcomes (grip strength, walking speed, and self-rated health), to determine the optimal configuration of parameters to define the concept. Results: There was at least one biomarker within 9/12 physiological systems that was reliably and consistently associated in the hypothesised direction with the three health outcomes in the meta-analysis of these cohorts: dehydroepiandrosterone sulfate (DHEAS), low-frequency heart rate variability (LF-HRV), C-reactive protein (CRP), resting heart rate (RHR), peak expiratory flow (PEF), high density lipoprotein cholesterol (HDL-C), waist-to-height ratio (WtHR), HbA1c, and cystatin C.


Introduction
It has been proposed that lifetime exposure to psychosocial stressors increases the risk of diseases in later life by disrupting the physiological regulatory systems that are involved with initiating, maintaining, and inhibiting the stress response leading to greater 'wear and tear' on the body, and increased vulnerability to disease (Seeman et al., 2001). McEwen and Stellar (1993) invoked the concept of allostatic load (AL) to describe the physiological consequences when adaptive changes made by an organism in response to the experience of a stressor become maladaptive (i.e. dysregulated).
The neurobiological response to stress induces activation in two neuro-endocrine pathways, the sympathetic nervous system (SNS) and the hypothalamic-pituitary-adrenal (HPA) axis, that are central to the stress response. The SNS responds to the recognition of a stressor by releasing catecholamines (e.g. epinephrine, norepinephrine) that facilitate the rapid increases in heart rate, blood pressure, and respiration that enable the 'fight or flight' response, while the HPA system is responsible for the slower onset stress response which involves secretion of stress hormones (e.g. cortisol) to ensure the muscles have a steady supply of energy (McEwen, 2008). These 'primary mediators' of the stress response, which also include pro- and anti-inflammatory cytokines, play a positive role in adaptation by allowing the body to respond to short term demands that exceed resources, but chronic activation can cause damage to secondary regulatory systems (e.g. cardiovascular, metabolic, immune functioning) that are hypothesised to contribute, in turn, to early disease progression, morbidity, and mortality (i.e. tertiary outcomes) (Juster et al., 2016).
Researchers have attempted to capture variation in the physiological dysregulation resulting from such life-course stresses using an AL score, which is a multi-system composite index, usually involving neuroendocrine, immunological, cardiovascular, and metabolic components (Seeman et al., 1997; Szanton et al., 2005). Initial efforts to operationalize AL simply summed the number of parameters for which each participant scored in the highest quartile of clinical risk based on each biomarker's distribution in the population studied. AL was originally measured using 10 biological parameters: epinephrine, norepinephrine, urinary cortisol, dehydroepiandrosterone sulfate, resting systolic and diastolic BP, waist-hip ratio, total and HDL cholesterol, and glycosylated haemoglobin. However, Seeman et al. (2010) acknowledged that: "this original set of 10 parameters was not meant to be comprehensive nor was it offered as a fixed/standard measure of AL" (p.228) as it was based on secondary data analyses from the MacArthur Successful Aging Study and was limited to available biological data. No gold standard measure of AL exists, and researchers have tended to use the list of biomarkers that are available to them to define AL. While the original authors did not wish to be prescriptive, the lack of a fixed definition has contributed to a type of atheoretical drift in the way AL has come to be defined in the literature. Juster et al. (2010) reported that 51 different biomarkers had been used to define AL across 58 studies included in their systematic review. Of these, metabolic system biomarkers tended to be the most common (34%), followed by neuroendocrine (25%), cardiovascular (20%), anthropometric (11%), and immune system biomarkers (10%). Johnson et al. (2017) reported that 59 biomarkers were used in the 26 studies they reviewed which examined the relationship of socio-economic position with AL. The number of biomarkers employed ranged between 6 and 25 with a mode of 9.
All studies included cardiovascular and metabolic system biomarkers, 82% included immune system biomarkers, 58% included neuroendocrine system biomarkers, and substantially fewer studies included parasympathetic, respiratory, or biomarkers of kidney or liver functioning. Their analysis of study pairs revealed huge variation in the number of biomarkers shared between studies. Even within studies, there is considerable heterogeneity in how AL is defined. As a case in point, a recent paper noted that 21 analyses using the National Health and Nutrition Examination Survey (NHANES) calculated AL in 18 different ways using 26 different biomarkers (Duong et al., 2017). Surveying the literature, this tends to be the rule rather than the exception. Four recent papers using the UK Household Longitudinal Study (UKHLS, Understanding Society) dataset defined AL using a different set of biomarkers (Chandola et al., 2019; Karimi et al., 2019; Präg and Richards, 2019; Prior et al., 2018), as have papers using the English Longitudinal Study of Ageing (ELSA) (Coronado et al., 2018; Read and Grundy, 2014; Tampubolon and Maharani, 2018), and The Irish Longitudinal Study on Ageing (TILDA) (McCrory et al., 2019; McLoughlin et al., 2020).
Although several critiques of the AL literature have been penned over the years (Beckie, 2012; Dowd et al., 2009; Juster et al., 2016), the lack of consensus regarding which biomarkers should be used to define the concept remains, arguably, its most intractable problem. The lack of fidelity in the measurement of AL has two main consequences: (i) it renders comparisons across studies difficult, and (ii) it limits its potential utility as a screening tool for identifying pre-clinical health states (Rosemberg et al., 2020), one of its mooted applications. In essence, this represents a classic variable selection problem, leaving us with the question of how we can overcome the impasse. In this paper, we use individual-participant-data (IPD) meta-analysis and exploit natural heterogeneity in the number and type of biomarkers that have been used across 13 different cohort studies, but a common set of health outcomes, to determine the optimal configuration of biomarkers that can be used to define AL. Analyses presented here are designed to address a series of inter-related questions about the measurement of AL: (a) which biomarker works best within each physiological system to predict health? (b) are some physiological systems more important than others? (c) can we develop a subset of biomarkers that performs as well as or better than longer, more elaborate batteries? (d) can we move towards a consensus definition of AL? A unique strength of these analyses is our ability to assess the consistency of the answers to these questions across data from 13 cohort studies.

Study populations
Unlike many systematic reviews and meta-analyses which may consider the effect of one exposure / treatment on an outcome, AL indices comprise multiple biomarkers, so the effort required to pool and harmonise biomarkers across studies is substantial. Hence this study does not purport to represent a systematic review and meta-analysis of all AL studies, but was designed to calibrate a measure of AL from a reasonably representative sample of studies which were (1) publicly available or accessible to us through existing consortia, (2) collected biomarkers relevant to the AL concept, (3) measured walking speed and / or grip strength in addition to self-rated health, and (4) did not incur data access charges. Notable exclusions included the 1936 Lothian Birth Cohort (LBC), West of Scotland twenty-07 cohort, the Northern Swedish Cohort, the Normative Aging Study, the Copenhagen Aging and Midlife Biobank (CAMB), the Atherosclerosis Risk in Communities (ARIC) study, the Coronary Artery Risk Development in Young Adults Study (CARDIA) and the Canadian Longitudinal Study on Aging (CLSA) which were not publicly available without additional collaboration agreement forms and / or incurred data access charges.
In total, we use data from 13 different cohort studies involving a total sample of 67,126 individuals aged 40-111, 53.9% female. Inclusion criteria were mid-to-late life cohorts that are population-based and have collected detailed biomarker batteries alongside several objectively measured functional health outcomes and self-rated health. Four of the cohorts were part of the LIFEPATH consortium (Vineis et al., 2020) (Marmot et al., 2017).
Two were accessed from the Inter-university Consortium for Political and Social Research (ICPSR): Midlife in the United States (MIDUS II) Biomarker Project (Ryff et al., 2019) and the Social Environment and Biomarkers of Aging Study (SEBAS) (Weinstein et al., 2014). The remaining three datasets were the US Health and Retirement Study (HRS), the National Health and Nutrition Examination Survey (NHANES), and the Health and Ageing Study in Africa: A Longitudinal Study of an INDEPTH Community in South Africa (HAALSI) cohort. As most of these studies are prospective and have collected biomarkers at more than one sweep, we decided to include data for the sweep that was richest in terms of the number of biomarkers collected. For example, the 2016 sweep of data collection for HRS was included, despite the study starting in 1992, because a venous blood draw was performed at that sweep and more biomarkers were assayed than at other sweeps. The mean age of the samples ranged from 44.1 (SD=0.24) years in the NCDS to 66.2 years (SD=10.9) in the HRS. A detailed description of the study cohorts and of the sampling design is presented in the supplementary appendix.

Health outcomes
As our dependent variables, we use three health outcome measures: grip strength, walking speed, and participant self-rated health (SRH). Detailed methods of assessment for each health outcome in each study are described in Supplementary Table S1. Grip strength is a measure of upper extremity muscle strength that has prognostic value as an indicator of functional decline and mortality (Bohannon, 2008). Grip strength was available in 11/13 studies (excluding NHANES and NCDS) and measured using a hand dynamometer which consisted of a gripping handle and strain gauge. Each study collected at least one reading in each of the dominant and non-dominant hands. We use maximal grip strength (kg). Walking speed requires the co-ordinated action of a number of different physical systems including the nervous, musculoskeletal, and cardiopulmonary systems, and serves as a useful indicator of health and vitality in older adults (Middleton et al., 2015). Walking speed was measured in 8/13 studies (HRS, NHANES, MIDUS II, ELSA, TILDA, EPIPorto, HAALSI, and SEBAS). The distance travelled ranged between 2.5 and 15.24 m across studies. We standardized the outcome across studies to express walking speed as the distance travelled in centimetres per second (cm/sec). SRH was employed as not all studies collected objective health outcomes. As the wording and response categories varied across studies, harmonisation was conducted by generating a binary variable to indicate fair/poor/bad, rather than good, very good or excellent, health.
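The outcome harmonisation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the response labels are assumptions introduced here, and the actual study-specific codings are those documented in Supplementary Table S1.

```python
def walking_speed_cm_per_sec(distance_m: float, time_sec: float) -> float:
    """Express a timed walk over distance_m metres as cm/sec."""
    if time_sec <= 0:
        raise ValueError("time_sec must be positive")
    return distance_m * 100.0 / time_sec


# Hypothetical response labels; the real wording differs between studies.
POOR_HEALTH_LABELS = {"fair", "poor", "bad"}


def srh_fair_or_poor(response: str) -> int:
    """Return 1 for fair/poor/bad self-rated health, 0 for any other label."""
    return int(response.strip().lower() in POOR_HEALTH_LABELS)
```

Because the walk distance differs between studies, dividing by the time taken puts every study on the same cm/sec scale regardless of course length.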

Biomarkers
Supplementary Table S2 describes the biomarkers that were available within each study to measure AL. We included respondents who had data for at least one biomarker and at least one health outcome. In total, data were available for 51 biomarkers across 12 physiological systems: (i) sympathetic-adrenal-medullary (SAM) axis, (ii) hypothalamic-pituitary-adrenal (HPA) axis, (iii) parasympathetic nervous system functioning, (iv) oxidative stress, (v) immunological/inflammatory, (vi) cardiovascular, (vii) respiratory, (viii) anthropometric, (ix) lipidemia, (x) glucose metabolism, (xi) kidney, and (xii) liver. Meta-analysis requires a minimum of two studies to derive a pooled effect size estimate. Therefore, to be included in the analysis, a biomarker had to be available in at least two studies in which the three health outcomes were also measured. When this criterion was applied, a final panel of 40 biomarkers across 12 physiological systems remained. Biomarkers were harmonised prior to analysis to ensure they were expressed in the same metric. Mean values for each biomarker across studies are shown in Supplementary Table S3.
Following the classic "count-based" method (Seeman et al., 2010), empirically defined high-risk thresholds were distinguished based on the distribution of each biomarker in the sample; "1" was assigned to values falling above the 75th percentile of the distribution for each marker, and "0" was assigned to values below this threshold. We use study-specific cut-points to define high risk as AL is supposed to capture pre-clinical disease states, and there are no standard clinical cut-points for many of the biomarkers (e.g. cortisol, heart rate variability etc.). This method is advantageous as it is the most common approach in published studies, eliminates the need to standardize the measurements across labs, and obviates the requirement to apply transformations to account for skewness of biological data. Limitations include the potential loss of information resulting from dichotomising a continuous variable, and population-specific risks which may not be generalisable to other studies. Nevertheless, studies comparing the predictive utility of a suite of scoring algorithms suggest this is not a major issue (Li et al., 2019; McLoughlin et al., 2020; Seplaki et al., 2005). Indeed, the count-based method performs well relative to more complicated scoring systems (e.g., factor scores). There were several biomarkers for which higher values indicate lower clinical risk, so we reversed the scores prior to calculating the cut-offs by multiplying by −1.
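A minimal Python sketch of the count-based scoring described above is given below. The function names are hypothetical, and the percentile definition (the "inclusive" method of the standard library) is an assumption, since the text does not state which quantile rule was used.

```python
import statistics


def high_risk_flags(values, higher_is_better=False):
    """Assign 1 to values above the sample's 75th percentile, else 0.

    When higher values indicate lower risk, values are multiplied by -1
    first, so the top quartile always corresponds to highest risk.
    """
    scored = [-v if higher_is_better else v for v in values]
    # quantiles(n=4) returns the three quartile cut points; index 2 is the
    # 75th percentile. The 'inclusive' rule is an assumption of this sketch.
    cut = statistics.quantiles(scored, n=4, method="inclusive")[2]
    return [1 if s > cut else 0 for s in scored]


def count_score(flag_columns):
    """Sum the 0/1 flags across biomarkers for each participant."""
    return [sum(flags) for flags in zip(*flag_columns)]
```

Applying `high_risk_flags` separately within each study reproduces the idea of study-specific cut-points; `count_score` then adds the flags across whichever biomarkers make up the index.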

Statistical analysis
Two-stage individual-participant-data (IPD) meta-analysis was implemented using the ipdmetan package in Stata (Fisher, 2015). The first stage involves fitting the specified model to each individual study's data to derive a study-specific effect estimate and variance. In the second stage, the effect estimates are combined using an inverse variance weighting method to derive a weighted average (pooled) effect estimate for each biomarker. The inter-study variance was estimated using restricted maximum likelihood (REML) estimation with the Hartung-Knapp-Sidik-Jonkman (HKSJ) method (Langan et al., 2019) and assessed using the I² and tau² statistics. I² is the proportion of the total variation in the effect estimate due to between-study variability, while tau² is the between-study variance estimate. In line with best current practice, we also report a 95% prediction interval (PI), which is a range of values likely to contain the effect in a single new study and can be used to quantify heterogeneity in model performance (IntHout et al., 2016).
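To make the second stage concrete, the sketch below pools study-specific estimates by inverse-variance weighting and reports the between-study variance and the proportion of variation due to heterogeneity. It is not the procedure used in the paper: for brevity it uses the DerSimonian-Laird moment estimator of the between-study variance rather than REML with the Hartung-Knapp-Sidik-Jonkman adjustment, and the function name and inputs are hypothetical.

```python
def pool_random_effects(estimates, variances):
    """Second-stage pooling of study-specific estimates.

    Returns (pooled estimate, tau2, i2 as a proportion). tau2 uses the
    DerSimonian-Laird moment estimator, NOT the REML/HKSJ approach
    reported in the paper.
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need matching inputs from at least two studies")
    w = [1.0 / v for v in variances]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2, i2
```

In the first stage, each entry of `estimates` and `variances` would come from fitting the same regression model within one cohort; only those two summaries per study are carried forward to the pooling step.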
Meta-analyses of health studies typically contain few studies, and simulation studies have shown that the summary effect estimated with the Hartung-Knapp-Sidik-Jonkman (HKSJ) method is robust to changes in the heterogeneity variance estimate and works well in situations where there are small numbers of studies (Langan et al., 2019). We estimated separate linear regression models with respect to grip strength and walking speed adjusting for age (years), sex, and measured height (cm). The binary SRH measure was modelled using logistic regression adjusting for age and sex. We report the overall effect estimate and associated 95% confidence interval, as well as I², tau², and the 95% PI, for each biomarker in respect of each health outcome.
Having identified the best performing biomarker (if any) within each physiological system, we then compare how well a composite index composed of these biomarkers performed in predicting an independent criterion, mortality, compared with longer and more elaborate batteries published in the extant literature. Mortality data were available in 10/13 studies contributing to this analysis, with the length of follow-up ranging between 2 and 14 years (Table 1). As time to event was not available in all studies, we use a binary variable to indicate mortality status.

Results
The aggregated results for the 40 biomarkers across the 12 different physiological systems are depicted graphically in Figs. 1-3 for grip strength, walking speed, and SRH respectively, and measures of between study heterogeneity are provided in Supplementary Table S4. We discuss results for each physiological system in turn, drawing attention to which biomarker (if any) within that system was most strongly and consistently associated with the 3 health outcomes.

Hypothalamic-pituitary-adrenal (HPA) axis
The HPA system is responsible for the slower onset stress response. Two measures of HPA-axis functioning were available for analysis across the diverse cohorts: dehydroepiandrosterone sulphate (DHEAS) and cortisol. DHEAS is the sulphated form of the molecule dehydroepiandrosterone (DHEA); it declines rapidly with age, and low levels may indicate a problem with adrenal functioning (Urbanski et al., 2013). Cortisol is the primary glucocorticoid secreted by the HPA-axis following exposure to a stressor and plays an important role in the physiological response to stress (Sapolsky et al., 2000). DHEAS was available in 5/13 studies (HRS, ELSA, MIDUS II, UKHLS, SEBAS) while urinary cortisol measurements were available in 3/13 studies (MIDUS II, SEBAS, SKIPOGH). Lower DHEAS was associated with lower grip strength (B = −1.02, 95% CI = −1.12, −0.92), slower walking speed (B = −3.83, 95% CI = −6.23, −1.43), and worse SRH (OR = 1.74, 95% CI = 1.51, 2.00). Despite its centrality to the theory of AL, cortisol was not related to the health outcomes assessed in the meta-analysis of these cohorts.

Sympathetic-adrenal-medullary (SAM) axis
Alongside glucocorticoids, catecholamines play a key role in the stress response because they are mediators of the adaptation of many systems of the body to acute challenges (McEwen, 2008). The sympathetic nervous system responds to the recognition of a stressor by releasing catecholamines that facilitate the rapid increases in heart rate, blood pressure, and respiration that enable the 'fight or flight' response. Despite the hypothesised importance of these biomarkers as primary mediators in the original conceptualisation of AL, direct measures of SAM activation are rarely included in population-based surveys and were only available in two studies (MIDUS II and SEBAS). Neither epinephrine nor norepinephrine was related to the health outcomes assessed in the meta-analysis of these cohorts.

Parasympathetic nervous system
Heart rate variability (HRV) refers to the variation in time between successive heart beats and is a standard non-invasive method for evaluating autonomic nervous system functioning. Variability is considered an important and adaptive characteristic of the stress-response machinery as it allows the body to make accommodations in respiration and cardiac outflow in response to physical and psychological stressors. Higher HRV is therefore considered healthy as it signifies neurovascular compliance. Parasympathetic Nervous System (PNS) biomarkers were available in only two studies (TILDA and MIDUS II). The four PNS biomarkers common to both studies included two time-domain indices, the standard deviation of R-to-R intervals (SDRR) and root mean square of successive differences (RMSSD), and two frequency-domain indices, low frequency (LF-HRV) and high frequency heart rate variability (HF-HRV). LF-HRV was found to be the best performing biomarker within this system, with lower variability being consistently associated with lower grip strength (B = −0.50, 95% CI = −1.49, 0.49), slower walking speed (B = −3.35, 95% CI = −4.14, −2.57), and worse SRH (OR = 1.76, 95% CI = 1.32, 2.33).

Oxidative stress
Oxidative stress is caused by an imbalance between production and accumulation of reactive oxygen species in cells and tissues and the reduced ability of antioxidant defences to detoxify these reactive products (Pizzino et al., 2017). Oxidative stress biomarkers included uric acid, which was available in 4/13 studies (SKIPOGH, Colaus/PsyCoLaus, EPIPorto, SEBAS), and homocysteine, which was available in 2/13 studies (HRS and SEBAS). Neither uric acid nor homocysteine was found to relate to the health outcomes in the meta-analysis of these cohorts, although the confidence intervals around the estimates were wide given the small number of studies in which they were assessed.

Inflammatory / immune system
The absolute number of different biomarkers measured across studies was greatest for measures of the inflammatory/immune system, with 9 in total. Consistent with the hypothesis that chronic low-grade inflammation is a common thread in the pathophysiology of ageing (Cevenini et al., 2013), we observed a general pattern whereby the constellation of inflammatory biomarkers tended to be negatively related to health state, although the magnitude of the associations varied across biomarkers. CRP was available in all studies and was consistently associated with lower grip strength (B = −0.78, 95% CI = −1.05, −0.50), slower walking speed (B = −6.35, 95% CI = −8.31, −4.39), and worse SRH (OR = 1.88, 95% CI = 1.61, 2.20). Likewise, the coefficients for fibrinogen were marginally lower than those observed for CRP, although measured in fewer studies overall (6/13). Albumin was inversely associated with grip strength (B = −1.57, 95% CI = −2.09, −1.05) and SRH (OR = 1.66, 95% CI = 1.11, 2.46), but not with walking speed. Although available in only 6/13 studies, the primary cytokine IL6 was also found to be [...] 1.34, 95% CI = 1.15, 1.57), while IL1ra, IL10, ICAM, and E-Selectin were found to be unrelated to the 3 health outcomes.

Cardiovascular system
Cardiovascular measures are the most commonly employed biomarkers of AL across studies (Juster et al., 2010; Misiak et al., 2022), and systolic blood pressure (SBP), diastolic blood pressure (DBP) and resting heart rate (RHR) were available in all 13 studies. Assessment usually involved 2-3 measurements in each study expressed as the mean value averaged over the number of measurement occasions. Although measures of blood pressure and RHR are intimately related, RHR was most strongly related to the 3 health outcomes under examination in the meta-analysis of these cohorts. Being dysregulated in RHR was associated with lower grip strength (B = −0.45, 95% CI = −0.67, −0.24), slower walking speed (B = −3.43, 95% CI = −5.64, −1.22), and worse SRH (OR = 1.56, 95% CI = 1.36, 1.79). By contrast, SBP and DBP performed poorly. Indeed, being dysregulated in SBP and DBP was found, counter-intuitively, to be associated with higher grip strength, while neither measure was related to walking speed or SRH when pooled across studies. For the age range included in this study, SBP increases and DBP decreases with age, which is why pulse pressure (i.e. SBP − DBP) may be preferred as a biomarker of risk at older ages. Nevertheless, pulse pressure did not perform notably better than SBP or DBP. Medication use represents a major confounding factor as a substantial proportion of the older population are using anti-hypertensive medications. Prior studies have accounted for this by treating someone as being biologically dysregulated in blood pressure if they are using anti-hypertensive medications (Geronimus et al., 2006). We therefore re-estimated the models including cohorts for which medication data were available (9/13). Supplementary Table S5 shows that the relationship of blood pressure (SBP, DBP, pulse pressure) with our various outcome measures improved substantially when we accounted for medication use.
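The medication adjustment described above amounts to a simple rule, sketched here in Python. The function name and arguments are hypothetical, and any cut-point passed in would be the study-specific 75th percentile rather than a clinical threshold.

```python
def bp_dysregulated(value: float, cutpoint: float, on_antihypertensives: bool) -> int:
    """Return 1 if the reading exceeds the study-specific cut-point
    or the participant is taking anti-hypertensive medication, else 0."""
    return int(value > cutpoint or on_antihypertensives)
```

Under this rule a treated participant with a well-controlled reading is still counted as dysregulated, which is the intent of the adjustment.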

Respiratory system
Peak expiratory flow (PEF) measurement is a simple measure of the air expelled from the lungs during forceful expiration following full inspiration and was available in 6/13 cohorts. Scoring in the lowest quartile of the distribution with respect to respiratory functioning was associated with a ~2 kg reduction in grip strength (B = −2.09, 95% CI = −2.48, −1.70), ~10 cm/sec slower walking speed (B = −10.27, 95% CI = −16.03, −4.51), and two times the odds of reporting fair/poor SRH (OR = 2.07, 95% CI = 1.71, 2.51).

Anthropometric
Metabolic system dysregulation was measured using two subsystems: anthropometric measures of body fat composition, and serum measures of blood glucose intolerance. Body mass index (BMI), waist circumference (WC), waist-hip-ratio (WHR), and waist-to-height ratio (WtHR) are all measures used to estimate body fat composition. WtHR was found to be the best performing biomarker within the

Fig. 3. Odds ratio of scoring in the highest risk quartile for each biomarker with fair/poor self-rated health in individual participant data (IPD) meta-analysis. Legend:
Estimates were derived using restricted maximum likelihood (REML) estimation with the Hartung-Knapp-Sidik-Jonkman (HKSJ) method while holding age and sex constant; * denotes biomarkers that were reverse scored prior to analysis.

Lipidemia
Markers of lipid dysregulation are amongst the most common biomarkers used in the calculation of the AL score in the extant literature and were well represented across the different studies. Measures of total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and triglycerides were available in all 13 studies, while low-density lipoprotein cholesterol (LDL-C) was available in fewer studies (10/13). The ratio of TC to HDL cholesterol was derived for all cohorts (TC/HDL ratio). HDL cholesterol outperformed all other lipid biomarkers in terms of its association with the health outcomes assessed in this study. Scoring in the highest HDL quartile (reversed, i.e., lower levels of HDL cholesterol) was consistently associated with lower grip strength (B = −0.09, 95% CI = −0.34, 0.16), slower walking speed (B = −5.04, 95% CI = −7.13, −2.95), and worse SRH (OR = 1.65, 95% CI = 1.40, 1.94). Importantly, TC is considered a mainstay of the AL measure yet was found to be associated with the health outcomes in the opposite direction to that hypothesised. Similar patterns were observed for LDL cholesterol. In sensitivity analyses, we attempted to account for medication usage by counting participants as biologically dysregulated in total cholesterol if taking lipid modifying medications (Supplementary Table S5). The association of TC with the outcome measures was closer to the expected direction after accounting for medication use.

Glucose metabolism
Metabolic alterations are a central feature of the ageing process (López-Otín et al., 2016), characterised by insulin resistance, changes in body composition, and physiological declines in growth hormone (Barzilai et al., 2012). Three measures of blood glucose metabolism were considered in the present study, including fasting insulin levels (4/13), fasting blood glucose levels (8/13), and glycated haemoglobin (HbA1c) (11/13). Insulin is a hormone secreted by the pancreas which helps regulate blood glucose levels by signalling muscle, liver, and fat cells to absorb glucose from the blood. Blood glucose indicates the average level of glucose in the blood, while glycated haemoglobin (HbA1c) reflects an individual's average glycaemic control over a longer time-period (approx. 8-12 weeks). Overall, the glucose-based biomarkers were associated with the health outcomes in the hypothesised direction.

Renal system
Two measures of renal function were available in a subset of studies: serum creatinine was available in 6/13 studies, while cystatin C (CYS-C) was available in 4/13 studies. Creatinine and CYS-C are used to estimate glomerular filtration rate because they are freely filtered at the level of the glomerulus (functioning unit of the kidney). Creatinine is a byproduct of muscle cell catabolism, while CYS-C is produced by all nucleated cells, meaning it may have broader impacts on ageing biology beyond just the renal system. In the meta-analysis of these cohorts, being dysregulated in CYS-C was associated with lower grip strength (B = −1.48, 95% CI = −3.55, 0.59), slower walking speed (B = −9.39, 95% CI = −14.09, −4.69), and worse self-reported health (OR = 2.13, 95% CI = 1.53, 2.96). On the other hand, creatinine measures performed less well in this regard.

Liver system
Liver-based biomarkers have rarely been utilised in AL research to date, although a recent study by Karimi et al. (2019) using the UKHLS dataset included alanine transaminase (ALT) and aspartate transaminase (AST) as part of an expanded AL composite. Elevated levels of ALT and AST may indicate liver disease, although AST is considered a less specific marker for liver injury than ALT due to expression in other tissues (Giboney, 2005). These two biomarkers were available in HRS and SEBAS, in addition to the UKHLS, but neither related strongly to the health outcomes assessed in this study.

Summary of results
There was at least one biomarker within 9/12 systems that was reliably and consistently associated in the hypothesised direction with the three health outcomes under investigation in the meta-analysis of these cohorts. The list includes DHEAS, LF-HRV, CRP, RHR, PEF, HDL-C, WtHR, HbA1c, and CYS-C. Biomarkers of SAM functioning, oxidative stress, and liver functioning were not related to the health outcomes in our meta-analysis, although it should be acknowledged that these biomarkers were available in fewer studies overall. The results of the analysis suggest that these 9 biomarkers might constitute an efficient panel of biomarkers for capturing physiological 'wear and tear'. Unfortunately, these 9 biomarkers were not all available within a single study. There was, however, a subset of five biomarkers (CRP, RHR, HDL-C, WtHR, and HbA1c) common to almost all studies that may constitute an abbreviated five-item measure of AL.

Validating an abbreviated five-item measure of AL
We sought to determine how well this brief measure compared with longer, more elaborate batteries in predicting mortality (which was available for 10/13 studies) by recreating the AL indices used in previously published studies and comparing results directly against the five-item measure using the same case base. If a study included any of the four additional biomarkers (DHEAS, LF-HRV, PEF, CYS-C) which the meta-analysis suggested might serve as useful adjuncts to the short five-item index, we created additional indices to determine whether the predictive performance of the five-item AL measure was improved by the addition of that biomarker. These alternate indices were compared against the common set by variance explained (R²) and model fit (Bayesian Information Criterion (BIC)), as the models were non-nested. Following Raftery (1995), an absolute reduction in BIC of between 0 and 2, 2-6, 6-10, and 10 or more was deemed to provide weak, positive, strong, and very strong evidence for alternative model specifications, respectively. The results of these analyses are depicted graphically in Fig. 4 and summary statistics are presented in Supplementary Table S6. In general, we found that the abbreviated five-item measure performed as well as or better than more elaborate batteries in predicting mortality, with only a marginal loss in R². However, the performance of the five-item index was improved markedly by the addition of PEF, and to a lesser extent by DHEAS and CYS-C, but not LF-HRV.
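The grading of BIC differences described above can be expressed as a small Python helper. This is a sketch: the function name is hypothetical, the assignment of the exact boundary values (2, 6, 10) to the higher band is an assumption, and the "none" label for a model that does not lower BIC is added here for completeness.

```python
def bic_evidence(bic_reference: float, bic_alternative: float) -> str:
    """Grade the evidence for the alternative model from its BIC reduction."""
    reduction = bic_reference - bic_alternative
    if reduction <= 0:
        return "none"  # the alternative does not lower BIC
    if reduction < 2:
        return "weak"
    if reduction < 6:
        return "positive"
    if reduction < 10:
        return "strong"
    return "very strong"
```

For example, an index whose model has a BIC 8 points lower than that of the five-item measure would be graded as providing strong evidence in its favour.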

Discussion
The AL framework has generated a large volume of publications in the last three decades but has been criticised for the lack of a consistent definition. This meta-analytic study was motivated by the desire to identify a core set from among the broad array of biomarkers that have been used to instantiate the concept. The results of this IPD meta-analytic study move the field forward by enabling teams of researchers working with the AL framework to make informed decisions regarding which biomarkers are integral to the concept and which are more peripheral, at least in the context of physical health. We acknowledge that our findings are based on selected outcomes and future work should seek to expand the range of outcomes examined.

A brief five-item AL measure
We identified a small panel of five, primarily cardio-metabolic, biomarkers (CRP, RHR, HDL-C, WtHR, and HbA1c) that were available in most studies contributing to this IPD meta-analysis and that were consistently associated with health status. An abbreviated AL measure comprising our five biomarkers was found to predict mortality as well as longer, more elaborate AL batteries. Notably, three of these biomarkers (CRP, HbA1c, and RHR), alongside PEF, were among the most strongly predictive of all-cause mortality in a prior study involving the NCDS cohort, while measures of HPA-axis activation, such as baseline cortisol and change in cortisol, were not (Castagné et al., 2018). An obvious tension therefore exists between selecting biomarkers that remain faithful to the original theoretical exposition of AL, which emphasised the centrality of the primary stress mediators in the physiological cascade that leads to disease states, and choosing biomarkers that predict health outcomes. Our analytical strategy was designed to select the best performing set of biomarkers across different physiological systems based on their association with several important health outcomes irrespective of their association with stress. One interpretation of the results is that chronic inflammation is the common thread linking the biomarkers with the health outcomes as inflammation is strongly implicated in the development of many age-related diseases (Cevinini et al., 2013).
We expound briefly on each of the five biomarkers that comprise our abbreviated AL index. CRP is an acute phase inflammatory protein produced by cells in the liver during an inflammatory episode, largely in response to signalling by interleukin-6 and other primary cytokines. As a non-specific marker of inflammation, elevated levels of CRP may indicate the presence of acute inflammation, low-grade chronic inflammation, or a state of upregulated tissue repair and regeneration, which incurs a physiological cost (Del Giudice and Gangestad, 2018). RHR is a readily available vital sign that holds important prognostic information about general health state as it reflects the number of times the heart beats in a minute while inactive, so a higher RHR signifies that the heart is having to work harder to pump blood around the body. The heart is a muscle subject to biomechanical stress, so it is perhaps unsurprising that a higher RHR has been identified as a risk factor for incident cardiovascular disease (CVD) and cardiovascular mortality in many studies (Zhang et al., 2016). WtHR is a simple measure of central adiposity that has recently received attention as a biomarker of health risk (Ashwell and Gibson, 2016), and possesses properties that arguably make it a better metric of body composition compared with other measures of total or central adiposity. HDL cholesterol is known as 'good cholesterol' as it helps remove other forms of harmful cholesterol from the bloodstream via the liver. Low HDL cholesterol is a well-established biomarker of dyslipidemia, and HDL cholesterol has been inversely associated with CVD risk in a number of meta-analyses (Di Angelantonio et al., 2009). Finally, glycated haemoglobin (HbA1c) serves as a biomarker of pre/diabetes and provides a read-out of average blood sugar levels over a longer time horizon (2-3 months) compared with fasting glucose or insulin.
Fig. 4. Within-cohort comparison of the performance of the brief 5-item allostatic load index with longer batteries in predicting odds of mortality. Legend: All models adjusted for age and sex. Five-item = C-Reactive Protein, Resting Heart Rate, High Density Lipoprotein-Cholesterol, Waist-to-Height Ratio, HbA1c. DHEAS = dehydroepiandrosterone sulfate; PEF = peak expiratory flow; CYSC = cystatin C; LF-HRV = low frequency heart rate variability.
These biomarkers have the advantage of being relatively inexpensive, easy-to-measure variables that have well-established clinical cut-points enabling direct comparisons across studies, and they hold promise as a sub-clinical index of physiological dysregulation with diagnostic utility, as long envisaged with the AL concept. As a further sensitivity check, we generated a clinical risk score based on established cut-offs for CRP, RHR, HDL-C, WtHR, and HbA1c, and compared its performance against study-specific cut-points. In general, results using study-specific cut-points (OR = 1.31; 95CI = 1.22, 1.40; 95PI = 1.14, 1.50) (Supplementary Fig. S1) were very similar to results obtained using high-risk clinical cut-points (OR = 1.34; 95CI = 1.20, 1.49; 95PI = 1.01, 1.78) (Supplementary Fig. S2). We generated a funnel plot (Supplementary Fig. S3) to check for potential bias in the studies included in our IPD meta-analysis, but the graph looks fairly symmetrical given the relatively small number of studies and some heterogeneity in terms of the general characteristics of the cohorts. Finally, the magnitude of the effect sizes reported here is entirely consistent with the results of a recent systematic review and meta-analysis that examined the relationship of AL with mortality and reported hazard ratios of 1.22 (95CI = 1.14, 1.30) and 1.27 (95CI = 1.10, 1.46) using population-based and clinic-based cut-offs respectively (Parker et al., 2022). We tested for effect modification by fitting an AL*sex interaction term in the IPD meta-analysis, but it was non-significant in respect of both the population-based (Fig. S4) and clinic-based cut-offs (Fig. S5). Would our conclusion have been different if we had calibrated our AL measure using mortality as the dependent variable? In short, no. Supplementary Fig. S6 indicates that our conclusions would have been the same, and that CRP (OR = 2.14, 95CI = 1.85, 2.47), RHR (OR = 1.82, 95CI = 1.56, 2.11), HDL-C (OR = 1.30, 95CI = 1.18, 1.44), WtHR (OR = 1.33, 95CI = 1.11, 1.60), and HbA1c (OR = 1.22, 95CI = 0.91, 1.64) were all positively associated with increased odds of mortality in the meta-analysis of these cohorts.
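To make the idea of a cut-point-based risk score concrete, the following Python sketch counts how many of the five biomarkers fall in a high-risk range. It is illustrative only: the threshold values, units, and the names `ILLUSTRATIVE_CUTPOINTS` and `al_count_score` are placeholders chosen for this example and are not the cut-offs used in this study, which should be taken from the supplementary material. Note that HDL-C is scored in the opposite direction to the other four biomarkers, since low values indicate risk.

```python
from typing import Dict, Mapping, Tuple

# Placeholder thresholds for illustration only (NOT the study's cut-offs).
# Each entry maps a biomarker to (threshold, direction of risk).
ILLUSTRATIVE_CUTPOINTS: Dict[str, Tuple[float, str]] = {
    "crp": (3.0, "high"),    # mg/L, assumed unit
    "rhr": (90.0, "high"),   # beats per minute
    "hdl": (1.0, "low"),     # mmol/L, assumed unit; low HDL-C is the risk state
    "wthr": (0.5, "high"),   # waist-to-height ratio
    "hba1c": (6.5, "high"),  # percent, assumed unit
}


def al_count_score(values: Mapping[str, float]) -> int:
    """Count the biomarkers that fall in the high-risk range (0 to 5)."""
    score = 0
    for name, (threshold, direction) in ILLUSTRATIVE_CUTPOINTS.items():
        value = values[name]
        if direction == "high":
            at_risk = value >= threshold
        else:
            at_risk = value < threshold
        score += int(at_risk)
    return score


if __name__ == "__main__":
    # Hypothetical participant: elevated CRP, low HDL-C, raised WtHR.
    participant = {"crp": 4.2, "rhr": 72.0, "hdl": 0.9, "wthr": 0.55, "hba1c": 5.4}
    print(al_count_score(participant))  # prints 3
```

A study-specific variant would replace the fixed thresholds with cut-points derived from each cohort's own distribution (for example, a high-risk quartile), which is the comparison reported in Supplementary Figs. S1 and S2.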

An expanded 8-item AL measure
The supplementary analyses suggest that the five-item measure could be improved, particularly by the addition of PEF, and to a lesser extent DHEAS and CYS-C. In fact, the results for PEF make a compelling case for it to be included as a core constituent of a 6-item AL index, except that it was not available in every cohort. PEF was the best performing biomarker in terms of its absolute effect sizes across the three health outcomes, which is perhaps unsurprising as it likely operates as a surrogate measure of physical health and lifetime smoking (Enright et al., 2001; Trevisan et al., 2019), as well as exposure to other occupational and environmental toxins. DHEAS was strongly associated with each of the three health outcomes and is generally regarded as being a reliable endocrine biomarker of ageing because circulating levels are very high in young adulthood and decline rapidly with age. Finally, there is growing evidence that CYS-C is more than just a marker of kidney disease: it is produced by all nucleated cells, and its range of peptidase and proteolytic inhibitory functions, coupled with expression in almost all tissues, means it has a broader impact on ageing biology beyond just the renal system (Zi and Xu, 2018).

Re-evaluation and retrenchment
One of the most interesting findings to emerge from this study was that the biomarkers of neuroendocrine functioning, which are posited to play a central role in the physiological cascade that contributes to allostatic overload, were, for the most part, unrelated to the three physical health outcomes examined, although it should be acknowledged that epinephrine, norepinephrine and cortisol were measured in few studies overall. Furthermore, numerous reviews have documented challenges in the measurement of these biomarkers (Peaston and Weinkove, 2004), particularly cortisol (El-Farhan et al., 2017), which may help account for the lack of a signal in this study. These biomarkers may impact outcomes not through levels, which is what is generally measured (e.g. average overnight output), but through patterns of activity, as these systems (especially the catecholamines) oscillate rapidly over the course of a day, making it difficult to use them as biomarkers. The corollary is that available data are less well-suited for picking up the potential impacts of dysregulation in these systems as they are measured on different time scales.
There were other biomarkers, considered mainstays of the AL index such as SBP, DBP, and total cholesterol that were not strongly related to the outcomes we measured. We found that medication use represented a major confounding factor as a substantial proportion of older persons use anti-hypertensives or statins. Despite this, not all studies capture medication usage, nor indeed account for it when constructing their AL composites. Some theorists have argued that a person should be counted as biologically dysregulated if they are taking medications (Geronimus et al., 2006), while others have argued that biomarker levels are controlled under these circumstances, and hence, the person should not be counted as biologically dysregulated unless they fall into the highest risk quartile (Seeman et al., 2004), yet others suggest a 0.5 point weighting to indicate some level of elevated risk (Rodriquez et al., 2019). Notwithstanding these caveats, the central point remains that these biomarkers may be less than ideal components of the AL index in studies including older people unless medication usage is accounted for.
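The three positions on medication use described above can be expressed as alternative scoring rules for a single AL component (for example, blood pressure in a person taking anti-hypertensives). The Python sketch below is a simplified reading of those positions, not an implementation taken from the cited papers: the scheme labels and the function name `component_points` are hypothetical, and the assumption that the 0.5 weighting applies only to medicated persons whose measured level is not in the high-risk range is ours.

```python
def component_points(high_risk_level: bool, medicated: bool, scheme: str) -> float:
    """Points contributed by one biomarker under a given medication rule.

    Schemes (labels are hypothetical shorthand for the positions in the text):
      "medication_counts": medicated persons count as dysregulated.
      "level_only":        only a high-risk measured level counts.
      "half_point":        medicated persons without a high-risk level
                           receive 0.5 (assumed interpretation).
    """
    if scheme == "medication_counts":
        return 1.0 if (high_risk_level or medicated) else 0.0
    if scheme == "level_only":
        return 1.0 if high_risk_level else 0.0
    if scheme == "half_point":
        if high_risk_level:
            return 1.0
        return 0.5 if medicated else 0.0
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # A medicated participant whose measured level is currently controlled.
    for scheme in ("medication_counts", "level_only", "half_point"):
        print(scheme, component_points(False, True, scheme))
```

The example shows why the choice matters: the same medicated participant with a controlled biomarker level contributes 1, 0, or 0.5 points depending on the rule, so composites built under different rules are not directly comparable.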

Recommendations
The lack of a gold standard definition has been a persistent criticism levelled at the AL framework since its inception. This study represents an initial attempt to distil a core set from the wide array of biomarkers that have been used to instantiate the concept, and, it is hoped, stimulate further research in this vein. Of course, our study also has several limitations that arguably hamper any conclusions that can be drawn. These limitations include: (i) the lack of a systematic review and meta-analysis to inform the inclusion of studies, meaning that the included studies represent only a subset of the wider corpus of studies that have measured AL, (ii) some heterogeneity in the general characteristics of the included cohorts, (iii) a restricted age sample focused on mid-to-late life, so we do not know whether these results generalise to younger cohorts, (iv) exclusion (e.g. testosterone) or under-representation (e.g. cortisol, epinephrine/norepinephrine) of some biomarkers that are central to the theory of AL, which tend to be less represented in large population-based cohort studies and may bias results and conclusions, (v) not stratifying biomarker levels by sex, as men and women are known to differ in levels of specific biomarkers, (vi) failure to account for medication usage when characterising dysregulation, (vii) the choice of outcome variables, which focus on objective physical function and SRH as opposed to selecting biomarkers based on disability, disease states or mortality (although we used mortality in our sensitivity analysis), and (viii) the use of cross-sectional data, which assumes that elevated levels of the biomarkers lead to worse health (rather than the reverse).
While acknowledging these caveats, we would also point to a number of strengths, including samples that span the relevant age range for these outcomes, samples that span continents, samples that have extensive biomarker data available, and an analytical approach to the problem of variable selection designed to move the field forward. The panel of biomarkers selected to represent the core construct may help address some of the essential criticisms associated with the AL framework, most notably, that it lacks construct validity since its formulation varies from study to study.
This meta-analytic investigation suggests that some of these issues can be addressed in the following way:
1. Studies should consider reporting results for the common set of five biomarkers to allow for comparison of cumulative physiological burden across different socio-demographic groups and age ranges.
2. Studies should consider reporting results for the common set of five biomarkers using clinical cut-points to facilitate comparisons across studies.
3. De novo studies should consider including the expanded eight-item measure as well as other theoretically informed biomarkers.
4. Studies should take account of medication usage when calculating AL composite measures.
5. Researchers should endeavour to include measures of neuroendocrine functioning to allow for a fuller understanding of their role in predicting disease as they are still under-represented in many studies of allostatic load, particularly observational panel studies.
6. Researchers should revisit this question using other extant datasets and outcome variables to determine how well the proposed abbreviated panel works, and ways in which it could be further refined or improved.
These suggestions are offered only as a harmonising framework for comparative research, and it is anticipated that the panel will continue to be expanded and refined as teams of researchers revisit this issue armed with better data or newer analytical approaches, and a range of different outcome variables.

Conflict of interest disclosure
The authors declare no conflict of interest.

Data Availability
TILDA was funded by Irish Life plc, the Irish Government and the Atlantic Philanthropies. TILDA data are available from the Irish Social Science Data Archive (www.ucd.ie/issda/). The Health and Retirement Survey (HRS) is sponsored by the National Institute on Aging through a grant to the University of Michigan (U01AG009740). HRS data can be accessed from https://hrs.isr.umich.edu/data-products/access-to-public-data. NHANES 2001-2002 data are curated by the National Center for Health Statistics and are available from https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes. MIDUS is supported by grant numbers P01-AG020166, U19-AG051428. The data are distributed by the Interuniversity Consortium for Political and Social Research (Ryff et al., 2019). Funding for the English Longitudinal Study of Ageing is provided by the National Institute of Aging [grants 2RO1AG7644-01A1 and 2RO1AG017644] and a consortium of UK government departments coordinated by the Office for National Statistics. The research data are distributed by the UK Data Service (Marmot et al., 2017) Studies, 2019a). The CoLaus/PsyCoLaus study was and is supported by research grants from GlaxoSmithKline, the Faculty of Biology and Medicine of Lausanne, and the Swiss National Science Foundation (grants 33CSCO-122661, 33CS30-139468, 33CS30-148401 and 33CS30_177535/1). The SKIPOGH cohort was supported by the Swiss National Science Foundation FN 33CM30-124087 and FN 33CM30-140331, complemented with institutional support (CHUV, Unisante). SKIPOGH is part of, and was supported by, the Swiss National Centre of Competence in Research (NCCR) Kidney Control of Homeostasis (Kidney.CH) and the National Center of Competence in Research (NCCR) TransCure (support for Dusan Petrovic). The EPIPorto cohort was supported by national funding from the Foundation for Science and Technology - FCT (Portuguese Ministry of Science, Technology and Higher Education), I.P., within the scope of projects UIDB/04750/2020 and LA/P/0064/2020.
HAALSI is funded through a program project (P01AG041710-05) awarded by National Institute on Aging (NIA) at the National Institute of Health (NIH). The MRC/Wits Rural Public Health and Health Transitions Research Unit and Agincourt Health and Socio-Demographic Surveillance System, a node of the South African Population Research Infrastructure Network (SAPRIN), is supported by the Department of Science and Innovation, the University of the Witwatersrand, and the Medical Research Council, South Africa, and previously the Wellcome Trust, UK (grants 058893/Z/99/A; 069683/Z/02/Z; 085477/Z/08/Z; 085477/B/08/Z). Data are available from the Harvard dataverse: https://dataverse.harvard.edu/dataverse/haalsi. SEBAS is supported by the United States Department of Health and Human Services. National Institutes of Health. National Institute on Aging (R01 AG16790, R01 AG16661), Taiwan Department of Health. Bureau of Health Promotion, National Health Research Institute (Taiwan) (DD01-861x-GR601S), Taiwan Provincial Government. The data are distributed by the Inter-university Consortium for Political and Social Research (Weinstein et al., 2014). Although all efforts are made to ensure the quality of the materials, neither the original data creators, depositors or copyright holders, the funders of the data collections, nor the UK Data Service bear any responsibility for the accuracy or comprehensiveness of these materials. The data providers also bear no responsibility for the analysis or interpretation of the data.

Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.psyneuen.2023.106117.