Discovering markers of healthy aging: a prospective study in a Danish male birth cohort

There is a pressing need to identify markers of cognitive and neural decline in healthy late-midlife participants. We explored the relationship between cross-sectional structural brain-imaging derived phenotypes (IDPs) and cognitive ability, demographic, health and lifestyle factors (non-IDPs). Participants were recruited from the 1953 Danish Male Birth Cohort (N=193). Applying an extreme group design, members were selected in 2 groups based on cognitive change between IQ at age ~20y (IQ-20) and age ~57y (IQ-57). Subjects showing the highest (n=95) and lowest (n=98) change were selected (at age ~57) for assessments on multiple IDPs and non-IDPs. We investigated the relationship between 453 IDPs and 70 non-IDPs through pairwise correlation and multivariate canonical correlation analysis (CCA) models. Significant pairwise associations included positive associations between IQ-20 and gray-matter volume of the temporal pole. CCA identified a richer pattern - a single “positive-negative” mode of population co-variation coupling individual cross-subject variations in IDPs to an extensive range of non-IDP measures (r = 0.75, Pcorrected < 0.01). Specifically, this mode linked higher cognitive performance, positive early-life social factors, and mental health to a larger brain volume of several brain structures, overall volume, and microstructural properties of some white matter tracts. Interestingly, both statistical models identified IQ-20 and gray-matter volume of the temporal pole as important contributors to the inter-individual variation observed. The converging patterns provide novel insight into the importance of early adulthood intelligence as a significant marker of late-midlife neural decline and motivates additional study.


INTRODUCTION
In countries with advanced economies, changes in age distributions, largely due to lower birth rates and increased life expectancy, has meant that the world's population is increasingly older, with the number of persons over 80 projected to triple by 2050 [1]. Paradoxically, the success and opportunities presented by this new longevity has often been over-shadowed by the many challenges that come with a "top heavy" society. Specifically, the growing number of older adults have created an unprecedented demand on health care services, with increased vulnerability to cognitive decline, appreciable loss of autonomy, and need for institutional care threatening the economic security of families, communities and countries. With an estimated 47 million people currently diagnosed with dementia worldwide [2,3], the necessity of responding to these challenges are vital. In view of this, what was once referred to as 'the elephant in the room' [4] -i.e., the role of aging in cognitive decline -has now transpired into a scientifically challenging and compelling pursuit, the search to identify the constituents of a healthy brain and mind across the human lifespan [5][6][7][8][9][10][11].
Despite significant individual differences in aging trajectories, the overall consensus on age-related effects on brain health and cognition ability is clear: the brain shrinks with advancing age with alterations observed at both the molecular and morphological level, and these changes are linked to declines in specific cognitive domains [47,67,68]. Of these, processing speed, executive functioning, working memory, and inhibitory functions are the cognitive domains reported as most vulnerable to advancing age, whilst implicit procedural long-term memory, numerical processing, and the general knowledge accrued across the lifespan are those that appear relatively spared [6,7,9,10,37,47,[68][69][70].
Currently, much of our knowledge on brain and behavior changes are derived from cross-sectional studies that compare single observations from individuals of different ages -most commonly groups of young and extremely old adults. Although suitable for identifying population-level mean trends, and efficient in terms of time and cost, cross-sectional studies are vulnerable to cohort effects, selection bias, and by design, can only offer insight on age-relateddifferences [37,43,50,52,62,65,[71][72][73]. As aging research is essentially the study of change, a preferred approach of extracting individual differences in change -independently of individual differences in levelwhilst simultaneously permitting the study of developmental and maturational trends is to use a longitudinal design with multiple follow-up assessments. Thus, to expand on prior efforts, we use withinsubject (longitudinal) behavioral measurements that span across critical periods of the human lifespan using members of a prospective study; the 1953 Metropolitan Danish Male Birth Cohort (MDBC-1953) [14,74].
AGING Specifically, following a major revitalization in 2009, research efforts based on MDBC-1953 data has focused on age-related cognitive decline. Here, the main aim is to elucidate why some older adult's cognitive abilities are preserved well into late adulthood, while others demonstrate rapid decline.
In order to optimize the possibility of cognitive ability in a late-midlife being found to be associated with biological (or other) correlates, we exploited the longterm nature of this study to identify individuals with a relatively large decline in cognitive ability from earlyadulthood ("decliners"), and those who show improvement ("improvers"). This standard approach, formally known as the Extreme Groups Design (EGD) [75], also maximizes the subject variability in other relevant factors such as education attainment, occupational complexity and levels of motivation, increasing the generalizability of this study to the real population. That is, in what is otherwise a highly homogenous cohort, the EGD attenuates the commonly observed selection bias of self-selected healthy study samples towards high-functioning and educated individuals. Further benefits of using this cohort are manifold. First, there is a lack of evidence suggesting that pathological change abruptly begins at old age after a period of relative stability. Thus, inclusion of childhood, youth and late midlife cognitive scores in aging studies may be key to predicting later-life health outcomes [8,13,67,68]. Second, a homogenous, late-midlife cohort provides conditions that are optimal for assessing the influence of potential candidate determinants on latelife morbidity without confounding cohort or other age factors. Third, findings from the extant literature exploring brain markers of develop-mental and aging processes have described age-associated changes as "early development in reverse" [42,64,66]. Thus, phenotyping across the human lifespan and not just the extremes of the age-range is optimal when exploring normative or pathological brain aging patterns [9,15,37,68,69,76]. Specifically, if the brain's 'blueprint for aging' has already developed by preschool years, the conservative approach of selecting the oldest of old to expose biomarkers of normative aging is an outdated one [25][26][27]. Lastly, even when data are derived from a longitudinal study, many factors (demographic, lifestyle etc.) that may contribute to the observed heterogeneity in aging trajectories are ignored which ultimately undermines the reliability of the relationships discovered.
Considering this, our primary focus was to explore the factors-general health, vascular, demographic and lifestyle-that contribute to normative brain and cognitive aging. Crucially, we are not interested in just the age-dependence in any individual physiological or behavioral component, but a holistic range of endo-genous and exogenous influences accrued across the lifespan. To achieve this, we distinguish between life course influences that act to preserve health ("positive influences") versus those that are implicated in its demise ("negative influences"). This approach has the potential to identify specific brain and cognitive patterns that may underlie differential aging trajectories, whilst simultaneously exposing the relation of these patterns to a broad range of modifiable risk and protective factors. By modelling multiple variables of multiple modalities simultaneously we provide a more precise estimate of their synergetic effects filling a gap in the existing aging literature. Furthermore, our inclusion of both bivariate and multi-level analyses allows for both specific and general relations to be explored, which are potentially more informative than either approach alone. Specifically, this study goes beyond just investigating the interrelations among a selection of variables; rather, we are seeking specific age-related patterns of brain structure that are associated to sets of correlated cognitive, demographic, health, and behavior variables, as brain-behavior modes of population covariation. Tables 1-4.

Whole-group univariate associations
Higher general cognitive ability in early adulthood (age 20) is significantly associated with greater grey matter (GM) volume in late midlife (age 57). Greater height is significantly associated with higher mode of anisotropy (MO) in the medial lemniscus. Higher general cognitive ability (assessed at ages 20 (IQ-20), 57 (IQ-57) and 63 (IQ-63), and a higher score in the Addenbrooke Cognitive Examination (ACE) are significantly associated with number of years in education.
We visualize results with Manhattan plots that show -log10 p-values for IDP-by-non-IDP correlations, arranged by non-IDPs on the x-axis, multiple testing thresholds across all pairwise associations are marked with a horizontal line (here only familywise error rate (FWE) is indicated as False Discovery Rate (FDR) identified no additional significant result). We identified two significant univariate associations between specific imaging derived phenotypes (IDP) and non-imaging derived phenotypes (non-IDP) both non-adjusted and adjusted for the effects of cognitive change, Figure1 and Supplementary Figure 7A-7B. Specifically, we found that higher scores in early adulthood IQ (IQ-20) is associated with a greater GM volume in the right temporal pole. Additionally, we identified a positive AGING association between height and MO in the medial lemniscus (left). Table 5 lists all FWE-significant cor-relations extracted from Figure 1, (FWE threshold: puncorrected = 5.80 x 10 -5 ).

Validation test: subgroup univariate associations
Results exploring univariate associations for each subgroup separately (i.e., subgroup group A improvers, and subgroup group B decliners) indicated that the significant associations observed at the level of the whole-group where largely driven by measures derived from subgroup A members. This suggests that, despite both groups being relatively homogenous, it is variation in the improvers that are most responsible for the observed effects. Crucially, we did not observe inconsistent effects in the sign of effects that would suggest Simpsons Paradox. Having evidence that the EGD is not inducing paradoxical effects, we can further explore IDP-by-non-IDP relations controlling for changes in IQ.

Univariate associations: adjusting for cognitive change (C∆)
Univariate associations estimating the relation between each of the IDP and non-IDP measures after adjusting for the effects of C∆1 (IQ-57-IQ-20) revealed negligible deviation from the whole-group results visualized in Figure 1. However, when adjusting all variables for the effects of C∆2 (IQ-63-IQ-57), we observed modest attenuation of the IQ-20 and GM volume of the temporal pole (right) association, Supplementary Figure 7A, but negligible change to the association between height and MO in the medial lemniscus. Conversely, we found that adjusting for C∆3 (IQ-20-IQ-11) did not affect the significant link between IQ-20 and GM volume of the temporal pole, but did remove the significant correlation between height and MO in the medial lemniscus. Similarly, when estimating the relationship between each cognitive variable and all (other) non-IDP measures after adjusting for the effects of C∆1 and C∆2 we found negligible changes to the associations visualized in Figure 2. However, adjusting for the effects of C∆3 resulted in the removal of all prior significant associations listed in Table 5.

Whole-group multivariate associations
Results from CCA identified a single statistically significant mode of population co-variation coupling individual cross-subject variations in brain structure to an extensive range of non-imaging measures (Rc = 0.75, permuted p-corrected = 0.01). Post-hoc analyses revealed IQ variables, SEP, psychosocial factors and GM volume as most influential in driving the multivariate correspondence between IDPs and non-IDPs.  AGING underlying structure of the CCA-mode. Of these, civil status (where a 'non-single' status was coded as a positive variable) and smokes (packs per year), were found to be most influential.
With regards to post-hoc correlations computed between IDPs and the CCA-mode, Figure 3B, we identified GM volume of the temporal pole (left and right) as the strongest positively linked brain-imaging marker (r 2 = 29.3%; r =0.54), and axial diffusivity L1 of the posterior limb (right) of the internal capsule as the strongest negatively linked (r 2 = 7.45%; r 2 =-0.35). Further top positive IDP contributions were succeeded by global brain volume measures of GM, white matter (WM) and peripheral cortical GM, a broad range of other GM region-of-interest (ROI) volume measures, and volume of the sub-cortical structure, the thalamus. Similar to non-IDPs, strong negatively contributing IDPs were fewer in number, such that the identified CCA-mode largely related positively contributing brain macrostructural markers (i.e., measuring larger whole-brain GM and WM volume, subcortical structure volumes, or GM volume of ROI) to each other and to range of diffusion MRI (dMRI) tract-based spatial statistics (TBSS) derived microstructural markers. AGING Figures 4A and 4B visualize the importance of each subdomain in influencing the multivariate associations underpinning the significant CCA-mode. In brief, for each subdomain (x-axis), we compute the mean observed-variable-to-CCA-mode correlation across all variables in that subdomain. This value is plotted on the y-axis using units of correlation (r 2 ). Specifically, the length of each bar represents that subdomain's average importance -both positive (grey) and negative (black)to the identified CCA-mode. Similar to the interpretation of post-hoc correlations in Figure 3A, in Figures 4A and 4B we invert the signs of both non-IDP and IDP measures where a lower value is indicative of a "positive" trait/marker and a higher value is representative of a "negative" trait/marker, such as in the case of DTI-derived indices MD, L1, L2, and L3, or cognitive tasks measuring reaction time or number of errors. Thus, positive correlations between a given sub-domain and the CCA-mode represents categoriallydriven contributions from "positive" IDP and non-IDP markers, whilst negative correlations represent categorically-driven contributions pertaining to "negative" traits. With this in mind, the subdomains dominating the underlying CCA-based associations were identified as positive contributions from total brain tissue volume (r 2 =4.41%), GM volume of noncerebellum ROIs (r 2 =3.16%), subcortical structure volumes (r 2 =2.56%), and cognition (r 2 =3.24%). With regards to meaningful links between subdomains, our results identified a mode of population covariation that resembles the general intelligence g-factor (i.e., the observed commonality among observed mental abilities [77]) but which also includes a broad range of other non-cognitive variables describing traits related to biophysical, sociodemographic, lifestyle and health factors. In addition, the identified CCA mode largely AGING distinguishes between subject performance in the various measures included, allowing their relative contribution to the CCA-mode to be described as either positive or negative. In this regard, the identified mode can be represented by a "positive-negative" axis that links positive measures of cognitive performance to each other and to a meaningful pattern of other (nonimaging) measures (e.g., better performance in cognitive tests, higher educational attainment, regular physical activity, higher SEP vs measures of lower cognitive performance, lower education attainment, physical inactivity and poorer health status etc.).
We explored the multivariate results in a number of ways to establish that the estimated CCA-mode was not unduly influenced by the EGD. First, a scatterplot of the IDP and non-IDP canonical variates, with group membership indicated by plotting symbol, showed no evidence of clustering (Supplementary Figure 9). Next, we computed post-hoc correlations for each subgroup separately. Here the subgroup analysis identified similarities in the top contributing IDP and non-IDP measures to the significant CCA-mode, Supplementary Table 2. Supplementary Figure 10 also presents a stratified version of the variance explained plots shown in Figure 4, again showing a generally similar pattern of contribution of each subdomain in the total, and two subgroups. Lastly, in order to further assess the CCA similarity in the subgroup analysis, we also provide a coefficient of factor congruence between groups for IDPs and non-IDPs. Here we identified a congruence value of 0.71 for non-IDPs and 0.75 for IDPs between groups. Table 6 presents the overlap between bivariate and multi-levels findings.

DISCUSSION
This study identified significant associations that linked multiple measures of brain neurostructure to an extensive battery of behavioral measures derived from the MDBC-1953 dataset. Specifically, this battery includes measures from several cognition domains, demographic, health and lifestyle factors. The results indicate that variance is shared across many aspects of behavior, and that this covariation may be of importance to the variability observed in brain structure. Fundamentally, this finding offers a better understanding of the types of variability that may exist across healthy aging individuals, as well as the factors that may underlie the observed differences. Specifically, using post-hoc correlations, we mapped the original behavioral and brain-imaging measures onto the significant CCA-mode of population covariation. Here, we provide a measure of importance of each observed variable in maximizing the relation between multiple brain structural and behavioral variables. The results indicated that the CCA-based associations were driven largely by cognitive ability, early-life SEP, psychosocial factors, and whole brain (GM and WM) tissue volume. The ability to identify individual indicators that may be of significance to the variability observed in healthy aging trajectories is extremely valuable to age-related biomarker research and warrants further investtigation.
Many aging studies apply statistical models that are unable to consider the cumulative effect of various lifespan experiences on an outcome variable of interest. As age-related change is a continuous process, studies focusing on specific candidate determinants (e.g. GM volume, BMI, smoking status, amyloid load) limit the possibility of discovering new and unforeseen relations.
To avoid such partial interpretations and expand on what is currently known about differential aging trajectories, we employed CCA, a multivariate technique to seek patterns of covariation between two sets of measures -i.e. IDPs and non-IDPs -simultaneously. A strength of this approach is that it boosts power by implicating the full dataset to investigate our main aimthe relation of brain structural markers with a set of behavioral measures and to further evaluate these relations with respect to a broad range of modifiable risk and protective factors. Analogous to the positivenegative mode previously identified in the HCP data [78,79], we identified a single significant mode of population covariation largely linking multiple brain  AGING imaging markers to (other) behavioral measures (r= 0.75, p-corrected = 0.01). Specifically, CCA coupled higher scores in general intelligence (IQ-20 and IQ-11), attention with working memory load (RVP), explicit verbal memory (15-Word Pairs Retention), global cognitive functioning (MMSE, ACE), executive functioning (SOC), reaction time (RTI), psychosocial well-being (MDI), self-reported prevalence of depression, and paternal SEP with larger total brain volume (GM and WM), larger volume of subcortical brain structures, and a range of WM microstructural brain indices.
To our knowledge, this work is the first to underscore psychosocial well-being and early-life conditions as significant contributors to the strength of brain and behavior associations observed in late-midlife. However, the discovery that individual variations in early-life SEP and psychosocial health may be most predictive of brain health and cognitive ability in laterlife is consistent with the concept of a socioeconomic gradient in health which reflects how the most disadvantaged groups are also those at increased risk of disease [80,81]. Previous studies also highlighting the significance of social inequality in disease progression AGING and premature mortality have attributed their observations to unknown influences termed "psychosocial factors" (i.e. hostility, depression, hopelessness), that act as mediators of SEP effects on later-life heath outcome [12,23,[82][83][84][85][86]. Furthermore, it has been suggested that many of these overlooked measures may act independently of common vascular risk factors (VRFs) and non-communicable diseases but are still Figure 6. A revised conceptual model of the Scaffolding Theory of Aging and Cognition (STAC-r). The original STAC model offered a theory that only accounted for individual variations in cognitive performance observed at one time point -later adulthood, with "aging" as the primary input to the model. In view of the latest developments within the field of aging research, the contribution of lifespan experiences (positive and negative) on later-life brain and cognitive health has led to a variant of the model that incorporates the cumulative effects of lifespan environmental and biological experiences on neurocognitive aging. In this adapted figure, we only present biomarkers and traits investigated in this study. Specifically, Figure 6 offers a pathway of how the additive bidirectional lifespan experiences may contribute to the differential age-related trajectories of health. We distinguish between influences that act to preserve neural integrity ("positive influences") versus those that are implicated in its demise ("negative influences"). AGING somehow 'entangled' within one's SEP [82]. Finally, although smoking is a well-established risk factor for brain and cognitive decline [17,23,33,87], this was not observed in our study sample. The lack of significant effects may be explained by our moderately sized sample, errors of measurement, or the cerebral and/or cognitive benefits of giving up smoking during adulthood [6]. Notably, as we do not quantify the number of previous smokers in this study, we remain cautious about inferring on the health benefits of smoking cessation on later-life health outcome.
This study also identified top contributions from brain structural measures (i.e. brain biomarkers specifically important in maximizing the CCA-based associations) to cognitive performance that are in agreement with previous reports [8,10,37,38,56,60,64,67,76]. Specifically, we identified spatially diverse brain influences pertaining to temporal, frontal, occipital, forebrain volumes to variations in specific cognitive domains, verbal learning and memory, attention, reaction time, executive function, and global cognitive performance. Specifically, the strong contributions observed from the fronto-temporal cortices are in line with the idea of compensatory neural mechanisms that are recruited to provide additional support in response to age-related WM deterioration [10,13,15,60,67,88]. As many cognitive abilities evidence of age-effects by early to late midlife [68], it is plausible that the positive total brain volume contributions reflect neurostructural strategies that seek to maintain homeostatic cognitive function in the face of age-related cognitive decline. This finding is also consistent with several other conceptual models of cognitive aging [69,88,89]. However, although, these models predominantly describe functional recruitment in response to age-related neural insults, continuous task demands and functional deterioration, it has also been expanded to include structural changes and neurogenesis (e.g. in the study of structural changes in the hippocampi of London taxicab drivers [90]). Specifically, in this present study we identified a pattern of WM microstructure that was characterized by decreased directional coherence (i.e., decreased FA and MO), and increased mean diffusivity (increased MD, ax, and rad) in WM tracts that typically demonstrate increased vulnerability to age-effects (e.g. superior frontooccipital fasciculus (SFOF), the superior corona radiata, superior longitudinal fasciculus, body of the corpus callosum, crus fornicis and stria terminalis (FC/ST), and sagittal stratum [40,91,92]. Guided by prior research reporting histological-DTI relations, the WM diffusivity patterns observed are consistent with the "last in, first out" hypothesis and may reflect age-related neurobiological processes such as chronic WM degeneration, demyelination, secondary Wallerian degeneration, or gliosis [25].
In age-or disease-related neurodegeneration, increases in MD (a sensitive but unspecific marker of cellular degeneration) are commonly linked to reduced FA and MO (reflecting an overall decrease in WM tract organization). However, we note that our results did not always identify such patterns of relations. In fact, previous studies have suggested that the relationship between DTI-derived summary measures may be region-dependent, and may not, especially in the case of older adults or patient groups, uniformly link decreases in anisotropic diffusion to increases in mean diffusion [92]. Furthermore, we found that the component measures, rad and ax, demonstrated more extensive relations to non-IDP measures than their derivatives (FA, MO and MD). This finding has also been highlighted in previous studies where MD and other directional diffusivities, but not FA, presented strong relations to cognitive function [91]. In general, we found that the inclusion of component diffusivity measures serve as useful additional sources of information regarding WM integrity. However, we note that in regions of crossing fibers the interpretation of diffusion MRI measures should be made with respect to the underlying WM anatomy and the differential neuropathology of WM tracts [93]. The importance of this was demonstrated in an earlier imaging-study comparing changes in DTI-derived summary measures in areas of crossing fibres pertaining to data from Alzheimer disease (AD), MCI, and healthy aging subjects [93]. Specifically, this group discovered atypical increases in FA and MO in the MCI group compared to the healthy aging controls. Upon further examination using probabilistic tractography, they attributed the unusual rise in mode and fractional anisotropy to a region of crossing fibres in the centrum semiovale. Here, they found that the relative sparing of motor-related projection fibres that traverse the agesensitive cognitive-related association fibres of the superior longitudinal fasciculus (SLF) artificially inflated FA and MO in the direction of the newly assigned primary eigenvector. Specifically, this finding provides a plausible explanation for the unexpected increases in FA and MO in conditions where the opposite is expected (i.e. decreases in anisotropic diffusion in disease conditions), underscoring the importance of interpreting DTI measures with respect to the underling WM microstructure. Consistent with the findings of Douaud et al., we also discovered linked increases in FA and MO in brain regions where mean diffusivity measures were also increased. Considering the late-midlife age-range of participants -one where negative age-related effects on cognitive ability and brain structure are commonly observed [68] -it is most AGING likely that our findings also evidence very early pathogenic alterations of WM in complex brain regions such as crossing fibres. Similarly, we note that findings of increased FA and MO did not always extend to coupled increases in axial ( ax = 1) and radial ( rad = 2, 3) diffusivity as would be expected in a healthy study samples. However, as the interpretation of diffusion anisotropic indices -FA and MO -in regions of crossing fibres are extremely complicated, it is likely that the interpretation of eigenvalues is equally convoluted and warrants further investigation.
Other factors that may explain the relations between variables that were in the opposite direction to what one would expect includes an insufficient lag time between changes in a brain biomarker, and their presumed effect on another variable(s). That is, the rate of effect of WM changes on a hypothesized effect variable (e.g., cognitive function) may not be instantaneous. In this case, future investigations that employ lead-lag analyses -using more than two assessments -are necessary to ensure that the observation interval harmonizes with the timing of the critical events. Alternatively, similar to findings from animal aging models [2], we also consider the possibility that the observed increases in regional brain volume may instead reflect pathological processes such as gliosis or defective elimination of byproducts [76], and not compensatory neural mechanisms. In this regard, the link between positively contributing total brain volume measures and predominately negatively contributing dMRI indices would be largely explained. Lastly, previous studies have shown that the GM and WM contrast immediately below and above the WM surface is markedly reduced in healthy aging participants [94]. It is thus a possibility that methodological procedures concerning segmentation of brain tissue could have led to an overestimation of GM content and underestimation of WM content further accounting for the anti-correlation observed between the two tissues.
Univariate IDP-by-non-IDP associations that survived correction for multiple testing were few; however, those identified were in agreement with previous studies that also linked better cognitive ability to dispersed brain patterns of regional influence [6,9,10,17,57,62,76,[95][96][97][98]. Specifically, we found a positive association between GM volume of the temporal pole and performance in a test of intelligence measured at age ~20 years (IQ-20). We also report a significant positive association between MO in the medial lemniscus and height. However, although previous studies have similarly identified a positive link between height and intact WM integrity [99], we also consider alternative interpretations that may have produced similar findings. Namely, as described in the case of [93], the observed co-localized increases in MO and FA in the medial lemniscus may indicate evidence of selective degeneration of secondary WM tracts which result in the "spared" fibers of this sub-region establishing a newly acclaimed primary eigenvector, and with it a misleading increase in both FA and MO [40].
We also identified few pairwise cognition-vs-all-(other)-non-IDPs associations that survived thresholds for multiple testing. Of these, the results confirm the well-acknowledged positive relations between cognitive ability (measured by IQ tests at ages 20, 57 and 63 and Addenbrookes Cognitive Examination at age 57) and educational achievement [100]. Interestingly, the results also revealed a strong positive link between IQ and height. Previous studies that have similarly identified links between intelligence and height have indicated that height may be a useful proxy marker for adverse early-life conditions and increased later-life dementia risk [101,102]. Although notably, the effect of genetic factors on height, IQ, and early life environmental conditions should also be considered. Finally, the results of validation tests regarding the effect of EGD on univariate associations showed the significant relations observed at the whole-group level were not driven by group average differences (and hence not by the extreme-group-design either). However, we found the whole-group associations were mainly representative of group A members -the "improvers" and not group B members -the "decliners". Although a similar trend of relations does exist for group B members we speculate that at present the effect sizes do not meet the threshold for 'discovery'.
Next, we explored univariate IDP-by-non-IDP associations after adjusting for change in IQ pertaining to both early-life and late-midlife. We found that IDPnon-IDP associations were largely unchanged when adjusting for the effect of C∆1 (IQ-57-IQ-20). However, adjusting for the effects of C∆2 (IQ-63-IQ-57) and C∆3 (IQ-20-IQ-11) revealed modest changes, Supplementary Figure 7A-7B. As early life intelligence is reputed to -in part -drive many of the observed relations between a "variable X versus cognitive performance" -an example of reverse causation [9,103,104] -we anticipated that adjusting for the effects of C∆3 may have attenuated or removed the significant relationship between IQ-20 and GM volume of the temporal pole. However, this was not observed, and the IQ-20-temporal-pole association remained significant. Conversely, we found that adjusting for the effects of C∆3 removed the association between height and MO in the medial lemniscus suggesting collinearity between C∆3 and one or more of the other measures in the model. As the association between height, brain and intelligence is largely attributed to shared genetic AGING influences [105], the present findings should not be directly interpreted as evidence for change in early-life intelligence or early-life conditions affecting the heightbrain relationship, but perhaps the effect of genetic factors that are interacting with change in cognitive function and/or environmental influences. Similarly, adjusting cognition-by-all-non-IDP correlations, Figure  2, for the effects of C∆1 and C∆2 resulted in negligible changes. However, accounting for the effects of C∆3 removed all prior significant correlations, Supplementary Figure 7C. Here, our results indicate that early-life cognitive change may play a role in the positive association identified between general cognitive ability and education. However, similar to the significant IDP-non-IDP associations, there are numerous likely causes of the IQ and education achievement association which principally concern variation in genetic profiles, and shared and non-shared environmental factors -with the former influential in the link between IQ and education attainment, and the latter important in the differences between them [106].
Notably, in this study we implemented two distinct methods for calculating cognitive change: 1. "raw difference scores" (RDS) formed by subtracting posttest scores from pre-test scores and 2. "residualized change approach" (RCA) which differences the observed score at follow-up (i.e., IQ-57) from the predicted score X2 (e.g., X2 is predicted with a linear regression analysis of the follow-up score on the observed score at baseline (i.e., IQ-20)) to produce a X2 residualized with respect to baseline. Specifically, the RDS approach was used to assess the effect of agerelated changes in IQ across distinct time periods on brain-behavior relations, whilst the latter, RCA, was used to determine the subjects selected for this present study. Despite the initial warnings against the use of change scores [107], it has since been demonstrated that in some cases, analysis of change scores may provide an equal or even superior approach to exploring change over time. Namely, in investigations that utilize a randomized multivariate pretest-posttest design, or in studies that are vulnerable to confounding by responseshift effects (e.g. response contamination in self-report measures) [108,109]. Nonetheless, we briefly list the main objections against the use of difference scores in exploring change over time: 1. Difference scores are based on "imperfectly" measured pre-and posttest scores [109]. This imperfect reliability is commonly attributed to varying learning, personal or environmental influences that may affect the outcome measure at each sitting differentially. Thus, if difference scores are a combination of true change and change in any random error of measurement, the analysis of change scores may also be contaminated by these errors. 2. Raw differences tend to be (by construction) negatively correlated with baseline measures, potentially confounding the true relationship between the two measures of interest. 3. Raw difference scores are vulnerable to the well-known but poorly understood statistical artefact: regression toward the mean [110]. This phenomenon proposes that due to errors of measurement, an extreme score at baseline will be succeeded by a less extreme score at follow-up (i.e. one that is moving toward to the overall mean). Thus, difference scores originating from imperfectly observed measures will inherently provide an erroneous representation of "real" change, and 4. The discovery of spurious correlations that are due to active pre-existing differences at baseline [111].
Lastly, as the MDBC-1953 is a narrow-aged birth cohort, we are unable to compute moderation analyses to explore whether the significant relations identified in this study vary as a function of age, or if they demonstrate stability throughout the lifespan. However, a strength of the present study resides in its use of the EGD. That is, compared to traditional aging studies that are typically biased towards higher educated and better cognitively performing participants, the EGD increases the diversity in participant characteristics and with it the likelihood of discovering relations between variables that are more representative of the true population. Historically, the EGD -a common sampling procedure -has been used to accentuate statistical power of linear associations, facilitate the task of fitting a trend line to data, and reduce the costs associated with examining data from the full range of a variable [75]. In this specific study, the EGD was applied for two main reasons: First, to ensure that change in cognitive ability from early-adulthood to late midlife was sufficient to detect biological correlations. In this regard, the EGD is particularly useful in enhancing the variability in a measure of interest when only modest changes are expected -which is often the case in moderately sized, homogenous, healthy samples like the MDBC-1953. And second, by sampling subjects from the extremes of the change-in-IQ distribution we increased the crosssubject variability in cognitive change and other related variables (e.g. education level, occupation complexity, levels of motivation) ensuring that our sample was not biased towards higher educated, better cognitively performing, and motivated participants. Crucially however, the EGD does not undermine the validity of statistical significance: if the selection variable or related variables truly explain no variance, nominal false positive rates will be obtained. The approach can, however, bias parameter estimates of the selection variable or linked variables. In view of this, we ran several validation tests to explore the extent to which using this approach may have biased our findings. However, the result of the validation tests indicated that AGING the significant relations observed at the whole-group level were not driven by group average differences (and hence not by the extreme-group-design either). Nevertheless, like all methods, the EGD is not without its own limitations, namely: artificial inflation of standardized effect sizes, power-enhancement for analyses of linear associations, increased vulnerability of extreme scores to the regression toward the mean phenomenon, low test-retest reliability, and un-suitability for testing nonlinear associations [75]. However, as the primary goal of this study was to detect brain-behavior relations and the extent to which they may be influenced by multiple, diverse aging-related covariates, we do not overstate the resultant effect-sizes, nor do we infer on how these relations may fluctuate over time. Rather, findings from this study can be used as a foundation for subsequent analyses exploring correlates of differential healthy aging trajectories. Furthermore, as the extant literature on healthy aging converge on findings that describe near linear declines in brain volume and cognitive ability we believe the limitation of the EGD to explore non-linear trends and associations therein is not a major shortcoming in this study.

CONCLUSION
With the pressing personal and economic challenges presented by "top heavy" societies, this study offers a valuable approach for discovering potential age-related markers of early brain and behavior changes. Specifically, using data from a homogenous longitudinal prospective study, we report significant associations that link broad-brush and specific brain and cognitive measures to each other and to a range of heterogeneous age-related covariates. Specifically, CCA identified individual variations in brain structural patterns that were not only significantly associated to each other but also related to individual differences in behavior. Future longitudinal studies using a larger sample size, > 2 measurement occasions, and a broader selection of potentially relevant age-related covariates (e.g. fMRI data [112], markers of oxidative stress [23], inflammatory processes [113], immuno-senescence [114], telomere attrition [115], hormonal dysregulation [10], and brain metabolites [10]) are likely to explain larger portions of the unexplained variance in healthy aging trajectories, ultimately improving early intervention targets and with it, the quality of life for older adults.

Participants: extreme group design (n=1,985)
Details regarding the subject selection criteria used for the imaging sub-study have been previously reported [112]. This study has also been registered at clinicaltrials.gov (NCT03290040). In summary, using youth and late midlife intelligence quotient (IQ) scores, subjects were selected based on their estimated change in mental ability as part of an "extreme group design" (EGD) [116]. Specifically, the two well-validated tests, the Børge Priens Test (BP) [117] and Intelligenz-Struktur-Test 2000 R (IST) [118] were taken at ages ~20 (IQ-20) and ~57 (IQ-57) respectively. The BP test was used as part of a military draft board assessment on a total of n=11,532 MDBC-1953 subjects, whilst the latter reassessment test, IST, was administered in 2009-2011 by the Copenhagen Ageing and late midlife Biobank Project (CAMB) and included n=1,985 members [14]. Both examinations comprise subtests that assess aspects of verbal and arithmetic intelligence (e.g. numerical series and verbal analogies), and thus are similarly structured and comparable. Since the cognitive change between these time-points was based on two different instruments, a change score was derived with a linear regression analysis of IQ-57 (IST-2000 R) on IQ-20 (BP) using the whole population of 1,985 CAMB subjects. IQ-20 explained R 2 =50.4% of variance in IQ-57 (beta=0.71, p<0.0001), and we used each subject's standardized residual about the regression line as a measure of their change in IQ across time, Supplementary Figure 1. To avoid the effects of extreme test scores, subjects with absolute standardized residuals ±3 were omitted. Finally, the remaining members were classified into two subgroups pertaining to the degree of cognitive change observed from earlyadulthood: subgroup A = improvers and subgroup B = decliners.

Participants: present study (n=193)
Acquisition of imaging and non-imaging data for this study was carried out in 2010-2013 (subject age 57±0.8 years). During this period, a total number of n=552 subjects were invited to participate, and of these, n=243 accepted their invitation and proceeded to the data acquisition stage. Here, subjects suffering from alcohol or drug abuse comorbid with cognitive impairment, psychiatric or neurological disease, and contraindications to MRI were identified and eliminated from further investigation (n=36). Out of the remaining eligible respondents, a further 14 subjects were removed due to imaging-related contraindications or having T1weighted (T1w) structural brain images that were unusable, leaving a total of n=193 subjects data that were used in this present study (subgroup A: n=95; subgroup B: n=98), Figure 5. Although there was a small range of ages during data acquisition, in general we refer to the ages of participants as 11 (W-11), 20 (W-20), 57 (W-57), and 63 (W-63) years. Data pertaining to W-63 (i.e. the second late-midlife data AGING sweep) includes both brain imaging and behavioral data and is subject to a subsequent report. This study was approved by the local ethical committee (De Videnskabsetiske Komiteer for Region Hovedstaden) and registered by the Danish Data Protection Agency. All participants provided written informed consent.

Neuropsychological assessment
An in-depth series of neuropsychological tests were administered on the same day as the brain-MRI acquisition. Global cognitive function was assessed with the mini-mental state examination (MMSE) and Addenbrooke's cognitive examination (ACE). The Cambridge Neuropsychological Test Automated Battery (CANTAB) was administered to evaluate cognitive ability across the following cognitive domains: learning and memory (spatial and pattern recognition, paired associates learning), executive function (planning), and attention and reaction time [119]. Furthermore, we include measures of intelligence acquired at W-11 (IQ-11) and W-63 (IQ-63). IQ-11 was assessed using the Härnquist test battery and consists of three subtests evaluating spatial, numerical and verbal intelligence [120]. Similar, to IQ-57, IQ-63 was also assessed using the IST 2000R test at the latest 5-year late-midlife follow up. Although we are unaware of any studies that evaluate the validity of the Härnquist test, previous studies using subjects from the MDBC-1953 have reported strong correlations between measures of IQ-11, IQ-20 and IQ-57 [14]. Furthermore, the BP test has previously shown strong correlations with the Wechsler Adult Intelligence Scale [121]. In total, we include 31 measures of cognitive performance. See Table 1 for a list of these variables and the study sample characteristics.

Demographic, health and lifestyle assessment
We evaluate the effect of environmental, lifestyle behaviors and biological factors, both positive and negative, on individual differences observed in brain structure and cognitive performance. These include: Major Depression Inventory (MDI) [122], Pittsburgh Sleep Quality Index (PSQI) [123], Multidimensional Fatigue Inventory (MFI-20) [124], and a range of demographic (social and biological), vascular risk factors (VRFs), general health and biological samples for biomarker analyses. In total, we report 39 measures of other (non-imaging) variables, which together with the neuropsychological data described above are referred to as non-imaging derived phenotypes (non-IDPs) or behavioral data. Finally, to assist the interpretation of results, non-IDPs were divided into four subdomains: cognitive, demographic (social and biological), health and lifestyle. Tables 1-4 provide a list of these variables, how they were assessed and study sample characteristics.

MRI data acquisition
All subjects underwent whole-brain MRI scanning using a 3.0 T Philips Intera Achieva (Philips Medical Systems, Best, the Netherlands), with a 32-channel phased-array head coil. During resting conditions the following sequences were acquired in all participants: 1. anatomical high-resolution 3D T1-weighted (T1w) images using a gradient echo sequence (TR/TE = utilizing a single spin-echo echo-planar imaging sequence. For each dMRI scan, 33 images were acquired: 1 image with no diffusion sensitization (b=0 image), and 32 diffusion-weighted images (b = 1000s/mm 2 ). Using varying orthogonal views, all T1w, T2w, T2w-FLAIR and dMRI images were visually inspected in their raw state for bias field corruption, excessive motion, and other potential artifacts. Lastly, although not used in this study, the MRI protocol also includes task and resting functional MRI (fMRI) imaging.

Image analysis pipeline
In order to facilitate future meta-analyses and replication studies, we used the UKB image processing pipeline on our raw (non-processed) imaging data [125]. A strength of this pipeline is that it ameliorates the loss of statistical power by reducing the number of multiple comparisons made (compared with voxel-wise testing) and increasing signal-to-noise ratio (SNR) by replacing voxel-wise measures with ROI (region of interest) averages obtained through alignment to a standard coordinate system. For a detailed overview of the imaging pipeline see [126].
In brief, we extracted 453 brain-imaging biomarkers (i.e., a parsimonious set of biologically meaningful measures derived from multiple imaging modalities) that best capture the differential aging processes and neuropathology observed in a healthy aging population. Subsequently, we categorized the extracted summary measures into 5 groups (T1w-SIENAX, T1w-FIRST, T1w-FAST, T2w-FLAIR-BIANCA, and dMRI-TBSS) AGING to reflect the MRI modality and image processing tool applied to estimate each measure, Table 7. Specifically, we used T1w structural data as the reference image to calculate cross-subject and cross-modality alignments required to process all other brain modalities. Additionally, T1w data is used to obtain estimates of both specific brain structures (primarily sub-cortically), and volumes of major tissue types of the whole brain (i.e., grey matter (GM), white matter (WM) and ventricular cerebrospinal fluid (CSF)), which is subsequently used to provide robust biomarkers of global and local brain atrophy [95]. See Supplementary Material ("T1w Pipeline") and Supplementary Figure 2 for further details. Next, we use T2w-FLAIR imaging to derive estimates of focal hyperintensities in WM which are typically indicative of inflammatory disease, cerebrovascular pathology, demyelination, trauma, or other neural insults [36,127]. We also use dMRI to provide an index of the density and structural integrity of WM microstructure in vivo (e.g., neurites and axonal structures) based on the principle of random diffusion of water molecules within cellular compartments [40,128]. Specifically, it has been shown that dMRI may be able to measure changes in normal appearing WM (NAWM) before these can be identified using more traditional MRI techniques e.g. T1w or T2w [39,76,93,129]. Lastly, in a previous sub-study, phase contrast mapping (PCM) was utilized to measure volume flow in basilar and the internal carotid arteries. Here, total blood flow to brain size was normalized and used to derive mean global cerebral blood flow [112,130]. For a detailed overview of the image processing pipeline refer to Supplementary Material and Figures 2-4.

Statistical analysis
To investigate whether brain IDPs contain relevant between-subject information pertaining to sets of other List of imaging modalities and processing tools employed (column 1 and 2), followed by the corresponding function of each processing tool (column 3), description of the brain measure(s) extracted (column 4), and total number of IDPs estimated (column 5) using the UK Biobank (UKB) image-processing pipeline. AGING (non-imaging) MDBC-1953 variables, we used both univariate (Pearson correlations) and multivariate statistics (canonical correlation analysis (CCA) [131], both analyses adjusted for a number of confound variables. Figure 6 visualizes the selection of candidate determinants included in this study and a hypothesized pathway of how their additive bidirectional effects may lead to an alteration in brain and behavior. In this study, we distinguish between influences that may act to preserve neural integrity (positive influences) versus those that are implicated in its demise (negative influences).

Whole-group adjusted univariate associations
We applied Pearson's correlations to study the relation between each of the 453 brain IDPs to each of the 70 non-IDP variables and each of the 31 cognitive measures to each of the (other) 39 non-IDP variables extracted from the MDBC database (full set of IDP x non-IDP estimates: 453 x 70; full set of cognitive x all (other) non-IDP estimates: 31 x 39). The motivation behind this approach was to first identify significant simple pairwise associations between each brain and non-brain measure and each cognitive and other non-IDP measure, before comparing these results to those from more complicated multivariate techniques accounttable for the synergistic effect of multiple variables.
In total, two variants of Manhattan plots are used to display the significance (-log10 P-values) of Pearson's correlations for IDPs x non-IDPs (31,710 values) and cognition x all (other) non-IDPs (1209 values).
To reduce the influence of potential outliers and increase the reliability of associations made, we applied rank-based inverse Gaussian transformation (quantile normalization) to enforce Gassianity for each of the IDPs, non-IDPs, and confound variables (see below). For feeding into CCA (but not needed for univariate associations), we then applied an iterative PCA algorithm (based on the soft shrinkage of eigenvalues) to impute missing data values until convergence [132]. Finally, four confound variables were created relating to effects that may trouble the interpretation of computed correlations: absolute motion during MRI, relative motion during MRI, head size, and age. The confounds were regressed out of all IDP and non-IDP variables. To account for multiplicity, we assessed the strength of significance against two different types of multiple testing correction, controlling the familywise error rate (FWE) via Bonferroni and the false discovery rate (FDR) [133]. However, as FDR correction did not identify any additional tests already marked significant by Bonferroni, Figure 1 and Supplementary Figure 7A-7B only show Bonferroni correction. Figure 7. Population continuum. CCA estimates a "population continuum" of covariation across subjects that jointly characterizes brain imaging and other (non-imaging) data through a single axis. We can describe each subject's relation to this axis by assessing the value and polarity assigned to their individual CCA-derived subject weight and further evaluating how this value is related to the observed IDP and non-IDP measures. In this example, subjects with negative subject weights are characterized by high total cholesterol levels and low total brain volume, whereas subjects assigned with positive subject weights are characterized by low total cholesterol levels and higher total brain volume.

Validation test: subgroup univariate associations
In order to assess the impact of the EGD, we separately compute univariate associations between each IDP and non-IDP measure for subgroup A ("improvers") and subgroup B ("decliners") subjects. Each subanalysis should be free of any spurious associations driven by average group differences in cognitive level (i.e., an example of Simpson's paradox [134], whereby suboptimal pooling across variables such as cognitive level can potentially generate misleading associations).

Whole-group univariate associations: adjusted for change in IQ (C∆)
We compute univariate associations between each IDP and non-IDP measure and between each cognitive and all (other) non-IDP measures after adjusting for change in normalized IQ score pertaining to cognitive tests administered during early-life, youth and late-midlife. Specifically, change in IQ was estimated using the raw difference score (RDS) approach (i.e. subtracting normalized post-test scores from normalized pre-test scores). Here we explore the effect of cognitive change on brain-behavior univariate relations with a specific focus on how individual differences in cognitive change may contribute to the brain-cognition relations observed in late-midlife. The three cognitive change variables estimated include: C∆1 = IQ-57 -IQ-20, C∆2 = IQ-63 -IQ-57, and C∆3 = IQ-20 -IQ-11.

Whole-group adjusted multivariate associations
To capture patterns of age-associated relations in multiple IDP and non-IDP measures simultaneously we applied CCA (Figure 7).
For the CCA, we adopted a similar approach as described previously [78,125]. In short, CCA was computed (canoncorr; MATLAB 2014a) following the model: U = AX and V = BY; where X represents the set of IDPs, Y is the set of non-IDPs, and A and B are optimized to maximize the correlation between each canonical variate pair, U and V [131]. The magnitude of the relationship between each variate pair is reflected by the canonical correlation coefficient (Rc), an indicator of how strongly the estimate of population covariation is reflected in both IDP and non-IDPs datasets. Intuitively, we can think of CCA as identifying two latent variables, Ui and Vi, from a specific linear combination of weighted MRI-derived brain measures that are most strongly associated to a specific linear combination of weighted non-imaging measures, Supplementary Figure  5A-5B.
IDP and non-IDP datasets for CCA analysis were prepared using the procedure described in "Whole-Group Adjusted Univariate Associations". This resulted in a brain-IDP matrix of size 193 x 453 (subjects × IDPs) and a non-IDP matrix of size 193 × 70 (subjects x non-IDPs). Typically, these datasets are the inputs fed into the CCA algorithm. However, to reduce overfitting (i.e., tending towards a rank-deficient CCA solution), prior to CCA we separately reduced the dimensionality of each dataset using PCA. Specifically, after accounting for missing data as before, we compressed the size of each matrix along the respective phenotype dimension to the top 30 subject-eigenvectors which accounted for ~75% of the total variance in our datasets (70.1% for IDPs, 77.6% for non-IDPs). The final dimension of each matrix fed into CCA was therefore 193 x 30 (subjects x PCA-derived components), with an output of 30 CCA modes estimated.
Statistical significance of the modes estimated was determined using 10,000 permutations of rows of one matrix relative to another. CCA was then re-run after each permutation and the respective r-values for each permuted CCA mode was estimated. Each observed canonical correlation r is compared to the null permutation distribution of the largest canonical correlation, creating family wise error p-values corrected for searching over all 30 canonical correlation dimensions.

Post-hoc correlations
To relate the CCA mode estimated back to the observed IDP and non-IDP variables, we perform post-hoc correlations. This is achieved by computing the correlation between each original (observed) variable and the CCA-derived canonical variate weights (U or V), Supplementary Figure 6. This approach is analogous to the computation of factor loadings in factor analysis, and are also known as canonical structure coefficients. Generally, variables with larger loadings indicate greater association with a CCA mode. Finally, to formally assess the degree of similarity in the CCA subgroup analysis, we provide a coefficient of factor congruence between groups (i.e. improvers vs decliners) for the IDPs and non-IDPs.

AUTHOR CONTRIBUTIONS
KZ, SS and TN contributed towards analysis and interpretation of the data and drafting and revising the original manuscript for intellectual content. FA contributed towards analysis of the imaging data. ML contributed towards study direction and coordination of project. BF and ER contributed towards study design and review and editing of manuscript.