Discovering correlates of age-related decline in a healthy late-midlife male birth cohort

Studies exploring age-related brain and cognitive change have identified substantial heterogeneity among individuals, but the underlying reasons for the differential trajectories remain largely unknown. We investigated cross-sectional and longitudinal associations between brain imaging-derived phenotypes (IDPs) and cognitive ability, and how these relations may be modified by common risk and protective factors. Participants were recruited from the 1953 Danish Male Birth Cohort (N=123), a longitudinal study of cognitive and brain ageing. Childhood IQ and socio-demographic factors are available for these participants, who have been assessed regularly on multiple IDPs and behavioural factors in midlife. Using Pearson correlations and canonical correlation analysis (CCA), we explored the relation between 454 IDPs and 114 behavioural variables. CCA identified a single mode of population covariation coupling cross-subject longitudinal changes in brain structure to changes in cognitive performance and to a range of age-related covariates (r=0.92, P corrected < 0.001). Specifically, this CCA-mode indicated that decreases in IQ and in performance on speed-assessed tasks, higher rates of familial myocardial infarct, less physical activity, and poorer mental health are associated with larger decreases in whole brain grey matter and white matter. We found no evidence supporting the role of baseline scores as predictors of impending brain and behavioural change in late-midlife.


INTRODUCTION
Among the many challenges presented by an increasingly "top heavy" Western society, the cases of cognitive decline and the accompanying economic and social demands have never been more apparent [1,2]. It is well-established that the ageing brain undergoes major structural and functional changes which, even in the absence of disease, are related to decline in specific cognitive domains [3-9]. Furthermore, it has been shown that on an individual basis there is significant variability in the trajectories of brain and cognitive change, with a small proportion of the population demonstrating "healthy" or "successful" ageing well into old age [10]. However, the reasons underlying the observed variability are not well understood [11]. Equally, evidence linking longitudinal brain-cognitive changes to each other and to possible health-related lifestyle behaviours and environmental influences is limited and inconsistent. Thus, in the present study, our first aim was to explore brain-cognitive longitudinal relations. Our second aim was to identify potential risk and protective factors that may contribute to the individual variations observed in later-life brain structure and cognitive functioning.
A review of the cumulative research assessing the interdependence of age-related (brain-behaviour) changes in non-pathological conditions found the evidence unclear [9]. Although coupled changes are commonly described in the direction of advancing age, less intact brain structures, greater brain structural degradation and lower cognitive ability [12-14], associations that oppose this common course of age-related change are also reported [15,16]. Examples include correlations that link larger initial brain volume and greater negative (shrinkage) change [17], higher early-life IQ and greater decline in visuospatial abilities [18], higher crystallized abilities and greater brain volume reduction and thinning of the cerebral cortex [19], and smaller baseline regional brain volume and a moderate (gradual) rate of decline [20]. The varying and at times contradictory findings have been attributed to many factors, namely variations in sample characteristics, small sample size, short observation intervals, inclusion of subjects with undiagnosed pathology, modest within-subject change, and between-subject differences in change.
However, as the study of ageing is fundamentally the study of change, the use of cross-sectional data based on single observations from individuals of different ages is not an ideal study design. Specifically, cross-sectional studies can only offer information on age-related individual differences in level, and not individual differences in change. The problem here is one of aggregation, in so far as pooling data across age-groups may result in misleading "illusory associations" that are in fact based on average age differences (i.e. an example of Simpson's paradox and Lord's paradox [43,44]). Thus, although ideal for estimating population-level mean trends [37,45], cross-sectional samples are ill-equipped to provide reliable estimates of intra-individual change and of the associations between rates of change. In addition, cross-sectional studies that span a wide age-range are highly vulnerable to cohort effects, secular trends, and any other overlooked individual differences that are brought into the study from previous years. Considering this, measuring within-subject changes independently of between-subject differences demands that the same individual is followed over time using a longitudinal design. Although findings among longitudinal studies are generally more consistent than those of their cross-sectional counterparts [21], longitudinal studies are bound by their own limitations, such as the "3M" (mobility, morbidity and mortality) of subjects [46]. Notably, as many longitudinal studies start off as cross-sectional samples based on age-heterogeneous groups, there is still a risk of mixing individual differences in rates of change (i.e. random age effects) with average age-dependent changes at the population level (i.e. fixed age effects) [37].
Lastly, many studies investigating potential correlates of age-related changes include only a small number of putative risk and protective factors. Because these factors are mutually interrelated, such studies are at increased risk of identifying relations that are partly or entirely confounded by variables that have been overlooked. Of the studies that do include a wide range of potential age-related modifiers, it is rare that their effects are examined simultaneously. Specifically, the application of improper statistical models to explore what are essentially the same ageing-related hypotheses may be largely accountable for the inconsistent results observed across studies. Thus, studies using a narrow-age longitudinal sample, a large multidimensional dataset, and multilevel statistical modelling can reduce many of the aforementioned types of confounding and so increase the precision of estimated effects. Additionally, compared to focused analyses that explore the relation between specialized anatomical regions and specific cognitive tests, a multifactorial approach is ideal for revealing relations that may so far have been overlooked in ageing research.
Thus, in the present study, we use a single-year-of-birth cohort in which the majority of subjects completed two early intelligence quotient (IQ) tests at ages ~11 (IQ-11) and ~20 (IQ-20). Subsequently, brain-imaging and behavioural assessments were conducted in two late-midlife waves separated by an observation interval of ~5 years. Here, detailed neuropsychological, brain MRI, general health, demographic, and lifestyle data have been acquired to investigate three key questions. First, exploring cross-sectional-longitudinal associations, we asked: do midlife-baseline (age ~57; W-57) and follow-up (age ~63; W-63) brain structure correlate with changes in cognitive functioning and, relatedly, does W-57 or W-63 cognitive ability correlate with changes in brain structure? Second, we explored the impact of pure cross-sectional information on longitudinal associations: how are associations between longitudinal change in brain structure and cognitive ability altered when average measures are controlled for (i.e., assessed by comparing correlations between longitudinal changes before and after regressing out the average measures, (W-57+W-63)/2)? Third, we explored the extent and direction in which common age-related risk and protective factors influence the brain-cognition relations observed in questions 1 and 2.

Tables 1-5. Longitudinal change in cognitive ability and brain imaging structural measures are shown in Supplementary Figures 1, 2.
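The adjustment described in the second question above can be illustrated with a minimal sketch. It assumes that the change score is W-63 minus W-57, that the average is (W-57+W-63)/2, and that "regressing out" means taking residuals from a simple least-squares regression of each change score on its own average. The exact regression specification is not stated in this section, so the function names and toy values below are hypothetical.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residualize(y, covariate):
    """Residuals of y after least-squares regression on a single covariate."""
    mc, my = mean(covariate), mean(y)
    sxy = sum((c - mc) * (v - my) for c, v in zip(covariate, y))
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sxy / sxx
    intercept = my - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(covariate, y)]

def change_correlations(brain_w57, brain_w63, cog_w57, cog_w63):
    """Correlate change scores before and after regressing out wave averages."""
    d_brain = [b - a for a, b in zip(brain_w57, brain_w63)]
    d_cog = [b - a for a, b in zip(cog_w57, cog_w63)]
    avg_brain = [(a + b) / 2 for a, b in zip(brain_w57, brain_w63)]
    avg_cog = [(a + b) / 2 for a, b in zip(cog_w57, cog_w63)]
    unadjusted = pearson(d_brain, d_cog)
    adjusted = pearson(residualize(d_brain, avg_brain),
                       residualize(d_cog, avg_cog))
    return unadjusted, adjusted

# Toy data for six hypothetical subjects (not values from the study).
brain_w57 = [1500.0, 1450.0, 1520.0, 1480.0, 1490.0, 1510.0]
brain_w63 = [1490.0, 1430.0, 1515.0, 1460.0, 1485.0, 1495.0]
cog_w57 = [100.0, 95.0, 110.0, 98.0, 105.0, 102.0]
cog_w63 = [99.0, 90.0, 110.0, 93.0, 104.0, 98.0]

unadjusted, adjusted = change_correlations(brain_w57, brain_w63, cog_w57, cog_w63)
print(f"unadjusted r = {unadjusted:.3f}, adjusted r = {adjusted:.3f}")
```

Comparing the two returned correlations shows how much of a change-change association could be attributed to between-subject differences in level rather than to change itself.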

Cross-sectional-longitudinal and longitudinal correlations
None of the cross-sectional-longitudinal or longitudinal univariate correlations between brain IDPs and behavioural measures was statistically significant after accounting for multiple testing (FDR > 0.05). For the longitudinal correlations, this held both with and without adjustment for the effect of average scores ((W-63+W-57)/2). We visualize the results with Manhattan plots that show -log10 p-values for IDP-by-behavioural longitudinal correlations, arranged by behavioural measure on the x-axis. Multiple testing thresholds across all pairwise associations are marked with horizontal lines: the family-wise error (FWE) threshold as the top line (P uncorrected = 6.01 × 10^-6) and the False Discovery Rate (FDR) threshold as the bottom line (P uncorrected = 5.03 × 10^-5), Figure 1.
Similarly, we visualize results with Manhattan plots that show -log10 p-values for cognitive-by-all-other-behavioural cross-sectional-longitudinal correlations, arranged by cognitive variable on the x-axis. Multiple testing thresholds across all pairwise associations are again marked with horizontal lines: FWE as the top line (P uncorrected = 4.80 × 10^-4) and FDR as the bottom line (P uncorrected = 4.09 × 10^-4), Figure 2.
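To make the idea of an uncorrected p-value threshold concrete, the sketch below computes two common thresholds from a list of p-values. It uses the Bonferroni bound as a stand-in for FWE control and the Benjamini-Hochberg step-up rule for FDR. This section does not state how the FWE and FDR lines in the figures were derived (a permutation-based procedure is equally possible), so both choices and the toy p-values are assumptions made for illustration only.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold controlling FWE at alpha (Bonferroni bound)."""
    return alpha / n_tests

def bh_threshold(p_values, q=0.05):
    """Benjamini-Hochberg threshold: the largest sorted p-value p_(k)
    satisfying p_(k) <= (k / m) * q, or None if no test qualifies."""
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * q:
            threshold = p
    return threshold

# Hypothetical p-values from eight pairwise correlations.
p_values = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.5, 0.7, 0.9]

fwe_line = bonferroni_threshold(len(p_values))
fdr_line = bh_threshold(p_values)
significant = [] if fdr_line is None else [p for p in p_values if p <= fdr_line]
print(fwe_line, fdr_line, significant)
```

A result such as the one reported above, where nothing survives correction, corresponds to `bh_threshold` returning `None`.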

Bland-Altman plots
In addition to univariate correlations, we use Bland-Altman (BA) plots to assess whether longitudinal change in normalized IQ score between childhood (age ~11), youth (age ~20), and late midlife (ages ~57 and ~63) depends on the magnitude of the measured (mean) IQ score. The BA plots presented in Supplementary Figure 3 do not exhibit any particular structure, as might be expected if, for example, high-IQ subjects showed relatively greater change. Rather, the BA plots indicate that the direction and magnitude of change in IQ are unrelated to mean cognitive ability in our sample.
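The quantities behind a Bland-Altman plot are simple to compute. The sketch below assumes two sets of normalized IQ scores for the same subjects and returns, per subject, the mean of the two scores (x-axis) and their difference (y-axis), together with the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 SD). The variable names and scores are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Per-subject means and differences, plus bias and 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(first, second)]
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the differences
    return means, diffs, bias, (bias - spread, bias + spread)

# Hypothetical normalized IQ scores for five subjects at two assessments.
iq_early = [-1.0, -0.5, 0.0, 0.5, 1.0]
iq_late = [-0.8, -0.9, 0.3, 0.2, 1.1]

means, diffs, bias, (lower, upper) = bland_altman(iq_early, iq_late)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Plotting `diffs` against `means` gives the BA plot; a trend in that scatter would indicate that change depends on overall level, which is the structure the text reports as absent.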

Multivariate associations
Whole-group multivariate associations
CCA identified a single statistically significant mode of population co-variation coupling longitudinal cross-subject variations in brain structure to an extensive range of behavioural measures (R c = 0.9, permuted P corrected = 0.001). Post-hoc correlational analyses indicated that decreases in cognitive performance (IQ-57 - IQ-20) and in speed-assessed tasks, higher rates of familial myocardial infarct, lower HDL cholesterol, less physical activity, and higher scores on the mental depression inventory are associated with larger decreases (from age ~57 to ~63) in whole brain GM and WM volume but with increases in some WM and GM ROIs, in particular the GM cerebellum. Finally, we did not find evidence supporting the role of baseline scores as predictors of impending brain or behavioural change in late-midlife.
For ease of interpretation and comparison with an earlier study, we invert the signs of all baseline (W-57) and follow-up (W-63) behavioural measures where lower values reflect higher cognitive performance or more favourable/healthy traits (e.g., speed-assessed tasks, number of total errors, total cholesterol, BMI). Thus, in general, using Figure 3A, we interpret positive post-hoc correlations between each observed behavioural measure and the CCA-derived subject weights (i.e. canonical variate weights, U or V) as positive or "healthy" contributions to the CCA-mode, whilst all negative behavioural correlations are interpreted as negative or "unhealthy" contributions. However, as MRI-derived brain structural measures are themselves indirect measures of underlying brain neuroanatomy, we cannot be entirely certain what an observed brain measure or difference (i.e. change) score truly represents. Considering this uncertainty, assigning a "good" or "bad" direction to estimated longitudinal change scores in a brain biomarker is extremely risky and is avoided in this study.

Abbreviations: NCDs = non-communicable diseases; BMI = body mass index; SD = standard deviation. Total health measures included n=25. *Cerebral blood flow is normalised to brain size.
Total mean brain (grey matter and white matter) volume, total grey matter volume of ROIs and subcortical brain structures, total macrostructural white matter volume, total volume of WMH, and microstructural properties of specific WM tracts pertaining to W-57 and W-63. Diffusion tensor indices (FA, MD, MO, L1-L3) are based on eigenvalues (λ1, λ2, λ3) which describe the magnitude of diffusion within a voxel. Abbreviations: SD = standard deviation, WMH = white matter hyperintensity, FA = fractional anisotropy, MD = mean diffusivity, MO = mode tensor, L1-L3 = eigenvalues.
brain-imaging measure (r² = 30.8%; r = 0.56), and longitudinal change in total WM volume (normalized for head size) as the strongest negatively linked (r² = 35.0%; r = -0.59). Other top contributing positive IDPs include change in GM volume of cerebellum and non-cerebellum regions-of-interest (ROIs), change in the diffusion properties of several microstructural WM ROIs, and change in the volume of two subcortical structures, the nucleus accumbens (Nac) and the caudate. Post-hoc correlational analyses also identified a number of strong negatively contributing IDPs to the underlying structure of the identified CCA-mode. Of these, the most influential include: change in global brain volume (grey and white matter), change in GM volume of cerebellum ROIs (the juxtapositional lobule cortex, inferior and superior temporal gyrus), and a range of dMRI microstructural markers (medial lemniscus, cerebellar peduncle, internal and external capsule, crus fornicis and stria terminalis, cingulum cingulate gyrus, uncinate fasciculus).
While no one result can be taken in isolation, we pull out just a few variables to illustrate the directions of the effects: higher rates of familial myocardial infarct, less physical activity, higher scores on the mental depression inventory (MDI), and decreases in cognitive performance (IQ-63 - IQ-11, IQ-57 - IQ-11, IQ-63 - IQ-20) and in speed-assessed tasks are associated with larger decreases (from age ~57 to age ~63) in whole brain GM and WM but with increases in some WM and GM ROIs, in particular in the GM cerebellum.
We explored the multivariate results to establish that the estimated CCA-mode was not unduly influenced by the extreme group design (EGD) used for recruitment. Specifically, a scatterplot of the IDP and behavioural canonical variates, with group membership indicated by plotting symbol, showed no evidence of clustering, Figure 4.
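The multivariate steps reported in this section (estimating a CCA mode, testing it by permutation, and computing post-hoc correlations between observed variables and the canonical variates) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: it uses NumPy, omits any preprocessing such as confound removal or dimensionality reduction (which would be needed when the number of variables exceeds the number of subjects), and runs on simulated data. All function names are hypothetical.

```python
import numpy as np

def cca_first_mode(X, Y):
    """First canonical correlation and canonical variates (U, V) via QR + SVD.
    Assumes more rows than columns and full column rank in X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the column space of X
    Qy, _ = np.linalg.qr(Yc)
    A, s, Bt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    U = Qx @ A[:, 0]          # canonical variate for the X (e.g. IDP) set
    V = Qy @ Bt[0, :]         # canonical variate for the Y (e.g. behavioural) set
    return float(min(s[0], 1.0)), U, V

def permutation_p_value(X, Y, n_perm=1000, seed=0):
    """Permutation p-value for the first canonical correlation,
    obtained by shuffling the rows (subjects) of Y."""
    rng = np.random.default_rng(seed)
    observed, _, _ = cca_first_mode(X, Y)
    null = np.array([
        cca_first_mode(X, Y[rng.permutation(len(Y))])[0]
        for _ in range(n_perm)
    ])
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p)

def posthoc_correlations(data, variate):
    """Pearson correlation of each observed column with a canonical variate."""
    return np.array([np.corrcoef(data[:, j], variate)[0, 1]
                     for j in range(data.shape[1])])

# Simulated data: 120 subjects, 5 "brain" and 4 "behaviour" variables,
# with one shared latent factor loading on the first column of each set.
rng = np.random.default_rng(42)
latent = rng.standard_normal(120)
X = rng.standard_normal((120, 5))
Y = rng.standard_normal((120, 4))
X[:, 0] += 2 * latent
Y[:, 0] += 2 * latent

r, p = permutation_p_value(X, Y, n_perm=200)
_, U, V = cca_first_mode(X, Y)
x_loadings = posthoc_correlations(X, U)
print(f"first canonical correlation = {r:.2f}, permutation p = {p:.3f}")
```

The sign of a canonical variate is arbitrary, so post-hoc correlations are only interpretable relative to one another, which is why the text fixes a sign convention before labelling contributions as positive or negative.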

DISCUSSION
This study makes a number of key findings. First, CCA revealed a single significant mode of population covariation that linked multiple longitudinal brain-imaging structural measures to multiple longitudinal behavioural measures (R c = 0.92, permuted P corrected = 0.001). Specifically, this discovery indicates that variance is shared across longitudinal measures of cognition, demographic, health and lifestyle factors, and that this shared variance explains variance in important longitudinal brain structure measures. Second, post-hoc correlational analyses between the significant CCA-mode and the observed variables suggest that participants demonstrating cognitive decline (across specific cognitive domains) also show decreases in total brain volume within the two late-midlife assessment intervals. This finding lends support to the concept of a general intelligence, or g-factor, used to describe the interrelation among diverse mental abilities [47], but extends it to include the contribution of brain ageing and other aspects of real-life function, e.g. socioeconomic factors, mental health, and lifestyle behaviours. The discovery of this CCA-mode indicates two key points. First, the results corroborate the existence of a domain-general mechanism that is impaired by normal (non-pathological) ageing processes, which in this study is reflected in the links from potential age-related biomarkers and environmental factors to the end effects (i.e. cognitive decline and brain atrophy). Second, the differential association of brain and behavioural longitudinal measures with chronological age indicates that different systems do not "all go together when it goes", but rather that different aspects of behaviour and biology may be characterised by their own age-trajectory [48]. Our results did not provide evidence supporting the role of baseline (W-57) or follow-up (W-63) cross-sectional measures as predictors of impending change in late-midlife.
The latter is consistent with mixed findings in the literature that also found variables correlated cross-sectionally at baseline are not inherent predictors of subsequent change. Finally, controlling for the effect of average ((W-57+W-63)/2) scores on longitudinal correlations did not alter our results.
Consistent with prior studies [7,12-14,19], we found that age-related total brain volume loss was linked to decline in several cognitive domains. That is, our results indicate that participants who experienced a decrease in cognitive performance since the inception of the study were also those less resistant to the various elements driving brain atrophy. Interestingly, the cognitive measures most strongly associated with the CCA-mode of population covariation (identified using post-hoc correlations) are consistent with the cognitive domains well-established for their increased vulnerability to advancing age [2,7,9,35,49,50], Figure 3A. Specifically, post-hoc correlations between the significant CCA-mode and the observed behavioural variables coupled decline in cognitive measures assessing general intelligence (IQ), executive functioning (SOC), attention with working memory load (RVP), pattern recognition learning and memory (PRM), and visual paired associates learning and memory (PAL) to poorer mental health (assessed by MDI), decreased physical activity, lower HDL-cholesterol, a familial history of myocardial infarct, higher body mass index (BMI), higher alcohol consumption, and smoking. Here, our results suggest a link between negative lifestyle behaviours and age-related decline, and are thus consistent with prior age-related studies that have similarly identified associations linking higher general intelligence and bigger brain volume to greater physical fitness and (other) positive lifestyle behaviours in older adults [2,14,17,18,51-55]. Finally, we found that covariation in the aforementioned behavioural measures was important to declines in the following brain-imaging variables: total white matter (WM) volume, total brain volume (GM and WM), GM volume of the juxtapositional lobule cortex, temporal gyrus related regions-of-interest (ROIs), and a range of WM (microstructural) brain ROIs.

Notably, we apply caution when interpreting the findings related to subject-SEP and physical fitness. First, the indicators of subject-SEP were limited to one item (working or not working). Second, the significant role of physical fitness may, in part, reflect pre-existing genetic differences, which renders moot any implication that improving physical fitness in late-midlife is causally related to one's ageing trajectory. Future investigations that use more than one indicator of subject-SEP, and that are able to examine the effects of pre-existing genetic differences in physical fitness, can further elucidate the role of these measures in age-related trajectories. Nonetheless, irrespective of whether these relations reflect causality, the present findings suggest that variations in the level of physical fitness and subject-SEP may be partially accountable for the individual variability observed in normative ageing trajectories.
This study explores the contribution of multiple interrelated risk and protective factors to age-related (brain-behaviour) changes simultaneously. There are several advantages to this approach. First, studies including only a small number of potential age-related covariates risk identifying relations that may in part, or entirely, be artefacts of relations not accounted for. Thus, our ability to account for the effects of a wide range of measures can potentially attenuate this type of confounding whilst also providing a more realistic setting for identifying potential contributors to age-related decline. Second, although contributors or correlates of age-related differences in brain structure and cognitive function may individually be of negligible consequence to later-life health outcomes, it has been demonstrated that their cumulative effects may be of importance to the observed heterogeneity in ageing trajectories [14,18,56]. The advantage of modelling multiple heterogeneous variables simultaneously is demonstrated in randomized controlled studies investigating the effects of lifestyle behaviours on health outcomes. In one example, the mutual interrelation across multiple diverse measures was shown through the greater success of intervention programs that targeted multiple health behaviours simultaneously or sequentially, compared with programs that isolated single measures [57]. Specifically, "simultaneous multiple health behaviour research" promotes the benefits of multiple intervention targets compared to programs that value specificity above all else. It is suggested that the silo mentality of traditional scientific research greatly impedes our ability to recognize the commonality across diverse traits, and researchers are in general cautioned against compartmentalizing anatomical structures, physiological processes and behaviours as unitary, unrelated constructs.
Thus, with growing evidence supporting an integrative, multidisciplinary approach to health interventions, multidimensional studies like ours are found to be more favourable in yielding reliable and informative results.
When evaluating the top contributing brain-imaging measures identified by post-hoc correlational analyses, our findings are in agreement with previous reports that also link changes in total brain volume (GM and WM) to changes in cognitive ability [7-9, 40, 46], Figures 3, 5. According to Stern's reserve hypothesis, larger brains (which ostensibly reflect greater neuronal density and more extensive synaptic networks) are less vulnerable to the effects of ageing than smaller ones; therefore MRI-derived measures of total brain volume, which presumably represent the volume of neuropil, are considered to be top indicators of overall brain health and related cognitive function. However, there are several caveats to interpreting the neurobiological processes or environment inferred from MRI brain-imaging. First, as MRI can only offer an indirect measure of brain structure, it is unclear whether measured differences in brain volume across the two consecutive assessment intervals, both between and within subjects, genuinely reflect underlying age-related neurobiological processes or errors of measurement. Thus, given the uncertainty about what the estimated MRI-derived brain change truly reflects, further interpretation of how these measures relate to cognitive change will at best be an approximation. However, notwithstanding the ambiguity of the physical substrate being measured, it is unlikely that any one neurobiological process is accountable for the dynamic structure-function relations observed in normal ageing. Under the premise that the brain is the physical substrate of behaviour, subjects experiencing brain atrophy (i.e. indicating loss of neuropil and neural connections) are typically expected to demonstrate cognitive decline, a theory supported by the negatively contributing brain and behaviour measures to the CCA-mode, Figure 3A, 3B.
Conversely, if an increase in brain volume is an indicator of pathological processes such as gliosis, inflammation, or defective elimination of by-products [9], we would expect to observe relations that link increases in brain volume to declines in cognitive ability. The CCA results are also consistent with this theory, as we identified strong positive contributions from GM volumes of cerebellum ROIs (lingual gyrus, planum temporale, accumbens, insular cortex) and from the DTI indices fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD) and radial diffusivity (RD) to declines in cognitive performance. A third scenario, consistent with both Stern's active reserve hypothesis and the scaffolding theory of ageing and cognition (STAC) [58,59], postulates that irrespective of the extent of age-related (pathological) brain change or the amount of initial brain reserve capacity (i.e. brain size, neural count) [59], the ageing process is kinder to individuals with higher baseline intelligence and to those who engage in cognitively or socially stimulating activities. Thus, with these preconditions in mind, individual differences in cognitive ability and decline are proposed to be partly attributable to cognitive reserve proxy measures (e.g. childhood intelligence, SEP, extracurricular activities) that are thought to promote functional adaptation and reorganization of brain elements, in spite of age-related pathological change, to maximise performance. If this were correct, the post-hoc CCA results linking brain atrophy and loss of WM integrity to increasing cognitive scores would also be accounted for. In summary, the CCA results revealed ageing patterns that may underlie: (1) longitudinal correlations between brain atrophy and cognitive decline; (2) correlations between increased brain volume (underlying potential age-related neurodegenerative processes) and cognitive decline; and (3) preservation of cognitive ability despite age-related pathological brain change, due to the positive contribution of cognitive reserve proxy measures (i.e. evidence of active reserve) to later-life cognitive performance.
Figures 3B, 5B present the relative importance of individual and subdomain imaging-derived contributions in maximising the correlation between the brain and behaviour datasets, respectively. Specifically, when considering the significance of each IDP subdomain to the CCA-mode (Figure 5B), we found that although total brain tissue volumes, volumes of subcortical brain structures, total WMH load, and non-cerebellum ROIs were predominantly negatively contributing, cerebellum ROIs and DTI indices (FA, MD, AD, RD) showed mixed contributions. However, when assessing the top contributing DTI measures to the identified CCA-mode, we found that the large majority of these measures were also negatively contributing. The pattern of broadly negatively contributing IDPs to the CCA-mode may indicate linked neurobiological processes, such as demyelination, axonal degradation or gliosis, that are responsible for the age-related brain atrophy associated with later life. Furthermore, given the well-established link between WM integrity and speed-assessed tasks, memory, and executive function, our results also provide evidence towards the notion that changes in WM microstructure (as indexed by dMRI), in concert with total brain volume atrophy, are perhaps partly accountable for the declines observed in specific age-sensitive cognitive domains [5,60-63].
Contrary to prior studies, we found no evidence of cross-sectional measures of brain or behaviour as predictors of longitudinal change [11, 12, 17-20, 51, 64]. This finding suggests that specific patterns of initial baseline scores may not necessarily offer greater or lesser immunity in the face of forces that drive age-related decline. Specifically, Supplementary Figure 3 shows low-agreement (but unstructured) Bland-Altman plots that do not indicate a relationship between the rate, direction or magnitude of change in IQ and initial (mean) cognitive ability. This is also confirmed in Supplementary Figure 1. Here the plot of normalized trajectories of cognitive change indicates that pre-existing between-person differences are preserved into later life, an example of "preserved differentiation" (i.e., brighter children become brighter adults), but that this does not necessarily confer protection against the rate or onset of cognitive decline in later-midlife. Our findings are consistent with several other studies that investigated whether "age is kinder to the initially more able" [18,21,65]. Notably, the pattern of longitudinal change observed in Supplementary Figure 1 also brings to the forefront our use of the extreme group design (EGD) for subject recruitment. The main purpose of using this approach was to ensure that the variability in cognitive decline across time and subjects was sufficient to detect biological correlations in what is otherwise a modestly sized sample of healthy, homogeneous subjects.
Although this study identified relations between brain-behaviour measures that are in agreement with prior ageing studies, we also report a number of inconsistencies. Below we provide likely explanations for these variations. First, findings from earlier studies assessing the interdependence of age-related change are mainly limited to data acquired using a cross-sectional design [21,37,46]. In general, cross-sectional and longitudinal studies have shown low agreement, with cross-sectional studies underestimating the rate of age-related decline [11,20,21,37,64]. Unlike longitudinal designs, which can provide reliable estimates of within-subject differences, rates of change therein, and the associations between these changes, cross-sectional studies can inherently only provide information on population-level mean trends and are therefore poorly suited for investigating true longitudinal change. Furthermore, cross-sectional studies that include a wide age-range are highly vulnerable to cohort effects, secular trends, and confounding by other unmeasured differences that may have been brought into the study from previous years. Evidently, such factors make cross-sectional studies suboptimal for assessing the relation of brain-behaviour changes over time and may be partly accountable for the discrepant findings across studies. Second, it may be that, at least in part, only the extremes of a variable's range are related to the magnitude of change, rate of change, or to other age-related variables. As such extreme scores in behavioural or health indicators are most often observed in older adults or patient groups, it is likely that our healthy late-midlife sample has not accrued sufficient age-related changes to meet the effect sizes required for these associations. Third, the lack of statistically significant univariate associations in this study may be due to the choice of model itself.
For example, in one study [18], the contribution of physical fitness -when assessed as three single measures -to cognitive ability was small. However, when the AGING individual fitness measures were replaced with a latent factor reflecting 'overall fitness', large associations with cognitive decline were identified. This finding suggests that broad latent measures of behaviour or health status, rather than narrow single indicators, are necessary to identify significant associations between age-related processes. Furthermore, although in some cases bivariate models are an obvious choice (e.g., exploring relations between specific brain regions hypothesized to mediate specific pathological behavioural changes), in the event of identifying modifiers or correlates of healthy ageing, it is most likely that the cumulative effect of a range of health and behavioural factors are responsible for the observed heterogeneity among individuals and thus should not be assessed by simple pairwise correlations alone. Fourth, our sample sizealthough comparable to other midlife longitudinal ageing studiesmay be under-powered to detect the small effect sizes of "healthy" brain-behavior relations, especially if the variation in the brain-behaviour changes are small to begin with. Fifth, the lack of significant univariate correlations compared to earlier studies may also be attributable to a lead-lag interval that is not agreeable with the timing of effects between measures. Specifically, it is unreasonable to expect an immediate contribution from early changes in the presumed "causal" variable and subsequent change in the presumed "effect" variable. In this scenario, future investigations that employ lead-lag analyses using more than two assessments are necessary to prevent this type of limitation. Lastly, the inconsistent findings across studies may also reflect the variability in subject characteristics (e.g. 
age, ethnicity, geographical location) or heterogeneity in the tests used to measure similar constructs. For example, a study of subjects aged 80+ years that reports baseline hippocampal volume as a contributor to subsequent memory decline has a far greater chance of detecting relations that may not yet have reached the necessary effect sizes in subjects who are almost two decades younger and in better general health.

Strengths and limitations
A major strength of this study is the study sample itself. As MDBC-1953 is a single-year-of-birth male cohort, subjects are ethnically, culturally and geographically homogeneous, and of generally good health. This setup greatly reduces the unwanted effects of chronological age and the potentially troublesome contributions of disease, differential environmental factors, and population stratification to variations in health outcome. A further strength of this study pertains to its use of CCA to explore longitudinal associations [89]. Specifically, CCA boosts power by using the full dataset to extract latent factors that are based on the shared variance observed among sets of related measures. This approach allows the simultaneous prediction of multiple outcome variables, permits the isolation of distinct biological mechanisms, and largely attenuates the unexplained residual variance through its identification of multiple modes. Next, unlike many longitudinal studies that are biased by selective attrition (i.e. due to the "3M": mobility, morbidity and mortality [46]), of the 193 subjects who attended W-57, 64% returned to provide follow-up data at W-63. Another strength of this study concerns the manner in which subjects were selected. That is, in contrast to conventional ageing studies that are biased towards more highly educated, more intelligent subjects, we recruited participants on the basis of cognitive change from youth to late-midlife using the Extreme Group Design (EGD) [66]. Specifically, by sampling subjects from the extremes of the change-in-IQ distribution, our approach maximizes the variability in participant characteristics (e.g. cognitive ability, occupational complexity, levels of motivation) and with it the applicability of our findings to the general population. Equally important is the ability of EGD to maximise the variability in cognitive decline in order to detect biological correlations. 
A further strength of this study concerns the age-range of participants. Specifically, there is a lack of evidence indicating that pathological change begins abruptly at old age. Instead, a growing body of research has converged in highlighting the importance of early-life and midlife measures in predicting later-life health outcome [31,42,58,67-70]. In view of this, the available early-life data, in combination with the late-midlife measures, allow us to assess the contribution of potential modifiers and the interdependence of age-related processes prior to confounding by overt or underlying later-life pathological conditions. Lastly, our study includes a comprehensive range of broad-brush and specific cognitive tests, brain biomarkers, and age-related covariates, reducing the amount of confounding that is attributable to associations driven by other unmeasured factors. Furthermore, "specific" measures (e.g. cognitive tests measuring a particular skill, or a brain ROI) are hypothesized to form stronger links than general "broad-brush" measures (e.g. total brain volume, general intelligence).
This study also has a number of limitations. First, a major limitation of the present study lies in its modest sample size, which decreases the study's power to detect modest-to-small effects. Second, although repeated measurements are what allow longitudinal studies to assess change over time, the use of identical tests at each assessment increases the risk of underestimating age-related changes. This is mainly attributable to repeated exposure to the testing material, environment, and operator, which ultimately results in practice-related learning. Thus, it is possible that accrued familiarity with testing conditions and materials may explain the observed gains in some of the variables examined. Notably, however, practice-related learning effects are only applicable to the cognitive tests administered in the two late-midlife waves. Relatedly, practice effects may also partly account for the lack of statistically significant cross-sectional-longitudinal correlations. That is, the practice-related gains may have attenuated what were already only modest declines in cognitive ability in generally healthy subjects, thereby eliminating the opportunity to observe potentially important associations. That being said, prior studies formally investigating the contribution of retest effects to observed gains have reported, on average, moderate-to-large retest effects but small inter-individual variability [71]. This suggests that associations between ageing-related changes and their correlates identify true covariates of age-related cognitive change, rather than covariates of practice-related gains. The number of measurement occasions available for investigation may also be a limiting factor in this study. Specifically, as our study consists of two occasions of longitudinal testing, we were unable to account for nonlinear trajectories of brain and cognitive changes. 
Lastly, the homogeneity of subjects also means that the findings reported in this study should be extended to the general population with caution.

CONCLUSIONS
Our study demonstrates the benefits of using a homogeneous, single-year-of-birth cohort to examine the association between broad-brush and specific longitudinal measures of brain and cognition, and their relation to demographic, health and lifestyle factors. Here, we report correspondence between structural and functional changes that largely link brain atrophy to cognitive decline and negative self-care behaviours. However, we found no evidence in support of baseline or follow-up measures as predictors of impending brain or cognitive change, or that early intelligence level is protective against ageing-related cognitive decline. Instead, we confirm previous findings that identified total brain volume as a particularly informative indicator of underlying cognitive ability. Additionally, this study reveals several potentially influential modifiers of age-related trajectories: fitness level, mental well-being, subject-SEP, offspring, familial history of cardiovascular disease and diabetes, HDL-cholesterol, alcohol consumption, and smoking. Our findings endorse the notion that variability in life-course factors may play a key role in the rate, direction, and magnitude of age-related brain-behaviour changes, and warrant further investigation using larger, more diverse samples with more than two measurement occasions. As our study sample is optimally healthy, the findings reported here provide evidence for associations that are relevant to healthy ageing and contribute to a better understanding of the brain as the physical substrate of cognitive ability in late-midlife.

Participants: extreme group design (n=1,985)
The participants for this imaging sub-study were members of the longitudinal Danish Metropolitan Birth Cohort 1953 (MDBC-1953) [72]. For a detailed discussion of the subject selection criteria, recruitment, attrition and testing for this cohort see [72][73][74]. In summary, using youth and late-midlife IQ scores, subjects were selected based on their estimated change in mental ability as part of an "extreme group design" (EGD) [75]. Specifically, two well-validated tests, the Børge Priens Test (BP) [76] and the Intelligenz-Struktur-Test 2000 R (IST) [77], were taken at ages ~20 (IQ-20) and ~57 (IQ-57), respectively. Since the cognitive change between these time-points was based on two different instruments, a change score was derived with a linear regression analysis of IQ-57 (IST-2000 R) on IQ-20 (BP) using a total of 1,985 subjects [74]. Both examinations comprise subtests that assess aspects of verbal intelligence (e.g. numerical series and verbal analogies), and thus are similarly structured and comparable. IQ-20 explained R² = 50.4% of the variance in IQ-57 (beta=0.71, p<0.0001), and we used each subject's standardized residual about the regression line as a measure of their change in IQ across time. To avoid the effects of extreme test scores, subjects with absolute standardized residuals exceeding 3 were omitted and the remaining members were classified into two groups according to the degree of cognitive change observed from early adulthood: group A = 66 improvers and group B = 57 decliners. This study has also been registered at clinicaltrials.gov (NCT03290040).
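The change-score derivation described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the function name `standardized_residual_change` is hypothetical, and dividing residuals by their standard deviation with two degrees of freedom removed is an assumed standardization convention.

```python
import numpy as np

def standardized_residual_change(iq_early, iq_late):
    """Regress late IQ on early IQ (OLS with intercept).

    Returns standardized residuals, the fitted slope, and R^2.
    """
    x = np.asarray(iq_early, dtype=float)
    y = np.asarray(iq_late, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    # Assumption: standardize by the residual SD with ddof=2
    # (two estimated parameters: intercept and slope).
    z = resid / resid.std(ddof=2)
    r2 = 1.0 - resid.var() / y.var()
    return z, coef[1], r2

# Synthetic illustration only; these are not cohort data.
rng = np.random.default_rng(0)
iq20 = rng.normal(100, 15, size=500)
iq57 = 30 + 0.7 * iq20 + rng.normal(0, 10, size=500)

z, slope, r2 = standardized_residual_change(iq20, iq57)
# Omit subjects whose absolute standardized residual exceeds 3.
keep = np.abs(z) <= 3
change_score = z[keep]
```

A positive residual means a subject scored higher at follow-up than predicted from the early score, and a negative residual means lower than predicted.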

Participants: present study (n=123)
The majority of MDBC-1953 members have taken part in two early-life intelligence quotient (IQ) tests at ages ~11 (IQ-11) and ~20 (IQ-20). During this early-life period, information on early-life social and biological demographic factors was also acquired. Subsequently, in late-midlife, cohort members were selected on the basis of their degree of cognitive change (as described in section 1.1) to complete 2 waves of brain-imaging and behavioural assessments. The first wave (W-57, where 57 represents the mean subject age at scanning: 57±0.7 years, mean ± standard deviation (SD)) took place during 2010-2013 and included a total of 193 subjects who had usable imaging and behavioural data. Data pertaining to W-57 have been previously investigated and reported [73,78,79]. The second wave, W-63 (mean subject age 62.5±0.9 years), began in 2015 and is scheduled for completion in 2020. To date, of the initial 193 subjects with brain-imaging and behavioural data from W-57, 192 were invited back to participate in W-63. Of these, 136 accepted their invitation and proceeded to the subject screening stage. Here, participant eligibility was determined, which resulted in the exclusion of 6 subjects due to conditions related to substance abuse comorbid with cognitive impairment, psychiatric or neurological disease, and contraindications to MRI. Of the subjects who fulfilled the eligibility criteria, a further 7 were excluded due to no-show on the day of examination or non-completion of the MRI session. The final number of subjects in each group was: group A n=66, group B n=57. The average observation interval between W-57 and W-63 was 4.82±0.9 years. Figure 6 shows a recruitment flow diagram for the present study sample used to investigate brain and behaviour longitudinal correlations. Although there was a small range of ages during data acquisition, in general we refer to the ages of participants as 11 (W-11), 20 (W-20), 57 (W-57), and 63 (W-63) years. 
This imaging study was approved by the local ethical committee (De Videnskabsetiske Komiteer for Region Hovedstaden) and registered by the Danish Data Protection Agency. All participants provided written informed consent.

Neuropsychological assessment: Repeated measurements
We include a detailed series of neuropsychological tests that were administered at both W-57 and W-63 (Figure 7). All behavioural tests were acquired on the same day as the brain-MRI acquisition. In brief, global cognitive function was assessed with the mini-mental state examination (MMSE) and Addenbrooke's cognitive examination (ACE). The Cambridge Neuropsychological Test Automated Battery (CANTAB) was administered to evaluate cognitive ability across the following cognitive domains: learning and memory (spatial and pattern recognition, and paired associates learning), executive function (planning), attention and reaction time [80]. Furthermore, we include both early-life measures of general cognitive ability acquired at W-11 (IQ-11) and W-20 (IQ-20) [81], as well as late-midlife measures of IQ acquired at W-57 (IQ-57) and W-63 (IQ-63). Table 1 presents the study sample characteristics for each cognitive examination used in the present analyses, separately for W-57 and W-63. To explore cross-sectional-longitudinal associations and longitudinal associations pertaining to brain and behaviour changes, we use cognitive scores acquired during W-57 and W-63 to estimate difference (i.e. change) (W-63 - W-57) and average ((W-63 + W-57)/2) scores. While it may seem more intuitive to compare change to a baseline, note that raw change tends to be negatively correlated with baseline by construction, because measurement error in the baseline score enters the change score with the opposite sign. Thus, in the present analyses, each cognitive measure with repeated measurements is represented by its raw change and raw average counterparts, which are included as two independent measures of cognitive ability. In total we analyse 64 measures of cognitive performance, of which 33 describe raw change, 27 describe raw average, and 4 describe IQ scores derived from W-11, W-20, W-57 and W-63.
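A small simulation can illustrate why change is paired with the two-wave average rather than with baseline. The sketch below is illustrative only and assumes synthetic scores with no true change and equal, independent measurement noise at both waves; the helper `change_and_average` is a hypothetical name.

```python
import numpy as np

def change_and_average(w57, w63):
    """Return raw change (W-63 - W-57) and raw average ((W-63 + W-57)/2)."""
    w57 = np.asarray(w57, dtype=float)
    w63 = np.asarray(w63, dtype=float)
    return w63 - w57, (w63 + w57) / 2.0

# Synthetic scores: a stable true score plus independent noise per wave
# (assumption: no true change, equal noise variance at both waves).
rng = np.random.default_rng(1)
true_score = rng.normal(0.0, 1.0, size=2000)
w57 = true_score + rng.normal(0.0, 1.0, size=2000)
w63 = true_score + rng.normal(0.0, 1.0, size=2000)

change, average = change_and_average(w57, w63)

# Baseline noise appears with a minus sign in the change score,
# so change and baseline are negatively correlated by construction.
r_change_baseline = np.corrcoef(change, w57)[0, 1]
# The noise terms cancel in expectation for the average.
r_change_average = np.corrcoef(change, average)[0, 1]
```

Under these assumptions the correlation between change and baseline is clearly negative even though nothing has changed, whereas the correlation between change and average is close to zero.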

Demographic, health and lifestyle assessment
We evaluate the effect of a range of reputed positive and negative measures on brain structure and cognitive performance. Assessments include both self-reported and objective measures such as the Major Depression Inventory (MDI) [82], the Pittsburgh Sleep Quality Index (PSQI) [83], the Multidimensional Fatigue Inventory (MFI-20) [84], demographic factors, general health measures, history of non-communicable diseases (NCDs) and multiple health-related lifestyle behaviours. The majority of the potential age-related modifiers included were measured at W-63. Similar to neuropsychological tests with repeated measurements, demographic, health and lifestyle factors assessed more than once are included in subsequent analyses as raw change and raw average scores. These include: body mass index (BMI), total cholesterol (TC), high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, and very low-density lipoprotein (VLDL) cholesterol. In summary, we report 8 demographic, 34 health, and 8 lifestyle measures (50 in total), which together with the neuropsychological data are referred to as 'behavioural' measures. Finally, to assist the interpretation of results, behavioural measures were divided into four subdomains: cognitive, demographic (social and biological), health and lifestyle. See Tables 1-4 for a list of these variables and the study sample characteristics.

Image analysis pipeline
We used the UKB image processing pipeline on our raw (non-processed) W-57 and W-63 brain-imaging data to enable future meta-analyses and replication studies [85]. Essentially, T1w structural data is used as the reference image to calculate cross-subject and cross-modality alignments required to process all other brain modalities.
In brief, we extracted 454 brain-imaging biomarkers (i.e., a broad set of biologically meaningful measures derived from multiple imaging modalities) that best capture the differential ageing processes and neuropathologies observed in a healthy ageing population. Subsequently, we categorized the extracted summary measures into 6 groups to reflect the MRI modality and image processing tool applied to derive each measure. These include: 1. T1w-SIENA percentage brain volume change (PBVC), 2. T1w-SIENAX (estimation of brain tissue volumes), 3. T1w-FIRST (segmentation of subcortical brain structures), 4. T1w total volume of grey matter (GM) in cerebellum and non-cerebellum regions-of-interest (ROI) [86,87] using FAST-derived GM partial volume estimates (PVE), 5. T2w-FLAIR-BIANCA (total white-matter hyperintensity volume), and 6. dMRI-TBSS (microstructural properties of specific white matter (WM) tracts). With regard to the volume of GM in cerebellum and non-cerebellum ROIs, we inverted the non-linear registration to standard space and used the result to warp a cortical atlas of 139 ROIs into native T1 space. The 139 ROIs were defined by combining parcellations from the Harvard-Oxford cortical and subcortical atlases and the Diedrichsen cerebellar atlas. Within each ROI we then summed the total volume of GM using the FAST-derived GM PVE.
Tables 5, 6 list all image-derived phenotypes (IDPs) measured in this study, their function and study sample characteristics.

Statistical analysis
We used univariate correlations and multi-level latent variable modelling to investigate specific and general relations between multiple brain and behaviour measures. To extract estimates of longitudinal change between two successive measurement occasions, we computed "raw difference scores" (RDS) (i.e., the cross-sectional baseline score at W-57 subtracted from the cross-sectional follow-up score at W-63). Several reasons motivated our selection of this approach. First, application of the commonly used "residualized change model" (RCM) can lead to biased results if the study sample consists of pre-existing groups at baseline [88]. Conversely, the RDS approach has been shown to arrive at the correct inference regardless of whether pretest (baseline) differences are present, e.g., when the association of a covariate or confounding variable with baseline scores is not equal to zero [88]. Second, although the latent difference score (LDS) model is not vulnerable to the aforementioned biases and may have been a natural choice to investigate longitudinal relations, its value is most apparent when constituent scores are derived using a well-defined instrument. As we include a wide range of variables that were acquired using a mixture of measurement methods from multiple measurement occasions, the LDS model was unsuitable on this occasion.

Whole-group adjusted univariate associations
[Table caption: List of imaging modalities and processing tools employed (columns 1 and 2), followed by the corresponding function of each processing tool (column 3), description of the brain phenotype extracted (column 4), and total number of IDPs generated for each brain-imaging category at W-57 and W-63 (columns 5 and 6). (FLAIR = fluid attenuated inversion recovery, CSF = cerebrospinal fluid, GM = grey matter, WM = white matter, WMH = white matter hyperintensity volume, ROIs = regions-of-interest). Total number of IDP measures n=454. *In order to obtain the GM volume of the 139 ROIs, we inverted the nonlinear registration to standard space, and used this to warp a cortical atlas of 139 ROIs into native T1 space. Within each ROI we then summed the total GM using the FAST-derived GM partial volume estimates.]

We used Pearson's correlations to examine both cross-sectional-longitudinal correlations and longitudinal correlations (defined above) between each of the 454 (longitudinal) or 453 (cross-sectional) brain IDPs and each of the 114 (longitudinal and average) or 70 (cross-sectional) behavioural variables extracted from the MDBC-1953 database (full set of IDP × behavioural estimates: 1) 454 × 70 or 453 × 114 for cross-sectional-longitudinal correlations and 2) 454 × 114 for longitudinal correlations; see Figure 8). To reduce the influence of potential outliers and increase the reliability of associations made, we applied a rank-based inverse Gaussian transformation (quantile normalization) to enforce Gaussianity for each of the brain IDPs, behavioural, and confound variables. For univariate correlations, missing data were handled with a complete-case approach individually for each pair of variables; for CCA, where missing data are particularly problematic, we applied an iterative PCA algorithm (based on the soft shrinkage of eigenvalues) to impute missing data values until convergence [90]. 
Finally, five confound variables were created relating to effects that may confound the interpretation of computed correlations: absolute motion during MRI, relative motion during MRI, head size, and age (both difference (W-63 - W-57) and average ((W-63 + W-57)/2)). The confound variables were regressed out of all brain-imaging and behavioural variables prior to correlational analyses. To account for multiplicity, we assessed significance with two types of multiple-testing correction: Bonferroni control of the familywise error rate (FWE) and control of the false discovery rate (FDR) [91].
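The two preprocessing steps described above, rank-based inverse Gaussian transformation and regression of confounds, can be sketched as follows. This is an illustration rather than the pipeline actually used: the rank offset of 0.5, the arbitrary handling of ties, and the function names `rank_inverse_normal` and `deconfound` are assumptions, and the data are synthetic.

```python
import numpy as np
from statistics import NormalDist

def rank_inverse_normal(values):
    """Map a 1-D variable to normal quantiles by rank; NaNs stay NaN."""
    v = np.asarray(values, dtype=float)
    out = np.full(v.shape, np.nan)
    ok = ~np.isnan(v)
    # Ranks 1..n among observed values (ties broken arbitrarily).
    ranks = np.argsort(np.argsort(v[ok])) + 1
    # Assumption: (rank - 0.5) / n as the quantile offset.
    quantiles = (ranks - 0.5) / ok.sum()
    out[ok] = [NormalDist().inv_cdf(q) for q in quantiles]
    return out

def deconfound(data, confounds):
    """Residualise data on confounds (plus an intercept) by least squares."""
    data = np.asarray(data, dtype=float)
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta

# Synthetic illustration: a skewed IDP that depends on head size.
rng = np.random.default_rng(2)
head_size = rng.normal(size=200)
idp = np.exp(rng.normal(size=200)) + 0.8 * head_size

head_size_n = rank_inverse_normal(head_size)
idp_n = rank_inverse_normal(idp)
idp_clean = deconfound(idp_n, head_size_n)
```

After this step the cleaned variable is uncorrelated with the transformed confound, so any remaining brain-behaviour correlation cannot be attributed linearly to that confound.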

Extreme-group design validation test: Univariate associations
In order to assess the impact of the EGD, we separately compute univariate associations between each IDP and behavioural measure for group A ("improvers") and group B ("decliners") subjects. Each sub-analysis should be free of any spurious associations driven by average group differences in cognitive level (i.e., an example of Simpson's paradox [43], whereby suboptimal pooling across variables such as cognitive level can potentially generate misleading associations).

Whole-group adjusted multivariate associations
To explore the relation between multiple brain IDPs and behavioural measures simultaneously we applied CCA.
For the CCA, we adopted a similar approach as described previously [79,85,92]. In short, CCA was computed (canoncorr; MATLAB 2014a) following the model: U = AX and V = BY; where X represents the set of IDPs, Y is the set of behavioural measures, and A and B are optimized to maximize the correlation between each canonical variate pair, U and V [89]. The magnitude of the relationship between each variate pair is reflected by the canonical correlation coefficient (R_c), an indicator of how strongly the estimate of population covariation is reflected in both IDP and behavioural datasets. Intuitively, we can think of CCA as identifying two latent variables, U_i and V_i (i.e. canonical variates whose elements we refer to as individual subject weights), from a specific linear combination of weighted MRI-derived brain measures that is most strongly associated with a specific linear combination of weighted behavioural measures.

[Figure caption: In this study, we explore "cross-sectional-longitudinal correlations" (pink arrows) and "longitudinal correlations" (blue arrow) using univariate and multivariate analyses. All correlations were adjusted for the nuisance confounders motion (during MRI), age, and head size.]
IDP and behavioural datasets for CCA analysis were prepared using the same procedure as for the univariate correlation analysis. This resulted in a brain-IDP matrix of size 123 × 454 (subjects × IDPs) and a behavioural matrix of size 123 × 114 (subjects × behavioural measures) when investigating longitudinal correlations, and a brain-IDP matrix of size 123 × 453 (subjects × IDPs) and a behavioural matrix of size 123 × 70 (subjects × behavioural measures) when investigating cross-sectional-longitudinal relations. Typically, these datasets are the inputs fed into the CCA algorithm. However, to reduce overfitting (i.e., tending towards a rank-deficient CCA solution), prior to CCA we separately reduced the dimensionality of each dataset using PCA. Specifically, after accounting for missing data as before, we compressed the size of each matrix along the respective phenotype dimension to the top 30 subject-eigenvectors, which accounted for ~70% of the total variance in our datasets (66.9% for IDPs, 76.1% for behavioural measures). The final dimension of each matrix fed into CCA was therefore 123 × 30 (subjects × PCA-derived components), with an output of 30 CCA modes estimated.
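The PCA reduction followed by CCA can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic matrices with the dimensions given above, not the MATLAB canoncorr analysis itself; `pca_reduce` and `cca` are hypothetical helper names, and the QR-plus-SVD route is one standard way of computing canonical correlations.

```python
import numpy as np

def pca_reduce(data, n_components):
    """Keep the top subject-space eigenvectors of column-centred data."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return u[:, :n_components], explained

def cca(x, y):
    """Canonical correlations and variate pairs via QR and SVD."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    a, r, bt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    # Columns of the returned matrices are the canonical variates U and V.
    return np.clip(r, 0.0, 1.0), qx @ a, qy @ bt.T

# Synthetic data with one shared latent factor (not cohort data).
rng = np.random.default_rng(3)
n_subjects = 123
latent = rng.normal(size=(n_subjects, 1))
idps = latent @ rng.normal(size=(1, 454)) + rng.normal(size=(n_subjects, 454))
behaviour = latent @ rng.normal(size=(1, 114)) + rng.normal(size=(n_subjects, 114))

idp_pcs, idp_var = pca_reduce(idps, 30)        # 123 x 30
beh_pcs, beh_var = pca_reduce(behaviour, 30)   # 123 x 30
r, u, v = cca(idp_pcs, beh_pcs)                # 30 modes
```

Reducing each matrix to 30 components keeps the combined dimensionality (60) below the number of subjects (123); without this step the CCA would fit the sample perfectly and return canonical correlations of 1 regardless of any true association.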
Statistical significance of the estimated modes was determined using 10,000 permutations of the rows of one matrix relative to the other. CCA was re-run after each permutation and the respective r-values for each permuted CCA mode were estimated. Each observed canonical correlation r was then compared to the null permutation distribution of the largest canonical correlation, yielding familywise error p-values corrected for searching over all 30 canonical correlation dimensions.
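The permutation scheme can be sketched as below. This is an illustrative implementation under stated assumptions: the data are synthetic, only 500 permutations are run to keep the example fast, and adding 1 to the numerator and denominator of the p-value is a common convention that the text does not specify. The function names are hypothetical.

```python
import numpy as np

def canonical_correlations(x, y):
    """All canonical correlations between two column-centred matrices."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

def permutation_fwe_pvalues(x, y, n_perm=1000, seed=0):
    """FWE-corrected p-values using the null of the largest correlation."""
    rng = np.random.default_rng(seed)
    observed = canonical_correlations(x, y)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle the rows (subjects) of one matrix relative to the other.
        shuffled = y[rng.permutation(len(y))]
        null_max[i] = canonical_correlations(x, shuffled)[0]
    # Assumption: count the observed labelling as one permutation (+1).
    exceed = (null_max[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (n_perm + 1)

# Synthetic example: one strong shared signal in the first columns.
rng = np.random.default_rng(4)
shared = rng.normal(size=123)
x = rng.normal(size=(123, 5))
y = rng.normal(size=(123, 5))
x[:, 0] += 2.0 * shared
y[:, 0] += 2.0 * shared

r_obs, p_fwe = permutation_fwe_pvalues(x, y, n_perm=500)
```

Because every mode is compared against the null distribution of the largest canonical correlation, later modes face the same threshold as the first, which is what makes the resulting p-values familywise-error corrected.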

Post-hoc correlations
To relate the estimated CCA modes back to the observed IDP and behavioural variables, we performed post-hoc correlations between each observed (quantile-normalised and deconfounded) variable and the canonical variate weights (U or V). This is analogous to the computation of factor loadings; in CCA these are formally referred to as canonical structure correlations. Generally, variables with larger post-hoc correlations have a greater association with a CCA mode.
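As a minimal sketch of this post-hoc step, the snippet below correlates each column of an observed data matrix with one canonical variate. The data are synthetic, `structure_correlations` is a hypothetical helper, and complete data are assumed (no missing values).

```python
import numpy as np

def structure_correlations(variables, variate):
    """Pearson correlation of each column of `variables` with `variate`."""
    v = variables - variables.mean(axis=0)
    u = variate - variate.mean()
    return (v.T @ u) / (np.linalg.norm(v, axis=0) * np.linalg.norm(u))

# Synthetic illustration: the variate is driven mostly by column 0.
rng = np.random.default_rng(5)
observed = rng.normal(size=(123, 4))
variate = 2.0 * observed[:, 0] + 0.5 * rng.normal(size=123)

loadings = structure_correlations(observed, variate)
# Rank variables by the absolute size of their post-hoc correlation.
ranking = np.argsort(-np.abs(loadings))
```

Sorting by absolute correlation gives the variables most strongly associated with the mode, while the sign indicates the direction of the association.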

AUTHOR CONTRIBUTIONS
KZ, SS and TN contributed towards analysis and interpretation of the data and drafting and revising the original manuscript for intellectual content. FA contributed towards analysis of the imaging data. ML contributed towards study direction and coordination of project. BF and ER contributed towards study design and review and editing of manuscript.

ACKNOWLEDGMENTS
We thank the members of the 1953 MDBC for their continued participation. We also thank all of those involved in the establishment of the Metropolit study and those who continue to work to support the aims of the project. We thank Ludovica Griffanti for her guidance and support with regards to the application of BIANCA to the imaging data. We also thank Naja Liv Hansen, the radiographers, and all other staff for their involvement in data collection. Finally, we also thank Copenhagen University for the supportive working conditions.