Estimation of brain age delta from brain imaging

It is of increasing interest to study “brain age” - the apparent age of a subject, as inferred from brain imaging data. The difference between brain age and actual age (the “delta”) is typically computed, reflecting deviation from the population norm. This therefore may reflect accelerated aging (positive delta) or resilience (negative delta) and has been found to be a useful correlate with factors such as disease and cognitive decline. However, although there has been a range of methods proposed for estimating brain age, there has been little study of the optimal ways of computing the delta. In this technical note we describe problems with the most common current approach, and present potential improvements. We evaluate different estimation methods on simulated and real data. We also find the strongest correlations of corrected brain age delta with 5,792 non-imaging variables (non-brain physical measures, life-factor measures, cognitive test scores, etc.), and also with 2,641 multimodal brain imaging-derived phenotypes, with data from 19,000 participants in UK Biobank.


Introduction
Brain imaging (and other sources of relevant data) can be used to predict "brain age" -the apparent age of individuals, when comparing their data against a population dataset spanning a range of ages. The difference between brain age and actual age (the "delta") is often then computed, providing a measure of whether a subject's brain appears to have aged more or less than the population average for their actual chronological age. For example, looking at structural MRI data, a high degree of atrophy (e.g., caused by disease) would cause a subject's brain to appear older than a normal age-matched brain [Franke et al., 2010;Cole and Franke, 2017].
The approach typically taken is to use one or more imaging modalities, for example, acquiring a T1-weighted structural image from each subject. The data then receives some level of preprocessing, e.g., alignment to standard space and tissue type segmentation. The imaging data then becomes "features" for predicting brain age -for example, from voxelwise maps of grey matter partial volume estimates, the voxelwise values themselves can be the features. Alternatively, a smaller number of more highly-condensed features, such as volumes of grey matter within multiple distinct brain regions of interest, may be generated. The entire dataset of multiple subjects' features, and their true ages, are fed into a supervised-learning algorithm (e.g., regression, support vector machine, deep learning), which learns to predict the subjects' ages from their brain imaging features. The hope is that a given subject's predicted brain age will deviate from their true age according to a meaningful delta, as long as this training is not badly overfitting.
While there have been a range of methods proposed for estimating brain age (i.e., choice of imaging-derived features to use, and choice of supervised-learning approach), there has been very little study of the optimal ways of then computing the delta. Presumably this is in part because this seems like a very simple calculation (delta equals brain age minus age). However, there are frequently various sources of bias in estimating brain age delta, which can give rise to significant false positives and false negatives when looking for associations between delta and other measures [Le et al., 2018]. Here we describe some important problems with this most common approach, and present potential improvements via explicitly laid out (albeit simple) mathematical frameworks.
We study these effects in simulated and real data, and suggest simple models for removing bias and increasing the accuracy of delta estimation. This includes models for correcting underestimation of brain aging, removing the resulting dependency of delta on age, modelling nonlinear dependence of brain aging (as a function of age), and studying nonadditive brain age delta. One of the causes of bias (regression dilution, see below) has recently also been discussed in depth in [Le et al., 2018], where the same linear correction as Eq. (4) below was proposed. A closely-related linear correction was also recently proposed in [Liang et al., 2019], although these authors suggest that regression towards the mean is the only source of bias, and we find empirically that non-Gaussian distribution of subject ages can be a major cause of bias.

Methods and results
In order to present the clearest description of the gradual development of models throughout the paper, we do not separate out Methods and Results sections, but intermix discussions of model improvements with simulation and real-data results.

Common approach
We start with some basic definitions, and outline the typical approach for brain age delta estimation. Actual age is Y (an N subjects Â 1 vector), brain age is Y B and the delta is δ ¼ Y B À Y. The imaging data matrix is X, which has N subjects rows and D columns; the columns are features from the imaging data, and might be different voxels, or different IDPs (imagingderived phenotypes -summary measures of brain structure and function). 1 It is common to try to predict Y B from X: Although many alternatives have been proposed for the form of f ðÞ and the fitting of its model parameters, the issues explored in this paper generally hold, irrespective of these choices (e.g., most brain age literature shows underestimation of brain age for old subjects, and overestimation in young subjects). We start with a very simple linear model, with f ðXÞ ¼ Xβ, a multiple regression with parameters β (a DÂ 1 vector). We therefore have: where we have added subscripts to differentiate some variables in this model from later variants. Clearly δ 1 is the brain age delta being estimated, and the noise term in this formulation. This multiple regression can be solved with standard simple methods, 2 for example, setting β 1 ¼ X þ Y using the pseudo-inverse X þ ¼ ðX'XÞ À1 X'. This gives the initially predicted brain age Y B1 ¼ XX þ Y, and brain age delta: Two complications are immediately clear: δ 1 will be orthogonal to the imaging data matrix X (though there is no reason to assume that this is desirable), and it will not be orthogonal to age (Y). This latter issue is likely to be problematic, given the simplest conception of the delta as being the difference between brain age and age, as a useful (objective, non-changing) description of a subject's brain-health that is not dependent on their current age. In the extreme case of X being an entirely useless model for brain aging, Xβ 1 % 0 and δ 1 will just be À1 times the actual age.
In addition, there are several factors that almost always cause the estimated β 1 to be biased towards zero (Fig. 1), resulting in under-fitting to Y and hence "moving" age dependence into δ 1 : 1. It is typical for more sophisticated fitting methods (e.g., sparse/regularised regression) to result in underestimation of β 1 [Grosenick et al., 2013]. 2. It is common for datasets to not have a Gaussian distribution (across all subjects) for age, often because of hard limits on age in the study design. In the linear model framework, it does not generally matter what the distribution of the predictors is (here, X), but a non-Gaussian distribution in the independent variable (here Y) can cause serious bias (typically underestimation) in the estimated β 1 . 3. Errors in measurement of the predictors (X) also likely cause underestimation of β 1 ("regression dilution") [Le et al., 2018].
Note that the above formulations, as they stand, model the entire set of subjects together, to estimate brain age and the delta, and these causes of bias often exist in such a scenario. In practice, researchers often use cross-validation (or a group of healthy subjects and a clinically distinct subject group), where model parameters are learned from training data and then applied to left-out data in order to estimate delta, in a way that is more robust against model over-fitting. Applying such cross-validation (as opposed to all-in-one fitting) can be yet another factor causing nonorthogonality between estimated delta and age (because, like with regularisation, the tendency would be to increase under-fitting).
Finally, with β 1 being underestimated, and age-dependence therefore moved into δ 1 , there is the danger that association tests against nonimaging variables (e.g., cognitive status, health outcomes) will be dominated by true age (i.e., be driven by the aging process), rather than the intended brain age delta. Obviously this can be eliminated through careful deconfounding of the non-imaging variables, but with agedependence left in δ, this still results in loss of statistical sensitivity. To re-iterate the danger here: if (as seems to happen frequently in the literature) estimated brain age delta is not orthogonal to age, and if other variables (cognitive status, health measures, etc.) have not been deconfounded with respect to age, then any apparent associations between "brain age delta" and the non-imaging measures might be more driven by age and not true delta.

Stage-2 correction of delta
One very simple approach to correct for all of the above issues is to remove age dependence in δ 1 in a second step. For now we will assume only linear corrections, and will return to nonlinear correction later.
The method is easily motivated by considering the scatterplot of δ 1 against Y (Fig. 2). As discussed above, ideally we would want the data to be distributed around δ 1 ¼ 0, with no overall slope (dependence on Y). If an overall slope is present, we can simply fit a straight line to the full data cloud and subtract this: where we have defined δ 2 (the residuals from this fitting), to be orthogonal to age, with biases (and modelling failures due to poor X) removed.
Although it is perfectly convenient (and easy to understand) to carry out this modelling in two separate steps, they can easily be combined into a single calculation: 1 We assume throughout that Y and all columns in X are demeaned (shifted to have zero mean), to simplify equations with no loss of generality. Also, for readability, we do not in general differentiate in our notation between true parameters and estimates of those parameters. 2 Alternatively, if this is poorly conditioned, e.g., with the number of features D too large, using penalised regression. . A. The model A is Gaussian distributed, and simulated B is set to be equal to A, with Gaussian noise added to B. The true model fit (β ¼ 1) is correctly estimated. B. Increased measurement noise on B increases the error in the model fit (σ), but does not cause bias (even though the data cloud appears rotated). C.
Changing A to being non-Gaussian does not harm the model fitting. D. Applying regularised model-fitting (here Tikhonov regularisation) biases the model fit towards zero (even though the data cloud is not rotated). E. Adding measurement noise to A (i.e., after it has been used to generate B) causes regression dilution. F. Truncating B (a special case of non-Gaussianity in B) causes a biased model fit. The plots of predicted brain age from steps 1 and 2, vs. actual age. B,E. The plots of estimated brain age delta from steps 1 and 2, vs. actual age. C,F. The plots of estimated brain age delta from steps 1 and 2, vs. true delta. where M Y ¼ I À YY þ is the "residual-forming matrix", which orthogonalises a vector with respect to Y. 3 The second term in the brackets in Eqn.
(5) disappears because M Y Y ¼ 0. A simple intuitive interpretation of this is that XX þ Y is the predicted age from step 1, and therefore δ 2 is just the orthogonalisation of this with respect to actual age. The predicted brain age from this second step is Y B2 ¼ Y þ δ 2 . Fig. 2 uses a simple simulation to illustrate the two stages of the model fitting described above. In A and B the biases in the one-stage modelling are apparent, and these are removed in D and E. Furthermore, the correlation of estimated delta with the true delta is improved in the second step.

Switching predictor matrix and age
Alternatively, one could frame the modelling in an arguably more natural framework, with X being dependent on Y B : where γ is a 1 Â D row vector, meaning that Yγ is a rank-1 approximation to X. This approach has the advantage of looking "causally sensible" (brain aging affects what we measure of the brain). Also, if this is solved using Y as the predictor variable, δ 3 will be treated as being in the residuals and hence is defined as being orthogonal to age (and not to X).
Finally, some problems such as regression dilution go away, as we can generally assume that there are no errors in Y.
However, one might (in general correctly) expect that this model is not as statistically powerful as the above approaches, given that each column in X is being modelled separately from each other (with respect to estimating γ).
Again, this formulation can be easily solved: where scaling factor k ¼ ðY'YÞ=ðY'XX'YÞ, as derived from multiplying by the right-pseudo-inverse of γ. This defines the residuals ε as being vertically orthogonal to age, and horizontally orthogonal to γ. It can immediately be seen that δ 3 becomes equal to kδ 2 if X is first orthonormalised before being used here (i.e., X þ ¼ X'). This makes intuitive sense as we would now be saying that the columns in X do not depend on each other, so the potential for statistical insensitivity in the formulation of Eq. (7) disappears.

Improved prediction accuracy via singular value decomposition
It is straightforward (and typical) for the first stage of the common approach to use regularised modelling applied to X, particularly given that X is frequently formed from (unwrapped) maps of voxels, resulting in there being too many variables in X for trivial application of multiple regression. Even if this were not the case, regularised estimations are known to often perform better than non-regularised estimations in terms of statistical efficiency, for a sufficiently good choice of the regularisation parameter.
As one option for regularisation, X can be pre-reduced using SVD (singular value decomposition): X ¼ USV', where the J strongest eigensubject-vectors from U (those that explain the strongest variance in X) would be used in place of X. This procedure is sometimes referred to as principal-component-regression [Massy, 1965;Franke et al., 2010], and has the general effect of "denoising" X and therefore improving overall modelling for an appropriate choice of J. By definition, U is orthonormal, and hence, as discussed above, this causes δ 2 and δ 3 to be the same (apart from the overall scaling). It is important to note that, even when using effective regularisation, it remains the case (as demonstrated in examples below) that the initial estimate (δ 1 ) most commonly used in brain age studies remains suboptimal and biased. In the following sections we illustrate this with simulated and real data.

Cross-validation
Although not a primary focus of this paper, we briefly discuss here the use of cross-validation. In general, when there is any risk of an analysis over-fitting the data (e.g., where there is some data-dependence in the processing, such as is the case with any supervised machine learning), it is important to use methods such as cross-validation to avoid inflated estimates of modelling success.
The models covered in this paper are sufficiently well-conditioned when run on large subject numbers that cross-validation is not expected to change results greatly. In practice, we found that results reported in the various simulations reported below were only significantly altered (when using cross-validation) where the models fitted had the greatest numbers of features -i.e., when using the full (no SVD) model for X, or when using the maximum number of SVD eigenvectors. Nevertheless, all results reported below were estimated using cross-validation, which we now describe briefly.
We applied 10-fold cross-validation, where the data samples are randomly assigned into 10 roughly equal-sized groups. For each group of left out data, the other 90% of samples (subjects) are used to "train", i.e., estimate, model parameters. These parameters are then applied to the left-out subjects. In this case, the training refers to 4 analysis stages: a) confound removal (for real data), b) SVD reduction, c) δ 1 initial estimation and d) δ 2=3 correction.
Finally, note that if automated model tuning is to be carried out (e.g., optimal SVD dimensionality to be estimated from the data), then clearly this also should be done within a cross-validation framework.

Evaluations with simulations and real data
We now present results from simple but realistic simulations and real data.

Simulation 1
For Simulation 1, we set the number of samples (subjects) to 20000, set a fairly sharply truncated (non-Gaussian) distribution for the age range (total range approximately 45-75y), and added Gaussian δ with standard deviation 2y to form the gold-standard brain age. We then defined 100 underlying components (processes) of subject variation in "brain imaging" measures, the first being brain age, and the other 99 being random. We then mixed these ground truth population modes by a (100x3000) sparse mixing matrix (random Gaussian noise to the fifth power) to form 3000 imaging variables, resulting in an X of size 20000x3000. Finally, we standardised all columns in X to 1, and added measurement noise with a standard deviation of 0.5. (Reducing this noise to the kinder level of 0.1 does not make a large qualitative difference to the results.) We ran the simulation 20 times, and show the mean and standard deviation results across the 20.
The results can be seen in Table 1. The most important results are the 3 right-most columns, where (in the simulated datasets) Q shows how well the different estimates of δ correlate with the true δ. The first row shows results when the "raw" X is used (i.e., without SVD). This is outperformed by the best SVD-based analyses. However, even in this case it is clear that δ 1 is not as accurate at recovering the true δ as δ 2 .
The best estimates of brain age delta are when using δ 2 or δ 3 with an SVD reduction to 100 components. Reducing the number of subjects to 500 gives qualitatively similar results, with the best option again being SVD reduction to 100 components (and best δ 2=3 prediction of true delta having a correlation of 0.82).
In all cases there is strong (negative) correlation between estimated brain age delta δ 1 and actual age (column 2), an undesirable feature of the most simple brain age modelling. This is at its worst for poor models (very low SVD dimensionality), which in effect is like Fig. 1E (regression dilution). We do not show these correlations for δ 2 and δ 3 as these are zero by design. While many of these analyses show very good correlation between actual age and predicted age (columns 3-5), it is clear (from the final 3 columns) that this is not a good indicator of successful modelling of the brain age δ.
When X is orthogonalised via SVD, δ 2 and δ 3 become identical (apart from scaling factor k), and results improve for sufficiently large values of J. The best results are achieved when setting the SVD dimensionality J to be the "correct" number (here 100). Underestimating dimensionality (e.g., here J ¼ 50) damages estimation much more than overestimation (e.g., J ¼ 1000), an important result to keep in mind when choosing J.
The estimates of mean absolute values of δ largely show the expected pattern of results: smaller δ in general indicates more successful modelling (as judged by the gold standard metric, the accuracy of recovering the true δ). This is expected because successful modelling of age implies relatively small modelling residuals, upon which delta is based. However, there are clearly exceptions to this, e.g., with the smallest (across different models) jδ 2 j corresponding to total modelling failure (SVD J ¼ 1). 4 More importantly (in practice), the optimal modelling choices can show slightly "supoptimal" mean absolute δ (see UK Biobank results discussed below). Hence it is important to realise that simply choosing (or tuning) a method that minimises estimated δ does not guarantee to give the best overall results.
While δ 2 and δ 3 are identical 5 (up to a scaling factor) when using SVD, their overall scaling performs quite differently, with the scaling of δ 2 being more stable and accurate overall. This means that correlations between δ 2 and δ 3 with true δ are the same, but, because of the difference in overall scaling, once these are added to true age to form estimated brain age, that will differ between the two models (as seen in r(Y, Y B1 ) vs. r(Y, Y B2 )).

Simulation 2
Simulation 2 reflects a much simpler scenario (probably unrealistically simple), in order to illustrate what happens if there are no other structured effects in the data apart from age dependency. In this case the 99 structured effects were removed, and noise standard deviation raised to 10. When using the original X (i.e., no SVD), δ 2 performs significantly worse than δ 3 . Using SVD gives results that are as good as when using δ 3 with no SVD (and using J > 1 gives almost identical results; using J ¼ 1 works well here because of the true data rank being 1). Most importantly, the simple common approach (δ 1 ) still results in suboptimal estimation of true delta, and strong incorrect correlation with age.

Real data
The real data evaluation used IDPs from 19000 subjects in UK Biobank. We used 2641 IDPs spanning a range of structural, diffusion and fMRI phenotypes, to create X. Confounds were removed from the data as done in [Miller et al., 2016;Elliott et al., 2018] (although of course age-dependent confounds were not removed from X). While it is common (and generally sensible) to derive brain-aging models from healthy subjects only, and then apply those models to all subjects (including those Table 1 Results from quantitative evaluations of brain age estimation. We show results for two different simulations and one real dataset. Different rows are for different SVD dimensionality reductions, with the first in each experiment being with no use of SVD at all. Columns 2-5 show the correlations of various estimated measures with true age. The next three show the mean absolute value of three different brain age δ estimates (for the true δ in the simulations, it is 1.6y). The final three columns ("Q") show the most important aspect of the results: in the case of the simulations, these are the correlation between the true delta and the different estimations of delta. In the case of the real data from UK Biobank, where we do not know the true delta, we are using as a surrogate a summary measure of the significance of the correlation between estimated delta and 5792 non-imaging variables.
with disease), we did not remove individual subjects from the modelling here using UK Biobank data, given that the fractions of imaged subjects having specific existent diagnoses are low (less than 10% having mental health or neurological diagnoses).
The methods described above were applied to estimate brain age delta. As this is real data we do not know the "true" delta, and therefore as a surrogate for this, for the "Q" columns, we instead estimated the significance of the correlation between estimated deltas and 5792 nonimaging variables, after applying the same deconfounding (though now including age-dependent confounds) to those variables. In general we would hope that "higher correlation is better": accurately estimated brain age delta should have stronger correlations with interesting non-imaging variables such as health outcomes and biophysical markers. In order to turn the 5792 correlations into a single summary statistic, with emphasis on stronger associations rather than weakest (null) associations, we took the 99th percentile of Àlog 10 ðPÞ over all non-imaging variables as our measure of success ("Q").
Doing SVD reduction with J % 50 components gave the best results, and δ 2=3 gave much stronger associations with non-imaging variables than δ 1 . Encouragingly, a relatively wide range of J (e.g., 20-100) all gave similar strong results. The "symmetry" of Q (as a function of J, looking either side of the optimal dimensionality) for real data is different than the pattern found in the simulated data, where success fell off very quickly as dimensionality was reduced to being lower than the true dimensionality. We hypothesised that this was because the simulated data had a true strong cutoff in the eigenspectrum, whereas real data has a much more gradually falling eigenspectrum, as structured signals are of varying strength. We re-ran Simulation 1, this time with the strengths of the added 99 dimensions of structured subject covariation being modulated by a random number uniformly drawn from 0:1. The results were unchanged with respect to the optimal dimensionality (still being 100, and having Q ¼ 0.97). However, now the results were more symmetric about this, i.e., lower dimensionalities did not perform as badly, with even a low J of 25 giving Q ¼ 0.8.
The real data results are illustrated further in Fig. 3, showing clearly the value in correcting the estimate of brain age delta, and the danger of computing associations between biased brain age delta and non-imaging variables that have not been deconfounded for age.

Nonlinear relationships between age and imaging data
One cannot assume that the effect of aging on imaging measures is a linear function of age; indeed, acceleration of the effects of aging (in older age) seems quite likely, particularly in disease. The above models are simple to adapt to include an additive nonlinear term in Y, the most natural extension being to add a quadratic term. 6 To adapt the first model described above, we can integrate the quadratic correction into the second step (that also corrects for bias in estimating the linear effect in step 1). Hence we subsume the quadratic term into the initially estimated δ (which is particularly straightforward Fig. 3. Brain age prediction results with real data from UK Biobank (N¼19000). A,B) The first two scatterplots show the clear bias (agedependence) in δ 1 being corrected in δ 2 (each point is one subject). C) shows relative strengths of correlations of δ 1 and δ 2 with (fully deconfounded) non-imaging variables, with the latter showing consistently stronger associations (each point relates to the association of a single nonimaging variable to brain age delta). P-values are uncorrected for multiple comparisons. D) shows the change in associations with δ 1 , without (x axis) vs. with (y axis) age-deconfounding of the non-imaging variables; this illustrates the danger of looking for associations between biased brain age delta and non-imaging variables that have not been corrected for age dependence. E) shows a similar story, when using δ 2 on the y axis; bias can cause (probably unwanted and misleading) associations (points under the y ¼ x line), while applying the corrections described here can raise sensitivity to finding (what are hopefully valid) associations in other cases (points above the line). 6 Without loss of generality, and to simplify notation and interpretation, in our equations and in the simulations, by Y 2 we mean a quadratic term that has been orthogonalised with respect to Y and demeaned. If Y has already been demeaned, then Y 2 will already be close to being orthogonal to Y.
if we assume that the quadratic term has been orthogonalised with respect to Y): where β 2q has two free parameters, covering the linear and quadratic regressors. Note that the first step is identical to what we had before; hence the unchanged subscript in δ 1 . This can all be combined: where Fig. 4 uses a simple simulation to illustrate the two stages of the model fitting described above. In A and B the biases in the one-stage modelling are apparent, and these are removed in D and E. Furthermore, the correlation of estimated delta with the true delta is improved in the second step.
Note that step 1 above assumes that it is most useful to group the quadratic term into the residuals of the first model fitting. It may be that some study populations have a linear age effect that is smaller than the quadratic, for example with young healthy adults around the peak of the lifespan "growth-aging inverted-U curve". In such cases, it might be possible that step 1 should fit to the quadratic term rather than the linear, with step 2 being unchanged.
For the second model, that switches predictor matrix and age, again the extension to a quadratic term in Y is straightforward: where γ becomes two rows of parameters (and hence incorporating α), and for the final line we have post-multiplied by just the first row of γ þ , i.e., X'Yk. According to the model in Eq. (13), the two rows of γ should be identical (up to a scale factor), but in practice there will normally be much less variance in the data associated with the quadratic term, and we find indeed that results are improved by only using the linear term (i.e., using the first row of γ). (Again, in some populations, it may be better to use the quadratic part of the model rather than the linear, for the estimate of γ.) Table 2 shows quantitative results from simulations of nonlinear brain aging, and from the UK Biobank real data. For the simulations, we start with the first linear simulation above, and add quadratic aging effects. For these 3 simulations we set true quadratic-term α values (Eq. (9)) of 0, 0.01 and 0.025 respectively, corresponding to total deviations away from linear brain aging of 0, 4y and 10y.
With no simulated nonlinear effect (Sim1), the quadratic models give identical results to the linear models (i.e., there is no noticeable penalty being paid for the additional model flexibility, presumably because, given the simplicity of the corrections to delta, overfitting is negligible). When a quadratic effect is included in the simulation, the quadraticcorrection models (i.e., generating δ 2q and δ 3q ) provide a major improvement in accuracy over the linear models. Accuracy of recovering the true δ remains high even for large amounts of nonlinear behaviour, providing J is optimal. As before, optimal SVD data reduction outperforms using the original data matrix X.
The quadratic modelling in the UKB real data accounts approximately for a 2y total deviation from the linear fit. Including this quadratic modelling results in improvements in the strength of associations with non-imaging variables, although these improvements are here very small. In disease populations this might be expected to be much greater. Where the nonlinear effect is greater, it is straightforward (particularly for the first, two-stage, model) to extend the above formulations to higher powers, or, possibly more well-conditioned, nonlinear modelling such as with splines.
In all cases the "common" approach (δ 1 ) performs significantly worse than these models that correct for bias and nonlinearity in brain age prediction. Fig. 4. Examples of the two stages of quadratic-fit age prediction. The black lines show the ideal linear model fits. A,D. The plots of predicted brain age from steps 1 and 2, vs. actual age. B,E. The plots of estimated brain age delta from steps 1 and 2, vs. actual age. C,F. The plots of estimated brain age delta from steps 1 and 2, vs. true delta.

Non-additive brain age delta
A different model for brain aging would be for each person to be aging at a different rate, meaning that their delta (according to the above models) is changing with age, as opposed to being fixed. Unfortunately, this cannot trivially be distinguished from having constant delta, at the level of the individual subject, if there is only one measurement (timepoint) per subject.
This ambiguity arises because subject-specific rate of aging would be considered to create a spread of trajectories around the population mean aging curve. By this definition, the effect of interest is orthogonal to the population mean aging effect, and therefore gets included in estimates of delta given above. Hence, the above aging models do not explicitly identify multiplicative brain aging, yet readily fit data from individual subjects at a single timepoint as an additive offset term regardless of the cause (constant offset vs. multiplicative).
However, while additive and multiplicative brain aging cannot easily be disambiguated at the subject level, it is possible to apply the above models, and then use the resulting delta values to estimate how scaling of the size of delta is changing with age in the population as a whole: where we form a temporary version of age Y 0 , which is a linear mapping of Y into the range 0:1, and hence jδ 0 j relates to the brain age delta distribution for the youngest subjects in the data. The product on the right is a pointwise product between the two column vectors δ 0 and ð1 þ λY 0 Þ. The reason for formulating things this way will now become clear, where we solve by squaring, taking the natural log, approximating the log function on the right using a power series expansion, and solving with linear regression: where D 0 ¼ logðδ 2 0 Þ is the noise in this model and this assumes that λ is not very large and negative. Hence fitting the model Y 0 to data logðjδjÞ gives us an initial λ, which we can then adjust for the expansion approximation error: To evaluate this with simulated data, we took the first linear simulation described above, and added age-dependence by setting true lambda to 0.5 according to Eq. (17). We then ran the simulation 10 times, with J ¼ 100, estimating λ from δ 2 . This resulted in estimated λ ¼ 0:46 AE 0:09.
On real data from UK Biobank, using δ 2q estimated using optimal SVD setting of J ¼ 50, we find that λ ¼ 0.13, with the regression-based fitting Table 2 Results from quantitative evaluations of brain age estimation in the presence of nonlinear brain aging. We show results for 3 realistic simulations and a real dataset. The 3 simulations have different strengths of nonlinearity in brain age Y 2 α, with α ¼ 0; 0:01; 0:025 respectively. Different rows are for different SVD dimensionality reductions, with the first in each experiment being with no use of SVD. We show mean absolute value of different brain age δ estimates (for the true δ in the simulations, it is 1.6y) and then correlations between estimated and true δ (for the simulations) and significance of associations with non-imaging variables (for real data), as above. also estimating this as being significantly greater than zero (P ¼ 0.001). This corresponds to an increase in "spread" of brain age delta of about jδjλ ¼ 0:4y when moving from the youngest to oldest subjects in UKB (45-80y). This provides evidence for a modest non-additive brain aging effect in this largely healthy, aging population.

Further results from UK Biobank brain age estimation
We carried out brain age delta estimation on UK Biobank data as described above, and also for females and males separately. Below we report results using δ 2q .
The delta estimations for females were almost identical when comparing delta estimated purely for females vs. estimated from the allin-one delta estimation from females and males together (r ¼ 0.98). The same was true for males (r ¼ 0.97). From the all-in-one analysis, females had a mean brain age delta that was 0.7y higher than in males.
We then correlated these various versions of brain age delta against 5792 non-imaging variables from UK Biobank, and converted these uncorrected P-values into À log 10 P. As expected from the above results, these vectors of correlation significance (with one entry in the vector for each non-imaging variable) were highly similar when comparing sexseparated delta estimation for females against when taking the deltas for females from the all-in-one estimation (r ¼ 0.98). The same was found for males (r ¼ 0.98). Hence, for sex-specific results reported below, we only list those from the sex-separated delta estimations.
Comparing Àlog 10 P derived from all subjects (both sexes) from the all-in-one analysis against the sex-separated estimates of Àlog 10 P showed greater differences (all subjects vs. female: r ¼ 0.87, vs. male: r ¼ 0.73). Table 3 The strongest associations between brain age delta and non-imaging variables in UK Biobank, 19038 subjects, males and females combined. Positive correlations (red italics) imply accelerated brain aging.
Comparing Àlog 10 P derived from female-only analysis against male-only gave r ¼ 0.36.
The strongest associations with non-imaging variables are shown in Tables 3 and 4. 7 Bonferroni correction, for the number of non-imaging variables, gives a threshold for Àlog 10 P of 5.1, but to limit the number of reported associations to a reasonable degree (and because effect sizes become tiny at this threshold, the weakest passing this being 0.1% variance explained!), we report results for Àlog 10 P > 8. The non-imaging variables are denoted via unique variable codes and brief descriptions. For more information on any variable, one can take the initial integer from the ID, and search for the variable on the UK Biobank website: https ://biobank.ctsu.ox.ac.uk/showcase/search.cgi. For these lists we have manually removed largely-redundant (highly Table 4 The strongest associations between brain age delta and non-imaging variables in UK Biobank, for just the 10112 females, and just the 8926 males. Positive correlations (red italics) imply accelerated brain aging. similar) variables for purposes of readability, listing just the strongest associated result in each case. If a variable is positively correlated with brain age delta, this implies that accelerated brain aging is associated with larger values of that variable, i.e., it is a "bad" life factor or biological measure. Of course, the results do not allow one to infer causality. There are several strong patterns that emerge. Higher body weight, body fat and bone density are all associated with reduced brain aging, largely in females. Higher blood pressure, heart rate and blood haemoglobin are all associated with accelerated brain aging, in both females and males, but to a slightly stronger extent in males.
Smoking and alcohol are associated with accelerated brain aging. There are several life-factor/life-style measures associated with delta, presumably reflecting socio-economic status, where the biological causality is likely complex (e.g., household income, number of vehicles in household, and possibly also related, time spent outdoors in summer). Several measures from the cognitive testing are associated with brain age delta, in all cases in the direction one might expect; measures of success/ accuracy correlate negatively, and measures of "time taken" (to complete a cognitive task) correlate positively.
Finally, two associations related to clinical diagnosis/treatment, both found in males, are that the number of treatments/medication taken, and diagnoses of diabetes, are both associated with accelerated brain aging.
We also looked at the correlations between delta and the imaging features (IDPs). This is expected to largely reflect which IDPs most strongly contribute to the modelling of the brain age delta, but of course, being a univariate analysis, is straightforward to interpret and does not take into account redundancy across IDPs. Table 5 lists these, sorted according to decreasing strength of correlations computed from all subjects (females and males), but also showing correlations for just females and just males. We do not report P here, as, given the relatively strong correlations, the P values are too strongly significant to be differentially informative. IDPs are included in the table if any of the 3 correlations (from females only, males only, or all subjects) are stronger than 0.3.
We highlight in blue the IDPs for which the correlation with brain age delta is different between females and males by more than 0.05. More detailed descriptions of the IDPs (including expansions of some of the anatomical acronyms) can be found here; http://www.fmrib.ox.ac.uk/ ukbiobank/IDPinfo_Jan2018.txt.
The strongest associations with delta are for grey and white total volumes, both normalised for head size and unnormalised. (Note also that we have used fully deconfounded versions of the IDPs for these correlations with delta, which included regressing out head size). There are then many measures of white matter microstructure (derived from the diffusion MRI) that correlate with delta, a minority of which are reasonably strongly different between the sexes.

Conclusions
We have discussed various problems and potential solutions relating to the estimation of brain age delta. It has commonly been the case that estimated delta is not orthogonal to age, and this can result in false associations with non-imaging variables, as well as a loss in sensitivity to finding valid associations. While estimation bias is a common general statistical issue, it is specifically problematic in the area of brain-age modelling, because the final computed delta is the difference of the estimated quantity and the true value.
We found that the strength of correlation of estimated brain age with actual age is not guaranteed to be a good indicator of optimal delta estimation, and neither is the (smallness of) the size of mean absolute delta.
We have shown that brain age delta can be estimated well by fitting one of the models described in this paper (e.g., see algorithm below), where bias in brain age estimation, and nonlinear dependence can be adjusted for.
We have also described how one might study non-additive effects of aging, where different subjects age at different rates. One of the few Table 5 The strongest associations between brain age delta and imaging-derived phenotypes (IDPs) in UK Biobank, for all subjects, and for just the 10112 females, and just the 8926 males. IDP names listed in blue italics denotes where the correlations differ between females and males by more than 0.05. studies that has carried out linear age adjustment of delta is . Interestingly, in this case, the adjustment reduced the strength of associations with external information (in that case, heritability). While this goes against most of our findings above, it may relate to non-additive aging, in which case, combining the modelling (for non-additive aging) presented here with the corrected delta models presented above might optimise a delta estimation. Alternatively, it may simply be that the bias (present in delta before correction) contributed to apparent heritability, given that twins have the same age and so experience related bias.
One simple, effective and computationally cheap tool for modelling brain age (before delta estimation) is to reduce the input data features (from the brain imaging) via SVD. We have shown that there is likely to be an optimal SVD dimensionality for the data reduction, and that it is safer to slightly overestimate rather than underestimate this dimensionality.
To apply the recommended approach for estimating brain age delta, apply the following.
All associations of brain-age delta with UK Biobank non-imaging variables are listed in a supplementary spreadsheet. Example Matlab code for the delta computations and all simulations can be found at: http://www.fmrib.ox.ac.uk/BrainAgeDelta.