A watershed model of individual differences in fluid intelligence

Fluid intelligence is a crucial cognitive ability that predicts key life outcomes across the lifespan. Strong empirical links exist between fluid intelligence and processing speed on the one hand, and white matter integrity and processing speed on the other. We propose a watershed model that integrates these three explanatory levels in a principled manner in a single statistical model, with processing speed and white matter figuring as intermediate endophenotypes. We fit this model in a large (N=555) adult lifespan cohort from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) using multiple measures of processing speed, white matter health and fluid intelligence. The model fit the data well, outperforming competing models and providing evidence for a many-to-one mapping between white matter integrity, processing speed and fluid intelligence. The model can be naturally extended to integrate other cognitive domains, endophenotypes and genotypes.


Introduction
Fluid intelligence, or fluid reasoning, is a core feature of human cognition.It refers to the ability to solve novel, abstract problems that do not depend on task-specific knowledge (Blair, 2006;Carroll, 1993;Deary, 2012;Horn and Cattell, 1966).In contrast to crystallized intelligence, which continues to improve across most of the lifespan, fluid intelligence shows strong age-related declines (Horn and Cattell, 1966;Salthouse, 2009).Understanding the causes of this decline is important for healthy ageing, as preserved fluid intelligence is strongly associated with independent day-to-day functioning (Tucker-Drob, 2011;Willis and Schaie, 1986), and is inversely related to mortality risk (Aichele et al., 2015).At the other end of the lifespan, low fluid intelligence in adolescence predicts poor outcome in later life (Huepe et al., 2011) and is a risk factor for psychopathologies such as schizophrenia (Blair, 2006;Snitz et al., 2006).However, our understanding of how this crucial cognitive ability relates to broader, mechanistic frameworks of cognition and the brain is limited.A promising line of research focuses on the relationships between fluid intelligence, processing speed and white matter organisation.Although intriguing, these empirical relationships are often interpreted in isolation, relating fluid reasoning to processing speed (e.g.Sheppard & Vernon, 2008), processing speed to white matter (e.g.Penke et al., 2010), or fluid intelligence to white matter (e.g.Haász et al., 2013), but never the three together.One unresolved question is therefore whether fluid intelligence, processing speed and white matter can be thought of as part of a single, hierarchical system.
Here, we propose a statistical framework to examine this question, developed by formalizing a conceptual model taken from the literature on psychopathological constructs and their causes.This so-called 'watershed model' (Cannon and Keller, 2006) uses the metaphor of a river system to illustrate how complex behavioural traits can be seen as the downstream consequence of many small upstream (e.g., neural/genetic) contributions.From this perspective, the relationship between fluid intelligence (hereafter FI), processing speed (PS) and white matter (WM) is hierarchical, such that WM influences PS, which in turn affects performance on tests of FI.We show that this model naturally accommodates a wide and disparate range of empirical findings, integrates a series of relatively well-established findings into a single larger model, and, most importantly, can be formally tested using Structural Equation Modelling (SEM).We derive a variety of statistical predictions that follow from the watershed model, and use SEM to test these predictions empirically in a large (N=555), population-based sample of ageing adults (18-87 years, Cam-CAN).First, we examine the empirical evidence concerning FI, PS and WM.
1.2 Processing speed, fluid intelligence and white matter Processing speed refers to the general speed with which mental computations are performed.It has been considered a central feature of higher cognitive functioning since the development of the first formalized models of (fluid) intelligence (Salthouse, 1982;Spearman, 1927).
Processing speed is a broad concept that has can be measured in a variety of ways (Salthouse, 2000).
One common approach is to use a set of tasks with strict time-limits, and consider the shared variance across those tasks to reflect an individuals' ability to perform cognitive tasks under time pressure (Babcock et al., 1997).Even purely physiological measures have been considered, such as the latency of neural evoked responses (Salthouse, 2000, Schubert et al., 2015).Other possibilities include the parameters estimated from response time distributions in a single task, such as the mean, standard deviation and exponential for an ex-gaussian distribution, or parameters such as drift rate and boundary separation in diffusion models (Matzke & Wagenmakers, 2009, Ratcliff, Smith, Brown, & McKoon, 2016).Here, we focus on the most basic and simple notions of processing speed, sometimes called psychomotor speed, namely the mean and standard deviation of RT distributions for simple tasks.
The empirical association between PS and FI is one of the most robust findings in psychology (Sheppard and Vernon, 2008).This association holds across the lifespan (Salthouse, 1994), in both healthy elderly (Ritchie et al., 2014) and in the extremes of mental retardation (e.g.Kail, 1992).
Longitudinal studies of either end of the lifespan show similar patterns.Dougherty and Haith (1997) showed that infant reaction time at 3.5 months predicts IQ several years later, and Fry and Hale (1996) showed in 214 children and adolescents how longitudinal changes in processing speed mediated changes in fluid intelligence and working memory.At the other end of the lifespan, declines in PS and FI show considerable correlations in old age, with estimates ranging from .53 (Zimprich & Martin, 2002) to .78 (Ritchie et al., 2014) .Similarly, a large longitudinal cohort study (Ghisletta, Rabbitt, Lunn, & Lindenberger, 2012) showed that a considerable portion of within-subject agerelated decline was shared between FI and PS.Although few studies have explicitly examined the temporal ordering of developmental changes, those that do generally find that declines in PS affect declines in FI and related cognitive abilities.For instance, Kail (2007) examined 185 children (age 8-13) tested twice on multiple outcomes, and found that the best mediation model described a developmental cascade, wherein improvements in processing speed affected working memory which in turn enhanced reasoning.In older adults, Robitaille et al. (2013) showed in two separate cohorts that within-subject declines in processing speed mediated within-subject declines in multiple cognitive domains, including fluid reasoning.Finally, Finkel, Reynolds, McArdle and Pedersen (2007) used bivariate latent change score models in older adults to show that processing speed was a leading indicator of cognitive changes, including in abstract reasoning tasks.Together, these behavioural findings suggest a strong relationship between processing speed and fluid reasoning ability.
The most common metric of PS is the central tendency, such as the mean or median, of RTs on a simple reaction time task.However, individual differences in the variability of RTs also relate to fluid reasoning ability (Rabbitt, 1993), such that less variable responses are associated with higher scores on fluid reasoning tasks.This 'cognitive consistency' in RTs has been shown to predict cognitive performance in elderly subjects beyond mean RT (MacDonald, Li, & Bäckman, 2009).Both the central tendency and variability of PS predict all-cause mortality (Batterham et al., 2014;Hagger-Johnson et al., 2014), supporting the idea that both are important and independent components of PS.The role of variability can be observed even on the purely neural level: A study using EEG in young adults (Euler et al., 2015) found evidence for the role of variability of neural responses, such that individuals 6 with more stable (less variable) responses to novel stimuli tended to have higher fluid reasoning ability.
Recent work suggests that the proper conceptualisation of the relation between PS and FI is as a causal factor (e.g., Kail, 2000;Rindermann and Neubauer, 2004;Robitaille et al., 2013).The most influential causal account comes from Salthouse (1996), who suggested at least two mechanisms by which PS affects cognitive performance, namely the limited time mechanism and the simultaneity mechanism.The former suggests that in any timed task, slower speed of processing simply precludes the timely completion of cognitive operations, leading to poorer scores; the latter suggests that high PS is necessary to juggle mental representations simultaneously, in order to perform complex cognitive operations (see Burzynska et al., 2013, for neuroimaging evidence for this claim).More recent work (Schubert et al., 2015) used drift-diffusion and EEG modelling to show that there are multiple components to processing speed, and that these components play different causal roles in different cognitive tasks.In summary, nearly all of the papers reviewed above, either explicitly or implicitly, consider PS to be a 'lower', or more fundamental, mental process that is not identical to FI itself (see also Schubert et al., 2015).We can also go further down this presumed causal hierarchy to understand the possible determinants of PS.One such candidate is the structural organization of white matter tracts.
Among the most influential studies showing the importance of white matter organisation are two papers by Penke and colleagues, who showed that the first principal component of fractional anisotropy (FA, a measure of white matter organization) predicted both information processing speed (Penke et al., 2010) as well as general intelligence (Penke et al., 2012).Further work has shown that decreased WM organisation has been associated with decreased PS both in healthy adults (Tuch et al., 2005;Penke et al., 2010) and in individuals suffering from clinical conditions associated with WM loss such as Multiple Sclerosis (Kail, 1997(Kail, , 1998;;Roosendaal et al., 2009;Segura et al., 2010; see Bennett & Madden, 2013, for a review).However, in a sample of 90 older adults, Yang, Bender, & Raz (2014) did not find strong associations between white matter organisation and reaction time components derived from a diffusion model.WM organisation has also been associated with the variability of RTs in children (Tamnes et al., 2012), in healthy controls and preclinical Alzheimer's dementia (Jackson et al. 2012), and decline in WM has been proposed as a key cause of age-related changes in cognition (O'Sullivan et al., 2001).This relationship between WM and performance variability has been found to strengthen with age (Fjell et al., 2011;Laukka et al., 2013;Lövdén et al., 2013b).Other studies have found direct relationships between WM measures and FI (Haász et al., 2013;Kievit et al., 2014) and specific neural (including white matter) structural correlates of intraindividual variability (MacDonald et al., 2009(MacDonald et al., , 2006)).Similarly, lesions in WM predict age-related declines in mental speed (Rabbitt et al., 2007a).Assessing a broad set of cognitive and neural markers in a large, age-heterogeneous cohort, Hedden et al. (2014, p. 1) conclude that 'The largest relationships linked FA and striatum volume to processing speed and executive function'.
A critical link in our model is the behavioural consequence of the microstructural properties evident in the white matter structures, as they are presumed critical for signal transmission between disparate regions of cortex.Various mechanisms (although none demonstrated definitively) have been proposed to explain the relation between WM and PS.One hypothesis is that inefficient signal transmission weakens the signal in neural activity, and/or increases the background noise, ultimately leading to slower decisions (Kail, 1997;Rolls and Deco, 2015).While ageing is generally thought to be accompanied by reduced neuronal plasticity, a growing number of models have addressed the complex-and not uniformly depressing-possibility that age-related changes in brain-behaviour relationships are driven by shifting adaptations to this changing signal-to-noise ratio (Welford, 1984).
Such approaches take as a premise the idea that aging reflects a progressive refinement and optimization of generative models used by the brain to predict states of the world (Moran et al., 2014).In this context, observed reductions in axonal density in aging (Peters, 2009) may reflect longterm pruning and adaptation.A related hypothesis that is gaining support is the proposal that agerelated demyelination affects propagation of action potentials (Bartzokis et al., 2010), an explanation consistent with slowing in patients with MS (Turken et al., 2008).Unlike MS, however, observations of an age-related increase in dystrophic myelin are relatively rare in macaque microscopy studies, leading Peters (2009) to propose remyelination with shorter internodes as the cause of age-related slowing observed in neurophysiological data.Whilst much further research is needed, these mechanistic accounts help to explain the strong and consistent neurocognitive relationship between PS and WM (Bennett and Madden, 2014;Penke et al., 2010;Turken et al., 2008).Together this suggests a hierarchical relationship, where WM affects PS, which in turn affects FI.Below, we describe a model that can integrate these diverse findings.

Watershed Model
Our goal is to integrate the three explanatory levels (FI, PS and WM) into a single model that can address the range of empirical findings described above.This a general challenge in cognitive neuroscience, namely that of reductionism (Kievit et al., 2011a): How do we best relate the phenotype observed at the 'higher' level of measurement (e.g., scores on a test of fluid intelligence) to 'lower' levels of explanation (e.g., WM structure)?We next show how a theoretical model from the field of psychopathology can be translated into a testable psychometric model to achieve this goal.
In psychopathology, single cause models for mental disorders such as schizophrenia were initially popular, but have not been successful: Despite being highly heritable and having various structural brain correlates, the search for single (or even a limited set of) genetic loci has not yielded candidates that explain more than a trivial percentage of the variance of the phenotypes of interest.
Recently, the 'watershed model' proposed by Cannon and Keller (2006) has provided a conceptual framework to help understand the potential multiple determinacy between various explanatory levels in the study of mental disorders (see also Penke et al., 2007) A simplified representation of the watershed model is shown in Figure 1.The central idea is that an observable phenotype can be thought of as the mouth of a river (denoted by "1" in the Figure ), and is the end product of a wide range of small, causal, genetic influences (genotypes) that exert their influence through a series of intermediate endophenotypes (such as neural and cognitive variables).A crucial assumption in this model is that genetic influences do not directly affect the phenotype, but do so indirectly via endophenotypes.These endophenotypes are the hypothesized intermediate mechanistic steps between many small genetic influences that together exert considerable influence (the idea of many small genetic effects has been referred to as the Fourth Law of behavioural genetics, c.f. C. F. Chabris, Lee, Cesarini, Benjamin, & Laibson, 2015).Such a model allows us to integrate the disparate known endophenotypes as potentially independent upstream 'tributaries' (denoted by "2a"-"2d") that all contribute to the distal consequence ("1").
This model has a variety of conceptual benefits, including the fact that it naturally accommodates the constellations of antecedent causes that can contribute, independently, to some aggregate behavioural phenotype.This proposed causal heterogeneity explains the relative lack of  Cannon & Keller, 2006).Point "1" represents the most complex phenotype, such as schizophrenia (or fluid intelligence for our purposes).Points "2a-2d" represent endophenotypes, such as lower-level behavioural consequences; points "3a-3g" represent the neural antecedents of those behavioural phenotypes.Points 4a-4d represent hypothetical genetic influences (not measured here) success of directly mapping phenotypes onto genetic markers: Given that there is inherent multiplicity in the phenotype, the path from genetic causes to phenotypic outcomes will be noisy, so large samples will be needed (see Ripke et al., 2014 for an illustration of the striking increase in variance explained once sample sizes are large enough to estimate many small effects).Cannon and Keller (2006, p. 274) derive from their model various empirical and conceptual predictions.These include that endophenotypes (intermediate causes) should be heritable, they should be associated with causes rather than effects, and numerous endophenotypes should affect a given construct.The model therefore predicts that a more efficient way of studying genetic causes is to focus on the endophenotypes of a disorder first, and then examine the genetic antecedents of those endophenotypes located further 'upstream'.Moreover, endophenotypes are expected to vary continuously in the population and they should affect multiple disorders.We here adopt the watershed model to explain the relationship between fluid intelligence, processing speed and white matter by translating the watershed model from conceptual tool into a testable statistical model.
In our representation, FI is the 'mouth' of the river, influenced by upstream endophenotypes of PS and WM.This implies a hierarchical relationship, such that greater WM organisation (3a-3h) affects PS (2a-2d), which in turn affects fluid intelligence (1).If we examine the predictions by Cannon and Keller described above, we can see that both the phenotype and the proposed endophenotypes are highly heritable -FI (Deary et al., 2010), PS (e.g.Vernon, 1989) and whole-brain FA (Chiang et al., 2009) -yet there has been a notable lack of success in establishing replicable genetic markers for FI (Chabris et al., 2012).Recent work has shown how, under certain circumstances, reductionist hypotheses like that of Cannon and Keller can be translated to formal statistical models, such as structural equation models (SEM) of covariance patterns (Kievit, 2014;Kievit et al., 2011aKievit et al., , 2011b;;Salthouse, 2011).Below, we show how the multiple predictions of the watershed model can be translated into statistical tests within a structural equation modelling (SEM) framework.11

Greater upstream statistical dimensionality
As one can see in Figure 1, upstream 'tributaries' represent partially independent influences on the phenotype.This means that if we move up the tributaries of the river and examine the statistical dimensionality of the variables at each level, we would expect the dimensionality of the covariance pattern between all variables at that level to increase.Although they likely share some environmental or genetic influences and so will be correlated to some degree, we expect that the upstream effects cannot be fully captured by a single summary statistic.More importantly, we expect these upstream effects to be partially independent, such that any one intermediate endophenotype in isolation will do worse than a broader set in terms of predicting the downstream outcome.

Multiple realizability
As we have seen above, the watershed model suggests that seemingly unitary constructs are nonetheless likely to have multidimensional antecedent causes.In other words, a single behavioural dimension such as intelligence is likely to have multiple neural determinants; a type of betweenindividual degeneracy (Friston and Price, 2003).There is increasing support for such a many-to-one brain-behaviour mapping.For example, recent evidence suggests that differences in emotional states are better seen as a broad network of regions showing a different activation profile, rather than activity in individual regions in isolation mapping onto individual emotional states (Lindquist et al., 2012).Similarly, many concurrent and partially independent neural properties determine individual differences in broad cognitive skills such as general intelligence (Kievit et al., 2012;Ritchie et al., 2015b).In a SEM framework, this prediction means that variability in each endophenotype will make partially independent contributions to variability in the phenotype, in line with a so-called MIMIC model (Multiple Indicators, Multiple Causes; see Jöreskog and Goldberger, 1975;Kievit et al., 2012).

Hierarchical dependence
A defining characteristic of the watershed model is hierarchical dependence.That is, the influence of upstream causes are presumed to 'flow through' lower levels (endophenotypes).Statistically, this implies that there should be no residual, or direct, relationships between levels separated by a 12 purported endophenotype.In the present context, we hypothesize that the influence of white matter is indirect, namely through processing speed.In the SEM formalization of this hypothesis below, any direct paths between WM and FI will be a source of model misfit.Taken together, it is possible to capture all these statistical predictions in a single structural equation model.This model is a hierarchical version of the MIMIC model (shown graphically in Figure 6).This model assumes that a latent variable (here FI) represents the phenotypic endpoint.The unidimensionality of this phenotype is tested by fitting a confirmatory factor model to the various behavioural measures available (four sub-scores of the Cattell test in the present data).At the second level, we hypothesize that various measures of PS: a) cannot be captured by a single factor, b) provide partially independent predictions of the fluid intelligence, and c) the latent variable of FI 'shields off' all direct effects of speed measures on the observed Cattell scores.Likewise, the WM tracts should have partially independent influences on the PS variables, but there should be no direct paths to FI that would explain away the relation between PS and FI.The statistical predictions described above can either be tested as part of the full model such that violations will lead to model misfit, or by explicit testing of individual predictions using model comparison.We will fit a MIMIC model in stages, so as to build up to the full the watershed model, and examine whether the predictions at the various stages described above are supported by our data.

Sample
A healthy, population-derived sample was collected as part of Phase 2 ("700") of the Cambridge Centre for Ageing and Neuroscience (Cam-CAN), described in more detail in (Shafto et al., 2014).
Exclusion criteria included low Mini Mental State Exam (MMSE) (24 or lower), poor hearing, poor vision, poor English (non-native or non-bilingual English speakers), self-reported substance abuse and current serious health conditions.Prior to analysis, we defined outliers as values for any variable that were more than 4 standard deviations from the mean (0.27% of all values) and included all participants with scores on all variables.The final sample contained 555 people, 274 female, age approximately uniformly distributed across the age range 18-87 (M=53.96).Table 1 contains descriptives of the sample in terms of sex, basic cognitive function and health factors.Note that, because our sample is cross-sectional, the findings relate to individual differences that are compatible with the watershed model, which may be, but are not necessarily, the same as age-related changes within an individual (e.g.Hofer & Sliwinski, 2001).A subset of these data have been reported in (Kievit et al., 2014).The covariance matrix is in the Supplementary

Processing speed
Processing speed reaction times on three different cued-response tasks: simple response time (SRT), choice response time (CRT) and audio-visual cued response time (AV).These tasks differed in their demand characteristics, such as the nature of the cue and the predictability of the stimulus, and so may tap different aspects of PS.The AV task in particular differed from the SRT and CRT in that participants were not explicitly instructed to respond as quickly as possible, so this variable captures the natural response time in the absence of time pressure.For procedural details, see Figure 2A-C and Shafto et al., 2014: p. 6 (for SRT and CRT) and p. 16 (for AV).We include the mean and standard 14 deviation for all three tasks, leading to a total of 6 measures of.All six variables were scaled to a standard normal distribution, log transformed and inverted (1/RT), so that higher scores reflect speedier and more consistent responses respectively (henceforth: SRTspeed, CRTspeed, AVspeed, SRTcons, CRTcons and AVcons).

Fluid intelligence
FI was measured using the Cattell's Culture Fair, Scale 2, Form A (Cattell, 1971), administered according to the standard protocol.This is a pen-and-paper test, consisting of four subtests with different types of abstract reasoning tasks, including series completion, classification, matrices and conditions.These four subtests each yield a sum-score representing the total number of correct responses which was scaled to a standard normal distribution prior to further analysis.Figure 2d shows an example test item.

White matter Organisation
In order to assess how different white matter tracts contribute to different cognitive functions, we computed mean Fractional Anisotropy (FA) values in various regions of interest (ROIs).Nonetheless, it should be noted that FA is a complex measure, and the relationship between FA and white-matter health is not yet fully understood (Jones et al., 2013;Wandell, 2016).There are a number of alternative measures that can be derived from diffusion-weighted MR images, which avoid the simplified single-tensor model, but the physiological validity of these is still under development (Tournier et al., 2011).A model focused on the contribution of the constituent physiological characteristics of white matter would be ideal for future applications of the watershed model, e.g. to test the complementary roles of axonal structure vs. myelin fraction (Caspers et al., 2015;Seehaus et al., 2015).Because our model does not make specific predictions for specific cellular constituents (e.g., water fraction, axonal diameter, myelin density), we favour the use of a simple tensor measure of diffusion organization, fractional anisotropy (FA).Most importantly, the vast majority of studies of white matter in healthy aging have used FA, and FA has been shown to be a comparatively reliable metric (Fox et al., 2012).For more detail on the white matter pipeline, see Appendix A. We computed mean FA for ten tracts as defined by the Johns Hopkins University white matter tractography atlas (Hua et al., 2008): The Uncinate fasciculus (UNC), superior longitudinal fasciculus (SLF), inferior Fronto-occipital fasciculus (IFOF), anterior thalamic radiations (ATR), forceps minor (FMin), forceps major (FMaj), cerebrospinal tract (CST), the inferior longitudinal fasciculus (ILF), ventral cingulate gyrus (CINGHipp) and the dorsal cingulate gyrus (CING) -see Figure 5A.

SEM
All models were fit using the package Lavaan (Rosseel, 2012) in R (R Development Core Team, 2016).
Prior to model fitting, variables were scaled to a standard normal distribution and log transformed where necessary to increase normality.All models were fit using Maximum Likelihood Estimation (ML) using robust standard errors and report overall model fit assessed with the Satorra-Bentler scaled test statistic.Model fit was also assessed with the chi-square test, RMSEA and its confidence interval, the Comparative Fit index and the standardized root mean squared residuals (Schermelleh-engel et al., 2003).We define good fit as follows: RMSEA<0.05 (acceptable: 0.05-0.08),CFI>0.97 (acceptable: 0.95-0.97)and SRMR <0.05 (acceptable: 0.05-0.10)and report the Satorra-Bentler scaling factor for each model.Models are compared using a chi-square test when nested and using the AIC in other cases. 16

Results
To examine the predictions of the watershed model, we will build up the full model, starting at the 'top'.First, we fit our measurement model, namely relating the latent variable FI to the four scores on  In order to establish the relationship between PS and FI, we first examined the dimensionality of the PS measures.Since the watershed model suggests that variables more 'upstream' may be partially independent, we first tested whether a single unidimensional model fit the six PS measures (see also Babcock, Laguna, & Roesch, 1997).A single factor model (Supplementary Figure 1A) to the six PS measures showed poor fit χ2 = 696.67,df =9, p < 0.0001, RMSEA = 0.371 [0.349 0.394], CFI = 0.549, SRMR =0.162, Satorra-Bentler scaling factor=1.08.We then examined whether a model with two latent variables (Supplementary Figure 1B), one for speed (measured by SRTspeed, CRTspeed, AVspeed) and one for consistency (SRTcons, CRTcons and AVcons) would fit better.This model also fit poorly: χ2 = 663.940,df =8, p < 0.0001, RMSEA = 0.384 [0.361 0.408], CFI = 0.57, SRMR = 0.158, Satorra-Bentler scaling factor=1.13as did an alternative model (Supplementary Figure 1C) with a latent factor for each task (SRT, CRT and AV) χ2 = 122.691,df =6, p < 0.0001, RMSEA = 0.187 [0.160 0.216], CFI = 0.923, SRMR = 0.048, Satorra-Bentler scaling factor=1.084).This suggests that our measures of PS cannot be reduced to a single dimension.However, a more crucial question in the context of the watershed model is whether the upstream antecedents will make partially independent contributions to fluid intelligence.To test this hypothesis, we fit the simple MIMIC model shown in Figure 4.This showed excellent fit to the data: χ2 = 18.456, df =20, P=.56, RMSEA =0.000 [0.000 0.033], CFI = 1.00,SRMR =0.010, Satorra-Bentler scaling factor=1.034.Most strikingly, five out of the six PS variables (all but SRTspeed) predicted unique variance in fluid intelligence.Next, we compared this model, with all PS to FI pathways estimated freely, to more parsimonious competitors.First, a simple model where all PS to FI pathways were set to 0 fit considerably worse than a model with PS to FI pathways estimated freely (χ2Δ = 331.72,dfΔ =6, p<.0001).Second, a model where all PS to FI pathways were constrained to be equal (Supplementary Figure 1D) again fit worse than the full model (χ2Δ = 175.5,dfΔ =5, p<.0001), suggesting specificity of the pathways in line with the watershed model.Finally, we tested whether only estimating the strongest pathway (CRT-speed) and constraining the other pathways to 0 might suffice (Supplementary Figure 1E).This model too showed worse fit than estimating all six pathways freely (χ2Δ = 69.668,dfΔ =5, p<.0001).Together these comparisons show that the relationship between processing speed is many-to-one, and cannot be captured fully by a single pathway, supporting the suggestion by Salthouse (2000) that different indicators of PS may reflect 'somewhat distinct processes' (p.41). Figure 4 shows the partially independent contributions of the PS measures to FI that together explain 58.6% of the variance in FI.
Perhaps surprisingly, AVspeed had a modest negative path, suggesting that those with higher FI scores are those who had fast response speed when instructed to respond quickly, but slower response speed when not so instructed.These findings suggest that different tasks and instructions can tap into distinct underlying processes, and that individual differences in these processes combine to explain a considerable portion of individual differences in FI.In other words, the results suggest that mental speed is multifaceted and that different elements play complementary roles in supporting higher cognitive abilities (Schubert et al., 2015).
Finally, the 'lowest' layer of our model pertains to the structural organization of WM as evidenced by the isotropy of water diffusion in major WM ROIs.The covariance of WM structure across individuals is informative because its dimensionality can suggest possible mechanisms driving individual differences, which has been a subject of contention in the literature (e.g.Kievit et al., 2014; 19 Lövdén, Laukka, et al., 2013;Ritchie, Bastin, et al., 2015).We fit SEMs to the mean FA of the ten, bilaterally averaged, WM tract ROIs (Figure 5a), which showed different sensitivities to age (Figure 5B).21 3.1 Full model So far, the findings are in line with the predictions of the watershed model.We then fit the full watershed model (Figure 6), integrating fluid intelligence, PS and FA in a set of major WM tracts.
By doing so, we can simultaneously test the hierarchical structure and many-to-one mapping imposed by our conceptual framework.In this model, we allow for residual covariances within, but not between levels.This model captures the assumption that all influence that WM tracts have on FI should go through the level (i.e., any residual covariance between WMI and the latent variable of FI, or any of the subtests, would be a source of misfit).Finally, any influence of PS on the Cattell subtests should go via the latent variable of FI.The full model, as shown in Figure 6 fits the data very well: χ2 = 103.201,df =60, P<0.001, RMSEA =0.036 [0.024 0.047], CFI = .986,SRMR =0.034, Satorra-Bentler scaling factor=1.033.This suggests that the observed covariance pattern in our data is compatible with the statistical constraints imposed by the watershed model.In other words, that the data are compatible with the hypotheses that the three explanatory levels stand in a hierarchical relationship, such that WM determines PS, which in turn determines FI.Given that FA is known to be affected by vascular health, we also added four binary cardiovascular/general health conditions in Table 1 as covariates (affecting white matter FA).This had little effect on model fit, χ2 = 156.280,df =100, P<0.001, RMSEA =0.032 [0.022 0.041], CFI = .989,SRMR =0.039, Satorra-Bentler scaling factor=1.122,suggesting that the results from the full model did not simply reflect unmodelled differences in vascular health.
A key prediction from the watershed model is that the relationship between upstream measures and downstream consequences is many-to-one.We can test this hypothesis by investigating, as we did previously when relating PS to FI, whether more parsimonious accounts of the relationship between WM and PS show better fit.First, we found that a model with all WM to PS pathways constrained to 0 showed poor fit (χ2Δ = 311.8,df Δ =60, p<0.001).Second, we tested a 'strongest path only' model, estimating the Forceps Minor pathway but fixing all others to 0 (Supplementary Figure 2A).This model too showed worse fit χ2Δ = 80.31, df Δ =54, p<0.05.Next, we tested a 'white matter tract specificity' model.Here we allow the effect of white matter to vary between tracts, but to be equal across the six processing speed measures (Supplementary Figure 2B).
This model also fit worse, χ2Δ = 142.76,df Δ =40, p<0.001).Finally, we tested a 'processing speed specific' model, in which the effects were allowed to differ across processing speed measures, but constrained to be equal for each tract (Supplementary Figure 2C).This model again showed poorer fit than the full model (χ2Δ = 142.45,df Δ =54, p<0.001).
Finally, to establish that the fit of the full model was not merely a consequence of some more general property of the covariance matrix, we performed control analyses to test the nature of the hierarchical relationship.Firstly, we inverted the two lower levels, such that PS affected WM organisation which in turn directly affected FI (again allowing for residual covariances between all WM tracts, but precluding direct influence between PS and fluid intelligence).The original watershed model fit the data considerably better (AICdiff= 224.28).Secondly, because there are more WM Residual covariances between processing speed variables and white matter tracts are allowed, but not shown for simplicity.See Supplementary Table 1 for the full covariance matrix, and Supplementary Table 2 for the unstandardized parameter estimates and se's.One notable observation was that the strongest prediction of PS was WM organisation in the Forceps Minor (also known as the anterior forceps, which passes through the genu of the corpus callosum).This variable explained significant amounts of variance in five of the six response time measures.Moreover, this relationship was strongest for the most 'executive' of PS variables, the speed and consistency of the Choice RT task, in line with previous findings that suggest an important role for prefrontal WM in such tasks (Davis et al., 2009).In fact, inspection of the modification indices suggested a possible residual, direct pathway from the Forceps Minor to FI, and adding this pathway did indeed lead to improved fit (χ2Δ = 36.297,dfΔ =1, p<0.001).This suggests the possibility of a cognitive endophenotype between white matter and FI that was not captured by our PS measures.A likely candidate for this endophenotype to explore in future work is working memory capacity (e.g.Fry & Hale, 1996).Importantly however, adding this direct pathway did not affect (e.g.render nonsignificant) the existing PS to FI paths, supporting an independent pathway rather than a violation of the proposed hierarchy.
The second strongest effect was from the cortico-spinal tract, which affected three speed measures above and beyond the variance already captured by Forceps Minor (see also Duering et al., 2013, andLövdén et al., 2014).Together, these findings provide significant support for the watershed model: white matter, processing speed and fluid intelligence stand in a hierarchical, many-to-one relationship that requires measuring a broad spectrum of variables at each explanatory level.

Discussion
In our population-based, age-heterogeneous sample, we found strong evidence for a hierarchical relationship between fluid intelligence, processing speed and white matter.Specifically, individual differences in WM anatomy predicted individual differences in processing speed, which in turn predicted over 58% of the variance in fluid intelligence scores.This model performed significantly better than control models that inverted these explanatory levels, or models that imposed equal or summary effects between levels.This watershed model is based on theoretical considerations, is in line with a wealth of empirical evidence, and provides an overarching framework for modelling causal relationships.All statistical predictions derived from the watershed model were supported by our data.First, our outcome measure (fluid reasoning) fit a single factor model, whereas no lowdimensional models fit for either PS or Second, there was evidence for a many-to-one mapping, such that multiple individual variables at the processing speed and WM levels explained unique variance in higher explanatory levels.Third, the overall model fit significantly better than various alternatives, providing evidence for hierarchical dependency.These findings have important implications for both the understanding of age-related declines in intelligence, and for the proscriptive ability of cognitive neuroscience to potentially inform successful interventions within aged communities.We expand on the findings and implications below.
Five out of the six processing speed measures predicted unique variance in fluid intelligence, supporting the hypothesis that processing speed is a multidimensional construct, and that these subtle aspects are important for understanding higher, abstract cognitive abilities.Similarly for WM, and in line with recent work (Lövdén et al., 2013a), we found that individual differences in WM are multidimensional, and that these different dimensions have partially independent predictions for processing speed.The strongest influence was that of the Forceps Minor, which predicted five distinct processing speed measures, with the Corticospinal Tract and the Inferior Fronto-Occipital Fasciculus also explaining considerable variance in PS measures.In contrast to views that aging represents a monolithic decline, the current findings support the idea that distinct brain regions and distinct cognitive abilities change in different ways, and that only models which strive to incorporate this multiplicity in their explanations of age-related decline can capture the entirety of age-related processes (Andrews-Hanna et al., 2007;Kievit et al., 2014;Lövdén et al., 2014;Tucker-Drob, 2011).
These findings may have implications for the design and implementation of cognitive training interventions.The watershed model suggests that transfer to other cognitive domains (see also (Taatgen, 2013) may only be achieved if the intervention is of sufficient length and intensity to affect the entire hierarchy of relationships, including the mapping of lower processing speed to higher cognitive processes.Such a suggestion is supported by work showing that white matter supports many distinct cognitive functions (Burzynska et al., 2013).For example, to truly improve fluid reasoning and not just observed scores (Hayes et al., 2015), cognitive training would have to be of such duration that lower levels (such as processing speed and WM) are also measurably affected (Keller and Just, 2009;Scholz et al., 2009), which can then generalize to other domains (Lövdén et al., 2010a).Behavioural evidence for such a pattern was found by Edwards et al. (2002), who showed transfer of speed of processing training to multiple cognitive domains in older adults.Additional evidence for this possibility comes from study by Schmiedek et al. (2010) who observed modest cognitive transfer as well as white matter microstructural change (Lövdén et al., 2010b) after a high intensity (100 day) cognitive training.Notably, the positive effects remained for up to two years (Schmiedek et al., 2014).
In addition to the empirical findings reported here, there are methodological advantages to implementing this model.By visualizing the full model, including statistical quantification of the strengths of association such as R 2 , it immediately emphasizes not just which ties are strong and wellestablished, but also shows where our knowledge is lacking.For example, not all variance can be explained in either fluid reasoning or processing speed (see also Rabbitt et al., 2007b), suggesting we need to explore other metrics of processing speed (e.g.inspection time, or digit-symbol substitution), additional cognitive determinants (e.g.working memory; Engle, Tuholski, Laughlin, & Conway, 1999;Fry & Hale, 1996), and additional neural markers such as prefrontal activity (e.g.Christoff et al., 2001), and structure (Waltz et al., 1999;Woolgar et al., 2010), grey matter indices (Kievit et al., 2014;Stuss et al., 2003) or additional WM metrics such as MD and AD (Tamnes et al., 2012) to obtain a more complete picture.
One limitation of the model as implemented here is that our sample is cross-sectional, not longitudinal.This means that although we can model the extent to which individual differences are in line with the watershed model, but we cannot make claims concerning intra-individual changes over the lifespan (Raz & Lindenberger, 2011;Salthouse, 2011, Kievit et al., 2013) , 2016;Yang et al., 2014) would allow for even more detailed investigation of the key hypotheses tested here, as would expanding the range of white matter metrics to include other measures of diffusivity and measures of magnetisation transfer (e.g.Penke et al., 2012;Yang et al., 2014).
In summary, the watershed model provides a powerful conceptual framework that organizes our knowledge and generates testable models of the expected covariance patterns within and across individuals.A strength of the model is that it naturally accommodates extensions in both 'directions'.
For example, findings in this model could be integrated with the study of other, even broader phenotypes.One notable and important extension of the model would be the inclusion of long-and short-memory measures, or measures of attention, which assess both an important aspect of higher functioning and also are notable in their age-related loss.By integrating multiple hierarchical models the interrelationships between cognitive phenotypes may become clearer.More ambitiously, it may be possible to integrate a model fit in a sample like ours with a larger study of psychopathological disorders known to be associated with impairments to cognitive abilities similar to fluid intelligence (e.g.schizophrenia, Snitz et al., 2006).

Figure 3 .
Figure 3.A single-factor confirmatory factor analysis for Cattell fits data well (left).Linear and polynomial fit of age-related differences in fluid intelligence (right).

Figure 4 .
Figure 4. MIMIC model for Processing Speed and Fluid intelligence.Below age-related trajectories for each processing speed measure ranging from strong CRTspeed, r=-0.64) to absent (AVspeed, r=0.03, n.s..).Residual covariances between PS variables are allowed but not shown for simplicity.

Figure 5 .
Figure 5. A) All ten white matter tracts used in our analysis, based on the JHU Atlas.B) Differential ageing of the ten tracts, correlations ranging from -0.71 (Forceps Minor) to -.10 (Ventral Cingulum, or CINGHipp).

Figure 6 :
Figure6: Full watershed model.Significant parameters are shown in green and red, R-squared is represented as the degree of shading of the variables.Residual covariances between processing speed variables and white matter tracts are allowed, but not shown for simplicity.See Supplementary Table1for the full covariance matrix, and Supplementary Table2for the unstandardized parameter estimates and se's.
variables than PS variables (which might affect model comparisons), we exhaustively compared all 210 combinations of 6 tracts (to match the number of PS variables) to an inverted model with the same subset of tracts.In every model comparison, the watershed model outperformed the inverted model (AICdiff ranged from 141.76 to 264.85 in favour of the watershed model).Again, including cardiovascular factors as WM covariates had little effect, with the watershed model being preferred to the inverted model in every combination (AICdiff ranged from 150.41 to 253.03 in favour of the watershed model).

Table 1 .
The raw data and analysis code are available upon signing a data sharing request form (see http://www.mrccbu.cam.ac.uk/datasets/camcan/ for more detail).As our sample included older participants, we report prevalence of common cardiovascular conditions in Table1.

Table 1 :
Sample descriptives of the N=555 sample.