Hierarchical modelling of functional brain networks in population and individuals from big fMRI data

Highlights • We introduce stochastic PROFUMO (sPROFUMO) for inferring functional brain networks from big data.• sPROFUMO hierarchically estimates fMRI networks for the population and every individual.• We characterised high dimensional resting state fMRI networks from UK Biobank.• Model outperforms ICA and dual regression for estimation of individual-specific network topography.• We demonstrate the model's utility for predicting cognitive traits, and capturing subject variability in network topographies versus connectivity.


Introduction
Spontaneous fluctuations in human brain activity, and their interpretations, have been a key focus of human neuroscience research for several years ( Biswal et al., 1995 ;Buckner and Vincent, 2007 ;Calhoun et al., 2008 ;Damoiseaux et al., 2006 ;Raichle et al., 2001 ;Smith et al., 2013b ). Resting state networks (RSNs) characterise functionally synchronised regions that underlie the brain function in the absence of active tasks, and have been replicated in different population cohorts ( Fransson et al., 2007 ;Lee et al., 2013 ) and using several imaging modalities such as functional Magnetic Resonance Imaging (fMRI) ( Allen et al., 2014 ;Beckmann et al., 2005 ;Power et al., 2014 ) and Magneto-/Electroencephalography ( Brookes et al., 2011 ;de Pasquale et al., 2010 ;Mantini et al., 2007 ;Vidaurre et al., 2018b ), and have been shown to coactivate with spontaneous replay of recently acquired information ( Higgins et al., 2021 ). Large-scale neuroimaging datasets such Functional brain modes 1 in rest and task are conventionally modelled using group-average algorithms, such as independent component analysis (ICA) ( McKeown et al., 1998 ). Typically, modes are modelled as spatially contiguous parcels ( Bellec et al., 2010 ;Craddock et al., 2012 ;van den Heuvel et al., 2008 ) or functionally unified systems distributed over multiple brain areas ( Beckmann and Smith, 2004 ;Calhoun and Adali, 2012 ;Thomas Yeo et al., 2011 ), and characterised in terms of spatial configuration over the brain voxels (mode topography) and a summary time course that captures mode activity over time. Of particular interest is to characterise the functional connectivity between the modes themselves, ideally using models that can accurately disambiguate changes in functional connectivity across separate dimensions (e.g. spatial or temporal features) ( Bijsterbosch et al., 2019( Bijsterbosch et al., , 2018, thereby obtaining functional connectomes. Recent evidence indicates that even after registration of the subject data to a standard brain such as MNI space, functional modes still significantly vary across individuals ( Glasser et al., 2016 ;Gordon et al., 2017a ;Mueller et al., 2013 ). These misalignments can be due to multiple factors including limitations in methodologies for aligning subjects' data (e.g. registration errors), differences in subjects' brain anatomy ( Devlin and Poldrack, 2007 ;Llera et al., 2019 ) or inherent differences in functional localisations of the modes ( Bijsterbosch et al., 2018 ;Gordon et al., 2017b ;Haxby et al., 2020 ;Laumann et al., 2015 ). Therefore, it is increasingly desirable to devise models with hierarchical links between the population and individuals, which can accurately cope with subject deviations from the group, while at the same time maintaining a representative group model that provides correspondence over subjects. Hierarchical models can differ with respect to multiple factors contributing to subject and/or group level mode estimations. Some of the key factors include: the direction of information flow between population and individuals; explicit versus derivational mode estimations for individuals; iterative versus one-off mode estimations; defining hierarchies over mode topography or functional connectivity; defining modes as distributed networks versus contiguous parcels; hard versus soft boundaries between modes; and scaling of algorithms to modern big data such as HCP ( ∼1000 subjects) and UKB ( ∼100,000 subjects). Details of these factors and some examples in existing models are presented in Appendix A.
A key element distinguishing hierarchical models is the way the direction of information flow between the population and individuals is set up. Unidirectional models often start from group-level estimations and derive subject modes as a variant of the group, of which dual regression is a well-known example ( Beckmann et al., 2009 ;Nickerson et al., 2017 ). It runs a two-step multiple regression paradigm: firstly, group maps, typically from spatial ICA, are regressed against subject fMRI data to extract subject-specific mode time courses; secondly, DR regresses subject mode time courses against fMRI time-series to obtain subject-specific spatial mode maps. Alternatively, bidirectional models such as multi-subject dictionary learning ( Abraham et al., 2013 ;Varoquaux et al., 2011 ), hierarchical ICA ( Guo and Tang, 2013 ;Shi and Guo, 2016 ), hierarchical topographic factor analysis ( Manning et al., 2018 ) and the framework of Probabilistic Functional Modes (PRO-FUMO) ( Harrison et al., 2015 ) allow for bottom-up, data-driven estimation of the group-level modes, as well as top-down regularisation of the subject specific modes using the group-level model -and can thus accommodate larger degrees of population heterogeneity.
Another important factor that distinguishes different hierarchical models is whether the hierarchy is defined over the mode topography or functional connectivity. As elaborated in Appendix A, existing algorithms often define hierarchy on spatial maps, thus identifying a consensus spatial layout for the modes, and estimating how the spatial arrangement of the voxels belonging to the mode varies across indi-viduals ( Glasser et al., 2016 ;Manning et al., 2018 ;Mejia et al., 2019 ;Nickerson et al., 2017 ;Shi and Guo, 2016 ). An alternative approach is to estimate a between-mode functional connectivity matrix (i.e. NetMats) at the group-level, and hierarchically estimate subject-specific connectivity through making comparisons with the group ( Chong et al., 2017 ). Notably, however, defining hierarchy solely on spatial topography or temporal connectivity might neglect how these two elements interact with each other in reality, which might in turn result in topographical misalignments being (mis)interpreted as (being changes in) functional connectivity or vice versa ( Bijsterbosch et al., 2018 ). One solution is proposed by the latest version of PROFUMO ( Harrison et al., 2020 ) which defines hierarchical models on both spatial topographies and functional connectivity.
Despite their promise for modelling population heterogeneity, the application of hierarchical models to modern high-resolution fMRI with thousands of subjects is limited because of computational costs. Recent work has proposed solutions that are applicable to group models or unidirectional hierarchical models, e.g. using incremental or stochastic matrix factorisations ( Mensch et al., 2018 ;Smith et al., 2014 ), or utilising Bayesian priors from group estimations to inform subject-specific network modelling ( Mejia et al., 2019 ). Nevertheless, the application of the bidirectional hierarchical models to big data remains impractical (c.f. Fig. 1 ) ( Mejia et al., 2019 ).
Contributions of the current study are twofold. Firstly, we propose an advance to PROFUMO using stochastic variational Bayes (VB) that can reduce the computational costs by a factor of 100, in order to scale the model for big data. We refer to the proposed model as stochastic PROFUMO, or sPROFUMO for short, and refer to the probabilistic functional modes identified using this approach as PFMs. Stochastic VB ( Hoffman et al., 2013 ) entails partitioning small, random subsets of observations into numerous batches, and visiting only a fraction of the population in every iteration, thus significantly reducing the computational expense ( Fig. 1 ), without loss of generalisability. For the purposes of fMRI analyses, different subjects provide a natural way to subdivide the data, and our algorithm therefore iterates through updating the group model over batches, while simultaneously optimising subject modes within these batches. This approach has been successfully applied to neuroimaging data to discover functional modes using group-level Hidden Markov Models (HMM) ( Vidaurre et al., 2018a ). Generalising this approach in sPROFUMO requires modifications to the original algorithm to ensure preserving group-level generalisability, cross-subject variability and subject-group alignment, as we will elaborate later in section Model. In addition to scaling the model to work on big data, sPROFUMO substantially reduces the computational expenses for medium-sized datasets while preserving the accuracy of estimations, and thus is expected to facilitate the model's broader usability.
The second contribution of this study is to unravel the cognitive relevance of sPROFUMO RSNs in individual subjects, and disentangle the spatial versus temporal properties of RSNs that contribute to the prediction of cognitive function. We applied sPROFUMO to fMRI data from 4999 UKB subjects, which allowed us to obtain both our first highdimensional PFM decomposition of brain function (100, 150 and 200 modes), and an increase in the detail of mapping of RSNs in single subjects compared to what has been reported previously from UKB. We tested how accurately these PFMs can predict 68 cognitive scores that cover a range of sensorimotor, memory, executive functions and general fluid intelligence. Our focus on high-dimensional RSNs was informed by recent studies that suggest the functional parcellations with 100-1000 modes to provide more elaborate delineation of the brain function compared to low-dimensional decompositions, which is useful for prediction of non-imaging variables ( Dadi et al., 2020 ;Pervaiz et al., 2020 ). We investigated in detail the prediction power of mode elements in spatial and temporal domains, and identified the sPROFUMO modes, or PFMs, that provided the best predictors. We further carried out a detailed comparison between sPROFUMO and ICA-DR with regards to accounting for to run the model on up to 100,000 participants, compared to classic PROFUMO. RAM usage for stochastic PROFUMO is calculated with batch size set to 50 or 1% of the full population, whichever the larger. Here we have used volumetric data from UK Biobank as reference for the calculations, which includes ∼230,000 voxels (brain masked), 1 recording session and 490 timepoints per recording. Both x-and y-axes are logarithmic scales.
cross-subject variability and prediction power for cognitive heterogeneity.

Model
In this section, we first provide a brief conceptual summary of PRO-FUMO (see ( Harrison et al., 2020 ) for further details), and next elaborate on the stochastic variational inference and its application to obtain sPROFUMO.

Classic PROFUMO summary
PROFUMO is a hierarchical matrix factorisation framework with two levels of subject and group modelling.
At the subject level, fMRI timeseries ( D sr ) are decomposed into a set of spatial maps ( P s ), time courses ( A sr ) and time course amplitudes ( H sr ), with residuals : where s denotes each subject and r denotes a recording session. P s denotes spatial mode layout of the modes across the brain voxels, which hereafter we will refer to as spatial maps , or just maps . A sr represents mode time courses and H sr denotes mode amplitudes. N v , N m and N t denote the number of voxels, modes and time points, respectively. Assuming that anatomical and functional organisation of the brain regions remain unchanged for a subject, the spatial maps, P s , are assumed to be the same across multiple scans (or runs -sets of continuous timeseries data). Each map is modelled using a Double-Gaussian Mixture Model (DGMM), where one Gaussian component is set to account for the signal and a second Gaussian distribution captures the background spatial noise in each voxel. Additionally, a per-mode membership probability is estimated to determine the probability of a voxel belonging to the signal versus the noise component. Therefore, P s consists of five posterior parameters (one map for each): signal mean, signal variance, noise mean, noise variance and membership probability. Spatial maps are one of the elements that are estimated hierarchically in PROFUMO.
Therefore, in addition to P s for each subject, a consensus set of grouplevel parameters is also estimated to capture both the group maps and the key patterns of cross-subject variability in the spatial domain. Concretely, we introduce a set of hyperpriors on -and infer the group-level posteriors over -the signal means, signal variances, noise variances and membership probabilities.
The mode time courses, A sr , are estimated separately for each scan. These timeseries are allowed to vary across runs such that PROFUMO can account for both the unconstrained nature of the resting state and the inherent cross-run variations in task data. Similar to the spatial maps, mode time courses are also modelled as a sum of a signal and a noise component, where the signal component is HRF-constrained and the noise component follows a Gaussian distribution. A sr is not modelled hierarchically, because in general (at least for resting state data) there will be no consensus temporal structure across the subjects. However, even though the time courses are unconstrained, the temporal correlations amongst the neural time courses can be both highly structured within-individuals, and have a consensus structure across individuals ( Shehzad et al., 2009 ). In order to account for these correlations, PRO-FUMO models the precision matrix hierarchically and using Wishart distributions. Additionally, because of haemodynamic responses that govern the BOLD signal, the estimation of the partial correlation matrices explicitly incorporates HRF-constrained autocorrelations between the modes.
Finally, the mode amplitudes, H sr , are modelled as positive diagonal matrices, to capture (subject and/or scan) variations in the amplitude of the mode time series. In the dual regression paradigm, these amplitudes appear as the time course standard deviations and have been shown to reflect meaningful sources of within-and between-subject variations ( Bijsterbosch et al., 2017 ). The PROFUMO framework captures these variations via multivariate normal distributions, again with a hierarchical link between the group and the subject levels.

Standard VB
PROFUMO uses a variational Bayesian (VB) framework to find a solution for the full probabilistic model described in the previous section. This optimises the parameters of an approximating distribution , with the aim that it is as close as possible to the true posterior. In each model iteration, the group model is used to constrain the subject-specific matrix factorisations, from which one can infer posterior distributions for subject-specific spatial maps, time course correlations and amplitudes. Next, the posterior evidence is accumulated and combined across individuals to obtain an updated version of the group model. The model iterates between these two levels of estimation until convergence. In order to reach convergence, the algorithm utilises the conjugate-exponential structure of the model to obtain a closed-form version of the natural gradient of the Free Energy (i.e. the difference between the true and approximate posteriors, as measured by the Kullback-Leibler (KL) divergence), based on which we optimise via gradient ascent. Therefore, for observations D (where D is a collection of all D sr in Eq. (1) ), group-level latent variables and subject-level latent variables z , we will have: where [ log ( , , ) ] is the expected log joint, log ( , ) is the (approximate) log joint posterior distribution over the latent variables, and  ( , ) is the free energy of the model. To find a (locally) optimal solution, traditional VB optimisation must have access to the entire dataset in each iteration, which becomes computationally intractable when a large number of subjects need to be analysed.

Substitution of standard VB with stochastic VB to obtain sPROFUMO
One of the main contributions of this work is to reduce the computational expense of the VB optimisation, by substituting standard VB with stochastic VB ( Hoffman et al., 2013 ). Applying this approach to PRO-FUMO requires modifications to the original algorithm to ensure that: a) the group model is not biased by a few batches and instead remains representative of the entire population, b) single subjects are kept consistent with the group regardless of when they have been visited, and c) the model has flexibility to accommodate single-subject heterogeneity in a big data population.
We start by making use of the fact that in VB the group-and subject-level PROFUMO parameters are inferred independently, and define them as global and local variables, respectively. Specifically, meanfield approximations allow for factorisation of the joint posteriors into a global term and a product of local terms; i.e.: where , z are global (i.e. group) and local (i.e. subject) latent variables respectively, governed by global and local parameters and . Here, S and R denote the total number of subjects and recordings per subject. The mean-field approximations allow for simplification of the free energy estimations into a sum of a global and a local term such that: Classic VB iterates between keeping fixed and optimising with respect to = > ∇  ( ) = 0 and keeping fixed and optimising with respect to = > ∇  ( ) = 0 until convergence (where ∇  ( ) and ∇  ( ) denote gradients of  with respect to and , respectively).
With stochastic VB, we instead randomly subsample the local variables (i.e. sessions/subjects) across batches, and focus on continuously improving the estimation of the global parameters as more observations are visited over time. Considering L I ( ) as the free energy term corresponding to the I th batch, we will have: The third term on the right-hand side refers to the optimisation of the local variables within batch I, while the overall aim will be to optimise ( ) across batches. The natural gradient of ( ) is therefore a noisy estimate of the natural gradient of the overall variational objective. According to proofs provided by Hoffman et al. (2013) , even though the noisy gradient ∇  = 0 does not have an analytical solution, the conjugate-exponential structure of PROFUMO means that convergence can be achieved if: a) we scale the subject-specific terms in Eq. (5) in each batch by the factor N (i.e. number of batches), such that ( ) is updated as though the entire population was included; and b) global parameters are updated iteratively, using a weighted sum of the intermediate global parameters obtained from the current batch ( ̂ ) and what has been accumulated over all the previous batches ( −1 ). In other words, we keep a memory of the global parameter updates over time; i.e.: We adopt a similar approach as ( Hoffman et al., 2013 ;Vidaurre et al., 2018a ), in defining a decreasing step size for the stochastic gradient descent: > 0 is the delay parameter of stochastic inference that determines the degree to which initial batches influence the overall inference, and ∈ ( 0 . 5 , 1 ] is the forget rate of the model that determines the degree to which global parameter updates rely on the current batch versus previous batches (i.e. inter-batch variations). Larger thus corresponds to smaller and results in less stochasticity in the model.
Having described the general rationale behind stochastic VB inference and application to PROFUMO, we next elaborate specific changes in the model implementation in Sections 2.2.3 to 2.2.5 . A flowchart of these steps is outlined in Fig. 2 .

Changes in model initialisation
As outlined in 2.1 , PROFUMO estimates multiple parameters per voxel and mode. For a typical volumetric fMRI in 2 mm MNI space, this entails estimating of the order of 150,000,000 parameters per subject.
Considering that with a model of this complexity, random initialisations are likely not viable, the algorithm initialises the group maps based on a group-level variational ICA in order to place the parameters within a realistic ballpark. To obtain this initialisation, subject fMRI data matrices D sr are first dimension reduced using SVD to obtain X sr ∈ ℝ × where p is the number of top singular values. The X sr matrices are concatenated across subjects and further reduced using random SVD to 2 * N m spatial basis vectors.
However, to handle many thousands of subjects, e.g. in UKB, the SVD approach used in the original PROFUMO will become intractable as it requires concatenation of the subject data from the entire population in one matrix. We therefore substituted this step in sPROFUMO with MELODIC's Incremental Group-PCA (MIGP) ( Smith et al., 2014 ). MIGP is an online PCA decomposition, where subjects' data are visited one at a time, and dimension reduced to 2 * N m in every step, until the entire population is visited. Considering that MIGP loads only a few subjects' data at a time, it offers an additional advantage of allowing us to work with the full rank subject data D sr instead of dimension-reduced X sr , and is thus expected to preserve fine-grained modes that are likely to get eliminated in subject-specific dimension reductions.
After obtaining the spatial basis, group-level variational spatial ICA ( Harrison et al., 2020 ;Lawrence and Bishop, 2000 ) is applied to the PCA output in order to infer N m initial maps. These initial maps, together with hyperpriors, are used to initialise the group model in PROFUMO/sPROFUMO. Note that initial maps do not provide subjectspecific results, subject-specific decompositions are conducted during full inference, with the group model used as a prior.

Fig. 2. Flowchart of different steps of stochastic PROFUMO (sPROFUMO)
. Batch updates, progressing from left to right, involve treating group model as global parameters that are continuously updated over time, whilst subsets of subjects are randomly selected and respective parameters are locally optimised within each batch. Blue boxes: group model update process, blue arrows: flow of information from the group to the subjects, where dark blue denotes hierarchical links between the spatial maps and light blue denotes hierarchical links between partial temporal correlation matrices. Orange boxes: subject models and batch-based update process, orange arrows: flow of information from subjects/batches to the group, where dark orange denotes hierarchical links between the spatial maps and light orange denotes hierarchical link between partial temporal correlation matrices. Grey: initialisation and priors, MIGP: MELODIC's Incremental Group-PCA.

Batch randomisations and changes in update rules
Considering that we randomise subjects across numerous small batches of size K, and in order to maximise the chances of every subject being selected multiple times, the randomisation is weighted by a parameter = that determines the probability of a subject i being picked in the current batch, where n i denotes the number of times that the subject i has been picked so far. Subjects that are picked for a batch are initialised as follows: • When a subject is visited for the first time, we can either initialise them based on the initial maps (from variational ICA as outlined earlier in 2.2.3 ) or the latest group model. There are arguably pros and cons to each approach; namely, initialisation based on the latest group will align the subjects and can thus help with quicker model convergence. However, considering that any subsequent batches will be built upon the first few batches, it might result in the overall inference being biased towards the early batches, thus potentially compromising the representativeness of the final group model. We therefore chose to always initialise subjects' first visit based on the initial maps. • In any subsequent visits, subject-specific updates build upon previous incarnations of the subject model. In this way, by keeping a memory of a subject's own previous visits, it can be expected that subject-specific deviations from the group are more accurately accounted for. • After subject-initialisations, we initially keep the group model constant and run multi-iterations of subject updates to bring the subjects into a better alignment with the group. We refer to this step as initial batch updates. • Subsequently, we run the full inference ( Section 2.1 ) on subjects within current batch (local parameters in stochastic VB) as well as the group model (global parameters). We refer to this step as full batch updates, or batch updates for short.
As outlined earlier, we treat group-level parameters as global parameters that stochastic VB optimises continuously across batches. Considering that all the distributions that are used in the model are from the exponential family (e.g. Gaussian), changes in the group-level update rules follow Eq. 6 .
It is worth noting that sPROFUMO can be expected to allow for larger degrees of heterogeneity; i.e. subject deviations from the group average. This is due to the fact that in PROFUMO, all subjects are updated in line with the group in every model iteration, which will produce a tendency for the algorithm to converge to local optima where the subject-level models are close to the group average. On the contrary, in sPROFUMO we are updating stochastically by using a random subset of the subjects at each model iteration with each subject being only visited at random intervals; and though the model update procedures are designed to ensure subject-group alignment throughout iterations, this stochastic updating may help sPROFUMO to jump out of local minima ( Keskar et al., 2017 ;Kleinberg et al., 2018 ).

Additional considerations
Batch randomisations mean that some of the subjects will be selected during the earliest batches and may not be revisited later. We therefore revisit every subject again at the end of the inference and update them with reference to the final group model. This is done to ensure the final estimations of posteriors for all subjects are defined in relation to the same group model. Additionally, with regards to the choice of the stochastic parameters in Eqs. (6) and 7 , it is typically of interest to minimise the running times and RAM memory requirements of the model, while maximising the accuracies of estimations and reaching a convergence point. The number of batches, and initial and full batch updates, control the running times while the choice of batch size controls the memory usage. It is therefore of interest to find an optimal trade-off between these factors, which we will later explore using simulations.

Simulations
We performed two sets of simulations. The first set focused on evaluating sPROFUMO's performance compared to PROFUMO, where the entire population is available at every model iteration. More specifically, we tested how the choice of the key stochastic parameters, namely batch size, stochastic forget rate and full batch updates, affects the accuracy of final results. The second set of simulations were aimed at evaluating sPROFUMO's performance in obtaining high dimensional modes; and in this case, comparison is with the spatial ICA-dual regression paradigm. We explored several scenarios that make the second set of simulations challenging, particularly with respect to accurate reconstruction of high dimensional subject-specific networks. More details of each simulation set are presented in the Results section. For each simulation scenario, we created synthetic data with realistic fMRI settings for 500 subjects with two recording sessions for each subject. Each simulation scenario was repeated twice and results were pooled for reporting.
Overall, we investigated data from 11 scenarios, each repeated twice and each repeat consisting of two datasets (2 × 500) of simulated subjects, providing a thorough evaluation of our model. Each synthetic fMRI data matrix consisted of 10,000 voxels and 300 time points at a TR of 0.72 s, and was created using an outer product model, similar to Eq. (1) . Therefore, spatial and temporal properties of the modes are simulated independently in this pipeline.
At the group level, we defined a set of spatial maps P g that consisted of modes generated from a number of randomly-selected contiguous blocks of voxels (i.e. parcels), such that some of the modes were confined to one brain region and others were distributed over multiple non-contiguous regions. Weights of signal for each voxel within a mode were drawn from a Gamma distribution. Subject-specific variations of these parcels, and subsequently subject spatial mode maps, P s , were defined at the vicinity of the group maps by applying spatial warps and adding background Gaussian noise. Time courses were generated separately for each mode, subject and session, but we defined a hierarchical link between the group and subjects' temporal correlation matrices, following a Wishart distribution, to ensure a consensus representation of the functional connectivity between the two levels. Time courses were first generated as semi-Gaussian neural time course with amplified frequencies < 0.1 Hz, and next convolved with a random draw from the FLOBS basis functions ( Woolrich et al., 2004 ) such that final time courses would mimic the BOLD signal. Finally, random noise was added to the outer product of spatial maps and time courses to create space-time data matrices. More details of simulation parameters are available in ( Bijsterbosch et al., 2019 ;Harrison et al., 2020Harrison et al., , 2015.
The simulated modes were reconstructed using sPROFUMO, PRO-FUMO and ICA/ICA-DR, and results were compared against the ground truth. For this purpose, we initialised all three models based on the same set of spatial bases that were obtained using MIGP, to ensure that the observed differences are not due to the initial PCA. Additionally, we have initialised PROFUMO and sPROFUMO based on the same set of initial maps.

Applying sPROFUMO to UK Biobank
We used UK Biobank (UKB) data, accessed under application number 8107, for volumetric resting state fMRI data as well as cognitive tests and imaging confounds (see 3.4.1 and 3.4.2 for more details on the latter). We randomly drew 4999 subjects from the May 2019 release of this dataset. fMRI data consisted of one recording session per subject, with 490 (occasionally 530) time points per session, at a TR of 0.735 s, yielding ∼6 minutes of data per subject. Data were preprocessed using the standard UKB pipeline that includes quality control, brain extraction, motion correction, artefact rejection using FSL-FIX, high-pass temporal filtering (sigma = 50.0 s, Gaussian-weighted leastsquares straight line fitting) and registration to standard MNI-2 mm space ( Alfaro-Almagro et al., 2018 ). As a part of sPROFUMO's internal initialisation (see ( Harrison et al., 2020 ) for details), we further processed the data using voxel-wise removal of timecourse mean and variance normalisation. We next initialised the model as explained in 2.2.3 and ran full model inference thorough batch randomisations as outlined in 2.2.4 and 2.2.5 , and characterised 100, 150 and 200 sPRO-FUMO modes.

Dealing with the missing modes to obtain high dimensional decompositions
In previous PROFUMO papers and based on real resting state fMRI data from up to ∼1000 subjects in HCP Young Adult cohort, we reported that while the model reliably estimates a low dimensional set of modes (i.e. up to 40), it tends to eliminate any more modes beyond that ( Harrison et al., 2020 ). Additionally, some of the modes that are reliably estimated in the group might be returned as empty in a subset of subjects. This is despite the fact that ICA and ICA-DR have reported up to 300 modes from the same data . The mode elimination in PROFUMO corresponds to the lack of evidence -in a Bayesian sense -for the mode being present, given that subject's data and the inferred noise level. Together, these can render the posterior probabilities of mode presence as zero. The validity of, and reasons behind, these missing modes had remained puzzling to date. More importantly, it is unclear whether these are specific to the s/PROFUMO framework or might also be reflected in a different way in other models e.g. ICA-DR.
In Appendix B, we will present details of five main changes in the data and model that were found to largely resolve sPROFUMO missing modes for the UK Biobank data in the current study; these include: 1) increased group-level signal-to-noise ratio due to the higher number of subjects in UKB; 2) stochastic variational inference in sPROFUMO may have allowed for the VB optimisation to jump out of local minima, resulting in an improved minima that seems to allow for higher degrees of subject variability in the group model; 3) homogeneous spatial smoothness in different parts of the brain (e.g. cortical and subcortical regions); 4) no SVD dimensionality reduction at the single subject level, and 5) increased subject-level SNR by applying a small amount of spatial smoothing. Factors 1, 2 and 3 led to a higher number of group-level modes being reliably reconstructed and factors 2, 4 and 5 increased the number of subject-specific modes.

Applying spatial ICA and ICA-Dual regression to UK Biobank
In addition to sPROFUMO characterisation of functional brain modes from resting state fMRI data, we also used spatial ICA followed by dual regression (DR) to characterise the modes. The ICA and ICA-DR are amongst the most commonly used methods for group-level and subjectlevel estimation of the resting state networks, respectively. Additionally, ICA/ICA-DR have been used as standard RSN characterisation methods from large-scale datasets such as HCP and UKB. Thus, they can provide suitable models to compare sPROFUMO results against. We used FSL tool MELODIC to identify 150 spatially independent ICA modes at the group-level and subsequently used FSL tool dual_regression to map the ICA group-level results onto single subject data. Stage 1 of Dual Regression yielded subject-specific time courses for each mode and stage 2 yielded subject-specific spatial maps. We further computed mode amplitudes as standard deviation of the time courses and estimated functional connectivity between the modes using Tikhonov-regularised partial correlations. It is worth noting that in order to make the final sPROFUMO and ICA/ICA-DR results fully comparable, we used the same PCA initialisation for both models. For this purpose, MIGP results were obtained as explained earlier in 2.2.3 and were fed into both MELODIC and sPRO-FUMO for initialisation.

Prediction pipeline
We investigated the accuracies of the spatial and temporal properties of sPROFUMO modes (i.e. PFMs) in predicting cognitive outcome from behavioural cognitive tests in UKB. We conducted predictions at two levels: multi-mode predictions where output from all the modes were combined, and uni-mode predictions where each mode was used separately for prediction. For both levels, we conducted predictions based on: a) Mode spatial maps, b) Mode temporal network matrices (temporal NetMats; partial correlation matrix between PFM timecourses), c) Mode spatial NetMats (full correlation matrix between PFM maps), d) Mode amplitudes and e) multi-element; i.e. all 4 elements combined. We further compared PFM prediction accuracies to ICA-DR.

Selecting cognitive tests
As the first step, we selected a subset of outcome measures from UK Biobank cognitive tests by pruning 1172 metrics to 68. This initial filtering was based on two criteria, firstly, one of the authors manually screened the outcome measures and only included active measures that were directly reflective of each subject's performance. For example, amongst outputs from the "Reaction Time " task, which entails viewing two cards (A and B) and pressing buttons if they are identical, we included the "Mean time to correctly identify matches ", while "Index for card A in round " was excluded. Secondly, we only selected tests that had a non-NaN value in at least 25% of the subjects. The final list of 68 tests belonged to the following categories: Reaction time, Trail making, Matrix pattern completion, Numeric memory, Prospective memory, Pairs matching, Symbol digit substitution and Fluid intelligence . These tests cover a wide range of cognitive abilities from sensorimotor coordination to memory and executive functions, thus providing a suitable testbed for evaluating the cognitive relevance of the PFMs. The list of 68 measures are shown in Table S 1 and more details are available in UKB website: https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id = 100026.

Confound removal
The large number of subjects in UKB provides an unparalleled potential for application of machine learning algorithms and predicting nonimaging phenotypes based on image-extracted brain features. However, contamination of the imaging data by interfering factors, often referred to as imaging confounds, is likely to have a significant impact on the interpretability of the results ( Alfaro-Almagro et al., 2020 ;Snoek et al., 2019 ). One solution to alleviate this problem is to regress out the confounds (a.k.a. deconfounding) before applying machine learning predictions. There is not currently a consensus in the literature as to whether or not deconfounding is required in prediction pipelines. Indeed, if the aim is to obtain the highest prediction accuracies for non-imaging phenotypes, regardless of the factors that drive the accuracies, one may opt to run predictions without deconfounding. However, that might arguably interfere with the interpretability of the results in scenarios where a common confounding factor, e.g. head size or head motion might have contributed to prediction accuracies ( Snoek et al., 2019 ). Here, in order to improve the interpretability of our results, we chose to run predictions after deconfounding.
In a recent study, Alfaro-Almagro et al. (2020) provided a comprehensive set of 602 confounds in UKB brain imaging. Here we used a reduced set by: a) selecting conventional confounds including age, age squared, sex, age × sex, site, head size and head motion; b) reducing the remaining confounds by applying singular value decompositions and retrieving the top principal components that explained 85% of the variance. These two steps yielded 82 variables for deconfounding, which were regressed out of both predictor and target variables using linear regression. Importantly, we applied deconfounding within cross-validation folds in order to avoid leakage of information from test to train data, as proposed by ( Snoek et al., 2019 ).

Elastic-Net prediction and cross-validation
The prediction pipeline was implemented in Python 3.6.5 using scikit-learn 0.19.1 . It was built around Elas-ticNet regression and nested 5-fold cross validations, where subjects were split into 5 non-overlapping subsets, and in each iteration 80% of the subjects were assigned to train and 20% to the test group. In each cross-validation loop, where pre-prediction feature selections were required (details in the following sections), the top n% of the features were selected based on correlation with the target variable within the training set. Next, quantile transformation ( QuantileTransformer ) was applied to obtain a Gaussian distribution for each predictor and target variable across subjects. The quantiles were estimated using the training set and applied to transform both train and test data. Next, confound regression parameters (or "betas ") were estimated from the training set and applied to de-confound both train and test data. Finally, ElasticNetCV was used to predict target variables in the test set. Note that we used nested cross-validations within the training set to optimise the ratio of Lasso to Lasso + Ridge regularisation ( l1ratio varied between [0.1,0.5,0.7,0.9,0.95,0.99,1.0] with 10 alphas per l1ratio ). The prediction accuracies were calculated based on correlations between estimated and actual values of the target variables across subjects in the test set.

Multi-mode prediction
Multi-mode prediction involved combining output from all the modes to predict cognitive outcome based on sPROFUMO and ICA-DR modes: • For the spatial maps, estimating 150 modes resulted in a P grand matrix of N subject × N voxel × 150, yielding ∼35 million features per subject. Dimensionality reduction of this matrix to a few hundred features that can meaningfully capture the essence of subject spatial maps is non-trivial. For this purpose, we used unsupervised learning in the form of FMRIB's Linked ICA for big data (bigFLICA) ( Gong et al., 2021 ;Groves et al., 2011 ), which is an ICA framework originally proposed to fuse multimodal data. First, we collapsed subject spatial maps across voxels using sparse dictionary learning and obtained a P dicL matrix of size N subject × 1000 × 150. Next, we considered each mode as a separate "modality " within bigFLICA and obtained a P feat feature matrix of size N subject × 500. Using linked ICA in this way will allow us to preserve subject-specific variations in each mode's spatial map and combine them across numerous modes to obtain a set of independent features to characterise the sources of population variations in PFMs. • For the spatial and full/partial temporal correlation matrices (spatial and temporal NetMats), we started with matrices of size N mode × N mode for each subject. After applying the Fisher r-to-Z transformation, we flattened these matrices by taking the upper diagonal elements. Next, in order to obtain feature matrix from spatial NatMat, i.e. SNET feat of size N subject × 500, we used SVD for dimensionality reduction, and reduced the flattened spatial NetMats to 500 features. Similarly, for temporal NetMat feature matrix TNET feat , we extracted 250 SVD features from each of the full and partial temporal NetMats and concatenated them to obtain a TNET feat of size N subject × 500. Partial temporal NetMats were computed based on precision matrices and with a Tikhonov regularisation parameter 0.01. • Mode amplitudes were used without any transformations; i.e. providing a H feat matrix of size N subject × 150.
We conducted uni-element predictions based on P feat, SNET feat, TNET feat and H feat separately, and multi-element predictions when all these elements were combined. For the multi-element predictions, we utilised pre-prediction univariate feature selections to further reduce from 1650 features to 900. For this purpose, in each cross-validation loop, the top 900 features that showed the highest correlations with the target variable, within the training set, were selected. 5-fold crossvalidations were repeated 20 times for prediction of each cognitive score, the 20 repeats were pooled for visualisation using Bland-Altman plots and averaged for statistical comparisons.

Uni-mode prediction
The uni-mode prediction steps are conceptually similar to multimode predictions outlined above, with a few exceptions. Firstly, for the dimensionality reduction of the spatial maps we did not need to combine information across modes using bigFLICA. Instead, we used sparse dictionary learning to obtain feature matrices of N subject × 500 for each mode. For spatial and temporal NetMats we take rows of correlation matrices for each subject and do not apply additional dimensionality reduction, thus obtaining 149 features per subject, per mode for each NetMat. For the amplitudes, as there is one value per subject-mode, we simply add a mode's amplitude to the temporal NetMat feature matrix.

Statistical significance of predictions
We conducted the aforementioned predictions based on both sPRO-FUMO and ICA-DR, and compared the overall model predication accuracies, as well as the predictions of different model elements in the spatial and temporal domains. In order to estimate p-values, we took note of the fact that cognitive tests included in the study (Table S 1) can be correlated with each other, and as such, a regular paired t -test is not valid because it assumes that the sample covariance is spherical. In order to overcome this problem, we used a generalised least square approach as implemented in MATLAB lscov function, which solves a linear regression, A x = B , by finding x that will maximise (B Here A is a column of ones, B is the difference of the correlation accuracies for each comparison pair (e.g. ICA-DR SMAPs versus sPROFUMO SMAPs). V is the sample covariance, which we approximate by computing the covariance matrix amongst the cognitive tests, and projecting it onto the nearest symmetric positive definite matrix. After estimating x and the standard deviation of x ( 2 ), t-values (similar to paired t -test) can be estimated as √ 2 , which are then used to estimate the corresponding p-values. Note that for the statistical significance testing, cognitive tests with negative predictions for either of the models in a pairwise comparison are removed as these would act as noise and disguise potentially interesting differences between the two models for the well-predicted tests ( Gong et al., 2021 ).

Canonical correlation analysis
Canonical Correlation Analysis (CCA) was used as an add-on test to the prediction results, mainly to disentangle the contribution of spatial versus temporal model elements ( Section 4.6 ) in the prediction accuracies of sPROFUMO and ICA-DR. More specifically, it was aimed to test the degree to which sPROFUMO SMAP/TNETs and ICA-DR SMAP/TNETs capture similar sources of subject variability. It was conducted using the following steps: 6 By finding correlations between columns of U and V for the top 100 CCA components, we estimated shared variances for each pairwise comparison. We further estimated confidence intervals on correlations using bootstraps. 7 We finally tested how many of the CCA components were significantly correlated. For this purpose, we ran permutations: in each iteration, we kept X matrix in step 5 fixed, randomly permuted Y alongside rows (i.e. subjects), and repeated step 6. Across 5000 permutations, we constructed a null distribution for the correlations of the top 100 CCA components. The correlation value corresponding to the top 5% of the null distribution for the first CCA component was used as threshold for p-value < 0.05 significance level for all the CCA components. This yields a significance threshold that is Family-Wise Error-rate (FWE) corrected for multiple comparisons.

Results
Results presented here are broadly focused on: firstly, evaluating sPROFUMO's performance, especially with respect to scaling to big data and reconstruction of a high dimensional set of sPROFUMO modes, or PFMs. Secondly, investigations into the functional and cognitive relevance of these high dimensional modes from resting state data. And, thirdly, a detailed comparison of the model with the ICA & Dual Regression (DR) paradigm, which has been one of the most commonly used methods in RSN research and functional mode discovery.

Simulation set 1 -sPROFUMO vs. PROFUMO
This first simulation set tested how sPROFUMO's accuracies depend on the choice of the main stochastic parameters, and compared it to PROFUMO. Simulations were designed to mimic real fMRI data (as outlined in Section 3.1 ) and comprised two datasets. To mimic big data, each dataset consisted of 500 subjects and 2 recordings per subject. 15 modes were simulated in the brain, which consisted of a mixture of contiguous and distributed modes and were simulated to be temporally correlated and spatially overlapping, where on average 1.3 modes resided per brain voxel. Subject spatial maps (or in short, subject maps) were simulated to have on average of ∼17% spatial misalignment with the group maps (i.e., a given subject mode was on average ∼83% overlapping with the group). Spatial mode layouts were kept constant between subject runs, while mode time courses were allowed to be different, thus resembling the unconstrained nature of the resting-state signals.
As shown in Fig. 3 , we evaluated the model's performance by computing correlations between three elements of ground truth and estimated modes: group maps, subject maps and subject time courses. Firstly, we found that sPROFUMO is robust to the choice of batch size, and by decreasing batch sizes from 20% to 5% of the population, model performance was comparable to PROFUMO ( Fig. 3 a). This shows how sPROFUMO can be used to generate the same results as classic PRO-FUMO while being RAM efficient.
Secondly, we found that sPROFUMO is robust to the choice of ( Fig. 3 b). This indicates that in real data application, one can flexibly tweak values to allow for higher or lower inter-batch variations in the posterior parameters, as would fit data and study requirements. Thirdly, as shown in Fig. 3 c, we found that the accuracy of the results was robust to the initial and full batch updates (see Section 2.2.4 for definitions). Thus, in real data, this parameter can be flexibly altered to reduce the model's running time.
This first set of simulations therefore depicts sPROFUMO's robustness to the choice of specific parameters, thus allowing us to tweak them in real data to obtain more flexible and memory-efficient inference. It is worth noting that for real data applications, it is useful to take data properties such as recording length and noise-level into account when choosing the parameters. For example, for datasets with shorter recordings per subject (e.g. UKB), we recommend using at least 50 subjects per batch to boost within-batch SNR, while for datasets with longer recordings (e.g. HCP), smaller batch sizes can be used. Additionally, for datasets with a small number of subjects and a large number of recording sessions (e.g. Midnight Scan Club ( Gordon et al., 2017b )), different recording sessions can be randomised across batches. Moreover, for datasets with excessive noise and artefact levels (e.g. data from babies or participants with movement disorders), using larger batch sizes can be beneficial to increase the effective group-level SNR.

Simulations set 2 -high dimensional RSNs: sPROFUMO vs. ICA
In the next simulation set, we specifically focused on scenarios that are associated with the estimation of a high-dimensional set of modes, and that can be particularly challenging for the matrix factorisation models. In these simulations, we compared sPROFUMO's performance to ICA-DR. As shown in Fig. 4 , we focused on three factors; i.e. spatial misalignments across subjects, spatial overlaps amongst the modes, and mode size, and evaluated each of these factors based on two resting state datasets, each with 500 subjects and 2 runs per subject (see Figure  A 1 in Appendix C for examples of simulated and estimated sPROFUMO modes).
The first scenario tested the hypothesis that in order to obtain a high-dimensional set of modes, we are required to estimate not only the modes that agree across the individuals, but also the modes with larger degrees of cross-subject inconsistencies (e.g., in the presence of spatial misalignments). We simulated 15 modes per subject, and the amount of misalignment (cross-subject) was increased from a moderate 23.1%, where subject maps were on average ∼80% overlapping with the group; to an excessive 57.7%, where subject and group maps were only ∼40% overlapping. In this scenario, simulated modes were set to be nonoverlapping at the group-level, and mode sizes were fixed on 8.6% of the brain voxels, on average. As shown in Fig. 4 a, all the model elements were somewhat affected by the increase in misalignment, but the effect was most pronounced in the subject map estimations. Additionally, the biggest difference between sPROFUMO and ICA-DR was also reflected in the subject map estimations, where, e.g., for 57.7% misalignments, sPROFUMO's average accuracy remained close to 0.9, while ICA-DR's accuracy dropped to less than 0.5.
In scenario 2 ( Fig. 4 b), we fixed cross-subject misalignments at 300 voxels (i.e. 3% of the brain voxels), and increased the number of modes in the brain from 20 to 30 to 40 while keeping the mode size constant at 8.6% of the brain voxels, on average. Therefore, simulated modes only varied with respect to the spatial overlap between the modes (1.7 to 3.5 modes per voxel on average). We found that while sPROFUMO generally coped well with moderate overlaps (i.e. 1.7 and 2.6 modes per voxel), larger overlaps induced a big impact on both spatial and temporal accuracies of the PFMs. Similar to the previous scenario, we found sPROFUMO's accuracies to be higher than the ICA-DR. Especially for the extreme case of 3.5 modes per voxel, for the group and subject maps we found: 0.6 ICA vs 0.75 sPROFUMO and 0.15 ICA-DR vs. 0.55 sPRO-FUMO, respectively. Larger spatial overlap between the modes induces higher spatial correlations amongst the modes, and as such, this scenario was expected to be particularly challenging for the spatial ICA because it is built around the assumption of spatial mode independence. Somewhat surprisingly, however, we also observed a big impact on sPROFUMO, even though model assumptions were not explicitly violated.
In the third and last scenario ( Fig. 4 c), we held the amount of mode overlap constant, and increased the number of modes in the brain from 20 to 30 to 40, thus obtaining smaller modes, covering 6.5% to 3.25% of the brain voxels. The smaller modes have been frequently reported in high dimensional functional parcellation studies ( Dadi et al., 2020 ;Smith et al., 2015 ), and are particularly challenging to accurately estimate in single subjects because even small amounts of misalignments (relative to the overall number of the voxels in the brain) can easily surpass the number of voxels in these modes. Additionally, estimation of these modes is likely to be more affected by the amount of noise in the data and/or dimensionality reduction that is often a part of the preprocessing pipelines. Here we observed that even though sPROFUMO group map and subject time course estimations remained fairly accurate ( > 0.8) for the smallest modes, subject map accuracies dropped to ∼0.5. Nevertheless, similar to the previous scenarios, its performance remained superior to that of the ICA-DR by up to 100%. In summary, these simulations based on three plausible scenarios, provide initial evidence that high dimensional decompositions, particularly at single subject level, are associated with extra challenges for accurate functional mode reconstructions that are not typically seen at lower dimensions. We further showed that the hierarchical modelling in sPROFUMO helps to overcome these challenges often with up to 100% more accuracy than ICA-DR. In Appendix C (and Figure A 2) we show how these results extend to classic PROFUMO and further discuss some of the plausible reasons for when each of these modelling frameworks yield low accuracies. Future studies are expected to investigate additional scenarios and how these findings might generalise to other single subject modelling frameworks.

sPROFUMO on UK Biobank: different types of RSNs identified in a high dimensional soft parcellation
After evaluating sPROFUMO using extensive simulations in the previous sections, we applied the model to the resting state fMRI data from 4999 subjects in UKB and characterised 150 sPROFUMO modes, or PFMs. For this purpose, we set = 0 . 6 = 5 for the inference (in Eq. (7) ) and subjects were randomised across batches of size 50. The inference was conducted across 250 batches; thus, each subject was included in a batch 2.5 times on average. Within each batch, subjects were first updated 10 times, keeping the group model constant, in order to bring the subjects into accordance with the group (i.e. initial batch updates, see Section 2.2.4 for details and rationale) and next, the full inference was run 20 times (i.e. full batch updates). Therefore, the overall group model was updated 5000 times, and each subject 50 times alongside the group (on average). The model was run on 10 processing cores on a compute node, taking ∼210 hours to complete, and the memory usage peaked at ∼100GB. Model convergence is shown in Figure A 3 (Appendix D), where we measured convergence based on the Free Energy calculated across the full population, as well as the group spatial map and group partial temporal correlation matrix, which represent the two key aspects of the model that are estimated hierarchically. Fig. 5 illustrates the 150 group-level PFMs that consist of a variety of cortical, subcortical and cerebellar modes, providing a high dimensional soft parcellation of the brain. PFMs in this figure are thresholded at 0.5 for illustration purposes. A key distinction between the current results and the previous PROFUMO papers is the reconstruction of a high dimensional set of resting state modes. As shown in Fig. 5 a, we can broadly categorise these PFMs into four categories: 1) high-SNR distributed RSNs (yellow), 2) lower SNR distributed RSNs (blue), 3) lower SNR parcel-like RSNs (green) and 4) PFMs with physiological or MR acquisition origins (purple). Hereafter, we will refer to all the categories as functional modes (or PFMs), but only categories 1-3 are referred to as RSNs. PFMs in category (1) show a close match to the well-known sensorimotor and cognitive RSNs that are typically reported at lower dimensions. PFMs in category (2) appear to represent the variants of sensorimotor and cognitive RSNs that are less frequently found in resting state studies. PFMs in category (3), i.e. the parcel-like modes, were found to spatially overlap with one or several of the distributed RSNs in categories (1) and (2). As will be illustrated later in Section 4.3.1 , we found the signal to noise ratio of the categories (2) and (3) to be lower than the more established RSNs in category (1) We further characterised the similarities between the PFMs based on two types of correlation-based network matrices (NetMats): partial temporal NetMats and spatial NetMats. As outlined earlier in Section 2.1 , in sPROFUMO (and PROFUMO), modes are simultaneously characterised in both the spatial and temporal domains. In the spatial domain, a mode is described based on its spatial layout (i.e. topography) across the brain voxels and in the temporal domain it is described with time courses, time course amplitudes, and time course partial correlations. The group-level spatial NetMat was computed as cross-mode correlations of the group spatial maps and showed a structured pattern consisting of 6 distinct clusters that were characterised by high within-

Fig. 5. Summary results of applying sPROFUMO to 4999 subjects from UK Biobank and obtaining a high dimensional decomposition with 150 PFMs
: a) the reconstructed modes include the large-scale RSNs that are typically found in lower dimensional decompositions, with the addition of less-common RSNs. Four types of PFMs were identified here: high SNR distributed RSNs (category (1) yellow frames), lower SNR distributed RSNs (category (2) blue), contiguous parcel-like RSNs (category (3) green) and PFMs of physiological or acquisition origin (category (4) purple), that altogether provide a high dimensional, soft parcellation of the brain function. Modes shown in this figure are thresholded at an absolute value of 0.5 for illustration purposes. b) Group-average spatial (left) and partial temporal (right) network matrices (NetMats) for sPROFUMO modes. The spatial NetMats cluster into 6 distinct clusters of modes. cluster correlations and low cross-cluster correlations ( Fig. 5 b -left). These clusters were identified using the Louvain algorithm as outlined in ( Geerligs et al., 2015 ) and can be broadly labelled as: somatosensory; visual and cerebellar; variants and subdivisions of DMN; parieto-occipital; fronto-parietal; fronto-central and subcortical. We additionally characterised the group-average temporal NetMat based on partial correlations amongst the mode time courses, as shown in ( Fig. 5 b -right). Using the same mode ordering in Fig. 5 b -left&right shows that modes with high spatial correlations tend to be generally more temporally correlated, albeit with a notable reduction in the strength of the block structure.

Signal versus noise components
Up to this point, we have shown results based on the "signal " elements of sPROFUMO. However, as outlined in Section 2.1 , in both spatial and temporal domains, sPROFUMO additionally estimates noise terms. More specifically, every PFM is characterised by six elements: spatial signal and noise, temporal signal and noise, amplitudes and residuals. Figure A 4 in Appendix D illustrates signal and noise components for a set of example resting state PFMs in three random subjects. In the spatial domain ( Figure A 4a), it can be seen that the signal element captures the spatial topography of a PFM and its variability across subjects, while the background that is irrelevant to the mode topography is accurately assigned to the noise components. This property is key in obtaining clean subject spatial maps, and can be expected to be particularly useful for datasets with naturally lower SNRs or when less data per subject are available. In the temporal domain ( Figure A 4b), the separation of signal and noise components clearly shows how the time courses of example RSNs from categories (1), (2) and (3) from Fig. 5 differ from each other, with category (1) showing substantially higher SNR.

Stability of sPROFUMO PFMs across subjects and model runs
We further evaluated how reliably sPROFUMO modes were estimated across and within subjects based on three set of consistency metrics: a) between-run consistency; b) consistency of results when a smaller population is analysed and c) cross-individual robustness of the results. Here we present a brief summary and further details are presented in Appendix E and Figure A 5. Firstly, between-run stability was measured by re-running the model on the same subjects and measuring correlations between the outputs of the two runs, in order to measure stability to stochastic randomisations. We found average consistency of 0.98, 0.98 and 0.94 for the group spatial maps, spatial and temporal NetMats, respectively. For subject-specific PFMs these values were: 0.90, 0.94 and 0.77, with 0.85 consistency for the amplitudes. Secondly, we tested whether the model will produce replicable results if a smaller population is analysed. For this purpose, we re-ran the model on a subset of the original subjects, with 1500 subjects, and compared subject and group results to that of the 4999 population. We found average replicability of 0.96, 0.92 and 0.86 for the group spatial maps, spatial and partial temporal NetMats, respectively. For subject-specific PFMs, these values were: 0.84, 0.72 and 0.62, with 0.89 consistency for the amplitudes. Thirdly, we measured cross-individual robustness of the results based on subject-to-group (S2G) and subject-to-subject (S2S) consistencies. In the absence of a ground truth in the real data, higher S2G and S2S consistencies have been often used as metrics of performance and robustness in single subject modelling ( Gordon et al., 2020 ;Guntupalli et al., 2018 ). This is particularly due to the fact that we expect the biologically meaningful RSNs to exhibit similar key features across individuals (e.g. right-hand motor network should localise to left motor cortex), and we expect the group model to capture the key elements shared across the population. We found the most-to-least consistent sPROFUMO mode elements to be: spatial NetMats (S2S: 0.79, S2G: 0.89), partial temporal NetMats (S2S: 0.70, S2G: 0.83), spatial maps (S2S: 0.55, S2G: 0.73) and mode amplitudes (S2S: 0.31, S2G: 0.56). It is worth noting that while S2G and S2S consistencies yield useful metrics of results stability, in order to provide more direct evaluation of the model's ability to capture meaningful subject-specific variability in spatial and/or temporal domains, we complement results from this section with additional metrics from simulations ( Section 4.2 ) and prediction power for cognitive tests ( Section 4.6 ).

Functional relevance of sPROFUMO PFMs at the single-subject level
A closer examination of the PFMs in Fig. 6 reveals interesting links with the established functional subdivisions of the brain, not only for the group but also at individual subject level. For example, Fig. 6 a -Right illustrates four large-scale PFMs distributed over motor cortex and cerebellum that closely correspond to a somatotopy map of the foot, left hand, right hand and face/tongue areas in the brain Haak et al., 2018 ;Van Den Heuvel and Pol, 2010 ). Importantly, visual inspection of the example subject PFMs shows rich details of topographical variability across the individuals, whilst the characteristic features of the PFMs, e.g. lateralisation of left versus right-hand motor networks persists. Similarly, Fig. 6 a -LEFT shows seven PFMs in the occipital lobe that correspond to specific functional sub-regions of the visual cortex ( Wang et al., 2015 ).
In addition to the sensorimotor RSNs, we detected multiple variants of the higher-level cognitive networks in single subjects. Fig. 6 b illustrates two variants of the DMN which is typically characterised by sub-networks in bilateral Inferior Parietal Lobe (IPL), Posterior Cingulate Cortex (PCC), bilateral Temporal Lobes and medial Prefrontal Cortex (PFC). The two DMNs identified here were distinct by several factors (see six marked arrows and explanations in Fig. 6 b and its caption). Considering these distinctions, DMN1 and DMN2 seem to closely match respectively to the medial temporal and dorsal medial DMN variants that recent studies have reported ( Andrews-Hanna et al., 2014 ;Braga and Buckner, 2017 ), and which was also reported in the PFMs inferred from HCP CIFTI data using PROFUMO ( Harrison et al., 2020 ). The important distinction between the results we show here and these previous results is that the characteristic features of the PFMs hold in individual subjects, even when using only six minutes of data per subject: for example, note the presence of the Para-hippocampal subnetwork in DMN1 versus its absence in DMN2. By way of contrast, HCP results used one hour of data per subject, and Braga & Buckner's analyses used nearly three hours of data per subject.

Distinct behaviours of sPROFUMO and ICA at 150-mode decomposition
We next conducted a detailed comparison of the sPROFUMO PFMs with 150 functional modes estimated using spatial ICA (FSL's MELODIC) and Dual Regression (DR). Results are shown in Fig. 7 . Note that grouplevel modes in this figure are thresholded for illustration purposes (refer to colourbars and figure caption for details).
At the group-level, we paired sPROFUMO and ICA modes based on the spatial correlation across voxels. As shown in Fig. 7 a, for 74 modes, we found a clear one-to-one correspondence between PFMs and ICs, while for the other half, every PFM corresponded to multiple ICs. A closer examination of the modes revealed that one-to-one correspondence typically occurs for smaller modes (i.e. categories (2) and (3) in Fig. 5 ), such as cerebellar and subcortical RSNs in Fig. 7 b. In contrast, large-scale PFMs corresponded to multiple ICs as shown e.g. for DMN in Fig. 7 b. This pattern is likely a by-product of differences in model assumptions. More specifically, ICA assumes spatial independence amongst the modes and as a result, ICA modes are oftentimes minimally overlapping ( Beckmann and Smith, 2004 ;Bijsterbosch et al., 2019 ). Consequently, by moving to higher dimensions, ICA is expected to split the large-scale modes into multiple fine-grained modes with small overlaps. On the contrary, sPROFUMO does not impose any such assumption and thus yields a mixture of overlapping large-scale and fine-grained PFMs in high dimensional decomposition. As we will later cover in the Discussion , this can have implications for the interpretability of the sPROFUMO and ICA modes.
In addition to the group-level differences, we also detected interesting differences between sPROFUMO and ICA-DR's single-subject results. In Fig. 7 c, we show two visual and motor networks because they provide examples of high SNR modes that are typically reliably estimated in single subjects. As can be noticed, PFM subject maps are less contaminated by the background noise. We showed earlier in Section 4.3.1 how the Double-Gaussian Mixture Modelling in sPROFUMO accurately assigns uninteresting background to the noise component. Here this is reflected in the histogram of the signal map values across the voxels ( Figure A 6 in Appendix F), where for a large number of voxels, signal values are near zero, indicating that noise membership probability has outweighed signal probability. However, such a pattern cannot be systematically captured using ICA-DR and instead, as shown in Figure A 6, it yields one Gaussian distribution across the voxels to account for signal and noise, combined.

Predicting cognitive outcome based on sPROFUMO and ICA-DR
We have so far shown evidence of sPROFUMO's ability to yield stable and functionally relevant results, and its potential to cope with challenging scenarios for reliable estimation of the high dimensional PFMs. We further examined in detail how the model compares against spatial These PFMs show similarities with the well-known subdivisions of the sensory-motor networks and show rich details of spatial variability across participants; b) two variants of DMN, that are distinct by several factors, marked with arrows, including: 1) a Para-hippocampal node exists in DMN1, but not DMN2; 2) while DMN1 includes posterior IPL and Angular Gyrus, DMN2 includes anterior IPL and Temporoparietal Junction; 3) lateral temporal cortex, including anterior temporal lobes are pronounced in DMN2 but less so in DMN1; 4) while DMN1 includes ventromedial PFC, DMN2 includes dorsomedial PFC; 5) a subnetwork in the inferior frontal cortex exists in DMN2 but not in DMN1. Note that group-level modes shown in this figure are thresholded at an absolute value of 0.5 for illustration purposes but subject-level modes are unthresholded to preserve the details of cross-subject variability.
ICA-DR paradigm. An important next question that might occur to the reader is whether the PFMs inferred by sPROFUMO are indeed capable of accounting for interpretable individual-specific traits; e.g. are these high dimensional PFMs of significant cognitive relevance? We looked to address this question by assessing the sPROFUMO PFMs' ability to predict a battery of 68 cognitive tests in the UKB data. These tests spanned a range of sensorimotor and higher-level cognitive functions.
First, we computed the prediction accuracies by combining outputs from all the functional modes, which we refer to as multi-mode predictions. For this purpose, as outlined earlier in Section 3.4.4 , spatial maps across all the functional modes where dimensionality-reduced and combined using sparse dictionary learning and bigFLICA ( Gong et al., 2021 ), in order to obtain a set of SMAP features across the subjects. Additionally, spatial and temporal NetMats were dimension-reduced to obtain SNET and TNET features, and mode amplitudes across the subjects were utilised as AMP features. The main purposes of the multi-mode predictions were: a) to determine which of the model elements provide the best predictors, and b) to compare sPROFUMO's performance to ICA-DR. Bland-Altman plots in Fig. 8 a show correlation-based prediction accuracies, where values above and below the baseline (at zero) depict higher performances for sPROFUMO and ICA-DR, respectively.
Two advantages of sPROFUMO that can be noted from these plots are its superior overall (i.e. multi-element) (p-value = 4.47 × 10 − 6 ) and SMAP (p-value = 3.03 × 10 − 7 ) prediction accuracies. More specifically, the size, locations and spatial arrangement of the functional modes in sPROFUMO were found to provide better predictions compared with ICA-DR. In contrast, TNET from ICA-DR, i.e. the functional connectivity amongst the modes, outperformed that of sPROFUMO (p-value = 3.59 × 10 − 6 ). Apart from this double dissociation of SMAP and TNET, the SNET and AMP elements showed similar performances between the two mod-els. It is worth noting that, as elaborated in Section 3.4.2 , we conducted predictions after a set of imaging confounds were linearly regressed out of both the predictor and target variables. While this can arguably allow us to obtain more interpretable prediction results, we further tested ( Figure A 7) that prediction patterns, in particular the double dissociation, are maintained if no deconfounding or unilateral deconfounding of predictor variables are performed.
We next made a closer examination of sPROFUMO's multi-mode multi-element accuracies for each cognitive test in isolation, as shown in violin plots in Appendix G, Figure A 8, and broadly categorised the 68 tests into three sets: a) sensorimotor metrics that include tests of reaction time and trail making; b) memory and executive functions that include tests of numeric and prospective memory, symbol-digit and pair matching, and matrix pattern completions; c) a range of fluid intelligence tests. We found sPROFUMO to be most successful at predicting tests related to memory and executive functions, where the average prediction accuracies were found to be as high as ∼0.3.
Next, taking a step away from combining outputs of all the functional modes, we used uni-mode predictions, to identify the best predicting modes. As shown in Fig. 8 c, these included multiple frontal, fronto-parietal and variants of the DMN, in addition to one visual, one auditory, and one cerebellar mode. We found the uni-mode prediction accuracies to be up to 0.15, 0.2, 0.2 and 0.25 for TNET, SNET, SMAP and multi-element, respectively ( Fig. 8 b). Comparison against ICA-DR revealed a generally better performance for sPROFUMO, reflected in SMAP, SNET and multi-element prediction but not the TNET. The superior performance of sPROFUMO for uni-mode multi-element predictions ( Fig. 8 b left) is particularly interesting as it reveals that single PFMs at 150-mode decomposition are of higher cognitive relevance compared to ICA-DR modes of the same dimensionality. Fig. 7. Comparison of 150 sPROFUMO modes from UKB to spatial ICA (group) and ICA-Dual Regression (subjects) : a) sPROFUMO and ICA modes that are matched based on spatial correlation across voxels. For the 74 modes labelled as showing one-to-one matching, absolute spatial correlation between each pair was > 0.7 and this correlation was at least two standard deviations higher that the next best matching mode. Modes that do not pass this criterion are labelled as having one-to-many matching. b) Left: examples of cerebellar, subcortical, frontal and parietal modes with one-to-one matching; right: example of one DMN mode in sPROFUMO that was divided into six subnetworks in ICA; c) example subject-level differences between sPROFUMO and ICA-DR for sensorimotor networks in two UKB participants. Note that group-level modes shown in this figure are thresholded for illustration purposes (refer to colourbars for thresholds) but subject-level modes are unthresholded to preserve the details of cross-subject variability and noise levels in each spatial map.
In summary, we found sPROFUMO to successfully predict a range of cognitive tests, and to generally outperform ICA-DR. Notably, we found a double dissociation between the two models, where in sPROFUMO the spatial mode topographies showed the highest contribution in predicting the cognitive outcome, while in ICA-DR, between-mode functional connectivity provided the best predicting element. This dissociation is inline with previous findings of ICA-DR and PROFUMO ( Bijsterbosch et al., 2019( Bijsterbosch et al., , 2018Harrison et al., 2020 ;Llera et al., 2019 ) based on canonical correlations analysis of behavioural and life factors. In this study, we expand those findings using predictions and a sample size that is approximately five times larger.

sPROFUMO versus PROFUMO predictions
The proposed stochastic version of PROFUMO (i.e. sPROFUMO) aims to make the model applicable to big datasets where PROFUMO's application is not computationally practical. For example, for the 150-mode PFM decomposition based on 4999 UKB subjects, PROFUMO would require ∼7.5 TB, which were intractable with our computing resources, while sPROFUMO required ∼100GB. Nevertheless, it is important to show that sPROFUMO's ability to predict cognitive tests is not worse than PROFUMO. For this purpose, we ran both models on a smaller population consisting of 1500 subjects. We conducted multi-mode predictions using multi-element model summaries, spatial maps, spatial Net-Mats, temporal NetMats and Amplitudes. Results are shown in Figure A  9, where we found no significant difference between the two models.

Disentangling double dissociation of sPROFUMO spatial map versus ICA-DR temporal NetMats in predictions
Our prediction results revealed that while ICA-DR TNETs outperform sPROFUMO TNETs for prediction of cognitive tests, sPROFUMO SMAPs show superior performance compared with ICA-DR SMAPs. This double dissociation raises a key question, namely, are the two algorithms measuring overlapping sources of subject variability, but projecting them onto different model elements? In order to address this question, we conducted two analyses: Firstly, we calculated the degree to which sPROFUMO SMAP/TNET share variance with ICA-DR SMAP/TNET across individuals. For this purpose, we extracted multi-mode summaries of each model in spatial versus temporal domains, and obtained N subject × 500 feature matrices for SMAPs and TNETs. We performed Canonical Correlation Analysis (CCA) to measure shared variance in pairwise comparisons, as outlined in Section 3.4.7 . Results are shown in Fig. 9 . Interestingly, we found the highest shared variance to be between sPROFUMO SMAPs and ICA-DR TNETs, where 77 out of top 100 CCA components where significantly correlated after Family-wise error-rate (FWER) correction for multiple Fig. 8. Prediction of cognitive outcome using 150 sPROFUMO modes at rest, and comparison to ICA-DR of the same dimensionality . These results show an overall higher performance for sPROFUMO and reveal a double dissociation between the role of spatial topography versus functional connectivity in prediction accuracies (PA) of the two models. a) Bland-Altman plots for multi-mode prediction. 5-fold cross-validations used to estimate prediction accuracies for 68 cognitive tests, each repeated 20 times, providing 20 × 68 dots per plot; b) density scatter plots of uni-mode predictions. 5-fold cross-validations used to estimate prediction accuracies for each of the 150 modes and 68 cognitive tests, providing 150 × 68 dots per plot. Since PFM and ICA-DR modes are not necessarily one-to-one matching (see Fig. 7 ), predictions are sorted and paired based on worst-to-best accuracy; c) examples of the best predicting sPROFUMO RSNs. Red dashed lines in (a) and (b) denote baselines for equal performance.
comparisons (see Section 3.4.7 for details). We also found significant correlations of sPROFUMO TNETs with ICA-DR TNETs (44 CCA components). On the contrary, we only found a few shared components between sPROFUMO SMAPs with ICA-DR SMAPs (1 significant CCA components) and sPROFUMO TNET with ICA-DR SMAP (2 CCA components). Therefore, it is likely that what ICA-DR projects onto functional connectivity matrices is inherently a combination of spatial variability and functional connectivity of RSNs in PFMs.
Secondly, considering these significant shared variances between ICA-DR and sPROFUMO model elements, we next tested whether the cognition prediction power of each model will be reduced if the linear projection of the opposite model is regressed out. More specifically, we first focused on sPROFUMO SMAPs, and compared its original prediction accuracies with when ICA-DR SMAPs or TNETs were regressed out ( Fig. 9 b, boxplots on the left). We found that only when the ICA-DR TNETs were regressed out, were the PFM SMAP predictions significantly reduced. Next, we focused on ICA-DR TNETs and compared its original prediction accuracies with when sPROFUMO SMAPs or TNETs were regressed out ( Fig. 9 b, boxplots on the right). This time we found significant reduction of accuracies in both cases. The reduc- Fig. 9. Disentangling sources of subject variability in sPROFUMO versus ICA-DR Spatial Maps (SMAP) and temporal NetMats (TNETs) : a) Canonical Correlation Analysis (CCA) revealed the highest number of significantly-correlated CCA components to be between sPROFUMO SMAPs and ICA-DR TNETs, followed by shared components between sPROFUMO TNETs and ICA-DR TNETs. Dashed lines show thresholds for significant correlations after Family-wise error rate correction for multiple comparisons using permutations. Pairwise comparisons that are associated with more conservative thresholds showed broader confidence intervals. b) Testing the degree to which sPROFUMO SMAP prediction accuracies will be reduced if ICA-DR SMAP or TNETs are linearly regressed out before running predictions. Testing the degree to which ICA-DR TNET prediction accuracies will be reduced if sPROFUMO SMAP or TNETs are linearly regressed out before running predictions. * denotes p-value < 0.01, and * * denotes p-value < 0.005. tion was particularly evident when sPROFUMO SMAPs were regressed out.
These results altogether confirm that sPROFUMO SMAPs and ICA-DR TNETs measure overlapping sources of subject variability that contribute to prediction of cognitive tests. Importantly, however, while in ICA-DR the sources of cognitive variability seem to be predominantly mapped onto functional RSN connectivity, sPROFUMO appears to divide them between spatial variability and temporal connectivity, with the former playing a more important role. As we will discuss later in Discussion , this can have important implications for how spatial topography and functional connectivity might interplay in subject-specific resting-state network estimations.

Effect of mode dimensionality on group-and subject-level PFMs
Results presented up to this point are based on the decompositions of 150 sPROFUMO PFMs in the brain at rest. Recent studies of subject-specific high dimensional functional parcellations have reported hundreds of functional modes at rest ( Dadi et al., 2020 ;Smith et al., 2015 ). These findings are often based on the datasets with long recordings in the order of hours e.g. HCP data or combinations of multiple datasets. Given the relatively small number of time points in UKB (i.e. 490 time points corresponding to 6 minutes), here we chose to estimate 150 modes. Nonetheless, this choice remained somewhat arbitrary. We therefore explored the extent to which the key results hold when 100 and 200 PFMs were to be estimated.
First focusing on the group model, we paired group spatial maps based on the spatial correlation across the voxels, as shown in Fig. 10 a. When comparing 100 and 150 dimensions, we found that there was generally a one-to-one correspondence between the first 100 PFMs, and by increasing to 150, sPROFUMO added new PFMs. We made a similar observation when comparing 150 and 200 PFMs. Modes that started appearing at higher dimensions were spatially correlated with multiple of the existing modes. These new modes were amongst the lowerranked modes in sPROFUMO's matrix factorisation and thus, perhaps unsurprisingly, were found to be typically lower in SNR. We next examined model convergence ( Figure A 10 in Appendix H) and found that higher-dimensional decompositions showed a quicker convergence of both group spatial maps and partial temporal NetMats.
Next, we investigated the effect of increased dimensionality on subject-specific PFMs ( Fig. 10 b). Focusing on cross-subject consistencies, similar to Section 4.3.2 and Appendix E, for each model element, we calculated correlations between each subject and all other subjects. Regardless of dimensionality, spatial NetMats were the most consistent model element across the individuals, followed by partial temporal Net-Mats and spatial maps, and consistencies of these elements further increased with dimensionality. On the contrary, mode amplitudes were the least consistent element and further decreased in consistency by the increased dimensionality. Interestingly, increasing the dimensionality from 100 to 150/200 resulted in more structured partial temporal NetMats with higher off-diagonal values, which in turn resulted in a noticeable increase in cross-subject temporal NetMat consistencies.
We finally compared cognition prediction accuracies of different sPROFUMO dimensionalities ( Fig. 10 c and Figure A 11). For this purpose, we first looked at the results from multi-mode predictions based on SMAPs, SNETs, TNETs, Amplitudes and multi-element combination. We generally found similar performances for different dimensionalities ( Fig. 10 c), except for three significant differences: SMAPs from 100 ( t = 4.02; p = 0.0003) and 150-mode ( t = 6.99; p = 6.4 × 10 − 8 ) decompositions were significantly better than 200-mode decompositions; TNETs from 150-mode were significantly better than 100-mode decompositions ( t = 3.74; p = 0.0018); Amplitudes from 100-mode were significantly better than 150-mode decompositions ( t = 4.55; p = 0.0001). Next, we focused on uni-mode predictions, namely to see if the modes that are found at higher dimensions can outperform the modes that are present at lower dimensions. For this purpose, we focused on 150-mode decompositions. By comparing against 100-mode decompositions, we labelled the top 82 modes that showed a clear one-to-one matching (spatial correlation > 0.75) as primary PFMs, and the remaining 68 as secondary PFMs. We compared multi-element predictions of each secondary PFM to all the primary PFMs and found 7 modes that significantly outperformed at least 25% of the primary PFMs (after FDR correction across 82 × 68 multiple comparisons). Results are shown in Figure A 11, these include 5 parcel-like parietal modes, one frontal and one cerebellar mode.
These results altogether show that the increased dimensionality of the PFMs can add unique information to what is provided by lower dimensions, resulting in more accurate fits to the data and improve consistency. Increasing dimensionality from 100 to 150 (but not from 150 to 200), also showed some improvements for prediction of cognitive tests and revealed new cognitively-relevant fine-grained modes. Therefore, given the relatively short recordings in UKB, 100/150 modes might provide a better choice of dimensionality compared with 200. We will later discuss how our findings on high dimensional PFMs might generalise to other studies in the future.  150 to 100 and 200) . a) Group-level: going to higher dimensions (i.e. from 100 to 150 and 150 to 200) yielded new modes that were spatially overlapping with the existing modes; b) subject-level: all model elements except for the amplitudes became more consistent for higher dimensions; c) different dimensionalities showed similar multi-mode prediction performances for cognitive tests except for: SMAPs of 100 and 150-mode decompositions were significantly better than 200-mode decomposition, TNETs of 150 were better than 100, and amplitudes of 100 were better than 150. * * denotes p < 0.005, refer to the main text for t-value/p-values.

Discussion
In this study, we proposed stochastic PROFUMO (sPROFUMO) for inferring functional brain modes in individuals and populations from big fMRI data. We applied the model to resting state data from 4999 UK Biobank (UKB) subjects, in order to obtain high dimensional PFM decompositions of the brain function at rest, and predicted cognitive traits based on these functional modes. The highlights of the proposed model and results are threefold: firstly, sPROFUMO resolves the issue of computational costs that is associated with the application of bidirectional hierarchical models to big fMRI data; e.g. UKB with expected 100,000 subjects. It therefore provides suitable means to leverage the unprecedented richness of population variability that is provided by big data, and explicitly models functional networks for each individual within the epidemiological cohorts. Secondly, using simulations and real data, we illustrated several advantages of sPROFUMO over the established paradigm of ICA Dual Regression. This was particularly reflected in over 100% more accurate estimation of cross-subject variability in the spatial sPROFUMO mode topographies, and providing a potentially more interpretable reflection of cognitive variability in RSN topographies versus connectivity . Thirdly, sPROFUMO enabled us to obtain an unprecedently detailed mapping of the single subject RSNs based on UKB, and we showed these RSNs to be predictive of individualistic cognitive traits. The sPROFUMO code and discovered PFMs will be made publicly available in hopes of being useful for future investigations into individualised brain function.

Hierarchical functional mode modelling from big data
An important objective of big neuroimaging data is to provide resources for precision medicine and neuroscience ( Bzdok and Yeo, 2017 ;Ganguli et al., 2018 ;Quinlan et al., 2020 ;Sui et al., 2020 ). Therefore, increasing developments have been made to map the brain function in single subjects, and an important challenge in doing so is to overcome the computational costs of such models ( Mejia et al., 2019 ). Classic PRO-FUMO ( Harrison et al., 2020( Harrison et al., , 2015 provides an advanced approach for hierarchical modelling of the functional brain networks in population and individuals. The sPROFUMO method proposed in this study builds upon PROFUMO and incorporates two key differences in order to make it suitable for big data application. Firstly, it substantially reduces the computational costs of the model. For example, for a 150-mode PFM decomposition and 4999 UKB subjects as used in this study, classic PRO-FUMO would typically require ∼7.5 TB and for the forthcoming 100,000 subjects in UKB, this value will increase to 150 TB of computer RAM, which are intractable. With sPROFUMO, these values are decreased by a factor of 100, such that thousands of subjects can be readily analysed with a modest computing cluster, while still maintaining similar accuracies to that of PROFUMO, as evaluated by extensive simulations and prediction accuracies for cognitive traits. The stochastic variational inference used in sPROFUMO can enable the model to also accommodate larger degrees of population heterogeneity; i.e. subject deviations from the group. More specifically, in the previous studies, we had observed that PROFUMO requires a mode to be present in a majority of individuals in order to be estimated for the population ( Harrison et al., 2020 ), otherwise, it was found to be typically "switched off" -returned as "missing " for the population as well as all subjects. On the contrary, we found that sPROFUMO was able to recover modes even when they were estimable in only a minority of the subjects. This could be useful in various instances when there are substantial variations in the SNR of the fMRI recordings across the individuals, e.g. in babies or patients, such that the less prominent modes might only be estimable in a subset of subjects. As outlined in Section 3.2.1 and Appendix B, we applied additional data curation strategies to boost the number of estimable modes in individuals, such that all the reported modes in this study were consistently present (i.e. showing no or a few missing modes across individuals).
Another feature of sPROFUMO, which is directly inherited from PROFUMO, and we found to be particularly useful when applying the model to UKB, was its explicit account of signal and noise components at single subject level. UKB, despite the unprecedently large number of subjects that provides excellent group-level SNR, has a relatively small amount of data per subject, where each subject is scanned once and for ∼6 minutes ( Miller et al., 2016 ). This will make it particularly challenging to estimate "clean " modes at single subject level and, thus, further modelling strategies are required to obtain high SNR mode estimation in individuals. Unlike the common approach used in matrix factorisations, where each mode is decomposed into spatial map, time course and residuals ( Beckmann et al., 2009 ;Guo and Tang, 2013 ;Mejia et al., 2019 ;Varoquaux et al., 2011 ), in s/PROFUMO, spatial maps and mode time courses are further decomposed into signal and noise components. We showed that ( Fig. 7 and Figure A 4) this noise modelling can accurately tease apart signal from the background noise and substantially improved the SNR of subject mode estimations, e.g. compared to ICA-DR.

High dimensional single subject RSN decompositions in UK Biobank
We estimated high dimensional sPROFUMO RSNs in population and individuals from UKB. The importance of high dimensional RSN decompositions has come to light in the last few years, as they provide a detailed mapping of the brain function ( Dadi et al., 2020 ;Pervaiz et al., 2020 ;Smith et al., 2015Smith et al., , 2013b. The fine millimetre spatial resolution of fMRI makes this modality suitable for investigation of the brain function on a voxel-by-voxel basis ( Norman et al., 2006 ;Woolgar et al., 2016 ). Nonetheless, resting state fMRI connectivity is oftentimes conducted in low-dimensional spaces, e.g. based on 20-50 networks which provides valuable insight into long-distance system-level interactions. Nevertheless, growing evidence suggests that this approach has a limited ability to elucidate the information that is encoded in fine-grained brain topographies  or within-mode interactions ( Schaefer et al., 2018 ). The high dimensional modes (100,150,200 PFMs) that we identified using sPROFUMO incorporated three types of RSNs: i) high-SNR distributed RSNs, ii) lower SNR distributed RSNs, iii) lower SNR parcel-like RSNs.
Importantly, the high dimensional PFMs demonstrated a unique behaviour whereby the well-known large scale RSNs, e.g. DMN or DAN, that are typically found at lower dimensions were preserved, while new, spatially overlapping, modes were added. This behaviour is distinct from the existing high dimensional functional parcellations ( Craddock et al., 2012 ;Schaefer et al., 2018 ), and previously reported behaviour in terms of hierarchically organised networks ( Doucet et al., 2011 ;Lee et al., 2012 ). In particular, this is a fundamentally different behaviour to that which we observed in spatial ICA. Namely, in ICA moving to higher dimensions (e.g. > 100) typically results in the large-scale modes being divided into multiple minimally-overlapping sub-modes that are predominantly confined to one brain region ( Kiviniemi et al., 2009 ;Smith et al., 2015Smith et al., , 2013b. As an example, at 150-mode dimension, we illustrated the DMN in Fig. 7 , where one PFM spatially overlaps with six different ICs. The additional RSNs identified with sPROFUMO can be described as the less prominent variants of the well-known RSNs. For example, as illustrated in Fig. 5 , we found 24 modes in the occipital and occipitotemporal cortex, coinciding with visual networks ( Wang et al., 2015 ) while low-dimensional decompositions typically yield less than 10 visual networks . Our results provide initial evidence for the relevance of these fine-grained modes for predicting cognitive traits. However, future studies are expected to shed more light on details of their functional relevance based on a broader set of phenotypes, ideally based on highly-sampled individuals that can provide sufficient data for high-SNR recovery of fine-grained modes. Some plausible scenarios for the functional relevance of these new RSNs include: a) a subset of these modes that are localised to specific brain regions are likely to be specialised for specific tasks; b) some of these modes might be sub-nodes of the distributed networks such as DMN. These include the modes that show high one-to-one correlations to ICA-DR modes at high dimensions (c.f. Fig. 7 ); c) a subset of these modes, e.g. parcel-like modes in the parietal cortex ( Fig. 5 , 6th and 7th rows), might have occurred due to the limitations of our model for capturing the more complex underlying connectivity patterns e.g. gradients of connectivity, dynamic connectivity, or presence of subpopulations in the data ( Lurie et al., 2020 ;Margulies et al., 2016 ;Seitzman et al., 2019 ;Vidaurre et al., 2017 ).

Predicting cognitive outcome based on functional modes
An important goal of single subject modelling of functional brain networks is to account for individualistic cognitive and clinical traits ( Finn and Constable 2016 ;Gordon et al., 2017b ;Kong et al., 2019 ). Leveraging the rich source of non-imaging phenotypes in UKB, we used PFMs to predict the outcome of a range of cognitive phenotypes. Based on multi-mode predictions, we found the model to be most successful at predicting tests related to memory and executive functions, where the correlations between the estimated and true test scores were found to be as high as ∼0.3. These prediction accuracies are on par with previous reports based on soft ( Pervaiz et al., 2020 ) and hard parcellations ( Kong et al., 2019 ), as well as the more recent multimodal phenotype discovery methods ( Gong et al., 2021 ). Furthermore, uni-mode predictions unravelled multiple variants of DMN and fronto-parietal attention networks as the best predictors of cognitive phenotypes. This finding can be readily integrated within the previous literature, given the well-established (albeit less well understood) role of DMN and fronto-parietal networks in higher-level cognitive function ( Andrews-Hanna et al., 2014 ;Buckner et al., 2008 ;Raichle, 2015 ;Spreng et al., 2010 ). Interestingly, however, we also found a cerebellar mode as one of the top predicting PFMs, which is in accordance with growing evidence suggesting a key role for the cerebellum in higher-level cognition that goes beyond motor coordination ( Schmahmann, 2019 ;Sokolov et al., 2017 ;Wagner and Luo, 2020 ). Importantly, we found the spatial characteristics of the PFMs, i.e. spatial configuration of the PFMs across the brain voxels as well as spatial correlations amongst the modes, to provide better predictors of cognitive phenotypes compared to the functional connectivity. While previous research has predominantly investigated links between RSNs and non-imaging phenotypes based on functional connectivity ( Dadi et al., 2019 ;Ng et al., 2016 ;Pervaiz et al., 2020 ;Smith et al., 2015 ), recent studies have highlighted ( Bijsterbosch et al., 2018 ;Kong et al., 2019 ) that differences in spatial topographies can give rise to trait-like differences. We showed that using sPROFUMO, functional subdivisions of the brain can be clearly reconstructed in individuals ( Fig. 6 ), and the prediction results further illustrate that cross-subject variability of these spatial maps is predictive of individualistic traits.

Interpreting the prediction accuracies
In addition to utilising the prediction accuracies as a proxy for the model's performance, we conducted comparisons between sPROFUMO and ICA-DR in order to shed light on how these results can be interpreted. While we observed a generally higher performance for sPRO-FUMO, particularly in uni-mode predictions, the most intriguing difference was the increased predictive performance of sPROFUMO's spatial topography versus ICA-DR's functional connectivity. This is in-line with recent studies ( Bijsterbosch et al., 2018 ;Llera et al., 2019 ) and can be explained in light of differences in model assumptions and the way subject-group hierarchies are defined in each framework ( Beckmann and Smith, 2004 ;Harrison et al., 2020 ;Nickerson et al., 2017 ).
If we consider the true sources of subject-variability to span both spatial and temporal characteristics of the functional modes, we ideally re-quire a model to: a) identify the variability across the individuals and b) accurately assign the division of labour between spatial versus temporal domains in accounting for that variability. Previous studies have shown that if the spatial variability is not accurately specified, in particular if spatial overlaps between the modes are not accurately accounted for, it can result in a biased, typically over-estimated, functional connectivity ( Bijsterbosch et al., 2019( Bijsterbosch et al., , 2018. In other words, true sources of variability that were spatial in nature are likely to be erroneously estimated as functional connectivity and enhance its prediction power. In-line with these findings, based on CCA analysis and predictions, we found that the subject variability that is reflected in ICA-DR TNETs, is mostly captured by sPROFUMO SMAPs and to a lesser extent by sPROFUMO TNETs. Multiple characteristics in the PROFUMO framework are designed to accurately identify spatial versus temporal variability: Firstly, as we showed in simulations ( Section 4.2 ), bidirectional hierarchical modelling and inference allows for accurate estimation of spatial mode configurations in single subjects. Secondly, considering that there are no constraints on spatial and/or temporal independence of the PFMs, the model allows the modes to be functionally correlated and spatially overlapping, depending on the evidence provided by the data at hand. Thirdly, and importantly, defining hierarchies on both spatial and temporal characteristics of the modes allows to iteratively fine-tune the parameters in both domains until an optimal balance is found. Therefore, sPROFUMO prediction accuracies can be expected to provide a more interpretable reflection of the true sources of functional mode variability that contribute to cognitive heterogeneity. This can be further tested using other types of phenotypes in the future.

Declaration of Competing Interest
None.