Enhanced design matrix for task-related fMRI data analysis

In this paper, we introduce a novel methodology for the analysis of task-related fMRI data. In particular, we propose an alternative way of constructing the design matrix, based on the newly suggested Information-Assisted Dictionary Learning (IADL) method. This technique offers an enhanced potential, within the conventional GLM framework, (a) to efficiently cope with uncertainties in the modeling of the hemodynamic response function, (b) to accommodate unmodeled brain-induced sources, beyond the task-related ones, as well as potential interfering scanner-induced artifacts, uncorrected head-motion residuals and other unmodeled physiological signals, and (c) to integrate external knowledge regarding the natural sparsity of the brain activity that is associated with both the experimental design and brain atlases. The capabilities of the proposed methodology are evaluated via a realistic synthetic fMRI-like dataset, and demonstrated using a test case of a challenging fMRI study, which verifies that the proposed approach produces substantially more consistent results compared to the standard design matrix method. A toolbox extension for SPM is also provided, to facilitate the use and reproducibility of the proposed methodology.


Introduction
In order to perform different actions/tasks, the brain is thought to rely on the simultaneous activation of several Functional Brain Networks (FBN) ( Fair et al., 2009; Huettel et al., 2009 ), which are engaged in proper interaction to execute the tasks effectively. Such networks, potentially distributed over the whole brain, constitute segregated regions that exhibit high functional connectivity ( Fair et al., 2009; Huettel et al., 2009; Poldrack et al., 2011 ). The latter is measurable via the underlying correlations among the associated activation/deactivation time patterns, referred to as time courses ( Huettel et al., 2009 ).
During the last decades, functional Magnetic Resonance Imaging (fMRI) has become one of the most popular tools to study brain-behavior links for neurophysiological ( Aslak et al., 2019; Huettel et al., 2009; Krohne et al., 2019; Rui et al., 2020 ) and clinical research ( Castro et al., 2016; Hannanu et al., 2020; Musaeus et al., 2019; Seo et al., 2019 ), since it is a non-invasive imaging modality that enjoys a relatively good spatial resolution ( Poldrack et al., 2011 ). Unlike other alternative non-invasive techniques that target neuronal activation more directly, e.g., Electroencephalography (EEG), fMRI records brain activation indirectly, via the hemodynamic response that follows neuronal activity, which is commonly modeled by a function ( Boynton et al., 1996; Friston et al., 1994 ) called the Hemodynamic Response Function (HRF). Among a variety of models for the functional shape of the HRF, the two-gamma model ( Friston et al., 1994 ) has been widely used for decades. In particular, one widely used function stands out from this model: the so-called canonical HRF, and many standard toolboxes for fMRI data analysis use it as a default, e.g., SPM or FSL.
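For illustration, the two-gamma shape can be sketched as the difference of two gamma densities. The parameter choices below (a response peaking near 5-6 s, a later undershoot, a 1:6 amplitude ratio) are common illustrative defaults, not the exact SPM or FSL implementation:

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_pdf(t, shape):
    # Gamma density with unit scale, evaluated at t >= 0.
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / gamma_fn(shape)
    return out

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    # Two-gamma model: a positive response peak minus a delayed undershoot.
    h = gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)
    return h / np.abs(h).max()          # normalize to unit peak

t = np.arange(0.0, 32.0, 0.1)           # seconds after stimulus onset
h = canonical_hrf(t)                    # peaks a few seconds after the stimulus
```

The delayed negative lobe (the post-stimulus undershoot) is what distinguishes this shape from a single gamma function.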

Conventional task-related fMRI experiments
Conventional clinical and neurophysiological task-related fMRI experiments aim to unveil the different FBNs or the brain areas that actively contribute to and underlie the task-related performance. Among a variety of alternatives, the General Linear Model (GLM) constitutes one of the most widely used approaches to analyze this kind of data ( Poldrack et al., 2011 ). Briefly, given a particular set of predefined regressors, such as the estimated task-related time courses, the GLM assumes that the signal measured at each voxel results from a linear combination of these regressors plus noise, and it determines how strongly each one of them matches the measured signal at each voxel independently ( Huettel et al., 2009; Poldrack et al., 2011 ). The obtained parameters only provide an estimate of the relative signal amplitude evoked by each one of the task-related time courses. Therefore, statistical tests are subsequently employed to determine the voxels that exhibit significant activation.
Despite its relative simplicity, the GLM still plays a fundamental role within the task-related fMRI framework; either as a primary method to analyze task-related fMRI data, e.g., ( Grady et al., 2021; Hannanu et al., 2020; Lebreton et al., 2019; Olszowy et al., 2018 ), or as a supplementary technique, e.g., as a baseline for introducing more advanced or new techniques, e.g., Chatzichristos et al., 2020 , or even as an approximate "ground truth " for fine-tuning regularization parameters that are associated with certain methods, e.g., Lv et al., 2017 . However, GLM suffers from two crucial -and well known- limitations: a) it assumes that the HRF is known, fixed and the same for all participants, whereas the truth is that the HRF significantly varies across individuals due to a wide range of factors ( Handwerker et al., 2004; Levin et al., 2001; Noseworthy et al., 2003; Seghouane and Shah, 2013 ), and b) hypothesis testing of whether a voxel is active due to the experimental task or not is based on the assumption that the measured signal can be modeled as a linear combination of predetermined imposed factors plus additive Gaussian noise, which is assumed to be independently distributed across voxels ( Huettel et al., 2009 ).
Therefore, the appropriate definition of the task-related time courses plays a crucial role within the GLM framework; all the outcomes of the statistical tests and the identification of the significant activation patterns depend directly on the accuracy of the defined task-related time courses. Hence, if the task-related time courses are poorly modeled, they can introduce detrimental effects, leading to inaccurate or incomplete results ( Poldrack et al., 2011 ).
A prominent alternative for fMRI data analysis is the Blind Source Separation (BSS) approach, which is usually performed with the aid of appropriate matrix factorization schemes. In general, BSS methods aim to unveil the different sources from the fMRI data without the necessity of any prior information regarding the experimental task. Independent Component Analysis (ICA) and Dictionary Learning (DL) are two of the most popular paths, e.g., Zhang et al. (2019) . Although ICA and DL are both BSS methods, they differ considerably from each other from a theoretical perspective ( Theodoridis, 2020 ). ICA is a statistical method, which assumes that the FBNs constitute a set of maximally spatially independent sources. On the other hand, DL is a sparsity-promoting representation-based method, which exploits the segregated nature of the FBNs ( Carroll et al., 2009; Harris and Mrsic-Flogel, 2013; Jenatton et al., 2012; Morante et al., 2020; Seghouane et al., 2019; Xie et al., 2017 ).
In contrast to GLM, BSS methods make no assumptions regarding the HRF and can also reveal other brain-induced sources beyond the task-related ones. Moreover, they can inherently model interfering artifacts, such as scanner-induced artifacts, uncorrected head-motion residuals, or other unmodeled physiological signals that may obscure the brain activity of interest. For all these reasons, BSS-based methods have become the dominant tool for the analysis of resting-state fMRI (rs-fMRI) experiments, where no prior external task information is available ( Iqbal and Seghouane, 2018a; 2018b; Levin-Schwartz et al., 2017; Rui et al., 2020 ).
Despite their advantages, BSS methods share a major drawback compared to GLM: when two or more task-related sources manifest themselves in highly overlapping brain regions, ICA (to a larger extent) and DL (to a smaller extent) fail to discriminate them ( Zhang et al., 2019 ). From a practical point of view, overlaps among FBNs can be encountered in most of the typical experimental designs of interest. More specifically, several research groups have reported that many task-related FBNs, such as motor, language, emotion or auditory, exhibit considerable overlap with each other ( Fuster, 2009; Hermansen et al., 2007; Xu et al., 2016 ).
In an attempt to overcome the aforementioned fundamental drawback of BSS methods, researchers have developed more advanced methods ( Calhoun et al., 2005; Lu and Rajapakse, 2001; Lv et al., 2017; Morante-Moreno et al., 2017; Wang et al., 2014; Zhao et al., 2015 ) that incorporate external information, such as task-related time courses, in a similar rationale as GLM, with the aid of extra constraints. Although these constrained approaches perform considerably better compared to their unconstrained counterparts, most of them present two critical drawbacks: a) they build upon the canonical HRF, which is fixed and inevitably different from the true one, and b) they often adopt a regularized formulation of a cost function, which inherits the difficulties associated with the tuning of the corresponding set of regularization parameters. From a practical point of view, the tuning of these parameters is not an easy task and is typically performed via cross-validation techniques; in real experiments, however, this is not possible due to the lack of ground truth data. Therefore, the only way to fine-tune such parameters is via visual inspection of the results, a process that requires the subjective judgment of the user and can be susceptible to errors.
Alternatively, researchers may opt for a less subjective procedure to cope with the absence of ground truth. For example, within the DL framework -when task-related information is available- researchers may explicitly use GLM to fine-tune parameters. Specifically, they first perform GLM and then use the results to find suitable candidates for the required regularization parameters, e.g., Lv et al., 2017; Zhao et al., 2015 . Such techniques aim to overcome problems with the fine-tuning of regularization parameters that are associated with the DL methods, by biasing the solution towards that of the GLM. However, this makes the methods vulnerable to the same problems as GLM, e.g., Cremers et al., 2017; Nee, 2019; Yeung, 2018 . For a related comparison see, e.g., Morante et al. (2020) .

Beyond existing DL approaches
Footnote 3 (cont.): …dependence among time courses, referred to as temporal-ICA. However, spatial-ICA is more commonly used in practice than temporal-ICA ( Calhoun et al., 2001b ).
Footnote 4: This drawback is particularly remarkable when working with ICA since, unlike most unconstrained ICA algorithms, which are relatively free of any parameter fine-tuning, the constrained ICA algorithms, e.g., Calhoun et al. (2005) ; Lu and Rajapakse (2001) ; Wang et al. (2014) , require fine-tuning of regularization parameters, often through cross-validation.
Recently, an alternative formulation of the DL task, referred to as Information-Assisted Dictionary Learning (IADL), was introduced
in Morante et al. (2020) . Unlike other constrained BSS variants, IADL offers several advances that overcome the limitations mentioned above. First, IADL incorporates the task-related information in a relaxed way, allowing the imposed time course to adjust to the real, subject-specific hemodynamic response. This improvement is crucial, since the hemodynamic response exhibits a great deal of heterogeneity among individuals, due to a wide range of factors such as vascular differences ( Huettel et al., 2009 ), partial volume imaging ( De Martino et al., 2011 ), non-linear effects ( Friston et al., 1998 ), age ( West et al., 2019 ) and even lipid ingestion ( Noseworthy et al., 2003 ). Second, IADL's formulation builds upon a sparsity-promoting constraint that bears a direct neurophysiological interpretation, which establishes a bridge with the relative size of the FBNs. In contrast to conventional DL approaches, which assume that the detected signal at each voxel can be described in terms of a small number of active sources, IADL considers that each source is associated with a relatively small number of activated voxels, which better complies with the segregated nature of the FBNs ( Morante et al., 2020 ). Thus, this particular constraint introduces two benefits: first, IADL is rendered free from the need to empirically fine-tune regularization parameters and, second, information regarding the approximate size of the areas occupied by each FBN can be easily exploited and built into the model using brain atlases, as described in Morante et al. (2020) . Unlike other DL alternatives that use GLM results or fully rely on the subjective judgment of the user, as detailed in Morante et al.
(2020) , the IADL algorithm exploits alternative available information to bypass these issues. Note that this approach does not require one to specify the actual locations of the voxels through spatial templates, e.g., Lv et al., 2017; Shi et al., 2018 . Instead, IADL only needs an estimate of the number of active voxels and, in the sequel, it finds the optimal set of active voxels that best represents the measured fMRI participants' data.
Note that, in the literature, one can find a few alternative DL-based methods that pursue similar goals as IADL. One notable example is the Shared and Subject-Specific Dictionary Learning (ShSSDL) algorithm ( Iqbal et al., 2018 ), which is designed for analyzing task-related experiments and capturing variations in the hemodynamic response. Nevertheless, ShSSDL incorporates sparsity column-wise, as any standard DL approach, which requires proper fine-tuning and does not allow, as explained above, to formally exploit the segregated nature of the FBNs ( Morante et al., 2020 ). On top of that, ShSSDL is designed for task-related experiments "where all subjects perform an exact similar task " ( Iqbal et al., 2018 ), and hence receive identical task-related time courses as far as task events are concerned. However, this condition is not met for the experiment we are analyzing in this study, as it is also not met for many, if not most, event-related designs. ShSSDL is not appropriate for optimally designed tasks ( Dale, 1999 ), which rely on randomization of sequences and asynchronies (jitter) across participants to maximize information. Note that IADL bypasses these limitations, rendering it a suitable option for the proposed methodology.

Summary of contributions
In this paper, we propose a novel methodology for the analysis of task-fMRI data, which brings together the best of the GLM and DL approaches. In other words, the robustness and the solid statistical foundation of the GLM framework is combined with the versatility, generality, and high potential of the BSS rationale. As we further explain in the following section, the proposed methodology can be summarized in three steps: a) define a preliminary estimate of the design matrix using any conventional approach, referred to as the standard design matrix, b) run IADL, using the task-related time courses from the standard design matrix and any prior sparsity information, to obtain a new, enhanced design matrix, and c) use the design matrix obtained from IADL within the GLM/SPM framework to infer significant activity from the data.
The proposed methodology is tested over a challenging real task-fMRI experiment, where the results based on the new enhanced design matrix are compared with those from the conventional design matrix under the same conditions. To illustrate the advantages of the proposed approach, we qualitatively evaluate both approaches at both the first- and second-level analyses. Furthermore, we also introduce a new software toolbox extension for SPM, called Toolbox for Enhanced Design Matrix (TEDM), which provides a graphical user interface for the new methodology within the SPM framework, including an integrated implementation of the IADL algorithm.

GLM overview
fMRI data consist of a sequence of 3D images composed of small cubic volumes referred to as voxels. For each voxel, a set of T measurements is available, stacked together in a vector (signal), y_n ∈ ℝ^T, n = 1, 2, …, N, where N is the total number of voxels and T is the number of successively acquired images over time. These vectors are often concatenated together to form the data matrix, Y ∈ ℝ^(T×N).
From a mathematical point of view, GLM assumes that the measured signal at each voxel results from a linear combination over a predefined set of regressors plus noise, that is,

y_n = X β_n + ε_n,    (1)

where X = [δ_1, δ_2, …, δ_M] ∈ ℝ^(T×M) is the so-called design matrix, whose columns contain the M predefined regressors, β_n ∈ ℝ^M is a vector that contains the unknown parameters, which represent the contribution of each regressor to the measured signal, and ε_n is the corresponding noise vector associated with the n-th voxel. Under this formulation, and making several assumptions regarding the type of the noise ( Poldrack et al., 2011 ), GLM finds an optimal collection of parameters, β_n, given a specific predefined design matrix, X, which contains all the hypothesized regressors, including the experimental regressors associated with the task-related time courses. As we described in the Introduction, these task-related time courses can be estimated as the convolution of the actual experimental conditions with the canonical HRF.
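For a single voxel, the GLM fit reduces to ordinary least squares. The following minimal sketch (the boxcar regressor, amplitudes, and noise level are invented for illustration) recovers the contribution of each regressor from a simulated voxel signal:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200                                        # number of acquired volumes
task = (np.arange(T) % 40 < 20).astype(float)  # toy boxcar task regressor
X = np.column_stack([task, np.ones(T)])        # design matrix: task + baseline

beta_true = np.array([2.5, 1.0])               # assumed "ground truth" amplitudes
y = X @ beta_true + 0.1 * rng.standard_normal(T)   # one voxel's measured signal

# GLM estimate: least-squares solution of y ~ X beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In practice the same fit is carried out independently at every voxel, and statistical tests are then applied to the estimated parameters.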
Once the parameters are obtained, statistical tests provide information concerning the significant brain activity to determine the anatomical distributions of the activated FBNs. In practice, the results from GLM heavily rely on the appropriateness and the particular definition of the design matrix. It is not an exaggeration to state that one of the most notable challenges in task-related fMRI experiments is to create a proper design matrix ( Huettel et al., 2009 ). Therefore, many researchers augment the design matrix by including extra regressors, beyond the hypothesized task-related time courses, to alleviate some of its limitations. For example, temporal and dispersion derivatives of the task-related time courses are frequently included as extra components. Although the derivatives may lack any specific interpretation on their own, when considered together with their corresponding task-related time course, they can help to account for small shifts between the imposed task-related time courses and the actual brain activation patterns ( Huettel et al., 2009; Poldrack et al., 2011 ). On the other hand, the addition of head-motion regressors within the design matrix may help reduce motion residuals and local variations induced by head motion. However, if the motion parameters are highly correlated with the task, they may eliminate significant regions otherwise detectable ( Poldrack et al., 2011 ).
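The benefit of the temporal derivative follows from the first-order expansion x(t − δ) ≈ x(t) − δ x′(t): a small latency mismatch is absorbed by the derivative regressor. A toy illustration (all signals invented; a smooth bump stands in for a task response):

```python
import numpy as np

T = 200
t = np.arange(T, dtype=float)
x = np.exp(-0.5 * ((t - 100.0) / 8.0) ** 2)      # smooth toy task regressor
dx = np.gradient(x)                              # temporal-derivative regressor

shift = 2.0                                      # small latency mismatch (samples)
y = np.exp(-0.5 * ((t - 100.0 - shift) / 8.0) ** 2)   # "measured" delayed response

X_basic = np.column_stack([x, np.ones(T)])       # design matrix without derivative
X_deriv = np.column_stack([x, dx, np.ones(T)])   # design matrix with derivative

r_basic = y - X_basic @ np.linalg.lstsq(X_basic, y, rcond=None)[0]
r_deriv = y - X_deriv @ np.linalg.lstsq(X_deriv, y, rcond=None)[0]
# The residual with the derivative regressor is substantially smaller.
```

The derivative column soaks up the shift-induced mismatch that the original regressor alone cannot represent.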

Information assisted dictionary learning
As described in the Introduction, the proposed methodology for enhancing the design matrix exploits the benefits of IADL, which was formally introduced in Morante et al. (2020) . Unlike similar techniques, IADL's formulation is specifically designed for fMRI data analysis and copes with some of the major drawbacks of conventional DL techniques, such as the proper selection of the regularization parameters.
From a mathematical point of view, IADL can be described as a matrix factorization of the data matrix, Y, that is,

Y = D S,    (2)

where, following the dictionary learning jargon, D ∈ ℝ^(T×K) is the dictionary matrix, S ∈ ℝ^(K×N) is the coefficient matrix, and K is the number of sources.
Within the fMRI framework, these two matrices bear a specific interpretation. First, the columns of the dictionary matrix, D = [d_1, d_2, …, d_K], represent the different time courses. In this way, the dictionary plays a similar role to the design matrix in GLM. On the other hand, the rows, s_k^T, k = 1, 2, …, K, of the coefficient matrix, i.e., S^T = [s_1, s_2, …, s_K], correspond to the spatial maps associated with each source. In other words, each row of the coefficient matrix contains the particular distribution of voxels that are activated by the underlying FBNs.
However, in contrast to the GLM formulation in (1) , which requires a predefined fixed design matrix, X ( Huettel et al., 2009 ), the dictionary D in (2) comes as the result of a constrained optimization task. In the spirit of IADL, matrix D is partitioned into two parts, D_c and D_f, i.e., D = [D_c, D_f], where D_c corresponds to the constrained part and D_f to the free part. Let us assume, for simplicity, that Q is the number of task-related time courses used in GLM, i.e., δ_1, δ_2, …, δ_Q. Then, D_c comprises Q columns, and the respective elements, d_1, d_2, …, d_Q, are constrained to be similar to δ_1, δ_2, …, δ_Q. According to IADL ( Morante et al., 2020 ), similarity is understood in the squared Euclidean norm sense, i.e.,

‖d_q − δ_q‖_2^2 ≤ ε_q,  q = 1, 2, …, Q,

where ε_q controls the degree of similarity, which reflects the user's confidence on how accurate the predefined task-related time courses are. By default, this parameter is estimated by measuring the deviations that are likely to appear between the standard canonical HRF-based task-related time courses and the subject-specific ones, using, e.g., the model proposed in Morante et al. (2020) .
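Algorithmically, such a relaxed constraint amounts to keeping each assisted atom inside an ε-ball centred at its predefined time course: a candidate update that drifts too far is simply projected back onto the ball. A minimal sketch (variable names are ours, and this is a generic projection, not the actual IADL solver):

```python
import numpy as np

def project_to_similarity_ball(d, delta, eps):
    """Project a candidate time course d onto {u : ||u - delta||_2 <= eps},
    i.e., keep d 'similar' to the predefined regressor delta."""
    diff = d - delta
    nrm = np.linalg.norm(diff)
    if nrm <= eps:
        return d                           # already similar enough: unchanged
    return delta + (eps / nrm) * diff      # pull back onto the ball's surface

delta = np.zeros(5)                        # toy predefined time course
d_far = np.ones(5)                         # candidate that drifted away
d_proj = project_to_similarity_ball(d_far, delta, eps=1.0)
```

A small ε expresses high confidence in the canonical HRF-based regressor; a large ε lets the atom adapt more freely to the subject-specific response.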
The deviations that IADL allows over the predefined task-related time courses account for small shifts and mis-modeling, with no need to incorporate temporal or dispersion derivatives. In this way, IADL uses the task-related information to "assist " the optimization task. Therefore, we refer to these Q sources as assisted sources, since they are not forced to remain equal to the imposed task-related ones.
On the other hand, the remaining K − Q columns of the dictionary, which belong to D_f, can freely adapt to additional time courses and extra regressors, in order to accommodate other potential unmodeled sources, interfering physiological signals, and potential scanner-induced artifacts ( Bianciardi et al., 2009 ).
Apart from constraints over the dictionary, IADL -being a DL technique- imposes a particular sparsity constraint associated with the rows of S . Aside from its mathematical relevance in rendering the DL task meaningful, e.g., Theodoridis, 2020 , this sparsity-related constraint also has a strong neurophysiological flavor; we know that each row of the coefficient matrix, associated with a brain source, must have most of its elements equal to or close to zero, according to the segregated nature of the FBNs. Note that existing DL approaches assume sparsity over the columns of the coefficient matrix, implying that the activity at each voxel is caused by a reduced number of sources. In contrast, the IADL algorithm imposes sparsity row-wise. In other words, sparsity promotes zeros on voxels for each one of the sources. Moreover, it can assign a different sparsity level per row/source. To this end, IADL constrains each spatial map to comprise a given sparsity percentage. By definition, the sparsity percentage measures the proportion of zero values within a particular vector. Thus, sparse sources, i.e., sources with a large number of inactive voxels (zero values), have a large sparsity percentage, whereas dense sources, that is, non-sparse sources with a large number of active voxels, have a small sparsity percentage.
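To make the row-wise constraint concrete, the sparsity percentage of a spatial map, and a simple hard-thresholding projection that enforces a target percentage, can be sketched as follows (a deliberate simplification of the actual IADL update; the toy map is invented):

```python
import numpy as np

def sparsity_percentage(v, tol=1e-12):
    # Proportion of (numerically) zero entries in a spatial map, in percent.
    return 100.0 * float(np.mean(np.abs(np.asarray(v)) <= tol))

def project_row_to_sparsity(row, target_pct):
    # Zero out the smallest-magnitude voxels until the row reaches at least
    # the requested sparsity percentage (the strongest activations survive).
    row = np.asarray(row, dtype=float).copy()
    n_zero = int(np.ceil(row.size * target_pct / 100.0))
    if n_zero > 0:
        row[np.argsort(np.abs(row))[:n_zero]] = 0.0
    return row

spatial_map = np.array([0.1, -0.2, 3.0, 0.05, -2.5, 0.3, 0.0, 1.0, -0.4, 0.2])
sparse_map = project_row_to_sparsity(spatial_map, 80.0)  # keep the 2 strongest voxels
```

Because the constraint acts per row, each source can be assigned its own percentage, e.g., a high value for a compact FBN and a low value for a dense artifact.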
This approach turns out to be easily tunable and offers several advantages. One such advantage is the use of external information regarding the sparsity percentage of potentially involved FBNs, in the experimental design under consideration, from published brain atlases, as described in Morante et al. (2020) . Furthermore, IADL offers the flexibility of simultaneously dealing with sparse and dense sources since, in addition to sparse sources, dense sources may appear in practice, commonly associated with physiological or scanner-induced artifacts. Moreover, as has been demonstrated in Morante et al. (2020) , no unrealistic cross-validation parameter tuning is required. Furthermore, as the results in Morante et al. (2020) verify, IADL poses as a strong candidate for examining the individual variation of each participant, compared to other potential alternatives.

Proposed methodology for enhancing the design matrix
As we described above, GLM performance mostly depends on the proper specification of task-related regressors. Beyond that, although one may decide to augment the design matrix by adding extra regressors, the options to enhance the design matrix in this way are limited ( Huettel et al., 2009 ). Here, we propose a new methodology that exploits the main advantages offered by IADL to enhance the conventional GLM framework. To this end, we introduce an alternative way to construct the design matrix, one that has been optimized over the available data (in a least-squares sense), complying with constraints associated with the experimental design and any prior sparsity-related information at hand.
Overall, the proposed methodology allows: a) the replacement of the canonical HRF-based task-related time courses with optimized task-related time courses that better fit the subject-specific hemodynamic response, b) the augmentation of the original design matrix, X, with extra time courses from additional brain sources beyond the experimental task-related ones, and c) its augmentation with regressors that represent additional interfering sources, such as scanner-induced artifacts, uncorrected head-motion residuals, cardiac artifacts or other unmodeled physiological signals that may obscure the brain activity of interest.
Fig. 1 illustrates the major steps of the new proposed methodology for the construction of the design matrix, in comparison to the standard procedure. The proposed methodology can be summarized in the following three steps:
a) Define the task-related time courses: The first step consists in obtaining a reasonable estimate of the task-related time courses. For example, this is done via the convolutional model and the canonical HRF, as in the conventional GLM approach. Note, however, that in the proposed approach the accuracy of these estimated time courses is not critical, in the sense that they only serve as prior information for the subsequent optimization, which will reveal the improved subject-specific task-related time courses.
b) Build the enhanced design matrix using IADL: Run IADL using the previously defined task-related time courses and any extra information regarding the sparsity of the sources. The result will be the enhanced design matrix, D. (Alternatively, one may use only a selected subset of regressors from the dictionary D; however, we do not advocate this approach because of its inherent subjectivity.)
c) Employ the enhanced design matrix within the GLM: Finally, use the enhanced design matrix within GLM, to perform any conventional analysis to infer significant activation patterns.
Fig. 1. Schematic diagram for constructing the design matrix using the standard procedure (top) and the proposed methodology (bottom). In the proposed methodology there is no need to incorporate derivatives or extra regressors, as IADL automatically derives the set of regressors that best suits the fMRI data.
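Step (a), the convolutional construction of a task-related regressor, can be sketched as follows. The onsets, durations, TR, and the gamma-shaped stand-in for the canonical HRF are all invented for illustration:

```python
import numpy as np

TR, n_scans, dt = 2.0, 120, 0.1                # repetition time (s), volumes, fine grid
onsets, duration = [10.0, 90.0, 170.0], 20.0   # toy block design (seconds)

# Boxcar stimulus function on a fine temporal grid
steps_per_tr = int(round(TR / dt))
grid = np.arange(n_scans * steps_per_tr) * dt
stim = np.zeros_like(grid)
for onset in onsets:
    stim[(grid >= onset) & (grid < onset + duration)] = 1.0

# Crude gamma-shaped stand-in for the canonical HRF (peaks a few seconds in)
t_hrf = np.arange(0.0, 32.0, dt)
hrf = (t_hrf ** 5) * np.exp(-t_hrf)
hrf /= hrf.sum()                               # unit area, so plateaus approach 1

# Convolve and downsample to the scanner's TR grid: one task-related regressor
task_tc = np.convolve(stim, hrf)[: grid.size][::steps_per_tr]
```

The resulting `task_tc` is one column of the preliminary (standard) design matrix handed to step (b).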

TEDM: A new software toolbox extension for SPM
For completeness, and in order to facilitate the uptake of the proposed methodology in practice, we have developed a software toolbox extension for SPM, referred to as Toolbox for Enhanced Design Matrix (TEDM), which provides a graphical user interface for the proposed methodology. The toolbox is fully integrated within the SPM framework and is freely available at the TEDM repository. Moreover, it automatically provides a parameter set-up that facilitates the use of this methodology. Then, once the enhanced design matrix has been estimated, TEDM automatically generates a new SPM file ready for a first-level analysis.

Complexity and computational costs
Regarding the complexity and computational costs, the proposed methodology only introduces an additional step to the conventional pipeline of SPM (see Fig. 1 ). Therefore, the adoption of this methodology only adds the extra time required to run IADL. Fortunately, IADL has no critical computational limitations, and its computational costs are comparable to those of other DL algorithms ( Morante et al., 2020 ).
Nonetheless, we are aware that -for some SPM users- the adoption of this extra step may seem disruptive; the user has to run IADL separately, then construct the enhanced design matrix using the obtained regressors, and, finally, run SPM. Consequently, to facilitate this process, we introduced the TEDM toolbox. TEDM includes an implementation of the IADL algorithm from Morante et al. (2020) , which is smoothly integrated within the standard SPM framework, simplifying the overall process and saving time for the user.

TEDM and IADL setup
Finally, we briefly discuss the parameter setup of the IADL implementation within TEDM and its implications from a neuroscience perspective. We should emphasize that this is possible because the optimization parameters of IADL convey a physical interpretation linked to the natural behavior of the brain ( Morante et al., 2020 ).

Number of Sources:
Similar to other DL approaches, IADL is insensitive to the overestimation of the number of sources ( Morante et al., 2020 ). Therefore, one can safely overestimate the number of components, which will not have any significant detrimental effect. Nonetheless, the imposed number of sources should ideally represent the total number of components present in the fMRI data.
Sparsity-related information and brain atlases: One of the major advantages of IADL is that it allows for the exploitation of available sparsity-related information. Moreover, the algorithm is relatively insensitive to the adopted sparsity level, and it only requires a close enough underestimate, which can be obtained from different brain atlases, as described in Morante et al. (2020) . Note that this approach does not require one to specify the actual locations of the voxels through coordinate or spatial templates, as is required in, e.g., Lv et al., 2017; Shi et al., 2018 . In case no sparsity information is provided, TEDM automatically sets the sparsity parameters that best accommodate the experimental design: put succinctly, TEDM sets the sparsity parameter to 85% by default for the task-related time courses, since this value complies with the sparsity percentage of the primary divisions of the brain ( Morante et al., 2020 ), whereas for the rest of the sources a gradient of values from 90% to 0% is specified, to deal with any additional sparse and dense sources. Nevertheless, IADL attains better performance if more accurate information regarding the sparsity of the underlying FBNs is accessible.
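The default set-up just described can be expressed as a small helper. This is only a sketch of the behaviour described in the text, with an invented function name, not TEDM's actual code:

```python
import numpy as np

def default_sparsity_levels(n_task, n_free):
    """Default sparsity percentages, as described in the text: 85% for each
    assisted task-related source, and a 90% -> 0% gradient for the free ones."""
    task_levels = np.full(n_task, 85.0)          # primary-division default
    free_levels = np.linspace(90.0, 0.0, n_free)  # sparse-to-dense gradient
    return np.concatenate([task_levels, free_levels])

levels = default_sparsity_levels(n_task=2, n_free=10)
```

The gradient lets the free part of the dictionary absorb both compact sources (high percentage) and dense artifacts (low percentage) without per-source tuning.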
Similarity parameter: To control the variation between the imposed task-related time courses and the natural hemodynamic response of each individual, IADL introduces a parameter that reflects our confidence regarding the imposed task-related time courses.By default, TEDM automatically estimates this parameter using a similar procedure as described in Morante et al. (2020) .If needed, the user can adjust this parameter to meet her/his particular needs.
Manual Selection of Regressors: Execution of IADL results in a new design matrix that comprises two sets of regressors, namely, the enhanced task-related time courses and the regressors from additional free components. By default, TEDM assigns all obtained regressors to the new design matrix. However, this is optional. TEDM allows the user to "investigate " the obtained design matrix and manually select those regressors one might want to include in the enhanced design matrix. In this way, one has the freedom to choose, for example, only the enhanced task-related time courses and discard the rest of the regressors, or to identify and keep just the most significant physiological signals, such as cardiac or movement residuals. This flexibility naturally comes at the cost of increased subjectivity in the analysis.
TEDM and rs-fMRI: TEDM is designed to study task-related fMRI experiments, in order to exploit all the advantages the IADL algorithm has to offer. However, due to the BSS nature of the IADL algorithm, it can also be used to investigate rs-fMRI experiments, where no task-related information is available. In that case, the dictionary will only contain free components, D = D_f. These components may still benefit from sparsity constraints.

Synthetic fMRI-like data
To quantitatively evaluate the performance of the proposed methodology, we used the same realistic synthetic fMRI-like dataset from ( Morante et al., 2020 ). This synthetic dataset was developed using SimTB, an open Matlab toolbox for the design of fMRI-like data, which has been widely used in many different studies, e.g., Castro et al., 2015; Iqbal et al., 2018. We chose this dataset because it contains some additional features that make it particularly suitable for this study. The most relevant is that it has six different synthetic subjects, each one with a unique, yet realistic, hemodynamic response, in line with real fMRI data. In addition, the dataset contains a realistic number of components, including artifacts ( Bianciardi et al., 2009 ). Finally, some of the sources present large spatial overlaps. For simplicity, we summarize the main features of this dataset in Section 1 of the supplementary material.

Simulation study
For this synthetic study, we compared the performance of the introduced methodology with conventional SPM and IADL. In the next paragraphs we describe the main steps that we followed to perform this comparison.
First, following the study in Morante et al. (2020), we consider three sources of interest, Sources A, B and C, which correspond to Sources 1, 14 and 11, respectively (see supplementary Figure S2). We chose these sources as the task-related ones since they correspond to realistic scenarios that are often encountered in practice. In particular, Source A is easy to identify since it has very low spatial overlap with other sources and emulates a large block-related experimental design. Sources B and C are more challenging; they exhibit mild and large overlaps, respectively, and emulate an event-related task (intervals shorter than 5 seconds, see supplementary Figure S2). Regarding the differences in hemodynamic behavior, the synthetic dataset contains six different subjects whose hemodynamic response is defined by a particular, yet realistic, HRF. Supplementary Figure S1 illustrates the HRF used for each particular subject. Finally, in contrast to the performance evaluation in Morante et al. (2020), we focused this synthetic study on the final group-level results among all the synthetic subjects.

Standard SPM
For each participant, we followed the standard workflow for a task-fMRI experiment: first, we obtained an estimate of the three task-related time courses using the experimental design and the canonical HRF. Then, we used these three task-related time courses as regressors to construct the standard design matrix and performed a first-level analysis for all the subjects. Finally, we conducted a second-level analysis to assess significant active voxels at p < 0.001 among all the synthetic subjects.

IADL
For this study case, since we are interested in the final group-level results, we conducted a conventional multi-subject analysis using IADL, by concatenating the data of the six participants along the time dimension. Regarding the task-related information, we used the same regressors as in SPM, obtained using the canonical HRF, as the external information for the constraints in IADL.
Concerning the parametrization of the algorithms, the number of sources K was overestimated by 20%, i.e., K = 25 rather than 20 (see Section 1 of the supplementary material). This is typical in realistic scenarios, since it is not possible to know the exact number of sources. All benchmark methods require an estimate of the number of sources; thus, the same value was provided to all the algorithms. The imposed sparsity percentages for the task-related Sources A, B, and C (i.e., 1, 14 and 11 in Figure S2) are 95%, 94% and 90%, respectively, which correspond to a slight overestimation compared to the true sparsity values (see supplementary Table S1). The rest of the values were set in a broad gradient fashion as explained in Morante et al. (2020) (the exact sparsity percentage values are reported in supplementary Table S2), and we set the similarity parameter to 0.2, which we estimated using the algorithmic procedure in Morante et al. (2020) that is automatically integrated within TEDM. Finally, we performed a statistical analysis over the obtained group spatial maps to assess significant active voxels at p < 0.001.

Proposed methodology
Following the proposed methodology, we constructed the design matrix of each participant using TEDM. As external information in the involved constraints we used the same regressors as before. Similarly, we selected a total number of K = 25 components and set exactly the same sparsity percentages for the task-related components as used for IADL. The rest of the parameters were set to their defaults. Then, we used the enhanced design matrix, with all the regressors obtained from TEDM, to perform a first-level analysis for each one of the six subjects. As in SPM, we finally performed a group-level analysis to assess significant active voxels at p < 0.001 among all the synthetic subjects.

Performance evaluation
To evaluate the performance of the different studied approaches, we used the Jaccard overlap (J). The Jaccard overlap, also known as the Jaccard index or Jaccard coefficient, is a commonly used metric of the similarity between two finite sets ( Deza and Deza, 2009; Turner et al., 2018 ). Formally, given two finite sample sets, say A and B, the Jaccard overlap is defined as the ratio of the cardinalities of their intersection and union: J(A, B) = |A ∩ B| / |A ∪ B|. Therefore, since we know the ground truth of the synthetic dataset, the Jaccard overlap is a good measure to quantify the overall success of each method in revealing significant activity.
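As a concrete illustration of the metric, the Jaccard overlap of two sets of significantly active voxel indices can be computed as follows (a minimal sketch, not part of the provided toolbox):

```python
def jaccard_overlap(a, b):
    """Jaccard index J(A, B) = |A ∩ B| / |A ∪ B| for two sets of voxel indices."""
    a, b = set(a), set(b)
    union = a | b
    # Convention: two empty maps are assigned overlap 0
    return len(a & b) / len(union) if union else 0.0
```

Identical activation maps give J = 1 and disjoint maps give J = 0, so the metric directly quantifies how well a recovered map matches the ground truth.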
For completeness, in addition to the Jaccard overlap, we further investigated the performance of the different methods using (a) the True Positive Rate (TPR), which measures how sensitive the method is in detecting correct significant active voxels, i.e., the success of the method in finding significant activity within the correct area, and (b) the False Positive Rate (FPR), which is defined as the ratio of the number of negative events wrongly categorized as positive (false positive errors) over the actual number of negative events. Note that these last two metrics are complementary; for example, a method may exhibit a good TPR but also a large FPR, in which case the method is not reliable. Similarly, a low FPR is not useful if the method does not have a good TPR, i.e., it lacks sensitivity.
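Given boolean masks of detected and ground-truth active voxels, the two rates can be computed as in the following sketch (illustrative only; the function name is hypothetical):

```python
import numpy as np

def tpr_fpr(detected, ground_truth):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN),
    computed from boolean voxel masks of the same shape."""
    detected = np.asarray(detected, bool)
    ground_truth = np.asarray(ground_truth, bool)
    tp = np.sum(detected & ground_truth)    # correctly detected active voxels
    fn = np.sum(~detected & ground_truth)   # missed active voxels
    fp = np.sum(detected & ~ground_truth)   # spurious detections
    tn = np.sum(~detected & ~ground_truth)  # correctly rejected voxels
    return tp / (tp + fn), fp / (fp + tn)
```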

Real task-related fMRI data analysis
To study the performance that can be achieved with the enhanced design matrix, compared to the standard one, we used the task-related fMRI dataset from Protopapas et al. (2016). In this section we describe the main steps we followed to obtain the standard and the enhanced design matrices. For simplicity, we conducted this study within the framework of SPM, including the use of TEDM to enhance the design matrix.

Dataset and preprocessing
The task-related fMRI experiment in Protopapas et al. (2016) includes a total number of 44 participants (32 women) recruited from the university community (age M = 29.5 y, SD = 5.8, range 19-47). During the scan, participants performed a lexical decision task, including three experimental conditions of 150 trials each, corresponding to the presentation of words, pseudowords, and fixation. The dataset was preprocessed using SPM8, involving also a quality check using scripts from the ArtRepair toolbox, as described in Protopapas et al. (2016).

Standard Design Matrix
For each participant, we followed the typical workflow for this kind of task-fMRI experiment, as described in detail in Protopapas et al. (2016). We modeled each stimulus as an event of its corresponding type (word, pseudoword, fixation), with duration equal to the presentation time. Then we generated the corresponding task-related time courses in SPM12, convolving each experimental trial sequence with the canonical HRF.
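The construction of a task-related regressor by convolution with a canonical two-gamma HRF can be sketched as follows. This is a simplified illustration, not the SPM12 implementation: the gamma shape parameters (6 and 16) and the 1/6 undershoot ratio follow commonly used canonical settings, and the helper names are hypothetical:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Canonical-style two-gamma HRF sampled at the repetition time (TR)."""
    t = np.arange(0.0, duration, tr)
    # Peak: gamma(shape=6); undershoot: gamma(shape=16), scaled by 1/6
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.max()

def task_regressor(onsets, durations, n_scans, tr):
    """Convolve a boxcar trial sequence (onsets/durations in seconds)
    with the canonical HRF and truncate to the scan length."""
    box = np.zeros(n_scans)
    for onset, dur in zip(onsets, durations):
        start = int(round(onset / tr))
        stop = max(start + 1, int(round((onset + dur) / tr)))
        box[start:stop] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]

# A single 2 s event at t = 10 s, 60 scans, TR = 2 s
regressor = task_regressor(onsets=[10.0], durations=[2.0], n_scans=60, tr=2.0)
```

The resulting regressor rises a few seconds after the event onset, reflecting the hemodynamic delay.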
As detailed in Protopapas et al. (2016), to account for small variations of the HRF, we augmented the design matrix by incorporating the temporal and dispersion derivatives of each task-related time course. Finally, we included motion regressors to further improve the decomposition. These steps are in line with the standard procedure of task-related fMRI studies ( Huettel et al., 2009 ).

Enhanced Design Matrix
Following the proposed methodology, the design matrix of each participant was enhanced using TEDM. For this study, we prepared the TEDM set-up as follows: First, the three task-related time courses employed in the standard design matrix were used as prior information, in order to obtain the three assisted time courses, as explained above. For the similarity parameter, we used the default value estimated by TEDM from the imposed task-related time courses. In addition to the three assisted sources, we used some extra free components to augment the design matrix. Studies concerning the total number of components present in fMRI data, e.g., Beckmann and Smith, 2004; Calhoun et al., 2001; Cordes and Nandy, 2006, have shown that the number of components varies between 20 and 40. Consequently, in this study, we included 27 extra free components to accommodate other potential brain-induced sources and other interfering components, amounting to a total of 30 sources for each participant, which constitutes an adequate and realistic number of total sources.
Regarding the sparsity information of the sources, we modified the default parameters of TEDM to include additional external information. Specifically, the FBNs that are theoretically expected to be involved are language-related, and language-related FBNs typically exhibit a maximum sparsity percentage of around 95% ( Morante et al., 2020 ). Therefore, we set the sparsity percentage equal to 90% for the assisted sources that correspond to words and pseudowords, as an upper bound that allows accounting for small natural variations. For the free sources we used the default values of TEDM, which automatically provide a smooth gradient of sparsity values ranging from 95% to 0%. This gradient accounts for several components with different sparsity percentages, ranging from sparse components, such as other FBNs or some cardiac artifacts, and guarantees enough space for dense (non-sparse) components, as explained in Section 2.2.
After setting the parameters, we used TEDM to obtain the enhanced design matrix. The result is a new SPM file that includes the enhanced design matrix where, by definition, the first three columns correspond to the subject-specific task-related time courses (words, pseudowords, and fixation) and the remaining columns correspond to the extra free components that function as additional regressors.

Performance evaluation on the synthetic dataset
In this study, we compared the overall success of IADL, standard SPM, and the proposed methodology in revealing the activation areas corresponding to the three sources of interest (Sources A, B, and C). As detailed in the method section, these three sources resemble different realistic scenarios that may appear in practice. Furthermore, for each studied method, we repeated the same analysis over ten different noise realizations of the dataset, using realistic noise levels ( Gudbjartsson and Patz, 1995; Welvaert and Rosseel, 2013 ), to avoid idiosyncratic results due to a particular noise seed selection.
Fig. 2 visualizes the overall success of each studied method for the three different synthetic sources of interest.The bars in the figure depict the mean Jaccard overlap obtained for the ten different group noise realizations for the three different sources of interest.
Observe that the proposed methodology clearly exhibits better overall success in revealing the activation areas of the synthetic sources; IADL also succeeds in providing a good estimate, whereas SPM struggles the most. Nevertheless, the reasons behind these results are unclear; recall that the Jaccard overlap only provides information regarding the overall success. To further investigate what is happening, we have to look at the complementary metrics described in the method section.
Table 1 shows the results for the TPR and the FPR for the different studied methods. Observe that SPM exhibits the best TPR compared to the rest of the studied approaches, i.e., it is highly sensitive. However, SPM also presents a high FPR, making it unreliable and reducing the Jaccard overlap, as seen in Fig. 2. On the contrary, IADL is considerably better at avoiding the proliferation of false positives, which in this case is zero. This behavior is a combined result of the efficient sparsity constraint of IADL and the posterior statistical test. However, despite this excellent FPR, the TPR of IADL is considerably smaller than that of the rest of the studied alternatives. In this way, IADL is considerably more reliable than SPM but lacks sensitivity. Finally, observe that the proposed methodology simultaneously exhibits a better TPR than IADL and a considerably smaller number of false positives compared to SPM.

Table 1
Comparison of the TPR and FPR (mean ± std.) for the three synthetic studied sources among the ten group noise realizations of the synthetic dataset.

Comparison of the estimated task-related time courses
Before examining the results of GLM from the two design matrices under consideration, we will first study the major differences between the enhanced task-related time courses and the conventional ones.

Enhanced task-related time courses
The enhanced design matrix from IADL has two parts: the assisted part, which contains the task-related time courses, and the free part, which accommodates other components and residuals ( Morante et al., 2020 ). A sample of task-related time courses of the assisted part of the dictionary is displayed in Fig. 3, which depicts the three enhanced task-related time courses for two randomly selected participants, in comparison to the conventional imposed task-related time courses, which were estimated using the canonical HRF. More results are shown in Figure S4 within the supplementary materials. The enhanced task-related responses preserve the general functional shape of the imposed ones, as expected due to the similarity constraint. However, they are not identical. For example, some dips are observed at the onset of activation. Although conventional models, such as the two-gamma distribution model, omit this initial dip ( Friston et al., 1994 ), the existence of this early deoxygenation stage has been consistently reported and is well studied, e.g., Kamran et al. (2018). Other relevant differences include the variation of onset maxima and the modulation of posterior undershoots, in accordance with the expected natural variation of the hemodynamic response ( Handwerker et al., 2004; Seghouane and Shah, 2013 ).

Fig. 4. Estimated HRFs, obtained by fitting the two-gamma distribution to the enhanced task-related components of each participant. The first column displays the functional shape of the HRF estimated for each studied task (colored lines) for three randomly selected participants; the gray lines correspond to the estimated HRFs for the rest of the participants. The second column shows the difference of the estimated HRFs compared with the canonical one.

Analysis of the hemodynamic response
According to the experimental design, each participant performed the same lexical decision task but in an individually randomized trial sequence.As a result, each participant has a unique set of task-related time courses.In addition, the similarity constraint relaxes the smooth behavior of the imposed task-related time courses.It is thus difficult to evaluate whether the observed differences between the conventional and the enhanced time courses are due to actual modulations of each individual's hemodynamic response or simply to noise residuals.
To better illustrate whether the IADL algorithm succeeded in unveiling the variation of the hemodynamic response among participants, we estimated the effective HRF of each enhanced task-related time course using the two-gamma distribution model. To this end, we fitted the parameters of the two-gamma distribution model ( Friston et al., 1994 ) for the enhanced task-related time courses using the Levenberg-Marquardt algorithm, as in ( Lindquist et al., 2009 ). We set the starting values for the algorithm to coincide with those of the canonical HRF. Finally, to check the reliability of this procedure, we applied the same analysis to the task-related components from the standard design matrix, where we obtained that the canonical HRF was the optimal one for all the components, as expected by construction.
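The HRF-fitting step can be sketched as follows, under simplifying assumptions: a three-parameter version of the two-gamma model is used (the full model has additional dispersion and onset parameters), `scipy`'s Levenberg-Marquardt fitter stands in for the procedure of Lindquist et al. (2009), and the synthetic `observed` response is for illustration only:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gamma

def two_gamma(t, a1, a2, ratio):
    """Simplified two-gamma HRF model; (a1, a2, ratio) = (6, 16, 6)
    reproduces the canonical shape."""
    return gamma.pdf(t, a1) - gamma.pdf(t, a2) / ratio

t = np.arange(0.0, 32.0, 0.1)            # 0.1 s time resolution, as in the paper
observed = two_gamma(t, 6.3, 15.0, 5.0)  # stand-in for an extracted HRF response

# Levenberg-Marquardt fit, started at the canonical parameters
params, _ = curve_fit(two_gamma, t, observed, p0=[6.0, 16.0, 6.0], method="lm")
```

Starting the fit at the canonical parameters, as in the paper, keeps the estimate close to the canonical shape unless the data clearly favor a different one.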
Fig. 4 compares the estimated HRFs from each one of the enhanced task-related time courses with the canonical HRF for three randomly selected participants. The first column shows the actual functional shape of the estimated HRFs after estimating the parameters (time resolution 0.1 s). The colored lines represent the different estimated HRFs, where the red one corresponds to the canonical HRF. The second column depicts the actual difference between the estimated HRFs and the canonical one. For completeness, the gray lines correspond to the estimated HRFs for the rest of the participants, included here to provide an overview of the variation of the HRF over the whole dataset.
Note that the estimated HRFs are relatively close to the canonical HRF; however, they are not identical. Crucially, the estimated HRFs are very similar for experimental conditions expected to elicit similar activity, that is, for words and pseudowords, compared to those estimated for the fixation condition, which is not expected to result in a similar activation. This is important because the HRFs are estimated independently for each experimental condition, and it confirms that HRF estimation does not model residual noise but, rather, represents a functional adaptation of the activation model to the dominant brain response of each participant for each experimental condition.
In nature, the HRF usually varies among brain regions. IADL does not capture these variations directly; however, it does account for small differences among conditions (i.e., task-related time-course components), as reported in Morante et al., 2020; Morante-Moreno et al., 2017. Different FBNs are expected to dominate brain activity in response to different experimental conditions. Therefore, the flexibility of the estimated HRFs means that they can indirectly model more accurately the HRF that applies to the brain regions associated with the FBNs dominating each experimental condition. Finally, focusing on the differences between the estimated HRFs and the canonical one, we see that these resemble the derivatives of the HRF that are included in conventional analyses to accommodate this variability ( Huettel et al., 2009 ).

Additional components of the enhanced design matrix
In addition to the task-related time courses, the enhanced design matrix can accommodate additional components. As explained above, IADL extracts these components in a fully blind fashion, allowing the uncovering of additional sources hidden in the fMRI data: other brain sources as well as different interfering physiological signals.
To analyze the nature of the recovered sources, we examined all the additional components and their corresponding parameter maps among all the studied participants. From a general overview, although these additional sources highly depend on the considered individual's data, we can draw some general conclusions that extend to all the studied participants.
First, we observe that most of the common components are physiological residuals related to heartbeat and head movements. Regarding the interfering cardiac signals, these sources are relatively easy to identify since their corresponding time courses are high-frequency components. Besides, cardiac artifacts usually exhibit significant activity in distinctive locations, e.g., near main blood vessels. Interestingly, although the time courses associated with these components are unique for each participant, their corresponding spatial maps show consistent activation patterns common among all of them. The reason behind this behavior is simple: healthy individuals share similar primary anatomical structures. Therefore, cardiac artifacts that affect these specific anatomical areas will produce similar patterns among all individuals.
Similarly, motion residuals are also present among all the participants. The time courses associated with motion residuals often show a smooth change or present sudden peaks due to a small random movement. Thus, each time course will be different for each participant. The spatial maps associated with motion residuals usually exhibit significant activity around the brain's edges. Nonetheless, unlike cardiac artifacts, the spatial maps from these motion residuals are different for each participant, since the head movement of each individual is unique.
In addition to interfering physiological signals, we observe the presence of other brain activation patterns. The most relevant source that appears in all the participants is the Default Mode Network (DMN). A closer examination of the corresponding time courses showed that they are highly anticorrelated with the performed tasks, as expected for this particular source. On the other hand, we also observed significant activity within the somatosensory and auditory cortex among all the participants. Interestingly, although all the participants showed activity in those areas, the particular patterns and the exact location of these activation areas vary slightly among individuals.
For simplicity, Fig. 5 depicts the spatial maps of five of these common components for five randomly selected participants. In particular, this figure shows the significant active voxels, |z| > 2.3, from their corresponding spatial maps after running IADL. Note that we performed this extra step to visualize the spatial maps of the estimated additional components. The time courses associated with these additional components appear in Figure S5 within the supplementary material.
In summary, the first column from the left in Fig. 5 shows the DMN. The second column corresponds to the somatosensory and auditory cortex. In addition to these two brain-related sources, the last three columns contain a selection of three interfering physiological components. The third column shows a cardiac artifact that exhibits significant activity within the 4th ventricle. Some participants also show significant activity that extends through the lateral ventricles and the straight sinus. The fourth column shows another cardiac artifact at the base of the brain, near the brainstem. A closer inspection revealed that these areas correspond, indeed, to major arteries. Finally, the last column depicts a motion residual. Interestingly, the differences between cardiac and motion residuals are also prominent when observing their corresponding time courses (see Figure S5).
Finally, in addition to the common components that Fig. 5 displays, it should be pointed out that we identified more relevant sources: for example, the attention network appears in more than half of the participants. Similarly, in almost half of the participants we observed significant activity within the precuneus, whose corresponding time courses appeared highly correlated with the pseudowords.

Single-subject level analysis
To study the effects of the newly proposed methodology, we analyzed the task-fMRI data using the previously described design matrices to perform a first-level analysis for all participants using SPM. Similarly to the study in Protopapas et al. (2016), we defined three contrasts that convey information in accordance with the experimental task: words vs. fixation, pseudowords vs. fixation, and words vs. pseudowords. For each contrast, voxelwise significance was corrected for family-wise error (FWE) at p < 0.05.
A preliminary visual examination of the three contrasts indicated that the two design matrices produced qualitatively similar results. However, closer inspection of certain regions revealed some interesting differences. Thus, we focused our study on four particular regions: (a) the left primary visual cortex (peak at -17 -91 -11) from the contrast words > fixation, (b) the fusiform gyrus (peak at -42 -52 -20) and (c) the inferior/superior parietal gyrus (peak at -27 -58 +46) from pseudowords > fixation, and (d) the middle temporal gyrus (peak at -45 -58 +13) from words > pseudowords.
The selection of these four regions was made for several reasons: First, the expected response of the primary visual cortex exhibits consistent activity patterns among participants. Therefore, the left primary visual cortex constitutes an excellent example of an area that is easily identifiable and consistent. Second, the activation patterns within the fusiform gyrus, the inferior/superior parietal gyrus (pseudowords > fixation) and the middle temporal gyrus (words > pseudowords) have a relevant theoretical location within the left hemisphere ( Protopapas et al., 2016 ). Moreover, the precise corresponding areas within these gyri are considerably harder to locate compared to those within the primary visual cortex. No doubt, the most challenging Region Of Interest (ROI) is the one associated with the contrast words > pseudowords, because both conditions trigger very similar FBNs, resulting in considerable overlap.
To quantitatively investigate the performance of both design matrices, we identified the activated clusters within the four ROIs among all participants, using the corrected maps from SPM, as follows: First, for each ROI, we identified the closest cluster peak to its theoretical location using the Euclidean distance ( Mayka et al., 2006; Thyreau et al., 2012 ). If the closest peak lay inside a sphere of 30 mm radius centered at the theoretical peak, then we took the whole cluster to belong to that ROI. Otherwise, we concluded that no significant activity was detected in that region for that particular participant. Table 2 shows the number of participants who exhibited significant activity within the expected theoretical locations using the standard and the enhanced design matrices.
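The cluster-to-ROI assignment rule described above can be sketched as follows (an illustrative sketch with hypothetical names; peak coordinates are in mm):

```python
import numpy as np

def assign_cluster(peaks_mm, roi_peak_mm, radius=30.0):
    """Return the index of the cluster peak closest (Euclidean distance)
    to the theoretical ROI peak, or None if no peak lies within the
    given radius (30 mm here)."""
    peaks = np.asarray(peaks_mm, float)
    dists = np.linalg.norm(peaks - np.asarray(roi_peak_mm, float), axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= radius else None
```

For example, with a theoretical fusiform-gyrus peak at (-42, -52, -20), a cluster peaking at (-40, -50, -20) would be assigned to the ROI, while one peaking 100 mm away would not.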
After obtaining the significant clusters within each region of interest, we analyzed the overlap consistency among participants. For this analysis, we measured the Jaccard overlap of the obtained clusters between pairs of participants. Table 3 sums up the main results using both design matrices. In particular, it shows the percentage of pairs that contain at least one overlapping voxel, the maximum Jaccard overlap, and the mean Jaccard overlap among pairs of participants for each studied ROI.
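The pairwise-overlap analysis amounts to computing the Jaccard index over all pairs of participants' clusters, e.g. (illustrative sketch; the function name is hypothetical and each cluster is represented as a set of voxel indices):

```python
import itertools
import numpy as np

def pairwise_jaccard(clusters):
    """Mean and maximum Jaccard overlap over all pairs of participants'
    clusters, each given as a set of voxel indices."""
    scores = []
    for a, b in itertools.combinations(clusters, 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return float(np.mean(scores)), float(np.max(scores))

mean_j, max_j = pairwise_jaccard([{1, 2}, {2, 3}, {5}])
```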
For completeness, Figure S6 in the supplementary material visually depicts the values of the Jaccard overlap for the two studied approaches.
Finally, in order to visualize the overlap consistency among participants, we determined the conjunction maps for each ROI. Fig. 6 depicts the obtained conjunction maps, where each colored voxel represents the number of participants who exhibited significant activity within the specific voxel.
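A conjunction map of this kind is simply the voxelwise sum of the participants' binarized significance maps, e.g. (illustrative sketch, hypothetical function name):

```python
import numpy as np

def conjunction_map(masks):
    """Voxelwise count of participants with significant activity:
    the sum of binary (thresholded) maps across participants."""
    return np.sum(np.asarray(masks, dtype=int), axis=0)

# Three participants, three voxels: all agree on voxel 0 only
counts = conjunction_map([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
```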

Left primary visual cortex (words vs. fixation)
All participants exhibited significant activity within the left primary visual cortex. Specifically, we found at least one significant cluster within the expected ROI for both design matrices (see Table 2 ). Both approaches produced similar cluster sizes. Nevertheless, the standard method exhibited greater variability in the number of significant active voxels.
Analysis of the overlap between pairs of participants revealed that, for both approaches, the majority of pairs overlap, as seen in Table 3. With respect to the consensus between pairs of participants, the enhanced design matrix resulted in a mean Jaccard overlap that was twice as large as that from the standard approach, with the same variance, as can be seen in Figure S6 in the supplementary material.
Going beyond pairwise overlap, we found that the conjunction maps from the enhanced design matrix exhibited greater consensus among all participants than the standard approach, as shown in Fig. 6a, since the results from the enhanced approach are more compact and the obtained clusters present larger overlaps.
As expected, the left primary visual cortex exhibited good results for both designs.Nonetheless, the improved discriminative power of the enhanced design matrix is reflected in the enhanced recovery of the ROI, with higher consistency among participants.

Fusiform Gyrus (pseudowords vs. fixation)
Turning to the fusiform gyrus, we observe that, using the enhanced design matrix, all but two participants exhibit significant activity within the ROI. In comparison, using the standard design matrix resulted in four participants not exhibiting significant activity within the expected region. Furthermore, the results from the enhanced design matrix show less cluster-size variability, and the clusters appear closer to the theoretical location on average (see Table 2 ).
The study of the overlaps reveals that almost half of the pairs overlap for both design matrices. As we can observe in Table 3, there are no major differences between the two design matrices. However, Fig. 6b shows that the cluster of interest also contains significant activity from the visual cortex, which might explain why the two design matrices produced similar results compared to those from the left primary visual cortex.
In this case, we observe that the enhanced design matrix provides more consistent results among participants, as more participants were detected with similar Jaccard overlap and with lower cluster-size variance.

Inferior/Superior parietal (pseudowords vs. fixation)
Only some of the participants exhibited significant activity in the inferior/superior parietal gyrus. However, we detected significant activity in eight additional participants using the enhanced design matrix, compared to the standard one (see Table 2 ). Furthermore, the clusters obtained using the enhanced design matrix present less cluster-size variability and are closer to the theoretical location on average (see Table 2 ).

Table 2
Analysis of the significant clusters over all studied ROIs at the single-subject level. The table lists the number of participants who exhibited significant activity within the expected ROI and the corresponding percentage with respect to the total number of participants, in parentheses. The table also lists the mean cluster size and the mean deviation of the maximum peak activity for each studied ROI.

Table 3
Analysis between pairs of participants at the single-subject level. The table shows the percentage of pairs of participants who obtained significant cluster overlap. For completeness, it also lists the maximum and the mean Jaccard overlap among all the overlapping pairs of participants.

Computation of the spatial overlap revealed overlapping clusters in only a quarter of participant pairs (see Table 3 ). The mean Jaccard overlap did not differ between the two approaches, but the clusters from the enhanced design matrix were smaller and exhibited less variability. Therefore, because more participants showed significant activity within the region of interest, these results suggest that the clusters from the enhanced design matrix exhibit similar or better consensus compared to the results from the standard design matrix.
Fig. 6 (c) shows the conjunction maps of the obtained clusters among all participants. Both approaches produced considerably less consensus in this region compared to the results for the visual cortex or the fusiform gyrus. The spatial distribution of the conjunction maps was similar for the two design matrices. However, note that the enhanced design matrix found significant activity within this ROI in more participants compared to the standard design, whereas the Jaccard overlap is more or less the same for the two approaches. Thus, although there are no considerable qualitative differences, the results shown in Table 2 suggest that the enhanced design matrix is more sensitive than the standard one.

Middle Temporal Gyrus (words vs. pseudowords)
Only a reduced subset of participants exhibited significant activity within this ROI (see Table 2). However, when using the enhanced design matrix, significant activity could be detected in more than half of the participants, that is, a total of 11 additional participants compared to those detected with the standard design matrix. Moreover, clusters from the enhanced design matrix are slightly smaller than those from the standard design matrix.
Concerning the consensus among overlapping pairs of participants, this ROI presents the worst results compared to the other regions, as only a small percentage of pairs show overlap. Still, almost twice as many pairs exhibit overlap when the enhanced design matrix is used, compared with the results using the standard design matrix. The mean Jaccard overlap from the enhanced design matrix is also slightly better than the standard (see Table 3).
The conjunction maps in Fig. 6(d) indicate that both approaches led to considerably lower spatial consensus compared with the other studied ROIs. Importantly, although the conjunction maps show a cluster at the theoretical peak of the ROI for both design matrices, the shapes of the conjunction maps are quite different. This result stands in contrast to the results from the previous ROIs, where both approaches led to similarly shaped conjunction maps.
The relatively poor performance of this ROI was expected since, as we previously mentioned, this area is particularly challenging because the underlying task-related time courses are expected to trigger very similar FBNs. Therefore, it is not surprising that the weakest results are obtained for this ROI. In this context, we would like to emphasize that the enhanced design matrix improves the results of SPM and also increases the number of participants with significant activity within the region of interest. In this way, these results indicate that the enhanced design matrix has more sensitivity in detecting relevant significant activity at the single-participant level.

Second-level analysis
Second-level analysis is essential for inferring significant information regarding entire groups of participants ( Bossier et al., 2020 ). Standard procedures usually perform statistical tests over all first-level results to extract common group information. However, all group analyses are sensitive to the number of participants and to mis-modeling associated with potential individual variations in brain responses.
The number of participants has a considerable effect on the anatomical location and reproducibility of fMRI results. As reported in Turner et al. (2018), conventional statistical analyses require relatively large numbers of participants to obtain reproducible results ( Bossier et al., 2020 ). However, increasing the number of participants reduces the anatomical specificity of SPM ( Thyreau et al., 2012 ), because the standard error decreases as the square root of the number of participants, and the associated significance tests will then be more likely to reject the null hypothesis; that is, the larger the number of participants, the more extensive the areas that are detected as "activated".
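The mechanism is easy to make concrete. For a one-sample t-test with a fixed standardized effect d = mean/sd, the expected t-statistic grows as t = d·sqrt(N), so even a modest effect eventually crosses any fixed significance threshold (the effect size below is an illustrative value, not one from the study):

```python
import math

# Standard error of the mean shrinks as 1/sqrt(N), so the expected
# one-sample t-statistic for a fixed effect d grows as t = d * sqrt(N).
d = 0.2  # a modest standardized effect size (illustrative)
for n in (20, 40, 80, 160):
    t = d * math.sqrt(n)
    print(n, round(t, 2))
# The same small effect that is far from significance at N = 20
# (t ≈ 0.89) approaches the usual two-sided critical value near
# N = 160 (t ≈ 2.53).
```

This is why larger samples tend to yield more extensive "activated" areas even when the underlying effects are unchanged.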
On the other hand, another relevant factor that affects group analysis is the potential differences among participants and the natural variability of the underlying brain activation patterns among individuals. Although participants who perform the same task are expected to produce the same activation patterns, the truth is that there is variation among individuals. Little attention has been paid to the causes of this variability in the patterns of brain activity, although many researchers using neuroimaging techniques acknowledge that this is an issue ( Miller et al., 2009 ).
Similarly, the oversimplification of hemodynamic responses by conventional HRF-based models neglects the natural heterogeneity of neuronal activity, which may considerably limit the reproducibility and specificity of group results ( Poldrack et al., 2011; Xu et al., 2016 ). Thus, it should be emphasized that the quality of the first-level analysis will affect the performance of any follow-up group analysis. Failing to provide a suitable estimate of the design matrix and of unmodeled subject-dependent components potentially introduces detrimental effects that may compromise subsequent group analyses.

Group size analysis
In order to evaluate the impact of the number of participants on the detected cluster size, we performed a second-level analysis over the three studied contrasts (words vs. fixation, pseudowords vs. fixation, and words vs. pseudowords) using both design matrices. We then studied the effects of group size on the activated areas within the four ROIs that we defined in the first-level analysis.
For this study, we defined five group sizes, namely 20, 25, 30, 35, and 40. As the full dataset contains only 44 participants, to avoid non-generalizable findings due to idiosyncratic participant selection, we examined ten different randomly selected combinations of participants for each group size, thus creating a total of 50 different groups. For each group, a standard t-test for each contrast of interest was performed. Using the obtained maps from each contrast, we identified the significant active voxels at p < 0.001 (uncorrected), and we examined the clusters obtained within the four studied ROIs, following the same procedure as described in the first-level analysis. Fig. 7 shows the number of significant voxels found for each studied ROI as a function of group size. The square dots joined by a line display the mean cluster size. In general, we observe that clusters using the standard design matrix are considerably larger and present higher variability in cluster size among the different subgroups of participants. On the contrary, the enhanced design matrix produces smaller clusters with considerably more consistent cluster size across groups. In this way, the particular selection of participants produces more substantial variations under the standard approach than when using the enhanced design matrix.
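The resampling procedure just described can be sketched in a few lines; the data below are purely synthetic (the planted effect size, voxel counts, and seed are illustrative assumptions, not values from the study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_total, n_voxels = 44, 1000

# Hypothetical first-level contrast estimates (participants x voxels),
# with a true positive effect planted in the first 50 voxels.
betas = rng.normal(0.0, 1.0, size=(n_total, n_voxels))
betas[:, :50] += 0.8

def significant_voxels(group_betas, alpha=0.001):
    """One-sample t-test per voxel; count positive voxels with p < alpha."""
    t, p = stats.ttest_1samp(group_betas, popmean=0.0, axis=0)
    return int(np.sum((p < alpha) & (t > 0)))

# Ten random participant subsets per group size, as in the analysis above
for size in (20, 25, 30, 35, 40):
    counts = [significant_voxels(betas[rng.choice(n_total, size, replace=False)])
              for _ in range(10)]
    print(size, float(np.mean(counts)))
```

Tracking the mean count across the ten subsets of each size reproduces, in miniature, the kind of group-size curves plotted in Fig. 7.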
Apart from this general observation, the results show some further differences: First, in Fig. 7(c), one can observe that the cluster size in the enhanced design matrix suddenly jumps from fewer than 100 to more than 500 significant active voxels for groups with 30 participants. A closer inspection of this contrast revealed that the reason behind this sudden growth is that ROIs (b) and (c) (which belong to the same contrast) tend to merge into a single cluster. In this way, for small groups the enhanced design matrix separates the two clusters, whereas for larger groups the two clusters merge. Interestingly, Fig. 7(b) does not show substantial variation; this is because cluster (b) is considerably larger than cluster (c) and is therefore less affected by the merging with the smaller cluster.
On the other hand, the standard design matrix does not exhibit any similar anomaly because it already presents merged clusters, even for small groups. This merging also explains why cluster size behaves consistently under the standard approach. Although the differences between the two studied approaches can be seen at a glance, it is not evident whether the observed variations occur due to the enhanced design matrix performance or to a reduction in sensitivity. Similarly, the slower apparent growth rate of cluster size with increasing group size for the enhanced design matrix can be due to a reduced sensitivity to group size, or it can be due to the smaller clusters produced by this approach. The large difference in cluster size between the two approaches prevents reaching a definite conclusion. Therefore, to shed light on the issue, we studied the relative growth of the results. Fig. 8 shows the evolution of the proportion of significant active voxels as a ratio over the cluster size for the group with N = 40, effectively equating the relative cluster size (equal to 1) for the two approaches.
Using this normalization, Fig. 8 shows that the relative growth of cluster size is similar for the two approaches. Therefore, these results suggest that the large differences in the size of the clusters in Fig. 7 are due to the fact that the enhanced design matrix naturally produces smaller clusters. Interestingly, this consistency and the relatively reduced cluster size indicate that the enhanced design matrix exhibits better specificity than the standard approach. In contrast, we also observe that the standard design matrix is profoundly affected by the number of participants and the stringent limitation of the fixed HRF model.
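The normalization behind Fig. 8 is a simple rescaling of each curve by its value at N = 40; a minimal sketch (all cluster-size numbers are made up for illustration, not results from the study):

```python
import numpy as np

group_sizes = np.array([20, 25, 30, 35, 40])
# Mean cluster sizes per group size (illustrative values only)
standard = np.array([400.0, 700.0, 1100.0, 1600.0, 2100.0])
enhanced = np.array([60.0, 90.0, 130.0, 170.0, 210.0])

# Divide each curve by its mean cluster size at N = 40, so both curves
# equal 1 there and their relative growth becomes directly comparable.
rel_standard = standard / standard[group_sizes == 40][0]
rel_enhanced = enhanced / enhanced[group_sizes == 40][0]
print(rel_standard.round(2))  # relative growth, standard design matrix
print(rel_enhanced.round(2))  # relative growth, enhanced design matrix
```

Once both curves are on this common scale, differences in absolute cluster size drop out and only the growth rate with group size remains.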

Full second-level analysis
To evaluate the anatomical reliability as well as the main differences between the two studied approaches, we completed our study by performing a second-level analysis using all 44 participants for each contrast. Fig. 9 shows the obtained significant activated voxels at p < 0.05 (FWE corrected) for the three studied contrasts. The left part of the figure shows the results obtained using the standard design matrix, whereas the right part corresponds to the results using the proposed methodology. The white circles mark the locations of the four ROIs: (a) left visual cortex, (b) fusiform gyrus, (c) inferior/superior parietal gyrus, and (d) middle temporal gyrus. For completeness, Tables S3 and S4 in the supplementary material list all obtained clusters for the two design matrices, respectively. Finally, to analyze the anatomical reliability of the detected areas, Fig. 10 compares the obtained significant active voxels with anatomical templates for words and pseudowords.
Overall, both approaches succeed in revealing significant activity within the expected ROIs. Nevertheless, we can observe some general differences: First, similarly to the results from the analysis of the effects of group size, we can qualitatively confirm that the enhanced design matrix produces more concentrated clusters within the ROIs than those obtained using the standard design matrix. For example, in the contrast words vs. fixation, clusters from the enhanced design matrix appear concentrated within the visual cortex. In contrast, results from the standard design matrix show activity beyond the secondary visual cortex, even in the posterior parietal area.
Beyond this general observation, a closer inspection reveals some further differences. The most interesting appears within the contrast words vs. pseudowords. Note that, although both design matrices produced significant activity within (d), the standard design matrix produces an additional significant activation cluster (peak at -54 -48 37) within the inferior parietal lobe that extends towards the supramarginal gyrus, whereas the enhanced design matrix does not show any peak of activity near this location. This observation also contrasts with the results of the conjunction maps in Fig. 6 at the single-subject level, where neither of the design matrices showed consistent activity near this area.
One can imagine that this area could have appeared in the first-level analysis but was suppressed because the applied correction was too strict. However, this is not the case: closer inspection of this contrast and of the parametric maps of words and pseudowords at different significance thresholds suggests that this area appears spuriously due to negative correlations and some residuals among participants, potentially introduced by mis-modeling. Therefore, although this area appears significantly "activated" in the second-level analysis using the standard design matrix, it does not play any significant role in the contrast words > pseudowords at the single-subject level. In contrast, the enhanced design matrix does not exhibit this erratic pattern, as the results from the second-level analysis consistently overlap with the results from the conjunction maps. Moreover, when comparing these results with the spatial templates in Fig. 10, we confirm that our methodology succeeds in revealing activation patterns that agree with previous related studies. In this respect, the standard approach clearly fails (see the dashed-line circle in Fig. 10), revealing a solid activation cluster that does not match the Neurosynth template but rather appears, as examined in detail above, to correspond to a false positive due to unmodeled residuals.
Another difference between analyses with the standard vs. the enhanced design matrix appears within the contrast pseudowords > fixation. In particular, the enhanced design matrix produced significant activity within the middle temporal gyrus (peak at -57 -28 +1), which is not present in the results of the standard design matrix (see Fig. 9 and Table S3). However, unlike the previous area within the supramarginal gyrus, a closer inspection of the results at the first-level analysis revealed significant activity within this particular contrast in most of the studied participants. Figure S8 in the supplementary material shows the conjunction maps for this new ROI for both studied approaches. Thus, in contrast to the previously discussed region, this area plays a functional role within this contrast and provides valuable information ( Protopapas et al., 2016 ). Furthermore, when comparing this additional active area with the spatial templates in Fig. 10, we observe that this cluster agrees with the activation observed in other word-related studies. Hence, this additional cluster that was only detected using the enhanced design matrix likely plays a role in this contrast.
Therefore, in agreement with the results of the conjunction maps and the comparison with the spatial templates, this area should have appeared in the second-level results for both design matrices. However, it seems to have been hidden by the statistical corrections in the case of the standard design matrix, as evidenced by the examination of the second-level results without correction (see Figure S9 in the supplementary material). In particular, both design matrices produce significant activity within this area in the uncorrected statistical maps. These results show that the enhanced design matrix has more anatomical reliability and improved sensitivity compared to the standard design.

Discussion
After performing a detailed comparison within the GLM framework, we observe that the proposed methodology results in a participant-specific design matrix that represents both the task-related sources of activation and additional interfering components, and it attains a closer-fitting basis for statistical modeling within the usual GLM framework.
The comparison of the enhanced task-related components with the conventional task-related time courses already indicates that the algorithm introduces some relevant differences. In particular, when studying the retrieved hemodynamic responses (see Fig. 4), we confirm that the algorithm can cope with individual differences in the hemodynamic responses, in agreement with previous results and with the construction of the IADL formulation ( Morante et al., 2020; Morante-Moreno et al., 2017 ). Nonetheless, recall that we estimated the HRFs in Fig. 4 from the enhanced task-related time courses; IADL does not implement any particular model. The actual estimated responses are beyond the convolutional model and allow for more flexibility than the conventional approach.

At the first-level analysis, our test case demonstrated that the new method results in significant task-related activation detected in more participants within the expected ROIs compared to the standard design matrix. In addition, the detected activation from the enhanced design matrix is less spatially extended among participants. In other words, the new methodology produces activation clusters for a larger number of participants that are simultaneously more precise and more anatomically reliable.
Similarly, the group analysis demonstrated that the enhanced discriminative power of the proposed methodology goes beyond a mere augmentation (in a conventional sense) of the design matrix or a change in the significance threshold ( Huettel et al., 2009; Poldrack et al., 2011 ).
On the contrary, we observed that the enhanced methodology provides more spatially reliable clusters with a relatively smaller size, and helps recover significant active areas that cannot be seen using conventional approaches. Furthermore, the group analysis with the synthetic data revealed a similar behavior; the proposed methodology exhibits more reliable results compared to SPM, with a substantial reduction of the proliferation of false positives. Finally, the comparison of the obtained results with spatial templates from other related studies (i.e., using the same terms and therefore targeting similar brain areas) confirms our findings by demonstrating that the proposed methodology is more anatomically reliable and more sensitive in detecting significant activity of interest compared to the standard procedure.

Appraisal of the proposed methodology
In the analysis of task-related fMRI experiments, the user is primarily interested in BOLD-signal variations that are associated with the experimental time courses, free from interfering physiological signals and artifacts to the largest possible extent ( Dagli et al., 1999; Fair et al., 2009; Huettel et al., 2009 ). Traditional GLM-based analysis relies entirely on preprocessing and on the augmentation of the design matrix to remove those artifacts, as well as to account for subject-specific variation in the hemodynamic response ( Poldrack et al., 2011 ). However, failing to provide an adequate estimate of these regressors may result in maps that do not correspond to the correct distribution of active FBNs.
The proposed methodology offers a novel approach for the construction of the design matrix, along with the familiarity associated with the long-established SPM framework. Thus, it permits modeling flexibility around the hemodynamic response, which is assumed to be fixed in traditional analysis, potentially allowing improved model fit and the recovery of otherwise undetectable activation patterns. On the other hand, unlike more traditional analyses (including standard DL methods) that cannot capitalize on sparse spatial information across voxels to identify the activation patterns that might signify actual FBNs, the proposed methodology uses the sparse spatial information to identify the voxels that better accommodate the activated areas associated with each source. In addition, by enforcing a sparsity-promoting constraint, the new method guards, to a sufficient extent, against the uncontrollable proliferation of significantly activated voxels as the sample size increases or as a function of natural SNR variation across brain regions ( Thyreau et al., 2012 ).
Importantly, the reduction in cluster size achieved by the proposed methodology is not associated with an increased miss rate. We are not proposing a "stricter" method, which might produce fewer but somehow more trustworthy clusters. Instead, as the experimental findings at the second level indicate, we are proposing a method that, based on our understanding of brain functioning, is more stringent in terms of spatial extent without being less sensitive in terms of detecting significant task-relevant sources, as evidenced by the comparison with Neurosynth templates related to similar studies (see Fig. 10). This permits the identification of the most relevant regions of the experimental task without worrying about significance spillover as standard errors are reduced in studies with increasingly larger samples. The importance of this development is seen more clearly once we take into account the current tendency to increase the sample size in fMRI research, in response to longstanding criticism regarding low power and a concomitant lack of reliability or replicability ( Cremers et al., 2017; Desmond and Glover, 2002; Nee, 2019; Yeung, 2018 ).
On the other hand, a critical assumption regarding group analyses of fMRI data concerns localization. For a group analysis to be interpretable, it must be the case that, following spatial normalization, individuals' brains exhibit nearly identical patterns of spatial localization of activation patterns. This assumption is known to hold only very approximately, if at all. Spatial smoothing is commonly applied to manage minor discrepancies, while functional localizers are occasionally employed when larger discrepancies are expected (see Chapters 1-4 in Hanson and Bunzl, 2010). The extent of actual co-localization across participants is rarely examined, perhaps in part because reliable detection of FBNs is difficult in individual brains, due to low SNR, and perhaps because interpretable group results may seem reassuring with respect to the critical assumption. However, when directly examined, large task-related individual variability has been found (e.g., Miller et al., 2002, 2009).
In this context, a method that can reliably detect activation patterns in individual brains, and thereby help evaluate the co-localization assumption, attains primary importance. Our proposed method constitutes a clear improvement on this front, as the use of the enhanced design matrix was demonstrated to lead to the detection of reliable activation in more participants than the standard design matrix. Moreover, and perhaps more importantly, the proposed method produces higher spatial concordance among individual-participant activation maps than the traditional approach. It thereby seems to fulfill the critical co-localization assumption to a larger degree, or at least for a greater proportion of participants.
Because the IADL-recovered sources take into account not only variations in the hemodynamic response but also the spatial location of the active voxels of each FBN, we can be more confident that the sources correspond to actual and reliable task-relevant FBNs. In conjunction with the increased sensitivity, this means that our proposed methodology can provide the much-needed foundation for a systematic exploration of individual variability in localization across tasks, contexts, and populations ( Dubois and Adolphs, 2016 ).

Conclusions
In this paper, we proposed a new methodology for constructing the design matrix within the GLM framework. The proposed methodology successfully exploits the main advantages of the IADL algorithm, resulting in an enhanced design matrix which, combined with the conventional GLM framework, optimally explains the individual's fMRI data. In particular, the proposed methodology allows users to cope with the natural variations of the hemodynamic response among participants and to augment the design matrix with additional subject-dependent components, such as brain sources beyond the task-related ones and other interfering components. The performance evaluation over a realistic synthetic dataset showed that the proposed methodology exhibits (a) better overall performance than using IADL alone, and (b) a substantial reduction of false positives compared to standard SPM.
The re-analysis of data from a challenging fMRI experiment revealed that the enhanced design matrix (a) copes with the variation of the hemodynamic response among participants, (b) exhibits improved sensitivity in detecting the significant activity of interest, and (c) unveils other brain sources and artifacts present within the fMRI data. Furthermore, a detailed analysis of the different studied ROIs showed that the enhanced design matrix is more anatomically reliable compared to the standard approach. We hope that further scrutiny and wide adoption of the proposed methodology, facilitated by the availability of the TEDM toolbox for SPM, can contribute to improvements in the sensitivity and reliability of fMRI results across the research community.

Data and Code Availability Statement
Data availability: The data analyzed during the current study are not publicly available due to the privacy policies enforced during the performance of the fMRI experiment, but they are available from the corresponding author on reasonable request.
Code availability: All the code used in this study is publicly available. To this end, the authors developed and used a software toolbox extension for SPM, referred to as the Toolbox for Enhanced Design Matrix (TEDM), which is freely available in the TEDM GitHub repository: https://github.com/Dmocrito/TEDM.

Fig. 2. Mean Jaccard overlap among the ten different group realizations for the three studied synthetic sources of interest.

Fig. 3. Comparison of the enhanced task-related time courses with the conventional ones using the canonical HRF, for two randomly selected participants. Each inset graph depicts the time courses associated with the three performed tasks: words, pseudowords, and fixation.

Fig. 5. Visual representation of the significant active voxels, |z| > 2.3, of five common sources among all the participants from the free part of the enhanced design matrix. The first two columns from the left correspond to two brain-induced sources. The last three columns show three different interfering physiological sources. For simplicity, the figure only displays the results of five randomly selected participants out of the total 44 participants.

Fig. 6. Conjunction maps of the obtained significant clusters among all the studied participants. The figure displays the conjunction maps for the four studied regions of interest: (a) left primary visual cortex, (b) fusiform gyrus, (c) inferior/superior parietal gyrus, and (d) middle temporal gyrus. Each colored voxel represents the number of participants that exhibited significant activity within that voxel.

Fig. 7. Number of significant active voxels at p < 0.001 (uncorrected) for the different group sizes. The point-lines represent the mean value obtained among groups of the same size.

Fig. 8. Proportion of significant active voxels at p < 0.001 (uncorrected) for the different group sizes. The point-lines represent mean values obtained among groups of the same size.

Fig. 9. Visual representation of the T-maps from the second-level analysis for the three studied contrasts. All maps show voxels with significant activation at p < 0.05 (FWE corrected). Results using the standard design matrix are displayed on the left side, and results using the enhanced design matrix are on the right side. White circles mark the locations of the four ROIs: (a) left visual cortex, (b) fusiform gyrus, (c) inferior/superior parietal gyrus, and (d) middle temporal gyrus.

Fig. 10. Visual comparison of the detected significant active voxels from Fig. 9 with the spatial templates from Neurosynth for the terms words and pseudowords, for the three main studied contrasts.