Methodologies for task-fMRI based prognostic biomarkers in response to aphasia treatment

With the diversity in aphasia coupled with diminished gains at the chronic phase, it is imperative to deliver effective rehabilitation plans. Treatment outcomes have therefore been predicted using lesion-to-symptom mapping, but this method lacks holistic functional information about the language-network. This study, therefore, aims to develop whole-brain task-fMRI multivariate analysis to neurobiologically inspect lesion impacts on the language-network and predict behavioral outcomes in persons with aphasia (PWA) undergoing language therapy. In 14 chronic PWA, semantic fluency task-fMRI and behavioral measures were collected to develop prediction methodologies for post-treatment outcomes. Then, a recently developed imaging-based multivariate method to predict behavior (i.e., LESYMAP) was optimized to intake whole-brain task-fMRI data, and systematically tested for reliability with mass univariate methods. We also accounted for lesion size in both methods. Results showed that both mass univariate and multivariate methods identified unique biomarkers for semantic fluency improvements from baseline to 2-weeks post-treatment. Additionally, both methods demonstrated reliable spatial overlap in task-specific areas including the right middle frontal gyrus when identifying biomarkers of language discourse. Thus whole-brain task-fMRI multivariate analysis has the potential to identify functionally meaningful prognostic biomarkers even for relatively small sample sizes. In sum, our task-fMRI based multi-variate approach holistically estimates post-treatment response for both word and sentence production and may serve as a complementary tool to mass univariate analysis in developing brain-behavior relationships for improved personalization of aphasia rehabilitation regimens.


Introduction
Aphasia is an acquired language impairment commonly caused by stroke or other forms of brain trauma [12]. This disorder affects the ability to produce and comprehend language due to deficits in semantics, phonology, syntax, and imagery. Given the complexity in language, aphasia may manifest in a myriad of ways with unique language impairments influenced by lesion location and size [38]. Due to the multi-faceted nature of aphasia, treatment interventions are diverse and highly specialized involving biological, behavioral, or adjuvant methods [11] which may pose challenges when trying to identify the right treatment for each patient. Thus, there is a need to develop biomarkers that predict rehabilitation gains in persons with aphasia (PWA) to help clinicians optimize treatment plans.
Mass-univariate methods such as voxel-based lesion-symptom mapping [4] have been used to discover relationships between stroke-related behavioral deficits and lesion sites to identify estimates of language recovery. For example, mass-univariate analysis quantified a positive response to speech entrainment therapy in participants with Broca's aphasia [16]. A caveat, however, to the voxel-wise mass-univariate approach is that by analyzing each voxel independently, voxel-to-voxel relationships are not accounted for [53]. This statistical assumption is an oversimplification as both adjacent and distal regions in the brain are not independent from one-another. Instead, in aphasia and other neurocognitive impairments, behavioral deficits arise when components of complex systems are damaged. Considering the complexity of language neurobiology and the lack of sensitivity towards intervoxel relationships, mass-univariate methods may underestimate the contribution of network-level connections in the language system. There is therefore a need to advance the modeling of brain-behavior relationships.
A potential solution may be to utilize multivariate methods [39,50]. There have been many multivariate methods to predict brain-behavior relationships including support vector machine learning [32,43,55], ridge regression [42,46], and lesion to symptom mapping (LESYMAP) [39]. All these methods have been shown to outperform traditional VLSM in detecting network-level brain-behavior estimations. LESYMAP, in particular, does this by utilizing sparse canonical correlation analysis for neuroimaging (SCCAN) to account for relative voxel interactions. It does so by assigning each voxel a weight reflective of the extent to which each voxel contributes to the correlational analysis [39]. By computing brain-behavior correlations for all voxels simultaneously, the intervoxel relationships are captured [21].
Despite the advantages of utilizing multivariate analysis, the current implementations are limited to assessing brain-behavior relationships from voxels subsumed by a given ROI or lesion as defined by structural magnetic resonance imaging (MRI) data. In this study, we extend the utility of LESYMAP SCCAN analysis to whole-brain task functional MRI (fMRI) data to generate a predictive model that is (i) not constrained by the cohort level lesion coverage and (ii) depends on regional brain function during a language task. The specific aims of this study are as follows: (1) to optimize multivariate-based predictors for whole-brain task fMRI data, (2) to assess the model's reliability, and (3) to validate the model using another behavioral measure. With these aims, this study introduces a novel methodology for generating brain-behavior maps of aphasia treatment response.

Participants
A total of 17 chronic stroke participants with aphasia completed prescreening for eligibility. Inclusion criteria included a diagnosis of aphasia 6 months after a left hemisphere stroke, the ability to follow one-step commands, right-handedness determined by the Edinburgh Handedness Inventory [37], and English as the first language. Aphasia severity was assessed using the Western Aphasia Battery -Revised [25] with a required score below 93.8, the threshold for normal language functioning [24]. Exclusion factors included the presence of other neurological or psychiatric disorders, an inability to be scanned in an MRI, and alcohol/drug abuse within the past year. Potential participants were required to score between 4 and 45 items correct out of 60 items on the Boston Naming Test (BNT) [23] because scores within this range indicate the presence of anomia as well as some residual naming ability. Three participants did not pass the pretreatment screening; therefore, only 14 participants were enrolled in the study. All participants provided written informed consent according to the procedures outlined by the University of Florida Health Science Institutional Review Board. Participant demographics are outlined in supplementary table 1. All participants were six months post-stroke with left middle cerebral artery (MCA) lesions spread across cortical and subcortical regions.

Language treatment
Before treatment, 60 pictures and 40 categories out of more than 400 pictures and 120 categories were selected as probes to track treatment change. Probe sets for participants with severe or profound word-finding impairments included high, medium, and low frequency words. Probe sets for participants with moderate word-finding impairments contained only low frequency words. These probes included similar proportions of living and nonliving words. All participants underwent 30 h of language treatment that was administered over 3 weeks (2 h/day, 5 days/week). The treatment was delivered in 3 phases consisting of one phase per week. The first two phases of treatment consisted of picture naming sessions where the participants were instructed to name 50 black-andwhite line drawings of objects in each phase where 20 of the 50 drawings were probes selected before treatment. 25 % of the selected probes included drawings that were consistently correctly identified in order to prevent participant discouragement while the other 75 % of the probes included drawings that were consistently incorrectly identified. The last phase consisted of category exemplar generation (CEG) sessions where the participants were asked to orally cite examples (i.e., blue) from 40 categories (i.e., colors) in which 20 of the 40 categories were probes [5]. The use of picture naming and CEG sessions was motivated by restitutive treatments that are designed to rehabilitate language mechanisms by utilizing activities involving semantic and/or phonological representations of words [33,34]. Again, 25 % of the selected probes included consistently correct categories and 75 % of the probes included consistently incorrect categories. Verbal feedback and cueing in the form of repetition was provided by an on-site speech language pathologist for incorrectly named drawings and incorrectly generated exemplars per category. Further details regarding the treatment are fully described in [10] with additional information in [5] and Altmann et al., 2014.

fMRI acquisition
A Philips 3 T Achieva MRI scanner utilizing a body coil for radio frequency (RF) transmission and an 8-channel head coil for RF receiving was used to obtain the fMRI data. Foam pads were used to stabilize the participants' heads to minimize motion during scans. High resolution T1-weighted anatomical images were acquired using turbo field echo acquisition (TE=3.7 ms, TR=8.1 ms, FOV=240 ×240 mm 2 , FA=8 • , matrix size=240 ×240, BW=3906 Hz/px) for spatial normalization to the Montreal Neurological Institute (MNI) template space. Functional MRI images were acquired to capture language-task related brain activity, represented by changes in the blood-oxygen-level-dependent (BOLD) signal, during a semantic fluency task. Six runs of the task fMRI time course consisting of BOLD weighted echo-planar (EPI) sequences were acquired (slice thickness=4 mm, FOV=240 ×240 mm 2 , matrix size=64 ×64, TR=1700 ms, TE=30 ms, FA=70 • , 186 volumes per run, and 49 slices).

Task design
Participants underwent pretreatment task fMRI to collect brain activity during word retrieval. CEG, a semantic fluency task, was chosen because it more closely parallels word selection demands in conversation and was used during the third phase of treatment. During the scan, participants were visually and auditorily presented a category in which they were instructed to orally generate an exemplar. Each task trial length was 6.8 s and 10 trials within each of six runs were conducted for a total of 60 trials. During intertrial intervals (ITI) that alternated between 13.6, 15.3, and 17.0 s [5], participants were instructed to remain still and not speak while viewing a "+ " symbol on a screen.

Behavioral analysis
In order to analyze the effects of language treatment, CEG was conducted before as well as 2-weeks after treatment. Baseline CEG scores indicated the percent average correct from 40 categories that were administered daily for 8 consecutive sessions that indicated no significant upward trend. Post-treatment CEG scores indicated the percent average correct from 20 category probes. The percent change in the CEG scores from baseline to two weeks post-treatment were collected for all participants and used as the primary behavioral measure in the prediction model. The pre-treatment and post-treatment CEG raw scores for all fourteen participants have been included in supplementary table 2.

Structural and task-fMRI image preprocessing
The T1 weighted (T1w) structural images were denoised using an optimized nonlocal means (ONLM) filter (matlab, MRI denoising package) [52,8] and bias field corrected (afni, fast) [47]. A binary lesion mask was estimated utilizing an automated segmentation algorithm called Lesion Identification with Neighborhood Data Analysis (LINDA) [40] and manually touched up if necessary, using ITK-Snap [54]. The skull was stripped from the images by applying a binary brain mask (afni, 3dskullstrip) generated by an optimized brain extraction script [31] to the T1w image. To assure complete removal of meninges, the brain mask was manually edited (ITK-Snap) before application to the structural image. Lastly, chimera spatial normalization into MNI space was conducted (FSL, flirt) (afni, applywarp) to improve the transformation into the template space, even in the presence of large cortical lesions [28].
The functional EPI images for each participant were corrected for slice-timing (afni, 3dtshift) and bulk-head motion (afni, 3dvolreg). The BOLD data was then denoised for spatial and temporal artifacts arising from task correlated motion (TCM) [30]. Briefly, with TCM correction, the data is decomposed using independent component analysis (FSL, melodic), then classified via a semi-automated approach to identify speech-related motions, such as facial/temporal movements, that are then filtered out (FSL, fix) to reduce the impact of speech on the fMRI time course. The TCM denoised images were then co-registered to the T1w images using a boundary-based registration algorithm (FSL, epi_reg). The functional time course was then normalized to MNI template space using the chimera transformation computed on the T1w images. Finally, the fMRI images were spatially smoothed with a Gaussian kernel of 6 mm inside a participant-specific brain mask excluding the ventricles and lesion to minimize CSF contamination. Then the images were resampled to the original EPI voxel size of 4 × 3.75 × 3.75 mm 3 (afni, 3dresample). The BOLD time course was scaled to the pre-baseline resting condition, consisting of 7 TRs lasting 11.9 s, and then deconvolved to estimate the voxel-wise hemodynamic response function (afni, 3dDeconvolve). The area under the curve (AUC) for the estimated voxel-wise HRF for each participant was then computed (afni, 3dmaskave) and Z-transformed (ZAUC) for group-level analysis [29] to predict brain-behavior relationships using multivariate modeling.

Optimization of the multivariate approach to analyze whole-brain fMRI data
The multivariate analysis used in this study consisted of Sparse Canonical Correlations Analysis for Neuroimaging (SCCAN) through an R package called LESYMAP [39]. SCCAN assigns each voxel a "weight" that reflects the level of contribution each voxel has towards a canonical correlation analysis between the fMRI data and behavioral data. Because fMRI data has 1000 s to 100,000 s of voxels, the solution to canonical correlation may be undetermined. The implementation of canonical correlation for imaging in LESYMAP therefore introduces a sparseness factor to determine the extent of a cluster's relationship to behavior within a spectrum ranging from local/sparse representation to distributed representation. The sparseness factor is found through a series of cross-validation analyses [41] built into LESYMAP. SCCAN multivariate analysis therefore determines meaningful weights for a voxel where the level of significance is based on an optimal sparseness value.
In previous studies, aphasia severity and object naming deficits have been predicted by lesion size [14,17,50]. Therefore, we identified voxels that had significant relationships between lesion size and brain activity in a univariate approach, and then regressed out this relationship from those voxels. In doing so, we were able to covariate out the influence of lesion size on the brain data. The behavioral data on the other hand did not have a significant relationship with lesion size, thus we have only corrected for lesion size in the functional data.
Although LESYMAP is designed to input anatomical T1w images, this study utilized functional ZAUC maps to estimate brain-behavior relationships in a multivariate framework by optimizing two LESYMAP parameters: (1) the brain mask and (2) the k-fold cross-validation. The optimization for each parameter is described in detail in the following sections.

LESYMAP brain mask optimizations for fMRI data
During the sparse canonical correlation analysis, a brain mask is used to define the voxels of interest. When a mask is not specified, LESYMAP creates a default brain mask by thresholding the averaged lesion maps (in our case functional images) by a certain amount indicated by LESYMAP's minSubjectPerVoxel function. This method of creating a mask of interest for structural images (where only the lesioned area is of interest) works well for binarized images but does not work well for whole-brain functional images that contain both positive and negative values ranging below zero and above 1. Due to the whole-brain nature of fMRI data, a whole brain mask is needed to include all brain voxels. To optimize the brain mask, we overlapped binarized anatomical brain masks from each participant (afni, 3doverlap) and thresholded the overlapping voxels into low (N = 1) to high (N = 14) levels of voxel-wise participant representation. We then tested the multivariate brainbehavior relationships resulting from each of the masks for wholebrain coverage and canonical correlation statistical power, reported with r and p.

LESYMAP k-fold cross-validation optimization for fMRI data
To determine the optimal sparseness, we ran an internal k-fold crossvalidation (CV) in LESYMAP. By default, LESYMAP was set to a 4-fold CV where the dataset was split into four sets, one as the test set and the other three as the training sets, each containing a random but an evenly distributed number of subjects. The training sets identified the weights of the voxels based on a sparseness value randomly determined by SCCAN, while the test set was used to create a brain-behavior prediction model using those weights. This brain-behavior prediction model was then tested for accuracy at a certain p-value (default is 0.05). LESYMAP repeated the k-fold validation while alternating the allocation of the sets' roles until every combination of the test set and training set has been computed. From this series of computations, the most optimal sparseness value was determined from the brain-behavior model that best predicted the expected behavioral scores. The optimal sparseness value was then used in a final SCCAN computation using the full dataset, and a raw weights image representing voxels that contribute significantly to the canonical correlation was computed. The raw weights image was then normalized and thresholded internally in the LESYMAP algorithm to output a final statistical image. All region of interest (ROI) clusters within the statistical image were corrected for multiple comparisons using cluster-wise thresholding in afni (p < 0.05, cluster size>50).
A caveat to k-fold CV is that the output greatly varies depending on the number of folds. Typically, a smaller number of folds is used for datasets that contain more than 100 participants to decrease computation time. However, due to the small number of participants in this study, all possible k-fold combinations from 2-fold to 14-fold were tested to find the k-fold that offered the most consistent prediction model. To assess the stability of each k-fold choice, LESYMAP runs were conducted three times for each k-fold value for a total of (14− 1)x3 = 39 runs. Because each dataset is assigned randomly to either the training or test set, the three LESYMAP runs help to assess the consistency of our results.

Comparison of multivariate and mass-univariate brain-behavior results
To compare multivariate-based predictors to a more standard approach of mass-univariate brain-behavior relationships, massunivariate analysis was performed by applying simple linear regressions between each voxel's ZAUC and behavior. To ensure comparability to the multivariate results, the ZAUC data was corrected for the effect of lesion size, and the brain mask optimized for sparse canonical correlations was applied during mass-univariate computations. The slopes of the linear regression on all 14 participants were observed at a thresholded R 2 value of 0.29 (p < 0.05). Clusters were corrected for multiple comparisons using bonferroni correction and cluster-wise thresholding in afni (p < 0.05, cluster size>50). The brain activity from significant clusters were extracted (afni, 3dmaskave) to assess linear regressions between the averaged ZAUC (supplementary table 3) and behavioral data for each ROI.

Extending the optimized multivariate approach to another behavioral measure
An additional behavioral measure different from the in-scanner task was utilized to assess if therapy responses involving language functions other than word finding can be predicted by task fMRI. In other words, does stimulating language systems with word-finding tasks lead to improvement in other language functions. To assess this question, language discourse samples were used because the connected speech required for discourse more closely parallels the skills necessary for daily conversation than does simple word finding, since simple wordfinding exercises do not require stringing words together to express an idea. This behavior is therefore more ecologically valid and represents a more complex language function. During the discourse production task, participants were asked to describe three given pictures and answer three open-ended questions. The number of correct information units (CIU) which involves any phrases or words that were accurate and relevant to the administered stimuli [36] was measured at both baseline and 2 weeks after treatment for both pictures and questions. The pre-treatment and post-treatment CIU raw scores for all fourteen participants have been included in supplementary table 2. For greater details of the discourse analysis, see (Altmann et al., 2014). Both multivariate and mass univariate analysis was conducted using this behavioral measure with the same settings outlined in the previous sections.

Multivariate optimizations
By default, when a custom brain mask is not input into LESYMAP a brain mask is automatically computed and applied. The generated default brain mask was not inclusive of the entire brain, and instead included spurious patches of voxels with averaged ZAUC values only between 0 and 1 (Fig. 1A). This was not appropriate for our study which involves a whole-brain representation. Therefore, a customized brain mask was created by overlapping the anatomical brain mask from all participants, excluding the participant-specific lesion. Given that the bulk of the middle cerebral artery territory for the left hemisphere falls within the lesioned region for at least one participant (Fig. 1B), many empty voxels were present within the left hemisphere when a brain mask with high (100 %) representation (i.e., at least 14 subjects per voxel) was created (Fig. 1C). Further, exclusion of voxels around the right insula and right opercular cortex suggests that atrophy also affected this mask. Alternatively, a brain mask with lower representation (i.e., at least 2 subjects per voxel) yielded insignificant results due to low statistical power. Thus, a brain mask with middle representation (i. e., at least 4 subjects per voxel) was the most optimal for our analysis (Fig. 1D). Compared to the default brain mask offered by LESYMAP, this customized mask better represents the whole brain while also mitigating problems with low statistical power in the computational analysis.
Multivariate analysis with the default inputs involving the wholebrain task fMRI data, changes in CEG from baseline to post-treatment, k-fold setting of 4, a brain template, and the default brain mask identified a different network of functional task-activated voxels compared to the output from the optimized brain mask ( Fig. 2A). In other words, the areas of activation that are correlated to post-treatment response varied between the default mask and the 4-participant whole-brain mask. When the default brain mask is utilized, regions within the left posterior cingulate and the primary sensory cortex are found to significantly predict post-treatment outcomes ( Fig. 2A). This contrasts with the results of the optimized multivariate analysis which involves a 4-participant overlap whole brain mask and a k-fold setting of 14 (see supplementary section 4). With these optimizations, clusters of activation were found in the right middle frontal gyrus (R-MFG) and right cingulate gyrus (Fig. 2B).

Comparison of multivariate and mass-univariate analysis using fMRI data
We compared the output maps from multivariate-based predictors to the more commonly used analyses based on mass-univariate fMRI predictors. When analyzing the relationship between baseline task fMRI and ΔCEG scores, the multivariate analysis yielded unique clusters of activation that differed from mass univariate predictors (Fig. 3 A). Specifically, the baseline brain activity within the R-MFG (p-value = 0.0084) and right cingulate gyrus (p-value = 0.0013) were positively correlated with ΔCEG (Fig. 3B). This region was not found to be significant in the analysis relying on mass-univariate predictors at R 2 = 0.29. Instead, mass univariate analysis identified baseline brain with voxels representing at least four subjects is more inclusive of the whole brain without lacking statistical power. activity within the right angular gyrus to be negatively correlated with ΔCEG (p-value = 0.000020) (Fig. 3B).

The expanded utility of multivariate analysis
The sensitivity and expanded utility of multivariate analysis was assessed by using another behavioral measure within the model. Changes in discourse CIU was assessed, and results revealed that some mass univariate predictors converged spatially with the multivariate predictors (Fig. 4 A). Specifically, baseline brain activity within the right middle frontal gyrus was found to negatively correlate with ΔCIU (pvalue = 0.00048) in both analyses. Multivariate analysis, however, uniquely identified the right cerebellum (p-value = 0.010) and right anterior cingulate gyrus (p-value = 0.010) as negative biomarkers of ΔCIU (Fig. 4B). These regions were not found to be significant in the mass-univariate analysis at R 2 = 0.29. Instead, baseline brain activity within the right cingulate gyrus (p-value = 0.00011) and left lingual gyrus/middle occipital gyrus (p-value = 0.00060) were found to be negatively correlated with ΔCIU while baseline brain activity within the left middle frontal gyrus (L-MFG) (p-value = 0.00011) was found to be positively correlated with ΔCIU within the mass univariate analysis.

Discussion
This study applied a novel approach to identify significant multivariate brain-behavior relationships by using whole-brain task-fMRI to estimate aphasia treatment outcomes. We also compared our novel multivariate approach to the conventional mass univariate method of using images to estimate treatment outcomes, with the caveat that mass univariate analyses are limited by lack of voxel-to-voxel relationships The default LESYMAP parameters consisted of a k-fold of 4 and an automatically generated default brain mask (x = − 45, y = − 8, z = 5). When inserting the task fMRI brain data and behavioral data, a multivariate SCCAN statistical map was obtained (x = − 13, y = − 43, z = 52). (B) LESYMAP optimizations included a k-fold of 14 and a customized brain mask (x = − 45, y = − 8, z = 5) exclusively consisting of voxels that a representative of at least four subjects. These changes resulted in a multivariate SCCAN statistical map (x = 35, y = 38, z = 38) with different spatial clusters compared to the default setting. Significant clusters are color coordinated based on correlational association (positive=red and negative=blue).  and high false-positive rates [6]. Our approach of using task fMRI in a multivariate framework mitigated such challenges by capturing context-driven voxel interactions at the whole-brain level when assessing brain-behavior relationships.

Multivariate brain mask and k-fold optimizations
Given that current lesion-symptom mapping approaches have been optimized for structural lesion-specific analyses [39,45], the current study modified the LESYMAP R package to feasibly analyze whole-brain task fMRI data. In doing so, brain mask modifications were made to account for the whole brain while maintaining statistical power. Figs. 1 and 2 highlighted how a brain mask with voxels representative of at least four subjects provided the best compromise between preserving whole-brain coverage and statistical power for our sample size. With the newly optimized brain mask, the multivariate predictors greatly differed from those found using the default mask. This is most likely due to the diverging amount and location of available voxels used within both analyses as the mask dictates which voxels are used within the model. Although the optimal level of representation for each voxel may vary depending on the sample size, maintaining the balance between whole-brain coverage and statistical power is crucial for language brain-behavior analyses as network-level information across the entirety of the brain may provide meaningful measures of language-related neural correlates. Therefore, the choice of mask must be selected carefully as its coverage greatly impacts the output brain-behavior relationships. Future analyses using this optimized multivariate model with more subjects may allow greater brain coverage and statistical reliability in discovering meaningful biomarkers of treatment rehabilitation.
Further k-fold CV modifications revealed that utilizing the leave-oneout k-fold method yielded the most robust data across both CEG and discourse CIU analyses. This could be due to this study's small sample size (n = 14). Historically, many studies of aphasia treatments have used small groups. However, our findings suggest that the multivariate predictors require a minimum number of participants to reach a stable solution. Generally, the leave-one-out method is utilized for smaller datasets as it offers the best estimates by testing every possible combination of training and testing data [44]. Using the leave-one-out method may, however, pose time-constraints for larger datasets [44]. Our CV analyses suggests that for small datasets the results begin to stabilize around 13 or 14 participants; however, further research is needed to understand minimal sample sizes. Altogether when considering clinical translatability, the proposed optimizations has provided the steppingstones to more accurately reveal prognostic neural biomarkers of treatment induced language-specific behavioral changes using multivariate analysis in PWA.

Multivariate and mass-univariate comparisons
Once the multivariate method was fully optimized, the reliability of this technique was assessed by analyzing the level of convergence with other prognostic tools such as mass univariate analysis. Results showed that both methods displayed spatial overlap when estimating changes in CIU; however, the analysis relying on multivariate predictors identified unique regions of brain-behavior relationships for both behavioral measures. Specifically, the mass univariate imaging analyses yielded anemic results with small datasets, where, for example, the R-MFG and right cingulate gyrus ROIs for CEG and the right cerebellum and right anterior cingulate gyrus ROIs for CIU were not found after correcting for multiple comparisons. Multivariate analysis, on the other hand, was sensitized to identify these significant clusters. This could be explained by the multifaceted nature of SCCAN as voxels are tested in group comparisons as opposed to independent voxel-by-voxel analysis. This allows relationships between voxels to be computed concurrently therefore allowing not only voxel-wise information but also intervoxel relationships (i.e., spatial configuration) to be accounted for in the model. Our findings therefore indicate greater sensitivity for multivariate estimates using whole-brain task fMRI data which may hold greater promise for resolving changes in functional anatomy necessary to support rehabilitation-related changes in behavior.
Considering the behavioral measure CEG, a variety of neural correlates in PWA have been known to be involved in semantic fluency such as the bilateral middle frontal gyrus, superior frontal gyrus, pars triangularis, pars orbitalis, pars opercularis, cingulate gyrus, supramarginal gyrus, and angular gyrus [1,9,2,22,27,18,51]. Considering these previous findings, the multivariate prognostic biomarkers in this study align with the current consensus as baseline activation within the R-MFG and right cingulate gyrus was found to estimate gains in CEG behavioral scores. As treatment-induced behavioral improvements are associated with increased activation of semantic fluency neural correlates [13,51], PWA who have undergone intensive language therapy may display contralesional activation of premorbid semantic networks [19,26,49]. Given that the treatment used in this study involved a combination of picture naming and CEG trials, the pretreatment activation within these areas could therefore indicate the availability of contralesional neural recruitment during word-finding tasks.

Multivariate expanded utility
The expanded utility of the multivariate method in estimating network-level language rehabilitation was conducted using correlational analyses between baseline whole-brain task fMRI and discourse CIU. Interestingly, both multivariate and mass univariate analyses found significant clusters of activation within the R-MFG, however, linear regressions between the ZAUC and ΔCIU were only significant in the multivariate discovered voxels.
These results pose interesting findings regarding the ability of wholebrain semantic fluency task fMRI data to predict more spatially distributed network-level functions of language discourse. CIU in particular is not isolated to word-level functions, and instead involves both word-level and sentence-level language fluency as "correct information units" refers to all phrases or words that provide accurate and relevant information pertaining to a stimulus of interest. Considering the underlying neural correlates of discourse CIU-related functions, sentence-level language processing is known to manifest more broadly throughout the fronto-temporal language network as opposed to being localized [15]. Given that sentence processing and semantics are closely tied [4,20,48], multivariate SCCAN analysis may have the potential to estimate spatially distributed network-level language functions from more localized word-level brain activity during a semantic fluency task fMRI.

Limitations and future works
Despite the promising utilities of whole-brain task fMRI multivariate analysis, there are some limitations to this study. Perhaps most notably, this study utilized a small number of participants resulting in a lack of statistical power for mass-univariate analyses. Due to the screening procedures and likelihood for subject attrition in clinical studies, only 14 subjects were utilized. This means that the presence of outliers could have a stronger effect on the brain-behavior relationship. Additionally, this sample was heterogenous in terms of anomia and aphasia type which may have affected both mass-univariate and multivariate analyses. Although heterogeneity is closer to the real-word reality, controlling for this will allow us to develop more robust and individualized prediction models. Future studies utilizing a large set of participants with homogenous aphasia types will aide in strengthening the validity of the discovered biomarkers. Another limitation of this study is that the long-term maintenance of treatment gains was not assessed. Only behavioral data collected within 2-weeks post-treatment was used in the model; therefore, the prognostic biomarkers discovered in this study may not be representative of long-term preservation of gains. Due to this limitation, a follow-up study utilizing behavioral measures obtained later (i.e., 3-months post-treatment) may be necessary to expand the current findings. Additionally, the task-based fMRI approach to study brain-behavior relationships has primarily involved cortical and subcortical analysis of gray matter. Stroke, however, also affects white matter tracts which have been shown to be involved in language by connecting nodes within the language network such as the Broca's area and Wernicke's area [7]. Adjacent approaches in understanding brain-behavior relationships involving diffusion tensor imaging (DTI) may therefore be necessary to fully grasp whole-brain analysis. Lastly, this study has focused on utilizing task fMRI data in order to predict treatment responses, however, resting state fMRI (rsfMRI) data may also provide meaningful information needed to create accurate prediction models. Studies have shown that resting network activity can be used to assess changes in connectivity [35] as well as aphasia severity profiles that may give insights into the extent of language recovery [3]. rsfMRI may therefore be a beneficial contributor to the rehabilitation prediction model by providing complementary hemodynamic changes within the resting network [56]. Thus, future studies utilizing both rsfMRI and task-fMRI data should be conducted. Considering these limitations, future works testing multivariate analysis on datasets involving more subjects, utilizing longitudinal behavioral measures, and using DTI as well as rsfMRI data in parallel with task fMRI data is recommended. Additionally, to test the clinical translatability and generalization of our method to earlier time points, our lab is currently working on applying the multivariate and mass univariate methodologies on subacute post-stroke aphasia participants in a 6-weeks longitudinal study. These future works could potentially reveal meaningful information regarding language treatment-induced neuroplasticity that may open new doors for personalized treatment planning and optimal rehabilitation in clinical settings for PWA.

Conclusion
In summary, we have reported a pilot study assessing the feasibility of utilizing whole-brain task-fMRI to compute multivariate brainbehavior estimation models. By assessing the reliability and expanded utility of optimized multivariate predictors, this study has methodically provided the preliminary groundwork for the improved efficacy of future prognostic analyses. For example, the discovery of unique multivariate-specific biomarkers, the R-MFG and right cingulate gyrus for treatment-induced changes in semantic fluency and the right cerebellum and right anterior cingulate gyrus for changes in discourse CIU, has highlighted the attuned sensitivity of this approach in discovering meaningful brain-behavior relationships. Given that this novel multivariate optimization discovered significant ROIs despite a small sample size, future works utilizing a larger sample size may provide more statistical power to confirm this study's preliminary findings. Additionally, brain-behavior relationships for CIU within the R-MFG was found to be significant only in the voxels that overlapped between both multivariate and mass univariate analyses instead of mass univariate alone. Thus, this study highlights how complex network-level brain-behavior relationships deduced in whole-brain task-based fMRI multivariate analysis may act as a potential complementary tool to mass univariate predictors. Both methodologies, as applied to whole-brain task-fMRI, offer versatile tools that can be applied to any cognitive task, treatment, and poststroke phase (i.e., acute, subacute, or chronic) for estimation of aphasia treatment responses. Ultimately, these methods may aid patient triaging and the personalization of treatment planning by offering a more precise estimation of rehabilitation-induced language improvements.

Ethics statement
This study was carried out in accordance with the Declaration of Helsinki following the recommendations of the Emory University and Atlanta Veterans Affairs Medical Center. All subjects gave written consent per the outlined by the University of Florida Health Science Institutional Review Board.

Declaration of Competing Interest
There is no conflict of interest among the authors and the author list is final.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.