Automated individual-level parcellation of Broca's region based on functional connectivity

Broca's region can be subdivided into its constituent areas 44 and 45 based on established differences in connectivity to superior temporal and inferior parietal regions. The current study builds on our previous work manually parcellating Broca's area on the individual-level by applying these anatomical criteria to functional connectivity data. Here we present an automated observer-independent and anatomy-informed parcellation pipeline with comparable precision to the manual labels at the individual-level. The method first extracts individualized connectivity templates of areas 44 and 45 by assigning to each surface vertex within the ventrolateral frontal cortex the partial correlation value of its functional connectivity to group-level templates of areas 44 and 45, accounting for other template connectivity patterns. To account for cross-subject variability in connectivity, the partial correlation procedure is then repeated using individual-level network templates, including individual-level connectivity from areas 44 and 45. Each node is finally labeled as area 44, 45, or neither, using a winner-take-all approach. The method also incorporates prior knowledge of anatomical location by weighting the results using spatial probability maps. The resulting area labels show a high degree of spatial overlap with the gold-standard manual labels, and group-average area maps are consistent with cytoarchitectonic probability maps of areas 44 and 45. To facilitate reproducibility and to demonstrate that the method can be applied to resting-state fMRI datasets with varying acquisition and preprocessing parameters, the labeling procedure is applied to two open-source datasets from the Human Connectome Project and the Nathan Kline Institute Rockland Sample. While the current study focuses on Broca's region, the method is adaptable to parcellate other cortical regions with distinct connectivity profiles.


Introduction
Broca's region is one of the most widely studied brain areas because of its historical significance and critical role in language processing (Sahin et al., 2009). Located on the inferior frontal gyrus in the language-dominant hemisphere, it can be subdivided into its constituent areas 44 and 45 based on cytoarchitectonic boundaries that roughly correspond to the macroanatomical landmarks of the pars opercularis and pars triangularis respectively (Amunts et al., 1999;Petrides and Pandya, 2002). Based on this knowledge, it is common in neuropsychological research to make use of morphologically defined regions-ofinterest (ROIs) as proxies for areas 44 and 45. However, there is a high degree of inter-individual variability in cortical morphology in the ventrolateral frontal region (Tomaiuolo et al., 1999;Keller et al., 2007), making it difficult to distinguish accurately areas 44 and 45 based on macroanatomical features alone. In addition to anatomical variability, recent studies using resting-state fMRI have shown that higher-level cortical association areas, including Broca's region, show particularly strong inter-individual variability in functional connectivity (Yeo et al., 2011;Mueller et al., 2013;Wig et al., 2014). Methods for the analysis of resting-state functional connectivity, such as dual regression, have addressed this variability by estimating individual-level versions of group-level network templates, thereby facilitating the study of group differences in specific functional networks (Filippini et al., 2009). Tailoring cortical parcellation approaches to the individual brain is critical in order to capture the unique functional and morphological characteristics of each subject with precision, and this can be done by applying similar principles as those used for the dual regression http://dx.doi.org/10.1016/j.neuroimage.2016.09.069 Accepted 29 September 2016 approach (see for example Wang et al. (2015)). Functional atlases based on individual-level parcellation provide a foundation for exploring the relationship between structural and functional boundaries in the brain and their correspondence across individuals.
Existing automatic connectivity-based parcellation techniques aim to provide whole-brain functional atlases by parcellating the entire cerebral cortex using the same basic criteria, namely similarity or homogeneity of connectivity patterns, and therefore do not take advantage of the considerable amount of prior knowledge that exists about the anatomy and unique connectivity patterns of highly studied regions such as areas 44 and 45. Additionally, since no gold-standard connectivity-based cortical atlas currently exists, difficulty can arise when interpreting the correspondence of whole-brain parcellations to existing ontologies. For this reason, anatomical criteria are commonly used to define cortical regions in studies using functional data. The anatomy-informed manual labeling approach for areas 44 and 45, described in our previous work (Jakobsen et al., 2016), attempts to address this issue by incorporating prior knowledge of the anatomical locations, distinct connectivity profiles, and variability of the ROIs into each individual parcellation. It has previously been shown that areas 44 and 45 are clearly distinguishable by differences in connectivity to the inferior parietal and lateral temporal regions using resting-state functional connectivity (Kelly et al., 2010;Margulies and Petrides, 2013). Based on these connectivity differences, Brodmann areas 44 and 45 were manually labeled in 101 individual datasets using functional connectivity glyphs (Boettger et al., 2014;Jakobsen et al., 2016). By simultaneously displaying the seed-based connectivity patterns of multiple neighboring vertices of the cortical surface reconstruction, this technique allows for meticulous manual segmentation based on differences in the connectivity patterns of adjacent cortical regions. Resulting group-level probability maps based on the individual manual parcellations of areas 44 and 45 are consistent with cytoarchitectonic probability maps of the same regions while still reflecting the high degree of individual morphological variability, thereby validating the usefulness of intrinsic functional connectivity as the basis for accurate in vivo cortical parcellation. The manually defined area labels therefore serve as gold-standard functional labels that can be used to validate the results of an automated data-driven parcellation approach.
The current study presents a new method for automated, observerindependent, and anatomy-informed parcellation of specific cortical regions based on functional connectivity. This method aims to simulate the gold standard manual parcellation procedure and takes advantage of prior knowledge of well-studied brain regions by basing parcellations on a combination of area-specific functional connectivity features and meaningful anatomical criteria. Using the previously generated manual labels of areas 44 and 45 as gold-standard, the goal of this approach is to generate observer-independent area labels with comparable precision to manual labeling at the individual-level. The approach consists of three main steps: (1) defining a broad target ROI that includes nodes surrounding the areas of interest, (2) assessing the similarity of each node's connectivity to a set of group-level templates corresponding to the areas of interest and other canonical networks, (3) extracting individual-level templates used to assign vertices to their most similar network and (4) weighting the similarity values by spatial location and ensuring spatial continuity of the final labels. In keeping with the manual parcellation procedure, the automated parcellation pipeline 1 incorporates macroanatomical information through ROI definition and spatial weighting, but allows the resulting functional boundaries to deviate from anatomical constraints in order to better fit the individual anatomy and connectivity.
The area 44 and 45 labels derived from the automated observerindependent parcellation method display a high degree of overlap with the results of the manual labeling procedure, validating the precision of the method at the individual-level. Parcellation results are also presented for an independent resting-state fMRI dataset, demonstrating that the method is able to generalize to datasets with varying acquisition and preprocessing parameters.

Data and preprocessing
The data used in the development of the parcellation pipeline were provided by the Human Connectome Project (HCP) and comprised 101 previously manually labeled (based on connectivity and anatomical priors) resting-state fMRI datasets and corresponding T1-weighted structural data for each individual (mean age 29, 13 left-handed, 59 females). All data were preprocessed according to standard HCP preprocessing pipelines including ICA-based artefact removal (Salimi-Khorshidi et al., 2014) and registration to HCP 2 mm standard surface space (fs_LR 32k node surfaces). Further details about the standard HCP data acquisition and preprocessing methods can be found in Smith et al. (2013). Since language processing is strongly leftlateralized (Rasmussen and Milner, 1975), analyses included only the left cerebral hemisphere. Several additional study-specific preprocessing steps were performed as follows: (1) The functional time-series data of the left cerebral cortex was extracted for each of four 15-min rfMRI scans (TR=0.7 s) per subject, (2) surface-based smoothing with a 2 mm FWHM kernel was applied, (3) a correlation matrix was computed and Fisher's r-to-z transformed (Fisher, 1915;1921), (4) the resulting 32k×32k matrices were averaged across the four rfMRI runs for each participant, (5) the average matrices were z-to-r transformed. Note that these datasets are identical to those used in previous work describing the manual labeling procedure (Jakobsen et al., 2016).
In order to show that the parcellation pipeline can be successfully applied to independent datasets, 100 additional resting-state and corresponding T1-weighted structural datasets from the Enhanced Nathan Kline Institute-Rockland Sample (NKI) (Nooner et al., 2012) (mean age 43, 10 left-handed, 65 females) were also used. These data comprised one 10-min multiband rfMRI scan (TR=645 ms) per subject, for which the following preprocessing steps were performed: (1) removal of the first 5 volumes, (2) head motion correction, (3) coregistration of functional data to anatomy, (4) denoising, and (5) band-pass filtering. The full preprocessing pipeline for this dataset can be found at 2 . Several additional study-specific preprocessing steps were then performed as follows: (6) The timeseries data of the left cerebral cortex were extracted and projected to fsaverage5 (10k nodes per hemisphere) surface space, (7) surface-based smoothing with a 6 mm FWHM kernel was applied. The choice of a more traditional 6 mm smoothing kernel was due to the larger voxel size and shorter resting-state acquisition time of the NKI as compared to the HCP data. (8) Correlation matrices were then computed using Pearson's correlation, and (9) the individual native FreeSurfer surfaces were downsampled to fsaverage5 template for visualization purposes.

Generation of connectivity templates and ICA maps
Prior to parcellation of the individual datasets, group-level connectivity maps of areas 44 and 45 ( Fig. 1) were computed based on the 101 manually labeled HCP datasets. First, the average functional connectivity was averaged across all vertices included in each individual manual label, and the resulting connectivity vectors were then averaged across the 101 individuals. Note that the relatively low maximum correlation values are likely due to noise inherent in resting-state fMRI data (Biswal et al., 1995) as well as variance in the average correlation values across individuals in the HCP dataset (Jakobsen et al., 2016). These connectivity maps were subsequently used as group-level seed-based functional connectivity templates for areas 44 and 45 in the individual-level parcellation pipeline (Fig. 1).
In order to estimate the possible connectivity patterns of vertices not belonging to areas 44 or 45, group-and individual-level independent component analyses (ICA) were run on the timeseries of the 101 HCP datasets, resulting in 20 group-and individual-level independent component maps (Figs. 2 and 3 respectively).
Spatial correlations were then computed between each of the group-level IC maps and the previously generated group-level connectivity templates of areas 44 and 45 (Fig. 1). The two IC maps with the highest spatial correlations (r > 0.4) were then removed so that only IC maps not representative of areas 44 and 45 were included. More specifically, this was done to ensure that the connectivity patterns of interest were not included in the matrix of spatial regressors used in subsequent partial correlation steps. The 18 remaining IC maps were then used in the parcellation pipeline as group-level templates representing the connectivity patterns of vertices located outside of areas 44 and 45. A similar procedure applied to the individual-level IC maps is included in the parcellation pipeline, described below.

Parcellation pipeline
The individual datasets were then processed using an automated parcellation pipeline to produce binary labels of areas 44 and 45 tailored to account for intersubject variability in functional connectivity patterns by first comparing the connectivity of single vertices to grouplevel templates and then refining the results using individual-level connectivity patterns. The pipeline comprised 7 steps, as summarized in Fig. 4: 1. To reduce the size of the input matrix and thereby increase the processing speed of the parcellation pipeline, the whole-brain connectivity matrix of each individual subject is masked along one dimension to an ROI covering the ventrolateral prefrontal cortex, resulting in an asymmetrical matrix representing the connectivity of each vertex within the ROI to the whole cortical surface. As a result, only vertices within the ROI will be labeled. To allow for a higher degree of morphological variability than would be provided by anatomical labels based on sulcal information alone, the ROI is created by combining the FreeSurfer labels corresponding to the pars opercularis and pars triangularis with binarized (probability > 0) cross-subject probability maps of areas 44 and 45 based on the 101 manually labeled datasets. 2. Vertex-wise partial correlations are then run for the two group-level seed-based connectivity templates for areas 44 and 45. In this step, the connectivity pattern of each vertex is correlated with each of the template connectivity maps, regressing out the effects of the eighteen remaining group-level IC maps and the adjacent area 44 or 45. This allows for quantification of the spatial similarity of each vertex with the connectivity patterns of areas 44 and 45 when you remove the variance attributable to the 18 other networks. This results in two partial correlation maps representing the similarity of the connectivity patterns of each vertex within the ROI to the group-level seedbased connectivity templates for areas 44 and 45. 3. The vertices with the maximum partial correlation to the group-level connectivity templates for areas 44 and 45 are then identified as the vertices with the connectivity patterns most representative of areas 44 and 45. The seed-based connectivity vectors corresponding to these representative vertices are then extracted from the connectivity matrix and subsequently used as individual-level seed-based connectivity templates for areas 44 and 45. 4. Spatial correlations are then computed between each of the individual-level IC maps and the individual-level seed-based connectivity templates for areas 44 and 45 generated in step 3. The IC maps with the highest spatial correlations (r > 0.4) are then removed (Fig. 3).
To account for individual variability in connectivity patterns, vertexwise partial correlations are then run for the two individual-level seed-based connectivity templates for areas 44 and 45 and all remaining individual-level IC component maps. This step results in a set of partial correlation maps representing the similarity of the connectivity patterns of each vertex within the ROI to each of the individual-level connectivity template maps (consisting of areas 44 and 45, and approximately 18 IC maps for a typical subject). 5. A spatial weighting is then applied in order to decrease the probability of vertices falling outside a certain anatomical region being included in the final labels for areas 44 and 45. This step consists of multiplying the new individual-level partial correlation maps for areas 44 and 45 by the common logarithm of the cross- E. Jakobsen et al. NeuroImage 170 (2018) 41-53 subject spatial probability maps of areas 44 and 45 based on the 101 manually labeled datasets. The common logarithm is applied in order to decrease the slope of the probability maps and thereby apply a less restrictive spatial constraint. 6. A winner-take-all partition is then run across all partial correlation maps, resulting in each vertex within the ROI being assigned to one of approximately 20 possible connectivity-based classes. 7. To ensure spatial continuity of the final labels, the largest clusters assigned to the classes corresponding to areas 44 and 45 are then extracted, forming the final binary labels of areas 44 and 45.

Comparison of manual and automated parcellation results
To compare results from the manual and automated labeling procedures, the 101 previously manually labeled HCP datasets were run through the automated labeling pipeline. Spatial overlap of the area 44 and 45 labels from the manual and automated parcellation procedures were then compared using the Dice coefficient (DC) (Fig. 5). The Dice coefficient (Dice, 1945) ranges between 0 and 1 and is defined as: To evaluate the effect of the spatial weighting applied in step 5 of the parcellation pipeline (Fig. 4), the entire pipeline was repeated skipping this step.

Group-level comparisons
Group-level probability maps of areas 44 and 45 based on the results of the manual and automated parcellation procedures were then computed (Fig. 6). These maps were then compared to cytoarchitectonic probability maps from the Juelich Brain Model (Amunts et al., 1999) by binarizing at various probability thresholds and calculating the degree of spatial overlap between them using the Dice coefficient (Fig. 7).
To obtain a label-specific measure of sulcal morphometry, we used the FreeSurfer sulc measure. This measure represents the displacement of a particular vertex on the cortical surface from a hypothetical surface in the midpoint between gyri and sulci, with a mean displacement of zero. The mean sulc value across the individual area 44 and 45 labels produced by manual and automated labeling was computed, and their distributions across subjects were compared. (1) The whole-brain connectivity matrix of each individual subject is masked to a predefined region-of-interest (ROI) covering the ventrolateral prefrontal cortex. The following steps are then conducted on the whole-brain connectivity map of each vertex within the ROI.
(2) Vertex-wise partial correlations are run for areas 44 and 45 using group-level connectivity templates. (3) The vertices corresponding to the maximum partial correlation for areas 44 and 45 are identified, and used as seeds to extract individual-level connectivity templates. (4) Vertex-wise partial correlations are run using the individual-level seed-based connectivity templates of areas 44 and 45 and individual-level IC components. (5) A spatial weighting is applied to the partial correlation results for areas 44 and 45 using previously generated group-level spatial probability maps from manual labeling. (6) A winner-take-all partition is run across all resulting partial correlation maps.
The largest clusters corresponding to areas 44 and 45 are extracted.  Group-average masks from the automated parcellation of the same 100 functional connectivity datasets as used for manual parcellation. Abbreviations: aalf anterior ascending ramus of the lateral fissure; cs, central sulcus; half, horizontal anterior ramus of the lateral fissure; iprs, inferior precentral sulcus; Op, pars opercularis; Tr, pars triangularis.
2.6. Evaluating the effect of assigning vertices to the 'neither' category To evaluate the effect of the assignment of vertices to the confound networks (represented by the IC maps), the labeling pipeline was rerun on the 100 HCP subjects without assigning any vertices to the 'neither area' category. In this case, all vertices within the ROI are assigned as areas 44 or 45, and the two largest spatially continuous resulting clusters are selected as the final labels.

Comparing results to k-means++ clustering
To evaluate the spatial overlap of the results of the current parcellation pipeline and labels derived from existing data-driven clustering methods, we ran k-means++ clustering with a 2-cluster solution (k=2) on the 100 HCP datasets. k-means ++ is a clustering algorithm that has been shown to outperform standard k-means due to an optimized seeding technique (Arthur and Vassilvitskii, 2007). Further comparisons of the manual labeling procedure and various other clustering algorithms, including k-means++ can be found in Jakobsen et al. (2016).

Testing on an independent dataset
To confirm that the labeling pipeline can be applied to independent datasets with varying data acquisition and preprocessing parameters, the automated labeling procedure was applied to the 100 NKI datasets, eight of which were manually labeled post hoc. Prior to running the parcellation pipeline, the group-level connectivity and probability maps of areas 44 and 45 created using the manually labeled HCP datasets were downsampled from fs_LR_32k to fsaverage5 surface space. 20component group-and individual-level independent component analyses were then run on the time series of the 100 NKI datasets.
Since the group-level connectivity templates of areas 44 and 45 were derived from the HCP and not the NKI dataset, which has a comparably lower signal-to-noise ratio due to much shorter scan lengths, the spatial correlations between the IC maps and the grouplevel templates were decreased compared to the HCP data. For this reasons, the threshold for removal of IC components was decreased to a correlation value of greater than 0.1. This resulted in more components being identified as related to BA 44 or 45 prior to running the partial correlation steps and subsequently regressing out fewer components, allowing for a higher degree of variability in the connectivity patterns within labels. All other aspects of the parcellation pipeline remained the same as for the HCP data. Group-level probability maps were then computed across the 100 NKI datasets for comparison (Fig. 11). Eight of the 100 subjects were then manually labeled post hoc, blind to the results of the automated labeling, and spatial overlap of the automated and manual area 44 and 45 labels was compared using the Dice coefficient (Fig. 12).

Exclusion of one subject from automated parcellation
Of the 101 manually labeled HCP datasets, one subject was excluded from automated parcellation due to missing files in more recent HCP data releases. Removal of this subject has no effect on the results of the remaining subjects. The results of the automated parcellation are therefore presented for 100 of the 101 HCP subjects.

Comparison of manual and automated parcellation results
The results from the automated parcellation pipeline displayed a high degree of spatial overlap with the gold standard maps produced by manual labeling (Fig. 5). The average Dice coefficient across the 100 subjects was 0.71 for area 45 and 0.63 for area 44. To demonstrate the improvement of the parcellation results when anatomical information  is incorporated via spatial weighting of the partial correlation maps, results are additionally presented without the application of spatial weighting. For these results, the average Dice coefficient across the 100 subjects was 0.62 for area 45 and 0.39 for area 44 (note the markedly lower spatial overlap of the area 44 labels without spatial weighting).

Group-level comparisons
The group-level probability maps of the labeled areas 44 and 45 demonstrate high consistency between the manual and automated labeling techniques. The region of highest overlap between subjects for area 45 is on the posterior half of the pars triangularis, directly anterior to the anterior ascending ramus of the lateral fissure. For area 44, the region of highest overlap lies on the anterior bank and fundus of the inferior precentral sulcus, adjacent to and including the pars opercularis. The functional connectivity-based group-level probability maps also show consistency with cytoarchitectonic probability maps derived from post-mortem histology (Amunts et al., 1999;Fig. 6).
The degree of spatial overlap between the functional (manual and automated) and cytoarchitectonic probability maps shows a relatively steady decrease with increasing probability values, with only the highest probability values showing no overlap. Spatial overlap was slightly higher for area 45 than area 44 for both manual and automated labeling (Fig. 7). Fig. 8 shows the distribution of mean sulc value across the individual area 44 and 45 labels produced by manual and automated labeling for the 100 subjects. In the HCP data, sulci are represented by negative sulc values, and gyri by positive sulc values (note that this deviates from standard FreeSurfer convention). For both the manual and automated labeling, the mean sulc value for area 45 is consistently positive, suggesting that the area 45 labels include mostly gyral vertices. The mean sulc value across individuals for area 44 shows a wider distribution centered around zero for both the manual and automated labeling. This result is consistent with a higher degree of sulcal variability surrounding the pars opercularis as compared to the pars triangularis (Jakobsen et al. 2015), with the individual labels often extending into the inferior precentral sulcus. For both areas 44 and 45, the distributions of mean sulc values are wider for labels resulting from manual as compared to automated labeling, suggesting that the spatial weighting applied in the automated labeling pipeline is slightly more restrictive than in the manual labeling. However, a paired sample t-test revealed that the mean sulc values from the manual and automated labels are not significantly different (p=0.78 for area 45 and p=0.38 for area 44). For both manual and automated labeling, the mean sulc values from areas 44 and 45 were significantly different at p < 0.001.

Evaluating the effect of assigning vertices to the 'neither' category
On average, 894 vertices (corresponding to 64% of the approximately 1400 vertices comprising the ROI) are assigned to the neither category in the presented results. Without assigning vertices to the neither category, the average Dice coefficient across the 100 subjects is 0.57 for area 45 and 0.49 for area 44 (significantly lower as compared to 0.71 and 0.63 when including the neither category). In this case, the average number of vertices within the ROI not included in the labels (i.e. not belonging to the two largest spatially continuous clusters of vertices assigned as areas 44 and 45) is 364 (corresponding to 26% of the approximately 1400 vertices comprising the ROI). Fig. 9 shows the results of the labeling pipeline with and without assigning vertices to the neither category in a single representative subject.

Comparing results to k-means++ clustering
The average Dice Coefficient across the 100 subjects when comparing the largest spatially continuous clusters resulting from k-means++ clustering (k=2) to the manual labels is 0.58 for area 45 and 0.34 for area 44. The same analysis using the automated labels yields an average Dice coefficient of 0.61 for areas 45 and 0.40 for area 44. These results represent a markedly lower spatial overlap with the manual labels than the current parcellation pipeline. This is due to overestimation of the areas by k-means++ clustering since the nature of the algorithm is such that every vertex within the ROI will be assigned as either area 44 or 45 when a 2-cluster solution is applied.
To further illustrate the similarities and differences between the two methods, Fig. 10 depicts the results of running k-means++ clustering with increasing numbers of clusters (k=2-10, 20) on a single subject with the corresponding subject's manual and automated area 44 and 45 labels overlaid.
As can be seen in Fig. 10, there is a relatively high degree of spatial overlap between the manual and automated labels and certain clusters resulting from k-means++. However, while the boundary separating areas 44 and 45 is well-defined by k-means++, the outer boundaries of the areas change depending on the number of clusters chosen, and sometimes the areas of interest are split into multiple clusters. This gives rise to difficulty in determining the appropriate number of Fig. 9. Results of the parcellation pipeline with (left) and without (right) assigning vertices to the 'neither' category for a single representative subject. On average, approximately 64% of the vertices comprising the ROI are assigned to the 'neither' category. Without assigning vertices to the neither category, all vertices are assigned as belonging to area 44 or 45, and the two largest spatially continuous resulting clusters are selected as the final labels.
clusters when the aim is to delineate specific cortical areas. By excluding vertices from the final area labels if their connectivity is more similar to one of the confound networks (IC maps) than to the networks of interest (areas 44 or 45), the current parcellation approach allows for more flexible and accurate definition of the outer boundaries of the areas of interest.

Testing on an independent dataset
The results of post hoc manual labeling on eight NKI datasets showed a high degree of spatial overlap with the area 44 and 45 labels produced by the automated labeling pipeline (Fig. 11). The average Dice coefficient across the eight subjects was 0.71 for area 44 and 0.69 for area 45. These results demonstrate the ability of the automated labeling pipeline to generalize to datasets with varying resolution and acquisition parameters. Fig. 12 shows the group-level probability maps from automated labeling of the 100 NKI datasets. These maps are consistent with those from the HCP data as well as cytoarchitectonic probability maps derived from post-mortem histology (Fig. 6). All results including individual subject labels and group-level probability maps are available for download at [link to be added upon publication], as well as the automated labeling pipeline script. 3

Discussion
The results of the current study suggest that the automated parcellation pipeline is able to produce functional connectivity-based labels of areas 44 and 45 with comparable precision to the manual labeling procedure we had previously described (Jakobsen et al., 2016). The accuracy of parcellating and assigning labels in new datasets E. Jakobsen et al. NeuroImage 170 (2018) 41-53 suggests that the method is not dependent on the specific cohort used to create the group-level connectivity templates and can be applied to independent functional connectivity datasets with varying scanning parameters. Due to the high dimensionality and inherent complexity of functional connectivity data derived from resting-state fMRI, the development of data-driven parcellation techniques for dimensionality reduction has become a priority in the field (Thirion et al., 2014). Such approaches often make use of clustering algorithms, such as k-means, hierarchical, or spectral clustering, to group voxels or vertices into networks based on the similarity of their connectivity patterns (Yeo et al., 2011;Kahnt et al., 2012;Craddock et al., 2012;Blumensath et al., 2013). Other methods make use of boundary detection techniques to map the transitions in connectivity patterns between cortical regions (Cohen et al., 2008;Hirose et al., 2012;Wig et al., 2014). More recently, a method by Wang and colleagues (2015) made use of an iterative approach to fit a population-based functional atlas to individual brains, which critically and effectively captures cross-subject variability. These approaches for cortical parcellation are powerful tools for dimensionality reduction of complex functional connectivity data and can provide whole-brain functional atlases that are instrumental within particular research contexts.
One key difference that distinguishes the current parcellation approach from existing clustering methods is the exclusion of vertices that are more similar to a set of confound networks (represented by IC maps) than to the connectivity patterns of the target areas. Clustering methods such as k-means and hierarchical clustering produce parcellations with a predefined number of clusters, regardless of how many distinct networks actually exist within the ROI. This gives rise to the problem of determining the appropriate number of clusters to choose when the aim of the parcellation is to distinguish particular cortical areas (see Jakobsen et al. (2016) for a detailed comparison of the manual labeling approach and various data-driven clustering algorithms). The current parcellation approach does not force vertices to be assigned to every possible network and thereby allows for more flexible and accurate definition of the outer boundaries of the areas of interest. In more general terms, the parcellation approach presented here differs from other data-driven parcellation techniques by aiming to simulate the process of individual-level manual labeling driven by prior knowledge of the anatomy and connectivity of specific cortical areas. By  E. Jakobsen et al. NeuroImage 170 (2018) 41-53 targeting specific known differences in connectivity patterns and incorporating anatomical information via spatial weighting, the resulting area labels represent the best fit to the individual anatomy and connectivity. The availability of the manually labeled dataset is uniquely valuable for validation of the area labels derived from the automated parcellation pipeline, the accuracy of which would otherwise be difficult to quantify due to the lack of gold standard functional atlases.
The two-step partial correlation approach using both group-level and individual template connectivity maps ensures that cross-subject variance in connectivity patterns is accounted for in the parcellation pipeline. This approach notably shares some conceptual similarities with dual regression, which also makes use of group-level ICA-based network maps to estimate individual-level versions of the same networks (Filippini et al., 2009), but the goals of the two approaches differ slightly. In dual regression the aim is to facilitate comparisons of the same networks between groups of individuals, while in the current parcellation approach the aim is to produce individual area labels based on their unique patterns of connectivity. Therefore, in the current approach, the group-level spatial maps are simply used to identify an appropriate seed location from which to extract the individual-level network templates from the unmodified data, instead of using temporal regression to create individual-level versions of the group-level templates as is the case with dual regression.
The current study applied the automated parcellation pipeline to Broca's region where (i) the two cytoarchitectonic areas that comprise it are known to have distinct functional connectivity and (ii) the availability of the manually labeled datasets could provide a basis for comparison. However, the described parcellation pipeline could be modified to delineate any cortical regions for which known differences in functional connectivity exist. In this case, alternative group-level template sources, such as the IC maps most resembling the connectivity patterns of interest, could be used to initiate the analysis and subsequently extract the individual-level templates (see Supplementary materials for preliminary results of such an alternative method). Additionally, the spatial probability maps based on the manually labeled datasets used for spatial weighting could be replaced with a different form of spatial information such as cytoarchitectonic probability maps or geodesic distance from an anatomically-defined ROI.
Like most cortical parcellation techniques, the current method provides deterministic binary labels of areas 44 and 45. However, it has recently been demonstrated that different cortical areas display varying degrees of sharpness in their boundaries (Wig et al., 2014). For certain research applications it may therefore be advantageous to output probabilistic area labels at the individual level, and the parcellation criteria upon which the current method is based could just as well be used for this purpose. More specifically, the parcellation pipeline is very easily modified to output the partial correlation maps themselves, which, with the application of spatial weighting based on anatomical criteria, could serve as individual-level probabilistic area labels to be used for the investigation of transition gradients and quantification of the sharpness of boundaries across individuals.
Until recently, non-invasive in vivo definition of areas 44 and 45 has been a significant challenge and many functional neuroimaging studies have relied on anatomical definitions of the regions based on data derived from cytoarchitectonic studies. To our knowledge, no behavioral task exists that is able to reliably distinguish areas 44 and 45 from each other, and resting-state functional connectivity-based parcellation provides a solution to this problem. Our observer-independent parcellation method is able to produce probability maps based on both functional connectivity and anatomical location, which represent variability across a much larger number of individual brains than what is viable using invasive techniques. Additionally, we believe that the individual labels will prove useful for a variety of applications ranging from serving as functionally defined ROIs for fMRI studies to clinical applications such as presurgical planning. This method fills a gap in the tools currently available for mapping the functional boundaries of specific cortical areas on an individual-level.