Data on the test-retest reproducibility of streamline counts as a measure of structural connectivity

These data provide estimations of test-retest reproducibility of streamline counts based on diffusion weighted imaging (DWI) data using a global tractography algorithm in a sample of young healthy adults. Data on descriptive statistics and factorial analyses of within-session and between-session reproducibility in terms of intra-class correlation coefficients for the absolute agreement between measurements are provided. The effects on reproducibility of several exemplary methodological parameters pertaining to different steps along the tractography processing pipeline are considered. These data are related to the research article entitled ‘Probing the reproducibility of quantitative estimates of structural connectivity derived from global tractography’ (Schumacher et al., Neuroimage, 175 (2018) 215–229).


Experimental factors
Type of head-coil (12-channel vs. 32-channel coil), number of reconstruction repetitions (1 vs. 10 repetitions), streamline selection variant (defining fuzzy versus no fuzzy borders of the seed mask; selecting streamlines that end in versus that visit a seed)

Experimental features
Participants were scanned twice within one week and tractography was performed in two independent tracking runs for both testing sessions using both types of head-coil. Whole-brain fiber reconstruction was carried out once with 1 repetition and once with 10 repetitions. Streamlines for connections between the seeds of the AAL atlas were selected using four different streamline selection variants (endpoint_nofuzzy; endpoint_fuzzy; visiting_nofuzzy; visiting_fuzzy).

Data source location
Freiburg, Germany

Data accessibility
Data are provided with this article

Value of the data
The data provide comprehensive information on how the test-retest reproducibility of structural connectivity is influenced by methodological parameters commonly used with fiber tractography algorithms (e.g. seed-based selection of streamlines).
The data can inform future research using the global tractography technique to quantitatively assess differences in structural connectivity (e.g. in patient studies).
Data is based on the common AAL brain atlas, thus allowing comparisons with other reproducibility analyses using the same atlas (e.g. for different tractography approaches).

Data
The data of this article provide information on the test-retest reproducibility of quantitative estimates of structural connectivity based on whole-brain fiber tractography of diffusion-weighted MR images using the global tractography approach by Reisert and colleagues [1]. Streamline counts (i.e. the number of reconstructed 'fibers') of all pairwise connections between seeds of the AAL atlas were used as the quantitative measure of structural connectivity. The data presented here describe the test-retest reproducibility of these streamline counts for both within-session (comparing two independent tracking runs of the same data) and between-session (comparing data from two independent testing sessions) measurements. Data are provided on the effect of three methodological parameters on test-retest reproducibility: type of head-coil (12-channel vs. 32-channel coil), number of reconstruction repetitions (1 vs. 10 repetitive reconstructions of streamlines), and streamline selection variant (endpoint_nofuzzy; endpoint_fuzzy; visiting_nofuzzy; visiting_fuzzy). In the related research article (Schumacher et al., Probing the reproducibility of quantitative estimates of structural connectivity derived from global tractography), analyses are restricted to one streamline selection variant (endpoint_fuzzy), whereas this data set provides reproducibility statistics for all four streamline selection variants.
These variants refer to selecting streamlines based on (1) whether they end in versus visit (i.e. pass through) a seed [endpoint vs. visiting] and (2) whether the image mask of the seed is used in its original binary version (i.e. each voxel has a value of 1 if it lies inside the seed image mask or 0 if it lies outside the seed image mask; streamlines are selected based on voxels with a value of 1 for a given seed) or whether 'fuzzy' borders of the seed image mask are defined to select streamlines by applying a Gaussian kernel to seed image voxels, so that voxels with a given minimum probability of lying within the seed image mask are used for streamline selection [nofuzzy vs. fuzzy]. For a detailed description, see Section 2, Streamline Selection. For a visualization of exemplary connections with the four streamline selection variants, see the related research article (Schumacher et al., Probing the reproducibility of quantitative estimates of structural connectivity derived from global tractography).
The analytical design of the current data set is depicted in Fig. 1 (for further information, see Section 2, Experimental design and statistical analysis). Briefly, reproducibility statistics are provided assessing the effect of the number of reconstruction repetitions and streamline selection variants for within-session data (Model 1) and between-session data (Model 2) and assessing the effect of type of head-coil and streamline selection variants for within-session data (Model 3) and between-session data (Model 4).
For an overview of reproducibility data in terms of the intra-class correlation coefficient (ICC) for absolute agreement, the median ICC(2,1) values are given in Table 1. Table 2 shows the percentage of ICC(2,1) values < .60, corresponding to low reproducibility; between .60 and .69, corresponding to marginal reproducibility; and ≥ .70, corresponding to adequate reproducibility or higher. Table 3 reports the reproducibility statistics for the four factorial models. These reproducibility statistics are further illustrated in terms of relative treatment effects (RTE) in Fig. 2. The entire raw ICC(2,1) values for all four factorial models are depicted in Fig. 3. In relation to factorial models 1 and 2, scatterplots in Fig. 4 illustrate the direct comparison of reproducibility for a given connection between tracking with 1 reconstruction repetition versus 10 repetitions, separately illustrated for the four selection variants. That is, for each streamline selection variant, the connections found in both the 1 repetition and 10 repetition data set are directly compared with each other. Information on seed-specific ICC(2,1) values is further presented in Figs. 5 and 6, which depict the ICC values in a 90 × 45 connectivity matrix. In relation to factorial models 3 and 4, scatterplots in Fig. 7 depict connection-specific direct comparisons of reproducibility for 12ch- versus 32ch-coil data, separately illustrated for the four selection variants. Seed-specific ICC(2,1) values for the different combinations of type of head-coil and streamline selection variant are illustrated in 90 × 45 connectivity matrices in Figs. 8 and 9. Table 4 reports the correlation of ICC(2,1) values of all factor combinations of the four factorial models with mean seed size and mean streamline counts. Figs. 10 and 11 depict FDR-adjusted p-values of the tests for differences between the correlations of all factor combinations of each factorial model.
Fig. 1. Experimental design of the data set. Diffusion-weighted imaging sequences were acquired on two separate testing sessions (Session 1 and Session 2) and with a 12-channel (12ch) and 32-channel (32ch) head-coil. On each of these four resulting diffusion tensor imaging data sets, two independent runs of global tractography were performed (Run 1 and Run 2). During each tracking run, streamlines were reconstructed once with 1 reconstruction repetition (1 Rep) and once with 10 reconstruction repetitions (10 Rep). For each of these tractography data sets, the selection of streamlines was performed with four different variants of selection parameters: endpoint_nofuzzy (end_nofuz; blue shading); endpoint_fuzzy (end_fuz; red shading); visiting_nofuzzy (vis_nofuz; green shading); and visiting_fuzzy (vis_fuz; purple shading), resulting in 64 subject-specific data sets of streamline counts. Statistical analyses aimed at probing the effect of number of reconstruction repetitions, type of head-coil, and streamline selection variant on within-session and between-session test-retest reproducibility, resulting in four analyses: Model 1, within-session reproducibility of reconstruction repetitions × streamline selection variant; Model 2, between-session reproducibility of reconstruction repetitions × streamline selection variant; Model 3, within-session reproducibility of type of head-coil × streamline selection variant; Model 4, between-session reproducibility of type of head-coil × streamline selection variant. Black bars indicate which sub-set of data entered these four statistical models.

Participants
All participants who provided data were students recruited from the University of Freiburg who participated voluntarily. To be eligible for participation, participants had to have German as their native language, normal or corrected-to-normal vision, unimpaired color vision and be neurologically and psychiatrically healthy. Based on these inclusion criteria, data from a total of 30 participants, who gave written informed consent to participation and were compensated with 60€, were collected. Two participants had to be excluded because of a high depressivity score and an incidental MRI finding, leaving data from N = 28 participants (N = 13 males) with a mean (± SD) age of 22.53 (± 1.77) years and a mean of 15.82 (± 1.54) years of education for analysis. Data collection was approved by the local ethics committee and conducted in accordance with the Declaration of Helsinki. Please note that this sample is identical to the one in the related research article (Schumacher et al., Probing the reproducibility of quantitative estimates of structural connectivity derived from global tractography). Further details on the sample can be found in the related research article.
Table 3. Reproducibility statistics for the four factorial models. η², eta squared, denoting the share of total variance explained by each factor. As per convention by Cohen [14], an effect is considered small if η² ≥ .01, medium if η² ≥ .06, and large if η² ≥ .14.

Magnetic resonance imaging
In two separate testing sessions, MR imaging was performed on the same 3T TimTrio MR scanner (Siemens GmbH, Erlangen, Germany) acquiring the same set of MRI data. Using a 12-channel head-coil, the following imaging sequences were acquired: a three-dimensional T1-weighted magnetization-prepared rapid gradient-echo (MPRAGE) sequence (repetition time [TR], 2200 ms; echo time [TE], 2.15 ms; flip angle, 12°; 160 sagittal slices; matrix size, 256 × 256; field of view, 256 mm; voxel size, 1 × 1 × 1 mm³) and a diffusion-sensitive single-shot spin-echo echo-planar imaging (EPI) sequence with cerebrospinal fluid (CSF) suppression applying a HARDI (high angular resolution diffusion imaging) acquisition scheme with 61 diffusion encoding gradient directions (b-factor, 1000 s/mm²; 69 axial slices; TR, 10 000 ms; TE, 94 ms; flip angle, 90°; matrix size, 104 × 104; field of view, 208 mm; voxel size, 2 × 2 × 2 mm³) and nine scans without diffusion weighting (b-factor, 0 s/mm²), equally distributed across the acquisition series. A 32-channel head-coil was used to acquire a second diffusion-sensitive EPI sequence with 61 diffusion encoding gradient directions (b-factor, 1000 s/mm²) [...]
Fig. 2. Relative treatment effects (RTEs) for the four statistical models. The RTE for a factor combination denotes the probability that a randomly chosen observation for that factor combination yields a higher ICC(2,1) value than a randomly chosen observation from the whole data set. Thus, higher RTEs indicate higher ICC(2,1) values for that factor combination, expressed as a probability value ranging from 0 to 1. Panels depict RTEs for (A) Model 1, within-session reproducibility of reconstruction repetitions × streamline selection variant; (B) Model 2, between-session reproducibility of reconstruction repetitions × streamline selection variant; (C) Model 3, within-session reproducibility of type of head-coil × streamline selection variant; and (D) Model 4, between-session reproducibility of type of head-coil × streamline selection variant. Error bars denote the 95% confidence interval of RTEs. End_nofuz, endpoint_nofuzzy; end_fuz, endpoint_fuzzy; vis_nofuz, visiting_nofuzzy; and vis_fuz, visiting_fuzzy streamline selection. 1 Rep, 1 reconstruction repetition; 10 Rep, 10 reconstruction repetitions. 12ch Coil, 12-channel head-coil; 32ch Coil, 32-channel head-coil.
Fig. 4. Scatterplots of connection-specific ICC(2,1) values for Models 1 (within-session reproducibility; left column) and 2 (between-session reproducibility; right column). In each column and for each streamline selection variant, the ICC(2,1) value for a given connection derived from 1 reconstruction repetition (x-axis) is plotted against the ICC(2,1) value for the same connection derived from 10 reconstruction repetitions (y-axis). Thus, connections with a higher ICC(2,1) value for 10 repetitions are above the diagonal, and connections with a higher ICC(2,1) value for 1 repetition are below the diagonal. For visualization purposes, negative ICC(2,1) values were set to zero (0.08% of values).
Fig. 7. Scatterplots of connection-specific ICC(2,1) values for Models 3 (within-session reproducibility; left column) and 4 (between-session reproducibility; right column). In each column and for each streamline selection variant, the ICC(2,1) value for a given connection acquired with the 12-channel head-coil (x-axis) is plotted against the ICC(2,1) value for the same connection acquired with the 32-channel head-coil (y-axis). Thus, connections with a higher ICC(2,1) value for 32ch data are above the diagonal, and connections with a higher ICC(2,1) value for 12ch data are below the diagonal. For visualization purposes, negative ICC(2,1) values were set to zero (0.07% of values).
Table 4. Numbers in cells refer to Spearman's rho (ρ) coefficient of the correlation between intra-class correlation coefficient type ICC(2,1) and the mean seed size or mean streamline count of each connection across all connections of each of the four statistical models.

Figs. 10 and 11. Numbers in cells refer to p-values of the z-test of difference between the two Fisher's r-to-Z transformed correlation coefficients, adjusted for the false discovery rate (FDR) and then tested against α = .05. Please note that FDR-adjusted p-values can be larger than 1. Green shading denotes p < .05, i.e. significant differences between the two correlations. Gray shading denotes p > .05, i.e. non-significant differences between the two correlations. For all significant comparisons between a visiting and an endpoint variant, the correlation with mean seed size was significantly larger for the endpoint than the visiting variant, whereas the correlation with mean streamline counts was significantly larger for the visiting than the endpoint variant (see Table 4). Abbreviations: End, endpoint streamline selection; vis, visiting streamline selection; nofuz, no fuzzy streamline selection; fuz, fuzzy streamline selection; 1 Rep, 1 reconstruction repetition; 10 Rep, 10 reconstruction repetitions; 12ch, 12-channel head-coil; 32ch, 32-channel head-coil.

Global fiber tractography
Tractography based on HARDI images was performed using the global tractography approach (Gibbs tracking) implemented in DTI & Fibertools [1]. With this approach, the entire connectome of estimated fibers is reconstructed in a single optimization step by modeling segments of the to-be-reconstructed fibers as small cylinders freely moving in the tissue due to Brownian motion. During an iterative simulated annealing procedure, the assumed temperature is slowly reduced so that the cylinders align to form longer chains and then fiber tracts [1]. To designate the white-matter voxels considered for fiber reconstruction, each individual's whole-brain white matter image from the first testing session (segmented, written in native space, co-registered and resliced with reference to the b0-image, binarized at a threshold of > .50) was used. Fiber reconstruction was carried out using the 'dense' parameter set (segment weight of 0.05, corresponding to the percentage of brain-averaged anisotropic signal at which reconstruction of fibers was thresholded; a start and stop temperature of 0.1 and 0.001, respectively; 5 × 10⁸ iterations; and a minimum fiber length of 10 cylinder segments; cf. [1]).
In addition to this standard procedure, we applied a technique whereby a repetitive reconstruction of fibers is introduced. After the initial global tracking is completed, the system is set to a given temperature at which a repeated number of 'samples' are collected. In detail, the end state of the first estimation process (i.e. repetition) represents the start state for the next estimation process, with the iterative annealing procedure being started anew from this start state. The end state of that annealing procedure is then again used as start state for the next repetition and so forth. The streamlines reconstructed during each repetition are then sum-aggregated into a single fiber tracking output file. For this repetitive reconstruction, the number of repetitions, a temperature at which samples are collected, and the number of iterations for the annealing procedure applied during every repetition have to be given. Here we used 1 × 10⁸ iterations at a temperature of 0.1 (effectively resulting in complete repetition of the whole fiber reconstruction procedure).
Table 5. Percentage of total number of streamlines selected for statistical analysis using endpoint_nofuzzy streamline selection for data acquired with the 12-channel head-coil. Columns: Within-Session Reproducibility; Between-Session Reproducibility. Within-session reproducibility refers to data acquired with the 12ch head-coil using the criterion that for a given connection a streamline count larger than 0 had to be present in at least 95% of subjects on both tracking runs of the first testing session. Between-session reproducibility refers to data acquired with the 12ch head-coil using the criterion that for a given connection a streamline count larger than 0 had to be present in at least 95% of subjects on the first tracking run of both testing sessions. 1 Rep, 1 reconstruction repetition; 10 Rep, 10 reconstruction repetitions.

Streamline selection
To select the subject-specific streamlines, binary seed images were re-normalized from MNI space into each individual's native space. This selection procedure demands that two parameters be defined: First, either streamlines ending in the seed or streamlines passing through the seed can be selected (endpoint vs. visiting streamlines). Second, seed images can either be used in their binary version, so that only seed mask voxels with a value of 1 are considered for streamline selection, or a Gaussian kernel is applied to the seed image voxels, resulting in 'fuzzy' borders of the image mask, so that streamline selection is then based on voxels that have at least a given minimum probability of being inside the seed mask (fuzzy vs. no fuzzy selection). Here we used a Gaussian kernel with σ = 1 mm (corresponding FWHM ≈ 2.35 mm) and a minimum probability of 0.1 for fuzzy selection. These two selection parameters result in four possible variants for the selection of streamlines: endpoint_nofuzzy, endpoint_fuzzy, visiting_nofuzzy, and visiting_fuzzy.
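The fuzzy variant described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure implemented in DTI & Fibertools: it assumes that the Gaussian kernel is applied on the 2 mm diffusion voxel grid, that the kernel is truncated at 3σ and normalized, and that the smoothed mask value is read as the "probability" of lying within the seed. The function names `gaussian_weights` and `fuzzy_seed` are hypothetical.

```python
import math
from itertools import product


def gaussian_weights(sigma_vox, radius):
    """Normalized 1D Gaussian weights for integer voxel offsets -radius..radius."""
    raw = {d: math.exp(-d * d / (2.0 * sigma_vox ** 2))
           for d in range(-radius, radius + 1)}
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


def fuzzy_seed(seed_voxels, sigma_mm=1.0, voxel_mm=2.0, min_prob=0.1):
    """Return the voxels whose smoothed mask value is at least min_prob.

    seed_voxels: set of (x, y, z) voxel indices of the binary seed mask.
    Assumption: 3-sigma truncation and a separable, normalized kernel.
    """
    sigma_vox = sigma_mm / voxel_mm
    radius = max(1, math.ceil(3 * sigma_vox))
    w = gaussian_weights(sigma_vox, radius)
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    smoothed = {}
    # Spread each seed voxel's unit value over its neighbourhood.
    for (x, y, z) in seed_voxels:
        for dx, dy, dz in offsets:
            key = (x + dx, y + dy, z + dz)
            smoothed[key] = smoothed.get(key, 0.0) + w[dx] * w[dy] * w[dz]
    return {voxel for voxel, p in smoothed.items() if p >= min_prob}


# Toy seed: a 3 x 3 x 3 voxel cube.
binary_seed = set(product(range(3), repeat=3))
fuzzy = fuzzy_seed(binary_seed)
# FWHM corresponding to sigma = 1 mm.
fwhm_mm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * 1.0
```

With these assumed settings the fuzzy mask contains every voxel of the binary seed plus a thin rim of neighbouring voxels, which is why fuzzy selection can only add streamlines relative to the binary variant.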
The streamlines for the single seeds (i.e. all streamlines that end in/pass through a seed) and for all pairwise connections between the 90 seeds were selected using the four streamline selection variants. As streamline counts are estimated without directional information, there were 4095 possible streamline count values in total (89 × 45 bivariate connections plus the 90 single-seed streamline counts). The streamline counts for a given connection were divided by the subject-specific total number of streamlines (computed for each unique combination of testing session, type of head-coil, tracking repetition, and tracking run) to correct for differences in head size, which influence the total number of streamlines reconstructed [5]. This adjusted number of streamlines was multiplied by the sample mean of total streamline counts to yield a value of the same order of magnitude as the uncorrected number of streamlines. For a connection to be included in the statistical analyses, an adjusted streamline count greater than 0 had to be present in at least 95% of subjects in both measurements on which reproducibility was computed. For the connections passing this threshold, a streamline count of zero was given if no streamlines had been reconstructed for a subject. Data acquisition with the 32ch head-coil resulted in volumes with 43 slices only partly covering the temporal, inferior frontal, and inferior occipital lobes. A seed was considered sufficiently covered if at most 5% of its voxels were outside of the 32ch-coverage mask (mask image of the individual head-coil placement and coverage). Across subjects and sessions (N = 56), a seed was considered for analysis if sufficiently covered in at least 95% of subjects. This procedure resulted in 37 seeds being excluded from analysis for the 32ch-coil data: 5, 6, 15, 16, 21, 22, 27, 28, 37-44, 46-56, 69, 70, 83-90 (AAL numbering convention).
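The count adjustment and the inclusion criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that the 95% criterion is evaluated separately for each of the two measurements.

```python
def adjust_counts(counts, totals):
    """Scale raw counts of one connection by each subject's total streamline count.

    counts: raw streamline counts, one per subject.
    totals: total number of reconstructed streamlines, one per subject.
    The result is rescaled by the sample mean of the totals so that it stays
    in the same order of magnitude as the raw counts.
    """
    mean_total = sum(totals) / len(totals)
    return [c / t * mean_total for c, t in zip(counts, totals)]


def passes_threshold(measurement_a, measurement_b, min_fraction=0.95):
    """True if a count > 0 is present in at least min_fraction of subjects
    in both measurements (assumption: checked per measurement)."""
    def fraction_nonzero(values):
        return sum(v > 0 for v in values) / len(values)
    return (fraction_nonzero(measurement_a) >= min_fraction
            and fraction_nonzero(measurement_b) >= min_fraction)


# Number of undirected count values for 90 seeds: all seed pairs plus single seeds.
n_seeds = 90
n_values = n_seeds * (n_seeds - 1) // 2 + n_seeds
```

For a sample of 28 subjects, the 95% criterion under this reading tolerates at most one subject with a zero count per measurement.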
For an indication of how many of all reconstructed streamlines actually entered subsequent statistical analyses, Table 5 reports the percentage of total streamline counts to which streamline counts of selected connections corresponded (i.e. counting the streamlines that ended in both gray matter seeds forming a given connection relative to all reconstructed streamlines). This was carried out for within-session and between-session reproducibility analysis of 12ch head-coil data. In detail, the adjusted number of streamlines for all bivariate connections present in at least 95% of subjects across both tracking runs (within-session reproducibility) or both testing sessions (between-session reproducibility) were sum-aggregated and divided by the total number of streamlines per subject. This percentage value was mean-aggregated across subjects. To avoid redundant counting of streamlines due to visiting or fuzzy selection, only data of the endpoint_nofuzzy streamline selection were used. As the 32ch head-coil did not provide whole-brain coverage and thus selection of streamlines was not based on all seeds of the AAL atlas (see above), 32ch data were not used for this calculation. On average, roughly 50% of all reconstructed streamlines entered statistical analyses for the endpoint_nofuzzy streamline selection variant (Table 5).

Experimental design and statistical analysis
A schematic of the experimental design is depicted in Fig. 1. Independent diffusion-weighted image data were not only acquired on two testing sessions, but were also acquired with two different types of head-coil (12 channels [12ch] vs. 32 channels [32ch]). In addition, global tractography was performed twice on the data from each session, thus resulting in two independent runs of tracking per testing session and type of head-coil, and was furthermore separately performed with two variants of reconstruction repetitions (1 repetition vs. 10 repetitions). Finally, the selection of streamlines was carried out using the four streamline selection variants. Thus, for a given AAL-based connection in an individual, there were 64 independent streamline count estimations: 2 sessions × 2 types of head-coil × 2 tracking runs × 2 variants of reconstruction repetitions × 4 streamline selection variants (Fig. 1). For connections not included in the 32ch-coil data due to insufficient seed coverage, there were 32 independent streamline count estimations.
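The factorial structure can be made explicit by enumerating the design cells. The sketch below is illustrative only; the labels (`session1`, `run1`, and so on) are hypothetical names chosen for this example, and the Model 1 filter follows the description that within-session analyses of reconstruction repetitions use the 12ch data from the first testing session.

```python
from itertools import product

SESSIONS = ("session1", "session2")
COILS = ("12ch", "32ch")
RUNS = ("run1", "run2")
REPETITIONS = (1, 10)
VARIANTS = ("endpoint_nofuzzy", "endpoint_fuzzy",
            "visiting_nofuzzy", "visiting_fuzzy")


def design_cells(coils=COILS):
    """All (session, coil, run, repetitions, variant) combinations."""
    return list(product(SESSIONS, coils, RUNS, REPETITIONS, VARIANTS))


all_cells = design_cells()                      # connections covered by both coils
cells_12ch_only = design_cells(coils=("12ch",))  # connections missing in 32ch data

# Data sets entering Model 1: first session, 12ch coil, both runs,
# both repetition variants, all four selection variants.
model_1_cells = [c for c in all_cells
                 if c[0] == "session1" and c[1] == "12ch"]
```

The 16 data sets selected for Model 1 correspond to 8 factor combinations (2 repetition variants × 4 selection variants), each measured in two tracking runs.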
Statistical analyses assessed the impact of the following three factors on the reproducibility of streamline counts: (i) streamline selection variant (endpoint_nofuzzy, endpoint_fuzzy, visiting_nofuzzy, visiting_fuzzy), (ii) number of reconstruction repetitions (1 vs. 10 repetitions), and (iii) type of head-coil (12ch vs. 32ch). As data for some factor combinations were non-normally distributed, nonparametric analyses were run with the R statistics package nparLD [6] on the ranks of values (instead of the raw values), thus constituting a non-parametric equivalent to a repeated-measures analysis of variance (ANOVA). However, nparLD is restricted to testing the effects of a maximum of two within-subject and two between-subjects factors at once, so that the three factors of interest here could not be evaluated in one overall model but were instead evaluated in four separate models.
In consequence, only the 12ch-coil data were used to assess the effect of reconstruction repetitions and only the data from 10 reconstruction repetitions were used to compare the 12ch- and 32ch-coil data. This procedure yielded two analytical designs with two factors each: reconstruction repetitions × streamline selection variant and type of head-coil × streamline selection variant. For both analytical designs, reproducibility was assessed for (a) within-session data by comparing the streamline counts from the two tracking runs of the first testing session and for (b) between-session data by comparing streamline counts between the two testing sessions (using the data from the first tracking run of each testing session). Thus, four reproducibility analyses were performed (Fig. 1): Model 1, within-session reproducibility of reconstruction repetitions × streamline selection variant; Model 2, between-session reproducibility of reconstruction repetitions × streamline selection variant; Model 3, within-session reproducibility of type of head-coil × streamline selection variant; Model 4, between-session reproducibility of type of head-coil × streamline selection variant.
For each of these four analyses, reproducibility was assessed connection-wise using intra-class correlation coefficients (ICC) based on two-way random effects models [7,8], with participants as "targets" and the two measurements (i.e., the two tracking runs or the two testing sessions) as "raters". That is, for each connection, the between-subjects and within-subject difference in streamline counts was assessed to probe the amount of total variance that can be attributed to true interindividual differences [8,9]. Here we assessed the absolute agreement [10], that is, the absolute difference in individual streamline count estimates between measurements, corresponding to ICC(2,1) according to Shrout and Fleiss (1979) (or ICC(A,1) according to McGraw and Wong, 1996). Reproducibility was considered low if ICC(2,1) ≤ .59, marginal for .60-.69, adequate for .70-.79, high for .80-.89, and very high if ≥ .90 [11].
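For readers who want to recompute the coefficient, the following sketch implements the standard Shrout and Fleiss ICC(2,1) formula from the two-way ANOVA mean squares, together with the reproducibility categories given above. It is an illustrative re-implementation, not the code used to produce these data; the function names are hypothetical, and the category boundaries are applied to unrounded values (an assumption).

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one per subject ("target"), each holding the
    k measurements ("raters") of one connection.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-subjects mean square
    ms_cols = ss_cols / (k - 1)              # between-measurements mean square
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def classify(icc):
    """Reproducibility category for an ICC(2,1) value."""
    if icc < 0.60:
        return "low"
    if icc < 0.70:
        return "marginal"
    if icc < 0.80:
        return "adequate"
    if icc < 0.90:
        return "high"
    return "very high"


# Toy data: the second measurement is always one streamline higher.
shifted = [[1, 2], [2, 3], [3, 4]]
icc_shifted = icc_2_1(shifted)
```

In the toy example the two measurements are perfectly correlated, yet the constant offset lowers ICC(2,1) to 2/3, which illustrates why absolute agreement is a stricter criterion than consistency.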
Factorial models were then computed on ICC(2,1) values to evaluate the four models described above using the nparLD package (version 2.1; [6]) for R statistics (version 3.3.1; [12]). For Model 1, there were N = 822 bivariate connections that were found in all of the eight data sub-sets. Thus, their respective ICC(2,1) values entered the model as "subjects". For Model 2, N = 829 bivariate connections were successfully reconstructed in all eight factor combinations and thus their ICC(2,1) values entered the nparLD analysis. For Model 3, the ICC(2,1) values of N = 574 bivariate connections entered the within-session model with type of head-coil and streamline selection variant as within-subject factors. In Model 4, the between-session analysis on ranks for ICC(2,1) values with head-coil and selection variant as within-subject factors was based on N = 578 connections. Furthermore, the dependency of ICC(2,1) values on the size of the seeds used for streamline selection and on the streamline counts themselves was probed. To this end, for each connection, the mean number of voxels of the two seed masks and the mean streamline count value (aggregated across the sample and both tracking runs or testing sessions, i.e. N = 56) were computed and correlated with the ICC(2,1) value of that connection using Spearman's rho rank correlation, with ICC(2,1) values being Fisher r-to-Z transformed beforehand. Correlations were computed for each factorial combination in the four statistical models separately. Subsequently, it was tested whether there were significant differences between the eight correlation coefficients per statistical model. To this end, Spearman correlation coefficients were Fisher r-to-Z transformed; all possible differences between the correlations were computed and compared against the standard normal z-distribution.
To correct for multiple testing, p-values were adjusted for the false discovery rate (FDR) according to Benjamini and Yekutieli [13] within each of the statistical models and then tested against a significance threshold of α = .001 for the correlation coefficients themselves (N = 8 tests per model) and a threshold of α = .05 for the pairwise differences in correlation coefficients (N = 28 tests per model). FDR correction was performed in Matlab using the fdr_bh script by David Groppe (available from www.mathworks.com/matlabcentral/fileexchange/27418-fdr-bh/content/fdr_bh.m; accessed on 20th September 2016).
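The comparison of correlation coefficients and the subsequent FDR adjustment can be sketched as follows. This is a hedged illustration rather than a port of the fdr_bh script: the standard error used in `compare_correlations` is the textbook formula for two independent correlations, which is an assumption because the article does not state the formula, and `by_adjust` implements the Benjamini-Yekutieli adjustment without capping values at 1, in line with the remark that adjusted p-values can exceed 1. All function names are hypothetical.

```python
import math
from itertools import combinations


def compare_correlations(r1, r2, n1, n2):
    """z-test for the difference of two Fisher r-to-Z transformed correlations.

    Assumption: independent-samples standard error sqrt(1/(n1-3) + 1/(n2-3)).
    Returns the z statistic and the two-sided p-value.
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(
        1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal p-value
    return z, p


def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values, not capped at 1."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = float("inf")
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Eight factor combinations per model give 28 pairwise comparisons.
n_pairs = len(list(combinations(range(8), 2)))
example_adjusted = by_adjust([0.01, 0.04, 0.03])
```

Because the eight correlations within a model are computed on the same set of connections, a test for dependent correlations could also be argued for; the independent-samples form is used here only to keep the sketch short.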