Sensor space group analysis for fNIRS data

Highlights • We apply random-effects analysis using summary statistics to fNIRS data.• Individual contrast images are generated in a 2D or 3D canonical scalp surface.• Random-effects analysis then enables inference about population effects.• We show that left frontopolar area is activated in a population during Stroop effects.• Results are consistent with previous neuroimaging findings.


Introduction
Functional near-infrared spectroscopy (fNIRS) is a neuroimaging technique for monitoring hemodynamic and oxygenation changes in the brain by measuring changes in optical absorption (Jobsis, The poor spatial resolution of fNIRS leads to difficulties in sensor space analyses of multisubject fNIRS studies. Channel-wise group analysis is used to assess whether regionally specific effects are common across or between groups of interest, assuming that the channel positions are consistent across subjects (Ciftci et al., 2007(Ciftci et al., , 2008. However, this assumption is often violated, because variability in head shape and size makes placement of the optical source and detector less reproducible (Tsuzuki and Dan, 2014). To address the misalignment of channel locations between subjects, several studies applied spatial interpolation to the channel-specific estimates from the individual subject analyses, and then performed group analysis on the individual topographic images (Schroeter et al., 2004;Plichta et al., 2007;Ye et al., 2009;Fekete et al., 2011;Tak and Ye, 2014).
Specifically, Fekete et al. (2011) used the two-level mixedeffects general linear model (GLM) (Beckmann et al., 2003) for the group analysis of interpolated fNIRS topographic maps. Spatial interpolation produces spatially correlated voxels, and it is therefore required to adjust p-value for the multiplicity of test performed. However, spatial correlation due to interpolation was not accommodated, when making inferences about regional effects at the group level. This problem can be solved by using random field theory (RFT) which allows one to assign adjusted p-values to topological features of random field statistics (Worsley et al., 1996). Tak and Ye (2014) used the two-level mixed-effects general linear mixed model (GLMM) (Searle, 1979;Cnaan et al., 1997) to relate the effects of interest at the group level to the concatenated individual fNIRS data, and its statistical significance was assessed using RFT. The GLMM makes fewer assumptions in estimating the error variances than the summary statistic approach that has been widely used in the group analysis of neuroimaging data Mumford and Nichols, 2009). However it is more computationally demanding, and this is particularly acute for fNIRS time series as they are acquired at higher sample rate than fMRI. Moreover, the group analysis presented in Tak and Ye (2014) was tested using smaller sample size (3 subjects) than the size typically used in functional neuroimaging studies (Desmond and Glover, 2002), and therefore further validation of the methods is required.
In this paper, as a practical solution to these issues, we apply random-effects analysis using summary statistics to fNIRS topographic data, to make inferences about regionally specific effects induced by (potentially) multiple experimental factors. In multisubject studies, random-effects analysis treats subject effects as random variables, and therefore allows for inferences about the population from which the subjects were drawn (Penny and Holmes, 2006). Random-effects analysis using the summary statistics approach has been adopted for group analysis of various neuroimaging data, including fMRI, electroencephalography (EEG), and magnetoencephalography (MEG) (Penny and Holmes, 2006;Litvak et al., 2011). It is a straightforward, computationally simple and flexible approach that can accomodate a broad range of experimental designs. Most simply, the experimental effect of interest can be captured using a single (summary) contrast image per subject and its statistical significance assessed using a one-sample t-test design at the second level. The generality of this approach stems from the fact that the summary images themselves can capture main effects or interaction effects from factorial designs (Winer et al., 1991;Penny and Henson, 2006).
In this paper, we show how random-effects analysis is applied to sensor space fNIRS data, focusing on (i) the generation of spatially interpolated contrast images for individual subjects, (ii) the implementation of group analysis pipeline which enables one to use the established random-effects analysis with the combined estimates from a group of subjects. In particular, we describe spatial interpolation methods for generating fNIRS contrast images either on a two-dimensional (2D) regular grid or on a three-dimensional (3D) triangular mesh, both representations of canonical scalp surface (Perrin et al., 1989;Litvak et al., 2011), and discuss under what circumstances each of these approaches is appropriate. Their topological features are then assessed in a statistical framework, when making inference about regionally specific effects at the group level (Taylor and Worsley, 2007;Kilner and Friston, 2010;Flandin and Friston, 2015). The proposed methods for the sensor space group analysis are validated using fNIRS data acquired from 21 subjects recorded during a colour-word matching Stroop task in an eventrelated design (Stroop, 1935;MacLeod, 1991). This paper is structured as follows. In Section 2, we first describe a generative model of sensor space fNIRS data, and review the random-effects analysis via summary statistics approach. We then illustrate basic procedures by applying the random-effects analysis to fNIRS topographic images, with a special focus on issues that relate to spatial interpolation on the canonical scalp surface. In Section 3, we provide an illustrative group analysis using fNIRS data acquired during a colour-word matching Stroop task (Stroop, 1935;MacLeod, 1991).

Methods
Multiple-subject fNIRS data are analysed in a two-level process. In the first-level analysis, the general linear models for each subject are independently fitted to individual data, and in the second-level analysis, the population effects are estimated using the summarystatistics approach. The following subsections describe the fNIRS measurement model in sensor space, and then introduces withinand between-subject models for the group analysis of fNIRS data.

fNIRS measurement model in sensor space
Optical attenuation in a highly scattering medium is described by the following modified Beer-Lambert law (Delpy et al., 1988) where i = 1, . . ., M observations, j = 1, . . ., J channels, is a particular wavelength, is optical density change, H and Q are changes in oxygenerated and deoxygenated hemoglobin (HbO and HbR, [mM]),˛H and˛Q are the molar absorption coefficients [mM −1 cm −1 ] for HbO and HbR (Matcher et al., 1995), d is a differential pathlength factor (DPF) (Duncan et al., 1996;Scholkmann and Wolf, 2013) which depends on and age of subject a, and l is distance between optical source and detector [cm]. This modified Beer-Lambert law shows that the optical density changes are linearly proportional to the changes in absorption coefficients, reflecting the hemoglobin concentration changes. Measurements of optical density changes at two wavelengths can then be used to calculate the changes in HbO and HbR in underlying brain regions: where c =˛H( 1 )˛Q( 2 ) −˛H( 2 )˛Q( 1 ).

Within-subject model
The GLM for each subject can be written as (Friston et al., 1995;Ye et al., 2009) where Y is [M × J] matrix containing hemoglobin responses ( H or Q) from a particular subject, M is total number of scans, and J is total number of channels; X is [M × L] design matrix containing L regressors of interest (e.g., stimulus function convolved by the canonical hemodynamic response function) and confounds;ˇis [L × J] matrix of regression coefficients; and is [M × J] matrix of zero-mean normally distributed errors. The variance of the errors at a particular channel is given by 2 V, where 2 is channel-specific and V is a global temporal autocorrelation matrix.
An estimator of channel-specific parameterˇcan be obtained by multiplying the observations Y and their model by a filter matrix S then using the least squares: where X * = SX, Y * = SY, and a filter matrix is set to S = V −(1/2) for whitening the data before fitting the GLM. The intrinsic temporal correlation V can be estimated, based on a first order autoregressive model (AR(1)) (Purdon and Weisskoff, 1998), and its model parameters are estimated using a restricted maximum likelihood (ReML) method (Friston et al., 2002). The effects of interest are then estimated as c Tˆ, where c is [L × 1] contrast vector which forms a linear combination of parameter estimates.

Group analysis via summary-statistics
Random-effects analysis takes into account within-subject and between-subject variability, and allows for making inferences about population effects (Penny and Holmes, 2006). For a given location, this random effects analysis is implemented using the summary-statistics approach: where n = 1, . . ., N subjects; w n = c Tˇn is the true mean effect for subject n at a particular location; w n = c Tˆn is the sample mean effect; w pop = X n,GˇG is the true effect for the population, where X n,G is the [1 × P] row vector in the group-level design matrix,ˇG is the [P × 1] vector of group-level parameters; the within-subject Gaussian error e n has zero mean and variance 2 w ; and the betweensubject Gaussian error z n has zero mean and variance 2 b . Collapsing the two levels into one gives where the mixed-effects error, e * n = e n + z n , has zero mean and a variance given by Thus we see that, on average, the error variance contains a contribution from both the within and between subject errors. Considering Eq. (6) for all subjects gives where . .
The least squares estimate ofˇG is then given bŷ and its variance is estimated as

Individual topographic images
In Eq. (4), we have estimated the GLM regression coefficient,ˆ, and the effects of interest, c Tˆf or each subject from fNIRS channel measurements. However, the spatial resolution of the channelspecific estimates is limited to the optical source-detector distance due to the high level of light scattering. Additionally, optical probe locations are not consistent across subjects due to variability in head shape and size. It is therefore necessary to estimate (i.e. interpolate) the effects of interest for each subject at intervening voxels. Spatial interpolation of the channel-specific estimates on the canonical scalp surface generates individual topographic images containing the voxel-specific effects of interest, c Tˆ( r) where r is a voxel location. The implied smoothing in this interpolation blurs effects that are focal in space, and ensures overlap among a group of subjects.
Specifically, individual topographic images can be computed in two stages. Channel positions are first normalised to the Montreal Neurological Institute (MNI) coordinate system using a virtual registration method (Okamoto et al., 2004;Singh et al., 2005;Tsuzuki et al., 2007), and projected either onto a 2D regular grid or onto a 3D triangular mesh, both representations of canonical scalp surface. Surface interpolations for scattered data on 2D and 3D canonical scalp surfaces are then applied to channel-specific estimates of GLM parameters, to generate individual topographic images.
In the topographic mapping on a 2D regular grid (Kiebel and Friston, 2004), we perform the linear interpolation on a planar and circular surface that accords with the international 10-20 system and is commonly used in EEG/MEG, and fNIRS data displays (Jasper, 1958;Jurcak et al., 2007;Litvak et al., 2011). The 2D topographic image is then smoothed by multidimensional convolution with a Gaussian kernel, to accommodate spatial variability over subjects and ensure the images conform to the assumptions of the topological inference approach (RFT applies to continuous statistical processes and we must ensure the smoothness of the underlying statistical field is large in relation to the voxel size ). The image is saved in standard NIfTI-1 data format prior to group level analysis.
In the topographic mapping on a 3D triangular mesh (Mattout et al., 2007), spherical splines are applied to construct spatial maps of the effects of interest for each subject (Perrin et al., 1989;Oostenveld et al., 2010). Specifically, the scalp channel positions are projected on the sphere and represented using spherical coordinates (Estrin and Uzgalis, 1969). The effects of interest at any position r i on the canonical scalp surface is then given by where r j,c is the jth channel position, j = 1, . . ., J; cos(r i , r j,c ) is the cosine of the angle between interpolation position r i and channel position r j,c ; k 0 and k j are coefficients fit to the data; P n is the nth degree Legendre polynomial; m is the spline order which controls the level of smoothness of interpolation kernel, and a value of m = 3 or 4 is typically chosen (Ferree, 2006). The J + 1 coefficients k j are calculated by imposing two conditions (i) sum of coefficients k j is In the congruent condition, the colour word at the top row was presented in the congruent colour, whereas for the incongruent condition, the colour word was presented in a different colour. Stimuli were presented randomly in 80 congruent and 20 incongruent combinations of four colour words "red", "blue", "yellow", and "green". Each stimulus was presented until a response was given, with a maximum allowed response time of 2 s. Interstimulus interval was randomly selected in a range of 9-12 s. One of the objectives of this study was to understand brain activity associated with the suppression of automatic responses (Stroop, 1935;MacLeod, 1991).
(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) equal to zero, and (ii) the interpolation function should reproduce the data when evaluated at the channels: k j g(cos(r j,c , r k,c )).
The 3D topographic image on the canonical scalp surface is then saved in standard surface-based data format GIfTI.

Colour-word matching Stroop task data
We now apply the summary statistics approach to random effects analysis to sensor space fNIRS data from 21 subjects. Twenty-six healthy subjects participated in this study (range 23-63 years, mean = 41.3, SD = 14.5, 14 males, 12 females). Among these participants, 3 subjects were excluded from the study before the data were collected due to failure to accomplish experimental tasks, and 2 subjects were excluded from the group analysis due to excessive artifacts in data. All subjects gave informed written consent prior to testing under the approval of the Jichi Medical University ethics committee. fNIRS data were recorded during a colour-word matching Stroop task in an event-related design, under congruent and incongruent experimental conditions (Stroop, 1935;MacLeod, 1991). Subjects wearing a cap that holds the optical probes were seated in front of a computer screen on which two rows of letters were displayed. The subjects were instructed to determine whether the colour of the top row letters corresponded to the colour name written on the bottom row. In the congruent condition, the colour word at the top row was presented in the congruent colour, whereas for the incongruent condition, the colour word was presented in a different colour, as shown in Fig. 1. Stimuli were presented randomly in 80 congruent and 20 incongruent combinations of four colour words "red", x' indicate optical source and detector. 'number' on white background indicates a functional near-infrared spectroscopy (fNIRS) channel position defined as a midpoint between a source and detector. The channel positions are transformed into the Montreal Neurological Institute (MNI) space, and then projected onto the surface of a volume rendered brain (Okamoto et al., 2004;Singh et al., 2005). 'letter + number' on black background indicates an electroencephalography (EEG) electrode position in the 10-10 system (Chatrian et al., 1985;Jurcak et al., 2007), an extension to the international 10-20 system (Jasper, 1958).
"blue", "yellow", and "green". Each stimulus was presented until a response was given, with a maximum allowed response time of 2 s. Interstimulus interval was randomly selected in a range of 9-12 s. One of the objectives of this study was to understand brain activity associated with the suppression of automatic responses. Previous studies showed activations in middle frontal gyrus (MFG), inferior frontal gyrus (IFG), and anterior cingulate cortex while performing the Stroop task (MacLeod and MacDonald, 2000;Leung et al., 2000).
Optical density changes during the above Stroop task were acquired using a multichannel fNIRS optical topography system (ETG-4000, Hitachi Medical Corporation, Kashiwa, Japan). The fNIRS system had 52 channels, consisting of 17 optical sources with wavelengths of 695 nm and 830 nm, and 16 optical detectors. The sampling frequency was 10 Hz. The distance between the optical source and detector was 3 cm. The geometry of the optical probes covering the prefrontal area is shown in Fig. 2. EEG electrode positions in the 10-10 system (Chatrian et al., 1985;Jurcak et al., 2007) are additionally displayed in Fig. 2(b), to provide the standard cranial landmarks.
The first-level (individual subject) analysis was performed with the SPM for fNIRS toolbox (https://www.nitrc.org/projects/spm fnirs/). Specifically, the hemoglobin concentration changes for each channel were calculated from fNIRS data using the modified Beer-Lambert law (Delpy et al., 1988). Time series of hemoglobin changes were preproprocessed with the following steps: (i) motion artifact was reduced using a method based on moving standard deviation and spline interpolation (Scholkmann et al., 2010); (ii) physiological noise, including respiration and cardiac pulsation, was removed using a band-stop filter with stopband frequencies of 0.12-0.35 Hz and 0.7-2.0 Hz, respectively. In both cases, we used a fifth-order infinite impulse response (IIR) Butterworth filter (Oostenveld et al., 2010); and (iii) very low-frequency confounds were removed using a high-pass filter based on a discrete cosine transform set with a cutoff frequency of 1/64 Hz. The data were downsampled to 1 Hz and whitened using the AR (1) model (Purdon and Weisskoff, 1998;Friston et al., 2002). In this  (Perrin et al., 1989;Oostenveld et al., 2010;Litvak et al., 2011). The implied smoothing in this interpolation blurs effects that are focal in space, and ensures overlap among a group of subjects. study, fNIRS data was recorded using optical source and detector separated by 3 cm, and therefore hemoglobin response analysed from each channel measurement contained signals arising from both cerebral and extracerebral compartments (i.e. cerebral blood flow and scalp blood flow) (Hirasawa et al., 2015;Selb et al., 2014). This would result in reduced sensitivity of fNIRS to changes in the cortex versus changes in the extracerebral tissue. However, when a short source-detector separation channel (sensitive to superficial layers only) is available, extracerebral effects can be reduced and separated from the fNIRS signal using state-space modelling with Kalman filter (Gagnon et al., 2011) and independent component analysis (Funane et al., 2014).
In the spatial preprocessing step, channel positions in subject space were transformed into corresponding positions in MNI space using the NFRI functions (Okamoto et al., 2004;Singh et al., 2005). The GLM for each subject was then fitted to the channelspecific responses. The first-level design matrix was constructed to include two regressors modelling two conditions (congruent and incongruent events) convolved with a basis set consisting of the canonical hemodynamic response function and its temporal derivative, which allowed to take into account of the variability in the delay of the peak of the response. In our approach, the canonical hemodynamic function was characterised by a double gamma function (onset delay, 0 s; peak delay, 6 s; undershoot delay, 16 s) (Boynton et al., 1996;Friston et al., 1998). However, it is also possible to utilise an adaptive hemodynamic response function in the current framework, to adjust parameters for temporally different behaviours of HbO and HbR (Uga et al., 2014). Overall, the firstlevel analysis allowed us to estimate the effects of interest for each subject, defined with the relevant contrasts of the GLM parameter estimates, from the channel-specific fNIRS response.
Individual topographic images for each of the contrasts were made by interpolating the channel-wise contrast on the 2D regular grid (64 pixels in each spatial direction), and 3D triangular mesh (2562 vertices), both representations of canonical scalp surface. Specifically, 2D contrast images were produced by using linear interpolation between channels. The images were smoothed using a Gaussian kernel with full width at half maximum (FWHM) of 14 mm, to make the error field a reasonable lattice approximation of a random field with a multivariate Gaussian distribution. In 3D topographic mapping, spherical splines with the order of 4 were applied to the channel-wise contrasts. The procedures for computing 2D and 3D contrast images per subject are summarised in Fig. 3.
The contrast images of all subjects from the first-level were then analysed as a random-effects analysis via summary-statistics implemented in SPM12 (Penny and Holmes (2006), http://www.fil. ion.ucl.ac.uk/spm/software/spm12/). The random-effects analysis allowed fNIRS topographic images to be used for making inference about the population from which subjects were drawn. A schematic diagram describing the random-effects analysis using summarystatistics for fNIRS is shown in Fig. 4.

Results
Effects of the colour-word matching Stroop task were identified in a population using random-effects analysis, as shown in Fig. 5. Two contrast images per subject, containing the effects of congruent and incongruent conditions, were computed from the HbO response by applying spatial interpolation to the channel-wise contrast values. The statistical significance of each effect in the population was then assessed using one-sample t-tests at the second level. The degrees of freedom for these tests are df = 20.
The statistical parametric maps (SPMs) were thresholded using RFT (Taylor and Worsley, 2007;Friston et al., 2007). These results were compared to results obtained from conventional procedures by analysing data from each channel separately and then applying a Bonferroni correction (Ciftci et al., 2008;Schroeter et al., 2002;Fekete et al., 2011). Fig. 5 shows T-statistic maps thresholded at p < 0.02, correcting for multiple comparisons over (a) all channels using the Bonferroni correction, and (b) all voxels/vertices on the canonical scalp surface (search region) using RFT. The search Fig. 4. Schematic diagram describing the random-effects analysis using summarystatistics for functional near-infrared spectroscopy (fNIRS). The experimental effect of interest is captured using a single contrast image per subject, and its statistical significance is assessed using a one-sample t-test design at the second level (Penny and Holmes, 2006). The generality of this approach stems from the fact that the summary images themselves can capture main effects or interaction effects from factorial designs (Winer et al., 1991;Penny and Henson, 2006). region contained 1609 voxels for 2D SPMs and 291 vertices for 3D SPMs. The cluster-forming height thresholds according to RFT (T 2D c = 3.74, T 3D c = 3.66) were lower than the Bonferroni threshold (T = 3.96). Therefore, the RFT was more sensitive than the Bonferroni correction, which allowed us to observe more activation during the Stroop task.
In the context of the RFT, topological features of the excursion set were assessed controlling the family-wise false positive rate of peaks (a local maximum) or clusters (a connected component of the excursion set) (Friston et al., 1994;Flandin and Friston, 2015). We observed a significant cluster containing bilateral frontopolar area and left inferior/middle frontal areas during the incongruent condition (p 2D FWE < 0.0002, p 3D FWE < 0.0003), as shown in the right column of Fig 5b. Maximum t-value in each cluster was located at the left middle frontal area, and the channel nearest to the peak was 48 (p 2D FWE < 0.002, p 3D FWE < 0.001), where p FWE denotes the family-wise error rate corrected over the search region. A similar pattern of activation was observed during the congruent condition, albeit with fewer voxels significantly activated, as shown in the left column of Fig. 5b.
Details of significance thresholds and peak/cluster-level inferences are summarised in Tables 1 and 2. Importantly, the number of resolution elements (RESELS) in the search region is very similar for 2D versus 3D SPMs (Taylor and Worsley, 2007;Friston et al., 2007).

Table 1
Statistical results of effects of congruent condition. Random field theory computes the cluster-forming height threshold Tc that gives a corrected p-value (for peaks) of p < 0.02. The size of each cluster is then the number of contiguous voxels/vertices above Tc. We report the number of resolution elements (RESELS) in the search region, the number of voxels kE in each cluster, the maximum t-value in each cluster, T, and the family-wise error rate of either cluster-level or peak-level effects, pFWE−corr.  The statistical significance of each effect in the population was assessed using one-sample t-tests at the second level. The resulting statistical parametric maps were thresholded at p < 0.02. The p-value was corrected for multiple comparisons over (a) all sensors using the Bonferroni correction, and (b) all voxels/vertices on the canonical scalp surface (search region) using random field theory (RFT) (Taylor and Worsley, 2007;Friston et al., 2007Friston et al., , 1994Kilner and Friston, 2010;Flandin and Friston, 2015). The cluster-forming height thresholds according to RFT (T 2D c = 3.74, T 3D c = 3.66) were lower than the Bonferroni threshold (T = 3.96). The RFT was therefore more sensitive than the Bonferroni correction, which allowed us to observe more activation during the Stroop task. We observed a significant cluster containing bilateral frontopolar area and left inferior/middle frontal areas during the incongruent condition (p 2D FWE < 0.0002, p 3D FWE < 0.0003). Maximum t-value in each cluster was located at the left middle frontal area, and the channel nearest to the peak was 48 (p 2D FWE < 0.002, p 3D FWE < 0.001). A similar pattern of activation was observed during the congruent condition, albeit with fewer voxels significantly activated. pFWE denotes the familywise error rate corrected over the search region.

Table 2
Statistical results of effects of incongruent condition. Random field theory computes the cluster-forming height threshold Tc that gives a corrected p-value (for peaks) of p < 0.02. We report the number of resolution elements (RESELS) in the search region, the number of voxels kE in each cluster, the maximum t-value in each cluster, T, and the family-wise error rate of either cluster-level or peak-level effects, pFWE−corr.

Cluster-level
Peak-level   Statistical results of effects of difference between incongruent and congruent conditions. Random field theory computes the cluster-forming height threshold Tc that gives a corrected p-value (for peaks) of p < 0.05. We report the number of resolution elements (RESELS) in the search region, the number of voxels kE in each cluster, the maximum t-value in each cluster, T, and the family-wise error rate of either cluster-level or peak-level effects, pFWE−corr.  Table 3 A one-sample t-test was additionally performed (df = 20), to make an inference about the difference between incongruent and congruent effects in the population. The resulting SPM was thresholded at p < 0.05, correcting for multiple comparisons over all channels using the Bonferroni correction, and over all voxels/vertices on the canonical scalp surface (search region) using RFT. The Bonferroni threshold (T = 3.57) was intermediate between the RFT thresholds for 2D and 3D SPMs (T 2D c = 3.61, T 3D c = 3.55). However, the implied spatial smoothing in the 2D and 3D SPMs significantly reduced standard deviation of group-level parameters over subjects ( std[ˆG]), which led to increases in T-values at individual sensors and nearby voxels in 2D and 3D SPMs (see Fig. 6). Therefore, although the largest detected T-value at individual sensors did not exceed the Bonferroni threshold, we observed significantly activated voxels surrounded by sensors 27, 39, and 48 after thresholding using RFT. Overall, the RFT approach is attractive in that it automatically adjusts for the statistical dependencies in the data. This is reflected, for example, in the RESEL count with more RESELS for the difference test (e.g., 6.4 for 3D reflecting rougher images) than for the individual conditions (e.g., 3.5 for 3D incongruent condition).

RESELS
A cluster of significant effects was observed in the left frontopolar area (p 2D FWE < 0.042, p 3D FWE < 0.011), as shown in Fig. 7. Maximum t-value in each cluster was located at the left frontopolar area, and the channel nearest to the peak was 48 (p 2D FWE < 0.038, p 3D FWE < 0.005). These results indicate that activation in the left frontopolar area was significantly stronger during incongruent than congruent conditions. Our results correspond to the findings of previous studies which have shown that left prefrontal regions, including frontopolar and inferior frontal cortices, were the most strongly activated area occurred for the incongruent−congruent contrast, while bilateral activation in the frontopolar and inferior/middle frontal regions were observed during the activity in each of incongruent and congruent conditions individually (Schroeter et al., 2002;Ehlis et al., 2005;Byun et al., 2014;MacLeod and MacDonald, 2000;Leung et al., 2000).

Discussion
In this paper, we have shown how fNIRS data from multiple subjects can be analysed in sensor space using random-effects analysis (Penny and Holmes, 2006) widely used in various neuroimaging modalities including fMRI, PET, and MEG. The experimental effects of interest are captured using one or more contrast images per subject, and their statistical significance over the population the group level parameters obtained from the proposed method by analysing data from spatially interpolated contrast images. The implied spatial smoothing in the 2D and 3D SPMs significantly reduced standard deviation of group-level parameters over subjects ( std[ˆG]), which led to increases in T-values at individual sensors and nearby voxels in 2D and 3D SPMs. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. 7. Effects of Stroop interference (incongruent-congruent conditions) in a population. A one-sample t-test was performed, to make inference about the difference between incongruent and congruent effects in the population. The resulting statistical parametric map was thresholded at p < 0.05, correcting for multiple comparisons over all voxels/vertices on the canonical scalp surface (search region). A significant cluster was observed in the left frontopolar area (p 2D FWE < 0.042, p 3D FWE < 0.011). Maximum t-value in each cluster was located at the left frontopolar area, and the channel nearest to the peak was 48 (p 2D FWE < 0.038, p 3D FWE < 0.005). These results indicate that activation in the left frontopolar area was significantly stronger during incongruent than congruent conditions, and agrees with previous neuroimaging findings (Schroeter et al., 2002;Ehlis et al., 2005;Byun et al., 2014;MacLeod and MacDonald, 2000;Leung et al., 2000). assessed using the summary statistics approach to random-effects analysis. We illustrated the methods using fNIRS data recorded during a colour-word matching Stroop task (n = 21 subjects). Specifically, by applying random-effects analysis to fNIRS data, we showed that left frontopolar regions were significantly activated in the population for incongruent−congruent contrast in a colour-word matching Stroop task. This result agrees with previous neuroimaging studies using fMRI, PET, and fNIRS that have previously detected regionally specific effects in the prefrontal area related to Stroop effects (Schroeter et al., 2002;Ehlis et al., 2005;Byun et al., 2014;MacLeod and MacDonald, 2000;Leung et al., 2000).
We have generated sensor space contrast images on the 2D and 3D canonical scalp surfaces (Jasper, 1958;Litvak et al., 2011). Linear interpolation with Gaussian spatial smoothing and spherical splines were used in 2D and 3D topographic mapping respectively, to accommodate potential misalignment of channel locations and spatial variability over subjects (Perrin et al., 1989;Oostenveld et al., 2010;Litvak et al., 2011). The 3D approach may be perceived as a more natural space in which to view the data as it provides a better reflection of the underlying physical reality (i.e. sensors are located on a 2D manifold in a 3D space) (Cutini et al., 2011). The viewpoint can be changed by rotating the 3D canonical scalp surface on the screen. This allows the user to select a specific view showing where experimental effects mainly appear (note that statistical results are entirely independent of the viewpoint). The 2D approach displays experimental effects on a planar and circular surface that accords with the international 10-20 system. This has the benefit of being more familiar to users of, for example, EEG (Jasper, 1958), and does not require the user to additionally specify a point of view. Additionally, 2D contrast images are much quicker to generate than 3D contrast images. For example, on a desktop PC running Windows 8.1 (64 bit) with an Intel Xeon 3.0 GHz, a single 2D image takes 1-2 s, whereas a 3D image takes approximately 1 min.
We have applied the random-effects analysis using summary statistics to fNIRS topographic images, to make inference about the population effects Penny and Holmes, 2006). Statistical significance of regionally specific effects was then assessed using RFT (Taylor and Worsley, 2007;Kilner and Friston, 2010;Flandin and Friston, 2015). Compared with the full mixed effects model, the summary statistic approach is computationally much simpler to implement, and gives identical results unless subjects have widely varying within-subject error variance or number of trials, whereas the full mixed effects model does not require assumptions of equal within-subject error variances (Penny and Holmes, 2006;Mumford and Nichols, 2009). These are the reasons why we applied classical summary statistic approach to fNIRS topographic images.
A previous fNIRS study introduced several group analysis methods for channel-specific fNIRS responses, and compared the results from summary statistic approaches with those obtained from Bayesian analysis of the concatenated individual fNIRS data (Ciftci et al., 2007(Ciftci et al., , 2008. They showed that summary statistic approaches, including mixed-effects and random-effects analysis, gave similar results to the Bayesian approach (see Fig. 2 in Ciftci et al. (2008)) and within-subject error variance was much lower than betweensubject error variance for fNIRS signals, which supports the validity of our approach.
Several fNIRS studies used the mixed-effects model for the group analysis of fNIRS topographic maps (Fekete et al., 2011;Tak and Ye, 2014). Specifically, Fekete et al. (2011) made inference about population effects from visual task data using a two-level mixed-effects model (Beckmann et al., 2003). However, RFT was not employed to control the false positive rate of topological features when making inference on spatially interpolated data. Tak and Ye (2014) applied a full mixed-effects model (Searle, 1979;Cnaan et al., 1997) to the concatenated individual fNIRS data. While the full mixed-effects model makes fewer assumptions in estimating the error variances than the summary statistic approach, equivalence between two-level and single-level mixed effects models would not hold for the interpolated parameter estimates. It is therefore more computationally demanding, and the analysis needs to be repeated for all subjects, if a new subject is included in a group of subjects.
Finally, we have focused on producing sensor space contrast images on a canonical scalp surface. While sensor space approaches enable us to infer regionally specific effects on the canonical scalp surface, several studies have also performed source reconstruction for generating depth-resolved images of the hemoglobin concentration changes, which is referred to as diffuse optical tomography (DOT) (Gibson et al., 2005;Abdelnour and Huppert, 2011;Hassanpour et al., 2014;Aasted et al., 2015). Specifically, Abdelnour and Huppert (2011) described the randomeffects model for concatenated optical measurements, and source parameters at the group level were then estimated using the restricted maximum likelihood (ReML) method (Mattout et al., 2006). Hassanpour et al. (2014) applied SPM methods in the framework of general linear model  to high density DOT data, to make inference about regionally specific effects on the cortical regions with relatively high spatial resolution.
Although computation of contrast images in source space is beyond the scope of this work, it is worth noting that dynamic causal modelling (DCM) for fNIRS is a form of source space analysis (Friston et al., 2003;Tak et al., 2015). DCM-fNIRS is based on forward models relating optical measurements to hemodynamic activities and underlying neuronal interactions at specified point source locations. Inversion of these models, using an established Bayesian framework, then enables inference about regional activity and directed connectivity changes among source locations at the neuronal level (Friston et al., 2007b;Penny et al., 2010). For example, in the colour-word matching Stroop task, one might use DCM-fNIRS to investigate how a fronto-cingulate network including dorsolateral prefrontal cortex and rostral part of anterior cingulate cortex is modulated by the Stroop interference effect (Schlösser et al., 2008).