Non-invasive laminar inference with MEG: Comparison of methods and source inversion algorithms

Magnetoencephalography (MEG) is a direct measure of neuronal current flow; its anatomical resolution is therefore not constrained by physiology but rather by data quality and the models used to explain these data. Recent simulation work has shown that it is possible to distinguish between signals arising in the deep and superficial cortical laminae given accurate knowledge of these surfaces with respect to the MEG sensors. This previous work has focused around a single inversion scheme (multiple sparse priors) and a single global parametric fit metric (free energy). In this paper we use several different source inversion algorithms and both local and global, as well as parametric and non-parametric fit metrics in order to demonstrate the robustness of the discrimination between layers. We find that only algorithms with some sparsity constraint can successfully be used to make laminar discrimination. Importantly, local t-statistics, global cross-validation and free energy all provide robust and mutually corroborating metrics of fit. We show that discrimination accuracy is affected by patch size estimates, cortical surface features, and lead field strength, which suggests several possible future improvements to this technique. This study demonstrates the possibility of determining the laminar origin of MEG sensor activity, and thus directly testing theories of human cognition that involve laminar- and frequency-specific mechanisms. This possibility can now be achieved using recent developments in high precision MEG, most notably the use of subject-specific head-casts, which allow for significant increases in data quality and therefore anatomically precise MEG recordings. Section Analysis methods. Classifications Source localization: inverse problem; Source localization: other.


Introduction
Modern theories of brain organization and function increasingly incorporate the laminar organization of cortical projections and oscillatory signatures of neural activity (Adams et al., 2013;Arnal and Giraud, 2012;Bastos et al., 2012;Jensen et al., 2015;Wang, 2010). While high resolution functional magnetic resonance imaging (fMRI) can resolve laminar-specific activity (Chen et al., 2013;Goense et al., 2012;Guidi et al., 2016;Huber et al., 2015;Kok et al., 2016;Koopmans et al., 2011Koopmans et al., , 2010Olman et al., 2012;Scheeringa et al., 2016), it cannot measure dynamics at a millisecond time-scale. Being a completely non-invasive and direct measure of neuronal activity capable of such temporal resolution, magnetoencephalography (MEG) is an attractive option for testing such theories (Baillet, 2017). While MEG has excellent temporal resolution, its spatial resolution is limited by subject movement and co-registration error Barnes, 2011, 2003;Medvedovsky et al., 2007;Uutela et al., 2001). With recently developed high precision MEG using head-cast technology, it is possible to address both issues and record higher quality MEG data than previously achievable (Liuzzi et al., 2016;Meyer et al., 2017;Troebinger et al., 2014aTroebinger et al., , 2014b. Simulations have shown that such high quality data make it theoretically possible to distinguish the MEG signal originating from either deep or superficial laminae (Troebinger et al., 2014a). However, previous attempts at laminar localization have used global measures of model fit that cannot infer the laminar origin of activity in a spatially specific way, using regions of interest (ROIs). Additionally, multiple source inversion algorithms exist to link extra-cranial electromagnetic activity measured at the sensors to cortical sources. Each of these algorithms uses different assumptions to constrain the source estimation, further complicating the validation of laminar specific MEG by making it unclear which might be most suited for laminar-specific analysis.
We here develop whole brain and ROI analyses using multiple source inversion algorithms for classifying MEG signals as originating from either deep or superficial laminae, test them using simulated laminar data, and compare them in terms of their classification performance. To this end, we simulated sensor data for source activity at locations on either the pial or white matter cortical surface (representing superficial and deep cortical laminae, respectively). We then measured the accuracy of two types of analyses (whole-brain or ROI) in determining, based on the sensor data alone, the correct origin of the source. The whole brain analysis reconstructs the sensor data onto the pial and white matter cortical surfaces and then compares the fit of the two models (Troebinger et al., 2014a). This provides an overall measure of model fit, but cannot provide spatially-specific comparisons. To address this limitation, the ROI analysis reconstructs the data onto both pial and white matter surfaces simultaneously, computes an ROI based on the change of activity on either surface from a baseline time window, and compares the reconstructed activity within the ROI between the two surfaces. We tested four different commonly used functional priors: minimum norm (IID; Ilmoniemi, 1984, 1994), LORETA (COH;Pascual-Marqui, 1999;Pascual-Marqui et al., 1994), empirical Bayes beamformer (EBB; Belardinelli et al., 2012;L opez et al., 2014), and multiple sparse priors (MSP; Friston et al., 2008). These functional priors each embody different assumptions about the distribution of current flow across the cortex, from complete independence (IID), to locally coherent and distributed (COH), to uncorrelated in time (EBB), to locally coherent and sparse (MSP).

MRI acquisition
Data were acquired from a single volunteer with a 3T whole body MR system (Magnetom TIM Trio, Siemens Healthcare, Erlangen, Germany) using the body coil for radio-frequency (RF) transmission and a standard 32-channel RF head coil for reception. A quantitative multiple parameter map (MPM) protocol, consisting of 3 differentially-weighted, RF and gradient spoiled, multi-echo 3D fast low angle shot (FLASH) acquisitions and 2 additional calibration sequences to correct for inhomogeneities in the RF transmit field (Callaghan et al., 2015;Lutti et al., 2012Lutti et al., , 2010, was acquired with whole-brain coverage at 800 μm isotropic resolution. The FLASH acquisitions had predominantly proton density (PD), T1 or MT weighting. The flip angle was 6 for the PD-and MT-weighted volumes and 21 for the T1 weighted acquisition. MT-weighting was achieved through the application of a Gaussian RF pulse 2 kHz off resonance with 4 ms duration and a nominal flip angle of 220 prior to each excitation. The field of view was 256 mm head-foot, 224 mm anterior-posterior (AP), and 179 mm right-left (RL). Gradient echoes were acquired with alternating readout gradient polarity at eight equidistant echo times ranging from 2.34 to 18.44 ms in steps of 2.30 ms using a readout bandwidth of 488 Hz/pixel. Only six echoes were acquired for the MT-weighted acquisition in order to maintain a repetition time (TR) of 25 ms for all FLASH volumes. To accelerate the data acquisition, partially parallel imaging using the GRAPPA algorithm was employed with a speed-up factor of 2 in each phase-encoded direction (AP and RL) with forty integrated reference lines.
To maximise the accuracy of the measurements, inhomogeneity in the transmit field was mapped by acquiring spin echoes and stimulated echoes across a range of nominal flip angles following the approach described in Lutti et al. (2010), including correcting for geometric distortions of the EPI data due to B0 field inhomogeneity. Total acquisition time for all MRI scans was less than 30 min.
Quantitative maps of proton density (PD), longitudinal relaxation rate (R1 ¼ 1/T1), magnetisation transfer saturation (MT) and effective transverse relaxation rate (R2* ¼ 1/T2*) were subsequently calculated according to the procedure described in Weiskopf et al. (2013). FreeSurfer (v5.3.0;Fischl, 2012) was used to extract cortical surfaces from the multi-parameter maps. Use of multi-parameter maps as input to FreeSurfer can lead to localized tissue segmentation failures due to boundaries between the pial surface, dura matter and CSF showing different contrast compared to that assumed within FreeSurfer algorithms (Lutti et al., 2014). Therefore, an in-house FreeSurfer surface reconstruction procedure was used to overcome these issues, using the PD and T1 volumes as inputs. Detailed methods for cortical surface reconstruction can be found in Carey et al. (2017). This process yields surface extractions for the pial surface (the most superficial layer of the cortex adjacent to the cerebro-spinal fluid, CSF), and the white/grey matter boundary (the deepest cortical layer). Each of these surfaces is downsampled by a factor of 10, resulting in two meshes comprising 33,596 vertices each. For the purpose of this paper, we will use these two surfaces to represent deep (white/grey interface) and superficial (grey-CSF interface) cortical models. Cortical thickness was computed as the distance between linked vertices on the pial and white matter surfaces (Kabani et al., 2001;Lerch and Evans, 2005a;MacDonald et al., 2000), and smoothed over each surface with a Gaussian kernel (FHWM ¼ 8 mm). Mean surface curvature, a measure of local cortical folding, was computed as the mean of the two principal curvatures at each vertex (Davatzikos and Bryan, 1996;Griffin, 1994;Joshi et al., 1995;Luders et al., 2006;Van Essen and Drury, 1997). Sulcal depth was computed using the CAT12 toolbox (http://dbm.neuro.uni-jena.de/ cat/) to generate a convex hull surface from the pial surface, and then computing the Euclidean distance between each vertex and the nearest vertex on hull surface (Im et al., 2006;Tosun et al., 2015;Van Essen, 2005).

Simulations
We tested the efficacy of each analysis method using synthetic data sets. All simulations were based on a single dataset acquired from a real experimental recording of the same human participant that the MRI scans were acquired from using a CTF 275 channel Omega system. The MEG sensor data from this dataset were discarded; the dataset was only used to determine the sensor layout, sampling rate (1200 Hz, downsampled to 250 Hz), number of trials (515), and number of samples (1251) for the simulations. All simulations and analyses were implemented using the SPM12 software package (http://www.fil.ion.ucl.ac.uk/spm/software/ spm12/) and are available at http://github.com/jbonaiuto/laminar_sim.
In each simulation, we specified a simulated source centered at a vertex on either the pial or white matter surface. We simulated sinusoidal activity profiles of 20 Hz with a patch size of FWHM ¼ 5 mm over a time window from 100 to 500 ms, and used a single shell forward model (Nolte, 2003) to generate a synthetic dataset from the simulated activity. We chose 60 random vertices on each surface to simulate sources at, giving a total of 120 synthetic datasets (60 sources simulated on the pial surface, and 60 on the white matter surface; Fig. 1). Each simulation consisted of a single dipole with a moment of 10 nAm unless otherwise specified. Typical per-trial SNR levels for MEG data range from À40 to À20 dB (Goldenholz et al., 2009), and therefore Gaussian white noise was added to the simulated data and scaled in order to yield per-trial amplitude SNR levels (averaged over all sensors), of À100, À50, À20, À5, À0 or 5 dB in order to generate synthetic datasets across a range of realistic SNRs.

Analyses for laminar discrimination
We compared two methods for determining the laminar locus of simulated activity: a whole brain and a region of interest (ROI) analysis ( Fig. 2). Each analysis computed 4 different models to explain the simulated data (IID, COH, EBB, and MSP), each using different functional priors expressing common MEG inversion assumptions.
The whole brain analysis reconstructed the simulated data (with sensor noise) separately onto each of the surface models (the pial and the white matter) and compared the fit between the two models using either free energy (Troebinger et al., 2014a) or cross validation error. Free energy is a parametric metric that rewards fit accuracy and penalizes model complexity (Friston et al., 2007), providing a lower bound for the log model evidence value (Penny et al., 2010). Cross validation involves partitioning the data into training and test portions. The idea is to fit models to the training data and then to compare models based on their accuracy in predicting the test data. Models which are too complex will over-fit the training data (i.e. fit the noise) and therefore perform poorly on the test data. Conversely, models which are too simple will not be able to explain the training or the test data. In other words, cross validation involves a similar accuracy-complexity trade-off to free energy, but is calculated on the basis of the sensor-level time-courses, rather than the distance between the means of the prior and posterior distributions. This in turn means that while free energy is dependent on the prior distribution specified (including the variance thereof), cross validation is not. We computed the average 10-fold cross validation error by excluding 10% of the sensors from the source reconstruction and computing the error in predicting the missing sensor data using the resulting model. The error in each fold was defined as the root mean square error (RMSE) of the sensor data predictions, expressed as a percentage of the root mean square (RMS) measured sensor data, averaged over each excluded sensor. Much like arguments for parametric and non-parametric statistics, the free energy approximation is more powerful (as it uses all the data) when the underlying assumptions are met, whereas cross validation is not quite as sensitive, is more time consuming, yet is robust.
The ROI analysis reconstructed the data (with sensor noise) onto a mesh combining the pial and white matter surfaces, thus providing an estimate of source activity on both surfaces. We defined an ROI by comparing power in the 10-30 Hz frequency band during the time period containing the simulated activity (100 ms-500 ms) with a prior baseline period (À500ms to À100 ms) at each vertex using paired t-tests. Vertices in either surface with a t-statistic in the 75th percentile of the t-statistics over all vertices in that surface, as well as the corresponding vertices in the other surface, were included in the ROI. This ensured that the contrast used to define the ROI was orthogonal to the subsequent pial

Fig. 2. Simulation and analysis protocol.
Pial and white matter surfaces are constructed based on an MPM volume. Sources are simulated as patches of 20 Hz activity on random vertices of either the pial or white matter surface. The whole brain analysis computes separate generative models for each surface and compares these models in terms of free energy or cross validation error. The ROI analysis creates a single generative model combining both surfaces. It then creates an ROI localization of baseline-corrected increase in power, and compares estimated power at corresponding vertices between the surfaces within that ROI.
versus white matter surface contrast. For each trial, ROI values for the pial and white matter surfaces were computed by averaging the absolute value of the change in power compared to baseline in that surface within the ROI. Finally, a paired t-test was used to compare the ROI values from the pial surface with those from the white matter surface over trials. All ttests were performed with corrected noise variance estimates in order to attenuate artifactually high significance values (Ridgway et al., 2012). Two-sided binomial tests were used to compare the accuracy in classifying simulated sources as originating from the correct surface, as well as bias toward the pial surface, with chance levels (50%). The whole brain and ROI analyses as well as source reconstruction algorithms were compared in terms of their classification accuracy using exact McNemar's tests (McNemar, 1947). Correlations between free energy and cross validation error differences, and ROI t-statistics were evaluated using Spearman's rho tests. The free energy difference was computed as the free energy for the pial surface model minus that of the white matter surface model, while the cross validation error difference was computed as the cross validation error for the white matter surface model minus that of the pial surface model. This ensured that for both metrics, positive values indicate a better fit metric for the pial surface model. Relationships between the difference in free energy between the correct and incorrect surface models, and cortical surface statistics such as cortical thickness, mean surface curvature, sulcal depth, and lead field strength were evaluated using Spearman's rho tests. Because the surface statistics were all potentially correlated with each other, the correlation coefficients were compared using Meng's test for correlated correlation coefficients (Meng et al., 1992), followed up by pairwise Z-tests.
The empirical Bayes optimization rests upon estimating hyperparameters which express the relative contribution of source and sensor level covariance priors to the data (L opez et al., 2014). For all algorithms we assumed the sensor level covariance to be an identity matrix. For the EBB and COH algorithms there is a single source level prior which is either estimated from the data (EBB) or fixed (COH). There are therefore only two hyper-parameters to estimatedefining the relative contribution of the source and sensor level covariance components to the data. For the MSP algorithm, there is a different source covariance prior for every possible patch. We used a total of 90 patches, 60 of which were used as locations for potential simulated sources, plus 30 patches at random vertices. This does give MSP a considerable advantage (see Discussion) but factors out computational issues in the optimization.

Laminar source discrimination
In the whole brain analysis, we computed a difference in free energy between the pial and white matter generative models, approximating the log ratio of the model likelihoods. This resulted in a metric which is positive or negative if there is more evidence for the pial or white matter model, respectively. Similarly, the ROI analysis produced a t-statistic which was positive when the change in power was greater on the pial surface, and negative when the change was greater on the white matter surface. The free energy difference and ROI t-statistics for each simulation using each source inversion algorithm are shown in Fig. 3. For the EBB and MSP algorithms (Fig. 3A, D), most of the sources simulated on the pial surface resulted in positive free energy differences and t-statistics, while most of those simulated on the white matter surface yielded negative metrics (EBB whole brain: accuracy ¼ 93.33%, p < 0.0001; EBB ROI: accuracy ¼ 99.17%, p < 0.0001; MSP whole brain: accuracy ¼ 100%, p < 0.0001; MSP ROI: accuracy ¼ 100%, p < 0.0001). In contrast, the COH and IID algorithms were biased deep for free energy (COH: white matter ¼ 69.17%, p < 0.0001; IID: white matter ¼ 67.5%, p < 0.0005) and superficial for the ROI analysis (COH: pial ¼ 100%, p < 0.0001; IID: pial ¼ 100%, p < 0.0001) regardless of the layer on which sources were simulated. Thus, both the EBB and MSP versions of both the whole brain and ROI analyses were able to distinguish between white matter and pial sources, but the COH and IID algorithms could not ( Fig. 3B and C).
In order to quantify the agreement between the different metrics used (free energy, cross validation error, and ROI t-statistic), we analyzed pairwise correlations between them. The difference in free energy from the whole brain analysis and the t-statistic from the ROI analysis were correlated for the EBB algorithm (ρ(118) ¼ 0.75, p < 0.001; Fig. 4A). There was a considerable separation between the MSP t-statistic distributions for pial and white matter sources, so we therefore considered the pial and white matter simulation sources separately. Correlations between the free energy difference and ROI analysis t-statistic were still significant for sources on both surfaces (pial: ρ(58) ¼ 0.48, p < 0.0005, white matter: ρ(58) ¼ 0.3, p ¼ 0.02; Fig. 4B). We compared the difference in cross validation error between the white matter and pial surface models with the difference in free energy in order to verify that the results of the whole brain analysis were the same with an independent metric. We found that the cross validation error difference was highly correlated with both the free energy difference (EBB: ρ(118) ¼ 1.0, p < 0.0001; Fig. 4A; MSP: ρ(118) ¼ 0.98, p < 0.0001; Fig. 4B) and the tstatistic from the EBB version of the ROI analysis (ρ(118) ¼ 0.75, p < 0.0001; Fig. 4A). The cross validation error difference and ROI tstatistic were only significantly correlated for pial sources for the MSP algorithm (pial: ρ(58) ¼ 0.34, p ¼ 0.009, white matter: ρ(58) ¼ 0.18, p ¼ 0.173; Fig. 4B), but the two metrics predicted the same classification category (pial or white matter) for every simulated source.
At an SNR of À20 dB, the EBB and MSP versions of the whole brain and ROI analyses correctly classified most of the simulated sources, while the IID and COH versions performed at chance levels ( Fig. 3). In order to further evaluate the performance of the analyses, we simulated sinusoidal activity with varying levels of noise resulting in SNRs from À100 dB to 5 dB, as well as a control dataset containing only noise and no signal (SNR ¼ À∞dB). We first compared the bias of each analysis and source inversion algorithm by computing the percentage of sources classified as pial, using a threshold of ±3 for the free energy difference (meaning that one model is approximately twenty times more likely than the other) and a threshold of the critical t value with df ¼ 514 and α ¼ 0.05 for the ROI tstatistic. At low levels of SNR, all of the metrics were biased toward pial sources (SNR ¼ À100 dB, EBB whole brain: pial ¼ 85%, p < 0.0001; SNR ¼ À100 dB, EBB ROI: pial ¼ 100%, p < 0.0001; SNR ¼ À100 dB, MSP whole brain: pial ¼ 74.17%, p < 0.0001), except for the MSP version of the ROI analysis (SNR ¼ À100 dB: pial ¼ 42.5%, p ¼ 0.12), but these biases were not statistically significant (i.e. the free energy difference and t-statistics were greater than zero, but did not exceed the significance threshold). As SNR increased nearly all classifications exceeded the significance threshold, and the EBB and MSP versions of the whole brain and ROI analyses became unbiased (SNR ¼ 5 dB, EBB whole brain: pial ¼ 50%, p ¼ 1.0; SNR ¼ 5 dB, EBB ROI: pial ¼ 50%, p ¼ 1.0; SNR ¼ 5 dB, MSP whole brain: pial ¼ 50%, p ¼ 1.0; SNR ¼ 5 dB, MSP ROI: pial ¼ 50%, p ¼ 1.0; Fig. 5A, C). We then compared the accuracy of each analysis and source inversion algorithms over the range of SNRs.
Because we scaled the level of the white noise in order to achieve a fixed SNR level, there was the possibility that these results were biased due to boosting the signal from deep layer sources which were further away from the sensors. In order to address this, we ran additional control simulations in which we kept the source signal magnitude fixed at 20 nAm, and varied the level of the white noise from 10 to 2 Â 10 6 fT RMS. Consistent with the results of our main simulations, both the EBB and MSP versions of the whole brain and ROI analyses achieved at least 80% accuracy when the noise level fell below 100 fT RMS (EBB whole brain: 80.83%, p < 0.0001; EBB ROI: 90.83%, p < 0.0001; MSP whole brain: 99.17%, p < 0.0001; MSP ROI: 100%, p < 0.0001; Figure S1). For comparison, this corresponds to a per-trial SNR level of approximately À25 dB.
We tested the EBB and MSP versions of the whole brain and ROI analyses under more realistic noise assumptions by simulating sources added to resting state MEG data acquired from the same participant whose anatomy the simulations were based on. Data from a 10 min The difference in free energy between the pial and white matter generative models in each simulation (SNR ¼ À20 dB). Right column: T-statistics from the ROI analysis comparing pial and white matter ROIs for each simulation (SNR ¼ À20 dB) Each panel shows simulations with pial surface sources on the top row, and simulations with white matter surface sources on the bottom row. In both the whole brain and ROI analyses, the EBB (A) and MSP (D) source reconstruction algorithms were able to correctly classify simulated source activity as arising from either the deep or superficial surface (EBB whole brain: accuracy ¼ 93.33%, p < 0.0001; EBB ROI: accuracy ¼ 99.17%, p < 0.0001; MSP whole brain: accuracy ¼ 100%, p < 0.0001; MSP ROI: accuracy ¼ 100%, p < 0.0001). The IID (B) and COH (C) versions of the whole brain analysis were biased toward the deep surface (IID: white matter ¼ 67.5%; p < 0.0005; COH: white matter ¼ 69.17%, p < 0.0001), and toward the superficial surface in the ROI analysis (IID: pial ¼ 100%, p < 0.0001; COH: pial ¼ 100%, p < 0.0001).
resting state scan with the eyes open were downsampled to 250 Hz, epoched to match the simulated trial durations, and baseline corrected to remove the DC offset. We varied the strength of the simulated dipoles from 0.01 nAm to 300 nAm (resulting in a range of SNRs from approximately À100 dB to -5 dB), positioned them at the same vertices as the main simulations, and added their projected sensor activity to the resting state data ( Figure S2). The EBB and MSP versions of the whole brain and ROI analyses achieved significantly above chance accuracy when the dipole moment was at least 50 nAm (corresponding to an SNR of approximately À20 dB), MSP was more accurate than EBB, and the ROI analysis was more accurate than the whole brain analysis, consistent with the main simulation results ( Figure S3). Note, however, that at low dipole moments, both the EBB and MSP global metrics are deemed significant yet are biased towards the superficial surface (see Discussion). A) The percentage of sources classified as originating from the pial surface for the EBB version of the whole brain and ROI analyses, for each level of SNR tested. The error bars represent the standard error. The percentage of simulations with free energy differences or t-statistics exceeding the significance threshold is represented by the intensity of the line color. Asterisks show where the percentage is significantly above or below chance levels. Both analyses were biased toward the pial surface at low SNR levels, but not significantly. As SNR increased, nearly all of the classifications exceeded the significance threshold and both analyses became unbiased. B) The percentage of sources accurately classified by the EBB version of the whole brain and ROI analyses, over all tested SNR levels. Both analyses accurately classified at least 90% of the simulated sources at SNR ¼ À20 dB. C) and D) As in (A) and (B) for the MSP version of both analyses. The MSP version of the ROI analysis was unbiased even at low SNR levels and the whole brain analysis correctly classified at least 90% of the simulated sources at SNR ¼ À50 dB.

Surface anatomy
We next sought to determine what anatomical features of the cortical surface make it easier or more difficult to discriminate between white matter and pial surface sources. We therefore computed several surface statistics including cortical thickness (Kabani et al., 2001;Lerch and Evans, 2005a;MacDonald et al., 2000), surface mean curvature (Davatzikos and Bryan, 1996;Griffin, 1994;Joshi et al., 1995;Luders et al., 2006;Van Essen and Drury, 1997), sulcal depth (Im et al., 2006;Tosun et al., 2015;Van Essen, 2005), and lead field RMS (Hillebrand and Barnes, 2002), and examined the relationship between each measure and the difference in free energy between the correct and incorrect generative models in the whole brain analysis (ΔF ¼ F correct À F incorrect , Fig. 7). The distributions of cortical thickness, mean curvature, and sulcal depth closely matched previously published estimates (Fischl and Dale, 2000;Hutton et al., 2008;Jones et al., 2000;MacDonald et al., 2000;Tosun 6. Whole brain analysis performance with varying patch sizes. Blue lines denote simulations where the reconstructed patch size matches the simulated patch size (solid ¼ 5 mm, dashed ¼ 10 mm), red lines are where patch size is either under-(dashed red) or over-estimated (solid red). The percentage of sources significantly classified as either pial or white matter is represented by the intensity of the line color. Asterisks show where the percentage is significantly above or below chance levels. A) The percentage of sources classified as originating from the pial surface for the EBB version of the whole brain analysis, for each level of SNR tested. The error bars represent the standard error. As SNR increased, simulations where the patch size was correctly estimated became unbiased, while incorrect patch size estimates resulted in a bias toward the white matter surface. B) Incorrect patch size estimates resulted in reduced classification accuracy for the EBB version of the whole brain analysis. C) The MSP version of the whole brain analysis became biased toward the pial surface as SNR increased. D) Classification accuracy was reduced for the MSP version of the whole brain analysis when the patch size was under-or over-estimated. et al., 2015). Sulcal depth and lead field RMS were both highly correlated with ΔF (sulcal depth: ρ(118) ¼ -0.726, p < 0.0001; lead field RMS: ρ(118) ¼ 0.653, p < 0.0001), indicating that source activity was more easily classified as originating from the pial or white matter surface in sources closer to the scalp, and those which have a greater impact on the sensors. Cortical thickness and surface curvature were both weakly correlated with ΔF (cortical thickness: ρ(118) ¼ 0.288, p < 0.005; surface curvature: ρ(118) ¼ 0.294, p < 0.005), meaning that source activity was more easily classified as pial or white matter where there was greater distance between the surfaces, and at gyral rather than sulcal vertices. These surface metrics are not independent (e.g. both surface curvature and sulcal depth influence the lead field RMS), but a comparison of correlated correlation coefficients (Meng et al., 1992) revealed significant heterogeneity of the absolute correlation matrix (Х 2 (3, N ¼ 120) ¼ 48.72, p < 0.0001). Follow-up tests revealed that sulcal depth and lead field RMS were significantly more correlated with ΔF than either cortical thickness (sulcal depth: Z ¼ 5.6, p < 0.0001; lead field RMS: Z ¼ 4.37, p < 0.0001) or surface curvature (sulcal depth: Z ¼ 5.34, p < 0.0001; lead field RMS: Z ¼ 4.12, p < 0.0001) were, and there was no difference between the sulcal depth and lead field RMS correlation co- We also looked at the distribution of lead-field differences between matching pial/white vertices relative to their nearest neighbours on the same surface ( Figure S8) and found that generally (in 66.8% of vertices on the pial surface and 56.65% on the white matter surface) there was a smaller difference in lead-field magnitude between pial-white vertex pairs than their immediate same-surface neighbours.

Discussion
We here provide a comparison and evaluation of analysis techniques for non-invasive laminar specific inference in human neocortex with MEG. We found that, given sufficient SNR, both the whole brain model comparison analysis and the ROI analysis were able to distinguish Fig. 7. Relationship between cortical surface anatomy and laminar discriminability. First row (from left to right): cortical thickness plotted on the pial surface, cortical thickness distribution over the pial surface, free energy difference between the correct and incorrect generative model as a function of cortical thickness (red ¼ white matter sources, blue ¼ pial sources; EBB, SNR ¼ À20 dB). Second row: Surface curvature of the white matter and pial surfaces, relationship between white matter and pial surface curvature, free energy difference as a function of surface curvature. Third row: Sulcal depth of the white matter and pial surfaces, relationship between white matter and pial surface sulcal depth, free energy difference as a function of sulcal depth. Fourth row: Lead field RMS of the white matter and pial surfaces, relationship between lead field RMS of the white matter and pial surfaces, free energy difference as a function of lead field RMS.
J.J. Bonaiuto et al. NeuroImage 167 (2018) 372-383 between simulated activity originating on white matter versus pial surfaces, representing deep and superficial cortical laminae, from the sensor data alone. Importantly, we found mutually corroborating results from two inversion schemes (MSP and EBB) and three metrics of fit (free energy, cross-validation and local t-tests). The MSP source inversion algorithm was more sensitive to laminar differences in activity at a lower SNR, though EBB achieved similar classification performance with a slightly higher SNR. These SNR levels are now possible thanks to novel head-cast technology which reduces head movement and allows accurate co-registration over repeated sessions, resulting in very high SNR MEG data Troebinger et al., 2014aTroebinger et al., , 2014b.

Laminar discrimination depends on functional assumptions
In this study we remained agnostic about functional assumptions and used four commonly used sets of priors. We find that only the algorithms with some sparsity constraint (EBB and MSP) could successfully discriminate the laminar origin of source activity (Fig. 3). This is because model evidence, like cross-validation, penalizes more complex models as they tend to have poor generalization performance. This means that intrinsically sparser algorithms (like EBB and MSP) will always be rewarded if they can explain the same amount of data with fewer active sources. We should note that MSP had a distinct advantage here as the possible source space of priors included the vertex locations on which sources were simulated. EBB had no such prior information. We found that the implementations of minimum norm and LORETA were not suitable for this discrimination task, but we should point out that these algorithms (as implemented in SPM12) are somewhat generic and not individually optimized. For example, many groups define a baseline period or empty room recording, which allows an estimate of the optimal regularization parameter, or use a depth re-weighting in the minimum norm estimates (Gramfort et al., 2014). Here all regularization (the balance between the source and sensor level covariance matrices) was set based on a Free energy optimization , but cross-validation approaches are also possible (Engemann and Gramfort, 2015).
We note that the ROI and global metrics implemented here are not based on identical data. The ROI analysis requires a functional contrastit can only detect laminar differences where there is a change in power from a baseline time window, while the whole brain analysis simply determines which surface best supports the measured data (even if there is no modulation from baseline). Another difference in the implementation is the use of a Hann window in the whole brain analyses (originally motivated to give frequency specificity) which was not present in the local analysis and hence the local metrics have a marginal SNR advantage.

Laminar discrimination requires accurate patch size estimates
We looked at a range of simulation and reconstruction patch sizes, as previous simulation work has shown that an overestimation of patch extent can bias model evidence towards superficial cortical layer models (Troebinger et al., 2014a). In this study we observed a similar skew in the mean free energy difference ( Figure S5) but this was always much smaller than the free energy difference for the correct surface. This is perhaps due to the refined and much smoother surfaces we are using here (and that we are assuming zero co-registration error). Here we focused on accuracy (rather than mean free energy difference) of laminar classification and found that it was indeed degraded when the source patch size was over-or under-estimated, but that EBB (Fig. 6B) was biased toward white matter sources, while MSP was slightly biased toward pial sources (Fig. 6A,C), regardless of the over-or under-estimation. EBB is a two stage process, first estimating the source distribution with beamformer priors, and then balancing this estimate against sensor noise to fit the data. For a single source, under-or over-estimation of patch size using EBB will lead to a source estimate with a lower peak variance (Hillebrand and Barnes, 2011). In the limit this will tend to a flat variance distribution across the cortex (resembling an IID or COH prior), and our initial simulations (Fig. 3) show that free energy metrics of these estimates (at high SNR) will be biased toward the deep layers ( Figure S6). We speculate that given similar levels of accuracy, free energy favors less the marginally less complex current distributions on the deep layer models. In contrast, laminar classification based on the percentage of variance explained, a goodness of fit metric that does not penalize complexity, results in a superficial bias for IID and COH ( Figure S7). We also note that the ROI analysis for IID and COH has a superficial bias (Fig. 3, S6).

Free energy and cross-validation error are closely related
Free energy is a widely used parametric measure of model fit in the neuroimaging community, but cross validation error is a more commonly used nonparametric measure in the field of machine learning. Both metrics reward model accuracy and try to avoid overfitting of data. Free energy accomplishes this by penalizing model complexity, while cross validation error measures the ability of a model to generalize to new, unseen data. In our simulations, the difference in cross validation error between the pial and white matter generative models was highly correlated with their difference in free energy. Importantly, this demonstrates that the laminar inferences in these analyses are not specific to specialized parametric measures of model fit such as free energy, but generalize to familiar nonparametric measures such as cross validation error.

Anatomical assumptions
We made three major simplifying anatomical assumptions in these analyses: i) the locations of deep and superficial laminar sources are on the white matter and pial surfaces, ii) deep and superficial laminae contribute equally to the measured MEG signal, and iii) the spatial spread of lateral connectivity is the same across laminae. However, these assumptions may require further scrutiny. First, the mean thickness of the human cerebral cortex ranges from 2 mm to over 4 mm (Lerch and Evans, 2005b;MacDonald et al., 2000), and therefore the effective net dipole moments of sources in the supra-and infra-granular layers will be more proximal than the pial and white matter surfaces used here. Second, while the main contributors to the MEG signal are the supra-granular layers II/III pyramidal neurons and the infra-granular layer V pyramidal neurons of the neocortex (Murakami and Okada, 2006) with relatively comparable numbers of neurons in these layers (Meyer et al., 2010), based purely on histology, one would expect the layer V cells to have an approximately 3-4 times larger dipole moment (or impact on the MEG sensors) than the layer II/III cells (Bush and Sejnowski, 1993;Jones et al., 2007;Okada, 2015, 2006). Based on geometry, as the layer II/III cells are nearer to the MEG sensors, one might expect this to mitigate the effect. However the regions in which the superficial layers are closer to the sensors (the crests of the gyri) are also the regions in which the MEG signal is attenuated due to their predominantly radial orientation (Hillebrand and Barnes, 2002). In practice, one should expect a slight bias towards the superficial surface (on average the lead fields from the superficial layer are marginally (~12%) greater than the deepsee Fig. 7) but this is insignificant as compared to the bias against supra-granular cells due to their size (a factor of 2-4). Third, there are differences in the extent of lateral connectivity in different cortical layers (Kritzer and Goldman-Rakic, 1995;Schubert et al., 2007), and regions (Amir et al., 1993;Elston and Rosa, 1997;Elston et al., 1999;Levitt et al., 1993;Lund et al., 1993). Specifically, superficial layer II/III neurons are generally more sparsely and less locally connected than the deep ones (Sakata and Harris, 2009;Schubert et al., 2007).

Limitations
While the simulations reported here demonstrate the feasibility of laminar specific inferences with MEG, there are several limitations that could be addressed in future work.
Here, each simulation involved a single source of activity on either the white matter or pial surface. In the human brain, multiple sources are often co-active (Hari and Salmelin, 1997;Jensen and Vanni, 2002), and laminar recordings from non-human primates suggest that activity occurs simultaneously at deep and superficial sources in the same patch of cortex (Bollimunta et al., 2011;Haegens et al., 2015;Maier et al., 2010;Smith et al., 2013;Spaak et al., 2012;Xing et al., 2012). The ROI analysis could easily be extended to handle multiple sources across the brain by defining multiple ROIs based on clusters of activity. However, in the case of simultaneous activity in deep and superficial laminae within a cortical patch, both the whole brain and ROI analyses implemented here can only infer the laminar origin of the strongest source. One option could be to make use of biophysically informed generative models such as the canonical microcircuit (Bastos et al., 2012), in order to predict the expected time-series from the different pyramidal cell populations. These could then be projected from different spatial origins, and thus would give a complete spatio-temporal model of the MEG signal.
In the main simulations reported here, we used Gaussian white noise scaled in each simulation in order to achieve a fixed per-trial SNR, which could have biased the results by attenuating noise levels in deep layer simulations. In additional control simulations we have shown that the whole brain and ROI analyses can still make accurate laminar inferences when the SNR is not fixed (Figure S1), and with more realistic noise sources ( Figure S2, S3). However, in all of our simulations we assume that the noise covariance matrix is diagonal, but for real data this would need to be estimated using empty room measurements.
We should point out that the SNR of the dataset, although apparently low, will be significantly increased through the singular value decomposition implicit in the source reconstruction. Here the 251 samples were reduced to 4 orthogonal temporal modes giving an effective SNR increase of a factor of 7.922.
In human MEG data, the true source patch size is generally unknown. Our simulations show that spatial specificity can be reduced by inaccurate estimates of source patch size, and therefore any empirical MEG studies of laminar-specific activity should determine the appropriate patch size in order to control for potential biases. This could be accomplished by using the global fit metrics (free energy or cross validation error) to compare source inversions performed on combined pial/white manifolds for multiple patch sizes, prior to any layer comparison. With the patch size optimized one could then proceed with either the global or local metrics as outlined in this paper.
In these simulations we should note that MSP generally performs the best for two reasons. Firstly, the source configurations simulated were sparse, favoring both MSP and EBB. Secondly, in the MSP case the source space was constrained to include the possible simulation locations. We did this to factor out the computationally intensive optimization for a global (free energy) maximum as MSP selects and removes patch combinations. In practice, however, we would typically use multiple initial patch selections (e.g. Troebinger et al., 2014a) to produce multiple solutions from which we would select the solution with highest overall free energy.
The simulations reported here do not contain co-registration error or within-session movement. Previous work suggests that the coregistration error must be less than 2mm/2 in order to make accurate laminar inferences (Troebinger et al., 2014a). While subject-specific head-casts have been shown to achieve this level of precision , it is not clear what range of error and within-session movement the specific analyses described here can tolerate.
Finally, both the ROI and whole-brain metrics are biased superficially for low SNR data (Fig. 5). In our Gaussian noise simulations this is not an issue, as these metrics are non-significant at these levels. However, in real recording situations it will be important to consider whether a superficial estimate of activity is not to mischaracterization of the noise (see Figure S3, the realistic noise case, at low dipole moment).

Future improvements to increase laminar discrimination accuracy
Future improvements to the methods we describe could further boost laminar discrimination accuracy with low SNR. We used the potential locations of simulation sources as well as some extra vertices as priors for MSP source reconstruction. Future versions of this analysis could use pairs of vertices at corresponding locations on the pial and white matter surfaces as priors. This would take advantage of the sparsity constraints of MSP to determine the most likely location of source activity in each pair (pial or white matter), and the difference between the two could be further amplified by the ROI analysis. Another possibility is to run EBB and MSP source reconstruction in a staged manner, using the results of the EBB source reconstruction as MSP priors. In our simulations, patch size over-or under-estimation significantly affected laminar discrimination accuracy. Intrinsic surface curvature may indicate the extent of lateral connectivity (Ronan et al., 2011), and therefore one possibility for future analyses is to use intrinsic surface curvature to inform patch size estimates. Finally, we found that laminar discrimination was most accurate where the lead field was strongest (Fig. 7). Optically pumped magnetometers (OPMs) promise to achieve much higher SNRs than traditional SQUIDs since they can be placed directly on the scalp, boosting the lead field strength (Boto et al., 2017.

Laminar-and frequency-specific inferences
Theories of brain organization and function such as predictive coding (Bastos et al., 2012;Friston, 2008), communication through coherence (Fries, 2015(Fries, , 2009(Fries, , 2005, gating by inhibition (Jensen and Mazaheri, 2010), and communication through nested oscillations (Bonnefond et al., 2017) ascribe distinct roles to deep and superficial cortical laminae as well as low and high frequency oscillations. Support for these theories comes mainly from tract tracing and laminar recordings in nonhuman primates which demonstrate a layer-specific segregation in the origin of feedforward and feedback cortico-cortical connections (Felleman and Van Essen, 1991;Markov et al., 2013) and the distribution of low and high frequency activity (Bollimunta et al., 2011(Bollimunta et al., , 2008Buffalo et al., 2011;Dougherty et al., 2015;Haegens et al., 2015;Maier et al., 2010;Smith et al., 2013;Spaak et al., 2012;Van Kerkoerle et al., 2014;Xing et al., 2012). While there is corroborating evidence for these theories in terms of task-modulated frequency-specific activity in the human brain (De Lange et al., 2013;Donner et al., 2009;Jensen et al., 2012;Michalareas et al., 2016;Polanía et al., 2015;Popov et al., 2017), studies linking human oscillatory activity to laminar organization are scarce (Scheeringa et al., 2016). The recent development of high precision MEG with subject-specific head-casts allows non-invasive recording of oscillatory activity in the human brain at previously infeasible SNRs Troebinger et al., 2014b), rendering these theories finally testable in humans (Troebinger et al., 2014a). We have demonstrated that given this high quality data, it is in principle possible to make spatially-localized laminar specific inferences.