Autoregressive model based algorithm for correcting motion and serially correlated errors in fNIRS.

Systemic physiology and motion-induced artifacts represent two major sources of confounding noise in functional near infrared spectroscopy (fNIRS) imaging that can reduce the performance of analyses and inflate false positive rates (i.e., type I errors) of detecting evoked hemodynamic responses. In this work, we demonstrated a general algorithm for solving the general linear model (GLM) for both deconvolution (finite impulse response) and canonical regression models based on designing optimal pre-whitening filters using autoregressive models and employing iteratively reweighted least squares. We evaluated the performance of the new method by performing receiver operating characteristic (ROC) analyses using synthetic data, in which serial correlations, motion artifacts, and evoked responses were controlled via simulations, as well as using experimental data from children (3-5 years old) as a source baseline physiological noise and motion artifacts. The new method outperformed ordinary least squares (OLS) with no motion correction, wavelet based motion correction, or spline interpolation based motion correction in the presence of physiological and motion related noise. In the experimental data, false positive rates were as high as 37% when the estimated p-value was 0.05 for the OLS methods. The false positive rate was reduced to 5-9% with the proposed method. Overall, the method improves control of type I errors and increases performance when motion artifacts are present.


Introduction
Functional near infrared spectroscopy (fNIRS) is a non-invasive brain imaging technique that is based on the spectroscopic measurement of the optical absorption of cerebral blood [1]. Evoked changes in tissue hemoglobin concentrations are detected by an array of light sources and detectors on the scalp that monitor changes in optical absorption and scattering. Both oxyhemoglobin (HbO 2 ) and deoxy-hemoglobin (Hb) concentration changes can be estimated from the modified Beer-Lambert law [2] using absorption measurements taken at multiple wave-lengths of light within the optical window (650-850nm). In comparison to other cerebral vascular imaging methods such as functional magnetic resonance imaging (fMRI) or positron emission tomography (PET), fNIRS imaging is considerably lower cost and more portable. Because it is non-ionizing and non-confining, fNIRS is particularly well suited for non-invasive studies in children or infants. In addition, the portability of fNIRS and its ability to record brain signals during moderate subject movement has allowed its use during various tasks such as walking [3,4], balance [5][6][7], or social interaction [8], which otherwise could not be imaged by conventional fMRI.
Functional NIRS imaging is accomplished through a set of spatially arranged optical sources and detectors (optodes), which are often coupled through fiber optics between a head cap worn by the participant and the fNIRS instrument. Although fNIRS can be used to image brain activity during participant movement, fNIRS is prone to artifacts introduced by slippage of these optodes and headcap on the surface of the scalp. These artifacts are particularly common in studies involving children or infants where the fNIRS cap is more loosely fastened on the head to increase participant tolerance of the method. A great deal of effort has been spent by researchers in developing better methods for identification and removal of motion artifacts, including principal component analysis [9], spline interpolation [10], wavelet transform [11], and Kalman filtering [12] based algorithms. These methods were compared by Cooper et. al. [13], in which spline interpolation and wavelet based filtering gave the best results for two different performance metrics. As a note, these methods are applied in pre-processing of the fNIRS data and in general have some level of subjective tuning involved in the implementation of the algorithm. In addition to motion artifacts, fNIRS signals are also contaminated by superficial and systemic physiological signals such as cardiac, respiratory, and blood pressure oscillations. Because fNIRS signals are measured from sensors placed on the surface of the scalp, these measurements have a high sensitivity to superficial changes in optical absorption. These slow background oscillations contribute to serially correlated noise in fNIRS measurements. This effect is increased by the high sampling rate of fNIRS instruments (∼10Hz or more) that is substantially faster than the underlying physiological signals.
Motion artifacts and serial correlations due to physiology violate common statistical assumptions of the independence of repeated measurements over time. Subsequently, these sources of noise can contribute to inaccurate estimation of type I errors. In this work, we describe an iterative algorithm that utilizes an autoregressive model based pre-whitening filter and robust regression to reduce the effects of physiological and motion-induced sources of noise, respectively. We demonstrated that this algorithm has better performance than ordinary least squares with no motion correction, wavelet based motion correction [11], or spline interpolation based motion correction [10] in the presence of physiological noise and motion artifacts for both deconvolution (e.g., HOMER [9]) and canonical (e.g., SPM-NIRS [14]) based regression models. Our proposed algorithm is demonstrated for both synthetic and experimental data containing motion artifacts. In addition, we tested both short (event-related design) and long (block design) duration tasks as well as varying magnitudes of hemoglobin concentration changes (contrast to noise ratios).

Theory
The simplest model of the observed fNIRS signal is an evoked response with a normally distributed error term as given by where y t is the observed signal; h t is the evoked hemodynamic response; ε t is the noise at time t; and σ 2 is the variance of ε t . The additive noise term in the fNIRS signal is often contaminated with a high degree of physiological noise, which is sampled at the high temporal resolution of fNIRS instruments leading to serially correlated error terms. In other words, slow physiological oscillations are sampled rapidly causing nearby time points in the fNIRS signal to be highly correlated. The presence of these errors can greatly affect the false positive rate (e.g., type I error) of the model and lead to inaccurate interpretations of the data. In standard linear regression methods, serially correlated errors are often dealt with by pre-whitening based on an autoregressive (AR) model of the error terms. A first-order model (AR(1)) of the error terms is where the correlated error terms have been transformed by subtraction, leaving only the white noise term (ν t = ε t − ρ 1 ε t−1 ). This transformation of the data (y) and model (h) using an AR(1) model of the residual error terms is the Cochrane-Orcutt method [15], which has been used in several studies [16][17][18] to remove serial correlations in fNIRS analysis. This method can easily be generalized to any AR model order [19] as follows: In practice, the AR coefficients can be estimated by fitting an AR model to the residual errors (ε t ) themselves. The optimal AR model order should be sufficient to remove serially correlated errors without overfitting. The model order can be chosen objectively by minimizing an information criterion function. For example, the Bayesian information criterion (BIC) [20] function is where P is the model order; LL is the log-likelihood of the model fit; and n is the number of time points.
In the context of fNIRS brain imaging, a linear model of the evoked response [21] can be written as where X is the design matrix, and β is the estimate of the magnitude of brain activity. In the case of a finite impulse response (FIR) model, X is given by the convolution matrix of a series of delta functions describing the timing of the stimulus onsets [9]. The coefficients (β ) then describe the magnitude of the evoked hemodynamic response at each time point. The FIR model is equivalent to block averaging in the case of long inter-stimulus intervals or deconvolution in the case of closely spaced inter-stimulus intervals [22]. Note that boxcar basis functions describing the whole duration of the task can alternatively be used in FIR models and are typically one option of basis function in fMRI analysis; however, we chose not to investigate this option. For canonical linear models (e.g., the statistical parametric modeling approach; SPM-NIRS [14]), X is given by the convolution of the stimulus blocks with a canonical hemodynamic response. In this case, a limited number of coefficients (β ) are used to allow statistical testing of differences in brain activity (e.g., the amplitude of β ) between different task conditions. In both cases, additional regressors such as the time course of systemic physiology or motion terms can also be included in the design matrix to model nuisance terms (e.g., [23]). Regardless of whether an FIR or canonical design matrix is assumed, the values of β (e.g., brain activity) can be estimated by the inversion of X in Eq. (6). The Gauss-Markov equation for the unbiased ordinary least squares (OLS) estimator is where cov(β ) is the covariance matrix of β ; σ 2 is the variance of the residual (ε); and the superscript T denotes the transpose. The linear model (Eq. (6)) can be corrected for serially correlated errors by pre-whitening with the filter f generated via Eq. (4b) giving the expression where F is the convolution matrix of f and performs column-wise filtering of the measurements in y and of the regressors in X, leading to whitened residuals (Fε). The least squares solution to this model is As a practical note, the matrix F is approximately the size of the vector y in both dimensions. This is often too large to form explicitly; however, the matrix F is sparse with only entries of the filter coefficients on the diagonals, which can be implemented as a sparse matrix. A more efficient implementation is to employ digital filtering methods as discussed in section 2.2. The methods above can correct for serial correlations due to physiology; however, motion artifacts also affect the error terms by producing outliers to a normal distribution. In this work, we propose to use iteratively reweighted least squares [24] to deal with outliers caused by motion. In this approach, the influence of each time point is weighted based on the value of the residual by a weighting function, such as Tukey's bisquare function [25] given by where κ is the tuning parameter with a default value of 4.685 that preserves 95% statistical efficiency in the case of normally distributed residuals (i.e., no outliers). Adjusting the tuning parameter based on statistical efficiency gives an objective criteria for determining the tuning parameter. For example, the tuning parameter could be decreased to the point of maintaining 85% statistical efficiency if more robustness against outliers is desired. The weighted least squares (WLS) solution is where W is diagonal matrix of weights calculated from the filtered residuals as follows: Regression is done iteratively, updating the weight matrix W with each iteration until a convergence criterion is reached. As a practical note, robust regression is implemented as part of the statistics toolbox in Matlab (Mathworks R2012b, Natick MA) as the function "robustfit". The iteratively reweighted least squares estimator has an asymptotic covariance matrix given by where E denotes expection, and ψ is the influence function [26], which can be specified as ψ(r) = r w(r).
After regression, the null hypothesis that the hemodynamic response was zero (e.g., β = 0) over a given time window or set of regressors can be tested by defining a contrast vector (c) and calculating the t-statistic given by the expression In the case of an FIR (deconvolution or block averaging) model, the contrast vector might be of value 1 for some predefined window of time (e.g., over the expected peak of the response) and 0 otherwise. In the case of the canonical design matrix, the contrast vector would be 1 for the column of the design matrix representing the regressor (task condition) of interest. A contrast vector can also be defined using positive and negative terms to test differences between two conditions.
4. Apply the whitening filter to the data y and column-wise to the design matrix X.
Note that it is much more computationally efficient to perform the filtering in step 4 via a digital filtering algorithm, such as the "filter" function in Matlab, rather than explicitly forming the convolution matrix F. The filtered data and design matrix can then be passed to any software implementation of IRLS, such as the "robustfit" function in Matlab.
There are many approaches to estimating the AR model coefficients. The current default in Matlab is to minimize the squared prediction errors of the forward and reversed signals. Because motion artifacts are likely to influence the results, we modified this approach to a weighted least squares method, in which the weighted squared prediction errors of the forward and reversed signal were minimized. This was performed as follows: (1) find the ordinary least squares solution, (2) calculate weights from the Tukey's bisquare function of the residual (i.e., Eq. (10)), and (3) solve the weighted least squares problem. This approach was computationally efficient enough to perform a direct search of the AR model order that minimizes BIC. The entire algorithm ran in ∼3s for the canonical regression model and ∼20s for the deconvolution model, which required a much larger design matrix to invert, for one subject's data (24 channels, ∼1500 data points/channel) on a single core of a 2.9 GHz Intel Xeon E5-2690 CPU.

Simulated data
In order to test the performance of our proposed method under controlled noise structures, we first applied the method to synthetic data, in which noise, motion artifacts, and evoked response were simulated. We simulated serially correlated noise via tools in Matlab's Econometrics toolbox with an AR(5) model resulting in a signal with standard deviation of 0.05µM, which was comparable to the real data set discussed in the next section. The AR model order of 5 used for simulated data was close to the median optimal model order of 6 for the experimental data. The AR coefficients were taken as the median values across subjects from an AR(5) model fit to the experimental data. Two sets of AR coefficients were used in the simulations based on AR(5) model fits to the HbO 2 and Hb channels, respectively. Half of the data were simulated using the HbO 2 AR coefficients, and half of the data were simulated using the Hb AR coefficients.
Simulations were performed with and without simulated motion artifacts. Data that were simulated with motion artifacts contained either 5-10 spike artifacts or 1-3 shift artifacts at random times. The spike artifacts were modeled with a Laplace distribution function given by with random peak amplitude, A, of 0.25-1µM and random scale parameter, b, of 0-1.5s. Shift artifacts were modeled as a random positive or negative change in DC value of 0.125-0.5M with a linear transition lasting between 0.25-1.5s. All random parameters described above were drawn from uniform distributions. In order to quantify the performance of the various regression methods, a known simulated evoked response was added to a subset (50%) of the channels. After regression, a receiver operating characteristic (ROC) analysis was performed to quantify the true positive rate (sensitivity) as a function of false positive rate (1-specificity). We simulated both a short duration task consisting of 15 trials of 1s stimuli every 20s and a long duration task of 10 trials of 10s stimuli every 30s. Evoked responses ranging from 0.01-0.10M in peak amplitude were generated by convolving the stimulus design with the canonical hemodynamic response function and added to the simulated noise. Evoked responses were added at half magnitude to simulated deoxy-hemoglobin channels (∆[Hb] = ∆[HbO 2 ]/2). We simulated >100,000 random channels of data at each magnitude of evoked response with exactly half containing an evoked response.

Experimental data
In addition to the completely synthetic dataset, we used experimental "resting state" data from a study involving 22 normal, healthy children (3-5 years old) as a source of real physiological noise and motion artifacts. All subjects provided written parental informed consent, and this data was collected as part of a larger study of functional brain activity [27]. Each "resting state" scan lasted approximately three minutes and was acquired while the children watched a short video on the computer (Disney films). Although this is not true resting state, there were no specific cued tasks during the scan, thus serving as an effective estimate of background physiology for the purpose of this study. The experimental data were acquired at 690nm and 830nm wavelengths at a sampling rate of 4Hz with a probe consisting of 24 source-detector measurements over the prefrontal cortex. The optical density changes were converted to oxyand deoxy-hemoglobin prior to input into the analysis method via the modified Beer-Lambert law. No other preprocessing was performed.
In order to test the performance of the various regression methods, a known simulated evoked response was added to the experimental "resting state" data on a subset of the channels (50%), and an ROC analysis was performed. Evoked responses were generated using the event-related (1s on, 19s off, 15 trials) and block design (10s on, 20s off, 10 trials) paradigms. Each channel of data was concatenated with itself in order to accommodate the length of the experimental designs (300s). We simulated evoked responses with amplitudes ranging from 0.01-0.1M on the oxy-hemoglobin channels. Evoked responses were added at half amplitude to the deoxyhemoglobin channels. For each of the 22 subjects, we simulated 200 trial data sets (>100,000 channels), in which exactly half (chosen randomly) of the source-detector pairs contained an added evoked response in order to perform ROC analysis.

Analysis methods
We tested the performance of the proposed AR(P)-IRLS algorithm as well as OLS with and without AR(1) pre-whitening. Additionally, we tested wavelet based motion correction [11] followed by OLS regression (Wavelet-OLS below) and spline interpolation based motion correction [10] followed by OLS regression (Spline-OLS below) in order to validate the performance of the proposed method's ability to correct motion artifacts. The wavelet based motion correction was implemented using matlab function "hmrMotionCorrectWavelet" that is included in the HOMER2 software package (www.nmr.mgh.harvard.edu/DOT/resources/software.html) using an interquartile range of 1.5, as suggested in the function's documentation. The spline based motion correction was implemented using the functions "hmrMotionCorrectSpline" and "hmrMotionArtifactByChannel" in HOMER2. The parameters for this method were set to AMPThresh = 0.5, tMotion = 0.5s and tMask = 2s following reference [13]. The optimal parameter for SDThresh varied from 12-20 in reference [13] depending on the subject, thus we set SDThresh = 16. Receiver operating chararcteristic (ROC) curves were generated for each of the tested methods by varying the estimated p-value threshold for activation from 0 to 1 and calculating the true positive rate (sensitivity) and false positive rate (1-specificity). Partial area under the ROC curve (AUC 0.05 ) was used as a metric for performance, in which the true positive rate vs false positive rate curve is integrated up to a false positive rate of 0.05. In this case, a random estimator would result in AUC 0.05 = 0.00125, and a perfect estimator would result in AUC 0.05 = 0.05. Partial AUC is potentially a better indicator of overall performance than AUC, which integrates the entire curve, since the threshold for statistical significance is often set to 0.05 or less. Plots of false positive rate as a function of estimated p-value were generated and examined to evaluate control of type I errors (false positive rate). We estimated standard errors of AUC 0.05 via bootstrap resampling of the data 1,000 times with replacement and calculating the mean and standard deviation of 1,000 estimates of AUC 0.05 .

Results
The AR model based pre-whitening filters removed serially correlated errors in the optical data. In Fig. 1(a), an illustrative example of a simulated fNIRS signal with a few spike artifacts and a shift artifact is shown. Figure 1(b) shows the same data after pre-whitening with an AR(2) model (determined optimal by BIC) generated from Eq. (4b). Figure 1(c) shows the autocorrelation function of the original ( Fig. 1(a)) and whitened ( Fig. 1(b)) signals, in which the whitened signal showed significantly reduced signal autocorrelations. After filtering, the simulated artifacts were visually identifiable as outliers in the whitened signal. These time points are subsequently given little or zero weight during the iteratively reweighted least squares regression. Similarly, an illustrative example of an experimental fNIRS signal is shown in Fig.  1(d). After filtering with an AR(2) model (determined optimal by BIC), the whitened signal ( Fig. 1(e)) showed significantly reduced autocorrelations ( Fig. 1(f)). For the experimental data in this study, the lower quartile, median, and upper quartile of the AR model order chosen to minimize BIC were 5, 7, and 9 for HbO 2 channels and 3, 5, and 8 for Hb channels. In order to briefly test the model order dependence on sampling rate, we downsampled (without filtering) the experimental data to 4/5Hz, 1Hz, 4/3Hz, and 2Hz. We observed that the median optimal model order increased with increasing sampling rate. After generating an optimal pre-whitening filter via fitting an AR(2) model, the whitened signal (b) has significantly reduced autocorrelations (c). An experimental fNIRS signal is shown in (d). After generating an optimal pre-whitening filter via fitting an AR(2) model, the whitened signal (e) has significantly reduced autocorrelations (d). Figure 2 shows example hemodynamic response functions recovered via deconvolution analysis for the block task (a-c) and event task (d-f). These examples used the experimental data as a baseline signal with a simulated evoked response of 0.04µM amplitude. The first column (Figs.  2(a,d)) shows examples when hemodynamic response function produced by OLS appears to be free of artifacts. In this case, the hemodynamic response functions produced by all methods appeared to be of high quality. The middle column (Figs. 2(b,e)) shows examples when the hemodynamic response recovered by OLS appeared to be mildly contaminated by motion artifacts. In this case, Wavelet-OLS and AR(P)-IRLS completely removed the suspected artifacts, while Spline-OLS provided partial attenuation of the artifacts. The last column (Figs. 2(c,f)) shows examples, in which the hemodynamic response recovered by OLS was severely corrupted by artifacts and/or noise. Wavelet-OLS and AR(P)-IRLS showed significant improvement in the visual appearance of the recovered hemodynamic response, while Spline-OLS failed to improve the results visually.  Figure 3 shows the results of ROC analysis of the deconvolution model performance for the block task ( Fig. 3(a)) and event tasks ( Fig. 3(b)). The data is shown for simulated evoked responses of 0.04µM amplitude. In the case of synthetic data (AR(5) process) with no simulated motion artifacts (AR/None), the performance was similar for all methods tested with AR(1)-OLS and AR(P)-IRLS showing small, but statistically significant increases in AUC 0.05 over OLS for both the block and event task. In the case of synthetic data with simulated spike artifacts(AR/Spike), AR(P)-IRLS gave the best performance, followed by Wavelet-OLS. The AR(1)-OLS results also showed a small, but statistically significant increase in AUC 0.05 over OLS for both the block and event task. For synthetic data with simulated shift artifacts (AR/Shift), AR(P)-IRLS showed a much larger performance over the other methods for both block and event tasks. The Spline-OLS method showed a small, but statistically significant increase in AUC 0.05 over OLS for both the block and event task. The AR(1)-OLS showed a statistically significant increase in performance over OLS for the event task, but not the block task. For experimental data, only AR(P)-IRLS showed a significant increase in performance over OLS. Figure 4 illustrates the type I error control of OLS, AR(1)-OLS, and AR(P)-IRLS for the same data as Fig. 3. The type I errors for Wavelet-OLS and Spline-OLS were similar to OLS and are not shown. For all cases, OLS showed severely inflated type I errors, with false positive rates of 33-43% when the p-value was estimated to be 0.05. In other words, false positive rates were 7-8 times higher than predicted. Prewhitening with an AR(1) filter (AR(1)-OLS) significantly reduced type I errors; however, false positive rates were still slightly higher than predicted for the experimental data, in which false positive rates were 17% and 11% for the block and event task, respectively, when the p-value was estimated to be 0.05. The AR(P)-IRLS method provided excellent type I error control for the synthetic data with no artifacts and the experimental data for both the block and event task. For synthetic data with simulated spike or shift artifacts, AR(P)-IRLS overestimated type I errors. Figure 5 shows the results of ROC analysis of the canonical regression model performance for the block task ( Fig. 5(a)) and event tasks (Fig. 5(b)). The data is shown for simulated evoked responses of 0.04µM amplitude. In the case of synthetic data with no simulated motion artifacts (AR/None), the performance was similar for all methods tested. In the case of synthetic data with simulated spike artifacts(AR/Spike), AR(P)-IRLS gave the best performance, followed by Wavelet-OLS. For synthetic data with simulated shift artifacts (AR/Shift), AR(P)-IRLS showed a large increase in performance over the other methods for both the block and event task, while AR(1)-OLS showed a small increase in performance for the block task and a moderate increase in performance for the event task. For the experimental data, AR(1)-OLS increased performance over OLS for the event task, but decreased performance for the block task. The AR(P)-IRLS method showed a moderate increase in performance over OLS for both the block and event task.  Figure 6 illustrates the type I error control of OLS, AR(1)-OLS, and AR(P)-IRLS for the same data as Fig. 5. The type I errors for Wavelet-OLS and Spline-OLS were similar to OLS and are not shown. For all cases, OLS showed severely inflated type I errors, with false positive rates of 32-38% when the p-value was estimated to be 0.05. Prewhitening with an AR(1) filter (AR(1)-OLS) significantly reduced type I error estimation. The AR(P)-IRLS method provided slightly better type I error estimation than AR(1)-OLS for the synthetic data with no artifacts or with spike artifacts. For the case of synthetic data with shift artifacts, the AR(P)-IRLS method overestimated type I errors. For the experimental data, AR(1)-OLS and AR(P)-IRLS performed similarly in terms of type I error estimation.
The results shown in Figs. 3-6 were for one magnitude of evoked response (∆[HbO 2 ] = 0.04µM, ∆[Hb] = 0.02µM). We tested a range of evoked response magnitudes (∆[HbO 2 ] = 0.01-0.10µM). In general, the false positive rates were not dependent on the magnitude of the evoked response, but on the underlying noise structure. The gain in performance (AUC 0.05 ) by using AR(P)-IRLS over OLS was dependent on the amplitude of the evoked response, or more specifically the contrast to noise ratio (CNR), in which all methods approach random performance (AUC 0.05 = 0.00125) as CNR decreases and ideal performance (AUC 0.05 = 0.05) as CNR increases. In general, correcting for motion artifacts will offer the most benefits in the middle range of CNR. The CNR of a typical NIRS experiment will vary greatly by experimental paradigm, number of trials, and design and location of the probe; however, for all of the simulations tested in this work, the AUC 0.05 for AR(P)-IRLS was greater than or equal to AUC 0.05 of OLS.

Discussion
In this study, we demonstrated a general algorithm for solving the GLM in the context of deconvolution (FIR) and canonical regression models for fNIRS that combines two well established statistical methods: AR model based data transformations (filtering) and iteratively reweighted least squares. These techniques were combined in a simple, computationally efficient algorithm to correct for two major sources of confounding noise in fNIRS analyses: serial correlations caused by systemic physiology and large outliers caused by motion artifacts, respectively. Furthermore, we rigorously tested the algorithm by performing ROC analysis using synthetic data, in which serial correlations, motion artifacts, and evoked responses could be controlled via simulations. Finally, we validated our simulation results by using experimental data as a source of real baseline physiological noise and motion artifacts. Although this approach to pre-whitening via an AR model based transformation is common practice in linear regression, this exact approach is not widely used in neuroscience applications. In fMRI analysis (which typically deals with data sets on the order of 64x64x30 = 122,880 voxels) the estimation of an AR model for each voxel is not practical due to the high computational time required. For example, in the fMRI analysis program SPM8 [21] a global estimate of an AR(1) model is included in the parameterization of the covariance matrix in the context of restricted maximum likelihood (ReML) rather than fitting each voxel independently. In fMRI analysis, this approximation is justified because the time between subsequent measurements is much longer (∼2-3s) than fNIRS, such that serial correlations are less of a concern. In the case of fNIRS data, which has both higher sample rates and lower number of time series measurements, these approximations are no longer appropriate.
The use of simulated AR noise with simulated artifacts allowed us to control the type, number, and amplitude of motion artifacts. The simulated artifacts were relatively large compared to experimental data and provided a good test of the AR(P)-IRLS method's ability to correct for motion artifacts. The AR(P)-IRLS method outperformed spline interpolation based motion correction [10] and wavelet based motion correction [11] methods when motion artifacts were present. Despite showing visual improvements in the recovered hemodynamic response functions, such as in Fig. 2, Wavelet-OLS and Spline-OLS did not offer any statistical advantage in the ROC analysis with the experimental data. It is possible that the parameters used for these methods were not optimal for the experimental data set in this study. Further ROC analyses should be performed for data collected under a wide range of experimental conditions and subject populations to validate and generalize the comparison of these methods.
In simulations without artifacts the algorithm did not negatively affect the analyses. The practical implication of this is that the algorithm can be applied to every data set without making a subjective judgment about whether artifact correction is needed or not. The method will improve performance when data is contaminated by motion and preserve normal performance when data are free of artifacts. Overall, our results suggest that application of robust regression methods after pre-whitening represents a reliable method for correction of motion artifacts.
An AR(1) model of the errors was insufficient for controlling type I errors in some cases. This effect is likely to be a greater issue at higher sampling rates than in this study (>4Hz). Using an adaptive AR model order for pre-whitening is simple to implement and computationally efficient enough to be added to standard fNIRS analysis methods to improve type I error control.
For the deconvolution model, AR(P)-IRLS method provided accurate estimation of false positive rates for the synthetic data with no artifacts and the experimental data, while overestimating type I errors for synthetic data with simulated spike or shift artifacts. For the canonical regression model AR(P)-IRLS method provided accurate estimation of false positive rates for the synthetic data with no artifacts or spike artifacts and the experimental data, while overestimating type I errors for synthetic data with shift artifacts. While it is potentially better to overestimate than underestimate type I errors, this may lead to reduced sensitivity near the threshold of statistical significance (estimated p-value < 0.05). This overestimation is possibly related to violation of stationariy (assumed by AR models) caused by severe motion artifacts, especially shift artifacts, or to the breakdown of the asymptotic covariance estimate in Eq. (13). This issue did not appear to be a problem for the simulations using experimental data with real motion artifacts, in which the estimated p-values were relatively accurate. The statistics may potentially be improved if shift artifacts can be removed or identified during preprocessing.

Conclusions
In conclusion, we have developed a general algorithm for both the deconvolution (FIR) and canonical regression analysis models that is robust to two major sources of noise in fNIRS: systemic physiological signals and motion artifacts. The algorithm preserves the performance of standard OLS methods when data are free of motion artifacts, thus freeing the user from making a subjective choice about artifact correction. Lastly, the algorithm contains a single tuning parameter for the weighting function used in IRLS that can be set objectively.