Comparison of electrical microstimulation artifact removal methods for high-channel-count prostheses

Background: Neuroprostheses are used to electrically stimulate the brain, modulate neural activity and restore sensory and motor function following injury or disease, such as blindness, paralysis, and other movement and psychiatric disorders. Recordings are often made simultaneously with stimulation, allowing the monitoring of neural signals and closed-loop control of devices. However, stimulation-J

Previously, we provided proof-of-concept that a high-resolution visual prosthesis consisting of several hundred electrodes could produce recognizable artificial percepts.We implanted 1024 penetrating electrodes (Utah arrays) in the visual cortex of monkeys and we able to convey the shape of letters by patterned stimulation of V1 (Chen et al., 2020).In a clinical study involving implantation of a 96-channel Utah array in the visual cortex of a blind volunteer (Fernández et al., 2021), we showed that the subject could recognize simple shapes composed of phosphenes, such as lines and letters, within a small region of the visual field.
It is important to reduce the overall current that is delivered because this decreases the risk of triggering epileptic activity and reduces the power consumption of the device.Researchers therefore apply low current levels that only cause focal activation (Lycke et al., 2023), close to the minimal current that elicits a phosphene at a particular electrode.Traditional methods for measuring current thresholds require the user to report phosphenes, while the stimulation amplitude is varied for one electrode at a time.This procedure can take up to minutes per electrode so that the calibration of future devices with thousands of electrodes could become impractical.Hence, there is a need for the development of semi-automatic current-thresholding methods that allow device calibration with minimal input from the prosthesis user.
Electrical microstimulation activates neurons in downstream brain regions (Klink et al., 2017;Logothetis et al., 2010) and current thresholding might also be achieved by recording the activity in these brain regions as a proxy for phosphene perception, via a closed-loop approach.We previously showed (Chen et al., 2020) that the current threshold for phosphenes induced in V1 could be predicted by neuronal activity elicited in V4, indicating that this technique could be used for rapid, automatic calibration of current thresholds.
However, microstimulation artifacts can obscure the neuronal activity because the artifacts can be orders of magnitude larger than the neuronal signals, making it difficult to remove them from the signal.There are several artifact-removal methods, which can be categorized into hardware-and software-based solutions.
Hardware-based artifact-removal methods (Brown et al., 2008;Gnadt and Echols, 2007;Wichmann and Devergnas, 2011) typically implement a fast-discharge circuit for the amplifier, allowing the signal to settle rapidly after each stimulation pulse and avoiding the recording of stimulation artifacts.A disadvantage is that the hardware occupies space and consumes power, and the heat generated by the device needs to dissipate.At present, these artifact-removal methods have only been implemented in low-channel-count devices, but future hardware-based systems for artifact removal might undergo miniaturization.By contrast, software-based methods remove stimulation artifacts post-hoc, making them easier to implement in portable devices.In this article, we created a well-characterized dataset, consisting of neurophysiological signals as well as artifacts.One could, in theory, attempt to apply artifact-removal methods to an electrophysiological dataset consisting of neuronal activity and stimulation-induced artifacts.A disadvantage of such an approach is the lack of uncontaminated neurophysiological signals (i.e. the 'ground-truth'), to which cleaned-up signals could be compared.To address this problem, previous studies synthesized datasets that can provide ground truth (Caldwell et al., 2020;Erez et al., 2010;Hashimoto et al., 2002;Mena et al., 2017;O'Shea and Shenoy, 2018;Schelles et al., 2022;Young et al., 2018).Some of these previous methods require a dense sampling of neural tissue with a high signal-to-noise ratio of the individual action potentials (Mena et al., 2017).Others require an independence of the stimulation artifacts and the recorded neuronal signals (Schelles et al., 2022), a condition that is not fulfilled when researchers aim to measure the neuronal responses elicited by microstimulation.Yet other studies focused on the restoration of EEG signals (Young et al., 2018).We here focused on artifact rejection methods that permit the restoration of spiking activity for a prosthesis with hundreds or more stimulation electrodes, each with their own, unique stimulation artifact.
We will evaluate four classes of software-based artifact-removal methods.The first class uses template averaging and subtraction ( Bar--Gad et al., 2004;Klink et al., 2017).These methods estimate the average shape of the artifact, for example by calculating a running average across successive artifacts.The average shape is subtracted from the neural signal for each stimulation pulse.Implementation of template subtraction is efficient because the algorithm is simple.These methods work well when the shape of the artifact waveform is relatively constant across pulses, but they do not completely remove artifacts that vary in shape.
The second class of methods detects the artifacts and removes the signal during a stimulation pulse or interpolates it between flanking data  points (Heffer and Fallon, 2008).While these methods are quick and effective, they discard neuronal data during each stimulation pulse.If artifacts last several milliseconds and stimulation frequencies are high, a large percentage of the neuronal activity may be lost.
The third class of methods remove artifacts based on curve fitting (Harding, 1991).They derive a model for the artifact shape, which is fit to each individual artifact.The fitted function is subtracted from the original signal.Some degree of variability in the artifact waveform is likely to occur in a neuroprosthesis device, and a key advantage of curve-fitting methods is their ability to remove artifacts despite such variability.If simple models are used, artifact removal is possible in real time (Wagenaar and Potter, 2002), but artifact removal may only be possible off-line with more complex models (e.g. using iterative curve-fitting algorithms).
The fourth class of methods estimates the shape of the artifact for each recording channel and current amplitude (O'Shea and Shenoy, 2018), using tensor decomposition across multiple recording channels, stimulation pulses and trials with a principal component analysis.We hypothesized that the performance of these methods might decrease if the number of pulse and trial repetitions is low, such as in a clinical cortical visual prosthesis.Furthermore, this method is computationally intensive, with a computational complexity of C 3 , where C is the number of channels.Given that future prosthesis devices could consist of tens of thousands of electrodes, implanted across several square centimeters of cortex, the feasibility of this method remains to be determined.
To make our simulated data set realistic, we recorded a large-scale (1024-channel) electrophysiological dataset from the visual cortex of two monkeys during delivery of microstimulation and characterized the properties of the neural signals and the artifacts.We ensured that the simulated datasets resembled the relevant aspects of the real data, including spikes and local field potentials (LFP).We added simulated artifacts and ensured that they resembled the real artifacts.This approach yielded a realistic simulated dataset of artifact-contaminated neural signals.We applied the software-based artifact removal methods to the data set and examined their accuracy, robustness, speed and suitability for future high-channel-count, portable prostheses.

Surgery
All experimental surgical procedures complied with the NIH Guide for Care and Use of Laboratory Animals (National Institutes of Health, Bethesda, Maryland), and were approved by the Institutional Animal Care and Use Committee of the Royal Netherlands Academy of Arts and Sciences.Two rhesus macaque monkeys were each implanted with 14 64-channel Utah arrays in area V1, and 2 64-channel Utah arrays in area V4 (Chen et al., 2020(Chen et al., , 2022)).The arrays were connected via wire bundles to a cranially mounted pedestal, which was connected to an electronic interface board (EIB) during recording sessions (see Chen et al., 2022, for details).

Electrical microstimulation and electrophysiology
Neuronal activity recordings were made with Cereplex-M front stage amplifiers and processed by 128-channel Cerebus systems.Stimulation was delivered via a CereStim R96 stimulator.We used custom-written Matlab scripts and the Tracker (Van Der Togt et al., 2022) and Psychophysics Toolbox (Brainard and Vision, 1997;Kleiner et al., 2007;Pelli, 1997) software packages to control the experiments.We recorded electrophysiological data at 30 kHz and carried out data analysis with custom-written Matlab scripts.We extracted three components of the neural signal: LFP, envelope multiunit activity (MUAe (Super and Roelfsema, 2005)) and spikes.LFP signals were generated by low-pass filtering the data at 200 Hz.MUAe was generated by band-pass filtering the raw signal between 250 and 5000 Hz, followed by full-wave rectification and low-pass filtering at 200 Hz.For spike extraction, the raw signal was first band-pass filtered between 250 and 5000 Hz.Spikes were detected as threshold-crossing events (threshold at 4.5x root-mean-square [RMS] of baseline activity), and spike waveforms consisting of 48 data samples were stored.

Generation of the simulated dataset
We generated a simulated dataset consisting of electrophysiological signals across 1024 channels.The simulated signal had five components: spontaneously occurring spikes, stimulation-evoked spikes, spontaneous LFP, spike-triggered LFP and stimulation artifacts.These components will be described below.The locations of the simulated electrodes were based on a realistic geometry of multiple 8×8 Utah arrays implanted in the visual cortex.

Simulation of stimulation trains
The stimulation artifacts consisted of a large transient deflection followed by a slow decay component and were based on parameters that elicited phosphene perception in the primary visual cortex (Chen et al., 2020;Tehovnik et al., 2003).The stimulation pulses were biphasic, symmetric, charge-balanced cathodal first square waves, 170 us per phase with 60 us between phases.Stimulation trains consisted of identical pulses with a frequency ranging from 200 to 300 Hz and the duration of each train ranged from 20 to 167 ms (8-50 pulses per train).
The stimulation current ranged from 1 to 210 μA (Fig. 2a).The titanium pedestal was used as the ground.

Simulation of spikes induced by micro-stimulation
We simulated spiking activity using a spontaneous firing rate drawn from a uniform distribution from 10 to 30 Hz and a maximal firing rate elicited by microstimulation between 80 and 200 Hz.The firing rate depended on current amplitude according to a logistic function, with the current threshold as an inflection point.Each microstimulation pulse elicited changes in the instantaneous firing rate modeled as a Poisson process (Fig. 2c) according to a kernel (Eq.( 1)) with excitation followed by inhibition, which was centered on every pulse (Fig. 2b).The amplitudes of the excitation and inhibition phases depended on the distance between the recording and stimulating electrodes according to a 3D Gaussian with an s.d. of 250 microns, centered on the stimulation electrode.
The kernel of the stimulation-induced change in firing rate, K stim , was defined as Where

Simulation of local field potentials (LFPs)
We used 1/f noise as the baseline LFP for each electrode.For each spike that occurred within a maximal distance from the recording electrode, a spike-triggered average LFP component was added to the baseline LFP (Fig. 2e).The amplitude of the spike-triggered average was inversely proportional to the distance between the electrode on which the spike occurred, and the recording electrode.
The kernel of the spike-triggered LFP was (2) Where k s is a spatial factor for the spike-triggered LFP, V Sp is voltage of

Simulation of the stimulation artifact
The simulated artifacts (K art ) had two components: a transient sine wave (K art1 ) superimposed on a slower multi-exponential decay (K art2 ) (Fig. 2g).The average amplitude of the artifact decreased in proportion to the square of the distance between the recording and stimulation electrodes, as measured in our recordings.We added random variation in the amplitude of the artifact (5-15%), in accordance with our measurements (reported in Results).We kept the amplitude of the simulated stimulation artifacts below the value that corresponded to saturation of the amplifiers (<4096 uV).
The sine wave was defined as where f pulse is the frequency of one stimulation pulse (2500 Hz).
The multi-exponential decay was defined as: Where λ 1 and λ 2 are the time constants of the decay, and t 0 ~ U(3/8 f, 7/8 f) ms.
The combination of K art1 and K art2 was defined as: Where Amplitude is the size of the artifact, which depends on the stimulation current and the distance between the stimulation and recording electrodes, as will be defined in Eq. ( 7) of the results section.

Simulation of the broadband signal
We generated the ground-truth broadband neurophysiological signal by adding the spike waveforms to the simulated LFP.We then added the simulated stimulation artifacts (Fig. 2h).

Artifact removal methods
We will now briefly describe the various methods for artifact removal that were applied to our simulated data.

Template subtraction
In visual prosthesis devices, a camera captures the visual world frame by frame and an image-processing computer converts each image into patterns of electrical stimulation.If the camera has a frame rate of 25 Hz, for example, images are presented for a duration of 40 ms.Hence, for a prosthesis, the stimulation parameters would remain stable for periods of 40 ms and the shape of the stimulation artifact is expected to remain relatively constant during this epoch.The Template subtraction method calculated the average shape of the artifact elicited by each of the pulses of such a train and subtracted it from the data.The template that we used was 150 samples long (5 ms) and included the slow decay of the artifact.

Linear interpolation
The Linear interpolation method identified the beginning and the end of the artifact waveform evoked by each stimulation pulse and replaced the data between these two points using Linear interpolation (using the Matlab function 'interp1').

Polynomial fitting
Whereas the Linear interpolation method removed the fast, transient component of the artifacts, it did not remove the slower decay that follows every transient component of the artifacts (shown in Fig. 2g).The polynomial fitting method aimed to remove both the transient component of the artifacts with linear interpolation and the slower decay using a polynomial fit to estimate the shape of the artifact in the time window between the offset of each stimulation pulse and the onset of the next pulse.For a pulse with a duration of T data points and an interval of N data points between two successive pulse onsets, the method fitted a k th -order polynomial function to the data points T+1 to N, where 3 ≤ k ≤ 8, and then subtracted the best-fitting curve from the original data.

Exponential fitting
This method was similar to the Polynomial fitting method but replaced the Polynomial fitting with a fast multi-exponential fit.We used a multi-exponential function (sum of 1-3 exponentials) using a fast algorithm that used the integral of the exponential (Tittelbach-Helmrich, 1993).

SALPA
SALPA artifact removal was performed using the published Matlab software package (Wagenaar and Potter, 2002).SALPA ignores large, fast deflections and makes no assumptions about the shape of the slow decaying component.It treats each artifact independently, by fitting a polynomial model that captures the shape of the artifact but does not fit the more complex shapes of spike waveforms.Specifically, the SALPA algorithm fits a series of third-order polynomials within a short time window, centered on each data point between successive large deflections.Thus, for N data points, there are N polynomial curves.All the polynomial curves are combined to estimate the shape of the artifact, and the estimate is subtracted from the data.Any unusable data or poorly fitted data points (labeled as such by the SALPA algorithm) were removed from the signal by performing a linear interpolation across those samples.

ERAASR
Artifact removal using the ERAASR method was carried out using a published Matlab code package (O'Shea and Shenoy, 2018).In short, the traces are first aligned temporally.The data are then arranged in a tensor in which all the channels, time points and stimulation pulses are represented.The algorithm calculates the linear decomposition of this data tensor.It first uses the first few principal components (PCs) (typically 4) across channels except the channel where the artifact is being removed and its immediate vicinity channels to capture the stimulation artifact.The reconstruction of these PCs is fitted to the signal from the channel where the artifact will be removed through linear regression.The resulting reconstructed artifact is then subtracted from that channel.Next, ERAASR repeats the reconstruction across pulses with the same stimulation parameters on the same recording electrode.Finally, decomposition is carried out across trials with the same stimulation parameters.Hence by iteratively removing the top PCs, ERAASR removes the shared structure across stimulation pulses and maintains the neuronal signals because they are unique to each channel (O'Shea and Shenoy, 2018).

Quantifying the performance of the artifact-removal methods
To evaluate the performance of the artifact-removal methods, we extracted spikes, MUA and LFP from the cleaned signals and compared them to the ground-truth signals.

Spike detection from cleaned data
To extract spikes, the cleaned broadband signal was band-pass filtered between 250 and 5000 Hz and we then set a threshold at 4x the RMS of the filtered signal.Any event that crossed the threshold was taken to be a 'spike.'The samples occurring between 0.4 ms before to 1.2 ms after the threshold crossing event were stored as the spike waveform.

Quantifying spike waveform distortion and spike identification from cleaned data
To evaluate the efficacy of the removal methods, we calculated the ratio between the firing rates obtained after artifact removal and the ground-truth firing rates.To detect missed spikes, we obtained spike tallies within 0.1-ms bins and calculated how often spikes in the groundtruth data were missed after artifact removal (misses) and vice versa, how often spikes occurred in the cleaned data that were not present in the ground truth data (false positives).We also calculated true positive (hits) and true negative firing rates (correct rejections).
F. Wang et al.

MUA from cleaned data
To calculate MUA from the cleaned broadband data, the 30-kHz broadband signal was band-pass filtered at 250-5000 Hz with a fourth-order Butterworth filter and full-wave rectified.The result was then low-pass filtered with a fourth-order Butterworth filter at 200 Hz.Finally, we down-sampled the result to 1000 Hz (Super and Roelfsema, 2005).

LFP from cleaned data
To generate LFP from the cleaned broadband data, the broadband signal was low-pass filtered at 200 Hz and down-sampled to 500 Hz.

Artifacts in electrophysiology data
We characterized the properties of artifacts that were generated by electrical microstimulation in area V1 in two monkeys.The implants consisted of fourteen 64-channel Utah arrays in area V1 and two 64channel Utah arrays in area V4 (Chen et al., 2020).We delivered trains of square-wave, biphasic, charge-balanced pulses at 300 Hz with current amplitudes ranging from 10 to 210 μA, via single V1 electrodes, while simultaneously recording from the other electrodes.Stimulation artifacts dominated the recorded signal during the stimulation train.These artifacts lasted tens to hundreds of milliseconds after the end of the train (Fig. 3a).The stimulation artifact from each individual pulse in a train typically consisted of two components: a large transient (<1 ms) deflection, followed by a slow decay that could be described by a sum of exponentials (Harding, 1991).The peak-to-trough amplitudes of the transient deflections were one to two orders of magnitude larger than those of recorded spikes in areas V1 and V4 (Fig. 3b) and depended on the stimulation current amplitude.On any given electrode, the peak-to-trough amplitude of the large transient deflection varied by 10-15% across stimulation pulses, even though the stimulation waveform and current amplitude was held constant (Fig. 3c).The amplitude of the artifact depends on the three-dimensional configuration of the recording electrode, the stimulation electrode and the return electrode (Fig. 3d,e).In our data set, the relationship between the artifact amplitude and the distance between the stimulation and recording electrodes could be approximated by a log linear model, although it is conceivable that the quality of the approximation would have increased had we taken the position and dimensions of the pedestal that was used for current return into account (Fig. 4a-c): The delay between the onset of the stimulation train and the artifact did not vary systematically with the distance between the electrodes (Fig. 4d).It can be seen in Fig. 3 how the stimulation artifacts obscured the neuronal signals.
In what follows, we will apply the artifact removal methods to simulated data so that we can compare their output to the ground truth, before artifacts were added.

Comparison of six software-based artifact removal methods
To evaluate the artifact-removal methods, we created a "ground truth" simulated dataset and added realistic artifacts.We then used the methods to remove the artifacts and compared the result to the ground truth.The first method, Template subtraction, estimates the shape of the artifact by averaging successive artifacts.This average artifact shape is subtracted from the electrophysiological signal at the stimulation time points, thereby reducing the artifact size in the cleaned signal (Fig. 5bd).
The second method was Linear interpolation.In this method, the large transient deflections caused by each stimulation pulse were replaced by a linear interpolation between the samples flanking the artifact (Fig. 5e, f).This method did not remove the slow artifactual decay, which followed the large deflection.
The next three methods do not only remove the large deflections but they also remove the later slow decay component, which lasts several milliseconds.The third method fitted a higher-order polynomial function to this slow decay (Fig. 5g, h) and the fourth method a sum of exponentials (Fig. 5i, j).The fifth method, SALPA, used a series of lowerorder polynomial functions, each within a short time window (Fig. 5k, l) (see Methods for details).
The sixth method, ERAASR, took advantage of the fact that stimulation artifacts recorded across electrodes have a similar shape.ERAASR uses serial linear decomposition to estimate the shared signal across electrodes, based on a principal component analysis.It preserves neuronal signals which are not shared across electrodes (Fig. 5m, n).This method makes no assumptions about the shape of the artifact.

Performance of the software-based artifact-removal methods
We evaluated each method for data containing 2,560,000 simulated stimulation pulses of different amplitudes, ranging from 1 to 210 uA, with 5-15% random variation in the amplitude of artifacts in the same train.

Estimation of stimulation artifacts
We first examined how well the artifact removal methods estimated the artifact waveform (A O ) that was added to the data.Fig. 6a-f shows examples of artifact estimation (A E ) for each method, during a simulated stimulation pulse train consisting of eight 80 μA pulses delivered at 200 Hz to an electrode located at a distance of 0.8 mm from the example recording electrode.The two model-free methods (Template subtraction and ERAASR) captured the large transient deflections and the slow decay components of the artifact (blue traces in Fig. 6a,f).The other four methods (Linear interpolation, Polynomial fitting, Exponential fitting, and SALPA) do not aim to estimate the shape of the fast and transient deflection but replace the artifact with a line segment.
Next, we quantified how closely A E matched the ground truth A O in the frequency band used to detect action potentials (250-5000 Hz), which is strongly influenced by the slow decay component of the artifact.We examined a 4.5-ms time window following the onset of each pulse and calculated the mean difference between the amplitudes of the A E and A O waveforms and the s.e.m. across all 2560,000 stimulation pulses (Fig. 6g-l).The Frame-averaging, Polynomial fitting, Exponential fitting and ERAASR methods yielded a mean difference that was near zero, indicating the absence of systematic deviations between A E and A O during the slow decay.The Linear interpolation method was characterized by a large negative deflection following each stimulation pulse, which gradually decreased over time because it did not estimate the slow decay component of A O .SALPA caused a small, but systematic deviation.

Removal of stimulation artifacts
Next, we examined the ability of the methods to remove artifacts from the neuronal data (Fig. 6m-r).We calculated the difference D between neuronal signal with artifact cleaned (S NAC ) and the ground truth simulated neuronal signal (S N ) prior to addition of simulated artifacts.

D(t) = |S NAC (t) − S N (t)|
The differences were largest immediately following each stimulation pulse, and then decreased until the onset of the next pulse.The Linear interpolation method had the largest peak deviation D(t) of 0.565 mV.The deviations for ERAASR (0.163 mV) and Template subtraction (0.159 mV) were also relatively large, because these methods aimed to recover the signal from the entire trace, including the larger deflections during stimulation pulses.During this phase, the amplitude of the stimulation artifact was 10-100 times larger than the neuronal signals.The Exponential fitting, Polynomial fitting and SALPA methods had peak deviations of 0.071, 0.066, and 0.055 mV, respectively.When we evaluated the mean deviation across all time points in the 4.5 s time window for the six methods, we found that it was smallest for SALPA (0.0498 mV), followed by Polynomial fitting (0.0500 mV), Exponential fitting (0.0616 mV), Template subtraction (0.1088 mV), ERAASR (0.1465 mV), and finally Linear interpolation (0.7914 mV) (Fig. 6s).

Spike detection and recovery of MUA
S NAC should be similar to S N if the stimulation artifact is successfully removed and the spikes are preserved.We measured the average difference between S NAC and S N across all data points (N=11.5×10 7) in the band-pass filtered (250-5000 Hz) signal, for each of the 64 electrodes of a simulated array, as function of the distance between the stimulation and recording electrodes (Fig. 7a).The difference between S NAC and S N decreased with distance from the stimulation electrode for the Template subtraction, Linear interpolation, SALPA and ERAASR methods (Template subtraction r = − 0.48, p = 5.6×10 − 5 ; Linear interpolation r = − 0.73, p = 1.1×10 − 11 ; SALPA r = − 0.52, p = 9.0×10 − 6 ; ERAASR r = − 0.57, p = 6.6×10 − 7 ), but this correlation was not significant for the Polynomial fitting and Exponential fitting in the distance range of a single 8×8 Utah array.
Next, we assessed the influence of the artifact-removal methods on spike detection.We detected spikes when S NAC and S N crossed a threshold (Fig. 7b).For an 8 ×8 Utah array on which stimulation was delivered via an electrode located on the corner of the array, Polynomial fitting and Exponential fitting hardly distorted the spike waveforms (the mean cross-correlation coefficient r between the spikes in S NAc and S N was 0.84 and 0.92, respectively), whereas Template subtraction and SALPA caused more distortion (r was 0.76 and 0.77, respectively).The distortion was largest for Linear interpolation and ERAASR (r was 0.38 and 0.32, respectively).
We next examined how well each method recovered spike firing rates based on spike detection.We calculated the ratio between the detected (S NAC ) and ground-truth (S N ) firing rates in 0.05-ms time bins spanning 0-1.5 ms after the stimulation pulse (Fig. 7c).False positives, which are spikes introduced by the artifact removal method, cause ratios larger than one and missed spikes cause ratios smaller than one.The Polynomial fitting, Exponential fitting and SALPA methods lost spikes that occurred during and immediately after the stimulation pulses but yielded accurate firing rates in the epoch thereafter.Template subtraction and ERAASR tended to miss spikes.The large remnants of artifacts associated with Linear interpolation caused many missed spikes.
Next, we examined the proportion of ground-truth spikes that were detected and the number of 'false-positive' spikes that were not present in the ground truth data, in time bins of 1 ms, as function of the distance between the stimulation and recording electrodes (Fig. 7d).Polynomial fitting and Exponential fitting yielded high and stable spike recovery rates across distances (with an average hit rate of 0.77 and 0.81, respectively).The SALPA method tended to remove spikes when the  artifacts were large, causing a low hit rate at distances smaller than 0.84 mm.For the Template subtraction and ERAASR methods, the hit rate also increased with distance.
We also evaluated the influence of the artifacts on multi-unit activity (MUA), a measure of spiking activity in the vicinity of the recording electrode.We calculated MUA as the envelope of the band-pass-filtered and full-wave-rectified data (see Methods, (Super and Roelfsema, 2005)).Polynomial fitting and Exponential fitting yielded MUA signals that closely matched the ground truth.After applying the Template subtraction, SALPA and ERAASR methods, the cleaned MUA signal had a larger amplitude than the ground-truth, indicating that remnants of the stimulation artifacts persisted in the MUA frequency band (Fig. 7e).Finally, Linear interpolation did not yield interpretable MUA.

Local field potentials
LFP signals are often used for controlling and calibrating prostheses, because this signal usually remains whereas the signal-to-noise ratio of spiking activity diminishes across the time that the implant is in the brain (Chen et al., 2023;Hughes et al., 2021).Most of the artifact-removal algorithms that we tested remove low-frequency components from the data, resulting in the loss of LFP signals.When we compared the LFP between the cleaned and ground-truth data (Fig. 8a), we observed that Template subtraction and Linear interpolation were the only two methods that preserved the LFP trace.This finding was confirmed with the power spectra of the entire LFP (Fig. 8b).We note that future studies could combine the LFP signal cleaned with Template subtraction or Linear interpolation with spiking activity in the 250-5000 Hz frequency band that is cleaned using one of the other methods.

Time consumption of the algorithms
Time-and power-consumption are important considerations for neuroprosthetic systems, especially when they must be implemented on mobile devices.To estimate the run time required, we tested the artifact removal methods on a 3.6-second simulated dataset with 1024 channels (Fig. 9).The Template subtraction, Linear interpolation and SALPA methods could remove artifacts in real time, whereas Polynomial fitting and Exponential fitting took more time, and ERAASR took the longest.The average trace of signal around each stimulation pulse was calculated (c) and subtracted from the original signal, resulting in the signal visible in panel d.e) Linear interpolation was a common first step for four methods: Linear interpolation, Polynomial fitting, Exponential fitting, and SALPA.f) Only the large transients were removed by the Linear interpolation method.g,h) Fitting of the slow decay with a 3rd-order polynomial (g) and the result after subtraction (h).i,j) Sum-ofexponentials fit (i) and the cleaned signal (j).k,l) Series of 3rd-order local polynomial fits based on SALPA (k) and the cleaned signal (l).m,n) ERAASR reconstructs the artifact on every channel using PCA and subtracts this waveform.We note that these results only give a first impression of the required computation resources and improved hardware and implementation of the algorithms will likely decrease the run time needed.
We provide a summary of the performance of the artifact-removal in Table 1, comparing i) their ability to retrieve spikes, ii) the preservation of spike waveforms, iii) their ability to recover MUA and iv) to extract

Discussion
A lack of ground truth data, caused by the distortion of neuronal signals during stimulation artifacts, has been a major obstacle in the evaluation of fully automatic methods for removing electrical stimulation artifacts.Here, we obtained recordings from a high-channel-count neural interface in the visual cortex and use key characteristics of the signals to build a simulated dataset that can be used to evaluate artifact removal methods.We illustrated the method by evaluating six software-based artifact-removal methods with a visual prosthesis application in mind.Such a prosthetic system transforms camera images into brain stimulation patterns and we therefore created a dataset with varying stimulation parameters, emulating successive 'camera frames.'Hence, the artifact-removal methods could not exploit the stereotypical artifact waveforms that occur under some of the laboratory conditions when the same stimulation parameters are often repeated.We note, however, that others may now use our methods to simulate other stimulation conditions.Several previous studies therefore used synthesized data to validate artifact removal methods (Caldwell et al., 2020;Erez et al., 2010;Hashimoto et al., 2002;Mena et al., 2017;O'Shea and Shenoy, 2018;Schelles et al., 2022;Young et al., 2018).Mena et al. (2017) achieved a remarkable accuracy in the removal of artifacts during microstimulation of the retina while obtaining high density recordings.They could follow action potentials of individual neurons while they propagated through the tissue, which contributed to the efficacy of the method.Unfortunately, it is unlikely that it is possible to place a sufficiently high density of electrodes in the visual cortex to apply this method.Young et al. (2018) investigated artifact removal while recording from Utah arrays, just as in the present study.The Utah array was positioned in the motor cortex and the researchers applied a regression method to remove artifacts induced by electrical stimulation of the skin or muscles of the arm.They investigated the accuracy of decoding the intended arm movement, but did not quantify the quality of the LFP and the spiking activity.This approach requires a separate decoder for each combination of a stimulation and recording electrodes, and future studies could examine whether it can be scaled up for implants with hundreds of electrodes for a visual brain prosthesis.More recently, Schelles et al. (2022) applied an advanced multi-channel Wiener filter method for artifact removal.Unfortunately, their approach requires that the neural signal and artifacts are uncorrelated, a requirement that is not met by the visual prosthesis, which aims to record neuronal responses elicited by microstimulation.Future studies could examine whether the violation of this assumption is detrimental and how the method scales if there are many electrodes that can be stimulated, because it might become computationally expensive.
The prototype visual brain prosthesis has several features that make artifact removal challenging.First, the density of electrodes is low compared to the number of recorded neurons, as was mentioned above.Second, the distance between recording and stimulation electrodes is only a few mm and artifacts are therefore much larger than the spiking activity.Third, the signal-to-noise ratio of the multiunit signals of the Utah array generally do not permit spike sorting (Chen et al., 2023;Chestek et al., 2011;Cody et al., 2018;Hughes et al., 2021).We here limited the simulations to the electrical stimulation of a single electrode at a time, a pattern that is useful for the calibration of stimulation thresholds at the individual electrodes, one at a time, by examining the activity at the other, non-stimulated electrodes.The present simulation framework does, however, permit the study of the more complex stimulation artifacts that would be caused by the simultaneous stimulation of multiple electrodes when a sequence of camera frames is presented to the prosthesis user.
A limitation common to all software-based artifact removal methods is that the saturation of the recording amplifiers causes a loss of data.In  our neuronal recordings in monkeys, none of the amplifiers were saturated during the delivery of currents smaller than 20 μA.When we applied higher stimulation amplitudes, amplifier saturation occurred for electrodes near the stimulation electrode, but not for electrodes >2.5 mm away.We note, however, that amplifier saturation depends on the configuration of the current source, the return for the current and the method used for grounding.We used the Linear interpolation method, which connects the data points before and after the stimulation pulse with a straight line, as baseline.Artifacts remained, because this method did not remove the slow decay component and it therefore failed to recover spiking activity and MUA.We note, however, this method permitted recovery of the LFP and can be used in combination with one of the other methods to detect spikes.
When we examined the recovery of spiking activity, the Exponential fitting method performed best, followed closely by Polynomial fitting and SALPA.These three methods were robust to variations in the data as they did not rely on structure in the data that was repeated across trials, pulse trains, or electrode arrangements.Instead, these methods treat all stimulation pulses independently.A disadvantage is that these methods did not remove the large deflection caused by the stimulation pulses but lost ~1 ms of the signal during each pulse.
The performance of Template subtraction and ERAASR was poorer than that of Exponential fitting, Polynomial fitting and SALPA.An additional disadvantage is that Template subtraction and ERAASR required the precise temporal alignment of the neurophysiological data to the stimulation pulses, which can be time-consuming in high-channelcount systems.Without such alignment, aliasing occurs and large deflections cannot be fully removed.
The suboptimal performance of ERAASR case may have been caused by several factors.Firstly, ERAASR was developed for V-probes (O'Shea and Shenoy, 2018) which are linear probes with nearby channels.In spite of our efforts, we may have failed to obtain the optimal ERAASR parameters for our dataset.Secondly, ERAASR benefits from oversampling of the data followed by careful time alignment of the artifacts, which may not be compatible with our secondary aim to minimize power consumption.Thirdly, we examined an early version of ERAASR and did not test a later version (O'Shea et al., 2022).
The computational power and time-and energy-consumption are also important factors in the design of a visual prosthesis device.The system must be compact, portable, untethered and run on battery power.The computational power requirements of Template subtraction, Linear interpolation, SALPA, and fast Exponential fitting methods were relatively light compared to Polynomial fitting, and ERAASR was computationally most demanding.
We conclude that among the artifact removal methods tested, the combination of Exponential fitting for spike detection with Linear interpolation or Template subtraction for LFP recording, worked best.However, we also identified several limitations of these methods.None of them removed the stimulation artifacts from the electrophysiological signal completely.Furthermore, implementation of these methods required careful supervision and did not run fully automatically.Hence, the ultimate choice of method and its calibration is likely to depend on the setup and equipment.
Our comparison of artifact removal systems using simulated datasets, described here, would be useful for the further development of artifactremoval methods, and could take the present pipeline and code to create simulated datasets as a starting point.Improvements in modelling and machine-learning techniques may further enhance the artifact-removal methods, paving the way for neuroprosthetic systems that can read from the brain while using electrical microstimulation to convey information for the partial replacement of lost sensory organs.

Declaration of Competing Interest
F W declares no conflict of interests.P R and X C are co-founders and shareholders of a neurotechnology start-up, Phosphoenix BV (Netherlands) (https://phosphoenix.nl).

Fig. 1 .
Fig. 1.Diagram of a future visual prosthesis system.High-channel-count multi-electrode arrays are implanted in the primary visual cortex of the patient.The stimulators and amplifiers are under the skin, together with wireless power and communication modules.A portable image processor and stimulation controller is connected to the head-mounted camera.The commands for stimulation and the signals recorded from the electrodes are transmitted wirelessly.

F
.Wang et al.

Fig. 2 .
Fig. 2. Simulated data.a) Waveform of example stimulation pulses.b) The kernel describing the effects of a stimulation pulse on the firing rate of neurons near the stimulation electrode.c) The simulated instantaneous firing rate of an example channel during a stimulation train.d) Example simulated spike trains, where each row is a repetition of the same stimulation parameters on the same recording electrode.e) Simulated spike-triggered average LFP.f) Simulated broadband data (LFP plus spike waveforms).g) An example stimulation artifact.h) Combined data, consisting of the simulated broadband neuronal data plus artifacts.

Fig. 3 .
Fig. 3. Properties of the recorded artifacts during stimulation on an example V1 electrode (62.5 μA at 200 Hz).a) Each row shows the mean waveform of a train of stimulation artifacts, averaged across all trials and all electrodes per Utah array in V1 of a monkey.The traces in panels a and b are color-coded per array, with the same colors as in the top of panel d.b) The mean stimulation artifact per Utah array, averaged across stimulation pulses and electrodes per array.Red, 100 μA; blue, 5 μA.c) The standard deviation in the peak-to-trough amplitude of the stimulation artifact pulses as function of the distance between the recording and stimulating electrodes, across all pulses in stimulation trains.d) Electrode locations on the cortex, color-coded by the amplitude of the stimulation artifact.The electrode locations were obtained from a photograph taken during the implantation surgery.The cross indicates the stimulated electrode.The color of the data points indicates the amplitude of the artifact waveform (from peak to trough) in mV (same color scale as in panel e).e) The peak-to-trough artifact size decreased with the distance between the recording and stimulation electrodes.

Fig. 4 .
Fig. 4. Influence of distance from the stimulation electrode on the size of the artifact.a-c) Peak-to-trough amplitude of the artifact as a function of the distance between stimulation and recording electrodes, when stimulating in V1 and recording from V1 (a), when stimulating in V4 and recording from V1 (b) and when stimulating in V1 and recording from V4 (c).d) The time elapsed between the onset of stimulation pulses and the first peak of the artifact waveform, as a function of the distance between stimulation and recording electrodes.

F
.Wang et al.

Fig. 5 .
Fig. 5. Overview of the artifact-removal methods tested.a) Simulated, unfiltered data with electrical stimulation artifacts.b-d) Template subtraction method.The average trace of signal around each stimulation pulse was calculated (c) and subtracted from the original signal, resulting in the signal visible in panel d.e) Linear interpolation was a common first step for four methods: Linear interpolation, Polynomial fitting, Exponential fitting, and SALPA.f) Only the large transients were removed by the Linear interpolation method.g,h) Fitting of the slow decay with a 3rd-order polynomial (g) and the result after subtraction (h).i,j) Sum-ofexponentials fit (i) and the cleaned signal (j).k,l) Series of 3rd-order local polynomial fits based on SALPA (k) and the cleaned signal (l).m,n) ERAASR reconstructs the artifact on every channel using PCA and subtracts this waveform.

F
.Wang et al.

Fig. 6 .
Fig. 6.The estimation of artifact shape.a-f) The blue trace shows the shape of the artifact that was estimated by each method for an example stimulation train and recording channel.Orange trace, signal after removal of estimated artifact.g-l) Average trace after subtraction of estimated artifacts.Thin line marks s.d.m-r) Average deviation from the ground truth after artifact removal.Thin line marks s.d.s) left, Deviation from ground-truth data, averaged across time points.right, Given the amount of data, all differences between methods were significant (paired t-test, n = 64, df = 63).*, p<0.0001; **, p<0.00001.

F
.Wang et al.

Fig. 7 .
Fig. 7. Quality of recovered spiking data and MUA.a) Absolute deviation from ground truth after artifact removal (filtered between 250 and 5000 Hz), as a function of the distance between recording and stimulation electrodes.Note the difference scale on the y-axis for the Linear interpolation method.b) Black trace, the average spike waveform of the ground-truth and its s.d.Red traces, 100 example spike waveforms after artifact-removal.c) Ratio between recovered firing rate after artifact removal and the ground-truth firing rate, aligned to the onset of simulated electrical stimulation pulses.d) Blue: hit rate.Red: false positive rate.e) Recovery of MUA.

Fig. 8 .
Fig. 8. Detection of LFP.a) Example LFP traces, with the stimulation pulse at t=0 ms.b) LFP power spectrum calculated across the entire dataset.Black traces, ground truth.Red traces, after artifact removal.

Fig. 9 .
Fig. 9. Average time consumption.We measured the run time needed by each method to remove 76,800 stimulation pulses on a desktop workstation with a single Xeon E3-1245 CPU.

Table 1
Overview of methods tested.