Animal preparation
Two adult common marmoset (Callithrix jacchus) were chronically and epidurally implanted with hemispheric electrocorticogram (ECoG) arrays. Marmoset S (male, 2y3m, 370g) was implanted on the right hemisphere with a 96-channel ECoG array developed by Komatsu et al26. Marmoset L (female, 3y1m, 380g) was implanted on the left hemisphere with a custom-designed 128-channel ECoG array. Both ECoG arrays were made of polyimide, with 0.8 mm electrode diameter and 2 ~ 2.5 mm inter-electrode distance (Cir-Tech Co., Ltd.). The surgeries were planned based on computerized tomography (CT) and magnetic resonance (MR) imaging. Due to individual difference in brain and skull shape, some electrodes were cut before surgery for easiness of implantation. In the surgery, two slit-shaped craniotomy were opened, one close to the midline and one close to the border of temporal muscle. The array arms were inserted to the medial slit and adjusted through the lateral slit to avoid overlapping. The connector box (combined with the headpost) was glued to the skull and 7 skull skews by Super-Bond (Sun Medical Co., Ltd) and strengthened by Unifast II dental resin (GC Corporation). Both the connector box and the scull screws were made of polyether ether ketone (PEEK). All surgeries were performed under general anesthesia (isoflurane 1.5%) and analgesics (ketoprofen 1mg/kg) was administered after surgery. All experiments followed the guidelines approved by the Institutional Animal Care and Use Committee (IACUC) of Kyoto University.
Electrode localization
CT images were taken after ECoG implantation and aligned with MR image taken before surgery. Electrode locations were identified by manually labeling on the CT images. The MR images was registered to the BMA 2019 Ex Vivo v1.0.0 atlas27 using 3D Slicer and the Advanced Normalization Tools (ANTs)28 implemented in 3D Slicer. The resulting transformations were applied to the electrode labels, and the covered brain areas were determined by vertices of the atlas nearest to the electrode labels.
Data acquisition
The local field potential (LFP) was digitized at 30 kHz and acquired using the hardware and software from SpikeGadgets (headstage: UH32; control units: MCU, HCU, ECU; software: Trodes; SpikeGadgets LLC). The LFP data was synchronized with task events by a digital line connecting the ECU and a USB DAQ system (USB1208FS, Measurement Computing Corporation) wired to the experimental computer, and streamed to the experimental computer through ethernet connection. The LFP was down-sampled to 1000 Hz before processing. The sign of LFP was flipped to reflect the intracellular voltage. No filter or reference was applied unless specified otherwise. Overlapping, cut or damaged electrodes were excluded in the analysis, so that 82 and 122 valid electrodes remained for Marmoset S and Marmoset L, respectively.
Eye tracking data was acquired by Eyelink 1000 plus primate mount system (SR Research, Canada) at 500 Hz binocularly in pupil-CR mode. The eye data was synchronized with task events by messages sending to the Eyelink host PC and stored in the EDF file. The eye data was linearly up-sampled to 1000 Hz before processing.
All data analysis was done using custom MATLAB (MathWorks) codes and C codes implemented as MATLAB mex functions unless specified otherwise.
Behavioral paradigm
Experimental setup was the same as our previous study29, if not specified otherwise. Experiments were performed in a dark sound-attenuating chamber. Visual stimuli were presented at 60 Hz on a 1920x1080 pixel LCD monitor (Dell AW2521HF), placed at 410 mm distance from the subject. The display area of the monitor spans 67.1x40.5 degree visual angles (DVA), but only a 30x30 DVA gray window (luminance: 119 cd/m2) centered at monitor center was used as stimulus background (area outside this window was black). Two photodiodes were placed at the top left and bottom left corners of the monitor, and connected to the experimental computer via the USB DAQ system. All tasks were written in Python using the PsychoPy library30 for presentation of visual stimuli. During the task, white squares were presented on both corners, which flashed off for 1 frame at each stimulus onset, so the precise timing can be detected by the photodiodes. The flashed area and the photodiodes were covered by black shrouds. Adaptive Sync technology was applied to minimize display latency and enhance consistency of frame interval31. The average display latency measured by photodiodes is about 2.5 ms, and the standard deviation of frame interval is about 0.1 ms. The marmosets were trained to be head-fixed in a primate chair while staying calm. The chair has 6 degree of freedom for adjustment of head position, and custom coordinates were used for each marmoset, so they look at the monitor center when relaxed.
In the visually-guided saccade task, a full-contrast, bright-center-dark-surround anulus stimulus (diameter: 2 DVA) was shown at the monitor center. The marmosets were required to fixate on the stimulus for 250 ms within 1000 ms. Then the stimulus jumped to a peripheral location (eccentricity: 4, 5, 6, 7, 8, 9 DVA, direction: 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°), where the marmosets were required to make a saccade to and fixate for 250 ms within 1000 ms, before a 2000 ms inter-trial interval (ITI). If successful, the stimulus was replaced by a picture of marmoset face (2x2 DVA square) staying for the first 500 ms of ITI and liquid reward was delivered. An invisible eye mask (diameter: 2 DVA) was used to determine overlapping of gaze location with the stimulus. Stimulus eccentricity remained constant and stimulus direction was pseudo-randomized in each task session. After each failed trial, the same stimulus location was used in the next trial. Each task session contains 24 required trials, and was aborted if number of failed trials reached 24. The length of eye data and ECoG data sessions were the same as their corresponding task sessions.
The same task was used for daily gaze calibration separately for both eyes and both axes using the following formula:
$${\text{y}=\text{tan}}^{-1}\left(a\bullet x+b\right)$$
where Y is the target location relative to each eye in DVA, x is the mean raw gaze location in the last 50 ms of fixation, and a, b are fitting parameters. Although the maximum tolerated fixation error was 2 DVA by task design, the marmosets usually made precise fixations and the root-mean-square-error of daily gaze calibration was about 0.6 DVA.
Saccade detection
Saccade periods were flagged using the U'n'Eye deep neural network32 in Python, trained by manually flagged eye data from other 3 marmosets. Eye velocity for the input to this network was calculated by a differentiating odd symmetric FIR filter with a smoothing 3 dB cutoff at 54 Hz. Samples of flagged data were manually checked for each marmoset to ensure model generalizability. Only successful trials with single saccade from the central stimulus to the peripheral stimulus were used for analysis. Trials with saccade reaction time shorter than 55 ms were regarded as anticipatory and excluded from analysis.
Event-related potential (ERP) analysis
LFPs of pure ipsiversive (Marmoset S, n = 1355; Marmoset L, n = 790) and pure contraversive (Marmoset S, n = 1281; Marmoset L, n = 724) trials were aligned on saccade offset to calculate the averaged ERP. For feature analysis of the triphasic ERP (Supplementary Table 2), timing of the trough (Trough), the first peak (Peak 1), and the second peak (Peak 2) were determined for each trial. For LFP data from each trial, the minimum during 20 ~ 60 ms after saccade offset was determined as Trough, the maximum form saccade offset to Trough was determined as Peak 1, and the maximum from Trough to 100 ms after saccade offset was determined as Peak 2. For the same analysis on ERP aligned on saccade onset (Supplementary Table 3), different searching window was used: 50 ~ 90 ms for Trough, 30 ~ Trough ms for Peak 1, and Trough ~ 150 ms for Peak 2. If any of the extrema was located at edges of the searching window, the trial was excluded from this analysis.
Statistics were performed to compare ERP features from two conditions (Contra. vs. Ipsi. in Supplementary Table 2; Sacc. On. vs. Sacc. Off in Supplementary Table 3). Median latency and amplitude were compared by two-tailed Wilcoxon rank sum test. Mean absolute deviation from the median (MADM) of latency was compared by the Brown–Forsythe test.
Time-frequency analysis
The time-frequency representation (TFR) of the original LFP signal (same as used for ERP analysis) was calculated by the wavelet convolution method, using complex Morlet wavelets (MW) of 64 central frequencies logarithmically spaced between 1 Hz and 160 Hz. The process was done to the whole recording session, zero padded to \({2}^{18}\) samples, before dividing into trials. To achieve a balance between temporal and spectral precision across different frequencies, a fixed wavelet number of 7 was used instead of fixed time window.
$$s=\frac{7}{2\pi f\bullet fs}$$
$$MW\left(t,f\right)= \frac{1}{\sqrt{s\sqrt{\pi }}}\bullet {e}^{\raisebox{1ex}{${-t}^{2}$}\!\left/ \!\raisebox{-1ex}{$2{s}^{2}$}\right.}\bullet {e}^{i2\pi ft}$$
$$TFR\left(t,f\right)=LFP\left(t\right)*MW\left(t,f\right)$$
where s is standard deviation of the Gaussian taper, fs is the sampling frequency (here, 1000 Hz), and f is central frequency of the Morlet wavelet. The Morlet wavelets were then normalized by their area under the curve, so that the amplitude of the resulting TFR reflects that of the original LFP.
Power was calculated by taking the absolute value of the TFRs, which was then Z-scored by the mean and standard deviation of each recording session, for each electrode and frequency.
The inter-trial phase clustering (ITPC) value was calculated by the following formula:
$$ITPC\left(t,f\right)=\left|\frac{\sum _{k=1}^{n}{e}^{i\phi \left(t,f\right)}}{n}\right|$$
where n is the total trial number, i is the imaginary unit, and \(\phi \left(t,f\right)\) is the phase angle of \(TFR\left(t,f\right)\). By definition ITPC ranges between 0 and 1, with 0 indicating random phase distribution and 1 indicating perfect phase locking. Phase-locked power was the product of power and ITPC.
Instantaneous amplitude, phase, and frequency
For each channel in the ECoG array, its original LFP time series was temporally filtered with a zero-lag 8th -order digital Butterworth band-pass filter (20 ~ 100 Hz). The complex analytic signal \(X\left(t\right)\) can be derived from the following formula:
$$X\left(t\right)=x\left(t\right)+iH\left[x\left(t\right)\right]$$
$$=A\left(t\right)\left(\text{cos}\left(\phi \left(t\right)\right)+i\text{sin}\left(\phi \left(t\right)\right)\right)$$
where i is the imaginary unit, \(H\left[x\left(t\right)\right]\) denotes the Hilbert transform of the filtered signal \(x\left(t\right)\), \(A\left(t\right)\) and \(\phi \left(t\right)\) denote the instantaneous amplitude and phase of \(x\left(t\right)\) respectively. The amplitude-normalized signal is therefore:
$$\stackrel{\sim}{x}\left(t\right) = \frac{x\left(t\right)}{A\left(t\right)}= \text{cos}\left(\phi \left(t\right)\right)$$
The instantaneous frequency is defined as:
$$fi\left(t\right)=\frac{\varDelta \phi }{\varDelta t}=\frac{\phi \left(t+1\right)-\phi \left(t-1\right)}{2}$$
Graph-based algorithm for wave detection
1. A weighted neighbor graph \({G}_{n}=\left(V\left({G}_{n}\right),E\left({G}_{n}\right)\right)\) was constructed for each electrode map, with each node in \(V\left({G}_{n}\right)\) representing a valid electrode. Then, nodes of \({G}_{n}\)were placed in Euclidian space according to 3D coordinates of the electrodes. For each node pair in \(\left\{\left(a,b\right)\left|a,b\in V\left({G}_{n}\right)\right.\right\}\), if the Euclidian distance between them was smaller than a threshold (here, 3.5 mm), add e\(\left(a,b\right)\) to \(E\left({G}_{n}\right)\), weighted by the distance. Due to the uneven surface distribution of electrodes, some edges were added to connect distant neighbors, and some edges were removed so that each node was only adjacent to its immediate neighbors.
2. For each data frame at time t, the instantaneous frequency and normalized LFP of each channel were calculated using the above-mentioned method. Then, a directed, weighted phase gradient graph \({G}_{pg}\left(t\right)=\left(V\left({G}_{n}\right),\varnothing \right)\) was constructed based on the neighbor graph. For each edge e\(\left(a,b\right)\) in the neighbor graph \({G}_{n}\), do the following steps.
1) A time window win was calculated by the following formula:
$$win=round\left(\frac{fs}{4\left({fi}_{a}\left(t\right)+{fi}_{b}\left(t\right)\right)}\right)$$
where fs is the sampling frequency. This time window corresponds to roughly a quarter of an average period.
2) The mean absolute difference (MAD) between the fragment of \({\stackrel{\sim}{x}}_{a}\left(t\right)\) centered at time t flanked by w, and the fragment of \({\stackrel{\sim}{x}}_{b}\left(t\right)\) time-shifted by d (ranging from − 12 to + 12 ms) was calculated by the following formula:
$${MAD}_{ba}\left(d\right) = \frac{{\sum }_{u=t-win}^{t+win}\left|{\stackrel{\sim}{x}}_{b}\left(u+d\right)-{\stackrel{\sim}{x}}_{a}\left(u\right)\right|}{2win+1}$$
The time shifts that give the minimum \({MAD}_{ba}\) and \({MAD}_{ab}\) were \({d}_{ba}\) and \({d}_{ab}\) respectively.
3) Edge e\(\left(a,b\right)\) was skipped if either condition was satisfied:
I. \({MAD}_{ba}\) or \({MAD}_{ab}\) is larger than a threshold (here, 0.15). This means a good matching cannot be found by shifting the normalized LFP of either channel.
II. \(\left|{d}_{ba}+{d}_{ab}\right|\) is larger than a threshold (here, 2 milliseconds). This means the best time shifts were not well centered around zero.
4) A weighted directed edge \(\stackrel{⃑}{e}\left(a, b\right)\) was added to \(E\left({G}_{pg}\right)\) if both conditions are satisfied:
I. \({d}_{ab}<0\) , meaning the past waveform of channel a fits well with the current waveform of channel b.
II. \({d}_{ba}>0\) , meaning the current waveform of channel a fits well with the future waveform of channel b.
Similarly, if \({d}_{ab}>0\) and \({d}_{ba}<0\), a directed edge \(\stackrel{⃑}{e}\left(b, a\right)\) was added.
The weight of \(\stackrel{⃑}{e}\left(a, b\right)\) was calculated by the phase gradient \(PG\left(a,b\right)\):
$$PG\left(a,b\right)=\frac{\left|arg\left({X}_{a}{\stackrel{-}{X}}_{b}\right)\right|}{weight\left(e\left(a,b\right)\right)}$$
5) If dab = dba = 0, the latency of the putative causal relationship between channel a and b is less than 1 millisecond. In this case, both \({d}_{ab}\) and\({d}_{ba}\) were replaced by 1 or -1 which gave a smaller MAD. Then the criteria in 4) were applied. Edges added by this way were labeled as “unstable” and were subjects to reversal in following steps.
3. Find the root node set R ⊆\(V\left({G}_{pg}\right)\), which contains nodes that have no inedge and at least 1 outedge. For each r ∈ R, construct an induced subgraph \({G}_{sub}\left(t, r\right)\) of \({G}_{pg}\left(t\right)\) containing root node r and nodes that r has a directed path to. F(t) is the set containing all such subgraph at time t. For each \({G}_{sub}\left(t,r\right)\) in F(t), do the following:
1) If there is an unstable edge \(\stackrel{⃑}{e}\) pointing from r to a node that also belongs to \({G}_{sub}\left(t, \stackrel{\prime }{r}\right)\) and \(\left|V\left({G}_{sub}\left(t,\stackrel{\prime }{r}\right)\right)\right|>\left|V\left({G}_{sub}\left(t,r\right)\right)\right|\), reverse the direction of \(\stackrel{⃑}{e}\) and change the sign of its weight, add to \({G}_{sub}\left(t, \stackrel{\prime }{r}\right)\)all nodes and edges of \({G}_{sub}\left(t,r\right)\) that was not in it already, and remove \({G}_{sub}\left(t,r\right)\) from F(t). After this, remove \({G}_{sub}\left(t,r\right)\) if \(\left|V\left({G}_{sub}\left(t,r\right)\right)\right|\) is smaller than a threshold (here, 3 electrodes).
2) Calculate the maximum directed spanning tree \(MDST\left(t, r\right)\) of \({G}_{sub}\left(t,r\right)\) by the Chu–Liu/Edmonds' algorithm. Find the unique path from r to each nodes in \(V\left({G}_{sub}\left(t,r\right)\right)\). The weight of such path in \({G}_{n}\) represents the traveling distance from the source channel. The cumulative phase difference of such path can be derived by summing up the phase difference of its constituent edges, which are their weights in \({G}_{pg}\) multiplied by those in \({G}_{n}\). P-value of Pearson correlation between the traveling distance and the cumulative phase difference was calculated. If the P-value is larger than a threshold (here, 0.005), remove \({G}_{sub}\left(t,r\right)\).
4. Create list TW to store the spatiotemporal representation of identified waves, and list \({TW}_{temp}\) to store ongoing waves. Each of such representation is formed by concatenating \({G}_{sub}\left(t,r\right)\) representing the same wave across consecutive time frames. For each F(t), do the following:
1) For each remaining \({G}_{sub}\left(t,r\right)\) in F(t), check if r is also a root of the last frames of waves in \({TW}_{temp}\). Also check for neighbors of r (defined in \({G}_{n}\)) whose phase gradient from r is lower than a threshold (here, 0.05 rad/mm). If \({G}_{sub}\left(t,r\right)\) is the succession of an ongoing wave in \({TW}_{temp}\), concatenate it to the last frame of that wave. Otherwise, add a new wave to \({TW}_{temp}\) and let \({G}_{sub}\left(t,r\right)\) be its first frame.
2) Find waves in \({TW}_{temp}\) which had not been renewed for longer than an interruption threshold (here, 3 ms). For each of such wave, if the wave duration is shorter than a threshold (here, 6 ms), or the ratio of interruption to the wave duration is higher than a threshold (here, 0.4), remove the wave. For each remaining wave, if the weighted average (by number of nodes) of Pearson P-value of its frames was higher than a threshold (here, 0.001), remove the wave. Otherwise, move the wave to TW.
Simulated wave
Traveling waves of known parameters were simulated on the electrode maps of the 2 marmosets. First, 1 ~ 4 source electrodes locating on occipital, frontal, parietal and temporal areas were manually selected (see Supplementary Fig. 1; map of Marmoset S: 80, 60, 30, 10; map of Marmoset L: 69, 58, 45, 31; only the first electrode in the list was used in 1 source condition, and so on). Each of them independently generated traveling waves spanning random duration between 50–200 ms at random intervals between 50 ~ 200 ms. For each wave from electrode i, its source signal is given by:
$${s}_{i}\left(t\right)=A\text{cos}\left(\frac{2\pi {\omega }_{i}t}{fs}+{\theta }_{i}\right), {t}_{0}\le t\le {t}_{end}$$
where A, \({\omega }_{i}\), and \({\theta }_{i}\) denote the amplitude (same for all waves), frequency (randomly selected between 20 ~ 50 Hz), and phase shift (randomly selected between 0 ~ 2π) of the signal respectively, fs denotes the sampling frequency (here, 1000 Hz), \({t}_{0}\) and \({t}_{end}\) denote the onset and offset timing of the wave.
To determine the spatial scope of each simulated wave, the shortest path tree from the source node was constructed on the neighbor graph. Then a probabilistic depth-first search (pDFS) was conducted on the shortest path tree from the source node. During the pDFS, the probability for a newly encountered node to be pushed to the stack is describe by a Gaussian cumulative distribution function (GCDF) based on its graph distance from the source. Nodes whose graph distance were shorter than a threshold (here, 3.5 mm) were pushed to the stack regardless of the probability, to ensure a minimum number of member nodes.
$$P\left(i,a\right)=1-GCDF\left(\mu ,\sigma ,dist\left(i,a\right)\right)$$
where \(\mu\) (here, 10 mm) and \(\sigma\) (here, 5 mm) denote the mean and standard deviation of the Gaussian distribution respectively, and \(dist\left(i,a\right)\) denotes the graph distance between source node i and node a on the neighbor graph.
Therefore, the signal at electrode a is given by:
$${x}_{a}\left(t\right) = \sigma \epsilon \left(t\right)+{\sum }_{i=1}^{n}{s}_{i}\left(t-\frac{dist\left(i,a\right)}{{v}_{i}}\right)\bullet {\gamma }^{dist\left(i,a\right)}$$
where n denotes the number of simultaneously occurring waves at time t that encompass a, \({v}_{i}\) denotes the traveling velocity (randomly selected from 600 ~ 3000 mm/s) of the wave, \(dist\left(a,i\right)\) denotes the graph distance between electrode a and source i in the neighbor graph, \(\sigma\) denotes the standard deviation (10%~50% of \({A}_{i}\)were tested) of the Gaussian noise \(\epsilon \left(t\right)\), and γ denotes the attenuation factor of the signal (here, 0.99 \({mm}^{-1}\)).
Then, the graph-based algorithm was applied to the simulated data. For each simulated wave, we extracted identified waves temporally overlapping with the simulated wave, whose main root node also matched the source electrode of the simulated wave. The total labeled time points of these identified waves were compared to the ground truth to calculate the temporal true positive rate (TPR) and false discovery rate (FDR). Nodes appearing in more than 50% of the labeled time points were defined as the spatial scope of the wave and compared to the ground truth to calculate the spatial TPR and FDR. Median root amplitude, median frequency, and median velocity were calculated from the spatiotemporal representations of the identified waves, to calculate the percent deviations from the ground truth.
Statistical analysis on wave parameters
For analysis on latency distribution (Fig. 5), only the largest wave originating from area V1 around saccade offset (-50 ~ 100 ms) in each trial was included (sample size; Marmoset S, 0°: 1325, 45°: 1318, 90°: 1218, 135°: 1231, 180°: 1269, 225°: 1349, 270°: 1013, 315°: 1358; Marmoset L, 0°: 718, 45°: 702, 90°: 703, 135°: 737, 180°: 784, 225°: 802, 270°: 724, 315°: 752), determined by spatio-temporal scope over the visual area (see Supplementary Table.S1). Data was divided into 30 bins, each 5 ms wide. One-tailed binomial test against the chance level of 1/30 was performed for each bin. The probability of the largest waves occurring in 0 ~ 50 ms after saccade offset was compared between direction pairs (0° vs. 180°, 90° vs. 270°, 45° vs. 135°, 315° vs. 225°) by Pearson's chi-squared test.
For analysis on wave source (Fig. 6) and other parameters (Fig. 7), only trials with at least 1 identified TW during 0 ~ 50 ms after saccade offset were included (sample size; Marmoset S, 0°: 1085, 45°: 1012, 90°: 995, 135°: 1079, 180°: 1192, 225°: 1284, 270°: 940, 315°: 1113; Marmoset L, 0°: 639, 45°: 605, 90°: 619, 135°: 667, 180°: 677, 225°: 686, 270°: 651, 315°: 658). The probability of wave source located at each electrode between ipsiversive and contraversive trials was compared by Pearson's chi-squared test. The spatio-temporal median of wave duration, maximum number of covered electrode, and spatio-temporal median of propagating velocity between ipsiversive and contraversive trials were compared by the two-tailed Wilcoxon rank sum test.