Macaque monkeys and humans sample temporal regularities in the acoustic environment

Many animal species show comparable abilities to detect basic rhythms and produce rhythmic behavior. Yet, the capacities to process complex rhythms and synchronize rhythmic behavior appear to be species-specific: vocal learning animals can, but some primates might not. This discrepancy is of high interest as there is a putative link between rhythm processing and the development of sophisticated sensorimotor behavior in humans. Do our closest ancestors show comparable endogenous dispositions to sample the acoustic environment in the absence of task instructions and training? We recorded EEG from macaque monkeys and humans while they passively listened to isochronous equitone sequences. Individual- and trial-level analyses showed that macaque monkeys' and humans' delta-band neural oscillations encoded and tracked the timing of auditory events. Further, mu- (8-15 Hz) and beta-band (12-20 Hz) oscillations revealed the superimposition of varied accentuation patterns on a subset of trials. These observations suggest convergence in the encoding and dynamic attending of temporal regularities in the acoustic environment, bridging a gap in the phylogenesis of rhythm cognition.


Introduction
"The perception, if not the enjoyment, of musical cadences and of rhythm is probably common to all animals and no doubt depends on the common physiological nature of their nervous system" -Charles Darwin.
Research in non-human animal species is considered a test case for unravelling the evolutionary origin(s) of human rhythm cognition (Honing, 2018; Kotz et al., 2018; Ravignani et al., 2019; Patel, 2014). Here, we conceptualize 'rhythm' as any pattern of events re-occurring over time. The temporal regularity of an acoustic event sequence (from isochronous tones to music) and complex behavior (e.g., walking and speaking) can, thus, be seen as a form of rhythm.
What makes us capable of detecting temporal regularities in the environment? And in turn, what allows us to produce rhythmic behaviors and synchronize our movements to an external rhythm? Detection requires the encoding of a rhythm, i.e., the neurophysiological processing of temporal regularity. In turn, this allows producing and synchronizing with external rhythms. However, are the capacities to detect, produce, and synchronize rhythms (from now on, 'DPS') innate and shared across species?
Comparative studies have shown that DPS is found in vocal learning animals (e.g., parrots; Patel et al., 2009), but only partially in others (Schachner et al., 2009). Macaque monkeys performed similarly to humans in tapping tasks (Zarco et al., 2009) and showed sensitivity to temporal (ir-)regularities in auditory sequences (Selezneva et al., 2013), and chimpanzees spontaneously aligned their tapping to task-irrelevant auditory rhythms (Hattori et al., 2013). However, the macaque monkey seems unable to detect and synchronize with salient periodicities from complex rhythms such as human-made (musical) stimuli (Honing et al., 2012, 2018). Such results partially supported the notion that many species share a basic capacity for detecting temporal regularities, but some cognitive processes underlying DPS might be species-specific (Patel, 2006, 2008; Fitch, 2013), and further dependent upon neuroanatomical differences (Merchant and Honing, 2014; Patel and Iversen, 2014).
However, other studies have shown that the capacity to process and synchronize behavior with rhythms also depends on voluntary control, cognitive state, attention, and motivation, beyond the actual capacity to control behavior (Wilson and Cook, 2016). When exposed to music, children do not automatically synchronize with the musical beat, but do so in a social context (in the presence of adults; Kirschner and Tomasello, 2009). Similarly, macaque monkeys spontaneously synchronize behavioral displays (Nagasaka et al., 2012), and bonobos tend to spontaneously synchronize rhythmic behavior with a human experimenter (Large and Gray, 2015).
Thus, a more fundamental question in comparative rhythm cognition is: do nonhuman animals have the neurophysiological predisposition to encode temporal regularities in the environment? Once we have addressed this question, we can probe their ability for rhythmic displays, which, in isolation, do not necessarily inform about their sensory capacities to process temporal regularities (Wilson and Cook, 2016).
Accumulating evidence shows that premotor cortex cells in macaque monkeys resemble a 'neural chronometer' encoding time intervals (Merchant et al., 2011, 2013a) with a dynamic time-varying representation (Crowe et al., 2014). These cells predict regularly timed stimuli (Bartolo and Merchant, 2015) and allow predictive tapping to tempo changes (Gámez et al., 2018). Macaque monkeys' neural activity indicates the encoding and synchronization with temporal regularities in the sensory environment (Lakatos et al., 2008) and dynamic attending (Large and Jones, 1999) to sensory streams, similarly to humans (for overviews: Obleser and Kayser, 2019; Schroeder and Lakatos, 2009).
In the auditory as well as in other sensory domains, fluctuations of neural activity instantiate a 'rhythmic mode' of sensing (Lakatos et al., 2013) and attending (Large and Jones, 1999) the environment that can bias subjective perception (Iemi and Busch, 2018; Zoefel and VanRullen, 2017). Thus, the alternation of high- and low-salience may influence how we perceive sensory input and further shape behavioral performance (e.g., perceptual tasks). For instance, humans show a disposition to perceive subjective accentuations (the tic-toc illusion) when listening to isochronous equitone sequences (Brochard et al., 2003). Even though all tones are physically identical, the human brain tends to accentuate two or three equidistant tones according to a binary (on-/off-beat; or strong-weak (S-w)) or ternary (S-w-w) pattern (Brochard et al., 2003; Abecasis et al., 2005; Schmidt-Kassow et al., 2011; Poudrier, 2020; Baath, 2015). As the phenomenon emerges during passive listening, the superimposition of accentuations may represent a spontaneous tendency to sample (or attend) the acoustic environment beyond the encoding of single event onsets or time-intervals. As such, subjective accentuation may represent an unbiased marker of endogenous rhythm processing in a quasi-naturalistic context as it does not depend on training nor task demands.
Human event-related potentials (ERPs) mirror these subjective accentuation patterns, showing amplitude differences for tones in S-w positions (Brochard et al., 2003; Abecasis et al., 2005). Neural oscillations in the alpha- and beta-band (8-20 Hz) not only coincide with but precede the onset of expected tones (Fujioka et al., 2012; Arnal, 2012; Fujioka et al., 2009; Snyder and Large, 2005), and their amplitude can be modulated by trial-level accentuation patterns (Criscuolo et al., 2023). These oscillatory brain dynamics likely reflect the active generation of temporal predictions, whereby the first element in a series of two or three might be more salient (S) than others (w).
Does our closest ancestor show a comparable endogenous disposition to sample the acoustic environment in the absence of task instructions and/or training?
In the current study, we recorded EEG in two macaque monkeys who passively listened to isochronous equitone sequences. Similar to a recent study with human participants (Criscuolo et al., 2023), we investigated the monkeys' endogenous tendencies to (i) internalize the timing of external sound events, (ii) track tone onsets, and (iii) parse equitonal sequences with superimposed binary accentuation. Lastly, we directly compared human and macaque monkey EEG to (iv) test for (dis-)similarities in basic rhythm processing.
The current findings suggest that macaque monkeys have an adequate neural outfit to go beyond simple isochrony processing. The unexplored parallels between macaque monkeys and humans bridge a critical gap in the phylogenesis of rhythm cognition, potentially lending support to Darwin's notion of a shared neurophysiological predisposition for rhythm processing in non-human primates.

Experimental procedure
We tested two macaque monkeys. Monkey 1 (M1) is a 12-year-old male, Monkey 2 (M2) a 9-year-old female. Both monkeys had normal hearing and were previously trained in spatial and temporal categorization tasks (M1) (Mendoza et al., 2018) and a synchronization tapping task (M2) (Gámez et al., 2019). Both monkeys were awake (i.e., were not sedated) while EEG was recorded, sitting in a quiet room [3 (l) × 2 (d) × 2.5 (h) m] with dimmed lighting and two loudspeakers placed at ~50 cm from their ears. The animals were seated comfortably in a monkey chair, where they could freely move their head, hands, and feet. No head fixation was used and the EEG electrodes were attached to the monkey's scalp using tape (see EEG data acquisition below). To ease the fixation of the electrodes, the monkey's hair on the scalp and reference ear was shaved.
Detailed information about human participants, EEG data collection, and analysis procedures can be found elsewhere (Criscuolo et al., 2023). Briefly, we randomly selected 4 human participants from a dataset of 20 individuals (21-29 years of age, mean age 26.2 years). EEG was recorded from 59 Ag/AgCl scalp electrodes (Electrocap International), amplified using a PORTI-32/MREFA amplifier (DC to 135 Hz), and digitized at 500 Hz.

Ethics statement
Animal care and experimental procedures were approved by the National University of Mexico Institutional Animal Care and Use Committee and conformed to the principles outlined in the Guide for Care and Use of Laboratory Animals (NIH, publication number 85-23, revised 1985). The Mexican standards on research ethical protocols with nonhuman primates (NHP) are in the 'NORMA Oficial Mexicana NOM-062-ZOO-1999', and are in line with regulations from 13 other countries (Hartig et al., 2023).

Audiogram
The animals' hearing capacity was assessed with a scalp-recorded audiogram in partially sedated states. Sedation was induced and maintained with Ketamine (Aranada, Mexico).

Stimuli materials
For the audiogram, click sounds were produced through TTL pulses generated with a TDT-RZ6 signal processor (Tucker-Davis Technologies, system 3, Florida, USA) at a digitization rate of 97656.24 Hz. Clicks lasted 0.5 ms and were delivered at a rate of 15.1 Hz, in random polarities and sound intensities (from 20 to 90 dB SPL, in steps of 10 dB). Twenty blocks of 4000 stimuli were presented in a single recording session to collect 1000 repetitions per click intensity. Clicks were binaurally delivered through open-field speakers (KRK 5-G3, USA) located 85 cm from the animal's ears. Sound intensity was calibrated using a free-field condenser microphone (426B03), a sensor signal conditioner (480C02, PCB Piezotronics, NY, USA), and the TDT RPvdsEx circuits.

EEG recording
For the audiogram, continuous EEG was recorded from three Grass gold-plated electrodes (Natus Neurology, #FS-E5GH-60; Fig. 1) located at Fz, Cz, and Pz according to the 10/20 system. A reference electrode was located on the right earlobe, while the ground electrode was located on the central forehead. Scalps were shaved and cleaned with a mild abrasive gel (Nuprep, Weaver and Company, USA) before the recording session to reduce scalp impedances. The signal was amplified by means of a Medusa preamplifier (RA16PA, TDT systems) and digitized at 24414.06 Hz with an on-line filter from 3 Hz to 6000 Hz. Additionally, a notch filter at 60 Hz was applied to remove the line frequency.

Signal processing
Channel Cz was selected for the analyses of the audiogram based on the known higher signal-to-noise ratio of the vertex signal. The signal was further band-pass filtered (150-3000 Hz, Butterworth 4th-order filter) and epoched relative to stimulus onset (−10 to 66 ms). EEG epochs were sorted into positive and negative polarity click presentations and sub-averages were computed for each polarity condition. An added-polarity grand average was obtained and used for further analysis to avoid any transduction stimulus artifact and to minimize the cochlear and microphonic potentials. ABR waves of similar latency as those reported in anesthetized or sedated animals (Laughlin et al., 1999) were observed (Suppl. Fig. 1). A significant evoked response (2-tailed t-test, 10 ms window size) was observed at 60 and 50 dB SPL for M1 and M2, respectively. A previous study in Macaca fuscata reported that, at a level of 60 dB SPL, the monkeys were able to hear tones in a 28 Hz to 37 kHz range (Jackson et al., 1999), frequencies that are contained in the broad-band click stimulus. Three ABR components were clearly identified at 90 dB SPL in both monkeys. These components with positive peaks had latencies of ≈ 3, 4.6, and 8 ms. The shape and latency of the first two waves agree with previous reports (Suppl. Fig. 1; Lasky et al., 1995). The likely neural generators of the observed waves are the cochlear nucleus, lateral lemniscus, and inferior colliculus (Lasky et al., 1995; Uno et al., 1991, 1993). The observed peak amplitudes (≈ 3-14 µV) were larger than those reported in awake head-fixed Macaca fuscata (Uno et al., 1993) (tenths of µV) using similar level and rate parameters but close-field and monaural stimulation. Although it is known that anaesthesia or sedation might considerably diminish or even abolish evoked responses (Uno et al., 1993), the audiogram ensured that both monkeys could perceive the tones employed in the experimental paradigm, as both standard (STD) tones and amplitude-deviant (DEV) tones were above the individual hearing threshold in both monkeys.

Experimental paradigm
Monkeys listened to 13-tone (440 Hz, 85 dB, 50 ms duration) isochronous equitone sequences (Fig. 1, right). On < 5% of the trials, amplitude-deviant tones (DEV; 66 dB) could fall on the 8th, 9th, 10th, or 11th position (Fig. 1, bottom row). The inter-stimulus interval between tones was fixed at 0.6 s, corresponding to a constant stimulus rate of 1.6667 Hz. The entire trial sequence lasted 7.8 s and was followed by a random inter-trial silent period between 3.5 and 5.5 s. Critically, no accentuation pattern (strong-weak sounds) was imposed on the auditory sequence. M1 underwent 21 recording sessions and M2 25 sessions, during which both animals listened to 100 13-tone sequences each. Out of 1300 total events, 1240 were STD tones (95.4%) and 60 were DEV (4.6%). The stimulus materials in use were nearly identical for humans and monkeys (Criscuolo et al., 2023).
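The timing figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It is an illustration only and not the presentation code used in the study; the function name `tone_onsets` is hypothetical, and the 0.6 s interval is assumed to be onset-to-onset, as the stated 1.6667 Hz rate implies.

```python
# Illustrative sketch: reproduces the timing arithmetic of the paradigm.
# Assumption: the 0.6 s interval is measured onset-to-onset.

def tone_onsets(n_tones=13, interval_s=0.6):
    """Return onset times (in seconds) of an isochronous tone sequence."""
    return [round(i * interval_s, 6) for i in range(n_tones)]

onsets = tone_onsets()
stimulus_rate_hz = 1 / 0.6            # about 1.6667 Hz
sequence_duration_s = 13 * 0.6        # 13 intervals of 0.6 s = 7.8 s

# Share of deviant and standard tones among all events in one session.
n_events, n_dev = 1300, 60
dev_percent = 100 * n_dev / n_events              # about 4.6 %
std_percent = 100 * (n_events - n_dev) / n_events  # about 95.4 %
```

The eighth to eleventh entries of `onsets` (indices 7 to 10) give the times at which a DEV tone could occur.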

EEG data acquisition
The EEG was recorded from electrodes (Grass gold-plated electrodes) attached to five scalp positions (Fz, Cz, Pz, F3, F4) according to the 10-20 system (Fig. 1). Both monkeys previously underwent surgical procedures in which head fixation posts were implanted during aseptic surgery and under gas anesthesia. Importantly, the temporal maxillary muscles of the two monkeys were retracted during the surgery, thus leaving the upper skull surface free of muscular or eye-induced artifacts. A second surgery was performed and head holding devices were removed prior to data collection. All electrodes were attached to the scalp using Ten20 Conductive EEG paste and medical tape and were referenced to the right ear (fleshy part of the pinna). The electrodes were connected to a Tucker-Davis Technologies (TDT) head stage (#RA16LI) for low impedance electrodes. This head stage was connected to a TDT RA16PA preamplifier, which in turn was connected to a TDT RZ2 processor. The RZ2 was programmed to acquire the EEG signals with a sampling rate of 610.35 Hz and the bandpass filters were set at 0.01-100 Hz.

Data analysis
Preprocessing.
Data were pre-processed with a combination of custom Matlab scripts/functions and the Matlab-based FieldTrip toolbox (Oostenveld et al., 2011). Data were band-pass filtered with a 4th order Butterworth filter in the frequency range of 0.5-50 Hz (ft_preprocessing). Next, data segmentation was conducted separately for 'rhythm-tracking', event-related potentials (ERP), and time-frequency representation (TFR) analyses.

Rhythm tracking analyses.
Rhythm-tracking analyses were time-locked to encompass the whole equitone sequence. 100 sequences (per experimental session) were created, starting from the third tone onset and including up to the 13th tone (6.6 s). Next, we computed a fronto-central channel cluster encompassing 'Fz', 'F3', 'F4', 'Cz'. Data from this fronto-central cluster were used for Fast-Fourier transform (FFT) and phase-locking analyses.

Fast-Fourier transform.
Single-trial data from the fronto-central cluster were submitted to a FFT ("FFT data") with an output frequency resolution of 0.15 Hz (1/6.6 s = 0.15 Hz). Spectral power was calculated as the squared absolute value of the complex Fourier output. Data in each frequency bin were normalized by the frequency-specific standard deviation across trials. Lastly, we averaged the frequency-domain data across channels and trials. For illustration purposes, we restricted the Fourier spectrum to 1-7 Hz in Fig. 2A, top for M1 and bottom for M2.

[Figure caption fragment: The hypothesized superimposition of binary accentuations would parse the auditory sequence in alternating "strong" (S) and "weak" (w) accents. Deviant tones (DEV) occurred from the 8th position onward. Accordingly, they could occur on S and w accentuated positions with equal probability.]
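As a rough illustration of the FFT step in this section, the sketch below computes spectral power (squared absolute value of the Fourier coefficient) for one synthetic 6.6 s sequence and locates the peak between about 1 and 7 Hz. It is not the authors' Matlab/FieldTrip code: the 100 Hz sampling rate, the noise level, and the helper `dft_power` are assumptions chosen to keep the example small, and the across-trial normalization is omitted because only a single trial is simulated.

```python
import cmath
import math
import random

def dft_power(signal, k):
    """Power (squared magnitude) of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
               for t, x in enumerate(signal))
    return abs(coef) ** 2

fs_hz = 100.0                      # assumed sampling rate for this sketch
duration_s = 6.6                   # third to 13th tone, as in the text
n_samples = round(fs_hz * duration_s)
freq_res_hz = 1 / duration_s       # about 0.15 Hz per frequency bin

# Synthetic single trial: a 1.67 Hz oscillation buried in Gaussian noise.
rng = random.Random(0)
stim_hz = 1 / 0.6
trial = [math.sin(2 * math.pi * stim_hz * t / fs_hz) + rng.gauss(0, 1.0)
         for t in range(n_samples)]

# Frequency bins 7..46 cover roughly 1.06 to 6.97 Hz.
power = {k: dft_power(trial, k) for k in range(7, 47)}
peak_bin = max(power, key=power.get)
peak_freq_hz = peak_bin * freq_res_hz
```

Because 6.6 s contains exactly 11 cycles of the stimulation rate, the rhythm falls on a single frequency bin (bin 11), which is one practical benefit of choosing the analysis window as a whole number of tone intervals.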

Phase-locking analyses.
A time-resolved phase-locking analysis was performed to estimate the phase relationship between neural activity at the stimulation frequency and the sequential tone onsets. Sequence-level data from the fronto-central cluster were band-pass filtered with a 4th order Butterworth filter around the stimulation frequency (1.1-2.1 Hz, considering a 1.67 Hz center frequency; ft_preprocessing) and underwent Hilbert transform to extract the analytic signal. Next, we plotted the time-course of the real part of the analytic signal (Fig. 2B, top for M1 and bottom for M2) as a function of the onsets of STD tones preceding (blue) and following (red) the DEV (green; this plot is for illustrative purposes only). Phase-locking analyses were performed at sequence- and channel-levels by means of circular statistics (circular toolbox in Matlab; Berens, 2009) based on the circular mean phase-angles estimated in the 60 ms (proportional to the stimulation frequency: 1/1.67 Hz/10) preceding individual tone onsets. Next, the sequence- and channel-level mean vector lengths (MVL; Berens, 2009) were calculated for pre-DEV STD tones and the values averaged across channels. MVL for pre-DEV STD tones were statistically assessed against the MVL from a random distribution (random uniform distribution of phase-angles) by means of 1000 permutations. A p-value lower than .05 was considered statistically significant. In Suppl. Fig. 2, we also provide session-, channel-, and sequence-level 'relative phase angles'. These were expressed as the absolute phase difference between phase-angles for each tone position (e.g., three to eight) and the most common phase-angle in the sequence (the one with the highest probability, as obtained from the histogram function in MATLAB, with 'probability' as input). The pooling over sessions and channels is displayed in Figs. 2D and 3D.
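The mean vector length and its comparison against uniformly distributed phases can be sketched in a few lines of Python. This is a simplified stand-in for the circular-statistics toolbox mentioned above, not the authors' implementation: the phase values are simulated, `mean_vector_length` is a hypothetical helper, and the null distribution is built by Monte Carlo draws from a uniform distribution.

```python
import cmath
import math
import random

def mean_vector_length(phases):
    """Length of the mean resultant vector of phase angles (radians), 0 to 1."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

rng = random.Random(1)

# Simulated pre-stimulus phases for 100 tones, concentrated around 0.5 rad.
observed_phases = [rng.gauss(0.5, 0.6) for _ in range(100)]
mvl_observed = mean_vector_length(observed_phases)

# Null distribution: MVL of uniformly distributed random phase angles.
n_permutations = 1000
null_mvl = [
    mean_vector_length([rng.uniform(-math.pi, math.pi)
                        for _ in observed_phases])
    for _ in range(n_permutations)
]

# One-sided p-value with the usual +1 correction.
p_value = (1 + sum(m >= mvl_observed for m in null_mvl)) / (1 + n_permutations)
```

An MVL near 1 means the phase before tone onset is almost identical across tones, whereas uniformly scattered phases yield values close to 0.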
cubic temporal interpolation (using the 'pchip' option for both the built-in Matlab and FieldTrip-based interpolation functions) considering the time-course of neighboring time-windows (extending up to 500 ms when possible, automatically reduced otherwise). The current approach is a data-driven procedure developed to minimize data loss. Rather than rejecting entire epochs only partly contaminated by artifacts (i.e., standard artifact rejection procedure), we opted for an artifact suppression approach that allowed keeping all trials. The channel-by-channel routine allowed the algorithm to flexibly adapt the outlier threshold estimates to the inherent noise varying over channels. Lastly, a standard whole-trial rejection procedure based on an amplitude criterion (85 µV) was applied. Data selected for event-related-potential (ERP) analyses ("ERP data") were segmented including 500 ms prior to and following each tone onset (1 s in total). Data destined for time-frequency representation analyses ("TFR data") were not further segmented at this stage. ERP data were band-pass filtered between 1 and 30 Hz and TFR data low-pass filtered at 40 Hz.
TFR data finally underwent time-frequency transformation by means of a wavelet-transform (Uno et al., 1993) with a frequency resolution of 0.25 Hz. The number of fitted cycles ranged from 3 for the low frequencies (<5 Hz) to 10 for high frequencies (>5 Hz and up to 40 Hz). TFR data were then re-segmented, so as to reduce the total length to 2 s, symmetrically distributed relative to tone onsets.

Post-processing of ERP and TFR data.
Single-trial ERP amplitudes were mean-corrected by a global average computed over epochs and over a 500 ms window (−0.2 to 0.3 s relative to tone onset). Similarly, single-trial TFR amplitudes were normalized by computing relative percent change with reference to the global mean amplitude across epochs and the same 500 ms window. This approach has been used elsewhere (Fujioka et al., 2012; Abbasi and Gross, 2020) and was preferred over baseline correction as we were interested in analyzing amplitude fluctuations in the pre-stimulus intervals. Then, we created a fronto-central channel cluster. All following analyses were performed exclusively on this channel cluster.
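A minimal Python sketch of the two normalizations described in this section is given below, assuming that the "global mean" is the average over all epochs and all samples of the analysis window. The function names and the toy data are hypothetical, and the original analysis was carried out in Matlab.

```python
def mean_correct(epochs):
    """Subtract the global mean (over epochs and samples) from every sample."""
    values = [v for epoch in epochs for v in epoch]
    global_mean = sum(values) / len(values)
    return [[v - global_mean for v in epoch] for epoch in epochs]

def relative_percent_change(epochs):
    """Express every sample as percent change relative to the global mean."""
    values = [v for epoch in epochs for v in epoch]
    global_mean = sum(values) / len(values)
    return [[100 * (v - global_mean) / global_mean for v in epoch]
            for epoch in epochs]

# Toy data: two epochs with two samples each (global mean = 2.0).
toy_epochs = [[1.0, 3.0], [2.0, 2.0]]
corrected = mean_correct(toy_epochs)             # [[-1.0, 1.0], [0.0, 0.0]]
percent = relative_percent_change(toy_epochs)    # [[-50.0, 50.0], [0.0, 0.0]]
```

Unlike a pre-stimulus baseline correction, this reference value does not force the pre-stimulus interval towards zero, so fluctuations before tone onset remain visible.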

ERP analyses.
We averaged evoked responses over trials separately for STD and DEV tones, and for odd (hypothetical "Strong" position in a binary accent; S, Figs. 3, 4) and even ("weak"; w) serial positions.

TFR analyses.
We averaged time-frequency representations over STD trials, separately for odd and even positions (hypothetical strong and weak positions, respectively; Figs. 3B and 4B). Next, we quantified mean peak amplitudes in the mu-band (8-15 Hz) in the post-stimulus intervals (80 ms, proportional to the center frequency: 1/12 Hz) and compared them for S-w positions (Figs. 3C and 4C).

Individual classification of accents.
An individual modelling approach was developed to identify binary accents. Since other accentuation patterns are possible (Abecasis et al., 2005) beyond the binary default (Brochard et al., 2003), the model further tested for the presence of ternary accents. We focused on single-subject, mu-band peak amplitudes for STD tones in the first 8 positions of the auditory sequence in 80 ms time-windows (proportional to the center frequency of interest; mu-band: 1/12 Hz = 83 ms) following the stimulus onset.

These amplitudes were modelled by means of stepwise regression with three predictors: binary accents (values: 1, −1), ternary accents (1, −0.5, −0.5), and a constant term (ones). The winning model was chosen based on adjusted Eta-squared. Trials for which the winning model involved the binary predictor were labeled "binary". Similarly, trials for which the winning model involved the ternary predictor were labeled "ternary". The model thus allowed the combination of multiple predictors, but no interactions between terms. We accordingly interpreted (and labeled) the combination of binary and ternary terms as "combined". Lastly, trials in which grouping could not be clearly identified were labeled as "not classified". Session-level model results are provided in Suppl. Table 2, designated as "Preferences for accents" and expressed as the percentage of trials relative to the full number of auditory sequences (100 per session, per monkey). The session-level goodness of fit of the model is provided in the same table. The "Preferences for accents" across sessions and monkeys are provided in Fig. 5B.
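To make the trial-labelling logic concrete, the sketch below classifies a single 8-tone amplitude profile in Python. It is a simplified stand-in and not the stepwise procedure used in the study: the three candidate models (binary, ternary, combined) are fitted exhaustively by ordinary least squares, adjusted R² is used in place of adjusted Eta-squared, and the 0.5 cut-off for "not classified" is an arbitrary assumption. The helpers `solve`, `adjusted_r2`, and `classify` are hypothetical names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adjusted_r2(y, predictors):
    """Least-squares fit of y on a constant plus predictors; adjusted R^2.

    Assumes y is not constant (otherwise the total variance is zero).
    """
    cols = [[1.0] * len(y)] + predictors
    xtx = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    xty = [sum(p * v for p, v in zip(c, y)) for c in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    n, k = len(y), len(predictors)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)

# Predictors over the first 8 tone positions, as described in the text.
BINARY = [1.0, -1.0] * 4
TERNARY = [1.0, -0.5, -0.5, 1.0, -0.5, -0.5, 1.0, -0.5]

def classify(amplitudes, threshold=0.5):
    """Label one trial by the best-fitting candidate model (sketch only)."""
    candidates = {"binary": [BINARY], "ternary": [TERNARY],
                  "combined": [BINARY, TERNARY]}
    scores = {name: adjusted_r2(amplitudes, preds)
              for name, preds in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "not classified"

binary_like = [2.05, 1.05, 1.95, 0.95, 2.05, 1.05, 1.95, 0.95]
ternary_like = [2.15, 1.25, 1.15, 2.05, 1.25, 1.15, 2.10, 1.20]
flat_like = [1.00, 1.02, 0.97, 1.01, 1.03, 0.98, 1.00, 0.99]
labels = [classify(t) for t in (binary_like, ternary_like, flat_like)]
```

Penalizing the number of predictors matters here: with only 8 data points, the combined model always fits at least as well as either single-predictor model, so an unadjusted criterion would label nearly every trial "combined".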

Binary accent analyses.
To confirm that the identified "binary" trials indeed showed binary accentuation patterns, we performed further analyses. First, we concatenated trials classified as "binary" and computed a trial-based single-tone pair-wise amplitude difference (its lower-triangle 2-D mean is provided in Fig. 5C). Namely, we calculated the amplitude difference between every pair of tones (positions 1-8) along the auditory sequence, independently for each sequence. Hence, we estimated the amplitude difference between the 1st and 2nd positions, then between the 1st and 3rd positions, and so forth. Similarly, the amplitude for the 2nd position was compared to the 3rd, the 4th position, and so on. The result of this computation is a session-level pair-wise amplitude difference matrix, whose size is N_trials × (N_positions − 1) × (N_positions − 1). Next, we isolated the pair-wise amplitude difference for tones in odd positions (Fig. 5D; "odd-pos difference") and even positions ("even-pos difference") and statistically compared them by means of 1000 permutations of odd-even labels. An FDR-adjusted p-value lower than .05 was considered statistically significant (Benjamini & Hochberg correction). These two variables were then combined into a distribution of "binary similarity". The binary similarity thus features the amplitude difference for tones in odd-numbered positions (1st, 3rd, 5th, 7th) and the amplitude difference for tones in even-numbered positions (2nd, 4th, 6th, 8th). In contrast, the binary dissimilarity was created by extracting the mean difference of tones in odd versus even positions (Fig. 5D). Finally, the binary similarity was statistically compared to the binary dissimilarity. Statistical testing was performed by means of 1000 permutations of odd-even labels, and an FDR-adjusted p-value lower than .05 was considered statistically significant (Benjamini & Hochberg correction).
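The similarity and dissimilarity metrics can be illustrated for a single trial with the Python sketch below. It assumes absolute amplitude differences and a simple mean over pairs, which the text does not state explicitly; the function `binary_metrics` and the example amplitudes are hypothetical, and the permutation test and FDR correction are left out.

```python
from itertools import combinations, product

def binary_metrics(amplitudes):
    """Return (binary similarity, binary dissimilarity) for one 8-tone trial.

    Similarity: mean absolute difference within odd and within even positions.
    Dissimilarity: mean absolute difference between odd and even positions.
    """
    odd = amplitudes[0::2]    # positions 1, 3, 5, 7
    even = amplitudes[1::2]   # positions 2, 4, 6, 8
    within = [abs(a - b) for a, b in combinations(odd, 2)]
    within += [abs(a - b) for a, b in combinations(even, 2)]
    between = [abs(a - b) for a, b in product(odd, even)]
    return sum(within) / len(within), sum(between) / len(between)

# A trial with a strong-weak alternation and small within-class variation.
trial = [2.05, 1.05, 1.95, 0.95, 2.05, 1.05, 1.95, 0.95]
similarity, dissimilarity = binary_metrics(trial)
```

For a trial with a genuine binary accent, tones sharing the same accent class differ little from each other (low similarity value), while tones across classes differ clearly (high dissimilarity value).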

Comparative analyses.
In a prior study (Criscuolo et al., 2023) we investigated rhythm processing capacities in humans, using a comparable experimental paradigm and analyses. Human participants only took part in two experimental sessions, hence we only focused on the first two experimental sessions of the monkeys for comparative data analyses (see Criscuolo et al., 2023, for further details regarding the analysis procedure adopted for the human dataset).
From the human sample, we randomly selected 4 participants (gender-balanced, as for the monkeys). In monkeys as well as in humans, we extracted the 'individual preferences for accents' as calculated in the accent modelling. Next, we isolated 'binary' trials and performed 'binary similarity' and 'binary dissimilarity' analyses as described in the 'Binary accent analyses' section. Lastly, we inspected the time-course of neural activity time-locked to STD tones and compared it across monkeys and humans (Fig. 6).

Ternary accent analyses.
Exploratory analyses zoomed in on ternary trials. While the 'binary trials' could only show two accentuation patterns (S-w or w-S), ternary trials can show at least three different accentuation patterns, i.e., the accent can either fall on the first (S-w-w), second (w-S-w), or the third position (w-w-S). To disentangle these three accentuation patterns from the distribution of "Ternary trials" identified during the "Individual classification of accents", we ran a second stepwise regression model. This model featured three predictors corresponding to the respective accentuation types, implemented as: 1, −0.5, −0.5 (pattern 1); −0.5, 1, −0.5 (pattern 2); and −0.5, −0.5, 1 (pattern 3). The model did not allow interaction terms, and the winning model was chosen based on adjusted Eta-squared. The output of the model is provided in Fig. 5B (bottom right), as the percent distribution of three accentuation patterns across sessions and relative to the total number of auditory sequences (100 per session). Note that not all trials could be classified as pertaining to the three modelled accentuation patterns. Other accentuation patterns are possible in the ternary trials, which were not modelled here: for instance, a S-w-w pattern could as well be represented by a stair-case amplitude change (i.e., 1, −0.75, −0.25) or a shuffled version of it (i.e., 1, −0.25, −0.75). However, given that the stimulation rate in use is likely suboptimal to test ternary accentuations even in humans (Brochard et al., 2003; Abecasis et al., 2005; Poudrier, 2020; Baath, 2015; Fujioka et al., 2012), we did not build models to test all possible ternary accentuation patterns. Furthermore, it is important to note that the model in use here only uses the first 8 tones of the auditory sequence. This choice avoids the onset of DEV tones in later positions, which may disrupt ongoing accentuations, but inevitably leaves only up to two periods of a ternary accent (as compared to 4 repetitions of a binary accent). Consequently, even two small amplitude fluctuations with superimposed noise (inherent in EEG recordings) may drive the 'ternary' classification, but these trials may not necessarily reflect a true ternary accent. In turn, we expected a large proportion of 'ternary' trials to fail to be further classified as strictly reflecting the modelled patterns (1, −0.5, −0.5 (pattern 1); −0.5, 1, −0.5 (pattern 2); and −0.5, −0.5, 1 (pattern 3)). The small percentage of trials belonging to the three accentuation types (~2%) precluded further analyses due to insufficient statistical power to interpret results.

Resources and details
The datasets supporting the current study will be deposited in a public repository but are available from the corresponding author upon reasonable request. The code for analyses is available upon request. Further information and requests for resources should be directed to the Corresponding Author.

Rhythm tracking
Macaque monkeys passively listened to isochronous equitone sequences presented at a stimulation rate of 1.67 Hz and containing 13-to-15 frequent tones (standard; STD) and one amplitude-attenuated deviant tone (DEV). We tested whether and how their neural activity would show idiosyncratic signatures of rhythm tracking.

Fig. 5. Modelling of individual accents and analyses on binary accents. A: The modelling of accents was performed by means of stepwise regression modelling and using mu-band post-stimulus responses as the dependent variable. The predictors were a binary (1, −1), a ternary (1, −0.5, −0.5) and a constant term (ones). B: preferences for accents, as reported from the modelling. In order, we plot the distribution of trials assigned to binary, ternary, combined (binary-ternary) accents, and 'not classified' (neither binary nor ternary) across the two monkeys. At the bottom, we zoom into binary trials and distinguish S-w accents from w-S accents based on trial-level Beta coefficients from the modelling (top for M1, M2 below). C: grand-average pair-wise difference for mu-band peak amplitudes across the first 8 positions of the auditory sequence in binary trials. D: on the left, the distribution of amplitude differences across odd-numbered positions (in blue) and even-numbered positions (cyan). The average of these two distributions forms the 'Binary similarity'. On the right, the 'binary similarity' (blue) and the mean amplitude difference of the odd- versus even-numbered position ('binary dissimilarity'; in cyan). Statistical testing was performed by means of 1000 permutations and an FDR-adjusted p < .05 was considered as statistically significant.

Fig. 6. Monkey-human similarities in rhythm processing. A: distribution of preferences for accents for, from left to right, M1, M2, and 4 human participants selected from a separate human dataset (Criscuolo et al., 2023). B: The time-course of time-locked neural dynamics in the mu-band for M1 and M2, and in the low-beta band for human participants. In blue, the time-locked responses to STD tones on Strong positions and red for weak positions. C: Binary accent effect quantified by means of Binary Similarity and Dissimilarity metrics. While M1 and the four human participants showed comparable binary accentuations from the beginning, M2 showed the same pattern later (Fig. 5). Of note, M2 only showed significant binary accentuations after two recording sessions. This observation differs from the one depicted in Fig. 5, where we denoted significant binary accentuations when pooling across data from all recording sessions (>20). Thus, monkeys, similarly to humans, tend to vary in how they subjectively employ binary accentuation patterns over time.
Macaque monkeys' neural activity encoded the timing of external events (Fig. 2A, B; top for M1 and bottom for M2). The Fourier spectrum showed a power peak at the stimulation frequency (1.67 Hz; Fig. 2A), indicating that neural activity was time-locked to tone onsets. Next, we quantified the consistency of pre-stimulus phase in delta-band (centered at 1.67 Hz) neural activity during a time-window preceding tone onsets (Fig. 2B) by means of mean vector length (MVL) analyses. This phase analysis focused on the ~60 ms prior to tone onsets, a window proportional to the stimulation frequency (one tenth of the inter-onset interval: 1/1.67 Hz/10). Single-session trial- and channel-level MVL of STD tones preceding a DEV (pre-DEV) significantly differed from a random distribution (MVLs are plotted as kernel distributions in Fig. 2D; pre-DEV in blue; Suppl. Tables 1-2 for statistics).
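The MVL computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the function name `mean_vector_length` and the two example phase lists are hypothetical, and the phase angles are assumed to have already been extracted (in radians) from delta-band activity in the pre-stimulus window.

```python
import cmath
import math


def mean_vector_length(phases):
    """Length of the mean resultant vector of phase angles (radians).

    Values near 1 indicate consistent phases; values near 0 indicate
    phases spread evenly around the circle.
    """
    if not phases:
        raise ValueError("need at least one phase angle")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


# Pre-stimulus window: one tenth of the inter-onset interval at 1.67 Hz.
stim_rate_hz = 1.67
window_s = (1.0 / stim_rate_hz) / 10.0  # roughly 0.06 s

# Hypothetical phase angles, for illustration only.
aligned = [0.10, 0.12, 0.08, 0.11]                      # tightly clustered
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]   # evenly spread

mvl_aligned = mean_vector_length(aligned)
mvl_spread = mean_vector_length(spread)
```

In this sketch, tightly clustered phases yield an MVL close to 1, whereas phases spread evenly around the circle yield an MVL close to 0, which is the contrast that the comparison against a random phase distribution relies on.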

Accent processing
We tested whether participants' neural activity would sample the acoustic environment by superimposing binary accentuation patterns (S-w accents in odd-numbered versus even-numbered positions) onto the isochronous equitone sequences. Thus, we analyzed event-related potentials (ERPs) to STD tones in S and w positions, and further inspected the time-frequency representation (TFR) of time-locked responses.
ERPs to STD tones in S-w positions did not differ statistically (Fig. 3A, 4A). However, STD tones elicited stronger N100 and P200 responses as compared to DEV tones (FDR-adjusted p < .05; Fig. 3A, 4A, right), confirming the processing of an unpredicted amplitude-attenuated deviant tone. The time-frequency representation plots of neural activity in response to STD tones mainly showed one event-locked response in the mu-band (8-15 Hz; Fig. 3B for M1 and 4B for M2). In this frequency band, we compared event-locked fluctuations for STD tones in odd (S) versus even (w) positions along the sequence (Fig. 3C, 4C), corresponding to hypothetical S-w positions (blue and red, respectively). The statistical comparison of S-w positions did not survive FDR correction for multiple comparisons (FDR-adjusted p > .05).
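For readers unfamiliar with FDR adjustment, the following sketch illustrates one common procedure, the Benjamini-Hochberg step-up method. The text does not specify which FDR procedure was used, so this choice is an assumption; the function name `fdr_bh` and the example p-values are hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values from four time-windows.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = fdr_bh(raw_p)
significant = [p < 0.05 for p in adj_p]
```

With these example values, only the smallest p-value remains below .05 after adjustment, illustrating how an effect that looks significant in isolation may not survive correction for multiple comparisons.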
To summarize, neither ERP nor TFR analyses revealed a binary accentuation, as STD tones elicited similar responses when they occurred in odd- and even-numbered positions along the auditory sequence. This result, however, may be associated with inter- and intra-individual differences in when and how accentuation patterns are superimposed onto the auditory sequences (Brochard et al., 2003). Testing these hypotheses required a more suitable method: the novel accent modelling approach described below.

Accent modelling
To address the questions of whether (i) monkeys accentuate in a similar way, (ii) they always accentuate, and (iii) accentuation patterns influence DEV processing (Brochard et al., 2003), we focused on trial-level data and modelled various accentuation patterns. We used a stepwise regression model to classify session-level, mu-band neural responses as best reflecting a binary, ternary, or absence of accentuation patterns. The model predicted tone-by-tone mu-band amplitude changes from three predictors: binary, ternary, and constant terms (Fig. 5A). Resulting preferences for accents are reported in Suppl. Table 3 and summarized in Fig. 5B. Note that most trials (~60%) did not reflect either binary or ternary accents. In the absence of perceptual reports, we cannot confirm whether this observation signals lack of sensitivity of our method or whether monkeys did indeed not accentuate.
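The structure of the accent model can be illustrated with a simplified sketch. It uses the predictor values given in the caption of Fig. 5 (binary: 1, −1; ternary: 1, −0.5, −0.5; constant: ones), but replaces the stepwise procedure with a plain ordinary least-squares fit, so it should be read as an illustration of the predictors rather than as the method used in the study. All function names and the example amplitudes are hypothetical.

```python
def accent_predictors(n_positions):
    """Binary (1, -1), ternary (1, -0.5, -0.5), and constant predictors."""
    binary = [1.0 if i % 2 == 0 else -1.0 for i in range(n_positions)]
    ternary = [1.0 if i % 3 == 0 else -0.5 for i in range(n_positions)]
    constant = [1.0] * n_positions
    return binary, ternary, constant


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_accents(amplitudes):
    """Least-squares betas: (beta_binary, beta_ternary, beta_constant)."""
    predictors = accent_predictors(len(amplitudes))
    # Normal equations: (X^T X) beta = X^T y.
    gram = [[sum(p * q for p, q in zip(pi, pj)) for pj in predictors]
            for pi in predictors]
    rhs = [sum(p * y for p, y in zip(pi, amplitudes)) for pi in predictors]
    return tuple(solve_linear(gram, rhs))


# Hypothetical single-trial mu-band amplitudes alternating around 2.0.
trial = [2.5, 1.5, 2.5, 1.5, 2.5, 1.5, 2.5, 1.5]
beta_binary, beta_ternary, beta_constant = fit_accents(trial)
```

For this perfectly alternating example, the fit attributes the modulation to the binary predictor alone. Under the coding assumed here, a positive binary coefficient would correspond to larger amplitudes on odd-numbered positions and a negative one to larger amplitudes on even-numbered positions.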
Next, we zoomed into "binary trials" and disentangled S-w from w-S accentuation patterns based on the single-trial β-coefficients obtained from the modelling (see methods). The resulting distributions are reported in Fig. 5B, bottom left. Similarly, we disentangled three possible accentuation patterns in the "ternary trials" (Suppl. Fig. 3). We performed a separate stepwise regression modelling using S-w-w, w-S-w, and w-w-S accents as predictors (see methods). Distributions are reported in Suppl. Fig. 3.
Overall, this approach showed that macaque monkeys' neural activity spontaneously superimposes accentuation patterns on identical tones embedded in isochronous equitone sequences. Importantly, the monkeys' neural activity seems to switch between binary, ternary, and other accentuation patterns over trials. Notably, however, no consistent accentuation pattern was identified in the majority of trials.

Binary accents
After isolating trials showing binary accentuation patterns, we aimed to statistically test whether mu-band responses would significantly differ in S versus w positions. If so, neural responses to tones falling on odd-numbered positions should differ from those on even-numbered positions. However, there should be no differences for neural responses on the same positions: namely, tones falling on odd-numbered positions should elicit similar (i.e., non-significantly different) neural activity. In sum, we assessed whether this modelling approach delivers a meaningful classification of binary accents.
To this end, we isolated the binary trials and calculated the tone-by-tone pair-wise amplitude difference for mu-band post-stimulus activity across the 8 positions in the auditory sequence preceding the DEV tone. For visualization, the resulting matrix was averaged across trials and only its lower triangle is shown (Fig. 5C). The original matrix (all trials) was instead used to calculate metrics of "Binary similarity" and "Binary dissimilarity" (Fig. 5D; top for M1 and bottom for M2). The Binary similarity features two distributions: the amplitude difference for tones in odd-numbered positions (1st, 3rd, 5th, 7th; blue, labeled as "odd-numbered position difference") and the amplitude difference for tones in even-numbered positions (2nd, 4th, 6th, 8th; cyan, labeled as "even-numbered position difference"). The two distributions did not significantly differ from each other. The respective values were then combined to compute a "Binary similarity" variable. For the "Binary dissimilarity" analyses, we calculated the amplitude difference for tones in odd- versus even-numbered positions (corresponding to on-beat versus off-beat "Binary difference") and statistically compared it to the "Binary similarity" (Fig. 5D). Statistical testing confirmed a significant difference (FDR-adjusted p < .05), indicating that mu-band post-stimulus amplitudes were significantly modulated according to a binary accent. The same procedure was independently repeated for both monkeys.
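The logic of the similarity and dissimilarity metrics can be sketched for a single trial as follows. This is a simplified illustration under several assumptions: pair-wise differences are taken as absolute values, the comparison is run within one hypothetical trial rather than across all binary trials, and the permutation test shuffles the pooled differences between the two groups. The function names and example amplitudes are hypothetical.

```python
import random
from itertools import combinations


def binary_metrics(amplitudes):
    """Split pair-wise absolute differences into same- and cross-parity sets."""
    same, cross = [], []
    for i, j in combinations(range(len(amplitudes)), 2):
        diff = abs(amplitudes[i] - amplitudes[j])
        if (j - i) % 2 == 0:
            same.append(diff)    # odd-odd or even-even: "similarity"
        else:
            cross.append(diff)   # odd versus even: "dissimilarity"
    return same, cross


def mean(values):
    return sum(values) / len(values)


def permutation_p(same, cross, n_perm=1000, seed=0):
    """One-sided permutation p-value for mean(cross) > mean(same)."""
    rng = random.Random(seed)
    observed = mean(cross) - mean(same)
    pooled = same + cross
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = mean(pooled[len(same):]) - mean(pooled[:len(same)])
        if shuffled >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical mu-band peak amplitudes for the first 8 positions of a trial.
trial = [2.5, 1.5, 2.5, 1.5, 2.5, 1.5, 2.5, 1.5]
same, cross = binary_metrics(trial)
similarity = mean(same)       # small if same-parity tones are alike
dissimilarity = mean(cross)   # large if odd and even tones differ
p_value = permutation_p(same, cross)
```

With 8 positions there are 12 same-parity pairs and 16 odd-versus-even pairs. A binary accent predicts small differences within the first set and larger differences within the second, which is the contrast the statistical test evaluates.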
Taken together, these results confirmed that trials classified as 'binary' during the modelling did indeed show a consistent binary accentuation pattern. Mu-band amplitudes on STD tones in S positions significantly differed from those in w positions.

Comparative analyses of human and monkey data
Next, we set out to investigate similarities in rhythm processing between humans and macaque monkeys. We directly compared the two macaque monkeys with a subset of four participant datasets taken from a prior human study (Criscuolo et al., 2023). Critically, these human participants underwent EEG recording while listening to stimulus material similar to that presented to the monkeys. This allowed us to reproduce the accent modelling used here and to directly compare the two datasets.
As human participants took part in only two experimental sessions, we focused on the first two experimental sessions of the monkeys as well. Details on the analysis procedure for the human dataset can be found in Criscuolo et al. (2023). In Fig. 6A, we show preferences for accents for M1, M2, and 4 human participants and, below, the distribution of preferences for binary, ternary, and other accents (non-classified trials). Like humans, macaque monkeys showed the emergence of binary accentuations in 21% of the trials and ternary accentuations in 23% of the trials. From the selected binary trials, we then plotted the time-course of event-locked activity in the mu-band for M1 and M2 and in the low-beta band for humans, for STD tones on S-w positions (blue and red, respectively; Fig. 6B). Comparable to human participants, both monkeys showed larger amplitudes in response to STD-S tones than to STD-w tones (Fig. 5). However, while in M1 and human participants binary accentuations were evidenced by a non-significant Binary Similarity but a significant Binary Dissimilarity, M2 showed such an effect later (Fig. 6C). We take this finding as additional evidence for the inter- and intra-individual variability across species in if, when, and how accentuations occur.

Discussion
In this comparative EEG study, we set out to investigate the basic rhythm processing capacities of macaque monkeys and humans. All participants passively listened to isochronous equitone sequences, and we examined spontaneous neurophysiological activity that underlies the sampling of temporal regularities in the acoustic environment. We intentionally did not choose an active experimental task setting as it might enforce unspecific goal-directed behavior and potentially confound genuine endogenous rhythm processing. We further suggest that the testing of task-independent neural behavior in non-human primates might be a quintessential step in understanding the phylogenetic trajectories of basic rhythm cognition.
The present results show that the macaque monkey's neurophysiological responses display the encoding of temporal regularities in the acoustic environment even during passive listening and confirm prior task-active results (Zarco et al., 2009; Honing et al., 2018; Merchant et al., 2011; Crowe et al., 2014; Gámez et al., 2018; Ayala et al., 2017; Bartolo et al., 2014; Merchant et al., 2013b). We further show that these neurophysiological responses go beyond the mere encoding of isochrony: neural activity in the mu-band indicated the superimposition of binary (strong (S) - weak (w)) accentuations in a subset of trials, mirroring results in humans (Criscuolo et al., 2023). Even though all tones were physically identical, in some trials tone-locked neural responses were modulated by a binary (S-w) or ternary (S-w-w) subjective accentuation, resembling the well-documented tick-tock phenomenon observed in humans (Brochard et al., 2003; Abecasis et al., 2005; Schmidt-Kassow et al., 2011; Poudrier, 2020; Baath, 2015). As this phenomenon emerged during passive listening, the superimposition of accentuations might represent the spontaneous sampling of the acoustic environment beyond the encoding of single event onsets or time-intervals.
Standard ERP and time-frequency analyses, relying on the averaging of neural activity over hundreds of trials, failed to show these binary accentuation patterns. In comparison, our novel trial-based analysis increased sensitivity to such accentuation patterns. These combined observations also support earlier human studies that reported large inter- and intra-individual differences in if, when, and how participants accentuate (Poudrier, 2020; Baath, 2015). Hence, various accentuation patterns are possible: individuals may start accentuating at different time points (i.e., not necessarily from the beginning of an auditory sequence), may alternate accents over time (thus, over trials), or may not accentuate at all (if not instructed to do so). Thus, any trial-averaging procedure may inevitably obscure these accentuation possibilities and mask individual tendencies that influence the parsing of acoustic environmental rhythms.
While isochrony may not necessarily represent a (musical) rhythm, we propose it to be an ideal test-case for investigating the basic neurophysiology that underlies the encoding of temporal regularities. We further note that isochrony is present in a wide range of daily behavior, and its evolutionary advantage might lie in its simplicity: it allows generating temporal predictions (Ravignani and Madison, 2017). In turn, temporal predictability facilitates adaptive behaviors, rhythmic interactions, music, speech, and much more (Greenfield et al., 2021).
Cross-species differences in complex rhythm and beat processing have been commonly associated with neuroanatomical and functional differences in the motor system, specifically in the motor cortico-basal-ganglia-thalamo-cortical (mCBGT) circuitry, which is more developed in humans than in non-human species (Patel and Iversen, 2014; Wilson and Cook, 2016; Mendoza and Merchant, 2014). However, the current findings might indicate that basic rhythm processing capacities do not necessarily involve the mCBGT, or that they preceded neuroanatomical changes in the evolution of the human brain. Given the absence of information on the neuroanatomical provenance of EEG signals, we refrain from speculation but encourage future studies to investigate this matter.
In summary, the current findings confirm that macaque monkeys have adequate neural machinery to sample temporal regularities in the environment, and further show a human-like predisposition to parse regular acoustic input with accentuation patterns. These observations confirm that macaque monkeys have the fundamental building blocks that are necessary for DPS.
These previously unexplored parallels between humans and macaque monkeys motivate further cross-species investigations to advance our understanding of the phylogenesis of human rhythm cognition.

Conclusion
While passively listening to isochronous equitone sequences, macaque monkeys' neural oscillatory activity sampled the acoustic environment at multiple timescales. Delta- and mu-band oscillations encoded the temporal regularity in auditory sequences, tracked sound onsets, and parsed them with a superimposed accentuation pattern. These observations mirror basic rhythm processing in humans and confirm a complementary role of low- (delta) and high-frequency (mu) bands. As these basic rhythm processing capacities are linked to the development of complex sensorimotor skills in humans (e.g., speech and music), these findings highlight a fundamental stepping stone in the phylogenetic trajectories of humans' rhythm cognition.

Fig. 1. Electrode positions and stimulus sequence. Electrode positions on the macaque monkey scalp (left) and the 13-tone isochronous equitone sequence (right). The hypothesized superimposition of binary accentuations would parse the auditory sequence in alternating "strong" (S) and "weak" (w) accents. Deviant tones (DEV) occurred from the 8th position onward. Accordingly, they could occur on S and w accentuated positions with equal probability.
Fig. 2. Rhythm tracking analyses. A: Fourier spectrum of neural activity along the entire auditory sequence. The plot displays the grand-average power in the frequency range from 1 to 7 Hz. B: Time-course of neural activity at the stimulation frequency. Vertical dotted lines indicate the onsets of STD tones before (blue) and after (red) the DEV. The DEV onset is shown in green. Blue shades represent the standard errors. Light-blue rectangles indicate the pre-stimulus intervals of STD tones in which we performed phase analyses (not to scale). C: The kernel density distribution of mean vector length (MVL) calculated at the single-session and sequence level and averaged across the fronto-central cluster of interest. These MVLs are based on the raw phase-angles for pre-DEV STD tones (blue) and are statistically compared to the MVL for a random distribution of phase-angles. Single-session statistics are reported in Suppl. Table 2.

Fig. 4. ERPs and TFR data for monkey 2. A: On the left, ERP responses for STD tones in S (blue) and w (red) positions. On the right, ERP responses for STD (blue) and DEV tones. Stars indicate significant time-windows, as assessed by means of paired-sample t-tests (FDR-adjusted p < .05). B: Grand-average time-frequency spectrum time-locked to STD tones (−0.2 to 0.4 s). The frequency range spans 1-40 Hz with a frequency resolution of 0.25 Hz. The red rectangle highlights predominant responses in the mu (8-15 Hz) frequency range, on which we performed statistical comparisons. C: Extracted time-course of mu-band activity in hypothetical S-w positions, time-locked to STD tone onsets, in blue for odd-numbered positions (strong binary accent) and red for even-numbered positions (weak binary accent). Shaded colors indicate standard errors. On top, a grey rectangle delineates the time-window in which peak amplitude extraction is performed.