Dyadic entrainment. An overwhelming and modality-dependent phenomenon.

Rhythmic joint coordination is ubiquitous in daily-life human activities. In order to coordinate their actions towards shared goals, individuals need to co-regulate their timing and move together at the collective level of behavior. Remarkably, basic forms of coordinated behavior tend to emerge spontaneously as long as two individuals are exposed to each other’s rhythmic movements. The present study investigated the dynamics of spontaneous dyadic entrainment, and more specifically how they depend on the sensory modalities mediating informational coupling. By means of a novel interactive paradigm, we showed that dyadic entrainment systematically takes place during a minimalistic rhythmic task despite explicit instructions to ignore the partner. Crucially, the interaction was organized by clear, modality-dependent dynamics. Our results showed highly consistent coordination patterns in visually-mediated entrainment, whereas we observed more chaotic and more variable profiles in the auditorily-mediated counterpart. The proposed experimental paradigm yields empirical evidence for the overwhelming tendency of dyads to behave as coupled rhythmic units. In the context of our experimental design, it showed that coordination dynamics differ according to the availability and nature of the perceptual information. Interventions aimed at rehabilitating, teaching or training sensorimotor functions can ultimately be informed and optimized by such fundamental knowledge.


Introduction
Humans are in remarkable ways influenced by what they hear and see, especially when they perceive other people moving. The exposure to rhythmic behaviors attracts individuals to fall in sync with one another, and the phenomenon can be observed at the level of dyads, small groups and large crowds 1 . For instance, when two people walk, run or dance together, they show a natural tendency to synchronize their steps. When an audience applauds at the end of a concert, the auditory scene goes through intermittent periods of synchronized clapping 2 . When marching bands cross each other on the street, it requires conscious effort for them to avoid getting rhythmically entrained 3 . Rhythmic joint coordination is ubiquitous in daily-life human activities, from the simplest form of unintentional synchronization 4 to complex tasks such as musical performance 5 . The way different rhythms organize into orderly coordinated patterns over time implies co-regulated actions in response to rhythms produced by other agents. Characteristic of this co-regulation is that action patterns are often spontaneously attracted towards stable synchronization, via a process called entrainment 6 . 'Coordination dynamics' has been proposed as a theoretical framework to understand the generic organizational principles that underlie the coordination of coupled rhythmic units 4,[6][7][8][9] . Central to the theory is that the coordination of multiple units over time is the result of a dynamic balance between keeping one's own rhythm (competition) and moving together on the collective level (cooperation) 10 . This idea resonates with earlier work by von Holst 11 , who contrasted the 'maintenance effect' with the 'magnet effect' in his study of coordination in animal behavior.
When intrinsic and external rhythms are incongruent, rhythmic units engage in a dynamic competition-cooperation process. When it comes to rhythmic interactions within a dyad, this dynamic balance can be influenced by manifold variables, such as preferred tempo 12 , empathy 13 and social factors 14 . However, among all possible sources of variability, informational coupling between individuals stands out as the most fundamental condition for interpersonal coordination. The minimal requirement for human interactions to occur is that individuals mutually exchange information through one or more sensory channels. Although experimental interactive scenarios necessarily imply a particular sensory modality [15][16][17] and possibly cross-modal interactions 18 , the role of competing sensory modalities remains overlooked in the literature. To our knowledge, whether and how coordination dynamics depend on the nature of available perceptual information is yet to be investigated. The aim of the present study is to develop a dynamic experimental paradigm, and apply it to studying modality dependencies of competition-cooperation dynamics in dyadic entrainment. The paradigm involves a unimanual tapping task where two individuals are exposed to minimally different tempi and reciprocally coupled in visual and auditory modalities across experimental conditions. The procedure implies a tension between competition and cooperation processes in dyadic interaction, and is designed to yield an empirical picture of these dynamics across sensory modalities.
The core idea underlying the proposed experimental paradigm is to induce a competition between the intrinsic rhythm of an individual's actions and the rhythm produced by another person. During a unimanual tapping task, two partners are exposed to two metronomes in different modalities (visual and auditory). The metronomes assigned to the partners slightly differ in frequency, resulting in a gradual increase (0 to π rad) and decrease (π to 0 rad) of the absolute relative phase over multiple cycles. Accordingly, for each person, a corresponding cyclical de-phasing of the assigned metronome and partner's metronome unfolds over time. With this simple expedient, we implemented a completely predictable system of two oscillators which deterministically revisits the same states throughout 10 consecutive cycles. An audio file illustrating the metronomes' behavior is available in the Supplementary materials. When the partners are informationally coupled, they are exposed to two competing rhythms. The first one is the intended rhythm: participants are explicitly instructed to synchronize to their assigned metronome. The second rhythm is the coupling rhythm, which is produced by the partner and perceived in another sensory modality. Thus, when the members of the dyad follow their own auditory metronome, they are exposed to visual information from the partner (visual coupling). Alternatively, when the members of the dyad follow their own visual metronome, they are exposed to auditory information from the partner (auditory coupling). The same task performed in the absence of informational coupling provides the control conditions for each modality, and a baseline to assess the impact of coupling on spontaneous dyadic entrainment. In the presence of informational coupling, a cross-modal incongruency takes place between intended and coupling rhythms. The experimental design is schematized and described in Figure 1.
In the framework of coordination dynamics, the concept of relative phase (ϕ) functions as a collective variable to describe co-regulation of timing. When competition dominates the interaction, rhythmic units may fully keep their own intrinsic frequency, while they may be fully attracted towards each other as cooperation prevails. The former case would typically result in an evenly spread distribution across ϕ (0 to 2π radians). In the latter case, the distribution would exhibit peaks over the so-called attractor regions, where rhythmic units are locked in a temporally stable state. In human behavior, the intrinsic dynamics of the collective variable ϕ are well characterized by a layout of attractive and repelling fixed points, wherein in-phase (0 radians) and anti-phase (π radians) are the dominant stable modes of coordination. The Haken-Kelso-Bunz (HKB) model mathematically describes how phase transitions from one spatiotemporal pattern to another take place when the system is pushed beyond its equilibrium state 19 . The model points at the in-phase mode as the point where the system ultimately tends to reach stability. Crucially, the behavior of the 'drifting metronomes' repeatedly guided the dyads through the whole range of ϕ values -and over its respective attractor regions. Based on a joint recurrence analysis of the system's dynamics 20 , we computed a recurrence score as a metric of dyadic coupling strength and tracked its evolution over time (for a visual representation of the processing pipeline, see Figure 2 & 3). By these means, we obtained an empirical attractor layout descriptive of the competition-cooperation dynamics within the dyads. The cooperation process (i.e., entrainment to the partner's coupling rhythm) was expected to dominate the conflict around the in-phase (0 radians) and anti-phase (π radians) points, leading to a significantly higher recurrence score in the presence of informational coupling and a significant modulation around attractor regions.
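For reference, the HKB model 19 expresses the intrinsic dynamics of the collective variable in its standard, generic formulation (the coupling parameters a and b below are not fitted to our data):

```latex
\dot{\phi} = -a\,\sin\phi - 2b\,\sin 2\phi,
\qquad
V(\phi) = -a\,\cos\phi - b\,\cos 2\phi
```

The in-phase mode (ϕ = 0) is a stable fixed point of this potential for all positive a and b, whereas the anti-phase mode (ϕ = π) remains stable only while b/a > 1/4; when the ratio drops below this critical value the anti-phase mode loses stability, which is why the system ultimately tends to settle in-phase.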
Given the time-varying nature of the recurrence score as a response variable, we needed a method to model its local variations over the course of the metronomes' cycles. We therefore opted for a statistical model which allowed us to assess the effects of our experimental manipulations over the entire unfolding of the interaction, and to avoid segmentation and multiple comparisons while respecting the temporal structure of the data 21,22 . Beyond the inferences made possible by intercept-based linear models such as ANOVA, our solution allowed us to assess the effect of categorical predictors on the temporal profile of the response variable.

Results
Growth curve analysis 23 was used to analyze the evolution of the recurrence score over the course of the expected relative phase between partners (i.e., from 0 to 2π). The curves were calculated as the average of the 10 consecutive metronome cycles in each experimental condition, and modeled with 2nd-order orthogonal polynomials and fixed effects of Coupling and Modality on the polynomial terms. The uncoupled (control) conditions were treated as the baseline, and parameters were estimated for the coupled conditions. Similarly, the audio conditions were treated as a baseline, and parameters were estimated for the visual conditions. In the context of orthogonal polynomials, a flat line can be considered a pure intercept and a 'zero-order' polynomial, in the sense that it exhibits zero changes in any direction. If the recurrence score was time-invariant, it would appear as a flat line indicating complete dominance of the competition process between rhythmic units, which would pursue their individual intended rhythms. On the other hand, significant changes of direction would indicate that the interaction is systematically modulated by the temporal structure of the task: around attractor points, the cooperation process would take over and the rhythmic units would be attracted towards the coupling rhythm. Figure 4 shows how orthogonal polynomials are suited for modelling local variations of the response variable around attractor points. Orthogonal polynomials in the model were limited to the 2nd order: based on our hypotheses, we expected quadratic and quartic terms to capture the effects of in-phase and anti-phase attractor points. However, following data inspection and analysis of residuals, we concluded that the quadratic term alone was enough to capture the relevant effect of our experimental manipulation.
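As an illustration, orthogonal polynomial time terms analogous to those used in growth curve analysis (cf. R's poly()) can be constructed with a QR decomposition of an ordinary polynomial basis. This numpy sketch is ours, not the authors' code; the function name is hypothetical:

```python
import numpy as np

def orthogonal_poly(n_steps, order=2):
    """Orthonormal polynomial predictors over n_steps time points, analogous
    to R's poly(): each column is orthogonal to the others and to the
    intercept, so the linear and quadratic terms capture independent trends."""
    t = np.arange(n_steps, dtype=float)
    basis = np.vander(t, order + 1, increasing=True)  # columns: 1, t, t^2, ...
    q, _ = np.linalg.qr(basis)                        # orthonormalize columns
    return q[:, 1:]                                   # drop the intercept column
```

Because the columns are mutually orthogonal, the fixed-effect estimates on the linear and quadratic terms can be interpreted independently, which is what makes the parameter estimates usable as effect sizes for linear and curvilinear change.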
Figure 5 shows the data with descriptive statistics across experimental conditions, and a comparison between 2nd-order and 4th-order models: arguably, the better fit of the quartic term comes with a major cost in terms of model parsimony and interpretation. Our model also included random effects of Dyads on all polynomial terms, and their interactions with the factors: the random effects structure was maximized in order to minimize false alarm rates without substantial loss of power 24 . As expected, we found a significant main effect of Coupling (Estimate = 42.809, SE = 18.142, p = 0.018) indicating an overall increase of the recurrence score in the presence of informational coupling as compared to the uncoupled control conditions, independently of the sensory modality. We also report a significant interaction effect of Coupling and Modality on the orthogonal polynomial terms of Time (Linear: Estimate = -84.579, SE = 30.867, p = 0.006; Quadratic: Estimate = 190.320, SE = 36.973, p < 0.001), meaning that the parameter estimates in coupled conditions significantly differ across visual and auditory modalities. Given the large variability observed in the Auditory level and following inspection of the residuals, we fitted the same model on the natural log-transformed response variable, as it provided a considerably better fit, and still found a significant main effect of Coupling (Estimate = 0.237, SE = 0.117, p = 0.042) and interaction effects on the polynomial terms of Time (Linear: Estimate = -0.534, SE = 0.222, p = 0.016; Quadratic: Estimate = 1.257, SE = 0.299, p < 0.001). Table 1 shows the fixed effects parameter estimates and their standard errors for log-transformed data, along with p-values estimated using the normal approximation for the t-values.

Table 1. Fixed-effects parameter estimates (Estimate, SE, p) for the log-transformed recurrence score; the values are reported in the text above.

In the context of growth curve analysis 23 , the parameter estimates provide a measure of effect size with a straightforward interpretation for linear and non-linear changes over time, as long as the order is not too high: above the 4th order, results become hardly interpretable and the risk of over-fitting the data increases. With the interaction effect of the Coupling and Modality factors on the polynomial terms, we could quantify modality-specific effects of informational coupling on the evolution of the recurrence score. The results strongly support our predictions of spontaneous dyadic entrainment in the presence of informational coupling and, crucially, show that its temporal dynamics are modality-dependent.

Discussion
The present study investigated the dynamics of dyadic entrainment, and more specifically how they depend on the sensory modality mediating informational coupling. An experimental paradigm was designed to dynamically manipulate the expected relative phase between participants, while informational coupling was manipulated in visual and auditory modalities across conditions. At the global level, we aimed to demonstrate that when intended and coupling rhythms are at odds, a spontaneous co-regulation of timing occurs within the dyad at the expense of the individual intended rhythms. At the local level, we wanted to infer the configuration of a dyadic basin of attraction, by modelling the dyadic cooperation process around attractor points 9 . Ultimately, we were able to demonstrate a crucial modality dependency of dyadic entrainment.
In the first place, we reported that the mere presence of informational coupling led to spontaneous entrainment in dyads engaged in a rhythmic task. As long as two partners perceived each other, their rhythms were 'attracted' towards each other at the expense of the synchronization with the respectively assigned metronomes. The effect was quantified as a significant increase of the average recurrence score, compared to the baseline provided by the de-coupled conditions. Remarkably, such manifestation of self-organizing behavior 9,25 emerged in the context of a task where subjects were explicitly instructed to ignore each other's actions and to pursue independent uncoupled behaviors cued by their own reference. Although spontaneous dyadic entrainment has been reported in self-paced 16,26 , cued 27,28 and joint improvisation tasks 29-31 , our procedure explicitly implemented rhythmic incongruency to investigate the competition-cooperation dynamic balance underlying the system's dynamics 10 . When competition prevails, rhythmic units manage to keep the assigned (intended) rhythm, whereas when cooperation takes over, rhythmic units undergo a (spontaneous) co-regulation of timing and move together on a collective level of coordinated behavior. Dyadic entrainment occurred regardless of the sensory modality mediating the informational coupling.
Beyond a global measure of coupling, the continuous de-phasing implemented in the 'drifting metronomes' allowed us to quantify local variations of the recurrence score as a function of the expected relative phase between partners. The balance between cooperation and competition processes is dynamic, in the sense that it varies over time and is constrained by the intrinsic dynamics of the system. Dyadic entrainment was expected to occur more strongly and more consistently around attractor regions, located around the in-phase and anti-phase points of the metronomes' cycles 19 . With our approach, we were able to derive an empirical 'attractor landscape' 32 and contrast it across conditions of visual and auditory coupling. Our findings strongly point towards a modality dependency of dyadic entrainment (see Figure 5). The interaction effect of Coupling and Modality on the linear and quadratic terms of Time statistically confirms that the temporal modulation is an exclusive property of visually-mediated coupling, with a remarkable consistency across dyads. In this condition, the recurrence score exhibits maximum values around the in-phase point at the start of the metronomes' cycle, gradually decreases as the dyad leaves the attractor zone, reaches its minimum past the anti-phase point, and finally increases towards maximum values as it enters the in-phase zone at the end of the cycle. We report here an effect of directionality in the visually coupled condition: dyads slowly de-couple as they leave the in-phase point, whereas they couple more rapidly as they approach the same point. The negative slope of the linear term indicates that dyads tend to be more coupled in the first portion of the cycle, namely before passing the anti-phase point. To the best of our knowledge, such a phenomenon has not been reported in the literature before.
By contrast, in the condition of auditory coupling the recurrence score did not exhibit any significant variation over time: the measure was locally independent of the proximity of attractor points. Nevertheless, we still observed a significant global increase of recurrences compared to the baseline. Previous evidence shows that phase shifts in auditory sequences elicit a phase-correction response in participants instructed to tap in synchrony with an isochronous sequence of flashing lights 33 . Auditory-driven phase corrections occur even when shifts are highly irregular 34 and at different cross-modal relative phases, with overall high interindividual variability 35 . In the context of our paradigm, sounds generated by the partner exerted the same attraction among individual rhythms regardless of the expected relative phase. Finally, both control conditions resulted in a flat horizontal line below the average of the respective coupled conditions, as expected. It is worth noting the higher variability in Audio conditions, arguably due to the task of synchronizing to visual metronomes: from the lower recurrence score in the control condition we can conclude that dyads performed globally worse in synchronizing to visual cues as compared to auditory cues, which is in line with previous evidence on modality-dependent synchronization skills 36,37 , and with participants' self-reports about the perceived difficulty of the task. Taken together, our findings support our hypothesis of modality-dependent dynamics, suggesting that auditorily-mediated entrainment is considerably less sensitive to the basin of attraction when compared to its visual counterpart. Although interpersonal coordination can be described at the collective level as a self-organizing process 9,25 , humans ultimately entrain to each other via mutual adaptation of individual action-perception loops 16,17 .
Therefore, an exhaustive account of dyadic entrainment should be capable of bridging the two levels, considering how coordination dynamics depend on the processing of information available in the interaction. Predictive coding [38][39][40] was first formulated as a general theory for neural computation, and more recently put forward as the potential mechanism underpinning coordination dynamics in dyadic behavior 41 . The theory states that the brain is constantly engaged in an optimization process based on Bayesian inference: information sampled from the environment is compared to prior evidence, and the optimization consists of minimizing prediction errors. In the context of interpersonal coordination, such a putative mechanism could underpin the overt manifestations of dyadic entrainment by engaging the action systems of the rhythmic units involved. Let us consider the prediction error as the phase difference between executed and observed movements: as both units minimize the mismatch between the representations of observed and own motor behavior, the dyadic system tends towards a collective minimization of free energy in a stable attractor state 41 . Such an interpretation of the dyadic attractor is supported by our results in the context of visually-mediated coupling: as the relative phase between partners increases, the cooperation process between rhythmic units loses strength in favor of competition. In other terms, the tendency of the partners to lock in a stable coordinative state depends on the energy required to correct for the prediction error. As the error increases, the correction becomes more effortful and it becomes easier for the units to pursue independent behaviors. Crucially, such an explanation does not hold for auditorily-mediated coupling. As distinct sensory systems have their unique interface with motor and timing systems 37,42,43 , it is necessary to interpret the results in light of individual information processing.
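The tension between frequency detuning and mutual attraction can be illustrated with a generic two-oscillator phase model (a Kuramoto-style toy, not the authors' model; the coupling constant and the Euler integration scheme are our assumptions). The relative phase ψ obeys dψ/dt = Δω − 2K sin ψ, which locks to a fixed point whenever 2K ≥ |Δω| and drifts otherwise:

```python
import numpy as np

def relative_phase(delta_omega, coupling, duration_s=100.0, dt=0.01):
    """Euler-integrate dpsi/dt = delta_omega - 2*coupling*sin(psi), the
    relative phase of two mutually coupled oscillators with frequency
    detuning delta_omega. Locks whenever 2*coupling >= |delta_omega|."""
    psi = 0.0
    for _ in range(int(duration_s / dt)):
        psi += dt * (delta_omega - 2.0 * coupling * np.sin(psi))
    return psi
```

With a detuning of 0.03 Hz (as in our metronomes) and sufficient coupling, ψ settles near arcsin(Δω / 2K); with zero coupling it drifts linearly, the analogue of pure competition between the intended rhythms.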
There is arguably a crucial difference in the nature of the perceptual information available to the partners across different modalities. When the coupling is visually mediated, a stream of kinematic information is continuously sampled from the partner's actions 44 , such that the predictive models can be constantly updated at every stage of movement execution. On the other hand, in case of auditory coupling the prediction of the partner's tap is solely based on temporal information about the previous taps, since partners do not have online access to the reciprocal kinematic information. In this condition, the cooperation process still dominates the interaction but no consistent pattern emerges in terms of dynamics as quantified by our method. We propose that the availability of kinematic information through informational coupling is a crucial discriminant that might explain the observed modality-dependence of dyadic entrainment. A final question arises naturally: how would the dyad behave if kinematic information were embedded in a sound envelope, and conveyed to the partners via the auditory channel? One limitation of the current work is that we cannot conclude whether dyadic entrainment is modality-dependent in the strict sense or rather dependent on the availability of kinematic information. Given the central role attributed to kinematics in predictive accounts of action perception 44 , it is of particular interest to investigate whether continuous sonification of movement parameters 45 would support the same dynamics observed in presence of visual coupling. Previous work shows that when visual stimuli convey movement over time, e.g., apparent hand motion 46,47 or a bouncing ball 36,42,48 , the continuous stream of information improves behavioral outcomes of predictive entrainment.
We hypothesize that a movement sonification strategy based on this principle could translate the kinematic advantage into auditorily-mediated dyadic interactions, resembling the dynamics described here in the condition of visual coupling. The findings and questions raised by the present study might have relevant implications for the optimization of interventions aimed at rehabilitating, teaching or training sensorimotor functions. Fundamental knowledge about dyadic interactions is what will ultimately inform such optimization. The evidence from the present study points at the advantages of the physical presence of a teacher (or a therapist, or a trainer) for guiding rhythmic movements, possibly integrating kinematic information via sonification strategies 45 .
In conclusion, we want to highlight that the 'drifting metronomes' procedure is meant as a methodological contribution to the investigation of interpersonal coordination, in the hope that the community can build upon it with new research questions and experimental designs. We point to its adoption with simultaneous dual-electroencephalography recordings 49 , computational simulations 50 and replication in pathological populations 51-53 as potential sources of insight into the fundamentals of dyadic interactions.

Methods
Participants. Twenty-eight right-handed participants took part in the study (18 females, 10 males; mean age = 29.07 years, standard deviation = 5.73 years). They were randomly paired in fourteen (N = 14) gender-matched dyads, in order to control for any gender bias in the interaction. None of them had a history of neurological, major medical or psychiatric disorders. All of them declared not to be professional musicians upon recruitment, although some of them had musical experience. With the exception of one dyad, all participants declared not to know the assigned partner prior to the experiment. The experiment was approved by the Ethics Committee of Ghent University (Faculty of Arts and Philosophy) and informed written consent was obtained from each participant, who received a 15€ coupon as compensation for their participation.
Experimental apparatus and procedure. The two partners sat across the same table, facing each other. Chairs were provided with an armrest in order to exclude any tactile or proprioceptive coupling due to the table vibrating. Each partner was assigned to one pad and instructed to tap on it with the right index finger, synchronizing with a metronome. As represented and described in Figure 1, the metronome was either an auditory cue or a flickering light, depending on the experimental condition. Each partner was presented with a different metronome tempo (1.67 Hz and 1.64 Hz). Aligning the start of the two tracks, the relative phase between the metronomes started at 0° and steadily increased in regular steps of 5.6°. A full cycle took 39 seconds to complete, and 10 consecutive cycles were performed in each experimental condition. In conditions of informational coupling, participants were instructed to ignore the partner and tap along with the assigned metronome. A M-Audio® M-Track 8 soundcard was used to route independent audio channels to each participant via in-ear plugs. Ableton Live 10 ® was adopted as the main interface for stimuli presentation, to sonify in real-time the participants' finger-taps and to route them. The same MIDI tracks were used to control the metronomes across conditions, by either triggering an audio sample or a flickering LED. Volume was adapted to every participant before the start of the experiment, and pink noise was regulated up to the point of suppressing any sound but the auditory stimulation presented via Ableton. Participants were monitored online by means of a USB-camera, to make sure they complied with the instructions. No dyad was excluded from the analysis. A Teensy 3.2 microcontroller was used as a serial/MIDI hub in the setup. It was used to detect tapping onsets with 1 ms resolution, based on the analog input from the piezo sensors installed inside the pads.
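The reported numbers are mutually consistent if the two tempi stand in a 65:64 frequency ratio, with 1.67 Hz being a rounded value; this inference from the reported step size and cycle duration can be checked in a few lines (our sketch, not the authors' code):

```python
# Nominal tempi as reported; a 65:64 ratio reproduces both the ~5.6-degree
# steps and the ~39-second cycle (our inference from the reported numbers).
f_slow = 1.64                      # Hz, slower metronome
f_fast = f_slow * 65 / 64          # ~1.6656 Hz, reported (rounded) as 1.67 Hz

# relative-phase drift per period of the slower metronome, in degrees
step_deg = 360.0 * (f_fast - f_slow) / f_slow

# duration of one full 0-to-360-degree drift cycle, in seconds
cycle_s = 1.0 / (f_fast - f_slow)
```

With these values, step_deg evaluates to 5.625° (reported as 5.6°) and cycle_s to about 39.0 s, i.e., 64 steps of the slower metronome per cycle.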
Every time a metronome onset was presented to a participant, a MIDI message was sent to the Teensy device to log the metronome timeseries and to control the voltage of the LEDs when needed. Simultaneous EEG recordings were performed from both partners of the dyads during the whole experiment, but those data are not presented in the present paper. Additional data were collected prior to and during the experiment. Prior to the experiment, demographic data were collected; the Edinburgh inventory 54 was administered to assess the right-handedness of the participants; the Interpersonal Reactivity Index (IRI) was administered as a self-report of empathy and its subscales 55 ; the preferred spontaneous tempo was calculated via 30 seconds of self-paced finger-tapping on a dedicated smartphone app (www.beatsperminuteonline.com). During the breaks between experimental conditions, all participants provided subjective self-reports on different aspects of the task by expressing agreement on a scale from 1 ("Completely disagree") to 7 ("Completely agree") with a custom-made battery of 11 Likert items.

Data Analysis
Pre-processing. Our raw data consisted of timestamps logged from the Teensy controller with 1 ms resolution for 10 consecutive metronome cycles (390 seconds in total), and an associated ID for each partner and each metronome. As the only form of cleaning, we removed onsets occurring <350 ms from the previous one: false positives could occasionally be recorded when a participant pushed the pad for too long or accidentally laid the hand on it. The cleaned timeseries were then interpolated with a sine function at 1 kHz sampling rate, providing an estimate of each oscillator's position on its cycle with a temporal resolution of 1 ms. Conceptually, the choice of sinusoidal interpolation was supported by recent work on modelling of systems of coupled oscillators in joint finger-tapping studies 50 . Operationally, it guaranteed that all timeseries matched in size without any loss of data, which was a requirement for the next steps of our analysis. Timeseries were finally down-sampled by a factor of 40 to make the computation of RPs computationally feasible. Different orders of down-sampling were tested to make sure that the results did not depend on this choice.
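The pre-processing steps above can be sketched in numpy; this is a minimal illustration of the described procedure (the function name, defaults, and linear-phase interpolation between onsets are our assumptions, not the authors' implementation):

```python
import numpy as np

def taps_to_signal(onsets_ms, total_ms=390_000, refractory_ms=350, downsample=40):
    """Clean tap onsets and return a sinusoidal estimate of the oscillator's
    position on its cycle, sampled at 1 kHz and then down-sampled."""
    onsets = np.asarray(sorted(onsets_ms), dtype=float)
    keep = [onsets[0]]
    for t in onsets[1:]:
        # discard onsets closer than the refractory window (false positives)
        if t - keep[-1] >= refractory_ms:
            keep.append(t)
    knots = np.array(keep)
    # phase grows linearly by 2*pi between consecutive onsets (sine interpolation)
    grid = np.arange(total_ms, dtype=float)     # 1 kHz time grid
    phase = np.interp(grid, knots, 2 * np.pi * np.arange(len(knots)))
    return np.sin(phase)[::downsample]
```

Because every trial is mapped onto the same fixed-length grid before down-sampling, all resulting timeseries match in size regardless of how many taps each participant produced.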
Phase-space reconstruction. The optimal parameters for the time-delayed embedding were computed for each participant, for the time course of each single metronome cycle in all experimental conditions. The resulting mean value was applied to all individual instances. The reason for this approach is that in order to compare the rate of recurrences across conditions at the group level, the embedding procedure must be consistent across participants (e.g., see 56 , for an example of parameter selection in a factorial design). We first selected the delay (τ) of the timeseries as the first local minimum of the mutual information index 57 as a function of delay: this approach minimized the timeseries self-similarity, extracting nearly orthogonal components and preventing the attractor from folding over itself 58 . The mean value of τ was 7. Next, we determined the number of embedding dimensions m with the method of false nearest neighbors 59 : we progressively unfolded the time series into higher dimensions until the data points no longer overlapped spuriously, finding a mean m = 3. Finally, in order to determine a maximum threshold for counting two neighboring points as recurrent, we selected a radius of 10% of the maximal phase-space diameter 20 .
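Once τ and m are chosen, the time-delayed embedding itself is a simple construction; this numpy sketch (ours, with the mean parameters reported above as defaults) shows how a 1-D series is unfolded into phase space:

```python
import numpy as np

def delay_embed(x, m=3, tau=7):
    """Time-delay embedding: map a 1-D series x into an m-dimensional
    phase space, where row i is (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
```

Each row of the returned matrix is one reconstructed phase-space point, so pairwise distances between rows are what the recurrence threshold is applied to.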

Joint Recurrence Plots (JRPs).
Individual recurrence plots where computed as follows: where ε is the neighborhood threshold, ǁ ⋅ ǁ is the Euclidean norm, and Θ is the Heaviside step function. A square matrix was returned from each shadow-manifold in the phase-space, containing 1s for all the instances where the distance ǁ · ǁ was smaller than the threshold ε, and 0s for the remaining elements. The distance was computed with the method of maximum norm. A joint recurrence plot (JRP) was computed for each dyad by overlapping the individual JRPs of the partners pair-wise, and keeping as 1 only the instances where both plots contain a recurrence. The computation of the JRPs was carried out with the crp toolbox for Matlab ® 20 . The 10 trials (i.e., the metronome cycles) of each experimental condition were aggregated by summing the respective JRPs such that every cell of the 2-D matrix contains the count of times that a recurrence occurs in the same point of the cycle. Finally, we looped over the columns of the matrix summing all the counts contained in the rows, obtaining a 1-D vector recurrence scores which represents a density measure of the instances of coupled behavior over the course of the cycle. The scale of the recurrence score depends on the size of the JRPs and in turn on the embedding procedure, which makes it necessary to set the same parameters on the whole sample. In order to stabilize our response variable and avoid over-sampling in view of our statistical model, the resulting timeseries were divided in 64 segments averaging the recurrence score. For the division, we chose the intervals determined by the steps of the slower metronome, as they provide an intrinsic regular subdivision of the experimental runs. All the steps presented so far were carried out in Matlab ® (see Figure 2, for a schematic representation). Our approach was preferred over the "windowed" version for JRQA, for the latter acted as a low-pass filter on our timeseries, making it impossible to interpret our results. 
The resulting phase-shift in the timeseries was dependent on the choice of the window size, and hence not reliable in detecting attractor points over the landscape.
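As a rough illustration, the RP/JRP pipeline described above can be sketched in Python with NumPy (the actual analysis used the crp toolbox for Matlab; the function names, toy sinusoid signals, and parameter values below are ours, for illustration only):

```python
import numpy as np

def recurrence_plot(traj, eps):
    """Binary recurrence matrix: 1 where two points of the embedded
    trajectory are closer than eps under the maximum norm
    (i.e., the Heaviside function of eps minus the distance)."""
    diff = traj[:, None, :] - traj[None, :, :]
    dist = np.abs(diff).max(axis=-1)      # pairwise maximum-norm distances
    return (dist < eps).astype(int)

def joint_recurrence_plot(traj_a, traj_b, eps):
    """JRP: keep only the recurrences shared by both partners' RPs."""
    return recurrence_plot(traj_a, eps) * recurrence_plot(traj_b, eps)

# toy example: two slightly phase-shifted sinusoids in a 2-D phase space
t = np.linspace(0, 4 * np.pi, 200)
traj_a = np.column_stack([np.sin(t), np.cos(t)])
traj_b = np.column_stack([np.sin(t + 0.1), np.cos(t + 0.1)])

jrp = joint_recurrence_plot(traj_a, traj_b, eps=0.3)

# 1-D recurrence score: sum each column over the rows, then average
# within 64 equal segments of the cycle to avoid over-sampling
column_counts = jrp.sum(axis=0)
segments = np.array_split(column_counts, 64)
recurrence_score = np.array([seg.mean() for seg in segments])
```

In a real analysis the trajectories would come from the delay-embedding of the recorded movement signals, and the per-condition JRPs of the 10 trials would be summed before the column-wise compression.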
Statistical model. The recurrence score was used as the response variable in a mixed-effects model with Modality and Coupling as factors, and Time as a continuous predictor expressed as the indices of the metronome's steps (from 1 to 64). Given the non-linear time-course observed in the 'Visual Coupling' condition, we adopted the method of orthogonal polynomials 23 , including linear and quadratic functions of Time in our model (see Figure 5). Dyads and the interactions between Dyads and the factors were modelled as random effects 24 .
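For concreteness, orthogonal linear and quadratic time terms can be built from the step indices by orthogonalizing a Vandermonde matrix, e.g. via a QR decomposition (a minimal sketch in Python/NumPy, not the authors' code, which analogously to R's poly(x, degree) yields mutually orthogonal, unit-norm columns):

```python
import numpy as np

def orthogonal_poly(x, degree):
    """Orthogonal polynomial contrasts of x up to `degree`:
    build the Vandermonde matrix [1, x, x^2, ...] and orthogonalize
    it with QR; drop the constant column, keeping the higher terms."""
    x = np.asarray(x, dtype=float)
    vander = np.vander(x, degree + 1, increasing=True)
    q, _ = np.linalg.qr(vander)
    return q[:, 1:]

time = np.arange(1, 65)            # metronome steps 1..64
ot = orthogonal_poly(time, 2)
linear, quadratic = ot[:, 0], ot[:, 1]
```

These two columns would then enter the fixed-effects part of the model alongside Modality and Coupling, with orthogonality ensuring that the linear and quadratic trends are estimated independently of each other.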

Acknowledgments
The present study was funded by Bijzonder Onderzoeksfonds (BOF) from Ghent University (Belgium). The hardware of the recording device was built by Ivan Schepers.

Figure 2 (caption fragment). Finally, by looping over the columns and summing all the rows of the matrix (F), we compressed the representation into a 1-D timeseries of the total count of recurrences over the cycle structure. JRPs dominated by diagonal lines are exactly the configuration expected from a deterministic system 20 , and when compressed into the timeseries format they result in a flat horizontal line with small periodic fluctuations (due to the down-sampling performed on the sinewaves prior to embedding, which was necessary to make the JRP calculations computationally feasible). Applying the pipeline to human dyads allowed us to map the local variations of their coupling strength over the expected relative phase, resulting in a picture of the 'attractor landscape' 32 .

Figure 5 (caption fragment). […]; for the quadratic component (2nd order), it corresponds to the parabolic curvature; for the cubic and quartic components (3rd and 4th order), it corresponds to the sharpness of the peaks at the inflection points. The order of the polynomial is ideally chosen based on hypothesis and on the nature of the data; it should be confirmed by the empirical data and should allow a straightforward interpretation of the effects 23 . In our case, orders from 2 to 4 seemed suitable candidates for modelling different possible attractor landscapes, given the symmetric measure of the cycle and the inflection points around the in-phase and anti-phase regions. A full model based on orthogonal polynomials includes all terms up to the chosen order. The auditory coupling condition shows a recurrence score that is higher on average, but no clear pattern over the average cycle and overall higher variability. B) Recurrence score predicted by our 2nd-order polynomial model.
The inflection of the parabolic term and the negative slope of the linear term successfully capture the influence of the in-phase attractor points in the visually coupled condition. Arguably, this is the most parsimonious model for our data in the framework of growth curve analysis 23 . C) Recurrence score predicted by our 4th-order polynomial model. The quartic fit manages to capture more fine-grained modulations, particularly the local maxima at the inflection points at the extremes of the cycle. The profile recalls the typical 'seagull effect' originally reported in 32 and discussed in 9 in the context of bimanual coordination dynamics. However, for the sake of interpreting the parameter estimates, we opted for the simplest possible model: given the absence of a local maximum at the anti-phase point, a single parabolic inflection point was sufficient to explain the effect of visual coupling. D) Within-dyads comparison of the recurrence score over the average trial, across experimental conditions. Dots represent the empirical data, whereas continuous lines represent the full model (up to the 2nd-order polynomial) fitted to the data.

Table 1. Model summary. All significant effects are marked with an asterisk, and the associated p-values are highlighted in bold. We report a significant main effect of Modality and significant three-way interactions with the linear and quadratic functions of Time. In the framework of orthogonal polynomials, coefficient estimates provide a measure of effect size 23 : the intercept represents the average increase in the response variable contrasted with the baseline (i.e., the Uncoupled level), the slope of the linear term represents the change in the response variable per unit of time, and the quadratic term represents the steepness of the parabolic inflection. Since the response variable was transformed to the log scale, the natural exponential of the coefficient estimates should be used for interpretation purposes.
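As a worked numerical example of this log-scale interpretation (the coefficient value below is hypothetical, not one of our estimates):

```python
import math

beta = 0.40                  # hypothetical fixed-effect estimate on the log scale
multiplier = math.exp(beta)  # back-transform to a multiplicative effect
# exp(0.40) ≈ 1.49, i.e. roughly a 49% increase in the recurrence
# score relative to the baseline (Uncoupled) level
```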
The significant correlation with the quadratic function of time is not discussed in the text, as it has no relevance in the context of the experimental design.