Inter‐subject correlation of electroencephalographic and behavioural responses reflects time‐varying engagement with natural music

Musical engagement can be conceptualized through various activities, modes of listening and listener states. Recent research has reported that a state of focused engagement can be indexed by the inter‐subject correlation (ISC) of audience responses to a shared naturalistic stimulus. While statistically significant ISC has been reported during music listening, we lack insight into the temporal dynamics of engagement over the course of musical works—such as those composed in the Western classical style—which involve the formulation of expectations that are realized or derailed at subsequent points of arrival. Here, we use the ISC of electroencephalographic (EEG) and continuous behavioural (CB) responses to investigate the time‐varying dynamics of engagement with functional tonal music. From a sample of adult musicians who listened to a complete cello concerto movement, we found that ISC varied throughout the excerpt for both measures. In particular, significant EEG ISC was observed during periods of musical tension that built to climactic highpoints, while significant CB ISC corresponded more to declarative entrances and points of arrival. Moreover, we found that a control stimulus retaining envelope characteristics of the intact music, but little other temporal structure, also elicited significantly correlated EEG and CB responses, though to lesser extents than the original version. In sum, these findings shed light on the temporal dynamics of engagement during music listening and clarify specific aspects of musical engagement that may be indexed by each measure.

Musical engagement can implicate diverse aspects of the musical experience, from creation to interpretation to reception (Kaneshiro, 2016). Engaging with music is typically a hedonic experience able to evoke affective responses ranging from core affects to implicit empathy (Huron & Vuoskoski, 2020). Most studies of engagement focus on the listener, who can engage with music in a variety of ways including denotative, connotative, reflexive and associative modes of listening (Huron, 2002). A broadly encompassing definition of listener engagement with music is suggested by Schubert et al. (2013) as being 'compelled, drawn in, connected to what is happening, [and] interested in what will happen next'.
Engagement with music involves dynamic processes in which attention is drawn to both surface and structural musical features. Directing and maintaining attention across a varying musical landscape involves coactivated brain regions critical in formulating expectations and consequently updating perceptual frameworks (Sridharan et al., 2007). These processes entail varying types of saliency that result in both expectation formulation and response to the degree to which these expectations are realized or thwarted. Anticipation is often characterized in terms of musical tension (Farbood, 2012; Lerdahl & Krumhansl, 2007; Madsen & Fredrickson, 1993), whose neural correlates have been studied by Lehne et al. (2013) and others. Music-theoretic approaches isolate moments within these dynamic processes that constitute salient or climactic points of arrival, for example, 'highpoints' (Agawu, 1984) or 'tipping points' (Chew, 2016). A number of studies have considered self-reports of musical engagement, including retrospective ratings (Agres et al., 2017; Dauer et al., 2021; Kaneshiro et al., 2020), questionnaires assessing trait tendencies to enter states of absorption with music (Sandstrom & Russo, 2013) and continuous ratings delivered as music plays (Broughton et al., 2019; Dauer et al., 2021; Gregory, 1989; Olsen et al., 2014).
Electroencephalography (EEG) is a popular modality for studying the neural processing of music, as its temporal resolution is sufficiently high to interrogate specific events in time. Classical EEG paradigms involving repeated presentations of short, often tightly controlled stimuli tend to be incompatible with investigations calling for complete naturalistic works and single stimulus exposures (Kaneshiro, 2016), and until recently, few studies (e.g., Leslie et al., 2014) had used EEG to index engagement in realistic musical settings. However, inter-subject correlation (ISC) paradigms, in which audience members' neural responses to a shared stimulus are correlated with one another, facilitate the study of engagement with single exposures to ecologically valid stimuli. The ISC approach was originally introduced in functional magnetic resonance imaging (fMRI) research as a means of analysing responses to natural film excerpts (Hasson et al., 2004); methodologies for EEG were later developed by Dmochowski et al. (2012). Since then, EEG studies using narrative stimuli, such as excerpts of films or speeches, have reported that ISC indexes narrative cohesion (Dmochowski et al., 2012; Ki et al., 2016), large-scale population preferences (Dmochowski et al., 2014), attentional state (Cohen et al., 2018; Ki et al., 2016), video viewership (Cohen et al., 2017), memory retention (Cohen & Parra, 2016), perceived emotion (Ding et al., 2021) and even individual learning outcomes (Cohen et al., 2018). Given these findings, EEG ISC is sometimes interpreted as a measure of audience engagement, described by Dmochowski et al. (2012) as 'emotionally laden attention'.
ISC has been used to study the processing of music as well. fMRI studies have related the measure to neural tracking of time-varying stimulus features (Alluri et al., 2012), temporal cohesion of music (Abrams et al., 2013; Farbood et al., 2015), emotional responses to music (Trost et al., 2015) and effects of training (Fasano et al., 2020). Recent EEG studies focusing on musical engagement have shown ISC to be modulated by training, repeated exposures and familiarity with genres (Madsen et al., 2019) and have used ISC to study the impacts of temporal stimulus manipulations, beat processing and repetition (Kaneshiro et al., 2020), engagement with specific genres (Dauer et al., 2021) and structural repetition (Rajagopalan & Kaneshiro, 2023) in full-length songs.
One advantage of EEG's high temporal resolution is that it affords the calculation of ISC over short time windows, producing time-varying measures on the scale of seconds. Dmochowski et al. (2012) applied this approach in their first EEG ISC study of film viewing, finding that ISC not only varied over the course of a film excerpt but also peaked during periods of high narrative tension and suspense; Cohen et al. (2017) later used temporally resolved EEG ISC to predict real-world engagement with videos. In studies involving EEG ISC and continuous behavioural (CB) measures together, Ding et al. (2021) used ISC and other EEG features along with a regression approach to predict continuous annotations of arousal and valence delivered while viewing emotional film excerpts, while Dauer et al. (2021) compared the ISC of both EEG and CB reports of engagement collected while participants heard stimuli derived from a well-known minimalist musical work.
EEG ISC studies of music listening often consider correlation across entire excerpts on the order of minutes (Kaneshiro et al., 2020; Madsen et al., 2019). However, a deeper understanding of the neural correlates of musical engagement calls for insights into temporal dynamics, that is, variations in engagement over the course of an excerpt. As musical tension is thought to share the same underlying states as narrative tension in film excerpts, namely dissonance and uncertainty seeking more stable ground (Lehne & Koelsch, 2015), it is plausible that time-resolved ISC could highlight specific moments of high engagement during music listening as well. Dauer et al. (2021) highlighted the promise of assessing musical engagement, via time-varying EEG and CB ISC together, with minimalist, process-oriented music, in which 'a compositional process and a sounding music […] are one and the same thing' (Reich, 2009). Musicologist H. Wiley Hitchcock described minimalism as having 'nothing to do with more traditional kinds of musical divisions of time-beats, measures, the division of measures into "strongly" and "weakly" accented portions, the build-up of measures into phrases or phrases into periods, and the like. These are background […] for other material that is conceived-and perceived-as foreground' (Hitchcock, 1996). In the repertoire of traditional functional tonal music, however, listener engagement involves the formulation of expectations generated by more extended, goal-oriented time frames.
Temporally manipulated natural music has been used in neuroscience research for over 20 years (Levitin & Menon, 2003; Menon & Levitin, 2005), and both fMRI (Abrams et al., 2013) and EEG (Kaneshiro et al., 2020) ISC studies have employed phase scrambling to disrupt the temporal structure of music while preserving aggregate frequency characteristics. In these previous ISC studies, phase-scrambled excerpts elicited lower neural correlation and implicated different brain areas than intact natural music, implying that auditory stimulation alone does not explain neural correlation observed during music listening. However, both studies noted that phase scrambling, while preserving 'long-term spectral features' (Abrams et al., 2013), removed numerous other structural elements of the intact music including but not limited to thematic, tonal and dynamic features; tempo; and characteristic variations in loudness. Moreover, Kaneshiro et al. (2020) found that stimuli subjected to other, less extreme forms of temporal manipulation (that is, time reversal and shuffling at the measure, or four-beat, level) elicited responses with ISC exceeding the intact music. Therefore, there remain numerous stimulus attributes whose contributions to neural correlation have yet to be understood.
In the present study, we sought to index time-varying engagement (which could include temporal dynamic processes of attraction, attention and anticipation) with naturalistic tonal music as operationalized by ISC. We analysed EEG responses and continuous self-reports from adult musicians who listened to a real-world concerto movement whose escalations to climactic events we consider conducive to the definition of engagement provided by Schubert et al. (2013) and whose climactic culminations we view as aligned with theoretically relevant points of arrival (Agawu, 1984; Chew, 2016). We also included a control stimulus providing incrementally more temporal structure than phase-scrambled stimuli used previously (Abrams et al., 2013; Kaneshiro et al., 2020) and took an exploratory approach based on past investigations of temporally resolved EEG ISC during film viewing (Dmochowski et al., 2012), conducting analyses over three maximally correlated neural components. We hypothesized first that ISC would vary over the course of the original excerpt and peak during buildups to musical 'highpoints', analogous to periods of tension and suspense in more explicitly narrative stimuli such as films (Dmochowski et al., 2012; Lehne & Koelsch, 2015), and that correspondences would emerge across the EEG and behavioural data modalities. Second, based on the prominent role of the audio envelope in speech processing (Aiken & Picton, 2008; Drullman et al., 1994; Shannon et al., 1995; van der Horst et al., 1999; Van Tasell et al., 1987), we hypothesized that the control stimulus, which applied the envelope of the intact music onto a phase-scrambled version, would elicit responses with higher neural correlation than phase scrambling alone. However, we also predicted that fluctuations in audio envelope would play a role in, but not fully explain, neural correlation during music listening and thus expected that intact music would elicit higher ISC.

| Ethics approval statement
This research was approved by the Institutional Review Board of Stanford University (Protocol Number IRB-28863). All participants delivered written informed consent prior to taking part in the experiment.

| Stimulus selection and characterization
We used two stimuli in this study: the complete opening movement of Edward Elgar's Cello Concerto in E minor, Op. 85, performed by Jacqueline du Pré and the London Symphony Orchestra, conducted by Sir John Barbirolli; and a temporally manipulated version of the performance as described below. Composed in 1919, Elgar's concerto is a dramatic work with a wide dynamic and expressive range as well as highly contrasting textures. The work comprises two waves of dramatic gestures involving buildups of tension that sweep to culminating 'highpoints' (Agawu, 1984). du Pré's influential 1965 recording is known for reviving the work and bringing it into the standard orchestral repertoire (Solomon, 2009). The concerto, and the du Pré performance in particular, is widely noted as being highly expressive and arousing. The movement is in ternary (ABA′) form with framing sections in E minor and the middle section primarily in E major. The sections are linked through two similar motivic elements, which share a long-short-long rhythmic pattern encompassing neighbouring melodic motion. The unifying motivic elements pervade the movement with varying tempi, underlying harmony, melody, orchestration and highly contrasting levels of intensity. The framing A and A′ sections each include a climactic event preceded by the main theme played by the soloist, followed by textural and dynamic increases that culminate in a melodic ascent to an E-minor cadence.
To guide our analyses, we identified in advance a set of salient musical events (shown as dashed vertical lines in Figure 1a,b). First, we labelled as E1 and E1′ the two introductions of the main theme played by the soloist, which signify the start of each buildup to climactic highpoints in the movement. The theme is then taken up by the strings at E2 and E2′, and in E3, it is played by the soloist, now definitively in the tonic key of E minor (E3 is not reprised in the second buildup). Finally, an ascending melodic line played by the soloist reaches a climactic E-minor cadence at E4 and E4′, which are the structural highpoints. ISC as a measure of anticipatory engagement would suggest significant peaks between E1 and E4, and later between E1′ and E4′, because these are regions of rising tension and suspense that ultimately culminate in the cadential highpoints (cf. the definition of engagement from Schubert et al., 2013, and ISC peaks reported by Dmochowski et al., 2012). On the other hand, engagement reflecting culmination and arrival at musical 'highpoints' would correspond to high ISC at the arrival points demarcated by E4 and E4′. Finally, we labelled as E5 the start of the B section of the movement; this caesura between sections and subsequent introduction of new musical themes could be considered a point of both suspense and arrival, though it is not a 'highpoint'.

| Stimulus preparation
Both stimuli were 480 s (8 min) in length. We purchased the EMI Records Ltd. digitally remastered version of the recording (Elgar et al., 1999) in digital format from iTunes. Using Audacity recording and editing software, we remixed the .m4a stereo recording to mono and exported the result to .wav format. Subsequent audio processing was performed using MATLAB. First, a linear fade-in and fade-out were applied to the first and last 1 s of the recording, respectively; the resulting audio served as the original stimulus.
The audio was processed further to create the control stimulus. First, we performed the phase-scrambling procedure described in Kaneshiro et al. (2020), converting the audio waveform to the frequency domain, randomizing phase values at each frequency bin while preserving conjugate symmetry, and converting the result back to the time domain. Following this, we used the publicly available MIRtoolbox (Lartillot & Toiviainen, 2007) to compute the audio envelope of the original stimulus and subsequently scaled the phase-scrambled audio by this envelope. The result is reminiscent of envelope-shaped noise stimuli that have been used in past speech studies (Horii et al., 1971; Shannon et al., 1995; Van Tasell et al., 1987). Finally, we scaled the entire control waveform so that its global root mean square (RMS) value equalled that of the original.
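The three steps above (phase scrambling, envelope scaling and RMS matching) can be sketched on a toy signal as follows. This is an illustrative sketch only, not the authors' MATLAB code: the naive DFT, the moving-average `envelope` function (a hypothetical stand-in for the MIRtoolbox envelope) and all names and parameters are assumptions made for the example.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a toy signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def phase_scramble(x, rng):
    """Randomize the phase of each bin, keeping magnitudes and conjugate symmetry."""
    n = len(x)
    spec = dft(x)
    out = list(spec)  # DC (and Nyquist, if n is even) are left untouched
    for k in range(1, (n - 1) // 2 + 1):
        out[k] = cmath.rect(abs(spec[k]), rng.uniform(-math.pi, math.pi))
        out[n - k] = out[k].conjugate()  # mirror bin, so the output is real
    return [v.real for v in idft(out)]

def envelope(x, width):
    """Moving average of |x|: an ASSUMED stand-in for the MIRtoolbox envelope."""
    half = width // 2
    env = []
    for t in range(len(x)):
        seg = x[max(0, t - half): t + half + 1]
        env.append(sum(abs(v) for v in seg) / len(seg))
    return env

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def make_control(original, rng, width=9):
    """Phase-scramble, impose the original envelope, then match global RMS."""
    scrambled = phase_scramble(original, rng)
    shaped = [s * e for s, e in zip(scrambled, envelope(original, width))]
    gain = rms(original) / rms(shaped)
    return [gain * v for v in shaped]

# Toy "original": a sinusoid whose amplitude grows over time.
original = [math.sin(2 * math.pi * 5 * t / 64) * (1 + t / 64) for t in range(64)]
control = make_control(original, random.Random(0))
```

In practice an FFT routine would replace the naive transform. The sketch also makes one property from the text concrete: phase scrambling alone preserves the magnitude spectrum exactly, whereas the subsequent envelope scaling changes it.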
Both stimuli are visualized in Figure 1. As shown in Figure 1a,b, the original (blue) and control (red) waveforms are visually similar but not identical. We also visualized the stimuli as spectrograms, which highlight spectral harmonics in the original but not the control stimulus (e.g., around 1 min 0 s). While phase scrambling alone preserves the exact power spectrum of the original audio, the additional envelope-scaling procedure produces a similar but not identical power spectrum. Figure 1c,d shows that the two stimuli remain similar on the basis of cumulative distributions of waveform values (computed from histograms with 500 bins each) and aggregate power spectra, respectively.

| Participants
We recruited right-handed participants who were 18-35 years of age, had normal hearing, had no cognitive or decisional impairments and were fluent in English. Past research has shown that formal musical training is associated with enhanced cortical responses to music (Pantev et al., 1998), and a recent EEG ISC study by Madsen et al. (2019) (Haueisen & Knösche, 2001). All participants confirmed their eligibility prior to participating.
From the 24 participants who completed the experiment, one participant's EEG data and a different participant's CB data were excluded during preprocessing (prior to spatial filtering, if performed, and ISC analyses) due to gross noise artefacts as described below. We thus obtained usable data from N = 23 participants for each response measure; demographic information about the samples is reported in Table 1. Classical-music listening was confirmed through both the verbal eligibility confirmation and a behavioural report delivered during the experiment (see Section 3).

| Experimental paradigm
The data analysed here are part of a larger study on multimodal listener responses to natural music. The study comprised demographic and music experience questionnaires followed by two blocks of music listening, all of which was completed in a single experimental session.
In the first block, after completing a short training session to become familiar with the experimental and task paradigms, the participant heard each stimulus, while EEG, electrocardiogram (ECG), and chest and abdomen respiratory plethysmography were recorded simultaneously. In the second block, the participant used a mouse-operated onscreen slider to report their level of engagement over time with each stimulus as it played; in this block, they were guided by Schubert et al.'s (2013) definition of engagement, as described below. Past studies have reported connections between EEG ISC and level of engagement even when participants were not primed to reflect on that state during EEG sessions (Dauer et al., 2021; Dmochowski et al., 2012, 2014; Kaneshiro et al., 2020). Therefore, in order to align with past research and avoid biasing responses, the EEG block always took place first so that participants were not yet prompted with a definition of engagement.
In the first block, the participant experienced each of the two stimuli one time. Each stimulus trial was preceded by a 60-s baseline period to be used during the analysis of physiological responses. During this interval, pink noise was played at a low volume while the participant sat still with eyes open and performed no task. Each baseline period was followed by a stimulus, to which the participant listened attentively with eyes open while avoiding movement and viewing a fixation image shown on a monitor 57 cm in front of them. After each stimulus, participants delivered key-press ratings on a scale of 1-9 on the degree of pleasantness, arousal, level of interest, predictability and familiarity of the stimulus just heard; these questions, adapted from those used in a previous EEG ISC music study (Kaneshiro et al., 2020), were intended to probe aspects of music listening relating to both engagement and (for the larger scope of the study) arousal. After the intact excerpt only, participants also used the same response scale to report how often they listened to the genre of classical music. The EEG electrode net and physiological sensors were removed after the first block. In the second block, participants again heard each stimulus one time, now without the preceding 60-s baseline period. The flow of this block was similar to that reported in the ISC study by Dauer et al. (2021). Before each stimulus, participants were shown a definition of engagement adapted from Schubert et al. (2013): 'Engagement-being compelled, drawn in, connected to what is happening, and interested in what will happen next' as well as the instruction, 'You will continuously rate your level of engagement with an excerpt as it plays'. Once the participant initiated a stimulus trial, they used a mouse-operated slider in accordance with the instruction, 'Rate your level of engagement as the excerpt plays', shown onscreen for the duration of the trial. The slider was horizontally oriented, with the text 'not at all' and 'very engaged' labelling the left and right end points, respectively. The slider position was initiated at the far left of the axis for each trial.
In each block, stimuli were presented in a pseudo-random order, ensuring an equal distribution of orderings across the original 24 participants. Our present analysis considers EEG data and behavioural ratings collected in the first block as well as CB data collected in the second block; thus, further descriptions of data acquisition, preprocessing and analysis will focus only on these data.

| EEG data acquisition and preprocessing
For the EEG block, stimulus presentation and key-press response acquisition were programmed using MATLAB's Psychophysics Toolbox (Brainard, 1997). Other aspects of stimulus delivery and EEG data acquisition were as described in Kaneshiro et al. (2020): Mono sound files were played through two magnetically shielded Genelec 1030A speakers, which were located 120 cm from the participant in an acoustically and electrically shielded ETS-Lindgren booth. The second audio channel (not played to participants) contained intermittent square-wave pulses, which were sent directly to the EEG amplifier for precise time stamping of stimulus onsets. EEG was recorded from 128-channel nets using the Electrical Geodesics, Inc. (Eugene, OR, USA) GES 300 platform (Tucker, 1993) and Net Station software at a sampling rate of 1 kHz, referenced to the vertex. Responses to stimuli and preceding baseline intervals, delivery of ratings and a short break after the first stimulus trial were acquired in a single recording. Electrode impedances were no higher than 60 kΩ at the start of each EEG recording (Ferree et al., 2001).
Continuous EEG recordings were exported to MATLAB file format using Net Station software. Subsequent preprocessing was performed on a per-recording basis, using custom MATLAB code as well as publicly available third-party code as described below. First, the continuous recording was highpass (.3 Hz, eighth-order Butterworth), notch (59-61 Hz, eighth-order Butterworth) and lowpass (50 Hz, eighth-order Chebyshev) filtered using zero-phase filters and then downsampled by a factor of 8 to a final sampling rate of 125 Hz. Following this, we extracted baseline and stimulus trial labels, event triggers of timing markers sent from the audio and behavioural ratings. Each baseline and stimulus trial was then epoched to the length of the respective stimulus using the precise timing triggers sent from the audio: Baseline trials (which were preprocessed but are not analysed here) were 60 s (7501 time samples) in length, and stimulus trials were 480 s (60,001 time samples) in length. We performed a median-based direct current (DC) correction on each of the four epochs and concatenated them in time. Electrooculogram (EOG) channels were computed from electrodes above and/or below the eyes (VEOG) and at the sides of the eyes (HEOG). We retained electrodes 1-124 for further analysis, excluding electrodes on the face. We used the EEGLAB Toolbox (Delorme & Makeig, 2004) implementation of extended Infomax independent component analysis (ICA) to identify and remove ocular and ECG artefacts from the data in a semi-automatic fashion (Bell & Sejnowski, 1995; Jung et al., 1998). Because the ICA function requires a full-rank input with no missing values, matrix rows representing bad electrodes (identified during the experimental session based on impedances and/or by visual inspection of the raw data, or with at least 10% of voltage magnitudes exceeding 50 μV across the recording after ICA artefact removal) were removed prior to the function call, temporarily reducing the number of rows in the data matrix. Once the data were converted from sensor space to ICA space, ocular artefacts were removed by automatically setting to zero all independent components whose activity returned a magnitude correlation with either EOG channel of |r| ≥ .3 and additionally setting to zero independent components with magnitude EOG correlations .2 ≤ |r| < .3 on the basis of manual inspection of component topographies and temporal activations (Kaneshiro et al., 2020). ICA components corresponding to ECG were identified by visually inspecting the time-domain waveforms of the 30 components with the highest mean projected variance and also set to zero. Following this, the data were converted back to sensor space.
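The two-threshold rule for ocular components described above can be written as a small decision function. This is a hedged sketch: the function name, its return labels and the use of the larger of the two EOG correlations are assumptions for illustration, while the .3 and .2 thresholds come from the text.

```python
def classify_component(r_veog, r_heog, auto_thresh=0.3, manual_thresh=0.2):
    """Decide how to treat one independent component given its EOG correlations.

    Returns 'remove' when |r| with either EOG channel reaches auto_thresh,
    'inspect' when it falls in [manual_thresh, auto_thresh) and therefore needs
    manual review of topography and time course, and 'keep' otherwise.
    """
    r = max(abs(r_veog), abs(r_heog))
    if r >= auto_thresh:
        return "remove"
    if r >= manual_thresh:
        return "inspect"
    return "keep"

# Hypothetical correlations for three components.
decisions = [classify_component(-0.45, 0.10),
             classify_component(0.05, 0.25),
             classify_component(0.02, -0.08)]
```

Components labelled 'inspect' are only zeroed after manual review, which is what makes the procedure semi-automatic.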
Following ICA, we identified additional recording-wide bad electrodes as any electrode having 10% or more of voltage magnitudes greater than 50 μV. If any recording-wide bad electrodes were identified, we reinitiated preprocessing with that electrode excluded prior to ICA.
Once no recording-wide bad electrodes were identified after ICA, a number of final preprocessing steps were performed on the data from each trial separately. First, trial-wide bad electrodes were identified using the same percentage and voltage thresholds described above and temporarily removed from the data matrix of that trial. To address noisy transients, we performed a median-based DC correction of the data and then used a four-stage iterative process, in each iteration identifying data points exceeding four standard deviations of the mean power of each electrode and replacing those data points with NaNs. We then filled the bad channel rows, which had previously been removed from the data matrix altogether, back in as rows of NaNs so that each data matrix contained 124 rows. Next, we added a row of zeros to the matrix (representing the Cz reference) and converted the data to average reference while ignoring missing values. As final steps, we imputed all missing values (NaNs) by computing spatial averages of data from neighbouring sensors and then performed a mean-based DC correction.
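Two of the per-trial steps above, the iterative rejection of transients and the NaN-aware conversion to average reference, can be sketched as follows. The exact criterion used by the authors is not fully specified in the text, so the per-electrode rule here (samples more than four standard deviations from the electrode mean) is one plausible reading and should be treated as an assumption, as should all function names.

```python
import math

def reject_transients(channel, n_iter=4, n_sd=4.0):
    """Replace outlying samples of one electrode with NaN, repeated n_iter times.

    ASSUMED criterion: a sample is an outlier if it lies more than n_sd
    standard deviations from the mean of the currently valid samples.
    """
    data = list(channel)
    for _ in range(n_iter):
        valid = [v for v in data if not math.isnan(v)]
        if len(valid) < 2:
            break
        mean = sum(valid) / len(valid)
        sd = math.sqrt(sum((v - mean) ** 2 for v in valid) / (len(valid) - 1))
        data = [math.nan if (not math.isnan(v) and abs(v - mean) > n_sd * sd) else v
                for v in data]
    return data

def average_reference(rows):
    """Append a zero row for the reference electrode, then subtract, at each
    time sample, the mean across electrodes computed over non-NaN values."""
    rows = [list(r) for r in rows] + [[0.0] * len(rows[0])]
    out = [list(r) for r in rows]
    for t in range(len(rows[0])):
        valid = [r[t] for r in rows if not math.isnan(r[t])]
        mean = sum(valid) / len(valid)  # never empty: the zero row is always valid
        for r in out:
            r[t] = r[t] - mean          # NaN entries remain NaN
    return out

# Toy electrode with one large transient at the end.
cleaned = reject_transients([1.0, -1.0] * 50 + [100.0])
# Toy 2-electrode, 2-sample matrix with one missing value.
referenced = average_reference([[2.0, math.nan], [4.0, 6.0]])
```

The remaining NaNs would then be imputed from neighbouring sensors, a step that depends on the electrode layout and is not sketched here.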
During the preprocessing stage, one participant was excluded from further analysis due to gross noise artefacts (20 recording-wide bad electrodes identified after ICA). Following preprocessing of all EEG recordings, the trials for each stimulus were aggregated across participants. This produced an electrode-by-time-by-trial matrix for each stimulus, the size of which was 125 × 7501 × 23 for baseline stimuli (which are not presently analysed further) and 125 × 60,001 × 23 for original and control stimuli.
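The matrix dimensions reported above follow from the acquisition parameters, as the short check below illustrates. The helper names are hypothetical, and the "+ 1" reflects an assumption that each epoch includes both its first and last sample, which is consistent with the reported counts of 7501 and 60,001.

```python
def downsampled_rate(fs_hz, factor):
    """Sampling rate after integer-factor downsampling."""
    return fs_hz / factor

def epoch_samples(duration_s, fs_hz):
    """Samples in an epoch that includes both endpoints (assumed convention)."""
    return int(round(duration_s * fs_hz)) + 1

fs = downsampled_rate(1000, 8)            # 1 kHz recording, downsampled by 8
n_rows = 124 + 1                          # retained electrodes plus the reference row
baseline_shape = (n_rows, epoch_samples(60, fs), 23)
stimulus_shape = (n_rows, epoch_samples(480, fs), 23)
```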

| CB data acquisition and preprocessing
In the second block, stimulus presentation and the acquisition of CB responses were again programmed using the MATLAB Psychophysics Toolbox (Brainard, 1997). CB data were recorded at a sampling rate of 20 Hz, with slider positions registered in the range of 0-100. Response vectors for the 480-s stimuli ranged in length from 9583 samples (479.15 s) to 9591 samples (479.55 s) and were truncated to the length of the shortest response vector (9583 samples) for matrix aggregation. During CB data preprocessing, data from one participant were excluded from further analysis due to gross noise artefacts, namely, sustained activity in the 2- to 5-Hz range, which neither was observed in other participants' data nor would be expected from an authentic behavioural response. The remaining usable data were aggregated into a single time-by-trial-by-stimulus matrix, the size of which was 9583 × 23 × 2.
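The truncation step can be illustrated with a few placeholder vectors. The function name and the constant-valued example responses are assumptions; only the 20-Hz rate and the two extreme lengths (9583 and 9591 samples) come from the text.

```python
def truncate_to_shortest(responses):
    """Cut every response vector to the length of the shortest one."""
    n = min(len(r) for r in responses)
    return [r[:n] for r in responses], n

FS_CB = 20  # CB sampling rate in Hz, as reported

# Placeholder slider traces with lengths spanning the reported range.
responses = [[0.0] * 9583, [0.0] * 9587, [0.0] * 9591]
truncated, n_samples = truncate_to_shortest(responses)
duration_s = n_samples / FS_CB  # duration covered by the truncated vectors
```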

| Data analysis
We analysed EEG and CB responses to the original and control stimuli (excluding the baseline trials) separately for each stimulus and response modality. The experimental and data analysis paradigms of the two blocks are summarized in Figure 2.

| EEG spatial filtering using reliable component analysis (RCA)
EEG data were analysed through a two-step process first introduced by Dmochowski et al. (2012). First, the EEG data matrices were factorized to compute spatial components in which ISC was maximized; then, we computed ISC across entire excerpts and in a time-resolved fashion. In the first spatial filtering step, we used RCA to compute optimal spatial 'components' (linear combinations of electrodes) in which the ISC of the projected EEG data would be maximized. This procedure concentrates the most correlated activity across trials, which might be spatially distributed across the electrode montage, into a small number of components. In the current use case, RCA optimizes the EEG for correlation in time while reducing the spatial dimensionality of the data from 125 electrodes to a few components. RCA is similar to principal component analysis (PCA) in that both involve eigenvalue decompositions; therefore, the procedure returns multiple component weight vectors (eigenvectors) and accompanying coefficients (eigenvalues), which for RCA are sorted in descending order of across-trial covariance. Following Dmochowski et al. (2012), we computed the first three reliable components (RC1-RC3). Therefore, each trial of sensor-space data, a time-by-electrode matrix (60,001 × 125), was multiplied by an electrode-by-component (125 × 3) weight matrix to produce a time-by-component matrix (60,001 × 3). ISC was then computed on individual component vectors (60,001 × 1).
We used a publicly available MATLAB implementation (Dmochowski et al., 2015) to perform RCA using the procedure and parameters reported in Kaneshiro et al. (2020). The RCA components were computed separately for each stimulus. As reported in Kaneshiro et al. (2020), we visualize individual components as scalp topographies using forward-model projections of the weight vectors (Haufe et al., 2014; Parra et al., 2005) and report the scalar coefficient of each component.
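The projection step described above is a plain matrix product. The sketch below shows it on a toy time-by-electrode matrix; it does not compute RCA weights (the weight matrix here is made up), and the function name is hypothetical.

```python
def project(trial, weights):
    """Project sensor-space data into component space.

    trial:   time-by-electrode matrix (list of rows, one row per time sample)
    weights: electrode-by-component matrix (one row per electrode)
    returns: time-by-component matrix
    """
    n_comp = len(weights[0])
    return [[sum(x * w[c] for x, w in zip(row, weights)) for c in range(n_comp)]
            for row in trial]

# Toy dimensions: 4 time samples, 3 electrodes, 2 components
# (the study's dimensions were 60,001 samples, 125 electrodes, 3 components).
trial = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
weights = [[1, 0], [0, 1], [1, 1]]      # made-up weights, not RCA output
components = project(trial, weights)
rc1 = [row[0] for row in components]    # one component vector, ready for ISC
```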

| ISC
For both EEG and CB data, ISC was computed across participants on a per-stimulus basis, with each ISC calculation involving N = 23 single-trial vectors of time-domain response data. For EEG data, ISC was computed on a per-RC basis. Each ISC was computed in a one-against-all fashion, meaning that a given participant's ISC was the mean correlation of their trial with all other trials. Following that, we computed the group-mean ISC across distributions of N = 23 participants.
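A minimal sketch of the one-against-all calculation described above, using Pearson correlation on three toy trials. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def one_against_all_isc(trials):
    """Per-participant ISC: mean correlation of each trial with all other trials."""
    n = len(trials)
    iscs = []
    for i in range(n):
        rs = [pearson(trials[i], trials[j]) for j in range(n) if j != i]
        iscs.append(sum(rs) / len(rs))
    return iscs

# Three toy 'participants': two perfectly correlated, one anti-correlated.
trials = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
per_participant = one_against_all_isc(trials)
group_mean = sum(per_participant) / len(per_participant)
```

With N = 23 participants, each per-participant value would be the mean of 22 pairwise correlations.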
We computed ISC in two ways: first, over the entire duration of the stimulus, as was done in past music studies (Dauer et al., 2021; Kaneshiro et al., 2020; Madsen et al., 2019), and second, in a temporally resolved fashion. For the latter case, we followed the procedure introduced by Dmochowski et al. (2012) and used in the recent study by Dauer et al. (2021), computing ISC over 5-s windows that advanced in 1-s increments. This produced an ISC time series with a temporal resolution of 1 s. In the CB block, participants sometimes did not move the slider for entire 5-s windows, which would produce undefined correlations due to zero variance. To address this, we followed the procedure of Dauer et al. (2021) and added to each CB response vector a small amount of noise (uniform over the ±.001 interval) at every time sample prior to phase scrambling (if performed for permutation testing; see next section) and ISC calculation.
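The windowing and jitter steps above can be sketched as follows. This is an illustrative toy example with hypothetical names and a toy sampling rate; the 5-s window, 1-s step and ±.001 uniform noise are taken from the text.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def group_mean_isc(trials):
    """Group mean of one-against-all ISC values."""
    n = len(trials)
    per_participant = []
    for i in range(n):
        rs = [pearson(trials[i], trials[j]) for j in range(n) if j != i]
        per_participant.append(sum(rs) / len(rs))
    return sum(per_participant) / n

def window_starts(n_samples, fs, win_s=5, step_s=1):
    """Start indices of sliding windows, plus the window length in samples."""
    win, step = int(win_s * fs), int(step_s * fs)
    return list(range(0, n_samples - win + 1, step)), win

def add_jitter(response, rng, amp=0.001):
    """Add uniform noise in [-amp, amp] so that flat segments have nonzero variance."""
    return [v + rng.uniform(-amp, amp) for v in response]

def time_resolved_isc(trials, fs):
    starts, win = window_starts(len(trials[0]), fs)
    return [group_mean_isc([t[s:s + win] for t in trials]) for s in starts]

rng = random.Random(0)
fs = 4                      # toy sampling rate in Hz (the CB data were sampled at 20 Hz)
n = 10 * fs                 # 10 s of toy data
moving = [[math.sin(0.3 * t + p) for t in range(n)] for p in (0.0, 0.2)]
flat = [50.0] * n           # a participant who never moved the slider
trials = [add_jitter(r, rng) for r in moving + [flat]]
isc_series = time_resolved_isc(trials, fs)   # one value per 1-s step
```

Without the jitter, every window of the flat response would have zero variance and its correlations would be undefined.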
Following the calculation of time-resolved ISC, we performed additional correlations across selected pairs of group-mean ISC time series. These correlations were performed to determine whether ISCs correlated over time, even if their overall levels differed due to stimulus condition or response modality. Two of the pairings probed condition effects within a given response modality (EEG ISC: original vs. control stimulus; CB ISC: original vs. control stimulus), and the other two pairings probed differences in response modality for a given stimulus condition (original stimulus: EEG vs. CB ISC; control stimulus: EEG vs. CB ISC). For each of the four possible pairings, these final correlations were performed over all ISC time windows; correlations were also computed with the first four time windows excluded. These four windows collectively covered the first 0-8 s of each stimulus and accounted for the first major time-resolved ISC peaks observed in EEG responses to both stimuli. For cross-modal (EEG-CB) comparisons, we additionally performed cross-correlations with lags ranging from −5 to 5 s in order to account for unknown positive or negative lags between brain and behavioural responses.

FIGURE 2 (continued) Participants also provided ratings at the end of each stimulus trial. Reliable component analysis (RCA) was used to optimize the EEG for phase-locked activity across trials, and inter-subject correlation (ISC) of the optimized data was computed, both across entire stimulus excerpts and 5-s windows. Bottom: In the second experimental block, participants heard the two stimuli again, this time while continuously reporting their perceived level of engagement. The resulting response vectors then underwent all-time and time-resolved ISCs. For response measures from both blocks, the statistical significance of the ISC was assessed via permutation testing, with null distributions constructed by phase scrambling the sensor-space EEG (prior to RCA) and the continuous behavioural response vectors, respectively.

| Statistical analysis
Statistical significance was computed in three ways. First, we assessed the standalone statistical significance of each RCA coefficient (for EEG data) and ISC (for EEG and CB data) using permutation testing. As described in recent ISC studies (Dauer et al., 2021; Kaneshiro et al., 2020), null distributions were constructed from 1000 surrogate versions of the data that had undergone phase scrambling in order to disrupt across-trial temporal covariance while preserving aggregate power spectra and autocorrelation characteristics of individual trials (Prichard & Theiler, 1994). As shown in Figure 2, EEG data were phase scrambled in sensor space (prior to RCA), producing null distributions of RCA coefficients as well as downstream ISC; CB data were phase scrambled just before ISC calculations. For each analysis, the 95th percentile of the null distribution served as the threshold for statistical significance at α = .05. For EEG data, p values of RCA coefficients and EEG ISC computed across entire stimulus durations were corrected for multiple comparisons using false discovery rate (FDR; Benjamini & Yekutieli, 2001), and we report adjusted p FDR values corrected for three comparisons (components) on a per-stimulus basis. For time-resolved ISC, we present the observed group-mean time series for each stimulus (and RC, for EEG data) in relation to the time-varying 95th percentile of its null distribution. While a mass-univariate approach (e.g., testing against a null zero-mean or zero-median distribution at every ISC time window) would necessitate subsequent cluster correction due to temporal dependencies arising from autocorrelation characteristics of the data, the present approach for surrogate data generation takes these temporal dependencies into account through the preservation of autocorrelation in the 1000 phase-scrambled records, which are collectively analysed to form the null distribution; hence, we did not perform any cluster correction (Lancaster et al., 2018; Prichard & Theiler, 1994; Theiler et al., 1992). We computed the size of each statistically significant peak of time-varying ISC as the summed ISC across each run of observed values exceeding the statistical significance threshold; this was done separately for EEG and CB responses, but across stimulus conditions and, for the EEG data, across all three RCs together.
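The surrogate-data logic can be sketched in Python for one-dimensional response vectors. This is a simplified illustration, not the study's code: it assumes each trial is phase scrambled independently by rotating its Fourier phases, uses 200 rather than 1000 surrogates to keep the example fast, and the function names are hypothetical. Multichannel EEG scrambling in sensor space is not shown.

```python
import numpy as np

def phase_scramble(x, rng):
    """Surrogate of a real 1-D series with the same amplitude spectrum, random phases."""
    spectrum = np.fft.rfft(x)
    rotation = np.exp(1j * rng.uniform(0, 2 * np.pi, size=spectrum.shape))
    rotation[0] = 1.0                       # leave the DC term untouched
    if x.size % 2 == 0:
        rotation[-1] = 1.0                  # leave the Nyquist term untouched
    return np.fft.irfft(spectrum * rotation, n=x.size)

def mean_pairwise_r(trials):
    """Group-mean ISC: mean of all off-diagonal pairwise correlations."""
    r = np.corrcoef(trials)
    n = r.shape[0]
    return (r.sum() - np.trace(r)) / (n * (n - 1))

def null_threshold(trials, statistic, n_perm=1000, q=95, seed=0):
    """q-th percentile of the statistic over independently phase-scrambled trials."""
    rng = np.random.default_rng(seed)
    null = [statistic(np.array([phase_scramble(t, rng) for t in trials]))
            for _ in range(n_perm)]
    return np.percentile(null, q)

# Synthetic example: 10 trials sharing a common signal (values are made up).
rng = np.random.default_rng(2)
common = rng.standard_normal(500)
trials = common + rng.standard_normal((10, 500))

observed = mean_pairwise_r(trials)
threshold = null_threshold(trials, mean_pairwise_r, n_perm=200)
is_significant = observed > threshold       # compare against the 95th percentile
```

Because every surrogate keeps the amplitude spectrum of its source trial, the autocorrelation of each trial is preserved while the temporal alignment across trials is destroyed.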
The second form of statistical analysis involved across-condition comparisons of ISC and behavioural ratings; these tests were conducted separately for each response type. For EEG ISC, we focused on RC1 results only, as this was the only component for which permutation testing returned significant ISC for both stimulus conditions. For behavioural ratings as well as EEG RC1 ISC and CB ISC computed across entire stimulus durations, we performed paired, two-tailed Wilcoxon signed-rank tests across distributions of N = 23 participants. For behavioural responses, we corrected for five comparisons (questions) using FDR. To determine whether distributions of time-resolved ISC values differed significantly according to stimulus condition, we performed paired, two-tailed Wilcoxon signed-rank tests across pairs of group-mean ISC vectors, separately for each response modality. All Wilcoxon tests are reported with the exact statistic W as well as p or p FDR value; as a measure of effect size, we report the z-statistic of the normal approximation of the test (Lehmann, 2006, p. 129), adjusted for ties.
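For readers unfamiliar with FDR-adjusted p values, the following sketch shows one common way to compute them. It assumes the Benjamini-Yekutieli step-up procedure with the harmonic-sum correction; the text cites Benjamini & Yekutieli (2001) but does not specify the exact computation, so this should be read as an illustration. The function name and the example p values are hypothetical.

```python
import numpy as np

def fdr_by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p values (assumed variant, for illustration)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)                           # harmonic-sum correction
    scaled = p[order] * m * c_m / ranks
    # Enforce monotonicity from the largest p value downwards, then cap at 1.
    adjusted_sorted = np.minimum(1.0, np.minimum.accumulate(scaled[::-1])[::-1])
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted                   # restore the input order
    return adjusted

# Made-up p values for five comparisons (e.g., five rating questions).
example = fdr_by_adjust([0.001, 0.004, 0.010, 0.020, 0.030])
```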
Finally, we report the statistical significance of the correlations across pairs of group-mean time-resolved ISC time series as correlation p values. For correlations not involving lags, the p values for each of the four pairs of time series were FDR corrected for two comparisons (correlation over all ISC time windows and correlation with the first four time windows excluded). Lagged correlations for the two EEG-CB pairs were FDR corrected for 22 comparisons each (11 lag values × 2 ISC time-window options).
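The lagged EEG-CB correlations can be sketched as below. The example assumes two ISC time series sampled once per second, so that integer lags correspond to seconds, and uses synthetic data in which the behavioural series trails the EEG series by 2 s. The function name and sign convention are choices made for this illustration.

```python
import numpy as np

def lagged_correlations(eeg_isc, cb_isc, max_lag=5):
    """Pearson r at integer lags in [-max_lag, max_lag].

    Positive lag: eeg_isc[t] is paired with cb_isc[t + lag], i.e. CB lags behind EEG.
    Negative lag: CB precedes EEG.
    """
    n = len(eeg_isc)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = eeg_isc[:n - lag], cb_isc[lag:]
        else:
            a, b = eeg_isc[-lag:], cb_isc[:n + lag]
        out[lag] = np.corrcoef(a, b)[0, 1]
    return out

# Synthetic example: the CB series is the EEG series delayed by 2 samples.
rng = np.random.default_rng(4)
eeg = rng.standard_normal(100)
cb = np.concatenate([rng.standard_normal(2), eeg[:-2]])

r_by_lag = lagged_correlations(eeg, cb)          # 11 lag values, as in the text
best_lag = max(r_by_lag, key=r_by_lag.get)
```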

| Behavioural ratings are higher for intact music
Participants reported how pleasant, arousing, interesting, predictable and familiar they found each stimulus on an ordinal scale of 1-9. Ratings are shown in Figure 3; y values have been slightly jittered for visualization purposes only. Significant condition effects were observed for all questions (two-tailed Wilcoxon signed-rank test, all W ≥ 55; all p FDR ≤ .043, corrected for five comparisons; all z ≥ 2.017), with higher mean responses for the original stimulus in all cases. Participants also reported how often they listened to the genre of the original stimulus (i.e., classical music); all responded higher than 1 (never), confirming that aspect of their eligibility.

| Original and control stimuli elicit correlated EEG components
Prior to computing ISC, sensor-space EEG data were spatially optimized using RCA. The topographies and coefficients of the three maximally correlated components, RC1-RC3, are shown in Figure 4. In Figure 4a, the first component (RC1) was broadly similar across stimulus conditions, with a fronto-central topography (see footnote 6). Figure 4b shows the RCA coefficients (eigenvalues) for the first three RCs in line plots, with the respective 95th percentiles of the null distributions represented as the height of each shaded grey area. As suggested by these plots, the RC1 coefficient was statistically significant for the original but not the control stimulus (permutation test, original RC1 p FDR < .001 and control RC1 p FDR = .23, corrected for three comparisons per condition). RC2 and RC3 topographies differed by stimulus condition, and their coefficients were not statistically significant (all p FDR ≥ .23).

| Aggregate ISC is higher for intact music
We computed the ISC of RC1-RC3 EEG as well as CB responses across entire stimulus durations. EEG results are shown in Figure 5a, and CB results are shown in Figure 5b. For the EEG, the mean ISC of RC1 exceeded the 95th percentile of the null distribution (upper edge of the shaded grey area) for both stimuli (permutation test, original and control p FDR < .001, corrected for three comparisons per condition). A paired comparison between stimulus conditions indicated a significant condition effect for RC1 ISC (two-tailed Wilcoxon signed-rank test, W = 276, p < .001, z = 4.197), with a higher mean ISC for the original stimulus. However, perhaps unsurprisingly given the RCA coefficients (Figure 4b), all-time RC2 and RC3 ISCs were not significant for either stimulus (all p FDR ≥ .12) and thus were not analysed further.

Footnote 6: The RC1 topography for the control stimulus was multiplied by −1 prior to plotting. This is due to a known arbitrary sign issue with RCA and does not impact correlation.
For CB data, ISC was significant for both conditions (p < .001) and differed significantly between conditions (W = 270, p < .001, z = 4.015), again with a higher ISC for the original condition.

| Time-resolved EEG and CB ISC implicate shared and contrasting salient musical events
For each stimulus and response modality, we also computed ISC over 5-s windows, with a temporal resolution (window shift size) of 1 s. EEG RC1 and CB results are shown in Figure 6, while EEG RC2-RC3 results can be found in Figure S1.
EEG RC1 ISC of the original stimulus contained several statistically significant peaks (points exceeding the top of the shaded grey area; Figure 6a), with a total of 37.18% of the time windows corresponding to statistically significant ISC (permutation test, uncorrected; Figure 6b). For the control stimulus, time-varying ISC was lower overall, with only 8.40% of the time windows reaching statistical significance.
The top 10 EEG ISC peaks are summarized in Table 2, and numbered peaks implicating RC1 are labelled in Figure 6a. Some but not all peaks correspond to preselected points in the original stimulus: ISC was significant after each demarcated event (E1-E3) leading up to the first structural highpoint at E4 and also at the start of the B section (E5). ISC was lower overall in the shorter second leadup (E1′ and E2′) to the highpoint at E4′. Other notable RC1 ISC peaks occurred during the first 21 and last 15 ISC windows. The former peaks highlight the opening portion of the excerpt, which contains an extended dramatic solo passage that establishes the key of the movement. The latter peak consists of the solo cello, followed by the strings, reprising the sparse closing cadence of the movement's A section leading up to E5. RC1 ISC was not significant at either highpoint arrival (E4 and E4′). As suggested by the percentages of significant RC1 ISC reported for each condition, distributions of temporally resolved ISC differed according to stimulus condition (two-tailed Wilcoxon signed-rank test, W = 99,342, p < .001, z = 14.181) and were higher overall for the original condition.
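The peak-ranking rule (summed ISC over each run of windows exceeding the significance threshold) and the percentage of significant windows can be sketched as follows. The input values are invented for illustration, and the function name `significant_runs` is hypothetical.

```python
import numpy as np

def significant_runs(isc, threshold):
    """Runs of consecutive windows with isc > threshold, ranked by summed ISC.

    threshold may be a scalar or a time-varying array of the same length as isc.
    Returns a list of (start, stop, summed_isc) tuples; stop is exclusive.
    """
    isc = np.asarray(isc, dtype=float)
    above = isc > np.asarray(threshold)
    padded = np.concatenate([[False], above, [False]])
    edges = np.flatnonzero(np.diff(padded.astype(int)))   # run starts and stops alternate
    starts, stops = edges[::2], edges[1::2]
    runs = [(int(a), int(b), float(isc[a:b].sum())) for a, b in zip(starts, stops)]
    return sorted(runs, key=lambda run: run[2], reverse=True)

# Made-up ISC time series and a constant threshold.
isc = [0.1, 0.3, 0.4, 0.1, 0.5, 0.1]
peaks = significant_runs(isc, 0.2)                        # largest peak first
pct_significant = 100 * np.mean(np.asarray(isc) > 0.2)    # share of significant windows
```

In this toy series, the two-window run outranks the single higher window because peak size is the sum over the run rather than its maximum.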
The time-resolved ISC of CB responses was similarly higher overall for the original compared to the control stimulus (Figure 6c), with 20.42% of the time windows being statistically significant for the original compared to 14.74% for the control (Figure 6d). Similar to the EEG ISC, the top 10 CB peaks summarized in Table 3 and annotated in Figure 6c include peaks at the beginning of both stimuli and the end of the original stimulus. Unlike the EEG, other CB ISC peaks did not fall in between the theme entrances but rather aligned with them (Peaks 8 and 10); moreover, both structural highpoints of both stimuli elicited statistically significant CB ISC (Peaks 2, 3, 6 and 7). Finally, CB ISC was not significant at the transition between sections marked by event E5. Distributions of time-resolved ISC again differed significantly by stimulus condition (Wilcoxon results: W = 70,035, p < .001, z = 4.514), with higher ISC overall for the original condition.

FIGURE 5 (continued) ... [FDR] corrected) and differs significantly between stimuli (Wilcoxon signed-rank test), with a higher ISC for the original excerpt. RC2 and RC3 ISCs are not significant for either stimulus (permutation test, FDR corrected) and are thus not compared across conditions. (b) For continuous behavioural (CB) responses, the mean ISC is statistically significant for both stimulus conditions. ISC also differs significantly between conditions, with higher ISC among responses to the original excerpt.

TABLE 2 Top 10 statistically significant, temporally resolved EEG ISC peaks, as computed by the summed ISC for each run of statistically significant ISC.

FIGURE 6 Time-varying inter-subject correlation (ISC). ISC of spatially filtered electroencephalographic (EEG) data and of continuous behavioural (CB) reports of engagement was computed in 5-s windows advancing in 1-s increments and plotted in relation to the 95th percentile of the null distribution (shaded grey area). Dashed vertical lines denote musically salient events (see Figure 1). Statistically significant ISC (permutation test, uncorrected) is denoted in darker colours exceeding the shaded grey area. Numbered peaks denote the top 10 EEG and CB ISC peaks, based on the summed ISC of each significant peak.
We correlated pairs of time-resolved ISC time series across stimulus conditions and response modalities. Pairs of time series are visualized in Figure 7; responses have been z-scored for visualization to account for differing ranges of values. Pairwise correlations involving all time-resolved ISC windows and with the first four time windows omitted, as well as cross-correlations for comparisons of disparate response modalities, are summarized in Table 4. For across-condition comparisons within a single response modality, EEG RC1 ISC was moderately correlated (r = .3572, p FDR < .001, corrected for two comparisons) when all time-resolved ISC windows were included. However, this appears to have been driven largely by the first four time windows, as the correlation was no longer significant (r = .0497, p FDR = .282) when those were omitted. The across-condition correlation of time-resolved CB ISC was also higher with all time windows included (r = .4193, p FDR < .001) and remained moderately high (r = .3562, p FDR < .001) when the first four time windows were omitted.
Brain-behaviour correlations of CB and EEG time-resolved ISC were calculated with no lags as well as cross-correlated over a set of lags ranging from −5 to 5 s. Correlations were highest for the original stimulus: When all time-resolved ISC windows were included, EEG and CB correlated moderately with no lag (r = .3792, p FDR < .001, corrected for two comparisons), and with lags, it reached a cross-correlation of r = .4055 (p FDR < .001, corrected for 22 comparisons) when the CB ISC preceded the EEG ISC by 1 s. When the first four time windows were omitted, the EEG-CB correlation was lower but still significant (r = .1588, p FDR < .001) with no lags, and with lags, it peaked at r = .1962 (p FDR < .001) when the cross-correlated CB lagged behind the EEG by 5 s. For the control condition, EEG-CB correlation across all ISC time windows with no lag was not significant (r = .0720, p FDR = .234) and dropped further when the first four ISC time windows were excluded (r = −.0498, p FDR = .281). Cross-correlations accounting for lags were similarly not significant for this condition: Maximum cross-correlation involving all ISC time windows was observed when the CB ISC preceded the EEG ISC by 2 s (r = .1006, p FDR = .120), and when the first four ISC time windows were excluded, the maximum cross-correlation occurred when the CB lagged behind the EEG by 5 s (r = .0513, p FDR = .534). Visualizations of cross-correlation values as a function of time lag are provided in Figure S2.

| DISCUSSION
Synchronization of audience responses has been found to index states of engagement with naturalistic film excerpts, speeches and musical works. In this study, we have analysed the ISC of EEG and CB responses to a musical work composed in the classical style. We have found that ISC varies over the course of the excerpt, with the level and timing of ISC peaks differing by response modality. Moreover, in extending a phase-scrambling stimulus manipulation used in past music ISC studies, we have found that an audio envelope characteristic of music elicits correlated responses, though to a lesser extent than the original, intact excerpt.

TABLE 3 (note) Peak ranks correspond to numbered peaks as shown in Figure 6c. Abbreviations: CB, continuous behavioural; ISC, inter-subject correlation.

| ISC, response modalities and frameworks of engagement
ISC computed across entire stimulus durations was statistically significant for both original and control stimuli, for both the maximally correlated EEG component (RC1) and CB responses. A significant condition effect was observed for both response measures: The original stimulus elicited responses with a higher ISC, with a slightly larger condition effect size for EEG than CB.
We also computed ISC in short, overlapping time windows to obtain correlations with a temporal resolution of 1 s. Dmochowski et al. (2012, fig. 1) and Poulsen et al. (2017, fig. 2) reported that the ISC of EEG responses to film excerpts varied over time and peaked during periods of high tension and suspense. Based on these findings as well as identified commonalities linking narrative and musical suspense (Lehne & Koelsch, 2015), we expected ISC to vary during music listening as well and to peak during periods in which the music induced a state of engagement, defined as being 'compelled, drawn in, connected to what is happening, [and] interested in what will happen next' (Schubert et al., 2013).
Indeed, across an audience of adult musicians listening to the first movement of Elgar's Cello Concerto, we observed temporal variations in both EEG and CB ISC with a number of significant peaks (Figure 6). For both response modalities, a larger percentage of statistically significant time-resolved ISC windows was observed for the original stimulus than the control stimulus. In addition, both measures showed significant ISC peaks at the beginning of both stimuli. While these peaks are likely partly attributable to low-level processing of stimulus onsets (Sokolov, 1990), original ISC was markedly higher here than control, corresponding to the top-ranked original peak compared to the fifth-highest control peak for both EEG and CB (Tables 2 and 3). Given this result, and the musical content of the opening of the movement (see Section 2), we interpret these peaks as reflecting not only low-level processing but also the 'drawn in' and 'interested in what will happen next' aspects of engagement proposed by Schubert et al. (2013). Finally, both EEG and CB ISC included significant peaks near the end of the original stimulus.
We also observed notable differences between the two response modalities. First, for the stimulus event markers specified a priori, significant EEG ISC peaks occurred not at the entrances of the main theme or the 'highpoint' arrival (e.g., Figure 1, E1-E4), but rather in between some entrances (Figure 6a, EEG Peaks 2 and 3). In contrast, CB ISC peaks in those regions aligned more directly with the event markers (Figure 6c, CB Peaks 2, 3, 6, 7 and 10). Finally, the pause marking the transition between the A and B sections of the movement (Figure 1, E5) corresponded to a significant ISC peak for EEG but not CB.
The response modalities also differed in condition effects. While the original stimulus elicited higher ISC than the control for both EEG and CB, the corresponding effect sizes for aggregate and time-resolved ISC, as well as the change in percentage of significant time-resolved ISC windows, were consistently larger for EEG than for CB. A smaller condition effect for CB ISC is also suggested in Table 4: Between-condition correlation of time-varying ISC was moderate for both EEG and CB across all time-varying ISC windows; yet when the first four time windows were omitted, between-condition correlation of CB ISC remained moderate while EEG ISC correlations dropped substantially.
While we know of no EEG ISC study focusing specifically on moments of apotheosis (i.e., the culminating moment of tension and suspense) during film viewing, we posit that in the present results, the low EEG ISC and peak CB ISC at the resolution of suspense reflect a state distinct from the preceding, increasingly anticipatory state. In all, EEG and CB ISC may index differential states of engagement relating to anticipation and culmination, respectively. These findings begin to fill a conspicuous absence of musicological, music-theoretic and psychological studies focusing on affective processes, particularly from a point of maximal tension to the actual resolution of tension. Music theorists have sought to characterize the process leading to and following a climax in terms of intensity (Berry, 1978) or narrative (Agawu, 1984; Childs, 1977); the affective peaks and troughs of these processes are described in terms of intensification, climax and abatement (Patty, 2009). Building upon past work by Dauer et al. (2021), who reported EEG and CB ISC with process-based, minimalist music, the present work contributes insights into the temporal dynamics of engagement with tonal music as indexed by the ISC of dual response modalities.

TABLE 4 Correlations between time-resolved inter-subject correlation (ISC) time series, computed across original (O) and control (C) stimulus conditions within the EEG or continuous behavioural (CB) response modality or across response modalities for a given stimulus condition.
Taken together, these results suggest that the synchronization of EEG responses may be driven by different stimulus attributes than CB synchronization and that CB synchronization is less impacted by condition differences. One interpretation of this finding is that the behavioural reports delivered by participants as they experienced the stimuli were driven more by low-level stimulus features preserved across conditions (that is, amplitude envelope fluctuations), while synchronized EEG responses were driven more by other aspects of the music. This could explain both the smaller condition effect of the CB data and the across-condition similarities of the CB ISC time series, particularly at the structural 'high points' accompanied by sudden and notable increases in loudness (Figure 6: original Peaks 7 and 2, control Peaks 6 and 3). Known limitations of behavioural responses, including self-report bias (Rosenman et al., 2011) and a listener's ability, or lack thereof, to self-assess (Madsen et al., 1993), can make such reports challenging or distracting to deliver, particularly continuously during music listening. Thus, subjective listener responses, while often considered a 'gold standard' of music perception research, may be influenced, more so than EEG, by acoustic changes even in the absence of content that would arguably be deemed musical.

| Audio envelope, correlated components and ISC
For the present study, we created a control stimulus by re-scaling the phase-scrambled original by the envelope of the intact version. This enabled us to assess the role of the audio envelope, a low-level stimulus feature, in driving correlated responses. For both stimuli, the topographies of the maximally correlated EEG component RC1 were consistent with previously reported auditory RC1s (Cohen & Parra, 2016; Dauer et al., 2021; Kaneshiro et al., 2020; Ki et al., 2016; Madsen et al., 2019) and other spatial EEG components computed from responses to natural music (Gang et al., 2017; Schaefer et al., 2011; Sturm et al., 2015). The similarity of the present control RC1 topography to that of the original diverges from the RC1 topography computed from responses to phase-scrambled music (without envelope scaling) reported by Kaneshiro et al. (2020), which differed from that of intact music. In addition, while the present original stimulus elicited higher behavioural ratings and neural correlation than the control, the control did elicit significant ISC, which was also not the case previously (Kaneshiro et al., 2020; see footnote 7). In sum, the consistent RC1 topography and significant EEG and CB ISC elicited by the control stimulus suggest that envelope fluctuations shared across the stimuli did drive correlated responses even while the magnitude and temporal dynamics of those correlations varied according to stimulus condition and response modality. Future research can extend this approach, devising new control conditions that include or omit other musical features and attributes, to further clarify the role of specific acoustic, compositional and performance elements of real-world music in engaging listeners.
The percentage of significant EEG RC1 ISC windows for the original stimulus (37.18%, Figure 6b) was broadly on par with previous reports related to film viewing (between approximately 21% and 33%, Dmochowski et al., 2012; 54.1%, Poulsen et al., 2017). However, percentages of significant EEG ISC for RC2 and RC3 were markedly lower during music listening compared to film viewing (Dmochowski et al., 2012). This finding is corroborated by sizeable drops in the RCA coefficients (Figure 4b) and all-time EEG ISC (Figure 5). The drop in RCA coefficients has been reported previously (Dauer et al., 2021; Kaneshiro et al., 2020) and motivated the decision in those studies to analyse only RC1 data. While we have taken an exploratory approach and analysed RC1-RC3 data (Dmochowski et al., 2012), our between-condition comparisons also ultimately focused on RC1. We recommend that the standalone significance of each RC's ISC be assessed, as was done here, to determine its utility for subsequent analyses.

| Affordances and limitations
While promising, these findings highlight the ongoing challenges of utilizing ecologically valid stimuli and data-driven analysis techniques. The use of a complete, real-world musical work may have served to elicit neural processing that might not have occurred under the use of tightly controlled stimuli (Nastase et al., 2020), while RCA and ISC serve as data-driven approaches to respectively optimize and analyse single-trial EEG 'in the absence of [traditional] event markers' (Dmochowski et al., 2012). Indeed, the present findings related to the timing of significant ISC peaks advance our understanding of the relationship between musical events and listener states indexed by ISC. However, there remain challenges in reaching definitive and generalizable conclusions based on these results. For instance, the timing, duration and size of the peaks do not vary systematically relative to specific musical events such as thematic entrances or regions of building tension between them. Moreover, a number of ISC peaks, particularly for EEG responses, do not align with any predefined musical event. Finally, our interpretations do not account for other, uncontrolled stimulus factors or 'non-relevant dimensions' that may drive variations in ISC (Nastase et al., 2020).

Footnote 7: We acknowledge that one phase-scrambled excerpt in Kaneshiro et al. (2020, fig. 3C) did elicit significant ISC; however, we cannot directly compare findings given the differing component topographies across studies.
Designing the study around a single musical work follows past neuroscience investigations of music feature tracking using fMRI (Alluri et al., 2012) and EEG (Cong et al., 2013), as well as more recent fMRI (Farbood et al., 2015) and EEG (Dauer et al., 2021) ISC studies in which control conditions were derived from a single work of interest. The cello concerto movement used as the current primary stimulus offered a range of affects, intensities and textures while also affording investigation into two forms of musical saliency: anticipatory passages marked by musical tension (Madsen & Fredrickson, 1993; Schubert et al., 2013) and climactic points of arrival (Agawu, 1984; Chew, 2016). Even so, future studies utilizing additional and perhaps larger sets of stimuli drawn from classical and other genres are needed to generalize the present findings.

| Future work
Our next work will augment the present findings with the physiological responses collected in the larger study. Other work can investigate how the extent of 'narrative' conveyed by music drives ISC. For instance, 'programme' music conveying an extra-musical narrative, such as Prokofiev's Peter and the Wolf, Op. 67 (Di Stefano et al., 2024; Prokofiev, 1940), could bridge explicit narratives (Dmochowski et al., 2012; Ki et al., 2016), popular vocal music (Kaneshiro et al., 2020), process-based works (Dauer et al., 2021) and 'absolute' (i.e., non-representational) musical works used previously (Madsen et al., 2019) and here.
Other data features and response measures can be considered as well. Recent work by Chabin et al. (2020) used EEG oscillatory band power to identify neural correlates of chills and emotional pleasure during natural music listening, while ISC of physiological responses has been linked to structural segmentation boundaries in a live-listening context (Czepiel et al., 2021) and to flow states during Javanese gamelan performance (Gibbs et al., 2023). The interpretation of time-varying ISC could also be broadened from a focus on specific musical events to 'second-order' correlations with time-varying stimulus features (Lartillot & Toiviainen, 2007) or models of musical tension (Farbood, 2012).
ISC, while increasingly shown to be a useful index of engagement, is often reported as a group measure, yet individual listeners may focus on different stimulus attributes at different times. Future work seeking individualized insights could consider single-participant ISC, which has been shown to correlate with individual learning outcomes in a video-viewing EEG study (Cohen et al., 2018). The correlation of EEG trials with time-varying stimulus features (Dmochowski et al., 2018) may be another way to index single-listener engagement and has revealed significant group-level correlations across entire musical works (Gang et al., 2017; Kaneshiro et al., 2020). This could be explored further, perhaps with participant-selected stimuli (Grewe et al., 2007; Rickard, 2004), which the paradigm enables, to derive individualized, time-resolved measures of musical engagement.
Finally, ISC may contribute not only basic scientific findings but also predictive insights for real-world application (Kaneshiro & Dmochowski, 2015). Leeuwis et al. (2021) showed that EEG alpha power synchrony could predict song popularity based on Spotify streams, when subjective ratings from the experimental sample could not. Small-sample EEG ISC has also been used to predict population-level preferences for television commercials and in a time-resolved fashion over a television episode (Dmochowski et al., 2014). Within-song insights are also of interest in the field, with published research contributing computational models for chorus and hook detection from audio (Van Balen et al., 2015) and using large-scale commercial data to model time-varying, within-song engagement using probabilities of Shazam song-identification queries (Kaneshiro et al., 2017) and Spotify skips (Montecchio et al., 2020). Thus, future investigations of temporally resolved ISC in conjunction with such reports have the potential to contribute to real-world services.

AUTHOR CONTRIBUTIONS
FIGURE 1 Experimental stimuli. (a) Audio waveform and spectrogram of the natural music (original) stimulus, an influential recording of the first movement of Elgar's Cello Concerto (1919), performed in 1965 by Jacqueline du Pré and the London Symphony Orchestra, conducted by Sir John Barbirolli. Dashed vertical lines denote preselected stimulus events of interest: instances of the main theme played by the solo cello (E1 and E1′), strings (E2 and E2′) and again by the soloist (E3), which lead to structural highpoints (E4 and E4′), and the start of the second section in the movement (E5). (b) Audio waveform and spectrogram of the control stimulus: a phase-scrambled version of the original stimulus that was subsequently scaled by the amplitude envelope of the intact music. The dashed lines are as described in (a). (c) Cumulative distributions of waveform values are similar across stimuli. (d) Stimuli share similar aggregate power spectra.

FIGURE 2 Overview of the experimental paradigm. Top: In the first block of the experiment, participants listened to each stimulus once while electroencephalographic (EEG) and physiological responses were recorded (physiological responses were not analysed here).
Reliable component analysis (RCA) spatial filter outputs. (a) Forward-model projected topographies of the three maximally correlated components (RC1–RC3), computed separately for each stimulus. (b) RC1–RC3 coefficients. The height of the shaded grey area denotes the 95th percentile of the null distribution for each component. Only the RC1 coefficient for the original stimulus is statistically significant (permutation test, false discovery rate [FDR] corrected).

Behavioural ratings. Participants rated each stimulus on a 1–9 ordinal scale along dimensions of pleasantness, arousal, level of interest, predictability and familiarity and reported how often they listen to classical music. Bar heights represent mean values; error bars represent ± SEM; and dots represent individual responses. Ordinal responses (y values) are slightly jittered for visualization only. All between-condition comparisons are statistically significant (Wilcoxon signed-rank test, false discovery rate [FDR] corrected), with the original stimulus receiving higher ratings for each dimension. All participants listened to classical music at least occasionally.
Aggregate inter-subject correlation (ISC) of electroencephalographic (EEG) responses and continuous behavioural reports. Correlations were computed in a one-against-all fashion across the entire duration of each stimulus. Error bars represent ± SEM, and points represent individual participants. The height of the shaded grey area denotes the 95th percentile of the null distribution for each group-level ISC. (a) For EEG responses, RC1 ISC is statistically significant for both original and control stimuli (permutation test, false discovery rate

(a) Temporally resolved ISC for EEG RC1; plots for RC2 and RC3 are provided in Figure S1. (b) Percentage of time windows containing statistically significant EEG ISC for each stimulus and RC1–RC3 (permutation test, uncorrected). (c) Temporally resolved ISC for CB responses. (d) Percentage of time windows containing statistically significant CB ISC for each stimulus.
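The one-against-all computation named in the aggregate ISC caption can be illustrated with a minimal sketch. It assumes that "one-against-all" means correlating each participant's time series with that of every other participant and averaging the resulting coefficients; the caption does not spell this out, and the function name `one_against_all_isc` is hypothetical. Applying the same function to successive time windows would give a temporally resolved variant.

```python
import numpy as np


def one_against_all_isc(responses):
    """Per-participant ISC from an array of shape (n_participants, n_samples).

    Assumed definition: for each participant, the mean Pearson correlation
    between their time series and that of every other participant.
    """
    responses = np.asarray(responses, dtype=float)
    n = responses.shape[0]
    if n < 2:
        raise ValueError("need at least two participants")
    r = np.corrcoef(responses)  # n x n matrix of pairwise Pearson correlations
    # Average each row over its off-diagonal entries (exclude self-correlation).
    return (r.sum(axis=1) - np.diag(r)) / (n - 1)
```

Under this definition, the group-level ISC shown as bar heights would be the mean of the returned per-participant values, and the per-participant values would correspond to the plotted points.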
FIGURE 7 Time-varying inter-subject correlation (ISC) overlays. Time-varying ISC was correlated between pairs of stimulus conditions within a single response measure and also across response measures within each stimulus condition. Numeric correlation coefficients and corresponding statistical significance are reported in Table 4. (a) Original and control electroencephalographic (EEG) ISCs are moderately and significantly correlated when all ISC time windows are included, but the correlation is not significant when the first four ISC time windows are excluded. (b) Original and control continuous behavioural (CB) ISCs are moderately and significantly correlated, both across all ISC time windows and with the first four time windows omitted. (c) The correlation of EEG and CB ISC for the original stimulus is significant, whether computed across all ISC time windows or with the first four time windows omitted. The highest correlations for this comparison are observed when all ISC time windows are included, peaking during cross-correlation when CB ISC precedes EEG ISC by 1 s. (d) No correlations between EEG and CB ISC for the control stimulus, across any lag and ISC time window configuration, are significant.
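The lagged comparison mentioned in (c) can be sketched as follows. This is an illustrative example only: the function name `lagged_correlation`, the lag range and the sign convention (a positive lag means the first series precedes the second) are assumptions, and lags are expressed in ISC time windows rather than seconds.

```python
import numpy as np


def lagged_correlation(a, b, max_lag):
    """Pearson r between a[t] and b[t + lag] for lag = -max_lag..max_lag.

    a and b are equal-length ISC time series. A positive lag means that
    a precedes b; only the overlapping samples are used at each lag.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result
```

With CB ISC passed as `a` and EEG ISC as `b`, a peak at a positive lag would correspond to CB ISC preceding EEG ISC. Assessing significance would require a separate test that is not shown here.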
TABLE 1 Number of usable EEG and CB data records collected from an initial sample of 24 adult musician participants and demographics of each usable sample.
Abbreviations: CB, continuous behavioural; EEG, electroencephalography.
TABLE 3 Top 10 statistically significant, temporally resolved CB ISC peaks, as computed by the summed ISC for each run of statistically significant ISC.
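The ranking described in this caption can be sketched as follows, assuming that a "run" is a maximal stretch of consecutive time windows flagged as statistically significant. The function name `rank_significant_runs` and its inputs are hypothetical, and the significance flags are taken as given rather than computed.

```python
import numpy as np


def rank_significant_runs(isc, significant, top_k=10):
    """Rank runs of consecutive significant windows by their summed ISC.

    isc: ISC value per time window.
    significant: boolean flag per time window (same length as isc).
    Returns up to top_k tuples (start_index, end_index, summed_isc),
    with end_index inclusive, sorted by summed ISC in descending order.
    """
    isc = np.asarray(isc, dtype=float)
    sig = np.asarray(significant, dtype=bool)
    runs = []
    start = None
    for i, flag in enumerate(sig):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            runs.append((start, i - 1, float(isc[start:i].sum())))
            start = None
    if start is not None:  # a run that extends to the final window
        runs.append((start, len(sig) - 1, float(isc[start:].sum())))
    runs.sort(key=lambda run: run[2], reverse=True)
    return runs[:top_k]
```

Summing over a run, rather than taking its maximum, favours peaks that are both high and sustained.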