Introduction

“What is this thing called swing?” is a question already raised by Louis Armstrong in a well-known song. The phenomenon of swing is certainly one of the most salient general features of jazz music and is considered an essential ingredient of jazz performances. The term was introduced by jazz musicians to describe what they felt was a specific playing style in their performances. Yet, astonishingly, a century after jazz musicians like Armstrong and Ellington came on stage, the nature of swing and its main musical and psychoacoustical components remain controversial. It has even been argued that “you can feel it but you just can’t explain it”1, or similarly, according to The New Harvard Dictionary of Music, that swing is “an intangible rhythmic momentum in jazz. Specifically manifested in a variety of relationships between long and short notes, or in the presentation of single notes, swing defies analysis”2.

Among the possible components of swing, only one has been established unambiguously so far: the conspicuous uneven subdivision of quarter notes into long and short eighth notes. It is measured by the so-called swing ratio, i.e., the length ratio of consecutive long and short eighth notes, which are known as downbeats and offbeats. (Downbeat refers to the first eighth of a quarter note, offbeat to the second eighth. An interactive tool with audio examples demonstrating downbeats, offbeats, and different swing ratios can be found on our website (https://www.ds.mpg.de/swing/swingratio). Non-experts may also find a short description of relevant musical nomenclature in Supplementary Note 1.) When listening to computer-generated jazz music that was “swingified” merely by implementing a swing ratio (“swung notes”)3, jazz musicians find it obvious that this is not sufficient and that there must be other components. But what are these components, and which of them are important?
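As a minimal sketch of the definition above, the following Python snippet computes the swing ratio of a single downbeat-offbeat pair from three onset times and, conversely, places the offbeat within a beat for a given ratio. It assumes that the note "lengths" are inter-onset intervals and uses a grid of 960 ticks per quarter note; the function names are hypothetical, and the definition actually used in this work (Eq. (2) in the “Methods” section) may differ in detail.

```python
TICKS_PER_QUARTER = 960  # assumed grid resolution (ticks per quarter note)


def swing_ratio(downbeat_onset: float, offbeat_onset: float, next_beat_onset: float) -> float:
    """Swing ratio of one downbeat-offbeat pair (hypothetical helper).

    The downbeat is taken to last until the offbeat starts, and the offbeat
    until the next downbeat starts (inter-onset intervals).
    """
    downbeat_length = offbeat_onset - downbeat_onset
    offbeat_length = next_beat_onset - offbeat_onset
    if downbeat_length <= 0 or offbeat_length <= 0:
        raise ValueError("onsets must be strictly increasing")
    return downbeat_length / offbeat_length


def offbeat_position(ratio: float, beat_length: float = TICKS_PER_QUARTER) -> float:
    """Offbeat onset (measured from the beat) that yields the given swing ratio."""
    return beat_length * ratio / (1.0 + ratio)


# Straight eighths, a "triplet feel" and an intermediate ratio on a 960-tick grid
print(swing_ratio(0, 480, 960))   # 1.0
print(offbeat_position(2.0))      # 640.0
print(offbeat_position(1.5))      # 576.0
```

With a 2:1 ratio the offbeat falls on the last eighth-note triplet of the beat (tick 640 of 960), whereas a ratio of 1.5 places it at tick 576.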

It has long been speculated that (besides syncopation4,5) rhythmic effects, in particular microtiming deviations (MTDs) (i.e., small timing deviations much shorter than an eighth-note duration), are a major component of swing. However, while the importance of the swing ratio is generally accepted6, the role of rhythmic MTDs has been a subject of controversy for many decades. The term MTD is merely a general concept; various types have been studied using a multitude of methodologies7,8,9,10,11,12,13,14,15,16,17,18. Following Charles Keil7,8, some authors have emphasized the importance of participatory discrepancies: “it is the little discrepancies within a jazz drummer’s beat, between bass and drums, between rhythm section and soloists that create ‘swing’ and invite us to participate”7,8,9,10,11,12,13. This is a very strong statement, as it amounts to the claim that MTDs in the form of participatory discrepancies are the major component of swing. In contrast, others have stressed the importance of rhythmic accuracy19,20,21,22,23,24, claiming that MTDs may impede swing. Many of these claims were based on observational analyses of performances by individual jazz musicians. This may explain the origin of the conflicting claims, as MTDs are not used equally by all musicians14,16, and even where present, they might be unrelated to swing.

A group of studies used another methodological approach, investigating listeners’ perceptions and experiences20,21,22,23,24,25,26. These studies, however, mostly investigated the effect of MTDs on groove20,21,22,23,25,26 rather than on swing24 in particular. Groove is commonly defined as the musical aspect that induces a pleasant sensation (enjoyment) and body movement along with the music (entrainment)27,28. It is a prerequisite for swinging jazz, but also for various other musical genres that do not even use swung notes; swing must therefore have additional components besides groove. Recent studies on groove either investigated body movements triggered by the music (e.g., periodic head movements11) or asked listeners to rate the groove and how much they liked a piece20. A psychometrically validated questionnaire to assess groove has even been published recently28. Findings from experimental studies investigating listeners’ experiences indicate that MTDs tend to decrease groove, with fully quantized versions of performances often rated higher than, or at least as high as, the original versions performed by professional musicians20,21,22,23 (see Hove et al.25 or Eaves et al.26 for overviews of other musical and non-musical aspects that affect groove). While the effect of MTDs on groove has been studied extensively, listeners were asked to judge the swing of a performance in only one study24. In this preceding study by our research group, we found that involuntary random MTDs did not enhance swing, as quantized versions of twelve different jazz pieces were rated highest by listeners. Hence, it is still unclear whether MTDs, even if they occur, are an essential component of swing. Is there a way to prove that MTDs do contribute substantially to swing?

Adopting an operational definition of swing (i.e., the performance of a piece swings if it is judged as swinging by expert listeners), the present paper uses an approach that is able to resolve the controversy and to rigorously demonstrate a positive effect of certain MTDs on swing. By manipulating the timing of original piano recordings and measuring the swing of different manipulated versions (as rated by jazz musicians), we demonstrate that a playing style with systematic MTDs, slightly delaying the soloist’s downbeats with respect to the rhythm section while synchronizing the offbeats, considerably enhances swing. Because the soloist’s offbeats remain synchronized with the rhythm section, this playing style affects the swing ratio: if the soloist’s downbeat onsets are delayed (their durations thus shortened) and the offbeats remain synchronized, the soloist has a somewhat smaller swing ratio than the rhythm section, which may create a perceived friction between them.
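This relation can be made explicit under simple assumptions of our own (they are not taken from the “Methods” section): let T be the quarter-note duration, r the swing ratio of the rhythm section, and δ the soloist’s downbeat delay; note lengths are taken as inter-onset intervals, and the soloist’s offbeats coincide with those of the rhythm section. The soloist’s downbeat then loses δ, while the offbeat gains δ because the following downbeat is delayed as well:

```latex
\begin{aligned}
D &= \frac{rT}{1+r}, \qquad O = \frac{T}{1+r}, \\
r_s &= \frac{D-\delta}{O+\delta}
     = \frac{rT-(1+r)\,\delta}{T+(1+r)\,\delta}
     = \frac{r-\varepsilon}{1+\varepsilon},
\qquad \varepsilon = \frac{(1+r)\,\delta}{T}.
\end{aligned}
```

For example, a rhythm section with a triplet feel (r = 2) and a delay of 9% of a quarter note (δ = 0.09T) give ε = 0.27 and r_s = 1.73/1.27 ≈ 1.36. Since (r − ε)/(1 + ε) < r for any ε > 0, the soloist’s swing ratio is always the smaller one under these assumptions.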

Analyzing short musical extracts from six recorded solos by different jazz musicians, Friberg and Sundström observed that such downbeat delays did show up in a majority (though not all) of their extracts16. As the variation in their measurements was quite large and some musicians did not make use of such delays, they called for a substantially larger data set to confirm these anecdotal observations16. We therefore also analyze a large set of 456 full solo performances from the Weimar Jazz Database29 and determine the average downbeat delays. We find that downbeat delays are a general trend among jazz soloists and that their magnitude decreases with increasing tempo.

Taken together, the results of our experimental and observational studies lead to the conclusion that downbeat delays are a key component of swing in jazz. They underline the general importance of timing and rhythmic effects for swing and resolve the long-standing controversy on the role of MTDs by demonstrating that certain systematic MTDs enhance swing, whereas involuntary random MTDs do not, as we showed in our previous work. That downbeat delays could play such an important role for swing was largely unknown. Professional and semiprofessional jazz musicians participating in our online experiment reported a pleasant friction between soloist and rhythm section, but were unaware of the effect and could not determine its nature (cf. “you can feel it but you just can’t explain it”). We emphasize that the phenomenon reported here (with downbeat delays of the order of 30 ms, or 9% of a quarter note, at intermediate tempi) is not identical to the well-known laid-back mode, in which musicians play with much larger and easily perceivable delays.

Results

Timing analysis of jazz solos

We begin with an in-depth analysis of onset timing in a large set of jazz recordings. As outlined above, our main goal is to prove that downbeat delays have a positive effect on swing, but we first want to clarify whether, and to what extent, soloists tend to delay their downbeats with respect to the rhythm section. We evaluated data from the Weimar Jazz Database29, which contains accurately labeled transcriptions of 456 jazz solos by various artists and gives access to several quantities such as note positions and rhythmic values. We stress that our general analysis, which does not consider individual differences and different playing styles, can only have limited accuracy, with a large scatter in the data. Nevertheless, it is able to reveal general trends, which is the goal of this section.

For each piece in the database29, we isolated every downbeat-offbeat pair of the solo and computed the average downbeat delay and swing ratio (averaged over each solo) as a function of tempo, using the downbeats of the drums as a reference. The results presented in Fig. 1 show non-zero downbeat delays in most cases (with the exception of a few negative and a few very small delays). The data show some variation, probably reflecting individual preferences, but there is a clear trend of decreasing delays with increasing tempo (Fig. 1a). The trend becomes nearly linear if the downbeat delays are measured in ticks (Fig. 1b). Ticks represent fractions of a quarter note (which is subdivided into 960 ticks) and are not an absolute measurement of time (see Eq. (1)). The figure demonstrates that many soloists use systematic MTDs, i.e., positive downbeat delays, which are typically of the order of 30 ms or 85 ticks at intermediate tempi of about 150 bpm. (This value in ticks corresponds to delays of about 9% of a quarter note.) While this is true for a majority of jazz soloists, it should be mentioned that a few soloists use only small downbeat delays or none at all.
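The relation between ticks and milliseconds, and the averaging of downbeat delays against a drum reference, can be sketched as follows. The conversion assumes that Eq. (1) (given in the “Methods” section, not reproduced here) amounts to the standard relation ms = ticks/960 × 60000/tempo; the nearest-beat matching rule and all function names are hypothetical simplifications, not the procedure applied to the Weimar Jazz Database.

```python
TICKS_PER_QUARTER = 960  # quarter-note subdivision used in the text


def ticks_to_ms(ticks: float, tempo_bpm: float) -> float:
    """Convert a tempo-relative delay in ticks to milliseconds (assumed Eq. (1) form)."""
    quarter_ms = 60_000.0 / tempo_bpm
    return ticks / TICKS_PER_QUARTER * quarter_ms


def ms_to_ticks(ms: float, tempo_bpm: float) -> float:
    """Inverse conversion: milliseconds to ticks at a given tempo."""
    quarter_ms = 60_000.0 / tempo_bpm
    return ms / quarter_ms * TICKS_PER_QUARTER


def mean_downbeat_delay_ms(solo_downbeats_s, drum_beats_s):
    """Average delay (ms) of soloist downbeat onsets relative to the nearest drum beat.

    Onsets are in seconds. Matching to the nearest drum beat is a simplification
    that only works for delays well below half a beat.
    """
    if not solo_downbeats_s or not drum_beats_s:
        raise ValueError("need soloist downbeats and drum beats")
    delays = []
    for onset in solo_downbeats_s:
        reference = min(drum_beats_s, key=lambda beat: abs(beat - onset))
        delays.append((onset - reference) * 1000.0)
    return sum(delays) / len(delays)


print(round(ticks_to_ms(85, 150), 1))          # 35.4 (ms at 150 bpm)
print(round(85 / TICKS_PER_QUARTER * 100, 1))  # 8.9 (% of a quarter note)
print(round(mean_downbeat_delay_ms([0.03, 0.43], [0.0, 0.4]), 1))  # 30.0
```

At 150 bpm, 85 ticks correspond to about 35 ms and roughly 9% of a quarter note, consistent with the order-of-magnitude values given above. Because ticks scale with the beat, a fixed tick delay shrinks in milliseconds as the tempo rises.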

Fig. 1: Average downbeat delays of soloists as a function of tempo.

Each point in the scatter plots corresponds to a piece of the Weimar Jazz Database29. To ease readability, the corresponding standard deviations are shown in Supplementary Fig. 1. a Average downbeat delays in milliseconds as a function of tempo in beats per minute. The red line delimits the tempo range of pieces used in our experiment (see “Methods” section) and corresponds to the fit in (b). The scattered data exhibit mostly positive delays with a generally nonlinear trend, which is nearly linear in this restricted tempo range. b Average downbeat delays expressed in ticks as a function of tempo. The red line shows a linear fit to the data.

This trend did not change when we considered jazz sub-genres (“bebop”, “swing”, or “hardbop”) separately (see Supplementary Fig. 6). Of course, the magnitude of downbeat delays may vary within a solo or a whole piece, and it makes sense to also look at individual delays in their musical context. Here, however, we want to detect general trends and therefore study average quantities. It is important to note that the standard deviations of the downbeat delays are noticeably smaller than their average values (except for high tempi above 200 bpm), which means that typical downbeat delays are almost always positive (see Supplementary Fig. 1).

The swing ratio is another important parameter. Although it is not the focus of the present paper, we determine it here, as it is also relevant for swing. In particular, we realized in the course of our experimental study that it was crucial to choose a suitable swing ratio before applying systematic timing manipulations (for details see Supplementary Results 3: Serenade to a cuckoo, second experiment testing different swing ratios). The swing ratio is a measure of non-isochronous metrical subdivisions. Non-isochronous rhythmic patterns are prominent in jazz music but are also found in some other musical cultures, e.g., in Malian jembe drumming and Uruguayan candombe drumming30,31,32. While the swing ratio has been extensively studied for drummers3,16,17,33, the swing ratio of soloists, and in particular how its optimal value varies with tempo, is still not unambiguously established34.

We determined the mean swing ratio of the soloists using the definition of Eq. (2) for each of the 456 pieces of the Weimar Jazz Database, as described in the “Methods” section. The results are shown as a function of tempo in Fig. 2. Note that the mean swing ratios of the soloists are much smaller than generally believed, and also smaller than reported in early observational studies14,34 that used episodic excerpts. Assuming synchronized offbeats, such small swing ratios appear as a result of downbeat delays. In particular, the figure demonstrates that the often-cited triplet feel (or ternary feel, i.e., a swing ratio of 2:1) is rather a myth as far as soloists are concerned. Most of them use swing ratios below 1.5. For fast tempi (above 160 bpm), the soloists’ swing ratio tends to decrease with increasing tempo. This is similar to the trend reported for the swing ratio of drummers16,35. However, the trend is reversed for medium to slow tempi (below 160 bpm), where the soloists’ swing ratio tends to decrease with decreasing tempo. This means that drummers and soloists follow opposing trends regarding the swing ratio in this tempo range and that the swing ratio of soloists tends to be smaller than that of the rhythm section. We also analyzed other characteristics of the recordings, such as the position of individual triplets as a function of tempo. These additional findings are included in Supplementary Results 1. After submission of our manuscript, we became aware of recent work by Corcoran and Frieler, who also analyzed the swing ratios of the solos contained in the Weimar Jazz Database. They used a different method to determine the swing ratio and obtained qualitatively similar results, apart from the increasing trend we found below 160 bpm36.

Fig. 2: Mean swing ratios of soloists as a function of tempo.

Each point corresponds to a piece of the Weimar Jazz Database29 and represents the soloist's average swing ratio as a function of tempo. A quadratic fit to the data (gray line), as an indicator of preferred swing ratios, reveals an increasing trend with tempo up to 160 bpm and a decreasing trend above 160 bpm. The swing ratio of most soloists lies below 1.5, is thus much smaller than generally believed, and does not correspond to a triplet feel (i.e., a swing ratio of 2:1).
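How a quadratic fit like the one in Fig. 2 identifies the tempo of the turning point can be illustrated with a small least-squares sketch. It fits y = a(x - x0)² + b(x - x0) + c, with the tempo centred at its mean x0 for numerical stability, and reports where the parabola peaks. The data are synthetic, constructed to peak at 160 bpm; they are not the Weimar Jazz Database values, and the authors' fitting procedure may differ.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*(x - x0)**2 + b*(x - x0) + c, with x0 = mean(x).

    Solves the 3x3 normal equations by Gaussian elimination with partial pivoting.
    Returns (a, b, c, x0).
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]                  # power sums of u
    t = [sum(u ** k * y for u, y in zip(us, ys)) for k in range(3)]  # moments with y
    # Augmented normal-equation matrix for the unknowns (c, b, a)
    m = [[s[k], s[k + 1], s[k + 2], t[k]] for k in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    coeffs = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        rest = sum(m[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (m[row][3] - rest) / m[row][row]
    c, b, a = coeffs
    return a, b, c, x0


def extremum_tempo(a, b, x0):
    """Tempo at which the fitted parabola has its maximum (a < 0) or minimum (a > 0)."""
    return x0 - b / (2.0 * a)


# Synthetic swing ratios peaking at 160 bpm (illustrative only)
tempi = list(range(100, 301, 10))
ratios = [1.3 - 1e-5 * (x - 160) ** 2 for x in tempi]
a, b, c, x0 = fit_quadratic(tempi, ratios)
print(round(extremum_tempo(a, b, x0), 3))  # 160.0
```

Centring the tempo keeps the normal equations well conditioned; with raw tempi of a few hundred bpm, the fourth-power sums would span many orders of magnitude.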

Experiment investigating swing

The above empirical observations indicate that a large fraction of jazz musicians play solos with downbeats slightly delayed with respect to the rhythm section. Nevertheless, the question remains whether these delays are an essential component of swing, as not all jazz musicians use them. To address this question, we adopted an operational definition of swing: the performance of a piece swings if it is judged as swinging by expert listeners. Professional and semi-professional jazz musicians can be considered expert listeners, as they are trained and experienced in creating and evaluating the swing of a performance. For the study, we used an experimental approach that we developed for a previous microtiming study on swing24. Manipulating the onset timing in MIDI recordings of piano jazz performances and letting expert jazz musicians rate the swing of different manipulations allows us to clarify whether different kinds of microtiming have a positive effect on swing. In that previous study, we investigated the impact of random MTDs by amplifying, deleting, and inverting them. We showed that random MTDs, which are present in every human musical performance, did not enhance swing, indicating that these MTDs can even be detrimental to swing. In the present work, we focus on the effect of systematic MTDs.

Moreover, the analysis presented in the preceding section did not show whether soloists also delay their offbeats. The Weimar Jazz Database only reports the drums’ downbeats as a reference, but not their offbeats, which precludes determining the offbeat MTDs of soloists with respect to the drums. With our experimental approach, however, we are able to clarify the role of offbeat timing by studying how versions with and without offbeat delays affect swing.

We prepared audio extracts presenting different kinds of systematic MTDs in jazz piano performances (“soloist”) with respect to a quantized rhythm section (“rhythm section”). The manipulations we carried out on real performances are explained in detail in the “Methods” section and sketched in Fig. 3. We based all manipulations on a quantized original version, which aligns the notes to a grid with an optimized swing ratio. This step was necessary to provide well-controlled, distinguishable conditions. We consider it a justified, minor intervention, since we previously showed that random microtiming fluctuations do not play a positive role for swing24. For the present experiment, we hypothesized that a positive effect on swing might result from (i) a both delayed manipulation, in which all notes of the soloist are uniformly delayed with respect to the rhythm section, and/or (ii) a downbeat delayed manipulation, in which the soloist’s notes are delayed apart from the offbeats (which are synchronized with the rhythm section).
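A minimal Python sketch of the three conditions sketched in Fig. 3, applied to a single quantized beat, may help make the bookkeeping concrete. It works in ticks (960 per quarter note), uses the 85-tick delay given for the experiment, and assumes a hypothetical r_opt = 1.5; the actual per-piece swing ratios and the MIDI processing are described in the “Methods” section and are not reproduced here. All names are illustrative.

```python
from dataclasses import dataclass

BEAT = 960  # ticks per quarter note


@dataclass
class BeatOnsets:
    """Onsets (ticks from the unshifted beat) of one downbeat-offbeat pair."""
    soloist_downbeat: float
    soloist_offbeat: float
    rhythm_downbeat: float
    rhythm_offbeat: float


def manipulate(condition: str, r_opt: float, delay: float = 85.0) -> BeatOnsets:
    """Place one quantized beat according to one of the three conditions of Fig. 3."""
    grid_offbeat = BEAT * r_opt / (1.0 + r_opt)  # offbeat of the quantized grid
    if condition == "quantized original":
        return BeatOnsets(0.0, grid_offbeat, 0.0, grid_offbeat)
    if condition == "both delayed":
        # every soloist note shifted; rhythm section untouched
        return BeatOnsets(delay, grid_offbeat + delay, 0.0, grid_offbeat)
    if condition == "downbeat delayed":
        # soloist shifted as above; rhythm-section offbeats moved onto the soloist's
        return BeatOnsets(delay, grid_offbeat + delay, 0.0, grid_offbeat + delay)
    raise ValueError(f"unknown condition: {condition!r}")


def ratio(downbeat: float, offbeat: float) -> float:
    """Swing ratio from inter-onset intervals; the next downbeat lies one BEAT later."""
    return (offbeat - downbeat) / (downbeat + BEAT - offbeat)


for name in ("quantized original", "both delayed", "downbeat delayed"):
    b = manipulate(name, r_opt=1.5)  # r_opt = 1.5 is a hypothetical choice
    print(name, round(ratio(b.soloist_downbeat, b.soloist_offbeat), 2),
          round(ratio(b.rhythm_downbeat, b.rhythm_offbeat), 2))
# quantized original 1.5 1.5
# both delayed 1.5 1.5
# downbeat delayed 1.5 2.21
```

The soloist keeps r_opt in every condition, while in the downbeat delayed condition the rhythm section's swing ratio rises (from 1.5 to about 2.21 with these numbers), which is the side effect noted in the Fig. 3 caption.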

Fig. 3: Timing manipulations.

Schematic representation of the timing manipulations we used in the experiment to probe the effects of microtiming deviations on the swing feel. Importantly, all manipulations were done so as to keep the same swing ratio for the soloist (i.e., piano). Solid lines represent exact quarter-note positions (metronome beats). The dashed line shows the position of the offbeats corresponding to a chosen “optimal” swing ratio, referred to as r_opt in the upper-left frame. Black notes and gray notes denote timing positions of soloist and rhythm section, respectively, in the different manipulations. In the “quantized original” version (green background) underlying all further manipulations, the microtiming deviations of the soloist's original performance are suppressed and the notes are aligned with the grid. In the “both delayed” version (red background), all notes of the soloist are delayed by 85 ticks. Finally, in the “downbeat delayed” version (brown background), the soloist's notes are likewise delayed and, in addition, the offbeats of the rhythm section are synchronized with the offbeats of the soloist. This procedure creates downbeat delays of 85 ticks for the soloist without changing the soloist's swing ratio, but increases the swing ratio of the rhythm section.

We presented the manipulated audio extracts of four different pieces (“The smudge”, “Texas blues”, “Jordu”, “Serenade to a cuckoo”) to professional and semiprofessional jazz musicians in an online experiment. Participants were asked to compare all three manipulations with each other and to answer the questions “Did it swing?” and “Did it groove?” for each piece separately. Answers were given on a scale from 1 (“not at all”) to 4 (“very much”). The responses to one of the pieces (“Serenade to a cuckoo”) were not included in the analyses because of an ill-chosen swing ratio for the rhythm section (see “Methods” section). We therefore conducted a second experiment on this piece, testing the influence of different swing ratios. The results of the second experiment were largely in line with the results for the other three pieces presented in the following paragraphs (see Supplementary Results 3: Serenade to a cuckoo, second experiment testing different swing ratios).

The results show that professional and semiprofessional jazz musicians gave the highest swing ratings to versions with delayed downbeats and synchronized offbeats (i.e., the downbeat delayed version). This is apparent in the distribution of swing ratings averaged across three pieces, shown in Fig. 4 as well as in Supplementary Fig. 9. Fig. 4 shows that the downbeat delayed version obtained a large proportion of high ratings (3 and 4, blue colors), whereas the quantized original and both delayed versions received considerably smaller fractions of high ratings. The groove ratings show a similar pattern, with considerably smaller effect sizes of the manipulations (see Supplementary Results 2: Groove ratings). It is worth pointing out that professional musicians gave overall lower ratings than semiprofessionals, which is noticeable in particular for the highest rating in the downbeat delayed version (6.5% vs 31.4% for professionals and semiprofessionals, respectively; see Supplementary Fig. 9). We made a similar observation in our earlier study24. This finding probably reflects the higher standards and expectations of professional musicians. An ordinal logistic regression of the swing ratings on manipulation, musician category, and their interaction statistically confirmed the results described above (cf. Table 1). The downbeat delayed versions received significantly higher swing ratings than the quantized original versions without any delays (p < 0.001). No significant difference was observed between the swing ratings of the both delayed versions and those of the quantized original versions (p = 0.440). Professional jazz musicians gave significantly lower ratings than semiprofessional musicians (p = 0.019). In addition, the effect of the downbeat delayed versions (vs. quantized original) was larger for semiprofessionals than for professionals (p = 0.022).

Fig. 4: Distribution of swing ratings given by professional and semiprofessional jazz musicians to different manipulated versions.

The three stacked histograms display the proportions of different possible ratings from 1 (“not at all”) to 4 (“very much”) averaged over three pieces. The downbeat delayed manipulation in the center elicits a much larger portion of high ratings (3 and 4 in blue colors) than the two other manipulations.

Table 1: Results of ordinal logistic regression for swing ratings.

The odds ratios and their associated confidence intervals for the different conditions are summarized in Table 1. The odds ratio of the downbeat delayed versions compared to the quantized original versions was 7.48. In other words, delaying the soloist’s downbeats while synchronizing the offbeats multiplies the odds that jazz musicians give a recording a higher swing rating by a factor of more than seven, compared with the quantized original. To further validate this effect, we performed three additional checks to analyze the statistical power and to test for potential effects of outliers and sample size (see Supplementary Results 2: Statistical power and robustness). They indicate very high statistical power together with high robustness of the effects. Separately, we also analyzed participants’ ratings for only the very first piece they listened to, in order to ensure that the results were not affected by repeating the task or by being asked whether they perceived differences between versions (see Supplementary Results 2: Additional analyses on swing ratings).
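Because an odds ratio multiplies odds rather than probabilities, its practical meaning depends on the baseline. Under the proportional-odds assumption of an ordinal logistic regression, the same factor scales the odds of a rating of at least k for every threshold k. The sketch below applies the reported odds ratio of 7.48 to a purely hypothetical baseline distribution of ratings for the quantized original; the baseline numbers are invented for illustration and are not the study's data.

```python
def shifted_exceedance(baseline_probs, odds_ratio):
    """P(rating >= k) for k = 2..K after scaling every cumulative odds by odds_ratio.

    baseline_probs[i] is the baseline probability of rating i + 1 (ratings 1..K).
    """
    if abs(sum(baseline_probs) - 1.0) > 1e-9:
        raise ValueError("baseline probabilities must sum to 1")
    shifted = []
    for k in range(1, len(baseline_probs)):
        p = sum(baseline_probs[k:])  # baseline P(rating >= k + 1)
        if p >= 1.0:                 # infinite odds: stays certain
            shifted.append(1.0)
            continue
        odds = p / (1.0 - p) * odds_ratio    # proportional-odds shift
        shifted.append(odds / (1.0 + odds))  # back to a probability
    return shifted


baseline = [0.3, 0.4, 0.2, 0.1]  # hypothetical P(rating = 1), ..., P(rating = 4)
print([round(p, 2) for p in shifted_exceedance(baseline, 7.48)])
# [0.95, 0.76, 0.45]
```

With this made-up baseline, the share of ratings of 3 or 4 would rise from 30% to about 76%, roughly 2.5 times rather than 7.5 times; the factor of 7.48 applies to the odds, not to the probabilities.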

To elucidate the discriminability between the different manipulations, we computed receiver operating characteristic (ROC) curves for each piece (Fig. 5). These ROC curves compare the cumulative proportions of the four ratings for two conditions mapped along the horizontal and the vertical axis, i.e., two of the stacked histograms of Fig. 4 are plotted against each other, one along each axis. A deviation from the diagonal to either side indicates higher swing ratings for one of the conditions and shows that listeners discriminate between the versions and perceive one of them as more swinging. The area under the curve (AUC) quantifies the deviation from the diagonal (AUC = 0.5 means no discrimination) and is an effect size that can be tested for significance. The effect is statistically significant if 0.5 is not within the AUC confidence interval (CI). Comparing the downbeat delayed with the quantized original manipulations (blue curves in Fig. 5) shows higher swing for the downbeat delayed versions, with significant AUC values for all three pieces: AUC (The smudge) = 0.71 ± 0.13, AUC (Texas blues) = 0.70 ± 0.12, and AUC (Jordu) = 0.69 ± 0.13. Comparing the downbeat delayed and both delayed manipulations (black curves in Fig. 5) also shows higher swing for the downbeat delayed versions, with significant AUC values. By contrast, the yellow curves and their AUC values show no significant difference between the both delayed and quantized original versions. Taken together, these findings imply that delaying the soloist’s downbeats while synchronizing the offbeats has a significant positive impact on swing, whereas uniformly delaying all of the soloist’s notes does not.
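The construction of such ROC curves from ordinal ratings, and the AUC as an effect size, can be sketched in a few lines. Thresholds run from the highest rating (4) down to 1; each point pairs the proportion of ratings at or above the threshold in one condition with the same proportion in the other. The trapezoidal area under this empirical curve equals the probability that a randomly drawn rating of the second condition exceeds one of the first, with ties counted as one half. The ratings below are made up, and the confidence intervals used for significance in the text are not computed here.

```python
def roc_points(ratings_x, ratings_y, levels=(4, 3, 2, 1)):
    """Cumulative proportions of ratings >= each threshold, highest level first.

    Returns (x, y) points from (0, 0) to (1, 1); x refers to condition X,
    y to condition Y.
    """
    points = [(0.0, 0.0)]
    for threshold in levels:
        fx = sum(r >= threshold for r in ratings_x) / len(ratings_x)
        fy = sum(r >= threshold for r in ratings_y) / len(ratings_y)
        points.append((fx, fy))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_mann_whitney(ratings_x, ratings_y):
    """P(Y > X) + 0.5 * P(Y == X) over all pairs of ratings."""
    wins = 0.0
    for y in ratings_y:
        for x in ratings_x:
            wins += 1.0 if y > x else 0.5 if y == x else 0.0
    return wins / (len(ratings_x) * len(ratings_y))


quantized = [1, 2, 2, 3]        # hypothetical ratings, condition X
downbeat_delayed = [2, 3, 4, 4]  # hypothetical ratings, condition Y
print(auc_trapezoid(roc_points(quantized, downbeat_delayed)),
      auc_mann_whitney(quantized, downbeat_delayed))  # 0.84375 0.84375
```

Values above 0.5 mean that the curve bends towards the axis of condition Y, i.e., condition Y receives higher ratings; identical rating distributions give exactly 0.5.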

Fig. 5: Receiver operating characteristic (ROC) curves for the swing ratings of three pieces.

These curves compare cumulative proportions of the ratings 4 to 1 for two conditions mapped along the horizontal and the vertical axis, i.e., two of the stacked histograms of Fig. 4 are plotted against each other, one along each axis. A deviation from the diagonal to either side indicates higher swing ratings for one of the conditions and shows that listeners discriminate between the versions. The area under the curve (AUC) quantifies the deviation from the diagonal (AUC = 0.5 means no discrimination). Comparison of the downbeat delayed with the quantized original manipulation (blue curves) and with the both delayed manipulation (black curves) shows a significant preference for the downbeat delayed versions. Statistical significance is confirmed for these two curves by the AUC confidence intervals (CI), which do not contain AUC = 0.5.

Discussion

The research presented in this article aimed at identifying systematic MTDs in recorded jazz solos and clarifying their possible role for swing in jazz. Our observational study, analyzing more than 400 recordings, showed that downbeat delays, although piece- and player-dependent, are used by many jazz soloists and follow a clear tempo-dependent trend, with delays increasing as tempo decreases.

To find out whether these downbeat delays are relevant for swing, we conducted an experimental study. In the absence of a generally accepted definition of swing, we used an operational definition (a performance swings if it is judged as swinging by expert listeners). This approach required a number of simplifications. In particular, we used a quantized original version as a well-defined starting point for manipulating the recordings. Another simplification was to consider a single solo instrument, a piano, playing on top of a quantized rhythm section. Moreover, we focused on pieces with many downbeat-offbeat pairs, which are prominent in jazz music, in order to study the role of their microtiming. As soloists sometimes vary their playing style within a piece or even within a solo, such simplifications were necessary and worthwhile in order to reveal general trends.

In his 1987 paper7, Charles Keil made the strong claim that swing is created by MTDs in the form of participatory discrepancies, yet this claim remained hypothetical as long as it had not been proven. With our experimental approach, we were not only able to obtain empirical proof, but could also establish which types of MTDs/participatory discrepancies strongly enhance swing and which do not.

Our experimental study yielded the clear and significant result that soloists who delay their downbeats while synchronizing their offbeats with the rhythm section considerably enhance swing. Random participatory discrepancies, on the other hand, can be detrimental to swing, as we showed previously24. In the spirit of Charles Keil’s claim, we conclude that these downbeat delays are a key component of swing in jazz. To the best of our knowledge, this is the first time a positive impact of certain MTDs on swing has been shown. While the authors of early observational studies14,16 investigating downbeat timing could only speculate that the observed downbeat delays play a role for swing, our experimental approach provides direct evidence of their positive effect.

In designing the experimental study, a number of decisions had to be made. The number of tested conditions had to be limited in order to keep the survey to a reasonable length for participants. We based the choice of conditions on our observational analysis of the Weimar Jazz Database as well as on a prestudy conducted with a limited number of professional participants (see “Methods” section for details). Two potential conditions we considered were not included in the experiment, as all professional jazz musicians participating in the prestudy clearly judged them as not swinging: (i) perfect synchrony between the soloist and the rhythm section, but quantized to a higher swing ratio (corresponding to the swing ratio of the rhythm section in the downbeat delayed condition), and (ii) a condition in which the rhythm section uses the increased swing ratio of the downbeat delayed condition (as in Fig. 3) while the soloist’s downbeats and offbeats remain identical to the quantized original condition. As these two conditions contain no downbeat delays of the soloist, they help to distinguish whether it is the downbeat delays or the increased swing ratio of the rhythm section that plays the important role in our downbeat delayed condition.

To validate the above effects, we carried out not only an ordinal logistic regression but also three additional statistical checks, which are included in Supplementary Results 2. They indicated very high statistical power together with high robustness of the effects with respect to sample size and outliers. We could also demonstrate (Supplementary Results 2: Additional analyses on swing ratings) that the results were not affected by repeating the task or by the question on perceived differences, which we asked participants whenever they gave identical ratings to different versions. While our operational definition of swing explicitly refers to expert listeners, i.e., professional and semi-professional jazz musicians, we also checked whether our results generalize to other populations by including amateur jazz musicians and non-jazz musicians as a third category in the analysis (see Supplementary Results 2: Additional analyses on swing ratings). The main results remain unchanged, with amateurs and non-jazz musicians showing similar, albeit smaller, effects than professional and semi-professional jazz musicians.

Previous experimental studies have investigated the role of MTDs for groove20,21,22,23,25,26, defined as a pleasant sensation and the impetus to move along with the rhythm. Whereas groove can exist in music without swing, jazz musicians argue that swing requires groove. The influence of different systematic microtiming patterns on groove (besides random MTDs) was studied in particular by Davies et al.20 in percussion examples without soloists. In almost all cases (samba, funk, and jazz examples), the authors found that their systematic MTDs, as well as random MTDs, did not increase groove ratings but were rather detrimental to groove. An exception was the case in which a specific jazz microtiming pattern was applied to a jazz example. Here, expert listeners’ groove ratings first increased and then decreased with increasing magnitude of the systematic jazz MTDs. As the jazz microtiming pattern was a typical repetitive long-short (downbeat-offbeat) pattern, changing the microtiming magnitude merely meant changing the swing ratio. Thus, expert listeners’ higher ratings for intermediate microtiming magnitudes reflect a preference for an intermediate swing ratio. Unfortunately, we do not know whether the increase in groove was also accompanied by an increase in swing, as participants were not asked about swing.

In our study, we asked about both swing and groove (see Supplementary Results 2: Groove ratings). Overall, the groove ratings are similar to the swing ratings because swing and groove are closely related concepts; it therefore seems plausible that a manipulation that increases perceived swing will also affect perceived groove. Nevertheless, our additional analyses suggest that swing ratings are influenced more strongly by the downbeat delays than groove ratings are, hinting at a partial dissociation of these concepts. This finding is consistent with the assumption that swing and groove share certain characteristics (e.g., enjoyment and entrainment), but that swing has additional characteristics that go beyond groove. Thus, swing implies groove but not vice versa. Validating this dissociation, however, will require further experimental studies.

Our findings are of interest to various fields, from the physics of social interactions and human behavior to psychoacoustics and the perception of musical rhythm. They also have implications for music education and music production. Many modern digital audio workstations offer options for “swingifying” computer-generated music. So far, these features are of limited value, as they mainly serve to introduce a suitable swing ratio. Adding downbeat delays in line with our findings could improve these features for digital music production.
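For illustration, the swing-ratio step that such “swingify” features perform can be pictured as moving straight 8th-note offbeats onto a swung grid. The minimal Python sketch below assumes integer MIDI tick onsets on a straight 8th grid and tpq = 960; the function names are hypothetical and do not reflect the API of any particular workstation. With the swing ratio r defined as the length of the downbeat 8th divided by that of the offbeat 8th, the offbeat falls tpq·r/(1 + r) ticks after the downbeat. Downbeat delays as studied here would be a separate manipulation applied on top of this step.

```python
# Minimal sketch of a swing-ratio "swingify" step (hypothetical names, not a DAW API).
# Assumes integer MIDI tick onsets on a straight 8th-note grid.


def swung_offbeat_offset(tpq: int, swing_ratio: float) -> float:
    """Ticks from a downbeat to the following offbeat for swing ratio r = long / short.

    long + short = tpq and long = r * short, hence long = tpq * r / (1 + r).
    """
    return tpq * swing_ratio / (1.0 + swing_ratio)


def swingify(onsets, tpq=960, swing_ratio=2.0):
    """Move straight offbeats (half-way through a quarter note) onto a swung grid.

    Downbeats and onsets off the straight 8th grid are returned unchanged.
    """
    offbeat = swung_offbeat_offset(tpq, swing_ratio)
    swung = []
    for t in onsets:
        beat, rest = divmod(t, tpq)
        if 2 * rest == tpq:  # second 8th of the quarter note, i.e. the offbeat
            swung.append(beat * tpq + offbeat)
        else:
            swung.append(t)
    return swung


print(swingify([0, 480, 960, 1440]))  # [0, 640.0, 960, 1600.0]
```

With r = 2, an offbeat originally at 480 ticks moves to 640 ticks, i.e., two thirds of the way through the quarter note; r = 1 leaves straight 8th notes unchanged.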

One may ask whether downbeat delays are specific to swinging jazz or whether they occur more generally. We therefore also carried out timing analyses of Latin music of various origins, e.g., The Latin Pianist by PG Music37. We found that downbeat delays, where they occur, are very small, sometimes negative, and mostly below the threshold of timing accuracy.

Considering future research, there is plenty of room for lifting some of the restrictions we imposed in our study. For instance, we did not study ensemble microtiming such as in big band performances, although we believe that similar mechanisms are at work as in the case of soloists. Furthermore, jazz soloists typically also use other rhythmic values, e.g., 8th-note triplets. Future work should aim at elucidating trends of MTDs for these other rhythmic values.

It is important to note that the downbeat delays of the order of 30 ms studied here are not related to the well-known laid-back style occasionally applied by jazz musicians. These small downbeat delays were not perceivable as such by professional jazz musicians in the recordings. The much larger delays in laid-back playing, on the other hand, are easily perceivable and are also applied to offbeats, even though the detailed nature of delays in laid-back playing remains to be clarified in future work. In their comments on the online study, some professional jazz musicians reported that they could perceive a pleasant friction between soloist and rhythm section, but were amazed that they could not determine its nature. They apparently could “feel it”, but they just couldn’t “explain it”.

Methods

Timing analysis of jazz solos

To determine whether jazz musicians apply systematic MTD, we analyzed 456 jazz solos from the Weimar Jazz Database29. The transcriptions in this database were obtained using the Sonic Visualiser software, which enables precise spectral visualization of audio data38. In the database, each note of a piece is stored as a collection of entries describing its properties. Features such as pitch, onset (in ms), and quarter-note subdivision are available, as well as the temporal position (in ms) of the measure to which the note belongs. This detailed representation permits the computation of mean quantities such as the average metrical positions of downbeats and offbeats as a function of tempo. To determine the root mean squared error of note onsets in the database, we performed two successive transcriptions of John Coltrane’s “Giant Steps” with Sonic Visualiser and compared the obtained values. We found the root mean squared error of note onsets to be around 20 ms. This reflects the human nature of the transcriptions and should be taken into account when considering the variance in our results. We converted our results to ticks, the unit of time in the MIDI format, because tick units are required to manipulate MIDI files. The conversion from ticks to milliseconds is done via:

$$t_{\mathrm{ms}} = 1000 \times \frac{t_{\mathrm{ticks}}}{\mathit{tpq}} \times \frac{60}{\mathit{tempo}}$$
(1)

where tpq is a MIDI variable representing the number of ticks per quarter note in the piece (default value 960), and tempo is the tempo in quarter notes (beats) per minute. We also computed the mean swing ratio as a function of tempo for the pieces in the database. The swing ratio measures the asymmetry of downbeat-offbeat pairs in jazz. For three adjacent notes, a downbeat, an offbeat, and the following downbeat, it can be defined as:

$$r = \frac{p_{ob} - p_{db_1}}{p_{db_2} - p_{ob}}$$
(2)

where \(p_{ob}\) is the position of the offbeat, and \(p_{db_1}\) and \(p_{db_2}\) are the positions of the first and second downbeat, respectively. The mean swing ratio of a given piece is then the average of equation (2) over all such triples in the piece.
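Equation (2) and the per-piece average can be sketched as follows. This is a minimal Python illustration, not the authors’ analysis code: each note is assumed to be a hypothetical (onset, beat index, label) tuple with label "db" or "ob", a simplification of the Weimar Jazz Database entries. A triple only counts if the downbeat and offbeat lie in the same quarter note and the second downbeat starts the next one.

```python
from statistics import mean


def swing_ratio(p_db1: float, p_ob: float, p_db2: float) -> float:
    """Eq. (2): duration of the downbeat 8th divided by duration of the offbeat 8th."""
    return (p_ob - p_db1) / (p_db2 - p_ob)


def mean_swing_ratio(notes):
    """Average Eq. (2) over all adjacent (downbeat, offbeat, downbeat) triples.

    `notes` is a time-ordered list of (onset, beat_index, label) tuples with
    label "db" or "ob"; this is a hypothetical simplified format, not the database schema.
    Returns NaN if the piece contains no such triple.
    """
    ratios = [
        swing_ratio(a[0], b[0], c[0])
        for a, b, c in zip(notes, notes[1:], notes[2:])
        if (a[2], b[2], c[2]) == ("db", "ob", "db")
        and a[1] == b[1]          # downbeat and offbeat in the same quarter note
        and c[1] == a[1] + 1      # second downbeat starts the next quarter note
    ]
    return mean(ratios) if ratios else float("nan")


notes = [(0.0, 0, "db"), (200.0, 0, "ob"), (300.0, 1, "db"),
         (480.0, 1, "ob"), (600.0, 2, "db")]
print(mean_swing_ratio(notes))  # 1.75 (mean of 2.0 and 1.5)
```

Checking beat indices avoids pairing an offbeat with a downbeat that only follows after a rest spanning one or more beats, which would otherwise inflate the ratio.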

Timing manipulations

The code to perform the manipulations on the MIDI recordings was written in Julia, using the MusicManipulations.jl package39. The mp3 audio examples were generated using the Reaper software with plugins by Native Instruments (“The Gentleman” piano, “50s drummer” drum set, and the acoustic bass from the standard library). All recordings used in the present study were first quantized to a grid (see Fig. 3) whose swing ratio was set to a value guided by the average swing ratios observed by Friberg and Sundström16,34 and by our own analyses of the Weimar Jazz Database, assuming that this provides optimized swing ratios. The chosen swing ratios of the three pieces were close to the optimal values \(r_{opt}\) of Fig. 2 and are listed in Table 2. In our timing manipulations, we took care not to modify the swing ratio of the piano track across versions, to ensure that swing ratings were not affected by a more or less optimized swing ratio of the soloist. Triplets were not manipulated, as they rarely occurred in the recordings we used; moreover, their dependence on tempo and their relation to surrounding 8th-notes would be harder to quantify and would distract from our main objective. After quantizing a given recording, we performed two further manipulations, sketched in the “both delayed” and “downbeat delayed” boxes of Fig. 3. In the downbeat delayed version, soloist downbeats were delayed while the offbeats of soloist and rhythm section were synchronized. The delay value was fixed to 85 ticks, as this allowed us to use a common delay value for all pieces (see Timing analysis of jazz solos in the Methods section), taking into account the tempo range of our recordings and the variance of our observations (see Fig. 1). To keep the swing ratio of the soloist unchanged, we first delayed the whole soloist track and then synchronized all rhythm offbeats with the delayed soloist offbeats.
In the both delayed manipulation, the whole soloist track was delayed with respect to the rhythm section. This version allowed us to test whether the mere presence of a delay is relevant for the swing feel, or whether offbeat synchronicity is also crucial.
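The two manipulations can be sketched on note lists that are already quantized to a common swung grid. In this minimal Python sketch, the Note type, its "db"/"ob" labels, and the function names are hypothetical stand-ins (the study used Julia and MusicManipulations.jl, whose API is not reproduced here); tpq = 960 and the 85-tick delay follow the text, and ticks_to_ms implements Eq. (1). Because all tracks are assumed to share one grid, shifting every rhythm offbeat by the delay is taken to be equivalent to synchronizing it with the delayed soloist offbeat.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    onset: float   # position in MIDI ticks
    label: str     # "db" (downbeat) or "ob" (offbeat); hypothetical labelling


def ticks_to_ms(ticks: float, tempo_bpm: float, tpq: int = 960) -> float:
    """Eq. (1): convert MIDI ticks to milliseconds at a tempo in quarter notes per minute."""
    return 1000.0 * ticks * 60.0 / (tpq * tempo_bpm)


def shift(notes, delay):
    """Return copies of all notes moved later by `delay` ticks."""
    return [replace(n, onset=n.onset + delay) for n in notes]


def both_delayed(soloist, rhythm, delay=85):
    """Whole soloist track delayed; rhythm section left unchanged."""
    return shift(soloist, delay), list(rhythm)


def downbeat_delayed(soloist, rhythm, delay=85):
    """Soloist delayed as a whole (its swing ratio is unchanged); rhythm offbeats
    moved by the same delay so they coincide with the delayed soloist offbeats.
    Assumes both tracks are quantized to the same swung grid."""
    rhythm_out = [replace(n, onset=n.onset + delay) if n.label == "ob" else n
                  for n in rhythm]
    return shift(soloist, delay), rhythm_out


# Hypothetical one-beat example: swing ratio 2 puts offbeats 640 ticks after downbeats.
grid = [Note(0, "db"), Note(640, "ob"), Note(960, "db")]
solo, rhythm = downbeat_delayed(grid, grid)
print([n.onset for n in solo], [n.onset for n in rhythm])  # [85, 725, 1045] [0, 725, 960]
print(ticks_to_ms(85, tempo_bpm=200))                       # 26.5625
```

In this hypothetical example, the downbeat delayed version keeps the soloist ratio at 640/320 = 2 but raises the rhythm-section ratio to 725/235 ≈ 3.09, which illustrates why this manipulation increases the swing ratio of the rhythm section. At 200 bpm, the 85-tick delay corresponds to about 26.6 ms.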

Table 2 Characteristic parameters of the recordings used in the experiment.

As a result, we had three versions of each piece: the quantized original, a version with a delayed soloist, and a version in which downbeats were delayed but offbeats were synchronized. In Fig. 3, these versions are called quantized original, both delayed, and downbeat delayed.

Online experiments

Jazz musicians recruited through music conservatories, universities, big bands, and personal contacts were asked to participate in an anonymous online study designed with the EFS Survey software (Unipark, 2019). The software did not collect any personal data or metadata from participants, thus guaranteeing anonymity. Participants were free to end the study at any time. Following our operational definition of swing, the study targeted professional and semiprofessional jazz musicians, as they are expert listeners highly familiar with swing. Musicianship was self-assessed, with participants assigning themselves to one of five categories: (1) professional jazz musician, (2) semiprofessional jazz musician, (3) amateur jazz musician, (4) non-jazz musician, or (5) non-musician. We analyzed data from 19 semiprofessional and 18 professional musicians who took sufficient time to rate the recordings (at least two pieces, with 5 min per piece in 3 versions). The majority of respondents were male (n = 30), 5 were female, and 2 did not provide gender information. Mean age was M = 38.59 years (SD = 16.10 years). Nineteen amateur jazz musicians and non-jazz musicians also participated (results for these participants can be found in the Supplementary Results 2: Additional analyses on swing ratings). Non-musicians were notified beforehand that their results would not be taken into account; two non-musicians participated under this condition. Participants provided information about their current daily practice and the number of concerts played within the last year (see Supplementary Information: Table 1).

In the experiment, participants were presented with three versions of each of four different pieces. The three versions resulted from the manipulations described above. We based the choice of conditions on our observational analysis of the Weimar Jazz Database as well as on a prestudy with a small number of professional jazz musicians, which helped us identify various unpromising conditions early on. One potential condition that we did not investigate further was perfect synchrony between soloist and rhythm section, but quantized to a higher swing ratio (corresponding to the swing ratio of the rhythm section in the downbeat delayed condition). Similarly, we excluded a condition in which the rhythm section merely used the increased swing ratio of the downbeat delayed condition (as in Fig. 3), whereas the soloist’s downbeats and offbeats remained as in the quantized original condition (i.e., without downbeat delays). We excluded these two conditions because all participants of the prestudy rated and commented on them negatively, leaving little room for ambiguity.

The four pieces consisted of live MIDI recordings by two professional jazz pianists, to which a standard swing drum track was added, consisting of 8th-notes on the ride cymbal and hi-hat hits on beats 2 and 4 of each measure. A bass line playing quarter notes to outline the harmonic background was also included. The pieces “Serenade to a Cuckoo” and “Jordu” were recorded in our acoustics lab on a Kawai ES7 keyboard; the other two, “The smudge” and “Texas blues”, were performed by Miles Black on the PG Music “Oscar Peterson multimedia CD”. These pieces were chosen because of their large number of 8th-notes, which maximizes the effects we seek to study. As noted above, one of the pieces of the experiment (“Serenade to a Cuckoo”) originally had an ill-chosen swing ratio, which by itself seemed to influence the swing ratings. In particular, the downbeat delayed manipulation led to a very large swing ratio of 2.91 for the rhythm section (compared to about 2.5 for the other pieces, see Table 2). We therefore did not include “Serenade to a Cuckoo” in the analysis and ran an additional study for this piece to clarify the influence of different swing ratios (for details and findings, see Supplementary Results 3: Serenade to a cuckoo, second experiment testing different swing ratios).

All participants were asked to use headphones, to better perceive the subtle differences between versions and to minimize external noise. Before starting the experiment, participants were reminded of the notion of groove (“musical aspect that induces a pleasant sensation […] of wanting to move along […]”) and were told that swing requires groove, but that there can be groove without swing. No specific definition of swing was provided, so that participants could follow their acquired intuition about what constitutes swing. Participants were given an audio example for each concept to sensitize them to the differences between these concepts.

In the main part of the study, the pieces were presented in random order. For each piece, all three manipulated versions were presented on one page, so that participants could switch back and forth between versions and listen to them as often as they liked. For each version, participants rated groove and swing on a 4-point scale ranging from “not at all” to “very much”. Whenever the same swing rating was given to different versions, participants were subsequently asked whether they could perceive a difference between the versions. This question aimed to clarify whether identical ratings were due to (a) no perceived differences or (b) the same degree of swing in perceptibly different versions. Note that only (b) would entail that systematic MTD make no difference with respect to swing.

Statistical analyses

To analyze the participants’ swing ratings and their dependence on the manipulations and the categories of musicians, we performed an ordinal logistic regression. This analysis takes into account the ordinal nature of our 4-point rating scale, with ordered but unstructured thresholds for the four response categories40, and was based on the statistical model

$$\log \left[\frac{\Pr(\mathit{rating} \le j)}{1 - \Pr(\mathit{rating} \le j)}\right] = \alpha_j + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
(3)

The variables \(x_1\) and \(x_2\) represent the manipulation and the musician category, respectively, \(x_1 x_2\) their interaction, and \(\alpha_j\) the thresholds of the rating categories j. The list of all model parameters can be found in the Supplementary Information. The quantized original version and semiprofessional jazz musicians were chosen as reference categories, to which the other versions and musician categories were compared. The resulting odds ratios indicate how much more likely it is that the respective version elicits higher swing ratings than the quantized original version: a value larger than 1 signifies a higher probability, a value lower than 1 a reduced probability. For detailed analyses of statistical power and robustness of the effects, see Supplementary Results 2: Experiment on the perceived swing feel.
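To make Eq. (3) concrete, the sketch below evaluates the category probabilities it implies for a given linear predictor \(\eta = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2\), using hypothetical threshold values; it does not fit the model. The sign convention matters. With Eq. (3) written as \(\alpha_j + \eta\), a negative \(\eta\) shifts probability toward higher ratings, so the odds of a higher rating relative to the reference are \(e^{-\eta}\). Some software, e.g. R’s MASS::polr, instead parametrizes \(\mathrm{logit}\,\Pr(Y \le j) = \zeta_j - \eta\), under which \(e^{\beta} > 1\) corresponds to higher ratings. Which convention underlies a reported odds ratio therefore depends on the package used.

```python
import math


def category_probabilities(alphas, eta):
    """Pr(rating = 1..K) under Eq. (3): logit Pr(rating <= j) = alpha_j + eta.

    alphas: increasing thresholds alpha_1 < ... < alpha_{K-1}.
    eta: linear predictor beta1*x1 + beta2*x2 + beta3*x1*x2 for one condition.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # Pr(rating = j) = Pr(<= j) - Pr(<= j - 1)
        previous = c
    return probs


def odds_ratio_higher_rating(eta):
    """Odds of rating > j relative to the reference (eta = 0); identical for every j.

    Under the sign convention of Eq. (3) as written, this is exp(-eta).
    """
    return math.exp(-eta)


# Hypothetical thresholds for a 4-point scale, for illustration only.
alphas = [-1.0, 0.0, 1.0]
print([round(p, 3) for p in category_probabilities(alphas, 0.0)])   # reference condition
print([round(p, 3) for p in category_probabilities(alphas, -0.5)])  # shifted toward higher ratings
```

Because the same \(\eta\) enters every threshold, the odds ratio is the same for all cut points j; this is the proportional-odds property of the cumulative logit model.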

To analyze whether and how participants discriminated between the different versions of a piece, we computed receiver operating characteristic (ROC) curves and the areas under the curve (AUC). The AUC is a measure of discriminability and an effect size. In a ROC analysis, one version is assigned to the abscissa and the other to the ordinate. The ROC curve plots the cumulative relative frequencies of the rating categories for the two versions, starting from 4 (very much) down to 1 (not at all). A diagonal line results if participants do not differentiate between versions or have no preference. The further the ROC curve deviates upward from the diagonal, the higher the ratings for the version assigned to the ordinate compared with the version assigned to the abscissa. For the discriminability of the versions to be significant, the ROC curve must deviate substantially from the diagonal, that is, the confidence interval of the AUC must not include 0.5 (the area under the diagonal)41.
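The ROC construction described above can be sketched for two lists of ratings on the 4-point scale. In this minimal Python example, the function names and rating data are hypothetical. Points are accumulated from category 4 down to 1, and the trapezoidal area is checked against the equivalent pairwise form, Pr(B > A) + ½ Pr(B = A), where B is the version on the ordinate. Confidence intervals for the AUC, which the significance criterion requires, are not computed here.

```python
from collections import Counter


def roc_points(ratings_a, ratings_b, categories=(4, 3, 2, 1)):
    """ROC points for two versions rated on an ordinal scale.

    Version A is on the abscissa, version B on the ordinate. Each point gives the
    cumulative share of ratings >= k for A and B, taking k from the top category down.
    """
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    points, cum_a, cum_b = [(0.0, 0.0)], 0, 0
    for k in categories:
        cum_a += count_a[k]
        cum_b += count_b[k]
        points.append((cum_a / len(ratings_a), cum_b / len(ratings_b)))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(ratings_a, ratings_b):
    """Pr(B > A) + 0.5 * Pr(B = A) over all rating pairs; equals the trapezoidal AUC."""
    score = sum((b > a) + 0.5 * (b == a) for a in ratings_a for b in ratings_b)
    return score / (len(ratings_a) * len(ratings_b))


a = [2, 2, 3, 1]  # hypothetical ratings, e.g. quantized original
b = [3, 4, 3, 2]  # hypothetical ratings, e.g. downbeat delayed
print(auc_trapezoid(roc_points(a, b)), auc_pairwise(a, b))  # 0.8125 0.8125
```

An AUC of 0.8125 for these made-up ratings would mean that version B tends to be rated higher; identical rating distributions give the diagonal and an AUC of 0.5.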

Ethics statement

All experimental procedures adhere to the Ethical Principles of the American Psychological Association42 and are in full accordance with the guidelines of the local ethics committee and federal regulations. All participants were fully informed about the aims and procedures of the study, and gave informed consent before participating in the survey. The study was conducted anonymously.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.