Introduction

Training for complex skills such as musicianship has far-ranging effects on cognition, although the directionality of these relationships is less well understood. Previous work has found impacts of musical training on enculturation, attention, executive function, low-level auditory processing, working memory, speech-in-noise perception, foreign language perception, cognitive prediction, and more (see Hannon & Trainor, 2007; Kraus & Chandrasekaran, 2010, for reviews). Whereas the cognitive effects of musical training have received much attention from researchers, comparatively less work has addressed downstream effects of musical training on other psychological domains, such as aesthetic preference. Accordingly, the present study sought to draw links between musical training, working memory capacity (WMC), and preference for musical complexity.

Musical training and working memory capacity

Previous research suggests that musical training is positively associated with WMC across the lifespan (Brandler & Rammsayer, 2003; Bugos, Perlstein, McCrae, Brophy, & Bedenbaugh, 2007; Chan, Ho, & Cheung, 1998; Franklin et al., 2008; Fujioka, Ross, Kakigi, Pantev, & Trainor, 2006; George & Coch, 2011; Hanna-Pladdy & Gajewski, 2012; Ho, Cheung, & Chan, 2003; Lee, Lu, & Ko, 2007; Nutley, Darki, & Klingberg, 2014; Pallesen et al., 2010; Parbery-Clark, Skoe, Lam, & Kraus, 2009; Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Roden, Grube, Bongard, & Kreutz, 2014; Schellenberg, 2006; Schulze, Dowling, & Tillmann, 2012; see also the meta-analysis by Talamini, Altoè, Carretti, & Grassi, 2017). Researchers tend to assume that musical training causally increases WMC, even though the majority of studies have employed cross-sectional designs. For instance, Talamini et al.’s (2017) meta-analysis reported that musicians outperformed their less musically trained counterparts in long-term memory, short-term memory, and working memory tasks. The authors also reported that task type moderated the size of the musician advantage for short-term and working memory tasks, but not for long-term memory tasks. This moderation was largest for tonal stimuli, moderate for verbal stimuli, and smallest or null for visuospatial stimuli. These authors further noted that many of the studies included in their meta-analysis did not control for pre-existing cognitive differences, which may also play a role in explaining performance due to selection biases (Okada & Slevc, 2018).

Relatively few studies have employed a longitudinal design (Bugos, Perlstein, McCrae, Brophy, & Bedenbaugh, 2007; Fujioka et al., 2006; Ho, Cheung, & Chan, 2003; Nutley, Darki, & Klingberg, 2014; Roden, Grube, Bongard, & Kreutz, 2014). Of note are the studies by Bugos et al. (2007) and Roden et al. (2014), which used random assignment to music versus control groups, providing stronger evidence for causal direction. These studies converge with cross-sectional findings, providing evidence for the assumption that musical training leads to improved working memory capacity. For example, Roden et al. (2014) provided either music or natural science training to primary school-aged children over a period of 18 months, and found that the music group outperformed the natural science group on working memory tasks post-training.

Regardless of causal direction, a secondary question is whether the relation between musical training and working memory is domain specific (i.e., near transfer to auditory working memory) or domain general (i.e., far transfer to working memory more broadly). The evidence regarding the specificity of this relation is mixed, with some studies finding that musical training impacts auditory but not visual working memory tasks (Brandler & Rammsayer, 2003; Chan et al., 1998; Hanna-Pladdy & Gajewski, 2012; Ho et al., 2003; Talamini et al., 2016), and other studies finding that musical training impacts working memory across domains (Bugos et al., 2007; George & Coch, 2011; Nutley et al., 2014; Slevc, Davey, Buschkuehl, & Jaeggi, 2016).

Musical training and preference for musical complexity

Beyond working memory capacity, musical training is also associated with an increased preference for musical complexity (Burke & Gridley, 1990; Getz, Marks, & Roy, 2014; Ginocchio, 2009; Gregory, 1994; Hargreaves, Messerschmidt, & Rubert, 1980; Keston & Pinto, 1955; Przysinda, Zeng, Maves, Arkin, & Loui, 2017; although see Dunn, de Ruyter, & Bouwhuis, 2012, for an exception). For instance, Getz et al. (2014) found that musicians tended to prefer music that was labeled as “reflective and complex” more than non-musicians did, as measured by the Short Test of Musical Preferences (STOMP; Rentfrow & Gosling, 2003).

The mechanism by which this association develops is less well understood. One potential explanation has to do with exposure. Since one purpose of musical education is to expand students’ knowledge and appreciation for a diversity of musical styles (Droe, 2006), one could expect musically trained individuals to have encountered a larger breadth of music, including more complex styles, more often than musically untrained individuals. This heightened exposure might then lead to an increase in preference for complex music. Indeed, Price (1988) recorded students’ favorite composers before and after a music appreciation course, and found that students were much more likely after the course to include composers encountered during the course in their ranked listings of preferred composers. Furthermore, Schellenberg, Peretz, and Vieillard (2008) demonstrated experimentally that liking ratings for musical excerpts increased linearly over multiple exposures.

Musical training, working memory, and preference for complexity

In summary, a robust literature supports associations between musical training and working memory capacity on one hand, and musical training and preference for complexity on the other. Given these two bivariate relations, we theorized that musical training and working memory might interact to produce preference for complexity.

This idea is supported by research in vision. Sherman, Grabowecky, and Suzuki (2015) found that participants preferred visual artworks that were compatible with their visual working memory capacity. Relatedly, Reber and colleagues have found that manipulating perceptual fluency in visual figures increases perceivers’ ratings of aesthetic pleasure (Reber, Schwarz, & Winkielman, 2004; Reber, Winkielman, & Schwarz, 1998). These authors proposed an interactionist view arguing that judgments of beauty emerge from stimulus properties making contact with cognitive and affective processing. This empirical literature dovetails with theoretical accounts suggesting that aesthetic preferences depend in part on an optimum level of complexity for the perceiver (Berlyne, 1971, 1974; Walker, 1973). Thus, musicians may enjoy musical complexity more than non-musicians (Berlyne, 1971, 1974; Gordon & Gridley, 2013; Heyduk, 1975; Orr & Ohlsson, 2005; Wundt, 1874) partly by virtue of their superior working memory capacity, just as visual art experts enjoy visual complexity more than non-experts (Silvia, 2006; Winston & Cupchik, 1992).

Taken together, this body of literature motivates the development of a model that describes the associations between musical training, WMC, and aesthetic preference. We propose that musical training may be related to an increase in WMC, and it is this increase in WMC that explains a commensurate increase in aesthetic preference for complexity in musicians. Put differently, we propose that WMC mediates the relation between musical training and preference for musical complexity.

The current study

The purpose of the current study was to test a model of musical preference in which the relation between musical training and preference for complexity is explained by WMC. This model was tested in a sample of 234 participants who were assessed for their musical experience, WMC (tone, symmetry, and operation span), and preference for various genres of music, as well as a variety of demographic characteristics.

We predicted that musical training would be positively correlated with both WMC and preference for musical complexity, and that WMC would significantly mediate the association between musical training and preference for musical complexity. Here we operationally defined preference for musical complexity via an individual’s preference for genres as categorized by the Short Test of Musical Preferences or STOMP (Rentfrow & Gosling, 2003). Additionally, we investigated whether WMC exerts its effects in a domain-specific (i.e., auditory) or domain-general fashion, and whether the model was affected by demographic variables that have been shown in previous studies to impact engagement with musical activities (Corrigall & Schellenberg, 2015; Corrigall, Schellenberg, & Misura, 2013). Our study plan was pre-registered on the Open Science Framework, and can be viewed at https://osf.io/x97bv.

Methods

Dataset

The dataset analyzed in the present study has been reported in Baker et al. (2018). A total of 254 students enrolled at Louisiana State University were recruited. Participants were paid $15, volunteered, or participated for course credit. After screening for participant eligibility and data collection error (see Baker et al., 2018), 242 participants met the criteria for inclusion. The eligible participants were between the ages of 17 and 38 years (M = 20.64, SD = 3.23) and included 76 men and 165 women (one person did not identify gender). Participants’ formal years of musical training were between 0 and 21 (M = 4.71, SD = 4.58) and years of learning music theory were between 0 and 21 (M = 2.24, SD = 3.45). The full data file (WmMusicII_OSF.csv) and data dictionary (wmMusicII_datadictionary.csv) are available at https://osf.io/sw4qc/. Of the 242 participants who met criteria, some were missing data on a single task (e.g., a WMC measure lost to computer error); the final set with complete data on all measures was N = 234 (ParticipantDetails.csv, https://osf.io/mkvs5/).

Variables of interest

As reported in Baker et al. (2018), participants completed a large battery of tasks, lasting a total of approximately 90 min. For the current study, we analyzed a subset of variables from this battery, including musical training, WMC, preference for musical complexity, and demographics. Operational definitions for each of these variables are described below.

Musical training

Musical Training was measured using the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen, Gingras, Musil, & Stewart, 2014). The Gold-MSI was developed to measure various aspects of musical sophistication, independent of preference for particular musical styles, and includes five facets: active musical engagement, self-reported perceptual abilities, musical training, self-reported singing abilities, and sophisticated emotional engagement with music.

For the musical training subscale of the Gold-MSI, participants self-reported responses on a 7-point Likert scale to items such as “I have had __ years of formal training on a musical instrument (including voice) during my lifetime” and “I have had formal training in music theory for __ years.” For our analysis, musical training was defined using participants’ total score on this subscale of the Gold-MSI, coded according to Müllensiefen et al. (2014).

Working memory capacity (WMC)

To assess working memory, participants completed three complex span tasks adapted from Unsworth, Heitz, Schrock, and Engle (2005). In the Tone Span task, participants completed math judgments and recalled a sequence of tones. In each math judgment, participants saw an arithmetic problem and had to determine whether the solution presented was true or false. After each math operation, a high, medium, or low tone was presented aurally for 1,000 ms. The three tones were played at frequencies outside of the equal tempered system (200 Hz, 375 Hz, and 702 Hz; after Li, Cowan, & Saults, 2013). These frequencies were chosen to avoid familiarity with the pitches of Western tonality. During recall, participants were asked to recall the order of high, medium, and low tones (no time limit). The test procedure included three trials of each list length (three to seven tones), with a maximum score of 75 (Baker et al., 2018). Because Tone Span focuses on recall of auditory stimuli, we used scores on this task as our operational definition for auditory WMC.

In the Symmetry Span task, participants completed symmetry judgments and recalled a sequence of locations of a red square. The symmetry judgment was performed on an 8 × 8 matrix with random squares filled with black; participants were required to judge whether the black square pattern was vertically symmetrical. After each symmetry judgment, a red square was presented on a 4 × 4 matrix for 650 ms. During recall, participants were asked to recall in order the location of each red square (no time limit). The test procedure included three trials of each list length (two to five red squares), with a maximum score of 42 (Baker et al., 2018). Because Symmetry Span does not focus on recall of auditory stimuli, we used scores on this task to test questions regarding domain-specificity of working memory capacity in our mediation model.

In the Operation Span task, participants completed a math judgment and recalled a sequence of letters. In each math judgment, participants saw an arithmetic problem and had to determine whether the solution presented was true or false. After each math operation, a letter was presented visually for 1,000 ms. During recall, participants saw a 4 × 3 matrix of all possible letters, and were asked to recall the order of letters in the sequence. The test procedure included three trials of each list length (three to seven letters), with a maximum score of 75 (Baker et al., 2018). Because Operation Span does not focus on recall of auditory stimuli, we used scores on this task to test questions regarding domain-specificity of WMC in our mediation model.
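The maximum scores reported for the three span tasks follow from the list lengths and the number of trials. The short Python sketch below reproduces that arithmetic. It assumes one point per correctly recalled item; the text gives only the maxima, not the scoring rule. The function name max_span_score is ours, not part of the original task software.

```python
def max_span_score(min_len, max_len, trials_per_length=3):
    """Highest possible score, assuming one point per correctly recalled item.

    Every list length from min_len to max_len (inclusive) is presented
    trials_per_length times.
    """
    return trials_per_length * sum(range(min_len, max_len + 1))


tone_max = max_span_score(3, 7)       # 3 * (3 + 4 + 5 + 6 + 7)
symmetry_max = max_span_score(2, 5)   # 3 * (2 + 3 + 4 + 5)
operation_max = max_span_score(3, 7)  # same design as Tone Span

print(tone_max, symmetry_max, operation_max)  # prints: 75 42 75
```

Under this assumption the computed maxima match the values of 75, 42, and 75 given above.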

Preference for musical complexity

Preference for musical complexity was measured using the Reflective and Complex dimension of the STOMP (Rentfrow & Gosling, 2003). The STOMP is a 14-item scale assessing preferences in music genres across four broad music preference dimensions: Reflective and Complex, Intense and Rebellious, Upbeat and Conventional, and Energetic and Rhythmic. The Reflective and Complex dimension of the STOMP includes participants’ preference for jazz, folk, classical, and blues, each rated on a 7-point Likert scale (1 = strongly dislike, 7 = strongly like). To obtain a measure of Preference for Musical Complexity, we averaged participant scores for the four genres in the Reflective and Complex dimension.
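As an illustration of this scoring, the following Python sketch averages one participant's ratings for the four Reflective and Complex genres. The dictionary layout, the function name, and the example ratings are hypothetical and are not taken from the study's data files.

```python
from statistics import mean

# Genres in the Reflective and Complex dimension, as listed in the text.
REFLECTIVE_COMPLEX_GENRES = ("jazz", "folk", "classical", "blues")


def preference_for_complexity(ratings):
    """Mean rating (1-7) across the four Reflective and Complex genres.

    `ratings` maps genre name -> rating; other genres are ignored.
    """
    values = [ratings[genre] for genre in REFLECTIVE_COMPLEX_GENRES]
    if any(not 1 <= value <= 7 for value in values):
        raise ValueError("ratings must lie on the 1-7 scale")
    return mean(values)


# Hypothetical participant; "pop" belongs to another dimension and is ignored.
example = {"jazz": 6, "folk": 4, "classical": 7, "blues": 5, "pop": 3}
score = preference_for_complexity(example)
print(score)  # prints: 5.5
```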

Demographic variables

The current study also considered participant demographics that have previously been shown to be associated with engagement in musical activities, including age, gender, and socioeconomic status (Corrigall & Schellenberg, 2015; Lima, Correia, Müllensiefen, & Castro, 2018). Socioeconomic status (SES) was an aggregate measure composed of parental education and family income. Participant education was not included since there was very little variability in the current college student sample. Participants reported education for each parent by selecting one of eight categories, with a ninth code for not applicable: 1 = some high school, 2 = completed high school, 3 = some associates/vocational program (e.g., AA or AS), 4 = completed associates/vocational program, 5 = some undergraduate degree (BA, BS, BM), 6 = completed undergraduate degree, 7 = some graduate degree (PhD, JD, MD, MA), 8 = completed graduate degree, 9 = NA (Corrigall & Schellenberg, 2015). Participants raised by one parent reported 9 = NA for the absent parent. Participants reported family income by selecting one of nine categories: 1 = less than $25,000, 2 = between $25,000 and $50,000, 3 = between $50,000 and $75,000, 4 = between $75,000 and $100,000, 5 = between $100,000 and $125,000, 6 = between $125,000 and $150,000, 7 = between $150,000 and $175,000, 8 = between $175,000 and $200,000, 9 = greater than $200,000. SES was calculated as the sum of each participant’s family income category and each parent’s education level. For participants who reported only one parent’s level of education, that parent’s education level was counted twice for the calculation of SES.
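The SES composite can be sketched in a few lines of Python. This is an illustration of the rule described above, not the authors' analysis code. The function name ses_score and the example values are hypothetical, and the handling of a participant with no reported parental education (raising an error) is our assumption.

```python
EDUCATION_NA = 9  # code for an absent parent on the education item


def ses_score(income, parent1_education, parent2_education):
    """Sum of the family income category (1-9) and both parents' education (1-8).

    If only one parent's education is reported, it is counted twice.
    """
    reported = [level for level in (parent1_education, parent2_education)
                if level != EDUCATION_NA]
    if not reported:
        # Assumption: the text does not say how this case was handled.
        raise ValueError("no parental education reported")
    if len(reported) == 1:
        reported = reported * 2  # count the single parent twice
    return income + sum(reported)


print(ses_score(4, 6, 8))             # prints: 18
print(ses_score(4, 6, EDUCATION_NA))  # prints: 16
```

Note that the code 9 means "greater than $200,000" on the income item but "NA" on the education item, so only the education codes are filtered.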

Analysis procedure

Our predictions and analysis plan were pre-registered and can be viewed at https://osf.io/x97bv, along with our analysis code at https://osf.io/7jz8n/. All analyses were carried out using R (R Core Team, 2018).

Predicted mediation

As a first step, we confirmed previous research suggesting that there exist significant bivariate associations between all pairwise combinations of Musical Training, auditory WMC, and Preference for Musical Complexity. Next, a mediation analysis was conducted (following Hayes, 2017) to test our hypothesized model [1] in which WMC mediates the relation between Musical Training and Preference for Musical Complexity. Models in this form (Fig. 1) are henceforth referred to as “predicted models.”

Fig. 1 Schematic of the predicted mediation model, with coefficients italicized

The total effect of Musical Training on Preference for Musical Complexity, c, can be partitioned into the direct effect of Musical Training on Preference for Musical Complexity (c’) and the indirect effect of Musical Training on Preference for Musical Complexity, through WMC (ab), such that c = ab + c’ (Fig. 1). Of particular interest in our analysis was whether the indirect (i.e., mediating) effect ab was a significant predictor in modeling Preference for Musical Complexity. Significance testing of the mediating effect of Musical Training → WMC → Preference for Musical Complexity (quantified by ab) was carried out via a bootstrapping analysis with 5,000 iterations (Canty & Ripley, 2019; Davison & Hinkley, 1997). We investigated four versions of the predicted mediation, as follows.

  • Mediation [1] tested a domain-specific model in which auditory WMC mediated the relation between Musical Training and Preference for Musical Complexity.

  • Mediation [2] explored whether demographic variables (age, gender, SES) affected mediation by calculating three regression models in which each of Musical Training, Auditory WMC, and Preference for Musical Complexity were predicted by age, gender, and SES. From these models, we then extracted residualized variables, Musical Trainingres, Auditory WMCres, and Preference for Musical Complexityres, which represent the variability in each respective variable that is not predicted by age, gender, and SES. For Mediation [2] these residual variables were then analyzed in the same fashion as the original variables from Mediation [1]. In accordance with our pre-registration, we report on a model that considers fluid intelligence (a composite of standardized scores on Raven’s Progressive Matrices; Raven, Raven, & Court, 1998; and Number Series; Thurstone, 1938), beat perception, and melodic memory in addition to these demographics, in supplementary material.

  • Mediations [3] and [4] tested whether the role of working memory capacity in our mediation model was domain-specific or domain-general. Thus, Mediation [3] replaced Auditory WMC with Symmetry Span, and Mediation [4] replaced Auditory WMC with Operation Span.
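The partition c = ab + c’ and the bootstrap test of the indirect effect can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' R code. The percentile confidence interval is an assumption on our part, since the text does not state which interval type was used; the data are synthetic, the function names are hypothetical, and the residualization step of Mediation [2] is not shown.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Least-squares estimates (a, b, c_prime, c) for the model x -> m -> y.

    a: m regressed on x; b and c_prime: y regressed on m and x; c: y on x.
    """
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    det = var_x * var_m - cov_xm ** 2
    a = cov_xm / var_x
    b = (cov_my * var_x - cov_xy * cov_xm) / det
    c_prime = (cov_xy * var_m - cov_my * cov_xm) / det
    c = cov_xy / var_x
    return a, b, c_prime, c


def bootstrap_indirect(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        a, b = mediation_paths([x[i] for i in idx],
                               [m[i] for i in idx],
                               [y[i] for i in idx])[:2]
        estimates.append(a * b)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data standing in for Musical Training (x), WMC (m), Preference (y).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime, c = mediation_paths(x, m, y)
lo, hi = bootstrap_indirect(x, m, y, n_boot=1000)  # 5,000 in the study
print(f"ab = {a * b:.3f}, c' = {c_prime:.3f}, c = {c:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Under this approach the indirect effect is treated as significant when the interval excludes zero. Because all three regressions are fit to the same cases, the identity c = ab + c’ holds exactly for the least-squares estimates.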

Order-switched mediation

Following this, we tested the causal order of the mediation by assessing a model in which the positions of Musical Training and WMC were switched [2]. Our prediction was that the indirect effect (ab) would be significant for our predicted mediations (Fig. 1), but not for these order-switched mediations. Models in this form (Fig. 2) will henceforth be referred to as “order-switched models.” We investigated four order-switched mediations. Mediations [5], [6], [7], and [8] were order-switched versions of predicted mediations [1], [2], [3], and [4], respectively.
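For clarity, the two model forms can be written as pairs of linear regressions. The notation below is ours and is offered only as a sketch: T denotes Musical Training, W denotes WMC, P denotes Preference for Musical Complexity, and tildes mark the order-switched coefficients.

```latex
% Predicted model: T -> W -> P
\begin{align}
W &= i_1 + a\,T + e_1 \\
P &= i_2 + c'\,T + b\,W + e_2
\end{align}

% Order-switched model: W -> T -> P
\begin{align}
T &= \tilde{i}_1 + \tilde{a}\,W + \tilde{e}_1 \\
P &= \tilde{i}_2 + \tilde{c}'\,W + \tilde{b}\,T + \tilde{e}_2
\end{align}
```

The second equation is the same regression of P on T and W in both cases, so under least squares \tilde{b} = c' and \tilde{c}' = b. The two indirect effects, ab and \tilde{a}\tilde{b}, therefore differ through the first-stage slope and through which second-stage coefficient plays the role of the b path.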

Fig. 2 Schematic of the order-switched mediation model, with coefficients italicized

Results

Table 1 displays the bivariate correlations between the eight variables assessed in the following mediation models. As expected, there is a high degree of intercorrelation among our variables, providing a reasonable starting point for exploring the structure of this overlapping variance via mediation modeling.

Table 1 Bivariate correlations between major variables of interest

Predicted models

Our predicted models hypothesized that working memory capacity (WMC) mediated the relation between musical training and preference for musical complexity (Fig. 1). The parameters for these predicted models are reported in Table 2. Bootstrapped significance testing of the indirect effect (ab) indicated that there was no significant mediation for any version of the predicted model, regardless of whether the measure of WMC was domain-specific or -general (models 1, 3, 4), or whether demographic covariates were taken into account (model 2, see also Supplementary Table 2).

Table 2 Predicted mediation model parameters

Order-switched models

Our order-switched models were structured such that musical training mediated the relation between WMC and preference for musical complexity (Fig. 2). The parameters for these order-switched models are reported in Table 3. Bootstrapped significance testing of the indirect effect (ab) indicated that there was significant mediation for all versions of the order-switched model, regardless of whether the measure of WMC was domain-specific or -general (models 5, 7, 8), or whether demographic covariates were taken into account (model 6, see also Supplementary Table 2).

Table 3 Order-switched mediation model parameters

Discussion

The current study tested a model in which auditory WMC mediated the positive relation between musical training and preference for musical complexity. Confirming previous research, Musical Training, Auditory WMC, and Preference for Musical Complexity were significantly intercorrelated. Contrary to our predictions, Auditory WMC did not mediate the relation between Musical Training and Preference for Musical Complexity. Rather, we found that Musical Training mediated the relation between Auditory WMC and Preference for Musical Complexity. Furthermore, this significant mediation persisted when inter-participant demographic differences were taken into account. Finally, the role of WMC in this model seems to be domain-general, as the pattern of mediation observed for WMC was consistent across measures, whether domain-congruent (Auditory WMC) or incongruent (Symmetry Span or Operation Span).

To our knowledge, this is the first study to model the intercorrelations among musical training, WMC, and aesthetic preference, and in doing so, extends work on visual aesthetics (Reber et al., 1998, 2004; Sherman et al., 2015) to auditory processing. The current project makes important connections between generally disconnected bodies of work within audition that seek to understand the effect of musical training on cognition, on the one hand, and affect, on the other.

Interestingly, our modeling indicates that the interrelations between Musical Training, Auditory WMC, and Preference for Musical Complexity do not take the form predicted by the visual literature (i.e., Auditory WMC mediating the association between Musical Training and Preference for Musical Complexity). Rather, our analyses revealed that musical training mediated the positive association between WMC and preference for musical complexity. One straightforward interpretation would be that people with larger auditory WMCs tend to excel at musical training, and that the extensive engagement with complex forms of music provided by this training leads to increased appreciation for these forms. However, we cannot clearly determine directionality given the cross-sectional nature of these data, and are further limited by our restricted sampling (Henrich et al., 2010).

Similar theoretical interpretations have been proposed by Schellenberg and colleagues. For instance, Corrigall, Schellenberg, and colleagues (Corrigall & Schellenberg, 2015; Corrigall, Schellenberg, & Misura, 2013) argued that the cognitive benefits that are associated with musical training can be explained in part by pre-existing differences in personality and cognition that attract children and their parents to music lessons, and retain their interest in musical education. In line with the current findings, Corrigall et al. (2013) also reported that the relation between cognitive ability and musical involvement remained even when demographics were controlled.

Still, the current findings pose important questions regarding the correspondence, or lack thereof, between visual and auditory models of aesthetic preference (Sherman et al., 2015). One potential explanation for this discrepancy is that we used a different operational definition for aesthetic complexity (i.e., the STOMP) than previous visual studies (i.e., feature density, expert ratings). In addition to exploring converging definitions of complexity, future work should seek to better understand how differences between training in visual versus auditory art might interact with WMC, and how these differences might cause divergent outcomes in patterns of aesthetic preference. For example, individuals with higher scores on measures of WMC may be able to process more information in real time, leading to faster comprehension of structures intrinsic to the art. A larger WMC, and a related larger window of attention, might also afford preferences for slower tempi in expressive music since the individual would be able to simultaneously track multiple, related musical events unfolding over a longer time span. Pursuing this further might also provide a fruitful investigation into the interaction of relative musical time (tempo choices) given perceptual and cognitive constraints of the listener (Lerdahl, 1992).

Despite this lack of convergence between patterns of visual and auditory preference, we did observe a domain-general effect of WMC in our mediation models. As might be expected, Auditory WMC was more highly correlated with Musical Training and Preference for Musical Complexity than were Symmetry Span or Operation Span. Nevertheless, we observed significant mediation of the association between WMC and preference for musical complexity by musical training regardless of whether WMC was measured auditorily or visuospatially (Unsworth et al., 2005). Thus, people with higher scores on measures of WMC, regardless of the domain, tended to engage in more musical training, which was related to a higher preference for musical complexity. This finding is consistent with previous research showing that the core construct in WMC is domain-general, rather than domain-specific (e.g., Chein, Moore, & Conway, 2011; Kane et al., 2004).

In summary, the current study used mediation analysis to investigate the interrelations between musical training, WMC, and aesthetic preference. Diverging from predictions based on vision research, we found that musical training mediated the association between WMC and preference for musical complexity. These results illuminate differences between vision and audition when it comes to the multifaceted effects of complex skills training on cognition and affect. Moreover, they motivate new work aimed at better understanding how domain-general constructs such as WMC might interact with domain-specific cognition.

Open Practices Statement

Data and analysis code from the current study are available via the Open Science Framework (OSF) and can be accessed at https://osf.io/x97bv. Materials used in the current study were all developed by other research groups, and can be accessed via the references we have provided in the text of this article. The experiment reported here was pre-registered on OSF prior to beginning analysis, and our research was conducted following OSF’s pre-registration guidelines and requirements.