Investigating the interaction of pleasantness and arousal and the role of aesthetic emotions on episodic memory using a musical what-where-when paradigm

ABSTRACT Recent evidence suggests that the presence of music experienced as pleasant positively influences episodic memory (EM) encoding. However, it is unclear whether this impact of pleasant music holds regardless of how arousing the music is, and what influence, if any, music-induced aesthetic emotions have. Furthermore, most music EM studies have used verbal or facial memory tasks limiting the generalisability of these findings to everyday EMs with spatiotemporal richness. The current study used an online what-where-when paradigm to assess music's influence on EM encoding in a rich spatiotemporal environment. 105 participants carried out the what-where-when task in the presence of either silence or one of four musical stimuli falling into the four corners of the 2-D circumplex emotion model. We observed an interaction effect between the pleasantness and arousingness of music stimuli, whereby for pleasant stimuli, the low arousing excerpt was associated with better recall performance compared to the high arousing excerpt. We also observed that, across all excerpts, experience of negative aesthetic emotions was associated with compromised recall performance. Together, our results confirm the deleterious influence arousing stimuli can have on memory and support the notion that aesthetic-emotional experience of music can influence how memories are encoded in everyday life.

Does the music that accompanies us in our everyday lives influence how well we later remember associated events? Music is a stimulus that is present during many waking moments and most important life events, and which is able to induce a rich variety of emotions (Omigie, 2016). However, music's influence on episodic memory (EM) encoding, as a function of the emotions it induces in listeners, remains poorly understood.
A number of studies have shown superior recall performance in verbal EM tasks when highly pleasant music is present compared to both silence (Ferreri et al., 2013, 2015) and less pleasant music (Cardona et al., 2020). This influence of pleasant music has been attributed to reward-driven activation of midbrain dopaminergic neurons (Cardona et al., 2020; Ferreri et al., 2021), which in turn increases long-term potentiation in the hippocampus (Lisman et al., 2011).
However, studies on music and EM to date have been largely limited in two ways. Firstly, as few music EM studies have gone beyond a comparison of performance when EM encoding takes place in the presence of silence and pleasant or unpleasant music, there are outstanding questions related to any interaction effects that may exist between felt pleasantness and arousal. Critically, as music with high arousal potential is held to have detrimental effects on cognitive task performance (Kiger, 1989), such musical stimuli, even if experienced as pleasant, may be expected to negatively impact EM. Further, and perhaps more importantly, no studies to date have examined how aesthetic emotions like nostalgia, curiosity, distaste or boredom may influence EM (Schindler et al., 2017) despite evidence that such emotions are reliably induced by music (Barrett & Janata, 2016; Omigie & Ricci, 2021). Critically, given that epistemic emotions like curiosity may lead to improvement in memory for incidental information (Gruber & Ranganath, 2019), while negative emotions like boredom may, in contrast, have detrimental effects (Blondé et al., 2022; Goldberg & Todman, 2018), it seems highly relevant to examine the influence of such complex music-induced emotions on EM.
Secondly, another limitation of existing studies is their reliance on tests that lack the spatiotemporal richness that is characteristic of everyday EMs. EM encoding crucially involves an observer having a first-person perspective and being in a context with temporal, spatial, and perceptual information. To recreate such experiences, so-called "What-Where-When" (WWW) EM paradigms have been developed (Russell, 2014). In real-world embodied versions of these paradigms, participants may be required to hide objects around a room in different encoding sessions, and then later recall which objects they hid, where they hid them and during which session (Smulders et al., 2017). In alternative versions of this paradigm, participants may carry out a computerised version of the task (Silva et al., 2020) or watch experimenters move objects around a physical grid space (Varela et al., 2021).
In contrast to paradigms of EM that fail to take integration of object, location and temporal information into account, WWW tasks successfully measure such memory integration processes. Indeed, neuroimaging (Cheke et al., 2017) has demonstrated the importance of the left hippocampus and left angular gyrus activity during WWW integration alongside the importance of other parietal (precuneus) and frontal areas (dorsolateral prefrontal cortex) for the three different memory components. Critically, based on the neo-Hebbian framework which suggests that reward-driven dopamine transmission in the hippocampus promotes episodic memory formation (Lisman et al., 2011), as well as recent findings regarding music-induced pleasure and memory performance (Ferreri et al., 2021; Ferreri & Rodriguez-Fornells, 2022), it may be expected that music heard during encoding will influence performance on the what-where-when tasks.
To date, no study has taken advantage of WWW paradigms to explore how emotions induced by music may influence EMs in everyday life. The aims of the present study were thus two-fold: Firstly, it sought to comprehensively investigate the influence of music-induced emotion on EM, by examining both how the arousingness of a stimulus interacts with its pleasantness to influence EM, and the extent to which experience of aesthetic emotions is reflected in EM performance. Secondly, it sought to examine these effects in a rich spatiotemporal context that previous studies of music and EM have failed to employ.
Accordingly, we had participants carry out a computerised version of a WWW test, whereby participants were required, with music or silence in the background, to drag objects over images of two rooms, and then, in a silent recall phase, to indicate (what) objects they dragged (where), and in which of the two rooms (when). Here it is important to note that the concept of when in our study is intimately entangled with that of which given that i) our study only utilised a short gap between the two sessions, and ii) the two "times"/sessions during which our participants hid objects were two different contexts/rooms (rather than a single context/room). We therefore revisit this fact in our discussion.
In any case, we predicted that recall would be more accurate for those participants who carried out the task in the presence of pleasant music (compared to silence and low pleasant music) in line with previous findings (Cardona et al., 2020; Ferreri et al., 2015) but that arousal would play an important role whereby within the pleasant music conditions, the high arousing condition would lead to poorer performance than the low arousing condition (Kahneman, 1973; Kiger, 1989). Further, we predicted that certain aesthetic emotions such as epistemic curiosity may lead to enhanced memory performance (Gruber & Ranganath, 2019), while negative aesthetic emotions, including boredom, may lead to reduced performance (Blondé et al., 2022; Goldberg & Todman, 2018).

Materials and methods
Participants
105 individuals (71 male, 1 unreported; mean age = 25.9, range 18-50) participated in the experiment: 17 in the control condition, 24 in the HPHA (high pleasantness-high arousal) condition, 21 in the HPLA (high pleasantness-low arousal) condition, 22 in the LPHA (low pleasantness-high arousal) condition, and 21 in the LPLA (low pleasantness-low arousal) condition. Power analysis revealed that to detect a medium effect size of d = .4, a total sample size of 93 participants would be needed for 80% power. Therefore, a total sample size of 100 participants was the recruitment goal.

Materials
Music stimuli were selected from the DEAM dataset (Soleymani et al., 2018), in which static annotations of felt valence and arousal are provided for a variety of musical clips. 20 non-lyrical stimuli belonging to classical or classical-fusion genres with extreme arousal and valence scores were identified and validated in terms of their felt arousal and pleasantness ratings using a pre-test survey we conducted (where 23 participants were instructed to identify how pleasant the clip made them feel, and how arousing the music was for them). Each 45 s clip was then looped once, such that musical excerpts (see Supplementary Materials for details) were all 90 s in length.
The experimental task, implemented in PsychoPy (Peirce et al., 2019), was a computerised version of the real-world WWW test (Smulders et al., 2017) such that rather than hide physical objects in cued real-world destinations, participants dragged icons of objects on their computer screens to designated destinations over images of two different living rooms. Objects were to be dragged to pre-determined destinations to replicate as closely as possible the procedure implemented by Smulders and colleagues (2017) as well as to ensure that those destinations made sense spatially (i.e. that objects could indeed be placed in those destinations in a real-world three-dimensional space).
Participants provided self-report experience of their aesthetic emotions using the AESTHEMOS questionnaire (Schindler et al., 2017), which groups aesthetic emotions into 7 dimensions, namely negative, prototypical, epistemic, animation, nostalgia/relaxation, sadness, and amusement aesthetic emotions. (See Supplementary Materials for all AESTHEMOS items and their dimension groupings).

Procedure
Before beginning, participants were instructed to complete the task in a place without distraction. Participants completed a practice encoding and recall session (with no music) to acclimate to the procedure. Prior to the two encoding sessions (corresponding to images of two different living rooms), participants were told that they would be cued objects to drag to destinations designated on screen, and would later once more need to drag those objects to the correct destinations in the correct rooms. During each encoding session, participants were shown a static image of a living room with twenty everyday objects at the top of the screen; the same objects appeared in the two rooms/encoding sessions. After 10 s of silence, the image of the first object to be dragged appeared in the corner of the screen, and a red circle indicated where the object should be dragged to (see Figure 1). After 12 s, each new object-destination pair (a total of 6 pairs in each encoding session) was presented and participants had to drag objects to destinations accordingly. All object-destination pairs were randomised so that each participant had 6 unique object-destination pairings, which were cued in unique order, for each encoding session.
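The randomisation of object-destination pairings described above can be illustrated with a short Python sketch. It is not the study's PsychoPy code: the object and destination labels, the number of candidate destinations per room (10) and the function name `make_session_pairs` are all hypothetical, and it assumes that pairings are drawn independently for each encoding session. Only the counts of 20 objects and 6 pairs per session come from the text.

```python
import random

N_OBJECTS = 20          # stated in the text
PAIRS_PER_SESSION = 6   # stated in the text
N_DESTINATIONS = 10     # hypothetical: the text does not give this number


def make_session_pairs(objects, destinations, n_pairs, rng):
    """Draw n_pairs distinct objects and n_pairs distinct destinations.

    rng.sample returns its selection in random order, so the returned
    list also serves as the (randomised) cueing order for the session.
    """
    chosen_objects = rng.sample(objects, n_pairs)
    chosen_destinations = rng.sample(destinations, n_pairs)
    return list(zip(chosen_objects, chosen_destinations))


# Hypothetical labels, for illustration only.
objects = [f"object_{i}" for i in range(N_OBJECTS)]
destinations = [f"destination_{i}" for i in range(N_DESTINATIONS)]

rng = random.Random(2023)  # seeded so one participant's schedule is reproducible
schedule = {
    session: make_session_pairs(objects, destinations, PAIRS_PER_SESSION, rng)
    for session in (1, 2)
}
for session, pairs in schedule.items():
    print(session, pairs)
```

Seeding one generator per participant is a design choice made here so that a schedule can be regenerated at analysis time; the source does not say how the randomisation was stored.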
Participants completed both memory encoding sessions while presented with the excerpt corresponding to their randomly assigned experimental condition (HPHA, HPLA, LPHA, LPLA, silence). Between the two encoding sessions was a 90-second break during which participants were instructed to fixate on a cross on the screen and continue listening to the same audio. After both encoding sessions, participants completed the AESTHEMOS questionnaire with regard to their experience of the excerpt heard (Schindler et al., 2017), or if they were in the silence condition, with regard to the way that they felt during the task. An item was included ("Please select 4") to identify any participants who were not completing the survey with sufficient attention and who would thus need to be excluded prior to analysis.
During the recall phase, participants were shown each room with the same 20 objects presented in two rows at the top of the screen, and were instructed to once more drag previously dragged objects to the destinations that they remembered initially dragging them to. Participants were instructed to drag i) objects to anywhere on the screen if they remembered the objects but not the related destinations, and ii) any object to a given destination, if they remembered the destination but not the object associated with that destination. At this time, participants were also asked to provide ratings of how certain they felt about the accuracy of their memory, and how vivid the recollection was (see Figure 1 for an overview of the study design map and Supplementary Materials for complete task instructions).

Analysis
For each participant, a total recall score was determined as follows: 2 points for each correct object-destination-encoding session (room 1 or 2) identification, 1 point for only remembering an object in the correct encoding session (room 1 or 2), and 1 point for only remembering a destination in the correct encoding session. Within each session there was a possible score of 12 points, for an overall total possible score of 24 points across the two sessions, and a maximum score of 24 could only be achieved if the participant correctly dragged each object (what) to its correct destination (where) during the correct encoding session/room (when).
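The scoring rule above can be made concrete with a minimal Python sketch. The `Placement` type, the `score_recall` function and the example labels are hypothetical and are not taken from the study's materials. The text does not say whether the two 1-point partial credits can be combined for the same pairing, so this sketch assumes that partial credit is capped at 1 point per encoded pairing.

```python
from typing import NamedTuple, Optional


class Placement(NamedTuple):
    """One object placement in one session (hypothetical representation)."""
    session: int                 # 1 or 2, i.e. room 1 or room 2 ("when")
    obj: str                     # object label ("what")
    destination: Optional[str]   # destination label ("where"); None if the
                                 # object was dropped at no cued destination


def score_recall(encoded, recalled):
    """Score recalled placements against the encoded pairings.

    2 points: object, destination and session all reproduced together.
    1 point:  only the object, or only the destination, appears in the
              correct session (assumed cap of 1 partial point per pairing).
    """
    exact = set(recalled)
    objects_seen = {(p.session, p.obj) for p in recalled}
    destinations_seen = {
        (p.session, p.destination)
        for p in recalled
        if p.destination is not None
    }
    total = 0
    for pair in encoded:
        if pair in exact:
            total += 2
        elif ((pair.session, pair.obj) in objects_seen
              or (pair.session, pair.destination) in destinations_seen):
            total += 1
    return total


# Illustrative data only (not study data).
encoded = [
    Placement(1, "key", "shelf"),
    Placement(1, "cup", "table"),
    Placement(2, "book", "sofa"),
]
recalled = [
    Placement(1, "key", "shelf"),   # fully correct: 2 points
    Placement(1, "cup", None),      # object only: 1 point
    Placement(2, "lamp", "sofa"),   # destination only: 1 point
]
print(score_recall(encoded, recalled))  # 4
```

With 6 pairings per session and two sessions, a fully correct recall yields 12 x 2 = 24 points under this rule, matching the maximum stated above.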
To evaluate the effect of arousal and pleasantness on memory recall, a between-subject 2 (pleasantness: high, low) x 2 (arousal: high, low) ANOVA was estimated with EM recall score as the DV. Planned post-hoc one-way ANOVAs were carried out to examine the effect of arousal at low and high pleasantness levels, while a series of Bonferroni-corrected t-tests were conducted to compare each of the four music conditions to the control silent condition. To evaluate the effect of arousal and pleasantness on vividness, we carried out an ordinal logistic regression with pleasantness (high, low) and arousal (high, low) as predictor variables. The same ordinal logistic regression modelling approach was used to evaluate the effect of arousal and pleasantness on certainty.
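For readers unfamiliar with the Bonferroni correction applied to the four music-versus-silence comparisons, the following Python sketch shows the adjustment itself. The function name `bonferroni` and the example p-values are hypothetical; the p-values are made up for illustration and are not results from this study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each raw p-value.

    Each raw p-value is multiplied by the number of comparisons and
    capped at 1.0; the adjusted value is then compared against alpha.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        p_adjusted = min(1.0, p * m)
        results.append((p_adjusted, p_adjusted < alpha))
    return results


# Made-up p-values for four comparisons against a control condition.
example_p = [0.010, 0.030, 0.200, 0.600]
for raw, (adjusted, significant) in zip(example_p, bonferroni(example_p)):
    print(f"raw p = {raw:.3f}, adjusted p = {adjusted:.3f}, significant = {significant}")
```

Equivalently, each raw p-value can be compared against alpha divided by the number of comparisons (here 0.05 / 4 = 0.0125).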
Finally, to examine which, if any, aesthetic emotion dimensions influenced recall, we computed a stepwise multiple linear regression model where a stepwise search with all 7 dimensions of the AESTHEMOS scale as predictors was performed in both directions. Two ordinal logistic regression models were also estimated to examine the influence of the 7 AESTHEMOS dimensions on certainty and vividness ratings in turn.

Results
[...] analysis of EM performance as a function of aesthetic emotions, a further 7 participants were removed for failing the attention check (leaving 15 in the silent, 19 in the HPHA, 18 in the HPLA, 19 in the LPLA and 19 in the LPHA condition). 3 data points identified as outliers, using the interquartile range criterion, were replaced with NAs (but note a similar pattern of results was observed without the removal of these data).
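The interquartile range criterion mentioned above can be sketched in Python as follows. The text does not state the multiplier or the quantile method used, so this sketch assumes the conventional 1.5 x IQR fences and the default ("exclusive") method of `statistics.quantiles`. The function name `mask_iqr_outliers` and the example scores are hypothetical.

```python
import statistics


def mask_iqr_outliers(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None.

    None plays the role of NA here, so the masked observations stay in
    place and the list keeps its original length and ordering.
    k = 1.5 is an assumed convention, not a value given in the text.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v if lower <= v <= upper else None for v in values]


# Made-up scores for illustration only (not study data).
scores = [10, 12, 11, 13, 12, 14, 11, 30]
print(mask_iqr_outliers(scores))
```

Masking rather than deleting the flagged values keeps each participant's row available for analyses on other variables.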
Finally, there was no influence of pleasantness or arousal on the vividness or certainty with which memories were relived. Nor was there any difference in vividness or certainty between any of the four music conditions and silence (all ps > 0.05).

Effect of aesthetic emotions on memory recall
Figure 2B shows recall score as a function of the aesthetic emotions induced. A stepwise multiple linear regression model (both directions) with all 7 dimensions as predictors was run. A significant regression equation was found (F(1, 71) = 8.90, p = 0.003) with an R squared value of 0.11. The model revealed negative emotions as the sole significant predictor of recall scores whereby high negative aesthetic emotion was associated with worse recall (B = −1.89, SE = 0.63, t = −2.98, p = 0.004). The ordinal logistic regression models did not show vividness or certainty of episodic memory experiences to be significantly influenced by the experience of any of the aesthetic emotion dimensions.
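As a reader's check, the regression statistics reported above are mutually consistent. The worked calculation below uses standard identities for a linear model and assumes that the final stepwise model retained a single predictor, as the degrees of freedom (1, 71) suggest. All numbers are the reported values; small discrepancies reflect rounding.

```latex
R^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2}
    = \frac{8.90 \times 1}{8.90 \times 1 + 71}
    = \frac{8.90}{79.90} \approx 0.11,
\qquad
t = \frac{B}{SE} = \frac{-1.89}{0.63} = -3.00 \approx -2.98,
\qquad
t^2 = (-2.98)^2 \approx 8.88 \approx F .
```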

Discussion
The aims of the present study were two-fold. Firstly, it aimed to examine whether the felt pleasantness of a musical stimulus interacts with its arousingness to influence EM recall, and what role, if any, the subjective experience of aesthetic emotions plays. Secondly, it sought to examine these emotion effects in a rich spatiotemporal context that previous studies of music and EM have failed to employ.
Our results failed to show a main effect of either pleasantness or arousal on memory recall but, as predicted, did show an interaction between them. Follow-up tests showed that while there was no difference as a function of arousal between the low pleasantness conditions, the high pleasantness-low arousal group performed better than the high pleasantness-high arousal group. Our results speak to the potential for music with arousing properties to distract from a task at hand (Kahneman, 1973) and, thus, seem in line with those of a previous study in which participants performed an old-new face recognition task after being presented with auditory stimuli in the encoding phase (Mado Proverbio et al., 2015). There, across a range of musical stimuli of arguably similar pleasantness levels, the higher arousal condition also resulted in poorer performance than the lower arousal one.
One unexpected finding from our study is that how arousing the stimulus was for listeners did not seem to influence performance when felt pleasantness was generally low. We suggest that with such unpleasant stimuli, participants may make an effort to block out the sounds, thus reducing the extent to which features of the stimuli may modulate arousal in the listener. Further work will be needed, however, to test this hypothesis. Finally, that neither of the low pleasantness conditions resulted in significantly lower performance than the high pleasantness-low arousal one, suggests that the high pleasantness-high arousal condition (which did result in significantly lower performance than the high pleasantness-low arousal one) is the single most detrimental for memory recall. However, the possibility that pairwise comparisons between the high pleasantness-low arousal condition and the two low pleasantness conditions did not reach significance due to limited power cannot be ruled out.
With regard to aesthetic emotions, our results demonstrated that negative emotional responses inversely predicted memory recall, such that the more negative a listener felt about the music, the worse their recall score was. Items included within the negative AESTHEMOS dimension include feelings of boredom, distaste, and ugliness, amongst others. As there is evidence of a link between reward and EM encoding (Ferreri et al., 2021), we suggest this finding may be due to the low reward that is experienced from music when negative aesthetic emotions are experienced. We however did not see an effect of epistemic emotions like curiosity (Gruber & Ranganath, 2019), or indeed any of the other AESTHEMOS dimensions, contrary to our initial predictions. One explanation of this pattern of findings is that negative emotions are induced in a more consistent way across participants than other aesthetic emotional dimensions. Further, that neither certainty nor vividness of the memory experience was influenced by the objective nature (pleasantness or arousingness) or the subjective experience (aesthetic emotions) of the music being heard, suggests that the effect music has on listeners' memory recall may not reach their awareness.
Last but not least, we did not see performance in any of the music conditions to be better than performance when encoding took place in silence. The presence of sound has been argued to provide an enriched encoding context, and music's advantage has been attributed to its ability to modulate psycho-physiological arousal and mood (Thompson et al., 2001). However, our failure to show better performance during any of the music conditions compared to silence is consistent with previous work (Mado Proverbio et al., 2015).
At this point, it is interesting to consider the current findings within the broader literature on how emotions influence EM. Previous studies using the circumplex model of emotions have shown that arousing information during an event is preferentially attended to and remembered (LeBlanc et al., 2015) and that negatively valenced (unpleasant) scenes, events and object details are more likely to be remembered than neutral and positively valenced ones (Anderson et al., 2006; Kensinger, 2009). While seemingly contradictory to our findings, it is important to emphasise that those studies described situations where the stimulus to be remembered was also the arousing or negatively valenced stimulus; here the musical stimulus was task-irrelevant.
It is also relevant to discuss the nature of our paradigm; specifically, the ways in which adapting it to become an online behavioural task made it differ from in-person versions of the task. Indeed, in contrast to Smulders and colleagues' (2017) original in-person task, where participants were asked to freely recall object-location associations, our study implemented cued recall, whereby participants were required to link objects to locations while both the objects and spatial context (locations) were presented on screen. This difference of our task from the original is important because a significant body of work suggests cued recall is faster and easier than free recall (Paivio et al., 1994).
As briefly mentioned in the introduction section, two other ways in which our task differed from the original Smulders and colleagues' (2017) in-person task are that our task only utilised a gap of 90 s between the two sessions (while their task integrated a 2-hour break), and further the two "times" during which participants hid objects in our study were two different contexts/rooms (rather than a single context/room). Given the diverse ways with which the "when" component has been implemented in past research, we have kept the original paradigm name in this study. However, due to these two differences from the original in-person task, the present paradigm may alternatively be described as a "what-where-which" paradigm, where, rather than asking when they moved each object, participants ask themselves which room they moved a given object in. This distinction is worth noting given evidence that the cognitive processes and neural circuitry recruited during encoding when versus which contexts differ (Robertson et al., 2015). Future versions of this task would, thus, do well to better discriminate the spatial from the temporal components with respect to episodic memory encoding and retrieval.
While it delivers new insight, our study suffers from a number of other limitations that would be good to address in future work. For instance, as the current study used experimenter-selected stimuli, it is not easy to generalise our findings to the music which people listen to in everyday life: such very familiar music may be able to trigger stronger emotions than the stimuli we used here. Furthermore, while emotions are held to play a particularly significant role at the encoding phase of EM (Kensinger, 2009), it would be highly relevant for future studies to also examine how memory retrieval is influenced when music is present in the recall phase.
Another limitation is that our task required participants to drag the objects to pre-determined destinations; future work that would allow participants to move objects freely (e.g. as in Cheke et al., 2017) would more closely emulate memory encoding in real life, where actions are freely taken and memories are, thus, at least partially, self-initiated. Similarly, as the between-subjects nature of the study limits power, future studies may consider implementing a within-subject design. To accommodate the four conditions, such a design would need at least four rooms or sessions, along with a greater number of objects (at least more than 24; both to allow six distinct objects to be used in the four different rooms, and to allow for extra objects to act as foils).
Finally, while the present paradigm functioned as a valid task of episodic memory, future studies could also include neuropsychological tasks (such as the California Verbal Learning Test (Delis et al., 1987) or Rey Auditory-Verbal Learning Test (Rey, 1964)) alongside the what-where-when procedure to further validate the reported influences of music on episodic memory performance.
In any case, our results confirm the complex influence that music-induced emotions can have on everyday memory, while also showcasing the utility of a novel musical What-Where-When paradigm for testing episodic memory.

Data availability statement
The data that support the findings of this study are available from the corresponding authors, S. Nawaz and D. Omigie, upon reasonable request.