Vicarious punishment of moral violations in naturalistic drama narratives predicts cortical synchronization

Punishment of moral norm violators is instrumental for human cooperation. Yet, social and affective neuroscience research has primarily focused on second- and third-party norm enforcement, neglecting the neural architecture underlying observed (vicarious) punishment of moral wrongdoers. We used naturalistic television drama as a sampling space for observing outcomes of morally-relevant behaviors to assess how individuals cognitively process dynamically evolving moral actions and their consequences. Drawing on Affective Disposition Theory, we derived hypotheses linking character morality with viewers' neural processing of characters' rewards and punishments. We used functional magnetic resonance imaging (fMRI) to examine neural responses of 28 female participants while free-viewing 15 short story summary video clips of episodes from a popular US television soap opera. Each summary included a complete narrative structure, fully crossing main character behaviors (moral/immoral) and the consequences (reward/punishment) characters faced for their actions. Narrative engagement was examined via intersubject correlation and representational similarity analysis. Highest cortical synchronization in 9 specifically selected regions previously implicated in processing moral information was observed when characters who act immorally are punished for their actions, with participants' empathy as an important moderator. The results advance our understanding of the moral brain and the role of normative considerations and character outcomes in viewers' engagement with popular narratives.


Introduction
Social norms are defined as standards of behavior that are based on widely shared beliefs on how to behave in a given situation (e.g., Elster, 1989; Horne, 2001; Voss, 2001) and are critical for coordinating human social life (Buckholtz and Marois, 2012; Fehr and Fischbacher, 2004a). In recent years, interdisciplinary efforts have focused on understanding the origin and function of moral norms, a subclass of social norms whose boundaries are defined by behaviors to which individuals, groups, or societies attribute a moral (versus conventional) basis (Forgas et al., 2016; Kuhn et al., 2000; Sinnott-Armstrong and Wheatley, 2012). A commonly shared view is that moral norms evolved to solve distinct adaptive problems (Barkow et al., 1995; Delton and Krasnow, 2015; Joyce, 2007; Sinnott-Armstrong and Miller, 2008), including kin selection, social exchange, and large-scale cooperation (Curry et al., 2019). Hence, enforcing moral norms via appropriate punishments is considered a human universal (Henrich, 2003) and likely co-evolved with stable, large-scale cooperation (Bendor and Swistak, 2001; Hauser, 2006).
Although we know a fair amount about the cognitive mechanisms of second- (Crockett et al., 2008, 2013; Sanfey et al., 2003; Singer et al., 2004, 2006) and third-party moral norm enforcement (Bellucci et al., 2017; 2020; Buckholtz et al., 2008; Fehr and Fischbacher, 2004b; Glass et al., 2016; Krueger and Hoffman, 2016; Krueger et al., 2014; Zhong et al., 2016), the neural architecture underlying observed (vicarious) punishment of moral wrongdoers is not well understood. This lack of knowledge is surprising considering that vicarious moral norm enforcement is a wide-ranging phenomenon, promoted by the will of impartial observers to impose punishments on moral norm violators, even when observers are not themselves the victims of transgressions (Buckholtz et al., 2008; Krueger and Hoffman, 2016; Krueger et al., 2014).
Indeed, moral prescriptions are often transmitted via impartial, observational learning processes, in contrast to direct, costly experience (Bandura, 1977; A. 2008; Blakemore et al., 2004). Consuming narrative fiction, which is one of the most popular leisure activities, is likely driven by people's desire to exercise their moral intuitions in a safe, low-cost environment (Bandura, 2002; Gottschall, 2012; Tooby and Cosmides, 2001). Compellingly, this mediated punishment of deserving others is universally enjoyed (Raney, 2005; Raney and Bryant, 2002; Tamborini et al., 2013; Weber et al., 2008), evidenced by the success of cinematographic productions and fictional drama that prominently feature moral conflicts (Hopp et al., 2020). Thus, narratives should be a boon for scholars studying the neural correlates of observed moral norm enforcement in naturalistic settings (Nastase et al., 2020; Willems et al., 2020).
In the context of media psychology, Affective Disposition Theory (ADT; Zillmann, 2000) provides a framework for understanding how audiences vicariously apply their moral intuitions to narrative formats. According to ADT, viewers act as "untiring moral monitors" (Zillmann, 2000, p. 54; Grizzard et al., 2023), who continually evaluate the moral actions of characters during narrative exposure. As the story unfolds, audiences form affective (i.e., positively or negatively valenced) dispositions towards characters based on the perceived moral rightness or wrongness of characters' actions (Raney, 2005; Raney and Bryant, 2002). In turn, viewers come to like characters whose behavior conforms to their moral convictions and dislike characters whose actions display a violation of moral norms. These affective dispositions form the basis for viewers' expectations towards characters' deserved outcomes, such that liked characters are expected to be rewarded and disliked characters are expected to be appropriately punished (Grizzard et al., 2023; Raney, 2005; Zillmann, 2000; Zillmann and Bryant, 1975).
ADT predicts that narrative engagement and appeal are modulated by the degree to which these expectations are fulfilled. A large number of behavioral studies have found support for ADT (for an overview see Raney, 2020; Tamborini and Weber, 2020; Grizzard et al., 2023). For example, in a longitudinal design, Weber et al. (2008) demonstrated that viewers' continued exposure and enjoyment of soap opera drama are driven by characters' moral actions and the outcomes that befall them. Grizzard et al. (2023) recently tested all parts of ADT and found that enjoyment was indeed tied to characters' perceived morality and the justness of their outcomes. Eden et al. (2014) demonstrated that consumers of film summaries judge character behavior with respect to distinct moral norms (e.g., care and fairness), highlighting how characters' violation of these norms and narrative resolutions (positive or negative outcomes) predict story appeal. Yet, despite strong evidence in behavioral studies, a neurophysiological examination of propositions that align with ADT is still lacking.
Here, using functional magnetic resonance imaging (fMRI), we examined the neural responses to moral behaviors portrayed in excerpts of audiovisual drama narratives using intersubject correlation (ISC; Hasson et al., 2004), a measure that co-varies with viewer engagement with narrative stimuli at the neurological level. In addition, we measured enjoyment of each narrative at the behavioral level using continuous response measurements (CRM; Biocca et al., 1994) and viewer traits such as empathy, and studied the relationship between these concepts and the neurological response to moral narratives via intersubject representational similarity analysis (IS-RSA; Chen et al., 2020; van Baar et al., 2019). The narrative stimuli displayed a complete dramatic structure and fully crossed main character behavior (moral/immoral) and consequences (reward/punishment), along with neutral clips, allowing for the test of ADT's central predictions at both the behavioral and neurological level.

Foundations of ADT - Moral norm enforcement
While mounting literature is converging on the idea that a central function of morality is to promote cooperation (Curry et al., 2019), there is less consensus about how moral norms are learned to carry out their function (Cushman et al., 2017). Although the capacity for moral judgment is argued to be innate (Graham et al., 2012), humans are not born with a neurocognitive system that inherits the information required to compute appropriate moral judgments. Similar to language acquisition (Hauser, 2006), innate modules for moral judgment reflect incomplete "building blocks" that need to be shaped into higher-level systems during the process of development (German and Leslie, 2000). A common mechanism to ready these moral architectures is to incentivize "moral" actions through rewards and deter "immoral" actions through punishments. Converging evidence supports this value-based reinforcement learning (Cushman, 2013). Punishments of moral transgressions are especially effective means for learning and reinforcing moral standards (Buckholtz and Marois, 2012; Henrich et al., 2006) as they frequently elicit recursive mentalizing to understand the communicative intent behind the punisher (Cushman et al., 2019; Sarin et al., 2021).
To study how people update and enforce their moral convictions through second-party punishments, the bulk of norm-focused cognitive neuroscience research has used economic norm-enforcement games (Fehr and Camerer, 2007). One of the strongest findings in the two-party normative literature involves reciprocity, a desire for fairness in social interactions, such that individuals are rewarded or punished to the extent that they contribute to the shared benefit of the group rather than for personal benefit (Crockett et al., 2013; Sanfey et al., 2003; Singer et al., 2004, 2006).
However, only recent studies have replicated these findings in indirect (impartial) settings (e.g., hypothetical crime scenarios; Glass et al., 2016), where third-party observers apply their moral intuitions to impose punishments on deserving wrongdoers. The mental state of the offender (i.e., the wrongdoer's intention; Krueger et al., 2014) and the severity of harm caused to the victim are the two primary predictors of impartial punishment decisions (Buckholtz and Marois, 2012; Ginther et al., 2016; Glass et al., 2016). Furthermore, Buckholtz et al. (2008) demonstrated that sanctions against third-party norm violations activated brain networks similar to those found consistently in studies focusing on two-party economic games (see also Zhong et al., 2016). These results provide converging support for the hypothesis that third-party punishment is a selective extension of second-party punishment (Hoffman, 2014; Krueger and Hoffman, 2016).

Narratives as catalyst for moral readiness
Recent calls criticize the ecological validity of the tightly controlled punishment paradigms discussed above for understanding how moral norms are learned and enforced "in the wild" (Bauman et al., 2014; Hester and Gray, 2020; Schein, 2020). In daily life, a majority of appropriate moral rules are not inculcated through decontextualized, costly-to-experience, costly-to-punish, and relatively rare direct experience. Instead, via observational learning (Burke et al., 2010), humans have developed a much safer, more efficient technique to vicariously learn which morally-relevant behaviors produce rewards/punishments and thus should be reenacted/averted (Bandura, 2002; Blakemore et al., 2004; Cushman, 2013). Critically, observational learning of appropriate moral conduct does not solely rely on monitoring "action signals" of behavioral outcomes (Cushman et al., 2019), but necessitates considering an actor's mental state (Young et al., 2007), character (Inbar et al., 2012; Uhlmann et al., 2015), intention (Borg et al., 2006; Koster-Hale et al., 2013; Young and Saxe, 2009b), and contextual factors (Schein, 2020). Integrating these interrelated, dynamically-evolving features into an ecologically-valid task paradigm is highly challenging. Yet, these elements are prominently woven into the fabric of narratives (Emelin et al., 2020), especially dramatic narratives found in media (Zillmann, 2000). This positions narratives as a unique tool to study how humans apply their moral intuitions, and observe and learn from vicarious punishment displays in a more naturalistic setting.
It is now well established that compelling narratives feature characters who face moral decisions and moral conflicts (Hopp et al., 2020; Tamborini and Weber, 2020). According to Tooby and Cosmides (2001), such morally-relevant narratives are "attended to, valued, preserved, and transmitted because the mind detects that such bundles of representations have a powerfully organizing effect on our neurocognitive adaptations, even though the representations are not literally true" (p. 21). Whereas learning appropriate moral conduct from informationally-sparse real-life interactions is risky and difficult, the "narrative mode" (Bruner, 2009) is an effective, mediated (i.e., wide-reaching) shortcut to easily transmit moral rules (Bandura, 2000; Raney, 2004; see also Greene et al., 2001). In fact, exposure to moral stories is an important vector for moral learning, promoting honesty (Lee et al., 2014) and empathy (Dodell-Feder and Tamir, 2018; Rohm, Hopp, and Smit, 2022), reducing intergroup prejudice (Paluck, 2009), and shifting moral judgment towards social convention (Tamborini et al., 2010). For these reasons, narrative engagement has been explained as a "map exercise" (Schwab, 2004) that allows viewers to continually develop and update their moral machinery to be in an optimal state of "internal readiness" (Tooby and Cosmides, 2001). Compellingly, narratives closely simulate humans' natural experience (Graesser et al., 2002; Mar and Oatley, 2008), to a degree where neural responses to narrative events approximate brain activation elicited by similar real-world experiences (Speer et al., 2009). In this sense, examining how audiences apply their moral intuitions to the characters of narratives, in contrast to "raceless, genderless strangers" (Hester and Gray, 2020), offers a more accurate, ecologically-valid picture of how moral norm enforcement operates in the real world (see also Kaplan et al., 2017).

Hypotheses
The evidence discussed above suggests that audiences' vicarious monitoring of characters' moral actions and the resulting, value-based outcomes for characters within the story produce robust behavioral reactions in the form of selective exposure to narratives and their enjoyment. This reliable, behavioral response pattern suggests that humans have shared neural circuits that (1) reliably track characters' moral behaviors (Hopp et al., 2023), and (2) respond sensitively to value-based outcomes in the form of rewards and punishments. Addressing this question lends itself to ISC (Hasson et al., 2004, 2008; for a tutorial, see Nastase et al., 2019), which has been shown to track the degree of time-locked, collective engagement between viewers and narrative stimuli (Hasson et al., 2008; Nguyen et al., 2019; Sonkusare et al., 2019; Willems et al., 2020).
In addition, we predicted that morally relevant, versus neutral, narratives elicit higher ISC in these morally relevant brain networks (H2). First, as Haidt and Kesebir (2010) conclude, "morality is about binding groups together in ways that build cooperative moral communities, able to achieve goals that individuals cannot achieve on their own" (italics added; p. 815; see also Haidt, 2001). Thus, moral narratives should carry an inherently social component, leading to collectively engaged, synchronized viewer cognitions that promote cooperation (Lewis et al., 2017). In addition, recent findings emphasize the enhanced attentional capture of moral over non-moral stimuli (Gantman et al., 2020; Gantman and Van Bavel, 2014; Hopp et al., 2023), suggesting that narratives exhibiting morally-relevant cues should attract attention in a more synchronous fashion as opposed to stories that disperse information of non-moral, idiosyncratic relevance.
Relatedly, humans display a well-documented "negativity bias," such that negative information is more likely to be attended to and remembered in adults' and children's evaluations of social situations (Aloise, 1993; Knobe, 2003; Leslie et al., 2006). Even infants seem to have an aversion to antisocial characters but fail to show independent attraction to prosocial characters (Hamlin et al., 2010). Applied to narratives, moral transgressions in plot lines typically cause a vicarious experience of some form of harm, which neuroimaging research has linked to activity in dACC and dAI (Lamm et al., 2011; Singer et al., 2004). Thus, we expected particularly strong ISC in moral brain networks when narratives exhibit characters who engage in moral transgressions compared to morally-appropriate or non-moral actions (H3).
Moreover, ADT postulates that viewers will expect characters who act morally to be rewarded and characters who commit moral transgressions to be punished. In particular, viewers should expect that deserving moral wrongdoers are appropriately punished as a "corrective" response to reinforce moral conduct and stabilize social interaction. In this sense, it has been proposed that the desire for punishment of moral transgressors arises from an intuitive, undifferentiated concept of "normality" (Bear and Knobe, 2017) that is based on learned expectations for punishing immoral actions (Chang and Sanfey, 2013). Hence, we expected the strongest ISC in the previously highlighted moral brain networks when immoral behaviors are punished and moral actions are rewarded (H4). In contrast, when these expectations are violated, for example, when a moral transgression is rewarded or morally appropriate behavior punished, audiences likely experience a prediction error (Burke et al., 2010) that invites cognitive reappraisals of the story. While reappraisals can likewise be collectively shared, their relation to synchronous neural activity during story processing is theoretically less clear-cut, which is why we assessed these scenarios in an exploratory fashion.
Furthermore, in the context of serial narratives, continuous consumers of a show (i.e., fans) will likely hold particular expectations towards main characters' moral behavior and their outcomes. On the one hand, this suggests that fans engage more strongly with characters' moral behavior and consequences, and thus exhibit higher ISC in moral brain networks compared to non-fans (H5a). On the other hand, assuming a "hardwired", functional relevance of moral information, both fans and non-fans may process and integrate novel character behavior and consequences in a very similar fashion, and thus ISC in brain regions implicated in moral cognition should not differ between fans and non-fans (H5b).
Moreover, consistent evidence has shown that individuals' empathy can modulate moral judgment (Decety and Cowell, 2014), and that individual differences in empathy predict viewer engagement with character-driven, dramatic narratives and fandom (Rohm et al., 2022). Thus, we expected that individuals high [low] in empathy are alike in their moral brain network responses to drama narratives, whereas each individual low [high] in empathy exhibits idiosyncratic neural response dynamics (H6).
Finally, while ISC has reliably been linked to viewer engagement and shared narrative interpretation (Nguyen et al., 2019; Yeshurun et al., 2017), it does not provide a real-time measure of the valence (e.g., liking/positive evaluation versus disliking/negative evaluation) of narrative processing. Thus, studies have supplemented ISC with CRM (Biocca et al., 1994), for instance, to obtain time-locked assessments of stimuli valence or arousal that underpin cognitive experience (Nummenmaa et al., 2012, 2014). Informed by these applications, we predicted that individuals who are similar in their continuously reported narrative enjoyment will also exhibit similar neural activation patterns in moral brain networks (H7a). Analogously, we expected that individuals with similar moral judgments of characters' behavior (H7b) as well as stories' outcome valence (H7c) will also exhibit stronger neural synchronization in regions engaged during moral cognition.
R. Weber et al.

Participants
We studied 28 healthy female participants (age range: 18-24 years, M = 21.3, SD = 1.31) from a large Midwestern university. Although we did not conduct a power analysis, our sample size was comparable to those of previous neuroimaging studies investigating affective responses to naturalistic stimuli (see Chan et al., 2020, N = 31; Chang et al., 2021, N = 13; Chen et al., 2020, N = 26). We focused solely on young females because they were the main target group of Days of Our Lives. None reported a history of neurological or psychiatric disorders. Half of the participants (N = 14) identified as fans of the selected narrative (described below), whereas the other half identified as non-fans. Participants received $50 as compensation. All participants provided informed consent as part of the protocol approved by the ethics committee at Michigan State University in Lansing, Michigan (protocol number #06-064).

Stimuli and naturalistic viewing task
We selected 15 short scenes (180 s each) with a complete dramatic structure from 50 episodes of Days of Our Lives, a popular US daytime television soap opera, as stimulus material for this study. The scene selection was based on a survey among an independent sample of 547 female undergraduate students (M = 20.1 years, SD = 1.09). The sample resembled both the main target group of the television show and the sample for the present study. Evaluations included the moral valence of characters' behaviors (i.e., did the character act morally or immorally), the valence of story outcomes for characters (i.e., whether moral or immoral behaviors were punished or rewarded), and liking of the scenes. Selection criteria were a clearly identifiable main character with either high, neutral, or low morality ratings and distinct positive, neutral, or negative outcomes. Due to the nature of the programming, most moral behaviors violated or upheld the fairness/reciprocity norm. That is, immoral characters tried social sabotage or deceit, whereas moral characters offered help at personal costs. Outcomes were mostly social in nature. Positive outcomes often meant good news or social successes, while negative outcomes entailed rejection or emotional distress.² The 15 selected scenes fell into five experimental groups (3 per group; see Fig. 
Upon arrival, participants shared demographic details, handedness, medical history, and trait empathy. They then watched the 15 preselected scenes during an fMRI scan. The scene order was fully randomized for each participant. Scenes were presented on a 640 × 480 LCD screen in the magnet (as hood mount) and synchronized with the MR signal. Participants wore MRI headphones to listen to the clips. The scanning session comprised three 18-minute runs, totalling 54 min. Between clips, a 30-second blue screen with a centered cross acted as a non-active baseline. Post-fMRI, participants rewatched and assessed their enjoyment of all 15 scenes on a PC using CRM (see below). For each viewed clip, participants additionally rated main characters' moral behaviors and outcomes' valence.

Behavioral measurements
Empathy. Participants' levels of empathy were measured using the Interpersonal Reactivity Index (IRI; Davis, 1983). Participants responded to 28 statements on a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). The statements were categorized into four subscales: perspective-taking (e.g., "When I'm upset at someone, I usually try to 'put myself in his shoes' for a while."), fantasy (e.g., "When reading an interesting story or a novel, I imagine how I would feel if the events were to happen to me"), empathic concern (e.g., "I would describe myself as a pretty soft-hearted person"), and personal distress (e.g., "Other people's misfortunes do not usually disturb me a great deal."). We computed a total empathy score for each participant by summing the responses across all items.

Character behavior and outcome valence
Both the moral propriety of behaviors and outcome valence were measured using a 7-point Likert-type format. For moral propriety, the scale ranged from "extremely immoral" to "extremely moral" with "neutral (neither moral nor immoral)" as the midpoint. For outcome valence, the endpoints were "extremely bad" and "extremely good" with "neutral (neither bad nor good)" centered on the scale.

Continuous response measurements
Participants' moment-to-moment enjoyment of each scene was assessed using continuous response measurement (CRM; Biocca et al., 1994). Using continuous video evaluation software, participants re-watched each of the 15 scenes while continually moving a slider to rate how much they enjoyed each moment on screen, with anchors ranging from "I do not enjoy it at all" to "I enjoy it very much". Due to software issues, CRM data from four participants had to be discarded.

Imaging
Cortical activity was measured using the blood-oxygenation level dependent (BOLD) effect. BOLD contrast was obtained with a gradient echo-planar imaging (EPI) sequence (General Electric Signa HDX 3.0T; field strength of 3 Tesla; whole brain coverage with 30 interleaved slices; slice size 4 mm with 0.4 mm gap; TR = 2000 ms; TE = 27.2 ms; flip angle = 77°; field of view 22 × 22 cm²; matrix size 64 × 64). For reference, we acquired anatomical brain images of each participant between the first and the second run of the fMRI procedure. Raw DICOMs were organized according to the Brain Imaging Data Structure (BIDS; Gorgolewski et al., 2016).

fMRI data preprocessing
Imaging data were minimally pre-processed using fMRIprep (Esteban et al., 2019; see SI for details), a robust tool that prepares task-based fMRI data for statistical analysis and circumvents the reproducibility concerns of fMRI preprocessing steps (Esteban et al., 2020). After preprocessing with fMRIprep, we postprocessed the data according to established guidelines for naturalistic imaging data.³ Specifically, we smoothed the data (FWHM = 6 mm) and performed basic voxelwise denoising using a General Linear Model (GLM). This entailed including the 6 realignment parameters, their squares, their derivatives, and squared derivatives. We also included dummy codes for spikes identified from global signal outliers and outliers identified from frame differencing (i.e., temporal derivative). We chose not to perform high-pass filtering and instead included linear and quadratic trends and average CSF activity to remove additional physiological and scanner artifacts.
¹ Supplemental information (SI), behavioral data, and code for all analyses are available on OSF at https://osf.io/cn63b/?view_only=44ad22e4c2b641d38b5eab78138ceb22. Corresponding fMRI data are available at the openneuro.org
² In one scene, physical negative outcomes were present; i.e., a child was hit by a main character's car.
³ https://naturalistic-data.org/content/Preprocessing.html
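The nuisance regression described in the preprocessing paragraph can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, the array shapes, and the choice to pad the first temporal derivative with zeros are assumptions made for illustration. The design matrix holds an intercept, linear and quadratic trends, mean CSF signal, the 24 motion regressors (6 realignment parameters, their squares, derivatives, and squared derivatives), and optional spike dummy codes; the denoised data are the ordinary least squares residuals.

```python
import numpy as np

def build_nuisance_matrix(motion, csf, spikes=None):
    """Assemble a nuisance design matrix (hypothetical helper).

    motion: (T, 6) realignment parameters
    csf:    (T,)   average CSF time course
    spikes: (T, k) dummy-coded spike regressors, or None
    """
    motion = np.asarray(motion, dtype=float)
    n_tr = motion.shape[0]
    # Temporal derivative; the first row is padded with zeros (an assumption).
    deriv = np.vstack([np.zeros((1, motion.shape[1])), np.diff(motion, axis=0)])
    t = np.linspace(-1.0, 1.0, n_tr)
    columns = [np.ones(n_tr), t, t ** 2, np.asarray(csf, dtype=float)]
    X = np.column_stack(columns + [motion, motion ** 2, deriv, deriv ** 2])
    if spikes is not None:
        X = np.column_stack([X, np.asarray(spikes, dtype=float)])
    return X

def denoise(data, X):
    """Return residuals of data (T, V) after regressing out X by least squares."""
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta
```

Because the linear and quadratic trends enter the same model as the motion and CSF regressors, slow drifts are removed in a single step rather than by a separate high-pass filter, which matches the choice described in the text.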

Intersubject correlation
We used intersubject correlation (ISC) to examine the reliability of neural dynamics in response to the soap opera clips across individuals (Cohen et al., 2017; Hasson et al., 2004; Nastase et al., 2019). First, for each participant, we separately extracted the mean time course for each clip across a set of 9 a-priori parcels commonly observed in the moral neurosciences (Eres et al., 2018; see SI Fig. 1; parcellation available at http://neurovault.org/images/39711). The parcellation was created by performing a whole-brain parcellation of the coactivation patterns of activations across over 10,000 published studies available in the Neurosynth database and has previously been used to study both narrative engagement (Chang et al., 2021) and moral judgment (Hopp et al., 2023). For testing our hypotheses, we then aggregated and concatenated the clip-wise parcel time courses into the following conditions: a) narrative segments versus non-active baselines (H1); b) morally relevant versus neutral narratives (H2); c) moral, immoral, and neutral character behavior (H3); and d) moral-positive, moral-negative, immoral-positive, immoral-negative, and neutral (H4) (for an overview, see Table 1). For each of these conditions, we separately computed the pairwise correlation between participants' mean time courses in each parcel, producing a subject-by-subject correlation matrix for each parcel. Statistical significance of ISC values was obtained via subject-wise bootstrapping (using 10,000 bootstraps) of pairwise similarity matrices (Chen et al., 2016, 2020). To compare ISC values across condition pairs and between fans and non-fans, we used nonparametric randomization tests with 10,000 permutation samples.
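The pairwise ISC and subject-wise bootstrap described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the text does not say which summary statistic was used, so the median of the lower triangle is an assumption, as are the function names and the p-value computed by shifting the bootstrap distribution to zero.

```python
import numpy as np

def pairwise_isc(ts):
    """ts: (T, S) mean parcel time courses, one column per subject.
    Returns the S x S subject-by-subject correlation matrix."""
    return np.corrcoef(ts, rowvar=False)

def isc_summary(mat):
    """Median of the lower triangle (assumed summary statistic)."""
    return np.median(mat[np.tril_indices_from(mat, k=-1)])

def bootstrap_isc(mat, n_boot=10000, seed=0):
    """Resample subjects with replacement and recompute the summary ISC."""
    rng = np.random.default_rng(seed)
    n = mat.shape[0]
    observed = isc_summary(mat)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        s = rng.integers(0, n, size=n)
        sub = mat[np.ix_(s, s)].astype(float)
        # Pairs formed by the same subject drawn twice would have r = 1;
        # they are masked out before summarizing.
        sub[s[:, None] == s[None, :]] = np.nan
        boots[b] = np.nanmedian(sub[np.tril_indices(n, k=-1)])
    # Two-sided p-value from the bootstrap distribution shifted to zero.
    null = boots - observed
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_boot + 1)
    return observed, boots, p
```

Resampling whole subjects, rather than individual pairs, respects the fact that every subject contributes to many entries of the pairwise matrix, so the pairs are not independent observations.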

Intersubject representational similarity analysis
We used intersubject representational similarity analysis (IS-RSA) (Chen et al., 2020; van Baar et al., 2019) to test whether neural synchrony across moral brain networks in response to moral narratives was modulated by participants' similarity in traits and story ratings. This analytic technique is an extension of ISC, which identifies the reliability of temporal responses to a dynamic stimulus across participants (Cohen et al., 2017; Hasson et al., 2004). However, rather than examining the reliability of responses in a region across participants, IS-RSA instead explores how intersubject variability in brain dynamics is related to individual differences in behavioral disposition using second order statistics akin to representational similarity analysis (Kriegeskorte et al., 2008; Nguyen et al., 2019; Nummenmaa et al., 2012). The intuition is that participants who occupy a similar position in the multidimensional feature space (e.g., empathy, character judgments, or CRM) will also be processing information about moral narratives more similarly and regions involved in these processes should show a commensurate similarity in temporal dynamics (van Baar et al., 2019).
Note. The non-active baseline was shown at the end of every clip.
To compute participants' pairwise similarity in trait empathy, we used participants' scalar summary scores of the IRI and computed the mean of every subject pair's rank normalized by the highest possible rank. This "Anna Karenina" (AnnaK; Finn et al., 2020) distance matrix predicts that all individuals high [low] in empathy are alike in their neural responses to drama narratives, whereas each individual low [high] in empathy is different in their own way. For CRM, we computed the correlation between pairs of participants' continuous story evaluations for each condition. Accordingly, this "nearest neighbor" (NN; Finn et al., 2020) distance matrix predicts that individuals with (dis)similar CRM response profiles, regardless of their absolute CRM ratings, display (dis)similar neural responses to drama narratives. Lastly, we measured intersubject similarity in character morality and outcome valence ratings by computing the pairwise Euclidean distance across clips of the same condition (3 per condition). This produced two NN-based distance matrices, one capturing relative similarity in character morality judgments, and one denoting relative similarity in outcome valence ratings.
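The three kinds of behavioral intersubject matrices described above can be sketched as follows. The function names and input shapes are hypothetical and chosen only for illustration; tied scores receive average ranks, which is SciPy's default and an assumption about the original analysis.

```python
import numpy as np
from scipy.stats import rankdata
from scipy.spatial.distance import pdist, squareform

def annak_matrix(scores):
    """Anna Karenina model: mean rank of each subject pair,
    normalized by the highest possible rank (the number of subjects)."""
    ranks = rankdata(scores)
    n = len(ranks)
    return (ranks[:, None] + ranks[None, :]) / 2.0 / n

def nn_matrix_from_crm(crm):
    """Nearest-neighbor model for CRM: Pearson correlation between
    subjects' continuous ratings. crm has shape (T, S)."""
    return np.corrcoef(crm, rowvar=False)

def nn_matrix_from_ratings(ratings):
    """Nearest-neighbor model for clip ratings: pairwise Euclidean
    distance across the clips of one condition. ratings has shape (S, n_clips)."""
    return squareform(pdist(ratings, metric="euclidean"))
```

For example, with three participants scoring 10, 30, and 20 on the IRI, the ranks are 1, 3, and 2, so the pair formed by the first and third participants receives (1 + 2) / 2 / 3 = 0.5. Note that the Euclidean matrix is a distance (larger means less alike), whereas the other two grow with similarity, so the sign of any downstream correlation has to be interpreted accordingly.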
Finally, we used Spearman rank correlations to compute the overall similarity between the lower triangles of the neural and behavioral intersubject similarity matrices separately for each of the 9 a priori moral judgment ROIs. This procedure yielded 20 brain maps of regional similarity in individual variations in narrative experience based on variations in empathy, CRM, character ratings, and outcome ratings (5 brain maps per behavioral measure). To assess whether there is general representational similarity between moral brain networks and behavior, we used one-sample permutation tests with 5000 permutations and compared the distribution of Spearman rank correlations across moral brain networks against zero. To determine whether individual ROIs show significant representational similarity, we thresholded each map using a Mantel permutation test (Mantel, 1967; Nummenmaa et al., 2012), in which both the rows and columns of one subject-by-subject correlation matrix were shuffled and the Spearman correlation between both correlation matrices was recomputed. This procedure was repeated 5000 times to generate a null distribution of rank correlations, which was used to compute p-values based on a one-tailed test with the alternative hypothesis of correlations greater than 0 for each ROI (Nili et al., 2014). Permuted p-values were thresholded using a false-discovery rate (Benjamini and Hochberg, 1995) of q < 0.05 across ROIs using the FDR function available in the nltools package.
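A minimal sketch of the Mantel permutation test and false-discovery rate thresholding described above is given below. It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the Benjamini-Hochberg step is written out by hand here instead of calling the nltools function the authors used.

```python
import numpy as np
from scipy.stats import spearmanr

def lower_triangle(mat):
    """Vectorize the strictly lower triangle of a square matrix."""
    return mat[np.tril_indices_from(mat, k=-1)]

def mantel_spearman(neural, behavior, n_perm=5000, seed=0):
    """One-tailed Mantel test: shuffle rows and columns of one matrix with
    the same permutation and recompute the Spearman correlation."""
    rng = np.random.default_rng(seed)
    n = neural.shape[0]
    beh = lower_triangle(behavior)
    observed, _ = spearmanr(lower_triangle(neural), beh)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)
        shuffled = neural[np.ix_(perm, perm)]
        null[i], _ = spearmanr(lower_triangle(shuffled), beh)
    # Alternative hypothesis: correlation greater than 0.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        reject[order[:k + 1]] = True
    return reject
```

Applying the same permutation to rows and columns keeps each subject's full set of pairwise values together, which preserves the dependence structure of the matrix under the null. In use, `mantel_spearman` would be called once per ROI and the resulting p-values passed together to `fdr_bh`.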

Behavioral results
We first examined whether participants evaluated the clips according to their selected character behavior and outcome valence. Starting with moral character judgments, there was a significant main effect of clip condition (F(4) = 97.22, p < .0001, η² = 0.48). The main character was rated as more moral in scenarios where the character behaved more morally, whereas the main character was perceived as more immoral in scenarios that illustrated an immoral character (Table 2, Fig. 2A; all pairwise mean comparisons significant at p < .001). In addition, clips in which the characters acted immorally were also rated as significantly more immoral compared to neutral clips. Curiously, characters that acted morally and were rewarded were not rated as more moral compared to characters from neutral clips (mean difference = 0.226, p = .894), but moral characters who were punished for their praiseworthy actions were indeed perceived as more moral compared to neutral scenarios (mean difference = 1.024, p < .001). In fact, across all conditions, characters who were punished for their moral actions were perceived as most moral. This result is coherent with the "virtuous victim" effect, where victims of moral transgressions are attributed greater moral character than neutral controls (Jordan and Kouchaki, 2021). Yet, our findings suggest that this effect may be even more pronounced if the victim first commits a moral action and is subsequently punished.
Clip condition also had a main effect on outcome valence ratings (F(4) = 72.64, p < .0001, η² = 0.41). Stories were rated as having a more positive outcome if the character was rewarded and, conversely, as having a more negative outcome if the character was punished (Table 3, Fig. 2B; all pairwise mean comparisons significant at p < .001). Likewise, negative outcomes for characters were rated as significantly less positive compared to neutral clips (all p < .001). Notably, whereas outcomes for characters that acted morally and were rewarded were rated as more positive compared to neutral scenarios (mean difference = 1.413, p < .001), outcomes for characters that acted immorally and were rewarded were not rated significantly more positive compared to neutral scenarios (mean difference = 0.375, p = .523).

Intersubject correlation
We hypothesized greater cortical inter-subject correlation (ISC) in select moral judgment regions as well as in visual and auditory brain networks for clips that feature narrative segments compared to nonactive baselines (Fig. 3A). Supporting H1, we observed higher ISC in visual and auditory processing regions when participants viewed narrative clips (mean ISC V1 = 0.26; mean ISC A1 = 0.22), compared to viewing a blue screen during the baselines (mean ISC V1 = 0.11; mean ISC A1 = 0.06). For narrative clips and consistent with our predictions, we also found that average ISC values were significantly higher across the 9 a priori moral brain networks during narrative processing (mean ISC = 0.08) versus resting baselines (mean ISC = 0.03; p = .003).
Note. Results of pairwise mean comparisons using the Tukey method for multiple comparison correction. Rows in bold font highlight contrasts between moral versus immoral conditions.
Next, we examined whether morally relevant, versus neutral, narratives elicited differential ISC in moral brain networks. When comparing average ISC in a priori selected ROIs previously implicated in moral judgment for both moral (mean ISC = 0.079) and neutral narratives (mean ISC = 0.082), we found that ISC was slightly, but not significantly, higher in neutral narratives compared to moral narratives (p = .78); H2 was therefore not supported. However, this finding is consistent with our third and fourth hypotheses (H3, H4), in which we stated that moral violations and negative outcomes for norm violators should lead to greater viewer engagement and cortical synchronization in moral brain networks than moral adherence and positive outcomes.
Confirming H3, we observed significantly greater average ISC in moral brain networks in narratives that feature immoral character actions (mean ISC = 0.09) compared to moral (mean ISC = 0.07; p < .05) behaviors (Fig. 4A). Although average cortical synchronization in moral brain networks was slightly higher when observing immoral behaviors compared to neutral behaviors (mean ISC = 0.08), this difference was not statistically significant (p = .11). Analogously, moral character actions did not produce significantly higher ISC than neutral character behavior in our select moral brain regions (p = .99).
Critically, and in line with H4, we observed significantly higher average ISC in our select moral brain networks when immoral characters were punished (mean ISC = 0.103), compared to narratives in which immoral characters were rewarded (mean ISC = 0.078; p = .006), moral characters were punished (mean ISC = 0.061; p = .002), moral characters were rewarded (mean ISC = 0.071; p = .013), and neutral narratives (mean ISC = 0.081; p = .02) (Fig. 4C). With the exception of TPJ, which displayed slightly higher average ISC in the immoral-positive (mean ISC = 0.19) than the immoral-negative condition (mean ISC = 0.18), each of the remaining eight moral brain networks was synchronized most in the narrative condition where an immoral character was punished. When observing immoral behaviors being punished, a particularly sensitive and specific increase in average cortical synchronization was observed in dlPFC, PCC/PC, and the dAI. In line with previous studies, the dlPFC has shown greater activation when participants decide to punish protagonists in third-party interactions (Buckholtz et al., 2008) than when they refrain from doing so because of mitigating circumstances. Together, these findings support our central notion that vicariously observing norm violations, especially when coupled with norm enforcements in the form of punishments, will lead to synchronized neural responses within brain regions involved in moral cognition.
In an additional analysis, we divided participants into fan and non-fan groups and computed the average cortical synchronization for each narrative condition and group separately. Speaking to the hardwired, functional relevance of moral information (H5b), both fans and non-fans showed largely similar average ISC in moral brain networks across all narrative conditions (Fig. 4A). While non-fans showed slightly higher average ISC in moral brain networks in the immoral-negative condition (mean ISC difference = 0.019) and the immoral-positive condition (mean ISC difference = 0.01), fans displayed marginally higher cortical synchronization in the moral-negative (mean ISC difference = 0.01) and neutral (mean ISC difference = 0.01) narratives. Yet, none of these differences reached statistical significance. Providing further support for H5b, the spatial distribution of average ISC in moral brain networks was also highly similar between fans and non-fans across all clip conditions (Fig. 4B; all rs > 0.82), suggesting that both groups may have processed and integrated character behavior and consequences in a similar fashion, and that evaluations are likely not driven by fans' a priori dispositions towards story characters.

Fig. 2. Behavioral ratings of narrative stimuli.
For each of the 15 clips, participants morally judged the main character's behavior, using a 7-point Likert scale from 1 (extremely immoral) to 7 (extremely moral) (A). Analogously, participants rated the valence of the story outcome, using a 7-point Likert scale from 1 (extremely bad) to 7 (extremely good) (B). Violin plots show the distribution of ratings for each participant and each clip, separated by condition. Dots reflect the condition averages for each participant.
Note. Results of pairwise mean comparisons using the Tukey method for multiple comparison correction. Rows in bold font highlight contrasts between positive versus negative conditions.

Intersubject representational similarity analysis
Having established that characters' moral actions and ensuing consequences evoke viewer engagement in brain networks relevant for moral cognition, we next determined whether the degree of ISC within each narrative condition was modulated by audiences' trait empathy. We generally predicted that individuals who scored high in empathy display a similar brain pattern in moral judgment networks to other high scorers, whereas low scorers will not look particularly similar to one another or to high scorers (H6). Using intersubject representational similarity analysis (IS-RSA), we indeed observed that high-empathy individuals process narratives in more similar ways than those scoring low on empathy, particularly when narratives were consistent with ADT's behavior-outcome interactions (Fig. 5A). Confirming our prediction, there was significant representational similarity in moral brain networks when moral characters were rewarded (mean IS-RSA = 0.10, p = .014) and immoral characters were punished (mean IS-RSA = 0.09, p = .018).
Compellingly, this relationship was non-significant when moral characters were punished (mean IS-RSA = -.01, p = .46) and when immoral characters were rewarded (mean IS-RSA = 0.01, p = .49). We also observed significant representational similarity in moral brain networks when individuals processed neutral narratives (mean IS-RSA = 0.09, p = .01), undergirding the domain-general role of empathy for processing stories (Tamir et al., 2016). Note that H6 did not specify differences in representational similarity between clip conditions, but rather that high-empathy individuals will process morally relevant narratives in more similar ways. Hence, we emphasize that these results do not speak to statistically significant differences between clip conditions. This aspect should be the focus of future research. While no ROI survived multiple comparison corrections in any of the conditions, IS-RSA values were generally highest in TPJ (Fig. 5A; Table 4), which confirms prior research linking TPJ activation with mentalizing and narrative processing (Tamir et al., 2016).
Next, we tested whether subjects who were similar in their continuously reported narrative enjoyment via CRMs will also exhibit similar neural activation patterns in moral brain networks (H7a; Fig. 6B). Here we only observed significant, but negative, relationships between continuous story ratings and neural synchrony in the immoral-positive (mean IS-RSA = -.04, p = .02) and neutral conditions (mean IS-RSA = -.08, p = .004), suggesting that individuals who had similar story ratings diverged in their neural synchrony. While no ROI survived multiple comparison corrections in any of the conditions (Fig. 5B), the TPJ showed a significant IS-RSA value in the moral-positive condition, whereas both the dMPFC and the PCC/Superior LOC showed significant, albeit negative, IS-RSA values in the neutral condition (Table 4).
Lastly, we tested whether individuals with similar moral judgments of characters' behavior (H7b; Fig. 5C) as well as stories' outcome valence ratings (H7c; Fig. 5D) also exhibited stronger neural synchronization in moral brain networks. We only observed significant positive intersubject representational similarity for judgments of character outcomes in the immoral-positive condition (mean IS-RSA = 0.04, p = .017), suggesting that viewers who judged characters' experienced outcomes in a similar manner also engaged moral brain networks more similarly when processing these moral scenes, particularly in TPJ and PCC/Superior LOC (Table 4). In contrast, there was significant negative intersubject representational similarity for judgments of character behavior in the immoral-negative condition (mean IS-RSA = -.11, p = .002) and neutral condition (mean IS-RSA = -.03, p = .01) as well as for character outcome judgments in the moral-negative (mean IS-RSA = -.07, p = .006) and neutral conditions (mean IS-RSA = -.07, p = .004). Yet, as in the previous analyses, no ROI survived multiple comparison corrections, but both the TPJ and PCC/Superior LOC showed significant positive IS-RSA values in the immoral-positive condition (Table 4), suggesting that individuals who judged the morality of characters more similarly also had more aligned neural responses in these brain areas.

Discussion
The aim of this study was to illustrate neural responses to moral norm adherence and violation in third-party viewing situations, specifically televised, naturalistic drama narratives. Results overall indicate that the portrayal of socially relevant information in narrative scenes has a strong influence on collective viewer engagement as measured by cortical inter-subject synchronization in 9 a priori selected moral judgment regions as well as in visual and auditory processing regions in the brain. Specifically, characters exhibiting non-normative or immoral behavior, and receiving punishment for this behavior, elicited stronger inter-subject synchronization in select moral judgment regions than any other type of scene presented. These findings provide evidence for the relevance of portraying moral norms and norm violations in engaging collective viewer response at a basic neural level. Our results are consistent with the idea that viewers may attend to and learn from scenes presenting socially relevant information that is not readily available in day-to-day life, such as moral violations and their subsequent punishment (Bandura, 2002).
Our stimuli presented characters acting in a morally normative or non-normative fashion, and viewers demonstrated the strongest ISC in moral brain networks to immoral behaviors. This indicates that there is a systematic neural response to viewing moral norm violators that is distinct from other types of social information. Although this has been suggested by past research on moral violations (Hopp et al., 2023), our study is the first to use ISC to demonstrate these results for third-party, vicariously perceived violations. Hence, our approach can also advance theoretical frameworks that postulate specific moral domains (e.g., Haidt and Joseph, 2007): To probe the existence of a particular moral domain (e.g., justice, reciprocity, care), people should show shared intentionality to uphold a domain-specific norm, even when there are no direct implications for the self (Graham et al., 2013). Accordingly, viewers demonstrated the strongest ISC in moral brain networks, particularly in TPJ, dlPFC, PCC/PC, and the dAI, not only when characters behaved poorly, but also when they were punished for violating social norms.
Together, these results suggest that vicarious social learning is orchestrated by a complex interplay between brain networks involved in both implicit and explicit moral cognition. To disentangle these relationships, future studies may investigate how synchronization across participants and moral brain networks dynamically unfolds over the course of and in response to morally relevant actions and social sanction. In contrast, recent advances in multivariate neural decoding demonstrate that moral judgments of different moral violations can reliably be inferred from individuals' neural activation patterns (Hopp et al., 2023). In view of our findings, drama narratives may be an especially powerful and naturalistic environment to examine how individuals dynamically process the moral actions of others. Likewise, the strong pattern of synchronization between viewers may support the notion that there is an evolutionary component to altruistic punishment that is triggered through viewing norm violators receive expected punishment, and also supports previous work on the role of expectations for norm violations (Chang and Sanfey, 2013).
Moreover, we also illuminated how individual variability in morally relevant traits, dynamic story evaluations, and post-hoc narrative judgments modulate intersubject neural synchrony in moral brain networks when viewing televised drama. Confirming the moderating role of empathy for moral decision-making (Decety and Cowell, 2014), we demonstrated that individuals high in empathy process morally relevant narratives in a more synchronized fashion compared to low-empathy individuals. Based on these results, future research may explicate whether higher synchrony among high-empathy individuals is related to general narrative engagement, or specifically to the characters who audiences were empathizing with. Repeating this study using ambiguous (Finn et al., 2018) or morally polarizing actions (i.e., that some participants view as very immoral and others view as very moral) may be worthwhile to disentangle these effects. Variability in continuous story ratings was negatively related to neural synchrony when participants viewed immoral characters being rewarded or narratives without moral content. On the one hand, these discrepancies may result from the fact that during punishment and neutral stories participants' CRMs showed substantially lower variance, suggesting that participants had problems rating the "enjoyment" of negative outcomes and may not have been able to make sense of narratives without any obvious moral content or outcomes. On the other hand, participants may have focused on different aspects of the narrative when gauging their narrative enjoyment, leading to distinct neural appraisal processes. Accordingly, interindividual variability in moral judgments of characters and story outcome ratings was positively related to variations in neural synchrony when immoral characters were rewarded.
This study has several noteworthy limitations. First, the professionally edited drama narratives, though concise (180 s), may have been challenging for some participants to understand, particularly the non-fan group unfamiliar with Days of Our Lives. However, stories of traditional soap operas are not overly complex. In addition, the editors of the short story summary video clips ensured that relevant information about characters that allowed for (moral) viewer dispositions toward the characters and outcome evaluations was maintained. Moreover, behavioral results demonstrated that both fans and non-fans were indeed able to identify both the morality of a character and the valence of story outcomes for that character.
Second, our analyses rest on the assumption that a particular viewer disposition (moral/immoral character) and story outcome (positive/negative) is "ON" for the entire condition. In reality, dispositions were likely formed at the beginning of the story summary clips and linked to particular events and character behaviors, and outcomes for the characters were not revealed before the last 30-50 s in most video clips. This has likely resulted in lower signal strength within conditions. In future re-analyses of our data, we recommend using an event-related analysis design or time-resolved ISCs with which the effect of particular events throughout the narratives can be traced.
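As one illustration of what a time-resolved ISC could look like, the sketch below computes the mean pairwise correlation across subjects within sliding windows of a single ROI time course. It is a hedged, standard-library example only: the pairwise (rather than leave-one-out) formulation, the window and step sizes, the function names, and the toy data are all assumptions and do not reflect the analysis pipeline of this study.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def sliding_window_isc(time_courses, window, step):
    """Mean pairwise ISC per window.

    time_courses: one list of samples per subject, all of equal length.
    window, step: window length and hop size in samples (hypothetical values).
    """
    n_samples = len(time_courses[0])
    isc_per_window = []
    for start in range(0, n_samples - window + 1, step):
        segments = [tc[start:start + window] for tc in time_courses]
        pair_rs = [pearson(a, b) for a, b in combinations(segments, 2)]
        isc_per_window.append(sum(pair_rs) / len(pair_rs))
    return isc_per_window


# Hypothetical toy data: three subjects agree early and diverge late.
subjects = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
isc_over_time = sliding_window_isc(subjects, window=4, step=4)
```

Plotting such a windowed series against the timing of character actions and outcomes would show whether synchronization rises around specific events, such as the reveal of a punishment, rather than being averaged over the full clip.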
Third, in contrast to other paradigms using naturalistic stimuli (Nummenmaa et al., 2012, 2014), our participants did not evaluate the story during fMRI, but immediately after the fMRI session. While this could be seen as a limitation, this was a deliberate choice, as we believe that simultaneous evaluations via button boxes or sliders in a CRM during fMRI interfere with the naturalistic character of the stimulus and may even induce brain responses that are more related to the evaluation task than the actual stimulus (Jolly et al., 2022). Overall, the support for most of our a priori hypotheses suggests that post evaluations were not a substantial limitation and may even be a strength of our paradigm.
Fourth, filling in questionnaires prior to scanning (0.5 h), a scan duration of almost 1 hour, followed by two subsequent full evaluation tasks (1.5 h), may have pushed the limits of what participants can meaningfully evaluate. Thus, some of our results may not have survived higher testing thresholds due to the higher error variance created by our participants' exhaustion with the task. Yet, post-study interviews suggested that no participant reported major issues regarding compliance with our study.
Fifth, our study had a relatively small number of participants (N = 28; but see Chan et al., 2020, N = 31; Chang et al., 2021, N = 13; and Chen et al., 2020, N = 26; with comparable sample sizes and naturalistic tasks) consisting exclusively of young females who self-identified as either fan or non-fan. While this indeed reflects the specified main target group of the soap opera Days of Our Lives, our results cannot be generalized to males and to participants of different ages.
Sixth, to distinguish between fans and non-fans, we asked how familiar participants were with the show Days of Our Lives, how many episodes they had watched, and also asked participants to self-identify as a fan or non-fan of the show at the time of the study. However, in conversations, several participants who self-identified as fans mentioned they had not watched the show for many years, or were unfamiliar with the newer characters or plotlines. Considering that this popular show started in 1965, such types of fans are likely common. Therefore, in this case, the fan designation may simply have been identifying familiarity with the show broadly versus a superfan who was very involved with the show. Thus, differences between our fan and non-fan group may not have been as pronounced as we had planned, and our findings may not generalize to superfans of the show who may have more strongly held dispositions towards characters. Future research should target superfans versus casual fans for comparing groups.
Finally, our study focused specifically on how moral character actions and ensuing consequences modulate audiences' neural synchronization, yet there exists a larger space of narrative elements that predicts intersubject synchrony, including valence and arousal (Nummenmaa et al., 2012, 2014), mentalizing events (Finn et al., 2018), or erotic scenes (Chen et al., 2020). Contrasting how specific and decisive moral content features are in eliciting ISC compared to other narrative elements may thus help to illuminate the relative contribution of morality for crafting engaging stories.

Conclusion
Generally, findings from the current study support the theories of media psychologists who argue that norm violations and punishment are important ingredients that make for engaging drama narratives (Zillmann, 2000). Indeed, Zillmann suggests that one of the functions of drama is to allow viewers to enjoy the punishment of deserving characters. The importance of immoral characters also complements research indicating that villains drive narrative enjoyment of soap opera (Weber et al., 2008) and suspense (Eden et al., 2011).
In our view, the presented study is innovative in that it examines questions pertaining to broad concerns of social and moral cognitive neuroscience in a novel fashion. Our study illustrates the importance of viewing moral norms in third-party, vicarious exchanges with metrics introduced specifically to study neural responses to naturalistic narratives. The multidisciplinary approach taken here enhances the integration of cognitive moral neuroscience and narrative theory, which potentially creates a richer understanding within the field of neurocinematics (Hasson, Landesman, et al., 2008).

Declaration of competing interest
None

Fig. 1. Naturalistic viewing task and clip conditions. Each participant viewed 15 clips, each lasting 180 s, presented in fully randomized order. There were three clips per condition across five conditions: (1) perception of main character's behaviors as moral and perception of story outcomes for the main character as positive; (2) immoral behavior and negative outcome; (3) moral behavior and negative outcome; (4) immoral behavior and positive outcome; (5) neutral behavior (perception of behavior as neither moral nor immoral) and neutral outcome (perception of story outcome as neither negative nor positive). Between clips, a 30-second blue screen with a centered cross acted as a non-active baseline. Time courses for each clip and participant were extracted from 9 non-overlapping parcels previously implicated in processing moral information.

Fig. 4. Intersubject correlation between fans and non-fans. (A) Distribution of average ISC values in the selected 9 moral brain networks between fans and non-fans of Days of Our Lives across clip conditions. Mean ISC values in regions involved in moral cognition were not significantly different between fans and non-fans within each condition. Box plots for each condition display median (center line), upper and lower quartiles (box limits), and whiskers denote 1.5 × interquartile range (IQR). Each dot represents the mean ISC within each ROI. (B) The spatial distribution of ISC values across moral brain networks was highly similar between fans and non-fans within each condition (each dot reflects a single ROI of the moral judgment network).

Table 1
Aggregation of clips at various levels of hierarchical organization.

Table 2
Character morality ratings across clip conditions.

Table 3
Story outcome valence ratings across clip conditions.

Table 4
Intersubject representational similarity in moral brain networks.
Note. Moral brain networks with statistically significant (p < .05) intersubject representational similarity (IS-RSA) before applying multiple comparisons corrections are displayed. No individual ROI showed significant IS-RSA for the behavioral measure of outcome-valence judgments across narrative conditions.