Neuroimaging and behavioral evidence that violent video games exert no negative effect on human empathy for pain and emotional reactivity to violence

Influential accounts claim that violent video games (VVGs) decrease players’ emotional empathy by desensitizing them to both virtual and real-life violence. However, scientific evidence for this claim is inconclusive and controversially debated. To assess the causal effect of VVGs on the behavioral and neural correlates of empathy and emotional reactivity to violence, we conducted a prospective experimental study using functional magnetic resonance imaging (fMRI). We recruited 89 male participants without prior VVG experience. Over the course of two weeks, participants played either a highly violent video game or a non-violent version of the same game. Before and after this period, participants completed an fMRI experiment with paradigms measuring their empathy for pain and emotional reactivity to violent images. Applying a Bayesian analysis approach throughout enabled us to find substantial evidence for the absence of an effect of VVGs on the behavioral and neural correlates of empathy. Moreover, participants in the VVG group were not desensitized to images of real-world violence. These results imply that short and controlled exposure to VVGs does not numb empathy nor the responses to real-world violence. We discuss the implications of our findings regarding the potential and limitations of experimental research on the causal effects of VVGs. While VVGs might not have a discernible effect on the investigated subpopulation within our carefully controlled experimental setting, our results cannot preclude that effects could be found in settings with higher ecological validity, in vulnerable subpopulations, or after more extensive VVG play.


Introduction
Video games have evolved into one of the most popular forms of entertainment.In Europe, 25% of the population report playing video games weekly, and especially young adults spend much time in these 'virtual worlds' (IPSOS MediaCT, 2012).Many popular games contain high levels of violent imagery, with the killing or hurting of other characters being deeply engrained in the gameplay (Gentile et al., 2004;Krantz et al., 2017).Many recent studies have investigated whether such violent video games (VVGs) have adverse effects on real-world social behavior and empathy (Anderson et al., 2010).According to the influential general aggression model (Bushman and Anderson, 2002), VVGs should decrease the players' empathy for the pain of others by desensitizing them to both virtual and real violence.Such desensitizing effects should in turn be reflected by decreased activity in brain areas underpinning empathy, such as the anterior insula (AI) and the anterior midcingulate cortex (aMCC) (Lamm et al., 2011;Lamm et al., 2019).However, the evidence for this prediction is mixed.While some studies found that playing VVGs leads to emotional desensitization on the behavioral and neural level (Arriaga et al., 2011;Bartholow et al., 2006;Carnagey et al., 2007;Engelhardt et al., 2011;Staude-Müller et al., 2008), other studies failed to reveal such effects (Gao et al., 2017;Kühn et al., 2018;Szycik et al., 2017a;Szycik et al., 2017b).Conflicting results are also found on the level of systematic reviews (de Vrieze, 2018; Mathur and VanderWeele, 2019).Several meta-analyses suggest that VVGs exert small, yet consistent adverse effects on aggression and empathy (Anderson et al., 2010;Calvert et al., 2017;Greitemeyer and Mügge, 2014;Mathur and VanderWeele, 2019;Prescott et al., 2018).Other researchers contest these results, claiming that results are a product of selective reporting and biased analyses (Ferguson and Kilburn, 2010;Hilgard et al., 2017b).
A key question is whether VVGs are causally responsible for low empathy, or whether less empathic individuals are more likely to play VVGs (Bushman and Anderson, 2015;Ferguson et al., 2008).Many studies have been quasi-experimental in nature, comparing the empathic responses of participants who habitually play VVGs with those of participants without VVG experience (Bartholow et al., 2005;Bartholow et al., 2006;Gentile et al., 2016;Krahé et al., 2011).Such designs provide limited information on the direction of the causal link between VVGs and decreased empathy.The existing experimental studies have nearly always used VVGs as an experimental manipulation shortly before measuring the outcomes of interest (Arriaga et al., 2011;Bushman and Anderson, 2009;Carnagey et al., 2007;Engelhardt et al., 2011;Guo et al., 2013;Staude-Müller et al., 2008).While these studies consistently report evidence for a desensitizing effect of violent games, they cannot eLife digest Violent video games have often been accused of facilitating aggressive behaviour, in particular due to concerns that they could numb players toward real violence and therefore result in decreased empathy towards the pain of others.However, studies investigating these claims have often produced conflicting results, potentially due to methodological issues.For instance, work showing that violent games lead to emotional desensitization has often relied on testing participants immediately after a gaming session, which limits interpretations about prolonged impact.Many studies also compare gamers to people with no gaming experience, making it difficult to assess whether violent games decrease empathy, or whether less empathetic individuals are more likely to be drawn to this content.
Lengersdorff et al. aimed to examine the long-term effects of violent video games using an experimental design that would bypass some of these limitations.A group of 89 young men with little gaming experience were recruited to play either a highly or non-violent version of the same game for seven hour-long sessions over two weeks.The way their brain reacted to violent images and processed other people's pain was assessed before and after this 'gaming training' using fMRI.The analyses showed no changes in these measures in volunteers who played the violent version of the game, suggesting that it had not numbed them to violence or affected their empathy.
While experimental studies cannot fully capture the experiences of real-world gamers, the findings by Lengersdorff et al. represent a step towards resolving the scientific controversy surrounding the effects of violent games.Ultimately, a deeper understanding of how this type of media influences our emotions could help inform policymaking decisions about access to violent content.disentangle the immediate effects of VVG play from those that have a persistent, long-term impact on individuals.Immediate VVG effects may encompass a wide range of processes, such as priming (Bushman, 1998), as well as stress-like responses such as increases in active fear and aggressive behaviors (Fanselow, 1994;Mobbs et al., 2007;Mobbs et al., 2009) that include generally increased sympathetic activity, release of stress hormones, heightened activation of involved brain structures, and cognitive-affective responses (e.g.deep reflection on the seen content, and changes in emotions and mood).Such responses can persist on a timescale of minutes to hours after aversive events such as VVG exposure, and have been shown to negatively affect social behavior (Nitschke et al., 2022).It is important to distinguish these immediate effects from longer-term adaptations that occur over days or weeks, such as habituation or memory consolidation processes.The general aggression model predicts that the repeated exposure to violence in the positive emotional context of videogames leads to the gradual extinction of aversive reactions, resulting in the long-term desensitization of players to real-world violence (Bushman and Anderson, 2009).
It is therefore essential to conduct experimental studies that can disentangle the long-and shortterm effects of VVGs in participants without prior VVG experience.One first such study was conducted by Kühn et al., 2018, who found no significant effects of VVGs on empathy and its neural correlates.While this study was an important starting point, four important design features limited its conclusions.First, the researchers used very dissimilar games in the experimental group versus the control group, restricting the comparability of the two conditions.Second, while the participants of the experimental group were asked to play the violent game Grand Theft Auto V (Rockstar Studios) for 30 min per day over 2 months, the authors did not control the degree to which participants actually played the game.Third, the authors did not control that participants actually committed violent acts within the game, as the game offers a large amount of gameplay without violent content.Fourth, the absence of significant results was interpreted as evidence for the absence of VVG effects.However, the authors did not report the results of equivalence tests (Lakens et al., 2018) or Bayesian hypothesis tests (Keysers et al., 2020) that would support such claims conclusively (Hilgard et al., 2017a).In view of In trials of the Self condition, participants passively received electrical stimuli.In the Other condition, participants experienced how another person (a confederate) received electrical stimuli.The stimuli were either painful or not painful.In the cue phase, an arrow indicated the recipient (downwards: Self; right: Other) and the intensity (blue: not painful; red: painful) of the next stimulus.In the stimulation phase, the stimulus was delivered.After half of the trials, participants were asked to rate the last stimulus.The confederate depicted has given informed consent that his photograph can be published.(B) Emotional reactivity task.Participants were presented pictures with different content (violent or neutral) and different context (real or game context).After observing a block of pictures, participants rated their current unpleasantness on a visual analog scale from 0 to 100.
the many conflicting results reported by experimental research and even meta-analyses (de Vrieze, 2018; Mathur and VanderWeele, 2019), clearly differentiating between 'absence of evidence' and 'evidence of absence' is particularly important.
To test possible causal effects of VVGs on empathy and its neural correlates, we conducted an experimental prospective study, which addressed each of these limitations.Eighty-nine male participants with little to no prior VVG experience repeatedly played a modified version of Grand Theft Auto V over the course of 2 weeks.Participants in the experimental group played a highly violent version of the game and were tasked to kill as many other characters as possible.Participants of the control group played a version of the same game from which all violent content was removed, and were asked to perform a non-violent task (taking photographs of other characters).Before and after this gaming period, participants completed a functional magnetic resonance imaging (fMRI) session during which we measured the behavioral and neural correlates of empathy for pain and emotional reactivity to violent images (see Figure 1 and Methods: Experimental fMRI sessions for details).We used Bayesian hypothesis tests to assess whether there were negative effects of VVGs on participants' empathic behavior and neural responses.Hypothesis tests were performed by means of the Bayes factor (BF; Kass and Raftery, 1995).We followed the convention to report a BF>3 as evidence for the alternative hypothesis, a BF<1/3 as evidence for the null hypothesis, and a BF in the interval [1/3, 3] as inconclusive evidence for either hypothesis (Kass and Raftery, 1995;Keysers et al., 2020).We would like to emphasize, though, that the BF provides an easily interpretable continuous quantification of the evidence for and against hypotheses, and that a strict categorization of BFs into evidence for and against hypotheses is not necessary.Our aim was to provide conclusive evidence on the question whether VVGs can desensitize humans to the plight of others or not, within our carefully balanced experimental model.
Table 1.Posterior parameter means of models for ratings in the empathy-for-pain task.Dependent variable: empathy ratings (visual analog scale, range: 0-100).Factor codings: Group: control game group = -1, violent game group = 1; Session: first session = -1, second session = 1; Intensity: non-painful = -1, painful = 1.Bayes factors were derived from comparing a model where the respective parameter was unrestricted to a model where it was restricted to zero.†These Bayes factors were derived from comparing a model where the parameter was restricted to be negative to a model where it was restricted to zero (one-sided hypothesis test).

Descriptive statistics of gaming behavior
Forty-five participants took part as part of the experimental group, and 44 participants as part of the control group.On average, participants of the experimental group killed 2844.7 characters (SD = 993.9,median = 2820, minimum = 441, maximum = 6815).Participants of the control group took an average of 3055.3 pictures of other characters (SD = 1307.5,median = 3026, minimum = 441, maximum = 6815).Thus, as was the aim of our experimental design, each participant of the experimental group was exposed to a substantial number of violent acts in the video game.

Empathy for pain
To test our central hypothesis, we investigated if participants who played the VVG showed decreased empathy for pain on the behavioral level.We analyzed the ratings obtained during the empathy-forpain task with a hierarchical Bayesian censored regression model.We modeled fixed effects for the experimental factors Group (non-violent vs. violent gaming, coded as -1 and 1), Time (pre vs. post gaming sessions, coded as -1 and 1), and Intensity (non-painful vs. painful stimulation of the confederate, coded as -1 and 1), as well as all interactions between these factors.See Methods: Data analysis for more details.
The posterior means of fixed effect parameters are listed in Table 1.A for painfulness ratings, and Table 1.B for unpleasantness ratings.As a manipulation check, we first tested whether painful stimuli led to increased painfulness and unpleasantness ratings, compared to non-painful stimuli.For both kinds of ratings, this test revealed very strong evidence (BF>100) for an effect of intensity, indicating that our paradigm was able to induce empathic responses in participants (see Figure 2A and B).The posterior mean of the regression parameter β of the factor Intensity was 27.86 for painfulness ratings, and 17.48 for unpleasantness ratings.Given our used factor coding, this means that the average difference in ratings between painful and non-painful stimuli was 2*27.86=55.72 points of the 100point VAS for painfulness ratings, and 2*17.48=34.96points for unpleasantness ratings.
We found evidence for the absence of a VVG effect on the painfulness ratings.Comparing a model where the fixed effect of Group*Time*Intensity could be negative to a model where the effect was set to zero resulted in a BF of 0.324.This means that the observed ratings were about 3.1 times more likely under the null hypothesis of no VVG effect than under the alternative hypothesis.When estimated without restrictions, the posterior mean of β for the interaction Group*Session*Intensity was -0.78.Given our factor codings, this means that the quantity [rating Pain -rating No Pain ] Session 2 -[rating-Pain -rating No Pain ] Session 1 (thus, the baseline-corrected empathic response) was on average 1.56 points smaller in the experimental group than in the control group, on the 100-point VAS.However, note that the Bayesian hypothesis test suggests that a model with this interaction restricted to zero provides a better explanation of the data.
For the unpleasantness ratings, evidence for absence of a VVG effect was substantial.With a BF of 0.130, the observed data were about 7.7 times more likely under the null hypothesis of no VVG effect than under the alternative hypothesis.The posterior mean of β for the interaction Group*Session*Intensity was -0.45.Given our factor codings, this means that the quantity [rating Pain -rating No Pain ] Session 2 -[rating Pain -rating No Pain ] Session 1 was on average 0.9 points smaller in the experimental group than in the control group.However, note again that the Bayesian hypothesis test suggests that a model without this interaction provides a better explanation of the data.
In summary, the behavioral data suggest that VVG play as implemented in this study has no effect on either type of empathy rating.

Emotional reactivity
Next, we investigated whether playing the VVG desensitized participants toward depictions of violence.We again used a hierarchical Bayesian censored regression model, and included fixed effects for the experimental factors Group (non-violent vs. violent gaming, coded as -1 and 1), Content (neutral vs. violent, coded as -1 and 1), and Context (real vs. game, coded as -1 and 1).
The posterior means of fixed effect parameters of this model are listed in Table 2.As a manipulation check, we first tested whether participants experienced more unpleasantness in the emotional reactivity task while observing violent pictures compared to neutral pictures.We found very strong evidence (BF>100) for this hypothesis, indicating that our paradigm was successful in inducing unpleasantness by violent imagery.The posterior mean of the regression parameter β of the factor Content was 37.08.This means that the average difference in ratings between violent and neutral stimuli was 74.16 points of the 100-point VAS.The unpleasantness ratings are depicted in Figure 2C.
Further, we found substantial evidence for the absence of a desensitizing VVG effect.Comparing a model where the fixed effect of Group*Content could be negative to a model where the effect was set to zero resulted in a BF of 0.151.Thus, participants of the violent game group did not show a decreased emotional response toward depictions of real and game violence.Moreover, testing the fixed effect of Group*Content*Context resulted in a BF of 0.094, indicating that there was also no desensitizing effect that was specific to depictions of game violence.When estimated without restrictions, the regression parameters associated with both interactions were positive, β=2.28 for Group*Content, and β=0.33 for Group*Content*Context.This means that, ostensibly, participants in the experimental group had a very weak tendency to rate violent images as more unpleasant than participants in the control group, contrary to expectations.However, note again that the Bayesian hypothesis test suggests that a model without these interactions provides a better explanation of the data.In summary, the behavioral data suggest that playing the VVG did not emotionally desensitize participants toward violent images.

fMRI data Empathy for pain
We next analyzed the fMRI data collected during the empathy-for-pain task.To define our regions of interest (ROIs), we first performed whole-brain general linear model (GLM) analysis of the data of the first fMRI session.Our contrast of interest [Other Pain -Other No Pain] compared brain activity when the confederate experienced painful stimulation to activity when the confederate experienced only non-painful stimulation (see Methods: Data analysis for details).This revealed significant clusters in our a priori defined brain areas of interest, aMCC and bilateral AI, as well as in other areas, including the left supramarginal gyrus and the right angular gyrus (see Figure 3A, and Appendix 3 for detailed Group*Session*Intensity is not supported by the respective Bayesian hypothesis test (BF = 0.130, Table 1).(C) Emotional reactivity, unpleasantness rating.Question text: 'How unpleasant?'.Boxes: the middle line marks the group mean of participant ratings in the respective condition; the box represents the 95% credible interval of the posterior predictive distribution of mean ratings.Dots depict the individual mean ratings of participants, lines depict the 95% credible interval of the posterior predictive distribution of mean ratings of single participants.Control game group: N = 44.Violent game group: N = 45.

Figure 2 continued
Table 2. Posterior parameter estimates of models for ratings in the emotional reactivity task.Dependent variable: unpleasantness ratings (visual analog scale, range: 0-100).Factor codings: Group: control game group = -1, violent game group = 1; Content: neutral = -1, violent = 1; Context: real = -1, game = 1.Bayes factors were derived from comparing a model where the respective parameter was unrestricted to a model where it was restricted to zero.†These Bayes factors were derived from comparing a model where the parameter was restricted to be negative to a model where it was restricted to zero (one-sided hypothesis test).results).Subsequently, we performed Bayesian linear mixed effects analyses on the data extracted from the ROIs (aMCC, left AI, right AI).See Methods: Data analysis for details.We compared models where the fixed effect of Group*Time*Intensity could be negative to a model where the effect was set to zero.For responses in the Cue phase (where participants were informed whether the other person would receive a painful or a non-painful stimulus), we obtained the following BFs: BF aMCC = 0.402; BF left AI = 0.547; BF right AI = 0.190.For responses in the Stimulation phase (where participants observed the other person receiving the stimulus), we obtained the following BFs: BF aMCC = 0.176; BF left AI = 0.143; BF right AI = 0.434.See Appendix 2-table 1 for posterior distributions and BFs of all model parameters.In summary, we found weak to moderate evidence for the absence of an effect of playing the VVG on participants' brain activity while they observed another person in pain.

Emotional reactivity
Our next analysis concerned the fMRI data coming from the emotional reactivity task.To define our ROIs, we computed the contrast [Violent -Neutral], comparing brain activity during observation of violent images to brain activity during observation of images with neutral content (see Methods: Data analysis for details).This revealed significant clusters in one of our a priori areas of interest, the bilateral amygdala, as well as several other regions, such as the bilateral fusiform gyrus and the bilateral precentral gyrus (see Figure 3B and Appendix 3 for detailed results).However, we found no significant clusters in the other brain ROIs, the aMCC or the bilateral AI.Therefore, we restricted our subsequent ROI analysis to the amygdala.We performed Bayesian linear mixed effects analyses on the data extracted from the amygdala.See Methods: Data analysis for details.First, we compared a model where the fixed effect of Group*-Content could be negative to a model where the effect was set to zero.This resulted in a BF of 0.324 for the left amygdala, and a BF of 0.338 for the right amygdala, indicating absence of an effect in both ROIs.Next, we tested the fixed effect of Group*Content*Context.With a BF of 0.205 for the left amygdala, and 0.163 for the right amygdala, this analysis also indicated the absence of an effect.See Appendix 2-table 2 for posterior distributions and BFs of all model parameters.In summary, the data suggest that playing the VVG did not lead to a dampened brain response to images of violence in neither real nor gaming contexts.

Sample comparability
We constrained our sample to young adult (18-35 years) males who had minimal prior exposure to VVGs in general, and who had not played the game used in the study before.However, given the great popularity of VVGs among young adult males, it is also possible that this constrained our sample to a subpopulation that is less susceptible to desensitization effects to begin with.Therefore, we tested whether the subpopulation from which we drew our sample exhibited higher levels of trait empathy than the general population.To achieve this, we compared the trait empathy levels of our sample, as measured by the Questionnaire for Cognitive and Affective Empathy (QCAE; Reniers et al., 2011), to those of a control sample of 18-to 35-year-old males who were not preselected for minimal VVG use.See Methods: Data analysis: Post hoc analyses for more details.
The results are depicted in Table 3.For all subdimensions, Bayesian t-tests provided moderate to substantial evidence for the hypothesis that there is no difference between the two groups (BF<1/3).Thus, our exploratory analysis suggests that our inclusion criterion of minimal VVG exposure did not result in a preselection of individuals with extraordinarily high levels of empathy.

Test-retest reliabilities
In this study, we measured a number of behavioral and neural correlates in two experimental sessions -once before the exposure to the VVG or the control game, once after.Thus, the test-retest reliability (i.e. the correlation between the two measurements of a variable) is of interest, as this informs us about the relative stability of our outcome variables of interest.This also affects the statistical power of our performed tests (see next section).

Bayesian design analyses
We based our sample size on the results of a power analysis designed for the frequentist inference framework (see section Methods: Power analysis).However, as we ultimately based our inference on BF tests, the theoretical long-term behavior of these tests, given our sample size and expected effect size, is of interest.This also informs us about the effect sizes that could realistically have been detected using our sample size.Therefore, we conducted a post hoc BF design analysis by means of a Monte Carlo simulation experiment (Schönbrodt and Wagenmakers, 2018).See Methods: Data analysis: Post hoc analyses for analysis details.It is of particular importance to note that the diagnosticity of hypothesis tests involving repeated measurements does also depend on the correlation between the repeated measures, i.e., the test-retest reliability.
The results are presented in Table 4.In summary, the simulation experiment suggested that our behavioral analyses, for which test-retest reliabilities were high, were well enough powered to differentiate between the absence and presence of a medium-to-small effect of d=0.3.Note that this effect size is smaller than the lower bound of effect size estimates reported in the meta-analysis of Anderson et al., 2010, which was d=0.345.For smaller effects, such as d=0.2, the a priori power of our behavioral analyses was not optimal, as it would have been likely that we would have obtained an Table 4. Results of the Bayes factor design analysis.Depicted are the estimated probabilities of inferential decisions for each dependent variable and assumed true effect size d.Inc.: Inconclusive evidence, no decision.H0: evidence for the null hypothesis.H1: evidence for the alternative hypothesis.ρ=correlation between repeated measurements, i.e., test-retest reliability.The estimated probabilities of correct decisions (evidence for H0 when d=0.0, evidence for H1 when d>0.0) are marked in bold.inconclusive result (1/3<BF<3) even in the presence of a true effect of that size.However, given that we obtained evidence for the null hypothesis (BF<1/3) in all relevant BF tests on our behavioral data, our results speak strongly against the presence of such an effect.Regarding our neural analyses, given the low correlation between repeated measurements (i.e.test-retest reliability), the Bayesian power of our fMRI analyses should be regarded as low.Taken alone, we would not consider them convincing evidence against the presence of a VVG effect.However, together with our behavioral results, they suggest that VVG effects, if they exist, can be expected to be very small.

Cross-task correlations
Given that we measured empathy for pain and emotional reactivity in the same subjects, our data also allowed us to investigate the relationships between these two phenomena.For this, we calculated the correlations between the behavioral and neural measurements of our outcome variables.The results are presented in Table 5.We can observe that for our behavioral measures, cross-task correlations were substantial (r=0.227-0.280, with all credible intervals not covering zero).However, we could observe no substantial cross-task correlations for our neural measures, or across neural and behavioral indicators.

Discussion
Influential theories of media violence predict that the repeated playing of VVGs results in decreased empathy for pain due to a desensitization to real-world violence (Anderson et al., 2010;Bushman and Anderson, 2002).Here, we report evidence against this hypothesis in relation to our specific setting.We found that participants who repeatedly played a highly violent game for 7 hr over the course of 2 weeks did not show decreased empathy for another person's pain or decreased responses to violent imagery.
Our findings contrast with several earlier studies that found a negative relationship between playing VVGs and empathic responses to violence.Importantly, the majority of these studies were quasi-experimental in nature, and therefore provide only limited evidence for a putative causal effect of violent gaming (Bartholow et al., 2005;Bartholow et al., 2006;Gentile et al., 2016;Krahé et al., 2011).Moreover, the few experimental studies that exist implemented designs investigating short-term carryover effects, as they had exposed participants to virtual violence rather immediately before measurements of their outcome variables of interest (Arriaga et al., 2011;Bushman and Anderson, 2009;Carnagey et al., 2007;Engelhardt et al., 2011;Guo et al., 2013;Staude-Müller et al., 2008).Together with the study of Kühn et al., 2018, our study is one of the first to investigate persistent effects of VVGs in participants without prior experience with them, enabling a clear assessment of the causality of VVG effects.Importantly, our study was designed to address several limitations of the study of Kühn et al.: We strictly controlled the amount of virtual violence actually experienced by participants, and used a non-violent version of the same game in the control condition; moreover, we applied a Bayesian analytical approach, which, together with our comparatively large sample size, enabled us from the outset to distinguish 'absence of evidence' from 'evidence of absence' of VVG effects.This approach yields consistent evidence from both behavioral and neural data that VVGs, to the extent and characteristics played in our interventional design, are not causally responsible for a persistent lack of empathy or emotional desensitization to violence.Despite the aforementioned strengths of our study, we also need to address several limitations.Our experimental design ensured that participants of the experimental group were exposed to a substantial amount of violent gameplay during gaming sessions (each participant 'killed' an average of 2845 other characters in a graphically violent way).However, the overall exposure to virtual violence was still very low when compared to the amount that is possible in the everyday life of typical VVG players.During our experiment, participants played for 7 hr over the course of 2 weeks.However, habitual gamers can play an average of 16 hr in the same time frame (Clement, 2021;Statista Research Department, 2022).Our results cannot preclude that longer and more intense exposure to VVGs could have negative causal effects on empathy.In particular, adolescents and children as well as persons with specific neuropsychiatric traits might be especially susceptible to long-term changes due to increased brain plasticity.However, empirically testing higher levels of violence with the same degree of control as realized in our study would reach the limits of practical feasibility.We thus believe that our results provide an important perspective on the size of VVG effects that could realistically be expected in experimental research.
To increase experimental control, we restricted our sample to young adult males who had minimal prior exposure to VVGs.It is possible that, due to this strict preselection criterion, our sample was drawn from a subpopulation that is particularly resistant to desensitization.An exploratory analysis provided strong evidence that our selection criterion did not result in particularly high levels of trait empathy in our sample, though.However, we cannot preclude that our sample was particularly resistant to VVG effects due to other, untested characteristics.Further research is needed to assess if our results generalize to samples with other characteristics that may be more representative for the general population.
To maximize the amount of violence that participants would be exposed to (and commit) in the game, we restricted the game's objective to killing other characters, and incentivized this behavior with monetary rewards.This might have reduced the ecological validity of our operationalization of gaming, and it is possible that bigger effects could be seen when violent gameplay is more internally motivated, i.e., individuals who want to play the game may be differently affected than those that have merely accepted to be part of an experiment.Still, our results provide valid evidence that the mere exposure to virtual violence for 7 hr over 2 weeks is not sufficient to decrease empathy.
It should be noted that there are few studies that connect laboratory-based experimental investigations of empathy and emotional reactivity to real-world behavior and its measures.There are indications, however, that neuroscientifc empathy measures similar to the ones used here predict individual social behavior (e.g.donation, helping, or care-based behavior; Ashar et al., 2017;Hartmann et al., 2022;Hein et al., 2010;Tomova et al., 2017), and that they are also validated by their predictivity of mental or preclinical disorders characterized by deficits in empathy (Bird et al., 2010;Lamm et al., 2016, for review).That said, it is obvious that future research is needed that bridges and integrates laboratory and field-based measures and approaches, in order to inform us how changes (or their absence) in neural responses induced by VVG play are connected to real-life social emotions and behaviors (see Stijovic et al., 2023, for a recent example illustrating, in the domain of social isolation research, how a combined lab-and field-based study can be directly informed by prior laboratorybased neuroscience findings).
Our study was designed to reliably detect an effect size of d=0.3, an effect even smaller than the lower estimate for VVG effects on empathy reported in Anderson et al., 2010.Our results provide substantial evidence that effects of this magnitude are not present in settings similar to our experimental design.These arguments notwithstanding, it needs to be noted that future studies with higher power may detect still smaller effects.Considering the high prevalence of VVG, even such small effects could be of high societal relevance (Funder and Ozer, 2019).For now, based on the current design and data, we can conclude that experimental long-term VVG effects on empathy are unlikely to be as large as previously reported.
It may be argued that the empathy for pain paradigm and the associated behavioral and neural responses are so robust and resistant to changes by external factors that this may explain the lack of evidence for the effects of VVG play.This argument however would contradict a wealth of findings illustrating malleability of empathic responses using this and related designs, including with placebo analgesia (Rütgen et al., 2021;Rütgen et al., 2015a;Rütgen et al., 2015b), an intervention that usually shows low to moderate effect sizes as well (see e.g.Hein and Singer, 2008;Jauniaux et al., 2019;Lamm et al., 2019, for review).
Lastly, and somewhat surprisingly, we found that the test-retest reliability of our neural covariates of empathy for pain were close to zero for all investigated ROIs.Knowing that an individual's neural empathic response (blood oxygen level-dependent [BOLD] activity for seeing somebody else in pain vs. in no pain) was above or below average in the first session provides little to no information about their relative response in the second session.To the best of our knowledge, our study was the first one to present the empathy for pain paradigm to the same sample of participants after a longer time frame.Thus, this surprising result provides valuable information on the limitations of this task respectively the neural measurements acquired in it, and certainly demands further research to investigate the factors influencing fMRI reliability (see also Elliott et al., 2020;Kragel et al., 2021).We would like to emphasize, though, that a high test-retest reliability is not a precondition for the valid testing of group-level effects.For a group-level effect to be testable, it is only necessary that the mean of the dependent variable is consistently affected by the independent variable.It is not necessary that participants who show an above average level in the DV in one session also show an above average level in the second session, and vice versa.Otherwise, there would also be no point in independentsample designs.Indeed, it has recently been discussed that highly robust cognitive tasks are bound to exhibit low test-retest reliability, as robust tasks are often characterized by low interindividual variation, and thus leave only little variance that can be explained by participant traits (Hedge et al., 2018).However, it must also be noted that low reliability does lead to lower power of repeated measures designs.As discussed above, the low reliability of the measured neural responses has resulted in suboptimal power of our tests on fMRI data.
In summary, our findings stand in contrast to claims that posit the playing of violent games as an essential factor for explaining decreases in empathy.If this is shown to generalize to when people play more often and over longer periods, the desensitization to violence described in prior reports using quasi-experimental designs might have been caused by third and pre-existing factors, such as education, socio-economic status, or mental health issues (DeCamp and Ferguson, 2017;Lemmens et al., 2006;Shao and Wang, 2019;Tortolero et al., 2014).Together with similar findings (Kühn et al., 2018), our results point out the limits to which VVGs can be held responsible for lacks of empathy, at least in highly controlled experimental settings that last for the 2 weeks of play implemented here.This is not to say, though, that there is no point in further investigating the complex relationships between violent media use and adverse social behavior.We propose that the design and analysis approach of the present study could act a reference of how future studies should be conducted, in order to increase the stringency and robustness of research in this domain.Together with our findings, such studies will aid in resolving the scientific controversy regarding the negative effects of VVGs (de Vrieze, 2018; Mathur and VanderWeele, 2019), and contribute to a deeper understanding of the interplay between violent media and emotion.

Power analysis
We planned to collect data from 90 participants.We derived this sample size from a power analysis based on VVG effect sizes reported in the meta-analysis of Anderson et al., 2010.The authors estimated the size of the negative VVG effect on empathy/desensitization to be r=0.194,95% CI = [0.170,0.217], which corresponds to Cohen's d=0.396, 95% CI = [0.345,0.445], representing a smallto-medium effect.We chose d=0.300 as the minimum effect size for which we wanted to achieve a power of 0.80, to ensure that we would have enough power even if the reported effect size was overestimated.Note that thus, the effect size we used was even smaller than the lower bound reported in Anderson et al., 2010.We performed the power analysis using the software Gpower 3.1.9.2 (Faul et al., 2007), calculating the required sample size to achieve a power of 0.8 for the interaction in a 2-by-2 within-between design ANOVA, assuming a medium correlation of 0.5 between repeated measures, and using the conventional alpha error level of 0.05.This resulted in a required sample size of 90.Using such a sample size, the achieved power for the effect size reported in Anderson et al., 2010, as well as its lower and upper bound, was as follows: for d=0.345, achieved power = 0.901; for d=0.396, achieved power = 0.960; for d=0.445, achieved power = 0.986.
Please note that while this power analysis was based on a frequentist analysis framework, we are reporting Bayesian analyses here.However, we considered this power analysis to be a sensible benchmark for the sample size needed to answer our research questions.See Results: Post hoc analyses: Bayesian design analysis for a Bayesian design analysis that provides more information on the size of effects that could be detected with our sample size using Bayesian analyses.

Participants
In total, 97 participants completed the first experimental session.Of these, eight participants dropped out of the study (six before the first video game sessions; two after, of which one was from the experimental group and one from the control group).We thus acquired complete datasets from 89 participants.
To control for previous VVG exposure, we only included individuals that had not played VVGs at least 12 months before testing, and had not played the video game Grand Theft Auto V before.We did this to avoid a possible ceiling effect: participants who had already played these games before might already have been desensitized too much for our experimental VVG exposure to show any effect, therefore reducing sensitivity.We tested only male participants, as more males than females play VVGs regularly (Gentile et al., 2004;Krahé and Möller, 2004;Padilla-Walker et al., 2010).Moreover, males have been shown to be more easily influenced by violent media (Bartholow and Anderson, 2002;Bettencourt and Kernahan, 1997).To further increase homogeneity of the sample, we restricted the age range of possible participants to 18-35 years.Additional inclusion criteria were no history of neurological or psychiatric disorders or drug abuse, and standard inclusion criteria for MRI measurements.Participants were recruited through online advertisements and received a financial compensation of €145 for participating in all experimental sessions.A performance-linked bonus of up to €35 acted as an additional incentive during the game sessions.The study was approved by the ethics committee of the Medical University of Vienna (decision number 1258/2017).The confederate depicted in Figure 1A has given informed consent that his photograph may be used for this publication.

Overall study design
Participants were randomly assigned to the violent game group or the control game group.Participants first completed a pretest fMRI session, during which they performed an experimental task designed to measure empathy for pain.Then, over the course of 2 weeks, participants of the violent game group repeatedly played a VVG, while the control game group played a non-violent version of the same game.Subsequently, both groups completed the posttest fMRI session.Here, participants performed the empathy-for-pain paradigm again, and also completed a task designed to measure emotional reactivity to violent pictures.

Experimental fMRI sessions Confederate
To facilitate empathic responses during the experimental tasks, participants completed the experimental session together with a male confederate.The confederate acted as if he were a second participant of the experiment.This deception was maintained until the end of the last experimental session, at which point participants were debriefed.

Pain calibration
The empathy-for-pain paradigm included the administration of painful but tolerable stimuli.The physical pain was induced via a well-established procedure (e.g.Rütgen et al., 2015b).Electrical stimuli were produced by a Digitimer DS5 stimulator (Digitimer Ltd, Clinical & Biomedical Research Instruments, United Kingdom) and delivered by electrodes placed on the dorsum of the left hand.Subjective pain thresholds were determined using a standardized calibration procedure.The participant received short (500 ms) stimuli of increasing intensity and was asked to rate pain intensity on a numeric scale (0 = 'not perceptible'; 1 = 'perceptible, but not painful', 3 = 'a little painful', 5 = 'moderately painful', 7 = 'very painful', 9 = 'extremely painful, highest tolerable pain').The average intensities of stimuli rated as 1 and 7 were then chosen as the intensities of the non-painful and painful stimulation conditions during the empathy-for-pain task.

Empathy-for-pain paradigm
We used a well-established paradigm to measure participants' empathic responses (Hartmann et al., 2021;Rütgen et al., 2015a;Rütgen et al., 2015b;Singer et al., 2004).Participants either received electric stimuli themselves (Self condition), or saw images of the confederate indicating that he was currently receiving electric stimulation (Other condition).The stimuli were either painful (Pain condition) or perceptible but not painful (No Pain condition).The timeline of the task is illustrated in Figure 1A.At the start of each trial, a downwards or rightwards arrow (presented for 2 s) indicated whether the next stimulus would be delivered to the participant or the confederate, respectively (Cue phase).Red and blue arrows indicated painful and non-painful stimulation, respectively.After a jittered interval [3-7 s], the stimulus was delivered (Stimulation phase).In the Self condition, the participant received the electrical stimulus (0.5 s), and saw a pixelated photograph (1 s).In the Other condition, the participant saw a photograph of the confederate with a neutral or painful facial expression.After half of the trials, participants rated the last stimulus on a 100-step visual analog scale (VAS).In the Self condition, participants rated how painful the last stimulus was for themselves.In the Other condition, participants rated how painful the stimulus was for the confederate (other-oriented painfulness rating), and how unpleasant it was for themselves to observe the confederate receiving the stimulus (self-oriented unpleasantness rating).In total, there were 64 trials, with 16 trials per condition (Self Pain, Self No Pain, Other Pain, Other No Pain).Conditions were presented in a pseudorandomized order.The task was presented using COGENT (http://www.vislab.ucl.ac.uk/cogent.php),implemented in MATLAB 2017b (The MathWorks Inc, Natick, MA, USA).The total task duration was approx.20 min.

Emotional reactivity paradigm
To investigate emotional reactivity to violent images, we used an affective picture paradigm (Olofsson et al., 2008;Petrovic et al., 2005).Participants were shown pictures of either neutral or violent content (factor Content).Additionally, the pictures depicted either real scenes, or scenes taken from the video game participants played during the gaming sessions (factor Context).Real pictures were taken from the International Affective Pictures System (IAPS; Lang et al., 2005).Game pictures were matched to IAPS pictures in terms of content, valence, and arousal (see Appendix 1).
The sequence of events of the task is illustrated in Figure 1B.Each block consisted of five pictures of the same condition, presented for 3 s each, and with a short interval of 0.2 s between pictures.After a jittered interval [3-7 s] participants rated how unpleasant they felt on a 100-step VAS.In total, participants saw 16 blocks of pictures, with 4 blocks per condition (Neutral Real, Neutral Game, Violent Real, Violent Game).The task was presented using COGENT, and total task duration was approx.5 min.To avoid that participants formed expectations about the purpose of the study early on, participants completed this task only in the second fMRI session.

Gaming sessions
Between the two fMRI sessions, participants came seven times to the laboratory to play a video game for 1 hr.Intervals between subsequent gaming sessions were approximately 24-48 hr, and the second fMRI session was completed at least 24 hr after the last gaming session.Participants of both groups played a modified version of the game Grand Theft Auto V.In the violent game group, participants controlled a male character equipped with a close-combat weapon, and were tasked to kill as many other characters as possible.Killing was graphically violent, as hitting a character was accompanied by the splattering of blood, realistic animations of injury, and screams.In the control game group, participants played a version of the game in which all violence was removed.The player character had no weapon, and could not hurt other characters in any way.They could also not be attacked by other characters, and there was no violence between non-player characters.In this condition, participants were tasked to take photographs of as many other characters as possible.In both groups, participants could also freely explore the world of the game.To incentivize a high number of violent or non-violent acts, each kill or photograph was rewarded with one point.For every two points, participants were paid out +0.01€ at the end of the study.
Due to the lack of other studies implementing a randomized experimental prospective design (except for Kühn et al., 2018, published while data collection was already ongoing), there were no benchmarks for the amount and frequency of video game exposure for our study.We chose our regimen (seven 1-hourly sessions over 2 weeks) as we considered this a substantial yet still feasible amount of exposure.Number of sessions, playing time per session, and total playing time were considerably higher than in previous studies reporting VVG effects on empathy (Arriaga et al., 2011;Carnagey et al., 2007;Engelhardt et al., 2011;Hasan et al., 2013).

Data analysis
In this paper, we follow a Bayesian data analysis approach (Keysers et al., 2020), which allows clear assessments of the presence or absence of an effect of VVGs on empathy.Hypothesis tests were performed by means of the BF (Kass and Raftery, 1995).The BF represents how much more probable the observed data is under the alternative hypothesis compared to the null hypothesis.A wellestablished convention is to report a BF>3 as evidence for the alternative hypothesis, a BF<1/3 as evidence for the null hypothesis, and a BF in the interval [1/3, 3] as inconclusive evidence for either hypothesis (Kass and Raftery, 1995;Keysers et al., 2020).We formulated informed priors for all models to enable valid BF hypothesis tests (Vanpaemel, 2010).To increase comparability with the results of previous papers, we also report analogous frequentist analyses in the Appendix 5. We registered the analysis plan of this study at https://osf.io/yx423/.

Behavioral data analysis
To test the effects of VVGs on behavioral measures of empathy for pain, we analyzed the VAS ratings obtained during the empathy-for-pain task with hierarchical Bayesian censored regression models.We used censored regression models to account for the fact that participants could give no ratings lower than 0, or higher than 100.Models were estimated using the R package brms (Bürkner, 2017).We modeled fixed effects for the experimental factors Group (non-violent vs. violent gaming, coded as -1 and 1), Time (pre vs. post gaming sessions, coded as -1 and 1), and Intensity (non-painful vs. painful stimulation of the confederate, coded as -1 and 1), as well as all interactions between these factors.Additionally, we modeled per-subject random effects of Time, Intensity, and these factors' interaction term.To further account for variations in how participants used the VAS rating scale, we modeled per-subject error variance terms.For further details about the model specification and prior formulation, see Appendix 2.
We used the same kind of model to test possible desensitizing effects of VVGs on emotional reactivity to violent images.Here, we modeled fixed effects for the experimental factors Group (nonviolent vs. violent gaming, coded as -1 and 1), Content (neutral vs. violent, coded as -1 and 1), and Context (real vs. game, coded as -1 and 1).Additionally, we modeled per-subject random effects for Content, Context, and their interaction, as well as per-subject error variances.

MRI data preprocessing
Preprocessing and analysis of fMRI data were performed using SPM12 (Wellcome Trust Centre for Neuroimaging, https://www.fil.ion.ucl.ac.uk/spm) implemented in MATLAB 2017b.Functional images were slice timed and referenced to the middle slice, realigned to the mean image, and unwarped using the acquired field map.The structural image was co-registered to the mean image of the realigned functional images using mutual information maximization, and structural and functional images were normalized to the stereotactic Montreal Neurological Institute (MNI) space.The normalized functional images were smoothed with a Gaussian kernel of 4 mm full-width-at-half-maximum, which is equal to twice the voxel size on every axis.To remove motion-related artifacts, the functional images were then subjected to an independent-component-analysis based algorithm for automatic removal of motion artifacts (Pruim et al., 2015a;Pruim et al., 2015b), implemented using the FMRIB software library (FSL v5.0; http://www.fmrib.ox.ac.uk/fsl).

fMRI analyses: empathy for pain
With regard to empathy, our central interest lay in modulations of AI and ACC activity.To identify the regions in which empathic responses were reliably elicited independently of our experimental manipulation, we first analyzed the data from the first experimental session.We performed GLMbased whole-brain analysis using SPM12 (Wellcome Trust Centre for Neuroimaging, https://www.fil.ion.ucl.ac.uk/spm), implemented in MATLAB 2017b.For each participant, the design matrix included regressors for the Cue and Stimulation events, separate for all four combinations of conditions (Self No Pain; Self Pain; Other No Pain; Other Pain).As nuisance regressors, we included regressors for the rating events.We then subjected the beta images of the first-level contrast Other Pain>Other No Pain to a one-sample t-test, and identified the voxels in which this contrast was significant and positive (p<0.05 after family-wise error correction).From this, we obtained a binary mask of significant voxels.We then intersected this mask with anatomical masks taken from the Automated Anatomical Labeling atlas (AAL; Tzourio-Mazoyer et al., 2002).For the AI ROI, the binary mask was intersected with the AAL mask of the insula (label IN).For the aMCC ROI, the binary mask was intersected with the AAL masks of the anterior and median cingulate and paracingulate gyri (labels ACIN and MCIN).The aim of this masking procedure was to restrict analyses to those parts of the brain areas that are actually recruited by the task.We believe that this increases the sensitivity of our analyses, as we remove signals from voxels that are also part of these anatomical regions, but not actually recruited by the task.
We analyzed signal changes extracted from our ROIs with Bayesian linear mixed effects model tailored for fMRI data.Note that the ROIs, which were based on the signal from only the first session, were used to extract signals from both sessions.Custom code for this analysis with the software STAN (Carpenter et al., 2017) can be found at https://osf.io/yx423/.See also the Appendix 2 for more information.The full model included regressors for the Cue and Stimulation events, as well as nuisance regressors for rating events.

fMRI analyses: emotional desensitization
When testing the effects of VVGs on brain activity during the emotional-reactivity task, our main interest lay in a possible modulation of responses in the amygdala, as well as aMCC and AI.To define the corresponding ROIs, we first identified the brain areas that were reliably activated by violent imagery, independent of the experimental manipulation, using whole-brain GLM analysis.For each participant, the design matrix included regressors for the blocks of picture presentations, separate for all four combinations of conditions (Neutral Real, Neutral Game, Violent Real, Violent Game).As nuisance regressors, we included regressors for the rating events.We then pooled the beta images of the first-level contrast Violent>Neutral across both groups, and subjected them to a one-sample t-test.From this, we obtained a binary mask of voxels significant at p<0.05 after family-wise error correction.We then intersected this mask with AAL masks to obtain our final ROIs (for AI: label IN; for aMCC: labels ACIN and MCIN; for amygdala: label AMYG).We analyzed signal changes extracted from our ROIs with Bayesian linear mixed effects model.The full model included regressors for the blocks of picture presentation, as well as nuisance regressors for rating events.

Sample comparability
Due to our preselection of young adult males with minimal prior VVG exposure, it appeared possible that our sample was drawn from a subpopulation with higher trait empathy than the general population.To test this potential limitation, we compared the trait empathy levels of our sample, as measured by the QCAE (Reniers et al., 2011), to those of a control sample of 18-to 35-year-old males who were not preselected for minimal VVG use.The control sample was taken from the dataset of Borghi et al., 2023, which is freely accessible online (https://osf.io/ujp3e).We chose this open dataset because we deemed it highly comparable to our own sample, having also been drawn from the Austrian population, by researchers of the same university.To test whether our sample exhibited higher trait empathy levels than the control sample, we calculated a one-sided Bayesian t-test for each of the five subdimensions of the QCAE, using the R package BayesFactor (Morey and Rouder, 2022).

Test-retest reliabilities
Given that our experimental design included measurements of participants' empathic responses in two sessions (once before playing the VVG or the control game, once after), the test-retest reliability ρ of these two measurements was of interest.
In our behavioral data, the empathic response in one session was given by the average difference in ratings for Pain trials minus No Pain trials in session 1 and 2. Given our estimated hierarchical Bayesian censored regression models, the test-retest reliability of empathic responses can be estimated as  where σ 2 b I and σ 2 b I : S are the variances of the random effect of Intensity and Intensity:Session, respectively, and where r bIbI : S is the correlation between these two random effects.
In our neural data, we defined the empathic response in one session as the average difference in BOLD signal to observing the other in pain vs. observing the other in no pain.Given our estimated hierarchical Bayesian regression model, the test-retest reliability of the neural response was given by the correlation coefficient between the random effect for the regressor Stimulus Other: Pain -No Pain in Session 1 and the random effect for the equivalent regressor in Session 2.

Bayesian design analysis
We based our sample size on the results of a power analysis designed for the frequentist inference framework (see section Methods: Power analysis).However, as we ultimately based our inference on BF tests, the theoretical long-term behavior of these tests, given our sample size and expected effect size, is of interest.Therefore, we conducted a post hoc BF design analysis by means of a Monte Carlo simulation experiment (Schönbrodt and Wagenmakers, 2018).
The analysis was performed using the R package BayesFactor (Morey and Rouder, 2022).We simulated data from the scenario in which there was no VVG effect on the outcome variable (H0; Cohen's d=0), as well as from three scenarios where there was a true VVG effect (H1).Here, we considered three different effect sizes: d=0.4,which is close to the effect size estimate of Anderson et al., 2010; the exact estimate was (d=0.394);d=0.3, which is the effect size we used in our power analysis; and d=0.2, the conventional threshold for small effects.
For each scenario/effect size, we randomly generated 10,000 datasets of the same size as our real sample (control group = 44 participants; experimental group = 45 participants) and subjected them to BF hypothesis tests, assessing whether the BF provided evidence for the alternative hypothesis (BF>3), for the null hypothesis (BF<1/3), or inconclusive evidence (1/3<BF<3).For the behavioral and neural empathy measures, which were measured in two sessions (once before playing the VVG or the control game, once after), we used test-retest-reliability estimates that are close to those from the previous section.

Cross-task correlations
We additionally report the empirical correlations between the behavioral and neural measurements of our participants empathic response in the empathy-for-pain task, and their response in the emotional reactivity task.As indicators of participants' behavioral responses, we used their estimated random effects from the Bayesian hierarchical models on their rating data (for Empathy for Pain: factor Intensity, i.The following datasets were generated:

fMRI data analysis
We fitted hierarchical regression models to the BOLD response data extracted from the ROIs.To account for the autocorrelation that is to be expected in fMRI data, we assumed that the residual error terms within a run could be described by an autoregressive process of order 1 (AR(1)).
To facilitate interpretation and formulation of priors, we directly parameterized the model in terms of contrasts of regression weights.For each subject i and session j , we first build the raw design matrix X R ij as is done in established software such as SPM (Wellcome Trust Centre for Neuroimaging, https://www.fil.ion.ucl.ac.uk/spm).Shortly, for each separate type of event, we created a regressor representing the expected BOLD signal induced by the event by convolving a boxcar function of appropriate onset and length with the canonical hemodynamic response function.Then we constructed the design matrix in terms of contrasts X C ij by right-multiplying X R ij with a matrix C that encoded the contrasts of interest.For example, consider the raw design matrix X R ij of the empathyfor-pain paradigm with the following mapping between column numbers and events:

Then
results in a matrix X C ij with the column-to-contrast mapping Finally, for tasks with two sessions (i.e. the empathy-for-pain task), main effects and interactions with the session factor were represented by constructing the final first-level design matrix as the block matrix With this construction, the first half of columns of X i correspond to the effects of events marginal to the session factor, while the second half of columns correspond to the interactions between events and the session factor.
For describing the hierarchical model, we denote the sequence of extracted signals of participant i in session j as y ij .Further, we let t index the timepoints within each such sequence, such that y ij = ( y ij1 , y ij2 , ..., y ijt , ... ) ′ .To account for low-frequency changes of signal of no interest (e.g.scanner drift), each y ij was first filtered with a high-pass filter of period 128.As differences in grand mean between participants and sessions were not of interest, each sequence y ij was mean centered to have mean 0. Further, to change the arbitrary scale of BOLD response signals to a known scale, each y ij was scaled to have variance 1.The same operations were also performed on each design matrix X ij .
For each y ij , we formulate the linear model where β is the vector of fixed effects of contrasts, G i is a variable that takes the value -1 when participant i is in the control group, and 1 when participant i is in the experimental group, γ is the vector of fixed effects of the factor group, b i is the vector of random effects of participant i , and u ij is the vector of error terms of participant i in session j .We assume that u ij follows an AR(1) process, thus ) , where φ ij is the autoregression parameter of person i in session j , ε ijt are independently and identically distributed impulses, and σ εij is the standard deviation of these impulses for participant i and session j .Following the hierarchical modeling approach, we assume that φ ij and σ εij are themselves drawn from a higher-order distribution, for whose hyperparameters we formulate weak priors: and ) .
For the vector of random effects, we assume where k b is the number of random effects, D is the diagonal matrix with the random effect standard deviations σ b1 , ..., σ bkb on the diagonal, and R is the correlation matrix of random effects.We further formulate the priors where σε is the theoretical mean of error standard deviations across participants.This reflects the assumption that random effects will not be much larger than the mean error standard deviation.Lastly, we formulate the following priors on the fixed effects, where k β and kγ are the number of fixed effects.Putting the prior on the ratio β σε instead of β allows us to formulate an appropriately informed prior without prior knowledge of the average variance of the error term.We use the scaling factor 1/10 to represent our prior assumption that fixed effects are likely to be much smaller (in absolute value) than the average error standard deviation, due to the high amount of noise in fMRI signal.However, as this prior formulation might still have been too vague for proper hypothesis testing via the BF, we additionally informed the prior with a fraction of 2/n of the likelihood of the data (where n is the sample size), therefore calculating fractional BFs (O'Hagan, 1995).
restricted to zero (one-sided hypothesis test).Posterior parameter means of models for ratings in the empathy-for-pain task, including the trait covariate SSRT (stop-signal reaction time).Dependent variable: empathy ratings (visual analog scale, range: 0-100).Factor codings: Group: control game group = -1, violent game group = 1; Session: first session = -1, second session = 1; Intensity: non-painful = -1, painful = 1.The variable SSRT was scaled to mean 0 and standard deviation 1. Bayes factors were derived from comparing a model where the respective parameter was unrestricted to a model where it was restricted to zero.†These Bayes factors were derived from comparing a model where the parameter was restricted to be negative to a model where it was restricted to zero (one-sided hypothesis test).

Figure 1 .
Figure1.Schematic depiction of the experimental tasks.(A) Empathy-for-pain task.In trials of the Self condition, participants passively received electrical stimuli.In the Other condition, participants experienced how another person (a confederate) received electrical stimuli.The stimuli were either painful or not painful.In the cue phase, an arrow indicated the recipient (downwards: Self; right: Other) and the intensity (blue: not painful; red: painful) of the next stimulus.In the stimulation phase, the stimulus was delivered.After half of the trials, participants were asked to rate the last stimulus.The confederate depicted has given informed consent that his photograph can be published.(B) Emotional reactivity task.Participants were presented pictures with different content (violent or neutral) and different context (real or game context).After observing a block of pictures, participants rated their current unpleasantness on a visual analog scale from 0 to 100.

Figure 2 .
Figure 2. Behavioral results.Depicted are participants' ratings during the empathy-for-pain task (A and B) and the emotional reactivity task (C).Ratings were given on a visual analog scale (range: 0-100).(A) Empathy for pain, painfulness rating.Question text: 'How painful for the other?'.(B) Empathy for pain, unpleasantness rating.Question text: 'How unpleasant for yourself?'.Note that an apparent trend toward a three-way interaction Figure 2 continued on next page

Figure 3 .
Figure 3. Results of the whole-brain analyses for region-of-interest definition.(A) Empathy-for-pain task.Clusters represent areas where brain activity was increased when the confederate received a painful electrical stimulus, compared to a non-painful stimulus.(B) Emotional reactivity task.Clusters represent areas where brain activity was increased during the observation of violent images, compared to neutral images.All results p<0.05 FWEcorrected.This figure was made with the software MRICron (https://www.nitrc.org/projects/mricron).
I : S , b I + b I : S ) √ Var ( b I − b I : S ) Var ( b I + b I : S ) where Cov and Var are the Covariance and Variance, respectively, b I is the random effect of the factor Intensity (Pain vs.No Pain), and b I : S is the random effect of the interaction of factors Intensity and Session.By the bilinearity of the covariance operator, this formula can be written in terms of estimated model parameters as e. Pain vs.No Pain; for Emotional Reactivity: factor Context, i.e.Violent vs. Neutral).As indicators of participants' neural responses, we used their estimated random effects from the Bayesian models on signals extracted from the ROIs (for Empathy for Pain: regressor Stimulus Other: Pain -No Pain; for Emotional Reactivity: regressor Violent -Neutral).

Table 3 .
Comparison of trait empathy levels between experimental group and control group.Bayes factors were derived from comparing a model where the mean difference violent video game (VVG) group -Control group was positive to a model where it was restricted to zero (one-sided Bayesian t-test).

.
Posterior parameter estimates and contrasts of models for functional magnetic resonance imaging (fMRI) signal in the emotional reactivity task.Dependent variable: fMRI signal extracted from the respective region of interest (ROI) (standardized to unit variance): Fixed effect: terms in standard font describe the mean regression parameter of the respective event averaged across all conditions.Terms in italic font describe the fixed effect of the respective condition on the regression parameter.Factor codings: Group: control game group = -1, violent game group = 1; Content: neutral = -1, violent = 1; Context: real = -1, game = 1.β/σ e : Mean model parameter divided by the mean error standard deviation.Bayes factors were derived from comparing a model where the respective parameter was unrestricted to a model where it was restricted to zero.†These Bayes factors were derived from comparing a model where the parameter was restricted to be negative to a model where it was restricted to zero (one-sided hypothesis test).